Philippe Chassaing ; Lucas Gerin
-
Efficient estimation of the cardinality of large data sets
dmtcs:3492 -
Discrete Mathematics & Theoretical Computer Science,
January 1, 2006,
DMTCS Proceedings vol. AG, Fourth Colloquium on Mathematics and Computer Science Algorithms, Trees, Combinatorics and Probabilities
-
https://doi.org/10.46298/dmtcs.3492
Efficient estimation of the cardinality of large data sets
Authors: Philippe Chassaing 1; Lucas Gerin 1
NULL##NULL
Philippe Chassaing;Lucas Gerin
1 Institut Élie Cartan de Nancy
Giroire has recently proposed an algorithm which returns the $\textit{approximate}$ number of distinct elements in a large sequence of words, under strong constraints coming from the analysis of large data bases. His estimation is based on statistical properties of uniform random variables in $[0,1]$. In this note we propose an optimal estimation, using Kullback information and estimation theory.
Probabilistic counting algorithms for data base applications
4 Documents citing this article
Source : OpenCitations
Brandes, Philipp; Kardas, Marcin; Klonowski, Marek; PajÄ k, Dominik; Wattenhofer, Roger, 2016, Approximating The Size Of A Radio Network In Beeping Model, Structural Information And Communication Complexity, pp. 358-373, 10.1007/978-3-319-48314-6_23.
Cheng, Si; Eck, Daniel J.; Crawford, Forrest W., 2020, Estimating The Size Of A Hidden Finite Set: Large-sample Behavior Of Estimators, Statistics Surveys, 14, none, 10.1214/19-ss127.
Cohen, Reuven; Katzir, Liran; Yehezkel, Aviv, 2017, A Minimal Variance Estimator For The Cardinality Of Big Data Set Intersection, Proceedings Of The 23Rd ACM SIGKDD International Conference On Knowledge Discovery And Data Mining, 10.1145/3097983.3097999.
Kutyłowski, Mirosław; Lemiesz, Jakub; Słowik, Marta; Słowik, Marcin; Kluczniak, Kamil; Gebala, Maciej, 2019, GDPR-Compliant Reputation System Based On Self-certifying Domain Signatures, Information Security Practice And Experience, pp. 341-361, 10.1007/978-3-030-34339-2_19.