Philippe Chassaing ; Lucas Gerin
-
Efficient estimation of the cardinality of large data sets
dmtcs:3492 -
Discrete Mathematics & Theoretical Computer Science,
January 1, 2006,
DMTCS Proceedings vol. AG, Fourth Colloquium on Mathematics and Computer Science Algorithms, Trees, Combinatorics and Probabilities
-
https://doi.org/10.46298/dmtcs.3492
Efficient estimation of the cardinality of large data setsArticle
Authors: Philippe Chassaing 1; Lucas Gerin 1
NULL##NULL
Philippe Chassaing;Lucas Gerin
1 Institut Élie Cartan de Nancy
Giroire has recently proposed an algorithm which returns the $\textit{approximate}$ number of distinct elements in a large sequence of words, under strong constraints coming from the analysis of large data bases. His estimation is based on statistical properties of uniform random variables in $[0,1]$. In this note we propose an optimal estimation, using Kullback information and estimation theory.
Dingyu Wang;Seth Pettie, Proceedings of the 42nd ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, Better Cardinality Estimators for HyperLogLog, PCSA, and Beyond, pp. 317-327, 2023, Seattle WA USA, 10.1145/3584372.3588680.
Jakub Lemiesz, 2023, Efficient Framework for Operating on Data Sketches, Proceedings of the VLDB Endowment, 16, 8, pp. 1967-1978, 10.14778/3594512.3594526.
Jakub Lemiesz, 2021, On the algebra of data sketches, Proceedings of the VLDB Endowment, 14, 9, pp. 1655-1667, 10.14778/3461535.3461553.
Si Cheng;Daniel J. Eck;Forrest W. Crawford, 2020, Estimating the size of a hidden finite set: Large-sample behavior of estimators, Statistics Surveys, 14, none, 10.1214/19-ss127, https://doi.org/10.1214/19-ss127.
Mirosław Kutyłowski;Jakub Lemiesz;Marta Słowik;Marcin Słowik;Kamil Kluczniak;et al., Lecture notes in computer science, GDPR-Compliant Reputation System Based on Self-certifying Domain Signatures, pp. 341-361, 2019, 10.1007/978-3-030-34339-2_19.
Reuven Cohen;Liran Katzir;Aviv Yehezkel, arXiv (Cornell University), A Minimal Variance Estimator for the Cardinality of Big Data Set Intersection, 2017, Halifax NS Canada, 10.1145/3097983.3097999, https://arxiv.org/abs/1606.00996.
Philipp Brandes;Marcin Kardas;Marek Klonowski;Dominik Pająk;Roger Wattenhofer, Lecture notes in computer science, Approximating the Size of a Radio Network in Beeping Model, pp. 358-373, 2016, 10.1007/978-3-319-48314-6_23.