Philippe Chassaing ; Lucas Gerin
-
Efficient estimation of the cardinality of large data sets
dmtcs:3492 -
Discrete Mathematics & Theoretical Computer Science,
January 1, 2006,
DMTCS Proceedings vol. AG, Fourth Colloquium on Mathematics and Computer Science Algorithms, Trees, Combinatorics and Probabilities
-
https://doi.org/10.46298/dmtcs.3492
Conference paper
Authors: Philippe Chassaing 1; Lucas Gerin 1
- 1 Institut Élie Cartan de Nancy
Giroire has recently proposed an algorithm that returns the $\textit{approximate}$ number of distinct elements in a large sequence of words, under the strong constraints imposed by the analysis of large databases. His estimation is based on statistical properties of uniform random variables in $[0,1]$. In this note we propose an optimal estimation, using Kullback information and estimation theory.
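To make the setting concrete, here is a minimal sketch of the general idea of estimating a distinct count from uniform values in $[0,1]$. It is neither the optimal estimator proposed in this note nor a reproduction of Giroire's algorithm. It is the generic "k minimum values" technique, under the assumption that a hash function maps each word to a pseudo-uniform value. The names `hash_to_unit` and `estimate_distinct`, the choice of SHA-256, and the default `k = 256` are all illustrative assumptions. The estimate relies on a standard fact: if $n$ distinct words hash to independent uniform values, the $k$-th smallest value $U_{(k)}$ satisfies $E[(k-1)/U_{(k)}] = n$.

```python
import hashlib
import heapq


def hash_to_unit(word: str) -> float:
    """Map a word to a pseudo-uniform value in [0, 1] (assumed hash: SHA-256)."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def estimate_distinct(words, k: int = 256) -> float:
    """Estimate the number of distinct words from the k smallest hash values.

    Generic k-minimum-values sketch (illustrative; not the paper's estimator).
    Memory use is O(k), independent of the length of the stream.
    """
    heap = []        # max-heap via negation: holds the k smallest hashes seen
    members = set()  # the same values, for fast duplicate detection
    for w in words:
        u = hash_to_unit(w)
        if u in members:
            continue  # repeated word already tracked: ignore
        if len(heap) < k:
            heapq.heappush(heap, -u)
            members.add(u)
        elif u < -heap[0]:
            # u is smaller than the current k-th smallest: swap it in
            evicted = -heapq.heappushpop(heap, -u)
            members.discard(evicted)
            members.add(u)
    if len(heap) < k:
        return float(len(heap))  # fewer than k distinct words: count is exact
    return (k - 1) / (-heap[0])  # -heap[0] is the k-th smallest hash value


if __name__ == "__main__":
    stream = [f"word{i % 5000}" for i in range(50000)]  # 5000 distinct words
    print(estimate_distinct(stream))
```

Because repeated words hash to the same value, the result depends only on the set of distinct words and not on their multiplicities. The relative standard error of this simple estimator is roughly $1/\sqrt{k}$.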
Volume: DMTCS Proceedings vol. AG, Fourth Colloquium on Mathematics and Computer Science Algorithms, Trees, Combinatorics and Probabilities
Section: Proceedings
Published on: January 1, 2006
Imported on: May 10, 2017
Keywords: cardinality, large multiset, approximate counting
Subjects: Computer Science / Data Structures and Algorithms [cs.DS]; Computer Science / Discrete Mathematics [cs.DM]; Mathematics / Combinatorics [math.CO]