Efficient estimation of the cardinality of large data sets

Philippe Chassaing; Lucas Gerin

doi:10.46298/dmtcs.3492

dmtcs:3492 - Discrete Mathematics & Theoretical Computer Science, January 1, 2006, DMTCS Proceedings vol. AG, Fourth Colloquium on Mathematics and Computer Science Algorithms, Trees, Combinatorics and Probabilities - https://doi.org/10.46298/dmtcs.3492

Conference paper

Authors: Philippe Chassaing ¹; Lucas Gerin ¹

1 Institut Élie Cartan de Nancy

Giroire has recently proposed an algorithm that returns the $\textit{approximate}$ number of distinct elements in a large sequence of words, under strong constraints coming from the analysis of large databases. His estimation is based on statistical properties of uniform random variables in $[0,1]$. In this note we propose an optimal estimation, using Kullback information and estimation theory.
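To make the idea of estimating a distinct count from uniform random variables in $[0,1]$ concrete, here is a minimal sketch of a generic k-th minimum estimator. It is not taken from the paper and is neither Giroire's exact algorithm nor the optimal estimator proposed in this note. The function names `hash_to_unit` and `estimate_cardinality`, the use of SHA-256 as a stand-in for an ideal uniform hash, and the default `k = 256` are all assumptions made for illustration.

```python
import hashlib
import heapq


def hash_to_unit(word: str) -> float:
    """Map a word to a pseudo-uniform value in [0, 1] (assumes SHA-256 behaves uniformly)."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def estimate_cardinality(words, k: int = 256) -> float:
    """Estimate the number of distinct words from the k smallest distinct hash values."""
    heap = []     # negated values, so heap[0] is minus the largest retained hash
    kept = set()  # hash values currently retained, used to skip repeated words
    for word in words:
        h = hash_to_unit(word)
        if h in kept:
            continue  # a repeated word hashes to the same value and changes nothing
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            # h is smaller than the current k-th minimum: swap it in.
            evicted = -heapq.heapreplace(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < k:
        # Fewer than k distinct hashes were seen, so the count is exact.
        return float(len(heap))
    # If n distinct uniform values fall in [0, 1], the k-th smallest is near k / n,
    # so (k - 1) / (k-th minimum) estimates n.
    return (k - 1) / (-heap[0])
```

Memory use depends only on `k`, not on the length of the sequence, and repeated words never change the retained values. A call such as `estimate_cardinality(stream_of_words, k=1024)` trades more memory for lower variance.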

Source: HAL:hal-00095370v5

Volume: DMTCS Proceedings vol. AG, Fourth Colloquium on Mathematics and Computer Science Algorithms, Trees, Combinatorics and Probabilities

Section: Proceedings

Published on: January 1, 2006

Imported on: May 10, 2017

Keywords: cardinality, large multiset, approximate counting, [INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS], [INFO.INFO-DM] Computer Science [cs]/Discrete Mathematics [cs.DM], [MATH.MATH-CO] Mathematics [math]/Combinatorics [math.CO]
