Jérémie Lumbroso - An optimal cardinality estimation algorithm based on order statistics and its full analysis

dmtcs:2780 - Discrete Mathematics & Theoretical Computer Science, January 1, 2010, DMTCS Proceedings vol. AM, 21st International Meeting on Probabilistic, Combinatorial, and Asymptotic Methods in the Analysis of Algorithms (AofA'10) - https://doi.org/10.46298/dmtcs.2780
An optimal cardinality estimation algorithm based on order statistics and its full analysisArticle

Authors: Jérémie Lumbroso 1

  • 1 Algorithmes, Programmes et Résolution

Building on the ideas of Flajolet and Martin (1985), Alon et al. (1987), Bar-Yossef et al. (2002), Giroire (2005), we develop a new algorithm for cardinality estimation, based on order statistics which, according to Chassaing and Gerin (2006), is optimal among similar algorithms. This algorithm has a remarkably simple analysis that allows us to take its $\textit{fine-tuning}$ and the $\textit{characterization of its properties}$ further than has been done until now. We prove that, asymptotically, it is $\textit{strictly unbiased}$ (contrarily to Probabilistic Counting, Loglog, Hyperloglog), we verify that its relative precision is about $1/\sqrt{m-2}$ when $m$ words of storage are used, and we fully characterize the limit law of the estimates it provides, in terms of gamma distribution―-this is the first such algorithm for which the limit law has been established. We also develop a Poisson analysis for the pre-asymptotic regime. In this way, we are able to devise a complete algorithm, covering all cardinalities ranges from $0$ to very large.


Volume: DMTCS Proceedings vol. AM, 21st International Meeting on Probabilistic, Combinatorial, and Asymptotic Methods in the Analysis of Algorithms (AofA'10)
Section: Proceedings
Published on: January 1, 2010
Imported on: January 31, 2017
Keywords: cardinality estimation,[INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS],[MATH.MATH-CO] Mathematics [math]/Combinatorics [math.CO],[INFO.INFO-DM] Computer Science [cs]/Discrete Mathematics [cs.DM],[INFO.INFO-CG] Computer Science [cs]/Computational Geometry [cs.CG]

5 Documents citing this article

Consultation statistics

This page has been seen 524 times.
This article's PDF has been downloaded 375 times.