Building on the ideas of Flajolet and Martin (1985), Alon et al. (1987), Bar-Yossef et al. (2002), Giroire (2005), we develop a new algorithm for cardinality estimation, based on order statistics which, according to Chassaing and Gerin (2006), is optimal among similar algorithms. This algorithm has a remarkably simple analysis that allows us to take its $\textit{fine-tuning}$ and the $\textit{characterization of its properties}$ further than has been done until now. We prove that, asymptotically, it is $\textit{strictly unbiased}$ (contrarily to Probabilistic Counting, Loglog, Hyperloglog), we verify that its relative precision is about $1/\sqrt{m-2}$ when $m$ words of storage are used, and we fully characterize the limit law of the estimates it provides, in terms of gamma distribution―-this is the first such algorithm for which the limit law has been established. We also develop a Poisson analysis for the pre-asymptotic regime. In this way, we are able to devise a complete algorithm, covering all cardinalities ranges from $0$ to very large.

Source : oai:HAL:hal-01185578v1

Volume: DMTCS Proceedings vol. AM, 21st International Meeting on Probabilistic, Combinatorial, and Asymptotic Methods in the Analysis of Algorithms (AofA'10)

Section: Proceedings

Published on: January 1, 2010

Submitted on: January 31, 2017

Keywords: cardinality estimation,[INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS],[MATH.MATH-CO] Mathematics [math]/Combinatorics [math.CO],[INFO.INFO-DM] Computer Science [cs]/Discrete Mathematics [cs.DM],[INFO.INFO-CG] Computer Science [cs]/Computational Geometry [cs.CG]

This page has been seen 161 times.

This article's PDF has been downloaded 181 times.