Philippe Flajolet ; Éric Fusy ; Olivier Gandouet ; Frédéric Meunier - HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm

dmtcs:3545 - Discrete Mathematics & Theoretical Computer Science, January 1, 2007, DMTCS Proceedings vol. AH, 2007 Conference on Analysis of Algorithms (AofA 07) - https://doi.org/10.46298/dmtcs.3545
HyperLogLog: the analysis of a near-optimal cardinality estimation algorithmArticle

Authors: Philippe Flajolet 1; Éric Fusy 1; Olivier Gandouet 2; Frédéric Meunier 1

This extended abstract describes and analyses a near-optimal probabilistic algorithm, HYPERLOGLOG, dedicated to estimating the number of \emphdistinct elements (the cardinality) of very large data ensembles. Using an auxiliary memory of m units (typically, "short bytes''), HYPERLOGLOG performs a single pass over the data and produces an estimate of the cardinality such that the relative accuracy (the standard error) is typically about $1.04/\sqrt{m}$. This improves on the best previously known cardinality estimator, LOGLOG, whose accuracy can be matched by consuming only 64% of the original memory. For instance, the new algorithm makes it possible to estimate cardinalities well beyond $10^9$ with a typical accuracy of 2% while using a memory of only 1.5 kilobytes. The algorithm parallelizes optimally and adapts to the sliding window model.


Volume: DMTCS Proceedings vol. AH, 2007 Conference on Analysis of Algorithms (AofA 07)
Section: Proceedings
Published on: January 1, 2007
Imported on: May 10, 2017
Keywords: cardinality estimation,Probabilistic algorithm,[INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS],[INFO.INFO-DM] Computer Science [cs]/Discrete Mathematics [cs.DM],[MATH.MATH-CO] Mathematics [math]/Combinatorics [math.CO],[INFO.INFO-CG] Computer Science [cs]/Computational Geometry [cs.CG]

216 Documents citing this article

Consultation statistics

This page has been seen 1690 times.
This article's PDF has been downloaded 2889 times.