The Adaptive sampling revisited

Matthew Drescher; Guy Louchard; Yvik Swan

doi:10.23638/DMTCS-21-3-13

Matthew Drescher ; Guy Louchard ; Yvik Swan - The Adaptive sampling revisited

dmtcs:2652 - Discrete Mathematics & Theoretical Computer Science, June 25, 2019, Vol. 21 no. 3 - https://doi.org/10.23638/DMTCS-21-3-13

The Adaptive sampling revisitedArticle

Authors: Matthew Drescher ; Guy Louchard ¹; Yvik Swan ²

1 Département d'Informatique [Bruxelles]
2 Mathematics Department

The problem of estimating the number n of distinct keys of a large collection of N data is well known in computer science. A classical algorithm is the adaptive sampling (AS). n can be estimated by R2 J , where R is the final bucket size and J is the final depth at the end of the process. Several new interesting questions can be asked about AS (some of them were suggested by P.Flajolet and popularized by J.Lumbroso). The distribution of W = log(R2 J /n) is known, we rederive this distribution in a simpler way. We provide new results on the moments of J and W. We also analyze the final cache size R distribution. We consider colored keys: assume also that among the n distinct keys, m do have color K We show how to estimate p = m n. We study keys with some multiplicity : we provide a way to estimate the total number M of some color K keys among the total number N of keys. We consider the case where we know a priori the multiplicities but not the colors. There we want to estimate the total number of keys N. An appendix is devoted to the case where the hashing function provides bits with probability different from 1/2.

https://doi.org/10.23638/DMTCS-21-3-13

Source: HAL:hal-01432059v2

Volume: Vol. 21 no. 3

Section: Analysis of Algorithms

Published on: June 25, 2019

Accepted on: April 18, 2019

Submitted on: January 18, 2017

Keywords: 2010 Mathematics Subject Classification: 68R05, 68W40, [INFO]Computer Science [cs], [en] Adaptive sampling, asymmetric adaptive sampling, moments, periodic components, hashing functions, cache, colored keys, key multiplicity, Stein method, urn model

Funding:

Source : OpenAIRE Graph

Frontiers of Extended Formulations; Funder: European Commission; Code: 615640

Bibliographic References

Share and export

Consultation statistics

This page has been seen 1061 times.

This article's PDF has been downloaded 596 times.