Distributional Analysis of the Parking Problem and Robin Hood Linear Probing Hashing with Buckets

This paper presents the first distributional analysis of both a parking problem and a linear probing hashing scheme with buckets of size $b$. The exact distribution of the cost of successful searches for a $b\alpha$-full table is obtained, and moments and asymptotic results are derived. With the use of the Poisson transform, distributional results are also obtained for tables of length $m$ and size $n$. A key element in the analysis is the use of a new family of numbers, called Tuba Numbers, that satisfies a recurrence resembling that of the Bernoulli numbers. These numbers may prove helpful in studying recurrences involving truncated generating functions, as well as in other problems related to buckets.


Motivation and previous results
Throughout this paper we consider hash tables that have $m$ locations ($m$ is called the "length" of the table), each of them containing at most $b \ge 1$ keys, and we let $n$ (with $0 \le n \le bm$) denote the total number of keys (the "size") in the table. The ratio $\alpha = n/(bm)$ is called the "load factor" of the table. Clearly, the number of tables (the number of hash sequences) with length $m$ and size $n$ is $m^n$. The simplest collision resolution scheme for open addressing hash tables with hash function $h(x)$ is linear probing [19,29,45], which uses the cyclic probe sequence $h(K), h(K)+1, \ldots, m-1, 0, 1, \ldots, h(K)-1$, assuming the table slots are numbered from $0$ to $m-1$. Linear probing works reasonably well for tables that are not too full, but as the load factor increases, its performance deteriorates rapidly. Its main application is to retrieve information in secondary storage devices when the load factor is not too high, as first proposed by Peterson [41]. One reason for the use of linear probing is that it preserves locality of reference between successive probes, thus avoiding long seeks [35].
For each element $x$ that gets placed at some location $y$, the circular distance between $y$ and $h(x)$ (that is, $y - h(x)$ if $h(x) \le y$, and $m + y - h(x)$ otherwise) is called its displacement. Displacement is both a measure of the cost of inserting $x$ and of the cost of searching $x$ in the table. The total displacement corresponding to a sequence of hashed values is the sum of the individual displacements of the elements, and it determines the construction cost of the table.
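To make the definition concrete, here is a minimal Python sketch (ours, not the paper's; the function names are hypothetical) for the case $b = 1$: keys, given by their home locations, are inserted with standard FCFS linear probing, and the total displacement is the sum of the circular distances.

```python
def displacement(y: int, home: int, m: int) -> int:
    """Circular distance from the home location `home` to the actual location `y`."""
    return y - home if home <= y else m + y - home


def linear_probing_slots(hashes: list[int], m: int) -> list[int]:
    """Insert keys with the given home locations into a table of m slots (b = 1)
    using standard (FCFS) linear probing; return the slot each key ends up in."""
    if len(hashes) > m:
        raise ValueError("more keys than slots")
    occupied = [False] * m
    slots = []
    for home in hashes:
        y = home
        while occupied[y]:          # probe h, h+1, ..., m-1, 0, 1, ... cyclically
            y = (y + 1) % m
        occupied[y] = True
        slots.append(y)
    return slots


def total_displacement(hashes: list[int], m: int) -> int:
    """Sum of the displacements of all keys: the construction cost of the table."""
    slots = linear_probing_slots(hashes, m)
    return sum(displacement(y, h, m) for y, h in zip(slots, hashes))


print(linear_probing_slots([3, 3, 4, 0], 5))   # [3, 4, 0, 1]
print(total_displacement([3, 3, 4, 0], 5))     # 3
```

For example, with $m = 5$ and home locations 3, 3, 4, 0, the keys land in slots 3, 4, 0 and 1 (the third key wraps around), giving total displacement 3.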
Linear probing hashing has been the object of intense study; see the table of results and the bibliography in [19, pp. 51-54]. The first published analysis of linear probing was done by Konheim and Weiss [34]. These problems also have a special historical value, since the first analysis of algorithms ever performed by D. Knuth [26] was that of linear probing hashing. As Knuth indicates in many of his writings, the problem has had a strong influence on his scientific career. Moreover, the construction cost to fill a linear probing hash table connects to a wealth of interesting combinatorial and analytic problems. More specifically, the Airy distribution that surfaces as a limit law in this construction cost is also present in random trees (inversions and path length), random graphs (the complexity or excess parameter), and random walks (area) [33,15].
Operating primarily in the context of double hashing, several authors [5,2,20] observed that a collision could be resolved in favor of any of the keys involved, and used this additional degree of freedom to decrease the expected search time in the table. We obtain the standard scheme by letting the incoming key probe its next location, so this standard policy may be seen as a first-come-first-served (FCFS) heuristic. Later, Celis, Larson and Munro [8,9] were the first to observe that collisions could be resolved with variance reduction as a goal. They defined the Robin Hood heuristic, in which each collision occurring on each insertion is resolved in favor of the key that is farthest away from its home location. Later, Poblete and Munro [43] defined the last-come-first-served (LCFS) heuristic, where collisions are resolved in favor of the incoming key, and the others are moved ahead one position in their probe sequences. These strategies do not look ahead in the probe sequence, since the decision is made before any of the keys probes its next location. As a consequence, they do not improve the average search cost.
It is shown in [7] that the Robin Hood linear probing algorithm minimizes the variance among all linear probing algorithms that do not look ahead. This variance, for a full table, is $\Theta(m)$, instead of the $\Theta(m^{3/2})$ of the standard algorithm. They also derived explicit expressions for the variance of $C_{m,n}$, the successful search time. Moreover, in [25] and [48], a distributional analysis for the FCFS, LCFS and Robin Hood heuristics is presented. More specifically, for the Robin Hood heuristic, they obtain an expression in which $C_\alpha(z)$ is the probability generating function of the successful search time when $m, n \to \infty$ and $n/m = \alpha$, $0 \le \alpha < 1$.
These results consider a hash table with buckets of size 1. However, very little is known about tables with buckets of size $b$. In [4], Blake and Konheim studied the asymptotic behavior of the expected cost of successful searches as the number of elements and buckets tend to infinity with their ratio remaining constant. Mendelson [36] derived exact formulae for the same expected cost, but only solved them numerically. These papers consider the FCFS heuristic. In [49] the first exact analysis of a linear probing hashing scheme with buckets of size $b$ is presented. In that paper, the expected value and the asymptotic behavior of the average cost of successful searches are found when the Robin Hood heuristic is used. One of its main methodological contributions is the introduction of a new sequence of numbers $T_{k,d,b}$ for $0 \le d < b$ (which we call Tuba Numbers), which is one of the key components of the analysis presented in this paper. This is sequence EIS A124453 in Neil Sloane's Encyclopedia of Integer Sequences.
In this paper we complete the work presented in [49], and find the distribution of the search cost of a random element when we construct a linear probing hash table using the Robin Hood heuristic, in tables with buckets of size $b$. As far as we know, this is the first distributional analysis of a hashing scheme with buckets of size $b$. An open problem presented in the first edition of [29] asked for the average search cost of a random element in a linear probing hash table with buckets of size $b$. This problem was solved in [49], and this paper generalizes that result.
More specifically, we give the distribution of the search cost of a random element for $b\alpha$-full tables ($0 \le \alpha < 1$), as $n, m \to \infty$ while $n/m = b\alpha$. These results can also be derived for tables of fixed length $m$ and size $n$ by the use of the Poisson transform presented in Section 3. However, the formulae lead to very lengthy and complicated expressions, so we generally leave the results in the Poisson model. We must acknowledge that several of the main technical results needed for this analysis were presented in [4]. These contributions are a key component in finding the generating functions for the Tuba Numbers that lead to exact expressions for the distributions presented in this paper. More specifically, in Lemma 3.1 (page 595) they characterize the sequence $T_{k,0,b}$ ($d = 0$), and in Theorem 4.1 (page 602) they give its generating function. Other important related results presented in that paper are used as a starting point for our analysis. Nevertheless, they do not exploit the combinatorial structure of the problem as presented in [15]. This combinatorial structure allows us to find powerful results based on the methodology presented in [16]. As a consequence, when we interpret the results in [4] from a combinatorial point of view, we may generalize those theorems and find the generating functions for the Tuba Numbers for all values of $d$.
Another main contribution of the paper (as a consequence of the methodology used to analyze the Robin Hood algorithm) is the full distribution of the number of cars that overflow in the parking problem with buckets. This problem was introduced in [34], and in [29] it is presented as follows: "A certain one-way street has m parking spaces in a row numbered 1 to m. A man and his dozing wife drive by, and suddenly, she wakes up and orders him to park immediately. He dutifully parks at the first available space [. . .]." This problem has been extensively studied; some later references are [39,11,10,40]. A general framework for this problem can be found in [16,29]. We note that some of the results presented in [39] could be derived from the results in [21,48] by depoissonization, since the analysis of Robin Hood needs, as a subproblem, the analysis of the parking problem. In this paper we give the distribution of the number of cars that overflow when $\alpha < 1$. This is the first result for the parking problem with buckets of size $b > 1$.
This paper is organized as follows. Sections 2 and 3 present some basic mathematical background used in the analysis, namely the Tree Function and the Poisson Transform. Section 4 presents the Robin Hood Linear Probing Hashing Algorithm, and Section 5 its relation with the Parking Problem. The main methodological contributions of the paper are presented in Section 6, where a combinatorial characterization of Linear Probing Hashing introduced already in [33,15] is used to properly interpret and generalize the results presented in [4], and in Section 7, where a new sequence of numbers $T_{k,d,b}$ (called Tuba Numbers), which is a key tool to perform this analysis, is studied. Finally, in Section 8 a distributional analysis of the Parking Problem is presented, and in Section 9 the distributional analysis of the search cost of a random element in Robin Hood Linear Probing Hashing is derived, including the exact expressions in the Poisson model for all the factorial moments.

The tree function and the Q functions
One of the main characters in this paper is the tree function, which is defined implicitly by $T(z) = z e^{T(z)}$ (and so $T(z e^{-z}) = z$) and which appears originally in problems related to the counting of rooted labeled trees [16,22,37,50]. The Lagrange inversion theorem provides a number of series expansions, such as
$$T(z) = \sum_{n \ge 1} n^{n-1} \frac{z^n}{n!}.$$
Most generating functions in this paper involve rational fractions in $T(z)$ with denominators that are powers of $(1 - T)$. Lagrange inversion also provides
$$\frac{1}{1 - T(z)} = \sum_{n \ge 0} n^n \frac{z^n}{n!}.$$
The asymptotic form of the coefficients of any rational function of $T$ is also directly recovered by singularity analysis [18,38]. An application of the method requires the singular expansion of $T(z)$, itself obtained from the implicit function theorem.
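The series expansions given by Lagrange inversion, $T(z) = \sum_{n \ge 1} n^{n-1} z^n/n!$ and $1/(1 - T(z)) = \sum_{n \ge 0} n^n z^n/n!$, can be checked mechanically. The following Python sketch (an illustration under our own choices, not part of the paper) computes the power series of $T(z)$ by iterating $T \leftarrow z e^{T}$ with exact rational arithmetic and compares the coefficients; the truncation order `N` is an arbitrary assumption.

```python
from fractions import Fraction
from math import factorial

N = 10  # number of coefficients kept (z^0 .. z^(N-1)); arbitrary choice


def series_exp(a: list[Fraction]) -> list[Fraction]:
    """exp(a(z)) truncated to len(a) coefficients, assuming a[0] == 0.
    Uses E' = a' E, i.e. n E_n = sum_{k=1}^{n} k a_k E_{n-k}."""
    e = [Fraction(0)] * len(a)
    e[0] = Fraction(1)
    for n in range(1, len(a)):
        e[n] = sum((k * a[k] * e[n - k] for k in range(1, n + 1)), Fraction(0)) / n
    return e


def tree_function_series(n_terms: int) -> list[Fraction]:
    """Coefficients of T(z) from the fixed point T = z exp(T); each pass fixes one more term."""
    t = [Fraction(0)] * n_terms
    for _ in range(n_terms):
        t = [Fraction(0)] + series_exp(t)[:n_terms - 1]   # multiply exp(T) by z
    return t


def reciprocal_one_minus(t: list[Fraction]) -> list[Fraction]:
    """Coefficients of 1 / (1 - t(z)) for a series with t[0] == 0."""
    b = [Fraction(0)] * len(t)
    b[0] = Fraction(1)
    for n in range(1, len(t)):
        b[n] = sum((t[k] * b[n - k] for k in range(1, n + 1)), Fraction(0))
    return b


T = tree_function_series(N)
inv = reciprocal_one_minus(T)
print(all(T[n] == Fraction(n ** (n - 1), factorial(n)) for n in range(1, N)))  # True
print(all(inv[n] == Fraction(n ** n, factorial(n)) for n in range(N)))          # True
```

Exact fractions are used so that the comparison with $n^{n-1}/n!$ is an equality test rather than a floating-point approximation.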
Lemma 1. The function $T(z)$ has a unique dominant singularity at $z = 1/e$, and its singular expansion in a slit neighbourhood of $1/e$ is
$$T(z) = 1 - \sqrt{2}\,\sqrt{1 - ez} + \frac{2}{3}(1 - ez) + O\big((1 - ez)^{3/2}\big).$$
In close association with the tree function is what Knuth has popularized under the name of the "Ramanujan Q-function". This function [3,28,30,29,16] and its close relatives play a central rôle in the analysis of many algorithms and data structures, most notably hashing with linear probing [26,29], union-find algorithms [32], interleaved memory [31], optimal caching [27], and random mappings [6,17,30]. The Q-function is defined by
$$Q(n) = \sum_{k \ge 1} \frac{n^{\underline{k}}}{n^k} = 1 + \frac{n-1}{n} + \frac{(n-1)(n-2)}{n^2} + \cdots,$$
or, in a way that is equivalent thanks to (2),
$$Q(n) = \frac{n!}{n^{n-1}}\, [z^n] \log \frac{1}{1 - T(z)}.$$
Singularity analysis of the generating function yields immediately
$$Q(n) \sim \sqrt{\frac{\pi n}{2}}.$$
An asymptotic series for $Q(n)$ was first derived by Ramanujan [3], and tight estimates are obtained in [14].
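As a numerical illustration (a sketch, not the authors' code), the Python snippet below evaluates $Q(n) = \sum_{k \ge 1} n^{\underline{k}}/n^k$ exactly for small $n$ and in floating point for large $n$, where it can be compared with the classical estimate $Q(n) = \sqrt{\pi n/2} - 1/3 + O(n^{-1/2})$.

```python
import math
from fractions import Fraction


def ramanujan_q_exact(n: int) -> Fraction:
    """Q(n) = sum_{k>=1} n(n-1)...(n-k+1) / n^k, in exact rational arithmetic."""
    total, term = Fraction(0), Fraction(1)
    for k in range(1, n + 1):
        term *= Fraction(n - k + 1, n)      # term = n^(k falling) / n^k
        total += term
    return total


def ramanujan_q_float(n: int) -> float:
    """Floating-point version; stops once the terms become negligible."""
    total, term = 0.0, 1.0
    for k in range(1, n + 1):
        term *= (n - k + 1) / n
        total += term
        if term < 1e-18:
            break
    return total


print(ramanujan_q_exact(3))                  # 17/9
for n in (10, 100, 10_000):
    print(n, ramanujan_q_float(n), math.sqrt(math.pi * n / 2) - 1 / 3)
```

The floating-point loop stops early because the terms behave roughly like $e^{-k^2/(2n)}$, so only about $O(\sqrt{n})$ of them matter.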
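For the purpose of expressing the average-case analysis of sparse tables, Knuth [29] has extended the Ramanujan Q-function as
$$Q_r(m, n) = \sum_{k \ge 0} \binom{r + k}{k} \frac{n^{\underline{k}}}{m^k}.$$
Then, by differentiation, one obtains identities involving some polynomials $V_r(m, n)$ that can be mechanically obtained from (7). For $\alpha$ fixed with $0 \le \alpha < 1$, basic asymptotic approximations entail estimates for these quantities (in particular $Q_r(m, \alpha m) = (1 - \alpha)^{-r-1} + O(1/m)$), where $c$ and $r$ are fixed. See [42] for a general framework.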
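A short Python sketch (our illustration; the name `knuth_q` is hypothetical) evaluates Knuth's $Q_r(m, n) = \sum_{k \ge 0} \binom{r+k}{k} n^{\underline{k}}/m^k$ and shows that, for $n = \alpha m$ with $\alpha$ fixed, it approaches $(1 - \alpha)^{-r-1}$, since $n^{\underline{k}}/m^k \to \alpha^k$ term by term. It is meant for small $r$ (the binomial coefficients are multiplied by floats).

```python
from math import comb


def knuth_q(r: int, m: int, n: int) -> float:
    """Q_r(m, n) = sum_{k>=0} C(r+k, k) * n(n-1)...(n-k+1) / m^k."""
    total, ratio = 0.0, 1.0                  # ratio = n^(k falling) / m^k
    for k in range(n + 1):
        total += comb(r + k, k) * ratio
        ratio *= (n - k) / m
        if ratio == 0.0:                     # remaining terms vanish or underflow
            break
    return total


alpha = 0.5
for m in (100, 1000, 10000):
    print(m, knuth_q(1, m, int(alpha * m)), (1 - alpha) ** -2)  # approaches 4
```

With $r = 0$ and $m = n + 1$ this reduces to the Ramanujan function, $Q_0(n+1, n) = Q(n+1)$.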

The Poisson Transform
There are two standard models that are extensively used in the analysis of hashing algorithms: the exact filling model and the Poisson filling model. Under the exact model, we have a fixed number of keys, $n$, that are distributed among $m$ locations, and all $m^n$ possible arrangements are equally likely to occur. Under the Poisson model, we assume that each location receives a number of keys that is Poisson distributed with parameter $\lambda$, and is independent of the number of keys going elsewhere. This implies that the total number of keys, $N$, is itself a Poisson distributed random variable with parameter $\lambda m$:
$$\Pr[N = n] = e^{-\lambda m} \frac{(\lambda m)^n}{n!}, \qquad n \ge 0.$$
This model was first considered in the analysis of hashing by Fagin et al. [12] in 1979. Consider a hash table of length $m$ with $n$ elements, in which conflicts are resolved by open addressing using some heuristic. Let $P$ be a property (e.g. the cost of a successful search) of a random element of the table, and let $f_{m,n}$ be the result of applying a linear operator $f$ (e.g. an expected value) to the probability generating function of $P$ found using the exact filling model. Then $\tilde{f}_m(x)$, the result of applying the same linear operator $f$ to the probability generating function of $P$ computed using a model with $m$ independent Poisson distributed random variables, each with parameter $x$, is
$$\tilde{f}_m(x) = \sum_{n \ge 0} e^{-mx} \frac{(mx)^n}{n!} f_{m,n}. \qquad (10)$$
We may use (10) to define $\mathcal{P}_m[f_{m,n}; x] = \tilde{f}_m(x)$, the Poisson transform (also called Poisson generating function [13,24]) of $f_{m,n}$. If $\mathcal{P}_m[f_{m,n}; \lambda]$ has a Maclaurin expansion $\sum_{k \ge 0} a_{m,k} \lambda^k$ in powers of $\lambda$, then we can retrieve the original sequence $f_{m,n}$ by the following inversion theorem [21]:
$$f_{m,n} = \sum_{k \ge 0} a_{m,k} \frac{n^{\underline{k}}}{m^k}.$$
In this paper we consider $\lambda = b\alpha$.
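The transform and its inversion can be tried on a simple example. In the sketch below (ours; the names are hypothetical), $f_{m,n} = (1 - 1/m)^n$ is the probability that a given location receives no key in the exact model. Its Poisson transform is $e^{-x}$, and feeding the Maclaurin coefficients $(-1)^k/k!$ of $e^{-x}$ to the inversion formula $f_{m,n} = \sum_k a_{m,k} n^{\underline{k}}/m^k$ recovers $(1 - 1/m)^n$ exactly, because $\sum_k \binom{n}{k}(-1/m)^k = (1 - 1/m)^n$.

```python
from fractions import Fraction
from math import exp, factorial


def poisson_transform(f, m: int, x: float, n_max: int = 200) -> float:
    """P_m[f_{m,n}; x] = sum_n e^{-mx} (mx)^n / n! * f(m, n), truncated at n_max."""
    total, weight = 0.0, exp(-m * x)       # weight = Pr[N = n] with N ~ Poisson(mx)
    for n in range(n_max + 1):
        total += weight * f(m, n)
        weight *= m * x / (n + 1)
    return total


def poisson_inverse(a, m: int, n: int) -> Fraction:
    """If P_m[f_{m,n}; x] = sum_k a_k x^k, return f_{m,n} = sum_k a_k n^(k falling) / m^k.
    Requires the Maclaurin coefficients a_0 .. a_n."""
    total, ratio = Fraction(0), Fraction(1)  # ratio = n(n-1)...(n-k+1) / m^k
    for k in range(n + 1):
        total += a[k] * ratio
        ratio *= Fraction(n - k, m)
    return total


def empty_location(m: int, n: int) -> float:
    """Exact model: probability that a given location receives none of the n keys."""
    return (1 - 1 / m) ** n


print(poisson_transform(empty_location, 10, 0.7), exp(-0.7))   # equal up to rounding
a = [Fraction((-1) ** k, factorial(k)) for k in range(6)]     # Maclaurin coefficients of e^{-x}
print(poisson_inverse(a, 10, 5) == Fraction(9, 10) ** 5)      # True
```

Exact rationals are used for the inversion step because it involves alternating sums, where floating point would lose precision for larger $n$.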
The results obtained under the Poisson filling model can also be interpreted as an approximation of those one would obtain under the exact filling model when $n, m \to \infty$ with $n = b\alpha m$. This approximation can be formalized by means of an asymptotic expansion. Poblete [42] presents an approximation theorem and gives an explicit form for all the terms of the expansion, in which $\left[{k \atop k-j}\right]$ denotes the Stirling numbers of the first kind. For most situations and applications, this approximation is satisfactory. However, it cannot be used when we have a full or almost full table ($\alpha$ very close to 1).

The Robin Hood Linear Probing Hashing Algorithm
We follow the ideas presented in [48] and [49]. Figure 1 shows the result of inserting elements with the keys 36, 77, 24, 69, 18, 56, 97, 78, 49, 79, 38 and 10 into a table with ten buckets of size 2, with hash function $h(x) = x \bmod 10$, resolving collisions by linear probing with the Robin Hood heuristic. When there is a collision in location $i$, the element that has probed the fewest locations probes location $(i + 1) \bmod m$. In the case of a tie, we (arbitrarily) move the element whose key has the largest value.
Figure 2 shows the partially filled table after inserting 58. There is a collision with 18 and 38. Since there is a tie (all of them are in their first probe location), we arbitrarily decide to move 58, the largest key. Then 58 is in its second probe location, and so is 78, but 49 is in its first one. So 49 has to move. Then 49, 69 and 79 are all in their second probe location, so 79 has to move to its final position by the tie-break policy described above.
The following properties are easily verified:
• At least one element is in its home location.
• The keys are stored in nondecreasing order by hash value, starting at some location k and wrapping around. In our example k = 4 (corresponding to the home location of 24).
• If a fixed rule (that depends only on the values of the keys and not on the order in which they are inserted) is used to break ties among the candidates to probe their next location (e.g., by sorting these keys in increasing order), then the resulting table is independent of the order in which the elements were inserted [8].
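The insertion procedure and the example of Figures 1 and 2 can be replayed with a small Python sketch (an illustration, not the authors' implementation). Buckets are Python lists of at most $b$ keys; on a collision, among the keys in the bucket and the incoming one, the key with the smallest displacement moves on, with ties broken by moving the largest key, as described above. The sketch assumes that at most $bm$ keys are inserted.

```python
def robin_hood_insert(table: list[list[int]], key: int, b: int, h) -> None:
    """Insert `key` into `table` (m buckets, each a list of at most b keys) with
    linear probing and the Robin Hood heuristic. Assumes the table is not full."""
    m = len(table)
    pos, current = h(key), key
    while True:
        if len(table[pos]) < b:
            table[pos].append(current)
            return
        # Collision: the bucket's keys and the incoming key compete for b places.
        candidates = table[pos] + [current]
        # The key that has probed the fewest locations (smallest displacement) moves on;
        # ties are broken by moving the largest key.
        mover = min(candidates, key=lambda k: ((pos - h(k)) % m, -k))
        candidates.remove(mover)
        table[pos] = candidates
        current, pos = mover, (pos + 1) % m


def build(keys: list[int], m: int = 10, b: int = 2) -> list[list[int]]:
    """Insert keys in the given order with h(x) = x mod m; return buckets sorted for display."""
    table: list[list[int]] = [[] for _ in range(m)]
    for key in keys:
        robin_hood_insert(table, key, b, lambda x: x % m)
    return [sorted(bucket) for bucket in table]


keys = [36, 77, 24, 69, 18, 56, 97, 78, 49, 79, 38, 10]
print(build(keys))   # [[69, 79], [10], [], [], [24], [], [36, 56], [77, 97], [18, 38], [49, 78]]
print(build(keys + [58]))
```

Inserting the twelve keys of Figure 1 puts 18 and 38 in bucket 8, 49 and 78 in bucket 9, 69 and 79 in bucket 0 and 10 in bucket 1; inserting 58 then displaces 49 and 79 exactly as described for Figure 2, and inserting the keys in reverse order yields the same table, in line with the order-independence property.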

Linear Probing Sort and the Parking Problem
To analyze Robin Hood linear probing, we first have to discuss some ideas presented in [7,48,49] and [21]. When the hash function is order preserving (that is, if $x < y$ then $h(x) \le h(y)$), a variation of the Robin Hood linear probing algorithm can be used to sort [21], by successively inserting the $n$ elements into an initially empty table. In this case, instead of letting the excess elements from the rightmost location of the table wrap around to location zero, we can use an overflow area consisting of locations $m, m + 1$, etc. The number of locations needed for this overflow area is an important performance measure for this sorting algorithm. This problem is related to the study of the cost of successful searches in the Robin Hood linear probing algorithm, as follows. Without loss of generality, we search for an element that hashes to location 0. Moreover, since the order of insertion is not important, we assume that this element is the last one inserted. If we look at the table after the first $n$ elements have been inserted, all the elements that hash to location 0 (if any) will be occupying contiguous locations near the beginning of the table. The locations preceding them will be occupied by elements that wrapped around from the right end of the table, as can be seen in Figure 2. The key observation here is that those elements are exactly the ones that would have gone to the overflow area. Furthermore, it is easy to see that the number of elements in this overflow area does not change when the elements that hash to 0 are removed. As a consequence, the cost of retrieving an element that hashes to 0 can be divided into two parts:
• The number of elements that wrap around the table, in other words, the size of the overflow area.
• The number of elements that hash to location 0.
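The size of the overflow area in linear probing sort (equivalently, the number of cars that overflow in the parking problem) can be computed with a single left-to-right pass: each bucket keeps at most $b$ of the keys it receives and passes the rest on. A minimal Python sketch (our own, with hypothetical names):

```python
from collections import Counter


def overflow_count(hashes, m: int, b: int) -> int:
    """Linear probing sort / parking problem with buckets: keys with home buckets
    0..m-1 are placed without wrap-around, and whatever does not fit goes to an
    overflow area after bucket m-1. Returns the size of the overflow area."""
    counts = Counter(hashes)
    carry = 0                                   # keys pushed out of the previous bucket
    for i in range(m):
        carry = max(0, carry + counts[i] - b)   # bucket i keeps at most b keys
    return carry


# Home buckets (x mod 10) of the keys of Figure 1
homes = [x % 10 for x in [36, 77, 24, 69, 18, 56, 97, 78, 49, 79, 38, 10]]
print(overflow_count(homes, 10, 2))            # 2
```

On the keys of Figure 1 ($m = 10$, $b = 2$), two keys overflow; these correspond to 69 and 79, the two keys that wrap around into bucket 0 in the Robin Hood table, and removing the key 10, which hashes to 0, does not change the count.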
In Section 8 we study the distribution of the number of elements that overflow, and in Section 9 the distribution of the successful search cost of a random element. In this setting, linear probing sort is the parking problem introduced in [34].
In this paper we give the distribution of the number of cars that overflow when $\alpha < 1$, generalizing to $b > 1$ some of the results presented in [48,21,39] for the case $b = 1$. We are very grateful to an anonymous referee for pointing out that in [46] an exact formula for the generating function of the overflow (called defective bucket parking functions) in the exact filling model has been derived by a rather simple and direct approach. Thus, an equivalent of the main theorem of Section 8 (Theorem 9) is presented in that thesis.
We work in the Poisson model, since the presentation is much simpler than in the exact model. Then, with the use of the Poisson transform, we may obtain the exact results for fixed $m$, $n$ and $b$. It is important to note that the exact distribution of the elements that overflow when $b = 1$ is calculated in the exact model in [21,39,48], and the average number of elements that overflow (for general $b$) in [49]. This analysis brings a new point of view on the problem. The key element to solve this problem is the use of a new sequence of numbers that we call Tuba Numbers. By circular symmetry [29], and for nonfull tables ($n < bm$), we may freely assume that one of the empty locations belongs to the rightmost bucket. This assumption of a last empty location in nonfull tables is made from now onwards. When all the empty places belong to the last bucket of the table, we say that such a table is almost full. For a given length $m$ there are $b$ kinds of almost full tables, since the last bucket may contain from 0 to $b - 1$ elements (that is, from $b$ down to 1 empty locations).
As a consequence, a general table decomposes as a labeled product of clusters (sometimes also figuratively called "islands") that are, up to relabeling, almost full tables. Furthermore, it will be enough to study almost full tables, and then generalize the results to general tables in a similar way as is done in [15], by using the sequence construction for labeled structures as presented in [16].
We now present a combinatorial interpretation of several results presented in [4] that will shed light on some important generalizations. Let $F_{bn+d}$ be the number of ways to construct an almost full table of length $n + 1$ and size $bn + d$ (that is, with $b - d$ empty slots in the last bucket). We also define $N_d(z, w)$, the generating function for the number of almost full tables with more than $d$ empty locations in the last bucket.
The elementary symmetric functions of the variables $\gamma_j(z)$ are defined as the coefficients $\{\sigma_k(z)\}$ of the polynomial $\sum_{k=0}^{b} \sigma_k(z)\, x^{b-k} = \prod_{j=0}^{b-1} \left(x + \gamma_j(z)\right)$. Let $r$ be a primitive $b$-th root of unity and $\sigma_k(z)$ be the $k$-th elementary symmetric function of the variables $\{T(r^j z),\ 0 \le j < b\}$, where $T$ is the tree function. Lemma 2.3 (page 594) in [4] states an identity for these functions, and formula 3.8 (page 597) states a second one. Moreover, since $T(\alpha e^{-\alpha}) = \alpha$, the corresponding product is 0, because when $i = 0$ the factor is 0. Let also $Q_{m,n,d}$ be the number of ways of inserting $n$ elements into a table with $m$ buckets of size $b$ so that a given (say, the last) bucket of the table contains more than $d$ empty slots. The bucket size $b$ remains fixed, so we do not include it as a subscript in the sequel. There cannot be more empty slots than the size of the bucket, so $Q_{m,n,b} = 0$. For each of the $m^n$ possible arrangements, the last bucket has 0 or more empty slots, and so $Q_{m,n,-1} = m^n$. Observe that $Q_{m,n,0}$ gives the number of ways of inserting $n$ elements into a table with $m$ buckets so that the last bucket is not full. For notational convenience, we define $Q_{0,n,d} = [n = 0]$ (following the notation presented in [23], we use $[S]$ to represent 1 if $S$ is true, and 0 otherwise). We also define $\Lambda(z, w)$, the generating function for the number of ways to construct hash tables whose last bucket is not full. After a somewhat tedious calculation, Lemma 3.2 (page 597) in [4] states identity (16). Identity (16) could be derived directly by interpreting a general hash table with more than 0 empty locations in its last bucket as a non-empty sequence of almost full tables, all of them with more than 0 empty locations in their last bucket. The generating function $\Lambda_0(z, w)$ then follows by standard combinatorial techniques as presented in [16]. This combinatorial interpretation allows a natural generalization of these results to find $\Lambda_d(z, w)$ for all $0 \le d \le b - 1$: a general hash table with more than $d$ empty slots in its last bucket is a sequence of almost full tables with more than 0 empty locations, followed by an almost full table with more than $d$ empty slots. As a consequence, we directly deduce:
Lemma 2.
We may find explicit expressions for these generating functions with the use of the Tuba Numbers.
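It does not seem possible to find a closed formula for $Q_{m,n,d}$. This sequence of numbers is important since $Q_{m,n,d}/m^n$ is the probability that a given bucket contains more than $d$ empty slots. However, a new approach to the study of the numbers $Q_{m,n,d}$ is presented in [49], where a new sequence of numbers $T_{k,d,b}$ is introduced that satisfies a recurrence resembling that of the Bernoulli numbers. This new sequence may be helpful in solving problems involving recurrences with truncated generating functions. Let
$$Q_{m,d}(u) = \sum_{n \ge 0} Q_{m,n,d} \frac{u^n}{n!};$$
then Theorem 3 translates into the recurrence relation
$$Q_{m,d}(u) = \big\langle e^{u}\, Q_{m-1,d}(u) \big\rangle_{bm-d-1}, \qquad m \ge 1, \qquad (17)$$
where $\langle f(u) \rangle_j$ denotes the truncation of the series $f(u)$ to its monomials of degree at most $j$. The main problem is that we are dealing with a recurrence that involves truncated generating functions.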
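For small parameters the numbers $Q_{m,n,d}$ can simply be counted. The brute-force Python sketch below (ours, exponential in $n$ and only meant for checking) inserts every one of the $m^n$ hash sequences with circular linear probing and counts the empty slots in the last bucket. It also counts almost full tables through the observation, which follows from the definitions, that $F_{bn+d} = Q_{n+1,\,bn+d,\,b-d-1}$: a table of length $n+1$ and size $bn+d$ has exactly $b-d$ empty slots, and they all lie in the last bucket exactly when that bucket has more than $b-d-1$ empty slots.

```python
from itertools import product


def last_bucket_load(hashes, m: int, b: int) -> int:
    """Circular linear probing into m buckets of size b; return how many keys end
    up in the last bucket (index m-1). Final loads do not depend on which key
    wins a collision, so plain first-come-first-served insertion is used."""
    load = [0] * m
    for home in hashes:
        pos = home
        while load[pos] == b:            # full bucket: try the next one, wrapping around
            pos = (pos + 1) % m
        load[pos] += 1
    return load[m - 1]


def Q(m: int, n: int, d: int, b: int) -> int:
    """Number of the m^n hash sequences whose last bucket has more than d empty slots."""
    if m == 0:
        return int(n == 0)               # convention Q_{0,n,d} = [n = 0]
    if n > b * m:
        raise ValueError("cannot insert more than bm keys")
    return sum(1 for hs in product(range(m), repeat=n)
               if b - last_bucket_load(hs, m, b) > d)


def F(n: int, d: int, b: int) -> int:
    """Almost full tables of length n+1 and size bn+d: all b-d empty slots lie in
    the last bucket, i.e. the last bucket has more than b-d-1 empty slots."""
    return Q(n + 1, b * n + d, b - d - 1, b)


print([Q(2, n, 0, 2) for n in range(4)])   # [1, 2, 3, 4]
print([F(n, 0, 1) for n in range(1, 5)])   # [1, 3, 16, 125]
```

For $b = 1$ the counts agree with $Q_{m,n,0} = (m-n)m^{n-1}$ (by circular symmetry each location is empty with probability $(m-n)/m$), and $F_n = (n+1)^{n-1}$.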
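The strategy presented in [49] consists in finding an exponential generating function $T_d(u) = \sum_{k \ge 0} T_{k,d,b}\, u^k/k!$, for some coefficients $T_{k,d,b}$ to be determined and independent of $m$, such that $Q_{m,d}(u)$ is obtained by truncating $T_d(u)\, e^{mu}$ to its monomials of degree at most $bm - d - 1$; this is equation (18). Here $b$ is an implicit parameter, and we write $T_{k,d}$.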
The intuition behind this idea is as follows. From (17), we obtain $Q_{m,d}(u)$ by multiplying the truncated generating function $Q_{m-1,d}(u)$ by the series $e^u$ and then keeping only its first $bm - d$ terms (monomials of degree $0, 1, \ldots, bm - d - 1$). Moreover, $Q_{0,d}(u)$ is the first term of $e^u$. Without any truncations, $Q_{m,d}(u)$ would be $e^{mu}$. However, we have to consider a correcting factor originating from these truncations, and this is the reason for defining the generating function $T_d(u)$. Then (18) gives a non-recursive definition of $Q_{m,d}(u)$ that involves the truncated product of two series. The interesting aspect of this approach is that $T_d(u)$ does not depend on $m$. Furthermore, the only dependency on $m$ is captured in the well-known series of $e^{mu}$.
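The truncated recurrence (17) and the extraction of the Tuba Numbers can be illustrated with exact integer arithmetic. The Python sketch below (our illustration; the function names are hypothetical) works with $Q_{m,n,d} = n!\,[u^n]\,Q_{m,d}(u)$: multiplying by $e^u$ becomes a binomial convolution, and truncation keeps degrees $0, \ldots, bm - d - 1$. It then solves $Q_{m,n,d} = \sum_k \binom{n}{k} T_{k,d}\, m^{n-k}$, which is $n!\,[u^n]\,T_d(u)e^{mu}$ for untruncated coefficients, for the $T_{k,d}$, and compares two different values of $m$.

```python
from math import comb


def q_rows(m_max: int, d: int, b: int) -> list[list[int]]:
    """rows[m][n] = Q_{m,n,d} = n! [u^n] Q_{m,d}(u), from the truncated recurrence:
    multiply the previous EGF by e^u and keep degrees 0 .. bm-d-1."""
    rows = [[1]]                                   # Q_{0,d}(u) = 1
    for m in range(1, m_max + 1):
        prev = rows[-1]
        row = []
        for n in range(max(b * m - d, 0)):
            # product of EGFs with e^u = binomial convolution of coefficient sequences
            row.append(sum(comb(n, k) * prev[n - k]
                           for k in range(n + 1) if n - k < len(prev)))
        rows.append(row)
    return rows


def tuba_numbers(k_max: int, d: int, b: int, m: int) -> list[int]:
    """Recover T_{0..k_max, d} from Q_{m,n,d} = sum_k C(n,k) T_{k,d} m^(n-k),
    which holds for n <= bm - d - 1 (no truncation yet)."""
    if k_max > b * m - d - 1:
        raise ValueError("m too small: coefficient k_max is already truncated")
    q = q_rows(m, d, b)[m]
    t: list[int] = []
    for n in range(k_max + 1):
        t.append(q[n] - sum(comb(n, k) * t[k] * m ** (n - k) for k in range(n)))
    return t


print(q_rows(3, 1, 2)[3])                                 # [1, 2, 4, 7, 11]
print(tuba_numbers(5, 0, 2, 3) == tuba_numbers(5, 0, 2, 4))  # True
```

For $b = 1$ and $d = 0$ this gives $T_0(u) = 1 - u$, matching $T_0(\alpha) = 1 - \alpha$; for $b = 2$, $d = 0$ the first coefficients come out as $1, 0, -1, 2, -8, 48$ whether $m = 3$ or $m = 4$ is used.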
The Tuba Numbers $T_{k,d}$ satisfy some nice properties. For example, the next two theorems are proved in [47]. The following can indeed be used as a definition.
These numbers satisfy a very curious property which, in terms of generating functions, translates into equation (21). Equation (21) is very useful to simplify several expressions in the analysis, and in this paper we generalize it. One of the key observations is that $T_d(b\alpha)$ is the limit of the Poisson transform of $Q_{m,n,d}/m^n$ (the probability that a given bucket contains more than $d$ empty slots) when $m, n \to \infty$, $n = b\alpha m$ and $\alpha < 1$. As a consequence, $T_d(b\alpha)$ is the probability that a random bucket has more than $d$ empty locations when $m, n \to \infty$, $n = b\alpha m$ and $\alpha < 1$. The rate of convergence to this limit value is exponentially small.
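Theorem 6. Let $\Upsilon_{m,d}(b\alpha) = \mathcal{P}_m[Q_{m,n,d}/m^n;\, b\alpha]$, which is the probability, in the Poisson model, that a given bucket contains more than $d$ empty slots when $bm\alpha$ elements are inserted into a hash table with $m$ buckets of size $b$, using linear probing as the collision resolution scheme. Then, when $\alpha < 1$, $\lim_{m \to \infty} \Upsilon_{m,d}(b\alpha) = T_d(b\alpha)$.
Proof: We first have to notice how the truncated series behaves when $\alpha < 1$. Since the last sum converges when $\alpha < 1$, the limit follows. Theorem 6 generalizes Theorem 3.1 in [21], since the latter only considers the case $b = 1$ and $d = 0$. The identity $\Upsilon_{m,d}(b\alpha) = e^{-mb\alpha} \big\langle T_d(b\alpha)\, e^{mb\alpha} \big\rangle_{bm-d-1}$, where the angle brackets denote truncation of the series in powers of $b\alpha$ to degree at most $bm - d - 1$, leads to the following lemma, which will be used in the proof of Theorem 9.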
The rest of this section is devoted to finding an explicit expression for $T_d(b\alpha)$. The generating function $T_0(b\alpha)$ has already been studied in [4], where it is expressed in terms of $T$, the tree function, and $r$, a $b$-th root of unity.
This result can be generalized to $T_d(b\alpha)$ for all $0 \le d \le b - 1$. We first need to prove an important lemma.
Proof: The decomposition of a general table as a sequence of almost full tables leads to identity (22). The $n$ elements inserted can be divided into the $bj + s$ elements that go into the last cluster, of length $j + 1$, and the $n - bj - s$ elements that go into the rest of the table, of length $m - 1 - j$. Since the last cluster should have more than $d$ empty slots, $s$ should be less than $b - d$. We then have to multiply by the number of ways to construct these partial tables ($F_{bj+s}$ and $Q_{m-1-j,\,n-bj-s,\,0}$), and sum over all possible values of $j$ and $s$. If we then divide by $m^n$ and apply the Poisson transform to both sides of (22), we obtain the statement of Lemma 4. Lemma 4 relates $T_0(b\alpha)$ with $T_d(b\alpha)$. Notice that since, by equation (15), $N_0(b\alpha, e^{-\alpha}) = 1$, Lemma 4 is also valid for $d = 0$. A generalization of equation (15) to $N_d(b\alpha, e^{-b\alpha})$ will then lead to explicit expressions for $T_d(b\alpha)$. We use again elementary symmetric functions.
Proof: By equation (13), and as a consequence of it, we obtain the stated identity. Lemmas 4 and 5 lead to the main theorem of this section:
Theorem 8.
where T is the Tree function and r is a b-th root of unity.
The following Corollary will be very useful in the analysis of the parking problem with buckets.

The parking problem with buckets
Notation. Given a bivariate function $G(z, \alpha)$, we define:
We first state the main theorem of this section.
Theorem 9. Let $w_{m,b\alpha,k}$ be the probability of having $k$ cars overflowing in a $b\alpha$-full table with $m$ buckets of size $b$ and $\alpha < 1$, and let $\Omega_m(b\alpha, z) = \sum_{k \ge 0} w_{m,b\alpha,k}\, z^k$. Then:
Remark 1. When $b = 1$, in the terminology of [39], $w_{m,\alpha,k}$ is the probability of having a defective parking function of defect $k$.
Proof of Theorem 9: The proof closely follows the ideas presented in [21]. We first derive a recurrence for $\Omega_m(b\alpha, z)$, and then solve it. Since we have a Poisson process, with probability $e^{-b\alpha}(b\alpha)^k/k!$ the last bucket receives, in addition to the elements that overflow from the previous bucket, $k$ elements that hash to it. From these elements, all but $b$ go to overflow, and their contribution to the recurrence is
$$z^{-b}\, e^{b\alpha(z-1)}\, \Omega_{m-1}(b\alpha, z). \qquad (24)$$
However, when the last bucket receives fewer than $b$ elements in total, there is no overflow, and so we need a correction term. This correction term is
$$\sum_{s=1}^{b} \left(1 - z^{-s}\right) P_{m,s}(b\alpha), \qquad (25)$$
where $P_{m,s}(b\alpha)$ is the probability (in the Poisson model) of having $b - s$ elements in the last bucket. Notice that $\Upsilon_{m,b}(b\alpha) = 0$, since $Q_{m,n,b} = 0$ because there cannot be more than $b$ empty locations in a bucket. As a consequence, from equations (24) and (25) we obtain recurrence (26). If we can establish that the sequence $\Omega_m(b\alpha, z)$ converges to $\Omega(b\alpha, z)$ when $m \to \infty$, then we finally get equation (27). The result would then follow by equation (27) and Corollary 1.
The rest of the proof is devoted to proving the existence of such a limit. Let $M(b\alpha, z) = e^{b\alpha(z-1)}$. Then, by equation (26), we obtain equation (28). As a consequence, when $\alpha < 1$, the convergence of $\Omega_m(b\alpha, z)$ to $\Omega(b\alpha, z)$ is established by equation (28) and Theorem 6. It remains to prove that, for $n = 0, \ldots, bm - 1$ ($\alpha < 1$), $[\alpha^n]\,\Omega(b\alpha, z) = [\alpha^n]\,\Omega_m(b\alpha, z)$. We compute $[\alpha^n]\,\Omega(b\alpha, z)$ by equation (27) and Lemma 3, collecting all the contributions in the intersection of the ranges of $p$ obtained for $i = 1, \ldots, m$; on the other hand, it is not difficult to compute $[\alpha^n]\,\Omega_m(b\alpha, z)$ directly and to compare. We have then proved that when $n \le bm - 1$ ($\alpha < 1$), $[\alpha^n]\,\Omega(b\alpha, z) = [\alpha^n]\,\Omega_m(b\alpha, z)$. We remark that when $n \ge bm$, this derivation only guarantees equality for the powers of $z$ that are greater than zero ($n - bm + 1 > 0$), and so $[\alpha^n]\,\Omega(b\alpha, z)$ and $[\alpha^n]\,\Omega_m(b\alpha, z)$ may differ.
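The recurrence behind the proof can also be iterated numerically. The Python sketch below (ours; the truncation parameters `cap` and `kmax` are assumptions) propagates the distribution of $X_i = \max(0, X_{i-1} + K_i - b)$, with $K_i$ Poisson of parameter $b\alpha$, across $m$ buckets. For $b = 1$ and large $m$, the probability that no car overflows should approach $w_{\alpha,0} = e^{\alpha}(1 - \alpha)$; the mean should approach $\alpha^2/(2(1 - \alpha))$, a value we derived ourselves from the generating function of this recursion rather than one taken from the paper.

```python
import math


def overflow_distribution(alpha: float, b: int, m: int,
                          cap: int = 100, kmax: int = 40) -> list[float]:
    """Distribution of the number of cars overflowing from the last of m buckets
    (Poisson model). Bucket i receives K_i ~ Poisson(b*alpha) cars of its own plus
    the X_{i-1} that overflow from bucket i-1, keeps b of them, and passes on
    X_i = max(0, X_{i-1} + K_i - b). Assumes Poisson mass above kmax and overflow
    above cap are negligible (overflow counts above cap are lumped into cap)."""
    lam = b * alpha
    pois = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(kmax + 1)]
    dist = [1.0] + [0.0] * cap               # nothing overflows into the first bucket
    for _ in range(m):
        new = [0.0] * (cap + 1)
        for x, px in enumerate(dist):
            if px == 0.0:
                continue
            for k, pk in enumerate(pois):
                new[min(max(0, x + k - b), cap)] += px * pk
        dist = new
    return dist


alpha = 0.5
dist = overflow_distribution(alpha, b=1, m=200)
print(dist[0], (1 - alpha) * math.exp(alpha))   # both close to 0.8244
```

The same function runs for $b > 1$, where $P(X_m = 0)$ is the Poisson-model counterpart of $e^{b\alpha}T_{b-1}(b\alpha)$; only the $b = 1$ case is compared with a closed form here.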
It is very important to notice that Theorem 9 allows the use of Theorem 1 with $\Omega(b\alpha, z)$ to obtain $w_{m,n,k}$ in the exact filling model when $n \le bm - 1$, since it only needs the coefficients of powers of $b\alpha$ of degree at most $n$. In any case, when $n \ge bm$, the range of potentially nonzero coefficients of powers of $z$ is included in $n - bm + 1, \ldots, n - b + 1$, so, at least formally, the use of Theorem 1 would return the exact probabilities in the exact filling model also for $n \ge bm$.
When $b = 1$ we rederive the result presented in [48]. Moreover, some of the results presented in [39] could be derived from the results in [48,21] by depoissonization. The probabilities can be extracted from equation (29) by expanding powers of $z$, and then the results in the exact filling model follow from depoissonization, after expanding those probabilities in a power series in $\alpha$. An alternative approach would be to depoissonize the distribution (29) and then extract the coefficients. For general $b$, however, the depoissonization is rather complicated by the expressions related to the Tuba Numbers. As a consequence, we present the results in the Poisson model, and translate them into the exact model when the final expressions are relatively simple.
The first b probabilities are given in the following theorem: Theorem 10.
More specifically, the probability that no car overflows is $w_{b\alpha,0} = e^{b\alpha}\, T_{b-1}(b\alpha)$.
When $b = 1$, $T_0(\alpha) = 1 - \alpha$, and we rederive the known result $w_{\alpha,0} = e^{\alpha}(1 - \alpha)$ presented in [34]. More generally, we may expand $A(z, b\alpha)$ to obtain expansion (30). Notice that in Theorem 10 we use expansion (30) only for $n < b$, and so $j = 0$. We may now find the whole set of probabilities by using expansion (30), to obtain:
Theorem 11. For all $k \ge 0$ we have:
It does not seem possible to find a simple way to depoissonize the coefficients $w_{m,n,k}$. From equation (23) we may also find the expected number of cars that overflow from a random bucket.
Theorem 12. Let $\Omega_{m,b\alpha}$ be the random variable for the number of cars that overflow from a $b\alpha$-full table with $m$ buckets of size $b$ and $\alpha < 1$. Then:
Proof: The result follows by taking derivatives with respect to $z$ in equation (23). We may expand the generating functions of the quasi-inverses of the tree functions, and then use the depoissonization theorem. As a consequence we obtain:
Corollary 2. Let $\Omega_{b,m,n}$ be the random variable for the number of cars that overflow from a hash table of length $m$ and size $n$ with buckets of size $b$. Then:
It is important to notice that these results have already been presented in [49]. For $b = 1$, the expression can be simplified using an identity involving $Q_0(m, n) = \sum_{i \ge 0} n^{\underline{i}}/m^i$, the Ramanujan Q function.

Analysis of Robin Hood Linear Probing
In this section we find the distribution of the cost of a successful search for a random element in a hash table with $m \to \infty$ buckets that contains $n = b\alpha m$ elements, with $0 \le \alpha < 1$.
Let $\Psi_m(b\alpha, z)$ be the probability generating function for the cost of a successful search for a random element in a $b\alpha$-full table with $m$ buckets of size $b$ and $\alpha < 1$.
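As mentioned in Section 5, the cost of retrieving a random element is made up of all the elements that hash to the same location (collisions), plus the number of elements that overflow from the previous location. We first derive the generating function $C_m(b\alpha, z)$ for the total displacement, that is, the generating function for the total number of comparisons, without considering the fact that we have to count only the number of buckets probed. Then, if $C_m(b\alpha, z) = \sum_{i \ge 0} c_{m,i}(b\alpha) z^i$, the probability generating function $\Psi_m(b\alpha, z)$ for the cost of a successful search is obtained by replacing each $z^i$ with $z^{1 + \lfloor i/b \rfloor}$ (an element preceded by $i$ others in its probe sequence lies in the $(1 + \lfloor i/b \rfloor)$-th bucket probed), which can be expressed through the values $C_m(b\alpha, r^d z^{1/b})$, $0 \le d < b$, where $r = e^{2\pi i/b}$ is a $b$-th root of unity. Since the calculations are very involved, we extract exact coefficients when possible, and then use the Poisson transform to find the results in the exact model. A similar approach is used to find higher moments. The result follows after substituting all these identities in equation (41).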
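Counting buckets instead of comparisons amounts to replacing each $z^i$ in $C_m(b\alpha, z)$ by $z^{1+\lfloor i/b \rfloor}$, and a roots-of-unity filter, which evaluates $C$ at the points $r^d z^{1/b}$, performs this replacement on the generating function itself. The Python sketch below (ours; the coefficients are made-up test values, not results from the paper) checks numerically that the filtered expression agrees with the direct substitution for a polynomial $C$.

```python
import cmath


def bucket_cost_direct(c: list[float], z: float, b: int) -> float:
    """z * sum_i c_i z^(i // b): turn a count of comparisons i into buckets probed."""
    return z * sum(ci * z ** (i // b) for i, ci in enumerate(c))


def bucket_cost_filter(c: list[float], z: float, b: int) -> complex:
    """The same quantity from C(w) = sum_i c_i w^i via a roots-of-unity filter:
    (z/b) * sum_d C(r^d w) * sum_p (r^d w)^(-p), with w = z^(1/b), r = exp(2*pi*i/b)."""
    def C(w: complex) -> complex:
        return sum(ci * w ** i for i, ci in enumerate(c))

    r = cmath.exp(2j * cmath.pi / b)
    w0 = z ** (1.0 / b)                     # real b-th root; assumes z > 0
    total = 0j
    for d in range(b):
        w = r ** d * w0
        total += C(w) * sum(w ** (-p) for p in range(b))
    return z * total / b


coeffs = [0.1, 0.2, 0.3, 0.15, 0.25]        # hypothetical c_{m,i} values
print(bucket_cost_direct(coeffs, 0.7, 3))
print(bucket_cost_filter(coeffs, 0.7, 3))   # same value, up to a tiny imaginary part
```

The filter works because averaging $(r^d)^{i-p}$ over $d$ keeps only the $p \equiv i \pmod b$ term, which turns $w^{i}$ into $w^{i - (i \bmod b)} = z^{\lfloor i/b \rfloor}$.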
From Theorem 13 we may derive exact expressions for the factorial moments. We present here the main results in the Poisson model, which generalize the results presented in [48], omitting the details of the derivations, since they are mechanical manipulations of the generating functions and their derivatives. The corresponding expressions in the exact model can be obtained with the use of the Poisson transform, although they are extremely complicated.
We first obtain exact expressions for all the factorial moments of $\Psi_{m,b\alpha}$. Given the probability generating function $\Psi_m(b\alpha, z)$, the factorial moment $E[\Psi_{m,b\alpha}^{\underline{r}}]$ can be obtained by differentiating this generating function $r$ times and setting $z = 1$.
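A tiny Python sketch (ours, not from the paper) of the rule that the $r$-th factorial moment is the $r$-th derivative of the probability generating function at $z = 1$, applied to a polynomial PGF with exact rational coefficients. A binomial distribution, whose factorial moments $n^{\underline{r}}p^r$ are known, serves as a check, and the variance follows from the first two factorial moments.

```python
from fractions import Fraction
from math import comb


def factorial_moment(pgf_coeffs, r: int) -> Fraction:
    """E[X(X-1)...(X-r+1)] for a PGF sum_k p_k z^k: its r-th derivative at z = 1."""
    total = Fraction(0)
    for k, pk in enumerate(pgf_coeffs):
        falling = 1
        for j in range(r):
            falling *= k - j            # d^r/dz^r z^k at z = 1 is k(k-1)...(k-r+1)
        total += pk * falling
    return total


def variance(pgf_coeffs) -> Fraction:
    """Var[X] = E[X(X-1)] + E[X] - E[X]^2."""
    m1 = factorial_moment(pgf_coeffs, 1)
    return factorial_moment(pgf_coeffs, 2) + m1 - m1 ** 2


def binomial_pgf(n: int, p: Fraction) -> list[Fraction]:
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]


pgf = binomial_pgf(6, Fraction(1, 3))
print(factorial_moment(pgf, 2), variance(pgf))   # 10/3 4/3
```

For the search cost, the same rule applies to $\Psi_m(b\alpha, z)$ as a power series in $z$ rather than a finite polynomial.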
Theorem 15. Let $\Psi_{m,b\alpha}$ be the random variable for the cost of searching a random element in a $b\alpha$-full table with $m$ buckets of size $b$ and $\alpha < 1$, using the Robin Hood linear probing hashing algorithm, and let $\Psi_m(b\alpha, z)$ be its probability generating function. Then its factorial moments are given by equation (42). It is important to notice that the main asymptotic contribution when $\alpha \to 1$ is given by the first sum of equation (42), while the last two sums vanish for moments greater than 2.

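$$\sum_{s=1}^{b} \left(1 - z^{-s}\right) P_{m,s}(b\alpha),$$
where $P_{m,s}(b\alpha)$ is the probability (in the Poisson model) of having $b - s$ elements in the last bucket. As we have seen in Section 7, this value is equal to $\Upsilon_{m,s-1}(b\alpha) - \Upsilon_{m,s}(b\alpha)$, and so the contribution of this correction term is $\sum_{s=1}^{b} \left(1 - z^{-s}\right)\left(\Upsilon_{m,s-1}(b\alpha) - \Upsilon_{m,s}(b\alpha)\right)$.
$(-1)^r\, r!$, we rederive the known result presented in [21].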

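With $C_m(b\alpha, z) = \sum_{i \ge 0} c_{m,i}(b\alpha) z^i$, the probability generating function for the cost of a successful search is
$$\Psi_m(b\alpha, z) = z \sum_{i \ge 0} c_{m,i}(b\alpha)\, z^{\lfloor i/b \rfloor} = \frac{z}{b} \sum_{d=0}^{b-1} C_m\!\left(b\alpha, r^d z^{1/b}\right) \sum_{p=0}^{b-1} \left(r^d z^{1/b}\right)^{-p},$$
with $r = e^{2\pi i/b}$.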

Combinatorial characterization of Linear Probing Hashing
From a combinatorial point of view, linear probing can be seen as a sequence of almost full tables [15].