The distribution of m-ary search trees generated by van der Corput sequences

We study the structure of m-ary search trees generated by the van der Corput sequences. The height of the tree is calculated and a generating function approach shows that the distribution of the depths of the nodes is asymptotically normal. Additionally, a local limit theorem is derived.


Introduction
In recent years, the height of binary search trees generated by sequences which are uniformly distributed modulo 1 has been studied. Devroye [3] has shown for the Weyl sequences $\{n\alpha\}$ that the height of the tree with its first $N$ elements satisfies $H(N) \sim \frac{12}{\pi^2} \log N \log\log N$ for almost all $\alpha \in (0, 1)$. The minimal height is attained for the golden mean $\alpha = (\sqrt{5} + 1)/2$, where $H(N) \sim \log N / \log \alpha$, and the maximal height is almost as large as the theoretical maximum for binary search trees. More precisely, for every sequence $(c_N)_{N \ge 1}$ which decreases monotonically from 1 to 0, there is some $\alpha$ such that $H(N) \ge c_N N$ infinitely often (Devroye and Goudjil [5]).
For general uniformly distributed sequences modulo 1, Dekking and van der Wal [1] have shown that, for every $c \ge 1/\log 2$, there are sequences with $H(N) \sim c \log N$.

† This research was supported by the Austrian Science Foundation FWF, grant S8302-MAT.

1365-8050 © 2004 Discrete Mathematics and Theoretical Computer Science (DMTCS), Nancy, France
Devroye and Neininger [6] studied random suffix search trees, which are binary search trees generated by the suffixes $S_n = 0.B_n B_{n+1} B_{n+2} \ldots$ of independent identically distributed random $q$-ary digits $B_1, B_2, \ldots$ for some $q \ge 2$. For these trees, the expected value of the depth of $S_N$ is $2 \log N$ up to lower-order terms. Note that the suffixes are uniformly distributed modulo 1 with probability 1.
For random binary search trees of size $N$, it was shown by several authors that the expected value of the depth of a node is again $2 \log N + O(1)$, and we know from Mahmoud and Pittel [9] and Devroye [2] that the distribution of the depths is asymptotically normal with variance $2 \log N$.
A natural generalization of binary search trees are m-ary search trees, which are constructed by placing the first $m - 1$ keys in the root, sorted in increasing order from left to right, then guiding a subsequent key to the $\ell$th subtree of the root, $1 \le \ell \le m$, if that key is greater than exactly $\ell - 1$ of the root keys. In the $\ell$th subtree, the newcomer is subjected recursively to the same procedure until a node with fewer than $m - 1$ keys is found.
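The insertion procedure just described can be sketched as follows. This is a minimal illustration, not the authors' code: the names `Node` and `insert` are hypothetical, keys are assumed to be distinct, and the root is assumed to have depth 0.

```python
from bisect import bisect_left, insort


class Node:
    """A node of an m-ary search tree: up to m - 1 sorted keys, m subtrees."""

    def __init__(self, m):
        self.keys = []                # kept sorted; full when len == m - 1
        self.children = [None] * m    # children[l] receives keys with exactly l smaller node keys


def insert(root, key, m):
    """Insert `key` and return the depth of the node that stores it.

    Assumption: keys are distinct and the root has depth 0.
    """
    node, depth = root, 0
    while len(node.keys) == m - 1:            # node is full: descend
        l = bisect_left(node.keys, key)       # number of node keys smaller than key
        if node.children[l] is None:
            node.children[l] = Node(m)
        node = node.children[l]
        depth += 1
    insort(node.keys, key)                    # first node with fewer than m - 1 keys
    return depth


if __name__ == "__main__":
    m = 3
    root = Node(m)
    print([insert(root, k, m) for k in [5, 2, 8, 1, 9, 3, 10]])
```

With zero-based indexing, the subtree index equals the number of node keys smaller than the newcomer, which is what `bisect_left` returns on a sorted list of distinct keys.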
Mahmoud and Pittel [10] showed that the distribution of the depths in random m-ary search trees is asymptotically normal, with mean value of order $\log N$. This and other limit laws for various kinds of trees can also be found in Devroye [4].
In this article, we consider m-ary search trees generated by particular uniformly distributed sequences modulo 1, the van der Corput sequences $(\varphi_q(n))_{n \ge 1}$, where we omit $n = 0$ for convenience. Let $n = \sum_{j \ge 0} \varepsilon_j(n) q^j$ be the (unique) $q$-ary digital expansion with digits $\varepsilon_j(n) \in \{0, 1, \ldots, q - 1\}$ for some integer $q \ge 2$. Then the van der Corput sequence to the base $q$ is defined by the radical-inverse function
$$\varphi_q(n) = \sum_{j \ge 0} \varepsilon_j(n) q^{-j-1}.$$
Let $d(n)$ denote the depth of the node containing the $n$th element of the sequence. Besides the height $H(N) = \max_{n \le N} d(n)$, we will study the distribution of $d(n)$. To that end, we define a sequence of (discrete) random variables $X_N$ by
$$\Pr[X_N = k] = \frac{1}{N} \#\{1 \le n \le N : d(n) = k\},$$
i.e., $X_N$ is the depth of a key randomly chosen among the first $N$ keys inserted into the tree.
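For small parameters, the quantities $d(n)$, $H(N)$ and the distribution of $X_N$ can be computed by brute force. The sketch below assumes the usual radical-inverse definition $\varphi_q(n) = \sum_j \varepsilon_j(n) q^{-j-1}$ and the convention that the root has depth 0; the function names `phi`, `depths` and `depth_distribution` are hypothetical and chosen only for this illustration.

```python
from bisect import bisect_left, insort
from collections import Counter
from fractions import Fraction


def phi(n, q):
    """Radical inverse of n in base q, as an exact fraction."""
    x, scale = Fraction(0), Fraction(1, q)
    while n > 0:
        n, digit = divmod(n, q)   # digit = eps_j(n), least significant first
        x += digit * scale        # contributes eps_j(n) * q^(-j-1)
        scale /= q
    return x


def depths(N, q, m):
    """List [d(1), ..., d(N)] for the m-ary search tree built from phi(1..N).

    Assumption: the root has depth 0.
    """
    root = ([], {})               # node = (sorted keys, children by subtree index)
    result = []
    for n in range(1, N + 1):
        key = phi(n, q)
        node, d = root, 0
        while len(node[0]) == m - 1:              # full node: descend
            l = bisect_left(node[0], key)
            node = node[1].setdefault(l, ([], {}))
            d += 1
        insort(node[0], key)
        result.append(d)
    return result


def depth_distribution(N, q, m):
    """Exact probabilities P[X_N = k] as a dict {k: Fraction}."""
    counts = Counter(depths(N, q, m))
    return {k: Fraction(c, N) for k, c in sorted(counts.items())}


if __name__ == "__main__":
    N, q, m = 7, 2, 2
    print("height:", max(depths(N, q, m)))
    print("distribution:", depth_distribution(N, q, m))
```

Exact fractions avoid floating-point ties between keys. For $q = m = 2$ the resulting binary search tree is perfectly balanced, so $d(n) = \lfloor \log_2 n \rfloor$ under the depth convention assumed here.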

Results
Throughout the paper, let $M = \lfloor \log_q m \rfloor$ be the integer part of the logarithm of $m$ to the base $q$.

Theorem 1 The height of the tree is given by
where $h_{q,M}(x)$ is determined by the sequence
Let $J$, $p$ be the lengths of the preperiod and the period of $\mu_j$, i.e., $\mu_{j+p} = \mu_j$ for all $j > J$. Then we have
The functions $h_{q,M} : [1, q) \to [0, 1)$ are monotonically increasing.

Theorem 2 Expected value and variance of X N are given by
with constants $\mu$ and $\sigma$ given by (16) and (18). For $m = q^M$, $m = 2$ (binary search trees) and $q = 2$ (the binary van der Corput sequence), we have simple formulae for $\mu$ and $\sigma$:
For $m \ne q^M$, we have $\mu \in \bigl(\frac{1}{M+1}, \frac{1}{M}\bigr)$ and $\sigma^2 > 0$. The main result concerns the distribution properties of $X_N$. We prove asymptotic normality in the weak sense and provide a local limit law.

Theorem 3
If $m \ne q^M$, then we have, for every $\delta > 0$,
uniformly for all real $x$ as $N \to \infty$ and
uniformly for all nonnegative integers $k$ as $N \to \infty$.
The (easy) case $m = q^M$ is treated in Section 3. The crucial part for all other cases is contained in Section 4, where the structure of the tree is analyzed and its generating function is calculated. Section 5 is devoted to the height of the tree, i.e., Theorem 1. Formulae for mean value and variance are derived in Section 6. The two parts of Theorem 3 are proved in Sections 7 and 8. These proofs are adapted from Drmota and Gajdosik [7]. Finally, the values of $\mu$ and $\sigma$ for binary search trees and the binary van der Corput sequence are calculated in Section 9.
and the results for $m = q^M$ are proved.

Generating function
Define the bivariate generating function of the tree by i.e., z counts the depth of the elements and u the time of their insertion in the tree.
for some polynomials $Q(z, u)$ and $P(z, u)$ determined by (13) and
Proof. For $m = q^M$, the considerations of the previous section give
The leftmost subtree therefore contains all keys $\varphi_q(n) < .0^M 1$, i.e., those with prefix $0^{M+1}$. As in the case $m = q^M$, this subtree has the same shape as the whole tree, and its generating function is $zu^{M+1} B(z, u)$.
If $m - 1 \ge 2q^M$ and $q > 2$, then the second smallest key in the root is $\varphi_q(2q^M) = .0^M 2$. In this case, the second subtree contains all keys with prefix $0^M 1$ (except the key $.0^M 1$) and its generating function is again $zu^{M+1} B(z, u)$.
The other possibility for the second smallest key ($m > 2$) is $\varphi_q(q^{M-1}) = .0^{M-1} 1$. Then the second subtree contains all keys with prefix $0^M$ satisfying $\varphi_q(n) > .0^M 1$. For $q = 2$, this is the same tree as in the previous case and its generating function is $zu^{M+1} B(z, u)$. For $q > 2$, this means (if we omit the prefix $0^M$ since it does not change the structure) that we start with $n = 2$ and consider just those $n$ with $\varepsilon_0(n) \ge 1$. Call this tree $T_1$. In general, let $T_i$, $0 \le i < q - 1$, be the tree generated by the van der Corput sequence starting with $n = i + 1$ and omitting the $n$'s with digit $\varepsilon_0(n) < i$, i.e., just take the keys $\varphi_q(n) > .i = i/q$. Denote its generating function by $B_i(z, u)$. Then the second subtree contributes $zu^M B_1(z, u)$ to $B(z, u)$. Furthermore, note that $T_0$ is the whole tree and $B_0(z, u) = B(z, u)$.
The other subtrees have a similar structure. If $n + q^M \le m - 1$ or $\varepsilon_M(n) = q - 1$, then the contribution to the generating function is $zu^{M+1} B(z, u)$. In the other cases, the tree is of type $T_{\varepsilon_M(n)}$ and the contribution is $zu^M B_{\varepsilon_M(n)}(z, u)$.
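The description of the first two root subtrees can be checked numerically for a small case. The sketch below takes $q = 3$, $m = 5$ (so $M = 1$ and $q^M < m < 2q^M + 1$) and assumes the radical-inverse definition $\varphi_q(n) = \sum_j \varepsilon_j(n) q^{-j-1}$; the helper names `phi` and `root_subtree` are hypothetical. A key has prefix $0^{M+1}$ exactly when $q^{M+1}$ divides $n$, and prefix $0^M$ exactly when $q^M$ divides $n$.

```python
from bisect import bisect_left
from fractions import Fraction


def phi(n, q):
    """Radical inverse of n in base q, as an exact fraction."""
    x, scale = Fraction(0), Fraction(1, q)
    while n > 0:
        n, digit = divmod(n, q)
        x += digit * scale
        scale /= q
    return x


def root_subtree(n, q, m):
    """0-based index of the root subtree receiving phi(n, q), for n >= m."""
    root_keys = sorted(phi(i, q) for i in range(1, m))   # the m - 1 root keys
    return bisect_left(root_keys, phi(n, q))


if __name__ == "__main__":
    q, m, M = 3, 5, 1
    later = range(m, 200)                                # keys inserted after the root is full
    leftmost = [n for n in later if root_subtree(n, q, m) == 0]
    second = [n for n in later if root_subtree(n, q, m) == 1]
    # Leftmost subtree: keys with prefix 0^(M+1), i.e. q^(M+1) divides n.
    print(leftmost == [n for n in later if n % q ** (M + 1) == 0])
    # Second subtree: prefix 0^M and phi(n) > .0^M 1, i.e. q^M | n but q^(M+1) does not.
    print(second == [n for n in later if n % q ** M == 0 and n % q ** (M + 1) != 0])
```

Both comparisons hold for this choice of parameters; other values of $q$ and $m$ fall under the remaining cases of the proof and would need the corresponding digit conditions.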
We have thus, for $m < (q - 1)q^M$,
The sequence constituting
and finally for $M_i = 0$,
with $A(u) = (P_{ij}(u))_{0 \le i, j \le q-2}$ and $B(z, u) = B_0(z, u)$ is given by
where $I_{q-1}$ denotes the $(q-1)$-dimensional identity matrix, and the first equation determines $P(z, u)$, $Q(z, u)$. The functions $F(z, u)$, $G(z, u)$ are obtained by recurrently replacing the $B_i(z, u)$'s, $i > 0$, in the equation for $B_0(z, u)$ by their expressions given in (10)-(12),
for all $i > 0$ and some $\rho > 0$. Then, for $(z, u) \in D_\rho$, the coefficients of $B_i(z, u)$ tend to 0 in the above expression of $B_0(z, u)$ and we obtain $F(z, u)$.
For the same reasons, $F(z, u)$ and $G(z, u)$ are analytic in $D_\rho$. The $f_{jk}$ and $g_{jk}$ are nonnegative because the coefficients of $Q_i(u)$ and $P_{ij}(u)$ are positive. ✷

Height
For every $k \ge 0$, we look for the minimal $j$ such that $b_{jk} \ne 0$. Since all $Q_i(u)$ have a constant term, this is, by (13), the minimal exponent of $u$ in the first row of $A(u)^k$. The $\ell$th element of this row is the sum of $P_{0 s_1}(u) P_{s_1 s_2}(u) \cdots P_{s_{k-1} s_k}(u)$ over all sequences $s_1, \ldots, s_k$ with $s_k = \ell$.

Recall from the last section that the $i$th row of $A(u)$ consists of terms with exponent $M_i$ and (in the majority of the cases)
Thus the minimal exponent of $u$ in the first row of $A(u)^k$ can be found by recursively choosing the minimal $s_1$ such that $P_{0 s_1}(u)$ has a term $u^{M_0}$, the minimal $s_2$ such that $P_{s_1 s_2}(u)$ has a term $u^{M_{s_1}}$, and so on.
The minimal $j$ such that $b_{jk} \ne 0$ is therefore
and the height for $N = q^{j+1} - 1$ with this $j$ is
Thus (1) is proved. It is easy to see that, for all $k \ge 0$, $M_{\eta_0} + \cdots + M_{\eta_{k-1}}$ does not decrease if $m/q^M$ increases, and thus the $h_{q,M}$'s are monotonically increasing.

Expected value and variance
For $m = q^M$, expected value and variance have been calculated in Section 3. Thus we can restrict to $m \ne q^M$. The first step is to obtain proper information about $b_j(z)$.

Lemma 2 If we set $L = \lfloor \log_q N \rfloor$ and d(0)
The element $\sum_{s=\ell+1}^{L} \varepsilon_s(N) q^s + c q^\ell + n$ with $q^j \le n < q^{j+1}$ and $j < \ell$ is located in a subtree under the node containing $n$. Its depth is therefore that of $n$ plus some additional depth depending on the shape of the subtree (see the proof of Lemma 1), which can be bounded by $d\bigl(\sum_{s=\ell+1}^{L} \varepsilon_s(N) q^{s-j} + c q^{\ell-j}\bigr) + 2$.
The depth of the remaining $O(L)$ terms can be estimated by the height of the tree, $O(L)$.
✷
Now, we can calculate the mean value
Thus (3) is proved and we have, since $F(z, u(z)) = 1$ for $u(z) = 1/q(z)$,
where $F$ can be replaced by $P$.
For the variance, we have to be more careful. First we distinguish the elements by their place inside the node and the type of the node, in order to obtain
for some analytic functions $G_\theta(z, u)$ (in $D_\rho$). This allows us to refine (15) to
and the variance is
Thus (4) is proved with
The last equation holds because of
for all $j, k$ with $f_{jk} > 0$ and we have some $j, k$ such that $kM < j$ and some $j, k$ such that $j < k(M + 1)$ (see the proof of Proposition 1). Furthermore, for $m \ne q^M$, $j/k$ is not equal for all $j, k$ with $f_{jk} > 0$, which implies $\sigma^2 > 0$.

Global limit law
Now, we prove the asymptotic normality of $X_N$. Observe that its characteristic function is
Proposition 2 Suppose $m \ne q^M$ and set $\mu_N = \mathbf{E}\, X_N$, $\sigma_N^2 = \mathbf{V}\, X_N$. Then for every $\delta > 0$, we have
uniformly for |t| ≤ (log N) 1 and, by using Proposition 1,
in an open (real) neighbourhood of $t = 0$. By Lemma 2, we obtain $\sum_{k \ge 0} a_{Nk} e^{ikt} =$
which implies (19) directly for $|t| \le (\log N)^{\delta/3}$. For $|t| > (\log N)^{\delta/3}$, we have
for some $c > 0$, which again implies (19). ✷
We can now prove the first part of Theorem 3. Set
Then, by Esseen's inequality [8, p. 32], we have
Finally,
and Proposition 3 is proved.