Asymptotic variance of random symmetric digital search trees

Asymptotics of the variances of many cost measures in random digital search trees are often notoriously messy and involved to obtain. A new approach is proposed to facilitate such an analysis for several shape parameters on random symmetric digital search trees. Our approach starts from a more careful normalization at the level of Poisson generating functions, which then provides an asymptotically equivalent approximation to the variance in question. Several new ingredients are also introduced such as a combined use of the Laplace and Mellin transforms and a simple, mechanical technique for justifying the analytic de-Poissonization procedures involved. The methodology we develop can be easily adapted to many other problems with an underlying binomial distribution. In particular, the less expected and somewhat surprising $n(\log n)^2$-variance for certain notions of total path-length is also clarified.


Introduction
The variance of a distribution provides an important measure of its dispersion and plays a crucial and, in many cases, decisive rôle in the limit law. Thus finding more effective means of computing the variance is often of considerable significance in theory and in practice. However, the calculation of the variance can be computationally or intrinsically difficult, either because of the messy procedures or cancellations involved, or because the dependence structure is too strong, or simply because no simple manageable forms or reductions are available. We are concerned in this paper with random digital trees, for which asymptotic approximations to the variance are often marked by heavy calculations and long, messy expressions. This paper proposes a general approach that simplifies not only the analysis but also the resulting expressions, providing new insight into the methodology; furthermore, it is applicable to many other concrete situations and readily leads to several new results, shedding new light on the stochastic behaviors of the random splitting structures.
A binomial splitting process. The analysis of many splitting procedures in computer algorithms leads naturally to a structural decomposition (in terms of the cardinalities) of the form

    structure of size $n$  $\longrightarrow$  substructure of size $B_n$  +  substructure of size $\bar B_n$,

where $B_n$ is essentially a binomial distribution (up to truncation or small perturbations) and the sum $B_n + \bar B_n$ is essentially $n$.
Concrete examples in the literature include (see the books [15,28,44,50,62] and below for more detailed references):
• tries, contention-resolution tree algorithms, the initialization problem in distributed networks, and radix sort: $B_n = \mathrm{Binomial}(n; p)$ and $\bar B_n = n - B_n$, namely, $\mathbb{P}(B_n = k) = \binom{n}{k} p^k q^{n-k}$ (here and throughout this paper, $q := 1 - p$);
• bucket digital search trees (DSTs), directed diffusion-limited aggregation on the Bethe lattice, and the Eden model: $B_n = \mathrm{Binomial}(n - b; p)$ and $\bar B_n = n - b - B_n$;
• Patricia tries and suffix trees: $\mathbb{P}(B_n = k) = \binom{n}{k} p^k q^{n-k}/(1 - p^n - q^n)$ for $0 < k < n$, and $\bar B_n = n - B_n$.
Yet another general form arises in the analysis of the multi-access broadcast channel, where $B_n = \mathrm{Binomial}(n; p) + \mathrm{Poisson}(\lambda)$ and $\bar B_n = n - \mathrm{Binomial}(n; p) + \mathrm{Poisson}(\lambda)$; see [19,33]. For some other variants, see [2,6,25]. One reason for the ubiquity of the binomial distribution is simply the binary outcomes (zero or one, on or off, positive or negative, etc.) of many practical situations, which lead naturally to Bernoulli models.
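To make the three splitting laws concrete, here is a minimal Python sketch of samplers for them (the function names are ours, not from the literature):

```python
import random

def split_trie(n, p):
    """Tries/radix sort: B_n = Binomial(n; p), bar(B_n) = n - B_n."""
    b = sum(random.random() < p for _ in range(n))
    return b, n - b

def split_dst(n, p, b=1):
    """Bucket DSTs (bucket size b): B_n = Binomial(n - b; p), bar(B_n) = n - b - B_n."""
    k = sum(random.random() < p for _ in range(n - b))
    return k, n - b - k

def split_patricia(n, p):
    """Patricia tries: binomial conditioned on a non-degenerate split."""
    while True:
        k = sum(random.random() < p for _ in range(n))
        if 0 < k < n:        # reject the degenerate outcomes (total prob. p^n + q^n)
            return k, n - k

random.seed(1)
left, right = split_trie(100, 0.5)
assert left + right == 100
```

In each case the two parts add up to (essentially) $n$, which is the decomposition displayed above.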

Poisson generating function and the Poisson heuristic.
A very useful, standard tool for the analysis of these binomial splitting processes is the Poisson generating function
$$\tilde f(z) := e^{-z}\sum_{k\ge 0} a_k\,\frac{z^k}{k!},$$
where $\{a_k\}$ is a given sequence. One distinctive feature is the Poisson heuristic, which predicts that

    if $a_n$ is smooth enough, then $a_n \sim \tilde f(n)$.
In more precise words, if the sequence $\{a_k\}$ does not grow too fast (usually at most of polynomial growth) and does not fluctuate too violently, then $a_n$ is well approximated by $\tilde f(n)$ for large $n$. For example, if $\tilde f(z) = z^m$, $m = 0, 1, \dots$, then $a_n \sim n^m$; indeed, in such a simple case, $a_n = n(n-1)\cdots(n-m+1)$.
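As a numerical illustration of the heuristic (our own, not part of the original analysis): for $a_k = k^2$ one has $\tilde f(z) = z^2 + z$ exactly, so $\tilde f(n) = n^2 + n \sim n^2 = a_n$. The Poisson generating function can be evaluated by accumulating Poisson weights iteratively, which avoids overflow:

```python
import math

def poisson_mean(a, z, tol=1e-12):
    """Evaluate f~(z) = e^{-z} * sum_k a(k) z^k / k! by iterating Poisson(z) weights."""
    weight = math.exp(-z)          # P(Poisson(z) = 0); requires z not too large (< ~700)
    total, k = 0.0, 0
    while True:
        total += a(k) * weight
        k += 1
        weight *= z / k            # now equals P(Poisson(z) = k)
        if k > z and weight < tol:
            break
    return total

val = poisson_mean(lambda k: k * k, 50.0)
assert abs(val - (50.0**2 + 50.0)) < 1e-6   # f~(50) = 50^2 + 50 = 2550
```

The same routine shows $\tilde f(n)/n^2 \to 1$, i.e., the heuristic $a_n \sim \tilde f(n)$ for this smooth sequence.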
Note that the Poisson heuristic is in essence a Tauberian theorem for the Borel mean; an Abelian-type theorem can be found in Ramanujan's Notebooks (see [3, p. 58]).
From an elementary viewpoint, such a heuristic is based on the local limit theorem of the Poisson distribution (or essentially Stirling's formula for $n!$): writing $k = n + x\sqrt{n}$,
$$e^{-n}\frac{n^k}{k!} = \frac{e^{-x^2/2}}{\sqrt{2\pi n}}\bigl(1 + o(1)\bigr)$$
whenever $x = o(n^{1/6})$. Since $a_n$ is smooth, we then expect that
$$\tilde f(n) = \sum_{k\ge 0} a_k\, e^{-n}\frac{n^k}{k!} \approx a_n\sum_{k\ge 0} e^{-n}\frac{n^k}{k!} = a_n.$$
On the other hand, by Cauchy's integral representation, we also have
$$a_n = \frac{n!}{2\pi i}\oint_{|z|=n} z^{-n-1} e^{z}\tilde f(z)\,dz \approx \tilde f(n)\,\frac{n!}{2\pi i}\oint_{|z|=n} z^{-n-1} e^{z}\,dz = \tilde f(n),$$
since the saddle point $z = n$ of the factor $z^{-n}e^{z}$ is unaltered by the comparatively smoother function $\tilde f(z)$.
The Poisson-Charlier expansion. The latter analytic viewpoint provides an additional advantage of obtaining an expansion by using the Taylor expansion of $\tilde f$ at $z = n$, yielding
$$a_n = \sum_{j\ge 0}\frac{\tilde f^{(j)}(n)}{j!}\,\tau_j(n), \qquad (1)$$
where
$$\tau_j(n) := n!\,[z^n](z - n)^j e^{z} = \sum_{0\le\ell\le j}\binom{j}{\ell}(-1)^{j-\ell} n^{j-\ell}\,\frac{n!}{(n-\ell)!} \qquad (j = 0, 1, \dots),$$
and $[z^n]\varphi(z)$ denotes the coefficient of $z^n$ in the Taylor expansion of $\varphi(z)$. We call such an expansion the Poisson-Charlier expansion since the $\tau_j$'s are essentially the Charlier polynomials $C_j(\lambda, n)$ defined by $C_j(\lambda, n) := \lambda^{-n} n!\,[z^n](z - 1)^j e^{\lambda z}$, so that $\tau_j(n) = n^j C_j(n, n)$. For other terms used in the literature, see [28,29]. The first few terms of $\tau_j(n)$ are
$$\tau_0(n) = 1,\quad \tau_1(n) = 0,\quad \tau_2(n) = -n,\quad \tau_3(n) = 2n,\quad \tau_4(n) = 3n^2 - 6n.$$

Lemma 1.1. If $\tilde f$ is entire, then the expansion (1) holds.
Proof. Since $\tilde f$ is entire, we have
$$\sum_{n\ge 0}\frac{a_n}{n!}\,z^n = e^{z}\tilde f(z) = e^{z}\sum_{j\ge 0}\frac{\tilde f^{(j)}(n)}{j!}(z - n)^j,$$
and the lemma follows by absolute convergence.
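The two representations of $\tau_j(n)$ above (the coefficient extraction and the explicit sum) are easy to cross-check exactly; the following sketch also confirms the first few values $\tau_0 = 1$, $\tau_1 = 0$, $\tau_2 = -n$, $\tau_3 = 2n$, $\tau_4 = 3n^2 - 6n$:

```python
from fractions import Fraction
from math import comb, factorial

def tau_sum(j, n):
    """tau_j(n) via the explicit sum over l of C(j,l)(-1)^(j-l) n^(j-l) n!/(n-l)!."""
    total = 0
    for l in range(j + 1):
        falling = 1
        for i in range(l):        # n!/(n-l)! as a falling factorial (vanishes if l > n)
            falling *= n - i
        total += comb(j, l) * (-1) ** (j - l) * n ** (j - l) * falling
    return total

def tau_coeff(j, n):
    """tau_j(n) = n! [z^n] (z-n)^j e^z, by exact Taylor-coefficient extraction."""
    c = Fraction(0)
    for l in range(min(j, n) + 1):
        # (z-n)^j = sum_l C(j,l) z^l (-n)^(j-l);  [z^(n-l)] e^z = 1/(n-l)!
        c += comb(j, l) * Fraction((-n) ** (j - l), factorial(n - l))
    return factorial(n) * c

n = 7
assert [tau_sum(j, n) for j in range(5)] == [1, 0, -n, 2 * n, 3 * n * n - 6 * n]
assert all(tau_coeff(j, n) == tau_sum(j, n) for j in range(10))
```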
Two specific examples are worth mentioning here as they speak volumes about the difference between identity and asymptotic equivalence. Take first $a_n = (-1)^n$, so that $\tilde f(z) = e^{-2z}$. Then the Poisson heuristic fails since $(-1)^n \not\sim e^{-2n}$, but, by Lemma 1.1, we have the identity
$$(-1)^n = e^{-2n}\sum_{j\ge 0}\frac{(-2)^j}{j!}\,\tau_j(n).$$
See Figure 1 for a plot of the convergence of the series to $(-1)^n$. Now if $a_n = 2^n$, so that $\tilde f(z) = e^{z}$, then $2^n \not\sim e^{n}$, but we still have
$$2^n = e^{n}\sum_{j\ge 0}\frac{\tau_j(n)}{j!}.$$
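The identity for $a_n = (-1)^n$ can be verified numerically with exact rational arithmetic; the truncated series times $e^{-2n}$ indeed converges to $(-1)^n$ (a small sketch; the truncation point is ours):

```python
import math
from fractions import Fraction
from math import comb

def tau(j, n):
    """tau_j(n) via the explicit sum (an integer)."""
    total = 0
    for l in range(j + 1):
        falling = 1
        for i in range(l):
            falling *= n - i
        total += comb(j, l) * (-1) ** (j - l) * n ** (j - l) * falling
    return total

def charlier_sum(n, terms=120):
    """Truncation of e^{-2n} sum_j (-2)^j tau_j(n)/j!, which should equal (-1)^n."""
    s = Fraction(0)
    for j in range(terms):
        s += Fraction((-2) ** j, math.factorial(j)) * tau(j, n)
    return math.exp(-2 * n) * float(s)   # exact rational sum, then one float conversion

assert abs(charlier_sum(3) - (-1) ** 3) < 1e-9
assert abs(charlier_sum(4) - (-1) ** 4) < 1e-9
```

Working with `Fraction` keeps the heavy cancellations in the series exact; only the final multiplication by $e^{-2n}$ is done in floating point.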
So when is the Poisson-Charlier expansion also an asymptotic expansion for $a_n$, in the sense that dropping all terms with $j \ge 2\ell$ introduces an error of order $\tilde f^{(2\ell)}(n)\,n^{\ell}$ (which in typical cases is of order $\tilde f(n)\,n^{-\ell}$)? Many sufficient conditions are thoroughly discussed in [36], although the terms in their expansions are expressed differently; see also [62].
Poissonized mean and variance. The majority of random variables analyzed in the algorithmic literature are at most of polynomial or sub-exponential (such as e c(log n) 2 or e cn 1/2 ) orders, and are smooth enough. Thus the Poisson generating functions of the moments are often entire functions. The use of the Poisson-Charlier expansion is then straightforward, and in many situations it remains to justify the asymptotic nature of the expansion.
For convenience of discussion, let $\tilde f_m(z)$ denote the Poisson generating function of the $m$-th moment of the random variable in question, say $X_n$. Then by Lemma 1.1, we have the identity
$$\mathbb{E}(X_n) = \sum_{j\ge 0}\frac{\tilde f_1^{(j)}(n)}{j!}\,\tau_j(n),$$
and for the second moment
$$\mathbb{E}(X_n^2) = \sum_{j\ge 0}\frac{\tilde f_2^{(j)}(n)}{j!}\,\tau_j(n), \qquad (2)$$
provided only that the two Poisson generating functions $\tilde f_1$ and $\tilde f_2$ are entire functions. These identities suggest that a good approximation to the variance of $X_n$ be given by
$$\mathbb{V}(X_n) \approx \tilde f_2(n) - \tilde f_1(n)^2,$$
which holds true for many cost measures, where we can indeed replace the imprecise, approximately equal symbol "$\approx$" by the more precise, asymptotically equivalent symbol "$\sim$". However, for a large class of problems for which the variance is essentially linear, meaning roughly that
$$\mathbb{V}(X_n) = n^{1+o(1)}, \qquad (3)$$
the Poissonized variance $\tilde f_2(n) - \tilde f_1(n)^2$ is not asymptotically equivalent to the variance. This is the case for the total cost of constructing random digital search trees, for example. One technical reason is that there are additional cancellations produced by dominant terms. The next question is then: can we find a better normalized function so that the variance is asymptotically equivalent to its value at $n$?
Poissonized variance with correction. The crucial step of our approach, needed when the variance is essentially linear, is to consider
$$\tilde V(z) := \tilde f_2(z) - \tilde f_1(z)^2 - z\tilde f_1'(z)^2, \qquad (4)$$
and it then turns out that, in all cases we consider, $\mathbb{V}(X_n) \sim \tilde V(n) \asymp n(\log n)^c$ for some $c \ge 0$. The asymptotics of the variance is then reduced to that of $\tilde V(z)$ for large $z$, which satisfies, up to non-homogeneous terms, the same type of equation as $\tilde f_1(z)$. Thus the same tools used for analyzing the mean can be applied to $\tilde V(z)$.
To see how the last correction term $z\tilde f_1'(z)^2$ appears, write $\tilde D(z) := \tilde f_2(z) - \tilde f_1(z)^2$, so that $\tilde f_2(z) = \tilde D(z) + \tilde f_1(z)^2$; substituting this into (2) and subtracting the square of the expansion of the mean, we obtain
$$\mathbb{V}(X_n) = \tilde D(n) - n\tilde f_1'(n)^2 + \text{smaller-order terms}.$$
Now take $\tilde f_1(n) \asymp n\log n$. Then the first term following $\tilde D(n)$ is generally not smaller than $\tilde D(n)$, because $n\tilde f_1'(n)^2 \asymp n(\log n)^2$, while $\tilde D(n) \asymp n(\log n)^2$, at least for the examples we discuss in this paper. Note that the variance is in such a case either of order $n\log n$ or of order $n$. Thus to get an asymptotically equivalent approximation to the variance, we need at least an additional correction term, which is exactly $n\tilde f_1'(n)^2$. This correction term already appeared in many early papers by Jacquet and Régnier (see [34]).
A viewpoint from the asymptotics of the characteristic function. Most binomial recurrences of the form
$$X_n \overset{d}{=} X_{B_n} + X^*_{\bar B_n} + T_n, \qquad (5)$$
arising from the binomial splitting processes discussed above, are asymptotically normally distributed, a property partly ascribable to the highly regular behavior of the binomial distribution. Here the $(X^*_n)$ are independent copies of the $(X_n)$, and the random or deterministic non-homogeneous part $T_n$ is often called the "toll function," measuring the cost used to "conquer" the two subproblems. Such recurrences have been extensively studied in numerous papers; see [36,52,58,59] and the references therein.
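A quick way to see recurrence (5) in action is to take the toll $T_n = n - 1$ of the DST path-length studied below and compare a Monte Carlo estimate of the mean with the exact value obtained from the same recurrence (a sketch; sample size and seed are arbitrary):

```python
import random
from fractions import Fraction
from functools import lru_cache
from math import comb

def sample_X(n):
    """One draw of X_n from X_n =(d)= X_B + X*_{n-1-B} + (n-1), B ~ Binomial(n-1, 1/2)."""
    if n <= 1:
        return 0
    b = sum(random.getrandbits(1) for _ in range(n - 1))
    return (n - 1) + sample_X(b) + sample_X(n - 1 - b)

@lru_cache(maxsize=None)
def mu(n):
    """Exact E(X_n) from the same recurrence, by conditioning on B."""
    if n <= 1:
        return Fraction(0)
    w = Fraction(1, 2 ** (n - 1))
    return (n - 1) + 2 * w * sum(comb(n - 1, k) * mu(k) for k in range(n))

random.seed(42)
n, reps = 12, 5000
est = sum(sample_X(n) for _ in range(reps)) / reps
assert abs(est - float(mu(n))) < 0.3   # Monte Carlo agrees with the exact mean
```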
The correction term we introduced in (4) for the Poissonized variance also appears naturally in the following heuristic, formal analysis, which can be justified when more properties are available. Starting from the Cauchy integral representation of the characteristic function and expanding formally, observe that with $z = ne^{it}$ the integrand admits a local expansion for small $t$. It follows, by extending the integral to $\pm\infty$ and completing the square, that the limiting Gaussian has variance $\tilde f_2(n) - \tilde f_1(n)^2 - n\tilde f_1'(n)^2$. This again shows that $n\tilde f_1'(n)^2$ is the right correction term for the variance. For a more precise analysis of this type, see [36].
A comparison of different approaches to the asymptotic variance. What are the advantages of the Poissonized variance with correction? In the literature, a few different approaches have been adopted for computing the asymptotics of the variance of the binomial splitting processes.
• Second-moment approach: this is the most straightforward means and consists of first deriving asymptotic expansions of sufficient length for the expected value and for the second moment, then considering the difference $\mathbb{E}(X_n^2) - (\mathbb{E}(X_n))^2$, and identifying the lead terms after cancellations of dominant terms in both expansions. This approach is often computationally heavy as many terms have to be cancelled; additional complications arise from fluctuating terms, rendering the resulting expressions messier. See below for more references.
• Poissonized variance: the asymptotics of the variance is carried out through that of $\tilde D(n) = \tilde f_2(n) - \tilde f_1(n)^2$. The difference between this approach and the previous one is that no asymptotics of $\tilde f_2(n)$ is derived or needed, and one focuses directly on the equation (functional or differential) satisfied by $\tilde D(z)$. As we discussed above, in many cases this does not give an asymptotically equivalent estimate for the variance, because additional cancellations have to be taken into account; see for instance [34,35,36].
• Characteristic-function approach: similar to the formal calculations we carried out above, this approach tries to derive a more precise asymptotic approximation to the characteristic function using, say, complex-analytic tools, and then to identify the right normalizing term as the variance; see the survey [36] and the papers cited there.
• Schachinger's differencing approach: a delicate, mostly elementary approach based on the recurrence satisfied by the variance was proposed in [58] (see also [59]). His approach is applicable to very general "toll-functions" T n in (5) but at the price of less precise expressions.
The approach we use is similar to the Poissonized-variance one, but the difference is that the passage through $\tilde D(z)$ is completely avoided and we focus directly on the equations satisfied by $\tilde V(z)$ (defined in (4)).
In contrast to Schachinger's approach, ours, after starting from the definition of $\tilde V(z)$, is mostly analytic. It then yields more precise expansions, but more properties of $T_n$ have to be known. The contrast here between elementary and analytic approaches is thus typical; see, for example, [7,8]. See also the Appendix for a brief sketch of the asymptotic linearity of the variance by elementary arguments.
Additional advantages that our approach offers include comparatively simpler forms for the resulting expressions, including Fourier series expansions, and general applicability (coupled with the introduction of several new techniques).
Organization of this paper. This paper is organized as follows. We start with the variance of the total path-length of random digital search trees in the next section, which was our motivating example. We then extend the considerations to bucket DSTs, for which two different notions of total path-length are distinguished; these turn out to exhibit very different asymptotic behaviors. The application of our approach to several other shape parameters is discussed in Section 4. Table 1 summarizes the diverse behaviors exhibited by the means and the variances of the shape parameters we consider in this paper.
Table 1: Orders of the mean and the variance of the shape parameters considered (PL = path-length).

    Shape parameter     mean                variance
    Internal PL         n log n             n
    Key-wise PL*        n log n             n
    Node-wise PL*       n log n             n(log n)^2
    Peripheral PL       n                   n
    #(leaves)           n                   n
    Differential PL     n                   n log n
    Weighted PL         n(log n)^{m+1}      n

Applications of the approach we develop here to other classes of trees and structures, including tries, Patricia tries, bucket sort, contention resolution algorithms, etc., will be investigated in a future paper.

Digital Search Trees
We start in this section with a brief description of digital search trees (DSTs), list the major shape parameters studied in the literature, and then focus on the total path-length. The approach we develop is also very useful for other linear shape measures, which are discussed in a more systematic form in the following sections.

DSTs
DSTs were first introduced by Coffman and Eve [9] in the early 1970s under the name of sequence hash trees. They can be regarded as the bit-version of binary search trees (thus the name); see [44, p. 496 et seq.]. Given a sequence of binary strings, we place the first in the root node; those starting with "0" ("1") are directed to the left (right) subtree of the root, and the subtrees are constructed recursively by the same procedure but with the first bits removed when comparisons are made. See Figure 2 for an illustration.
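The construction just described can be coded in a few lines; the following sketch (our own illustration) builds a DST from fixed-length bit strings and computes the total internal path length used later:

```python
class Node:
    __slots__ = ("key", "left", "right")
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    """Insert a binary string: branch on the leading bit, dropping one bit per level."""
    if root is None:
        return Node(key)
    if key[0] == "0":
        root.left = insert(root.left, key[1:])
    else:
        root.right = insert(root.right, key[1:])
    return root

def path_length(root, depth=0):
    """Total internal path length: the sum of the depths of all nodes."""
    if root is None:
        return 0
    return depth + path_length(root.left, depth + 1) + path_length(root.right, depth + 1)

root = None
for key in ["0101", "1100", "0011", "1010", "0000"]:
    root = insert(root, key)
assert path_length(root) == 6   # depths 0,1,1,2,2 for the five keys
```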
While the practical usefulness of digital search trees is limited, they represent one of the simplest, fundamental prototype models for divide-and-conquer algorithms using coin-tossing or similar random devices. Of notable interest is their close connection to the analysis of the Lempel-Ziv compression scheme, which has found widespread incorporation into numerous software systems. Furthermore, the mathematical analysis is often challenging and leads to intriguing phenomena. Also, the splitting mechanism of DSTs appears naturally in a few problems in other areas; some of these are mentioned in the last section.
Random digital search trees. The simplest random model we discuss in this paper is the independent Bernoulli model. In this model, we are given a sequence of $n$ independent and identically distributed random inputs, each comprising an infinite sequence of Bernoulli random variables with mean $p$, $0 < p < 1$. The DST constructed from the given random sequence of binary strings is called a random DST. If $p = 1/2$, the DST is said to be symmetric; otherwise, it is asymmetric. We focus on symmetric DSTs in this paper for simplicity; extension to asymmetric DSTs is possible but much harder.
Stochastic properties of many shape characteristics of random DSTs are known. Almost all of them fall into one of the two categories, according to their growth order being logarithmic or essentially linear (in the sense of (3)), which we simply refer to as "log shape measures" and "linear shape measures".
Log shape measures. The two major parameters studied in this category are the depth, which is the distance from the root to a randomly chosen node in the tree (each with the same probability), and the height, which counts the number of nodes along a longest path from the root. Both are of logarithmic order in the mean. The depth provides a good indication of the typical cost needed when inserting a new key into the tree, while the height measures the worst possible cost that may be needed.
Linear shape measures. These include the total internal path-length, which sums the distance between the root and every node, and the occurrences of a given pattern (leaves or nodes satisfying certain properties); see [24,26,30,31,35,40,42,44].
The profile contains generally much more information than most other shape measures, and it can to some extent be regarded as a good bridge connecting log and linear measures; see [15,17,45,46] for known properties concerning expected profile of random DSTs.
Nodes of random DSTs with p = 1/2 are distributed in an extremely regular way, as shown in Figures 3 and 4.

Known and new results for the total internal path-length
Throughout this section, we focus on $X_n$, the total path-length of a random digital search tree built from $n$ binary strings. By definition and by our randomness assumption, $X_n$ can be computed recursively by
$$X_n \overset{d}{=} n - 1 + X_{B_{n-1}} + X^*_{n-1-B_{n-1}} \qquad (n \ge 1), \qquad (6)$$
with the initial condition $X_0 = 0$, since removing the root results in a decrease of $n - 1$ for the total path-length (each of the $n - 1$ nodes below the root contributes 1). Here $B_n \sim \mathrm{Binomial}(n; 1/2)$, $X_n \overset{d}{=} X^*_n$, and $X_n$, $X^*_n$, $B_n$ are independent.
Known results. It is known that (see [26,30,57])
$$\mathbb{E}(X_n) = (n+1)\log_2 n + n\Bigl(\frac{\gamma - 1}{L} + \frac12 - c_1 + \varpi_1(\log_2 n)\Bigr) + O(\log n), \qquad (7)$$
where $\gamma$ denotes Euler's constant, $c_1 := \sum_{k\ge 1}(2^k - 1)^{-1}$, $L := \log 2$, and $\varpi_1(t)$, $\varpi_2(t)$ are 1-periodic functions with zero mean whose Fourier expansions are given in (8) (with $\chi_k := 2k\pi i/L$; here $\Gamma$ denotes the Gamma function). Thus we see roughly that random digital search trees under the unbiased Bernoulli model are highly balanced in shape. An important feature of the periodic functions is that they are marked by very small amplitudes of fluctuation: $|\varpi_1(t)| \le 3.4\times 10^{-8}$ and $|\varpi_2(t)| \le 3.4\times 10^{-6}$. Such a quasi-flat (or smooth) behavior may in practice very likely lead to wrong conclusions, as the fluctuations are hardly visible in simulations of moderate sample sizes.
Figure 5: A plot of $\mathbb{E}(X_n)/(n+1) - \log_2 n$ in log-scale (the decreasing curve, using the $y$-axis on the right-hand side), and of $\mathbb{V}(X_n)/n$ in log-scale (the increasing curve, using the $y$-axis on the left-hand side).
Let
$$Q(z) := \prod_{j\ge 1}\Bigl(1 - \frac{z}{2^j}\Bigr). \qquad (9)$$
In particular, $Q(1) = Q_\infty := \prod_{j\ge 1}(1 - 2^{-j})$. The variance was computed in [42] by a direct second-moment approach, and the result is
$$\mathbb{V}(X_n) = n\bigl(C_{kps} + \varpi_{kps}(\log_2 n)\bigr) + o(n),$$
where $\varpi_{kps}(t)$ is again a 1-periodic, zero-mean function and the mean value $C_{kps}$ is given by a long closed-form expression ($L := \log 2$) involving, among other quantities, $[\varpi_1\varpi_2]_0$.
Here $[\varpi_1\varpi_2]_0$ denotes the mean value of the function $\varpi_1(t)\varpi_2(t)$ over the unit interval. The long expression obviously shows the complexity of the asymptotic problem. We show that it can be substantially simplified. Before stating our result, we mention that the asymptotic normality of $X_n$ (in the sense of convergence in distribution) was first proved in [35] by a complex-analytic approach; for other approaches, see [59] (martingale differences), [31] (method of moments), and [52] (contraction method).
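The recurrence (6) also yields the exact mean and variance by conditioning on the binomial split, which is how curves such as those in Figure 5 can be reproduced; a sketch with exact rational arithmetic:

```python
from fractions import Fraction
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def moments(n):
    """Exact (E X_n, E X_n^2) for the internal path length, from recurrence (6)."""
    if n <= 1:
        return Fraction(0), Fraction(0)
    m1 = m2 = Fraction(0)             # accumulate E(S) and E(S^2) for S = X_B + X*
    w = Fraction(1, 2 ** (n - 1))
    for k in range(n):
        p = comb(n - 1, k) * w
        a1, a2 = moments(k)
        b1, b2 = moments(n - 1 - k)
        m1 += p * (a1 + b1)
        m2 += p * (a2 + b2 + 2 * a1 * b1)   # independence of the two subtrees
    t = n - 1                          # toll term
    return m1 + t, m2 + 2 * t * m1 + t * t

def variance(n):
    m1, m2 = moments(n)
    return m2 - m1 * m1

assert moments(3)[0] == Fraction(5, 2)
assert variance(3) == Fraction(1, 4)   # X_3 is 2 or 3, each with probability 1/2
```

Tabulating `variance(n)/n` for moderate $n$ already hints at the very slow convergence visible in Figure 5.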
A new asymptotic approximation to $\mathbb{V}(X_n)$. Define, for $0 < \Re(\omega) < 3$ and $x > 0$, an auxiliary integral $\phi(\omega; x)$, which, by a suitable functional relation for the integrand, can be represented in a form that indeed provides a meromorphic continuation of $\phi(\omega; x)$ into the whole complex $\omega$-plane whenever $x > 0$.

Theorem 2.1. The variance of the total path-length of random DSTs of $n$ nodes satisfies the asymptotic approximation (11), with a leading constant expressed in terms of $\phi$ and a 1-periodic function $\varpi_{kps}$ whose Fourier series expansion is absolutely convergent.
One can derive more precise asymptotic expansions for V(X n ) by the same approach we use. We content ourselves with (11) for convenience of presentation.
Sketch of our approach. Following the discussion in the Introduction, we first prove that the Poisson-Charlier expansion for the mean and that for the second moment are not only identities but also asymptotic expansions. For that purpose, it proves very useful to introduce the following notion, which we term JS-admissibility (following the survey paper [35]). This is reminiscent of the classical H-admissible (due to Hayman) or HS-admissible (due to Harris and Schoenfeld) functions; see [28, §VIII.5].
Once we prove the asymptotic nature of the Poisson-Charlier expansions for the mean and the second moment, it remains, according again to the discussion in the Introduction, to derive more precise asymptotics for the function $\tilde V$ (as defined in (4)); for this we first take the Laplace transform, normalize it properly, and then apply the Mellin transform. Such an approach will turn out to be very effective and readily applicable to more general cases such as bucket DSTs, which are discussed in detail in the next section. The approach parallels closely in essence that introduced by Flajolet and Richmond in [24], which starts from the ordinary generating function, followed by an Euler transform, a proper normalization, and the Mellin transform, concluding with singularity analysis; see also [10]. The path we take, however, offers additional operational advantages, as will be clear later. See Figure 7 for a diagrammatic illustration of the two analytic approaches.

Analytic de-Poissonization and JS-admissibility
The fundamental differential-functional equation for the analysis of random DSTs is of the form
$$\tilde f'(z) + \tilde f(z) = 2\tilde f\Bigl(\frac{z}{2}\Bigr) + \tilde g(z), \qquad (13)$$
with suitably given initial value $\tilde f(0)$ and non-homogeneous part $\tilde g$. For such functions, it turns out that the asymptotic nature of the Poisson-Charlier expansions for the coefficients (or de-Poissonization) can be justified in a rather systematic way by the introduction of the notion of JS-admissible functions.
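An equation of this type, $\tilde f'(z) + \tilde f(z) = 2\tilde f(z/2) + \tilde g(z)$, can be checked numerically in a concrete instance: for the mean path-length of DSTs treated below one has $\tilde g(z) = z$, and the Taylor coefficients of $\tilde f_1$ are finite differences of the sequence $\mu_n = \mathbb{E}(X_n)$. A sketch (the truncation order N is ours):

```python
from fractions import Fraction
from functools import lru_cache
from math import comb, factorial

@lru_cache(maxsize=None)
def mu(n):
    """Exact E(X_n) for the DST path length (toll n - 1)."""
    if n <= 1:
        return Fraction(0)
    w = Fraction(1, 2 ** (n - 1))
    return (n - 1) + 2 * w * sum(comb(n - 1, k) * mu(k) for k in range(n))

# Taylor coefficients of f1~(z) = e^{-z} sum_n mu(n) z^n/n!  (a finite-difference formula)
N = 60
coef = [sum(Fraction((-1) ** (m - k)) * mu(k) / (factorial(k) * factorial(m - k))
            for k in range(m + 1)) for m in range(N)]

def f1(z):
    return sum(float(c) * z ** m for m, c in enumerate(coef))

def f1prime(z):
    return sum(m * float(c) * z ** (m - 1) for m, c in enumerate(coef) if m)

# check f1~'(z) + f1~(z) = 2 f1~(z/2) + z at a test point
z = 1.0
assert abs(f1prime(z) + f1(z) - 2 * f1(z / 2) - z) < 1e-9
```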
Here and throughout this paper, the generic symbol ε ∈ (0, 1) always represents an arbitrarily small constant whose value is immaterial and may differ from one occurrence to another.

Definition 1.
An entire function $\tilde f$ is said to be JS-admissible, denoted by $\tilde f \in \mathscr{JS}$, if the following two conditions hold for $|z| \ge 1$ and some constants $\alpha, \beta \in \mathbb{R}$:

(I) uniformly for $|\arg(z)| \le \varepsilon$,
$$\tilde f(z) = O\bigl(|z|^{\alpha}(1 + \log|z|)^{\beta}\bigr);$$

(O) uniformly for $\varepsilon \le |\arg(z)| \le \pi$,
$$f(z) := e^{z}\tilde f(z) = O\bigl(e^{(1-\delta)|z|}\bigr) \quad\text{for some } \delta > 0.$$

For convenience, we also write $\tilde f \in \mathscr{JS}_{\alpha,\beta}$ to indicate the growth order of $\tilde f$ inside the sector $|\arg(z)| \le \varepsilon$.

Proposition 2.2. Assume $\tilde f \in \mathscr{JS}_{\alpha,\beta}$ and write $f(z) := e^{z}\tilde f(z)$. Then the Poisson-Charlier expansion (1) of $f^{(n)}(0) = n!\,[z^n]f(z)$ is also an asymptotic expansion in the sense that
$$f^{(n)}(0) = \sum_{0\le j < 2k}\frac{\tilde f^{(j)}(n)}{j!}\,\tau_j(n) + O\bigl(n^{\alpha-k}(\log n)^{\beta}\bigr)$$
for every $k \ge 1$.
Proof. (Sketch) Starting from Cauchy's integral formula for the coefficients, the proposition follows from a standard application of the saddle-point method. Roughly, condition (O) guarantees that the integral over the part of the circle of radius $n$ with $\varepsilon \le |\arg(z)| \le \pi$ is negligible, while condition (I) implies smooth estimates for all derivatives (and thus for the error terms).
The polynomial growth of condition (I) is sufficient for all our uses; see [36] for more general versions. The real advantage of introducing admissibility is that it opens the possibility of developing closure properties as we now discuss.
(iv) If $\tilde f \in \mathscr{JS}$, then the product $\tilde P\tilde f \in \mathscr{JS}$, where $\tilde P$ is a polynomial in $z$.
Proof. Straightforward and omitted.
Specific to our needs for the analysis of DSTs is the following transfer principle.
Consequently, since $f(0) = 0$, the functional equation can be iterated. Choosing $m = \lceil\log_2 r\rceil$, so that $2^m \ge r$, and iterating the functional equation $m$ times, we obtain, by (14), an estimate which establishes condition (O).

Our proof that $\tilde f$ satisfies (I) proceeds in a similar manner and starts again from (14), but with the constant $C = 2/\cos\varepsilon > 2$. The same majorization argument used above for (O) then leads to the corresponding bound, and this proves (I) for $\tilde f$. The necessity part follows trivially from Lemma 2.3.
The estimates of asymptotic-transfer type we derived are indeed over-pessimistic when $1 \le \alpha \le \log_2 C$, but they are sufficient for our use. The true orders are those with $\varepsilon \to 0$, which can be proved by the Laplace-Mellin-de-Poissonization approach we use later. Lemma 2.3 and Proposition 2.4 provide very effective tools for justifying the de-Poissonization of functions satisfying equation (13), which is often carried out through the use of the increasing-domain argument (see [36]). The latter argument is also inductive in nature and similar to the one we develop here, although it is less "mechanical" and less systematic.

Generating functions and integral transforms
Since our approach is purely analytic and relies heavily on generating functions, we first derive in this subsection the differential-functional equations we will work with later. Then we apply the de-Poissonization tools developed above to the Poisson generating functions of the mean and the second moment and justify the asymptotic nature of the corresponding Poisson-Charlier expansions. Finally, we sketch the asymptotic tools based on the Laplace and Mellin transforms.
Generating functions. In terms of the moment generating function $M_n(y) := \mathbb{E}(e^{X_n y})$, the recurrence (6) translates into
$$M_{n+1}(y) = e^{ny}\,2^{-n}\sum_{0\le k\le n}\binom{n}{k} M_k(y)\,M_{n-k}(y) \qquad (n \ge 0), \qquad (15)$$
with $M_0(y) = 1$. Now consider the bivariate exponential generating function
$$F(z, y) := \sum_{n\ge 0} M_n(y)\,\frac{z^n}{n!}.$$
Then, by (15), $\partial_z F(z, y) = F\bigl(\tfrac{ze^y}{2}, y\bigr)^2$, and the Poisson generating function $\tilde F(z, y) := e^{-z}F(z, y)$ satisfies the differential-functional equation
$$\partial_z\tilde F(z, y) + \tilde F(z, y) = e^{z(e^y - 1)}\,\tilde F\Bigl(\frac{ze^y}{2}, y\Bigr)^2,$$
with $\tilde F(0, y) = 1$. No exact solution of such a nonlinear differential equation is available; see [35] for an asymptotic approximation to $\tilde F$ for $y$ near unity.
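Recurrence (15) for $M_n(y)$ is easy to evaluate numerically; for instance, $X_3$ takes the values 2 and 3 with probability $1/2$ each, which the recurrence reproduces (a small sketch):

```python
import math

def M(n, y, memo={}):
    """E(e^{y X_n}) via M_{n+1}(y) = e^{ny} 2^{-n} sum_k C(n,k) M_k(y) M_{n-k}(y)."""
    if n <= 1:
        return 1.0                      # X_0 = X_1 = 0
    key = (n, y)
    if key not in memo:
        m = n - 1
        s = sum(math.comb(m, k) * M(k, y) * M(m - k, y) for k in range(m + 1))
        memo[key] = math.exp(m * y) * s / 2 ** m
    return memo[key]

y = 0.3
assert abs(M(3, y) - (math.exp(2 * y) + math.exp(3 * y)) / 2) < 1e-12
```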

Proposition 2.5. The Poisson-Charlier expansions for the mean and for the second moment are both asymptotic expansions.
Proof. (Sketch) By Lemma 2.3 and Proposition 2.4, we see that both $\tilde f_1, \tilde f_2 \in \mathscr{JS}$, and thus we can apply Proposition 2.2. Indeed, the proof of Proposition 2.4 already provides crude bounds for the growth orders of $\tilde f_1, \tilde f_2$. The more precise estimates $\tilde f_1(z) \asymp |z|\log|z|$ and $\tilde f_2(z) \asymp (|z|\log|z|)^2$ for $z$ inside the sector $\{z : |\arg(z)| \le \varepsilon\}$ will be provided in the next two subsections.
An asymptotic approach based on Laplace and Mellin transforms. Once the de-Poissonization steps are justified, all that remains for the proof of Theorem 2.1 is to derive more precise asymptotic approximations to $\tilde f_1$ and $\tilde V$ (as defined in (4)). The approach we use begins with a more precise characterization of $\tilde f_1(z)$. Both $\tilde f_1$ and $\tilde V$ satisfy a differential-functional equation of the form
$$\tilde f'(z) + \tilde f(z) = 2\tilde f\Bigl(\frac{z}{2}\Bigr) + \tilde g(z), \qquad (17)$$
with the initial condition $\tilde f(0) = 0$. To derive the asymptotics of $\tilde f$ for large complex $z$, we proceed along the following principal steps; see also [10].

Laplace transform: applying the Laplace transform $\mathscr{L}[\tilde f; s] := \int_0^\infty e^{-sz}\tilde f(z)\,dz$ to (17) gives the functional equation
$$(s+1)\,\mathscr{L}[\tilde f; s] = 4\,\mathscr{L}[\tilde f; 2s] + \mathscr{L}[\tilde g; s],$$
where the transform exists and defines an analytic function if $\tilde g$ grows at most polynomially for large $|z|$.

Mellin transform: after a proper normalization of the Laplace transform, the resulting functional equation is resolved by the Mellin transform.

Inverting the process: we first derive the local behavior of $\mathscr{L}[\tilde f; s]$ for small $s$ by the Mellin inversion (often by calculus of residues after justification of analytic properties), and then the asymptotic behavior of $\tilde f(z)$ for large $z$ is derived by the Laplace inversion, in a way similar to singularity analysis.

Expected internal path-length of random DSTs
We consider in detail in this subsection the expected value $\mu_n := \mathbb{E}(X_n)$ of the total internal path-length, paving the way for the asymptotic analysis of the variance. Starting from either equation (17) (with $\tilde g(z) = z$) or the recurrence
$$\mu_{n+1} = n + 2^{1-n}\sum_{0\le k\le n}\binom{n}{k}\mu_k \qquad (n \ge 0),$$
with $\mu_0 := 0$, there are several approaches to the asymptotics of $\mu_n$. We will briefly describe the one using the integral representation of finite differences (Rice's integrals) and then present the Laplace and Mellin transforms we will use, which, as will become clear, essentially amount to the Flajolet-Richmond approach (see [24]).

Rice's integral representation.
By (17), we have, with $\tilde\mu_n := n!\,[z^n]\tilde f_1(z)$ and $\tilde\mu_0 = 0$, a first-order recurrence, which by iteration yields a closed form for $\tilde\mu_n$. Thus, by Rice's formula ([27]), $\mu_n$ admits a contour-integral representation, where the integration path $(c)$ is along the vertical line with real part equal to $c$ and $Q$ is defined in (9).
We then obtain (7) by standard arguments; see [26] or [50] for details. This approach readily gives the approximation (7) for the mean and can be refined to obtain a full asymptotic expansion. However, its extension to the variance becomes extremely messy, as shown in [42].
Laplace transform. We first show that the asymptotics of $\tilde f_1(z)$ can be derived through a direct use of the Laplace and Mellin transforms; this relies on several ad hoc steps that are not easily extended. A more general procedure will be developed below.
By (17), we see that the Laplace transform of $\tilde f_1(z)$ satisfies the functional equation
$$(s+1)\,\mathscr{L}[\tilde f_1; s] = 4\,\mathscr{L}[\tilde f_1; 2s] + \frac{1}{s^2},$$
which exists and is analytic in $\mathbb{C}\setminus(-\infty, 0]$. Dividing both sides by $s+1$ and iterating, we obtain one explicit expression; on the other hand, from (20), we have a second expression, and comparing the two yields an identity. However, neither form is useful for our asymptotic purpose. Now, by a partial fraction expansion, we obtain a more tractable form. Note that
$$\sum_{j\ge 0}\frac{2^j s}{(s+1)(2s+1)\cdots(2^j s+1)} = 1.$$
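The last identity is a telescoping series, since $2^j s/((s+1)\cdots(2^j s+1)) = 1/((s+1)\cdots(2^{j-1}s+1)) - 1/((s+1)\cdots(2^j s+1))$; this can be confirmed with exact rational arithmetic:

```python
from fractions import Fraction

def partial_sum(s, J):
    """Partial sums of sum_{j>=0} 2^j s / ((s+1)(2s+1)...(2^j s+1))."""
    total, prod = Fraction(0), Fraction(1)
    for j in range(J + 1):
        prod *= 2 ** j * s + 1        # running product (s+1)(2s+1)...(2^j s+1)
        total += (2 ** j * s) / prod
    return total, prod

s = Fraction(1, 3)
total, prod = partial_sum(s, 20)
assert total == 1 - 1 / prod          # the telescoped form; the tail 1/prod -> 0
```

Since the running product grows super-geometrically for $s > 0$, the partial sums converge to 1 extremely fast.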

By the Euler identity, the infinite products involved can be expanded term by term. This gives an explicit expression for the Laplace transform, and then $\tilde f_1$ follows by inversion. Consequently, by (23) and the identity above, we obtain a Mellin integral representation, from which we derive an asymptotic approximation of $\tilde f_1(z)$, uniformly for $|z| \to \infty$ and $|\arg(z)| \le \pi/2 - \varepsilon$, where $\varpi_1$ is given in (8). (As usual, we use the asymptotic estimate (12) for the Gamma function.)

Laplace and Mellin transforms. We now redo the analysis for $\tilde f_1(z)$ in a more general way that can easily be extended to other cases. We again start from (21) and consider the normalized Laplace transform $\bar{\mathscr{L}}(s) := \mathscr{L}[\tilde f_1; s]/Q(-2s)$, where $Q(z)$ is defined in (9). Dividing both sides of (21) by $Q(-2s)$ yields a functional equation for $\bar{\mathscr{L}}$, to which we now apply the Mellin transform; its existence is guaranteed by the fact that $X_0 = X_1 = 0$ and the proof of Proposition 2.4. On the other hand, by the Mellin transform, the normalizing factor admits, uniformly for $|s| \to \infty$ and $|\arg(s)| \le \pi - \varepsilon$, an asymptotic expansion with fluctuating coefficients, where $\chi_k := 2k\pi i/\log 2$,
$$q_0 = \frac{\log 2}{12} + \frac{\pi^2}{6\log 2}, \qquad q_k = \frac{1}{2k\sinh(2k\pi^2/\log 2)} \quad (k \ne 0).$$

Inverse Mellin and inverse Laplace transforms.
We can now apply successively the inverse Mellin and then the inverse Laplace transforms to derive the asymptotics of $\tilde f_1(z)$. Observe that $G_1(\omega)$ has a simple pole at $\omega = 2$. By (28), or Proposition 5 in [22], we obtain growth bounds valid for large $|t|$ and $c \in \mathbb{R}$. Then, by the calculus of residues, we obtain a local expansion of the Laplace transform, uniformly for $|s| \to 0$ and $|\arg(s)| \le \pi - \varepsilon$. Using this expansion, it remains to consider the inverse Laplace transform. The following simple result is very useful for our purposes.
Proof. Let $\mathscr{L}(s) = \mathscr{L}[f; s]$. Then, by the inverse Laplace transform,
$$f(z) = \frac{1}{2\pi i}\int_{\mathscr{H}} e^{zs}\,\mathscr{L}(s)\,ds,$$
where $\mathscr{H}$ is the Hankel contour consisting of the two rays $te^{\pm i\varepsilon} \pm i/|z|$, $-\infty < t \le 0$, and the semicircle $e^{i\varphi}/|z|$, $-\pi/2 \le \varphi \le \pi/2$; see Figure 6. Assume from now on that $|z|$ is sufficiently large and lies in the sector $|\arg(z)| \le \pi/2 - \varepsilon$. We prove only the $O$-case, the other two cases being similar. For simplicity, we consider only the case $m = 0$, as the other cases extend easily.
We split the above integral along $\mathscr{H}$ into two parts: $\mathscr{H}_>$ comprises the two rays $te^{\pm i\varepsilon} \pm i/|z|$, $-\infty < t \le -T$, with $T > 1$ a fixed constant, and $\mathscr{H}_\supset$ represents the remaining contour.
The integral along $\mathscr{H}_>$ is easily estimated, the $O$-term holding uniformly for $|z| \to \infty$ provided that $|\arg(z)| + \varepsilon < \pi/2$, where $c > 0$ is a suitable constant.
For the second integral, we use (29). Then the integral along the semicircle is bounded as follows.
Note that the inverse Laplace transform of s −2 log(1/s) is z log z − (1 − γ)z. This, together with a combined use of Proposition 2.6, leads to (25).
The justification of the estimate (30) is easily performed by using the relation (31) below.
The Flajolet-Richmond approach [24]. Instead of the Poisson generating function, this approach starts from the ordinary generating function A(z) := n µ n z n .
Then one inverts the process by considering first the Mellin inversion, deriving the asymptotics of the transform as s → 0 in C; one then deduces the asymptotics of the generating function as z → 1, and finally applies singularity analysis (see [23]) to conclude the asymptotics of µ_n. The crucial reason why the two approaches are identical at certain steps is that the Laplace transform of a Poisson generating function is essentially equal to the Euler transform of an ordinary generating function; formally,

∫₀^∞ e^{−sz} e^{−z} Σ_{n≥0} (a_n/n!) z^n dz = Σ_{n≥0} a_n (s + 1)^{−n−1}.

Thus the simple result in Proposition 2.6 closely parallels that in singularity analysis. While identical at certain steps, the two approaches diverge in their final treatment of the coefficients, and the distinction here is typically that between the saddle-point method and singularity analysis, a situation reminiscent of the use before and after Lagrange's inversion formula; see for instance [28]. The relation (31) implies that the order estimate (30) for the Laplace transform at infinity can be easily justified for all the generating functions we consider in this paper, since A(0) = 0. This comparison also suggests the possibility of developing de-Poissonization tools by singularity analysis, which will be investigated in detail elsewhere.
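The Laplace-Euler correspondence can be checked numerically. The sketch below uses an arbitrary finite test sequence a_n (hypothetical values, chosen only for illustration) and compares a Simpson quadrature of ∫₀^∞ e^{−sz} e^{−z} Σ a_n z^n/n! dz against the series Σ a_n (s+1)^{−n−1}:

```python
import math

# Finite test sequence a_0..a_5 (hypothetical values, for illustration only)
a = [0, 1, 3, 2, 5, 4]
s = 0.7

# Euler-transform side: sum a_n (s+1)^{-n-1}
rhs = sum(an * (s + 1) ** (-n - 1) for n, an in enumerate(a))

# Laplace side: the Poisson generating function e^{-z} sum a_n z^n/n!,
# integrated against e^{-sz} by composite Simpson quadrature on [0, 60]
# (the integrand decays like e^{-(s+1)z}, so the truncated tail is tiny).
def poisson_gf(z):
    return math.exp(-z) * sum(an * z ** n / math.factorial(n)
                              for n, an in enumerate(a))

N, Z = 20000, 60.0
h = Z / N
lhs = poisson_gf(0.0) + math.exp(-s * Z) * poisson_gf(Z)
for i in range(1, N):
    z = i * h
    lhs += (4 if i % 2 else 2) * math.exp(-s * z) * poisson_gf(z)
lhs *= h / 3

print(lhs, rhs)  # the two sides agree to high accuracy
```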

Variance of the internal path-length
In this section, we apply the Laplace-Mellin-de-Poissonization approach to the Poissonized variance with correction Ṽ(z) := f̃₂(z) − f̃₁(z)² − z f̃₁′(z)², aiming at proving Theorem 2.1. Focusing on Ṽ instead of on f̃₂ removes from the start all the heavy cancellations involved in dealing with the variance, a key step differing from all previous approaches.
Laplace and Mellin transforms. The following lemma will be useful.
We now apply the Laplace transform to both sides of (32). First, observe that the Laplace transform of Ṽ(z) exists and is analytic in C \ (−∞, 0]. Then, by (32), we obtain a functional equation for the transform. By (23), we have an explicit expression for the inhomogeneous part; substituting this and the partial fraction expansion into (35), we obtain (10).
Finally, standard Laplace inversion gives the asymptotics of Ṽ(z), uniformly for |z| → ∞ and |arg(z)| ≤ π/2 − ε. Since f̃₂(z) = Ṽ(z) + f̃₁(z)² + z f̃₁′(z)², we see from (36) and (25) that Proposition 2.5 and Theorem 2.1 follow by straightforward expansion. More refined calculations show that the two terms following Ṽ(n) are both O(1) and periodic in nature. It is possible to further extend the same idea and derive a full asymptotic expansion of a similar nature; details will be presented in a future paper.
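The asymptotic linearity of the variance asserted in Theorem 2.1 can be sanity-checked directly from the exact moment recurrences implied by the binomial splitting at the root (a numerical illustration only, not the analytic method of the paper; the recurrence below follows from X_{n+1} = X_{B_n} + X*_{n−B_n} + n):

```python
from math import comb

# Exact first and second moments of the internal path-length (IPL) of a
# random DST via the binomial-splitting recurrence
# X_{n+1} =d X_{B_n} + X*_{n-B_n} + n,  B_n ~ Binomial(n, 1/2).
N = 400
mu = [0.0] * (N + 1)   # mu[n] = E(X_n)
m2 = [0.0] * (N + 1)   # m2[n] = E(X_n^2)
for n in range(N):
    p = [comb(n, k) * 0.5 ** n for k in range(n + 1)]
    mu[n + 1] = n + sum(p[k] * (mu[k] + mu[n - k]) for k in range(n + 1))
    m2[n + 1] = sum(
        p[k] * (m2[k] + m2[n - k] + 2 * mu[k] * mu[n - k]
                + 2 * n * (mu[k] + mu[n - k]) + n * n)
        for k in range(n + 1)
    )
var = [m2[n] - mu[n] ** 2 for n in range(N + 1)]
# The mean grows like n log2(n), while V(X_n)/n stays bounded, in line
# with the asymptotically linear variance.
print(var[200] / 200, var[400] / 400)
```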

Bucket Digital Search Trees
In this section, we extend the same approach to bucket digital search trees (b-DSTs), in which each node can hold up to b keys. The construction rule is the same as for DSTs, except that keys accumulate in a node as long as it contains fewer than b of them; see Figure 8 for a simple example with b = 2. DSTs correspond to b = 1.
Note that when b ≥ 2 we can distinguish two different types of total path-length: the total path-length of all keys (summing the distance of each key to the root over all keys), referred to as the total key-wise path-length (KPL), and the total path-length of all nodes (summing the distance of each node to the root over all nodes, regardless of the number of keys in each node), referred to as the total node-wise path-length (NPL). When b = 1 the two total path-lengths coincide. For simplicity, we will write KPL and NPL, dropping the adjective "total". While the expected values of both path-lengths are of order n log n under the same independent Bernoulli model, their variances surprisingly turn out to exhibit very different behavior; see Table 1.
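A small simulation makes the KPL/NPL distinction concrete. The sketch below (an illustrative implementation under the Bernoulli model, drawing the bits of each key lazily) builds a random b-DST and computes both path-lengths; for b = 1 they coincide node for node.

```python
import random

class Node:
    __slots__ = ("keys", "child")
    def __init__(self):
        self.keys, self.child = 0, [None, None]

def build_bdst(n, b, rng):
    """Insert n keys into a b-DST; each key's binary string is drawn lazily,
    one fresh unbiased bit per level, matching the Bernoulli model."""
    root = Node()
    for _ in range(n):
        node = root
        while node.keys >= b:              # node full: branch on next bit
            bit = rng.randrange(2)
            if node.child[bit] is None:
                node.child[bit] = Node()
            node = node.child[bit]
        node.keys += 1                     # key lodges in first non-full node
    return root

def path_lengths(node, depth=0):
    """Return (KPL, NPL) = (sum of key depths, sum of node depths)."""
    kpl, npl = node.keys * depth, depth
    for c in node.child:
        if c is not None:
            k, p = path_lengths(c, depth + 1)
            kpl, npl = kpl + k, npl + p
    return kpl, npl

rng = random.Random(1)
kpl, npl = path_lengths(build_bdst(1000, 1, rng))
print(kpl == npl)  # True: for b = 1 the two path-lengths coincide
```

For b ≥ 2, every node at depth d with j keys contributes jd to KPL but only d to NPL, so KPL dominates NPL pointwise.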

Key-wise path-length (KPL)
We assume the same independent Bernoulli model for the input strings. Let X_n denote the KPL in a random b-DST built from n random strings. Then, by definition and the independence assumption of the model, X_n satisfies a distributional recurrence with the initial conditions X₀ = · · · = X_{b−1} = 0. Here B_n ∼ Binomial(n, 1/2), X*_n is an independent copy of X_n, and X_n, X*_n, B_n are independent.
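For illustration, the mean of the KPL can be tabulated exactly from this recurrence (a sketch, assuming, as the recurrence indicates, that the b root keys contribute nothing and each of the remaining n − b keys gains one level):

```python
from math import comb

def kpl_mean(N, b):
    """E(X_n) for the key-wise path-length of a random b-DST, from
    X_n =d X_B + X*_{n-b-B} + (n - b),  B ~ Binomial(n - b, 1/2),
    with X_0 = ... = X_{b-1} = 0."""
    mu = [0.0] * (N + 1)
    for n in range(b, N + 1):
        m = n - b                  # keys passed on to the two subtrees
        mu[n] = m + sum(comb(m, k) * 0.5 ** m * (mu[k] + mu[m - k])
                        for k in range(m + 1))
    return mu

mu1 = kpl_mean(20, 1)    # b = 1 reduces to the internal path-length
mu2 = kpl_mean(500, 2)
print(mu1[3])            # 2.5
print(mu2[500])          # leading order n log2(n), as in Hubalek's result
```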

Known and new results.
Hubalek [30] showed, by the Flajolet-Richmond approach, that the mean satisfies E(X_n) = (n + b) log₂ n + n(c₂ + ϖ₃(log₂ n)) + c₃ + ϖ₄(log₂ n) + O(n⁻¹ log n), where c₂, c₃ are effectively computable constants and ϖ₃ and ϖ₄ are very smooth periodic functions. He also proved that the variance is asymptotically linear, where the leading constant C_h is expressed by a very long, involved expression and ϖ_h is a periodic function. We improve this estimate by deriving a much simpler expression for the periodic function, including its average value C_h. To state our result, we define the following functions. It is easily seen that g̃(z) is of the form given below, where the coefficients g̃_{i₁,i₂}, g̃′_{i₁,i₂} ≥ 0 are given explicitly, both being symmetric in i₁ and i₂. We then define the function needed in Theorem 3.1, which is well-defined for ℜ(ω) > 0, as we will see later.

Theorem 3.1. The variance of the total key-wise path-length of random b-DSTs of n strings satisfies
where the leading constant and the periodic function are given explicitly in terms of the functions defined above. By straightforward truncations, expansions, and approximations, we obtain the following numerical values for b = 1, . . . , 5.
with F̃(0, y) = 1. From this form, the asymptotic analysis of the mean value and that of the variance proceed along exactly the same lines developed in the previous section. Thus we briefly sketch the principal steps of the analysis, leaving the details to the interested reader.
The expected value of X_n. From (41), we derive a differential-functional equation for the Poisson generating function of the mean, with the initial conditions f̃₁^{(j)}(0) = 0 for 0 ≤ j < b. Before applying the Laplace-Mellin approach, we first need a transfer-type result similar to Proposition 2.4.
Proof. (Sketch) The same proof as that for Proposition 2.4 applies mutatis mutandis to (42). The only difference is that we now have f(z) := e^z f̃(z) and g(z) := e^z g̃(z), so that (14) has the extended representation given there. All required estimates can be derived by the same arguments.
The Laplace transform of f̃₁ now satisfies a functional equation for ℜ(s) > 0, from which we obtain a series representation extending (22). From this series and partial fraction expansions, we can derive a closed-form expression for f̃₁(z), which becomes messy, especially for large b. Define, as before, the normalized Laplace transform L̃(s). The resulting relation is almost the same as (26); thus the same Mellin analysis given there carries over, and we deduce the asymptotics of L̃(s). Consequently, by the Laplace inversion, we obtain the asymptotics of f̃₁(z), uniformly for |z| → ∞ and |arg(z)| ≤ π/2 − ε. From this and Propositions 3.2 and 2.2, we obtain the corresponding estimates for any k = 1, 2, . . . ; finally, we deduce the expansion of E(X_n).

Variance of X_n. The analysis here is again similar to that for the mean. Let f̃₂(z) denote the Poisson generating function of the second moment E(X_n²). Then, by (41), f̃₂ satisfies a differential-functional equation with the first b Taylor coefficients zero. Define again Ṽ(z) := f̃₂(z) − f̃₁(z)² − z f̃₁′(z)², where g̃(z) is given in (38). By the representations (39) and (43), we have an estimate for the inhomogeneous part, uniformly in the sector |arg(z)| ≤ π/2 − ε. This is similar to the corresponding estimate (34) in the analysis of the variance in the previous section. The same procedure applies, and we deduce (40).

Node-wise path-length (NPL)
We consider in this section the total node-wise path-length (NPL). Under the same independent Bernoulli model, we still use X_n to denote the NPL in a random b-DST of n binary strings with node capacity b ≥ 2. Also, let N_n stand for the total number of nodes (the space requirement) of a random b-DST of n strings. Despite NPL being one of the most natural shape measures for b-DSTs, its study here seems to be new. For N_n, it is known that the distribution is asymptotically normal, with the mean and the variance both asymptotically n times a (different) smooth periodic function; see [31]. In contrast to (40) for the variance of KPL, what is unexpected and surprising here is that the variance of X_n is of order n(log n)².

Theorem 3.3. Assume b ≥ 2. The mean of N_n and that of X_n satisfy the following asymptotic relations.
Intuitively, that the variance of NPL is larger than that of KPL can be seen from the definition of NPL, which depends on the random variable N_n (see (46)), while KPL depends only on n (in addition to the two subtrees). The following figure shows the first few values of the variance of NPL and that of KPL.

We see that the variance of NPL increases faster than that of KPL. Note that the periodic functions of the dominant terms are all equal, implying that the correlation coefficient of N n and X n is asymptotically 1.
On the other hand, the mean value c₁,₀ of P₁,₀(t) is given by an explicit formula; numerical approximations to c₁,₀ for the first few b are as follows. For b = 1 the value is 1 by (28), which is consistent with the fact that N_n ≡ n in this case. When b = 2, we see that about 42.5% of nodes on average contain two keys and 14% of nodes a single key. The storage utilization is thus not too bad.
From (44) and these numerical values, we see that, in contrast to the expected KPL, which is asymptotic to n log 2 n for all b, the expected NPL provides a better indication of the "shape variation" of random b-DSTs.
Our analysis is based on straightforward distributional recurrences with the initial conditions N₀ = 0, N₁ = · · · = N_{b−1} = 1 and X₀ = · · · = X_{b−1} = 0. Here again B_n ∼ Binomial(n, 1/2), N*_n and X*_n are independent copies of N_n and X_n, and X_n, X*_n, B_n, as well as N_n, N*_n, B_n, are independent.
Generating functions. Define M_n(u, v) := E(e^{N_n u + X_n v}). Then (46) translates into a recurrence for M_n(u, v), which in turn yields a functional equation for the bivariate Poisson generating function F̃(z, u, v), with the appropriate initial conditions. For the moments, if we expand F̃(z, u, v) in powers of u and v, then f̃_{j,m−j}(z) is the Poisson generating function of E(N_n^j X_n^{m−j}). Thus all moments of X_n and N_n, and of their products, can be computed by taking suitable derivatives of (47) with respect to u and v and then substituting u = v = 0.
Expected number of nodes and expected node-wise path-length. By taking first derivatives of (47), we obtain differential-functional equations for f̃₁,₀ and f̃₀,₁, the initial conditions being zero at the origin. We can apply the Laplace-Mellin approach as before, starting from the mean of N_n. Taking the Laplace transform, which exists for ℜ(s) > 0, gives a functional equation. Unlike all previous cases, iterating this functional equation leads to a divergent series. Although this problem can be solved by subtracting a sufficient number of initial terms of f̃₁,₀(z), the approach we use does not rely on this and avoids such a consideration completely.
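Before any asymptotics, E(N_n) can be tabulated exactly from the distributional recurrence (46); the sketch below does this (the initial conditions N₀ = 0, N₁ = · · · = N_{b−1} = 1 are those stated above). For b = 1 it reproduces N_n ≡ n.

```python
from math import comb

def node_count_mean(N, b):
    """E(N_n) for the number of nodes of a random b-DST, from the recurrence
    N_n =d N_B + N*_{n-b-B} + 1 (n >= b), B ~ Binomial(n - b, 1/2), with
    N_0 = 0 and N_1 = ... = N_{b-1} = 1 (a single, partially filled root)."""
    nu = [0.0] * (N + 1)
    for n in range(1, min(b, N + 1)):
        nu[n] = 1.0
    for n in range(b, N + 1):
        m = n - b
        nu[n] = 1 + sum(comb(m, k) * 0.5 ** m * (nu[k] + nu[m - k])
                        for k in range(m + 1))
    return nu

nu1 = node_count_mean(50, 1)
nu2 = node_count_mean(500, 2)
print(nu1[50])          # essentially 50: for b = 1, N_n = n
print(nu2[500] / 500)   # average number of nodes per key when b = 2
```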

Digital search trees. II. More shape parameters.
We consider in this section four additional examples on DSTs whose variances are essentially linear. The same tools readily apply to b-DSTs, but we focus on DSTs because the results are easier to state, and the asymptotic behaviors do not differ in essence from those for the more general b-DSTs, whose corresponding expressions are, however, much messier.
The first parameter we consider is the so-called w-parameter (see [16]), which is the sum, over all leaves, of the subtree-size of the parent-node of each leaf. Instead of w-parameter, we call it the total peripheral path-length (PPL), since it measures to some extent the fringe ampleness of the tree; this is also consistent with the two previous notions of path-length we distinguished.
Then we consider the number of leaves, which has previously been studied in detail in [26,31,39] and which is closely connected to PPL. Our expression for the variance simplifies known ones.
Yet another notion of path-length we consider here is the so-called Colless index in phylogenetics, which is the sum, over all nodes, of the absolute difference of the two subtree-sizes. We call this index the total differential path-length (DPL), as it clearly indicates the balance or symmetry of the tree. Another widely used measure of imbalance in phylogenetics is the Sackin index, which is nothing but the external path-length.
The last example we consider is the weighted path-length (WPL), which often arises in coding, optimization and many related problems.
The orders of the means and the variances exhibited by all the shape parameters we study in this paper are listed in Table 1.

Peripheral path-length (PPL)
The PPL (or w-parameter) was introduced in [16], with motivations arising from the analysis of compression algorithms. We start from the fringe-size of a leaf node λ, defined to be the size of the subtree rooted at its parent-node; see Figure 9. The PPL of a tree is then defined to be the sum of the fringe-sizes of all leaf-nodes. Let X_n denote the PPL in a DST built from n random binary strings under our usual independent Bernoulli model. Drmota et al. showed in [16] that E(X_n) is asymptotically n times a bounded periodic function; note that by (24), we have the corresponding identities. The asymptotic behavior (51) is to be compared with the n log n order exhibited by most other log-trees such as binary search trees and recursive trees; see [16]. It reflects that most fringes of random DSTs are small in size; see Figure 3. Indeed, since the expected number of leaves is also asymptotic to n times a periodic function, the result (51) implies that the average size of a fringe in a random DST is bounded. We show that the standard deviation is also small. Define g̃ accordingly; our result states that the variance is asymptotic to n P_w(log₂ n), where P_w(t) is a smooth, 1-periodic function whose Fourier series expansion is absolutely convergent.
We provide only the major steps of the proof since it follows the same approach we developed above.
Recurrence and generating functions. By definition, and by conditioning on the size of one of the two subtrees of the root (the possible configurations having one subtree of size n − 1, of size n − 2, or of sizes k and n − 1 − k), we derive the recurrence for the PPL:

X_n =  …,                  with probability 2^{2−n};
       n + X_{n−2},         with probability (n − 1) 2^{2−n};
       X_k + X*_{n−1−k},    with probability 2^{1−n} C(n−1, k),  2 ≤ k ≤ n − 3,

where X₀ = X₁ = 0, X₂ = 2, and X₃ equals 6 with probability 1/2 and 2 with probability 1/2.
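The recurrence can be cross-checked by direct simulation; the following sketch builds random DSTs under the Bernoulli model and computes the PPL from its definition (fringe-size of the parent of each leaf). For n = 2 it always returns 2, and for n = 3 it takes the values 2 and 6, each with probability 1/2.

```python
import random

def build_dst(n, rng):
    """Random DST as nested dicts; the first key occupies the root, and each
    later key walks down by fresh unbiased bits until it finds an empty slot."""
    if n == 0:
        return None
    root = {"child": [None, None]}
    for _ in range(n - 1):
        node = root
        while True:
            bit = rng.randrange(2)
            if node["child"][bit] is None:
                node["child"][bit] = {"child": [None, None]}
                break
            node = node["child"][bit]
    return root

def size(node):
    return 0 if node is None else 1 + sum(size(c) for c in node["child"])

def ppl(node):
    """Total peripheral path-length: for every leaf, add the size of the
    subtree rooted at the leaf's parent."""
    total, sz = 0, size(node)
    for c in node["child"]:
        if c is not None:
            if c["child"][0] is None and c["child"][1] is None:
                total += sz            # c is a leaf; its parent is `node`
            total += ppl(c)
    return total

rng = random.Random(7)
print(ppl(build_dst(2, rng)))        # always 2
vals = [ppl(build_dst(1000, rng)) for _ in range(20)]
print(sum(vals) / len(vals) / 1000)  # bounded: E(X_n) is ~ n times a periodic function
```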
From this recurrence, it follows that the bivariate Poisson generating function F̃(z, y) := e^{−z} Σ_{n≥0} E(e^{X_n y}) z^n/n! satisfies a nonlinear differential-functional equation with the initial condition F̃(0, y) = 1.
The expected PPL. From (54), we obtain the differential-functional equation for f̃₁(z) by taking the derivative with respect to y and then substituting y = 1, with f̃₁(0) = 0. The Laplace transform of f̃₁ satisfies a corresponding functional equation, and a straightforward application of the Laplace-Mellin-de-Poissonization approach yields the asymptotics of E(X_n), the leading term being n times a smooth, 1-periodic function of log₂ n. The O(1)-term can be further refined by the same analysis. In particular, we get an alternative expression for C_w; that the two expressions of C_w are identical can be proved by the standard calculus of residues; see [24] for similar details.
Applying again the Laplace-Mellin-de-Poissonization approach, we deduce (53). In particular, the mean value of the periodic function P_w can be evaluated explicitly.

The number of leaves
The leaves of a tree are the locations where the nodes holding newly arriving keys will be attached; thus different types of data fields can be used to save memory, notably for b-DSTs. The number of leaves then provides a quick and simple look at the "fringes" of a tree. Such nodes are sometimes referred to as external-internal nodes or internal endnodes in the literature; see [16,26,41,56]. Let X_n denote the number of leaves in a random DST of n keys. Then X_n satisfies a distributional recurrence with X₀ = 0 and X₁ = 1, where B_n ∼ Binomial(n, 1/2). Flajolet and Sedgewick [26], solving an open question raised by Knuth, showed that E(X_n) = n(C_fs + ϖ_fs(log₂ n)) to first order, where ϖ_fs(t) is a smooth, 1-periodic function and C_fs is an explicit constant. A finer approximation, together with an alternative (and numerically better) expression for C_fs, was derived by Kirschenhofer and Prodinger [39]; see also [56]. They additionally proved the asymptotic linearity of the variance, V(X_n) ∼ n(C_kp + ϖ_kp(log₂ n)), where ϖ_kp is a smooth, 1-periodic function with mean zero and a long, complicated expression was given for the leading constant C_kp. We derive different forms for these two asymptotic approximations. Define g̃₂ as in (58), and let f̃₁(z) := e^{−z} Σ_{n≥0} E(X_n) z^n/n!.

Theorem 4.2. The mean and the variance of the number of leaves are both asymptotically linear, the two series in the approximations being absolutely convergent, with G₁, G₂ defined as above.

Sketch of proof. From (57), we derive the equation for the bivariate generating function F̃(z, y) := e^{−z} Σ_{n≥0} E(e^{X_n y}) z^n/n!, with F̃(0, y) = 1. Then the Poisson generating functions of the first two moments satisfy similar equations, with Ṽ(0) = 0, where g̃₂ is given in (58). The remaining analysis follows the same pattern as above and is omitted.
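The mean can again be tabulated exactly from the recurrence (illustrative only; the value 1.5 for three keys matches the two equally likely shapes, with one or two leaves):

```python
from math import comb

# E(X_n) for the number of leaves of a random DST, from the recurrence
# X_{n+1} =d X_{B_n} + X*_{n-B_n},  B_n ~ Binomial(n, 1/2), X_0 = 0, X_1 = 1.
N = 1000
lam = [0.0] * (N + 1)
lam[1] = 1.0
for n in range(1, N):
    lam[n + 1] = sum(comb(n, k) * 0.5 ** n * (lam[k] + lam[n - k])
                     for k in range(n + 1))
print(lam[3])       # 1.5
print(lam[N] / N)   # stabilizes near the leading constant, up to tiny fluctuations
```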
We provide instead some details for the numerical evaluation of the constant C kp as defined in (59), which is similar to the case of internal path-length of DSTs.
Then, by the partial fraction expansion, we obtain the desired numerical evaluation; obviously, lim_{ℓ→∞} δ_ℓ = 4. Now, by the inverse Laplace transform, we obtain a series converging for all z. Also, from [26], we have a recurrence for the functions M_n(y), with M₀(y) = 1. Let also Ṽ denote, as before, the Poissonized variance with correction.

Theorem 4.3. The mean and the variance of the DPL of random DSTs satisfy the asymptotic relations E(X_n) = n P_{d,µ}(log₂ n) + o(n) and V(X_n) = (1 − 2/π) n log₂ n + n P_{d,σ}(log₂ n) + o(n), where P_{d,µ} and P_{d,σ} are explicitly computable, smooth, 1-periodic functions.
These results are to be compared with the known results for random binary search trees, for which the DPL has mean of order n log n and variance of order n²; see [4].
Expected DPL. The approach we follow here for deriving the differential-functional equations satisfied by the Poisson generating functions of the first two moments is slightly different from the one used above, since the corresponding nonlinear equation for the bivariate generating function F(z, y) := Σ_{n≥0} M_n(y) z^n/n! is very involved, as given below.
It is known (see [63]) that certain estimates hold as |z| → ∞, the O-terms holding uniformly in z in each case. Thus, by (65), h̃₁ ∈ JS, and we can apply the same approach to deduce the asymptotics of the mean. This proves (63). Numerically, the mean value of the dominant periodic function is G₁(2)/log 2 ≈ 1.33907 46494.

Lemma 4.4. The function h̃_c is JS-admissible and satisfies the required growth estimates.
Weighted path-length (WPL)

The WPL is defined as W_n := Σ_{1≤j≤n} w_j d_j, where d_j denotes the distance of the j-th node (nodes being numbered in their incoming order) to the root and w_j the weight attached to the j-th node. The calculation of W_n in the case of random DSTs can be carried out recursively by assuming that the root is labelled 1. We consider in this section the case w_j = (log j)^m, m ≥ 1.
From a technical point of view, it suffices to consider the random variables X_{n+1} =d X_{B_n} + X*_{n−B_n} + (n + 1)(log(n + 1))^m (n ≥ 0), with X₀ = 0, since the partial sum Σ_{2≤j≤n} (log j)^m is nothing but a smooth function of n, on whose analytic properties our analytic approach heavily relies. The random variables X_n represent the sole example on DSTs we discuss in this paper with non-integral values; they also exhibit an interesting phenomenon in that the mean is of order n(log n)^{m+1} while the variance is asymptotic to n times a periodic function, in contrast to the orders for DPL. That the variance is linear is well predicted by the deep theorem of Schachinger [58], since the second difference of the sequence n(log n)^m is o(n^{−1/2−ε}). Our approach has the advantage of providing more precise approximations.
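The contrast between the n(log n)^{m+1} mean and the linear variance can be probed numerically via the exact mean recurrence (a sketch; m = 1 shown):

```python
from math import comb, log

def wpl_mean(N, m):
    """E(X_n) from X_{n+1} =d X_{B_n} + X*_{n-B_n} + (n+1)(log(n+1))^m,
    B_n ~ Binomial(n, 1/2), X_0 = 0."""
    mu = [0.0] * (N + 1)
    for n in range(N):
        toll = (n + 1) * log(n + 1) ** m
        mu[n + 1] = toll + sum(comb(n, k) * 0.5 ** n * (mu[k] + mu[n - k])
                               for k in range(n + 1))
    return mu

mu = wpl_mean(500, 1)
print(mu[2])                             # 2 log 2: exactly the n = 2 toll
print(mu[500] / (500 * log(500) ** 2))   # bounded ratio: mean of order n (log n)^{m+1}
```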
The new ingredient we need is incorporated in the following lemma. Indeed, the tools developed in [21] can also be easily extended to similar "toll-functions" such as nH_n^m. Details are left to the interested reader.

Conclusions and extensions
We showed in this paper, through many shape parameters of random DSTs, that the crucial use of the normalization Ṽ(z) := f̃₂(z) − f̃₁(z)² − z f̃₁′(z)² at the level of Poisson generating functions is extremely helpful in simplifying the asymptotic analysis of the variance as well as the resulting expressions. The same idea can be applied to a large number of concrete problems with a binomial splitting procedure. These and some related topics and extensions will be pursued elsewhere. We briefly mention in this final section some extensions and related properties.
Central limit theorems. All shape parameters we considered in this paper are asymptotically normally distributed in the sense of convergence in distribution. We describe the results in this section and merely indicate the methods of proofs. The only case that requires a separate study is NPL of random b-DSTs with b ≥ 2 (a bivariate consideration of the limit laws is needed), details being given in a future paper.
Theorem 5.1. The internal path-length, the peripheral path-length, the number of leaves, the differential path-length, the weighted path-length of random DSTs, and the key-wise path-length of random b-DSTs with b ≥ 2 are all asymptotically normally distributed:

(X_n − E(X_n))/√V(X_n) →d N(0, 1),

where X_n denotes any of these shape parameters, →d stands for convergence in distribution, and N(0, 1) is the standard normal distribution with zero mean and unit variance.
See Figure 10 for a plot of the histograms of DPL. The method of moments applies to all these cases and establishes the central limit theorems; the details are similar to those in [31] (the asymptotic normality of the number of leaves having already been proved there as a special case).
In a parallel way, the contraction method also works well for all these shape parameters; see [51,52,53]. On the other hand, Schachinger's asymptotic normality results cover the IPL, PPL, the number of leaves and WPL, but not the NPL and KPL of b-DSTs, although his approach may be modified for that purpose.
Finally, the complex-analytic approach used in [35] for the internal path-length may be extended to prove some of these cases, but the proofs are messy, although the results established are often stronger (for example, they come with convergence rates).

The depth. The asymptotic analysis we used in this paper can also be extended to the depth (the distance between a randomly chosen internal node and the root), although it is of logarithmic order. Let X_n denote the depth of a random DST of n nodes. The starting point is to consider the expected profile polynomial P_n(y) := Σ_{0≤k<n} n P(X_n = k) y^k, where n P(X_n = k) is nothing but the expected number of internal nodes at distance k from the root. Then we have the recurrence

P_{n+1}(y) = 1 + y 2^{−n} Σ_{0≤k≤n} C(n, k) (P_k(y) + P_{n−k}(y))  (n ≥ 0),

from which the moment generating function of X_n can be analyzed for |y − 1| ≤ ε. More precisely, if t ∈ C lies in a small neighborhood of the origin, then

E(e^{X_n t}) = P_n(e^t)/n = ((e^t − 1) Q(e^t))/(Q(1) log 2) Σ_{k∈Z} Γ(−1 − t/log 2 − χ_k) n^{t/log 2 + χ_k} (1 + O(n⁻¹)) + O(n⁻¹),

uniformly for |t| ≤ ε. Alternatively, one can also apply the Laplace-Mellin-de-Poissonization approach and obtain the same type of result, not only for DSTs but also for the more general b-DSTs. See [48,49] for a more general and detailed treatment (by a different approach). The estimate (68) leads to effective asymptotic estimates for all moments of X_n − log₂ n by standard arguments; see [32]. In particular, we obtain

E(X_n) = log₂ n + (γ − 1)/log 2 + 1/2 − Σ_{k≥1} 1/(2^k − 1) + ϖ₁(log₂ n) + O(n⁻¹ log n),
V(X_n) = 1/12 + (1/(log 2)²)(1 + π²/6) − Σ_{k≥1} 2^k/(2^k − 1)² + ϖ₅(log₂ n) + O(n⁻¹ log² n),

where the estimate for the mean is exactly (7) with ϖ₁ given in (8), and ϖ₅ is a smooth periodic function.
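The depth recurrence is easy to exploit numerically: differentiating the profile-polynomial recurrence at y = 1 shows that S_n := P_n′(1) obeys S_{n+1} = n + 2^{1−n} Σ_k C(n, k) S_k, so the mean depth is S_n/n. The sketch below checks that S_n/n − log₂ n stabilizes, as the expansion for E(X_n) predicts:

```python
from math import comb, log2

# S_n = P_n'(1) = expected sum of depths of all internal nodes; since
# P_n(1) = n, differentiating the profile recurrence at y = 1 yields
# S_{n+1} = n + 2^{1-n} sum_k C(n,k) S_k, and E(depth) = S_n / n.
N = 500
S = [0.0] * (N + 1)
for n in range(N):
    S[n + 1] = n + sum(comb(n, k) * 0.5 ** n * (S[k] + S[n - k])
                       for k in range(n + 1))
d250 = S[250] / 250 - log2(250)
d500 = S[500] / 500 - log2(500)
print(d250, d500)  # nearly equal: E(X_n) - log2(n) tends to a constant
```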
An analytic extension. From a purely analytic viewpoint, the underlying differential-functional equation (13) for the moments can be extended to an equation of the form

Σ_{0≤j≤b} C(b, j) f̃^{(j)}(z) = α f̃(z/β) + g̃(z)  (α > 0; β > 1),

for which our approach still applies, leading to a functional equation for the Laplace transform; the corresponding Laplace-Mellin asymptotic analysis is similar.
In particular, the case α = β = m corresponds to a straightforward extension of binary DSTs to m-ary DSTs (and of the unbiased binary Bernoulli random variable to the uniform distribution over {0, 1, . . . , m − 1}). The stochastic behaviors of all shape parameters on such trees follow the same patterns as shown in this paper.
Yet another concrete instance arises in the so-called Eden model studied by Dean and Majumdar [10], which corresponds to α = m and β > 1. The model is constructed as follows. We start at time t = 0 with a single empty node. Then at time t = T, where T ∼ Exponential(1), we fill the empty node and attach to it m different empty nodes. The process then continues independently for each empty node by the following recursive rule: once an empty node of depth j is attached to the tree at time t′, it is filled at time t′ + T, where T ∼ Exponential(β^j), and m new empty nodes are then attached to it.
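The Eden model is easy to simulate event by event. The sketch below assumes Exponential(β^j) denotes an exponential law with rate β^j (so deeper nodes fill faster when β > 1), and caps the event count, since the expected fill times along a path then sum to a finite constant and the tree can grow explosively:

```python
import heapq
import random

def eden_filled_count(m, beta, t_max, rng, cap=20000):
    """Simulate the Eden-type model: an empty node attached at depth j at
    time t' is filled at t' + Exponential(rate = beta**j), and m new empty
    nodes are then attached to it.  Return the number of nodes filled by
    time t_max (event count capped for safety)."""
    events = [(rng.expovariate(1.0), 0)]   # (fill time, depth) of the root
    filled = 0
    while events and filled < cap:
        t, depth = heapq.heappop(events)   # earliest pending fill
        if t > t_max:
            break
        filled += 1
        rate = beta ** (depth + 1)
        for _ in range(m):
            heapq.heappush(events, (t + rng.expovariate(rate), depth + 1))
    return filled

rng = random.Random(42)
counts = [eden_filled_count(2, 2.0, 0.5, rng) for _ in range(100)]
print(sum(counts) / len(counts))  # average number of filled nodes at t = 0.5
```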
The mean and the variance of the number of filled nodes at a large time are studied in detail in [10]. Since the model is continuous, there is no need to de-Poissonize to derive the asymptotics of the coefficients; as a consequence, no correction term of the kind we used in this paper is required for the asymptotics of the variance.
Other DST-type recurrences. While the technique of Poissonized variance with correction remains useful for the natural case when the Bernoulli random variable is no longer symmetric, the Laplace-Mellin approach does not apply directly. Other asymptotic ingredients are needed such as a direct manipulation of the Mellin transforms; see [49] and the references therein.
DST-type structures and recurrences also arise in other statistical physical models such as the diffusionlimited aggregation; see [1,5].
In most cases, we have the estimate µ_k = f̃₁(k) + O(k^ε). This, together with the Gaussian approximation of the binomial distribution, implies that µ_n is essentially a weighted average of f̃₁(k) over the central range |k − n/2| = o(n^{2/3}), with binomial weights π_{n,k}. The difference E(T_n) − h̃₁(n), which is roughly of order n|h̃₁″(n)|, is expected to be small, say O(n^ε), in all the cases we consider here. Consequently, the variance is asymptotically linear; see [31,58] for more precise details.
We see clearly that the smallness of the variance results naturally from the high concentration of the binomial distribution near its mean.
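This concentration heuristic can be made concrete: with a smooth stand-in f(x) = x log x playing the role of f̃₁ (an illustrative choice, not a quantity from the paper), the binomial average E f(B_n) differs from f(n/2) only by the bounded correction (Var(B_n)/2) f″(n/2) = 1/4:

```python
from math import comb, log

# Binomial concentration: for B_n ~ Binomial(n, 1/2) and smooth f,
# E f(B_n) = f(n/2) + (Var(B_n)/2) f''(n/2) + O(1/n), Var(B_n) = n/4.
n = 1000
f = lambda x: x * log(x)
# Skip k = 0 (weight 2^{-n}, and f -> 0 there by continuity).
avg = sum(comb(n, k) * 0.5 ** n * f(k) for k in range(1, n + 1))
approx = f(n / 2) + (n / 8) * (1 / (n / 2))   # f''(x) = 1/x
print(avg - f(n / 2))   # about 0.25: a bounded shift, not growing with n
print(avg - approx)     # much smaller: the residual is O(1/n)
```

The bounded, O(1) size of the shift is exactly why the Poissonized variance with correction tracks the true variance so closely.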