Persisting randomness in randomly growing discrete structures: graphs and search trees

The successive discrete structures generated by a sequential algorithm from random input constitute a Markov chain that may exhibit long term dependence on its first few input values. Using examples from random graph theory and search algorithms we show how such persistence of randomness can be detected and quantified with techniques from discrete potential theory. We also show that this approach can be used to obtain strong limit theorems in cases where previously only distributional convergence was known.


Introduction
Given a sequence t 1 , t 2 , . . . of input values, a sequential algorithm produces an output sequence x 1 , x 2 , . . ., where the next output x n+1 depends on the current state x n and the next input t n+1 only. For cases where x n is a discrete structure, such as a permutation or a graph, and where the input values are realizations of independent random variables with the same distribution, the output sequence is a Markov chain X = (X n ) n∈N that is adapted to a combinatorial family F in the sense that X n takes its values in the subset F n ⊂ F of objects with base parameter n. Markov chains of this type often exhibit persisting randomness: Informally, this means that the influence of early values does not disappear as time goes by; formally, it means that the tail σ-field T (X) associated with X is not trivial. Further, such chains eventually leave every fixed finite subset of F with probability 1, which leads to the related problem of finding a state space completion that captures the information contained in T (X).
The classical example is the Pólya urn: Initially, at time n = 1, the urn contains one red and one blue ball. At each time n = 2, 3, . . ., a ball is selected uniformly at random and put back, together with another ball of the same colour. Here F_n is the set of pairs (i, j) ∈ N_0 × N_0 with i + j + 1 = n, where i and j are the numbers of balls of the two colours added up to time n ∈ N. A suitable augmentation of the state space F = N_0 × N_0 is obtained by regarding a sequence ((i_n, j_n))_{n∈N} with i_n + j_n → ∞ as convergent if and only if (i_n + 1)/(i_n + j_n + 2) (the proportion of red balls) tends to a value α ∈ [0, 1] as n → ∞, which leads to a state space completion F̄ that may be represented by F ∪ [0, 1]. With respect to this convergence we have X_n → X_∞ almost surely, where X_∞ generates the (non-trivial) tail σ-field T(X).
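The persistence of randomness in the urn is easy to observe in simulation. The following sketch (function names and parameters are ours, not from the paper) tracks the state (i, j) of the chain and prints the proportion of red balls after many steps for several seeds; different runs settle near different limits.

```python
import random

def polya_urn(steps, seed):
    """Simulate a Polya urn started with one red and one blue ball.

    Returns the trajectory of states (i, j), where i and j are the numbers
    of red and blue balls added up to time n = 1, ..., steps.
    """
    rng = random.Random(seed)
    red_added, blue_added = 0, 0          # state at time n = 1
    states = [(0, 0)]
    for _ in range(steps - 1):
        red = red_added + 1               # red balls currently in the urn
        total = red_added + blue_added + 2
        if rng.random() < red / total:
            red_added += 1
        else:
            blue_added += 1
        states.append((red_added, blue_added))
    return states

# The proportion (i+1)/(i+j+2) settles down to a random limit that differs
# between runs: the influence of the first few draws never disappears.
for seed in (1, 2, 3):
    i, j = polya_urn(10_000, seed)[-1]
    print(seed, (i + 1) / (i + j + 2))
```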
Discrete potential theory provides a general method for the construction of such state space boundaries. This was initiated in a fundamental paper by Doob (1959) and has been applied to Pólya urns by Blackwell and Kendall (1964). A recent textbook treatment is given in Woess (2009); see also the survey by Sawyer (1997). Many authors have used boundary theory for the analysis of random walks on discrete structures; see Kaȋmanovich and Vershik (1983) for a very influential review, and the more recent monograph by Woess (2000).
In the present paper we regard the discrete structures themselves as states of a stochastic process. Some standard search algorithms have recently been investigated from this point of view in Evans et al. (2012), and the results have been used in Grübel (2014) to prove strong limit theorems for functionals of the output sequence, such as the path length or Wiener index of search trees, that have attracted the attention of many researchers. We address the phenomenon of randomness persistence with these tools, specifically in connection with some popular models for random graphs, and for search algorithms. Further, we show that the method can be used to obtain almost sure convergence for the structures themselves or for functionals of the structures in cases where previously only convergence in distribution was known.
In the next section we provide some background on discrete potential theory and the Doob-Martin compactification. In order to keep this short we restrict ourselves to combinatorial Markov chains, where the state space is graded by the time parameter. Section 3 gives some elementary examples from random graph theory; the present author is not aware of any previous use of the Doob-Martin compactification in connection with (general) graph limits. In Section 4 we consider search trees, where we can build on the work of Evans et al. (2012) and Grübel (2014). In a final section we collect some comments on related work and provide further pointers to the literature.
We hope that such results contribute to the theoretical understanding of the growth models and algorithms. From an entirely practical point of view, persisting randomness should be of interest as well: a strong dependence on the first few input values may be an unwelcome aspect of an algorithm that the practitioner may have to address, for example by an additional randomization step.

An ultrashort summary of Markov chain boundary theory
A Markov chain is a sequence X = (X_n)_{n∈N} of random variables that take their values in some countable set F, the state space, such that the Markov property

P(X_{n+1} = x_{n+1} | X_1 = x_1, . . . , X_n = x_n) = P(X_{n+1} = x_{n+1} | X_n = x_n)   (1)

holds for all n ∈ N, x_1, . . . , x_{n+1} ∈ F (whenever the conditional probabilities are defined). In the cases we are interested in there will be a canonical state e ∈ F with X_1 = e, and the transitions are homogeneous in time, which means that for some function p : F × F → [0, 1],

P(X_{n+1} = y | X_n = x) = p(x, y) for all x, y ∈ F, n ∈ N.
These are the transition probabilities; together with the starting point e they determine the distribution of the stochastic process X. We also assume that

for every x ∈ F there is some n ∈ N with P(X_n = x) > 0.   (2)

In words: Every state has a chance to be visited; the chain is weakly irreducible. Boundary theory provides an approach to the asymptotics of chains that 'leave the state space' in the sense that lim_{n→∞} P(X_n ∈ S) = 0 for every finite set S ⊂ F. It gives the 'right' extension (completion, compactification) F̄ of the state space, in the sense that

X_n → X_∞ as n → ∞ with probability 1   (3)

for some random variable X_∞ with values in the boundary ∂F of F in F̄, and that

σ(X_∞) = T(X), up to P-null sets.   (4)

In words: The limit generates the tail σ-field of the process, up to null sets. For property (4) we assume that X has the space-time property, by which we mean that each state can be visited at one particular point in time only. The combinatorial Markov chains in Section 1 are such space-time processes. This feat is achieved by the Doob-Martin compactification, where we regard a sequence (y_n)_{n∈N} ⊂ F as convergent if the conditional probabilities P(X_1 = x_1, . . . , X_m = x_m | X_n = y_n) converge as n → ∞ for all fixed m ∈ N, x_1, . . . , x_m ∈ F. Due to the Markov property (1) the construction can be based on the Martin kernel K,

K(x, y) := P(X_n = y | X_m = x) / P(X_n = y),

where m and n are the time values associated with the states x and y respectively; here weak irreducibility (2) is important. Indeed, manipulations of elementary conditional probabilities lead to

P(X_1 = x_1, . . . , X_m = x_m | X_n = y_n) = P(X_1 = x_1, . . . , X_m = x_m) K(x_m, y_n),

which connects the convergence condition on the conditional probabilities to the convergence of the values of the Martin kernel. We mention in passing that this approach to Doob-Martin convergence is equivalent to the usual approach via potential kernels; see also (Evans et al., 2012, Section 3) and (Evans et al., 2014, Section 2).
From a general point of view, any family F of functions f : F → R that separates the points of F leads to an embedding of F into the space R^F of functions from F to R via

x ↦ (f(x))_{f∈F}.

On R^F we use the topology of pointwise convergence. If all f ∈ F are continuous (which they automatically are if we endow F with the discrete topology) and bounded, then the embedding is continuous and its range is contained in a product of bounded intervals, hence relatively compact by Tychonov's theorem. This is a variant of the Stone-Čech compactification, see (Kelley, 1955, p. 152f). In this construction, all f ∈ F have a unique continuous extension to the whole of the compactified space. Alternatively, for a countable family F, a suitable metric can be defined on F using these functions, such that the associated completion has these properties; see (Woess, 2009, p. 187).
In our present setup, the Doob-Martin compactification arises by taking F to be the set of functions y ↦ K(x, y), x ∈ F. We use the same symbol for the extended functions and denote boundary elements by lower case Greek letters. With this construction, (3) and (4) are satisfied. In addition, we have the following remarkable properties: First, all non-negative harmonic functions h : F → R can be written as mixtures of the functions K(·, α), α ∈ ∂F. To be precise, we recall that h : F → R is harmonic if

h(x) = Σ_{y∈F} p(x, y) h(y) for all x ∈ F.

Then for each such h with h ≥ 0 and h(e) = 1 there is a probability measure μ_h on (the Borel subsets of) the boundary ∂F such that

h(x) = ∫_{∂F} K(x, α) μ_h(dα) for all x ∈ F.   (5)

The distribution μ_1 of the limit X_∞ represents the (trivial) harmonic function h ≡ 1. Secondly, conditioned on a limit value X_∞ = α, the process is again a Markov chain, with transition probabilities p_h given by

p_h(x, y) = p(x, y) K(y, α) / K(x, α).   (6)
This is an instance of Doob's h-transform, with K(·, α) the corresponding harmonic function h. Of course, the interpretation of these transforms by a conditioning on the final value is a natural consequence of the initial idea of conditioning on the values y n at time n and then letting n tend to ∞.
As it is central to our theme of persisting randomness we briefly explain why (and how) X ∞ generates the tail σ-field, up to null sets, that is, why property (4) holds.
The limit is obviously T(X)-measurable, which means that σ(X_∞) ⊂ T(X). For the other direction we need, for each tail event A, a Borel subset B of ∂F such that

P(A Δ X_∞^{-1}(B)) = 0,   (7)

where X_∞^{-1}(B) := {ω ∈ Ω : X_∞(ω) ∈ B}. Let A ∈ T(X) and let 1_A be the associated indicator function; we may assume that κ := P(A) > 0. As the state space is graded in the sense that it can be written as the disjoint union of the 'slices' F_n of states that are possible at time n, we can define h : F → [0, ∞) by setting h(x) = κ^{-1} P(A | X_n = x) for x ∈ F_n, n ∈ N. With (1) it follows that h is harmonic, and it turns out that the measure μ_h representing h as in (5) has a density Φ with respect to the distribution μ_1 of X_∞. The set required in (7) can now be given as

B := {α ∈ ∂F : Φ(α) > 0}.

In particular, if the distribution of X_∞ is concentrated on a single value of the boundary then T(X) is P-trivial, so randomness 'disappears in the limit'.

Graph limits
Our basis in this section is the recent monograph by Lovász (2012), which also gives references to the original research articles. Let G[n] be the set of simple graphs G = (V, E) with vertex set V = [n] := {1, . . . , n}. The set G[1] has only one element, the graph e = G_1 with the single node 1 and no edges. A number of popular models for randomly growing graphs fit into the framework of combinatorial Markov chains, with state space F = G := ⋃_{n=1}^∞ G[n] and start at G_1. We work out the boundary for two of them, the uniform attachment process and the Erdős-Rényi graphs, where we consider two variants of the latter. We note that the state space compactifications are abstract constructions, so the only uniqueness that we may expect is up to homeomorphisms; usually there are many possibilities for a concrete description.
On its own the question of how to define limits of finite graphs, interpreted as the search for a completion or compactification of the countable set G, does not involve any probability and, of course, it can have quite different answers depending on the specific circumstances. For example, we might distinguish between sparse and dense graphs, referring to the rate of growth of the number e(G_n) of edges E(G_n) in relation to the number v(G_n) of vertices V(G_n) of G_n in a sequence (G_n)_{n∈N} ⊂ G. For the dense case the notion of subgraph sampling has turned out to be important (there are several equivalent definitions): For two graphs G, H ∈ G let t(H, G) be the number of possibilities to embed H into G or, more formally, with Γ(H, G) the set of injective functions φ : V(H) → V(G) such that {φ(i), φ(j)} ∈ E(G) if and only if {i, j} ∈ E(H), let

t(H, G) := #Γ(H, G)   (8)

and

ρ(H, G) := t(H, G) / (v(G))_{v(H)},   (9)

where (n)_m := n(n−1)⋯(n−m+1) denotes the falling factorial. We then say that a sequence (G_n)_{n∈N} converges if for all H ∈ G the relative number ρ(H, G_n) of these possibilities converges as a sequence of real numbers. The value ρ(H, G_n) can be interpreted as the probability that, choosing m = v(H) elements of V(G_n) randomly and without replacement, the subgraph of G_n induced on these nodes is isomorphic to H (up to a factor that accounts for the automorphisms of H). The convergence may be rephrased in a somewhat abstract manner: We define an embedding of G into the set [0, 1]^G of functions on G with values in the unit interval by

G ↦ (ρ(H, G))_{H∈G}   (10)

and then consider the closure of the range of the embedding as a compactification of G. Note that the function space is compact with respect to pointwise convergence by Tychonov's theorem. Viewed this way, the similarity to the Doob-Martin compactification becomes apparent, where we use the embedding based on the Martin kernel instead.
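For small graphs the sampling density ρ can be computed by brute force. The following sketch (our own helper, using the induced-embedding convention described above) enumerates the injective maps in Γ(H, G) and normalises by the falling factorial.

```python
from itertools import combinations, permutations

def rho(H_edges, h_nodes, G_edges, g_nodes):
    """Relative number of induced embeddings of H into G.

    A graph is given by its edge list and its number of nodes; nodes are
    labelled 1, ..., n. We count the injective maps phi that carry edges to
    edges and non-edges to non-edges, and divide by the falling factorial
    (g_nodes)_(h_nodes).
    """
    H = {frozenset(e) for e in H_edges}
    G = {frozenset(e) for e in G_edges}
    count = 0
    for phi in permutations(range(1, g_nodes + 1), h_nodes):
        mapping = dict(zip(range(1, h_nodes + 1), phi))
        if all((frozenset({mapping[i], mapping[j]}) in G)
               == (frozenset({i, j}) in H)
               for i, j in combinations(range(1, h_nodes + 1), 2)):
            count += 1
    falling = 1
    for k in range(h_nodes):
        falling *= g_nodes - k
    return count / falling

triangle = [(1, 2), (2, 3), (1, 3)]
print(rho([(1, 2)], 2, triangle, 3))   # every sampled pair spans an edge
```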
Returning to the Markov chain models of randomly growing graphs, we first consider the uniform attachment model; see (Lovász, 2012, Example 11.39). In order to describe its dynamics suppose that we are in state G_n ∈ G[n] at time n. We then construct G_{n+1} ∈ G[n+1] by adding those edges {i, j} ⊂ [n+1] not (yet) in G_n with probability 1/(n+1), independently of each other. Let X = (X_n)_{n∈N} be the corresponding Markov chain, which has state space G and starts at G_1. Let ∆ := {{i, j} : i, j ∈ N, i ≠ j} and, for G ∈ G and {i, j} ⊂ V(G), let 1_{{i,j}}(G) = 1 if {i, j} ∈ E(G) and 0 otherwise. Recall that X_n = (V(X_n), E(X_n)) is a random graph with V(X_n) = [n]; let M_n be the corresponding (random) adjacency matrix. Expressed in graph theoretical terms, part (a) of the following result shows that the Doob-Martin convergence associated with the uniform attachment graphs is the same as pointwise convergence of the adjacency matrices M_n. The limit M may be regarded as the adjacency matrix of the limit graph G_∞ = (N, E_∞), E_∞ := {{i, j} ∈ ∆ : M(i, j) = 1}.

Theorem 1 (a) The Doob-Martin boundary of the uniform attachment process X consists of the set {0, 1}^∆, where convergence of a sequence (G_n)_{n∈N} of graphs G_n ∈ G[n] to a limit M ∈ {0, 1}^∆ means that the edge indicators 1_{{i,j}}(G_n) converge to M(i, j) as n → ∞, for each {i, j} ∈ ∆.

(b) With probability 1, X converges to the boundary element M with M(i, j) = 1 for all {i, j} ∈ ∆.
Proof: For an edge {i, j} ∈ ∆ with i < j and any l ∈ {j, . . . , n} let q_{i,j,l} be the probability that this edge enters the X-sequence at time l. Clearly,

q_{i,j,l} = (1/l) ∏_{k=j}^{l−1} (1 − 1/k),

with the understanding that an empty product has the value 1; summing over l shows that the edge is absent at time n with probability ∏_{l=j}^{n} (1 − 1/l) = (j−1)/n. For G ∈ G[n] and j ∈ {2, . . . , n} let E_j(G) := {{i, j} ∈ E(G) : i < j} and e_j(G) := #E_j(G). The edges in E_j(G_n), j = 2, . . . , n, have to enter the graph at some time l ∈ {j, . . . , n}, and the edges {i, j}, 1 ≤ i < j, not in E(G_n) must remain unchosen. Hence, by independence,

P(X_n = G_n) = ∏_{j=2}^{n} ((n−j+1)/n)^{e_j(G_n)} ((j−1)/n)^{j−1−e_j(G_n)}.

Similarly, given X_m = G_m (with G_m compatible with G_n), the edges {i, j} ∈ E(G_n) \ E(G_m) with j ≤ m must enter at some time l ∈ {m+1, . . . , n}, which happens with probability 1 − m/n, the edges {i, j} ∉ E(G_n) with i < j ≤ m must remain unchosen after time m, which happens with probability m/n, and the edges with j > m contribute the same factors as before. Taking ratios we arrive at

K(G_m, G_n) = ∏_{{i,j}∈E(G_m)} n/(n−j+1) · ∏_{{i,j}∈E(G_n)\E(G_m), j≤m} (n−m)/(n−j+1) · ∏_{{i,j}∉E(G_n), i<j≤m} m/(j−1).

Because of e_j(G) ≤ j − 1 for all G ∈ G, the first two of these factors will converge as n → ∞ with m fixed, and the respective limits will always be 1. Now let F_{i,m}, 1 ≤ i < m, be the graph with node set [m] and a single edge {i, m}. If this edge appears in G_n, then the convergence of K(F_{i,m}, G_n) implies that the limits

M(i, m) := lim_{n→∞} 1_{{i,m}}(G_n)   (12)

exist for all {i, m} ∈ ∆. On the other hand, from (12) we obtain the existence of the limits

lim_{n→∞} e_j(G_n) for all j ∈ N, j ≥ 2,   (13)

which in turn implies the convergence of K(G_m, G_n). Taken together this characterizes Doob-Martin convergence as stated in part (a). In (12) and (13) convergence means that the sequence elements do not change from some index n onwards.
For the proof of part (b) let τ_{ij} := inf{n ∈ N : {i, j} ∈ E(X_n)} be the entry time of the edge {i, j}; we need to show that P(τ_{ij} < ∞) = 1 for all {i, j} ∈ ∆. This, however, is an easy consequence of the construction of X as we have, for n ≥ j > i,

P(τ_{ij} > n) = ∏_{l=j}^{n} (1 − 1/l) = (j−1)/n → 0 as n → ∞.   □

As a consequence of part (b) of the theorem, the tail σ-field of the uniform attachment process is trivial. Further, using (6), the chain conditioned on some limit value M ∈ {0, 1}^∆ can easily be described as follows: We proceed as before, but only edges {i, j} ∈ ∆ with M(i, j) = 1 are allowed to enter.
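A minimal simulation of the uniform attachment dynamics (our own code, not from the paper) illustrates part (b): every fixed edge {i, j} is eventually present, since the probability that it is still absent at time n is (j−1)/n.

```python
import random

def uniform_attachment(steps, seed):
    """Grow a uniform attachment graph: in the step from size n to n+1 the
    node n+1 is added and every missing edge {i, j} within [n+1] is inserted
    with probability 1/(n+1), independently of everything else."""
    rng = random.Random(seed)
    edges = set()
    for n in range(1, steps):
        for i in range(1, n + 1):
            for j in range(i + 1, n + 2):
                if frozenset((i, j)) not in edges and rng.random() < 1 / (n + 1):
                    edges.add(frozenset((i, j)))
    return edges

edges = uniform_attachment(200, seed=7)
# A fixed edge such as {1, 2} is absent at time n only with probability 1/n,
# so for large n it is almost always present.
print(frozenset((1, 2)) in edges)
```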
Remark (a) An embedding interpretation as in (10) and (11) of the topology in Theorem 1 results if we identify a graph with the values of its edge indicators,

G ↦ (1_{{i,j}}(G))_{{i,j}∈∆}.   (14)

Here the boundary is the full function space {0, 1}^∆, which is usually not the case.
(b) The graph sequence generated by the uniform attachment model converges in the sampling topology too, and in fact to what is arguably a more interesting limit; see (Lovász, 2012, Proposition 11.40). However, this 'more global' topology does not capture the tail information. To be specific, consider a random variable ξ with values in N and define a random element α of the boundary by α(i, j) = 0 if ξ ∈ {i, j} and α(i, j) = 1 otherwise, and let Y be the corresponding h-transform. In Y the random node ξ remains isolated forever. Then T(Y) = σ(ξ) up to null sets, which means that some randomness persists. The 'local' topology in Theorem 1 detects this, whereas from the global point of view Y and the original chain X are asymptotically indistinguishable.
(c) Whereas (14) is an embedding in the strict sense of being one-to-one, (10) is not: If G and G′ are of the same isomorphism type, then the functions ρ(·, G) and ρ(·, G′) coincide; see also Theorem 5.29 and its proof in Lovász (2012).
The second model that we consider is perhaps the most famous of all random discrete structures: To obtain the Erdős-Rényi-Gilbert graph (Lovász, 2012, p. 8) or binomial random graph (Janson et al., 2000, p. 2) Y_n with node set [n] and parameter θ ∈ [0, 1] we include each of the n(n−1)/2 possible edges with probability θ, independently of each other. In contrast to the uniform attachment model discussed above the variables Y_n are now defined for each n separately, with

P(Y_n = G) = θ^{e(G)} (1 − θ)^{n(n−1)/2 − e(G)} for all G ∈ G[n],   (15)

but there is a well-known and canonical method to combine these into a Markov chain Y = (Y_n)_{n∈N}: In order to move from Y_n to Y_{n+1} we add the node n+1 and then, independently of each other, each of the edges {i, n+1}, 1 ≤ i ≤ n, with probability θ. A moment's thought reveals that, in this model, none of the randomness will ever go away, which is the other extreme as compared with tail triviality. Indeed, it is possible to reconstruct the complete sequence G_1, . . . , G_n from its last element G_n, which implies that T(Y) = σ(Y). Roughly, for such chains with perfect memory 'the sequence is the limit'; see also (Evans et al., 2012, Section 9). In order to formalize this we define, for m ≤ n, the restriction Ψ^n_m(G_n) ∈ G[m] of a graph G_n ∈ G[n] to its first m nodes, that is, we delete the nodes m+1, . . . , n and the incident edges. This defines a family {Ψ^n_m : 1 ≤ m ≤ n < ∞} of projections, and we write G_proj for the set of sequences (G_n)_{n∈N} with G_n ∈ G[n] for all n ∈ N, and Ψ^n_m(G_n) = G_m for all m, n ∈ N with m < n. This set is known as the projective (or inverse) limit associated with the sequence (G[n])_{n∈N} and the family {Ψ^n_m : 1 ≤ m ≤ n < ∞}. In the set of sequences, we regard a sequence (of sequences) as convergent if the respective elements at any particular position l ∈ N 'freeze', i.e. converge in the discrete topology on G[l]. With the discrete topology the individual components are compact in view of #G[l] < ∞, so that their (infinite) product is compact. The set G_proj is closed therein, hence compact too.
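The perfect memory property, Y_m = Ψ^n_m(Y_n) for m < n, can be checked directly in a simulation. A sketch with our own function names, θ being the edge probability:

```python
import random

def grow_er_chain(theta, steps, seed):
    """Perfect-memory Erdos-Renyi chain: to go from Y_n to Y_{n+1}, add the
    node n+1 and, independently, each edge {i, n+1} with probability theta."""
    rng = random.Random(seed)
    chain = [set()]                      # Y_1: node 1, no edges
    for n in range(1, steps):
        edges = set(chain[-1])
        for i in range(1, n + 1):
            if rng.random() < theta:
                edges.add(frozenset((i, n + 1)))
        chain.append(edges)
    return chain

def restrict(edges, m):
    """The projection Psi^n_m: delete the nodes above m and incident edges."""
    return {e for e in edges if max(e) <= m}

chain = grow_er_chain(0.3, 10, seed=0)
# Perfect memory: every earlier state Y_m is recovered from the current
# state by restriction, so the whole history is a function of Y_n.
print(all(restrict(chain[-1], m + 1) == chain[m] for m in range(10)))  # → True
```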
There is a slightly different point of view that connects this abstract procedure to the material in the next section and that is also useful for the description of probability measures on the projective limit: The transition graph of a perfect memory chain on G is a rooted and locally finite tree, with root G_1 and directed edges (G_n, G_{n+1}), G_n = Ψ^{n+1}_n(G_{n+1}). The projective limit then coincides with the boundary of the ends compactification of the transition tree. For a node G of this tree with k vertices let

A_G := {(G_n)_{n∈N} ∈ G_proj : G_k = G}

be the set of paths through G. It is easy to see that a probability measure μ on the ends compactification is completely specified by the values μ(A_G), G ∈ G.
Theorem 2 (a) The boundary of the Doob-Martin compactification of G with respect to Y is given by the projective limit G_proj.

(b) With probability 1, Y converges to the boundary element Y_∞ = (Y_n)_{n∈N} ∈ G_proj, and the distribution μ of Y_∞ is determined by μ(A_G) = P(Y_k = G), G ∈ G[k], k ∈ N.
Proof: Again, we look at the Martin kernel: If G_m is on the unique path from G_1 to G_n then P(Y_n = G_n, Y_m = G_m) = P(Y_n = G_n), so that K(G_m, G_n) = 1/P(Y_m = G_m), and K(G_m, G_n) = 0 otherwise. From this part (a) of the theorem follows easily. For (b) we note that a boundary point is a path through the transition tree and that Y_m = Ψ^n_m(Y_n) for all m < n, so that Y moves along the path that it generates; the formula for μ follows from the definition of the sets A_G. □

In this situation conditioning on a limit value leads to a deterministic motion along the sequence of graphs that represents the limit. Part (b) and (15) imply that the limit distribution is diffuse. In particular, the tail σ-field is not trivial, as we have already noted before.
The perfect memory property is a consequence of the labelling of the nodes in the order of their appearance and the fact that in the step from n to n + 1 only edges incident to n + 1 are added. Uniform attachment graphs do not have this property; for example, if {1, 2} ∈ E(X 3 ) then it is not clear whether this edge has been added at time 2 or 3.
We now show that in the perfect memory case random relabelling may lead to a more interesting topology. We recall that the group S[n] of permutations of [n] acts on G[n] via the relabelling of nodes,

π(G) := ([n], {{π(i), π(j)} : {i, j} ∈ E(G)}) for π ∈ S[n], G ∈ G[n].   (16)

Now let Π_n, n ∈ N, be a sequence of independent random variables, with Π_n uniformly distributed on S[n]. We define X = (X_n)_{n∈N} inductively by X_1 ≡ G_1, X_{n+1} = Π_{n+1}(X̃_{n+1}), where X̃_{n+1} is constructed from X_n as in the chain Y above: V(X̃_{n+1}) = [n+1] and E(X̃_{n+1}) is obtained from E(X_n) by adding each of the edges {i, n+1}, i ∈ [n], independently with probability θ. In view of the fact that the transition from X_n to X_{n+1} only involves X_n and quantities that are independent of X_n, the process X = (X_n)_{n∈N} is again a Markov chain, and it continues to be adapted to G. Also, the distribution (15) is invariant under S[n] as the action (16) does not change the number of edges. This implies that X_n and the variable Y_n from the perfect memory version without random relabelling have the same distribution (but they will in general not be equal). Below, we will refer to the sequence X as the Erdős-Rényi chain with parameter θ.
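The relabelled chain destroys the perfect memory while preserving the marginal distributions. A step-by-step simulation sketch (our own code, with θ as above):

```python
import random

def er_relabelled(theta, steps, seed):
    """Erdos-Renyi chain with random relabelling: grow as in the perfect
    memory chain Y, then apply a uniform random permutation Pi_{n+1} of
    [n+1] after every growth step."""
    rng = random.Random(seed)
    edges = set()                        # X_1: node 1, no edges
    for n in range(1, steps):
        for i in range(1, n + 1):        # add edges {i, n+1}, as in Y
            if rng.random() < theta:
                edges.add(frozenset((i, n + 1)))
        perm = list(range(1, n + 2))
        rng.shuffle(perm)                # a uniform element of S[n+1]
        relabel = dict(zip(range(1, n + 2), perm))
        edges = {frozenset((relabel[i], relabel[j])) for i, j in edges}
    return edges
```

Since the relabelling only permutes the node labels, the number of edges at each time has the same distribution as in the chain Y, but the arrival order of the nodes can no longer be read off the current state.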
The following result shows that the Doob-Martin compactification for this model leads to the sampling topology mentioned at the beginning of this section; see (8) and (9).

Theorem 3 (a) A sequence (G_n)_{n∈N} of graphs G_n ∈ G[n] converges in the Doob-Martin topology associated with the Erdős-Rényi chain X if and only if ρ(H, G_n) converges as n → ∞ for all H ∈ G.

(b) With probability 1, X converges to the boundary element G_∞ given by G_∞(H) = θ^{e(H)} (1 − θ)^{v(H)(v(H)−1)/2 − e(H)}, H ∈ G.

Proof: (a) For m < n let Φ be the random injective map from [m] to [n] that records the positions that the nodes present at time m occupy at time n after the successive relabellings. By the symmetry of the construction, Φ is uniformly distributed on the set of injective maps from [m] to [n] and independent of X_n, and conditionally on X_n = G_n and Φ = φ we have X_m = G_m if and only if φ ∈ Γ(G_m, G_n). Note that the right hand side does not depend on φ. Using (15) a decomposition with respect to the value of Φ now gives

K(G_m, G_n) = P(X_m = G_m | X_n = G_n) / P(X_m = G_m) = ρ(G_m, G_n) / P(X_m = G_m).

Clearly, this converges for fixed G_m as n → ∞ if and only if ρ(G_m, G_n) does.
(b) This follows from the sampling interpretation of ρ and the independence of the edge indicators. □

Again, part (b) implies that the tail σ-field of X is trivial. Further, the Erdős-Rényi chains with different parameter values are easily seen to be h-transforms of each other.
Theorem 3 identifies the Doob-Martin boundary of the Erdős-Rényi chain as a subset of the set of all functions G_∞ : G → [0, 1]. For a concise description of this subset in terms of 'graphons' we refer the reader to Lovász (2012).
Search trees

Let V := {0, 1}* be the set of finite 0-1 sequences (words), with ∅ the empty word; for u ∈ V we write |u| for its length, ū for the word obtained by deleting the last letter (the parent of u ≠ ∅), and u + v for the concatenation of u, v ∈ V. By a binary tree we mean a subset x ⊂ V that is prefix-stable or, equivalently, contains the parent of each of its non-root elements. If u ∉ x, ū ∈ x, then we call u external, and we write ∂x for the set of external nodes of x. The (fringe) subtree of x rooted at u is given by x(u) := {v ∈ V : u + v ∈ x}. Let B_n be the set of binary trees with n nodes, B := ⋃_{n=1}^∞ B_n. Prefix stability implies that any x ∈ B can be regarded as a contiguous subset of V and hence be described by its boundary function

B_x : ∂V → N, B_x(u) := #{v ∈ x : v is a prefix of u},

where ∂V denotes the set of infinite 0-1 sequences. (This seems to be the most natural term, but in view of all the other occurrences of boundaries in the present paper, 'frontier' may be a sensible alternative.) Given a sequence (t_n)_{n∈N} of pairwise distinct real numbers the binary search tree (BST) algorithm generates a sequence (x_n)_{n∈N} of labelled binary trees as follows: The first value is stored at the root node; given x_n the next value t_{n+1} is stored at the first empty node found when travelling through x_n, moving from u to u0 if the new value is smaller than the label of an occupied node and to u1 otherwise. This is one of the standard algorithms for searching and also arises in the context of sorting; see Knuth (1973), Mahmoud (1992) and Drmota (2009). Suppose that the t_i's are realizations of independent random variables η_i, i ∈ N, with the same continuous distribution. Then the random binary trees X_n obtained for η_1, . . . , η_n, n ∈ N, can be collected into a Markov chain X = (X_n)_{n∈N} with a simple transition structure: X_1 is the tree that consists of the root node ∅ only, and in the transition from X_n to X_{n+1} one of the n + 1 external nodes of X_n is chosen uniformly at random and incorporated into the tree. The BST chain has B as its state space, and P(X_n ∈ B_n) = 1 for all n ∈ N.
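The algorithm is compact enough to state in code. In this sketch (our own representation, not from the paper) a tree is a dict mapping 0-1 strings to their stored labels, so that its key set is the binary tree x ⊂ V.

```python
import random

def bst(values):
    """Run the BST algorithm: store each value at the first empty node found
    by going left ('0') if it is smaller than the label of the occupied node
    and right ('1') otherwise. Nodes are 0-1 strings, '' being the root."""
    labels = {}
    for t in values:
        u = ''
        while u in labels:
            u += '0' if t < labels[u] else '1'
        labels[u] = t
    return labels

rng = random.Random(42)
tree = bst([rng.random() for _ in range(100)])
# The node set is prefix-stable: the parent of every non-root node is present.
print(all(u[:-1] in tree for u in tree if u))  # → True
```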
The Doob-Martin compactification of B with respect to X and the distribution of the limit X ∞ were obtained by Evans et al. (2012) and can be described as follows: ∂B is the set of probability measures µ on ∂V. Convergence to µ ∈ ∂B of a sequence (x n ) n∈N ⊂ B means that a n := #x n → ∞ and that the relative number #x n (u)/a n of nodes in the subtree rooted at u ∈ V converges to µ(A u ) for all u ∈ V, where A u consists of all infinite 0-1 sequences with prefix u. As with the transition tree in the previous section, the values µ(A u ), u ∈ V, determine µ.
We have P(X_∞(A_u) > 0) = 1 for all u ∈ V. The distribution of X_∞ is a probability measure on ∂B, hence on the set of probability measures on ∂V, where the latter is endowed with the σ-field generated by the projections μ ↦ μ(A), A a Borel subset of ∂V. This distribution has the (characterizing) property that the random variables

ξ_u := X_∞(A_{u0}) / X_∞(A_u), u ∈ V,   (18)

are independent and uniformly distributed on the unit interval. This in turn implies that X_∞(A_u) can be written as a product of independent, identically distributed random variables, a fact that we will use repeatedly below.
In the present section we apply this to the asymptotics of the random functions B_{X_n}, n ∈ N. The idea of describing randomly growing sets by their boundary appears in connection with models now known under the acronym 'IDLA' (internal diffusion limited aggregation). This subject area was initiated by Diaconis and Fulton (1991); an early important contribution is Lawler et al. (1992). Both papers deal with integer lattices, but the basic model has since been applied to various other infinite discrete background sets, for example to the 'comb' by Huss and Sava (2012). BST chains may be seen as an IDLA variant on the background set V, where the exploration process is a reinforced random walk in the sense that the probabilities of moving from u to u0 and u1 respectively depend on the number of previous particles that have travelled along the respective edge. The BST boundary functions have earlier been investigated under the name of 'silhouette' in Grübel (2005, 2009), where they were regarded as functions on the unit interval via

β : ∂V → [0, 1], β(u) := Σ_{k=1}^∞ u_k 2^{−k}   (19)

(binary rationals do not matter as X_∞ has no atoms). Figure 1 shows the boundary functions of X_n for various n, with pseudorandom data, where (19) has been used to display B_{X_n} as a function on [0, 1].
We begin with two real-valued functionals of the boundary functions. First we consider the growth of the trees along a fixed path through the infinite binary tree.
Theorem 4 Let u ∈ ∂V be fixed. Then the tail σ-field of the sequence (B Xn (u)) n∈N is P -trivial.
A proof can easily be obtained on using the well-known connection to records: The BST dynamics imply that (B_{X_n}(u))_{n∈N} is identical in distribution to the sequence (S_n)_{n∈N}, S_n = Σ_{k=1}^n ζ_k, of partial sums of independent random variables ζ_k with P(ζ_k = 1) = 1 − P(ζ_k = 0) = 1/k, k ∈ N, which also appears when counting records in random samples. It follows that ((n, S_n))_{n∈N} is a Markov chain with state space F = {(n, i) : n ∈ N, i ∈ [n]} and transition probabilities

p((n, k), (n+1, k+1)) = 1 − p((n, k), (n+1, k)) = 1/(n+1).
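The records representation is convenient for simulation. The following sketch (our own code) samples S_n and compares the empirical mean with the exact expectation, the harmonic number H(n) ~ log n + γ.

```python
import random

def record_count(n, rng):
    """S_n = zeta_1 + ... + zeta_n with independent zeta_k and
    P(zeta_k = 1) = 1/k: in distribution, the number of nodes of the BST on
    the path to a fixed boundary point u, and also the number of records
    among n i.i.d. values."""
    return sum(1 for k in range(1, n + 1) if rng.random() < 1 / k)

rng = random.Random(0)
n = 1000
harmonic = sum(1 / k for k in range(1, n + 1))     # H(n), the exact mean
samples = [record_count(n, rng) for _ in range(2000)]
print(sum(samples) / len(samples), harmonic)       # the two values are close
```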
The structural similarity to the Pólya urn mentioned in Section 1 should be apparent. In the records chain, a sequence ((n, k_n))_{n∈N} of states converges in the sense introduced in Section 2 if and only if k_n / log n converges in [0, ∞], which leads to F̄ = F ∪ [0, ∞]. This can be proved by 'path-counting', the asymptotics of unsigned Stirling numbers of the first kind, and an interesting monotonicity argument; see Gnedin and Pitman (2005) and the references given there. In the compactification, S_n tends to the constant boundary value 1, which implies triviality of the tail σ-field as explained at the end of Section 2. Let

H(n) := Σ_{k=1}^n 1/k, n ∈ N,

be the harmonic numbers. From the representation of B_{X_n}(u) as a sum of independent Bernoulli variables we obtain the expected value E B_{X_n}(u) = H(n) and, using H(n) ∼ log n, the distributional convergence

(B_{X_n}(u) − H(n)) / √H(n) → Z as n → ∞,   (20)

where Z has a standard normal distribution. However, by Theorem 4, there is no transformation of the random variables B_{X_n}(u), u ∈ ∂V fixed, that leads to strong convergence with a non-degenerate limit.
For the second functional we integrate the boundary functions with respect to the measure λ on ∂V given by λ(A_u) = 2^{−|u|}. This is the unique normalized Haar measure if we regard the set of infinite 0-1 sequences as a compact group under the pointwise addition modulo 2. Recall from (18) the random variables ξ_u, u ∈ V, and set

C(t) := 1 + (1/2) log(t(1 − t)), 0 < t < 1.

Lemma 5 The random variables L_{∞,k} := Σ_{|u|<k} 2^{−|u|} C(ξ_u), k ∈ N, converge almost surely and in L² as k → ∞.
Proof: A straightforward calculation shows that EC(ξ_u) = 0 and var(C(ξ_u)) = 1 − (π²/12). In particular, using the independence of the ξ_u's,

var(L_{∞,k}) = (1 − π²/12) Σ_{|u|<k} 4^{−|u|} ≤ 2 (1 − π²/12),

which implies that (L_{∞,k}, σ({ξ_u : |u| < k}))_{k∈N} is an L²-bounded martingale, so that the corresponding limit theorem can be used. □

We write L_∞ = Σ_{v∈V} 2^{−|v|} C(ξ_v) for the limit of L_{∞,k} as k → ∞. In this series we do not have absolute convergence: For k ∈ N fixed the mean of the random variable Σ_{|u|=k} 2^{−|u|} |C(ξ_u)| is a positive value that does not depend on k.
Theorem 6 With L_n := ∫ B_{X_n} dλ we have L_n − H(n) → L_∞ almost surely and in L² as n → ∞.

Proof: Obviously, ψ_n(u) = 0 for u ∈ ∂X_n, which completes the proof of (21). The integral defining L_n may be rewritten as

L_n = ∫_{∂V} B_{X_n}(u) λ(du) = Σ_{v∈X_n} 2^{−|v|},

where the last equality can easily be proved by induction. Combining this with (21) and the convergence of L²-bounded martingales we obtain the assertion. □

As a sum of independent and non-degenerate random variables the limit L_∞ is not almost surely constant; in particular, the tail σ-field of the L-sequence is not P-trivial.
A strong limit theorem for L_n − H(n) has already been obtained in (Grübel, 2009) by proving directly that (L_n − H(n), F_n)_{n∈N} is an L²-bounded martingale. Our approach here differs insofar as it replaces the search for a suitable martingale by projecting the limit on the natural filtration (F_n)_{n∈N} of the Markov chain, and it also provides a representation of the limit in terms of X_∞. The representation in turn leads to an interpretation of the limit as a distance from X_∞ to the Haar measure λ: Let G_k be the σ-field on ∂V generated by the sets A_u with |u| = k. The Kullback-Leibler divergence KL(μ_1, μ_2) of two measures μ_1 and μ_2 on a measure space (Ω, F), with F generated by a partition A_1, . . . , A_l of Ω, is given by

KL(μ_1, μ_2) := Σ_{i=1}^{l} μ_1(A_i) log( μ_1(A_i) / μ_2(A_i) ).

We write μ|_G for the restriction of the measure μ to a sub-σ-field G of its domain F.
Theorem 7 With probability 1,

L_∞ = lim_{k→∞} ( (1 − log 2) k − KL(λ|_{G_k}, X_∞|_{G_k}) ).

Proof: If we restrict the sum in the definition of L_∞ to the nodes of depth less than k then we obtain, with φ(u) := 2^{−|u|} log X_∞(A_u), using (18) and a summation by parts as in the proof of Theorem 6,

L_{∞,k} = k + Σ_{|u|=k} φ(u) = (1 − log 2) k − KL(λ|_{G_k}, X_∞|_{G_k}).   □

It is tempting to think of the limit as the Kullback-Leibler divergence of X_∞ and λ. Note, however, that the density Z_k : ∂V → [0, ∞) of X_∞|_{G_k} with respect to λ|_{G_k} is given by

Z_k(u) = ∏_{j=1}^{k} 2 ξ̃_j,

where ξ̃_j is equal to either ξ_v or 1 − ξ_v, with the ξ-variables as in (18), depending on the value of the jth entry of the sequence u, and with v the corresponding length j−1 prefix of u. From this representation as a product of independent, identically distributed and non-degenerate random variables with mean 1 it follows that

P( lim inf_{k→∞} Z_k(u) = 0 ) = P( lim sup_{k→∞} Z_k(u) = ∞ ) = 1.
In particular, X_∞ and λ are mutually singular with probability 1. Clearly, some cancellation occurs in the sum defining L_∞, due to the fact that X_∞ is a random measure.

Fig. 2: Two values of Y_n, with n = 500 (blue) and n = 1000 (red).
We return to the boundary functions. From Theorem 4 it is clear that we cannot expect these to converge pointwise; see also Figure 1. Further, the asymptotic normality in (20) shows that, at a specific point, the functions increase roughly as log n but that there are fluctuations of the order √(log n). Hence, apart from shifting, some smoothing is needed, as has already been noticed in (Grübel, 2009). We first adapt the smoothing procedure introduced in (Grübel, 2009) (recall that the nth harmonic number is the expectation of B_{X_n}(v) for each v ∈ ∂V). It is easy to deduce from (Grübel, 2009, Theorem 8) that Y_n converges in distribution to a process with continuous paths. However, as the Y_n's are all defined on the same probability space, it makes sense to ask whether these variables themselves converge. Figure 2 shows the values of Y_n(ω) for two ω's, with n = 500 and n = 1000 respectively, where instead of two such ω's in the left and the right part of the figure two separate streams of numbers were used that the present author regards as plausible substitutes for truly random numbers: The two sequences were generated from alternating blocks of ten digits in the decimal expansion of π − 3, so that the left stream begins with t_1 = 0.1415926535, t_2 = 0.2643383279, . . ., whereas the right stream has t_1 = 0.8979323846, t_2 = 0.5028841971 and so on. As in Figure 1, ∂V is mapped to [0, 1] by the function β defined in (19) in order to be able to draw the functions.
The figure supports the conjecture that the random functions Y_n themselves converge, and that the limit is not a fixed function. Incidentally, it also demonstrates the influence of the first few input values on the output of the BST algorithm: the long-term proportion #X_n((0))/n of nodes in the left subtree, for example, is equal to the value of the first input variable, and for the above π-data the t_1-values are quite different.
The theorem below confirms this conjecture. It also provides a representation of the limit process in terms of the Doob-Martin limit X∞ of the BST sequence; in fact, its proof is closely connected to this representation.
We need to specify what convergence of the random functions Y_n means. For this, we define a metric d on ∂V by d(u, v) = 2^{-l+1} for u, v ∈ ∂V, u ≠ v, where l is the first coordinate in which the two sequences differ, as in the definition of the total order on ∂V; also, l − 1 = |u ∧ v|, where u ∧ v denotes the longest common prefix (last common ancestor) of u and v. This turns ∂V into a compact metric space; we write C(∂V) for the set of continuous functions on (∂V, d). Endowed with the supremum norm, ||f||_∞ = sup_{u∈∂V} |f(u)|, C(∂V) becomes a Banach space. Further, for u = (u_k)_{k∈N} ∈ ∂V let K(u) := {k ∈ N : u_k = 1}, and u(k) := (u_1, . . . , u_{k−1}, 0) for k ∈ K(u). Generalizing the notation introduced above in connection with the second functional of the boundary functions we write
where '≥' now refers to the prefix order. Clearly, L∞(u) has the same distribution as 2^{-|u|} L∞(∅), and we know from Lemma 5 that L∞(∅) = L∞ has zero mean and finite variance. We require two auxiliary results.
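On finite truncations of boundary sequences this metric is easy to evaluate; a small sketch (boundary_distance is a hypothetical helper name):

```python
def boundary_distance(u, v):
    """d(u, v) = 2^{-l+1}, where l is the first (1-based) coordinate in
    which the 0-1 sequences u and v differ; d(u, u) = 0."""
    for index, (a, b) in enumerate(zip(u, v), start=1):
        if a != b:
            return 2.0 ** (-index + 1)
    return 0.0  # equal on the compared prefix

# The longest common prefix of the two truncations below has length 2,
# so l = 3 and the distance is 2^{-2}:
print(boundary_distance((0, 1, 0, 1), (0, 1, 1, 0)))  # 0.25
```

Note that d is in fact an ultrametric: d(u, w) ≤ max(d(u, v), d(v, w)), since the first disagreement of u and w cannot occur before both of the other first disagreements.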
This shows that (Y_{∞,m}(u))_{m∈N} is a Cauchy sequence in L² and that, with Y′∞(u) the limit,
In particular, using Chebyshev's inequality, we get
which implies almost sure convergence. (b) It follows from (18) that
with ξ̃_1, . . . , ξ̃_{|u|} independent and uniformly distributed on the unit interval; see also the discussion following Theorem 7. This leads to
Using this bound on the L²-norm of the individual summands we can now proceed as in the proof of part (a). □

In view of Lemma 8 it makes sense to define two random functions, Y′∞ and Y″∞. These may also be regarded as stochastic processes with time parameter u ∈ ∂V.
Lemma 9 (a) With probability 1, the processes Y′∞ and Y″∞ have continuous paths.
(b) Both processes are integrable in the sense that
Proof: We consider Y′∞ first. For the proof of continuity we adapt the well-known chaining argument, see e.g. (Kallenberg, 1997, p. 35), to the present situation. Let
For nodes v on a fixed level k the variables L∞(v) are independent. Using ρ_k² ≤ Σ_{|v|=k} L∞(v)² we obtain
from which it follows that
This implies that on a set A of probability 1 we have, with some C(ω) < ∞ that does not depend on k,
Suppose now that u, v ∈ ∂V are such that d(u, v) ≤ ε. Then the first l = l(ε) = −log₂(ε) entries of u and v coincide, so that their connecting path does not go below height l. With Lemma 8 and the triangle inequality we therefore get
whenever ω ∈ A. This implies that almost all paths of Y′∞ are continuous.
For the proof of integrability we first note that ||Y′∞||_∞ ≤ Σ_{k=1}^∞ ρ_k. Using (25), and with a_k = 2^{-k/4}, we see that the upper bound has finite mean. As in the proof of the previous lemma, the arguments used for Y′∞ can be transferred to the other process Y″∞: We now put
and again we will show that these decrease rapidly enough as k → ∞. However, we no longer have independence of the individual random variables in the maximum, so we need a different argument. As in (Grübel, 2014), in connection with the maximum of the probabilities X∞(A_u), |u| = k, we use the connection to the branching random walks discussed by Biggins (1977). This rests upon the observation that the variables are the positions of the members of the kth generation in a branching random walk with offspring distribution δ_2, meaning that each particle has exactly two descendants, and with
for the point process of the positions of the children relative to their parent. Let
and let Z^{(k)}_+(t) be the number of particles in generation k that are located to the right of t. The random measure Z^{(k)} is the kth convolution power of Z, which leads to
for all t ∈ R, 0 ≤ θ < 1 and k ∈ N. Using (26) we get, with θ = 1/2,
so that, for some constant C < ∞,
Using this instead of (25) we can now proceed as in the first part of the proof. □

There is obviously room to spare in the above chaining inequalities; tightening these leads to path properties beyond continuity.
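The convolution-power step can be sketched as a Markov (Chernoff) bound. The following reconstruction assumes that the displacement point process is Z = δ_{−log ξ} + δ_{−log(1−ξ)} with ξ uniform on (0, 1); under that assumption the Laplace transform is finite exactly for θ < 1, which would explain the restriction 0 ≤ θ < 1 above.

```latex
% Markov bound on the right tail of generation k, with
% m(\theta) := E\!\int e^{\theta x}\, Z(dx):
E\bigl[Z^{(k)}_{+}(t)\bigr]
  \le e^{-\theta t}\, E\!\int e^{\theta x}\, Z^{(k)}(dx)
  = e^{-\theta t}\, m(\theta)^{k},
  \qquad t \in \mathbb{R},\ 0 \le \theta < 1,\ k \in \mathbb{N}.
% If Z = \delta_{-\log\xi} + \delta_{-\log(1-\xi)} with \xi uniform on (0,1):
m(\theta) = E\bigl[\xi^{-\theta} + (1-\xi)^{-\theta}\bigr]
          = \frac{2}{1-\theta}, \qquad 0 \le \theta < 1,
% so that the choice \theta = 1/2 yields the bound 4^{k} e^{-t/2}.
```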
In the proof of our final result we will use infinite-dimensional martingales; see (Neveu, 1975, Chapter V-2). For this, we require separability of the Banach space (C(∂V), ||·||_∞): The sets A_u, u ∈ V, are closed and open in ∂V, so their indicator functions are continuous. Moreover, the intersection of two such sets is again of this form, and the indicator functions separate the points of ∂V. The required separability now follows on using the Stone-Weierstraß theorem.
Theorem 10 With probability 1, Y_n converges in (C(∂V), ||·||_∞) to Y∞ := Y′∞ + Y″∞ as n → ∞.

where we have used the same notation and the same arguments as in the proof of (21). Combining these we get, for all u ∈ V,

∫_{A_u} (B_{X_n} − H(n)) dλ = E[L∞(u) | F_n] + 2^{-|u|} ( H(#X_n(u)) − H(n) + |u| ).   (27)
Now let u = (u_k)_{k∈N} ∈ ∂V be such that #K(u) < ∞. The integration range appearing in the definition of Y_n(u) can be decomposed as follows:
With (27) we thus obtain, setting Ȳ_n := E[Y′∞ | F_n],

Y_n(u) = Ȳ_n(u) + Σ_{k∈K(u)} 2^{-k} ( H(#X_n(u(k))) − H(n) + k ).
From Lemma 9 we know that Y′∞ is a C(∂V)-valued integrable random variable. Hence (Ȳ_n, F_n)_{n∈N} is a martingale with values in the separable Banach space (C(∂V), ||·||_∞), and by (Neveu, 1975, Proposition V.2.6) Ȳ_n converges almost surely in this space to Y′∞ as n → ∞. It remains to prove that, as n → ∞,

Σ_{k∈K(u)} 2^{-k} ( H(#X_n(u(k))) − H(n) + k ) → Y″∞(u),   (28)

with probability 1 and, with both sides regarded as functions of u, in (C(∂V), ||·||_∞). For this, we first show that

E[ log X∞(A_u) | F_n ] = H(#X_n(u)) − H(n)   for all u ∈ V, n ∈ N.   (29)
Further, we know from the proof of Theorem 6 that E[ log ξ_ū | F_n ] = H(#X_n(ū0)) − H(#X_n(ū)).
Hence, if (29) holds for ū, then it also holds for u. The same arguments work in the case u_{k+1} = 1, with 1 − ξ_ū and #X_n(ū1) in place of ξ_ū and #X_n(ū0), respectively. This completes the induction proof of (29).
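Equivalently, the induction behind (29) can be summarized as a telescoping sum along the prefix path of u. This is a sketch, writing u^{(j)} for the length-j prefix of u (so u^{(0)} = ∅ and #X_n(∅) = n) and ξ̃_j for ξ_{u^{(j−1)}} or 1 − ξ_{u^{(j−1)}} according to the jth entry of u:

```latex
E\bigl[\log X_\infty(A_u)\,\big|\,\mathcal{F}_n\bigr]
  = \sum_{j=1}^{|u|} E\bigl[\log\tilde\xi_j\,\big|\,\mathcal{F}_n\bigr]
  = \sum_{j=1}^{|u|}
      \Bigl( H\bigl(\#X_n(u^{(j)})\bigr) - H\bigl(\#X_n(u^{(j-1)})\bigr) \Bigr)
  = H\bigl(\#X_n(u)\bigr) - H(n),
% using that X_\infty(A_u) is the product of the \tilde\xi_j along the
% prefix path and that \#X_n(u^{(0)}) = \#X_n(\emptyset) = n.
```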
As in the first part of the proof we now get

E[ Y″∞(u) | F_n ] = Σ_{k∈K(u)} 2^{-k} ( H(#X_n(u(k))) − H(n) + k ).
From this, (28) follows on using the infinite-dimensional martingale convergence theorem again. □

Comments and complements
We collect some references to related work and also put the above results into a larger perspective.
(a) The approach of the present paper is not limited to graphs and search trees but may be used quite generally in the context of combinatorial Markov chains. For an elementary introduction to such processes and their boundaries, with many examples and algorithms, see Grübel (2013) (written in simple German).
(b) In concrete cases, the results provided by a general method such as the Doob-Martin approach can often be obtained more directly, using the additional structures then present. For example, in Grübel (2014) a proof of the basic BST result from Evans et al. (2012) is given that is based on the BST algorithm; this direct approach also leads to a representation of X ∞ in terms of the input sequence. Obviously, the same applies to the theory of graph limits, but the exposition of a common structure provided by a general theory may lead to a deeper understanding of such individual cases.
(c) As seen above, the boundary theory approach may lead to strong limit theorems for discrete structures and their functionals, occasionally improving on previous results. In Grübel (2014) such an amplification from convergence in distribution to convergence of the random variables is carried out for the Wiener index of search trees, where distributional convergence had earlier been obtained by Neininger (2002) with the contraction method. In both cases it is instructive to compare the proofs, which are quite different and seem to be less involved for the stronger result (once the Doob-Martin compactification has been worked out). The records chain provides an example where we have distributional convergence with a non-degenerate limit, but where a strong limit is necessarily degenerate, i.e. constant.
(d) At a qualitative level, functionals of discrete random structures may have a non-trivial tail σ-field, which we interpret as persisting randomness, or they may not, even if the structures themselves show such persistence; see Theorem 4. A similar phenomenon has been observed in connection with the subtree size profile of binary search trees by Dennert and Grübel (2010).