A note on limits of sequences of binary trees

We discuss a notion of convergence for binary trees that is based on subtree sizes. In analogy to recent developments in the theory of graphs, posets and permutations we investigate some general aspects of the topology, such as a characterization of the set of possible limits and its structure as a metric space. For random trees the subtree size topology arises in the context of algorithms for searching and sorting when applied to random input, resulting in a sequence of nested trees. For these we obtain a structural result based on a local version of exchangeability. This in turn leads to a central limit theorem, with possibly mixed asymptotic normality.


Introduction
A description of large discrete objects can be based on a suitable convergence concept, together with a characterization of the possible limits. For graphs Lovász and Szegedy (2006) used subgraph counts and obtained a description of the limits as graphons; see also Lovász (2012) and the references given there. A similar approach has been used in Janson (2011) for posets, in Elek and Tardos (2022) for trees and in Hoppen et al. (2013) for permutations; in the latter case pattern counting leads to permutons as limit objects. (Some details are given below at the end of Section 2.) In the present note we use subtree sizes in a similar fashion to obtain a convergence concept for binary trees, and we obtain a description of the limit objects as probability distributions on the set of infinite sequences of zeros and ones.
In Lovász and Szegedy (2006); Hoppen et al. (2013); Elek and Tardos (2022) randomness appears somewhat implicitly in the relation to subsampling. It is further used in Lovász and Szegedy (2006) and Hoppen et al. (2013), via a suitable probabilistic construction, to show that each of the potential limit objects indeed occurs for some sequence of graphs or permutations. Sequences of binary trees arise in connection with algorithms for searching and sorting: with random input both the binary search tree (BST) and the digital search tree (DST) algorithms (Knuth, 1973, Chapter 6) lead to increasing random sequences (X_n)_{n∈N} of binary trees, where X_n has n nodes (again, details are given below). The asymptotics of such sequences have been studied in Evans et al. (2012), where the subtree size topology appears in the context of Markov chain boundary theory. A similar boundary theory interpretation for the topologies of substructure sampling has been found for graph sequences in Grübel (2015) and for sequences of permutations in Grübel (2023+).
A different class of binary trees appears in connection with Rémy's algorithm, which provides a sequence (X_n)_{n∈N} of trees where each X_n is uniformly distributed on the set of trees with n nodes. This is again a combinatorial Markov chain in the sense of Grübel (2013), and its Martin boundary has been determined in Evans et al. (2017). In contrast to Evans et al. (2012), where the boundary was worked out directly through the Martin kernel, the approach in Evans et al. (2017) is based on the construction of exchangeable arrays and an associated representation theorem, as discussed in depth in Kallenberg (2005). This also leads to a description of (X_n)_{n∈N} as the result of sampling from a real tree in a specific manner, similar to the use of graphons and permutons. For graphons an exchangeability approach is outlined in (Lovász, 2012, Section 11.3.3) and studied in more detail in Diaconis and Janson (2008); for a similar treatment of randomly growing permutations see Grübel (2023+). Recently, Elek and Tardos (2022) constructed dendrons as limit objects for general trees, with an approach based on regarding trees as metric spaces, together with a suitable rescaling, and using the machinery of ultraproducts and ultralimits. Finally, rooted general trees and their limits appear in connection with classical branching processes; see the survey Janson (2012a) and the references given there.
It is well known that search trees and uniform trees belong to two different 'universality classes', often labeled by the asymptotics of their height, which is of order log n in the first and √n in the second case. Another aim of this note is to show that an approach based on probabilistic symmetries can also be used in the context of trees of logarithmic height.
In Section 2 we first introduce some basic notation for binary trees and then study the subtree size topology, proceeding essentially as in Hoppen et al. (2013) for permutation sequences. In Section 3 we consider tree sequences that grow by one node at a time, such as the output sequences obtained with the BST and DST algorithms mentioned above, where we introduce a local notion of exchangeability. This is then applied in Section 4 to obtain a second order result for subtree size convergence, where a possibly mixed Gaussian process arises as the distributional limit.
We restrict ourselves to binary trees in order to arrive at a compact presentation. Many related varieties of trees, such as quad trees, may be treated in a similar manner; see also the models considered in Devroye (1998) and in Evans et al. (2012).

Subtree size convergence
Let V := {0, 1}* := ⋃_{k=0}^∞ {0, 1}^k be the set of finite words with letters from the alphabet {0, 1}. We write |u| = k for the length of the word u = (u_1, ..., u_k) and V_k for the set of words of length k. We will also use the notation |A| for the size of a set A. The concatenation of u = (u_1, ..., u_k) and v = (v_1, ..., v_l) is given by u + v = (u_1, ..., u_k, v_1, ..., v_l). On V, the prefix order is defined by

u ⪯ v if and only if v = u + w for some w ∈ V.

By a binary tree x we mean a subset of V (the potential nodes or vertices of the tree) with the property that

u ⪯ v and v ∈ x imply u ∈ x.

In short, binary trees are sets of words that are prefix stable. The node v = ∅ (arising if k = 0) is the root of the tree. Further, v1 := (v_1, ..., v_k, 1) and v0 := (v_1, ..., v_k, 0) are the right and left descendant of v = (v_1, ..., v_k) respectively. The set V of all nodes may be seen as the complete infinite binary tree, B_n is the set of binary trees with n nodes, and B := ⋃_{n=0}^∞ B_n is the set of all binary trees with finitely many nodes. The (external) boundary ∂x of a finite tree x consists of all external nodes v ∈ V \ x with v = u0 or v = u1 for some u ∈ x. It is easy to see that |∂x| = |x| + 1 for all x ∈ B.
Two tree-related notions that are particularly important for us are the subtree σ(x, u) := {v ∈ V : u + v ∈ x} of a tree x rooted at u ∈ x and the (relative) subtree size function t(x, u) := |{v ∈ x : u ⪯ v}|/|x|. We say that a sequence (x_n)_{n∈N} converges in the subtree size topology if, for all u ∈ V, the real numbers t(x_n, u) converge as n → ∞. It is easy to see that finite binary trees are characterized by their subtree size function. We may therefore regard the mapping B ∋ x ↦ (u ↦ t(x, u)) ∈ [0, 1]^V as an embedding of the set of finite binary trees into a set that is compact by Tychonoff's theorem under the topology of pointwise convergence, as in the definition of subtree size convergence, and may even identify trees with their subtree size functions. In this sense the closure B̄ of the image of the embedding provides a compactification where the limits are given by the functions on V that appear as pointwise limits of the sequences (t(x_n, ·))_{n∈N} for convergent sequences of trees. Obviously, not all functions on V can arise in this way, and the identification of subtree size limits amounts to finding a tractable space that is homeomorphic to the boundary B̄ \ B. Note, however, that the general abstract setting immediately yields that each sequence of trees has a convergent subsequence. Let V_∞ := {0, 1}^∞ be the set of infinite sequences of zeros and ones and let B_∞ be the σ-field on the sequence space that is generated by the coordinate projections. With B_u := {v ∈ V_∞ : (v_1, ..., v_{|u|}) = u} for u ∈ V this implies that B_0 := {B_u : u ∈ V} is a countable and intersection stable generator of B_∞. As a consequence, elements of M_∞, the set of probability measures on (V_∞, B_∞), are determined by their values on B_0. With componentwise addition modulo 2, V_∞ becomes a compact group. Its unique Haar measure μ with total mass 1, the uniform distribution on (V_∞, B_∞), is characterized by μ(B_u) = 2^{−|u|} for all u ∈ V.
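To make these definitions concrete, here is a small Python sketch (the helper names are ours, not from the note) that represents a finite binary tree as a prefix-stable set of 0/1-tuples and computes the relative subtree size function t(x, u):

```python
from fractions import Fraction

def is_binary_tree(x):
    """Prefix stability: every prefix of a node is again a node."""
    return all(u[:k] in x for u in x for k in range(len(u)))

def subtree(x, u):
    """sigma(x, u): the words v with u + v in x, i.e. the subtree rooted at u."""
    return {v[len(u):] for v in x if v[:len(u)] == u}

def t(x, u):
    """Relative subtree size t(x, u) = |{v in x : u is a prefix of v}| / |x|."""
    return Fraction(len(subtree(x, u)), len(x))

x = {(), (0,), (1,), (0, 0)}          # root, both children, left grandchild
assert is_binary_tree(x)
assert t(x, ()) == 1
assert t(x, (0,)) == Fraction(1, 2)   # the nodes 0 and 00
```

Representing nodes as tuples makes the prefix relation a simple slice comparison, and exact fractions avoid rounding issues in the size counts.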
We will need the following measure-theoretic property of binary trees, which seems to be part of the folklore of the subject. I have not found a suitable reference, and therefore include a proof.
Lemma 1. Let ψ : V → [0, 1] be such that ψ(∅) = 1 and

ψ(u) = ψ(u0) + ψ(u1) for all u ∈ V. (3)

Then there is a unique μ ∈ M_∞ with μ(B_u) = ψ(u) for all u ∈ V.

Proof: We define a set function μ_0 : B_0 → [0, 1] by μ_0(B_u) = ψ(u) for all u ∈ V. Using (3) it is easy to show by induction that μ_0 is finitely additive on each system {B_u : u ∈ V_k}, k ∈ N. The finite additivity of μ_0 on B_0 extends to the field B_1 generated by B_0. We now use a topological argument: an ultrametric d can be defined on V_∞ by d(v, w) = 2^{−|v∧w|}, where |v ∧ w| denotes the length of the longest common prefix of the sequences v, w. Endowed with d the sequence space becomes a totally disconnected and compact topological space, with B_∞ as its Borel σ-field. The σ-additivity of μ_0 on B_1 now follows from the finite intersection property of compact sets, so that we may apply Carathéodory's extension theorem.
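The ultrametric used in the proof is easy to experiment with on finite truncations of sequences; the following sketch (illustrative code, not part of the note) checks the strong triangle inequality on an example:

```python
def common_prefix_len(v, w):
    """|v ∧ w|: length of the longest common prefix of two 0/1 sequences."""
    k = 0
    for a, b in zip(v, w):
        if a != b:
            break
        k += 1
    return k

def d(v, w):
    """Ultrametric d(v, w) = 2^(-|v ∧ w|), with d(v, v) = 0."""
    if v == w:
        return 0.0
    return 2.0 ** (-common_prefix_len(v, w))

u, v, w = (0, 0, 1, 1), (0, 0, 0, 1), (1, 0, 0, 0)
assert d(u, v) == 0.25                      # common prefix (0, 0)
assert d(u, v) <= max(d(u, w), d(w, v))     # strong triangle inequality
```

Finite tuples stand in for infinite sequences here; for distinct infinite sequences the first disagreeing coordinate always exists, so the same formula applies.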
The lemma shows that M_∞ can be embedded into [0, 1]^V, as done above for B. In its proof we have chosen a topological argument, in line with the general thrust of the paper; Kolmogorov's consistency theorem can be used to obtain a probabilistic alternative.
The digital search tree (DST) algorithm turns a sequence (ξ_i)_{i∈N} of elements of V_∞ into an increasing sequence (x_n)_{n∈N} of binary trees, with x_n ∈ B_n for all n ∈ N: starting with x_1 = {∅} we obtain x_{n+1} from x_n and ξ_n by interpreting ξ_n as a routing instruction, with 0 as a move to the left and 1 as a move to the right, and the inclusion of the external node u where exit from the current tree x_n occurs. For μ ∈ M_∞ let DST(μ) be the distribution of the B-valued random sequence (X_n)_{n∈N} generated by the digital search tree algorithm if the input sequence (ξ_i)_{i∈N} consists of independent random variables with distribution μ. For example, if μ is concentrated at the single sequence (0, 0, 0, ...) ∈ V_∞ then the DST mechanism produces the infinite tree that consists of all nodes on the left-most infinite branch in V. A special case of the DST family is the Bernoulli model with parameter p ∈ (0, 1), see (Drmota, 2009, Section 1.4.3), where each ξ_i consists of a sequence of independent {0, 1}-valued variables (ξ_{ik})_{k∈N} with P(ξ_{ik} = 1) = p for all k ∈ N. Especially the symmetric case, with p = 1/2, has been studied extensively.
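A minimal sketch of the DST mechanism (our own illustrative implementation, with trees as sets of 0/1-tuples and inputs truncated to finite words):

```python
import random

def dst_insert(x, xi):
    """Insert one input word xi into the tree x, reading xi as a routing
    instruction (0 = left, 1 = right) until an external node is reached."""
    u = ()
    while u in x:
        u = u + (xi[len(u)],)
    x.add(u)          # u is the external node where the search exits x

def dst(inputs):
    """Run the DST algorithm; returns the increasing sequence of trees."""
    x, trees = set(), []
    for xi in inputs:
        dst_insert(x, xi)
        trees.append(frozenset(x))
    return trees

# Symmetric Bernoulli model (p = 1/2): i.i.d. fair bits, truncated to length 128.
rng = random.Random(1)
inputs = [tuple(rng.randint(0, 1) for _ in range(128)) for _ in range(100)]
trees = dst(inputs)
assert trees[0] == {()} and len(trees[-1]) == 100
assert all(a < b for a, b in zip(trees, trees[1:]))   # strictly increasing
```

Each insertion adds exactly one node, so the n-th tree has n nodes, matching x_n ∈ B_n.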
It is easy to see that, for a sequence (x_n)_{n∈N} of trees with lim inf_{n→∞} |x_n| < ∞, subtree size convergence implies that the sequence is constant from some n_0 ∈ N onwards. The following may be regarded as the binary tree analogue of (Hoppen et al., 2013, Theorem 1.6).
Theorem 2. (a) If a sequence (x_n)_{n∈N} of binary trees with lim_{n→∞} |x_n| = ∞ converges in the subtree size topology then, for some unique μ ∈ M_∞,

lim_{n→∞} t(x_n, u) = μ(B_u) for all u ∈ V. (4)

(b) Let μ ∈ M_∞ and let (X_n)_{n∈N} be distributed according to DST(μ). Then X_n converges with probability one to μ in the subtree size topology.
Proof: (a) Let ψ(u) := lim_{n→∞} t(x_n, u) for all u ∈ V. It is easy to see that ψ satisfies (3) and clearly, ψ(∅) = 1. Lemma 1 now supplies the probability measure μ and, by (4), the tree sequence converges to μ in the subtree size topology.
The first part of the theorem shows that convergence with respect to the subtree size topology leads to convergence of binary trees to a measure in M_∞, and the second part shows that indeed each μ ∈ M_∞ arises as the limit of a sequence of binary trees. Using the above identifications of B, B̄ and M_∞ as subsets of [0, 1]^V we may summarize the result by the simple formula B̄ = B ⊔ M_∞.
Example 3. Let x_n := ⋃_{k=0}^n {0, 1}^k be the complete finite binary tree of height n. Then, for k ≤ n and u ∈ V_k, the subtree of x_n rooted at u is a complete binary tree of height n − k, so that t(x_n, u) = (2^{n−k+1} − 1)/(2^{n+1} − 1) → 2^{−k} as n → ∞. Hence the sequence (x_n)_{n∈N} converges in the subtree size topology to the uniform distribution on V_∞. ⊳

Similar to the graphon and permuton situation, the limit of a random sequence of binary trees may be a (truly) random element of M_∞. The next example has already been considered in Evans et al. (2012), with methods from Markov chain boundary theory.
Example 4. (BST, see also (Devroye, 1998, Example 1)) Let (ξ_i)_{i∈N} be a sequence of independent random variables, all uniformly distributed on the unit interval. We may assume that the values are pairwise different, and may then define the random sequence (R_n)_{n∈N} of ranks, R_n := |{1 ≤ i ≤ n : ξ_i ≤ ξ_n}|. As in the DST case, the BST algorithm generates a sequence (X_n)_{n∈N} of increasing trees, with X_1 = {∅} and X_{n+1} = X_n ⊔ {v} with some v ∈ ∂X_n. To specify the respective new node as a function of X_n and R_{n+1} we first note that the n + 1 elements of ∂X_n can be ordered lexicographically, and we then take the node v ∈ ∂X_n with left-right position R_{n+1}. We write X = (X_n)_{n∈N} ∼ BST for the result. In a nutshell, BST uses the ranks whereas DST uses the bit structure of the input values. This implies that we may replace unif(0, 1) by any other distribution μ as long as μ({a}) = 0 for all a ∈ R. With this construction all ξ-values less than ξ_1 end up in the left subtree, the larger ones in the right subtree of the root node. It follows that t(X_n, (0)) converges almost surely (a.s.) to ξ_1 and t(X_n, (1)) to 1 − ξ_1. Further, given ξ_1 = a, the values less than a and greater than a are independent and uniformly distributed on [0, a) respectively (a, 1]. Hence, given ξ_1, the left and right subtree are independent and, after passing to the appropriate subsequence, equal in distribution to X.
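The BST growth rule via ranks and the lexicographically ordered external boundary can be sketched as follows (illustrative code; the function names are ours):

```python
import random

def boundary(x):
    """External boundary of a tree x (a set of 0/1-tuples), in lexicographic
    (left-to-right) order."""
    ext = {u + (b,) for u in x for b in (0, 1) if u + (b,) not in x}
    return sorted(ext)

def bst(xs):
    """Grow the binary search tree of the inputs xs: the node for xs[n] is the
    boundary node whose left-right position is the rank of xs[n] among xs[:n+1]."""
    x = {()}
    for n in range(1, len(xs)):
        r = sum(1 for v in xs[:n] if v < xs[n])   # 0-based version of R_{n+1}
        x.add(boundary(x)[r])
    return x

rng = random.Random(7)
xs = [rng.random() for _ in range(50)]
x = bst(xs)
assert len(x) == 50
# All values below xs[0] end up in the left subtree of the root:
left = sum(1 for u in x if u[:1] == (0,))
assert left == sum(1 for v in xs[1:] if v < xs[0])
```

Boundary nodes are pairwise incomparable in the prefix order, so plain tuple sorting gives exactly the left-to-right order used in the example.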
Taken together this shows that for any u ∈ V, the sequence of pairs (t(X_n, u0)/t(X_n, u), t(X_n, u1)/t(X_n, u)) converges almost surely to (η_u, 1 − η_u), where η_u, u ∈ V, are independent and uniformly distributed on [0, 1]. Thus, the BST sequence converges almost surely to a random element M_BST of M_∞, with M_BST(B_{u0}) = η_u M_BST(B_u) and M_BST(B_{u1}) = (1 − η_u) M_BST(B_u) for all u ∈ V. In particular, P(M_BST = μ) = 0 for all μ ∈ M_∞. ⊳

Our final example requires a slight shift of perspective, from random variables to their distributions. We use the classic Billingsley (1968) as our basic reference for weak convergence. By Prohorov's theorem (Billingsley, 1968, Theorems 6.1 and 6.2), the space M_1(B̄) of probability measures on (the Borel subsets of) B̄, together with the topology of weak convergence, is a compact metrizable space. We write temporarily M̃_n, M̃ for (non-random) elements of M_1(B̄) in order to distinguish these from random elements M_n, M of B̄ (thus, we may have M̃ = L(M)). The topological structure implies that any sequence (M̃_n)_{n∈N} must have a limit point in M_1(B̄), and convergence on this level, which we denote by M̃_n →_w M̃, holds if and only if there is only one such point.
In contrast to the previous example, where the model specifies the distribution of the full sequence (X_n)_{n∈N}, we now only have the distributions of the individual variables X_n, n ∈ N. In view of its connection to enumerative combinatorics the uniform distribution is of special interest.
Example 5. Let M̃_n = unif(B_n) for all n ∈ N, and let M̃ be an associated limit point, so that M̃_{n(k)} →_w M̃ as k → ∞ for some subsequence (n(k))_{k∈N}. Then, by the Skorohod representation theorem, see e.g. (Kallenberg, 1997, Theorem 3.30), there exists a probability space carrying random variables X_∞, X_1, X_2, ... with L(X_∞) = M̃, L(X_k) = M̃_{n(k)} for all k ∈ N, such that X_k → X_∞ a.s. in the subtree size topology. We will show that

P(X_∞(B_u) ∈ {0, 1}) = 1 for all u ∈ V. (6)

For this, we note that any x ∈ B may be decomposed into its left and right subtree, given by σ(x, (0)) and σ(x, (1)) respectively. Further, |B_n| = C_n for all n ∈ N, one of the many appearances of the Catalan numbers C_n. Hence, if U_n ∼ unif(B_n), and with L_n := |σ(U_n, (0))| the size of the left subtree of U_n,

P(L_n = k) = C_k C_{n−1−k}/C_n for k = 0, ..., n − 1.

Standard bounds for the Catalan numbers lead to

lim_{n→∞} P(εn ≤ L_n ≤ (1 − ε)n) = 0 for all ε ∈ (0, 1/2),

and it follows that L(t(U_n, (0))) = L(L_n/n) converges weakly to the uniform distribution on the finite set {0, 1}. For the representing sequence (X_k)_{k∈N} we must have almost sure convergence of t(X_k, (0)) to some real value X_∞(B_{(0)}), hence (6) holds for u = (0) and u = (1). Uniformity of the distribution further implies that, conditionally on L_n = k, the left and right subtree of U_n are independent and uniformly distributed on B_k and B_{n−1−k} respectively. Applying the above argument to these we obtain (6) for nodes of length two, and iteration gives the statement for all u ∈ V.
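The Catalan computation behind the example can be checked numerically; the sketch below (illustrative, with our own helper names) evaluates the exact distribution of the left subtree size L_n and shows that the middle of the range carries little mass:

```python
from fractions import Fraction
from math import comb

def catalan(n):
    """C_n = binom(2n, n)/(n + 1), the number of binary trees with n nodes."""
    return comb(2 * n, n) // (n + 1)

def left_size_pmf(n):
    """Exact law of L_n: P(L_n = k) = C_k C_{n-1-k} / C_n, k = 0, ..., n-1."""
    cn = catalan(n)
    return [Fraction(catalan(k) * catalan(n - 1 - k), cn) for k in range(n)]

n = 400
pmf = left_size_pmf(n)
assert sum(pmf) == 1            # the Catalan recursion C_n = sum_k C_k C_{n-1-k}
assert pmf == pmf[::-1]         # symmetry between left and right subtree
mid = float(sum(p for k, p in enumerate(pmf) if 0.2 <= k / n <= 0.8))
assert mid < 0.1                # L_n / n piles up near 0 and 1
```

The middle mass decays only at rate n^{-1/2}, so the concentration at the two endpoints is visible but slow.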
We next apply a symmetry argument: the group V_∞ acts on V via v.u := (w_1, ..., w_k), with w_j := v_j + u_j mod 2, j = 1, ..., k, for v ∈ V_∞ and u = (u_1, ..., u_k) ∈ V. Taken together this shows that the distribution of M̃ is invariant under these transformations, which implies that M̃ = M̃_unif := L(δ_V), with V uniformly distributed on V_∞. Thus, all limit points are identical, and we have unif(B_n) →_w M̃_unif as n → ∞. ⊳

It follows from this example that for any sequence (X_n)_{n∈N} of random trees on some probability space with the properties that L(X_n) = unif(B_n) for all n ∈ N and that X_n converges almost surely to some X_∞ in the subtree size topology, we must have L(X_∞) = L(δ_V) with L(V) = unif(V_∞). Rémy's algorithm, see Rémy (1985), provides such a sequence. In Evans et al. (2017) a different topology has been introduced and discussed for the Rémy sequence, and this led to a more detailed class of limits. Stated somewhat informally, subtree sizes reflect the local behavior, and in the uniform case this amounts to a reduction of the limit tree to its spine, a term commonly used in connection with the asymptotics of Galton-Watson trees; see (Janson, 2012a, p. 115). Moreover, for uniform binary trees the spine can be constructed from a sequence of coin tosses.
Another opportunity for comparison between topologies arises if we ignore the root and the left-right positioning of the descendants in a binary tree, so that we arrive at an isomorphism class of tree graphs. For these, a 'global' topology is introduced and discussed in Elek and Tardos (2022) and Janson (2012b). With the complete binary trees in Example 3 the situation turns out to be somewhat reversed as, for these, the subtree size topology leads to an arguably more interesting limit; see (Janson, 2012b, Example 7.3).
We next investigate the topological structure of subtree size convergence. In the general setup, with convergence meaning the pointwise convergence of the functions t(x, ·), a suitable metric can be obtained as

d_w(x, y) := Σ_{u∈V} w(u) |t(x, u) − t(y, u)|,

with an arbitrary w : V → (0, ∞) such that Σ_{u∈V} w(u) < ∞. However, in this generality this does not reflect the specific structures considered here. For sequences of graphs and permutations embeddings of the discrete structures into the respective limit spaces of graphons and permutons have been given in (Lovász, 2012, Section 1.5.2) and (Hoppen et al., 2013, Definition 3.4). To obtain a similar embedding of B into M_∞ we first recall the metric d from the proof of Lemma 1 that makes V_∞ a compact ultrametric space. The σ-field B_∞ on V_∞ is the associated Borel σ-field, and weak convergence μ_n →_w μ in M_∞ means that ∫ f dμ_n → ∫ f dμ for all bounded continuous f : V_∞ → R. We now associate with x ∈ B an element μ_x ∈ M_∞ by

μ_x := (1/|x|) Σ_{v∈x} unif(B_v). (7)

Here, for v = (v_1, ..., v_k) ∈ V, the probability measure unif(B_v) is the distribution of the sequence (v_1, ..., v_k, ξ_1, ξ_2, ξ_3, ...) ∈ V_∞, where ξ_i, i ∈ N, are independent and uniformly distributed on the set {0, 1}.
Theorem 6. Let (x_n)_{n∈N} be a sequence of binary trees with lim_{n→∞} |x_n| = ∞. Then (x_n)_{n∈N} converges in the subtree size topology if and only if the associated sequence (μ_{x_n})_{n∈N} of elements of M_∞ defined in (7) converges in the weak topology, and then the limits are the same.
Proof: The path through x ∈ B defined by v ∈ V_∞ leaves x at some unique u ∈ ∂x. Hence, in view of (2), the set system {B_u : u ∈ ∂x} is a partition of V_∞, which implies the general bounds

t(x, u) ≤ μ_x(B_u) ≤ t(x, u) + 1/|x| for all u ∈ V, x ∈ B. (8)

It follows that, for any sequence (x_n)_{n∈N} ⊂ B with |x_n| → ∞, subtree size convergence is equivalent to the convergence of μ_{x_n}(B_u) as n → ∞ for all u ∈ V. By the Portmanteau theorem (Billingsley, 1968, Theorem 2.1), as each B_u is open and closed in the compact ultrametric space (V_∞, d), weak convergence of a sequence in M_∞ implies the convergence of the values on each B_u. Thus it remains to show that {B_u : u ∈ V} is a convergence determining class, but this follows easily with the criteria given in (Billingsley, 1968, p. 14f).
Let G_n be the set of simple graphs with [n] as its set of vertices. For G ∈ G_n and H ∈ G_k, k ≤ n, let T(G, H) be the number of injections φ : [k] → [n] with the property that, for all 1 ≤ j < l ≤ k, {j, l} is an edge in H if and only if {φ(j), φ(l)} is an edge in G. Similarly, with S_n the set of permutations of [n] and π ∈ S_n, τ ∈ S_k, k ≤ n, let T(π, τ) be the number of strictly increasing functions φ : [k] → [n] with the property that, for all 1 ≤ j < l ≤ k, τ(j) < τ(l) holds if and only if π(φ(j)) < π(φ(l)). Dividing by the respective number of functions φ leads to subgraph frequencies t(G, H) and pattern frequencies t(π, τ), and convergence of a sequence (G_n)_{n∈N} of graphs or (π_n)_{n∈N} of permutations may be defined as the convergence of all substructure frequencies H ↦ t(G_n, H), respectively τ ↦ t(π_n, τ). The associated limit objects are graphons and permutons: a graphon is a symmetric and measurable function W : [0, 1]^2 → [0, 1], and the analogue of Theorem 2 (b) consists in defining an isomorphism class X_n of graphs with vertex set [n] by choosing U_1, ..., U_n uniformly at random from the unit interval and then connecting vertices i and j with probability W(U_i, U_j), independently for 1 ≤ i < j ≤ n. A permuton is a distribution function C : [0, 1]^2 → [0, 1] of a distribution with uniform marginals (hence a two-dimensional copula), and the analogue of Theorem 2 (b) is based on constructing a random permutation from the relative order of n independent points sampled from this distribution. For the binary trees considered here, the role of subgraph respectively pattern is taken over by a node u ∈ V, and instead of substructures we use the prefix relation: T(x, u) is now the number of nodes v ∈ x with u ⪯ v, and standardization means that we divide by |x|. All three cases have an obvious sampling interpretation. For a permutation π ∈ S_n, for example, we select a strictly increasing function φ : [k] → [n] uniformly at random from the binom(n, k) possibilities, and t(π, τ) emerges as the probability that the random choice leads to pattern containment. For binary trees x ∈ B_n we select a node v of x uniformly at random, and t(x, u) is the probability that u is a prefix of the chosen node. All three modes of convergence are thus connected to a view according to which two large discrete structures of the same type are close to each other if they appear to be similar when viewed through the 'sampling lens'. As in the permuton case, we obtain a description of the limit space as the space of all probability measures on some compact metric space, with the topology of weak convergence of distributions. With the Prohorov metric (Billingsley, 1968, p. 237f) this is again a compact metric space.
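The sampling interpretation for permutations can be illustrated as follows (our own sketch; `t_perm` computes the exact pattern frequency, and a Monte Carlo estimate plays the role of the 'sampling lens'):

```python
import random
from fractions import Fraction
from itertools import combinations
from math import comb

def pattern(vals):
    """Rank pattern of distinct values: position i gets the rank of vals[i]."""
    return tuple(sum(w < v for w in vals) for v in vals)

def t_perm(pi, tau):
    """Pattern frequency t(pi, tau) over all C(n, k) increasing index choices."""
    k, n = len(tau), len(pi)
    hits = sum(1 for idx in combinations(range(n), k)
               if pattern(tuple(pi[i] for i in idx)) == tau)
    return Fraction(hits, comb(n, k))

pi = (2, 0, 1, 3)                              # a permutation of {0, ..., 3}
assert t_perm(pi, (0, 1)) == Fraction(2, 3)    # 4 of the 6 pairs are increasing

# The 'sampling lens': estimate the same frequency by random index pairs.
rng = random.Random(0)
pairs = [tuple(sorted(rng.sample(range(4), 2))) for _ in range(2000)]
est = sum(pattern((pi[i], pi[j])) == (0, 1) for i, j in pairs) / 2000
assert abs(est - 2 / 3) < 0.06
```

The same scheme applies to trees: sample a node uniformly and record whether a fixed word u is a prefix of it.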
Another parallel is the use of accompanying sequences, corresponding to the transition from x to μ_x in the space of limits, together with a result such as Theorem 6 relating the convergence of the sequence of interest to the associated sequence in the limit space; see e.g. (Hoppen et al., 2013, Theorem 1.8) for permutations. Example 3 can be used to show that x ↦ μ_x is not one-to-one, in contrast to the permutations case, but in analogy to the poset and graph situation. The basis for such results are equations such as (8). For graphs and posets different versions of the substructure sampling are discussed in the literature. For trees, given the deterministic relation between number of nodes and size of the external boundary, we could have worked with the boundary-based frequencies t_0(x, u) := |{v ∈ ∂x : u ⪯ v}|/|∂x|, which would lead to the more concise version μ_x(B_u) = t_0(x, u) of (8).

Local exchangeability
The topological approach of the previous section applies to arbitrary sequences (x_n)_{n∈N} of elements of B. In the present section we assume that x_n ∈ B_n and x_n ⊂ x_{n+1} for all n ∈ N. Such sequences that grow by one node at a time appear in connection with the DST and BST algorithms, for example. Also, the boundary theory approach in Evans et al. (2012) refers to random sequences X = (X_n)_{n∈N} with these properties, where it is further assumed that the stochastic process X has the Markov property. We assume that X = (X_n)_{n∈N} satisfies P(X ∈ B↑) = 1, with the path space defined by

B↑ := {(x_n)_{n∈N} : x_n ∈ B_n and x_n ⊂ x_{n+1} for all n ∈ N}.

We endow B↑ with the σ-field B↑ generated by the coordinate projections and write M↑ for the set of probability measures on (B↑, B↑). We further assume that ⋃_{n∈N} X_n = V with probability one; this implies P(τ_u < ∞) = 1 for all entry times τ_u := inf{n ∈ N : u ∈ X_n}, u ∈ V. Ignoring a set of probability zero, we may define the local increment process at u, Y(u) = (Y_n(u))_{n∈N}, by

Y_n(u) := (s(τ_u + n, u1) − s(τ_u + n − 1, u1)) − (s(τ_u + n, u0) − s(τ_u + n − 1, u0)),

where s(n, u) := |{v ∈ V : u + v ∈ X_n}|. Thus the value of Y_n(u) indicates if the right or left subtree of u, or none of them, receives another node at time τ_u + n. We note for later use that the transition from X to Y(u) may be seen as the result of a deterministic function, say Ψ_u, defined on B↑ and with values in {−1, 0, 1}^N. In connection with the representation part of the following theorem we recall that a statement on conditional distributions such as L(X|Y = y) = Q(y, ·) means that Q is a probability kernel and that, for a class A of measurable sets sufficiently rich to characterize the distribution of X, it holds that P(X ∈ A) = ∫ Q(y, A) L(Y)(dy) for all A ∈ A. In order to be able to formalize this in the present context, where the conditioning variable has distributions as its values, we need a measurable structure on M↑. As in the case of M_∞ we use the σ-field generated by the insertion functions μ ↦ μ(A). Finally, we say that an element μ of M_∞ has full support if its support is equal to the whole of V_∞. It is easy to see that this is equivalent to the condition that μ(B_u) > 0 for all u ∈ V.
Theorem 7. Suppose that X = (X_n)_{n∈N} is such that P(X ∈ B↑) = 1.
(a) If X is locally exchangeable in the sense that all local increment processes Y(u), u ∈ V, are exchangeable, then there exists a possibly random M ∈ M_∞ such that

L(X | M) = DST(M), (12)

and X_n converges to M almost surely in the subtree size topology. Further, with probability one, M has full support.
(b) Suppose that (12) holds for some possibly random M ∈ M ∞ , where M has full support with probability one.Then X is locally exchangeable.
Proof: (a) For each u ∈ V de Finetti's theorem provides a possibly random driving measure, here represented by a probability vector p_u = (p_u(−1), p_u(0), p_u(1)), such that the sequence Y(u) is conditionally i.i.d. with distribution p_u, which may be written as

L(Y(u) | p_u) = p_u^{⊗N}.

By the convergence part of de Finetti's theorem, p_u(i) = lim_{n→∞} (1/n) Σ_{k=1}^n 1{Y_k(u) = i} almost surely for i ∈ {−1, 0, 1}; moreover,

s(τ_u + n, u) = 1 + Σ_{k=1}^n |Y_k(u)|

for all n ∈ N. With ψ(u) := p_u(−1) + p_u(1) we thus obtain

t(X_n, u) → ψ(u) a.s. as n → ∞ (13)

for all u ∈ V. As ψ(u) = ψ(u0) + ψ(u1) for all u ∈ V the set function M with M(B_u) = ψ(u), u ∈ V, satisfies condition (3) in Lemma 1. Together with M(V_∞) = ψ(∅) = 1 this provides a (unique) M ∈ M_∞, and (13) shows that X_n converges a.s. in the subtree size topology to M as n → ∞.
For the proof of (12) we first argue that (μ, A) ↦ DST(μ)(A) defines a probability kernel from M_∞ to M↑, both endowed with the measurable structure generated by the insertion maps. For each μ ∈ M_∞, A ↦ DST(μ)(A) is obviously a probability measure on (B↑, B↑). For the measurability of μ ↦ DST(μ)(A) we may take A to be of the form

A = {x ∈ B↑ : x_i = y_i for i = 1, ..., k}

with k ∈ N and y_1 ⊂ y_2 ⊂ ... ⊂ y_k, y_i ∈ B_i. The increasing trees are described by the nodes v_i with y_i = y_{i−1} ∪ {v_i} for i = 2, ..., k, and with these the algorithm leads to

DST(μ)(A) = Π_{i=2}^k μ(B_{v_i}). (14)

The right hand side of (14) is a measurable function of μ.
We now use that, conditionally on M ≡ μ for some fixed μ ∈ M_∞, each of the local increment processes Y(u), u ∈ V, is simply a sequence of independent random variables with values in {−1, 0, 1} and probability mass function p_u given by p_u(−1) = μ(B_{u0}), p_u(0) = 1 − μ(B_u) and p_u(1) = μ(B_{u1}). It is enough to verify that, for all n ∈ N, x ∈ B_n, u ∈ x and v ∈ {u0, u1} ∩ ∂x,

p_u(−1 if v = u0, +1 if v = u1) = P(X_{n+1} = x ∪ {v} | X_1, ..., X_n = x). (15)

For the proof we may assume that v = u0 ∈ ∂x with u ∈ x, the argument for the other case v = u1 being similar. Then the left hand side of (15) evaluates to p_u(−1) = μ(B_v). Further, from the definition of the DST algorithm it follows that the right hand side of (15) is equal to P(ξ_{n+1} ∈ B_v), where (ξ_n)_{n∈N} is the input sequence. As these have distribution μ, this is again equal to μ(B_v).
In order to prove the support statement we first note that the representation (12) gives

P(⋃_{n∈N} X_n = V) = ∫ DST(μ)({x ∈ B↑ : ⋃_{n∈N} x_n = V}) L(M)(dμ). (16)

By the assumption ⋃_{n∈N} X_n = V with probability one, the left hand side has the value 1. Also, 0 ≤ DST(μ)(A) ≤ 1 for all μ ∈ M_∞. Now suppose that M(B_u) = 0 has positive probability for some u ∈ V, so that L(M)({μ ∈ M_∞ : μ(B_u) = 0}) > 0. For each μ in this set, DST(μ)({x ∈ B↑ : ⋃_{n∈N} x_n = V}) = 0 by the definition of the DST algorithm, as the node u is then never inserted. This means that the integrand on the right hand side of (16) vanishes on a set of positive probability for the integrating distribution, which implies that the integral is strictly smaller than 1.

(b) If μ(B_u) > 0 for all u ∈ V then it follows from the definition of the DST algorithm that each of the local increment processes is a sequence of independent and identically distributed random variables, with (p_u(−1), p_u(0), p_u(1)) = (μ(B_{u0}), 1 − μ(B_u), μ(B_{u1})) for all u ∈ V. Hence a tree sequence with distribution DST(μ), μ with full support, is locally exchangeable. To see that this property survives the mixing operation we recall that Y(u) is a deterministic function Ψ_u of X, so that (12) leads to

L(Y(u)) = ∫ DST(μ) ∘ Ψ_u^{−1} L(M)(dμ).

It follows from the above argument for the DST case that only distributions of i.i.d. sequences appear inside the integral, hence Y(u) is exchangeable.
It may seem surprising that only local conditions on the processes Y(u), u ∈ V, are needed. However, the general structure of the tree sequence leads to deterministic relations between these. For example, let B_{u(1)}, ..., B_{u(d)} be a partition of V_∞ (or, equivalently, {u(j) : j ∈ [d]} = ∂x for some x ∈ B) and let S_n(j) := |{v ∈ X_n : u(j) ⪯ v}|, j ∈ [d], be the corresponding subtree sizes; once X_n ⊃ x, exactly one of S_n(1), ..., S_n(d) increases at each step, so that Σ_{j=1}^d S_n(j) = n − |x| for all n with X_n ⊃ x.

Theorem 8. Let μ ∈ M_∞ have full support and let X = (X_n)_{n∈N} be distributed according to DST(μ). Then

√n (t(X_n, u) − μ(B_u))_{u∈V} →_d Z = (Z_u)_{u∈V} as n → ∞,

where Z is a centered Gaussian process with covariance function Cov(Z_u, Z_v) = μ(B_u ∩ B_v) − μ(B_u)μ(B_v), u, v ∈ V, and where the convergence in distribution refers to R^V endowed with the product topology.
Proof: Suppose that A ⊂ V is finite and such that the sets B_u, u ∈ A, are pairwise disjoint with Σ_{u∈A} μ(B_u) = 1. We know from the proof of Theorem 2 (b) that τ_u = inf{n ∈ N : u ∈ X_n} is finite with probability one, for all u ∈ V. Let ρ := sup{τ_u : u ∈ A} and fix some k ∈ N. Then it follows from the description of the DST algorithm with input (ξ_n)_{n∈N} that, conditionally on ρ ≤ k, the random vector Y_n = (Y_{n,u})_{u∈A} with components

Y_{n,u} := |{k < i ≤ n : ξ_i ∈ B_u}| (20)

has a multinomial distribution, with parameters n − k (for the number of trials) and (μ(B_u))_{u∈A} (for the vector of success probabilities). By the central limit theorem for these distributions,

n^{−1/2} (Y_{n,u} − n μ(B_u))_{u∈A} →_d Z as n → ∞, (21)

where the random vector Z is centered normal with covariances

Cov(Z_u, Z_v) = μ(B_u) δ_{u,v} − μ(B_u) μ(B_v), u, v ∈ A.

This implies

n^{−1/2} (n t(X_n, u) − n μ(B_u))_{u∈A} →_d Z as n → ∞, (22)

as the difference between the left hand sides in (21) and (22) converges to zero with probability one as n → ∞ because of (20). All this is conditionally on ρ ≤ k for some k ∈ N. However, as k does not appear in (22) and as ρ < ∞ with probability one, the last statement even holds unconditionally.
On this basis we now deduce the convergence of the finite-dimensional distributions together with the covariance function of the limit process.
For k ∈ N fixed we have asymptotic normality of the random vector Z(k) associated with A := V_k from the above argument. This yields joint asymptotic normality for all Z_u with |u| ≤ k, u ∈ V, as these variables are all linear functions of the vector Z(k). Now let u, v ∈ V. If u ≺ v and |v| = k then, due to the asymptotic negligibility of the difference, we may use the partition of B_u into the sets B_w with |w| = k to obtain, with A(u, v) := {w ∈ V_k : w ≠ v, u ≺ w},

Cov(Z_u, Z_v) = Cov(Z_v, Z_v) + Σ_{w∈A(u,v)} Cov(Z_w, Z_v) = μ(B_v)(1 − μ(B_v)) − Σ_{w∈A(u,v)} μ(B_w) μ(B_v) = μ(B_v)(1 − μ(B_u)).

The case v ≺ u follows by symmetry. Finally, if neither u ≺ v nor v ≺ u then {B_u, B_v} can be augmented to a system {B_w : w ∈ A} to which the first step applies.
As explained in (Billingsley, 1968, p. 17), for R^V this already implies the asserted convergence in distribution, together with the existence of the Gaussian process Z.
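The covariance function appearing in the proof can be checked for internal consistency: since the subtree size fluctuations satisfy Z_u = Z_{u0} + Z_{u1}, the covariances must be additive in the same way. The sketch below (illustrative code, using the Bernoulli(p) model for μ; the helper names are ours) verifies this exactly:

```python
from fractions import Fraction
from itertools import product

def mu(u, p):
    """Bernoulli(p) model: mu(B_u) is the product of p (bit 1) and 1-p (bit 0)."""
    out = Fraction(1)
    for b in u:
        out *= p if b == 1 else 1 - p
    return out

def cov(u, v, p):
    """mu(B_u ∩ B_v) - mu(B_u) mu(B_v); the intersection is B_v if u is a
    prefix of v, B_u if v is a prefix of u, and empty otherwise."""
    if v[:len(u)] == u:
        inter = mu(v, p)
    elif u[:len(v)] == v:
        inter = mu(u, p)
    else:
        inter = Fraction(0)
    return inter - mu(u, p) * mu(v, p)

p = Fraction(1, 3)
words = [w for k in range(4) for w in product((0, 1), repeat=k)]
# Additivity matching Z_u = Z_{u0} + Z_{u1}:
for u in words:
    for w in words:
        assert cov(u, w, p) == cov(u + (0,), w, p) + cov(u + (1,), w, p)
```

Exact rational arithmetic makes the additivity check an identity rather than a numerical approximation.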
The following is now an immediate consequence of the theorem and the mixture representation of the BST distribution, as M_BST has support V_∞ with probability one.