Descendants and ascendants in binary trees

There are three classical algorithms to visit all the nodes of a binary tree: preorder, inorder and postorder traversal. From this one gets a natural labelling of the n internal nodes of a binary tree by the numbers 1, 2, ..., n, indicating the sequence in which the nodes are visited. For given n (size of the tree) and j (a number between 1 and n), we consider the statistics number of ascendants of node j and number of descendants of node j. By appropriate trivariate generating functions, we are able to find explicit formulæ for the expectation and the variance in all instances. The heavy computations that are necessary are facilitated by MAPLE and Zeilberger's algorithm. A similar problem comes from labelling the leaves from left to right by 0, 1, ..., n and considering the statistic number of ascendants (= height) of leaf j. For this, Kirschenhofer [1] has computed the average. With our approach, we are also able to get the variance. In the last section, a table with asymptotic equivalents is provided for the reader's convenience.


Introduction
Binary trees are not only interesting for combinatorialists, but also for computer scientists, since they constitute a basic data structure. As such, naturally, one needs traversal algorithms, which visit every node of the tree. Basically, there are three traversal algorithms: inorder, preorder, and postorder traversal. They are all recursive, the left subtree being treated before the right subtree, and they differ with respect to the visit of the root: first (preorder), middle (inorder), and last (postorder). As general references, we cite Knuth [2], Sedgewick [3], and Sedgewick and Flajolet [4].
The performance of such algorithms is directly related to the following parameters, which are discussed in this paper: the number of ascendants of a node j, which is the number of nodes between the root and the node j; and the number of descendants of a node j, which is the number of nodes of the subtree rooted at node j. (The sum of the numbers of ascendants, over all j, is the so-called path length, a quantity that is traditionally used to measure the complexity of tree algorithms. Clearly, it is independent of the labelling.) The three traversal algorithms induce different labellings of the nodes with the natural numbers 1, 2, ...; the first node visited gets number 1, and so on. Thus, if we have a binary tree with n internal nodes, it makes sense to speak about node j, for 1 ≤ j ≤ n (see Figures 1-3).
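The three labellings and the two parameters can be illustrated with a minimal Python sketch. The tuple representation of trees and the function name `node_stats` are hypothetical choices made for this illustration only. The sketch assumes that both counts include node j itself, so the root has one ascendant and a node with two leaf children has one descendant.

```python
from typing import Dict, Optional, Tuple

# A tree is None (a leaf) or a pair (left, right) standing for an internal node.
Tree = Optional[tuple]


def node_stats(tree: Tree, order: str) -> Dict[int, Tuple[int, int]]:
    """Map each label j to (ascendants, descendants) for the given traversal.

    order is "pre", "in" or "post". Assumption: both counts include node j itself.
    """
    if order not in ("pre", "in", "post"):
        raise ValueError("order must be 'pre', 'in' or 'post'")
    stats: Dict[int, Tuple[int, int]] = {}
    counter = 0

    def visit(node: Tree, depth: int) -> int:
        """Label the subtree rooted at node and return its number of internal nodes."""
        nonlocal counter
        if node is None:
            return 0
        left, right = node
        label = 0
        if order == "pre":          # root is visited first
            counter += 1
            label = counter
        size_left = visit(left, depth + 1)
        if order == "in":           # root is visited between the subtrees
            counter += 1
            label = counter
        size_right = visit(right, depth + 1)
        if order == "post":         # root is visited last
            counter += 1
            label = counter
        size = size_left + size_right + 1
        stats[label] = (depth, size)
        return size

    visit(tree, 1)  # the root counts as its own first ascendant
    return stats


# Root with two children that carry only leaves.
example = ((None, None), (None, None))
for order in ("pre", "in", "post"):
    print(order, node_stats(example, order))
```

The sum of the first components is the same for all three orders, which reflects the remark that the path length does not depend on the labelling.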
From a probabilistic point of view, there are different models, for example binary search trees (see, e.g., Martínez and Prodinger [5]). In this paper, we concentrate on Catalan statistics, i.e. we assume that each binary tree with n internal nodes is equally likely.

Fig. 2: A binary tree with 11 internal nodes that are labelled by preorder traversal. Node 3 has three descendants and three ascendants.
We denote by $\mathcal{B}$ the family of binary trees, sometimes called extended binary trees, which is defined by the formal identity

$$\mathcal{B} = \square + \bigcirc \times \mathcal{B} \times \mathcal{B}, \qquad (1)$$

where $\bigcirc$ is the symbol for an internal node and $\square$ is the symbol for a leaf or external node.
The family of objects from $\mathcal{B}$ with exactly n internal nodes is denoted by $\mathcal{B}_n$. Further, we write $B_n$ for the number of elements of $\mathcal{B}_n$. Finally, we denote by $B(z) = \sum_{n \ge 0} B_n z^n$ the ordinary generating function of the $B_n$'s.
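As a small sanity check, the recursive description of the family (a tree is a leaf, or an internal node with a left and a right subtree) can be turned into an exhaustive generator. The following sketch is an illustration only; the names `all_trees` and `catalan` are hypothetical, and the closed form used for comparison is the standard formula for the Catalan numbers.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def all_trees(n):
    """All binary trees with n internal nodes; None is a leaf, (left, right) an internal node."""
    if n == 0:
        return (None,)
    return tuple(
        (left, right)
        for k in range(n)                     # k internal nodes go to the left subtree
        for left in all_trees(k)
        for right in all_trees(n - 1 - k)
    )


def catalan(n):
    """Closed form binom(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


counts = [len(all_trees(n)) for n in range(8)]
print(counts)
```

The enumeration grows quickly, so it is only meant for small n.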
We are considering exact formulae for the expectation and variance of the number of ascendants (resp. descendants) of node j in a random binary tree with n internal nodes ('of size n'), for all three traversal algorithms.
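For small n, such expectations and variances can be checked by brute force over all trees of size n. The sketch below does this for the number of ascendants under inorder labelling. It is an illustration and not the method of the paper; the names are hypothetical, and it assumes that node j is counted among its own ascendants. The variance is computed through the second factorial moment.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def all_trees(n):
    """All binary trees with n internal nodes; None is a leaf, (left, right) an internal node."""
    if n == 0:
        return (None,)
    return tuple((l, r) for k in range(n)
                 for l in all_trees(k) for r in all_trees(n - 1 - k))


def inorder_ascendants(tree):
    """Return a list a with a[j - 1] = number of ascendants of node j (inorder labels)."""
    out = []

    def visit(node, depth):
        if node is None:
            return
        visit(node[0], depth + 1)
        out.append(depth)          # nodes are appended in inorder, so index = label - 1
        visit(node[1], depth + 1)

    visit(tree, 1)                 # assumption: the root is its own first ascendant
    return out


def moments(n, j):
    """Exact (expectation, variance) of the ascendants of node j over all trees of size n."""
    values = [inorder_ascendants(t)[j - 1] for t in all_trees(n)]
    total = len(values)
    mean = Fraction(sum(values), total)
    second_factorial = Fraction(sum(v * (v - 1) for v in values), total)
    variance = second_factorial + mean - mean ** 2
    return mean, variance


print(moments(2, 1))
```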
A similar problem, namely the number of ascendants of the leaves (labelled from left to right by 0, 1, ..., n), was considered by Kirschenhofer [1,6]. He obtained an explicit formula for the expected value of leaf j. (Kirschenhofer's definition of 'height' differs by 1 from our quantity.) We rederive this and compute the variance additionally.
Apart from Kirschenhofer's formula, all results seem to be new to the best of our knowledge. The reason that people were not considering such problems earlier might be related to the fact that we used the computer algebra system MAPLE quite heavily in this paper, and also the powerful algorithm of Zeilberger [7,8], tools that were not generally available in the eighties. This approach might be less elegant than some clever ad hoc arguments, but it is definitely more flexible.
Except for two instances, we always get closed formulae, although sometimes of a quite complicated nature. Therefore we give asymptotic equivalents in the last section. In the two exceptional cases, the answer is given in terms of a sum which cannot be evaluated in closed form; this might be seen for instance from the asymptotic equivalent or with Petkovšek's algorithm [7].
The labelling by inorder traversal is equivalent to the sequence of MIN-turns; the jth MIN-turn is defined to be the root of the smallest subtree containing the leaves labelled j − 1 and j. Some results about MIN-turns in the context of planted plane (= ordered) trees can be found elsewhere [9,10,11].
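The stated equivalence can be checked mechanically on examples. The following sketch, with the hypothetical name `min_turn_labels` and the same tuple representation of trees (None for a leaf), records for every internal node the pair of neighbouring leaves that it separates and compares the result with the inorder label.

```python
def min_turn_labels(tree):
    """Return {j: inorder label of the root of the smallest subtree containing leaves j-1 and j}."""
    result = {}
    leaf_counter = 0      # next leaf label; leaves are numbered 0, 1, ..., n from the left
    node_counter = 0      # last inorder label handed out

    def visit(node):
        """Return (leftmost leaf label, rightmost leaf label) of the subtree at node."""
        nonlocal leaf_counter, node_counter
        if node is None:
            label = leaf_counter
            leaf_counter += 1
            return label, label
        lo, mid = visit(node[0])       # mid = rightmost leaf of the left subtree
        node_counter += 1
        inorder_label = node_counter
        nxt, hi = visit(node[1])       # nxt = leftmost leaf of the right subtree
        assert nxt == mid + 1          # this node separates two neighbouring leaves
        result[nxt] = inorder_label
        return lo, hi

    visit(tree)
    return result


# A tree with four internal nodes.
tree = ((None, (None, None)), (None, None))
print(min_turn_labels(tree))
```

In each pair printed, the key j and the inorder label coincide, as the text asserts.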
We would like to emphasize, however, that the spirit of this paper is on exact enumeration, not on asymptotics. Questions about limiting distributions, as treated in Gutjahr and Pflug [12] (related to Kirschenhofer's analysis), are not considered here and are left for further research.

Now we sketch the methodology that we use. We always translate the symbolic equation (1) directly into an algebraic equation for the (ordinary) trivariate generating functions

$$F(z,u,v) = \sum_{n,j,l \ge 0} f_{n,j,l}\, z^n u^j v^l, \qquad (2)$$

where the coefficients $f_{n,j,l}$ are defined as the number of objects from $\mathcal{B}_n$ where the node with label j has exactly l ascendants (or descendants, respectively). In short, we do not need to set up recurrences, but we use the symbolic method [4] to get the functions F(z,u,v) immediately. In general, when we are interested in a certain node j, we distinguish three cases, namely whether the root is this node or whether it lies in the left or right subtree. Consequently, the functional equations of later sections have three terms on the right-hand side. In the instance of the next section, where we consider leaves, things are similar but slightly different.
To obtain the expectation $E_{n,j}$, the second factorial moment $M^{(2)}_{n,j}$ and the variance $V_{n,j}$ of the number of ascendants (or descendants, respectively) of the node with label j in a tree with n internal nodes ('of size n') we use the relations

$$E_{n,j} = \frac{1}{B_n}\,[z^n u^j]\,\frac{\partial}{\partial v}F(z,u,v)\Big|_{v=1}, \qquad M^{(2)}_{n,j} = \frac{1}{B_n}\,[z^n u^j]\,\frac{\partial^2}{\partial v^2}F(z,u,v)\Big|_{v=1}, \qquad V_{n,j} = M^{(2)}_{n,j} + E_{n,j} - E_{n,j}^2. \qquad (3)$$

Using the symbolic method, the formal equation (1) translates into an equation for the generating function B(z):

$$B(z) = 1 + zB^2(z). \qquad (4)$$

From the solution

$$B(z) = \frac{1 - \sqrt{1-4z}}{2z} \qquad (5)$$

we get the coefficients of B(z), the so-called Catalan numbers

$$B_n = \frac{1}{n+1}\binom{2n}{n}. \qquad (6)$$

To shorten the presentation, we will use the following abbreviations throughout the whole paper:

The Height of Leaves
We begin by reviewing and extending the results given by Kirschenhofer, but with the method mentioned above. We will translate the formal identity (1) into a functional equation for the trivariate generating function of the desired parameter.
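The formulas of the following sections are written in terms of two symbols X and Y. The definitions of X and Y below are an assumption made for this illustration and are not quoted from the text. They are inferred from equation (4) and from the denominators that appear in the functional equations once v = 1 is substituted.

```latex
% Assumed abbreviations (inferred from the text, not quoted from it):
X := \sqrt{1-4z}, \qquad Y := \sqrt{1-4zu}.

% Solving the quadratic equation (4) for the power series solution gives
B(z) = \frac{1-\sqrt{1-4z}}{2z}
\quad\Longrightarrow\quad
zB(z) = \frac{1-X}{2}, \qquad zuB(zu) = \frac{1-Y}{2}.

% Hence the combination that appears in the denominators becomes
1 - zB(z) - zuB(zu) = \frac{X+Y}{2}.
```

Under this reading, expressions such as $1/(X+Y)^2$ and $1/(X+Y)^3$ arise naturally when a generating function with this denominator is differentiated with respect to v.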
In this section, $f_{n,j,l}$ denotes the number of binary trees of size n where the leaf labelled j has height l. Then we get for the trivariate generating function F(z,u,v) of the $f_{n,j,l}$, defined as in equation (2), the functional equation

$$F(z,u,v) = v + zvF(z,u,v)B(z) + zuvF(z,u,v)B(zu). \qquad (7)$$

This leads immediately to the solution

$$F(z,u,v) = \frac{v}{1 - zvB(z) - zuvB(zu)}. \qquad (8)$$

By differentiating (8) we get, with the use of the abbreviations from above,

$$\frac{\partial}{\partial v}F(z,u,v)\Big|_{v=1} = \frac{1}{\bigl(1 - zB(z) - zuB(zu)\bigr)^2} = \frac{4}{(X+Y)^2}.$$

To find the coefficients in the expansion of $1/(X+Y)^2$, we can expand this expression about u = 1. This leads to

The coefficients are then given by the following sum:

We find a closed form of this sum by using Zeilberger's algorithm† [7,8]: We write

then Zeilberger's algorithm finds the relation

To find an expression for the coefficients we sum the above equation over all $k \in \mathbb{Z}$.

† We note that every summation done with Zeilberger's algorithm could of course also be done with hypergeometric summations.
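Equation (7) can also be read as a recurrence for the numbers $f_{n,j,l}$, which gives a way to tabulate small cases without any closed form. The sketch below is such a reading and not the derivation used in the paper. The function names are hypothetical, the term v is taken to mean that a single leaf has height 1, and the Catalan numbers are computed from their standard closed form.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


def catalan(n):
    """Number of binary trees with n internal nodes."""
    return comb(2 * n, n) // (n + 1)


@lru_cache(maxsize=None)
def f(n, j, l):
    """Number of trees of size n whose leaf j (0 <= j <= n) has height l, read off from (7)."""
    if j < 0 or j > n or l < 1:
        return 0
    if n == 0:
        return 1 if l == 1 else 0            # the term v: a single leaf
    total = 0
    for k in range(n):                       # k = size of the left subtree
        if j <= k:                           # leaf in the left subtree: term z v F(z,u,v) B(z)
            total += f(k, j, l - 1) * catalan(n - 1 - k)
        else:                                # leaf in the right subtree, label shifted by k + 1:
            total += catalan(k) * f(n - 1 - k, j - k - 1, l - 1)   # term z u v F(z,u,v) B(zu)
    return total


def expected_height(n, j):
    """Exact expectation of the height of leaf j in a random tree of size n."""
    return Fraction(sum(l * f(n, j, l) for l in range(1, n + 2)), catalan(n))


print([expected_height(3, j) for j in range(4)])
```

Summing `f(n, j, l)` over l returns the Catalan number for every admissible j, which is a quick consistency check on the recurrence.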
Theorem 1. The expectation $E_{n,j}$ of the height of the leaf with label j in a binary tree of size n is given by

Differentiating equation (8) twice and evaluating at v = 1 leads to

$$\frac{\partial^2}{\partial v^2}F(z,u,v)\Big|_{v=1} = \frac{16}{(X+Y)^3} - \frac{8}{(X+Y)^2}.$$

For this, we also need the coefficients of $1/(X+Y)^3$. To get them we make the following rearrangements:

From this we see that

for 0 ≤ j ≤ n, (18)

which finally leads to

With the relations (3), (6), (14) and (19) we get

(20)

Observing equation (3), we get as a consequence

Theorem 2. The variance $V_{n,j}$ of the height of the leaf with label j in a binary tree of size n is given by

for 0 ≤ j ≤ n. (21)

The Number of Descendants of Nodes Labelled by Inorder Traversal
Now we label the nodes of the tree by 1, 2, ..., n as we visit them by inorder traversal. Then $f_{n,j,l}$ denotes the number of binary trees of size n where the node with label j has exactly l descendants. The trivariate generating function F(z,u,v) (cf. (2)) fulfills the algebraic equation

which can be found by translating the formal equation (1) appropriately. Therefore, we have the following explicit formula:

From this we get by differentiating

The expansion of the coefficients is easy and we get

This gives, with the relations (3), (6) and (25),

Theorem 3. The expectation $E_{n,j}$ of the number of descendants of the node with label j, where the nodes are labelled by inorder traversal, in a binary tree of size n, is given by

Further, we get from (23) the equation

With the relations

and

the expansion of (27) is also easy to do (preferably with the use of a computer algebra system like MAPLE), and we get for

and thus

Hence, it follows by differentiation from equation (34) that

(35)

Therefore, we need the coefficients of

and of

for 1 ≤ j ≤ n. (37)

Combining equations (14), (36) and (37), we get for

For $G_2(z,u)$ we get from equation (34) by differentiating twice and evaluating

To expand this expression we need additionally

With the relations (39), (19), (36), (37) and (40) we get (for

(43)

The Number of Descendants of Nodes Labelled by Preorder Traversal
The following results are obtained by the same methods that we used in the instance of inorder traversal. So we can omit the details and only sketch the main steps to get analogous results in the case of preorder (and postorder traversal, in later sections). The trivariate generating function F(z,u,v) of the numbers $f_{n,j,l}$, counting the trees of size n where the node with label j (labelled by preorder traversal) has exactly l descendants, is given by the functional equation

and explicitly by

From (45) we get

For 1 ≤ j ≤ n only the first summand contributes a nonzero coefficient. At first we get the sum

which we evaluate as before with Zeilberger's algorithm and obtain

This gives us

and furthermore,

Theorem 7. The expectation $E_{n,j}$ of the number of descendants of the node with label j, where the nodes are labelled by preorder traversal, in a binary tree of size n, is given by

For $G_2(z,u)$ we get the following expression:

To read off the coefficients in this expression, we still need to expand X 2 (X+Y) and we get the coefficients from

For the coefficients from $1/(X^3(X+Y))$ we first get the following sum:

which can be simplified using equations (47) and (48), yielding

It can be shown that the remaining sum cannot be brought into a closed form; see the general comment from the introduction.
Combining these results, we get that the second factorial moment $M^{(2)}_{n,j}$ of the number of descendants of the node with label j, where the nodes are labelled by preorder traversal, in a binary tree of size n, is given by $M^{(2)}_{n,j}$ = n(2n 2j +1)

This leads immediately to

Theorem 8. The variance $V_{n,j}$ of the number of descendants of the node with label j, where the nodes are labelled by preorder traversal, in a binary tree of size n, is given by

We note that the full sum can be easily evaluated:

This relation might be useful if j is rather small; the sum can always be rewritten so that it has not more than roughly n/2 terms.

Asymptotics
In this section we evaluate our findings asymptotically, under the assumptions j fixed; $j \sim \rho n$, with $0 < \rho < 1$; n − j fixed.
The asymptotics of the closed form formulae are easy. We only need to apply Stirling's formula for the factorial:

$$n! = \sqrt{2\pi n}\,\Bigl(\frac{n}{e}\Bigr)^{n}\Bigl(1 + O\Bigl(\frac{1}{n}\Bigr)\Bigr).$$

To evaluate the two formulae containing sums asymptotically for a fixed ratio $\rho = j/n$, we need additionally the arcsine law (cf. Feller [13]):

All the computations to get the asymptotic formulae are then standard, thus we only collect the results in a table.
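The quality of Stirling's approximation for the binomial coefficients that typically occur in Catalan-type formulas is easy to inspect numerically. The sketch below compares the central binomial coefficient with the leading term $4^n/\sqrt{\pi n}$ that follows from Stirling's formula. The function name is hypothetical, and the example is an illustration rather than one of the computations of this section.

```python
from math import comb, pi, sqrt


def central_binomial_ratio(n):
    """Ratio of binom(2n, n) to its leading Stirling approximation 4^n / sqrt(pi * n)."""
    return comb(2 * n, n) / (4 ** n / sqrt(pi * n))


for n in (10, 100, 500):
    print(n, central_binomial_ratio(n))
```

The ratio approaches 1 from below as n grows, in line with a relative error of order 1/n.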

Fig. 1: A binary tree with 11 internal nodes that are labelled by inorder traversal. Node 4 has seven descendants and two ascendants; leaf 6 has a height of 5.

Fig. 3: A binary tree with 11 internal nodes that are labelled by postorder traversal. Node 5 has one descendant and four ascendants.

Lemma 1. The second factorial moment $M^{(2)}_{n,j}$ of the height of the leaf with label j in a binary tree of size n is given for all 0 ≤ j ≤ n by

Lemma 4. The second factorial moment $M^{(2)}_{n,j}$ of the number of descendants of the node with label j, where the nodes are labelled by inorder traversal, in a binary tree of size n, is given by

Theorem 4. The variance $V_{n,j}$ of the number of descendants of the node with label j, where the nodes are labelled by inorder traversal, in a binary tree of size n, is given by

Here we denote by $f_{n,j,l}$ the number of binary trees of size n where the node labelled j by inorder traversal has exactly l ascendants. For the usual trivariate generating function F(z,u,v) we get from equation (1) the functional equation

$$F(z,u,v) = zuvB(zu)B(z) + zvF(z,u,v)B(z) + zuvB(zu)F(z,u,v).$$

The variance $V_{n,j}$ of the number of ascendants of the node with label j, where the nodes are labelled by inorder traversal, in a binary tree of size n, is for all 1 ≤ j ≤ n given by

The Number of Ascendants of Nodes Labelled by Preorder Traversal
Now $f_{n,j,l}$ counts the number of binary trees of size n where the node with label j has exactly l ascendants. For the usual trivariate generating function, we get the relation

The expectation $E_{n,j}$ of the number of ascendants of the node with label j, where the nodes are labelled by preorder traversal, in a binary tree of size n, is given for all 1 ≤ j ≤ n by

The Number of Descendants of Nodes Labelled by Postorder Traversal
Now $f_{n,j,l}$ counts the number of binary trees of size n where the node with label j (labelled by postorder traversal) has exactly l descendants. For the trivariate generating function we get the relation

$$F(z,u,v) = zuvB^2(zuv) + zF(z,u,v)B(z) + zB(zu)F(z,u,v),$$

$$F(z,u,v) = \frac{zuvB^2(zuv)}{1 - zB(z) - zB(zu)}.$$
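The three terms of the postorder relation can be read as a recurrence: the root is node n and has n descendants, or node j lies in the left subtree with its label unchanged, or it lies in the right subtree with its label shifted by the size of the left subtree. The following sketch tabulates $f_{n,j,l}$ in this way. It is an illustration with hypothetical function names and assumes, as the term $zuvB^2(zuv)$ suggests, that a node is counted among its own descendants.

```python
from functools import lru_cache
from math import comb


def catalan(n):
    """Number of binary trees with n internal nodes."""
    return comb(2 * n, n) // (n + 1)


@lru_cache(maxsize=None)
def f(n, j, l):
    """Number of trees of size n whose postorder node j has exactly l descendants."""
    if n == 0 or not (1 <= j <= n) or not (1 <= l <= n):
        return 0
    # Root case, term z u v B^2(zuv): the root is node n and has n descendants.
    total = catalan(n) if (j == n and l == n) else 0
    for k in range(n):                       # k = size of the left subtree
        if j <= k:                           # left subtree, term z F(z,u,v) B(z)
            total += f(k, j, l) * catalan(n - 1 - k)
        elif j <= n - 1:                     # right subtree, term z B(zu) F(z,u,v)
            total += catalan(k) * f(n - 1 - k, j - k, l)
    return total


for j in range(1, 4):
    print(j, [f(3, j, l) for l in range(1, 4)])
```

For every j the row sums to the Catalan number of n, since each tree contributes exactly one value of l.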