The Degree Distribution of Thickened Trees

We develop a combinatorial structure to serve as a model of random real-world networks. Starting with plane oriented recursive trees, we substitute the nodes by more complex graphs. In this way we obtain graphs that have a global tree-like structure while looking clustered locally. This fits with observations obtained from real-world networks. In particular, we show that the resulting graphs are scale-free, that is, the degree distribution has an asymptotic power law.


1 Introduction
There has been substantial interest in random graph models where vertices are added to the graph successively and are connected to several already existing nodes according to some given law. The so-called Albert-Barabási model (see Albert and Barabási (2002)) joins a new node to an existing one with probability proportional to the degree. The idea is to model various real-world graphs such as the internet or social networks.
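It turns out that the Albert-Barabási model is not unambiguously defined. One rigorous approach is due to Bollobás and Riordan (2004). They introduced a random (multi)graph $G_n^m$. For example, $G_n^1$ is described in the following way. One starts with an initial node 1 with a loop. (This means that 1 has degree 2.) Then at step $k$ we add one node that is connected to exactly one of the $k$ already present nodes, say to $j \le k$, with probability
$$\Pr\{j = i\} = \begin{cases} d_i/(2k-1) & \text{for } 1 \le i \le k-1, \\ 1/(2k-1) & \text{for } i = k, \end{cases}$$
where $d_i$ denotes the degree of node $i$ before step $k$. Of course, after $n$ steps we have produced a random (multi)graph with vertex set $\{1, 2, \ldots, n\}$ and $n$ edges. For $m > 1$ the graph $G_n^m$ is obtained from $G_{mn}^1$ by identifying, for every $\ell = 1, \ldots, n$, the nodes $\{(\ell-1)m+1, (\ell-1)m+2, \ldots, \ell m\}$ to a new node $\ell$ (and all edges within the nodes $\{(\ell-1)m+1, (\ell-1)m+2, \ldots, \ell m\}$ are now loops of the new node $\ell$). Of course, this procedure results in a random (multi)graph with vertex set $\{1, 2, \ldots, n\}$ and $mn$ edges (see Figure 1).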
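The growth rule for $G_n^1$ can be sketched in a few lines of Python. This is only an illustration under the assumption that the attachment probabilities are the standard Bollobás-Riordan ones (degree of the target divided by $2k-1$, and $1/(2k-1)$ for a loop at the new node); the function names `bollobas_riordan_g1` and `degrees` are hypothetical helpers, not part of any library.

```python
import random
from collections import Counter

def bollobas_riordan_g1(n, rng=None):
    """Grow G_n^1 and return its edge list [(k, j), ...] with j <= k."""
    rng = rng or random.Random()
    # Every edge endpoint is listed once, so a uniform draw from this list
    # picks an existing node with probability proportional to its degree.
    endpoints = [1, 1]            # node 1 with a loop has degree 2
    edges = [(1, 1)]
    for k in range(2, n + 1):
        # Total weight 2k - 1: the 2(k - 1) existing endpoints plus one
        # extra unit for the new node k itself (which creates a loop).
        r = rng.randrange(2 * k - 1)
        j = k if r == 2 * k - 2 else endpoints[r]
        edges.append((k, j))
        endpoints.extend((k, j))
    return edges

def degrees(edges):
    """Degree of every node; a loop contributes 2 to its node."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg
```

Keeping a flat list of endpoints avoids recomputing degrees at every step; the sum of all degrees after $n$ steps is $2n$, in line with the $n$ edges mentioned above.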
It turns out that the degree distribution of $G_n^m$ satisfies a power law. The probability that a randomly chosen node of $G_n^m$ has degree $d$ is asymptotically proportional to $d^{-3}$ (see Bollobás et al. (2001)). Graphs with this property are called scale-free.
The PORT-model. We first introduce a slightly modified evolution process that always leads to a labelled recursive tree. The process starts with the root that is labelled with 1. Then inductively at step $j$ a new node (with label $j$) is attached to any previous node of out-degree $k$ with probability proportional to $k + 1$. These kinds of trees are also called plane oriented recursive trees (PORTs). This evolution process is quite similar to the process that produces (usual) recursive trees. A (usual) recursive tree is a rooted tree (with $n$ nodes) where the nodes are labelled with $1, 2, \ldots, n$ such that all successors of each node have a larger label. In particular, the root has label 1, and every path from the root to a leaf has strictly increasing labels. As above, we can consider a recursive tree as the result of an evolution process. The process starts (as above) with the root (that gets label 1). Next, another node is attached to the root (that gets label 2) and in every step a new node is attached to an already existing node (and gets the next label). The labels are the history of the tree evolution.
The PORT-model, where we attach a new node according to the degree distribution of the already existing tree, can also be seen as a planar version of the recursive trees. Namely, if a node of a planar (rooted) tree has out-degree $k$, that is, it has degree $d = k + 1$, then there are precisely $d$ ways to attach a new node there in order to get different planar trees. This explains the name plane oriented recursive trees.
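The evolution process can be simulated directly from the observation that a node of out-degree $k$ offers $k + 1$ places for a new child. The following Python sketch is only an illustration; `random_port` and its return format are hypothetical choices made here, not notation from the text.

```python
import random

def random_port(n, rng=None):
    """Return (parent, children) of a random PORT with nodes 1..n.

    parent[v] is the predecessor of v (0 for the root), children[v]
    lists the successors of v in left-to-right order.
    """
    rng = rng or random.Random()
    parent = [0, 0]          # index 0 is unused, node 1 is the root
    children = [[], []]
    # Node v appears (out-degree of v) + 1 times in 'slots', so a uniform
    # draw selects v with probability proportional to out-degree + 1.
    slots = [1]
    for j in range(2, n + 1):
        v = slots[rng.randrange(len(slots))]
        # Choose one of the out-degree + 1 positions among v's children.
        pos = rng.randrange(len(children[v]) + 1)
        children[v].insert(pos, j)
        parent.append(v)
        children.append([])
        slots.append(v)      # v gained one child, hence one more slot
        slots.append(j)      # the new leaf j offers exactly one slot
    return parent, children
```

After $n$ nodes the list `slots` has length $2n - 1$, which is the total weight $\sum_v (\text{out-degree}(v) + 1)$ of the tree.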
We will indicate in the next section that the degree distribution of these random trees is scale-free as in the case of $G_n^m$. In fact, we have $\lim_{n\to\infty} p_n(d) = \frac{4}{d(d+1)(d+2)}$, where $p_n(d)$ denotes the probability that a random node in a random PORT of size $n$ has degree $d$.
The thickened PORT-model. Now, we introduce a substitution process that creates random graphs that have a global tree structure that is governed by plane oriented recursive trees.
For every $k \ge 0$ let $T_k$ denote a non-empty set of labelled graphs with half edges attached to their nodes in such a way that each graph receives in total $k + 1$ half edges. In addition, the half edges are ordered. Now consider the following random process. Take a tree $T$ according to the PORT-model. Then we substitute every node $v$ (of out-degree $k$) in the following way: Cut $v$ and one half of each edge incident with $v$. Then take a randomly chosen graph $G$ of $T_k$ and glue the $k + 1$ half edges of $G$ to those left in $T$ by the cutting of $v$, respecting the given order, i.e., the half edge coming from the predecessor of $v$ is glued to the 0th half edge of $G$ and the 1st, 2nd, ..., $k$th successor of $v$ is attached to the 1st, 2nd, ..., $k$th half edge of $G$, respectively. Further we relabel all nodes in the new graph $G = G(T)$ in a way that is consistent with the original labelling. We call the graphs that are obtained by this process thickened trees or, more precisely, thickened PORTs.

Fig. 2: A simple example of a thickened PORT. The original tree has only nodes of out-degree 0, 1, or 2, so the choice of the sets $T_k$ with $k > 2$ is not relevant for the thickening process. Node 2, e.g., is cut along the circular dashed line. Since it has out-degree 2, we choose one of the three graphs in $T_2$ (here the first one was chosen) and put it into the place of node 2. Applying the same procedure to all nodes and relabelling afterwards yields the right graph, a thickened PORT.
The idea behind this model is to simulate real networks that are produced by an evolution process (following, for example, the Albert-Barabási principle) and have a global tree structure with local clusters. Of course, we make a strong simplification. Our model relies on a two-step procedure. We first let a tree network evolve (following the Albert-Barabási principle) and then replace the tree nodes by random clusters. Hence, the clusters are not produced by an evolution process. Nevertheless we think that our model has several advantages and can be used to explain several properties that are observed in practice:
• There is large flexibility in choosing the structure of local clusters and thus the choice of the sets $T_k$ can be adapted to the situation.
• The model is amenable to an analytic treatment.
• It can be used to study (analytically) the influence of local changes of the network on the global behaviour.
The main focus of our paper is the degree distribution. We show that under natural conditions the resulting network is scale-free and that the number of nodes of given degree satisfies a central limit theorem.

2 The degree distribution of PORTs
In this section we briefly present a proof that PORTs are scale-free. In particular we show the following property that is due to Mahmoud et al. (1993), see also Bergeron et al. (1992) for similar results and Kuba and Panholzer (2007) for generalizations. The reason for presenting a proof is that the proof of a corresponding property for thickened PORTs will work along similar lines.
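Theorem 1 Let $p_n(d)$ denote the probability that a random node in a random PORT of size $n$ has degree $d$. Then
$$\lim_{n\to\infty} p_n(d) = \frac{4}{d(d+1)(d+2)}.$$
Moreover, let $X_n^{(d)}$ denote the number of nodes of degree $d$ in random PORTs of size $n$. Then $X_n^{(d)}$ satisfies a central limit theorem with mean and variance asymptotically proportional to $n$ […], where $B(a, b)$ denotes the Beta-function.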
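The limit in Theorem 1 can be checked numerically without any simulation. The sketch below assumes the growth rule from the introduction (a node of out-degree $k$ is chosen with probability $(k+1)/(2m-1)$ in a tree of size $m$) and the convention $d = k + 1$, which treats the root like any other node; `expected_fractions` is a hypothetical helper name. It iterates the exact recurrence for the expected number of nodes of each out-degree and compares the result with $4/(d(d+1)(d+2))$.

```python
def expected_fractions(n, kmax):
    """E[number of nodes of out-degree k] / n for k = 0..kmax-1 in a PORT of size n."""
    # e[k] = expected number of nodes of out-degree k in a tree of size m
    e = [0.0] * kmax
    e[0] = 1.0                      # size 1: the root is a leaf
    for m in range(1, n):           # grow from size m to size m + 1
        w = 2 * m - 1               # total weight: sum of (out-degree + 1)
        new = e[:]
        for k in range(kmax):
            new[k] -= (k + 1) * e[k] / w      # a node of out-degree k is chosen
            if k > 0:
                new[k] += k * e[k - 1] / w    # a node moves from k - 1 to k
        new[0] += 1.0               # the attached node is a new leaf
        e = new
    return [x / n for x in e]

if __name__ == "__main__":
    fractions = expected_fractions(5000, 5)
    for k, f in enumerate(fractions):
        d = k + 1
        print(d, round(f, 5), round(4 / (d * (d + 1) * (d + 2)), 5))
```

Truncating at `kmax` is harmless because the expected count for out-degree $k$ only depends on the counts for out-degrees $k$ and $k - 1$.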
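Proof: The proof of Theorem 1 is based on a generating function approach. Let $y_n$ denote the number of PORTs of size $n$ and $y(z) = \sum_{n \ge 1} y_n \frac{z^n}{n!}$. It is well known that the generating function of PORTs satisfies the differential equation
$$y' = \frac{1}{1-y}, \qquad y(0) = 0,$$
that reflects the recursive structure of PORTs. The solution is $y(z) = 1 - \sqrt{1 - 2z}$ and, thus, one has $y_n = (2n-3)!! = \frac{(2n-2)!}{2^{n-1}(n-1)!}$. We now turn to the degree distribution. By obvious reasoning we have $p_n(d) = \frac{1}{n}\,E\,X_n^{(d)}$.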
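Thus, we can get the degree distribution with help of the distribution of $X_n^{(d)}$. Let $y_{n,k}^{(d)}$ denote the number of PORTs of size $n$ with exactly $k$ nodes of degree $d$. Then the probability generating function $E\,u^{X_n^{(d)}}$ is given by
$$E\,u^{X_n^{(d)}} = \frac{1}{y_n} \sum_{k \ge 0} y_{n,k}^{(d)} u^k,$$
and the double generating function $Y(z, u) = \sum_{n,k} y_{n,k}^{(d)} \frac{z^n}{n!} u^k$ satisfies the differential equation
$$\frac{\partial Y}{\partial z} = \frac{1}{1-Y} + (u-1)\,Y^{d-1}, \qquad Y(0, u) = 0.$$
Equivalently we have
$$\int_0^{Y} \frac{dt}{\frac{1}{1-t} + (u-1)\,t^{d-1}} = z. \qquad (2)$$
We first determine the expected value $E\,X_n^{(d)}$. For this purpose we set $S(z) = \frac{\partial Y}{\partial u}(z, 1)$. From (2) we get that this function satisfies the differential equation
$$S' = \frac{S}{(1-y)^2} + y^{d-1}$$
and has solution
$$S(z) = \frac{1}{1 - y(z)} \int_0^{y(z)} (1-t)^2\, t^{d-1}\, dt.$$
Consequently, we have $S(z) \sim B(d, 3)/\sqrt{1 - 2z}$ as $z \to 1/2$, which implies (with help of the transfer lemma of Flajolet and Odlyzko (1990)) that
$$E\,X_n^{(d)} = \frac{n!}{y_n}\,[z^n]\,S(z) \sim 2\,B(d, 3)\, n = \frac{4n}{d(d+1)(d+2)}.$$
Consequently, we get the degree distribution $\lim_{n\to\infty} p_n(d) = \frac{4}{d(d+1)(d+2)}$.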
In order to prove the central limit theorem we go back to (2) and similarly expand the integral […]. Observe that $D(y, u)$ is asymptotically given by […]. Hence (2) translates to […], which can be inverted to […].

3 The degree distribution of thickened PORTs
We now deal with a substitution process that creates random graphs that have a global tree structure that is governed by plane oriented recursive trees.
Let us consider the formal solution $y = y(z, x_0, x_1, x_2, \ldots)$ of the differential equation
$$y' = \sum_{k \ge 0} x_k y^k, \qquad y(0) = 0,$$
where $'$ denotes differentiation with respect to $z$. It is clear that $y = y(z, x_0, x_1, x_2, \ldots)$ can be considered as a power series in $z, x_0, x_1, \ldots$ By construction the coefficient $n!\,[z^n x_0^{k_0} x_1^{k_1} \cdots]\, y$ is exactly the number of PORTs $T$ of size $n$ with $k_j$ nodes of out-degree $j$ ($j \ge 0$).
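As a small sanity check of this set-up, one can specialise all $x_k$ to 1. Assuming the differential equation has the form $y' = \sum_{k\ge 0} x_k y^k$ (as recalled in the proof of Lemma 1), this gives $y' = 1/(1-y)$ with $y(0) = 0$, which is equivalent to $y = z + y^2/2$. The Python sketch below extracts the coefficients $y_n = n!\,[z^n]\,y$ from that quadratic relation and compares them with the double factorial $(2n-3)!!$; the function names are hypothetical.

```python
from math import comb

def port_counts(nmax):
    """y[n] = n! [z^n] y(z) for the solution of y = z + y^2 / 2."""
    y = [0] * (nmax + 1)
    y[1] = 1
    for n in range(2, nmax + 1):
        # Coefficient of z^n / n! in y^2 is a binomial convolution.
        s = sum(comb(n, j) * y[j] * y[n - j] for j in range(1, n))
        y[n] = s // 2        # s is even: terms pair up, and C(2m, m) is even
    return y

def odd_double_factorial(n):
    """(2n - 3)!! = 1 * 3 * 5 * ... * (2n - 3); the empty product for n = 1 is 1."""
    p = 1
    for i in range(1, 2 * n - 2, 2):
        p *= i
    return p

if __name__ == "__main__":
    counts = port_counts(8)
    print(counts[1:])
    print([odd_double_factorial(n) for n in range(1, 9)])
```

Working with integers throughout avoids any rounding issues, so the two printed lists can be compared exactly.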
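For every $k \ge 0$ let $T_k$ denote a non-empty set of labelled graphs with $k + 1$ additional half edges $\tilde e_0, \tilde e_1, \ldots, \tilde e_k$. Further, let
$$t_k(z) = \sum_{G \in T_k} \frac{z^{|G|}}{|G|!},$$
where $|G|$ is the number of nodes of $G$, denote the exponential generating function of these graphs.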
We recall that we consider the following random process. For every PORT $T$ we substitute every node $v$ (of out-degree $k$) by a randomly chosen graph of $T_k$, where the half edges $\tilde e_0, \tilde e_1, \ldots, \tilde e_k$ are glued to the edge coming from the predecessor of $v$ and to the $k$ successors of $v$, respectively, corresponding to the left-to-right order. Further we relabel all nodes in the new graph $G = G(T)$ in a way that is consistent with the original labelling. Thus, the generating function $g(z) = \sum_{n \ge 1} g_n \frac{z^n}{n!}$ of the numbers $g_n$ of graphs that are produced in this way is given by $g(z) = y(z, t_0(z)/z, t_1(z)/z, \ldots)$.
The graphs that are obtained by this process are denoted here by thickened trees or thickened PORTs.
Lemma 1 Let
$$F(z, y) = \int_0^y \frac{dt}{\sum_{k \ge 0} \frac{t_k(z)}{z}\, t^k}.$$
Then $g(z)$ satisfies the functional equation $F(z, g(z)) = z$.

Proof: We recall that $y = y(z, x_0, x_1, \ldots)$ satisfies the differential equation $y' = \sum_{k \ge 0} x_k y^k$. Hence,
$$\int_0^y \frac{dt}{\sum_{k \ge 0} x_k t^k} = z + c$$
for some constant $c$. Of course, if we substitute $x_k$ by $t_k(z)/z$ then we immediately get the result. Note further that $g(0) = 0$ and $F(0, 0) = 0$. Thus, we can fix $c = 0$. □

In a similar way we can deal with parameters. For example, fix a degree $d$ and let
$$t_k(z, u) = \sum_{G \in T_k} \frac{z^{|G|}}{|G|!}\, u^{N_d(G)},$$
where $N_d(G)$ denotes the number of nodes in $G$ of degree $d$ including the half-edges $\tilde e_0, \ldots, \tilde e_k$. Then the generating function $g(z, u) = y(z, t_0(z, u)/z, t_1(z, u)/z, \ldots)$ encodes the distribution of nodes of degree $d$ of thickened trees. Of course, the above lemma extends to this case. It is now convenient to introduce the notation
$$T_d(z, y, u) = \sum_{k \ge 0} \frac{t_k(z, u)}{z}\, y^k.$$
Of course, for all $d$ we have $T_d(z, y, 1) = \sum_{k \ge 0} \frac{t_k(z)}{z}\, y^k$.
Lemma 2 The generating function $g(z, u)$ satisfies the functional equation
$$\int_0^{g(z, u)} \frac{dt}{T_d(z, t, u)} = z.$$

Example. We just give a simple example. Suppose that $T_k$ consists for every $k \ge 0$ of exactly two graphs, the first one with one node and the second with two nodes, where all half-edges $\tilde e_1, \ldots, \tilde e_k$ that will be linked to the $k$ subgraphs are on the second node. Figure 3 depicts the set $T_3$.
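In this example we have for $d \ge 3$
$$t_k(z, u) = \begin{cases} \left(z + \frac{z^2}{2!}\right) u & \text{if } k = d - 1, \\ z + \frac{z^2}{2!} & \text{otherwise,} \end{cases}$$
and consequently
$$T_d(z, y, u) = \left(1 + \frac{z}{2}\right) \left( \frac{1}{1-y} + (u-1)\, y^{d-1} \right).$$
The cases $d = 1$ and $d = 2$ are similar.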
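The two-step procedure for this example can be imitated by a short simulation. The sketch below rests on several assumptions that are not spelled out in the text: the two nodes of the second graph are joined by an edge, the half-edge $\tilde e_0$ sits on its first node, each tree node picks one of the two graphs with probability 1/2, the dangling half-edge at the root is ignored, and nodes are numbered in order of creation rather than relabelled canonically. All function names are hypothetical. Note also that choosing a graph independently for every tree node need not reproduce the distribution on thickened PORTs of a fixed total size that is analysed via generating functions, so the output should be read as an illustration only.

```python
import random
from collections import Counter

def random_port_parents(n, rng):
    """parent[j] for j = 2..n in a random PORT (weight: out-degree + 1)."""
    slots, parent = [1], {}
    for j in range(2, n + 1):
        v = slots[rng.randrange(len(slots))]
        parent[j] = v
        slots += [v, j]          # one more slot for v, one slot for the leaf j
    return parent

def thicken_one_or_two(n, rng=None):
    """Replace each tree node by one node or by two joined nodes.

    Returns (number of nodes, edge list) of the resulting graph.
    """
    rng = rng or random.Random()
    parent = random_port_parents(n, rng)
    top, bottom, edges = {}, {}, []
    fresh = 0
    for v in range(1, n + 1):
        fresh += 1
        top[v] = bottom[v] = fresh       # one-node graph: both roles coincide
        if rng.random() < 0.5:           # two-node graph (probability 1/2 is an assumption)
            fresh += 1
            bottom[v] = fresh            # successors attach to the second node
            edges.append((top[v], bottom[v]))
    for j, v in parent.items():
        # Glue a half-edge on v's second node to half-edge 0 of j.
        edges.append((bottom[v], top[j]))
    return fresh, edges

def degree_histogram(edges):
    """Map degree -> number of nodes with that degree."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return Counter(deg.values())
```

Because all outgoing half-edges sit on the second node, the left-to-right order of the successors does not influence the degrees, which is why the sketch only records parents.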
Our main result is the following theorem:

Theorem 2 Let $T_k$ be substitution sets (as described above) such that the equation $F(\rho, 1) = \rho$ has a unique positive solution $\rho$ in the region of convergence of $T_d(z, y, u)$ and such that $T_d(z, y, u)$ can be represented as […], where $r$ and $r'$ are real numbers with $0 < r' \le r$, $\alpha$ is an integer, $C_0(z, y)$ and $C_1(z, y)$ are power series that contain $z = \rho$ and $y = 1$ in their regions of convergence and that satisfy $C_i(\rho, 1) \ne 0$ for $i = 0, 1$, and the $O(\cdot)$-term is uniform in a neighbourhood of $z = \rho$ and $y = 1$. Let $p_n(d)$ denote the probability that a random node in a thickened PORT of size $n$ has degree $d$. Then the limits $\lim_{n\to\infty} p_n(d) = p(d)$ exist […]. Furthermore, […] are both asymptotically proportional to $n$.

Remark. The conditions on the generating function $T(z, y, u)$ can be (more or less) interpreted in the following way. If we set $u = 1$ then the singularity $1/(1-y)^r$ in $T(z, y, 1)$ essentially says that the set $T_k$ consists of $\approx k^{r-1}$ graphs. Furthermore, for $k \ge d + \alpha$ there are $\approx k^{r-r'-1}$ graphs with a vertex of degree $d$; compare also with the examples given in Section 4.

Proof:
The proof runs along similar lines as that of Theorem 1. We start by inspecting the generating function $g(z)$ of all thickened PORTs. For simplicity we assume that the substitution sets $T_k$ are of a form such that $g_n > 0$ for sufficiently large $n \ge n_0$, that is, we exclude, for example, the case that the numbers of nodes of graphs in $T_k$ are all congruent to 1 modulo some integer $m > 1$ (i). Then it follows that $|g(z)| < g(|z|)$ if $z$ is not contained in the positive real line.
We first observe that $\rho > 0$ is the only singularity on the circle of convergence $|z| = \rho$ and that $g(\rho) = 1$, that is, $g(z)$ is convergent at $z = \rho$. First it is clear that $g(z)$ can be analytically continued starting with $g(0) = 0$ and the functional equation $F(z, g) = z$. However, if $g(z_0) \ne 1$ for some $z_0$ contained in the region of convergence of $g(z)$ then we have $F_y(z_0, g(z_0)) \ne 0$. Thus, we can continue analytically with help of the implicit function theorem. Thus, if $g(z)$ has a singularity $\rho$ and if $g(\rho)$ is convergent then $g(\rho) = 1$. Since $g(z)$ is monotone and analytic it certainly reaches a value with $g(\rho) = 1$ where it has to be singular. Further, $\rho$ is characterized by the equation $F(\rho, 1) = \rho$.
Next we characterize the kind of singularity of $g(z)$ at $z = \rho$. By Lemma 1 we have […]. Hence, by expanding $1/C_0(z, t)$ locally around $t = 1$ we get […] (3). Since $G(\rho) = \rho$ and $C_0(z, y)$ is increasing in $z$ we can represent $(G(z) - z)/c_0(z) = K(z)(1 - z/\rho)$. Furthermore, we can invert the relation (3) and obtain […].
Since there are no other singularities on the circle $|z| = \rho$ and $g(z)$ can be analytically continued to a larger range (except at the point $z = \rho$) it follows from Flajolet and Odlyzko (1990) that […]. Next we determine asymptotics of the average value $E\,X_n$. Set $S(z) = \frac{\partial}{\partial u} g(z, 1)$. Then it follows from Lemma 2 that […].

(i) We call this the aperiodic case. In the periodic case we have to deal with $m$ singularities on the boundary of the circle of convergence of $g(z)$ which are all of the same kind.
As in the proof of Theorem 1 it follows that […]. Thus, the limit $p(d) = \lim_{n\to\infty} E\,X_n/n$ exists and is asymptotically given by […] for some constant $C > 0$.
Finally, the proof that the limiting distribution is normal is very similar to the corresponding proof of Theorem 1. We skip the details. □

Substituting by one or two nodes
Let us continue the example preceding Theorem 2. Here we have (for $d \ge 3$):
$$T_d(z, y, u) = \frac{1 + z/2}{1 - y} + \left(1 + \frac{z}{2}\right)(u - 1)\, y^{d-1}.$$
Thus, Theorem 2 applies with $r = r' = 1$ and $\alpha = -1$. The degree distribution $p(d)$ is scale-free with tail $p(d) \sim C/d^3$. A detailed computation (including the instances $d < 3$ also) shows that the degree distribution is given as follows: […]

Thickening with triangles 1
We consider two examples where we substitute each node of the original tree by a triangle. More precisely, each node of out-degree $k$ is substituted by a triangle with $k + 1$ half-edges $\tilde e_0, \ldots, \tilde e_k$ attached to it. The half-edge $\tilde e_0$ is glued to the predecessor of the original node and the other half-edges to the successors of the original node according to their naturally given order (e.g. left to right in the case of plane trees). Let the parameter of interest be the number of nodes of degree $d$. Then we have
$$t_k(z, u) = \frac{z^3}{3!} \sum_{i=0}^{3} a_{i,k}\, u^i,$$
where $a_{i,k}$ is the number of configurations (triangles with $k$ half-edges) containing $i$ nodes of degree $d$.

We first consider the special case where the ingoing edge of each triangle is separated from the outgoing edges. That means that $\tilde e_0$ connects the predecessor of the triangle to a vertex of degree 3 (the edges are then $\tilde e_0$ and two edges of the triangle) while $\tilde e_1, \ldots, \tilde e_k$ are connected to the other two vertices of the triangle. Clearly we then have $t_k(z) = (k + 1)\, \frac{z^3}{3!}$.
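When focusing on the number of nodes of degree $d$, it is easy to see that (for $d \ge 4$) we have
$$a_{0,k} = \begin{cases} k + 1 & \text{if } k < d - 2, \\ k - 1 & \text{if } k \ge d - 2 \text{ and } k \ne 2d - 4, \\ k & \text{if } k = 2d - 4. \end{cases}$$
This holds because of the following argument. Let us label the nodes of the triangle by 0, 1, 2 where $\tilde e_0$ is attached to 0. Then $\ell_1$ of the edges $\tilde e_1, \ldots, \tilde e_k$ are attached to 1 and $\ell_2 = k - \ell_1$ to 2. The configuration contains at least one node of degree $d$ if and only if $\ell_1 = d - 2$ or $\ell_2 = d - 2$. Exactly these cases do not contribute to $a_{0,k}$. Moreover we get $a_{3,k} = 0$.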
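The counting argument can be verified by brute force. The Python sketch below assumes that the $k + 1$ configurations behind $t_k(z) = (k+1)\,z^3/3!$ correspond to the choices $\ell_1 = 0, 1, \ldots, k$, that is, the first $\ell_1$ outgoing half-edges go to node 1 and the remaining $k - \ell_1$ to node 2; `triangle_config_counts` is a hypothetical helper name.

```python
def triangle_config_counts(k, d):
    """Return [a_0k, a_1k, a_2k, a_3k] for a triangle replacing a node of out-degree k."""
    a = [0, 0, 0, 0]
    for l1 in range(k + 1):               # l1 half-edges on node 1, k - l1 on node 2
        # Node 0 carries half-edge 0 plus two triangle edges; nodes 1 and 2
        # carry two triangle edges plus their share of outgoing half-edges.
        degs = (3, 2 + l1, 2 + k - l1)
        a[sum(1 for x in degs if x == d)] += 1
    return a

if __name__ == "__main__":
    d = 5
    for k in range(0, 9):
        print(k, triangle_config_counts(k, d))
```

For $d \ge 4$ node 0 never has degree $d$, so the last entry of the returned list is always 0 and the entries sum to $k + 1$.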

Fig. 1: The multigraph model of Bollobás and Riordan.