Bounded discrete walks

This article tackles the enumeration and asymptotics of directed lattice paths (which are isomorphic to unidimensional paths) of bounded height (walks below one wall, or between two walls, for $\textit{any}$ finite set of jumps). In particular, for any set of jumps, we give the generating functions of bridges ("discrete" Brownian bridges) and reflected bridges ("discrete" reflected Brownian bridges) of a given height. It is a new success of the "kernel method" that the generating functions of such walks have nice expressions as symmetric functions of the roots of the kernel. These formulae also lead to fast algorithms for computing the $n$-th Taylor coefficients of the corresponding generating functions. For a large class of walks, we give the discrete distribution of the height of bridges, and show the convergence to a Rayleigh limit law. For the family of walks consisting of a $-1$ jump and several positive jumps, we give more precise bounds for the speed of convergence. We end our article with a heuristic application to bioinformatics that yields a large speed-up relative to previous work.


Introduction
A lattice path is the drawing in $\mathbb{Z}^2$ of a sum of vectors from $\mathbb{Z}^2$ (where the vectors belong to a fixed finite set, and where the origin of the path is usually taken to be the point $(0,0)$). If all vectors are in $\mathbb{N} \times \mathbb{Z}$, the path is called directed (the path is going "to the right"). In this article, we consider the case for which all vectors are in $\{1\} \times \mathbb{Z}$; despite its natural representation as a drawing in $\mathbb{Z}^2$, such a path is essentially a unidimensional object. If all vectors are in $\{1\} \times (\mathbb{N} \cup \{-1\})$, the path is called a "simple lattice path" or a "Łukasiewicz path", because there exists a bijection with simple families of trees. Unidimensional lattice paths pop up naturally in numerous fields (probability theory, combinatorics, algebra, economics, biology, analysis of algorithms, language theory, ...).
For lattice paths (or their probabilistic equivalent, random walks) which are space and time homogeneous, it was shown [4] that symbolic and analytic combinatorics [17] are powerful tools to study unidimensional lattice paths from an enumerative and asymptotic point of view. The authors of these articles developed a generating function approach to get the exact enumeration of lattice paths (via the kernel method), and then used singularity analysis to study some basic parameters (such as the number of returns to zero or the final altitude) under various constraints (drift, reflecting conditions). For random walks, there are numerous works with a flavour of "generating functions" and "singularity analysis", using either probabilistic or combinatorial methods, e.g. for the interaction of random walks, for random walks on groups, or for stationary distributions of 2-dimensional models in queueing theory.
In our simpler model of unidimensional lattice paths, for simple parameters which are in some sense "exactly solvable", one can expect more than for the difficult problems above: not only can one get critical exponents, but one also gets fast computation schemes, both for exact enumeration and for full asymptotics (to any order). It is then natural to ask what can be obtained for less trivial parameters, like the area under the walk or its height. The area was investigated in another article [6]; we concentrate here on the height. The height has already been investigated in combinatorics, mainly for Dyck paths (walks with only $+1$ and $-1$ jumps), which have a nice relationship with continued fractions and Chebyshev polynomials.
In our approach, one of the key tricks to solve the model for any set of jumps is the so-called "kernel method", which is a way of solving a functional equation of the type $K(z,u)F(z,u) = A(z,u) + B(z,u)G(z)$, where $F$ and $G$ are the unknowns one wishes to determine. The kernel method consists in getting additional equations by plugging the roots $u(z)$ of the "kernel" $K(z,u)$ into the initial equation, which in general is enough to solve the system. The kernel method shares the spirit of the "quadratic method" of Tutte and Brown (for the enumeration of maps). In combinatorics, only the simplest case of the kernel method (namely, when there is only one root) was used for 30 years, see Knuth [22] for sorting with stacks, Chung et al. [11] for a pebbling game, or [25] for the generation of binary trees. During the same period, and independently, difficult 2-dimensional generalisations of this method were studied in queueing theory; the classification of the different cases for the nearest-neighbour walks in $\mathbb{N}^2$ was already quite a challenge, see the book [15] or [24]. In the last decade, there has been a revival of functional equations in combinatorics, and the full power of the kernel method has been brought to light, both for enumeration and for asymptotics. Solving equations is not the only miracle that the kernel method offers: it also gives compact formulae [9,1,3], offers nice interactions with computer algebra (see the recent solution of Gessel's conjecture [20,7]), and gives access to asymptotics [4,2,26]. Our article thus adds a new stone to the "kernel method" edifice, and gives more results on the enumeration and asymptotics of the height of directed lattice paths, while other approaches (e.g. transfer matrices or probability theory) would either have a much higher complexity or would not give access to exact enumeration or to higher-order asymptotics.

Preliminaries and previous results
As stated previously, we consider in this article directed walks with jumps in $\{1\} \times J$, where $J$ is any finite subset of $\mathbb{Z}$. In this context, the altitude of the walk at time $n$ is the sum of $n$ increments taken in $J$. Our main tool is the generating function whose $n$-th coefficient is the number of walks of length $n$ ending at altitude $k$. To each set of jumps, we associate a Laurent polynomial which encodes all the possible jumps, $P(u) := \sum_{i=-c}^{d} p_i u^i$, where $c$ is the size of the largest backward jump, $d$ is the size of the largest upward jump, and the $p_i$'s are some "weights", "multiplicities", or "probabilities". Figure 1 shows four drawings (for four different constraints) of lattice paths with jumps in $J = \{-2, -1, 0, 1, 2, 4\}$; the associated Laurent polynomial is therefore $P(u) = p_{-2}u^{-2} + p_{-1}u^{-1} + p_0 + p_1 u + p_2 u^2 + p_4 u^4$.

[Figure 1: four types of lattice paths, ending anywhere or ending at 0; the unconstrained ones (on $\mathbb{Z}$) ending anywhere are the walks (W).] The $u_i$'s are such that $1 - zP(u_i(z)) = 0$; the constants $\varepsilon_0$, $\beta_0$, and the $\mu_0$'s are algebraic numbers which are made explicit in [4].

In the rest of this article, we want to consider walks of bounded height (i.e., below one wall or between two walls).
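The encoding of jumps by $P(u)$ can be made concrete with a short Python sketch (the dictionary representation `{jump: weight}` and all function names are our own, hypothetical choices): the coefficient of $u^k$ in $P(u)^n$ is the weighted number of unconstrained walks of length $n$ ending at altitude $k$. Repeated multiplication costs a number of operations quadratic in $n$, so this is only meant for checking small cases.

```python
from collections import defaultdict

def laurent_mul(a, b):
    """Product of two Laurent polynomials stored as {exponent: coefficient}."""
    out = defaultdict(int)
    for i, x in a.items():
        for j, y in b.items():
            out[i + j] += x * y
    return dict(out)

def walk_polynomial(P, n):
    """Return P(u)^n; its coefficient of u^k is the weighted number of
    walks of length n (jumps encoded by P) ending at altitude k."""
    result = {0: 1}
    for _ in range(n):
        result = laurent_mul(result, P)
    return result

# Jump set J = {-2, -1, 0, 1, 2, 4} of Figure 1, with all weights p_i = 1
# (an arbitrary illustrative choice).
P_fig1 = {j: 1 for j in (-2, -1, 0, 1, 2, 4)}
# Dyck jumps {-1, +1}.
P_dyck = {-1: 1, 1: 1}

print(walk_polynomial(P_dyck, 4)[0])             # bridges of length 4: 6
print(sum(walk_polynomial(P_fig1, 3).values()))  # all walks of length 3: 216
```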

Some asymptotics results for walks
In [4], Banderier and Flajolet showed that the kernel method is the key to the enumeration and asymptotics of directed lattice paths. The proofs rely on the following properties.

Property 1. There are $c$ distinct roots $u_1, \ldots, u_c$ of the "kernel" equation $1 - zP(u) = 0$ which behave like $z^{1/c}$ at $z = 0$ (we call them "small roots"). The remaining $d$ roots $v_1, \ldots, v_d$ behave like $z^{-1/d}$ at $z = 0$ (we call them "large roots").

Property 2. A nice trick (the "kernel method") allows one to write all the generating functions counting walks as functions of the $u_i$'s (and/or of the $v_j$'s).

Property 3. There is a unique positive real number $\tau$ such that $P'(\tau) = 0$, and the radius of convergence of the generating functions is $\rho := 1/P(\tau)$.

Property 4. There is a dominant small root $u_1(z)$ and a dominant large root $v_1(z)$ such that, for $0 < |z| < \rho$, one has $|u_i(z)| < u_1(|z|) < v_1(|z|) < |v_j(z)|$ for all non-dominant roots $u_i(z)$ and $v_j(z)$.

Property 5. The asymptotics come from the dominant roots $u_1$ and $v_1$, which are singular at $\rho$; the product of the other roots is analytic at $\rho$, and therefore these other roots only affect the multiplicative constant. (Some easy modifications have to be made here if the walk is "periodic".)
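Property 3 is easy to check numerically. The sketch below (function names hypothetical) assumes positive weights and at least one negative and one positive jump, so that $P$ is convex on $(0,+\infty)$ and $P'$ has a single positive zero; it locates $\tau$ by bisection and returns $\rho = 1/P(\tau)$.

```python
def P_value(P, u):
    """Evaluate P(u) = sum_i p_i u^i for P stored as {jump: weight}."""
    return sum(p * u**i for i, p in P.items())

def P_prime(P, u):
    """Evaluate the derivative P'(u)."""
    return sum(i * p * u**(i - 1) for i, p in P.items())

def tau_and_rho(P, rel_tol=1e-14):
    """Return (tau, rho) with tau > 0, P'(tau) = 0, and rho = 1/P(tau)."""
    if min(P) >= 0 or max(P) <= 0:
        raise ValueError("P needs both a negative and a positive jump")
    lo = hi = 1.0
    while P_prime(P, lo) > 0:   # move lo left until P'(lo) <= 0
        lo /= 2
    while P_prime(P, hi) < 0:   # move hi right until P'(hi) >= 0
        hi *= 2
    while hi - lo > rel_tol * hi:
        mid = (lo + hi) / 2
        if P_prime(P, mid) < 0:
            lo = mid
        else:
            hi = mid
    tau = (lo + hi) / 2
    return tau, 1 / P_value(P, tau)

print(tau_and_rho({-1: 1, 1: 1}))   # Dyck paths: (1.0, 0.5)
```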

Walks on Z ending at altitude k
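In the next section, we will use a part of Theorem 1 in [4]:

Theorem 1. The generating function $W_k(z)$ of paths starting at 0, having no bounding constraints, and ending at altitude $k$ is
$$W_k(z) = z \sum_{i=1}^{c} \frac{u_i'(z)}{u_i(z)^{k+1}} \quad (-\infty < k < c), \qquad W_k(z) = -z \sum_{i=1}^{d} \frac{v_i'(z)}{v_i(z)^{k+1}} \quad (-d < k < +\infty),$$
where $u_1, \ldots, u_c$ are the small roots and $v_1, \ldots, v_d$ are the large roots.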
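For Łukasiewicz walks ($c = 1$), the first formula of Theorem 1 with $k = 0$ can be checked on power series. In this sketch (names hypothetical), the small root is computed from the fixed-point form $u = z\,\phi(u)$ with $\phi(u) := uP(u)$, which is equivalent to $1 - zP(u) = 0$ for $u \neq 0$; each iteration fixes one more Taylor coefficient. The coefficients of $W_0(z) = z u_1'(z)/u_1(z)$ are then compared with a brute-force count of bridges.

```python
from fractions import Fraction

def series_mul(a, b, N):
    """Product of two power series given as coefficient lists, truncated at z^N."""
    out = [Fraction(0)] * (N + 1)
    for i, x in enumerate(a[:N + 1]):
        if x:
            for j, y in enumerate(b[:N + 1 - i]):
                out[i + j] += x * y
    return out

def small_root_series(P, N):
    """Taylor coefficients (up to z^N) of the small root u_1(z) of 1 - zP(u) = 0
    for a Lukasiewicz jump set (smallest jump -1), via u = z*phi(u), phi = u*P(u)."""
    if min(P) != -1:
        raise ValueError("this sketch assumes the smallest jump is -1")
    u = [Fraction(0)] * (N + 1)
    for _ in range(N + 1):
        phi = [Fraction(0)] * (N + 1)
        power = [Fraction(1)] + [Fraction(0)] * N         # u^0
        for k in range(max(P) + 2):                       # phi = sum_k p_{k-1} u^k
            phi = [s + P.get(k - 1, 0) * t for s, t in zip(phi, power)]
            power = series_mul(power, u, N)
        u = [Fraction(0)] + phi[:N]                       # u <- z * phi(u)
    return u

def bridge_gf(P, N):
    """Coefficients of W_0(z) = z u_1'(z) / u_1(z) up to z^(N-1)."""
    g = small_root_series(P, N)[1:]                       # g = u_1(z) / z
    zg_prime = [i * c for i, c in enumerate(g)]           # z * g'(z)
    q = []                                                # q = z g' / g
    for n in range(len(g)):
        s = zg_prime[n] - sum(g[n - j] * q[j] for j in range(n))
        q.append(s / g[0])
    return [1 + q[0]] + q[1:]                             # z u_1'/u_1 = 1 + z g'/g

def bridges_brute_force(P, n):
    """Coefficient of u^0 in P(u)^n, by direct expansion."""
    poly = {0: 1}
    for _ in range(n):
        nxt = {}
        for e, x in poly.items():
            for j, p in P.items():
                nxt[e + j] = nxt.get(e + j, 0) + x * p
        poly = nxt
    return poly.get(0, 0)

P_dyck = {-1: 1, 1: 1}
print([int(c) for c in bridge_gf(P_dyck, 9)])   # [1, 0, 2, 0, 6, 0, 20, 0, 70]
```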

Generating functions of bounded walks
This section is dedicated to finding the explicit generating functions of numerous variants of bounded lattice paths: below one wall, between two walls.
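Theorem 2 (Walks below one wall). The bivariate generating function of walks starting at 0 and remaining below the wall $y = h$ is algebraic:
$$F^{[-\infty,h]}(z,u) = \frac{1}{1 - zP(u)}\left(1 - u^{h+1}\sum_{i=1}^{d}\frac{1}{v_i^{h+1}}\prod_{j \neq i}\frac{u - v_j}{v_i - v_j}\right). \qquad (3)$$

Proof: Using a step-by-step decomposition of the walks gives the following functional equation:
$$F^{[-\infty,h]}(z,u) = 1 + zP(u)F^{[-\infty,h]}(z,u) - z\sum_{k=h+1}^{h+d}\{u^k\}\left(P(u)F^{[-\infty,h]}(z,u)\right).$$
This theorem is then a consequence of the kernel method applied to this equation.

(i) Throughout this article, we use the notation $[u^k]F(z,u)$ for the coefficient of $u^k$ in $F(z,u)$, and $\{u^k\}F(z,u)$ stands for the monomial in $u^k$ of $F(z,u)$. We write $F_k(z)$ for $[u^k]F(z,u)$ and, for any interval $[a,b]$, $F^{[a,b]}$ denotes a generating function of walks restricted to this interval.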
This functional equation follows from the fact that a walk is either empty, or obtained by adding a new jump to a shorter walk (i.e., we multiply by $zP(u)$ and, since the walk is constrained to remain below altitude $h$, we remove all jumps which send it above the border $h$; these are the monomials in $u^{h+1}, \ldots, u^{h+d}$). Plugging the $d$ large roots $v_i$ into this equation (this is a legitimate operation, as each $v_i$ behaves like $z^{-1/d}$ and as $F^{[-\infty,h]}$ is a Laurent series in $1/u$) leads to a linear system of $d$ equations in $d$ unknowns. Solving it with Cramer's rule provides expressions involving quotients of Vandermonde-like determinants. Further simplifications lead to Theorem 2.
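The step-by-step decomposition behind this functional equation can be run directly, one power of $z$ at a time: multiply the current layer by $P(u)$ and drop the monomials above $u^h$. The sketch below (names hypothetical) does exactly that and checks it against brute-force enumeration; for Dyck jumps and $h = 0$, the bridges it counts are the Catalan numbers.

```python
from itertools import product

def walks_below_wall(P, h, n_max):
    """Layer n is [z^n] F^{[-inf,h]}(z,u), stored as {altitude: weighted count}.
    Each step multiplies by P(u) and drops the monomials above u^h."""
    layers = [{0: 1}]
    for _ in range(n_max):
        nxt = {}
        for k, x in layers[-1].items():
            for j, p in P.items():
                if k + j <= h:                     # the jump stays below the wall
                    nxt[k + j] = nxt.get(k + j, 0) + x * p
        layers.append(nxt)
    return layers

def brute_force_below(P, h, n):
    """Total weight of the jump sequences of length n whose partial sums stay <= h."""
    total = 0
    for seq in product(P, repeat=n):
        alt, ok, weight = 0, True, 1
        for j in seq:
            alt += j
            weight *= P[j]
            if alt > h:
                ok = False
                break
        if ok:
            total += weight
    return total

# Dyck jumps, wall y = 0: bridges staying below 0 are counted by Catalan numbers.
layers = walks_below_wall({-1: 1, 1: 1}, 0, 8)
print([layers[n].get(0, 0) for n in range(0, 9, 2)])   # [1, 1, 2, 5, 14]
```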

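Theorem 3 (Walks below one wall ending at altitude $k$). The generating function of walks of bounded height ending at altitude $k \le h$ is
$$F^{[-\infty,h]}_k(z) = W_k(z) - \sum_{m=0}^{d-1}(-1)^m\,W_{k-h-d+m}(z)\sum_{i=1}^{d}\frac{e_m(\mathcal{V}\setminus\{v_i\})}{v_i^{h+1}\prod_{j\neq i}(v_i - v_j)},$$
where $\mathcal{V} := \{v_1, \ldots, v_d\}$ is the set of large roots, the $e_i$'s are the elementary symmetric functions defined by Equation 7, and the functions $W_k(z)$ are defined in Theorem 1.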
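Proof: Extracting the coefficient of $u^k$ in Equation (3) of Theorem 2 gives
$$F^{[-\infty,h]}_k(z) = W_k(z) - \sum_{i=1}^{d}\frac{1}{v_i^{h+1}\prod_{j\neq i}(v_i - v_j)}\sum_{m=0}^{d-1}\left([u^{d-1-m}]\prod_{j\neq i}(u - v_j)\right)W_{k-h-d+m}(z). \qquad (6)$$
For any set of variables $\mathcal{X}$, let $\mathcal{X}_m := \{\text{subsets of size } m \text{ of } \mathcal{X}\}$; we then introduce the elementary symmetric function $e_m$:
$$e_m(\mathcal{X}) := \sum_{X \in \mathcal{X}_m}\prod_{x \in X} x. \qquad (7)$$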
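The elementary symmetric functions of Equation (7) need not be computed by enumerating subsets: they are the coefficients of $\prod_{x \in \mathcal{X}}(1 + xt)$ and, up to the sign $(-1)^m$, the coefficients of $\prod_{x \in \mathcal{X}}(u - x)$, which is how they arise when products of factors $(u - v_j)$ are expanded. A small sketch (names hypothetical) compares both computations; it works equally for the complex values taken by the roots.

```python
from itertools import combinations
from math import prod

def elementary_symmetric(xs):
    """[e_0, e_1, ..., e_len(xs)] of the list xs, read off prod_{x in xs} (1 + x t)."""
    e = [1]
    for x in xs:
        # multiply the polynomial sum_m e_m t^m by (1 + x t)
        e = [a + x * b for a, b in zip(e + [0], [0] + e)]
    return e

def e_by_definition(xs, m):
    """e_m as the sum, over all subsets of size m, of the product of their elements."""
    return sum(prod(sub) for sub in combinations(xs, m))

print(elementary_symmetric([1, 2, 3]))   # [1, 6, 11, 6]
```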
Using these $e_i$'s and inverting the two sums in Equation (6) leads to the theorem. (Note that Theorem 1 allows one to express all the functions $W_k(z)$ in terms of the roots.)

Theorem 4 (Walks in a strip $[-h_2, +h]$). The bivariate generating function of walks starting at 0 and remaining between the walls $y = -h_2$ and $y = h$ is rational:

where the $s_\lambda$'s are defined in Equations (9,10,11,12) and where the $r_j$'s are defined in Equation (8).
Proof: The polynomials $r_k$ encode the forbidden jumps when the walk is close to the walls:

We therefore have the following functional equation:

Substituting for $u$ the roots $u_i$ and $v_i$ in this equation (this is legitimate since $F$ is here a Laurent polynomial in $u$, and not a Laurent series) leads to a linear system with $c + d$ unknowns; Cramer's rule gives the solutions of this system in terms of more complicated Vandermonde-like determinants, for which we need to introduce a few new notations. Let $S_{c+d}$ be the group of permutations of $c + d$ elements, and let $|\sigma|$ be the signature of any permutation $\sigma$. For any integer partition $\lambda = (\lambda_1, \ldots, \lambda_{c+d})$ with $c + d$ (possibly empty) parts, let the associated antisymmetric Schur function be (we write $v_i = u_{i+c}$):

In the Vandermonde-like determinants, after suitable multiplications to get polynomials instead of Laurent polynomials, the exponents of the $u_i$'s are:

Using Theorem 4, one can obtain the generating function of walks ending at altitude $k$ (and therefore the generating function of bridges between two walls). Theorem 4 can be specialised to the case $h_2 = 0$, which gives a formula for bounded excursions, already studied with this approach in [1]. Further links with Schur functions, the Jacobi-Trudi identity, and the platypus algorithm (ii) were investigated in [8].
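Since the generating function of walks in a strip is rational, its coefficients can also be obtained from a transfer matrix raised to the power $n$ by binary exponentiation, in the spirit of the complexity remark of footnote (ii). The sketch below (names hypothetical) uses the naive cubic matrix product, hence $O((h+h_2)^3 \log n)$ arithmetic operations rather than the fast matrix multiplication exponent. With $h_2 = h$ it counts reflected bridges, i.e. bridges with $\max_i |B_i| \le h$.

```python
def mat_mul(A, B):
    """Naive product of two square matrices given as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(M, n):
    """M^n by binary exponentiation: O(log n) matrix products."""
    R = [[int(i == j) for j in range(len(M))] for i in range(len(M))]
    while n:
        if n & 1:
            R = mat_mul(R, M)
        M = mat_mul(M, M)
        n >>= 1
    return R

def strip_walks(P, h2, h, n, end=0):
    """Weighted number of walks of length n from 0 to `end` that stay in
    [-h2, h]; matrix index a stands for altitude a - h2."""
    size = h + h2 + 1
    T = [[0] * size for _ in range(size)]
    for a in range(size):
        for j, p in P.items():
            if 0 <= a + j < size:
                T[a][a + j] += p
    return mat_pow(T, n)[h2][end + h2]

dyck = {-1: 1, 1: 1}
print([strip_walks(dyck, 1, 1, n) for n in (2, 4, 6)])   # [2, 4, 8]
```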
In the full version of this article, we will use our formula to answer the following question, which comes from bioinformatics: what is the distribution of the maximum of the absolute value of some random discrete bridges? This can be seen as a discrete equivalent of "reflected Brownian bridges" (the absolute value of a Brownian bridge), continuous objects for which results exist [14]. In the discrete case, this corresponds to another specialisation of Theorem 4 above, namely the case $h_2 = h$.

Asymptotic limit laws
Due to numerous works linking discrete random walks and variants of Brownian motion [13,19,18], and in particular to the Komlós-Major-Tusnády strong embedding result [23] (see Chatterjee [10] for a modern approach), it is expected that the height of bridges follows a Rayleigh limit law. Although we do not investigate this case here, the height of discrete excursions should follow a Theta limit law (note that the height of excursions is related to the height of some trees [12,16]).
From a probabilistic point of view, our work on bridges brings new results on the speed of convergence (exact error terms). From an analytic point of view, it also describes the exact nature of the corresponding singularities.

Convergence to the Rayleigh law
Definition 1. The Rayleigh distribution of parameter $\alpha$ is the distribution with support $[0, +\infty)$ and density $R_\alpha(x) = \frac{x}{\alpha^2}\exp\left(-\frac{x^2}{2\alpha^2}\right)$. Accordingly, its cumulative distribution function is $1 - \exp\left(-\frac{x^2}{2\alpha^2}\right)$ for $x \ge 0$.

Theorem 5. The height of discrete bridges satisfies a Rayleigh limit law of parameter

This shows that, whatever the drift is, we still get a Rayleigh limit law. This result is also consistent with the symmetry induced by "reversing the time", i.e., $P(u) \leftrightarrow P(1/u)$.
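For reference, Definition 1 can be transcribed directly. The sketch below (names hypothetical) checks numerically, with a Simpson rule, that the distribution function is the integral of the density, and that the parameter $\alpha = 1/2$ gives the tail $e^{-2x^2}$, the classical tail of the maximum of a standard Brownian bridge.

```python
import math

def rayleigh_pdf(x, alpha):
    """Density R_alpha(x) = x / alpha^2 * exp(-x^2 / (2 alpha^2)) on [0, +inf)."""
    return x / alpha**2 * math.exp(-x * x / (2 * alpha**2)) if x >= 0 else 0.0

def rayleigh_cdf(x, alpha):
    """Cumulative distribution function 1 - exp(-x^2 / (2 alpha^2))."""
    return 1 - math.exp(-x * x / (2 * alpha**2)) if x >= 0 else 0.0

def simpson(f, a, b, steps=10_000):
    """Composite Simpson rule on [a, b] (steps must be even)."""
    step = (b - a) / steps
    s = f(a) + f(b)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * f(a + i * step)
    return s * step / 3

print(round(rayleigh_cdf(1.0, 0.5), 6))   # 1 - exp(-2): 0.864665
```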
The proof follows the same lines as the proof of Theorem 6 hereafter; indeed, to any set of jumps one can associate a probabilistic model with drift 0, via Cramér's method of shifting the mean: define $\tilde{P}(u) := P(ux)/P(x)$ for an $x$ such that $\tilde{P}'(1) = 0$. An elementary computation shows that $x$ is then exactly $\tau$, the unique real positive saddle point of $P(u)$. We therefore consider hereafter the case of drift 0 without loss of generality.

(ii) The platypus algorithm presented in [4] allows one to get the minimal polynomial of our algebraic generating functions; going to the associated differential equation, it then allows one to compute the number of walks of length $n$ with $O(\sqrt{n})$ operations. In the case of two boundaries, we are in the rational case; using a variant of binary exponentiation for powers of matrices leads to a complexity $O((h + h_2)^{\omega}\log n)$, where $\omega \approx 2.38$ is the best known exponent for matrix multiplication (an improvement over Strassen's exponent $\log_2 7 \approx 2.81$). This can be improved using the D-finiteness of our generating functions, which gives recurrences of order less than $h + h_2$.
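Cramér's shift of the mean, used above to reduce to drift 0, is easy to carry out numerically. In this sketch (names hypothetical; same assumptions as for Property 3: positive weights, at least one negative and one positive jump), $\tau$ is located by bisection on $P'$ and the shifted jump probabilities $p_i\tau^i/P(\tau)$ are returned; the resulting walk has drift $\tilde{P}'(1) = \tau P'(\tau)/P(\tau) = 0$.

```python
def cramer_shift(P, rel_tol=1e-14):
    """Return (tau, Q) where P'(tau) = 0 and Q = {jump: p_i tau^i / P(tau)}
    encodes the zero-drift step distribution P~(u) = P(tau u) / P(tau)."""
    if min(P) >= 0 or max(P) <= 0:
        raise ValueError("P needs both a negative and a positive jump")

    def dP(u):
        return sum(i * p * u ** (i - 1) for i, p in P.items())

    lo = hi = 1.0
    while dP(lo) > 0:       # P' is increasing on (0, +inf)
        lo /= 2
    while dP(hi) < 0:
        hi *= 2
    while hi - lo > rel_tol * hi:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if dP(mid) < 0 else (lo, mid)
    tau = (lo + hi) / 2
    P_tau = sum(p * tau ** i for i, p in P.items())
    return tau, {i: p * tau ** i / P_tau for i, p in P.items()}

tau, Q = cramer_shift({-1: 1, 1: 3})   # Dyck jumps with a positive drift
print(round(tau, 6), {j: round(q, 6) for j, q in Q.items()})
# 0.57735 {-1: 0.5, 1: 0.5}
```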
In the next theorem, we prove that, under the usual normalisation, the height of the discrete bridge (built upon jumps with expectation zero, corresponding to walks without drift) converges weakly to the height of the standard Brownian bridge. We also prove that the speed of convergence is $O(1/\sqrt{n})$, and we provide an algorithm computing the asymptotic expansion of the distribution function of the height of a discrete bridge to any order in the case of Łukasiewicz walks (a single downward jump $-1$).

Proof (sketch):
We consider the non-periodic case (the proof for the periodic case follows the same lines), for which we compute the probability that a bridge goes beyond an upper barrier at level $x\sigma\sqrt{n}$ for a fixed $x \in\, ]0, +\infty[$. Using the product $Q$ defined in Equation 5, Theorem 2 can be rewritten as:
Setting $A := \max_{2 \le j \le d}\max_{|z| \le \rho}\frac{|v_1(z)|}{|v_j(z)|}$, we have $A < 1$ by the domination property (see Section 2), so the main contribution comes from the summand for $i = 1$:

The last line just uses the rewriting $Q(u) = \sum_i q_i u^i$. Taking the coefficient $[u^0]$ of $F^{[>h]}$ in order to get bridges, and using the functions $W_k(z)$ introduced in Theorem 1 for walks ending at altitude $k$, leads to

Using now $B := \max_{2 \le j \le c}\max_{|z| \le \rho}\frac{|u_j(z)|}{|u_1(z)|} < 1$, the main contribution comes from the summand for $k = 1$:

Using the product $Q$ defined in Equation 5, we have, for $C := \max(A, B) < 1$:

Singularity analysis. In our zero-drift probabilistic setup, Property 6 from Section 2 can be rewritten using $\rho = \tau = 1$, $P(1) = 1$ and $a_1 = \sqrt{2/P''(1)}$; this leads to

Due to the organisation of the roots in cycles (conjugated tuples of branches at their singularities), the sum and the product of the $v_i$'s ($i > 1$) are regular at 1, and therefore so are all the $q_i$'s; thus one has:

Asymptotic expansions. We now compute $b_n^{>h}$ asymptotically. The dominant singularity of $b^{>h}(z)$ is $z = 1$, as seen in Equations (15) and (16). We compute $b_n^{>h}$ by a Cauchy integral, where $\Gamma$ is a Hankel contour winding around the point $z = 1$. Note that there exists a suitable contour $\Gamma$ on which the domination property of the roots holds; moreover, as we deal with an algebraic function, its singularities are isolated.
We follow here the proof from [5] for the limit distribution of semi-large powers, and also apply the Big-Oh transfer theorem (see also Theorems VI.3 and IX.16 of [17]). Using the change of variable $z = 1 - t/n$, we get

where $\Gamma$ is a Hankel contour winding clockwise from $-\infty$ around the origin. Expanding $e^{-2x\sqrt{2t}}$, making the substitution $t \mapsto -t$, integrating term-wise, and using the Hankel contour representation (iii) of the Gamma function, we obtain after simplification, as $n$ tends to infinity (since an unbounded weighted walk of length $n$ is a bridge with probability $b_n^{<\infty}$),

where $\beta_n^{>x\sigma\sqrt{n}}$ (i.e. the probability that a bridge of length $n$ goes above the barrier $y = x\sigma\sqrt{n}$) thus follows a Rayleigh limit law for $x \in\, ]0, +\infty[$. This concludes the first part of our proof.
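The Rayleigh limit can be observed on the simplest example. For Dyck bridges (jumps $\pm1$ with equal probabilities, so $\sigma = 1$), the reflection principle gives the exact tail: a bridge of length $2m$ reaches altitude $j \ge 1$ in $\binom{2m}{m-j}$ of the $\binom{2m}{m}$ cases. The sketch below (names hypothetical; this closed form is specific to $\pm1$ jumps and is not the method of the proof above) compares the probability of going above $x\sigma\sqrt{n}$ with the limit $e^{-2x^2}$; the gap shrinks as $n$ grows, as expected from an $O(1/\sqrt{n})$ speed of convergence.

```python
import math

def dyck_bridge_tail(n, x):
    """Exact Pr(max_i B_i > x*sqrt(n)) for a uniform +/-1 bridge of even length n.
    Reflection principle: Pr(max >= j) = C(n, n/2 - j) / C(n, n/2) for j >= 1."""
    half = n // 2
    j = math.floor(x * math.sqrt(n)) + 1       # smallest integer level above x*sqrt(n)
    if j > half:
        return 0.0
    return math.comb(n, half - j) / math.comb(n, half)

def rayleigh_tail(x):
    """Limit Pr(max of a standard Brownian bridge > x) = exp(-2 x^2)."""
    return math.exp(-2 * x * x)

for n in (200, 2000, 20000):
    print(n, dyck_bridge_tail(n, 0.5), rayleigh_tail(0.5))
```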
The proof of the concentration property follows the same lines as the proof for x ∈]0, ∞[ with different error terms.
This concludes our proof in the case of non-periodic walks.
In the case of walks with period $d$, there are $d$ singularities on the circle of convergence, at the positions $\omega^j$, where $\omega$ is a primitive $d$-th root of unity. For unconditioned walks, this corresponds to $d$ saddle points on the circle $|z| = 1$. In the case of upper-bounded walks, we use a star-shaped Cauchy contour that is deformed into $d$ Hankel contours, the $j$-th contour coming from $\omega^j \times (+\infty)$, winding around the point $\omega^j$, and going back to $\omega^j \times (+\infty)$. In both cases, a multiplicative factor $d$ occurs in $b_n^{>x\sigma\sqrt{n}}$ and in $b_n^{<\infty}$; this factor cancels when taking the ratio of the two quantities.
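Whether a set of jumps is periodic can be read off the jumps themselves: $P(u) = u^b H(u^p)$ for the largest such $p$, which is the gcd of the differences between jumps. The sketch below (names hypothetical; we write $p$ for the period to avoid a clash with the largest jump $d$) computes $p$ and checks, on the jump set $\{-1, +3\}$ (period 4), that bridges only exist for lengths that are multiples of 4.

```python
from functools import reduce
from math import gcd

def period(jumps):
    """Largest p such that all jumps are congruent modulo p, i.e. the gcd of
    their differences; the walk is periodic when p > 1 (p = 0 for a single jump)."""
    jumps = sorted(jumps)
    return reduce(gcd, (j - jumps[0] for j in jumps[1:]), 0)

def bridge_counts(jumps, n_max):
    """Number of bridges of length 0..n_max with unit weights (dynamic programming)."""
    counts, layer = [], {0: 1}
    for _ in range(n_max + 1):
        counts.append(layer.get(0, 0))
        nxt = {}
        for k, x in layer.items():
            for j in jumps:
                nxt[k + j] = nxt.get(k + j, 0) + x
        layer = nxt
    return counts

print(period([-1, 3]), bridge_counts([-1, 3], 8))   # 4 [1, 0, 0, 0, 4, 0, 0, 0, 28]
```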

Łukasiewicz bridges
When considering the case of Łukasiewicz walks (i.e. $c = 1$, see Section 1), we obtain more precise asymptotics for the convergence to the Rayleigh law. Indeed, starting from Equation 5, one has:
The value of $Q(v_1)$ follows by interchanging the rôles of $u_1$ and $v_1$. Equation (15) thus becomes

By following the same steps as in Section 4.1 but at a higher asymptotic order, we find, with $a_1 = \sqrt{2/P''(1)}$, that

in which we recognise the Rayleigh distribution of parameter $\sigma = \frac{\tau}{a_1}\sqrt{\frac{2}{\rho}}$. In the probabilistic setting (with zero drift), the next formula gives more precise error terms:

where $\zeta = \sigma^2 = P''(1)$, $\xi = P'''(1)$ and $\theta = P''''(1)$. The algolib Maple package (more precisely, the gdev and equivalent functions developed by Bruno Salvy, see algo.inria.fr/librairies) can push the expansion to higher orders.
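The constant $a_1$ can be checked numerically on the Łukasiewicz walks of the bioinformatics application, $P(u) = \frac{u^d}{d+1} + \frac{d}{d+1}\frac{1}{u}$, for which $P(1) = 1$, $P'(1) = 0$ and $P''(1) = d$. Since the kernel is quadratic in $u - 1$ near $z = 1$, a short computation gives $u_1(z) = 1 - a_1\sqrt{1-z} + O(1-z)$ with $a_1 = \sqrt{2/P''(1)}$. In this sketch (names hypothetical), $u_1(z)$ is obtained by bisection on $(0,1)$ and the ratio $(1 - u_1(z))/\sqrt{1-z}$ is compared with $a_1$.

```python
import math

def small_root(z, d):
    """Small root u_1(z) in (0, 1) of u = z*(u^(d+1) + d)/(d+1), i.e. of
    1 - zP(u) = 0 with P(u) = u^d/(d+1) + d/((d+1)u), for 0 < z < 1."""
    def g(u):
        # g(0) < 0 < g(1) and g is concave, so g has a single root in (0, 1)
        return u - z * (u ** (d + 1) + d) / (d + 1)

    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for d in (3, 19):
    eps = 1e-8                                  # z = 1 - eps
    ratio = (1 - small_root(1 - eps, d)) / math.sqrt(eps)
    print(d, ratio, math.sqrt(2 / d))           # the ratio should be close to a_1
```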

Bioinformatics application
A set of $G$ genes is expressed in a given tissue; this provides a ranking of these genes by level of expression. Given this ranking and a subset of $g$ genes of specific interest, the question is how to decide whether the ranks of these $g$ genes form an unexpected pattern. If these $g$ genes have a high level of expression, they will mostly appear at the top of the ranking; on the contrary, if they have a low level of expression, they will mostly appear at the bottom of the ranking. Both cases are assumed to correspond to biological disorders. The aim is to provide a statistical estimator for such exceptional behaviours. Keller et al. [21] proposed the following non-parametric approach. While scanning the ranking of the $G$ genes from left to right, build a random walk $(B_i)_{0 \le i \le G}$ whose jump at time $i$ is $G - g$ if the gene at rank $i$ belongs to the subset of $g$ genes, and $-g$ if it belongs to the $G - g$ other genes.
By construction, $B_0 = B_G = 0$, so these walks are bridges. The tail probability (commonly referred to as the p-value in statistics) that Keller et al. choose as a statistical indicator is $\text{p-value} = \Pr(\max_{1 \le i \le G} B_i > h)$, for any chosen $h$. They provide a dynamic programming algorithm computing this indicator in complexity $O(G \times g)$. We consider in this extended abstract the case of over-expressed genes, which corresponds to the maximum of the bridges. If $(G - g)/g$ is an integer, we are in a particular case of Łukasiewicz walk, where $P(u) = \frac{u^d}{d+1} + \frac{d}{d+1}\frac{1}{u}$ and $d = \frac{G-g}{g}$. This gives Equation (27), whose evaluation has a constant time complexity in $n$, $h$, and $d$.
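A dynamic programme of complexity $O(G \times g)$ is natural here, because the altitude after $i$ jumps, $B_i = \mathrm{up}\cdot G - i\cdot g$, only depends on the number $\mathrm{up}$ of genes of interest among the first $i$ ranks. The sketch below is our own reconstruction under the null hypothesis that the $g$ genes occupy a uniformly random set of ranks (it is not necessarily the algorithm of Keller et al. [21]); it returns the exact p-value as a fraction and is cross-checked by brute-force enumeration on a small case.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def keller_pvalue(G, g, h):
    """Exact Pr(max_i B_i > h), h >= 0, when the g genes of interest occupy a
    uniformly random set of ranks among G; jumps are +(G-g) and -g."""
    ways = [1] + [0] * g      # ways[up]: prefixes with `up` up-jumps, all B <= h
    for i in range(1, G + 1):
        new = [0] * (g + 1)
        for up in range(min(i, g) + 1):
            if up * G - i * g > h:                     # B_i above h: discard
                continue
            new[up] = ways[up] + (ways[up - 1] if up else 0)
        ways = new
    return 1 - Fraction(ways[g], comb(G, g))

def brute_force_pvalue(G, g, h):
    """Same probability by enumerating all C(G, g) rank sets (small cases only)."""
    exceed = 0
    for ranks in combinations(range(G), g):
        chosen, alt, top = set(ranks), 0, 0
        for i in range(G):
            alt += (G - g) if i in chosen else -g
            top = max(top, alt)
        exceed += top > h
    return Fraction(exceed, comb(G, g))

print(keller_pvalue(4, 2, 0), keller_pvalue(4, 2, 2))   # 2/3 1/6
```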
Heuristics. We expect that using Equation (27) with a fractional $d = (G - g)/g$ gives a good approximation of the probability that the height of a walk with jumps $+(G - g)$ and $-g$ exceeds a given value, although we cannot prove this yet.
Simulations. In order to get enough precision, we apply Equation (27) here with $d = 3$ and $d = 19$, and plot the obtained distributions in Figure 2. The simulations are done by drawing random unconditioned walks of length $n$ and then by reversing jumps at uniformly chosen random positions until we obtain bridges. We deal here with periodic walks with 2 jumps ($+d$ and $-c$); for such walks, bridges have length $n \equiv 0 \bmod (c+d)$, and each reversal leads to a walk ending nearer and nearer to 0, while still ending in the same half-plane. This explains why our algorithm takes on average $O(\sqrt{n})$ iterations, since a walk of length $n$ typically ends at altitude $O(\sqrt{n})$. In these examples, we performed $10^7$ iterations for the simulations. The plots on the right are zooms for smaller p-values. These plots show an excellent agreement between the simulations and the discrete asymptotics we obtained, and a clear discrepancy with the Rayleigh distribution, which is consistent with our expansion terms.
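The simulation scheme can be sketched as follows for jumps $+d$ and $-1$ (so $c = 1$). This is our reading of the description above, with hypothetical names: when the walk ends above 0, a uniformly chosen $+d$ jump is turned into a $-1$, and symmetrically below 0, so each reversal moves the endpoint by $d + 1$ towards 0. Since the positions of the jumps of a given sign are exchangeable, the resulting bridge is uniform among all bridges of length $n$. The estimate is compared with an exact count obtained by dynamic programming.

```python
import random
from math import comb

def random_bridge(n, d, rng):
    """Uniform bridge of length n (n a multiple of d+1) with jumps +d and -1."""
    if n % (d + 1):
        raise ValueError("bridges need n to be a multiple of d+1")
    steps = [d if rng.random() < 1 / (d + 1) else -1 for _ in range(n)]
    end = sum(steps)
    while end != 0:                  # each reversal moves the end by d+1 towards 0
        sign = d if end > 0 else -1
        i = rng.choice([k for k, s in enumerate(steps) if s == sign])
        steps[i] = -1 if sign == d else d
        end += -(d + 1) if sign == d else d + 1
    return steps

def simulated_tail(n, d, h, trials, seed=1):
    """Monte Carlo estimate of Pr(max of the bridge > h)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        alt = top = 0
        for s in random_bridge(n, d, rng):
            alt += s
            top = max(top, alt)
        hits += top > h
    return hits / trials

def exact_tail(n, d, h):
    """Exact Pr(max > h) for a uniform bridge: count bridges staying <= h."""
    layer = {0: 1}
    for _ in range(n):
        nxt = {}
        for k, x in layer.items():
            for j in (d, -1):
                if k + j <= h:
                    nxt[k + j] = nxt.get(k + j, 0) + x
        layer = nxt
    return 1 - layer.get(0, 0) / comb(n, n // (d + 1))

print(exact_tail(40, 3, 4), simulated_tail(40, 3, 4, trials=10_000))
```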

Perspectives and conclusion
We provided in this article the generating functions for several cases of bounded lattice paths. We proved that the height of discrete bridges converges weakly to the Rayleigh distribution. In the case of Łukasiewicz bridges, we are able to compute the asymptotic expansion of the discrete distribution of this height to any order. Applying this method to a statistical estimator originating from bioinformatics gives excellent agreement with the corresponding simulations. The generating functions we gave for walks between one or two walls give access to the asymptotics of the height of meanders (the drift then plays a rôle). It is also possible to apply the same proofs to directed walks in dimension 2. From a computer algebra point of view, it is also interesting to note that most of the formulas we obtained involve symmetric functions, which can be handled efficiently with variants of the platypus algorithm [4]. The full version of this article will also consider the convergence of the moments of the height in our different models. Note that asymptotics for bridges between two barriers also have applications in bioinformatics.