Culminating paths

Abstract. Let a and b be two positive integers. A culminating path is a path of $\mathbb{Z}^2$ that starts from $(0,0)$, consists of steps $(1,a)$ and $(1,-b)$, stays above the x-axis and ends at the highest ordinate it ever reaches. These paths were first encountered in bioinformatics, in the analysis of similarity search algorithms. They are also related to certain models of Lorentzian gravity in theoretical physics. We first show that the language on a two-letter alphabet that naturally encodes culminating paths is not context-free. Then, we focus on the enumeration of culminating paths. A step-by-step approach, combined with the kernel method, provides a closed-form expression for the generating function of culminating paths ending at a (generic) height k. In the case a = b = 1, we derive from this expression the asymptotic behaviour of the number of culminating paths of length n. When a > b, we obtain the asymptotic behaviour by a simpler argument. When a < b, we only determine the exponential growth of the number of culminating paths. Finally, we study the uniform random generation of culminating paths via various methods. The rejection approach, coupled with a symmetry argument, gives an algorithm that is linear when a ≥ b and requires neither a precomputation stage nor non-linear storage. The choice of the best algorithm is not as clear when a < b. An elementary recursive approach yields a linear algorithm after a precomputation stage involving $O(n^3)$ arithmetic operations, but we also present some alternatives that may be more efficient in practice.


Introduction
One-dimensional lattice walks on Z have been extensively studied over the past 50 years. These walks usually start from the point 0, and take their steps in a prescribed finite set S ⊂ Z. A large number of results are now known on the enumeration of sub-families of these walks, and can be obtained in a systematic way once the set S is given. This includes the enumeration of bridges (walks ending at 0), meanders (walks that always remain at a non-negative level), excursions (meanders ending at level 0), excursions of bounded height, and so on. In particular, the nature of the associated generating functions is well understood: these series are always algebraic, and even rational for bounded walks [2,5,10,8,19,26,31,32,37]. These algebraicity properties reflect the fact that the languages on the alphabet S that naturally encode these families of walks are context-free, and even regular in the bounded case. In many papers, these one-dimensional walks are described as directed two-dimensional (2D) walks, upon replacing the starting point 0 by (0, 0) and every step s by (1, s). This explains why excursions are often called generalized Dyck paths (the authentic Dyck paths correspond to the case S = {1, −1}). This two-dimensional setting allows for a further generalisation, with steps of the form (i, j), with i > 0 and j ∈ Z, but this does not affect the nature of the associated languages and generating functions. The uniform random generation of these walks has also been investigated, through a recursive approach [39,24,20] or using an anticipated rejection [6,33].
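To make the systematic enumeration mentioned above concrete, here is a minimal sketch (our own illustration, not code from the cited works; the function name `count_walks` is hypothetical) that counts bridges, meanders and excursions of length n for an arbitrary finite step set S, by dynamic programming on the current level.

```python
def count_walks(steps, n):
    """Count length-n walks on Z with steps in `steps`, starting from 0.

    Returns a triple (bridges, meanders, excursions):
      bridges    -- walks ending at level 0,
      meanders   -- walks that always remain at a non-negative level,
      excursions -- meanders ending at level 0.
    """
    free = {0: 1}    # level -> number of unconstrained walks ending there
    nonneg = {0: 1}  # level -> number of meanders ending there
    for _ in range(n):
        new_free, new_nonneg = {}, {}
        for level, count in free.items():
            for s in steps:
                new_free[level + s] = new_free.get(level + s, 0) + count
        for level, count in nonneg.items():
            for s in steps:
                if level + s >= 0:  # discard walks that go below level 0
                    new_nonneg[level + s] = new_nonneg.get(level + s, 0) + count
        free, nonneg = new_free, new_nonneg
    return free.get(0, 0), sum(nonneg.values()), nonneg.get(0, 0)


# Dyck case S = {1, -1}: excursions of length 2m are counted by Catalan numbers.
print(count_walks((1, -1), 6))  # (20, 20, 5)
```

Bounded-height variants only require an extra upper cut-off on the level, which is one way to see why the bounded case leads to rational series.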
This paper deals with a new class of walks which has recently occurred in two independent contexts, and seems to have a more complicated structure than the above-mentioned classes: culminating walks. A 2D directed walk is said to be culminating if each step ends at a positive level, and the final step ends at the highest level ever reached by the walk (Figure 1). We focus here on the case where the steps are (1, a) and (1, −b), with a and b positive, hoping that this encapsulates all the possible typical behaviours.
In the case a = b = 1, culminating walks have recently been shown to be in bijection with certain Lorentzian triangulations [18], a class of combinatorial objects studied in theoretical physics as a model of discrete two-dimensional Lorentzian gravity. Using a transfer matrix approach, the authors of [18] derived the generating function for this case. We give two shorter proofs of their result. Also, while it is not clear how the method used in [18] could be extended to the general (a, b)-case, one of our approaches works for arbitrary values of a and b.
The general (a, b)-case appears in bioinformatics in the study of the sensitivity of heuristic homology search algorithms, such as BLAST, FASTA or FLASH [1,34,11]. These algorithms aim at finding the most conserved regions (similarities) between two genomic sequences (DNA, RNA, proteins...) while allowing certain alterations in the entries of the sequences. In order to avoid the supposedly intrinsic quadratic complexity of deterministic algorithms, these heuristic algorithms first consider identical regions of bounded size and extend them in both directions, updating the score with a bonus for a match or a penalty for an alteration, until the score drops below a certain threshold. The evolution of the score all the way through the final alignment turns out to be encoded by a culminating walk.
In [30], we first studied the probability of a culminating walk to contain certain patterns called seeds, as some recent algorithms use them to relax the mandatory conservation of small anchoring portions. Then, we proposed a variant of the recursive approach for the random generation of these walks. Finally, we observed that the naive rejection-based algorithm, which consists in drawing up and down steps uniformly at random and rejecting the resulting walk if it is not culminating, seemed to be linear (resp. exponential) when a > b (resp. a < b). This observation, which is closely related to the asymptotic enumeration of culminating walks, is confirmed below in Section 6.2.
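The naive rejection algorithm described above can be sketched in a few lines. This is a minimal illustration under our own conventions, not the implementation of [30]: the characters 'u' and 'd' stand for the up step +a and the down step −b, and the helper names are hypothetical.

```python
import random


def is_culminating(word, a, b):
    """True if every step ends at a positive level and the last level is a strict record."""
    level, highest = 0, 0  # highest level among proper prefixes (the empty prefix gives 0)
    for i, letter in enumerate(word):
        level += a if letter == "u" else -b
        if level <= 0:
            return False
        if i < len(word) - 1:
            highest = max(highest, level)
    return len(word) > 0 and level > highest


def naive_rejection(n, a, b, rng=random):
    """Draw uniform words of length n until one is culminating.

    Returns the accepted word and the number of attempts. Conditioning a
    uniform word on being culminating yields a uniform culminating word.
    """
    attempts = 0
    while True:
        attempts += 1
        word = "".join(rng.choice("ud") for _ in range(n))
        if is_culminating(word, a, b):
            return word, attempts


word, attempts = naive_rejection(30, 2, 1, random.Random(2024))
print(word, attempts)
```

The loop always terminates, since the word made of n up steps is culminating; the expected number of attempts is $2^n/c_n$, which is what governs the linear versus exponential behaviour observed in [30].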
To conclude this introduction, let us fix the notation and summarize the contents of this paper. Let a and b be two positive integers. A walk (or path) of length n is a sequence $(0, \eta_0), \ldots, (n, \eta_n)$ such that $\eta_0 = 0$ and $\eta_{i+1} - \eta_i \in \{a, -b\}$ for all i. The height of the walk is the largest of the $\eta_i$'s, while the final height is $\eta_n$. The walk is culminating if the two following conditions hold:
• positivity: $\eta_i > 0$ for $1 \le i \le n$;
• final record: $\eta_i < \eta_n$ for $0 \le i < n$.
See Figures 1 and 2 for examples and counter-examples. We encode every walk by a word on the alphabet $\{m, \bar m\}$ in a standard way: each ascending step (1, a) is replaced by a letter $m$ and each descending step (1, −b) is replaced by a letter $\bar m$. We denote by $\{m, \bar m\}^*$ the set of words on the alphabet $\{m, \bar m\}$. From now on, we identify a path and the corresponding word. Since these objects are essentially one-dimensional, we will often use a 1D vocabulary, saying, for instance, that our paths take steps +a and −b (rather than (1, a) and (1, −b)). We hope that this will not cause any confusion. Without loss of generality, we restrict our study to the case where a and b are coprime.

Figure 2. Two walks that are not culminating, violating the final record condition (left) or the positivity condition (right).
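The two conditions above translate directly into code. The following sketch uses the characters 'u' and 'd' for the letters $m$ and $\bar m$ (an encoding chosen here for convenience; the helper names are ours), checks positivity and the final record condition on the sequence of levels, and counts culminating walks of length n by exhaustive enumeration, which is only practical for small n.

```python
from itertools import product


def levels(word, a, b):
    """Return the levels eta_1, ..., eta_n visited by the walk encoded by `word`."""
    eta, out = 0, []
    for letter in word:
        eta += a if letter == "u" else -b
        out.append(eta)
    return out


def is_culminating(word, a, b):
    """Positivity (eta_i > 0 for i >= 1) and final record (eta_i < eta_n for i < n)."""
    eta = levels(word, a, b)
    if not eta:
        return False  # convention used here: only non-empty walks are considered
    positive = all(h > 0 for h in eta)
    final_record = all(h < eta[-1] for h in [0] + eta[:-1])
    return positive and final_record


def count_culminating(n, a, b):
    """Number c_n of culminating walks of length n, by brute force over 2^n words."""
    return sum(is_culminating("".join(w), a, b) for w in product("ud", repeat=n))


print([count_culminating(n, 1, 1) for n in range(1, 9)])  # [1, 1, 1, 1, 2, 3, 5, 9]
```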
For any word w, we denote by $|w|_m$ (resp. $|w|_{\bar m}$) the number of occurrences of the letter $m$ (resp. $\bar m$) that it contains. We denote by $|w|$ the length of w. The function $\varphi_{a,b} : \{m, \bar m\}^* \to \mathbb{Z}$ maps a word to the final height of the corresponding walk. That is, $\varphi_{a,b}(w) = a|w|_m - b|w|_{\bar m}$. The culmination properties can be translated into the following language-theoretic definition:

Definition 1.1. The language of culminating words is the set $C_{a,b} \subset \{m, \bar m\}^*$ of words w such that, for every non-empty prefix w′ of w:
$$\varphi_{a,b}(w') > 0,$$
and, for every proper prefix w′ of w:
$$\varphi_{a,b}(w') < \varphi_{a,b}(w).$$

The main result of Section 2 is that the language $C_{a,b}$ is not context-free. In Section 3, we obtain a closed-form expression for the generating function of culminating walks. This expression is complicated, but we believe this only reflects the complexity of this class of walks. This enumerative section is closely related to the recent work [10], devoted to a general study of excursions confined in a strip. In particular, symmetric functions play a slightly surprising role in the proof and statement of our results. We then derive in Section 4 the asymptotic number of culminating walks in the case a ≥ b. Our result implies that, asymptotically, a positive fraction of (general) (a, b)-walks are culminating if a > b. We prove that this fraction tends to 0 exponentially fast if a < b. More precisely, we determine the exponential growth of the number of culminating walks. This asymptotic section uses the results obtained in [5] on the exact and asymptotic enumeration of excursions and meanders. Finally, in Section 6, we present several algorithms for generating uniformly at random culminating walks of a given length. Our best algorithms are linear when a ≥ b. When a < b, the choice of the best algorithm is not obvious. An elementary recursive approach yields a quasi-linear generating stage but requires the precomputation and storage of $O(n^3)$ numbers. In this section we use several generation schemes, such as the recursive method [39,24], the rejection method [14] and Boltzmann samplers [20]. Moreover, we address in Section 5 the random generation of positive walks, which is a preliminary step in some of our algorithms generating culminating walks. We have implemented our algorithms in Java, and we invite the reader to generate his/her own paths at the address http://www.lri.fr/~ponty/walks. Figure 3 shows random culminating paths of length 1000 generated with our software, for various values of a and b.

Language theoretic properties
We denote by $C_{a,b\Rightarrow k}$ the subset of $C_{a,b}$ that consists of the walks (words) ending at height k. It will be easily seen that this language (for a fixed k) is regular. However, we shall prove that the full language $C_{a,b}$ is not context-free. We refer to [27] for definitions on formal languages.

Figure 3. Random culminating paths of length 1000, for various values of a and b. In the first two cases, four paths are displayed, while for the sake of clarity, only one path is shown in the third case.

Culminating walks of bounded height
Proposition 2.1. For all a, b, k ∈ N, the language $C_{a,b\Rightarrow k}$ of culminating words ending at height k is regular.
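Proof. The culminating paths of final height k move inside a bounded space. This allows us to construct a (deterministic) finite-state automaton that recognizes these paths. The states of this automaton are the accessible heights (that is, 0, 1, ..., k), plus a garbage state ⊥. The initial state is 0, the final state is k, and the transition function δ is given, for 0 ≤ q < k, by:
$$\delta(q, m) = \begin{cases} q + a & \text{if } q + a \le k,\\ \bot & \text{otherwise,}\end{cases} \qquad \delta(q, \bar m) = \begin{cases} q - b & \text{if } q - b > 0,\\ \bot & \text{otherwise,}\end{cases}$$
while $\delta(k, m) = \delta(k, \bar m) = \delta(\bot, m) = \delta(\bot, \bar m) = \bot$. Clearly, this automaton sends any word attempting to reach a non-positive level (resp. to go above k, or to leave k once it has been reached) to the garbage state ⊥, where it stays forever and is therefore rejected. Moreover, it only accepts those words ending in the state k. Hence this automaton recognizes exactly $C_{a,b\Rightarrow k}$. Since the state space is finite, $C_{a,b\Rightarrow k}$ is a regular language.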
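The automaton of this proof is small enough to be written out explicitly. The sketch below is our own (helper names are hypothetical; 'u' and 'd' encode $m$ and $\bar m$, and `None` plays the role of the garbage state ⊥). It builds the transition table and checks it against the definition on all short words for one choice of (a, b, k).

```python
from itertools import product

GARBAGE = None  # the garbage state ⊥


def make_automaton(a, b, k):
    """Transition table of the automaton of the proof (states 0..k plus ⊥)."""
    delta = {}
    for q in range(k):  # no entry for q = k: any letter read in state k leads to ⊥
        delta[(q, "u")] = q + a if q + a <= k else GARBAGE
        delta[(q, "d")] = q - b if q - b > 0 else GARBAGE
    return delta


def accepts(delta, k, word):
    """Run the automaton from state 0 and accept iff it ends in state k."""
    state = 0
    for letter in word:
        state = delta.get((state, letter), GARBAGE)
        if state is GARBAGE:
            return False  # ⊥ is absorbing, so we can stop early
    return state == k


def is_culminating_of_height(word, a, b, k):
    """Direct check of the definition, used to validate the automaton."""
    eta, seen = 0, []
    for letter in word:
        eta += a if letter == "u" else -b
        seen.append(eta)
    return (bool(seen) and seen[-1] == k and all(h > 0 for h in seen)
            and all(h < k for h in [0] + seen[:-1]))


a, b, k = 2, 3, 7
delta = make_automaton(a, b, k)
for n in range(1, 13):
    for w in product("ud", repeat=n):
        word = "".join(w)
        assert accepts(delta, k, word) == is_culminating_of_height(word, a, b, k)
```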

Unbounded culminating walks
Proposition 2.2. For all a, b ∈ N, the language $C_{a,b}$ of culminating walks is not context-free.
Proof. Recall that the intersection of a context-free language and a regular language is context-free [27]. Let L be the following regular language: $L = m^* \bar m^* m^*$. It can be seen as the language of "zig-zag" paths. Let $K = C_{a,b} \cap L$. It is easy to see that
$$K = \{ m^i \bar m^j m^h : \text{either } j = 0, \text{ or } ia > jb \text{ and } ha > jb \}.$$
Assume that $C_{a,b}$ is context-free. Then so is K, and, by the pumping lemma for context-free languages [27, Theorem 4.7], there exists $n \in \mathbb{N}$ such that any word $w \in K$ of length at least n admits a factorisation $w = x.u.y.v.z$ satisfying the following properties:
(i) $|u.v| \ge 1$,
(ii) $|u.y.v| \le n$,
(iii) for all $\ell \ge 0$, $w_\ell := x.u^\ell.y.v^\ell.z \in K$.
Since a and b are coprime, there exist i > n and j > n such that ia − jb = 1 (this is the Bachet-Bézout theorem). Hence the word $w = m^i \bar m^j m^i$ belongs to K. In the rest of the proof, we will refer to the first sequence of ascending steps of w as A, to the descending sequence as B and to the second ascending sequence as C.
Table 1. Why the pumping lemma is not satisfied. For each possible location of the factor u.y.v, the table gives a value of ℓ, the word $w_\ell$, and the failing condition.
In Table 1, we consider all eligible factorisations of w of the form w = x.u.y.v.z. Five cases arise, depending on which part of w contains the factor u.y.v. Since |B| = j > n, condition (ii) implies that this factor cannot overlap simultaneously with the parts A and C. Each of the cases A ∪ B and B ∪ C is further subdivided into two cases, depending on whether u and v are monotone or not.
For each factorisation, the table gives a value of ℓ for which the word w ℓ does not belong to K.This is justified in the rightmost column: either w ℓ does not belong to the set L of zig-zag paths, or the positivity condition does not hold, or the last step of the walk is not a record.
Once all the possible factorisations have been investigated and found not to satisfy the pumping lemma, we conclude that the languages K and C a,b are not context-free.
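Two ingredients of this proof can be checked mechanically: the description of K, and the existence of i, j > n with ia − jb = 1. The sketch below (our own helper names, with 'u' and 'd' encoding $m$ and $\bar m$) uses the extended Euclidean algorithm to produce such a pair and verifies that $m^i \bar m^j m^i$ is culminating.

```python
def is_culminating(word, a, b):
    """Positivity and strict final record, with 'u' = +a and 'd' = -b."""
    level, highest = 0, 0
    for i, letter in enumerate(word):
        level += a if letter == "u" else -b
        if level <= 0:
            return False
        if i < len(word) - 1:
            highest = max(highest, level)
    return len(word) > 0 and level > highest


def in_K(i, j, h, a, b):
    """Membership of u^i d^j u^h in K, according to the description of K."""
    return j == 0 or (i * a > j * b and h * a > j * b)


def ext_gcd(x, y):
    """Return (g, s, t) with s*x + t*y == g == gcd(x, y)."""
    if y == 0:
        return x, 1, 0
    g, s, t = ext_gcd(y, x % y)
    return g, t, s - (x // y) * t


def bezout_witness(a, b, n):
    """Return (i, j) with i > n, j > n and i*a - j*b == 1 (a and b coprime)."""
    g, s, t = ext_gcd(a, b)
    if g != 1:
        raise ValueError("a and b must be coprime")
    i, j = s, -t  # s*a + t*b == 1, i.e. i*a - j*b == 1
    # All solutions are (i + m*b, j + m*a): pick m large enough for both bounds.
    m = max((n - i) // b + 1, (n - j) // a + 1)
    return i + m * b, j + m * a


a, b, n = 3, 5, 10
i, j = bezout_witness(a, b, n)
print(i, j, is_culminating("u" * i + "d" * j + "u" * i, a, b))  # 22 13 True
```

The word $m^i \bar m^j m^i$ climbs to ia, descends to ia − jb = 1 and ends at 1 + ia, so it is culminating as soon as ia − jb = 1.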

Exact enumerative results
In this section, we give a closed-form expression for the generating function of (a, b)-culminating walks. More precisely, we give an expression for the series counting culminating walks of height k, and then sum over k. This summation makes the series a bit difficult to handle, for instance to extract the asymptotic behaviour of the coefficients (Section 4). We believe that this complexity is inherent to the problem. In particular, we prove that the generating function of (1, 1)-culminating walks is not only transcendental, but also not D-finite. That is, it does not satisfy any linear differential equation with polynomial coefficients [37, Ch. 6].

Statement of the results and discussion
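Let us first state our results in the (1, 1)-case and then explain what form they take in the general (a, b)-case.

Proposition 3.1. Let k ≥ 1. The length generating function of (1, 1)-culminating walks of height k is
$$C_k = \frac{t^k}{F_{k-1}} = t\,\frac{U_1 - U_2}{U_1^k - U_2^k} = \frac{U^k (1 - U^2)}{(1 + U^2)(1 - U^{2k})},$$
where
• $F_k$ is the k-th Fibonacci polynomial, defined by $F_0 = F_1 = 1$ and $F_k = F_{k-1} - t^2 F_{k-2}$ for k ≥ 2;
• $U_1$ and $U_2$ are the two roots of the polynomial $u - t(1 + u^2)$:
$$U_1 = \frac{1 - \sqrt{1 - 4t^2}}{2t}, \qquad U_2 = \frac{1 + \sqrt{1 - 4t^2}}{2t};$$
• U stands for any of the $U_i$'s.
The generating function of culminating walks,
$$C(t) = \sum_{k \ge 1} C_k = \frac{1 - U^2}{1 + U^2} \sum_{k \ge 1} \frac{U^k}{1 - U^{2k}}, \qquad (1)$$
is not D-finite.

The above expression of C(t) is equivalent to the case x = y = 1 of [18, Eq. (2.26)].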
The first expression of $C_k$, in terms of the Fibonacci polynomials, is clearly rational. As explained in Section 2.1, the language of culminating walks of height k is regular for all a and b, so that the series $C_k$ will always be rational. Of course, $C_k$ is simply 0 when k < a. When k = a, there is only one culminating path, reduced to one up step, so that $C_k = t$. More generally, the following property, illustrated in Figure 4 and proved in Section 3.2.1, holds.

Property 3.2. For k ≤ a + b, there is at most one culminating path of height k.
As soon as k > a, culminating walks of height k have at least two steps. Deleting the first and last ones gives $C_k = t^2 W_k$, where $W_k$ counts walks (with steps +a, −b) going from a to k − a on the segment ⟦1, k − 1⟧. General (and basic) results on the enumeration of walks on a digraph provide [36, Ch. 4]:
$$C_k = t^2 \left((1 - tA_k)^{-1}\right)_{a,\,k-a} = t^2\, \frac{N_k}{D_k}, \qquad (2)$$
where $D_k = \det(1 - tA_k)$, $N_k$ is the corresponding cofactor, and $A_k = (A_{i,j})_{1 \le i,j \le k-1}$ is the adjacency matrix of our segment graph:
$$A_{i,j} = \begin{cases} 1 & \text{if } j - i \in \{a, -b\},\\ 0 & \text{otherwise.}\end{cases} \qquad (3)$$
We note from Proposition 3.1 that, in the (1, 1)-case, both $N_k$ and $D_k$ are especially simple. Indeed, $N_k = t^{k-2}$, while $D_k = F_{k-1}$ satisfies a linear recurrence relation (with constant coefficients) of order 2. We will prove that, for all a and b, both sequences $N_k$ and $D_k$ satisfy such a recurrence relation (of a larger order in general). The monomial form of $N_k$ will hold as soon as a = 1.
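The identity $C_k = t^2 W_k$ turns into a simple counting procedure: apply the adjacency matrix $A_k$ repeatedly on the segment ⟦1, k − 1⟧, starting from level a, and read off the number of walks ending at level k − a. The following sketch (hypothetical function names, our own code) does this and cross-checks the result against brute-force enumeration.

```python
from itertools import product


def culminating_by_height(a, b, k, max_len):
    """coeffs[n] = number of culminating walks of height k and length n (n <= max_len),
    computed as t^2 W_k, W_k counting walks from a to k - a on the segment [1, k - 1]."""
    coeffs = [0] * (max_len + 1)
    if k < a:
        return coeffs
    if k == a:
        if max_len >= 1:
            coeffs[1] = 1  # the single up step
        return coeffs
    current = {a: 1}  # walks of length n - 2 on the segment, indexed by their end level
    for n in range(2, max_len + 1):
        coeffs[n] = current.get(k - a, 0)
        nxt = {}
        for level, count in current.items():
            for new in (level + a, level - b):  # one application of A_k
                if 1 <= new <= k - 1:
                    nxt[new] = nxt.get(new, 0) + count
        current = nxt
    return coeffs


def brute_force_by_height(a, b, k, max_len):
    """Same numbers by exhaustive enumeration (small max_len only)."""
    coeffs = [0] * (max_len + 1)
    for n in range(1, max_len + 1):
        for steps in product((a, -b), repeat=n):
            seen, eta = [], 0
            for s in steps:
                eta += s
                seen.append(eta)
            if seen[-1] == k and min(seen) > 0 and max([0] + seen[:-1]) < k:
                coeffs[n] += 1
    return coeffs


print(culminating_by_height(1, 1, 4, 10))  # [0, 0, 0, 0, 1, 0, 2, 0, 4, 0, 8]
```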
The second expression of $C_k$ given in Proposition 3.1 appears as a rational function of the roots of the polynomial $u - t(1 + u^2)$. Even though both series $U_1$ and $U_2$ are algebraic (and irrational), the fact that $C_k$ is symmetric in $U_1$ and $U_2$ explains why $C_k$ itself is rational. In general, we will write $C_k$ as a symmetric rational function of the a + b roots of the polynomial $u^b - t(1 + u^{a+b})$. The third expression of $C_k$ follows from the fact that $U_1 U_2 = 1$. In general, $t = U^b/(1 + U^{a+b})$ for $U = U_i$, so that it will always be possible to write $C_k$ as a rational function of U. However, this expression will not always be as simple as above. The equivalence of the three expressions of Proposition 3.1 follows easily from the fact that
$$F_k = t^k\, \frac{U_1^{k+1} - U_2^{k+1}}{U_1 - U_2}.$$
This can be proved by solving the recurrence relation satisfied by the $F_k$'s, or checked by induction on k.
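The first expression of Proposition 3.1 can be compared with a direct count. The sketch below (our own, for a = b = 1) expands $t^k/F_{k-1}$ as a power series, assuming the normalisation $F_0 = F_1 = 1$, $F_k = F_{k-1} - t^2 F_{k-2}$ of the Fibonacci polynomials (consistent with $D_k = F_{k-1}$), and compares the coefficients with a brute-force enumeration of culminating walks by height.

```python
from itertools import product


def fibonacci_polys(m):
    """F_0, ..., F_m as coefficient lists in t: F_0 = F_1 = 1, F_k = F_{k-1} - t^2 F_{k-2}."""
    polys = [[1], [1]]
    for _ in range(2, m + 1):
        prev, prev2 = polys[-1], polys[-2]
        new = prev + [0] * max(0, len(prev2) + 2 - len(prev))
        for i, c in enumerate(prev2):
            new[i + 2] -= c
        polys.append(new)
    return polys[: m + 1]


def series_quotient(k, poly, n_terms):
    """First n_terms coefficients of t^k / poly(t), assuming poly[0] == 1."""
    inverse = []
    for n in range(n_terms):
        value = 1 if n == 0 else 0
        for j in range(1, min(n, len(poly) - 1) + 1):
            value -= poly[j] * inverse[n - j]
        inverse.append(value)
    return ([0] * k + inverse)[:n_terms]


def brute_force_table(max_len):
    """table[(k, n)]: number of (1,1)-culminating walks of height k and length n."""
    table = {}
    for n in range(1, max_len + 1):
        for steps in product((1, -1), repeat=n):
            eta, highest, ok = 0, 0, True
            for i, s in enumerate(steps):
                eta += s
                if eta <= 0:
                    ok = False
                    break
                if i < n - 1:
                    highest = max(highest, eta)
            if ok and eta > highest:
                table[(eta, n)] = table.get((eta, n), 0) + 1
    return table


print(series_quotient(4, fibonacci_polys(3)[3], 11))  # [0, 0, 0, 0, 1, 0, 2, 0, 4, 0, 8]
```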
Let us now state our generalisation of Proposition 3.1 to (a, b)-culminating walks. Our first expression of $C_k$, namely the rational form (2), involves the evaluation of two determinants of size (approximately) k. Our second expression of $C_k$ will be a fixed rational function of $U_1, \ldots, U_{a+b}, U_1^k, \ldots, U_{a+b}^k$, symmetric in the $U_i$, which involves two determinants of constant size a + b. The existence of such smaller determinantal forms for walks confined in a strip has already been recognized in [3, Ch. 1]. More recently, the case of excursions confined in a strip has been simplified and worked out in greater detail [10]. As in [10], our results will be expressed in terms of the Schur functions $s_\lambda$, which form one of the most important bases of symmetric functions in n variables $x_1, \ldots, x_n$: for any integer partition $\lambda = (\lambda_1, \ldots, \lambda_n)$ with at most n parts (completed with zeros),
$$s_\lambda(x_1, \ldots, x_n) = \frac{\det\left(x_i^{\lambda_j + n - j}\right)_{1 \le i,j \le n}}{\det\left(x_i^{n-j}\right)_{1 \le i,j \le n}}. \qquad (4)$$
We refer to [37, Ch. 7] for generalities on symmetric functions.

Proposition 3.3. Let k > a. With the above notation, the length generating function of (a, b)-culminating paths of height k admits the following expressions:
$$C_k = t^2\, \frac{N_k}{D_k} = t\, \frac{s_\mu(\mathbf{U})}{s_\lambda(\mathbf{U})},$$
where $A_k$ is given by (3), the (a + b)-tuple $\mathbf{U} = (U_1, \ldots, U_{a+b})$ is the collection of roots of the polynomial $u^b - t(1 + u^{a+b})$, and the partitions λ and μ are given by $\lambda = (k-1)^a$ and $\mu = ((k-1)^{a-1}, a-1)$.
The determinant $D_k$ of $(1 - tA_k)$ and the relevant cofactor $N_k$ are respectively given by
$$D_k = (-1)^{(a+1)(k-1)}\, t^{k-1} s_\lambda(\mathbf{U}), \qquad N_k = (-1)^{(a+1)(k-1)}\, t^{k-2} s_\mu(\mathbf{U}). \qquad (5)$$
Both sequences $N_k$ and $D_k$ satisfy a linear recurrence relation with coefficients in $\mathbb{Q}[t]$, respectively of order $\binom{a+b}{a-1}$ and $\binom{a+b}{a}$. These orders are optimal. Note that the expression of $C_k$ in terms of Schur functions still holds for k = a. Examples will be given below. For the moment, let us underline that the case a = 1 of this proposition takes a remarkably simple form, which will be given a combinatorial explanation in Section 3.2.3.
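Corollary 3.4. When a = 1, the generating function of culminating walks of height k ≥ 1 reads
$$C_k = \frac{t}{h_{k-1}(U_1, \ldots, U_{b+1})} = \frac{t^k}{D_k},$$
where $h_{k-1} = s_{(k-1)}$ is the complete homogeneous symmetric function of degree k − 1, and $D_k = D_{k-1} - t^{b+1} D_{k-b-1}$ for k > b + 1, with $D_k = 1$ for 1 ≤ k ≤ b + 1.

Examples. Let us illustrate Proposition 3.3 by writing down explicitly the expression of $C_k$ for a few values of a and b. We use the determinantal form (4) of Schur functions.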
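Case a = b = 1. Here $U_1$ and $U_2$ are the two roots of the polynomial $u - t(1 + u^2)$. The partition μ is empty, so that $s_\mu = 1$, while λ = (k − 1). This gives
$$C_k = \frac{t}{s_{(k-1)}(U_1, U_2)} = t\, \frac{U_1 - U_2}{U_1^k - U_2^k},$$
as in Proposition 3.1. The recurrence relations satisfied by the polynomials $N_k$ and $D_k$ can always be worked out from their expressions (5), as will be explained in Section 3.2.2. In the case a = b = 1, one finds
$$N_k = t N_{k-1}, \qquad D_k = D_{k-1} - t^2 D_{k-2}.$$

Case a = 1 and b = 2. Here $U_1, U_2, U_3$ are the three roots of the polynomial $u^2 - t(1 + u^3)$. Again, μ is empty and λ = (k − 1) (this holds as soon as a = 1). One obtains
$$C_k = \frac{t}{h_{k-1}(U_1, U_2, U_3)}, \qquad h_{k-1}(U_1, U_2, U_3) = \sum_{i=1}^{3} \frac{U_i^{k+1}}{\prod_{j \ne i} (U_i - U_j)}.$$
Note that this expression allows us to compute in a few seconds the number $c_n$ of culminating walks for n up to 500.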
The rational expression of $C_k$ reads $C_k = t^k/D_k$, where $D_k = D_{k-1} - t^3 D_{k-3}$ for k > 3, with $D_1 = D_2 = D_3 = 1$.

3.2.1.
Proof of Property 3.2. Let us say that a path is positive if every step ends at a positive level. For instance, culminating walks are positive. For n ≥ 0 there exists a unique positive walk of length n and height at most a + b, denoted $w_n$. Indeed, given h ∈ ⟦0, a + b⟧, exactly one of the values h + a, h − b lies in the interval ⟦1, a + b⟧. For the same reason, $w_i$ is a prefix of $w_j$ for i ≤ j. Let k ≤ a + b, and assume that there exist two distinct culminating walks of height k. These walks must be $w_i$ and $w_j$, for some i and j, with, say, i < j. But then $w_i$ is a prefix of $w_j$, and ends at height k, which prevents $w_j$ from being culminating.
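The proof of Property 3.2 is constructive: the walk $w_n$ is obtained greedily. The sketch below (hypothetical helper names, our own code) builds $w_{a+b}$, extracts the candidate culminating path of height k ≤ a + b (the prefix ending at the first visit to k, provided all earlier levels are below k), and checks uniqueness by brute force on short words. It assumes a and b coprime, so that $w_{a+b}$ visits every level of ⟦1, a + b⟧.

```python
from itertools import product


def bounded_positive_walk(a, b, n):
    """Levels of the unique positive walk w_n of length n and height at most a + b."""
    seen, h = [], 0
    for _ in range(n):
        # From h in [0, a + b], exactly one of h + a, h - b lies in [1, a + b].
        h = h + a if h + a <= a + b else h - b
        seen.append(h)
    return seen


def small_height_culminating_length(a, b, k):
    """Length of the culminating path of height k <= a + b, or None if there is none."""
    seen = bounded_positive_walk(a, b, a + b)
    if k not in seen:
        return None
    i = seen.index(k)  # first visit to level k
    return i + 1 if all(h < k for h in seen[:i]) else None


def brute_force_lengths(a, b, k, max_len):
    """Lengths of all culminating walks of height k and length at most max_len."""
    found = []
    for n in range(1, max_len + 1):
        for steps in product((a, -b), repeat=n):
            seen, eta = [], 0
            for s in steps:
                eta += s
                seen.append(eta)
            if eta == k and min(seen) > 0 and max([0] + seen[:-1]) < k:
                found.append(n)
    return found


print([small_height_culminating_length(3, 2, k) for k in range(1, 6)])  # [None, None, 1, 3, 5]
```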

3.2.2.
Proof of Proposition 3.3. The expression of $C_k$ in terms of the adjacency matrix $A_k$ has been justified in Section 3.1. Let us now derive the Schur function expression of this series. We will actually give two proofs of this expression: the first one is based on the kernel method [8,4,3], and the second one on the Jacobi-Trudi identity. The first proof is completely elementary. The second one allows us to relate the polynomials $N_k$ and $D_k$ to the Schur functions $s_\lambda$ and $s_\mu$. This derivation is very close to what was done in [10] for excursions confined in a strip. Some of the results of [10] will actually be used to shorten some arguments.
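First proof via the kernel method. Consider a culminating walk of height k > a. Such a walk has length at least 2. Delete its first and last steps: this gives a walk starting from level a, ending at level k − a, and confined between levels 1 and k − 1. Shifting this walk one step down, we obtain a non-negative walk starting from level a − 1 and ending at level k − 1 − a, of height at most k − 2. Let G(t, u) ≡ G(u) denote the generating function of non-negative walks starting from a − 1, of height at most k − 2. In this series, the variable t keeps track of the length while the variable u records the final height. Write
$$G(u) = \sum_{h=0}^{k-2} G_h u^h,$$
where $G_h$ counts walks ending at height h. The above argument implies that the generating function of culminating walks of height k is
$$C_k = t^2 G_{k-1-a}. \qquad (6)$$
We can construct the walks counted by G(u) step by step, starting from height a − 1, and adding at each time a step +a (unless the current height is k − a − 1 or more) or −b (unless the current height is b − 1 or less). In terms of generating functions, this gives:
$$G(u) = u^{a-1} + t u^a \Big( G(u) - \sum_{h=k-1-a}^{k-2} G_h u^h \Big) + t u^{-b} \Big( G(u) - \sum_{h=0}^{b-1} G_h u^h \Big),$$
or equivalently
$$\big(u^b - t(1 + u^{a+b})\big)\, G(u) = u^{a+b-1} - t \sum_{h=0}^{b-1} G_h u^h - t \sum_{h=k-1-a}^{k-2} G_h u^{h+a+b}.$$
The kernel of this equation, that is, the polynomial $u^b - t(1 + u^{a+b})$, has a + b distinct roots, which are Puiseux series in t. We denote them $U_1, \ldots, U_{a+b}$. Recall that G(u) is a polynomial in u (of degree k − 2). Replacing u by each of the $U_i$ gives a system of a + b linear equations relating the unknown series $G_0, \ldots, G_{b-1}$ and $G_{k-1-a}, \ldots, G_{k-2}$:
$$t \sum_{h=0}^{b-1} G_h U_i^h + t \sum_{h=k-1-a}^{k-2} G_h U_i^{h+a+b} = U_i^{a+b-1}, \qquad 1 \le i \le a+b.$$
In matrix form, we have $MG = C/t$, where $G = (G_0, \ldots, G_{b-1}, G_{k-1-a}, \ldots, G_{k-2})^T$, $C = (U_i^{a+b-1})_{1 \le i \le a+b}$, and M is the square matrix of size a + b given by
$$M = \left(U_i^{0}, \ldots, U_i^{b-1}, U_i^{b+k-1}, \ldots, U_i^{a+b+k-2}\right)_{1 \le i \le a+b}. \qquad (7)$$
In view of the definition (4) of Schur functions,
$$\det M = (-1)^{\binom{a+b}{2}}\, s_\lambda(\mathbf{U}) \prod_{1 \le i < j \le a+b} (U_i - U_j),$$
with $\lambda = (k-1)^a$. It has been shown in [10] that the generating function of excursions (walks starting and ending at 0) confined in the strip of height k − 2 can be expressed in terms of $s_\lambda(\mathbf{U})$, and that, in particular, $s_\lambda(\mathbf{U}) \neq 0$. Hence M is invertible, and applying Cramer's rule to the above system gives
$$G_{k-1-a} = \frac{s_\mu(\mathbf{U})}{t\, s_\lambda(\mathbf{U})},$$
with λ and μ defined as in the statement of the proposition. Combining this with (6) gives the desired Schur function form of $C_k$.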
A second proof via symmetric functions. Let us now give an alternative proof of the Schur function expression of $C_k$. It will be based on the dual Jacobi-Trudi identity, which expresses Schur functions as a determinant in the elementary symmetric functions $e_i$ [37, Cor. 7.16.2]:
$$s_\nu = \det\left(e_{\nu'_j + i - j}\right)_{1 \le i,j \le \ell(\nu')} \qquad (8)$$
for any partition ν, where ν′ is the conjugate partition of ν, ℓ(ν′) denotes its number of parts, and $e_i = 0$ for i < 0.
Let us consider the identity (2), with $D_k = \det(1 - tA_k)$. It turns out that this determinant is of the form (8). Indeed, let us define
$$V_i = -U_i, \qquad 1 \le i \le a+b.$$
Then the only elementary symmetric functions of the $V_i$ that do not vanish are $e_0(\mathbf{V}) = 1$, $e_a(\mathbf{V}) = -1/t$ and $e_{a+b}(\mathbf{V}) = 1$ (with $\mathbf{V} = (V_1, \ldots, V_{a+b})$). Let us apply (8) to $\nu = \lambda = (k-1)^a$, so that $\nu' = a^{k-1}$. The matrix $(e_{a+i-j}(\mathbf{V}))_{1 \le i,j \le k-1}$ is exactly $A_k - \frac{1}{t}\mathrm{Id}$, so that
$$D_k = (-t)^{k-1} s_\lambda(\mathbf{V}) = (-1)^{(a+1)(k-1)}\, t^{k-1} s_\lambda(\mathbf{U}),$$
since $s_\lambda$ is homogeneous of degree a(k − 1). This gives the Schur function expression of $D_k$. Now, by the general inversion formula for matrices,
$$N_k = (-1)^k \det (1 - tA_k)_{k-a,a},$$
where $(1 - tA_k)_{k-a,a}$ is obtained by deleting row k − a and column a from $(1 - tA_k)$. Let us apply (8) to $\nu = \mu = ((k-1)^{a-1}, a-1)$. Then $\nu' = (a^{a-1}, (a-1)^{k-a})$. The matrix $(e_{\nu'_j + i - j}(\mathbf{V}))$ has size k − 1, and its last column contains only one non-zero entry (equal to $e_0(\mathbf{V}) = 1$), in row k − a. After deleting this row and the last column, one obtains the matrix $A_k - \frac{1}{t}\mathrm{Id}$ with row k − a and column a deleted, so that
$$N_k = (-1)^{a+1}\, t^{k-2} s_\mu(\mathbf{V}) = (-1)^{(a+1)(k-1)}\, t^{k-2} s_\mu(\mathbf{U}),$$
as $s_\mu$ is homogeneous of degree k(a − 1). This gives the desired expression of $N_k$.

We use the definition (4) of Schur functions to write $s_{((k-1)^{a-1}, a-1)}$ as a ratio of determinants of size n = a + b. The determinant occurring at the denominator is the Vandermonde $V_n$ in the $u_i$'s, and is independent of k. The determinant at the numerator is obtained from (7) by replacing the column containing $U_i^{b+k-1}$ by a column of $U_i^{a+b-1}$ (and then each $U_i$ by the indeterminate $u_i$). We expand it as a sum over permutations of length n, and obtain: where σ acts on functions of $u_1, \ldots, u_n$ by permuting the variables: Equivalently, and P(z; u) is another polynomial in z and the $u_i$, symmetric in the $u_i$'s. This symmetry property shows that replacing $u_i$ by $U_i$ transforms N′(z; u) into a rational series in z and t. The link between $N_k$ and $s_{((k-1)^{a-1}, a-1)}$ then gives another rational function of z and t. A similar argument, given explicitly in [10], yields for two polynomials $\tilde P$ and $\tilde Q$ in z and $u_1, \ldots, u_n$. More precisely, By looking at the degree of Q and $\tilde Q$, this establishes the existence of recurrence relations of order $\binom{a+b}{a-1}$ for $N_k$, and $\binom{a+b}{a}$ for $D_k$. If there were recursions of a smaller order, the polynomials Q(z; U) or $\tilde Q(z; \mathbf{U})$ would factor. It has been shown in [10, Section 6] that $\tilde Q(z; \mathbf{U})$ is irreducible, and the same argument implies that Q(z; U) is irreducible as well.
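Proposition 3.3 can be checked exactly at a rational value of t, without computing any root. Reading the coefficients of the monic polynomial $u^{a+b} - u^b/t + 1$, whose roots are $U_1, \ldots, U_{a+b}$, the only non-zero elementary symmetric functions of the $U_i$ are $e_0 = 1$, $e_a = (-1)^{a+1}/t$ and $e_{a+b} = (-1)^{a+b}$, so the dual Jacobi-Trudi identity gives $s_\lambda(\mathbf U)$ and $s_\mu(\mathbf U)$ directly. The sketch below (our own code, using Python's `fractions.Fraction` for exact arithmetic) checks $C_k = t^2 N_k/D_k = t\,s_\mu(\mathbf U)/s_\lambda(\mathbf U)$, together with $D_k = (-1)^{(a+1)(k-1)} t^{k-1} s_\lambda(\mathbf U)$ and $N_k = (-1)^{(a+1)(k-1)} t^{k-2} s_\mu(\mathbf U)$, where $N_k$ is taken as the cofactor $(-1)^k \det(1 - tA_k)_{k-a,a}$.

```python
from fractions import Fraction


def det(matrix):
    """Determinant of a square matrix of Fractions, by Gaussian elimination."""
    m = [list(row) for row in matrix]
    n, result = len(m), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            result = -result
        result *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for cc in range(c, n):
                m[r][cc] -= factor * m[c][cc]
    return result


def conjugate(partition):
    """Conjugate of an integer partition (zero parts are ignored)."""
    parts = [p for p in partition if p > 0]
    if not parts:
        return []
    return [sum(1 for p in parts if p > i) for i in range(max(parts))]


def schur_at_roots(partition, a, b, t):
    """s_partition(U) for the roots U of u^b - t(1 + u^(a+b)), via dual Jacobi-Trudi."""
    e = {0: Fraction(1), a: Fraction((-1) ** (a + 1)) / t, a + b: Fraction((-1) ** (a + b))}
    conj = conjugate(partition)
    size = len(conj)
    return det([[e.get(conj[j] + i - j, Fraction(0)) for j in range(size)]
                for i in range(size)])


def one_minus_t_a(a, b, k, t):
    """The matrix 1 - t A_k, rows and columns indexed by the levels 1, ..., k - 1."""
    return [[Fraction(int(i == j)) - t * int(j - i in (a, -b)) for j in range(1, k)]
            for i in range(1, k)]


def check(a, b, k, t=Fraction(1, 7)):
    """Compare the transfer-matrix and Schur-function expressions (requires k > a)."""
    M = one_minus_t_a(a, b, k, t)
    d_k = det(M)
    # Delete row k - a and column a (1-based) to get the cofactor.
    minor = [row[: a - 1] + row[a:] for r, row in enumerate(M) if r != k - a - 1]
    n_k = (-1) ** k * det(minor)
    s_lam = schur_at_roots([k - 1] * a, a, b, t)
    s_mu = schur_at_roots([k - 1] * (a - 1) + [a - 1], a, b, t)
    sign = (-1) ** ((a + 1) * (k - 1))
    return (t * t * n_k / d_k == t * s_mu / s_lam
            and d_k == sign * t ** (k - 1) * s_lam
            and n_k == sign * t ** (k - 2) * s_mu)


print(check(2, 3, 6))
```

Since every row of $A_k$ contains at most two ones, $1 - tA_k$ is invertible for $|t| < 1/2$, so the rational values of t used here are safe.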
Let us now justify combinatorially the simplicity of $N_k$ and $D_k$ when a = 1. Recall that, for k ≥ 2, one has $C_k = t^2 W_k$, where $W_k$ counts walks (with steps +1, −b) going from 1 to k − 1 on the segment graph ⟦1, k − 1⟧. The adjacency matrix of this graph is $A_k$. The combinatorial description¹ of the inverse of the matrix $(1 - tA_k)$ tells us that $D_k$ counts non-intersecting collections of elementary cycles on the segment ⟦1, k − 1⟧, while $N_k$ counts configurations formed of a self-avoiding path w going from 1 to k − 1 together with a non-intersecting collection of elementary cycles that do not meet w. In the polynomials $N_k$ and $D_k$, each cycle of length ℓ is given a weight $(-t^\ell)$, while the path w is simply weighted $t^\ell$ if it has length ℓ. This gives directly $N_k = t^{k-2}$, as the only possible path w is formed of k − 2 up steps, and leaves no room for co-existing cycles. Now the only elementary cycles are formed of b up steps and one down step −b. The recursion satisfied by $D_k$ is then obtained by discussing whether the point k − 1 is contained in one such cycle.
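The recursion for $D_k$ when a = 1 can be tested in the same spirit. Conditioning on whether the point k − 1 lies on a cycle of length b + 1 (weight $-t^{b+1}$) gives $D_k = D_{k-1} - t^{b+1} D_{k-b-1}$, with $D_k = 1$ for k ≤ b + 1 since no cycle fits on fewer than b + 1 points. The sketch below (our own code) builds these polynomials, expands $t^k/D_k$ (using $N_k = t^{k-2}$) and compares the result with brute-force counts.

```python
from itertools import product


def d_polys(b, k_max):
    """Polynomials D_1, ..., D_{k_max} (coefficient lists in t) for a = 1."""
    D = {k: [1] for k in range(1, b + 2)}  # no cycle fits on fewer than b + 1 points
    for k in range(b + 2, k_max + 1):
        prev, back = D[k - 1], D[k - b - 1]
        new = prev + [0] * max(0, len(back) + b + 1 - len(prev))
        for i, c in enumerate(back):
            new[i + b + 1] -= c  # point k - 1 lies on a cycle of length b + 1
        D[k] = new
    return D


def series_quotient(k, poly, n_terms):
    """First n_terms coefficients of t^k / poly(t), assuming poly[0] == 1."""
    inverse = []
    for n in range(n_terms):
        value = 1 if n == 0 else 0
        for j in range(1, min(n, len(poly) - 1) + 1):
            value -= poly[j] * inverse[n - j]
        inverse.append(value)
    return ([0] * k + inverse)[:n_terms]


def brute_force_table(b, max_len):
    """table[(k, n)]: culminating walks with steps +1, -b, of height k and length n."""
    table = {}
    for n in range(1, max_len + 1):
        for steps in product((1, -b), repeat=n):
            eta, highest, ok = 0, 0, True
            for i, s in enumerate(steps):
                eta += s
                if eta <= 0:
                    ok = False
                    break
                if i < n - 1:
                    highest = max(highest, eta)
            if ok and eta > highest:
                table[(eta, n)] = table.get((eta, n), 0) + 1
    return table


print(d_polys(2, 6)[6])  # [1, 0, 0, -3]
```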
Note that this proof can be rephrased in terms of heaps of cycles using Viennot's correspondence between walks on a graph and certain heaps [38].The expression N k /D k then appears as a specialisation of the inversion lemma (also found in [38]).In particular, D k is the (alternating) generating function of trivial heaps of cycles.
Remark. For general values of a and b, the description of $D_k$ and $N_k$ in terms of cycles and paths on the graph ⟦1, k − 1⟧ remains perfectly valid. But the structure of elementary cycles and self-avoiding paths becomes more complicated. See an example in Figure 5.

Let us first observe that C(t) is D-finite if and only if the power series (in u)
$$B(u) = \sum_{k \ge 1} \frac{u^k}{1 - u^{2k}}$$
is D-finite. Indeed, one goes from C(t) to B(u), and vice versa, by an algebraic substitution of the variable (and a multiplication by the algebraic series $(1 - U^2)/(1 + U^2)$), as U is an algebraic function of t and $t = U/(1 + U^2)$. It is known that D-finite series are preserved by algebraic substitutions [37, Thm. 6.4.10], so that we can now focus on the series B(u).
This series has integer coefficients, and radius of convergence 1. Hence it is either rational, or admits the unit circle as a natural boundary [12]. As will be recalled later (10), the singular behaviour of B(u) as u approaches 1 involves a logarithm, which rules out the possibility of B(u) being rational. Thus B(u) has a natural boundary, and, in particular, infinitely many singularities. But D-finite series have only finitely many singularities, so that B(u) is not D-finite.

¹ This description seems to have been around since at least the 1980s [25,38]. See [9, Thm. 2.1] for a modern formulation.

Asymptotic enumerative results
In this section we present some results on the asymptotic enumeration of culminating walks. Intuitively, three cases arise, depending on the drift of the walks, defined as the difference a − b. Indeed, an n-step random walk of positive drift is known to end at a level of order n and is, intuitively, quite likely to be culminating. On the contrary, walks with a negative drift have a very small probability of staying positive. We first work out the intermediate case of a zero drift, that is, a = b = 1 (recall that a and b are coprime). When the drift is zero, the number of positive walks (walks in which every step ends at a positive level) of length n is known to be asymptotically equivalent to $2^n/\sqrt{2\pi n}$. The average height and the average final level of these walks both scale like $\sqrt{n}$. Hence we can expect the number of culminating walks to be of the order of $2^n/n$. This is confirmed by the following result.

Proposition. The number $c_n$ of (1, 1)-culminating walks of length n satisfies
$$c_n \sim \frac{2^n}{4n}.$$
Proof. We start from the expression (1) of C(t), with $U = U_1 = O(t)$, and apply the singularity analysis of [23]. Note that U(t) is an odd function of t. Let us first study the even part of C(t), which counts culminating paths of even length:
$$C_e(t) = \frac{C(t) + C(-t)}{2} = \frac{1 - U^2}{1 + U^2} \sum_{k \ge 1} \frac{U^{2k}}{1 - U^{4k}}.$$
The series S(z) has radius of convergence 1. Given that |Z(x)| < 1 in D, this implies that D(x) = S(Z(x)) is analytic in the domain D. It remains to understand how D(x) behaves as x approaches 1/4 in D.
Take $x = (1 - re^{i\theta})/4$, with 0 < r < 1 and |θ| < π. Then In particular, arg(1 Choose α ∈ (π/4, π/2). The above identity shows that there exist η > 0 and π/2 < φ < π such that, in the indented disk This can be obtained using a Mellin transform or some already known results on the generating function of divisor sums [22]. Combining (9) and (10) shows that, as x tends to 1/4 in the indented disk I, This allows us to apply the transfer theorems of [23]. Indeed, the series D(x) is analytic in the following domain: with singular behaviour near x = 1/4 given by (11). From this we conclude that the coefficient of $x^n$ in D(x) is asymptotically equivalent to $4^n/(8n)$. Going back to the series $C_e(t)$, this means that the number of culminating paths of (even) length N = 2n is asymptotically equivalent to $2^N/(4N)$.
The study of the odd part of C(t) is similar.

Walks with positive drift (a > b)
When the drift is positive, it is known that, asymptotically, a positive fraction of walks with steps +a, −b is actually positive (every step ends at a positive level). More precisely, as n → ∞, the number $p^{a,b}_n$ of positive walks of length n satisfies
$$p^{a,b}_n \sim \kappa_{a,b}\, 2^n \qquad (12)$$
for some positive constant $\kappa_{a,b}$. We will show that the positivity and final record conditions play similar filtering roles in the paths of $\{m, \bar m\}^*$, and prove the following result.

Proposition. If a > b, the number of culminating walks of length n satisfies
$$c^{a,b}_n \sim \kappa_{a,b}^2\, 2^n,$$
where $\kappa_{a,b}$ is the constant of (12).

Proof. In what follows, we consider two families of paths that are close to the meanders and excursions defined in the introduction: the (already defined) positive walks, and certain quasi-excursions. The exact and asymptotic enumeration of meanders and excursions has been completely worked out in [5], and we will rely heavily on this paper. For instance, the estimate (12) follows from the results of [5] by noticing that a meander factors into an excursion followed by a positive walk. Let us call quasi-excursion a walk in which every step, except the final one, ends at a positive level, while the final step ends at a non-positive level. For instance, if a = 3 and b = 2, the word $m \bar m \bar m$ is a quasi-excursion. By removing the last step of such a walk, we see that quasi-excursions are in bijection with positive walks of final height 1, 2, ..., or b. We denote the number of quasi-excursions of length n by $e^{a,b}_n$. Using the results of [5], it is easy to see that, when the drift is positive, quasi-excursions are exponentially rare among general walks. That is, there exists μ < 2 such that, for n large enough,
$$e^{a,b}_n < \mu^n. \qquad (13)$$
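From now on, we drop the superscripts a and b, writing for instance $c_n$ rather than $c^{a,b}_n$. For any word $w = w_1 \cdots w_k$, denote by $\overleftarrow{w}$ the mirror image of w, that is, $\overleftarrow{w} = w_k \cdots w_1$. Let u be a culminating word of length n, and write u = vw, where the word v (resp. w) has length ⌊n/2⌋ (resp. ⌈n/2⌉). Then both v and $\overleftarrow{w}$ are positive walks, and this proves that
$$c_n \le p_{\lfloor n/2 \rfloor}\, p_{\lceil n/2 \rceil}. \qquad (14)$$
Conversely, let us bound the number of pairs (v, w), where v and w are positive walks of respective lengths ⌊n/2⌋ and ⌈n/2⌉, such that the word $u = v \overleftarrow{w}$ is not culminating. This means that
• either u factors as $v_1 w_1$, where $v_1$ is a quasi-excursion of length i > ⌊n/2⌋,
• or, symmetrically, u factors as $v_2 \overleftarrow{w_2}$, where $w_2$ is a quasi-excursion of length j > ⌈n/2⌉.
This implies that
$$p_{\lfloor n/2 \rfloor}\, p_{\lceil n/2 \rceil} - c_n \le \sum_{i = \lfloor n/2 \rfloor + 1}^{n} e_i\, 2^{n-i} + \sum_{j = \lceil n/2 \rceil + 1}^{n} e_j\, 2^{n-j}.$$
In view of (13), we have, for n large enough:
$$p_{\lfloor n/2 \rfloor}\, p_{\lceil n/2 \rceil} - c_n \le 2 \sum_{i > \lfloor n/2 \rfloor} \mu^i\, 2^{n-i} = O\big((2\mu)^{n/2}\big) = o(2^n).$$
Combining this with (14) and the known asymptotics (12) for the numbers $p_n$ gives the expected result.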
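The splitting argument behind (14) is easy to test on small lengths. The sketch below (our own helper names, with 'u' and 'd' for $m$ and $\bar m$) enumerates all words of length n, checks that each culminating word u = vw has v and the mirror image of w positive, and compares $c_n$ with the bound $p_{\lfloor n/2 \rfloor} p_{\lceil n/2 \rceil}$.

```python
from itertools import product


def levels(word, a, b):
    """Levels reached after each letter ('u' = +a, 'd' = -b)."""
    eta, out = 0, []
    for letter in word:
        eta += a if letter == "u" else -b
        out.append(eta)
    return out


def is_positive(word, a, b):
    return all(h > 0 for h in levels(word, a, b))


def is_culminating(word, a, b):
    eta = levels(word, a, b)
    return bool(eta) and min(eta) > 0 and max([0] + eta[:-1]) < eta[-1]


def count_positive(m, a, b):
    return sum(is_positive("".join(w), a, b) for w in product("ud", repeat=m))


def check_half_split(n, a, b):
    """Return (c_n, p_floor(n/2) * p_ceil(n/2)) after checking the mirror argument."""
    half = n // 2
    c_n = 0
    for letters in product("ud", repeat=n):
        u = "".join(letters)
        if is_culminating(u, a, b):
            c_n += 1
            v, w = u[:half], u[half:]
            # v is positive, and so is the mirror image of w (w read backwards).
            assert is_positive(v, a, b) and is_positive(w[::-1], a, b)
    return c_n, count_positive(half, a, b) * count_positive(n - half, a, b)


print(check_half_split(12, 2, 1))
```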

Walks with negative drift (a < b): exponential decay
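When the drift is negative, it is known that positive walks are exponentially rare among general walks. Indeed, there exist constants $\kappa_{a,b} > 0$ and $\alpha_{a,b} \in (1, 2)$ such that
$$p_n \sim \kappa_{a,b}\, \alpha_{a,b}^n\, n^{-3/2}, \qquad \text{with} \quad \alpha_{a,b} = \frac{1 + q}{q^{a/(a+b)}}, \qquad (15)$$
where q = a/b < 1. We show below that the constant $\alpha_{a,b}$ also governs the number of culminating walks of size n.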
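Under the standard characterisation of [5], the growth constant of positive walks with negative drift is the minimum over u > 0 of the step polynomial $P(u) = u^a + u^{-b}$, attained at $\tau = (b/a)^{1/(a+b)}$; substituting τ gives the closed form $(1 + q)/q^{a/(a+b)}$ with q = a/b. The sketch below (our own, in floating point) compares this closed form with a direct numerical minimisation of P.

```python
def growth_constant(a, b):
    """Closed form (1 + q) / q**(a / (a + b)) with q = a / b."""
    q = a / b
    return (1 + q) / q ** (a / (a + b))


def growth_constant_numeric(a, b, lo=0.1, hi=10.0, iterations=200):
    """Minimum of P(u) = u**a + u**(-b) over u > 0, by ternary search.

    P is strictly convex on (0, inf), so ternary search converges to its minimiser
    tau = (b / a)**(1 / (a + b)), which lies in [lo, hi] for moderate a and b.
    """
    def P(u):
        return u ** a + u ** (-b)

    for _ in range(iterations):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if P(m1) < P(m2):
            hi = m2
        else:
            lo = m1
    return P((lo + hi) / 2)


for a, b in [(1, 2), (2, 3), (1, 5)]:
    print(a, b, growth_constant(a, b), growth_constant_numeric(a, b))
```

For a = b the closed form gives 2, in agreement with the zero-drift case, and it decreases towards 1 as a/b tends to 0.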
where $\alpha_{a,b}$ is given above. Moreover,
$$c_n^{a,b} = O\big(\alpha_{a,b}^n\, n^{-3}\big). \qquad (16)$$

Proof. The inequality (14) still holds and, combined with (15), gives the upper bound (16) on the number of culminating paths.
Let us now prove that the growth constant of culminating walks is still $\alpha_{a,b}$, by constructing a large class of such walks. Let $\mathcal E_n$ be the set of excursions of length n (from now on, we drop the superscripts a and b). Such excursions only exist when n is a multiple of a + b, and the number $e_n$ of such walks then satisfies $e_n \sim \kappa\, \alpha_{a,b}^n\, n^{-3/2}$ for some positive constant κ. It is known that random (a, b)-excursions of length n converge in law to the Brownian excursion, after normalising the length by n and the height by $\kappa'\sqrt n$, for some constant κ′ depending on a and b [29]. This implies that the (normalised) height of a discrete excursion converges in law to the height of the Brownian excursion (described by a theta distribution). In particular, the probability $p_n$ that an excursion of $\mathcal E_n$ has height larger than $\sqrt n$ tends to a limit p < 1 as n goes to infinity. Let k be the largest multiple of a + b such that $a(n-k-1) \ge \sqrt n$. Take an excursion of $\mathcal E_k$ of height less than $\sqrt n$, and append one up step at its left and n − k − 1 up steps at its right: this gives a culminating walk of length n, which proves that
$$c_n \ge (1 - p_k)\, e_k.$$
Since n − k = O(√n), taking nth roots gives the required lower bound on the growth of $c_n$.
Hence there are exponentially few walks of size n with steps +a, −b that are culminating. It is likely that $c_n$ behaves like $\alpha_{a,b}^n\, n^{-3-\gamma}$, for some γ ≥ 0 that remains to be determined. Note that the final height of an n-step meander is known to have a discrete limit law as n → ∞ [5].
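The growth constant $\alpha_{a,b}$ appears in several equivalent forms. The short numerical check below is our own sketch (function names are hypothetical): it evaluates $(1+q)\,q^{-a/(a+b)}$ with q = a/b, the form $(a+b)/(a^a b^b)^{1/(a+b)}$ of Proposition 5.1, and the minimum over u > 0 of the step polynomial $u^a + u^{-b}$, which is the standard characterisation of the growth rate of walks with negative drift. The ternary search assumes the minimiser $\log(b/a)/(a+b)$ lies in [−5, 5], which holds for moderate a and b.

```python
import math


def alpha(a, b):
    """alpha_{a,b} written as (1 + q) q^(-a/(a+b)) with q = a/b."""
    q = a / b
    return (1 + q) * q ** (-a / (a + b))


def alpha_prop51(a, b):
    """The same constant written as (a + b) / (a^a b^b)^(1/(a+b))."""
    return (a + b) / (a ** a * b ** b) ** (1 / (a + b))


def alpha_from_kernel(a, b, iters=200):
    """min over u > 0 of u^a + u^(-b), by ternary search in t = log u."""

    def f(t):
        # t -> e^(a t) + e^(-b t) is convex, so ternary search is valid
        return math.exp(a * t) + math.exp(-b * t)

    lo, hi = -5.0, 5.0                  # assumed to contain the minimiser
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return f((lo + hi) / 2)


for a, b in [(1, 2), (2, 3), (1, 5)]:
    print(a, b, alpha(a, b), alpha_prop51(a, b), alpha_from_kernel(a, b))
```

For a = b the first form gives exactly 2, the growth rate of all walks, consistent with $\alpha_{a,b} < 2$ only when a < b.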

Random generation of positive walks
The random generation of positive walks will be a preliminary step in some of the algorithms we present in the next section for the generation of culminating walks.The main ideas underlying the generation are the same for both classes of walks, but the class of positive walks is simpler.We apply three different approaches to their random generation: recursive methods (two versions), anticipated rejection, and Boltzmann sampling.The choice of the best algorithm depends on the drift, as summarized in the top part of Table 2.We denote by P a,b the language of positive walks, but the superscript a, b will often be dropped.

Recursive step-by-step approach
The first approach we present is elementary: we construct positive walks step by step, choosing at each time an up or down step with the right probability. This is the basis of the recursive approach introduced in [39]. Here are the three ideas underlying the algorithm:
• Let $\mathcal W$ be a language, and let $\mathcal W_p$ denote the language of the prefixes of words of $\mathcal W$. Assume that for all $w \in \mathcal W_p$ such that $|w| \le n$, we know the number $N_w(n)$ of words of $\mathcal W$ of length n beginning with w (we call these words extensions of w). Then it is possible to draw uniformly words of length n in $\mathcal W$ as follows. One starts from the empty word, and adds steps incrementally. If at some point the prefix that is built is w, one adds the letter x to w with probability $N_{wx}(n)/N_w(n)$.
• When $\mathcal W = \mathcal P^{a,b}$, the number of extensions of length n of a prefix $w \in \mathcal W_p$ depends only on two parameters:
  - the length difference $i = n - |w|$,
  - the final height of w, $j = \varphi_{a,b}(w)$.
• Let $p_{i,j}$ be the number of extensions of length n of such a prefix w. The numbers $p_{i,j}$ obey the following recurrence:
$$p_{0,j} = 1, \qquad p_{i,j} = p_{i-1,\,j+a} + \mathbb 1_{j-b>0}\; p_{i-1,\,j-b} \quad (i \ge 1).$$
As the two parameters i and j are bounded by n and an respectively, the precomputation of the numbers $p_{i,j}$ takes $O(n^2)$ arithmetic operations and requires storing $O(n^2)$ numbers. Then, the generation of a random word of length n can be performed in linear time. However, one should take into account the cost due to the size of the numbers in the precomputation stage. Indeed, the numbers $p_{i,j}$ are exponential in n, so that the actual time-space complexity of this stage may grow to $O(n^3)$. Using a floating-point technique adapted from [16], it should nevertheless be possible to take advantage of the numerical stability of the algorithm to reduce the space needed to $O(n^{2+\varepsilon})$. This naive recursive approach is less efficient than the one presented below, which is based on context-free grammars. But it is easily adapted to the generation of culminating walks, which cannot be generated via a grammar, as was proved in Section 2.
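A minimal sketch of this step-by-step sampler, assuming steps are encoded as the integers +a and −b; the function names are ours. Python's unbounded integers play the role of the exact big-number arithmetic discussed above, and memoisation of `p(i, j)` stands in for the precomputed table. The recursion depth grows with n, so the sketch is only suited to moderate n (a few hundred).

```python
import random
from functools import lru_cache


def make_positive_walk_sampler(a, b):
    """Return (p, sample) for positive (a, b)-walks, built on the numbers p(i, j)."""

    @lru_cache(maxsize=None)
    def p(i, j):
        # Number of ways to add i more steps to a prefix at height j
        # (j = 0 only for the empty prefix) so that every step ends above 0.
        if i == 0:
            return 1
        total = p(i - 1, j + a)              # up step: always allowed
        if j - b > 0:                        # down step: only if it ends above 0
            total += p(i - 1, j - b)
        return total

    def sample(n, rng):
        steps, j = [], 0
        for i in range(n, 0, -1):            # i = number of steps still to draw
            up = p(i - 1, j + a)
            # Up step with probability p(i-1, j+a) / p(i, j), exact in integers.
            s = a if rng.randrange(p(i, j)) < up else -b
            steps.append(s)
            j += s
        return steps

    return p, sample


p, sample = make_positive_walk_sampler(1, 1)
print(p(9, 0))                       # 70 positive walks of length 9 when a = b = 1
print(sample(12, random.Random(1)))
```

Drawing with `randrange` on exact integers keeps the sampler exactly uniform; a floating-point variant in the spirit of [16] would replace these counts by approximations.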

Recursive approach via context-free grammars
It is easy to see that the language $\mathcal P^{a,b} \equiv \mathcal P$ is recognised by a non-deterministic push-down automaton. This implies that $\mathcal P$ is context-free. The same holds for the language $\mathcal D^{a,b} \equiv \mathcal D$ of excursions. A non-ambiguous context-free grammar generating excursions is given explicitly in [19]. It suffices to add one equation to obtain a non-ambiguous grammar generating positive walks: In this system, ε is the empty word, $\mathcal D$ (resp. $\mathcal P$) is the language of excursions (resp. positive walks), while $L_i$, 1 ≤ i ≤ a, and $R_j$, 1 ≤ j ≤ b, are a + b auxiliary languages defined in [19]. As above, $m$ and $\bar m$ are the up and down letters in our alphabet.
From this grammar, we can apply the recursive approach of [24] for the uniform generation of decomposable objects, implemented in the combstruct package of Maple or in the stand-alone software GenRGenS [35].The generation of positive walks of size n begins with the precomputation of O(n) large numbers.These numbers count words of length r, for all r ≤ n, in each of the languages involved in the grammar.The fastest way to get them is to convert the algebraic system (17) into a system of linear differential equations, which, in turn, yields a system of linear recurrence relations (with polynomial coefficients) defining the requested numbers.This step requires a linear number of arithmetic operations.But one has to multiply numbers whose size (number of digits) is O(n), which may result, in practice, in a quadratic time-complexity for the precomputation stage.Then, the generation of a random positive walk can be performed in time O(n log n).
Note that a careful implementation [15] of the floating-point approach of [16] using an arbitrary-precision floating-point computation library yields an $O(n^{1+\varepsilon})$ complexity after an $O(n^{1+\varepsilon})$ precomputation.

Anticipated rejection
The principle of this approach is to start with an empty walk, and then add successive up and down steps by flipping an unbiased coin until the walk reaches either the desired length n or a non-positive ordinate. In the latter case, the walk is rejected and the procedure starts again from the beginning. Of course, neither precomputation nor non-linear storage is required. This principle was applied to meanders, in the case a = b = 1, in [6], as a first step towards the uniform random generation of directed animals. The analysis of this algorithm yielded a linear time-complexity, later generalised in [7] to the case of coloured walks, in which up, down, and level steps come respectively in p, q and r different colours. There, it was shown that the time-complexity is linear when p ≥ q, but exponential when p < q.
Unsurprisingly, we obtain similar results for the general (a, b)-case.

Proof. We first note that the language $\mathcal P$ of positive walks is a left-factor language. That is, it is closed under taking prefixes, and every word of $\mathcal P$ is a proper prefix of another word of $\mathcal P$. It has been proved in [14] that the average complexity $f_L(n)$ of the anticipated rejection scheme for a left-factor language L on a k-letter alphabet is where L(z) is the length generating function of the words of L.
We now exploit the results of [5], giving the singular behaviour of the series M(z) and E(z) that count respectively meanders and excursions. As a meander factors uniquely as an excursion followed by a positive walk, we can derive from [5] the singular behaviour of the series $P(z) = \sum_n p_n z^n$ that counts positive walks. This series is always algebraic, so that singularity analysis applies.
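Anticipated rejection is short enough to write out in full. The sketch below is our own (names are hypothetical); it also counts coin flips, the cost measure that Proposition 5.1 bounds. Each positive walk of length n is produced with the same probability $2^{-n}$ per attempt, so the output is uniform.

```python
import random


def anticipated_rejection(n, a, b, rng):
    """Uniform positive (a, b)-walk of length n by anticipated rejection.

    Steps are drawn with a fair coin; the walk is abandoned as soon as a step
    ends at a non-positive level. Returns the walk and the total number of
    coin flips spent over all attempts."""
    flips = 0
    while True:
        steps, h = [], 0
        for _ in range(n):
            s = a if rng.random() < 0.5 else -b
            flips += 1
            h += s
            if h <= 0:
                break            # reject early and restart from scratch
            steps.append(s)
        else:                    # loop completed: all n steps stayed positive
            return steps, flips


rng = random.Random(7)
for a, b in [(2, 1), (1, 1), (1, 2)]:
    walk, flips = anticipated_rejection(30, a, b, rng)
    print(a, b, flips / 30)      # cost per step of this single run
```

Running this repeatedly for growing n shows the flip count per step staying bounded when a ≥ b and growing quickly when a < b, as Proposition 5.1 predicts.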

Boltzmann sampling
A Boltzmann generator [20] generates every object in the class $\mathcal C$ with a probability proportional to $x^n$, where n is the size of the object. More precisely, for every object w (a walk, in our context):
$$\mathbb P_x(w) = \frac{x^{|w|}}{C(x)},$$
where C(x) is the generating function of the objects of $\mathcal C$. Of course, this results in a relaxation of the size constraint, since objects of all sizes can be generated. But, by tuning carefully the parameter x (which has to be smaller than or equal to the radius of convergence of C(x)), and rejecting objects that are too large or too small, one can often achieve approximate-size random sampling, with a tolerance ε, in linear time. This means that after a linear number of real-arithmetic operations, and a number of attempts that is constant on average, the algorithm produces an object of size $|w| \in [(1 - \varepsilon)n, (1 + \varepsilon)n]$, which is uniform among the objects of the same size.
In particular, the grammar (17) shows that the class of positive walks is specifiable in the sense of [20]. The analysis of the generating functions of meanders and excursions performed in [5] shows that the series P(z) counting positive walks is always analytic in a Δ-domain, with a dominant singular behaviour of the form $(1 - \mu z)^{-\nu}$, where ν = 1 if a > b, ν = 1/2 if a = b and ν = −1/2 if a < b. In the first two cases, Theorem 6.3 of [20] gives approximate-size sampling in linear time (and exact-size sampling in quadratic time). In the third case, the standard deviation of the size of the objects produced by a plain Boltzmann sampler is much larger than its mean, which makes rejection costly. However, we can instead generate pointed positive walks, that is, positive walks with a distinguished step, and forget the pointing: as guaranteed by Theorem 6.5 of [20], this again gives approximate-size sampling in linear time.
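For a = b = 1 a Boltzmann sampler can be written by hand without the general grammar (17). Decomposing a positive walk at the last visit of each level gives $\mathcal P = \mathrm{SEQ}(m\,\mathcal D)$, where $\mathcal D = \varepsilon + m\,\mathcal D\,\bar m\,\mathcal D$ is the usual first-return decomposition of excursions; hence $P(x) = 1/(1 - xD(x))$ with $D(x) = (1 - \sqrt{1-4x^2})/(2x^2)$. The sketch below is our own illustration of the Boltzmann rules of [20] on this special case only (geometric law for a sequence, Bernoulli choice for a union). The choice $x = (1 - 1/(2n))/2$ is a heuristic assumption that puts the expected size near n, based on $P(x) \propto (1-2x)^{-1/2}$ near x = 1/2. Samples that grow past (1 + ε)n are abandoned early, which does not bias the result since they would be rejected anyway.

```python
import math
import random


def dyck_gf(x):
    """D(x) for excursions with steps +1/-1: the root of D = 1 + x^2 D^2."""
    return (1.0 - math.sqrt(1.0 - 4.0 * x * x)) / (2.0 * x * x)


def boltzmann_positive(x, rng, max_len):
    """Free Boltzmann sampler for P = SEQ(m D), a = b = 1, at parameter x < 1/2.
    Returns a list of +1/-1 steps, or None once the size exceeds max_len."""
    D = dyck_gf(x)
    out = []
    while rng.random() < x * D:          # SEQ: one more block (m D) w.p. x*D
        out.append(+1)                   # the step m
        stack = ["D"]                    # explicit stack instead of recursion
        while stack:
            item = stack.pop()
            if item == "D":
                # D = eps (w.p. 1/D) or m D mbar D (w.p. x^2 D)
                if rng.random() >= 1.0 / D:
                    stack.extend(["D", -1, "D", +1])   # popped as +1, D, -1, D
            else:
                out.append(item)
                if len(out) > max_len:
                    return None          # early abort: too large anyway
        if len(out) > max_len:
            return None
    return out


def approx_size_positive(n, eps, rng):
    """Positive walk with size in [(1-eps)n, (1+eps)n], uniform given its size."""
    x = 0.5 * (1.0 - 1.0 / (2.0 * n))   # heuristic tuning: mean size close to n
    lo, hi = (1.0 - eps) * n, int((1.0 + eps) * n)
    while True:
        w = boltzmann_positive(x, rng, hi)
        if w is not None and len(w) >= lo:
            return w


print(len(approx_size_positive(500, 0.1, random.Random(42))))
```

Every walk of a given size has probability $x^{|w|}/P(x)$, so conditioning on the size window yields uniform samples of each size; only the size itself is approximate.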
To conclude, the uniform random generation of (a, b)-positive walks of size n can be performed in linear time when a ≥ b by anticipated rejection, and this strategy requires neither precomputation nor storage. When a < b, our best algorithm for exact sampling remains the recursive approach based on the grammar (17). It runs in time $O(n^{1+\varepsilon})$ after an $O(n^{1+\varepsilon})$ precomputation. However, one can achieve approximate-size sampling in linear time and space using a Boltzmann generator.

Recursive step-by-step approach
This elementary procedure, introduced in [30], generates culminating walks step by step, choosing every new step with the right probability. This is again an instance of Wilf's recursive method. The arguments given in Section 5.1 for positive walks should now be replaced by the following ones:
• For $\mathcal W = \mathcal C^{a,b}$, the number of extensions of length n of a prefix $w \in \mathcal W_p$ depends only on three parameters:
  - the length difference $i = n - |w|$,
  - the final height $j = \varphi_{a,b}(w)$,
  - the maximal height h reached by w.
• Let $c_{i,j,h}$ be the number of extensions of length n of such a prefix w. The numbers $c_{i,j,h}$ obey the following recurrence:
As the parameters i, j and h are bounded by n, an and an respectively, the precomputation of the numbers $c_{i,j,h}$ takes $O(n^3)$ arithmetic operations and requires storing $O(n^3)$ numbers.
Then, the generation of a random word of length n can be performed in linear time. But again, the numbers $c_{i,j,h}$ are exponential in n, so that the actual time-space complexity of the precomputation stage may grow to $O(n^4)$. The above procedure is easily adapted to generate culminating walks ending at a prescribed height k. The number $c^{(k)}_{i,j}$ of i-step extensions of a prefix ending at height j, with 0 ≤ j < k, is given by
$$c^{(k)}_{0,j} = \mathbb 1_{j=k}, \qquad c^{(k)}_{i,j} = \mathbb 1_{j+a \le k}\; c^{(k)}_{i-1,\,j+a} + \mathbb 1_{j-b>0}\; c^{(k)}_{i-1,\,j-b} \quad (i \ge 1),$$
with $c^{(k)}_{i,k} = 0$ for i ≥ 1, since no point other than the last one may reach height k. Now j is bounded by k, so that we only have to compute a table of O(kn) numbers, in O(kn) arithmetic operations. The actual time-space complexity is likely to grow to $O(kn^2)$ due to the handling of large numbers.
However, whether the height of the walk is fixed or not, one should be able to limit the computational overhead due to the size of these numbers to O(n ε ), using a floating-point technique adapted from [16].
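A sketch of the culminating-walk sampler, under the strict reading of the definition implied by the mirror argument of Section 4: every point after the origin is positive, and the final point is strictly higher than all earlier points. To avoid any ambiguity at the last step, our table (with hypothetical names) is indexed by the number i of remaining steps, the current height j, and the highest point h reached so far, including the current one; the final step is accepted only if it lands strictly above h. Memoisation plays the role of the $O(n^3)$ table and exact integers that of the big-number arithmetic; recursion depth limits the sketch to n of a few hundred.

```python
import random
from functools import lru_cache


def make_culminating_sampler(a, b):
    """Return (count, sample) for culminating (a, b)-walks."""

    @lru_cache(maxsize=None)
    def count(i, j, h):
        # Number of i-step extensions (i >= 1) of a prefix at height j whose
        # highest point so far is h, such that the whole walk culminates.
        return weight(i, j, h, a) + weight(i, j, h, -b)

    def weight(i, j, h, s):
        # Extensions that start with the step s.
        jj = j + s
        if jj <= 0:
            return 0                     # positivity violated
        if i == 1:
            return 1 if jj > h else 0    # last point must be a strict record
        return count(i - 1, jj, max(h, jj))

    def sample(n, rng):
        steps, j, h = [], 0, 0
        for i in range(n, 0, -1):        # i = number of steps still to draw
            up = weight(i, j, h, a)
            s = a if rng.randrange(count(i, j, h)) < up else -b
            steps.append(s)
            j += s
            h = max(h, j)
        return steps

    return count, sample


count, sample = make_culminating_sampler(1, 1)
print(count(5, 0, 0))                    # 2: the words m m m m m and m m mbar m m
print(sample(30, random.Random(2)))
```

The all-up word is always culminating, so `count(n, 0, 0)` is positive for every n ≥ 1 and `randrange` never receives 0.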

Rejection methods
We presented in Section 5.3 an example of the anticipated rejection approach. The more general rejection principle has been applied successfully to various problems [17,6,20]. The principle of a rejection algorithm for words in $\mathcal W$ is to draw objects uniformly in a superset $\mathcal V \supset \mathcal W$ until an object of $\mathcal W$ is found. The average-case complexity of such a technique is then $\zeta(n)\, v_n/w_n$, where ζ(n) is the cost of the generation of a word of size n in $\mathcal V$, and $w_n$ and $v_n$ respectively denote the number of words of length n in $\mathcal W$ and $\mathcal V$.
The aim is to find a superset $\mathcal V$ satisfying the following (sometimes conflicting) requirements:
- the words of $\mathcal V$ can be generated quickly, so that ζ(n) is small,
- the set $\mathcal V$ is not too large, so that the ratio $v_n/w_n$ is small.
Moreover, testing whether a word of $\mathcal V$ actually belongs to $\mathcal W$ should be doable in linear time. This is obviously the case when $\mathcal W = \mathcal C^{a,b}$.
We investigate below two possibilities for the superset $\mathcal V$, while fixing $\mathcal W = \mathcal C^{a,b}$.

6.2.1. Drawing from positive walks. Here, we take for $\mathcal V$ the set of positive walks. Their random generation has been discussed in Section 5, and we refer to the last lines of that section for our conclusions on this question.
- When a < b, the number $v_n$ of positive walks of length n grows like $\alpha_{a,b}^n\, n^{-3/2}$ (up to a multiplicative constant). If $c_n^{a,b}$ grows like $\alpha_{a,b}^n\, n^{-3-\gamma}$ for some γ ≥ 0 (see Proposition 4.3), the ratio $v_n/c_n$ is of order $n^{\gamma+3/2}$, and the cost will be $O(n^{\gamma+5/2+\varepsilon})$, with a preprocessing stage of $O(n^{1+\varepsilon})$. However, approximate-size sampling can be performed in time $O(n^{\gamma+5/2})$, with no preprocessing stage: it suffices to reject among the positive walks generated by a Boltzmann algorithm.
- If a = b, then $v_n$ grows like $2^n n^{-1/2}$, while $c_n \sim 2^n/n$ (Proposition 4.1). Hence the cost here is $O(n^{3/2})$.
- Finally, for a > b, the number of culminating walks grows like $2^n$ (Proposition 4.2), as does the number of positive walks. This shows that the algorithm is linear.

Remark. For a > b, culminating walks are so numerous that we can even perform the rejection in the set of general (a, b)-walks, and still obtain a linear complexity, as discussed in the introduction. However, it seems natural to perform an anticipated rejection, rejecting walks as soon as they stop being positive: but this amounts to performing rejection in the set of positive walks, themselves obtained via an anticipated rejection from general walks.

6.2.2. Drawing from hybrid walks. We begin with a simple, yet crucial, observation: let $\overleftarrow{w}$ denote the mirror image of the word w. If $w \in \mathcal C^{a,b}$, then so is $\overleftarrow{w}$. Graphically, taking the mirror image amounts to a central symmetry on walks. This remark implies that, on average, the mid-point of a culminating walk lies at a height which is half the final height. This suggests another possible superset of $\mathcal C^{a,b}$ from which we may draw, namely the language $\mathcal H^{a,b}$ of hybrid walks, defined by
$$\mathcal H^{a,b} = \bigcup_{n \ge 0} \mathcal P_{\lfloor n/2 \rfloor}\, \overleftarrow{\mathcal P}_{\lceil n/2 \rceil},$$
where $\mathcal P$ is the language of positive walks, $\overleftarrow{\mathcal P}$ the language of mirror images of positive walks, and the index gives the length of the words. As already observed in Section 4, $\mathcal C^{a,b} \subset \mathcal H^{a,b}$.
The intuition behind the choice of the superset H a,b is that a path that violates the positivity (resp.final record) condition is likely to do so at its beginning (resp.ending).Thus, ensuring positivity on the first half of the walk, and the final record condition on the second half, should yield a lower rejection probability than ensuring positivity everywhere, as we did when drawing from positive walks.
How can one generate hybrid walks uniformly at random? As a hybrid walk of length n is the (non-ambiguous) concatenation of a positive walk of size ⌊n/2⌋ and of the mirror image of another positive walk, of size ⌈n/2⌉, it suffices to draw positive walks uniformly at random. The cost of the generation of a hybrid walk of length n is twice the cost of the generation of a positive walk of length (approximately) n/2. We refer again to the end of Section 5 for our conclusions on this cost. Below, we do not use Boltzmann sampling for positive walks, since gluing two positive walks of approximate size n/2 does not give the same probability to all hybrid walks of a given size. Let us now discuss the efficiency of the rejection approach based on the language $\mathcal H$.
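Both rejection strategies of this section can be sketched together. In the code below (our own illustration; all names are hypothetical) the superset $\mathcal V$ is either the set of positive walks or the set of hybrid walks. Positive walks are drawn by anticipated rejection, the linear-time choice when a ≥ b (for a < b another sampler from Section 5 would be substituted), hybrid walks are obtained by gluing a positive walk of length ⌊n/2⌋ to the reversal of an independent one of length ⌈n/2⌉, and membership in $\mathcal C^{a,b}$ is tested by a single linear scan. Since each draw is uniform on $\mathcal V$, the accepted walk is uniform on culminating walks.

```python
import random


def positive_walk(n, a, b, rng):
    """Uniform positive walk of length n, drawn by anticipated rejection."""
    while True:
        steps, h = [], 0
        for _ in range(n):
            s = a if rng.random() < 0.5 else -b
            h += s
            if h <= 0:
                break                    # abandon this attempt
            steps.append(s)
        else:
            return steps


def hybrid_walk(n, a, b, rng):
    """Uniform hybrid walk: positive walk of length floor(n/2), followed by the
    mirror image (reversal) of an independent positive walk of length ceil(n/2)."""
    v = positive_walk(n // 2, a, b, rng)
    w = positive_walk(n - n // 2, a, b, rng)
    return v + w[::-1]


def is_culminating(steps):
    """Linear membership test (n >= 1): every point after the origin is
    positive and the last one is strictly higher than all earlier points."""
    h, highest = 0, 0                    # current height, max of earlier points
    for s in steps:
        highest = max(highest, h)
        h += s
        if h <= 0:
            return False
    return h > highest


def rejection(draw, n, a, b, rng):
    """Draw uniformly from a superset V until the word culminates;
    returns the walk and the number of attempts."""
    attempts = 0
    while True:
        attempts += 1
        u = draw(n, a, b, rng)
        if is_culminating(u):
            return u, attempts


rng = random.Random(0)
for draw in (positive_walk, hybrid_walk):
    runs = [rejection(draw, 40, 1, 1, rng)[1] for _ in range(100)]
    print(draw.__name__, sum(runs) / len(runs))   # mean number of attempts
```

Averaging the number of attempts for a = b = 1 gives an empirical view of the comparison made in the analysis that follows, which predicts a gain of order √n for the hybrid superset.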
- When a < b, we have $|\mathcal H_n| = \Theta(\alpha_{a,b}^n/n^3)$, while $m_n = \Theta(\alpha_{a,b}^n/n^{3/2})$, so that we gain a factor of order $n^{3/2}$ in complexity (compared with rejection from positive walks). This leads to a cost $O(n^{\gamma+1+\varepsilon})$ if $c_n$ scales like $\alpha_{a,b}^n\, n^{-3-\gamma}$, with an $O(n^{1+\varepsilon})$ precomputation.
- When a = b = 1, $|\mathcal H_n| = \Theta(2^n/n)$, while $m_n = \Theta(2^n/\sqrt n)$, so that the gain is of order $\sqrt n$. Consequently, the complexity of the rejection algorithm based on $\mathcal H$ is linear. Neither precomputation nor storage is required.
- For a > b, we have $|\mathcal H_n| = \Theta(2^n)$, and similarly $m_n = \Theta(2^n)$. So the complexity gain (compared with the approach that generates positive walks) can only be Θ(1). The algorithm is still linear.

Conclusion and perspectives
We have studied culminating paths, from the point of view of formal languages, enumerative combinatorics and random generation. Our best results in terms of random generation are summarized in Table 2.
An important question that is left open is to determine the asymptotic growth of the number of culminating walks when the drift is negative (a < b). One possible approach would be to exploit the closed form expression of Proposition 3.3, in the spirit of Proposition 4.1 and [5]. The result might have interesting consequences regarding the random generation of culminating walks. In particular, if $c_n^{a,b} = \Theta\big((m^{a,b}_{n/2})^2\, n^{-\gamma}\big) = \Theta(\alpha_{a,b}^n\, n^{-3-\gamma})$, with γ < 2, the generation algorithm based on hybrid walks would be faster than the recursive algorithm, at least for generating few paths. However, our numerical data suggest that the ratio $c_n^{a,b}/(m^{a,b}_{n/2})^2$ decreases at least as fast as $n^{-2}$.
It would also be interesting to study how the height is distributed on random culminating walks of length n. Such a study may provide better algorithms for random generation, especially in the a < b case, where the height is expected to be small. How does the average height scale with n? Is there a limiting distribution for some normalized height? This is related to a more ambitious question: is there a limiting process for culminating walks, in the same way discrete excursions converge to the Brownian excursion [29], or discrete meanders to the Brownian meander [28]? In the case a = b = 1, a candidate for the limit process could be the meander conditioned (with care) to reach its maximum at time 1. Note that the joint law of the maximum and final position of a meander is known [21], and related to the law of the maximum and minimum of a Brownian bridge, both in the continuous and discrete cases [13]. The case where the maximum coincides with the final position (an event of zero probability in the continuous case) is closely related to our culminating walks.
Future extensions of the present work may also include the study of culminating walks with more than two types of steps, in order to model different kinds of matches and mismatches, and thus capture the whole scoring scheme of the FLASH algorithm. For instance, it is usually considered less drastic to replace a purine base by another purine base (A↔G) rather than by a pyrimidine one in DNA. It is thus natural to penalize different mismatches differently. This could be modelled by introducing down steps of different heights.
Lastly, a natural, biologically relevant perspective would be to address the non-uniform generation of culminating paths. Indeed, the matches and mismatches may not be uniform over a biological sequence, and may be subject to local correlations. This is classically modelled by a Markov chain (further conditioned to yield culminating paths). Our algorithms could in principle be adapted to this more general context, but their analysis would need to be carefully worked out. In particular, the drift of random walks would depend on the chain and differ in general from a − b. We naturally expect the efficiency of our algorithms to depend on the model, culminating walks with positive drift being much easier to generate than those with a negative drift.

Figure 1. A culminating path (for a = 5 and b = 3) and the corresponding word.

Proposition 3.1.
Let a = b = 1 and k ≥ 1. The length generating function of culminating paths of height k is
and $C$ is the column vector $(U_1^{a+b-1}, \ldots, U_{a+b}^{a+b-1})$

Linear recursions. Finally, let us prove that the sequences of polynomials $N_k$ and $D_k$ satisfy a linear recurrence relation with coefficients in $\mathbb Q[t]$, the ring of polynomials in t. Equivalently, we prove that each of the generating functions
$$N(z,t) := \sum_{k \ge a} N_k z^k \quad \text{and} \quad D(z,t) := \sum_{k \ge a} D_k z^k$$
is actually a rational function in z and t. The existence of a linear recursion then easily follows from the general theory of rational series [36, Ch. 4]. Given the expression (5) of $N_k$, what we have to do is to evaluate
$$N'(z; u_1, \ldots, u_{a+b}) := \sum_{k \ge a} s^{(k-1)}_{a-1,a-1}\, z^k,$$
where the symmetric functions involve the a + b indeterminates $u_1, \ldots, u_n$, with n = a + b.

Proposition 4.2.
For a > b, the number $c_n^{a,b}$ of culminating walks of length n satisfies $c_n^{a,b} = \kappa_{a,b}^2\, 2^n + O(\rho^n)$, where ρ < 2 and $\kappa_{a,b}$ is the constant involved in the asymptotics of positive walks.

Proposition 4.3.
For a < b, the number $c_n^{a,b}$ of culminating walks of length n satisfies $\lim_{n \to \infty} (c_n^{a,b})^{1/n} = \alpha_{a,b}$,

Proposition 5.1.
The anticipated rejection scheme applied to the uniform random generation of (a, b)-positive walks has a linear time-complexity when a ≥ b, and an exponential complexity in $\Theta\big((2/\alpha_{a,b})^n\, n\sqrt n\big)$ when a < b, with $\alpha_{a,b} = \dfrac{a+b}{\sqrt[a+b]{a^a b^b}} < 2$.

Table 2.
The complexity of random generation of positive and culminating paths. The cost is that of one random drawing, once the precomputations have been performed. It is assumed that $c_n \sim \alpha_{a,b}^n\, n^{-3-\gamma}$ if a < b.