The expected number of inversions after n adjacent transpositions

We give a new expression for the expected number of inversions in the product of n random adjacent transpositions in the symmetric group S_{m+1}. We then derive from this expression the asymptotic behaviour of this number when n scales with m in various ways. Our starting point is an equivalence, due to Eriksson et al., with a problem of weighted walks confined to a triangular area of the plane.

Markov chains on finite groups have attracted a lot of interest for at least the past 30 years, at the interface of algebra and probability theory [1,11,14,15,26,27]. Such chains are also studied in computational biology, in connection with models of gene mutations [3,16,19,20,29]. A central question is to estimate the mixing time of the chain for large values of m. (The mixing time is the number of steps the chain takes to approach its equilibrium distribution.) Recall that the chain (π^{(n)})_{n≥0} considered here starts at the identity of S_{m+1} and, at each step, multiplies the current permutation by an adjacent transposition s_i = (i, i+1) chosen uniformly among s_0, ..., s_{m-1}. This chain is periodic of period 2, since π^{(2n)} always lies in the alternating group; thus it has no equilibrium distribution. However, an elementary modification (which consists in choosing π^{(n+1)} = π^{(n)} with probability 1/(m+1), and otherwise multiplying by a transposition s_i chosen uniformly) makes it aperiodic, and the equilibrium distribution is then uniform over S_{m+1}. The mixing time is known to be Θ(m^3 log m) (see Aldous [1], Diaconis and Saloff-Coste [14] and Wilson [30]).
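For readers who wish to experiment, the chain and its lazy (aperiodic) variant are easy to simulate. The sketch below is our own illustration, not code from the paper; it assumes the convention just described (start at the identity, multiply by a uniform adjacent transposition, and, in the lazy variant, stay put with probability 1/(m+1)).

```python
import random

def inversions(pi):
    """Number of pairs i < j with pi[i] > pi[j]."""
    return sum(pi[i] > pi[j]
               for i in range(len(pi)) for j in range(i + 1, len(pi)))

def step(pi, lazy=False):
    """One step of the adjacent-transposition chain on S_{m+1}.
    With lazy=True, stay put with probability 1/(m+1) (aperiodic variant)."""
    m = len(pi) - 1
    if lazy and random.random() < 1.0 / (m + 1):
        return pi
    i = random.randrange(m)            # uniform choice of s_i = (i, i+1)
    pi = list(pi)
    pi[i], pi[i + 1] = pi[i + 1], pi[i]
    return pi

def run(m, n, lazy=False):
    """Sample pi^(n), starting from the identity of S_{m+1}."""
    pi = list(range(m + 1))
    for _ in range(n):
        pi = step(pi, lazy)
    return pi
```

Note that, in the non-lazy chain, each step flips the parity of the inversion number, which is the period-2 phenomenon discussed above.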
More recently, Berestycki and Durrett [4] studied a continuous-time version of this chain, denoted (X^m_t)_{t≥0}, where multiplications by adjacent transpositions occur according to a Poisson process of rate 1. The connection with the chain π^{(n)} described above is straightforward. The authors focussed on the inversion number D^m_t of X^m_t, as a function of m and t. They established the convergence in probability of this random variable, suitably normalized, when t ≡ t_m scales with m in various ways. The limit is usually described in probabilistic terms, except in one simple regime, m ≪ t ≪ m^3, where D^m_t/√(mt) is shown to converge to 2/π. Of course, the interesting regimes are those occurring before stationarity. This paper actually stems from a remark made in Berestycki and Durrett's paper. They quote a paper by Eriksen [18], which gives a rather formidable expression for I_{m,n}, the expected number of inversions after n adjacent transpositions. The authors underline that "it is far from obvious how to extract useful asymptotics from this formula". This observation motivated our interest in the expectation I_{m,n}. As the problem is of an algebraic nature, it should have much structure: would it be possible to find an alternative expression of I_{m,n}, with a neater structure than (1), and to derive asymptotic results from this expression?
This is precisely what we do in this paper. Our alternative formula for I m,n reads as follows.
Theorem 1. The expected number of inversions after n adjacent transpositions in S_{m+1} is given by the formula below; equivalently, the generating function I_m(t) = Σ_{n≥0} I_{m,n} t^n admits the corresponding rational form. This result is proved in Section 2, and asymptotic results are derived in Sections 3 to 5 for three main regimes: linear (n_m = Θ(m)) and before, cubic (n_m = Θ(m^3)) and beyond, and intermediate (m ≪ n_m ≪ m^3). For the moment, let us give a few comments and variants on Theorem 1.
Limit behaviour. The chain π^{(n)} has period 2, as π^{(n)} is an even (resp. odd) permutation if n is even (resp. odd). But the sub-chains π^{(2n)} and π^{(2n+1)} are aperiodic on their respective state spaces, the alternating group A_{m+1} (the group of even permutations) for π^{(2n)} and its complement S_{m+1} \ A_{m+1} for π^{(2n+1)}. Moreover, each of these chains is irreducible and symmetric, and thus admits the uniform distribution as its equilibrium distribution. For m ≥ 3, the average number of inversions of an element of A_{m+1} is m(m+1)/4, and the same is true of elements of S_{m+1} \ A_{m+1}. Thus when m ≥ 3 is fixed and n → ∞, we expect I_{m,n} to tend to m(m+1)/4. This can be seen from the above expression of I_{m,n}, upon observing that, for m ≥ 3 and 0 ≤ j, k ≤ m with j + k ≠ m, there holds x_{jk} ∈ (−1, 1). (The condition j + k = m is equivalent to c_j + c_k = 0.) If m ≥ 8, the stronger property x_{jk} ∈ (0, 1) holds, which shows that I_{m,n} is an increasing function of n.
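This limit can be checked exactly for small m by evolving the full distribution of π^{(n)} on S_{m+1}. The sketch below is our own code (the function name is ours); for m = 3 it confirms that I_{3,n} approaches m(m+1)/4 = 3 as n grows.

```python
from itertools import permutations

def inversions(p):
    return sum(p[i] > p[j]
               for i in range(len(p)) for j in range(i + 1, len(p)))

def exact_expected_inversions(m, n):
    """I_{m,n}: evolve the exact distribution of pi^(n) on S_{m+1},
    where each step multiplies by a uniform adjacent transposition."""
    perms = list(permutations(range(m + 1)))
    index = {p: k for k, p in enumerate(perms)}
    dist = [0.0] * len(perms)
    dist[index[tuple(range(m + 1))]] = 1.0   # start at the identity
    for _ in range(n):
        new = [0.0] * len(perms)
        for k, p in enumerate(perms):
            if dist[k] == 0.0:
                continue
            for i in range(m):               # multiply by s_i, uniformly
                q = list(p)
                q[i], q[i + 1] = q[i + 1], q[i]
                new[index[tuple(q)]] += dist[k] / m
        dist = new
    return sum(dist[k] * inversions(p) for k, p in enumerate(perms))
```

For instance, I_{3,0} = 0 and I_{3,1} = 1 (one step always creates exactly one inversion), and I_{3,n} is very close to 3 already for moderate n.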
Eigenvalues of the transition matrix. Another consequence of the above theorem is that we have identified a quadratic number of eigenvalues of the transition matrix of the chain (see [18] for a detailed account of the connection between these eigenvalues and I_{m,n}).
Corollary 2. The transition matrix of the adjacent transposition Markov chain on S_{m+1} admits, among others, the following eigenvalues: This result is not really new: as pointed out to us by David Wilson, one can derive from the proof of Lemma 9 in [30] that for 0 ≤ p, q ≤ m, with p ≠ q, is also an eigenvalue. This collection is larger than the one we have exhibited. The transition matrix has still more eigenvalues, for instance −1, with eigenvector (ε(σ))_{σ∈S_{m+1}}, where ε denotes the signature. The paper [17], originally motivated by a coding problem, gives a linear number of eigenvalues of the form i/m, where i is an integer. There exists a description of all eigenvalues in terms of the characters of S_{m+1} (see [15, Thm. 3]). This description is valid in a much more general framework, and it seems that the complete list of eigenvalues is not explicitly known.
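The eigenvalue −1 with the signature eigenvector can be verified mechanically for small m. The check below is a sketch of ours, under the stated convention that one step multiplies by a uniformly chosen adjacent transposition: since each s_i flips the signature, averaging ε over the m neighbours of σ gives exactly −ε(σ).

```python
from itertools import permutations

def sign(p):
    """Signature of a permutation, via the parity of its inversion number."""
    inv = sum(p[i] > p[j]
              for i in range(len(p)) for j in range(i + 1, len(p)))
    return -1 if inv % 2 else 1

def check_signature_eigenvector(m):
    """Verify (M eps)(sigma) = -eps(sigma) for every sigma in S_{m+1},
    where M is the transition matrix of the adjacent-transposition chain."""
    for p in permutations(range(m + 1)):
        acc = 0.0
        for i in range(m):                     # the m neighbours sigma * s_i
            q = list(p)
            q[i], q[i + 1] = q[i + 1], q[i]
            acc += sign(tuple(q)) / m
        if abs(acc + sign(p)) > 1e-12:
            return False
    return True
```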
Rationality of I_m(t). That the series I_m(t) is always rational should not be a surprise: this property is clear when considering the transition matrix of the chain. Here are the first few values:

A related aperiodic chain. If we consider instead the aperiodic variant of the chain (obtained by choosing π̃^{(n+1)} = π̃^{(n)} with probability 1/(m+1), and otherwise multiplying by a transposition s_i chosen uniformly), it is easy to see that the expected number of inversions after n steps is now Ĩ_{m,n} = Σ_{k=0}^{n} C(n,k) m^k/(m+1)^n I_{m,k}, so that the associated generating function is also rational. More generally, if a transposition occurs with probability p, the generating function of the expected number of inversions admits a similar rational expression.
(Above, p = m/(m+1), but in [30] for instance, p = 1/2.)

Alternative expressions. Theorem 1 gives a partial fraction expansion of I_m(t). One of the advantages of (3) is that the coefficients involved in the sum over j and k are negative, which makes the list of poles of I_m(t) clear (we have actually already used this to state Corollary 2).
A number of variations are possible, and we will use some of them in the asymptotic study of the numbers I_{m,n}. For instance: Both formulas are proved in Section 2.

The expected number of inversions
We prove in this section Theorem 1 and its variants (4) and (5). Our starting point is a functional equation satisfied by a series related to I_m(t). We then solve this equation using the bivariate kernel method, which has proved successful in the past few years in various enumerative problems related to lattice paths, permutations, and other combinatorial objects [7,8,9,10,23,25,31]. The equation under consideration here can be interpreted in terms of weighted lattice paths confined to a triangular portion of the square lattice. Another problem of walks in a triangle, also related to the adjacent transposition Markov chain, was studied by Wilson [30] via a diagonalization of the adjacency matrix. The enumeration of (unweighted) walks in a triangle was performed by Flajolet [21] using the reflection principle, in connection with a problem of storage raised by Knuth.

A functional equation
For π ∈ S_{m+1}, let us write π = π_0 π_1 ⋯ π_m if π(i) = π_i for all i. For n ≥ 0 and 0 ≤ i ≤ j < m, let p^{(n)}_{i,j} denote the associated probability; the expected number of inversions in π^{(n)} is then expressed in terms of these numbers by (6). Examining how the numbers p^{(·)}_{i,j} may change at the nth step gives a recurrence relation for these numbers, first obtained by Eriksson et al. [20]. As shown by the lemma below, it converts our problem into the study of weighted walks confined to a triangular region of the square lattice. Consider the subgraph G_m of the square lattice Z × Z induced by the points (i, j), with 0 ≤ i ≤ j < m (Figure 1). We use the notation (i, j) ↔ (k, ℓ) to mean that the points (i, j) and (k, ℓ) are adjacent in this graph.

Lemma 3. The numbers p^{(n)}_{i,j} are characterized by the following recursion: As often, it is convenient to handle the numbers p^{(n)}_{i,j} via their generating function: Multiplying the above recursion by t^{n+1}, and then summing over n, gives the following functional equation for P(u, v).
where ū = 1/u, v̄ = 1/v, and the series P_ℓ, P_t and P_d describe the numbers p^{(n)}_{i,j} on the three borders (left, top, and diagonal) of the graph G_m. In view of (6), the generating function we are interested in may, according to the functional equation (7), be rewritten in terms of the series P_d. In the next subsection, we solve (7), at least to the point where we obtain a closed form expression of P_d(1), and hence a closed form expression of I_m(t), as announced in Theorem 1.
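Though the recursion and Equation (6) are only outlined above, the principle behind (6) — the expected inversion number is a sum of pairwise probabilities, by linearity of expectation — is easy to test numerically. The sketch below is our own illustration; the probabilities q_{a,b} it estimates are a hypothetical stand-in for the precise encoding p^{(n)}_{i,j} of Eriksson et al., which is not reproduced here.

```python
import random

def pairwise_inversion_estimate(m, n, trials, seed=0):
    """Monte Carlo estimate of q_{a,b} = P(pi^(n)(a) > pi^(n)(b)) for a < b.
    By linearity of expectation, the sum of these probabilities over all
    pairs a < b equals I_{m,n}."""
    rng = random.Random(seed)
    q = [[0.0] * (m + 1) for _ in range(m + 1)]
    for _ in range(trials):
        pi = list(range(m + 1))
        for _ in range(n):
            i = rng.randrange(m)                 # one chain step
            pi[i], pi[i + 1] = pi[i + 1], pi[i]
        for a in range(m + 1):
            for b in range(a + 1, m + 1):
                if pi[a] > pi[b]:
                    q[a][b] += 1.0 / trials
    estimate = sum(q[a][b]
                   for a in range(m + 1) for b in range(a + 1, m + 1))
    return q, estimate
```

For n = 1 the estimate equals 1 exactly, whatever the random choices, since one step always creates exactly one inversion.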

Solution of the functional equation
We first establish a symmetry property of the series P(u, v); in particular, the "diagonal" generating function P_d(u) satisfies a corresponding symmetry.

Proof. This can be derived from the functional equation satisfied by P(u, v), but we prefer to give a combinatorial (or probabilistic) argument. Let τ be the permutation of S_{m+1} that sends k to m − k for all k. Note that τ is an involution. Let Φ denote conjugation by τ: that is, Φ(σ) = τστ. Of course, Φ(id) = id and Φ(σσ′) = Φ(σ)Φ(σ′). Also, Φ has a simple description in terms of the diagram of σ, in which σ(i) is plotted against i: the diagram of Φ(σ) is obtained by applying a rotation of 180 degrees to the diagram of σ. In particular, with s_i = (i, i+1), one has Φ(s_i) = s_{m−1−i} for 0 ≤ i < m. These properties imply that the sequence of random permutations (Φ(π^{(0)}), Φ(π^{(1)}), Φ(π^{(2)}), ...) follows the same law as the original Markov chain (π^{(0)}, π^{(1)}, π^{(2)}, ...). In particular, for 0 ≤ i ≤ j < m, the corresponding identity on the numbers p^{(n)}_{i,j} holds. This is equivalent to the first statement of the lemma. The second follows by specialization.
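The algebraic properties of Φ used in this proof can be checked mechanically. The sketch below (our own names and conventions, with permutations written as lists) verifies that Φ is an involution and that Φ(s_i) = s_{m−1−i}.

```python
def tau(m):
    """The order-reversing involution of S_{m+1}: k -> m - k."""
    return [m - k for k in range(m + 1)]

def compose(p, q):
    """(p o q)(k) = p[q[k]]."""
    return [p[q[k]] for k in range(len(q))]

def phi(sigma):
    """Phi(sigma) = tau sigma tau, conjugation by tau."""
    m = len(sigma) - 1
    t = tau(m)
    return compose(t, compose(sigma, t))

def s(i, m):
    """Adjacent transposition s_i = (i, i+1) in S_{m+1}."""
    p = list(range(m + 1))
    p[i], p[i + 1] = p[i + 1], p[i]
    return p
```

One can check, for instance, that phi(s(i, m)) equals s(m - 1 - i, m) for every i, as claimed.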
The next ingredient in our solution is the "obstinate" kernel method of [7,8,9,10,23,25,31]. The kernel of the functional equation (7) is the coefficient of P(u, v), namely K(u, v). Let (U, V) be a pair of Laurent series in t that cancel the kernel: setting (u, v) = (U, V) in (7) cancels the left-hand side, and thus the right-hand side. That is, denoting as usual Ū = 1/U and V̄ = 1/V, a first equation holds. Let us now exploit the symmetries of the kernel: obviously, K(u, v) is invariant under the transformations u → ū and v → v̄. Hence the pairs (Ū, V), (Ū, V̄) and (U, V̄) also cancel K, and three further equations follow. Let us form the alternating sum of the four previous equations: all occurrences of P_ℓ and P_t vanish, and, using (10), one obtains, after multiplying by a suitable factor, Equation (11), as soon as K(U, V) = 0, U ≠ 1 and V ≠ 1.
This equation involves only one unknown series, namely P_d. So far, the series U and V are coupled by a single condition, K(U, V) = 0. Let q be a complex root of q^{m+1} = −1, and let us add a second constraint on the pair (U, V) by requiring that U = qV. That is, V must be a root of Equation (12), where q̄ = 1/q and V̄ = 1/V. We further assume q ≠ −1 (otherwise K(qv, v) is independent of v). Then V ≠ 1, U = qV ≠ 1, and the first term of (11) vanishes. We thus obtain an explicit expression of P_d(q), which we write below in terms of V and its conjugate root V′ = q̄V̄.
Lemma 6. Let q ≠ −1 satisfy q^{m+1} = −1, and let V ≡ V(t) and V′ ≡ V′(t) be the two roots of (12). Then V V′ = q̄ = 1/q, and P_d(q) can be expressed in terms of V and V′. The series V and V′ are algebraic (of degree 2) over C(t). But their symmetric functions are rational, and thus P_d(q) is rational too, as expected. The following lemma (Lemma 7) gives an explicit rational expression of P_d(q), together with an equivalent form.

Proof. We will establish a closed form expression for the coefficient of t^n in (1 − t)P_d(q), which is clearly equivalent to the first expression of P_d(q) given above: for n ≥ 1, this coefficient a_n is given by (13). In order to obtain this expression, we begin by applying Cauchy's formula to the expression of (1 − t)P_d(q) given in Lemma 6. Let V be the root of (12) that vanishes at t = 0. (The other root V′ = q̄V̄ has a term in O(1/t) in its expansion.) Then a_n is given by a contour integral taken along a small circle around the origin, in counterclockwise direction. Using (12), the integral expression of a_n reads a_n = (1/(2iπ)) ∮ S(v) dv/v, where the integral is taken along a small circle around the origin, in counterclockwise direction, and S satisfies S(qv) = −S(v). Note that the only possible poles of S(v)/v are 0 and the (2m+2)th roots of unity (because q^{m+1} = −1). Thus a_n is simply the residue of S(v)/v at v = 0. Performing the change of variables v = q̄w̄, with w̄ = 1/w, gives an integral now taken along a large circle around the origin, in clockwise direction. This integral thus collects (up to a sign) all residues of S(v)/v. The residue formula thus gives a_n as a sum of residues, which we now evaluate. Take v = e^{iα} with α = ℓπ/(m+1) and 0 ≤ ℓ < 2m+2, and recall that q = e^{iθ} with θ = (2k+1)π/(m+1). Then 1 + q̄ = 2e^{−iθ/2} cos(θ/2), and similar identities hold for the other factors. Putting these identities together, one obtains P(v) = 4i q̄^m (sin(θ/2) cos(θ/2) sin^2(α + θ/2) + sin^2(θ/2) cos(α + θ/2)), or, with the notation of the lemma, the stated form. Returning to (14) now gives an expression which is equivalent to (13), upon noting that c_j = c_{−j−1}. The first expression of P_d(q) given in the lemma follows.
For the second expression, we simply perform a partial fraction expansion in the variable t, and use Σ_{j=0}^{m} c_j = 0 (this identity follows for instance from c_{m−j} = −c_j).
Recall that P_d(u) is a polynomial in u of degree m − 1. The above lemma gives its values at m, or even m + 1, distinct points. Thus P_d(u) is completely determined by these values, and we can recover it by interpolation. We use a version of Lagrange's interpolation that is well suited to symmetric polynomials.

Lemma 8. Let P(u) be a polynomial of degree m − 1 with coefficients in some field containing C. Assume P(u) is symmetric, that is, P(u) = u^{m−1} P(ū), with ū = 1/u. Let ℓ = ⌊(m−1)/2⌋, and let q_0, ..., q_ℓ be distinct elements of C such that q_j q_k ≠ 1 for all j and k. Then (16) holds, where χ_{m,0} = 1 if m is even, and 0 otherwise. When q_k = e^{iθ_k} with θ_k = (2k+1)π/(m+1), this can be rewritten in trigonometric form. In particular, with the notation (2), a third form holds.

Proof. For the first part, it suffices to observe that the expression given on the right-hand side of (16) has degree m − 1, and takes the same values as P(u) at the 2ℓ + 2 distinct points q_0, ..., q_ℓ, 1/q_0, ..., 1/q_ℓ, and also at −1 if m is even. This gives a total of m + 1 correct values, which is more than enough to determine a polynomial of degree m − 1.
For the second part, we observe that q_0, ..., q_ℓ, 1/q_0, ..., 1/q_ℓ, together with −1 if m is even, are the m + 1 roots of u^{m+1} + 1. Hence a factorization holds which, in the limit u → q_k, gives the needed evaluation. The second result follows. The third one is obtained by setting u = 1, and noticing that sin θ_k = 2c_k s_k, while 1 − cos θ_k = 2s_k^2.
We can finally combine this interpolation formula with Lemma 7 to obtain an explicit expression of P_d(u). As we are mostly interested in P_d(1) (see (9)), we give only this value.

Proposition 9. Let ℓ = ⌊(m−1)/2⌋, and adopt the notation (2). Then the series P_d(u) defined by (8) satisfies the two expressions below.

Proof. We apply Lemma 8 to the second expression of P_d(q_k) obtained in Lemma 7. This gives the first expression of P_d(1) above, provided a companion identity holds. The latter identity is obtained by applying Lemma 8 to P(u) = 1 + u + ⋯ + u^{m−1} = (1 − u^m)/(1 − u). We now seek a formula symmetric in j and k. Using the latter identity, this gives the second expression of P_d(1).
Proof of Theorem 1. Let us now return to the generating function I_m(t) whose coefficients give the expected number of inversions. It is related to P_d(1) by (9). Theorem 1 is obtained by combining the second expression of Proposition 9, a partial fraction expansion in t (based on (15)), and finally an identity for the sum over 0 ≤ j, k ≤ m. To prove this identity, we rewrite it suitably and complete the proof thanks to (17).
To obtain the expression (4) of I_m(t), we transform the sum over 0 ≤ j, k ≤ m as above; the latter identity follows from replacing j by m − j and k by m − k.
Let us now extract the coefficient of t^n in (4). This yields the expression (5) of I_{m,n}.

Small times: linear and before
When m is fixed and n → ∞, the asymptotic behaviour of the numbers I_{m,n} is easily derived from Theorem 1, as sketched just after the statement of this theorem. For m ≥ 3, the speed of convergence to m(m+1)/4 is governed by x_{00} = 1 − (4/m) sin^2(π/(2m+2)). In this section and the next two, we consider the case where n ≡ n_m depends on m, and m → ∞. As in [4], three main regimes appear: linear (n_m = Θ(m)), cubic (n_m = Θ(m^3)) and intermediate (m ≪ n_m ≪ m^3). This can be partly explained using the following simple bounds.

Lemma 10. In particular, if n ≡ n_m and m → ∞, corresponding asymptotic estimates hold.

Proof. These inequalities follow from the expression of I_{m,n} given in Theorem 1. The upper bound is obtained by retaining only, in the sum over j and k, the term obtained for j = k = 0. The lower bound follows from |x_{jk}| ≤ x_{00} and (18).
Observe that x_{00} = 1 − O(1/m^3). The lower bound on I_{m,n} already shows that if n ≫ m^3, then x_{00}^n = o(1) and I_{m,n} ∼ m(m+1)/4. Also, if n ∼ κm^3 for some κ > 0, then x_{00}^n ∼ α for some α ∈ (0, 1): then I_{m,n} is still quadratic in m, but the upper bound shows that the ratio I_{m,n}/m^2 will be less than 1/4. The other regimes correspond to cases where n ≪ m^3.
This section is devoted to the linear (and sub-linear) regime. We first state our results, and then comment on their meaning. Assume n ≡ n_m = Θ(m). That is, κ_1 m ≤ n_m ≤ κ_2 m for two positive constants κ_1 and κ_2. Then the stated estimate holds.

Comment. When n_m is sub-linear, the inversion number equals its largest possible value, n, with high probability. For instance, I_{1000,10} ≃ 9.9. When n_m grows linearly with m, the expected inversion number is still linear in n, but with a ratio f(κ), where κ = n/m. This ratio decreases from f(0) = 1 to f(∞) = 0 as κ increases from 0 to ∞. The fact that f(0) = 1 is consistent with the sub-linear result. Note that for a related continuous-time chain, with inversion number D^m_t, it has been proved that when t = κm, the random variable D^m_t/t converges in probability to a function of κ described in probabilistic terms [4].
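The value I_{1000,10} ≃ 9.9 quoted above is easy to reproduce by simulation. The sketch below is our own Monte Carlo code; it tracks the inversion number incrementally, using the fact that multiplying by s_i changes it by exactly +1 or −1.

```python
import random

def estimate_I(m, n, trials, seed=1):
    """Monte Carlo estimate of I_{m,n}. The inversion number is updated
    incrementally: a step at position i adds +1 if pi[i] < pi[i+1]
    (a new inversion is created), and -1 otherwise."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        pi = list(range(m + 1))
        inv = 0
        for _ in range(n):
            i = rng.randrange(m)
            inv += 1 if pi[i] < pi[i + 1] else -1
            pi[i], pi[i + 1] = pi[i + 1], pi[i]
        total += inv
    return total / trials
```

With m = 1000, n = 10 and a few thousand trials, the estimate is close to 9.9, in line with the value quoted in the comment above.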
Proof. The starting point of both results is the following expression of I_{m,n}, which corresponds to (5).

• Assume n_m = o(m). Then the relevant expansion holds uniformly in j and k. Thus the bound follows (we have bounded |c_j + c_k| by 2), and the result follows using two identities for sums over 0 ≤ j, k ≤ m. To prove these two identities, one may start from the following "basic" identity, which follows for instance from (19).
• Assume now n_m = Θ(m) and denote κ = n/m. Then the corresponding expansion holds, and by (21), the absolute value of the second term above is bounded accordingly. Recall that c_j = cos((2j+1)π/(2m+2)). Hence the first term in the expression of I_{m,n} looks very much like a (double) Riemann sum, but one must be careful, as the integral ∫_0^π ∫_0^π (cos x + cos y)/((1 − cos x)(1 − cos y)) (1 − exp(−4κ(1 − cos x cos y))) dx dy diverges (the integrand behaves like 1/x around x = 0, and like 1/y around y = 0). Let us thus split the sum into two parts, S_1 and S_2. The first sum reads as in (22). Both terms in S_1 are now bona fide Riemann sums, each converging to the corresponding integral. Similarly, S_2 is a (double) Riemann sum associated with a converging integral, and is O(1). Thus the announced estimate follows.

Assume n ≡ n_m = Θ(m^3). That is, κ_1 m^3 ≤ n_m ≤ κ_2 m^3 for two positive constants κ_1 and κ_2. Then the stated estimate holds.

Comment. When n ≫ m^3, the inversion number equals, at least at first order, its "limit" value m(m+1)/4, which is the average number of inversions in a permutation of S_{m+1} taken uniformly at random. When n = Θ(m^3), the inversion number is still quadratic in m, but with a ratio g(κ), where κ = n/m^3. This ratio increases from 0 to 1/4 as κ goes from 0 to ∞, which makes this result consistent with the super-cubic regime. Note that for a related continuous-time chain, with inversion number D^m_t, it has been proved that when t = κm^3, the random variable D^m_t/m^2 converges in probability to a function of κ described in probabilistic terms [4].
As said above, I_{m,n} ∼ m^2/4 as soon as n ≫ m^3. Whether the next term in the expansion of I_{m,n} is exact, that is, equal to m/4, depends on how n compares to m^3 log m, as shown by the following proposition. In particular, it clarifies when I_{m,n} = m(m+1)/4 + o(m). Thus if c < 1/π^2, there exists γ > 0 such that the corresponding lower estimate holds, while if c > 1/π^2, there exists γ > 0 such that the corresponding upper estimate holds. For the critical value c = 1/π^2, the following refined estimate holds: if n ≡ n_m ∼ (1/π^2) m^3 log m + αm^3 + o(m^3), then the stated expansion holds.

Proof of Proposition 12. The first result is a direct consequence of Lemma 10, given that x_{00}^n = m^{−cπ^2 + o(1)} when n ∼ c m^3 log m. Assume now that n_m = Θ(m^3). We start from the following expression of I_{m,n}, which corresponds to (4). Let M ≡ M_m be an integer sequence that tends to infinity in such a way that M_m = o(√m). We split the sum over j and k into two parts: j ≤ M, k ≤ M in the first part, j > M or k > M in the second part. Let us prove that the second part can be neglected. By (22), the sum over k is O(m^2). Moreover, the sum over j is a Riemann sum, and the function x → 1/(1 − cos x) is decreasing between 0 and π, so that this sum can be bounded accordingly. This gives the required bound. Let us now focus on small values of j and k. The following estimates hold uniformly in j and k, when 0 ≤ j, k ≤ M: = 128 m^4/((2j+1)^2 (2k+1)^2 π^4) (1 + o(1)), and = exp(−nπ^2 ((2j+1)^2 + (2k+1)^2)/(2m^3)) (1 + o(1)).

The intermediate regime
Assume m ≪ n_m ≪ m^3. Then, denoting n ≡ n_m, the stated asymptotic estimate holds.

Comment. By Propositions 11 and 12, one has the limit values at both ends of the regime. It can be proved that these match, which makes all three regimes consistent. Note that for a related continuous-time chain, with inversion number D^m_t, it has been proved that when m ≪ t ≪ m^3, the random variable D^m_t/√(mt) converges in probability to 2/π [4].
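The √(mn) scaling of the intermediate regime can be observed numerically. The sketch below is our own rough check (parameters chosen so that m ≪ n ≪ m^3); it only verifies the order of magnitude of E[D]/√(mn), since convergence towards the constant 2/π quoted above for the continuous-time analogue is slow at moderate sizes.

```python
import random

def ratio(m, n, trials, seed=2):
    """Monte Carlo estimate of E[inversions]/sqrt(m*n) after n adjacent
    transpositions, with the inversion number tracked incrementally."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        pi = list(range(m + 1))
        inv = 0
        for _ in range(n):
            i = rng.randrange(m)
            inv += 1 if pi[i] < pi[i + 1] else -1
            pi[i], pi[i + 1] = pi[i + 1], pi[i]
        total += inv
    return (total / trials) / (m * n) ** 0.5
```

For m = 200 and n = 20000 (so that m ≪ n ≪ m^3 = 8·10^6), the estimated ratio is of order 2/π ≈ 0.64.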
Proof. The proof mixes arguments that we have already used for small and large times. As in the small-time case, we start from (20). As in the large-time case, we split the sum over j and k into two parts: j ≤ M, k ≤ M in the first part, j > M or k > M in the second part. Here, M = M_m is an integer sequence that satisfies suitable growth conditions. Such a sequence exists under the assumptions we have made on n.

Perspectives
Many interesting Markov chains on groups have been studied, and it is natural to ask to which similar problems the approach used in this note could be adapted. To make this question more precise, let us underline that such problems may involve changing the dynamics of the chain, changing the statistics under consideration, or changing the underlying group.
Changing the statistics. We have focused in this paper on the inversion number. It is an interesting parameter from a probabilistic point of view, as it gives an indication of whether the chain may be mixed at time n or not. However, in certain biological contexts, it may be more sensible to estimate instead the natural "distance" between π^{(0)} and π^{(n)}, defined as the minimum number of chain steps that lead from one to the other [3,4]. For the chain studied here, this coincides with the inversion number, but for other dynamics it will be a different parameter. For instance, if we multiply by any transposition, one step suffices to go from 012 to 210, whereas the inversion number of 210 is 3. Clearly, the approach we have used here relies heavily on the possibility of describing in simple terms the evolution of the parameter under consideration (as we did by the combination of (6) and Lemma 3). Note that an explicit expression for the expected distance after n (not necessarily adjacent) transpositions has been given in [19].
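The example given above (one arbitrary transposition versus three adjacent ones) can be spelled out concretely; the snippet below is our own illustration.

```python
def inversions(p):
    """Number of pairs i < j with p[i] > p[j]."""
    return sum(p[i] > p[j]
               for i in range(len(p)) for j in range(i + 1, len(p)))

def apply_transposition(p, a, b):
    """Swap the entries at positions a and b (an arbitrary transposition)."""
    q = list(p)
    q[a], q[b] = q[b], q[a]
    return q

# One (non-adjacent) transposition sends 012 to 210 ...
assert apply_transposition([0, 1, 2], 0, 2) == [2, 1, 0]
# ... whereas the inversion number of 210, i.e. its distance in
# adjacent transpositions, is 3.
assert inversions([2, 1, 0]) == 3
```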
Changing the group. Although many different chains in many different groups have been considered, we are primarily thinking of classical families of (finite or affine) Coxeter groups, because the inversion number admits in these groups a natural generalization (the length), which usually has a simple description [5, Chap. 8]. According to [18], an explicit formula, due to Troili, is already known for the average length of the product of n generators in the affine group Ã_m. Another solved case is the hypercube Z_2^m [13].