A random walk on the symmetric group generated by random involutions

The involution walk is the random walk on $S_n$ generated by involutions whose number of $2$-cycles is binomially distributed with parameter $1-p$. This is a parallelization of the transposition walk. For fixed $\frac{1}{2} \leq p \leq 1$ and $n$ sufficiently large, this paper shows that the involution walk mixes in between $\log_{1/p}(n)$ and $\log_{2/(1+p)}(n)$ steps. The paper introduces a new technique for finding eigenvalues of random walks on the symmetric group generated by many conjugacy classes, using the character polynomial for the characters of the representations of the symmetric group. Monotonicity relations used in the bound also give, after sufficient time, the likelihood order: the asymptotic order from most likely to least likely permutation. The walk was introduced to study a conjecture about a random walk on the unitary group from the information theory of black holes.


Introduction
This paper examines how a natural notion of "parallelization" affects the rate of convergence to stationarity for a random walk on the symmetric group. The base walk that this paper parallelizes is the $p$-lazy random transposition walk. Its generators are the identity with probability $p$ and a uniformly random transposition with probability $1-p$. This is equivalent to putting $n$ cards on the table and, with probability $1-p$, swapping a random pair. The transposition walk for $p = \frac{1}{n}$ takes order $\frac{1}{2} n\log(n) + cn$ steps to converge to its uniform stationary distribution [4]. Suppose the walk is parallelized by transposing $s$ disjoint pairs simultaneously. This is like taking $s$ steps of the non-lazy transposition walk, except that it guarantees $2s$ distinct cards are moved. This problem can be explored in several ways. For $n$ even, the maximum number of disjoint transpositions is $n/2$. If these are chosen via a random matching, the result is a uniformly random fixed point free involution. The walk generated by all fixed point free involutions was analyzed by Lulov [9], who showed that it mixes in 3 steps. In this paper, each transposition in a fixed point free involution is discarded with some probability. This is a parallelized $p$-lazy transposition walk. When this probability is fixed and at least $\frac{1}{2}$, the results here show that this walk has mixing time $\Theta(\log(n))$. More specifically, this paper studies the random walk on $S_n$, for $n$ even, generated by first choosing uniformly at random a fixed point free involution, also known as a perfect matching, then discarding or keeping each $2$-cycle it contains independently with probability $p$ or $1-p$, respectively. This means the probability that an involution with $s$ $2$-cycles is selected is $\binom{n/2}{s} p^{n/2-s}(1-p)^s$. By considering general $p$, this gives a family of walks with $pn$ as the expected number of fixed points of a generator.
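The generator described above can be sketched in a few lines of Python. This is only an illustrative sampler, not code from the paper: the function names `random_involution`, `prob_s_two_cycles` and `step` are hypothetical, permutations are stored as lists mapping position to image, and the uniform perfect matching is obtained (as an assumption of this sketch) by pairing consecutive entries of a uniform shuffle.

```python
import random
from math import comb

def random_involution(n, p, rng=random):
    """Sample a generator: a uniform perfect matching on n points in which each
    pair is discarded with probability p and kept with probability 1 - p."""
    if n % 2:
        raise ValueError("n must be even")
    points = list(range(n))
    rng.shuffle(points)                  # consecutive entries form a uniform perfect matching
    perm = list(range(n))                # start from the identity permutation
    for k in range(0, n, 2):
        a, b = points[k], points[k + 1]
        if rng.random() >= p:            # keep this 2-cycle with probability 1 - p
            perm[a], perm[b] = b, a
    return perm

def prob_s_two_cycles(n, s, p):
    """Binomial probability that the sampled involution has exactly s 2-cycles."""
    half = n // 2
    return comb(half, s) * p ** (half - s) * (1 - p) ** s

def step(state, p, rng=random):
    """One step of the walk: compose the current permutation with a fresh generator."""
    g = random_involution(len(state), p, rng)
    return [g[x] for x in state]
```

Repeated calls to `step`, starting from `list(range(n))`, simulate the walk; since the expected number of kept pairs is $(n/2)(1-p)$, a generator has $pn$ fixed points on average.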
Taking $p = 1 - 2/n$ gives an "expected transposition walk" in which, on average, a single transposition is selected; as $p \to 0$, the walk approaches an expected fixed point free involution walk. The author conjectures that mixing occurs with cutoff at $\log_{1/p}(n) + \log_{1/p}(c)$ steps for any $p$ bounded away from $0$. This mixing time would interpolate from an expected transposition walk to an expected $s$ $2$-cycle walk for any $s < n/2$, with mixing times comparable to those of their non-random cousins, in particular the transposition walk mixing with cutoff at $\frac{1}{2} n\log(n) + cn$ steps [4]. For fixed $p \geq 1/2$ and $n$ sufficiently large, this paper establishes a lower bound on mixing of $\log_{1/p}(n)$ in Theorem 20 and an upper bound of $\log_{2/(1+p)}(n)$ in Theorem 17. These are separated by just over a factor of 2.
This upper bound is found through a combination of two methods. Both use the expression of the eigenvalues of the walk in terms of the characters of the symmetric group. The character polynomial gives the characters of $S_n$ as a polynomial in the cycle decomposition of a permutation. The eigenvalues of this walk, as seen in (1), are a linear combination of characters evaluated at the $n/2 + 1$ conjugacy classes of involutions. Since all these involutions have only $1$- and $2$-cycles, understanding the character polynomial in these cycles gives a strong bound on the large eigenvalues of the walk. A recursive formula for the eigenvalues, given in Proposition 3, is used to control the small eigenvalues. This recursion, constructed via the Murnaghan-Nakayama rule, leads to a series of monotonicity conditions on the eigenvalues. These monotonicity conditions require that $p \geq \frac{1}{2}$. A secondary result of these monotonicity conditions, and so also restricted to $p \geq \frac{1}{2}$, is a total order from the most likely to the least likely element after sufficient time, identified in Corollary 9. At each step of a Markov chain, there is a partial ordering from the most likely to the least likely state. A linear extension to a total order is called a likelihood order. Under mild conditions, after sufficient time, this converges to a fixed likelihood order. For the involution walk with $p \geq \frac{1}{2}$, the limiting likelihood order is the cycle lexicographic order as defined in Definition 8. This means that after sufficient time the identity will be the most likely element and an $n$-cycle the least likely element of the walk. This is the same likelihood order as for the $p$-lazy transposition walk with $p \geq \frac{1}{2}$ [1]. Likelihood orders are motivated by total variation distance and separation distance, two metrics used to study Markov chain convergence.
The most common quantification of the convergence to uniform of a random walk on a group $G$ is total variation distance. Let $P^{*t}(g)$ denote the probability of being at $g$ at time $t$ for a random walk on $G$. Let $A \subset G$ denote a subset of $G$, and $P^{*t}(A)$ the total probability of the elements of $A$. Then, with $U$ the uniform distribution on $G$,
$$\|P^{*t} - U\|_{TV} = \max_{A \subset G} \left| P^{*t}(A) - \frac{|A|}{|G|} \right|.$$
The maximum is attained at the set $A$ of elements whose probability exceeds $\frac{1}{|G|}$. Identifying where $\frac{1}{|G|}$ sits inside the likelihood order splits the group elements into this set $A$ and its complement. The principal technique for finding likelihood orders will also give what this set is after sufficient time. Even a partial identification of this maximal set is of use in constructing lower bounds on mixing, in other words, for showing the total variation distance is not yet small. Separation distance measures how much less likely than uniform the least likely element is. Identifying the least likely element means this can be computed directly. Separation distance can also be measured indirectly through strong stopping time arguments.
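For very small $n$, the total variation distance of the involution walk can be computed exactly by brute force, which gives a concrete feel for the definition. The sketch below is illustrative only: the helper names are hypothetical, it enumerates all of $S_n$ (so it is feasible only for $n$ up to about 6), and it uses the standard identity that the maximum over sets $A$ equals half the $\ell_1$ distance to uniform.

```python
from itertools import permutations, product
from math import factorial

def perfect_matchings(points):
    """Yield every perfect matching of the list `points` as a list of pairs."""
    if not points:
        yield []
        return
    a, rest = points[0], points[1:]
    for k, b in enumerate(rest):
        for m in perfect_matchings(rest[:k] + rest[k + 1:]):
            yield [(a, b)] + m

def generator_distribution(n, p):
    """Exact law of one generator: uniform matching, each pair kept w.p. 1 - p."""
    matchings = list(perfect_matchings(list(range(n))))
    dist = {}
    for m in matchings:
        for keep in product([False, True], repeat=len(m)):
            perm, weight = list(range(n)), 1.0 / len(matchings)
            for (a, b), kept in zip(m, keep):
                if kept:
                    perm[a], perm[b] = b, a
                    weight *= 1 - p
                else:
                    weight *= p
            dist[tuple(perm)] = dist.get(tuple(perm), 0.0) + weight
    return dist

def convolve(dist, gen):
    """Law after one more step: compose each state with each generator."""
    out = {}
    for x, px in dist.items():
        for g, pg in gen.items():
            y = tuple(g[i] for i in x)
            out[y] = out.get(y, 0.0) + px * pg
    return out

def total_variation(dist, n):
    """max_A |P(A) - |A|/n!|, computed as half the l1 distance to uniform."""
    u = 1.0 / factorial(n)
    return 0.5 * sum(abs(dist.get(x, 0.0) - u) for x in permutations(range(n)))

def tv_curve(n, p, steps):
    """Total variation distance to uniform after 0, 1, ..., steps steps."""
    gen = generator_distribution(n, p)
    dist = {tuple(range(n)): 1.0}
    curve = [total_variation(dist, n)]
    for _ in range(steps):
        dist = convolve(dist, gen)
        curve.append(total_variation(dist, n))
    return curve
```

For example, `tv_curve(4, 0.5, 12)` returns the distances for the walk on $S_4$ with $p = \frac{1}{2}$, starting at the identity.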
The first work on likelihood orders was done by Diaconis and Graham, see Chapter 3C Exercise 10 of [2], motivated by a statistical problem posed by Tom Ferguson. Likelihood orders were also later studied by Diaconis and Isaacs [3]. They found that the random walk on a cycle or hypercube showed a strong monotonicity condition consistent with the total order given by distance as counted by the generators from the start. Since their likelihood orders hold at all times, simple induction suffices. For more examples of inductive proofs of likelihood orders see Chapter 1 of [1].
The limiting likelihood order is a statement of eigenvalue monotonicity. One element is eventually more likely than another if, in the difference of their probabilities expressed in terms of the eigenvalues using the discrete Fourier inversion formula, the term with largest eigenvalue with non-zero coefficient is positive. Diaconis and Shahshahani [4] in their seminal paper on mixing for the random transposition walk used a formula of Frobenius to establish a monotonicity property for the eigenvalues of the walk. The eigenvalues are labeled by partitions of n, and they observed that a classical partial order on partitions called majorization order is consistent with a monotonic decline in the eigenvalues. Diaconis and Graham showed that no total monotonic order can hold at all times for the transposition walk on the symmetric group. The likelihood order fluctuates within even a small number of steps (for n ≥ 6, the first change occurs after four steps). Lulov, in his thesis [9], connected the ordering on eigenvalues to what he termed an "asymptotic monotonicity property", here called a likelihood order, of the elements of the walk after sufficient time. He showed the transposition walk restricted to even steps after sufficient time followed a cycle lexicographic order.
As $p \to 0$ with $n$ held constant, a parity problem prevents the walk from mixing in $O(\log(n))$ steps, let alone in the smaller $O(\log_{1/p}(n))$ steps. The fixed point free involution walk at even steps is confined to $A_n$ inside of $S_n$. While for any $p > 0$ the involution walk will mix to all of $S_n$, as $p \to 0$ the probability of selecting a fixed point free involution at each step of the walk grows to 1. Since it becomes more and more unlikely as $p \to 0$ that anything other than a fixed point free involution is chosen, at even steps the involution walk is more and more prone to be stuck on even elements inside of $S_n$ and takes longer and longer to approach uniformity over all elements. This is shown in Proposition 21.
A random permutation can be made from at most $n$ transpositions chosen systematically. This systematic scan consists of transposing, in order, the number in each position with itself or with the number in a later position chosen uniformly at random. If instead the transpositions are chosen uniformly at random, as in the transposition walk, a random permutation takes $\frac{1}{2} n\log(n) + cn$ transpositions to build [4]. The largest impediment is a coupon collector problem: by $\frac{1}{2} n\log(n) - cn$ steps, too many of the numbers have still never appeared in a chosen transposition. These never moved numbers are fixed points of the permutation, resulting in insufficiently random permutations. On the other hand, generating a random even permutation from the walk generated by fixed point free involutions takes 3 steps of the walk, or $\frac{3}{2} n$ transpositions [9]. By letting $p$ vary in the involution walk, one can study this transition from the minimum of $O(n)$ transpositions to the $O(n\log(n))$ transpositions needed to build a random permutation. The analysis here holds for fixed $p \geq \frac{1}{2}$ and $n$ sufficiently large. In all such cases it takes $O(n\log(n))$ transpositions to build a random permutation.
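The systematic scan mentioned above is the classical Fisher-Yates shuffle. A minimal sketch follows; the function name `systematic_scan` and the optional `choose` hook (used here only so the choices can be enumerated in a test) are assumptions of this illustration and do not appear in the paper.

```python
import random

def systematic_scan(n, choose=None):
    """Build a permutation of range(n) from n transpositions chosen systematically.

    Position i is swapped with position choose(i), which must lie in {i, ..., n-1};
    by default that position is drawn uniformly at random."""
    if choose is None:
        choose = lambda i: random.randrange(i, n)
    perm = list(range(n))
    for i in range(n):
        j = choose(i)                    # j == i means "transpose with itself"
        perm[i], perm[j] = perm[j], perm[i]
    return perm
```

Each of the $n \cdot (n-1) \cdots 1 = n!$ sequences of choices produces a different permutation, which is why uniform choices give a uniform permutation after exactly $n$ (possibly trivial) transpositions.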
While studying the information theory of black holes, physicists became interested in a random walk on the unitary group. Information in the black hole is expressed in qubits, each an element of $\mathbb{C}^2$. Take as basis vectors $e_1 = (1, 0)$, $e_0 = (0, 1)$. The walk is on $n$ qubits, and so it is on a $2^n$-dimensional space with a basis indexed by $n$-bit binary strings. At each step the walk takes a random $U(4)$ operator and applies it to two random qubits, acting on the binary strings indexing the basis. All $2^{n-2}$ basis vectors with the same 2-bit combination for those two qubits are affected the same way under the walk. This means each $U(4)$ operator acts $2^{n-2}$ times, giving rapid mixing for such a high dimensional space. This walk is known to "scramble" in $n\log(n)$ steps [10]. Recent work of Hayden and Preskill [6] and Sekino and Susskind [10] has developed interest in a version of this walk where $n/2$ commuting steps of the random walk are taken at once. A different $U(4)$ operator is chosen to act on each of the $2$-cycles of a perfect matching of the qubits. This faster walk is conjectured to mix in $O(\log(n))$ steps. The involution walk was designed as a toy model to study the effects of independent random shuffling on the components of a perfect matching.
Section 2 describes the upper bound lemma from discrete Fourier analysis and the eigenvalues of the walk. Section 3 finds the monotonic decay of the eigenvalues needed for the bounds and the likelihood order of the walk. Section 4 finds an upper bound for the mixing time of the binomially distributed involution random walk. Section 5 finds the lower bound of $\log_{1/p}(n)$. Section 6 calculates the separation distance for the involution walk assuming the conjecture that the likelihood order holds at all times.

Background
The upper-bound lemma bounds the total variation distance between the walk on a group and its uniform stationary distribution using the eigenvalues of the walk, expressed in terms of the group's representations [2]. The version below is specialized to conjugacy class walks on the symmetric group, in which every element of a conjugacy class is equally likely. Partitions of $n$ index the irreducible representations of the symmetric group (the trivial one corresponding to $[n]$) as well as the eigenvalues of the walk; the representation for the partition $\lambda$ has dimension $d_\lambda$.
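Proposition 1 (Diaconis-Shahshahani [4]). When $K$ is a class function generating an aperiodic, irreducible walk on $S_n$ with distribution $P^{*t}$ after $t$ steps,
$$4\,\|P^{*t} - U\|_{TV}^2 \leq \sum_{\lambda \neq [n]} d_\lambda^2\, \psi_\lambda^{2t}.$$
The sum below is over conjugacy classes $\kappa$ of size $|\kappa|$, with $K(\kappa)$ the probability of one element of the conjugacy class:
$$\psi_\lambda = \sum_{\kappa} |\kappa|\, K(\kappa)\, \frac{\chi_\lambda(\kappa)}{d_\lambda}.$$
For this walk, the formula for the eigenvalue $\psi_\lambda$ is the sum over conjugacy classes of the probability of it being a generator times its character ratio:
$$\psi_\lambda = \sum_{s=0}^{n/2} \binom{n/2}{s} p^{n/2-s}(1-p)^s\, \frac{\chi_\lambda(1^{n-2s}, 2^s)}{d_\lambda}. \qquad (1)$$
Bounds for these eigenvalues will be established through a combination of monotonicity relations on the eigenvalues and upper bounds on a handful of the eigenvalues using the character polynomial.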
One formula for the character $\chi_\lambda(\alpha)$ of the representation indexed by $\lambda$, evaluated at a conjugacy class $\alpha$, is given by the Murnaghan-Nakayama rule. This rule expresses the character in terms of all the ways to decompose the partition $\lambda$ into borderstrips (also known as rimhooks) of sizes $\alpha_1, \ldots, \alpha_r$ in any fixed order. A borderstrip is a skew-partition, or the difference of two partitions, containing no two-by-two block of boxes. A decomposition can be written as a sequence of partitions $P = (\rho_0, \ldots, \rho_r)$ where $\rho_0 = \lambda$, $\rho_r = \emptyset$ and the difference between sequential partitions, $\rho_i/\rho_{i+1}$ with $\alpha_{i+1}$ boxes, is a borderstrip. The height of a borderstrip is one less than the number of rows it occupies, and the height of $P$, $ht(P)$, is the sum of the heights of the borderstrips in the decomposition. Since the walk is generated by only one- and two-cycles, only borderstrips of size 1 and 2 will be necessary. There is only one borderstrip of size one, corresponding to the partition $[1]$, a single box in a Young diagram. The two borderstrips of size two are $[2]$, with Young diagram two horizontal boxes, and $[1,1]$, with Young diagram two vertical boxes. These three borderstrips have heights zero, zero, and one, respectively.
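Because only borderstrips of size 1 and 2 are needed, the Murnaghan-Nakayama rule for involutions is short to implement. The sketch below is an illustration rather than the paper's own computation: partitions are tuples of weakly decreasing positive integers, all function names are hypothetical, and `eigenvalue` evaluates the eigenvalue as the binomially weighted average of character ratios over the involution classes, following the verbal description of $\psi_\lambda$ above.

```python
from functools import lru_cache
from math import comb

def _clean(rows):
    """Drop empty rows so every partition has one canonical tuple."""
    return tuple(r for r in rows if r > 0)

def remove_box(lam):
    """Yield every partition obtained by removing one corner box (strip [1])."""
    for r in range(len(lam)):
        below = lam[r + 1] if r + 1 < len(lam) else 0
        if lam[r] > below:
            yield _clean(lam[:r] + (lam[r] - 1,) + lam[r + 1:])

def remove_domino(lam):
    """Yield (sign, rho) for every removable borderstrip of size two."""
    for r in range(len(lam)):
        below = lam[r + 1] if r + 1 < len(lam) else 0
        if lam[r] - 2 >= below:                      # horizontal strip [2], height 0
            yield +1, _clean(lam[:r] + (lam[r] - 2,) + lam[r + 1:])
        if r + 1 < len(lam) and lam[r] == lam[r + 1]:
            after = lam[r + 2] if r + 2 < len(lam) else 0
            if lam[r] - 1 >= after:                  # vertical strip [1,1], height 1
                yield -1, _clean(lam[:r] + (lam[r] - 1, lam[r] - 1) + lam[r + 2:])

@lru_cache(maxsize=None)
def character(lam, s):
    """chi_lam at cycle type (1^(n-2s), 2^s): strip the s 2-cycles first."""
    if s > 0:
        return sum(sign * character(rho, s - 1) for sign, rho in remove_domino(lam))
    if not lam:
        return 1
    return sum(character(rho, 0) for rho in remove_box(lam))

def eigenvalue(lam, p):
    """Binomial average of the character ratios chi_lam(1^(n-2s), 2^s) / d_lam."""
    half = sum(lam) // 2
    d = character(lam, 0)                            # dimension of the representation
    return sum(comb(half, s) * p ** (half - s) * (1 - p) ** s * character(lam, s)
               for s in range(half + 1)) / d
```

For instance, `character((2, 2), 2)` evaluates $\chi_{[2,2]}$ at a product of two disjoint transpositions in $S_4$, and `eigenvalue((3, 1), 0.5)` gives the eigenvalue for $[3,1]$ at $p = \frac{1}{2}$.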

Monotonicity of Eigenvalues
The eigenvalues of this walk show intriguing connections to the eigenvalues of the transposition walk. For example, in the transposition walk, the eigenvalues decrease according to majorization order, where $\lambda$ is smaller than $\rho$ if the blocks of $\lambda$ can be moved up and to the right to get $\rho$. Below, this pattern is shown for the eigenvalues of the involution walk when the eigenvalue pairs are restricted to any $\lambda$ and $\rho = [n-i, i]$. This monotonicity relation on eigenvalues is used in the upper bound section to get a bound for $\lambda$ with $\lambda_1 < \frac{n}{2}$. The result also gives that the likelihood order after sufficient time is the cycle lexicographic order, the same order as for the transposition walk.
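This section is based upon the following recursive construction of the eigenvalues, derived from a probabilistic Murnaghan-Nakayama rule. In the Murnaghan-Nakayama rule, the sizes of the borderstrips are the sizes of the cycles in the conjugacy class. The decomposition can be done with any ordering of these sizes. Since the cycle decomposition of the generators of this walk is probabilistic, the ordering of the sizes can be as well. The following formula comes from decompositions in which the first borderstrips in the decomposition are two $1$-cycles with probability $p$ or a $2$-cycle with probability $1-p$. Examining all the configurations of borderstrips that can be removed and their heights amounts to the following recursion, in which $\psi_\rho$ denotes an eigenvalue of the involution walk on $S_{n-2}$, the first sum is over the ways of removing two single boxes from $\lambda$ in succession, and the other two sums are over the $\rho$ obtained by removing a borderstrip of shape $[2]$ or $[1,1]$:
$$d_\lambda \psi_\lambda = p \sum_{\lambda \to \rho} d_\rho \psi_\rho + (1-p) \sum_{\lambda/\rho = [2]} d_\rho \psi_\rho - (1-p) \sum_{\lambda/\rho = [1,1]} d_\rho \psi_\rho.$$
Proof. Recall, the involutions are generated by starting with a perfect matching and removing transpositions with probability $p$. In a generator, the first transposition from the starting perfect matching remains with probability $1-p$ or becomes two fixed points with probability $p$. The single borderstrip of size one is $[1]$, while the borderstrips of size two are $[2]$ and $[1,1]$, with heights 0 and 1 respectively. By the Murnaghan-Nakayama rule, removing the borderstrips for this first pair leaves, for each resulting $\rho$, a character of $S_{n-2}$ evaluated at the rest of the generator, with sign $-1$ exactly when the strip removed is $[1,1]$. Averaging over the remaining $n/2 - 1$ pairs turns each such character into $d_\rho \psi_\rho$.
To show monotonicity conditions on the eigenvalues through the recursive definition, it will be shown that the sum of these three terms is larger for one partition than another. The monotonicity does not follow term by term, only collectively. The following observation will be useful in the arguments that follow.
Proof.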
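Fix $n$ and assume the lemma holds for partitions of $n-2$. For $i \leq \frac{n}{2} - 1$ the eigenvalue decomposes as follows, where the dimension ratios can be evaluated with the hook length formula and any term whose index is not a partition is omitted:
$$\psi_{[n-i,i]} = \frac{d_{[n-i,i-2]}}{d_{[n-i,i]}}\,\psi_{[n-i,i-2]} + 2p\,\frac{d_{[n-i-1,i-1]}}{d_{[n-i,i]}}\,\psi_{[n-i-1,i-1]} + \frac{d_{[n-i-2,i]}}{d_{[n-i,i]}}\,\psi_{[n-i-2,i]}.$$
The sizes of these three eigenvalues for the involution walk on $S_{n-2}$ and their coefficients will be compared to those appearing for the partition $[n-i-1, i+1]$.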
Case 1: For $i \leq n/2 - 2$, for the eigenvalue for the partition $[n-i-1, i+1]$ this becomes,
Two of the eigenvalues of $S_{n-2}$ that appear in $[n-i,i]$ are larger by induction than those in $[n-i-1,i+1]$, which means by induction that it is a larger eigenvalue. This means it is enough to check the coefficients of
The left side is decreasing with $i$, while all terms on the right increase with $i$, so it is enough to consider $i = n/2 - 2$ and $p = \frac{1}{2}$, in which case it is true that,
In the terms above, the $[n/2+1, n/2-3]$ and $[n/2, n/2-2]$ in $\psi_{[n/2+1,n/2-1]}$ overcome the analogous term $[n/2, n/2-2]$ in $\psi_{[n/2,n/2]}$ for $p \geq \frac{1}{2}$ since,
This shows that the sum of the coefficients of the first two terms in the decomposition of $[n/2+1, n/2-1]$ is larger than the first coefficient in the decomposition of $[n/2, n/2]$. This was done with the two terms with longer first rows than the one term, $[n/2, n/2-2]$. It remains to show that the sum of all three coefficients in $[n/2+1, n/2-1]$ is greater than the sum of both coefficients in $[n/2, n/2]$. This once again follows by Proposition 4.
Proof. The proof will follow by induction on $n$. The base case for $n = 2$ is trivial, as $[2]$ and $[1,1]$ are the only partitions with their first rows. Suppose it holds for all $i$ for $n-2$. If $i < \frac{n}{2}$, no vertical removals per Murnaghan-Nakayama are possible from the first row and,
There are three cases of lengths of first row this can generate: $n-i$, $n-i-1$ and $n-i-2$. Call the coefficients in the decomposition of $[n-i,i]$ of these terms $a_{n-i}$, $a_{n-i-1}$ and $a_{n-i-2}$. Let $\lambda = [n-i, \ldots]$ be another partition. Let $b_{n-i}$ be the sum of the coefficients of any $[n-i, \ldots]$ in its decomposition, and similarly define $b_{n-i-1}$ and $b_{n-i-2}$. By Lemma 5, the corresponding eigenvalues increase as the first row increases. It needs to be shown that $a_{n-i} \geq b_{n-i}$, that $a_{n-i} + a_{n-i-1} \geq b_{n-i} + b_{n-i-1}$, and finally that $a_{n-i} + a_{n-i-1} + a_{n-i-2} \geq b_{n-i} + b_{n-i-1} + b_{n-i-2}$. This last inequality holds by Proposition 4. The strategy to show the others is to first show that $a_{n-i} \geq b_{n-i}$; in other words, that a removal entirely from below the first row is more likely in $[n-i,i]$ than in $\lambda$. The second inequality is shown indirectly, by finding that $a_{n-i-2} \leq b_{n-i-2}$; that is, a $2$-cycle removal from the first row, which gives the shortest first row, is more likely in $\lambda$ than the same removal in $[n-i,i]$. Then $a_{n-i} + a_{n-i-1} \geq b_{n-i} + b_{n-i-1}$ follows by subtracting this from the inequality for the full sums.
Consider removing two blocks from below the first row. This affects at most two hook lengths from the first row. The smallest such hook lengths it is possible to affect occur in $[n-i,i]$, causing the largest increase to the ratio of old versus new contributions of the first row. Let $h'_{1,j}$ denote the new hook lengths of the first row. Let $\rho$ be a partition of $i-2$ obtained by removing the first row of $\lambda$ and two additional squares. Then,
When $p = 1$, the sum over all such $\rho$ of . This gives that the coefficient of $[n-i, i-2]$ is larger than the sum of the coefficients for all two block removals below the first row of $\lambda$.
For both $\lambda$ and $[n-i,i]$ there is exactly one way to remove 2 blocks from the first row. It must be shown that
with $\lambda'_j = n-2$ and $\lambda_j$ decreasing. This gives an optimization problem with bounded region and linear constraints. The maximal solution without the bounded region is to have all $h_{1,j}$ be equal. Given the constraints, the optimal solution is to take the $h_{1,j}$ as close to equal as possible, giving $[n-i,i]$. This product is maximized at $[n-i,i]$, which in turn minimizes the ratio in (2). Therefore, the coefficient of $[n-i-2, i]$ is smaller than that of a two block removal from the first row of any other $\lambda$ with $\lambda_1 = n-i$.
Proof. The difference in likelihood of two permutations $\alpha$ and $\beta$ can be studied through the discrete Fourier transform. For the involution walk at two permutations $\alpha$ and $\beta$,
$$P^{*t}(\alpha) - P^{*t}(\beta) = \frac{1}{n!} \sum_{\lambda} d_\lambda\, \psi_\lambda^t \left( \chi_\lambda(\alpha) - \chi_\lambda(\beta) \right).$$
The trivial representation has eigenvalue and coefficient one in the discrete Fourier decomposition for both $\alpha$ and $\beta$ and so vanishes. Other partitions for which $\chi_\lambda(\alpha) = \chi_\lambda(\beta)$ will also not contribute to this quantity. After sufficient time, the terms for the partitions with largest eigenvalue in magnitude among those with $\chi_\lambda(\alpha) \neq \chi_\lambda(\beta)$ will be exponentially larger than any other terms and hence will determine the sign of $P^{*t}(\alpha) - P^{*t}(\beta)$. In lazy walks the largest eigenvalue in magnitude almost always occurs for a single partition.
From [1], a partition is called an $i$-cycle detector if $\lambda_2 + \lambda'_1 - 2 \geq i$ and $\lambda_1 + \lambda'_2 - 2 \geq i$. If $\lambda$ is not an $i$-cycle detector and the smallest cycle differing in the cycle decomposition of $\alpha$ and $\beta$ is an $i$-cycle, then $\chi_\lambda(\alpha) - \chi_\lambda(\beta) = 0$ [1]. Therefore, one must only examine the $i$-cycle detecting partitions for each value of $i$ from 1 to $n/2$ in order to find the eventual likelihood order. By Lemma 5, Lemma 6, and Theorem 7, the partition $[n-i,i]$ has the largest magnitude of eigenvalue of all $i$-cycle detecting partitions. Moreover, when $\alpha$ and $\beta$ first differ at an $i$-cycle, $\chi_{[n-i,i]}(\alpha) - \chi_{[n-i,i]}(\beta)$ equals the difference in their numbers of $i$-cycles, and in this case $\psi_{[n-i,i]} > 0$, so the permutation with more $i$-cycles is more likely after sufficient time. This is the cycle lexicographic order from Definition 8.
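The detector condition quoted from [1] is easy to test mechanically. The following sketch is only an illustration of that definition: the function names are hypothetical, partitions are tuples of weakly decreasing positive integers, and parts beyond the end of a partition are treated as zero.

```python
def conjugate(lam):
    """Column lengths of the Young diagram of lam (the conjugate partition)."""
    width = lam[0] if lam else 0
    return tuple(sum(1 for row in lam if row >= k) for k in range(1, width + 1))

def part(mu, k):
    """The k-th part of mu (1-indexed), taken to be 0 beyond the last part."""
    return mu[k - 1] if k <= len(mu) else 0

def is_cycle_detector(lam, i):
    """Check lam_2 + lam'_1 - 2 >= i and lam_1 + lam'_2 - 2 >= i."""
    conj = conjugate(lam)
    return (part(lam, 2) + part(conj, 1) - 2 >= i
            and part(lam, 1) + part(conj, 2) - 2 >= i)
```

For example, with $n = 10$ the two-row partitions $[10-i, i]$ pass the test for every $i$ from 1 to 5, while the trivial partition $[10]$ detects no cycle length.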

Upper Bound on Mixing
This section works towards bounds on $\psi_\lambda$ to use in the upper bound formula,
$$4\,\|P^{*t} - U\|_{TV}^2 \leq \sum_{\lambda \neq [n]} d_\lambda^2\, \psi_\lambda^{2t}.$$
Recall that,
$$\psi_\lambda = \sum_{s=0}^{n/2} \binom{n/2}{s} p^{n/2-s}(1-p)^s\, \frac{\chi_\lambda(1^{n-2s}, 2^s)}{d_\lambda}.$$
Instead of bounding $\chi_\lambda(1^{n-2s}, 2^s)$ for each $s$ individually, the character polynomial will give an expression for the character as a polynomial in $n-2s$ and $s$. The character polynomial $q_\rho(x_1, \ldots, x_k)$ for the partition $\rho$ of $k$ is a polynomial in variables $x_1, \ldots, x_k$ so that $\chi_{[n-k,\rho_1,\ldots,\rho_r]}(1^{x_1}, \ldots, k^{x_k}, \ldots, n^{x_n}) = q_\rho(x_1, \ldots, x_k)$ for any conjugacy class $(1^{x_1}, \ldots, n^{x_n})$ of $S_n$. Garsia and Goupil [5] give a formula for the character polynomial akin to the Murnaghan-Nakayama rule run backwards from its traditional order, peeling off border strips of the largest cycles first.
Where $P$ ranges over all possible ways of removing border strips of size $i$ from $\rho$ so that a Young diagram remains at each step, as in Murnaghan-Nakayama. The formula says: choose $j$ $i$-cycles of the $x_i$ $i$-cycles and attempt to peel them off from below the first row of $\lambda$, and take the remaining $x_i - j$ $i$-cycles from the first row of $\lambda$. Recurse on the remaining shape with the next largest cycle size. In Murnaghan-Nakayama, the first row does not receive this special treatment.
Letting $i = 2$ gives the character polynomial for an involution as:
Where the last term can be expanded as
Then an upper bound on $q_{\rho_j}$ that is more computationally tractable comes from ignoring the sign associated with the insertions, rounding $\frac{n-2s-|\rho_j|-k-i+1}{n-2s-|\rho_j|-k-i+(\rho_j)'_i+1}$ to 1, and upper bounding the ways of inserting one- and two-cycles by the dimension of $\rho$, giving:
Then using this in $\psi_\lambda$ and splitting $s$ into $j_1$ and $j_2$ gives:
This says that to approximate $\psi$, take the expectation over the binomial distribution over all ways to choose $j_1$ and $j_2$ of the $n/2$ $2$-cycles to insert into the first row and the remaining partition, and to split the remaining unused numbers into either the first row or the remaining partition. The $d_{\lambda/\lambda_1}$ factor takes into account that there may be many ways to arrange things in the lower part of the partition. When $\lambda_1 \geq n/2$, the maximum value of where this is a very good approximation.
Proof. Using the hook length formula [11],
The $\lambda'_k$ are decreasing, and the product is maximized if these are taken to be as even as possible. So for $1 \leq k \leq i$, $\lambda'_k = 1$, and for $k > i$, $\lambda'_k = 0$. This is the partition $[n-i, i]$.
The bound used above on the character polynomial, principally that $\sum_{P = \{\rho_0, \ldots, \rho_j\}} (-1)^{P} \leq d_\rho$, was sufficiently strong for the partitions with first row at least $n/2$, but not for those with smaller first row. However, by Proposition 7, the eigenvalues for $\lambda$ with $\lambda_1 < n/2$ are bounded by the eigenvalue for $[n/2, n/2]$.
The next step is to handle the sum (4). Instead of counting separately how the $2$-cycles $(1, 2), \ldots, (n-1, n)$ and the unchosen cycles used as fixed points are arranged, an easier approach exists. Consider instead splitting the numbers $1, 2, \ldots, n$ into two parts. When $2i-1$ and $2i$ are in the same part, this could have happened using them as a single $2$-cycle, or separately as fixed points, for a total weight under the binomial distribution of $\frac{1-p}{p} + 1 = \frac{1}{p}$. And when $2i-1$ and $2i$ are not in the same part, this could only have happened from $1$-cycle insertion, but in two different ways, for a weight of 2.
Note that $j$ must be such that $i-j$ and $n-i-j$ are both even. So,
Now to approximate the sum, one can use that it is less than $i/2$ times its largest term, except for $i$ small, where the largest term is the last and the other terms will be exponentially smaller. Note that nothing is assumed about $p$ in this bound.
Proposition 13. When
Proof. One version of Stirling's formula is that
Applying the lower bound to $j!\left(\frac{n-i-j}{2}\right)!\left(\frac{i-j}{2}\right)!$ gives:
Separate this into two pieces
Consider the maximal $j$ for the first piece with the $(2p)^j$ added.
Solving for the value of $j$, $j'$, that gives $\frac{4j^2}{4p^2(n-i-j)(i-j)} = 1$ gives:
Where $j'$ was chosen exactly to make the expression
Attaching this back to the long neglected (6) gives:
The is minimized over cases where it is non-zero when $j = i-1$, where it is still at least $\frac{n-2i+1}{n-i+1}$. The expression is 0 when one of $j$ or $i-j$ is 0. This problem occurs because in these cases, the use of Stirling's approximation that gave a 0 term was not needed. The bound will be adjusted to be non-zero and hold in all cases.
When j = 0, leaving out the j! term during the application of Stirling's formula drops a √ 2j so the square root of the fraction becomes The other case is i = n/2. Taking the reciprocal of (8), correcting (n−i−j)(i−j)j (n−i)i to 2 n , and adding a i+1 2 the number of terms in the sum, gives for i = n/2: This gives for λ with λ 1 = n − i > n/2: And for λ 1 ≤ n/2, ψ λ ≤ n/2 n/4 n n/2 (n/2 + 1) 2 /2 n/2 e 2 2 3/2 1 − 1 2 The next proposition brings things together into one expression: Proposition 14. For λ with λ 1 = n − i, i < n/2: For i < n 2 , α ≤ 1 can be seen to be decreasing with i (by differentiation with respect to i). Therefore it can be bounded from below by 2 log 2 1+p . So, Proof. This follows from the j = i term being larger than the j = i − 2 term by a factor of 2 under the condition on i, and the terms with j smaller continue to fall off even faster.

Lower Bound on Mixing
The representation slowest to vanish for this walk is $[n-1, 1]$, so its character gives a random variable where $P^{*t}(\cdot)$ and $\pi(\cdot)$ differ significantly. Using a lower bound formula similar to Chebyshev's inequality after calculating the first and second moments of this character will give a lower bound on mixing of $\log_{1/p}(n)$.
Proposition 18. [7] For $\gamma$, $\nu$ two probability distributions on $\Omega$, and $f$ a real valued function on $\Omega$, if
In this case, $\nu = U$ is the stationary distribution of the walk, uniform over all permutations. As seen in [2], $E_U(\chi_{[n-1,1]}) = 0$ and $Var_U(\chi_{[n-1,1]}) = 1$. These follow for any non-trivial character by basic tenets of representation theory. For the first, by orthogonality of characters, $\sum_{g \in G} \chi_\lambda(g) = 0$. For the second, $\sum_{g \in G} \chi_\lambda(g)^2 = |G|$.
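The scale $\log_{1/p}(n)$ can be seen from the first moment alone. The sketch below is an illustration, not the paper's argument in full: it uses the standard facts that $\chi_{[n-1,1]}(g)$ is the number of fixed points of $g$ minus one and that $d_{[n-1,1]} = n-1$, together with the expected $pn$ fixed points of a generator, to get the closed form $\psi_{[n-1,1]} = \frac{pn-1}{n-1}$; the expected character after $t$ steps is then $d_\lambda \psi_\lambda^t$. The function names are hypothetical.

```python
from math import log

def psi_standard(n, p):
    """Eigenvalue for [n-1, 1]: (expected fixed points of a generator - 1) / (n - 1)."""
    return (p * n - 1) / (n - 1)

def expected_character(n, p, t):
    """Expected value of chi_[n-1,1] after t steps, d * psi^t with d = n - 1."""
    return (n - 1) * psi_standard(n, p) ** t

def lower_bound_scale(n, p):
    """log base 1/p of n, the time scale at which the expectation falls to order 1."""
    return log(n) / log(1 / p)
```

Under the uniform distribution this character has mean 0 and variance 1, so the walk is still far from uniform while `expected_character` is large. With $n = 1024$ and $p = \frac{1}{2}$, for example, the expectation is still above 10 after 5 steps but below $0.1$ after 15, on either side of $\log_2(1024) = 10$.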
Proof. For an irreducible representation $\lambda$, since $P$ is a class function, by Schur's Lemma the Fourier transform of $P$ is a constant $\psi_\lambda$ times the identity matrix:
$$\widehat{P}(\lambda) = \psi_\lambda I_{d_\lambda}.$$
Moreover, $\widehat{P^{*t}}(\lambda) = \widehat{P}(\lambda)^t$. This leads to the following formula for the expected value of a character over the walk:
$$E_{P^{*t}}(\chi_\lambda) = \sum_{g} P^{*t}(g)\, \mathrm{tr}(\lambda(g)) = \mathrm{tr}\Big(\sum_{g} P^{*t}(g)\, \lambda(g)\Big) = \mathrm{tr}\, \widehat{P^{*t}}(\lambda) = d_\lambda \psi_\lambda^t.$$
The method of choice to compute the expectation for $\chi_{[n-1,1]}$ will be to directly compute $\psi_\lambda$. Recall,