An Alternative Proof for the Expected Number of Distinct Consecutive Patterns in a Random Permutation

Let $\pi_n$ be a uniformly chosen random permutation on $[n]$. Using an analysis of the probability that two overlapping consecutive $k$-permutations are order isomorphic, the authors of a recent paper showed that the expected number of distinct consecutive patterns of all lengths $k\in\{1,2,\ldots,n\}$ in $\pi_n$ is $\frac{n^2}{2}(1-o(1))$ as $n\to\infty$. This exhibited the fact that random permutations pack consecutive patterns near-perfectly. We use entirely different methods, namely the Stein-Chen method of Poisson approximation, to reprove and slightly improve their result.

The authors of [2] proposed and used two auxiliary variables, $Y = Y_n$ and $Z = Z_n$, as follows: $Y = Y_n$ is the number of repeated patterns of any length in $\pi$, and $Y_k = Y_k^n$ the number of repeated patterns of length $k \in \{1, 2, \ldots, n\}$, so that $Y = \sum_{k=1}^{n} Y_k$, with the sum split in [2] at a strategically chosen $k_0$.

Godbole and Swickheimer
Also we let $Z = Z_n = \sum_k Z_k$ be the number of pairs of isomorphic patterns, with $Z_k$ being the number of pairs of isomorphic patterns of length $k$. Generically, let $\eta_1$ and $\eta_2$ be two sets of $k$ consecutive positions, and denote by $\rho_1, \rho_2$ the patterns that are present in these two sets of positions. For example, with $k = 7$ and $n = 15$, $\eta_1$ and $\eta_2$ might equal $\{1, 2, 3, 4, 5, 6, 7\}$ and $\{5, 6, 7, 8, 9, 10, 11\}$, and $\rho_1^7 \simeq \rho_2^7$ if, e.g., the pattern 7263514 occurs in both sets of positions.
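To make order isomorphism concrete, here is a small illustrative script (ours, not from [2]); the helper `pattern` returns the rank sequence realized by a window, and the example checks the overlap consistency forced on the pattern 7263514 above.

```python
def pattern(window):
    """Return the rank sequence of a list of distinct numbers
    (the consecutive pattern it realizes, written 1-based)."""
    ranks = sorted(window)
    return tuple(ranks.index(x) + 1 for x in window)

# The example above: k = 7, eta_1 = {1,...,7} and eta_2 = {5,...,11} overlap
# in r = 3 positions. If 7263514 occupies both windows, the overlap forces
# the last 3 entries and the first 3 entries of 7263514 to be isomorphic.
occ = [7, 2, 6, 3, 5, 1, 4]
assert pattern(occ[-3:]) == pattern(occ[:3]) == (3, 1, 2)  # both are 312
```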
Proof: Counting $Y_k$ is equivalent to listing the patterns of length $k$ and starting a tally for any repeats among the $\min\{n-k+1, k!\}$ patterns that we actually observe. On the other hand, $Z_k$ increases by one for every pair of occurrences of the same pattern, which means that it increases faster than $Y_k$ whenever a pattern occurs more than twice. For example, if the same pattern occurs three times, then $Y_k$ counts the 2 repeats while $Z_k$ counts the $\binom{3}{2} = 3$ pairs of repeats.
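The bookkeeping behind $Y_k$ and $Z_k$ can be sketched as follows (a minimal illustration of ours, not code from [2]):

```python
from itertools import combinations

def pattern(window):
    """Rank sequence realized by a window of distinct numbers."""
    ranks = sorted(window)
    return tuple(ranks.index(x) + 1 for x in window)

def count_Y_Z(perm, k):
    """Y_k counts occurrences of a pattern beyond the first (repeats);
    Z_k counts pairs of windows carrying the same pattern."""
    pats = [pattern(perm[i:i + k]) for i in range(len(perm) - k + 1)]
    Y = sum(1 for i, p in enumerate(pats) if p in pats[:i])
    Z = sum(1 for p, q in combinations(pats, 2) if p == q)
    return Y, Z

# A pattern occurring three times contributes 2 to Y_k but C(3,2) = 3 to
# Z_k: in 1234, the length-2 pattern 12 occupies all three windows.
assert count_Y_Z([1, 2, 3, 4], 2) == (2, 3)
```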
Since $Y_k \le Z_k$ for all $k$, we have $E(Y) \le E(Z)$, leading, after substantial analysis, to the result in [2] that $E(X) = \frac{n^2}{2}(1 - o(1))$ for large enough $n$. In Section 2, we will use the Stein-Chen method of Poisson approximation [3] to (slightly) improve the above bound; this is the content of Theorem 1.2, which holds for sufficiently large $n$. The key difference, besides the use of an entirely different technique, is the fact that we work with the $X$ variable directly, without involving $Y$ and $Z$. The Stein-Chen method has been applied in several combinatorial situations in [3]; see also, e.g., [1] and [7], where examples are given of the use of the technique in the context of permutations. Within the domain of Poisson approximation, moreover, we are using it in this paper when the mean of the underlying Poisson distribution is exceptionally small. This too is unusual.

Remark on the Non-Consecutive Case: In the non-consecutive case, the conjecture is that $E(X) \sim 2^n$ (rather than $E(X) \sim \frac{n^2}{2}$), so that once again the expected value of $X$ would be close to its maximum.
When addressing this conjecture in [6], sole use is made of $X$ in the subadditivity arguments, while the $X, Y, Z$ trifecta is used to get close to proving the conjecture. Finally, the sheer magnitude of the dependencies makes the use of Poisson approximation techniques inappropriate in the non-consecutive case.
The same is true of central limit theorems and martingale inequalities.

2 Poisson Approximation and Consecutive Patterns
Proof of Theorem 1.2: Since a pattern adds to the tally of distinct patterns if and only if it appears at least once, our key variable $X_k$, the number of distinct patterns of length $k$, can be written as
$$X_k = \sum_{j=1}^{k!} I(\text{the $j$th pattern $N_j$ of length $k$ appears at least once}),$$
where $I(A) = 1$ iff $A$ occurs ($I(A) = 0$ otherwise). The notation supposes that we have listed the patterns of length $k$ in some fashion, perhaps lexicographically, and label the $j$th pattern as $N_j$. Thus, the expected number of distinct patterns of length $k$ is
$$E(X_k) = \sum_{j=1}^{k!} P(U_{k,j} \ge 1), \qquad (3)$$
where $U_{k,j}$ is the number of occurrences of the $j$th pattern. Our analysis will actually bypass the question of which the $j$th pattern is, and the strategy will be to show that for any $j$,
$$\mathcal{L}(U_{k,j}) \approx \mathrm{Po}(\lambda), \qquad (4)$$
where for any variable $T$ we denote the distribution of $T$ by $\mathcal{L}(T)$, and the Poisson variable with parameter $\lambda$ by $\mathrm{Po}(\lambda)$. Note that $E(U_{k,j}) = (n-k+1)/k! = \lambda$ for each $j$. If (4) were to be shown to be true by proving that
$$d_{TV}\big(\mathcal{L}(U_{k,j}), \mathrm{Po}(\lambda)\big) \le \varepsilon_{n,k}, \qquad (5)$$
where $\varepsilon_{n,k}$ does not depend on the pattern, it would follow that for each $j$,
$$\big|P(U_{k,j} \ge 1) - (1 - e^{-\lambda})\big| \le \varepsilon_{n,k}, \qquad (6)$$
and thus via (3) that
$$\big|E(X_k) - k!\,(1 - e^{-\lambda})\big| \le k!\,\varepsilon_{n,k}. \qquad (7)$$
Fixing the pattern in question and writing $U$ for $U_{k,j}$, we have that
$$U = \sum_{j=1}^{n-k+1} I_j,$$
where $I_j$ is the indicator variable that equals 1 if the pattern in question appears in the $k$ places $\{j, j+1, \ldots, j+k-1\}$ starting at $j$. Also, $I_j$ is independent of the ensemble of $I_\ell$'s whose windows do not intersect those of $I_j$. Thus Corollary 2.C.5 in [3] indicates that
$$d_{TV}\big(\mathcal{L}(U), \mathrm{Po}(\lambda)\big) \le \frac{1 - e^{-\lambda}}{\lambda} \sum_{j=1}^{n-k+1} \; \sum_{i \ne j,\ |i-j| \le k-1} \Big[ P(I_jI_i = 1) + P(I_j = 1)P(I_i = 1) \Big]. \qquad (9)$$
Since $P(I_j = 1) = \frac{1}{k!}$ for each $j$, we have that $\lambda = \frac{n-k+1}{k!}$.
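The identity $E(U_{k,j}) = (n-k+1)/k!$ can be confirmed by exhaustive enumeration for small $n$; the sketch below is ours, with $n = 6$, $k = 3$, and the pattern 132 an arbitrary choice.

```python
from itertools import permutations
from fractions import Fraction
from math import factorial

def pattern(window):
    """Rank sequence realized by a window of distinct numbers."""
    ranks = sorted(window)
    return tuple(ranks.index(x) + 1 for x in window)

# Average the number of occurrences U of a fixed pattern of length k = 3
# over all permutations of [n], n = 6; each of the n - k + 1 windows shows
# any fixed pattern with probability exactly 1/k!.
n, k, target = 6, 3, (1, 3, 2)
total = sum(sum(pattern(p[i:i + k]) == target for i in range(n - k + 1))
            for p in permutations(range(1, n + 1)))
mean = Fraction(total, factorial(n))
assert mean == Fraction(n - k + 1, factorial(k))  # lambda = 4/6 = 2/3
```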
Using this fact and bounding $\frac{1-e^{-\lambda}}{\lambda}$ by 1, (9) reduces to
$$d_{TV}\big(\mathcal{L}(U), \mathrm{Po}(\lambda)\big) \le \sum_{j=1}^{n-k+1} \; \sum_{i \ne j,\ |i-j| \le k-1} \Big[ P(I_jI_i = 1) + \frac{1}{(k!)^2} \Big]. \qquad (10)$$
Equations (7) and (10) thus give (11). We deal separately with the four terms in (11). First note (12); we retain the $(n-k+1)$ term in (12) for a later analysis and bound its second term using $1/k! \le (e/k)^k$. Second, in a similar fashion, we bound the second term in (11), obtaining (13). Thirdly, again in a similar fashion, we bound $2(n-k+1)k/k!$ by $2n^2/k!$ and obtain (14). This leaves us with the need to conduct an analysis of the correlation sum
$$\sum_{j} \; \sum_{i \ne j,\ |i-j| \le k-1} P(I_jI_i = 1). \qquad (15)$$
In fact, such a correlation analysis is critical to the proof of Theorem 1.2 via the Stein-Chen method. We thus pause the proof of Theorem 1.2 to compute $P(I_jI_i = 1)$ for windows with an overlap of $r$, and the next critical result provides a bound.
Lemma 2.1. For windows beginning at $i$ and $j$ such that $1 \le |i-j| \le k-1$, so that the windows overlap in $r = k - |i-j|$ positions, we have
$$P(I_jI_i = 1) \le \frac{2^{2k-2r}}{(2k-r)!}. \qquad (16)$$
Proof: Our proof is similar to an argument used in [6]. It is clear that the patterns in the overlap positions must be isomorphic in order for the pattern in question to exist in both windows. For example, we could have the pattern 53412 with $r = 3$ (in this case the overlap pattern is 312 for both windows). The easiest example is a monotone pattern (in this case any $r$ works). However, the pattern cannot be 21534 with $r = 2$. Note that (16) provides a uniform upper bound on $P(I_jI_i = 1)$ that does not pay heed to the fact that $P(I_jI_i = 1)$ might equal zero for some patterns.
To prove (16), we will find a bound on how many ways the numbers between 1 and $2k-r$ can be assigned so that the pattern exists in both windows. This would yield the numerator in (16). The denominator of (16), namely $(2k-r)!$, is obvious.
Consider the example in Figure 1, where the pattern is 652431 and the overlap is a 321 pattern. The number in the "1" position has to be $1+2-1=2$, since one number in the second occurrence of 652431 has to be assigned below it. Similarly, $3+5-2=6$ must be the number associated with the "2" position; this is because 5 numbers lower than it still need an assignment. Finally, the number in the "3" position must be $4+6-3=7$. In general, if the overlap is of magnitude $r$, then the numbers allotted to the $1, 2, \ldots, r$ ranks must be, respectively, $u_1+l_1-1, u_2+l_2-2, \ldots, u_r+l_r-r$, where the $u_i$'s and $l_i$'s are the upper and lower values of the ranks. These numbers are determined as indicated.

Now for the rest of the numbers: using the same notation as above, the $u_1+l_1-2$ 'low' numbers need to be assigned, $u_1-1$ to the smaller numbers in $\pi_1$ and $l_1-1$ to the smaller numbers in $\pi_2$. Since it doesn't matter how we do this, there are $\binom{u_1+l_1-2}{u_1-1}$ choices. The same process is repeated for the positions between the '1' and the '2' in the two patterns; this can be done in $\binom{(u_2+l_2)-(u_1+l_1)-2}{u_2-u_1-1}$ ways. In general, when we look at the choices of numbers for the positions between the "$s$" and the "$s+1$" in the two patterns, there are
$$\binom{(u_{s+1}+l_{s+1})-(u_s+l_s)-2}{u_{s+1}-u_s-1}$$
possibilities. One choice for the values in the non-overlap positions (for the example in Figure 1) is shown in Figure 2. The only choice here is for the 3, 4, and 5 numbers; we chose to allot the 3 to the top occurrence and 5 and 4 to the bottom occurrence (in that order). We next use the crude (but adequate) bound $\binom{a}{b} \le 2^a$ on each term above. Collapsing the resulting telescoping product, we get the numerator of $2^{2k-2r}$; notice how the "$-2$" terms in the exponent collect. This completes the proof of Lemma 2.1.
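For small $k$ and $r$, the conclusion of Lemma 2.1, that the number of favorable assignments is at most $2^{2k-2r}$ out of $(2k-r)!$, can be verified exhaustively. The sketch below is ours: for each pattern of length $k = 3$, it counts the permutations of $[2k-r]$ in which the pattern fills both overlapping windows.

```python
from itertools import permutations

def pattern(window):
    """Rank sequence realized by a window of distinct numbers."""
    ranks = sorted(window)
    return tuple(ranks.index(x) + 1 for x in window)

def joint_count(pat, r):
    """Number of permutations of [2k - r] in which pat occupies both the
    first window and the window starting k - r positions later."""
    k = len(pat)
    m = 2 * k - r
    return sum(pattern(p[:k]) == pat and pattern(p[k - r:]) == pat
               for p in permutations(range(1, m + 1)))

# Lemma 2.1 bounds the count by 2^(2k - 2r); dividing by (2k - r)! gives
# the probability bound. Some patterns give a zero count, as noted above.
for pat in set(permutations((1, 2, 3))):
    for r in (1, 2):
        assert joint_count(pat, r) <= 2 ** (2 * len(pat) - 2 * r)
```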
Continuing with the proof of Theorem 1.2, the expression below (15) simplifies as in (17). Together, Equations (11) through (17) give us the fact that, for fixed $k$, (18) holds. Let us consider the last two terms on the right side of (18). For the first, a short computation shows that (19) holds if, e.g., $k \ge \frac{4\log n}{\log\log n} = b_n$. The second term to evaluate is $8n/(k-3)$, where we'll now assume that $k \ge b_n$. We get, for $n$ large enough, the bound in (20), where the last bound in (20) is obtained by writing $b_n = \frac{4\log n}{\log\log n}$. Equation (20) establishes Theorem 1.2.
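As a numerical sanity check (ours, with an arbitrary choice of $n$): the Stirling-type bound $1/k! \le (e/k)^k$ used in the estimates holds, and once $k \ge b_n = 4\log n/\log\log n$, a term of the form $n^2(e/k)^k$ appearing in the estimates is already negligible at moderate $n$.

```python
from math import e, factorial, log, ceil

# The Stirling-type bound 1/k! <= (e/k)^k used in the estimates above.
for k in range(1, 40):
    assert 1 / factorial(k) <= (e / k) ** k

# Once k >= b_n = 4 log n / log log n, a term of the form n^2 (e/k)^k
# is already tiny at a moderate n (here n = 10^6, giving k = 22).
n = 10**6
k = ceil(4 * log(n) / log(log(n)))
assert k == 22
assert n**2 * (e / k) ** k < 1e-7
```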
3 Open Questions

(a) Can we gain a deeper insight into the concentration of $X$ around $E(X)$ by estimating the variance of $X$?
(b) Might we be able to improve Theorem 1.2 by employing a tighter proof, in particular, by not assuming that all patterns are compatible for any overlap? Some of the ideas of Borga and Penaguiao [4] and [5] might help regarding this question.

FIGURE 2: Assignment of Numbers to Consistent Overlaps