Common Intervals in Permutations

An interval of a permutation is a consecutive substring consisting of consecutive symbols. For example, 4536 is an interval in the permutation 71453682. These arise in genetic applications. For the applications, it makes sense to generalize so as to allow gaps of bounded size $\delta - 1$, both in the locations and the symbols. For example, 4527 has gaps bounded by 1 (since 3 and 6 are missing) and is therefore a $\delta$-interval of 389415627 for $\delta = 2$. After analyzing the distribution of the number of intervals of a uniform random permutation, we study the number of 2-intervals. This is exponentially large, but tightly clustered around its mean. Perhaps surprisingly, the quenched and annealed means are the same. Our analysis is via a multivariate generating function enumerating pairs of potential 2-intervals by size and intersection size.


Introduction
Let $[n]$ denote the set $\{1, 2, \ldots, n\}$. We are interested in counting the common intervals of a pair of permutations. To be precise, if $G_A$ and $G_B$ are two permutations of $[n]$, we are interested in counting the pairs of intervals $(I, J)$ for which $G_A(I) = G_B(J)$. It is equivalent to count intervals $I$ for which $G_B^{-1} G_A(I)$ is also an interval. Accordingly, we define

Definition 1.1 The interval $I \subseteq [n]$ is an interval of the permutation $G$ if $G^{-1}(I)$ is also an interval. The proper intervals are those whose lengths are at least 2 and at most $n - 1$.
Here and throughout, we use vector notation for permutations rather than cycle notation, so that $(\sigma_1, \ldots, \sigma_n)$ denotes the permutation $i \mapsto \sigma_i$ rather than the permutation consisting of a single $n$-cycle.

Example: Let $G$ be the permutation $(3, 1, 2, 4, 5)$. Then the proper intervals of $G$ are $[1,2]$, $[4,5]$, $[1,3]$ and $[1,4]$.

When $G$ is a random variable, uniformly distributed over all permutations of $[n]$, let $X_k$ denote the number of intervals of length $k$ of $G$ and let $X = \sum_k X_k$ denote the number of intervals of $G$. We will show in Section 2 that as $n \to \infty$, the distribution of $X$ converges to a Poisson with mean 2.
The number of intervals, or runs of a permutation, was studied in the forties by Kaplansky [11] and Wolfowitz [18,19] from a statistical point of view. See also [13]. Recently several algorithms were designed to efficiently enumerate all common intervals of permutations [9,17]; their time complexity is $O(n + K)$, where $n$ is the size of the permutation and $K$ the number of intervals. These algorithms were designed because common intervals have several applications. They relate to the consecutive arrangement problem [7]. Genetic algorithms for sequencing problems are based on common intervals [12,14]. In bioinformatics [4,5,8,9,10], genomes of prokaryotes can be modeled as a permutation of genes. A common interval is then a set of orthologous genes that appear consecutively, possibly in different orders, in two genomes. Therefore common intervals can be used to detect groups of genes that are functionally associated [9,10]. As the annotation of genomes is not perfect, the notion of consecutivity in intervals needs to be relaxed. A notion of gene teams was defined in [6], where a gene team is a maximal set of orthologous genes, possibly occurring in different orders in the two species, but separated in each case by gaps that do not exceed a fixed threshold, $\delta$. To study these, we consider a generalization of intervals, namely $\delta$-intervals (the previous case corresponds to $\delta = 1$).
We call $I$ a $\delta$-interval of length $k$ of $G$ if both $I$ and $G^{-1}(I)$ are $\delta$-intervals. Proper $\delta$-intervals are again those of cardinality at least 2 and at most $n - 1$.
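To make the definition concrete, here is a small brute-force sketch. It assumes, following the abstract, that a subset of $[n]$ is a $\delta$-interval when consecutive elements (in sorted order) differ by at most $\delta$, that is, when gaps have size at most $\delta - 1$. The helper names are hypothetical and the enumeration is exponential, so it is meant for small $n$ only.

```python
from itertools import combinations


def is_delta_interval(s, delta):
    """True if consecutive elements of the set s differ by at most delta."""
    xs = sorted(s)
    return all(b - a <= delta for a, b in zip(xs, xs[1:]))


def is_delta_interval_of(perm, values, delta):
    """True if both `values` and their positions in perm are delta-intervals."""
    pos = {v: i for i, v in enumerate(perm, start=1)}  # inverse permutation
    return is_delta_interval(values, delta) and is_delta_interval(
        [pos[v] for v in values], delta)


def count_delta_intervals(perm, k, delta):
    """Brute-force count of delta-intervals of cardinality k (small n only)."""
    n = len(perm)
    return sum(is_delta_interval_of(perm, c, delta)
               for c in combinations(range(1, n + 1), k))


perm = (3, 8, 9, 4, 1, 5, 6, 2, 7)
print(is_delta_interval_of(perm, {4, 5, 2, 7}, 2))  # the example 4527 from the abstract
print(is_delta_interval_of(perm, {4, 5, 2, 7}, 1))  # not an ordinary interval
```

With $\delta = 1$ the same functions recover ordinary intervals, which gives a quick consistency check against the example $(3, 1, 2, 4, 5)$.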
The main purpose of this paper is to investigate the asymptotic properties of $X_k^{(\delta)}$, where this denotes the number of $\delta$-intervals of length $k$ of a uniformly chosen random permutation of $[n]$, and of the total number $X^{(\delta)} := \sum_k X_k^{(\delta)}$ of $\delta$-intervals of a random permutation. We are interested in all $\delta > 1$ but in the present manuscript we examine only the case $\delta = 2$. To reduce the number of superscripts, we let $Y$ and $Y_k$ denote $X^{(2)}$ and $X_k^{(2)}$ respectively.
The number $X^{(\delta)}$ of $\delta$-intervals when $\delta > 1$ behaves very differently from $X$. Whereas $X$ is $O(1)$ as $n \to \infty$, with all the contributions coming from short intervals, there will typically be many $\delta$-intervals. In fact for $\delta = 2$, a thumbnail computation produces numbers $\alpha_k$ in the unit interval ($\alpha_2 \approx 0.57939$) such that for $k \sim \alpha n$ and $\alpha > \alpha_k$, the random variable $Y_k$ will be typically exponentially large: the number of 2-intervals of $[n]$ of size $k$ grows exponentially, the probability that $G^{-1}$ of one of these is also a 2-interval decays exponentially, and the growth overcomes the decay when $\alpha > \alpha_k$.
Seeing that $Y$ grows exponentially in $n$, it is natural to look at the rescaled quantity $n^{-1} \log Y$. We compute the annealed mean, $n^{-1} \log EY$. The term "annealed" means that we first take an expectation over the (uniform) measure on permutations. The more interesting quantities are the quenched quantities, which refer to the typical, rather than the mean, behavior of $Y$. Often one has a so-called lottery effect, meaning that the mean of a quantity $Y$ comes primarily from an exponentially small number of values that are exponentially larger than the median value, and that consequently, $E \log Y < \log EY$. For example, when there is a Gaussian limit law, $n^{-1/2}(\log Y - n\mu) \to N(0, \sigma^2)$, then one will typically have a lottery effect. Perhaps surprisingly in light of the discussion in Section 7, there is no lottery effect. Our main result, Theorem 4.1 below, is that for $\delta = 2$, we have $EY^2 = O((EY)^2)$. This shows that as $n \to \infty$, the sequence $\tilde{Y} := Y/EY$ is tight on $(0, \infty)$, meaning that when $Y$ is rescaled by its mean, the probability of $Y/EY < \epsilon$ is uniformly bounded by some $g(\epsilon)$ going to zero as $\epsilon \to 0$.
The paper is organized as follows. In Section 2 we study the case $\delta = 1$. We recall previous results and show that the distribution of $X$ converges to a Poisson with mean 2. In Section 3 we compute the mean value of 2-intervals. We use basic counting arguments and then apply a saddle point argument. Section 4 states the main result, which is that the quenched and annealed means are the same, that is, $EY^2 = O((EY)^2)$, and then outlines all the steps of the proof. The proof itself is presented in Sections 5, 6 and 7. In Section 5 we present a 4-variable generating function for pairs of 2-intervals of size $k$ and $k'$ which overlap on $\kappa$ positions. Then in Section 6 we compute a rate function by saddle point analysis, which gives us the asymptotics of the number of pairs of 2-intervals of size $k$ and $k'$ which overlap on $\kappa$ positions. This gives the exponential part of $EY^2$. The full computation of the asymptotics of the non-exponential part of $EY^2$ seems daunting. We will show in Section 7 that $EY^2 - (EY)^2$ is not $o((EY)^2)$, and that a Gaussian limit is not possible (Theorem 7.1). We are left to conjecture that $Y$ converges in distribution to an unidentified positive real random variable, whose properties are discussed in Section 7.
A preliminary version of this paper was presented at the Third International Colloquium of Mathematics and Computer Science held at the Vienna University of Technology (Sep. 2004).

Intervals
Recall that $X_k$ denotes the number of intervals of length $k$ of $G$ and that $X = \sum_k X_k$ denotes the number of intervals of $G$. Uno et al. [17] computed Although this was not explicitly stated in [17], it is not hard to show that in fact

Proposition 2.1 As $n \to \infty$, the distribution of $X$ converges to a Poisson with mean 2.
Thus it suffices to show that $X_2$ converges to a Poisson of mean 2. Kaplansky proves this in [11]. We give here an independent argument via a more modern approach, using the Poisson approximation machinery first developed by Chen and Stein, and put in an explicit and usable form in [1].
Write $p_k$ for $P(A_k)$ and $p_{k,l}$ for $P(A_k \cap A_l)$. Theorem 1 of [2] concludes convergence to a Poisson with mean $\sum_k p_k$ provided the following three quantities go to zero. In fact the distance in total variation to the Poisson for any fixed $n$ is bounded above by $2(b_1 + b_2 + b_3)$, so the argument will show that the total variation distance is $O(1/n)$.
It remains only to show that b where $\|\cdot\|_{TV}$ is the total variation distance, $\mu$ is the unconditional probability measure on $\sigma$ and $\mu_{A_k}$ is $\mu$ conditioned on $A_k$. Now there is an easy way to generate a random permutation conditional on $A_k$: pick $G_0$ uniformly at random, pick a pair of positions $\{j, j+1\}$ independently uniformly at random, switch the values of $G^{-1}$ on $k$ and $G(j)$, and switch the values of $G^{-1}$ on $k+1$ and $G(j+1)$.
With probability $1 - O(1/n)$, this does not change whether $A_j$ occurs for any $j \ne k$. It follows that $\|\mu_{A_k} - \mu\|_{TV} = O(1/n)$, which completes the proof. ✷

More generally, consider $X_k$. There are $T_{1,k} = n - k + 1$ possible $k$-tuples and each of them has a probability of being made of $k$ consecutive integers (again irrespective of their order, we denote this property by $C_k$). Hence $E(X_k) = (n-k+1)^2 / \binom{n}{k}$. Set We see that, as $n \to \infty$, the dominant terms of $E(X)$ are given by $k = O(1)$ and $k = n - O(1)$. Indeed, by Stirling, and setting $k = \alpha n + 1$, we easily derive exponentially , $n \to \infty$, for fixed $\varepsilon$.
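The first-moment count above can be verified exhaustively for small $n$. The sketch below assumes the closed form $E(X_k) = (n-k+1)^2/\binom{n}{k}$, which is our reading of the counting argument: there are $n-k+1$ windows of positions, and each holds $k$ consecutive integers with probability $(n-k+1)/\binom{n}{k}$. The function names are ours.

```python
from fractions import Fraction
from itertools import permutations
from math import comb, factorial


def count_length_k(perm, k):
    """Number of windows of k consecutive positions holding k consecutive values."""
    return sum(max(w) - min(w) == k - 1
               for w in (perm[i:i + k] for i in range(len(perm) - k + 1)))


def exact_mean(n, k):
    """E[X_k] by exhaustive enumeration over all n! permutations."""
    total = sum(count_length_k(p, k) for p in permutations(range(1, n + 1)))
    return Fraction(total, factorial(n))


def formula_mean(n, k):
    """Assumed closed form (n - k + 1)^2 / C(n, k)."""
    return Fraction((n - k + 1) ** 2, comb(n, k))


n = 6
for k in range(2, n):
    print(k, exact_mean(n, k), formula_mean(n, k))
```

For $k = 2$ the formula reduces to $2(n-1)/n$, which tends to the Poisson mean 2.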
We obtain As an example of $X_k$ behaviour, let us now turn to $X_3$ and compute $E_3^2 := E(X_3^2)$. First we have Next we have $T_{2,3} = 2(n-3)$ couples of triplets with two common values and the probability of one of these couples contributing by 1 to $E_3^2$ is given by Next we have $T_{3,3} = 2(n-4)$ couples of triplets with one common value and the probability of one of these couples contributing by 1 to $E_3^2$ is given by Finally, there are $T_{4,3} = (n-4)(n-5)$ couples of disjoint triplets and the probability of one of these couples contributing by 1 to $E_3^2$ is given by . as it should. This leads to

Note again that
What is the asymptotic distribution of $X_3$? Let $H_i$ denote the event: the triplet $[\sigma_i, \sigma_{i+1}, \sigma_{i+2}]$ is made of three consecutive integers (again irrespective of their order). We obtain, by inclusion-exclusion,

This leads to
Note that this also leads to

which is of course compatible with (1) and (3).
A simulation of $X_3$ based on $M = 3000$ trials with $n = 40$ is given in Figure 1 (asymptotic = line, observed = circle). The fit is reasonable.
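A simulation of this kind is easy to reproduce in outline. The sketch below uses the same parameters ($M = 3000$, $n = 40$) but is not the authors' code. The seed and function names are arbitrary choices of ours. It compares only the sample mean against the exact value $(n-2)^2/\binom{n}{3}$ rather than against the limiting distribution.

```python
import random
from collections import Counter
from math import comb


def x3(perm):
    """Number of windows of 3 consecutive positions holding 3 consecutive integers."""
    return sum(max(perm[i:i + 3]) - min(perm[i:i + 3]) == 2
               for i in range(len(perm) - 2))


def simulate_x3(n=40, trials=3000, seed=0):
    """Empirical distribution of X_3 over uniformly shuffled permutations."""
    rng = random.Random(seed)
    base = list(range(1, n + 1))
    counts = Counter()
    for _ in range(trials):
        rng.shuffle(base)  # shuffling repeatedly keeps each draw uniform
        counts[x3(base)] += 1
    return counts


counts = simulate_x3()
trials = sum(counts.values())
sample_mean = sum(v * c for v, c in counts.items()) / trials
exact = (40 - 2) ** 2 / comb(40, 3)
print(sorted(counts.items()), sample_mean, exact)
```

Most trials give $X_3 = 0$, since the exact mean at $n = 40$ is only about 0.146.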

The mean number of 2-intervals
Fig. 1: Observed and limiting (4) $X_3$ distribution

Let $N(k, n)$ denote the number of 2-intervals that are subsets of $[n]$ and have cardinality $k$. We will take advantage of the uniformity of $G$. For each of the $N(k, n)$ 2-intervals of cardinality $k$, its inverse image under $G$ is uniformly distributed on $k$-subsets of $[n]$. Therefore, the probability is exactly $N(k, n)/\binom{n}{k}$, for any given 2-interval of cardinality $k$, that its inverse image under $G$ is again a 2-interval. Thus

$$EY_k = \frac{N(k, n)^2}{\binom{n}{k}}.$$

To evaluate $N(k, n)$, note that there may be anywhere from 0 to $m_k := \min\{k - 1, n - k\}$ "holes" in a 2-interval, where a hole is an element not in the 2-interval but between its endpoints. Let $T_{i,k}$ denote the number of 2-intervals of cardinality $k$ with $i$ holes. These may be enumerated by the following procedure. Pick a starting position $r$ with $1 \le r \le n - k - i + 1$ and let $r$ be the least element of the 2-interval. Choose any sequence with $i$ occurrences of the word "skip" and $k - 1 - i$ occurrences of the word "no-skip". If the first word in the sequence is "no-skip" then $r + 1$ is the next element of the 2-interval; if the first word is "skip" then $r + 2$ is the next element. Continue in this manner until the sequence is used up. This method of enumeration makes it clear that

$$T_{i,k} = (n - k - i + 1)\binom{k-1}{i}, \qquad N(k, n) = \sum_{i=0}^{m_k} T_{i,k}.$$

When $k/n \le 1/2$ we have $m_k = k - 1$ and the sum evaluates to $N(k, n) = 2^{k-2}(2n - 3k + 3)$. When $k/n > 1/2$ the sum is better approximated than evaluated exactly. We find that which has its maximum term near the endpoint $i = n - k$ when $k \ge \frac{2}{3} n$, and near $k/2$ when $k \le \frac{2}{3} n$. More categorical asymptotics than we need are available by using a normal approximation near $k = 2n/3$.
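The skip/no-skip enumeration translates directly into code. The following sketch computes $N(k, n)$ as a sum of $(n-k-i+1)\binom{k-1}{i}$ over the number of holes $i$, and cross-checks it against a direct enumeration of subsets. The function names are ours, and the formula for the mean is the one implied by the uniformity argument, $EY_k = N(k,n)^2/\binom{n}{k}$.

```python
from itertools import combinations
from math import comb


def T(i, k, n):
    """2-intervals of cardinality k with i holes: start positions times skip patterns."""
    return (n - k - i + 1) * comb(k - 1, i)


def N(k, n):
    """Number of 2-intervals of [n] of cardinality k, summed over hole counts."""
    return sum(T(i, k, n) for i in range(min(k - 1, n - k) + 1))


def N_brute(k, n):
    """Same count by direct enumeration of k-subsets (small n only)."""
    return sum(all(b - a <= 2 for a, b in zip(c, c[1:]))
               for c in combinations(range(1, n + 1), k))


def mean_Y_k(k, n):
    """E[Y_k] = N(k, n)^2 / C(n, k), as a float."""
    return N(k, n) ** 2 / comb(n, k)


print([N(k, 10) for k in range(1, 11)])
print([round(mean_Y_k(k, 10), 3) for k in range(1, 11)])
```

For $k \le n/2$ every hole count up to $k - 1$ is allowed, and the sum collapses to $2^{k-2}(2n - 3k + 3)$, which can serve as a second check.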
which is asymptotically the same as when $k < n/2$. On the other hand, for (k Finally, for , the normal approximation yields directly where $\Psi_x$ is the expected positive part of $Z + x$ with $Z$ a standard normal.

Via equation (6), these asymptotics for $N(k, n)$ lead directly to asymptotics for $EY_k$. To obtain asymptotics for $EY$, we then sum over $k$, using a saddle point approximation. The only significant terms are near $k = \alpha_* n$, where $\alpha_*$ will be determined shortly but is evidently greater than 2/3. We therefore use (9) with $n \to \infty$ and $\alpha := k/n$ to obtain where When $\alpha < 2/3$, we obtain, by a similar analysis, The unique root in $[2/3, 1]$ is $\alpha_* \approx .7840013296\ldots$. A saddle point approximation now gives us

The second moment of the number of 2-intervals

In the next three sections, we show:

Theorem 4.1 $EY^2 = O((EY)^2)$.

This section outlines, in three subsections, the argument for this. The subsequent section derives the crucial generating function and the following one uses the generating function to verify certain key computations.
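As a numerical check on the value $\alpha_* \approx 0.7840013296$ quoted in the computation of the mean, one can reconstruct the exponent by hand. The sketch below rests on an assumption of ours: for $\alpha \ge 2/3$ the sum for $N(k, n)$ is dominated by the term $i = n - k$, so that by Stirling $n^{-1}\log EY_k \to F_1(\alpha) = 3\alpha\log\alpha - (1-\alpha)\log(1-\alpha) - 2(2\alpha-1)\log(2\alpha-1)$. Setting the derivative to zero is then equivalent to $\alpha^3(1-\alpha) = (2\alpha-1)^4$. The names `F1`, `F1_prime` and `bisect_root` are hypothetical.

```python
from math import log


def F1(a):
    """Assumed exponent of E[Y_k] at k = a * n, for 2/3 <= a < 1."""
    return (3 * a * log(a) - (1 - a) * log(1 - a)
            - 2 * (2 * a - 1) * log(2 * a - 1))


def F1_prime(a):
    """Derivative of F1; it vanishes where a^3 (1 - a) = (2a - 1)^4."""
    return 3 * log(a) + log(1 - a) - 4 * log(2 * a - 1)


def bisect_root(f, lo, hi, iters=200):
    """Plain bisection; assumes f(lo) > 0 > f(hi) and f decreasing."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


alpha_star = bisect_root(F1_prime, 0.67, 0.99)
print(alpha_star, F1(alpha_star))
```

The derivative is strictly decreasing on $(2/3, 1)$, so the root found by bisection is the unique one in that range.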

Counting pairs of 2-intervals
Just as the mean of $Y$ may be obtained from a saddle point analysis of $EY_k$ near $k/n = \alpha_*$, we expect the second moment of $Y$ to be dominated by terms $EY_k^2$ with $k/n$ near some $\alpha_{**}$. Because we have seen from numerical data that the quenched and annealed behavior are the same, we expect to find, and do find, that $\alpha_{**} = \alpha_*$.
Again we will take advantage of symmetry. This time, if $I$ and $I'$ are 2-intervals, we will need to know the cardinality of their intersection before we can determine the probability that $G^{-1}(I)$ and $G^{-1}(I')$ are both 2-intervals. We therefore define $N(k, k', n, \kappa)$ to be the number of pairs of 2-intervals $(I, I')$ of $[n]$ with $|I| = k$, $|I'| = k'$ and $|I \cap I'| = \kappa$. For the computation of $EY_k^2$ we will want to specialize to the case $k = k'$, so we denote $N(k, n, \kappa) := N(k, k, n, \kappa)$. Our computations will now be analogous to the computation of $F_1$ and its argmax, $\alpha_*$. Specifically, letting we will find a rate function $\mathrm{rate}(\alpha, \beta, \rho)$ such that for all parameter values in a range containing the dominant contributions to the second moment of $Y$.
To obtain the analogue of (5) for second moments, we will need the rate function for the total number of pairs of subsets A and It follows that for any pair of sets of respective cardinalities $k$ and $k'$ whose intersection has cardinality $\kappa \le k$, the probability that their union is a specific pair of sets is and that the rate function for this probability, $\lim n^{-1} \log P$, which we denote $\mathrm{ent}$ for entropy, is given by Observe that the function $\mathrm{ent}$ satisfies the identity where is the usual entropy function, and in particular that $\rho = \alpha \cdot \beta$ is a maximum for the function $-\mathrm{ent}$ for fixed $\alpha, \beta$.
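The rate function for the expected number of pairs of 2-intervals of respective sizes $k$ and $k'$ with overlap $\kappa$ is given by $F_2(\alpha, \beta, \rho) := 2\,\mathrm{rate}(\alpha, \beta, \rho) + \mathrm{ent}(\alpha, \beta, \rho)$, which we now analyze.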

The exponential order of the second moment
is the sum of polynomially many summands, it follows that the exponential order of $EY^2$ is the same as the order of the largest summand, namely In order to compute $\lambda_{**}$ we must find the location, $(\alpha_{**}, \beta_{**}, \rho_{**})$, of the maximum of $F_2$. Without computing, we may narrow the search considerably. First, from the inequality $EY_k Y_l \le \frac{1}{2}(EY_k^2 + EY_l^2)$, we see that the maximum of $EY_k Y_l$ (for fixed $n$) can only occur when $l = k$, and therefore for some $\alpha, \rho$.
Next, consider how $N(k, k', n, \kappa)$ varies with $\kappa$ for fixed $(k, k', n)$. In other words, enumerate pairs of 2-intervals of fixed sizes $k$ and $k'$ according to the size of their intersection, $\kappa$. Observe that $\sum_\kappa N(k, k', n, \kappa)$ counts all pairs of 2-intervals of sizes $k$ and $k'$, so that $\sum_\kappa N(k, k', n, \kappa) = N(k, n) N(k', n)$. Since the number of summands is linear in $n$, we have at the exponential level that where Later, we will show, for $\alpha$ and $\beta$ in a range containing $\alpha_{**}$, that this supremum occurs at $\rho = \alpha \cdot \beta$. Assume this for now. We have previously remarked that $-\mathrm{ent}(\alpha, \beta, \cdot)$ has a maximum at $\alpha \cdot \beta$ as well. Both functions are smooth, so the function $F_2 = 2 \cdot \mathrm{rate} + \mathrm{ent}$ has a critical point there as well. With some work, using a four-variable generating function, we will show that this critical point is a maximum. It will then follow from (18) and (14) that Taking the maximum over $\alpha$ and $\beta$ then would give
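The decomposition of pairs of 2-intervals by intersection size can be tabulated directly for small parameters. The sketch below is a brute-force illustration with hypothetical function names. It counts ordered pairs $(I, I')$ and groups them by $\kappa = |I \cap I'|$, so the entries sum to $N(k, n) N(k', n)$.

```python
from collections import Counter
from itertools import combinations


def two_intervals(k, n):
    """All 2-intervals of [n] of cardinality k, as frozensets."""
    return [frozenset(c) for c in combinations(range(1, n + 1), k)
            if all(b - a <= 2 for a, b in zip(c, c[1:]))]


def pair_counts(k, k2, n):
    """Map kappa -> N(k, k2, n, kappa), counting ordered pairs by intersection size."""
    A, B = two_intervals(k, n), two_intervals(k2, n)
    return Counter(len(I & J) for I in A for J in B)


counts = pair_counts(4, 5, 8)
for kappa in sorted(counts):
    print(kappa, counts[kappa])
```

Printing the table for larger $n$ (still small enough to enumerate) gives a feel for where the bulk of the pairs sits relative to $\kappa \approx k k'/n$.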

End of the outline
More precise information about $EY^2$ is obtained by a saddle point summation. In particular, from the form of the generating function, it will follow that there is an $n^{-3/2}$ correction term: in a neighborhood of the maximum. The product and the sum will be asymptotically $C \exp(n F_2(\alpha_*, \alpha_*, \alpha_*^2))$ where and $H$ is the Hessian of the rate function $F_2$ in appropriate coordinates. In particular the polynomial correction term is canceled by the summation, and the demonstration that $EY^2 = O((EY)^2)$ is completed.
To summarize, the foregoing outline proves Theorem 4.1 once we have verified several assertions:
(m1) that the supremum of $\mathrm{rate}(\alpha, \beta, \cdot)$ occurs at $\alpha \cdot \beta$;
(m2) that the critical point of $F_2$ at $\rho = \alpha \cdot \beta$ is a maximum;
(m3) that exact asymptotics for $EY_k Y_l$ have an $n^{-3/2}$ correction term to the exponential and that the discrete saddle point summation exactly cancels this out.
We remark that this argument does not show that $EY^2 - (EY)^2 = \Theta((EY)^2)$. In principle we could compute $A(\alpha_*, \alpha_*, \alpha_*^2)$ and $H$ and compare to the constant computed in (11), but this computation seems daunting. Instead, a separate argument ruling out the possibility that $E(Y^2) - (EY)^2 = o((EY)^2)$ is given in the last section.
Before turning to the generating function, we establish some helpful numerical bounds.

The Generating Function
Recall that $N(k, k', n, \kappa)$ denotes the number of pairs of 2-intervals $(I, I')$ of $[n]$ with $|I| = k$, $|I'| = k'$ and $|I \cap I'| = \kappa$. Define the generating function As a warm-up, let us compute the generating function for $N(k, n)$ and recover the asymptotic formulae (7) - (9). Recall from building the generic indicator function that a 2-interval consists of any number of unused locations, followed by a single used location, followed by any sequence of pairs (unused, used) and single used locations, followed by any number of unused locations. Each pair (unused, used) contributes a factor of $uz^2$ while each single used location gives a factor of $uz$. The generating function for an arbitrary sequence of pairs and singles is thus $uz/(1 - (uz + uz^2)) = uz/(1 - (1 + z)uz)$. Taking into account the initial and final strings of unused positions gives

$$\frac{uz}{(1 - z)^2 \, (1 - (1 + z)uz)}.$$

To recover the asymptotics in the different regimes, write Writing $F_5 := \alpha \log(1 + z) - (1 - \alpha) \log z$ and setting $(k - 1)/n := \alpha$ we then have There is a double pole at $z = 1$ and a saddle point at $F_5'(z) = 0$, that is, at $z_* := (1 - \alpha)/(2\alpha - 1)$. When $\alpha < 2/3$ the double pole is the dominant singularity and leads by residues to which is (7). When $\alpha > 2/3$ the saddle point is dominant and leads to which agrees with (9).

Now to derive $F$, we follow a similar route. A canonical way to build a pair of sets and keep track of the intersection is as follows.
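The warm-up generating function can be checked against the direct count. The sketch below assumes the univariate-case generating function is $uz/((1-z)^2(1-(1+z)uz))$, that is, the sequence factor $uz/(1-(1+z)uz)$ times one factor $1/(1-z)$ for each of the initial and final runs of unused positions. It expands the series by hand (no computer algebra library) and compares the coefficient of $u^k z^n$ with the hole-count formula for $N(k, n)$. The function names are ours.

```python
from math import comb


def gf_coefficients(n_max):
    """Coefficients c[k][n] of u^k z^n in u z / ((1 - z)^2 (1 - (1 + z) u z))."""
    c = {}
    for k in range(1, n_max + 1):
        # the u^k term of u z / (1 - (1 + z) u z) is z^k (1 + z)^(k - 1)
        row = [0] * (n_max + 1)
        for j in range(k):
            if k + j <= n_max:
                row[k + j] = comb(k - 1, j)
        # multiplying by 1 / (1 - z) is a running sum; apply it twice
        for _ in range(2):
            for m in range(1, n_max + 1):
                row[m] += row[m - 1]
        c[k] = row
    return c


def N(k, n):
    """Number of 2-intervals of [n] of cardinality k, via the hole-count sum."""
    return sum((n - k - i + 1) * comb(k - 1, i)
               for i in range(min(k - 1, n - k) + 1))


coeffs = gf_coefficients(12)
print(coeffs[2][:6], [N(2, n) for n in range(2, 6)])
```

The agreement reflects the identity $N(k, n) = [z^{n-k}]\,(1+z)^{k-1}(1-z)^{-2}$, which is also the starting point for the saddle point at $z_* = (1-\alpha)/(2\alpha-1)$.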
1. An initial sequence of positions before the first common position;
2. A common position followed by zero or more segments of the form: a sequence of positions not common to either set, in such a way that no two positions in a row are absent from either set, followed by a common position;
3. A final sequence of positions after the last common position.
The crucial part of the generating function is the one that grows exponentially with $\kappa$, namely the second of the three parts. Nevertheless, in order to compute a valid leading order asymptotic, we must keep track of all three parts. To enumerate the second of the three parts, note that each segment between common positions can be one of six possible classes of configuration. We list these here, along with the factor contributed by such a step to the generating function.
a Empty. $f_a = 1$.
b A single position, which can belong to either set or neither, but not both.
c A positive number of pairs $(j, j+1)$ where $j \in I \setminus I'$ and $j + 1 \in I' \setminus I$.
c′ A positive number of pairs $(j, j+1)$ where $j \in I' \setminus I$ and $j + 1 \in I \setminus I'$.
d The same as (c) but there is a single position in $I \setminus I'$ at the end. $f_d = z u_1 f_c$.
d′ The same as (c′) but there is a single position in $I' \setminus I$ at the end.
The generating function for an arbitrary sequence of these is

Fig. 3: shaded squares in each row correspond to positions present in that set

For the first and last of the three parts, we first recall from (20) the generating function for that part of a 2-interval between its first and last point. By symmetry, parts 1 and 3 have the same generating function, which is equal to $1/(1 - z)$ times the generating function $g$ for the segment to the right of the last common position but to the left of the last position in $I \cup I'$. We may write $g$ as the sum of several cases.
e Empty. $g_e = 1$.
f A position in neither set, followed by a non-empty string of positions, each of which is in neither set or $I$, with no two in a row not in $I$. $g_f = z F_6(u_1, z)$.
f′ A position in neither set, followed by a non-empty string of positions, each of which is in neither set or $I'$, with no two in a row not in $I'$. $g_{f'} = z F_6(u_2, z)$.
g A string of pairs as in case (c) above, followed by a nonempty sequence as in case (f).
g′ A string of pairs as in case (c′) above, followed by a nonempty sequence as in case (f′).
h The same as (g) except with a position in $I' \setminus I$ in the beginning.
h′ The same as (g′) except with a position in $I \setminus I'$ in the beginning.
Summing, we see that the factor from the first and last parts is Finally,

The rate function

Let $L$ denote the logarithmic gradient, that is, $LQ$ is the vector whose $j$th coordinate is $x_j \, \partial Q/\partial x_j$. Let $h(x, y, z, \tau) = 1 - xy - \tau(1 + x + y + z + xy - xyz)$ and let $V_o$ be the set of smooth points of the variety where $h$ vanishes. Let $(\mu, \nu, \delta)$ denote $(\alpha - \rho, \beta - \rho, 1 - \alpha - \beta + \rho)$. In the next subsection we will prove:

Theorem 6.1 Let $s_0 := (\alpha_{**}, \beta_{**}, \rho_{**})$ be the argmax for $F_2$ as in (16). There is a neighborhood $N$ of $s_0$ in $\mathbb{RP}^{d-1}$ and a continuous map $x : N \to V \cap (\mathbb{R}^+)^4$ such that for every $s \in N$ there is a point $x(s)$ satisfying:

Readers not interested in where this comes from may skip now to Section 6.1.
The general approach to extracting asymptotics from the generating function is taken from [3]. Let $F$ be a rational generating function in $d$ variables, written as the quotient of polynomials $P/Q$ with $Q(0) \ne 0$. The coefficients of $F = \sum_r a_r z^r$ may be evaluated via Cauchy's integral formula

$$a_r = \frac{1}{(2\pi i)^d} \int_T z^{-r} F(z) \, \frac{dz}{z}.$$

The cycle of integration, $T$, is, initially, the product of small circles around the origin in each coordinate.
But, letting $\mathrm{Dom}$ denote the domain of holomorphy of $F$ (that is, $(\mathbb{C}^*)^d \setminus V$ where $V$ is the variety on which $Q$ vanishes), we may replace $T$ by anything in the homology class $[T] \in H_d(\mathrm{Dom})$. It is shown in [3] that $H_d(\mathrm{Dom})$ is generated by cycles of the following sort. Fix a vector $s \in (\mathbb{R}^d)^+$, projecting to a direction, also denoted $s$, in the positive orthant of $\mathbb{RP}^{d-1}$, for which one wishes to compute asymptotics of $a_r$ as $r \to s$. Let $\{S_\beta\}$ be a Whitney decomposition of $V$ into strata (e.g., if $V$ is smooth, then one has simply $\{V\}$). For generic values of $s$, the function $h_s := -\sum_{j=1}^d s_j \log |z_j|$ will be a Morse function on all strata of $V$ and there will be a finite set of critical points $\{x_{\beta,j}\}$ of the restriction to stratum $S_\beta$ of $h_s$. Then $H_d(\mathrm{Dom})$ is generated by cycles $C$ such that
• $C$ is the product of a cycle $C_1$ in some $S_\beta$ with an arbitrarily small cycle $C_2$ orthogonal to $S_\beta$;
• $C_1$ passes through some $x_{\beta,j}$ and $h_s$ is maximized on $C_1$ at $x_{\beta,j}$.
The integral of $\omega := z^{-r} F \, dz/z$ over $C_2$ will be computable as a residue, $\omega_1$. The integral of $\omega_1$ over $C_1$ will be a saddle point integral, in the sense that it will be the integral of $P_1(z) z^{-r} \, dz/z$ where for $r \to s$, the dominant term in the integrand, $z^{-r}$, has maximum modulus and stationary phase at $x_{\beta,j}$. It is easy to evaluate $\int_{C_1} \omega_1$ as a stationary phase integral. Asymptotics of $a_r$ are then obtained by summing over critical points $x_{\beta,j}$ in the support of $[T]$ (when $[T]$ is written as a linear combination of these, which have positive coefficient). Among these, only those with the highest value of $h_s$ need be considered.
Carrying out this programme will require several steps:
i Find the critical points (routine but messy exercise in computer algebra)
ii Determine which of these are in the support of $[T]$ (nontrivial topological problem with tidy answer in terms of local tangent cone)
iii Compute the functions $\mathrm{rate}$ and $A_0$ (straightforward)
iv Optimize $F_2$ in $\alpha$, $\beta$ and $\rho$ (cajole the computer into performing the right simplifications)
v Sum in a neighborhood of the optimum (fairly routine discrete saddle point computation)
The first three of these are carried out in the next subsection, and the last two in the subsequent one.

Finding the dominating point
We now specialize to the generating function $F = f g^2$. It will turn out to simplify the computations if we change variables to The other divisors in $V$ are factors in the denominator of $g$, namely, $g_1 := 1 - z$, $g_2 := 1 - xy$, $g_3 := 1 - (1 + z)x$ and $g_4 := 1 - (1 + z)y$. The variety $V_{\tilde f}$ where $\tilde f$ vanishes is not smooth, since $\nabla \tilde f$ vanishes on the curve $\gamma := (-1, -1, 1/t, t)$. Thus $V_{\tilde f}$ has the strata $V_o := V_{\tilde f} \setminus \gamma$ and $\gamma$. The other divisors of $V$ are smooth.
Define the function $D : V \to \mathbb{CP}^3$ by $D(x) = Lx$ for smooth points of $V$ and otherwise by letting $D(x)$ be the closure of the limit points of $L(y)$ as $y \to x$. The point $x(s)$ "controls" the asymptotics in the direction $s$, as captured by the following result taken from [15].

Proposition 6.2 For any $s$ in the positive orthant of $\mathbb{RP}^3$, there is a point $x(s)$ with the following properties.
Remark 6.3 The only difference between this proposition and Theorem 6.1 is that we would like x ∈ V o rather than just x ∈ V .
Proof: Let $\log D$ be the logarithmic domain of convergence of $F$, that is, the closure of the set of $x \in \mathbb{R}^4$ such that $\sum_r a_r \exp(r \cdot x)$ is finite. Since $F$ has nonnegative coefficients, we may use the argument of [15, Theorem 6.3] to see that there will be a minimal point $x(s)$ for which conclusions (1) and (2) hold. Here a minimal point means a point $x$ which is the only intersection of $V$ with the polydisk $\{y : |y_j| \le x_j \ \forall j\}$ in $\mathbb{C}^4$. The minimal point in question will be the exponential $(e^{u_1}, e^{u_2}, e^{u_3}, e^{u_4})$ of the contact point $u$ for the support hyperplane to $\log D$ in direction $s$. In other words, choosing $u$ to maximize $s \cdot u$ on $\log D$ will yield a point $x = \exp(u)$ which is minimal in direction $s$. The third conclusion follows from [15, Theorem 3.5]. ✷

Finding the asymptotics for $n^{-1} \log \, [n\mu, n\nu, n\delta, n\rho] \, F$ is now a matter of locating the minimal point.
Lemma 6.4 Let $s_0$ be the maximizing direction $(\alpha_{**}, \beta_{**}, \rho_{**})$ for $F$. There is a neighborhood $N$ of $s_0$ in $\mathbb{RP}^3$ such that for any $s \in N$, the critical point $x = (x_0, y_0, z_0, \tau_0)$ with maximum value of $h_s$ among those in the support of $[T]$ is in $V_o$ and the coordinates are positive and real.
Proof: Exponentiating $\log D$, we obtain a set $E$ in the positive orthant of $\mathbb{R}^4$ that may be described as follows. First we compute the intersection with the plane $\tau = 0$, or in other words, the positive minimal points of the divisors of $g$. Recall that these divisors are $\{z = 1\}$, $\{x(1 + z) = 1\}$, $\{y(1 + z) = 1\}$ and $\{xy = 1\}$. Let $E'$ denote the connected component of the complement of the coordinate hyperplanes and these four divisors that contains the origin in its closure. It is not hard to describe $E'$: it is the region below the graph of the function $z = \min\{1, 1/x - 1, 1/y - 1\}$. A lower boundary is the square $\{z = 0, 0 \le x \le 1, 0 \le y \le 1\}$, an upper boundary is the square $\{z = 1, 0 \le x \le 1/2, 0 \le y \le 1/2\}$ and there are sloping curved ruled surfaces for the remaining upper boundary defined by The divisor $\{xy = 1\}$ intersects $E'$ only at the point $(1, 1, 0)$ and plays no role in bounding $\log D$.
In $\mathbb{R}^4$, the set $E$ will be a subset of the cylinder $E'' := E' \times [0, \infty)$. In particular, it will be bounded "below" (lowest $\tau$) by $E' \times \{0\}$ and "above" (highest $\tau$) by $t = \psi(x, y, z) := (1 - xy)/(1 + x + y + z + xy - xyz)$. There will be side boundaries at the graph of $\psi$ restricted to the boundaries of $E'$. Now we rule out finding the minimal point at any place other than on the "upper" boundary, $V_f = V_o \cup \gamma$. As long as $s$ is strictly positive, the support hyperplane to $\log D$ normal to $s$ must contact $\log D$ either at a smooth point whose normal is strictly positive or at a non-smooth point whose normals together have positive values in each coordinate. Exponentiating, we see that the minimal point must be on the closure of the "upper" surface, namely the graph of $\psi$ on $E'$. We must now rule out the following places for the minimal point to occur:
1. the graph $E_1$ of $\psi$ on $S_1$;
2. the graph $E_2$ of $\psi$ on $S_2$;
3. the graph $E_3$ of $\psi$ on the upper square, $[0, 1/2] \times [0, 1/2] \times \{1\}$.
To rule out $E_1$, we compute $L(y)$ as $y$ in the graph of $\psi$ in the interior of $E'$ converges to the graph on $E_1$. Recalling that the direction corresponding to such a point on $V \cap E_1$ is given by $v := L f$ there, we have 1 and $n$ for their first and last element respectively (i). We have interpreted $f$ as generating a sequence of blocks between common values; it is well known that the statistics of these blocks are asymptotically independent. In particular, the intersection of two independent random 2-intervals, chosen uniformly from pairs of sizes $(k, k')$, will be almost asymptotically of size N (k/n)N (k′/n) in probability. In other words, when the conclusion of Theorem 6.1 holds, the assertion (1) also holds.

Optimization and algebraic simplification
Everything now rests on verifying (2), namely that the critical point for $F_2(\alpha, \beta, \cdot)$ at $\alpha \cdot \beta$ is actually a maximum. This is mostly one long computer algebra computation. For replicability, we outline the trickier steps. Verification of the maximum can be (and later will have to be) restricted to the diagonal $\alpha = \beta$. But in order also to verify (3) we will need a summation over all $\alpha$ and $\beta$, so we begin with all the variables.
In order to get Maple to halt we had to first use the term ordering tdeg[z,t,alpha,beta,delta,u,v] and then compute a basis in the order plex[z,t,u,alpha,beta,delta,v] for the Groebner basis coming from the previous computation. The resulting Groebner basis has 27 generators, the first of which (it will be the last rather than the first in versions of Maple before Maple 10) is an elimination polynomial for $v$, that is, it contains $v$, $\alpha$, $\beta$ and $\delta$ but not $z$, $u$ or $t$. Factoring out $(v + 1)$ and the constant term $(1 - \alpha - \delta)$, we solve for $v$ to obtain

(i) As an aside: when the minimal point is on $E_1$, the first 2-interval tends to span only $(1 - \Theta(1))n$ of the interval $[n]$; when it is on $E_2$, the second 2-interval is similarly short; when it is on $E_3$, the union of the two 2-intervals is short.
Luckily this can be simplified. First, we find the elimination polynomial for u in terms of v, α, β, δ, which is the third basis element, divided by $(1 + v)^2$:

    upoly := -u*delta*v - u*beta*v + u*v + alpha + delta - u - 1 + 2*alpha*u;

Solving for u and plugging in the value $v_0$ above yields where We then recover a simplified version of $v_0$ by symmetry, switching α and β: Continuing with elimination polynomials, we find that When we set β = α, things become a little simpler. We get This is now manageable, meaning we can take two derivatives in δ. We get a mess, several pages long, but when we evaluate it at $\delta = (1 - \alpha)^2$ (i.e., $\rho = \alpha^2$), the radical becomes a polynomial and we get −8 times the following rational function of α for the second derivative.

This is a symmetrized definition, aimed at making some of the analysis easier without fundamentally changing the problem. We may reason heuristically that the least and greatest elements of a typical interval of G will be respectively O(1) and n − O(1), and that the strong intervals of G are a positive fraction of all intervals of G, so that in some sense, the number of strong δ-intervals of a random permutation should behave like the number of δ-intervals of a random permutation.

Given a permutation G, let us look at the complement $I^c = [n] \setminus I$ of a subset I of [n]. To say that I is a strong 2-interval of [n] is exactly to say that $I^c$ is an independent set of the cycle graph C with edges between each j and (j + 1) mod n. To say that $G^{-1}(I)$ is a strong 2-interval is equivalent to saying that $I^c$ is an independent set of the cycle graph G(C) with edges between G(j) and G((j + 1) mod n). Let H = H(G) denote the graph with vertex set [n] whose edges are the union of the two n-cycles C and G(C). Then a strong 2-interval of G is exactly the complement of an independent set of H(G). There is a natural question, analogous to the question of how many 2-intervals there are in a typical permutation, namely,

Problem: Determine the (quenched) behavior of the random number Z of independent sets of the graph C ∪ G(C).
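For small n the quantity Z can be computed by exhaustive search, which may help in building intuition for the problem. The following is a minimal sketch only: it assumes n ≥ 3, relabels [n] as 0, ..., n − 1, and takes the permutation in vector notation; the function names are illustrative and do not come from the paper.

```python
def union_graph_edges(perm):
    """Edges of H(G) = C ∪ G(C) on vertices 0..n-1 (a 0-indexed copy of [n])."""
    n = len(perm)
    edges = set()
    for j in range(n):
        k = (j + 1) % n
        edges.add(frozenset((j, k)))              # edge {j, j+1 mod n} of C
        edges.add(frozenset((perm[j], perm[k])))  # edge {G(j), G(j+1 mod n)} of G(C)
    return edges


def count_independent_sets(perm):
    """Brute-force count of the independent sets of H(G); cost is exponential in n."""
    n = len(perm)
    edge_masks = [sum(1 << v for v in edge) for edge in union_graph_edges(perm)]
    count = 0
    for subset in range(1 << n):
        # A subset is independent if it never contains both endpoints of an edge.
        if all((subset & mask) != mask for mask in edge_masks):
            count += 1
    return count
```

For the identity permutation, H(G) is just the cycle C, while for the vector (0, 2, 4, 1, 3) the two 5-cycles together form the complete graph on five vertices, so only the empty set and the singletons are independent.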
This appears difficult, and the simplest heuristics are misleading. One might expect that Z is asymptotically log-normal for the following reason. Adding or deleting an edge should change the number of independent sets by a factor of Θ(1). Changing an edge to a different edge should therefore also change the number by such a factor. The randomness in G is the result of Θ(n) random selections, thus there should be a variance of Θ(n) in log Z. We know this reasoning fails for 2-intervals, so it probably fails for strong 2-intervals as well.
Indeed, one may think that for graphs of high girth and average degree d, the log of the number of independent sets is very well approximated by the following mean field heuristic. If n is the number of vertices, then the number of vertex subsets of size k = αn is $\sim C n^{-1/2} \exp\left( n \left[ \alpha \log\frac{1}{\alpha} + (1-\alpha) \log\frac{1}{1-\alpha} \right] \right)$.
Such a set contains $\sim \alpha^2 n^2/2$ pairs, each being an edge of the graph with probability ∼ d/n. A mean field heuristic would say that the probability of a k-set being independent is roughly $\exp(-\alpha^2 d n/2)$.
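Carrying this heuristic through numerically is straightforward. Combining the entropy term for the number of k-sets with the independence probability gives an exponential rate per vertex of α log(1/α) + (1 − α) log(1/(1 − α)) − α²d/2, and setting its derivative to zero gives log((1 − α)/α) = dα. The sketch below solves this by bisection for d = 4; the function names and the choice of bisection are illustrative assumptions, not part of the paper.

```python
import math


def mean_field_rate(alpha, d):
    """Heuristic growth rate per vertex: entropy of k-sets minus alpha^2 * d / 2."""
    entropy = -alpha * math.log(alpha) - (1 - alpha) * math.log(1 - alpha)
    return entropy - alpha * alpha * d / 2


def optimal_alpha(d, tol=1e-12):
    """Solve log((1 - a) / a) = d * a on (0, 1/2) by bisection (assumes d > 0)."""
    lo, hi = 1e-9, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if math.log((1 - mid) / mid) - d * mid > 0:
            lo = mid  # rate is still increasing here, so the optimum is to the right
        else:
            hi = mid
    return (lo + hi) / 2


alpha_star = optimal_alpha(4)
rate_star = mean_field_rate(alpha_star, 4)
print(alpha_star, rate_star)
```

Because the rate function is concave in α, the stationary point found this way is its unique maximum; for d = 4 the printed values should agree with the constants 0.26064... and 0.43786... quoted in the text.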
Multiplying by the number of k-sets and taking the log gives, to leading order, $n \left[ \alpha \log\frac{1}{\alpha} + (1-\alpha) \log\frac{1}{1-\alpha} - \frac{\alpha^2 d}{2} \right]$. For fixed d one may optimize in α. For d = 4, one finds that the optimal α is roughly 0.26064... and that the number of independent sets would then be roughly exp(0.43786... n). For the random graph H this surely fails, since the number of 2-intervals is exponentially lower than this. It appears that d-regularity is a stronger constraint than average degree d. Perhaps a large family of graphs, such as those of average degree d, tends to be subject to the lottery phenomenon, with the typical log number of independent sets falling below that of the mean ≈ exp(0.437n), while the more homogeneous family of d-regular graphs exhibits the quenched behavior even in the mean.

Amid all this speculation, let us prove that the variance of Y must be at least of order $(EY)^2$, ruling out Gaussian behavior. We give a proof for Z instead of Y because the bookkeeping is simpler, with the proof for Y being entirely analogous.

Proof of Lemma 7.2: Let K′ be the subgraph of K induced on the vertices of K at distance at least 2 from both x and y. Each independent set I of K′ has at most $2^{2d+2}$ supersets that are subsets of the vertices of K, and at least one of these is an independent set containing both x and y. ✷

Remark 7.3 This is a bad bound, but on the other hand it is sharp (let K be the union of two stars). What is the right constant for d-regular graphs?
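The two-star example of Remark 7.3 can be checked by brute force for small d. The sketch below assumes that "the union of two stars" means the disjoint union of two copies of the star with d leaves, with x and y taken to be the two centres; that reading, and the function names, are assumptions made for illustration.

```python
from fractions import Fraction


def prob_both_in_independent_set(n, edges, x, y):
    """mu{I : x, y in I}, with mu uniform on independent sets of a graph on 0..n-1."""
    edge_masks = [(1 << a) | (1 << b) for a, b in edges]
    both = (1 << x) | (1 << y)
    total = hits = 0
    for subset in range(1 << n):
        if all((subset & mask) != mask for mask in edge_masks):
            total += 1
            if (subset & both) == both:
                hits += 1
    return Fraction(hits, total)


def two_stars(d):
    """Disjoint union of two stars with d leaves each; centres are 0 and d + 1."""
    edges = [(0, leaf) for leaf in range(1, d + 1)]
    edges += [(d + 1, leaf) for leaf in range(d + 2, 2 * d + 2)]
    return 2 * d + 2, edges, 0, d + 1
```

Under this reading each star has $2^d + 1$ independent sets, exactly one of which contains the centre, so the probability is $1/(2^d + 1)^2$. This stays above the bound $2^{-2d-2}$ of Lemma 7.2 and approaches it to within a factor of 4 as d grows.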
For G ∈ A, define φ(G) to be the permutation G′ such that G′(j + 1) = i′ + 1, G′(j′ + 1) = i + 1, and G′ agrees with G except on j + 1 and j′ + 1. The graph H(φ(G)) is the graph H(G) with two extra edges, namely {i, i′ + 1} and {i′, i + 1}. The set of independent sets of H(φ(G)) is therefore a subset of the set of independent sets of H(G), namely those not containing both endpoints of either new edge. Taking just one new edge into consideration and using Lemma 7.2, we see that the number of independent sets of H(φ(G)) is at most $1 - 2^{-10}$ times the number of independent sets of H(G). On the other hand, the measure of the collection {φ(G) : G ∈ A} is $(1/2)e^{-4} + o(1)$ by reasoning similar to that used in the proof of Proposition 2.1. [The probability of no intervals of size 2 is $e^{-2}$ and the probability of seeing (i, i′, j, j′) as above is $(1/4) \cdot 2e^{-2}$.] Choosing $\delta < 2^{-10}$ completes the proof. ✷

Conclusion
We could also have dealt with a more general problem. We could have allowed gaps of bounded size δ − 1 in the positions and gaps of bounded size γ − 1 in the symbols. We call these (δ, γ)-intervals.
We conjecture that similar results hold for these (δ, γ)-intervals. Let $X^{(\delta,\gamma)}$ be the number of (δ, γ)-intervals of G.

Theorem 7.1 There exists a positive number δ such that there is no number c for which Z ∈ [c, (1 + δ)c] with probability at least 1 − δ. It follows that Z/EZ, which has been shown to be tight, cannot converge to a constant in probability.

Lemma 7.2 Let K be any finite graph with degrees bounded by d and let µ be the probability measure which is uniform on independent sets of K. Then for any non-adjacent vertices x, y of K, $\mu\{I : x, y \in I\} \ge \epsilon_d := 2^{-2d-2}$.
7 A lower variance bound and a related model

Let us call a subset of [n] a strong δ-interval if it intersects all of the intervals of size δ of [n], including cyclic ones, e.g., {n, 1, ..., δ − 1}. For δ = 2, this means a strong 2-interval must be a 2-interval and must also intersect {1, 2}, {1, n} and {n − 1, n}. So, for example, the set {1, 3, 4, 6, 7, 8} is always a 2-interval of [n], n ≥ 8, but is only a strong 2-interval if n = 8 or 9. A strong δ-interval of the permutation G is a set I such that I and $G^{-1}(I)$ are both strong δ-intervals of [n].
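The definition can be tested directly on the example just given. The following is a minimal sketch, with function names chosen for illustration and the permutation supplied in vector notation (G(1), ..., G(n)).

```python
def is_strong_interval(subset, n, delta=2):
    """True if `subset` of {1..n} meets every cyclic window of `delta` consecutive values."""
    members = set(subset)
    for start in range(1, n + 1):
        # Window {start, start+1, ..., start+delta-1}, wrapping around past n.
        window = {(start - 1 + offset) % n + 1 for offset in range(delta)}
        if not (window & members):
            return False
    return True


def is_strong_interval_of_perm(subset, perm, delta=2):
    """Checks that both I and its preimage G^{-1}(I) are strong intervals of [n]."""
    n = len(perm)
    members = set(subset)
    preimage = {pos for pos in range(1, n + 1) if perm[pos - 1] in members}
    return is_strong_interval(members, n, delta) and is_strong_interval(preimage, n, delta)


example = {1, 3, 4, 6, 7, 8}
print([n for n in range(8, 13) if is_strong_interval(example, n)])
```

For n ≥ 10 the window {9, 10} misses the example set, which is why only n = 8 and n = 9 pass the check.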