The number of distinct adjacent pairs in geometrically distributed words: a probabilistic and combinatorial analysis

The analysis of strings of $n$ random variables with geometric distribution has recently attracted renewed interest: Archibald et al. consider the number of distinct adjacent pairs in geometrically distributed words, and obtain the asymptotic ($n\rightarrow\infty$) mean of this number in the cases of different and identical pairs. In this paper we are interested in all asymptotic moments in the identical case, in the asymptotic variance in the different case, and in the asymptotic distribution in both cases. We use two approaches: the first, probabilistic, approach leads to variances in both cases and to some conjectures on all moments in the identical case and on the distribution in both cases. The second, combinatorial, approach relies on multivariate pattern matching techniques, yielding exact formulas for the first and second moments. Our tools include Mellin transforms, analytic combinatorics, and Markov chains.


Introduction
We follow the notation and setup of Archibald et al. (2021). In this earlier work, the authors derived results about the asymptotic mean of the numbers of different and identical pairs in a sequence of geometric random variables. Archibald et al. (2021) give a broad selection of references to the literature, including applications to leader election algorithms, pattern matching in randomly generated words and permutations, gaps in sequences, the design of codes, etc. In the present work, we go far beyond the analysis of the mean numbers of different and identical pairs. We use two approaches, namely a probabilistic approach and a combinatorial approach. We are able to derive results about the asymptotic variance and distribution, and to make conjectures about higher moments. We also derive exact results, using multivariate pattern matching, for the first and second moments. As motivated by Archibald et al. (2021), we consider a string of $n$ independent random variables $Z_1, Z_2, \ldots, Z_n$, with geometric distribution $\mathbb{P}(Z_k = i) = P_i := pq^{i-1}$ for $i \geq 1$. Our eventual aim is to study the consecutive pairs of geometric random variables in this sequence, with a goal of characterizing the asymptotic behavior as $n \to \infty$.
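For concreteness, this model is easy to sample directly. The following sketch (ours, not from the paper; numpy, with an illustrative seed) draws a word of $n$ geometric letters and compares empirical letter frequencies with $P_i = pq^{i-1}$.

```python
import numpy as np

def geometric_word(n, p, rng):
    """Draw a word Z_1..Z_n of i.i.d. Geometric(p) letters on {1, 2, ...},
    with P(Z = i) = p * (1 - p)**(i - 1)."""
    return rng.geometric(p, size=n)

rng = np.random.default_rng(1)
p, n = 0.25, 100_000
word = geometric_word(n, p, rng)

# Compare the empirical frequency of each small letter with P_i = p q^{i-1}.
q = 1 - p
for i in range(1, 5):
    empirical = (word == i).mean()
    theoretical = p * q ** (i - 1)
    print(i, round(empirical, 4), round(theoretical, 4))
```

Note that `numpy`'s geometric distribution is supported on $\{1, 2, \ldots\}$, matching the convention $P_i = pq^{i-1}$ used here.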

Louchard et. al
We use Iverson's notation, namely, for an event $A$, we write $[[A]] = 1$ if event $A$ occurs, and $[[A]] = 0$ otherwise. We want to precisely characterize the distribution of the number of times that $(i, j)$ appears as a consecutive pair in $Z_1, Z_2, \ldots, Z_n$, i.e., the number of $k$'s such that $Z_k = i$ and $Z_{k+1} = j$. So we define $X^{(n)}_{i,j}(m)$ as a Bernoulli random variable that indicates whether the pair $(i, j)$ appears exactly $m$ times in a sequence of $n$ geometric random variables:
$$X^{(n)}_{i,j}(m) := [[\text{pair } (i, j) \text{ appears } m \text{ times in the string of size } n]].$$
It is useful to have a succinct notation for the Bernoulli random variable $X^{(n)}_{i,j}$ that indicates that $(i, j)$ appears at least one time in a sequence of $n$ geometric random variables:
$$X^{(n)}_{i,j} := 1 - X^{(n)}_{i,j}(0) = [[\text{pair } (i, j) \text{ appears at least once in the string of size } n]].$$
Finally, we define $X^{(n)}_1$ as the number of types of matching consecutive pairs (we say "types" because we only pay attention to whether a pair $(i, i)$ occurs or does not occur, i.e., whether it never occurs, or whether it occurs one or more times):
$$X^{(n)}_1 := \sum_{i \geq 1} X^{(n)}_{i,i};$$
$X^{(n)}_2$ is the number of types of any consecutive pairs (different or matching):
$$X^{(n)}_2 := \sum_{i,j \geq 1} X^{(n)}_{i,j};$$
and finally $X^{(n)}_3$ is the number of types of different consecutive pairs that occur:
$$X^{(n)}_3 := \sum_{i \neq j} X^{(n)}_{i,j}.$$
Our methodology is to derive asymptotic expressions for the moments, utilizing Mellin transforms applied to harmonic sums. For context and an in-depth explanation of such techniques, see the nice exposition in Flajolet et al. (1995). One highlight of the precision of this analytic method is that we are able to derive the dominant part of the moments as well as the (tiny) periodic part, in the form of a Fourier series. The paper is organized as follows: In Section 2 we present our main results, that is, asymptotic expressions for the variances of $X^{(n)}_k$, $1 \leq k \leq 3$, and a result concerning the asymptotic independence of the variables $X^{(n)}_{i,i}$, $i \in \mathbb{N}$. In Section 3 we conjecture some stronger forms of asymptotic independence, based on which we are able to derive the limiting distribution and asymptotics of higher moments of $X^{(n)}_1$. Section 4 is devoted to the proofs of these results, and to some considerations in support of a conjectured Gaussian limiting distribution of $X^{(n)}_3$. In Section 5 we use a combinatorial approach to derive exact expressions for the first and second moments of $X^{(n)}_k$, $1 \leq k \leq 3$. In the Appendix, we collect our results pertaining to Mellin transforms.
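As a sanity check on these definitions, the three statistics can be computed directly from a word; a minimal sketch (the function name is ours, not from the paper):

```python
def pair_statistics(word):
    """Return (X1, X2, X3): the numbers of types of identical, arbitrary,
    and different consecutive pairs occurring in the word."""
    pairs = {(word[k], word[k + 1]) for k in range(len(word) - 1)}
    x2 = len(pairs)                            # all distinct pairs
    x1 = sum(1 for i, j in pairs if i == j)    # identical pairs (i, i)
    x3 = x2 - x1                               # different pairs (i, j), i != j
    return x1, x2, x3

# The word 1 1 2 1 1 3 3 has distinct pairs (1,1), (1,2), (2,1), (1,3), (3,3).
print(pair_statistics([1, 1, 2, 1, 1, 3, 3]))  # -> (2, 5, 3)
```

In particular $X^{(n)}_2 = X^{(n)}_1 + X^{(n)}_3$ always holds, since the identical and different pair types partition the occurring pair types.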

Main results
In a private communication, B. Pittel observed that the asymptotic distribution of the number of occurrences of the pair $(i, j)$ is Poisson($\lambda$), where $\lambda = nP_iP_j$.
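This Poisson limit is easy to probe numerically. The following sketch (our own check, not code from the paper) computes the exact occurrence-count distribution of a fixed pair $(i, j)$, $i \neq j$, by dynamic programming over the position of the last letter, and compares it with the Poisson($nP_iP_j$) probabilities.

```python
import math

def count_distribution(n, Pi, Pj, kmax=20):
    """Exact distribution of the number of occurrences of the pair (i, j),
    i != j, in a word of n letters with P(letter = i) = Pi, P(letter = j) = Pj.
    State: (last letter == i, number of occurrences so far, capped at kmax)."""
    dist = [[0.0] * (kmax + 1) for _ in range(2)]
    dist[1][0] = Pi        # first letter is i
    dist[0][0] = 1 - Pi    # first letter is anything else
    for _ in range(n - 1):
        new = [[0.0] * (kmax + 1) for _ in range(2)]
        for at_i in range(2):
            for c in range(kmax + 1):
                w = dist[at_i][c]
                if w == 0.0:
                    continue
                if at_i:  # a following letter j completes an occurrence
                    new[0][min(c + 1, kmax)] += w * Pj
                    new[1][c] += w * Pi
                    new[0][c] += w * (1 - Pi - Pj)
                else:
                    new[1][c] += w * Pi
                    new[0][c] += w * (1 - Pi)
        dist = new
    return [dist[0][c] + dist[1][c] for c in range(kmax + 1)]

n, Pi, Pj = 500, 0.125, 0.0625   # e.g. p = 1/2 and letters i = 3, j = 4
lam = n * Pi * Pj
exact = count_distribution(n, Pi, Pj)
for k in range(6):
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    print(k, round(exact[k], 4), round(poisson, 4))
```

For $i \neq j$ occurrences cannot overlap, so the agreement with the Poisson probabilities is already close at moderate $n$.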

Asymptotics of $\mathbb{E}X^{(n)}_1$ and $\mathbb{E}X^{(n)}_3$ have also recently been obtained by Archibald et al. (2021), using generating functions of the sequences of expectations. One of our main results deals with asymptotics of $\operatorname{Var}X^{(n)}_i$, $1 \leq i \leq 3$, as $n \to \infty$. Our approach simply consists in decomposing $\operatorname{Var}X^{(n)}_1$ into a sum of variances plus a double sum of covariances, and similarly for $\operatorname{Var}X^{(n)}_2$ and $\operatorname{Var}X^{(n)}_3$. This necessitates a thorough investigation of the covariances involved. As it turns out, the main term of $\operatorname{Var}X^{(n)}_1$ is given by a term $S^{(n)}_1 \sim \sum_{i \geq 1}\operatorname{Var}X^{(n)}_{i,i}$, the double sum of covariances only contributing $O(\frac{1}{n})$. This is different for $\operatorname{Var}X^{(n)}_2$, whose main term is a sum of $S^{(n)}_2 \sim \sum_{i,j \geq 1}\operatorname{Var}X^{(n)}_{i,j}$ and another contribution $T^{(n)}_2$. Both are expressed in terms of Fourier series in $\ln(np^2)$. A plot of the constant term of $T^{(n)}_2$ is provided in Figure 1.
A question triggered by the observation that the covariances $\operatorname{Cov}(X^{(n)}_{i,i}, X^{(n)}_{j,j})$, $i \neq j$, contribute negligibly is: How "close to being independent" are $(X^{(n)}_{i,i})_{i \in \mathbb{N}}$? The following theorem provides a partial answer in that regard.
Theorem 2.2 The random variables $X^{(n)}_{i,i}$, $i \in \mathbb{N}$, are asymptotically independent, in the sense that, for any $k \in \mathbb{N}$, any subset $I \subseteq \mathbb{N}$ of size $k$, and any $(x_i)_{i \in I} \in \{0, 1\}^k$ we have
$$\mathbb{P}\Big(\bigcap_{i \in I}\{X^{(n)}_{i,i} = x_i\}\Big) = \prod_{i \in I}\mathbb{P}(X^{(n)}_{i,i} = x_i) + O\Big(\frac{1}{n}\Big),$$
with implied constant depending on $I$ only via $k$.
Remark 2.3 The random variables $(X^{(n)}_{i,i})_{i \geq 1}$ are negatively correlated: for finite $I \subseteq \mathbb{N}$ we have
$$\mathbb{P}\Big(\bigcap_{i \in I}\{X^{(n)}_{i,i} = 1\}\Big) \leq \prod_{i \in I}\mathbb{P}(X^{(n)}_{i,i} = 1),$$
as can easily be deduced from the following theorem.
Theorem (McDiarmid (1992)): Let $V$ and $I$ be finite non-empty sets. Let $(Z_v : v \in V)$ be a family of independent random variables, each taking values in some set containing $I$; and for each $i \in I$, let $F_i$ be a collection of subsets of $V$, where each collection is increasing (meaning that every superset of a set in $F_i$ is also in $F_i$) or each is decreasing (meaning that every subset of a set in $F_i$ is also in $F_i$).

We just have to choose $V := \{1, \ldots, n\}$, and all $F_i$ equal to the collections of subsets of $\{1, \ldots, n\}$ containing two adjacent indices. Cases like the following, for $n = 5$ and $i \neq j$,
$$\mathbb{P}(X^{(5)}_{i,i} = 1)\,\mathbb{P}(X^{(5)}_{j,j} = 1) - \mathbb{P}(X^{(5)}_{i,i} = X^{(5)}_{j,j} = 1) > 0,$$
suggest that the inequality may be strict for $|I| \geq 2$. This is different for the array $(X^{(n)}_{k_i,m_i})$, where both strictly positive and strictly negative correlations can be observed: for $n = 3$ and $i \neq j$, positive correlation holds for $P_i$, $P_j$ small enough, and for different pairs $(k_i, m_i)_{i \in I}$, with $|I| \geq n$, we clearly have negative correlation.

Further conjectures and results for pairs of identical letters

Higher moments
The proof of Theorem 2.1 (see Lemma 4.8) shows that $X^{(n)}_1$ is asymptotically distributed as $\sum_{i \geq 1}[[\xi^{(n)}_i \geq 1]]$, a sum of independent random variables, with $\xi^{(n)}_i \sim$ Poisson($np^2q^{2(i-1)}$) independent. This leads us to the following conjecture.
Conjecture 3.1 For any $k \in \mathbb{N}$ we have $\lim_{n \to \infty}\big(\mathbb{E}|X^{(n)}_1 - \mathbb{E}X^{(n)}_1|^k - \mathbb{E}\big|\sum_{i \geq 1}[[\xi^{(n)}_i \geq 1]] - \mathbb{E}\sum_{i \geq 1}[[\xi^{(n)}_i \geq 1]]\big|^k\big) = 0$, where, using $L = \ln(1/q)$ again, the asymptotics of $V^{(n)}_j$, $j \geq 1$, are given by (9).

Now let
where the asymptotics of the inner sum can be obtained using $G(knp^2)$ from Appendix A.1, leading to (9). Finally, the cumulants $\kappa^{(n)}_m$ are found by extracting coefficients of $\theta^m$ from $S_n(\theta)$, and are given by finite linear combinations of the $(V^{(n)}_j)_{j \geq 1}$, as stated in (8).
The cumulants now allow for computation of moments. The mean of $X^{(n)}_1$ so obtained is identical to (Archibald et al., 2021, Thm. 2), see also (11). Our approach here is simple and general. Note that the mean does not rely on the status of Conjecture 3.1: the mean computation actually depends only on Lemma 4.3. Similarly, the variance of $X^{(n)}_1$ can be computed. After some algebra, we verify that this is identical to Theorem 2.1.

Limiting distribution
A conjecture weaker than Conjecture 3.1 is the following.

Conjecture 3.4 For any $t \in \mathbb{R}$, the limit $\lim_{n \to \infty}\mathbb{P}(X^{(n)}_1 \leq t\,\cdots)$ exists and is given by (10).
A simulation with $p = 1/4$ and 50000 simulated words for each $n \in \{10000, 11547, 13333, 15396\}$ is given in Figure 2. The fit is excellent. A corresponding table of observed and theoretical nonperiodic mean and variance in the equal pairs case (as well as another table for the unequal pairs case) is given below, all results rounded to 3 decimal places. We denote by $\bar{X}^{(n)}$ and $S^2$ the sample mean and unbiased sample variance of a sample $(X^{(n),i}_j)_{i=1}^N$. See Theorem 3.7 for asymptotics of $\mathbb{E}X^{(n)}_1$ and $\mathbb{E}X^{(n)}_3$. The sample size $N$ for each row in the left table is 50000, and in the right table it is 200000; see also Figure 3. (One surviving row of the right table reads: observed mean 750.195, theoretical mean 750.198, observed variance 129.889, theoretical variance 130.053.)

Remark 3.6 Here we briefly sketch how we obtained the graph of $f$ in Figure 2, where $p = 1/4$. As before, we use random variables $\xi^{(n)}_i$ distributed Poisson($np^2q^{2(i-1)}$), but now there is such a random variable for each $i \in \mathbb{Z}$ and each real $n > 0$. For fixed such $n$, the random variables $(\xi^{(n)}_i)_{i \in \mathbb{Z}}$ are assumed independent, and the definition of $\xi^{(n)}$ is also used for real $n > 0$. We use $i^* = i^*(n) = \ln(np^2/q^2)/(2L)$ again. For any $n$ satisfying $i^* + \eta \in \mathbb{Z}$, we have the corresponding representation, where for $n = \nu = \nu(\eta) := q^{2(1-\eta)}/p^2$ we have $i^* + \eta = 0$, and $\xi^{(\nu)}_i \sim$ Poisson($q^{2(i-\eta)}$). We want a good approximation of $f(\eta)$ only for $\eta \in [-3, 5]$. For such $\eta$, up to an error smaller than $10^{-6}$, $f(\eta)$ is given by a finite expression where, for each fixed $\eta$, the latter coefficient can easily be computed using Maple.
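The simulation just described is straightforward to reproduce in outline. The following sketch (ours, on a smaller scale than the paper's 50000 words per $n$; seeded numpy) compares the sample mean of $X^{(n)}_1$ with the nonperiodic approximation $\sum_{i \geq 1}(1 - e^{-nP_i^2})$ mentioned in Remark 4.10.

```python
import numpy as np

def x1(word):
    """Number of types of identical consecutive pairs (i, i) in the word."""
    return len({(a, b) for a, b in zip(word, word[1:]) if a == b})

rng = np.random.default_rng(7)
p, q, n, n_words = 0.25, 0.75, 10_000, 200

sample = [x1(rng.geometric(p, size=n).tolist()) for _ in range(n_words)]
observed = float(np.mean(sample))

# Nonperiodic approximation of E X_1^{(n)}: sum over i of 1 - exp(-n P_i^2).
theoretical = sum(1 - np.exp(-n * (p * q ** (i - 1)) ** 2)
                  for i in range(1, 200))
print(round(observed, 3), round(theoretical, 3))
```

With $np^2 = 625$ the first dozen or so letters almost surely produce a matching pair, so both numbers land near the transition region of the sum, around $\ln(np^2)/(2L)$.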
Two pairs (i, i) and (r, r) of identical letters

The proofs of the theorems rest upon calculation of probabilities of avoiding certain pairs, which we will be doing by employing Markov chains. To illustrate that approach, we consider in greater detail the case of avoiding two fixed pairs $(i, i)$ and $(r, r)$, where $i \neq r$, in a sequence of length $n$. No distinction of letters different from $i, r$ is necessary, so for our Markov chain we can use a finite state space $S := \{e, i, r, \Delta\}$, where $e := \mathbb{N}\setminus\{i, r\}$ stands for "everything else", i.e., the set $\mathbb{N}\setminus\{i, r\}$ is lumped together, and $\Delta$ denotes an additional cemetery state. We set $y_k := \phi(z_k)$, where $\phi(z_k) := \Delta$ if for some $j < k$ we have $(z_j, z_{j+1}) \in \{(i, i), (r, r)\}$, and otherwise $\phi(z_k) := z_k$ if $z_k \in \{i, r\}$ and $\phi(z_k) := e$ else.
Those trajectories $(y_k)_{k=0}^n$ satisfying $y_n \neq \Delta$ are in correspondence with sequences $(z_k)_{k=1}^n$ that avoid the pairs $(i, i)$ and $(r, r)$. Using the transition matrix
$$\Pi := \begin{pmatrix} P_e & P_i & P_r & 0 \\ P_e & 0 & P_r & P_i \\ P_e & P_i & 0 & P_r \\ 0 & 0 & 0 & 1 \end{pmatrix},$$
where $P_e := 1 - P_i - P_r$, the initial probability $\pi(\cdot) := [1, 0, 0]$, and the column vector of all ones $\mathbf{1}$, that probability is $\pi\,\tilde{\Pi}^n\mathbf{1}$, where $\tilde{\Pi}$ denotes the substochastic matrix obtained from $\Pi$ by deleting the row and column of $\Delta$. A bound on such a probability will now be derived in the following more general context. We fix a finite non-empty set of forbidden pairs $I := \{(k_i, m_i) : i \in I\}$ and let $J := \bigcup_{i \in I}\{k_i, m_i\}$, where $j_1 < \ldots < j_{|J|}$. Moreover we fix $0 < \delta \leq 1/2$ and let $D^J_\delta := \{(P_j)_{j \in J} : P_j > 0 \text{ for } j \in J, \; P_e := 1 - \sum_{j \in J}P_j \geq \delta\}$.

Lemma 4.1 Let $\varepsilon := \sum_{i \in I} P_{k_i}P_{m_i}$. Then the bound (13) holds for $(P_j)_{j \in J} \in D^J_\delta$. Furthermore, there are functions $\lambda_1$, $C_1$ and $\Phi_n$, $n \geq 1$, depending on $P_j$, $j \in J$, that are $C^\infty$ and positive on an open set $F$ satisfying $D^J_\delta \subseteq F$, such that (14) holds.

Remark 4.2 At several places we take the liberty to regard $(P_j)_{j \geq 1}$ as variables (which is a slight abuse of notation), to the effect that several results in this section hold more generally also for strings of random variables with a distribution different from the geometric. The reader must be prepared to see expressions involving $\lim_{P_j \to 0}$, $\frac{\partial}{\partial P_j}$, and functions of $(P_j)_{j \in J}$ being $C^\infty$ in some domain, etc., all the time. In particular, we allow $(P_j)_{j \in J}$ to vary within the set $D^J_\delta$ above, which is a proper subset of the unit simplex of dimension $|J|$, because some of our results require $P_e$ to be bounded away from zero.
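The transfer-matrix computation above is easy to verify numerically. The following sketch (our own toy check, not code from the paper) builds the substochastic matrix on the surviving states $\{e, i, r\}$ and compares $\pi\tilde{\Pi}^n\mathbf{1}$ with a brute-force enumeration over the lumped three-letter alphabet.

```python
import itertools
import numpy as np

def avoidance_probability(n, Pi, Pr):
    """P(no (i,i) and no (r,r) adjacent pair among n letters), via the
    substochastic transfer matrix on the surviving states {e, i, r}."""
    Pe = 1 - Pi - Pr
    T = np.array([[Pe, Pi, Pr],    # from e
                  [Pe, 0., Pr],    # from i: the transition i -> i is forbidden
                  [Pe, Pi, 0.]])   # from r: the transition r -> r is forbidden
    pi0 = np.array([1.0, 0.0, 0.0])  # start in an artificial state e
    return pi0 @ np.linalg.matrix_power(T, n) @ np.ones(3)

def brute_force(n, Pi, Pr):
    """Sum the probabilities of all words over the lumped alphabet {e, i, r}
    that avoid the pairs (i, i) and (r, r)."""
    prob = {'e': 1 - Pi - Pr, 'i': Pi, 'r': Pr}
    total = 0.0
    for word in itertools.product('eir', repeat=n):
        if any(a == b != 'e' for a, b in zip(word, word[1:])):
            continue
        total += np.prod([prob[c] for c in word])
    return total

print(avoidance_probability(6, 0.5, 0.25), brute_force(6, 0.5, 0.25))
```

The two values agree exactly (up to floating point), since the lumped chain is an exact description of the avoidance event, not an approximation.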
Proof of Lemma 4.1. Assume $P_j > 0$ for $j \in J$, as well as $P_e := 1 - \sum_{j \in J}P_j \geq \delta$. Note that $\varepsilon \leq \sum_{j,\ell \in J}P_jP_\ell \leq (1-\delta)^2$. Define the matrix $\Pi$ with rows and columns indexed by the set $J \cup \{e\}$ (which we assume ordered, starting with $e$ and followed by the elements of $J$ in ascending order) via
$$\Pi_{k,m} := \begin{cases} 0, & (k, m) \in I, \\ P_m, & \text{else.} \end{cases}$$
We define a row vector $w := [\sqrt{P_e}, \sqrt{P_{j_1}}, \ldots, \sqrt{P_{j_{|J|}}}]$, satisfying $\|w\|_2 = 1$, a diagonal matrix $S := \operatorname{Diag}(w)$, and a further matrix built from the column vectors $e_j$, $j \in J \cup \{e\}$, which denote the standard unit vectors in $\mathbb{R}^{|J|+1}$; we observe, using the identity $\sum_{k,m}\cdots = 1 - \sum_{i \in I}P_{k_i}P_{m_i}$ and $\pi = (e_e)^t$, the corresponding expression for the avoidance probability. Observe that $\Pi$ is non-negative and primitive; therefore, by the Perron–Frobenius theorem (see Seneta (1981)), there is a unique positive eigenvalue $\lambda_1$ that is strictly larger in modulus than any other eigenvalue, with corresponding strictly positive left and right eigenvectors $u$ and $v$, such that $\Pi^n$ is dominated by $\lambda_1^n\frac{vu}{uv}$, where $\lambda_2$ is an eigenvalue of second largest modulus. This leads to (14). By setting one or more of $(P_j)_{j \in J}$ to zero, one or more of the non-dominant eigenvalues $(\lambda_k)_{k \geq 2}$ become zero, but there is a non-negative primitive submatrix constructed from the non-zero columns (and corresponding rows) of $\Pi$, guaranteeing a unique positive eigenvalue larger in modulus than all other eigenvalues. As the row and column corresponding to state $e$ will always be part of that submatrix, the first components $u_e$ and $v_e$ of $u$ and of $v$ will be positive. By continuity, these properties also hold in a neighbourhood of such $(P_j)_{j \in J}$, which yields that $\lambda_1$ is $C^\infty$ in some open superset $\tilde{F}$ of $D^J_\delta$, by the implicit function theorem, using the facts that the characteristic polynomial $p(\lambda)$ of $\Pi$, considered as a function of $(\lambda, (P_j)_{j \in J})$, is $C^\infty$, and that the derivative of $p(\lambda)$ evaluated at a simple zero $\lambda_1$ is non-zero. On the set $\tilde{F}$, the components of $\frac{1}{u_e}u$ and $\frac{1}{v_e}v$ are $C^\infty$ functions of $(P_j)_{j \in J}$ as well. We let $C_1$ and $\Phi_n$ be defined accordingly. Those are positive $C^\infty$ functions of $(P_j)_{j \in J}$ on an open set $F$, satisfying $D^J_\delta \subseteq F \subseteq \tilde{F}$, the further restriction made necessary by the need to avoid $uv \leq 0$, which may occur for $(P_j)_{j \in J}$ outside $D^J_\delta$. Note that primitivity of $\Pi$ may cease to hold when $P_e = 0$. Moreover note that $|\lambda_2|$ is continuous on $D^J_\delta$, but need not be differentiable on that set.
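Numerically, the Perron–Frobenius representation $\pi\Pi^n\mathbf{1} \approx C_1\lambda_1^n$ is easy to check with an eigendecomposition; a sketch under our own small example (numpy):

```python
import numpy as np

Pi, Pr = 0.3, 0.2
Pe = 1 - Pi - Pr
T = np.array([[Pe, Pi, Pr],
              [Pe, 0., Pr],
              [Pe, Pi, 0.]])  # avoid the pairs (i, i) and (r, r)

# Dominant eigenvalue lambda_1 with right/left eigenvectors v and u.
vals, right = np.linalg.eig(T)
k = np.argmax(vals.real)
lam1 = vals[k].real
v = right[:, k].real
vals_left, left = np.linalg.eig(T.T)
u = left[:, np.argmax(vals_left.real)].real

pi0, ones = np.array([1.0, 0.0, 0.0]), np.ones(3)
# C_1 = (pi . v)(u . 1)/(u . v); invariant under sign flips of u, v.
C1 = (pi0 @ v) * (u @ ones) / (u @ v)

n = 200
exact = pi0 @ np.linalg.matrix_power(T, n) @ ones
print(exact, C1 * lam1 ** n)
```

Because the subdominant eigenvalues are much smaller in modulus, the two numbers agree to many digits already at moderate $n$.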
The bound (13) fits our needs when $\varepsilon$ is large. Equation (14) is useful in the case of small $\varepsilon$, if asymptotics of $\lambda_1$, $C_1$ and $\Phi_n$ are known. In order to derive such asymptotics, we let $\bar{\Pi}$ be the matrix obtained from $\Pi$ by deleting the row and column corresponding to state $e$. Left and right eigenvectors $u = [1, \beta]$ and $v = [1/P_e, \mu^t]^t$, with row vector $\beta = (\beta_j)_{j \in J}$ and column vector $\mu = (\mu_j)_{j \in J}$, corresponding to the dominant eigenvalue $\lambda_1$ of $\Pi$, lead to equations with row vector $p = (P_j)_{j \in J}$, with ascending order of indices in $\beta$, $\mu$, $p$. We keep denoting the column vector of all ones of appropriate dimension by $\mathbf{1}$, and express $C_1$ in terms of $\beta$ and $\mu$. Asymptotics up to any fixed order $K$ of $\lambda_1$, $\beta$, $\mu$ are conveniently computed via fixed point iteration, as described by the following algorithm.

Algorithm 1 Calculate asymptotics of $\lambda_1$, $\beta$, $\mu$ up to fixed order.
The output $\hat\lambda$, $\hat\beta$, $\hat\mu$ of the algorithm then satisfies the stated error bounds. Here and in the following, the notation $O^*_k$ always refers to the variables $(P_j)_{j \in J}$, but not to $P_e$. So, for instance, $O^*_4$ is the same as $O(\gamma^4)$, where $\gamma = \sum_{j \in J}P_j$. A few words on the justification of the algorithm: First note that nothing changes if the line $\hat\lambda \leftarrow P_e[1 + \hat\beta\mathbf{1}]$ is replaced by $\hat\lambda \leftarrow P_e[1 + p\hat\mu]$. This is seen to hold for $k = 0$, where $\hat\beta = p$ has already been updated, but $\hat\mu = \mathbf{1}$ has not, and for $k > 0$ by a simple induction step. We can thus see Algorithm 1 as a combination of two algorithms, one of them only updating the pair $(\hat\beta, \hat\lambda)$, the other only updating the pair $(\hat\lambda, \hat\mu)$, with those algorithms having identical updates of $\hat\lambda$. Let us concentrate on the latter algorithm. Denote $x = (\lambda, \mu)$ and let $\mathbf{0}$ be the zero vector of appropriate dimension. Observe that, by the implicit function theorem, there is a unique $C^\infty$ function $x(p)$, defined in some neighbourhood $V$ of $p = \mathbf{0}$, satisfying $x(\mathbf{0}) = x_0$ and $F(x(p), p) = 0$ for $p \in V$. Denoting iterates by $x_k$, the claimed error bounds follow. The next lemma provides asymptotics of probabilities in the case of a single avoided pair.
Lemma 4.3 The probabilities of avoiding the pair $(i, i)$, resp. $(i, r)$ for $i \neq r$, in a sequence of length $n$ satisfy (19), resp. (20), as $n \to \infty$, uniformly for $P_i \in D^{\{i\}}_\delta$, resp. $(P_i, P_r) \in D^{\{i,r\}}_\delta$.

Proof. We first consider the forbidden pair $(i, i)$. The matrix is
$$\Pi = \begin{pmatrix} P_e & P_i \\ P_e & 0 \end{pmatrix},$$
and its characteristic polynomial $p$ and the asymptotics of $\lambda_1$ and $C_1$ are obtained using Algorithm 1 (with $K = 4$) and (18). Following a suggestion by Salvy, we can easily derive $\lambda_1$ from $p(\lambda)$, after replacing $P_e$ by $1 - P_i$. We add an extra variable $v$, carrying the weight of the $P_\cdot$: $P_\cdot := vP_\cdot$. We obtain the local expansion of the solution at 0 by using the Maple package gfun (see Salvy and Zimmermann (1994)), where pr denotes the precision of the expansion in $v$. We obtain the solutions as sol[1], sol[2], and we keep the solution close to 1.
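The same expansion can be checked with any computer algebra system; here is a sympy rendering (ours, standing in for the Maple/gfun computation): the survival matrix on states $\{e, i\}$ has characteristic polynomial $\lambda^2 - (1 - P_i)\lambda - (1 - P_i)P_i$, and its dominant root expands as $\lambda_1 = 1 - P_i^2 + P_i^3 + O(P_i^4)$.

```python
import sympy as sp

P, lam = sp.symbols('P lam', positive=True)

# Characteristic polynomial of the 2x2 survival matrix [[1-P, P], [1-P, 0]]
# for the single forbidden pair (i, i), with P_e = 1 - P.
charpoly = lam**2 - (1 - P) * lam - (1 - P) * P

r1, r2 = sp.solve(charpoly, lam)
# Keep the root that tends to 1 as P -> 0 (the dominant eigenvalue).
lam1 = r1 if sp.limit(r1, P, 0) == 1 else r2

series = sp.expand(sp.series(lam1, P, 0, 4).removeO())
print(series)  # equals 1 - P**2 + P**3
```

Note $\ln\lambda_1 = -P_i^2 + P_i^3 + O(P_i^4)$, consistent with the factor $e^{-n(P_i^2 - P_i^3)}$ appearing later in Section 4.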
The resulting estimate holds uniformly in $n$. This is used in (14), together with $C_1 = 1 + O(P_i^2)$, leading to the claimed estimate of $\mathbb{P}(X^{(n)}_{i,i} = 0)$. Note that for fixed $\alpha, \beta > 0$ the function $x^\alpha e^{-\beta x}$ is bounded for $x > 0$; the bound (13) can then be built in, and we have thus obtained (19).
We now consider the forbidden pair $(i, r)$ with $i \neq r$. The matrix is
$$\Pi = \begin{pmatrix} P_e & P_i & P_r \\ P_e & P_i & 0 \\ P_e & P_i & P_r \end{pmatrix}.$$
Clearly $\lambda_1$, and therefore also $C_1$ and $\Phi_n$, are $C^\infty$ functions of the coefficient $P_iP_r$ of the characteristic polynomial $p$, meaning that the error term $O^*_8$ is in fact $O(P_i^4P_r^4)$. Sufficiently accurate for our purposes are the asymptotics $\lambda_1 = 1 - P_iP_r + O(P_i^2P_r^2)$ and $C_1 = 1 + O(P_iP_r)$. One of the eigenvalues is 0; therefore a representation $\pi\Pi^n\mathbf{1} = C_1\lambda_1^n + C_2\lambda_2^n$ as before also holds in this case, with $C_2 = O(P_iP_r)$, leading to (20) via (14), taking care of error terms as above.
Corollary 4.4 The variances of $X^{(n)}_{i,i}$, resp. $X^{(n)}_{i,r}$, satisfy the stated asymptotics as $n \to \infty$, uniformly for $P_i \in D^{\{i\}}_\delta$, resp. for $(P_i, P_r) \in D^{\{i,r\}}_\delta$.
In order to obtain asymptotics for the covariance $\operatorname{Cov}(X^{(n)}_{i,i}, X^{(n)}_{r,r})$, i.e., for $\mathbb{P}(X^{(n)}_{i,i} = 0, X^{(n)}_{r,r} = 0) - \mathbb{P}(X^{(n)}_{i,i} = 0)\mathbb{P}(X^{(n)}_{r,r} = 0)$, we need the following result.

Lemma 4.5 Let $A \in \mathbb{R}^{k \times k}$, with $k \geq 2$, have spectral radius $\rho(A) \leq 1$ and Frobenius norm $\|A\|_F = C'$. Then the stated bound holds with $C := \max(C', k)$ and $C'' := 2C^{k-1}$.

Proof. We use Schur decomposition, according to which there is a unitary matrix $Q$ such that $\bar{A} := QAQ^{-1}$ is upper triangular and satisfies $\rho(\bar{A}) = \rho(A)$ and $\|\bar{A}\|_F = \|A\|_F$. Then the corresponding bound also holds for $\bar{A}$, as we now show. Note that $(\bar{A}^n)_{i,i+\ell}$ is a sum of products $\bar{a}_{i_0,i_1}\bar{a}_{i_1,i_2}\cdots\bar{a}_{i_{n-1},i_n}$, where the sum extends over all sequences $(i_k)_{k=0}^n$ that are increasing with $i_0 = i$ and $i_n = i + \ell$. Such a sequence has at least one and at most $\ell$ jumps. For $j$ satisfying $1 \leq j \leq \ell$, there are $\binom{\ell-1}{j-1}$ ways to accommodate $j$ jump heights $(h_m)_{m=1}^j$, and for each of those there are $\binom{n}{j}$ ways to position those $j$ jumps. In terms of cumulated jump heights, the generic summand factors as claimed, where $\bar{a}$ is a product of $n - j$ diagonal elements of $\bar{A}$, and therefore satisfies $|\bar{a}| \leq 1$. Furthermore, by observing that the product is increasing, we can extend the estimate (23).

We now turn to asymptotics of covariances.

Lemma 4.6 For $i \neq r$ and $(P_i, P_r) \in D^{\{i,r\}}_\delta$ we have, for $nP_iP_r(P_i + P_r)^2 = O(1)$, the stated expansion of the covariance.

Proof. We first find asymptotics of $\lambda_1$ and $C_1$ from the avoidance probability, proceeding as in the previous lemma. The matrix is
$$\Pi = \begin{pmatrix} P_e & P_i & P_r \\ P_e & 0 & P_r \\ P_e & P_i & 0 \end{pmatrix}.$$
Again, we can also replace $P_e$ by $1 - P_i - P_r$ and use gfun. From Lemma 4.1 we know that $\lambda_1$, $C_1$ and $\Phi_n$ are $C^\infty$ for $(P_i, P_r) \in D^{\{i,r\}}_\delta$. In fact, we will only need that those functions are $C^2$ in the following. Note that $\mathbb{P}(X^{(n)}_{i,i} = 0)$ can be obtained from (25) as the limiting case $P_r \to 0$.
Observe that we have
$$\lim_{P_i \to 0}\frac{\Phi_n(P_i, P_r)}{\Phi_n(0, P_r)\Phi_n(P_i, 0)} = \lim_{P_r \to 0}\frac{\Phi_n(P_i, P_r)}{\Phi_n(0, P_r)\Phi_n(P_i, 0)} = 1,$$
and therefore $\frac{C_1(P_i,P_r)}{C_1(0,P_r)C_1(P_i,0)} = 1 + O(P_iP_r)$ and $\frac{\Phi_n(P_i,P_r)}{\Phi_n(0,P_r)\Phi_n(P_i,0)} = 1 + O(P_iP_r)$. To see that the latter holds uniformly in $n$ and $(P_i, P_r) \in D^{\{i,r\}}_\delta$, we start by defining $\tilde{\Pi} := \Pi - \lambda_1\frac{vu}{uv}$, so that $\Pi = \lambda_1\frac{vu}{uv} + \tilde{\Pi}$, where we used that $u$ and $v$ are in the left resp. right kernel of the matrix $\tilde{\Pi}$.
From (13) we derive a bound on $\operatorname{Cov}(X^{(n)}_{i,i}, X^{(n)}_{r,r})$, which, together with (24), implies the next corollary, since the exponential factor is small for $nP_iP_r(P_i + P_r)^2 = \Omega(1)$.
Corollary 4.7 For $i \neq r$, the covariance of $X^{(n)}_{i,i}$ and $X^{(n)}_{r,r}$ satisfies the stated asymptotics as $n \to \infty$, uniformly.

In this subsection we use the results on variances and covariances in the case of avoided pairs of identical letters, that we have derived so far, to furnish a proof of equation (4) of Theorem 2.1.
Lemma 4.8 The variance of $X^{(n)}_1$ is asymptotically given by $S^{(n)}_1$ given in (1). In particular, the contribution of covariances is negligible.
Proof. Dealing with covariances first, note that (27) guarantees that the double sum of covariances $\sum_{i \neq r}\operatorname{Cov}(X^{(n)}_{i,i}, X^{(n)}_{r,r})$ makes a negligible contribution to the variance of $X^{(n)}_1$. This follows from the following general result: If for some $c < 1$ a set $P = \{x_i : i \in \mathbb{N}\}$ satisfies $x_i > 0$ and $\frac{x_{i+1}}{x_i} \leq c$ for $i \in \mathbb{N}$, then $\sum_{x \in P} x^\alpha e^{-x} < \infty$. For a proof, observe that there is a constant dominating the summands by a geometric series. With the help of (28) we bound $\sum_{i \geq 1}\sum_{r \geq 1}$, and this leads to the bound on $\sum_{i \neq r}\operatorname{Cov}(X^{(n)}_{i,i}, X^{(n)}_{r,r})$.

We now turn to $\sum_{i \geq 1}\operatorname{Var}X^{(n)}_{i,i}$. Observe that the sum of error terms from (21) is controlled by (28). Therefore, up to an error term $O(\frac{1}{\sqrt{n}})$, the variance $\operatorname{Var}X^{(n)}_1$ is given by a harmonic sum, which can be evaluated using $G$ from Appendix A.1, directly leading to $S^{(n)}_1$ from (1).

Contribution of covariances to the variance of $X^{(n)}_2$
In this subsection we will prove the following lemma, which will also imply equations (5) and (6) of Theorem 2.1.
Lemma 4.9 The variance of $X^{(n)}_2$ is asymptotically given by $\operatorname{Var}X^{(n)}_2 \sim S^{(n)}_2 + T^{(n)}_2$, where $H(i, j, k) = (e^{nP_iP_jP_k} - 1)e^{-nP_iP_j - nP_jP_k}$, and $S^{(n)}_2$ and $T^{(n)}_2$ are given in (2) and (3). Only covariances $\operatorname{Cov}(X^{(n)}_{i,j}, X^{(n)}_{j,k})$, with $i, j, k$ all different, and $\operatorname{Cov}(X^{(n)}_{i,j}, X^{(n)}_{j,i})$ contribute to the main term.

Proof. We start considering distinct forbidden pairs $(i_1, j_1)$, $(i_2, j_2)$, where we allow $i_1 \neq j_1$ or $i_2 \neq j_2$ or both, and are again interested in negligibility of covariance contributions. Let $J := \{i_1, j_1, i_2, j_2\}$, and assume $P_i > 0$ for $i \in J$, as well as $P_e := 1 - \sum_{i \in J}P_i \geq \delta$. Define the matrix $\Pi$ with rows and columns indexed by the set $J \cup \{e\}$ (which we assume ordered, starting with $e$ and followed by the elements of $J$ in ascending order) via
$$\Pi_{i,j} := \begin{cases} 0, & (i, j) \in \{(i_1, j_1), (i_2, j_2)\}, \\ P_j, & \text{else.} \end{cases}$$
We will have to distinguish several cases, which however share some common features: the sought probability can be expressed as in (14), where, as previously observed, $\lambda_1$, $C_1$ and $\Phi_n$ for $n \geq 1$ are $C^\infty$ functions on an open superset of $D^J_\delta$. Limits $\lim_{n\to\infty}\Phi_n = 1$, $\lim_{n\to\infty}\frac{\partial\Phi_n}{\partial P_{i_1}} = 0$, etc., will again be uniform for $(P_j)_{j \in J} \in D^J_\delta$, with implied constant independent of $n$. (This independence can be shown as in the proof of Lemma 4.6.) As we will see, more accurate representations for $\lambda_1$, complementing those obtained by Algorithm 1, can always be found in a form where $Q = O^*_3$ and $Q \geq 0$.
We will observe that, in each of the cases, $Q = \sum_{i,r,t\,:\,(i,r),(r,t) \in \{(i_1,j_1),(i_2,j_2)\}} P_iP_rP_t$; depending on whether $(i_1, j_1) = (i, i)$ or $(i, r)$, we have $Q^* = P_i^3$ or $Q^* = 0$, and similarly for $Q^\bullet$ (see the proof of Lemma 4.3). In most of the cases we will obtain the refined estimate, where the error term needs justification in each of these cases. In some cases this is done by employing the MVT, as in the proof of Lemma 4.6. This results in the following expression for a quotient of probabilities, which directly leads to an expression for the covariance, valid for $nP_{i_1}P_{j_1}P_{i_2}P_{j_2} = O(1)$. It will turn out that in some of the cases we have $Q = 0$. In cases where $Q > 0$ we always have $Q = O\big(\varepsilon\sum_{i \in J}P_i\big)$ and $Q \leq \frac{1-\delta}{2}\varepsilon$, with $\varepsilon := P_{i_1}P_{j_1} + P_{i_2}P_{j_2}$. Using the latter, and (13), we control the small-$\varepsilon$ regime. In case of $nP_{i_1}P_{j_1}P_{i_2}P_{j_2} = \Omega(1)$ we use (13) to bound the covariance directly, and all this results in the stated expansion of the covariance.

We distinguish the following cases; only Cases 1, 5 and 6 involve $Q \neq 0$, and Case 6 deviates slightly from the general pattern outlined above.
Case 1: Pairs $(i, r)$, $(r, t)$ with $i, r, t$ all different. The matrix $\Pi$ and its characteristic polynomial $p$ are given by
$$\Pi = \begin{pmatrix} P_e & P_i & P_r & P_t \\ P_e & P_i & 0 & P_t \\ P_e & P_i & P_r & 0 \\ P_e & P_i & P_r & P_t \end{pmatrix}, \qquad p(\lambda) = \lambda^4 - \lambda^3 + P_r(P_i + P_t)\lambda^2 - P_iP_rP_t\lambda.$$
Using Algorithm 1 and (18), we obtain $\lambda_1 = 1 - P_iP_r - P_rP_t + P_iP_rP_t + O^*_4$ and $C_1 = 1 + P_iP_r + P_rP_t - 2P_iP_rP_t + O^*_4$. We can see that $\lambda_1 = 1 - P_iP_r - P_rP_t + P_iP_rP_t + P_r^2O^*_2$ holds, by noting that $\lambda_1$ is a $C^\infty$ function of the coefficients $P_r(P_i + P_t)$ and $-P_iP_rP_t$ of the polynomial $p$, and terms of order 2 or higher contribute $P_r^2O^*_2$. Thus, by the MVT, for some $0 < p_i < P_i$, $0 < p_t < P_t$, the required estimate follows. So (31) is established with $Q = P_iP_rP_t$, which indeed satisfies $Q \leq \frac{1}{4}P_r(P_i + P_t) \leq \frac{1-\delta}{2}\varepsilon$, since $\delta \leq \frac{1}{2}$.

Case 2a: Pairs $(i, r)$, $(i, t)$ with $i, r, t$ all different. The matrix is
$$\Pi = \begin{pmatrix} P_e & P_i & P_r & P_t \\ P_e & P_i & 0 & 0 \\ P_e & P_i & P_r & P_t \\ P_e & P_i & P_r & P_t \end{pmatrix}.$$
Again, $\lambda_1$ is a $C^\infty$ function of the coefficient $P_i(P_r + P_t)$, leading to $\lambda_1 = 1 - P_iP_r - P_iP_t + P_i^2O^*_2$, which we use to derive $\ln\big(\frac{\lambda_1}{\lambda^*\lambda^\bullet}\big) = P_rP_t\frac{\partial^2\ln\lambda_1}{\partial P_r\partial P_t}(p_r, p_t) = O(P_rP_tP_i^2)$, yielding (31) with $Q = 0$.

Case 2b: Pairs $(r, i)$, $(t, i)$ with $i, r, t$ all different. Here the matrix (call it $\Pi_b$) can be seen to be a similarity transformation, involving diagonal matrices, of the transposed matrix (call it $\Pi_a$) of Case 2a; more precisely, with $p := \pi\Pi = [P_e, (P_i)_{i \in I}]$, this implies that $p(\lambda)$, $\lambda_1$, $C_1$, and also the covariance, are the same as in Case 2a.
Case 3: Pairs $(i, i)$, $(r, t)$ with $i, r, t$ all different. The matrix is
$$\Pi = \begin{pmatrix} P_e & P_i & P_r & P_t \\ P_e & 0 & P_r & P_t \\ P_e & P_i & P_r & 0 \\ P_e & P_i & P_r & P_t \end{pmatrix}.$$

Denoting by $\lambda^\bullet = \lim_{P_i \to 0}\lambda_1$ the largest zero of $\lambda^2 - \lambda + P_rP_t$, and $r(\lambda) = \frac{p(\lambda)}{\lambda}$, we compute the relevant derivatives and conclude by the implicit function theorem, using $\lambda^\bullet = 1 + O^*_2$, that there is a unique $C^\infty$ function $\mu$ of $P_i, P_r, P_t$ near the origin, satisfying $\mu(0, 0, 0) = -1$, such that $\lambda_1 = \lambda^\bullet + P_i^2\mu$. This leads to $\frac{\partial^2\lambda_1}{\partial P_i\partial P_r} = O(P_i)$, and similarly $\frac{\partial^2\lambda_1}{\partial P_i\partial P_t} = O(P_i)$, resulting in $\ln\big(\frac{\lambda_1}{\lambda^*\lambda^\bullet}\big) = O(P_rP_tP_i^2)$, yielding (31).

Case 4: Pairs $(i, j)$, $(r, t)$ with $i, j, r, t$ all different. The matrix is
$$\Pi = \begin{pmatrix} P_e & P_i & P_j & P_r & P_t \\ P_e & P_i & 0 & P_r & P_t \\ P_e & P_i & P_j & P_r & P_t \\ P_e & P_i & P_j & P_r & 0 \\ P_e & P_i & P_j & P_r & P_t \end{pmatrix}.$$
Observe that $\frac{\partial^2\ln\lambda_1}{\partial P_i\partial P_r} = O^*_2$ and $\frac{\partial^2\ln\lambda_1}{\partial P_j\partial P_t} = O^*_2$ lead to $\ln\big(\frac{\lambda_1}{\lambda^*\lambda^\bullet}\big) = O(P_iP_jP_rP_t)$, yielding (31).
Case 5: Pairs $(i, r)$, $(r, i)$ with $i, r$ different. The matrix is
$$\Pi = \begin{pmatrix} P_e & P_i & P_r \\ P_e & P_i & 0 \\ P_e & 0 & P_r \end{pmatrix}.$$
Note that $\lambda_1$ is a $C^\infty$ function of the coefficients $P_iP_r$ and $P_iP_r(1 - P_i - P_r)$, which, together with $\lambda^* = \lambda^\bullet = 1 - P_iP_r + O(P_i^2P_r^2)$, we use to derive the stated estimate. This is in accordance with (31), with $Q = P_i^2P_r + P_iP_r^2 = P_iP_r(P_i + P_r)$.

Case 6a: The matrix is
$$\Pi = \begin{pmatrix} P_e & P_i & P_r \\ P_e & 0 & 0 \\ P_e & P_i & P_r \end{pmatrix}.$$
We start by deriving the more precise estimate $\lambda_1 = 1 - P_i^2 - P_iP_r + P_i^2P_r + P_i^3 + P_i^2O^*_2$:

Abbreviating $\sigma = P_i + P_r$, $\kappa = P_i - P_i^2$, we use $p(\lambda_1) = 0$ to infer the existence of a function $\mu$ that satisfies $\lambda_1 = 1 - \kappa\sigma + P_i^2\mu$. Indeed, we conclude by the implicit function theorem that there is a unique $C^\infty$ function $\mu$ of $P_i, P_r$ near the origin, satisfying $\mu = O^*_2$. Since $\lim_{P_r \to 0}\lambda_1 = \lambda^*$ and $\lim_{P_r \to 0}\lambda^\bullet = 1$, we have $\frac{\lambda_1}{\lambda^*\lambda^\bullet} = 1 + O(P_r)$. This estimate will now be refined. The result is not quite (31), but $Q = P_i^2P_r = \frac{P_r}{2}P_i^2 + \frac{P_i}{2}P_iP_r \leq \frac{1-\delta}{2}\varepsilon$ is satisfied, and $O(P_i^2P_r)$ turns out to be a sufficiently good substitute for $O(P_{i_1}P_{j_1}P_{i_2}P_{j_2})$.
Case 6b: Here the matrix $\Pi$ can be seen to be a similarity transformation of the transposed matrix in Case 6a, implying that $p(\lambda)$, $\lambda_1$, $C_1$, and also the covariance, are the same as in Case 6a. We summarize the covariances obtained in Cases 1–6, and continue by showing that the multiple sums of error terms arising in (22) and Cases 1–6 are negligible.
In addition to (28) we will also use a bound on the double sum over $i, k \geq 1$. This can be deduced from (28), using $\beta = 1$, and furthermore, for the inner sum (w.r.t. $t$) we used (28). Similarly, Cases 2 give rise to triple sums of order $O\big(\frac{\ln n}{n}\big)$. The same is true for Case 3, which is seen by upper bounding the triple sums accordingly, where $\alpha \in \{1/2, 1\}$. Finally, analogous estimates over $i, r \geq 1$ deal with Cases 6. The total contribution of error terms is therefore of order $O\big(\frac{\ln n}{\sqrt{n}}\big)$. We are left with dealing with the sums of the main terms of Cases 1 and 5, and (22). Note that Case 1 has a twin case, and $(e^{a+b} - 1) = (e^a - 1) + (e^b - 1) + (e^a - 1)(e^b - 1)$, where we have estimated two of the sums using (28) and (33). Asymptotics of the sum $\sum_{i,j,k \geq 1}H(i, j, k)$ are as given in (3). The remaining sum, which, as we have seen, is an asymptotic equivalent of $\sum_{i,r \geq 1}\operatorname{Var}X^{(n)}_{i,r}$, is as given in (2). This completes the proof of the lemma, and also proves (6), as we have seen that multiple sums of covariances are negligible.

Remark 4.10 Along the lines of the two preceding proofs, an independent proof of Theorem 3.7 could easily be furnished. We would use (13), (19), (20) to identify $\sum_{i \geq 1}(1 - e^{-nP_i^2})$ and $\sum_{i \neq j}(1 - e^{-nP_iP_j})$ as asymptotic equivalents of $\mathbb{E}X^{(n)}_1$ and $\mathbb{E}X^{(n)}_3$.

More than two pairs of identical letters
We now turn to the case of $k$ pairs $(i_1, i_1), \ldots, (i_k, i_k)$, allowing for $k > 2$.

Lemma 4.11 Fix a set $I := \{i_1, i_2, \ldots, i_k\}$ of size $k$, assuming $i_k < \ldots < i_1$, and thus $P_{i_1} < \ldots < P_{i_k}$. Let $\varepsilon := \sum_{i \in I}P_i^2$. Then the stated representation holds with all $\lambda_i$ different, and error terms holding uniformly in $k$. More precisely, we have the stated expansions, again with error terms holding uniformly in $k$.
Proof. As before, we let $e := \mathbb{N}\setminus I$ and $P_e := 1 - \sum_{i \in I}P_i$, and introduce the matrix $\Pi$.
In order to find eigenvalues and corresponding left and right eigenvectors of $\Pi$, we have to solve the left and right eigenvector systems. Note that $(\mu_i)_{i \in I}$ solves the right system if and only if $(\beta_i)_{i \in I} = (P_i\mu_i)_{i \in I}$ solves the left system. From the left system we easily obtain $\beta$, and, upon inserting into the first equation of the left system, (39). There are at most $k + 1$ different solutions to (39), those being exactly the eigenvalues of $\Pi$. Defining $f(\lambda) := \lambda - 1 + \sum_{i \in I}\frac{P_i^2}{\lambda + P_i}$, we observe $k + 1$ sign changes, from which we obtain the result regarding the locations of the eigenvalues. We continue with the proof of (34). The first estimate, $O(\frac{1}{n})$, directly follows from (13). For the second, note that $\varepsilon \leq 1/4$ implies $P_{i_k} \leq 1/2$. We then use (26) and $S$ and $w$ as defined in the proof of Lemma 4.1. Then for some orthogonal matrix $Q$ the matrix $\tilde{\Pi} := QS\bar{\Pi}S^{-1}Q^{-1}$ is diagonal and satisfies $\rho(\tilde{\Pi}) = |\lambda_{k+1}| < P_{i_k} \leq 1/2$, and $|\lambda_{k+1-j}| < P_{i_{k-j}} \leq \frac{q^j}{2}$ for $j \geq 1$, implying $\|\tilde{\Pi}^n\|_F \leq \frac{1}{1-q}2^{-n}$. Turning now to asymptotic expansions of $\lambda_1$ and $C_1$, we first provide a convenient representation of the latter in the spirit of (39), starting from (18), giving (40). Note that asymptotic estimates of higher order than those given in (35) and (36) could easily be obtained by Algorithm 1, but as we need error terms uniform in $k$, we choose another route. We assume $\varepsilon \leq 1/9$ and evaluate the signs of $f$. Using (39), we obtain (35). Similarly, (36) follows from (40), using $\lambda_1 + P_i \geq 1 - \frac{3}{2}\varepsilon \geq 5/6$. This completes the proof of the lemma.
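The secular equation $f(\lambda) = \lambda - 1 + \sum_{i \in I} P_i^2/(\lambda + P_i) = 0$ is easy to confirm numerically against a direct eigenvalue computation; a sketch with our own parameters (numpy):

```python
import numpy as np

p, q = 0.3, 0.7
I = [1, 2, 3]                          # forbidden pairs (i, i) for i in I
P = [p * q ** (i - 1) for i in I]
Pe = 1 - sum(P)
k = len(I)

# Transition matrix on states (e, i_1, ..., i_k): entry (a, b) is P_b,
# except that the diagonal entries for i in I are 0 (pair (i, i) forbidden).
M = np.tile([Pe] + P, (k + 1, 1))
for j in range(1, k + 1):
    M[j, j] = 0.0
eigs = np.sort(np.linalg.eigvals(M).real)

# f(lam) = 0  <=>  (lam - 1) * prod_i (lam + P_i) + sum_i P_i^2 * prod_{j != i} (lam + P_j) = 0
poly = np.poly1d([1.0, -1.0])
for Pi in P:
    poly = poly * np.poly1d([1.0, Pi])
for i, Pi in enumerate(P):
    term = np.poly1d([Pi ** 2])
    for j, Pj in enumerate(P):
        if j != i:
            term = term * np.poly1d([1.0, Pj])
    poly = poly + term
roots = np.sort(np.roots(poly.coeffs).real)

print(eigs)
print(roots)
```

All $k + 1$ roots are real (one in each interval determined by the poles $-P_i$, plus the dominant one near 1), so the two sorted lists coincide.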
Proof of Theorem 2.2: We first prove (7) in the case that $x_i = 0$ for all $i \in I$. Letting $\varepsilon := \sum_{i \in I}P_i^2$ again, by the previous lemma we have the stated estimates. By letting $P_j \to 0$ for $j \in I\setminus\{i\}$, we obtain the factorized form, and finally the claim, using $\prod_{i \in I}e^{-n(P_i^2 - P_i^3)} \leq e^{-n\varepsilon(1 - P_1)}$ and the fact that $e^{-x(1-P_1)}(x + x^2)$ is bounded for $x \geq 0$. Clearly, equation (7) holds for $I = \{i\}$ and all $x_i \in \{0, 1\}$. Assume that equation (7) has been shown for all $I$ with $|I| = k$. Consider $I'$ with $|I'| = k + 1$. Then, as we have just shown, equation (7) holds for $I'$ when $\sum_{i \in I'}x_i = 0$. It also holds when $\sum_{i \in I'}x_i = 1$: by taking the difference of the corresponding equations, we obtain the claim. Similarly, by induction on $\kappa := \sum_{i \in I'}x_i$, we can prove that (41) holds for all $I'$ with $|I'| = k + 1$ and all $x \in \{0, 1\}^{k+1}$. Clearly the error terms $O(\frac{1}{n})$ may now suffer from dependence on $|I|$, but not on $I$, as the values $\{P_i\}_{i \in I}$ did not enter the proof.
We conclude this subsection with the following conjecture.
Conjecture 4.12 The same kind of asymptotic independence as in Theorem 2.2 holds for $\big(X^{(n)}_{k_i,m_i}\big)_{i\ge 1}$, when the sets $\{k_i, m_i\}$ are pairwise disjoint.
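Conjecture 4.12 invites a quick empirical check. The following Monte Carlo sketch (all parameter choices — $p = 1/4$, $n = 30$, the pairs $(1,2)$ and $(3,4)$, and the trial count — are ours, purely for illustration) estimates the correlation between the two occurrence indicators; under approximate independence it should be close to zero.

```python
import random

def geom_word(n, p, rng):
    # word of n i.i.d. Geometric(p) letters, P(Z = i) = p * (1-p)**(i-1)
    out = []
    for _ in range(n):
        k = 1
        while rng.random() >= p:
            k += 1
        out.append(k)
    return out

def occurs(w, pair):
    # does `pair` occur as an adjacent pair in w?
    return any((w[t], w[t + 1]) == pair for t in range(len(w) - 1))

def corr_estimate(n, p, pair1, pair2, trials, seed=1):
    # sample correlation of the two occurrence indicators
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(trials):
        w = geom_word(n, p, rng)
        xs.append(occurs(w, pair1))
        ys.append(occurs(w, pair2))
    mx, my = sum(xs) / trials, sum(ys) / trials
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / trials
    return cov / ((mx * (1 - mx)) ** 0.5 * (my * (1 - my)) ** 0.5)
```

With these settings both indicators have non-degenerate variance, so the correlation estimate is informative; a value near zero is consistent with (but of course does not prove) the conjecture.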

Some further results on the probability of avoiding a prescribed set of pairs
In this section we aim at a better understanding of $\lambda_1$ and $C_1$ given in (14), as examples like the one from the proof of Lemma 4.3, resp. from Case 6a in the proof of Lemma 4.9, suggest that there may be a simple relationship between $\lambda_1$ and $C_1$. This turns out to be the case, see (43) below, and our method of proof also allows for a representation of the generating function of the probabilities in (14).
Besides shedding light on the above mystery, we hope that the results of this section will turn out to be useful when computing asymptotics of higher moments of … and $X^{(n)}_3$, a task, however, not pursued further in the present paper. We start with a finite non-empty set of forbidden pairs $\mathcal I := \{(k_i, m_i) : i \in I\}$ and let $J := \bigcup_{i\in I}\{k_i, m_i\}$.
Using (42), we can express $\lambda_1 = \lambda(1)$ in terms of $(\psi_i)_{i\ge 2}$ as follows. This expansion is found by computing the ninth Taylor polynomial of $\lambda(v)$ at $v = 0$ and evaluating it at $v = 1$.
Clearly, more terms of $\lambda_1$ can easily be extracted using gfun. Furthermore, by (43), we obtain the corresponding representation of $C_1$. The expansion obtained from (44) also turns out to use only $(\psi_i)_{i\ge 2}$, and starts as follows.
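The quantities $\psi_i$ appearing in these expansions are path sums over the forbidden-pair set: $\psi_1 = \sum_{j\in J} P_j$, and $\psi_{i+1}$ sums $P_{k_0}\cdots P_{k_i}$ over all chains $k_0,\dots,k_i$ whose consecutive pairs lie in $\mathcal I$. As an illustration (the choice of $\mathcal I$ and of $p$ below is ours, not from the paper), they can be computed by a small dynamic program:

```python
from fractions import Fraction

def psi_values(pairs, P, imax):
    # pairs: the forbidden pairs (k, m); P: letter -> probability (Fractions)
    # psi_1 = sum_{j in J} P_j; psi_{i+1} = sum over chains k_0,...,k_i with
    # every (k_t, k_{t+1}) in pairs of the product P_{k_0} * ... * P_{k_i}
    J = sorted({a for pr in pairs for a in pr})
    v = {k: P[k] for k in J}            # weight of chains of length 0 ending at k
    psis = [sum(v.values())]            # psi_1
    for _ in range(imax - 1):
        w = dict.fromkeys(J, Fraction(0))
        for (k, m) in pairs:
            w[m] += v[k] * P[m]         # extend each chain by the pair (k, m)
        v = w
        psis.append(sum(v.values()))
    return psis                         # [psi_1, psi_2, ..., psi_imax]
```

In particular $\psi_2$ reproduces $\varepsilon = \sum_{(k,m)\in\mathcal I} P_kP_m$, in agreement with the remark that $\psi_2 = \varepsilon$.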

Combinatorial Pattern Matching Approach
For a combinatorial approach, we utilize the methodology of Bassino et al. (2012). The full strength of Bassino et al. (2012) is not needed, because in the present analysis we study only "reduced" sets of patterns: in a reduced set of patterns, no word is a subword of another word. Here we always analyze patterns of length 2, so our pattern sets are necessarily (already) reduced. Thus we only need Sections 4.1 and 4.2 of Bassino et al. (2012).
Since we follow the notation and overall approach of Bassino et al. (2012), the reader might want to review the first ten pages of that paper, through Section 4.2. The basic methodology is an inclusion-exclusion approach to enumerating patterns, which allows an exact derivation of the probabilities of each set of patterns. For this approach, Section 4.1 of Bassino et al. (2012) explains how to utilize decorated texts, in which some occurrences of patterns are "distinguished" (while others might not be distinguished).
Collections of overlapping distinguished occurrences are gathered together into clusters. With this methodology, "the set of decorated texts T decomposes as sequences of either arbitrary letters of the alphabet A or clusters". Writing $\xi(z,t) = \sum_{w} \pi(w)\, t^{\tau(w)} z^{|w|}$ for the generating function of clusters, where $\pi(w)$ is the probability of a text and $\tau(w)$ is the number of distinguished occurrences of subwords in $w$, the generating function of all decorated texts is $T(z,t) = 1/(1 - A(z) - \xi(z,t))$.
Finally, using inclusion-exclusion, the probability generating function $F_U(z,x)$, in which powers of $z$ mark the length of texts and powers of $x$ mark the total number of occurrences of patterns in $U$, is obtained as $F_U(z,x) = T(z, x-1)$. This is the set of core ideas from Bassino et al. (2012) that forms the foundation of the analysis in the present section. We define $X^{(n)}$ as the total number of distinct (adjacent) pairs in a word $Z_1,\dots,Z_n$.
Note 5.1 The roots of the polynomials in the denominators of the generating functions in Tables 1 and 2 exist and are unique (or there is a removable singularity that can be handled by continuity).
Lemma 5.2 For $n \ge 2$, and for $i \ne j$, the probability that $ij$ occurs (at least once) as an adjacent pattern in $Z_1,\dots,Z_n$ is exactly
Proof. The proof of Lemma 5.2 is in Subsection 5.2.1.
Lemma 5.3 For $n \ge 2$, the probability that $ii$ occurs (at least once) as an adjacent pattern in $Z_1,\dots,Z_n$ is exactly
Again, for $n < 2$, we have $E[X^{(n)}_{i,i}] = 0$.
Proof. The proof of Lemma 5.3 is in Subsection 5.2.2.
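The occurrence probabilities in Lemmas 5.2 and 5.3 can also be computed numerically with a three-state automaton (state 0: no progress; state 1: the last letter was $i$; an absorbing state once the pattern has occurred). The sketch below does this exactly in rational arithmetic and verifies it against exhaustive enumeration over a truncated alphabet; the truncation level and the parameter choices are ours, for illustration only, and the closed forms of the lemmas are not reproduced.

```python
from fractions import Fraction
from itertools import product

def p_occurs(n, Pi, Pj, identical):
    # P(pattern ij -- or ii when identical -- occurs in Z_1..Z_n), via the
    # automaton with states 0 ("no progress"), 1 ("last letter was i"), done
    s0, s1, done = Fraction(1), Fraction(0), Fraction(0)
    for _ in range(n):
        done, s1, s0 = (
            done + s1 * Pj,
            s0 * Pi + (Fraction(0) if identical else s1 * Pi),
            s0 * (1 - Pi) + s1 * (1 - Pj - (Fraction(0) if identical else Pi)),
        )
    return done

def p_occurs_brute(n, p, pattern, m):
    # exhaustive check over letters 1..m plus a lumped tail letter 0 carrying
    # probability P(Z > m); the tail letter never matches the pattern
    probs = {i: p * (1 - p) ** (i - 1) for i in range(1, m + 1)}
    probs[0] = 1 - sum(probs.values())
    total = Fraction(0)
    for w in product(probs, repeat=n):
        if any((w[t], w[t + 1]) == pattern for t in range(n - 1)):
            pr = Fraction(1)
            for a in w:
                pr *= probs[a]
            total += pr
    return total
```

The automaton and the brute force agree exactly, e.g. for $p = 1/2$, $n = 5$, and the patterns $12$ and $22$.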

Main results
By adding the results from Lemmas 5.2 and 5.3, we establish the following theorem.
Theorem 5.4 For $n \ge 2$, the mean number of distinct (adjacent) pairs in a word $Z_1,\dots,Z_n$ is exactly
For $n < 2$, we have $E[X^{(n)}] = 0$.
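Since $E[X^{(n)}] = \sum_{(i,j)} P(ij \text{ occurs})$ by linearity of expectation, the mean in Theorem 5.4 can be approximated numerically by summing per-pair occurrence probabilities (computed here with a small three-state automaton) over a truncated alphabet, and compared with a direct simulation. The truncation level $M$ and all parameters below are our illustrative choices; the tail beyond $M$ contributes only $O(n q^M)$.

```python
import random

def p_occurs(n, pi, pj, identical):
    # occurrence-probability automaton (float version): state 0 = no
    # progress, state 1 = last letter was i, `done` = pattern has occurred
    s0, s1, done = 1.0, 0.0, 0.0
    for _ in range(n):
        done, s1, s0 = (
            done + s1 * pj,
            s0 * pi + (0.0 if identical else s1 * pi),
            s0 * (1 - pi) + s1 * (1 - pj - (0.0 if identical else pi)),
        )
    return done

def mean_distinct_pairs(n, p, M=40):
    # E[X^(n)] = sum over ordered pairs (i, j) of P(ij occurs), truncated
    P = [p * (1 - p) ** i for i in range(M)]
    return sum(p_occurs(n, P[i], P[j], i == j)
               for i in range(M) for j in range(M))

def mc_distinct_pairs(n, p, trials, seed=7):
    # direct simulation of the number of distinct adjacent pairs
    rng = random.Random(seed)
    acc = 0
    for _ in range(trials):
        w = []
        for _ in range(n):
            k = 1
            while rng.random() >= p:
                k += 1
            w.append(k)
        acc += len({(w[t], w[t + 1]) for t in range(n - 1)})
    return acc / trials
```

For moderate $n$ the two estimates agree to well within Monte Carlo error.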
In Section 5.3, we give all of the analogous parts of the analysis for $E[(X^{(n)})^2]$, but we do not wrap the results into a theorem statement, because the second moment has many parts and the notation is cumbersome.

Analysis
The generating function of the decorated texts (with $z$ marking the length of the words, $t$ marking the number of decorated occurrences of $ij$, and the coefficients being the associated probabilities) is $T(z,t) = \frac{1}{1 - z - P_iP_j t z^2}$, where $A(z) = z$ is the probability generating function of the alphabet $\mathcal A$. Now we use $F(z,x)$ to denote the bivariate probability generating function of occurrences of $ij$ (with $z$ marking the length of the words and $x$ marking the number of occurrences of $ij$), i.e., we define $F(z,x) := \sum_{n\ge 0}\sum_{k\ge 0} P(Z_1,\dots,Z_n \text{ has exactly } k \text{ occurrences of } ij \text{ as a subword})\, x^k z^n$, so that $F(z,x) = T(z, x-1)$.
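To make the inclusion-exclusion concrete: with $\xi(z,t) = P_iP_j t z^2$, the substitution $t = x-1$ gives $F(z,x) = 1/(1 - z - P_iP_j(x-1)z^2)$, so the coefficient polynomials $a_n(x) = [z^n]F(z,x)$ satisfy $a_n = a_{n-1} + P_iP_j(x-1)a_{n-2}$ with $a_0 = a_1 = 1$. A sketch verifying this against exhaustive enumeration (the alphabet truncation and all parameters are our own illustrative choices):

```python
from fractions import Fraction
from itertools import product

def dist_from_gf(n, c):
    # [z^n] of F(z, x) = 1/(1 - z - c*(x-1)*z^2), as a coefficient list in x,
    # via the recurrence a_n = a_{n-1} + c*(x-1)*a_{n-2}, a_0 = a_1 = 1
    a2, a1 = [Fraction(1)], [Fraction(1)]
    if n == 0:
        return a2
    for _ in range(2, n + 1):
        cur = [Fraction(0)] * (len(a2) + 1)
        for d, coef in enumerate(a1):
            cur[d] += coef
        for d, coef in enumerate(a2):       # add c*(x-1)*a_{n-2}
            cur[d + 1] += c * coef
            cur[d] -= c * coef
        a2, a1 = a1, cur
    return a1

def dist_brute(n, p, pattern, m):
    # exact distribution of the number of occurrences of `pattern`;
    # letters 1..m plus a lumped tail letter 0 with probability P(Z > m)
    probs = {i: p * (1 - p) ** (i - 1) for i in range(1, m + 1)}
    probs[0] = 1 - sum(probs.values())
    dist = {}
    for w in product(probs, repeat=n):
        pr = Fraction(1)
        for a in w:
            pr *= probs[a]
        k = sum(1 for t in range(n - 1) if (w[t], w[t + 1]) == pattern)
        dist[k] = dist.get(k, Fraction(0)) + pr
    return dist
```

For example, with $p = 1/2$ and the pattern $12$ (so $c = P_1P_2 = 1/8$), the polynomial coefficients of $a_5$ match the enumerated distribution exactly.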

Louchard et al.
It follows that the probability generating function of the words with at least one occurrence of $ii$ and at least one occurrence of $kk$ is as follows. The partial fraction decomposition for the second term is given in Table 1B. The third term is the same as the second term, using $k$ instead of $i$. The partial fraction decomposition for the fourth term is given in Table 2E.
It follows that the probability generating function of the words with at least one occurrence of $ij$ and at least one occurrence of $k\ell$ is as follows. The partial fraction decomposition for the second term is given in Table 1B.
The partial fraction decomposition for the third term is given in Table 1A, using k and ℓ instead of i and j.
The partial fraction decomposition for the fourth term is given in Table 2F.
5.3.3 $i \ne j$ and $k = \ell$
5.3.3.1 $k = \ell$ and $i$ and $j$ are distinct
Same as Section 5.3.2.1 but with $i$ and $k$ exchanged, and with $j$ and $\ell$ exchanged.
5.3.3.3 $k = \ell = j \ne i$
Same as Section 5.3.2.3 but with $i$ and $k$ exchanged, and with $j$ and $\ell$ exchanged. It follows that the probability generating function of the words with at least one occurrence of $ij$ and at least one occurrence of $k\ell$ is
$$\frac{1}{1-z} - \frac{1}{1 - z + P_iP_j z^2} - \frac{1}{1 - z + P_kP_\ell z^2} + \frac{1}{1 - z + P_iP_j z^2 + P_kP_\ell z^2}.$$
The partial fraction decomposition for the second term is given in Table 1A; for the third term, in Table 1A, using $k$ and $\ell$ instead of $i$ and $j$; for the fourth term, in Table 1C.
5.3.4.2 $k = i$ and $j$ and $\ell$ are distinct
The clusters are $ij$ and $i\ell$. So, by the same analysis as in Section 5.3.4.1, we get exactly what we obtained in Section 5.3.4.1 above: the partial fraction decomposition for the second term is given in Table 1A; for the third term, in Table 1A, using $\ell$ instead of $j$; for the fourth term, in Table 1C, using $i$ instead of $k$.
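For two length-2 patterns that cannot overlap each other (for instance $ij$ and $k\ell$ with $j \ne k$ and $\ell \ne i$), both clusters are singletons, so the joint generating function is $F(z,x,y) = 1/(1 - z - P_iP_j(x-1)z^2 - P_kP_\ell(y-1)z^2)$; setting $x = y = 0$ recovers the $1/(1 - z + P_iP_jz^2 + P_kP_\ell z^2)$ term above. A sketch checking the full joint distribution by enumeration (all parameters are our illustrative choices):

```python
from fractions import Fraction
from itertools import product

def joint_from_gf(n, c1, c2):
    # [z^n] of 1/(1 - z - c1*(x-1)*z^2 - c2*(y-1)*z^2) as {(dx, dy): coef},
    # via a_n = a_{n-1} + (c1*(x-1) + c2*(y-1)) * a_{n-2}, a_0 = a_1 = 1
    a2, a1 = {(0, 0): Fraction(1)}, {(0, 0): Fraction(1)}
    if n == 0:
        return a2
    for _ in range(2, n + 1):
        cur = dict(a1)
        for (dx, dy), coef in a2.items():
            for key, inc in (((dx + 1, dy), c1 * coef), ((dx, dy), -c1 * coef),
                             ((dx, dy + 1), c2 * coef), ((dx, dy), -c2 * coef)):
                cur[key] = cur.get(key, Fraction(0)) + inc
        a2, a1 = a1, cur
    return a1

def joint_brute(n, p, pat1, pat2, m):
    # exact joint distribution of the two occurrence counts;
    # letters 1..m plus a lumped tail letter 0 with probability P(Z > m)
    probs = {i: p * (1 - p) ** (i - 1) for i in range(1, m + 1)}
    probs[0] = 1 - sum(probs.values())
    dist = {}
    for w in product(probs, repeat=n):
        pr = Fraction(1)
        for a in w:
            pr *= probs[a]
        k1 = sum(1 for t in range(n - 1) if (w[t], w[t + 1]) == pat1)
        k2 = sum(1 for t in range(n - 1) if (w[t], w[t + 1]) == pat2)
        dist[(k1, k2)] = dist.get((k1, k2), Fraction(0)) + pr
    return dist
```

With $p = 1/2$ and the patterns $12$ and $34$ (so $c_1 = 1/8$ and $c_2 = 1/128$), the two computations of the length-5 joint distribution agree exactly.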
from the quadruple sum of covariances of different pairs, of order $\Theta(1)$. All simulations use $p = 1/4$.
to define $\psi_1 := p\mathbf 1 = \sum_{j\in J} P_j = 1 - P_e$ and $\psi_{i+1} := p\tilde\Pi^i\mathbf 1 = \sum_{k_0,\dots,k_i\,:\,(k_0,k_1),\dots,(k_{i-1},k_i)\in\mathcal I} P_{k_0}P_{k_1}\cdots P_{k_i}$ for $i \ge 1$. Note that $\psi_2 = \varepsilon$, with $\varepsilon$ introduced in Lemma 4.1, and $\psi_3$ is a generalization of $Q$ introduced in (30). Moreover $\psi_i \le (1-P_e)^i$ holds for $i \ge 1$. Denote the identity matrix of appropriate dimension by $I$ and define a meromorphic function in terms of a resolvent; for $v \in [0,1]$, consider now the functions $C(v)$ and $\lambda(v)$ defined via (14) by $p^{(n)}_{\mathcal I}((vP_j)_{j\in J}) \sim C(v)\lambda(v)^n$. Arguing as in the proof of Lemma 4.1, i.e., invoking the Perron-Frobenius theorem and the implicit function theorem, these functions are analytic in an open subset of $\mathbb C$ containing the interval $[0,1]$. The following theorem shows how to express $\lambda(v)$, $C(v)$, and $P_{\mathcal I}(z) := \sum_{n\ge 0} p^{(n)}_{\mathcal I} z^n$ in terms of $\Psi$.
Theorem 4.13 The function $\lambda(v)$ is a solution to the following equation.
Tab. 2: partial fraction decompositions used for the average number of distinct (adjacent) pairs; the coefficient of $z^n$ reads $\frac{(1-P_iP_jt^2)rs}{(r-t)(s-t)t^n} + \frac{(1-P_iP_js^2)rt}{(r-s)(t-s)s^n} + \frac{(1-P_iP_jr^2)st}{(s-r)(t-r)r^n}$, where $r$, $s$, $t$ denote the roots of the corresponding denominator.
5.2.1 Analysis of distinct (adjacent) two-letter patterns $ij$ with $i \ne j$
If we fix $i \ne j$ and analyze the occurrences of the pattern $ij$, then the only "cluster" (to use Bassino et al.'s terminology) is $ij$ itself. So the generating function $\xi(z,t)$ of the set of clusters $\mathcal C = \{ij\}$ becomes only (compare with (6) in Bassino et al.) $\xi(z,t) = P_iP_j t z^2$.
5.3.2 $i = j$ and $k \ne \ell$
5.3.2.1 $i = j$ and $k$ and $\ell$ are distinct
The clusters each have the form $ii\cdots i$ or $k\ell$, i.e., they are all words that consist of either two or more consecutive occurrences of $i$, or simply the word $k\ell$. So $\xi(z,t,u)$ of the set of clusters $\mathcal C = \{ii, iii, iiii, iiiii, \dots, k\ell\}$ becomes $\xi(z,t,u) = \frac{P_i^2 t z^2}{1 - P_i t z} + P_kP_\ell u z^2$ (with $z$ marking the length of the words, $t$ marking the number of decorated occurrences of $ii$, $u$ marking the number of decorated occurrences of $k\ell$, and the coefficients being the associated probabilities). Hence $T(z,t,u) = 1/\big(1 - z - \frac{P_i^2 t z^2}{1 - P_i t z} - P_kP_\ell u z^2\big)$, and $F(z,x,y) = T(z, x-1, y-1) = 1/\big(1 - z - \frac{P_i^2(x-1)z^2}{1 - P_i(x-1)z} - P_kP_\ell(y-1)z^2\big)$.