Waiting Time Distribution for Pattern Occurrence in a Constrained Sequence: an Embedding Markov Chain Approach

In this paper we consider the distribution of a pattern of interest in a binary random (d, k)-sequence generated by a Markov source. Such constrained sequences are frequently encountered in communication systems. Unlike the previous approach based on generating functions, we have chosen here to use Markov chain embedding techniques. By doing so, we obtain both previous results (sequence constrained up to the r-th occurrence) and new ones (sequence constrained up to its end). We also provide in both cases efficient algorithms using basic linear algebra only. We then compare our numerical results to previous ones and finally propose several interesting extensions of our method which further illustrate its usefulness, namely: higher-order Markov chains, renewal occurrences rather than overlapping ones, heterogeneous models, more complex patterns of interest, and multistate trial sequences.


Introduction
The binary sequences used in communication systems (like magnetic or optical recording systems) are often subject to technical constraints. The simplest of these constraints allows only runs of zeros of specific lengths. Several authors (Zehavi and Wolf, 1988; Marcus et al., 1998; Lothaire, 2005) have considered such constrained sequences, called (d, k)-sequences, which contain no run of zeros of length smaller than d or greater than k. It is clear that this constraint is equivalent to forbidding a finite set of patterns. For instance, these forbidden patterns are '11' and '00000' in a (1, 4)-sequence, and they are '11', '101' and '0000' in a (2, 3)-sequence. In this paper, we consider the distribution of a given pattern of interest (for example '100100100') in a random (d, k)-sequence generated by a Markov source.
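To make the constraint concrete, here is a short Python sketch (the function names are ours, not the paper's) that generates the forbidden patterns of a (d, k)-sequence and checks a binary string against them:

```python
def forbidden_patterns(d, k):
    """Forbidden patterns of a (d, k)-sequence: every run of zeros
    between two ones must have a length between d and k."""
    too_short = ['1' + '0' * j + '1' for j in range(d)]  # runs shorter than d
    too_long = ['0' * (k + 1)]                           # runs longer than k
    return too_short + too_long

def is_dk_sequence(x, d, k):
    """True if the binary string x contains no forbidden pattern."""
    return not any(p in x for p in forbidden_patterns(d, k))
```

With d = 1 and k = 4 this recovers the forbidden patterns '11' and '00000' of the text; with d = 2 and k = 3 it recovers '11', '101' and '0000'.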
Recently, this problem has been considered by Stefanov and Szpankowski (2007), where the authors use a sophisticated approach based on generating functions. In this paper, we introduce an alternative approach to the same problem using the well-known technique of Markov chain embedding (Fu, 1996; Chadjiconstantinidis et al., 2000; Antzoulakos, 2001; Fu and Chang, 2002), which allows us both to recover the results of that paper and to derive several interesting extensions. One should note that we use the same notation as in Stefanov and Szpankowski (2007) to facilitate the comparison.
Formally, let X = X_1 X_2 ... be a homogeneous Markov chain over {0, 1} with transition matrix π and starting distribution µ_1. We denote by N_i (resp. N^{(d,k)}_i) the number of occurrences of the pattern of interest w = w_1 w_2 ... w_m (resp. of the forbidden patterns w̃ ∈ W) in X_1 X_2 ... X_i, and by Y_r the waiting time of the r-th occurrence of w. We then consider the following two probabilities of interest: the distribution of Y_r given that the sequence is constrained up to n, and the distribution of Y_r given that the sequence is constrained up to Y_r. This paper is organized as follows: Section 2 presents our main results, starting with the embedding Markov chain that keeps track of both the occurrences of the pattern of interest and those of the forbidden patterns (2.1), followed by its application to probability computations given the sequence constrained up to Y_r (2.2), as is done in Stefanov and Szpankowski (2007), or up to n (2.3), which is a new result. In both cases, we derive efficient algorithms to compute these probabilities using basic linear algebra only. We then give a numerical example, compare our results to those of the previous paper (2.4), and study the practical limits of our algorithms through intensive testing (2.5). In Section 3, we propose several extensions of the method: higher-order Markov chains (3.1), renewal occurrences rather than overlapping ones (3.2), heterogeneous models (3.3), more complex patterns of interest (3.4) and multistate trial sequences (3.5).
Theorem 1 Let P denote the set of all nonempty prefixes of w and of the forbidden patterns w̃ ∈ W, and let X̃ = X̃_1 X̃_2 ... be the random sequence over P defined for all i ≥ 1 by X̃_i = δ(X̃_{i-1}, X_i) (with X̃_0 = ε, the empty word, and δ the transition function introduced in the proof below). Then X̃ is a homogeneous Markov chain whose transition matrix Π is given, for all (possibly empty) words p and q and for any a, b ∈ {0, 1} such that pa, qb ∈ P, by Π(pa, qb) = π_{a,b} if δ(pa, b) = qb and Π(pa, qb) = 0 otherwise, and we obtain the following properties: i) w ends in position i in X ⇐⇒ X̃_i = w; ii) for all w̃ ∈ W, w̃ ends in position i in X ⇐⇒ X̃_i = w̃.
Proof: X̃ is obviously a Markov chain and the expression of its transition matrix is straightforward to establish. We then remark that the Deterministic Finite Automaton (DFA) defined on the state space P ∪ {ε} (ε denotes the empty word), over the finite alphabet {0, 1}, with ε as starting state, {w} ∪ W as set of final states, and with transition function δ defined for all p ∈ P ∪ {ε} and a ∈ {0, 1} as the longest suffix of pa belonging to P ∪ {ε}, is exactly the Aho-Corasick DFA (Aho and Corasick, 1975) that allows us to count simultaneously the occurrences of w and of all w̃ ∈ W. Properties i) and ii) then follow directly from those of this DFA. □

It should be noted that we have chosen here to use the Aho-Corasick DFA explicitly in the construction of our Markov chain embedding, while most authors working on the subject use it implicitly (for instance, see Chang, 2005).
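A minimal Python sketch of this DFA (with our own helper names, '' playing the role of ε) illustrates properties i) and ii): the state after reading X_1 ... X_i equals w exactly when w ends at position i, overlapping occurrences included, since δ always keeps the longest matching suffix:

```python
def aho_corasick(w, W):
    """Build the state space P ∪ {ε} (all prefixes of w and of the
    words of W) and the transition function delta: delta(p, a) is the
    longest suffix of p + a that is a state."""
    words = [w] + list(W)
    states = {v[:i] for v in words for i in range(len(v) + 1)}
    def delta(p, a):
        s = p + a
        return max((s[i:] for i in range(len(s) + 1) if s[i:] in states),
                   key=len)
    return states, delta

def run(x, w, W):
    """Return the list of DFA states after each letter of x."""
    states, delta = aho_corasick(w, W)
    out, p = [], ''
    for a in x:
        p = delta(p, a)
        out.append(p)
    return out
```

For w = 100 and W = {11, 00000}, reading '0100100' ends in state '100' exactly at the two positions where w occurs.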
Corollary 2 For all r ≥ 0, the sequence (Z^r_i)_{i≥1} defined for all i ≥ 1 as the triple recording the current state X̃_i, the number of occurrences of w up to position i (capped at r), and the indicator that a forbidden pattern has already occurred, is a Markovian sequence whose starting distribution a_r and transition matrix A_r are defined, for all 0 ≤ i, j ≤ r + 1, by blocks of size |P|, where the blocks are ordered as follows: block 0 is (P, 0, 0), block 1 is (P, 1, 0), ..., block r − 1 is (P, r − 1, 0), block r is (P, r+, 0) and block r + 1 is (P, •, 1+); and where we decompose the transition matrix Π into Π = P + Q + Q̃, where Q (resp. Q̃) contains all the transitions ending in {w} (resp. in W) and P the remaining ones.
Proof: For all 0 ≤ i < r, a transition from the block (P, i, 0) to the same block allows neither an occurrence of w nor of any w̃ ∈ W to appear, which means that the transition matrix P must be used. If the transition is from (P, i, 0) to (P, i + 1, 0), then one occurrence of w has just appeared, and we have to use the transition matrix Q. If a forbidden pattern occurs, then the transition from block (P, i, 0) to (P, •, 1+) obviously uses Q̃. The expressions of the remaining transitions rely on the same mechanism. □

Require: two arrays of dimension r × |P|: u and v; two real numbers: sum1 and sum2

Algorithm 1: Algorithm computing P(Y_r ≤ n | N^{(d,k)}_{Y_r} = 0) for any r ≥ 1 and n ≥ 2. All vector × matrix products (lines 8 and 13) use the sparse structure of the matrices P and Q; the expression u • v (lines 7 and 12) denotes the scalar product of the two vectors. One can efficiently compute P(Y_r > n | N^{(d,k)}_{Y_r} = 0) with the same algorithm.
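The decomposition Π = P + Q + Q̃ of Corollary 2 can be sketched numerically as follows (a toy construction under our own naming, not the paper's code); the rows of P + Q + Q̃ must sum to one:

```python
import numpy as np

def split_transition_matrix(w, W, pi):
    """Build the transition matrix of the embedded chain over the
    nonempty Aho-Corasick states and split it into P (no occurrence),
    Q (transitions ending in state w) and Qt (transitions ending in a
    forbidden word of W)."""
    words = [w] + list(W)
    states = {v[:i] for v in words for i in range(len(v) + 1)}
    def delta(p, a):
        s = p + a
        return max((s[i:] for i in range(len(s) + 1) if s[i:] in states),
                   key=len)
    S = sorted(s for s in states if s)          # nonempty states
    idx = {s: i for i, s in enumerate(S)}
    m = len(S)
    P, Q, Qt = np.zeros((m, m)), np.zeros((m, m)), np.zeros((m, m))
    for p in S:
        for b in '01':
            q = delta(p, b)
            target = Q if q == w else Qt if q in W else P
            # transition probability depends on the last letter of p
            target[idx[p], idx[q]] += pi[int(p[-1]), int(b)]
    return S, P, Q, Qt
```

With the (1, 4) model of the numerical example (π_{0,0} = 0.4, π_{0,1} = 0.6, π_{1,0} = 1.0, π_{1,1} = 0.0) one can check that P + Q + Q̃ is stochastic and that Q is supported on the single column of state w.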

The sequence is constrained up to Y_r
Proposition 3 For all r ≥ 1 and for all i ≥ 2 we have: where T denotes the transpose operator, the two row-vectors u_r and v_r are defined for the size-|P| block 0 ≤ i ≤ r − 1 by: (for all p ∈ P, e_p denotes the indicator row-vector of state p), and the two matrices R_r and S_r are defined for the block (i, j) with 0 ≤ i, j ≤ r − 1 by:

Proof: The event Y_r = i with N^{(d,k)}_{Y_r} = 0 means that Z^r_{i−1} belongs to the block (P, r − 1, 0) and that w appears in position i, which means that the transition matrix Q is used. R_r is hence the submatrix of A_r corresponding to the blocks (P, j, 0) with 0 ≤ j ≤ r − 1, and S_r is a matrix of the same dimensions allowing the product of the rightmost block by Q. □

Require: two arrays of dimension r × |P|: u and v; two arrays of dimension |P|: x and y; three real numbers: sum1, sum2 and den
1: INITIALIZATION:
2: sum1 = 0
3: u[0] = µ_1 and u[j] = 0 for all 1 ≤ j ≤ r − 1
4: v[j] = 0 for all 0 ≤ j < r − 1 and v[r − 1] = Q e_w^T
5: FINITE SUM:
6: for i = 2 . . . n do
7:

Corollary 4 For all r ≥ 1 and n ≥ 2 we have:

Proof: This simply results from a combination of Equation (2) and Equation (3). □

Algorithm 1 is a straightforward implementation of (4), using the special structure of R_r to get an efficient iterative computation of the vector × matrix products. The tail sum is here computed numerically, which can be a source of error due to the finite level of precision. An alternative consists in using the inverse of I − R_r to obtain this tail sum directly, as suggested by the following corollary.
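As a sanity check on the block recurrence behind Proposition 3, the following sketch (our own simplified Python reimplementation, not the paper's Scilab code) computes P(Y_r = i, no forbidden pattern up to i) by a forward recursion over the blocks (P, j, 0), dropping the mass that enters a forbidden word, and validates it by exhaustive enumeration on a toy instance (pattern '101', forbidden word '000'; all names and parameters are ours, and len(w) ≥ 2 is assumed):

```python
import numpy as np
from itertools import product

def make_chain(w, W, pi):
    """Aho-Corasick states with the P/Q split; transitions into a
    forbidden word are simply dropped (absorbing failure)."""
    words = [w] + list(W)
    states = {v[:i] for v in words for i in range(len(v) + 1)}
    def delta(p, a):
        s = p + a
        return max((s[i:] for i in range(len(s) + 1) if s[i:] in states),
                   key=len)
    S = sorted(s for s in states if s)
    idx = {s: i for i, s in enumerate(S)}
    m = len(S)
    P, Q = np.zeros((m, m)), np.zeros((m, m))
    for p in S:
        if p in W:
            continue
        for b in '01':
            q = delta(p, b)
            if q in W:
                continue
            (Q if q == w else P)[idx[p], idx[q]] += pi[int(p[-1]), int(b)]
    return S, idx, delta, P, Q

def joint_pmf(w, W, pi, mu1, r, n):
    """pmf[i] = P(Y_r = i and no forbidden pattern in X_1..X_i),
    Y_r being the r-th overlapping occurrence of w."""
    S, idx, delta, P, Q = make_chain(w, W, pi)
    u = [np.zeros(len(S)) for _ in range(r)]   # u[j]: j occurrences so far
    for a in '01':
        u[0][idx[delta('', a)]] = mu1[int(a)]
    pmf = np.zeros(n + 1)
    for i in range(2, n + 1):
        pmf[i] = (u[r - 1] @ Q).sum()          # r-th occurrence at time i
        u = [u[j] @ P + (u[j - 1] @ Q if j else 0) for j in range(r)]
    return pmf

def joint_pmf_brute(w, W, pi, mu1, r, n):
    """Same quantity by exhaustive enumeration of all 2^n sequences."""
    pmf = np.zeros(n + 1)
    for bits in product('01', repeat=n):
        x = ''.join(bits)
        prob = mu1[int(x[0])]
        for a, b in zip(x, x[1:]):
            prob *= pi[int(a), int(b)]
        ends = [i for i in range(len(w), n + 1) if x[i - len(w):i] == w]
        if len(ends) >= r and not any(f in x[:ends[r - 1]] for f in W):
            pmf[ends[r - 1]] += prob
    return pmf
```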
Corollary 5 If I − P is invertible, then for all r ≥ 1 and n ≥ 2 we have: where T_r = (I − R_r)^{−1} is defined on the block (i, j) with 0 ≤ i, j ≤ r − 1 by:

Proof: Thanks to the definition of R_r, it is clear that [(I − R_r) × T_r](i, j) = (I − P) × T_r(i, j) − Q × T_r(i + 1, j). It is then easy to verify that such a block is either I if i = j, or 0 otherwise, thus proving that (I − R_r) × T_r = I. A similar approach shows that T_r × (I − R_r) = I. We then just have to replace the tail sum by its closed-form expression to get the result. □

This result allows us to derive Algorithm 2, which is numerically more robust than the previous one, and also faster. However, unlike Algorithm 1, this algorithm cannot be extended to heterogeneous models.
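The closed form rests on the standard matrix-geometric identity: when the spectral radius of R is below one, the tail sum Σ_{i≥0} u R^i v equals u (I − R)^{−1} v. A small numpy check (purely illustrative, with a random substochastic R of our own making):

```python
import numpy as np

rng = np.random.default_rng(0)
R = rng.random((5, 5))
R *= 0.9 / R.sum(axis=1, keepdims=True)   # rows sum to 0.9: substochastic
u, v = rng.random(5), rng.random(5)

# Closed form: solve a linear system instead of forming the inverse.
closed = u @ np.linalg.solve(np.eye(5) - R, v)

# Truncated series sum_{i=0}^{1999} u R^i v; the remainder is negligible
# since the spectral radius of R is at most 0.9.
series, x = 0.0, u.copy()
for _ in range(2000):
    series += x @ v
    x = x @ R
```

Solving (I − R) x = v is both faster and numerically safer than computing the inverse explicitly, which is also the idea behind the lupower procedure used by Algorithm 2.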

The sequence is constrained up to n
Proposition 6 For all r ≥ 0 and n ≥ 1 we have: where the two vectors y_r and z_r are defined for the size-|P| block 0 ≤ i ≤ r by: and the matrix T_r is defined for the block (i, j) with 1 ≤ i, j ≤ r by:

Proof: We need to compute the probability that Z^r_n belongs to the block (P, r+, 0). The transition matrix T_r is hence simply a submatrix of A_r. □

Corollary 7 For all r ≥ 1 and n ≥ 1 we have: where 1 is a row-vector of ones (of the appropriate dimension).
Proof: This simply results from combining Equation (1) and Equation (6). □

A straightforward implementation of (7) gives Algorithm 4. As in Algorithm 1, we take advantage of the sparse structure of the matrices P and Q through a very natural recurrence for the vector × matrix products with T_r.
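The following sketch (again our own toy reimplementation with assumed names, validated by exhaustive enumeration) computes P(Y_r ≤ n | N^{(d,k)}_n = 0) for the (1, 4) constraint by a forward recursion with occurrence counts capped at r, which is exactly the role of the blocks (P, j, 0) and (P, r+, 0):

```python
import numpy as np
from itertools import product

def p_constrained_up_to_n(w, W, pi, mu1, r, n):
    """P(Y_r <= n | no forbidden pattern in X_1..X_n): ratio of the
    surviving mass with at least r occurrences of w to the total
    surviving mass."""
    words = [w] + list(W)
    states = {v[:i] for v in words for i in range(len(v) + 1)}
    def delta(p, a):
        s = p + a
        return max((s[i:] for i in range(len(s) + 1) if s[i:] in states),
                   key=len)
    S = sorted(s for s in states if s)
    idx = {s: i for i, s in enumerate(S)}
    u = np.zeros((r + 1, len(S)))          # u[j]: count j (capped at r)
    for a in '01':
        u[0, idx[delta('', a)]] = mu1[int(a)]
    for _ in range(n - 1):
        new = np.zeros_like(u)
        for j in range(r + 1):
            for p in S:
                mass = u[j, idx[p]]
                if mass == 0.0 or p in W:
                    continue
                for b in '01':
                    q = delta(p, b)
                    if q in W:
                        continue           # forbidden pattern: mass dropped
                    jj = min(r, j + (q == w))
                    new[jj, idx[q]] += mass * pi[int(p[-1]), int(b)]
        u = new
    return u[r].sum() / u.sum()

def brute_force(w, W, pi, mu1, r, n):
    """Validation by enumeration of all 2^n binary sequences."""
    num = den = 0.0
    for bits in product('01', repeat=n):
        x = ''.join(bits)
        if any(f in x for f in W):
            continue
        prob = mu1[int(x[0])]
        for a, b in zip(x, x[1:]):
            prob *= pi[int(a), int(b)]
        den += prob
        occ = sum(x[i - len(w):i] == w for i in range(len(w), n + 1))
        num += prob if occ >= r else 0.0
    return num / den
```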
The results we get in Table 1 for P(Y_r ≤ n | N^{(d,k)}_{Y_r} = 0, X_1 = 0) are exactly the same as those of Stefanov and Szpankowski (2007). This is quite reassuring since the two approaches use completely different techniques (Markov chain embedding here, generating functions in the previous paper). It is also interesting to note that even with our slow Scilab implementation (from our personal experience, one can expect to decrease the computational time by a factor of at least 20 with a pure C implementation of the same algorithms), the computation time remains relatively small. Since working conditionally on the constraint up to Y_r involves an infinite sum while the constraint up to n only requires a finite one, the computational times for the first probability are longer than for the second, but they grow linearly with r in both cases.

Numerical limits of the presented algorithms
The space and time complexities of all the algorithms presented in this paper are strongly connected to three parameters: the number r of occurrences of the pattern of interest, the sequence length n, and the size |P|, which depends both on the pattern of interest and on the set of forbidden patterns. These complexities are linear in all these parameters, except for Algorithm 2, which also has a small quadratic contribution in r when the tail distribution is requested.
In Table 2 we study the practical limits of our algorithms by considering patterns whose sizes range from 9 to 180, and various values of r and n. In this table, we can see that the numerical convergence used in Algorithm 1 provides reliable results (with only two degenerate cases where Algorithm 1 fails to give the correct answer). However, Algorithm 2 proves to be faster in all but one case (j = 2, r = 15, n = 500).
3 Extensions of the method

Markov chain of higher order
It is easy to adapt our method to the case where X is an order-m Markov chain. To do so, we just have to replace P by the union of {0, 1}^m with all the prefixes of w and of all w̃ ∈ W whose length is at least m. The algorithms then remain the same, except that the starting distribution µ_1 over {0, 1} must be replaced by a starting distribution µ_m over {0, 1}^m, and hence the first transition to consider is at position i = m + 1 rather than i = 1 + 1 = 2 (line 6 in Algorithm 1 and Algorithm 4).

Tab. 2: Numerical performance of Algorithm 1 and Algorithm 2 to compute the distribution of the pattern w = (100100100)^j (e.g., with j = 2, w = 100100100100100100) in a (1, 4)-sequence starting with X_1 = 0 and with the following transition probabilities: π_{0,0} = 0.4, π_{0,1} = 0.6, π_{1,0} = 1.0 and π_{1,1} = 0.0. n_∞ is the largest value of n used in the infinite sum; a finite value corresponds to the position where numerical convergence is achieved in Algorithm 1, while an infinite one corresponds to the exact result of Algorithm 2. Algorithm 1 failed to produce a result on two occasions (r = 150, n = 500 for j = 10 and j = 20).
For example, let us consider the order m = 2 Markov chain whose transition probabilities are given by: π_{00,0} = 0.4, π_{00,1} = 0.6, π_{01,0} = 1.0, π_{01,1} = 0.0, π_{10,0} = q, π_{10,1} = 1 − q, π_{11,0} = 1.0 and π_{11,1} = 0.0, where q ∈ [0, 1]. If q = 0.4, the order of the model reduces to 1 and we get the same model as in the previous numerical example. We can see in Table 3 that in this case the results are very close to those in Table 1. As a validation, one can get exactly the same results as in Table 1 with q = 0.4 by considering the starting distribution µ_2(00) = 0.4 and µ_2(01) = 0.6. However, we should emphasize that this result is not a weighted combination of rows 3 and 4 of Table 3, since the computed probabilities are conditional ones. If we now consider a higher value of q (like q = 0.5), we favor the occurrences of '100', thus resulting in higher probabilities. Conversely, a lower value of q (like q = 0.3) decreases our probabilities of interest accordingly.

Renewal occurrences
Up to now, we have studied the number N_i of overlapping occurrences of our pattern of interest (up to position i). It may also be useful to consider instead the number Ñ_i of renewal occurrences up to position i. We recall that a renewal occurrence is only counted if it does not overlap a previously counted one. For instance, if w = 100100100, there are three overlapping occurrences of w in '01001001001001001' but only one renewal occurrence (the first one).
Studying Ñ_i rather than N_i is often a great deal of work; here, however, it only requires a small modification of our transition function δ. The basic idea is simply that looking for a renewal occurrence of w after one of its occurrences is exactly the same as looking for an occurrence from the start. Hence, simply setting δ(w, 0) = 0 and δ(w, 1) = 1 ensures that no unwanted occurrence is counted.
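This δ-reset can be sketched directly (toy code, our own naming): after each counted occurrence the automaton is sent back to the empty state, which is equivalent to setting δ(w, a) = δ(ε, a) for a ∈ {0, 1}:

```python
def count_occurrences(x, w, renewal=False):
    """Count occurrences of w in x: overlapping by default, renewal
    (non-overlapping, greedy left-to-right) when renewal=True."""
    # DFA over the prefixes of w with longest-suffix transitions
    states = {w[:i] for i in range(len(w) + 1)}
    def delta(p, a):
        s = p + a
        return max((s[i:] for i in range(len(s) + 1) if s[i:] in states),
                   key=len)
    count, p = 0, ''
    for a in x:
        p = delta(p, a)
        if p == w:
            count += 1
            if renewal:
                p = ''        # reset: delta(w, .) behaves like delta('', .)
    return count
```

On the example of the text, w = 100100100 occurs three times in '01001001001001001' in the overlapping sense but only once in the renewal sense.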
We use the same model (with X_1 = 0) as in Section 2.

Heterogeneous models
As in Stefanov and Szpankowski (2007), we have supposed that our sequence model is homogeneous. It is interesting to observe, however, that our results still hold for heterogeneous models: the matrices P and Q must simply be replaced, in all the algorithms, by their values at the current position.
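Concretely, the homogeneous block update u[j] ← u[j] P + u[j − 1] Q simply becomes position-dependent. A sketch with assumed shapes (the position-indexed matrices Ps[i] and Qs[i] are hypothetical placeholders):

```python
import numpy as np

def forward_heterogeneous(u0, Ps, Qs):
    """Apply the block recurrence u[j] <- u[j] P_i + u[j-1] Q_i with
    position-dependent matrices Ps[i], Qs[i] (heterogeneous model)."""
    u = [v.copy() for v in u0]
    for P, Q in zip(Ps, Qs):
        u = [u[j] @ P + (u[j - 1] @ Q if j else 0) for j in range(len(u))]
    return u
```

For two blocks and three positions, block 0 of the result expands as u0[0] P_1 P_2 P_3, and block 1 as the sum over the position at which Q is used; when all P_i and Q_i coincide this reduces to the homogeneous recursion.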

Conclusion
In this paper, we have proposed several methods based on Markov chain embedding techniques that allow us to study the distribution of a pattern of interest in random (d, k)-sequences. The case where the sequence is constrained up to Y_r had already been treated by Stefanov and Szpankowski (2007) using generating functions, but the case where the sequence is constrained up to n is a new result.
In both cases, we suggest efficient algorithms using basic linear algebra only (sums, sparse matrix × vector products, etc.) which are easy both to understand and to implement. As a validation, we have compared our numerical results to those of Stefanov and Szpankowski (2007), and they are exactly the same.
We have also demonstrated the flexibility and usefulness of our approach by providing several extensions (renewal occurrences, heterogeneous models, complex patterns of interest, etc.) which are quite straightforward to obtain here, while some of them may be hard to derive in the generating-function framework.
15: den = lupower(µ_1, r) × e_w^T
Output: P(Y_r ≤ n | N^{(d,k)}_{Y_r} = 0) = sum1/den

Algorithm 2: Algorithm computing P(Y_r ≤ n | N^{(d,k)}_{Y_r} = 0) for any r ≥ 1 and n ≥ 2. All vector × matrix products (lines 8 and 13) use the sparse structure of the matrices P and Q; the expression u • v (lines 7 and 12) denotes the scalar product of the two vectors. One can efficiently compute P(Y_r > n | N^{(d,k)}_{Y_r} = 0) with the same algorithm by simply returning sum2/den. Lines 11-14 may be omitted if one is not interested in the computation of sum2. Space complexity is O(r × |P|) and time complexity is O(r × |P| × n + r^2 × |P|) (remove the r^2 × |P| term if lines 11-14 are omitted).

(Author manuscript, published in Discrete Mathematics and Theoretical Computer Science 10, 3 (2008), 149-160.)

Require: two arrays of size |P|: x and b
1: b = u
2: for i = 1 . . . k do
3: find x such that x(I − P) = b // sparse LU solving
4: b = xQ // sparse product
5: end for
Output: u[(I − P)^{-1} Q]^k = b

Algorithm 3: Procedure lupower(u, k) computing u[(I − P)^{-1} Q]^k through sequential LU solving of linear systems. The necessary sparse LU factorization has to be performed only once for all lupower calls. Complexities are O(|P|) in space and O(|P| × k) in time.
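A dense numpy transcription of this procedure (the paper factorizes I − P once with a sparse LU, whereas np.linalg.solve refactorizes at each call, so this is only a behavioural sketch):

```python
import numpy as np

def lupower(u, k, P, Q):
    """Compute u [(I - P)^{-1} Q]^k by solving x (I - P) = b at each
    step instead of forming the inverse explicitly."""
    A = np.eye(P.shape[0]) - P
    b = np.asarray(u, dtype=float).copy()
    for _ in range(k):
        # x (I - P) = b  is equivalent to  (I - P)^T x^T = b^T
        b = np.linalg.solve(A.T, b)
        b = b @ Q
    return b
```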
(Here w = 100100100, and Ñ_500 and Ỹ_5 denote the renewal versions of N_500 and Y_5.) Unsurprisingly, it is more difficult to observe 5 renewal occurrences of w than to observe 5 overlapping ones.