Waiting Time Distribution for Pattern Occurrences in a Constrained Sequence

A binary sequence of zeros and ones is called a $(d,k)$-sequence if it does not contain runs of zeros of length either less than $d$ or greater than $k$, where $d$ and $k$ are arbitrary, but fixed, non-negative integers and $d < k$. Such sequences find applications in communication systems such as magnetic and optical recording. In some applications it is also required that $(d,k)$-sequences do not contain a specific pattern $w$. Therefore, distribution results concerning pattern occurrences in $(d,k)$-sequences are of interest. In this paper we study the distribution of the waiting time until the $r$-th occurrence of a pattern $w$ in a random $(d,k)$-sequence generated by a Markov source. Numerical examples are also provided.


Introduction
In many communication systems, including magnetic and optical recording ones, one must restrict the structure of a bit stream (binary sequence) to a class of sequences satisfying certain constraints. The simplest constrained binary sequences are those in which runs of zeros (between two consecutive ones) must have length at least $d$ and at most $k$, where $d < k$. Such sequences are called $(d,k)$-sequences (cf. [18,19,32]). For example, in a $(1,4)$-sequence, 11 and 00000 are forbidden runs. In some situations, as observed in [20], one needs to avoid certain patterns in $(d,k)$-sequences. In this paper, for a given pattern (word) $w = w_1 w_2 \ldots w_m$, we study the exact distribution of the waiting time until the $r$-th occurrence of the pattern $w$ in a random $(d,k)$-sequence generated by a Markov source.
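For concreteness, the $(d,k)$ constraint can be checked mechanically. The sketch below is illustrative code, not from the paper; it tests only the zero runs between two consecutive ones, which is the convention just stated, and leaves the treatment of leading and trailing zero runs aside since that depends on boundary conventions:

```python
def is_dk_sequence(bits, d, k):
    """True iff every run of zeros between two consecutive ones
    has length at least d and at most k (runs at the borders of
    the sequence are not checked here)."""
    ones = [i for i, b in enumerate(bits) if b == 1]
    for a, b in zip(ones, ones[1:]):
        run = b - a - 1  # number of zeros strictly between the two ones
        if run < d or run > k:
            return False
    return True

# the (1,4) example from the text: 11 and 00000 are forbidden
assert is_dk_sequence([1, 0, 1, 0, 0, 0, 1], 1, 4)
assert not is_dk_sequence([1, 1, 0, 1], 1, 4)            # contains 11
assert not is_dk_sequence([1, 0, 0, 0, 0, 0, 1], 1, 4)   # contains 00000
```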
Pattern matching is a well-studied problem. It is motivated by applications in communication theory as well as computational biology, where one looks for over-represented or under-represented patterns in order to find useful signals. In general, for a given set of patterns $W = \{W_1, \ldots, W_K\}$, where the $W_i$ are words of the same length, one searches for all $W$ occurrences in a text of length $n$. (In this paper we only consider a single pattern of length $m$ that we denote by $w$.) In the computer science literature several fast algorithms (e.g., the Knuth-Morris-Pratt and Boyer-Moore algorithms) were designed to search for such patterns. Here, we are rather interested in the distribution theory associated with the number of $W$ occurrences in a probabilistic framework where the (constrained) text is generated randomly (by a Markov source in our case).
The pattern matching problem (in a probabilistic framework) goes back, at least, to Feller. The number of word occurrences in a random text has been intensively studied over the last two decades, with significant progress in this area being reported [3,4,5,6,7,10,11,13,14,15,18,21,24,25,26,27,28,29,31]. For instance, Guibas and Odlyzko [14] revealed the fundamental role played by autocorrelation sets and their associated polynomials. Li [15] and Gerber and Li [13] introduced martingale techniques to the area and combined the latter with a relevant Markov chain embedding. Markov chain embeddings have been widely used by a number of authors (see [6,10,11] and the references in [2,12]). Blom and Thorburn [5] made connections with Markov renewal theory, and Biggins and Cannings [4] elaborated on these. Stefanov and Pakes [29] introduced exponential family methodology to the area and Stefanov [27] extended it in combination with suitable Markov renewal embeddings. Régnier and Szpankowski [22,23] established that the number of occurrences of a word is asymptotically normal under a diversity of models that include Markov chains. Nicodème, Salvy, and Flajolet [21] showed generally that the number of places in a random text at which a 'motif' (i.e., a general regular expression pattern) terminates is asymptotically normally distributed. Bender and Kochman [3] studied occurrences of a generalized pattern $W$ using (in a nutshell) the de Bruijn graph representation, which allowed the authors to establish the central limit theorem, but without explicit mean and variance. Recent surveys on pattern matching can be found in Lothaire [18] (Chaps. 6 and 7). To the best of our knowledge, the distribution theory associated with pattern occurrence in a constrained sequence, such as a $(d,k)$-sequence, has not been treated in the literature.
A brief description of our problem and methodology follows. Let $N_n$ be the number of occurrences of $w = w_1 \ldots w_m$ in a binary sequence, of length $n$, generated by a two-state Markov chain $X$. Throughout the paper such sequences will be called unconstrained sequences, whereas $(d,k)$-sequences will be called constrained sequences. By $Y_r$ we denote the waiting time until the $r$-th occurrence of the pattern $w$ in an unconstrained sequence. Bearing in mind that the initial symbol at time zero counts towards the sequence length, we have

$$P(N_n \ge r) = P(Y_r \le n - 1) \quad \text{for all } r, n \ge 1.$$

This basic renewal equation is the starting point of two different approaches to the analysis of pattern occurrences, on finite alphabets, in unconstrained sequences, as surveyed in Chaps. 6 and 7 of [18]. For example, [3,14,21,22,23] analyze $N_n$, whereas the authors of [24,28,31] study the waiting time $Y_r$, for unconstrained sequences. In the case of constrained sequences we may be interested in either the distribution of $N_n$ given the sequence is constrained up to time $n$, or the distribution of $Y_r$ given the sequence is constrained up to time $Y_r$. In other words, denoting by $N_n^{(d,k)}$ the number of runs of zeros of length either less than $d$ or greater than $k$ in an unconstrained sequence of length $n$, the probabilities of interest are $P(N_n \ge r \mid N_n^{(d,k)} = 0)$ and $P(Y_r \le n - 1 \mid N_{Y_r}^{(d,k)} = 0)$. Also, the evaluation of each of these two conditional probabilities leads to a different problem. In the present paper we deal with $P(Y_r \le n - 1 \mid N_{Y_r}^{(d,k)} = 0)$, whereas in a forthcoming paper we will treat $P(N_n \ge r \mid N_n^{(d,k)} = 0)$. Of course, the latter probability is of relevance in situations when the constrained sequence has been observed up to time $n$, whereas the former is relevant when the constrained sequence is observed up to the $r$-th occurrence of the pattern of interest.
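The renewal duality between $N_n$ and $Y_r$ can be verified by brute force on simulated unconstrained sequences. The snippet below is purely illustrative (a fair-coin source and arbitrary $w$, $r$, $n$ are assumed):

```python
import random

def count_occurrences(seq, w):
    """Number of (possibly overlapping) occurrences of w in seq."""
    m = len(w)
    return sum(1 for i in range(len(seq) - m + 1) if seq[i:i + m] == w)

def waiting_time(seq, w, r):
    """0-based index of the last symbol of the r-th occurrence of w,
    or None if w occurs fewer than r times."""
    m, seen = len(w), 0
    for i in range(len(seq) - m + 1):
        if seq[i:i + m] == w:
            seen += 1
            if seen == r:
                return i + m - 1
    return None

random.seed(0)
w, r, n = [1, 0, 1], 2, 12
for _ in range(1000):
    seq = [random.randint(0, 1) for _ in range(n)]
    lhs = count_occurrences(seq, w) >= r          # {N_n >= r}
    y = waiting_time(seq, w, r)
    rhs = y is not None and y <= n - 1            # {Y_r <= n-1}
    assert lhs == rhs
```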
Stefanov [28] provides an original approach for a recursive evaluation of the generating functions of the waiting time conditioned on seeing a portion of the pattern in an unconstrained sequence. Also, the approach provides the joint generating functions of the aforementioned waiting time $Y_r$ together with the associated counts of relevant events. This paper extends the analysis of [28] to constrained sequences. The case of constrained sequences, when the probability of interest is $P(Y_r \le n - 1 \mid N^{(d,k)}_{Y_r} = 0)$, leads to a more general type of events, associated with the above waiting time, than those considered in [28]. The key points of that extension are explained in the Idea of the Proof inserted immediately after Theorem 1 in the next section.
The paper is organized as follows. In the next section we present our main theoretical results. These provide recursive formulae for computing the joint generating function of the waiting time until seeing the $r$-th occurrence of a pattern and the associated count of runs of zeros of length either less than $d$ or greater than $k$ (the so-called forbidden patterns). In the last section we provide numerical examples.

Main Results
We assume that the binary sequences are generated by a two-state Markov chain $X = (X(n),\ n = 0, 1, \ldots)$. Its transition probabilities are denoted by

$$p_{i,j} = P(X(n+1) = j \mid X(n) = i), \qquad i, j \in \{0, 1\}.$$

Recall that $N_n^{(d,k)}$ counts the number of the so-called forbidden patterns up to time $n$. Denote by $G_r^{(s)}(z_1, z_2)$ the joint generating function of the waiting time, $Y_r$, until seeing the $r$-th occurrence of the pattern $w = w_1 w_2 \ldots w_m$, given the initial symbol of the sequence is $s$, and the associated count, $N^{(d,k)}_{Y_r}$, of occurrences of runs of zeros of length either less than $d$ or greater than $k$ up to that waiting time.
Note that if $G_{Z_1,Z_2}(z_1, z_2)$ is the joint generating function of two nonnegative integer random variables, then for the generating function $G_{Z_1|Z_2=0}(z)$ of the conditional distribution of $Z_1$, given $Z_2 = 0$, we have

$$G_{Z_1|Z_2=0}(z) = \frac{G_{Z_1,Z_2}(z, 0)}{G_{Z_1,Z_2}(1, 0)}.$$

Since the occurrences of $w$ are regeneration points, $Y_r$ is the sum of the waiting time until the first occurrence of $w$ and $r - 1$ i.i.d. intersite distances between consecutive occurrences, and of course its generating function equals

$$G^{(s)}_{Y_1}(z_1, z_2) \big( G(z_1, z_2) \big)^{r-1},$$

where $G^{(s)}_{Y_1}(z_1, z_2)$ is the joint generating function of the waiting time until the first occurrence of $w$, given the initial symbol is $s$, together with the associated count of forbidden patterns, and by $G(z_1, z_2)$ we denoted the joint generating function of the intersite distance between two consecutive occurrences of the pattern $w$ and the associated count of occurrences of the forbidden patterns.
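The conditional generating-function identity $G_{Z_1|Z_2=0}(z) = G_{Z_1,Z_2}(z,0)/G_{Z_1,Z_2}(1,0)$ is easy to check numerically on a toy joint distribution (illustrative code; the pmf below is arbitrary and not from the paper):

```python
from fractions import Fraction

# arbitrary toy joint pmf of (Z1, Z2) on a small support
pmf = {(0, 0): Fraction(1, 8), (1, 0): Fraction(2, 8),
       (1, 1): Fraction(2, 8), (2, 0): Fraction(1, 8),
       (2, 2): Fraction(2, 8)}

def G(z1, z2):
    # joint g.f.; note z2**0 == 1, so z2 = 0 keeps exactly the Z2 = 0 terms
    return sum(p * z1**a * z2**b for (a, b), p in pmf.items())

def G_cond(z):
    # g.f. of Z1 given Z2 = 0, via the identity G(z, 0) / G(1, 0)
    return G(z, 0) / G(1, 0)

# direct computation of the conditional g.f. for comparison
p0 = sum(p for (a, b), p in pmf.items() if b == 0)
direct = lambda z: sum(p * z**a for (a, b), p in pmf.items() if b == 0) / p0

z = Fraction(1, 3)
assert G_cond(z) == direct(z)
```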
The remaining part of the paper is devoted to a method for an explicit derivation of the joint generating functions introduced above. Recall that these generating functions are associated with unconstrained sequences.
Let $\nu_{i,j}$ be the transition time from state $i$ to state $j$ in the two-state Markov chain $X$ introduced earlier, that is, $\nu_{i,j} = \inf\{n \ge 0 : X(n) = j\}$, given $X(0) = i$, and let $I_{\nu_{i,j}}$ be the associated indicator function of the event "a run of zeros of length either less than $d$ or greater than $k$ has occurred during the transition time $\nu_{i,j}$." It is assumed that $\nu_{i,i} = 0$.
Introduce the functions $g_{i,j}(z_1, z_2)$ for $i, j = 0, 1$, as follows. Let

$$g_{i,j}(z_1, z_2) = E\big[ z_1^{\nu_{i,j}} z_2^{I_{\nu_{i,j}}} \big], \qquad (i, j) \neq (1, 0),$$

be the joint generating function of $(\nu_{i,j}, I_{\nu_{i,j}})$, if $i, j = 0, 1$ and $(i, j) \neq (1, 0)$. Of course

$$g_{0,0}(z_1, z_2) = g_{1,1}(z_1, z_2) = 1,$$

because $\nu_{0,0} = \nu_{1,1} = 0$, and subsequently $I_{\nu_{0,0}} = I_{\nu_{1,1}} = 0$. We define $g_{1,0}(z_1, z_2)$ to be the generating function $G_{\nu_{1,0}}(z_1)$ of $\nu_{1,0}$, that is,

$$g_{1,0}(z_1, z_2) = G_{\nu_{1,0}}(z_1) = \frac{p_{1,0} z_1}{1 - p_{1,1} z_1}. \tag{2}$$

The second identity comes from noticing that $\nu_{1,0}$ is a geometrically distributed random variable with probability of success $p_{1,0}$ and support $\{1, 2, \ldots\}$. In other words, the meaning of $g_{1,0}(z_1, z_2)$ is the same as that of the other $g_{i,j}(z_1, z_2)$, with the convention that $I_{\nu_{1,0}} = 0$. The reasons for defining $g_{1,0}(z_1, z_2)$ as if ignoring a possible occurrence of the event of interest within the passage time $\nu_{1,0}$ will become clear in the proof of Theorem 1 below. Another generating function of relevance is the joint generating function of $(\nu_{0,1}, I_{\nu_{0,1}})$, given that exactly $r$ zeros precede the starting state zero; it is assumed that these zeros are allowed to be counted towards the formation of the event marked by the indicator function $I_{\nu_{0,1}}$. Denote this joint generating function by $g_{r-0,1}(z_1, z_2)$. Clearly, $g_{r-0,1}(z_1, z_2)$ equals the joint generating function of $\nu_{0,1}$ and the indicator function of the event "a run of zeros of length either less than $d - r$ or greater than $k - r$ has occurred within that transition time."
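The closed form in (2) is just the g.f. of a geometric variable on $\{1, 2, \ldots\}$ and can be confirmed against its defining series (illustrative check; the parameter values below are arbitrary):

```python
from fractions import Fraction

p10 = Fraction(3, 4)          # illustrative transition probability
p11 = 1 - p10
z = Fraction(1, 2)

# closed form: p10 * z1 / (1 - p11 * z1)
closed_form = p10 * z / (1 - p11 * z)

# defining series: sum over i >= 1 of P(nu = i) z^i, truncated far out
series = sum(p10 * p11**(i - 1) * z**i for i in range(1, 200))

assert abs(closed_form - series) < Fraction(1, 10**20)
```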

Lemma 1
The following explicit expressions hold for the joint generating functions $g_{0,1}(z_1, z_2)$ and $g_{r-0,1}(z_1, z_2)$:

$$g_{0,1}(z_1, z_2) = \sum_{i=d}^{k} p_{0,1} p_{0,0}^{i-1} z_1^{i} + z_2 \left( \sum_{i=1}^{d-1} p_{0,1} p_{0,0}^{i-1} z_1^{i} + \frac{p_{0,1} p_{0,0}^{k} z_1^{k+1}}{1 - p_{0,0} z_1} \right), \tag{3}$$

$$g_{r-0,1}(z_1, z_2) = \sum_{i=\max(d-r,1)}^{k-r} p_{0,1} p_{0,0}^{i-1} z_1^{i} + z_2 \left( \sum_{i=1}^{\max(d-r,1)-1} p_{0,1} p_{0,0}^{i-1} z_1^{i} + \frac{p_{0,1} p_{0,0}^{k-r} z_1^{k-r+1}}{1 - p_{0,0} z_1} \right), \tag{4}$$

where $\max(i,j)$ is the maximum of the two integers $i$ and $j$, and the convention $\sum_{i=1}^{0} = 0$ applies.

Proof: Denote $p_i = P(\nu_{0,1} = i)$ and note that $p_i = p_{0,1} p_{0,0}^{i-1}$, because $\nu_{0,1}$ is geometrically distributed with probability of success $p_{0,1}$ and support $\{1, 2, \ldots\}$. Also note that $I_{\nu_{0,1}} = 0$ if and only if $d \le \nu_{0,1} \le k$. Thus, for the joint generating function $g_{0,1}(z_1, z_2)$ we get

$$g_{0,1}(z_1, z_2) = \sum_{i=d}^{k} p_i z_1^{i} + z_2 \left( \sum_{i=1}^{d-1} p_i z_1^{i} + \sum_{i=k+1}^{\infty} p_i z_1^{i} \right).$$

Simplifying the above expression leads to (3) above. Similar arguments apply for the derivation of the expression for $g_{r-0,1}(z_1, z_2)$ and the details are therefore omitted. The proof of Lemma 1 is complete. ✷

We will derive simple recurrence relations leading to an exact evaluation of the joint generating functions introduced above. For the pattern of interest $w = w_1 w_2 \ldots w_m$, denote

$$I(s_1, s_2, \ldots, s_r) = I_{w_1}(s_1) I_{w_2}(s_2) \cdots I_{w_r}(s_r), \tag{5}$$

where $I_{(\cdot)}(\cdot)$ is an indicator function. For the sake of brevity we introduce the following notation. For each $j$, $j = 2, 3, \ldots, m-1$, and each $a$, $a = 0, 1$, let

$$L_r(j, a) = I(w_{r+1}, w_{r+2}, \ldots, w_j, a) \prod_{i=2}^{r} \big( 1 - I(w_i, w_{i+1}, \ldots, w_j, a) \big), \tag{6}$$

where it is assumed that $I(w_i, w_{i+1}, \ldots, w_j, a) = 1$ if $i > j$. Note that $L_r(j, a) = 1$ for $r < j$ if and only if none of $w_i w_{i+1} \ldots w_j a$, for $i = 2, 3, \ldots, r$, is a prefix of $w_1 w_2 \ldots w_m$, whereas $w_{r+1} w_{r+2} \ldots w_j a$ is such. Also, $L_j(j, a)$ is equal to one if and only if none of $w_i w_{i+1} \ldots w_j a$, for $i = 2, 3, \ldots, j$, is a prefix of $w_1 w_2 \ldots w_m$. In other words, the $L_i(j, a)$ are relevant indicator functions related to the self-overlapping structure of the pattern $w = w_1 w_2 \ldots w_m$. In passing we observe that our definition of $L_r(j, a)$ is related to the autocorrelation set and polynomial of Guibas and Odlyzko [14] (cf. also [18,23]).
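The expression for $g_{0,1}$ can be sanity-checked numerically: truncating the defining series and setting $z_2 = 1$ must recover the plain geometric g.f. of $\nu_{0,1}$, while $z_2 = 0$ keeps only the runs with $d \le \nu_{0,1} \le k$ (illustrative code; all parameter values are arbitrary):

```python
from fractions import Fraction

def g01(z1, z2, p01, d, k, trunc=200):
    """Truncated series for E[z1^nu * z2^I], where nu is geometric(p01)
    on {1,2,...} and I = 1 iff the run length is < d or > k."""
    p00 = 1 - p01
    return sum(p01 * p00**(i - 1) * z1**i * z2**(0 if d <= i <= k else 1)
               for i in range(1, trunc))

p01, z1 = Fraction(3, 5), Fraction(1, 2)

# z2 = 1: the indicator drops out and we recover the g.f. of nu itself
plain = p01 * z1 / (1 - (1 - p01) * z1)
assert abs(g01(z1, Fraction(1), p01, 1, 4) - plain) < Fraction(1, 10**20)

# z2 = 0: exactly the terms with d <= nu <= k survive
kept = sum(p01 * (1 - p01)**(i - 1) * z1**i for i in range(2, 4))
assert g01(z1, Fraction(0), p01, 2, 3) == kept
```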
Let now $Y_1^{(s)}(w_1^j)$ be the waiting time to see the pattern $w_1^j = w_1 w_2 \ldots w_j$, given the initial state is $s$. Then we denote by $G_j^{(s)}(z_1, z_2)$ the joint generating function of $Y_1^{(s)}(w_1^j)$ and the associated count, $N^{(d,k)}_{Y_1^{(s)}(w_1^j)}$, of forbidden patterns (runs of zeros of length either less than $d$ or greater than $k$). Here we allow the first symbol, that is, $s$, to contribute to the pattern (of course this matters only if $s = w_1$). Recall that for $j = m$ this joint generating function coincides with the joint generating function of the waiting time until the first occurrence of $w$ and the associated count of forbidden patterns. Also, let $G_j^{(r-0)}(z_1, z_2)$ be the joint generating function of the same quantities as above, given the initial state (assumed to be zero) is preceded by exactly $r$ zeros and the latter zeros are allowed to count towards the formation of the relevant event concerning the forbidden patterns (in other words, the length of the first zero run within the waiting time $Y_1^{(0)}(w_1^j)$ is increased by $r$); also the initial state zero is allowed to contribute to the pattern. Further, let $G_j^{(w_1 w_2 \ldots w_h)}(z_1, z_2)$ be the joint generating function of the same quantities as above, given the sub-pattern $w_1 w_2 \ldots w_h$ ($h \le j$) has been reached.
Throughout the article it is assumed that the pattern of interest $w = w_1 w_2 \ldots w_m$ does not contain forbidden patterns. Each pattern of zeros and ones can be viewed as a sequence of alternating blocks of ones and zeros, where the length of the $i$-th block is denoted by $k_i$, with $k_i > 0$ for $i = 2, 3, \ldots$, and $k_1 > 0$ if the initial symbol of the pattern is one, whereas $k_1 = 0$ if the initial symbol is zero. For example, for the pattern 11100001100000 we have $k_1 = 3$, $k_2 = 4$, $k_3 = 2$, $k_4 = 5$, and for the pattern 001111100011 we have $k_1 = 0$, $k_2 = 2$, $k_3 = 5$, $k_4 = 3$, $k_5 = 2$. Denote by $J_1$ and $J_2$ the following subsets of $\{1, 2, \ldots, m\}$, which are associated with the pattern $w = w_1 w_2 \ldots w_m$:

$$J_1 = \bigcup_{n=1}^{b} \left\{ \sum_{i=1}^{2n-1} k_i + 1, \ldots, \sum_{i=1}^{2n-1} k_i + d - 1 \right\}, \qquad J_2 = \left\{ \sum_{i=1}^{2n} k_i : n = 1, 2, \ldots, b \right\},$$

where $b$ is the number of zero blocks of the pattern $w$. For example, if the pattern of interest is 001111100011 and $d = 2$, $k = 5$, we get $J_2 = \{2, 10\}$, and

$$G_1^{(*)}(z_1, z_2) = g_{*,w_1}(z_1, z_2), \tag{7}$$

where $*$ stands for either $0$ or $1$, or $r-0$. Actually, $G_1^{(*)}(z_1, z_2)$ has the same meaning as that of $g_{*,w_1}(z_1, z_2)$ unless $w_1 = 0$ and $* = 1$ (recall our definition of $g_{1,0}$ given in (2) above). For the latter case we formally assume that (7) holds, and the reason for that assumption will become clear in the Idea of the Proof of Theorem 1 below. Closed explicit expressions for $g_{i,j}(z_1, z_2)$ and $g_{r-0,1}(z_1, z_2)$ are found in Lemma 1 and prior to its statement. The formal proof is presented in the next section.
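The block decomposition and the set $J_2$ from the example can be computed mechanically. The helper below is illustrative code, not from the paper; it assumes, consistently with the example $J_2 = \{2, 10\}$, that $J_2$ collects the (1-based) positions at which the zero blocks of $w$ end:

```python
from itertools import groupby

def block_lengths(w):
    """Lengths of the alternating blocks of ones and zeros; k1 = 0 when
    the pattern starts with a zero, matching the paper's convention."""
    ks = [len(list(g)) for _, g in groupby(w)]
    return ([0] + ks) if w[0] == 0 else ks

def J2(w):
    """Positions (1-based) at which a zero block of w ends (assumed
    reading of J2; trailing zero runs are also counted here)."""
    out, run_end = set(), 0
    for i, b in enumerate(w, start=1):
        if b == 0:
            run_end = i
        elif run_end:
            out.add(run_end)
            run_end = 0
    if run_end:
        out.add(run_end)
    return out

w = [0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1]     # the pattern 001111100011
assert block_lengths(w) == [0, 2, 5, 3, 2]
assert J2(w) == {2, 10}                       # the example from the text
```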

Idea of the Proof:
The proof is based on a suitable extension of the methodology introduced in Stefanov [28]. The latter treated patterns formed on finite alphabets in strings generated by general discrete- and continuous-time models. In particular, Theorem 4.1 (cf. [28], p. 890) provides recurrence relations leading to an exact evaluation of the joint generating function of the waiting time until reaching a pattern together with the associated counts of occurrences of the corresponding symbols of the alphabet. In this paper we deal with a simpler model (binary alphabet and discrete-time parameter), but the joint generating function of interest is that of the waiting time until reaching a pattern together with the associated count of occurrences of an event which is not as simple as the events considered in [28]. A careful scrutiny of the proofs in [28] reveals that the recurrence relations provided there are applicable to the joint generating function of the waiting time until reaching a pattern together with the associated count of occurrences of an 'event' if the following two conditions on that 'event' are satisfied: (i) the joint generating functions of the following quantities are available: the waiting time to reach a letter from another (or the same) letter of the alphabet together with the associated count of occurrences of the 'event' of interest;
(ii) all occurrences of the 'event' of interest are captured within the passage times between the states, that is, occurrence or non-occurrence of the 'event' does not depend on the history prior to a passage time or the future after that passage time.
Note that nominating the event of interest to be a run of zeros of length either less than $d$ or greater than $k$, we see that condition (ii) is not satisfied in general. For example, in a passage time from state zero to state one, the occurrence or non-occurrence of our event of interest (a constrained zero-run) depends on the number of zeros just preceding the starting state zero. As for a passage time from state one to state zero, note that, on one hand, the occurrence or non-occurrence of the event of interest is not affected by the outcomes preceding the initial state one. On the other hand, within that passage time a run of zeros of length 1 occurs (the last observation within such a passage time is a zero which is preceded by a one); that is, the event of interest occurs if $d > 1$, given we stop observing the generated random sequence with such a passage time. If we do not stop observing the generated sequence at such a passage time, then the occurrence or non-occurrence of the event of interest depends on the future outcomes (that is, on how many zeros will follow after the first zero reached from state one). Note that we stop observing the generated random sequence at occurrences of the pattern of interest, which we assume does not contain constrained zero-runs. That is, we do not stop observing the generated random sequence at a passage time from state one to state zero if $d > 1$. Therefore, within a passage time from state one to state zero we should not account for a possible occurrence of the event of interest, because such an occurrence will be accounted for within the following passage time from state zero to state one. This is the reason for defining $g_{1,0}(z_1, z_2)$ to be equal to the generating function of $\nu_{1,0}$, as if assuming that within a passage time from state one to state zero a constrained zero-run does not occur. For the same reason we assumed that $G_1^{(1)}(z_1, z_2) = g_{1,w_1}(z_1, z_2)$ if $w_1 = 0$ (cf. the comment prior to Theorem 1). In particular, we may assume that condition (i) above is satisfied for our problem, because the relevant joint generating functions are provided in Lemma 1 and prior to it.
Further, we show how the methodology in [28] can be extended to derive the relevant recurrence relations for the case of the waiting time until reaching a pattern and the associated count of occurrences of constrained zero-runs. Recall that we consider a pattern $w = w_1 w_2 \ldots w_m$ whose consecutive blocks of ones and zeros are of lengths $k_1, k_2, k_3, \ldots$, respectively.
Assume first that the pattern of interest consists of the first $k_1 + 1$ symbols of $w$. Note that in this case condition (ii) is satisfied, because there are no zeros preceding an initial zero state at a passage time from state zero to state one. Therefore, the relations given by (8), (9), (10) for $j$ such that $1 \le j \le k_1$ are a special case of the recurrence relations in Theorem 4.1 of [28]. Since the model treated in [28] is more general, and the uninitiated reader may find it not quite transparent how to write the relations for our special model, here we provide the following hint: delete the entries of $t_{w_{j+1}}$ and $t_n$, and replace $\phi_{w_j, w_{j+1}}(z_0)$ by $z_1$ in the corresponding recurrence relations in Theorem 4.1 of [28] to get the relations (8), (9), (10).
Assume now that the pattern of interest consists of the first $k_1 + 2$ $(< k_1 + k_2)$ symbols. Then note that from the time epoch at which we have reached the subpattern $w_1 w_2 \ldots w_{k_1+1}$ until reaching the pattern $w_1 w_2 \ldots w_{k_1+2}$, one may miss counts of the event of interest (a run of zeros of length less than $d$) due to the following observation. Upon reaching $w_1 w_2 \ldots w_{k_1+1}$, assume that in the next step the pattern $w_1 w_2 \ldots w_{k_1+2}$ is not reached (that is, a mismatch occurs at this stage). Thus, a run of zeros of length 1 ($< d$, of course given $d > 1$) has occurred, and it will not be accounted for by the recurrence relations provided in [28]. To account for such occurrences one should multiply the relevant joint generating functions by $z_2$ each time such a mismatch occurs. This is achieved by replacing the denominator $1 - p_{w_j, 1-w_{j+1}} z_1 A_j$ by $1 - p_{w_j, 1-w_{j+1}} z_1 z_2 A_j$ in the recurrence relations. Therefore, relations (12), (13), (14) hold for $j = k_1 + 1$.
Similarly, if the pattern consists of the first $k_1 + 3$ $(< k_1 + k_2)$ symbols, then upon reaching the subpattern $w_1 w_2 \ldots w_{k_1+2}$, on the following step one may miss a count of the event of interest. More specifically, this is a run of zeros of length 2 ($< d$, of course given $d > 2$). To account for such occurrences of the event of interest, again the denominator $1 - p_{w_j, 1-w_{j+1}} z_1 A_j$ is to be replaced by $1 - p_{w_j, 1-w_{j+1}} z_1 z_2 A_j$. Clearly, the same argument applies if the pattern consists of the first $k_1 + j - 1$ $(< k_1 + k_2)$ symbols of $w$, where $j \le d$. Therefore, the recurrence relations (12), (13), (14) hold for $j$ such that $k_1 + 1 \le j \le k_1 + d - 1$ (these $j$'s belong to $J_1$). Further, note that for larger $j$, such that $k_1 + d \le j \le k_1 + k_2 - 1$ (note that such $j$'s do not belong to $J_1 \cup J_2$), the recurrence relations given by (8), (9), (10) hold, because in mismatch situations constrained zero-runs do not occur.
Assume now that the pattern consists of the first $k_1 + k_2 + 1$ symbols of $w$. Then note that in a mismatch situation at the next step after reaching the sub-pattern consisting of the first $k_1 + k_2$ symbols, we are at state zero with exactly $k_2$ zeros preceding it. The method in [28] implies that $G_j^{(1-w_{j+1})}(z_1, z_2)$ (this is the generating function, in the expression for $A_j$ given in (11), which accounts for the evolution of the sequence after such a mismatch situation, given no overlap occurred after that mismatch) is to be substituted by $G_j^{(k_2-0)}(z_1, z_2)$ in order to account for all occurrences of the event of interest. That is, for $j = k_1 + k_2$, the relations (8), (9), (10) hold with $G_j^{(1-w_{j+1})}$ replaced by $G_j^{(k_2-0)}$ in the expressions for the $A_j$; that is, $A_j$ is replaced by $B_j$. For larger $j$ ($j > k_1 + k_2$) similar arguments to those above apply.

Proof of Theorem 1
First, recall that the g.f., $G(z)$, of the geometric distribution on $\{0, 1, \ldots\}$, with probability of 'success' $p$, is given by $p/(1 - qz)$, where $q = 1 - p$. Also recall that for the g.f. of the random sum $Y = \sum_{i=1}^{\nu} Y_i$ we have

$$G_Y(\mathbf{z}) = G_{\nu}\big( G_{Y_1}(\mathbf{z}) \big), \tag{16}$$

where $\mathbf{z} = (z_1, z_2, \ldots, z_n)$ and the $Y_i$ are independent and identically distributed (i.i.d.) random vectors with g.f. $G_{Y_i}(\mathbf{z})$, and $\nu$ is a non-negative integer-valued random variable, independent of the $Y_i$, with g.f. $G_{\nu}(z)$. If the distribution of $\nu$ is geometric then the random sum $Y$ is called a geometric sum.
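The random-sum formula can be illustrated with a univariate Monte Carlo check (illustrative code; all parameter values are arbitrary):

```python
import random
random.seed(1)

p_nu, p_y = 0.5, 0.3     # illustrative 'success' probabilities

def G_nu(z):             # g.f. of a geometric on {0,1,...}: p / (1 - q z)
    return p_nu / (1 - (1 - p_nu) * z)

def G_y(z):              # g.f. of a geometric on {1,2,...}: p z / (1 - q z)
    return p_y * z / (1 - (1 - p_y) * z)

def sample_geom0(p):     # geometric sample, support {0,1,...}
    n = 0
    while random.random() > p:
        n += 1
    return n

z = 0.7
exact = G_nu(G_y(z))     # g.f. of the geometric sum Y = sum_{i=1}^{nu} Y_i
reps = 100_000
mc = sum(z ** sum(sample_geom0(p_y) + 1 for _ in range(sample_geom0(p_nu)))
         for _ in range(reps)) / reps

assert abs(mc - exact) < 0.01
```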
The following quantity is called briefly the first return time to the pattern $w_1 w_2 \ldots w_j$:

$$\inf\{ n \ge 1 : X(n - j + 1) = w_1, \ldots, X(n) = w_j \},$$

given that the chain starts with the pattern, that is, $(X(1-j), \ldots, X(0)) = (w_1, \ldots, w_j)$. Recall that the pattern of interest is denoted by $w = w_1 w_2 \ldots w_m$. Note that $j \notin J_1 \cup J_2$ if and only if either $w_j w_{j+1} = 11$, or $w_j w_{j+1} = 10$, or $w_{j+1} = 0$ and the number of zeros preceding $w_{j+1}$, in the block of zeros to which $w_{j+1}$ belongs, is at least $d$ (recall that $d$ pertains to the term $(d,k)$-sequence).
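Reading $J_1$ as the positions $j$ with $w_j = w_{j+1} = 0$ and fewer than $d$ zeros preceding $w_{j+1}$ in its block, and $J_2$ as the positions at which a zero block ends before a one, the position classification can be sketched as follows. This reading is an assumption on our part, chosen to be consistent with the ranges $k_1 + 1 \le j \le k_1 + d - 1$ (in $J_1$) and $j = k_1 + k_2$ (in $J_2$) used in the proof, and with the example $J_2 = \{2, 10\}$:

```python
def classify(w, d):
    """Partition positions j = 1..m-1 into J1 and J2 under the assumed
    run-based reading described above (1-based positions)."""
    J1, J2 = set(), set()
    zeros = 0                       # length of the zero run ending at position j
    for j in range(1, len(w)):      # next symbol is w[j] (i.e., w_{j+1})
        zeros = zeros + 1 if w[j - 1] == 0 else 0
        if w[j - 1] == 0 and w[j] == 1:
            J2.add(j)               # a zero block ends at j
        elif w[j - 1] == 0 and w[j] == 0 and zeros <= d - 1:
            J1.add(j)               # a mismatch here leaves a zero run < d
    return J1, J2

w = [0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1]     # 001111100011, d = 2
assert classify(w, 2)[1] == {2, 10}           # matches the paper's example
```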
We will prove the validity of (8), (9) and (10) first for $j = 1, 2, \ldots, k_1$. Recall that $k_1$ is the length of the first block of ones of the pattern of interest $w$. Of course these $j$'s do not belong to the set $J_1 \cup J_2$. Now consider the subpattern $w_1 w_2$ consisting of the first two symbols of the pattern $w$, of course assuming that $k_1 \ge 1$. Note that the joint generating function, $H_1(z_1, z_2)$, of the first return time to state $w_1$ and the associated count of the forbidden patterns within that return time, conditional on not entering state $w_2$ at the first step, is given by (17); actually, (17) is derived via conditioning on the first step. It is easy to see, using the strong Markov property applied to the consecutive entry times to state $w_1$, that the joint distribution of the waiting time to reach the pattern $w_1 w_2$ from state $s$ and the associated count of forbidden patterns up to that waiting time is the same as the joint distribution of

$$K_1 + \sum_{i=1}^{\nu_1} Y_{i,1} + e_1,$$

where $e_1$ is the unit vector $(1, 0)$, the $Y_{i,1}$ are i.i.d. (two-dimensional) random vectors, also independent of $K_1$, and $\nu_1$ is a geometric random variable, independent of the $Y_{i,1}$ and $K_1$, with probability of 'success' $p_{w_1,w_2}$. Further, the (two-dimensional) random vector $K_1$ has the same joint distribution as that of the waiting time to reach $w_1$ from state $s$ and the associated count of forbidden patterns, that is, its joint g.f. $G_{K_1}(z_1, z_2)$ is equal to $g_{s,w_1}(z_1, z_2)$. The random vector $Y_{i,1}$ has a joint generating function given by $H_1(z_1, z_2)$. Thus, in view of (16) and (17), and recalling that $G_1^{(s)}(z_1, z_2) = g_{s,w_1}(z_1, z_2)$, we obtain the corresponding expression for $G_2^{(s)}(z_1, z_2)$.
Using the same arguments as those above, we get the analogous expression for $G_2^{(r-0)}(z_1, z_2)$ for $r = 1, 2, \ldots$. That is, noting the form of $A_1$ (cf. (11)), we get that (8), (9) and (10) hold for $j = 1$. Now consider the subpattern $w_1 w_2 w_3$, assuming that $k_1 \ge 2$ (that is, $j = 2$). Similarly to the preceding case (when $j = 1$), conditioning on the first step, note that the joint generating function of the first return time to the subpattern $w_1 w_2$ and the associated count of the forbidden patterns within that return time, conditional on not entering state $w_3$ at the first step, is given by (18), where the $L_i(j, a)$ have been introduced in (6). Again using the strong Markov property, applied to the consecutive entry times to the subpattern $w_1 w_2$, we get that the joint distribution of the waiting time to reach the subpattern $w_1 w_2 w_3$ from state $s$ and the associated count of forbidden patterns up to that waiting time is the same as the joint distribution of

$$K_2 + \sum_{i=1}^{\nu_2} Y_{i,2} + e_1,$$

where $e_1$ is the unit vector $(1, 0)$, the $Y_{i,2}$ are i.i.d. random vectors, also independent of $K_2$, and $\nu_2$ is a geometric random variable, independent of the $Y_{i,2}$ and $K_2$, with probability of 'success' $p_{w_2,w_3}$. Further, the random vector $K_2$ has the same joint distribution as that of the waiting time to reach $w_1 w_2$ from state $s$ and the associated count of forbidden patterns, that is, its joint g.f. $G_{K_2}(z_1, z_2)$ is equal to $G_2^{(s)}(z_1, z_2)$. The random vector $Y_{i,2}$ has a joint generating function given by $H_2(z_1, z_2)$. Therefore, similarly to the preceding case, and using (18), we get that (8) holds for $j = 2$. Likewise, (9) and (10) hold for $j = 2$. The same arguments as those used in the cases $j = 1, 2$ apply to any $j$ such that $1 \le j \le k_1$. Therefore (8), (9) and (10) hold for $j = 1, 2, \ldots, k_1$.
Now consider the case when $j \in J_1$. First, we will consider $j = k_1 + 1$, that is, we consider the subpattern $w_1 w_2 \ldots w_{k_1+2}$. Again, conditioning on the first step, note that the joint generating function of the first return time to the subpattern $w_1 w_2 \ldots w_{k_1+1}$ and the associated count of the forbidden patterns within that return time, conditional on not entering state $w_{k_1+2}$ at the first step, is given by (19), where the $A_j$ and $L_i(j, a)$ have been introduced in (11) and (6), respectively. Actually, $H_{k_1+1}(z_1, z_2)$ differs from its counterparts $H_j(z_1, z_2)$, $j \le k_1$ (cf. (17) and (18)), by the presence of $z_2$ in front of $A_{k_1+1}$. The presence of $z_2$ accounts for an otherwise unaccounted occurrence of a forbidden pattern (a zero run of length less than $d$) at the first step, when one fails to reach in one step the state $w_{k_1+2}$ from the already reached subpattern $w_1 w_2 \ldots w_{k_1+1}$. Further, similarly to the preceding cases, and applying the strong Markov property to the consecutive entry times to the subpattern $w_1 w_2 \ldots w_{k_1+1}$, one gets that (12) holds for $j = k_1 + 1$. Likewise, one gets that (13) and (14) hold for $j = k_1 + 1$. Exactly the same arguments as those used in the case $j = k_1 + 1$ apply to any $j$ such that $k_1 + 1 \le j \le k_1 + d - 1$. Therefore, (12), (13) and (14) hold for $j = k_1 + 1, k_1 + 2, \ldots, k_1 + d - 1$. Consider now $j = k_1 + d, k_1 + d + 1, \ldots, k_1 + k_2 - 1$. These $j$'s do not belong to $J_1 \cup J_2$. Note that the relevant $H_j(z_1, z_2)$ is given by (20), which has the same form as that of (17) and (18). This is due to the observation that, at the first step when $w_{j+1}$ is not reached from the already reached subpattern $w_1 w_2 \ldots w_j$, a forbidden pattern does not occur (the reached zero run is of length at least $d$ and of course less than $k$). Therefore, (8), (9) and (10) hold for $j = k_1 + d, k_1 + d + 1, \ldots, k_1 + k_2 - 1$.
Consider now the case $j = k_1 + k_2$. This $j$ belongs to $J_2$. Note that the relevant $H_j(z_1, z_2)$ for the joint g.f. of the first return time to the subpattern $w_1 w_2 \ldots w_{k_1+k_2}$ and the associated count of the forbidden patterns within that return time, conditional on not entering state $w_{k_1+k_2+1}$ at the first step, is given by the analogue of (19) with $A_j$ replaced by $B_j$, where $B_j$ is given in (15). More specifically, note first that $B_j$ differs from $A_j$ only through the generating function associated with the indicator function $L_j(j, 1 - w_{j+1})$ (cf. (6)). This g.f. is $G_j^{(1-w_{j+1})}$ for $A_j$ and $G_j^{(k_{2n}-0)}$ ($n = 1$ for $j = k_1 + k_2$) for $B_j$, and it accounts for what happens after the first step, given $L_j(j, 1 - w_{j+1}) = 1$. Further, note that conditioning on not entering state $w_{k_1+k_2+1}$ at the first step means that a run of zeros of length exactly $k_2 + 1$ has been reached, that is, after the first step the current state zero is preceded by exactly $k_2$ zeros. After that first step is made, one waits until the subpattern $w_1 w_2 \ldots w_j$ is reached again.

Tab. 1: Probabilities for the number of occurrences, $N_n$, of $w = 100100100$ in a random $(1,4)$-sequence of length $n = 500$ with $p_{0,0} = 0.4$, $p_{0,1} = 0.6$, $p_{1,0} = 1$, $p_{1,1} = 0$.
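Probabilities of the kind reported in Tab. 1 can be approximated by naive rejection sampling: simulate the unconstrained chain and keep only the realizations that are valid $(1,4)$-sequences. The sketch below is illustrative code, not the paper's method; it uses $n = 50$ rather than $n = 500$, since the acceptance rate of plain rejection sampling decays geometrically in $n$, and it assumes the chain starts in state 1:

```python
import random
random.seed(2)

# Tab. 1 parameters except the sequence length (p10 = 1, p11 = 0, so
# runs of zeros shorter than d = 1 cannot occur and only runs longer
# than k can violate the constraint)
n, k, p01 = 50, 4, 0.6
w = [1, 0, 0, 1, 0, 0, 1, 0, 0]

def sample_chain():
    seq = [1]                      # assumed initial state
    while len(seq) < n:
        prev = seq[-1]
        seq.append(0 if prev == 1 else (1 if random.random() < p01 else 0))
    return seq

def max_zero_run(seq):
    best = run = 0
    for b in seq:
        run = run + 1 if b == 0 else 0
        best = max(best, run)
    return best

def n_occ(seq):
    m = len(w)
    return sum(1 for i in range(n - m + 1) if seq[i:i + m] == w)

counts, accepted = {}, 0
while accepted < 20_000:
    seq = sample_chain()
    if max_zero_run(seq) <= k:     # keep only valid (1,4)-sequences
        counts[n_occ(seq)] = counts.get(n_occ(seq), 0) + 1
        accepted += 1

probs = {x: c / accepted for x, c in sorted(counts.items())}
assert abs(sum(probs.values()) - 1) < 1e-9
assert any(x >= 1 for x in probs)  # some accepted sequences contain w
```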