Binary patterns in the Prouhet-Thue-Morse sequence

We show that, with the exception of the words $a^2ba^2$ and $b^2ab^2$, all (finite or infinite) binary patterns in the Prouhet-Thue-Morse sequence can actually be found in that sequence as segments (up to exchange of letters in the infinite case). This result was previously attributed to unpublished work by D. Guaiana and may also be derived from publications of A. Shur only available in Russian. We also identify the (finitely many) finite binary patterns that appear nontrivially, in the sense that they are obtained by applying an endomorphism that does not map the set of all segments of the sequence into itself.


Introduction
Let $\mu$ be the endomorphism of the free semigroup $\{a, b\}^+$ defined by $\mu(a) = ab$ and $\mu(b) = ba$. Since $a$ is a prefix of $\mu(a)$, $\mu^n(a)$ is also a prefix of $\mu^{n+1}(a)$. Hence, the sequence $(\mu^n(a))_n$ determines a sequence of letters, or infinite word, whose prefix of length $2^n$ is $\mu^n(a)$; we say that the infinite word t thus obtained is generated by $\mu$. It is called the Prouhet-Thue-Morse sequence and it has been the object of extensive studies and applications. It was first considered by Prouhet (1851) in connection with a problem in number theory, five decades later by Thue (1906, 1912) to exhibit infinite words avoiding cubes and squares, and another two decades later by Morse (1921) as a discretized description of non-periodic recurrent geodesics in surfaces of negative curvature. See Allouche and Shallit (1999) for a survey on this topic, including several further connections with other branches of mathematics. The first author and other collaborators have previously studied the sequence t in the framework of symbolic dynamics and its connections with free profinite semigroups (see Almeida and Costa (2013) and Almeida et al. (2020)). It was in fact an attempt to construct a profinite semigroup with certain properties that prompted this work, although no further references to profinite semigroups will be made in this paper.
This paper concerns the study of binary patterns of t, that is, finite or infinite words w over the alphabet $\{a, b\}$ for which there exists an endomorphism $\phi$ of the semigroup $\{a, b\}^+$ (naturally extended to infinite words) such that the word $\phi(w)$ can be found as a block of consecutive letters of t (which we call a segment of t). Since we need to identify concrete finite segments of t, a simple and efficient algorithm to compute them is presented in Section 2.
Characterizations of binary patterns of t are due to Shur (1996a) and D. Guaiana (unpublished work announced in Restivo and Salemi (2002b,a)). Our first main contribution is a proof of the characterization attributed to D. Guaiana (but also, independently, obtained by Shur (1997) in his thesis) using results from Shur (1996a): with the exception of $a^2ba^2$ and $b^2ab^2$, the binary patterns of t are its finite segments. Section 3 presents our proof of this result.
The endomorphism µ and the endomorphism ξ exchanging the letters a and b are easily seen to transform finite segments of t into other such segments. In Section 4, we consider the problem of determining which finite segments may only be transformed into other segments by endomorphisms of {a, b} + that may be obtained by composition of µ and ξ. Such words are said to be typical since we show that all but finitely many finite segments of t are typical. We further determine all atypical words. As an application of our results, we also determine all infinite binary patterns of t.
We conclude the paper with Section 5, where we propose the investigation of the properties we established for t for arbitrary infinite words.

Segments of t
By a word we always mean a finite sequence of letters of an alphabet A, that is, a member of the free monoid $A^*$. A word u is a factor of a word v if there exist words x and y such that $v = xuy$. In spite of the terminology, an infinite word is not a word but rather an infinite sequence of letters.
Note that $\{ab, ba\}$ is a code, in the sense that it generates a free subsemigroup of $\{a, b\}^+$ and, therefore, $\mu$ is injective.
For an infinite word $w = a_1a_2\cdots$, by the segments of w we mean the words of the form $a_ka_{k+1}\cdots a_\ell$ with $k \le \ell$ and the infinite words of the form $a_ka_{k+1}\cdots$. Note that, since $\mu^{n+1}(a) = \mu^n(a)\mu^n(b)$, all factors of the words $\mu^n(b)$ are segments of t. It follows that a word $u \in \{a, b\}^+$ is a segment of t if and only if so is the word that is obtained from u by interchanging the letters a and b.
A word $w \in A^+$ is said to be avoided by t if there is no homomorphism $\phi : A^+ \to \{a, b\}^+$ such that $\phi(w)$ is a segment of t. We also say that $w \in A^+$ is unavoidable in t if it is not avoided by t; we then also say that w is a pattern of t. For instance, it is well known that $a^3$ and $ababa$ are avoided by t, which is also expressed by saying that t is, respectively, cube-free and overlap-free (Lothaire, 1983).
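As a quick sanity check of these two avoidance facts, the following minimal sketch builds a prefix of t and searches it for overlaps, that is, factors of the form cXcXc with c a letter (the letter cubes aaa and bbb are the special case where X is empty). The helper names `mu_power` and `has_overlap` are illustrative and not taken from the paper, and inspecting a finite prefix is only evidence, not a proof.

```python
MU = {"a": "ab", "b": "ba"}  # the endomorphism mu: a -> ab, b -> ba

def mu_power(word, n):
    """Apply mu to `word` n times."""
    for _ in range(n):
        word = "".join(MU[c] for c in word)
    return word

def has_overlap(s):
    """True if s has a factor of length 2p+1 with period p (form cXcXc)."""
    for p in range(1, len(s) // 2 + 1):
        for i in range(len(s) - 2 * p):
            # s[i : i+2p+1] has period p iff these two windows coincide
            if s[i:i + p + 1] == s[i + p:i + 2 * p + 1]:
                return True
    return False

prefix = mu_power("a", 8)  # the prefix of t of length 2**8
print("overlap found in prefix:", has_overlap(prefix))
print("letter cube found in prefix:", "aaa" in prefix or "bbb" in prefix)
```

The brute-force search is quadratic in the prefix length, which is acceptable for a prefix of a few hundred letters.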
The preceding notions are extended to infinite words by saying how endomorphisms of $\{a, b\}^+$ are applied to infinite words. Given an infinite word $w = a_1a_2\cdots$ over the alphabet $\{a, b\}$ and an endomorphism $\phi$ of $\{a, b\}^+$, we let $\phi(w)$ be the infinite word obtained by concatenating the $\phi(a_i)$:
$$\phi(w) = \phi(a_1)\phi(a_2)\cdots. \tag{1}$$
For a nonempty word u, let $t_1(u)$ denote its last letter. The computation of the segments of t may be carried out easily in view of the following proposition. The first part is an improved version of Shur (1996b, Corollary 1), although the same conclusion is in fact already established in the proof of the cited statement. We present a proof for the sake of completeness.
Proposition 2.1 A word w is a segment of t if and only if it is a factor of $\mu^n(a)$, where $n = 1$ if $|w| = 1$, $n = 3$ if $|w| = 2$, and $n = 2 + \lceil \log_2(|w| - 1) \rceil$ otherwise. Moreover, for every integer $k \ge 3$, the value $n = 2 + \lceil \log_2(k - 1) \rceil$ is minimum for $\mu^n(a)$ to admit as factors all segments of t of length k.
Proof: Since t is cube-free, the cases where $|w| \le 3$ are easily verified by inspection. Suppose that w is a segment of t, which we may assume to be of length at least 4. Then w is a factor of $\mu^n(a)$ for some positive integer n. Take n to be minimum with that property. If m is the minimum positive integer such that w is a factor of $\mu^m(x)$ for some letter x, then either only $x = b$ can play that role and $n = m + 1$, or $x = a$ may play it and $n = m$. We need to show, respectively, that $m \le 1 + \lceil \log_2(|w| - 1) \rceil$ or $m \le 2 + \lceil \log_2(|w| - 1) \rceil$. Since $|w| \ge 4$, we may assume that $m \ge 4$ for, otherwise, the inequality $m \le 1 + \lceil \log_2(|w| - 1) \rceil$ holds trivially.
Let $\mu(x) = xy$. As $\mu^m(x) = \mu^{m-1}(xy)$ and m is minimum, there must be a nontrivial factorization $w = w_1w_2$ with $w_1$ a suffix of $\mu^{m-1}(x)$ and $w_2$ a prefix of $\mu^{m-1}(y)$.
If one of the factors $w_1$ or $w_2$ has length greater than $2^{m-2}$, then we must have $|w| > 2^{m-2} + 1$, which implies that $m \le 1 + \lceil \log_2(|w| - 1) \rceil$ and fulfills our aim. Thus, we may assume that both $w_1$ and $w_2$ have length at most $2^{m-2}$. Now, we have $\mu^m(x) = \mu^{m-2}(xyyx)$ and w is a factor of $\mu^{m-2}(yy) = \mu^{m-3}(yxyx)$. If w is a factor of either $\mu^{m-3}(xyx)$ or $\mu^{m-3}(yxy)$, then it is also a factor of $\mu^{m-2}(xx) = \mu^{m-3}(xyxy)$ and, therefore, also of $\mu^m(y)$, so that we are in the case $n = m$. On the other hand, by the minimality of m the word w cannot be a factor of $\mu^{m-3}(xy) = \mu^{m-2}(x)$ or $\mu^{m-3}(yx) = \mu^{m-2}(y)$, and so we have $|w| \ge 2^{m-3} + 2$, which implies that $n = m \le 2 + \lceil \log_2(|w| - 1) \rceil$, as claimed. It remains to consider the case where w is a factor of neither $\mu^{m-3}(xyx)$ nor $\mu^{m-3}(yxy)$. Then there must be a factorization $w = s\mu^{m-3}(xy)z$ with s and z nontrivial words, so that $|w| \ge 2^{m-2} + 2$, which yields $m \le 1 + \lceil \log_2(|w| - 1) \rceil$. This completes the proof of the first part of the proposition.
To prove the last part of the proposition, first note that, for $k \ge 3$, the value of $f(k) = 2 + \lceil \log_2(k - 1) \rceil$ is at least 3. We claim that, for $n \ge 3$, there is a word of length $2^{n-3} + 2$ that is a segment of t but not a factor of $\mu^{n-1}(a)$. Noting that $f(2^{n-3} + 2) = n$, the result follows.
Since there are no overlaps in $\mu^{n-3}(bb)$, the only place where $\mu^{n-3}(a)$ is found as a factor of $\mu^{n-3}(bb)$ is as the product of the two middle factors in the factorization given in (2). Hence, the word $w = t_1(\mu^{n-3}(b))\mu^{n-3}(a)b$ is not a factor of $\mu^{n-3}(bb)$ since, for instance, b is not the first letter of $\mu^{n-4}(a)$. □
For example, the segments of lengths 4 and 5 of the infinite word t are the factors of those lengths of $\mu^4(a) = abbabaabbaababba$. But, for instance, $aabb$ is a segment of t but not a factor of $\mu^3(a) = abbabaab$; it is precisely the segment considered in the last part of the proof of Proposition 2.1.
Throughout the remainder of the paper, when we need to check whether a concrete finite word is a segment of t, without any further reference we simply apply the algorithm given by Proposition 2.1, which is linear in the length of the given word. We proceed similarly when we need to compute all the segments of t of a given length. Now, we take into account also the dual version of Proposition 2.1, where $\mu^n(b)$ is considered instead of $\mu^n(a)$, which is a direct consequence of Proposition 2.1 using the fact that the set of all segments of t of fixed length k is closed under taking images under $\xi$. Since every segment of t of length $2^{n+1} - 1$ must contain the factor $\mu^n(a)$ or $\mu^n(b)$, it follows from Proposition 2.1 that, for $k \ge 3$, every segment of length k of t is a factor of every segment of t of some length $\ell$ which is at most $16k - 17$. The existence of such an $\ell$ is the property known as uniform recurrence of t and holds for every sequence generated by iterating a primitive endomorphism of a free semigroup (Queffélec, 2010, Proposition 5.2). In the case of the Prouhet-Thue-Morse sequence, the optimum value of $\ell$ is presented in Allouche and Shallit (2003, Example 10.9.3): for $k \ge 3$, we have $\ell = 9 \cdot 2^r + k - 1$, where r is the integer determined by the inequalities $2^r + 2 \le k \le 2^{r+1} + 1$. Note that using the first inequality determining r, one gets the upper bound $10k - 19$, which is better than our rough upper bound $16k - 17$.
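The membership test described by Proposition 2.1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `level`, `is_segment` and `segments` are hypothetical, and the ceiling of the base-2 logarithm of k − 1 is computed with integer arithmetic as `(k - 2).bit_length()` to avoid floating point.

```python
MU = {"a": "ab", "b": "ba"}  # the endomorphism mu

def mu_power(word, n):
    """Apply mu to `word` n times."""
    for _ in range(n):
        word = "".join(MU[c] for c in word)
    return word

def level(k):
    """Exponent n of Proposition 2.1 for words of length k >= 1."""
    if k == 1:
        return 1
    if k == 2:
        return 3
    return 2 + (k - 2).bit_length()  # equals 2 + ceil(log2(k - 1)) for k >= 3

def is_segment(w):
    """Test whether the nonempty finite word w is a segment of t."""
    return w in mu_power("a", level(len(w)))

def segments(k):
    """All segments of t of length k, in lexicographic order."""
    s = mu_power("a", level(k))
    return sorted({s[i:i + k] for i in range(len(s) - k + 1)})

print(segments(3))
print(is_segment("aabb"), "aabb" in mu_power("a", 3))
```

The last line reproduces the example of the word aabb, which is a segment of t although it does not yet occur in the third iterate of µ on a.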

Finite binary patterns
The following result plays a key role below.
Theorem 3.1 (Shur (1996a)) The set of words of {a, b} + that are avoided by t is the fully invariant ideal generated by the set Moreover, the above is a minimal generating set for the fully invariant ideal of the words avoided by t.
The generators corresponding to $k = 1$ and $k = 2$ are, respectively, $bab^2a^2ba$ and $a\,ab^2a\,ba^2b\,ab^2a\,a$; the generator corresponding to $m = 2$ is $a\,ba^2b\,ab^2a\,ba^2b\,a$ while, for $m = 1$, the word given by $t_1(\mu^m(a))\mu^m(bab)a = b^2a^2b^2a^2$ is avoided by t but may be obtained for instance from the generator $a^2ba^2b$ by mapping a to b and b to $a^2$.
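The words listed above can be recomputed mechanically. In the sketch below, the function `framed` builds the word formed by the last letter of the m-th iterate of µ on a, followed by the m-th iterate of µ on a given core word, followed by the letter a. This closed form is stated in the text only for the core bab; using it also for the core aba is an assumption inferred from the two words given for k = 1 and k = 2. All function names are illustrative.

```python
MU = {"a": "ab", "b": "ba"}  # the endomorphism mu

def mu_power(word, n):
    """Apply mu to `word` n times."""
    for _ in range(n):
        word = "".join(MU[c] for c in word)
    return word

def framed(core, m):
    """Last letter of mu^m(a), then mu^m(core), then the letter a."""
    return mu_power("a", m)[-1] + mu_power(core, m) + "a"

def substitute(phi, word):
    """Image of `word` under the endomorphism given by the dict `phi`."""
    return "".join(phi[c] for c in word)

print(framed("aba", 1))  # word listed for k = 1
print(framed("aba", 2))  # word listed for k = 2
print(framed("bab", 2))  # word listed for m = 2
# For m = 1 the word is the image of aabaab under a -> b, b -> aa.
print(framed("bab", 1) == substitute({"a": "b", "b": "aa"}, "aabaab"))
```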
Another useful ingredient in our arguments is the following "synchronization" result.
Lemma 3.2 (de Luca and Varricchio (1989, Lemma 3.9)) Let $X = \{ab, ba\}$ and consider $s \in X^+$ with $|s| \ge 4$. If u and v are words such that $usv \in X^+$ and $|u|$ is odd then $usv$ has an overlap.
Since t has no overlaps, we conclude that if $t = usv$, with s as in Lemma 3.2 and u a finite word, then u has even length.
Corollary 3.3 If $t = u\mu^{n+1}(x)v$, where u is a finite word, $x \in \{a, b\}$, and $n \ge 0$, then $u = \mu^n(w)$ and $v = \mu^n(z)$ for some word w and infinite word z.

Proof: We proceed by induction on n, the case $n = 0$ being trivial as $\mu^0$ is interpreted to be the identity function. Suppose that $n \ge 1$ and, by symmetry, assume that $x = a$. Since $\mu^{n+1}(a) = \mu^n(a)\mu^n(b) = \mu^{n-1}(abba)$, by the induction hypothesis we know that $u = \mu^{n-1}(w)$ and $v = \mu^{n-1}(z)$, for some finite word w and infinite word z, and so $t = wabbaz$. Lemma 3.2 then implies that w and z belong to the image of $\mu$. Hence, u and v belong to the image of $\mu^n$. □
We say that a segment u of t is special if both ua and ub are segments of t. The special segments of t have been investigated by de Luca and Varricchio (1989) with the purpose of counting the number of segments of each given length. For our purposes, it suffices to observe the following much simpler result.
We say that two words are suffix comparable if at least one of them is a suffix of the other. The following lemma is the core of our arguments.
Lemma 3.5 Suppose that u is a finite word such that ua is unavoidable in t and ub is a segment of t but ua is not. If $n \ge 2$ and u is suffix comparable with $\mu^n(a)$, then u is also suffix comparable with $\mu^{n+1}(b)$.
, we may assume that $\mu^n(a)$ is a suffix of u, say $u = u_1\mu^n(a)$. Consider a concrete occurrence of ub in t: $t = u_0ubv$, where $u_0 \in \{a, b\}^*$. Since t is recurrent, we may assume that $|u_0| \ge 2^n$. By Corollary 3.3, we know that $u_0u_1 = \mu^{n-1}(u')$ and $bv = \mu^{n-1}(v')$ for some word $u'$ and infinite word $v'$ (cf. Figure 1).
Suppose first that $u'$ ends with the letter b and $|u| > 3 \cdot 2^{n-1}$. If $u'$ ends in $ab$ then the word $t_1(\mu^{n-1}(a))\mu^{n-1}(bab)a$ is a suffix of ua which, in view of Theorem 3.1, contradicts the assumption that ua is unavoidable in t. On the other hand, if $u'$ ends with $b^2$ then, taking into account that bv starts with $\mu^{n-1}(b)$, we conclude that $t = \mu^{n-1}(u''b^2ab^2v'')$ for some finite word $u''$ and infinite word $v''$. Since t is a fixed point of the injective endomorphism $\mu$, it follows that $b^2ab^2$ is a segment of t, which we know is not the case.
If $|u| \le 3 \cdot 2^{n-1}$, then u is a suffix of $\mu^{n-1}(xab)$ for a letter x. By Lemma 3.4, as it is easy to check that $xab$ is special, so is $\mu^{n-1}(xab)$. Hence, u is special, contradicting the assumption that ua is not a segment of t.
Thus, $u'$ must end with $ba$, so that $u_0u$ ends with $\mu^{n+1}(b)$. Since both u and $\mu^{n+1}(b)$ are suffixes of $u_0u$, they must be suffix comparable, thereby concluding the proof of the lemma. □
Similarly, one can prove the following lemma.
Lemma 3.6 Suppose that u is a finite word such that ua is unavoidable in t and ub is a segment of t but ua is not. If $n \ge 2$ and u is suffix comparable with $\mu^n(b)$ then u is also suffix comparable with $\mu^{n+1}(a)$.
Shur also observed in Shur (1996a) that the word $a^2ba^2$ (and, therefore, also $b^2ab^2$) is unavoidable in t but it is not a segment of t. Our first main result is that there are no other such examples, thus providing an alternative characterization of two-letter words unavoidable in t.
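One way to see concretely that these two words are patterns of t without being segments is to exhibit a suitable endomorphism. The sketch below uses the substitution a ↦ a, b ↦ bb (and its letter-exchanged counterpart), which is our own illustrative choice and is not claimed to be the witness used by Shur; the segment test follows Proposition 2.1, and all function names are hypothetical.

```python
MU = {"a": "ab", "b": "ba"}  # the endomorphism mu

def mu_power(word, n):
    """Apply mu to `word` n times."""
    for _ in range(n):
        word = "".join(MU[c] for c in word)
    return word

def is_segment(w):
    """Segment test of Proposition 2.1 for a nonempty finite word w."""
    k = len(w)
    n = 1 if k == 1 else 3 if k == 2 else 2 + (k - 2).bit_length()
    return w in mu_power("a", n)

def substitute(phi, word):
    """Image of `word` under the endomorphism given by the dict `phi`."""
    return "".join(phi[c] for c in word)

# Illustrative witnesses (an assumption of this sketch, not from the paper).
for w, phi in [("aabaa", {"a": "a", "b": "bb"}),
               ("bbabb", {"a": "aa", "b": "b"})]:
    image = substitute(phi, w)
    print(w, "segment:", is_segment(w), "| image", image,
          "segment:", is_segment(image))
```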
According to Restivo and Salemi (2002a, Theorem 2) and Restivo and Salemi (2002b, Theorem 3), the following theorem, which is considered to be surprising, was first proved by D. Guaiana in 1996 but, through private communication with A. Restivo, we learned that the proof was never published and the manuscript appears to be lost. On the other hand, we later learned from A. M. Shur that the next theorem also appears in his Ph.D. thesis (1997), which has never been published other than as a document in the Russian State Library. Moreover, Shur observed that the result can also easily be drawn from Theorem 3.1 using a characterization of the finite words on the alphabet {a, b} that are not segments of t, which is given in Shur (2005, Statement 1), a paper also in Russian. Since all the proofs seem to be either lost or somewhat inaccessible in the Russian literature, again we present our own proof for the sake of completeness.
Theorem 3.7 A word $w \in \{a, b\}^+$ is unavoidable in t if and only if it is one of the words $a^2ba^2$ and $b^2ab^2$, or it is a segment of t.
Proof: We proceed by induction on the length of w. In view of Theorem 3.1, it is easy to check that the theorem holds for words of length at most 5. Assuming inductively that the result holds for words of length n, let w be a word of length $n + 1 \ge 6$ that is unavoidable in t. Since interchanging the letters a and b does not affect either of the properties of being unavoidable in t and being a segment of t, we may as well assume that a is the last letter of w.
Let w = ua. Since w is unavoidable in t, so is u. Hence, by induction hypothesis, u may be found somewhere as a segment of t. Take such an occurrence of u in t and let x be the letter immediately after it. We wish to show that there is such an occurrence of u in t with x = a. Aiming at a contradiction, we may assume that there is no such occurrence, that is, we always have x = b. Since t is recurrent, the segment ux may be found in t as far as we wish, so that we may continue prolonging it on the left as much as may be convenient. Thus, we are assuming that ua is unavoidable in t and that ub is a segment of t as long as desired but ua is not a segment of t. However, we have to be careful because there is in principle no assurance that such an extension of u to the left retains the property that ua is unavoidable in t.
Since $a^3$ is avoided by t and we are assuming that $x = b$, u cannot end with $b^2$. We distinguish several cases according to the termination of the word u.
If u ends with b then, by the above, it ends with $ab$. Suppose, more precisely, that u ends with $bab$. Since w ends with $baba$ and $ababa$ is avoided by t, in fact u must end with $b^2ab$. This situation is impossible since we know that the suffix $b^2ab^2$ of ub is not a segment of t.
Alternatively, assuming that u ends with b, it must end with $ba^2b = \mu^2(b)$. We may then apply successively Lemmas 3.5 and 3.6 to deduce that there is some $n \ge 2$ such that u is a suffix of $\mu^n(a)$. By Lemma 3.4, it follows that u is special, which contradicts the assumption that ua is not a segment of t.
The next case we consider is that where u ends with $aba$. Note that u cannot end with $a^2ba$ for, otherwise, $w = ua$ ends with $ba^2ba^2$ and, therefore, it cannot be unavoidable in t. Also, u cannot end with $baba$ since $babab$ is not a segment of t. Hence, $aba$ is not a suffix of u.
Thus, assuming that u ends with $ba$, it must end with $ab^2a = \mu^2(a)$. We are then again led to a contradiction as above using Lemmas 3.5, 3.6 and 3.4. □

Typical finite binary patterns
Recall that the endomorphism of $\{a, b\}^+$ switching the letters a and b is denoted $\xi$. Since the finite segments of t are the finite words that are factors of $\mu^n(a)$ for all sufficiently large n and $\mu^n(a)$ is a factor of $\mu^{n+1}(b)$, we may replace a by b in that characterization of the segments of t. Moreover, as $\xi$ commutes with $\mu$, we conclude that the set of finite segments of t is closed under applying the substitutions $\xi$ and $\mu$. By induction on n, the words $\mu^{2n}(a)$ are palindromic, in the sense that they coincide with the words read in the reverse order; this entails the well known fact that the set of segments of t is closed under reversal. We say that a word $w \in \{a, b\}^+$ is atypical if it is a segment of t and there is an endomorphism $\phi$ of $\{a, b\}^+$ such that $\phi(w)$ is also a segment of t and $\phi$ is not of one of the forms $\mu^n$ or $\xi \circ \mu^n$ with $n \ge 0$. Segments of t that are not atypical are said to be typical.
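The two structural facts used in this paragraph, namely that ξ commutes with µ and that the even iterates of µ on a are palindromes, are easy to test on small cases. The following sketch does so; the function names are illustrative, and a check on finitely many words is of course no substitute for the induction.

```python
MU = {"a": "ab", "b": "ba"}  # the endomorphism mu
XI = {"a": "b", "b": "a"}    # the letter exchange xi

def mu(word):
    """Image of `word` under mu."""
    return "".join(MU[c] for c in word)

def xi(word):
    """Image of `word` under xi."""
    return "".join(XI[c] for c in word)

def mu_power(word, n):
    """Apply mu to `word` n times."""
    for _ in range(n):
        word = mu(word)
    return word

# xi and mu commute on words, since they commute on the letters a and b.
print(all(mu(xi(w)) == xi(mu(w)) for w in ["a", "b", "abba", "aabab"]))

# Even iterates of mu on a are palindromes; odd iterates are not.
for n in range(5):
    even, odd = mu_power("a", 2 * n), mu_power("a", 2 * n + 1)
    print(n, even == even[::-1], odd == odd[::-1])
```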
We say that a word is a variant of another word w if it may be obtained from w by applying reversal or ξ or both. Note that the set of atypical words is closed under taking factors and, by the above discussion, it is also closed under taking variants.
Proposition 4.1 If $u^2$ is a segment of t then u is one of the words $\mu^n(a)$, $\mu^n(b)$, $\mu^n(aba)$ or $\mu^n(bab)$ for some $n \ge 0$.
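Proposition 4.1 can be checked experimentally for short squares. The sketch below enumerates all square segments of t whose root has length at most 12, using the segment computation of Proposition 2.1, and compares the roots with the words allowed by the proposition. The bound 12 and the function names are arbitrary choices made for this illustration.

```python
MU = {"a": "ab", "b": "ba"}  # the endomorphism mu

def mu_power(word, n):
    """Apply mu to `word` n times."""
    for _ in range(n):
        word = "".join(MU[c] for c in word)
    return word

def segments(k):
    """Set of segments of t of length k, following Proposition 2.1."""
    n = 1 if k == 1 else 3 if k == 2 else 2 + (k - 2).bit_length()
    s = mu_power("a", n)
    return {s[i:i + k] for i in range(len(s) - k + 1)}

MAX_ROOT = 12  # arbitrary bound on the length of the root u

# Roots u such that the square uu is a segment of t.
roots = {w[:h] for h in range(1, MAX_ROOT + 1)
         for w in segments(2 * h) if w[:h] == w[h:]}

# Words permitted by Proposition 4.1, for small exponents.
allowed = {mu_power(x, n) for x in ("a", "b", "aba", "bab") for n in range(4)}

print(sorted(len(u) for u in roots))
print(roots <= allowed)
```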
Yet another property of the Prouhet-Thue-Morse infinite word is the following result which explains the above terminology.
Theorem 4.2 Let w ∈ {a, b} + be a segment of t containing at least one of the segments aba and bab along with all other segments of t of length 3. Then w is typical.
Proof: Suppose $\phi$ is an endomorphism of $\{a, b\}^+$ such that $\phi(w)$ is a segment of t. By Proposition 4.1, since $\phi(a^2)$ and $\phi(b^2)$ are square segments of t, each of the words $\phi(a)$ and $\phi(b)$ must be obtained by applying a power of $\mu$ to one of the words $a$, $b$, $aba$, $bab$. Let then $\phi(a) = \mu^k(u)$ and $\phi(b) = \mu^\ell(v)$, where $u, v \in \{a, b, aba, bab\}$. We may assume that $k \le \ell$ since, otherwise, we would consider the pair $(\xi(w), \phi \circ \xi)$ instead of $(w, \phi)$. Then, we have the factorization $\phi = \mu^k \circ \psi$, where $\psi$ is the endomorphism of $\{a, b\}^+$ defined by $\psi(a) = u$ and $\psi(b) = \mu^{\ell-k}(v)$. Since $\mu$ is injective and t is a fixed point of $\mu$, from the fact that $\phi(w)$ is a segment of t, we conclude that so is $\psi(w)$. On the other hand, since $\xi$ and $\mu$ commute, if $\psi$ is a product of $\mu$ and $\xi$ then so is $\phi$. Hence, we may assume that $k = 0$.
The mapping $\xi \circ \phi$ has also the property that $(\xi \circ \phi)(w)$ is a segment of t. Since $(\xi \circ \phi)(a) = \xi(\mu^k(u)) = \mu^k(\xi(u))$, we may further assume that u is one of the words $a$ or $aba$. Since $aab$ is a factor of w and neither $a^3$ nor $a^2ba^2$ is a factor of t, v cannot start with a and, therefore, it must be either $b$ or $bab$.
Consider first the case where $u = a$. Since $baa$ is a factor of w but $a^3$ is not a factor of t, $\phi(b)$ cannot end in a, which implies that $\ell$ is even. If $\ell \ge 2$, then $\phi(b)$ starts with $\mu^2(b) = ba^2b$. But, since $a^2b$ is a factor of w, this implies that $a^2ba^2$ is a factor of $\phi(w)$ and, therefore, a segment of t, which we know is not the case. Hence, we must have $\ell = 0$. It remains to rule out the case $\phi(b) = bab$, which results from noting that in that case, from the assumption that either $aba$ or $bab$ is a factor of w, it follows that either $ababa$ or $bababab$ is a factor of $\phi(w)$ while we know that $ababa$ is not a segment of t.
Next, consider the case where $u = aba$. Since $a^2b$ is a factor of w and $ababa$ is not a segment of t, $\phi(b)$ cannot start with $ba$. As $\mu^\ell(b)$ is a prefix of $\phi(b)$, it follows that $\ell = 0$ and $\phi(b) = b$. This leads to a similar situation as that considered at the end of the preceding paragraph, with the letters a and b interchanged, which is, therefore, excluded. This completes the proof of the theorem. □
The assumption of Theorem 4.2 that a segment of t contains as factors at least one of the words $aba$ and $bab$ along with all other segments of length 3 of t holds for all segments of t of length 10, as may be easily checked by examining all segments of that length. Hence, by Theorem 4.2 there are only finitely many atypical words. Since there are 5 different segments of length 3 of t that are supposed to appear in the word, no word with length shorter than 7 satisfies the criterion of Theorem 4.2 and $ab^2a^2ba$ is a word of length 7 that does satisfy it. On the other hand, the segment $a^2bab^2aba$, of length 9, fails to have the segment $ba^2$ as a factor.
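The finite checks mentioned in the preceding remarks can be replayed with a few lines of code. The sketch below encodes the hypothesis of Theorem 4.2 as the hypothetical predicate `satisfies_criterion` and evaluates it on the segments of t of length 10 and on the two example words; segments are computed as in Proposition 2.1.

```python
MU = {"a": "ab", "b": "ba"}  # the endomorphism mu

def mu_power(word, n):
    """Apply mu to `word` n times."""
    for _ in range(n):
        word = "".join(MU[c] for c in word)
    return word

def segments(k):
    """Set of segments of t of length k, following Proposition 2.1."""
    n = 1 if k == 1 else 3 if k == 2 else 2 + (k - 2).bit_length()
    s = mu_power("a", n)
    return {s[i:i + k] for i in range(len(s) - k + 1)}

def satisfies_criterion(w):
    """Hypothesis of Theorem 4.2: w contains aab, abb, baa, bba and
    at least one of aba, bab as factors."""
    factors = {w[i:i + 3] for i in range(len(w) - 2)}
    return {"aab", "abb", "baa", "bba"} <= factors and bool({"aba", "bab"} & factors)

print(all(satisfies_criterion(w) for w in segments(10)))
print(satisfies_criterion("abbaaba"))    # the word of length 7
print(satisfies_criterion("aababbaba"))  # the word of length 9, lacking baa
```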
The following result completes the above observations by giving the full identification of atypical words.
Theorem 4.3 Up to taking variants, the atypical words are the factors of the words aabab, abaaba, and aabbaab.
Proof: To check that all relevant words have been duly considered, the reader may wish to refer to the diagram in Figure 2 later in the paper, where all atypical words are represented.
Note that all these words are variants of factors of at least one of the three words in the statement of the theorem. Hence, by showing that those three words are atypical, we obtain that so are all words of length up to 5. We next indicate for each of the words in the statement of the theorem an endomorphism $\phi$ of $\{a, b\}^+$ not of the forms $\mu^n$ and $\xi \circ \mu^n$ that maps it to a segment of t:
• $aabab$: $\phi(a) = a$, $\phi(b) = b^2aba^2b$;
• $abaaba$: $\phi(a) = a$, $\phi(b) = b^2$;
• $aabbaab$: $\phi(a) = a$, $\phi(b) = bab$.
The verification of all these statements amounts to routine calculations.
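These routine calculations can be delegated to a short script. The sketch below applies each of the three endomorphisms listed above to its word and tests, via Proposition 2.1, that both the word and its image are segments of t. The dictionary `WITNESSES` simply transcribes the three items of the list; the function names are illustrative.

```python
MU = {"a": "ab", "b": "ba"}  # the endomorphism mu

def mu_power(word, n):
    """Apply mu to `word` n times."""
    for _ in range(n):
        word = "".join(MU[c] for c in word)
    return word

def is_segment(w):
    """Segment test of Proposition 2.1 for a nonempty finite word w."""
    k = len(w)
    n = 1 if k == 1 else 3 if k == 2 else 2 + (k - 2).bit_length()
    return w in mu_power("a", n)

def substitute(phi, word):
    """Image of `word` under the endomorphism given by the dict `phi`."""
    return "".join(phi[c] for c in word)

# Word -> endomorphism, transcribed from the list in the proof.
WITNESSES = {
    "aabab":   {"a": "a", "b": "bbabaab"},
    "abaaba":  {"a": "a", "b": "bb"},
    "aabbaab": {"a": "a", "b": "bab"},
}

for w, phi in WITNESSES.items():
    image = substitute(phi, w)
    print(w, "->", image, is_segment(w), is_segment(image))
```

Since each of these endomorphisms fixes a and does not fix b, none of them is a power of µ, possibly followed by ξ.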
Showing that there are no other atypical words requires more work. Note that a word is typical if it has a typical factor. Hence, also excluding variants and words that satisfy the criterion of Theorem 4.2, we obtain the following reduced list of words remaining to be treated:
$$aababb,\quad aabbab,\quad abbaabba,\quad ababba. \tag{3}$$
We proceed to show that each word w in the list (3) is typical. For that purpose, assume that $\phi$ is an endomorphism of $\{a, b\}^+$ such that $\phi(w)$ is a segment of t.
In the first three cases, since $\phi(a^2)$ and $\phi(b^2)$ are factors of $\phi(w)$, we may start the argument using Proposition 4.1 as in the proof of Theorem 4.2, assuming that $\phi(a)$ is either $a$ or $aba$.
Consider first the case $\phi(a) = a$. Since in the three cases $aab$ is a factor of w but t is cube-free, $\phi(b)$ must start with b. Therefore, we may assume that $\phi(b) = bv$ with $v \in \{a, b\}^+$. In all three cases, since $abb$ is a factor of w, we get that $abvbv$ is a factor of $\phi(w)$ and this would provide an overlap in t if v ends with a. Hence v ends with b. Since $\phi(b)$ is of the form $\mu^n(x)$ for some $x \in \{a, aba, b, bab\}$, we conclude that either $\phi(b)$ starts with $baab$ or it is $bab$. The first case is excluded since $aabaa$ is then a factor of $\phi(aab)$, whence also of $\phi(w)$, while it is not a segment of t. The case $\phi(b) = bab$ is also excluded if w is either $aababb$ or $aabbab$ since it leads to the overlap $babab$ in the factor $\phi(bab)$ of $\phi(w)$. In case $w = abbaabba$, one can simply check directly that $\phi(w)$ is not a segment of t.
Still treating for the moment only the first three of the words in the list (3), suppose next that $\phi(a) = aba$. Again, as $aab$ is a factor of w and $aabaa$ cannot be a factor of $\phi(w)$, $\phi(b)$ must start with b. If it ends with a, then $a\phi(bb)$ would be an overlap in $\phi(w)$ since $abb$ is a factor of w. Hence, $\phi(b)$ starts and ends with b. This is impossible in case w has the factor $bab$ since it would lead to the overlap $babab$ in $\phi(w)$. This excludes the cases where w is the first or the second word in the list (3). So, we have $w = abbaabba$. Then $\phi(w)$ is a square segment of t. By Proposition 4.1, $\phi(abba)$ is one of the words $\mu^n(x)$ with $x \in \{a, aba, b, bab\}$. Since $n \le 1$ gives a word that is too short to be $\phi(abba)$, we must have $n \ge 2$, in which case a simple calculation shows that $\mu^n(x)$ cannot start with $aba$. This ends the verification that the first three words in the list (3) are typical.
It remains to consider the word $w = ababba$. Here, we have two square factors of $\phi(w)$, namely the squares of $\phi(b)$ and $\phi(ab)$. By Proposition 4.1 we know that there are words $x, y \in \{a, aba, b, bab\}$ and nonnegative integers m, n such that $\phi(b) = \mu^m(x)$ and $\phi(ab) = \mu^n(y)$. In case $m > n$, comparing the lengths of the word $\phi(ab)$ and its factor $\phi(b)$, we obtain the inequality $2^n|y| > 2^m|x|$, so that $3 \ge |y| > 2^{m-n}|x| \ge 2^{m-n}$. It follows that $|x| = 1$, $|y| = 3$ and $m = n + 1$. From the equalities $\mu^n(y) = \phi(ab) = \phi(a)\mu^{n+1}(x)$ we then deduce that $\phi(a) = \mu^n(y_1)$, where $y_1$ is the first letter of y, and $\mu(x) = xy_1$. Since $abb$ is a factor of w, $\phi(abb) = \mu^n(y_1xy_1xy_1)$ is a segment of t, which contradicts t being overlap-free. Thus, we must have $n \ge m$. Then $\phi(a)$ must be of the form $\mu^m(z)$ where z is a prefix of $\mu^{n-m}(y)$. It follows that, as in the proof of Theorem 4.2, we may then assume that $m = 0$ and that $\phi(b)$ is either $b$ or $bab$. Consider first the case where $\phi(b) = b$. Since $abba$ is a factor of w but $b^3$ is not a factor of $\phi(w)$, the word $\phi(a)$ must start and end with the letter a and we may assume that it is not reduced to a. Since $\phi(a) = z$, we conclude that $\phi(a)$ must start with $abba$. Since $bba$ is a factor of w, this yields the factor $bbabb$ of $\phi(w)$, which is not possible since $\phi(w)$ is a segment of t. Finally, the case $\phi(b) = bab$ is excluded since $\phi(ab) = \mu^n(y)$ cannot end with $bab$. This concludes the proof of the theorem. □
To facilitate the visualization of the set of atypical words, we give a semigroup theoretical formulation. Although we do not go deep into it, the reader unfamiliar with semigroup theory may prefer to skip these considerations or refer to a standard textbook in the area such as Clifford and Preston (1961) or Howie (1995).
Let S be the set of atypical words. We may define a multiplication on the set $S^0 = S \cup \{0\}$ as follows: for $u, v \in S$, $u \cdot v$ is $uv$ if $uv$ is atypical and 0 otherwise; for all $s \in S^0$, $s \cdot 0 = 0 \cdot s = 0$. Note that $S^0$ is the Rees quotient of $\{a, b\}^+$ by the ideal consisting of the typical words together with the words that are not segments of t.
The diagram in Figure 2 represents $S^0$ as a partially ordered set for the Green $\mathcal{J}$-order, in which an element u lies above v if and only if u is a factor of v. The words in bold are the lexicographic minima among their variants; note that those that are atoms (which are underlined) are precisely the words that were shown directly to be atypical in Theorem 4.3.
We conclude this section with another application of Theorem 4.2, this one concerning infinite patterns of t.
Corollary 4.4 Let w be an infinite word and suppose that there is an endomorphism $\phi$ of $\{a, b\}^+$ such that $\phi(w)$ is a suffix of either t or $\xi(t)$. Then w is itself a suffix of either t or $\xi(t)$.
Proof: Since all segments of w are unavoidable in t and they are all extendable on the right, by Theorem 3.7 they are segments of t. Since the language of the segments of t defines a minimal subshift (Queffélec, 2010, Proposition 5.2), it follows that w and t have the same segments. In particular, the word $a^2b^2a^2bab$ is a segment of w and it satisfies the assumption of Theorem 4.2. It follows that there is $n \ge 0$ such that $\phi = \mu^n$ or $\phi = \xi \circ \mu^n$. Again, since $\mu$ is injective and both t and $\xi(t)$ are fixed by $\mu$, the result follows. □
The somewhat different formulation for finite and infinite segments (compare Theorem 3.7 with Corollary 4.4) is fully justified by the following result, which entails that the infinite words t and $\xi(t)$ have no common suffix.
Proposition 4.5 If s is an infinite word over {a, b} and w is a common infinite suffix of s and ξ(s), then w is periodic.
Proof: By assumption, there are finite words x and y such that $s = xw$ and $\xi(s) = yw$. Since w and $\xi(w)$ start with different letters, the words x and y have different lengths. Replacing s by $\xi(s)$, if needed, we may assume that x is shorter than y. As $\xi(x)$ is a prefix of $\xi(s) = yw$, it follows that $y = \xi(x)z$ for some word z. From $\xi(s) = \xi(x)zw$, we deduce that $\xi(w) = zw$ and so $w = \xi^2(w) = \xi(z)zw$, thereby showing that w is periodic. □

Final remarks and problems
For an infinite word w over a finite alphabet A, let L(w) be the language consisting of its finite segments. Note that the automorphisms of the semigroup A + permute the letters of A; we call them letter exchanges.
The language obtained from $L(w)$ by applying all possible letter exchanges is denoted $\bar{L}(w)$. Let $E(w)$ denote the set of all endomorphisms $\phi$ of $A^+$ such that $\phi(L(w)) \subseteq L(w)$. The set $\bar{E}(w)$ is similarly defined using $\bar{L}(w)$ instead of $L(w)$. Note that both $E(w)$ and $\bar{E}(w)$ are submonoids of the monoid $\mathrm{End}(A^+)$ of all endomorphisms of the semigroup $A^+$.
The following is an immediate consequence of Theorem 4.2.
Corollary 5.1 The monoid $E(t) = \bar{E}(t)$ is generated by the set $\{\xi, \mu\}$. In particular, it is finitely generated. □
Corollary 5.1 is intimately related with a result of Thue (see Berstel (1995, Chapter 3, Theorem 2.16)) that characterizes the set of the so-called overlap-free morphisms, that is, endomorphisms of $\{a, b\}^+$ that map the set of all overlap-free words into itself, namely as the monoid generated by $\{\xi, \mu\}$. In fact, in view of another result of Thue (see Berstel (1995, Chapter 3, Theorem 2.15)), all (overlap-free) words that can be arbitrarily prolonged in both directions to overlap-free words are segments of t. It follows that overlap-free morphisms belong to $E(t)$ and so Corollary 5.1 immediately yields Thue's necessary condition for overlap-free morphisms. That the condition is also sufficient is given by another result of Thue (see Berstel (1995, Chapter 3, Lemma 2.2)). It does not appear to be immediately obvious how to deduce Corollary 5.1 from Thue's results.
Corollary 5.1 is also related with a result of Pansiot (1981) characterizing the endomorphisms of $\{a, b\}^+$ that generate some infinite word obtained from t by dropping a finite prefix as precisely the powers of $\mu$. Since t is recurrent, all such infinite words w have the same language $L(w) = L(t)$. Hence, the endomorphisms $\phi$ considered by Pansiot belong to $E(t)$, whence they are products of $\xi$ and $\mu$. Since $\xi$ and $\mu$ commute, it follows from Corollary 5.1 that $\phi$ is either $\mu^k$ or $\xi\mu^k$ for some $k \ge 0$, the latter possibility being excluded because w is assumed to be a fixed point of $\phi$. This gives Pansiot's result. Again, it is not clear how to deduce Corollary 5.1 from Pansiot's results.
Theorems 3.7 and 4.2, together with Corollary 5.1 may be regarded as three finiteness properties of the Prouhet-Thue-Morse sequence. It is natural to ask which infinite words possess such finiteness properties. More precisely, we propose the following problems.
Problem 1 Which infinite words w have the property that, up to finitely many exceptions, the patterns of w on the same alphabet are obtained from its segments up to an exchange of letters?
Problem 2 For which infinite words w is the monoid $E(w)$ finitely generated? Similar question for $\bar{E}(w)$.
We say that a finite segment u of w is w-atypical if there is some endomorphism $\phi \notin \bar{E}(w)$ of $A^+$ such that $\phi(u)$ is also a segment of w.
Problem 3 Which infinite words w have only finitely many w-atypical segments?
A negative example for Problem 1 is provided by the Fibonacci infinite word, which is the only fixed point f of the endomorphism $\varphi$ of $\{a, b\}^+$ defined by $\varphi(a) = ab$ and $\varphi(b) = a$. That there are infinitely many finite binary patterns of f that are not segments of f was proved in Restivo and Salemi (2002a) (see also Restivo and Salemi (2002b)), where it is also shown that there are Sturmian infinite words that admit as patterns all segments of all Sturmian infinite words. Recall that an infinite word is Sturmian if it has exactly $n + 1$ segments of each length $n \ge 1$. We do not know whether $E(f)$ is generated by $\varphi$ and $\bar{E}(f)$ is generated by $\varphi$ and $\xi$. We also do not know whether the set of f-atypical words is finite.
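To make the Sturmian condition concrete, the following sketch counts the distinct factors of each small length in a long prefix of the Fibonacci word and, for contrast, in a prefix of t. The function names and the chosen prefix lengths are illustrative; counting factors in a finite prefix gives the number of segments only when the prefix is long enough to contain all of them, which is assumed here for the small lengths considered.

```python
def iterate(rules, word, n):
    """Apply the endomorphism given by the dict `rules` to `word` n times."""
    for _ in range(n):
        word = "".join(rules[c] for c in word)
    return word

def factor_count(word, n):
    """Number of distinct factors of length n occurring in `word`."""
    return len({word[i:i + n] for i in range(len(word) - n + 1)})

fib = iterate({"a": "ab", "b": "a"}, "a", 15)   # prefix of the Fibonacci word
tm = iterate({"a": "ab", "b": "ba"}, "a", 10)   # prefix of t

for n in range(1, 9):
    print(n, factor_count(fib, n), factor_count(tm, n))
```

For the Fibonacci prefix the counts grow by one at each step, in agreement with the definition recalled above, whereas t already has six segments of length 3 and is therefore not Sturmian.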
Problem 1 was raised in Restivo and Salemi (2002a) for binary infinite words that are either fixed points of endomorphisms or of linear complexity. In the same paper, it is observed that if w is an infinite word with all elements of $A^+$ as segments (which may be obtained for instance by concatenating all the words in a sequence enumerating the elements of $A^+$), then obviously w is a positive example for Problem 1. Note that $E(w) = \bar{E}(w) = \mathrm{End}(A^+)$ and it is easy to see that $\mathrm{End}(A^+)$ is not finitely generated: for the endomorphism that maps each letter to itself, except for one letter a that is mapped to $a^p$, where p is prime, the only elements of $\mathrm{End}(A^+)$ that are factors of it are the letter exchanges and the factors of which it is also a factor. From the preceding observation it also follows that there are no w-atypical words. Thus, w is a negative example for Problem 2 and a positive example for Problem 3.