Latin Square Thue-Morse Sequences are Overlap-Free

We define a morphism based upon a Latin square that generalizes the Thue-Morse morphism. We prove that fixed points of this morphism are overlap-free sequences, generalizing results of Allouche-Shallit and Frid.


Introduction
In his 1912 paper, Axel Thue introduced the first binary sequence that does not contain an overlap [7]. It is now called the Thue-Morse sequence: 01101001100101101001011001101001...
An overlap is a string of letters of the form $cxcxc$, where $c$ is a single letter and $x$ is a finite, possibly empty, string. Every overlap begins with a square, namely $ww$ with $w = cx$ as given above. It is easy to observe, as Thue did, that any binary string of four or more letters must contain a square.
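The two definitions above can be tested by brute force. The following is a minimal Python sketch and not part of the original argument; the names `has_square` and `has_overlap` are chosen here for illustration. It checks Thue's observation about binary strings of length four and confirms that the 32-letter prefix quoted above contains squares but no overlap.

```python
from itertools import product

def has_square(w):
    """Return True if w contains a factor ww with w nonempty."""
    n = len(w)
    return any(
        w[i:i + p] == w[i + p:i + 2 * p]
        for p in range(1, n // 2 + 1)      # p = |w|
        for i in range(n - 2 * p + 1)      # start of the factor
    )

def has_overlap(w):
    """Return True if w contains a factor cxcxc (c a letter, x possibly empty)."""
    n = len(w)
    return any(
        all(w[k] == w[k + p] for k in range(i, i + p + 1))
        for p in range(1, n // 2 + 1)      # p = |cx|
        for i in range(n - 2 * p)          # the factor is w[i : i + 2p + 1]
    )

# Every binary string of length four contains a square.
every_length_four_has_square = all(
    has_square("".join(b)) for b in product("01", repeat=4)
)

# The 32-letter prefix of the Thue-Morse sequence quoted above.
thue_morse_prefix = "01101001100101101001011001101001"
print(every_length_four_has_square, has_overlap(thue_morse_prefix))
```

An overlap with $|cx| = p$ is exactly a factor of length $2p + 1$ whose letters repeat with period $p$, which is the condition tested in `has_overlap`.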
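There are several ways to define the Thue-Morse sequence [2]. We will derive it as a fixed point of a morphism. Let $\Sigma$ be an alphabet and let $\Sigma^* \cup \Sigma^\omega$ be the set of all finite or infinite strings over $\Sigma$. A morphism is a mapping $h : \Sigma^* \cup \Sigma^\omega \to \Sigma^* \cup \Sigma^\omega$ that obeys the identity $h(xy) = h(x)h(y)$ for $x$ a finite string and $y \in \Sigma^* \cup \Sigma^\omega$ [1, p. 8]. The Thue-Morse morphism $\mu$ on $\Sigma = \{0, 1\}$ is defined by $\mu(0) = 01$ and $\mu(1) = 10$, and the Thue-Morse sequence is the limit $\mu^\omega(0) = \lim_{k \to \infty} \mu^k(0)$.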
Notice that $\mu^\omega(\mu(0)) = \mu^\omega(0)$ and $\mu(\mu^\omega(0)) = \mu^\omega(0)$. The second observation says that the Thue-Morse sequence is a fixed point of $\mu$ [1, p. 10]. We can identify the binary alphabet of the Thue-Morse sequence with $\mathbb{Z}/2\mathbb{Z}$, the integers modulo 2. It is then natural to generalize to $\mathbb{Z}/n\mathbb{Z}$ by considering the alphabet $\Sigma = \{0, 1, \ldots, n - 1\}$ and, for $i \in \Sigma$, defining the morphism
$$\varphi_n(i) = \overline{i}\;\overline{i+1}\;\cdots\;\overline{i+n-1},$$
where $\overline{k}$ is the residue of $k$ modulo $n$. Notice that for $\Sigma = \{0, 1\}$, $\varphi_2(i) = \mu(i)$. In 2000, Allouche and Shallit proved that $\varphi_n^\omega$ is overlap-free [3].
In this paper, we generalize $\varphi_n$, which is based on the Cayley table of $\mathbb{Z}/n\mathbb{Z}$, to Latin squares of arbitrary finite size $n$. We define our morphism based on the Latin square, and prove that the fixed point of the Latin square morphism is an overlap-free sequence. Note that the Cayley table for $\mathbb{Z}/n\mathbb{Z}$ is a Latin square, but not every Latin square is a Cayley table.
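The morphisms of this section are easy to iterate on a computer. The sketch below is illustrative only; the helper names `iterate` and `phi` are assumptions made here, and `phi` implements the cyclic-shift morphism on $\mathbb{Z}/n\mathbb{Z}$ described above (send $i$ to $i, i+1, \ldots, i+n-1$ reduced modulo $n$).

```python
def iterate(morphism, seed, steps):
    """Apply a letter-to-word morphism `steps` times, starting from `seed`."""
    word = [seed]
    for _ in range(steps):
        word = [b for a in word for b in morphism[a]]
    return word

def phi(n):
    """The morphism i -> i, i+1, ..., i+n-1 with every letter reduced mod n."""
    return {i: [(i + k) % n for k in range(n)] for i in range(n)}

mu = {0: [0, 1], 1: [1, 0]}          # the Thue-Morse morphism

prefix = iterate(mu, 0, 5)           # the first 2**5 = 32 letters
image = [b for a in prefix for b in mu[a]]   # mu applied once more

print("".join(map(str, prefix)))
print(iterate(phi(3), 0, 2))
```

Applying `mu` to a prefix produces a longer word that begins with the same prefix, which is the finite shadow of the fixed-point identity $\mu(\mu^\omega(0)) = \mu^\omega(0)$.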

Latin Square Morphisms produce Tilings
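Allouche and Shallit's morphism can be seen as a mapping of $i$ to the $i$th row (the row that begins with $i$) of the Cayley table for $\mathbb{Z}/n\mathbb{Z}$. For example, when $n = 3$, we have
$$\varphi_3(0) = 012, \quad \varphi_3(1) = 120, \quad \varphi_3(2) = 201.$$
This suggests a natural generalization to any Latin square.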
Begin with a generic alphabet of $n$ letters, which we may assume to be $\Sigma = \{1, 2, \ldots, n\}$. Recall that a Latin square $L$ is an $n \times n$ table with $n$ different letters such that each letter occurs exactly once in each column and exactly once in each row. We will concern ourselves with the Latin squares in which the first column retains the natural order of our alphabet $(1, 2, \ldots, n)$. For $n = 3$, there are two such Latin squares. The one that does not come from $\mathbb{Z}/3\mathbb{Z}$ directly is
$$L = \begin{array}{ccc} 1 & 3 & 2 \\ 2 & 1 & 3 \\ 3 & 2 & 1 \end{array}$$
Let $L_t$ denote the $t$th row of our Latin square $L$. For each $t \in \Sigma$ we define the Latin square morphism by $\ell(t) = L_t$. For example, we can use the above Latin square for $n = 3$ to define the following morphism:
$$\ell(1) = 132, \quad \ell(2) = 213, \quad \ell(3) = 321.$$
Given any $t \in \Sigma$, the word $\ell(t)$ begins with $t$, so $\ell(t), \ell^2(t), \ell^3(t), \ldots$ converges to a sequence $\ell^\omega(t)$, which is a fixed point of the morphism $\ell$. So,
$$\ell(\ell^\omega(t)) = \ell^\omega(t). \tag{2}$$
In fact every fixed point of $\ell$ is of the form $\ell^\omega(t)$ for some $t \in \Sigma$ [1, p. 10]. Express the sequence as $\ell^\omega(t_1) = t_1 t_2 t_3 \cdots$, so
$$\ell^\omega(t_1) = \ell(t_1 t_2 t_3 \cdots) = \ell(t_1)\ell(t_2)\ell(t_3)\cdots = L_{t_1} L_{t_2} L_{t_3} \cdots.$$
Thus, we have a tiling of our sequence (and of the natural numbers) by the rows of our Latin square $L$. Again, in terms of our example where $n = 3$, we have three tiles $132$, $213$, and $321$, and so $\ell^\omega(1) = 132321213321213132\ldots = |132|321|213|321|213|132|\ldots$
Now, consider the subsequence created by taking the first letter of each tile. Because the first letter of $L_t$ is $t$, this subsequence is in fact our original sequence. Thus our sequence contains itself as a subsequence. These two observations, our sequence as a tiling and our sequence equaling a subsequence of itself, will be critical for the proof of our main result.
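Both observations can be checked on a finite prefix. The following Python sketch uses the $n = 3$ example with tiles 132, 213, and 321; the variable and function names are hypothetical, introduced here only for illustration.

```python
# Rows of the example Latin square, indexed by their first letter.
L = {1: (1, 3, 2), 2: (2, 1, 3), 3: (3, 2, 1)}
n = len(L)

def apply(rows, word):
    """Replace each letter t of `word` by the row L_t."""
    return [b for t in word for b in rows[t]]

word = [1]
for _ in range(4):                   # ell^4(1), a prefix of length 3**4 = 81
    word = apply(L, word)

# Cut the prefix into consecutive blocks of length n: the tiles.
tiles = [tuple(word[k:k + n]) for k in range(0, len(word), n)]
first_letters = [tile[0] for tile in tiles]

print("".join(map(str, word[:18])))  # 132321213321213132
```

Here `tiles` realizes the tiling by rows of $L$, and `first_letters` is the subsequence formed by the first letter of each tile.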

Overlap-Free Latin Square Sequences
In this section we prove our main result.
Theorem. Let $\Sigma = \{1, 2, \ldots, n\}$, and let $L$ be an $n \times n$ Latin square using the letters from $\Sigma$, with the first column in its natural order. For an arbitrary $t \in \Sigma$, let $L_t$ denote the row of $L$ corresponding to $t$ in the first column. If we define the Latin square morphism as $\ell(t) = L_t$, then for any $t \in \Sigma$, $\ell^\omega(t)$ is an overlap-free sequence.
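A finite computation cannot prove this statement, but it can test it on prefixes. The sketch below is an illustration under stated assumptions: it enumerates every Latin square of order $n$ whose first column is in natural order by brute force, builds a prefix of each fixed point, and searches it for overlaps. The function names are hypothetical and the approach only scales to very small $n$.

```python
from itertools import permutations, product

def has_overlap(w):
    """Return True if w contains a factor cxcxc (c a letter, x possibly empty)."""
    n = len(w)
    return any(
        all(w[k] == w[k + p] for k in range(i, i + p + 1))
        for p in range(1, n // 2 + 1)
        for i in range(n - 2 * p)
    )

def latin_squares(n):
    """Yield every n x n Latin square on 1..n whose first column is 1, 2, ..., n."""
    letters = range(1, n + 1)
    # Row t is any arrangement of the alphabet that starts with t.
    row_choices = [
        [(t,) + rest for rest in permutations([a for a in letters if a != t])]
        for t in letters
    ]
    for rows in product(*row_choices):
        # Keep the candidate only if every column has n distinct letters.
        if all(len({row[j] for row in rows}) == n for j in range(n)):
            yield dict(zip(letters, rows))

def prefix(square, t, steps):
    """Return ell^steps(t), a prefix of the fixed point starting at t."""
    word = [t]
    for _ in range(steps):
        word = [b for a in word for b in square[a]]
    return word

def all_prefixes_overlap_free(n, steps):
    """Check every square of order n and every starting letter t."""
    return not any(
        has_overlap(prefix(sq, t, steps))
        for sq in latin_squares(n)
        for t in sq
    )

print(all_prefixes_overlap_free(3, 4), all_prefixes_overlap_free(4, 3))
```

For $n = 3$ the enumeration yields the two squares mentioned earlier, and for $n = 4$ it yields 24 squares.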
Remark. The Latin square for $n = 3$ above can be seen to be the Cayley . . .
. . . so the $j$th letter in the sequence is $t_j$. Similarly, the $m$th tile in the sequence is $T_m$. We will also use the notion of the length of a string of letters, meaning the number of letters in the string. For an arbitrary string $w$, the length of $w$ will be denoted $|w|$. Use $r$ to denote the location of $t_j$ on its tile $T_m$, so $j = (m - 1)n + r$ with $|T_m| = n$ and $r \in \{1, 2, \ldots, n\}$.
Assume for a contradiction that $\ell^\omega(t_1)$ contains an overlap, and moreover that $cxcxc$ is the shortest overlap in $\ell^\omega(t_1)$. Write $\ell^\omega(t_1) = AcxcxcB$, where $c$ is a single letter, $x$ is a finite string with $|cx| \geq n$, $A$ is a finite string, and $B$ is the infinite tail of our sequence. We have $|cx| \geq n$ (a bound given by the length of the tiles) because each tile is a permutation of $1, 2, \ldots, n$, and we cannot have two of the three copies of $c$ contained in one tile. Our subscripts place this overlap in our sequence. For $i \in \{1, 2, 3\}$, let $j_i$ denote the subscript of the $i$th $c$. Thus $t_{j_1} = t_{j_2} = t_{j_3} = c$ and $j_2 - j_1 = j_3 - j_2 = |cx|$. In the notation above, $t_{j_i}$ is the $r_i$th letter of the tile $T_{m_i}$, so $j_i = (m_i - 1)n + r_i$.
Our argument proceeds as follows: there are two cases, $|cx| \not\equiv 0 \pmod{n}$ and $|cx| \equiv 0 \pmod{n}$. In the first case we use the fact that we have a tiling of $\ell^\omega(t_1)$ by the rows of a Latin square to show that the overlap $cxcxc$ is not possible. In the second case, when $|cx| \equiv 0 \pmod{n}$, we use the fact that $\ell^\omega(t_1)$ contains itself as a subsequence to argue that the existence of the overlap $cxcxc$ leads to the existence of a shorter overlap, and thus a contradiction.
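3.1. Case 1: $|cx| \not\equiv 0 \pmod{n}$.
3.1.1. Six Cases. Since $r_2 - r_1 \equiv |cx| \not\equiv 0 \pmod{n}$, there are two main cases that we will first consider: $r_1 < r_2$ and $r_2 < r_1$. However, for the explicit details of our conclusions we will consider all six of the following possibilities, depending on the value of $r_3$:
$$\begin{array}{ll}
r_3 = 2r_2 - r_1 & \text{if } r_1 < r_2 < r_3, \\
r_3 = 2r_2 - r_1 - n & \text{if } r_1 \leq r_3 < r_2 \text{ or } r_3 < r_1 < r_2, \\
r_3 = 2r_2 - r_1 & \text{if } r_3 < r_2 < r_1, \\
r_3 = 2r_2 - r_1 + n & \text{if } r_2 < r_1 \leq r_3 \text{ or } r_2 < r_3 < r_1.
\end{array}$$
The equalities on the left arise out of equation (4), since $r_3 - r_2 \equiv r_2 - r_1 \equiv |cx| \pmod{n}$, and the fact that the integer $2r_2 - r_1$ satisfies $-n \leq 2r_2 - r_1 \leq 2n$. This means that $r_3$ is the element of the set $\{2r_2 - r_1 + n,\ 2r_2 - r_1,\ 2r_2 - r_1 - n\}$ that lies in the interval $0 < r_3 \leq n$. Notice that the equality $r_3 = 2r_2 - r_1$ occurs under both main cases, $r_1 < r_2$ and $r_2 < r_1$.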
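The case split can be verified mechanically for small $n$. The following Python sketch is illustrative only and uses hypothetical names: `third_offset` picks the unique element of $\{2r_2 - r_1 + n,\ 2r_2 - r_1,\ 2r_2 - r_1 - n\}$ lying in $0 < r_3 \leq n$, and `classify` names the resulting ordering of $r_1$, $r_2$, $r_3$. The table `SHIFT` records the multiple of $n$ that this computation associates with each ordering.

```python
def third_offset(r1, r2, n):
    """Return the unique r3 in {1, ..., n} congruent to 2*r2 - r1 modulo n."""
    v = 2 * r2 - r1
    (r3,) = [c for c in (v + n, v, v - n) if 0 < c <= n]
    return r3

def classify(r1, r2, r3):
    """Name the ordering of r1, r2, r3, assuming r1 != r2 and r3 != r2."""
    if r1 < r2:
        if r2 < r3:
            return "r1 < r2 < r3"
        return "r1 <= r3 < r2" if r1 <= r3 else "r3 < r1 < r2"
    if r3 < r2:
        return "r3 < r2 < r1"
    return "r2 < r1 <= r3" if r1 <= r3 else "r2 < r3 < r1"

# Multiple of n added to 2*r2 - r1 in each of the six orderings.
SHIFT = {
    "r1 < r2 < r3": 0,
    "r1 <= r3 < r2": -1,
    "r3 < r1 < r2": -1,
    "r3 < r2 < r1": 0,
    "r2 < r1 <= r3": +1,
    "r2 < r3 < r1": +1,
}

def check(n):
    """Verify the case table for every pair r1 != r2 in {1, ..., n}."""
    for r1 in range(1, n + 1):
        for r2 in range(1, n + 1):
            if r1 == r2:
                continue
            r3 = third_offset(r1, r2, n)
            case = classify(r1, r2, r3)
            if r3 == r2 or r3 != 2 * r2 - r1 + SHIFT[case] * n:
                return False
    return True

print(all(check(n) for n in range(2, 12)))
```

The check also confirms that $r_3 \neq r_2$ whenever $r_1 \neq r_2$, so the six orderings listed are exhaustive.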
3.1.2. $G$ and the beginning of each $cx$. When $r_1 < r_2$, we pick $G \subset \Sigma$ to be the set of the last $r_2 - r_1$ letters in $T_{m_1}$, so that $G$ has no specific order and $G \neq \emptyset$. Of course, the remainder of the letters in $T_{m_1}$ are in $\overline{G}$, the complement of $G$. Notice that this puts $c = t_{j_1} \in \overline{G}$. By equating the letters in $T_{m_1}$ with the corresponding letters in $t_{j_2} x t_{j_3}$, we find that the last $n - r_2 + 1$ letters of $T_{m_2}$ (starting with $c = t_{j_2}$) are in $\overline{G}$. Also, we find that the first $r_2 - r_1$ letters of $T_{m_2+1}$ are $G$.
When $r_2 < r_1$, we pick $G \subset \Sigma$ to be the set of the last $r_1 - r_2$ letters in $T_{m_2}$, so that $G$ has no specific order and $G \neq \emptyset$. The remainder of the letters in $T_{m_2}$ must be those that make up $\overline{G}$, again placing $c = t_{j_2} \in \overline{G}$. By equating the letters in $T_{m_2}$ with the corresponding letters in $t_{j_1} x t_{j_2}$, we find that the last $n - r_1 + 1$ letters of $T_{m_1}$ (starting with $c = t_{j_1}$) are in $\overline{G}$. Also, we find that the first $r_1 - r_2$ letters of $T_{m_1+1}$ are $G$.
We have discussed the appearance of $G$ and its complement $\overline{G}$ at the beginning of each $cx$. We now set out to describe $G$ and $\overline{G}$ at the end of each $cx$.

3.1.3. Following $G$ through the overlap. It is a basic observation that because each tile is a permutation of the letters in $\Sigma$, each tile can be partitioned into $G$ and its complement $\overline{G}$. It is fundamental to our argument that, because of the equality $t_{j_1} x t_{j_2} = cxc = t_{j_2} x t_{j_3}$, the letters in $G$ form a contiguous block, at either the beginning or the end, in each tile involved in our overlap, excluding the tiles $T_{m_i}$ (each of which will need further description). The idea involved in following $G$ through the overlap is quite simple; we illustrate it in one particular case, $r_1 < r_2 < r_3$.
We have explicitly described the location of $G$ at the beginning of each $cx$. We will now use our example $r_1 < r_2 < r_3$ to show the reader how the tiling of our sequence can be used to find the location of $G$ at the end of each $cx$. In doing so, we will refer to Figure 1.
In Figure 1, we have displaced the overlap from our sequence (represented by the continuous solid horizontal line). We have also split our overlap in half, leaving $T_{m_2}$ intact for equality purposes. We have placed $t_{j_1} x t_{j_2}$ over $t_{j_2} x t_{j_3}$, with $t_{j_1}$ directly over $t_{j_2}$ and $t_{j_2}$ directly over $t_{j_3}$, so that we can see equality of terms simply by looking straight up or straight down (displayed by vertical arrows). The set of letters $G$ is represented by a horizontal solid line above and below our sequence line, and the set of letters $\overline{G}$ is represented by horizontal dotted lines above and below the sequence line. Also, notice that we have drawn in the edges of the tiles with smaller vertical black lines.
Figure 1: The situation when $r_1 < r_2 < r_3$.
Now notice that by using the tiles we can equate letters in $t_{j_1} x t_{j_2}$ with letters in $t_{j_2} x t_{j_3}$ all the way through the overlap. Since we know that $G$ occurs in the first $r_2 - r_1$ letters of $T_{m_2+1}$, $\overline{G}$ is the last $n - (r_2 - r_1)$ letters of $T_{m_2+1}$. This causes $\overline{G}$ to be the first $n - (r_2 - r_1)$ letters of $T_{m_1+1}$, and thus $G$ appears in the last $r_2 - r_1$ letters of $T_{m_1+1}$. Thus we can conclude that $G$ occurs in the last $r_2 - r_1$ letters of all the tiles in $t_{j_1} x t_{j_2}$ except for $T_{m_2}$. We can also conclude that $G$ occurs in the first $r_2 - r_1$ letters of all the tiles in $t_{j_2} x t_{j_3}$ up through $T_{m_3-1}$. We can approach every case by the same process.
3.1.4. $G$ and how each $cx$ ends. We will now explain the conclusions for the six possible cases that we defined earlier, leaving the actual drawing to the reader.
Case $r_1 < r_2 < r_3$ (as seen in Figure 1). After we follow $G$ through the overlap, we find that $G$ occurs in the first $r_2 - r_1$ letters of $T_{m_3}$. Recall $r_3 = 2r_2 - r_1$. So the next $r_3 - (r_2 - r_1) = r_2$ letters of $T_{m_3}$ are not in $G$. Notice that the size of $G$, $r_2 - r_1$, added to $r_2$ makes up all of $r_3$. This places the boundary between $T_{m_2-1}$ and $T_{m_2}$ exactly in line with the end of $G$ in $T_{m_3}$ and the beginning of $\overline{G}$. We then equate the first letters in $T_{m_3}$ with those in $T_{m_2}$ to find that $G$ occurs nowhere in $T_{m_2}$. So now we have described $T_{m_2}$ fully. Earlier we defined $G$ such that $\overline{G}$ occurred from $t_{j_2}$ to the end of the tile, and we have just shown that the first $r_2$ letters of $T_{m_2}$ (which include $t_{j_2}$) must be in $\overline{G}$. So $G$ does not appear anywhere in $T_{m_2}$, and since $G \neq \emptyset$, we have a contradiction.
Cases $r_1 \leq r_3 < r_2$ and $r_3 < r_1 < r_2$. After we follow $G$ through the overlap, we find that $G$ occurs in the first $r_2 - r_1$ letters of $T_{m_3-1}$. So $\overline{G}$ occurs in the final $n - (r_2 - r_1)$ letters of $T_{m_3-1}$, causing the first $n - (r_2 - r_1)$ letters of $T_{m_2}$ to be $\overline{G}$. Notice that $r_2 = [n - (r_2 - r_1)] + r_3$. So the boundary between $\overline{G}$ and $G$ in $T_{m_2}$ coincides with the boundary between $T_{m_3-1}$ and $T_{m_3}$. This means that $t_{j_2} \in G$, but we showed earlier that $c \notin G$, which is a contradiction.
Case $r_3 < r_2 < r_1$. After we follow $G$ through the overlap, we find that $G$ occurs in the last $r_1 - r_2$ letters of $T_{m_3-1}$. This causes $G$ to occur in the first $r_1 - r_2$ letters of $T_{m_2}$ by the equality of $t_{j_1} x t_{j_2}$ and $t_{j_2} x t_{j_3}$. To describe the remaining letters of $T_{m_2}$ up to and including $t_{j_2}$, consider $r_2 - (r_1 - r_2) = r_3$. So $\overline{G}$ occurs in the next $r_3$ letters after $G$. Since $G$ was defined as the last $r_1 - r_2$ letters of $T_{m_2}$, $G$ appears twice in $T_{m_2}$, which contradicts the fact that each tile is a permutation of $\Sigma$.
Cases $r_2 < r_1 \leq r_3$ and $r_2 < r_3 < r_1$. After we follow $G$ through the overlap, we find that $G$ occurs in the first $r_1 - r_2$ letters of $T_{m_2-1}$. This causes $\overline{G}$ to occur in the final $n - (r_1 - r_2)$ letters of $T_{m_2-1}$ and thus in the first $n - (r_1 - r_2)$ letters of $T_{m_3}$. Since $r_2 = r_3 - [n - (r_1 - r_2)]$, we see that the left boundary of $T_{m_2}$ coincides with the right boundary of these first $n - (r_1 - r_2)$ letters of $T_{m_3}$. In particular, this means that the last $r_1 - r_2$ letters of $T_{m_3}$, which include $c$, are in $G$. But this contradicts the fact that $c \notin G$.
3.2. Case 2: $|cx| \equiv 0 \pmod{n}$. We begin by considering some $\pi \in S_n$, the symmetric group on $n$ letters. Note that we may apply $\pi$ to any string by requiring $\pi$ to act on each individual letter, so $\pi(t_1 t_2 \cdots t_s) = \pi(t_1)\pi(t_2)\cdots\pi(t_s)$. Thus $\pi$ can be treated as a morphism. Moreover, $\pi : \Sigma^* \to \Sigma^*$ is an invertible map because $\pi \in S_n$. Thus $w \in \Sigma^*$ contains an overlap if and only if $\pi(w) \in \Sigma^*$ contains an overlap.
Define the function $d_{(a,n)} : \mathbb{N} \to \mathbb{N}$ by $d_{(a,n)}(m) = (m - 1)n + a$. Now if we let $M = (t_s)$ be a sequence, then define $D_{(a,n)}(M)$ to be the subsequence $(t_{d_{(a,n)}(s)})$ of $M$. So for arbitrary $i \in \{1, 2, \ldots, n\}$ we have $D_{(i,n)}(\ell^\omega(t_1)) = t_i t_{i+n} t_{i+2n} \cdots$.
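The two tools introduced here, the letterwise action of a permutation and the subsequence operator $D_{(a,n)}$, can be sketched in a few lines of Python. This is an illustration on a finite prefix of the $n = 3$ example with tiles 132, 213, and 321; the function names and the particular permutation `pi` are assumptions made for the example.

```python
def has_overlap(w):
    """Return True if w contains a factor cxcxc (c a letter, x possibly empty)."""
    n = len(w)
    return any(
        all(w[k] == w[k + p] for k in range(i, i + p + 1))
        for p in range(1, n // 2 + 1)
        for i in range(n - 2 * p)
    )

def d(a, n, m):
    """Index map d_(a,n)(m) = (m - 1) * n + a, with 1-based indices."""
    return (m - 1) * n + a

def D(a, n, M):
    """Subsequence t_a, t_(a+n), t_(a+2n), ... of the finite prefix M."""
    count = (len(M) - a) // n + 1            # number of indices that fit in M
    return [M[d(a, n, m) - 1] for m in range(1, count + 1)]

def apply_letterwise(pi, word):
    """Act with the permutation pi on each letter of the word."""
    return [pi[t] for t in word]

L = {1: (1, 3, 2), 2: (2, 1, 3), 3: (3, 2, 1)}
word = [1]
for _ in range(4):                           # prefix of length 3**4 = 81
    word = [b for t in word for b in L[t]]

pi = {1: 2, 2: 3, 3: 1}                      # a hypothetical permutation in S_3

print(D(1, 3, word) == word[:27])            # first letters of the tiles
print("".join(map(str, D(2, 3, word)[:6])))  # second letters of the first six tiles
```

Because `pi` is a bijection on letters, two positions of a word carry equal letters exactly when the corresponding positions of its image do, which is why `has_overlap` gives the same answer before and after relabelling.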