On the number of factors in codings of three interval exchange

We consider exchange of three intervals with permutation $(3,2,1)$. The aim of this paper is to estimate the cardinality of the set $3\mathrm{iet}(N)$ of all words of length $N$ which appear as factors in infinite words coding such transformations. We use the strong relation between 3iet words and words coding exchange of two intervals, i.e., Sturmian words. The known asymptotic formula $\# 2\mathrm{iet}(N)/N^3\sim\frac1{\pi^2}$ for the number of Sturmian factors allows us to find bounds $\frac1{3\pi^2} + o(1) \leq \# 3\mathrm{iet}(N)/N^4 \leq \frac2{\pi^2} + o(1)$.


1 Introduction
The study of infinite words arising from an exchange of several intervals was initiated by Rauzy, whereas dynamical systems connected with exchange of intervals had already been studied by Katok and Stepin [15]. The most explored subfamily of such words are the Sturmian words, which correspond to an exchange of two intervals [22]. Infinite words coding exchange of three intervals form the next interesting family.
The special case of Sturmian words is very well understood. For surveys on the properties of Sturmian words see [18,22].
Much less is known about codings of exchanges of $r$ intervals for $r > 2$. A combinatorial characterization of the language of such words was given first for $r = 3$ in [11], then for general $r$ in [6]. Boshernitzan and Carroll [8] found a sufficient condition for substitutivity of an infinite word coding an exchange of $r$ intervals. The necessity of this condition in the special case $r = 3$ and reverse-order permutation was demonstrated in [1]. A criterion for substitution invariance of words coding exchange of three intervals with permutation $(3,2,1)$ is given in [3] and [4].
Infinite words coding exchange of three intervals with permutation $(3,2,1)$, here called 3iet words, constitute the central theme of this paper. They stand out among interval exchange words by the fact that, similarly to Sturmian words, they can be geometrically represented by cut-and-project sequences. This was first remarked in [16], then developed in [12]. The results of these papers are connected to the well-known 3-distance theorem [23,2], and can also be explained in terms of the theorem of Katok [14], which states that when a $k$-interval exchange transformation is induced on a subinterval, the resulting transformation is a $k$ or $k+1$ interval exchange.
An even more explicit connection between 3iet words and Sturmian words is displayed in [3]. This connection allowed us to estimate the number of factors of 3iet words using the following enumeration formula for Sturmian words, first given by Lipatov [17]; see also [19,7].
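Theorem 1 [17,19,7] The number $\#2\mathrm{iet}(N)$ of Sturmian factors of length $N$ is given by
$$\#2\mathrm{iet}(N) = 1 + \sum_{q=1}^{N} (N-q+1)\,\varphi(q) = \frac{N^3}{\pi^2} + O(N^2\ln N),$$
where $\varphi$ denotes the Euler totient function.
The aim of this paper is to show the following theorem.

Theorem 2
The asymptotic growth of the number of 3iet factors of length $N$ is given by
$$\frac{N^4}{3\pi^2} + O(N^{3+\delta}) \;\le\; \#3\mathrm{iet}(N) \;\le\; \frac{2N^4}{\pi^2} + O(N^{3+\delta})$$
for arbitrary $\delta > 0$.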
The proof of Theorem 2 is divided into two parts: the upper bound is given in Theorem 9 in Section 5, and the lower bound in Theorem 12 in Section 6. The lemmas on Sturmian words used in these sections may be of independent interest. Sections 3 and 4, which describe properties of the Euler function and of Sturmian factors, are auxiliary.
Precise values of $\#3\mathrm{iet}(N)$ for $N \le 10$ are found in Section 7.

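2 Definitions of interval exchange words
Let $\mathcal A$ be a finite alphabet. A concatenation $w = w_0 w_1 \cdots w_{n-1}$ of letters $w_i \in \mathcal A$ is called a word of length $|w| = n$. The number of occurrences of a letter $a$ in the word $w$ is denoted by $|w|_a$. The set of all finite words over $\mathcal A$ together with the empty word is denoted by $\mathcal A^*$. We can also consider one- or two-sided infinite words $u = u_0 u_1 \cdots \in \mathcal A^{\mathbb N}$, resp. $u = \cdots u_{-2} u_{-1} u_0 u_1 u_2 \cdots \in \mathcal A^{\mathbb Z}$. A finite word $w$ of length $n$ is a factor of an infinite word $u = (u_n)$ if $w = u_i u_{i+1} \cdots u_{i+n-1}$ for some $i$. The factors of an infinite word $u$ form the language of $u$, which is denoted by $L(u)$. We denote $L_n(u) = L(u) \cap \mathcal A^n$. The factor complexity of $u$ is the function $C_u : \mathbb N \to \mathbb N$ given by $C_u(n) = \#L_n(u)$.
Sturmian words are aperiodic words with the smallest possible complexity, namely $C_u(n) = n + 1$ for all $n \in \mathbb N$. It is known [20] that Sturmian words on $\{0, 1\}$ are exactly the lower and upper mechanical words, which means that each Sturmian word $u = (u_n)_{n\in\mathbb Z} \in \{0, 1\}^{\mathbb Z}$ is defined by
$$u_n = \lfloor (n+1)\alpha + \beta \rfloor - \lfloor n\alpha + \beta \rfloor \ \text{ for all } n\in\mathbb Z, \qquad\text{or}\qquad u_n = \lceil (n+1)\alpha + \beta \rceil - \lceil n\alpha + \beta \rceil \ \text{ for all } n\in\mathbb Z,$$
where $\alpha \in (0, 1)$ is irrational and $\beta \in [0, 1)$.
Another representation of Sturmian words is given by the two interval exchange. Define $\varepsilon = 1 - \alpha \in (0, 1)$, where $\alpha$ is the parameter used above. Consider the intervals $I_0 = [0, \varepsilon)$ and $I_1 = [\varepsilon, 1)$, and the corresponding transformation
$$T_\varepsilon(x) = \begin{cases} x + 1 - \varepsilon & \text{if } x \in I_0,\\ x - \varepsilon & \text{if } x \in I_1. \end{cases} \tag{1}$$
We can code the orbit of a point $x_0 \in [0, 1)$ by the infinite word $u_{\varepsilon,x_0} = (u_n)_{n\in\mathbb Z} \in \{0, 1\}^{\mathbb Z}$:
$$u_n = \begin{cases} 0 & \text{if } T_\varepsilon^n(x_0) \in I_0,\\ 1 & \text{if } T_\varepsilon^n(x_0) \in I_1. \end{cases}$$
The word $u_{\varepsilon,x_0}$ is the upper mechanical word corresponding to $\alpha = 1 - \varepsilon$ and $\beta = x_0$. Similarly, lower mechanical words are obtained by coding exchange of intervals of the type $(\cdot, \cdot]$. It is the representation of Sturmian words by two interval exchange which will be used most here. Since the language of a Sturmian word does not depend on the initial point $x_0$, but only on the parameter $\varepsilon$, we can denote the language of $u_{\varepsilon,x_0}$ by $L(\varepsilon)$ and the set of factors of length $M$ of $u_{\varepsilon,x_0}$ by $L_M(\varepsilon)$.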
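The coding by a two interval exchange can be tried out numerically. The following is a minimal sketch, not part of the original argument: it assumes the convention that letter 0 is emitted on $[0,\varepsilon)$ and letter 1 on $[\varepsilon,1)$, uses floating point in place of an irrational $\varepsilon$, and the names `sturmian_word` and `factors` are hypothetical helpers introduced only for this illustration.

```python
import math

def sturmian_word(eps, x0, length):
    """Code the orbit of x0 under the two interval exchange T_eps.

    Letter '0' is emitted when the point lies in [0, eps),
    letter '1' when it lies in [eps, 1).
    """
    letters = []
    x = x0
    for _ in range(length):
        if x < eps:
            letters.append("0")
            x += 1.0 - eps      # T_eps(x) = x + 1 - eps on [0, eps)
        else:
            letters.append("1")
            x -= eps            # T_eps(x) = x - eps on [eps, 1)
    return "".join(letters)

def factors(word, n):
    """Set of factors of length n occurring in a finite word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}

# Floating-point stand-in for the irrational parameter (golden mean).
eps = (math.sqrt(5) - 1) / 2
u = sturmian_word(eps, 0.1, 5000)

for n in range(1, 9):
    # For a Sturmian word one expects n + 1 factors of length n.
    print(n, len(factors(u, n)))
```

A finite prefix can only miss factors, never create new ones, so the counts are lower bounds that become exact once the orbit has visited every subinterval.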
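We study infinite words which generalize Sturmian words in the sense that they code exchange of three intervals. Let $\varepsilon, \ell$ be real numbers satisfying
$$\varepsilon \in (0,1)\setminus\mathbb Q, \qquad \max\{\varepsilon, 1-\varepsilon\} < \ell < 1.$$
The mapping $T_{\varepsilon,\ell} : [0,\ell) \to [0,\ell)$,
$$T_{\varepsilon,\ell}(x) = \begin{cases} x + 1 - \varepsilon & \text{if } x \in I_A := [0, \ell - 1 + \varepsilon),\\ x + 1 - 2\varepsilon & \text{if } x \in I_B := [\ell - 1 + \varepsilon, \varepsilon),\\ x - \varepsilon & \text{if } x \in I_C := [\varepsilon, \ell), \end{cases} \tag{4}$$
is called exchange of three intervals with permutation $(3, 2, 1)$. The orbit of a point $x_0 \in [0, \ell)$ under the transformation $T_{\varepsilon,\ell}$ of (4) can be coded by the infinite word $u_{\varepsilon,\ell,x_0} = (u_n)_{n\in\mathbb Z}$ in the alphabet $\{A, B, C\}$, where
$$u_n = \begin{cases} A & \text{if } T_{\varepsilon,\ell}^n(x_0) \in I_A,\\ B & \text{if } T_{\varepsilon,\ell}^n(x_0) \in I_B,\\ C & \text{if } T_{\varepsilon,\ell}^n(x_0) \in I_C. \end{cases}$$
A visual representation of a three interval exchange transformation can be found, e.g., in [1]. Since $\varepsilon$ is irrational, the infinite word $u_{\varepsilon,\ell,x_0}$ is aperiodic. We call these words 3iet words. Again, the language of a 3iet word does not depend on the initial point $x_0$, thus we denote it by $L(\varepsilon, \ell)$. The factor complexity of 3iet words depends on the parameter $\ell$ in the following way: $C(n) = 2n + 1$ if $\ell \notin \mathbb Z + \varepsilon\mathbb Z$; otherwise it satisfies $C(n) = n + C$ for some constant $C$ and all sufficiently large $n$, see [1]. Words with the latter complexity are called quasisturmian and are described for example in [9,10].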
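As an illustration of the three interval exchange with permutation $(3,2,1)$, the sketch below codes an orbit and counts factors. It rests on assumptions that should be read as such: the second parameter (the length of the domain) is called `ell`, the three intervals are taken to be $[0,\ell-1+\varepsilon)$, $[\ell-1+\varepsilon,\varepsilon)$, $[\varepsilon,\ell)$ with translations $1-\varepsilon$, $1-2\varepsilon$, $-\varepsilon$, and the parameter values are floating-point stand-ins chosen only for the demonstration.

```python
import math

def iet3_word(eps, ell, x0, length):
    """Code the orbit of x0 under a 3-interval exchange with permutation (3,2,1).

    Assumed intervals: A = [0, ell-1+eps), B = [ell-1+eps, eps), C = [eps, ell).
    """
    d2 = ell - 1.0 + eps        # left discontinuity point; eps is the right one
    letters = []
    x = x0
    for _ in range(length):
        if x < d2:
            letters.append("A")
            x += 1.0 - eps       # A is moved to the right end
        elif x < eps:
            letters.append("B")
            x += 1.0 - 2.0 * eps # B stays in the middle
        else:
            letters.append("C")
            x -= eps             # C is moved to the left end
    return "".join(letters)

def factors(word, n):
    """Set of factors of length n occurring in a finite word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}

eps = (math.sqrt(5) - 1) / 2     # stand-in for an irrational slope
ell = math.sqrt(2) - 0.6         # lies in (max(eps, 1-eps), 1); generic w.r.t. eps
w = iet3_word(eps, ell, 0.1, 5000)

for n in range(1, 6):
    # In the generic case one expects 2n + 1 factors of length n.
    print(n, len(factors(w, n)))
```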
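
3 Properties of the Euler function
The Euler totient function $\varphi$ is defined by
$$\varphi(n) = \#\{k \in \mathbb N : 1 \le k \le n,\ k \perp n\},$$
where $k \perp n$ means $\gcd(k, n) = 1$. In our formulas, we will use the Landau big $O$ notation: instead of a function $f(n)$, we write $O(g(n))$ if there exists a constant $K$ such that $|f(n)| \le K|g(n)|$ for all $n \in \mathbb N$.
The proof of the first asymptotic formula for the totient function can be found for example in [21]:
$$\sum_{q=1}^{M} \varphi(q) = \frac{3M^2}{\pi^2} + O(M \ln M). \tag{6}$$
Using the first equality from Theorem 1 and this formula, one can easily derive the following estimate found already in [17]:
$$\#2\mathrm{iet}(N) = \frac{N^3}{\pi^2} + O(N^2 \ln N).$$
Yet another formula involving the totient function will be useful. It can be easily derived that for any $q$, numbers coprime to $q$ are in a certain sense uniformly distributed, namely, for every real $x \ge 0$,
$$\#\{k \in \mathbb N : k \le x,\ k \perp q\} = \frac{\varphi(q)}{q}\,x + O\bigl(2^{\omega(q)}\bigr), \tag{8}$$
where $\omega(q)$ is the number of prime divisors of $q$. From the result of Hardy and Wright [13] on the asymptotic behavior of $\omega(q)$, it follows that $2^{\omega(q)} = O(q^{\delta})$ for every $\delta > 0$.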
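The totient sums above are easy to tabulate. The sketch below evaluates the classical enumeration formula $1 + \sum_{q=1}^{N}(N-q+1)\varphi(q)$ for the number of Sturmian factors of length $N$ and prints the ratio to $N^3/\pi^2$; the function names are hypothetical, and the naive `phi` is meant for small arguments only.

```python
from math import gcd, pi

def phi(n):
    """Euler totient: number of k in 1..n coprime to n (naive, small n only)."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def sturmian_factor_count(N):
    """Number of Sturmian factors of length N via 1 + sum (N-q+1) * phi(q)."""
    return 1 + sum((N - q + 1) * phi(q) for q in range(1, N + 1))

for N in (1, 2, 3, 4, 5, 50, 200):
    count = sturmian_factor_count(N)
    # The last column should approach 1 as N grows.
    print(N, count, count * pi ** 2 / N ** 3)
```

For $N = 3$ the formula gives 8, which matches the fact that every binary word of length 3 is a Sturmian factor.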

4 Sturmian factors and Farey numbers
As we have already mentioned, for every fixed irrational $\varepsilon \in (0, 1)$ and fixed $M \in \mathbb N$, the number of factors of length $M$ in the language $L(\varepsilon)$ of a Sturmian word is exactly $M + 1$. Obviously, the set $L_M(\varepsilon)$ does not determine $\varepsilon$ uniquely: the same set of $M + 1$ factors appears for uncountably many irrational $\varepsilon$, which form an interval. As shown in [19], if for some irrational $\varepsilon_1 < \varepsilon_2$ it holds that $L_M(\varepsilon_1) \ne L_M(\varepsilon_2)$, then there exists a rational $\frac pq$ such that $\varepsilon_1 < \frac pq < \varepsilon_2$ and $q \le M$. Therefore the interval $(0, 1)$ of admissible parameters $\varepsilon$ is partitioned into small intervals determining classes of Sturmian words with distinct $L_M(\varepsilon)$. The partition is given by the Farey fractions of order $M$, i.e., all the reduced fractions in $[0, 1]$ with denominator smaller than or equal to $M$. It is useful to consider the Farey fractions as an ordered list
$$\frac01 = f_0 < f_1 < \cdots < f_n = \frac11, \qquad n = \sum_{q=1}^{M} \varphi(q).$$
Therefore the admissible parameters split into the $n$ intervals $(f_i, f_{i+1})$, $i = 0, \dots, n-1$, each with its own set $L_M(\varepsilon)$.
Now consider words coding the transformations $T_{f_i}$ defined as in (1). If $f_i = \frac pq$ with $p \perp q$, then such a word is purely periodic and looks as $\cdots vvv \cdots$, where $v$ is a primitive word of length $q$. Recall that a finite word $v$ is said to be primitive if it cannot be written in the form $v = ww \cdots w = w^k$ for any integer $k \ge 2$. Therefore the language of the periodic word coding the transformation $T_{f_i}$ has exactly $q$ factors of length $M$ for every $M \ge q$. Let us denote the set of such factors by $L_M^{(i)}$. As it is derived in [19],
$$L_M(\varepsilon) = L_M^{(i)} \cup L_M^{(i+1)} \qquad \text{for } \varepsilon \in (f_i, f_{i+1}).$$
We have $\#L_M(\varepsilon) = M + 1$, #L
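The Farey partition of the parameter interval can be generated directly. The following sketch lists the Farey fractions of order $M$ with exact rational arithmetic and compares their number with $1 + \sum_{q \le M} \varphi(q)$; the helper names `farey` and `phi` are introduced here for illustration only.

```python
from fractions import Fraction
from math import gcd

def phi(n):
    """Euler totient (naive, small n only)."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def farey(M):
    """Sorted list of all reduced fractions in [0, 1] with denominator <= M."""
    return sorted({Fraction(p, q) for q in range(1, M + 1) for p in range(q + 1)})

M = 5
fractions_of_order_M = farey(M)
print([str(f) for f in fractions_of_order_M])

# Number of Farey fractions versus 1 + sum of totients; the number of
# parameter intervals (f_i, f_{i+1}) is one less.
print(len(fractions_of_order_M), 1 + sum(phi(q) for q in range(1, M + 1)))
```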

5 The upper bound
The main tool which we use in this section is a link between 3iet words and Sturmian words over the alphabet $\{0, 1\}$ by means of the morphisms $\sigma_{01}, \sigma_{10} : \{A, B, C\}^* \to \{0, 1\}^*$ defined by
$$\sigma_{01}: A \mapsto 0,\ B \mapsto 01,\ C \mapsto 1, \qquad \sigma_{10}: A \mapsto 0,\ B \mapsto 10,\ C \mapsto 1.$$
In [3], the following statement is proved.
Theorem 4 A ternary word $u$ is a 3iet word if and only if both $\sigma_{01}(u)$ and $\sigma_{10}(u)$ are Sturmian words.
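One direction of Theorem 4 can be observed on a finite prefix. The sketch below is an illustration under stated assumptions, not a proof: the morphisms are taken as $A \mapsto 0$, $C \mapsto 1$, with $B \mapsto 01$ for $\sigma_{01}$ and $B \mapsto 10$ for $\sigma_{10}$; the interval layout of the exchange and the floating-point parameters are the same hypothetical choices as in a plain orbit coding.

```python
import math

SIGMA_01 = {"A": "0", "B": "01", "C": "1"}
SIGMA_10 = {"A": "0", "B": "10", "C": "1"}

def apply_morphism(morphism, word):
    """Apply a letter-to-word morphism to a finite word."""
    return "".join(morphism[letter] for letter in word)

def iet3_word(eps, ell, x0, length):
    """Orbit coding of a 3-interval exchange with permutation (3,2,1)."""
    d2 = ell - 1.0 + eps
    letters = []
    x = x0
    for _ in range(length):
        if x < d2:
            letters.append("A")
            x += 1.0 - eps
        elif x < eps:
            letters.append("B")
            x += 1.0 - 2.0 * eps
        else:
            letters.append("C")
            x -= eps
    return "".join(letters)

def factors(word, n):
    return {word[i:i + n] for i in range(len(word) - n + 1)}

eps = (math.sqrt(5) - 1) / 2
ell = math.sqrt(2) - 0.6
w = iet3_word(eps, ell, 0.1, 5000)
image_01 = apply_morphism(SIGMA_01, w)
image_10 = apply_morphism(SIGMA_10, w)

for n in range(1, 6):
    # Both images should show the Sturmian complexity n + 1.
    print(n, len(factors(image_01, n)), len(factors(image_10, n)))
```

Each letter $B$ contributes two binary letters, so a factor with $b$ letters $B$ and length $N$ has images of length $N + b$.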
Based on this strong relation, we define the notion of b-amicability.
Definition 1 Let $w^{(1)}, w^{(2)}$ be finite words over the alphabet $\{0, 1\}$. We say that $w^{(1)}, w^{(2)}$ form a $b$-amicable pair if there exists a 3iet factor $w$ over $\{A, B, C\}$ with exactly $b$ letters $B$ such that $w^{(1)} = \sigma_{01}(w)$ and $w^{(2)} = \sigma_{10}(w)$.
The notion is illustrated in Figure 2. Note that due to Theorem 4, both words of a $b$-amicable pair are Sturmian factors from the same language $L(\varepsilon)$ for some irrational $\varepsilon \in (0, 1)$. A 3iet factor $w$ of length $N$ with precisely $b$ letters $B$ corresponds to a $b$-amicable pair of Sturmian factors of length $N + b$. Obviously, different $b$-amicable pairs give rise to different 3iet factors. Therefore we can express the number of 3iet words as the number of corresponding $b$-amicable pairs.
Let us focus on $b$-amicable pairs of Sturmian factors. First, it is obvious that words which form a $b$-amicable pair must both contain at least $b$ letters 0 and $b$ letters 1. As a consequence of Lemma 3, we have the following statement.
Proof: Obviously, words forming a $b$-amicable pair must be of length $M \ge 2b$. Consider an interval $[f_i, f_{i+1})$ such that the language $L_M(\varepsilon)$ for $\varepsilon \in (f_i, f_{i+1})$ contains a $b$-amicable pair and $f_i = \frac pq$, $p \perp q$. Corollary 6 then implies that $b$ must satisfy
providing bounds on $p$, namely,
Thus, we are looking for the number
With the use of (8) we can write
Together with the asymptotic behavior of $\sum_{q=1}^{M} \varphi(q)$ given in (6), this implies the statement of the lemma.
Proof: The language $L_M(\varepsilon)$ contains $M + 1$ factors of length $M$. Choose such a factor $w$ and consider the set of points $x \in [0, 1)$ such that the orbit $\{T_\varepsilon^j(x)\}$, $j = 0, 1, \dots, M - 1$, is coded by $w$. It can be easily shown that this set is a subinterval of $[0, 1)$ of the type $[\cdot, \cdot)$. Thus $[0, 1)$ is divided into $M + 1$ disjoint subintervals; the division is done by the $M$ division points $T_\varepsilon^{-j}(\varepsilon)$, $j = 0, 1, \dots, M - 1$. Let us denote these intervals by $J_i$, $i = 1, 2, \dots, M + 1$, ordered so that $x < y$ for any $x \in J_i$ and any $y \in J_{i+1}$.
The corresponding words of length $M$ are denoted by $w^{(i)}$, $i = 1, 2, \dots, M + 1$. Note that $T^j(J_i)$ is an interval for all $j = 0, 1, \dots, M - 1$.
Let us first find an upper bound for the number of 1-amicable pairs of Sturmian factors of length $M$. The following procedure is illustrated in Figure 3. Take two consecutive intervals $J_i$, $J_{i+1}$ for some $i = 1, \dots, M$. This means that $J_i \cup J_{i+1}$ is an interval which contains in its interior exactly one of the division points, say $T_\varepsilon^{-k}(\varepsilon)$. If $k = 0$, then the word $w^{(i)}$ starts with 0 and the word $w^{(i+1)}$ starts with 1. More generally, one can show that the words $w^{(i)}$ and $w^{(i+1)}$ have a common prefix $v$ of length $k$ and that $T^k(J_i)$, $T^k(J_{i+1})$ are adjacent intervals separated by the point $\varepsilon$, and therefore the word $w^{(i)}$ has prefix $v0$ and the word $w^{(i+1)}$ has prefix $v1$.
If $k = M - 1$, then $w^{(i)} = v0$ and $w^{(i+1)} = v1$ and they are not 1-amicable. On the other hand, if $k < M - 1$, then $T^{k+1}(J_i) = T^k(J_i) + 1 - \varepsilon$ is an interval of the form $[\cdot, 1)$, and $T^{k+1}(J_{i+1}) = T^k(J_{i+1}) - \varepsilon$ is an interval of the form $[0, \cdot)$. We further have $T^{k+2}(J_i) = T^{k+1}(J_i) - \varepsilon$, an interval of the form $[\cdot, 1 - \varepsilon)$, and $T^{k+2}(J_{i+1}) = T^{k+1}(J_{i+1}) + 1 - \varepsilon$, an interval of the form $[1 - \varepsilon, \cdot)$, and therefore the set $T^j(J_i \cup J_{i+1})$ is again an interval for all $j$, $k + 2 \le j \le M - 1$, and thus the words $w^{(i)}$ and $w^{(i+1)}$ have a common suffix $v'$ of length $M - k - 2$. Therefore we have $w^{(i)} = v01v'$ and $w^{(i+1)} = v10v'$, and the words $w^{(i)}$ and $w^{(i+1)}$ form a 1-amicable pair. Together, we have a 1-amicable pair of words $w^{(i)}$ and $w^{(i+1)}$ for every $i = 1, \dots, M$ except the single index for which $k = M - 1$. Their number is therefore $M - 1$. The previous considerations imply that the words $w^{(i)}$, $i = 1, \dots, M + 1$, are ordered lexicographically, where consecutive words differ only by the interchange $01 \leftrightarrow 10$ at one place. The only exception is the pair of words $w^{(i)}$ and $w^{(i+1)}$ whose intervals $J_i$, $J_{i+1}$ are separated by the iteration $T_\varepsilon^{-M+1}(\varepsilon)$. These words are of the form $w^{(i)} = v0$ and $w^{(i+1)} = v1$, which corresponds to 'half' of the interchange $01 \leftrightarrow 10$.
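The lexicographic picture described above can be checked on examples. The sketch below collects the factors of length $M$ from a long coded orbit, sorts them, and classifies each pair of neighbours as a full interchange $01 \leftrightarrow 10$ or as the 'half' interchange in the last letter. It is an experiment under assumptions (floating-point $\varepsilon$, letter 0 on the left interval), and `classify` and `neighbour_kinds` are hypothetical helper names.

```python
import math

def sturmian_factors(eps, M, steps=5000, x0=0.1):
    """Factors of length M seen along a coded orbit of the 2-interval exchange."""
    letters = []
    x = x0
    for _ in range(steps):
        if x < eps:
            letters.append("0")
            x += 1.0 - eps
        else:
            letters.append("1")
            x -= eps
    word = "".join(letters)
    return {word[i:i + M] for i in range(len(word) - M + 1)}

def classify(w1, w2):
    """Describe how two words of equal length differ.

    'swap': w2 arises from w1 by replacing one occurrence of 01 by 10.
    'last': the words are v0 and v1 for a common prefix v.
    'other': anything else.
    """
    diff = [i for i in range(len(w1)) if w1[i] != w2[i]]
    if diff == [len(w1) - 1] and w1[-1] == "0" and w2[-1] == "1":
        return "last"
    if (len(diff) == 2 and diff[1] == diff[0] + 1
            and w1[diff[0]:diff[0] + 2] == "01"
            and w2[diff[0]:diff[0] + 2] == "10"):
        return "swap"
    return "other"

def neighbour_kinds(eps, M):
    """Classify consecutive pairs in the lexicographically sorted factor list."""
    words = sorted(sturmian_factors(eps, M))
    return [classify(a, b) for a, b in zip(words, words[1:])]

eps = (math.sqrt(5) - 1) / 2
for M in range(2, 9):
    kinds = neighbour_kinds(eps, M)
    print(M, kinds.count("swap"), kinds.count("last"), kinds.count("other"))
```

The expected pattern is $M - 1$ full interchanges and exactly one pair differing in the last letter only.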
From that, it is clear that $b$-amicable pairs can only be the pairs of words $w^{(i)}$, $w^{(i+b)}$, $i = 1, \dots, M + 1 - b$, with exceptions, namely those indices $i$ such that the interval $J_i \cup \cdots \cup J_{i+b}$ contains in its interior the iteration $T_\varepsilon^{-M+1}(\varepsilon)$. However, since we do not know the position of this point, it is difficult to determine the number of these exceptions, but it is at least one. Therefore the number of $b$-amicable pairs is at most $M - b$. $\Box$
Now we are ready to prove the upper bound from Theorem 2.

Theorem 9
The number of 3iet factors of length $N$ satisfies
$$\#3\mathrm{iet}(N) \le \frac{2N^4}{\pi^2} + O(N^{3+\delta})$$
for arbitrary $\delta > 0$.
Proof: Combining Proposition 5 and Lemma 8, we see that
Putting in the estimate on $\#Z_{N+b,b}$ from Lemma 7, we obtain
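As a plausibility check on the constant $\frac{2}{\pi^2}$ announced in the abstract, the leading term can be recovered by a short computation. This is only a sketch under two assumptions that are not quoted from the statements above: that $Z_{M,b}$ denotes the set of Farey intervals of order $M$ whose language contains a $b$-amicable pair, and that Lemma 7 gives $\#Z_{M,b} = \bigl(1 - \frac{2b}{M}\bigr)\frac{3M^2}{\pi^2} + O(M^{1+\delta})$. The bound of at most $M - b$ pairs per language is the one obtained in the proof above, used with $M = N + b$.

```latex
\begin{align*}
\#3\mathrm{iet}(N)
  &\le \sum_{b=0}^{N} \#Z_{N+b,\,b}\,\bigl((N+b)-b\bigr) \\
  &= \sum_{b=0}^{N} N\left[\Bigl(1-\frac{2b}{N+b}\Bigr)\frac{3(N+b)^2}{\pi^2}
       + O\bigl(N^{1+\delta}\bigr)\right] \\
  &= \frac{3N}{\pi^2}\sum_{b=0}^{N}\bigl(N^2-b^2\bigr) + O\bigl(N^{3+\delta}\bigr) \\
  &= \frac{3N}{\pi^2}\Bigl(\frac{2N^3}{3} + O(N^2)\Bigr) + O\bigl(N^{3+\delta}\bigr)
   = \frac{2N^4}{\pi^2} + O\bigl(N^{3+\delta}\bigr).
\end{align*}
```

The third line uses $\bigl(1-\frac{2b}{N+b}\bigr)(N+b)^2 = (N-b)(N+b)$, and the fourth uses $\sum_{b=0}^{N} b^2 = \frac{N^3}{3} + O(N^2)$.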

6 The lower bound
Consider the following geometric representation of Sturmian and 3iet words by cut-and-project sequences. Given a strip in the Euclidean plane parallel to the (irrationally oriented) straight line $y = \varepsilon x$, take all points of the lattice $\mathbb Z^2$ lying in the strip and project them orthogonally to the straight line. It is known [16,12] that in such a way, one obtains a sequence of points with two or three distances between them. If the distances are two, say $\Delta_1, \Delta_2$, we can code them by two letters, and the resulting word is a Sturmian word. If the distances are three, they are of the form $\Delta_1, \Delta_2, \Delta_1 + \Delta_2$. Coding these distances by three letters, we obtain a 3iet word. Moreover, a Sturmian word is obtained only for a discrete set of values of the width of the considered strip. For details about this construction we refer to [12].
On the other hand, any Sturmian word $u_{\varepsilon,x_0}$ or 3iet word $u_{\varepsilon,\ell,x_0}$ can be represented in this way, the parameter $\varepsilon$ being the slope of the straight line and $\ell$ corresponding to the width of the strip [12,16]. If without loss of generality we take $\varepsilon \in (0, 1)$, the widths of the strips giving rise to Sturmian words form a sequence
$$\cdots < \max\{\varepsilon, 1 - \varepsilon\} < 1 < 1 + \varepsilon < \cdots$$
All parameters in between these values yield 3iet words. This geometric representation is illustrated in Figure 4, where $\ell_1 < \ell_2$ are two consecutive discrete values for which the cut-and-project scheme yields a Sturmian word. Projections of points in the strip $S_1$ with parameter $\ell_1$ correspond to a Sturmian word $u^{(1)}$ over $\{A, B\}$, where $A$ is the projection of the horizontal side and $B$ of the diagonal of the unit square; the value $\ell_2$ gives a Sturmian word $u^{(2)}$ over $\{A, C\}$, where again, $A$ is the projection of the horizontal side and $C$ of the vertical side of the unit square. Note that the projection of a diagonal is the sum of projections of the vertical and horizontal sides. This corresponds to the fact that the infinite word $u^{(2)}$ arises from $u^{(1)}$ by replacing every $B$ by $AC$.
Any strip $S$ such that $S_1 \subset S \subset S_2$ gives rise to a 3iet word over the alphabet $\{A, B, C\}$ where only some letters $B$ have been replaced by $AC$. Enlarging the strip $S$ from $S_1$ to $S_2$ causes more and more lattice points to fall into $S$ and thus to split more and more distances $B$ into $CA$. This procedure is illustrated in Figure 5 on a finite segment taken from
Moreover, since a Sturmian factor with at least $b$ letters $B$ is of length at least $b$, it follows that the sum is taken only over $b$ such that $N - b \ge b$, i.e., $b \le N/2$. This gives the following
To estimate the number in the right part of the equation, we use the relation (9). Let us mention that an asymptotic formula for the number of Sturmian factors $w$ with given $|w|$ and $|w|_1$ was derived in [5].
Proof: Let $f_i$ be the $i$-th Farey fraction, $f_i = \frac{p_i}{q_i}$. Consider the set L
By (8), the number of Farey fractions with the same denominator $q = q_i \le M$ satisfying (11) is equal to
Since the values of Farey fractions increase with $i$, there exists a maximal index, say $s$, such that every factor in the set $\bigcup_{i=0}^{s} L_M^{(i)}$ has a sufficient number of letters 1. Therefore
In order to determine the cardinality of this union, we write it as a disjoint union, namely
Since #L
We group summands with the same values of $q_i = q$. The number of such summands (for each $q$) is given by (12). Therefore
where we have used (7). The result easily follows. $\Box$
The following theorem provides a slightly better lower bound on $\#3\mathrm{iet}(N)$ than that announced in Theorem 2. Instead of the constant $\frac13$ we obtain $\frac{17}{48} > \frac13$.
Theorem 12 The number of 3iet factors of length $N$ satisfies
$$\#3\mathrm{iet}(N) \ge \frac{17}{48}\,\frac{N^4}{\pi^2} + O(N^{3+\delta})$$
for arbitrary $\delta > 0$.
Proof: Combining the formula from Proposition 10 and Lemma 11, we have

7 Precise values for small lengths
We have seen for Sturmian words that, given $N \in \mathbb N$, the interval $(0, 1)$ of possible values of the parameter $\varepsilon$ is divided into $\sum_{k=1}^{N} \varphi(k)$ subintervals such that all Sturmian words with parameter within one subinterval have the same set of $N + 1$ factors of length $N$. Similarly, one can expect that the family of possible pairs of parameters $\varepsilon \in (0, 1)$ and $\ell \in (\max\{\varepsilon, 1 - \varepsilon\}, 1)$ will be divided into regions with the same set of factors of length $N$. The number of factors of a 3iet word with parameters in the interior of the regions will be equal to $2N + 1$.
Let us describe the way to obtain the list of factors of length $N$ for fixed parameters $\varepsilon, \ell$. Denote by $d_1 = \varepsilon$, $d_2 = \ell - 1 + \varepsilon$ the discontinuity points of the transformation $T_{\varepsilon,\ell}$. The domain of $T_{\varepsilon,\ell}$, namely $[0, \ell)$, is divided by the points $T_{\varepsilon,\ell}^{-j}(d_1)$, $T_{\varepsilon,\ell}^{-j}(d_2)$, $j = 0, 1, \dots, N - 1$, generically into $2N + 1$ subintervals. (This happens for $\ell \notin \mathbb Z + \mathbb Z\varepsilon$, and for small $N$ also for other $\ell$. Otherwise, some of these iterations coincide and the number of subintervals is smaller.) Each of these subintervals corresponds to one factor of length $N$ occurring in the language of the infinite word $u_{\varepsilon,\ell,x_0}$ for any $x_0$. The ordering of the iterations $T_{\varepsilon,\ell}^{-j}(d_1)$, $T_{\varepsilon,\ell}^{-j}(d_2)$ in $[0, \ell)$ depends on the parameters $\varepsilon, \ell$; different orderings give rise to different lists of 3iet factors. Values of $\varepsilon, \ell$ providing the same ordering are given by linear inequalities in $\varepsilon, \ell$. Figure 6 shows the division of the region of parameters $\varepsilon, \ell$ by these inequalities for factors of length 2. The set of all 3iet factors of length 2 is the union of lists given in the figure, i.e.,
In Figure 7, we give the same analysis for factors of length $N = 3$. Note that the region and its division into areas according to factors of given length must be symmetric with respect to the axis $\varepsilon = 1/2$. This corresponds to the interchange of letters $A \leftrightarrow C$ in all the factors. Figure 7 shows only one half of the picture.
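The procedure above can be approximated by brute force for very small $N$: sample the parameter region on a grid, collect the factors seen along a coded orbit for each sample, and take the union. The sketch below is a heuristic experiment, not the exact region analysis of the figures. A grid can miss small regions, so the result is in general only a lower bound for the number of 3iet factors. The interval layout, the grid, the small shift that moves $\varepsilon$ away from simple rationals, and all function names are assumptions made for this illustration.

```python
import math

def iet3_factors(eps, ell, n, steps=4000, x0=0.0):
    """Factors of length n seen along a coded orbit of the 3-interval exchange."""
    d2 = ell - 1.0 + eps
    letters = []
    x = x0
    for _ in range(steps):
        if x < d2:
            letters.append("A")
            x += 1.0 - eps
        elif x < eps:
            letters.append("B")
            x += 1.0 - 2.0 * eps
        else:
            letters.append("C")
            x -= eps
    word = "".join(letters)
    return {word[i:i + n] for i in range(len(word) - n + 1)}

def sample_3iet_factors(n, grid=12):
    """Union of factor sets over a grid of parameters (eps, ell)."""
    found = set()
    for i in range(1, grid):
        # Shift eps off the grid rationals (up to floating-point precision).
        eps = i / grid + math.sqrt(2) / 1000
        lower = max(eps, 1.0 - eps)
        for j in range(1, grid):
            ell = lower + (1.0 - lower) * j / grid   # lower < ell < 1
            found |= iet3_factors(eps, ell, n)
    return found

for n in (1, 2, 3):
    print(n, len(sample_3iet_factors(n)))
```

Refining the grid or lengthening the orbits can only enlarge the sampled union.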
The lists of factors of length 3 in individual areas of Figure 7 are given as follows.
As we have proved above, asymptotically, the values from the last row of the table should fall between $\frac{17}{48}$ and 2. It seems from the table that these bounds start to hold quite early, and perhaps it is possible to improve them.