Tag-Systems for the Hilbert Curve

Hilbert words correspond to finite approximations of the Hilbert space-filling curve. The Hilbert infinite word H is obtained as the limit of these words. It gives a description of the Hilbert (infinite) curve. We give a uniform tag-system that generates H automatically and, by showing that H is almost cube-free, we prove that it cannot be obtained by simply iterating a morphism.


Introduction
Words without repetitions have been studied since the very first work specifically dedicated to combinatorics on words, by Thue (cf. Thue, 1906 [16]). Also, the generation of infinite words by morphisms has been the subject of many works (see, e.g., Lothaire, 2002 [9], Allouche and Shallit, 2003 [2]).
Generating a word with an HD0L-system consists of applying a morphism to an infinite word generated by another morphism. Berstel [3] gave an example showing that the generative power of HD0L-systems is greater than that of D0L-systems (where only one morphism is applied): he proved that the Arshon word (a square-free word over a 3-letter alphabet, see Arshon, 1937 [1], Séébold, 2002 [15]) is generated by a tag-system (a particular case of an HD0L-system, see Cobham, 1972 [6]) whereas it cannot be obtained with a D0L-system. Here we give a new example of this phenomenon. Studying the Hilbert word (an infinite word over a 4-letter alphabet describing the drawing of the square-filling Hilbert curve, see, e.g., Sagan, 1994 [14]) we prove that it is 4-power-free but not cube-free (as is the case with the well-known Fibonacci word, see, e.g., Berstel, 1986 [4], Allouche and Shallit, 2003 [2]) and that it is generated by a uniform tag-system but not by a D0L-system (as is the case with the Arshon word).
After some preliminaries (Section 2), we introduce in Section 3 the notion of Hilbert words and we give some simple first results. Then we show in Section 4 that they are almost cube-free, and that they are obtained by using a tag-system but not by iterating a single morphism. We conclude with some open questions.

Preliminaries
The terminology and notation are mainly those of Lothaire, 2002 [9].
Let $A$ be a finite set called an alphabet and $A^*$ the free monoid generated by $A$.
The elements of $A$ are called letters and those of $A^*$ are called words. The empty word $\varepsilon$ is the identity element of $A^*$ for the concatenation of words (the concatenation of two words $u$ and $v$ is the word $uv$), and we denote by $A^+$ the semigroup $A^* \setminus \{\varepsilon\}$.
The length of a word $u$, denoted by $|u|$, is the number of occurrences of letters in $u$. In particular $|\varepsilon| = 0$. The number of occurrences of a letter $a$ in a word $u$ is denoted by $|u|_a$.
If $n$ is a nonnegative integer, $u^n$ is the word obtained by concatenating $n$ occurrences of the word $u$. Of course, $|u^n| = n|u|$. The cases $n = 2$, $n = 3$, and $n = 4$ deserve particular attention in what follows. A word $u^2$ (resp. $u^3$, $u^4$), with $u \neq \varepsilon$, is called a square (resp. a cube, a 4-power).
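A word $w$ is called a factor (resp. a prefix, resp. a suffix) of $u$ if there exist words $x$, $y$ such that $u = xwy$ (resp. $u = wy$, resp. $u = xw$). The factor (resp. the prefix, resp. the suffix) is proper if $xy \neq \varepsilon$ (resp. $y \neq \varepsilon$, resp. $x \neq \varepsilon$). A word $u$ is a subsequence of the word $v$ if there exist words $u_1, \ldots, u_n$ and $v_0, \ldots, v_n$ such that $u = u_1 \cdots u_n$ and $v = v_0 u_1 v_1 \cdots u_n v_n$.
An infinite word (or sequence) over $A$ is an application $\mathbf{a} : \mathbb{N} \to A$, written $\mathbf{a} = a_0 a_1 a_2 \cdots$ with $a_i \in A$. The notion of factor is extended to infinite words as follows: a (finite, possibly empty) word $u$ is a factor (resp. prefix) of an infinite word $\mathbf{a}$ over $A$ if there exist $n \in \mathbb{N}$ (resp. $n = 0$) and $m \in \mathbb{N}$ such that $u = a_n \cdots a_{n+m-1}$ (with $u = \varepsilon$ when $m = 0$).
In what follows, we will consider morphisms on $A$. Let $B$ be an alphabet (often, $B = A$).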
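A morphism on $A$ (in short a morphism) is an application $f : A^* \to B^*$ such that $f(uv) = f(u)f(v)$ for all $u, v \in A^*$; it is uniquely determined by the images of the letters. The morphism $f$ is nonerasing if $f(a) \neq \varepsilon$ for every $a \in A$, literal if $|f(a)| = 1$ for every $a \in A$, and uniform if all the images $f(a)$, $a \in A$, have the same length.
A nonerasing morphism $f$ is prolongable on $x_0$, $x_0 \in A^+$, if there exists $u \in A^+$ such that $f(x_0) = x_0u$. In this case, for all $n \in \mathbb{N}$ the word $f^n(x_0)$ is a proper prefix of the word $f^{n+1}(x_0)$ and this defines a unique infinite word which is the limit of the sequence $(f^n(x_0))_{n \geq 0}$. We write $x = f^\omega(x_0)$ and say that $x$ is generated by $f$.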
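The prefix property of a prolongable morphism can be illustrated on a small example. The sketch below is only an illustration: it uses the Fibonacci morphism ($a \mapsto ab$, $b \mapsto a$), which is prolongable on $a$, and the helper names `apply_morphism` and `iterate` are hypothetical.

```python
def apply_morphism(f, word):
    """Image of `word` under the morphism given by the letter-to-word dict `f`."""
    return "".join(f[letter] for letter in word)


def iterate(f, x0, n):
    """Return f^n(x0)."""
    word = x0
    for _ in range(n):
        word = apply_morphism(f, word)
    return word


# Fibonacci morphism: f(a) = ab starts with a, so f is prolongable on "a".
FIB = {"a": "ab", "b": "a"}
words = [iterate(FIB, "a", n) for n in range(8)]

for shorter, longer in zip(words, words[1:]):
    # Each f^n(a) is a proper prefix of f^(n+1)(a).
    print(len(shorter), longer.startswith(shorter))
```

Each printed line should show a growing length together with `True`, which is the nesting that makes the limit $f^\omega(x_0)$ well defined.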
A (finite or infinite) word $u$ over $A$ is square-free (resp. cube-free, 4-power-free) if none of its factors is a square (resp. a cube, a 4-power). A morphism $f$ on $A$ is square-free if the word $f(u)$ is square-free whenever $u$ is a square-free word. The morphism $f$ is weakly square-free if $f$ generates a square-free infinite word.
A D0L-system is a triple $G = (A, f, u)$ where $A$ is an alphabet, $f$ a morphism on $A$ and $u \in A^*$. An infinite word $x$ is generated by $G$ if $x = f^\omega(u)$.
An HD0L-system is a quintuple $T = (A, u, f, g, B)$ where $A$ and $B$ are alphabets, $u \in A^+$, $f$ is a nonerasing morphism on $A$, prolongable on $u$, and $g$ is a morphism from $A$ onto $B$. An infinite word $y$ is generated by $T$ if $y = g(f^\omega(u))$. When $g$ is a literal morphism, $T$ is called a tag-system. The tag-system is uniform if $f$ is a uniform morphism. The terminology of tag-system comes from the fundamental study of Cobham [6]. Chapter 5 of [13] is dedicated to a deep study of D0L-systems (see also Pansiot, 1983 [11], who used the terminology extended tag-systems for HD0L-systems).
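The notions of square, cube and 4-power can be tested mechanically on finite words. The following sketch (the function name `has_power` is hypothetical) uses the fact that a word contains a factor $v^k$ with $|v| = p$ exactly when it has $(k-1)p$ consecutive positions $i$ with equal letters at $i$ and $i + p$. It is tried on a prefix of the Fibonacci word, which contains cubes but no 4-power.

```python
def has_power(word, k):
    """True if `word` contains a factor v^k with v nonempty (k >= 2)."""
    n = len(word)
    for p in range(1, n // k + 1):           # p = |v|
        run = 0                               # consecutive i with word[i] == word[i + p]
        for i in range(n - p):
            run = run + 1 if word[i] == word[i + p] else 0
            if run >= (k - 1) * p:
                return True
    return False


def fibonacci_prefix(n):
    """f^n(a) for the Fibonacci morphism a -> ab, b -> a."""
    word = "a"
    for _ in range(n):
        word = "".join("ab" if c == "a" else "a" for c in word)
    return word


prefix = fibonacci_prefix(12)
print(has_power(prefix, 3), has_power(prefix, 4))
```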

The Hilbert curve and the Hilbert words
Peano [12] was the first, in 1890, to construct a fractal curve that fills a square without any holes. This construction is obtained by drawing, without removing the pen from the surface of the paper, an infinite succession of unit lines left, right, up, or down. Thus this succession can be represented by an infinite word over the alphabet $\Sigma = \{u, \bar{u}, r, \bar{r}\}$ where $u$ stands for up, $\bar{u}$ stands for down, $r$ stands for right, and $\bar{r}$ stands for left (for a description of pictures by words, see the basic study of Maurer, Rozenberg and Welzl, 1982 [10]). In 1891 Hilbert [8] defined another space-filling curve. The word so obtained, called the Hilbert infinite word, is denoted by $H$ (the Hilbert curve is one particular example of a space-filling curve, and space-filling curves are often described under the generic name of Peano curves; for this reason, in a previous paper [7] we improperly gave the name of Peano curve to the Hilbert curve).
Let us describe the algorithm of Hilbert. The general idea is to divide, at step $n$, the unit square into $4^n$ equal subsquares, each of them containing an equal-length part of the curve (except the first and the last ones, which contain a part of length 1/2). The curve so obtained is then depicted by a word of length $4^n - 1$ which we will call the $n$-th Hilbert word $H_n$. When $n$ tends to infinity the curve fills the unit square without any holes and the sequence of words $H_n$ tends to the Hilbert infinite word $H$ (see Section 4).
Step by step, the algorithm is the following (let us recall that the drawing of the curves is realized without removing the pen from the surface of the paper; in the figures, the starting and the ending points of the drawing are marked).
• At step 1 the unit square is divided into 4 equal subsquares and it contains the staple-like curve depicted by the word $H_1 = ur\bar{u}$.

• From step $n$ to step $n + 1$ the curve and grid sizes are reduced by a factor of two and four copies are put together to form a new square.
- The first copy is the lower left one, and it is obtained as follows: first perform a vertical flip, then rotate a quarter turn left.
- The second and the third copies are the two upper ones: they are placed as they are.
- The fourth copy is the lower right one: first perform a vertical flip, then rotate a quarter turn right.
This gives the following.
Then, the curve is made continuous by connecting the ending point of the first (resp. the second, the third) copy to the starting point of the second (resp. the third, the fourth) one with three unit segments respectively corresponding to a move up ($u$), a move right ($r$), and a move down ($\bar{u}$): the dashed lines in the following diagram.

To end, all the starting and ending points are removed, except the starting point of the first copy and the ending point of the fourth copy.
For more about the construction of space-filling curves, see, for example, Sagan, 1994 [14].
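Now, let us define on $\Sigma$ three literal morphisms $f$, $r_g$, and $r_d$ by
$$f(u) = \bar{u},\quad f(\bar{u}) = u,\quad f(r) = r,\quad f(\bar{r}) = \bar{r},$$
$$r_g(u) = \bar{r},\quad r_g(\bar{r}) = \bar{u},\quad r_g(\bar{u}) = r,\quad r_g(r) = u,$$
$$r_d(u) = r,\quad r_d(r) = \bar{u},\quad r_d(\bar{u}) = \bar{r},\quad r_d(\bar{r}) = u.$$
These three morphisms respectively represent a vertical flip, a quarter turn left rotation, and a quarter turn right rotation.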
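From the construction we have that the Hilbert word $H_{n+1}$ (which represents the drawing, without removing the pen from the surface of the paper, of the Hilbert curve at step $n + 1$) is obtained from $H_n$ by
$$H_{n+1} = \lambda(H_n)\; u\; H_n\; r\; H_n\; \bar{u}\; \rho(H_n) \qquad (1)$$
where $\lambda = r_g \circ f$ and $\rho = r_d \circ f$.
One has
$$H_2 = ru\bar{r}\; u\; ur\bar{u}\; r\; ur\bar{u}\; \bar{u}\; \bar{r}\bar{u}r$$
and
$$H_3 = ur\bar{u}rru\bar{r}uru\bar{r}\bar{r}\bar{u}\bar{r}u\;\; u\;\; ru\bar{r}uur\bar{u}rur\bar{u}\bar{u}\bar{r}\bar{u}r\;\; r\;\; ru\bar{r}uur\bar{u}rur\bar{u}\bar{u}\bar{r}\bar{u}r\;\; \bar{u}\;\; \bar{u}\bar{r}u\bar{r}\bar{r}\bar{u}r\bar{u}\bar{r}\bar{u}rrur\bar{u}.$$
Now let $w$ be a word over $\Sigma$. The word $\bar{w}$ is obtained from $w$ by replacing each occurrence of $u$, $r$, $\bar{u}$, $\bar{r}$ respectively by $\bar{u}$, $\bar{r}$, $u$, $r$ ($\bar{\varepsilon} = \varepsilon$). It is clear that $\rho$ and $\lambda$ are the literal morphisms defined on $\Sigma$ by $\rho(u) = \bar{r}$, $\rho(r) = \bar{u}$ and, for any $x \in \Sigma$, $\rho(\bar{x}) = \overline{\rho(x)}$; $\lambda(u) = r$, $\lambda(r) = u$ and, for any $x \in \Sigma$, $\lambda(\bar{x}) = \overline{\lambda(x)}$.
Let $g$ be the literal morphism defined on $\Sigma$ by $g(u) = g(\bar{u}) = u$, $g(r) = g(\bar{r}) = r$, and let us recall that if $w$ is a word over $\Sigma$ with
Together with the literal morphisms $\rho$, $\lambda$, $f$, and $g$, the Hilbert words $H_n$ have the following straightforward properties.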
5. For any $n \geq 1$, the Hilbert word $H_n$ is irreducible, that is, it does not contain any factor $u\bar{u}$, $\bar{u}u$, $r\bar{r}$, or $\bar{r}r$.
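The construction can be replayed on a computer. The sketch below assumes that the recursion reads $H_{n+1} = \lambda(H_n)\, u\, H_n\, r\, H_n\, \bar{u}\, \rho(H_n)$, where $\lambda$ (resp. $\rho$) is the vertical flip followed by the quarter turn left (resp. right), which agrees with the value of $H_2$ given in the text. The ASCII encoding (`U` for $\bar{u}$, `R` for $\bar{r}$) and all function names are choices made for this illustration only.

```python
# Letters of Sigma: "u" (up), "r" (right), "U" (down, u-bar), "R" (left, r-bar).
LAM = {"u": "r", "r": "u", "U": "R", "R": "U"}   # flip, then quarter turn left
RHO = {"u": "R", "R": "u", "r": "U", "U": "r"}   # flip, then quarter turn right
STEP = {"u": (0, 1), "U": (0, -1), "r": (1, 0), "R": (-1, 0)}


def apply(morphism, word):
    """Image of `word` under a literal morphism given as a dict."""
    return "".join(morphism[c] for c in word)


def hilbert_word(n):
    """Return the n-th Hilbert word H_n (n >= 1) in the ASCII encoding."""
    word = "urU"                                  # H_1
    for _ in range(n - 1):
        word = apply(LAM, word) + "u" + word + "r" + word + "U" + apply(RHO, word)
    return word


def is_irreducible(word):
    """True if no move is immediately undone by the next one."""
    return not any(pair in word for pair in ("uU", "Uu", "rR", "Rr"))


def visited_cells(word):
    """Grid cells visited when drawing `word` from the lower left cell (0, 0)."""
    x, y = 0, 0
    cells = [(x, y)]
    for c in word:
        dx, dy = STEP[c]
        x, y = x + dx, y + dy
        cells.append((x, y))
    return cells


for n in range(1, 5):
    w = hilbert_word(n)
    print(n, len(w), is_irreducible(w), len(set(visited_cells(w))))
```

For each $n$ the drawing should visit each of the $4^n$ cells of the $2^n \times 2^n$ grid exactly once, which is the discrete counterpart of filling the square without holes.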
From the construction it is easy to see that, for any positive integer $n$, $|H_n| = 4^n - 1$. Moreover, we have the following more precise counting of each letter in $H_n$.
Lemma 2 For any $n \in \mathbb{N} \setminus \{0\}$, one has
Proof. The result is obvious for $n = 1$.
From (1) we get, for x ∈ Σ, Then, by the definition of ρ and λ, we obtain and the result follows by induction.
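The counting argument can be checked numerically. The sketch below assumes the recursion $H_{n+1} = \lambda(H_n)\, u\, H_n\, r\, H_n\, \bar{u}\, \rho(H_n)$ with $\lambda : u \leftrightarrow r,\ \bar{u} \leftrightarrow \bar{r}$ and $\rho : u \leftrightarrow \bar{r},\ r \leftrightarrow \bar{u}$, so that for each letter $x$ one has $|H_{n+1}|_x = |\lambda(H_n)|_x + 2|H_n|_x + |\rho(H_n)|_x + |ur\bar{u}|_x$. The encoding (`U` for $\bar{u}$, `R` for $\bar{r}$) and the function names are hypothetical.

```python
from collections import Counter

LAM = {"u": "r", "r": "u", "U": "R", "R": "U"}
RHO = {"u": "R", "R": "u", "r": "U", "U": "r"}
LETTERS = "urUR"


def hilbert_word(n):
    """H_n (n >= 1), with U = u-bar and R = r-bar."""
    word = "urU"
    for _ in range(n - 1):
        lam = "".join(LAM[c] for c in word)
        rho = "".join(RHO[c] for c in word)
        word = lam + "u" + word + "r" + word + "U" + rho
    return word


def letter_counts(n):
    """Number of occurrences of each letter in H_n."""
    return Counter(hilbert_word(n))


def next_counts(counts):
    """Letter counts of H_{n+1} predicted from those of H_n by the recursion."""
    lam_counts = {LAM[x]: counts[x] for x in LETTERS}   # counts in lambda(H_n)
    rho_counts = {RHO[x]: counts[x] for x in LETTERS}   # counts in rho(H_n)
    return {
        x: lam_counts[x] + 2 * counts[x] + rho_counts[x] + (1 if x in "urU" else 0)
        for x in LETTERS
    }


for n in range(1, 5):
    print(n, dict(letter_counts(n)))
```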
As an immediate corollary one has the following.

Generating the Hilbert infinite word
Preliminary remark. The sequence $(H_n)_{n \geq 1}$ has two limits according to whether $n$ is even or odd. An equivalent construction (equivalent in the sense that it provides a curve drawn without removing the pen from the surface of the paper and filling the unit square without any holes) can be obtained with no distinction between the even case and the odd one: it is enough at each even step, before computing the corresponding Hilbert word, to apply to the whole picture a vertical flip followed by a quarter turn left rotation, and then each $H_n$ is a prefix of $H_{n+1}$. But the limit for the odd indices is the same in the two cases so, because the properties of the Hilbert words $H_n$ are more interesting with our first construction, we keep the definition of the Hilbert words $H_n$ given in Section 3 and define the Hilbert infinite word $H$ as the limit of odd rank Hilbert words, that is, $H = \lim_{n \to \infty} H_{2n+1}$.
In this section, we prove that the Hilbert infinite word $H$ is generated by an HD0L-system, that it contains no cube except those of only one letter, and that it cannot be generated by a single morphism (it is not even generated by a D0L-system). Then, applying a deep result of Pansiot [11] (see also Allouche and Shallit, 2003 [2, Chapter 7]), we obtain an equivalent uniform tag-system generating $H$.
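The parity phenomenon described in the preliminary remark is easy to observe on the first Hilbert words. The sketch below assumes the recursion $H_{n+1} = \lambda(H_n)\, u\, H_n\, r\, H_n\, \bar{u}\, \rho(H_n)$ with $\lambda : u \leftrightarrow r,\ \bar{u} \leftrightarrow \bar{r}$ and $\rho : u \leftrightarrow \bar{r},\ r \leftrightarrow \bar{u}$; the encoding (`U` for $\bar{u}$, `R` for $\bar{r}$) is chosen for this illustration.

```python
LAM = {"u": "r", "r": "u", "U": "R", "R": "U"}
RHO = {"u": "R", "R": "u", "r": "U", "U": "r"}


def hilbert_word(n):
    """H_n (n >= 1), with U = u-bar and R = r-bar."""
    word = "urU"
    for _ in range(n - 1):
        lam = "".join(LAM[c] for c in word)
        rho = "".join(RHO[c] for c in word)
        word = lam + "u" + word + "r" + word + "U" + rho
    return word


H = {n: hilbert_word(n) for n in range(1, 7)}

# Words of the same parity are nested prefixes; consecutive words are not.
for n in range(1, 5):
    print(n, H[n + 1].startswith(H[n]), H[n + 2].startswith(H[n]))
```

Since $\lambda$ is an involution, the prefix $\lambda(H_{n+1})$ of $H_{n+2}$ itself starts with $\lambda(\lambda(H_n)) = H_n$, which is why only words of equal parity are nested.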
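Let $\Omega$ be the eight-letter alphabet $\Omega = \{A, B, C, D, a, b, c, d\}$, and let $\gamma$ and $h$ be the following morphisms.
$$\gamma(A) = BaAbAcD,\quad \gamma(B) = AbBaBdC,\quad \gamma(C) = DcCdCaB,\quad \gamma(D) = CdDcDbA,\quad \gamma(x) = x \text{ for } x \in \{a, b, c, d\},$$
$$h(A) = ur\bar{u},\quad h(B) = ru\bar{r},\quad h(C) = \bar{u}\bar{r}u,\quad h(D) = \bar{r}\bar{u}r,\quad h(a) = u,\quad h(b) = r,\quad h(c) = \bar{u},\quad h(d) = \bar{r}.$$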
The proof of this result will use the following lemma which expresses the geometrical meaning of the morphisms γ and h.
Proof. The eight equalities are obviously true if $n = 0$. Now, let us prove for any integer $n \geq 0$ that if the eight equalities are true for $n$ then they are also true for $n + 1$. One has
This proves the first equality. The seven others are verified in the same way.
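The role of $\gamma$ and $h$ can be checked experimentally. The sketch below rests on two assumptions that should be read as such: the images $\gamma(A) = BaAbAcD$, $\gamma(B) = AbBaBdC$, $\gamma(C) = DcCdCaB$, $\gamma(D) = CdDcDbA$ (lower-case letters fixed) are read off from the factors quoted in the proof of Lemma 6, and $h$ is the morphism with $h(A) = ur\bar{u}$ for which $h(\gamma(A)) = H_2$. It then compares $h(\gamma^n(A))$ with $H_{n+1}$ computed from the geometric recursion. The encoding (`U` for $\bar{u}$, `R` for $\bar{r}$) and the function names are hypothetical.

```python
GAMMA = {"A": "BaAbAcD", "B": "AbBaBdC", "C": "DcCdCaB", "D": "CdDcDbA",
         "a": "a", "b": "b", "c": "c", "d": "d"}
H_MAP = {"A": "urU", "B": "ruR", "C": "URu", "D": "RUr",
         "a": "u", "b": "r", "c": "U", "d": "R"}
LAM = {"u": "r", "r": "u", "U": "R", "R": "U"}
RHO = {"u": "R", "R": "u", "r": "U", "U": "r"}


def apply(morphism, word):
    """Image of `word` under the morphism given as a dict."""
    return "".join(morphism[c] for c in word)


def gamma_power(n):
    """Return gamma^n(A)."""
    word = "A"
    for _ in range(n):
        word = apply(GAMMA, word)
    return word


def hilbert_word(n):
    """H_n (n >= 1) from the geometric recursion."""
    word = "urU"
    for _ in range(n - 1):
        word = apply(LAM, word) + "u" + word + "r" + word + "U" + apply(RHO, word)
    return word


for n in range(5):
    print(n, apply(H_MAP, gamma_power(n)) == hilbert_word(n + 1))
```

In this reading, the images under $h$ of $\gamma^n(B)$ and $\gamma^n(D)$ are the $\lambda$- and $\rho$-images of $h(\gamma^n(A))$, which is the geometrical meaning referred to above.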
Proof of Theorem 4.
The equality is of course true if $n = 0$ since $H_1 = ur\bar{u} = h(A)$. Now,
Now, to prove that the Hilbert infinite word $H$ contains no cube except $x^3$, $x \in \Sigma$, we need an intermediate lemma. First remark that the morphism $\gamma$ is clearly not a square-free morphism (for example, $\gamma(CA)$ contains $BB$ as a factor). It is not even weakly square-free (it does not generate a square-free word because, for example, $\gamma^4(A)$ contains $bAbAb$ as a factor). But we have the following (Lemma 6).
Proof. The property is straightforward if $n = 0$ or $n = 1$.
In order to get a contradiction, let us suppose that, for some integer $n \geq 2$, $\gamma^n(A)$ contains a factor $YwYwY$, $Y \in \{A, B, C, D\}$, $w \in \Omega^*$, while $\gamma^{n-1}(A)$ does not contain any such factor. Moreover, let us suppose that $Y = A$ (the three other cases are symmetrical by definition of $\gamma$).
Before continuing, we remark that the symbols of $\gamma^n(A)$ alternate between lower- and upper-case letters.
In this case, the first occurrence of $A$ following $u$ is necessarily the first letter of $\gamma(B)$. This implies that $Aw$ starts with $\gamma(B) = AbBaBdC$ and, since this last factor can only appear, in $\gamma^n(A)$, as an occurrence of $\gamma(B)$, it follows that the second occurrence of $Aw$ also starts with $\gamma(B)$, that is, there exists $W \in \Omega^*$ such that $AwAw = \gamma(BWBW)$ and $\gamma^{n-1}(A)$ starts with $w_1BWBW$. But since $AwAw$ is followed, in $\gamma^n(A)$, by the letter $A$, the next letter in $\gamma^{n-1}(A)$ is a $B$, which implies that $\gamma^{n-1}(A)$ contains $BWBWB$ as a factor, a contradiction.
2. $u = \gamma(w_1)Ba$
In this case, the first occurrence of $A$ after $u$ is followed by $bAcD$. This implies that $Aw$ starts with $AbAcD$ and, since this last factor can only appear, in $\gamma^n(A)$, in an occurrence of $\gamma(A)$, it follows that $Aw = AbAcD\gamma(W)Ba$ for some $W \in \Omega^*$, where $AbAcD$ and $Ba$ are respectively the suffix and the prefix of $\gamma(A)$. This means that $\gamma^{n-1}(A)$ starts with $w_1AWAWA$, a contradiction.

3. $u = \gamma(w_1)BaAb$
In this case, the first occurrence of $A$ after $u$ is followed by $cD$. This implies that $Aw$ starts with $AcD$. Here two cases are possible. Either $w$ ends with $BaAb$ and, as in the previous case, $\gamma^{n-1}(A)$ contains a factor $AWAWA$, a contradiction. Or this factor $AcD$ is the central part of some $\gamma(DcC)$. But in this case $AcD$ is followed, in $Aw$, by $cCdCaB$ and, since $\gamma^n(A)$ starts with $uAw$, this factor $cCdCaB$ should be the beginning of some $\gamma(W)$ in $\gamma^n(A)$. This is impossible.
4. $u = \gamma(w_1)CdDcDb$
In this case, the first occurrence of $A$ after $u$ is the last letter of $\gamma(D)$ and $w$ starts in the same manner as some $\gamma(W)$. Let us consider the letter $A$ at the beginning of the second occurrence of $Aw$.
• It is impossible that this $A$ is the first letter of $\gamma(B)$ because $\gamma(W)$ (and thus $w$) cannot start with $bBaBdC$.
• It is impossible that this $A$ is the first $A$ in $\gamma(A)$ because $\gamma(W)$ (and thus $w$) cannot start with $bAcD$.
• It is impossible that this $A$ is the second $A$ in $\gamma(A)$. Indeed otherwise $w$ starts with $cD$ and, since $w$ starts in the same manner as some $\gamma(W)$, $w$ starts with $cDcCdCaB$. But this would imply that $CdCaB$ is the beginning of some $\gamma(Z)$, which is impossible.
• Thus this $A$ is again the last letter of $\gamma(D)$. This is also the case for the last $A$ of $AwAwA$, which implies that $\gamma^{n-1}(A)$ starts with $w_1DWDWD$, a contradiction.
Now, we are ready to prove a second noteworthy result.

Theorem 7
The infinite word $H$ does not contain any factor $xyWxyWxy$ with $x$, $y$ letters and $W$ a word. In particular, the only cubes in $H$ are $x^3$ with $x$ a single letter. Moreover, $H$ is 4-power-free.
It is interesting to remark here that, starting from the index 0, we find in the infinite word $H$ the first occurrence of $r^3$ (resp. $u^3$, $\bar{r}^3$, $\bar{u}^3$) at the index 30 (resp. 94, 222, 478).
Now we prove the first part of the theorem. Suppose that $H$ contains a factor $xyWxyWxy$ with $x$, $y$ letters and $W$ a word, and let $T$ be such that $TxyWxyWxy$ is a prefix of $H$, i.e., $TxyWxyWxy$ is a prefix of $h(\gamma^{2n}(A))$ for some $n \in \mathbb{N}$.
There are four possible cases depending on the value of $|T| \bmod 4$.
• If $|T| \bmod 4 = 0$ then $x$ is the first letter of $h(X)$ for some $X \in \{A, B, C, D\}$.
We suppose $X = A$ (the other cases are symmetrical). Then $x = u$, $y = r$ and $W$ starts with $\bar{u}$: $W = \bar{u}W'$. By definition of $h$ and $\gamma$, the factor $ur\bar{u}$ can only appear, in $H$, as $h(A)$. This implies that there exists a word $w' \in \Omega^*$ such that $TxyWxyWxy = Th(Aw'Aw')ur$. But, by construction, $w'$ ends with a lower-case letter; thus, in $\gamma^{2n}(A)$, $Aw'Aw'$ is followed by an upper-case letter. Since the image of this letter by $h$ starts with $ur$, this letter is necessarily $A$. Thus $\gamma^{2n}(A)$ contains $Aw'Aw'A$, a contradiction with Lemma 6.
• If $|T| \bmod 4 = 1$ then $x$ is the second letter of $h(X)$, $X \in \{A, B, C, D\}$, $y$ is the third letter of $h(X)$, and $W$ ends with the first letter of $h(X)$. As in the previous case, we obtain a contradiction with Lemma 6.
• If $|T| \bmod 4 = 2$ then $x$ is the third letter of $h(X)$, $X \in \{A, B, C, D\}$. We suppose $X = A$, which implies $x = \bar{u}$. Since, in $\gamma^{2n}(A)$, $A$ is necessarily followed by $b$ or $c$, we have $y = r$ or $y = \bar{u}$. Thus $|Wx| \bmod 4 = 3$, which implies that $Wx$ ends with the image by $h$ of an upper-case letter. Since $x = \bar{u}$ this letter is $A$ and, as previously, we obtain that $\gamma^{2n}(A)$ contains $Aw'Aw'A$ for some word $w' \in \Omega^*$, a contradiction with Lemma 6.
• If $|T| \bmod 4 = 3$ then $x$ is the image by $h$ of a lower-case letter. Then $y$ is the first letter of $h(X)$ for some $X \in \{A, B, C, D\}$. This implies that $TxyWxyWxy$ is a prefix of $Txh(X)W'xh(X)W'xh(X)$ where $W'$ is the word such that $yW = h(X)W'$. Again, this means that $\gamma^{2n}(A)$ contains $Xw'Xw'X$ for some word $w' \in \Omega^*$, a contradiction with Lemma 6.
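The statements of Theorem 7 can be tested on a prefix of $H$. The sketch below works on $H_5$ (a prefix of $H$ of length 1023), computed under the assumption that $H_{n+1} = \lambda(H_n)\, u\, H_n\, r\, H_n\, \bar{u}\, \rho(H_n)$ with $\lambda : u \leftrightarrow r,\ \bar{u} \leftrightarrow \bar{r}$ and $\rho : u \leftrightarrow \bar{r},\ r \leftrightarrow \bar{u}$. It uses the observation that a factor $xyWxyWxy$ with $|xyW| = p$ exists exactly when there are $p + 2$ consecutive positions $i$ whose letters at $i$ and $i + p$ agree. The encoding (`U` for $\bar{u}$, `R` for $\bar{r}$) and the name `max_run` are hypothetical.

```python
LAM = {"u": "r", "r": "u", "U": "R", "R": "U"}
RHO = {"u": "R", "R": "u", "r": "U", "U": "r"}


def hilbert_word(n):
    """H_n (n >= 1), with U = u-bar and R = r-bar."""
    word = "urU"
    for _ in range(n - 1):
        lam = "".join(LAM[c] for c in word)
        rho = "".join(RHO[c] for c in word)
        word = lam + "u" + word + "r" + word + "U" + rho
    return word


def max_run(word, p):
    """Largest number of consecutive positions i with word[i] == word[i + p]."""
    best = run = 0
    for i in range(len(word) - p):
        run = run + 1 if word[i] == word[i + p] else 0
        best = max(best, run)
    return best


prefix = hilbert_word(5)

# Period 1: a run of 2 is a one-letter cube, a run of 3 would be a 4-power.
print(max_run(prefix, 1))
# Periods p >= 2: a run of p + 2 would give a factor xyWxyWxy.
print(all(max_run(prefix, p) < p + 2 for p in range(2, len(prefix) // 2)))
# First occurrences of the one-letter cubes r^3 and u^3.
print(prefix.find("rrr"), prefix.find("uuu"))
```

For $p \geq 2$ a cube of period $p$ needs a run of $2p \geq p + 2$, so the second test also covers cubes and 4-powers with longer periods.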
A direct corollary is the following.

Corollary 8
The infinite word H cannot be generated by a D0L-system.
Proof. If $H$ were generated by a D0L-system $(\Sigma, f, v)$ then $H = f^n(H)$ for some $n \in \mathbb{N} \setminus \{0\}$. Consequently $f(u^3)$, $f(r^3)$, $f(\bar{u}^3)$, and $f(\bar{r}^3)$ are factors of $H$. Since $H$ does not contain any cube except $u^3$, $r^3$, $\bar{u}^3$, and $\bar{r}^3$, this implies that $|f(x)| \leq 1$ for any $x \in \Sigma$: a contradiction because, to generate an infinite word, $f$ must be prolongable on at least one letter.
To end this part let us remark that, following a deep result of Cobham [5] (see also Pansiot, 1983 [11]), we know that there exists a tag-system generating the infinite word $H$. However a direct application of Pansiot's algorithm gives a non-uniform tag-system with a 16-letter alphabet and a morphism whose image length is 21 for almost all the letters. Here, we obtain below a better result by proving that $H$ is generated by a uniform tag-system; indeed it is known that there exist words generated by a tag-system but not by a uniform one, see, e.g., Pansiot, 1983 [11]. Moreover the tag-system below has an 8-letter alphabet and a morphism whose image length is 16.
Proof. It is a little tedious but not difficult task to verify the result for $n = 1$. Now, let us prove for any integer $n \geq 1$ that if the eight equalities are true for $2n$ then they are also true for $2n + 2$. One has
This proves the third equality. The seven others are verified in the same way.
Proof. Since $\gamma(A)$ begins with $Ba$ and $\gamma(B)$ begins with $A$ we have that, for every integer $n \geq 1$, $\gamma^{2n-2}(A)$ is a prefix of $\gamma^{2n-1}(Ba)$ which is itself a prefix of $\gamma^{2n}(A)$.

Further questions
In this paper we have started the study of the Hilbert infinite word $H$. Many other questions could be investigated. Here are some of them.
The subword complexity of an infinite word $w$ is the function counting the number of distinct length-$n$ subwords of $w$. It is reasonable to hope that, using either the HD0L-system or the tag-system generating $H$, the complexity function of this word could be obtained.
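As a first experiment in this direction, one can count the distinct factors occurring in a finite prefix of $H$. The sketch below does so on $H_5$, computed under the assumption that $H_{n+1} = \lambda(H_n)\, u\, H_n\, r\, H_n\, \bar{u}\, \rho(H_n)$ with $\lambda : u \leftrightarrow r,\ \bar{u} \leftrightarrow \bar{r}$ and $\rho : u \leftrightarrow \bar{r},\ r \leftrightarrow \bar{u}$. The values obtained are only lower bounds for the complexity of $H$, since a prefix need not contain every factor. The encoding (`U` for $\bar{u}$, `R` for $\bar{r}$) and the name `factor_count` are hypothetical.

```python
LAM = {"u": "r", "r": "u", "U": "R", "R": "U"}
RHO = {"u": "R", "R": "u", "r": "U", "U": "r"}


def hilbert_word(n):
    """H_n (n >= 1), with U = u-bar and R = r-bar."""
    word = "urU"
    for _ in range(n - 1):
        lam = "".join(LAM[c] for c in word)
        rho = "".join(RHO[c] for c in word)
        word = lam + "u" + word + "r" + word + "U" + rho
    return word


def factor_count(word, length):
    """Number of distinct factors of the given length occurring in `word`."""
    return len({word[i:i + length] for i in range(len(word) - length + 1)})


prefix = hilbert_word(5)
for length in range(1, 9):
    print(length, factor_count(prefix, length))
```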
The critical exponent of an infinite word $w$ is a number $e$ such that $w$ contains $\alpha$-powers for some $\alpha < e$, but has no $\alpha$-powers for $\alpha > e$ (it may or may not have $e$-powers). Here $\alpha$ is a rational number and $e$ may be rational or real. Since $H$ contains cubes and the only cubes it contains are one-letter cubes, the critical exponent of $H$ is 3. However there is another interesting notion connected to the notion of critical exponent. Let us call super-critical exponent of an infinite word $w$ the number $e_s$ such that $w$ contains $\alpha$-powers for every rational number $\alpha < e_s$, but has no $\alpha$-powers for $\alpha > e_s$ (again, it may or may not have $e_s$-powers). Of course $e_s \leq e$. In the case of some classical infinite words, such as the Thue-Morse word or the Fibonacci word, the critical exponent and the super-critical exponent have the same value. But in the present case the super-critical exponent is undoubtedly less than 3 because, from Theorem 7, $H$ cannot contain all rational powers less than 3.
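The largest exponent of a factor can be computed exactly on a finite word, a factor of length $\ell$ with period $p$ being an $(\ell/p)$-power. The sketch below (the name `max_exponent` is hypothetical) does this for $H_3$, computed under the assumption that $H_{n+1} = \lambda(H_n)\, u\, H_n\, r\, H_n\, \bar{u}\, \rho(H_n)$ with $\lambda : u \leftrightarrow r,\ \bar{u} \leftrightarrow \bar{r}$ and $\rho : u \leftrightarrow \bar{r},\ r \leftrightarrow \bar{u}$, and with `U`, `R` standing for $\bar{u}$, $\bar{r}$.

```python
from fractions import Fraction

LAM = {"u": "r", "r": "u", "U": "R", "R": "U"}
RHO = {"u": "R", "R": "u", "r": "U", "U": "r"}


def hilbert_word(n):
    """H_n (n >= 1), with U = u-bar and R = r-bar."""
    word = "urU"
    for _ in range(n - 1):
        lam = "".join(LAM[c] for c in word)
        rho = "".join(RHO[c] for c in word)
        word = lam + "u" + word + "r" + word + "U" + rho
    return word


def max_exponent(word):
    """Largest ratio length/period over all factors of `word`."""
    best = Fraction(1)
    n = len(word)
    for p in range(1, n):
        run = 0                       # consecutive i with word[i] == word[i + p]
        for i in range(n - p):
            if word[i] == word[i + p]:
                run += 1
                # These run + p letters form a factor with period p.
                best = max(best, Fraction(run + p, p))
            else:
                run = 0
    return best


print(max_exponent(hilbert_word(3)))
```

The same loop, restricted to a fixed period $p$, lists which rational exponents with denominator $p$ actually occur, which is the kind of data needed to estimate the super-critical exponent.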
One of the referees proposed to replace absolute directions (left, right, up, down) by relative directions (left, right, straight).The infinite word so obtained on a 3-letter alphabet, which is of course different from H, is generated by a tag-system with a 7-letter alphabet and a uniform morphism whose image length is only 4. Comparing the properties of this word with those of H could be interesting.

Acknowledgment
Discussions with Valérie Berthé and Julien Cassaigne were helpful. The referees' comments and suggestions considerably improved the paper.

Lemma 6
For any $n \in \mathbb{N}$, $\gamma^n(A)$ does not contain any factor $YwYwY$ with $Y \in \{A, B, C, D\}$ and $w \in \Omega^*$.
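Lemma 6 can be checked by brute force for small values of $n$. The sketch below assumes that $\gamma$ fixes the lower-case letters and that $\gamma(A) = BaAbAcD$, $\gamma(B) = AbBaBdC$, $\gamma(C) = DcCdCaB$, $\gamma(D) = CdDcDbA$, as read off from the factors quoted in the proof; the function names are hypothetical.

```python
GAMMA = {"A": "BaAbAcD", "B": "AbBaBdC", "C": "DcCdCaB", "D": "CdDcDbA",
         "a": "a", "b": "b", "c": "c", "d": "d"}


def gamma(word):
    """Image of `word` under gamma."""
    return "".join(GAMMA[c] for c in word)


def gamma_power(n):
    """Return gamma^n(A)."""
    word = "A"
    for _ in range(n):
        word = gamma(word)
    return word


def has_YwYwY(word):
    """True if `word` has a factor YwYwY with Y an upper-case letter."""
    n = len(word)
    for i in range(n):
        if not word[i].isupper():
            continue
        for p in range(1, (n - 1 - i) // 2 + 1):      # p = |Yw|
            if word[i + 2 * p] == word[i] and word[i:i + p] == word[i + p:i + 2 * p]:
                return True
    return False


for n in range(5):
    print(n, len(gamma_power(n)), has_YwYwY(gamma_power(n)))

# gamma is nevertheless neither square-free nor weakly square-free.
print("BB" in gamma("CA"), "bAbAb" in gamma_power(4))
```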

First suppose $|Wx| \bmod 4 \neq 3$.
Then if $y = r$ the only possibility is $|W| \bmod 4 = 1$ (that is, after $T$ the second occurrence of $xy = \bar{u}r$ is at the end of $h(D)$) and if $y = \bar{u}$ the only possibility is $|W| \bmod 4 = 3$ (that is, after $T$ the second occurrence of $xy = \bar{u}\bar{u}$ is such that $x$ is the image by $h$ of a lower-case letter and $y$ is the beginning of the image of an upper-case letter). In the two cases $|TxyWxyW| \bmod 4 = 0$, which implies that the third occurrence of $xy$ is the beginning of some $h(Y)$ with $Y \in \{A, B, C, D\}$: this is impossible because $xy = \bar{u}r$ or $xy = \bar{u}\bar{u}$.