Automatic sequences: from rational bases to trees

The $n$th term of an automatic sequence is the output of a deterministic finite automaton fed with the representation of $n$ in a suitable numeration system. In this paper, instead of considering automatic sequences built on a numeration system with a regular numeration language, we consider those built on languages associated with trees having periodic labeled signatures and, in particular, rational base numeration systems. We obtain two main characterizations of these sequences. The first one is concerned with $r$-block substitutions where $r$ morphisms are applied periodically. In particular, we provide examples of such sequences that are not morphic. The second characterization involves the factors, or subtrees of finite height, of the tree associated with the numeration system and decorated by the terms of the sequence.


Introduction
Motivated by a question of Mahler in number theory, the introduction of rational base numeration systems has brought to light a family of formal languages with a rich combinatorial structure [1]. In particular, the generation of infinite trees with a periodic signature has emerged [20,21,22,23]. Marsault and Sakarovitch very quickly linked the enumeration of the vertices of such trees (called breadth-first serialization) to the concept of abstract numeration system built on the corresponding prefix-closed language: the traversal of the tree is exactly the radix enumeration of the words of the language. In this paper, we study automatic sequences associated with that type of numeration systems. In particular, in the rational base $\frac{p}{q}$, a sequence is $\frac{p}{q}$-automatic if its $n$th term is obtained as the output of a DFAO fed with the base-$\frac{p}{q}$ representation of $n$. Thanks to a result of Lepistö [15] on factor complexity, we observe that we can get sequences that are not morphic.
We obtain several characterizations of these sequences. The first one boils down to translating Cobham's theorem from 1972 into this setting. In Section 4, we show that any automatic sequence built on a tree language with a purely periodic labeled signature is the image under a coding of an alternating fixed point of uniform morphisms not necessarily of the same length. If all the morphisms had the same length, as observed in [12], we would only get classical $k$-automatic sequences. As a consequence, in the rational base $\frac{p}{q}$, if a sequence is $\frac{p}{q}$-automatic, then it is the image under a coding of a fixed point of a $q$-block substitution whose images all have length $p$. In the literature, these substitutions are also called PD0L systems, where a periodic control is applied: $q$ different morphisms are applied depending on the index of the considered letter modulo $q$.
On the other hand, Sturmian trees as studied in [3] also have a rich combinatorial structure where subtrees play a special role analogous to factors occurring in infinite words. In Section 5, we discuss the factors, i.e., subtrees of finite height, that may appear in the tree whose paths from the root are labeled by the words of the numeration language and whose vertices are colored according to the sequence of interest. Related to the $k$-kernel of a sequence, we obtain a new characterization of the classical $k$-automatic sequences: a sequence $x$ is $k$-automatic if and only if the labeled tree of the base-$k$ numeration system decorated by $x$ is rational, i.e., it has finitely many infinite subtrees. For numeration systems built on a regular language, the function counting the number of decorated subtrees of height $n$ is bounded, and we get a similar result. This is not the case in the more general setting of rational base numeration systems. Nevertheless, we obtain sufficient conditions for a sequence to be $\frac{p}{q}$-automatic in terms of the number of extensions a subtree may have.
This paper is organized as follows. In Section 2, we recall basic definitions about abstract numeration systems, tree languages, rational base numeration systems, and alternating morphisms. In Section 3, we give some examples of the automatic sequences that we will consider. The parity of the sum-of-digits in base $\frac{3}{2}$ is such an example. In Section 4, Cobham's theorem is adapted to the case of automatic sequences built on tree languages with a periodic labeled signature in Theorem 20 (so, in particular, to the rational base numeration systems in Corollary 21). In Section 5, we decorate the nodes of the tree associated with the language of a rational base numeration system with the elements of a sequence taking finitely many values. Under a mild assumption (always satisfied when distinct states of the deterministic finite automaton with output producing the sequence have distinct outputs), we obtain a characterization of $\frac{p}{q}$-automatic sequences in terms of the number of extensions for subtrees of some finite height occurring in the decorated tree. In Section 6, we review some usual closure properties of $\frac{p}{q}$-automatic sequences.
by lexicographical order). The generalization of abstract numeration systems to context-free languages was, for instance, considered in [6]. Rational base numeration systems discussed below in Section 2.3 are also abstract numeration systems built on non-regular languages.
Definition 1. An abstract numeration system (or ANS for short) is a triple $S = (L, A, <)$ where $L$ is an infinite language over a totally ordered (finite) alphabet $(A, <)$. We say that $L$ is the numeration language.
The map $\mathrm{rep}_S : \mathbb{N} \to L$ is the one-to-one correspondence mapping $n \in \mathbb{N}$ onto the $(n+1)$st word in the radix ordered language $L$, which is then called the $S$-representation of $n$. The $S$-representation of 0 is the first word in $L$. The inverse map is denoted by $\mathrm{val}_S : L \to \mathbb{N}$. For any word $w$ in $L$, $\mathrm{val}_S(w)$ is its $S$-numerical value.
Positional numeration systems, such as integer base numeration systems, the Fibonacci numeration system, and Pisot numeration systems, are based on the greediness of the representations (computed through a greedy algorithm where at each step one subtracts, by Euclidean division, the largest available term of the sequence from the remaining part to be represented [13]). They all share the following property: $m < n$ if and only if $\mathrm{rep}(m)$ is less than $\mathrm{rep}(n)$ for the radix order. These numeration systems are thus ANS. As a non-standard example of ANS, consider the language $a^*b^*$ over $\{a, b\}$ and assume that $a < b$. Let $S = (a^*b^*, \{a, b\}, <)$. The first few words in the numeration language are $\varepsilon, a, b, aa, ab, bb, \ldots$. For instance, $\mathrm{rep}_S(3) = aa$ and $\mathrm{rep}_S(5) = bb$. One can show that $\mathrm{val}_S(a^p b^q) = \frac{(p+q)(p+q+1)}{2} + q$. For details, we refer the reader to [14] or [26].
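The example above can be checked numerically. The following Python sketch enumerates $a^*b^*$ in radix order and compares the positions with the closed formula; the function names `words_radix`, `rep` and `val` are illustrative choices, not notation from the paper.

```python
from itertools import count, islice

def words_radix():
    """Yield the words of a*b* in radix order (by length, then a < b)."""
    for n in count(0):
        for q in range(n + 1):              # q = number of b's
            yield "a" * (n - q) + "b" * q

def val(word):
    """Closed form val_S(a^p b^q) = (p+q)(p+q+1)/2 + q."""
    p, q = word.count("a"), word.count("b")
    return (p + q) * (p + q + 1) // 2 + q

def rep(n):
    """S-representation of n: the (n+1)st word of a*b* in radix order."""
    return next(islice(words_radix(), n, None))
```

For instance, `rep(3)` returns `"aa"` and `rep(5)` returns `"bb"`, in agreement with the values given in the text.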
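In the next definition, we assume that most significant digits are read first. This is not a real restriction (see Section 6).
Definition 2. Let $S = (L, A, <)$ be an abstract numeration system and let $B$ be a finite alphabet. An infinite word $x = x_0x_1x_2\cdots$ over $B$ is $S$-automatic if there exists a deterministic finite automaton with output (DFAO) $(Q, q_0, A, \delta, \tau)$, with output function $\tau : Q \to B$, such that $x_n = \tau(\delta(q_0, \mathrm{rep}_S(n)))$ for all $n \ge 0$.
Let $k \ge 2$ be an integer. We let $A_k$ denote the alphabet $\{0, 1, \ldots, k-1\}$. For the usual base-$k$ numeration system built on the language
$$L_k = \{\varepsilon\} \cup \{1, \ldots, k-1\}\{0, \ldots, k-1\}^* \qquad (1)$$
an $S$-automatic sequence is said to be $k$-automatic [2]. We also write $\mathrm{rep}_k$ and $\mathrm{val}_k$ in this context.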

Tree languages
Prefix-closed languages define labeled trees (also called tries or prefix trees in computer science) and vice versa. Let $(A, <)$ be a totally ordered (finite) alphabet and let $L$ be a prefix-closed language over $(A, <)$.
The set of nodes of the tree is L. If w and wd are words in L with d ∈ A, then there is an edge from w to wd with label d.The children of a node are ordered by the labels of the letters in the ordered alphabet A.
In Figure 1, we have depicted the first levels of the tree associated with the prefix-closed language $a^*b^*$. Nodes are enumerated by breadth-first traversal (or serialization). We recall some notions from [21] or [23]. Let $T$ be an ordered tree of finite degree. The (breadth-first) signature of $T$ is a sequence of integers, the sequence of the degrees of the nodes visited by the (canonical) breadth-first traversal of the tree. The (breadth-first) labeling of $T$ is the infinite sequence of the labels of the edges visited by the breadth-first traversal of this tree. As an example, with the tree in Figure 1, its signature is $2, 2, 1, 2, 1, 1, 2, 1, 1, 1, 2, \ldots$ and its labeling is $a, b, a, b, b, a, b, b, b, a, b, \ldots$.
Remark 3. As observed by Marsault and Sakarovitch [21], it is usually convenient to consider i-trees: the root is assumed to be a child of itself. It is especially the case for positional numeration systems when one has to deal with leading zeroes as the words $u$ and $0u$ may represent the same integer. In an i-tree, paths labeled by $u$ and $0u$ lead to the same node.
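The signature and labeling quoted for the tree of $a^*b^*$ can be recomputed by a direct breadth-first traversal. The sketch below is a minimal illustration; the helper `signature_and_labeling` and the membership test `in_ab` are hypothetical names introduced here, and the tree is the plain one (not the i-tree of Remark 3).

```python
import re
from collections import deque

def signature_and_labeling(in_language, alphabet, n_nodes):
    """Breadth-first signature (node degrees) and labeling (edge labels)
    of the tree of a prefix-closed language, for the first n_nodes nodes."""
    queue = deque([""])
    signature, labeling = [], []
    while len(signature) < n_nodes:
        w = queue.popleft()
        # children are visited in the order of the alphabet
        children = [d for d in alphabet if in_language(w + d)]
        signature.append(len(children))
        labeling.extend(children)
        queue.extend(w + d for d in children)
    return signature, labeling

def in_ab(word):
    """Membership test for the language a*b*."""
    return re.fullmatch(r"a*b*", word) is not None

sig, lab = signature_and_labeling(in_ab, "ab", 11)
```

Here `sig` lists the degrees of the first eleven nodes and `lab` the labels of the edges leaving them, both in breadth-first order.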
We now present a useful way to describe or generate infinite labeled i-trees. Let $A$ be a finite alphabet of which 0 is assumed to be the smallest letter. A labeled signature is an infinite sequence $(w_n)_{n \ge 0}$ of finite words over $A$ providing a signature $(|w_n|)_{n \ge 0}$ and a consistent labeling of a tree (made of the sequence of letters of $(w_n)_{n \ge 0}$). It will be assumed that the letters of each word are in strictly increasing order and that $w_0 = 0x$ with $x \in A^+$. To that aim we let $\mathrm{inc}(A^*)$ denote the set of words over $A$ with increasingly ordered letters. For instance, 025 belongs to $\mathrm{inc}(A_6^*)$ but 0241 does not. Examples of labeled signatures will be given in Section 2.3.
Remark 4. Since a labeled signature $s$ generates an i-tree, by abuse, we say that such a signature defines a prefix-closed language denoted by $L(s)$, which is made of the labels, not starting with 0, of the paths in the i-tree. Moreover, since we assumed the words of $s$ all belong to $\mathrm{inc}(A^*)$ for some finite alphabet $A$, the canonical breadth-first traversal of this tree produces an abstract numeration system. Indeed the enumeration of the nodes $v_0, v_1, v_2, \ldots$ of the tree is such that $v_n$ is the $n$th word in the radix ordered language $L(s)$. The language $L(s)$, the set of nodes of the tree and $\mathbb{N}$ are thus in one-to-one correspondence.
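As an illustration of Remark 4, the following Python sketch generates the words of $L(s)$ in radix order for a purely periodic labeled signature. The function name and the encoding of words as tuples of integers are assumptions made for this example only.

```python
from collections import deque
from itertools import cycle, islice

def language_from_signature(words):
    """Yield L(s) in radix order for s = (words[0], ..., words[r-1])^omega.

    Each word is a list of increasing digits, and words[0] starts with 0:
    that first 0 is the loop making the root a child of itself (i-tree).
    """
    assert words[0][0] == 0 and len(words[0]) >= 2
    queue = deque([()])                      # the root is the empty word
    for w in cycle(words):
        node = queue.popleft()
        yield node
        labels = w[1:] if node == () else w  # skip the root's 0-loop
        queue.extend(node + (d,) for d in labels)

first_words = list(islice(language_from_signature([[0, 2], [1]]), 8))
```

With the signature $(02, 1)^\omega$ used in this call, the eight words obtained are $\varepsilon, 2, 21, 210, 212, 2101, 2120, 2122$.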

Rational bases
The framework of rational base numeration systems [1] is an interesting setting giving rise to a non-regular numeration language. Nevertheless, the corresponding tree has a rich combinatorial structure: it has a purely periodic labeled signature.
Let $p$ and $q$ be two relatively prime integers with $p > q > 1$. Given a positive integer $n$, we define the sequence $(n_i)_{i \ge 0}$ as follows: we set $n_0 = n$ and, for all $i \ge 0$, $q n_i = p n_{i+1} + a_i$ where $a_i$ is the remainder of the Euclidean division of $q n_i$ by $p$. Note that $a_i \in A_p$ for all $i \ge 0$. Since $p > q$, the sequence $(n_i)_{i \ge 0}$ is decreasing and eventually vanishes at some index $\ell + 1$. We obtain
$$n = \sum_{i=0}^{\ell} \frac{a_i}{q} \left(\frac{p}{q}\right)^i.$$
Conversely, for a word $w = w_\ell w_{\ell-1} \cdots w_0 \in A_p^*$, the value of $w$ in base $\frac{p}{q}$ is the rational number
$$\mathrm{val}_{\frac{p}{q}}(w) = \sum_{i=0}^{\ell} \frac{w_i}{q} \left(\frac{p}{q}\right)^i.$$
Note that $\mathrm{val}_{\frac{p}{q}}(w)$ is not always an integer and $\mathrm{val}_{\frac{p}{q}}(uv) = \mathrm{val}_{\frac{p}{q}}(u) \left(\frac{p}{q}\right)^{|v|} + \mathrm{val}_{\frac{p}{q}}(v)$ for all $u, v \in A_p^*$. We let $N_{\frac{p}{q}}$ denote the value set, i.e., the set of numbers representable in base $\frac{p}{q}$:
$$N_{\frac{p}{q}} = \{\mathrm{val}_{\frac{p}{q}}(w) \mid w \in A_p^*\}.$$
A word $w \in A_p^*$ is a representation of an integer $n \ge 0$ in base $\frac{p}{q}$ if $\mathrm{val}_{\frac{p}{q}}(w) = n$. Just as for integer bases, representations in rational bases are unique up to leading zeroes [1, Theorem 1]. Therefore we let $\mathrm{rep}_{\frac{p}{q}}(n)$ denote the representation of $n$ in base $\frac{p}{q}$ that does not start with 0. By convention, the representation of 0 in base $\frac{p}{q}$ is the empty word $\varepsilon$. In base $\frac{p}{q}$, the numeration language is the set
$$L_{\frac{p}{q}} = \{\mathrm{rep}_{\frac{p}{q}}(n) \mid n \ge 0\}.$$
Hence, rational base numeration systems are special cases of ANS built on $L_{\frac{p}{q}}$: $m < n$ if and only if $\mathrm{rep}_{\frac{p}{q}}(m) < \mathrm{rep}_{\frac{p}{q}}(n)$ for the radix order. It is clear that $L_{\frac{p}{q}} \subseteq A_p^*$ is a prefix-closed language. As a consequence of the previous section, it can be seen as a tree.
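The representation algorithm and the value map can be sketched in a few lines of Python. This is only an illustration: the function names and the encoding of words as lists of digits (most significant digit first) are choices made here, and exact arithmetic is done with `fractions.Fraction`.

```python
from fractions import Fraction

def rep(n, p, q):
    """Base-p/q representation of n >= 0, most significant digit first."""
    digits = []
    while n > 0:
        n, a = divmod(q * n, p)     # q * n_i = p * n_{i+1} + a_i
        digits.append(a)            # a_0 is produced first
    return digits[::-1]

def val(word, p, q):
    """Value of a digit word in base p/q: sum of (w_i / q) * (p/q)^i,
    where w_0 is the last (least significant) digit."""
    return sum(Fraction(d, q) * Fraction(p, q) ** i
               for i, d in enumerate(reversed(word)))
```

For example, `rep(5, 3, 2)` gives `[2, 1, 0, 1]`, while `val([1], 3, 2)` equals `Fraction(1, 2)`, a word whose value is not an integer.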
Example 5. The alphabet for the base $\frac{3}{2}$ is $A_3 = \{0, 1, 2\}$. The first few words in $L_{\frac{3}{2}}$ are $\varepsilon, 2, 21, 210, 212, 2101, 2120, 2122$, and the associated i-tree is depicted in Figure 2. If we add an edge of label 0 on the root of this tree (see Remark 3), its signature is $2, 1, 2, 1, \ldots$ and its labeling is $0, 2, 1, 0, 2, 1, 0, 2, 1, \ldots$. Otherwise stated, the purely periodic labeled signature $(02, 1)^\omega$ gives the i-tree of the language $L_{\frac{3}{2}}$; see Figure 2. For all $n \ge 0$, the $n$th node in the breadth-first traversal is the word $\mathrm{rep}_{\frac{3}{2}}(n)$. Observe that there is an edge labeled by $a \in A_3$ from the node $n$ to the node $m$ if and only if $m = \frac{3}{2} \cdot n + \frac{a}{2}$. This remark is valid for all rational bases.
Remark 6. The language $L_{\frac{p}{q}}$ is highly non-regular: it has the bounded left-iteration property; for details, see [20]. In $L_{\frac{p}{q}}$ seen as a tree, no two infinite subtrees are isomorphic, i.e., for any two words $u, v \in L_{\frac{p}{q}}$ with $u \neq v$, the quotients $u^{-1} L_{\frac{p}{q}}$ and $v^{-1} L_{\frac{p}{q}}$ are distinct. As we will see with Lemma 29, this does not prevent the languages $u^{-1} L_{\frac{p}{q}}$ and $v^{-1} L_{\frac{p}{q}}$ from coinciding on words of length bounded by a constant depending on $\mathrm{val}_{\frac{p}{q}}(u)$ and $\mathrm{val}_{\frac{p}{q}}(v)$ modulo a power of $q$. Nevertheless, the associated tree has a purely periodic labeled signature. For example, with $\frac{p}{q}$ respectively equal to $\frac{3}{2}$, $\frac{5}{2}$, $\frac{7}{3}$ and $\frac{11}{4}$, we respectively have the signatures $(02, 1)^\omega$, $(024, 13)^\omega$, $(036, 25, 14)^\omega$, $(048, 159, 26(10), 37)^\omega$. Generalizations of these languages (called rhythmic generations of trees) are studied in [23].
Definition 7. We say that a sequence is $\frac{p}{q}$-automatic if it is $S$-automatic for the ANS built on the language $L_{\frac{p}{q}}$, i.e., $S = (L_{\frac{p}{q}}, A_p, <)$.
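The periodic labeled signatures listed in Remark 6 follow from the edge relation of Example 5: the node $n$ has a child through the digit $a$ exactly when $q$ divides $pn + a$, which only depends on $n$ modulo $q$. The Python sketch below, with the illustrative names `signature` and `children`, recomputes them.

```python
def signature(p, q):
    """Period (w_0, ..., w_{q-1}) of the labeled signature of base p/q:
    w_i lists the digits a in {0, ..., p-1} such that q divides p*i + a."""
    return [[a for a in range(p) if (p * i + a) % q == 0] for i in range(q)]

def children(n, p, q):
    """Pairs (label a, child m) with m = (p*n + a)/q for the node n.
    The loop (0, 0) on the root of the i-tree is left out."""
    return [(a, (p * n + a) // q)
            for a in signature(p, q)[n % q]
            if (n, a) != (0, 0)]
```

In base $\frac{3}{2}$, `children(2, 3, 2)` returns `[(0, 3), (2, 4)]`: the node 2, represented by 21, has the children 3 and 4, represented by 210 and 212.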

Alternating morphisms
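The Kolakoski-Oldenburger word [28, A000002] is the unique word $k$ over $\{1, 2\}$ starting with 2 and satisfying $\Delta(k) = k$, where $\Delta$ is the run-length encoding map, which sends a word to the sequence of lengths of its maximal blocks of equal letters. It is a well-known (and challenging) object of study in combinatorics on words. It can be obtained by periodically iterating two morphisms, namely
$$f_0 : 1 \mapsto 2,\ 2 \mapsto 22 \quad \text{and} \quad f_1 : 1 \mapsto 1,\ 2 \mapsto 11.$$
More precisely, in [8], the word $k = k_0 k_1 k_2 \cdots$ is expressed as
$$k = f_0(k_0)\, f_1(k_1)\, f_0(k_2)\, f_1(k_3) \cdots.$$
In the literature, one also finds the terminology PD0L for D0L system with periodic control [12,15].
Definition 8. Let $r \ge 1$ be an integer, let $A$ be a finite alphabet, and let $f_0, \ldots, f_{r-1}$ be $r$ morphisms over $A^*$. An infinite word $w = w_0 w_1 w_2 \cdots$ over $A$ is an alternating fixed point of $(f_0, \ldots, f_{r-1})$ if
$$w = f_0(w_0)\, f_1(w_1) \cdots f_{r-1}(w_{r-1})\, f_0(w_r) \cdots f_{i \bmod r}(w_i) \cdots.$$
As observed by Dekking [9] for the Kolakoski word, an alternating fixed point can also be obtained by an $r$-block substitution.
Definition 9. Let $r \ge 1$ be an integer and let $A$ be a finite alphabet. An $r$-block substitution $g : A^r \to A^*$ maps a word $w_0 \cdots w_{rn-1} \in A^*$ to
$$g(w_0 \cdots w_{r-1})\, g(w_r \cdots w_{2r-1}) \cdots g(w_{r(n-1)} \cdots w_{rn-1}).$$
If the length of the word is not a multiple of $r$, then the suffix of the word is ignored under the action of $g$. An infinite word $w = w_0 w_1 w_2 \cdots$ over $A$ is a fixed point of the $r$-block substitution $g$ if $w = g(w_0 \cdots w_{r-1})\, g(w_r \cdots w_{2r-1}) \cdots$.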
Proposition 10. Let $r \ge 1$ be an integer, let $A$ be a finite alphabet, and let $f_0, \ldots, f_{r-1}$ be $r$ morphisms over $A^*$. If an infinite word over $A$ is an alternating fixed point of $(f_0, \ldots, f_{r-1})$, then it is a fixed point of an $r$-block substitution.
Proof: For every length-$r$ word $a_0 \cdots a_{r-1} \in A^r$, define $g(a_0 \cdots a_{r-1}) = f_0(a_0) \cdots f_{r-1}(a_{r-1})$.
Thanks to the previous result, the Kolakoski-Oldenburger word $k$ is also a fixed point of the 2-block substitution
$$g : 11 \mapsto 21,\ 12 \mapsto 211,\ 21 \mapsto 221,\ 22 \mapsto 2211.$$
Observe that the lengths of images under $g$ are not all equal.
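The periodic iteration of two morphisms can be sketched in Python as follows. The morphisms used here are an assumption of this example: $f_0$ writes a block of 2's and $f_1$ a block of 1's, each of length given by the letter read, which is what the run-length condition $\Delta(k) = k$ forces for a word starting with 2. The function name `alternating_fixed_point` is illustrative.

```python
from itertools import groupby

# Assumed morphisms: f_0 acts on even indices, f_1 on odd indices.
F = [{1: [2], 2: [2, 2]},
     {1: [1], 2: [1, 1]}]

def alternating_fixed_point(morphisms, first_letter, length):
    """Prefix of the alternating fixed point starting with first_letter.
    The letter at index i is rewritten by morphisms[i % r]; morphisms[0]
    must be prolongable on first_letter."""
    r = len(morphisms)
    word = list(morphisms[0][first_letter])
    i = 1
    while len(word) < length:
        word.extend(morphisms[i % r][word[i]])
        i += 1
    return word[:length]

# 2-block substitution built as in Proposition 10: g(ab) = f_0(a) f_1(b).
G = {(a, b): F[0][a] + F[1][b] for a in (1, 2) for b in (1, 2)}

k = alternating_fixed_point(F, 2, 200)
runs = [len(list(block)) for _, block in groupby(k)]   # run-length encoding
```

The list `runs` (without its last, possibly truncated, entry) can then be compared with the beginning of `k`, and the images stored in `G` have lengths 2, 3, 3 and 4.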

Concrete examples of automatic sequences
Let us present how the above concepts are linked with the help of some examples. The first one is our toy example.
As a consequence of Proposition 16, it will turn out that t is an alternating fixed point of (f 0 , f 1 ) with With Proposition 10, t is also a fixed point of the 2-block substitution g : Observe that we have a 2-block substitution with images of length 3.This is not a coincidence, as we will see with Corollary 21.
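The following Python sketch illustrates the kind of statement made above. It rests on two assumptions that should be read as such: the sequence is taken to be the parity of the sum of digits in base $\frac{3}{2}$ (the example announced in the introduction), and the two morphisms are the ones obtained by letting a state read the letters of the words 02 and 1 of the signature $(02, 1)^\omega$. All names (`rep`, `delta`, `F`, `G`) are hypothetical.

```python
def rep(n, p=3, q=2):
    """Base-p/q digits of n, most significant digit first."""
    digits = []
    while n > 0:
        n, a = divmod(q * n, p)
        digits.append(a)
    return digits[::-1]

def delta(state, digit):
    """Assumed DFAO: the state is the parity of the digits read so far."""
    return (state + digit) % 2

SIGNATURE = [[0, 2], [1]]   # period of the labeled signature of base 3/2
# f_i maps a state to the states reached by reading the letters of w_i.
F = [{s: [delta(s, d) for d in w] for s in (0, 1)} for w in SIGNATURE]
# 2-block substitution g(ab) = f_0(a) f_1(b).
G = {(a, b): F[0][a] + F[1][b] for a in (0, 1) for b in (0, 1)}

def alternating_fixed_point(morphisms, first_letter, length):
    """Prefix of the alternating fixed point starting with first_letter."""
    r = len(morphisms)
    word = list(morphisms[0][first_letter])
    i = 1
    while len(word) < length:
        word.extend(morphisms[i % r][word[i]])
        i += 1
    return word[:length]

# Output of the parity DFAO on rep(n), computed directly as a digit sum.
by_automaton = [sum(rep(n)) % 2 for n in range(200)]
by_morphisms = alternating_fixed_point(F, 0, 200)
```

Under these assumptions the two lists coincide, and every image stored in `G` has length 3, as expected for a 2-block substitution attached to the base $\frac{3}{2}$.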
Automatic sequences in integer bases are morphic words, i.e., images, under a coding, of a fixed point of a prolongable morphism [2]. As shown by the next example, there are $\frac{3}{2}$-automatic sequences that are not morphic. For a word $u \in \{0, 1\}^*$, we let $\overline{u}$ denote the word obtained by applying the involution $i \mapsto 1 - i$, $i \in \{0, 1\}$, to the letters of $u$.
Example 12. Lepistö considered in [15] the following 2-block substitution producing the word $F_2 = 01001100001\cdots$. He showed that the factor complexity $p_{F_2}$ of this word satisfies $p_{F_2}(n) > \delta n^t$ for some $\delta > 0$ and $t > 2$. Hence, this word can be neither purely morphic nor morphic (because these kinds of words have a factor complexity in $O(n^2)$ [24]). With Proposition 17, we can show that $F_2$ is a $\frac{3}{2}$-automatic sequence generated by the DFAO depicted in Figure 4.
Remark 13. Similarly, the non-morphic word $F_p$ introduced in [15] is $\frac{p+1}{p}$-automatic. It is generated by the $p$-block substitution defined by $h_p(au) = g_0(a)u$ for $a \in \{0, 1\}$ and $u \in \{0, 1\}^{p-1}$, where $g_0$ is defined in Example 12.
We conclude this section with an example of an automatic sequence associated with a language coming from a periodic signature.

Cobham's theorem
Cobham's theorem from 1972 states that a sequence is $k$-automatic if and only if it is the image under a coding of the fixed point of a $k$-uniform morphism [7] (or see [2, Theorem 6.3.2]). This result has been generalized to various contexts: numeration systems associated with a substitution, Pisot numeration systems, Bertrand numeration systems, ANS with regular languages, and so on [5,10,17,25]. Also see [14] or [26] for a comprehensive presentation. In this section, we adapt it to the case of $S$-automatic sequences built on tree languages with a periodic labeled signature (so, in particular, to the rational base case). We start off with a technical lemma.
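Lemma 15. Let $r \ge 1$ be an integer, let $A$ be a finite alphabet, and let $f_0, \ldots, f_{r-1}$ be morphisms over $A^*$. Let $x = x_0 x_1 x_2 \cdots$ be an alternating fixed point of $(f_0, \ldots, f_{r-1})$. For all $m \ge 0$, we have
$$f_{m \bmod r}(x_m) = x_i \cdots x_{i + |f_{m \bmod r}(x_m)| - 1} \quad \text{where } i = \sum_{j=0}^{m-1} |f_{j \bmod r}(x_j)|.$$
Proof: Let $m \ge 0$. From the definition of an alternating fixed point, we have the factorization
$$x = u\, f_{m \bmod r}(x_m)\, f_{(m+1) \bmod r}(x_{m+1}) \cdots \quad \text{where } u = f_0(x_0)\, f_1(x_1) \cdots f_{(m-1) \bmod r}(x_{m-1}).$$
Now $|u| = \sum_{j=0}^{m-1} |f_{j \bmod r}(x_j)|$, which concludes the proof.
Given an $S$-automatic sequence associated with the language of a tree with a purely periodic labeled signature, we can turn it into an alternating fixed point of uniform morphisms.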
Proposition 16. Let $r \ge 1$ be an integer and let $A$ be a finite alphabet of digits. Let $w_0, \ldots, w_{r-1}$ be $r$ non-empty words in $\mathrm{inc}(A^*)$. Consider the language $L(s)$ of the i-tree generated by the purely periodic signature $s = (w_0, w_1, \ldots, w_{r-1})^\omega$. Let $\mathcal{A} = (Q, q_0, A, \delta)$ be a DFA. For $i \in \{0, \ldots, r-1\}$, we define the $r$ morphisms from $Q^*$ to itself by
$$f_i : q \mapsto \delta(q, w_{i,0})\, \delta(q, w_{i,1}) \cdots \delta(q, w_{i,|w_i|-1}),$$
where $w_{i,j}$ denotes the $j$th letter of $w_i$. The alternating fixed point $x = x_0 x_1 \cdots$ of $(f_0, \ldots, f_{r-1})$ starting with $q_0$ is the sequence of states reached in $\mathcal{A}$ when reading the words of $L(s)$ in increasing radix order, i.e., for all $n \ge 0$, $x_n = \delta(q_0, \mathrm{rep}_S(n))$ with $S = (L(s), A, <)$.
Proof: Up to renaming the letters of w 0 , without loss of generality we may assume that w 0 = 0x with x ∈ A + .
We proceed by induction on $n \ge 0$. It is clear that $x_0 = \delta(q_0, \varepsilon) = q_0$. Let $n \ge 1$. Assume that the property holds for all integers less than $n$ and let us prove it for $n$.
Write $\mathrm{rep}_S(n) = a_\ell \cdots a_1 a_0$. This means that in the i-tree generated by $s$, we have a path of label $a_\ell \cdots a_0$ from the root. We identify words in $L(s)$ with vertices of the i-tree. Since $L(s)$ is prefix-closed, there exists an integer $m < n$ such that $\mathrm{rep}_S(m) = a_\ell \cdots a_1$. Let $i = m \bmod r$. By definition of the periodic labeled signature $s$, in the i-tree generated by $s$, reading $a_\ell \cdots a_1$ from the root leads to a node having $|w_i|$ children that are reached with edges labeled by the letters of $w_i$. Since $w_i \in \mathrm{inc}(A^*)$, the letter $a_0$ occurs exactly once in $w_i$, so assume that $w_{i,j} = a_0$ for some $j \in \{0, \ldots, |w_i| - 1\}$. By construction of the i-tree given by a periodic labeled signature (see Figure 7 for a pictorial description), we have that
$$n = \sum_{k=0}^{m-1} |w_{k \bmod r}| + j.$$
By the induction hypothesis, we obtain $x_m = \delta(q_0, a_\ell \cdots a_1)$, hence $\delta(q_0, \mathrm{rep}_S(n)) = \delta(x_m, a_0) = \delta(x_m, w_{i,j}) = [f_i(x_m)]_j$, which is $x_n$ by Lemma 15 and the previous equality.
Given an alternating fixed point of uniform morphisms, we can turn it into an $S$-automatic sequence for convenient choices of a language of a tree with a purely periodic labeled signature and a DFAO.
Proposition 17. Let $r \ge 1$ be an integer and let $A$ be a finite alphabet. Let $f_0, \ldots, f_{r-1} : A^* \to A^*$ be $r$ uniform morphisms of respective length $\ell_0, \ldots, \ell_{r-1}$ such that $f_0$ is prolongable on some letter $a \in A$, i.e., $f_0(a) = ax$ with $x \in A^+$. Let $x = x_0 x_1 \cdots$ be the alternating fixed point of $(f_0, \ldots, f_{r-1})$ starting with $a$. Consider the language $L(s)$ of the i-tree generated by the purely periodic labeled signature
$$s = \Big(0\, 1 \cdots (\ell_0 - 1),\ \ell_0\, (\ell_0 + 1) \cdots (\ell_0 + \ell_1 - 1),\ \ldots,\ \big(\textstyle\sum_{j<r-1} \ell_j\big) \cdots \big(\textstyle\sum_{j<r} \ell_j - 1\big)\Big)^\omega,$$
which is made of consecutive non-negative integers. Define a DFA $\mathcal{A}$ having
• $A$ as set of states,
• $a$ as initial state,
• $B = \{0, \ldots, \sum_{j<r} \ell_j - 1\}$ as alphabet,
• its transition function $\delta : A \times B \to A$ defined as follows: for all $i \in B$, there exist a unique $j_i \ge 0$ and a unique $t_i \ge 0$ such that $i = \sum_{k \le j_i - 1} \ell_k + t_i$ with $t_i < \ell_{j_i}$, and we set $\delta(b, i) = [f_{j_i}(b)]_{t_i}$ for all $b \in A$.
Then the word $x$ is the sequence of the states reached in $\mathcal{A}$ when reading the words of $L(s)$ by increasing radix order, i.e., for all $n \ge 0$, $x_n = \delta(a, \mathrm{rep}_S(n))$ with $S = (L(s), B, <)$.
Proof: We again proceed by induction on $n \ge 0$. It is clear that $x_0 = \delta(a, \varepsilon) = a$. Let $n \ge 1$. Assume the property holds for all values less than $n$ and let us prove it for $n$.
Write $\mathrm{rep}_S(n) = a_\ell \cdots a_1 a_0$. This means that in the i-tree with a periodic labeled signature $s$, we have a path of label $a_\ell \cdots a_0$ from the root. We identify words in $L(s) \subseteq B^*$ with vertices of the i-tree.
Since $L(s)$ is prefix-closed, there exists $m < n$ such that $\mathrm{rep}_S(m) = a_\ell \cdots a_1$. Let $j = m \bmod r$. In the i-tree generated by $s$, reading $a_\ell \cdots a_1$ from the root leads to a node having $\ell_j$ children that are reached with edges labeled by $\sum_{k \le j-1} \ell_k,\ \sum_{k \le j-1} \ell_k + 1,\ \ldots,\ \sum_{k \le j} \ell_k - 1$. Observe that the words in $s$ belong to $\mathrm{inc}(B^*)$. Therefore the letter $a_0$ occurs exactly once amongst those labels: write $a_0 = \sum_{k \le j-1} \ell_k + t$ for some $t \in \{0, \ldots, \ell_j - 1\}$. By construction of the i-tree, we have that
$$n = \sum_{k=0}^{m-1} \ell_{k \bmod r} + t. \qquad (4)$$
By the induction hypothesis, we obtain $x_m = \delta(a, a_\ell \cdots a_1)$, so that $\delta(a, \mathrm{rep}_S(n)) = \delta(x_m, a_0)$, and by definition of the transition function, $\delta(x_m, a_0) = [f_j(x_m)]_t = [f_{m \bmod r}(x_m)]_t$. From Lemma 15 and Equation (4), this is exactly $x_n$.
Remark 18. What matters in the above statement is that two distinct words of the signature $s$ do not share any common letter. It mainly ensures that the choice of the morphism to apply when defining $\delta$ is uniquely determined by the letter to be read.
Example 19. If we consider the morphisms in (2), Proposition 17 provides us with the signature $s = (01, 2)^\omega$ instead of the signature $(02, 1)^\omega$ of $L_{\frac{3}{2}}$. We will produce the sequence $t$ using the language $h(L_{\frac{3}{2}})$ where the coding $h$ is defined by $h(0) = 0$, $h(1) = 2$ and $h(2) = 1$. In the DFAO in Figure 3, the same coding is applied to the labels of the transitions. What matters is the shape of the tree (i.e., the sequence of degrees of the vertices) rather than the labels themselves.
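The construction of Proposition 17 can be sketched in Python as follows. Since the morphisms referred to above are not reproduced here, the sketch uses a hypothetical pair of uniform morphisms of lengths 2 and 1 over $\{0, 1\}$; the names `prop17`, `states_in_radix_order` and `alternating_fixed_point` are likewise illustrative.

```python
from collections import deque
from itertools import cycle

# Hypothetical uniform morphisms of lengths 2 and 1 (an assumption).
F = [{0: [0, 0], 1: [1, 1]},
     {0: [1], 1: [0]}]

def prop17(morphisms):
    """Signature of consecutive integers and the transition function
    delta(b, i) = [f_{j_i}(b)]_{t_i} built from uniform morphisms."""
    words, table, start = [], {}, 0
    for f in morphisms:
        length = len(next(iter(f.values())))
        words.append(list(range(start, start + length)))
        for t in range(length):
            table[start + t] = (f, t)        # digit -> (morphism, position)
        start += length
    def delta(state, digit):
        f, t = table[digit]
        return f[state][t]
    return words, delta

def states_in_radix_order(words, delta, initial, count):
    """States delta(initial, rep_S(n)) for n < count, by a breadth-first
    traversal of the i-tree generated by the periodic signature."""
    queue, out = deque([initial]), []
    for w in cycle(words):
        if len(out) == count:
            return out
        state = queue.popleft()
        labels = w[1:] if not out else w     # skip the root's 0-loop
        out.append(state)
        queue.extend(delta(state, d) for d in labels)

def alternating_fixed_point(morphisms, first_letter, length):
    """Prefix of the alternating fixed point starting with first_letter."""
    r = len(morphisms)
    word = list(morphisms[0][first_letter])
    i = 1
    while len(word) < length:
        word.extend(morphisms[i % r][word[i]])
        i += 1
    return word[:length]

words, delta = prop17(F)
```

For these morphisms the signature obtained is $(01, 2)^\omega$, and the states listed by `states_in_radix_order(words, delta, 0, n)` can be compared with the prefix of length `n` of the alternating fixed point starting with 0.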
Theorem 20. Let $A, B$ be two finite alphabets. An infinite word over $B$ is the image under a coding $g : A \to B$ of an alternating fixed point of uniform morphisms (not necessarily of the same length) over $A$ if and only if it is $S$-automatic for an abstract numeration system $S$ built on a tree language with a purely periodic labeled signature.
Proof: The forward direction follows from Proposition 17: define a DFAO where the output function $\tau$ is obtained from the coding $g : A \to B$, i.e., $\tau(b) = g(b)$ for all $b$ in $A$. The reverse direction directly follows from Proposition 16.
We are able to say more in the special case of rational bases. The tree language associated with the rational base $\frac{p}{q}$ has a periodic signature of the form $(w_0, \ldots, w_{q-1})^\omega$ with $\sum_{i=0}^{q-1} |w_i| = p$ and $w_i \in A_p^*$ for all $i$. See Remark 6 for examples.
Corollary 21. If a sequence is $\frac{p}{q}$-automatic, then it is the image under a coding of a fixed point of a $q$-block substitution whose images all have length $p$.
Proof: Let $(w_0, \ldots, w_{q-1})^\omega$ denote the periodic signature in base $\frac{p}{q}$. Proposition 16 provides $q$ morphisms $f_i$ that are respectively $|w_i|$-uniform. By Proposition 10, the alternating fixed point of $(f_0, \ldots, f_{q-1})$ is a fixed point of a $q$-block substitution $g$ such that, for any length-$q$ word $a_0 \cdots a_{q-1}$, $g(a_0 \cdots a_{q-1}) = f_0(a_0) \cdots f_{q-1}(a_{q-1})$, which has length $\sum_{i=0}^{q-1} |w_i| = p$.

Decorating trees and subtrees
As already observed in Section 2.2, a prefix-closed language $L$ over an ordered (finite) alphabet $(A, <)$ gives an ordered labeled tree $T(L)$ in which edges are labeled by letters in $A$. Labels of paths from the root to nodes provide a one-to-one correspondence between nodes in $T(L)$ and words in $L$. We now add extra information, such as a color, on every node. This information is provided by a sequence taking finitely many values.
Definition 22. Let $T = (V, E)$ be a rooted ordered infinite tree, i.e., each node has a finite (ordered) sequence of children. As observed in Remark 4, the canonical breadth-first traversal of $T$ gives an abstract numeration system, that is, an enumeration of the nodes: $v_0, v_1, v_2, \ldots$. Let $x = x_0 x_1 \cdots$ be an infinite word over a finite alphabet $B$. A decoration of $T$ by $x$ is a map from $V$ to $B$ associating with the node $v_n$ the decoration (or color) $x_n$, for all $n \ge 0$.
To be consistent and to avoid confusion, we use the word label for the edges of a tree and the word decoration for its nodes.
Example 23. In Figure 8 are depicted a prefix of $T(L_{\frac{3}{2}})$ decorated with the sequence $t$ of Example 11 and a prefix of the tree $T(L_2)$ associated with the binary numeration system (see (1)) and decorated with the Thue-Morse sequence $0110100110010110\cdots$. In these trees, the symbol 0 (respectively 1) is denoted by a black (respectively red) decorated node.
We use the terminology of [3] where Sturmian trees are studied; it is relevant to consider (labeled and decorated) factors occurring in trees.
Definition 24. The domain $\mathrm{dom}(T)$ of a labeled tree $T$ is the set of labels of paths from the root to its nodes. In particular, $\mathrm{dom}(T(L)) = L$ for any prefix-closed language $L$ over an ordered (finite) alphabet. The truncation of a tree at height $h$ is the restriction of the tree to the domain $\mathrm{dom}(T) \cap A^{\le h}$.
Let $L$ be a prefix-closed language over $(A, <)$ and $x = x_0 x_1 \cdots$ be an infinite word over some finite alphabet $B$. From now on, we consider the labeled tree $T(L)$ decorated by $x$. (We could use an ad hoc notation like $T_x(L)$ but in any case we only work with decorated trees and it would make the presentation cumbersome.) For all $n \ge 0$, the $n$th word $w_n$ in $L$ corresponds to the $n$th node of $T(L)$ decorated by $x_n$. Otherwise stated, for the ANS $S = (L, A, <)$ built on $L$, if $w \in L$, the node corresponding to $w$ in $T(L)$ has decoration $x_{\mathrm{val}_S(w)}$.
Definition 25. Let $w \in L$. We let $T[w]$ denote the subtree of $T$ having $w$ as root. Its domain is $w^{-1}L = \{u \mid wu \in L\}$. We say that $T[w]$ is a suffix of $T$.
For any $h \ge 0$, we let $T[w, h]$ denote the factor of height $h$ rooted at $w$, which is the truncation of $T[w]$ at height $h$. The prefix of height $h$ of $T$ is the factor $T[\varepsilon, h]$. Two factors $T[w, h]$ and $T[w', h]$ of the same height are equal if they have the same domain and the same decoration, i.e., $x_{\mathrm{val}_S(wu)} = x_{\mathrm{val}_S(w'u)}$ for all $u \in \mathrm{dom}(T[w, h]) = \mathrm{dom}(T[w', h])$. We let
$$F_h = \{T[w, h] \mid w \in L\}$$
denote the set of factors of height $h$ occurring in $T$. The tree $T$ is rational if it has finitely many suffixes.
Note that, due to Remark 6, with any decoration, even constant, the tree $T(L_{\frac{p}{q}})$ is not rational. In Figure 9, we have depicted the factors of height 2 occurring in $T(L_{\frac{3}{2}})$ decorated by $t$. In Figure 10, we have depicted the factors of height 2 occurring in $T(L_2)$ decorated by the Thue-Morse sequence. In this second example, except for the prefix of height 2, observe that a factor of height 2 is completely determined by the decoration of its root.
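Factors of a decorated tree can be enumerated by brute force on a finite part of the tree. The Python sketch below encodes a factor as nested tuples and counts the distinct factors of a given height rooted at the first nodes. It treats an integer base $k$ as the base $\frac{k}{1}$, and it assumes that the decoration of $T(L_{\frac{3}{2}})$ is the parity of the sum of digits in base $\frac{3}{2}$; all function names are illustrative.

```python
def factor(n, h, p, q, x):
    """Factor of height h rooted at node n of T(L_{p/q}) decorated by x,
    encoded as (decoration, ((label, subfactor), ...))."""
    kids = ()
    if h > 0:
        kids = tuple((a, factor((p * n + a) // q, h - 1, p, q, x))
                     for a in range(p)
                     if (p * n + a) % q == 0 and (n, a) != (0, 0))
    return (x(n), kids)

def count_factors(h, p, q, x, n_nodes):
    """Number of distinct height-h factors rooted at the nodes 0..n_nodes-1."""
    return len({factor(n, h, p, q, x) for n in range(n_nodes)})

def thue_morse(n):
    """Parity of the number of 1's in the binary expansion of n."""
    return bin(n).count("1") % 2

def parity_three_halves(n):
    """Assumed decoration: parity of the sum of digits of n in base 3/2."""
    total = 0
    while n > 0:
        n, a = divmod(2 * n, 3)
        total += a
    return total % 2
```

For the Thue-Morse decoration of $T(L_2)$, `count_factors(2, 2, 1, thue_morse, 500)` counts the height-2 prefix together with one factor per decoration of the root.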
Since every factor of height h is the prefix of a factor of height h + 1, we trivially have #F h+1 ≥ #F h .This is quite similar to factors occurring in an infinite word: any factor has at least one extension.In particular, ultimately periodic words are characterized by a bounded factor complexity.

Lemma 26 [3, Proposition 1]. Let $L$ be a prefix-closed language over $(A, <)$ and let $x = x_0 x_1 \cdots$ be an infinite word over some finite alphabet $B$. Consider the labeled tree $T(L)$ decorated by $x$. The tree $T(L)$ is rational if and only if $\#F_h = \#F_{h+1}$ for some $h \ge 0$. In particular, in that case, $\#F_h = \#F_{h+n}$ for all $n \ge 0$.
We can characterize $S$-automatic sequences built on a prefix-closed regular language $L$ in terms of the decorated tree $T(L)$. For the sake of presentation, we mainly focus on the case of $k$-automatic sequences. The reader can relate our construction to the $k$-kernel of a sequence. Roughly, each element of the $k$-kernel corresponds to reading one fixed suffix $u$ from each node $w$ of the tree $T(L_k)$. We have $\mathrm{val}_k(wu) = k^{|u|}\, \mathrm{val}_k(w) + \mathrm{val}_k(u)$ and an element from the $k$-kernel is a sequence of the form $(x_{k^{|u|} n + \mathrm{val}_k(u)})_{n \ge 0}$.
Theorem 27. Let $k \ge 2$ be an integer. A sequence $x$ is $k$-automatic if and only if the labeled tree $T(L_k)$ decorated by $x$ is rational.
Proof: Let us prove the forward direction. If $x$ is $k$-automatic, there exists a DFAO $\mathcal{A} = (Q, q_0, A_k, \delta, \tau)$ producing it when fed with base-$k$ representations of integers. Let $w \in L_k$ be a non-empty base-$k$ representation. The suffix $T[w]$ is completely determined by the state $\delta(q_0, w)$. Indeed, it is a full $k$-ary tree and the decorations are given by $\tau(\delta(q_0, wu))$ for $u$ running through $A_k^*$ in radix order. For the empty word, however, the suffix $T[\varepsilon] = T$ is decorated by $\tau(\delta(q_0, u))$ for $u$ running through $\{\varepsilon\} \cup \{1, \ldots, k-1\} A_k^*$. Hence $T(L_k)$ is rational: it has a finite number of suffix trees.
Let us prove the backward direction. Assume that the decorated tree $T := T(L_k)$ is rational. By definition, the set $Q := \{T[w] \mid w \in \mathrm{dom}(T)\}$ is finite. We define a DFAO $\mathcal{F}$ whose set of states is $Q$ and whose transition function is given by
$$\delta(T[w], d) = T[wd] \quad \text{for all } w \in L_k \text{ and } d \in A_k \text{ such that } wd \in L_k.$$
The initial state is given by the tree $T[\varepsilon] = T$ and we set $\delta(T[\varepsilon], 0) = T[\varepsilon]$. Finally, the output function maps a suffix $T[w]$ to the decoration of its root $w$, that is, $x_{\mathrm{val}_k(w)}$. It follows that $x_n$ is the output of $\mathcal{F}$ when fed with $\mathrm{rep}_k(n)$. Indeed, starting from the initial state $T[\varepsilon]$, we reach the state $T[\mathrm{rep}_k(n)]$ and the output is $x_{\mathrm{val}_k(\mathrm{rep}_k(n))} = x_n$.
We extend the previous result to ANS with a regular numeration language.
Theorem 28. Let $S = (L, A, <)$ be an ANS built on a prefix-closed regular language $L$. A sequence $x$ is $S$-automatic if and only if the labeled tree $T(L)$ decorated by $x$ is rational.
Proof: The proof follows exactly the same lines as for integer base numeration systems. The only refinement is the following one. A suffix $T[w]$ of $T(L)$ is determined by $w^{-1}L$ and $\delta(q_0, w)$. Since $L$ is regular, the set $\{w^{-1}L \mid w \in A^*\}$ is finite.

Rational bases
We now turn to rational base numeration systems. A factor of height $h$ in $T(L_{\frac{3}{2}})$ only depends on the value of its root modulo $2^h$. This result holds for any rational base numeration system.
Lemma 29 [19, Lemme 4.14]. Let $w, w' \in L_{\frac{p}{q}}$ be non-empty words and let $u \in A_p^*$ be a word of length $h$.
, then $\mathrm{val}_{\frac{p}{q}}(w) \equiv \mathrm{val}_{\frac{p}{q}}(w') \bmod q^h$.
In the previous lemma, the empty word behaves differently. For a non-empty word $w \in L_{\frac{p}{q}}$ with $\mathrm{val}_{\frac{p}{q}}(w) \equiv 0 \bmod q^h$, a word $u \in A_p^h$ not starting with 0 verifies $u \in \varepsilon^{-1} L_{\frac{p}{q}}$ if and only if $u \in w^{-1} L_{\frac{p}{q}}$. Therefore the prefix of the tree $T(L_{\frac{p}{q}})$ has to be treated separately.
Lemma 30 [19, Corollaire 4.17]. Every word $u \in A_p^*$ is a suffix of a word in $L_{\frac{p}{q}}$.
As a consequence of these lemmas, we obtain the following corollary.
Corollary 31. For all $h \ge 0$, the set $\{w^{-1} L_{\frac{p}{q}} \cap A_p^h \mid w \in A_p^+\}$ is a partition of $A_p^h$ into $q^h$ non-empty languages.
Otherwise stated, in the tree $T(L_{\frac{p}{q}})$ with no decoration or, equivalently, with a constant decoration for all nodes, there are $q^h + 1$ factors of height $h \ge 1$ (we add 1 to count the height-$h$ prefix, which has a different shape). For instance, if the decorations in Figure 9 are not taken into account, there are $5 = 2^2 + 1$ height-2 factors occurring in $T(L_{\frac{3}{2}})$. Except for the height-$h$ prefix, each factor of height $h$ is extended in exactly $q$ ways to a factor of height $h + 1$. To the first (leftmost) leaf of a factor of height $h$ are attached children corresponding to one of the $q$ words of the periodic labeled signature. To the next leaves on the same level are periodically attached as many nodes as the length of the different words of the signature. For instance, in the case $\frac{p}{q} = \frac{3}{2}$, the first (leftmost) leaf of a factor of height $h$ becomes a node of degree either 1 (label 1) or 2 (labels 0 and 2) to get a factor of height $h + 1$. The next leaves on the same level periodically become nodes of degree 2 or 1 accordingly. An example is depicted in Figure 11.
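The count $q^h + 1$ of undecorated factors can be checked experimentally on a finite part of the tree. In the Python sketch below, a factor without decoration is identified with its domain, computed level by level from the edge relation $m = \frac{p}{q} n + \frac{a}{q}$; the function names are illustrative and only the first 200 nodes are inspected.

```python
def domain(n, h, p, q):
    """Labels of the paths of length at most h leaving the node n of
    T(L_{p/q}), as a frozenset of digit tuples (the root is n = 0)."""
    words, level = {()}, {(): n}
    for _ in range(h):
        nxt = {}
        for w, m in level.items():
            for a in range(p):
                # edge m --a--> (p*m + a)/q, without the root's 0-loop
                if (p * m + a) % q == 0 and (m, a) != (0, 0):
                    nxt[w + (a,)] = (p * m + a) // q
        words |= set(nxt)
        level = nxt
    return frozenset(words)

def count_shapes(h, p, q, n_nodes=200):
    """Number of distinct height-h domains among the nodes 0..n_nodes-1."""
    return len({domain(n, h, p, q) for n in range(n_nodes)})
```

For instance, `count_shapes(2, 3, 2)` counts the height-2 shapes in $T(L_{\frac{3}{2}})$, the prefix included.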
Lemma 32. Let $x$ be a $\frac{p}{q}$-automatic sequence produced by the DFAO $\mathcal{A} = (Q, q_0, A_p, \delta, \tau)$ and let $T(L_{\frac{p}{q}})$ be decorated by $x$. For all $h \ge 1$, the number $\#F_h$ of height-$h$ factors of $T(L_{\frac{p}{q}})$ is bounded by $1 + q^h \cdot \#Q$.
Proof: Let $w \in L_{\frac{p}{q}}$ be a non-empty base-$\frac{p}{q}$ representation and let $h \ge 1$. We claim that the factor $T[w, h]$ is completely determined by the value $\operatorname{val}_{\frac{p}{q}}(w) \bmod q^h$ and the state $\delta(q_0, w)$. First, from Lemma 29, the labeled tree $T[w, h]$ of height $h$ with root $w$ and, in particular, its domain, only depends on $\operatorname{val}_{\frac{p}{q}}(w)$ modulo $q^h$. Indeed, if $w, w' \in L_{\frac{p}{q}}$ are such that $\operatorname{val}_{\frac{p}{q}}(w) \equiv \operatorname{val}_{\frac{p}{q}}(w') \bmod q^h$, then $w^{-1}L_{\frac{p}{q}} \cap A_p^{\le h} = w'^{-1}L_{\frac{p}{q}} \cap A_p^{\le h}$. Second, the decorations of the factor $T[w, h]$ are given by $\tau(\delta(q_0, wu))$ for $u$ running through $\operatorname{dom}(T[w, h]) = w^{-1}L_{\frac{p}{q}} \cap A_p^{\le h}$ enumerated in radix order. So the decorations only depend on the state $\delta(q_0, w)$ of $\mathcal{A}$. Hence the number of such factors is bounded by $q^h \cdot \#Q$; the additional 1 in the statement accounts for the height-$h$ prefix $T[\varepsilon, h]$.
Definition 33. A tree of height $h \ge 0$ has nodes on $h + 1$ levels: the level of a node is its distance to the root. Hence, the root is the only node on level 0 and the leaves are on level $h$.
For instance, in Figure 11, each tree of height 3 has four levels.
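The bound of Lemma 32 can be illustrated numerically. The sketch below is ours: the two-state DFAO (parity of the number of digits 1, with identity output) is a hypothetical example and not one of the automata of the paper, and the divisibility rule used in `children` is the assumption that digit $a$ may follow value $n$ iff $q$ divides $pn + a$. A decorated factor is encoded as nested tuples so that two factors are equal exactly when their shapes, labels and decorations agree.

```python
def children(n, p, q):
    """Labeled children (digit, value) of the node with value n (assumed rule)."""
    return [(a, (p * n + a) // q) for a in range(p)
            if (p * n + a) % q == 0 and (n, a) != (0, 0)]


def factor(n, s, h, p, q, delta, tau):
    """Decorated factor of height h rooted at value n, the DFAO being in state s."""
    kids = () if h == 0 else tuple(
        (a, factor(c, delta[s][a], h - 1, p, q, delta, tau))
        for a, c in children(n, p, q))
    return (tau[s], kids)


p, q, h = 3, 2, 2
delta = {0: [0, 1, 0], 1: [1, 0, 1]}    # hypothetical DFAO: flips on digit 1
tau = {0: 0, 1: 1}                      # identity output

# State reached after reading the representation of n, computed breadth-first:
# every n >= 1 has a unique parent with a smaller value.
bound = 400
state = {0: 0}
for n in range(bound):
    for a, c in children(n, p, q):
        state[c] = delta[state[n]][a]

# n = 0 contributes the height-h prefix of the tree.
factors = {factor(n, state[n], h, p, q, delta, tau) for n in range(bound)}
print(len(factors), "<=", 1 + q**h * len(delta))
```

Only finitely many roots are sampled, so the script can only confirm that the observed number of factors stays below $1 + q^h \cdot \#Q$.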
Definition 34. Let $T$ be a labeled decorated tree and let $h \ge 0$. We let $F^\infty_h \subseteq F_h$ denote the set of factors of height $h$ occurring infinitely often in $T$. For any suitable letter $a$ in the signature of $T$, we let $F^\infty_{h,a} \subseteq F^\infty_h$ denote the set of factors of height $h$ occurring infinitely often in $T$ such that the label of the edge between the first node on level $h-1$ and its first child is $a$. Otherwise stated, the first word of length $h$ in the domain of the factor ends with $a$.
Example 35. In Figure 11, assuming that they occur infinitely often, the first four trees belong to $F^\infty_{3,1}$ and the last four on the second row belong to $F^\infty_{3,0}$.
Even though the language $L_{\frac{p}{q}}$ is highly non-regular, we can still handle a subset of $\frac{p}{q}$-automatic sequences. Roughly, with the next two theorems, we characterize $\frac{p}{q}$-automatic sequences in terms of the extensions that factors of a fixed height occurring infinitely often may have. As mentioned below, the first result applies in particular when distinct states of the DFAO producing the sequence have distinct outputs.
In the remainder of the section, we let $(w_0, \ldots, w_{q-1})$ denote the signature of $T(L_{\frac{p}{q}})$. For all $0 \le j \le q-1$ and all $0 \le i \le |w_j| - 1$, we also let $w_{j,i}$ denote the $i$th letter of $w_j$. For the next statement, recall from Lemma 29 that for two $\frac{p}{q}$-representations $u, v$, the congruence $\operatorname{val}_{\frac{p}{q}}(u) \equiv \operatorname{val}_{\frac{p}{q}}(v) \bmod q^h$ implies that the factors $T[u, h]$ and $T[v, h]$ in $T(L_{\frac{p}{q}})$ have the same domain, i.e., $u^{-1}L_{\frac{p}{q}} \cap A_p^{\le h} = v^{-1}L_{\frac{p}{q}} \cap A_p^{\le h}$.
Theorem 36. Let $x$ be a $\frac{p}{q}$-automatic sequence over a finite alphabet $B$ generated by a DFAO $\mathcal{A} = (Q, q_0, A_p, \delta, \tau : Q \to B)$ with the following property: there exists an integer $h$ such that, for all words $u, v \in L_{\frac{p}{q}}$ such that $\operatorname{val}_{\frac{p}{q}}(u) \equiv \operatorname{val}_{\frac{p}{q}}(v) \bmod q^h$ and $\delta(q_0, u) \neq \delta(q_0, v)$, there exists a word $w \in u^{-1}L_{\frac{p}{q}} \cap A_p^{\le h}$ such that $\tau(\delta(q_0, uw)) \neq \tau(\delta(q_0, vw))$. Then in the tree $T(L_{\frac{p}{q}})$ decorated by $x$, each factor in $F^\infty_h$ can be extended to at most one factor in $F^\infty_{h+1, w_{j,0}}$ for all $0 \le j \le q-1$.
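Proof: Consider a factor of height $h$ occurring infinitely often, i.e., there is a sequence $(u_i)_{i \ge 1}$ of words in $L_{\frac{p}{q}}$ such that $T[u_1, h] = T[u_2, h] = \cdots$. From Lemma 29, all values $\operatorname{val}_{\frac{p}{q}}(u_i)$ are congruent to $r$ modulo $q^h$ for some $0 \le r < q^h$. Thus the values of $\operatorname{val}_{\frac{p}{q}}(u_i)$ modulo $q^{h+1}$ that appear infinitely often take at most $q$ values (among $r, r + q^h, \ldots, r + (q-1)q^h$).
Moreover, the assumption on $\mathcal{A}$ forces $\delta(q_0, u_i) = \delta(q_0, u_j)$ for all $i, j$: otherwise some word $w$ of length at most $h$ in the common domain would give distinct decorations in $T[u_i, h]$ and $T[u_j, h]$. As in the proof of Lemma 32, a factor of height $h+1$ is determined by the value of its root modulo $q^{h+1}$ and the state reached at its root.
Consequently, no two distinct factors of height $h+1$ occurring infinitely often and having the same domain can have the same prefix of height $h$. Therefore, each factor $U$ of height $h$ occurring infinitely often gives rise to at most one factor $U'$ of height $h+1$ in every $F^\infty_{h+1, w_{j,0}}$ for $0 \le j \le q-1$ ($U$ and the first letter $w_{j,0}$ uniquely determine the domain of $U'$).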
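The role of the $q$ residues $r, r + q^h, \ldots, r + (q-1)q^h$ can be made concrete. The sketch below is ours and relies on the assumption that digit $a$ may follow value $n$ iff $q$ divides $pn + a$. For each residue $r$ modulo $q^h$, it computes the label of the edge from the leftmost node on level $h$ to its first child for each of the $q$ lifts of $r$ modulo $q^{h+1}$, and checks that these labels are pairwise distinct, so that the first letter $w_{j,0}$ identifies the lift.

```python
def children(n, p, q):
    """Labeled children (digit, value) of the node with value n (assumed rule)."""
    return [(a, (p * n + a) // q) for a in range(p)
            if (p * n + a) % q == 0 and (n, a) != (0, 0)]


def first_label_below_level(n, h, p, q):
    """Label of the edge from the leftmost node on level h to its first child."""
    m = n
    for _ in range(h):
        m = children(m, p, q)[0][1]     # follow the leftmost child
    return children(m, p, q)[0][0]


def labels_of_lifts(r, h, p, q):
    """First labels obtained for the q lifts of r modulo q^(h+1)."""
    # q^(h+1) is added so that every root has a non-empty representation.
    return [first_label_below_level(r + k * q**h + q**(h + 1), h, p, q)
            for k in range(q)]


p, q = 3, 2
for h in range(1, 4):
    for r in range(q**h):
        print(h, r, labels_of_lifts(r, h, p, q))
```

In base $\frac{3}{2}$, where the signature is $(02, 1)$, every residue yields the two labels 0 and 1 in some order.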
Remark 37. In the case of a $k$-automatic sequence, the assumption of the above theorem is always satisfied. We may apply the usual minimization algorithm based on distinguishable states to the DFAO producing the sequence: two states $r, r'$ are distinguishable if there exists a word $u$ such that $\tau(\delta(r, u)) \neq \tau(\delta(r', u))$. The pairs $\{r, r'\}$ such that $\tau(r) \neq \tau(r')$ are distinguishable (by the empty word). Then proceed recursively: if a not yet distinguished pair $\{r, r'\}$ is such that $\delta(r, a) = s$ and $\delta(r', a) = s'$ for some letter $a$ and an already distinguished pair $\{s, s'\}$, then $\{r, r'\}$ is distinguished. The process stops when no new pair is distinguished, and we can then merge states that belong to undistinguished pairs. In the resulting DFAO, any two states are distinguished by a word whose length is bounded by the number of states of the DFAO. We can thus apply the above theorem. Notice that for a $k$-automatic sequence, there is no restriction on the word distinguishing states since it belongs to $A_k^*$. The extra requirement that $w \in u^{-1}L_{\frac{p}{q}} \cap A_p^{\le h} = v^{-1}L_{\frac{p}{q}} \cap A_p^{\le h}$ is therefore important in the case of rational bases and is not present for base-$k$ numeration systems.
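The marking procedure recalled in Remark 37 can be sketched as follows. This is a minimal illustration of the classical pair-marking algorithm for a complete DFAO over $A_k$; the three-state automaton used as input is a hypothetical example of ours (a Thue-Morse-like DFAO with one redundant state), and the function name is not taken from the paper.

```python
from itertools import combinations


def distinguishable_pairs(states, alphabet, delta, tau):
    """Return the set of distinguishable pairs {r, r'} of a complete DFAO."""
    # Pairs with distinct outputs are distinguished by the empty word.
    marked = {frozenset(pair) for pair in combinations(states, 2)
              if tau[pair[0]] != tau[pair[1]]}
    changed = True
    while changed:                       # stop when no new pair is marked
        changed = False
        for r, s in combinations(states, 2):
            pair = frozenset((r, s))
            if pair in marked:
                continue
            for a in alphabet:
                succ = frozenset((delta[r][a], delta[s][a]))
                if len(succ) == 2 and succ in marked:
                    marked.add(pair)     # a letter leads to a marked pair
                    changed = True
                    break
    return marked


# Hypothetical 2-automatic example: states 0 and 2 behave identically.
states = [0, 1, 2]
alphabet = [0, 1]
delta = {0: {0: 0, 1: 1}, 1: {0: 1, 1: 2}, 2: {0: 2, 1: 1}}
tau = {0: 0, 1: 1, 2: 0}

marked = distinguishable_pairs(states, alphabet, delta, tau)
mergeable = [set(pair) for pair in map(frozenset, combinations(states, 2))
             if pair not in marked]
print(mergeable)
```

Here the only unmarked pair is $\{0, 2\}$, so these two states can be merged. In a rational base the same procedure does not suffice as such, since the distinguishing word must also belong to the common domain, as the remark points out.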
Remark 38. For a rational base numeration system, the assumption of the above theorem is always satisfied if the output function $\tau$ is the identity; otherwise stated, if the output function maps distinct states to distinct values. This is for instance the case of our toy example $t$.
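For a given DFAO, the hypothesis of Theorem 36 can at least be tested on a finite range of integers. The sketch below is ours and makes several assumptions: digit $a$ may follow value $n$ iff $q$ divides $pn + a$, only non-empty representations are compared, and the two-state DFAO is a hypothetical example. The function `holds_up_to` returns `False` as soon as two sampled words with congruent values and distinct states produce the same outputs on their whole common domain; a result `True` is only evidence on the sampled range, not a proof.

```python
def children(n, p, q):
    """Labeled children (digit, value) of the node with value n (assumed rule)."""
    return [(a, (p * n + a) // q) for a in range(p)
            if (p * n + a) % q == 0 and (n, a) != (0, 0)]


def domain(n, h, p, q):
    """Words of length <= h that may follow the representation of n."""
    words, level = [""], [("", n)]
    for _ in range(h):
        level = [(u + str(a), c) for u, m in level
                 for a, c in children(m, p, q)]
        words += [u for u, _ in level]
    return words


def run(s, word, delta):
    """State reached from s after reading word."""
    for ch in word:
        s = delta[s][int(ch)]
    return s


def holds_up_to(bound, h, p, q, delta, tau, q0=0):
    """Test the distinguishing property on the representations of 1..bound-1."""
    state = {0: q0}
    for n in range(bound):               # breadth-first computation of states
        for a, c in children(n, p, q):
            state[c] = delta[state[n]][a]
    seen = {}                            # residue -> {output vector: state}
    for n in range(1, bound):
        s = state[n]
        vec = tuple(tau[run(s, w, delta)] for w in domain(n, h, p, q))
        if seen.setdefault(n % q**h, {}).setdefault(vec, s) != s:
            return False                 # two states, same outputs everywhere
    return True


delta = {0: [0, 1, 0], 1: [1, 0, 1]}     # hypothetical DFAO: flips on digit 1
print(holds_up_to(200, 0, 3, 2, delta, {0: 0, 1: 1}))   # injective output
print(holds_up_to(200, 0, 3, 2, delta, {0: 0, 1: 0}))   # constant output
```

With an injective output the empty word already distinguishes any two states, as in Remark 38, whereas a constant output can never distinguish two reachable states.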
When the output function is not injective, the situation could be more intricate as shown in the next two examples.
Example 39. Let us now consider a DFAO with a non-injective output function. Take the cyclic DFAO in Figure 12. We show that its output function meets the condition in Theorem 36. Observe that the states $q_0$ and $q_1$ have the same output. Set $h = 2$. Let $u, v \in L_{\frac{3}{2}}$ be two words such that $\operatorname{val}_{\frac{3}{2}}(u) \equiv \operatorname{val}_{\frac{3}{2}}(v) \bmod 2^2$. So they can be extended by exactly the same words of length at most 2 to get base-$\frac{3}{2}$ representations. Assume that $q_0.u = q_i$ and $q_0.v = q_j$ with $i, j \in \{0, 1, 2\}$ and $i \neq j$. If $\{i, j\} = \{0, 1\}$ or $\{1, 2\}$, there exists a word $w$ of length 1 such that $uw, vw \in L_{\frac{3}{2}}$ and the outputs given by $q_0.uw$ and $q_0.vw$ are distinct. If $\{i, j\} = \{0, 2\}$, there exists a word $w$ of length 2 such that $uw, vw \in L_{\frac{3}{2}}$ and the outputs given by $q_0.uw$ and $q_0.vw$ are distinct.
Example 40. The condition in Theorem 36 might be harder to test than in the previous example. For instance, take the DFAO depicted in Figure 13 reading base-$\frac{3}{2}$ representations. The condition in Theorem 36 is not met for $h = 4$. For instance, the words $u = 212001220110220$ and $v = 212022000012021$ are such that $q_0.u = q_1$, $q_0.v = q_0$ and $u^{-1}L_{\frac{3}{2}} \cap A_3^{\le 4} = v^{-1}L_{\frac{3}{2}} \cap A_3^{\le 4}$, but there is no word $w$ in this common domain such that the outputs of $q_0.uw$ and $q_0.vw$ are distinct. In particular, $T[u, 4] = T[v, 4]$. Furthermore, the reader can observe that the conclusion of Theorem 36 does not hold: the latter tree has two extensions, as $T[u, 5] \neq T[v, 5]$, because the states $q_0.u1^40 = q_1.0 = q_3$ and $q_0.v1^40 = q_0.0 = q_2$ have distinct outputs (observe $u$ […]).
Fig. 13: A DFAO with two distinct outputs but four states.
We can generalize the above example with the suffix $1^4$. Let $h \ge 1$ and consider the word $1^h$. From Lemma 30, it occurs as a suffix of words in $L_{\frac{3}{2}}$. One may thus find words similar to $u$ and $v$ in the above computations. Actually, $\operatorname{val}_{\frac{3}{2}}(u) = 591$ and $\operatorname{val}_{\frac{3}{2}}(v) = 623$ are both congruent to $15 = 2^4 - 1$ modulo $2^4$ (so, they can be followed by the suffix $1^4$), and $\operatorname{val}_{\frac{3}{2}}(u1^4)$ and $\operatorname{val}_{\frac{3}{2}}(v1^4)$ are both even (so, they can be followed by either 0 or 2). To have a situation similar to the one with $u$ and $v$ above, we have to look for numbers $n$ which are congruent to $2^h - 1$ modulo $2^h$ and such that the value obtained by appending the suffix $1^h$ to the representation of $n$ is an even integer. Numbers of the form $n = (2j+1)2^h - 1$ are suitable. To conclude with this example, a way to show that the DFAO in Figure 13 does not fulfill the condition is to prove that, for all $h$, there are two integers $(2j+1)2^h - 1$ and $(2j'+1)2^h - 1$ whose representations lead respectively to $q_0$ and $q_1$.
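The arithmetic of Example 40 can be replayed with a few lines of Python. The sketch below is ours; it assumes the recurrence $\operatorname{val}_{\frac{p}{q}}(wa) = (p \cdot \operatorname{val}_{\frac{p}{q}}(w) + a)/q$, which must produce an integer at every step for a valid representation, and the helper names `val` and `append_ones` are hypothetical.

```python
def val(word, p=3, q=2):
    """Value of a base-p/q representation, digit by digit (assumed recurrence)."""
    n = 0
    for ch in word:
        num = p * n + int(ch)
        if num % q:
            raise ValueError(f"{word!r} is not a valid representation")
        n = num // q
    return n


def append_ones(n, h, p=3, q=2):
    """Value reached from n after appending 1^h, or None if 1^h cannot follow."""
    for _ in range(h):
        if (p * n + 1) % q:
            return None
        n = (p * n + 1) // q
    return n


u = "212001220110220"
v = "212022000012021"
print(val(u), val(v))                    # the values quoted in the example
print(val(u) % 2**4, val(v) % 2**4)      # both residues modulo 2^4
print(val(u + "1111"), val(v + "1111"))  # both even, so 0 or 2 may follow

# Numbers n = (2j+1) * 2^h - 1 can be followed by 1^h and then reach
# the even value 3^h * (2j+1) - 1.
for h in range(1, 8):
    for j in range(20):
        n = (2 * j + 1) * 2**h - 1
        assert append_ones(n, h) == 3**h * (2 * j + 1) - 1
```

The last loop rests on the identity $\frac{3m+1}{2} + 1 = \frac{3}{2}(m+1)$: appending $1^h$ to $n$ gives $\left(\frac{3}{2}\right)^h (n+1) - 1$, which equals $3^h(2j+1) - 1$ for $n = (2j+1)2^h - 1$ and is therefore even.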
Theorem 41. Let $x$ be a sequence over a finite alphabet $B$, and let the tree $T(L_{\frac{p}{q}})$ be decorated by $x$. If there exists some $h \ge 0$ such that each factor in $F^\infty_h$ can be extended to at most one factor in $F^\infty_{h+1, w_{j,0}}$ for all $0 \le j \le q-1$, then $x$ is $\frac{p}{q}$-automatic.
Proof: For the sake of readability, write $T = T(L_{\frac{p}{q}})$. The height-$h$ factors of $T$ occurring only a finite number of times appear in a prefix of the tree. Let $t \ge 0$ be the least integer such that all nodes on any level $\ell \ge t$ are roots of a factor in $F^\infty_h$. We first define an NFA $\mathcal{T}$ in the following way; an illustration is given below in Example 42. The NFA is made (nodes and edges) of the prefix $T[\varepsilon, t+h-1]$ of height $t+h-1$ and a copy of every element in $F^\infty_h$. So the set of states is the union of the nodes of the prefix $T[\varepsilon, t+h-1]$ and the nodes in the trees of $F^\infty_h$. Final states are all the nodes of the prefix $T[\varepsilon, t+h-1]$ and the nodes on level exactly $h$ in every element of $F^\infty_h$, i.e., the leaves of every element of $F^\infty_h$. The unique initial state is the root of the prefix $T[\varepsilon, t+h-1]$. We define the following extra transitions between these elements.
• If a node $m$ on level $t-1$ in the prefix $T[\varepsilon, t+h-1]$ has a child $n$ reached through an arc with label $d$, then in the NFA we add an extra transition with the same label $d$ from $m$ to the root of the element of $F^\infty_h$ equal to $T[n, h]$. This is well defined because $n$ is on level $t$.
• Let $r$ be the root of an element $T[r, h]$ of $F^\infty_h$. Suppose that $r$ has a child $s$ reached through an arc with label $d$. By assumption, the element $T[r, h]$ in $F^\infty_h$ can be extended in at most one way to an element $U_c$ in $F^\infty_{h+1,c}$ for each $c \in \{w_{0,0}, \ldots, w_{q-1,0}\}$. The tree $U_c$ with root $r$ has a subtree of height $h$ with root $rd = s$ denoted by $V_{c,d} \in F^\infty_h$ (as depicted in Figure 14; if the extension with $c$ exists, $V_{c,d}$ is unique). In the NFA, we add extra transitions with label $d$ from $r$ to the root of $V_{c,d}$ (there are at most $q$ such trees).
We will make use of the following unambiguity property of $\mathcal{T}$. Every word $u \in L_{\frac{p}{q}}$ is accepted by $\mathcal{T}$ and there is exactly one successful run for $u$ in $\mathcal{T}$. If the length of $u \in L_{\frac{p}{q}}$ is less than $t+h$, there is one successful run and it remains in the prefix $T[\varepsilon, t+h-1]$. If a run uses a transition between a node on level $t-1$ in the prefix $T[\varepsilon, t+h-1]$ and the root of an element in $F^\infty_h$, then the word has to be of length at least $t+h$ to reach a final state by construction. Now consider a word $u \in L_{\frac{p}{q}}$ of length $t+h+j$ with $j \ge 0$ and write $u = u_0 \cdots u_{t+h+j-1}$. Reading the prefix $u_0 \cdots u_{t-1}$ leads to the root of an element $U$ in $F^\infty_h$. Assume that this element can be extended in (at least) two ways to a tree of height $h+1$. This means that in $\mathcal{T}$, we have two transitions from the root of $U$ with label $u_t$: one going to the root of some $V_1 \in F^\infty_{h,c_1}$ and one going to the root of some $V_2 \in F^\infty_{h,c_2}$. Only one of these two transitions can belong to a successful run for $u$. This is a consequence of Corollary 31: the difference between $\operatorname{dom}(V_1)$ and $\operatorname{dom}(V_2)$ appears precisely on level $h$, where the labeling is periodically $(w_e, w_{e+1}, \ldots, w_{q-1}, w_0, \ldots, w_{e-1})$ and $(w_f, w_{f+1}, \ldots, w_{q-1}, w_0, \ldots, w_{f-1})$ respectively, where $w_e$ (respectively $w_f$) starts with $c_1$ (respectively $c_2$) and the two $q$-tuples of words are cyclic shifts of the signature $(w_0, \ldots, w_{q-1})$ of $T$. So if we non-deterministically make the wrong choice of transition while reading $u_t$, we will not be able to process the letter $u_{t+h}$. The choice of a transition determines the words of length $h$ that can be read from that point on. The same reasoning applies to the decision taken at step $t+j$ and the letter processed at step $t+h+j$.
We still have to turn $\mathcal{T}$ into a DFAO producing $x \in B^{\mathbb{N}}$. To do so, we determinize $\mathcal{T}$ with the classical subset construction. Thanks to the unambiguity property of $\mathcal{T}$, if a subset of states obtained during the construction contains final states of $\mathcal{T}$, then they are all decorated by the same letter $b \in B$. The output of this state is thus set to $b$. If a subset of states obtained during the construction contains no final state, then its output is irrelevant (it can be set to any value).
Example 42. Consider the rational base $\frac{3}{2}$. Our aim is to illustrate the above theorem: we have information about factors of a decorated tree $T(L_{\frac{3}{2}})$ (those occurring infinitely often and those occurring only a finite number of times) and we want to build the corresponding $\frac{3}{2}$-automatic sequence. Assume that $t = h = 1$ and that factors of height 1 can be extended as in Figure 9. We assume that the last eight trees of height 2 occur infinitely often. Hence their four prefixes of height 1 have exactly two extensions. We assume that the prefix given by the first tree in Figure 9 occurs only once.
From this, we build the NFA $\mathcal{T}$ depicted in Figure 15. The prefix tree of height $t+h-1 = 1$ is depicted on the left and its root is the initial state. The single word 2 of length 1 is accepted by a run staying in this tree. Next to it are represented the four trees of $F^\infty_1$. Their respective leaves are final states. Finally, we have to inspect Figure 9 to determine the transitions connecting roots of these trees. For instance, let us focus on state 7 in Figure 15. In Figure 9, the corresponding tree can be extended in two ways: the second and the fourth trees on the first row. In the first of these trees, the tree hanging from the child 0 (respectively 2) of the root corresponds to state 5 (respectively 7). Hence, there is a transition of label 0 (respectively 2) from 7 to 5 (respectively 7) in Figure 15. Similarly, the second tree gives the extra transitions of label 0 from 7 to 7 and of label 2 from 7 to 5. Note that from state 5 there is no transition with label 0. The successful runs of the first few words in $L_{\frac{3}{2}}$ are given below:
$\varepsilon$: $q_0$
$2$: $q_0 \to q_1$
$21$: $q_0 \to 0 \to 1$
$210$: $q_0 \to 0 \to 7 \to 8$
$212$: $q_0 \to 0 \to 7 \to 9$
$2101$: $q_0 \to 0 \to 7 \to 5 \to 6$
$2120$: $q_0 \to 0 \to 7 \to 7 \to 8$
$2122$: $q_0 \to 0 \to 7 \to 7 \to 9$
$21011$: $q_0 \to 0 \to 7 \to 5 \to 0 \to 1$
$21200$: $q_0 \to 0 \to 7 \to 7 \to 7 \to 8$
$21202$: $q_0 \to 0 \to 7 \to 7 \to 7 \to 9$
$21221$: $q_0 \to 0 \to 7 \to 7 \to 5 \to 6$
We may now determinize this NFA $\mathcal{T}$. We apply the classical subset construction to get a DFAO. If a subset of states contains a final state of $\mathcal{T}$ from $\{1, 8, 9\}$ (respectively $\{q_0, q_1, 3, 4, 6\}$), the corresponding decoration being 1 (respectively 0), the output for this state is 1 (respectively 0). Indeed, as explained in the proof, a subset of states of $\mathcal{T}$ obtained during the determinization algorithm cannot contain states with two distinct decorations. After determinization, we obtain the (minimal) DFAO depicted in Figure 16. In the latter figure, we have not set any output for state 2 because it corresponds to a subset of states in $\mathcal{T}$ which does not contain any final state. Otherwise stated, that particular output is irrelevant as no valid representation will end up in that state.
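The unambiguity property and the way outputs are read off the subsets can be replayed on the words listed above. The sketch below is ours and deliberately partial: it only contains the transitions that can be recovered from the runs and from the discussion of state 7 in this example, whereas the NFA of Figure 15 has further transitions (for instance those involving the tree rooted at state 2). It should therefore only be trusted on the twelve listed words. State names are those of the example, written as strings.

```python
from collections import Counter

# Partial transition relation, recovered from the listed runs only.
delta = {
    ("q0", "2"): ["q1", "0"],
    ("0", "1"): ["1", "7"],
    ("7", "0"): ["8", "5", "7"],
    ("7", "2"): ["9", "7", "5"],
    ("5", "1"): ["6", "0"],
}
# Decorations of the final states, as stated in the example.
decoration = {**dict.fromkeys(["1", "8", "9"], 1),
              **dict.fromkeys(["q0", "q1", "3", "4", "6"], 0)}


def read(word):
    """Return (number of successful runs, decorations of the final states hit)."""
    runs = Counter({"q0": 1})            # subset of states, with run counts
    for a in word:
        nxt = Counter()
        for s, k in runs.items():
            for t in delta.get((s, a), []):
                nxt[t] += k
        runs = nxt
    hits = {s: k for s, k in runs.items() if s in decoration}
    return sum(hits.values()), {decoration[s] for s in hits}


words = ["", "2", "21", "210", "212", "2101", "2120", "2122",
         "21011", "21200", "21202", "21221"]
for w in words:
    count, outputs = read(w)
    print(repr(w), count, outputs)
```

The keys of `runs` after reading a word form the subset reached by the subset construction. On each listed word there is exactly one successful run and a single decoration, which is the output assigned to the corresponding state of the DFAO.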

Recognizable sets and stability properties
In this short section, our aim is to present some direct closure properties of automatic sequences in ANS built on tree languages. These statements should not surprise the reader used to constructions of automata and automatic sequences.

Fig. 1: The first few levels of the tree associated with $a^*b^*$.

Fig. 2: The first levels of the i-tree associated with $L_{\frac{3}{2}}$.

Fig. 9: The 9 factors of height 2 in $T(L_{\frac{3}{2}})$ decorated by $t$. The first one is the prefix occurring only once.

The leftmost leaf of each tree is reached by reading a word ending with 1, so all trees belong to $F^\infty_{3,1}$ (assuming they appear infinitely often).
The leftmost leaf of each tree is reached by reading a word ending with 0, so all trees belong to $F^\infty_{3,0}$ (assuming they appear infinitely often).

Fig. 14: Extension of a tree in $F^\infty_h$.