Automaticity of primitive words and irreducible polynomials

If $L$ is a language, the automaticity function $A_L(n)$ (resp. $N_L(n)$) of $L$ counts the number of states of a smallest deterministic (resp. non-deterministic) finite automaton that accepts a language that agrees with $L$ on all inputs of length at most $n$. We provide bounds for the automaticity of the language of primitive words and the language of unbordered words over a $k$-letter alphabet. We also give a bound for the automaticity of the language of base-$b$ representations of the irreducible polynomials over a finite field. This latter result is analogous to a result of Shallit concerning the base-$k$ representations of the set of prime numbers.


Introduction
Automaticity is a measure of how close a non-regular language is to being regular. We can approximate a non-regular language $L$ by considering a regular language $L'$ such that the words of length at most $n$ in $L'$ are exactly the words of length at most $n$ in $L$. The automaticity of $L$ is the number of states of a smallest deterministic finite automaton accepting some such approximation $L'$. Non-deterministic automaticity can be defined similarly. Automaticity was first introduced by Trakhtenbrot [18] and later by Karp [7]. Shallit and Breitbart [17] wrote a survey of the basic results concerning automaticity known at the time.
In the first part of this article we give bounds for the non-deterministic automaticity of the language of primitive words and the language of unbordered words. A word is primitive if it is not a power of a smaller word. A word is unbordered if it has no non-trivial period. The language of primitive words has been well-studied (see the survey by Lischke [9], for example). It is not difficult to show that the language of primitive words is not regular, but it is a long-standing open problem to show that this language is not context-free. It is also not difficult to show that the language of unbordered words is not regular. For a proof that this language is not context-free see [12].
In the second part we give a bound on the automaticity of the set of irreducible polynomials over a finite field. The set of base-$k$ representations of the prime numbers is not a regular language for any base $k$. Shallit [16] gave a lower bound on the automaticity of the set of prime numbers in any base. We consider the same problem in the setting of polynomials over a finite field. Given a fixed non-constant polynomial $b$, one can also define the base-$b$ representation for such polynomials (see for example [14]). Rigo and Waxweiler [15] proved that the set of base-$b$ representations of the irreducible polynomials is again a non-regular language for any base $b$. We obtain our bound for the automaticity using arguments similar to those of [16].
There is an interesting connection between primitive words and irreducible polynomials over a finite field. The number of primitive words of length $n$ over an alphabet of size $q$ is
$$\sum_{d \mid n} \mu(d)\, q^{n/d},$$
where $\mu$ is the Möbius function (see [10, Section 1.3]). Similarly, the number of monic irreducible polynomials of degree $n$ over the finite field with $q$ elements is
$$\frac{1}{n} \sum_{d \mid n} \mu(d)\, q^{n/d}.$$
This is equal to the number of equivalence classes of primitive words of length $n$ under the conjugacy relation $x \sim y$ if $x$ is a cyclic shift of $y$. For an explicit bijection between the set of irreducible polynomials and the set of primitive necklaces, see [13, Section 7.6.2].
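The two Möbius-function counts mentioned here are easy to evaluate numerically. The following is a minimal sketch; the function names are ours and are chosen only for illustration.

```python
def mobius(n: int) -> int:
    """Moebius function mu(n), computed by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # n is divisible by p^2
            result = -result
        p += 1
    if n > 1:
        result = -result  # one remaining prime factor
    return result


def count_primitive_words(n: int, q: int) -> int:
    """Number of primitive words of length n over q letters:
    the sum over d | n of mu(d) * q^(n/d)."""
    return sum(mobius(d) * q ** (n // d)
               for d in range(1, n + 1) if n % d == 0)


def count_monic_irreducibles(n: int, q: int) -> int:
    """Number of monic irreducible polynomials of degree n over the
    field with q elements (equivalently, primitive necklaces)."""
    return count_primitive_words(n, q) // n


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, count_primitive_words(n, 2), count_monic_irreducibles(n, 2))
```

Each conjugacy class of a primitive word of length $n$ has exactly $n$ elements, which is why the second count is the first divided by $n$.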

Definitions
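Let $L \subseteq \Sigma^*$. A language $L'$ is an $n$-th order approximation to $L$ if $L'$ and $L$ contain exactly the same words of length at most $n$, that is, if $L' \cap \Sigma^{\le n} = L \cap \Sigma^{\le n}$, where $\Sigma^{\le n}$ denotes the set of words over $\Sigma$ of length at most $n$. We define the automaticity $A_L(n)$ of a language $L$ to be the number of states of a smallest DFA accepting some $n$-th order approximation to $L$. Similarly, the nondeterministic automaticity $N_L(n)$ of a language $L$ is the number of states of a smallest NFA accepting some $n$-th order approximation to $L$.

Let $x, y \in \Sigma^*$. We say that $x$ and $y$ are $n$-similar for $L$ if for all $z \in \Sigma^*$ with $|xz|, |yz| \le n$, we have $xz \in L$ if and only if $yz \in L$. If $x$ and $y$ are not $n$-similar, then they are $n$-dissimilar for $L$.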
Theorem 1 ([8]) Let $L \subseteq \Sigma^*$. For all $n \ge 0$, $A_L(n)$ is the maximum possible cardinality of a set of pairwise $n$-dissimilar words for $L$.
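The definition of $n$-dissimilarity can be tested directly by enumerating all admissible suffixes $z$. The sketch below does this for a language given as a membership predicate. The language $\{0^i 1^i : i \ge 0\}$ used in the demonstration is our own illustrative choice, and all function names are hypothetical.

```python
from itertools import combinations, product


def words_up_to(alphabet, max_len):
    """Yield every word over `alphabet` of length 0..max_len."""
    for length in range(max_len + 1):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)


def n_dissimilar(x, y, n, in_language, alphabet="01"):
    """True if some z with |xz| <= n and |yz| <= n separates x from y."""
    budget = n - max(len(x), len(y))
    if budget < 0:
        return False  # no admissible z exists, so x and y are n-similar
    return any(in_language(x + z) != in_language(y + z)
               for z in words_up_to(alphabet, budget))


def pairwise_dissimilar(words, n, in_language, alphabet="01"):
    """True if the given words are pairwise n-dissimilar."""
    return all(n_dissimilar(x, y, n, in_language, alphabet)
               for x, y in combinations(words, 2))


def in_example_language(w):
    """Membership in {0^i 1^i : i >= 0} (illustrative language only)."""
    half = len(w) // 2
    return len(w) % 2 == 0 and w == "0" * half + "1" * half


if __name__ == "__main__":
    m = 4
    prefixes = ["0" * i for i in range(m + 1)]
    print(pairwise_dissimilar(prefixes, 2 * m, in_example_language))
```

By Theorem 1, any set of pairwise $n$-dissimilar words found this way gives a lower bound on $A_L(n)$; for the illustrative language the $m + 1$ words $\varepsilon, 0, \ldots, 0^m$ give $A_L(2m) \ge m + 1$.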
, since $\{\varepsilon, 0, 00, \ldots, 0^m\}$ is a set of pairwise $2m$-dissimilar words for $L$. To see this, consider $0^j$ and $0^k$ for $0 \le j < k \le n$. Then

Let $U$ be a finite set of words. We say that $U$ is a set of uniformly $n$-dissimilar words for $L$ if for each $x \in U$ there exists $z$ such that

Automaticity of primitive and unbordered words
Let $k \ge 2$ be an integer. A word $y$ is a $k$-power if $y$ can be written as $y = x^k$ for some non-empty word $x$. If $y$ cannot be so written for any $k \ge 2$, then $y$ is primitive.
Bordered words are generalizations of powers. We say a word $x$ is bordered if there exist words $u, v, w \in \Sigma^+$ such that $x = uv = wu$. In this case, the word $u$ is said to be a border for $x$. Otherwise, $x$ is unbordered.
Let $w = w_0 \cdots w_{\ell-1}$ and let $1 \le p < \ell$. The word $w$ has a period $p$ if $w_i = w_{i+p}$ for all $0 \le i \le \ell - p - 1$. Note that a word is unbordered if it has no period.
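The definitions of primitive words, borders, and periods translate directly into short predicates. This is a brute-force sketch intended for small examples only; the function names are ours.

```python
def is_primitive(w: str) -> bool:
    """True if the non-empty word w is not x^k for any k >= 2."""
    n = len(w)
    if n == 0:
        return False
    return not any(n % d == 0 and w == w[:d] * (n // d)
                   for d in range(1, n))


def borders(w: str) -> list:
    """All borders of w: non-empty proper prefixes that are also suffixes."""
    return [w[:m] for m in range(1, len(w)) if w[:m] == w[-m:]]


def periods(w: str) -> list:
    """All periods p with 1 <= p < |w| and w[i] == w[i + p] throughout."""
    n = len(w)
    return [p for p in range(1, n)
            if all(w[i] == w[i + p] for i in range(n - p))]


def is_unbordered(w: str) -> bool:
    """True if the non-empty word w has no border."""
    return len(w) > 0 and not borders(w)


if __name__ == "__main__":
    for w in ["abaab", "abab", "aab"]:
        print(w, is_primitive(w), borders(w), periods(w))
```

A word of length $\ell$ has a border of length $m$ exactly when it has period $\ell - m$, which is the correspondence behind the last sentence above.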
We recall the notation $O(\cdot)$ and $\Omega(\cdot)$. Let $f$ and $g$ be functions from $\mathbb{N}$ to $\mathbb{R}$. The function $f$ is $O(g)$ if there exist $C > 0$ and $n_0$ such that for all $n > n_0$ we have $f(n) \le C g(n)$.

Theorem 4 Let $\varepsilon > 0$ be a real number. The nondeterministic automaticity of the set $Q_k$ of primitive words over the alphabet where $\beta_k$ is a constant that depends only on $k$.
Proof: For $n \ge 0$, we will define a set of uniformly $n$-dissimilar words as follows. Let

To show that the words in $D_n$ are uniformly $n$-dissimilar, let $x, y \in D_n$, $x \ne y$. Observe that $xx$ is not primitive, but $yx$ is, for if $yx$ were not primitive, then $yx = z^\ell$ for some word $z$ and some $\ell \ge 3$. In this case $|yx|/\ell$ is a period of $y$, and hence $y$ is bordered, which contradicts the definition of $D_n$.
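The key step of this argument can be checked by brute force for small lengths. The sketch below assumes, as the proof indicates, that the words being compared are unbordered and of a common length; the function names are ours.

```python
from itertools import product


def has_border(w: str) -> bool:
    """True if some non-empty proper prefix of w is also a suffix."""
    return any(w[:m] == w[-m:] for m in range(1, len(w)))


def is_power(w: str) -> bool:
    """True if w = z^l for some word z and some l >= 2."""
    n = len(w)
    return any(n % d == 0 and w == w[:d] * (n // d) for d in range(1, n))


def check_dissimilarity_step(length: int, alphabet: str = "01") -> bool:
    """For all unbordered x != y of the given length, verify that
    xx is a power (not primitive) while yx is primitive."""
    words = ["".join(t) for t in product(alphabet, repeat=length)]
    unbordered = [w for w in words if not has_border(w)]
    for x in unbordered:
        if not is_power(x + x):
            return False
        for y in unbordered:
            if y != x and is_power(y + x):
                return False
    return True


if __name__ == "__main__":
    print(all(check_dissimilarity_step(m) for m in range(1, 7)))
```

The exponent 2 never arises in `is_power(y + x)` because $|x| = |y|$ and $x \ne y$, which is why the proof only needs to consider $\ell \ge 3$.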
Guibas and Odlyzko [4, Theorem 7.2] gave the following formula for the size of $D_n$ (see also [11]): there exists a constant $\beta_k$ such that

By Theorem 3 we have

Note that Guibas and Odlyzko gave an explicit formula for $\beta_k$, which permits one to calculate $\beta_k$ to any desired degree of accuracy. For example, if $k = 2$, we have $\beta_2 = 0.26771654\cdots$.
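The number of unbordered words can also be tabulated numerically. The sketch below uses a standard recurrence for unbordered (bifix-free) words, $u_1 = k$, $u_{2m+1} = k\,u_{2m}$, $u_{2m} = k\,u_{2m-1} - u_m$, which is not taken from the text above, and cross-checks it against exhaustive enumeration. The ratio $u_n / k^n$ approximates $\beta_k$ only under the assumption that $\beta_k$ denotes the limiting proportion of unbordered words; the function names are ours.

```python
from itertools import product


def unbordered_counts(k: int, max_len: int) -> list:
    """u[n] = number of unbordered words of length n over k letters,
    for 1 <= n <= max_len (index 0 is unused), via the recurrence
    u[1] = k, u[2m+1] = k*u[2m], u[2m] = k*u[2m-1] - u[m]."""
    u = [0] * (max_len + 1)
    for n in range(1, max_len + 1):
        if n == 1:
            u[n] = k
        elif n % 2 == 1:
            u[n] = k * u[n - 1]
        else:
            u[n] = k * u[n - 1] - u[n // 2]
    return u


def brute_force_count(k: int, n: int) -> int:
    """Count unbordered words of length n by exhaustive enumeration."""
    alphabet = "0123456789"[:k]
    count = 0
    for letters in product(alphabet, repeat=n):
        w = "".join(letters)
        if not any(w[:m] == w[-m:] for m in range(1, n)):
            count += 1
    return count


def density(k: int, n: int) -> float:
    """Proportion of words of length n over k letters that are unbordered."""
    return unbordered_counts(k, n)[n] / k ** n


if __name__ == "__main__":
    print(unbordered_counts(2, 12)[1:])
    print(density(2, 40))
```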
Next we give an upper bound on the nondeterministic automaticity of $Q_k$.
Theorem 5 For $n \ge 0$,

Proof: For each $n \ge 0$ we construct a deterministic automaton that accepts all words of length at most $n$ in the complement of $Q_k$. The automaton is constructed as follows. First consider the language of square words (2-powers) of length $i$. We can construct an automaton accepting this language by first constructing the complete $k$-ary tree with $k^{i/2}$ leaves so that each path from the root to a leaf is labeled by a different word of length $i/2$. We then make a copy of this tree, but reflected, so that the arrows are directed away from the leaves towards the root of the tree. The leaves of the first tree are identified with the leaves of the second tree. This construction is illustrated in Figure 1, which shows the automaton accepting all binary squares of length 6. In the figure, dotted lines connect states to be identified, and transitions not shown go to a sink state.
The left tree has states, so the automaton for the squares of length $i$ has at most

For each $d > 2$, to accept $d$-powers of length $i$ we simply construct a tree with $k^{i/d}$ leaves so that each path from the root to a leaf is labeled by a different $d$-power of length $i$. This tree has at most $i k^{i/d}$ states.
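For small parameters, the size of the automaton for squares can be compared with the size of the minimal DFA of the same finite language. The sketch below counts the states of the minimal complete DFA by enumerating the distinct left quotients of the language (Myhill-Nerode classes), with the empty quotient playing the role of the sink. This is not the construction used in the proof, only an independent check, and the function names are ours.

```python
from itertools import product


def minimal_dfa_states(language, alphabet: str) -> int:
    """Number of states of the minimal complete DFA of a finite
    language, counting the sink. States are the distinct left
    quotients reachable from the language itself."""
    seen = set()
    stack = [frozenset(language)]
    while stack:
        quotient = stack.pop()
        if quotient in seen:
            continue
        seen.add(quotient)
        for a in alphabet:
            # Quotient by the letter a: strip a leading a where present.
            stack.append(frozenset(w[1:] for w in quotient if w[:1] == a))
    return len(seen)


def squares(length: int, alphabet: str) -> set:
    """All words uu of the given even length over the alphabet."""
    half = length // 2
    return {"".join(t) * 2 for t in product(alphabet, repeat=half)}


def nonprimitive_words(max_len: int, alphabet: str) -> set:
    """All words of length 1..max_len of the form z^l with l >= 2."""
    result = set()
    for n in range(1, max_len + 1):
        for d in range(1, n):
            if n % d == 0:
                for t in product(alphabet, repeat=d):
                    result.add("".join(t) * (n // d))
    return result


if __name__ == "__main__":
    print(minimal_dfa_states(squares(6, "01"), "01"))
    print(minimal_dfa_states(nonprimitive_words(8, "01"), "01"))
```

For binary squares of length 6 the quotient count is 23: the 15 nodes of the first tree, the 7 non-leaf nodes of the reflected tree, and one sink, so in this case the tree construction is already minimal.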
To create the automaton accepting all non-primitive words of length $i$, we can combine all of these automata, sharing edges and transitions whenever possible. The resulting automaton has at most states. We can therefore construct an automaton accepting all non-primitive words of length at most $n$ using at most states. Since this automaton is deterministic, the automaton accepting $Q_k$ has at most this many states as well. □

Next we consider the language of unbordered words.
Theorem 6 Let $\varepsilon > 0$. The nondeterministic automaticity of the set $UB_k$ of unbordered words over the alphabet

Proof: For each $\varepsilon > 0$ there exists $j$ such that the number of words of length $m$ over a $k$-letter alphabet that avoid $1^j$ is $\Omega((k - \varepsilon)^m)$ (see for example the analysis given in the section "Longest runs" starting on p. 308 of [2]). Fix such a $j$. For $n \ge 2(j + 2)$ we define ) and $w$ does not contain $1^j$ }.
To show that the words in $D_n$ are uniformly $n$-dissimilar, let $x, y \in D_n$, $x \ne y$. Since $x, y \in D_n$, there exist $w_1$ and $w_2$ such that $x = 0 w_1 0 1^j$ and $y = 0 w_2 0 1^j$.
Clearly $xx$ is bordered; however, $xy$ is not bordered. Suppose to the contrary that $xy$ has a border $b$. Since $b$ is a non-empty prefix of $xy$, it must begin with 0; since it is also a suffix, it must end with $1^j$. However, $xy$ contains only one occurrence of $1^j$ apart from the occurrence at the end. It follows that $b = x$, and since $b$ is also a suffix of $xy$ and $|x| = |y|$, we must also have $b = y$. Thus $x = y$, which is a contradiction.

Since ), we have the result by Theorem 3. □
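The border argument above can be verified exhaustively for small parameters. The sketch below builds all words of the shape $0 w 0 1^j$ with $w$ avoiding $1^j$, which is the shape used in the proof, and checks that $xx$ is bordered while $xy$ is unbordered for $x \ne y$. The parameter `inner_len` (the length of $w$) and the function names are ours.

```python
from itertools import product


def has_border(w: str) -> bool:
    """True if some non-empty proper prefix of w is also a suffix."""
    return any(w[:m] == w[-m:] for m in range(1, len(w)))


def candidate_words(j: int, inner_len: int, alphabet: str = "01") -> list:
    """All words 0 w 0 1^j with |w| = inner_len and w avoiding 1^j."""
    block = "1" * j
    result = []
    for letters in product(alphabet, repeat=inner_len):
        w = "".join(letters)
        if block not in w:
            result.append("0" + w + "0" + block)
    return result


def check_border_argument(j: int, inner_len: int, alphabet: str = "01") -> bool:
    """Verify that xx is bordered and xy is unbordered for all x != y."""
    words = candidate_words(j, inner_len, alphabet)
    for x in words:
        if not has_border(x + x):
            return False
        for y in words:
            if y != x and has_border(x + y):
                return False
    return True


if __name__ == "__main__":
    print(check_border_argument(2, 4))
    print(check_border_argument(3, 5))
```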

Irreducible polynomials
In this section we consider the automaticity of the language of representations of irreducible polynomials over a finite field with respect to some base b.
Let $F$ be a field with $q$ elements. Let $F[X]$ be the polynomial ring over $F$. If $f \in F[X]$ we denote its degree by $\deg f$. Let $B$ be an integer and let $F[X]_{<B}$ denote the set of polynomials over $F$ of degree strictly less than $B$. If $b$ is a fixed non-constant polynomial, then any polynomial $f$ can be written uniquely as
$$f = \sum_{i=0}^{\ell} c_i b^i,$$
where each $c_i$ has degree less than $\deg b$.
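The digits $c_i$ of a polynomial in base $b$ can be computed by repeated division with remainder, exactly as for integers. The following sketch makes two simplifying assumptions that are ours: the field is $\mathbb{F}_p$ with $p$ prime, and a polynomial is stored as a list of coefficients with the lowest degree first (the zero polynomial is the empty list). All function names are hypothetical, and `pow(x, -1, p)` requires Python 3.8 or later.

```python
def trim(f: list) -> list:
    """Drop trailing zero coefficients."""
    while f and f[-1] == 0:
        f = f[:-1]
    return f


def poly_add(f: list, g: list, p: int) -> list:
    n = max(len(f), len(g))
    return trim([((f[i] if i < len(f) else 0)
                  + (g[i] if i < len(g) else 0)) % p for i in range(n)])


def poly_mul(f: list, g: list, p: int) -> list:
    if not f or not g:
        return []
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, c in enumerate(g):
            out[i + j] = (out[i + j] + a * c) % p
    return trim(out)


def poly_divmod(f: list, b: list, p: int) -> tuple:
    """Quotient and remainder of f divided by b over GF(p), p prime."""
    rem = trim([c % p for c in f])
    b = trim([c % p for c in b])
    if not b:
        raise ZeroDivisionError("division by the zero polynomial")
    inv_lead = pow(b[-1], -1, p)  # inverse of the leading coefficient
    quotient = [0] * max(len(rem) - len(b) + 1, 0)
    while len(rem) >= len(b):
        shift = len(rem) - len(b)
        coeff = rem[-1] * inv_lead % p
        quotient[shift] = coeff
        for i, c in enumerate(b):
            rem[shift + i] = (rem[shift + i] - coeff * c) % p
        rem = trim(rem)
    return trim(quotient), rem


def base_b_digits(f: list, b: list, p: int) -> list:
    """Digits c_0, c_1, ... (least significant first) with
    f = sum of c_i * b^i and deg c_i < deg b."""
    if len(trim([c % p for c in b])) < 2:
        raise ValueError("the base b must be non-constant")
    digits = []
    f = trim([c % p for c in f])
    while f:
        f, r = poly_divmod(f, b, p)
        digits.append(r)
    return digits


def evaluate_digits(digits: list, b: list, p: int) -> list:
    """Inverse of base_b_digits, by Horner's rule."""
    result = []
    for d in reversed(digits):
        result = poly_add(poly_mul(result, b, p), d, p)
    return result


if __name__ == "__main__":
    f = [1, 1, 0, 0, 0, 1]   # X^5 + X + 1 over GF(2)
    b = [1, 1, 1]            # X^2 + X + 1
    print(base_b_digits(f, b, 2))
```

The zero polynomial yields the empty digit list, which matches the convention that its representation is the empty word.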
We define a function $\Psi : F[X]_{<B} \to F^B$ by $\Psi(f) := (0, \ldots, 0)$ over the alphabet $F^B$ is the $b$-representation of $f$. By convention, the representation of the zero polynomial is $\varepsilon$. Given a $b$-representation $w \in (F^B)^*$, we denote its value in $F[X]$ by $w_b$. Note that we have chosen to write $f$ starting with the least significant "digit" and ending with the most significant "digit".
is regular. Rigo and Waxweiler [15] proved that for any base $b$, the set of irreducible polynomials over $F$ is not $b$-recognizable. Let $T \subset F[X]$ and let $b$ be a non-constant polynomial. The $b$-automaticity of $T$ is denoted by $A^b_T(n)$ and is defined as the automaticity

Theorem 7 There exists a constant $B$ such that the set $S$ of monic irreducible polynomials over $F$ has $b$-automaticity $A^b_S(n) \ge q^{Bn}/(Bn) + O(q^{Bn/2}/(Bn))$.
The main tool for the proof of this theorem is the following result of Hsu [6, Corollary 3.4].
Theorem 8 Let $a$ and $m$ be polynomials over $F$ such that $(a, m) = 1$. Let $\#S_N(a, m)$ denote the number of monic irreducible polynomials of degree $N$ congruent to $a$ modulo $m$ and let $M = \deg m$. If

The proof of the following lemma is similar to that of [16, Lemma 6], which is in turn based on an idea found in [5] and [1].
Then there exist a constant $C_q$ and a polynomial $h$ such that $hd + f$ is irreducible and $hd + g$ is not irreducible, where $\deg h \le C_q \deg d$.

We may thus take $C_q = 2C(C + 1)$ to complete the proof.

Hence $|xz| = n/(1 + C_q) + C_q |x| = n/(1 + C_q) + C_q (n/(1 + C_q)) = n$.
We now estimate the size of $D_n$. Let $B = \deg b/(1 + C_q)$. Note that there are $q^{Bn}/(Bn) + O(q^{Bn/2}/(Bn))$ monic irreducible polynomials in $F[X]$ of degree $Bn$. Since $\deg b$ is a constant, there are at most a constant number of polynomials $f$ that divide $b$. Hence $|D_n| = q^{Bn}/(Bn) + O(q^{Bn/2}/(Bn))$. □

• $|xz| \le n$ and $xz \in L$; and
• for each $y \in U$ such that $x \ne y$, we have $|yz| \le n$ and $yz \notin L$.

Theorem 3 ([3]) Let $L \subseteq \Sigma^*$ and let $U$ be a set of uniformly $n$-dissimilar words for $L$. Then $N_L(n) \ge |U|$.

Proof of Theorem 7: To prove Theorem 7 we will construct a set $D_n$ of $n$-dissimilar words for $[S]_b$. Let $C_q$ be as in Lemma 9. Let $D_n = \{[f]_b : f \in S,\ (f, b) = 1,\ \deg f = (n \deg b)/(1 + C_q)\}$. Note that all words in $D_n$ have the same length. Consider two elements $x, y \in D_n$. Let $f = x_b$, $g = y_b$. By Lemma 9, there exists $h$ such that $h b^{|x|} + f$ is irreducible and $h b^{|x|} + g$ is not, where $\deg h = C_q \deg b^{|x|}$. Let $z = [h]_b$. Then $xz \in [S]_b$ and $yz \notin [S]_b$. Since $\deg h = C_q \deg b^{|x|}$, we have $|z| \deg b = C_q |x| \deg b$, and so $|z| = C_q |x|$. Since $\deg f = (n \deg b)/(1 + C_q)$, we have $|x| = n/(1 + C_q)$.