Noneffective Regularity of Equality Languages and Bounded Delay Morphisms

We give an instance of a class of morphisms for which it is easy to prove that their equality sets are regular, but whose emptiness is still undecidable. The class is that of bounded delay 2 morphisms.


Introduction
Regular (or rational) languages constitute a fundamental class of formal languages, which is typically viewed as a simple, or even the simplest, class of languages. For example, decision problems on regular languages are, with a few exceptions, algorithmically decidable.
However, there is also another viewpoint. Some questions lead to regular languages, but through a mechanism so complicated that the regularity holds only noneffectively (the meaning of this assertion will be explained below). A splendid example is a result of Haines (or, in fact, of Higman) stating that the upper closure of any language under the relation of being a sparse subword is regular, see e.g. [Hai69]. Another example of this nature is the regularity of equality languages of bounded delay morphisms, which is the topic of this paper. Our goal is to give a simple, textbook-type proof of this fundamental result.
Equality languages are sets of solutions of instances of the Post Correspondence Problem (PCP for short). Hence their emptiness is undecidable, as was proved by E. Post already in 1946, see [Pos46]. Later it was revealed that their generating power is very high: each recursively enumerable language is obtained as a morphic image of the intersection of an equality language and a regular language, see e.g. [Cul79], or [Kar09] for a recent survey. On the other hand, certain equality languages, namely those of elementary morphisms, played a crucial role in a nice solution of the celebrated D0L sequence equivalence problem, see [ER78]. For these morphisms the equality language is regular. A further analysis revealed that the regularity was, in fact, a consequence of the so-called bounded delay property, see [CK85] and [HK97].
Interesting problems arose: Is the equality language of bounded delay morphisms effectively regular? Is the PCP decidable for bounded delay morphisms? These problems were answered negatively by K. Ruohonen in [Ruo85]. In fact, he proved his result in the stronger form for biprefix morphisms. A drawback of his proof is that it is quite complicated: like an earlier proof of the undecidability of the PCP for injective morphisms, see [Lec63], it relies on reversible Turing machines. However, an equality set of injective morphisms need not be regular, see [KS79].
Our goal here is to give an instance of a class of morphisms for which the PCP is undecidable and the equality languages are regular, and, moreover, the proofs of these results are short. Our family is that of morphisms of bounded delay 2. Our proofs use the ideas of [Ruo85], but allow simpler constructions and, consequently, shorter proofs.
We briefly outline the structure and constructions of this paper. We use the halting problem of Turing machines on empty input as our undecidable problem. We want to simulate their computations by the equality mechanism of two cooperating morphisms. It is well known that this can be done. However, the morphisms obtained in this way need not have the bounded delay property. A key idea here is that instead of normal configurations of Turing machines we use so-called extended configurations, which contain some extra local information. This makes it possible to carry out the above simulation with bounded delay morphisms. The approach does not require so-called reversible Turing machines, which were crucial in [Ruo85], although some ingredients of reversible Turing machines motivate the definition of extended configurations. As a consequence we obtain a direct and relatively simple proof of the undecidability of the PCP for bounded delay morphisms.

Notions
We assume that the reader is familiar with the basics of automata theory and combinatorics on words, see e.g. [HU79], [Lot02] and [CK97]. For more on morphisms of free monoids we refer to [HK97]. We only recall here that the notation $u \le v$ means that the word $u$ is a prefix of the word $v$, that $u < v$ means that $u$ is a proper prefix of $v$, and that words $u$ and $v$ are comparable if $u \le v$ or $v \le u$. The empty word is denoted by $1$.
The equality language of two morphisms $f, g : \Sigma^* \to \Gamma^*$ is the set
$$E(f, g) = \{ w \in \Sigma^+ : f(w) = g(w) \}.$$
The PCP asks whether this language is empty or not. The modified PCP asks whether the language $E(f, g) \cap a\Sigma^*$ is empty for a given letter $a \in \Sigma$.
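The definition suggests a brute-force search, sketched below in Python under the assumption that a morphism is given as a dictionary from single-character letters to words; the helper names `apply` and `equality_words` and the toy morphisms are hypothetical, not taken from the paper. Such a search can only confirm that a word lies in $E(f, g)$: since the PCP is undecidable, there is no computable bound on `max_len` that would certify emptiness in general.

```python
from itertools import product


def apply(h, w):
    """Image h(w) of the word w under the morphism h (a dict letter -> word)."""
    return "".join(h[c] for c in w)


def equality_words(f, g, max_len, first=None):
    """All nonempty w with |w| <= max_len and f(w) == g(w), shortest first.

    With `first` set, only words starting with that letter are kept, i.e. the
    search is restricted to E(f, g) ∩ first·Σ* as in the modified PCP.
    """
    sigma = sorted(f)  # the domain alphabet Σ, in a fixed order
    found = []
    for n in range(1, max_len + 1):
        for letters in product(sigma, repeat=n):
            w = "".join(letters)
            if first is not None and w[0] != first:
                continue
            if apply(f, w) == apply(g, w):
                found.append(w)
    return found


# A toy instance (not from the paper): f(a) = aa, f(b) = b; g(a) = a, g(b) = ab.
f = {"a": "aa", "b": "b"}
g = {"a": "a", "b": "ab"}
print(equality_words(f, g, 4))  # ['ab', 'abab']
```

For this toy pair, $f(ab) = g(ab) = aab$, and every product of solutions is again a solution, which is why `abab` appears as well.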
A morphism $h : \Sigma^* \to \Gamma^*$ is of bounded delay $n$ if there are no letters $a, b \in \Sigma$ and words $u, v \in \Sigma^n$ for which $a \neq b$ and $h(au) \le h(bv)$. Further, it is of bounded delay if it is of bounded delay $n$ for some $n$.
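A direct transcription of this definition into a brute-force check is sketched below in Python. Morphisms are assumed to be dictionaries from single-character letters to words, and `has_bounded_delay` is a hypothetical helper name; the prefix relation $\le$ becomes `str.startswith`.

```python
from itertools import product


def apply(h, w):
    """Image h(w) of the word w under the morphism h (a dict letter -> word)."""
    return "".join(h[c] for c in w)


def has_bounded_delay(h, n):
    """True iff there are no letters a != b and u, v in Σ^n with h(au) <= h(bv).

    Here <= is the prefix order, so the test is str.startswith.  The check is
    brute force over all |Σ|^(2n+2) candidates, so it is only for small n.
    """
    sigma = sorted(h)
    words_n = ["".join(t) for t in product(sigma, repeat=n)]
    for a in sigma:
        for b in sigma:
            if a == b:
                continue
            for u in words_n:
                hau = apply(h, a + u)
                for v in words_n:
                    if apply(h, b + v).startswith(hau):
                        return False  # witness a, b, u, v found
    return True


# Toy morphisms (not from the paper).
print(has_bounded_delay({"a": "a", "b": "ab"}, 0))  # False: h(a) is a prefix of h(b)
print(has_bounded_delay({"a": "a", "b": "ab"}, 1))  # True
```

The check also shows why injectivity is not enough: the injective morphism $a \mapsto a$, $b \mapsto ab$, $c \mapsto bb$ satisfies $h(ac^n) = ab^{2n} \le ab^{2n+1} = h(bc^n)$ for every $n$, so it fails the test for every $n$.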
An instantaneous description (ID) of a Turing machine (TM) at some point of its computation is $u(q, a)v$, where $q$ is the current state, $a$ is the symbol under the head, and $u$ and $v$ are the contents of the tape to the left and to the right of the head. So every computation has a sequence of associated IDs, each of which describes the configuration of the machine at one step of the computation.
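To make the notation concrete, here is a small Python sketch that generates the IDs of a deterministic TM on empty input. The tuple encoding `(u, q, a, v)` of $u(q, a)v$, the move directions `"L"`/`"R"`, the two-way infinite blank tape and the names `step`, `run` and `show` are assumptions of this sketch. The `max_steps` cutoff matters: a simulation can confirm halting, but by the undecidability of the halting problem it cannot refute it in general.

```python
BLANK = "*"


def step(delta, ident):
    """One move of a deterministic TM on an ID u(q, a)v, or None if delta is undefined.

    An ID is the tuple (u, q, a, v).  delta maps (q, a) to (p, b, d) with d in
    {"L", "R"}: write b, enter state p, move the head one cell in direction d.
    The tape is assumed two-way infinite and blank outside u a v.
    """
    u, q, a, v = ident
    if (q, a) not in delta:
        return None
    p, b, d = delta[(q, a)]
    if d == "R":
        nxt, rest = (v[0], v[1:]) if v else (BLANK, "")
        return (u + b, p, nxt, rest)
    nxt, rest = (u[-1], u[:-1]) if u else (BLANK, "")
    return (rest, p, nxt, b + v)


def run(delta, q0="q0", halt="h", max_steps=1000):
    """IDs of the computation on empty input, with a status string."""
    ident = ("", q0, BLANK, "")
    ids = [ident]
    for _ in range(max_steps):
        nxt = step(delta, ident)
        if nxt is None:
            return ids, ("accepted" if ident[1] == halt else "stuck")
        ident = nxt
        ids.append(ident)
    return ids, "cutoff"  # no verdict: the machine may or may not halt later


def show(ident):
    """Render the ID tuple in the u(q, a)v notation."""
    u, q, a, v = ident
    return f"{u}({q},{a}){v}"


# Hypothetical two-step machine: write 1, move right, write 1, move left, halt.
delta = {("q0", "*"): ("q1", "1", "R"), ("q1", "*"): ("h", "1", "L")}
ids, status = run(delta)
print([show(i) for i in ids], status)  # ['(q0,*)', '1(q1,*)', '(h,1)1'] accepted
```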
The article [Ruo85] contains two constructions of TMs: first, for an arbitrary TM $M$ an equivalent reversible machine is constructed; second, an equivalent so-called synchronized modulo 2 machine is constructed. We do not need these constructions. The first reversible TMs were constructed already in the 1970s, see [Ben73], but their importance has increased since the invention of quantum computing, see [Hir03].

Extended Instantaneous Descriptions
Our goal is to prove the undecidability of the PCP for bounded delay morphisms. We will do this by reducing the halting problem of TMs on empty input to it. More precisely, for an arbitrary TM we will construct two bounded delay morphisms such that their equality set contains a word starting with a specific letter if and only if the machine accepts the empty word.
In a standard proof of the undecidability of the PCP, a computation of a TM is simulated with two morphisms which operate on IDs (see e.g. [Kar09]). We wish to do this simulation with bounded delay morphisms. One of the problems with this approach is that a bounded delay morphism is necessarily injective on letters, but the transition function of a TM can be non-injective. This is why in this section we define extended instantaneous descriptions. In the next section we then define the bounded delay morphisms, which operate on these extended configurations, and prove the results.
Let $M = (Q, \Sigma, \Gamma, \delta, q_0, *, \{h\})$ be a deterministic TM, where $Q$ is the set of states, $\Sigma$ is the input alphabet, $\Gamma$ is the tape alphabet, $\delta$ is the (partial) transition function, $q_0$ is the initial state, $*$ is the blank symbol and $h$ is the halting state. For convenience, we assume that $M$ never enters $q_0$, always halts in $h$ and never writes $*$.
, where $S$ is a new symbol. We write the second component of an element of $\Theta$ below the first component. For example, if $q \in Q$ and $a \in \Gamma$, then
We define extended instantaneous descriptions (EIDs) of the computation of $M$ with empty input. Every EID will be an element of $(Q \times \Gamma)^* \# \Gamma^* \Theta \Delta^* \$$, where $\#$ and $\$$ are new symbols. The initial EID is (3) The next rows define the rules $\beta_i \to \beta_{i+1}$ with which we can derive the next EID $\beta_{i+1}$ from an EID $\beta_i$. Here
So if $\beta_i \to \beta_{i+1}$ for $i = 0, 1, 2, \ldots$, then $\beta_0, \beta_1, \beta_2, \ldots$ are the EIDs. This sequence of EIDs can be finite or infinite. It can easily be seen that for every EID $\beta_i$ there is at most one EID $\beta_{i+1}$ such that $\beta_i \to \beta_{i+1}$, so the sequence is unique.
The idea of EIDs is that the part to the left of $\#$ contains the state-symbol pairs of the computation up to some point, and the part between $\#$ and $\$$ becomes an ID if the element of $\Theta$ is replaced with its first component. It is important that two consecutive EIDs differ only in one place. This is why the element of $\Theta$ moves back and forth and carries information in its second component. Lemma 3.3 and its proof explain how EIDs work.
Example 3.2 Let $M$ be the TM of Example 3.1. The EIDs are
The next lemma shows the relation between IDs and EIDs.
Lemma 3.3 Let $\alpha_0, \alpha_1, \alpha_2, \ldots$ be the IDs and $\beta_0, \beta_1, \beta_2, \ldots$ the EIDs of the computation of $M$ with empty input. Let $\alpha_i = u_i x_i v_i$, where $u_i, v_i \in \Gamma^*$ and $x_i \in Q \times \Gamma$. Now there are $0 = j_0 < j_1 < j_2 < \cdots$ such that
Proof: Clearly (12) holds for $i = 0$. We assume that it holds for $i$ and prove that $\beta_{j_i} \to^+ \beta_{j_{i+1}}$, where $\to^+$ is the transitive closure of $\to$. (Naturally we assume here that $\alpha_i$ is not the last ID, that is, $\delta(x_i)$ and thus $\alpha_{i+1}$ are defined.) By using one of the rules (5)-(8), we simulate one step of the computation and end up with the EID
By using rule (9) sufficiently many times and then rule (10), we get the EID
where $yv = \alpha_{i+1}$. By using rule (4) sufficiently many times we get the required EID $\beta_{j_{i+1}}$.

Morphisms
As in the previous section, we consider a TM $M = (Q, \Sigma, \Gamma, \delta, q_0, *, \{h\})$, and we use the notation
We define two bounded delay morphisms $g_1, f_1 : A^* \to B^*$, which simulate the computation of $M$ with empty input. The alphabet $B$ is the same as that used by EIDs, that is, $B = \Delta_1 \cup \Theta$. For every $u \in B^*$ we define a letter $[u]$. The alphabet $A$ will consist of a finite number of these letters. The morphism $g_1$ is defined so that $g_1([u]) = u$. The first column of Table 1 gives the values of $g_1$ on the letters of $A$, and thus it also implicitly defines the alphabet $A$. The second column gives the values of $f_1$ on these letters. In the table
There is a correspondence between the rows of the table and the rules (4)-(10) in the definition of EIDs.
Tab. 1: Morphisms $g_1$ and $f_1$
The next lemma shows that $g_1$ and $f_1$ simulate $M$ in the sense that if $g_1$ "reads" an EID, then $f_1$ "writes" the next EID.
Lemma 4.1 Let $\beta_j, \beta_{j+1}$ be two consecutive EIDs. There is a word $u \in A^*$ such that $g_1(u) = \beta_j$ and $f_1(u) = \beta_{j+1}$.
The morphisms $f_1$ and $g_1$ are almost sufficient for simulating $M$, but we must extend them to handle initialization and termination. Initialization is easy, because there is only one initial EID, but termination is hard, because there are many potential final EIDs. This problem is handled by running the simulation backwards, which requires more complicated notation. The reversibility of the computation, which is enabled by the use of EIDs instead of IDs, is implicitly present here, as well as later in the proof of Lemma 4.4.
For some elements $x \in \Theta$ we need a separate copy of $x$. This copy will be denoted by $\bar{x}$. For some other elements $x$ we want $x$ and $\bar{x}$ to be equal. So we define a mapping $x \mapsto \bar{x}$ as follows: if $x = \genfrac{}{}{0pt}{}{(h, a)}{S}$, where $a \in \Gamma$, let $\bar{x} = x$. For every other $x \in \Theta$, let $\bar{x}$ be a copy of $x$. Further, for every $x \in \Delta_1$, let $\bar{x} = x$. Finally, let $\bar{B} = \{\bar{x} : x \in B\}$ and $\bar{\Theta} = \{\bar{x} : x \in \Theta\}$. We extend the mapping $x \mapsto \bar{x}$ to a morphism $B^* \to (B \cup \bar{B})^*$. Note that
We define two bounded delay morphisms $g, f$:
Here $I$ and $T$ are new symbols. If t ∈ A ∩ A, then g 1 (t) = f 1 (t) and f 1 (t) = g 1 (t), so f and g are well defined.
Lemma 4.2 If M does not accept the empty word, then the set {w ∈ E(f, g) : I ≤ w} is empty, and if M does accept the empty word, then the set contains a unique minimal word (minimal with respect to the prefix ordering).
Proof: Let the EIDs of $M$ be $\beta_0, \beta_1, \beta_2, \ldots$. If the computation of $M$ does not halt, this sequence goes on forever, and $f(Iw)$ and $g(Iw)$ can never match. If the computation stops in a nonhalting state, then there is no way to continue. So if $M$ does not accept the empty word, then $f(Iw) \neq g(Iw)$ for all $w$.
Proof: We prove this for $g$; the case of $f$ is similar. The bounded delay condition of $g$ can be violated in two ways: for two different letters $r, s$, either $g(r) = g(s)$ (which always violates it) or $g(r) < g(s)$ (which may or may not violate it). We prove that the first case does not happen, and that if $g(r) < g(s)$, then there are no letters $p, q$ such that $g(rpq)$ and $g(s)$ are comparable. This means that $g$ is of bounded delay 2. Assume that $g(r) = g(s)$.
From this it follows that if $r, s \in A$, then $r = s$, and the same holds if $r, s \in \bar{A}$. If $r \in A \setminus \bar{A}$ and $s \in \bar{A} \setminus A$, then $g(r) \neq g(s)$, because $g(s)$ contains an element of $\Theta$ and $g(r)$ does not. Finally, if $r = I$, then $s = I$, and if $r = T$, then $s = T$. It follows that $g$ is injective on letters.
Assume that $g(r) < g(s)$. Now $g(s)$ contains some $X \in \Theta \cup \bar{\Theta}$, but $g(r)$ does not contain it. If $g(rpq)$ and $g(s)$ are comparable, then $g(p)$ or $g(q)$ contains this $X$, but in a different position. This is impossible, because if $X$ is a factor of $g(s)$ and $g(t)$, where $t \in \{p, q\}$, then its position in $g(s)$ and $g(t)$ is the same. This can be verified from Table 1.
Theorem 4.5 For any TM $M$ we can construct two bounded delay 2 morphisms $f, g$ and specify a letter $I$ such that if $M$ does not accept the empty word, then the set $\{w \in E(f, g) : I \le w\}$ is empty, and if $M$ does accept the empty word, then the set contains a unique minimal word.
Proof: Follows from Lemma 4.2 and Lemma 4.4.
The undecidability result now follows with standard techniques.
Theorem 4.6 The PCP for bounded delay 2 morphisms is undecidable.
Proof: For any TM $M$ we can construct the morphisms $f$ and $g$ of Theorem 4.5. If the modified PCP for bounded delay 2 morphisms were decidable, then it could be decided whether $M$ accepts the empty word, which is not possible. The undecidability of the ordinary PCP for bounded delay morphisms follows easily from the modified version (see also [HK97]). For arbitrary bounded delay 2 morphisms $f, g : \Sigma_1^* \to \Sigma_2^*$ and a letter $I \in \Sigma_1$ we construct bounded delay 2 morphisms $f', g' : (\Sigma_1 \cup \{i, t\})^* \to (\Sigma_2 \cup \{i, t, X\})^*$ such that $\{w \in E(f, g) : I \le w\}$ is empty if and only if $E(f', g')$ is empty (here $i, t, X$ are new symbols).
Define morphisms $l, r$ by $l(a) = Xa$ and $r(a) = aX$.

Regularity of the Equality Set
To conclude our presentation, we recall that the equality sets of bounded delay morphisms are regular. For the sake of completeness we present a short proof of this. More details can be found in [HK97]. Actually, an even stronger result is proved in [CK85].
Theorem 5.1 Let $f, g : \Sigma^* \to \Gamma^*$ be morphisms of bounded delay. The language $E(f, g)$ is regular.
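Proof: Because $f$ is of bounded delay, there is a number $N$ such that if $v \in \Gamma^N$, then for at most one $a \in \Sigma$ there is a $w \in \Sigma^*$ such that $v \le f(a)f(w)$. The same holds for $g$. We assume that $N$ is a suitable bound for both morphisms $f$ and $g$. We construct a deterministic automaton accepting $E(f, g) \cup \{1\}$. Let $Q = (\Gamma^* \times \{1\}) \cup (\{1\} \times \Gamma^*)$ be the set of states, with $(1, 1)$ as both the initial and the final state, and let
$$\delta((u, v), a) = \begin{cases} \big((vg(a))^{-1} uf(a),\ 1\big) & \text{if } vg(a) \le uf(a),\\ \big(1,\ (uf(a))^{-1} vg(a)\big) & \text{if } uf(a) \le vg(a), \end{cases}$$
where $(u, v) \in Q$ and $a \in \Sigma$, be the (partial) transition function; here $x^{-1}y$ denotes the word $z$ with $y = xz$. So $\delta((u, v), a)$ is well defined if $uf(a)$ and $vg(a)$ are comparable, and undefined otherwise. Let $Q'$ be the set of states $q$ for which there is a chain of transitions from $(1, 1)$ to $q$ and another from $q$ to $(1, 1)$; every accepting computation visits only states of $Q'$.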
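Since the automaton is deterministic, a given word can always be tested simply by running it; what is noneffective is only a finite description of the automaton. A minimal Python sketch of this transition function, assuming morphisms given as dictionaries of single-character letters and using the empty string for the empty word $1$; `delta`, `accepts` and the toy morphisms are hypothetical. The new state keeps whichever of $uf(a)$ and $vg(a)$ overhangs the other.

```python
def delta(f, g, state, a):
    """Partial transition of the automaton for E(f, g) ∪ {1}.

    A state (u, v) has u == "" or v == "" (the empty string plays the role of
    the empty word 1).  Returns None when u f(a) and v g(a) are incomparable.
    """
    u, v = state
    x, y = u + f[a], v + g[a]
    if x.startswith(y):
        return (x[len(y):], "")   # v g(a) <= u f(a): keep the overhang of f
    if y.startswith(x):
        return ("", y[len(x):])   # u f(a) <= v g(a): keep the overhang of g
    return None


def accepts(f, g, w):
    """Run the deterministic automaton from (1, 1); accept iff it ends in (1, 1)."""
    state = ("", "")
    for a in w:
        state = delta(f, g, state, a)
        if state is None:
            return False
    return state == ("", "")


# Toy instance (not from the paper): f(a) = aa, f(b) = b; g(a) = a, g(b) = ab.
f = {"a": "aa", "b": "b"}
g = {"a": "a", "b": "ab"}
print([w for w in ["", "ab", "aab", "abab"] if accepts(f, g, w)])  # ['', 'ab', 'abab']
```

The empty word is accepted, matching the fact that the automaton recognizes $E(f, g) \cup \{1\}$ rather than $E(f, g)$ itself.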
Let $Q_1 = \{(u, v) \in Q : |uv| < N\}$ and let $Q_2$ be the set of those states in $Q \setminus Q_1$ to which there is a transition from $Q_1$. The sets $Q_1$ and $Q_2$ are finite. If $(u, v) \in Q \setminus Q_1$, $a \in \Sigma$ and $\delta((u, v), a) \in Q'$, then $\delta((u, v), aw) = (1, 1)$ for some $w \in \Sigma^*$. This means that $uf(a)f(w) = vg(a)g(w)$. Thus either $u = 1$, $|v| \ge N$ and $v \le f(a)f(w)$, or $v = 1$, $|u| \ge N$ and $u \le g(a)g(w)$. In both cases $a$ is uniquely determined because of the bounded delay property, so from each $q \in Q \setminus Q_1$ there is a transition to at most one state of $Q'$. Thus for each $q \in Q_2$ there is at most one chain of states of $Q'$ of the form $q \to q_1 \to \cdots \to q_n \in Q_1$, where $q_1, \ldots, q_{n-1} \notin Q_1$. Let $Q_3$ be the set of states in these chains. Because $Q_2$ is finite and each of these chains is finite, also $Q_3$ is finite. If $q \in Q' \setminus Q_1$, there is a chain of transitions from $(1, 1)$ to $q$, and another from $q$ to $(1, 1)$. The former chain necessarily contains a state of $Q_2$. Thus $q \in Q_3$. This means that $Q' \subseteq Q_1 \cup Q_3$ is finite, so the automaton restricted to $Q'$ is a finite deterministic automaton accepting $E(f, g) \cup \{1\}$, and $E(f, g)$ is regular.
To summarize, we have proved:
Theorem 5.2 The equality language of bounded delay 2 morphisms is regular, but it cannot be found algorithmically, and the PCP for these morphisms is undecidable.
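The finiteness argument in the proof of Theorem 5.1 can be mirrored by a search that explores the automaton forwards from $(1, 1)$ and then keeps only the states from which $(1, 1)$ is reachable again. The Python sketch below is hypothetical (the helper names `delta` and `useful_states` and the toy morphisms are assumptions) and needs an explicit `cap` on $|uv|$: the proof shows that $Q'$ is finite for bounded delay morphisms, but by Theorem 5.2 no algorithm can compute a large enough cap in general. With a fixed cap, the result is the set of states lying on a path $(1, 1) \to q \to (1, 1)$ whose states all satisfy $|uv| \le$ `cap`.

```python
from collections import deque


def delta(f, g, state, a):
    """Partial transition on states (u, v); None if u f(a), v g(a) are incomparable."""
    u, v = state
    x, y = u + f[a], v + g[a]
    if x.startswith(y):
        return (x[len(y):], "")
    if y.startswith(x):
        return ("", y[len(x):])
    return None


def useful_states(f, g, cap):
    """States on a path (1,1) -> q -> (1,1), exploring only states with |uv| <= cap."""
    start = ("", "")
    succ = {}                          # state -> set of successor states
    seen, queue = {start}, deque([start])
    while queue:                       # forward search from (1, 1)
        q = queue.popleft()
        succ[q] = set()
        for a in sorted(f):
            r = delta(f, g, q, a)
            if r is None or len(r[0]) + len(r[1]) > cap:
                continue
            succ[q].add(r)
            if r not in seen:
                seen.add(r)
                queue.append(r)
    pred = {q: set() for q in seen}    # reverse the explored edges
    for q, rs in succ.items():
        for r in rs:
            pred[r].add(q)
    useful, stack = {start}, [start]
    while stack:                       # backward search to (1, 1)
        r = stack.pop()
        for q in pred[r]:
            if q not in useful:
                useful.add(q)
                stack.append(q)
    return useful


f = {"a": "aa", "b": "b"}   # toy bounded delay morphisms, not from the paper
g = {"a": "a", "b": "ab"}
print(sorted(useful_states(f, g, 10)))  # [('', ''), ('a', '')]
```

For this toy pair the states $(a^k, 1)$ with $k \ge 2$ are reachable but can never return to $(1, 1)$, so only $(1, 1)$ and $(a, 1)$ survive, and the restricted automaton accepts $(ab)^*$.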
We conclude by emphasizing that our Theorem 5.2 is not the strongest possible: K. Ruohonen proved in [Ruo85] a stronger result (for biprefix morphisms instead of bounded delay morphisms). However, our proof and constructions are much simpler, and can be viewed as "textbook proofs" of this important result.