Dissecting power of intersection of two context-free languages

We say that a language $L$ is \emph{constantly growing} if there is a constant $c$ such that for every word $u\in L$ there is a word $v\in L$ with $\vert u\vert<\vert v\vert\leq c+\vert u\vert$. We say that a language $L$ is \emph{geometrically growing} if there is a constant $c$ such that for every word $u\in L$ there is a word $v\in L$ with $\vert u\vert<\vert v\vert\leq c\vert u\vert$. Given two infinite languages $L_1,L_2$, we say that $L_1$ \emph{dissects} $L_2$ if $\vert L_2\setminus L_1\vert=\infty$ and $\vert L_1\cap L_2\vert=\infty$. In 2013, it was shown that for every constantly growing language $L$ there is a regular language $R$ such that $R$ dissects $L$. In the current article we show how to dissect a geometrically growing language by a homomorphic image of the intersection of two context-free languages. Consider three alphabets $\Gamma$, $\Sigma$, and $\Theta$ such that $\vert \Sigma\vert=1$ and $\vert \Theta\vert=4$. We prove that there are context-free languages $M_1,M_2\subseteq \Theta^*$, an erasing alphabetical homomorphism $\pi:\Theta^*\rightarrow \Sigma^*$, and a nonerasing alphabetical homomorphism $\varphi : \Gamma^*\rightarrow \Sigma^*$ such that: if $L\subseteq \Gamma^*$ is a geometrically growing language, then there is a regular language $R\subseteq \Theta^*$ such that $\varphi^{-1}\left(\pi\left(R\cap M_1\cap M_2\right)\right)$ dissects the language $L$.


Introduction
In the theory of formal languages, the regular and the context-free languages constitute fundamental concepts that have attracted a lot of attention in the past several decades.
In contrast to regular languages, the context-free languages are closed neither under intersection nor under complement. The intersection of context-free languages has been systematically studied; see for instance [4,6,9]. Let CFL_k denote the family of all languages L for which there are k context-free languages L_1, L_2, ..., L_k with L = L_1 ∩ L_2 ∩ ... ∩ L_k. For each k, it has been shown that there is a language L ∈ CFL_{k+1} such that L ∉ CFL_k. Thus the k-intersections of context-free languages form an infinite hierarchy in the family of all formal languages lying between the context-free and the context-sensitive languages [6].
Dissection of infinite languages belongs to the topics of the theory of formal languages that have been studied in recent years. Let L_1 and L_2 be infinite languages. We say that L_1 dissects L_2 if |L_2 \ L_1| = ∞ and |L_1 ∩ L_2| = ∞. Let C be a family of languages. We say that a language L_2 is C-dissectible if there is L_1 ∈ C such that L_1 dissects L_2. Let REG denote the family of regular languages. In [10] the REG-dissectibility has been investigated. Several families of REG-dissectible languages have been presented. Moreover, it has been shown that there are infinite languages that cannot be dissected with a regular language. Also some open questions for REG-dissectibility can be found in [10]. For example, it is not known if the complement of a context-free language is REG-dissectible.
Given two countable sets A and B, we write A ⊆_ae B if |A − B| < ∞ and we write A =_ae B if both A ⊆_ae B and B ⊆_ae A hold. The subscript "ae" stands for "almost everywhere". We say that A covers B with an infinite margin (or A i-covers B, in short) if both B ⊆ A and A ≠_ae B hold. We represent a pair of languages A and B such that A i-covers B by i(B, A). A language C is said to separate i(B, A) with infinite margins (or C i-separates i(B, A), in short) if B ⊆ C ⊆ A, A ≠_ae C, and B ≠_ae C. In addition, given two language families A, B, we define i(B, A) = {i(B, A) | A ∈ A, B ∈ B, and A i-covers B}. We say that a language family C i-separates i(B, A) if for every i(B, A) ∈ i(B, A) there is a language C ∈ C such that C i-separates i(B, A).
In [10], a connection between REG-dissectibility and i-separation has been shown:

Lemma 1.1 ([10, Lemma 5.1]) Let A and B be any two language families and assume that A − B is REG-dissectible. It then holds that, for any A ∈ A and any B ∈ B, if A i-covers B, then there exists a language in E that i-separates i(B, A), where E = {B ∪ (R ∩ A) | A ∈ A, B ∈ B, R ∈ REG}.

Although Lemma 1.1 is stated explicitly for REG-dissectibility, it is clear from the proof in [10] that any family of languages could be applied. For convenience, in Section 5 we present such a generalization with a proof; see Lemma 5.1. This generalized connection between dissectibility and i-separation adds another argument for the study of dissection of infinite languages.
There is a longstanding open question in [1]: Given two context-free languages L_1, L_2 such that L_1 ⊂ L_2 and L_2 \ L_1 is an infinite language, is there a context-free language L_3 such that L_3 ⊂ L_2, L_1 ⊂ L_3, and both the languages L_3 \ L_1 and L_2 \ L_3 are infinite? This question was mentioned also in [10] using the i-separation: let CFL denote the family of all context-free languages; does CFL i-separate i(CFL, CFL)? Understanding the dissectibility could help to solve this open question, or at least it could help to identify "minimal" language families C such that C i-separates i(CFL, CFL).
Some other results concerning the dissection of infinite languages may be found in [5]. Two related topics are the construction of minimal covers of languages [2] and the immunity of languages [3,7,10]. Recall that a language L_1 is called C-immune if there is no infinite language L_2 ⊆ L_1 such that L_2 ∈ C.
Let N_+ denote the set of all positive integers. An infinite language L is called constantly growing if there is a constant c such that for every word u ∈ L there is a word v ∈ L with |u| < |v| ≤ c + |u|. In [10], it has been proved that every constantly growing language L is REG-dissectible.
We introduce a "natural" generalization of constantly growing languages as follows. Let R_+ denote the set of all positive real numbers. We define that a language L is a geometrically growing language if there is a constant c ∈ R_+ such that for every u ∈ L there exists v ∈ L with |u| < |v| ≤ c|u|. We say also that L is c-geometrically growing. In the current article we show how to dissect a geometrically growing language by a homomorphic image of the intersection of two context-free languages.
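As a concrete illustration (our own, not from the article): for a unary language the two growth conditions depend only on the set of word lengths, so they can be checked on finite samples. The language {a^(2^i) | i ∈ N_+} is 2-geometrically growing but not constantly growing.

```python
def constant_gap_ok(lengths, c):
    """Check |u| < |v| <= c + |u| on a finite sample of word lengths:
    every length except the largest needs a strictly larger witness
    within additive distance c."""
    s = sorted(set(lengths))
    return all(any(n < m <= c + n for m in s) for n in s[:-1])

def geometric_gap_ok(lengths, c):
    """Check |u| < |v| <= c * |u| on a finite sample of word lengths."""
    s = sorted(set(lengths))
    return all(any(n < m <= c * n for m in s) for n in s[:-1])

powers_of_two = [2 ** i for i in range(1, 12)]   # lengths of a^2, a^4, a^8, ...
assert geometric_gap_ok(powers_of_two, 2)        # geometrically growing with c = 2
assert not constant_gap_ok(powers_of_two, 100)   # the gaps 2^i outgrow any constant
```

On a finite sample the check is only a sanity test, of course; the definitions themselves quantify over the whole infinite language.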
Consider two alphabets Σ and Θ such that |Σ| = 1 and |Θ| = 4; that is, Σ and Θ denote alphabets with one letter and four letters, respectively. The main results of the current article are the following theorem and its corollary below.

Theorem 1.2 There are context-free languages M_1, M_2 ⊆ Θ* and an erasing alphabetical homomorphism π : Θ* → Σ* such that: if L ⊆ Σ* is a geometrically growing language, then there is a regular language R ⊆ Θ* such that π(R ∩ M_1 ∩ M_2) dissects the language L.
To emphasize the essential result of our article, we consider in Theorem 1.2 that L is a language over an alphabet with one letter. Let Γ denote a finite alphabet. The next corollary shows a generalization for a geometrically growing language over the alphabet Γ.

Corollary 1.3 There is a nonerasing alphabetical homomorphism φ : Γ* → Σ* such that: if L ⊆ Γ* is a geometrically growing language, then there is a regular language R ⊆ Θ* such that φ^{-1}(π(R ∩ M_1 ∩ M_2)) dissects the language L.
Proof: Let L_1 ⊆ Γ* be a language and let φ : Γ* → Σ* be an alphabetical homomorphism defined as follows: φ(a) = z for every a ∈ Γ, where z is the only letter of the alphabet Σ.
Since the intersection of a regular language and a context-free language is a context-free language, the language R ∩ M_1 ∩ M_2 is also an intersection of two context-free languages. This explains why we do not mention the regular language in the title of the article.
We sketch the basic ideas of our proof. Note that a non-associative word on the letter a is a "well parenthesized" word containing a given number of occurrences of a. It is known that the number of non-associative words containing n + 1 occurrences of a is equal to the n-th Catalan number [8]. For example, for n = 3 we have five distinct non-associative words: (((aa)a)a), ((aa)(aa)), (a(a(aa))), (a((aa)a)), and ((a(aa))a). Every non-associative word contains the prefix (^k a for some k ∈ N_+, where (^k denotes the k-th power of the opening bracket. We show that there are non-associative words such that k equals "approximately" log_2 n. We construct two context-free languages whose intersection accepts such words and we call these words balanced extended non-associative words. By counting the number of opening brackets of a balanced extended non-associative word with n occurrences of a we can compute the logarithm of the number of occurrences of a. If L is a geometrically growing language then the language L' = {a^j | j = ⌈log_2 |w|⌉ and w ∈ L} is obviously constantly growing. Hence, by means of the intersection of two context-free languages we transform the challenge of dissecting a geometrically growing language into the challenge of dissecting a constantly growing language. This approach allows us to prove our result.
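The Catalan count used above is easy to verify by brute force; the following sketch (our own string encoding with '(' and ')') enumerates all non-associative words with a given number of occurrences of a:

```python
from math import comb

def non_assoc(n):
    """All non-associative words with n occurrences of 'a': either the
    letter itself, or a bracketed product of two smaller such words."""
    if n == 1:
        return ['a']
    return ['(' + u + v + ')'
            for k in range(1, n)
            for u in non_assoc(k)
            for v in non_assoc(n - k)]

def catalan(n):
    return comb(2 * n, n) // (n + 1)

words = non_assoc(4)                  # n + 1 = 4 occurrences of 'a'
assert len(words) == catalan(3) == 5  # the five words listed above
assert '(((aa)a)a)' in words and '((aa)(aa))' in words
```

For a perfectly balanced product of 2^h letters the word starts with exactly h opening brackets, which is the "(^k with k approximately log_2 n" phenomenon the proof exploits.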

Preliminaries
Let ε denote the empty word. Given a finite alphabet A, let A^+ denote the set of all finite nonempty words over the alphabet A and let A* = A^+ ∪ {ε}.
Let Fac(w) denote the set of all factors of the word w ∈ A*. We have ε, w ∈ Fac(w). Let Pref(w), Suf(w) ⊆ Fac(w) denote the sets of all prefixes and suffixes of w ∈ A*, respectively. We have ε, w ∈ Pref(w) ∩ Suf(w). Let occur(w, t) denote the number of occurrences of the factor t ∈ A^+ in the word w ∈ A*.

Given two finite alphabets A_1 and A_2, we say that a map τ : A_1* → A_2* is a homomorphism if τ(uv) = τ(u)τ(v) for all u, v ∈ A_1*. It follows that in order to define a homomorphism τ, it suffices to define τ(a) for every a ∈ A_1; such a definition "naturally" extends to every word u ∈ A_1*. We say that τ is an alphabetical homomorphism if τ(a) ∈ A_2 for every a ∈ A_1. We say that τ is an erasing alphabetical homomorphism if τ(a) ∈ A_2 ∪ {ε} for every a ∈ A_1 and there is at least one a ∈ A_1 such that τ(a) = ε.

Balanced non-associative words
Let Θ = {x, y, z, p}.We reserve the symbols x, y, z, p for the letters of the alphabet Θ.It means that wherever in our article we use the symbols x, y, z, p, we refer to the letters of Θ.
Let ENW ⊆ Θ* be the language generated by the following context-free grammar, where S is the start non-terminal symbol, P is a non-terminal symbol, and x, y, z, p ∈ Θ are terminal symbols:
• S → xPPy,
• P → S | pzp | pzzp.
We call the words from ENW extended non-associative words.
Remark 3.1 Let the letter x represent an opening bracket and the letter y a closing bracket. It is easy to see that if v_1, v_2 ∈ {pzp, pzzp} ∪ ENW then xv_1v_2y ∈ ENW. Note that if w ∈ ENW then w is "well parenthesized" with brackets x and y. Also note that if w ∈ ENW, xvy ∈ Fac(w), occur(v, x) = 0, and occur(v, y) = 0, then v ∈ {pzppzzp, pzppzp, pzzppzzp, pzzppzp}.

Remark 3.2 Recall from [8] that a "standard" non-associative word on the letter a, mentioned in the introduction, can be represented as a full binary rooted tree, where every inner node represents a corresponding pair of brackets and every leaf represents the letter a. It is known that the number of inner nodes plus one is equal to the number of leaves in a full binary rooted tree.
Obviously we can also represent the extended non-associative words from ENW as full binary rooted trees, where the factors pzp and pzzp represent the leaves. It follows that if w ∈ ENW then occur(w, x) + 1 = occur(w, pzp) + occur(w, pzzp).
If w is a non-associative word on the symbol a with brackets x, y having n + 1 occurrences of a, then we get 2^{n+1} extended non-associative words by replacing each a with pzp or pzzp; for example, if K = {xxa_1a_2ya_3y | a_1, a_2, a_3 ∈ {pzp, pzzp}} then |K| = 2^3 = 8 and K ⊆ ENW. Since the number of non-associative words containing n + 1 occurrences of a is equal to the n-th Catalan number C_n [8], it is clear that the number of words w ∈ ENW with occur(w, pzp) + occur(w, pzzp) = n + 1 is equal to 2^{n+1}C_n, where n ∈ N_+.
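These counting facts can be checked mechanically; in the sketch below (our own helper, not the grammar itself) the words of ENW are built bottom-up from the leaves pzp and pzzp:

```python
from itertools import product

LEAVES = ('pzp', 'pzzp')

def enw_words(n_leaves):
    """All ENW words whose binary tree has n_leaves leaves (n_leaves >= 2);
    for n_leaves == 1 the bare leaves are returned as building blocks."""
    if n_leaves == 1:
        return list(LEAVES)
    out = []
    for k in range(1, n_leaves):
        for v1 in enw_words(k):
            for v2 in enw_words(n_leaves - k):
                out.append('x' + v1 + v2 + 'y')
    return out

# the set K from the text: words xx a1 a2 y a3 y with a_i in {pzp, pzzp}
K = {'xx' + a + b + 'y' + c + 'y' for a, b, c in product(LEAVES, repeat=3)}
assert len(K) == 8 and K <= set(enw_words(3))

# occur(w, x) + 1 = occur(w, pzp) + occur(w, pzzp) for every w in ENW
for w in enw_words(4):
    assert w.count('x') + 1 == w.count('pzp') + w.count('pzzp')

# 2^(n+1) * C_n words with n + 1 leaves; here n = 3, so 16 * 5 = 80
assert len(enw_words(4)) == 2 ** 4 * 5
```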

Let BAL ⊆ Θ* be the language generated by a context-free grammar with the start non-terminal symbol S, further non-terminal symbols T, V, Z, and terminal symbols x, y, z, p ∈ Θ. We call the words from BAL balanced words. The reason for the name "balanced" comes from the following lemma.

Lemma 3.3 If w ∈ Θ* and pwp ∈ Fac(v) for some v ∈ BAL, then occur(w, x) = occur(w, y).

Proof: The proof is by induction on j = occur(w, p). From the definition of the language BAL, it is clear that if occur(w, p) = 0 then w = y^i x^i for some i ∈ {0} ∪ N_+. Hence we have the base case for j = 0. Suppose j > 0. Then it follows that w = w_1pw_2 for some w_1, w_2 ∈ Fac(w). Since occur(w_1, p) + 1 + occur(w_2, p) = j, we have occur(w_1, p), occur(w_2, p) < j. Hence the lemma holds for both pw_1p and pw_2p and in consequence the lemma holds also for pwp. This completes the proof. ✷

Let Ω = ENW ∩ BAL ⊆ Θ*. We call the words from Ω balanced extended non-associative words.
Remark 3.4 To understand the idea of balanced extended non-associative words, suppose w ∈ Ω and let G be the full binary rooted tree that represents w (as explained in Remark 3.2). Then in G, the length of the path from the root to a leaf does not depend on the leaf; it means the number of inner nodes lying on the path from a leaf to the root is a constant for G.

Given a word w ∈ Θ*, let height(w) = max{j | x^j ∈ Fac(w)}. We call height(w) the height of w. We show that if w ∈ Ω and h is the height of w then x^h is a prefix of w and y^h is a suffix of w.

Lemma 3.6 If w ∈ Ω and h = height(w) then x^h ∈ Pref(w) and y^h ∈ Suf(w).
Proof: Since Ω ⊆ ENW, there is h' ∈ N_+ such that x^{h'}p ∈ Pref(w). To get a contradiction, suppose that h' < h. Because Ω ⊆ BAL it follows that w = x^{h'}pw_1py^hx^hpw_2 for some w_1 ∈ Fac(w), w_2 ∈ Suf(w), and occur(x^{h'}pw_1py^hx^h, x^h) = 1.

Lemma 3.3 implies that occur(pw_1p, x) = occur(pw_1p, y). Let r = x^{h'}pw_1py^h. It follows that occur(r, x) < occur(r, y). This is a contradiction, since for every prefix v ∈ Pref(w) of an extended non-associative word w ∈ ENW (a well parenthesized word) we have occur(v, x) ≥ occur(v, y). We conclude that h' = h and x^h ∈ Pref(w). In an analogous way we can show that y^h ∈ Suf(w). This completes the proof. ✷

For a word w ∈ Ω, we show the relation between the height of w and the number of occurrences of z in w.

Proposition 3.7 If w ∈ Ω and h = height(w), then 2^h ≤ occur(w, z) ≤ 2^{h+1}.

Proof: The proof is by induction on h. If h = 1 then w = xv_1v_2y for some v_1, v_2 ∈ {pzp, pzzp}; hence 2^1 ≤ occur(w, z) ≤ 2^2. Thus the proposition holds for h = 1. Suppose that the proposition holds for all h' < h and let h ≥ 2. Since Ω ⊆ ENW, it follows that h ≥ 2 implies that w = xw_1w_2y for some w_1, w_2 ∈ ENW ∪ {pzp, pzzp} with {w_1, w_2} ∩ ENW ≠ ∅. Without loss of generality suppose that w_1 ∈ ENW.
Because x h1 ∈ Pref(w 1 ) it follows that x h1+1 ∈ Pref(w).Thus h 1 + 1 = h.As we assumed that the proposition holds for all h < h, we can derive that This completes the proof.✷ Remark 3.8 Proposition 3.7 could be also proven using tree graphs as follows: There are exactly 2 h leaves in a complete tree of height h, since there is a bijection (as mentioned in Remark 3.2), there are 2 h occurrences of pzp and pzzp, and hence the number of occurrences of z in w is between 2 h and 2 h+1 .
Proposition 3.7 has the following obvious corollary.

Corollary 3.9 If w ∈ Ω, h = height(w), and n = occur(w, z), then log_2(n) − 1 ≤ h ≤ log_2(n).
Given w, u, v ∈ Θ^+, let replace(w, v, u) denote the word built from w by replacing the first occurrence of v in w by u. Formally, if occur(w, v) = 0 then replace(w, v, u) = w; if occur(w, v) = j > 0 and w = w_1vw_2, where occur(vw_2, v) = j, then replace(w, v, u) = w_1uw_2.
We prove that the set Ω(n) = {w ∈ Ω | occur(w, z) = n} of balanced extended non-associative words having n occurrences of z is nonempty for each n ≥ 2.
In principle, we construct a balanced extended non-associative word w_j having 2^j occurrences of pzzp and then we replace a certain number of occurrences of pzzp with the factor pzp to achieve the required number of occurrences of z. This completes the proof. ✷
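This construction can be carried out literally in a few lines (our own sketch; Python's `str.replace(v, u, 1)` plays the role of replace(w, v, u)):

```python
def omega_with_z(n):
    """A balanced extended non-associative word with exactly n >= 2
    occurrences of z: start from the complete tree of height j, where
    2^j <= n < 2^(j+1), with all 2^j leaves equal to pzzp (hence 2^(j+1)
    occurrences of z), then rewrite leaves pzzp -> pzp one at a time."""
    assert n >= 2
    j = n.bit_length() - 1          # 2^j <= n < 2^(j+1)
    w = 'pzzp'
    for _ in range(j):
        w = 'x' + w + w + 'y'       # z-count doubles each level
    for _ in range(2 ** (j + 1) - n):    # at most 2^j rewrites needed
        w = w.replace('pzzp', 'pzp', 1)  # replace(w, pzzp, pzp)
    return w

for n in range(2, 50):
    assert omega_with_z(n).count('z') == n
```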

Dissection of infinite languages
In [10] it was shown that every constantly growing language can be dissected by some regular language:

Lemma 4.1 ([10]) Every constantly growing language is REG-dissectible.
In the next proposition we show under which conditions we can dissect a language L ⊆ Ω by a regular language. Informally, the proposition says that a geometrically growing subset of balanced extended non-associative words is REG-dissectible.

Proposition 4.2 Suppose that β ∈ N_+ with β ≥ 2 and L ⊆ Ω is an infinite language such that for each w_1 ∈ L there is w_2 ∈ L with occur(w_1, z) < occur(w_2, z) ≤ β occur(w_1, z). Then there is a regular language R ⊆ Θ* such that R dissects L.
Proof: Let α ∈ N_+ be such that 2^{α−1} < β ≤ 2^α. Obviously such α exists and is uniquely determined. Given w_1 ∈ L, let w_2 ∈ L be such that

n_1 < n_2 ≤ βn_1,    (1)

where n_1 = occur(w_1, z) and n_2 = occur(w_2, z). From the conditions of the proposition such w_2 exists. Without loss of generality suppose that log_2 n_1 ≥ 2. Note that there are only finitely many words v ∈ Ω with log_2(occur(v, z)) < 2.
Let h_1 = height(w_1) and h_2 = height(w_2). Corollary 3.9 implies that

log_2(n_i) − 1 ≤ h_i ≤ log_2(n_i) for i ∈ {1, 2}.    (2)

From (1) and (2) it follows that

h_1 ≤ h_2 ≤ h_1 + α + 1.    (3)

Since we selected w_1 arbitrarily, it follows from (3) that H = {x^{height(w)} | w ∈ L} is a constantly growing language. Lemma 4.1 implies that H ⊆ {x}* is REG-dissectible. Let R ⊆ {x}* be a regular language that dissects H. Let R' = {rpv | r ∈ R and v ∈ Θ*} ⊆ Θ*. Obviously R' is a regular language that dissects L; to see this, recall that if w ∈ Ω, i ∈ N_+, a ∈ Θ \ {x}, and x^ia ∈ Pref(w) then a = p. This completes the proof. ✷

We step to the proof of the main theorem of the current article.
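As a quick numerical illustration of the preceding proof (the sample length set is our own; here β = 3, hence α = 2), taking ⌈log_2 n⌉ of a geometrically growing set of z-counts yields a set of heights with bounded gaps, i.e. a constantly growing set:

```python
from math import ceil, log2

lengths = [2]                         # a 3-geometrically growing set of z-counts
while lengths[-1] < 10 ** 9:
    lengths.append(3 * lengths[-1] - 1)   # next value within factor beta = 3

heights = sorted({ceil(log2(n)) for n in lengths})   # cf. Corollary 3.9
gaps = [b - a for a, b in zip(heights, heights[1:])]
alpha = 2                             # 2^(alpha - 1) < 3 <= 2^alpha
assert max(gaps) <= alpha + 1         # constant gaps: a constantly growing set
```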
Proof of Theorem 1.2: Without loss of generality let Σ = {z} be the alphabet with the letter z ∈ Θ. Let π : Θ* → Σ* be an erasing alphabetical homomorphism defined as follows: π(z) = z and π(x) = π(y) = π(p) = ε. Thus π erases all letters except for the letter z. From the definitions of ENW, BAL, and Ω, it follows that the language Ω is an intersection of the two context-free languages ENW and BAL. Let M_1 = ENW and let M_2 = BAL; recall that M_1 and M_2 are used in the statement of Theorem 1.2.
Let L' = {w ∈ Ω | π(w) ∈ L}. Note that L' contains w ∈ Ω if and only if there is a word v ∈ L such that the number of occurrences of z in w is equal to the length of v; formally occur(w, z) = |v|. Proposition 3.11 implies that L' is an infinite language.
Let c ∈ R_+ be such that for every u ∈ L there exists v ∈ L with |u| < |v| ≤ c|u|. Since L is a geometrically growing language, we know that such c exists. Let β ∈ N_+ be such that β ≥ 2 and β ≥ c. Hence L is a β-geometrically growing language. It follows that if w_1 ∈ L' then there is a word w_2 ∈ L' with occur(w_1, z) < occur(w_2, z) ≤ β occur(w_1, z).
Then Proposition 4.2 implies that there is a regular language R ⊆ Θ* that dissects L'. This implies that the homomorphic image π(R ∩ ENW ∩ BAL) dissects the language L. This completes the proof. ✷
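The shape of the dissecting regular language used in the proof of Proposition 4.2, {rpv | r ∈ R and v ∈ Θ*} with R ⊆ {x}*, can be made concrete; in the toy instance below (our choice; any regular subset of {x}* works the same way) R = (xx)*, so a word of Ω is selected exactly when its height is even:

```python
import re

def complete(h, leaf='pzzp'):
    """Word of Omega for the complete binary tree of height h."""
    w = leaf
    for _ in range(h):
        w = 'x' + w + w + 'y'
    return w

# R = (xx)*  lifted to  R' = { r p v | r in R, v in Theta* }
R_prime = re.compile(r'(?:xx)*p[xyzp]*\Z')

for h in range(1, 8):
    w = complete(h)
    # w begins with x^h p (Lemma 3.6), so w is in R' iff h is even
    assert bool(R_prime.match(w)) == (h % 2 == 0)
```

Since every word of Ω starts with x^h followed by p, membership in R' depends only on the height, which is why dissecting the set of heights dissects the language itself.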

Dissection and i-separation

As mentioned in the introduction, for convenience we present a generalization of Lemma 1.1 ([10, Lemma 5.1]), which demonstrates the connection between dissectibility and i-separation. The presented proof is just a copy of the proof in [10] with REG changed to C.

Lemma 5.1 Let A, B, and C be any three language families and assume that A − B is C-dissectible. It then holds that, for any A ∈ A and any B ∈ B, if A i-covers B, then there exists a language in E that i-separates i(B, A), where E expresses the set {B ∪ (C ∩ A) | A ∈ A, B ∈ B, C ∈ C}. In other words, E i-separates i(B, A).

Proof: Let A ∈ A and B ∈ B be two infinite languages. Let D = A − B and assume that D is infinite. Our assumption guarantees the existence of a language C ∈ C for which C dissects D. We set E = B ∪ (A ∩ C). Since C dissects D, it follows that |(A ∩ C) − B| = ∞ and |(A − B) − C| = ∞. It follows that B ⊆ E ⊆ A and |A − E| = |E − B| = ∞. Thus, E i-separates i(B, A). Since C ∈ C, E belongs to the language family E. This completes the proof. ✷

Open questions

In the current article we applied a new idea of dissecting a language L by a homomorphic image of a language L' from the family of languages CFL_2. The idea can be generalized for every CFL_k, where k ∈ N_+. Let us introduce a notation for this technique. Given an alphabet A and a positive integer k, let hiCFL_{k,A} = {L ⊆ A* | there are an alphabet A', context-free languages L_i ⊆ A'* with i ∈ {1, 2, ..., k}, and a homomorphism φ : A'* → A* such that L = φ(L_1 ∩ L_2 ∩ ... ∩ L_k)}.

The prefix "hi" stands for "homomorphic image". Using the set hiCFL_{k,A} we can restate Theorem 1.2 as follows:

Corollary 6.1 Every geometrically growing language over an alphabet A with |A| = 1 is hiCFL_{2,A}-dissectible.

Moreover, in the current article we introduced the notion of a geometrically growing language. We generalize this concept as follows. Let Π = {σ : N_+ → N_+ | σ(n) > n for all n ∈ N_+}. Given σ ∈ Π, we say that a language L is σ-growing if for every word u ∈ L there is a word v ∈ L such that |u| < |v| ≤ σ(|u|).

Remark 6.2 Let σ(n) = cn for some c ∈ N_+ with c > 1. Obviously σ ∈ Π. A language L is a σ-growing language if and only if L is c-geometrically growing.

Let C be a family of languages. Using the notion of σ-growing languages, we present the following open questions and problems:

• Find σ ∈ Π such that there exists a σ-growing language L that is not C-dissectible, or show that such σ does not exist.
  – If σ' ∈ Π and σ'(n) > σ(n) for all n ∈ N_+, is there then a σ'-growing language L that is not C-dissectible?

Concerning the family of languages C, we are particularly interested in REG, CFL_k, and hiCFL_{k,A} for all k ∈ N_+. However, the questions may be of interest also for other families.

We list some more open questions and problems in spite of the fact that some of them are already mentioned (directly or indirectly) above.

• Is the family of geometrically growing languages REG-dissectible?