Coding Partitions

Motivated by the study of decipherability conditions for codes weaker than Unique Decipherability (UD), we introduce the notion of coding partition. Such a notion generalizes that of UD code and, for codes that are not UD, allows one to recover “unique decipherability” at the level of the classes of the partition. By taking into account the natural order between the partitions, we define the characteristic partition of a code X as the finest coding partition of X. This leads us to introduce the canonical decomposition of a code into at most one “unambiguous” component and other (if any) “totally ambiguous” components. In the case where the code is finite, we give an algorithm for computing its canonical partition. This, in particular, allows one to decide whether a given partition of a finite code X is a coding partition. This last problem is then approached in the case where the code is a rational set. We prove its decidability under the hypothesis that the partition contains a finite number of classes and each class is a rational set. Moreover, we conjecture that the canonical partition satisfies this hypothesis. Finally, we also consider some relationships between coding partitions and varieties of codes.


Introduction
The theory of uniquely decipherable (UD) codes, born in the context of information theory, plays a relevant role also in language theory and combinatorics on words (see [1]). In spite of their simple definition, the structure of UD codes is still largely unknown.
In recent years, several papers have considered codes that are not UD. The study of the corresponding ambiguities is, in certain cases, motivated by investigations on natural languages (see [3]). From another point of view, the classification of ambiguities is related to decipherability conditions weaker than UD, introduced to handle some special problems in information transmission (see [7,5,10,8]). More generally, the study of ambiguities can help in understanding the structure of UD codes.
In this paper we introduce the notion of coding partition of a code X (here we call code an arbitrary set of words). Given a partition $P = \{X_1, X_2, \ldots\}$ of a code X, P is, roughly speaking, a coding partition of X if any word $w \in X^*$ has a unique factorization $w = z_1 z_2 \cdots z_t$, where each "block" $z_i$ is a concatenation of words from one class of P, and consecutive "blocks" are concatenations of words from different classes of P. The notion of coding partition generalizes that of UD code: indeed, UD codes correspond to the extremal case in which each class contains exactly one element. In general, for codes that are not UD, the notion of coding partition allows one to recover "unique decipherability" at the level of the classes of the partition. In other words, this notion gives a tool to localize the ambiguities of a code that is not UD: the ambiguities are confined inside the individual classes of the partition, and a sort of mutual unambiguity holds between the different classes.
By taking into account the natural ordering between the partitions of a set X, where finer is higher, we show that the coding partitions of X form a complete lattice. As a consequence, given a code X, we can define the finest coding partition of X. It is called the characteristic partition of X and is denoted by P(X).
The structure of P(X) gives useful information about the coding properties of X. In particular, one extremal case (each class of P(X) contains only one element) corresponds to UD codes. The opposite extremal case (P(X) contains only one class) gives rise to the definition of totally ambiguous code. These considerations lead us to define a canonical decomposition of a code into at most one unambiguous component and a (possibly empty) set of totally ambiguous components.
In Sec. 3, we consider the case of a finite code X and present an algorithm that gives the canonical decomposition of X. In particular, the algorithm also allows one to decide: 1) given a partition P of X, whether P is a coding partition of X, and 2) given a code X, whether it is totally ambiguous.
In Sec. 4, we consider the rational case. We take a partition $P = \{X_1, X_2, \ldots, X_n\}$ of a rational code X into a finite number of classes such that all classes $X_i$ are rational sets. We associate a rational relation with such a partition, and we prove that the partition is coding if and only if this rational relation is a function. This allows one to decide whether the partition is coding.
In the last section, we consider the relationships between coding partitions and varieties of codes (see [5]).We prove that, given a coding partition of a code X, if each class of the partition belongs to a given variety of codes, then X belongs to the same variety.

Partitions of a code
Let A be a finite alphabet. Let $A^*$ denote the free monoid generated by A, i.e. the set of words over the alphabet A, and let $A^+ = A^* \setminus \{\varepsilon\}$.
A code X over A is a subset of $A^+$. The words of X are called code words and the elements of $X^*$ messages, where $X^*$ denotes the submonoid of $A^*$ generated by X, i.e. the set of words obtained by concatenating elements of X.
A code X is said to be uniquely decipherable (UD) if every message has a unique factorization into code words, i.e. the equality $x_1 x_2 \cdots x_n = y_1 y_2 \cdots y_m$, with $x_1, x_2, \ldots, x_n, y_1, y_2, \ldots, y_m \in X$, implies $n = m$ and $x_1 = y_1, \ldots, x_n = y_n$.
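To make the definition concrete, here is a minimal Python sketch (the function name `factorizations` is our own) that lists every factorization of a given word into code words, with words as strings over the alphabet. By the definition above, X is UD exactly when no message has more than one factorization; the sketch only inspects the words it is given, so it can refute unique decipherability but never prove it.

```python
from functools import lru_cache


def factorizations(word, code):
    """Return every factorization of `word` into code words, as tuples."""
    words = tuple(sorted(set(code)))
    assert all(words), "code words must be non-empty (X is a subset of A+)"

    @lru_cache(maxsize=None)
    def from_pos(i):
        # All factorizations of the suffix word[i:].
        if i == len(word):
            return ((),)
        return tuple((x,) + rest
                     for x in words if word.startswith(x, i)
                     for rest in from_pos(i + len(x)))

    return list(from_pos(0))


# X = {0, 01, 10} is not UD: the message 010 has two factorizations.
print(factorizations("010", {"0", "01", "10"}))  # [('0', '10'), ('01', '0')]
# X = {0, 10, 11} is a prefix code, hence UD: a single factorization.
print(factorizations("1011", {"0", "10", "11"}))  # [('10', '11')]
```

Memoizing on the starting position keeps the search linear in the number of (position, code word) pairs, apart from the size of the output itself.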
The theory of UD codes has been widely developed, and it is closely related to problems in automata theory, combinatorics on words, formal languages and semigroup theory. A complete treatment of this theory can be found in [1].
Remark 1 In the literature, the word code generally denotes a UD code (see [1]). In this paper we also take into account decipherability conditions weaker than UD. This motivates our choice to call code an arbitrary subset of $A^+$.
Let X be a code and let $P = \{X_1, X_2, \ldots, X_i, \ldots\}$ be a partition of X, i.e. $\bigcup_{i \ge 1} X_i = X$ and $X_i \cap X_j = \emptyset$ for $i \neq j$.
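We say that a partition P is concatenatively independent if $X_i^+ \cap X_j^+ = \emptyset$ for $i \neq j$. The partition P is called a coding partition if it is concatenatively independent and, moreover, any element $w \in X^+$ has a unique P-factorization, i.e. if $w = z_1 z_2 \cdots z_s = u_1 u_2 \cdots u_t$, where $z_h \in X_{i_h}^+$, $u_k \in X_{j_k}^+$, $i_h \neq i_{h+1}$ and $j_k \neq j_{k+1}$, implies $s = t$ and $z_h = u_h$ for $h = 1, \ldots, s$.

Remark 2 If $P = \{X_1, X_2, \ldots\}$ is a coding partition of X, in general neither X nor the sets $X_i$ are UD codes. On the other hand, P may fail to be a coding partition even though all $X_i$ are UD. This is shown by the following examples.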
Remark 3 If X is a UD code, then every partition of X is a coding partition. Therefore if, e.g., X is an infinite UD code, it is possible to have coding partitions with infinitely many classes.
Let X be a code and let $x_1 x_2 \cdots x_s = y_1 y_2 \cdots y_t$ be two factorizations into code words of a message $w \in X^+$. In the sequel, when no confusion arises, we will sometimes denote by w both the "word" w and the relation $x_1 x_2 \cdots x_s = y_1 y_2 \cdots y_t$. Such a relation is called prime if for all $i < s$ and for all $j < t$ one has $x_1 x_2 \cdots x_i \neq y_1 y_2 \cdots y_j$ . . .
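Assuming the standard fact that every relation factors uniquely into prime relations (the proof of Theorem 1 cites it as Remark 4), the factors can be found by cutting the relation at every position where a prefix $x_1 \cdots x_i$ coincides with a prefix $y_1 \cdots y_j$. The following Python sketch does exactly that; the names `prime_factors` and `is_prime` are our own.

```python
from itertools import accumulate


def prime_factors(xs, ys):
    """Split the relation x1...xs = y1...yt into its prime relations.

    A cut is placed at every position where a prefix of one side equals a
    prefix of the other; the pieces between consecutive cuts are prime.
    """
    assert "".join(xs) == "".join(ys), "both sides must spell the same message"
    ends_x = list(accumulate(len(x) for x in xs))  # end positions of the x's
    ends_y = list(accumulate(len(y) for y in ys))  # end positions of the y's
    cuts = sorted(set(ends_x) & set(ends_y))       # common prefixes, incl. |w|
    pieces, prev = [], 0
    for cut in cuts:
        left = tuple(x for x, e in zip(xs, ends_x) if prev < e <= cut)
        right = tuple(y for y, e in zip(ys, ends_y) if prev < e <= cut)
        pieces.append((left, right))
        prev = cut
    return pieces


def is_prime(xs, ys):
    """True when no proper prefixes of the two sides coincide."""
    return len(prime_factors(xs, ys)) == 1


# 10.111.10 = 101.111.0 (the relation used in Remark 8) is prime.
print(is_prime(("10", "111", "10"), ("101", "111", "0")))  # True
# 0.10.11 = 01.0.11 splits into 0.10 = 01.0 and the trivial 11 = 11.
print(prime_factors(("0", "10", "11"), ("01", "0", "11")))
```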
Remark 5 Let $P = \{X_1, X_2, \ldots\}$ be a partition of a code X and let $w \in X^+$ be a message. If w has a unique factorization into code words, then it has a unique P-factorization: the consecutive words belonging to the same class of the partition form a block. Hence, if a message w has two distinct P-factorizations, then it has at least two distinct factorizations into code words.
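Remark 5 also describes how to obtain P-factorizations in practice: take a factorization into code words and merge each run of consecutive words from the same class into one block. A minimal Python sketch of this grouping step follows (function names are our own; a partition is given as a list of sets of code words, and a class is identified by its position in that list).

```python
from functools import lru_cache
from itertools import groupby


def factorizations(word, code):
    """All factorizations of `word` into (non-empty) code words."""
    words = tuple(sorted(set(code)))

    @lru_cache(maxsize=None)
    def from_pos(i):
        if i == len(word):
            return ((),)
        return tuple((x,) + rest for x in words if word.startswith(x, i)
                     for rest in from_pos(i + len(x)))

    return list(from_pos(0))


def p_factorizations(word, partition):
    """Distinct P-factorizations of `word`, as tuples of (class index, block)."""
    class_of = {x: i for i, cls in enumerate(partition) for x in cls}
    result = set()
    for f in factorizations(word, class_of):
        # Merge consecutive code words that lie in the same class.
        blocks = tuple((i, "".join(group))
                       for i, group in groupby(f, key=class_of.__getitem__))
        result.add(blocks)
    return result


# With P = {{0, 01}, {10}}, the message 010 has two P-factorizations.
print(p_factorizations("010", [{"0", "01"}, {"10"}]))
```

With the trivial partition of the same code, the two factorizations of 010 collapse into a single block, in line with Remark 6.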
Theorem 1 Let $P = \{X_1, X_2, \ldots\}$ be a partition of a code X. The partition P is a coding partition of X if and only if for every prime relation $x_1 x_2 \cdots x_s = y_1 y_2 \cdots y_t$ there exists an integer h such that $x_i, y_j \in X_h$ for all $i \le s$ and all $j \le t$.

Proof: ⇒ Let $x_1 x_2 \cdots x_s = y_1 y_2 \cdots y_t$ be a prime relation and let $w = z_1 z_2 \cdots z_p$ be the P-factorization of the corresponding message. We prove that $p = 1$. If $p > 1$, by the uniqueness of the P-factorization, there exist $l < s$, $m < t$ such that $x_1 x_2 \cdots x_l = z_1 = y_1 y_2 \cdots y_m$, but this contradicts the fact that the relation is prime.

⇐ We first prove that the sets $X_i$ are concatenatively independent. Suppose, by contradiction, that there exists a word $w \in X_i^+ \cap X_j^+$ with $i \neq j$, say $w = x_1 \cdots x_s = y_1 \cdots y_t$ with $x_1, \ldots, x_s \in X_i$ and $y_1, \ldots, y_t \in X_j$. By Remark 4 this relation is a product of prime relations, and the first of them involves both $x_1 \in X_i$ and $y_1 \in X_j$, contradicting the hypothesis.

We now show that every message has a unique P-factorization. Suppose, by contradiction, that $w \in X^+$ is a message with two distinct P-factorizations $w = z_1 z_2 \cdots z_s = u_1 u_2 \cdots u_t$. Since the sets $X_i$ are concatenatively independent, we can assume that $z_1 \neq u_1$. We can then suppose that $|u_1| < |z_1|$, and so $t \ge 2$. By Remark 5, there exist at least two different factorizations of w into code words. Given two different factorizations of w, we have a relation that, by Remark 4, can be uniquely factorized as a product of prime relations . . . ; since . . . is a prime relation and $u_1$ and $u_2$ belong to different sets of the partition, we have from the hypothesis that $p \ge 2$ and no factor $v_h$ can cross $u_1$, i.e. $v_h$ cannot be factorized as $v_h = xy$, with x a suffix of $u_1$ and y a prefix of $u_2$. In particular i and $v_{k+1}$ is prime. So we have a contradiction, and this concludes the proof. □
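Theorem 1 suggests a simple bounded experiment for a finite code: enumerate all messages up to some length, compare every pair of their factorizations, split each resulting relation into prime relations, and report the first prime relation whose code words do not all lie in one class. The Python sketch below does this (names are our own). It is only a semi-test: finding a violation proves that P is not coding, while finding none up to `max_len` proves nothing; the exact procedure for finite codes is the algorithm of the next section.

```python
from functools import lru_cache
from itertools import accumulate, combinations


def factorizations(word, code):
    """All factorizations of `word` into (non-empty) code words."""
    words = tuple(sorted(set(code)))

    @lru_cache(maxsize=None)
    def from_pos(i):
        if i == len(word):
            return ((),)
        return tuple((x,) + rest for x in words if word.startswith(x, i)
                     for rest in from_pos(i + len(x)))

    return list(from_pos(0))


def prime_factors(xs, ys):
    """Cut the relation xs = ys at every common prefix."""
    ends_x = list(accumulate(map(len, xs)))
    ends_y = list(accumulate(map(len, ys)))
    pieces, prev = [], 0
    for cut in sorted(set(ends_x) & set(ends_y)):
        pieces.append((tuple(x for x, e in zip(xs, ends_x) if prev < e <= cut),
                       tuple(y for y, e in zip(ys, ends_y) if prev < e <= cut)))
        prev = cut
    return pieces


def messages(code, max_len):
    """All messages (words of X+) of length at most max_len."""
    found, frontier = set(), {""}
    while frontier:
        frontier = {w + x for w in frontier for x in code
                    if len(w) + len(x) <= max_len} - found
        found |= frontier
    return found


def theorem1_violation(partition, max_len):
    """First prime relation (up to max_len) that mixes classes, or None."""
    class_of = {x: i for i, cls in enumerate(partition) for x in cls}
    for w in sorted(messages(class_of, max_len), key=lambda v: (len(v), v)):
        for f, g in combinations(factorizations(w, class_of), 2):
            for left, right in prime_factors(f, g):
                if len({class_of[x] for x in left + right}) > 1:
                    return left, right
    return None


# Homophonic partition of Remark 8: not coding (a mixed prime relation exists).
print(theorem1_violation([{"0", "10", "101"}, {"111"}], 7))
```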

Remark 6
The trivial partition (with a single class) is a coding partition. Moreover, X is a UD code if and only if the discrete partition of X (whose classes are singletons) is a coding partition. In this sense, the notion of coding partition generalizes the notion of UD code to the partitions of a set.

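Given a partition P of a code X, a cross-section of P is a subset $Y \subseteq X$ containing exactly one element of each class of P.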
Theorem 2 If P is a coding partition of X, then any cross-section Y of P is a UD code.
Proof: Let $Y \subseteq X$ be a cross-section of $P = \{X_1, X_2, \ldots\}$. Assume that Y is not UD; then there exists a word $w \in Y^+$ with two distinct factorizations $w = y_1 y_2 \cdots y_s = y'_1 y'_2 \cdots y'_t$, with $y_i, y'_j \in Y$, and we can assume that $y_1 \neq y'_1$. Of course $w \in X^+$, but these factorizations may not be P-factorizations, because it is possible that $y_i = y_{i+1}$ for some $1 \le i \le s - 1$ or $y'_j = y'_{j+1}$ for some $1 \le j \le t - 1$. We obtain P-factorizations by grouping each run of repeated words into a block, and these two P-factorizations are distinct: indeed, since $y_1 \neq y'_1$ and Y is a cross-section, $y_1$ and $y'_1$ lie in different classes, so the first blocks are different. Since P is, by hypothesis, a coding partition, we have a contradiction, and so Y must be a UD code. □

Remark 7 The converse of the previous theorem does not hold in general, i.e. there exist non-coding partitions P of a code X such that all the cross-sections of P are UD codes. This is shown by the partition in Example 2. As a consequence, in order to decide whether a partition P of a finite code X is coding, it does not suffice to test whether all cross-sections of P are UD codes. An algorithm to decide whether a partition is coding will be given in the next section.
Recall that there is a natural order between the partitions of a set X: if P 1 and P 2 are two partitions of X, P 1 ≤ P 2 if the elements of P 1 are unions of elements of P 2 .
Theorem 3 The set of the coding partitions of a code X is a complete lattice.
Proof: Let L(X) be the set of all coding partitions of a code X and let $F = \{P_i \mid P_i \in L(X),\ i \in I\}$ be a family of coding partitions of X. Since the partitions of a set form a complete lattice, it is sufficient to prove that both the meet $M = \bigwedge_{i \in I} P_i$ and the join $J = \bigvee_{i \in I} P_i$ belong to L(X). Let $x_1 x_2 \cdots x_s = y_1 y_2 \cdots y_t$ be a prime relation between code words and let S be the set of code words occurring in it. By Theorem 1, for every $i \in I$ there exists a class $X_{h_i} \in P_i$ such that $S \subseteq X_{h_i}$. Since $M \le P_i$ for all $i \in I$, there exists a class $X_k \in M$ such that $S \subseteq X_k$; on the other hand, since the classes of J are the nonempty intersections of classes of the $P_i$, there exists $X_l \in J$ such that $S \subseteq X_l$. By Theorem 1 again, M and J are coding partitions. □

As a consequence of the previous theorem, given a code X, we can define the finest coding partition of X. It is called the characteristic partition of X and it is denoted by P(X).
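For a finite set, the two lattice operations used in the proof of Theorem 3 are easy to compute. In the order used here (finer is higher), the join is the common refinement and the meet is the finest common coarsening. The Python sketch below (names are our own, partitions given as iterables of sets over the same ground set) implements both; by Theorem 3, applying them to coding partitions yields coding partitions.

```python
def join(p, q):
    """Least upper bound (finer is higher): the common refinement of p and q."""
    return {frozenset(a & b) for a in p for b in q if a & b}


def meet(p, q):
    """Greatest lower bound: the finest partition coarser than p and q."""
    parent = {x: x for cls in p for x in cls}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Every class of p or q must end up inside one class of the meet.
    for cls in list(p) + list(q):
        first, *rest = cls
        for y in rest:
            parent[find(y)] = find(first)

    classes = {}
    for x in parent:
        classes.setdefault(find(x), set()).add(x)
    return {frozenset(c) for c in classes.values()}


p = [{"a", "b"}, {"c", "d"}]
q = [{"a"}, {"b", "c"}, {"d"}]
print(join(p, q))  # four singletons
print(meet(p, q))  # a single class {a, b, c, d}
```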
Then, as remarked before, X is a UD code if and only if P(X) is the discrete partition. The opposite extremal case gives rise to the next definition.
A code X is called ambiguous if it is not UD. It is called totally ambiguous if $|X| > 1$ and P(X) is the trivial partition.
The code X = {01, 10, 1} is totally ambiguous, as the reader can easily verify using Theorem 1: the relation $1 \cdot 01 = 10 \cdot 1$ is prime and involves all three code words.

Remark 8
In [11], Weber and Head introduced the notion of numerically decipherable code: a code X is numerically decipherable (ND) if any two factorizations into code words of each message over X involve the same number of words. In the same paper the authors introduced the notion of homophonic partition of ND codes. The notion of coding partition is introduced here for any code, i.e. for an arbitrary set X of words, and differs from that of homophonic partition even when X is an ND code. Indeed, in [11] the authors show that {{0, 10, 101}, {111}} is the finest homophonic partition of the ND code $X = \{x_1, x_2, x_3, x_4\} = \{0, 10, 101, 111\}$. Since $x_2 x_4 x_2 = x_3 x_4 x_1$ is a prime relation involving all four code words, Theorem 1 implies that P(X) is the trivial partition.
A less trivial example of a totally ambiguous code is X = {000, 0010, 001, 10, 1}, which we will study in an example in the next section.
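Theorem 4 Let $P(X) = \{X_1, X_2, \ldots\}$ be the characteristic partition of a code X. If $|X_i| > 1$ for some $i \ge 1$, then $X_i$ is a totally ambiguous code.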
Proof: Suppose, without loss of generality, that $|X_1| > 1$. Suppose, by contradiction, that $X_1$ is not totally ambiguous, and let $P(X_1) = \{Y_1, Y_2, \ldots\}$ be the characteristic partition of $X_1$, with $|P(X_1)| \ge 2$. The partition $P = \{Y_1, Y_2, \ldots, X_2, X_3, \ldots\}$ is then a coding partition of X with $P(X) < P$. Since this contradicts the definition of P(X), $X_1$ must be totally ambiguous. □

The property that every proper subset Y of an ambiguous code X is a UD code is related to the property of X being totally ambiguous, as the next theorem shows.
Theorem 5 Let X be a code such that all proper subsets Y of X are UD codes. Then either X is a UD code or it is totally ambiguous.
Proof: Let X be a non-UD code. Let $P(X) = \{X_1, X_2, \ldots\}$ be the characteristic partition of X and suppose, by contradiction, that X is not totally ambiguous. Then $|P(X)| \ge 2$ and, since X is not UD, there exists $X_i$, $i \ge 1$, such that $|X_i| > 1$. We have $X_i \subsetneq X$ and, by Theorem 4, $X_i$ is totally ambiguous, hence not UD. This contradicts the hypothesis. □

The converse implication does not hold in general, as shown by the example given just above: X = {000, 0010, 001, 10, 1} is totally ambiguous, but its proper subset {000, 001, 10, 1} is not a UD code (for instance, $10 \cdot 001 = 1 \cdot 000 \cdot 1$).
Let X be a code and let P(X) be its characteristic partition. Let $X_0$ be the union of all classes of P(X) having only one element, i.e. of all classes $Z \in P(X)$ such that $|Z| = 1$. The code $X_0$ is a UD code and is called the unambiguous component of X. From P(X) one then derives another partition of X, $P_C(X) = \{X_0, X_1, \ldots\}$, where $|X_i| > 1$ for $i \ge 1$. By Theorem 4, the sets $X_i$ with $i \ge 1$ are totally ambiguous; they are called the totally ambiguous components of X. The partition $P_C(X)$ is called the canonical partition of X: it defines a canonical decomposition of a code X into at most one unambiguous component and a (possibly empty) set of totally ambiguous components. Roughly speaking, if a code X is not UD, its canonical decomposition on the one hand separates the unambiguous component of the code (if any) and, on the other, localizes the ambiguities inside the totally ambiguous components. If, on the contrary, X is UD, then its canonical decomposition contains only the unambiguous component $X_0$.
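The passage from P(X) to $P_C(X)$ is purely mechanical: pool the singleton classes into $X_0$ and keep the larger classes as they are. A minimal Python sketch (the function name is our own; classes are given as sets of code words):

```python
def canonical_from_characteristic(characteristic):
    """Return (X0, components) from the classes of a characteristic partition."""
    # X0: union of the one-element classes (the unambiguous component).
    x0 = frozenset(x for cls in characteristic if len(cls) == 1 for x in cls)
    # Classes with more than one element: the totally ambiguous components.
    components = [frozenset(cls) for cls in characteristic if len(cls) > 1]
    return x0, components


print(canonical_from_characteristic([{"a"}, {"b"}, {"c", "d"}]))
```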

Computing the canonical partition of a finite code
In this section we present an algorithm that computes the canonical partition of a finite code.
The algorithm is very close to the Sardinas-Patterson algorithm for testing whether a code is UD, and to its variants such as the domino graph and the simplified domino graph (see [6,5]). Informally, as in the Sardinas-Patterson algorithm, at each step we construct a set of suffixes of code words by comparing the code words with the suffixes constructed in the previous step. Here, in addition, for each new suffix u generated by the procedure, we record in a set S (associated with the suffix u) the indices of the code words involved in the generation of that suffix. We thus construct a sequence of sets whose elements are pairs of the form (u, S), where u is a suffix of code words and S is a set of indices of code words.
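For reference, the classical Sardinas-Patterson test on which the algorithm builds can be sketched in Python as follows. This is the textbook formulation ($U_1 = X^{-1}X \setminus \{\varepsilon\}$, $U_{n+1} = X^{-1}U_n \cup U_n^{-1}X$, and X is UD if and only if no $U_n$ contains the empty word), not code taken from this paper; the function names are our own.

```python
def left_quotients(us, vs):
    """The set {u^{-1} v : u in us, v in vs, u a prefix of v}."""
    return {v[len(u):] for u in us for v in vs if v.startswith(u)}


def is_ud(code):
    """Sardinas-Patterson test for a finite code given as a set of strings."""
    code = set(code)
    current = left_quotients(code, code) - {""}  # U_1
    seen = set()
    while current:
        if "" in current:
            return False  # some U_n contains the empty word: X is not UD
        seen |= current
        # U_{n+1} = X^{-1} U_n  union  U_n^{-1} X; only new suffixes matter.
        nxt = left_quotients(code, current) | left_quotients(current, code)
        current = nxt - seen
    return True


print(is_ud({"0", "10", "11"}))  # True (prefix code)
print(is_ud({"01", "10", "1"}))  # False
```

Termination follows because every element of every $U_n$ is a suffix of a code word, so only finitely many distinct sets can occur.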
Let $X = \{x_1, x_2, \ldots, x_k\}$ be a finite code. We construct a sequence $(U_i)_{i \ge 1}$, where each $U_i$ is a set of pairs of the form (u, S), with $u \in A^*$ and $S \subseteq \{1, 2, \ldots, k\}$. The sequence $(U_i)_{i \ge 1}$ is defined inductively as follows: . . . and, for $n \ge 1$, . . . Consider now the family of subsets of the set $\{1, 2, \ldots, k\}$ and denote by $S_X$ the set . . . Let τ be the transitive closure of the relation defined on the set $S_X$ as follows: . . . Set $R_0 = \{1, 2, \ldots, k\} \setminus S_X$ and let $R_1, R_2, \ldots, R_n$ be the equivalence classes of τ in $S_X$. Clearly $\{R_0, R_1, \ldots, R_n\}$ defines a partition of the set $\{1, 2, \ldots, k\}$. This partition in turn induces a partition of the set $X = \{x_1, x_2, \ldots, x_k\}$: . . . We denote by R(X) the partition produced by the algorithm.
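The inductive definitions of $U_1$, $U_{n+1}$ and τ are not legible here, so the following Python sketch rests on an assumed reading that matches the informal description above: $U_1$ holds the pairs $(x_i^{-1}x_j, \{i, j\})$ with $x_i$ a proper prefix of $x_j$; $U_{n+1}$ extends each pair $(u, S) \in U_n$ with $u \neq \varepsilon$ by a code word $x_i$, via $x_i^{-1}u$ or $u^{-1}x_i$, adding i to S; and τ merges index sets that share an index. Each pair $(\varepsilon, S)$ then records the words of one prime relation. The function name is our own, and this is a sketch of the idea, not the authors' implementation.

```python
def canonical_partition(code):
    """Return (X0, totally ambiguous components) of a finite code (sketch)."""
    words = sorted(set(code))
    k = len(words)
    # U_1: (x_i^{-1} x_j, {i, j}) for x_i a proper prefix of x_j.
    current = {(xj[len(xi):], frozenset({i, j}))
               for i, xi in enumerate(words)
               for j, xj in enumerate(words)
               if i != j and xj.startswith(xi)}
    seen, s_x = set(), []
    while current:
        seen |= current
        nxt = set()
        for u, s in current:
            if u == "":
                s_x.append(s)  # (eps, S): S indexes the words of a prime relation
                continue
            for i, x in enumerate(words):
                if u.startswith(x):      # x^{-1} u
                    nxt.add((u[len(x):], s | {i}))
                elif x.startswith(u):    # u^{-1} x
                    nxt.add((x[len(u):], s | {i}))
        current = nxt - seen  # finitely many pairs, so this terminates

    # tau: merge index sets sharing an index (union-find on indices).
    parent = list(range(k))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for s in s_x:
        first, *rest = s
        for b in rest:
            parent[find(b)] = find(first)

    covered = set().union(*s_x)
    x0 = frozenset(words[i] for i in range(k) if i not in covered)
    groups = {}
    for i in covered:
        groups.setdefault(find(i), set()).add(words[i])
    return x0, [frozenset(g) for g in groups.values()]


print(canonical_partition({"a", "ab", "ba", "c"}))  # X0 = {c}, one component
```

On the codes discussed earlier, this reading reproduces the stated results: {01, 10, 1}, {0, 10, 101, 111} and {000, 0010, 001, 10, 1} each come out as a single totally ambiguous component with empty $X_0$.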
Remark 9 Since |X| = k, the elements of the sets $U_i$ are pairs composed of a suffix of a word in X and a subset of $\{1, 2, \ldots, k\}$. Hence the set of distinct elements in the sequence $(U_i)_{i \ge 1}$ is finite. As a consequence, the partition R(X) can be effectively constructed.

Theorem 6
The partition R(X) is the canonical partition of the code X.

Proof: . . . the algorithm finds the pair (ε, S), where S = {1, 2, . . ., t}, and, starting from S, the algorithm creates the set of code words $X_h$. By construction we have $x_i \in X_h$ for $1 \le i \le t$ and so, by Theorem 1, P(X) is a coding partition. It is left to the reader to verify that P(X) is the characteristic partition of X, so that R(X) is the canonical partition of the code X. □

Corollary 7 Let $P = \{X_1, X_2, \ldots, X_n\}$ be a partition of a finite code X. There is an algorithm to decide whether P is a coding partition.
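Proof: Using the algorithm, we compute the canonical partition $P_C(X)$, and hence the characteristic partition P(X), whose classes are the totally ambiguous components of X together with the singletons {x}, $x \in X_0$. Since P(X) is the finest coding partition and, by Theorem 1, every partition coarser than a coding partition is coding, P is a coding partition if and only if $P \le P(X)$, i.e. if and only if each totally ambiguous component of X is contained in a single class of P. This can be checked directly. □

Example 3 Consider the code $X = \{x_1, x_2, x_3, x_4, x_5, x_6, x_7\} = \{00, 0010, 1000, 11, 1111, 010, 011\}$.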
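Once the totally ambiguous components are known, the test in the proof of Corollary 7 is a simple containment check: the elements of $X_0$ may be distributed arbitrarily among the classes of P, but each totally ambiguous component must fall entirely within one class. A minimal Python sketch (the function name is our own; the components are taken as input, for example from a computation of the canonical partition):

```python
def is_coding_partition(partition, components):
    """P is coding iff each totally ambiguous component lies in one class of P."""
    class_of = {x: i for i, cls in enumerate(partition) for x in cls}
    return all(len({class_of[x] for x in comp}) == 1 for comp in components)


# The ND code {0, 10, 101, 111} has a single totally ambiguous component.
components = [{"0", "10", "101", "111"}]
print(is_coding_partition([{"0", "10", "101"}, {"111"}], components))  # False
print(is_coding_partition([{"0", "10", "101", "111"}], components))    # True
```

For a UD code the list of components is empty, so every partition passes, in agreement with Remark 3.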

The rational case
If a code X is infinite, it can have partitions with infinitely many classes and, moreover, each class can contain infinitely many elements. In this section we first consider partitions having a finite number of classes and such that each class is a rational set. So we have a concatenatively independent partition $P = \{X_1, X_2, \ldots, X_n\}$ such that each $X_i$ is a rational set. In order to give an algorithm to decide whether such a partition is a coding partition, we need some preliminary definitions and results from the theory of rational relations. Recall that a rational relation $\rho : A^* \to B^*$ is a mapping from $A^*$ into the set $2^{B^*}$ of the subsets of $B^*$ such that its graph $\{(u, v) \in A^* \times B^* \mid v \in \rho(u)\}$ is a rational subset of the product monoid $A^* \times B^*$.
M. Nivat has given the following characterization of a rational relation (see [4]).
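Theorem 8 Let $\rho : A^* \to B^*$ be a relation. Then ρ is a rational relation if and only if there exist a new finite alphabet Σ, two alphabetic morphisms $\alpha : \Sigma^* \to A^*$ and $\beta : \Sigma^* \to B^*$, and a rational subset $K \subseteq \Sigma^*$ such that the graph of ρ is $\{(\alpha(v), \beta(v)) \mid v \in K\}$.

Let us now come back to the partition $P = \{X_1, X_2, \ldots, X_n\}$, where each $X_i$ is a rational set over the alphabet A. Let $A_1, A_2, \ldots, A_n$ be n disjoint copies of the alphabet A, and set $\Sigma = \bigcup_{i=1}^{n} A_i$. Let $\alpha_i : A_i \to A$ be the bijection between $A_i$ and A, for $i = 1, 2, \ldots, n$, and let $\alpha : \Sigma^* \to A^*$ be the morphism extending the $\alpha_i$: $\alpha|_{A_i} = \alpha_i$ for $i = 1, 2, \ldots, n$. Set
$K = \{v_1 v_2 \cdots v_p \mid p \ge 1,\ v_h \in A_{i_h}^+,\ \alpha(v_h) \in X_{i_h}^+,\ i_h \neq i_{h+1}\}$.
Consider now another alphabet $B = \{b_1, b_2, \ldots, b_n\}$ and the alphabetic morphism $\beta : \Sigma^* \to B^*$ defined by $\beta(a) = b_i$ for $a \in A_i$. Let $\rho(P) : A^* \to B^*$ be the relation defined by the graph
$\{(\alpha(v), \beta(v)) \mid v \in K\}$.
By the theorem of Nivat, ρ(P) is a rational relation.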
Recall that a relation $\rho : A^* \to B^*$ is a function if $|\rho(u)| \le 1$ for every $u \in A^*$. The main result of this section is the following theorem.
Theorem 9 A partition P is a coding partition iff ρ(P ) is a function.
Proof: Let ρ = ρ(P). From the definition of ρ we see that there is a bijection between the set of P-factorizations of a word $w \in X^+$ and the set of words $v \in K$ such that $\alpha(v) = w$. Moreover, since β distinguishes the words of K that have the same image under α, P is a coding partition if and only if $|\rho(w)| = 1$ for each word $w \in X^+$, i.e. (as $\rho(w) = \emptyset$ for $w \notin X^+$) if and only if ρ is a function. □

In [9] M. P. Schützenberger proved that it is decidable whether a rational relation is a function. As a consequence, we obtain the following corollary of the previous theorem.
Corollary 10 Given a partition P = {X 1 , X 2 , . . ., X n } such that X i , for i = 1, 2, . . ., n, is a rational set, then it is decidable whether P is a coding partition.
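When all classes are finite, the relation ρ(P) of Theorem 9 can be evaluated word by word, which gives a bounded experiment illustrating the theorem. The Python sketch below assumes, as a reading of the construction, that β sends every letter of the copy $A_i$ to $b_i$ (rendered here as the integer i), so each P-factorization of w yields the sequence of class indices read letter by letter. It only checks messages up to `max_len` and is not Schützenberger's decision procedure; all names are our own.

```python
from functools import lru_cache


def factorizations(word, code):
    """All factorizations of `word` into (non-empty) code words."""
    words = tuple(sorted(set(code)))

    @lru_cache(maxsize=None)
    def from_pos(i):
        if i == len(word):
            return ((),)
        return tuple((x,) + rest for x in words if word.startswith(x, i)
                     for rest in from_pos(i + len(x)))

    return list(from_pos(0))


def messages(code, max_len):
    """All messages (words of X+) of length at most max_len."""
    found, frontier = set(), {""}
    while frontier:
        frontier = {w + x for w in frontier for x in code
                    if len(w) + len(x) <= max_len} - found
        found |= frontier
    return found


def rho(word, partition):
    """Image of `word` under rho(P): one label sequence per P-factorization.

    Assumption: beta maps each letter of the copy A_i to b_i, encoded as i.
    """
    class_of = {x: i for i, cls in enumerate(partition) for x in cls}
    return {tuple(class_of[x] for x in f for _ in x)
            for f in factorizations(word, class_of)}


def rho_is_function_up_to(partition, max_len):
    """Bounded check of |rho(w)| <= 1; rho(w) is empty outside X+."""
    code = {x for cls in partition for x in cls}
    return all(len(rho(w, partition)) <= 1 for w in messages(code, max_len))


homophonic = [{"0", "10", "101"}, {"111"}]
print(rho("1011110", homophonic))  # two images: rho(P) is not a function
```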
Let us now consider the (more difficult) problem of determining the canonical partition of a rational code X. If the canonical partition has infinitely many classes, it is not clear what it means to compute such a partition. Note that the characteristic partition of a rational code X may have infinitely many classes. Consider, for instance, an infinite rational UD code X: the characteristic partition coincides with the discrete partition, i.e. each class contains only one element and there are infinitely many classes. However, in such a case, the canonical partition is the trivial partition, composed of only one class. In all examples of rational codes that we know, the number of classes of the canonical partition is always finite and each class is a rational set. So we formulate the following conjecture.

CONJECTURE: If X is rational, the number of classes of $P_C(X)$ is finite and each class of $P_C(X)$ is a rational set.
If the conjecture is true, the restrictive conditions considered in this section are not actually a restriction, but correspond to the general case.
Note further that, if X is not rational, then $P_C(X)$ can have infinitely many classes, as shown by the following example.
Example 5 Consider the code It is easy to verify that P C (X) contains infinitely many classes and that any class X i is of the form

Coding partitions and varieties of codes
The notion of coding partition allows one to decompose a code in such a way that the ambiguities are confined to the single components of the code and a sort of mutual unambiguity holds between its different components. In recent years, decipherability conditions weaker than UD have been investigated and formalized in terms of the notion of variety of codes, and a sort of classification of the ambiguities of codes has been introduced. It is then interesting to study the relationships between the "type" of ambiguity of the single components of the partition and that of the whole code.
Let us first briefly introduce the basic definitions and the motivations for the notion of variety of codes. The investigation of decipherability conditions weaker than UD was initiated in [7] by Lempel, who introduced the notion of multiset decipherable (MSD) codes. Here the information of interest is the multiset of code words used in the encoding process, so that the order in which the transmitted words are received is immaterial. More formally, a code X is an MSD code if the equality $x_1 x_2 \cdots x_n = y_1 y_2 \cdots y_m$, with $x_1, x_2, \ldots, x_n, y_1, y_2, \ldots, y_m \in X$, implies the equality of the two multisets $\{x_1, x_2, \ldots, x_n\}$ and $\{y_1, y_2, \ldots, y_m\}$.
In [5] Guzmán also considers the notion of set decipherable (SD) codes. In this case the original message is recovered up to commutativity and number of occurrences, i.e. two factorizations of the same message yield the same set of code words. Denote by UD, MSD and SD the classes of UD, MSD and SD codes, respectively. It is clear that UD ⊆ MSD ⊆ SD, and it has been shown that both inclusions are strict.
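The MSD and SD conditions differ from UD only in what two factorizations of the same message must share: the multiset of code words, or the set of code words. A bounded Python sketch of both checks follows (names are our own). It searches messages up to `max_len` for the shortest one whose factorizations disagree, so a returned word refutes the property while `None` only means that no counterexample was found within the bound.

```python
from collections import Counter
from functools import lru_cache


def factorizations(word, code):
    """All factorizations of `word` into (non-empty) code words."""
    words = tuple(sorted(set(code)))

    @lru_cache(maxsize=None)
    def from_pos(i):
        if i == len(word):
            return ((),)
        return tuple((x,) + rest for x in words if word.startswith(x, i)
                     for rest in from_pos(i + len(x)))

    return list(from_pos(0))


def messages(code, max_len):
    """All messages (words of X+) of length at most max_len."""
    found, frontier = set(), {""}
    while frontier:
        frontier = {w + x for w in frontier for x in code
                    if len(w) + len(x) <= max_len} - found
        found |= frontier
    return found


def multiset(f):
    """Hashable multiset of the code words used in a factorization."""
    return frozenset(Counter(f).items())


def first_violation(code, max_len, kind):
    """Shortest message whose factorizations differ as multisets or sets."""
    if kind not in ("MSD", "SD"):
        raise ValueError("kind must be 'MSD' or 'SD'")
    key = multiset if kind == "MSD" else frozenset
    for w in sorted(messages(code, max_len), key=lambda v: (len(v), v)):
        if len({key(f) for f in factorizations(w, code)}) > 1:
            return w
    return None


print(first_violation({"0", "01", "10"}, 6, "MSD"))  # 010
print(first_violation({"0", "10", "11"}, 6, "SD"))   # None
```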
In the same paper [5] Guzmán introduces a very general concept of decipherability using varieties of monoids.Unique decipherability, multiset decipherability and set decipherability then appear as very special cases of such general concept.
Let X be a code and let M be a monoid. We say that X is decipherable in M if every map $f : X \to M$ extends to a (unique) homomorphism $\bar{f} : X^* \to M$.
Let K be a class of codes and V a class of monoids. We denote by M(K) the class of all monoids in which every X ∈ K is decipherable. Conversely, C(V) denotes the class of all codes decipherable in every M ∈ V.
A variety of codes is a class of codes K such that C(M(K)) = K.
The following holds (see [5]):
• UD is a variety of codes corresponding to the variety of all monoids.
• MSD is a variety of codes corresponding to the variety of commutative monoids.
• SD is a variety of codes corresponding to the variety of semilattices i.e. the variety of commutative monoids that are idempotent.
The varieties of codes have been investigated from different points of view. In particular, the paper [5] (see also [11]) studies the problem of deciding whether a code belongs to a given variety. In [2] the authors study the problem of characterizing those varieties of codes in which the Kraft inequality is satisfied.
Let us now come back to partitions. Let $P = \{X_1, X_2, \ldots\}$ be a partition of a code X. We investigate whether there is some relationship between the varieties to which the classes $X_i$ belong and the variety to which the code X belongs. The result of this section, which generalizes a result obtained in [2], is the following theorem.
Theorem 11 Let $P = \{X_1, X_2, \ldots\}$ be a coding partition of a code X and let K be a variety of codes. If $X_i \in K$ for all $i \ge 1$, then $X \in K$.
Proof: Let N be the variety of monoids associated with the variety K of codes. We have to show that X is decipherable in every monoid $M \in N$. Let $M \in N$ and let $f : X \to M$ be a map. For $i \ge 1$, denote by $g_i$ the restriction of f to $X_i$: $g_i := f|_{X_i}$. Since each $X_i$ is decipherable in M, $g_i$ extends to a homomorphism $\bar{g}_i : X_i^* \to M$. Let $w \in X^+$ and let $w = z_1 z_2 \cdots z_n$ be its unique P-factorization, where $z_j \in X_{i_j}^+$ and $i_j \neq i_{j+1}$. Then, setting $\bar{f}(w) := \bar{g}_{i_1}(z_1)\, \bar{g}_{i_2}(z_2) \cdots \bar{g}_{i_n}(z_n)$, we get the unique homomorphism extending f, and so X is decipherable in M. □

Using the fact that the varieties of codes form a complete lattice (see Corollary 1.6 in [5]), we obtain the next corollary.
Corollary 12 Let $P = \{X_1, X_2, \ldots\}$ be a coding partition of a code X and let $K_i$ be varieties of codes such that $X_i \in K_i$, $i \ge 1$. Then X belongs to the join $\bigvee_{i \ge 1} K_i$.
Therefore, in particular, if each $X_i$ is a UD code, then X is also a UD code. Likewise, if each $X_i$ is an MSD code, then X is also an MSD code, and so on. In a coding partition, the decipherability properties of the individual classes are transferred to the whole code.