Generalized Fitch Graphs III: Symmetrized Fitch maps and Sets of Symmetric Binary Relations that are explained by Unrooted Edge-labeled Trees

Binary relations derived from labeled rooted trees play an important role in mathematical biology as formal models of evolutionary relationships. The (symmetrized) Fitch relation formalizes xenology as the pairs of genes separated by at least one horizontal transfer event. As a natural generalization, we consider symmetrized Fitch maps, that is, symmetric maps $\varepsilon$ that assign a subset of colors to each pair of vertices in $X$ and that can be explained by a tree $T$ with edges that are labeled with subsets of colors in the sense that the color $m$ appears in $\varepsilon(x,y)$ if and only if $m$ appears in a label along the unique path between $x$ and $y$ in $T$. We first give an alternative characterization of the monochromatic case and then give a characterization of symmetrized Fitch maps in terms of compatibility of a certain set of quartets. We show that recognition of symmetrized Fitch maps is NP-complete. In the restricted case where $|\varepsilon(x,y)|\leq 1$ the problem becomes polynomial, since such maps coincide with the class of monochromatic Fitch maps whose graph-representations form precisely the class of complete multi-partite graphs.


Introduction
Labeled phylogenetic trees are a natural structure to model evolutionary histories in biology. The leaf set $L$ of the tree $T$ corresponds to currently living entities, while inner nodes model the branching of lineages that then evolve independently. Labels on vertices and edges annotate further details on evolutionary events. Considering the evolution of gene families, for instance, vertex labels may be used to distinguish gene duplication events from speciation and horizontal gene transfer Fitch (2000). Edge labels, on the other hand, may be used to designate (rare) events that change properties of genes, genomes, and organisms Hellmuth et al. (2018a) or to distinguish different fates of offspring genes such as the horizontal transfer into another genome Geiß et al. (2018). Distance-based phylogenetics can be seen as a special case of the latter setting, where edges are weighted by evolutionary distances Semple and Steel (2003). Relations on $L$ are naturally defined as functions of the edge and/or vertex labels along the unique path connecting a pair of leaves. For instance, evolutionary distances are simply the sum of the edge lengths; the edge set of Pairwise Compatibility Graphs (PCGs) requires the path length (i.e., the sum of edge-weights) to fall between given bounds Calamoneri and Sinaimeri (2016); two genes $x$ and $y$ are orthologs, a key relation in functional genomics, if their last common ancestor $\mathrm{lca}_T(x,y)$ is labeled as speciation; a directed xenology relation is defined by asking whether there is a "transfer edge" on the path between $\mathrm{lca}_T(x,y)$ and $y$. In all these examples the mathematical interest lies in the inverse problem.
Given a relation or a set of relations and a rule relating labeled trees to the relation(s), one asks (i) whether a tree $T$ exists that explains the given relation, (ii) whether there is a unique explaining tree $T$ that is minimal in some sense (usually w.r.t. edge contraction), and (iii) whether a (minimal) explaining tree can be constructed efficiently from the given data. For the vertex-labeled case, symbolic ultrametrics Böcker and Dress (1998) and 2-structures Ehrenfeucht and Rozenberg (1990); Hellmuth et al. (2017) provide a comprehensive answer. Edge labels have also been studied extensively. For distances, the 4-point condition Buneman (1971) characterizes the "additive" metrics deriving from trees. For rare events, where $x\sim y$ if they are separated by exactly one event, a complete characterization was provided in Hellmuth et al. (2018a). For PCGs (which exclude the possibility of no event along an edge), on the other hand, only partial results are known Calamoneri and Sinaimeri (2016).
In this contribution we are interested in a generalization of Fitch relations. These relations were introduced to model so-called horizontal gene transfer (HGT) and formalized in Geiß et al. (2018); Hellmuth […] graphs) as well as directed Fitch maps can be recognized in polynomial time, this is no longer the case for symmetrized Fitch maps; we show that their recognition problem is NP-complete. The restriction to maps where each pair of leaves $(x,y)$ has at most one label, however, remains polynomial. In particular, this work complements the results established in Hellmuth (2019).

Preliminaries
Basic Notation For a finite set $X$ we write $[X\times X]_{\mathrm{irr}} := X\times X\setminus\{(x,x) : x\in X\}$ and $\binom{X}{k} := \{X'\subseteq X : |X'| = k\}$. The set $\mathcal{P}(X)$ denotes the power set of $X$. A partition of $X$ is a collection of pairwise disjoint non-empty sets $X_1,\dots,X_k$ with $k\geq 1$ such that $X = X_1\,\dot\cup\,\dots\,\dot\cup\,X_k$.
We consider undirected graphs $G=(V,E)$ with finite vertex set $V(G)=V$ and edge set $E(G)=E\subseteq\binom{V}{2}$, i.e., without loops and multiple edges. The complete graph $K_{|V|}$ has vertex set $V$ and edge set $E=\binom{V}{2}$. Hence, $K_1$ denotes the single-vertex graph and $K_2$ consists of two vertices and the connecting edge. […] Thus, each part $V_i$ is a maximal independent set. A graph $G$ is a complete multi-partite graph if and only if it does not contain $K_1+K_2$, the disjoint union of $K_1$ and $K_2$ (i.e., the graph with three vertices and a single edge), as an induced subgraph, see e.g. Zverovich (1999).
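The $K_1+K_2$ criterion yields a direct, if brute-force, test for complete multi-partite graphs. The following Python sketch checks every vertex triple for exactly one edge; the function names are illustrative only, and the $O(|V|^3)$ enumeration is chosen for clarity rather than speed.

```python
from itertools import combinations


def has_induced_k1_plus_k2(vertices, edges):
    """Return True if some vertex triple induces K_1 + K_2, i.e., exactly one edge."""
    edge_set = {frozenset(e) for e in edges}
    for triple in combinations(vertices, 3):
        # Count how many of the three possible edges among the triple are present.
        present = sum(frozenset(pair) in edge_set for pair in combinations(triple, 2))
        if present == 1:
            return True
    return False


def is_complete_multipartite(vertices, edges):
    """A graph is complete multi-partite iff it has no induced K_1 + K_2."""
    return not has_induced_k1_plus_k2(vertices, edges)


# A 4-cycle is the complete bipartite graph K_{2,2}.
print(is_complete_multipartite([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4), (4, 1)]))  # True
# A single edge plus an isolated vertex is K_1 + K_2 itself.
print(is_complete_multipartite([1, 2, 3], [(1, 2)]))  # False
```

An edgeless graph passes the test as well, since it is complete multi-partite with a single part.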
The set of inner vertices is denoted by $\mathring{V}(T)$. Analogously, an edge $e=\{v,w\}\in E(T)$ with $v,w\in\mathring{V}(T)$ is an inner edge, and an outer edge otherwise. The set of inner edges of a tree $T$ is denoted by $\mathring{E}(T)$.
A star tree is a tree that has exactly one inner vertex and at least two leaves. Moreover, we say that a tree $T'$ is less resolved than a tree $T$, denoted by $T'<T$, if $T'$ can be obtained from $T$ by a non-empty sequence of edge-contractions.
Remark. From here on we consider only phylogenetic trees, and refer to them simply as trees.
Subsplits and Quartets A subsplit $A|B$ on a set $X$ is an unordered pair of two disjoint and non-empty subsets $A,B\subseteq X$, i.e., $A|B=B|A$. A subsplit $A|B$ is trivial if $\min\{|A|,|B|\}=1$, and it is a quartet if $|A|=|B|=2$. In the latter case we write $ab|cd$ instead of $\{a,b\}|\{c,d\}$. A subsplit $A|B$ on $X$ is a split on $X$ if $A\cup B=X$. A subsplit $A|B$ on $X$ is displayed by a tree $T$ with $L(T)=X$ if there is an edge $e\in E(T)$ such that $A\subseteq L(T_1)$ and $B\subseteq L(T_2)$, where $T_1$ and $T_2$ are the connected components of $T\setminus e := (V(T), E(T)\setminus\{e\})$. In this case we call $e$ a splitting edge w.r.t. $A|B$. Clearly, removal of an edge in $T$ always yields a split $L(T_1)|L(T_2)$ that is displayed by $T$. Hence, a subsplit $A|B$ is displayed by $T$ if and only if there is a split $A'|B'$ in $T$ with $A\subseteq A'$ and $B\subseteq B'$. A set $\mathcal{S}$ of subsplits is called compatible if there is a tree $T$ that displays every subsplit in $\mathcal{S}$. The set $\mathcal{S}(T)$ comprises all splits on $X$ displayed by $T$ and the set $\mathcal{Q}(T)$ comprises all quartets that are displayed by $T$.
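The definition of a displayed subsplit translates directly into code: delete each edge, record the leaf sets of the two components, and look for a split $A'|B'$ with $A\subseteq A'$ and $B\subseteq B'$. The sketch below assumes a tree given as an edge list whose degree-1 vertices are the leaves; the function names and the example tree are hypothetical.

```python
from collections import defaultdict


def tree_splits(edges):
    """Return every split L(T_1)|L(T_2) obtained by deleting one edge of the tree."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    leaves = frozenset(v for v in adj if len(adj[v]) == 1)
    splits = set()
    for u, v in edges:
        # Collect the component of T minus the edge {u, v} that contains u.
        seen, stack = {u}, [u]
        while stack:
            w = stack.pop()
            for z in adj[w]:
                if z not in seen and {w, z} != {u, v}:
                    seen.add(z)
                    stack.append(z)
        side = leaves & seen
        splits.add(frozenset({side, leaves - side}))
    return splits


def displays(edges, A, B):
    """A|B is displayed iff some split A'|B' of T satisfies A <= A' and B <= B'."""
    A, B = set(A), set(B)
    for split in tree_splits(edges):
        S1, S2 = tuple(split)
        if (A <= S1 and B <= S2) or (A <= S2 and B <= S1):
            return True
    return False


# The quartet tree ab|cd with inner vertices u and v.
quartet_tree = [("a", "u"), ("b", "u"), ("u", "v"), ("v", "c"), ("v", "d")]
print(displays(quartet_tree, {"a", "b"}, {"c", "d"}))  # True
print(displays(quartet_tree, {"a", "c"}, {"b", "d"}))  # False
```

Trivial subsplits such as $a|cd$ are always displayed, because the outer edge of $a$ yields the split $a|bcd$.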
The relation between trees and split systems is captured by the following well-known result Buneman (1971); see (Semple and Steel, 2003, Section 3.1) for a detailed discussion. In the setting of $X$-trees, the taxa $X$ are mapped to vertices of $T$ by a not necessarily injective map $p: X\to V(T)$ whose image contains every vertex of degree at most 2. Since in our setting there is a one-to-one correspondence between $X$ and the leaves of $T$, i.e., $p$ is injective, the split system $\mathcal{S}$ necessarily contains all trivial splits $\{x\}|X\setminus\{x\}$ with $x\in X$.
Proposition 2.1 (Splits-Equivalence Theorem). Let $\mathcal{S}$ be a collection of splits on $X$ that contains all trivial splits. Then, there is a tree $T$ with leaf set $X$ such that $\mathcal{S}=\mathcal{S}(T)$ if and only if for all pairs of distinct splits $A_1|B_1, A_2|B_2\in\mathcal{S}$ at least one of the four intersections $A_1\cap A_2$, $A_1\cap B_2$, $B_1\cap A_2$ and $B_1\cap B_2$ is empty. Moreover, if such a tree exists, then $T$ is unique up to isomorphism.
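The condition in Proposition 2.1 is a pairwise test and is easy to check directly. In the sketch below a split is represented as a pair of frozensets; the helper names are our own, and the code only tests the intersection condition without constructing the tree $T$.

```python
from itertools import combinations


def split(A, X):
    """Build the split A | (X minus A) as a pair of frozensets."""
    A = frozenset(A)
    return (A, frozenset(X) - A)


def pairwise_compatible(splits):
    """Prop. 2.1 condition: for all distinct A1|B1, A2|B2, one of the four intersections is empty."""
    for (A1, B1), (A2, B2) in combinations(splits, 2):
        if A1 & A2 and A1 & B2 and B1 & A2 and B1 & B2:
            return False
    return True


X = {1, 2, 3, 4, 5}
trivial = [split({x}, X) for x in X]
S = trivial + [split({1, 2}, X), split({1, 2, 3}, X)]
print(pairwise_compatible(S))                       # True
print(pairwise_compatible(S + [split({1, 3}, X)]))  # False
```

The split $13|245$ fails because it meets both sides of $12|345$ on both of its own sides.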
For later reference we state a simple consequence of Proposition 2.1.
Corollary 2.2. Let $\mathcal{S}$ be a collection of subsplits on $X$. If there are two subsplits $A_1|A_2$ and $B_1|B_2$ in $\mathcal{S}$ such that all four intersections $A_1\cap B_1$, $A_1\cap B_2$, $A_2\cap B_1$ and $A_2\cap B_2$ are non-empty, then $\mathcal{S}$ is not compatible.
Proof: Let $\mathcal{S}$ be a collection of subsplits on $X$, and suppose that there are two subsplits $A_1|A_2$ and $B_1|B_2$ in $\mathcal{S}$ such that none of the sets $A_1\cap B_1$, $A_1\cap B_2$, $A_2\cap B_1$ and $A_2\cap B_2$ is empty. Assume for contradiction that $\mathcal{S}$ is compatible, i.e., there is a tree $T$ that displays $\mathcal{S}$. Thus, there are splits $A'_1|A'_2$ and $B'_1|B'_2$ in $\mathcal{S}(T)$ such that $A_1\subseteq A'_1$, $A_2\subseteq A'_2$, $B_1\subseteq B'_1$ and $B_2\subseteq B'_2$. However, by assumption all four intersections $A'_i\cap B'_j\supseteq A_i\cap B_j$ with $i,j\in\{1,2\}$ are non-empty, and hence, by Proposition 2.1, such a tree $T$ cannot exist. Therefore, $\mathcal{S}$ is not compatible.
[…] We will often refer to the map $\lambda$ as the edge-labeling and call $e$ an $m$-edge if $m\in\lambda(e)$, and an $\emptyset$-edge if $\lambda(e)=\emptyset$. Note that the choice of $m\in\lambda(e)$ may not be unique and an edge can be both an $m$- and an $m'$-edge at the same time.
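Corollary 2.2 gives a cheap certificate of incompatibility for arbitrary subsplits, in particular for quartets. The sketch below (function name hypothetical) searches for such a pair. Note that the corollary states only a sufficient condition for incompatibility, so a return value of `None` does not by itself establish that the set is compatible.

```python
from itertools import combinations


def incompatible_pair(subsplits):
    """Return two subsplits A1|A2, B1|B2 whose four cross-intersections are all
    non-empty (incompatible by Corollary 2.2), or None if no such pair exists."""
    for (A1, A2), (B1, B2) in combinations(subsplits, 2):
        if A1 & B1 and A1 & B2 and A2 & B1 and A2 & B2:
            return (A1, A2), (B1, B2)
    return None


ab_cd = (frozenset("ab"), frozenset("cd"))  # the quartet ab|cd
ac_bd = (frozenset("ac"), frozenset("bd"))  # the quartet ac|bd
ad_bc = (frozenset("ad"), frozenset("bc"))  # the quartet ad|bc
print(incompatible_pair([ab_cd, ac_bd]) is not None)  # True
print(incompatible_pair([ab_cd]))                     # None
```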

Symmetrized Fitch maps
Definition 3.2. A map $\varepsilon:[X\times X]_{\mathrm{irr}}\to\mathcal{P}(M)$, where $X$ is a non-empty set of "leaves" and $M$ is a non-empty set of "colors", is a symmetrized Fitch map if there is an edge-labeled tree $(T,\lambda)$ with leaf set $X$ and edge labeling $\lambda: E(T)\to\mathcal{P}(M)$ such that for every pair $(x,y)\in[X\times X]_{\mathrm{irr}}$ and every color $m\in M$ it holds that $m\in\varepsilon(x,y)$ if and only if there is an $m$-edge on the path from $x$ to $y$.
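Definition 3.2 can be evaluated directly: from each leaf, a traversal of $T$ accumulates the union of the labels along the path. The sketch below stores the (symmetric) map as a dictionary keyed by unordered leaf pairs; the tree, the labels and the function name are hypothetical and serve only as an illustration.

```python
from collections import defaultdict


def map_explained_by(edges, labels):
    """Return eps[{x, y}] = union of lambda(e) over all edges e on the x-y path.

    `labels` maps frozenset({u, v}) to a set of colors; unlisted edges are ∅-edges."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    leaves = [v for v in adj if len(adj[v]) == 1]
    eps = {}
    for x in leaves:
        # Depth-first search from x; colors[w] collects the colors on the path x..w.
        colors = {x: frozenset()}
        stack = [x]
        while stack:
            w = stack.pop()
            for z in adj[w]:
                if z not in colors:
                    colors[z] = colors[w] | frozenset(labels.get(frozenset((w, z)), ()))
                    stack.append(z)
        for y in leaves:
            if y != x:
                eps[frozenset((x, y))] = colors[y]
    return eps


tree = [("a", "u"), ("b", "u"), ("u", "v"), ("v", "c"), ("v", "d")]
lam = {frozenset(("u", "v")): {"m"}, frozenset(("v", "d")): {"n"}}
eps = map_explained_by(tree, lam)
print(sorted(eps[frozenset(("a", "d"))]))  # ['m', 'n']
```

Keying by unordered pairs builds in the symmetry $\varepsilon(x,y)=\varepsilon(y,x)$ assumed throughout.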
Remark. From here on we assume w.l.o.g. that $\varepsilon$ is symmetric and $|X|\geq 3$. Figure 2 provides an illustrative example of a symmetrized Fitch map $\varepsilon:[X\times X]_{\mathrm{irr}}\to\mathcal{P}(M)$ and one of its corresponding edge-labeled trees $(T,\lambda)$. Moreover, Figure 3 shows that the edge-labeled trees explaining $\varepsilon$ need not be unique in general. Every map $\varepsilon:[X\times X]_{\mathrm{irr}}\to\mathcal{P}(M)$ can also be interpreted as a set of $|M|$ not necessarily disjoint binary relations (or equivalently graphs) on $X$, defined for every fixed color $m\in M$ by the set $\{(x,y)\in[X\times X]_{\mathrm{irr}} : m\in\varepsilon(x,y)\}$ of pairs (or equivalently undirected edges).
[…] We start by considering neighborhoods in this graph representation.
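Definition 3.4 (Hellmuth et al., 2020, Def. 3.3). The (complementary) neighborhood of a vertex $y\in X$ for a given color $m\in M$ is $N_{\neg m}[y] := \{x\in X\setminus\{y\} : m\notin\varepsilon(x,y)\}\cup\{y\}$. We write $\mathcal{N}_{\neg m}[\varepsilon] := \{N_{\neg m}[y] : y\in X\}$ for the set of complementary neighborhoods of $\varepsilon$ and a particular color $m\in M$.
Note that there might be distinct leaves $y,y'\in X$ or distinct colors $m,m'\in M$ such that $N_{\neg m}[y]=N_{\neg m'}[y']$. Moreover, we emphasize that $\mathcal{N}_{\neg m}[\varepsilon]$ is not a multi-set, i.e., if $N_{\neg m}[y]=N_{\neg m}[y']$ but $y\neq y'$, then this set contributes only once to $\mathcal{N}_{\neg m}[\varepsilon]$.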
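The following sketch computes $\mathcal{N}_{\neg m}[\varepsilon]$ as in Definition 3.4 and tests whether it is a partition of $X$, a property that is used repeatedly below. Maps are again stored as dictionaries keyed by unordered pairs; the function names and example maps are made up for illustration.

```python
from itertools import combinations


def complementary_neighborhoods(X, eps, m):
    """N_{not m}[eps] = { {y} ∪ {x != y : m not in eps(x, y)} : y in X }, as a set (no duplicates)."""
    return {
        frozenset({y} | {x for x in X if x != y and m not in eps[frozenset((x, y))]})
        for y in X
    }


def is_partition(X, blocks):
    """Non-empty blocks that cover X and whose sizes sum to |X| are pairwise disjoint."""
    blocks = list(blocks)
    return (all(blocks)
            and set().union(*blocks) == set(X)
            and sum(len(B) for B in blocks) == len(set(X)))


X = ["a", "b", "c", "d"]
parts = [{"a", "b"}, {"c", "d"}]
# m is in eps(x, y) exactly when x and y lie in different parts (G_m is K_{2,2}).
eps = {frozenset(p): frozenset() if any(set(p) <= P for P in parts) else frozenset({"m"})
       for p in combinations(X, 2)}
print(is_partition(X, complementary_neighborhoods(X, eps, "m")))  # True

# A single m-edge {a, b} plus the vertex c induces K_1 + K_2.
eps_k1k2 = {frozenset("ab"): frozenset({"m"}), frozenset("ac"): frozenset(), frozenset("bc"): frozenset()}
print(is_partition("abc", complementary_neighborhoods("abc", eps_k1k2, "m")))  # False
```

In the second example the neighborhoods are $\{a,c\}$, $\{b,c\}$ and $\{a,b,c\}$, which overlap.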

Characterization of monochromatic symmetrized Fitch maps
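We call a map $\varepsilon$ monochromatic if $\varepsilon(x,y)\subseteq\{m\}$ for all distinct $x,y\in X$ and some fixed color $m\in M$. Hence, for monochromatic maps we can assume w.l.o.g. that $|M|=1$. Monochromatic symmetrized Fitch maps are equivalent to the "undirected Fitch graphs" studied by Hellmuth et al. (2018b).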
For later reference we briefly recall some key results for this special case.
Lemma 3.5. Let $\varepsilon:[X\times X]_{\mathrm{irr}}\to\mathcal{P}(M)$ be a monochromatic map with $M=\{m\}$. Then, the following statements are equivalent:
1. $\varepsilon$ is a (monochromatic) symmetrized Fitch map.
2. $G_m(\varepsilon)$ does not contain a $K_1+K_2$ as an induced subgraph.
3. $G_m(\varepsilon)$ is a complete multi-partite graph.

[…] We observe that both edge-labeled trees explain $\varepsilon$. Thus, $\varepsilon$ is a (monochromatic) symmetrized Fitch relation. For instance, $m\notin\varepsilon(a,b)\cup\varepsilon(c,d)$ but $m\in\varepsilon(a,c)$ imply that every edge-labeled tree that explains $\varepsilon$ needs at least one inner edge. Thus, these two trees have the fewest vertices among all trees that may explain $\varepsilon$ and are known as so-called "minimally-resolved" trees. The latter arguments imply that minimally-resolved trees need not be unique; a fact that has also been observed in […].
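Using Lemma 3.5, we can derive the following alternative characterization:
Proposition 3.6. Let $\varepsilon:[X\times X]_{\mathrm{irr}}\to\mathcal{P}(M)$ be a monochromatic map with $M=\{m\}$. Then, the following statements are equivalent:
1. $\varepsilon$ is a symmetrized Fitch map.
2. For all pairwise distinct $a,b,c\in X$, $m\notin\varepsilon(a,b)$ and $m\notin\varepsilon(b,c)$ imply $m\notin\varepsilon(a,c)$.
3. $\mathcal{N}_{\neg m}[\varepsilon]$ is a partition of $X$.

Proof: Let $\varepsilon:[X\times X]_{\mathrm{irr}}\to\mathcal{P}(M)$ be a monochromatic map with $M=\{m\}$. In the following we will make frequent use of the fact that $\varepsilon(a,b)=\varepsilon(b,a)$ and, therefore, $m\in\varepsilon(a,b)$ if and only if $\{a,b\}\in E(G_m(\varepsilon))$.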
First, assume that Statement (1) is satisfied. Lemma 3.5 implies that $G_m(\varepsilon)$ does not contain a $K_1+K_2$ as an induced subgraph. Hence, for arbitrary pairwise distinct $a,b,c\in X$ with $m\notin\varepsilon(a,b)$ and $m\notin\varepsilon(b,c)$, it must hold that $m\notin\varepsilon(a,c)$. Thus, Statement (2) holds.
Now, assume that Statement (2) is satisfied. Recall that a partition of $X$ is a collection of pairwise disjoint non-empty sets $N_1,\dots,N_k$ such that $X=N_1\,\dot\cup\,\dots\,\dot\cup\,N_k$. Since $y\in N_{\neg m}[y]$, we conclude that every neighborhood in $\mathcal{N}_{\neg m}[\varepsilon]$ is non-empty and that $\bigcup_{y\in X}N_{\neg m}[y]=X$. It remains to show that two neighborhoods that intersect are equal. To this end, let $y,y'\in X$ be two distinct vertices that satisfy $N_{\neg m}[y]\cap N_{\neg m}[y']\neq\emptyset$. We first show that $m\notin\varepsilon(y,y')=\varepsilon(y',y)$. To this end, we assume for contradiction that $m\in\varepsilon(y,y')=\varepsilon(y',y)$. Therefore, $y'\notin N_{\neg m}[y]$ and $y\notin N_{\neg m}[y']$, and thus there is a vertex $x\in N_{\neg m}[y]\cap N_{\neg m}[y']$ such that $x$, $y$ and $y'$ are pairwise distinct. However, $m\notin\varepsilon(x,y)=\varepsilon(y,x)$ and $m\notin\varepsilon(x,y')$. In summary, we have $m\notin\varepsilon(y,x)$, $m\notin\varepsilon(x,y')$ and $m\in\varepsilon(y,y')$; a contradiction to Statement (2). Thus, $m\notin\varepsilon(y,y')=\varepsilon(y',y)$. The latter implies that $\{y,y'\}\subseteq N_{\neg m}[y]\cap N_{\neg m}[y']$. […] Moreover, if $x\notin\{y,y'\}$, then $x$, $y$ and $y'$ are pairwise distinct. In this case, $m\notin\varepsilon(x,y')$ and $m\notin\varepsilon(y',y)$ together with Statement (2) imply that $m\notin\varepsilon(x,y)$. Therefore, […]
Finally, we show that Statement (3) implies Statement (1). Using contraposition, we assume that $\varepsilon$ is not a symmetrized Fitch map. Then, we conclude by Lemma 3.5 that $G_m(\varepsilon)$ contains a $K_1+K_2$ as an induced subgraph. Let […]. Taking the latter arguments together, we observe that $\mathcal{N}_{\neg m}[\varepsilon]$ cannot be a partition of $X$. Thus, if Statement (3) is satisfied, then Statement (1) must be satisfied as well.
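Statement (2) of Proposition 3.6 says that the relation "not separated by color $m$" is transitive. A minimal sketch of this $O(|X|^3)$ test, again with the symmetric map stored as a dictionary keyed by unordered pairs and with a hypothetical function name, is:

```python
from itertools import permutations


def satisfies_statement_2(X, eps, m):
    """For all pairwise distinct a, b, c: m not in eps(a,b) and m not in eps(b,c) imply m not in eps(a,c)."""
    def sep(x, y):
        return m in eps[frozenset((x, y))]

    return all(sep(a, b) or sep(b, c) or not sep(a, c)
               for a, b, c in permutations(X, 3))


# K_1 + K_2: only a and b are separated by m, so the triple (a, c, b) violates the condition.
eps_k1k2 = {frozenset("ab"): {"m"}, frozenset("ac"): set(), frozenset("bc"): set()}
print(satisfies_statement_2("abc", eps_k1k2, "m"))  # False
# K_3 (complete tripartite K_{1,1,1}): every pair is separated, so the condition holds vacuously.
eps_k3 = {frozenset(p): {"m"} for p in ("ab", "ac", "bc")}
print(satisfies_statement_2("abc", eps_k3, "m"))  # True
```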
A natural special case is to consider maps $\varepsilon:[X\times X]_{\mathrm{irr}}\to\mathcal{P}(M)$ that assign to each pair $(x,y)$ at most one label. In this case, $\varepsilon$ reduces to a map […] Since $\varepsilon$ is a symmetrized Fitch map, there is an edge-labeled tree $(T,\lambda)$ that explains $\varepsilon$. The latter two arguments imply that $T$ contains an $m$-edge $e$ and an $m'$-edge $f$. Now, consider a vertex-maximal path $P$ in $T$ that contains $e$ and $f$; such a path exists, since any two edges of a tree lie on a common path. Clearly, $P$ must contain two leaves $x,y\in X$ as its end-vertices. But then $m,m'\in\varepsilon(x,y)$ implies $|\varepsilon(x,y)|>1$; a contradiction.
Finally, we characterize least-resolved trees for a monochromatic symmetrized Fitch map $\varepsilon$: An edge-labeled tree $(T^*,\lambda^*)$ is least-resolved for $\varepsilon$ if there is no tree $T<T^*$ and no labeling $\lambda$ such that $(T,\lambda)$ also explains $\varepsilon$. Such trees can be constructed using the fact that $G_m(\varepsilon)$ is determined by the set $\mathcal{I}$ of its maximal independent sets, since it is a complete multi-partite graph (cf. Lemma 3.5). As remarked above, we assume $|X|\geq 3$ to avoid trivial cases. […] Note that there are no restrictions on the arrangement of the inner edges in Def. 3.8 as long as $(T,\lambda)$ results in a phylogenetic tree. For instance, the inner edges could be arranged as a star graph or as a path. See Fig. 4 for an illustrative example. […] Proof: Let $\varepsilon:[X\times X]_{\mathrm{irr}}\to\mathcal{P}(M)$ be a monochromatic symmetrized Fitch map, and let $\mathcal{I}$, $\mathcal{I}_2$ and $\mathcal{T}_\varepsilon$ be as specified in Def. 3.8. First, assume that $\mathcal{I}_2=\emptyset$, i.e., $G_m(\varepsilon)\simeq K_{|X|}$. It is an easy exercise to verify that every least-resolved tree for $\varepsilon$ must be a star tree in which all edges, with at most one exception, are $m$-edges. Hence, all such trees are, by Def. 3.8 (1), contained in $\mathcal{T}_\varepsilon$.
Let us now assume that $\mathcal{I}_2\neq\emptyset$ and let $(T^*,\lambda^*)$ be a least-resolved tree for $\varepsilon$. We show that $(T^*,\lambda^*)\in\mathcal{T}_\varepsilon$. To this end, we assume first, for contradiction, that there is an inner edge $e\in\mathring{E}(T^*)$ with $\lambda^*(e)=\emptyset$, and consider the edge-labeled tree $(T',\lambda')$ obtained from $T^*$ by contraction of $e$ and keeping the remaining edge labels of $\lambda^*$. Note that $T'$ is still a phylogenetic tree. Let $x,y\in X$ be two distinct leaves. If $\varepsilon(x,y)=\emptyset$, then the path $P_{T^*}(x,y)$ can have only $\emptyset$-edges. It is easy to see that this property is preserved in $(T',\lambda')$. If $\varepsilon(x,y)=\{m\}$, then the path $P_{T^*}(x,y)$ contains an $m$-edge. Since we only contracted the single inner $\emptyset$-edge to obtain $T'$, the path $P_{T'}(x,y)$ still contains this $m$-edge. Consequently, $(T',\lambda')$ explains $\varepsilon$ but $T'<T^*$; contradicting the fact that $(T^*,\lambda^*)$ is least-resolved for $\varepsilon$. Hence, every inner edge $e\in\mathring{E}(T^*)$ is an $m$-edge, and thus, Statement (2e) holds.
To see that Statement (2c) is satisfied, observe that for every $x\in\bigcup_{I\in\mathcal{I}_2}I$ there is an $I\in\mathcal{I}_2$ with $x\in I$, and there is a $y\in I\setminus\{x\}$. Hence, $\varepsilon(x,y)=\emptyset$, and thus, the path $P_{T^*}(x,y)$ contains only edges $e$ with $\lambda^*(e)=\emptyset$. In particular, for the outer edge $\{x,v\}\in E(T^*)$, we therefore have $\lambda^*(\{x,v\})=\emptyset$. Hence, Statement (2c) holds.
To see that Statement (2b) is satisfied, we assume first, for contradiction, that there are two vertices $x,y\in I\in\mathcal{I}_2$ such that $\{x,v_I\}\in E(T^*)$ and $\{y,v'_I\}\in E(T^*)$ but $v_I\neq v'_I$. By Statement (2e), there is an inner $m$-edge contained in $P_{T^*}(x,y)$, since $v_I$ and $v'_I$ are distinct inner vertices. Thus, $\varepsilon(x,y)=\{m\}\neq\emptyset$; a contradiction to $x,y\in I$. By similar arguments, $\{x,v\},\{x',v\}\in E(T^*)$ with $x\in I\in\mathcal{I}_2$ and $x'\in I'\in\mathcal{I}_2$ imply, by Statement (2c), that $\varepsilon(x,x')=\emptyset$, and thus $x,x'\in I\cap I'$; since the sets in $\mathcal{I}$ form a partition of $X$, we obtain $I=I'$. Thus, the required uniqueness in Statement (2b) holds. Taken together, the previous arguments imply that Statement (2b) holds.
We continue by showing that every inner vertex $v\in\mathring{V}(T^*)$ is incident to an outer $\emptyset$-edge $\{v,x\}\in E(T^*)$. If $T^*$ is a star tree, i.e., $v$ is the only inner vertex, then there must be an outer edge $\{v,x\}\in E(T^*)$ with $\lambda^*(\{v,x\})=\emptyset$, since $\mathcal{I}_2\neq\emptyset$. Now, assume that $T^*$ has inner edges. Moreover, assume for contradiction that there is an inner vertex $v\in\mathring{V}(T^*)$ such that every outer edge $\{v,x\}\in E(T^*)$ has label $\lambda^*(\{x,v\})=\{m\}$. Since $v\in\mathring{V}(T^*)$ and $T^*$ has inner edges, we can apply Statement (2e) to conclude that there is an inner $m$-edge $\{v,w\}\in\mathring{E}(T^*)$. Now, consider the edge-labeled tree $(T',\lambda')$ obtained from $T^*$ by contraction of $\{v,w\}$ and keeping the remaining edge labels of $\lambda^*$. Let $\varepsilon'$ be the symmetrized Fitch map explained by $(T',\lambda')$ and let $x,y\in X$ be chosen arbitrarily. If $P_{T^*}(x,y)$ does not contain the $m$-edge $\{v,w\}$, then $P_{T'}(x,y)=P_{T^*}(x,y)$ and the edge-labels along this path remain unchanged. Therefore, $\varepsilon'(x,y)=\varepsilon(x,y)$. Now, suppose that $P_{T^*}(x,y)$ contains the edge $\{v,w\}$, and thus, $\varepsilon(x,y)=\{m\}$. If $P_{T'}(x,y)$ does not contain any inner edge, then either $x$ or $y$ must be incident to $v$ in $T^*$, say $\{x,v\}\in E(T^*)$, and thus, $\lambda'(\{x,v\})=\lambda^*(\{x,v\})=\{m\}$. Hence, $P_{T'}(x,y)$ still contains an $m$-edge, and therefore, $\varepsilon'(x,y)=\varepsilon(x,y)=\{m\}$. Otherwise, if $P_{T'}(x,y)$ contains an inner edge, then it contains in particular an $m$-edge (cf. Statement (2e)). Again, $\varepsilon'(x,y)=\varepsilon(x,y)=\{m\}$. Hence, we have shown that the symmetrized Fitch map $\varepsilon'$ explained by $(T',\lambda')$ coincides with $\varepsilon$. Consequently, $(T^*,\lambda^*)$ is not a least-resolved tree for $\varepsilon$; a contradiction. Therefore, every inner vertex $v\in\mathring{V}(T^*)$ is incident to an outer $\emptyset$-edge $\{v,x\}\in E(T^*)$.
Now, let $\{x,v\}\in E(T^*)$ be an outer edge with $x\in X\setminus\bigcup_{I\in\mathcal{I}_2}I$. In particular, $x\in X\setminus\bigcup_{I\in\mathcal{I}_2}I$ implies that $\varepsilon(x,y)=\{m\}$ for every $y\in X\setminus\{x\}$. Assume, for contradiction, that $\lambda^*(\{x,v\})=\emptyset$.
Due to the choice of $x$, every possible outer edge $\{y,v\}\in E(T^*)$ with $x\neq y$ must have label $\lambda^*(\{y,v\})=\{m\}$. Let us keep all edge-labels in $T^*$ except for $\{x,v\}$, which is relabeled to an $m$-edge. This results in an edge-labeled tree $(T^*,\lambda')$ in which $v$ is incident to $m$-edges only and that still explains $\varepsilon$. But then, $(T^*,\lambda')$, and thus $(T^*,\lambda^*)$, cannot be least-resolved, since we have shown that in a least-resolved tree every inner vertex must be incident to an outer $\emptyset$-edge. Hence, Statement (2d) holds.
It remains to show that every tree in $\mathcal{T}_\varepsilon$ is also least-resolved for $\varepsilon$. Hence, let $(T,\lambda)\in\mathcal{T}_\varepsilon$ be an edge-labeled tree. We show first that $(T,\lambda)$ explains $\varepsilon$. To this end, let $x,y\in X$ be distinct. If $\varepsilon(x,y)=\emptyset$, then $x,y$ are contained in the same independent set $I$ and, by construction (2b), the path $P_T(x,y)$ contains only the two outer $\emptyset$-edges $\{v_I,x\}$ and $\{v_I,y\}$. If $\varepsilon(x,y)=\{m\}$, then $x$ and $y$ are contained in distinct independent sets $I,I'\in\mathcal{I}$, say $x\in I$ and $y\in I'$. If $|I|=1$ then, by construction (2c) […]
It remains to show that $(T,\lambda)$ is least-resolved for $\varepsilon$. If $(T,\lambda)$ is not least-resolved for $\varepsilon$, then there is a least-resolved tree $(T',\lambda')$ for $\varepsilon$ such that $T'<T$ and therefore $|V(T')|<|V(T)|$. As argued above, if $(T^*,\lambda^*)$ is an arbitrary least-resolved tree for $\varepsilon$, then $(T^*,\lambda^*)\in\mathcal{T}_\varepsilon$ and we have by construction $|V(T^*)|=|V(T)|$. Therefore $|V(T')|<|V(T^*)|$, contradicting the fact that all least-resolved trees for $\varepsilon$ must have the same number of vertices. Consequently, $(T,\lambda)$ is least-resolved for $\varepsilon$.
In particular, Prop. 3.9 implies the following.
Corollary 3.10. Let $\varepsilon$ be a monochromatic symmetrized Fitch map. Then, the least-resolved trees for $\varepsilon$ have the same number of vertices and thus, in particular, the minimum number of vertices among all trees that explain $\varepsilon$, i.e., they are minimally-resolved trees for $\varepsilon$.
Corollary 3.11. Every monochromatic symmetrized Fitch map can be explained by an edge-labeled tree $(T,\lambda)$ of diameter $\mathrm{diam}(T)\leq 4$, i.e., the length of each path in $T$ is at most four.
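One way to realize Corollary 3.11 is suggested by the proof of Prop. 3.9: each maximal independent set $I$ with $|I|\geq 2$ receives its own inner vertex joined to the leaves of $I$ by $\emptyset$-edges, singleton parts are attached by $m$-edges, and the inner vertices are joined by $m$-edges arranged as a star. The Python sketch below follows this recipe. Since Def. 3.8 is only partially reproduced here, the construction is our reading of Statements (2b) to (2e), and the function names and inner vertex names (`"hub"`, `("v", i)`) are hypothetical. The sketch also recomputes path colors and path lengths so that both the explanation property and the diameter bound can be checked.

```python
from collections import defaultdict, deque


def star_of_parts(parts, m="m"):
    """Build an edge-labeled tree for the monochromatic map whose graph G_m is complete
    multi-partite with the given parts (maximal independent sets); assumes |X| >= 3."""
    big = [sorted(P) for P in parts if len(P) >= 2]
    singles = [next(iter(P)) for P in parts if len(P) == 1]
    edges, labels = [], {}

    def add(u, v, colors):
        edges.append((u, v))
        labels[frozenset((u, v))] = frozenset(colors)

    if not big:
        # G_m is complete: a star tree whose edges are all m-edges.
        for x in singles:
            add("hub", x, {m})
        return edges, labels
    inner = [("v", i) for i in range(len(big))]
    for v, P in zip(inner, big):
        for x in P:
            add(v, x, set())       # outer ∅-edges to the leaves of a part with >= 2 leaves
    for x in singles:
        add(inner[0], x, {m})      # outer m-edges for singleton parts
    for v in inner[1:]:
        add(inner[0], v, {m})      # inner m-edges, arranged as a star around inner[0]
    return edges, labels


def colors_and_lengths(edges, labels, x):
    """BFS from x: map each vertex to (colors on the path from x, number of edges)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    info = {x: (frozenset(), 0)}
    queue = deque([x])
    while queue:
        w = queue.popleft()
        for z in adj[w]:
            if z not in info:
                cols, d = info[w]
                info[z] = (cols | labels[frozenset((w, z))], d + 1)
                queue.append(z)
    return info


parts = [{"a", "b"}, {"c", "d", "e"}, {"f"}]
edges, labels = star_of_parts(parts)
part_of = {x: i for i, P in enumerate(parts) for x in P}
ok = True
for x in part_of:
    info = colors_and_lengths(edges, labels, x)
    for y in part_of:
        if y != x:
            cols, d = info[y]
            expected = frozenset() if part_of[x] == part_of[y] else frozenset({"m"})
            ok = ok and cols == expected and d <= 4
print(ok)  # True
```

The longest leaf-to-leaf path runs leaf, inner vertex, central inner vertex, inner vertex, leaf, which has four edges; arranging the inner vertices as a path instead would also explain the map but could exceed this bound.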
Corollary 3.11 can also be obtained from the explicit construction of rooted trees that explain undirected Fitch graphs Hellmuth et al. (2018b). […] As we shall prove in Lemma 3.16 below, every tree that explains $\varepsilon$ must display the quartets $ab|cd$ and $ac|bd$. However, by Corollary 2.2, the set $\{ab|cd, ac|bd\}$ of quartets is not compatible. Therefore, $\varepsilon$ cannot be a symmetrized Fitch map.

Characterization of symmetrized Fitch maps
Before we provide a characterization of symmetrized Fitch maps, we derive some necessary conditions. […] Since $\varepsilon$ is a symmetrized Fitch map, there is an edge-labeled tree $(T,\lambda)$ that explains $\varepsilon$. Now, create a tree $T'$ from $T$, where every leaf $x\in X\setminus X'$ in $T$ is deleted, and create an edge-labeling $\lambda': E(T')\to\mathcal{P}(M')$ with $\lambda'(e):=\lambda(e)\cap M'$ for every $e\in E(T')$. By construction, $m\in\varepsilon'(x,y)$ if and only if the unique path between $x$ and $y$ in $T'$ contains an $m$-edge for all $m\in M'$ and $x,y\in X'$. However, the tree $T'$ might have vertices of degree 2, and hence may not be a phylogenetic tree. In this case, we further modify $T'$ as follows: Suppose that there is a vertex $v$ of degree 2. Thus, there are two edges $e_1=\{v,w\}$ and $e_2=\{v,u\}$ in $T'$. Now, we remove vertex $v$ and the two edges $e_1$ and $e_2$ from $T'$, add the edge $f=\{u,w\}$, and call the resulting tree $T''$. By construction, every path in $T'$ between two leaves $x,y\in X'$ that contains the edge $e_1$ or $e_2$ must now contain the edge $f$ in $T''$. We construct the edge-labeling $\lambda'': E(T'')\to\mathcal{P}(M')$ with $\lambda''(e):=\lambda'(e)$ for all $e\in E(T'')\setminus\{f\}$ and $\lambda''(f):=\lambda'(e_1)\cup\lambda'(e_2)$. Then, for every $m\in M'$ and all distinct $x,y\in X'$, we have $m\in\varepsilon'(x,y)$ if and only if $m\in\lambda''(e)$ for some edge $e$ on $P_{T''}(x,y)$. Clearly, $T''$ and $\lambda''$ can be iteratively modified as described above until no vertices of degree 2 remain, and hence we end up with an edge-labeled tree $(\tilde T,\tilde\lambda)$. Thus, by construction of $\tilde T$ and $\tilde\lambda$, we have $m\in\varepsilon'(x,y)$ if and only if the unique path between $x$ and $y$ in $\tilde T$ contains an $m$-edge for all $m\in M'$ and $x,y\in X'$. Hence, $(\tilde T,\tilde\lambda)$ explains $\varepsilon'$, and therefore, $\varepsilon'$ is a symmetrized Fitch map.
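The construction in this proof (deleting leaves outside $X'$, intersecting labels with $M'$, and suppressing degree-2 vertices while merging the labels of their two edges) can be sketched as follows. The function name and example tree are hypothetical; the sketch repeatedly removes any leaf outside $X'$, so that inner vertices that become leaves after deletions are removed as well.

```python
from collections import defaultdict


def restrict(edges, labels, keep_leaves, keep_colors):
    """Restrict an edge-labeled tree to the leaves X' and colors M'; returns {edge: colors}."""
    adj = defaultdict(set)
    lab = {}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
        lab[frozenset((u, v))] = frozenset(labels.get(frozenset((u, v)), ())) & frozenset(keep_colors)
    keep = set(keep_leaves)
    changed = True
    while changed:
        changed = False
        for v in list(adj):
            if v not in adj or v in keep:
                continue
            if len(adj[v]) == 1:
                # Delete a leaf outside X' together with its edge.
                (w,) = adj.pop(v)
                adj[w].discard(v)
                del lab[frozenset((v, w))]
                changed = True
            elif len(adj[v]) == 2:
                # Suppress a degree-2 vertex; the new edge gets the union of both labels.
                u, w = adj.pop(v)
                adj[u].discard(v)
                adj[w].discard(v)
                adj[u].add(w)
                adj[w].add(u)
                lab[frozenset((u, w))] = lab.pop(frozenset((u, v))) | lab.pop(frozenset((v, w)))
                changed = True
    return lab


tree = [("a", "u"), ("b", "u"), ("u", "v"), ("v", "c"), ("v", "d")]
lam = {frozenset(("a", "u")): {"n"}, frozenset(("u", "v")): {"m"}, frozenset(("v", "d")): {"n"}}
restricted = restrict(tree, lam, {"a", "c", "d"}, {"m"})
print(sorted((sorted(e), sorted(c)) for e, c in restricted.items()))
# [(['a', 'v'], ['m']), (['c', 'v'], []), (['d', 'v'], [])]
```

Deleting leaf `b` leaves `u` with degree 2, so `u` is suppressed and the new edge `{a, v}` carries $(\{n\}\cap\{m\})\cup\{m\}=\{m\}$, matching $\varepsilon'(a,c)=\varepsilon'(a,d)=\{m\}$ and $\varepsilon'(c,d)=\emptyset$.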
Proposition 3.13. Let $\varepsilon:[X\times X]_{\mathrm{irr}}\to\mathcal{P}(M)$ be a symmetrized Fitch map. Then, for every color $m\in M$ the following equivalent statements are satisfied:
1. $G_m(\varepsilon)$ does not contain a $K_1+K_2$ as an induced subgraph.
2. For all pairwise distinct $a,b,c\in X$, $m\notin\varepsilon(a,b)$ and $m\notin\varepsilon(b,c)$ imply $m\notin\varepsilon(a,c)$.
3. $\mathcal{N}_{\neg m}[\varepsilon]$ is a partition of $X$.
4. $G_m(\varepsilon)$ is a complete multi-partite graph, where the neighborhoods in $\mathcal{N}_{\neg m}[\varepsilon]$ form precisely the maximal independent sets in $G_m(\varepsilon)$.
5. For every $N\in\mathcal{N}_{\neg m}[\varepsilon]$ and every $y\in N$ it holds that $N=N_{\neg m}[y]$.

Proof: […] Hence, we can apply Lemma 3.5 and Prop. 3.6 to conclude that Statements (1), (2) and (3) are satisfied and equivalent. We continue by showing the equivalence between Statements (3) and (4). To this end, observe first that Lemma 3.5 (1,3) and Proposition 3.6 (1,3) directly imply that $G_m(\varepsilon)$ is a complete multi-partite graph if and only if $\mathcal{N}_{\neg m}[\varepsilon]$ is a partition of $X$. Note that each complete multi-partite graph is, by definition, determined by its maximal independent sets. It remains to show that the neighborhoods in $\mathcal{N}_{\neg m}[\varepsilon]$ are precisely the maximal independent sets of $G_m(\varepsilon)$. Let $N_{\neg m}[y]\in\mathcal{N}_{\neg m}[\varepsilon]$. By definition, for all distinct $a,b\in N_{\neg m}[y]\setminus\{y\}$ we have $m\notin\varepsilon(a,y)$ and $m\notin\varepsilon(b,y)$. Hence, Statement (2) implies that $m\notin\varepsilon(a,b)$. By definition of $G_m(\varepsilon)$, none of $\{a,y\}$, $\{b,y\}$ and $\{a,b\}$ forms an edge in $G_m(\varepsilon)$. Hence, $N_{\neg m}[y]$ is an independent set of $G_m(\varepsilon)$. Assume, for contradiction, that $N_{\neg m}[y]$ is not a maximal independent set. Hence, there is a vertex $z\in X\setminus N_{\neg m}[y]$ such that $\{z,v\}\notin E(G_m(\varepsilon))$ for all $v\in N_{\neg m}[y]$. In particular, therefore, $\{z,y\}\notin E(G_m(\varepsilon))$ and thus, by definition of $G_m(\varepsilon)$, $m\notin\varepsilon(z,y)$. But then, $z\in N_{\neg m}[y]$; a contradiction. Therefore, the neighborhoods in $\mathcal{N}_{\neg m}[\varepsilon]$ are precisely the maximal independent sets of $G_m(\varepsilon)$.
We continue by showing that Statements (3) and (5) are equivalent. […] Next, we assume that Statement (5) is satisfied, and let $N,N'\in\mathcal{N}_{\neg m}[\varepsilon]$ be two arbitrary neighborhoods. Since $y\in N_{\neg m}[y]$ for every $y\in X$, we conclude that every neighborhood in $\mathcal{N}_{\neg m}[\varepsilon]$ is non-empty and $\bigcup_{y\in X}N_{\neg m}[y]=X$. Moreover, suppose that $N\cap N'\neq\emptyset$. Hence, there is a vertex $y\in N\cap N'$, and thus, by Statement (5), we obtain $N=N_{\neg m}[y]=N'$. The latter arguments together imply that $\mathcal{N}_{\neg m}[\varepsilon]$ is a partition of $X$, and thus Statement (3) is satisfied.
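We will need to define certain sets of subsplits associated with the complementary neighborhoods of $\varepsilon$. […] Clearly, if a set $\mathcal{S}$ of subsplits is compatible, then every subset $\mathcal{S}_0\subseteq\mathcal{S}$ is also compatible. Moreover, $\mathcal{S}(\varepsilon)$ is compatible if and only if $\mathcal{S}'(\varepsilon)$ is compatible, because $\mathcal{S}'(\varepsilon)\subseteq\mathcal{S}(\varepsilon)$ and every subsplit $N|N'\in\mathcal{S}(\varepsilon)\setminus\mathcal{S}'(\varepsilon)$ is trivial, and trivial subsplits are displayed by every tree with leaf set $X$. For later reference we summarize the latter observation in the following […] Before we provide our final characterization, we observe that compatibility of $\mathcal{S}(\varepsilon)$ is a necessary condition for symmetrized Fitch maps. […]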
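For experimentation, the subsplits can be generated from the complementary neighborhoods. Since the definition of $\mathcal{S}_m(\varepsilon)$ is not reproduced in this excerpt, the sketch below assumes, in line with its use in the proofs of Lemmas 3.16 and 3.19, that $\mathcal{S}_m(\varepsilon)$ consists of the subsplits $N|N'$ for distinct, disjoint $N,N'\in\mathcal{N}_{\neg m}[\varepsilon]$ and that $\mathcal{S}(\varepsilon)=\bigcup_{m\in M}\mathcal{S}_m(\varepsilon)$; the function name is hypothetical. The example uses two colors whose neighborhoods force the incompatible quartets $ab|cd$ and $ac|bd$.

```python
from itertools import combinations


def subsplits_of(X, eps, colors):
    """Assumed definition: S(eps) = union over m of { N|N' : N != N' in N_{not m}[eps], N ∩ N' = ∅ }."""
    S = set()
    for m in colors:
        hoods = {frozenset({y} | {x for x in X if x != y and m not in eps[frozenset((x, y))]})
                 for y in X}
        for N1, N2 in combinations(hoods, 2):
            if not N1 & N2:
                S.add(frozenset({N1, N2}))  # the unordered pair N1|N2
    return S


# Color m separates {a, b} from {c, d}; color n separates {a, c} from {b, d}.
X = "abcd"
cls_m = {"a": 0, "b": 0, "c": 1, "d": 1}
cls_n = {"a": 0, "c": 0, "b": 1, "d": 1}
eps = {frozenset((x, y)): frozenset(c for c, cls in (("m", cls_m), ("n", cls_n)) if cls[x] != cls[y])
       for x, y in combinations(X, 2)}
print(len(subsplits_of(X, eps, ["m", "n"])))  # 2
```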
Proof: Let $\varepsilon:[X\times X]_{\mathrm{irr}}\to\mathcal{P}(M)$ be a symmetrized Fitch map, and let $(T,\lambda)$ be an arbitrary edge-labeled tree that explains $\varepsilon$. We denote by $T|_L$ the vertex-minimal (not necessarily phylogenetic) subtree of $T$ with leaf set $L\subseteq L(T)$.
Assume for contradiction that there is a subsplit $N|N'\in\mathcal{S}(\varepsilon)$, say $N|N'\in\mathcal{S}_m(\varepsilon)$ for some color $m\in M$, that is not displayed by $T$. Clearly, if $|N|=1$ or $|N'|=1$, then $T$ displays $N|N'$. Thus, we can assume that $|N|>1$ and $|N'|>1$. Moreover, if none of the paths $P_T(a,b)$ and $P_T(c,d)$ with $a,b\in N$ and $c,d\in N'$ intersect, then the two trees $T|_N$ and $T|_{N'}$ are vertex-disjoint, and thus, there would be an edge $e\in E(T)$ such that $N\subseteq L(T_1)$ and $N'\subseteq L(T_2)$. Therefore, there are four leaves $a,b\in N$ and $c,d\in N'$ such that the paths $P_T(a,b)$ and $P_T(c,d)$ intersect. Hence, there is a vertex $v\in V(P_T(a,b))\cap V(P_T(c,d))$. Proposition 3.13 (5), together with $a,b\in N$ and $c,d\in N'$, implies that $a\in N=N_{\neg m}[b]$ and $c\in N'=N_{\neg m}[d]$. This, together with the fact that $(T,\lambda)$ explains $\varepsilon$, implies that there is no $m$-edge on either of the paths $P_T(a,b)$ and $P_T(c,d)$.
Since $v$ lies on both paths $P_T(a,b)$ and $P_T(c,d)$, there is no $m$-edge on the (sub)paths $P_T(a,v)$ and $P_T(v,d)$. Therefore, the path $P_T(a,d)\subseteq P_T(a,v)\cup P_T(v,d)$ cannot contain an $m$-edge. Now, Proposition 3.13 (5) and $a\in N$ imply that $N=N_{\neg m}[a]$. However, since $N|N'$ is a subsplit, we have $N\cap N'=\emptyset$, and therefore $d\notin N=N_{\neg m}[a]$, i.e., $m\in\varepsilon(a,d)$. This, together with the fact that $(T,\lambda)$ explains $\varepsilon$, implies that there is an $m$-edge on the path $P_T(a,d)$; a contradiction. In summary, every subsplit $N|N'\in\mathcal{S}(\varepsilon)$ is displayed by $T$.
Definition 3.18. Let $\varepsilon:[X\times X]_{\mathrm{irr}}\to\mathcal{P}(M)$ be a symmetric map such that $\mathcal{S}(\varepsilon)$ is compatible. Then, we denote with $(T_\varepsilon,\lambda_\varepsilon)$ an edge-labeled tree that satisfies the following two conditions: […]
[…] We show that for every color $m\in M$ and all distinct leaves $x,y\in X$ we have $m\in\varepsilon(x,y)$ if and only if there is an $m$-edge on the path $P_{T_\varepsilon}(x,y)$. To this end, let $m\in M$ be an arbitrary color, and let $x,y\in X$ be two distinct arbitrary leaves.
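First, suppose that $m\in\varepsilon(x,y)$. Then, we have $y\notin N_{\neg m}[x]$. This and $y\in N_{\neg m}[y]$ imply that $N_{\neg m}[x]\neq N_{\neg m}[y]$. Since $\mathcal{N}_{\neg m}[\varepsilon]$ is a partition of $X$, we obtain $N_{\neg m}[x]|N_{\neg m}[y]\in\mathcal{S}_m(\varepsilon)\subseteq\mathcal{S}(\varepsilon)$, and since $T_\varepsilon$ displays $\mathcal{S}(\varepsilon)$, there is an edge $e\in E(T_\varepsilon)$ such that $N_{\neg m}[x]\subseteq L(T_{e,x})$ and $N_{\neg m}[y]\subseteq L(T_{e,y})$, where $T_{e,x}$ and $T_{e,y}$ are the two connected components of $T_\varepsilon\setminus e$. We may assume w.l.o.g. that this splitting edge $e=\{v,w\}$ w.r.t. $N_{\neg m}[x]|N_{\neg m}[y]$ is chosen such that $v$ lies on the (unique) path $P_{T_\varepsilon}(w,x)$ and that $|V(T_{e,x})|$ is minimal among all such splitting edges w.r.t. $N_{\neg m}[x]|N_{\neg m}[y]$.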
There are two cases, either $|V(T_{e,x})|=1$ or $|V(T_{e,x})|>1$. First, suppose that $|V(T_{e,x})|=1$. This is the case if and only if $V(T_{e,x})=\{x\}$, i.e., $v=x$; since $N_{\neg m}[x]\subseteq L(T_{e,x})=\{x\}$, we thus have $|N_{\neg m}[x]|=1$. Assume for contradiction that there is an $N\in\mathcal{N}_{\neg m}[\varepsilon]$ with $x',y'\in N$ such that $e\in E(P_{T_\varepsilon}(x',y'))$. Since the path $P_{T_\varepsilon}(x',y')$ contains an edge, we conclude that $x'\neq y'$, and thus, $|N|\geq 2$. Therefore, $N\neq N_{\neg m}[x]$, which, since $\mathcal{N}_{\neg m}[\varepsilon]$ forms a partition of $X$, implies that $N\cap N_{\neg m}[x]=\emptyset$. However, since $e=\{v,w\}=\{x,w\}$ is an outer edge, we conclude that $x\in\{x',y'\}\subseteq N$. Thus, $x\in N\cap N_{\neg m}[x]=\emptyset$; a contradiction. Hence, Condition (2b) in Def. 3.18 is satisfied. Thus, by construction of $\lambda_\varepsilon$, we have $m\in\lambda_\varepsilon(e)$. Since $e$ is an edge of the path $P_{T_\varepsilon}(x,y)$, there is an $m$-edge in $P_{T_\varepsilon}(x,y)$.
Otherwise, if $|V(T_{e,x})|>1$ and thus $|L(T_{e,x})|>1$, then the minimality of $|V(T_{e,x})|$ implies that there are two leaves $x',x''\in N_{\neg m}[x]$ such that $v\in V(P_{T_\varepsilon}(x',x''))$. Now, assume for contradiction that $e$ is not an $m$-edge. Since $e$ satisfies Condition (2a) in Def. 3.18, it therefore cannot satisfy Condition (2b) in Def. 3.18. Hence, there is a neighborhood $N\in\mathcal{N}_{\neg m}[\varepsilon]$ with $z',z''\in N$ such that $e\in E(P_{T_\varepsilon}(z',z''))$. This, together with $e=\{v,w\}$, implies $v\in V(P_{T_\varepsilon}(x',x''))\cap V(P_{T_\varepsilon}(z',z''))$. Since one of the leaves in $\{z',z''\}\subseteq N$ is not contained in $T_{e,x}$ and since $N_{\neg m}[x]\subseteq L(T_{e,x})$, we have $N\neq N_{\neg m}[x]$. Since $\mathcal{N}_{\neg m}[\varepsilon]$ is a partition of $X$, it must hold that $N\cap N_{\neg m}[x]=\emptyset$. Therefore, $N|N_{\neg m}[x]\in\mathcal{S}_m(\varepsilon)\subseteq\mathcal{S}(\varepsilon)$. However, $v\in V(P_{T_\varepsilon}(x',x''))\cap V(P_{T_\varepsilon}(z',z''))$, together with $x',x''\in N_{\neg m}[x]$ and $z',z''\in N$, implies that the subsplit $N|N_{\neg m}[x]\in\mathcal{S}(\varepsilon)$ is not displayed by $T_\varepsilon$; a contradiction. Therefore, $e$ is an $m$-edge that lies on the path $P_{T_\varepsilon}(x,y)$.
It remains to show that the existence of an $m$-edge on the path $P_{T_\varepsilon}(x,y)$ implies $m\in\varepsilon(x,y)$. Using contraposition, assume that $m\notin\varepsilon(x,y)$, and thus $x,y\in N_{\neg m}[x]\in\mathcal{N}_{\neg m}[\varepsilon]$. For every edge $e\in E(P_{T_\varepsilon}(x,y))$, Condition (2b) in Def. 3.18 is violated. Hence, for all $e\in E(P_{T_\varepsilon}(x,y))$, we have by construction of $\lambda_\varepsilon$ that $m\notin\lambda_\varepsilon(e)$. Thus, $P_{T_\varepsilon}(x,y)$ does not contain an $m$-edge, which completes the proof.
The characterization of symmetrized Fitch maps, which is summarized in Theorem 3.20, now follows directly from Proposition 3.13 (3), Corollary 3.17 and Lemma 3.19. […] For later reference we state here a simple consequence of Theorem 3.20. […]
[…] This characterization was entirely based on the cardinality of complementary neighborhoods, and the proof relied on the fact that the least-resolved tree for a non-symmetrized Fitch map is unique. However, finding a characterization for "k-restricted" symmetrized Fitch maps seems to be quite difficult, since we cannot build upon the uniqueness of least-resolved trees, which fails for symmetrized Fitch maps (see Fig. 3 for a counterexample). Thus, it remains an open question whether such restrictions may lead to a deeper understanding of symmetrized Fitch maps and whether such maps can be recognized in polynomial time.
Real-life estimates of graphs are usually subject to measurement errors. Attempts to correct these estimates naturally lead to editing problems. In our setting, given a symmetric map $\varepsilon$, we are interested in a symmetrized Fitch map $\varepsilon'$ that is "as close as possible" to $\varepsilon$. A natural distance measure is, e.g., the sum over all colors $m\in M$ of the sizes of the symmetric differences $E(G_m(\varepsilon))\,\triangle\,E(G_m(\varepsilon'))$. In the light of Corollary 3.21, one may ask whether there is a connection between this "Fitch Map Editing" problem and the problem of finding a maximum subset of consistent quartets in $\mathcal{S}'(\varepsilon)$. Conversely, can one of the many heuristics for the MAXIMUM QUARTET CONSISTENCY PROBLEM (see Morgado and Marques-Silva (2010); Reaz et al. (2014) and the references therein) be adapted such that $\mathcal{N}_{\neg m}[\varepsilon]$ remains a partition for every $m\in M$?
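The distance measure suggested here is straightforward to compute for two maps on the same leaf set. The sketch below (function name hypothetical) evaluates it; it does not attempt to solve the editing problem itself, which asks for the closest symmetrized Fitch map.

```python
def edit_distance(eps1, eps2, colors):
    """Sum over colors m of |E(G_m(eps1)) Δ E(G_m(eps2))|; both maps keyed by unordered pairs."""
    total = 0
    for m in colors:
        E1 = {p for p, c in eps1.items() if m in c}
        E2 = {p for p, c in eps2.items() if m in c}
        total += len(E1 ^ E2)  # symmetric difference of the two edge sets
    return total


# K_1 + K_2 (not a symmetrized Fitch map) versus the all-empty map (explained by a star of ∅-edges).
eps_k1k2 = {frozenset("ab"): {"m"}, frozenset("ac"): set(), frozenset("bc"): set()}
eps_empty = {p: set() for p in eps_k1k2}
print(edit_distance(eps_k1k2, eps_empty, ["m"]))  # 1
```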