A matroid associated with a phylogenetic tree

A (pseudo-)metric $D$ on a finite set $X$ is said to be a 'tree metric' if there is a finite tree with leaf set $X$ and non-negative edge weights so that, for all $x,y \in X$, $D(x,y)$ is the path distance in the tree between $x$ and $y$. It is well known that not every metric is a tree metric. However, when such a tree exists, one can always find one whose interior edges have strictly positive edge weights and that has no vertices of degree 2; any such tree is, up to canonical isomorphism, uniquely determined by $D$, and one does not even need all of the distances in order to fully (re-)construct the tree's edge weights in this case. Thus, it seems of some interest to investigate which subsets of $\binom{X}{2}$ suffice to determine ('lasso') these edge weights. In this paper, we use the results of a previous paper to discuss the structure of a matroid that can be associated with an (unweighted) $X$-tree $T$, defined by the requirement that its bases are exactly the 'tight edge-weight lassos' for $T$, i.e., the minimal subsets $\mathcal{L}$ of $\binom{X}{2}$ that lasso the edge weights of $T$.


Introduction
Given any finite tree $T$ without vertices of degree 2, there is an associated matroid $\mathcal{M}(T)$ having ground set $\binom{X}{2}$, where $X$ is the set of leaves of $T$. In this paper, we describe this matroid and investigate a number of interesting properties it exhibits. The motivation for studying this matroid is its relevance to the problem of uniquely reconstructing an edge-weighted tree from its topology and just some of the leaf-to-leaf distances in that tree. This combinatorial problem arises in phylogenetics (the inference of evolutionary relationships from genetic data) since, due to patchy taxon coverage by available genetic loci [7], reliable estimates of evolutionary distances can often be obtained only for some pairs of species.
In [3], we already introduced and explored related mathematical questions. We asked when knowing just some of the leaf-to-leaf distances is sufficient to uniquely determine (or, as we say, to 'lasso') the topology of the tree, its edge weights, or both. In this paper, we turn our attention to a fixed (unweighted) tree $T$ and to the set of minimal subsets $\mathcal{L}$ of $\binom{X}{2}$ for which the leaf-to-leaf distances between all $x, y \in X$ with $\{x, y\} \in \mathcal{L}$, relative to some edge weighting $\omega$ of $T$, suffice to determine all the other distances relative to $\omega$ and, thus, the edge weighting $\omega$ itself. Indeed, these subsets form the bases of the matroid $\mathcal{M}(T)$ studied here.
We begin by recalling some basic definitions and relevant terminology from [3] on trees, lassos, and associated concepts (readers unfamiliar with basic matroid theory may wish to consult [9], though even Wikipedia may suffice). We then define $\mathcal{M}(T)$ and describe some of its basic properties before presenting our main results. Finally, we provide a number of remarks, observations, and questions for possible further study.

Some terminology and basic facts
We will assume throughout that $X$ is a finite set of cardinality $n \geq 3$ and, for any two elements $x, y \in X$, we will usually write just $xy$ instead of $\{x, y\}$; we will refer to any such set as a 'cord' whenever $x \neq y$ holds. Throughout this paper, we will also assume that $T = (V, E)$ is an $X$-tree, i.e., a finite tree with vertex set $V$, leaf set $X \subseteq V$, and edge set $E \subseteq \binom{V}{2}$ that has no vertices of degree 2. Two $X$-trees $T_1 = (V_1, E_1)$ and $T_2 = (V_2, E_2)$ are said to be 'equivalent' if there exists a bijection $\varphi: V_1 \to V_2$ with $\varphi(x) = x$ for all $x \in X$ and $E_2 = \{\{\varphi(u), \varphi(v)\} : \{u, v\} \in E_1\}$, in which case we will also write $T_1 \cong T_2$. In case every interior vertex of an $X$-tree $T$ (that is, every vertex in $V - X$) has degree 3, $T$ will also be said to be a 'binary' $X$-tree.
Further, given any two vertices $u, v$ of $T$, we denote by $[u, v]$ the set of all vertices on the path $p_T(u, v)$ in $T$ from $u$ to $v$, and by $E(u|v) = E_T(u|v)$ the set of all edges $e$ in $E$ on that path, so that $[u, v] = \bigcup_{e \in E(u|v)} e$ holds whenever $u \neq v$.
For each $e \in E$, we denote by $\omega_e$ the map $E \to \mathbb{R}: f \mapsto \delta_{ef}$ (where $\delta$ is, of course, the Kronecker delta function). And for all $e \in E$ and $xy \in \binom{X}{2}$, we put $\delta_{e|xy} := 1$ in case $e \in E(x|y)$ and $\delta_{e|xy} := 0$ otherwise.
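Here, given an $X$-tree $T = (V, E)$, we will be mainly concerned with the $\mathbb{R}$-linear map
$$\mathbb{R}^E \to \mathbb{R}^{\binom{X}{2}}: \omega \mapsto \omega_T, \qquad \omega_T(xy) := \sum_{e \in E} \omega(e)\, \delta_{e|xy},$$
and the associated $\binom{X}{2}$-labeled family of linear forms
$$\lambda_{xy} = \lambda^T_{xy}: \mathbb{R}^E \to \mathbb{R}: \omega \mapsto \omega_T(xy) \qquad \left(xy \in \tbinom{X}{2}\right).$$
Note that $\lambda_{xy}(\omega_e) = \delta_{e|xy}$ holds for all $e \in E$ and all $xy \in \binom{X}{2}$, and that $D_\omega(x, y) = \lambda_{xy}(\omega) = \omega_T(xy)$ holds for all $\omega \in \mathbb{R}^E$ and $x, y \in X$, where $D_\omega = D_{(T,\omega)}$ denotes the map associated to the edge weighting $\omega$. In case $\omega$ is a non-negative edge weighting, this map is nothing but the (pseudo-)metric on $X$ induced by the edge-weighted tree $\mathcal{T} = (T, \omega)$, much studied in phylogenetic analysis.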
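To make the forms $\lambda_{xy}$ concrete, here is a minimal Python sketch. It assumes a tree is encoded as a list of edges (pairs of vertex labels) and an edge weighting $\omega$ as a list of numbers in the same order; the helper names `adjacency`, `path_edges`, `lam`, and `tree_distance`, as well as the interior vertex labels `u` and `v`, are hypothetical and chosen only for illustration. The code computes the coefficient vector $(\delta_{e|xy})_{e \in E}$ and evaluates $D_\omega(x, y) = \lambda_{xy}(\omega)$ on the quartet tree $T_{ab|cd}$.

```python
from collections import deque


def adjacency(edges):
    """Adjacency sets of a tree given as a list of edges (pairs of vertex labels)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def path_edges(adj, x, y):
    """The edge set E(x|y) of the unique x-y path, as frozensets {u, v}."""
    parent = {x: None}
    queue = deque([x])
    while queue:  # breadth-first search from x records each vertex's parent
        u = queue.popleft()
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                queue.append(w)
    on_path, v = set(), y
    while parent[v] is not None:  # walk back from y to x along the parents
        on_path.add(frozenset((v, parent[v])))
        v = parent[v]
    return on_path


def lam(edges, x, y):
    """Coefficient vector (delta_{e|xy}) of the form lambda_xy, in the order of edges."""
    on_path = path_edges(adjacency(edges), x, y)
    return [1 if frozenset(e) in on_path else 0 for e in edges]


def tree_distance(edges, omega, x, y):
    """D_omega(x, y) = lambda_xy(omega) = sum of omega(e) * delta_{e|xy} over e in E."""
    return sum(w * d for w, d in zip(omega, lam(edges, x, y)))


# Quartet tree T_{ab|cd}; the interior vertex labels u, v are an assumption.
E = [("a", "u"), ("b", "u"), ("u", "v"), ("c", "v"), ("d", "v")]
omega = [1, 2, 3, 4, 5]  # an arbitrary edge weighting, listed in the order of E
print(lam(E, "a", "c"))                   # [1, 0, 1, 1, 0]
print(tree_distance(E, omega, "a", "c"))  # 1 + 3 + 4 = 8
```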
Recall also that, given an arbitrary metric $D: X \times X \to \mathbb{R}_{\geq 0}$ defined on $X$,
• the metric $D$ is dubbed a 'tree metric' if it is of the form $D_{(T,\omega)}$ for some $X$-tree $T = (V, E)$ and some non-negative edge weighting $\omega: E \to \mathbb{R}_{\geq 0}$ of $T$,
• which, in turn, holds if and only if $D$ satisfies the well-known 'four-point condition' stating that, for all $a, b, c, d$ in $X$, the larger two of the three distance sums $D(a, b) + D(c, d)$, $D(a, c) + D(b, d)$, and $D(a, d) + D(b, c)$ coincide;
• in this case, one can always find an $X$-tree $T$ and an edge weighting $\omega$ of $T$ with $D = D_{(T,\omega)}$ such that $\omega$ is strictly positive on all interior edges, in which case $\omega$ is called a 'proper' edge weighting of $T$;
• any such pair $(T, \omega)$ is, up to canonical isomorphism, uniquely determined by $D$;
• and one does not even need to know the values of $D$ for all cords $xy$ in $\binom{X}{2}$ in order to determine all the other distances and, thus, the edge weighting $\omega$ in this case.
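The four-point condition can be checked directly on a finite distance table. The Python sketch below uses hypothetical helpers `as_metric` and `four_point` and assumes that $D$ is already a metric, so that only quadruples of distinct points need to be examined (for quadruples with repeated points the condition follows from the triangle inequality). It accepts the distances of an edge-weighted quartet tree and rejects the shortest-path metric of a 4-cycle, which is not a tree metric.

```python
from itertools import combinations


def as_metric(table):
    """Turn a table {"xy": distance} into a symmetric function D(x, y) with D(x, x) = 0."""
    lookup = {frozenset(k): v for k, v in table.items()}
    return lambda x, y: 0 if x == y else lookup[frozenset((x, y))]


def four_point(D, X, tol=1e-9):
    """For every 4-subset {a, b, c, d} of X, check that the two largest of the sums
    D(a,b)+D(c,d), D(a,c)+D(b,d), D(a,d)+D(b,c) coincide (up to tol).

    Assumes D is a metric, so quadruples with repeated points need no check."""
    for a, b, c, d in combinations(X, 4):
        s = sorted([D(a, b) + D(c, d), D(a, c) + D(b, d), D(a, d) + D(b, c)])
        if abs(s[2] - s[1]) > tol:
            return False
    return True


# Distances of T_{ab|cd} with pendant weights 1, 2, 4, 5 (for a, b, c, d) and central weight 3.
quartet = as_metric({"ab": 3, "cd": 9, "ac": 8, "ad": 9, "bc": 9, "bd": 10})
# Shortest-path metric of the 4-cycle a-b-c-d-a.
square = as_metric({"ab": 1, "bc": 1, "cd": 1, "ad": 1, "ac": 2, "bd": 2})
print(four_point(quartet, "abcd"))  # True
print(four_point(square, "abcd"))   # False
```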
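In this note, we continue the investigation, begun in [3], of those subsets $\mathcal{L}$ of $\binom{X}{2}$ for which, given the $X$-tree $T$, already the restriction $\omega_T|_{\mathcal{L}}$ of the map $\omega_T$ to $\mathcal{L}$ suffices to determine (or 'lasso') the edge weighting $\omega$ of $T$. To this end, we denote, for any subset $\mathcal{L}$ of $\binom{X}{2}$,
- by $\langle \mathcal{L} \rangle = \langle \mathcal{L} \rangle_T$ the $\mathbb{R}$-linear subspace of the dual vector space $(\mathbb{R}^E)^* := \mathrm{Hom}_{\mathbb{R}}(\mathbb{R}^E, \mathbb{R})$ of the space $\mathbb{R}^E$ generated by the maps $\lambda_{xy}$ with $xy \in \mathcal{L}$,
- by $\mathrm{rk}(\mathcal{L}) = \mathrm{rk}_T(\mathcal{L}) := \dim_{\mathbb{R}} \langle \mathcal{L} \rangle$ the dimension of $\langle \mathcal{L} \rangle$, and
- by $\Gamma(\mathcal{L}) := (X, \mathcal{L})$ the graph with vertex set $X$ and edge set $\mathcal{L}$.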
Following the conventions introduced in [3],
- we will refer to a subset $\mathcal{L}$ of $\binom{X}{2}$ as being 'connected', 'disconnected', 'bipartite', etc. whenever the graph $\Gamma(\mathcal{L})$ is connected, disconnected, bipartite, and so on,
- a connected component of $\Gamma(\mathcal{L})$ will also be called a connected component of $\mathcal{L}$,
- and, given any two subsets $A, B$ of $X$, the subset $\{ab : a \in A, b \in B\}$ of $\binom{X}{2}$ will be denoted by $A \vee B$, so that a subset $\mathcal{L}$ of $\binom{X}{2}$ is bipartite if and only if there exist two disjoint subsets $A, B$ of $X$ with $\mathcal{L} \subseteq A \vee B$.
Further, a subset $\mathcal{L}$ of $\binom{X}{2}$ will be called
- an 'edge-weight lasso' for $T$ if the implication "$(\omega_1)_T|_{\mathcal{L}} = (\omega_2)_T|_{\mathcal{L}} \Rightarrow \omega_1 = \omega_2$" holds for any two proper edge weightings $\omega_1, \omega_2: E \to \mathbb{R}_{>0}$ of $T$,
- a 'topological lasso' for $T$ if the implication "$(\omega_1)_T|_{\mathcal{L}} = (\omega_2)_{T'}|_{\mathcal{L}} \Rightarrow T \cong T'$" holds for any $X$-tree $T'$ and any proper edge weightings $\omega_1$ of $T$ and $\omega_2$ of $T'$, respectively, and
- a 'strong lasso' for $T$ if it is simultaneously an edge-weight and a topological lasso for $T$.
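Next, recall (see e.g. [6,9]) that an 'abstract' matroid $\mathcal{M}$ with a ground set, say, $M$ can be described equivalently in terms of its rank function $\mathrm{rk}_{\mathcal{M}}$, its independent sets, its bases, its circuits, or its generating (i.e., spanning) sets. Here, given any $X$-tree $T$, we want to investigate the matroid $\mathcal{M}(T)$ with ground set $M := \binom{X}{2}$ associated to $T$ whose rank function $\mathrm{rk}_{\mathcal{M}(T)}$ is the map $\mathrm{rk}_T: \mathcal{P}(\binom{X}{2}) \to \mathbb{N}_0$ defined just above, i.e., the matroid that is 'represented' (over $\mathbb{R}$, again see e.g. [6,9]) by the map $\binom{X}{2} \to (\mathbb{R}^E)^*: xy \mapsto \lambda^T_{xy}$. We denote by $\mathcal{I}(T)$, $\mathcal{B}(T)$, $\mathcal{C}(T)$, and $\mathcal{G}(T)$ the collections of its independent sets, bases, circuits, and generating sets, respectively, and by $[\mathcal{L}]_T$ the closure of a subset $\mathcal{L}$ of $\binom{X}{2}$ relative to $\mathcal{M}(T)$. It was noted already in [3, Theorem 1] that a subset $\mathcal{L}$ of $\binom{X}{2}$ is an edge-weight lasso for an $X$-tree $T = (V, E)$ if and only if the implication "$(\omega_1)_T|_{\mathcal{L}} = (\omega_2)_T|_{\mathcal{L}} \Rightarrow \omega_1 = \omega_2$" holds not only for any two proper edge weightings $\omega_1, \omega_2$ of $T$, but for any two maps $\omega_1, \omega_2 \in \mathbb{R}^E$, and hence if and only if $\langle \mathcal{L} \rangle$ coincides with $(\mathbb{R}^E)^*$ or, using the terminology introduced above, if and only if $\mathcal{L} \in \mathcal{G}(T)$ or, equivalently, $\mathrm{rk}_T(\mathcal{L}) = \mathrm{rk}_T(\binom{X}{2}) = |E|$ holds. In particular, an edge-weight lasso $\mathcal{L}$ for $T$ is a 'tight' edge-weight lasso for $T$, i.e., a minimal subset of $\binom{X}{2}$ that is an edge-weight lasso for $T$, if and only if its cardinality coincides with $|E|$, that is, if and only if it is a basis of $\mathcal{M}(T)$, i.e., $\mathcal{L} \in \mathcal{B}(T)$ holds.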
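Since $\mathcal{M}(T)$ is represented over $\mathbb{R}$ by $xy \mapsto \lambda^T_{xy}$, the rank $\mathrm{rk}_T(\mathcal{L})$ is simply a matrix rank, and the lasso tests above reduce to rank comparisons. The following minimal Python sketch assumes trees encoded as edge lists and single-character leaf labels, so that a cord can be written as a string such as `"ab"`; all function names are hypothetical. It uses exact rational arithmetic via `fractions.Fraction`, so no floating-point tolerance is needed.

```python
from collections import deque
from fractions import Fraction


def adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def path_edges(adj, x, y):
    """Edges of the unique x-y path, found by breadth-first search from x."""
    parent, queue = {x: None}, deque([x])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                queue.append(w)
    on_path, v = set(), y
    while parent[v] is not None:
        on_path.add(frozenset((v, parent[v])))
        v = parent[v]
    return on_path


def lam(edges, x, y):
    on_path = path_edges(adjacency(edges), x, y)
    return [1 if frozenset(e) in on_path else 0 for e in edges]


def rank(rows):
    """Exact rank (over Q, hence over R) of integer rows, by Gauss-Jordan elimination."""
    m = [[Fraction(a) for a in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def rk(edges, cords):
    """rk_T(L): dimension of the span of the forms lambda_xy with xy in L."""
    return rank([lam(edges, x, y) for x, y in cords])


def is_edge_weight_lasso(edges, cords):
    return rk(edges, cords) == len(edges)


def is_tight_lasso(edges, cords):
    """A basis of M(T): an edge-weight lasso with exactly |E| cords."""
    return len(cords) == len(edges) and is_edge_weight_lasso(edges, cords)


E = [("a", "u"), ("b", "u"), ("u", "v"), ("c", "v"), ("d", "v")]  # T_{ab|cd}
print(is_tight_lasso(E, ["ab", "ac", "ad", "bc", "cd"]))  # True
print(rk(E, ["ac", "ad", "bc", "bd", "cd"]))              # 4
```

On $T_{ab|cd}$, the set $\{ab, ac, ad, bc, cd\}$ comes out as a tight edge-weight lasso, while $\{ac, ad, bc, bd, cd\}$, despite having $|E| = 5$ elements, only has rank 4.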
Particular types of $X$-trees that will play an important role in this paper are shown in Figure 1: (i) 'star trees', i.e., the $X$-trees $T(X) := (X \cup \{*\}, \{\{*, x\} : x \in X\})$, where '$*$' denotes just some arbitrary, but fixed element not in $X$; (ii) 'quartet trees', i.e., binary $X$-trees that have four leaves (with $T_{ab|cd}$ denoting the quartet tree with leaf set $\{a, b, c, d\}$ whose central edge, which will also be denoted by $e_{ab|cd}$, separates the leaves $a, b$ from $c, d$); and (iii) 'caterpillar trees', i.e., binary $X$-trees all of whose interior vertices lie on a single path.

Figure 1: (i) A star tree with leaf set $X_4 := \{a, b, c, d\}$; (ii) a binary $X_4$-tree (up to equivalence, there are two more binary $X_4$-trees); (iii) a 'caterpillar' $X_n$-tree for $X_n := \{a_1, a_2, \ldots, a_{n-1}, a_n\}$.

Star trees
For the simplest type of $X$-tree, i.e., the star tree $T := T(X)$ with leaf set $X$ (cf. Figure 1 (i)), the associated matroid $\mathcal{M}(T)$ is well known: it is easily seen to coincide exactly with the 'biased matroid' of the complete signed graph $(X, \binom{X}{2})$ with vertex set $X$ all of whose edges have sign $-1$. In consequence (see e.g. [6, Section 6.10] and the references therein to Zaslavsky's papers on signed graphic matroids), the following results are known to hold:

Proposition 3.1. Given a finite set $X$ of cardinality $n \geq 3$, the following holds for the matroid $\mathcal{M}(T)$ associated to the star tree $T := T(X)$ with leaf set $X$:
(i) The collection $\mathcal{G}(T)$ of all edge-weight lassos for $T$ coincides with the collection of all 'strongly non-bipartite' subsets $\mathcal{L}$ of $\binom{X}{2}$, i.e., all subsets $\mathcal{L}$ of $\binom{X}{2}$ for which none of the connected components of $\mathcal{L}$ is bipartite.
(ii) The collection $\mathcal{B}(T)$ of all tight edge-weight lassos for $T$ coincides with the collection of all minimal strongly non-bipartite subsets $\mathcal{L}$ of $\binom{X}{2}$, i.e., all subsets $\mathcal{L}$ of $\binom{X}{2}$ for which each connected component of $\mathcal{L}$ contains exactly one circle, and this circle has odd length.
(iii) The collection $\mathcal{I}(T)$ of all independent subsets of $\mathcal{M}(T)$ coincides with the collection of all subsets $\mathcal{L}$ of $\binom{X}{2}$ for which each connected component of $\mathcal{L}$ is either a tree or contains exactly one circle, and this circle has odd length.
(iv) The collection $\mathcal{C}(T)$ of all circuits of $\mathcal{M}(T)$ coincides with the collection of all subsets $\mathcal{L}$ of $\binom{X}{2}$ that form either a circle of even length or a pair of circles of odd length together with a connecting simple path, such that the two circles are either disjoint (in which case the connecting path has one end in common with each circle and is otherwise disjoint from both) or share just a single common vertex (in which case the connecting path is that single vertex).
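(v) The co-rank $n - \mathrm{rk}_T(\mathcal{L})$ of a subset $\mathcal{L}$ of $\binom{X}{2}$ relative to $\mathcal{M}(T)$ coincides with the number of bipartite connected components of $\mathcal{L}$ (isolated vertices of $\Gamma(\mathcal{L})$ included).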
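(vi) The closure $[\mathcal{L}]_T$ of a subset $\mathcal{L}$ of $\binom{X}{2}$ relative to $\mathcal{M}(T)$ coincides with the union of (a) the edge set of the complete graph whose vertex set is the union of the vertex sets of all non-bipartite connected components of $\mathcal{L}$ and (b) all subsets of the form $A \vee B$, where $A, B$ is the bipartition of the vertex set of some bipartite connected component of $\mathcal{L}$.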
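The co-rank formula is easy to test exhaustively for small $n$. The Python sketch below (helper names hypothetical) uses that, for the star tree, $\lambda_{xy}$ has coefficient 1 exactly at the pendant edges of $x$ and $y$; it compares $n - \mathrm{rk}_T(\mathcal{L})$ with the number of bipartite connected components of $\Gamma(\mathcal{L})$, counting isolated vertices, over all $2^{10}$ subsets $\mathcal{L}$ for $n = 5$. Such a check only illustrates the statement; it is no proof.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations


def rank(rows):
    """Exact rank over Q of integer rows (Gauss-Jordan elimination)."""
    m = [[Fraction(a) for a in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def star_rank(X, cords):
    """rk_T(L) for the star tree T(X): lambda_xy is 1 on the pendant edges of x and y."""
    index = {x: i for i, x in enumerate(X)}
    rows = []
    for x, y in cords:
        row = [0] * len(X)
        row[index[x]] = row[index[y]] = 1
        rows.append(row)
    return rank(rows)


def bipartite_components(X, cords):
    """Number of bipartite connected components of Gamma(L) = (X, L), via 2-colouring."""
    adj = {x: [] for x in X}
    for x, y in cords:
        adj[x].append(y)
        adj[y].append(x)
    colour, count = {}, 0
    for s in X:
        if s in colour:
            continue
        colour[s], ok, queue = 0, True, deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in colour:
                    colour[w] = 1 - colour[u]
                    queue.append(w)
                elif colour[w] == colour[u]:
                    ok = False  # an odd circle: this component is not bipartite
        count += ok
    return count


X = "abcde"
cords = list(combinations(X, 2))
mismatches = 0
for mask in range(1 << len(cords)):  # every subset L of binom(X, 2)
    L = [c for i, c in enumerate(cords) if mask >> i & 1]
    if len(X) - star_rank(X, L) != bipartite_components(X, L):
        mismatches += 1
print(mismatches)  # 0
```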

A recursive approach for computing B(T )
Every $X$-tree can be reduced to a star tree by a sequence of edge contractions (one may even insist that, at each stage, one of the two subtrees incident with the edge being contracted has only one non-leaf vertex, though we do not require this here). Thus, Proposition 3.1 can be used as the basis for a recursive description of the matroid associated with any $X$-tree, provided one can describe, for any $X$-tree $T$, how to obtain $\mathcal{B}(T)$ from $\mathcal{B}(T/f)$, where $f$ is any interior edge of $T$ and $T/f$ is the $X$-tree obtained from $T$ by collapsing $f$. We provide such a description in Proposition 4.2, using the following lemma.
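Collapsing an interior edge is a simple operation on an edge list. In the minimal Python sketch below, which assumes trees are encoded as lists of edges, the hypothetical function `contract` deletes $f = \{p, q\}$ and merges $q$ into $p$. It illustrates, on one example only, the statement of Lemma 4.1 that an edge-weight lasso for $T$ remains one for $T/f$: a basis of $\mathcal{M}(T_{ab|cd})$ still has full rank 4 relative to the resulting star tree.

```python
from collections import deque
from fractions import Fraction


def adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def path_edges(adj, x, y):
    """Edges of the unique x-y path, found by breadth-first search from x."""
    parent, queue = {x: None}, deque([x])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                queue.append(w)
    on_path, v = set(), y
    while parent[v] is not None:
        on_path.add(frozenset((v, parent[v])))
        v = parent[v]
    return on_path


def lam(edges, x, y):
    on_path = path_edges(adjacency(edges), x, y)
    return [1 if frozenset(e) in on_path else 0 for e in edges]


def rank(rows):
    """Exact rank over Q of integer rows (Gauss-Jordan elimination)."""
    m = [[Fraction(a) for a in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def contract(edges, f):
    """Edge list of T/f: drop the interior edge f = {p, q} and relabel q as p."""
    p, q = f
    return [tuple(p if w == q else w for w in e) for e in edges if set(e) != {p, q}]


E = [("a", "u"), ("b", "u"), ("u", "v"), ("c", "v"), ("d", "v")]  # T_{ab|cd}
E_star = contract(E, ("u", "v"))
print(E_star)  # [('a', 'u'), ('b', 'u'), ('c', 'u'), ('d', 'u')]
L = ["ab", "ac", "ad", "bc", "cd"]  # a basis of M(T_{ab|cd})
print(rank([lam(E, x, y) for x, y in L]), rank([lam(E_star, x, y) for x, y in L]))  # 5 4
```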
Lemma 4.1. Given any $X$-tree $T = (V, E)$, any subset $F$ of the set of interior edges of $T$, any map $\lambda \in (\mathbb{R}^E)^*$, and any map $\rho$: […] In particular, given any edge-weight lasso $\mathcal{L}$ for $T$, $\mathcal{L}$ is also an edge-weight lasso for the $X$-tree $T/F$. More generally, […] holds for every subset $\mathcal{L}$ of $\binom{X}{2}$ and any subset $F$ of the set of interior edges of $T$.
Proof: The first part follows directly from the definitions and implies that the maps $\lambda^{T/F}_{xy}$ ($xy \in \binom{X}{2}$) must generate $(\mathbb{R}^{E-F})^*$ whenever the maps $\lambda^T_{xy}$ ($xy \in \binom{X}{2}$) generate $(\mathbb{R}^E)^*$ while, more generally, they generate a space whose dimension coincides with the difference of $\mathrm{rk}_T(\mathcal{L})$ and the dimension of the kernel of the map […]

Given an $X$-tree $T$, an interior edge $f$ of $T$, a pair $xy \in \binom{X}{2}$, and a basis […] and $B \in \mathcal{B}(T/f)$ holds. Then, denoting by $\Lambda_{B,xy}$ the space of all maps $\lambda \in \langle \{xy\} \cup B \rangle_T$ with […] However, given any map $\rho \in \mathbb{R}^B$ and any real number $c$, it follows from the fact that, by definition, […]

Remark:
Similarly, suppose that $T = (V, E)$ is an $X$-tree and that $U \subseteq V$ is a $T$-core as defined in [3, Section 5], i.e., a non-empty subset of $V$ for which the induced subgraph $T_U := (U, E_U)$, with $E_U := \{e \in E : e \subseteq U\}$, of $T$ with vertex set $U$ is connected (and, hence, a tree) and the degree $\deg_{T_U}(v)$ of any vertex $v$ in $T_U$ is either 1 or coincides with the degree $\deg_T(v)$ of $v$ in $T$. Then, the rank $\mathrm{rk}_T(\mathcal{L})$ of a subset $\mathcal{L}$ of $\binom{X}{2}$ relative to $T$ and the rank $\mathrm{rk}_{T_U}(\mathcal{L}_U)$ of the corresponding subset $\mathcal{L}_U$ of $\binom{X_U}{2}$ relative to the $X_U$-tree $T_U$ are easily seen to be related by the inequality […]. This fact can be used to prove [3, Theorem 5] in the same way as Lemma 4.1 has been used above to establish Proposition 4.2.

An example
To illustrate Proposition 4.2, consider, for $X := \{a, b, c, d\}$, the quartet $X$-tree $T := T_{ab|cd}$ shown in Figure 1 (ii). In this case, there is, up to scaling, only one linear relation between the six maps $\lambda^T_{xy}$ ($xy \in \binom{X}{2}$), viz. the relation
$$\lambda^T_{ac} + \lambda^T_{bd} = \lambda^T_{ad} + \lambda^T_{bc}.$$
Thus, $\mathcal{B}(T)$ consists of the four 5-subsets $\mathcal{L}$ of $\binom{X}{2}$ that miss exactly one of the four cords $ac, ad, bc, bd$ (or, equivalently, satisfy $|\mathcal{L} \cap \{ac, ad, bc, bd\}| = 3$), i.e., the four subsets $\mathcal{L}$ of $\binom{X}{2}$ whose graphs $\Gamma(\mathcal{L})$ are shown in Figure 2 (ii). Clearly, if $f$ coincides with the unique interior edge of $T_{ab|cd}$, i.e., the edge denoted by $e_{ab|cd}$ in Figure 1 (ii), then $T/f$ is equivalent to the star tree $T(X)$, also shown in Figure 1 (i). […] holds, implying that, to bases of type $B_1$, we can add cords of type $dc$, but not cords of type $db$.
And for the two cords $da, db \in \binom{X}{2} - B_2$, […] holds, implying that, to bases of type $B_2$, we can add either one of the two missing cords. This fully corroborates our previous assertion about $\mathcal{B}(T_{ab|cd})$.
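The quartet example can be verified mechanically. This Python sketch assumes the edge-list encoding of $T_{ab|cd}$ with interior vertices labeled `u` and `v` (an assumption made only for illustration) and hypothetical helper names. It checks the relation $\lambda^T_{ac} + \lambda^T_{bd} = \lambda^T_{ad} + \lambda^T_{bc}$ and enumerates all 5-subsets of $\binom{X}{2}$ of full rank, confirming that each basis misses exactly one of $ac, ad, bc, bd$.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations


def adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def path_edges(adj, x, y):
    """Edges of the unique x-y path, found by breadth-first search from x."""
    parent, queue = {x: None}, deque([x])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                queue.append(w)
    on_path, v = set(), y
    while parent[v] is not None:
        on_path.add(frozenset((v, parent[v])))
        v = parent[v]
    return on_path


def lam(edges, x, y):
    on_path = path_edges(adjacency(edges), x, y)
    return [1 if frozenset(e) in on_path else 0 for e in edges]


def rank(rows):
    """Exact rank over Q of integer rows (Gauss-Jordan elimination)."""
    m = [[Fraction(a) for a in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


E = [("a", "u"), ("b", "u"), ("u", "v"), ("c", "v"), ("d", "v")]  # T_{ab|cd}
cords = ["ab", "ac", "ad", "bc", "bd", "cd"]
vec = {c: lam(E, c[0], c[1]) for c in cords}

# The linear relation lambda_ac + lambda_bd = lambda_ad + lambda_bc.
lhs = [p + q for p, q in zip(vec["ac"], vec["bd"])]
rhs = [p + q for p, q in zip(vec["ad"], vec["bc"])]
print(lhs == rhs)  # True

# All bases: the |E|-subsets of cords whose forms have full rank |E| = 5.
bases = [set(S) for S in combinations(cords, len(E)) if rank([vec[c] for c in S]) == len(E)]
missing = sorted((set(cords) - B).pop() for B in bases)
print(missing)  # ['ac', 'ad', 'bc', 'bd']
```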

Pointed x-covers of binary X−trees T that are bases of M(T )
When $T$ is a binary $X$-tree, some particular bases in $\mathcal{B}(T)$ are easily described: select any element $x \in X$ and, for each of the $n - 2$ interior vertices $v$ of $T$, consider the three components of the graph obtained from $T$ by deleting $v$. Select an element of $X$ from each of the two components that do not contain $x$, denote this pair by $y_v z_v = y_v(x) z_v(x)$, and call the resulting subset $\{xa : a \in X - \{x\}\} \cup \{y_v z_v : v \in V - X\}$ of $\binom{X}{2}$ a 'pointed $x$-cover' of $T$. Let $\mathcal{P}_x(T)$ denote the collection of subsets of $\binom{X}{2}$ that can be generated in this way (by the various choices of $y_v$ and $z_v$ as $v$ varies).
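The construction of a pointed $x$-cover is straightforward to automate. In the following Python sketch (all names hypothetical), $y_v$ and $z_v$ are chosen as the smallest leaf labels in the respective components, which is just one of the admissible choices. On a caterpillar tree with six leaves (vertex labels assumed for illustration), it checks for every choice of $x$ that the resulting set has $|E| = 2n - 3$ cords and full rank, as expected of a basis.

```python
from collections import deque
from fractions import Fraction


def adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def path_edges(adj, x, y):
    parent, queue = {x: None}, deque([x])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                queue.append(w)
    on_path, v = set(), y
    while parent[v] is not None:
        on_path.add(frozenset((v, parent[v])))
        v = parent[v]
    return on_path


def lam(edges, x, y):
    on_path = path_edges(adjacency(edges), x, y)
    return [1 if frozenset(e) in on_path else 0 for e in edges]


def rank(rows):
    m = [[Fraction(a) for a in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def pointed_cover(edges, x):
    """One pointed x-cover {xa : a != x} plus one cord y_v z_v per interior vertex v."""
    adj = adjacency(edges)
    leaves = sorted(w for w in adj if len(adj[w]) == 1)
    cover = [(x, a) for a in leaves if a != x]
    for v in sorted(w for w in adj if len(adj[w]) > 1):  # interior vertices
        picks = []
        for w in sorted(adj[v]):
            seen, stack = {v, w}, [w]  # explore the component of T - v containing w
            while stack:
                u = stack.pop()
                for t in adj[u]:
                    if t not in seen:
                        seen.add(t)
                        stack.append(t)
            comp_leaves = sorted(t for t in seen - {v} if len(adj[t]) == 1)
            if x not in comp_leaves:
                picks.append(comp_leaves[0])  # smallest label: an arbitrary choice
        cover.append(tuple(picks))
    return cover


# A caterpillar tree with leaves a1..a6 and interior vertices p1..p4.
E6 = [("a1", "p1"), ("a2", "p1"), ("p1", "p2"), ("a3", "p2"), ("p2", "p3"),
      ("a4", "p3"), ("p3", "p4"), ("a5", "p4"), ("a6", "p4")]
for x in ["a1", "a3", "a6"]:
    L = pointed_cover(E6, x)
    print(x, len(L), rank([lam(E6, p, q) for p, q in L]))  # 9 cords of rank 9 each time
```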
For example, consider again the quartet $X$-tree $T := T_{ab|cd}$ with its two interior vertices $u$ and $v$ as shown in Figure 1 (ii). […] any pointed $x$-cover $\mathcal{L}$ of a binary $X$-tree is not only an edge-weight lasso, but a strong lasso for that tree.
We note also that, given two distinct elements $x_1, x_2$ in $X$, a subset $\mathcal{L}$ of $\binom{X}{2}$ cannot simultaneously be a pointed $x_1$-cover in $\mathcal{P}_{x_1}(T)$ and a pointed $x_2$-cover in $\mathcal{P}_{x_2}(T)$ unless $T$ is a caterpillar tree with $x_1$ and $x_2$ at opposite 'ends' of the tree: indeed, if there exists some $\mathcal{L} \in \mathcal{P}_{x_1}(T) \cap \mathcal{P}_{x_2}(T)$, we must have $\{y_v(x_1) z_v(x_1) : v \in V - X\} = \{x_2 a : a \in X - \{x_1, x_2\}\}$, implying that the path from $x_1$ to $x_2$ in $T$ must pass through every interior vertex of $T$.
Our next results require two definitions that will also be important later in this paper. Recall first that, given an $X$-tree $T$ and a subset $Y$ of $X$ of cardinality at least 3, one denotes
• by $T|_Y$ the $Y$-tree obtained from the minimal subtree of $T$ that connects the leaves in $Y$ by suppressing any resulting vertices of degree 2 (see e.g. [3, Section 2.3]),
• by $V|_Y$ and $E|_Y$ its vertex and edge set, respectively,
• and, given in addition any edge weighting $\omega$ of $T$, by $\omega|_Y$ the 'induced' edge weighting of $T|_Y$, i.e., the edge weighting that maps any edge $\{u, v\} \in E|_Y$ onto the sum $\sum_{e \in E(u|v)} \omega(e)$, yielding a surjective $\mathbb{R}$-linear map $\mathrm{res}_Y: \mathbb{R}^E \to \mathbb{R}^{E|_Y}: \omega \mapsto \omega|_Y$ for which $\lambda^{T|_Y}_{yy'}(\mathrm{res}_Y(\omega)) = \lambda^T_{yy'}(\omega)$ holds for all $\omega \in \mathbb{R}^E$ and all $yy' \in \binom{Y}{2}$.
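It follows that the map $\lambda^T_{yy'}: \mathbb{R}^E \to \mathbb{R}$ coincides, for all $yy' \in \binom{Y}{2}$, with the map $\lambda^{T|_Y}_{yy'} \circ \mathrm{res}_Y$, the composition of the maps $\mathrm{res}_Y$ and $\lambda^{T|_Y}_{yy'}$. So, denoting by $\mathrm{res}_Y^*$ the dual (and necessarily injective) map $(\mathbb{R}^{E|_Y})^* \to (\mathbb{R}^E)^*: \lambda \mapsto \lambda \circ \mathrm{res}_Y$ of the map $\mathrm{res}_Y$, we also have $\langle \mathcal{L} \rangle_T = \mathrm{res}_Y^*(\langle \mathcal{L} \rangle_{T|_Y})$ and $\mathrm{rk}_T(\mathcal{L}) = \mathrm{rk}_{T|_Y}(\mathcal{L})$ for every subset $\mathcal{L}$ of $\binom{Y}{2}$. In consequence, for every subset $Y$ of $X$ of cardinality at least 3, a subset $\mathcal{L}$ of $\binom{Y}{2}$ is independent in $\mathcal{M}(T|_Y)$ if and only if it is independent in $\mathcal{M}(T)$, implying also that every circuit $\mathcal{L} \subseteq \binom{Y}{2}$ of $\mathcal{M}(T|_Y)$ must also be a circuit of $\mathcal{M}(T)$, i.e., we have $\mathcal{C}(T|_Y) \subseteq \mathcal{C}(T)$ for every such subset $Y$ of $X$.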
Further, denoting, for every $x \in X$, by $e_x \in E$ the unique pendant edge of $T$ containing $x$, we say that a 2-subset $ab$ of $X$ forms (or 'is') a '$T$-cherry' if the two edges $e_a, e_b \in E$ share a vertex, and that $ab$ forms a 'proper $T$-cherry' if this vertex has degree 3. Note that a […] hence, also the corresponding four maps $\lambda^T_{xy}$ ($xy \in \mathcal{L}_{abcd}$) are linearly independent. So, by the augmentation property of independent sets in a matroid, there exists some $B \in \mathcal{B}(T)$ containing these four cords. Conversely, if $T|_Y \cong T_{ac|bd}$ and, therefore, also $\lambda^T_{ab} + \lambda^T_{cd} = \lambda^T_{ad} + \lambda^T_{bc}$ holds, $\mathcal{L}_{abcd}$ cannot be part of a basis $B \in \mathcal{B}(T)$. It follows that […] However, it was observed already by H. Colonius and H. Schulze in [1,2] that $\mathcal{Q}(T_1) = \mathcal{Q}(T_2)$ holds for any two $X$-trees $T_1, T_2$ if and only if $T_1 \cong T_2$ (for a more recent account, see …)
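$T$-cherries can be read off the adjacency structure of $T$: for each interior vertex, every pair of adjacent leaves forms a $T$-cherry, which is proper exactly when that vertex has degree 3. A minimal Python sketch, with the edge-list encoding and the function name `cherries` assumed for illustration:

```python
from itertools import combinations


def adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def cherries(edges):
    """Map each T-cherry {a, b} (pendant edges e_a, e_b sharing a vertex v)
    to True if it is proper, i.e. if deg(v) == 3, and to False otherwise."""
    adj = adjacency(edges)
    result = {}
    for v, nbrs in adj.items():
        if len(nbrs) == 1:
            continue  # v is a leaf
        pendant = sorted(w for w in nbrs if len(adj[w]) == 1)
        for a, b in combinations(pendant, 2):
            result[frozenset((a, b))] = len(nbrs) == 3
    return result


quartet = [("a", "u"), ("b", "u"), ("u", "v"), ("c", "v"), ("d", "v")]  # T_{ab|cd}
star = [("*", x) for x in "abcd"]                                       # T({a, b, c, d})
print(sorted("".join(sorted(c)) for c, proper in cherries(quartet).items() if proper))  # ['ab', 'cd']
print(len(cherries(star)), any(cherries(star).values()))                # 6 False
```

In the star tree on four leaves, every pair of leaves is a $T$-cherry, but none is proper, since the central vertex has degree 4.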

The rank of topological lassos
Now assume that $n \geq 4$ holds and recall that, according to [3, Theorem 8], the following three assertions are equivalent in this case for any $X$-tree $T = (V, E)$ and any bipartition of $X$ into two disjoint non-empty subsets $A, B$:
(split-i) The subset $A \vee B$ of $\binom{X}{2}$ is a topological lasso for $T$.
(split-ii) $A \vee B$ is a 't-cover' of $T$ (i.e., given any interior vertex $v$ of $T$ and any two distinct edges $e, e' \in E$ with $v \in e, e'$, there exists some cord $xy$ in $A \vee B$ with $e, e' \in E(x|y)$, see [4, Section 7]).
(split-iii) $A \cap \{a, b\} \neq \emptyset \neq B \cap \{a, b\}$ holds for every $T$-cherry $ab$.
It was also noted in this context that such bipartitions exist if and only if every $T$-cherry is a proper $T$-cherry and $n \geq 4$ holds.
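For small examples, whether some bipartition $A, B$ of $X$ separates every $T$-cherry (i.e., puts its two leaves on different sides, as in (split-iii)) can be checked by brute force. The Python sketch below (hypothetical names; exponential in $n$, so illustrative only) returns such a bipartition if one exists. In line with the criterion just quoted, it succeeds for $T_{ab|cd}$ and fails both for the star tree on four leaves and for a tree with an improper cherry.

```python
from itertools import combinations, product


def adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def splitting_bipartition(edges):
    """Return (A, B), both non-empty, with the two leaves of every T-cherry on
    different sides, or None if no such bipartition of the leaf set exists."""
    adj = adjacency(edges)
    leaves = sorted(w for w in adj if len(adj[w]) == 1)
    cherries = [(a, b) for v, nbrs in adj.items() if len(nbrs) > 1
                for a, b in combinations(sorted(w for w in nbrs if len(adj[w]) == 1), 2)]
    for bits in product((0, 1), repeat=len(leaves) - 1):
        side = dict(zip(leaves, (0,) + bits))  # the first leaf always goes into A
        if 1 in bits and all(side[a] != side[b] for a, b in cherries):
            A = {x for x in leaves if side[x] == 0}
            return A, set(leaves) - A
    return None


quartet = [("a", "u"), ("b", "u"), ("u", "v"), ("c", "v"), ("d", "v")]
star = [("*", x) for x in "abcd"]
# u has degree 4 and three adjacent leaves, so ab, ac, bc are improper cherries.
improper = [("a", "u"), ("b", "u"), ("c", "u"), ("u", "v"), ("d", "v"), ("e", "v")]
print(splitting_bipartition(quartet))   # A = {a, c}, B = {b, d}
print(splitting_bipartition(star))      # None
print(splitting_bipartition(improper))  # None
```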
Here, we want to complement this result as follows:

Theorem 2. Given any $X$-tree $T = (V, E)$, one has $\mathrm{rk}_T(\mathcal{L}) \leq |E| - 1$ for every bipartite subset $\mathcal{L}$ of $\binom{X}{2}$. Furthermore, the following assertions are equivalent for every such subset $\mathcal{L}$ of $\binom{X}{2}$:
(i) The rank $\mathrm{rk}_T(\mathcal{L})$ of $\mathcal{L}$ coincides with $|E| - 1$.
(ii) There exists some cord $xy \in \binom{X}{2}$ such that $\mathcal{L} \cup \{xy\}$ is an edge-weight lasso for $T$.
(iii) $\mathcal{L}$ is connected and, for any cord $xy \in \binom{X}{2}$, the set $\mathcal{L} \cup \{xy\}$ is an edge-weight lasso for $T$ if and only if $\mathcal{L} \cup \{xy\}$ is not bipartite.
(iv) $\mathcal{L}$ is connected, the closure $[\mathcal{L}]_T$ of $\mathcal{L}$ relative to $\mathcal{M}(T)$ coincides with the edge set of the (necessarily unique) complete bipartite graph with vertex set $X$ whose edge set contains $\mathcal{L}$, i.e., $[\mathcal{L}]_T$ coincides with the set $A \vee B$ in case the two subsets $A, B$ of $X$ form the (necessarily unique) bipartition of $X$ with $\mathcal{L} \subseteq A \vee B$, and this set forms a 'hyperplane' in $\mathcal{M}(T)$, i.e., a maximal subset of $\binom{X}{2}$ of rank smaller than $|E|$.
Proof: Assume that $A$ and $B$ are two subsets of $X$ that form a bipartition of $X$ with $\mathcal{L} \subseteq A \vee B$, and let $\omega_{A|B} \in \mathbb{R}^E$ denote the map that maps every interior edge of $T$ onto 0, every pendant edge $e$ that is incident with some leaf in $A$ onto 1, and every pendant edge $e$ that is incident with some leaf in $B$ onto $-1$. Clearly,
$$\lambda^T_{xy}(\omega_{A|B}) = \omega_{A|B}(e_x) + \omega_{A|B}(e_y)$$
holds for every cord $xy \in \binom{X}{2}$. In particular, one has $\lambda^T_{xy}(\omega_{A|B}) = 0$ for a cord $xy \in \binom{X}{2}$ if and only if $xy \in A \vee B$ holds. So, as the non-zero map $\omega_{A|B}$ lies in the common kernel of all forms $\lambda^T_{xy}$ with $xy \in A \vee B$, $\mathrm{rk}_T(\mathcal{L}) \leq \mathrm{rk}_T(A \vee B) \leq |E| - 1$ must hold for every subset $\mathcal{L}$ of a set of the form $A \vee B$ for some bipartition $A, B$ of $X$, that is, for every bipartite subset $\mathcal{L}$ of $\binom{X}{2}$, which is just our first assertion. […] subset $\mathcal{L}$ of $\binom{X}{2}$ and the fact that any maximal subset of $\binom{X}{2}$ of rank smaller than $|E|$ must have rank $|E| - 1$.
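The key step of this proof, namely that $\omega_{A|B}$ is annihilated exactly by the forms $\lambda^T_{xy}$ with $xy \in A \vee B$, can be illustrated numerically. The minimal Python sketch below assumes trees encoded as edge lists and uses hypothetical helper names (the path helpers are included so the block runs on its own); it evaluates all six forms on $\omega_{A|B}$ for $T_{ab|cd}$ with $A = \{a, c\}$ and $B = \{b, d\}$.

```python
from collections import deque
from itertools import combinations


def adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def path_edges(adj, x, y):
    """Edges of the unique x-y path, found by breadth-first search from x."""
    parent, queue = {x: None}, deque([x])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                queue.append(w)
    on_path, v = set(), y
    while parent[v] is not None:
        on_path.add(frozenset((v, parent[v])))
        v = parent[v]
    return on_path


def lam(edges, x, y):
    on_path = path_edges(adjacency(edges), x, y)
    return [1 if frozenset(e) in on_path else 0 for e in edges]


def omega_split(edges, A):
    """omega_{A|B}: 0 on interior edges, +1 on pendant edges at leaves in A,
    -1 on pendant edges at all other leaves (those in B)."""
    adj = adjacency(edges)
    weights = []
    for u, v in edges:
        leaf = u if len(adj[u]) == 1 else (v if len(adj[v]) == 1 else None)
        weights.append(0 if leaf is None else (1 if leaf in A else -1))
    return weights


E = [("a", "u"), ("b", "u"), ("u", "v"), ("c", "v"), ("d", "v")]  # T_{ab|cd}
w = omega_split(E, {"a", "c"})  # A = {a, c}, B = {b, d}
zero_set = sorted(x + y for x, y in combinations("abcd", 2)
                  if sum(p * q for p, q in zip(w, lam(E, x, y))) == 0)
print(w)         # [1, -1, 0, 1, -1]
print(zero_set)  # ['ab', 'ad', 'bc', 'cd'], i.e. exactly A v B
```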
The above theorem has an interesting application regarding topological lassos: […]

Proof: (i): In view of the equivalence (i) $\Leftrightarrow$ (iv) of Theorem 2, there must exist a (necessarily unique) bipartition of $X$ into two disjoint subsets $A$ and $B$ such that the hyperplane $[\mathcal{L}]_T$ coincides with the set $A \vee B$. Furthermore, this set must also be a topological lasso for $T$ if every $T$-cherry is a proper $T$-cherry: indeed, in view of the results from [3] quoted above, it suffices to show that, if $ab$ is a proper $T$-cherry and $a \in A$ holds, we must have $b \in B$. Yet, otherwise, we would have $b \in A$ and, therefore, $ab \notin \mathcal{L}$, which would, in case $n \geq 4$, allow us to construct yet another non-zero map $\omega_{ab} \in \mathbb{R}^E$ with $\lambda^T_{xy}(\omega_{ab}) = 0$ for all cords $xy \in \mathcal{L}$ that is not a scalar multiple of $\omega_{A|B}$: indeed, if the two edges $e_a, e_b \in E$ containing $a$ and $b$, respectively, share the vertex $v$, there would exist exactly one further edge $e_{ab} \in E$ with $v \in e_{ab}$, and putting […]

(ii) Conversely, assume that $\mathcal{L}$ is a topological lasso for $T$ of rank less than $|E|$. Then, there must exist a non-zero map $\omega_0 \in \mathbb{R}^E$ with $\lambda_{xy}(\omega_0) = 0$ for all $xy \in \mathcal{L}$. If $\omega_0(e_0) \neq 0$ held for some interior edge $e_0 \in E$, we could find some proper edge weighting $\omega \in \mathbb{R}^E_{\geq 0}$ of $T$ with $\omega(e_0) = |\omega_0(e_0)|$, while $\omega(e) > |\omega_0(e)|$ holds for all edges $e \in E - \{e_0\}$. This, in turn, would imply that the map $\omega' := \omega - \mathrm{sgn}(\omega_0(e_0))\, \omega_0$ would be a map in $\mathbb{R}^E_{\geq 0}$ with $\omega'(e_0) = 0$, i.e., a non-proper edge weighting of $T$, for which $D_{(T,\omega)}|_{\mathcal{L}} = D_{(T,\omega')}|_{\mathcal{L}}$ holds. In view of the last remark in [3, Subsection 2.2], this would contradict our assumption that $\mathcal{L}$ is a topological lasso for $T$.
Thus, as $\mathcal{L}$ must be connected for every topological lasso $\mathcal{L}$ for $T$ in view of [3, Theorem 4], it follows that the pair $A, B$ of subsets of $X$ forms a bipartition of $X$, that $\mathcal{L}$ must be bipartite relative to this partition, that $A \vee B$ must also be a topological lasso for $T$, and that, in consequence, every $T$-cherry must be a proper $T$-cherry and $\omega_0$ must be a (positive) scalar multiple of the map $\omega_{A|B} \in \mathbb{R}^E$ defined above. In particular, there can be, up to scaling, only one non-zero map $\omega \in \mathbb{R}^E$ with $\lambda_{xy}(\omega) = 0$ for all $xy \in \mathcal{L}$, implying that the rank of $\mathcal{L}$ must indeed coincide with $|E| - 1$ and that $\mathcal{L} \cup \{xy\}$ must, therefore, be a strong lasso for $T$ for every cord $xy \in \binom{X}{2}$ for which $\mathcal{L} \cup \{xy\}$ is not bipartite, i.e., for every cord $xy \in \binom{X}{2} - A \vee B$.
Corollary 6.1. Given any $X$-tree $T = (V, E)$, the following assertions are equivalent:
(i) There exists a bipartite subset $\mathcal{L}$ of $\binom{X}{2}$ that is a topological lasso for $T$.
(ii) There exists a topological lasso $\mathcal{L}$ for $T$ with $\mathrm{rk}_T(\mathcal{L}) < |E|$.
(ii') There exists a topological lasso $\mathcal{L}$ for $T$ with $\mathrm{rk}_T(\mathcal{L}) = |E| - 1$.
(iii) Every $T$-cherry is a proper $T$-cherry.
A simple example to illustrate Corollary 6.1 is presented in Figure 3. […] assume that $x_1, x_2, x_3, x_4, x_5, x_6$ are six distinct elements in $X$ such that each of the three pairs […] is an independent subset of $\binom{X}{2}$ in $\mathcal{M}(T)$. Clearly, this implies that a binary $X$-tree $T$ for which $\mathcal{M}(T)$ is a binary matroid must be a caterpillar tree. More generally, an arbitrary $X$-tree $T$ for which $\mathcal{M}(T)$ is a binary matroid must be either a star tree with at most five leaves or an $X$-tree for which, as in the case of the (binary) caterpillar trees, there exist two interior vertices $u$ and $v$ of $T$ such that the path from $u$ to $v$ in $T$ passes through every interior vertex of $T$, all of these vertices except perhaps $u$ and $v$ have degree 3, and $u$ and $v$ have degree 3 or 4. We will show in a separate paper that, conversely, $\mathcal{M}(T)$ is a binary matroid whenever this holds.

Minimal strong lassos do not form a matroid
Although the minimal edge-weight lassos for any $X$-tree form the bases of a matroid defined on $\binom{X}{2}$, the same is not always true for the minimal strong lassos.
Both $\mathcal{L}_1$ and $\mathcal{L}_2$ are minimal strong lassos for the $X$-tree that has one interior vertex adjacent to $a, b, c, d$ and a second interior vertex adjacent to $e, f$; however, $\mathcal{L}_1$ has one more element than $\mathcal{L}_2$.