Diversities and the Geometry of Hypergraphs

The embedding of finite metrics in $\ell_1$ has become a fundamental tool for both combinatorial optimization and large-scale data analysis. One important application is to network flow problems, in which there is a close relation between max-flow min-cut theorems and minimal distortion embeddings of metrics into $\ell_1$. Here we show that this theory can be generalized considerably to encompass Steiner tree packing problems in both graphs and hypergraphs. Instead of the theory of $\ell_1$ metrics and minimal distortion embeddings, the parallel is the theory of diversities recently introduced by Bryant and Tupper, and the corresponding theory of $\ell_1$ diversities and embeddings which we develop here.


Introduction
In their influential paper "The Geometry of Graphs and its Algorithmic Applications", Linial et al. [15] introduce a novel and powerful set of techniques to the algorithm designer's toolkit. They show how to use the mathematics of metric embeddings to help solve difficult problems in combinatorial optimization. The approach inspired a large body of further work on metric embeddings and their applications.
Our objective here is to show how this extensive body of work might be generalized to the geometry of hypergraphs. Recall that a hypergraph H = (V, E) consists of a set of vertices V and a set of hyperedges E, where each A ∈ E is a subset of V . The underlying geometric objects in this new context will not be metric spaces, but diversities, a generalization of metrics recently introduced by Bryant and Tupper [4]. Diversities are a form of multi-way metric which have already given rise to a substantial, and novel, body of theory [4,7,10,18]. We hope to demonstrate that a switch to diversities opens up a whole new array of problems and potential applications, potentially richer than that for metrics.
The result of [15] which is of particular significance to us is the use of metric embeddings to bound the difference between cuts and flows in a multi-commodity flow problem. Let G = (V, E) be a graph with a non-negative edge capacity C_{uv} ≥ 0 for every edge {u, v} ∈ E. We are given a set of demands D_{uv} ≥ 0 for u, v ∈ V. The objective of the multi-commodity flow problem is to find the largest value of f such that we can simultaneously flow at least f · D_{uv} units between u and v for all u and v. As usual, the total amount of flow along an edge cannot exceed its capacity.
Multi-commodity flow is a linear programming problem (LP) and can be solved in polynomial time. The dual of the LP is a relaxation of a min-cut problem which generalizes several NP-hard graph partition problems. Given S ⊆ V let Cap(S) be the sum of capacities of edges joining S and V \ S and let Dem(S) denote the sum of the demands for pairs u, v with u ∈ S and v ∈ V \ S. We then have f ≤ Cap(S)/Dem(S) for every S ⊆ V with Dem(S) > 0. When there is a single demand, the minimum of Cap(S)/Dem(S) equals the maximum value of f, a consequence of the max-flow min-cut theorem. In general, for more than one demand there will be a gap between the values of the minimum cut and the maximum flow. Linial et al. [15], building on the work of [14], show that this gap can be bounded by the distortion required to embed a particular metric d (arising from the LP dual) into ℓ1 space. The metric d is supported on the graph G = (V, E), meaning that it is the shortest path metric for some weighting of the edges E. By applying the extensive literature on distortion bounds for metric embeddings they obtain new approximation bounds for the min-cut problem.
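The cut bound f ≤ Cap(S)/Dem(S) can be checked by brute force on small instances. The sketch below (in Python, on an illustrative 4-cycle instance; the function and variable names are our own) enumerates all cuts and reports the smallest ratio:

```python
from itertools import combinations

def min_cut_ratio(vertices, cap, dem):
    """Brute-force minimum over cuts S of Cap(S)/Dem(S).

    cap: dict mapping frozenset({u, v}) -> edge capacity
    dem: dict mapping frozenset({u, v}) -> demand between u and v
    """
    best = float("inf")
    n = len(vertices)
    for r in range(1, n):
        for S in combinations(vertices, r):
            S = set(S)
            crossing = lambda pair: len(set(pair) & S) == 1
            cap_S = sum(c for e, c in cap.items() if crossing(e))
            dem_S = sum(d for e, d in dem.items() if crossing(e))
            if dem_S > 0:
                best = min(best, cap_S / dem_S)
    return best

# 4-cycle with unit capacities; demands across the two diagonals.
V = [0, 1, 2, 3]
cap = {frozenset(e): 1.0 for e in [(0, 1), (1, 2), (2, 3), (3, 0)]}
dem = {frozenset({0, 2}): 1.0, frozenset({1, 3}): 1.0}
print(min_cut_ratio(V, cap, dem))  # the cut S = {0, 1} gives ratio 2/2 = 1
```

On this instance a flow of value 1 is also achievable (split each demand over the two paths around the cycle), so cut and flow coincide here; the embedding results discussed below control the gap in general.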
In this paper we consider generalizations of the multi-commodity flow and corresponding minimum cut problems. A natural generalization of the single-commodity maximum flow problem in a graph is fractional Steiner tree packing [11]. Given a graph G = (V, E) with weighted edges, and a subset S ⊆ V , find the maximum total weight of trees in G spanning S such that the sum of the weights of trees containing an edge does not exceed the capacity of that edge. Whereas multi-commodity flows are typically used to model transport of physical substances (or vehicles), the Steiner tree packing problem arises from models of information, particularly the broadcasting of information (see [13] for references).
The fractional Steiner tree packing problem generalizes further to incorporate multiple commodities, a formulation which occurs naturally in multicast and VLSI design applications (see [19]). For each S ⊆ V we have a demand D_S (possibly zero) and the set T_S of trees in the graph spanning S. A generalized flow in this context is an assignment of non-negative weights z_{t,S} to the trees in T_S for all S, with the constraint that for each edge, the total weight of trees including that edge does not exceed the edge's capacity. The objective is to find the largest value of f for which there is a flow with weights z_{t,S} satisfying

∑_{t∈T_S} z_{t,S} ≥ f · D_S    (1)

for all demand sets S. These problems translate directly to hypergraphs, permitting far more complex relationships between the different capacity constraints. As for graphs, we have demands D_S defined for all S ⊆ V. Each hyperedge A ∈ E has a non-negative capacity. We let T_S denote the set of all minimal connected sub-hypergraphs which include S (these are no longer necessarily trees). A flow in this context is an assignment of non-negative weights z_{t,S} to the sub-hypergraphs in T_S for all S, with the constraint that for each hyperedge, the total weight of sub-hypergraphs including that hyperedge does not exceed the hyperedge's capacity. As in the graph case, the aim is to determine the largest value of f for which there is a flow with weights z_{t,S} satisfying the constraint (1) for all demand sets S.
All of these generalizations of the multi-commodity flow problem have a dual problem that is a relaxation of a corresponding min-cut problem. For convenience, we assume any missing edges or hyperedges are included with capacity zero. For a subset U ⊆ V let ∂U be the set of edges or hyperedges which have endpoints in both U and V \ U. The min-cut problem in the case of graphs is to find the cut U minimizing

( ∑_{e∈∂U} C_e ) / ( ∑_{S : S∩U≠∅, S\U≠∅} D_S ),

where e runs over all pairs of distinct vertices in V, while in hypergraphs we find U which minimizes

( ∑_{A∈∂U} C_A ) / ( ∑_{B : B∩U≠∅, B\U≠∅} D_B ),

where A and B run over all subsets of V.
In both problems the value of a min-cut is an upper bound for corresponding value of the maximum flow. Linial et al. [15] showed that the ratio between the min-cut and the max flow can be bounded using metric embeddings. Our main result is that this relationship generalizes to the fractional Steiner problem with multiple demand sets, on both graphs and hypergraphs, once we consider diversities instead of metrics. The following theorems depend on the notions of diversities being supported on hypergraphs and ℓ 1 -embeddings of diversities, which we will define in subsequent sections.
Theorem 1
Let H = (V, E) be a hypergraph, let {C_A}_{A∈E} be a set of edge capacities and {D_S}_{S⊆V} a set of demands. There is a diversity (V, δ) supported on H, such that the ratio of the min-cut to the maximum (generalized) flow for the hypergraph is bounded by the distortion of the minimum distortion embedding of δ into ℓ1.
Gupta et al. [9] proved a converse of the result of Linial et al. by showing that, given any graph G and metric d supported on it, we could determine capacities and demands so that the bound given by the minimal distortion embedding of d into ℓ 1 was tight. We establish the analogous result for the generalized flow problem in hypergraphs.

Theorem 2
Let H = (V, E) be a hypergraph, and let δ be a diversity supported on it. There is a set {C A } A∈E of edge capacities and a set {D S } S⊆V of demands so that the ratio of the min-cut to the maximum (generalized) flow equals the distortion of the minimum distortion embedding of δ into ℓ 1 .
A major benefit of the link between min-cut and metric embeddings was that Linial et al. and others could make use of an extensive body of work on metric geometry to establish improved approximation bounds. In our context, the embedding of diversities is an area which is almost completely unexplored. We prove a few preliminary bounds here, though much work remains.
The structure of this paper is as follows. We begin in Section 2 with a brief review of diversity theory, including a list of examples of diversities. In Section 3 we focus on L 1 and ℓ 1 diversities, which are the generalizations of L 1 and ℓ 1 metrics. These diversities arise in a variety of different contexts. Fundamental properties of L 1 diversities are established, many of which closely parallel results on metrics.
In Section 4 we show how the concepts of metric embedding and distortion are defined for diversities, and establish a range of preliminary bounds for distortion and dimension. Finally, in Section 5, we prove the analogues of Linial et al's [15] and Gupta et al's [9] results on multi-commodity flows, as stated in Theorems 1 and 2 above.

Diversities
A diversity is a pair (X, δ) where X is a set and δ is a function from the finite subsets of X to R satisfying
(D1) δ(A) ≥ 0, and δ(A) = 0 if and only if |A| ≤ 1;
(D2) if B ≠ ∅ then δ(A ∪ C) ≤ δ(A ∪ B) + δ(B ∪ C);
for all finite A, B, C ⊆ X. Diversities are, in a sense, an extension of the metric concept. Indeed, every diversity has an induced metric, given by d(a, b) = δ({a, b}) for all a, b ∈ X. Note also that δ is monotonic: A ⊆ B implies δ(A) ≤ δ(B). For convenience, in the remainder of the paper we will relax condition (D1) and allow δ(A) = 0 even when |A| > 1. Likewise, for metrics we allow d(x, y) = 0 even if x ≠ y.
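A minimal sketch in Python of one diversity, the diameter diversity of the Euclidean metric on the plane, together with a spot-check of the triangle inequality (D2); all names here are illustrative:

```python
from itertools import combinations

def diam_diversity(A):
    """Diameter diversity induced by the Euclidean metric: the largest
    pairwise distance among the points of A (0 for at most one point)."""
    if len(A) <= 1:
        return 0.0
    def d(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return max(d(a, b) for a, b in combinations(A, 2))

# Spot-check (D2): delta(A ∪ C) <= delta(A ∪ B) + delta(B ∪ C) for B non-empty.
A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.5, 2.0)]
C = [(3.0, 1.0)]
lhs = diam_diversity(A + C)
rhs = diam_diversity(A + B) + diam_diversity(B + C)
assert lhs <= rhs
print(lhs, rhs)
```

The same check can be run over random point sets; (D2) for the diameter diversity follows from the metric triangle inequality.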
We define embeddings and distortion for diversities in the same way as for metric spaces. Let (X1, δ1) and (X2, δ2) be two diversities and suppose c ≥ 1. A map φ : X1 → X2 is an embedding with distortion c if

δ1(A) ≤ δ2(φ(A)) ≤ c · δ1(A)

for all finite A ⊆ X1. We say that φ is an isometric embedding if it has distortion 1 and an approximate embedding otherwise. Bryant and Tupper [4] provide several examples of diversities. We expand that list here.

Examples of diversities
1. Diameter diversity. Let (X, d) be a metric space. For all finite A ⊆ X let

δ_diam(A) = max_{a,b∈A} d(a, b).

3. L1 diversity. Let (Ω, A, µ) be a measure space and let L1 denote the set of all measurable functions f : Ω → R with ∫_Ω |f(ω)| dµ(ω) < ∞. An L1 diversity is a pair (L1, δ1) where

δ1(F) = ∫_Ω ( max_{f∈F} f(ω) − min_{f∈F} f(ω) ) dµ(ω)

for all finite F ⊆ L1. To see that (L1, δ1) satisfies (D2), consider the triangle inequality for the diameter diversity on the real line and integrate over ω.

6. Hypergraph Steiner diversity. Let H = (X, E) be a hypergraph and let w : E → R≥0 be a non-negative weight function. Given A ⊆ X let δ(A) denote the minimum of w(E′) := ∑_{e∈E′} w(e) over all subsets E′ ⊆ E such that the sub-hypergraph induced by E′ is connected and includes A.
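The hypergraph Steiner diversity just described can be computed by brute force on small instances, enumerating connected covering sets of hyperedges. A sketch (exponential in the number of hyperedges; the instance and names are our own):

```python
from itertools import combinations

def steiner_diversity(hyperedges, w, A):
    """Hypergraph Steiner diversity: minimum total weight w(E') over subsets
    E' of hyperedges whose induced sub-hypergraph is connected and covers A."""
    A = frozenset(A)
    if len(A) <= 1:
        return 0.0
    best = float("inf")
    edges = list(hyperedges)
    for r in range(1, len(edges) + 1):
        for Es in combinations(edges, r):
            if A <= set().union(*Es) and connected(Es):
                best = min(best, sum(w[e] for e in Es))
    return best

def connected(Es):
    """A sub-hypergraph is connected if its hyperedges cannot be split into
    two non-empty groups with disjoint vertex sets (merge overlapping groups)."""
    groups = [set(e) for e in Es]
    merged = True
    while merged:
        merged = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                if groups[i] & groups[j]:
                    groups[i] |= groups.pop(j)
                    merged = True
                    break
            if merged:
                break
    return len(groups) == 1

E = [frozenset({1, 2}), frozenset({2, 3}), frozenset({1, 3, 4})]
w = {E[0]: 1.0, E[1]: 1.0, E[2]: 2.5}
print(steiner_diversity(E, w, {1, 3}))  # 2.0: edges {1,2} and {2,3} beat {1,3,4}
```

For graphs (all hyperedges of size two) this reduces to the usual minimum Steiner tree weight.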
7. Measure diversity. Let (Ω, Σ, µ) be a measure space and let X be the collection of all sets in Σ with finite measure. For sets A1, …, Ak ∈ X let

δ({A1, …, Ak}) = µ( ∪_i A_i \ ∩_i A_i ).

8. Smallest Enclosing Ball diversity. Let (X, d) be a metric space. For each finite A ⊆ X let δ(A) be the diameter of the smallest closed ball containing A. Note that if every pair of points in (X, d) is connected by a geodesic then (X, d) will be the induced metric of (X, δ), though this does not hold in general.

9. Travelling Salesman diversity. Let (X, d) be a metric space. For every finite A ⊆ X, let δ(A) be the minimum of

d(a1, a2) + d(a2, a3) + ⋯ + d(a_{|A|−1}, a_{|A|}) + d(a_{|A|}, a1)

over all orderings a1, a2, a3, …, a_{|A|} of A.
10. Mean-width diversity. We define the mean-width diversity for finite A ⊂ R^n as the mean width of conv(A), the convex hull of A, suitably scaled. Specifically, given a compact convex set K ⊂ R^n and unit vector u ∈ R^n, the width of K in direction u is given by

w(K, u) = max_{x∈K} ⟨x, u⟩ − min_{x∈K} ⟨x, u⟩.

That is, w(K, u) is the minimum distance between two hyperplanes with normal u which enclose K. The mean width of K is given by

w(K) = (1 / µ_{n−1}(S^{n−1})) ∫_{S^{n−1}} w(K, u) dµ_{n−1}(u),

where µ_{n−1} denotes the surface measure on the unit sphere S^{n−1} [23]. Shephard [20] observed that the mean width varies according to the space that K is sitting in, whereas a scaled absolute mean width does not. We therefore define

δ_w(A) = (π / B(n/2, 1/2)) · w(conv(A)),

so that the induced metric of δ_w is the Euclidean metric. Here B(·, ·) is the beta function. Note that π/B(n/2, 1/2) = √(π/2) · n^{1/2} + O(n^{−1/2}), see [1].
11. S-diversity. Let X be a collection of random variables taking values in the same state space. For every finite A = {A 1 , . . . , A k } ⊆ X let δ(A) be the probability that A 1 , A 2 , . . . , A k do not all have the same state. Then (X, δ) is a diversity, termed the S-diversity since S, the proportion of segregating (non-constant) sites, is a standard measure of genetic diversity in an alignment of genetic sequences (see, e.g. [3]).
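For random variables on a finite probability space the S-diversity is a finite sum, as in the following sketch (the table of outcomes is an invented toy example):

```python
from fractions import Fraction

# Outcomes of three random variables on a 4-point uniform probability space.
# Rows are equally likely sample points; columns are the variables X1, X2, X3.
table = [
    ("a", "a", "a"),
    ("a", "b", "a"),
    ("b", "b", "b"),
    ("a", "a", "b"),
]

def s_diversity(cols):
    """Probability that the selected variables do not all share the same state."""
    rows = len(table)
    differ = sum(1 for row in table if len({row[c] for c in cols}) > 1)
    return Fraction(differ, rows)

print(s_diversity([0, 1]))     # X1, X2 differ on row 2 only -> 1/4
print(s_diversity([0, 1, 2]))  # all three differ on rows 2 and 4 -> 1/2
```

Note the monotonicity required of a diversity: adding a variable to the collection can only increase the probability of disagreement.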
Below, we will show that ℓ 1 diversities, phylogenetic diversities, measure diversities, mean-width diversities and S-diversities are all examples of L 1 -embeddable diversities.

Extremal diversities
In metric geometry we say that one metric dominates another on the same set if distances under the first metric are all greater than, or equal to, distances under the second: given any two metric spaces (X, d1) and (X, d2), we say d2 dominates d1 if d2(x, y) ≥ d1(x, y) for all x, y ∈ X. The relation forms a partial order on the cone of metrics for a set. The partial order provides a particularly useful characterization of the standard shortest-path graph metric d_G. Let G = (V, E) be a graph with edge weights w : E → R≥0. Then d_G is the unique maximal metric d such that d(u, v) ≤ w({u, v}) for all {u, v} ∈ E. Given that the geometry of graphs of [15] is based on the shortest path metric, it is natural to explore what arises when we apply the same approach to diversities.
We say that a diversity (X, δ2) dominates another diversity (X, δ1) if δ2(A) ≥ δ1(A) for all finite A ⊆ X. Applying these ideas to graphs and hypergraphs, we obtain the diversity analogue of the shortest-path metric.

Theorem 3
1. Let G = (V, E) be a graph with non-negative weight function w : E → R≥0. The Steiner tree diversity of (G, w) is the unique maximal diversity δ such that δ({u, v}) ≤ w({u, v}) for all {u, v} ∈ E.

2. Let H = (V, E) be a hypergraph with non-negative weight function w : E → R≥0. The hypergraph Steiner diversity of (H, w) is the unique maximal diversity δ such that δ(A) ≤ w(A) for all A ∈ E.
Proof: Note that 1. is a special case of 2. We prove 2.
Let δ_H denote the hypergraph Steiner diversity for H. For any edge A ∈ E, the edge itself forms a connected sub-hypergraph containing A, so δ_H(A) ≤ w(A). Let δ be any other diversity which also satisfies δ(A) ≤ w(A) for all A ∈ E. Given any finite B ⊆ V, there is a set E′ of hyperedges which forms a connected sub-hypergraph, contains B, and has summed weight δ_H(B). Multiple applications of the triangle inequality (D2) give

δ(B) ≤ ∑_{e∈E′} δ(e) ≤ ∑_{e∈E′} w(e) = δ_H(B).

Hence δ_H dominates every such diversity δ, and is the unique maximal one. ✷

As a further consequence, we can show that the hypergraph Steiner diversity dominates all diversities with a given induced metric.

Theorem 4
Let (X, δ) be a diversity with induced metric space (X, d). Let δ_diam denote the diameter diversity on X and let δ_S denote the Steiner diversity on X, both computed from d. Then for all finite A ⊆ X,

δ_diam(A) ≤ δ(A) ≤ δ_S(A) ≤ (|A| − 1) · δ_diam(A).

Proof: For any a, b ∈ A we have d(a, b) = δ({a, b}) ≤ δ(A), the last inequality following from the monotonicity of δ; taking the maximum over pairs a, b gives the first inequality. Let G be the complete graph with vertex set A and edge weights w({a, a′}) = d(a, a′). Then δ(A) ≤ δ_S(A) by Theorem 3. To obtain the final inequality, consider any ordering of the elements of A: a1, a2, …, a_{|A|}. The path a1, a2, …, a_{|A|} spans A, so using the triangle inequality repeatedly gives

δ_S(A) ≤ ∑_{i=1}^{|A|−1} d(a_i, a_{i+1}) ≤ (|A| − 1) · δ_diam(A). ✷

General Properties
L1 diversities were defined in Section 2.1. We say that a diversity (X, δ) is L1-embeddable if there exists an isometric embedding of (X, δ) into an L1 diversity. A direct consequence of the definition of L1 diversities (and the direct sum of measure spaces) is that if (X, δ1) and (X, δ2) are both L1-embeddable diversities then so are (X, δ1 + δ2) and (X, λδ1) for λ > 0. Hence the L1-embeddable diversities on a given set form a cone.
Deza and Laurent [6] make a systematic study of the identities and inequalities satisfied by the cone of L 1 metrics. Much of this work will no doubt have analogues in diversity theory. For one thing, every identity for L 1 metrics is also an identity for the induced metrics of L 1 diversities. However L 1 diversities will satisfy a far richer collection of identities. One example is the following.

Proposition 5
Let (X, δ) be L1-embeddable and let A1, …, An (n ≥ 2) be finite, non-empty subsets of X with union A. Then

∑_{i=1}^{n} δ(A_i ∪ A_{i+1}) ≥ 2 δ(A),

where indices are taken cyclically, so A_{n+1} = A1.
Proof: First suppose (X, δ) embeds isometrically in ℓ1^1, the diameter diversity on R. Let x_m and x_M be the minimum and maximum elements in A, so that δ(A) = x_M − x_m. Identify A_{n+1} with A1 and A0 with An. There are i, j such that x_m ∈ A_i and x_M ∈ A_j. If i = j then δ(A_{i−1} ∪ A_i) and δ(A_i ∪ A_{i+1}) are each at least δ(A_i) = δ(A), so the sum is at least 2δ(A). If i ≠ j then, without loss of generality, i < j. Select y1, …, yn such that y_i = x_m, y_j = x_M and y_k ∈ A_k for all k. Then, considering the two paths from y_i to y_j (one through y_{i+1}, …, y_{j−1}, the other, cyclically, through y_{i−1}, …, y_{j+1}) we obtain

∑_{k=1}^{n} δ(A_k ∪ A_{k+1}) ≥ ∑_{k=1}^{n} δ({y_k, y_{k+1}}) ≥ 2(x_M − x_m) = 2δ(A).

The case for general L1-embeddable diversities can be obtained by integrating this inequality over the measure space. ✷

Espínola and Piatek [7] investigated when hyperconvexity for diversities implies hyperconvexity for their induced metrics, proving that this holds whenever the diversity and its induced metric (X, d) satisfy

δ(A) ≤ (1/2) ∑_{i=1}^{k} d(a_i, a_{i+1}),    (5)

with a_{k+1} = a1, for all A = {a1, …, a_k} ⊆ X. (See [7] for definitions and results.) A consequence of Proposition 5 is that this property holds for all L1-embeddable diversities.
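The cyclic inequality ∑_i δ(A_i ∪ A_{i+1}) ≥ 2δ(∪_i A_i) of Proposition 5 can be spot-checked numerically for the coordinatewise ℓ1 diversity on R^3, which is L1-embeddable. A sketch with randomly generated families (all names are illustrative):

```python
import random

def l1_diversity(A):
    """Coordinatewise range sum: sum_i (max_a a_i - min_a a_i)."""
    if len(A) <= 1:
        return 0.0
    return sum(max(a[i] for a in A) - min(a[i] for a in A)
               for i in range(len(A[0])))

random.seed(0)
for _ in range(100):
    # Random cyclic families A_1, ..., A_m of non-empty point sets in R^3.
    m = random.randint(2, 5)
    fam = [[tuple(random.uniform(-1, 1) for _ in range(3))
            for _ in range(random.randint(1, 3))] for _ in range(m)]
    union = [p for Ai in fam for p in Ai]
    cyc = sum(l1_diversity(fam[i] + fam[(i + 1) % m]) for i in range(m))
    assert cyc >= 2 * l1_diversity(union) - 1e-9
print("cyclic inequality held on 100 random families")
```

For m = 2 the two terms coincide and the inequality holds with equality, which the check above also exercises.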

Proposition 6
If (X, δ) is L 1 embeddable then its induced metric (X, d) satisfies (5) for all finite A ⊆ X.

Examples of L 1 -embeddable diversities
We now examine three examples of diversities (X, δ) which are L 1 -embeddable. In all three cases, the diversity need not be finite, nor even finite dimensional. Later, we examine L 1 -embeddable diversities for finite sets.

Proposition 7
Measure diversities, mean-width diversities and S-diversities are all L1-embeddable.

Proof:
We treat each kind of diversity in turn.

Measure diversities.
In a measure diversity any element A ∈ Σ can be naturally identified with the indicator function 1_A in L1(Ω, Σ, µ). Observe now that for finite {A1, …, Ak} ⊆ Σ,

δ1({1_{A1}, …, 1_{Ak}}) = ∫_Ω ( max_i 1_{A_i}(ω) − min_i 1_{A_i}(ω) ) dµ(ω) = µ( ∪_i A_i \ ∩_i A_i ) = δ({A1, …, Ak}).

Mean-width diversities.
Let (R^n, δ_w) be the n-dimensional mean-width diversity. Consider L1(S^{n−1}, B, ν) where S^{n−1} is the unit sphere in R^n, B is the collection of Borel subsets of S^{n−1} and ν is the measure given by

ν = (π / B(n/2, 1/2)) · µ_{n−1} / µ_{n−1}(S^{n−1}).

Map each x ∈ R^n to the function f_x given by f_x(u) = ⟨x, u⟩. For finite A ⊆ R^n,

δ1({f_x : x ∈ A}) = (π / B(n/2, 1/2)) · (1 / µ_{n−1}(S^{n−1})) ∫_{S^{n−1}} ( max_{a∈A} ⟨a, u⟩ − min_{a∈A} ⟨a, u⟩ ) dµ_{n−1}(u) = δ_w(A).

Thus (R^n, δ_w) is embedded in L1.

S-diversities.
Let (X, δ) be an S-diversity. Suppose that the random variables in X have state space S and that they are defined on the same probability space (Ω, Σ, µ). For each X_γ ∈ X let f_γ : S × Ω → R be given by f_γ(s, ω) = 1 if X_γ(ω) = s and 0 otherwise. Then δ can be expressed as an L1 diversity of the functions f_γ, giving the required embedding. ✷

In the case of measure diversities, we can also prove a converse result, in the sense that every L1 diversity can be embedded in a measure diversity. We first make some observations about R. Consider the map φ : R → P(R) given by

φ(x) = [0, x] if x ≥ 0, and φ(x) = [x, 0] if x < 0.

We claim that for any finite {x1, …, xk} ⊂ R,

λ( ∪_i φ(x_i) \ ∩_i φ(x_i) ) = max_i x_i − min_i x_i,

where λ is Lebesgue measure. To see that this is true, we consider three cases. We let x_m be the minimum of all x_i and x_M be the maximum. In case 1, all the x_i are non-negative. Then ∪_i φ(x_i) = [0, x_M] and ∩_i φ(x_i) = [0, x_m], so the set difference has measure x_M − x_m. This gives the result. In case 2, all the x_i are negative and the result follows similarly. In case 3, some of the x_i are positive and some of the x_i are negative. In this case ∪_i φ(x_i) = [x_m, x_M] and ∩_i φ(x_i) is empty, so again the difference has measure x_M − x_m.
Proposition 8
Any L1-embeddable diversity can be embedded in a measure diversity.
Proof: Without loss of generality, consider the diversity (X, δ1) where X is a subset of L1(Ω, A, µ). We construct a new measure space (Ω × R, F, µ × λ), i.e. the product of (Ω, A, µ) with Lebesgue measure on R, and map each f ∈ X to the set

Φ(f) = {(ω, t) : t ∈ φ(f(ω))} ⊆ Ω × R,

where φ is the map defined above. We then have that for all finite subsets {f1, …, fk} of X,

(µ × λ)( ∪_i Φ(f_i) \ ∩_i Φ(f_i) ) = ∫_Ω ( max_i f_i(ω) − min_i f_i(ω) ) dµ(ω) = δ1({f1, …, fk}).

Hence Φ is an isometric embedding of (X, δ1) into the measure diversity on (Ω × R, F, µ × λ). ✷

Finite, L 1 -embeddable diversities
Further results can be obtained for L1-embeddable diversities (X, δ) when X is finite, say |X| = n. In this case, the study of L1 diversities reduces to the study of non-negative combinations of cut diversities, also called split diversities, which are directly analogous to cut metrics. Given U ⊆ X define the diversity δ_U by

δ_U(A) = 1 if A ∩ U ≠ ∅ and A \ U ≠ ∅; δ_U(A) = 0 otherwise.

In other words, δ_U(A) = 1 exactly when U cuts A into two non-empty parts. The set of non-negative combinations of cut diversities for X forms a cone which equals the set of L1-embeddable diversities on X.
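A small sketch of cut diversities and a split-system (non-negative) combination of them; the ground set and weights are invented for illustration:

```python
def cut_diversity(U, A):
    """delta_U(A) = 1 if the cut U splits A into two non-empty parts."""
    A = set(A)
    return 1.0 if (A & U) and (A - U) else 0.0

# A split-system diversity: non-negative combination of cut diversities on X.
X = {1, 2, 3, 4}
weights = {frozenset({1}): 0.5, frozenset({1, 2}): 1.0, frozenset({1, 3}): 2.0}

def split_diversity(A):
    return sum(lam * cut_diversity(set(U), A) for U, lam in weights.items())

print(split_diversity({1, 2}))  # cut by {1} and {1,3}: 0.5 + 2.0 = 2.5
print(split_diversity({3, 4}))  # cut by {1,3} only: 2.0
print(split_diversity({2}))     # singletons always get 0
```

Each δ_U satisfies the diversity axioms (with the relaxed (D1)), and non-negative combinations preserve them, so split_diversity is a diversity.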

Proposition 9
Suppose that |X| = n and (X, δ) is a diversity. The following are equivalent.
(i) (X, δ) is L1-embeddable.
(ii) (X, δ) embeds isometrically in ℓ1^m with m = (n choose ⌊n/2⌋).
(iii) (X, δ) is a split system diversity (see [10]). That is, δ is a non-negative combination of cut diversities.
Proof: (i) ⇒ (iii). Let φ : x ↦ f_x be an embedding from X to L1(Ω, A, µ). For each U ⊆ X and each ω ∈ Ω let

λ_U(ω) = λ({ t ∈ R : U = {x ∈ X : f_x(ω) > t} }),

where λ is Lebesgue measure. Then for all ω and all A ⊆ X we have

max_{a∈A} f_a(ω) − min_{a∈A} f_a(ω) = ∑_{U⊆X} λ_U(ω) δ_U(A),

and integrating over Ω expresses δ as a non-negative combination of cut diversities.

(iii) ⇒ (ii). Fix x0 ∈ X. Since δ_U = δ_{X\U}, we can write δ as

δ(A) = ∑_U λ_U δ_U(A)

for all A ⊆ X, where U runs over all subsets of X containing x0. This collection of subsets of X can be partitioned into m = (n choose ⌊n/2⌋) disjoint chains by Dilworth's theorem. Denote these chains by C1, …, Cm, so that

δ = ∑_{i=1}^{m} ∑_{U∈C_i} λ_U δ_U.

We will show that for every chain C = C_i the diversity ∑_{U∈C} λ_U δ_U is R-embeddable; using one coordinate of ℓ1^m for each chain, the result follows. To this end, define φ : X → R by

φ(x) = ∑_{U∈C : U ⊇ U_x} λ_U,

where U_x is the minimal element of the chain C that contains x (and φ(x) = 0 if no element of C contains x). Since the elements of C are nested, for all A ⊆ X we have

max_{a∈A} φ(a) − min_{a∈A} φ(a) = ∑_{U∈C} λ_U δ_U(A).

(ii) ⇒ (i). Follows from the fact that ℓ1^m is itself an L1 diversity. ✷

Diversities formed from combinations of split diversities were studied by [10] and in the literature on phylogenetic diversities [16,17,21]. Proposition 10 is a restatement of Theorems 3 and 4 in [3].

Proposition 10
Let (X, δ) be a finite, L1-embeddable diversity, with

δ(A) = ∑_{U⊆X} λ_U δ_U(A)

for all A ⊆ X, where we assume λ_U = λ_{X\U}. For all A ⊆ X we have the identity (6), and if ∅ ≠ A ≠ X we have (7). From these we obtain the following characterization of finite, L1-embeddable diversities.

Proposition 11
A finite diversity (X, δ) is L1-embeddable if and only if it satisfies (6) together with the non-negativity condition (8), a signed sum over the sets B with A ⊆ B, for all A ⊆ X such that ∅ ≠ A ≠ X.
Proof: Necessity follows from Proposition 10. For sufficiency, observe that the map from a weight assignment λ to a diversity ∑_{U⊆X} λ_U δ_U is linear and, by Proposition 10, invertible on the space of weight functions λ satisfying λ_U = λ_{X\U} for all U. The image of this map therefore has dimension 2^{n−1} − 1. From (6) we see that the values δ(A) for |A| odd can be written in terms of the values δ(A) for |A| even. Hence the space of diversities satisfying (6) has dimension 2^{n−1} − 1 and lies in the image of the map. Condition (8) ensures that the diversity is given by a non-negative combination of cut diversities. ✷

Minimal-distortion embedding of diversities
Given two metric spaces (X1, d1) and (X2, d2) we can ask for the minimal distortion embedding of X1 into X2, where the minimum is taken over all maps φ : X1 → X2. Naturally, we can ask the same question for diversities. Whereas the question for metric spaces is well-studied (though it still contains many interesting open problems), the situation for diversities is almost completely unexplored. We state some preliminary bounds here, most of which leverage metric results. We begin by proving bounds for several types of diversities defined on R^k.

Lemma 1
Let δ(1)_diam and δ(2)_diam be the diameter diversities on R^k, evaluated using the ℓ1 and ℓ2 metrics respectively. Let δ1 and δ_w be the ℓ1 and mean-width diversities on R^k. Then for all finite A ⊆ R^k,

δ(1)_diam(A) ≤ δ1(A) ≤ k · δ(1)_diam(A)

and

δ(2)_diam(A) ≤ δ_w(A) ≤ O(√k) · δ(2)_diam(A).

All bounds are tight.

Proof:
The inequalities δ(1)_diam(A) ≤ δ1(A) and δ(2)_diam(A) ≤ δ_w(A) are due to Theorem 4, since δ1 and δ_w induce the ℓ1 and ℓ2 metrics respectively. To prove the ℓ1 upper bound, note that for each dimension i there are a(i), b(i) ∈ A which maximize |a_i − b_i|. Hence

δ1(A) = ∑_{i=1}^{k} |a(i)_i − b(i)_i| ≤ ∑_{i=1}^{k} d1(a(i), b(i)) ≤ k · δ(1)_diam(A),

with equality given by subsets of {±e_i : i = 1, …, k}.
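The ℓ1 part of Lemma 1, δ(1)_diam(A) ≤ δ1(A) ≤ k · δ(1)_diam(A), together with the tightness of the upper bound at {±e_i}, can be spot-checked numerically (a sketch; the helper names are our own):

```python
from itertools import combinations
import random

def l1_metric(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def diam1(A):
    """Diameter diversity for the l1 metric: largest pairwise l1 distance."""
    return max((l1_metric(a, b) for a, b in combinations(A, 2)), default=0.0)

def delta1(A):
    """l1 diversity: sum over coordinates of the coordinate range."""
    k = len(A[0])
    return sum(max(a[i] for a in A) - min(a[i] for a in A) for i in range(k))

random.seed(1)
k = 4
for _ in range(200):
    A = [tuple(random.uniform(-1, 1) for _ in range(k)) for _ in range(5)]
    assert diam1(A) - 1e-9 <= delta1(A) <= k * diam1(A) + 1e-9

# Tightness of the upper bound at A = {e_1, -e_1, ..., e_k, -e_k}:
A = []
for i in range(k):
    e = [0.0] * k
    e[i] = 1.0
    A.append(tuple(e))
    A.append(tuple(-x for x in e))
print(delta1(A), k * diam1(A))  # both equal 2k = 8
```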
To prove the mean-width bound note that, by Jung's theorem [5], a set of points in R^k with diameter d = δ(2)_diam(A) is contained in a closed ball with radius r ≤ d √(k / (2(k+1))). Hence conv(A) is contained in a set with mean width 2r, so

δ_w(A) ≤ (π / B(k/2, 1/2)) · 2r ≤ (π / B(k/2, 1/2)) √(2k/(k+1)) · d = O(√k) · δ(2)_diam(A),

where again B(·, ·) denotes the beta function. The bound holds in the limit for points distributed on the surface of a sphere. ✷

We now investigate upper bounds for the distortion of embedding diversities into L1 space. To begin, we consider only diversities which are themselves diameter diversities. In many senses, these diversities are similar to metrics, and it is perhaps no surprise that they can be embedded with a similar upper bound to their metric counterparts.

Proposition 12
Let (X, d) be a metric space, |X| = n, and let (X, δ diam ) be the corresponding diameter diversity.
1. There is an embedding of (X, δ diam ) in ℓ k 1 with distortion O(log 2 n) and k = O(log n).

2. There is an embedding of (X, δ_diam) in (R^k, δ_w) with distortion O(log^{3/2} n) and k = O(log n).
Proof: 1. Any metric on n points can be embedded in the metric space ℓ1^k with distortion K = O(log n) and k = O(log n) [15]. Let φ be such an embedding for (X, d), so that d(x, y) ≤ d1(φ(x), φ(y)) ≤ K d(x, y) for all x, y ∈ X. As above, we let δ(1)_diam denote the diameter diversity for the ℓ1^k metric. For all A ⊆ X we have from Lemma 1 that

δ_diam(A) ≤ δ(1)_diam(φ(A)) ≤ δ1(φ(A)) ≤ k · δ(1)_diam(φ(A)) ≤ k K · δ_diam(A).

The result now follows since k is O(log n) and K is O(log n).
2. As shown in [15] (see also [2]), there is an embedding φ of (X, d) into ℓ2^k with

d(x, y) ≤ d2(φ(x), φ(y)) ≤ K d(x, y)

for all x, y ∈ X, where k and K are O(log n). For all A ⊆ X we have from Lemma 1 that

δ_diam(A) ≤ δ(2)_diam(φ(A)) ≤ δ_w(φ(A)) ≤ O(√k) · δ(2)_diam(φ(A)) ≤ O(√k) K · δ_diam(A).

The result follows. ✷
We now consider the problem of embedding general diversities. The bounds we obtain here can definitely be improved: we do little more than slightly extend the results for diameter diversities.

Theorem 13
Let (X, δ) be a diversity with |X| = n.
1. There is an embedding of (X, δ) in ℓ1^k with distortion O(n log² n) and k = O(log n).
2. There is an embedding of (X, δ) in (R^k, δ_w) with distortion O(n log^{3/2} n) and k = O(log n).

Proof: Any diversity can be approximated by the diameter diversity of its induced metric with distortion n, as shown in Theorem 4. This fact together with Proposition 12 gives the required bounds. ✷

From upper bounds we switch to lower bounds. Any embedding of diversities with distortion K induces an embedding of the underlying metrics with distortion at most K. Hence we can use the examples from metrics [14] to establish that there are diversities which cannot be embedded in ℓ1 with better than Ω(log n) distortion.
We have been able to obtain slightly tighter lower bounds for embeddings into ℓ1^k where k is bounded.

Proposition 14
Let (X, δ) be the n-point diversity with δ(A) = |A| − 1 for all non-empty A ⊆ X. Then the minimal distortion embedding of (X, δ) into ℓ1^k has distortion at least (n − 1)/k.

Proof:
For any embedding φ of (X, δ), Lemma 1 shows that

δ1(φ(X)) ≤ k · δ(1)_diam(φ(X)) = k · δ1({φ(a), φ(b)})

for some a, b ∈ X with a ≠ b. The distortion of φ is equal to

max_{A,B} ( δ(A) / δ1(φ(A)) ) · ( δ1(φ(B)) / δ(B) )

over sets A, B with at least two elements. Taking A = X and B = {a, b} gives a ratio of at least (n − 1)/k, so the distortion is at least (n − 1)/k. ✷

A consequence of Proposition 14 is that there will, in general, be no embedding of diversities in ℓ1 for which both the distortion and dimension are O(log n), or indeed polylogarithmic, ruling out a direct translation of the classical embedding results for finite metrics. Even so, we suspect that the upper bounds achieved in Theorem 13 can still be greatly improved.

The geometry of hypergraphs
Having reviewed diversities, ℓ 1 diversities, and the diversity embedding problems, we return to their application in combinatorial optimization. We will here establish analogous results to those of [15] and [9] for hypergraphs and diversity embeddings into ℓ 1 . We first state the extensions of maximum multicommodity flows and minimum cuts a little more formally.
Given a hypergraph H = (V, E), non-negative capacities C_e for e ∈ E and a set S ⊆ V, the goal is to find the maximum weighted sum of minimal connected sub-hypergraphs covering S, without exceeding the capacity of any hyperedge. Let T_S be the set of all minimal connected sub-hypergraphs of H that include S, and for each sub-hypergraph t ∈ T_S assign a weight z_t. We consider the following generalization of fractional Steiner tree packing [11], which we call maximum hypergraph Steiner packing: identify z_t solving the LP

maximize ∑_{t∈T_S} z_t
subject to ∑_{t∈T_S : e∈t} z_t ≤ C_e for all e ∈ E,
z_t ≥ 0 for all t ∈ T_S.
As before, if we define C_e for all subsets e of V, letting C_e be zero for e ∉ E, we can drop the dependence of the problem on E. The reference [12] studies an oriented version of this problem. As with flows, maximum hypergraph Steiner packing has a multicommodity version. For each subset S of V suppose we have a non-negative demand D_S. We view D and C as non-negative vectors indexed by the subsets of V. Suppose we want to simultaneously connect up all S ⊆ V with minimal connected sub-hypergraphs carrying flow f·D_S for all S ⊆ V, and we want to maximize f. The corresponding optimization problem is

maximize f
subject to ∑_{t∈T_S} z_{t,S} = f·D_S for all S ⊆ V,
∑_{S⊆V} ∑_{t∈T_S : e∈t} z_{t,S} ≤ C_e for all e ⊆ V,
z_{t,S} ≥ 0 for all S ⊆ V, t ∈ T_S.    (11)

Note that we use z_{t,S} rather than just z_t because the same connected sub-hypergraph t might cover more than one set S in the hypergraph. We call the optimal value of f for this problem MaxHSP(V, C, D), for maximum multicommodity hypergraph Steiner packing.
Next we define the appropriate analogue of the min-cut problem, which we call minimum hypergraph cut. As before, we let ∂U be the set of hyperedges which have endpoints in both U and V \ U, and we make the simplifying assumption that every subset of V is a hyperedge, including any missing hyperedges with capacity zero. We define

MinHypCut(V, C, D) = min_U ( ∑_{A∈∂U} C_A ) / ( ∑_{S∈∂U} D_S ),

where U ranges over subsets of V for which the denominator is positive, and a demand set S lies in ∂U exactly when S intersects both U and V \ U. Below we will show that MaxHSP(V, C, D) ≤ MinHypCut(V, C, D). We define

γ(V, C, D) = MinHypCut(V, C, D) / MaxHSP(V, C, D).

We say that a non-negative vector C is supported on the hypergraph H = (V, E) if C_e = 0 for e ∉ E. Then for any hypergraph H we define γ(H) to be the greatest value of γ(V, C, D) over all non-negative C and D such that C is supported on H.
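The inequality MaxHSP(V, C, D) ≤ MinHypCut(V, C, D) can be illustrated on a toy hypergraph by exhibiting a feasible packing and enumerating all cuts. A sketch (the instance and all names are invented for illustration):

```python
from itertools import combinations

V = {1, 2, 3, 4}
E = [frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4}), frozenset({1, 3, 4})]
cap = {e: 1.0 for e in E}
S = frozenset(V)  # a single demand set S = V with D_S = 1

def connected(edges):
    """Merge hyperedges that share vertices; connected iff one group remains."""
    groups = [set(e) for e in edges]
    changed = True
    while changed:
        changed = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                if groups[i] & groups[j]:
                    groups[i] |= groups.pop(j)
                    changed = True
                    break
            if changed:
                break
    return len(groups) == 1

# T_S: minimal connected sub-hypergraphs covering S.
covers = [set(c) for r in range(1, len(E) + 1) for c in combinations(E, r)
          if S <= set().union(*c) and connected(c)]
T_S = [t for t in covers if not any(u < t for u in covers)]

# A feasible packing: weight 1/2 on each of the three minimal covers.
z = {frozenset(t): 0.5 for t in T_S}
for e in E:  # check the capacity constraints
    assert sum(wt for t, wt in z.items() if e in t) <= cap[e] + 1e-9
packing_value = sum(z.values())

# MinHypCut: min over cuts U of cap(∂U) / (demand cut by U); here D_S = 1
# and every proper cut separates S = V, so the denominator is always 1.
mincut = min(sum(cap[e] for e in E if e & U and e - U)
             for r in range(1, len(V))
             for c in combinations(V, r)
             for U in [set(c)])
print(packing_value, mincut)  # 1.5 2.0
assert packing_value <= mincut
```

On this instance the packing of value 3/2 is in fact optimal (summing the capacity constraints of three of the hyperedges bounds the packing by 3/2), while the minimum cut is 2, so the gap between the two quantities is real.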
We say that a diversity δ on V is supported on H = (V, E) if it is the hypergraph Steiner diversity of H for some set of non-negative weights C_e for e ∈ E. For any diversity δ on V we define k1(δ) to be the minimal distortion between δ and an ℓ1-embeddable diversity on V. For any hypergraph H we define k1(H) to be the maximum of k1(δ) over all diversities δ supported on H. The major result of this section is that for all hypergraphs H,

k1(H) = γ(H).
The fact that γ(H) ≤ k 1 (H) (our Theorem 1) is the analogue of results in Section 4 of [15] and the fact that equality holds (our Theorem 2) is the analogue of Theorem 3.2 in [9].

Proposition 15
For all V, C, D,

MaxHSP(V, C, D) = min_{δ∈∆(V)} ( ∑_{R⊆V} C_R δ(R) ) / ( ∑_{S⊆V} D_S δ(S) ),

where ∆(V) is the set of all diversities on V. In particular, the optimal δ ∈ ∆(V) is supported on the hypergraph H = (V, E) where E is the set of all e such that C_e > 0.

Proof:
We rewrite the linear program (11) in standard form. We break the equality constraint into ≤ and ≥, and note that we can omit the ≥ constraint, because it will never be active. Then we get

maximize f
subject to ∑_{S⊆V} ∑_{t∈T_S : R∈t} z_{t,S} ≤ C_R for all R ⊆ V,
f·D_S − ∑_{t∈T_S} z_{t,S} ≤ 0 for all S ⊆ V,
z_{t,S} ≥ 0 for all S ⊆ V, t ∈ T_S.    (12)

Let d_R be the dual variables corresponding to the first set of inequality constraints, and let y_S be the dual variables corresponding to the second set of inequality constraints. Then the dual problem is

minimize ∑_{R⊆V} C_R d_R
subject to ∑_{S⊆V} D_S y_S ≥ 1,
∑_{R∈t} d_R ≥ y_S for all S ⊆ V, t ∈ T_S,
d_R ≥ 0 for all R ⊆ V,
y_S ≥ 0 for all S ⊆ V.    (13)

By strong duality, (12) and (13) have the same optimal values. Next we show that (13) is equivalent to

minimize ∑_{R⊆V} C_R δ(R), subject to ∑_{S⊆V} D_S δ(S) ≥ 1, δ is a diversity,    (14)

where the minimum is taken over all diversities.
To see the equivalence of (13) and (14), suppose that δ is a diversity solving (14). Let d R = δ(R) for all R ⊆ V and y S = δ(S) for all S ⊆ V . Then the objective function of (13) is the same, the second line of (13) still holds, the third line holds by the triangle inequality for diversities, and the fourth and fifth line hold by the non-negativity of diversities.
To see the other direction, suppose d R and y S solve (13). Let δ be the Steiner diversity on V generated edge weights d R , R ⊆ V . Since δ(R) ≤ d R for all R, this can only decrease the objective function. Also, by the definition of δ, δ(S) ≥ y S for all S, so the inequality of (14) is satisfied too. Thus the two LPs have the same minima.
Note we can assume that δ is the Steiner diversity for a weighted hypergraph with hyperedges {R : C R > 0}. If not, we can replace δ with the Steiner diversity on the hypergraph whose hyperedges are the set {R : C R > 0} and whose weights are the C R . This Steiner diversity will have the same value on the hyperedges as δ, so the objective function will not change, but the value can only increase on other subsets of V , and so the constraint is still satisfied.
Finally, (14) is equivalent to

minimize ( ∑_{R⊆V} C_R δ(R) ) / ( ∑_{S⊆V} D_S δ(S) ), subject to δ being a diversity.    (15)

This is because any solution of (14) has an objective value in (15) no greater than its value in (14), and any solution of (15) can be rescaled, without changing the objective function, so that ∑_{S⊆V} D_S δ(S) = 1, giving a feasible solution to (14) with the same objective value. This rescaling does not change the hypergraph that δ is supported on. ✷

Proposition 16
For all V, C, D,

MinHypCut(V, C, D) = min_{δ∈∆1(V)} ( ∑_{R⊆V} C_R δ(R) ) / ( ∑_{S⊆V} D_S δ(S) ),

where ∆1(V) is the set of ℓ1-embeddable diversities on V.

Proof: For any cut (U, V \ U) of V, let δ_U be the corresponding cut diversity. Then by definition we have

MinHypCut(V, C, D) = min_U (C · δ_U) / (D · δ_U),

where we restrict U to values for which the denominator is non-zero, and where C · δ denotes ∑_{R⊆V} C_R δ(R). We need to show that this value is not decreased by taking the minimum over all ℓ1-embeddable diversities instead. Let δ be an ℓ1-embeddable diversity that minimizes the ratio. By Proposition 9, δ can be expressed as a finite non-negative combination of cut diversities: δ = ∑_i a_i δ_{U_i} for some non-negative a_i and some subsets U_i of V. Let I be the index i that minimizes C · δ_{U_i} / D · δ_{U_i}. Then we claim that

(C · δ) / (D · δ) ≥ (C · δ_{U_I}) / (D · δ_{U_I}).

To see this, let x and y be the vectors with entries x_i = C · δ_{U_i} and y_i = D · δ_{U_i}, and consider

G(a) = (a · x) / (a · y)

for vectors a with a_i ≥ 0 for all i, where x and y are non-negative vectors of the same size. We claim that G attains its minimum on this domain at a vector a with a single non-zero entry. To show this, we compute the gradient of G:

∇G(a) = (1 / (a · y)²) [ x (a · y) − y (a · x) ].
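The key claim in this proof, that the ratio G(a) = (a·x)/(a·y) over non-negative a is minimized at a single coordinate, can also be seen directly: G(a) is a weighted average of the ratios x_i/y_i, hence at least the smallest of them. A numerical spot-check with random data (a sketch; all names are illustrative):

```python
import random

def G(a, x, y):
    """Ratio of weighted sums (a.x)/(a.y) for a non-negative weighting a."""
    num = sum(ai * xi for ai, xi in zip(a, x))
    den = sum(ai * yi for ai, yi in zip(a, y))
    return num / den

random.seed(2)
x = [random.uniform(0.5, 2.0) for _ in range(5)]
y = [random.uniform(0.5, 2.0) for _ in range(5)]
best_single = min(xi / yi for xi, yi in zip(x, y))

# Random non-negative weightings never beat the best single coordinate.
for _ in range(1000):
    a = [random.uniform(0.0, 1.0) for _ in range(5)]
    assert G(a, x, y) >= best_single - 1e-9
print("min of (a.x)/(a.y) is attained at a single coordinate:", best_single)
```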
If x and y are parallel then the result immediately follows, so assume that they are not. Then ∇G is not zero anywhere in the domain, and so the minimum of G must be attained on the boundary of the domain. Hence at least one a_i must be zero. Discard this term from the numerator and the denominator of G, and repeat the argument for G as a function of a vector with one fewer entry. Repeating gives a single non-zero value, which may be set to 1. ✷

The following theorem implies Theorem 1.

Theorem 17
For all V, C, D with C supported on the hypergraph H = (V, E),

MaxHSP(V, C, D) ≤ MinHypCut(V, C, D) ≤ k1(H) · MaxHSP(V, C, D).

Proof: Since ∆1(V) ⊆ ∆(V), the first inequality follows from the previous two results. For the second inequality, given V, C, D and the hypergraph H = (V, E) supporting C, let δ solve the MaxHSP linear program (14). By Proposition 15 we know that δ is supported on H. Let δ* be an ℓ1-embeddable diversity achieving the minimal distortion k1(δ) ≤ k1(H); we may assume that δ ≤ δ* ≤ k1(H) δ. Then

MinHypCut(V, C, D) ≤ (C · δ*) / (D · δ*) ≤ k1(H) (C · δ) / (D · δ) = k1(H) · MaxHSP(V, C, D). ✷

Theorem 18
For every hypergraph H, γ(H) = k1(H). That is, there are capacities C supported on H and demands D for which the ratio of MinHypCut(V, C, D) to MaxHSP(V, C, D) equals k1(H). Hence the upper bound in Theorem 17 is tight.
To prove this result, we will need a lemma from [9] which we reproduce here.

Lemma 2 (Claim A.2 of [9])
Let v, u ∈ R^k be positive vectors. Define

H(v, u) = ( max_i v_i/u_i ) · ( max_i u_i/v_i ).

If S ⊆ R^k is a closed set of positive vectors, define H(v, S) as min_{u∈S} H(v, u). If K ⊂ R^k is a closed convex cone, then

H(v, K) = max (D · v) / (C · v),

where the maximum is taken over all non-negative vectors D, C ∈ R^k for which (D · u)/(C · u) ≤ 1 for any u ∈ K.
Proof (of Theorem 18): Let δ be a diversity supported on the hypergraph H that maximizes k1(δ), and define λ = k1(δ) = k1(H). We need to show that

λ ≤ max MinHypCut(V, C, D) / MaxHSP(V, C, D),

where the maximum is taken over all C, D where C is supported on H.
Let v be the vector (δ(A))_{A⊆V}, and let K be the cone of all ℓ1-embeddable diversities on V, viewed as vectors indexed by subsets of V. Then λ = H(v, K). We apply the lemma to conclude that

λ = max (D · δ) / (C · δ),

where the maximum is taken over all non-negative vectors C, D which satisfy the restriction (D · µ)/(C · µ) ≤ 1 for any ℓ1-embeddable diversity µ. This tells us that there exist C, D such that λ = (D · δ)/(C · δ) and (D · µ)/(C · µ) ≤ 1 for any ℓ1-embeddable diversity µ.
First we show that we may assume that C is supported on H. Suppose that for some set R ⊆ V with R ∉ E we have C_R > 0. Since δ is supported on H there are hyperedges h_1, …, h_k that form a connected set covering R with δ(R) = ∑_{i=1}^{k} δ(h_i). Define a new vector C′ by C′_R = 0, C′_{h_i} = C_{h_i} + C_R for i = 1, …, k, and C′_A = C_A for all other A ⊆ V. Even with this new C′ and the same D we still have λ = (D · δ)/(C′ · δ) and (D · µ)/(C′ · µ) ≤ 1 for any ℓ1-embeddable diversity µ. To see this, first note that C′ · µ ≥ C · µ, since µ(R) ≤ ∑_{i=1}^{k} µ(h_i) by the triangle inequality, so

(D · µ)/(C′ · µ) ≤ (D · µ)/(C · µ) ≤ 1.
Secondly, since δ satisfies δ(R) = ∑_{i=1}^{k} δ(h_i) and these are the only sets on which C is changed, it follows that C′ · δ = C · δ. We repeat this procedure until we have C_R > 0 only if R ∈ E.