On the VC-dimension of half-spaces with respect to convex sets

A family S of convex sets in the plane defines a hypergraph H = (S, E) as follows. A subfamily S' of S defines a hyperedge of H if and only if there exists a halfspace h that fully contains every set of S', and no other set of S is fully contained in h. In this case, we say that h realizes S'. We say that a set S is shattered if all its subsets are realized. The VC-dimension of a hypergraph H is the size of the largest shattered set. We show that the VC-dimension for pairwise disjoint convex sets in the plane is bounded by 3, and this is tight. In contrast, we show that the VC-dimension of convex sets in the plane (not necessarily disjoint) is unbounded. We provide a quadratic lower bound on the number of pairs of intersecting sets in a shattered family of convex sets in the plane. We also show that the VC-dimension is unbounded for pairwise disjoint convex sets in R^d, for d > 2. For segments in the plane, possibly intersecting, we show that the VC-dimension is at most 5, and this is tight: we construct a set of five segments that can be shattered. To motivate our findings, we give two exemplary applications, one for a geometric set cover problem and one for a range-query data structure problem.


Introduction
Geometric hypergraphs (also called range-spaces) are central objects in computational geometry, statistical learning theory, combinatorial optimization, linear programming, discrepancy theory, databases and several other areas in mathematics and computer science.
In most of these cases, we have a finite set P of points in R^d and a family of simple geometric regions, such as the family of all halfspaces in R^d. Then we consider the combinatorial structure of the set system (P, {h ∩ P}) where h is any halfspace. A key property of such hypergraphs is their bounded VC-dimension (see later in this section for exact definitions). More precisely, when the underlying family consists of points, the VC-dimension of the corresponding hypergraph is at most d + 1. Many optimization problems can be formulated on such structures. In this paper we initiate the study of a more complicated structure by allowing the underlying set of vertices to be arbitrary convex sets and not just points. We show that when the underlying family consists of pairwise disjoint convex sets in the plane, the corresponding hypergraph has VC-dimension at most 3, and this is tight. In this case the bound on the VC-dimension is the same as for points; however, we explain later why the proof for pairwise disjoint convex sets has to be more technical. We also show that when the sets may intersect, the VC-dimension is unbounded. Moreover, we prove that even for pairwise disjoint convex sets in R^d the VC-dimension is unbounded already for d ≥ 3. This is in sharp contrast to the situation when the underlying family consists of points.
We note that many deep results that hold for arbitrary hypergraphs with bounded VC-dimension readily apply to such hypergraphs. This includes, e.g., bounds on the discrepancy of such hypergraphs, bounds of O(1/ε^2) on the size of ε-approximations and also bounds on matchings or spanning trees with (so-called) low crossing numbers (see, e.g., Chazelle and Welzl (1989); Matousek (1995); Matousek et al. (1993); Li et al. (2001)).

Preliminaries and Previous Work
A hypergraph H = (V, E) is a pair of sets such that E ⊆ 2^V. A geometric hypergraph is one that can be realized in a geometric way. For example, consider the hypergraph H = (V, E), where V is a finite subset of R^d and E consists of all subsets of V that can be cut off from V by intersecting it with a shape belonging to some family of "nice" geometric shapes, such as the family of all halfspaces. See Figure 1 for an illustration of a hypergraph induced by points in the plane with respect to disks.
The elements of V are called vertices, and the elements of E are called hyperedges. We consider the following kinds of geometric hypergraphs: Let C be a family of convex sets in R^2 (or, in general, in R^d). We say that a subfamily S ⊆ C is realized if there exists a halfspace h such that S = {C ∈ C | C ⊂ h}. In words, there exists a halfspace h such that the subfamily of C of all sets that are fully contained in h is exactly S. We refer to the hypergraph H = (C, {S | S is realized}) as the hypergraph induced by C. In the literature, hypergraphs that are induced by points with respect to geometric regions of some specific kind are also referred to as range spaces. We start by introducing the concept of VC-dimension.
VC-dimension and ε-nets. A subset T ⊂ V is called a transversal (or a hitting set) of a hypergraph H = (V, E) if it intersects all sets of E. The transversal number of H, denoted by τ(H), is the smallest possible cardinality of a transversal of H. The fundamental notion of a transversal of a hypergraph is central in many areas of combinatorics and its relatives. In computational geometry, there is a particular interest in transversals, since many geometric problems can be rephrased as questions on the transversal number of certain hypergraphs. An important special case arises when we are interested in finding a small set N ⊂ V that intersects all "relatively large" sets of E. This is captured in the notion of an ε-net for a hypergraph:
Definition 1 (ε-net). Let H = (V, E) be a hypergraph with V finite, and let ε ∈ [0, 1]. A set N ⊆ V is called an ε-net for H if N ∩ S ≠ ∅ for every hyperedge S ∈ E with |S| ≥ ε|V|.
In other words, a set N is an ε-net for a hypergraph H = (V, E) if it stabs all "large" hyperedges (i.e., those of cardinality at least ε|V|). The well-known result of Haussler and Welzl (1987) provides a combinatorial condition on hypergraphs that guarantees the existence of small ε-nets (see below). This requires the following well-studied notion of the Vapnik-Chervonenkis dimension (Vapnik and Chervonenkis, 1971):
Definition 2 (VC-dimension). Let H = (V, E) be a hypergraph. A subset X ⊂ V (not necessarily in E) is said to be shattered by H if |{X ∩ S : S ∈ E}| = 2^|X|. The Vapnik-Chervonenkis dimension, also called the VC-dimension of H, is the maximum size of a subset of V shattered by H.
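The definition of shattering can be checked mechanically on a small finite hypergraph. The following brute-force sketch is only an illustration: the names is_shattered and vc_dimension and the interval example are hypothetical choices made here, and the running time is exponential in |V|.

```python
from itertools import combinations

def is_shattered(X, hyperedges):
    """Return True if every subset of X arises as X ∩ S for some hyperedge S."""
    X = frozenset(X)
    traces = {X & frozenset(S) for S in hyperedges}
    return len(traces) == 2 ** len(X)

def vc_dimension(vertices, hyperedges):
    """Largest size of a shattered subset of the vertices (brute force)."""
    vertices = list(vertices)
    best = 0
    for k in range(1, len(vertices) + 1):
        if any(is_shattered(X, hyperedges) for X in combinations(vertices, k)):
            best = k
        else:
            break  # every subset of a shattered set is shattered, so we can stop
    return best

# Illustrative hypergraph (an assumption of this sketch): the points 1, 2, 3
# on a line, with hyperedges given by the subsets an interval can cut off.
intervals = [set(), {1}, {2}, {3}, {1, 2}, {2, 3}, {1, 2, 3}]
interval_vc_dimension = vc_dimension([1, 2, 3], intervals)
```

In this example every pair of points is shattered, but {1, 2, 3} is not, since no interval contains 1 and 3 without containing 2.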
Relation between ε-nets and the VC-dimension. Haussler and Welzl (1987) proved the following fundamental theorem regarding the existence of small ε-nets for hypergraphs with small VC-dimension.
Theorem 1 (ε-net theorem). Let H = (V, E) be a hypergraph with VC-dimension d. Then for every ε ∈ (0, 1), there exists an ε-net N ⊂ V of size O((d/ε) log(1/ε)).
In fact, it can be shown that a random sample of vertices of size O((d/ε) log(1/ε)) is an ε-net for H with a positive constant probability (Haussler and Welzl, 1987).
Many hypergraphs studied in computational geometry and learning theory have a "small" VC-dimension, where by "small" we mean a constant independent of the number of vertices of the underlying hypergraph. In general, range spaces involving semi-algebraic sets of constant description complexity, i.e., sets defined as a Boolean combination of a constant number of polynomial equations and inequalities of constant maximum degree, have finite VC-dimension. Halfspaces, balls, boxes, etc. are examples of ranges of this kind; see, e.g., Matoušek (2002); Pach and Agarwal (1995) for more details.
Thus, by Theorem 1, these hypergraphs admit "small" size ε-nets. Komlós et al. (1992) proved that the bound O((d/ε) log(1/ε)) on the size of an ε-net for hypergraphs with VC-dimension d is best possible. Namely, for a constant d, they construct a hypergraph H with VC-dimension d such that any ε-net for H must have size at least Ω((1/ε) log(1/ε)). Recently, several breakthrough results provided better (lower and upper) bounds on the size of ε-nets in several special cases (Alon, 2012; Aronov et al., 2010; Pach and Tardos, 2011).
In summary, the VC-dimension is a central notion in many areas. It has proved to be a useful concept with many applications. To the best of our knowledge, the VC-dimension has not been studied for the geometric hypergraphs introduced in this paper.
Results. We look at a selection of natural geometric hypergraphs that arise in our setting. Our main contribution is to determine their VC-dimension precisely, in all cases that we consider.
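Theorem 2. For convex sets in the plane, possibly intersecting, the VC-dimension is unbounded.
Theorem 3. For pairwise disjoint convex sets in R^d, d ≥ 3, the VC-dimension is unbounded.
Theorem 4. For pairwise disjoint convex sets in R^2, the VC-dimension is at most 3 and this is tight.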
Theorem 5. For segments in the plane, possibly intersecting, the VC-dimension is at most 5 and this is tight.
In order to show the relevance of our findings to the field of algorithms, we give two simple exemplary applications that follow easily together with previous work.
Algorithmic Applications. For our first expository application, we consider a natural hitting set problem.
Definition 3 (Hitting Halfplanes with segments). Given a set H of halfplanes and a set S of segments, the halfspace-segment-hitting set problem asks for a minimum set T ⊂ S such that every halfplane h ∈ H contains at least one segment t ∈ T entirely. This is an optimization problem, where we try to minimize the size of T.
Using the framework of Brönnimann and Goodrich (1995), we get the following corollary.
Corollary 1. There is an O(log c)-approximation algorithm for the halfspace-segment-hitting set problem, where c is the size of the optimal solution.
Proof: We use Theorem 5 for segments in the plane and the framework from Brönnimann and Goodrich (1995).
As a second expository application, we introduce the problem of approximate range counting. Given a family of sets O and a halfplane h, we denote by n(h, O) the number of sets of O fully contained in h, and by n(h, O)/|O| the relative number of such sets. We want to construct a data structure that reports a number t such that |t − n(h, O)/|O|| ≤ ε for some given ε. Thus, here we allow an absolute error rather than a relative error. A simple way to construct such a data structure is to sample a family of sets P ⊆ O and query how many objects of P are fully contained inside h. If this set P is small, we can answer queries fast.
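The sampling idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the data structure behind the result cited from Li et al. (2001): objects are taken to be segments given by their two endpoints, a halfplane is encoded as a triple (a, b, c) meaning a·x + b·y ≤ c, the sample is drawn uniformly at random, and all function names are hypothetical. The sketch does not determine how large the sample must be to guarantee a given error ε.

```python
import random

def contains(halfplane, segment):
    """A closed halfplane a*x + b*y <= c contains a segment iff it contains both endpoints."""
    a, b, c = halfplane
    return all(a * x + b * y <= c for (x, y) in segment)

def relative_count(objects, halfplane):
    """Fraction of the objects fully contained in the halfplane, i.e. n(h, O) / |O|."""
    return sum(contains(halfplane, s) for s in objects) / len(objects)

def build_sample(objects, size, seed=0):
    """Draw a uniform random subfamily P of the given size (seed fixed for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(objects, size)

# Illustrative input: ten disjoint vertical unit segments at x = 0, 1, ..., 9.
objects = [((i, 0), (i, 1)) for i in range(10)]
h = (1, 0, 4.5)  # the halfplane x <= 4.5

exact = relative_count(objects, h)                      # uses all of O
estimate = relative_count(build_sample(objects, 4), h)  # uses only the sample P
```

A query on the sample inspects only |P| objects, which is the source of the speed-up; the reported value is the relative count within P.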
Corollary 2. Let O be a set of disjoint convex objects in the plane. Then there exists a set
Proof: We use Theorem 4 for disjoint convex sets in the plane and the results from Li et al. (2001).
Note that there are many different notions of approximate range counting and we presented here a simple one. Recall that we only want to highlight the relevance of our findings for algorithmic applications.
Structure. In Section 2, we show Theorems 2 and 3. In Section 3, we handle the case of disjoint sets in the plane, which shows Theorem 4. In Section 4, we show Theorem 5. In Section 5, we consider the minimum number of intersections that shattered families of sets in the plane must have.

Convex sets in the plane and higher dimensions
In this section, we show that when the underlying convex sets may intersect, the VC-dimension can be unbounded.
Proof of Theorem 2: For any n > 0, we provide n convex sets in the plane that can be shattered. We denote [n] = {1, 2, . . . , n}. We place 2^n − 2 points on the unit circle in the plane as follows. For every nontrivial subset I ⊂ [n], I ≠ ∅, I ≠ [n], we place a point p_I on the unit circle. For each j ∈ [n] we define the convex set C_j as the convex hull of all points p_I for which j ∈ I. Namely, C_j = conv({p_I | j ∈ I}). We claim that the family C = {C_1, . . . , C_n} is shattered. To see this, let S ⊆ C. If S is either empty or the whole family C, then it is easy to see that it is realized, as there is a halfplane containing all sets and there is also a halfplane containing none of the sets. So let I be the non-trivial set of indices corresponding to the members of S. Let us denote by J the set [n] \ I. Consider a line ℓ that separates the point p_J from all other points, see Figure 2 for an illustration. We claim that the halfplane h bounded by ℓ and containing those points realizes the subfamily S. Indeed, notice that for each C_i ∈ S all points p_K for which i ∈ K are contained in h, so their convex hull C_i is also contained in h. Note also that for any C_j ∉ S we have that j ∈ J, so C_j contains the point p_J and hence it is not fully contained in h. This shows that S is realized for any S and hence C is shattered.
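The construction in the proof of Theorem 2 can be checked numerically for small n. The sketch below is an illustration under stated assumptions: the points p_I are placed at equally spaced angles (the proof allows any placement on the unit circle), n is at least 2, the helper names are hypothetical, and a set C_j is tested for containment through its generating points, which suffices because a halfplane is convex. Only nontrivial subfamilies are tested; the empty family and the whole family are realized trivially.

```python
import math
from itertools import combinations

def nontrivial_subsets(n):
    """All subsets I of {1, ..., n} with I nonempty and I != {1, ..., n}."""
    items = range(1, n + 1)
    return [frozenset(c) for k in range(1, n) for c in combinations(items, k)]

def build_points(n):
    """Place one point p_I on the unit circle for each nontrivial subset I."""
    subsets = nontrivial_subsets(n)
    m = len(subsets)  # equals 2**n - 2
    return {I: (math.cos(2 * math.pi * k / m), math.sin(2 * math.pi * k / m))
            for k, I in enumerate(subsets)}

def realized_family(points, n, I):
    """Indices j such that C_j lies in the halfplane that cuts off p_J, J = [n] minus I."""
    J = frozenset(range(1, n + 1)) - I
    ux, uy = points[J]  # p_J itself serves as the normal direction of the line
    # Threshold strictly between p_J (dot product 1) and every other point.
    others = max(px * ux + py * uy for K, (px, py) in points.items() if K != J)
    c = (1.0 + others) / 2.0
    inside = {K for K, (px, py) in points.items() if px * ux + py * uy <= c}
    # C_j = conv{p_K : j in K} lies in the halfplane iff all its generators do.
    return frozenset(j for j in range(1, n + 1)
                     if all(K in inside for K in points if j in K))

def construction_is_shattered(n):
    """Check that every nontrivial index set I is realized exactly."""
    points = build_points(n)
    return all(realized_family(points, n, I) == I for I in nontrivial_subsets(n))
```

For instance, construction_is_shattered(4) checks all 14 nontrivial subfamilies of four sets.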
Proof of Theorem 3: For any n > 0, we provide n disjoint convex sets in R^d that can be shattered. Let C_1, . . . , C_n be the convex sets that we construct in the proof of Theorem 2. Map each point (x, y) of C_i to the point (x, y, i), which takes C_i from R^2 to R^3. The images lie in the distinct horizontal planes z = i, so they are pairwise disjoint, and we can still shatter these sets as before, by considering vertical halfspaces. See Figure 3 for an illustration. The case d > 3 follows in the same way.

Disjoint convex sets in the plane
From the previous section, we know that the VC-dimension is unbounded in the plane when shapes can be intersecting. Here we study the case where all the shapes are disjoint.
Lemma 1. Let C be a family of pairwise disjoint convex sets in the plane R^2. Then, the hypergraph induced by C has VC-dimension at most 3.
Note that it is easy to find three disjoint convex sets that can be shattered, see Figure 5. As mentioned earlier, a hypergraph induced by a family of points in the plane R^2 has VC-dimension at most 3 too. The usual proof uses Radon's theorem: any four points can be divided into two subsets A and B such that the convex hulls of A and B intersect, and then no halfplane can realize A or B.
This argument does not extend directly to convex sets. As an illustration, in Figure 4, any halfplane that contains the points p_2 and p_4 must also contain p_1 or p_3. However, the halfplane denoted by h in the figure includes the sets 2 and 4, but does not include the sets 1 and 3. Therefore h realizes {2, 4}. To show that no family of four pairwise disjoint convex sets is shattered by halfplanes, we need further arguments. We first prove the following useful lemma.
Lemma 2 (Convex Hull). Let C be a family of sets in the plane. If C is shattered, then each set in C contains a point on the boundary of the convex hull of C.
Note that in Lemma 2 the sets need not be convex.
Proof: Let us assume to the contrary that there exists a set X ∈ C contained in the convex hull of C′, where C′ is a proper subfamily of C that does not contain X. Then any halfplane containing all elements of C′ must also contain X. Therefore, it is not possible to realize C′, which implies that C is not shattered.
Proof of Lemma 1: Let us assume by contradiction that there exists a shattered family C = {1, 2, 3, 4} of four disjoint convex sets. For each convex set i in C, we denote by p_i a point in i that lies on the boundary of the convex hull of C. The existence of this point is assured by Lemma 2. Without loss of generality, let us assume that p_2 and p_4 have the same y-coordinate, with p_2 to the left of p_4, p_1 above them and p_3 below them. By assumption, there exists a halfplane h containing 2 and 4 but neither 1 nor 3. In particular, h contains p_2 and p_4. However, as p_1 and p_3 are on the boundary of the convex hull of C, h must contain at least one of them, say p_1. We denote by ℓ the bounding line of h, which is therefore below the segment between the points p_2 and p_4. As the set {2, 4} is realized by h, 1 must contain a point q below ℓ. We denote by s the segment between the points p_1 and q. Likewise, we denote by t the segment between the points q and p_3. Finally, we denote by r the union of s and t. Let us consider the boundary of the convex hull of C. The points p_1 and p_3 split it into two curves, one that contains p_2 and the other that contains p_4. Let us consider the union of the curve that contains p_2 with the curve r. We have obtained a simple closed curve, which by the Jordan curve theorem splits the plane into two parts. In particular, it splits the convex hull of C into two parts. We say that the part which contains p_2 is to the left of r, and the other part is to the right of r. As 1 is convex and the endpoints of s are contained in 1, s is fully contained inside 1. By assumption, the sets i are pairwise disjoint. Thus 2 does not intersect s, and therefore all points in 2 are to the left of r, as t lies below ℓ and 2 is above ℓ. By the same argument, all points in 4 are to the right of r. Note that any halfplane realizing {1, 3} contains p_1, q and p_3. By convexity, it contains the triangle with vertices p_1, q and p_3, and in particular it contains r. Thus, it would also contain 2 or 4, which is a contradiction.

Segments
A line segment in the plane can be viewed as the simplest convex set that is not a point. We now turn to study the special case of the VC-dimension of hypergraphs induced by line segments.
Lemma 3. Let S be a set of (not necessarily disjoint) line segments in R^2. Then the hypergraph induced by S has VC-dimension at most 5.
Before proceeding with the proof we need the following lemma. We say that a set of segments is in general position if no three endpoints are collinear. We give an upper bound on the number of subsets that can be realized, by relating this number to the number of tangents to pairs of segments. We do not know of any previous result that uses the same argument.
Lemma 4. Let S be a set of n segments in the plane, in general position. Then the number of subsets of S that are realized is at most 2n (n − 1) + 2.
Proof: Let h be a halfplane realizing a subset S′ ⊆ S, with S′ ≠ ∅ and S′ ≠ S. See Figure 6 for an illustration. In the first step, we identify a unique tangent line ℓ, by some transformation argument. In the second step, we show that every pair of segments has at most four tangent lines. Thus, together with the trivial subsets of S, we can realize at most 4 · \binom{n}{2} + 2 = 2n(n − 1) + 2 subsets S′. We denote by ℓ the bounding line of h. We orient ℓ from tail to head such that S′ lies to the left of ℓ. If there are several points on ℓ, then we can clearly say which is closest to its head in the obvious way. Translate h inward until its boundary line ℓ hits one element of S′. This must happen as S′ ≠ ∅. As the set S is in general position, ℓ touches S′ in at most two endpoints p, q. Suppose that we indeed touch two points; the other case is handled similarly. Furthermore, we say that q is the point closer to the head of ℓ. Then we rotate ℓ counterclockwise around q until one of two events happens.
(a) The line ℓ touches another vertex of some segment s ∈ S′ at its head.
(b) The line ℓ touches an endpoint of some segment s ∈ (S \ S′) at its tail.
Note that it could also be that ℓ touches a vertex of some segment s ∈ (S \ S′) at its head. We ignore that case, as this event does not change whether h realizes S′ or not. It is easy to see that it is impossible that ℓ touches another vertex of some segment s ∈ S′ at its tail. In case (a), we touch a new point q′ and we proceed as before. In other words, we rotate counterclockwise around q′ until either (a) or (b) happens. In case (b), we stop. Note that since S′ ≠ S, this will eventually happen. We end up in a configuration where ℓ touches a vertex of a segment s ∈ S′ at its head and a vertex of another segment t ∈ (S \ S′) at its tail. Note that both segments are to the left of ℓ, with respect to the orientation of ℓ. Note that the halfspace h defined by ℓ only needs an infinitesimally small rotation to realize the original set S′ that we started with. Thus, if there is any halfspace realizing S′, there must be one of the special type that we just described. This shows the first step. In the second step, we will upper bound the number of those special configurations.
For the second step, consider two segments a, b ∈ S. See Figure 7 for an illustration. Note first that they are either crossing or they are disjoint. One of them must be contained in the set S′ that we want to realize and the other is not. This also immediately tells us the orientation of the line ℓ in the configuration. It is easy to check that all four configurations are displayed in Figure 7.
Proof of Lemma 3: Let S be a set of n line segments that can be shattered. We can assume that S is in general position, by some standard perturbation arguments. We use the fact that the number of distinct subsets of S that are realized is at most 2n(n − 1) + 2, see Lemma 4. As all 2^n subsets need to be realized for S to be shattered, it follows that 2^n ≤ 2n(n − 1) + 2. However, this inequality is violated for n ≥ 6, so n ≤ 5.
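The counting inequality in the proof of Lemma 3 is easy to verify by direct computation. The sketch below is illustrative only: the function names and the checked range are arbitrary choices, and the code covers a finite range, while for all larger n the exponential 2^n continues to exceed the quadratic bound.

```python
def max_realized_subsets(n: int) -> int:
    """Upper bound from Lemma 4 on the number of realized subsets of n segments."""
    return 2 * n * (n - 1) + 2

def can_satisfy_counting_bound(n: int) -> bool:
    """Necessary condition for n segments to be shattered: 2^n <= 2n(n - 1) + 2."""
    return 2 ** n <= max_realized_subsets(n)

# Values of n in the checked range for which the necessary condition holds.
feasible = [n for n in range(1, 31) if can_satisfy_counting_bound(n)]
```

At n = 5 the condition reads 32 ≤ 42 and holds, while at n = 6 it reads 64 ≤ 62 and fails.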
The next lemma shows the second part of Theorem 5.
Lemma 5. There exists a set of five segments that are shattered by halfplanes.

Proof: The set is shown in Figure 8. The segments that are realized are shown in turquoise, and the others in orange. It is easy to find a halfplane realizing none or all of the segments. We show in Figure 8 how to realize all the remaining configurations.

Number of intersections
By Lemma 1, the members of a shattered family of n convex sets cannot be pairwise disjoint when n ≥ 4. We show that there are quadratically many pairs of intersecting convex sets.
Lemma 6. In a shattered set of n convex sets there are at least n(n − 3)/6 intersections.
Proof: Consider the intersection graph G of the convex sets. Every subfamily of a shattered family is shattered, so by Lemma 1 any four vertices of G span at least one edge. Hence the independence number of G is at most 3. (The independence number of a graph denotes the size of the largest independent set of the graph.) Therefore there is no K_4 in the complement of G.
Turán's theorem states that any graph with n vertices not containing K_{k+1} has at most (1 − 1/k) · n^2/2 edges (Turán, 1941). Therefore, there are at most (1 − 1/3) · n^2/2 = n^2/3 non-edges in G. This is equivalent to having at least \binom{n}{2} − n^2/3 = n(n − 3)/6 edges in G.
It would be interesting to find an upper bound on the minimum number of intersections in a shattered set of n convex sets. The question can also be asked for n ≤ 5 when considering the more specific case of segments. We have given in Lemma 5 a shattered set of five segments with five intersections. We now produce an example of a shattered set of four segments having only one intersection.
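The arithmetic behind the bound of Lemma 6 can be confirmed with exact rational numbers. This is a small sanity check, not part of the proof; the function names are hypothetical, and the Turán bound is used exactly in the form stated above.

```python
from fractions import Fraction
from math import comb

def turan_max_edges(n: int, k: int) -> Fraction:
    """Bound (1 - 1/k) * n^2 / 2 on the edges of a K_{k+1}-free graph on n vertices."""
    return (1 - Fraction(1, k)) * Fraction(n * n, 2)

def min_intersections(n: int) -> Fraction:
    """All pairs minus the largest possible number of non-edges (K_4-free complement)."""
    return comb(n, 2) - turan_max_edges(n, 3)

# The identity binom(n, 2) - n^2/3 = n(n - 3)/6, checked over an arbitrary range.
identity_holds = all(min_intersections(n) == Fraction(n * (n - 3), 6)
                     for n in range(1, 100))
```

For n = 6, for example, there are 15 pairs and at most 12 non-edges, which leaves at least 3 intersecting pairs.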
Lemma 7. There exists a shattered set of four segments with only one intersection.
Proof: We consider the four segments as in Figure 9, denoted by {1, 2, 3, 4}. It is easy to realize none or all segments. To realize three of them, consider a halfplane whose bounding line intersects the fourth segment. Likewise to realize two consecutive segments, consider a halfplane whose bounding line intersects the two remaining segments. For opposite segments, say {1, 3}, take a halfplane not containing 4 whose bounding line intersects 2. To realize {2, 4}, take a halfplane not containing 1 whose bounding line is parallel to 4 and intersects 3. Finally the reader can check that for each set with exactly one segment i, it is possible to find a halfplane containing only i.

Open questions
As mentioned in Section 5, it would be interesting to find tighter lower bounds on the number of intersections in a shattered set of n convex sets in the plane. Likewise, we can ask the same question when the convex sets are constrained to be segments. By Lemma 1, we know that in any shattered set of at least four convex sets, there are two convex sets that intersect. We have shown in Lemma 7 that it is possible to find a shattered set of four convex sets with only one intersection. Moreover, this holds under the additional constraint that the convex sets be segments. Therefore, we ask whether this holds for any n: Is the lower bound on the number of intersections the same whether we consider segments or general convex sets? If not, what is the lower bound when considering polygons with k vertices? By Theorem 2, the VC-dimension of convex sets in the plane is unbounded. However, when restricting to segments, we have shown in Theorem 5 that the VC-dimension is at most 5, and this is tight. The problem of finding upper bounds on the VC-dimension naturally generalizes to other types of constrained convex sets, for instance polygons with k vertices.