VC-dimensions of random function classes

For any class of binary functions on $[n] = \{1, \ldots, n\}$, a classical result by Sauer states a sufficient condition for its VC-dimension to be at least $d$: its cardinality should be at least $O(n^{d-1})$. A necessary condition is that its cardinality be at least $2^d$ (which is $O(1)$ with respect to $n$). How does the size of a 'typical' class of VC-dimension $d$ compare to these two extreme thresholds? To answer this, we consider classes generated randomly by two methods: repeated biased coin flips on the $n$-dimensional hypercube, or uniform sampling over the space of all possible classes of cardinality $k$ on $[n]$. As it turns out, the typical behavior of such classes is much closer to the necessary condition: the cardinality $k$ need only reach the threshold $2^d$ for the VC-dimension to be at least $d$ with high probability. If the expected size is greater than a threshold of $O(\log n)$ (which is still significantly smaller than the sufficient size of $O(n^{d-1})$), then the class shatters every set of size $d$ with high probability. The behavior in the neighborhood of these thresholds is described by the asymptotic probability distribution of the VC-dimension and of the largest $d$ such that all sets of size $d$ are shattered.


Introduction
Let $n$ and $d$ be two integers such that $1 \le d \le n$ and denote by $[n]$ the set $\{1, \ldots, n\}$. A class of binary functions is a subset of $\{0,1\}^{[n]}$. As only binary functions are considered, we refer to them simply as functions. They will also be viewed as binary vectors: $f = (f(1), \ldots, f(n))$. Let $x = \{x_1, \ldots, x_d\}$ be a subset of $[n]$ and $F \subset \{0,1\}^{[n]}$ be a class of functions. For any function $f \in F$, denote by $f_{|x}$ its restriction to $x$, i.e., $f_{|x} = (f(x_1), \ldots, f(x_d))$. The class $F$ is said to shatter $x$ if
$$\left| \{ f_{|x} : f \in F \} \right| = 2^d ,$$
where $|\cdot|$ denotes the cardinality of a finite set. The Vapnik-Chervonenkis dimension of $F$, denoted as $\mathrm{VC}(F)$, is defined as the size of the largest set $x$ shattered by $F$. The following Sauer-Shelah lemma (Sauer (1972); Shelah (1972); Vapnik and Chervonenkis (1971)) is a fundamental result relating the VC-dimension of a class of functions to its cardinality.

Lemma 1 Let $F$ be a class of functions on $[n]$ with
$$|F| > \sum_{i=0}^{d-1} \binom{n}{i} . \tag{1}$$
Then $F$ shatters at least one set $x \subset [n]$ of cardinality $|x| = d$.
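The definitions of shattering and of the VC-dimension can be made concrete with a small brute-force sketch. It is only an illustration, not part of the paper's analysis: functions are represented as binary tuples, positions are 0-based (element $i$ of $[n]$ is position $i-1$), and the names `shatters` and `vc_dimension` are hypothetical helpers introduced here.

```python
from itertools import combinations


def shatters(F, x):
    """True if the class F (iterable of binary tuples) shatters the positions in x."""
    restrictions = {tuple(f[i] for i in x) for f in F}
    return len(restrictions) == 2 ** len(x)


def vc_dimension(F, n):
    """Size of the largest set of positions 0..n-1 shattered by F (brute force).

    Returns 0 when no set of size 1 is shattered.
    """
    F = list(F)
    best = 0
    for d in range(1, n + 1):
        if len(F) < 2 ** d:
            break  # fewer than 2^d functions cannot shatter a set of size d
        if any(shatters(F, x) for x in combinations(range(n), d)):
            best = d
        else:
            break  # subsets of shattered sets are shattered, so larger d also fails
    return best


# Four functions on [3]; positions (0, 1) see all four patterns.
F = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
print(shatters(F, (0, 1)), vc_dimension(F, 3))
```

The search is exponential in $n$ and is meant only for very small examples.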
An interesting extension (see Theorem 1 in Frankl (1983)) states that such a threshold (which holds for any class $F$) arises due to the simple fact that any ideal class $F_0$ of this size must shatter some set of size $d$. More generally, the lemma holds for classes on infinite domains $X$, where instead of $|F|$ one has $\max_{Y \subset X} |F_{|Y}|$, with $Y$ running over all finite subsets such that $|Y| \ge d$. Aside from being an interesting combinatorial result in set theory (Chapter 17 in Bollobás (1986)), Lemma 1 has been extended in various directions, notably by Frankl (1983); Haussler and Long (1995); Alon et al. (1997); Anstee et al. (2005), and has found applications in numerous fields such as combinatorial geometry (Pach and Agarwal (1995); Matousek (1998)), graph theory (Haussler and Welzl (1987); Anthony et al. (1995a)), empirical processes (Pollard (1984)) and statistical learning theory (Haussler (1992); Vapnik (1998)).
The VC-dimension has numerous extensions, for instance the pseudo-dimension for real-valued function classes, the scale-sensitive (or fat-shattering) dimension which characterizes the so-called Glivenko-Cantelli classes (Alon et al. (1997)), and the testing dimension (Romanik and Smith (1994); Anthony et al. (1995b)) of $F$, denoted as $\mathrm{TD}(F)$, which is defined as the maximal integer $d$ such that all sets of size $d$ are shattered by $F$. For other related dimensions see Haussler and Long (1995) and Anthony and Bartlett (1999).
Observe that as $n$ tends to infinity and $d$ remains fixed, the right hand side of (1) is of order $O(n^{d-1})$. Thus $O(n^{d-1})$ is a threshold point that dictates a sufficient cardinality for $F$ to shatter at least one set of size $d$. As we shall show, it is not a necessary condition, since it typically takes a class only of size $O(\log n)$ to shatter all sets of size $d$. In order to show this, our primary aim in this paper is to investigate the size of sets that are shattered by a random class $\mathcal{F}$ of functions (a random element of the power set $\mathcal{P}(\{0,1\}^{[n]})$).
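To give a feel for the gap between these thresholds, the following sketch tabulates the Sauer-Shelah bound against the necessary size $2^d$ and against $\log n$. It assumes the standard form of the bound, $\sum_{i=0}^{d-1}\binom{n}{i}$, and the helper name `sauer_bound` is introduced here purely for illustration.

```python
from math import comb, log


def sauer_bound(n, d):
    """Sum of binomial coefficients C(n, i) for i < d (Sauer-Shelah bound)."""
    return sum(comb(n, i) for i in range(d))


d = 3
for n in (10, 100, 1000):
    # sufficient size (any class), necessary size, and the log n scale
    print(n, sauer_bound(n, d), 2 ** d, round(log(n), 2))
```

For fixed $d$ the first column grows polynomially in $n$, the second is constant, and the third grows only logarithmically.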
We consider two natural approaches: fixing the size to be $k$ and drawing a class $\mathcal{F}$ with equal probability from all classes of size $k$ (uniform model, Definition 2), or including each individual function of $\{0,1\}^{[n]}$ with the same probability $p$ in $2^n$ independent random trials (binomial model, Definition 1). We state several results on the asymptotic behavior of the size of shattered sets, with an explicit dependence on $k$ or $p$, as $n$ tends to infinity.
As a preview of our results, let us sketch the evolution of the size of sets shattered by a random class $\mathcal{F}$ under the uniform model with increasing $k = k_n$. Initially, when $k$ is fixed, sets of size $d$ can be shattered only if $k \ge 2^d$. It turns out that at least one such set is then shattered with high probability (w.h.p., i.e. with probability tending to 1 as $n$ tends to infinity). As soon as $k_n$ starts to increase to infinity there are shattered sets of any size and, moreover, any fixed set of size $d$ (in particular the set $[d]$) is shattered w.h.p. If the speed at which $k_n$ grows is sufficiently slow, $k_n \ll \log n$, then regardless of the value of $d$, there exists at least one set of size $d$ which is not shattered (here $a_n \ll b_n$ denotes $\lim a_n / b_n = 0$). For $k_n = \alpha \log n + O(1)$, there exists a finite $d$ such that all sets of size $d$ are shattered and at least one set of size $d + 1$ is not shattered. Finally, for $k_n \gg \log n$ and any fixed $d$, all sets of size $d$ are shattered. A similar behavior holds for the binomial model, where $k_n$ is now replaced by $2^n p_n$, i.e., the expected size of the random class $\mathcal{F}$.
The results are stated in detail in Sections 2-5, where three kinds of events are studied: (i) shattering at least one set of size $d$, (ii) shattering a given set $x$ of size $d$, (iii) shattering all sets of size $d$. Clearly, the VC-dimension and the testing-dimension of a random class $\mathcal{F}$ are related to (i) and (iii) respectively. In order to have a more complete comparison we introduce between them an intermediate dimension which is related to (ii) and is defined as follows.
Let $F$ be a class of functions from $[n]$ to $\{0,1\}$. The initial-dimension of $F$, denoted as $\mathrm{ID}(F)$, is the maximal integer $d$ such that the set $[d]$ is shattered by $F$.
Clearly, the three dimensions are related by
$$\mathrm{TD}(F) \le \mathrm{ID}(F) \le \mathrm{VC}(F) .$$
In subsequent sections we obtain the asymptotic distribution of $\mathrm{VC}(\mathcal{F})$ and $\mathrm{ID}(\mathcal{F})$ under the binomial and uniform models where the (expected) number of functions in the class is fixed (Propositions 1 and 2). The asymptotics of $\mathrm{TD}(\mathcal{F})$ turn out to be much sharper: for $k_n = c \log n$ its distribution is concentrated on one or two values w.h.p. (Proposition 3). This has striking similarities with the well known result on the concentration of the clique number for random graphs (see Chap. 11 of Bollobás (2001)).
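The ordering of the three dimensions can be checked on small examples with the following brute-force sketch. It is illustrative only: positions are 0-based, so the set $[d]$ corresponds to positions $0, \ldots, d-1$, and the helper names `shatters` and `dimensions` are assumptions of this example rather than notation from the paper.

```python
from itertools import combinations, product


def shatters(F, x):
    """True if the class F (list of binary tuples) shatters the positions in x."""
    return len({tuple(f[i] for i in x) for f in F}) == 2 ** len(x)


def dimensions(F, n):
    """Return (TD, ID, VC) of a class F of binary tuples of length n, by brute force."""
    td = idim = vc = 0
    for d in range(1, n + 1):
        flags = [shatters(F, x) for x in combinations(range(n), d)]
        if all(flags):
            td = d        # every set of size d is shattered
        if shatters(F, range(d)):
            idim = d      # the initial segment [d] is shattered
        if any(flags):
            vc = d        # at least one set of size d is shattered
    return td, idim, vc


F = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
print(dimensions(F, 3))                                  # testing, initial, VC
print(dimensions(list(product((0, 1), repeat=3)), 3))    # the full cube
```

Each of the three quantities is monotone in $d$ (a subset of a shattered set is shattered), which is why a single pass over $d$ suffices.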
In our analysis we use standard techniques from discrete probability and the theory of random graphs (available in Janson et al. (2000)). In Ycart and Ratsaby (2007), we applied these to study the VC-dimension of a random class of functions with a fixed number of ones, i.e. random hypergraphs.
The remainder of the paper is organized as follows: Section 2 defines the shattering events and the two probability models, and describes their interdependence. Sections 3, 4 and 5 study the asymptotics of the VC-dimension, the initial-dimension and the testing-dimension respectively.

Probability models and events
We start by defining the events under study. They are associated with the shattering of sets by a random class of functions, i.e. a random variable with values in the power set $\mathcal{P}(\{0,1\}^{[n]})$. For clarity, we defer the precise definition of probability distributions for random classes of functions to Section 2.2.
The events that we are interested in are the subsets of $\mathcal{P}(\{0,1\}^{[n]})$ defined as follows, for $x \subset [n]$ with $|x| = d$ and $\eta \in \{0,1\}^x$:
$$C_{x,\eta} = \{F : \exists f \in F,\ f_{|x} = \eta\}, \qquad S_x = \{F : F \text{ shatters } x\},$$
$$E_d = \{F : \mathrm{VC}(F) \ge d\}, \qquad S_d = \{F : \mathrm{ID}(F) \ge d\}, \qquad A_d = \{F : \mathrm{TD}(F) \ge d\}.$$
The event $S_{[d]}$ is the set of classes that shatter $[d]$. It is thus equal to the event $S_d$. The main goal of the paper is to evaluate the probabilities of $E_d$, $S_d$, and $A_d$. The events $E_d$ and $A_d$ can be expressed in terms of the $S_x$'s, for $x \subset [n]$ and $|x| = d$. These may in turn be expressed in terms of the $C_{x,\eta}$'s, for $\eta \in \{0,1\}^x$. By definition, the class $F$ shatters $x$ if and only if for every $\eta \in \{0,1\}^x$ there exists a function $f \in F$ whose restriction to $x$ is $\eta$. Thus $S_x$ is the intersection of the events $C_{x,\eta}$, $\eta \in \{0,1\}^x$, i.e.,
$$S_x = \bigcap_{\eta \in \{0,1\}^x} C_{x,\eta} . \tag{3}$$

Probability models
In this subsection we describe the underlying probability models with which a random class is generated. In addition to mathematical definitions, we also discuss how such generation may be carried out in practice. When stating probabilities we will either refer to the probability law of a random variable, e.g., $P_{n,p}$, $P^*_{n,k}$, or simply write $P$ for the underlying probability distribution of the space on which the random variables are defined.
The two probability models by which a random class $\mathcal{F}$ of binary functions on $[n]$ will be created are called 'binomial' and 'uniform'. In the binomial model, a probability parameter $0 \le p \le 1$ is fixed and the random class $\mathcal{F}$ is constructed through $2^n$ independent coin tosses, one for each function in $\{0,1\}^{[n]}$, with a probability of success (i.e. of selecting the function into $\mathcal{F}$) equal to $p$.
Definition 1 (Binomial model) Let $n$ be a positive integer. Let $p$ be a real such that $0 \le p \le 1$. We call binomial class with parameters $n$ and $p$ a random class of functions containing any given function with probability $p$, independently of the others. We shall denote by $\mathcal{F}_{n,p}$ a binomial class with parameters $n$ and $p$ and by $P_{n,p}$ its probability distribution. Thus if $F$ is any element of $\mathcal{P}(\{0,1\}^{[n]})$,
$$P_{n,p}(F) = p^{|F|} (1-p)^{2^n - |F|} . \tag{6}$$
An alternate way to construct a random class $\mathcal{F}$ is to first choose its cardinality $k$, $0 \le k \le 2^n$, and then select a class by a uniform random drawing from the family $\mathcal{F}^{(k)} \subset \mathcal{P}(\{0,1\}^{[n]})$ of all subsets of $\{0,1\}^{[n]}$ having $k$ elements.
Definition 2 (Uniform model) Let $n$ be a positive integer. Let $k$ be an integer such that $1 \le k \le 2^n$. We call uniform class with parameters $n$ and $k$ a random class with uniform distribution over all classes of $k$ functions. We shall denote by $\mathcal{F}^*_{n,k}$ a uniform class with parameters $n$ and $k$ and by $P^*_{n,k}$ its probability distribution. Thus if $F$ is any element of $\mathcal{P}(\{0,1\}^{[n]})$,
$$P^*_{n,k}(F) = \binom{2^n}{k}^{-1} \ \text{if } |F| = k, \qquad P^*_{n,k}(F) = 0 \ \text{otherwise.}$$
These models are the most natural ways to define probability distributions on the set $\mathcal{P}(\{0,1\}^{[n]})$ of all classes of functions on $[n]$. They match the two basic models of random graphs (see Chap. 1 of Janson et al. (2000)).
The denomination 'binomial model' comes from the fact that the total number of functions in a binomial class follows a binomial distribution, as does the total number of edges in a binomial graph. More precisely, if we denote by $K$ the total number of functions in a binomial class, it is immediate from (6) that
$$P(K = k) = \binom{2^n}{k} p^k (1-p)^{2^n - k}, \qquad k = 0, \ldots, 2^n . \tag{8}$$
The two models are obviously related. Indeed, the conditional distribution of a binomial class, conditioned on having cardinality $k$, is that of a uniform class, i.e., for any $B \subseteq \mathcal{P}(\{0,1\}^{[n]})$,
$$P_{n,p}(B \mid K = k) = P^*_{n,k}(B) .$$
Conversely, knowing the values of $P^*_{n,k}(B)$ for all $k$, one can compute $P_{n,p}(B)$ using the formula of total probabilities:
$$P_{n,p}(B) = \sum_{k=0}^{2^n} P^*_{n,k}(B) \binom{2^n}{k} p^k (1-p)^{2^n - k} . \tag{10}$$
In practice, one can construct a binomial class by first selecting its cardinality $K$ according to the binomial distribution (8) and then choosing a uniform class of this size. The expected number of functions in a binomial class is clearly $E(K) = p 2^n$. When $n$ is large, we expect both models to have the same behavior provided that $k \sim p 2^n$. This intuition is partially justified by the theoretical results presented in Section 2.1 of Bollobás (2001) or Section 1.4 of Janson et al. (2000). Some minor discrepancies between both models will be pointed out in Sections 3 and 5.

One practical approach to construct a uniform class of cardinality $k$ starts from a random $n \times k$ binary matrix whose $nk$ entries take values 0 or 1 independently with probability $1/2$. Denoting by $Q^*_{n,k}$ the corresponding probability measure, then for any $M \in \mathcal{M}_{n \times k}(\{0,1\})$,
$$Q^*_{n,k}(M) = 2^{-nk} .$$
Matrices whose columns are pairwise distinct are called simple (see Anstee et al. (2005)), and we shall denote by $S$ their set. We claim that the conditional distribution of the set of columns of a random binary matrix, knowing that it belongs to $S$, is the uniform distribution $P^*_{n,k}$. To see this, observe that the probability for a random binary matrix to be simple is
$$Q^*_{n,k}(S) = \prod_{i=0}^{k-1} \left(1 - i\, 2^{-n}\right) . \tag{12}$$
For any fixed class $F$ of $k$ binary functions there are $k!$ corresponding simple matrices in $\mathcal{M}_{n \times k}(\{0,1\})$. Therefore the conditional probability for the set of columns of a random matrix to be $F$ is
$$\frac{k!\, 2^{-nk}}{Q^*_{n,k}(S)} = \binom{2^n}{k}^{-1} .$$
Thus the process of independently drawing random binary matrices until a simple one is obtained yields a uniform class of functions. In practice, if $k$ is reasonably small compared to $2^n$, then the probability of the conditioning event $S$ is close to 1, so typically a uniform class $\mathcal{F}$ of cardinality $k$ is obtained after one or two random matrix generations.
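The rejection procedure just described can be sketched in a few lines. This is an illustrative sketch rather than the authors' code: columns are represented as tuples of length $n$, the names `sample_uniform_class` and `prob_simple` are hypothetical, and `prob_simple` assumes that the probability of distinct columns is the product $\prod_{i<k}(1 - i\,2^{-n})$.

```python
import random


def prob_simple(n, k):
    """Probability that k independent uniform columns of length n are pairwise distinct."""
    prob = 1.0
    for i in range(k):
        prob *= 1.0 - i / 2 ** n
    return prob


def sample_uniform_class(n, k, rng=None):
    """Draw random n x k binary matrices until the k columns are distinct.

    Returns the set of columns, i.e. a class of k functions on [n].
    The loop is slow when k is close to 2^n, where prob_simple(n, k) is small.
    """
    if not 0 <= k <= 2 ** n:
        raise ValueError("k must satisfy 0 <= k <= 2^n")
    if rng is None:
        rng = random.Random()
    while True:
        columns = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(k)]
        if len(set(columns)) == k:  # the matrix is simple
            return frozenset(columns)


rng = random.Random(0)
F = sample_uniform_class(10, 8, rng)
print(len(F), prob_simple(10, 8))
```

When $k$ is small relative to $2^{n/2}$ the acceptance probability is close to 1, so the loop rarely repeats.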
In Sections 3 and 5, we use this alternate representation of $P^*_{n,k}$ in order to compute the asymptotic probability of several types of shattering events (described in Section 2.1) as $n$ tends to infinity, and for values of $k_n$ such that $Q^*_{n,k_n}(S)$ tends to 1. From (12), it is easy to deduce that this is true provided that
$$\lim_{n \to \infty} k_n^2\, 2^{-n} = 0 .$$
The next auxiliary lemma is a technical result which will permit the interchange of $Q^*_{n,k_n}$ and $P^*_{n,k_n}$ and hence simplify some of the analysis in subsequent sections.
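Proof: As seen above, $P^*_{n,k_n}(B) = Q^*_{n,k_n}(B \mid S)$. We have
$$Q^*_{n,k_n}(B) = Q^*_{n,k_n}(B \mid S)\, Q^*_{n,k_n}(S) + Q^*_{n,k_n}(B \mid \bar{S})\, Q^*_{n,k_n}(\bar{S}) .$$
Since $Q^*_{n,k_n}(S)$ tends to 1, the limits of $Q^*_{n,k_n}(B)$ and $Q^*_{n,k_n}(B \mid S)$ are the same. ✷

In the next sections we present, for each of the events $E_d$, $S_d$ and $A_d$, two types of results. The first type describes the values of the parameter ($k_n$ or $p_n$) for which the probability of the event tends to 0 or 1 (these are presented as Lemmas, Corollaries and Remarks). The second type gives the behavior of the probability around the critical value of the parameter where the transition occurs (these are presented as Propositions).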

Asymptotics for the VC-dimension
By definition of the VC-dimension, a class of fewer than $2^d$ functions cannot shatter a set of size $d$. As we show in this section, as soon as the number of functions is at least $2^d$, at least one set of size $d$ is shattered by a random class $\mathcal{F}$ w.h.p.
Lemma 3 For any integer $d > 0$ let $k$ be an integer satisfying $k \ge 2^d$. Then
$$\lim_{n \to \infty} P^*_{n,k}(E_d) = 1 .$$
Consider the event defined as the set of those matrices having a submatrix whose rows are indexed by $x_i$ and which is equal to $M_d$. For $i = 0, \ldots, m-1$, these $m$ events are clearly independent and have the same probability $2^{-d 2^d}$. Hence the probability that at least one of them is fulfilled is
$$1 - \left(1 - 2^{-d 2^d}\right)^{m} .$$

We proceed now to obtain the asymptotic distribution of $\mathrm{VC}(\mathcal{F})$ where $\mathcal{F}$ is a random class under the binomial model $P_{n,p_n}$. Since the number of functions in $\{0,1\}^{[n]}$ increases exponentially fast with $n$, in order to keep the expected cardinality of the random class constant we let $p_n$ decrease as $p_n = c 2^{-n}$ for some $c > 0$. From Section 2, the number $K$ of functions in a random class $\mathcal{F}_{n,p_n}$ follows the binomial law (8) with parameters $2^n$ and $p_n$. It therefore converges to the Poisson distribution with parameter $c$, i.e.,
$$\lim_{n \to \infty} P(K = k) = e^{-c} \frac{c^k}{k!} , \qquad k = 0, 1, 2, \ldots$$
Conditioned on having a cardinality of $K = k$, by Remark 2, a random class $\mathcal{F}$ will have $\mathrm{VC}(\mathcal{F}) = \lfloor \log_2 k \rfloor$ w.h.p. Hence the VC-dimension of the binomial class $\mathcal{F}_{n,p_n}$ is asymptotically distributed as $\lfloor \log_2 K \rfloor$, where $K$ follows the Poisson distribution with parameter $c$. This is stated formally in the next result.
Proposition 1 Fix any constant $c > 0$ and $d \ge 1$. Assume that $p_n$ obeys $\lim_{n \to \infty} p_n 2^n = c$. Then
$$\lim_{n \to \infty} P_{n,p_n}(E_d) = \sum_{k \ge 2^d} e^{-c} \frac{c^k}{k!} .$$
Proof: From (10), we have
$$P_{n,p_n}(E_d) = \sum_{k=0}^{2^n} P^*_{n,k}(E_d) \binom{2^n}{k} p_n^k (1-p_n)^{2^n - k} .$$
Clearly, by definition of the VC-dimension, the factor $P^*_{n,k}(E_d)$ vanishes for $k < 2^d$. The result follows since $P^*_{n,2^d}(E_d)$ tends to 1 (by Lemma 3) and the binomial distribution with parameters $2^n$ and $p_n$ converges to the Poisson distribution with parameter $c$. ✷

While Lemma 3 established that a random class of functions of cardinality at least $2^d$ shatters one or more sets of size $d$ w.h.p., one can expect at least $O(n)$ sets of size $d$ to be shattered. We now proceed to show that any particular set of size $d$ has a positive probability of being shattered.

Consider a random class $\mathcal{F}$ of $k$ binary functions on $[n]$. Can a fixed set of size $d$ in $[n]$, for instance $[d]$, be shattered by $\mathcal{F}$? The answer is affirmative, provided $k$ is large enough. As it turns out, the initial-dimension of $\mathcal{F}$ has an asymptotic behavior which is similar to the distribution of its VC-dimension. We start with computing the probability of the event $S_x$ (see (3)) under the binomial model $P_{n,p}$.

Asymptotics for the initial-dimension
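Lemma 4 Let $x \subset [n]$ be a set of cardinality $d$. Then under the binomial model with parameters $n$ and $p \in [0,1]$ we have
$$P_{n,p}(S_x) = \left(1 - (1-p)^{2^{n-d}}\right)^{2^d} . \tag{15}$$
Proof: From (3), $S_x$ is the intersection of the events $C_{x,\eta}$, $\eta \in \{0,1\}^x$. We will prove that these events are independent under $P_{n,p}$ and that each has probability
$$P_{n,p}(C_{x,\eta}) = 1 - (1-p)^{2^{n-d}} . \tag{16}$$
We start with the latter. For an event $B$ let us denote by $\bar{B}$ its complement. We will use calligraphic capital letters to denote classes of functions and regular capital letters to denote events. Let $\mathcal{C}$ be a fixed class and $\mathcal{F}$ be a random class. Consider the event $C$ := '$\mathcal{F}$ contains at least one element of $\mathcal{C}$'. Its complement $\bar{C}$ is: 'no function of $\mathcal{C}$ is contained in $\mathcal{F}$'. By Definition 1, its probability is
$$P_{n,p}(\bar{C}) = (1-p)^{|\mathcal{C}|} . \tag{17}$$
Let $\mathcal{C}_{x,\eta}$ be the class of functions that coincide with $\eta$ on $x$, i.e., $f_{|x} = \eta$ (there are $2^{n-d}$ such functions). Applying (17) to the event $C_{x,\eta}$ yields (16).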
It remains to show that the events $C_{x,\eta}$, $\eta \in \{0,1\}^x$, are independent. It is a basic fact (see for instance Feller (1968), p. 115) that the independence of $(B_i)_{i \in I}$ is equivalent to the independence of their complements $(\bar{B}_i)_{i \in I}$. For $1 \le h \le 2^d$, let $\eta_1, \ldots, \eta_h$ be distinct elements of $\{0,1\}^x$. Consider the class $\mathcal{C} = \bigcup_{i=1,\ldots,h} \mathcal{C}_{x,\eta_i}$, whose cardinality is $|\mathcal{C}| = h 2^{n-d}$. The event '$\mathcal{F} \cap \mathcal{C} = \emptyset$' means that for all functions $f$ in $\mathcal{F}$, $f_{|x} \ne \eta_i$, $1 \le i \le h$, or equivalently, that the event $\bigcap_{i=1,\ldots,h} \bar{C}_{x,\eta_i}$ occurs. Resorting once more to (17) we have
$$P_{n,p}\Big(\bigcap_{i=1}^{h} \bar{C}_{x,\eta_i}\Big) = (1-p)^{h 2^{n-d}} = \prod_{i=1}^{h} P_{n,p}(\bar{C}_{x,\eta_i}) .$$
From this it follows that the events $(\bar{C}_{x,\eta})_{\eta \in \{0,1\}^x}$ are mutually independent, and hence so are the events $(C_{x,\eta})_{\eta \in \{0,1\}^x}$. ✷

From (15), and using the standard expansion of $\log(1-t)$, which holds for $-1 < t < 1$ (see Feller (1968), p. 49), it follows that $P_{n,p_n}(S_x)$ tends to zero if $p_n \ll 2^{-n}$ and to 1 if $p_n \gg 2^{-n}$. As was previously done for the event $E_d$ (Lemma 3 and Remark 1), we now state the critical value of the expected cardinality for the event $S_d$ under the binomial model.
Corollary 1 The probability $P_{n,p_n}(S_d)$ that the initial-dimension is at least $d$ tends to 0 or 1 according to whether the expected cardinality $p_n 2^n$ of the class tends to 0 or $\infty$, respectively. In particular, for any fixed $d > 0$, if the expected cardinality tends to infinity then the initial-dimension is at least $d$ w.h.p.
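The closed form on which this corollary rests, namely that a fixed set $x$ of size $d$ is shattered by a binomial class with probability $(1 - (1-p)^{2^{n-d}})^{2^d}$, can be checked exactly for tiny $n$ by enumerating all $2^{2^n}$ classes. The sketch below does so with exact rational arithmetic; the function names are hypothetical and the enumeration is feasible only for $n \le 4$ or so.

```python
from fractions import Fraction
from itertools import product


def shatters(F, x):
    """True if the class F (list of binary tuples) shatters the positions in x."""
    return len({tuple(f[i] for i in x) for f in F}) == 2 ** len(x)


def exact_shatter_probability(n, x, p):
    """Sum the binomial-model weight p^|F| (1-p)^(2^n-|F|) over all classes F shattering x."""
    cube = list(product((0, 1), repeat=n))
    total = Fraction(0)
    for mask in range(2 ** len(cube)):
        F = [f for j, f in enumerate(cube) if (mask >> j) & 1]
        if shatters(F, x):
            total += p ** len(F) * (1 - p) ** (len(cube) - len(F))
    return total


def closed_form(n, d, p):
    """(1 - (1-p)^(2^(n-d)))^(2^d), the probability that a fixed d-set is shattered."""
    return (1 - (1 - p) ** (2 ** (n - d))) ** (2 ** d)


p = Fraction(1, 3)
print(exact_shatter_probability(3, (0, 2), p), closed_form(3, 2, p))
```

Using `Fraction` avoids rounding, so the two values can be compared for exact equality.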
The proof follows from Lemma 4 and the above arguments. When $p_n = c 2^{-n}$ the initial-dimension converges in distribution according to the following result.
Proposition 2 Fix any $c > 0$ and $d \ge 1$ and assume $p_n$ satisfies $\lim_{n \to \infty} p_n 2^n = c$. Then
$$\lim_{n \to \infty} P_{n,p_n}(S_d) = \left(1 - e^{-c 2^{-d}}\right)^{2^d} .$$
Proof: The result directly follows from Lemma 4. ✷

For $p_n = c 2^{-n}$, it is interesting to compare the asymptotic probability distributions of the VC-dimension and the initial-dimension, deduced from Propositions 1 and 2:
$$P(\mathrm{VC}(\mathcal{F}) = d) \equiv P_{n,p_n}(E_d) - P_{n,p_n}(E_{d+1}) \quad \text{and} \quad P(\mathrm{ID}(\mathcal{F}) = d) \equiv P_{n,p_n}(S_d) - P_{n,p_n}(S_{d+1}) .$$
They turn out to be remarkably close. Table 1 gives their significant values for $c = 10$. The two distributions differ by approximately one unit.

Remark 3 From Propositions 1 and 2, the asymptotic probabilities of the events $E_d$ and $S_d$ have a similar functional form with respect to the expected cardinality $c$, but the former has a significantly earlier transition from zero to one. In Figure 2, the asymptotic probabilities of $E_4$ and $S_4$ are plotted against the expected cardinality of the class.

Suppose that $p_n$ is chosen so that the binomial class $\mathcal{F}$ has expected size $p_n 2^n = c = 2^d$. Using the above probability distributions for the VC-dimension and the initial-dimension, the corresponding expected values $E(\mathrm{VC}(\mathcal{F}))$ and $E(\mathrm{ID}(\mathcal{F}))$ can be computed and analyzed in terms of the expected class size $c$. Table 2 displays these values for $c = 2^d$, $d = 1, \ldots, 10$. As seen, the expected value of the VC-dimension of $\mathcal{F}$ is just slightly smaller than the size $d$ of a set that could in theory be shattered by some class of the same cardinality as $\mathcal{F}$.
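The limiting distributions behind Tables 1 and 2 are easy to evaluate numerically. The sketch below assumes the limits $P(E_d) \to P(K \ge 2^d)$ with $K$ Poisson of parameter $c$, and $P(S_d) \to (1 - e^{-c 2^{-d}})^{2^d}$; the function names are illustrative, and the expectations are computed as $\sum_{d \ge 1} P(\text{dimension} \ge d)$.

```python
from math import exp, lgamma, log


def poisson_tail(c, k0):
    """P(K >= k0) for K ~ Poisson(c); the pmf is evaluated in log-space for stability."""
    cdf = sum(exp(-c + k * log(c) - lgamma(k + 1)) for k in range(k0))
    return max(0.0, 1.0 - cdf)


def p_vc_at_least(c, d):
    """Limiting probability that the VC-dimension is at least d."""
    return poisson_tail(c, 2 ** d)


def p_id_at_least(c, d):
    """Limiting probability that the initial-dimension is at least d."""
    return (1.0 - exp(-c * 2.0 ** (-d))) ** (2 ** d)


def expectation(p_at_least, c, tol=1e-12):
    """E(dim) = sum over d >= 1 of P(dim >= d), truncated once terms fall below tol."""
    total, d = 0.0, 1
    while True:
        q = p_at_least(c, d)
        if q < tol:
            return total
        total += q
        d += 1


c = 10
for d in range(1, 5):
    print(d,
          round(p_vc_at_least(c, d) - p_vc_at_least(c, d + 1), 4),
          round(p_id_at_least(c, d) - p_id_at_least(c, d + 1), 4))
print(round(expectation(p_vc_at_least, 2), 2), round(expectation(p_id_at_least, 2), 2))
```

The truncation in `expectation` is safe because both tail probabilities are non-increasing in $d$.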

Asymptotics for the testing-dimension
From Lemma 3 and Corollary 1 it follows that as the cardinality $k$ (or expected cardinality $p_n 2^n$) of a random class tends to infinity, the VC-dimension and the initial-dimension both tend to infinity. It is still possible however that as this occurs, the event $A_d$ of the random class shattering all sets of size $d$ (see Section 2.1) does not occur, even for $d = 1$. As we now show, the expected value of the number of unshattered sets of size $d$ may tend to infinity even when the cardinality of the class tends to infinity.

d             1     2     3     4     5     6     7     8     9     10
E(VC(F))    0.74  1.52  2.51  3.52  4.52  5.52  6.51  7.51  8.51  9.50
E(ID(F))    0.42  0.91  1.55  2.24  2.96  3.75  4.55  5.30  6.07  6.97

Tab. 2: Expectations of the asymptotic probability distributions of the VC-dimension and initial-dimension under the binomial model $P_{n,p_n}$ for $p_n 2^n = c = 2^d$, $d = 1, \ldots, 10$.
In order to show this, we use the standard first-moment method (see Janson et al. (2000), p. 54). Let $X$ be the random variable that counts the number of sets of size $d$ which are not shattered by a random class $\mathcal{F}$. We may express $X$ as a sum of indicators $I_{\bar{S}_x}(\mathcal{F})$ of the complements of the events $S_x$ (see Section 2.1), over all sets $x$ with $|x| = d$. Hence the events '$X = 0$' and $A_d$ are identical. Using Lemma 4, we may express the expected value of $X$ as
$$E_{n,p}(X) = \binom{n}{d} \left(1 - \left(1 - (1-p)^{2^{n-d}}\right)^{2^d}\right) .$$
Assume $p = p_n = c\, 2^{-n} \log n$, for some positive constant $c$. As $n$ tends to infinity,
$$E_{n,p_n}(X) = \frac{2^d}{d!}\, n^{d - c 2^{-d}} (1 + o(1)) . \tag{20}$$
Thus $E_{n,p_n}(X)$ tends to 0 if $c > d 2^d$ and to $+\infty$ if $c < d 2^d$, so $p_n = d 2^{d-n} \log n$ appears as a threshold for the expected number of unshattered sets. As we have previously done for the events $E_d$ and $S_d$ (Lemma 3 and Corollary 1), we now state the behavior of the probability of $A_d$ at the extreme values of the expected cardinality of the class. It is reasonable to expect that the same threshold as for the expected value of $X$ also holds for the probability of $A_d$. Lemma 5 below states that the probability of $A_d$ tends to 1 if $p_n \gg d 2^{d-n} \log n$. We believe that it tends to 0 if $p_n \ll d 2^{d-n} \log n$. However, the first-moment method only gives the first claim. For the second claim, the second-moment method (see for instance Janson et al. (2000) p. 54 or Spencer (1991) Theorem 3.1) may be used. This requires estimating the correlation between pairs of events $S_x$, $S_y$, which we were not able to do under the binomial model.

Thus Proposition 3 expresses a concentration result for the testing-dimension which is similar to the well-known concentration theorem by Matula for the clique number of random graphs (see Theorem 7.1 in Janson et al. (2000) or Theorem 11.4 p. 228 of Bollobás (2001)).
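The first-moment threshold $c = d 2^d$ can be observed numerically. The sketch below evaluates the expected number of unshattered $d$-sets under the binomial model, assuming the expression $\binom{n}{d}\,(1 - (1 - (1-p)^{2^{n-d}})^{2^d})$ with $p = c\,2^{-n} \log n$ (natural logarithm). The function name is hypothetical, and floating-point range limits $n$ to roughly $n \le 1000$.

```python
from math import comb, exp, expm1, log, log1p


def expected_unshattered(n, d, c):
    """Expected number of d-sets not shattered by a binomial class, p = c 2^-n log n.

    Requires n >= 2 and c > 0 so that 0 < p < 1.
    """
    p = c * 2.0 ** (-n) * log(n)
    if not 0.0 < p < 1.0:
        raise ValueError("need 0 < p < 1")
    t = exp(2.0 ** (n - d) * log1p(-p))           # (1 - p)^(2^(n-d))
    not_shattered = -expm1(2 ** d * log1p(-t))    # 1 - (1 - t)^(2^d)
    return comb(n, d) * not_shattered


d = 1  # threshold is c = d * 2^d = 2
for n in (50, 500):
    print(n, expected_unshattered(n, d, 4.0), expected_unshattered(n, d, 1.0))
```

With $d = 1$, the value for $c = 4$ shrinks as $n$ grows while the value for $c = 1$ grows, on either side of the threshold $c = 2$.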
We now proceed with the proof.
Proof: As in Section 3, we use Lemma 2 and replace $P^*_{n,k}$ by $Q^*_{n,k}$. In terms of matrices, the event $S_x$ is identical to the event that some submatrix $M$ with row indices $i \in x \subset [n]$ and column indices $1 \le j_1, \ldots, j_{2^d} \le k$ is equal to the matrix $M_d$ up to a permutation of columns (see the proof of Lemma 3).
As before, let $X$ count the subsets $x$ of size $d$ that are not shattered by the random class $\mathcal{F}$, i.e., $X = \sum_{x : |x| = d} I_{\bar{S}_x}(\mathcal{F})$.
Consider any $x \subset [n]$, and recall from (3) that $S_x = \bigcap_{\eta} C_{x,\eta}$, hence $\bar{S}_x = \bigcup_{\eta} \bar{C}_{x,\eta}$, where $\eta$ runs over the set $\{0,1\}^x$.
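Consider any $h$ distinct functions $\eta_1, \ldots, \eta_h \in \{0,1\}^x$. A random matrix has every one of its $k$ columns different, on the rows indexed by $x$, from every $\eta_i$, $1 \le i \le h$, with probability
$$\left(1 - h 2^{-d}\right)^k .$$
Using the expression for the probability of a union of events (see Feller (1968) (1.5), p. 89) one obtains
$$Q^*_{n,k}(\bar{S}_x) = \sum_{h=1}^{2^d} (-1)^{h-1} \binom{2^d}{h} \left(1 - h 2^{-d}\right)^k .$$
If $k = k_n$ tends to infinity, this sum tends to zero and the first term dominates. Hence:
$$Q^*_{n,k_n}(\bar{S}_x) = 2^d \left(1 - 2^{-d}\right)^{k_n} (1 + o(1)) .$$
Denoting by $E_{n,k}(X)$ the expectation of $X$ with respect to $Q^*_{n,k}$, we have
$$E_{n,k_n}(X) = \binom{n}{d}\, Q^*_{n,k_n}(\bar{S}_x) ,$$
which tends to
$$\frac{2^d}{d!} \left(1 - 2^{-d}\right)^c .$$
It remains to show that $Q^*_{n,k_n}(X = 0)$ tends to the exponential of minus this limit, i.e. that a Poisson approximation holds for $X$ (Barbour et al., 1992). The technique of proof, based on the Stein-Chen method, is quite standard: we shall use the results stated in Janson (1994).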
For $d = 1$, the subsets $x$ are singletons and the events $\bar{S}_x$ are independent. Their common probability is $Q^*_{n,k_n}(\bar{S}_x) = 2^{1-c} n^{-1} (1 + o(1))$. Hence
$$Q^*_{n,k_n}(X = 0) = \left(1 - 2^{1-c} n^{-1} (1 + o(1))\right)^n \longrightarrow \exp\left(-2^{1-c}\right) .$$
For $d \ge 2$, the family of indicators $(I_{\bar{S}_x})$ is dissociated in the sense of Janson (1994) p. 10: the two sets of random variables $\{I_{\bar{S}_x}, x \in J\}$ and $\{I_{\bar{S}_y}, y \in K\}$ are independent whenever every $x \in J$ is disjoint from every $y \in K$. Denote by $\Gamma$ the set of all $x \subset [n]$ with $|x| = d$. For $x \in \Gamma$, denote by $\Gamma_x$ the set of all $y \in \Gamma$ such that $x \cap y \ne \emptyset$. By Theorem 4 p. 10 of Janson (1994), the total variation distance between the distribution of $X$ and the Poisson distribution with parameter $E_{n,k_n}(X)$ is bounded above by […]

That $\Delta_h$ tends to zero follows from having a negative exponent, i.e.,
$$2d - h + \alpha_d \log\left(1 - 2^{-d+1} + 2^{-2d+h}\right) < 0 \tag{24}$$
for $d \ge 2$ and $h = 1, \ldots, d-1$. Indeed, the left hand side of (24) vanishes both for $h = 0$ and $h = d$. As a function of $h$, its second derivative is positive on $[0, d]$, hence it is strictly convex. Therefore it is strictly negative for all $h = 1, \ldots, d-1$. Hence $\Delta_h$ tends to zero with increasing $n$. ✷

1365-8050 © 2008 Discrete Mathematics and Theoretical Computer Science (DMTCS), Nancy, France

Lemma 1 Let $F$ be a class of functions on $[n]$ with

tends to 1. Denote by $M_d$ the 'complete' matrix with $d$ rows and $2^d$ columns formed by all $2^d$ binary vectors of length $d$, ranked for instance in alphabetical order. The event $E^*_d$ occurs if there exists an $x = \{x_1, \ldots, x_d\} \subset [n]$ such that the submatrix whose rows and columns are indexed by $x$ and $[2^d]$ respectively, is equal to $M_d$. Let $m = \lfloor n/d \rfloor$. Let

Fig. 1: VC-dimension of a random class of cardinality $k$ ($n \to \infty$)

Consider a random class $\mathcal{F}$ of $k$ binary functions on $[n]$. Can a fixed set of size $d$ in $[n]$, for instance $[d]$,

d                 1      2      3      4
P(VC(F) = d)    .0098  .2099  .7310  .0487
P(ID(F) = d)    .2766  .6428  .0672  .0000

Tab. 1: Asymptotic distributions of the VC and the initial-dimensions under the binomial model $P_{n,p_n}$ for $p_n = c 2^{-n}$ and $c = 10$.

Fig. 2: Limiting probability distribution of the events $E_d$ (solid) and $S_d$ (dots) for $d = 4$, with respect to the expected cardinality $c$ of the random class. Inflection points at $c = 15$ and $c = 44$, respectively.

Lemma 5
If $p_n \gg d 2^{d-n} \log n$, then the probability $P_{n,p_n}(A_d)$ that the testing-dimension is at least $d$ tends to 1.

Proof: Assume that $p_n = c 2^{-n} \log n$. Then from (20), $E_{n,p_n}(X)$ tends to zero if $c > d 2^d$. Hence
$$\lim_{n \to \infty} P_{n,p_n}(\bar{A}_d) = \lim_{n \to \infty} P_{n,p_n}(X > 0) \le \lim_{n \to \infty} E_{n,p_n}(X) = 0 ,$$
which follows from an application of Markov's inequality for a non-negative integer-valued random variable. ✷

To obtain more precise estimates around the threshold, we shall use the uniform model. Proposition 3 below gives the asymptotic probability of the event $A_d$ at the threshold, as was previously done for the events $E_d$ and $S_d$ in Propositions 1 and 2.

Proposition 3 Let $d \ge 1$ and $k = k_n$ be positive integers such that
$$\lim_{n \to \infty} \left(k_n - \alpha_d \log n\right) = c , \quad \text{where } \alpha_d = -\frac{d}{\log(1 - 2^{-d})}$$
and $c$ is a real constant. Then
$$\lim_{n \to \infty} P^*_{n,k_n}(A_d) = \exp\left(-\frac{2^d}{d!} \left(1 - 2^{-d}\right)^c\right) .$$

Remark 4 Denote $c_n = k_n - \alpha_d \log n$. If $c_n$ tends to $+\infty$, then $P^*_{n,k_n}(A_d)$ tends to 1. If $c_n$ tends to $-\infty$, then $P^*_{n,k_n}(A_d)$ tends to 0. Hence the critical value of the class cardinality $k_n$ is $\alpha_d \log n$: between the binomial model and the uniform one, the threshold for the class cardinality shifts from $d 2^d \log n$ to $\alpha_d \log n$. Consider a random class of cardinality $k_n = \alpha \log n$. If $\alpha \ne \alpha_d$ for all $d$, then w.h.p. the testing-dimension of the class equals the largest integer $d$ such that $\alpha > \alpha_d$. If $\alpha = \alpha_d$ for some positive integer $d$, then w.h.p. all sets of size $d - 1$ are shattered (since $\alpha > \alpha_{d-1}$), at least one set of size $d + 1$ is not (since $\alpha < \alpha_{d+1}$), and there may be a positive probability that all sets of size $d$ are shattered. Therefore the testing-dimension of the class in this case ($\alpha = \alpha_d$) is either $d - 1$ or $d$ w.h.p. Hence for any value of $\alpha$ the testing-dimension of a random class with cardinality $\alpha \log n$ concentrates on one or two values.
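The constants appearing in Proposition 3 and Remark 4 can be tabulated directly. The following sketch assumes $\alpha_d = -d / \log(1 - 2^{-d})$ with natural logarithms and the limit $\exp(-(2^d/d!)(1 - 2^{-d})^c)$; the three function names are illustrative, not notation from the paper.

```python
from math import exp, factorial, log


def alpha(d):
    """Critical constant alpha_d = -d / log(1 - 2^-d)."""
    return -d / log(1.0 - 2.0 ** (-d))


def limit_prob_all_shattered(d, c):
    """Limiting probability that all d-sets are shattered when k_n - alpha_d log n -> c."""
    return exp(-(2 ** d / factorial(d)) * (1.0 - 2.0 ** (-d)) ** c)


def typical_testing_dimension(a):
    """Largest integer d with a > alpha_d (0 if there is none), for k_n = a log n."""
    d = 0
    while a > alpha(d + 1):
        d += 1
    return d


for d in range(1, 5):
    print(d, round(alpha(d), 3), d * 2 ** d, round(limit_prob_all_shattered(d, 0.0), 4))
print(typical_testing_dimension(7.0))
```

The printed columns show $\alpha_d$ next to $d\,2^d$, the corresponding constant of the binomial-model threshold, which is always the larger of the two.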
The VC-dimension of $F$ is at least $d$ if and only if there exists a set $x$ with cardinality $d$ which is shattered by $F$. Therefore $E_d$ is the union of the events $S_x$ over all $x \subset [n]$ such that $|x| = d$, i.e.,
$$E_d = \bigcup_{x \subset [n],\, |x| = d} S_x .$$
Finally, the testing-dimension of $F$ is at least $d$ if and only if all sets $x$ of cardinality $d$ are shattered by $F$. Therefore $A_d$ is the intersection of all the events $S_x$, $x \subset [n]$ with $|x| = d$, i.e.,
$$A_d = \bigcap_{x \subset [n],\, |x| = d} S_x . \tag{5}$$

The probability that at least one of the $m$ independent events occurs, $1 - (1 - 2^{-d 2^d})^{\lfloor n/d \rfloor}$, tends to 1 as $n$ increases. ✷

Remark 1 When $k_n < 2^d$ no set of size $d$ is shattered and hence $P^*_{n,k_n}(E_d) = 0$. For $k_n \ge 2^d$, $P^*_{n,k_n}(E_d)$ tends to 1. Hence for a uniform class, the critical value of the cardinality $k_n$ for the event $E_d$ is $2^d$.

Remark 2 For any fixed $k > 0$, it follows that w.h.p. a uniform class $\mathcal{F}$ of cardinality $k$ has a VC-dimension of at least $\lfloor \log_2 k \rfloor$, where $\lfloor \cdot \rfloor$ denotes the integer part and $\log_2(a)$ the logarithm in base 2. Since no class of cardinality $k$ can shatter a subset of $[n]$ of size greater than $\lfloor \log_2 k \rfloor$, this is also an upper bound on the VC-dimension of $\mathcal{F}$. Hence, under the uniform model $P^*_{n,k}$, the VC-dimension of $\mathcal{F}$ converges in probability to $\lfloor \log_2 k \rfloor$.
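The bound $1 - (1 - 2^{-d 2^d})^{\lfloor n/d \rfloor}$ converges to 1 quite slowly once $d$ grows, which the following sketch makes visible. It uses `log1p` and `expm1` so that the tiny probability $2^{-d 2^d}$ is not lost to rounding; the function name is hypothetical, and the value is only the lower bound coming from disjoint row blocks of a random matrix, not the exact probability of shattering some set.

```python
from math import expm1, log1p


def disjoint_blocks_lower_bound(n, d):
    """1 - (1 - 2^(-d 2^d))^floor(n/d), evaluated stably for tiny block probabilities."""
    m = n // d
    q = 2.0 ** (-d * 2 ** d)   # probability that one d x 2^d block equals M_d
    return -expm1(m * log1p(-q))


for d in (1, 2, 3):
    print(d, [round(disjoint_blocks_lower_bound(n, d), 4) for n in (10, 10**3, 10**6, 10**9)])
```

The slow growth for $d = 3$ reflects only the crudeness of this particular bound; the statement of Lemma 3 is purely asymptotic.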