Statistical Learning Theory Part 6: Vapnik–Chervonenkis (VC) Dimension

Proof of Consistency, Rates, and Generalization Bounds for ML Estimators over Infinite Function Classes Leveraging the VC-Dimension

Andrew Rothman
5 min read · Nov 7, 2023

1: Background & Motivation

As stated in previous pieces in this series, Statistical Learning Theory provides a probabilistic framework for understanding the problem of Machine Learning Inference. In mathematical terms, the basic goals of Statistical Learning Theory can be formulated as:

image by author

This piece is Part 6 in a series on Statistical Learning Theory, building directly on the five pieces that came before it.

In Part 1 of this series we derived Hoeffding’s Inequality from first principles; in Part 2 we proved the optimality of the Bayes Classifier; in Part 3 we developed theory to assess the consistency of data-adaptive machine learning sampling estimators; in Part 4 we derived consistency rates and generalization bounds for ML estimators over finite function classes; and in Part 5 we derived consistency rates and generalization bounds over infinite function classes by leveraging the Shattering Coefficient. As discussed at the end of the last piece, the Shattering Coefficient can be a difficult capacity measure to compute and leverage. In this piece we therefore explore another capacity measure for infinite function classes, the VC-dimension, and show how it can be used to upper-bound the Shattering Coefficient.

To motivate the current problem of interest, consider:

image by author

we define:

image by author

and recall:

image by author
image by author

Note that although in Part 5 we were able to derive generalization bounds and consistency rates for ML estimators over infinite function classes using the Shattering Coefficient, this approach suffers from some drawbacks. Mainly:

  • Beyond simple toy examples, the Shattering Coefficient is generally very difficult to compute.
  • The Shattering Coefficient is also a function of the specific sample size “n”, meaning we need to know how fast it grows asymptotically before we can leverage it for generalization bounds (see the illustration just after this list).
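
As a concrete illustration (my own, not from the original derivations): for the simple class of one-dimensional threshold classifiers $f_t(x) = \mathbb{1}\{x \ge t\}$, the Shattering Coefficient can be computed exactly,

$$
s(\mathcal{F}_{\text{thresh}}, n) = n + 1,
$$

since $n$ distinct points on the real line can only be split by a threshold into “all 1s” or “the leftmost $k$ points labeled 0 and the rest 1” for $k = 1, \dots, n$. Even in this toy case the capacity measure changes with $n$; for richer classes an exact expression is rarely available.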

An alternative capacity measure is the Vapnik–Chervonenkis Dimension (VC-dimension). In this piece we show that the VC-dimension can be used to provide an upper bound on the Shattering Coefficient. And unlike the Shattering Coefficient, the VC-dimension is a property of the function class in question alone: it is fixed with respect to the sample size “n”.

image by author

The Table of Contents for this piece is as follows:

image by author

With that said, let’s jump in.

2: VC-dimension

2.1: Technical Definition of the VC-dimension

Originally defined by Vladimir Vapnik and Alexey Chervonenkis, the VC-dimension is a measure of the capacity (expressive power) of a function class that can be learned by a statistical binary classification algorithm. Formally, the VC-dimension is defined as follows:

image by author
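
For concreteness, the standard formulation of this definition (stated here in the Shattering Coefficient notation from Part 5) is:

$$
\mathrm{VC}(\mathcal{F}) \;=\; \max\left\{\, n \in \mathbb{N} \;:\; s(\mathcal{F}, n) = 2^{n} \,\right\},
$$

i.e., the VC-dimension is the size of the largest set of points that $\mathcal{F}$ can shatter (realize all $2^n$ possible binary labelings of); if $\mathcal{F}$ shatters arbitrarily large point sets, the VC-dimension is defined to be infinite.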

We will next prove that:

image by author
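
The relation in question is of the classical Sauer–Shelah type; a standard statement of it reads as follows. For a function class $\mathcal{F}$ with finite VC-dimension $d = \mathrm{VC}(\mathcal{F})$,

$$
s(\mathcal{F}, n) \;\le\; \sum_{i=0}^{d} \binom{n}{i} \;\le\; \left(\frac{en}{d}\right)^{d} \quad \text{for } n \ge d,
$$

so the Shattering Coefficient grows at most polynomially in $n$ (on the order of $n^{d}$) whenever the VC-dimension is finite.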

2.2: Proof of Bounding the Shattering Coefficient with the VC-dimension

As stated in the previous section,

image by author

We would like to prove the above relations. We will do so in a series of three proofs:

image by author
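
As a rough roadmap (a sketch of the standard Sauer–Shelah route; the precise statements are given in the steps that follow), the three proofs can be summarized as:

  • Proof 1: if $\mathrm{VC}(\mathcal{F}) = d$, then $s(\mathcal{F}, n) \le \sum_{i=0}^{d} \binom{n}{i}$ for every $n$ (the combinatorial core, typically argued by induction on $n$ and $d$).
  • Proof 2: $\sum_{i=0}^{d} \binom{n}{i} \le \left(\frac{en}{d}\right)^{d}$ for all $n \ge d$ (a purely analytic inequality, obtained by comparing the sum against the binomial expansion of $(1 + d/n)^{n}$).
  • Proof 3: chaining the two bounds gives $s(\mathcal{F}, n) \le (en/d)^{d} = O(n^{d})$, i.e., polynomial rather than exponential growth in $n$.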

Let’s get started.

image by author
image by author
image by author
image by author
image by author
image by author
image by author
image by author

2.3: Motivating Toy Examples of the VC-dimension

We will now explore two simple toy examples to motivate the VC-dimension.

image by author
image by author
image by author
image by author
image by author
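
To make the idea of shattering concrete in code, here is a small, self-contained Python sketch (my own illustration, not part of the original derivations) for the hypothetical class of one-dimensional threshold classifiers $f_t(x) = \mathbb{1}\{x \ge t\}$. It enumerates all $2^n$ labelings of a point set and checks whether each is achievable by some threshold: every single point can be shattered, but no pair of points can, so this class has VC-dimension 1.

```python
from itertools import product

def threshold_labels(t, xs):
    """Labels assigned by the threshold classifier f_t(x) = 1{x >= t}."""
    return tuple(1 if x >= t else 0 for x in xs)

def can_shatter(xs):
    """Return True if 1-D threshold classifiers realize all 2^n labelings of xs."""
    xs = sorted(xs)
    # Candidate thresholds: one below all points, one between each adjacent pair,
    # and one above all points -- these produce every labeling a threshold can achieve.
    candidates = [xs[0] - 1.0]
    candidates += [(a + b) / 2.0 for a, b in zip(xs, xs[1:])]
    candidates += [xs[-1] + 1.0]
    achievable = {threshold_labels(t, xs) for t in candidates}
    return all(labels in achievable for labels in product((0, 1), repeat=len(xs)))

print(can_shatter([0.0]))        # True  -> a 1-point set is shattered, so VC-dim >= 1
print(can_shatter([0.0, 1.0]))   # False -> labeling (1, 0) is unreachable, so VC-dim < 2
```

The same brute-force recipe works for other simple classes (e.g., intervals, or axis-aligned rectangles in the plane), though the $2^n$ enumeration keeps it practical only for small point sets.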

3: Generalization Bounds and Consistency Rates

Recall that in Part 5 of this series we proved the following statistical inequality:

image by author

… and we couple it with the following result from the previous section, which shows that the Shattering Coefficient can be bounded using the VC-dimension:

image by author

Therefore:

image by author
image by author
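
Putting the pieces together symbolically (a sketch using the standard constants from the classical VC inequality, e.g., as presented in Wasserman’s All of Statistics; the exact constants may differ depending on the version of the bound used): for a function class $\mathcal{F}$ with finite VC-dimension $d$, empirical risk $\hat{R}_n(f)$, and true risk $R(f)$,

$$
\mathbb{P}\!\left(\sup_{f \in \mathcal{F}} \big|\hat{R}_n(f) - R(f)\big| > \varepsilon\right)
\;\le\; 8\, s(\mathcal{F}, n)\, e^{-n\varepsilon^{2}/32}
\;\le\; 8 \left(\frac{en}{d}\right)^{d} e^{-n\varepsilon^{2}/32}.
$$

Since the polynomial factor $(en/d)^{d}$ grows far more slowly than the exponential term decays, the right-hand side converges to 0 as $n \to \infty$ for every fixed $\varepsilon > 0$. This is precisely what yields consistency, and inverting the bound (solving for $\varepsilon$ at a fixed confidence level) yields finite-sample generalization guarantees, whenever the VC-dimension is finite.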

4: Wrap-up & Conclusions

image by author
image by author

For solid Statistical Learning Theory reference material, I recommend the textbooks “All of Statistics” and “All of Nonparametric Statistics” by Larry Wasserman (Professor of Statistics and Machine Learning at Carnegie Mellon), “The Elements of Statistical Learning” by faculty at Stanford, and “Statistical Learning Theory” by Vladimir Vapnik.

I look forward to writing future pieces, and please subscribe and follow me here on Medium!
