Statistical Learning Theory Part 6: Vapnik–Chervonenkis (VC) Dimension
Proof of Consistency Rates and Generalization Bounds for ML Estimators over Infinite Function Classes, Leveraging the VC-dimension
1: Background & Motivation
As stated in previous pieces in this series, Statistical Learning Theory provides a probabilistic framework for understanding the problem of Machine Learning Inference. In mathematical terms, the basic goals of Statistical Learning Theory can be formulated as:
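In the binary classification setting with 0–1 loss used throughout this series, this amounts to:

$$
\text{Given i.i.d. data } \{(X_i, Y_i)\}_{i=1}^{n} \sim P \text{ with } Y_i \in \{0, 1\}, \text{ construct an estimator } \hat{f}_n \text{ such that its risk}
$$

$$
R(\hat{f}_n) = P\big(\hat{f}_n(X) \neq Y\big) \text{ converges to the best achievable risk as } n \to \infty.
$$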
This piece is Part 6 in a Series on Statistical Learning Theory. The first five pieces are:
- Part 1: Hoeffding’s Inequality Derivation & Simulation
- Part 2: Optimality of the Bayes Classifier
- Part 3: Convergence and Consistency of Learned ML Estimators
- Part 4: Consistency over Finite Function Classes
- Part 5: Consistency over Infinite Function Classes — Shattering Coefficient
In Part 1 of this series we derived Hoeffding’s Inequality from first principles, and in Part 2 we proved the optimality of the Bayes Classifier. In Part 3 we developed theory to assess the consistency of data-adaptive machine learning estimators, in Part 4 we derived consistency rates and generalization bounds for ML estimators over finite sized function classes, and in Part 5 we derived consistency rates and generalization bounds over infinite sized function classes leveraging the Shattering Coefficient. As discussed at the end of the last piece, the Shattering Coefficient can be a difficult capacity measure to compute and leverage. Therefore, in this piece we will explore another capacity measure for infinite sized function classes (the VC-dimension) and show how it can be used to upper-bound the Shattering Coefficient.
To motivate the current problem of interest, consider:
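namely, empirical risk minimization (ERM) over an infinite function class, in the notation of the earlier parts:

$$
\hat{f}_n = \underset{f \in \mathcal{F}}{\arg\min}\; \hat{R}_n(f), \qquad \hat{R}_n(f) = \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}\{f(X_i) \neq Y_i\}, \qquad |\mathcal{F}| = \infty
$$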
we define:
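the Shattering Coefficient of $\mathcal{F}$, i.e., the maximal number of distinct binary labelings $\mathcal{F}$ can realize on any $n$ points:

$$
s(\mathcal{F}, n) \;=\; \max_{x_1, \dots, x_n} \Big| \big\{\, (f(x_1), \dots, f(x_n)) : f \in \mathcal{F} \,\big\} \Big|
$$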
and recall:
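the generalization bound derived in Part 5, stated here with the standard constants (as in, e.g., Wasserman’s All of Statistics):

$$
P\left( \sup_{f \in \mathcal{F}} \big| \hat{R}_n(f) - R(f) \big| > \varepsilon \right) \;\le\; 8\, s(\mathcal{F}, n)\, e^{-n\varepsilon^{2}/32}
$$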
Note that although in Part 5 we were able to derive generalization bounds and consistency rates for ML estimators over infinite sized function classes using the Shattering Coefficient, this approach suffers from some drawbacks. Mainly:
- Beyond simple toy examples, the Shattering Coefficient is generally very difficult to compute.
- The Shattering Coefficient is also a function of the specific sample size “n”, meaning we need to know how fast it grows asymptotically in order to leverage it for generalization bounds and consistency rates.
An alternative capacity measure is the Vapnik–Chervonenkis Dimension (VC-dimension). In this piece we show that the VC-dimension can be used to provide an upper-bound for the Shattering Coefficient. And unlike the Shattering Coefficient, the VC-dimension is a property of the function class in question alone: it is fixed with respect to the sample size “n”.
The Table of Contents for this piece is as follows:
- 1: Background & Motivation
- 2: VC-dimension
- 2.1: Technical Definition of the VC-dimension
- 2.2: Proof of Bounding the Shattering Coefficient with the VC-dimension
- 2.3: Motivating Toy Examples of the VC-dimension
- 3: Generalization Bounds and Consistency Rates
- 4: Wrap-up & Conclusions
With that said, let’s jump in.
2: VC-dimension
2.1: Technical Definition of the VC-dimension
Originally defined by Vladimir Vapnik and Alexey Chervonenkis, the VC-dimension is a measure of the capacity of a function class used for binary classification. Formally, the VC-dimension is defined as follows:
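A set of points $\{x_1, \dots, x_n\}$ is said to be shattered by $\mathcal{F}$ if $\mathcal{F}$ realizes all $2^n$ possible binary labelings of those points. In the notation above, the VC-dimension is the size of the largest shatterable set:

$$
VC(\mathcal{F}) \;=\; \sup\big\{\, n \in \mathbb{N} \;:\; s(\mathcal{F}, n) = 2^{n} \,\big\}
$$

(with $VC(\mathcal{F}) = \infty$ if $\mathcal{F}$ shatters arbitrarily large point sets).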
We will next prove that:
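in one standard form, this is the Sauer–Shelah lemma: writing $d := VC(\mathcal{F}) < \infty$,

$$
s(\mathcal{F}, n) \;\le\; \sum_{i=0}^{d} \binom{n}{i} \;\le\; \left( \frac{en}{d} \right)^{d} \quad \text{for } n \ge d
$$

In other words, once the VC-dimension is finite, the Shattering Coefficient grows only polynomially in $n$ rather than exponentially.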
2.2: Proof of Bounding the Shattering Coefficient with the VC-dimension
As stated in the previous section, we would like to prove that the Shattering Coefficient of a function class can be upper-bounded in terms of its VC-dimension. We will do so in a series of three proofs:
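Assuming the standard Sauer–Shelah route, the three steps are:
- Proof 1: if $VC(\mathcal{F}) = d < \infty$, then $s(\mathcal{F}, n) \le \sum_{i=0}^{d} \binom{n}{i}$ (Sauer’s lemma).
- Proof 2: $\sum_{i=0}^{d} \binom{n}{i} \le \left( \frac{en}{d} \right)^{d}$ for $n \ge d$ (a purely combinatorial bound on the binomial sum).
- Proof 3: chaining Proofs 1 and 2 yields $s(\mathcal{F}, n) \le \left( \frac{en}{d} \right)^{d}$, i.e., polynomial growth of the Shattering Coefficient in $n$.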
Let’s get started.
2.3: Motivating Toy Examples of the VC-dimension
We will now explore two simple toy examples to motivate the VC-dimension.
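Two classic toy examples are threshold classifiers and interval classifiers on the real line:

$$
\textbf{Thresholds: } \mathcal{F} = \big\{ \mathbb{1}\{x \ge a\} : a \in \mathbb{R} \big\}. \text{ Any single point can be shattered, but for } x_1 < x_2 \text{ the labeling } (1, 0) \text{ is unachievable, so } VC(\mathcal{F}) = 1.
$$

$$
\textbf{Intervals: } \mathcal{F} = \big\{ \mathbb{1}\{a \le x \le b\} : a \le b \big\}. \text{ Any two points can be shattered, but for } x_1 < x_2 < x_3 \text{ the labeling } (1, 0, 1) \text{ is unachievable, so } VC(\mathcal{F}) = 2.
$$

The interval example is easy to verify numerically. Below is a minimal brute-force sketch (the helper names and the endpoint grid are illustrative choices, not from any particular library):

```python
import itertools

import numpy as np

def interval_labels(points, a, b):
    """Labels assigned to `points` by the interval classifier f(x) = 1{a <= x <= b}."""
    return tuple(int(a <= x <= b) for x in points)

def intervals_shatter(points):
    """Brute-force check: do interval classifiers realize all 2^n labelings of `points`?"""
    # A coarse grid of candidate endpoints suffices, since only the ordering
    # of the endpoints relative to the sample points matters.
    grid = np.linspace(min(points) - 1.0, max(points) + 1.0, 101)
    achieved = {interval_labels(points, a, b)
                for a, b in itertools.product(grid, repeat=2) if a <= b}
    return len(achieved) == 2 ** len(points)

print(intervals_shatter([0.0, 1.0]))       # True:  two points are shattered
print(intervals_shatter([0.0, 1.0, 2.0]))  # False: (1, 0, 1) is unachievable
```

Consistent with the definition in Section 2.1, the largest point set that intervals can shatter has size 2, so the VC-dimension of the interval class is 2.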
3: Generalization Bounds and Consistency Rates
Recall that in Part 5 of this series we proved the following statistical inequality (restated here with the standard constants):
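$$
P\left( \sup_{f \in \mathcal{F}} \big| \hat{R}_n(f) - R(f) \big| > \varepsilon \right) \;\le\; 8\, s(\mathcal{F}, n)\, e^{-n\varepsilon^{2}/32}
$$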
…and coupling this with the following result from the previous section, which bounds the Shattering Coefficient via the VC-dimension:
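$$
s(\mathcal{F}, n) \;\le\; \left( \frac{en}{d} \right)^{d}, \qquad d = VC(\mathcal{F}), \quad n \ge d
$$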
Therefore:
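$$
P\left( \sup_{f \in \mathcal{F}} \big| \hat{R}_n(f) - R(f) \big| > \varepsilon \right) \;\le\; 8 \left( \frac{en}{d} \right)^{d} e^{-n\varepsilon^{2}/32} \;\longrightarrow\; 0 \quad \text{as } n \to \infty
$$

One standard way to read this bound: setting the right-hand side equal to $\delta$ and solving for $\varepsilon$ gives, with probability at least $1 - \delta$,

$$
\sup_{f \in \mathcal{F}} \big| \hat{R}_n(f) - R(f) \big| \;\le\; \sqrt{\frac{32}{n} \left( d \log\frac{en}{d} + \log\frac{8}{\delta} \right)} \;=\; O\!\left( \sqrt{\frac{d \log n}{n}} \right)
$$

So the ERM estimator over any function class with finite VC-dimension is consistent, with a rate governed by the fixed quantity $d$ rather than by the sample-size-dependent Shattering Coefficient.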
4: Wrap-up & Conclusions
For solid Statistical Learning Theory reference content, I would recommend the textbooks “All of Statistics” and “All of Nonparametric Statistics” by Larry Wasserman (Statistics and Machine Learning Professor at Carnegie Mellon), “The Elements of Statistical Learning” by Hastie, Tibshirani, and Friedman of Stanford, and “Statistical Learning Theory” by Vladimir Vapnik.
I look forward to writing future pieces, and please subscribe and follow me here on Medium!