Statistical Learning Theory Part 4: Finite Function Classes — Consistency, Rates, & Bounds

Proof of Consistency, Rates, and Generalization Bounds for ML Estimators over Finite Function Classes

Andrew Rothman
4 min read · Oct 31, 2023

1: Background & Motivation

As stated in previous pieces in this series, Statistical Learning Theory provides a probabilistic framework for understanding the problem of Machine Learning Inference. In mathematical terms, the basic goals of Statistical Learning Theory can be formulated as:

image by author
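In standard notation (assumed here rather than taken verbatim from the image above), the idea is: given i.i.d. training data, we want a data-adaptive estimator whose true risk approaches the best risk achievable over the function class we search. A minimal sketch:

```latex
% Sketch of the basic goals of Statistical Learning Theory (standard notation, assumed here)
% Data: (X_1, Y_1), ..., (X_n, Y_n) i.i.d. from an unknown distribution P
% Risk of a candidate function f under a loss L:
\[
R(f) \;=\; \mathbb{E}_{P}\!\left[ L\big(Y, f(X)\big) \right]
\]
% Goal: using only the data, choose an estimator \hat{f}_n from a function class \mathcal{F} such that
\[
R(\hat{f}_n) \;-\; \inf_{f \in \mathcal{F}} R(f) \;\xrightarrow{\;P\;}\; 0
\qquad \text{as } n \to \infty .
\]
```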

This piece is Part 4 in a series on Statistical Learning Theory. The first three pieces covered the following:

In Part 1 of this series we derived Hoeffding’s Inequality from first principles, in Part 2 we proved optimality of the Bayes Classifier, and in Part 3 we started to develop theory to assess consistency of data-adaptive machine learning sampling estimators. In this piece we assess consistency and generalization bounds for sampling estimators chosen from a finite-sized function class.

The Table of Contents for this piece is as follows:

image by author

With that said, let’s jump in.

2: Proof of Consistency for Finite Function Classes

Let’s begin with our problem set-up and assumptions. Suppose we have:

image by author

we define:

image by author
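A minimal sketch of the set-up and definitions assumed in what follows (the notation below is standard and may differ slightly from the figures): we observe i.i.d. data, search a finite function class, and fit by empirical risk minimization.

```latex
% Assumed set-up (a sketch, not a verbatim copy of the figures):
% - (X_1, Y_1), ..., (X_n, Y_n) i.i.d. from an unknown distribution P
% - a finite function class F = {f_1, ..., f_m}
% - a loss L bounded in [0, 1] (e.g., the 0-1 classification loss)
\[
\widehat{R}_n(f) \;=\; \frac{1}{n}\sum_{i=1}^{n} L\big(Y_i, f(X_i)\big)
\qquad \text{(empirical risk)},
\]
\[
\hat{f}_n \;\in\; \arg\min_{f \in \mathcal{F}} \widehat{R}_n(f)
\qquad \text{(empirical risk minimizer)} .
\]
```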

and recall:

image by author
image by author
image by author
image by author
image by author
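In brief, the argument combines Hoeffding’s Inequality from Part 1 with a union bound over the m functions in the class. A sketch under the assumptions above (loss bounded in [0, 1], finite class of size m):

```latex
% Union bound + Hoeffding over the finite class F = {f_1, ..., f_m}:
\[
P\!\left( \max_{f \in \mathcal{F}} \big| \widehat{R}_n(f) - R(f) \big| > \varepsilon \right)
\;\le\; \sum_{j=1}^{m} P\!\left( \big| \widehat{R}_n(f_j) - R(f_j) \big| > \varepsilon \right)
\;\le\; 2m\, e^{-2 n \varepsilon^{2}} .
\]
% The empirical risk minimizer's excess risk is controlled by the uniform deviation:
\[
R(\hat{f}_n) \;-\; \inf_{f \in \mathcal{F}} R(f)
\;\le\; 2 \max_{f \in \mathcal{F}} \big| \widehat{R}_n(f) - R(f) \big| ,
\]
% so, setting 2 m exp(-2 n eps^2) = delta, with probability at least 1 - delta:
\[
R(\hat{f}_n) \;-\; \inf_{f \in \mathcal{F}} R(f)
\;\le\; 2 \sqrt{ \frac{\log(2m/\delta)}{2n} } .
\]
```

Since the right-hand side of the first display goes to zero for any fixed ε when m is fixed (or grows sub-exponentially in n), empirical risk minimization over a finite class is consistent in this sense.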

3: Toy Examples

We will now review two simple toy examples with finite function classes: one with the function class size m fixed and specified a priori, and another with the class size m specified as a function of the sample size n.

3.1: Fixed Finite Function Class Size “m” specified a priori

Consider:

image by author

As an example, for the finite function class of all piecewise-constant functions on the support of Y and X given above, below are the 8 possible functions for partition size k = 3:

image by author
image by author
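To make this concrete, here is a small sketch that enumerates such a class in code. The specifics below are illustrative assumptions (X taken to live on [0, 1) split into k = 3 equal-width cells, Y taken to be binary), not a verbatim copy of the figures:

```python
from itertools import product

# Assumed toy set-up: X in [0, 1) split into k = 3 equal-width cells, Y in {0, 1}.
k = 3

def make_piecewise_constant(labels):
    """Return f(x) that outputs labels[j] when x falls in the j-th cell of the partition."""
    def f(x):
        j = min(int(x * k), k - 1)  # index of the cell containing x
        return labels[j]
    return f

# All 2^k = 8 piecewise-constant {0, 1}-valued functions on the partition.
function_class = [make_piecewise_constant(labels) for labels in product([0, 1], repeat=k)]

print(len(function_class))                 # 8
print([f(0.5) for f in function_class])    # each function's value on the middle cell
```

With the class size m = 2^k fixed a priori, the bound from Section 2 applies directly with m = 8.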

3.2: Finite Function Class Size “m” specified as a function of sample size “n”

Consider:

image by author
image by author
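Whatever the specific construction, the finite-class bound makes the trade-off explicit: with class size m_n growing in n, the uniform deviation bound 2 · m_n · exp(−2nε²) still vanishes exactly when log(m_n)/n → 0. A short numerical sketch (the growth rates below are illustrative assumptions, not tied to the example in the figures):

```python
import math

def log_uniform_deviation_bound(log_m_n, n, eps=0.1):
    """Log of the finite-class bound log(2 * m_n * exp(-2 * n * eps**2)),
    computed in log-space so very large m_n does not overflow."""
    return math.log(2.0) + log_m_n - 2.0 * n * eps ** 2

for n in [100, 1_000, 10_000, 100_000]:
    poly = log_uniform_deviation_bound(log_m_n=2 * math.log(n), n=n)   # m_n = n^2: log(m_n)/n -> 0
    expo = log_uniform_deviation_bound(log_m_n=n * math.log(2), n=n)   # m_n = 2^n: log(m_n)/n = log 2
    print(f"n={n:>7}   log-bound with m_n=n^2: {poly:10.1f}   with m_n=2^n: {expo:12.1f}")
```

With m_n = n², the log of the bound heads to minus infinity (the bound itself goes to zero), so consistency is preserved; with m_n = 2ⁿ the bound never shrinks, which is exactly why finite-class arguments alone cannot handle genuinely rich function classes.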

4: Wrap-up & Conclusions

image by author

Whereas this piece dealt with consistency of machine learning sampling estimators over finite function classes, what happens when our available function class is infinite, for instance the class of all linear models? Is it possible to construct generalization bounds and establish consistency under these conditions? The answer is yes, and we will begin exploring it in Part 5 with the Shattering Coefficient.

For solid Statistical Learning Theory reference material, I would recommend the textbooks “All of Statistics” and “All of Nonparametric Statistics” by Larry Wasserman (Professor of Statistics and Machine Learning at Carnegie Mellon), The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman (Stanford), and Statistical Learning Theory by Vladimir Vapnik.

I look forward to writing future pieces, and please subscribe and follow me here on Medium!
