Statistical Learning Theory Part 3: Consistency of Machine Learning Estimators

Conditions for Convergence and Consistency of Learned ML Estimators

Andrew Rothman
3 min read · Oct 28, 2023

1: Background & Motivation

Statistical Learning Theory provides a probabilistic framework for understanding the problem of Machine Learning Inference. In mathematical terms, the basic goals of Statistical Learning Theory can be formulated as:

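In common SLT notation (a sketch; the exact formulation varies across texts):

```latex
% Goal: given i.i.d. data from an unknown distribution P and a loss L,
% find the function minimizing the population risk.
\[
R(f) \;=\; \mathbb{E}_{(X,Y) \sim P}\!\left[ L\bigl(Y, f(X)\bigr) \right],
\qquad
f^{*} \;=\; \operatorname*{arg\,min}_{f} \, R(f)
\]
% The central difficulty: P is unknown, so R(f) must be estimated
% from a finite sample.
```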

In Part 1 of this series we derived Hoeffding’s Inequality from first principles, and in Part 2 we proved the optimality of the Bayes Classifier. In this piece we turn to the more “realistic” applied setting of a sampling estimator (i.e., a Machine Learning model) learned from sample data. What do we know about the statistical and convergence properties of such an estimator? Is it possible to construct generalization bounds for estimators trained on finite samples?

This piece establishes conditions for the convergence and consistency of sampling estimators within the context of Statistical Learning Theory. These results are critical groundwork for understanding generalization bounds for both finite and infinite-size function classes.

Additionally, by the end of this piece we will prove that uniform convergence in probability is sufficient for consistency of a sample-dependent learned estimator:

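Informally, with R̂ₙ the empirical risk, R the true risk, and f̂ₙ the empirical risk minimizer over a function class F (all defined precisely in Section 2), the statement reads:

```latex
% Uniform convergence of the empirical risk over F ...
\[
\sup_{f \in \mathcal{F}} \bigl| \hat{R}_n(f) - R(f) \bigr|
\;\overset{P}{\longrightarrow}\; 0
\]
% ... is sufficient for consistency of the ERM estimator:
\[
R(\hat{f}_n) \;\overset{P}{\longrightarrow}\; \inf_{f \in \mathcal{F}} R(f)
\]
```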

The Table of Contents for this piece is as follows:

1: Background & Motivation
2: Function Classes, Empirical Risk Minimization, and Consistency
   2.1: Problem Setup
   2.2: Empirical Risk Minimization
   2.3: Consistency Statements
3: Conditions for Convergence & Consistency
   3.1: Weak Law of Large Numbers (WLLN) for Fixed Functions
   3.2: Failure of the WLLN for Sample-Dependent Learned Functions
   3.3: Uniform Convergence for Learned Functions
4: Wrap-up and Conclusions

With that said, let’s jump in.

2: Function Classes, Empirical Risk Minimization, and Consistency

2.1: Problem Setup

Suppose we have:

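Sketching the standard ingredients in common notation (the exact symbols vary across texts):

```latex
% 1. i.i.d. training data from an unknown joint distribution P:
\[
D_n = \{(X_1, Y_1), \ldots, (X_n, Y_n)\} \overset{\text{iid}}{\sim} P
\]
% 2. a function class (hypothesis space) of candidate estimators:
\[
\mathcal{F} \subseteq \{ f : \mathcal{X} \to \mathcal{Y} \}
\]
% 3. a loss function L(y, f(x)), e.g. 0-1 loss or squared error, and
%    the population risk of any fixed function f:
\[
R(f) = \mathbb{E}_{(X,Y) \sim P}\bigl[ L\bigl(Y, f(X)\bigr) \bigr]
\]
```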

2.2: Empirical Risk Minimization

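In this notation, Empirical Risk Minimization (ERM) replaces the unobservable risk R(f) with its sample average and minimizes that over the class instead (a standard formulation):

```latex
% Empirical risk: the sample-average estimate of R(f)
\[
\hat{R}_n(f) = \frac{1}{n} \sum_{i=1}^{n} L\bigl(Y_i, f(X_i)\bigr)
\]
% ERM estimator: the empirical risk minimizer over the class F
\[
\hat{f}_n = \operatorname*{arg\,min}_{f \in \mathcal{F}} \, \hat{R}_n(f)
\]
```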

2.3: Consistency Statements

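The consistency statements of interest can be written roughly as follows (standard definitions; terminology varies slightly across texts):

```latex
% Best-in-class risk and Bayes (globally optimal) risk:
\[
f^{*}_{\mathcal{F}} = \operatorname*{arg\,min}_{f \in \mathcal{F}} R(f),
\qquad
R^{*} = \inf_{f} R(f)
\]
% Consistency with respect to the class F:
\[
R(\hat{f}_n) \overset{P}{\longrightarrow} R(f^{*}_{\mathcal{F}})
\quad \text{as } n \to \infty
\]
% (Bayes) consistency additionally requires convergence to the Bayes risk:
\[
R(\hat{f}_n) \overset{P}{\longrightarrow} R^{*}
\]
```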

3: Conditions for Convergence & Consistency

3.1: Weak Law of Large Numbers (WLLN) for Fixed Functions

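For a fixed, data-independent function f, the summands L(Yᵢ, f(Xᵢ)) are i.i.d. with mean R(f), so the WLLN applies directly; if the loss is additionally bounded in [0, 1], Hoeffding’s Inequality from Part 1 also gives a finite-sample rate (a sketch under that boundedness assumption):

```latex
% WLLN for a fixed f: the empirical risk converges to the true risk
\[
\hat{R}_n(f) \;\overset{P}{\longrightarrow}\; R(f)
\quad \text{as } n \to \infty
\]
% With loss bounded in [0, 1], Hoeffding's Inequality quantifies the rate:
\[
\Pr\bigl( \bigl| \hat{R}_n(f) - R(f) \bigr| \geq \epsilon \bigr)
\;\leq\; 2 \exp\bigl( -2 n \epsilon^{2} \bigr)
\]
```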

3.2: Failure of the WLLN for Sample-Dependent Learned Functions

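The argument above breaks for the learned estimator because f̂ₙ depends on the very sample used to compute R̂ₙ, so the summands are no longer i.i.d. draws with mean R(f̂ₙ). A classic toy example makes this concrete: the labels are independent fair coin flips, and a “memorizer” attains zero empirical risk while its true risk is stuck at 1/2. Below is a minimal illustrative simulation of that example (a sketch, not code from the original piece):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Toy problem: X is continuous (so repeated values are essentially
# impossible) and Y is an independent fair coin flip, so no function
# of X can achieve a true 0-1 risk below 0.5.
X_train = rng.uniform(size=n)
y_train = rng.integers(0, 2, size=n)

# "Memorizer": predicts the stored label on training points, 0 elsewhere.
lookup = dict(zip(X_train.tolist(), y_train.tolist()))

def f_hat(x):
    return lookup.get(x, 0)

# Empirical risk of the learned function: exactly 0 (it memorized D_n).
train_preds = np.array([f_hat(x) for x in X_train])
empirical_risk = np.mean(train_preds != y_train)

# True risk, estimated on a large fresh sample: approximately 0.5,
# since new X values are unseen and Y is an independent coin flip.
X_test = rng.uniform(size=100_000)
y_test = rng.integers(0, 2, size=100_000)
test_preds = np.array([f_hat(x) for x in X_test])
true_risk_estimate = np.mean(test_preds != y_test)

print(f"empirical risk of f_hat: {empirical_risk:.3f}")      # 0.000
print(f"true risk of f_hat:      {true_risk_estimate:.3f}")  # ~0.500
```

Note that no fixed-f WLLN statement is violated here; the problem is that R̂ₙ is evaluated at a function chosen after seeing the data, which is exactly the gap uniform convergence is designed to close.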

3.3: Uniform Convergence for Learned Functions

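The heart of the proof is a standard three-term decomposition bounding the excess risk of ERM by twice the uniform deviation (sketched here in the notation above):

```latex
% Decompose the excess risk of the ERM estimator over the best-in-class f*_F:
\[
R(\hat{f}_n) - R(f^{*}_{\mathcal{F}})
= \bigl[ R(\hat{f}_n) - \hat{R}_n(\hat{f}_n) \bigr]
+ \bigl[ \hat{R}_n(\hat{f}_n) - \hat{R}_n(f^{*}_{\mathcal{F}}) \bigr]
+ \bigl[ \hat{R}_n(f^{*}_{\mathcal{F}}) - R(f^{*}_{\mathcal{F}}) \bigr]
\]
% The middle bracket is <= 0 because ERM minimizes the empirical risk;
% each outer bracket is bounded by the uniform deviation, so
\[
R(\hat{f}_n) - R(f^{*}_{\mathcal{F}})
\;\leq\; 2 \sup_{f \in \mathcal{F}} \bigl| \hat{R}_n(f) - R(f) \bigr|
\]
% Hence uniform convergence in probability of the empirical risk over F
% implies consistency of the ERM estimator with respect to F.
```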

4: Wrap-up and Conclusions

To summarize: for any fixed function the WLLN (and, for bounded losses, Hoeffding’s Inequality) guarantees that the empirical risk converges to the true risk, but that guarantee breaks down for estimators learned from the same sample. Uniform convergence of the empirical risk over the entire function class restores the guarantee and is sufficient for consistency of Empirical Risk Minimization. This is the critical groundwork for the generalization bounds for finite and infinite-size function classes developed in the pieces to follow.

For further reading on Statistical Learning Theory, I would recommend the textbooks “All of Statistics” and “All of Nonparametric Statistics” by Larry Wasserman (Professor of Statistics and Machine Learning at Carnegie Mellon), “The Elements of Statistical Learning” by Hastie, Tibshirani, and Friedman of Stanford, and “Statistical Learning Theory” by Vladimir Vapnik.

I look forward to writing future pieces, and please subscribe and follow me here on Medium!
