Statistical Learning Theory Part 3: Consistency of Machine Learning Estimators
Conditions for Convergence and Consistency of Learned ML Estimators
1: Background & Motivation
Statistical Learning Theory provides a probabilistic framework for understanding the problem of Machine Learning Inference. In mathematical terms, the basic goals of Statistical Learning Theory can be formulated as follows.
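At the heart of these goals is risk minimization: given an i.i.d. sample drawn from an unknown distribution, a class of candidate functions, and a loss function (all made precise in Section 2), we want a function whose expected loss on new data is as small as possible. A minimal sketch of this goal in standard notation:

```latex
% Learning goal, sketched in standard notation: (X_1, Y_1), ..., (X_n, Y_n) is an i.i.d.
% sample from an unknown joint distribution P over \mathcal{X} \times \mathcal{Y},
% \ell is a loss function, and we want a predictor f with small true (expected) risk:
\[
  \min_{f} \; R(f),
  \qquad \text{where} \quad
  R(f) \;=\; \mathbb{E}_{(X,Y) \sim P}\big[\ell\big(f(X),\, Y\big)\big],
\]
% even though P is unknown and we only ever observe the finite sample drawn from it.
```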
In Part 1 of this series we derived Hoeffding’s Inequality from first principles, and in Part 2 we proved the optimality of the Bayes Classifier. In this piece we turn to the more “realistic” applied question of specifying a sampling estimator (i.e., a Machine Learning model) learned from sample data. What do we know about the statistical and convergence properties of such an estimator? Is it possible to construct generalization bounds for estimators trained on finite samples?
This piece lays the groundwork for the convergence and consistency of sampling estimators within the context of Statistical Learning Theory, which is critical for understanding generalization bounds for both finite and infinite-size function classes.
Additionally, by the end of this piece we will prove that uniform convergence in probability is sufficient for consistency of a sample-dependent learned estimator.
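Written out in the notation introduced in Section 2 (empirical risk R_n, true risk R, function class F, and learned estimator f_n), the statement we will arrive at is, roughly:

```latex
% Uniform convergence of the empirical risk to the true risk over the whole class
% \mathcal{F} implies consistency of the empirical risk minimizer f_n:
\[
  \sup_{f \in \mathcal{F}} \big| R_n(f) - R(f) \big| \;\xrightarrow{\;P\;}\; 0
  \qquad \Longrightarrow \qquad
  R(f_n) \;\xrightarrow{\;P\;}\; \inf_{f \in \mathcal{F}} R(f),
\]
% where \xrightarrow{P} denotes convergence in probability as n \to \infty.
```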
The Table of Contents for this piece is as follows:
1: Background & Motivation
2: Function Classes, Empirical Risk Minimization, and Consistency
3: Conditions for Convergence & Consistency
4: Wrap-up and Conclusions
With that said, let’s jump in.
2: Function Classes, Empirical Risk Minimization, and Consistency
2.1: Problem Setup
Suppose we have the following ingredients.
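In the standard setup, these ingredients are: an i.i.d. sample (X_1, Y_1), ..., (X_n, Y_n) drawn from an unknown joint distribution P over X x Y; a function class F of candidate predictors f mapping X to Y; and a loss function l(f(x), y) measuring the cost of predicting f(x) when the true label is y. Two quantities built from these ingredients recur throughout the piece, sketched here in that notation:

```latex
% True risk: expected loss under the (unknown) data-generating distribution P.
\[
  R(f) \;=\; \mathbb{E}_{(X,Y) \sim P}\big[\ell\big(f(X),\, Y\big)\big]
\]
% Empirical risk: average loss over the observed sample of size n.
\[
  R_n(f) \;=\; \frac{1}{n} \sum_{i=1}^{n} \ell\big(f(X_i),\, Y_i\big)
\]
% Example: with 0-1 loss, \ell(f(x), y) = \mathbf{1}\{f(x) \neq y\}, the true risk R(f) is
% the misclassification probability and the empirical risk R_n(f) is the training error rate.
```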
2.2: Empirical Risk Minimization
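As a quick sketch in the notation above: the true risk R(f) cannot be computed directly (the distribution P is unknown), so Empirical Risk Minimization replaces it with the empirical risk and minimizes that over the function class:

```latex
% Empirical Risk Minimization (ERM): pick the function in \mathcal{F} that best fits the sample.
\[
  f_n \;=\; \operatorname*{arg\,min}_{f \in \mathcal{F}} \; R_n(f)
        \;=\; \operatorname*{arg\,min}_{f \in \mathcal{F}} \; \frac{1}{n} \sum_{i=1}^{n} \ell\big(f(X_i),\, Y_i\big)
\]
% Note that f_n is a random function: it depends on the sample (X_1, Y_1), ..., (X_n, Y_n).
% This sample dependence is exactly what makes the convergence analysis in Section 3 subtle.
```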
2.3: Consistency Statements
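Roughly, the consistency statements we care about come in two strengths, sketched below: consistency with respect to the chosen function class, and the stronger Bayes-level consistency.

```latex
% Consistency with respect to the class \mathcal{F}: the risk of the learned estimator
% converges in probability to the best risk achievable within \mathcal{F}.
\[
  R(f_n) \;\xrightarrow{\;P\;}\; \inf_{f \in \mathcal{F}} R(f)
  \qquad \text{as } n \to \infty
\]
% Bayes consistency: convergence to the Bayes risk R^* = \inf_f R(f), where the infimum now
% runs over all measurable functions. This additionally requires \mathcal{F} to be rich
% enough to approximate the Bayes-optimal rule from Part 2 of this series.
\[
  R(f_n) \;\xrightarrow{\;P\;}\; R^{*}
\]
```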
3: Conditions for Convergence & Consistency
3.1: Weak Law of Large Numbers (WLLN) for Fixed Functions
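For a single function f that is fixed before the data are seen, the situation is straightforward and can be sketched as follows:

```latex
% For a FIXED f, the losses \ell(f(X_1), Y_1), ..., \ell(f(X_n), Y_n) are i.i.d. random
% variables with mean R(f), so the Weak Law of Large Numbers applies directly:
\[
  R_n(f) \;=\; \frac{1}{n} \sum_{i=1}^{n} \ell\big(f(X_i),\, Y_i\big)
          \;\xrightarrow{\;P\;}\; \mathbb{E}\big[\ell\big(f(X),\, Y\big)\big] \;=\; R(f)
\]
% Equivalently: for every \varepsilon > 0,
%   \lim_{n \to \infty} \Pr\big( | R_n(f) - R(f) | > \varepsilon \big) = 0.
```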
3.2: Failure of WLLN for Sample-Dependent Learned Functions
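The catch is that the learned estimator f_n is chosen using the same sample that appears in R_n, so the summands l(f_n(X_i), Y_i) are no longer i.i.d. and the WLLN above does not apply to R_n(f_n). A standard counterexample, sketched here assuming 0-1 loss and a continuous marginal distribution on X, is a classifier that simply memorizes the training set:

```latex
% A "memorizing" classifier: predict the stored training label on training points,
% and an arbitrary fixed label (say 0) everywhere else.
\[
  f_n(x) \;=\;
  \begin{cases}
    Y_i & \text{if } x = X_i \text{ for some } i \le n, \\
    0   & \text{otherwise.}
  \end{cases}
\]
% Under 0-1 loss this achieves R_n(f_n) = 0 for every n, but a fresh draw of X almost surely
% misses every training point, so the true risk is R(f_n) = \Pr(Y \neq 0), which need not be
% small. Hence R_n(f_n) - R(f_n) does not converge to 0, even though the WLLN holds for each
% individual fixed f.
```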
3.3: Uniform Convergence for Learned Functions
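The fix is to demand convergence that holds uniformly over the whole class F, so that it covers whichever function the learning algorithm happens to pick. The heart of the sufficiency proof is the following decomposition, sketched for the empirical risk minimizer f_n and any fixed comparison function f* in F:

```latex
% Since f_n minimizes the empirical risk, R_n(f_n) - R_n(f^*) \le 0, and each remaining
% term is bounded by the uniform deviation over the class:
\[
  R(f_n) - R(f^*)
  \;=\; \big[ R(f_n) - R_n(f_n) \big]
  \;+\; \big[ R_n(f_n) - R_n(f^*) \big]
  \;+\; \big[ R_n(f^*) - R(f^*) \big]
  \;\le\; 2 \sup_{f \in \mathcal{F}} \big| R_n(f) - R(f) \big|
\]
% Taking f^* with R(f^*) arbitrarily close to \inf_{f \in \mathcal{F}} R(f), uniform
% convergence of the supremum to 0 in probability yields exactly the consistency statement
% R(f_n) \to_P \inf_{f \in \mathcal{F}} R(f) from Section 2.3.
```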
4: Wrap-up and Conclusions
For solid reference material on Statistical Learning Theory, I would recommend the textbooks “All of Statistics” and “All of Nonparametric Statistics” by Larry Wasserman (Professor of Statistics and Machine Learning at Carnegie Mellon), “The Elements of Statistical Learning” by Hastie, Tibshirani, and Friedman of Stanford, and “Statistical Learning Theory” by Vladimir Vapnik.
I look forward to writing future pieces, and please subscribe and follow me here on Medium!