Mathematical Derivations of Boosting Procedures with Full Computational Simulation


1: Introduction

Boosting is a family of ensemble Machine Learning techniques applicable to both discrete and continuous targets. Boosting models take the form of Non-Parametric Additive models, with the additive components most typically specified as “weak learners”. From an empirical risk decomposition perspective, it can easily be shown that the Mean Squared Error (MSE) of any arbitrary statistical estimator is the sum of the estimator’s squared bias and its sampling variance…
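To make the decomposition referenced above concrete (notation mine; \(\hat{\theta}\) denotes an arbitrary estimator of a target parameter \(\theta\)):

```latex
\operatorname{MSE}(\hat{\theta})
  = \mathbb{E}\!\left[(\hat{\theta}-\theta)^{2}\right]
  = \underbrace{\left(\mathbb{E}[\hat{\theta}]-\theta\right)^{2}}_{\text{squared bias}}
  \;+\;
  \underbrace{\operatorname{Var}(\hat{\theta})}_{\text{sampling variance}}
```

The identity follows by adding and subtracting \(\mathbb{E}[\hat{\theta}]\) inside the square and noting that the cross term has expectation zero.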


Hands-on Tutorials

Mathematical Derivations and a Computational Simulation


1: Introduction

Generalized Linear Models (GLMs) play a critical role in fields including Statistics, Data Science, Machine Learning, and other computational sciences.

Part I of this Series provided a thorough mathematical overview, with proofs, of common GLMs in both Canonical and Non-Canonical forms. Part II provided historical and mathematical context for common iterative numerical fitting procedures for GLMs, including Newton-Raphson, Fisher Scoring, Iteratively Reweighted Least Squares, and Gradient Descent.

In the last of this three-part Series, we explore Neural…


Mathematical Derivations and Implementation of Iterative Fitting Techniques with Computational Simulations


1: Background and Motivation

Generalized Linear Models (GLMs) play a critical role in fields including Statistics, Data Science, Machine Learning, and other computational sciences.

In Part I of this Series, we provided a thorough mathematical overview (with proofs) of common GLMs, both in Canonical and Non-Canonical forms. The next problem to tackle is: how do we actually fit GLMs to data?

When looking at GLMs from a historical context, there are three important and closely connected data-fitting procedures:

  • Newton-Raphson
  • Fisher Scoring
  • Iteratively Reweighted Least Squares (IRLS)

I have found that the relationships and motivations of these techniques are often poorly understood, with the…
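As a minimal sketch of why these three procedures are so closely connected (illustrative code of my own, with assumed simulated data, not the Series’ implementation): for a canonical-link GLM such as logistic regression, the observed and expected information matrices coincide, so Newton-Raphson and Fisher Scoring take identical steps, and each step is equivalent to a weighted least squares solve, which is exactly IRLS.

```python
import numpy as np

def fit_logistic_newton(X, y, n_iter=25, tol=1e-10):
    """Newton-Raphson for logistic regression.

    Because the logit is the canonical link, the observed and expected
    information matrices coincide, so Newton-Raphson, Fisher Scoring,
    and IRLS all generate the same sequence of iterates.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))  # fitted probabilities
        W = mu * (1.0 - mu)                      # GLM working weights
        score = X.T @ (y - mu)                   # score (gradient) vector
        info = X.T @ (W[:, None] * X)            # information matrix X'WX
        step = np.linalg.solve(info, score)      # Newton/Fisher/IRLS step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Illustrative simulated data (dimensions and coefficients are assumptions)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=(500, 2))])
true_beta = np.array([-0.5, 1.0, -2.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ true_beta))))
print(fit_logistic_newton(X, y))  # estimates should land near true_beta
```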


Intuition for a Unifying Theory of GLMs with Derivations in Canonical and Non-Canonical Forms


1: Background and Motivation

Generalized Linear Models (GLMs) play a critical role in fields including Statistics, Data Science, Machine Learning, and other computational sciences. This class of models is a generalization of ordinary linear regression that accommodates response variables with error distributions other than the normal distribution.

To refresh our memories on ordinary linear regression:
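A minimal sketch in standard matrix notation (the specific notation here is my assumption):

```latex
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon},
\qquad
\boldsymbol{\varepsilon} \sim \mathcal{N}\!\left(\mathbf{0},\, \sigma^{2}\mathbf{I}_{n}\right),
\qquad
\mathbb{E}[\mathbf{y} \mid \mathbf{X}] = \mathbf{X}\boldsymbol{\beta}
```

GLMs generalize this by modeling a function of the conditional mean (the link) as linear in \(\boldsymbol{\beta}\), with the response distributed according to an exponential-family distribution.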


Efficiency & Statistical Power Gains from Conditional Covariate Adjustment in A/B Testing & Randomized Trials


1: Background and Motivation

Causal Inference is a field that touches several domains and is of interest to a wide range of practitioners including Statisticians, Data Scientists, Machine Learning Scientists, and other Computational Researchers.

To date I have written several pieces on methods/topics in the Causal Inference space. These include:

This piece concerns…


Mathematically Rigorous Derivations and Computational Simulations of FWER and FDR Upper-Bounding Procedures


1: Background and Motivation

Causal Inference is a field of interest to a wide range of practitioners including Statisticians, Data Scientists, Machine Learning Scientists, and other Computational Researchers. With respect to the recovery of unbiased estimates of location parameters of Causal Effects in both randomized and non-randomized settings, I have written several pieces, including:

In this piece, we shift…


Getting Started

Specification of M-Bias, and an Introduction to the Challenges of Causal Discovery


1: Background and Motivation

Causal Inference is a field that touches several domains and is of interest to a wide range of practitioners including Statisticians, Data Scientists, Machine Learning Scientists, and other Computational Researchers. Recovery of unbiased estimates of Causal Effects is at times a tough task, particularly in non-randomized settings. I have written several technical pieces on leveraging G-Methods for the adjustments necessary to recover causal inferences/contrasts of interest; these include pieces on Efficient Sampling Designs in Causal Inference, Doubly Robust Estimation Techniques, G-Estimation of Structural Nested Models, and Marginal Structural Modeling for informative-censoring adjustment in randomized A/B Tests.

This piece is a…


Hands-on Tutorials

Required Adjustment of A/B Tests via G-Methods in the Presence of Informative Censoring


1: Background and Motivation

Causal Inference is of interest to a wide range of practitioners including Statisticians, Data Scientists, Machine Learning Scientists, and other Computational Researchers. Recovery of unbiased estimates of Causal Effects is at times a tough task. In previous pieces on Doubly Robust Estimation and G-Estimation of Structural Nested Models, we discussed leveraging G-Methods in non-randomized settings. We also discussed issues with M-Bias and the identification of confounders in such settings.

Contrary to popular belief, recovering valid estimates of Causal Effects can be just as difficult (and at times the effects are unidentifiable, making recovery impossible) in A/B Testing (aka randomized trials). Regarding the use of…


Hands-on Tutorials

Mathematical Derivation and Computational Simulation of the Semi-Parametric Class of G-Methods


1: Introduction

Causal Inference is a field that touches several domains and is of interest to a wide range of practitioners including Statisticians, Data Scientists, Machine Learning Scientists, and other Computational Researchers. Recovery of unbiased estimates of Causal Effects is at times a tough task, even in randomized settings. This task can be particularly challenging in non-randomized settings, requiring an array of often empirically untestable assumptions to hold that have both mathematical and philosophical implications.

In the interest of recovering unbiased estimates of Mean Causal Effects of an Intervention-Outcome relationship, there are several tools we can leverage. Under the assumptions of correct…


Hands-on Tutorials

Computational Simulation of G-Methods under Model Misspecification


1: Background and Motivation

Causal Inference is a field with wide-ranging implications, from clinical trials and A/B testing to observational and natural experiments; it’s a field that touches nearly every domain and is of interest to many practitioners including Statisticians, Machine Learning Scientists, and Computational Researchers. Recovery of unbiased estimates of Causal Effects is at times a tough task, even in randomized settings. This task can be particularly challenging in non-randomized settings, requiring an array of often empirically untestable assumptions to hold that have both mathematical and philosophical implications.

In the interest of recovering an unbiased estimate of a Mean Causal Effect of an…

Andrew Rothman

Principal Data/ML Scientist @ The Cambridge Group | Harvard-trained Statistician and Machine Learning Scientist | Expert in Statistical ML & Causal Inference
