STAT 991: Topics In Modern Statistical Learning (UPenn, 2022 Spring)
This class surveys advanced topics in statistical learning based on student presentations.
The core topic of the course is uncertainty quantification for machine learning methods. While modern machine learning methods can achieve high prediction accuracy in a variety of problems, it remains challenging to properly quantify their uncertainty. There has been a recent surge of work developing methods for this problem, making it one of the fastest-developing areas in contemporary statistics. This course will survey a variety of problems and approaches, such as calibration, prediction intervals (and sets), conformal inference, and out-of-distribution (OOD) detection. We will discuss both empirically successful/popular methods and theoretically justified ones. See below for a sample of papers.
In addition to the core topic, there may be a (brief) discussion of a few additional topics:
- Influential recent "breakthrough" papers applying machine learning (GPT-3, AlphaFold, etc.), to get a sense of the "real" problems people want to solve.
- Important recent papers in statistical learning theory, to get a sense of progress on the theoretical foundations of the area.
Part of the class will be based on student presentations of papers. We envision a critical discussion of one or two papers per lecture, with several consecutive lectures on the same theme. The goal is to develop a deep understanding of recent research.
See also the syllabus.
Influential recent ML papers
Why are people excited about ML?
- Dermatologist-level classification of skin cancer with deep neural networks
- Language Models are Few-Shot Learners
- Highly accurate protein structure prediction with AlphaFold
- End to End Learning for Self-Driving Cars
Uncertainty quantification
Why do we need to quantify uncertainty? What are the main approaches?
Conformal prediction++
- Vovk et al.'s paper series, and books
- Takeuchi’s prediction regions and theory, and old lecture notes
- Inductive Conformal Prediction (a minimal split-conformal code sketch appears at the end of this list)
- (A few) papers from the CMU group
- Review emphasizing exchangeability: Exchangeability, Conformal Prediction, and Rank Tests
- Predictive inference with the jackknife+. Slides.
- Nested conformal prediction and quantile out-of-bag ensemble methods
- Conditional Validity
- X-conditional validity (already listed above): Mondrian Confidence Machines (also in the Vovk '05 book), Lei & Wasserman '14
- Y-conditional: Classification with confidence
- others: equalized coverage
- Distribution Shift
- (essentially) known covariate shift
- estimated covariate shift, semiparametric efficiency
- testing covariate shift: A Distribution-Free Test of Covariate Shift Using Conformal Prediction
- online gradient descent on the quantile loss: Adaptive Conformal Inference Under Distribution Shift; aggregation
- more general weighted schemes: Conformal prediction beyond exchangeability
- Applications to various statistical models
- Causal estimands and Counterfactuals: Chernozhukov et al, An Exact and Robust Conformal Inference Method for Counterfactual and Synthetic Controls, Cattaneo et al, Lei and Candes, Conformal Inference of Counterfactuals and Individual Treatment Effects
- Quantile regression: Romano et al
- Conditional distribution test: Hu & Lei
- Dependence
- Conformal prediction for dynamic time-series
- Exact and robust conformal inference methods for predictive machine learning with dependent data
- Model-Free Prediction Principle (Politis and collaborators). book, brief paper
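To make the split (inductive) conformal recipe referenced in this list concrete, here is a minimal sketch using absolute-residual nonconformity scores. This is an illustrative sketch, not the method of any particular paper above; the toy data, the RandomForestRegressor base model, and all hyperparameters are placeholder assumptions.

```python
# Minimal split (inductive) conformal prediction sketch with absolute-residual scores.
# Assumes exchangeable data; the base model and toy data are placeholders.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Toy data: y depends on the first feature plus noise (stand-in for a real dataset).
X = rng.normal(size=(2000, 5))
y = X[:, 0] + rng.normal(scale=0.5, size=2000)

# Split into a proper training set and a calibration set.
X_train, y_train = X[:1000], y[:1000]
X_cal, y_cal = X[1000:], y[1000:]

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# Nonconformity scores on the calibration set.
scores = np.abs(y_cal - model.predict(X_cal))

# Finite-sample-corrected quantile: the ceil((n+1)(1-alpha))/n empirical quantile.
alpha = 0.1
n = len(scores)
q_level = np.ceil((n + 1) * (1 - alpha)) / n
q_hat = np.quantile(scores, q_level, method="higher")

# Prediction interval for a new point: [f(x) - q_hat, f(x) + q_hat];
# under exchangeability its marginal coverage is at least 1 - alpha.
x_new = rng.normal(size=(1, 5))
pred = model.predict(x_new)[0]
print((pred - q_hat, pred + q_hat))
```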
Tolerance Regions and Related Notions
- Wilks's original paper, 1941 (a numerical check of the coverage guarantee appears after this list)
- Wald's multivariate extension, 1943
- Tukey's paper series: 1, 2, 3; Fraser & Wormleighton's extensions 1
- Books
- David & Nagaraja: Order statistics, Sec 7.2 (short but good general intro)
- Krishnamoorthy & Mathew: Statistical tolerance regions
- Connections between inductive conformal prediction, training-set conditional validity, and tolerance regions
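A minimal numerical illustration of Wilks's guarantee: for n i.i.d. draws from a continuous distribution, the coverage of the interval between the sample minimum and maximum follows a Beta(n-1, 2) law, regardless of the underlying distribution. The normal toy data and the choices of n and gamma below are arbitrary placeholders.

```python
# Check Wilks's (1941) tolerance-interval guarantee numerically:
# for n i.i.d. draws from a continuous F, the coverage C = F(X_(n)) - F(X_(1))
# of the interval [min, max] is Beta(n-1, 2), independently of F.
import numpy as np
from scipy.stats import beta, norm

n, gamma = 100, 0.9  # sample size and desired population content

# P([X_(1), X_(n)] contains at least a gamma-fraction of the population)
analytic = beta.sf(gamma, n - 1, 2)  # = 1 - n*gamma**(n-1) + (n-1)*gamma**n

# Monte Carlo confirmation with standard normal data (any continuous F works).
rng = np.random.default_rng(0)
samples = rng.normal(size=(20000, n))
coverage = norm.cdf(samples.max(axis=1)) - norm.cdf(samples.min(axis=1))
print(analytic, (coverage >= gamma).mean())
```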
Calibration
- Classics
- Sec 5.a of Robert Miller's monograph: Statistical Prediction by Discriminant Analysis (1962). Calibration is called "validity" here.
- Calibration of Probabilities: The State of the Art to 1980
- A.P. Dawid, The Well-Calibrated Bayesian
- DeGroot & Fienberg, The Comparison and Evaluation of Forecasters, 1983
- Testing
- Early works: Cox, 1958, Miller's monograph above
- Mincer & Zarnowitz: The Evaluation of Economic Forecasts (1969), introducing the idea of regressing the outcomes on the predicted scores (sometimes called Mincer-Zarnowitz regression)
- On Testing the Validity of Sequential Probability Forecasts
- Comparing predictive accuracy
- T-Cal: An optimal test for the calibration of predictive models
- On-line setting (some of it is non-probabilistic):
- Foster & Vohra (1998) Asymptotic Calibration. Biometrika
- Vovk, V. and Shafer, G. (2005) Good randomized sequential probability forecasting is always possible. JRSS-B
- Scoring rules, etc
- Winkler, Scoring rules and the evaluation of probabilities
- Gneiting et al., Probabilistic forecasts, calibration and sharpness
- Modern ML
- On Calibration of Modern Neural Networks; suggests temperature scaling, a Mincer-Zarnowitz-style regression of outcomes on scores, for re-calibration
- Measuring Calibration in Deep Learning (a minimal binned-ECE sketch appears at the end of this section)
- Distribution-free binary classification: prediction sets, confidence intervals and calibration
- Beyond Pinball Loss: Quantile Methods for Calibrated Uncertainty Quantification
- Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control
- Calibration Error for Heterogeneous Treatment Effects
- theory in random features models: A study of uncertainty quantification in overparametrized high-dimensional models
- theory on distance to calibration: A Unifying Theory of Distance from Calibration
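As a concrete reference point for the calibration metrics discussed above, here is a minimal sketch of equal-width-binned expected calibration error (ECE) for binary predictions. The binning scheme, bin count, and toy data are illustrative assumptions; many refinements exist (adaptive binning, debiased estimators, and tests such as T-Cal).

```python
# Minimal sketch of binned expected calibration error (ECE) for binary predictions.
import numpy as np

def binned_ece(probs, labels, n_bins=15):
    """Average |empirical accuracy - mean confidence| over equal-width bins,
    weighted by the fraction of samples falling in each bin."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        in_bin = (probs >= lo) & (probs <= hi) if i == 0 else (probs > lo) & (probs <= hi)
        if in_bin.sum() == 0:
            continue
        conf = probs[in_bin].mean()   # average predicted probability in the bin
        acc = labels[in_bin].mean()   # empirical frequency of the positive class
        ece += in_bin.mean() * abs(acc - conf)
    return ece

# Toy usage: a slightly overconfident predictor.
rng = np.random.default_rng(0)
p_true = rng.uniform(size=5000)
labels = rng.binomial(1, p_true)
probs = np.clip(1.2 * (p_true - 0.5) + 0.5, 0, 1)  # miscalibrated scores
print(binned_ece(probs, labels))
```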
Types of uncertainty
- Der Kiureghian and Ditlevsen: Aleatory or epistemic? Does it matter?
- What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?
Empirics
Bayesian approaches, ensembles
Baseline methods:
- Deep ensembles: Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles (a short code sketch of both baselines appears after this list)
- MC Dropout: Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning; Uncertainty in Deep Learning, Yarin Gal PhD Thesis
- Deep Ensembles Work, But Are They Necessary?
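A minimal PyTorch sketch of these two baselines on a toy 1-d regression task: a deep ensemble averages predictions of independently initialized and trained networks, while MC dropout keeps dropout active at test time and averages stochastic forward passes. The architecture, training loop, and ensemble/pass counts below are placeholder choices, not the setups of the cited papers.

```python
# Sketch of two baseline UQ methods for regression:
# (1) deep ensemble: M independently trained nets, predictions averaged;
# (2) MC dropout: dropout left on at test time, T stochastic passes averaged.
import torch
import torch.nn as nn

def make_net():
    return nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Dropout(p=0.1), nn.Linear(64, 1))

# Toy 1-d regression data.
torch.manual_seed(0)
x = torch.linspace(-3, 3, 200).unsqueeze(1)
y = torch.sin(x) + 0.1 * torch.randn_like(x)

def train(net, epochs=200):
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(net(x), y)
        loss.backward()
        opt.step()
    return net

# Deep ensemble: predictive mean and (epistemic) variance across members.
ensemble = [train(make_net()) for _ in range(5)]
with torch.no_grad():
    preds = torch.stack([net(x) for net in ensemble])  # shape (M, N, 1)
mean_ens, var_ens = preds.mean(0), preds.var(0)

# MC dropout: keep the net in train() mode so dropout stays active, average T passes.
net = ensemble[0]
net.train()
with torch.no_grad():
    mc = torch.stack([net(x) for _ in range(50)])      # shape (T, N, 1)
mean_mc, var_mc = mc.mean(0), mc.var(0)
print(var_ens.mean().item(), var_mc.mean().item())
```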
Other approaches:
Dataset shift
Lectures
Lectures 1-2: Introduction. By Edgar Dobriban.
Lectures 3-8: Conformal Prediction, Calibration. By Edgar Dobriban. Caveat: the notes are handwritten and may be hard to read; they will be typed up in the future.
Lectures 9 onwards: student presentations.
Presentation 1: Deep Learning in Medical Imaging by Rongguang Wang.
Presentation 2: Introduction to Fairness in Machine Learning by Harry Wang.
Presentation 3: Conformal Prediction with Dependent Data by Kaifu Wang.
Presentation 4: Bayesian Calibration by Ryan Brill.
Presentation 5: Conditional Randomization Test by Abhinav Chakraborty.
Presentation 6: Distribution Free Prediction Sets and Regression by Anirban Chatterjee.
Presentation 7: Advanced Topics in Fairness by Alexander Tolbert.
Presentation 8: Calibration and Quantile Regression by Ignacio Hounie.
Presentation 9: Conformal Prediction under Distribution Shift by Patrick Chao and Jeffrey Zhang.
Presentation 10: Testing for Outliers with Conformal p-values by Donghwan Lee.
Presentation 11: Out-of-distribution detection and Likelihood Ratio Tests by Alex Nguyen-Le.
Presentation 12: Online Multicalibration and No-Regret Learning by Georgy Noarov.
Presentation 13: Online Asymptotic Calibration by Juan Elenter.
Presentation 14: Calibration in Modern ML by Soham Dan.
Presentation 15: Bayesian Optimization and Some of its Applications by Seong Han.
Presentation 16: Distribution-free Uncertainty Quantification Impossibility and Possibility I by Xinmeng Huang.
Presentation 17: Distribution-free Uncertainty Quantification Impossibility and Possibility II by Shuo Li.
Presentation 18: Top-label calibration and multiclass-to-binary reductions by Shiyun Xu.
Presentation 19: Ensembles for uncertainty quantification by Rahul Ramesh.
Presentation 20: Universal Inference by Behrad Moniri.
Presentation 21: Typicality and OOD detection by Eric Lei.
Presentation 22: Bayesian uncertainty quantification and dropout by Samar Hadou. (See lec 27 for an introduction).
Presentation 23: Distribution-Free Risk-Controlling Prediction Sets by Ramya Ramalingam.
Presentation 24: Task-Driven Detection of Distribution Shifts by Charis Stamouli.
Presentation 25: Calibration: a transformation-based method and a connection with adversarial robustness by Sooyong Jang.
Presentation 26: A Theory of Universal Learning by Raghu Arghal.
Presentation 27: Deep Ensembles: An introduction by Xiayan Ji.
Presentation 28: Why are Convolutional Nets More Sample-efficient than Fully-Connected Nets? by Evangelos Chatzipantazis.
Presentation 29: E-values by Sam Rosenberg.
Other topics
OOD Detection
- A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks
- Likelihood Ratios for Out-of-Distribution Detection
- Testing for Outliers with Conformal p-values (a minimal conformal p-value sketch appears at the end of this list)
- Conformal Anomaly Detection on Spatio-Temporal Observations with Missing Data
- iDECODe: In-distribution Equivariance for Conformal Out-of-distribution Detection
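A minimal sketch of marginal conformal p-values for outlier/OOD detection, assuming a clean calibration set of nonconformity scores. The score function and toy numbers below are placeholders; any nonconformity score (e.g., from a density model or one-class classifier) could be plugged in.

```python
# Marginal conformal p-values for outlier / OOD detection.
# The p-value of a test point is the (calibration-augmented) fraction of clean
# calibration scores at least as large as its score; under exchangeability with
# the clean data it is (super-)uniform on (0, 1).
import numpy as np

def conformal_pvalue(test_score, cal_scores):
    n = len(cal_scores)
    return (1 + np.sum(cal_scores >= test_score)) / (n + 1)

# Placeholder scores, e.g., negative log-likelihoods under a fitted density model.
rng = np.random.default_rng(0)
cal_scores = rng.normal(size=1000)   # scores of clean calibration points
inlier_score = rng.normal()          # a typical in-distribution score
outlier_score = 5.0                  # an unusually large score

print(conformal_pvalue(inlier_score, cal_scores))    # roughly uniform
print(conformal_pvalue(outlier_score, cal_scores))   # small -> flag as outlier
# With many test points, such p-values can be combined with Benjamini-Hochberg
# for false discovery rate control, as studied in the conformal p-values paper.
```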
Classical statistical goals: confidence intervals, (single and multiple) hypothesis testing
- Hartigan, 1969
- E-values: Calibration, combination, and applications
- Universal Inference; A Note on Universal Inference
- Permutation-based Feature Importance Test (PermFIT)
- Only Closed Testing Procedures are Admissible for Controlling False Discovery Proportions
Inductive biases
- Combining Ensembles and Data Augmentation can Harm your Calibration
- AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty
Reviews, applications, etc
- A review of uncertainty quantification in deep learning: Techniques, applications and challenges
- A Survey of Uncertainty in Deep Neural Networks
- Deeply uncertain: comparing methods of uncertainty quantification in deep learning algorithms
- Empirical Frequentist Coverage of Deep Learning Uncertainty Quantification Procedures
- Aleatoric and Epistemic Uncertainty in Machine Learning. An Introduction to Concepts and Methods
- Uncertainty Baselines. Benchmarks for Uncertainty and Robustness in Deep Learning
- A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification
- Generalized OOD detection
- List of resources on conformal prediction
Learning theory & training methods
- A Theory of Universal Learning
- Orthogonal Statistical Learning
- Unbiased Gradient Estimation in Unrolled Computation Graphs with Persistent Evolution Strategies
- Deep learning: a statistical viewpoint
Distributed learning
- Optimal Complexity in Decentralized Training
- The Min-Max Complexity of Distributed Stochastic Convex Optimization with Intermittent Communication
Other materials
Related educational materials
Recent workshops and tutorials on related topics
- ICML 2021 Workshop on Uncertainty & Robustness in Deep Learning
- Workshop on Distribution-Free Uncertainty Quantification at ICML: 2021, 2022
- Video tutorial by AN Angelopoulos and S Bates
- NeurIPS 2020 Tutorial on Practical Uncertainty Estimation and Out-of-Distribution Robustness in Deep Learning
Seminar series
Software tools
- Uncertainty Toolbox, associated papers
- Uncertainty Baselines
- MAPIE, conformal-type methods
- crepes
- Fortuna; paper
Probability background
- Penn courses STAT 430, STAT 930.
- Stat 110: Probability, Harvard. edX course, book
- Online probability book
ML background
- Penn courses CIS 520, ESE 546, STAT 991, and links therein