SLIDE 1

Augmenting simple models with machine learning

Jim Savage
Data Science Lead, Lendable
@khakieconomist

SLIDE 2

Cheers to

  • Sarah Tan
  • David Miller
  • Chris Edmond
  • Eugene Dubossarsky
SLIDE 3

Outline

  • Estimating causal relationships
  • Proximity score matching
  • The problem of model non-stationarity in time-series models

SLIDE 4

SLIDE 5

Problem #1: drawing causal inference from observational data

Question for the audience: What is a college degree worth? How would you go about estimating it?

SLIDE 6

Experimental vs. observational data

Experimental data = easy causal inference. Observational data = hard “causal” inference.

  • We want to know E(dy | exogenous intervention in X), i.e. how much we expect y to change
  • We have never observed an exogenous intervention in X

SLIDE 7

Not a predictive problem!

  • Predictive models give us E(y|X): fancy correlation
  • Looks the same, but is wildly different
  • In the absence of causal reasoning, more data & fancier models often just make us more certain of the wrong answer.
SLIDE 8

Neyman-Rubin causal model

The fundamental problem of causal inference is that booting up a parallel universe whenever we want to draw causal inference is too much work.

SLIDE 9

How do we estimate treatment effects?

  • Regression with controls (try to take care of the effect of observed confounders)
  • Panel data
  • Natural experiments
  • Matching
SLIDE 10

Regression helps us deal with observed confounders

SLIDE 11

But be careful what you control for

SLIDE 12

Multiple observations of the same unit over time can help control for unobserved confounders that don’t vary over time

SLIDE 13

IV & natural experiments can help… and are difficult to find, impossible to verify

SLIDE 14

Pros and cons

  • All the above methods are better than comparing averages!
  • Often no good natural experiments exist (makes IV hard!)
  • Often we’re worried that unobserved confounders vary over time (fixed-effects assumption violated)
  • Decisions still have to be made
SLIDE 15

Matching methods

  • Idea: build up a control group that is as similar as possible to the treatment group
  • Run your analysis (comparison of groups, regression, etc.) on this sub-group. Discard those who were never likely to take up treatment.

SLIDE 16

Matching methods

  • Idea: build up a control group that is as similar as possible to the treatment group
  • Once you have this “synthetic control”, use some causal model.
  • Pray it has balanced your groups on the factors that matter.

SLIDE 17

Exact matching

  • Pair each treated observation with an untreated observation that is the same on observed covariates.
  • Run out of dimensions very quickly…
SLIDE 18

Matching using a metric

  • Define some metric for matching (Euclidean, Mahalanobis, etc.)
  • Group observations that are “close” in the X space
  • Run analysis on this subset.
  • But which Xs matter?
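A minimal sketch of metric matching using the Mahalanobis distance mentioned above; the data, names, and greedy nearest-neighbour rule are illustrative (the slides don't prescribe an implementation), with SciPy/scikit-learn standing in for whatever tooling the talk used.

```python
# Illustrative data: 100 units, 3 covariates, ~30% treated.
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # covariate matrix
treated = rng.random(100) < 0.3        # treatment indicator

# Mahalanobis distance needs the inverse covariance of the Xs
VI = np.linalg.inv(np.cov(X, rowvar=False))

# Distance from every treated unit to every untreated unit
d = cdist(X[treated], X[~treated], metric="mahalanobis", VI=VI)

# Greedy match: nearest untreated unit for each treated unit (with replacement)
match_idx = d.argmin(axis=1)
control_matched = X[~treated][match_idx]
```

Note that all the Xs enter the metric symmetrically, which is exactly the "but which Xs matter?" problem the next slides address.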
SLIDE 19

Propensity score matching

  • Estimate a model of the “propensity to get treatment”: p(treated | X)
  • For each treated observation, choose an untreated observation whose modelled propensity is closest (or use some other matching technique).
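The two steps above can be sketched with a logistic propensity model and greedy nearest-neighbour matching on the score. This is an illustrative toy in Python/scikit-learn, not the talk's own code; the simulated data make treatment depend on the first covariate.

```python
# Illustrative data where treatment probability depends on X[:, 0]
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
treated = rng.random(500) < 1.0 / (1.0 + np.exp(-X[:, 0]))

# Step 1: estimate p(treated | X) with a logistic model
ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]

# Step 2: match each treated unit to the untreated unit with the
# closest modelled propensity (greedy, with replacement)
t_idx = np.where(treated)[0]
c_idx = np.where(~treated)[0]
matches = c_idx[np.abs(ps[t_idx][:, None] - ps[c_idx][None, :]).argmin(axis=1)]
```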

SLIDE 20

SLIDE 21

Propensity score matching

  • Big problem: the “Smith-Todd” critique
  • Change your propensity model, change your treatment effect. Can be meaningless.
  • Despite this, very widely used (~15k citations)
SLIDE 22

Proximity matching

  • Like metric matching, we match on the Xs
  • Like propensity score matching, we take into account how the Xs affect treatment probability.
  • Use the Random Forest proximity matrix
SLIDE 23

CART

SLIDE 24

The Random Forest

  • Essentially a collection of CART models
  • Each estimated on a random subset of the data
  • In each node, a sample of possible Xs is drawn to be considered for a split
  • Each tree is fairly different.
SLIDE 25

Random forest proximity

  • When two people end up in the same terminal node of a tree, they are said to be proximate
  • The proximity score (i, j) is the proportion of trees in which individuals i and j share a terminal node.
  • We calculate it on held-out observations
  • It is a measure of similarity between two individuals in terms of their Xs
  • But only similarity in terms of the Xs that matter to y
  • A metric-free, scale-invariant, supervised similarity score
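Assuming scikit-learn as a stand-in, the proximity matrix can be computed from the leaf assignments that a fitted forest returns via `apply()`. For brevity this sketch scores all observations rather than the held-out ones the slide recommends, and the data are simulated.

```python
# Simulated data; min_samples_leaf > 1 keeps leaves shared between units
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))
y = X[:, 0] + rng.normal(scale=0.5, size=200)

forest = RandomForestRegressor(
    n_estimators=100, min_samples_leaf=5, random_state=0
).fit(X, y)

# apply() gives the terminal-node id of every observation in every tree
leaves = forest.apply(X)               # shape (n_samples, n_trees)

# proximity[i, j] = share of trees in which i and j land in the same leaf
proximity = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)
```

The result is symmetric, lies in [0, 1], and has ones on the diagonal, so it behaves like the supervised similarity score described above.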

SLIDE 26

SLIDE 27

SLIDE 28

Introduction to analogy weighting

  • Motivation 1: we want parameters most relevant to today.
  • Motivation 2: we want to know when the model is least likely to do a good job.

SLIDE 29

SLIDE 30

Unprincipled approaches

SLIDE 31

Analogy weighting: the idea

  • Train a random forest on the dependent variable of interest, with potentially many Xs
  • Take the proximity matrix from the random forest
  • Use the relevant row from this matrix to weight the observations in your parametric model
  • This is akin to training your model on the relevant history
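A hedged sketch of these steps, reusing the leaf-assignment trick: the proximity of each historical observation to the most recent one becomes a sample weight for a simple parametric model. Names, data, and the choice of linear regression are illustrative.

```python
# Simulated history; the last row plays the role of "today"
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 4))
y = X @ np.array([1.0, -0.5, 0.0, 0.2]) + rng.normal(scale=0.3, size=300)

forest = RandomForestRegressor(
    n_estimators=200, min_samples_leaf=5, random_state=0
).fit(X, y)
leaves = forest.apply(X)               # terminal-node ids, (n_samples, n_trees)

# Proximity of every observation to "today": share of trees where it
# lands in the same leaf as the last observation
prox_today = (leaves == leaves[-1]).mean(axis=1)

# Fit the simple parametric model on the history, weighted by relevance
model = LinearRegression().fit(X[:-1], y[:-1], sample_weight=prox_today[:-1])
```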

SLIDE 32

SLIDE 33

Implementing

  • For very simple models, canned functions normally take a weights argument.
  • For complex models, weights are not normally included.
  • Use Stan
  • Make a direct call to increment_log_prob rather than using sampling notation
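In other words, the trick is to multiply each observation's log-likelihood contribution by its weight before adding it to the target. A toy Python analogue of that weighted-increment idea (not the talk's Stan code), maximising a weighted normal log-likelihood:

```python
# Toy model: weighted MLE of a normal mean/scale, where w plays the
# role of the analogy weights (e.g. a row of the proximity matrix)
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(4)
y = rng.normal(loc=2.0, scale=1.0, size=500)
w = rng.random(500)

def neg_weighted_loglik(params):
    mu, log_sigma = params
    # weighted analogue of: target += normal_lpdf(y | mu, sigma)
    return -np.sum(w * norm.logpdf(y, loc=mu, scale=np.exp(log_sigma)))

fit = minimize(neg_weighted_loglik, x0=np.array([0.0, 0.0]))
mu_hat = fit.x[0]                      # maximiser is the weighted mean of y
```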

SLIDE 34

When should I ignore my model?

SLIDE 35

And when history is not relevant?

SLIDE 36

Covariance in scale-correlation form

Σ = diag(σ) Ω diag(σ)

  • Here, σ is a vector of standard deviations and Ω is a correlation matrix
  • We can give σ a non-negative prior (say, half-Cauchy) and Ω an LKJ prior
  • LKJ is a one-parameter distribution over correlation matrices.
  • Low values of the parameter (approaching 1) give a uniform prior over correlations.
  • High values (approaching infinity) concentrate on the identity matrix.
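The decomposition above in code, with illustrative values for σ and Ω:

```python
import numpy as np

sigma = np.array([1.0, 2.0, 0.5])       # standard deviations
Omega = np.array([[ 1.0, 0.3, -0.2],    # correlation matrix
                  [ 0.3, 1.0,  0.1],
                  [-0.2, 0.1,  1.0]])

# Sigma = diag(sigma) * Omega * diag(sigma)
Sigma = np.diag(sigma) @ Omega @ np.diag(sigma)
```

The diagonal of Σ recovers the variances σ², which is why separating scale (σ) from correlation (Ω) lets us place priors on each piece independently.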
SLIDE 37

Application: volatility modelling during the financial crisis

  • Most volatility models work like so:

returns vector(t) ~ multivariate distribution(expected return(t), covariance(t))

  • The expected-returns model is just a forecasting model
  • Covariance needs to be explicitly modelled
  • Multivariate GARCH is common.
  • CCC-GARCH allows time-varying shock magnitudes
  • DCC allows time-varying correlations that update with correlated shocks

SLIDE 38

LKJ as a “danger prior” in volatility models

  • Idea: when we have relevant histories, we learn the correlation structure from the data.
  • When we have no relevant history, the likelihood does not impact the posterior and we revert to the prior.
  • Using an LKJ prior with a low parameter value gives us highly correlated returns in unprecedented states.

SLIDE 39

SLIDE 40

SLIDE 41

Questions?