SLIDE 1

Economics for Data Science

Chiara Binelli Academic year 2019-2020 Email: chiara.binelli@unimib.it

SLIDE 2

To Explain or to Predict? (Shmueli 2010)

Causal and predictive modelling differ along four main dimensions:

  • 1. Causation (x causes y) vs. association (x is associated with y).
  • 2. Theory (a model-driven approach) vs. data (a data-driven approach to establish how x is related to y).
  • 3. Retrospective (test an existing set of hypotheses) vs. prospective (predict new observations).
  • 4. Bias (focus on minimizing bias to get the correct impact of x on y) vs. bias-variance trade-off (balance bias and variance to get the best predictions).

SLIDE 3

Economics and Machine Learning

  • Economics’ approach:

1. Theory identifies a specific relationship of interest (ex. the impact of going to college on wages).
2. Goal: estimate the causal impact of the variable of interest (ex. going to college) on the outcome (ex. wages).

  • Effort to estimate unbiased effects with carefully constructed standard errors.
  • Supervised machine learning’s approach:

1. Predict how a given outcome varies with a large number of potential predictors.
2. May or may not use prior theory to establish which predictors are relevant.

  • Data-driven model selection to identify meaningful predictive variables.
  • Less attention to statistical significance and more attention to prediction accuracy.

SLIDE 4

Concrete Example: Einav and Levin (2014)

  • Goal: assess if taking online classes improves earnings.
  • Economics Approach:

– Either design an experiment that induces some workers to take online classes for reasons unrelated to their earning potential.

  • e.g. a change in the price of online classes.

– Or, absent the experiment, choose an econometric technique to estimate the unbiased impact of online classes on earnings.
– Focus on:

  • Obtaining a point estimate of the impact of online classes on earnings that is precisely estimated.
  • Discussing whether there are omitted variables that might confound a causal interpretation (e.g. workers’ ambition driving both the decision to take classes and the decision to work harder at the same time).

SLIDE 5

Concrete Example: Einav and Levin (2014)

  • Data Science Approach:

– Identify what variables predict earnings, given a vast set of predictors in the data, and build a model that predicts earnings well, both in sample and out of sample.
– Focus on:

  • A model that predicts earnings both for individuals that have and for individuals that have not taken online classes.

– NOTE: causality and statistical significance are more difficult to assess, since the exact source of variation identifying the impact of a given x on y is difficult to pin down.
SLIDE 6

From Economics to Data Science:

  • 1. Provide a Theory
  • Example: online advertising auctions.
  • Important question for Google or Facebook:

– Which ads to show online and how much to charge for them?

1. Machine learning methods build a predictive model of the likelihood that a user will click on an ad. By exploiting the enormous amount of data available online, this predictive model tells us which ads to show.
2. Economic theory builds auction models to set prices.

  • Several e-commerce companies have built teams of economists (often academic economists with PhDs), statisticians and computer scientists.
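The division of labor on this slide can be made concrete with a toy sketch (entirely invented names and numbers, not any company's actual system): a predictive model supplies each ad's click-through rate, and auction theory supplies the pricing rule, here a generalized second-price rule.

```python
# Toy sketch: combine a predicted click-through rate (the ML piece)
# with a second-price auction rule (the economic-theory piece) to
# choose which ad to show and how much to charge for it.

def run_ad_auction(ads):
    """ads: list of (name, bid_per_click, predicted_ctr) tuples."""
    # Rank ads by expected revenue per impression: bid * predicted CTR.
    ranked = sorted(ads, key=lambda a: a[1] * a[2], reverse=True)
    winner, runner_up = ranked[0], ranked[1]
    # Generalized second-price rule: the winner pays just enough per
    # click to keep its rank: runner-up's score divided by winner's CTR.
    price_per_click = runner_up[1] * runner_up[2] / winner[2]
    return winner[0], round(price_per_click, 4)

ads = [("A", 2.00, 0.05), ("B", 1.50, 0.10), ("C", 3.00, 0.02)]
winner, price = run_ad_auction(ads)
print(winner, price)  # B wins (highest bid*CTR: 0.15); pays 2.00*0.05/0.10 = 1.0
```

Note that the ad with the highest bid (C) does not win: the predicted click probability matters as much as the bid, which is exactly why the predictive model is needed before the auction can be run.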

SLIDE 7

From Economics to Data Science:

  • 2. Focus on Causality

(Pearl 2018)

  • Human-level AI cannot emerge solely from model-blind learning machines; it requires the symbiotic collaboration of data and models.
  • Data science is only as much of a science as it facilitates the interpretation of data - a two-body problem, connecting data to reality.
  • Data alone are hardly a science, regardless of how big they get and how skillfully they are manipulated.

– We need a theory to interpret the data.

SLIDE 8

From Data Science to Economics

  • Test robustness to misspecifications.
  • Extract economically meaningful information from new sources of data (e.g. images and satellite data), innovative research designs, nowcasting.

– When reliable data are missing, such as in measuring poverty: Blumenstock (2016), Jean et al. (2016), Blumenstock, Cadamuro and On (2015).
– Bernheim et al. (2013) use a machine learning algorithm trained on a subset of respondents to a survey to predict actual choices of survey respondents, thus providing a tool to infer actual from reported behavior.

  • Provide better predictions.

– Improve standard estimation techniques by providing good predictive models (e.g. the first stage of IV estimation, the propensity score for matching estimators).
– Answer pure prediction policy problems (Kleinberg et al. 2015).

  • New tools for causal inference.

– Construct counterfactuals for causal inference (Varian 2016).

SLIDE 9

From Data Science to Economics:

  • 1. Test Robustness to Misspecifications

(Athey and Imbens 2015)

  • Researchers interested in the effect of a given variable (x) on an outcome (y) typically report the estimated impact of x on y and a measure of uncertainty of the estimate, such as the standard error (se).
  • Problem: the measure of uncertainty, that is, the statistical significance of the estimate, depends on the model’s specification:

– Different specifications of the model (variables included, functional forms, etc.) produce different estimates of the impact of x on y and associated se.

  • Athey and Imbens (2015) propose a simple machine learning approach to assess the sensitivity of the point estimates to model specification.
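A minimal simulated illustration of this sensitivity (my own toy example, not Athey and Imbens's actual procedure): the estimated coefficient on x changes sharply depending on whether a correlated covariate z is included, here implemented via the Frisch-Waugh partialling-out step.

```python
# Two specifications, two very different point estimates of the effect
# of x on y. The data are simulated so the true effect of x is 1.0,
# but x is correlated with an omitted variable z whose effect is 2.0.
import random

random.seed(0)

def slope(x, y):
    """OLS slope of y on x (with intercept), via demeaned covariances."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

def residualize(x, z):
    """Residuals of x after regressing on z (Frisch-Waugh step)."""
    b = slope(z, x)
    mx, mz = sum(x) / len(x), sum(z) / len(z)
    return [a - mx - b * (c - mz) for a, c in zip(x, z)]

n = 2000
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 1) for zi in z]          # x correlated with z
y = [1.0 * xi + 2.0 * zi + random.gauss(0, 1) for xi, zi in zip(x, z)]

naive = slope(x, y)                                 # omits z: biased up
controlled = slope(residualize(x, z), residualize(y, z))
print(round(naive, 2), round(controlled, 2))        # roughly 2.0 vs 1.0
```

The two specifications disagree by a factor of two, even though both are "the estimated impact of x on y"; systematically mapping out this variation across specifications is the point of the Athey-Imbens exercise.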

SLIDE 10

Prediction Policy Problems (Kleinberg et al. 2015)

  • There are many questions where causal inference is not necessary.

– Example 1: is the chance of rain high enough to require an umbrella? The benefit of an umbrella depends on rain.
– Example 2: are the benefits of a hip surgery high enough to justify the surgery? The benefits of a hip surgery depend on whether the patient lives long enough after the surgery (Kleinberg et al. 2015).
– Example 3: should someone who has been arrested be detained or released before trial? The decision depends on a prediction of the arrestee’s probability of committing a crime.

  • Therefore, these are PURE PREDICTION PROBLEMS.
SLIDE 11

Prediction Policy Problems (Kleinberg et al. 2015)

  • OLS focuses on unbiasedness (it is the best linear unbiased estimator) and can provide poor predictions.
  • Problem: given a dataset D of n points (y, x), pick a function $\hat{f}$ that predicts the y value of a new data point. The goal is to minimize a loss function that we take to be $(y - \hat{f}(x))^2$.
  • OLS finds the $\hat{f}^{OLS}$ that minimizes the in-sample error:

$$\min_{\hat{f}} \sum_{i=1}^{n} (y_i - \hat{f}(x_i))^2$$

  • PROBLEM: ensuring zero bias in sample creates problems out of sample.
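The in-sample/out-of-sample problem on this slide can be illustrated with a small simulation (invented data): a model that achieves zero in-sample error by memorizing the training points loses to a plain OLS line on new data.

```python
# A model that drives the in-sample error to zero (here, one that
# memorizes the training data) predicts new points worse than a
# simple OLS line. All data are simulated: y = 2x + noise.
import random

random.seed(1)

def ols_fit(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x

def memorize_fit(xs, ys):
    # Predict the y of the nearest training x: zero in-sample error.
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x = [random.uniform(0, 10) for _ in range(200)]
train_y = [2 * x + random.gauss(0, 1) for x in train_x]
test_x = [random.uniform(0, 10) for _ in range(200)]
test_y = [2 * x + random.gauss(0, 1) for x in test_x]

ols, memo = ols_fit(train_x, train_y), memorize_fit(train_x, train_y)
print(mse(memo, train_x, train_y))                           # 0.0 in sample
print(mse(ols, test_x, test_y) < mse(memo, test_x, test_y))  # True
```

The memorizer has exactly zero in-sample error, yet out of sample its MSE is roughly twice the noise variance, while the OLS line stays close to the noise floor: minimizing only in-sample error is the wrong target for prediction.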

SLIDE 12

Prediction Policy Problems (Kleinberg et al. 2015)

  • MSE at the new point x decomposes into variance plus squared bias:

$$MSE(x) = E\big[(\hat{f}(x) - y)^2\big] = \underbrace{E\big[(\hat{f}(x) - E[\hat{f}(x)])^2\big]}_{\text{Variance}} + \underbrace{\big(E[\hat{f}(x)] - y\big)^2}_{\text{Bias}^2}$$

  • By ensuring 0 bias, OLS allows no trade-off.

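The decomposition above can be checked numerically (a toy simulation with invented numbers): for empirical moments, MSE = Variance + Bias² holds as an exact algebraic identity.

```python
# Numerical check of MSE(x) = Variance + Bias^2, using a deliberately
# biased estimator (a shrunken sample mean) of a known target value.
import random

random.seed(2)

target = 3.0          # the true y we want to predict
reps = 5000
estimates = []
for _ in range(reps):
    sample = [random.gauss(target, 1.0) for _ in range(10)]
    estimates.append(0.5 * sum(sample) / len(sample))  # shrink: biased

e_hat = sum(estimates) / reps                    # empirical E[f_hat]
mse = sum((e - target) ** 2 for e in estimates) / reps
variance = sum((e - e_hat) ** 2 for e in estimates) / reps
bias_sq = (e_hat - target) ** 2

print(round(mse, 3), round(variance + bias_sq, 3))   # identical
print(round(bias_sq, 2))   # about (0.5*3 - 3)^2 = 2.25
```

Shrinking the sample mean by half buys a fourfold reduction in variance at the cost of a large squared bias; whether that trade is worth making is exactly the question the next slide's regularized objective answers.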
SLIDE 13

Prediction Policy Problems (Kleinberg et al. 2015)

  • Machine learning maximizes predictive performance by exploiting this bias-variance trade-off.
  • Instead of minimizing only the in-sample error, ML minimizes:

$$\min_{\hat{f}} \underbrace{\sum_{i=1}^{n} (y_i - \hat{f}(x_i))^2}_{\text{Bias}} + \underbrace{\lambda R(f)}_{\text{Variance}}$$

  • R(f) is a regularizer that penalizes functions that create variance, with $R(f) = \sum_j |\beta_j|^d$, $d \geq 1$; $\lambda$ is the price at which we trade off variance against bias.
  • OLS: $\lambda = 0$; LASSO: $d = 1$; RIDGE: $d = 2$.
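A one-variable sketch of this penalized objective with the ridge penalty (d = 2) on simulated data: minimizing $\sum_i (y_i - b x_i)^2 + \lambda b^2$ has the closed form $b(\lambda) = \sum x y / (\sum x^2 + \lambda)$, which reduces to OLS at $\lambda = 0$ and shrinks toward zero as $\lambda$ rises.

```python
# Ridge (d = 2) in one variable, no intercept: the closed-form slope
# shows the role of lambda directly. Data simulated with true slope 2.
import random

random.seed(3)

def ridge_slope(xs, ys, lam):
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)        # minimizer of sum((y-bx)^2) + lam*b^2

xs = [random.gauss(0, 1) for _ in range(100)]
ys = [2.0 * x + random.gauss(0, 1) for x in xs]

b_ols = ridge_slope(xs, ys, 0.0)        # lam = 0 recovers OLS
b_mid = ridge_slope(xs, ys, 50.0)
b_big = ridge_slope(xs, ys, 1e6)
print(round(b_ols, 2))                   # near the true slope 2.0
print(b_ols > b_mid > b_big > 0)         # True: shrinks toward zero
```

Raising $\lambda$ buys lower variance (a more stable, smaller coefficient) at the price of bias, which is precisely the trade the slide's objective makes explicit.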

SLIDE 14

Prediction and Causal Questions (Athey 2017)

  • Pure prediction problems do not answer the more complex question of estimating heterogeneous effects.
  • Hip surgery example: the benefits of a hip surgery depend on whether the patient lives long enough after the surgery (Kleinberg et al. 2015).
  • We know the effect of the treatment is negative for the patients that will die, so for them it is easy to decide against surgery.
  • However, an important open question remains: which patients should be given priority to receive surgery among the ones that are likely to survive more than one year?
  • This is a causal question that requires estimating counterfactual scenarios of the effects of alternative policies of assigning patients to hip surgeries.

SLIDE 15

From Data Science to Economics:

  • 2. New Tools for Causal Inference
  • When estimating the causal effect of a given treatment, we want to compare the observed outcome with the hypothetical outcome in the absence of the treatment (the counterfactual).
  • Machine learning methods can be used to build the best predictive model for the counterfactual without the (sometimes excessive) monetary costs of running a randomized controlled experiment.
  • Ex.: compare actual visits to a website following an advertising campaign (observed outcome) to the predicted visits absent the advertisement (counterfactual outcome), using time series data on past visits, seasonal effects, and data on Google queries (pages 22-24, Varian 2014).

SLIDE 16
From Data Science to Economics:

  • 2. New Tools for Causal Inference
  • Allow to flexibly control for a large number of covariates and for heterogeneous effects.
  • Very promising and emerging literature that combines machine learning methods with standard applied econometrics techniques to improve the estimation of causal effects (Section 4.2 in Athey and Imbens 2016).

– Random forests, boosting or LASSO to estimate the propensity score in the presence of many covariates (matching).
– Improved LASSO through a double-selection method: select covariates that are correlated with the outcome and covariates that are correlated with the treatment, so that LASSO can produce causal effects (Belloni et al. 2013).
– Wager and Athey (2015): a modification of random forests that produces asymptotically unbiased estimates with confidence intervals.
SLIDE 17

From Data Science to Economics:

  • 2. New Tools for Causal Inference

Taking a broader perspective, two main types of questions:

  • 1. Quantify how a given treatment affects the subjects:

– Examples: effect of a drug on health outcomes; effect of class size on students’ learning; effect of an ad campaign on consumers’ spending (Varian 2014).

 Classic treatment-control group comparison; machine learning can help by building counterfactuals through predictions.

  • 2. Quantify how a given treatment affects the “experimenter”:

– Example: if I increase ad expenditure by x%, how many extra sales will I get?
– The answer depends on how consumers respond to the ad, but we do not need to model how they respond.

  • Example: we care about the increase in the number of visits to our website rather than about how this happened (more clicks on a given ad, more search queries, etc.).

 Machine learning is essential to build a predictive model for the counterfactual.

SLIDE 18

Example of Type 2 Question: Varian (2016, PNAS)

  • Example of an ad campaign.

– Question: if I increase ad expenditure by x%, how many extra sales will I get?

  • The advertiser increases ad spend for a given period of time and would like to compare the sales after the increase with what would have happened to sales without the increase in ad expenditure.

– NOTE: this differs from “pure prediction problems”, where causal inference is not necessary.

  • How to compute the counterfactual?

– With a predictive model using data from before the treatment.

SLIDE 19

Example of Type 2 Question: Varian (2016, PNAS)

  • For type 1 questions (effect of treatment on subjects):

– Treated and untreated (control) groups.
– Comparison of outcomes between treated and control groups.

  • For type 2 questions (effect of treatment on the experimenter):

– All subjects are treated for a given period.
– 4-step process (TTTC) to build and use a predictive model:

  • 1. TRAIN: use machine learning tools to tune the model’s parameters.
  • 2. TEST: apply the model to a test set to check how well it performs.
  • 3. TREAT: apply the model during the treatment period to predict the counterfactual.
  • 4. COMPARE: compare what actually happened to the treated with the prediction (given by the model) of what would have happened without the treatment.
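The four TTTC steps can be sketched on simulated data (all numbers invented; the simple linear trend model stands in for whatever machine learning tools would be used in practice):

```python
# TTTC sketch: fit a trend on the pre-treatment period (TRAIN), check
# it on a held-out pre-treatment window (TEST), predict the treatment
# period (TREAT), and compare predictions with actuals (COMPARE).
import random

random.seed(4)

def fit_trend(ts, ys):
    """OLS line y = a + b*t fitted on pre-treatment data."""
    mt, my = sum(ts) / len(ts), sum(ys) / len(ys)
    b = sum((t - mt) * (y - my) for t, y in zip(ts, ys)) / \
        sum((t - mt) ** 2 for t in ts)
    return lambda t: (my - b * mt) + b * t

# Weeks 0-39: pre-treatment; weeks 40-49: ad campaign adds ~20 sales/week.
sales = [100 + 1.0 * t + random.gauss(0, 2) for t in range(40)]
sales += [100 + 1.0 * t + 20 + random.gauss(0, 2) for t in range(40, 50)]

model = fit_trend(list(range(30)), sales[:30])                        # TRAIN
test_err = sum(abs(model(t) - sales[t]) for t in range(30, 40)) / 10  # TEST
counterfactual = [model(t) for t in range(40, 50)]                    # TREAT
lift = sum(sales[t] - counterfactual[t - 40] for t in range(40, 50)) / 10
print(round(test_err, 1))   # small: the model tracks the pre-period
print(round(lift))          # COMPARE: close to the true effect of 20
```

There is no control group anywhere in this sketch: the counterfactual comes entirely from the model trained and tested on pre-treatment data, which is the key difference from the classic design discussed on the next slide.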

SLIDE 20

Example of Type 2 Question: Varian (2016, PNAS)

  • The TTTC process is a generalization of the classic treatment-control approach to experiments.
  • Key difference:

– The classic approach requires a control group, which provides an estimate of the counterfactual.
– TTTC allows constructing a predictive model of the counterfactual even if we do NOT have a true control group.

  • NOTE: TTTC estimates only the TTE (the average effect of the treatment on the treated).

SLIDE 21

Framework of Potential Outcomes (Rubin’s Causal Model)

  • Each individual i has two potential outcomes:

– Potential outcome without treatment: $Y_{0i}$
– Potential outcome with treatment: $Y_{1i}$

 Treatment effect: $\tau_i = Y_{1i} - Y_{0i}$ for each individual, but only one of the two outcomes is observed.

  • $D_i = 1$ if i receives treatment, else $D_i = 0$.
  • Observed outcome: $Y_i = D_i Y_{1i} + (1 - D_i) Y_{0i}$
  • If individual i is treated:

– $Y_{1i}$ is observed, $Y_{0i}$ is a counterfactual.

  • If individual i is not treated:

– $Y_{0i}$ is observed, $Y_{1i}$ is a counterfactual.
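A toy simulation of this setup (invented numbers): every unit has both potential outcomes, but the observed-outcome rule reveals only one of them, and under random assignment the treated-control difference in observed outcomes recovers the average treatment effect.

```python
# Rubin's potential outcomes in miniature: we generate Y0 and Y1 for
# everyone, but only observe Y = D*Y1 + (1-D)*Y0. With random D, the
# difference in observed means recovers the average treatment effect.
import random

random.seed(5)

n = 20000
y0 = [random.gauss(10, 2) for _ in range(n)]     # outcome without treatment
y1 = [a + 3.0 for a in y0]                       # true effect: +3 for all
d = [random.random() < 0.5 for _ in range(n)]    # random assignment
y_obs = [y1[i] if d[i] else y0[i] for i in range(n)]  # only one is seen

treated = [y_obs[i] for i in range(n) if d[i]]
control = [y_obs[i] for i in range(n) if not d[i]]
diff = sum(treated) / len(treated) - sum(control) / len(control)
print(round(diff, 1))   # close to the true effect of 3.0
```

If `d` were instead correlated with `y0` (individuals with better untreated outcomes selecting into treatment), the same difference in means would be biased, which is why the selection-on-observables and selection-on-unobservables strategies on the next slide exist.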

SLIDE 22

Different Approaches to Program Evaluation

1. Run an experiment and use the simple difference estimator.
2. Use observational data to construct the counterfactual:

  • a. Selection on observables:
  • “Unconfoundedness assumption”: we assume to observe all X variables that affect both the participation decision or treatment (ex. completing college) and the outcome of interest (ex. wages).
  • Diff-in-diff
  • Matching
  • Regression discontinuity
  • b. Selection on unobservables:
  • Instrumental variable estimation
  • Control function approach
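One estimator from the list, diff-in-diff, sketched on invented numbers: subtract the control group's pre/post change from the treated group's pre/post change to net out a common time trend.

```python
# Diff-in-diff on toy data: both groups drift up by 5 over time, and
# the treatment adds another 2 on top for the treated group only.
def did(treat_pre, treat_post, ctrl_pre, ctrl_post):
    mean = lambda xs: sum(xs) / len(xs)
    # (change in treated group) - (change in control group)
    return (mean(treat_post) - mean(treat_pre)) - \
           (mean(ctrl_post) - mean(ctrl_pre))

treat_pre, treat_post = [10.0, 11.0, 9.0], [17.0, 18.0, 16.0]
ctrl_pre, ctrl_post = [8.0, 9.0, 7.0], [13.0, 14.0, 12.0]
print(did(treat_pre, treat_post, ctrl_pre, ctrl_post))  # 2.0
```

A naive post-period comparison (17 vs 13) would attribute 4 to the treatment; differencing out the shared trend of 5 isolates the treatment effect of 2, which is what the common-trends assumption behind diff-in-diff buys.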
SLIDE 23

Jokes on Data Science

SLIDE 24

However, Machine Learning is Very Helpful

  • Machine learning can be very helpful to implement policies and solve real-life problems.
  • Example: the Out for Justice project

https://seanjtaylor.github.io/out-for-justice/

  • Goals of the project:
  • 1. Evaluate potential positions for patrol cars to respond quickly to different types of crimes.
  • 2. Set up good car positions: take an existing set of patrol car positions and show how to readjust them to improve response times.