Economics for Data Science
Chiara Binelli
Academic year 2019-2020
Email: chiara.binelli@unimib.it

Data Science and Economics
• Economics approach:
  1. Have a theory that identifies a relationship of interest (e.g. the impact of completing college on wages).
  2. Estimate the impact of a treatment (e.g. completing college) on an outcome variable (e.g. wages), holding everything else constant.
  → Focus on a few coefficients of interest to estimate causal effects.
  → Effort to estimate unbiased effects with carefully constructed standard errors.
• Data Science approach (data-driven):
  1. Predict how a given outcome varies with a large number of potential predictors.
  2. May or may not use prior theory to establish which predictors are relevant.
  → Data-driven model selection to identify meaningful predictive variables.
  → Less attention to statistical uncertainty and standard errors, and more to model uncertainty.

Data Science and Economics
Two main limitations of the Data Science approach:
1. Lack of theory: the approach is data-driven (predictive models are chosen using data-driven cross-validation methods).
2. Lack of statistical significance: the focus is on predictions that minimize mean squared error, without much attention to statistical significance, since the exact source of variation identifying the prediction is difficult to assess.
  – Thus, bias is allowed in order to reduce variance.
  – Example: LASSO penalizes the inclusion of covariates, so if two covariates are correlated, often only one is included, and its parameter then reflects the impact of both the included and the excluded covariate → omitted variable bias (OVB)!
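The LASSO example can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference implementation. The data-generating process, the coefficient values (`beta1`, `beta2`), the correlation `rho` and the penalty `lam` are all hypothetical and picked by hand so that the second covariate is dropped. With a smaller penalty or a weaker correlation, both covariates could be kept. The helper names (`lasso_two_covariates`, `soft_threshold`, and so on) are invented for this sketch.

```python
import random

def center(values):
    """Subtract the mean so that no intercept is needed."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def mean_product(u, v):
    """Sample analogue of E[u * v] for centered series."""
    return sum(a * b for a, b in zip(u, v)) / len(u)

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; return exactly 0.0 inside [-lam, lam]."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_two_covariates(x1, x2, y, lam, sweeps=100):
    """Coordinate descent for (1/2n)*RSS + lam*(|b1| + |b2|) on centered data."""
    cols = [x1, x2]
    c = [mean_product(col, y) for col in cols]
    g = [[mean_product(a, b) for b in cols] for a in cols]
    b = [0.0, 0.0]
    for _ in range(sweeps):
        for j in (0, 1):
            k = 1 - j
            b[j] = soft_threshold(c[j] - g[j][k] * b[k], lam) / g[j][j]
    return b

# Hypothetical data: x2 is correlated with x1 and both truly affect y.
rng = random.Random(0)
n, rho, beta1, beta2 = 20000, 0.7, 2.0, 0.3
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [rho * a + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1) for a in x1]
y = [beta1 * a + beta2 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
x1, x2, y = center(x1), center(x2), center(y)

b_lasso = lasso_two_covariates(x1, x2, y, lam=0.8)

# OLS refit on the selected covariate only: x2 is now an omitted variable.
b_refit = mean_product(x1, y) / mean_product(x1, x1)

print("LASSO coefficients (x1, x2):", b_lasso)
print("OLS on x1 alone:", b_refit, "vs true effect of x1:", beta1)
```

In this setup the coefficient on x2 is set to zero, and the refitted coefficient on x1 sits above its true value because it absorbs part of the effect of the excluded, correlated x2.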

Data Science and Economics
• Economists are often interested in assessing the effectiveness of a policy or testing theories that predict a causal relationship.
  – The main goal is to identify statistically significant causal effects.
  – A model with a high degree of predictive fit is seen as secondary to finding an empirical specification that identifies a causal effect.
• Common Data Science techniques such as classification and regression trees, LASSO, boosting, and cross-validation have not been much used in Economics.

Data Science and Economics
• Concrete example (Einav and Levin 2014): assess whether taking online classes improves earnings.
• Economics approach:
  – Either design an experiment that induces some workers to take online classes for reasons unrelated to their earning potential.
    • e.g. a change in the price of online classes.
  – Or, absent the experiment, use observational data to estimate the impact of online classes on earnings in an unbiased way.
  – Focus on:
    • Obtaining a precisely estimated point estimate of the impact of online classes on earnings.
    • Discussing whether there are omitted variables that might confound a causal interpretation (e.g. workers’ ambition driving a decision to take classes and work harder at the same time).
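A small simulation can illustrate why the experiment matters in this example. It is only a sketch: the data-generating process is hypothetical (an unobserved `ambition` variable raises both the chance of taking classes and earnings), the true effect of `1.0` and every other number are made up for illustration, and `slope` and `simulate` are names invented here.

```python
import random

def slope(x, y):
    """OLS slope of y on a single regressor x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate(randomized, n=20000, true_effect=1.0, seed=0):
    """Return the estimated effect of taking classes on earnings."""
    rng = random.Random(seed)
    took_class, earnings = [], []
    for _ in range(n):
        ambition = rng.gauss(0, 1)  # unobserved by the analyst
        if randomized:
            # Experiment: assignment is unrelated to earning potential.
            d = 1 if rng.random() < 0.5 else 0
        else:
            # Observational data: ambitious workers take classes more often.
            d = 1 if ambition + rng.gauss(0, 1) > 0 else 0
        y = true_effect * d + 2.0 * ambition + rng.gauss(0, 1)
        took_class.append(d)
        earnings.append(y)
    return slope(took_class, earnings)

naive = simulate(randomized=False)
experimental = simulate(randomized=True)
print("observational estimate:", naive)
print("experimental estimate:", experimental)
```

Under these assumptions the observational estimate is well above the true effect, because ambition is an omitted variable, while the experimental estimate is close to it.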

Data Science and Economics
• Data Science approach:
  – Identify which variables predict earnings, given a vast set of predictors in the data, with the aim of building a model that predicts earnings well, both in sample and out of sample.
  – Focus on:
    • A model that predicts earnings both for individuals who have and who have not taken online classes.
  – NOTE: the focus is not on causal effects and statistical significance but rather on prediction.

Machine Learning and Statistical Inference
• The flexibility of machine learning algorithms means that two different functions that use different variables can produce similar predictions:
  – In traditional estimation, large standard errors express the uncertainty in attributing effects.
  – In machine learning, the counterpart is a lack of consistency in model selection: how to measure this?
• Computing standard errors in machine learning algorithms is difficult due to the data-driven approach:
  – Leeb and Pötscher (2006, 2008) develop conditions under which it is impossible to compute consistent estimates of model parameters after data-driven selection.

Big Data and Statistical Inference
• When big data represent all the data for a given set of variables, should we compute standard errors?
• Very much YES!
  – The error of a model comes from two sources: omitted variables and measurement error.
    • Omitted variables error: some relevant explanatory variables are omitted and thus end up in the error term.
    • Measurement error: the dependent variable is measured with error.

Big Data and Statistical Inference
• Sampling error is very different from model error: it is the difference between the sample-based regression results and the results based on the full population.
• Probability theory tells us that, for a well-constructed sample, regression coefficients are unbiased estimates of population regression coefficients.
• Tests of statistical significance are relevant both for samples and for entire data populations.
  – To read more on this, see Babones, S. J. 2013. Statistical Modeling with Cross-Sectional Designs, Chapter 5, pp. 107-118.
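The distinction between sampling error and model error can be made concrete with a toy "population". This is a sketch under assumptions: the population of 10,000 units, the omitted variable and the coefficient of `1.0` are all hypothetical, and `ols_slope_and_se` is a helper written for this example using the conventional OLS standard error formula.

```python
import random

def ols_slope_and_se(x, y):
    """OLS slope of y on x (with intercept) and its conventional standard error."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    se = (rss / (n - 2) / sxx) ** 0.5
    return slope, se

rng = random.Random(1)
N = 10000  # hypothetical: these units are ALL the data that exist
x = [rng.gauss(0, 1) for _ in range(N)]
omitted = [rng.gauss(0, 1) for _ in range(N)]  # relevant, but not in the model
y = [1.0 * a + z for a, z in zip(x, omitted)]

# Full population: no sampling error, yet the model error is still there.
pop_slope, pop_se = ols_slope_and_se(x, y)

# A random sample of 200 units adds sampling error on top of model error.
idx = rng.sample(range(N), 200)
sample_slope, sample_se = ols_slope_and_se([x[i] for i in idx],
                                           [y[i] for i in idx])

print("population slope and s.e.:", pop_slope, pop_se)
print("sample slope and s.e.:", sample_slope, sample_se)
print("sampling error of the slope:", sample_slope - pop_slope)
```

Even with the entire population in hand, the residual variance is not zero because the omitted variable sits in the error term, so the standard error remains informative about model error.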

Data Science and Economics
• Thanks to theory, the Economics approach is more interpretable in terms of which variation identifies the impact of interest and its statistical significance.
• The Data Science approach is better for predictions:
  – Examples: comparison of the performance of OLS vs machine learning algorithms (regression trees, random forest, LASSO, ensemble) in Mullainathan and Spiess (2017, Journal of Economic Perspectives); advantages of using ensemble methods to improve predictions (Athey et al. 2019).
  – Intuition: machine learning algorithms easily allow introducing pairwise interactions between all potential predictors.
• The two approaches have mutual benefits.
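The intuition about pairwise interactions can be sketched as follows. The predictor names and values are hypothetical, and `add_pairwise_interactions` is a helper invented for this illustration. With k predictors there are k(k-1)/2 pairwise products, which is why writing them by hand quickly becomes impractical and automated feature construction helps.

```python
from itertools import combinations

def add_pairwise_interactions(row):
    """Return a copy of row with the product of every pair of predictors added."""
    expanded = dict(row)
    for a, b in combinations(sorted(row), 2):
        expanded[f"{a}*{b}"] = row[a] * row[b]
    return expanded

# Hypothetical worker with three predictors of earnings.
worker = {"age": 30.0, "schooling": 16.0, "tenure": 4.0}
expanded = add_pairwise_interactions(worker)

for name, value in expanded.items():
    print(name, value)
```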

Economics for Data Science
• From Economics to Data Science: 2 main contributions
  1. Provide a theory: theory to ask interesting questions and to analyze complex big datasets. With complex data, it is crucial to have models to guide the choice of variables, the relationships between variables, the hypotheses to test and the experiments to run.
  2. Focus on causality: crucial to answer important questions.
• From Data Science to Economics: 3 main contributions
  1. Test robustness to misspecification (Athey and Imbens 2015).
  2. New tools for causal inference.
  3. Better predictions.

From Economics to Data Science: 1. Provide a Theory
• Example: online advertising auctions.
• Important question for Google or Facebook:
  – Which ads to show online and how much to charge for the ads?
    1. Machine learning methods to build a predictive model that assesses the likelihood that a user will click on an ad. By exploiting the enormous amount of data available online, this predictive model tells us which ads to show.
    2. Economic theory to build auction models to set prices.
• Several e-commerce companies have built teams of economists (often with PhDs in Economics), statisticians and computer scientists.

From Economics to Data Science: 1. Provide a Theory
• A theory is a way to investigate the mechanism through which X affects Y. It is a way to make ML interpretable.
• “Interpretable machine learning”: an ML field that goes beyond a “black box” approach to explain the logic behind predictions.
  – To interpret a model, we require the following insights:
    1. Which features are the most important.
    2. For any single prediction, the effect of each feature in the data on that particular prediction.
    3. The effect of each feature over a large number of possible predictions.
• Molnar (2019): https://christophm.github.io/interpretable-ml-book/ and Kaggle crash course on ML explainability: https://www.kaggle.com/learn/machine-learning-explainability
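One common way to approach the first insight (which features matter most) is permutation importance: shuffle one feature at a time and see how much the prediction error grows. The sketch below assumes this technique. The slides do not prescribe it. The "fitted" model is a hypothetical fixed function, the data are simulated, and all names are invented for illustration.

```python
import random

def model(row):
    """Hypothetical already-fitted predictor; it ignores the third feature."""
    return 3.0 * row[0] + 0.5 * row[1]

def mse(rows, targets):
    """Mean squared prediction error of the model on the given data."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(targets)

def permutation_importance(rows, targets, col, rng):
    """Increase in MSE after shuffling the values of one feature column."""
    shuffled = [r[col] for r in rows]
    rng.shuffle(shuffled)
    permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
    return mse(permuted, targets) - mse(rows, targets)

# Simulated data with three features; the outcome depends on the first two.
rng = random.Random(0)
rows = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(2000)]
targets = [3.0 * r[0] + 0.5 * r[1] + rng.gauss(0, 0.5) for r in rows]

importance = [permutation_importance(rows, targets, c, rng) for c in range(3)]
for c, value in enumerate(importance):
    print(f"feature {c}: increase in MSE = {value}")
```

A feature the model relies on heavily produces a large increase in error when shuffled, while a feature the model ignores produces none. Note that this ranks features by their role in prediction, not by their causal effect.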

From Economics to Data Science: 2. Focus on Causality
• Machine learning algorithms optimize properties of the observed data: they improve performance by optimizing parameters over a set of inputs. E.g. to build a predictive model we minimize overfitting.
  – “As long as we optimize some properties of the observed data, however noble or sophisticated, while making no reference to the world outside the observed data, we are limited to questions of association.” Pearl (2018)
• However, lots of important questions involve cause-and-effect relationships. Until recently we had no mathematical framework to articulate and answer these questions.
• “More has been learned about causal inference in the last few decades than the sum total of everything that had been learned about it in all prior recorded history.” Gary King, Harvard. “The Causal Revolution” (Pearl and Mackenzie, 2018).

From Economics to Data Science: 2. Focus on Causality (Pearl 2018)
• Human-level AI cannot emerge solely from model-blind learning machines; it requires the symbiotic collaboration of data and models.
• Data science is only as much of a science as it facilitates the interpretation of data: a two-body problem, connecting data to reality.
• Data alone are hardly a science, regardless how big they get and how skillfully they are manipulated.
  – We need a theory to interpret the data.
