Prediction, Estimation, and Attribution Bradley Efron - PowerPoint PPT Presentation

Prediction, Estimation, and Attribution Bradley Efron brad@stat.stanford.edu Department of Statistics Stanford University

Regression Gauss (1809), Galton (1877) › Prediction random forests, boosting, support vector machines, neural nets, deep learning › Estimation OLS, logistic regression, GLM: MLE › Attribution (significance) ANOVA, lasso, Neyman–Pearson Bradley Efron, Stanford University Prediction, Estimation, and Attribution 2 36

Estimation Normal Linear Regression › Observe y i = — i + › i for i = 1 ; : : : ; n — i = x t i ˛ x i a p -dimensional covariate y n = X n ˆ p ˛ p + ǫ n › i ‰ N (0 ; ff 2 ) ˛ unknown › Surface plus noise y = µ ( x ) + ǫ › Surface f µ ( x ) ; x 2 Xg : codes scientific truth (hidden by noise) › Newton’s second law acceleration = force / mass Bradley Efron, Stanford University Prediction, Estimation, and Attribution 3 36

Newton's 2nd law: acceleration=force/mass a c c e l e r a t i o n mass e c r o f Bradley Efron, Stanford University Prediction, Estimation, and Attribution 4 36

If Newton had done the experiment A c c e l e r a t i o n mass e c r o f Bradley Efron, Stanford University Prediction, Estimation, and Attribution 5 36

Example The Cholesterol Data › n = 164 men took cholostyramine › Observe ( c i ; y i ) c i = normalized compliance (how much taken) y i = reduction in cholesterol y i = x t › Model i ˛ + › i x t i = (1 ; c i ; c 2 i ; c 3 › i ‰ N (0 ; ff 2 ) i ) › n = 164 , p = 4 Bradley Efron, Stanford University Prediction, Estimation, and Attribution 6 36

OLS cubic regression: cholesterol decrease vs normalized compliance; bars show 95% confidence intervals for the curve. Adj Rsquared =.481 ● ● 100 ● ● ● ● ● ● ● ● 80 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 60 ● ● cholesterol decrease ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● 40 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 20 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● −20 ● ● ● ● −2 −1 0 1 2 normalized compliance sigmahat=21.9; only intercept and linear coefs significant Bradley Efron, Stanford University Prediction, Estimation, and Attribution 7 36

Neonate Example › n = 800 babies in an African facility › 600 lived, 200 died › 11 covariates: apgar score, body weight, . . . › Logistic regression n = 800 , p = 11 ‰ 800 ˆ 11 ; binomial) glm ( y X 800 y i = 1 or 0 as baby dies or lives x i = i th row of X (vector of 11 covariates) › Linear logistic surface, Bernoulli noise Bradley Efron, Stanford University Prediction, Estimation, and Attribution 8 36

Output of logistic regression program predictive error 15% estimate st.error z -value p -value gest ` : 474 .163 ` 2 : 91 .004** ap ` : 583 .110 ` 5 : 27 .000*** bwei ` : 488 .163 ` 2 : 99 .003** resp .784 .140 5.60 .000*** cpap .271 .122 2.21 .027* ment 1.105 .271 4.07 .000*** ` : 089 ` : 507 rate .176 .612 hr .013 .108 .120 .905 head .103 .111 .926 .355 gen ` : 001 .109 ` : 008 .994 temp .015 .124 .120 .905 Bradley Efron, Stanford University Prediction, Estimation, and Attribution 9 36

Prediction Algorithms Random Forests, Boosting, Deep Learning, . . . › Data d = f ( x i ; y i ) ; i = 1 ; 2 ; : : : ; n g y i = response x i = vector of p predictors (Neonate: n = 800 , p = 11 , y = 0 or 1) › Prediction rule f ( x; d ) New ( x , ?) gives ^ y = f ( x; d ) Go directly for high predictive accuracy; › Strategy forget (mostly) about surface + noise › Machine learning Bradley Efron, Stanford University Prediction, Estimation, and Attribution 10 36

Classification Using Regression Trees › n cases: n 0 = “0” and n 1 = “1” › p predictors (features) (Neonate: n = 800 ; n 0 = 600 ; n 1 = 200 ; p = 11 ) › Split into two groups with predictor and split value chosen to maximize difference in rates › Then split the splits, etc.. . . (some stopping rule) Bradley Efron, Stanford University Prediction, Estimation, and Attribution 11 36

Classification Tree: 800 neonates, 200 died ( <<−− lived died −−>> ) cpap< 0.6654 | gest>=−1.672 gest>=−1.941 ap>=−1.343 1 0 1 1/40 resp< 1.21 544/73 3/11 1 5/22 ● 0 1 worst bin 39/29 13/32 Bradley Efron, Stanford University Prediction, Estimation, and Attribution 12 36

Random Forests Breiman (2001) 1. Draw a bootstrap sample of original n cases 2. Make a classification tree from the bootstrap data set except at each split use only a random subset of the p predictors 3. Do all this lots of times ( ı 1000) 4. Prediction rule For any new x predict ^ y = majority of the 1000 predictions Bradley Efron, Stanford University Prediction, Estimation, and Attribution 13 36

The Prostate Cancer Microarray Study › n = 100 men: 50 prostate cancer, 50 normal controls › For each man measure activity of p = 6033 genes › Data set d is 100 ˆ 6033 matrix (“wide”) › Wanted: Prediction rule f ( x; d ) that inputs new 6033-vector x and outputs ^ y correctly predicting cancer/normal Bradley Efron, Stanford University Prediction, Estimation, and Attribution 14 36

Random Forests for Prostate Cancer Prediction › Randomly divide the 100 subjects into “training set” of 50 subjects (25 + 25) “test set” of the other 50 (25 + 25) › Run R program randomforest on the training set › Use its rule f ( x; d train ) on the test set and see how many errors it makes Bradley Efron, Stanford University Prediction, Estimation, and Attribution 15 36

Prostate cancer prediction using random forests Black is cross−validated training error, Red is test error rate train err 5.9% 0.5 test err 2.0% 0.4 0.3 error 0.2 0.1 0.0 0 100 200 300 400 500 number trees Bradley Efron, Stanford University Prediction, Estimation, and Attribution 16 36

Now with boosting algorithm 'gbm' error rates 0.5 train 0%, test=4% 0.4 0.3 err.rate 0.2 0.1 0.0 0 100 200 300 400 # tree Bradley Efron, Stanford University Prediction, Estimation, and Attribution 17 36

Now using deep learning (“Keras”) # parameters = 780 ; 738 1.0 data training 0.9 validation 0.8 acc 0.7 0.6 0.5 0 100 200 300 400 500 epoch Bradley Efron, Stanford University Prediction, Estimation, and Attribution 18 36

Prediction is Easier than Estimation › Observe ind x 1 ; x 2 ; x 3 ; : : : ; x 25 ‰ N ( —; 1) ? x = mean, x = median — › Estimation  ? ff ffi x ) 2 n x ) 2 o E ( — ` E ( — ` — = 1 : 57 › Wish to predict new X 0 ‰ N ( —; 1) › Prediction  ? ff ffi x ) 2 n x ) 2 o E ( X 0 ` E ( X 0 ` — = 1 : 02 Bradley Efron, Stanford University Prediction, Estimation, and Attribution 19 36

Prediction is Easier than Attribution ind N genes: z j ‰ N ( ‹ j ; 1) ; j = 1 ; 2 ; : : : ; N › Microarray study N 0 : ‹ j = 0 (null genes) N 1 : ‹ j > 0 (non-null) 8 + sick > < › New subject’s microarray: x j ‰ N ( ˚ ‹ j ; 1) ` healthy > : › Prediction „ « 1 = 2 Possible if N 1 = O N 0 › Attribution Requires N 1 = O ( N 0 ) › Prediction allows accrual of “weak learners” Bradley Efron, Stanford University Prediction, Estimation, and Attribution 20 36

Prediction and Medical Science › Random forest test set predictions made only 1 error out of 50! › Promising for diagnosis › Not so much for scientific understanding › Next “Importance measures” for the predictor genes Bradley Efron, Stanford University Prediction, Estimation, and Attribution 21 36

Importance measures for genes in randomForest prostate analysis; Top two genes # 1022 and 5569 0.35 ● ● 0.30 0.25 0.20 Importance ● 0.15 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.10 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.05 0.00 0 20 40 60 80 100 index Bradley Efron, Stanford University Prediction, Estimation, and Attribution 22 36

Were the Test Sets Really a Good Test? › Prediction can be highly context-dependent and fragile › Before Randomly divided subjects into “training” and “test” › Next 50 earliest subjects for training 50 latest for test both 25 + 25 Bradley Efron, Stanford University Prediction, Estimation, and Attribution 23 36

Random Forests: Train on 50 earliest, Test on 50 latest subjects; Test error was 2%, now 24% 0.5 train err 0% test err 24% 0.4 0.3 error 0.2 0.1 before 2% ● 0.0 0 100 200 300 400 500 number trees Bradley Efron, Stanford University Prediction, Estimation, and Attribution 24 36

Prediction, Estimation, and Attribution Bradley Efron - PowerPoint PPT Presentation

Prediction, Estimation, and Attribution Bradley Efron brad@stat.stanford.edu Department of Statistics Stanford University Regression Gauss (1809), Galton (1877) Prediction random forests, boosting, support vector machines, neural nets,

Structured Prediction Introduction What is structured prediction? CS 6355: Structured Prediction

Branch Prediction Branch Prediction vs vs Execution Time Execution Time Prediction

Motion Estimation by Affine Transforms Motion Estimation by Affine Transforms Motion Estimation

Using lasso and related estimators for prediction Di Liu StataCorp July 12, 2019 1 / 20

Prediction and Odds 18.05 Spring 2017 Probabilistic Prediction Also called probabilistic

Using Stata 16s lasso features for prediction and inference Di Liu StataCorp 1 / 50

CS 104 Computer Organization and Design Branch Prediction CS104:Branch Prediction 1 Branch

Exercise 7a: Additional Intra Prediction Modes Implement Additional Block Prediction Modes Add

Bottom-up Estimation and Top-down Prediction for Multi-level Models: Solar Energy Prediction

MLSE Channel Estimation MLSE Channel Estimation MLSE Channel Estimation Parametric or Non-

M-Estimation under High-Dimensional Asymptotics DLD, Andrea Montanari 2014-05-01 DLD, Andrea

Part 3. Spectrum Estimation Part 3. Spectrum Estimation 3.2 Parametric Methods for Spectral

Lecture 6. Bayesian estimation Lecture 6. Bayesian estimation 1 (172) 6. Bayesian estimation

Attribution Models and Implications HFMA Managed Care Education Committee July 16, 2014 Tim Ford

ATTRIBUTION FOR TV ADVERTISING June 20, 2019 Presented By: BY THE END OF THE SESSION YOU WILL

A Mathematical Study A Mathematical Study of Authorship Attribution of Authorship Attribution

Health in Alabamas Third Century Dr. Monica Baskin HE HEALTH TH IN IN ALABAMAS THIRD

D STATE EMPLOYEE BENEFITS COMMITTEE PCSK9 CHOLESTEROL MEDICATIONS July 27, 2015 High

Familial hypercholesterolemia, PCSK9 inhibition, and other lipid biomarkers of cardiovascular

The Sk Skinny inny on Lean Beef Everybody Needs Activity And Good Nutrition! Todays Beef is

16% http://www.diabetes.org OECD 2015 We are not only sicker , but we are also exporting our

How to deal with haemolysed and hyperlipidemic samples: an EBF perspective Presenter: Benno

Glycogen function Cellular glucose store Liver Body sump Brain Local

Michael DAmato Trans Disciplinary University: Bangalaru India The Pursuit of Understanding: