 
Introduction to Data Science
Winter Semester 2019/20
Oliver Ernst
TU Chemnitz, Fakultät für Mathematik, Professur Numerische Mathematik
Lecture Slides
Contents
1 What is Data Science?
2 Learning Theory
  2.1 What is Statistical Learning?
  2.2 Assessing Model Accuracy
3 Linear Regression
  3.1 Simple Linear Regression
  3.2 Multiple Linear Regression
  3.3 Other Considerations in the Regression Model
  3.4 Revisiting the Marketing Data Questions
  3.5 Linear Regression vs. K-Nearest Neighbors
4 Classification
  4.1 Overview of Classification
  4.2 Why Not Linear Regression?
  4.3 Logistic Regression
  4.4 Linear Discriminant Analysis
  4.5 A Comparison of Classification Methods
5 Resampling Methods
  5.1 Cross Validation
  5.2 The Bootstrap
6 Linear Model Selection and Regularization
  6.1 Subset Selection
  6.2 Shrinkage Methods
  6.3 Dimension Reduction Methods
  6.4 Considerations in High Dimensions
  6.5 Miscellanea
7 Nonlinear Regression Models
  7.1 Polynomial Regression
  7.2 Step Functions
  7.3 Regression Splines
  7.4 Smoothing Splines
  7.5 Generalized Additive Models
8 Tree-Based Methods
  8.1 Decision Tree Fundamentals
  8.2 Bagging, Random Forests and Boosting
9 Unsupervised Learning
  9.1 Principal Components Analysis
  9.2 Clustering Methods

Oliver Ernst (NM) Introduction to Data Science Winter Semester 2018/19
2 Learning Theory
  2.1 What is Statistical Learning?
  2.2 Assessing Model Accuracy
Learning Theory

Example: Advertising channels
• Given a data set containing the sales numbers for a given product in 200 markets, allocate an advertising budget across the three media channels TV, radio and newspaper.
• The sales numbers for each medium are available for different advertising budget values.
• We will try to model the dependence of sales on advertising budgets.
• Terminology:
  • Independent variables (also: input variables, predictors, variables, features): $X_1$: TV budget, $X_2$: radio budget, $X_3$: newspaper budget.
  • Dependent variable (also: target variable, response): $Y$: sales.
Example: Advertising channels
[Figure: three panels plotting Sales against the TV, Radio and Newspaper advertising budgets.]

With $X = (X_1, \dots, X_p)$, where $p$ is the number of predictors,

\[ Y = f(X) + \varepsilon, \qquad \mathbb{E}[\varepsilon] = 0, \tag{2.1} \]

• $\varepsilon$: random error term.
• $f$: systematic information $X$ provides about $Y$.
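To make model (2.1) concrete, here is a minimal sketch that generates synthetic data of the form $Y = f(X) + \varepsilon$. The function `f_true`, the budget range and the noise level are illustrative assumptions; they are not taken from the Advertising data set.

```python
import numpy as np

def simulate(f, x, noise_sd, rng):
    """Draw responses Y = f(X) + eps with eps ~ N(0, noise_sd^2), so E[eps] = 0."""
    eps = rng.normal(loc=0.0, scale=noise_sd, size=x.shape)
    return f(x) + eps

def f_true(x):
    # Hypothetical systematic part, chosen only for illustration.
    return 5.0 + 0.05 * x

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 300.0, size=200)   # 200 synthetic "markets", assumed budget range
y = simulate(f_true, x, noise_sd=2.0, rng=rng)

# What remains after removing the systematic part is the random error term.
residual = y - f_true(x)
print("mean of error term:", residual.mean())
```

Because the error term has mean zero, the average residual should be close to zero, while any single observation can still deviate from `f_true(x)`.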
Example: Income
• Data set shows income against years of education for 30 people.
• Objective: determine function $f$ relating income as response to years of education as predictor.
• $f$ generally unknown, must be estimated from the data.
• Here: data simulated, so $f$ available.
• In another data set, income is given with respect to two input variables: years of education and seniority.
• Statistical learning is concerned with techniques for estimating $f$ from a data set.
Example: Income
[Figure: two panels plotting Income against Years of Education.]
Example: Income
[Figure: Income plotted against Years of Education and Seniority.]

Two main reasons for estimating $f$: prediction and inference.
Prediction
• Suppose inputs $X$ readily available, but outputs $Y$ difficult to obtain.
• Since the errors average out ($\mathbb{E}[\varepsilon] = 0$), predict $Y$ using

\[ \hat{Y} = \hat{f}(X), \]

  where $\hat{f}$ is an estimate for $f$ and $\hat{Y}$ is the prediction for $Y = f(X)$.
• Often $\hat{f}$ only available as a black box, i.e., a procedure for generating $\hat{Y}$ given $X$.

Example:
• $X_1, \dots, X_p$: characteristics of a patient's blood samples, measured in lab.
• $Y$: patient's risk for severe adverse reaction to particular drug.

For obvious reasons, having an accurate estimate $\hat{Y} = \hat{f}(X)$ is preferable to evaluating $Y = f(X)$.
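The black-box view of $\hat{f}$ can be sketched as a function that is fitted once and afterwards only maps inputs to predictions. The quadratic least-squares fit via `np.polyfit` and the synthetic training data are assumptions made for illustration; the slides do not prescribe a particular estimation method here.

```python
import numpy as np

def fit_black_box(x_train, y_train, degree=2):
    """Return a callable f_hat; users only see inputs going in and predictions coming out."""
    coeffs = np.polyfit(x_train, y_train, deg=degree)  # least-squares polynomial fit
    def f_hat(x_new):
        return np.polyval(coeffs, x_new)
    return f_hat

rng = np.random.default_rng(1)
x_train = rng.uniform(10.0, 22.0, size=30)
# Made-up training responses: a linear trend plus mean-zero noise.
y_train = 20.0 + 3.0 * (x_train - 10.0) + rng.normal(0.0, 2.0, size=30)

f_hat = fit_black_box(x_train, y_train)
y_hat = f_hat(np.array([12.0, 16.0, 20.0]))  # predictions Y_hat = f_hat(X) for new inputs
print(y_hat)
```

Nothing about the internal form of `f_hat` is needed to use it for prediction, which is exactly what distinguishes prediction from inference.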
Prediction
Accuracy of $\hat{Y} \approx Y$ depends on reducible error and irreducible error.
• Reducible error: $f - \hat{f}$. Can in principle be reduced by choosing a more appropriate statistical learning technique.
• Irreducible error: $\varepsilon$. Present even for $f = \hat{f}$, cannot be predicted from $X$. Possible sources:
  • Additional variables $Y$ may depend on but which are not observed/measured.
  • Unmeasurable variation. (E.g.: adverse reaction may depend on manufacturing variations in drug or variations in patient's sensitivity over time.)
• Quantitative measure: mean squared error (MSE), for fixed $\hat{f}$ and $X$:

\[ \mathbb{E}\bigl[(Y - \hat{Y})^2\bigr] = \underbrace{[f(X) - \hat{f}(X)]^2}_{\text{reducible}} + \underbrace{\operatorname{Var}\varepsilon}_{\text{irreducible}} \]

• Note: the irreducible error is always a lower bound on the expected squared prediction error, so it limits the achievable prediction accuracy.
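The MSE decomposition follows by expanding $(f(X) + \varepsilon - \hat{f}(X))^2$ for fixed $X$ and $\hat{f}$: the cross term $2[f(X) - \hat{f}(X)]\,\mathbb{E}[\varepsilon]$ vanishes because $\mathbb{E}[\varepsilon] = 0$, and $\mathbb{E}[\varepsilon^2] = \operatorname{Var}\varepsilon$. The sketch below checks this numerically by Monte Carlo. The functions `f` and `f_hat`, the point `x0` and the noise level are hypothetical choices made for illustration.

```python
import numpy as np

def f(x):
    # Hypothetical true regression function.
    return 20.0 + 3.0 * (x - 10.0)

def f_hat(x):
    # Hypothetical fixed estimate of f, deliberately slightly off.
    return 22.0 + 2.8 * (x - 10.0)

rng = np.random.default_rng(2)
x0 = 16.0          # fixed input X
noise_sd = 2.0     # standard deviation of the error term

eps = rng.normal(0.0, noise_sd, size=1_000_000)
y = f(x0) + eps                               # many draws of Y at the same X
mse_mc = np.mean((y - f_hat(x0)) ** 2)        # Monte Carlo estimate of E[(Y - Y_hat)^2]

reducible = (f(x0) - f_hat(x0)) ** 2          # [f(X) - f_hat(X)]^2
irreducible = noise_sd ** 2                   # Var(eps)

print("Monte Carlo MSE:        ", mse_mc)
print("reducible + irreducible:", reducible + irreducible)
```

Even if `f_hat` were replaced by `f`, making the reducible part zero, the estimated MSE would stay near `noise_sd ** 2`.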
Inference
Inference seeks to determine how the individual predictors $X_1, \dots, X_p$ affect the response $Y$. In particular, this involves more detailed knowledge about $\hat{f}$ than simply considering it a black box. Things to investigate:
• Identify those predictors with the strongest effect on $Y$. Can be a small subset of $X_1, \dots, X_p$.
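One simple way to look for the predictors with the strongest effect is sketched below, under the assumption that a linear model is adequate: standardize the predictors, fit by least squares, and compare the absolute sizes of the coefficients. This is only one possible approach, not the method of the slides. The data and the effect sizes in `beta_true` are synthetic.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
X = rng.normal(size=(n, 3))                  # three synthetic predictors
beta_true = np.array([3.0, 0.5, 0.0])        # made-up effects: strong, weak, none
y = X @ beta_true + rng.normal(0.0, 1.0, size=n)

# Standardize so that coefficient magnitudes are comparable across predictors.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
A = np.column_stack([np.ones(n), Xs])        # prepend an intercept column

coef, *_ = np.linalg.lstsq(A, y, rcond=None) # ordinary least squares
ranking = np.argsort(-np.abs(coef[1:]))      # predictor indices, strongest effect first

print("coefficients (without intercept):", coef[1:])
print("predictors ranked by |coefficient|:", ranking)
```

Unlike the black-box use of $\hat{f}$ for prediction, this requires looking inside the fitted model, here at its coefficients.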