Introduction to Data Science
Winter Semester 2019/20 Oliver Ernst
TU Chemnitz, Fakultät für Mathematik, Professur Numerische Mathematik
Lecture Slides
Contents
1 What is Data Science?
2 Learning Theory
2.1 What is Statistical Learning?
3 Linear Regression
4 Classification
5 Resampling Methods
Oliver Ernst (NM) Introduction to Data Science Winter Semester 2018/19 3 / 462
6 Linear Model Selection and Regularization
7 Nonlinear Regression Models
8 Tree-Based Methods
9 Unsupervised Learning
2 Learning Theory
[Figure: three panels plotting Sales against TV, Radio, and Newspaper.]
[Figure: Income as a function of Years of Education and Seniority.]
irreducible
By Dvortygirl - Own work, CC BY-SA 3.0
1 Assume a specific functional form for f; a popular example is the linear model $f(X) = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p$.
2 Train or fit the chosen model to the data, i.e., choose the parameters $\{\beta_j\}_{j=0}^{p}$.
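The two steps above can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code: it assumes ordinary least squares as the fitting criterion (the slide does not name one at this point), relies on NumPy, and uses synthetic data. The names `fit_linear_model`, `predict`, and `beta_true` are hypothetical.

```python
import numpy as np


def fit_linear_model(X, y):
    """Return least-squares estimates (beta_0, ..., beta_p).

    Assumption: ordinary least squares is the fitting criterion.
    X has shape (n, p), y has shape (n,).
    """
    n = X.shape[0]
    # Prepend a column of ones so that beta_0 acts as the intercept.
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def predict(beta, X):
    """Evaluate the linear model beta_0 + beta_1 X_1 + ... + beta_p X_p."""
    return beta[0] + X @ beta[1:]


# Synthetic data (an assumption for illustration only): p = 3 predictors.
rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.uniform(0.0, 10.0, size=(n, p))
beta_true = np.array([2.0, 0.5, -1.0, 0.25])
y = predict(beta_true, X) + rng.normal(scale=0.1, size=n)

# Step 1 fixed the functional form; step 2 chooses the parameters.
beta_hat = fit_linear_model(X, y)
train_mse = np.mean((y - predict(beta_hat, X)) ** 2)
print("estimated parameters:", beta_hat)
print("training mean squared error:", train_mse)
```

Fixing the functional form first reduces the problem of estimating an arbitrary function f to estimating p + 1 numbers, which is why the fit is a single linear-algebra call in this sketch.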
[Figures: Income plotted against Years of Education and Seniority.]
i=1 fall into (more or less) distinct groups.
[Figure: two scatter-plot panels with axes X1 and X2.]
n
[Figures: panels plotting Y against X and Mean Squared Error against Flexibility.]
Source: anorak.co.uk
[Figure: three panels plotting MSE, Bias, and Var against Flexibility.]
n
yi}
yi} =
y0} over a test set of observations
[Figure: plot with axis label X2.]
j
j
j
[Figure: Training Errors and Test Errors, plotted as Error Rate against 1/K.]