Introduction to Data Science
Winter Semester 2018/19 Oliver Ernst
TU Chemnitz, Fakultät für Mathematik, Professur Numerische Mathematik
Lecture Slides

Contents

1 What is Data Science?
2 Learning Theory
3 Linear Regression
4 Classification
5 Resampling Methods
Oliver Ernst (NM) Introduction to Data Science Winter Semester 2018/19 3 / 496
6 Linear Model Selection and Regularization
7 Nonlinear Regression Models
8 Tree-Based Methods
9 Support Vector Machines
10 Unsupervised Learning
6 Linear Model Selection and Regularization
6.1 Subset Selection
Best subset selection:

1 Set M0 to be the null model, i.e., containing only the constant term β0.
2 for k = 1, 2, . . . , p
   a Fit all (p choose k) models containing exactly k predictors.
   b Pick the best (smallest RSS, i.e., largest R²) among these, call it Mk.
3 Select the single best model among M0, . . . , Mp using a model selection criterion such as cross-validated prediction error, Cp, AIC, BIC, or adjusted R².
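The algorithm above can be sketched in a few lines of numpy (an illustrative sketch, not code from the course; `best_subset` and the toy data are made up for demonstration):

```python
import itertools
import numpy as np

def rss(X, y, cols):
    """RSS of the least squares fit of y on the given columns of X (plus intercept)."""
    A = np.column_stack([np.ones(len(y)), X[:, list(cols)]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return float(r @ r)

def best_subset(X, y):
    """Step 2 of the algorithm: for each size k, the subset with smallest RSS.
    Step 3 (choosing among M_0, ..., M_p) would use Cp, BIC or cross-validation."""
    p = X.shape[1]
    best = {0: ((), rss(X, y, ()))}          # M_0: intercept-only null model
    for k in range(1, p + 1):
        cand = ((c, rss(X, y, c)) for c in itertools.combinations(range(p), k))
        best[k] = min(cand, key=lambda t: t[1])
    return best

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = 3.0 * X[:, 0] - 4.0 * X[:, 1] + 0.01 * rng.normal(size=40)
models = best_subset(X, y)    # models[k] = (best subset of size k, its RSS)
```

Note the cost: the loop visits all 2^p subsets, which is why the stepwise alternatives below matter for large p.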
Forward stepwise selection:

1 Set M0 to be the null model, i.e., containing only the constant term β0.
2 for k = 0, 1, . . . , p − 1
   a Consider all p − k models augmenting Mk by one additional predictor.
   b Pick the best (smallest RSS, i.e., largest R²) among these, call it Mk+1.
3 Select the single best model among M0, . . . , Mp using a model selection criterion such as cross-validated prediction error, Cp, AIC, BIC, or adjusted R².
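The greedy search can be sketched analogously (again an illustrative sketch, not course code):

```python
import numpy as np

def rss(X, y, cols):
    """RSS of the least squares fit of y on the given columns of X (plus intercept)."""
    A = np.column_stack([np.ones(len(y)), X[:, list(cols)]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return float(r @ r)

def forward_stepwise(X, y):
    """Greedily add, at each step, the predictor that decreases RSS the most."""
    p = X.shape[1]
    selected = []
    path = [((), rss(X, y, []))]             # start from the null model M_0
    remaining = set(range(p))
    for _ in range(p):
        j_best = min(remaining, key=lambda j: rss(X, y, selected + [j]))
        remaining.remove(j_best)
        selected.append(j_best)
        path.append((tuple(selected), rss(X, y, selected)))
    return path                              # models M_0, M_1, ..., M_p

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = 3.0 * X[:, 0] - 4.0 * X[:, 1] + 0.01 * rng.normal(size=40)
path = forward_stepwise(X, y)
```

This fits only 1 + p(p + 1)/2 models instead of 2^p, at the price of a nested (possibly suboptimal) model sequence.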
Backward stepwise selection:

1 Set Mp to be the full model, containing all p predictors.
2 for k = p, p − 1, . . . , 1
   a Consider all k models containing all but one of the predictors in Mk.
   b Pick the best (smallest RSS, i.e., largest R²) among these k models, call it Mk−1.
3 Select the single best model among M0, . . . , Mp using a model selection criterion such as cross-validated prediction error, Cp, AIC, BIC, or adjusted R².
Two strategies for estimating the test error:

1 Indirectly estimate the test error by making an adjustment to the training error to account for the bias due to overfitting.
2 Directly estimate the test error using either a validation set approach or cross-validation.
[Figure: Cp, BIC and adjusted R² of the best model of each size, plotted against the number of predictors.]
[Figure: square root of BIC, validation set error and cross-validation error of the best model of each size, plotted against the number of predictors.]
6.2 Shrinkage Methods
Ridge regression estimates the coefficients by solving

   β̂_λ^R = argmin_β ∑_{i=1}^n ( y_i − β0 − ∑_{j=1}^p βj x_ij )² + λ ∑_{j=1}^p βj² = RSS + λ‖β‖₂²,

where λ ≥ 0 is a tuning parameter. The shrinkage penalty λ‖β‖₂² shrinks the estimates of β1, . . . , βp toward zero: λ = 0 recovers the least squares fit, while for λ → ∞ all coefficients approach zero.
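For centered data the criterion has the closed-form minimizer (X⊤X + λI)⁻¹X⊤y, which a short numpy sketch can illustrate (variable names are made up for the example):

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge estimate for centered X and y: solve (X^T X + lam I) beta = X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
X -= X.mean(axis=0)                 # center; the intercept is handled separately
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=50)
y -= y.mean()

b_ols = ridge(X, y, 0.0)            # lam = 0 recovers the least squares fit
b_reg = ridge(X, y, 10.0)           # lam > 0 shrinks the coefficients
```

Increasing λ monotonically decreases ‖β̂_λ^R‖₂, consistent with the shrinkage interpretation above.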
[Figure: standardized ridge coefficient estimates (Income, Limit, Rating, Student) for the Credit data, plotted against λ (left) and ‖β̂_λ^R‖₂/‖β̂‖₂ (right).]
Note that the fitted contributions β̂_{j,λ}^R Xj depend on the scaling of the predictors. It is therefore best to standardize before applying ridge regression, x̃_ij = x_ij / √( (1/n) ∑_{i=1}^n (x_ij − x̄_j)² ), so that all predictors have sample standard deviation one.
[Figure: two panels plotted against λ (left) and ‖β̂_λ^R‖₂/‖β̂‖₂ (right).]
The lasso instead penalizes the ℓ1 norm of the coefficients:

   β̂_λ^L = argmin_β ∑_{i=1}^n ( y_i − β0 − ∑_{j=1}^p βj x_ij )² + λ ∑_{j=1}^p |βj| = RSS + λ‖β‖₁.

For sufficiently large λ some of the estimates β̂_{j,λ}^L are exactly zero, the associated predictors being excluded from the model: the lasso performs variable selection and yields sparse models.
[Figure: standardized lasso coefficient estimates (Income, Limit, Rating, Student) for the Credit data, plotted against λ (left) and ‖β̂_λ^L‖₁/‖β̂‖₁ (right).]
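Unlike ridge, the lasso has no closed form; one standard solver is coordinate descent, which soft-thresholds one coefficient at a time. A minimal sketch for the criterion RSS + λ‖β‖₁ (illustrative, not necessarily the solver behind the figures shown here):

```python
import numpy as np

def soft(z, g):
    """Soft-thresholding operator: sign(z) * max(|z| - g, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - g, 0.0)

def lasso_cd(X, y, lam, n_sweeps=500):
    """Coordinate descent for RSS + lam * ||beta||_1 (X, y assumed centered)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)        # x_j^T x_j for each column
    r = y.copy()                         # residual y - X beta (beta starts at 0)
    for _ in range(n_sweeps):
        for j in range(p):
            r += X[:, j] * beta[j]       # partial residual: drop x_j's contribution
            beta[j] = soft(X[:, j] @ r, lam / 2.0) / col_ss[j]
            r -= X[:, j] * beta[j]
    return beta

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4)); X -= X.mean(axis=0)
y = X @ np.array([1.5, 0.0, -2.0, 0.0]) + 0.1 * rng.normal(size=50)
y -= y.mean()
```

For λ = 0 the iteration is Gauss–Seidel on the normal equations and converges to the least squares fit; for very large λ all coefficients are thresholded to zero.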
Equivalently, lasso and ridge solve constrained least squares problems:

   β̂_λ^L = argmin_β ∑_{i=1}^n ( y_i − β0 − ∑_{j=1}^p βj x_ij )²  subject to  ∑_{j=1}^p |βj| ≤ s,

   β̂_λ^R = argmin_β ∑_{i=1}^n ( y_i − β0 − ∑_{j=1}^p βj x_ij )²  subject to  ∑_{j=1}^p βj² ≤ s,

with a bound s = s(λ) corresponding to the penalty parameter λ.
Best subset selection has the same constrained form with the constraint ∑_{j=1}^p 1{βj ≠ 0} ≤ s, which is computationally infeasible for large p.

[Figure: contours of the RSS together with the constraint regions |β1| + |β2| ≤ s for the lasso (left) and β1² + β2² ≤ s for ridge regression (right).]
More general penalties ∑_{j=1}^p |βj|^q can be considered: for q < 2 the constraint regions become progressively sharper at the coordinate axes (no longer a smooth ball), and for q ≤ 1 they have corners there, which favors solutions with some coefficients exactly zero.
A special case admits closed forms: n = p and X = I (one observation per coefficient, no intercept), so that least squares gives β̂_j = y_j. Then

   β̂_j^R = y_j / (1 + λ),

   β̂_j^L = { y_j − λ/2  if y_j > λ/2;   y_j + λ/2  if y_j < −λ/2;   0  if |y_j| ≤ λ/2 }.

Ridge shrinks every coefficient by the same proportion, whereas the lasso soft-thresholds: coefficients smaller than λ/2 in magnitude are set exactly to zero.
[Figure: ridge (left) and lasso (right) coefficient estimates in this special case, plotted against the least squares coefficient estimate.]
Bayesian interpretation: assume the coefficient prior p(β) = ∏_{j=1}^p g(βj) for a pdf g. If g is a zero-mean Gaussian density, the posterior mode for β is the ridge estimate; if g is a Laplace (double-exponential) density, the posterior mode is the lasso estimate.
[Figure: two panels plotted against ‖β̂_λ^L‖₁/‖β̂‖₁.]
6.3 Dimension Reduction Methods
Dimension reduction methods transform the predictors before fitting: choose M < p linear combinations

   Z_m = ∑_{j=1}^p φ_{j,m} Xj,   m = 1, . . . , M,

and fit the model

   y_i = θ0 + ∑_{m=1}^M θ_m z_{i,m} + ε_i,   i = 1, . . . , n,

by least squares. This is a linear model in the original predictors with implied coefficients

   βj = ∑_{m=1}^M φ_{j,m} θ_m,

so dimension reduction constrains the βj while requiring only the M + 1 coefficients θ0, . . . , θ_M to be estimated.
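Taking the loadings φ from the top right singular vectors of the centered data matrix gives principal components regression; a sketch (assumes centered X and y; names are illustrative):

```python
import numpy as np

def pcr(X, y, M):
    """Principal components regression with M components (X, y centered)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)   # principal directions: rows of Vt
    Phi = Vt[:M].T                                     # p x M loadings phi_{j,m}
    Z = X @ Phi                                        # n x M component scores z_{i,m}
    theta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return Phi @ theta                                 # implied coefficients beta_j

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 5)); X -= X.mean(axis=0)
y = X @ np.array([1.0, 0.0, -1.0, 0.5, 0.0]) + 0.1 * rng.normal(size=60)
y -= y.mean()
```

With M = p no dimension is discarded, and PCR reproduces the least squares fit exactly; the interesting regime is M < p.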
[Figure: Ad Spending plotted against Population.]
[Figure: Ad Spending plotted against Population (left); second principal component plotted against the first principal component (right).]
[Figure: standardized PCR coefficient estimates (Income, Limit, Rating, Student) for the Credit data and cross-validation MSE, plotted against the number of components.]
Partial least squares (PLS) chooses the directions in a supervised way: in Z1 = ∑_{j=1}^p φ_{j,1} Xj, each weight φ_{j,1} is set equal to the coefficient of the simple linear regression of Y onto Xj, so that Z1 places the highest weight on the predictors most strongly related to the response.
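A numerical sketch of this definition of the first PLS direction (standardized predictors assumed; variable names are made up):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(80, 4))
X -= X.mean(axis=0)
X /= X.std(axis=0)                 # standardize the predictors
y = X @ np.array([2.0, 0.0, -1.0, 0.0]) + 0.1 * rng.normal(size=80)
y -= y.mean()

# phi_{j,1} = coefficient of the simple linear regression of y onto X_j
phi1 = np.array([(X[:, j] @ y) / (X[:, j] @ X[:, j]) for j in range(X.shape[1])])
Z1 = X @ phi1                      # first PLS component
theta1 = (Z1 @ y) / (Z1 @ Z1)      # regress y onto Z1
```

Because the predictors are standardized, phi1 is proportional to X⊤y, i.e., to the sample correlations between the response and the individual predictors.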
[Figure: Ad Spending plotted against Population.]
6.4 Considerations in High Dimensions
[Figure: Y plotted against X with fitted least squares lines (two panels).]
[Figure: R², training MSE and test MSE, plotted against the number of variables.]
Lessons for high dimensions:

1 Shrinkage plays a key role in high-dimensional problems.
2 The correct value of the tuning parameter is essential for good performance.
3 Test error increases with the dimension unless the additional features are informative.
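The underlying pathology is easy to demonstrate: with p ≥ n, least squares interpolates the training data (zero training error) even when the features are pure noise, while the test error remains large (an illustrative sketch):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 10, 20                              # more (pure noise) features than observations
X_train = rng.normal(size=(n, p))
y_train = rng.normal(size=n)               # response unrelated to the features

beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
train_mse = np.mean((y_train - X_train @ beta) ** 2)   # essentially zero: interpolation

X_test = rng.normal(size=(100, p))
y_test = rng.normal(size=100)
test_mse = np.mean((y_test - X_test @ beta) ** 2)      # far from zero
```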
6.5 Mathematical Background
Consider the linear model y_i = x_i⊤β + ε_i, i = 1, . . . , n, for which the uncorrelated errors ε_i have mean zero and common variance σ².

Theorem (Gauss–Markov): Among all linear unbiased estimators of β, the least squares estimator β̂ = (X⊤X)⁻¹X⊤y has minimal variance. For correlated errors with covariance matrix Σ, the same role is played by the generalized least squares estimator β̂ = (X⊤Σ⁻¹X)⁻¹X⊤Σ⁻¹y.

7 C.F. Gauss, 1777–1855; A.A. Markov, 1856–1922
Properties of the SVD A = UΣV⊤ of A ∈ R^{n×p} with rank r and singular values σ1 ≥ · · · ≥ σr > 0:

1 Representation of A as a sum of rank-1 matrices:
   A = ∑_{k=1}^r σ_k u_k v_k⊤.
2 Singular vector mapping properties:
   A v_k = σ_k u_k,   A⊤ u_k = σ_k v_k,   k = 1, . . . , r.
3 The number r of non-zero singular values equals the rank of A.
4 Eigenspaces of AA⊤ and A⊤A: σ1², . . . , σr² are the non-zero eigenvalues of A⊤A and AA⊤, respectively:
   V⊤(A⊤A)V = [ Σr² O ; O O ],   U⊤(AA⊤)U = [ Σr² O ; O O ].
5 If A = A⊤ ∈ R^{n×n} with non-zero eigenvalues λ_k, its singular values are σ_k = |λ_k|.
6 The (p-dimensional) unit sphere is mapped by A to an ellipsoid (in R^n) with semi-axes σ_k u_k.
7 For A ∈ R^{n×p} there holds ‖A‖₂ = σ1 and ‖A‖_F = √(σ1² + · · · + σr²).
8 Analogous statements hold for complex-valued matrices A = UΣV^H (U, V unitary).
Theorem (Schmidt–Eckart–Young–Mirsky): For k < r, the best rank-k approximation of A in the Frobenius norm is A_k = ∑_{i=1}^k σ_i u_i v_i⊤, with error

   ‖A − A_k‖_F² = σ_{k+1}² + · · · + σ_r².

E. Schmidt: Zur Theorie der linearen und nichtlinearen Integralgleichungen. I. Teil: Entwicklung willkürlicher Funktionen nach Systemen vorgeschriebener. Math. Ann., 63 (1907), pp. 433–476.
C. Eckart, G. Young: The approximation of one matrix by another of lower rank. Psychometrika, 1 (1936), pp. 211–218.
L. Mirsky: Symmetric gauge functions and unitarily invariant norms. Quart. J. Math. Oxford, 11 (1960), pp. 50–59.
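Both the rank-1 expansion (property 1 above) and the approximation error formula can be verified numerically (a check on random data, not a proof):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(6, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# property 1: A equals the sum of the rank-1 matrices sigma_k u_k v_k^T
A_sum = sum(s[k] * np.outer(U[:, k], Vt[k]) for k in range(len(s)))

# best rank-k approximation and its squared Frobenius error
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]
err2 = np.linalg.norm(A - A_k, 'fro') ** 2   # equals sigma_{k+1}^2 + ... + sigma_r^2
```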
Ridge regression revisited: the ridge estimate

   β̂_λ^R = argmin_{β∈R^p} ‖y − Xβ‖₂² + λ‖β‖₂²

solves the regularized normal equations (X⊤X + λI)β = X⊤y. Inserting the SVD X = UΣV⊤ with Σ = diag(σ1, . . . , σp) gives X⊤X + λI = V(Σ² + λI)V⊤, and therefore

   β̂_λ^R = ∑_{j=1}^p  σ_j (u_j⊤y) / (σ_j² + λ)  vj.

For λ = 0 this reduces to the least squares solution β̂ = ∑_{j=1}^p (u_j⊤y / σ_j) vj, so ridge regression multiplies each term by the filter factor σ_j² / (σ_j² + λ), damping most strongly the contributions associated with small singular values.
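The equivalence of the normal-equation and SVD forms can be checked numerically (illustrative sketch):

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(30, 5))
y = rng.normal(size=30)
lam = 0.7

# direct solution of the regularized normal equations
beta_direct = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)

# SVD form: sum_j sigma_j (u_j^T y) / (sigma_j^2 + lam) * v_j
U, s, Vt = np.linalg.svd(X, full_matrices=False)
beta_svd = Vt.T @ (s * (U.T @ y) / (s ** 2 + lam))
```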
Principal component analysis: let X ∈ R^p be a random vector with mean µ = E X and covariance matrix

   C = E[(X − µ)(X − µ)⊤] ∈ R^{p×p},

and let C = WΛW⊤ be its eigendecomposition with W orthogonal and Λ = diag(λ1, . . . , λp), λ1 ≥ · · · ≥ λp.8 The total variance of X is

   ∑_{j=1}^p Var[Xj] = tr C = ∑_{j=1}^p λj = ‖WΛ^{1/2}W⊤‖_F² = ‖C^{1/2}‖_F².

8 Note that these are real and positive as C is symmetric and positive-definite.
Equivalently, C = ∑_{k=1}^p λ_k w_k w_k⊤. Define the principal components Z_j = w_j⊤X. Then

   Var[Z_j] = E[(w_j⊤X − E[w_j⊤X])²]
            = E[(w_j⊤(X − µ))²]
            = E[(w_j⊤(X − µ))(X − µ)⊤w_j]
            = w_j⊤ E[(X − µ)(X − µ)⊤] w_j = w_j⊤ C w_j = λ_j.

Thus the j-th principal component has variance λ_j, and the variances of all p components sum to the total variance of X.
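The identity Var[Z_j] = λ_j has an empirical counterpart: the sample variances of the principal component scores reproduce the eigenvalues of the sample covariance matrix (sketch with synthetic data):

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(500, 3)) @ np.diag([3.0, 1.0, 0.3])   # anisotropic sample
Xc = X - X.mean(axis=0)
C = (Xc.T @ Xc) / (len(X) - 1)        # sample covariance matrix

lam, W = np.linalg.eigh(C)            # eigh returns ascending eigenvalues
lam, W = lam[::-1], W[:, ::-1]        # reorder to lambda_1 >= lambda_2 >= ...

Z = Xc @ W                            # sample principal component scores
var_Z = Z.var(axis=0, ddof=1)         # reproduces the eigenvalues
```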
For Z1 = φ1⊤X and Z2 = φ2⊤X we conclude by the orthogonality of the eigenvectors that Cov[Z1, Z2] = φ2⊤Cφ1 = λ1 φ2⊤φ1 = 0, i.e., distinct principal components are uncorrelated.
The eigenvalues of a symmetric matrix C ∈ R^{n×n} admit the Courant–Fischer min-max characterization

   λ_k = min_{w1,w2,...,w_{n−k}∈R^n}  max_{0≠x∈R^n, x⊥w1,w2,...,w_{n−k}}  (x⊤Cx)/(x⊤x)
       = max_{w1,w2,...,w_{k−1}∈R^n}  min_{0≠x∈R^n, x⊥w1,w2,...,w_{k−1}}  (x⊤Cx)/(x⊤x).
In particular, the first principal component direction w1 maximizes the variance w⊤Cw over all unit vectors w ∈ R^p, and the k-th direction maximizes w⊤Cw over all unit vectors orthogonal to w1, . . . , w_{k−1}, attaining the value λ_k.
The proportion of variance explained (PVE) by the first M principal components is

   PVE(M) = ∑_{j=1}^M λj / ∑_{j=1}^p λj,

while the proportion left unexplained by the first k components is ∑_{j=k+1}^p λj / ∑_{j=1}^p λj.
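Computing the PVE from the eigenvalues (toy values for illustration):

```python
import numpy as np

lam = np.array([4.0, 2.0, 1.0, 0.5, 0.5])   # eigenvalues lambda_1 >= ... >= lambda_p
pve = lam / lam.sum()                        # PVE of each individual component
cum_pve = np.cumsum(pve)                     # PVE of the first M components
# cum_pve -> [0.5, 0.75, 0.875, 0.9375, 1.0]
```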
In practice µ and C are replaced by empirical estimates: for a data matrix X = [x1, . . . , xp] ∈ R^{n×p} with e = (1, . . . , 1)⊤ ∈ R^n, the sample mean is x̄⊤ = (1/n) e⊤X, and the sample covariance matrix is C̃ = (1/(n−1)) (X − ex̄⊤)⊤(X − ex̄⊤).