CS109A Introduction to Data Science
Pavlos Protopapas, Kevin Rader, and Chris Tanner
Advanced Section #3: Methods of Regularization and their justifications
Robbert Struyven and Pavlos Protopapas (viz. Camilo Fosco)
CS109A, PROTOPAPAS, RADER
Why do we regularize?
The variance of the OLS estimator, Var(β̂) = σ²(XᵀX)⁻¹, is affected by the irreducible noise σ²; we have no control over this. But the variance also depends on the predictors themselves, through (XᵀX)⁻¹. This is the important part.
Perturbations and the condition number of XᵀX: when XᵀX is ill-conditioned, small perturbations of the data produce large changes in the estimated coefficients.
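As a quick illustration (a sketch with synthetic data; the sizes and seed are arbitrary), a nearly collinear predictor blows up the condition number of XᵀX:

```python
# Sketch: collinearity inflates the condition number of X^T X.
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)

# Independent second predictor -> well-conditioned X^T X
X_good = np.column_stack([x1, rng.normal(size=n)])

# Nearly collinear second predictor -> ill-conditioned X^T X
X_bad = np.column_stack([x1, x1 + 1e-3 * rng.normal(size=n)])

cond_good = np.linalg.cond(X_good.T @ X_good)
cond_bad = np.linalg.cond(X_bad.T @ X_bad)
print(cond_good, cond_bad)  # the collinear design is orders of magnitude worse
```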
Image from "Instability of Least Squares, Least Absolute Deviation and Least Median of Squares Linear Regression", Ellis et al. (1998)
Instability destroyer
Ridge adds a penalty to the least-squares loss: ‖y − Xβ‖₂² + λ‖β‖₂². The term λ‖β‖₂² is the regularization factor.
Eigendecomposition: XᵀX = VΛVᵀ, where V collects the orthonormal eigenvectors and Λ = diag(λ₁, …, λₚ) the eigenvalues, with λ₁ ≥ ⋯ ≥ λₚ ≥ 0.
Added constant λ: ridge replaces XᵀX by XᵀX + λI = V(Λ + λI)Vᵀ, so every eigenvalue is shifted to λᵢ + λ and the condition number improves to (λ₁ + λ)/(λₚ + λ).
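A minimal sketch (synthetic, nearly collinear data; the seed and λ are arbitrary) showing the eigenvalue shift and the improved conditioning:

```python
# Sketch: adding lambda * I shifts every eigenvalue of X^T X by lambda
# and shrinks the condition number.
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=50)
X = np.column_stack([x1, x1 + 1e-3 * rng.normal(size=50)])  # nearly collinear

A = X.T @ X
lam = 1.0
eig_before = np.linalg.eigvalsh(A)
eig_after = np.linalg.eigvalsh(A + lam * np.eye(2))

# Each eigenvalue is shifted by exactly lambda...
print(eig_before, eig_after)
# ...and the condition number drops dramatically
cond_before = eig_before.max() / eig_before.min()
cond_after = eig_after.max() / eig_after.min()
print(cond_before, cond_after)
```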
Equivalent constrained formulation: minimize ‖y − Xβ‖₂² subject to ‖β‖₂² < R².
Penalized form: ‖y − Xβ‖₂² + λ‖β‖₂² (the Lagrangian interpretation of the constraint)
β̂_ridge = argmin_β ‖y − Xβ‖₂² + λ‖β‖₂²
Closed-form solution: β̂_ridge = (XᵀX + λI)⁻¹ Xᵀ y
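A sketch of the closed form in NumPy, on made-up data; rather than comparing against a library solver, it checks that the gradient of the ridge objective vanishes at the solution:

```python
# Sketch: ridge closed form (X^T X + lam*I)^{-1} X^T y, verified via its
# first-order optimality condition.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(80, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=80)

lam = 0.7
beta = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

# Gradient of ||y - Xb||^2 + lam * ||b||^2 at the solution should be ~0
grad = 2 * X.T @ (X @ beta - y) + 2 * lam * beta
print(beta, np.abs(grad).max())
```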
Monsieur Ridge
Tikhonov matrix: in generalized Tikhonov regularization the penalty is ‖Γβ‖₂² for a matrix Γ; ridge is the special case Γ = √λ · I.
Ridge estimator: the estimate sits where the constraint region and the loss contours first intersect. The coefficients shrink as lambda increases, but they are not nullified.
Ridge curves the loss function in collinear problems, avoiding instability.
Yes, LASSO is an acronym: Least Absolute Shrinkage and Selection Operator.
β̂_lasso = argmin_β ‖y − Xβ‖₂² + λ‖β‖₁
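A hedged example with scikit-learn's Lasso on synthetic data (note scikit-learn scales the objective as (1/(2n))‖y − Xβ‖₂² + α‖β‖₁, so its α plays the role of λ up to a constant):

```python
# Sketch: the L1 penalty drives coefficients exactly to zero.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 10))
# Only the first two predictors actually matter
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)

coefs_small = Lasso(alpha=0.01, fit_intercept=False).fit(X, y).coef_
coefs_large = Lasso(alpha=0.5, fit_intercept=False).fit(X, y).coef_

# Larger alpha -> sparser model: the eight noise coefficients are nullified
print(np.sum(coefs_small != 0), np.sum(coefs_large != 0))
```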
For the derivation we minimize (1/2)‖y − Xβ‖₂² + λ‖β‖₁; the factor 1/2 simplifies the equations.
Minimizing coordinate-wise, the stationarity condition yields three candidate solutions, z + λ, z, and z − λ, where z denotes the unpenalized least-squares solution for that coordinate.

Checking each case for consistency:
β̂ = z − λ if z > λ
β̂ = z + λ if −z > λ
β̂ = sign(z)(|z| − λ) if |z| > λ

If β̂ < 0 we need λ > −z; if β̂ > 0 we need λ > z; so β̂ = 0 whenever λ > |z|.

Putting the cases together gives the soft-thresholding operator β̂ = sign(z) · max(|z| − λ, 0): the coefficient is set exactly to zero whenever |z| is smaller than the threshold λ.
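The derivation above can be checked numerically; a sketch comparing the soft-thresholding operator against a brute-force grid minimization of (1/2)(z − β)² + λ|β|:

```python
# Sketch: soft-thresholding S_lam(z) = sign(z) * max(|z| - lam, 0) is the
# exact minimizer of the one-dimensional lasso subproblem.
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

lam = 1.0
grid = np.linspace(-5, 5, 100001)  # candidate values of beta
for z in [-3.0, -0.5, 0.0, 0.5, 3.0]:
    objective = 0.5 * (z - grid) ** 2 + lam * np.abs(grid)
    brute = grid[np.argmin(objective)]
    # grid minimizer matches the closed-form operator
    assert abs(soft_threshold(z, lam) - brute) < 1e-3
```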
Lasso estimator: the lasso tends to zero out parameters, since the OLS loss contours can easily intersect the constraint region at one of the axes. The coefficients shrink as lambda increases and are quickly nullified.
Estimators, assemble
β̂_enet = argmin_β ‖y − Xβ‖₂² + λ₁‖β‖₁ + λ₂‖β‖₂²
In λ₁‖β‖₁ we recognize the LASSO penalty; in λ₂‖β‖₂², the Ridge penalty.
Equivalently, with λ = λ₁ + λ₂ and α = λ₁ / (λ₁ + λ₂):
argmin_β ‖y − Xβ‖₂² + λ[α‖β‖₁ + (1 − α)‖β‖₂²]
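A small numeric check (arbitrary β and penalty weights) that the two parameterizations agree under this mapping:

```python
# Sketch: lam1*||b||_1 + lam2*||b||_2^2 equals lam*(alpha*||b||_1 + (1-alpha)*||b||_2^2)
# when lam = lam1 + lam2 and alpha = lam1 / (lam1 + lam2).
import numpy as np

rng = np.random.default_rng(4)
beta = rng.normal(size=5)
lam1, lam2 = 0.3, 0.7

penalty_two_lams = lam1 * np.abs(beta).sum() + lam2 * (beta ** 2).sum()

lam = lam1 + lam2
alpha = lam1 / (lam1 + lam2)
penalty_mixed = lam * (alpha * np.abs(beta).sum() + (1 - alpha) * (beta ** 2).sum())

print(penalty_two_lams, penalty_mixed)
```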
Visualization is key
Elastic Net
"The right way of looking at it" - Kevin Rader, probably
Bayesian viewpoint: β̂_MAP = argmax_β p(β | D)
[Figure: posterior of β as data accumulates, shown for the prior (N = 0), N = 32, and N = 500, with the true beta marked.]
β̂_MAP = argmax_β p(D | β) p(β) = argmin_β [−log p(D | β) − log p(β)]

With Gaussian noise and a Gaussian prior β ~ N(0, τ²I):
β̂_MAP = argmin_β ‖y − Xβ‖₂² + (σ²/τ²)‖β‖₂²   (ridge)

With Gaussian noise and a Laplace prior with scale b:
β̂_MAP = argmin_β ‖y − Xβ‖₂² + (2σ²/b)‖β‖₁   (LASSO)
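A sketch verifying the Gaussian-prior case numerically (data and hyperparameters are made up): the gradient of the negative log posterior vanishes at the ridge solution with λ = σ²/τ².

```python
# Sketch: MAP with Gaussian prior N(0, tau^2 I) and Gaussian noise N(0, sigma^2)
# coincides with ridge regression at lambda = sigma^2 / tau^2.
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, 0.0, -1.0]) + 0.3 * rng.normal(size=60)

sigma2, tau2 = 0.09, 1.0
lam = sigma2 / tau2
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

# Gradient of the negative log posterior
# ||y - Xb||^2 / (2 sigma^2) + ||b||^2 / (2 tau^2) at the ridge solution:
grad = X.T @ (X @ beta_ridge - y) / sigma2 + beta_ridge / tau2
print(np.abs(grad).max())  # numerically zero
```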
Laplace prior: p(β) = ∏ⱼ (1/(2b)) exp(−|βⱼ| / b)
There is an easy formula to automatically obtain the betas as well, available in chapter 13, p. 464 of Murphy's "Machine Learning: A Probabilistic Perspective".
This Rⱼ² is the coefficient of determination obtained by regressing the j-th predictor on the remaining columns of X as predictors.
Define the augmented data X* = (X; √λ₂ · I) and y* = (y; 0). Then ‖X*β − y*‖₂² = ‖Xβ − y‖₂² + λ₂‖β‖₂², so the elastic net objective becomes

argmin_β ‖X*β − y*‖₂² + λ₁‖β‖₁

a LASSO problem!
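A quick numeric check of the augmentation identity (arbitrary data and β; no model fitting needed):

```python
# Sketch: stacking sqrt(lam2) * I under X absorbs the ridge penalty into the
# squared loss, turning the elastic net into a pure LASSO on augmented data.
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(40, 4))
y = rng.normal(size=40)
beta = rng.normal(size=4)
lam2 = 0.5

X_aug = np.vstack([X, np.sqrt(lam2) * np.eye(4)])
y_aug = np.concatenate([y, np.zeros(4)])

lhs = np.sum((y_aug - X_aug @ beta) ** 2)
rhs = np.sum((y - X @ beta) ** 2) + lam2 * np.sum(beta ** 2)
print(lhs, rhs)  # identical for any beta
```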