SLIDE 1 Review of Lecture 11
  • Overfitting: fitting the data more than is warranted
    [figure: a fit passing through noisy data points vs. the target function]
  • VC allows it; doesn't predict it
  • Fitting the noise, stochastic/deterministic
  • Deterministic noise
    [figures: target $f$ vs. best approximation $h^*$; overfit measure vs. number of data points $N$ and target complexity $Q_f$]
SLIDE 2 Learning From Data
Yaser S. Abu-Mostafa, California Institute of Technology
Lecture 12: Regularization
Sponsored by Caltech's Provost Office, E&AS Division, and IST
Thursday, May 10, 2012
SLIDE 3 Outline
  • Regularization: informal
  • Regularization: formal
  • Weight decay
  • Choosing a regularizer
AML | Creator: Yaser Abu-Mostafa | LFD Lecture 12 | 2/21
SLIDE 4 Two approaches to regularization
  • Mathematical: ill-posed problems in function approximation
  • Heuristic: handicapping the minimization of $E_{in}$
SLIDE 5 A familiar example
[figures: the noisy sinusoid fit, without regularization and with regularization]
SLIDE 6 And the winner is ...
[figures: average hypothesis $\bar{g}(x)$ vs. $\sin(\pi x)$, without and with regularization]
without regularization: bias = 0.21, var = 1.69
with regularization: bias = 0.23, var = 0.33
SLIDE 7 The polynomial model
$\mathcal{H}_Q$: polynomials of order $Q$; linear regression in $\mathcal{Z}$ space

$z = [1, L_1(x), \ldots, L_Q(x)]^T \qquad \mathcal{H}_Q = \left\{ \sum_{q=0}^{Q} w_q L_q(x) \right\}$

Legendre polynomials:
$L_1 = x$,  $L_2 = \frac{1}{2}(3x^2 - 1)$,  $L_3 = \frac{1}{2}(5x^3 - 3x)$,  $L_4 = \frac{1}{8}(35x^4 - 30x^2 + 3)$,  $L_5 = \frac{1}{8}(63x^5 \cdots)$
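The Legendre basis above can be generated numerically. A minimal sketch (not from the slides; function names are mine), using Bonnet's three-term recurrence $(q+1)L_{q+1}(x) = (2q+1)\,x\,L_q(x) - q\,L_{q-1}(x)$:

```python
def legendre(q, x):
    """Legendre polynomial L_q(x) via Bonnet's recurrence:
    (q+1) L_{q+1}(x) = (2q+1) x L_q(x) - q L_{q-1}(x)."""
    if q == 0:
        return 1.0
    prev, curr = 1.0, float(x)  # L_0 and L_1
    for k in range(1, q):
        prev, curr = curr, ((2 * k + 1) * x * curr - k * prev) / (k + 1)
    return curr

def z_vector(x, Q):
    """Feature vector z = [1, L_1(x), ..., L_Q(x)] for the model H_Q."""
    return [legendre(q, x) for q in range(Q + 1)]
```

For example, `z_vector(0.5, 2)` evaluates to `[1.0, 0.5, -0.125]`, matching $L_2(0.5) = \frac{1}{2}(3 \cdot 0.25 - 1)$.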
SLIDE 8 Unconstrained solution
Given $(x_1, y_1), \cdots, (x_N, y_N) \to (z_1, y_1), \cdots, (z_N, y_N)$

Minimize $E_{in}(w) = \frac{1}{N} \sum_{n=1}^{N} (w^T z_n - y_n)^2$

Minimize $\frac{1}{N} (Zw - y)^T (Zw - y)$

$w_{lin} = (Z^T Z)^{-1} Z^T y$
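As a sketch of the unconstrained solution (assuming `Z` is the $N \times (Q{+}1)$ matrix whose rows are the $z_n$; the function name is mine):

```python
import numpy as np

def w_lin(Z, y):
    """Unconstrained least-squares solution w_lin = (Z^T Z)^{-1} Z^T y.
    Solving the normal equations with np.linalg.solve avoids forming
    an explicit matrix inverse."""
    return np.linalg.solve(Z.T @ Z, Z.T @ y)
```

When the rows of `Z` were generated from a linear model, `w_lin` recovers its weights exactly; with noisy `y` it returns the in-sample least-squares fit.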
SLIDE 9 Constraining the weights
Hard constraint: $\mathcal{H}_2$ is a constrained version of $\mathcal{H}_{10}$ with $w_q = 0$ for $q > 2$

Softer version: $\sum_{q=0}^{Q} w_q^2 \le C$  (the "soft-order" constraint)

Minimize $\frac{1}{N} (Zw - y)^T (Zw - y)$ subject to $w^T w \le C$

Solution: $w_{reg}$ instead of $w_{lin}$
SLIDE 10 Solving for $w_{reg}$
[figure: contours of constant $E_{in}$, the constraint circle $w^T w = C$ with $w_{lin}$ outside it, and the normal to the constraint surface]

Minimize $E_{in}(w) = \frac{1}{N} (Zw - y)^T (Zw - y)$ subject to $w^T w \le C$

At the constrained minimum the gradient is normal to the constraint surface:
$\nabla E_{in}(w_{reg}) \propto -w_{reg}$, say $\nabla E_{in}(w_{reg}) = -2 \frac{\lambda}{N} w_{reg}$

$\nabla E_{in}(w_{reg}) + 2 \frac{\lambda}{N} w_{reg} = 0 \quad\Longrightarrow\quad$ minimize $E_{in}(w) + \frac{\lambda}{N} w^T w$

$C \uparrow \;\; \lambda \downarrow$
SLIDE 11 Augmented error
Minimizing $E_{aug}(w) = E_{in}(w) + \frac{\lambda}{N} w^T w = \frac{1}{N} (Zw - y)^T (Zw - y) + \frac{\lambda}{N} w^T w$ unconditionally

solves $\longleftrightarrow$ minimizing $E_{in}(w) = \frac{1}{N} (Zw - y)^T (Zw - y)$ subject to $w^T w \le C$  (the VC formulation)
SLIDE 12 The solution
Minimize $E_{aug}(w) = E_{in}(w) + \frac{\lambda}{N} w^T w = \frac{1}{N} \left[ (Zw - y)^T (Zw - y) + \lambda\, w^T w \right]$

$\nabla E_{aug}(w) = 0 \implies Z^T (Zw - y) + \lambda w = 0$

$w_{reg} = (Z^T Z + \lambda I)^{-1} Z^T y$  (with regularization)

as opposed to

$w_{lin} = (Z^T Z)^{-1} Z^T y$  (without regularization)
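A minimal sketch of the regularized solution (the function name is mine, not from the slides); note that it reduces to the unconstrained $w_{lin}$ when $\lambda = 0$:

```python
import numpy as np

def w_reg(Z, y, lam):
    """Regularized solution w_reg = (Z^T Z + lambda I)^{-1} Z^T y."""
    d = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(d), Z.T @ y)
```

Increasing `lam` shrinks the weight vector toward zero, which is the "handicapping" of the minimization the slides describe.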
SLIDE 13 The result
Minimizing $E_{in}(w) + \frac{\lambda}{N} w^T w$ for different $\lambda$'s:

[figures: fits for $\lambda = 0$, $\lambda = 0.0001$, $\lambda = 0.01$, $\lambda = 1$]

overfitting $\to \to \to \to$ underfitting
SLIDE 14 Weight 'decay'
Minimizing $E_{in}(w) + \frac{\lambda}{N} w^T w$ is called weight decay. Why?

Gradient descent:
$w(t+1) = w(t) - \eta\, \nabla E_{in}\!\left(w(t)\right) - 2 \eta \frac{\lambda}{N} w(t) = w(t) \left(1 - 2 \eta \frac{\lambda}{N}\right) - \eta\, \nabla E_{in}\!\left(w(t)\right)$

Applies in neural networks:
$w^T w = \sum_{l=1}^{L} \sum_{i=0}^{d^{(l-1)}} \sum_{j=1}^{d^{(l)}} \left( w_{ij}^{(l)} \right)^2$
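The update above can be sketched directly (a toy illustration; the function name and plain-list representation are mine). Each step first multiplies the weights by the shrink factor $1 - 2\eta\lambda/N$, hence "decay":

```python
def weight_decay_step(w, grad_Ein, eta, lam, N):
    """One gradient-descent step on E_in(w) + (lam/N) w^T w:
    w <- w * (1 - 2*eta*lam/N) - eta * grad_Ein."""
    shrink = 1.0 - 2.0 * eta * lam / N
    return [shrink * wi - eta * gi for wi, gi in zip(w, grad_Ein)]
```

With a zero gradient the weights shrink geometrically toward zero; with `lam = 0` this is the plain gradient-descent step.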
SLIDE 15 Variations of weight decay
Emphasis of certain weights: $\sum_{q=0}^{Q} \gamma_q\, w_q^2$

Examples: $\gamma_q = 2^q \implies$ low-order fit;  $\gamma_q = 2^{-q} \implies$ high-order fit

Neural networks: different layers get different $\gamma$'s

Tikhonov regularizer: $w^T \Gamma^T \Gamma\, w$
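As a small illustration of the emphasis idea (names are mine; this is just the penalty term, not a full fit), $\gamma_q = 2^q$ makes high-order weights expensive, pushing the fit toward low order:

```python
def emphasis_penalty(w, gammas):
    """Weighted decay term sum_q gamma_q * w_q^2. With gammas[q] = 2**q,
    high-order coefficients are penalized most (a low-order fit);
    gammas[q] = 2**-q reverses the emphasis."""
    return sum(g * wq * wq for g, wq in zip(gammas, w))
```

For the same weight vector, the $2^q$ and $2^{-q}$ schedules charge opposite ends of the polynomial.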
SLIDE 16 Even weight growth!
We 'constrain' the weights to be large: bad!

Practical rule: stochastic noise is 'high-frequency'; deterministic noise is also non-smooth

$\implies$ constrain learning towards smoother hypotheses

[figure: expected $E_{out}$ vs. regularization parameter $\lambda$, for weight growth and weight decay]
SLIDE 17 General form of augmented error
Calling the regularizer $\Omega = \Omega(h)$, we minimize

$E_{aug}(h) = E_{in}(h) + \frac{\lambda}{N}\, \Omega(h)$

Rings a bell?

$E_{out}(h) \le E_{in}(h) + \Omega(\mathcal{H})$

$E_{aug}$ is better than $E_{in}$ as a proxy for $E_{out}$
SLIDE 18 Outline
  • Regularization: informal
  • Regularization: formal
  • Weight decay
  • Choosing a regularizer
SLIDE 19 The perfect regularizer $\Omega$
Constraint in the 'direction' of the target function (going in circles)

Guiding principle: direction of smoother or simpler

Chose a bad $\Omega$? We still have $\lambda$!

Regularization is a necessary evil
SLIDE 20 Neural-network regularizers
Weight decay: from linear to logical
[figure: linear, tanh, and hard-threshold activations]

Weight elimination: fewer weights $\implies$ smaller VC dimension

Soft weight elimination:
$\Omega(w) = \sum_{i,j,l} \dfrac{\left( w_{ij}^{(l)} \right)^2}{\beta^2 + \left( w_{ij}^{(l)} \right)^2}$
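A sketch of the soft-weight-elimination penalty over a flat list of weights (the function name is mine). For $|w| \ll \beta$ the term behaves like weight decay $(w/\beta)^2$; for $|w| \gg \beta$ it saturates near 1, so the penalty approximately counts the large weights, pushing small ones toward zero:

```python
def soft_weight_elimination(weights, beta):
    """Omega(w) = sum over weights of w^2 / (beta^2 + w^2)."""
    b2 = beta * beta
    return sum(w * w / (b2 + w * w) for w in weights)
```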
SLIDE 21 Early stopping as a regularizer
[figure: error vs. epochs; $E_{in}$ (bottom curve) keeps decreasing while $E_{out}$ (top curve) bottoms out before training ends, marking the early-stopping point]

Regularization through the optimizer!

When to stop? Validation
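A hypothetical sketch of the idea (all names are mine; `step` stands in for one epoch of any optimizer and `val_error` for an error estimate on a validation set): keep training, but return the weights where validation error was lowest.

```python
def train_with_early_stopping(step, w0, val_error, max_epochs):
    """Run the optimizer for max_epochs, tracking validation error, and
    return the weights with the lowest validation error seen; this stops
    the effective training 'early' even though the loop runs to the end."""
    w, best_w, best_err = w0, w0, val_error(w0)
    for _ in range(max_epochs):
        w = step(w)
        err = val_error(w)
        if err < best_err:
            best_w, best_err = w, err
    return best_w
```

The regularization lives in the optimizer rather than in the error function, which is the point of the slide.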
SLIDE 22 The optimal $\lambda$
[figures: expected $E_{out}$ vs. regularization parameter $\lambda$; stochastic noise levels $\sigma^2 = 0, 0.25, 0.5$ and deterministic noise with $Q_f = 15, 30, 100$]