slide-1
SLIDE 1 Review of Lecture 12

Regularization

Constrained → unconstrained:

    minimize E_in(w) subject to wᵀw ≤ C
    ⟺ minimize E_aug(w) = E_in(w) + (λ/N) wᵀw
    (at the constrained solution, ∇E_in is normal to the constraint surface)

Choosing a regularizer:

    E_aug(h) = E_in(h) + (λ/N) Ω(h)

Ω(h): heuristic → smooth, simple h; most used: weight decay

λ: principled → validation

[Figure: two fits of the same data, λ = 0.0001 (overfitting) versus λ = 1.0 (over-smoothing).]
slide-2
SLIDE 2 Learning From Data

Yaser S. Abu-Mostafa
California Institute of Technology

Lecture 13: Validation

Sponsored by Caltech's Provost Office, E&AS Division, and IST
Tuesday, May 15, 2012
slide-3
SLIDE 3 Outline
  • The validation set
  • Model selection
  • Cross validation

AM|L  Creator: Yaser Abu-Mostafa  (LFD Lecture 13, 2/22)
slide-4
SLIDE 4 Validation versus regularization

In one form or another,

    E_out(h) = E_in(h) + overfit penalty

Regularization:

    E_out(h) = E_in(h) + overfit penalty   (regularization estimates this quantity)

Validation:

    E_out(h) = E_in(h) + overfit penalty   (validation estimates E_out(h) itself)

slide-5
SLIDE 5 Analyzing the estimate

On an out-of-sample point (x, y), the error is e(h(x), y)

Squared error:   e(h(x), y) = (h(x) − y)²

Binary error:    e(h(x), y) = ⟦h(x) ≠ y⟧

    E[ e(h(x), y) ] = E_out(h)

    var[ e(h(x), y) ] = σ²

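The two pointwise error measures can be written directly in Python; a minimal sketch (the function names are mine, not the lecture's):

```python
def squared_error(h_x, y):
    """Squared pointwise error: e(h(x), y) = (h(x) - y)^2."""
    return (h_x - y) ** 2

def binary_error(h_x, y):
    """Binary pointwise error: e(h(x), y) = [[h(x) != y]]."""
    return 1 if h_x != y else 0

# One out-of-sample point, two error measures:
e_sq = squared_error(2, 3)    # 1
e_bin = binary_error(+1, -1)  # 1
```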
slide-6
SLIDE 6 From a point to a set

On a validation set (x₁, y₁), …, (x_K, y_K), the error is

    E_val(h) = (1/K) Σ_{k=1}^{K} e(h(x_k), y_k)

    E[ E_val(h) ] = (1/K) Σ_{k=1}^{K} E[ e(h(x_k), y_k) ] = E_out(h)

    var[ E_val(h) ] = (1/K²) Σ_{k=1}^{K} var[ e(h(x_k), y_k) ] = σ²/K

    E_val(h) = E_out(h) ± O(1/√K)

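A small simulation (my own sketch, not from the slides) checks the two identities above: the mean of E_val stays at E_out while its variance shrinks like σ²/K:

```python
import random

random.seed(0)

def e_val(errors):
    """E_val(h) = (1/K) * sum_k e(h(x_k), y_k)."""
    return sum(errors) / len(errors)

def simulate(K, trials=20000):
    """Pointwise errors uniform on [0, 1]: E_out = 0.5, sigma^2 = 1/12."""
    vals = [e_val([random.random() for _ in range(K)]) for _ in range(trials)]
    mean = sum(vals) / trials
    var = sum((v - mean) ** 2 for v in vals) / trials
    return mean, var

m1, v1 = simulate(K=1)
m25, v25 = simulate(K=25)
# Both means sit near 0.5; the variance drops by roughly a factor of K = 25.
```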
slide-7
SLIDE 7 K is taken out of N

Given the data set D = (x₁, y₁), …, (x_N, y_N):

    K points → D_val (validation)
    N − K points → D_train (training)

The O(1/√K) error bar:

    Small K ⇒ bad estimate.   Large K ⇒ ?

[Figure: expected error versus number of training points N − K; curves for E_out and E_in, with K growing to the left.]

slide-8
SLIDE 8 K is put back into N

    D  →  D_train ∪ D_val
    N     (N − K)    (K)

    D ⟹ g        D_train ⟹ g⁻        E_val = E_val(g⁻)

Large K ⇒ bad estimate!

Rule of Thumb:

    K = N/5

[Diagram: D (N points) splits into D_train (N − K points), which yields g⁻, and D_val (K points), which yields E_val(g⁻); finally all of D yields the reported g.]

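The recipe on this slide, hold out K = N/5 points, report E_val(g⁻), then put K back and output g trained on all N points, can be sketched as follows. The learner is a stand-in (1-D least-squares line) and all helper names and the toy data are mine:

```python
import random

random.seed(1)

def fit_line(data):
    """Least-squares line y = a*x + b through (x, y) pairs."""
    n = len(data)
    sx = sum(x for x, _ in data); sy = sum(y for _, y in data)
    sxx = sum(x * x for x, _ in data); sxy = sum(x * y for x, y in data)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return a, (sy - a * sx) / n

def mse(model, data):
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in data) / len(data)

# Noisy samples of y = 2x + 1 (hypothetical data).
D = [(i / 10, 2 * (i / 10) + 1 + random.gauss(0, 0.1)) for i in range(50)]
random.shuffle(D)

K = len(D) // 5                # rule of thumb: K = N/5
D_val, D_train = D[:K], D[K:]

g_minus = fit_line(D_train)    # learned on N - K points
E_val = mse(g_minus, D_val)    # validation estimate of E_out(g_minus)
g = fit_line(D)                # K is put back: report g, trained on all N points
```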
slide-9
SLIDE 9 Why 'validation'

[Figure: early stopping; error versus training epochs, the lower (training) curve keeps falling while the upper (out-of-sample) curve turns back up; stop where the upper curve is minimized.]

D_val is used to make learning choices.

If an estimate of E_out affects learning: the set is no longer a test set!

It becomes a validation set.

slide-10
SLIDE 10 What's the difference?

Test set is unbiased; validation set has optimistic bias.

Two hypotheses h₁ and h₂ with E_out(h₁) = E_out(h₂) = 0.5

Error estimates e₁ and e₂, uniform on [0, 1]

Pick h ∈ {h₁, h₂} with e = min(e₁, e₂)

    E[e] < 0.5   →   optimistic bias

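The optimistic bias is easy to check numerically (my sketch): for e₁, e₂ independent and uniform on [0, 1], E[min(e₁, e₂)] = 1/3, well below the true error 0.5:

```python
import random

random.seed(0)

trials = 100000
picks = [min(random.random(), random.random()) for _ in range(trials)]
bias_estimate = sum(picks) / trials
# E[min(e1, e2)] = 1/3 < 0.5: the selected hypothesis looks better than it is.
```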
slide-11
SLIDE 11 Outline
  • The validation set
  • Model selection
  • Cross validation

slide-12
SLIDE 12 Using D_val more than once

M models H₁, …, H_M

Use D_train to learn g⁻_m for each model.

Evaluate g⁻_m using D_val:

    E_m = E_val(g⁻_m);   m = 1, …, M

Pick the model m = m* with the smallest E_m

[Diagram: D_train produces g₁⁻, g₂⁻, …, g_M⁻; D_val scores them E₁, E₂, …, E_M; pick the best (H_m*, E_m*).]

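The selection loop can be sketched generically (the function names and the toy "constant predictor" models are mine, not the lecture's):

```python
def select_model(models, train, evaluate, D_train, D_val):
    """Pick m* with the smallest E_m = E_val(g_m^-)."""
    errors = []
    for H in models:
        g_minus = train(H, D_train)          # learn g_m^- from D_train only
        errors.append(evaluate(g_minus, D_val))
    m_star = min(range(len(models)), key=lambda m: errors[m])
    return m_star, errors

# Toy use: each "model" is a fixed constant predictor, so training ignores the data.
models = [0.0, 0.5, 1.0]
train = lambda H, D: H
evaluate = lambda g, D: sum((g - y) ** 2 for y in D) / len(D)
m_star, errors = select_model(models, train, evaluate, D_train=[], D_val=[0.4, 0.6])
# m_star == 1: the constant 0.5 has the smallest validation error.
```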
slide-13
SLIDE 13 The bias

We selected the model H_m* using D_val

E_val(g⁻_m*) is a biased estimate of E_out(g⁻_m*)

Illustration: selecting between 2 models

[Figure: expected error versus validation set size K; the E_val(g⁻_m*) curve lies below the E_out(g⁻_m*) curve, and the gap is the bias.]

slide-14
SLIDE 14 How much bias

For M models H₁, …, H_M, D_val is used for "training" on the finalists model:

    H_val = {g⁻₁, g⁻₂, …, g⁻_M}

Back to Hoeffding and VC!

    E_out(g⁻_m*) ≤ E_val(g⁻_m*) + O( √(ln M / K) )

This also covers continuous choices, e.g. the regularization parameter λ and the early-stopping time T.

slide-15
SLIDE 15 Data contamination

Error estimates: E_in, E_test, E_val

Contamination: optimistic (deceptive) bias in estimating E_out

Training set: totally contaminated
Validation set: slightly contaminated
Test set: totally 'clean'

slide-16
SLIDE 16 Outline
  • The validation set
  • Model selection
  • Cross validation

slide-17
SLIDE 17 The dilemma about K

The following chain of reasoning:

    E_out(g)  ≈  E_out(g⁻)  ≈  E_val(g⁻)
        (small K)     (large K)

highlights the dilemma in selecting K:

Can we have K both small and large?

slide-18
SLIDE 18 Leave one out

N − 1 points for training, and 1 point for validation!

    D_n = (x₁, y₁), …, (x_{n−1}, y_{n−1}), (x_{n+1}, y_{n+1}), …, (x_N, y_N)

(the point (x_n, y_n) is left out)

Final hypothesis learned from D_n is g⁻_n

    e_n = E_val(g⁻_n) = e(g⁻_n(x_n), y_n)

Cross validation error:

    E_cv = (1/N) Σ_{n=1}^{N} e_n

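Leave-one-out cross validation as defined above, sketched with a least-squares line as the learner (helper names and data are mine):

```python
def fit_line(pts):
    """Least-squares line y = a*x + b (exact for two points)."""
    n = len(pts)
    sx = sum(x for x, _ in pts); sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts); sxy = sum(x * y for x, y in pts)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return a, (sy - a * sx) / n

def loocv_error(D, fit, err):
    """E_cv = (1/N) * sum_n e(g_n^-(x_n), y_n)."""
    total = 0.0
    for n in range(len(D)):
        D_n = D[:n] + D[n + 1:]          # leave (x_n, y_n) out
        g_n = fit(D_n)                   # g_n^- learned from the other N - 1 points
        x_n, y_n = D[n]
        total += err(g_n, x_n, y_n)      # e_n
    return total / len(D)

err = lambda g, x, y: (g[0] * x + g[1] - y) ** 2
E_cv = loocv_error([(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)], fit_line, err)
# Collinear points: every leave-one-out line is exact, so E_cv is 0.
```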
slide-19
SLIDE 19 Illustration of cross validation

[Figures: three leave-one-out fits of a line to three data points, giving the errors e₁, e₂, e₃.]

    E_cv = (1/3)(e₁ + e₂ + e₃)

slide-20
SLIDE 20 Model selection using CV

[Figures: leave-one-out errors e₁, e₂, e₃ computed twice on the same three points, once for the Linear model and once for the Constant model.]

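The comparison can be reproduced in miniature (the three data points are my own invention): compute E_cv for a linear and a constant model on the same points and keep the smaller:

```python
def fit_const(pts):
    """Constant model: the mean of the y values."""
    return sum(y for _, y in pts) / len(pts)

def fit_line(pts):
    """Least-squares line y = a*x + b."""
    n = len(pts)
    sx = sum(x for x, _ in pts); sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts); sxy = sum(x * y for x, y in pts)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return a, (sy - a * sx) / n

def e_cv(D, fit, predict):
    """Leave-one-out cross validation error."""
    return sum((predict(fit(D[:n] + D[n + 1:]), x) - y) ** 2
               for n, (x, y) in enumerate(D)) / len(D)

D = [(0.0, 0.9), (0.5, 0.1), (1.0, 1.1)]   # noisy points, barely any trend
E_cv_line = e_cv(D, fit_line, lambda g, x: g[0] * x + g[1])
E_cv_const = e_cv(D, fit_const, lambda g, x: g)
# With so little signal, the constant model wins the CV comparison here.
```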
slide-21
SLIDE 21 Cross validation in action

Digits classification task: '1' versus 'not 1'.  Different errors compared.

Nonlinear transform:

    (1, x₁, x₂) → (1, x₁, x₂, x₁², x₁x₂, x₂², x₁³, x₁²x₂, …, x₁⁵, x₁⁴x₂, x₁³x₂², x₁²x₂³, x₁x₂⁴, x₂⁵)

[Figures: scatter of '1' versus 'not 1' in the (average intensity, symmetry) plane; E_out, E_cv, and E_in versus the number of features used.]

slide-22
SLIDE 22 The result

[Figures: decision boundaries in the (average intensity, symmetry) plane, without validation (left) and with validation (right).]

    without validation:  E_in = 0%,    E_out = 2.5%
    with validation:     E_in = 0.8%,  E_out = 1.5%

slide-23
SLIDE 23 Leave more than one out

Leave one out: N training sessions on N − 1 points each.

More points for validation?

    D = D₁ D₂ D₃ D₄ | D₅ | D₆ D₇ D₈ D₉ D₁₀
        (train)   (validate)   (train)

N/K training sessions on N − K points each.

10-fold cross validation:

    K = N/10

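A k-fold split along the lines of this slide, sketched in Python (the fold layout, toy learner, and names are mine):

```python
def k_fold_cv(D, k, fit, err):
    """N/K training sessions on N - K points each; average the fold errors."""
    K = len(D) // k                            # points per fold (assumes k divides N)
    fold_errors = []
    for i in range(k):
        D_val = D[i * K:(i + 1) * K]           # this fold validates
        D_train = D[:i * K] + D[(i + 1) * K:]  # the other folds train
        g = fit(D_train)
        fold_errors.append(sum(err(g, x, y) for x, y in D_val) / K)
    return sum(fold_errors) / k

# Toy learner: predict the training mean of y.
fit = lambda D: sum(y for _, y in D) / len(D)
err = lambda g, x, y: (g - y) ** 2

D = [(float(i), float(i % 2)) for i in range(20)]   # y alternates 0, 1
E_cv = k_fold_cv(D, k=10, fit=fit, err=err)
# Every training mean is 0.5, so each fold error is 0.25 and E_cv = 0.25.
```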