slide-1
SLIDE 1 Review of Lecture 12

Regularization

Constrained → unconstrained:

    minimize E_in(w) subject to wᵀw ≤ C
    ⟺ minimize E_aug(w) = E_in(w) + (λ/N) wᵀw
    (at the constrained solution, ∇E_in is normal to the constraint surface)

Choosing a regularizer:

    E_aug(h) = E_in(h) + (λ/N) Ω(h)

Ω(h): heuristic → smooth, simple h; most used: weight decay

λ: principled → validation

[Figure: two fits of the same data, λ = 0.0001 (overfitting) versus λ = 1.0 (over-smoothing).]
slide-2
SLIDE 2 Learning From Data

Yaser S. Abu-Mostafa
California Institute of Technology

Lecture 13: Validation

Sponsored by Caltech's Provost Office, E&AS Division, and IST
Tuesday, May 15, 2012
slide-3
SLIDE 3 Outline
  • The validation set
  • Model selection
  • Cross validation

AM|L  Creator: Yaser Abu-Mostafa  (LFD Lecture 13, 2/22)
slide-4
SLIDE 4 Validation versus regularization

In one form or another,

    E_out(h) = E_in(h) + overfit penalty

Regularization:

    E_out(h) = E_in(h) + overfit penalty   (regularization estimates this quantity)

Validation:

    E_out(h) = E_in(h) + overfit penalty   (validation estimates E_out(h) itself)

slide-5
SLIDE 5 Analyzing the estimate

On an out-of-sample point (x, y), the error is e(h(x), y)

Squared error:   e(h(x), y) = (h(x) − y)²

Binary error:    e(h(x), y) = ⟦h(x) ≠ y⟧

    E[ e(h(x), y) ] = E_out(h)

    var[ e(h(x), y) ] = σ²

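The two pointwise error measures can be written directly in Python; a minimal sketch (the function names are mine, not the lecture's):

```python
def squared_error(h_x, y):
    """Squared pointwise error: e(h(x), y) = (h(x) - y)^2."""
    return (h_x - y) ** 2

def binary_error(h_x, y):
    """Binary pointwise error: e(h(x), y) = [[h(x) != y]]."""
    return 1 if h_x != y else 0

# One out-of-sample point, two error measures:
e_sq = squared_error(2, 3)    # 1
e_bin = binary_error(+1, -1)  # 1
```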
slide-6
SLIDE 6 From a point to a set

On a validation set (x₁, y₁), …, (x_K, y_K), the error is

    E_val(h) = (1/K) Σ_{k=1}^{K} e(h(x_k), y_k)

    E[ E_val(h) ] = (1/K) Σ_{k=1}^{K} E[ e(h(x_k), y_k) ] = E_out(h)

    var[ E_val(h) ] = (1/K²) Σ_{k=1}^{K} var[ e(h(x_k), y_k) ] = σ²/K

    E_val(h) = E_out(h) ± O(1/√K)

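A small simulation (my own sketch, not from the slides) checks the two identities above: the mean of E_val stays at E_out while its variance shrinks like σ²/K:

```python
import random

random.seed(0)

def e_val(errors):
    """E_val(h) = (1/K) * sum_k e(h(x_k), y_k)."""
    return sum(errors) / len(errors)

def simulate(K, trials=20000):
    """Pointwise errors uniform on [0, 1]: E_out = 0.5, sigma^2 = 1/12."""
    vals = [e_val([random.random() for _ in range(K)]) for _ in range(trials)]
    mean = sum(vals) / trials
    var = sum((v - mean) ** 2 for v in vals) / trials
    return mean, var

m1, v1 = simulate(K=1)
m25, v25 = simulate(K=25)
# Both means sit near 0.5; the variance drops by roughly a factor of K = 25.
```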
slide-7
SLIDE 7 K is taken out of N

Given the data set D = (x₁, y₁), …, (x_N, y_N):

    K points → D_val (validation)
    N − K points → D_train (training)

The O(1/√K) error bar:

    Small K ⇒ bad estimate.   Large K ⇒ ?

[Figure: expected error versus number of training points N − K; curves for E_out and E_in, with K growing to the left.]

slide-8
SLIDE 8 K is put back into N

    D  →  D_train ∪ D_val
    N     (N − K)    (K)

    D ⟹ g        D_train ⟹ g⁻        E_val = E_val(g⁻)

Large K ⇒ bad estimate!

Rule of Thumb:

    K = N/5

[Diagram: D (N points) splits into D_train (N − K points), which yields g⁻, and D_val (K points), which yields E_val(g⁻); finally all of D yields the reported g.]

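The recipe on this slide, hold out K = N/5 points, report E_val(g⁻), then put K back and output g trained on all N points, can be sketched as follows. The learner is a stand-in (1-D least-squares line) and all helper names and the toy data are mine:

```python
import random

random.seed(1)

def fit_line(data):
    """Least-squares line y = a*x + b through (x, y) pairs."""
    n = len(data)
    sx = sum(x for x, _ in data); sy = sum(y for _, y in data)
    sxx = sum(x * x for x, _ in data); sxy = sum(x * y for x, y in data)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return a, (sy - a * sx) / n

def mse(model, data):
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in data) / len(data)

# Noisy samples of y = 2x + 1 (hypothetical data).
D = [(i / 10, 2 * (i / 10) + 1 + random.gauss(0, 0.1)) for i in range(50)]
random.shuffle(D)

K = len(D) // 5                # rule of thumb: K = N/5
D_val, D_train = D[:K], D[K:]

g_minus = fit_line(D_train)    # learned on N - K points
E_val = mse(g_minus, D_val)    # validation estimate of E_out(g_minus)
g = fit_line(D)                # K is put back: report g, trained on all N points
```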
slide-9
SLIDE 9 Why 'validation'

[Figure: early stopping; error versus training epochs, the lower (training) curve keeps falling while the upper (out-of-sample) curve turns back up; stop where the upper curve is minimized.]

D_val is used to make learning choices.

If an estimate of E_out affects learning: the set is no longer a test set!

It becomes a validation set.

slide-10
SLIDE 10 What's the difference?

Test set is unbiased; validation set has optimistic bias.

Two hypotheses h₁ and h₂ with E_out(h₁) = E_out(h₂) = 0.5

Error estimates e₁ and e₂, uniform on [0, 1]

Pick h ∈ {h₁, h₂} with e = min(e₁, e₂)

    E[e] < 0.5   →   optimistic bias

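The optimistic bias is easy to check numerically (my sketch): for e₁, e₂ independent and uniform on [0, 1], E[min(e₁, e₂)] = 1/3, well below the true error 0.5:

```python
import random

random.seed(0)

trials = 100000
picks = [min(random.random(), random.random()) for _ in range(trials)]
bias_estimate = sum(picks) / trials
# E[min(e1, e2)] = 1/3 < 0.5: the selected hypothesis looks better than it is.
```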
slide-11
SLIDE 11 Outline
  • The validation set
  • Model selection
  • Cross validation

slide-12
SLIDE 12 Using D_val more than once

M models H₁, …, H_M

Use D_train to learn g⁻_m for each model.

Evaluate g⁻_m using D_val:

    E_m = E_val(g⁻_m);   m = 1, …, M

Pick the model m = m* with the smallest E_m

[Diagram: D_train produces g₁⁻, g₂⁻, …, g_M⁻; D_val scores them E₁, E₂, …, E_M; pick the best (H_m*, E_m*).]

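The selection loop can be sketched generically (the function names and the toy "constant predictor" models are mine, not the lecture's):

```python
def select_model(models, train, evaluate, D_train, D_val):
    """Pick m* with the smallest E_m = E_val(g_m^-)."""
    errors = []
    for H in models:
        g_minus = train(H, D_train)          # learn g_m^- from D_train only
        errors.append(evaluate(g_minus, D_val))
    m_star = min(range(len(models)), key=lambda m: errors[m])
    return m_star, errors

# Toy use: each "model" is a fixed constant predictor, so training ignores the data.
models = [0.0, 0.5, 1.0]
train = lambda H, D: H
evaluate = lambda g, D: sum((g - y) ** 2 for y in D) / len(D)
m_star, errors = select_model(models, train, evaluate, D_train=[], D_val=[0.4, 0.6])
# m_star == 1: the constant 0.5 has the smallest validation error.
```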
slide-13
SLIDE 13 The bias

We selected the model H_m* using D_val

E_val(g⁻_m*) is a biased estimate of E_out(g⁻_m*)

Illustration: selecting between 2 models

[Figure: expected error versus validation set size K; the E_val(g⁻_m*) curve lies below the E_out(g⁻_m*) curve, and the gap is the bias.]

slide-14
SLIDE 14 How much bias

For M models H₁, …, H_M, D_val is used for "training" on the finalists model:

    H_val = {g⁻₁, g⁻₂, …, g⁻_M}

Back to Hoeffding and VC!

    E_out(g⁻_m*) ≤ E_val(g⁻_m*) + O( √(ln M / K) )

This also covers continuous choices, e.g. the regularization parameter λ and the early-stopping time T.

slide-15
SLIDE 15 Data contamination

Error estimates: E_in, E_test, E_val

Contamination: optimistic (deceptive) bias in estimating E_out

Training set: totally contaminated
Validation set: slightly contaminated
Test set: totally 'clean'

slide-16
SLIDE 16 Outline
  • The validation set
  • Model selection
  • Cross validation

slide-17
SLIDE 17 The dilemma about K

The following chain of reasoning:

    E_out(g)  ≈  E_out(g⁻)  ≈  E_val(g⁻)
        (small K)     (large K)

highlights the dilemma in selecting K:

Can we have K both small and large?

slide-18
SLIDE 18 Leave one out

N − 1 points for training, and 1 point for validation!

    D_n = (x₁, y₁), …, (x_{n−1}, y_{n−1}), (x_{n+1}, y_{n+1}), …, (x_N, y_N)

(the point (x_n, y_n) is left out)

Final hypothesis learned from D_n is g⁻_n

    e_n = E_val(g⁻_n) = e(g⁻_n(x_n), y_n)

Cross validation error:

    E_cv = (1/N) Σ_{n=1}^{N} e_n

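Leave-one-out cross validation as defined above, sketched with a least-squares line as the learner (helper names and data are mine):

```python
def fit_line(pts):
    """Least-squares line y = a*x + b (exact for two points)."""
    n = len(pts)
    sx = sum(x for x, _ in pts); sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts); sxy = sum(x * y for x, y in pts)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return a, (sy - a * sx) / n

def loocv_error(D, fit, err):
    """E_cv = (1/N) * sum_n e(g_n^-(x_n), y_n)."""
    total = 0.0
    for n in range(len(D)):
        D_n = D[:n] + D[n + 1:]          # leave (x_n, y_n) out
        g_n = fit(D_n)                   # g_n^- learned from the other N - 1 points
        x_n, y_n = D[n]
        total += err(g_n, x_n, y_n)      # e_n
    return total / len(D)

err = lambda g, x, y: (g[0] * x + g[1] - y) ** 2
E_cv = loocv_error([(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)], fit_line, err)
# Collinear points: every leave-one-out line is exact, so E_cv is 0.
```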
slide-19
SLIDE 19 Illustration of cross validation

[Figures: three leave-one-out fits of a line to three data points, giving the errors e₁, e₂, e₃.]

    E_cv = (1/3)(e₁ + e₂ + e₃)

slide-20
SLIDE 20 Model selection using CV

[Figures: leave-one-out errors e₁, e₂, e₃ computed twice on the same three points, once for the Linear model and once for the Constant model.]

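The comparison can be reproduced in miniature (the three data points are my own invention): compute E_cv for a linear and a constant model on the same points and keep the smaller:

```python
def fit_const(pts):
    """Constant model: the mean of the y values."""
    return sum(y for _, y in pts) / len(pts)

def fit_line(pts):
    """Least-squares line y = a*x + b."""
    n = len(pts)
    sx = sum(x for x, _ in pts); sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts); sxy = sum(x * y for x, y in pts)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return a, (sy - a * sx) / n

def e_cv(D, fit, predict):
    """Leave-one-out cross validation error."""
    return sum((predict(fit(D[:n] + D[n + 1:]), x) - y) ** 2
               for n, (x, y) in enumerate(D)) / len(D)

D = [(0.0, 0.9), (0.5, 0.1), (1.0, 1.1)]   # noisy points, barely any trend
E_cv_line = e_cv(D, fit_line, lambda g, x: g[0] * x + g[1])
E_cv_const = e_cv(D, fit_const, lambda g, x: g)
# With so little signal, the constant model wins the CV comparison here.
```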
slide-21
SLIDE 21 Cross validation in action

Digits classification task: '1' versus 'not 1'.  Different errors compared.

Nonlinear transform:

    (1, x₁, x₂) → (1, x₁, x₂, x₁², x₁x₂, x₂², x₁³, x₁²x₂, …, x₁⁵, x₁⁴x₂, x₁³x₂², x₁²x₂³, x₁x₂⁴, x₂⁵)

[Figures: scatter of '1' versus 'not 1' in the (average intensity, symmetry) plane; E_out, E_cv, and E_in versus the number of features used.]

slide-22
SLIDE 22 The result

[Figures: decision boundaries in the (average intensity, symmetry) plane, without validation (left) and with validation (right).]

    without validation:  E_in = 0%,    E_out = 2.5%
    with validation:     E_in = 0.8%,  E_out = 1.5%

slide-23
SLIDE 23 Leave more than one out

Leave one out: N training sessions on N − 1 points each.

More points for validation?

    D = D₁ D₂ D₃ D₄ | D₅ | D₆ D₇ D₈ D₉ D₁₀
        (train)   (validate)   (train)

N/K training sessions on N − K points each.

10-fold cross validation:

    K = N/10

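A k-fold split along the lines of this slide, sketched in Python (the fold layout, toy learner, and names are mine):

```python
def k_fold_cv(D, k, fit, err):
    """N/K training sessions on N - K points each; average the fold errors."""
    K = len(D) // k                            # points per fold (assumes k divides N)
    fold_errors = []
    for i in range(k):
        D_val = D[i * K:(i + 1) * K]           # this fold validates
        D_train = D[:i * K] + D[(i + 1) * K:]  # the other folds train
        g = fit(D_train)
        fold_errors.append(sum(err(g, x, y) for x, y in D_val) / K)
    return sum(fold_errors) / k

# Toy learner: predict the training mean of y.
fit = lambda D: sum(y for _, y in D) / len(D)
err = lambda g, x, y: (g - y) ** 2

D = [(float(i), float(i % 2)) for i in range(20)]   # y alternates 0, 1
E_cv = k_fold_cv(D, k=10, fit=fit, err=err)
# Every training mean is 0.5, so each fold error is 0.25 and E_cv = 0.25.
```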