slide-1
SLIDE 1 Homework and Scribing
Scribes today: Sarthah & Hao (Thanks!)
Available Friday, Dec 12:
  • Homework I (due Fri, Dec 26)
  • Scribing schedule + LaTeX template
  • Lecture 1 notes
slide-2
SLIDE 2 Distribution Recap: Prior and Posterior
Example: biased coin.
  • Prior density: $p(\theta) = \mathrm{Beta}(\theta; \alpha, \beta)$
  • Likelihood: $y_n \sim \mathrm{Binom}(\theta)$
  • Posterior: $p(\theta \mid y_{1:N}) = \mathrm{Beta}(\theta; \alpha + a, \beta + b)$, with sufficient statistics $a = \sum_n y_n$ and $b = N - a$
[Figure: prior and posteriors after 1, 10, and 1000 trials: $p(\theta) = \mathrm{Beta}(\theta; 2, 2)$, $p(\theta \mid y_1) = \mathrm{Beta}(\theta; 2, 3)$, $p(\theta \mid y_{1:10}) = \mathrm{Beta}(\theta; 6, 4)$, and, with $a = 672$, $b = 328$, $p(\theta \mid y_{1:1000}) = \mathrm{Beta}(\theta; 674, 330)$]
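The conjugate update above is just two additions. A minimal sketch in Python (assuming NumPy and SciPy; the Beta(2, 2) prior echoes the slide's figure, but the simulated coin and its bias are illustrative choices of mine):

```python
import numpy as np
from scipy import stats

# Prior hyperparameters: p(theta) = Beta(theta; alpha, beta).
alpha, beta = 2.0, 2.0

# Simulate N = 1000 coin flips (the true bias 0.67 is illustrative).
rng = np.random.default_rng(0)
y = rng.binomial(1, 0.67, size=1000)

# Sufficient statistics: a = number of heads, b = N - a.
a = int(y.sum())
b = len(y) - a

# Conjugate update: p(theta | y_1:N) = Beta(theta; alpha + a, beta + b).
posterior = stats.beta(alpha + a, beta + b)
print(f"a={a}, b={b}, posterior mean = {posterior.mean():.3f}")
```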
slide-3
SLIDE 3 Bayesian Regression
A Bayesian regression model places a distribution over functions:
  $y_n = f(x_n) + \epsilon_n$,  $f \sim p(f)$,  $\epsilon_n \sim p(\epsilon)$,  $n = 1, \ldots, N$
[Figure: functions drawn from the prior $p(f)$, with observation noise $\epsilon$]
Linear basis function model:
  $f(x_n) := w^\top x_n = \sum_{d=1}^{D} w_d x_{n,d}$ in the linear case, or more generally $f(x_n) := w^\top \phi(x_n)$ for basis functions $\phi: \mathbb{R}^{D'} \to \mathbb{R}^{D}$
In matrix form: $y = Xw + \epsilon$, where $y$ is $N \times 1$, $X$ is $N \times D$ (row $n$ is $x_n^\top$, or $\phi(x_n)^\top$ in the basis-function case), and $w$ is $D \times 1$.
slide-4
SLIDE 4 Example: Polynomial Basis
  $y_n = w^\top \phi(x_n) + \epsilon_n$,  $w \sim p(w)$,  $\epsilon_n \sim p(\epsilon)$,  $n = 1, \ldots, N$
Reduces to linear regression when $\phi$ is the identity.
Polynomial basis: $\phi_d(x_n) = x_n^d$
A prior on $w$ implies a prior over functions $f$; see the sketch below.
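To make the induced prior over functions concrete, a minimal sketch (assuming NumPy; the standard normal prior on $w$, degree 3, and the grid are illustrative choices):

```python
import numpy as np

def poly_features(x, degree):
    """Polynomial basis: phi_d(x) = x**d for d = 0, ..., degree."""
    return np.stack([x**d for d in range(degree + 1)], axis=1)

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 100)
Phi = poly_features(x, degree=3)  # shape (100, 4)

# Each weight draw w ~ p(w) defines one random function f(x) = Phi @ w,
# so sampling weights traces out samples from the implied prior over f.
prior_functions = [Phi @ rng.standard_normal(Phi.shape[1]) for _ in range(5)]
```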
slide-5
SLIDE 5 Overfitting and Underfitting
  • Underfitting: residuals are strongly correlated
  • Overfitting: poor generalization
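Both failure modes are easy to reproduce with the polynomial basis from the previous slide. A sketch (the sine data and the degrees 1 and 15 are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(20)

for degree in (1, 15):
    Phi = np.stack([x**d for d in range(degree + 1)], axis=1)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    residuals = y - Phi @ w
    # Degree 1 underfits: residuals follow the sine, i.e. are strongly
    # correlated. Degree 15 fits the noise and generalizes poorly.
    print(degree, np.round(residuals, 2))
```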
slide-6
SLIDE 6 Ridge Regression (L2 Regularization)
Objective:
  $L_{\mathrm{ridge}}(w) = \sum_{n=1}^{N} (y_n - w^\top \phi(x_n))^2 + \lambda \sum_{d=1}^{D} w_d^2$
[Figure: fitted curves for $\lambda = 1$ and $\lambda = 10$]
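The objective in code, for concreteness (a sketch; `Phi` holds the rows $\phi(x_n)^\top$, and the names are mine):

```python
import numpy as np

def ridge_objective(w, Phi, y, lam):
    """L_ridge(w) = sum_n (y_n - w^T phi(x_n))^2 + lam * sum_d w_d^2."""
    residuals = y - Phi @ w
    return residuals @ residuals + lam * (w @ w)
```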
slide-7
SLIDE 7 Ridge Regression: Probabilistic Interpretation
Model:
  $y_n = w^\top \phi(x_n) + \epsilon_n$,  $w_d \sim \mathrm{Normal}(0, s^2)$,  $\epsilon_n \sim \mathrm{Normal}(0, \sigma^2)$
Equivalently $y_n \sim \mathrm{Normal}(w^\top \phi(x_n), \sigma^2)$. Note that $p(y)$ does not depend on $w$.
Maximum a posteriori (MAP) estimation:
  $\hat{w} = \arg\max_w p(w \mid y) = \arg\max_w \frac{p(y, w)}{p(y)} = \arg\max_w p(y, w) = \arg\max_w \log p(y, w)$  (log is monotonic)
  $\log p(y, w) = \sum_n \log p(y_n \mid w) + \log p(w)$
Since $\log \mathrm{Normal}(x; \mu, \sigma^2) = -\tfrac{1}{2}\log(2\pi\sigma^2) - \frac{(x - \mu)^2}{2\sigma^2}$,
  $\sum_{n=1}^{N} \log p(y_n \mid w) = \sum_{n=1}^{N} \Big[ -\tfrac{1}{2}\log(2\pi\sigma^2) - \frac{(y_n - w^\top \phi(x_n))^2}{2\sigma^2} \Big]$
where the first term is constant in $w$ and can be ignored.
slide-8
SLIDE 8 Ridge Regression: Probabilistic Interpretation
  $L_{\mathrm{ridge}}(w) = \sum_{n=1}^{N} (y_n - w^\top \phi(x_n))^2 + \lambda \sum_{d=1}^{D} w_d^2$
Maximum a posteriori estimation for
  $y_n = w^\top \phi(x_n) + \epsilon_n$,  $w_d \sim \mathrm{Normal}(0, s^2)$,  $\epsilon_n \sim \mathrm{Normal}(0, \sigma^2)$:
  $\log p(y, w) = \sum_n \log p(y_n \mid w) + \log p(w)$  (depends on $s$ and $\sigma$)
  $\phantom{\log p(y, w)} = -\frac{1}{2\sigma^2} \sum_{n=1}^{N} (y_n - w^\top \phi(x_n))^2 - \frac{1}{2s^2} \sum_{d=1}^{D} w_d^2 + \mathrm{const}$
Hence
  $\arg\min_w L_{\mathrm{ridge}}(w) = \arg\max_w \log p(y, w)$  when  $\lambda = \sigma^2 / s^2$
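A quick numerical check of this correspondence (a sketch with synthetic data; `sigma2` and `s2` are illustrative variances): scaling the negative log joint by $2\sigma^2$ recovers the ridge objective exactly, so the two share a minimizer.

```python
import numpy as np

rng = np.random.default_rng(3)
N, D = 50, 4
Phi = rng.standard_normal((N, D))
y = Phi @ rng.standard_normal(D) + 0.3 * rng.standard_normal(N)

sigma2, s2 = 0.09, 0.5   # noise variance sigma^2 and prior variance s^2
lam = sigma2 / s2        # lambda = sigma^2 / s^2

def neg_log_joint(w):
    """-log p(y, w), with constants dropped."""
    return ((y - Phi @ w) ** 2).sum() / (2 * sigma2) + (w @ w) / (2 * s2)

def ridge(w):
    return ((y - Phi @ w) ** 2).sum() + lam * (w @ w)

# 2 * sigma2 * (-log p(y, w)) == L_ridge(w) for every w (constants
# dropped), so argmin L_ridge = argmax log p(y, w).
for _ in range(5):
    w = rng.standard_normal(D)
    assert np.isclose(2 * sigma2 * neg_log_joint(w), ridge(w))
```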
slide-9
SLIDE 9 Ridge Regression
  $y_n = w^\top \phi(x_n) + \epsilon_n$,  $w_d \sim \mathrm{Normal}(0, s^2)$,  $\epsilon_n \sim \mathrm{Normal}(0, \sigma^2)$,  $\hat{w} = \arg\max_w \log p(w \mid y)$
[Figure: MAP fits for $\lambda = \sigma^2 / s^2 = 0$, $1$, and $10$]
  • $\sigma \to 0$: precise observations
  • $s \to \infty$: uninformative prior
  • $\lambda = 10$ corresponds to $E[\epsilon_n^2] = 10\, E[w_d^2]$
slide-10
SLIDE 10 Posterior Predictive Distribution
[Figure: observations $y_1, y_2, y_3$ and the predictive distribution at $x^*$, for small and large prior variance $s$]
  $y^* \sim p(y \mid f(x^*))$,  $f \sim p(f \mid y_{1:N})$
i.e., we predict $y^*$ given the previous observations.
slide-11
SLIDE 11 Posterior Predictive Distribution
[Figure: predictive mean $f^*$ and uncertainty at $x^*$ given $y_1, y_2, y_3$, for small and large prior variance $s$]
  $p(y^* \mid y_{1:N}) = \int df\; p(y^* \mid f)\, p(f \mid y_{1:N})$
  $f^* = E[y^* \mid y_{1:N}]$
(The full distribution gives a Gaussian process; the expected value gives kernel ridge regression.)
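For the linear basis function model both factors are Gaussian and the integral is closed-form. A minimal sketch of computing $f^*$ and the predictive variance (assuming the standard conjugate Gaussian posterior over $w$, which these slides take as given; the function and variable names are mine):

```python
import numpy as np

def posterior_predictive(Phi, y, phi_star, sigma2, s2):
    """Mean and variance of p(y* | y_1:N) for Bayesian linear regression."""
    D = Phi.shape[1]
    # Conjugate Gaussian posterior: w | y_1:N ~ Normal(mu, Sigma).
    Sigma = np.linalg.inv(Phi.T @ Phi / sigma2 + np.eye(D) / s2)
    mu = Sigma @ Phi.T @ y / sigma2
    # f* = E[y* | y_1:N]; the predictive variance adds the noise sigma^2.
    f_star = mu @ phi_star
    var_star = phi_star @ Sigma @ phi_star + sigma2
    return f_star, var_star
```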
slide-12
SLIDE 12 Regression with Kernels: Motivation
Idea: what if we used lots of features, $D \gg N$?
  $y_n = w^\top \phi(x_n) + \epsilon_n$,  $y = \Phi w + \epsilon$
We want to calculate the predictive mean:
  $E[y^* \mid y] = \int dy^*\, dw\; y^*\, p(y^* \mid w)\, p(w \mid y)$
  $\phantom{E[y^* \mid y]} = \int dw\; E[y^* \mid w]\, p(w \mid y)$
  $\phantom{E[y^* \mid y]} = \int dw\; w^\top \phi(x^*)\, p(w \mid y)$
  $\phantom{E[y^* \mid y]} = E[w \mid y]^\top \phi(x^*)$
When the posterior is Gaussian, $\hat{w} = \arg\max_w p(w \mid y) = E[w \mid y]$ (take as given for now), so the predictive mean is $\hat{w}^\top \phi(x^*)$.
slide-13
SLIDE 13 Regression with Kernels: Motivation
Idea: what if we used lots of features, $D \gg N$?
  $y_n = w^\top \phi(x_n) + \epsilon_n$,  $y = \Phi w + \epsilon$
  $f^* = E[y^* \mid y] = \hat{w}^\top \phi(x^*)$
Solve for $\hat{w} = \arg\max_w \log p(y, w) = \arg\min_w \|y - \Phi w\|^2 + \lambda\, w^\top w$ by setting the derivative with respect to $w$ to zero:
  $0 = -2\, \Phi^\top (y - \Phi \hat{w}) + 2\lambda \hat{w}$
  $\Rightarrow\; \Phi^\top y = (\Phi^\top \Phi + \lambda I)\, \hat{w}$
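In code, the normal equations above are a single solve (a sketch; solving the linear system is preferable to forming the inverse explicitly):

```python
import numpy as np

def ridge_fit(Phi, y, lam):
    """Solve (Phi^T Phi + lam I) w = Phi^T y for the ridge/MAP weights."""
    D = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(D), Phi.T @ y)
```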
slide-14
SLIDE 14 Regression with Kernels: Motivation
Idea: what if we used lots of features, $D \gg N$?
  $y_n = w^\top \phi(x_n) + \epsilon_n$,  $y = \Phi w + \epsilon$
  $\Phi^\top y = (\Phi^\top \Phi + \lambda I)\, \hat{w} \;\Rightarrow\; \hat{w} = (\Phi^\top \Phi + \lambda I)^{-1} \Phi^\top y$
Ridge regression inverts a $D \times D$ matrix: $O(D^3)$.
Alternative form:
  $(\Phi^\top \Phi + \lambda I)^{-1} \Phi^\top = \Phi^\top (\Phi \Phi^\top + \lambda I)^{-1}$
This inverts an $N \times N$ matrix instead: when $D \gg N$, $O(N^3)$ is better than $O(D^3)$.
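The identity is easy to check numerically (a sketch with a random $\Phi$ in the $D \gg N$ regime; dimensions and $\lambda$ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
N, D, lam = 10, 500, 0.1
Phi = rng.standard_normal((N, D))

# (Phi^T Phi + lam I_D)^{-1} Phi^T  ==  Phi^T (Phi Phi^T + lam I_N)^{-1}
lhs = np.linalg.solve(Phi.T @ Phi + lam * np.eye(D), Phi.T)
rhs = Phi.T @ np.linalg.inv(Phi @ Phi.T + lam * np.eye(N))
assert np.allclose(lhs, rhs)  # inverting N x N instead of D x D
```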
slide-15
SLIDE 15 The Kernel Trick
  $\Phi \Phi^\top = \begin{pmatrix} \phi(x_1)^\top \phi(x_1) & \cdots & \phi(x_1)^\top \phi(x_N) \\ \vdots & & \vdots \\ \phi(x_N)^\top \phi(x_1) & \cdots & \phi(x_N)^\top \phi(x_N) \end{pmatrix} =: K$
Idea: use a kernel function $k(x_i, x_j) := \phi(x_i)^\top \phi(x_j)$.
Expected value:
  $f^* = \phi(x^*)^\top \hat{w} = \big(\phi(x^*)^\top \Phi^\top\big) (K + \lambda I)^{-1} y = \sum_n k(x^*, x_n)\, \big[(K + \lambda I)^{-1} y\big]_n$
We never need to compute $\phi$ explicitly.
slide-16
SLIDE 16 Kernel Ridge Regression
  $f^* = \bar{k}^\top (K + \lambda I)^{-1} y$,  where  $y = (y_1, \ldots, y_N)^\top$,  $K_{nm} = k(x_n, x_m)$,  $\bar{k}_n = k(x^*, x_n)$
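Putting the whole pipeline together, a minimal sketch (the RBF kernel, its lengthscale, and the sine data are illustrative choices of mine, not from the slides):

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=0.2):
    """k(x, x') = exp(-(x - x')^2 / (2 * lengthscale^2))."""
    diff = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)

def kernel_ridge_predict(x_train, y_train, x_star, lam=0.1):
    """f* = kbar^T (K + lam I)^{-1} y, never touching phi explicitly."""
    K = rbf_kernel(x_train, x_train)    # K_nm = k(x_n, x_m)
    kbar = rbf_kernel(x_star, x_train)  # kbar_n = k(x*, x_n)
    alpha = np.linalg.solve(K + lam * np.eye(len(x_train)), y_train)
    return kbar @ alpha

# Usage: fit noisy sine data, predict on a dense grid.
rng = np.random.default_rng(5)
x = rng.uniform(0, 1, 30)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(30)
f_star = kernel_ridge_predict(x, y, np.linspace(0, 1, 200))
```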