Overfitting

Overfitting, Underfitting, and Bayesian Regression - PowerPoint PPT Presentation



  1. Overfitting and Underfitting

  2. Bayesian Regression: distribution over noise, distribution over functions
     y_n = f(x_n) + ε_n,  ε_n ~ p(ε),  n = 1, ..., N
     Linear: f(x) = w^T x = Σ_d w_d x_d
     Basis functions: f(x_n) = w^T φ(x_n), so y = Φw + ε with design matrix
     Φ ∈ R^(N×D) whose rows are φ(x_n)^T (Φ is N×D, w is D×1, y is N×1).

  3. Example: Polynomial Basis
     y_n = w^T φ(x_n) + ε_n,  ε_n ~ p(ε),  w ~ p(w),  n = 1, ..., N
     Reduces to linear regression when φ is the identity.
     Polynomial basis: φ_d(x_n) = x_n^d
     A prior on w implies a prior on functions f.
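The polynomial basis φ_d(x_n) = x_n^d can be made concrete with a small design-matrix helper. This is an illustrative sketch; the function name `poly_design_matrix` is not from the slides:

```python
import numpy as np

def poly_design_matrix(x, degree):
    """Rows are phi(x_n) = [x_n^0, x_n^1, ..., x_n^degree]."""
    return np.stack([x**d for d in range(degree + 1)], axis=1)

x = np.array([0.0, 1.0, 2.0])
Phi = poly_design_matrix(x, degree=2)  # shape (3, 3): columns 1, x, x^2
```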

  4. Ridge Regression (L2 Regularization)
     Objective: L(w) = Σ_{n=1}^N (y_n - w^T φ(x_n))² + λ Σ_{d=1}^D w_d²
     (Figure: fits for λ = 0 and λ = 10.)
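The ridge objective has the closed-form minimizer w = (Φ^T Φ + λI)^{-1} Φ^T y; a minimal sketch under that standard identity (variable names illustrative):

```python
import numpy as np

def ridge_fit(Phi, y, lam):
    """Minimize sum_n (y_n - w^T phi(x_n))^2 + lam * sum_d w_d^2."""
    D = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(D), Phi.T @ y)

# With lam = 0 and consistent data, ordinary least squares is recovered:
Phi = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = Phi @ np.array([2.0, 3.0])
w = ridge_fit(Phi, y, lam=0.0)
```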

  5. Ridge Regression: Probabilistic Interpretation
     y_n = w^T φ(x_n) + ε_n,  ε_n ~ Norm(0, σ²),  w_d ~ Norm(0, s)
     Maximum a Posteriori (MAP) estimation:
       argmax_w p(w|y) = argmax_w log [ p(y|w) p(w) / p(y) ]
                       = argmax_w log p(y|w) + log p(w)

  6. Ridge Regression: Probabilistic Interpretation (continued)
     Equivalently y_n ~ Norm(w^T φ(x_n), σ²), so
       log p(y|w) = Σ_n log Norm(y_n; w^T φ(x_n), σ²)
                  = -(1/2σ²) Σ_n (y_n - w^T φ(x_n))² + const
     argmax_w p(w|y) = argmax_w log p(y|w) + log p(w)

  7. Ridge Regression
     y_n = w^T φ(x_n) + ε_n,  ε_n ~ Norm(0, σ²),  w_d ~ Norm(0, s)
     argmax_w log p(w|y) recovers the ridge objective with λ = σ²/s:
     λ → 0 as σ → 0 (precise observations) or as s → ∞ (uninformative prior).
     (Figure: fits for several values of λ.)
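The λ = σ²/s correspondence can be checked numerically: at the ridge solution computed with this λ, the gradient of the log posterior log p(y|w) + log p(w) vanishes. A sketch with illustrative synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.normal(size=(20, 3))
sigma2, s = 0.25, 4.0                      # noise variance and prior variance
y = Phi @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=20)

lam = sigma2 / s                           # ridge strength implied by the variances
w_map = np.linalg.solve(Phi.T @ Phi + lam * np.eye(3), Phi.T @ y)

# Gradient of log p(y|w) + log p(w) evaluated at w_map (should be ~0):
grad = Phi.T @ (y - Phi @ w_map) / sigma2 - w_map / s
```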

  8. Posterior Predictive Distribution
     p(y*|y) = ∫ p(y*|f) p(f|y) df
     Predictive for a new y* given previous observations y.
     (Figure: predictive distributions for small and large λ.)

  9. Posterior Predictive Distribution
     Idea: incorporate uncertainty in predictions.
     (Figure: predictive distributions for small and large λ.)

  10. Gaussian Processes
      Informal definition: nonparametric — "number of parameters determined
      from data".
      Formal view: a distribution on functions
        f ~ GP(μ, h)   with mean function μ(x) and covariance function h(x, x')
        y_n | f ~ Norm(f(x_n), σ²)   (likelihood)
      Posterior: p(f|y) = p(y|f) p(f) / p(y), with marginal likelihood p(y).

  11. Gaussian Processes: Practical View
      A generalization of the multivariate normal: a prior on function values
      at a finite set of points X = [x_1, ..., x_M]:
        [f(x_1), ..., f(x_M)] ~ Norm(μ, Σ)
        y_m | f ~ Norm(f(x_m), σ²)
      Predictive: p(y*|y) = p(y*, y) / p(y)

  12. Joint Distribution on Function Values
      With kernel matrix entries K(X, X')_{nm} = h(x_n, x'_m) and means
      μ(X) = [μ(x_1), ..., μ(x_N)]:
        [y ; y*] ~ Norm( [μ(X) ; μ(X*)],
                         [ K(X, X) + σ²I    K(X, X*)
                           K(X*, X)         K(X*, X*) + σ²I ] )
      Block sizes: K(X, X) is N×N, K(X, X*) is N×M, K(X*, X) is M×N,
      K(X*, X*) is M×M.

  13. Properties of Multivariate Normals
      Probability density for jointly Gaussian variables α ∈ R^N, β ∈ R^M:
        p(α, β) = Norm( [a ; b], [ A  C ; C^T  B ] )
      Marginals:   p(α) = Norm(a, A),   p(β) = Norm(b, B)
      Conditional: p(β|α) = Norm( b + C^T A^{-1} (α - a),  B - C^T A^{-1} C )
      (A is N×N, B is M×M, C is N×M.)
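The Gaussian conditioning formula translates directly into code; a sketch (the helper name `mvn_conditional` is illustrative):

```python
import numpy as np

def mvn_conditional(a, b, A, B, C, alpha):
    """p(beta | alpha) for [alpha; beta] ~ Norm([a; b], [[A, C], [C^T, B]])."""
    CtAinv = np.linalg.solve(A, C).T       # C^T A^{-1}
    mean = b + CtAinv @ (alpha - a)
    cov = B - CtAinv @ C
    return mean, cov

# Bivariate sanity check: unit variances, correlation 0.5, observe alpha = 2.
mean, cov = mvn_conditional(np.zeros(1), np.zeros(1),
                            np.eye(1), np.eye(1),
                            np.array([[0.5]]), np.array([2.0]))
```

For the bivariate case this reproduces the textbook result: conditional mean ρ·α = 1.0 and conditional variance 1 - ρ² = 0.75.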

  14. Computing the Predictive Distribution
      Apply the conditional formula to the joint above, with
      a = μ(X), b = μ(X*), A = K(X, X) + σ²I, B = K(X*, X*) + σ²I, C = K(X, X*):
        y* | y ~ Norm( μ(X*) + K(X*, X) [K(X, X) + σ²I]^{-1} (y - μ(X)),
                       K(X*, X*) + σ²I - K(X*, X) [K(X, X) + σ²I]^{-1} K(X, X*) )

  15. Computing the Predictive Distribution
      In practice, assume μ(x) = 0:
        y* | y ~ Norm( K(X*, X) [K(X, X) + σ²I]^{-1} y,
                       K(X*, X*) + σ²I - K(X*, X) [K(X, X) + σ²I]^{-1} K(X, X*) )
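The zero-mean predictive is a few lines of linear algebra; a sketch using a squared exponential kernel (function and variable names are illustrative):

```python
import numpy as np

def sq_exp(x1, x2, ell=1.0):
    """Squared exponential kernel h(x, x') = exp(-(x - x')^2 / (2 ell^2))."""
    return np.exp(-(x1[:, None] - x2[None, :])**2 / (2 * ell**2))

def gp_predict(x, y, x_star, sigma2=0.1, ell=1.0):
    """Mean and covariance of y* | y under a zero-mean GP."""
    K = sq_exp(x, x, ell) + sigma2 * np.eye(len(x))      # K(X, X) + sigma^2 I
    K_star = sq_exp(x_star, x, ell)                      # K(X*, X)
    mean = K_star @ np.linalg.solve(K, y)
    cov = (sq_exp(x_star, x_star, ell) + sigma2 * np.eye(len(x_star))
           - K_star @ np.linalg.solve(K, K_star.T))
    return mean, cov

x = np.array([0.0, 1.0, 2.0])
y_train = np.sin(x)
mean, cov = gp_predict(x, y_train, x, sigma2=1e-8)  # predict at training inputs
```

With near-zero noise, the predictive mean at the training inputs reproduces the training targets, as expected from the formula.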

  16. Gaussian Processes: Practical View
      As before: a prior on function values at a finite set of points,
        [f(x_1), ..., f(x_M)] ~ Norm(μ, Σ),   y_m | f ~ Norm(f(x_m), σ²)
      The predictive p(y*|y) can now be computed using linear algebra.

  17. Gaussian Process Interpretation: Bayesian Ridge Regression with Features
      w ~ Norm(m_0, S_0)   (S_0 does not have to be diagonal)
      y = Φw + ε,  ε ~ Norm(0, σ²I),  with Φ of size N×D and w of size D×1
      E[y] = Φ E[w] = Φ m_0
      Cov[y] = Φ Cov[w] Φ^T + σ²I = Φ S_0 Φ^T + σ²I
      Kernel: k(x_n, x_m) = φ(x_n)^T S_0 φ(x_m)

  18. Gaussian Process Interpretation: Bayesian Ridge Regression with Features
      Equivalent Gaussian process:
        f ~ GP(μ, k),  y ~ Norm(f(X), σ²I)
        μ(x) = φ(x)^T m_0,  k(x, x') = φ(x)^T S_0 φ(x')
      Computational advantage: use the kernel trick to evaluate inner products
      in high-dimensional feature spaces.

  19. Gaussian Processes vs Ridge Regression
      f ~ GP(μ, k),  y | f ~ Norm(f(X), σ²I),
      μ(x) = φ(x)^T m_0,  k(x, x') = φ(x)^T S_0 φ(x')
      Derive in homework: with m_0 = 0 and S_0 = sI,
        E[y* | y] = φ(x*)^T (Φ^T Φ + (σ²/s) I)^{-1} Φ^T y
      Complexity: O(N³) for the GP form vs O(D³) for the weight-space form.
      Conclusion: the GP posterior mean is equivalent to ridge regression.
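The equivalence can be verified numerically: with prior w ~ Norm(0, sI), the GP predictive mean under the kernel k(x, x') = s φ(x)^T φ(x') matches the weight-space ridge form. A sketch with random illustrative features:

```python
import numpy as np

rng = np.random.default_rng(1)
N, D = 15, 4
Phi = rng.normal(size=(N, D))       # features phi(x_n) for training inputs
Phi_star = rng.normal(size=(3, D))  # features for test inputs
y = rng.normal(size=N)
sigma2, s = 0.5, 2.0                # noise variance, prior weight variance

# Weight-space / ridge mean: phi(x*)^T (Phi^T Phi + sigma2/s I)^{-1} Phi^T y
w = np.linalg.solve(Phi.T @ Phi + (sigma2 / s) * np.eye(D), Phi.T @ y)
mean_ridge = Phi_star @ w

# Function-space / GP mean with k(x, x') = s * phi(x)^T phi(x')
K = s * Phi @ Phi.T
K_star = s * Phi_star @ Phi.T
mean_gp = K_star @ np.linalg.solve(K + sigma2 * np.eye(N), y)
```

The two means agree exactly because (Φ^TΦ + (σ²/s)I)^{-1} Φ^T = s Φ^T (sΦΦ^T + σ²I)^{-1}.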

  20. Regularization in Kernel-Based Regression
      Choice of kernel k(x_n, x_m) = Σ_d φ_d(x_n) s_d φ_d(x_m) implies a choice
      of functions.
      (Figure: predictive distributions for small and large λ.)

  21. Choosing Kernel Hyperparameters (Source: Carl Rasmussen)
      Squared exponential: h(x, x') = exp( -(x - x')² / (2ℓ²) )
      A large lengthscale ℓ means stronger regularization.
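The lengthscale's effect can be read off the kernel values: a larger ℓ makes function values at distant inputs more strongly correlated, so sampled functions are smoother. A small illustrative check:

```python
import numpy as np

def sq_exp(x1, x2, ell):
    """Squared exponential kernel exp(-(x - x')^2 / (2 ell^2))."""
    return np.exp(-(x1 - x2)**2 / (2 * ell**2))

# Correlation between function values at x = 0 and x = 1:
short = sq_exp(0.0, 1.0, ell=0.5)   # exp(-2)  ~ 0.135
long = sq_exp(0.0, 1.0, ell=5.0)    # exp(-0.02) ~ 0.980
```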

  22. Matérn Kernels (Source: Tom Rainforth, PhD Thesis)
      Idea: a prior over functions that are k times differentiable.
      k_ν(x, x') = (2^(1-ν) / Γ(ν)) (√(2ν) |x - x'| / ℓ)^ν K_ν(√(2ν) |x - x'| / ℓ)
      where K_ν is the modified Bessel function and Γ the Gamma function;
      common choices are ν = 3/2, 5/2.

  23. Basis Kernels (Source: David Duvenaud, PhD Thesis)

  24. Combinations of Kernels (Source: David Duvenaud, PhD Thesis)
      Sum:     k(x, x') = k₁(x, x') + k₂(x, x')
      Product: k(x, x') = k₁(x, x') · k₂(x, x')
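Sums and elementwise products of kernels are again valid kernels; a quick positive-semidefiniteness check on two example kernels (the particular kernel choices here are illustrative):

```python
import numpy as np

def sq_exp(X, ell=1.0):
    """Squared exponential kernel matrix on inputs X."""
    return np.exp(-(X[:, None] - X[None, :])**2 / (2 * ell**2))

def linear(X):
    """Linear kernel matrix k(x, x') = x * x'."""
    return np.outer(X, X)

X = np.linspace(-2, 2, 6)
K_sum = sq_exp(X) + linear(X)       # sum of kernels is a kernel
K_prod = sq_exp(X) * linear(X)      # elementwise product of kernels is a kernel
```

Both combinations stay positive semidefinite, so they define valid GP priors.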

  25. Gaussian Processes: Limitations
      Limitation 1: computational complexity. The predictive
        y* | y ~ Norm( K(X*, X) [K(X, X) + σ²I]^{-1} y,
                       K(X*, X*) + σ²I - K(X*, X) [K(X, X) + σ²I]^{-1} K(X, X*) )
      requires inverting an N×N matrix, which is O(N³).
      Limitation 2: dimensionality. Gaussian processes work best for x ∈ R^D
      with small D. Intuition: smoothness assumptions on f are not helpful in
      high dimensions because of the curse of dimensionality.

  26. Bayesian Regression and Gaussian Processes
      Bayesian regression: f* = argmax_f p(f|y)
      - Ridge regression is MAP estimation with a Gaussian prior on the weights.
      - Regularization λ = σ²/s increases with observation noise σ² and
        decreases with weight variance s.
      Gaussian processes: p(y*|y) = ∫ p(y*|f) p(f|y) df
      - Provide an estimate of uncertainty in predictions.
      - The posterior mean is equivalent to ridge regression.
      - Complexity O(N³) vs O(D³).
