
Lecture 6: Regression and Curve Fitting (Jan-Willem van de Meent)



  1. Unsupervised Machine Learning and Data Mining, DS 5230 / DS 4420 - Fall 2018. Lecture 6, Jan-Willem van de Meent

  2. Regression

  3. Curve Fitting (according to XKCD) https://xkcd.com/2048/

  4. Linear Regression. Goal: approximate points with a line or hyperplane.

  5. Linear Regression. Assume f is a linear combination of the D features, y = f(x) + \epsilon = w^\top x + \epsilon with \epsilon \sim \mathrm{Norm}(0, \sigma^2). For N points we write y_n = w^\top x_n + \epsilon_n. Learning: estimate w. Prediction: estimate y' given a new x'.
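
As a concrete illustration of this model (a toy numpy simulation of my own, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 100, 3
X = rng.normal(size=(N, D))                  # N points, D features each
w_true = np.array([2.0, -1.0, 0.5])          # weights that learning should recover
sigma = 0.1
y = X @ w_true + rng.normal(0.0, sigma, N)   # y_n = w^T x_n + eps_n,  eps_n ~ Norm(0, sigma^2)
```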

  6. Error Measure: Sum of Squares. The Mean Squared Error (MSE) is
E(w) = \frac{1}{N} \sum_{n=1}^{N} (w^\top x_n - y_n)^2 = \frac{1}{N} \| Xw - y \|^2,
where
X = \begin{bmatrix} x_1^\top \\ x_2^\top \\ \vdots \\ x_N^\top \end{bmatrix}, \qquad
y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}.

  7. Minimizing the Error. Starting from E(w) = \frac{1}{N} \| Xw - y \|^2, set the gradient to zero:
\nabla E(w) = \frac{2}{N} X^\top (Xw - y) = 0 \;\Rightarrow\; X^\top X w = X^\top y \;\Rightarrow\; w = X^\dagger y,
where X^\dagger = (X^\top X)^{-1} X^\top is the 'pseudo-inverse' of X.

  8. Minimizing the Error (continued). The same derivation as above gives w = X^\dagger y with X^\dagger = (X^\top X)^{-1} X^\top. For the matrix calculus, see the Matrix Cookbook (on course website).

  9. Ordinary Least Squares. Construct the matrix X and the vector y from the dataset {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)} (each x includes x_0 = 1) as follows:
X = \begin{bmatrix} x_1^\top \\ x_2^\top \\ \vdots \\ x_N^\top \end{bmatrix}, \qquad
y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}.
Compute X^\dagger = (X^\top X)^{-1} X^\top and return w = X^\dagger y.
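
A minimal numpy sketch of this procedure (my own code, not the course's; in practice np.linalg.lstsq or np.linalg.pinv is numerically safer than forming (X^T X)^{-1} explicitly):

```python
import numpy as np

def ols_fit(X_raw, y):
    """Ordinary least squares via the pseudo-inverse, following the slide."""
    N = X_raw.shape[0]
    X = np.hstack([np.ones((N, 1)), X_raw])   # prepend x_0 = 1 (bias feature)
    X_pinv = np.linalg.inv(X.T @ X) @ X.T     # X† = (X^T X)^{-1} X^T
    return X_pinv @ y                         # w = X† y

# Usage with the simulated data above: w_hat = ols_fit(X, y)
# Prediction for a new input x_new: y_pred = np.concatenate(([1.0], x_new)) @ w_hat
```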

  10. Basis Function Regression. Linear regression models y_n = w^\top x_n + \epsilon_n; basis function regression replaces the raw features with a feature map, y_n = w^\top \phi(x_n) + \epsilon_n for the N samples. Polynomial regression is the special case with basis functions \phi_j(x) = x^j.
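
A sketch of polynomial (basis function) regression, assuming the standard monomial basis \phi_j(x) = x^j (helper names are my own):

```python
import numpy as np

def poly_features(x, M):
    """Map scalar inputs x (shape (N,)) to the design matrix [x^0, x^1, ..., x^M]."""
    return np.vstack([x ** j for j in range(M + 1)]).T    # shape (N, M + 1)

def poly_fit(x, t, M):
    """Least-squares fit of a degree-M polynomial to targets t."""
    Phi = poly_features(x, M)
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)           # minimizes ||Phi w - t||^2
    return w

# Usage: w = poly_fit(x_train, t_train, M=3)
#        t_pred = poly_features(x_new, 3) @ w
```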

  11. Polynomial Regression [figure: degree-M polynomial fits for M = 0, 1, 3, 9; t plotted against x]

  12. Polynomial Regression [same figure: the low-order fits M = 0 and M = 1 underfit]

  13. Polynomial Regression [same figure: the high-order fit M = 9 overfits]

  14. Regularization. L2 regularization (ridge regression) minimizes
E(w) = \frac{1}{N} \| Xw - y \|^2 + \lambda \| w \|^2, where \lambda \ge 0 and \| w \|^2 = w^\top w.
L1 regularization (LASSO) minimizes
E(w) = \frac{1}{N} \| Xw - y \|^2 + \lambda |w|_1, where \lambda \ge 0 and |w|_1 = \sum_{i=1}^{D} |w_i|.

  15. Regularization

  16. Regularization. L2 has a closed-form solution: w = (X^\top X + \lambda I)^{-1} X^\top y.
L1 has no closed-form solution; use quadratic programming: minimize \| Xw - y \|^2 subject to \| w \|_1 \le s.
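
A sketch of the L2 closed form (my own code; for L1/LASSO there is no such formula, so one would call a QP or coordinate-descent solver such as scikit-learn's Lasso instead):

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: w = (X^T X + lambda * I)^{-1} X^T y."""
    D = X.shape[1]
    A = X.T @ X + lam * np.eye(D)
    return np.linalg.solve(A, X.T @ y)   # solve A w = X^T y rather than inverting A

# Usage: w_ridge = ridge_fit(X, y, lam=0.1)
```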

  17. Maximum Likelihood

  18. Regression: Probabilistic Interpretation. What is the probability p(y \mid x, w, \sigma^2) of the observed data under this model?

  19. Regression: Probabilistic Interpretation. Least Squares Objective: E(w) = \frac{1}{N} \| Xw - y \|^2. Likelihood: p(y \mid X, w, \sigma^2) = \prod_{n=1}^{N} \mathrm{Norm}(y_n \mid w^\top x_n, \sigma^2).

  20. Maximum Likelihood. Comparing the Least Squares Objective with the Log-Likelihood shows that maximizing the likelihood minimizes the sum of squares (derivation below).
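
A worked version of this equivalence, assuming the Gaussian noise model y_n = w^\top x_n + \epsilon_n with \epsilon_n \sim \mathrm{Norm}(0, \sigma^2) from earlier in the lecture:

```latex
\log p(y \mid X, w, \sigma^2)
  = \sum_{n=1}^{N} \log \mathrm{Norm}\!\left(y_n \mid w^\top x_n, \sigma^2\right)
  = -\frac{1}{2\sigma^2} \sum_{n=1}^{N} \left(y_n - w^\top x_n\right)^2
    - \frac{N}{2} \log\!\left(2\pi\sigma^2\right)
```

The second term does not depend on w, so maximizing the log-likelihood over w is the same as minimizing the sum of squared errors.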

  21. Maximum a Posteriori

  22. Regression with Priors. Can we maximize the posterior p(w \mid X, y)? (i.e., can we perform MAP estimation?)

  23. Regression with Priors. From Bayes' rule, p(w \mid X, y) \propto p(y \mid X, w)\, p(w).

  24. Maximum a Posteriori. MAP estimation is equivalent to ridge regression.
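
A sketch of why, assuming a zero-mean Gaussian prior w \sim \mathrm{Norm}(0, \tau^2 I) (the standard choice; the slide text here does not show the specific prior):

```latex
\log p(w \mid X, y)
  = \log p(y \mid X, w) + \log p(w) + \text{const}
  = -\frac{1}{2\sigma^2} \sum_{n=1}^{N} \left(y_n - w^\top x_n\right)^2
    - \frac{1}{2\tau^2}\, w^\top w + \text{const}
```

Maximizing this over w is equivalent to minimizing \frac{1}{N}\|Xw - y\|^2 + \lambda \|w\|^2 with \lambda = \sigma^2 / (N \tau^2), i.e. ridge regression.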
