

  1. Statistical Machine Learning
     Lecture 08: Regression
     Kristian Kersting, TU Darmstadt, Summer Term 2020
     Based on slides from J. Peters

  2. Today's Objectives
     Make you understand how to learn a continuous function.
     Covered topics:
     Linear regression and its interpretations
     What is overfitting?
     Deriving linear regression from maximum likelihood estimation
     Bayesian linear regression

  3. Outline
     1. Introduction to Linear Regression
     2. Maximum Likelihood Approach to Regression
     3. Bayesian Linear Regression
     4. Wrap-Up

  4. Outline (section divider)
     Next: 1. Introduction to Linear Regression

  5. Introduction to Linear Regression: Reminder
     Our task is to learn a mapping f from input to output: f : I → O, y = f(x; θ).
     Input: x ∈ I (images, text, sensor measurements, ...)
     Output: y ∈ O
     Parameters: θ ∈ Θ (what needs to be "learned")
     Regression: learn a mapping into a continuous space, e.g. O = R, O = R^3, ...

  6. Introduction to Linear Regression: Motivation
     You want to predict the torques of a robot arm:
     y = I q̈ − µ q̇ + m l g sin(q)
       = [q̈, q̇, sin(q)] [I, −µ, m l g]^⊺
       = φ(x)^⊺ θ
     Can we do this with a data set D = { (x_i, y_i) | i = 1, ..., n }?
     Yes, this is a linear regression problem!
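     A minimal sketch of the feature-map view in NumPy (not from the original slides): the layout of φ and θ follows the slide, but the function name phi and all numeric constants for inertia, friction, and the gravity term are hypothetical values chosen only for illustration.

       import numpy as np

       # Feature map from the slide: phi(x) = [q_ddot, q_dot, sin(q)],
       # so that the torque is linear in theta = [I, -mu, m*l*g].
       def phi(q, q_dot, q_ddot):
           return np.array([q_ddot, q_dot, np.sin(q)])

       theta = np.array([0.5, -0.1, 1.2])  # hypothetical values for [I, -mu, m*l*g]

       # Torque predicted by the linear-in-parameters model y = phi(x)^T theta
       q, q_dot, q_ddot = 0.3, 1.0, -0.2
       y = phi(q, q_dot, q_ddot) @ theta
       print(y)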

  7. Introduction to Linear Regression: Least Squares Linear Regression
     We are given pairs of training data points and associated function values (x_i, y_i):
     X = { x_1, ..., x_n } with x_i ∈ R^d, and Y = { y_1, ..., y_n } with y_i ∈ R.
     Note: here we only treat the case y_i ∈ R; in general y_i can have more than one dimension, i.e. y_i ∈ R^f for some positive f.
     Start with a linear regressor: x_i^⊺ w + w_0 = y_i for all i = 1, ..., n,
     i.e. one linear equation for each training data point/label pair.
     This is exactly the same basic setup as for least-squares classification, only the target values are continuous.

  8. Least Squares Linear Regression
     x_i^⊺ w + w_0 = y_i for all i = 1, ..., n
     Step 1: define the augmented vectors x̂_i = [x_i; 1] and ŵ = [w; w_0].

  9. Least Squares Linear Regression (continued)
     Step 2: rewrite each equation as x̂_i^⊺ ŵ = y_i for all i = 1, ..., n.

  10. Least Squares Linear Regression (continued)
      Step 3: matrix-vector notation, X̂^⊺ ŵ = y,
      where X̂ = [x̂_1, ..., x̂_n] (each x̂_i is a column vector) and y = [y_1, ..., y_n]^⊺.

  11. Least Squares Linear Regression
      Step 4: find the least squares solution
      ŵ = arg min_w ‖X̂^⊺ w − y‖^2
      Setting the gradient to zero, ∇_w ‖X̂^⊺ w − y‖^2 = 0, yields
      ŵ = (X̂ X̂^⊺)^{-1} X̂ y
      A closed form solution!
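     A small sketch of the closed-form solution in NumPy (not from the original slides): the data are synthetic, rows of X_hat are samples (so X_hat corresponds to X̂^⊺ on the slide), and np.linalg.solve is used instead of an explicit inverse as a standard numerical choice.

       import numpy as np

       # Synthetic data, made up for illustration: n points in d dimensions.
       rng = np.random.default_rng(0)
       n, d = 50, 3
       X = rng.normal(size=(n, d))
       w_true, w0_true = np.array([1.0, -2.0, 0.5]), 0.3
       y = X @ w_true + w0_true + 0.1 * rng.normal(size=n)

       # Steps 1-3: append a constant 1 to each input so w_0 is absorbed into w_hat.
       X_hat = np.hstack([X, np.ones((n, 1))])

       # Step 4: closed-form least squares, solving (X_hat^T X_hat) w_hat = X_hat^T y.
       w_hat = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)
       print(w_hat)  # close to [1.0, -2.0, 0.5, 0.3]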

  12. Least Squares Linear Regression
      ŵ = (X̂ X̂^⊺)^{-1} X̂ y
      Where is the costly part of this computation?

  13. Least Squares Linear Regression
      The costly part is the inverse: X̂ X̂^⊺ is a D × D matrix, where D is the input dimension.
      Naive inversion takes O(D^3), but better methods exist.
      What can we do if the input dimension D is too large?
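     One reading of "better methods exist" is to solve the linear system rather than invert explicitly, e.g. via a Cholesky factorization. The sketch below reuses X_hat and y from the earlier snippet; the choice of scipy.linalg and the variable names are assumptions for illustration, not from the lecture.

       import numpy as np
       from scipy.linalg import cho_factor, cho_solve

       # Solve (X_hat^T X_hat) w = X_hat^T y via Cholesky: still O(D^3), but with a
       # smaller constant than forming a general inverse, and numerically more stable.
       A = X_hat.T @ X_hat
       b = X_hat.T @ y
       w_chol = cho_solve(cho_factor(A), b)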

  14. Least Squares Linear Regression
      What can we do if the input dimension D is too large?
      Gradient descent
      Work with fewer dimensions
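     A minimal gradient-descent alternative to the closed form (not from the original slides), assuming X_hat and y are set up as in the earlier snippet; the learning rate and step count are arbitrary illustration values.

       import numpy as np

       # Gradient descent on the squared error ||X_hat w - y||^2; avoids building
       # or inverting a D x D matrix when D is large.
       def lstsq_gradient_descent(X_hat, y, lr=0.01, n_steps=5000):
           w = np.zeros(X_hat.shape[1])
           for _ in range(n_steps):
               grad = 2.0 * X_hat.T @ (X_hat @ w - y) / len(y)  # averaged gradient
               w -= lr * grad
           return w

       # w_gd = lstsq_gradient_descent(X_hat, y)  # approximately the closed-form solution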

  15. Introduction to Linear Regression: Mechanical Interpretation
      (figure on the original slide)

  16. Introduction to Linear Regression: Geometric Interpretation
      Predicted outputs are linear combinations of features!
      The samples are projected into this feature space.
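     The geometric picture can be checked numerically: the fitted values are the orthogonal projection of y onto the span of the feature columns. A small sketch reusing X_hat and y from the earlier snippet; the hat-matrix construction is a standard identity and is not spelled out on the slide.

       import numpy as np

       # Fitted values are a projection: y_fit = H @ y with the "hat" matrix
       # H = X_hat (X_hat^T X_hat)^{-1} X_hat^T (rows of X_hat are samples).
       H = X_hat @ np.linalg.solve(X_hat.T @ X_hat, X_hat.T)
       y_fit = H @ y

       # H is (numerically) idempotent, as a projection should be: H @ H ≈ H.
       print(np.allclose(H @ H, H))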

  17. Introduction to Linear Regression: Polynomial Regression
      How can we fit arbitrary polynomials using least-squares regression?
      We introduce a feature transformation as before:
      y(x) = w^⊺ φ(x) = Σ_{i=0}^{M} w_i φ_i(x), with φ_0(x) = 1.
      The φ_i(·) are called the basis functions.
      This is still a linear model in the parameters w.
      E.g., for fitting a cubic polynomial: φ(x) = [1, x, x^2, x^3]^⊺
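     A small sketch of polynomial least squares for scalar inputs (not from the original slides): the function names poly_features, fit_poly, and predict_poly are ad hoc, and np.linalg.lstsq is used as a numerically robust stand-in for the closed-form solution above.

       import numpy as np

       # Basis functions phi(x) = [1, x, x^2, ..., x^M] for scalar inputs x.
       def poly_features(x, degree):
           x = np.asarray(x, dtype=float)
           return np.stack([x**i for i in range(degree + 1)], axis=-1)

       # Least-squares fit of the weights w in y(x) = w^T phi(x).
       def fit_poly(x, y, degree):
           Phi = poly_features(x, degree)              # n x (M+1) design matrix
           w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
           return w

       def predict_poly(w, x, degree):
           return poly_features(x, degree) @ w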

  18. Polynomial Regression: fit with a polynomial of degree 0 (constant value)
      (plot of the fitted curve against the data on the original slide)

  19. Polynomial Regression: fit with a polynomial of degree 1 (line)
      (plot of the fitted curve against the data on the original slide)

  20. Polynomial Regression: fit with a polynomial of degree 3 (cubic)
      (plot of the fitted curve against the data on the original slide)

  21. Polynomial Regression: fit with a polynomial of degree 9
      Massive overfitting!
      (plot of the fitted curve against the data on the original slide)
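     The overfitting effect can be reproduced with the helpers sketched above. The data here are made up (noisy samples of the usual textbook curve sin(2πx)); the slides' actual data are not given, so treat the numbers as illustrative only.

       import numpy as np

       # Fit the same 10 noisy points with degree-3 and degree-9 polynomials and
       # compare the error against the underlying curve
       # (assumes fit_poly / predict_poly from the previous snippet).
       rng = np.random.default_rng(1)
       x_train = rng.uniform(0.0, 1.0, size=10)
       y_train = np.sin(2 * np.pi * x_train) + 0.2 * rng.normal(size=10)

       x_test = np.linspace(0.0, 1.0, 200)
       for degree in (3, 9):
           w = fit_poly(x_train, y_train, degree)
           mse = np.mean((predict_poly(w, x_test, degree) - np.sin(2 * np.pi * x_test)) ** 2)
           print(degree, mse)  # the degree-9 fit usually generalizes far worse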

  22. Outline (section divider)
      Next: 2. Maximum Likelihood Approach to Regression

  23. Maximum Likelihood Approach to Regression: Overfitting
      Relatively little data leads to overfitting; enough data leads to a good estimate.
      (two fit plots on the original slide)

  24. Maximum Likelihood Approach to Regression: Probabilistic Regression
      Assumption 1: our target function values are generated by adding noise to the function estimate,
      y = f(x, w) + ε
      where y is the target function value, f the regression function, x the input value, w the weights or parameters, and ε the noise.

  25. Probabilistic Regression (continued)
      Assumption 2: the noise is a Gaussian-distributed random variable,
      ε ∼ N(0, β^{-1})
      so that p(y | x, w, β) = N(y | f(x, w), β^{-1}),
      where f(x, w) is the mean and β^{-1} is the variance (β is the precision).
      Note that y is now a random variable with underlying probability distribution p(y | x, w, β).
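     For the linear-in-features case f(x, w) = φ(x)^⊺ w, the model's log-likelihood can be written down directly; a minimal sketch follows (the function name and the design-matrix convention are assumptions, not from the slides). Maximizing it with respect to w amounts to minimizing the sum-of-squares error, which is how the maximum likelihood view recovers least squares.

       import numpy as np

       # Gaussian log-likelihood of the data under y = Phi @ w + eps, eps ~ N(0, 1/beta),
       # i.e. sum_i log N(y_i | phi(x_i)^T w, 1/beta), with rows of Phi being phi(x_i)^T.
       def log_likelihood(w, beta, Phi, y):
           residuals = y - Phi @ w
           n = len(y)
           return 0.5 * n * np.log(beta / (2.0 * np.pi)) - 0.5 * beta * np.sum(residuals ** 2)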

  26. Maximum Likelihood Approach to Regression: Probabilistic Regression
      (figure on the original slide)
