

  1. Least Mean Squares Regression (Machine Learning)

  2-3. Least Squares Method for regression • Examples • The LMS objective • Gradient descent • Incremental/stochastic gradient descent

  4. What's the mileage? Suppose we want to predict the mileage of a car from its weight and age. What we want: a function that can predict mileage using x_1 and x_2.
     Weight x_1 (x 100 lb) | Age x_2 (years) | Mileage
     31.5                  | 6               | 21
     36.2                  | 2               | 25
     43.1                  | 0               | 18
     27.6                  | 2               | 30
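A minimal NumPy sketch (not part of the original slides) of this toy dataset; the array names and the added constant column are choices made here for illustration:

```python
import numpy as np

# Toy data from the slide: weight (x 100 lb) and age (years) for each car
X = np.array([[31.5, 6.0],
              [36.2, 2.0],
              [43.1, 0.0],
              [27.6, 2.0]])
y = np.array([21.0, 25.0, 18.0, 30.0])   # mileage targets

# Prepend a constant feature of 1 so the first weight acts as an intercept,
# matching the "first feature is always 1" convention used later in the deck
X = np.hstack([np.ones((X.shape[0], 1)), X])
```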

  5-6. Linear regression: The strategy
     Predicting continuous values using a linear model.
     Assumption: The output is a linear function of the inputs:
        Mileage = w_0 + w_1 x_1 + w_2 x_2
     The w's are the parameters of the model, also called weights; collectively they form a vector.
     Learning: Use the training data to find the best possible value of w.
     Prediction: Given the values of x_1, x_2 for a new car, use the learned w to predict the mileage of the new car.

  7. Linear regression: The strategy
     • Inputs are vectors: x ∈ ℝ^d, written x = [x_1, x_2, ..., x_d]
     • Outputs are real numbers: y ∈ ℝ
     • We have a training set D = {(x_1, y_1), (x_2, y_2), ...}
     • For simplicity, we will assume that the first feature is always 1, i.e. x_1 = 1. This makes notation easier.
     • We want to approximate y as
        y = w_1 + w_2 x_2 + ... + w_d x_d = w^T x
     • w is the learned weight vector in ℝ^d
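A small sketch (not from the slides) of the prediction y = w^T x; the weight values and the new car's features below are made up purely for illustration:

```python
import numpy as np

# Hypothetical learned weights (intercept first); illustrative values only
w = np.array([38.0, -0.45, -0.6])

# A new car: first feature fixed at 1, then weight (x 100 lb) and age (years)
x_new = np.array([1.0, 33.0, 4.0])

predicted_mileage = w @ x_new   # y = w^T x
print(predicted_mileage)
```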

  8-11. Examples
     [figure: scatter of output y against a one-dimensional input]
     One dimensional input: predict using y = w_1 + w_2 x_2.
     The linear function is not our only choice. We could have tried to fit the data as another polynomial.
     Two dimensional input: predict using y = w_1 + w_2 x_2 + w_3 x_3.
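The polynomial alternative mentioned above still fits within least squares, because a polynomial in x is linear in its coefficients. A hedged sketch with made-up one-dimensional data:

```python
import numpy as np

# Toy one-dimensional data (illustrative values only)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.2, 0.9, 4.3, 8.6, 16.8])

# Fit a line and a cubic by least squares; both models are linear in their
# coefficients, so the same least-squares machinery applies to each.
line_coeffs = np.polyfit(x, y, deg=1)
cubic_coeffs = np.polyfit(x, y, deg=3)

# Evaluate both fits at a new point
print(np.polyval(line_coeffs, 2.5), np.polyval(cubic_coeffs, 2.5))
```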

  12. Least Squares Method for regression • Examples • The LMS objective • Gradient descent • Incremental/stochastic gradient descent

  13-17. What is the best weight vector?
     Question: How do we know which weight vector is the best one for a training set?
     For an input (x_i, y_i) in the training set, the cost of a mistake is the squared error (y_i - w^T x_i)^2.
     Define the cost (or loss) for a particular weight vector w to be the sum of squared costs over the training set:
        J(w) = 1/2 Σ_i (y_i - w^T x_i)^2
     One strategy for learning: Find the w with least cost on this data.
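A small NumPy sketch of this objective (the 1/2 factor is a common convention that only rescales the cost and does not change which w is best):

```python
import numpy as np

def lms_cost(w, X, y):
    """J(w) = 1/2 * sum_i (y_i - w^T x_i)^2 over the training set."""
    errors = y - X @ w          # one residual per training example
    return 0.5 * np.sum(errors ** 2)
```

With the X, y arrays from the mileage sketch above, lms_cost(np.zeros(3), X, y) scores the all-zero weight vector.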

  18-19. Least Mean Squares (LMS) Regression
     Learning: minimizing mean squared error.
     Different strategies exist for learning by optimization.
     • Gradient descent is a popular algorithm.
     (For this particular minimization objective, there is also an analytical solution. No need for gradient descent.)
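The analytical solution mentioned in the parenthetical is usually taken to be the normal-equations solution w = (X^T X)^{-1} X^T y; a hedged sketch, assuming X^T X is invertible:

```python
import numpy as np

def lms_closed_form(X, y):
    """Solve the normal equations X^T X w = X^T y for the least-squares weights."""
    # np.linalg.solve is preferred over explicitly inverting X^T X
    return np.linalg.solve(X.T @ X, X.T @ y)

# With the mileage arrays from the earlier sketch:
# w = lms_closed_form(X, y)
```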

  20. Least Squares Method for regression • Examples • The LMS objective • Gradient descent • Incremental/stochastic gradient descent

  21-26. Gradient descent
     We are trying to minimize J(w).
     General strategy for minimizing a function J(w):
     • Start with an initial guess for w, say w^0
     • Iterate till convergence:
       - Compute the gradient of J at w^t
       - Update w^t to get w^{t+1} by taking a step in the opposite direction of the gradient
     Intuition: The gradient is the direction of steepest increase in the function. To get to the minimum, go in the opposite direction.
     [figure: J(w) plotted against w, with successive iterates w^0, w^1, w^2, w^3 stepping down toward the minimum]
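A minimal sketch of this general strategy on a toy one-dimensional objective J(w) = (w - 3)^2, chosen here for illustration; its gradient is 2(w - 3):

```python
# Generic gradient descent on a scalar parameter; lr and steps are illustrative
def gradient_descent(grad, w0, lr=0.1, steps=25):
    w = w0
    for _ in range(steps):
        w = w - lr * grad(w)   # step opposite to the gradient
    return w

w_min = gradient_descent(grad=lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_min)   # approaches 3, the minimizer of (w - 3)^2
```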

  27-28. Gradient descent for LMS
     We are trying to minimize J(w).
     1. Initialize w^0
     2. For t = 0, 1, 2, ....
        1. Compute the gradient of J(w) at w^t. Call it ∇J(w^t)
        2. Update w as follows: w^{t+1} = w^t - r ∇J(w^t)
     r: called the learning rate (for now, a small constant; we will get to this later)
     What is the gradient of J?
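A sketch of this loop for the LMS objective defined earlier (the gradient is derived on the next slides); the learning rate and step count are illustrative defaults, not values from the slides:

```python
import numpy as np

def lms_gradient_descent(X, y, r=1e-4, steps=100_000):
    """Batch gradient descent on J(w) = 1/2 * sum_i (y_i - w^T x_i)^2."""
    w = np.zeros(X.shape[1])           # initialize w^0
    for _ in range(steps):
        errors = y - X @ w             # y_i - w^T x_i for every example
        grad = -X.T @ errors           # gradient of J at the current w
        w = w - r * grad               # step opposite to the gradient
    return w
```

On the mileage data from the first sketch, a learning rate around 1e-4 keeps the iteration stable, and many steps are needed because the weight feature is much larger in scale than the constant feature; with too large a rate the updates diverge.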

  29-36. Gradient of the cost
     We are trying to minimize J(w).
     • The gradient is a vector of partial derivatives, one per weight:
        ∇J(w) = [∂J/∂w_1, ∂J/∂w_2, ..., ∂J/∂w_d]
     • Remember that w is a vector with d elements: w = [w_1, w_2, w_3, ..., w_j, ..., w_d]
     • One element of the gradient vector:
        ∂J/∂w_j = ∂/∂w_j 1/2 Σ_i (y_i - w^T x_i)^2 = -Σ_i (y_i - w^T x_i) x_ij
       where x_ij is the j-th feature of the i-th example.
     • That is, each element is a sum of Error × Input.
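A NumPy sketch of this gradient, written once vectorized and once element by element to mirror the "sum of Error × Input" form (function names are my own):

```python
import numpy as np

def lms_gradient(w, X, y):
    """Gradient of J(w) = 1/2 * sum_i (y_i - w^T x_i)^2."""
    errors = y - X @ w
    # j-th entry: -sum_i (y_i - w^T x_i) * x_ij, i.e. -(error * input) summed
    return -X.T @ errors

def lms_gradient_elementwise(w, X, y):
    """Same gradient, computed one coordinate at a time as on the slides."""
    n, d = X.shape
    grad = np.zeros(d)
    for j in range(d):
        for i in range(n):
            grad[j] += -(y[i] - w @ X[i]) * X[i, j]
    return grad
```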
