Linear Regression & Gradient Descent - PowerPoint PPT Presentation


  1. Tufts COMP 135: Introduction to Machine Learning https://www.cs.tufts.edu/comp/135/2019s/ Linear Regression & Gradient Descent. Many slides attributable to: Prof. Mike Hughes, Erik Sudderth (UCI), Finale Doshi-Velez (Harvard), James, Witten, Hastie, Tibshirani (ISL/ESL books)

  2. LR & GD Unit Objectives • Exact solutions of least squares: 1D case without bias, 1D case with bias, general case • Gradient descent for least squares

  3. What will we learn? (Diagram: the supervised learning pipeline. Training uses data-label pairs $\{x_n, y_n\}_{n=1}^{N}$; evaluation uses a performance measure for the task; prediction maps data $x$ to label $y$. Unsupervised learning and reinforcement learning are also shown.)

  4. Task: Regression. y is a numeric variable, e.g. sales in $$. (Plot: regression example of y versus x, shown under the supervised learning branch alongside unsupervised and reinforcement learning.)

  5. Visualizing errors

  6. Regression: Evaluation Metrics • mean squared error: $\frac{1}{N}\sum_{n=1}^{N}(y_n - \hat{y}_n)^2$ • mean absolute error: $\frac{1}{N}\sum_{n=1}^{N}|y_n - \hat{y}_n|$
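
A minimal sketch of these two metrics in Python with NumPy (the names y_true and y_pred are illustrative, not from the slides):

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of squared residuals (y_n - yhat_n)^2."""
    return np.mean((y_true - y_pred) ** 2)

def mean_absolute_error(y_true, y_pred):
    """Average of absolute residuals |y_n - yhat_n|."""
    return np.mean(np.abs(y_true - y_pred))

# Tiny toy example
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])
print(mean_squared_error(y_true, y_pred))   # 0.375
print(mean_absolute_error(y_true, y_pred))  # 0.5
```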

  7. Linear Regression. Parameters: weight vector $w = [w_1, w_2, \ldots, w_f, \ldots, w_F]$ and bias scalar $b$. Prediction: $\hat{y}(x_i) \triangleq \sum_{f=1}^{F} w_f x_{if} + b$. Training: find weights and bias that minimize error.
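
A minimal sketch of this prediction rule for a single feature vector, assuming NumPy arrays (names are illustrative):

```python
import numpy as np

def predict(x, w, b):
    """Linear prediction: yhat = sum_f w_f * x_f + b."""
    return np.dot(w, x) + b

w = np.array([0.5, -1.0, 2.0])   # weight vector, one entry per feature
b = 0.25                         # bias scalar
x = np.array([1.0, 2.0, 3.0])    # one example with F=3 features
print(predict(x, w, b))          # 0.5 - 2.0 + 6.0 + 0.25 = 4.75
```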

  8. Sales vs. Ad Budgets

  9. Linear Regression: Training. Optimization problem: "Least Squares" $\min_{w,b} \sum_{n=1}^{N} \big(y_n - \hat{y}(x_n, w, b)\big)^2$

  10. Linear Regression: Training. Optimization problem: "Least Squares" $\min_{w,b} \sum_{n=1}^{N} \big(y_n - \hat{y}(x_n, w, b)\big)^2$. Exact formulas for the optimal values of w, b exist! With only one feature (F=1), let $\bar{x} = \text{mean}(x_1, \ldots, x_N)$ and $\bar{y} = \text{mean}(y_1, \ldots, y_N)$; then $w = \frac{\sum_{n=1}^{N}(x_n - \bar{x})(y_n - \bar{y})}{\sum_{n=1}^{N}(x_n - \bar{x})^2}$ and $b = \bar{y} - w\bar{x}$. Where does this come from?
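
A minimal sketch of this F=1 closed-form solution in NumPy (array names are illustrative):

```python
import numpy as np

def fit_least_squares_1d(x, y):
    """Closed-form 1D least squares: returns slope w and bias b."""
    x_bar = np.mean(x)
    y_bar = np.mean(y)
    w = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    b = y_bar - w * x_bar
    return w, b

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])   # exactly y = 2x + 1
w, b = fit_least_squares_1d(x, y)
print(w, b)                          # 2.0 1.0
```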

  11. Linear Regression: Training. Optimization problem: "Least Squares" $\min_{w,b} \sum_{n=1}^{N} \big(y_n - \hat{y}(x_n, w, b)\big)^2$. Exact formulas for the optimal values of w, b exist! With many features (F >= 1), build the design matrix $\tilde{X}$ whose n-th row is $[x_{n1} \; \ldots \; x_{nF} \; 1]$; then $[w_1 \; \ldots \; w_F \; b]^T = (\tilde{X}^T \tilde{X})^{-1} \tilde{X}^T y$. Where does this come from?
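
A minimal sketch of this formula in NumPy; np.linalg.solve is used on the normal equations instead of forming an explicit matrix inverse, which is numerically preferable but equivalent to the formula above (names are illustrative):

```python
import numpy as np

def fit_least_squares(X, y):
    """Solve [w_1 ... w_F b] from the normal equations, with a bias column of ones."""
    X_tilde = np.hstack([X, np.ones((X.shape[0], 1))])       # append column of 1s
    theta = np.linalg.solve(X_tilde.T @ X_tilde, X_tilde.T @ y)
    return theta[:-1], theta[-1]                              # weights, bias

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])                            # y = 2x + 1
w, b = fit_least_squares(X, y)
print(w, b)                                                   # approximately [2.] 1.0
```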

  12. Derivation Notes: http://www.cs.tufts.edu/comp/135/2019s/notes/day03_linear_regression.pdf

  13. When does the Least Squares estimator exist? • Fewer examples than features (N < F): infinitely many solutions! • Same number of examples and features (N = F): optimum exists if X is full rank • More examples than features (N > F): optimum exists if X is full rank

  14. More compact notation: $\theta = [b \; w_1 \; w_2 \; \ldots \; w_F]$, $\tilde{x}_n = [1 \; x_{n1} \; x_{n2} \; \ldots \; x_{nF}]$, $\hat{y}(x_n, \theta) = \theta^T \tilde{x}_n$, $J(\theta) \triangleq \sum_{n=1}^{N} \big(y_n - \hat{y}(x_n, \theta)\big)^2$
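
A minimal sketch of this cost $J(\theta)$ using the augmented feature convention above (bias entry first); names are illustrative:

```python
import numpy as np

def augment(X):
    """Prepend a column of ones so theta = [b, w_1, ..., w_F] applies as theta @ x_tilde."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def cost_J(theta, X, y):
    """Sum of squared errors J(theta) = sum_n (y_n - theta^T x_tilde_n)^2."""
    residuals = y - augment(X) @ theta
    return np.sum(residuals ** 2)

X = np.array([[0.0], [1.0], [2.0]])
y = np.array([1.0, 3.0, 5.0])
print(cost_J(np.array([1.0, 2.0]), X, y))   # 0.0, since y = 1 + 2x exactly
```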

  15. Idea: Optimize via small steps

  16. Derivatives point uphill

  17. To minimize, go downhill. Step in the opposite direction of the derivative.

  18. Steepest descent algorithm. Input: initial $\theta \in \mathbb{R}$; input: step size $\alpha \in \mathbb{R}^+$. While not converged: $\theta \leftarrow \theta - \alpha \frac{d}{d\theta} J(\theta)$
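
A minimal sketch of this loop for a scalar parameter, assuming the derivative is supplied as a function; grad_J, the iteration cap, and the stopping rule are illustrative choices, not specified on the slide:

```python
def steepest_descent(grad_J, theta_init, alpha=0.01, max_iters=1000, tol=1e-8):
    """Repeatedly step opposite the derivative: theta <- theta - alpha * dJ/dtheta."""
    theta = theta_init
    for _ in range(max_iters):
        step = alpha * grad_J(theta)
        theta = theta - step
        if abs(step) < tol:          # crude convergence check
            break
    return theta

# Example: minimize J(theta) = (theta - 3)^2, whose derivative is 2 * (theta - 3)
print(steepest_descent(lambda t: 2.0 * (t - 3.0), theta_init=0.0, alpha=0.1))  # ~3.0
```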

  20. How to set step size?

  21. How to set step size? • Simple and usually effective: pick a small constant, e.g. $\alpha = 0.01$ • Improve: decay over iterations, e.g. $\alpha_t = C/t$ or $\alpha_t = (C + t)^{-0.9}$ • Improve: line search for the best value at each step
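
A minimal sketch of those schedules as Python functions (the constant C = 1.0 is an illustrative choice):

```python
def constant_step(t, alpha=0.01):
    """Fixed step size, independent of iteration t."""
    return alpha

def inverse_decay(t, C=1.0):
    """alpha_t = C / t, for t = 1, 2, 3, ..."""
    return C / t

def power_decay(t, C=1.0):
    """alpha_t = (C + t)^(-0.9)."""
    return (C + t) ** -0.9

print([round(inverse_decay(t), 3) for t in (1, 10, 100)])   # [1.0, 0.1, 0.01]
print([round(power_decay(t), 3) for t in (1, 10, 100)])     # [0.536, 0.116, 0.016]
```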

  22. How to assess convergence? • Ideal: stop when the derivative equals zero • Practical heuristics: stop when the change in loss becomes small, $|J(\theta_t) - J(\theta_{t-1})| < \epsilon$, or when the step size is indistinguishable from zero, $\alpha \, |\frac{d}{d\theta} J(\theta)| < \epsilon$
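
A minimal sketch of those two heuristics as boolean checks (the tolerance eps and the variable names are illustrative):

```python
def loss_has_converged(J_curr, J_prev, eps=1e-6):
    """Stop when the change in loss between iterations is small."""
    return abs(J_curr - J_prev) < eps

def step_has_converged(alpha, dJ_dtheta, eps=1e-6):
    """Stop when the update alpha * |dJ/dtheta| is indistinguishable from zero."""
    return alpha * abs(dJ_dtheta) < eps

print(loss_has_converged(0.5000003, 0.5000001))   # True
print(step_has_converged(0.01, 2.0))              # False
```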

  23. Visualizing the cost function. “Level set” contours: all points with the same function value

  24. In 2D parameter space, gradient = vector of partial derivatives
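
For the least squares cost $J(\theta)$ defined above, differentiating the sum of squared errors gives $\nabla J(\theta) = -2\,\tilde{X}^T (y - \tilde{X}\theta)$; this closed form is not written on the slide, so treat it as a derived detail. A minimal sketch:

```python
import numpy as np

def grad_J(theta, X_tilde, y):
    """Gradient of J(theta) = sum_n (y_n - theta^T x_tilde_n)^2, i.e. -2 * X~^T (y - X~ theta)."""
    return -2.0 * X_tilde.T @ (y - X_tilde @ theta)

# Tiny example: 2D parameter space (bias + one weight)
X_tilde = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])   # rows [1, x_n]
y = np.array([1.0, 3.0, 5.0])                               # y = 1 + 2x
print(grad_J(np.array([1.0, 2.0]), X_tilde, y))             # [0. 0.] at the optimum
print(grad_J(np.array([0.0, 0.0]), X_tilde, y))             # nonzero; negate it to step downhill
```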

  25. Gradient Descent DEMO: https://github.com/tufts-ml-courses/comp135-19s-assignments/blob/master/labs/GradientDescentDemo.ipynb

  26. Fitting a line isn’t always ideal

  27. Can fit linear functions to nonlinear features. A nonlinear function of x: $\hat{y}(x_i) = \theta_0 + \theta_1 x_i + \theta_2 x_i^2 + \theta_3 x_i^3$ can be written as a linear function of $\phi(x_i) = [x_i \; x_i^2 \; x_i^3]$: $\hat{y}(\phi(x_i)) = \theta_0 + \theta_1 \phi(x_i)_1 + \theta_2 \phi(x_i)_2 + \theta_3 \phi(x_i)_3$. “Linear regression” means linear in the parameters (weights, biases). Features can be arbitrary transforms of raw data.
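
A minimal sketch of this idea: expand a scalar x into polynomial features, then reuse an ordinary least squares solve on the transformed matrix (np.linalg.lstsq stands in for the closed-form formula earlier; names are illustrative):

```python
import numpy as np

def poly_features(x, degree=3):
    """phi(x_i) = [x_i, x_i^2, ..., x_i^degree] as columns of a feature matrix."""
    return np.column_stack([x ** d for d in range(1, degree + 1)])

x = np.linspace(-2.0, 2.0, 20)
y = 1.0 + 0.5 * x - 2.0 * x**2 + 0.3 * x**3      # a cubic target, no noise

Phi = np.hstack([poly_features(x, degree=3), np.ones((x.size, 1))])   # add bias column
theta = np.linalg.lstsq(Phi, y, rcond=None)[0]
print(np.round(theta, 3))   # approx [0.5, -2.0, 0.3, 1.0] = [theta_1, theta_2, theta_3, theta_0]
```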

  28. What feature transform to use? • Anything that works for your data! • sin / cos for periodic data • polynomials for high-order dependencies, e.g. $\phi(x_i) = [x_i \; x_i^2 \; x_i^3]$ • interactions between feature dimensions, e.g. products such as $\phi(x_i) = [x_{i1} x_{i2} \;\; x_{i3} x_{i4}]$ • Many other choices possible
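
A minimal sketch of building such transforms by hand, assuming a NumPy feature matrix X with four columns; the particular sin/cos and interaction combinations are illustrative, not from the slide:

```python
import numpy as np

def transform_features(X):
    """Stack raw features with sin/cos of the first column and two pairwise products."""
    return np.column_stack([
        X,                        # raw features x_i1 ... x_i4
        np.sin(X[:, 0]),          # periodic transform of one dimension
        np.cos(X[:, 0]),
        X[:, 0] * X[:, 1],        # interaction between dimensions 1 and 2
        X[:, 2] * X[:, 3],        # interaction between dimensions 3 and 4
    ])

X = np.random.default_rng(0).normal(size=(5, 4))
print(transform_features(X).shape)   # (5, 8): 4 raw + 2 periodic + 2 interaction columns
```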
