Linear Regression




  1. Linear Regression Jia-Bin Huang Virginia Tech Spring 2019 ECE-5424G / CS-5824

  2. Administrative • Office hour • Chen Gao • Shih-Yang Su • Feedback (Thanks!) • Notation? • More descriptive slides? • Video/audio recording? • TA hours (uniformly spread over the week)?

  3. Recap: Machine learning algorithms • Supervised learning: classification (discrete output), regression (continuous output) • Unsupervised learning: clustering (discrete), dimensionality reduction (continuous)

  4. Recap: Nearest neighbor classifier • Training data (x^(1), y^(1)), (x^(2), y^(2)), ⋯, (x^(N), y^(N)) • Learning: do nothing • Testing: h(x) = y^(k), where k = argmin_i d(x, x^(i))
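Below is a minimal NumPy sketch of the 1-nearest-neighbor rule on this slide; the function name, the toy data, and the use of Euclidean distance for d are illustrative assumptions, not course code.

```python
import numpy as np

def nn_predict(X_train, y_train, x_query):
    # d(x, x^(i)): Euclidean distance from the query to every training point (assumed metric)
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # k = argmin_i d(x, x^(i)); "learning" stored nothing beyond the data itself
    k = np.argmin(dists)
    return y_train[k]

# Toy usage: two features, three labeled points
X_train = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
y_train = np.array([0, 1, 1])
print(nn_predict(X_train, y_train, np.array([0.9, 1.2])))  # -> 1
```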

  5. Recap: Instance/Memory-based Learning 1. A distance metric • Continuous? Discrete? PDF? Gene data? Learn the metric? 2. How many nearby neighbors to look at? • 1? 3? 5? 15? 3. A weighting function (optional) • Closer neighbors matter more 4. How to fit with the local points? • Kernel regression Slide credit: Carlos Guestrin
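Bullets 3 and 4 above point at kernel regression: weight each training point by a kernel of its distance to the query, then take a weighted average of their targets. A small sketch, assuming a Gaussian weighting function and an arbitrary bandwidth:

```python
import numpy as np

def kernel_regression(X_train, y_train, x_query, bandwidth=1.0):
    # Weighting function: closer neighbors matter more (Gaussian kernel, assumed choice)
    dists = np.linalg.norm(X_train - x_query, axis=1)
    w = np.exp(-(dists ** 2) / (2 * bandwidth ** 2))
    # Fit with the local points: kernel-weighted average of their targets
    return np.sum(w * y_train) / np.sum(w)

# Toy usage: one feature, three points
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([1.0, 2.0, 3.0])
print(kernel_regression(X, y, np.array([2.5])))  # lands between 2 and 3
```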

  6. Validation set • Splitting the training set: a fake test set used to tune hyper-parameters Slide credit: CS231 @ Stanford

  7. Cross-validation • 5-fold cross-validation -> split the training data into 5 equal folds • 4 of them for training and 1 for validation Slide credit: CS231 @ Stanford
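A minimal sketch of the 5-fold procedure: permute the training indices, split them into 5 roughly equal folds, train on 4 and validate on 1, and average the scores. The `train_and_score` callback is a hypothetical stand-in for whatever model (e.g., k-NN with a candidate k) is being tuned; X and y are assumed to be NumPy arrays.

```python
import numpy as np

def cross_validate(X, y, train_and_score, n_folds=5):
    # Split the training data into n_folds (roughly) equal folds
    folds = np.array_split(np.random.permutation(len(X)), n_folds)
    scores = []
    for i in range(n_folds):
        val_idx = folds[i]                                    # 1 fold for validation
        train_idx = np.concatenate(folds[:i] + folds[i+1:])   # remaining folds for training
        scores.append(train_and_score(X[train_idx], y[train_idx], X[val_idx], y[val_idx]))
    return np.mean(scores)
```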

  8. Things to remember • Supervised learning • Training/testing data; classification/regression; hypothesis • k-NN • Simplest learning algorithm • With sufficient data, very hard to beat this “strawman” approach • Kernel regression/classification • Set k to n (number of data points) and choose the kernel width • Smoother than k-NN • Problems with k-NN • Curse of dimensionality • Not robust to irrelevant features • Slow NN search: must remember the (very large) dataset for prediction

  9. Today’s plan : Linear Regression • Model representation • Cost function • Gradient descent • Features and polynomial regression • Normal equation

  10. Linear Regression • Model representation • Cost function • Gradient descent • Features and polynomial regression • Normal equation

  11. Regression: predict a real-valued output • Training set → Learning algorithm → Hypothesis h • h maps the input x (size of house) to an estimated output y (price)

  12. House pricing prediction • [Scatter plot: Size in feet^2 (x-axis) vs. Price ($) in 1000’s (y-axis)]

  13. Training set: Size in feet^2 (x) vs. Price ($) in 1000’s (y): (2104, 460), (1416, 232), (1534, 315), (852, 178), …; m = 47 • Notation: m = number of training examples; x = input variable / features; y = output variable / target variable; (x, y) = one training example; (x^(i), y^(i)) = the i-th training example • Examples: x^(1) = 2104, x^(2) = 1416, y^(1) = 460 Slide credit: Andrew Ng

  14. Model representation • Hypothesis: y = h_θ(x) = θ_0 + θ_1 x (shorthand: h(x)) • Training set → Learning algorithm → Hypothesis h: size of house → estimated price • [Plot: Size in feet^2 vs. Price ($) in 1000’s with a fitted line] • Univariate linear regression Slide credit: Andrew Ng
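The univariate hypothesis h_θ(x) = θ_0 + θ_1 x as a one-line function; the θ values in the usage line are made-up placeholders, not fitted parameters.

```python
def h(theta0, theta1, x):
    # Univariate linear regression hypothesis: predicted price for a house of size x
    return theta0 + theta1 * x

print(h(50.0, 0.2, 1250))  # predicted price (in $1000s) for a 1250 ft^2 house, with placeholder thetas
```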

  15. Linear Regression • Model representation • Cost function • Gradient descent • Features and polynomial regression • Normal equation

  16. Training set: Size in feet^2 (x) vs. Price ($) in 1000’s (y): (2104, 460), (1416, 232), (1534, 315), (852, 178), …; m = 47 • Hypothesis: h_θ(x) = θ_0 + θ_1 x • θ_0, θ_1: parameters/weights • How to choose the θ_j’s? Slide credit: Andrew Ng

  17. h_θ(x) = θ_0 + θ_1 x • [Three example lines plotted for x, y in [0, 3]: θ_0 = 1.5, θ_1 = 0; θ_0 = 0, θ_1 = 0.5; θ_0 = 1, θ_1 = 0.5] Slide credit: Andrew Ng

  18. Cost function • Idea: choose θ_0, θ_1 so that h_θ(x) is close to y for our training examples (x, y) • h_θ(x^(i)) = θ_0 + θ_1 x^(i) • Cost function: J(θ_0, θ_1) = (1/2m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))^2 • Goal: minimize_{θ_0, θ_1} J(θ_0, θ_1) • [Plot: Size in feet^2 vs. Price ($) in 1000’s] Slide credit: Andrew Ng
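The squared-error cost J(θ_0, θ_1) written out in NumPy; the four (size, price) pairs are the rows shown on the training-set slide, and everything else is an illustrative sketch.

```python
import numpy as np

def cost(theta0, theta1, x, y):
    m = len(x)                                   # number of training examples
    residuals = (theta0 + theta1 * x) - y        # h_theta(x^(i)) - y^(i)
    return np.sum(residuals ** 2) / (2 * m)      # J(theta0, theta1)

x = np.array([2104.0, 1416.0, 1534.0, 852.0])    # size in ft^2
y = np.array([460.0, 232.0, 315.0, 178.0])       # price in $1000s
print(cost(0.0, 0.2, x, y))
```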

  19. Simplified • Original: hypothesis h_θ(x) = θ_0 + θ_1 x; parameters θ_0, θ_1; cost function J(θ_0, θ_1) = (1/2m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))^2; goal: minimize_{θ_0, θ_1} J(θ_0, θ_1) • Simplified (set θ_0 = 0): hypothesis h_θ(x) = θ_1 x; parameter θ_1; cost function J(θ_1) = (1/2m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))^2; goal: minimize_{θ_1} J(θ_1) Slide credit: Andrew Ng

  20.-24. [Repeated plot pairs: h_θ(x) as a function of x (left) and the cost J(θ_1) as a function of θ_1 (right), shown for several values of θ_1; each candidate θ_1 gives one line on the left and one point on the cost curve on the right] Slide credit: Andrew Ng

  25. • Hypothesis: h_θ(x) = θ_0 + θ_1 x • Parameters: θ_0, θ_1 • Cost function: J(θ_0, θ_1) = (1/2m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))^2 • Goal: minimize_{θ_0, θ_1} J(θ_0, θ_1) Slide credit: Andrew Ng

  26. Cost function Slide credit: Andrew Ng

  27. How do we find good θ_0, θ_1 that minimize J(θ_0, θ_1)? Slide credit: Andrew Ng

  28. Linear Regression • Model representation • Cost function • Gradient descent • Features and polynomial regression • Normal equation

  29. Gradient descent • Have some function J(θ_0, θ_1) • Want argmin_{θ_0, θ_1} J(θ_0, θ_1) • Outline: start with some θ_0, θ_1; keep changing θ_0, θ_1 to reduce J(θ_0, θ_1) until we hopefully end up at a minimum Slide credit: Andrew Ng

  30. Slide credit: Andrew Ng

  31. Gradient descent • Repeat until convergence { θ_j := θ_j − α ∂/∂θ_j J(θ_0, θ_1) (for j = 0 and j = 1) } • α: learning rate (step size) • ∂/∂θ_j J(θ_0, θ_1): derivative (rate of change) Slide credit: Andrew Ng

  32. Gradient descent • Correct (simultaneous update): temp0 := θ_0 − α ∂/∂θ_0 J(θ_0, θ_1); temp1 := θ_1 − α ∂/∂θ_1 J(θ_0, θ_1); θ_0 := temp0; θ_1 := temp1 • Incorrect: temp0 := θ_0 − α ∂/∂θ_0 J(θ_0, θ_1); θ_0 := temp0; temp1 := θ_1 − α ∂/∂θ_1 J(θ_0, θ_1); θ_1 := temp1 Slide credit: Andrew Ng
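A runnable sketch of why the ordering matters, using the toy house data and the linear-regression partial derivatives derived on slide 36; the starting point and learning rate are arbitrary placeholders.

```python
import numpy as np

x = np.array([2104.0, 1416.0, 1534.0, 852.0])   # size in ft^2 (toy rows from the training-set slide)
y = np.array([460.0, 232.0, 315.0, 178.0])      # price in $1000s
m = len(x)

# Partial derivatives of the squared-error cost (formulas derived on slide 36)
def dJ_dtheta0(t0, t1):
    return np.sum((t0 + t1 * x) - y) / m

def dJ_dtheta1(t0, t1):
    return np.sum(((t0 + t1 * x) - y) * x) / m

theta0, theta1, alpha = 0.0, 0.0, 1e-7          # placeholder starting point and learning rate

# Correct: both partials are evaluated at the OLD (theta0, theta1) before either is assigned
temp0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)
temp1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)
theta0, theta1 = temp0, temp1

# Incorrect (for contrast): theta0 is overwritten first, so the theta1 step
# would be taken at a point that has already moved
# theta0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)
# theta1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)
```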

  33. θ_1 := θ_1 − α ∂/∂θ_1 J(θ_1) • [Plot of J(θ_1): where ∂/∂θ_1 J(θ_1) < 0 the update increases θ_1, where ∂/∂θ_1 J(θ_1) > 0 it decreases θ_1, so either way θ_1 moves toward the minimum] Slide credit: Andrew Ng

  34. Learning rate

  35. Gradient descent for linear regression • Repeat until convergence { θ_j := θ_j − α ∂/∂θ_j J(θ_0, θ_1) (for j = 0 and j = 1) } • Linear regression model: h_θ(x) = θ_0 + θ_1 x • J(θ_0, θ_1) = (1/2m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))^2 Slide credit: Andrew Ng

  36. Computing the partial derivatives • ∂/∂θ_j J(θ_0, θ_1) = ∂/∂θ_j (1/2m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))^2 = ∂/∂θ_j (1/2m) Σ_{i=1}^{m} (θ_0 + θ_1 x^(i) − y^(i))^2 • j = 0: ∂/∂θ_0 J(θ_0, θ_1) = (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) • j = 1: ∂/∂θ_1 J(θ_0, θ_1) = (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) x^(i) Slide credit: Andrew Ng
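The same two partial derivatives computed in vectorized form over the toy rows of the training-set table (a sketch, not course code):

```python
import numpy as np

x = np.array([2104.0, 1416.0, 1534.0, 852.0])   # size in ft^2
y = np.array([460.0, 232.0, 315.0, 178.0])      # price in $1000s
m = len(x)

def gradients(theta0, theta1):
    errors = (theta0 + theta1 * x) - y           # h_theta(x^(i)) - y^(i)
    dJ0 = np.sum(errors) / m                     # dJ/dtheta0
    dJ1 = np.sum(errors * x) / m                 # dJ/dtheta1
    return dJ0, dJ1

print(gradients(0.0, 0.0))
```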

  37. Gradient descent for linear regression • Repeat until convergence { θ_0 := θ_0 − α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)); θ_1 := θ_1 − α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) x^(i) } • Update θ_0 and θ_1 simultaneously Slide credit: Andrew Ng

  38. Batch gradient descent • “Batch”: each step of gradient descent uses all the training examples (m = number of training examples) • Repeat until convergence { θ_0 := θ_0 − α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)); θ_1 := θ_1 − α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) x^(i) } Slide credit: Andrew Ng
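Putting the pieces together, a minimal batch gradient descent loop for the two-parameter model; the learning rate, iteration cap, and lack of feature scaling are placeholder choices (with raw square-footage inputs, α must be very small for the updates not to diverge).

```python
import numpy as np

x = np.array([2104.0, 1416.0, 1534.0, 852.0])    # size in ft^2
y = np.array([460.0, 232.0, 315.0, 178.0])       # price in $1000s
m = len(x)

theta0, theta1 = 0.0, 0.0
alpha = 1e-7                                     # placeholder learning rate (unscaled features)

for _ in range(10000):                           # "repeat until convergence", capped for the sketch
    errors = (theta0 + theta1 * x) - y           # uses ALL m examples -> "batch"
    grad0 = np.sum(errors) / m                   # dJ/dtheta0
    grad1 = np.sum(errors * x) / m               # dJ/dtheta1
    # simultaneous update of theta0 and theta1
    theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1

print(theta0, theta1)
```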

  39. Linear Regression • Model representation • Cost function • Gradient descent • Features and polynomial regression • Normal equation

  40. Training dataset: Size in feet^2 (x) vs. Price ($) in 1000’s (y): (2104, 460), (1416, 232), (1534, 315), (852, 178), … • h_θ(x) = θ_0 + θ_1 x Slide credit: Andrew Ng
