SLIDE 1

Linear Regression

Jia-Bin Huang Virginia Tech

Spring 2019

ECE-5424G / CS-5824

SLIDE 2

SLIDE 3

SLIDE 4

Administrative

  • Office hour
  • Chen Gao
  • Shih-Yang Su
  • Feedback (Thanks!)
  • Notation?
  • More descriptive slides?
  • Video/audio recording?
  • TA hours (uniformly spread over the week)?
SLIDE 5

Recap: Machine learning algorithms

           | Supervised Learning | Unsupervised Learning
Discrete   | Classification      | Clustering
Continuous | Regression          | Dimensionality reduction

SLIDE 6

Recap: Nearest neighbor classifier

  • Training data

๐‘ฆ 1 , ๐‘ง 1 , ๐‘ฆ 2 , ๐‘ง 2 , โ‹ฏ , ๐‘ฆ ๐‘‚ , ๐‘ง ๐‘‚

  • Learning

Do nothing.

  • Testing

โ„Ž ๐‘ฆ = ๐‘ง(๐‘™), where ๐‘™ = argmini ๐ธ(๐‘ฆ, ๐‘ฆ(๐‘—))

SLIDE 7

Recap: Instance/Memory-based Learning

  • 1. A distance metric
  • Continuous? Discrete? PDF? Gene data? Learn the metric?
  • 2. How many nearby neighbors to look at?
  • 1? 3? 5? 15?
  • 3. A weighting function (optional)
  • Closer neighbors matter more
  • 4. How to fit with the local points?
  • Kernel regression

Slide credit: Carlos Guestrin
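
One common way items 1-4 come together is Nadaraya-Watson-style kernel regression; a sketch assuming a Gaussian weighting function (the names and the bandwidth parameter are illustrative, not from the slide):

```python
import numpy as np

def kernel_regress(X_train, y_train, x_query, bandwidth=1.0):
    """Predict a weighted average of all training labels."""
    dists = np.linalg.norm(X_train - x_query, axis=1)  # 1. distance metric
    w = np.exp(-dists**2 / (2 * bandwidth**2))         # 3. closer neighbors weigh more
    return w @ y_train / w.sum()                       # 4. fit: weighted mean
```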

SLIDE 8

Validation set

  • Splitting the training set: hold out a fake test set to tune hyper-parameters

Slide credit: CS231 @ Stanford

SLIDE 9

Cross-validation

  • 5-fold cross-validation: split the training data into 5 equal folds
  • Use 4 of them for training and 1 for validation; rotate which fold is held out (see the sketch below)

Slide credit: CS231 @ Stanford
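
A sketch of the 5-fold split described above (the helper name and seed are hypothetical):

```python
import numpy as np

def five_folds(m, seed=0):
    """Shuffle m example indices and split them into 5 roughly equal folds."""
    perm = np.random.default_rng(seed).permutation(m)
    return np.array_split(perm, 5)

folds = five_folds(47)
for f in range(5):                      # each fold takes one turn as validation set
    val_idx = folds[f]
    train_idx = np.concatenate([folds[g] for g in range(5) if g != f])
    # ...train with train_idx, score hyper-parameters on val_idx...
```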

SLIDE 10

Things to remember

  • Supervised Learning
  • Training/testing data; classification/regression; Hypothesis
  • k-NN
  • Simplest learning algorithm
  • With sufficient data, very hard to beat "strawman" approach
  • Kernel regression/classification
  • Set k to n (number of data points) and choose the kernel width
  • Smoother than k-NN
  • Problems with k-NN
  • Curse of dimensionality
  • Not robust to irrelevant features
  • Slow NN search: must remember (very large) dataset for prediction
SLIDE 11

Today's plan: Linear Regression

  • Model representation
  • Cost function
  • Gradient descent
  • Features and polynomial regression
  • Normal equation
SLIDE 12

Linear Regression

  • Model representation
  • Cost function
  • Gradient descent
  • Features and polynomial regression
  • Normal equation
SLIDE 13

Training set → Learning Algorithm → Hypothesis $h$

Size of house $x$ → $h$ → Estimated price $y$

Regression: real-valued output

SLIDE 14

House pricing prediction

[Scatter plot: size in feet^2 on the x-axis vs. price ($) in 1000's on the y-axis]

SLIDE 15
  • Notation:
  • $m$ = number of training examples
  • $x$ = input variable / features
  • $y$ = output variable / target variable
  • $(x, y)$ = one training example
  • $(x^{(i)}, y^{(i)})$ = $i$-th training example

Training set

Size in feet^2 (x) | Price ($) in 1000's (y)
2104 | 460
1416 | 232
1534 | 315
852 | 178
… | …

๐‘› = 47 Examples: ๐‘ฆ(1) = 2104 ๐‘ฆ(2) = 1416 ๐‘ง(1) = 460

Slide credit: Andrew Ng

SLIDE 16

Model representation

๐‘ง = โ„Ž๐œ„ ๐‘ฆ = ๐œ„0 + ๐œ„1๐‘ฆ

Shorthand: $h(x)$

Training set → Learning Algorithm → Hypothesis $h$: size of house → estimated price

[Plot: size in feet^2 vs. price ($) in 1000's, with a straight-line fit]

Univariate linear regression

Slide credit: Andrew Ng

SLIDE 17

Linear Regression

  • Model representation
  • Cost function
  • Gradient descent
  • Features and polynomial regression
  • Normal equation
SLIDE 18

Training set

  • Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$

๐œ„0, ๐œ„1: parameters/weights How to choose ๐œ„๐‘—โ€™s?

Size in feet^2 (x) | Price ($) in 1000's (y)
2104 | 460
1416 | 232
1534 | 315
852 | 178
… | …

๐‘› = 47

Slide credit: Andrew Ng

SLIDE 19

โ„Ž๐œ„ ๐‘ฆ = ๐œ„0 + ๐œ„1๐‘ฆ

[Three plots of $h_\theta(x)$ over $x$:]

$\theta_0 = 1.5,\ \theta_1 = 0$ (constant); $\theta_0 = 0,\ \theta_1 = 0.5$ (line through the origin); $\theta_0 = 1,\ \theta_1 = 0.5$

Slide credit: Andrew Ng

SLIDE 20

Cost function

  • Idea:

Choose ๐œ„0, ๐œ„1 so that โ„Ž๐œ„ ๐‘ฆ is close to ๐‘ง for our training example (๐‘ฆ, ๐‘ง)

[Scatter plot: size in feet^2 ($x$) vs. price ($) in 1000's ($y$)]

โ„Ž๐œ„ ๐‘ฆ ๐‘— = ๐œ„0 + ๐œ„1๐‘ฆ(๐‘—)

๐พ ๐œ„0, ๐œ„1 = 1 2๐‘› เท

๐‘—=1 ๐‘›

โ„Ž๐œ„ ๐‘ฆ ๐‘— โˆ’ ๐‘ง ๐‘—

2

minimize

1 2๐‘› ฯƒ๐‘—=1 ๐‘›

โ„Ž๐œ„ ๐‘ฆ ๐‘— โˆ’ ๐‘ง ๐‘—

2

๐œ„0, ๐œ„1

minimize ๐พ ๐œ„0, ๐œ„1

๐œ„0, ๐œ„1

Cost function

Slide credit: Andrew Ng

SLIDE 21
  • Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
  • Parameters: $\theta_0, \theta_1$
  • Cost function: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
  • Goal: $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$

Simplified (fix $\theta_0 = 0$):

  • Hypothesis: $h_\theta(x) = \theta_1 x$
  • Parameters: $\theta_1$
  • Cost function: $J(\theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
  • Goal: $\min_{\theta_1} J(\theta_1)$

Slide credit: Andrew Ng
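
A direct transcription of this cost function into NumPy; the toy data are the four examples from the earlier training-set table:

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """J(theta0, theta1) = 1/(2m) * sum_i (h_theta(x^(i)) - y^(i))^2."""
    m = len(x)
    h = theta0 + theta1 * x               # h_theta(x) for every example at once
    return np.sum((h - y) ** 2) / (2 * m)

x = np.array([2104.0, 1416.0, 1534.0, 852.0])  # size in feet^2
y = np.array([460.0, 232.0, 315.0, 178.0])     # price in $1000's
print(cost(0.0, 0.2, x, y))  # J for the candidate line h(x) = 0.2 x
```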

SLIDE 22

โ„Ž๐œ„ ๐‘ฆ , function of ๐‘ฆ

1 2 3 1 2 3 ๐‘ฆ ๐‘ง 1 2 3 1 2 3 ๐พ ๐œ„1 ๐œ„1

๐พ ๐œ„1 , function of ๐œ„1

Slide credit: Andrew Ng

SLIDES 23-26

[The same pair of plots repeated for different values of $\theta_1$: each choice of $\theta_1$ gives one line $h_\theta(x)$ on the left and traces out one point on the cost curve $J(\theta_1)$ on the right.]

Slide credit: Andrew Ng

SLIDE 27
  • Hypothesis: โ„Ž๐œ„ ๐‘ฆ = ๐œ„0 + ๐œ„1๐‘ฆ
  • Parameters: ๐œ„0, ๐œ„1
  • Cost function: ๐พ ๐œ„0, ๐œ„1 =

1 2๐‘› ฯƒ๐‘—=1 ๐‘›

โ„Ž๐œ„ ๐‘ฆ ๐‘— โˆ’ ๐‘ง ๐‘—

2

  • Goal: minimize ๐พ ๐œ„0, ๐œ„1

๐œ„0, ๐œ„1

Slide credit: Andrew Ng

SLIDE 28

Cost function

Slide credit: Andrew Ng

SLIDE 29

How do we find good $\theta_0, \theta_1$ that minimize $J(\theta_0, \theta_1)$?

Slide credit: Andrew Ng

SLIDE 30

Linear Regression

  • Model representation
  • Cost function
  • Gradient descent
  • Features and polynomial regression
  • Normal equation
SLIDE 31

Gradient descent

Have some function $J(\theta_0, \theta_1)$. Want $\arg\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$.

Outline:

  • Start with some $\theta_0, \theta_1$
  • Keep changing $\theta_0, \theta_1$ to reduce $J(\theta_0, \theta_1)$ until we hopefully end up at a minimum

Slide credit: Andrew Ng

SLIDE 32

Slide credit: Andrew Ng

SLIDE 33

Gradient descent

Repeat until convergence {
    $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$    (for $j = 0$ and $j = 1$)
}

$\alpha$: learning rate (step size)

$\frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$: partial derivative (rate of change)

Slide credit: Andrew Ng

SLIDE 34

Gradient descent

Incorrect:

    temp0 := $\theta_0 - \alpha \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1)$
    $\theta_0$ := temp0
    temp1 := $\theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1)$    (evaluated with the already-updated $\theta_0$)
    $\theta_1$ := temp1

Correct (simultaneous update):

    temp0 := $\theta_0 - \alpha \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1)$
    temp1 := $\theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1)$
    $\theta_0$ := temp0
    $\theta_1$ := temp1

Slide credit: Andrew Ng
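
In code the distinction is purely one of evaluation order: both partial derivatives must be computed from the current $(\theta_0, \theta_1)$ before either parameter is overwritten. A sketch using the linear-regression gradients derived a few slides later (the data values are made up):

```python
import numpy as np

def grad_J(theta0, theta1, x, y):
    """Partial derivatives of J with respect to theta0 and theta1."""
    m = len(x)
    err = theta0 + theta1 * x - y
    return err.sum() / m, (err * x).sum() / m

x = np.array([1.0, 2.0, 3.0]); y = np.array([1.0, 2.5, 3.5])
theta0, theta1, alpha = 0.0, 0.0, 0.1

# Correct: both gradients use the same (theta0, theta1) snapshot.
g0, g1 = grad_J(theta0, theta1, x, y)
theta0, theta1 = theta0 - alpha * g0, theta1 - alpha * g1

# Incorrect would be assigning theta0 first and then calling grad_J
# with the already-updated theta0 when computing theta1's update.
```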

SLIDE 35

๐œ„1 โ‰” ๐œ„1 โˆ’ ๐›ฝ ๐œ– ๐œ–๐œ„1 ๐พ ๐œ„1

1 2 3 1 2 3 ๐พ ๐œ„1 ๐œ„1

๐œ– ๐œ–๐œ„1 ๐พ ๐œ„1 > 0 ๐œ– ๐œ–๐œ„1 ๐พ ๐œ„1 < 0

Slide credit: Andrew Ng

SLIDE 36

Learning rate

SLIDE 37

Gradient descent for linear regression

Repeat until convergence {
    $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$    (for $j = 0$ and $j = 1$)
}

  • Linear regression model

$h_\theta(x) = \theta_0 + \theta_1 x$

$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

Slide credit: Andrew Ng

SLIDE 38

Computing partial derivative

  • $\frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) = \frac{\partial}{\partial \theta_j} \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 = \frac{\partial}{\partial \theta_j} \frac{1}{2m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x^{(i)} - y^{(i)} \right)^2$

  • $j = 0$:  $\frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$

  • $j = 1$:  $\frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$

Slide credit: Andrew Ng

SLIDE 39

Gradient descent for linear regression

Repeat until convergence {
    $\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$
    $\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$
}

Update $\theta_0$ and $\theta_1$ simultaneously.

Slide credit: Andrew Ng
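
Putting the two update rules inside a loop gives the whole algorithm; a sketch on the toy housing data (the very small $\alpha$ compensates for the unscaled size feature, anticipating the feature-scaling discussion below):

```python
import numpy as np

def gradient_descent(x, y, alpha=1e-7, iters=10000):
    """Batch gradient descent for h_theta(x) = theta0 + theta1 * x."""
    m = len(x)
    theta0 = theta1 = 0.0
    for _ in range(iters):
        err = theta0 + theta1 * x - y             # h_theta(x^(i)) - y^(i)
        t0 = theta0 - alpha * err.sum() / m       # both updates computed first,
        t1 = theta1 - alpha * (err * x).sum() / m
        theta0, theta1 = t0, t1                   # ...then applied simultaneously
    return theta0, theta1

x = np.array([2104.0, 1416.0, 1534.0, 852.0])
y = np.array([460.0, 232.0, 315.0, 178.0])
print(gradient_descent(x, y))
```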

SLIDE 40

Batch gradient descent

  • "Batch": each step of gradient descent uses all the training examples

Repeat until convergence {
    $\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$
    $\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$
}

$m$: number of training examples

Slide credit: Andrew Ng

SLIDE 41

Linear Regression

  • Model representation
  • Cost function
  • Gradient descent
  • Features and polynomial regression
  • Normal equation
SLIDE 42

Training dataset

Size in feet^2 (x) | Price ($) in 1000's (y)
2104 | 460
1416 | 232
1534 | 315
852 | 178
… | …

โ„Ž๐œ„ ๐‘ฆ = ๐œ„0 + ๐œ„1๐‘ฆ

Slide credit: Andrew Ng

SLIDE 43

Multiple features (input variables)

Size in feet^2 (๐‘ฆ1) Number of bedrooms (๐‘ฆ2) Number of floors (๐‘ฆ3) Age of home (years) (๐‘ฆ4) Price ($) in 1000โ€™s (y) 2104 5 1 45 460 1416 3 2 40 232 1534 3 2 30 315 852 2 1 36 178 โ€ฆ โ€ฆ Notation: ๐‘œ = Number of features ๐‘ฆ(๐‘—)= Input features of ๐‘—๐‘ขโ„Ž training example ๐‘ฆ๐‘˜

(๐‘—)= Value of feature ๐‘˜ in ๐‘—๐‘ขโ„Ž training example

๐‘ฆ3

(2) =?

๐‘ฆ3

(4) =?

Slide credit: Andrew Ng

SLIDE 44

Hypothesis

Previously: $h_\theta(x) = \theta_0 + \theta_1 x$

Now: $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \theta_4 x_4$

Slide credit: Andrew Ng

SLIDE 45

โ„Ž๐œ„ ๐‘ฆ = ๐œ„0 + ๐œ„1๐‘ฆ1 + ๐œ„2๐‘ฆ2 + โ‹ฏ + ๐œ„๐‘œ๐‘ฆ๐‘œ

  • For convenience of notation, define ๐‘ฆ0 = 1

(๐‘ฆ0

(๐‘—) = 1 for all examples)

  • ๐’š =

๐‘ฆ0 ๐‘ฆ1 ๐‘ฆ2 โ‹ฎ ๐‘ฆ๐‘œ โˆˆ ๐‘†๐‘œ+1 ๐œพ = ๐œ„0 ๐œ„1 ๐œ„2 โ‹ฎ ๐œ„๐‘œ โˆˆ ๐‘†๐‘œ+1

  • โ„Ž๐œ„ ๐‘ฆ = ๐œ„0 + ๐œ„1๐‘ฆ1 + ๐œ„2๐‘ฆ2 + โ‹ฏ + ๐œ„๐‘œ๐‘ฆ๐‘œ

= ๐œพโŠค๐’š

Slide credit: Andrew Ng
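
The $x_0 = 1$ trick in code: prepend a 1 to the raw feature vector and the whole hypothesis collapses to a single dot product (all numbers here are made up for illustration):

```python
import numpy as np

theta = np.array([80.0, 0.1, 25.0])   # [theta0, theta1, theta2], illustrative values
x_raw = np.array([2104.0, 3.0])       # features x1, x2 of one house

x = np.concatenate(([1.0], x_raw))    # define x0 = 1
h = theta @ x                         # h_theta(x) = theta^T x
```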

SLIDE 46

Gradient descent

  • Previously ($n = 1$):

Repeat until convergence {
    $\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$
    $\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$
}

  • New algorithm ($n \geq 1$):

Repeat until convergence {
    $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$
}

Simultaneously update $\theta_j$ for $j = 0, 1, \cdots, n$.

Slide credit: Andrew Ng
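
Stacking the examples as rows of a matrix $X$ (with a leading column of ones for $x_0$) turns the per-parameter rule into one vectorized update, which is simultaneous by construction; a sketch:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, iters=1000):
    """Batch GD; X has shape (m, n+1) with X[:, 0] all ones (x0 = 1)."""
    m = X.shape[0]
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        err = X @ theta - y                        # h_theta(x^(i)) - y^(i) for all i
        theta = theta - alpha * (X.T @ err) / m    # every theta_j updated at once
    return theta
```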

SLIDE 47

Gradient descent in practice: Feature scaling

  • Idea: make sure features are on a similar scale (e.g., $-1 \leq x_i \leq 1$)
  • E.g., $x_1$ = size (0-2000 feet^2), $x_2$ = number of bedrooms (1-5)

[Contour plots of the cost over $(\theta_1, \theta_2)$: elongated ellipses without scaling, nearly circular contours with scaling]

Slide credit: Andrew Ng
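
The slide only asks that features end up on a similar scale; standardizing each column (subtract the mean, divide by the standard deviation) is one common way to get there:

```python
import numpy as np

def scale_features(X):
    """Standardize each feature column to mean 0 and standard deviation 1."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma   # keep mu, sigma to scale test inputs too

X = np.array([[2104.0, 5.0], [1416.0, 3.0], [1534.0, 3.0], [852.0, 2.0]])
X_scaled, mu, sigma = scale_features(X)  # size and bedroom counts now comparable
```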

SLIDE 48

Gradient descent in practice: Learning rate

  • Automatic convergence test: declare convergence if $J(\theta)$ decreases by less than some small threshold in one iteration
  • $\alpha$ too small: slow convergence
  • $\alpha$ too large: may not converge
  • To choose $\alpha$, try 0.001, …, 0.01, …, 0.1, …, 1

Image credit: CS231n@Stanford
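
A sketch of the suggested sweep: run a fixed number of iterations per candidate $\alpha$ and watch whether $J$ shrinks (a huge or NaN cost flags an $\alpha$ that is too large); assumes $X$ already has the ones column and scaled features:

```python
import numpy as np

def cost(theta, X, y):
    return np.sum((X @ theta - y) ** 2) / (2 * len(y))

def sweep_alpha(X, y, alphas=(0.001, 0.01, 0.1, 1.0), iters=100):
    for alpha in alphas:
        theta = np.zeros(X.shape[1])
        for _ in range(iters):
            theta -= alpha * (X.T @ (X @ theta - y)) / len(y)
        print(f"alpha={alpha}: J={cost(theta, X, y):.4f}")
```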

SLIDE 49

House prices prediction

  • โ„Ž๐œ„ ๐‘ฆ = ๐œ„0 + ๐œ„1 ร— frontage + ๐œ„2 ร— depth
  • Area

๐‘ฆ = frontage ร— depth

  • โ„Ž๐œ„ ๐‘ฆ = ๐œ„0 + ๐œ„1๐‘ฆ

Slide credit: Andrew Ng

SLIDE 50

Polynomial regression

  • โ„Ž๐œ„ ๐‘ฆ = ๐œ„0 + ๐œ„1๐‘ฆ1 + ๐œ„2๐‘ฆ2 + ๐œ„3๐‘ฆ3

= ๐œ„0 + ๐œ„1 ๐‘ก๐‘—๐‘จ๐‘“ + ๐œ„2 ๐‘ก๐‘—๐‘จ๐‘“ 2 + ๐œ„3 ๐‘ก๐‘—๐‘จ๐‘“ 3

Price ($) in 1000โ€™s 500 1000 1500 2000 2500 100 200 300 400 Size in feet^2

๐‘ฆ1 = (size) ๐‘ฆ2 = (size)^2 ๐‘ฆ3 = (size)^3

Slide credit: Andrew Ng
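
Building the polynomial features is just stacking powers of the one raw input as columns; note how wildly the column ranges differ, which is exactly why the feature-scaling slide matters here:

```python
import numpy as np

size = np.array([2104.0, 1416.0, 1534.0, 852.0])

# x1 = size, x2 = size^2, x3 = size^3, plus the leading x0 = 1 column
X = np.column_stack([np.ones_like(size), size, size**2, size**3])
```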

SLIDE 51

Linear Regression

  • Model representation
  • Cost function
  • Gradient descent
  • Features and polynomial regression
  • Normal equation
SLIDE 52

($x_0$) | Size in feet^2 ($x_1$) | Number of bedrooms ($x_2$) | Number of floors ($x_3$) | Age of home in years ($x_4$) | Price ($) in 1000's ($y$)
1 | 2104 | 5 | 1 | 45 | 460
1 | 1416 | 3 | 2 | 40 | 232
1 | 1534 | 3 | 2 | 30 | 315
1 | 852 | 2 | 1 | 36 | 178
… | … | … | … | … | …

$y = \begin{bmatrix} 460 \\ 232 \\ 315 \\ 178 \end{bmatrix}$

$\theta = (X^\top X)^{-1} X^\top y$

Slide credit: Andrew Ng

SLIDE 53

Least squares solution

  • $J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 = \frac{1}{2m} \sum_{i=1}^{m} \left( \theta^\top x^{(i)} - y^{(i)} \right)^2 = \frac{1}{2m} \left\| X\theta - y \right\|_2^2$

  • Set the gradient to zero: $\frac{\partial}{\partial \theta} J(\theta) = 0$

  • $\theta = (X^\top X)^{-1} X^\top y$
SLIDE 54

Justification/interpretation 1

  • Loss minimization

$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 = \frac{1}{m} \sum_{i=1}^{m} \ell\!\left( h_\theta(x^{(i)}), y^{(i)} \right)$

  • $\ell(y, \hat{y}) = \frac{1}{2} \left\| y - \hat{y} \right\|_2^2$: least squares loss

  • Empirical Risk Minimization (ERM): $\frac{1}{m} \sum_{i=1}^{m} \ell\!\left( y^{(i)}, \hat{y} \right)$

SLIDE 55

Justification/interpretation 2

  • Probabilistic model
  • Assume a linear model with Gaussian errors:

$p_\theta\!\left( y^{(i)} \mid x^{(i)} \right) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{1}{2\sigma^2} \left( y^{(i)} - \theta^\top x^{(i)} \right)^2 \right)$

  • Solving maximum likelihood: maximizing the likelihood (equivalently, its log) is the same as minimizing the sum of squared errors:

$\arg\max_\theta \prod_{i=1}^{m} p_\theta\!\left( y^{(i)} \mid x^{(i)} \right) = \arg\max_\theta \log \prod_{i=1}^{m} p_\theta\!\left( y^{(i)} \mid x^{(i)} \right) = \arg\min_\theta \frac{1}{2\sigma^2} \sum_{i=1}^{m} \left( \theta^\top x^{(i)} - y^{(i)} \right)^2$

Image credit: CS 446@UIUC

SLIDE 56

Justification/interpretation 3

  • Geometric interpretation

$X = \begin{bmatrix} 1 & \leftarrow (x^{(1)})^\top \rightarrow \\ 1 & \leftarrow (x^{(2)})^\top \rightarrow \\ \vdots & \vdots \\ 1 & \leftarrow (x^{(m)})^\top \rightarrow \end{bmatrix} = \begin{bmatrix} \uparrow & \uparrow & \uparrow & & \uparrow \\ c_1 & c_2 & c_3 & \cdots & c_n \\ \downarrow & \downarrow & \downarrow & & \downarrow \end{bmatrix}$

  • $X\theta$ lies in the column space of $X$, i.e., $\mathrm{span}(\{c_1, c_2, \cdots, c_n\})$
  • The residual $X\theta - y$ is orthogonal to the column space of $X$
  • $X^\top (X\theta - y) = 0 \;\Rightarrow\; (X^\top X)\theta = X^\top y$

[Figure: $y$, its projection $X\theta$ onto the column space of $X$, and the residual $X\theta - y$]

SLIDE 57

$m$ training examples, $n$ features

Gradient descent

  • Need to choose $\alpha$
  • Needs many iterations
  • Works well even when $n$ is large

Normal equation

  • No need to choose $\alpha$
  • No need to iterate
  • Need to compute $(X^\top X)^{-1}$
  • Slow if $n$ is very large

Slide credit: Andrew Ng

SLIDE 58

Things to remember

  • Model representation

$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n = \theta^\top x$

  • Cost function

$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

  • Gradient descent for linear regression

Repeat until convergence { $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$ }

  • Features and polynomial regression

Can combine features; can use different functions to generate features (e.g., polynomial)

  • Normal equation

$\theta = (X^\top X)^{-1} X^\top y$
SLIDE 59

Next

  • Naรฏve Bayes, Logistic regression, Regularization