Least Mean Squares Regression (Machine Learning) - PowerPoint PPT Presentation


SLIDE 1

Machine Learning

Least Mean Squares Regression

SLIDE 2

Least Squares Method for regression

  • Examples
  • The LMS objective
  • Gradient descent
  • Incremental/stochastic gradient descent


SLIDE 4

What's the mileage?

Suppose we want to predict the mileage of a car from its weight and age.

Weight (×100 lb) x1   Age (years) x2   Mileage
31.5                  6                21
36.2                  2                25
43.1                  ?                18
27.6                  2                30

What we want: A function that can predict mileage using x1 and x2.

SLIDE 5

Linear regression: The strategy

Assumption: The output is a linear function of the inputs:

  Mileage = w0 + w1 x1 + w2 x2

(w0, w1, w2 are the parameters of the model. They are also called weights; collectively, they form a vector.)

Learning: Use the training data to find the best possible value of w.

Prediction: Given the values of x1 and x2 for a new car, use the learned w to predict the Mileage of the new car.

Predicting continuous values using a linear model

SLIDE 7

Linear regression: The strategy

  • Inputs are vectors: x ∈ ℜ^d
  • Outputs are real numbers: y ∈ ℜ
  • We have a training set D = { (x1, y1), (x2, y2), ⋯ }
  • We want to approximate y as
      y = w1 x1 + w2 x2 + ⋯ + wd xd = wᵀx

w is the learned weight vector in ℜ^d.

For simplicity, we will assume that the first feature x1 is always 1. This makes the notation easier.
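The prediction y = wᵀx with a leading 1 feature can be sketched as follows (a minimal illustration; the weight values below are made up, not learned from the mileage data):

```python
import numpy as np

def predict(w, x):
    """Compute y = w^T x after prepending the constant feature 1 to x."""
    x = np.concatenate(([1.0], x))   # first feature is always 1
    return float(np.dot(w, x))

# Hypothetical weights for the mileage example: Mileage = w0 + w1*weight + w2*age
w = np.array([40.0, -0.5, -1.0])
print(predict(w, np.array([31.5, 6.0])))   # 40 - 0.5*31.5 - 1.0*6 = 18.25
```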

SLIDE 8

Examples

[Figure: one-dimensional data, input on the horizontal axis, output y on the vertical axis]

One-dimensional input: predict using y = w1 + w2 x2.

The linear function is not our only choice. We could have tried to fit the data as another polynomial.

Two-dimensional input: predict using y = w1 + w2 x2 + w3 x3.

SLIDE 12

Least Squares Method for regression

  • Examples
  • The LMS objective
  • Gradient descent
  • Incremental/stochastic gradient descent

SLIDE 13

What is the best weight vector?

Question: How do we know which weight vector is the best one for a training set?

For an input (xi, yi) in the training set, the cost of a mistake is:

Define the cost (or loss) for a particular weight vector w to be:

One strategy for learning: Find the w with the least cost on this data.

Sum of squared costs over the training set

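Written out, the "sum of squared costs" objective takes the following standard form (the 1/2 factor is a common convention that simplifies the gradient; the slide's exact scaling is not shown):

```latex
J(\mathbf{w}) \;=\; \frac{1}{2} \sum_{i} \left( y_i - \mathbf{w}^\top \mathbf{x}_i \right)^2
```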

SLIDE 18

Least Mean Squares (LMS) Regression

Learning: minimizing mean squared error

SLIDE 19

Least Mean Squares (LMS) Regression

Different strategies exist for learning by optimization.

  • Gradient descent is a popular algorithm

(For this particular minimization objective, there is also an analytical solution. No need for gradient descent.)

Learning: minimizing mean squared error

SLIDE 20

Least Squares Method for regression

  • Examples
  • The LMS objective
  • Gradient descent
  • Incremental/stochastic gradient descent

SLIDE 21

Gradient descent

General strategy for minimizing a function J(w):

  • Start with an initial guess for w, say w0
  • Iterate till convergence:
    – Compute the gradient of J at wt
    – Update wt to get wt+1 by taking a step in the opposite direction of the gradient

[Figure: J(w) plotted against w; successive iterates w0, w1, w2, w3 move toward the minimum of J]

Intuition: The gradient is the direction of steepest increase in the function. To get to the minimum, go in the opposite direction. We are trying to minimize J(w).

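The update step described above, written as an equation (the standard gradient descent form, consistent with "a step in the opposite direction of the gradient" with learning rate r):

```latex
\mathbf{w}^{t+1} \;=\; \mathbf{w}^{t} \;-\; r \, \nabla J(\mathbf{w}^{t})
```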

SLIDE 27

Gradient descent for LMS

We are trying to minimize J(w).

1. Initialize w0
2. For t = 0, 1, 2, ….
   1. Compute the gradient of J(w) at wt. Call it ∇J(wt)
   2. Update w as follows:

      wt+1 = wt − r ∇J(wt)

r is called the learning rate. (For now, a small constant. We will get to this later.)

What is the gradient of J?

SLIDE 29

Gradient of the cost

We are trying to minimize J(w).

  • The gradient is of the form:
  • Remember that w is a vector with d elements:
    w = [w1, w2, w3, …, wj, …, wd]


SLIDE 35

Gradient of the cost

We are trying to minimize J(w).

  • The gradient is of the form:

One element of the gradient vector: a sum over examples of (Error × Input)
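The "Sum of (Error × Input)" structure comes from differentiating the squared-error objective. Assuming the conventional J(w) = ½ Σᵢ (yᵢ − wᵀxᵢ)², one element of the gradient is:

```latex
\frac{\partial J}{\partial w_j}
  \;=\; \frac{\partial}{\partial w_j}\,\frac{1}{2}\sum_i \left(y_i - \mathbf{w}^\top\mathbf{x}_i\right)^2
  \;=\; -\sum_i \underbrace{\left(y_i - \mathbf{w}^\top\mathbf{x}_i\right)}_{\text{error}}\,\underbrace{x_{ij}}_{\text{input}}
```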

SLIDE 37

Gradient descent for LMS

We are trying to minimize J(w).

1. Initialize w0
2. For t = 0, 1, 2, …. (until total error is below a threshold)
   1. Compute the gradient of J(w) at wt. Call it ∇J(wt)
      (Evaluate the function for each training example to compute the error and construct the gradient vector. One element of ∇J(wt) is the sum of (error × input) for one weight wj.)
   2. Update w as follows:

      wt+1 = wt − r ∇J(wt)

r is called the learning rate. (For now, a small constant. We will get to this later.)

This algorithm is guaranteed to converge to the minimum of J if r is small enough. Why? The objective J is a convex function.
SLIDE 43

Least Squares Method for regression

  • Examples
  • The LMS objective
  • Gradient descent
  • Incremental/stochastic gradient descent

SLIDE 44

Gradient descent for LMS

We are trying to minimize J(w).

1. Initialize w0
2. For t = 0, 1, 2, …. (until total error is below a threshold)
   1. Compute the gradient of J(w) at wt. Call it ∇J(wt)
      (Evaluate the function for each training example to compute the error and construct the gradient vector.)
   2. Update w as follows:

      wt+1 = wt − r ∇J(wt)

The weight vector is not updated until all errors are calculated. Why not make early updates to the weight vector as soon as we encounter errors, instead of waiting for a full pass over the data?

SLIDE 46

Incremental/Stochastic gradient descent

  • Repeat for each example (xi, yi):
    – Pretend that the entire training set is represented by this single example
    – Use this example to calculate the gradient and update the model
  • Contrast with batch gradient descent, which makes one update to the weight vector for every pass over the data

SLIDE 47

Incremental/Stochastic gradient descent

1. Initialize w
2. For t = 0, 1, 2, …. (until error below some threshold)
   – For each training example (xi, yi):
     • Update w. For each element of the weight vector (wj):

       wj ← wj + r (yi − wᵀxi) xij

Contrast with the previous method, where the weights are updated only after all examples are processed once.

This update rule is also called the Widrow-Hoff rule in the neural networks literature.

Online/Incremental algorithms are often preferred when the training set is very large: they may get close to the optimum much faster than the batch version.
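A minimal sketch of the stochastic version with the Widrow-Hoff update (again with assumed toy data and a fixed epoch count in place of an error threshold; visiting examples in sequence is also a choice):

```python
import numpy as np

def lms_sgd(X, y, r=0.01, n_epochs=100):
    """Stochastic gradient descent for LMS: update after every single example."""
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        for x_i, y_i in zip(X, y):
            error = y_i - np.dot(w, x_i)   # error on this one example
            w = w + r * error * x_i        # Widrow-Hoff update, no full pass
    return w

# Same toy data generated from y = 1 + 2*x as before
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
print(lms_sgd(X, y, r=0.05, n_epochs=200))  # close to [1. 2.]
```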

SLIDE 51

Learning Rates and Convergence

  • In the general (non-separable) case, the learning rate r must decrease to zero to guarantee convergence
  • The learning rate is also called the step size
    – More sophisticated algorithms choose the step size automatically and converge faster
  • Choosing a better starting point can also have an impact
  • Gradient descent and its stochastic version are very simple algorithms
    – Yet, almost all the algorithms we will learn in the class can be traced back to gradient descent algorithms for different loss functions and different hypothesis spaces

SLIDE 52

Linear regression: Summary

  • What we want: Predict a real-valued output using a feature representation of the input
  • Assumption: The output is a linear function of the inputs
  • Learning by minimizing total cost
    – Gradient descent and stochastic gradient descent to find the best weight vector
    – This particular optimization can also be computed directly by framing the problem as a matrix problem
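The "matrix problem" route mentioned above is the standard normal-equations solution for least squares. A sketch under that assumption (the data here is made up; for ill-conditioned X, np.linalg.lstsq is numerically safer than forming XᵀX):

```python
import numpy as np

# Rows of X are feature vectors (first column is the constant 1); Y holds outputs.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
Y = np.array([1.0, 3.0, 5.0, 7.0])

# Normal equations: w* = (X^T X)^{-1} X^T Y minimizes ||X w - Y||^2.
w_star = np.linalg.solve(X.T @ X, X.T @ Y)
print(w_star)  # the data is exactly linear, so w* recovers [1. 2.]
```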

SLIDE 53

Exercises

1. Use the gradient descent algorithm to solve the mileage problem (on paper, or write a small program).
2. LMS regression can be solved analytically. Given a dataset D = { (x1, y1), (x2, y2), …, (xm, ym) }, define matrix X and vector Y as follows: (definitions not shown). Show that the optimization problem we saw earlier is equivalent to (equation not shown). This can be solved analytically. Show that the solution w* is (equation not shown).

Hint: You have to take the derivative of the objective with respect to the vector w and set it to zero.