Machine Learning
Least Mean Squares Regression
Outline:
- Least Squares Method for regression
- Examples
- The LMS objective
- Gradient descent
- Incremental/stochastic gradient descent
Weight (x 100 lb), x1    Age (years), x2    Mileage
31.5                     6                  21
36.2                     2                  25
43.1                                        18
27.6                     2                  30
Parameters of the model, also called weights. Collectively, a vector w.
For simplicity, we will assume that the first feature is always 1, i.e. $\mathbf{x} = (1, x_2, \ldots, x_d)$. This makes the notation easier.
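With the first feature fixed to 1, the linear model can be written compactly (standard notation, assumed here rather than quoted from the slide):

$$ y = \mathbf{w}^T \mathbf{x} = w_1 \cdot 1 + w_2 x_2 + \cdots + w_d x_d $$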
One dimensional input. Predict using y = w1 + w2 x2. (Plot: training points and the fitted line, with x1 on the horizontal axis and y on the vertical axis.)
The linear function is not our only choice. We could have tried to fit the data with another polynomial.
Two dimensional input. Predict using y = w1 + w2 x2 + w3 x3.
The LMS objective: the sum of squared costs over the training set.
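Written out, a standard form of this objective (the constant factor is a common convention and may differ from the slides) is

$$ J(\mathbf{w}) = \frac{1}{2} \sum_{i=1}^{m} \bigl( y_i - \mathbf{w}^T \mathbf{x}_i \bigr)^2 $$

where $(\mathbf{x}_i, y_i)$, $i = 1, \ldots, m$, are the training examples.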
(For this particular minimization objective, there is also an analytical solution. No need for gradient descent)
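For reference, setting the gradient of the objective above to zero yields the standard closed form (with $X$ the matrix whose rows are the $\mathbf{x}_i$ and $\mathbf{y}$ the vector of outputs; this notation is assumed, not taken from the slides):

$$ \nabla J(\mathbf{w}) = 0 \;\Rightarrow\; X^T X \mathbf{w} = X^T \mathbf{y} \;\Rightarrow\; \mathbf{w} = (X^T X)^{-1} X^T \mathbf{y}, $$

assuming $X^T X$ is invertible.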
We are trying to minimize J(w).
Intuition: The gradient is the direction of steepest increase of the function. To get to the minimum, go in the opposite direction.
(Figure: J(w) plotted against w, with successive iterates w0, w1, w2, w3 stepping toward the minimum.)
The gradient descent update: $\mathbf{w}^{t+1} \leftarrow \mathbf{w}^{t} - r \, \nabla J(\mathbf{w}^{t})$.
r: Called the learning rate. (For now, a small constant. We will get to this later.)
We are trying to minimize J(w), so we need the gradient of J with respect to w.
One element of the gradient vector is a sum over the training set of (error × input).
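Concretely, for the objective above, one element of the gradient is

$$ \frac{\partial J}{\partial w_j} = -\sum_{i=1}^{m} \bigl( y_i - \mathbf{w}^T \mathbf{x}_i \bigr)\, x_{ij}, $$

i.e., a sum over examples of (error) × (the j-th input feature). The sign and constant depend on the exact form of J used on the slides.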
Gradient descent for LMS: repeatedly update w by stepping against this gradient, computed over the entire training set, with learning rate r.
This algorithm is guaranteed to converge to the minimum of J if r is small enough. Why? The objective J is a convex function.
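A minimal sketch of this batch gradient descent loop in Python/NumPy. The function name, the toy data, the fixed learning rate, and the fixed step count are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def lms_batch_gradient_descent(X, y, r=0.05, num_steps=2000):
    """Minimize J(w) = 0.5 * sum_i (y_i - w.x_i)^2 with batch gradient descent.

    X: (m, d) array whose first column is all ones; y: (m,) targets.
    r is the learning rate (a small constant, as in the slides).
    """
    m, d = X.shape
    w = np.zeros(d)                # initial guess for the weight vector
    for _ in range(num_steps):
        errors = y - X @ w         # per-example error  y_i - w^T x_i
        grad = -X.T @ errors       # dJ/dw_j = -sum_i (error_i * x_ij)
        w = w - r * grad           # step in the direction opposite the gradient
    return w

# Illustrative usage on tiny synthetic data (first feature fixed to 1).
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 2.5, 3.5])
w = lms_batch_gradient_descent(X, y)
print(w)
```

With a learning rate this small relative to the data, the loop converges; in practice the step count and r would be tuned or replaced by a convergence check.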
We are trying to minimize J(w).
Incremental/stochastic gradient descent: rather than computing the gradient over the entire training set, update the weights using the gradient of the error on a single example at a time.
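A matching sketch of the incremental/stochastic variant, under the same assumptions as the batch version above (the shuffling and the fixed learning rate are illustrative choices):

```python
import numpy as np

def lms_stochastic_gradient_descent(X, y, r=0.01, num_epochs=200, seed=0):
    """Update w after each single example instead of after a full pass over the data."""
    rng = np.random.default_rng(seed)
    m, d = X.shape
    w = np.zeros(d)
    for _ in range(num_epochs):
        for i in rng.permutation(m):     # visit the examples in a random order
            error = y[i] - X[i] @ w      # error on this one example
            w = w + r * error * X[i]     # gradient step for this example only
    return w
```

Each update touches only one example, so it is cheap; with a small enough (or decaying) learning rate the iterates approach the same minimizer as the batch version.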
- More sophisticated algorithms choose the step size automatically and converge faster.
- Yet, almost all the algorithms we will learn in the class can be traced back to gradient descent algorithms for different loss functions and different hypothesis spaces.
Hint (for deriving the analytical solution mentioned earlier): you have to take the derivative of the objective with respect to the vector w and set it to zero.
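One way to sanity-check whatever closed form you derive is to compare it against a generic least squares solver; this check (and the data in it) is an illustrative assumption, not part of the slides:

```python
import numpy as np

# Tiny illustrative data, first feature fixed to 1.
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 2.5, 3.5])

# Candidate answer from setting the gradient to zero: solve X^T X w = X^T y.
w_closed = np.linalg.solve(X.T @ X, X.T @ y)

# Reference answer from NumPy's built-in least squares routine.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(w_closed, w_lstsq)  # should agree up to numerical precision
```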