ADVANCED MACHINE LEARNING
Non-linear regression techniques

Regression: Principle
Map an N-dimensional input x to a continuous output y. Learn a function of the type:

f: R^N -> R,  y = f(x)
Estimate the function f: x -> y that best predicts the set of training points {x^i, y^i}, i = 1,...,M.

[Figure: training points (x^1, y^1), ..., (x^4, y^4), the true function, and its estimate]
Classification techniques seen so far: Support Vector Machine, Relevance Vector Machine, Boosting (random projections, random Gaussians), Random forest, Gaussian Process.

Their regression counterparts: Support vector regression, Relevance vector regression, Gaussian process regression, Gradient boosting, Locally weighted projected regression.
Among these, this part of the lecture covers support vector regression, derived from the Support Vector Machine, and relevance vector regression, derived from the Relevance Vector Machine.
How can the support vector machine framework for classification be generalized to estimate continuous functions f(x^i) ≈ y^i, i = 1,...,M?
1. Assume a non-linear mapping through feature space and then perform linear regression in feature space.
2. Supervised learning minimizes an error function: first determine a way to measure the error in the linear case.
Assume a linear mapping f, such that y = f(x) = ⟨w, x⟩ + b.

How do we estimate w and b to best predict the pairs of training points {x^i, y^i}, i = 1,...,M? Measure the error on the prediction: y − f(x) = y − ⟨w, x⟩ − b.
Set an upper bound ε on the error and consider as correctly predicted all points such that |f(x) − y| ≤ ε, with f(x) = ⟨w, x⟩ + b. Penalize only the datapoints that are not contained in the ε-tube.
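The ε-insensitive penalty described above can be sketched as follows (a minimal illustration, not part of the course material):

```python
import numpy as np

def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """Zero inside the eps-tube, linear in |error| - eps outside it."""
    return np.maximum(np.abs(y_true - y_pred) - eps, 0.0)

# A point inside the tube costs nothing; one outside is penalized linearly.
errors = eps_insensitive_loss(np.array([1.0, 1.0]), np.array([1.05, 1.30]), eps=0.1)
```

Here `errors` comes out as approximately [0.0, 0.2]: the first prediction sits inside the tube, the second is 0.2 beyond it.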
The ε-margin is a measure of the width of the ε-insensitive tube, and hence of the precision of the regression. A small ||w|| corresponds to a small slope for f: in the linear case, f is more horizontal.
Conversely, a large ||w|| corresponds to a large slope for f: in the linear case, f is more vertical. The flatter the slope of f, the larger the ε-margin. To maximize the margin, we must minimize the norm of w.
This can be rephrased as a constraint-based optimization problem:

minimize   (1/2) ||w||²
subject to y^i − ⟨w, x^i⟩ − b ≤ ε
           ⟨w, x^i⟩ + b − y^i ≤ ε,   i = 1,...,M

This considers as correctly predicted all points such that |f(x^i) − y^i| ≤ ε. Points outside the ε-insensitive tube still need to be penalized.
To penalize points outside the ε-insensitive tube, introduce slack variables ξ_i, ξ*_i ≥ 0:

minimize   (1/2) ||w||² + (C/M) Σ_{i=1..M} (ξ_i + ξ*_i)
subject to y^i − ⟨w, x^i⟩ − b ≤ ε + ξ_i
           ⟨w, x^i⟩ + b − y^i ≤ ε + ξ*_i
           ξ_i, ξ*_i ≥ 0,   i = 1,...,M
In this formulation, all points outside the ε-tube (those with ξ_i > 0 or ξ*_i > 0) become support vectors. We now have the solution to the linear regression problem. How do we generalize it to the nonlinear case?
Lagrangian = objective function + multipliers × constraints. With Lagrange multipliers α_i, α*_i ≥ 0 for the two tube constraints and η_i, η*_i ≥ 0 for the positivity of the slack variables:

L(w, b, ξ, ξ*) = (1/2)||w||² + (C/M) Σ_{i=1..M} (ξ_i + ξ*_i)
                 − Σ_{i=1..M} α_i (ε + ξ_i − y^i + ⟨w, x^i⟩ + b)
                 − Σ_{i=1..M} α*_i (ε + ξ*_i + y^i − ⟨w, x^i⟩ − b)
                 − Σ_{i=1..M} (η_i ξ_i + η*_i ξ*_i)
The multipliers α_i and α*_i correspond to the constraints on points lying on either side of the ε-tube. α_i = α*_i = 0 for all points that satisfy the constraints, i.e. points inside the ε-tube.
Requiring that the partial derivatives of L are all zero:

∂L/∂w = w − Σ_{i=1..M} (α_i − α*_i) x^i = 0   ⇒   w = Σ_{i=1..M} (α_i − α*_i) x^i

i.e. w is a linear combination of the support vectors.

∂L/∂b = 0   ⇒   Σ_{i=1..M} (α_i − α*_i) = 0

i.e. the effect of the support vectors on both sides of the ε-tube is rebalanced.

The solution is given by:

f(x) = Σ_{i=1..M} (α_i − α*_i) ⟨x^i, x⟩ + b
Lift x into feature space and then perform linear regression in feature space:

Linear case:     y = f(x) = ⟨w, x⟩ + b
Non-linear case: y = f(x) = ⟨w, φ(x)⟩ + b, where w now lives in feature space.
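The feature-space lifting can be made concrete with a hypothetical 2-D input: the homogeneous polynomial kernel of degree 2 corresponds to an explicit map φ whose inner product reproduces the kernel (a minimal sketch):

```python
import numpy as np

def phi(x):
    """Explicit feature map whose inner product equals (<x, z>)^2 in 2-D."""
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])
lhs = phi(x) @ phi(z)   # inner product computed in feature space
rhs = (x @ z) ** 2      # kernel evaluated directly in input space
```

Both sides equal 16 here, so linear regression on φ(x) amounts to non-linear regression on x.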
The same ε-SVR optimization problem and Lagrangian apply in feature space, with x^i replaced by φ(x^i).
Substituting these expressions back into the primal Lagrangian, we get the dual optimization problem:

max over α, α*:  −(1/2) Σ_{i,j=1..M} (α_i − α*_i)(α_j − α*_j) k(x^i, x^j)
                 − ε Σ_{i=1..M} (α_i + α*_i) + Σ_{i=1..M} y^i (α_i − α*_i)
subject to       Σ_{i=1..M} (α_i − α*_i) = 0  and  0 ≤ α_i, α*_i ≤ C/M

Kernel trick: k(x^i, x^j) = ⟨φ(x^i), φ(x^j)⟩.
The solution is given by:

f(x) = Σ_{i=1..M} (α_i − α*_i) k(x^i, x) + b

The coefficients (α_i − α*_i) are linear coefficients (the Lagrange multipliers for each constraint). If one uses an RBF kernel, f is a sum of M un-normalized isotropic Gaussians centered on each training datapoint.
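The prediction equation can be sketched directly; the support vectors and coefficients below are hypothetical placeholders, not the output of an actual optimization:

```python
import numpy as np

def rbf(x, xi, width):
    """Un-normalized isotropic Gaussian kernel."""
    return np.exp(-np.sum((x - xi) ** 2) / (2.0 * width ** 2))

def svr_predict(x, X_sv, coef, b, width=0.1):
    """f(x) = sum_i (alpha_i - alpha*_i) k(x^i, x) + b."""
    return sum(c * rbf(x, xi, width) for c, xi in zip(coef, X_sv)) + b

# Hypothetical support vectors and coefficients (alpha_i - alpha*_i):
X_sv = np.array([[0.0], [1.0]])
coef = np.array([0.5, -0.5])   # sums to zero, as the equality constraint requires
b = 1.0
```

At x = x^1 the first Gaussian evaluates to 1 and dominates the prediction; far from all support vectors the prediction decays back to b.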
With an RBF kernel, the solution thus places a Gaussian function on each support vector.
The Lagrange multipliers define the importance of each Gaussian function, and f(x) converges to b where the effect of the support vectors vanishes.

[Figure: y = f(x) as a weighted sum of Gaussians centered on support vectors x^1,...,x^6, with weights (α_i − α*_i) of varying magnitude and sign]
SVR gives the following estimate for a set of datapoints {x^i, y^i}, i = 1,...,M:

y(x) = Σ_{j=1..M} (α_j − α*_j) k(x^j, x) + b

For the set of 3 points drawn below:
a) Compute an estimate of b using b = (1/M) Σ_{i=1..M} [ y^i − Σ_{j=1..M} (α_j − α*_j) k(x^j, x^i) ] with an RBF kernel, and plot b.
b) Plot the regressive curve and show how it varies depending on the kernel width and ε.
a) To answer this question, first determine the number of support vectors, which depends on ε. If we assume a small ε, all 3 points become SVs. b is then influenced by the value of the kernel k(x^j, x^i): when the kernel width tends to zero, k(x^j, x^i) ≈ 0 for j ≠ i and b = (1/M) Σ_i y^i is the mean of the data. As the kernel width grows, b is pulled toward the SVs, modulated by the kernel:

b = (1/M) Σ_{i=1..M} [ y^i − Σ_{j=1..M} (α_j − α*_j) k(x^j, x^i) ]

With only 2 SVs, the influence of the SVs cancels out and we are back to the mean.
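A numeric sketch of part (a), with 3 hypothetical points and multiplier values chosen only for illustration (they sum to zero, as the equality constraint requires):

```python
import numpy as np

X = np.array([0.0, 1.0, 2.0])
y = np.array([0.5, 1.5, 1.0])
coef = np.array([0.3, -0.1, -0.2])   # hypothetical (alpha_j - alpha*_j), sums to zero

def rbf(a, b, width):
    return np.exp(-(a - b) ** 2 / (2.0 * width ** 2))

def estimate_b(width):
    """b = (1/M) sum_i ( y^i - sum_j coef_j k(x^j, x^i) )."""
    M = len(X)
    return np.mean([y[i] - sum(coef[j] * rbf(X[j], X[i], width) for j in range(M))
                    for i in range(M)])
```

As the width shrinks the cross-terms vanish and, because the coefficients sum to zero, estimate_b tends to the mean of y (here 1.0); growing the width pulls b away from the mean, toward the SVs.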
- Kernel width = 0.001, ε = 0.1: with a small kernel width, the effect of each SV is well separated and the curve comes back to b in-between two SVs.
- Kernel width = 0.1, ε = 0.1: with a large kernel width, b changes and the curve yields a smooth interpolation in-between SVs.
- Kernel width = 0.001, ε = 0.24: with a large ε, one point is absorbed in the ε-tube and is no longer a SV.
Recall the solution to SVR: y = f(x) = Σ_{i=1..M} (α_i − α*_i) k(x^i, x) + b.
a) What type of function can you model with a homogeneous polynomial kernel?
b) What minimum order of homogeneous polynomial kernel do you need to achieve good regression on the set of 3 points below?
The estimate with a homogeneous polynomial kernel of order p in the 1-D case is:

y(x) = Σ_{i=1..M} (α_i − α*_i) (x^i x)^p + b

i.e. a single polynomial term, scaled and shifted. For the set of points below, we need at minimum p = 2 and 2 SVs. See also the supplementary exercises posted on the class's website!
The solution to SVR we just saw is referred to as ε-SVR. It has two hyperparameters:
- C controls the penalty term on a poor fit;
- ε determines the minimal required precision.

minimize   (1/2) ||w||² + (C/M) Σ_{i=1..M} (ξ_i + ξ*_i)
subject to y^i − ⟨w, x^i⟩ − b ≤ ε + ξ_i,  ⟨w, x^i⟩ + b − y^i ≤ ε + ξ*_i,  ξ_i, ξ*_i ≥ 0
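The role of C and ε in this objective can be sketched for a linear f with plain subgradient descent (a toy illustration under hypothetical data and hyperparameters, not the quadratic-programming solver normally used):

```python
import numpy as np

def linear_eps_svr(X, y, C=10.0, eps=0.1, lr=1e-3, epochs=4000):
    """Minimize (1/2)||w||^2 + (C/M) * sum_i max(|<w,x^i>+b - y^i| - eps, 0)."""
    M, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        r = X @ w + b - y                               # residuals f(x^i) - y^i
        g = np.where(np.abs(r) > eps, np.sign(r), 0.0)  # subgradient of the eps-loss
        w -= lr * (w + (C / M) * (X.T @ g))
        b -= lr * (C / M) * g.sum()
    return w, b

# Noise-free linear data y = 2x + 1: the flattest function within the tube is found.
X = np.array([[0.0], [0.5], [1.0]])
y = np.array([1.0, 2.0, 3.0])
w, b = linear_eps_svr(X, y)
```

The predictions land within roughly ε of the targets while ||w|| stays as small as the tube allows, which is exactly the flatness-versus-precision trade-off controlled by C and ε.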
Effect of the RBF kernel width on the fit: here fit using C = 100, ε = 0.1, kernel width = 0.01.
Effect of the RBF kernel width on the fit: here fit using C = 100, ε = 0.01, kernel width = 0.01. The fit overfits the data.
Effect of the RBF kernel width on the fit: here fit using C = 100, ε = 0.05, kernel width = 0.01. Choosing appropriate hyperparameters reduces the effect of the kernel width on the fit.
MLDemos does not display the support vectors if there is more than one point for the same x!
As in the classification case, the optimization framework of support vector regression is extended with:
- ν-SVR, yielding a sparser version of SVR and relaxing the ε constraint;
- Relevance vector regression (RVR), which also provides a sparser version of SVR and offers a probabilistic interpretation of the solution (see Tipping 2011, supplementary material to the class).
As the number of datapoints grows, so does the number of support vectors. The parameter ν bounds the fraction of support vectors (see the equivalent ν-SVM formulation for classification).
As for ν-SVM, one can rewrite the problem as a convex optimization problem:

min over w, b, ξ, ξ*, ε:  (1/2) ||w||² + C (ν ε + (1/M) Σ_{j=1..M} (ξ_j + ξ*_j))
subject to  y^j − ⟨w, x^j⟩ − b ≤ ε + ξ_j
            ⟨w, x^j⟩ + b − y^j ≤ ε + ξ*_j
            ξ_j, ξ*_j ≥ 0,  ε ≥ 0,  0 ≤ ν ≤ 1

The margin errors are given by all the datapoints for which ξ_j > 0. ν is an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors.
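Assuming scikit-learn is available, its NuSVR estimator exposes ν directly, and the fitted model lets one check the lower bound on the fraction of support vectors (synthetic data and parameter values are illustrative):

```python
import numpy as np
from sklearn.svm import NuSVR

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(100, 1))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.standard_normal(100)

# nu upper-bounds the fraction of margin errors and lower-bounds the fraction of SVs.
model = NuSVR(nu=0.3, C=10.0, kernel="rbf", gamma=5.0).fit(X, y)
frac_sv = len(model.support_) / len(X)
```

Here frac_sv should come out at or above ν = 0.3, while ε is adapted automatically rather than set by hand.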
Effect of the automatic adaptation of ε using ν-SVR.

Effect of the automatic adaptation of ε using ν-SVR with added noise on the data.
Relevance vector regression follows the same principle as that described for RVM (see the slides on SVM and extensions); the derivation of the parameters however differs (see Tipping 2011 for details). To recall, we start from the solution of SVR and rewrite it as a linear combination over M basis functions:

y(x) = f(x) = Σ_{i=1..M} α_i k(x^i, x) + b = z(x)^T α,  with z(x) = [k(x^1, x), ..., k(x^M, x)]^T

In the (binary) classification case, y ∈ [0; 1]; in the regression case, y ∈ R. A sparse solution has a majority of entries α_i equal to zero.
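The basis-function rewrite amounts to building a design matrix whose rows are the vectors z(x^i) (a minimal sketch with an RBF kernel; the width value is illustrative):

```python
import numpy as np

def design_matrix(X, width=0.1):
    """Phi[i, j] = k(x^j, x^i): row i holds z(x^i), one basis function per training point."""
    d2 = (X[:, None] - X[None, :]) ** 2
    return np.exp(-d2 / (2.0 * width ** 2))

X = np.array([0.0, 0.5, 1.0])
Phi = design_matrix(X)
```

The regression then reads y ≈ Phi @ alpha (plus a bias); sparsity means most entries of alpha are driven to zero.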
Solution with ε-SVR: RBF kernel, C = 3000, ε = 0.08, s = 0.05; 37 support vectors.
Solution with ν-SVR: RBF kernel, C = 3000, ν = 0.04, s = 0.001; 17 support vectors.
Solution with RVR: RBF kernel , =0.08, s=0.05, 7 support vectors
Extremely fast computation (the object flies in half a second); re-estimation of the arm motion to adapt to noisy visual detection of the object.
Not at the center of mass
- Gather demonstrations of a free-flying object.
- Build a model of the dynamics using support vector regression.
- Compute the derivative of the model (closed form).
- Use the model in an Extended Kalman Filter for real-time tracking.

The dynamics model takes the SVR form ẋ = Σ_{i=1..M} (α_i − α*_i) k(x^i, x) + b.

Precision: 1 cm, 1 degree; computation 0.17 to 0.32 seconds ahead of time.

Kim, S. and Billard, A. (2012) Estimating the non-linear dynamics of free-flying objects. Robotics and Autonomous Systems, Volume 60, Issue 9, pp. 1108-1122.
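The closed-form derivative used for the filtering step can be sketched for an RBF expansion (support vectors and coefficients below are hypothetical, and the bias is omitted since its gradient is zero):

```python
import numpy as np

def svr_value(x, X_sv, coef, width):
    """f(x) = sum_i c_i exp(-||x - x^i||^2 / (2 width^2))."""
    k = np.exp(-np.sum((X_sv - x) ** 2, axis=1) / (2.0 * width ** 2))
    return coef @ k

def svr_grad(x, X_sv, coef, width):
    """Closed-form gradient: sum_i c_i k_i (x^i - x) / width^2."""
    diffs = X_sv - x
    k = np.exp(-np.sum(diffs ** 2, axis=1) / (2.0 * width ** 2))
    return (coef * k) @ diffs / width ** 2

X_sv = np.array([[0.0], [1.0]])
coef = np.array([1.0, -0.5])
x = np.array([0.3])
```

The analytic gradient matches a central finite difference of svr_value, which is what makes fast, derivative-based state estimation practical.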
Systematic assessment of the sensitivity to the choice of hyperparameters and choice of kernel.

[Figure: fit quality as a function of C for SVR with a polynomial kernel, and as a function of C and kernel width for SVR with an RBF kernel]
Kim, Shukla and Billard, IEEE Transactions on Robotics, 2014. Winner of the IEEE Transactions on Robotics King-Sun Fu Memorial Best Paper Award.
Shukla and Billard, NIPS 2012 - http://asvm.epfl.ch
Build a partition with a Support Vector Machine (SVM).
Build a partition with a Support Vector Machine (SVM): the attractors are not located at the right place.
Extend the SVM optimization framework with new constraints:
- maximize the classification margin;
- all points correctly classified;
- follow the dynamics;
- stability at the attractor.
[Figure: standard SVM with its SVs and a constant bias vs. the extended formulation with new β-SVs and a non-linear bias]