ADVANCED MACHINE LEARNING: Non-linear regression techniques


SLIDE 1

ADVANCED MACHINE LEARNING: Non-linear regression techniques

SLIDE 2

Regression: Principle

Map an N-dimensional input $x$ to a continuous output $y$. Learn a function of the type:

$$f: \mathbb{R}^N \to \mathbb{R}, \qquad y = f(x)$$

Estimate the $f$ that best predicts the set of training points $\{(x_i, y_i)\}_{i=1,\dots,M}$.

[Figure: training points $(x_1, y_1), \dots, (x_4, y_4)$, the true function, and its estimate.]
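To make the setting concrete, here is a minimal sketch of fitting a function to M training pairs. The data and the polynomial model are illustrative assumptions, not from the slides:

```python
import numpy as np

# M = 10 training pairs (x_i, y_i); x is 1-D here for simplicity.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(10)

# One possible estimate of f: a degree-3 polynomial fitted by least squares.
coeffs = np.polyfit(x, y, deg=3)
f_hat = np.poly1d(coeffs)

print(f_hat(0.25))  # prediction at a new input
```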

SLIDE 3

Regression: Issues

[Same setting and figure as Slide 2: estimate the $f$ that best predicts the training points $\{(x_i, y_i)\}_{i=1,\dots,M}$.]

The fit is strongly influenced by the choice of:

  • the datapoints used for training
  • the complexity of the model (interpolation)
SLIDE 4

Regression Algorithms in this Course

  • Support Vector Machine -> Support vector regression
  • Relevance Vector Machine -> Relevance vector regression
  • Boosting (random projections, random Gaussians) -> Gradient boosting
  • Random forest
  • Gaussian Process -> Gaussian process regression
  • Locally weighted projected regression

SLIDE 5

Today, we will see:

  • Support Vector Machine -> Support vector regression
  • Relevance Vector Machine -> Relevance vector regression

SLIDE 6

Support Vector Regression

SLIDE 7

Support Vector Regression

Assume a nonlinear mapping $f$, s.t. $y = f(x)$. How can we estimate $f$ to best predict the pairs of training points $\{(x_i, y_i)\}_{i=1,\dots,M}$?

How can the support vector machine framework for classification be generalized to estimate continuous functions?
1. Assume a non-linear mapping through feature space and then perform linear regression in feature space.
2. Supervised learning minimizes an error function: first determine a way to measure the error on the training set in the linear case!

SLIDE 8

Support Vector Regression

Assume a linear mapping $f$, s.t. $y = f(x) = \langle w, x \rangle + b$.

How can we estimate $w$ and $b$ to best predict the pairs of training points $\{(x_i, y_i)\}_{i=1,\dots,M}$? Measure the error on the prediction.

[Figure: datapoints in the $(x, y)$ plane with the linear fit $y = f(x) = \langle w, x \rangle + b$.]

SLIDE 9

Support Vector Regression

Set an upper bound $\varepsilon$ on the error and consider as correctly classified all points such that $|f(x) - y| \le \varepsilon$.

Penalize only the datapoints that are not contained in the $\varepsilon$-tube.

[Figure: the $\varepsilon$-tube around the linear fit $y = f(x) = \langle w, x \rangle + b$.]
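A minimal sketch of the ε-insensitive loss implied by this rule (the function name is my own, not from the slides): the loss is zero inside the tube and grows linearly outside it.

```python
import numpy as np

def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """|f(x) - y| clipped at eps: zero inside the tube, linear outside."""
    return np.maximum(np.abs(y_true - y_pred) - eps, 0.0)

# Points within eps of the fit contribute no loss:
print(eps_insensitive_loss(np.array([1.0, 1.5]), np.array([1.05, 1.0]), eps=0.1))
# -> [0.   0.4]
```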

SLIDE 10

Support Vector Regression

The $\varepsilon$-margin is a measure of the width of the $\varepsilon$-insensitive tube, and hence of the precision of the regression.

A small $\|w\|$ corresponds to a small slope for $f$: in the linear case, $f$ is more horizontal.

[Figure: a flat linear fit and its $\varepsilon$-margin in the $(x, y)$ plane.]

SLIDE 11

Support Vector Regression

A large $\|w\|$ corresponds to a large slope for $f$: in the linear case, $f$ is more vertical.

The flatter the slope of the function $f$, the larger the margin. To maximize the margin, we must minimize the norm of $w$.

[Figure: a steep linear fit and its $\varepsilon$-margin in the $(x, y)$ plane.]

SLIDE 12

Support Vector Regression

Consider as correctly classified all points such that $|f(x) - y| \le \varepsilon$. This can be rephrased as a constraint-based optimization problem over functions of the form $f(x) = \langle w, x \rangle + b$:

$$\min_{w,b} \; \frac{1}{2}\|w\|^2 \quad \text{subject to} \quad
\begin{cases}
y_i - \langle w, x_i \rangle - b \le \varepsilon \\
\langle w, x_i \rangle + b - y_i \le \varepsilon
\end{cases} \qquad i = 1, \dots, M$$

We still need to penalize points outside the $\varepsilon$-insensitive tube.

SLIDE 13

Support Vector Regression

To penalize points outside the $\varepsilon$-insensitive tube, introduce slack variables $\xi_i, \xi_i^* \ge 0$:

$$\min_{w,b,\xi,\xi^*} \; \frac{1}{2}\|w\|^2 + \frac{C}{M}\sum_{i=1}^{M}(\xi_i + \xi_i^*) \quad \text{subject to} \quad
\begin{cases}
y_i - \langle w, x_i \rangle - b \le \varepsilon + \xi_i \\
\langle w, x_i \rangle + b - y_i \le \varepsilon + \xi_i^* \\
\xi_i, \xi_i^* \ge 0
\end{cases} \qquad i = 1, \dots, M$$
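This primal can be handed directly to a generic convex solver. A minimal sketch with CVXPY, which is my choice of tool here, not the course's; the toy data, ε, and C values are illustrative assumptions:

```python
import cvxpy as cp
import numpy as np

# Toy 1-D training data (illustrative only).
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.1, 1.2, 1.9, 3.2])
M, N = X.shape
eps, C = 0.1, 10.0

w = cp.Variable(N)
b = cp.Variable()
xi = cp.Variable(M, nonneg=True)        # slack above the tube
xi_star = cp.Variable(M, nonneg=True)   # slack below the tube

objective = cp.Minimize(0.5 * cp.sum_squares(w) + (C / M) * cp.sum(xi + xi_star))
constraints = [
    y - X @ w - b <= eps + xi,
    X @ w + b - y <= eps + xi_star,
]
cp.Problem(objective, constraints).solve()
print(w.value, b.value)
```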

SLIDE 14

Support Vector Regression

All points outside the $\varepsilon$-tube (those with non-zero slack $\xi_i$ or $\xi_i^*$) become support vectors.

We now have the solution to the linear regression problem. How do we generalize this to the nonlinear case?

SLIDE 15

Support Vector Regression

We can solve this quadratic problem by introducing sets of Lagrange multipliers $\alpha_i, \alpha_i^*$ and writing the Lagrangian (Lagrangian = objective function + multipliers * constraints):

$$L(w, b, \xi, \xi^*) = \frac{1}{2}\|w\|^2 + \frac{C}{M}\sum_{i=1}^{M}\xi_i + \frac{C}{M}\sum_{i=1}^{M}\xi_i^* - \sum_{i=1}^{M}\alpha_i\left(\varepsilon + \xi_i - y_i + \langle w, x_i \rangle + b\right) - \sum_{i=1}^{M}\alpha_i^*\left(\varepsilon + \xi_i^* + y_i - \langle w, x_i \rangle - b\right)$$

SLIDE 16

Support Vector Regression

[Same Lagrangian as Slide 15.]

The multipliers $\alpha_i, \alpha_i^* \ge 0$ enforce the constraints on points lying on either side of the $\varepsilon$-tube.

$\alpha_i = \alpha_i^* = 0$ for all points that satisfy the constraints, i.e. the points inside the $\varepsilon$-tube.

SLIDE 17

Support Vector Regression

Requiring that the partial derivatives are all zero:

$$\frac{\partial L}{\partial w} = w - \sum_{i=1}^{M}(\alpha_i - \alpha_i^*)\,x_i = 0 \quad \Rightarrow \quad w = \sum_{i=1}^{M}(\alpha_i - \alpha_i^*)\,x_i$$

i.e. $w$ is a linear combination of the support vectors, and

$$\frac{\partial L}{\partial b} = \sum_{i=1}^{M}(\alpha_i^* - \alpha_i) = 0 \quad \Rightarrow \quad \sum_{i=1}^{M}\alpha_i = \sum_{i=1}^{M}\alpha_i^*$$

i.e. the effect of the support vectors on both sides of the $\varepsilon$-tube is rebalanced.

The solution is given by:

$$y = f(x) = \langle w, x \rangle + b = \sum_{i=1}^{M}(\alpha_i - \alpha_i^*)\,\langle x_i, x \rangle + b$$
slide-18
SLIDE 18

18 18

ADVANCED MACHINE LEARNING

Lift x into feature space and then perform linear regression in feature space.

     

 

 

Linear Case: , Non-Linear Case: , y f x w x b x x y f x w x b           w lives in feature space!

 

x x  

Support Vector Regression

SLIDE 19

Support Vector Regression

In feature space, we obtain the same constrained optimization problem:

$$\min_{w,b,\xi,\xi^*} \; \frac{1}{2}\|w\|^2 + \frac{C}{M}\sum_{i=1}^{M}(\xi_i + \xi_i^*) \quad \text{subject to} \quad
\begin{cases}
y_i - \langle w, \phi(x_i) \rangle - b \le \varepsilon + \xi_i \\
\langle w, \phi(x_i) \rangle + b - y_i \le \varepsilon + \xi_i^* \\
\xi_i, \xi_i^* \ge 0
\end{cases} \qquad i = 1, \dots, M$$

SLIDE 20

Support Vector Regression

Again we solve this quadratic problem by introducing sets of Lagrange multipliers $\alpha_i, \alpha_i^*$ and writing the Lagrangian (objective function + multipliers * constraints):

$$L(w, b, \xi, \xi^*) = \frac{1}{2}\|w\|^2 + \frac{C}{M}\sum_{i=1}^{M}(\xi_i + \xi_i^*) - \sum_{i=1}^{M}\alpha_i\left(\varepsilon + \xi_i - y_i + \langle w, \phi(x_i) \rangle + b\right) - \sum_{i=1}^{M}\alpha_i^*\left(\varepsilon + \xi_i^* + y_i - \langle w, \phi(x_i) \rangle - b\right)$$

SLIDE 21

Support Vector Regression

Replacing in the primal Lagrangian, we get the dual optimization problem:

$$\max_{\alpha, \alpha^*} \; -\frac{1}{2}\sum_{i,j=1}^{M}(\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\,k(x_i, x_j) - \varepsilon\sum_{i=1}^{M}(\alpha_i + \alpha_i^*) + \sum_{i=1}^{M} y_i(\alpha_i - \alpha_i^*)$$

$$\text{subject to} \quad \sum_{i=1}^{M}(\alpha_i - \alpha_i^*) = 0 \quad \text{and} \quad \alpha_i, \alpha_i^* \in \left[0, \tfrac{C}{M}\right]$$

Kernel trick: $k(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$.
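The kernel trick means the dual only ever needs evaluations of $k$; the feature map $\phi$ never appears explicitly. A minimal sketch of the RBF (Gaussian) kernel matrix used above (the width parameterization is one common convention, assumed here):

```python
import numpy as np

def rbf_kernel(X1, X2, width=0.1):
    """Gram matrix K[i, j] = exp(-||x1_i - x2_j||^2 / (2 * width^2))."""
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * width ** 2))

X = np.array([[0.0], [0.5], [1.0]])
K = rbf_kernel(X, X)   # 3x3 Gram matrix, ones on the diagonal
print(K)
```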

SLIDE 22

Support Vector Regression

The solution is given by:

$$y = f(x) = \sum_{i=1}^{M}(\alpha_i - \alpha_i^*)\,k(x_i, x) + b$$

The $(\alpha_i - \alpha_i^*)$ are linear coefficients (the Lagrange multipliers, one pair for each constraint). If one uses an RBF kernel, $f$ is a sum of $M$ un-normalized isotropic Gaussians centered on each training datapoint.
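Given multipliers obtained from the dual, the predictor is just a weighted sum of kernel evaluations plus the bias. A minimal self-contained sketch (the coefficient values are illustrative assumptions, not a solved problem):

```python
import numpy as np

def rbf(Xa, Xb, width=0.1):
    d2 = ((Xa[:, None, :] - Xb[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * width ** 2))

def svr_predict(X_train, coef, b, X_query, width=0.1):
    """f(x) = sum_i (alpha_i - alpha_i*) k(x_i, x) + b, with coef = alpha - alpha*."""
    return coef @ rbf(X_train, X_query, width) + b

X_train = np.array([[0.0], [0.5], [1.0]])
coef = np.array([1.5, -2.0, 0.5])   # alpha_i - alpha_i*: illustrative values
print(svr_predict(X_train, coef, b=0.2, X_query=np.array([[0.25]])))
```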

SLIDE 23

Support Vector Regression

The solution is given by:

$$y = f(x) = \sum_{i=1}^{M}(\alpha_i - \alpha_i^*)\,k(x_i, x) + b$$

The kernel places a Gaussian function on each SV.

[Figure: $y = f(x)$ built from Gaussian bumps centered on the support vectors.]

SLIDE 24

Support Vector Regression

The solution is given by:

$$y = f(x) = \sum_{i=1}^{M}(\alpha_i - \alpha_i^*)\,k(x_i, x) + b$$

The Lagrange multipliers define the importance of each Gaussian function.

[Figure: $y = f(x)$ over support vectors $x_1, \dots, x_6$ with multipliers $\alpha_1^* = 1.5$, $\alpha_2 = 2$, $\alpha_3^* = 1.5$, $\alpha_4 = 3$, $\alpha_5^* = 1$, $\alpha_6 = 2.5$; the curve converges to $b$ where the effect of the SVs vanishes.]

SLIDE 25

Support Vector Regression: Exercise I

SVR gives the following estimate for each pair of datapoints $(x_j, y_j)$, $j = 1, \dots, M$:

$$y_j = \sum_{i=1}^{M}(\alpha_i - \alpha_i^*)\,k(x_i, x_j) + b$$

For the set of 3 points drawn below:
a) Compute an estimate of $b$ using

$$b = \frac{1}{M}\sum_{j=1}^{M}\left(y_j - \sum_{i=1}^{M}(\alpha_i - \alpha_i^*)\,k(x_i, x_j)\right)$$

with an RBF kernel, and plot $b$.
b) Plot the regressive curve and show how it varies depending on the kernel width and $\varepsilon$.

SLIDE 26

Support Vector Regression: Exercise I Solution

a) To answer this question, you must first determine the number of SVs, which depends on $\varepsilon$. If we assume a small $\varepsilon$, all 3 points become SVs. $b$ is then influenced by the value of the kernel width. With a very small kernel width, $k(x_i, x_j) \approx 0$ for $i \neq j$, and then

$$b = \frac{1}{M}\sum_{j=1}^{M}\left(y_j - \sum_{i=1}^{M}(\alpha_i - \alpha_i^*)\,k(x_i, x_j)\right) \approx \frac{1}{M}\sum_{j=1}^{M} y_j$$

is the mean of the data. As the kernel width grows, $b$ is pulled toward the SVs, modulated by the kernel. With only 2 SVs, the influence of the SVs cancels out and we are back to the mean of the data.
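A minimal sketch of the estimate of $b$ from part a), assuming the multipliers are already known; the 3 points and multiplier values are illustrative:

```python
import numpy as np

def estimate_b(X, y, coef, width):
    """b = mean_j( y_j - sum_i coef_i * k(x_i, x_j) ), coef = alpha - alpha*."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-d2 / (2.0 * width ** 2))
    return np.mean(y - coef @ K)

X = np.array([[0.0], [0.5], [1.0]])   # 3 datapoints (illustrative)
y = np.array([0.2, 0.9, 0.1])
coef = np.array([0.4, 0.2, -0.6])     # illustrative; sums to zero, as the
                                      # dual constraint sum(alpha - alpha*) = 0 requires

# For a tiny width K ~ I, and since coef sums to zero, b ~ mean(y);
# larger widths pull b toward the SVs.
for width in (1e-3, 0.1, 0.5):
    print(width, estimate_b(X, y, coef, width))
```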

SLIDE 27

Support Vector Regression: Exercise I Solution

[Figures: kernel width = 0.001, ε = 0.1; kernel width = 0.001, ε = 0.24; kernel width = 0.1, ε = 0.1.]

With a small kernel width, the effect of each SV is well separated and the curve comes back to $b$ in-between two SVs. With a large kernel width, $b$ changes and the curve yields a smooth interpolation in-between SVs. With a large ε, one point is absorbed in the ε-tube and is no longer a SV.

SLIDE 28

Support Vector Regression: Exercise II

Recall the solution to SVR:

$$y = f(x) = \sum_{i=1}^{M}(\alpha_i - \alpha_i^*)\,k(x_i, x) + b$$

a) What type of function $f$ can you model with a homogeneous polynomial kernel?
b) What minimum order of homogeneous polynomial kernel do you need to achieve good regression on the set of 3 points below?

SLIDE 29

Support Vector Regression: Exercise II Solution

The equation for a homogeneous polynomial kernel in the 1-D case below is:

$$y = \sum_{i=1}^{M}(\alpha_i - \alpha_i^*)\,(x_i\,x)^p + b$$

i.e. a single polynomial term, scaled and shifted. For the set of points below, we need at minimum $p = 2$ and 2 SVs.

See also the supplementary exercises posted on the class's website!
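A minimal sketch of this 1-D homogeneous polynomial model (the SV positions and coefficients are illustrative assumptions): with $p = 2$ and two SVs, the model collapses to $y = c\,x^2 + b$, a scaled and shifted parabola.

```python
import numpy as np

def poly_svr_1d(x_sv, coef, b, p, x_query):
    """f(x) = sum_i coef_i * (x_i * x)^p + b  (homogeneous polynomial kernel)."""
    return sum(c * (xi * x_query) ** p for c, xi in zip(coef, x_sv)) + b

x_sv = np.array([-1.0, 1.0])   # two support vectors (illustrative)
coef = np.array([0.5, 0.7])    # alpha_i - alpha_i* (illustrative)
x = np.linspace(-2, 2, 5)
print(poly_svr_1d(x_sv, coef, b=0.1, p=2, x_query=x))  # 1.2*x^2 + 0.1, a parabola
```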

SLIDE 30

SVR: Hyperparameters

$$\min_{w,b,\xi,\xi^*} \; \frac{1}{2}\|w\|^2 + \frac{C}{M}\sum_{i=1}^{M}(\xi_i + \xi_i^*) \quad \text{subject to} \quad
\begin{cases}
y_i - \langle w, \phi(x_i) \rangle - b \le \varepsilon + \xi_i \\
\langle w, \phi(x_i) \rangle + b - y_i \le \varepsilon + \xi_i^* \\
\xi_i, \xi_i^* \ge 0
\end{cases}$$

The solution to SVR we just saw is referred to as ε-SVR. It has two hyperparameters:
  • $C$ controls the penalty term on a poor fit.
  • $\varepsilon$ determines the minimal required precision.
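These hyperparameters are exposed directly by off-the-shelf solvers. A minimal sketch with scikit-learn's SVR; the data and parameter values are illustrative, and note that sklearn parameterizes the RBF width through gamma (with the convention above, gamma = 1/(2·width²)):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.linspace(0, 1, 50)[:, None]
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(50)

# C: penalty on a poor fit; epsilon: width of the insensitive tube.
model = SVR(kernel="rbf", C=100.0, epsilon=0.1, gamma=50.0)
model.fit(X, y)
print(len(model.support_))      # number of support vectors
print(model.predict([[0.25]]))  # prediction at a new input
```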

SLIDE 31

SVR: Effect of Hyperparameters

Effect of the RBF kernel width on the fit. Here, fit using C = 100, ε = 0.1, kernel width = 0.01.

SLIDE 32

SVR: Effect of Hyperparameters

Effect of the RBF kernel width on the fit. Here, fit using C = 100, ε = 0.01, kernel width = 0.01. This leads to overfitting.

SLIDE 33

SVR: Effect of Hyperparameters

Effect of the RBF kernel width on the fit. Here, fit using C = 100, ε = 0.05, kernel width = 0.01. Choosing appropriate hyperparameters reduces the effect of the kernel width on the fit.

SLIDE 34

SVR: Effect of Hyperparameters

Note: MLDemos does not display the support vectors if there is more than one point for the same x!

SLIDE 35

SVR: Hyperparameters

[Recap of Slide 30: the ε-SVR optimization problem and its two hyperparameters, $C$ (penalty on a poor fit) and $\varepsilon$ (minimal required precision).]

SLIDE 36

Extensions of SVR

As in the classification case, the optimization framework used for support vector regression is extended with:
  • ν-SVR: yields a sparser version of SVR and relaxes the constraint of choosing ε, the width of the insensitive tube.
  • Relevance Vector Regression: the regression version of RVM, which also provides a sparser solution and offers a probabilistic interpretation of the result (see Tipping 2011, supplementary material to the class).

SLIDE 37

Support Vector Regression: ν-SVR

As the number of datapoints grows, so does the number of support vectors. ν-SVR puts a lower bound on the fraction of support vectors (see the analogous case for ν-SVM), with $\nu \in (0, 1]$.

SLIDE 38

Support Vector Regression: ν-SVR

As for ν-SVM, one can rewrite the problem as a convex optimization problem:

$$\min_{w,b,\xi,\xi^*,\varepsilon} \; \frac{1}{2}\|w\|^2 + C\left(\nu\varepsilon + \frac{1}{M}\sum_{j=1}^{M}(\xi_j + \xi_j^*)\right)$$

under the constraints

$$\langle w, x_j \rangle + b - y_j \le \varepsilon + \xi_j, \qquad y_j - \langle w, x_j \rangle - b \le \varepsilon + \xi_j^*, \qquad \xi_j, \xi_j^* \ge 0, \qquad \varepsilon \ge 0.$$

The margin errors are given by all the datapoints for which $\xi_j > 0$. $\nu$ is an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors.
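scikit-learn exposes this formulation as NuSVR, where ε is adapted automatically and ν trades off training errors against the number of support vectors. A minimal sketch; data and parameter values are illustrative:

```python
import numpy as np
from sklearn.svm import NuSVR

rng = np.random.default_rng(0)
X = np.linspace(0, 1, 50)[:, None]
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(50)

# nu lower-bounds the fraction of support vectors (and upper-bounds the
# fraction of margin errors); epsilon is found by the optimizer.
model = NuSVR(kernel="rbf", C=100.0, nu=0.2, gamma=50.0)
model.fit(X, y)
print(len(model.support_), "support vectors out of", len(X))
```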

SLIDE 39

ν-SVR: Example

Effect of the automatic adaptation of ε using ν-SVR.

SLIDE 40

ν-SVR: Example

Effect of the automatic adaptation of ε using ν-SVR, with added noise on the data.

SLIDE 41

Relevance Vector Regression (RVR)

Same principle as that described for RVM (see the slides on SVM and extensions). The derivation of the parameters however differs (see Tipping 2011 for details). To recall, we start from the solution of SVR and rewrite it as a linear combination over $M$ basis functions:

$$y(x) = f(x) = \sum_{i=1}^{M}\alpha_i\,k(x, x_i) + b = \alpha^T z(x), \qquad z(x) = \left[k(x, x_1), \dots, k(x, x_M), 1\right]^T$$

In the (binary) classification case, $y \in [0, 1]$; in the regression case, $y \in \mathbb{R}$.

A sparse solution has a majority of the entries of $\alpha = (\alpha_1, \dots, \alpha_M)^T$ equal to zero.
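A minimal sketch of the basis expansion $z(x)$ that RVR works with; the RBF kernel, data, and the sparse weight values are illustrative assumptions (the actual sparse α would come from the RVM evidence maximization, not shown here):

```python
import numpy as np

def design_vector(x, X_train, width=0.1):
    """z(x) = [k(x, x_1), ..., k(x, x_M), 1]: one basis function per datapoint."""
    d2 = ((X_train - x) ** 2).sum(axis=-1)
    return np.append(np.exp(-d2 / (2.0 * width ** 2)), 1.0)

X_train = np.array([[0.0], [0.5], [1.0]])
alpha = np.array([0.0, 1.3, 0.0, 0.2])   # sparse: most entries zero (last = bias)
x = np.array([0.45])
print(alpha @ design_vector(x, X_train))  # y(x) = alpha^T z(x)
```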


SLIDE 42

Comparison ε-SVR, ν-SVR, RVR

Solution with ε-SVR: RBF kernel, C = 3000, ε = 0.08, kernel width s = 0.05; 37 support vectors.

SLIDE 43

Comparison ε-SVR, ν-SVR, RVR

Solution with ν-SVR: RBF kernel, C = 3000, ν = 0.04, kernel width s = 0.001; 17 support vectors.

SLIDE 44

Comparison ε-SVR, ν-SVR, RVR

Solution with RVR: RBF kernel, ε = 0.08, kernel width s = 0.05; 7 relevance vectors.

SLIDE 45

SVR: Examples of Applications

SLIDE 46

Catching Objects in Flight

Extremely fast computation is required (the object flies for half a second), with re-estimation of the arm motion to adapt to noisy visual detection of the object.

SLIDE 47

Catching Objects in Flight

Learn a model of the translational and rotational motion of the object, which is grasped not at its center of mass.

SLIDE 48

Catching Objects in Flight

  • Gather demonstrations of the free-flying object
  • Build a model of the dynamics using Support Vector Regression (see the sketch after this list)
  • Compute the derivative (closed form)
  • Use the model in an Extended Kalman Filter for real-time tracking

The learned dynamics take the SVR form

$$\dot{x} = f(x) = \sum_{i=1}^{M}(\alpha_i - \alpha_i^*)\,k(x_i, x) + b$$

Precision: 1 cm, 1 degree; computation 0.17-0.32 seconds ahead of time.

Kim, S. and Billard, A. (2012) Estimating the non-linear dynamics of free-flying objects. Robotics and Autonomous Systems, Volume 60, Issue 9, pp. 1108-1122.
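A minimal sketch of learning such a dynamics model, assuming one ε-SVR per state dimension; the data shapes, the toy linear dynamics, and the parameters are illustrative, and the actual system in Kim and Billard (2012) is more involved:

```python
import numpy as np
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

# Demonstrations: states X (M x D) and measured state derivatives Xdot (M x D).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
Xdot = -0.5 * X + 0.05 * rng.standard_normal((200, 3))  # toy linear dynamics

# One epsilon-SVR per output dimension of the vector field x_dot = f(x).
model = MultiOutputRegressor(SVR(kernel="rbf", C=100.0, epsilon=0.01))
model.fit(X, Xdot)

# Predicted derivative at a new state, e.g. for an EKF propagation step.
print(model.predict(X[:1]))
```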

SLIDE 49

Catching Objects in Flight

Systematic assessment of the sensitivity to the choice of hyperparameters and the choice of kernel.

[Figures: fit quality as a function of C and kernel width, for SVR with a polynomial kernel and SVR with an RBF kernel.]
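Such a sensitivity assessment can be run as a grid search over hyperparameters. A minimal sketch with scikit-learn; the grid values and toy data are illustrative:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (100, 1))
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(100)

param_grid = {
    "C": [1.0, 100.0, 3000.0],
    "gamma": [1.0, 10.0, 100.0],     # inverse kernel width for the RBF kernel
    "epsilon": [0.01, 0.05, 0.1],
}
search = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=5,
                      scoring="neg_mean_squared_error")
search.fit(X, y)
print(search.best_params_, search.best_score_)
```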

SLIDE 50

Kim, Shukla and Billard, IEEE Transactions on Robotics, 2014; recipient of the IEEE Transactions on Robotics King-Sun Fu Memorial Best Paper Award.

SLIDE 51

[Same reference as Slide 50.]

SLIDE 52

Learning a Multi-Attractor System

Shukla and Billard, NIPS 2012 - http://asvm.epfl.ch

SLIDE 53

[Figure: crossing over between the two dynamics.]

SLIDE 54

Learning a Multi-Attractor System

Build a partition with a Support Vector Machine (SVM).

SLIDE 55

Learning a Multi-Attractor System

Build a partition with a Support Vector Machine (SVM): the attractors are not located at the right place.

SLIDE 56

Learning a Multi-Attractor System

Extend the SVM optimization framework with new constraints:
  • maximize the classification margin
  • all points correctly classified
  • follow the dynamics
  • stability at the attractors

Shukla and Billard, NIPS 2012 - http://asvm.epfl.ch

SLIDE 57

Learning a Multi-Attractor System

[Figure: the learned function $f(x)$. The standard SVM SVs come with a constant bias; the extended formulation adds new β-SVs that yield a non-linear bias.]

SLIDE 58

Several possible grasping points

SLIDE 59

SLIDE 60

Multiple Attractors System

SLIDE 61