MACHINE LEARNING: Linear and Weighted Regression, Support Vector Regression — PowerPoint PPT Presentation



SLIDE 1

APPLIED MACHINE LEARNING

Linear and Weighted Regression; Support Vector Regression

SLIDE 2

Classification (reminder)

Maps an N-dimensional input x ∈ ℝᴺ to discrete values y.
E.g.: x = [Length, Color] → "Banana" or "Apple"

[Figure: bananas and apples scattered in the Color–Length plane]

How can we estimate a continuous output value y?

SLIDE 3

Regression: introduction

Maps an N-dimensional input x ∈ ℝᴺ to continuous values y ∈ ℝ.
Input: income (GDP); output: a continuous value of life satisfaction.

[Figure: life satisfaction (3–7) vs. income, GDP 2003 (log scale, 10 000–40 000), with Bangladesh, India, Nigeria, Cambodia, China, Italy, Japan and the US]

SLIDE 4

Regression: introduction

Maps an N-dimensional input x ∈ ℝᴺ to continuous values y ∈ ℝ.
Input: income (GDP); output: a continuous value of life satisfaction.

Query point: Russia, GDP = 30 000 → estimated life satisfaction ≈ 6.5

[Figure: same plot as Slide 3, with the query point highlighted]

SLIDE 5

Example of Use of Regression Methods

Predict the number of diplomas that will be awarded over the next ten years across the two EPFs; the number of diplomas follows a non-linear curve as a function of time.

School of Engineering – Section Microtechnique © 2004 A. Billard – adapted from Blei 1999 and Dorr & Montz 2004

SLIDE 6

Example of Use of Regression Methods

Predict the velocity of the robot given its position: ẋ = f(x), with x* the target.

SLIDE 7

Example of Use of Regression Methods

SLIDE 8

Linear Regression

Linear regression seeks a linear mapping between input x and output y, parametrized by the slope vector w and the intercept b:

y = f(x; w, b) = wᵀx + b

[Figure: a fitted line in the (x, y) plane with intercept b]

SLIDE 9

Linear Regression

y = f(x; w, b) = wᵀx + b

One can omit the intercept by centering the data:

y' = y − ȳ and x' = x − x̄, with x̄, ȳ the means of x and y.

Then y' = wᵀx'; w is the least-square estimate on the centered data, and the intercept is recovered as b = ȳ − wᵀx̄.

SLIDE 10

Linear Regression

After centering, the model reduces to y = f(x; w) = wᵀx: a linear mapping between input x and output y, parametrized by the slope vector w alone.

SLIDE 11

Linear Regression

Pairs of training points X = [x¹ ... xᴹ] and Y = [y¹ ... yᴹ], with xⁱ ∈ ℝᴺ and yⁱ ∈ ℝ.

Find the optimal parameter w through least-square regression:

w* = argmin_w (1/2) Σᵢ₌₁..ᴹ (wᵀxⁱ − yⁱ)²

One finds an analytical solution through partial differentiation:

w* = (XᵀX)⁻¹ XᵀY
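The analytical solution above can be sketched in NumPy; the synthetic data below is purely illustrative (and centered, so the intercept is zero):

```python
import numpy as np

# Illustrative synthetic data: y = 2x + small noise, centered so b = 0
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 1))   # M x N matrix, rows are samples
Y = 2.0 * X[:, 0] + 0.01 * rng.standard_normal(50)

# Closed-form least-squares solution: w* = (X^T X)^(-1) X^T Y
w_star = np.linalg.inv(X.T @ X) @ X.T @ Y
```

In practice `np.linalg.lstsq(X, Y)` is preferred over the explicit inverse for numerical stability.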

SLIDE 12

Weighted Linear Regression

Regression through weighted least square. In standard linear regression, all points have equal weight; ŷ denotes the estimator.

w* = argmin_w (1/2) Σᵢ₌₁..ᴹ γⁱ (wᵀxⁱ − yⁱ)², with weights γ¹, ..., γᴹ

[Figure: standard linear regression fit in the (x, y) plane]

SLIDE 13

Weighted Linear Regression

Regression through weighted least square:

w* = argmin_w (1/2) Σᵢ₌₁..ᴹ γⁱ (wᵀxⁱ − yⁱ)²

[Figure: the fit ŷ is pulled toward the red points, which have large weights]

SLIDE 14

Weighted Linear Regression

Regression through weighted least square:

w* = argmin_w (1/2) Σᵢ₌₁..ᴹ γⁱ (wᵀxⁱ − yⁱ)²

[Figure: a different weighting of the red points yields a different fit ŷ]

SLIDE 15

Weighted Linear Regression

Assuming a set of weights γⁱ for all datapoints, let B be the diagonal matrix with entries √γ¹, ..., √γᴹ. Change of variables: Z = BX and v = BY. Minimizing the MSE, one gets an estimator at the query point x:

ŷ = xᵀw* = xᵀ(ZᵀZ)⁻¹Zᵀv

Contrast this with the solution for un-weighted linear regression:

w* = (XᵀX)⁻¹XᵀY
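The change of variables above can be sketched directly in NumPy (the synthetic data and weights are illustrative):

```python
import numpy as np

# Illustrative data: y = 1.5x + noise; the first 10 points get large weights
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(60, 1))
Y = 1.5 * X[:, 0] + 0.05 * rng.standard_normal(60)

gamma = np.ones(60)
gamma[:10] = 25.0

# B diagonal with entries sqrt(gamma); change of variables Z = BX, v = BY
B = np.diag(np.sqrt(gamma))
Z, v = B @ X, B @ Y

# Weighted least-squares estimate: w* = (Z^T Z)^(-1) Z^T v
w_star = np.linalg.inv(Z.T @ Z) @ Z.T @ v
```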

SLIDE 16

Limitations of Linear Regression

Regression through weighted least square,

w* = argmin_w (1/2) Σᵢ₌₁..ᴹ γⁱ (wᵀxⁱ − yⁱ)², with γⁱ constant weights,

assumes that a single linear dependency applies everywhere. This does not hold for data sets with local dependencies.

[Figure: data in the (x, y) plane with locally varying trends, poorly fit by a single line]

SLIDE 17

Limitations of Linear Regression

Regression through weighted least square with constant weights assumes that a single linear dependency applies everywhere, which does not hold for data sets with local dependencies. It would hence be useful to design a regression method that best estimates the linear dependencies locally.

SLIDE 18

Locally Weighted Regression

The estimate at a query point x is determined through the local influence of each group of datapoints:

ŷ(x) = Σᵢ₌₁..ᴹ γⁱ(x) yⁱ / Σⱼ₌₁..ᴹ γʲ(x), with γⁱ(x): weights, a function of x

γⁱ(x) = K(d(xⁱ, x)) = e^(−d(xⁱ, x)), with d(xⁱ, x) a distance between xⁱ and the query point x.

[Figure: kernels centered near the query point x in the (x, y) plane]

SLIDE 19

Locally Weighted Regression

The estimate at a query point x is determined through the local influence of each group of datapoints, with weights γⁱ(x) = K(d(xⁱ, x)) = e^(−d(xⁱ, x)):

ŷ(x) = Σᵢ₌₁..ᴹ γⁱ(x) yⁱ / Σⱼ₌₁..ᴹ γʲ(x)

This generates a smooth function ŷ(x).

SLIDE 20

Locally Weighted Regression

ŷ(x) = Σᵢ₌₁..ᴹ γⁱ(x) yⁱ / Σⱼ₌₁..ᴹ γʲ(x), with γⁱ(x) = K(d(xⁱ, x))

Model-free regression! There is no longer an explicit model of the form y = wᵀx. The regression is computed at each query point and depends on the training points.
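A minimal sketch of the model-free estimate above; the bandwidth `h` of the exponential kernel is an assumed parameter, not fixed by the slides:

```python
import numpy as np

def lwr_estimate(x_query, X, Y, h=0.01):
    """Locally weighted estimate y_hat(x) = sum_i gamma_i(x) y_i / sum_j gamma_j(x),
    with gamma_i(x) = exp(-d(x_i, x)) and d the squared distance scaled by h."""
    gamma = np.exp(-((X - x_query) ** 2) / h)   # weights, a function of x
    return np.sum(gamma * Y) / np.sum(gamma)

# Noiseless sine data: the local estimate tracks the true function
X = np.linspace(0.0, np.pi, 200)
Y = np.sin(X)
y_hat = lwr_estimate(np.pi / 2, X, Y)
```

No parameters are fitted: each query evaluates the weights over the full training set, which is exactly why the cost grows with the number of datapoints.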

SLIDE 21

Locally Weighted Regression

The estimate ŷ(x) = Σᵢ₌₁..ᴹ γⁱ(x) yⁱ / Σⱼ₌₁..ᴹ γʲ(x), with γⁱ(x) = K(d(x, xⁱ)), is the optimal solution to a local cost function at the query point x:

Ĵ(x) = min_ŷ Σᵢ₌₁..ᴹ (ŷ − yⁱ)² K(d(x, xⁱ))

Setting the derivative of Ĵ with respect to ŷ to zero recovers the kernel-weighted average above.

SLIDE 22

Locally Weighted Regression

The estimate is determined through the local influence of each group of datapoints, with weights γⁱ(x) = K(d(x, xⁱ)).

Which training points? Which kernel?

SLIDE 23

Exercise Session Part I

SLIDE 24

Data-driven Regression

Good prediction depends on the choice of datapoints.

[Figure: blue, the true function; red, the estimated function]

SLIDE 25

Data-driven Regression

Good prediction depends on the choice of datapoints. The more datapoints, the better the fit, but the computational cost increases dramatically with the number of datapoints.

[Figure: blue, the true function; red, the estimated function]

SLIDE 26

Data-driven Regression

Several methods in ML perform non-linear regression; they differ in the objective function and in the number of parameters.

Gaussian Process Regression (GPR) uses all datapoints (model-free).

Note: Gaussian Process Regression is not covered in class and not examined in the final exam.

[Figure: blue, the true function; red, the estimated function]

SLIDE 27

Data-driven Regression

Several methods in ML perform non-linear regression; they differ in the objective function and in the number of parameters.

Gaussian Process Regression (GPR) uses all datapoints (model-free). Support Vector Regression (SVR) picks a subset of datapoints (the support vectors).

SLIDE 28

Data-driven Regression

Several methods in ML perform non-linear regression; they differ in the objective function and in the number of parameters.

Gaussian Process Regression (GPR) uses all datapoints (model-free). Support Vector Regression (SVR) picks a subset of datapoints (the support vectors). Gaussian Mixture Regression (GMR) generates a new set of datapoints (the centers of Gaussian functions).

SLIDE 29

Data-driven Regression

An estimate of the noise is important to measure goodness of fit.

SLIDE 30

Data-driven Regression

An estimate of the noise is important to measure goodness of fit. Support Vector Regression (SVR) assumes an estimate of the noise model (the ε-tube) and then computes f directly within that noise tolerance.

SLIDE 31

Data-driven Regression

An estimate of the noise is important to measure goodness of fit. Gaussian Mixture Regression (GMR) builds a local estimate of the noise model through the variance of the system.

SLIDE 32

Support Vector Regression

SLIDE 33

Support Vector Regression

Assume a non-linear mapping f, s.t. y = f(x). How do we estimate f to best predict the pairs of training points {xⁱ, yⁱ}ᵢ₌₁,...,ᴹ?

How can the support vector machine framework for classification be generalized to estimate continuous functions?
1. Assume a non-linear mapping through feature space, and then perform linear regression in feature space.
2. Supervised learning minimizes an error function → first determine a way to measure the error on the testing set in the linear case.

SLIDE 34

Support Vector Regression

Assume a linear mapping f, s.t. y = f(x) = wᵀx + b.

How do we estimate w and b to best predict the pairs of training points {xⁱ, yⁱ}ᵢ₌₁,...,ᴹ? We first need a way to measure the error on the prediction. b is estimated, as in SVM, through least-square regression on the support vectors; hence we ignore it for the rest of the development.

[Figure: the line y = f(x) = wᵀx + b through the data]

SLIDE 35

Support Vector Regression

Set an upper bound ε on the error and consider as correctly classified all points such that |f(x) − y| ≤ ε. Penalize only datapoints that are not contained in the ε-tube.

[Figure: the ε-tube around y = f(x) = wᵀx + b]

SLIDE 36

Support Vector Regression

The ε-margin is a measure of the width of the ε-insensitive tube, and hence of the precision of the regression. A small ‖w‖ corresponds to a small slope for f; in the linear case, f is more horizontal.

SLIDE 37

Support Vector Regression

A large ‖w‖ corresponds to a large slope for f; in the linear case, f is more vertical. The flatter the slope of f, the larger the ε-margin. To maximize the margin, we must therefore minimize the norm of w.

SLIDE 38

Support Vector Regression

This can be rephrased as a constraint-based optimization problem of the form:

minimize (1/2)‖w‖²
subject to (⟨w, xⁱ⟩ + b) − yⁱ ≤ ε and yⁱ − (⟨w, xⁱ⟩ + b) ≤ ε, for i = 1, ..., M

We still need to penalize points outside the ε-insensitive tube.

SLIDE 39

Support Vector Regression

We need to penalize points outside the ε-insensitive tube. Introduce slack variables ξⁱ, ξ*ⁱ ≥ 0:

minimize (1/2)‖w‖² + (C/M) Σᵢ₌₁..ᴹ (ξⁱ + ξ*ⁱ)
subject to (⟨w, xⁱ⟩ + b) − yⁱ ≤ ε + ξⁱ, yⁱ − (⟨w, xⁱ⟩ + b) ≤ ε + ξ*ⁱ, ξⁱ, ξ*ⁱ ≥ 0

SLIDE 40

Support Vector Regression

All points outside the ε-tube become support vectors (their slack variables ξⁱ, ξ*ⁱ are non-zero).

minimize (1/2)‖w‖² + (C/M) Σᵢ₌₁..ᴹ (ξⁱ + ξ*ⁱ)
subject to (⟨w, xⁱ⟩ + b) − yⁱ ≤ ε + ξⁱ, yⁱ − (⟨w, xⁱ⟩ + b) ≤ ε + ξ*ⁱ, ξⁱ, ξ*ⁱ ≥ 0

We now have the solution to the linear regression problem. How do we generalize this to the non-linear case?

SLIDE 41

Support Vector Regression

Lift x into feature space and then perform linear regression in feature space, x → φ(x):

Linear case: y = f(x) = ⟨w, x⟩ + b
Non-linear case: y = f(x) = ⟨w, φ(x)⟩ + b, where w now lives in feature space!

SLIDE 42

Support Vector Regression

In feature space, we obtain the same constrained optimization problem:

minimize (1/2)‖w‖² + (C/M) Σᵢ₌₁..ᴹ (ξⁱ + ξ*ⁱ)
subject to (⟨w, φ(xⁱ)⟩ + b) − yⁱ ≤ ε + ξⁱ, yⁱ − (⟨w, φ(xⁱ)⟩ + b) ≤ ε + ξ*ⁱ, ξⁱ, ξ*ⁱ ≥ 0

SLIDE 43

Support Vector Regression

Again, we can solve this quadratic problem by introducing sets of Lagrange multipliers αⁱ, α*ⁱ ≥ 0 and writing the Lagrangian (Lagrangian = objective function + multipliers × constraints):

L(w, b, ξ, ξ*, α, α*) = (1/2)‖w‖² + (C/M) Σᵢ₌₁..ᴹ (ξⁱ + ξ*ⁱ)
  − Σᵢ₌₁..ᴹ αⁱ (ε + ξⁱ − yⁱ + ⟨w, φ(xⁱ)⟩ + b)
  − Σᵢ₌₁..ᴹ α*ⁱ (ε + ξ*ⁱ + yⁱ − ⟨w, φ(xⁱ)⟩ − b)

SLIDE 44

Support Vector Regression

The multipliers αⁱ and α*ⁱ enforce the constraints on points lying on either side of the ε-tube; αⁱ, α*ⁱ ≠ 0 only for the points that do not satisfy the constraints (the points outside the ε-tube).

SLIDE 45

Support Vector Regression

Requiring that the partial derivatives of L are all zero:

∂L/∂w = w − Σᵢ₌₁..ᴹ (αⁱ − α*ⁱ) φ(xⁱ) = 0  ⇒  w = Σᵢ₌₁..ᴹ (αⁱ − α*ⁱ) φ(xⁱ)

→ w is a linear combination of the support vectors.

∂L/∂b = Σᵢ₌₁..ᴹ (α*ⁱ − αⁱ) = 0  ⇒  Σᵢ₌₁..ᴹ αⁱ = Σᵢ₌₁..ᴹ α*ⁱ

→ rebalancing the effect of the support vectors on both sides of the ε-tube. Recall that αⁱ, α*ⁱ ≠ 0 only for points that do not satisfy the constraints (points outside the ε-tube).

SLIDE 46

Support Vector Regression

Replacing these in the primal Lagrangian, we get the dual optimization problem:

maximize over α, α*:
  −(1/2) Σᵢ,ⱼ₌₁..ᴹ (αⁱ − α*ⁱ)(αʲ − α*ʲ) ⟨φ(xⁱ), φ(xʲ)⟩ − ε Σᵢ₌₁..ᴹ (αⁱ + α*ⁱ) + Σᵢ₌₁..ᴹ yⁱ (αⁱ − α*ⁱ)
subject to Σᵢ₌₁..ᴹ (αⁱ − α*ⁱ) = 0 and αⁱ, α*ⁱ ∈ [0, C/M]

Kernel trick: k(xⁱ, xʲ) = ⟨φ(xⁱ), φ(xʲ)⟩
slide-47
SLIDE 47

APPLIED MACHINE LEARNING

50

Support Vector Regression

The solution is given by:

 

   

* 1

,

i

M i i i

y f x k x x b  

   

Linear Coefficients (Lagrange multipliers for each constraint). If one uses RBF Kernel, M un-normalized isotropic Gaussians centered on each training datapoint.

 

   

 

* 1

, ,

i

M i i i

y f x w x b x x b    

     

     

, ,

i j i j

k x x x x   

Kernel Trick
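The solution can be sketched as a prediction function. The support-vector locations and the coefficients (αⁱ − α*ⁱ) below are hypothetical placeholders (a QP solver would normally supply them), and the RBF width is an assumed parameter:

```python
import numpy as np

def rbf(x_i, x, width=0.1):
    """Un-normalized isotropic Gaussian kernel k(x_i, x)."""
    return np.exp(-((x_i - x) ** 2) / (2.0 * width ** 2))

def svr_predict(x, sv_x, coef, b, width=0.1):
    """SVR solution f(x) = sum_i (alpha_i - alpha*_i) k(x_i, x) + b."""
    return sum(c * rbf(xi, x, width) for xi, c in zip(sv_x, coef)) + b

# Hypothetical support vectors and multiplier differences (alpha_i - alpha*_i)
sv_x = np.array([1.0, 2.0, 3.0])
coef = np.array([-1.5, 2.0, 1.0])
b = 0.5

y_near = svr_predict(2.0, sv_x, coef, b)    # dominated by the SV at x = 2
y_far = svr_predict(10.0, sv_x, coef, b)    # all kernels vanish, f(x) -> b
```

Far from all support vectors the Gaussians vanish, so f(x) converges to the intercept b.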

SLIDE 48

Support Vector Regression

The solution is given by:

y = f(x) = Σᵢ₌₁..ᴹ (αⁱ − α*ⁱ) k(xⁱ, x) + b

The kernel places a Gaussian function on each support vector.

[Figure: Gaussian bumps centered on the support vectors in the (x, y) plane]

SLIDE 49

Support Vector Regression

The solution is given by:

y = f(x) = Σᵢ₌₁..ᴹ (αⁱ − α*ⁱ) k(xⁱ, x) + b

The Lagrange multipliers define the importance of each Gaussian function; f(x) converges to b where the effect of the support vectors vanishes.

[Figure: y = f(x) built from six support vectors x¹, ..., x⁶, e.g. with multipliers α*¹ = 1.5, α² = 2, α*³ = 1.5, α⁴ = 3, α*⁵ = 1, α⁶ = 2.5]

SLIDE 50

Support Vector Regression

SVR gives the following estimate for each pair of datapoints (xʲ, yʲ), j = 1, ..., M:

yʲ = Σᵢ₌₁..ᴹ (αⁱ − α*ⁱ) k(xⁱ, xʲ) + b

An estimate of b can hence be computed by averaging over the training set:

b = (1/M) Σⱼ₌₁..ᴹ ( yʲ − Σᵢ₌₁..ᴹ (αⁱ − α*ⁱ) k(xⁱ, xʲ) )
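A sketch of this averaging, with hypothetical multipliers and targets constructed to be consistent with a known intercept:

```python
import numpy as np

def rbf(a, c, width=0.5):
    """Un-normalized isotropic Gaussian kernel (width is an assumed parameter)."""
    return np.exp(-((a - c) ** 2) / (2.0 * width ** 2))

# Hypothetical support vectors and multiplier differences (alpha_i - alpha*_i)
X = np.array([0.0, 1.0, 2.0, 3.0])
coef = np.array([0.5, -1.0, 1.2, 0.3])
b_true = 0.7

# Targets built to satisfy y_j = sum_i coef_i k(x_i, x_j) + b_true exactly
Y = np.array([sum(c * rbf(xi, xj) for xi, c in zip(X, coef)) + b_true
              for xj in X])

# b = (1/M) sum_j ( y_j - sum_i (alpha_i - alpha*_i) k(x_i, x_j) )
b_hat = np.mean([yj - sum(c * rbf(xi, xj) for xi, c in zip(X, coef))
                 for xj, yj in zip(X, Y)])
```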

SLIDE 51

εSVR: Hyperparameters

minimize (1/2)‖w‖² + (C/M) Σᵢ₌₁..ᴹ (ξⁱ + ξ*ⁱ)
subject to (⟨w, φ(xⁱ)⟩ + b) − yⁱ ≤ ε + ξⁱ, yⁱ − (⟨w, φ(xⁱ)⟩ + b) ≤ ε + ξ*ⁱ, ξⁱ, ξ*ⁱ ≥ 0

The solution to SVR we just saw is referred to as εSVR. It has two hyperparameters: C controls the penalty term on a poor fit, and ε determines the minimal required precision.
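As an illustration, εSVR as implemented in scikit-learn (whose C is not divided by M as in the formulation above, so values are not directly comparable; the data and hyperparameter values are illustrative):

```python
import numpy as np
from sklearn.svm import SVR   # scikit-learn's epsilon-SVR

# Noisy sine data
rng = np.random.default_rng(0)
X = np.linspace(0.0, 2.0 * np.pi, 100).reshape(-1, 1)
y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(100)

# C penalizes points outside the tube; epsilon is the tube half-width
model = SVR(kernel="rbf", C=100.0, epsilon=0.1, gamma=1.0)
model.fit(X, y)

n_sv = len(model.support_)   # only points on/outside the tube remain as SVs
max_err = np.max(np.abs(model.predict(X) - np.sin(X[:, 0])))
```

With ε larger than most of the noise, many points fall inside the tube and never become support vectors; shrinking ε increases both the number of support vectors and the risk of fitting the noise.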

SLIDE 52

εSVR: Effect of Hyperparameters

Effect of the RBF kernel width on the fit. Here the fit uses C = 100, ε = 0.1, kernel width = 0.01.

SLIDE 53

εSVR: Effect of Hyperparameters

Effect of the RBF kernel width on the fit. Here the fit uses C = 100, ε = 0.01, kernel width = 0.01 → overfitting.

SLIDE 54

εSVR: Effect of Hyperparameters

Effect of the RBF kernel width on the fit. Here the fit uses C = 100, ε = 0.05, kernel width = 0.01. Choosing appropriate hyperparameters reduces the effect of the kernel width on the fit.

SLIDE 55

εSVR: Effect of Hyperparameters

Note: MLDemos does not display the support vectors if there is more than one point for the same x.

SLIDE 56

Summary

Linear regression can be solved through least-mean-square estimation and yields an optimal analytical solution. Weighted regression offers the possibility to perform a local regression and also yields an optimal analytical solution; the estimate is no longer global and is computed around each group of datapoints. Support Vector Regression performs regression on a non-linear function and automatically determines the important points; its estimate is globally optimal.

SLIDE 57

Examples of Applications of SVR (next)

SLIDE 58

Catching Objects in Flight

Model the object's dynamics: build a model of the dynamics ẋ = f(x) using Support Vector Regression, compute its derivative in closed form, and use the model in an Extended Kalman Filter for real-time tracking.

SLIDE 59

Application of SVR: Mapping Eyes to Gaze

  • Designed for children from 1 year of age
  • The fruit of 3 years of development
  • 2 cameras, 2 microphones, 1 mirror
  • 96° × 96° field of view, 25 Hz / 50 Hz, 180 g

Lorenzo Piccardi, Jean-Baptiste Keller, Martin Duvanel, Olivier Barbey, Karim Benmachiche, Dario Poggiali, Dave Bergomi, Basilio Noris

www.pomelo-technologies.com

SLIDE 60

Application of SVR: Mapping Eyes to Gaze

B. Noris, J.-B. Keller and A. Billard. A Wearable Gaze Tracking System for Children in Unconstrained Environments. International Journal of Computer Vision and Image Understanding, 2011.

SLIDE 61

Application of SVR: Mapping Eyes to Gaze

[Figure: the 96° × 96° field of view]

SLIDE 62

Application of SVR: Mapping Eyes to Gaze

Use Support Vector Regression to learn the mapping from the appearance of the eyes to gaze coordinates.

SLIDE 63

Application of SVR: Mapping Eyes to Gaze

We collect images of the eyes and directions of the gaze, and normalize the images through high-pass filtering. We then learn the mapping eye image → position in the image through Support Vector Regression (SVR).

SLIDE 64

Application of SVR: Mapping Eyes to Gaze

Different elements give different cues for the Support Vector Regression: the pupil, iris and cornea; the wrinkles, eyelids and eyelashes.

SLIDE 65

Application of SVR: Mapping Eyes to Gaze

SLIDE 66

From object recognition using eye tracking to reconstructing a customer's path in a shop.

www.pomelo-technologies.com

SLIDE 67

Monitoring Consumers' Visual Behavior

Gaze tracking using SVR; object detection using SVM.

SLIDE 68

MSc Projects in Industry