Sparse Kernel Machines - RVM
Henrik I. Christensen, Robotics & Intelligent Machines @ GT


SLIDE 1

Introduction Regression Model RVM for classification Summary

Sparse Kernel Machines - RVM

Henrik I. Christensen

Robotics & Intelligent Machines @ GT Georgia Institute of Technology, Atlanta, GA 30332-0280 hic@cc.gatech.edu

Henrik I. Christensen (RIM@GT) Relevance Vector Machines 1 / 22

SLIDE 2

Outline

1. Introduction
2. Regression Model
3. RVM for classification
4. Summary

SLIDE 3

Introduction

We discussed memory-based methods earlier.
Sparse methods are directed at memory-based systems with a minimum (but representative) set of training samples.
Last time we talked about support vector machines; they come with a few challenges, e.g. multi-class classification.
What if we could be more Bayesian in our formulation?

SLIDE 4

Outline

1. Introduction
2. Regression Model
3. RVM for classification
4. Summary

SLIDE 5

Regression model

We have seen continuous / Bayesian regression models before:

p(t|x, w, β) = N(t | y(x), β⁻¹)

We have the linear model for fusion of data:

y(x) = Σ_{i=1}^{N} wᵢ φᵢ(x) = wᵀφ(x)

A relevance vector formulation would then be:

y(x) = Σ_{i=1}^{N} wᵢ k(x, xᵢ) + b
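To make the kernel expansion concrete, here is a minimal sketch in Python; the Gaussian kernel choice and all names are illustrative assumptions, not from the slides:

```python
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    # Gaussian (RBF) kernel k(x, z) = exp(-gamma * ||x - z||^2); an assumed choice
    return np.exp(-gamma * np.sum((x - z) ** 2))

def rvm_predict(x, X_train, w, b=0.0, gamma=1.0):
    # y(x) = sum_{i=1}^{N} w_i k(x, x_i) + b -- one basis function per training point
    return sum(w[i] * rbf_kernel(x, X_train[i], gamma) for i in range(len(w))) + b

# With a single training point, w = [1] and b = 0: y(x_1) = k(x_1, x_1) = 1
X_train = np.array([[0.0]])
print(rvm_predict(np.array([0.0]), X_train, np.array([1.0])))  # 1.0
```

Each training point xᵢ contributes one basis function k(x, xᵢ); the sparsity discussed later comes from most of the wᵢ being driven to zero.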

SLIDE 6

The collective model

Consider N observation vectors collected in a data matrix X, where row i is the data vector xᵢ. With the corresponding target vector t = (t₁, t₂, ..., t_N)ᵀ the likelihood is

p(t|X, w, β) = Π_{i=1}^{N} p(tᵢ|xᵢ, w, β⁻¹)

If we consider the weights to be zero-mean Gaussian we have

p(w|α) = Π_{i=0}^{N} N(wᵢ|0, αᵢ⁻¹)

i.e. we have a different uncertainty/precision αᵢ for each factor.

SLIDE 7

More shuffling

Reorganizing using the results from linear regression we get

p(w|t, X, α, β) = N(w|m, Σ)

where

m = βΣΦᵀt
Σ = (A + βΦᵀΦ)⁻¹

Here Φ is the design matrix and A = diag(αᵢ). In many cases the design matrix is the same as the Gram matrix, i.e. Φᵢⱼ = k(xᵢ, xⱼ).
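The two moment formulas translate directly into code; a sketch under the assumption that Φ, t, α and β are given as NumPy arrays:

```python
import numpy as np

def rvm_posterior(Phi, t, alpha, beta):
    # p(w | t, X, alpha, beta) = N(w | m, Sigma) with
    #   Sigma = (A + beta * Phi^T Phi)^{-1},  A = diag(alpha_i)
    #   m     = beta * Sigma * Phi^T * t
    Sigma = np.linalg.inv(np.diag(alpha) + beta * Phi.T @ Phi)
    m = beta * Sigma @ Phi.T @ t
    return m, Sigma
```

Note that a very large αᵢ makes the prior dominate and drives mᵢ toward zero; this is the pruning mechanism exploited later.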

SLIDE 8

Estimation of α and β

Using maximum likelihood we can derive estimates for α and β. We can integrate out w:

p(t|X, α, β) = ∫ p(t|X, w, β) p(w|α) dw

The log likelihood is then

ln p(t|X, α, β) = ln N(t|0, C) = −½ [ N ln(2π) + ln|C| + tᵀC⁻¹t ]

where

C = β⁻¹I + ΦA⁻¹Φᵀ

SLIDE 9

Re-estimation of α and β

We can then re-estimate α and β from

αᵢ^new = γᵢ / mᵢ²

(β^new)⁻¹ = ||t − Φm||² / (N − Σᵢ γᵢ)

where the γᵢ measure how well-determined each weight is:

γᵢ = 1 − αᵢΣᵢᵢ

For some basis functions γᵢ goes to zero and the corresponding αᵢ goes to infinity, i.e. the prior precision becomes very large and the weight is pinned at zero. In the sense of an SVM, that training sample becomes irrelevant; the remaining samples are the relevance vectors.
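The alternating re-estimation above can be sketched as follows (initial values and the fixed iteration count are assumptions; a production implementation would prune diverging αᵢ and test for convergence):

```python
import numpy as np

def rvm_fit(Phi, t, n_iter=200, alpha0=1.0, beta0=1.0):
    # Alternate the posterior (m, Sigma) with the updates
    #   alpha_i^new     = gamma_i / m_i^2
    #   (beta^new)^{-1} = ||t - Phi m||^2 / (N - sum_i gamma_i)
    # where gamma_i = 1 - alpha_i * Sigma_ii.
    N, M = Phi.shape
    alpha = np.full(M, alpha0)
    beta = beta0
    for _ in range(n_iter):
        Sigma = np.linalg.inv(np.diag(alpha) + beta * Phi.T @ Phi)
        m = beta * Sigma @ Phi.T @ t
        gamma = 1.0 - alpha * np.diag(Sigma)
        alpha = gamma / np.maximum(m ** 2, 1e-12)   # guard against division by zero
        beta = (N - gamma.sum()) / np.sum((t - Phi @ m) ** 2)
    return m, Sigma, alpha, beta
```

On data generated from a linear model with two active basis functions, both αᵢ settle at finite values and m recovers the generating weights; basis functions whose α diverges would instead be dropped.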

SLIDE 10

Regression for new data

Once the hyperparameters have been estimated, regression for new data can be performed:

p(t|x, X, t, α*, β*) = N(t | mᵀφ(x), σ²(x))

where

σ²(x) = (β*)⁻¹ + φ(x)ᵀΣφ(x)
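The predictive moments are a direct transcription; m, Σ and β* are assumed to come from the training stage:

```python
import numpy as np

def rvm_predictive(phi_x, m, Sigma, beta):
    # Predictive distribution N(t | m^T phi(x), sigma^2(x)) with
    #   sigma^2(x) = 1/beta + phi(x)^T Sigma phi(x)
    mean = m @ phi_x
    var = 1.0 / beta + phi_x @ Sigma @ phi_x
    return mean, var

# e.g. with Sigma = I, beta = 4 and phi(x) = (1, 0):
mean, var = rvm_predictive(np.array([1.0, 0.0]), np.array([2.0, 3.0]), np.eye(2), 4.0)
print(mean, var)  # 2.0 1.25
```

The variance has a noise term 1/β plus a term reflecting the remaining uncertainty in the weights.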

SLIDE 11

Illustrative example

(Figure: illustrative regression example, target t plotted against input x. The plot itself did not survive the transcript; only the axis labels remain.)

SLIDE 12

Status

Relevance vectors are similar in style to support vectors, but defined within a Bayesian framework.
Training requires inversion of an (N + 1) × (N + 1) matrix, which can be (very) costly.
In general the resulting set of vectors is much smaller.
The basis functions should be chosen carefully for the training, i.e. analyze your data to fully understand what is going on.
The criterion function is no longer a quadratic optimization problem, and convexity is not guaranteed.

SLIDE 13

Analysis of sparsity

There is a more efficient way to estimate the parameters, i.e. brute force is not always optimal.
The iterative estimation of α poses a challenge, but does suggest an alternative. Consider a rewrite of the C matrix:

C = β⁻¹I + Σ_{j≠i} αⱼ⁻¹ φⱼφⱼᵀ + αᵢ⁻¹ φᵢφᵢᵀ = C₋ᵢ + αᵢ⁻¹ φᵢφᵢᵀ

i.e. we have made the contribution of the i-th term explicit. Standard linear algebra (the matrix determinant lemma and the Sherman-Morrison identity) allows us to rewrite

det(C) = |C| = |C₋ᵢ| (1 + αᵢ⁻¹ φᵢᵀ C₋ᵢ⁻¹ φᵢ)

C⁻¹ = C₋ᵢ⁻¹ − (C₋ᵢ⁻¹ φᵢ φᵢᵀ C₋ᵢ⁻¹) / (αᵢ + φᵢᵀ C₋ᵢ⁻¹ φᵢ)

SLIDE 14

The separated log likelihood

This allows us to rewrite the log likelihood as

L(α) = L(α₋ᵢ) + λ(αᵢ)

The contribution of αᵢ is then

λ(αᵢ) = ½ [ ln αᵢ − ln(αᵢ + sᵢ) + qᵢ² / (αᵢ + sᵢ) ]

Here we have the complete dependency on αᵢ. We have used

sᵢ = φᵢᵀ C₋ᵢ⁻¹ φᵢ
qᵢ = φᵢᵀ C₋ᵢ⁻¹ t

sᵢ is known as the sparsity and qᵢ as the quality of φᵢ.

SLIDE 15

Evaluation for stationary conditions

It can be shown (see Bishop, pp. 351-352) that if qᵢ² > sᵢ there is a stable solution

αᵢ = sᵢ² / (qᵢ² − sᵢ)

Otherwise αᵢ goes to infinity, i.e. the basis function is irrelevant.
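The stationarity test is easy to state in code; a sketch with assumed names:

```python
import numpy as np

def sparsity_quality(phi_i, C_minus_i, t):
    # s_i = phi_i^T C_{-i}^{-1} phi_i  (sparsity)
    # q_i = phi_i^T C_{-i}^{-1} t      (quality)
    Cinv = np.linalg.inv(C_minus_i)
    return phi_i @ Cinv @ phi_i, phi_i @ Cinv @ t

def alpha_stationary(s, q):
    # Finite stationary point exists only when q^2 > s; otherwise prune (alpha -> inf)
    return s ** 2 / (q ** 2 - s) if q ** 2 > s else np.inf

print(alpha_stationary(1.0, 2.0))  # 0.3333... -> basis function kept
print(alpha_stationary(2.0, 1.0))  # inf       -> basis function pruned
```

This is the core decision step of the sequential sparse estimation: a basis function survives only when its quality outweighs its sparsity.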

SLIDE 16

Status

There are efficient (non-recursive) ways to evaluate the parameters. The relative complexity is still significant.

SLIDE 17

Outline

1. Introduction
2. Regression Model
3. RVM for classification
4. Summary

SLIDE 18

Relevance vectors for classification

For classification we can apply the same framework. Consider the two-class problem with binary targets t ∈ {0, 1}; the model then has the form

y(x) = σ(wᵀφ(x))

where σ(·) is the logistic sigmoid function. Closed-form integration is no longer an option. We can use the Laplace approach to estimate the mode of the posterior, which in turn allows re-estimation of the precisions α; with the new α we re-estimate the mode, and so on until convergence. The process is similar to regression (see book).
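A sketch of the inner Laplace step for fixed α, using Newton (IRLS) iterations; the outer re-estimation of α is omitted and all names and the iteration count are assumptions:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def laplace_mode(Phi, t, alpha, n_iter=25):
    # Find the mode of p(w | t, alpha) for the logistic model y = sigmoid(Phi w)
    # with independent N(0, 1/alpha_i) priors, then return the Gaussian
    # (Laplace) approximation N(w_map, Sigma) around it.
    A = np.diag(alpha)
    w = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        y = sigmoid(Phi @ w)
        grad = Phi.T @ (t - y) - A @ w              # gradient of the log posterior
        H = Phi.T @ np.diag(y * (1 - y)) @ Phi + A  # negative Hessian
        w = w + np.linalg.solve(H, grad)            # Newton step
    return w, np.linalg.inv(H)

# toy 1D two-class problem: phi(x) = (1, x)
x = np.array([-2.0, -1.0, 1.0, 2.0])
Phi = np.column_stack([np.ones_like(x), x])
t = np.array([0.0, 0.0, 1.0, 1.0])
w_map, Sigma = laplace_mode(Phi, t, alpha=np.array([1.0, 1.0]))
print(w_map[1] > 0)  # True: a positive slope separates the two classes
```

The prior term A keeps the Hessian positive definite, so the Newton iteration is well behaved even for separable data.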

SLIDE 19

Synthetic example

(Figure: synthetic two-class classification example. The plot did not survive the transcript; only the axis ticks −2 and 2 remain.)

SLIDE 20

Outline

1. Introduction
2. Regression Model
3. RVM for classification
4. Summary

SLIDE 21

Summary

A Bayesian approach to the definition of a sparse model.
The model is more comprehensive, but also comes with more assumptions.
Creates a sparser model with 'similar' performance.
Training can be slow, especially for large data sets.
Execution is faster due to the sparser model.
Selection of basis functions for relevance vectors can pose a challenge.

SLIDE 22

Projects

Halfway through the course! Covered the basics.
Next Monday & Wednesday: IROS-09.
Next Friday: update on projects.

What is your problem? What is your approach? How will you train the system? How will you evaluate performance?
