

SLIDE 1

Relevance Vector Machines

Jukka Lankinen

LUT

February 21, 2011

SLIDE 2

Outline

◮ Introduction
  ◮ Support Vector Machines
◮ Relevance Vector Machines
  ◮ Model / Regression
  ◮ Marginal Likelihood
  ◮ Classification
◮ Examples
  ◮ Regression
  ◮ Classification
◮ Summary
  ◮ Relevance vector machines
  ◮ Exercise

SLIDE 3

Introduction

◮ The relevance vector machine (RVM) is a Bayesian sparse kernel technique for regression and classification.
◮ It solves several problems of the support vector machine (SVM).
◮ Used in detection and classification tasks, e.g. detecting cancer cells and classifying DNA sequences.

SLIDE 4

Support Vector Machines (SVM)

◮ A non-probabilistic decision machine: it returns a point estimate for regression and a binary decision for classification.
◮ Makes decisions based on the function (sketched in code below):

$y(x; w) = \sum_{i=1}^{N} w_i K(x, x_i) + w_0$ (1)

◮ where $K$ is the kernel function and $w_0$ is the bias.
◮ Attempts to minimize the error while simultaneously maximizing the margin between the two classes.
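As a minimal illustration of Eq. (1), the Python sketch below evaluates the decision function with an RBF kernel; the 'support vectors', weights, and bias are dummy values rather than the output of SVM training.

```python
# Evaluating y(x; w) = sum_i w_i K(x, x_i) + w_0 from Eq. (1).
# The RBF kernel and all parameter values are illustrative assumptions.
import numpy as np

def rbf_kernel(x, xi, gamma=1.0):
    """K(x, x_i) = exp(-gamma * ||x - x_i||^2)."""
    return np.exp(-gamma * np.sum((x - xi) ** 2))

def decision(x, X_sv, w, w0, gamma=1.0):
    """Point estimate y(x; w); sign(y) gives the binary class."""
    return sum(wi * rbf_kernel(x, xi, gamma) for wi, xi in zip(w, X_sv)) + w0

X_sv = np.array([[0.0], [1.0], [2.0]])   # dummy 'support vectors'
w = np.array([0.5, -1.0, 0.5])           # dummy weights
print(decision(np.array([0.5]), X_sv, w, w0=0.1))
```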

SLIDE 5

Support Vector Machines (SVM)

[Figure: SVM decision boundary (y = 0), margin boundaries (y = 1 and y = −1), and the margin between the two classes.]

SLIDE 6

SVM Problems

◮ The number of required support vectors typically grows linearly with the size of the training set.
◮ Predictions are non-probabilistic.
◮ Requires estimation of error/margin trade-off parameters.
◮ The kernel $K(x, x_i)$ must satisfy Mercer's condition.

SLIDE 7

Relevance Vector Machines

◮ Applies a Bayesian treatment to a model of the same functional form as the SVM.
◮ Associates a prior over the model weights, governed by a set of hyperparameters.
◮ The posterior distributions of the majority of weights are peaked around zero; the training vectors associated with the non-zero weights are the 'relevance vectors'.
◮ Typically utilizes fewer kernel functions than the SVM.

SLIDE 8

The model

◮ For a given data set of input-target pairs $\{x_n, t_n\}_{n=1}^{N}$,

$t_n = y(x_n; w) + \epsilon_n$ (2)

◮ where the $\epsilon_n$ are samples from a noise process assumed to be zero-mean Gaussian with variance $\sigma^2$ (see the sampling sketch below). Thus,

$p(t_n | x_n) = \mathcal{N}(t_n | y(x_n; w), \sigma^2)$ (3)
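A small sketch of this observation model in code; the sinc target function anticipates the later regression example, and the noise level is an illustrative assumption:

```python
# t_n = y(x_n; w) + eps_n with eps_n ~ N(0, sigma^2), Eqs. (2)-(3).
import numpy as np

rng = np.random.default_rng(0)
N, sigma = 100, 0.1                  # sample size and noise level (assumed)
x = np.linspace(-10, 10, N)
y = np.sinc(x / np.pi)               # np.sinc(u) = sin(pi u)/(pi u), so this is sin(x)/x
t = y + rng.normal(0.0, sigma, N)    # noisy targets
```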

SLIDE 9

The model (cont.)

◮ Encode sparsity in the prior (see the sampling sketch below):

$p(w | \alpha) = \prod_{i=0}^{N} \mathcal{N}(w_i | 0, \alpha_i^{-1})$ (4)

◮ which is Gaussian, but conditioned on $\alpha$.
◮ We must define hyperpriors over all $\alpha_m$ to complete the specification of the hierarchical prior:

$p(w_m) = \int p(w_m | \alpha_m) \, p(\alpha_m) \, d\alpha_m$ (5)
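To make the effect of Eq. (4) concrete, the sketch below draws weights from the conditional prior for hand-picked precisions; a large $\alpha_i$ effectively prunes $w_i$:

```python
# w_i ~ N(0, alpha_i^{-1}): the per-weight prior of Eq. (4).
# The alpha values here are illustrative, not learned.
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([1e-2, 1.0, 1e6])           # per-weight precisions
w = rng.normal(0.0, 1.0 / np.sqrt(alpha))    # one draw from the prior
print(w)   # the third weight is pinned near zero by its large alpha_i
```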

SLIDE 10

Regression

◮ The model has independent Gaussian noise: $t_n \sim \mathcal{N}(y(x_n; w), \sigma^2)$.
◮ Corresponding likelihood:

$p(t | w, \sigma^2) = (2\pi\sigma^2)^{-N/2} \exp\!\left(-\frac{1}{2\sigma^2} \|t - \Phi w\|^2\right)$ (6)

◮ where $t = (t_1, \ldots, t_N)^T$, $w = (w_1, \ldots, w_M)^T$, and $\Phi$ is the $N \times M$ 'design' matrix with $\Phi_{nm} = \phi_m(x_n)$ (construction sketched below).
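For a kernel model the basis functions are $\phi_0(x) = 1$ (bias) and $\phi_m(x) = K(x, x_m)$, giving an $N \times (N+1)$ design matrix. A sketch, with an RBF kernel standing in for whatever kernel is actually used:

```python
# Phi[n, m] = phi_m(x_n): column 0 is the bias, column m >= 1 is the
# kernel evaluated against training point x_m.
import numpy as np

def design_matrix(X, kernel):
    N = len(X)
    Phi = np.ones((N, N + 1))                  # bias column phi_0(x) = 1
    for n in range(N):
        for m in range(N):
            Phi[n, m + 1] = kernel(X[n], X[m])
    return Phi

rbf = lambda a, b: np.exp(-np.sum((a - b) ** 2))   # placeholder kernel
X = np.linspace(-1, 1, 5).reshape(-1, 1)
print(design_matrix(X, rbf).shape)                 # (5, 6)
```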

SLIDE 11

The model (cont.)

◮ The desired posterior over all unknowns:

$p(w, \alpha, \sigma^2 | t) = \dfrac{p(t | w, \alpha, \sigma^2) \, p(w, \alpha, \sigma^2)}{p(t)}$ (7)

◮ Given a new test point $x_*$, predictions are made for the corresponding target $t_*$ in terms of the predictive distribution:

$p(t_* | t) = \int p(t_* | w, \alpha, \sigma^2) \, p(w, \alpha, \sigma^2 | t) \, dw \, d\alpha \, d\sigma^2$ (8)

◮ But there is a problem here: these computations cannot be performed analytically. Approximations are needed.

SLIDE 12

The model (cont.)

◮ We decompose the posterior as:

$p(w, \alpha, \sigma^2 | t) = p(w | t, \alpha, \sigma^2) \, p(\alpha, \sigma^2 | t)$ (9)

◮ The posterior distribution over the weights is then:

$p(w | t, \alpha, \sigma^2) = \dfrac{p(t | w, \sigma^2) \, p(w | \alpha)}{p(t | \alpha, \sigma^2)} = \mathcal{N}(w | \mu, \Sigma)$ (10)

◮ where, writing $A = \mathrm{diag}(\alpha_0, \alpha_1, \ldots, \alpha_N)$ (these two formulas are transcribed into code below):

$\Sigma = (\sigma^{-2} \Phi^T \Phi + A)^{-1}$ (11)
$\mu = \sigma^{-2} \Sigma \Phi^T t$ (12)
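Eqs. (11)-(12) translate directly into code. A sketch, assuming $\Phi$, $t$, $\alpha$, and $\sigma^2$ are given (for instance from the design-matrix sketch above):

```python
# Posterior over the weights, Eqs. (10)-(12): N(w | mu, Sigma).
import numpy as np

def weight_posterior(Phi, t, alpha, sigma2):
    A = np.diag(alpha)                                # A = diag(alpha_0, ..., alpha_N)
    Sigma = np.linalg.inv(Phi.T @ Phi / sigma2 + A)   # Eq. (11)
    mu = Sigma @ Phi.T @ t / sigma2                   # Eq. (12)
    return mu, Sigma
```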

SLIDE 13

Marginal Likelihood

◮ The marginal likelihood can be written as

$p(t | \alpha, \sigma^2) = \int p(t | w, \sigma^2) \, p(w | \alpha) \, dw$ (13)

◮ Maximizing the marginal likelihood function is known as the type-II maximum likelihood method.
◮ We must optimize $p(t | \alpha, \sigma^2)$. There are a few ways to do this (one direct computation is sketched below).
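For the Gaussian case the integral in Eq. (13) has a closed form, $p(t | \alpha, \sigma^2) = \mathcal{N}(t | 0, C)$ with $C = \sigma^2 I + \Phi A^{-1} \Phi^T$ (Tipping, 2001). A sketch of the log marginal likelihood:

```python
# log p(t | alpha, sigma^2) for the Gaussian model, Eq. (13).
import numpy as np

def log_marginal_likelihood(Phi, t, alpha, sigma2):
    N = len(t)
    C = sigma2 * np.eye(N) + Phi @ np.diag(1.0 / alpha) @ Phi.T
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (N * np.log(2 * np.pi) + logdet + t @ np.linalg.solve(C, t))
```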

SLIDE 14

Marginal Likelihood optimization

◮ Eq. (13) is maximized by iterative re-estimation.
◮ Differentiating $\log p(t | \alpha, \sigma^2)$ gives the re-estimation updates (a full training loop is sketched below):

$\alpha_i^{\mathrm{new}} = \dfrac{\gamma_i}{\mu_i^2}$ (14)

$(\sigma^2)^{\mathrm{new}} = \dfrac{\|t - \Phi\mu\|^2}{N - \sum_{i=1}^{M} \gamma_i}$ (15)

◮ where we have defined $\gamma_i = 1 - \alpha_i \Sigma_{ii}$; $\gamma_i$ is a measure of how 'well-determined' the parameter $w_i$ is.
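Putting Eqs. (11)-(15) together gives the whole training loop. A compact sketch (the initial values are assumptions; a practical implementation would also prune basis functions whose $\alpha_i$ diverge and monitor convergence):

```python
# Iterative re-estimation of alpha and sigma^2, Eqs. (14)-(15),
# alternated with the weight posterior of Eqs. (11)-(12).
import numpy as np

def weight_posterior(Phi, t, alpha, sigma2):
    Sigma = np.linalg.inv(Phi.T @ Phi / sigma2 + np.diag(alpha))
    mu = Sigma @ Phi.T @ t / sigma2
    return mu, Sigma

def rvm_regression(Phi, t, n_iter=100):
    M = Phi.shape[1]
    alpha = np.ones(M)                         # initial precisions (assumed)
    sigma2 = 0.1 * np.var(t)                   # initial noise variance (assumed)
    for _ in range(n_iter):
        mu, Sigma = weight_posterior(Phi, t, alpha, sigma2)
        gamma = 1.0 - alpha * np.diag(Sigma)   # gamma_i = 1 - alpha_i * Sigma_ii
        alpha = gamma / (mu ** 2 + 1e-12)      # Eq. (14)
        sigma2 = np.sum((t - Phi @ mu) ** 2) / (len(t) - gamma.sum())  # Eq. (15)
    return mu, alpha, sigma2
```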

SLIDE 15

RVMs for classification

◮ The likelihood $P(t | w)$ is now Bernoulli (its log form is sketched below):

$P(t | w) = \prod_{n=1}^{N} g\{y(x_n; w)\}^{t_n} \, [1 - g\{y(x_n; w)\}]^{1 - t_n}$ (16)

◮ with $g(y) = 1/(1 + e^{-y})$ the sigmoid function.
◮ There is no noise variance; the same sparse prior as in regression is used.
◮ Unlike in regression, the weight posterior $p(w | t, \alpha)$ cannot be obtained analytically. Approximations are once again needed.
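A sketch of Eq. (16) through its logarithm (the form actually used in optimization); here $y = \Phi w$ gives the model outputs and $t$ holds 0/1 labels:

```python
# Bernoulli log-likelihood log P(t | w) for the classification RVM.
import numpy as np

def sigmoid(y):
    """g(y) = 1 / (1 + exp(-y))."""
    return 1.0 / (1.0 + np.exp(-y))

def log_likelihood(w, Phi, t):
    p = sigmoid(Phi @ w)                    # g{y(x_n; w)} for every n
    return np.sum(t * np.log(p) + (1 - t) * np.log(1 - p))
```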

SLIDE 16

Gaussian posterior approximation

◮ Find the posterior mode $w_{MP}$ for the current values of $\alpha$ by optimization.
◮ Compute the Hessian at the mode.
◮ Negate and invert it to give the covariance of a Gaussian approximation $p(w | t, \alpha) \approx \mathcal{N}(w_{MP}, \Sigma)$.
◮ The $\alpha$ are updated using $\mu$ and $\Sigma$, as in the regression case (the whole step is sketched below).
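A sketch of this Laplace step: Newton iterations locate the mode $w_{MP}$ of the log-posterior, and the negated, inverted Hessian supplies the covariance. The iteration count is illustrative; Tipping (2001) uses iteratively reweighted least squares for the same purpose.

```python
# Laplace approximation p(w | t, alpha) ~ N(w_MP, Sigma) via Newton steps.
import numpy as np

def laplace_approximation(Phi, t, alpha, n_iter=25):
    w = np.zeros(Phi.shape[1])
    A = np.diag(alpha)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Phi @ w))       # sigmoid outputs
        grad = Phi.T @ (t - p) - A @ w           # gradient of the log-posterior
        B = np.diag(p * (1.0 - p))
        H = -(Phi.T @ B @ Phi + A)               # Hessian of the log-posterior
        w = w - np.linalg.solve(H, grad)         # Newton update toward w_MP
    p = 1.0 / (1.0 + np.exp(-Phi @ w))
    B = np.diag(p * (1.0 - p))
    Sigma = np.linalg.inv(Phi.T @ B @ Phi + A)   # negate and invert the Hessian
    return w, Sigma                              # w_MP and the covariance
```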

SLIDE 17

RVM Regression Example

◮ 'sinc' function: $\mathrm{sinc}(x) = \sin(x)/x$
◮ Linear spline kernel:

$K(x_m, x_n) = 1 + x_m x_n + x_m x_n \min(x_m, x_n) - \dfrac{x_m + x_n}{2}\min(x_m, x_n)^2 + \dfrac{\min(x_m, x_n)^3}{3}$

◮ with $\epsilon = 0.01$ and 100 uniformly-spaced, noise-free samples (setup sketched below).
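A sketch of this setup: the spline kernel transcribed from the formula above plus the noise-free training data. Combined with the rvm_regression() sketch earlier, this should reproduce a fit of the kind reported by Tipping (2001); the input range is an assumption.

```python
# Linear spline kernel and the 100-sample noise-free sinc data.
import numpy as np

def spline_kernel(xm, xn):
    lo = min(xm, xn)
    return (1.0 + xm * xn + xm * xn * lo
            - 0.5 * (xm + xn) * lo ** 2 + lo ** 3 / 3.0)

x = np.linspace(-10, 10, 100)          # 100 uniform samples (assumed range)
t = np.sinc(x / np.pi)                 # noise-free sinc(x) = sin(x)/x
Phi = np.array([[spline_kernel(a, b) for b in x] for a in x])
```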

SLIDE 18

RVM Regression Example

[Figure: RVM fit to the noise-free 'sinc' data.]

SLIDE 19

RVM Regression Example

[Figure: RVM regression example, continued.]

SLIDE 20

RVM Classification Example

◮ Ripley's synthetic data
◮ Gaussian kernel (sketched below): $K(x_m, x_n) = \exp(-r^{-2}\|x_m - x_n\|^2)$
◮ with $r = 0.5$
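A sketch of this kernel with $r = 0.5$; Ripley's synthetic data set is not reproduced here, so random points stand in for it:

```python
# Gaussian kernel K(x_m, x_n) = exp(-r^{-2} ||x_m - x_n||^2), r = 0.5.
import numpy as np

def gaussian_kernel(xm, xn, r=0.5):
    return np.exp(-np.sum((xm - xn) ** 2) / r ** 2)

X = np.random.default_rng(0).normal(size=(10, 2))   # stand-in for Ripley's data
K = np.array([[gaussian_kernel(a, b) for b in X] for a in X])
print(K.shape)   # (10, 10) kernel matrix
```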

SLIDE 21

RVM Classification Example

[Figure: RVM classification of Ripley's synthetic data.]

SLIDE 22

Summary

◮ Sparsity: the prediction for new inputs depends on the kernel function evaluated at a subset of the training data points.
◮ TODO
◮ A more detailed explanation is given in the original publication: Tipping, M., "Sparse Bayesian Learning and the Relevance Vector Machine", Journal of Machine Learning Research 1, 2001, pp. 211-244.

SLIDE 23

Exercise

◮ Fetch Tipping's MATLAB toolbox for sparse Bayesian learning from http://www.vectoranomaly.com/downloads/downloads.htm.
◮ Try SparseBayesDemo.m with different likelihood models (Gaussian, Bernoulli, ...) and familiarize yourself with the toolbox.
◮ Try to replicate the results from the regression example.
