

SLIDE 1

Relevance Vector Machines

Jukka Lankinen

LUT

February 21, 2011

SLIDE 2

Outline

◮ Introduction
  ◮ Support Vector Machines
◮ Relevance Vector Machines
  ◮ Model / Regression
  ◮ Marginal Likelihood
  ◮ Classification
◮ Examples
  ◮ Regression
  ◮ Classification
◮ Summary
  ◮ Relevance vector machines
  ◮ Exercise

SLIDE 3

Introduction

◮ The relevance vector machine (RVM) is a Bayesian sparse kernel technique for regression and classification.
◮ It solves several problems of the support vector machine (SVM).
◮ Used in detection and classification tasks, e.g. detecting cancer cells and classifying DNA sequences.

SLIDE 4

Support Vector Machines (SVM)

◮ A non-probabilistic decision machine: it returns a point estimate for regression and a binary decision for classification.
◮ Makes decisions based on the function (sketched in code below):

$y(x; w) = \sum_{i=1}^{N} w_i K(x, x_i) + w_0$ (1)

◮ where $K$ is the kernel function and $w_0$ is the bias.
◮ Attempts to minimize the error while simultaneously maximizing the margin between the two classes.
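As a minimal illustration of Eq. (1), the Python sketch below evaluates the decision function with an RBF kernel; the 'support vectors', weights, and bias are dummy values rather than the output of SVM training.

```python
# Evaluating y(x; w) = sum_i w_i K(x, x_i) + w_0 from Eq. (1).
# The RBF kernel and all parameter values are illustrative assumptions.
import numpy as np

def rbf_kernel(x, xi, gamma=1.0):
    """K(x, x_i) = exp(-gamma * ||x - x_i||^2)."""
    return np.exp(-gamma * np.sum((x - xi) ** 2))

def decision(x, X_sv, w, w0, gamma=1.0):
    """Point estimate y(x; w); sign(y) gives the binary class."""
    return sum(wi * rbf_kernel(x, xi, gamma) for wi, xi in zip(w, X_sv)) + w0

X_sv = np.array([[0.0], [1.0], [2.0]])   # dummy 'support vectors'
w = np.array([0.5, -1.0, 0.5])           # dummy weights
print(decision(np.array([0.5]), X_sv, w, w0=0.1))
```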

SLIDE 5

Support Vector Machines (SVM)

[Figure: SVM decision boundary (y = 0), margin boundaries (y = 1 and y = −1), and the margin between the two classes.]

SLIDE 6

SVM Problems

◮ The number of required support vectors typically grows linearly with the size of the training set.
◮ Predictions are non-probabilistic.
◮ Requires estimation of error/margin trade-off parameters.
◮ The kernel $K(x, x_i)$ must satisfy Mercer's condition.

SLIDE 7

Relevance Vector Machines

◮ Applies a Bayesian treatment to a model of the same functional form as the SVM.
◮ Associates a prior over the model weights, governed by a set of hyperparameters.
◮ The posterior distributions of the majority of weights are peaked around zero; the training vectors associated with the non-zero weights are the 'relevance vectors'.
◮ Typically utilizes fewer kernel functions than the SVM.

SLIDE 8

The model

◮ For a given data set of input-target pairs $\{x_n, t_n\}_{n=1}^{N}$,

$t_n = y(x_n; w) + \epsilon_n$ (2)

◮ where the $\epsilon_n$ are samples from a noise process assumed to be zero-mean Gaussian with variance $\sigma^2$ (see the sampling sketch below). Thus,

$p(t_n | x_n) = \mathcal{N}(t_n | y(x_n; w), \sigma^2)$ (3)
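A small sketch of this observation model in code; the sinc target function anticipates the later regression example, and the noise level is an illustrative assumption:

```python
# t_n = y(x_n; w) + eps_n with eps_n ~ N(0, sigma^2), Eqs. (2)-(3).
import numpy as np

rng = np.random.default_rng(0)
N, sigma = 100, 0.1                  # sample size and noise level (assumed)
x = np.linspace(-10, 10, N)
y = np.sinc(x / np.pi)               # np.sinc(u) = sin(pi u)/(pi u), so this is sin(x)/x
t = y + rng.normal(0.0, sigma, N)    # noisy targets
```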

SLIDE 9

The model (cont.)

◮ Encode sparsity in the prior (see the sampling sketch below):

$p(w | \alpha) = \prod_{i=0}^{N} \mathcal{N}(w_i | 0, \alpha_i^{-1})$ (4)

◮ which is Gaussian, but conditioned on $\alpha$.
◮ We must define hyperpriors over all $\alpha_m$ to complete the specification of the hierarchical prior:

$p(w_m) = \int p(w_m | \alpha_m) \, p(\alpha_m) \, d\alpha_m$ (5)
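To make the effect of Eq. (4) concrete, the sketch below draws weights from the conditional prior for hand-picked precisions; a large $\alpha_i$ effectively prunes $w_i$:

```python
# w_i ~ N(0, alpha_i^{-1}): the per-weight prior of Eq. (4).
# The alpha values here are illustrative, not learned.
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([1e-2, 1.0, 1e6])           # per-weight precisions
w = rng.normal(0.0, 1.0 / np.sqrt(alpha))    # one draw from the prior
print(w)   # the third weight is pinned near zero by its large alpha_i
```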

SLIDE 10

Regression

◮ The model has independent Gaussian noise: $t_n \sim \mathcal{N}(y(x_n; w), \sigma^2)$.
◮ Corresponding likelihood:

$p(t | w, \sigma^2) = (2\pi\sigma^2)^{-N/2} \exp\!\left(-\frac{1}{2\sigma^2} \|t - \Phi w\|^2\right)$ (6)

◮ where $t = (t_1, \ldots, t_N)^T$, $w = (w_1, \ldots, w_M)^T$, and $\Phi$ is the $N \times M$ 'design' matrix with $\Phi_{nm} = \phi_m(x_n)$ (construction sketched below).
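For a kernel model the basis functions are $\phi_0(x) = 1$ (bias) and $\phi_m(x) = K(x, x_m)$, giving an $N \times (N+1)$ design matrix. A sketch, with an RBF kernel standing in for whatever kernel is actually used:

```python
# Phi[n, m] = phi_m(x_n): column 0 is the bias, column m >= 1 is the
# kernel evaluated against training point x_m.
import numpy as np

def design_matrix(X, kernel):
    N = len(X)
    Phi = np.ones((N, N + 1))                  # bias column phi_0(x) = 1
    for n in range(N):
        for m in range(N):
            Phi[n, m + 1] = kernel(X[n], X[m])
    return Phi

rbf = lambda a, b: np.exp(-np.sum((a - b) ** 2))   # placeholder kernel
X = np.linspace(-1, 1, 5).reshape(-1, 1)
print(design_matrix(X, rbf).shape)                 # (5, 6)
```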

SLIDE 11

The model (cont.)

◮ The desired posterior over all unknowns:

$p(w, \alpha, \sigma^2 | t) = \dfrac{p(t | w, \alpha, \sigma^2) \, p(w, \alpha, \sigma^2)}{p(t)}$ (7)

◮ Given a new test point $x_*$, predictions are made for the corresponding target $t_*$ in terms of the predictive distribution:

$p(t_* | t) = \int p(t_* | w, \alpha, \sigma^2) \, p(w, \alpha, \sigma^2 | t) \, dw \, d\alpha \, d\sigma^2$ (8)

◮ But there is a problem here: these computations cannot be performed analytically. Approximations are needed.

SLIDE 12

The model (cont.)

◮ We decompose the posterior as:

$p(w, \alpha, \sigma^2 | t) = p(w | t, \alpha, \sigma^2) \, p(\alpha, \sigma^2 | t)$ (9)

◮ The posterior distribution over the weights is then:

$p(w | t, \alpha, \sigma^2) = \dfrac{p(t | w, \sigma^2) \, p(w | \alpha)}{p(t | \alpha, \sigma^2)} = \mathcal{N}(w | \mu, \Sigma)$ (10)

◮ where, writing $A = \mathrm{diag}(\alpha_0, \alpha_1, \ldots, \alpha_N)$ (these two formulas are transcribed into code below):

$\Sigma = (\sigma^{-2} \Phi^T \Phi + A)^{-1}$ (11)
$\mu = \sigma^{-2} \Sigma \Phi^T t$ (12)
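Eqs. (11)-(12) translate directly into code. A sketch, assuming $\Phi$, $t$, $\alpha$, and $\sigma^2$ are given (for instance from the design-matrix sketch above):

```python
# Posterior over the weights, Eqs. (10)-(12): N(w | mu, Sigma).
import numpy as np

def weight_posterior(Phi, t, alpha, sigma2):
    A = np.diag(alpha)                                # A = diag(alpha_0, ..., alpha_N)
    Sigma = np.linalg.inv(Phi.T @ Phi / sigma2 + A)   # Eq. (11)
    mu = Sigma @ Phi.T @ t / sigma2                   # Eq. (12)
    return mu, Sigma
```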

SLIDE 13

Marginal Likelihood

◮ The marginal likelihood can be written as

$p(t | \alpha, \sigma^2) = \int p(t | w, \sigma^2) \, p(w | \alpha) \, dw$ (13)

◮ Maximizing the marginal likelihood function is known as the type-II maximum likelihood method.
◮ We must optimize $p(t | \alpha, \sigma^2)$. There are a few ways to do this (one direct computation is sketched below).
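For the Gaussian case the integral in Eq. (13) has a closed form, $p(t | \alpha, \sigma^2) = \mathcal{N}(t | 0, C)$ with $C = \sigma^2 I + \Phi A^{-1} \Phi^T$ (Tipping, 2001). A sketch of the log marginal likelihood:

```python
# log p(t | alpha, sigma^2) for the Gaussian model, Eq. (13).
import numpy as np

def log_marginal_likelihood(Phi, t, alpha, sigma2):
    N = len(t)
    C = sigma2 * np.eye(N) + Phi @ np.diag(1.0 / alpha) @ Phi.T
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (N * np.log(2 * np.pi) + logdet + t @ np.linalg.solve(C, t))
```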

SLIDE 14

Marginal Likelihood optimization

◮ Eq. (13) is maximized by iterative re-estimation.
◮ Differentiating $\log p(t | \alpha, \sigma^2)$ gives the re-estimation updates (a full training loop is sketched below):

$\alpha_i^{\mathrm{new}} = \dfrac{\gamma_i}{\mu_i^2}$ (14)

$(\sigma^2)^{\mathrm{new}} = \dfrac{\|t - \Phi\mu\|^2}{N - \sum_{i=1}^{M} \gamma_i}$ (15)

◮ where we have defined $\gamma_i = 1 - \alpha_i \Sigma_{ii}$; $\gamma_i$ is a measure of how 'well-determined' the parameter $w_i$ is.
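Putting Eqs. (11)-(15) together gives the whole training loop. A compact sketch (the initial values are assumptions; a practical implementation would also prune basis functions whose $\alpha_i$ diverge and monitor convergence):

```python
# Iterative re-estimation of alpha and sigma^2, Eqs. (14)-(15),
# alternated with the weight posterior of Eqs. (11)-(12).
import numpy as np

def weight_posterior(Phi, t, alpha, sigma2):
    Sigma = np.linalg.inv(Phi.T @ Phi / sigma2 + np.diag(alpha))
    mu = Sigma @ Phi.T @ t / sigma2
    return mu, Sigma

def rvm_regression(Phi, t, n_iter=100):
    M = Phi.shape[1]
    alpha = np.ones(M)                         # initial precisions (assumed)
    sigma2 = 0.1 * np.var(t)                   # initial noise variance (assumed)
    for _ in range(n_iter):
        mu, Sigma = weight_posterior(Phi, t, alpha, sigma2)
        gamma = 1.0 - alpha * np.diag(Sigma)   # gamma_i = 1 - alpha_i * Sigma_ii
        alpha = gamma / (mu ** 2 + 1e-12)      # Eq. (14)
        sigma2 = np.sum((t - Phi @ mu) ** 2) / (len(t) - gamma.sum())  # Eq. (15)
    return mu, alpha, sigma2
```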

SLIDE 15

RVMs for classification

◮ The likelihood $P(t | w)$ is now Bernoulli (its log form is sketched below):

$P(t | w) = \prod_{n=1}^{N} g\{y(x_n; w)\}^{t_n} \, [1 - g\{y(x_n; w)\}]^{1 - t_n}$ (16)

◮ with $g(y) = 1/(1 + e^{-y})$ the sigmoid function.
◮ There is no noise variance; the same sparse prior as in regression is used.
◮ Unlike in regression, the weight posterior $p(w | t, \alpha)$ cannot be obtained analytically. Approximations are once again needed.
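A sketch of Eq. (16) through its logarithm (the form actually used in optimization); here $y = \Phi w$ gives the model outputs and $t$ holds 0/1 labels:

```python
# Bernoulli log-likelihood log P(t | w) for the classification RVM.
import numpy as np

def sigmoid(y):
    """g(y) = 1 / (1 + exp(-y))."""
    return 1.0 / (1.0 + np.exp(-y))

def log_likelihood(w, Phi, t):
    p = sigmoid(Phi @ w)                    # g{y(x_n; w)} for every n
    return np.sum(t * np.log(p) + (1 - t) * np.log(1 - p))
```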

SLIDE 16

Gaussian posterior approximation

◮ Find the posterior mode $w_{MP}$ for the current values of $\alpha$ by optimization.
◮ Compute the Hessian at the mode.
◮ Negate and invert it to give the covariance of a Gaussian approximation $p(w | t, \alpha) \approx \mathcal{N}(w_{MP}, \Sigma)$.
◮ The $\alpha$ are updated using $\mu$ and $\Sigma$, as in the regression case (the whole step is sketched below).
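A sketch of this Laplace step: Newton iterations locate the mode $w_{MP}$ of the log-posterior, and the negated, inverted Hessian supplies the covariance. The iteration count is illustrative; Tipping (2001) uses iteratively reweighted least squares for the same purpose.

```python
# Laplace approximation p(w | t, alpha) ~ N(w_MP, Sigma) via Newton steps.
import numpy as np

def laplace_approximation(Phi, t, alpha, n_iter=25):
    w = np.zeros(Phi.shape[1])
    A = np.diag(alpha)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Phi @ w))       # sigmoid outputs
        grad = Phi.T @ (t - p) - A @ w           # gradient of the log-posterior
        B = np.diag(p * (1.0 - p))
        H = -(Phi.T @ B @ Phi + A)               # Hessian of the log-posterior
        w = w - np.linalg.solve(H, grad)         # Newton update toward w_MP
    p = 1.0 / (1.0 + np.exp(-Phi @ w))
    B = np.diag(p * (1.0 - p))
    Sigma = np.linalg.inv(Phi.T @ B @ Phi + A)   # negate and invert the Hessian
    return w, Sigma                              # w_MP and the covariance
```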

SLIDE 17

RVM Regression Example

◮ 'sinc' function: $\mathrm{sinc}(x) = \sin(x)/x$
◮ Linear spline kernel:

$K(x_m, x_n) = 1 + x_m x_n + x_m x_n \min(x_m, x_n) - \dfrac{x_m + x_n}{2}\min(x_m, x_n)^2 + \dfrac{\min(x_m, x_n)^3}{3}$

◮ with $\epsilon = 0.01$ and 100 uniformly-spaced, noise-free samples (setup sketched below).
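A sketch of this setup: the spline kernel transcribed from the formula above plus the noise-free training data. Combined with the rvm_regression() sketch earlier, this should reproduce a fit of the kind reported by Tipping (2001); the input range is an assumption.

```python
# Linear spline kernel and the 100-sample noise-free sinc data.
import numpy as np

def spline_kernel(xm, xn):
    lo = min(xm, xn)
    return (1.0 + xm * xn + xm * xn * lo
            - 0.5 * (xm + xn) * lo ** 2 + lo ** 3 / 3.0)

x = np.linspace(-10, 10, 100)          # 100 uniform samples (assumed range)
t = np.sinc(x / np.pi)                 # noise-free sinc(x) = sin(x)/x
Phi = np.array([[spline_kernel(a, b) for b in x] for a in x])
```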

SLIDE 18

RVM Regression Example

[Figure: RVM fit to the noise-free 'sinc' data.]

SLIDE 19

RVM Regression Example

[Figure: RVM regression example, continued.]

SLIDE 20

RVM Classification Example

◮ Ripley's synthetic data
◮ Gaussian kernel (sketched below): $K(x_m, x_n) = \exp(-r^{-2}\|x_m - x_n\|^2)$
◮ with $r = 0.5$
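A sketch of this kernel with $r = 0.5$; Ripley's synthetic data set is not reproduced here, so random points stand in for it:

```python
# Gaussian kernel K(x_m, x_n) = exp(-r^{-2} ||x_m - x_n||^2), r = 0.5.
import numpy as np

def gaussian_kernel(xm, xn, r=0.5):
    return np.exp(-np.sum((xm - xn) ** 2) / r ** 2)

X = np.random.default_rng(0).normal(size=(10, 2))   # stand-in for Ripley's data
K = np.array([[gaussian_kernel(a, b) for b in X] for a in X])
print(K.shape)   # (10, 10) kernel matrix
```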

SLIDE 21

RVM Classification Example

[Figure: RVM classification of Ripley's synthetic data.]

SLIDE 22

Summary

◮ Sparsity: the prediction for new inputs depends on the kernel function evaluated at a subset of the training data points.
◮ TODO
◮ A more detailed explanation is given in the original publication: Tipping, M., "Sparse Bayesian Learning and the Relevance Vector Machine", Journal of Machine Learning Research 1, 2001, pp. 211-244.

SLIDE 23

Exercise

◮ Fetch Tipping's MATLAB toolbox for sparse Bayesian learning from http://www.vectoranomaly.com/downloads/downloads.htm.
◮ Try SparseBayesDemo.m with different likelihood models (Gaussian, Bernoulli, ...) and familiarize yourself with the toolbox.
◮ Try to replicate the results from the regression example.
