Sparse Kernel Machines - RVM
Henrik I. Christensen, Robotics & Intelligent Machines @ GT


SLIDE 1

Introduction Regression Model RVM for classification Summary

Sparse Kernel Machines - RVM

Henrik I. Christensen

Robotics & Intelligent Machines @ GT Georgia Institute of Technology, Atlanta, GA 30332-0280 hic@cc.gatech.edu

Henrik I. Christensen (RIM@GT) Relevance Vector Machines 1 / 22

SLIDE 2

Outline

1. Introduction
2. Regression Model
3. RVM for classification
4. Summary

SLIDE 3

Introduction

We discussed memory-based methods earlier.
Sparse methods are directed at memory-based systems with a minimum (but representative) set of training samples.
Last time we talked about support vector machines; they come with a few challenges, e.g. multi-class classification.
What if we could be more Bayesian in our formulation?

SLIDE 4

Outline

1. Introduction
2. Regression Model
3. RVM for classification
4. Summary

SLIDE 5

Regression model

We have seen continuous / Bayesian regression models before:

p(t|x, w, β) = N(t | y(x), β⁻¹)

We have the linear model for fusion of data:

y(x) = Σ_{i=1}^{N} wᵢ φᵢ(x) = wᵀφ(x)

A relevance vector formulation would then be:

y(x) = Σ_{i=1}^{N} wᵢ k(x, xᵢ) + b
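To make the kernel expansion concrete, here is a minimal sketch in Python; the Gaussian kernel choice and all names are illustrative assumptions, not from the slides:

```python
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    # Gaussian (RBF) kernel k(x, z) = exp(-gamma * ||x - z||^2); an assumed choice
    return np.exp(-gamma * np.sum((x - z) ** 2))

def rvm_predict(x, X_train, w, b=0.0, gamma=1.0):
    # y(x) = sum_{i=1}^{N} w_i k(x, x_i) + b -- one basis function per training point
    return sum(w[i] * rbf_kernel(x, X_train[i], gamma) for i in range(len(w))) + b

# With a single training point, w = [1] and b = 0: y(x_1) = k(x_1, x_1) = 1
X_train = np.array([[0.0]])
print(rvm_predict(np.array([0.0]), X_train, np.array([1.0])))  # 1.0
```

Each training point xᵢ contributes one basis function k(x, xᵢ); the sparsity discussed later comes from most of the wᵢ being driven to zero.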

SLIDE 6

The collective model

Consider N observation vectors collected in a data matrix X, where row i is the data vector xᵢ. With the corresponding target vector t = (t₁, t₂, ..., t_N)ᵀ the likelihood is

p(t|X, w, β) = Π_{i=1}^{N} p(tᵢ|xᵢ, w, β⁻¹)

If we consider the weights to be zero-mean Gaussian we have

p(w|α) = Π_{i=0}^{N} N(wᵢ|0, αᵢ⁻¹)

i.e. we have a different uncertainty/precision αᵢ for each factor.

SLIDE 7

More shuffling

Reorganizing using the results from linear regression we get

p(w|t, X, α, β) = N(w|m, Σ)

where

m = βΣΦᵀt
Σ = (A + βΦᵀΦ)⁻¹

Here Φ is the design matrix and A = diag(αᵢ). In many cases the design matrix is the same as the Gram matrix, i.e. Φᵢⱼ = k(xᵢ, xⱼ).
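The two moment formulas translate directly into code; a sketch under the assumption that Φ, t, α and β are given as NumPy arrays:

```python
import numpy as np

def rvm_posterior(Phi, t, alpha, beta):
    # p(w | t, X, alpha, beta) = N(w | m, Sigma) with
    #   Sigma = (A + beta * Phi^T Phi)^{-1},  A = diag(alpha_i)
    #   m     = beta * Sigma * Phi^T * t
    Sigma = np.linalg.inv(np.diag(alpha) + beta * Phi.T @ Phi)
    m = beta * Sigma @ Phi.T @ t
    return m, Sigma
```

Note that a very large αᵢ makes the prior dominate and drives mᵢ toward zero; this is the pruning mechanism exploited later.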

SLIDE 8

Estimation of α and β

Using maximum likelihood we can derive estimates for α and β. We can integrate out w:

p(t|X, α, β) = ∫ p(t|X, w, β) p(w|α) dw

The log likelihood is then

ln p(t|X, α, β) = ln N(t|0, C) = −½ [ N ln(2π) + ln|C| + tᵀC⁻¹t ]

where

C = β⁻¹I + ΦA⁻¹Φᵀ

SLIDE 9

Re-estimation of α and β

We can then re-estimate α and β from

αᵢ^new = γᵢ / mᵢ²

(β^new)⁻¹ = ||t − Φm||² / (N − Σᵢ γᵢ)

where the γᵢ measure how well-determined each weight is:

γᵢ = 1 − αᵢΣᵢᵢ

For some basis functions γᵢ goes to zero and the corresponding αᵢ goes to infinity, i.e. the prior precision becomes very large and the weight is pinned at zero. In the sense of an SVM, that training sample becomes irrelevant; the remaining samples are the relevance vectors.
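The alternating re-estimation above can be sketched as follows (initial values and the fixed iteration count are assumptions; a production implementation would prune diverging αᵢ and test for convergence):

```python
import numpy as np

def rvm_fit(Phi, t, n_iter=200, alpha0=1.0, beta0=1.0):
    # Alternate the posterior (m, Sigma) with the updates
    #   alpha_i^new     = gamma_i / m_i^2
    #   (beta^new)^{-1} = ||t - Phi m||^2 / (N - sum_i gamma_i)
    # where gamma_i = 1 - alpha_i * Sigma_ii.
    N, M = Phi.shape
    alpha = np.full(M, alpha0)
    beta = beta0
    for _ in range(n_iter):
        Sigma = np.linalg.inv(np.diag(alpha) + beta * Phi.T @ Phi)
        m = beta * Sigma @ Phi.T @ t
        gamma = 1.0 - alpha * np.diag(Sigma)
        alpha = gamma / np.maximum(m ** 2, 1e-12)   # guard against division by zero
        beta = (N - gamma.sum()) / np.sum((t - Phi @ m) ** 2)
    return m, Sigma, alpha, beta
```

On data generated from a linear model with two active basis functions, both αᵢ settle at finite values and m recovers the generating weights; basis functions whose α diverges would instead be dropped.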

SLIDE 10

Regression for new data

Once the hyperparameters have been estimated, regression for new data can be performed:

p(t|x, X, t, α*, β*) = N(t | mᵀφ(x), σ²(x))

where

σ²(x) = (β*)⁻¹ + φ(x)ᵀΣφ(x)
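The predictive moments are a direct transcription; m, Σ and β* are assumed to come from the training stage:

```python
import numpy as np

def rvm_predictive(phi_x, m, Sigma, beta):
    # Predictive distribution N(t | m^T phi(x), sigma^2(x)) with
    #   sigma^2(x) = 1/beta + phi(x)^T Sigma phi(x)
    mean = m @ phi_x
    var = 1.0 / beta + phi_x @ Sigma @ phi_x
    return mean, var

# e.g. with Sigma = I, beta = 4 and phi(x) = (1, 0):
mean, var = rvm_predictive(np.array([1.0, 0.0]), np.array([2.0, 3.0]), np.eye(2), 4.0)
print(mean, var)  # 2.0 1.25
```

The variance has a noise term 1/β plus a term reflecting the remaining uncertainty in the weights.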

SLIDE 11

Illustrative example

(Figure: illustrative regression example, target t plotted against input x. The plot itself did not survive the transcript; only the axis labels remain.)

SLIDE 12

Status

Relevance vectors are similar in style to support vectors, but defined within a Bayesian framework.
Training requires inversion of an (N + 1) × (N + 1) matrix, which can be (very) costly.
In general the resulting set of vectors is much smaller.
The basis functions should be chosen carefully for the training, i.e. analyze your data to fully understand what is going on.
The criterion function is no longer a quadratic optimization problem, and convexity is not guaranteed.

SLIDE 13

Analysis of sparsity

There is a more efficient way to estimate the parameters, i.e. brute force is not always optimal.
The iterative estimation of α poses a challenge, but does suggest an alternative. Consider a rewrite of the C matrix:

C = β⁻¹I + Σ_{j≠i} αⱼ⁻¹ φⱼφⱼᵀ + αᵢ⁻¹ φᵢφᵢᵀ = C₋ᵢ + αᵢ⁻¹ φᵢφᵢᵀ

i.e. we have made the contribution of the i-th term explicit. Standard linear algebra (the matrix determinant lemma and the Sherman-Morrison identity) allows us to rewrite

det(C) = |C| = |C₋ᵢ| (1 + αᵢ⁻¹ φᵢᵀ C₋ᵢ⁻¹ φᵢ)

C⁻¹ = C₋ᵢ⁻¹ − (C₋ᵢ⁻¹ φᵢ φᵢᵀ C₋ᵢ⁻¹) / (αᵢ + φᵢᵀ C₋ᵢ⁻¹ φᵢ)

SLIDE 14

The separated log likelihood

This allows us to rewrite the log likelihood as

L(α) = L(α₋ᵢ) + λ(αᵢ)

The contribution of αᵢ is then

λ(αᵢ) = ½ [ ln αᵢ − ln(αᵢ + sᵢ) + qᵢ² / (αᵢ + sᵢ) ]

Here we have the complete dependency on αᵢ. We have used

sᵢ = φᵢᵀ C₋ᵢ⁻¹ φᵢ
qᵢ = φᵢᵀ C₋ᵢ⁻¹ t

sᵢ is known as the sparsity and qᵢ as the quality of φᵢ.

SLIDE 15

Evaluation for stationary conditions

It can be shown (see Bishop, pp. 351-352) that if qᵢ² > sᵢ there is a stable solution

αᵢ = sᵢ² / (qᵢ² − sᵢ)

Otherwise αᵢ goes to infinity, i.e. the basis function is irrelevant.
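The stationarity test is easy to state in code; a sketch with assumed names:

```python
import numpy as np

def sparsity_quality(phi_i, C_minus_i, t):
    # s_i = phi_i^T C_{-i}^{-1} phi_i  (sparsity)
    # q_i = phi_i^T C_{-i}^{-1} t      (quality)
    Cinv = np.linalg.inv(C_minus_i)
    return phi_i @ Cinv @ phi_i, phi_i @ Cinv @ t

def alpha_stationary(s, q):
    # Finite stationary point exists only when q^2 > s; otherwise prune (alpha -> inf)
    return s ** 2 / (q ** 2 - s) if q ** 2 > s else np.inf

print(alpha_stationary(1.0, 2.0))  # 0.3333... -> basis function kept
print(alpha_stationary(2.0, 1.0))  # inf       -> basis function pruned
```

This is the core decision step of the sequential sparse estimation: a basis function survives only when its quality outweighs its sparsity.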

SLIDE 16

Status

There are efficient (non-recursive) ways to evaluate the parameters. The relative complexity is still significant.

SLIDE 17

Outline

1. Introduction
2. Regression Model
3. RVM for classification
4. Summary

SLIDE 18

Relevance vectors for classification

For classification we can apply the same framework. Consider the two-class problem with binary targets t ∈ {0, 1}; the model then has the form

y(x) = σ(wᵀφ(x))

where σ(·) is the logistic sigmoid function. Closed-form integration is no longer an option. We can use the Laplace approach to estimate the mode of the posterior, which in turn allows re-estimation of the precisions α; with the new α we re-estimate the mode, and so on until convergence. The process is similar to regression (see book).
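A sketch of the inner Laplace step for fixed α, using Newton (IRLS) iterations; the outer re-estimation of α is omitted and all names and the iteration count are assumptions:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def laplace_mode(Phi, t, alpha, n_iter=25):
    # Find the mode of p(w | t, alpha) for the logistic model y = sigmoid(Phi w)
    # with independent N(0, 1/alpha_i) priors, then return the Gaussian
    # (Laplace) approximation N(w_map, Sigma) around it.
    A = np.diag(alpha)
    w = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        y = sigmoid(Phi @ w)
        grad = Phi.T @ (t - y) - A @ w              # gradient of the log posterior
        H = Phi.T @ np.diag(y * (1 - y)) @ Phi + A  # negative Hessian
        w = w + np.linalg.solve(H, grad)            # Newton step
    return w, np.linalg.inv(H)

# toy 1D two-class problem: phi(x) = (1, x)
x = np.array([-2.0, -1.0, 1.0, 2.0])
Phi = np.column_stack([np.ones_like(x), x])
t = np.array([0.0, 0.0, 1.0, 1.0])
w_map, Sigma = laplace_mode(Phi, t, alpha=np.array([1.0, 1.0]))
print(w_map[1] > 0)  # True: a positive slope separates the two classes
```

The prior term A keeps the Hessian positive definite, so the Newton iteration is well behaved even for separable data.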

SLIDE 19

Synthetic example

(Figure: synthetic two-class classification example. The plot did not survive the transcript; only the axis ticks −2 and 2 remain.)

SLIDE 20

Outline

1. Introduction
2. Regression Model
3. RVM for classification
4. Summary

SLIDE 21

Summary

A Bayesian approach to the definition of a sparse model.
The model is more comprehensive, but also comes with more assumptions.
Creates a sparser model with 'similar' performance.
Training can be slow, especially for large data sets.
Execution is faster due to the sparser model.
Selection of basis functions for relevance vectors can pose a challenge.

SLIDE 22

Projects

Halfway through the course! Covered the basics.
Next Monday & Wednesday: IROS-09.
Next Friday: update on projects.

What is your problem? What is your approach? How will you train the system? How will you evaluate performance?
