Sparse Kernel Machines - RVM


  1. Sparse Kernel Machines - RVM
Henrik I. Christensen
Robotics & Intelligent Machines @ GT
Georgia Institute of Technology, Atlanta, GA 30332-0280
hic@cc.gatech.edu

  2. Outline
1 Introduction
2 Regression Model
3 RVM for classification
4 Summary

  3. Introduction
We discussed memory-based methods earlier.
Sparse methods are directed at memory-based systems with a minimal (but representative) set of training samples.
Last time we talked about support vector machines, which have a few challenges, e.g., multi-class classification.
What if we could be more Bayesian in our formulation?

  4. Outline
1 Introduction
2 Regression Model
3 RVM for classification
4 Summary

  5. Regression model
We have seen continuous / Bayesian regression models before:
p(t | x, w, β) = N(t | y(x), β^{-1})
We have the linear model for fusion of data:
y(x) = Σ_{i=1}^{N} w_i φ_i(x) = w^T φ(x)
A relevance vector formulation would then be (see the sketch below):
y(x) = Σ_{i=1}^{N} w_i k(x, x_i) + b
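As an illustration, a minimal NumPy sketch of this kernel expansion; the RBF kernel, its width gamma, and the function names are assumptions for illustration, not from the slides:

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # k(a, b) = exp(-gamma * ||a - b||^2)  (assumed kernel choice)
    return np.exp(-gamma * np.sum((np.asarray(a) - np.asarray(b)) ** 2))

def rvm_predict(x, X_train, w, b=0.0, gamma=1.0):
    # y(x) = sum_i w_i k(x, x_i) + b
    return sum(wi * rbf_kernel(x, xi, gamma) for wi, xi in zip(w, X_train)) + b
```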

  6. The collective model
Consider N observation vectors collected in a data matrix X, where row i is the data vector x_i, with corresponding target vector t = (t_1, t_2, ..., t_N)^T. The likelihood is then:
p(t | X, w, β) = Π_{i=1}^{N} p(t_i | x_i, w, β^{-1})
If we consider the weights to be zero-mean Gaussian we have
p(w | α) = Π_{i=0}^{N} N(w_i | 0, α_i^{-1})
i.e., we have a separate uncertainty/precision α_i for each weight.

  7. More shuffling
Reorganizing using the results from linear regression we get
p(w | t, X, α, β) = N(w | m, Σ)
where
m = β Σ Φ^T t
Σ = (A + β Φ^T Φ)^{-1}
where Φ is the design matrix and A = diag(α_i). In many cases the design matrix is the same as the Gram matrix, i.e., Φ_ij = k(x_i, x_j).
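A direct (naive) NumPy sketch of these posterior statistics, assuming a design matrix Phi, target vector t, a hyperparameter vector alpha, and scalar noise precision beta:

```python
import numpy as np

def posterior(Phi, t, alpha, beta):
    # Sigma = (A + beta * Phi^T Phi)^{-1}, with A = diag(alpha)
    Sigma = np.linalg.inv(np.diag(alpha) + beta * Phi.T @ Phi)
    # m = beta * Sigma Phi^T t
    m = beta * Sigma @ Phi.T @ t
    return m, Sigma
```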

  8. Estimation of α and β
Using maximum likelihood we can derive estimates for α and β. We can integrate out w:
p(t | X, α, β) = ∫ p(t | X, w, β) p(w | α) dw
The log likelihood is then
ln p(t | X, α, β) = ln N(t | 0, C) = -1/2 [ N ln(2π) + ln |C| + t^T C^{-1} t ]
where
C = β^{-1} I + Φ A^{-1} Φ^T
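The log marginal likelihood can be evaluated directly from this expression; a sketch under the same assumptions as above, using slogdet and solve rather than explicit inversion for numerical stability:

```python
import numpy as np

def log_marginal(Phi, t, alpha, beta):
    # C = beta^{-1} I + Phi A^{-1} Phi^T
    N = len(t)
    C = np.eye(N) / beta + Phi @ np.diag(1.0 / alpha) @ Phi.T
    # ln N(t | 0, C) = -1/2 [N ln(2 pi) + ln|C| + t^T C^{-1} t]
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (N * np.log(2.0 * np.pi) + logdet + t @ np.linalg.solve(C, t))
```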

  9. Re-estimation of α and β
We can then re-estimate α and β from
α_i^{new} = γ_i / m_i^2
(β^{new})^{-1} = ||t − Φm||^2 / (N − Σ_i γ_i)
where the γ_i are precision estimates defined by
γ_i = 1 − α_i Σ_ii
During re-estimation, γ_i will go to zero for some components and the corresponding α_i diverge to infinity, pinning those weights at zero. In the sense of an SVM, those training samples become irrelevant; the ones that remain are the relevance vectors.
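A naive sketch of the resulting fixed-point iteration, reusing posterior() from the sketch above; the fixed iteration count, the small jitter guarding the division, and the cap on diverging α values are assumptions:

```python
import numpy as np

def reestimate(Phi, t, alpha, beta, n_iter=100, alpha_cap=1e9):
    for _ in range(n_iter):
        m, Sigma = posterior(Phi, t, alpha, beta)    # from the sketch above
        gamma = 1.0 - alpha * np.diag(Sigma)         # gamma_i = 1 - alpha_i Sigma_ii
        alpha = np.minimum(gamma / (m ** 2 + 1e-12), alpha_cap)  # alpha_i = gamma_i / m_i^2
        beta = (len(t) - gamma.sum()) / np.sum((t - Phi @ m) ** 2)  # from (beta^new)^{-1}
    return alpha, beta
```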

  10. Regression for new data
Once the hyperparameters have been estimated, regression can be performed:
p(t | x, X, t, α*, β*) = N(t | m^T φ(x), σ^2(x))
where
σ^2(x) = (β*)^{-1} + φ(x)^T Σ φ(x)
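A short sketch of the predictive moments, given the fitted m, Σ, and β* from the previous steps and a basis vector phi_x = φ(x):

```python
def predict(phi_x, m, Sigma, beta):
    # mean = m^T phi(x); var = 1/beta* + phi(x)^T Sigma phi(x)
    return m @ phi_x, 1.0 / beta + phi_x @ Sigma @ phi_x
```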

  11. Illustrative example
[Figure: RVM regression on a synthetic 1-D data set; horizontal axis x, vertical axis t in roughly [−1, 1].]

  12. Status
Relevance vectors are similar in style to support vectors, but defined within a Bayesian framework.
Training requires inversion of an (N + 1) × (N + 1) matrix, which can be (very) costly.
In general the resulting set of vectors is much smaller.
The basis functions should be chosen carefully for the training, i.e., analyze your data to fully understand what is going on.
The objective function is no longer a quadratic optimization problem, and convexity is not guaranteed.

  13. Analysis of sparsity
There is a different way to estimate the parameters that is more efficient, i.e., brute force is not always optimal.
The iterative estimation of α poses a challenge, but does suggest an alternative. Consider a rewrite of the C matrix:
C = β^{-1} I + Σ_{j≠i} α_j^{-1} φ_j φ_j^T + α_i^{-1} φ_i φ_i^T = C_{-i} + α_i^{-1} φ_i φ_i^T
i.e., we have made the contribution of the i'th term explicit. Standard linear algebra allows us to rewrite
|C| = |C_{-i}| (1 + α_i^{-1} φ_i^T C_{-i}^{-1} φ_i)
C^{-1} = C_{-i}^{-1} − (C_{-i}^{-1} φ_i φ_i^T C_{-i}^{-1}) / (α_i + φ_i^T C_{-i}^{-1} φ_i)

  14. The separated log likelihood
This allows us to rewrite the log likelihood as
L(α) = L(α_{-i}) + λ(α_i)
The contribution of α_i is then
λ(α_i) = 1/2 [ ln α_i − ln(α_i + s_i) + q_i^2 / (α_i + s_i) ]
Here we have the complete dependency on α_i. We have used
s_i = φ_i^T C_{-i}^{-1} φ_i
q_i = φ_i^T C_{-i}^{-1} t
s_i is known as the sparsity and q_i is known as the quality of φ_i.

  15. Evaluation of stationary conditions
It can be shown (see Bishop pp. 351-352) that if q_i^2 > s_i there is a stable solution
α_i = s_i^2 / (q_i^2 − s_i)
otherwise α_i goes to infinity, i.e., the basis function is irrelevant (see the sketch below).
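A sketch of the sparsity/quality computation and the resulting stationary update; C_minus_i denotes C with the i'th basis function's contribution removed, and the names are illustrative:

```python
import numpy as np

def sparsity_quality(phi_i, C_minus_i, t):
    # v = C_{-i}^{-1} phi_i; since C is symmetric, phi_i^T C_{-i}^{-1} = v^T
    v = np.linalg.solve(C_minus_i, phi_i)
    return phi_i @ v, v @ t   # s_i (sparsity), q_i (quality)

def stationary_alpha(s_i, q_i):
    # alpha_i = s_i^2 / (q_i^2 - s_i) if q_i^2 > s_i, else infinite (prune phi_i)
    return s_i ** 2 / (q_i ** 2 - s_i) if q_i ** 2 > s_i else np.inf
```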

  16. Status
There are efficient (non-recursive) ways to evaluate the parameters.
The relative complexity is still significant.

  17. Outline
1 Introduction
2 Regression Model
3 RVM for classification
4 Summary

  18. Relevance vectors for classification
For classification we can apply the same framework. Consider the two-class problem with binary targets t ∈ {0, 1}; the model is then
y(x) = σ(w^T φ(x))
where σ(·) is the logistic sigmoid function.
Closed-form integration is no longer an option. We can use the Laplace approach to estimate the mode of the posterior, which in turn allows re-estimation of the hyperparameters α, then a new mode, and so on until convergence. The process is similar to regression (see book); a sketch of the mode-finding step follows below.
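A hedged sketch of the Laplace step: Newton (IRLS) iterations to find the mode of the posterior over w for fixed α; the fixed iteration count and zero initialization are assumptions, and the outer α re-estimation loop is omitted:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def laplace_mode(Phi, t, alpha, n_iter=25):
    # Newton/IRLS search for the mode of p(w | t, alpha), y = sigmoid(Phi w)
    A = np.diag(alpha)
    w = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        y = sigmoid(Phi @ w)
        grad = Phi.T @ (t - y) - A @ w                 # gradient of the log posterior
        H = Phi.T @ np.diag(y * (1.0 - y)) @ Phi + A   # negative Hessian
        w = w + np.linalg.solve(H, grad)
    Sigma = np.linalg.inv(H)                           # Laplace covariance at the mode
    return w, Sigma
```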

  19. Synthetic example
[Figure: RVM classification on a synthetic two-class data set; both axes range from −2 to 2.]

  20. Outline
1 Introduction
2 Regression Model
3 RVM for classification
4 Summary

  21. Summary
A Bayesian approach to the definition of a sparse model.
The model is more comprehensive, but also comes with more assumptions.
Creates a sparser model with 'similar' performance.
Training can be slow, especially for large data sets.
Execution is faster due to the sparser model.
Selection of basis functions for relevance vectors can pose a challenge.

  22. Projects
Halfway through the course! Covered the basics.
Next Monday & Wednesday: IROS-09.
Next Friday: update on projects.
What is your problem? What is your approach? How will you train the system? How will you evaluate performance?
