Reduced-Set Models for Improving the Training and Execution Speed of Kernel Methods

Reduced-Set Models for Improving the Training and Execution Speed of Kernel Methods - PowerPoint PPT Presentation

Presentation by Hassan A. Kingravi (IVALab), covering an introduction to kernel methods, reduced-set kernel PCA (RSKPCA) basics and applications, GP-MRAC, and conclusions.


  1. Title: Reduced-Set Models for Improving the Training and Execution Speed of Kernel Methods. Hassan A. Kingravi, IVALab.

  2–5. Outline (repeated over four build slides alongside illustrative plots, whose axis ticks are omitted here):
     1 Introduction: Kernel Methods; Speeding Up Training; Speeding Up Testing; Research Objectives
     2 RSKPCA: Basics: Reduced-Set Models for the Batch Case; RSKPCA Results
     3 RSKPCA: Applications: RSKPCA Applications; Gaussian Process Regression; Diffusion Maps; RSKPCA Applications: Results
     4 GP-MRAC: Reduced-Set Models for the Online Case; GP-MRAC Results
     5 Conclusion: Conclusions and Future Work
     6 Questions

  6. General Questions.
     Observations: nonparametric methods are powerful because they use all possible data; nonparametric methods are slow because they use all possible data.
     Questions: For a given class of nonparametric methods (kernel methods), is all the data necessary? How can we intelligently discard data? How do we prove these procedures are well founded, in a deterministic fashion?

  7. Kernel Methods. Kernel methods (machines) are a class of machine learning algorithms used to convert linear algorithms into nonlinear ones through a feature map.
     Linear learning algorithms:
     Classification: linear perceptron, linear support vector machine (SVM).
     Regression: Bayesian linear regression.
     Dimensionality reduction: principal components analysis (PCA).

  8. Linear Algorithm Example. Linear algorithms are limited if the data has nonlinear structure in the input space. The example tries to separate the data using a linear perceptron; there is no feasible solution in the input space.
     Figures: (a) Original data; (b) Perceptron solution (in green).

  9–12. Feature Maps (built up over four slides). Feature maps offer a possible solution: map the data using a feature map ψ into a higher-dimensional feature space H. If (x, y) ↦ (x², y², √2·xy), then the data become linearly separable; a minimal sketch of this map follows below.
     Figures: (a) Original data; (b) Mapped data; (c) Perceptron (green).
     Question: How are feature maps generated?
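The explicit map on these slides can be checked numerically. Below is a minimal NumPy sketch (not from the presentation; the ring radii, sample counts, and the hand-picked separating hyperplane are illustrative assumptions) showing that two concentric rings, which no linear perceptron can separate in the input space, become linearly separable after applying ψ(x, y) = (x², y², √2·xy).

```python
import numpy as np

# Two concentric rings: not linearly separable in the input space (x, y).
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 200)
radius = np.concatenate([np.full(100, 0.5), np.full(100, 1.5)])   # inner / outer ring
X = np.column_stack([radius * np.cos(theta), radius * np.sin(theta)])
labels = np.concatenate([-np.ones(100), np.ones(100)])

def psi(X):
    """Explicit feature map (x, y) -> (x^2, y^2, sqrt(2)*x*y)."""
    x, y = X[:, 0], X[:, 1]
    return np.column_stack([x**2, y**2, np.sqrt(2) * x * y])

# In feature space, x^2 + y^2 equals the squared ring radius, so one plane separates the classes.
Z = psi(X)
w = np.array([1.0, 1.0, 0.0])                 # hand-picked separating hyperplane
b = -(0.5**2 + 1.5**2) / 2.0
predictions = np.sign(Z @ w + b)
print("linearly separable in feature space:", bool(np.all(predictions == labels)))
```

Any plane of the form x² + y² = c, with c between the two squared radii, separates the mapped rings, which is what a perceptron run in feature space would find.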

  13–14. Kernels and Feature Maps.
     Issues with feature maps: difficult to design by hand; many problems require arbitrary degrees of freedom.
     Solution: kernelization.
     If k : Ω × Ω → R is a positive-definite symmetric kernel function, then
         k(x, y) = ⟨ψ(x), ψ(y)⟩_H.    (1)
     The map is obtained from the operator K : L²(Ω) → L²(Ω),
         (K f)(x) := ∫_Ω k(x, y) f(y) dy.    (2)

  15. Kernels and Feature Maps (continued). Solution: kernelization.
     Mercer's theorem: the eigendecomposition of the operator yields eigenpairs (λ_ι, φ_ι), ι = 1, …, N, with the φ_ι forming an orthonormal basis (ONB) of L²(Ω).
     Kernel: k(x, y) = Σ_{ι=1}^{N} λ_ι φ_ι(x) φ_ι(y), where the number of terms N may be finite or infinite.
     Feature map: ψ(x) := (√λ₁ φ₁(x), √λ₂ φ₂(x), …), so that k(x, y) = ⟨ψ(x), ψ(y)⟩_H.
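Equation (1) can be made concrete with the map from the earlier slides. The degree-2 homogeneous polynomial kernel k(u, v) = ⟨u, v⟩² is not named in the presentation, but it is the kernel whose feature map is (x, y) ↦ (x², y², √2·xy); the short sketch below (illustrative only) verifies the identity numerically.

```python
import numpy as np

def psi(v):
    """Explicit map (x, y) -> (x^2, y^2, sqrt(2)*x*y) from the earlier slides."""
    x, y = v
    return np.array([x**2, y**2, np.sqrt(2) * x * y])

def k(u, v):
    """Degree-2 homogeneous polynomial kernel: k(u, v) = <u, v>^2."""
    return np.dot(u, v) ** 2

u = np.array([0.3, -1.2])
v = np.array([2.0, 0.7])
lhs = k(u, v)                     # kernel evaluated directly in the input space
rhs = np.dot(psi(u), psi(v))      # inner product in the feature space H
print(np.isclose(lhs, rhs))       # True: k(u, v) = <psi(u), psi(v)>_H, as in Eq. (1)
```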

  16–17. Eigendecomposition in Practice.
     Empirical procedure for the eigendecomposition of the kernel: given a dataset X = {x_i}_{i=1}^{n}, the integral operator is approximated by the Gram matrix K_ij = k(x_i, x_j). The feature map is learned via the eigendecomposition K = U Λ Uᵀ.    (1)
     This eigendecomposition is the heart of the kernel PCA procedure and applies to many other methods; we examine Gaussian process regression and diffusion maps.
     Issues: with n points, training complexity is O(n³), testing (mapping) complexity is O(nr), and space complexity is O(n²). Not scalable in either training or testing.
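A minimal sketch of this empirical procedure, under illustrative assumptions not taken from the slides (a Gaussian RBF kernel with bandwidth σ = 1, n = 500 random points, r = 10 retained eigenpairs, and no centering of the Gram matrix, which full kernel PCA would also perform):

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    """Gaussian RBF kernel: K_ij = exp(-||x_i - y_j||^2 / (2 sigma^2))."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq / (2 * sigma**2))

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))             # n = 500 training points

K = rbf_kernel(X, X)                          # Gram matrix: O(n^2) storage
lam, U = np.linalg.eigh(K)                    # K = U diag(lam) U^T: O(n^3) training cost
order = np.argsort(lam)[::-1][:10]            # keep the r = 10 leading eigenpairs
lam_r, U_r = lam[order], U[:, order]

# Mapping a new point costs O(nr): it needs k(x, x_i) against all n training points.
x_new = rng.standard_normal((1, 3))
k_new = rbf_kernel(x_new, X)                  # shape (1, n)
coords = (k_new @ U_r) / np.sqrt(lam_r)       # (uncentered) kernel PCA coordinates
print(coords.shape)                           # (1, 10)
```

The three costs on the slide show up directly: forming K is the O(n²) space term, the dense eigendecomposition is the O(n³) training term, and projecting a new point requires kernel evaluations against all n training points, the O(nr) testing term.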

  18. Speeding Up Training.
     Matrix sparsification [GOLUB:1996]: compute K; if the matrix is sparse, use techniques such as Jacobi, Arnoldi, or Hebbian iterations to obtain a low-rank approximation. Issues: very accurate, but requires the kernel matrix, is iterative, and is unsuitable for dense matrices.
     Random projections [AILON:2006, LIBERTY:2009]: compute K, project it onto a lower-dimensional subspace using a random matrix, compute an ONB via the SVD, and use it to obtain a low-rank approximation. Issues: very accurate approximations in practice, but the kernel matrix must still be computed and the cost is superlinear in n.
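One common variant of the random-projection idea is a randomized range finder: sketch K with a random test matrix, orthonormalize the sketch, and take an SVD of the resulting small matrix. The code below is an illustrative sketch under assumptions not stated on the slide (the function name, the oversampling amount, and the toy RBF Gram matrix); note that K must still be formed, which is why this family of methods stays superlinear in n.

```python
import numpy as np

def randomized_low_rank(K, r, oversample=10, seed=0):
    """Random-projection low-rank approximation of a precomputed kernel matrix K."""
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((K.shape[0], r + oversample))  # random test matrix
    Q, _ = np.linalg.qr(K @ Omega)            # orthonormal basis for the sketched range
    B = Q.T @ K                               # small (r + oversample) x n matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return Q @ Ub[:, :r], s[:r], Vt[:r]       # rank-r factors: K ~= U diag(s) Vt

# Illustrative usage: K is formed explicitly first.
rng = np.random.default_rng(1)
X = rng.standard_normal((300, 2))
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq_dists / 2.0)                   # RBF Gram matrix, bandwidth 1
U, s, Vt = randomized_low_rank(K, r=20)
err = np.linalg.norm(K - U @ np.diag(s) @ Vt) / np.linalg.norm(K)
print(f"relative approximation error: {err:.3e}")
```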

  19. Speeding Up Training (continued).
     Sampling-based approaches [HAR-PELED:2006, BOUTSIDIS:2009, TALWALKAR:2010]: compute K, sample r columns using various schemes, and compute a low-rank approximation. Issues: accurate approximations in practice, but the kernel matrix still needs to be computed.
     Nyström method (CUR decomposition) [DRINEAS:2005, ZHANG:2009]: sample r points from the dataset and use them to compute a low-rank approximation. Approaches include uniform random sampling and k-means. Pros: avoids computing the full kernel matrix; tries to approximate the eigenfunctions of the kernel; a study [TALWALKAR:2010] has shown the effectiveness of the method on a wide variety of real-world learning problems.
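A minimal sketch of the Nyström approximation with uniform random sampling (the function names, RBF bandwidth, and landmark count r = 50 are illustrative assumptions, not from the slides): only the n×r block C and the r×r landmark block W are evaluated, and K ≈ C W⁺ Cᵀ, so the full kernel matrix is never formed.

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    """Gaussian RBF kernel matrix between the rows of X and Y."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq / (2 * sigma**2))

def nystrom(X, r, sigma=1.0, seed=0):
    """Nystrom approximation K ~= C W^+ C^T from r uniformly sampled landmark points."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=r, replace=False)   # uniform random sampling
    landmarks = X[idx]
    C = rbf_kernel(X, landmarks, sigma)      # n x r block: only r columns of K are evaluated
    W = rbf_kernel(landmarks, landmarks, sigma)  # r x r block among the landmarks
    return C, np.linalg.pinv(W)

rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 3))
C, W_pinv = nystrom(X, r=50)
K_approx = C @ W_pinv @ C.T                  # low-rank surrogate for the full Gram matrix
# The exact comparison below is only feasible because n is small in this toy example.
K_exact = rbf_kernel(X, X)
print(np.linalg.norm(K_exact - K_approx) / np.linalg.norm(K_exact))
```

Replacing the uniform sample with k-means centers, the other sampling scheme mentioned on the slide, often improves how well the landmarks capture the kernel's eigenfunctions.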
