 
Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions
Reduced-Set Models for Improving the Training and Execution Speed of Kernel Methods
Hassan A. Kingravi
IVALab
Outline
1. Introduction (Kernel Methods; Speeding Up Training; Speeding Up Testing; Research Objectives)
2. RSKPCA: Basics (Reduced-Set Models for the Batch Case; RSKPCA Results)
3. RSKPCA: Applications (RSKPCA Applications; Gaussian Process Regression; Diffusion Maps; RSKPCA Applications: Results)
4. GP-MRAC (Reduced-Set Models for the Online Case; GP-MRAC Results)
5. Conclusion (Conclusions and Future Work)
6. Questions
General Questions
Observations
◮ Nonparametric methods are powerful because they use all the available data.
◮ Nonparametric methods are slow because they use all the available data.
Questions
◮ For a given class of nonparametric methods (kernel methods), is all the data necessary?
◮ How can we intelligently discard data?
◮ How do we prove, deterministically, that these procedures are well founded?
Kernel Methods
Kernel methods (kernel machines) are a class of machine learning algorithms that turn linear algorithms into nonlinear ones by means of a feature map.
Linear learning algorithms:
◮ Classification: linear perceptron, linear support vector machine (SVM).
◮ Regression: Bayesian linear regression.
◮ Dimensionality reduction: principal components analysis (PCA).
Linear Algorithm Example
Linear algorithms are limited when the data have nonlinear structure in the input space. In this example, a linear perceptron tries to separate the data, but no feasible solution exists in the input space.
[Figure: (a) Original data; (b) Perceptron solution (in green).]
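A minimal NumPy sketch of the failure described above follows. It assumes a hypothetical toy dataset of two concentric rings (class -1 at radius 0.5, class +1 at radius 1.5) rather than the data in the figure. Every inner point lies inside the convex hull of the outer points, so no line separates the classes. As a result, the perceptron makes at least one mistake in every epoch and never converges. The helper names `ring_data` and `perceptron` are illustrative and do not come from the original work.

```python
import numpy as np


def ring_data(n_per_class=40):
    """Hypothetical toy data: class -1 on a circle of radius 0.5, class +1 on radius 1.5."""
    t = np.linspace(0.0, 2.0 * np.pi, n_per_class, endpoint=False)
    circle = np.column_stack([np.cos(t), np.sin(t)])
    X = np.vstack([0.5 * circle, 1.5 * circle])
    y = np.concatenate([-np.ones(n_per_class), np.ones(n_per_class)])
    return X, y


def perceptron(X, y, epochs=100):
    """Classic perceptron with a bias term; returns (w, b, mistakes in the last epoch run)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    mistakes = 0
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:  # misclassified or on the boundary
                w += yi * xi
                b += yi
                mistakes += 1
        if mistakes == 0:  # every point strictly on the correct side
            break
    return w, b, mistakes


X, y = ring_data()
w, b, last_epoch_mistakes = perceptron(X, y)
print("mistakes in the last epoch:", last_epoch_mistakes)
# Never 0 here: no line separates the inner ring from the outer ring.
```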
Feature Maps
Feature maps offer a possible solution: map the data with a feature map $\psi$ into a higher-dimensional feature space $\mathcal{H}$. If $(x, y) \mapsto (x^2, y^2, \sqrt{2}\,xy)$, then the data become linearly separable.
[Figure: (a) Original data; (b) Mapped data; (c) Perceptron (green).]
Question: How are feature maps generated?
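The sketch below reuses the same hypothetical concentric-ring data and applies the map $(x, y) \mapsto (x^2, y^2, \sqrt{2}\,xy)$ before running an ordinary perceptron. With that change, the perceptron converges, because the sum of the first two mapped coordinates, $x^2 + y^2$, already separates the two radii. The ring data and function names are assumptions made for illustration.

```python
import numpy as np


def ring_data(n_per_class=40):
    """Hypothetical toy data: class -1 on a circle of radius 0.5, class +1 on radius 1.5."""
    t = np.linspace(0.0, 2.0 * np.pi, n_per_class, endpoint=False)
    circle = np.column_stack([np.cos(t), np.sin(t)])
    X = np.vstack([0.5 * circle, 1.5 * circle])
    y = np.concatenate([-np.ones(n_per_class), np.ones(n_per_class)])
    return X, y


def feature_map(X):
    """psi(x, y) = (x^2, y^2, sqrt(2) x y), applied row-wise."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1**2, x2**2, np.sqrt(2.0) * x1 * x2])


def perceptron(X, y, epochs=100):
    """Classic perceptron with a bias term; returns (w, b, mistakes in the last epoch run)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    mistakes = 0
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:  # misclassified or on the boundary
                w += yi * xi
                b += yi
                mistakes += 1
        if mistakes == 0:  # converged: a separating hyperplane was found
            break
    return w, b, mistakes


X, y = ring_data()
Phi = feature_map(X)               # data in the 3-D feature space
w, b, mistakes = perceptron(Phi, y)
print("mistakes in the last epoch:", mistakes)
```

The perceptron is unchanged from the input-space version. Only the representation of the data differs, which is the point the slide makes.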
Kernels and Feature Maps
Issues with feature maps:
◮ Difficult to design by hand.
◮ Many problems require arbitrary degrees of freedom.
Solution: Kernelization
◮ If $k : \Omega \times \Omega \to \mathbb{R}$ is a positive definite symmetric kernel function, then
$$k(x, y) = \langle \psi(x), \psi(y) \rangle_{\mathcal{H}} \tag{1}$$
◮ The map is obtained from the integral operator $K : L^2(\Omega) \to L^2(\Omega)$,
$$(K f)(x) := \int_{\Omega} k(x, y)\, f(y)\, dy \tag{2}$$
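As a small check of equation (1), the sketch below assumes the homogeneous degree-2 polynomial kernel $k(a, b) = (a^\top b)^2$ on $\mathbb{R}^2$. Its feature map is exactly the $(x^2, y^2, \sqrt{2}\,xy)$ map from the Feature Maps slide. The sketch also confirms that a Gram matrix built from this kernel has no negative eigenvalues beyond rounding error, which is what positive definiteness of a kernel requires. The kernel choice and the random test points are assumptions made for illustration.

```python
import numpy as np


def poly2_kernel(a, b):
    """Homogeneous degree-2 polynomial kernel k(a, b) = (a . b)^2 (assumed example)."""
    return float(a @ b) ** 2


def psi(v):
    """Explicit feature map (x, y) -> (x^2, y^2, sqrt(2) x y)."""
    x, y = v
    return np.array([x * x, y * y, np.sqrt(2.0) * x * y])


rng = np.random.default_rng(0)
pts = rng.standard_normal((50, 2))   # hypothetical test points

# Equation (1): the kernel value equals an inner product of feature vectors.
max_gap = max(abs(poly2_kernel(a, b) - psi(a) @ psi(b)) for a in pts for b in pts)

# Positive definite kernel => every Gram matrix is positive semidefinite.
K = np.array([[poly2_kernel(a, b) for b in pts] for a in pts])
min_eig = np.linalg.eigvalsh(K).min()
print("max |k - <psi, psi>|:", max_gap)
print("smallest Gram eigenvalue:", min_eig)
```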
Kernels and Feature Maps (continued)
Solution: Kernelization
◮ Mercer's theorem: the eigenfunctions from the eigendecomposition $(\lambda_\iota, \phi_\iota)_{\iota=1}^{N}$ of the operator $K$ form an orthonormal basis (ONB) of $L^2(\Omega)$.
◮ Kernel: $k(x, y) = \sum_{\iota=1}^{N} \lambda_\iota\, \phi_\iota(x)\, \phi_\iota(y)$, with $N \in \mathbb{N} \cup \{\infty\}$.
◮ Feature map: $\psi(x) := \big(\sqrt{\lambda_1}\,\phi_1(x), \sqrt{\lambda_2}\,\phi_2(x), \ldots\big)$, so that $k(x, y) = \langle \psi(x), \psi(y) \rangle_{\mathcal{H}}$.
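The Mercer expansion can be illustrated numerically by discretizing the operator in (2) with a midpoint quadrature rule on $\Omega = [0, 1]$. The sketch assumes a Gaussian kernel with bandwidth 0.2 and 400 quadrature nodes; both are hypothetical choices. Rescaling the matrix eigenvectors by $1/\sqrt{h}$ makes the sampled eigenfunctions orthonormal under the quadrature rule. The truncated sum $\sum_{\iota \le N} \lambda_\iota \phi_\iota(x)\phi_\iota(y)$ then approaches $k(x, y)$ on the grid as $N$ grows.

```python
import numpy as np


def gaussian_kernel(x, y, sigma=0.2):
    """Gaussian kernel on Omega = [0, 1]; the kernel and bandwidth are assumptions."""
    return np.exp(-(x[:, None] - y[None, :]) ** 2 / (2.0 * sigma**2))


m = 400                              # number of quadrature nodes (assumption)
h = 1.0 / m                          # midpoint-rule weight
grid = (np.arange(m) + 0.5) * h      # midpoints of m equal cells in [0, 1]
Kg = gaussian_kernel(grid, grid)

# Discretized operator (2): (K f)(x_i) ~ sum_j h * k(x_i, x_j) * f(x_j)
evals, evecs = np.linalg.eigh(h * Kg)
lam = evals[::-1]                    # largest eigenvalues first
phi = evecs[:, ::-1] / np.sqrt(h)    # sampled eigenfunctions: sum_i h * phi(x_i)^2 = 1


def mercer_truncation(N):
    """Truncated Mercer sum k_N(x_i, x_j) = sum_{iota <= N} lam_iota phi_iota(x_i) phi_iota(x_j)."""
    return (phi[:, :N] * lam[:N]) @ phi[:, :N].T


for N in (2, 5, 10, 20, m):
    print(N, np.abs(mercer_truncation(N) - Kg).max())
```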
Eigendecomposition in Practice
Empirical procedure for the eigendecomposition of the kernel:
◮ Given a dataset $X = \{x_i\}_{i=1}^{n}$, the integral operator is approximated by the Gram matrix $K_{ij} = k(x_i, x_j)$.
◮ The feature map is learned via the eigendecomposition
$$K = U \Lambda U^T \tag{3}$$
◮ This eigendecomposition is the heart of the kernel PCA procedure. It also applies to many other methods; we examine Gaussian process regression and diffusion maps.
Issues (for n points, keeping r eigenvectors):
◮ $O(n^3)$ training complexity.
◮ $O(nr)$ testing (mapping) complexity.
◮ $O(n^2)$ space complexity.
Not scalable in either training or testing.
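Below is a minimal sketch of this empirical procedure. It assumes a Gaussian kernel with unit bandwidth, toy Gaussian data, and r = 10 retained components, all of which are hypothetical. For simplicity it skips the kernel-centering step that kernel PCA normally applies. The training features are the rows of $U_r \Lambda_r^{1/2}$. A new point is mapped with the Nyström-style extension $\psi_\iota(x) = \lambda_\iota^{-1/2} \sum_i U_{i\iota}\, k(x, x_i)$. That extension needs n kernel evaluations and an n × r product, which matches the O(nr) testing cost listed above.

```python
import numpy as np


def rbf(A, B, sigma=1.0):
    """Gaussian kernel matrix exp(-||a - b||^2 / (2 sigma^2)); bandwidth is an assumption."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


rng = np.random.default_rng(1)
X = rng.standard_normal((200, 2))   # hypothetical training set, n = 200
n, r = X.shape[0], 10               # keep the r leading components

K = rbf(X, X)                       # Gram matrix: O(n^2) memory
evals, evecs = np.linalg.eigh(K)    # full eigendecomposition: O(n^3) time
lam = evals[::-1][:r]               # eigh sorts ascending; take the r largest
U = evecs[:, ::-1][:, :r]

Psi_train = U * np.sqrt(lam)        # rows are psi(x_i) restricted to r components


def embed(X_new):
    """Map new points: psi_l(x) = lam_l^(-1/2) * sum_i U[i, l] * k(x, x_i)."""
    return rbf(X_new, X) @ U / np.sqrt(lam)   # n kernel evaluations + O(n r) per point


X_test = rng.standard_normal((5, 2))
print(embed(X_test).shape)                   # (5, 10)
print(np.abs(embed(X) - Psi_train).max())    # ~0: the extension agrees on training points
```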
Speeding Up Training
Matrix sparsification [GOLUB:1996]
Compute K; if the matrix is sparse, use techniques such as Jacobi, Arnoldi, or Hebbian methods to obtain a low-rank approximation.
Issues: very accurate, but requires the kernel matrix, is iterative, and is unsuitable for dense matrices.
Random projections [AILON:2006, LIBERTY:2009]
Compute K, project it onto a lower-dimensional subspace using a random matrix, compute an ONB using the SVD, and use it to obtain a low-rank approximation.
Issues: very accurate approximations in practice, but the kernel matrix must still be computed and the cost is superlinear in n.
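One common way to realize the random-projection idea is sketched below. The sketch assumes a Gaussian test matrix with 10 extra (oversampling) columns, a Gaussian kernel with bandwidth 0.2, and 1-D toy data; none of these choices come from [AILON:2006] or [LIBERTY:2009]. The SVD of the sketch $Y = K\Omega$ supplies the ONB $Q$, and the small matrix $Q^\top K Q$ is then eigendecomposed to recover approximate eigenpairs. The full kernel matrix is still formed, and the product $K\Omega$ costs $O(n^2 (r + p))$ for oversampling $p$. This is consistent with the issues listed above.

```python
import numpy as np


def rbf(A, B, sigma=0.2):
    """Gaussian kernel matrix exp(-||a - b||^2 / (2 sigma^2)); bandwidth is an assumption."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def randomized_low_rank(K, r, oversample=10, seed=0):
    """Approximate top-r eigenpairs of a symmetric PSD matrix K by random projection."""
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((K.shape[0], r + oversample))  # random test matrix
    Y = K @ Omega                      # sketch of range(K): O(n^2 (r + p)) work
    Q, _, _ = np.linalg.svd(Y, full_matrices=False)  # ONB for range(Y)
    B = Q.T @ K @ Q                    # small (r + p) x (r + p) projected matrix
    evals, V = np.linalg.eigh(B)
    top = np.argsort(evals)[::-1][:r]  # keep the r largest
    return Q @ V[:, top], evals[top]


rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, size=(300, 1))  # hypothetical 1-D toy data
K = rbf(X, X)                            # the full kernel matrix is still required
U_approx, lam_approx = randomized_low_rank(K, r=20)
K_approx = (U_approx * lam_approx) @ U_approx.T
rel_err = np.linalg.norm(K - K_approx) / np.linalg.norm(K)
print("relative Frobenius error:", rel_err)
```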
Speeding Up Training (continued)
Sampling-based approaches [HAR-PELED:2006, BOUTSIDIS:2009, TALWALKAR:2010]
Compute K, sample r columns using one of several schemes, and compute a low-rank approximation.
Issues: accurate approximations in practice, but the kernel matrix must still be computed.
Nyström method (CUR decomposition) [DRINEAS:2005, ZHANG:2009]
Sample r points from the dataset and use them to compute a low-rank approximation. Sampling approaches include uniform random sampling and k-means.
Pros: avoids computing the full n × n kernel matrix and tries to approximate the eigenfunctions of the kernel. A study [TALWALKAR:2010] has shown the method to be effective on a wide variety of real-world learning problems.
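Here is a minimal Nyström sketch with uniformly sampled landmarks. It assumes a Gaussian kernel with bandwidth 0.2 and 1-D toy data, both hypothetical choices. Only two kernel blocks are formed: the n × r block $C$ between the data and the landmarks, and the r × r block $W$ among the landmarks. The eigenvectors of $W$ are extended to all n points, which yields $\tilde K = C W^{+} C^\top$. Replacing the `rng.choice` line with k-means centers would give the k-means variant mentioned above. The full kernel matrix appears at the end only to measure the approximation error.

```python
import numpy as np


def rbf(A, B, sigma=0.2):
    """Gaussian kernel matrix exp(-||a - b||^2 / (2 sigma^2)); bandwidth is an assumption."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def nystrom(X, r, seed=0, tol=1e-8):
    """Nystrom approximation from r uniformly sampled landmark points."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    idx = rng.choice(n, size=r, replace=False)  # uniform sampling without replacement
    Z = X[idx]
    C = rbf(X, Z)                   # n x r block: the only data-landmark kernel values needed
    W = rbf(Z, Z)                   # r x r block among the landmarks
    w_evals, w_vecs = np.linalg.eigh(W)
    keep = w_evals > tol * w_evals.max()  # drop numerically null directions (pseudo-inverse)
    w_evals, w_vecs = w_evals[keep], w_vecs[:, keep]
    lam = (n / r) * w_evals                              # approximate eigenvalues of K
    U = np.sqrt(r / n) * (C @ w_vecs) / w_evals          # extended approximate eigenvectors
    return U, lam


rng = np.random.default_rng(3)
X = rng.uniform(0.0, 1.0, size=(1000, 1))  # hypothetical 1-D toy data
U, lam = nystrom(X, r=30)
K_nys = (U * lam) @ U.T                    # equals C W^+ C^T on the kept directions

# For checking only: form the full kernel matrix that the method itself avoids.
K_full = rbf(X, X)
rel_err = np.linalg.norm(K_full - K_nys) / np.linalg.norm(K_full)
print("relative Frobenius error:", rel_err)
```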