Reduced-Set Models for Improving the Training and Execution Speed of Kernel Methods

Reduced-Set Models for Improving the Training and Execution Speed of Kernel Methods - PowerPoint PPT Presentation

Presentation by Hassan A. Kingravi (IVALab), covering an introduction to kernel methods, reduced-set kernel PCA (RSKPCA) basics and applications, GP-MRAC, and conclusions.


  1. Title: Reduced-Set Models for Improving the Training and Execution Speed of Kernel Methods. Hassan A. Kingravi, IVALab.

  2–5. Outline (repeated over four build slides alongside illustrative plots, whose axis ticks are omitted here):
     1 Introduction: Kernel Methods; Speeding Up Training; Speeding Up Testing; Research Objectives
     2 RSKPCA: Basics: Reduced-Set Models for the Batch Case; RSKPCA Results
     3 RSKPCA: Applications: RSKPCA Applications; Gaussian Process Regression; Diffusion Maps; RSKPCA Applications: Results
     4 GP-MRAC: Reduced-Set Models for the Online Case; GP-MRAC Results
     5 Conclusion: Conclusions and Future Work
     6 Questions

  6. General Questions.
     Observations: nonparametric methods are powerful because they use all possible data; nonparametric methods are slow because they use all possible data.
     Questions: For a given class of nonparametric methods (kernel methods), is all the data necessary? How can we intelligently discard data? How do we prove these procedures are well founded, in a deterministic fashion?

  7. Kernel Methods. Kernel methods (machines) are a class of machine learning algorithms used to convert linear algorithms into nonlinear ones through a feature map.
     Linear learning algorithms:
     Classification: linear perceptron, linear support vector machine (SVM).
     Regression: Bayesian linear regression.
     Dimensionality reduction: principal components analysis (PCA).

  8. Linear Algorithm Example. Linear algorithms are limited if the data has nonlinear structure in the input space. The example tries to separate the data using a linear perceptron; there is no feasible solution in the input space.
     Figures: (a) Original data; (b) Perceptron solution (in green).

  9–12. Feature Maps (built up over four slides). Feature maps offer a possible solution: map the data using a feature map ψ into a higher-dimensional feature space H. If (x, y) ↦ (x², y², √2·xy), then the data become linearly separable; a minimal sketch of this map follows below.
     Figures: (a) Original data; (b) Mapped data; (c) Perceptron (green).
     Question: How are feature maps generated?
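The explicit map on these slides can be checked numerically. Below is a minimal NumPy sketch (not from the presentation; the ring radii, sample counts, and the hand-picked separating hyperplane are illustrative assumptions) showing that two concentric rings, which no linear perceptron can separate in the input space, become linearly separable after applying ψ(x, y) = (x², y², √2·xy).

```python
import numpy as np

# Two concentric rings: not linearly separable in the input space (x, y).
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 200)
radius = np.concatenate([np.full(100, 0.5), np.full(100, 1.5)])   # inner / outer ring
X = np.column_stack([radius * np.cos(theta), radius * np.sin(theta)])
labels = np.concatenate([-np.ones(100), np.ones(100)])

def psi(X):
    """Explicit feature map (x, y) -> (x^2, y^2, sqrt(2)*x*y)."""
    x, y = X[:, 0], X[:, 1]
    return np.column_stack([x**2, y**2, np.sqrt(2) * x * y])

# In feature space, x^2 + y^2 equals the squared ring radius, so one plane separates the classes.
Z = psi(X)
w = np.array([1.0, 1.0, 0.0])                 # hand-picked separating hyperplane
b = -(0.5**2 + 1.5**2) / 2.0
predictions = np.sign(Z @ w + b)
print("linearly separable in feature space:", bool(np.all(predictions == labels)))
```

Any plane of the form x² + y² = c, with c between the two squared radii, separates the mapped rings, which is what a perceptron run in feature space would find.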

  13–14. Kernels and Feature Maps.
     Issues with feature maps: difficult to design by hand; many problems require arbitrary degrees of freedom.
     Solution: kernelization.
     If k : Ω × Ω → R is a positive-definite symmetric kernel function, then
         k(x, y) = ⟨ψ(x), ψ(y)⟩_H.    (1)
     The map is obtained from the operator K : L²(Ω) → L²(Ω),
         (K f)(x) := ∫_Ω k(x, y) f(y) dy.    (2)

  15. Kernels and Feature Maps (continued). Solution: kernelization.
     Mercer's theorem: the eigendecomposition of the operator yields eigenpairs (λ_ι, φ_ι), ι = 1, …, N, with the φ_ι forming an orthonormal basis (ONB) of L²(Ω).
     Kernel: k(x, y) = Σ_{ι=1}^{N} λ_ι φ_ι(x) φ_ι(y), where the number of terms N may be finite or infinite.
     Feature map: ψ(x) := (√λ₁ φ₁(x), √λ₂ φ₂(x), …), so that k(x, y) = ⟨ψ(x), ψ(y)⟩_H.
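Equation (1) can be made concrete with the map from the earlier slides. The degree-2 homogeneous polynomial kernel k(u, v) = ⟨u, v⟩² is not named in the presentation, but it is the kernel whose feature map is (x, y) ↦ (x², y², √2·xy); the short sketch below (illustrative only) verifies the identity numerically.

```python
import numpy as np

def psi(v):
    """Explicit map (x, y) -> (x^2, y^2, sqrt(2)*x*y) from the earlier slides."""
    x, y = v
    return np.array([x**2, y**2, np.sqrt(2) * x * y])

def k(u, v):
    """Degree-2 homogeneous polynomial kernel: k(u, v) = <u, v>^2."""
    return np.dot(u, v) ** 2

u = np.array([0.3, -1.2])
v = np.array([2.0, 0.7])
lhs = k(u, v)                     # kernel evaluated directly in the input space
rhs = np.dot(psi(u), psi(v))      # inner product in the feature space H
print(np.isclose(lhs, rhs))       # True: k(u, v) = <psi(u), psi(v)>_H, as in Eq. (1)
```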

  16–17. Eigendecomposition in Practice.
     Empirical procedure for the eigendecomposition of the kernel: given a dataset X = {x_i}_{i=1}^{n}, the integral operator is approximated by the Gram matrix K_ij = k(x_i, x_j). The feature map is learned via the eigendecomposition K = U Λ Uᵀ.    (1)
     This eigendecomposition is the heart of the kernel PCA procedure and applies to many other methods; we examine Gaussian process regression and diffusion maps.
     Issues: with n points, training complexity is O(n³), testing (mapping) complexity is O(nr), and space complexity is O(n²). Not scalable in either training or testing.
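A minimal sketch of this empirical procedure, under illustrative assumptions not taken from the slides (a Gaussian RBF kernel with bandwidth σ = 1, n = 500 random points, r = 10 retained eigenpairs, and no centering of the Gram matrix, which full kernel PCA would also perform):

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    """Gaussian RBF kernel: K_ij = exp(-||x_i - y_j||^2 / (2 sigma^2))."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq / (2 * sigma**2))

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))             # n = 500 training points

K = rbf_kernel(X, X)                          # Gram matrix: O(n^2) storage
lam, U = np.linalg.eigh(K)                    # K = U diag(lam) U^T: O(n^3) training cost
order = np.argsort(lam)[::-1][:10]            # keep the r = 10 leading eigenpairs
lam_r, U_r = lam[order], U[:, order]

# Mapping a new point costs O(nr): it needs k(x, x_i) against all n training points.
x_new = rng.standard_normal((1, 3))
k_new = rbf_kernel(x_new, X)                  # shape (1, n)
coords = (k_new @ U_r) / np.sqrt(lam_r)       # (uncentered) kernel PCA coordinates
print(coords.shape)                           # (1, 10)
```

The three costs on the slide show up directly: forming K is the O(n²) space term, the dense eigendecomposition is the O(n³) training term, and projecting a new point requires kernel evaluations against all n training points, the O(nr) testing term.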

  18. Speeding Up Training.
     Matrix sparsification [GOLUB:1996]: compute K; if the matrix is sparse, use techniques such as Jacobi, Arnoldi, or Hebbian iterations to obtain a low-rank approximation. Issues: very accurate, but requires the kernel matrix, is iterative, and is unsuitable for dense matrices.
     Random projections [AILON:2006, LIBERTY:2009]: compute K, project it onto a lower-dimensional subspace using a random matrix, compute an ONB via the SVD, and use it to obtain a low-rank approximation. Issues: very accurate approximations in practice, but the kernel matrix must still be computed and the cost is superlinear in n.
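One common variant of the random-projection idea is a randomized range finder: sketch K with a random test matrix, orthonormalize the sketch, and take an SVD of the resulting small matrix. The code below is an illustrative sketch under assumptions not stated on the slide (the function name, the oversampling amount, and the toy RBF Gram matrix); note that K must still be formed, which is why this family of methods stays superlinear in n.

```python
import numpy as np

def randomized_low_rank(K, r, oversample=10, seed=0):
    """Random-projection low-rank approximation of a precomputed kernel matrix K."""
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((K.shape[0], r + oversample))  # random test matrix
    Q, _ = np.linalg.qr(K @ Omega)            # orthonormal basis for the sketched range
    B = Q.T @ K                               # small (r + oversample) x n matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return Q @ Ub[:, :r], s[:r], Vt[:r]       # rank-r factors: K ~= U diag(s) Vt

# Illustrative usage: K is formed explicitly first.
rng = np.random.default_rng(1)
X = rng.standard_normal((300, 2))
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq_dists / 2.0)                   # RBF Gram matrix, bandwidth 1
U, s, Vt = randomized_low_rank(K, r=20)
err = np.linalg.norm(K - U @ np.diag(s) @ Vt) / np.linalg.norm(K)
print(f"relative approximation error: {err:.3e}")
```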

  19. Speeding Up Training (continued).
     Sampling-based approaches [HAR-PELED:2006, BOUTSIDIS:2009, TALWALKAR:2010]: compute K, sample r columns using various schemes, and compute a low-rank approximation. Issues: accurate approximations in practice, but the kernel matrix still needs to be computed.
     Nyström method (CUR decomposition) [DRINEAS:2005, ZHANG:2009]: sample r points from the dataset and use them to compute a low-rank approximation. Approaches include uniform random sampling and k-means. Pros: avoids computing the full kernel matrix; tries to approximate the eigenfunctions of the kernel; a study [TALWALKAR:2010] has shown the effectiveness of the method on a wide variety of real-world learning problems.
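A minimal sketch of the Nyström approximation with uniform random sampling (the function names, RBF bandwidth, and landmark count r = 50 are illustrative assumptions, not from the slides): only the n×r block C and the r×r landmark block W are evaluated, and K ≈ C W⁺ Cᵀ, so the full kernel matrix is never formed.

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    """Gaussian RBF kernel matrix between the rows of X and Y."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq / (2 * sigma**2))

def nystrom(X, r, sigma=1.0, seed=0):
    """Nystrom approximation K ~= C W^+ C^T from r uniformly sampled landmark points."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=r, replace=False)   # uniform random sampling
    landmarks = X[idx]
    C = rbf_kernel(X, landmarks, sigma)      # n x r block: only r columns of K are evaluated
    W = rbf_kernel(landmarks, landmarks, sigma)  # r x r block among the landmarks
    return C, np.linalg.pinv(W)

rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 3))
C, W_pinv = nystrom(X, r=50)
K_approx = C @ W_pinv @ C.T                  # low-rank surrogate for the full Gram matrix
# The exact comparison below is only feasible because n is small in this toy example.
K_exact = rbf_kernel(X, X)
print(np.linalg.norm(K_exact - K_approx) / np.linalg.norm(K_exact))
```

Replacing the uniform sample with k-means centers, the other sampling scheme mentioned on the slide, often improves how well the landmarks capture the kernel's eigenfunctions.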
