SLIDE 1

Using a Hilbert-Schmidt SVD for Stable Kernel Computations

Greg Fasshauer, Mike McCourt, Roberto Cavoretto

Department of Applied Mathematics, Illinois Institute of Technology
Partially supported by NSF Grant DMS-1115392

MAIA 2013: Multivariate Approximation and Interpolation with Applications, Erice, Sicily, Sept. 25, 2013

SLIDE 2

Outline

1. Fundamental Problem
2. Hilbert-Schmidt SVD and General RBF-QR Algorithm
3. Implementation for Compact Matérn Kernels
4. Application 1: Basic Function Approximation
5. Application 2: Optimal Shape Parameters via MLE
6. Summary


SLIDE 4

Fundamental Problem

Kernel-based Interpolation

Given data $(x_i, y_i)_{i=1}^N$, use a data-dependent linear function space
\[ s(x) = \sum_{j=1}^{N} c_j K(x, x_j), \qquad x \in \Omega \subseteq \mathbb{R}^d, \]
with $K : \Omega \times \Omega \to \mathbb{R}$ a positive definite reproducing kernel. To find the $c_j$ we solve the interpolation equations $s(x_i) = y_i$, $i = 1, \ldots, N$, which leads to a linear system $\mathsf{K} c = y$ with symmetric positive definite, but often ill-conditioned, system matrix $\mathsf{K}_{ij} = K(x_i, x_j)$, $i, j = 1, \ldots, N$.
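For orientation, a minimal MATLAB sketch of this direct approach (the Gaussian kernel choice and the test data are illustrative assumptions, not part of the talk):

% Direct kernel interpolation: build K, solve K*c = y, evaluate s(x).
ep = 3;                                % shape parameter
K  = @(x,z) exp(-ep^2*(x - z.').^2);   % Gaussian kernel matrix on R
x  = linspace(0,1,21).';               % data sites
y  = sin(2*pi*x);                      % data values
c  = K(x,x)\y;                         % can be severely ill-conditioned!
xx = linspace(0,1,200).';
s  = K(xx,x)*c;                        % interpolant values s(xx)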


SLIDE 6

Fundamental Problem

Common Complaints About Kernels

Kernel methods suffer from
- numerical instability,
- the presence of free parameter(s),
- high computational cost.

In this talk we will address the first two issues:
- We obtain stable methods by working with a “better” basis, which leads to a Hilbert-Schmidt SVD of the matrix $\mathsf{K}$.
- Free parameters can be “optimally” chosen by using statistical methods such as MLE, which are significantly enhanced by using the HS-SVD.

SLIDE 7

Hilbert-Schmidt SVD and General RBF-QR Algorithm

Hilbert-Schmidt Theory

We assume that we know a Hilbert-Schmidt expansion (or Mercer series expansion) of our kernel $K$:
\[ K(x, z) = \sum_{n=1}^{\infty} \lambda_n \varphi_n(x) \varphi_n(z), \]
where $(\lambda_n, \varphi_n)$ are orthonormal eigenpairs of a Hilbert-Schmidt integral operator $T_K : L_2(\Omega, \rho) \to L_2(\Omega, \rho)$ defined as
\[ (T_K f)(x) = \int_{\Omega} K(x, z) f(z) \rho(z) \, dz, \]
where $\Omega \subset \mathbb{R}^d$ and $\|K\|_{L_2(\Omega \times \Omega, \rho \times \rho)} < \infty$.


SLIDE 9

Hilbert-Schmidt SVD and General RBF-QR Algorithm

Gaussian Eigenfunctions [Rasmussen/Williams (2006), F./McCourt (2012)]

\[ e^{-\varepsilon^2 (x - z)^2} = \sum_{n=0}^{\infty} \lambda_n \varphi_n(x) \varphi_n(z), \]
where
\[ \lambda_n = \sqrt{\frac{\alpha^2}{\alpha^2 + \delta^2 + \varepsilon^2}} \left( \frac{\varepsilon^2}{\alpha^2 + \delta^2 + \varepsilon^2} \right)^{n}, \qquad \varphi_n(x) = \gamma_n e^{-\delta^2 x^2} H_n(\alpha \beta x), \]
with $H_n$ Hermite polynomials,
\[ \beta = \left( 1 + \left( \frac{2\varepsilon}{\alpha} \right)^2 \right)^{1/4}, \qquad \gamma_n = \sqrt{\frac{\beta}{2^n \Gamma(n+1)}}, \qquad \delta^2 = \frac{\alpha^2}{2} \left( \beta^2 - 1 \right), \]
and $\{\varphi_n\}_{n=0}^{\infty}$ ($\rho$-weighted) $L_2$-orthonormal, i.e.,
\[ \int_{-\infty}^{\infty} \varphi_m(x) \varphi_n(x) \rho(x) \, dx = \delta_{mn}, \qquad \rho(x) = \frac{\alpha}{\sqrt{\pi}} e^{-\alpha^2 x^2}. \]
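A minimal MATLAB sketch evaluating these eigenpairs (the three-term Hermite recurrence and the choice of alpha below are illustrative assumptions):

% Evaluate the first M Gaussian eigenpairs at column vector x (sketch).
ep = 1; alpha = 1; M = 10;                  % alpha is the free scale parameter
beta   = (1 + (2*ep/alpha)^2)^(1/4);
delta2 = alpha^2/2*(beta^2 - 1);
n = 0:M-1;
lambda = sqrt(alpha^2/(alpha^2+delta2+ep^2)) ...
       * (ep^2/(alpha^2+delta2+ep^2)).^n;   % eigenvalues lambda_n
x = linspace(-1,1,11).';
t = alpha*beta*x;
H = ones(numel(x),M);                       % columns hold H_0, H_1, ...
if M > 1, H(:,2) = 2*t; end
for k = 3:M          % recurrence H_{n+1}(t) = 2t*H_n(t) - 2n*H_{n-1}(t)
    H(:,k) = 2*t.*H(:,k-1) - 2*(k-2)*H(:,k-2);
end
gam = sqrt(beta./(2.^n.*factorial(n)));     % gamma_n, using Gamma(n+1) = n!
Phi = exp(-delta2*x.^2).*H.*gam;            % N-by-M matrix of phi_n(x)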


SLIDE 12

Hilbert-Schmidt SVD and General RBF-QR Algorithm

Multivariate Eigenfunction Expansion

Use the tensor product form of the Gaussian kernel:
\[ K(\mathbf{x}, \mathbf{z}) = e^{-\varepsilon^2 \|\mathbf{x} - \mathbf{z}\|_2^2} = e^{-\sum_{\ell=1}^{d} \varepsilon_\ell^2 (x_\ell - z_\ell)^2} = \prod_{\ell=1}^{d} e^{-\varepsilon_\ell^2 (x_\ell - z_\ell)^2} = \sum_{\mathbf{n} \in \mathbb{N}^d} \lambda_{\mathbf{n}} \varphi_{\mathbf{n}}(\mathbf{x}) \varphi_{\mathbf{n}}(\mathbf{z}), \qquad \mathbf{x} = (x_1, \ldots, x_d) \in \mathbb{R}^d, \]
where
\[ \lambda_{\mathbf{n}} = \prod_{\ell=1}^{d} \lambda_{n_\ell}, \qquad \varphi_{\mathbf{n}}(\mathbf{x}) = \prod_{\ell=1}^{d} \varphi_{n_\ell}(x_\ell). \]
Different shape parameters $\varepsilon_\ell$ for different space dimensions are allowed (i.e., $K$ may be anisotropic).
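A sketch of how this product structure translates into code; gauss_eig, returning a 1D eigenvalue and an eigenfunction handle, is a hypothetical helper wrapping the formulas from the previous slide:

% Tensor-product eigenpair for a multi-index nidx = (n_1,...,n_d) (sketch).
% gauss_eig(n,ep,alpha) is a hypothetical helper returning the 1D pair.
function [lam, phi] = tensor_eig(nidx, ep, alpha)
    d = numel(nidx);
    lam = 1;  phi1d = cell(1,d);
    for l = 1:d
        [lam_l, phi1d{l}] = gauss_eig(nidx(l), ep(l), alpha(l));
        lam = lam*lam_l;              % lambda_n = prod_l lambda_{n_l}
    end
    % phi_n(x) = prod_l phi_{n_l}(x_l), for a single point x (1-by-d)
    phi = @(x) prod(arrayfun(@(l) phi1d{l}(x(l)), 1:d));
end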


SLIDE 15

Hilbert-Schmidt SVD and General RBF-QR Algorithm

Fundamental idea: use the eigen-expansion of the kernel $K$ to rewrite the matrix $\mathsf{K}$ from the interpolation problem. We can't compute with infinite matrices, so we choose a truncation value $M$ (supported by $\lambda_n \to 0$ as $n \to \infty$, more later) and rewrite
\[ \mathsf{K} = \begin{pmatrix} K(x_1, x_1) & \cdots & K(x_1, x_N) \\ \vdots & & \vdots \\ K(x_N, x_1) & \cdots & K(x_N, x_N) \end{pmatrix} = \underbrace{\begin{pmatrix} \varphi_1(x_1) & \cdots & \varphi_M(x_1) \\ \vdots & & \vdots \\ \varphi_1(x_N) & \cdots & \varphi_M(x_N) \end{pmatrix}}_{=\Phi} \underbrace{\begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_M \end{pmatrix}}_{=\Lambda} \underbrace{\begin{pmatrix} \varphi_1(x_1) & \cdots & \varphi_1(x_N) \\ \vdots & & \vdots \\ \varphi_M(x_1) & \cdots & \varphi_M(x_N) \end{pmatrix}}_{=\Phi^T}. \]
Since
\[ K(x_i, x_j) = \sum_{n=1}^{\infty} \lambda_n \varphi_n(x_i) \varphi_n(x_j) \approx \sum_{n=1}^{M} \lambda_n \varphi_n(x_i) \varphi_n(x_j), \]
accurate reconstruction of all entries of $\mathsf{K}$ will likely require $M > N$.


SLIDE 19

Hilbert-Schmidt SVD and General RBF-QR Algorithm

The matrix $\mathsf{K}$ is often ill-conditioned, so forming $\mathsf{K}$ and computing with it is not a good idea. The eigen-decomposition $\mathsf{K} = \Phi \Lambda \Phi^T$ provides an accurate (elementwise) approximation of $\mathsf{K}$ without ever forming it. However, it is not recommended to use this decomposition directly either, since all of the ill-conditioning associated with $\mathsf{K}$ is still present, sitting in the matrix $\Lambda$. We now use mostly standard numerical linear algebra to isolate some of this ill-conditioning and develop the Hilbert-Schmidt SVD and a general RBF-QR algorithm.

slide-23
SLIDE 23

Hilbert-Schmidt SVD and General RBF-QR Algorithm

Details of the Hilbert-Schmidt SVD

Assume M > N, so that Φ is “short and fat” and partition Φ:    ϕ1(x1) . . . ϕN(x1) ϕN+1(x1) . . . ϕM(x1) . . . . . . . . . . . . ϕ1(xN) . . . ϕN(xN) ϕN+1(xN) . . . ϕM(xN)    =     Φ1

  • N×N

Φ2

  • N×(M−N)

    . Then K = ΦΛΦT = Φ Λ1 Λ2 ΦT

1

ΦT

2

  • =

Φ

  • IN

Λ2ΦT

2 Φ−T 1 Λ−1 1

  • = Ψ

Λ1ΦT

1 =M

Greg Fasshauer Hilbert-Schmidt SVD 10
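A minimal MATLAB sketch of this factorization, assuming Phi (N-by-M, M > N) and a vector lambda of eigenvalues are already in hand; the explicit triangular solve here is for exposition, and the QR-stabilized version appears below:

% Build Psi from the blocks of the truncated eigen-decomposition (sketch).
[N, M] = size(Phi);
Phi1 = Phi(:,1:N);   Phi2 = Phi(:,N+1:M);
Lam1 = diag(lambda(1:N));   Lam2 = diag(lambda(N+1:M));
Correction = Lam2*((Phi1\Phi2)')/Lam1;  % Lambda2 * Phi2' * Phi1^{-T} * Lambda1^{-1}
Psi = Phi*[eye(N); Correction];         % now K = Psi * Lam1 * Phi1'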


SLIDE 27

Hilbert-Schmidt SVD and General RBF-QR Algorithm

There are at least two ways to interpret the Hilbert-Schmidt SVD $\mathsf{K} = \Psi \Lambda_1 \Phi_1^T$:

- We've found an invertible $\mathsf{M} = \Lambda_1 \Phi_1^T$ such that $\Psi = \mathsf{K} \mathsf{M}^{-1}$ is better conditioned than $\mathsf{K}$ (“better basis”).
- We've diagonalized the matrix $\mathsf{K}$, i.e., $\mathsf{K} = \Psi \Lambda_1 \Phi_1^T$, where $\Lambda_1$ is a diagonal matrix of Hilbert-Schmidt singular values, and $\Psi$ and $\Phi_1$ are matrices generated by orthogonal eigenfunctions (but not orthogonal matrices).

Remark: The matrix $\Psi$ is the same for both interpretations. It can be computed stably, and we get a well-conditioned linear system $\Psi b = y$ (where $b = \mathsf{M} c$) for the interpolation problem.


SLIDE 30

Hilbert-Schmidt SVD and General RBF-QR Algorithm

Taking a closer look at the matrix $\Psi$, we see that
\[ \Psi = \begin{pmatrix} \Phi_1 & \Phi_2 \end{pmatrix} \begin{pmatrix} \mathsf{I}_N \\ \Lambda_2 \Phi_2^T \Phi_1^{-T} \Lambda_1^{-1} \end{pmatrix} = \Phi_1 + \Phi_2 \Lambda_2 \Phi_2^T \Phi_1^{-T} \Lambda_1^{-1}. \]
We can interpret this as having a new basis $\psi(\cdot)^T = (\psi_1(\cdot), \ldots, \psi_N(\cdot))$ for the interpolation space $\operatorname{span}\{K(\cdot, x_1), \ldots, K(\cdot, x_N)\}$, consisting of the appropriately corrected first $N$ eigenfunctions: if we let $\phi(\cdot)^T = (\varphi_1(\cdot), \ldots, \varphi_N(\cdot), \varphi_{N+1}(\cdot), \ldots, \varphi_M(\cdot))$, then we can rewrite our kernel basis using the Hilbert-Schmidt SVD as
\[ k(x)^T = \phi(x)^T \begin{pmatrix} \mathsf{I}_N \\ \Lambda_2 \Phi_2^T \Phi_1^{-T} \Lambda_1^{-1} \end{pmatrix} \Lambda_1 \Phi_1^T = \psi(x)^T \Lambda_1 \Phi_1^T. \]
The data dependence of the new basis is captured by the “correction” term. The new basis is more stable since we have removed $\Lambda_1$.


SLIDE 32

Hilbert-Schmidt SVD and General RBF-QR Algorithm

The QR in RBF-QR

Additional stability in the computation of the correction matrix $\Lambda_2 \Phi_2^T \Phi_1^{-T} \Lambda_1^{-1}$, in particular in the formation of $\Phi_2^T \Phi_1^{-T}$, is achieved via a QR decomposition of $\Phi$, i.e.,
\[ \begin{pmatrix} \Phi_1 & \Phi_2 \end{pmatrix} = \mathsf{Q} \begin{pmatrix} \underbrace{\mathsf{R}_1}_{N \times N} & \underbrace{\mathsf{R}_2}_{N \times (M-N)} \end{pmatrix}, \]
with orthogonal $N \times N$ matrix $\mathsf{Q}$ and upper triangular matrix $\mathsf{R}_1$. Then we have
\[ \Phi_2^T \Phi_1^{-T} = \mathsf{R}_2^T \mathsf{Q}^T \mathsf{Q} \mathsf{R}_1^{-T} = \mathsf{R}_2^T \mathsf{R}_1^{-T}. \]
This idea appeared in [Fornberg/Piret (2008)].
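In MATLAB this replaces the explicit inverse in the sketch above by triangular solves (an illustrative continuation of that sketch):

% QR-stabilized correction matrix (sketch; Lam1, Lam2, Phi as above).
[Q, R] = qr(Phi);                   % Phi = Q*[R1 R2], Q orthogonal N-by-N
R1 = R(:,1:N);   R2 = R(:,N+1:M);
Correction = Lam2*((R1\R2)')/Lam1;  % = Lambda2 * R2' * R1^{-T} * Lambda1^{-1}
Psi = Phi*[eye(N); Correction];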

SLIDE 33

Hilbert-Schmidt SVD and General RBF-QR Algorithm

Summary of Method

Instead of solving the “original” interpolation problem with ill-conditioned matrix $\mathsf{K}$,
\[ \mathsf{K} c = y, \]
leading to inaccurate coefficients which then need to be multiplied against poorly conditioned basis functions, we now solve
\[ \Psi b = y \]
for a new set of coefficients, which we then evaluate via
\[ s(x) = \sum_{j=1}^{N} b_j \psi_j(x), \]
i.e., using the new basis.


SLIDE 36

Implementation for Compact Matérn Kernels

General Implementation

It is crucial to know the Hilbert-Schmidt expansion of $K$:
\[ K(x, z) = \sum_{n=1}^{\infty} \lambda_n \varphi_n(x) \varphi_n(z). \]
The multivariate Gaussian kernels mentioned earlier were used
- in [F./Hickernell/Woźniakowski (2012)] to prove dimension-independent convergence rates,
- in [F./McCourt (2012)] to obtain and implement a stable GaussQR algorithm.

We now discuss the implementation for generalizations of the Brownian bridge kernel
\[ K(x, z) = \min(x, z) - xz, \qquad x, z \in [0, 1], \]
which we call compact Matérn kernels [Cavoretto/F./McCourt (2013)].


SLIDE 38

Implementation for Compact Matérn Kernels

We define compact Matérn kernels as Green's kernels of
\[ \left( -\frac{d^2}{dx^2} + \varepsilon^2 \mathsf{I} \right)^{\beta} K(x, z) = \delta(x - z), \qquad x, z \in [0, 1], \quad \beta \in \mathbb{N}, \ \varepsilon \ge 0, \]
subject to
\[ \frac{d^{2\nu}}{dx^{2\nu}} K(0, z) = \frac{d^{2\nu}}{dx^{2\nu}} K(1, z) = 0, \qquad \nu = 0, \ldots, \beta - 1. \]
The Hilbert-Schmidt expansion for compact Matérn kernels is
\[ K_{\beta,\varepsilon}(x, z) = \sum_{n=1}^{\infty} \frac{2}{(n^2 \pi^2 + \varepsilon^2)^{\beta}} \sin(n\pi x) \sin(n\pi z), \]
i.e., the eigenvalues and eigenfunctions are
\[ \lambda_n = \frac{1}{(n^2 \pi^2 + \varepsilon^2)^{\beta}}, \qquad \varphi_n(x) = \sqrt{2} \sin(n\pi x). \tag{1} \]
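As a quick sanity check of (1): for β = 1 and ε = 0 the series must reproduce the Brownian bridge kernel min(x, z) − xz. A MATLAB sketch (the test point and truncation level are arbitrary choices):

% Truncated series (1) with beta = 1, ep = 0 vs. the Brownian bridge kernel.
beta = 1; ep = 0; M = 5000;
x = 0.3; z = 0.7; n = 1:M;
Kser = sum(2./(n.^2*pi^2 + ep^2).^beta .* sin(n*pi*x).*sin(n*pi*z));
Kbb  = min(x,z) - x*z;            % exact Brownian bridge kernel
disp(abs(Kser - Kbb))             % truncation error, here O(1/M)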


SLIDE 41

Implementation for Compact Matérn Kernels

Clearly, the eigenfunctions are bounded by $\sqrt{2}$ and, for a fixed value of $\varepsilon$, the eigenvalues decay as $n^{-2\beta}$. Therefore the truncation length $M$ needed for an accurate representation of the entries of $\mathsf{K}$ can be easily determined as a function of $\beta$ and $\varepsilon$: to ensure that we keep the first $M$ significant terms, we take $M$ such that
\[ \lambda_M < \epsilon_{\mathrm{mach}} \lambda_N, \qquad M > N. \]
Using the explicit representation of the eigenvalues, we solve for $M$:
\[ M(\beta, \varepsilon; \epsilon_{\mathrm{mach}}) = \left\lceil \frac{1}{\pi} \sqrt{\epsilon_{\mathrm{mach}}^{-1/\beta} \left( N^2 \pi^2 + \varepsilon^2 \right) - \varepsilon^2} \right\rceil. \]
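For instance (illustrative numbers matching Application 1 below), in MATLAB with double-precision machine epsilon:

% Truncation length for the compact Matern HS-SVD.
N = 21; beta = 7; ep = 1;
M = ceil(1/pi*sqrt(eps^(-1/beta)*(N^2*pi^2 + ep^2) - ep^2))
% eps is MATLAB's machine epsilon; these values give M = 276.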

SLIDE 42

Implementation for Compact Matérn Kernels

Program (MaternQRSolve.m)

function yy = MaternQRSolve(x,y,ep,beta,xx)
% Stable compact Matern kernel interpolation via the HS-SVD.
% x, y: data sites and values; ep, beta: kernel parameters;
% xx: evaluation points.
phifunc = @(n,x) sqrt(2)*sin(pi*x*n);       % eigenfunctions from (1)
N = length(x);
M = ceil(1/pi*sqrt(eps^(-1/beta)*(N^2*pi^2+ep^2)-ep^2)); % truncation length
n = 1:M;
Lambda = diag(((n*pi).^2+ep^2).^(-beta));   % eigenvalues from (1)
Phi = phifunc(n,x);                         % N-by-M eigenfunction matrix
[Q,R] = qr(Phi);                            % QR step of RBF-QR
R1 = R(:,1:N); R2 = R(:,N+1:end);
Rhat = R1\R2;
Lambda1 = Lambda(1:N,1:N); Lambda2 = Lambda(N+1:M,N+1:M);
Rbar = Lambda2*Rhat'/Lambda1;               % correction matrix
Psi = Phi*[eye(N);Rbar];                    % new basis at the data sites
b = Psi\y;                                  % well-conditioned solve
Phi_eval = phifunc(n,xx);
yy = Phi_eval*[eye(N);Rbar]*b;              % evaluate s at xx in the new basis
end

SLIDE 43

Application 1: Basic Function Approximation

Standard RBF vs. MatérnQR Interpolation

We use $K_{\beta,\varepsilon}$ with $\beta = 7$ and $\varepsilon = 1$, and $N = 21$ uniform samples of
\[ f(x) = (1 - 4x)_+^{14} + (4x - 3)_+^{14}. \]
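A call mirroring this setup (the error plot is an illustrative addition, not the talk's figure):

% Application 1 setup with MaternQRSolve (sketch).
f  = @(x) max(1-4*x,0).^14 + max(4*x-3,0).^14;  % truncated-power test function
x  = linspace(0,1,21).';                        % N = 21 uniform samples
xx = linspace(0,1,400).';
yy = MaternQRSolve(x, f(x), 1, 7, xx);          % ep = 1, beta = 7
semilogy(xx, abs(yy - f(xx)))                   % pointwise interpolation error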

SLIDE 44

Application 2: Optimal Shape Parameters via MLE

Likelihood Functions for Gaussian Random Fields

Kernel-based interpolation has an analog in statistics called kriging. If, instead of trying to recover a function, we treat our scattered data as samples of one realization of a Gaussian random field, we can prescribe a positive definite kernel $K$ as the presumed covariance between realizations of the Gaussian random field. The likelihood function of a zero-mean Gaussian random field (the probability of the data $(x_i, y_i)_{i=1}^N$ given the kernel $K$ with shape parameters $\theta = (\varepsilon, \beta)$) is
\[ L(\theta; y) = p(y \mid \theta) = \frac{1}{\sqrt{(2\pi\sigma^2)^N \det(\mathsf{K})}} \exp\left( -\frac{1}{2\sigma^2} y^T \mathsf{K}^{-1} y \right), \]
where $\mathsf{K}$ is the kernel interpolation matrix from before and $\sigma^2$ is the process variance.


SLIDE 46

Application 2: Optimal Shape Parameters via MLE

Maximum Likelihood Estimation (MLE)

Usually we minimize the negative (concentrated) log-likelihood
\[ \mathcal{L}(\theta; y) = \frac{1}{N} \log\det(\mathsf{K}) + \log\underbrace{\left( y^T \mathsf{K}^{-1} y \right)}_{= Q(y)}. \]
This requires evaluating $\log\det(\mathsf{K})$ and $\log Q(y)$, which, given the ill-conditioning of $\mathsf{K}$, are both bound to cause trouble. Luckily, we have developed the Hilbert-Schmidt SVD $\mathsf{K} = \Psi \Lambda_1 \Phi_1^T$ to help in both cases.


SLIDE 48

Application 2: Optimal Shape Parameters via MLE

Computing log det(K)

We use the Hilbert-Schmidt SVD to write
\[ \det(\mathsf{K}) = \det(\Psi \Lambda_1 \Phi_1^T) = \det(\Psi) \det(\Lambda_1) \det(\Phi_1). \]
Evaluating $\det(\Lambda_1)$ can be done analytically, and $\det(\Psi)$ and $\det(\Phi_1)$ can be computed stably using standard techniques.
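One way to realize this in MATLAB (a sketch; the LU-based log-determinant is an assumed choice of “standard technique”, not prescribed by the talk):

% log det(K) = log det(Psi) + log det(Lambda1) + log det(Phi1)   (sketch)
% Psi, Lam1 (diagonal), Phi1 as in the earlier blocks; det(K) > 0 overall.
logdetLam1 = sum(log(diag(Lam1)));            % analytic, no cancellation
[~,U] = lu(Psi);  logdetPsi  = sum(log(abs(diag(U))));
[~,U] = lu(Phi1); logdetPhi1 = sum(log(abs(diag(U))));
logdetK = logdetPsi + logdetLam1 + logdetPhi1;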


SLIDE 50

Application 2: Optimal Shape Parameters via MLE

Computing log Q(y)

We remember that the Hilbert-Schmidt SVD $\mathsf{K} = \Psi \Lambda_1 \Phi_1^T$ gives us
\[ \mathsf{K} c = y \iff \Psi \underbrace{\Lambda_1 \Phi_1^T c}_{= b} = y. \tag{2} \]
Straightforward computation shows that
\[ Q(y) = y^T \mathsf{K}^{-1} y = b^T \mathsf{A} b, \qquad \text{where } \mathsf{A} = \Lambda_1^{-1} + \mathsf{B}^T \Lambda_2 \mathsf{B}, \quad \mathsf{B} = \Phi_2^T \Phi_1^{-T} \Lambda_1^{-1}, \]
so that $\mathsf{A}$ is clearly symmetric and positive definite. In particular,
\[ Q(y) = b^T \Lambda_1^{-1} b + b^T \mathsf{B}^T \Lambda_2 \mathsf{B} b \ge b^T \Lambda_1^{-1} b, \]
where $b$ is computed stably via (2) and $\Lambda_1^{-1}$ is given analytically.
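A matching MATLAB sketch, reusing the QR quantities from MaternQRSolve (there $\Phi_2^T \Phi_1^{-T}$ = Rhat', so B*b = Rhat'*(Lambda1\b)):

% Stable evaluation of Q(y) = y'*inv(K)*y via the HS-SVD (sketch).
b  = Psi\y;                           % b from the well-conditioned system (2)
Bb = Rhat'*(Lambda1\b);               % B*b, B = Phi2'*Phi1^{-T}*Lambda1^{-1}
Q  = b'*(Lambda1\b) + Bb'*Lambda2*Bb; % two nonnegative, stable pieces
logQ = log(Q);                        % enters the log-likelihood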


SLIDE 52

Application 2: Optimal Shape Parameters via MLE

\[ Q(y) = b^T \mathsf{A} b = b^T \Lambda_1^{-1} b + b^T \mathsf{B}^T \Lambda_2 \mathsf{B} b \]

Note that $Q(y) = y^T \mathsf{K}^{-1} y = c^T \mathsf{K} c$ is the native space norm of the interpolant. To statisticians this is known as the Mahalanobis distance.

SLIDE 53

Application 2: Optimal Shape Parameters via MLE

Stable MLE for Gaussian Interpolation in 1D

Figure: $N = 15$ Chebyshev points for $f(x) = x + \frac{1}{1+x^2}$ on $[-1, 1]$;
\[ \mathcal{L}(\varepsilon; y) = \frac{1}{N} \log\det(\mathsf{K}) + \log\left( y^T \mathsf{K}^{-1} y \right). \]
(Plot not reproduced here.)

SLIDE 54

Application 2: Optimal Shape Parameters via MLE

Stable MLE for Gaussian Interpolation in 5D

$y(\mathbf{x}) = \sin(\operatorname{mean}(\mathbf{x}))$, using $N$ Halton points; solid lines for HS-SVD, dashed lines for the direct solve. (Plot not reproduced here.)

SLIDE 55

Application 2: Optimal Shape Parameters via MLE

MLE as a consistent predictor of “optimal” ε

True solution (left): overall optimal values (red dot) $\varepsilon = 1.333521$, $N = 140$, error $= 5.8378 \times 10^{-17}$. MLE (right): overall “optimal” values (red dot) $\varepsilon = 1.778279$, $N = 60$, error $= 6.29907 \times 10^{-16}$. (Plots not reproduced here.)

SLIDE 56

Application 2: Optimal Shape Parameters via MLE

MLE and flat polynomial limits

Figure: $N = 15$ Chebyshev points for $y(x) = x^3 - 3x^2 + 2x + 1$ and $y(x) = x^3 - 3x^2 + 2x + 1 + 10^{-10}\cos(10x)$ on $[-1, 1]$. (Plots not reproduced here.)

In both cases, the MLE predicts an $\varepsilon$-value that leads to optimal accuracy. However, the MLE does not “allow” the (polynomial) flat limit, since polynomials are not in the native space of Gaussians.

SLIDE 57

Summary

- The Hilbert-Schmidt/Mercer expansion and the Hilbert-Schmidt SVD provide a general and transparent framework for stable kernel computation.
- Implementation depends on the availability of a Mercer series for specific kernels:
  - some eigenfunctions are easier to obtain than others,
  - some eigenfunctions are easier to handle than others.
- Vast applications:
  - function interpolation/approximation,
  - parameter estimation (MLE, GCV),
  - numerical solution of PDEs (collocation, MFS, MPS), ...
- Future outlook:
  - implement for anisotropic Gaussians,
  - HS-SVD for other kernels,
  - MLE for low-rank approximation.

SLIDE 58

Appendix: References

Rasmussen, C. E. and Williams, C. K. I. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, 2006.

Cavoretto, R., Fasshauer, G. E. and McCourt, M. J. Compact Matérn kernels and piecewise polynomial splines viewed from a Hilbert-Schmidt perspective. Submitted.

Fasshauer, G. E., Hickernell, F. J. and Woźniakowski, H. Rate of convergence and tractability of the radial function approximation problem. SIAM J. Numer. Anal. 50/1 (2012), pp. 247-271.

Fasshauer, G. E. and McCourt, M. J. Stable evaluation of Gaussian RBF interpolants. SIAM J. Sci. Comput. 34/2 (2012), pp. A737-A762.

Fornberg, B. and Piret, C. A stable algorithm for flat radial basis functions on a sphere. SIAM J. Sci. Comput. 30/1 (2008), pp. 60-80.