SLIDE 1

Using a Hilbert-Schmidt SVD for Stable Kernel Computations

Greg Fasshauer, Mike McCourt, Roberto Cavoretto

Department of Applied Mathematics, Illinois Institute of Technology
Partially supported by NSF Grant DMS-1115392

MAIA 2013: Multivariate Approximation and Interpolation with Applications, Erice, Sicily, Sept. 25, 2013

SLIDE 2

Outline

1. Fundamental Problem
2. Hilbert-Schmidt SVD and General RBF-QR Algorithm
3. Implementation for Compact Matérn Kernels
4. Application 1: Basic Function Approximation
5. Application 2: Optimal Shape Parameters via MLE
6. Summary


SLIDE 4

Fundamental Problem

Kernel-based Interpolation

Given data $(x_i, y_i)_{i=1}^N$, use a data-dependent linear function space
\[ s(x) = \sum_{j=1}^{N} c_j K(x, x_j), \qquad x \in \Omega \subseteq \mathbb{R}^d, \]
with $K : \Omega \times \Omega \to \mathbb{R}$ a positive definite reproducing kernel. To find the $c_j$ we solve the interpolation equations $s(x_i) = y_i$, $i = 1, \ldots, N$, which leads to a linear system $\mathsf{K} c = y$ with symmetric positive definite, but often ill-conditioned, system matrix $\mathsf{K}_{ij} = K(x_i, x_j)$, $i, j = 1, \ldots, N$.
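For orientation, a minimal MATLAB sketch of this direct approach (the Gaussian kernel choice and the test data are illustrative assumptions, not part of the talk):

% Direct kernel interpolation: build K, solve K*c = y, evaluate s(x).
ep = 3;                                % shape parameter
K  = @(x,z) exp(-ep^2*(x - z.').^2);   % Gaussian kernel matrix on R
x  = linspace(0,1,21).';               % data sites
y  = sin(2*pi*x);                      % data values
c  = K(x,x)\y;                         % can be severely ill-conditioned!
xx = linspace(0,1,200).';
s  = K(xx,x)*c;                        % interpolant values s(xx)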


SLIDE 6

Fundamental Problem

Common Complaints About Kernels

Kernel methods suffer from
- numerical instability,
- the presence of free parameter(s),
- high computational cost.

In this talk we will address the first two issues:
- We obtain stable methods by working with a “better” basis, which leads to a Hilbert-Schmidt SVD of the matrix $\mathsf{K}$.
- Free parameters can be “optimally” chosen by using statistical methods such as MLE, which are significantly enhanced by using the HS-SVD.

SLIDE 7

Hilbert-Schmidt SVD and General RBF-QR Algorithm

Hilbert-Schmidt Theory

We assume that we know a Hilbert-Schmidt expansion (or Mercer series expansion) of our kernel $K$:
\[ K(x, z) = \sum_{n=1}^{\infty} \lambda_n \varphi_n(x) \varphi_n(z), \]
where $(\lambda_n, \varphi_n)$ are orthonormal eigenpairs of a Hilbert-Schmidt integral operator $T_K : L_2(\Omega, \rho) \to L_2(\Omega, \rho)$ defined as
\[ (T_K f)(x) = \int_{\Omega} K(x, z) f(z) \rho(z) \, dz, \]
where $\Omega \subset \mathbb{R}^d$ and $\|K\|_{L_2(\Omega \times \Omega, \rho \times \rho)} < \infty$.


SLIDE 9

Hilbert-Schmidt SVD and General RBF-QR Algorithm

Gaussian Eigenfunctions [Rasmussen/Williams (2006), F./McCourt (2012)]

\[ e^{-\varepsilon^2 (x - z)^2} = \sum_{n=0}^{\infty} \lambda_n \varphi_n(x) \varphi_n(z), \]
where
\[ \lambda_n = \sqrt{\frac{\alpha^2}{\alpha^2 + \delta^2 + \varepsilon^2}} \left( \frac{\varepsilon^2}{\alpha^2 + \delta^2 + \varepsilon^2} \right)^{n}, \qquad \varphi_n(x) = \gamma_n e^{-\delta^2 x^2} H_n(\alpha \beta x), \]
with $H_n$ Hermite polynomials,
\[ \beta = \left( 1 + \left( \frac{2\varepsilon}{\alpha} \right)^2 \right)^{1/4}, \qquad \gamma_n = \sqrt{\frac{\beta}{2^n \Gamma(n+1)}}, \qquad \delta^2 = \frac{\alpha^2}{2} \left( \beta^2 - 1 \right), \]
and $\{\varphi_n\}_{n=0}^{\infty}$ ($\rho$-weighted) $L_2$-orthonormal, i.e.,
\[ \int_{-\infty}^{\infty} \varphi_m(x) \varphi_n(x) \rho(x) \, dx = \delta_{mn}, \qquad \rho(x) = \frac{\alpha}{\sqrt{\pi}} e^{-\alpha^2 x^2}. \]
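A minimal MATLAB sketch evaluating these eigenpairs (the three-term Hermite recurrence and the choice of alpha below are illustrative assumptions):

% Evaluate the first M Gaussian eigenpairs at column vector x (sketch).
ep = 1; alpha = 1; M = 10;                  % alpha is the free scale parameter
beta   = (1 + (2*ep/alpha)^2)^(1/4);
delta2 = alpha^2/2*(beta^2 - 1);
n = 0:M-1;
lambda = sqrt(alpha^2/(alpha^2+delta2+ep^2)) ...
       * (ep^2/(alpha^2+delta2+ep^2)).^n;   % eigenvalues lambda_n
x = linspace(-1,1,11).';
t = alpha*beta*x;
H = ones(numel(x),M);                       % columns hold H_0, H_1, ...
if M > 1, H(:,2) = 2*t; end
for k = 3:M          % recurrence H_{n+1}(t) = 2t*H_n(t) - 2n*H_{n-1}(t)
    H(:,k) = 2*t.*H(:,k-1) - 2*(k-2)*H(:,k-2);
end
gam = sqrt(beta./(2.^n.*factorial(n)));     % gamma_n, using Gamma(n+1) = n!
Phi = exp(-delta2*x.^2).*H.*gam;            % N-by-M matrix of phi_n(x)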


SLIDE 12

Hilbert-Schmidt SVD and General RBF-QR Algorithm

Multivariate Eigenfunction Expansion

Use the tensor product form of the Gaussian kernel:
\[ K(\mathbf{x}, \mathbf{z}) = e^{-\varepsilon^2 \|\mathbf{x} - \mathbf{z}\|_2^2} = e^{-\sum_{\ell=1}^{d} \varepsilon_\ell^2 (x_\ell - z_\ell)^2} = \prod_{\ell=1}^{d} e^{-\varepsilon_\ell^2 (x_\ell - z_\ell)^2} = \sum_{\mathbf{n} \in \mathbb{N}^d} \lambda_{\mathbf{n}} \varphi_{\mathbf{n}}(\mathbf{x}) \varphi_{\mathbf{n}}(\mathbf{z}), \qquad \mathbf{x} = (x_1, \ldots, x_d) \in \mathbb{R}^d, \]
where
\[ \lambda_{\mathbf{n}} = \prod_{\ell=1}^{d} \lambda_{n_\ell}, \qquad \varphi_{\mathbf{n}}(\mathbf{x}) = \prod_{\ell=1}^{d} \varphi_{n_\ell}(x_\ell). \]
Different shape parameters $\varepsilon_\ell$ for different space dimensions are allowed (i.e., $K$ may be anisotropic).
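A sketch of how this product structure translates into code; gauss_eig, returning a 1D eigenvalue and an eigenfunction handle, is a hypothetical helper wrapping the formulas from the previous slide:

% Tensor-product eigenpair for a multi-index nidx = (n_1,...,n_d) (sketch).
% gauss_eig(n,ep,alpha) is a hypothetical helper returning the 1D pair.
function [lam, phi] = tensor_eig(nidx, ep, alpha)
    d = numel(nidx);
    lam = 1;  phi1d = cell(1,d);
    for l = 1:d
        [lam_l, phi1d{l}] = gauss_eig(nidx(l), ep(l), alpha(l));
        lam = lam*lam_l;              % lambda_n = prod_l lambda_{n_l}
    end
    % phi_n(x) = prod_l phi_{n_l}(x_l), for a single point x (1-by-d)
    phi = @(x) prod(arrayfun(@(l) phi1d{l}(x(l)), 1:d));
end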


SLIDE 15

Hilbert-Schmidt SVD and General RBF-QR Algorithm

Fundamental idea: use the eigen-expansion of the kernel $K$ to rewrite the matrix $\mathsf{K}$ from the interpolation problem. We can't compute with infinite matrices, so we choose a truncation value $M$ (supported by $\lambda_n \to 0$ as $n \to \infty$, more later) and rewrite
\[ \mathsf{K} = \begin{pmatrix} K(x_1, x_1) & \cdots & K(x_1, x_N) \\ \vdots & & \vdots \\ K(x_N, x_1) & \cdots & K(x_N, x_N) \end{pmatrix} = \underbrace{\begin{pmatrix} \varphi_1(x_1) & \cdots & \varphi_M(x_1) \\ \vdots & & \vdots \\ \varphi_1(x_N) & \cdots & \varphi_M(x_N) \end{pmatrix}}_{=\Phi} \underbrace{\begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_M \end{pmatrix}}_{=\Lambda} \underbrace{\begin{pmatrix} \varphi_1(x_1) & \cdots & \varphi_1(x_N) \\ \vdots & & \vdots \\ \varphi_M(x_1) & \cdots & \varphi_M(x_N) \end{pmatrix}}_{=\Phi^T}. \]
Since
\[ K(x_i, x_j) = \sum_{n=1}^{\infty} \lambda_n \varphi_n(x_i) \varphi_n(x_j) \approx \sum_{n=1}^{M} \lambda_n \varphi_n(x_i) \varphi_n(x_j), \]
accurate reconstruction of all entries of $\mathsf{K}$ will likely require $M > N$.


SLIDE 19

Hilbert-Schmidt SVD and General RBF-QR Algorithm

The matrix $\mathsf{K}$ is often ill-conditioned, so forming $\mathsf{K}$ and computing with it is not a good idea. The eigen-decomposition $\mathsf{K} = \Phi \Lambda \Phi^T$ provides an accurate (elementwise) approximation of $\mathsf{K}$ without ever forming it. However, it is not recommended to use this decomposition directly either, since all of the ill-conditioning associated with $\mathsf{K}$ is still present, sitting in the matrix $\Lambda$. We now use mostly standard numerical linear algebra to isolate some of this ill-conditioning and develop the Hilbert-Schmidt SVD and a general RBF-QR algorithm.

slide-23
SLIDE 23

Hilbert-Schmidt SVD and General RBF-QR Algorithm

Details of the Hilbert-Schmidt SVD

Assume M > N, so that Φ is “short and fat” and partition Φ:    ϕ1(x1) . . . ϕN(x1) ϕN+1(x1) . . . ϕM(x1) . . . . . . . . . . . . ϕ1(xN) . . . ϕN(xN) ϕN+1(xN) . . . ϕM(xN)    =     Φ1

  • N×N

Φ2

  • N×(M−N)

    . Then K = ΦΛΦT = Φ Λ1 Λ2 ΦT

1

ΦT

2

  • =

Φ

  • IN

Λ2ΦT

2 Φ−T 1 Λ−1 1

  • = Ψ

Λ1ΦT

1 =M

Greg Fasshauer Hilbert-Schmidt SVD 10
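A minimal MATLAB sketch of this factorization, assuming Phi (N-by-M, M > N) and a vector lambda of eigenvalues are already in hand; the explicit triangular solve here is for exposition, and the QR-stabilized version appears below:

% Build Psi from the blocks of the truncated eigen-decomposition (sketch).
[N, M] = size(Phi);
Phi1 = Phi(:,1:N);   Phi2 = Phi(:,N+1:M);
Lam1 = diag(lambda(1:N));   Lam2 = diag(lambda(N+1:M));
Correction = Lam2*((Phi1\Phi2)')/Lam1;  % Lambda2 * Phi2' * Phi1^{-T} * Lambda1^{-1}
Psi = Phi*[eye(N); Correction];         % now K = Psi * Lam1 * Phi1'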


SLIDE 27

Hilbert-Schmidt SVD and General RBF-QR Algorithm

There are at least two ways to interpret the Hilbert-Schmidt SVD $\mathsf{K} = \Psi \Lambda_1 \Phi_1^T$:

- We've found an invertible $\mathsf{M} = \Lambda_1 \Phi_1^T$ such that $\Psi = \mathsf{K} \mathsf{M}^{-1}$ is better conditioned than $\mathsf{K}$ (“better basis”).
- We've diagonalized the matrix $\mathsf{K}$, i.e., $\mathsf{K} = \Psi \Lambda_1 \Phi_1^T$, where $\Lambda_1$ is a diagonal matrix of Hilbert-Schmidt singular values, and $\Psi$ and $\Phi_1$ are matrices generated by orthogonal eigenfunctions (but not orthogonal matrices).

Remark: The matrix $\Psi$ is the same for both interpretations. It can be computed stably, and we get a well-conditioned linear system $\Psi b = y$ (where $b = \mathsf{M} c$) for the interpolation problem.


SLIDE 30

Hilbert-Schmidt SVD and General RBF-QR Algorithm

Taking a closer look at the matrix $\Psi$, we see that
\[ \Psi = \begin{pmatrix} \Phi_1 & \Phi_2 \end{pmatrix} \begin{pmatrix} \mathsf{I}_N \\ \Lambda_2 \Phi_2^T \Phi_1^{-T} \Lambda_1^{-1} \end{pmatrix} = \Phi_1 + \Phi_2 \Lambda_2 \Phi_2^T \Phi_1^{-T} \Lambda_1^{-1}. \]
We can interpret this as having a new basis $\psi(\cdot)^T = (\psi_1(\cdot), \ldots, \psi_N(\cdot))$ for the interpolation space $\operatorname{span}\{K(\cdot, x_1), \ldots, K(\cdot, x_N)\}$, consisting of the appropriately corrected first $N$ eigenfunctions: if we let $\phi(\cdot)^T = (\varphi_1(\cdot), \ldots, \varphi_N(\cdot), \varphi_{N+1}(\cdot), \ldots, \varphi_M(\cdot))$, then we can rewrite our kernel basis using the Hilbert-Schmidt SVD as
\[ k(x)^T = \phi(x)^T \begin{pmatrix} \mathsf{I}_N \\ \Lambda_2 \Phi_2^T \Phi_1^{-T} \Lambda_1^{-1} \end{pmatrix} \Lambda_1 \Phi_1^T = \psi(x)^T \Lambda_1 \Phi_1^T. \]
The data dependence of the new basis is captured by the “correction” term. The new basis is more stable since we have removed $\Lambda_1$.


SLIDE 32

Hilbert-Schmidt SVD and General RBF-QR Algorithm

The QR in RBF-QR

Additional stability in the computation of the correction matrix $\Lambda_2 \Phi_2^T \Phi_1^{-T} \Lambda_1^{-1}$, in particular in the formation of $\Phi_2^T \Phi_1^{-T}$, is achieved via a QR decomposition of $\Phi$, i.e.,
\[ \begin{pmatrix} \Phi_1 & \Phi_2 \end{pmatrix} = \mathsf{Q} \begin{pmatrix} \underbrace{\mathsf{R}_1}_{N \times N} & \underbrace{\mathsf{R}_2}_{N \times (M-N)} \end{pmatrix}, \]
with orthogonal $N \times N$ matrix $\mathsf{Q}$ and upper triangular matrix $\mathsf{R}_1$. Then we have
\[ \Phi_2^T \Phi_1^{-T} = \mathsf{R}_2^T \mathsf{Q}^T \mathsf{Q} \mathsf{R}_1^{-T} = \mathsf{R}_2^T \mathsf{R}_1^{-T}. \]
This idea appeared in [Fornberg/Piret (2008)].
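In MATLAB this replaces the explicit inverse in the sketch above by triangular solves (an illustrative continuation of that sketch):

% QR-stabilized correction matrix (sketch; Lam1, Lam2, Phi as above).
[Q, R] = qr(Phi);                   % Phi = Q*[R1 R2], Q orthogonal N-by-N
R1 = R(:,1:N);   R2 = R(:,N+1:M);
Correction = Lam2*((R1\R2)')/Lam1;  % = Lambda2 * R2' * R1^{-T} * Lambda1^{-1}
Psi = Phi*[eye(N); Correction];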

SLIDE 33

Hilbert-Schmidt SVD and General RBF-QR Algorithm

Summary of Method

Instead of solving the “original” interpolation problem with ill-conditioned matrix $\mathsf{K}$,
\[ \mathsf{K} c = y, \]
leading to inaccurate coefficients which then need to be multiplied against poorly conditioned basis functions, we now solve
\[ \Psi b = y \]
for a new set of coefficients, which we then evaluate via
\[ s(x) = \sum_{j=1}^{N} b_j \psi_j(x), \]
i.e., using the new basis.


SLIDE 36

Implementation for Compact Matérn Kernels

General Implementation

It is crucial to know the Hilbert-Schmidt expansion of $K$:
\[ K(x, z) = \sum_{n=1}^{\infty} \lambda_n \varphi_n(x) \varphi_n(z). \]
The multivariate Gaussian kernels mentioned earlier were used
- in [F./Hickernell/Woźniakowski (2012)] to prove dimension-independent convergence rates,
- in [F./McCourt (2012)] to obtain and implement a stable GaussQR algorithm.

We now discuss the implementation for generalizations of the Brownian bridge kernel
\[ K(x, z) = \min(x, z) - xz, \qquad x, z \in [0, 1], \]
which we call compact Matérn kernels [Cavoretto/F./McCourt (2013)].


SLIDE 38

Implementation for Compact Matérn Kernels

We define compact Matérn kernels as Green's kernels of
\[ \left( -\frac{d^2}{dx^2} + \varepsilon^2 \mathsf{I} \right)^{\beta} K(x, z) = \delta(x - z), \qquad x, z \in [0, 1], \quad \beta \in \mathbb{N}, \ \varepsilon \ge 0, \]
subject to
\[ \frac{d^{2\nu}}{dx^{2\nu}} K(0, z) = \frac{d^{2\nu}}{dx^{2\nu}} K(1, z) = 0, \qquad \nu = 0, \ldots, \beta - 1. \]
The Hilbert-Schmidt expansion for compact Matérn kernels is
\[ K_{\beta,\varepsilon}(x, z) = \sum_{n=1}^{\infty} \frac{2}{(n^2 \pi^2 + \varepsilon^2)^{\beta}} \sin(n\pi x) \sin(n\pi z), \]
i.e., the eigenvalues and eigenfunctions are
\[ \lambda_n = \frac{1}{(n^2 \pi^2 + \varepsilon^2)^{\beta}}, \qquad \varphi_n(x) = \sqrt{2} \sin(n\pi x). \tag{1} \]
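As a quick sanity check of (1): for β = 1 and ε = 0 the series must reproduce the Brownian bridge kernel min(x, z) − xz. A MATLAB sketch (the test point and truncation level are arbitrary choices):

% Truncated series (1) with beta = 1, ep = 0 vs. the Brownian bridge kernel.
beta = 1; ep = 0; M = 5000;
x = 0.3; z = 0.7; n = 1:M;
Kser = sum(2./(n.^2*pi^2 + ep^2).^beta .* sin(n*pi*x).*sin(n*pi*z));
Kbb  = min(x,z) - x*z;            % exact Brownian bridge kernel
disp(abs(Kser - Kbb))             % truncation error, here O(1/M)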


SLIDE 41

Implementation for Compact Matérn Kernels

Clearly, the eigenfunctions are bounded by $\sqrt{2}$ and, for a fixed value of $\varepsilon$, the eigenvalues decay as $n^{-2\beta}$. Therefore the truncation length $M$ needed for an accurate representation of the entries of $\mathsf{K}$ can be easily determined as a function of $\beta$ and $\varepsilon$: to ensure that we keep the first $M$ significant terms, we take $M$ such that
\[ \lambda_M < \epsilon_{\mathrm{mach}} \lambda_N, \qquad M > N. \]
Using the explicit representation of the eigenvalues, we solve for $M$:
\[ M(\beta, \varepsilon; \epsilon_{\mathrm{mach}}) = \left\lceil \frac{1}{\pi} \sqrt{\epsilon_{\mathrm{mach}}^{-1/\beta} \left( N^2 \pi^2 + \varepsilon^2 \right) - \varepsilon^2} \right\rceil. \]
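For instance (illustrative numbers matching Application 1 below), in MATLAB with double-precision machine epsilon:

% Truncation length for the compact Matern HS-SVD.
N = 21; beta = 7; ep = 1;
M = ceil(1/pi*sqrt(eps^(-1/beta)*(N^2*pi^2 + ep^2) - ep^2))
% eps is MATLAB's machine epsilon; these values give M = 276.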

SLIDE 42

Implementation for Compact Matérn Kernels

Program (MaternQRSolve.m)

function yy = MaternQRSolve(x,y,ep,beta,xx)
% Stable compact Matern kernel interpolation via the HS-SVD.
% x, y: data sites and values; ep, beta: kernel parameters;
% xx: evaluation points.
phifunc = @(n,x) sqrt(2)*sin(pi*x*n);       % eigenfunctions from (1)
N = length(x);
M = ceil(1/pi*sqrt(eps^(-1/beta)*(N^2*pi^2+ep^2)-ep^2)); % truncation length
n = 1:M;
Lambda = diag(((n*pi).^2+ep^2).^(-beta));   % eigenvalues from (1)
Phi = phifunc(n,x);                         % N-by-M eigenfunction matrix
[Q,R] = qr(Phi);                            % QR step of RBF-QR
R1 = R(:,1:N); R2 = R(:,N+1:end);
Rhat = R1\R2;
Lambda1 = Lambda(1:N,1:N); Lambda2 = Lambda(N+1:M,N+1:M);
Rbar = Lambda2*Rhat'/Lambda1;               % correction matrix
Psi = Phi*[eye(N);Rbar];                    % new basis at the data sites
b = Psi\y;                                  % well-conditioned solve
Phi_eval = phifunc(n,xx);
yy = Phi_eval*[eye(N);Rbar]*b;              % evaluate s at xx in the new basis
end

SLIDE 43

Application 1: Basic Function Approximation

Standard RBF vs. MatérnQR Interpolation

We use $K_{\beta,\varepsilon}$ with $\beta = 7$ and $\varepsilon = 1$, and $N = 21$ uniform samples of
\[ f(x) = (1 - 4x)_+^{14} + (4x - 3)_+^{14}. \]
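A call mirroring this setup (the error plot is an illustrative addition, not the talk's figure):

% Application 1 setup with MaternQRSolve (sketch).
f  = @(x) max(1-4*x,0).^14 + max(4*x-3,0).^14;  % truncated-power test function
x  = linspace(0,1,21).';                        % N = 21 uniform samples
xx = linspace(0,1,400).';
yy = MaternQRSolve(x, f(x), 1, 7, xx);          % ep = 1, beta = 7
semilogy(xx, abs(yy - f(xx)))                   % pointwise interpolation error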

SLIDE 44

Application 2: Optimal Shape Parameters via MLE

Likelihood Functions for Gaussian Random Fields

Kernel-based interpolation has an analog in statistics called kriging. If, instead of trying to recover a function, we treat our scattered data as samples of one realization of a Gaussian random field, we can prescribe a positive definite kernel $K$ as the presumed covariance between realizations of the Gaussian random field. The likelihood function of a zero-mean Gaussian random field (the probability of the data $(x_i, y_i)_{i=1}^N$ given the kernel $K$ with shape parameters $\theta = (\varepsilon, \beta)$) is
\[ L(\theta; y) = p(y \mid \theta) = \frac{1}{\sqrt{(2\pi\sigma^2)^N \det(\mathsf{K})}} \exp\left( -\frac{1}{2\sigma^2} y^T \mathsf{K}^{-1} y \right), \]
where $\mathsf{K}$ is the kernel interpolation matrix from before and $\sigma^2$ is the process variance.


SLIDE 46

Application 2: Optimal Shape Parameters via MLE

Maximum Likelihood Estimation (MLE)

Usually we minimize the negative (concentrated) log-likelihood
\[ \mathcal{L}(\theta; y) = \frac{1}{N} \log\det(\mathsf{K}) + \log\underbrace{\left( y^T \mathsf{K}^{-1} y \right)}_{= Q(y)}. \]
This requires evaluating $\log\det(\mathsf{K})$ and $\log Q(y)$, which, given the ill-conditioning of $\mathsf{K}$, are both bound to cause trouble. Luckily, we have developed the Hilbert-Schmidt SVD $\mathsf{K} = \Psi \Lambda_1 \Phi_1^T$ to help in both cases.


SLIDE 48

Application 2: Optimal Shape Parameters via MLE

Computing log det(K)

We use the Hilbert-Schmidt SVD to write
\[ \det(\mathsf{K}) = \det(\Psi \Lambda_1 \Phi_1^T) = \det(\Psi) \det(\Lambda_1) \det(\Phi_1). \]
Evaluating $\det(\Lambda_1)$ can be done analytically, and $\det(\Psi)$ and $\det(\Phi_1)$ can be computed stably using standard techniques.
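One way to realize this in MATLAB (a sketch; the LU-based log-determinant is an assumed choice of “standard technique”, not prescribed by the talk):

% log det(K) = log det(Psi) + log det(Lambda1) + log det(Phi1)   (sketch)
% Psi, Lam1 (diagonal), Phi1 as in the earlier blocks; det(K) > 0 overall.
logdetLam1 = sum(log(diag(Lam1)));            % analytic, no cancellation
[~,U] = lu(Psi);  logdetPsi  = sum(log(abs(diag(U))));
[~,U] = lu(Phi1); logdetPhi1 = sum(log(abs(diag(U))));
logdetK = logdetPsi + logdetLam1 + logdetPhi1;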


SLIDE 50

Application 2: Optimal Shape Parameters via MLE

Computing log Q(y)

We remember that the Hilbert-Schmidt SVD $\mathsf{K} = \Psi \Lambda_1 \Phi_1^T$ gives us
\[ \mathsf{K} c = y \iff \Psi \underbrace{\Lambda_1 \Phi_1^T c}_{= b} = y. \tag{2} \]
Straightforward computation shows that
\[ Q(y) = y^T \mathsf{K}^{-1} y = b^T \mathsf{A} b, \qquad \text{where } \mathsf{A} = \Lambda_1^{-1} + \mathsf{B}^T \Lambda_2 \mathsf{B}, \quad \mathsf{B} = \Phi_2^T \Phi_1^{-T} \Lambda_1^{-1}, \]
so that $\mathsf{A}$ is clearly symmetric and positive definite. In particular,
\[ Q(y) = b^T \Lambda_1^{-1} b + b^T \mathsf{B}^T \Lambda_2 \mathsf{B} b \ge b^T \Lambda_1^{-1} b, \]
where $b$ is computed stably via (2) and $\Lambda_1^{-1}$ is given analytically.
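A matching MATLAB sketch, reusing the QR quantities from MaternQRSolve (there $\Phi_2^T \Phi_1^{-T}$ = Rhat', so B*b = Rhat'*(Lambda1\b)):

% Stable evaluation of Q(y) = y'*inv(K)*y via the HS-SVD (sketch).
b  = Psi\y;                           % b from the well-conditioned system (2)
Bb = Rhat'*(Lambda1\b);               % B*b, B = Phi2'*Phi1^{-T}*Lambda1^{-1}
Q  = b'*(Lambda1\b) + Bb'*Lambda2*Bb; % two nonnegative, stable pieces
logQ = log(Q);                        % enters the log-likelihood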


SLIDE 52

Application 2: Optimal Shape Parameters via MLE

\[ Q(y) = b^T \mathsf{A} b = b^T \Lambda_1^{-1} b + b^T \mathsf{B}^T \Lambda_2 \mathsf{B} b \]

Note that $Q(y) = y^T \mathsf{K}^{-1} y = c^T \mathsf{K} c$ is the native space norm of the interpolant. To statisticians this is known as the Mahalanobis distance.

SLIDE 53

Application 2: Optimal Shape Parameters via MLE

Stable MLE for Gaussian Interpolation in 1D

Figure: $N = 15$ Chebyshev points for $f(x) = x + \frac{1}{1+x^2}$ on $[-1, 1]$;
\[ \mathcal{L}(\varepsilon; y) = \frac{1}{N} \log\det(\mathsf{K}) + \log\left( y^T \mathsf{K}^{-1} y \right). \]
(Plot not reproduced here.)

SLIDE 54

Application 2: Optimal Shape Parameters via MLE

Stable MLE for Gaussian Interpolation in 5D

$y(\mathbf{x}) = \sin(\operatorname{mean}(\mathbf{x}))$, using $N$ Halton points; solid lines for HS-SVD, dashed lines for the direct solve. (Plot not reproduced here.)

SLIDE 55

Application 2: Optimal Shape Parameters via MLE

MLE as a consistent predictor of “optimal” ε

True solution (left): overall optimal values (red dot) $\varepsilon = 1.333521$, $N = 140$, error $= 5.8378 \times 10^{-17}$. MLE (right): overall “optimal” values (red dot) $\varepsilon = 1.778279$, $N = 60$, error $= 6.29907 \times 10^{-16}$. (Plots not reproduced here.)

SLIDE 56

Application 2: Optimal Shape Parameters via MLE

MLE and flat polynomial limits

Figure: $N = 15$ Chebyshev points for $y(x) = x^3 - 3x^2 + 2x + 1$ and $y(x) = x^3 - 3x^2 + 2x + 1 + 10^{-10}\cos(10x)$ on $[-1, 1]$. (Plots not reproduced here.)

In both cases, the MLE predicts an $\varepsilon$-value that leads to optimal accuracy. However, the MLE does not “allow” the (polynomial) flat limit, since polynomials are not in the native space of Gaussians.

SLIDE 57

Summary

- The Hilbert-Schmidt/Mercer expansion and the Hilbert-Schmidt SVD provide a general and transparent framework for stable kernel computation.
- Implementation depends on the availability of a Mercer series for specific kernels:
  - some eigenfunctions are easier to obtain than others,
  - some eigenfunctions are easier to handle than others.
- Vast applications:
  - function interpolation/approximation,
  - parameter estimation (MLE, GCV),
  - numerical solution of PDEs (collocation, MFS, MPS), ...
- Future outlook:
  - implement for anisotropic Gaussians,
  - HS-SVD for other kernels,
  - MLE for low-rank approximation.

SLIDE 58

Appendix: References

Rasmussen, C. E. and Williams, C. K. I. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, 2006.

Cavoretto, R., Fasshauer, G. E. and McCourt, M. J. Compact Matérn kernels and piecewise polynomial splines viewed from a Hilbert-Schmidt perspective. Submitted.

Fasshauer, G. E., Hickernell, F. J. and Woźniakowski, H. Rate of convergence and tractability of the radial function approximation problem. SIAM J. Numer. Anal. 50/1 (2012), pp. 247-271.

Fasshauer, G. E. and McCourt, M. J. Stable evaluation of Gaussian RBF interpolants. SIAM J. Sci. Comput. 34/2 (2012), pp. A737-A762.

Fornberg, B. and Piret, C. A stable algorithm for flat radial basis functions on a sphere. SIAM J. Sci. Comput. 30/1 (2008), pp. 60-80.