Compression, inversion and sparse approximate PCA of dense kernel matrices - PowerPoint PPT Presentation



SLIDE 1

Compression, inversion and sparse approximate PCA of dense kernel matrices in near linear computational complexity

Florian Schäfer ICERM 2017

  • F. Schäfer, T.J. Sullivan, H. Owhadi

Sparse factorisation of dense Kernel matrices June 8th 2017 1 / 130

SLIDE 2

Compression, inversion and approximate PCA of dense kernel matrices in near linear computational complexity Florian Schäfer, T.J. Sullivan, Houman Owhadi http://arxiv.org/abs/1706.02205

SLIDE 3

Outline

1

A numerical experiment

2

Disintegration of measure and Gaussian elimination

3

Near-linear complexity algorithms using the theory of Gamblets

4

Further numerical results

SLIDE 4

A numerical experiment

{xi}i∈I ⊂ [0, 1]², with #I = N = 16641

SLIDE 5

A numerical experiment

{xi}i∈I ⊂ [0, 1]², with #I = N = 16641. Define K(r) as the Matérn kernel with smoothness parameter ν = 1 and lengthscale l = 0.4.

SLIDE 6

A numerical experiment

{xi}i∈I ⊂ [0, 1]², with #I = N = 16641. Define K(r) as the Matérn kernel with smoothness parameter ν = 1 and lengthscale l = 0.4. Set

Γi,j := K(|xi − xj|).
SLIDE 7

A numerical experiment

Γ, interpreted as a covariance matrix, describes a Gaussian field with second order smoothness.

SLIDE 8

A numerical experiment

Γ, interpreted as a covariance matrix, describes a Gaussian field with second order smoothness. Alternatively, K can be seen as the Green's function of a fourth order elliptic PDE, on the whole space.
SLIDE 9

A numerical experiment

Γ, interpreted as a covariance matrix, describes a Gaussian field with fourth order smoothness. Alternatively, K can be seen as the Green's function of a fourth order elliptic PDE, on the whole space.

Matrices of this kind appear in both statistics and scientific computing.

SLIDE 10

A numerical experiment

Γ, interpreted as a covariance matrix, describes a Gaussian field with fourth order smoothness. Alternatively, K can be seen as the Green's function of a fourth order elliptic PDE, on the whole space.

Matrices of this kind appear in both statistics and scientific computing. We need to apply the matrix and its inverse, and compute its determinant.

SLIDE 11

A numerical experiment

Γ, interpreted as a covariance matrix, describes a Gaussian field with fourth order smoothness. Alternatively, K can be seen as the Green's function of a fourth order elliptic PDE, on the whole space.

Matrices of this kind appear in both statistics and scientific computing. We need to apply the matrix and its inverse, and compute its determinant. Γ is dense, and hence has O(N²) storage cost. Direct inversion via Gaussian elimination has O(N³) complexity in time.

SLIDE 12

A numerical experiment

Can we be more efficient?

SLIDE 13

A numerical experiment

Can we be more efficient? Many existing methods: quadrature formulas, subsampling, randomised approximations, low rank approximations, fast multipole methods, hierarchical matrices, wavelet methods, inducing points, covariance tapering, ...

SLIDE 14

A numerical experiment

Can we be more efficient? Many existing methods: quadrature formulas, subsampling, randomised approximations, low rank approximations, fast multipole methods, hierarchical matrices, wavelet methods, inducing points, covariance tapering, ... We provide a simple algorithm, with rigorous error bounds and near-linear complexity.

SLIDE 15

A numerical experiment

Even writing down the matrix has O(N²) complexity.

SLIDE 16

A numerical experiment

Even writing down the matrix has O(N²) complexity. Therefore, we subsample Γ:

Γ̃i,j := Γi,j, for (i, j) ∈ S; 0, else.

SLIDE 17

A numerical experiment

Even writing down the matrix has O(N²) complexity. Therefore, we subsample Γ:

Γ̃i,j := Γi,j, for (i, j) ∈ S; 0, else.

#S = 5528749 ≈ 0.0189 N². We have thrown away all but 2 percent of the entries, without even touching them!

SLIDE 18

A numerical experiment

Even writing down the matrix has O(N²) complexity. Therefore, we subsample Γ:

Γ̃i,j := Γi,j, for (i, j) ∈ S; 0, else.

#S = 5528749 ≈ 0.0189 N². We have thrown away all but 2 percent of the entries, without even touching them! We will see later: S does not depend on the entries of Γ.
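The subsampling step above can be sketched numerically. This is a hedged toy example: a small random point set with an exponential kernel stands in for the talk's N = 16641 Matérn setup, and the purely distance-based pattern here is only illustrative.

```python
import numpy as np

# a tiny stand-in for the subsampling step: keep only entries of the
# kernel matrix whose index pair lies in a sparsity pattern
rng = np.random.default_rng(0)
pts = rng.random((200, 2))                       # {x_i} in [0, 1]^2
d = np.linalg.norm(pts[:, None] - pts[None], axis=-1)
gamma = np.exp(-d)                               # exponential kernel (stand-in)
mask = d <= 0.2                                  # toy distance-based pattern S
gamma_tilde = np.where(mask, gamma, 0.0)         # entries outside S are dropped
print(mask.mean())                               # fraction of entries kept
```

Crucially, the pattern only depends on the point locations, so the discarded entries never need to be computed.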

SLIDE 19

A numerical experiment

We have compressed Γ to 2 percent of its original size.

SLIDE 20

A numerical experiment

We have compressed Γ to 2 percent of its original size. How much information have we retained?

SLIDE 21

A numerical experiment

We have compressed Γ to 2 percent of its original size. How much information have we retained? Consider relative error in operator norm:

SLIDE 22

A numerical experiment

We have compressed Γ to 2 percent of its original size. How much information have we retained? Consider the relative error in operator norm:

‖Γ − Γ̃‖ / ‖Γ‖ = 0.9662

SLIDE 23

A numerical experiment

We have compressed Γ to 2 percent of its original size. How much information have we retained? Consider the relative error in operator norm:

‖Γ − Γ̃‖ / ‖Γ‖ = 0.9662

Γ̃ is a very bad approximation of Γ.

SLIDE 24

A numerical experiment

Can we obtain a better approximation of Γ, using only Γ̃?

SLIDE 25

A numerical experiment

Can we obtain a better approximation of Γ, using only Γ̃? Consider L = ICHOL(Γ̃), the incomplete Cholesky factorisation of Γ̃, ignoring all fill-in.

SLIDE 26

A numerical experiment

Can we obtain a better approximation of Γ, using only Γ̃? Consider L = ICHOL(Γ̃), the incomplete Cholesky factorisation of Γ̃, ignoring all fill-in.

‖Γ − L Lᵀ‖ / ‖Γ‖ = 3.0676e−04

SLIDE 27

A numerical experiment

Can we obtain a better approximation of Γ, using only Γ̃? Consider L = ICHOL(Γ̃), the incomplete Cholesky factorisation of Γ̃, ignoring all fill-in.

‖Γ − L Lᵀ‖ / ‖Γ‖ = 3.0676e−04
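An incomplete Cholesky factorisation that discards all fill-in can be sketched as follows. This is a minimal dense sketch under stated assumptions: the function name `ichol`, the boolean `pattern` argument, and the tiny demo matrix are introduced here for illustration and are not the talk's implementation.

```python
import numpy as np

def ichol(A, pattern):
    """Incomplete Cholesky of an SPD matrix A, keeping only entries in
    `pattern` (a boolean lower-triangular mask, diagonal included);
    all fill-in outside the pattern is discarded."""
    n = A.shape[0]
    L = np.tril(np.where(pattern, A, 0.0))
    for k in range(n):
        L[k, k] = np.sqrt(L[k, k])
        L[k + 1:, k] /= L[k, k]
        for j in range(k + 1, n):
            if L[j, k] != 0.0:
                # right-looking update, restricted to the pattern
                L[j:, j] -= np.where(pattern[j:, j], L[j:, k] * L[j, k], 0.0)
    return L

# with the full lower-triangular pattern, ichol reduces to the exact
# Cholesky factorisation of this tiny SPD matrix
A = np.array([[4.0, 2.0, 0.0], [2.0, 5.0, 1.0], [0.0, 1.0, 3.0]])
L = ichol(A, np.tril(np.ones((3, 3), dtype=bool)))
print(np.allclose(L @ L.T, A))  # True
```

With a genuinely sparse pattern such as the multiscale S above, L Lᵀ only approximates the matrix, which is exactly the trade-off the experiment quantifies.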

SLIDE 28

A numerical experiment

Decompose {xi}i∈I into a nested hierarchy as: {xi}i∈I(1) ⊂ {xi}i∈I(2) ⊂ {xi}i∈I(3) ⊂ · · · ⊂ {xi}i∈I(q) = {xi}i∈I (1.1)

SLIDE 29

A numerical experiment

We define J(k) := I(k) \ I(k−1) and define the sparsity pattern:

S := { (i, j) ∈ I × I : i ∈ J(k), j ∈ J(l), dist(xi, xj) ≤ 2 · 2^(−min(k,l)) }.

SLIDE 30

A numerical experiment

We define J(k) := I(k) \ I(k−1) and define the sparsity pattern:

S := { (i, j) ∈ I × I : i ∈ J(k), j ∈ J(l), dist(xi, xj) ≤ 2 · 2^(−min(k,l)) }.

We order the elements of I from coarse to fine, that is from J(1) to J(q).
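The multiscale pattern can be sketched in a few lines. The nested hierarchy used below (regular grids on [0, 1]² with spacing 2^(−k)) is an assumption made here for illustration; `sparsity_pattern` and the level bookkeeping are names introduced in this sketch.

```python
import numpy as np

def sparsity_pattern(points, levels):
    """points: (N, 2) array; levels[i] is the level k at which point i
    first appears (i.e. i is in J(k)). Returns the boolean pattern S:
    (i, j) in S iff dist(x_i, x_j) <= 2 * 2^(-min(k, l))."""
    d = np.linalg.norm(points[:, None] - points[None], axis=-1)
    lev = np.minimum(levels[:, None], levels[None])
    return d <= 2.0 * 2.0 ** (-lev)

# nested hierarchy of regular grids on [0, 1]^2 (illustrative choice):
# level k has spacing 2^(-k); a point belongs to the level where it
# first appears
pts, lev, seen = [], [], set()
for k in range(1, 4):
    h = 2.0 ** (-k)
    for x in np.arange(0.0, 1.0 + h / 2, h):
        for y in np.arange(0.0, 1.0 + h / 2, h):
            key = (round(x, 9), round(y, 9))
            if key not in seen:
                seen.add(key)
                pts.append((x, y))
                lev.append(k)
pts, lev = np.array(pts), np.array(lev)
S = sparsity_pattern(pts, lev)
print(S.mean())  # fraction of retained entries
```

Coarse pairs interact over long ranges while fine pairs interact only locally, which is what keeps #S near-linear in N.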

SLIDE 31

A numerical experiment

L provides a good approximation of Γ at only 2 percent of the storage cost.

SLIDE 32

A numerical experiment

L provides a good approximation of Γ at only 2 percent of the storage cost. Can be computed in near linear complexity in time and space.

SLIDE 33

A numerical experiment

L provides a good approximation of Γ at only 2 percent of the storage cost. Can be computed in near linear complexity in time and space. Allows for approximate evaluation of Γ, Γ−1, and det (Γ) in near-linear time.

SLIDE 34

A numerical experiment

L provides a good approximation of Γ at only 2 percent of the storage cost. Can be computed in near linear complexity in time and space. Allows for approximate evaluation of Γ, Γ−1, and det (Γ) in near-linear time. Allows for sampling of X ∼ N (0, Γ) in near-linear time.
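The sampling claim rests on the standard fact that X = L z with z ~ N(0, Id) has covariance L Lᵀ. A hedged sketch, with the exact Cholesky factor of a small random SPD matrix standing in for the sparse approximate factor:

```python
import numpy as np

# sample from N(0, Gamma) using a factor L with L L^T = Gamma
rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))
gamma = A @ A.T + 4 * np.eye(4)                  # random SPD stand-in for Gamma
L = np.linalg.cholesky(gamma)
z = rng.standard_normal((4, 50000))              # i.i.d. standard normal draws
X = L @ z                                        # samples of N(0, L L^T)
emp = X @ X.T / z.shape[1]                       # empirical covariance
print(np.abs(emp - gamma).max())                 # small sampling error
```

Because applying a sparse triangular L costs only O(#S) operations, each sample is cheap once the factor is available.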

SLIDE 35

A numerical experiment

In this work, we

SLIDE 36

A numerical experiment

In this work, we prove that this phenomenon holds whenever the covariance function K is the Green's function of an elliptic boundary value problem.

SLIDE 37

A numerical experiment

In this work, we prove that this phenomenon holds whenever the covariance function K is the Green's function of an elliptic boundary value problem; prove that it leads to an algorithm with computational complexity of O(N log²(N) (log(1/ε) + log²(N))^(4d+1)) in time and O(N log(N) log^d(N/ε)) in space, for an approximation error of ε.
SLIDE 38

A numerical experiment

In this work, we prove that this phenomenon holds whenever the covariance function K is the Green's function of an elliptic boundary value problem; prove that it leads to an algorithm with computational complexity of O(N log²(N) (log(1/ε) + log²(N))^(4d+1)) in time and O(N log(N) log^d(N/ε)) in space.

show that even though the Matérn family is not covered rigorously by our theoretical results, we get good approximation results, in particular in the interior of the domain.

SLIDE 39

A numerical experiment

In this work, we prove that this phenomenon holds whenever the covariance function K is the Green's function of an elliptic boundary value problem; prove that it leads to an algorithm with computational complexity of O(N log²(N) (log(1/ε) + log²(N))^(4d+1)) in time and O(N log(N) log^d(N/ε)) in space.

show that even though the Matérn family is not covered rigorously by our theoretical results, we get good approximation results, in particular in the interior of the domain. show that as a byproduct of our algorithm we obtain a sparse approximate PCA with near optimal approximation property.

SLIDE 40

Disintegration of Gaussian Measures and the Screening Effect

Let X be a centered Gaussian vector with covariance Θ.

SLIDE 41

Disintegration of Gaussian Measures and the Screening Effect

Let X be a centered Gaussian vector with covariance Θ. Assume we want to compute E [f (X)] for some function f.

SLIDE 42

Disintegration of Gaussian Measures and the Screening Effect

Let X be a centered Gaussian vector with covariance Θ. Assume we want to compute E [f (X)] for some function f. Use Monte Carlo, but for Θ large, each sample is expensive.

SLIDE 43

Disintegration of Gaussian Measures and the Screening Effect

Let X be a centered Gaussian vector with covariance Θ. Assume we want to compute E[f(X)] for some function f. Use Monte Carlo; but for Θ large, each sample is expensive. Idea: use disintegration of measure: E[f(X)] = E[ E[f(X) | Y] ].

SLIDE 44

Disintegration of Gaussian Measures and the Screening Effect

Let X be a centered Gaussian vector with covariance Θ. Assume we want to compute E[f(X)] for some function f. Use Monte Carlo; but for Θ large, each sample is expensive. Idea: use disintegration of measure: E[f(X)] = E[ E[f(X) | Y] ]. Choose Y such that Y and E[f(X) | Y] can be sampled cheaply.

SLIDE 45

Disintegration of Gaussian Measures and the Screening Effect

Consider X ∈ R^N, {xi}1≤i≤N ⊂ [0, 1] and Θi,j := exp(−|xi − xj|).
SLIDE 46

Disintegration of Gaussian Measures and the Screening Effect

Consider X ∈ R^N, {xi}1≤i≤N ⊂ [0, 1] and Θi,j := exp(−|xi − xj|).

Corresponds to a prior on the space H¹(0, 1) of mean square differentiable functions.

SLIDE 47

Disintegration of Gaussian Measures and the Screening Effect

Consider X ∈ R^N, {xi}1≤i≤N ⊂ [0, 1] and Θi,j := exp(−|xi − xj|).

Corresponds to a prior on the space H¹(0, 1) of mean square differentiable functions. Assume the xi are ordered in increasing order and x⌊N/2⌋ ≈ 1/2.

SLIDE 48

Disintegration of Gaussian Measures and the Screening Effect

Consider X ∈ R^N, {xi}1≤i≤N ⊂ [0, 1] and Θi,j := exp(−|xi − xj|).

Corresponds to a prior on the space H¹(0, 1) of mean square differentiable functions. Assume the xi are ordered in increasing order and x⌊N/2⌋ ≈ 1/2. We then have, for i < ⌊N/2⌋ < j:

Cov[Xi, Xj | X⌊N/2⌋] ≈ 0.
SLIDE 49

Disintegration of Gaussian Measures and the Screening Effect

Consider X ∈ R^N, {xi}1≤i≤N ⊂ [0, 1] and Θi,j := exp(−|xi − xj|).

Corresponds to a prior on the space H¹(0, 1) of mean square differentiable functions. Assume the xi are ordered in increasing order and x⌊N/2⌋ ≈ 1/2. We then have, for i < ⌊N/2⌋ < j:

Cov[Xi, Xj | X⌊N/2⌋] ≈ 0.
SLIDE 50

Disintegration of Gaussian Measures and the Screening Effect

Consider X ∈ R^N, {xi}1≤i≤N ⊂ [0, 1] and Θi,j := exp(−|xi − xj|).

Corresponds to a prior on the space H¹(0, 1) of mean square differentiable functions. Assume the xi are ordered in increasing order and x⌊N/2⌋ ≈ 1/2. We then have, for i < ⌊N/2⌋ < j:

Cov[Xi, Xj | X⌊N/2⌋] ≈ 0.
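For this Markovian kernel the screening is in fact exact: conditioning on the middle point annihilates the covariance across it, which a short computation confirms. The grid of 101 points below is an illustrative choice.

```python
import numpy as np

# screening effect for the kernel exp(-|x_i - x_j|) on [0, 1]:
# covariance across the midpoint vanishes after conditioning on it
x = np.linspace(0.0, 1.0, 101)                   # x_50 = 0.5
theta = np.exp(-np.abs(x[:, None] - x[None]))
m = 50
# conditional covariance: Theta_ij - Theta_im Theta_mj / Theta_mm
cond = theta - np.outer(theta[:, m], theta[m]) / theta[m, m]
left, right = np.arange(0, m), np.arange(m + 1, 101)
print(np.abs(cond[np.ix_(left, right)]).max())   # ~ 0 up to round-off
```

For non-Markovian kernels the cross-covariance does not vanish exactly, but it decays rapidly, which is the effect the factorisation exploits.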
SLIDE 51

Disintegration of Gaussian Measures and the Screening Effect

For two observation sites xi, xj, the covariance conditional on the observation sites in between is small.
SLIDE 52

Disintegration of Gaussian Measures and the Screening Effect

For two observation sites xi, xj, the covariance conditional on the observation sites in between is small.

Known as screening effect in the spatial statistics community. Analysed by Stein (2002). Used, among others, by Banerjee et al. (2008) and Katzfuss (2015) for efficient approximation of Gaussian processes.

SLIDE 53

Disintegration of Gaussian Measures and the Screening Effect

For two observation sites xi, xj, the covariance conditional on the observation sites in between is small.

Known as screening effect in the spatial statistics community. Analysed by Stein (2002). Used, among others, by Banerjee et al. (2008) and Katzfuss (2015) for efficient approximation of Gaussian processes. Let us take Y = X⌊N/2⌋. Then Y is cheap to sample, and the covariance matrix of X|Y has only 2(N/2)² non-negligible entries.

SLIDE 54

Disintegration of Gaussian Measures and the Screening Effect

For two observation sites xi, xj, the covariance conditional on the observation sites in between is small.

Known as screening effect in the spatial statistics community. Analysed by Stein (2002). Used, among others, by Banerjee et al. (2008) and Katzfuss (2015) for efficient approximation of Gaussian processes. Let us take Y = X⌊N/2⌋. Then Y is cheap to sample, and the covariance matrix of X|Y has only 2(N/2)² non-negligible entries. When using Cholesky decomposition, this yields a factor 4 improvement in computational speed.

SLIDE 55

Sparse Factorisation of Dense Matrices: fade-out instead of fill-in

Look at a single step of Block Cholesky decomposition:

SLIDE 56

Sparse Factorisation of Dense Matrices: fade-out instead of fill-in

Look at a single step of block Cholesky decomposition. This corresponds to:

[ Θ11  Θ12 ]   [ Id          0  ] [ Θ11  0                    ] [ Id  Θ11⁻¹ Θ12 ]
[ Θ21  Θ22 ] = [ Θ21 Θ11⁻¹  Id ] [ 0    Θ22 − Θ21 Θ11⁻¹ Θ12 ] [ 0   Id        ]

SLIDE 57

Sparse Factorisation of Dense Matrices: fade-out instead of fill-in

Look at a single step of block Cholesky decomposition. This corresponds to:

[ Θ11  Θ12 ]   [ Id          0  ] [ Θ11  0                    ] [ Id  Θ11⁻¹ Θ12 ]
[ Θ21  Θ22 ] = [ Θ21 Θ11⁻¹  Id ] [ 0    Θ22 − Θ21 Θ11⁻¹ Θ12 ] [ 0   Id        ]

Note that Θ21 Θ11⁻¹ b = E[X2 | X1 = b], and Θ22 − Θ21 Θ11⁻¹ Θ12 = Cov[X2 | X1].

SLIDE 58

Sparse Factorisation of Dense Matrices: fade-out instead of fill-in

Look at a single step of block Cholesky decomposition. This corresponds to:

[ Θ11  Θ12 ]   [ Id          0  ] [ Θ11  0                    ] [ Id  Θ11⁻¹ Θ12 ]
[ Θ21  Θ22 ] = [ Θ21 Θ11⁻¹  Id ] [ 0    Θ22 − Θ21 Θ11⁻¹ Θ12 ] [ 0   Id        ]

Note that Θ21 Θ11⁻¹ b = E[X2 | X1 = b], and Θ22 − Θ21 Θ11⁻¹ Θ12 = Cov[X2 | X1].

(Block-)Cholesky decomposition is computationally equivalent to the disintegration of Gaussian measures.
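This correspondence can be checked numerically; a hedged sketch on a random SPD matrix with a 2×2 block partition (all variable names here are introduced for illustration):

```python
import numpy as np

# one step of block Cholesky: the Schur complement of Theta11 is the
# conditional covariance Cov[X2 | X1], and the three factors reassemble Theta
rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6))
theta = A @ A.T + 6 * np.eye(6)                  # random SPD matrix
t11, t12 = theta[:3, :3], theta[:3, 3:]
t21, t22 = theta[3:, :3], theta[3:, 3:]
w = t21 @ np.linalg.inv(t11)                     # E[X2 | X1 = b] = w @ b
schur = t22 - w @ t12                            # Cov[X2 | X1]
lower = np.block([[np.eye(3), np.zeros((3, 3))], [w, np.eye(3)]])
middle = np.block([[t11, np.zeros((3, 3))], [np.zeros((3, 3)), schur]])
err = np.abs(lower @ middle @ lower.T - theta).max()
print(err)                                       # ~ machine precision
```

Iterating this step block by block is exactly Gaussian elimination, which is why statements about conditional covariances translate into statements about Cholesky factors.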

SLIDE 59

Sparse Factorisation of Dense Matrices: fade-out instead of fill-in

Look at a single step of block Cholesky decomposition. This corresponds to:

[ Θ11  Θ12 ]   [ Id          0  ] [ Θ11  0                    ] [ Id  Θ11⁻¹ Θ12 ]
[ Θ21  Θ22 ] = [ Θ21 Θ11⁻¹  Id ] [ 0    Θ22 − Θ21 Θ11⁻¹ Θ12 ] [ 0   Id        ]

Note that Θ21 Θ11⁻¹ b = E[X2 | X1 = b], and Θ22 − Θ21 Θ11⁻¹ Θ12 = Cov[X2 | X1].

(Block-)Cholesky decomposition is computationally equivalent to the disintegration of Gaussian measures. Follows immediately from well known formulas, but rarely used in the literature. One example: Bickson (2008).

SLIDE 60

Sparse Factorisation of Dense Matrices: fade-out instead of fill-in

This suggests choosing a bisective elimination ordering:

SLIDE 61

Sparse Factorisation of Dense Matrices: fade-out instead of fill-in

This suggests choosing a bisective elimination ordering: Let's start computing the Cholesky decomposition.

SLIDE 62

Sparse Factorisation of Dense Matrices: fade-out instead of fill-in

This suggests choosing a bisective elimination ordering: Let's start computing the Cholesky decomposition. We observe a fade-out of entries!

SLIDE 63

Sparse Factorisation of Dense Matrices: fade-out instead of fill-in

What about higher dimensional examples?

SLIDE 64

Sparse Factorisation of Dense Matrices: fade-out instead of fill-in

How about higher dimensional examples? In 2d, use quadsection:

SLIDE 65

Sparse Factorisation of Dense Matrices: fade-out instead of fill-in

We know that the result of the factorisation is sparse, but can we compute it efficiently?

SLIDE 66

Sparse Factorisation of Dense Matrices: fade-out instead of fill-in

We know that the result of the factorisation is sparse, but can we compute it efficiently? Key observation: The entry (i, j) is used for the first time with the min (i, j)-th pivot.

SLIDE 67

Sparse Factorisation of Dense Matrices: fade-out instead of fill-in

We know that the result of the factorisation is sparse, but can we compute it efficiently? Key observation: the entry (i, j) is used for the first time with the min(i, j)-th pivot. If we know the entries will be negligible until we use them, we don't have to update them, nor know them in the first place.

SLIDE 68

Sparse Factorisation of Dense Matrices: fade-out instead of fill-in

We know that the result of the factorisation is sparse, but can we compute it efficiently? Key observation: the entry (i, j) is used for the first time with the min(i, j)-th pivot. If we know the entries will be negligible until we use them, we don't have to update them, nor know them in the first place.

SLIDE 69

Sparse Factorisation of Dense Matrices: fade-out instead of fill-in

Bisective/quadsective ordering is the reverse of nested dissection. Indeed, for P the order-reversing permutation matrix, we have:

Θ⁻¹ = (L Lᵀ)⁻¹ = L⁻ᵀ L⁻¹  ⟹  P Θ⁻¹ P = P L⁻ᵀ P P L⁻¹ P = (P L⁻ᵀ P)(P L⁻ᵀ P)ᵀ,

where P L⁻ᵀ P is again lower triangular. But we also have L⁻¹ = Lᵀ Θ⁻¹. For a sparse elimination ordering of Θ, the reverse ordering leads to a sparse factorisation of Θ⁻¹.
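A quick numerical check of this reversal identity; the random SPD matrix below is a stand-in chosen for illustration.

```python
import numpy as np

# if Theta = L L^T with L lower triangular and P reverses the ordering,
# then P Theta^{-1} P = C C^T with C = P L^{-T} P again lower triangular
rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
theta = A @ A.T + 5 * np.eye(5)
L = np.linalg.cholesky(theta)
P = np.eye(5)[::-1]                              # order-reversing permutation
C = P @ np.linalg.inv(L).T @ P
print(np.allclose(C, np.tril(C)))                # True: C is lower triangular
print(np.allclose(C @ C.T, P @ np.linalg.inv(theta) @ P))  # True
```

Reversing an upper triangular matrix with P on both sides flips it to lower triangular, which is all the identity needs.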

SLIDE 70

Sparse Factorisation of Dense Matrices: fade-out instead of fill-in

We obtain a very simple algorithm:

SLIDE 71

Sparse Factorisation of Dense Matrices: fade-out instead of fill-in

We obtain a very simple algorithm: Given a positive definite matrix Θ and a graph G such that Θ⁻¹ is sparse according to G.

SLIDE 72

Sparse Factorisation of Dense Matrices: fade-out instead of fill-in

We obtain a very simple algorithm: Given a positive definite matrix Θ and a graph G such that Θ⁻¹ is sparse according to G. Obtain the inverse nested dissection ordering for G.

SLIDE 73

Sparse Factorisation of Dense Matrices: fade-out instead of fill-in

We obtain a very simple algorithm: Given a positive definite matrix Θ and a graph G such that Θ⁻¹ is sparse according to G. Obtain the inverse nested dissection ordering for G. Set entries (i, j) that are separated after pivot number min(i, j) to zero.

SLIDE 74

Sparse Factorisation of Dense Matrices: fade-out instead of fill-in

We obtain a very simple algorithm: Given a positive definite matrix Θ and a graph G such that Θ⁻¹ is sparse according to G. Obtain the inverse nested dissection ordering for G. Set entries (i, j) that are separated after pivot number min(i, j) to zero. Compute the incomplete Cholesky factorisation.
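The ordering step above can be sketched in one dimension; `bisection_order` is a name introduced here, and the breadth-first bisection below is an illustrative stand-in for a general reverse nested dissection ordering.

```python
import numpy as np
from collections import deque

def bisection_order(n):
    """Breadth-first bisection of 0..n-1: midpoints of ever finer
    halves, i.e. the reverse of a nested dissection elimination
    ordering in one dimension (coarse to fine)."""
    order, queue = [], deque([(0, n)])
    while queue:
        lo, hi = queue.popleft()
        if lo >= hi:
            continue
        mid = (lo + hi) // 2
        order.append(mid)          # eliminate the separator first
        queue.append((lo, mid))
        queue.append((mid + 1, hi))
    return order

# reorder a small 1-d exponential-kernel matrix coarse-to-fine
x = np.linspace(0.0, 1.0, 33)
theta = np.exp(-np.abs(x[:, None] - x[None]))
perm = bisection_order(33)
theta_p = theta[np.ix_(perm, perm)]  # input to the incomplete Cholesky step
print(perm[:7])
```

The incomplete factorisation is then run on the reordered matrix with the multiscale sparsity pattern, exactly as in the experiment at the start of the talk.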

SLIDE 75

Sparse Factorisation of Dense Matrices: fade-out instead of fill-in

Remaining problems with our approach:

SLIDE 76

Sparse Factorisation of Dense Matrices: fade-out instead of fill-in

Remaining problems with our approach: Nested dissection does not lead to near-linear complexity algorithms.

SLIDE 77

Sparse Factorisation of Dense Matrices: fade-out instead of fill-in

Remaining problems with our approach: Nested dissection does not lead to near-linear complexity algorithms. The precision matrix will not be exactly sparse. How is it localised?

SLIDE 78

Sparse Factorisation of Dense Matrices: fade-out instead of fill-in

Remaining problems with our approach: Nested dissection does not lead to near-linear complexity algorithms Precision matrix will not be exactly sparse. How is it localised? The answer can be found in the recent literature on numerical homogenisation:

SLIDE 79

Sparse factorisation of dense matrices using gamblets

“Gamblet” bases have been introduced as part of the game theoretical approach to numerical PDE (Owhadi (2017), Owhadi and Scovel (2017) ).

SLIDE 80

Sparse factorisation of dense matrices using gamblets

“Gamblet” bases have been introduced as part of the game theoretical approach to numerical PDE (Owhadi (2017), Owhadi and Scovel (2017)). Assume our covariance matrix is

Θi,j = ∫_[0,1]² φ(q)i(x) G(x, y) φ(q)j(y) dx dy

for φ(q)i := ✶[(i−1)hq, ihq] and G the Green's function of a second order elliptic PDE.
SLIDE 81

Sparse factorisation of dense matrices using gamblets

“Gamblet” bases have been introduced as part of the game theoretical approach to numerical PDE (Owhadi (2017), Owhadi and Scovel (2017)). Assume our covariance matrix is

Θi,j = ∫_[0,1]² φ(q)i(x) G(x, y) φ(q)j(y) dx dy

for φ(q)i := ✶[(i−1)hq, ihq] and G the Green's function of a second order elliptic PDE.

Corresponds to Xi(ω) = ∫₀¹ φ(q)i(x) u(x, ω) dx, with u(ω) the solution of an elliptic SPDE with Gaussian forcing.

SLIDE 82

Sparse factorisation of dense matrices using gamblets

Similar to our case, only with ✶[(i−1)hq, ihq] instead of Dirac measures.

SLIDE 83

Sparse factorisation of dense matrices using gamblets

Similar to our case, only with ✶[(i−1)hq, ihq] instead of Dirac measures. For φ(k)i := ✶[(i−1)hk, ihk], Owhadi and Scovel (2017) show:

SLIDE 84

Sparse factorisation of dense matrices using gamblets

Similiar to our case, only with ✶[(i−1)hq,ihq] instead of dirac mesure. For φ(k)

i

:= ✶[(i−1)hk,ihk], Owhadi and Scovel (2017) shows: ψ(k)

i

:= E

  • u|

1

0 u (x) φ(k) j

(x) dx = δi,j

  • is exponentially localised,
  • n a scale of hk:
  • F. Schäfer, T.J. Sullivan, H. Owhadi

Sparse factorisation of dense Kernel matrices June 8th 2017 84 / 130

slide-85
SLIDE 85

Sparse factorisation of dense matrices using gamblets

Similar to our case, only with 𝟙_{[(i−1)h_q, i h_q]} instead of Dirac measures. For φ^{(k)}_i := 𝟙_{[(i−1)h_k, i h_k]}, Owhadi and Scovel (2017) show that

ψ^{(k)}_i := E[ u | ∫₀¹ u(x) φ^{(k)}_j(x) dx = δ_{i,j} ]

is exponentially localised, on a scale of h_k.

Main idea: an estimate on the exponential decay of a conditional expectation implies exponential decay of the Cholesky factors.
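This conditional-expectation construction is easy to reproduce numerically. The following sketch is illustrative only (grid size, kernel, and all names are assumptions, not the talk's code): it discretises a 1D field with an exponential kernel as a stand-in for the Green's function, uses block-indicator measurement functionals, and checks that the resulting ψ_i is localised around block i.

```python
import numpy as np

# 1D grid and an exponential kernel as a stand-in for the covariance of u.
N, n_blocks = 256, 16
x = np.linspace(0, 1, N)
Theta = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.2)

# Measurement functionals phi_j: normalised indicators of dyadic blocks,
# so (Phi @ u)[j] is the average of u over block j.
Phi = np.zeros((n_blocks, N))
for j in range(n_blocks):
    Phi[j, j * (N // n_blocks):(j + 1) * (N // n_blocks)] = n_blocks / N

# psi_i = E[u | Phi u = e_i]: conditional expectation of the Gaussian
# vector u ~ N(0, Theta) given unit measurement on block i, zero elsewhere.
i = n_blocks // 2
psi_i = Theta @ Phi.T @ np.linalg.solve(Phi @ Theta @ Phi.T, np.eye(n_blocks)[i])

# By construction the measurements of psi_i reproduce the unit vector ...
assert np.allclose(Phi @ psi_i, np.eye(n_blocks)[i], atol=1e-8)
# ... and psi_i is localised: much larger on block i than far away from it.
assert abs(psi_i[i * N // n_blocks + N // (2 * n_blocks)]) > 5 * abs(psi_i[0])
```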


slide-87
SLIDE 87

Sparse factorisation of dense matrices using gamblets

Transform to a multiresolution basis to obtain the block matrix

Γ^{k,l}_{i,j} = ∫_{[0,1]²} φ^{(k),χ}_i(x) G(x, y) φ^{(l),χ}_j(y) dx dy,

where the ( φ^{(k),χ}_j )_{j∈J^{(k)}} are chosen as Haar basis functions.

slide-91
SLIDE 91

Sparse factorisation of dense matrices using gamblets

Then the results of Owhadi (2017) and Owhadi and Scovel (2017) imply that

χ^{(k)}_i := E[ u | ∫₀¹ u(x) φ^{(l),χ}_j(x) dx = δ_{i,j} δ_{k,l}, ∀ l ≤ k ]

is exponentially localised, on a scale of h_k:

|χ^{(k)}_i(x)| ≤ C exp( −(γ/h_k) |x − x^{(k)}_i| ).

Furthermore, the stiffness matrices decay exponentially on each level:

B^{(k)}_{i,j} := ∫₀¹ χ^{(k)}_i(x) (G⁻¹ χ^{(k)}_j)(x) dx,   |B^{(k)}_{i,j}| ≤ exp( −γ |x_i − x_j| ).

Finally, we have, for a constant κ:

cond( B^{(k)} ) ≤ κ, ∀ k.


slide-94
SLIDE 94

Sparse factorisation of dense matrices using gamblets

The above properties will allow us to show localisation of the (block) Cholesky factors. Consider the two-scale case:

[ Γ11  Γ12 ]   [ Id          0  ] [ Γ11  0                      ] [ Id  Γ11⁻¹ Γ12 ]
[ Γ21  Γ22 ] = [ Γ21 Γ11⁻¹  Id  ] [ 0    Γ22 − Γ21 Γ11⁻¹ Γ12    ] [ 0   Id        ]

( Γ21 Γ11⁻¹ )_{i,j} = E[ ∫ u φ^{(2),χ}_i dx | ∫ u φ^{(1),χ}_m dx = δ_{j,m} ] = ∫ φ^{(2),χ}_i χ^{(1)}_j dx

Γ22 − Γ21 Γ11⁻¹ Γ12 = Cov[ ∫ u φ^{(2),χ} dx | ∫ u φ^{(1),χ} dx ] = ( B^{(2)} )⁻¹
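The block identity above can be verified directly in a few lines. This is a generic sketch on a random SPD matrix standing in for Γ in the multiresolution basis (all names and sizes are placeholders):

```python
import numpy as np

# Random SPD matrix standing in for Gamma, partitioned into 2x2 blocks.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8))
Gamma = A @ A.T + 8 * np.eye(8)
G11, G12 = Gamma[:4, :4], Gamma[:4, 4:]
G21, G22 = Gamma[4:, :4], Gamma[4:, 4:]

# Schur complement: the conditional covariance of the second block of
# variables given the first (the (B^(2))^-1 of the slides).
S = G22 - G21 @ np.linalg.solve(G11, G12)

Z, I = np.zeros((4, 4)), np.eye(4)
L = np.block([[I, Z], [G21 @ np.linalg.inv(G11), I]])
D = np.block([[G11, Z], [Z, S]])

# The two-scale block factorisation: Gamma = L D L^T.
assert np.allclose(L @ D @ L.T, Gamma)
```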

slide-99
SLIDE 99

Sparse factorisation of dense matrices using gamblets

( Γ21 Γ11⁻¹ )_{i,j} = ∫ φ^{(2),χ}_i χ^{(1)}_j dx ≤ C exp( −(γ/h) |x^{(2)}_i − x^{(1)}_j| )

Fact: inverses (Demko (1984), Jaffard (1990)) and Cholesky factors (Benzi and Tůma (2000), Krishtal et al. (2015)) of well-conditioned and banded/exponentially localised matrices are exponentially localised. Therefore:

( B^{(2)} )⁻¹_{i,j} ≤ C exp( −(γ/h₂) |x^{(2)}_i − x^{(2)}_j| ).

The argument can be extended to multiple scales and yields exponentially decaying (block-)Cholesky factors. These factors can be approximated by (block-)Cholesky decomposition in O( N log²(N) ( log(1/ε) + log²(N) )^{4d+1} ) time and O( N log(N) log^d(N/ε) ) space, for an approximation error of ε.

slide-101
SLIDE 101

Sparse factorisation of dense matrices using gamblets

How about φ^{(q)}_i = δ_{x^{(q)}_i}, i.e. pointwise sampling? In Owhadi and Scovel (2017), analogous results for pointwise samples are obtained using averaging.


slide-106
SLIDE 106

Sparse factorisation of dense matrices using gamblets

We are left with a simple algorithm: Let Γ be Θ expressed in the multiresolution basis. Throw away all entries outside of Sρ, defined as

Sρ := { (i, j) ∈ I × I : i ∈ J^{(k)}, j ∈ J^{(l)}, dist( x^{(k)}_i, x^{(l)}_j ) ≤ ρ h_{min(k,l)} }.

Compute the incomplete (block-)Cholesky decomposition of Γ restricted to Sρ. The factorisation can be done in O( N poly( ρ log N ) ), and the error decays exponentially with ρ.
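A minimal sketch of such a pattern-restricted factorisation follows. The function name and the dense double loop are illustrative only; the real algorithm exploits the multiscale structure of Sρ to reach near-linear complexity, and the pattern must contain the diagonal.

```python
import numpy as np

def incomplete_cholesky(G, S):
    """Left-looking Cholesky in which only entries (i, j) in S are computed."""
    n = G.shape[0]
    L = np.zeros_like(G)
    for j in range(n):
        for i in range(j, n):
            if (i, j) not in S:
                continue  # entry outside the sparsity pattern stays zero
            s = G[i, j] - L[i, :j] @ L[j, :j]
            L[i, j] = np.sqrt(s) if i == j else s / L[j, j]
    return L

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 30))
G = A @ A.T + 30 * np.eye(30)

# Sanity check: with the full lower-triangular pattern, the incomplete
# factorisation reduces to the exact Cholesky decomposition.
S_full = {(i, j) for i in range(30) for j in range(i + 1)}
assert np.allclose(incomplete_cholesky(G, S_full), np.linalg.cholesky(G))
```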


slide-110
SLIDE 110

Sparse factorisation of dense matrices using gamblets

We are left with two closely related problems: In order to satisfy the conditions of the proof of bounded condition numbers given in Owhadi and Scovel (2017), the multiresolution basis needs to satisfy the vanishing-moment condition

∫_{τ^{(k)}_i} p φ^{(k),χ}_i dx = 0, ∀ p ∈ P_{s−1}( τ^{(k)}_i ),

for a τ^{(k)}_i of diameter ≈ h_k, where 2s is the order of the elliptic operator.

Therefore, the multiresolution basis depends on the operator. Also, averaging over large regions is required for the coarse basis functions, which leads to O(N²) complexity of the basis transform.


slide-115
SLIDE 115

Sparse factorisation of dense matrices using gamblets

Can we get rid of the vanishing-moment condition? The conditions in Owhadi and Scovel (2017) are (roughly speaking):

(1/C) H^k ≤ λ_min( Θ|_{Φ^{(k)}} ),   λ_max( Θ|_{⊥Φ^{(k−1)}} ) ≤ C H^{k−1}.

Moving to finer scales, the discrete space contains more and more oscillatory functions (small eigenvalues). But in the orthogonal complement of a larger space, the low modes are “projected out”. The balance of these two effects leads to bounded condition numbers.


slide-118
SLIDE 118

Sparse factorisation of dense matrices using gamblets

Gamblets are more robust! We can replace the conditions with (roughly speaking):

(1/C) H^k ≤ λ_min( Θ|_{Φ^{(k)}} ),   max_{φ∈Φ^k, ‖φ‖=1}  min_{ϕ∈Φ^{k−1}: ‖ϕ‖≤C} (φ − ϕ)ᵀ Θ (φ − ϕ) ≤ C H^{k−1}.

The gamblets find the optimal orthogonalisation themselves!


slide-119
SLIDE 119

Sparse factorisation of dense matrices using gamblets

We can use subsampling as an aggregation scheme!


slide-121
SLIDE 121

Sparse factorisation of dense matrices using gamblets

Our algorithm now consists of three steps:

1

Reorder the variables hierarchically.

2

Obtain the entries in S₂ (or, more generally, Sρ); set the other entries to zero.

3

Compute the incomplete Cholesky decomposition.

At this point, the theoretical guarantees require replacing step three with an incomplete block factorisation. All numerical evidence indicates that this is not necessary.
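Step 1 can be sketched for a dyadic 1D grid as follows (an illustrative helper, not the paper's code): visit the grid points coarsest level first, so that each level J^(k) adds the midpoints of the previous one.

```python
def hierarchical_order(q):
    """Coarse-to-fine ordering of the 2**q + 1 dyadic grid points on [0, 1]."""
    n = 2 ** q
    order, seen = [], set()
    for k in range(q + 1):
        step = n // 2 ** k
        for i in range(0, n + 1, step):  # grid points belonging to level k
            if i not in seen:
                order.append(i)
                seen.add(i)
    return order

# Coarse points come first; finer levels fill in the midpoints.
assert hierarchical_order(2) == [0, 4, 2, 1, 3]
```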


slide-123
SLIDE 123

Two additional results

As observed in Owhadi (2017) and Hou and Zhang (2017), gamblets provide a near-optimal sparse PCA. We obtain a PCA with the same approximation property by keeping only the first k columns of L.

By reversing the elimination ordering, we obtain a near-linear complexity Cholesky factorisation of the sparse/exponentially decaying inverse of Θ.
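A rough numerical sketch of the low-rank claim (the greedy ordering, kernel, and threshold below are illustrative assumptions, not the paper's construction): factor Θ in a coarse-to-fine ordering and keep the first k columns of L; the resulting rank-k approximation error is already small relative to ‖Θ‖.

```python
import numpy as np

# Kernel matrix on a 1D grid (exponential kernel as a simple example).
n, k = 64, 10
x = np.linspace(0, 1, n)
Theta = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.5)

# Greedy maximin ordering: repeatedly pick the point farthest from those
# already chosen -- a simple stand-in for the hierarchical ordering.
order = [0]
d = np.abs(x - x[0])
for _ in range(n - 1):
    j = int(np.argmax(d))
    order.append(j)
    d = np.minimum(d, np.abs(x - x[j]))

Tp = Theta[np.ix_(order, order)]
L = np.linalg.cholesky(Tp)

# Keeping the first k columns of L gives a rank-k approximation of Theta
# whose error is small, since the leading columns carry the coarse scales.
err = np.linalg.norm(Tp - L[:, :k] @ L[:, :k].T, 2)
assert err < 0.1 * np.linalg.norm(Theta, 2)
```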


slide-124
SLIDE 124

Problems at the boundary

Figure: ν = 1, l = 0.4


slide-125
SLIDE 125

Problems at the boundary


slide-126
SLIDE 126

Decay of approximation error


slide-127
SLIDE 127

Sparse approximate PCA

Figure: Near optimal sparse PCA: First panel: ν = 1, l = 0.2, δx = 0.2 and ρ = 6. Second panel: ν = 2, l = 0.2 and δx = 0.2 and ρ = 8.


slide-128
SLIDE 128

Perturbation of the Mesh

δx    ‖Γρ − Γ‖    ‖Γρ − Γ‖/‖Γ‖   ‖Γρ − Γ‖_Fro   ‖Γρ − Γ‖_Fro/‖Γ‖_Fro   #S          #S/N²
0.2   4.336e-03   1.560e-06      1.669e-02      1.026e-06              2.125e+07   7.675e-02
0.4   4.495e-03   1.617e-06      1.706e-02      1.063e-06              2.128e+07   7.683e-02
2.0   4.551e-03   1.638e-06      1.820e-02      1.077e-06              2.127e+07   7.682e-02
4.0   8.158e-03   2.940e-06      2.976e-02      1.933e-06              2.119e+07   7.652e-02

Table: Compression and accuracy for q = 7, l = 0.2, ρ = 5, ν = 1 and different values of δx.


slide-129
SLIDE 129

Data on low dimensional manifold

δz    ‖Γρ − Γ‖    ‖Γρ − Γ‖/‖Γ‖   ‖Γρ − Γ‖_Fro   ‖Γρ − Γ‖_Fro/‖Γ‖_Fro   #S          #S/N²
0.0   5.049e-03   1.560e-06      1.885e-02      1.026e-06              2.126e+07   7.677e-02
0.1   6.341e-02   1.648e-06      1.232e-01      1.077e-06              2.083e+07   7.521e-02
0.2   1.204e-01   1.749e-06      2.203e-01      1.126e-06              1.976e+07   7.137e-02
0.4   1.954e-01   3.550e-06      5.098e-01      2.197e-06              1.722e+07   6.218e-02

Table: Compression and accuracy for q = 7, l = 0.2, ρ = 5, ν = 1, δx = 2 and different values of δz.


slide-130
SLIDE 130

Fractional Operators

ν     ‖Γρ − Γ‖    ‖Γρ − Γ‖/‖Γ‖   ‖Γρ − Γ‖_Fro   ‖Γρ − Γ‖_Fro/‖Γ‖_Fro   #S          #S/N²
1.0   1.266e-03   4.556e-07      4.987e-03      2.995e-07              2.776e+07   1.003e-01
1.1   1.813e-03   6.423e-07      6.216e-03      4.190e-07              2.776e+07   1.003e-01
1.3   3.235e-03   1.129e-06      1.039e-02      7.312e-07              2.776e+07   1.003e-01
1.5   5.245e-03   1.811e-06      1.652e-02      1.166e-06              2.776e+07   1.003e-01
1.6   6.800e-03   2.333e-06      2.148e-02      1.498e-06              2.776e+07   1.003e-01
1.8   9.891e-03   3.362e-06      3.088e-02      2.147e-06              2.776e+07   1.003e-01
2.0   1.238e-02   4.180e-06      3.892e-02      2.662e-06              2.776e+07   1.003e-01

Table: Compression and accuracy for q = 7, l = 0.2, ρ = 6, δx = 0.2 and different values of ν.
