SLIDE 1
More on kernels

Marcel Lüthi

Graphics and Vision Research Group Department of Mathematics and Computer Science University of Basel

SLIDE 2

Kernels everywhere

Integral and differential equations
  • Aronszajn, Nachman. "Theory of reproducing kernels." Transactions of the American Mathematical Society 68 (1950): 337–404.

Numerical analysis, approximation and interpolation theory
  • Wahba, Grace. Spline Models for Observational Data. Vol. 59. SIAM, 1990.
  • Schaback, Robert, and Holger Wendland. "Kernel techniques: From machine learning to meshless methods." Acta Numerica 15 (2006): 543–639.
  • Hennig, Philipp, and Michael Osborne. Probabilistic numerics.

Geostatistics (Gaussian processes)
  • Stein, Michael L. Interpolation of Spatial Data: Some Theory for Kriging. Springer Science & Business Media, 1999.

SLIDE 3

Kernels everywhere

Learning theory / machine learning
  • Vapnik, Vladimir. Statistical Learning Theory. Vol. 1. New York: Wiley, 1998.
  • Hofmann, Thomas, Bernhard Schölkopf, and Alexander J. Smola. "Kernel methods in machine learning." The Annals of Statistics 36.3 (2008): 1171–1220.

Shape modelling / image analysis
  • Grenander, Ulf, and Michael I. Miller. "Computational anatomy: An emerging discipline." Quarterly of Applied Mathematics 56.4 (1998): 617–694.
  • Younes, Laurent. Shapes and Diffeomorphisms. Springer, 2010.

SLIDE 4

What do they have in common?

  • The solution space has a rich structure, which makes it possible to:
    • predict unseen values
    • deal with noisy or incomplete data
    • capture a pattern
  • Kernels are ideally suited to define such structure.
  • The resulting space of functions is mathematically “nice”.

[Figure: kernels connect ML, statistics, image analysis, numerics, and differential equations]

SLIDE 5

Back to basics: Scalar-valued GPs

Vector-valued (this course)
  • Samples $u$ are deformation fields $u: \mathcal{X} \to \mathbb{R}^d$

Scalar-valued (more common)
  • Samples $f$ are real-valued functions $f: \mathcal{X} \to \mathbb{R}$

SLIDE 6

Scalar-valued Gaussian processes

Vector-valued (this course)
  $u \sim GP(\vec{\mu}, \mathbf{k})$, with mean $\vec{\mu}: \mathcal{X} \to \mathbb{R}^d$ and kernel $\mathbf{k}: \mathcal{X} \times \mathcal{X} \to \mathbb{R}^{d \times d}$

Scalar-valued (more common)
  $f \sim GP(\mu, k)$, with mean $\mu: \mathcal{X} \to \mathbb{R}$ and kernel $k: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$

SLIDE 7

A connection

Matrix-valued kernels can be reinterpreted as scalar-valued kernels:
  Matrix-valued kernel: $\mathbf{k}: \mathcal{X} \times \mathcal{X} \to \mathbb{R}^{d \times d}$
  Scalar-valued kernel: $k: (\mathcal{X} \times \{1, \ldots, d\}) \times (\mathcal{X} \times \{1, \ldots, d\}) \to \mathbb{R}$
  Bijection: define $k((x, i), (x', j)) = \mathbf{k}(x, x')_{i,j}$

SLIDE 8

Vector-/scalar-valued kernel matrices

For $d = 2$, the matrix-valued kernel gives the block matrix

$$\mathbf{K} = \begin{pmatrix}
k_{11}(x_1, x_1) & k_{12}(x_1, x_1) & \cdots & k_{11}(x_1, x_n) & k_{12}(x_1, x_n) \\
k_{21}(x_1, x_1) & k_{22}(x_1, x_1) & \cdots & k_{21}(x_1, x_n) & k_{22}(x_1, x_n) \\
\vdots & & & & \vdots \\
k_{11}(x_n, x_1) & k_{12}(x_n, x_1) & \cdots & k_{11}(x_n, x_n) & k_{12}(x_n, x_n) \\
k_{21}(x_n, x_1) & k_{22}(x_n, x_1) & \cdots & k_{21}(x_n, x_n) & k_{22}(x_n, x_n)
\end{pmatrix}$$

while the scalar-valued reinterpretation gives the same matrix entry by entry:

$$K = \begin{pmatrix}
k((x_1,1),(x_1,1)) & k((x_1,1),(x_1,2)) & \cdots & k((x_1,1),(x_n,1)) & k((x_1,1),(x_n,2)) \\
k((x_1,2),(x_1,1)) & k((x_1,2),(x_1,2)) & \cdots & k((x_1,2),(x_n,1)) & k((x_1,2),(x_n,2)) \\
\vdots & & & & \vdots \\
k((x_n,1),(x_1,1)) & k((x_n,1),(x_1,2)) & \cdots & k((x_n,1),(x_n,1)) & k((x_n,1),(x_n,2)) \\
k((x_n,2),(x_1,1)) & k((x_n,2),(x_1,2)) & \cdots & k((x_n,2),(x_n,1)) & k((x_n,2),(x_n,2))
\end{pmatrix}$$
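The equivalence of the two constructions can be checked numerically. A minimal sketch, assuming a hypothetical matrix-valued kernel $\mathbf{k}(x,x') = g(x,x')\,I_2$ built from a squared-exponential scalar kernel (any valid matrix-valued kernel would do):

```python
import numpy as np

# Scalar base kernel (squared exponential); the matrix-valued kernel
# k(x, x') = g(x, x') * I_2 used here is a hypothetical example.
def g(x, xp, sigma=1.0):
    return np.exp(-(x - xp) ** 2 / sigma ** 2)

def k_matrix(x, xp):          # matrix-valued kernel: X x X -> R^{2x2}
    return g(x, xp) * np.eye(2)

def k_scalar(x, i, xp, j):    # scalar reinterpretation on X x {0, 1}
    return k_matrix(x, xp)[i, j]

xs = np.array([0.0, 0.5, 1.0])
n, d = len(xs), 2

# Block matrix built from the matrix-valued kernel
K_block = np.block([[k_matrix(xi, xj) for xj in xs] for xi in xs])

# The same matrix built entry by entry from the scalar-valued kernel
K_flat = np.array([[k_scalar(xs[p // d], p % d, xs[q // d], q % d)
                    for q in range(n * d)] for p in range(n * d)])

assert np.allclose(K_block, K_flat)  # both constructions agree
```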

SLIDE 9

A connection

Matrix-valued kernels can be reinterpreted as scalar-valued kernels:
  Matrix-valued kernel: $\mathbf{k}: \mathcal{X} \times \mathcal{X} \to \mathbb{R}^{d \times d}$
  Scalar-valued kernel: $k: (\mathcal{X} \times \{1, \ldots, d\}) \times (\mathcal{X} \times \{1, \ldots, d\}) \to \mathbb{R}$
  Bijection: define $k((x, i), (x', j)) = \mathbf{k}(x, x')_{i,j}$

All the theory developed for scalar-valued GPs also holds for vector-valued GPs!

SLIDE 10

The sampling space

SLIDE 11

The space of samples

Sampling from $GP(\mu, k)$ is done using the corresponding normal distribution $N(\vec{\mu}, K)$.

Algorithm (slightly inefficient):
  1. Compute an SVD: $K = U D^2 U^T$
  2. Draw a normal vector $\alpha \sim N(0, I_{n \times n})$
  3. Compute $\vec{\mu} + U D \alpha$

SLIDE 12

The space of samples

  • From $K = U D^2 U^T$ (using that $U^T U = I$) we have that
    $K U D^{-1} = U D$
  • A sample
    $s = \vec{\mu} + U D \alpha = \vec{\mu} + K U D^{-1} \alpha$
    corresponds to a linear combination of the columns of $K$.
  • $K$ is symmetric → rows/columns can be used interchangeably
SLIDE 13

Example: Squared exponential

$k(x, x') = \exp\left(-\frac{\|x - x'\|^2}{\sigma^2}\right)$, here with $\sigma = 1$.

SLIDE 14

Example: Squared exponential

$k(x, x') = \exp\left(-\frac{\|x - x'\|^2}{\sigma^2}\right)$, here with $\sigma = 3$.
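The effect of $\sigma$ can be read off directly from the kernel values: a larger $\sigma$ couples distant points more strongly, which yields smoother, slower-varying samples. A small sketch comparing the two slide settings:

```python
import numpy as np

def se_kernel(x, xp, sigma):
    # Squared exponential: k(x, x') = exp(-|x - x'|^2 / sigma^2)
    return np.exp(-np.abs(x - xp) ** 2 / sigma ** 2)

# Correlation between points one unit apart, for the two slide settings
c1 = se_kernel(0.0, 1.0, sigma=1.0)   # narrow kernel: low correlation
c3 = se_kernel(0.0, 1.0, sigma=3.0)   # wide kernel: high correlation

# Larger sigma -> stronger coupling of distant points -> smoother samples
assert c3 > c1
```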

SLIDE 15

Multi-scale signals

  • $k(x, x') = \exp\left(-\left(\frac{\|x - x'\|}{1}\right)^2\right) + 0.1\, \exp\left(-\left(\frac{\|x - x'\|}{0.1}\right)^2\right)$

SLIDE 16

Periodic kernels

  • Define $u(x) = \begin{pmatrix} \cos(x) \\ \sin(x) \end{pmatrix}$
  • $k(x, x') = \exp\left(-\frac{\|u(x) - u(x')\|^2}{\sigma^2}\right) = \exp\left(-\frac{4 \sin^2(\|x - x'\|/2)}{\sigma^2}\right)$
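The identity between the embedded form and the closed form follows from $\|u(x) - u(x')\|^2 = 2 - 2\cos(x - x') = 4\sin^2((x - x')/2)$. A sketch verifying both formulations agree (the test points are arbitrary):

```python
import numpy as np

def u(x):
    # Embed x on the unit circle
    return np.array([np.cos(x), np.sin(x)])

def k_embedded(x, xp, sigma=1.0):
    d = u(x) - u(xp)
    return np.exp(-d @ d / sigma ** 2)

def k_periodic(x, xp, sigma=1.0):
    # Closed form, using ||u(x) - u(x')||^2 = 4 sin^2((x - x') / 2)
    return np.exp(-4 * np.sin((x - xp) / 2) ** 2 / sigma ** 2)

# The two formulations agree, and the kernel has period 2*pi
for x, xp in [(0.0, 1.3), (2.0, -0.7), (5.0, 5.0)]:
    assert np.isclose(k_embedded(x, xp), k_periodic(x, xp))
assert np.isclose(k_periodic(0.0, 1.0), k_periodic(0.0, 1.0 + 2 * np.pi))
```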

SLIDE 17

Symmetric kernels

  • Enforce that $f(x) = f(-x)$
  • $k_{sym}(x, x') = k(-x, x') + k(x, x')$

SLIDE 18

Changepoint kernels

  • $k(x, x') = s(x)\, k_1(x, x')\, s(x') + (1 - s(x))\, k_2(x, x')\, (1 - s(x'))$
  • $s(x) = \frac{1}{1 + \exp(-x)}$
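The sigmoid $s$ blends between the two kernels: far left of the changepoint $k_2$ dominates, far right $k_1$ dominates. A sketch, where the two component kernels are hypothetical choices:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def k1(x, xp):
    # Region 1: smooth squared-exponential kernel (illustrative)
    return np.exp(-(x - xp) ** 2)

def k2(x, xp):
    # Region 2: short length-scale kernel (illustrative)
    return np.exp(-(x - xp) ** 2 / 0.1 ** 2)

def k_change(x, xp):
    # Sigmoid-weighted combination of the two kernels
    s, sp = sigmoid(x), sigmoid(xp)
    return s * k1(x, xp) * sp + (1 - s) * k2(x, xp) * (1 - sp)

# Far left of the changepoint k2 dominates, far right k1 dominates
assert np.isclose(k_change(-10.0, -10.0), k2(-10.0, -10.0), atol=1e-4)
assert np.isclose(k_change(10.0, 10.0), k1(10.0, 10.0), atol=1e-4)
```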

SLIDE 19

Combining existing functions

$k(x, x') = f(x)\, f(x')$, with $f(x) = x$

SLIDE 20

Combining existing functions

$k(x, x') = f(x)\, f(x')$, with $f(x) = \sin(x)$

SLIDE 21

Combining existing functions

$k(x, x') = \sum_i f_i(x)\, f_i(x')$, with $\{f_1(x) = x,\ f_2(x) = \sin(x)\}$
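Kernels built this way are automatically valid: for any coefficients $a$, $a^T K a = \sum_i (\sum_j a_j f_i(x_j))^2 \geq 0$. A sketch with the two feature functions from the slide (the evaluation grid is an assumption):

```python
import numpy as np

# Feature functions from the slide: f1(x) = x, f2(x) = sin(x)
fs = [lambda x: x, np.sin]

def k(x, xp):
    # Degenerate kernel: k(x, x') = sum_i f_i(x) f_i(x')
    return sum(f(x) * f(xp) for f in fs)

xs = np.linspace(-2, 2, 20)
K = np.array([[k(x, xp) for xp in xs] for x in xs])

# The kernel matrix is positive semi-definite ...
eigvals = np.linalg.eigvalsh(K)
assert eigvals.min() > -1e-10

# ... and with only two independent features it has rank 2
assert np.linalg.matrix_rank(K, tol=1e-8) == 2
```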

SLIDE 22

Reproducing Kernel Hilbert Space

  • Define the space of functions
    $H = \{f \mid f(x) = \sum_{i=1}^{n} \alpha_i k(x, x_i),\ n \in \mathbb{N},\ x_i \in \mathcal{X},\ \alpha_i \in \mathbb{R}\}$
  • For $f(x) = \sum_i \alpha_i k(x_i, x)$ and $g(x) = \sum_j \alpha'_j k(x_j, x)$ we define the inner product
    $\langle f, g \rangle_k = \sum_{i,j} \alpha_i \alpha'_j k(x_i, x_j)$

The space $H$ is called a Reproducing Kernel Hilbert Space (RKHS).
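This inner product yields the reproducing property $\langle f, k(\cdot, x) \rangle_k = f(x)$, which can be checked numerically. A sketch, assuming a squared-exponential kernel and arbitrary illustrative coefficients:

```python
import numpy as np

def k(x, xp, sigma=1.0):
    # Squared-exponential kernel (illustrative choice)
    return np.exp(-(x - xp) ** 2 / sigma ** 2)

# f(x) = sum_i alpha_i k(x_i, x), an element of the RKHS
centers = np.array([0.0, 1.0, 2.5])
alphas = np.array([1.0, -0.5, 2.0])

def f(x):
    return sum(a * k(c, x) for a, c in zip(alphas, centers))

def inner(alpha, xs, alpha_p, xs_p):
    # <f, g>_k = sum_{i,j} alpha_i alpha'_j k(x_i, x_j)
    return sum(a * ap * k(xi, xj)
               for a, xi in zip(alpha, xs)
               for ap, xj in zip(alpha_p, xs_p))

# Reproducing property: <f, k(., x)>_k = f(x)
x = 0.7
lhs = inner(alphas, centers, np.array([1.0]), np.array([x]))
assert np.isclose(lhs, f(x))
```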

SLIDE 23

Two different bases for the RKHS

  • Kernel basis
  • Eigenbasis (KL basis)

$k(x, x') = \exp\left(-\frac{\|x - x'\|^2}{9}\right)$

SLIDE 24

Gaussian process regression

SLIDE 25

Gaussian process regression

  • Given:
    observations $\{(x_1, y_1), \ldots, (x_n, y_n)\}$
  • Goal:
    compute $p(y^* \mid x^*, x_1, \ldots, x_n, y_1, \ldots, y_n)$

[Figure: training inputs $x_1, x_2, \ldots, x_n$ and a query point $x^*$ with unknown value $y^*$]

SLIDE 26

Gaussian process regression

  • The solution is given by the posterior process $GP(\mu_p, k_p)$ with

    $\mu_p(x^*) = K(x^*, X)\, (K(X, X) + \sigma^2 I)^{-1} y$
    $k_p(x^*, x^{*\prime}) = k(x^*, x^{*\prime}) - K(x^*, X)\, (K(X, X) + \sigma^2 I)^{-1} K(X, x^{*\prime})$

  • We can sample from the posterior.
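A minimal numpy sketch of these posterior formulas; the squared-exponential kernel, the toy data, and the noise level $\sigma$ are illustrative assumptions:

```python
import numpy as np

def k(a, b, scale=1.0):
    # Squared-exponential kernel on all pairs (illustrative choice)
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / scale ** 2)

# Toy training data; X, y, and the noise level sigma are assumptions
X = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.sin(X)
sigma = 0.1

Xs = np.linspace(-3, 3, 100)        # query points x*

Kxx_inv = np.linalg.inv(k(X, X) + sigma ** 2 * np.eye(len(X)))
Ksx = k(Xs, X)

# Posterior mean and covariance from the slide's formulas
mu_p = Ksx @ Kxx_inv @ y
k_p = k(Xs, Xs) - Ksx @ Kxx_inv @ Ksx.T

# The posterior mean stays close to the observations at the training inputs
assert np.max(np.abs(k(X, X) @ Kxx_inv @ y - y)) < 0.1
```

The posterior covariance `k_p` shrinks near the training inputs and returns to the prior far away from them.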

SLIDE 27

Examples

SLIDE 28

Examples

Gaussian kernel ($\sigma = 1$)

SLIDE 29

Examples

Gaussian kernel ($\sigma = 5$)

SLIDE 30

Examples


Periodic kernel

SLIDE 31

Examples


Changepoint kernel

SLIDE 32

Examples


Symmetric kernel

SLIDE 33

Examples


Linear kernel

SLIDE 34

Observations about the solution

  • The posterior covariance is independent of the observed values at the training points (it depends only on the training locations $X$):

    $k_p(x^*, x^{*\prime}) = k(x^*, x^{*\prime}) - K(x^*, X)\, (K(X, X) + \sigma^2 I)^{-1} K(X, x^{*\prime})$

SLIDE 35

Kernels and associated structures

SLIDE 36

An enlightening paper
