SLIDE 1
Introduction What is a kernel ? Kernel choice Making new from old linear operators Application Conclusion

Kernel Design

GP Summer School, Sheffield, September 2016 Nicolas Durrande, Mines St-Étienne, durrande@emse.fr

GP Summer School Kernel Design 1 / 60

SLIDE 2

Introduction What is a kernel ? Choosing the appropriate kernel Making new from old Effect of linear operators Application : Periodicity detection Conclusion

SLIDE 3

Introduction What is a kernel ? Choosing the appropriate kernel Making new from old Effect of linear operators Application : Periodicity detection Conclusion

SLIDE 4

We have seen during the introduction lectures that the distribution of a GP Z depends on two functions :

the mean m(x) = E (Z(x))
the covariance k(x, x′) = cov (Z(x), Z(x′))

In this talk, we will focus on the covariance function, which is often called the kernel.

SLIDE 5

We assume we have observed a function f for a limited number of time points x1, . . . , xn :

[Figure: the observed points (xi, f(xi)), with f(x) plotted against x]

The observations are denoted by fi = f (xi) (or F = f (X)).

SLIDE 6

Since f is unknown, we make the general assumption that it is a sample path of a Gaussian process Z :

[Figure: sample paths of Z(x) plotted against x]

SLIDE 7

Combining these two pieces of information means keeping only the sample paths that interpolate the data points :

[Figure: conditional sample paths of Z(x) | Z(X) = F, all interpolating the observations]

SLIDE 8

The conditional distribution is still Gaussian with moments :

m(x) = E (Z(x)|Z(X)=F) = k(x, X) k(X, X)−1 F
c(x, x′) = cov (Z(x), Z(x′)|Z(X)=F) = k(x, x′) − k(x, X) k(X, X)−1 k(X, x′)

It can be represented as a mean function with confidence intervals.

[Figure: the conditional mean of Z(x) | Z(X) = F with confidence intervals]
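These two formulas are all that is needed for prediction. As a quick illustration (not from the slides; the data, kernel and hyper-parameters below are made up), a minimal numpy sketch of the conditional moments:

```python
import numpy as np

def k(x, y, sigma2=1.0, theta=0.2):
    # squared-exponential kernel k(x, y) = sigma^2 exp(-(x - y)^2 / (2 theta^2))
    return sigma2 * np.exp(-(x[:, None] - y[None, :])**2 / (2 * theta**2))

# toy observations (hypothetical data, for illustration only)
X = np.array([0.1, 0.3, 0.6, 0.9])
F = np.array([0.2, 0.8, -0.4, 0.5])
x = np.linspace(0, 1, 101)

KXX = k(X, X) + 1e-10 * np.eye(len(X))   # small jitter for numerical stability
KxX = k(x, X)

# conditional moments: m(x) = k(x, X) k(X, X)^{-1} F
#                      c(x, x') = k(x, x') - k(x, X) k(X, X)^{-1} k(X, x')
m = KxX @ np.linalg.solve(KXX, F)
c = k(x, x) - KxX @ np.linalg.solve(KXX, KxX.T)
```

At the observed inputs the mean interpolates F and the conditional variance collapses to (almost) zero, which matches the figure above.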

SLIDE 9

Introduction What is a kernel ? Choosing the appropriate kernel Making new from old Effect of linear operators Application : Periodicity detection Conclusion

SLIDE 10

Let Z be a random process with kernel k. Some properties of kernels can be obtained directly from their definition.

Example

k(x, x) = cov (Z(x), Z(x)) = var (Z(x)) ≥ 0 ⇒ k(x, x) is positive.
k(x, y) = cov (Z(x), Z(y)) = cov (Z(y), Z(x)) = k(y, x) ⇒ k is symmetric.

We can obtain a sharper result...

SLIDE 11

We introduce the random variable T = Σ_{i=1}^{n} a_i Z(x_i), where n, the a_i and the x_i are arbitrary. Computing the variance of T gives :

var (T) = cov ( Σ_i a_i Z(x_i), Σ_j a_j Z(x_j) ) = Σ_i Σ_j a_i a_j cov (Z(x_i), Z(x_j)) = Σ_i Σ_j a_i a_j k(x_i, x_j)

Since a variance is positive, we have Σ_i Σ_j a_i a_j k(x_i, x_j) ≥ 0 for any arbitrary n, a_i and x_i.

Definition

The functions satisfying the above inequality for all n ∈ N, for all x_i ∈ D, for all a_i ∈ R are called positive semi-definite functions.
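The definition can be probed numerically for a given kernel: for random choices of n, a_i and x_i the quadratic form must stay non-negative, which is equivalent to the Gram matrix having no negative eigenvalues. A small sketch (the exponential kernel and all constants here are our illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def k_exp(x, y, sigma2=1.0, theta=0.3):
    # exponential kernel, known to be psd
    return sigma2 * np.exp(-np.abs(x[:, None] - y[None, :]) / theta)

# sample arbitrary x_i and a_i, and check the quadratic form is non-negative
x = rng.uniform(0, 1, size=50)
K = k_exp(x, x)
for _ in range(100):
    a = rng.normal(size=50)
    assert a @ K @ a >= -1e-10   # non-negative up to round-off

# equivalently, the symmetric matrix K has no negative eigenvalues
eigmin = np.linalg.eigvalsh(K).min()
```

This is only a sanity check at finitely many points, not a proof of psd-ness; the theorem below gives an actual proof technique for stationary kernels.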

SLIDE 12

We have just seen : k is a covariance ⇒ k is a positive semi-definite function. The reverse is also true :

Theorem (Loève)

k corresponds to the covariance of a GP ⇔ k is a symmetric positive semi-definite function.

SLIDE 13

Proving that a function is psd is often intractable. However, many functions have already been proven to be psd :

squared exp.   k(x, y) = σ² exp( −(x − y)² / (2θ²) )
Matérn 5/2     k(x, y) = σ² (1 + √5 |x − y| / θ + 5 |x − y|² / (3θ²)) exp( −√5 |x − y| / θ )
Matérn 3/2     k(x, y) = σ² (1 + √3 |x − y| / θ) exp( −√3 |x − y| / θ )
exponential    k(x, y) = σ² exp( −|x − y| / θ )
Brownian       k(x, y) = σ² min(x, y)
white noise    k(x, y) = σ² δ_{x,y}
constant       k(x, y) = σ²
linear         k(x, y) = σ² x y

When k is a function of x − y, the kernel is called stationary. σ² is called the variance and θ the lengthscale.
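For reference, a direct transcription of a few of these expressions (the function names are ours, not a library API; `d` denotes the distance x − y for the stationary ones):

```python
import numpy as np

def sq_exp(d, sigma2, theta):
    # squared exponential: sigma^2 exp(-d^2 / (2 theta^2))
    return sigma2 * np.exp(-d**2 / (2 * theta**2))

def matern32(d, sigma2, theta):
    # Matern 3/2: sigma^2 (1 + a) exp(-a) with a = sqrt(3)|d| / theta
    a = np.sqrt(3) * np.abs(d) / theta
    return sigma2 * (1 + a) * np.exp(-a)

def matern52(d, sigma2, theta):
    # Matern 5/2: sigma^2 (1 + a + a^2 / 3) exp(-a) with a = sqrt(5)|d| / theta
    a = np.sqrt(5) * np.abs(d) / theta
    return sigma2 * (1 + a + a**2 / 3) * np.exp(-a)

def brownian(x, y, sigma2):
    # non-stationary Brownian kernel: sigma^2 min(x, y)
    return sigma2 * np.minimum(x, y)
```

Each stationary kernel above is maximal at d = 0, where it equals the variance σ².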

SLIDE 15

If k is stationary, being psd implies further results :

Properties

If k̃ is n times differentiable in 0, then it is n times differentiable everywhere.
The maximum value of k̃(t) is reached at t = 0.

Example

The following functions are not valid covariance structures :

[Figure: three plots of invalid K(t) shapes against t]

SLIDE 16

For a few kernels, it is possible to prove they are psd directly from the definition : k(x, y) = δ_{x,y}, k(x, y) = 1. For most of them a direct proof from the definition is not possible. The following theorem is helpful for stationary kernels :

Theorem (Bochner)

A continuous stationary function k(x, y) = k̃(|x − y|) is positive definite if and only if k̃ is the Fourier transform of a finite positive measure :

k̃(t) = ∫_R e^{−iωt} dµ(ω)

SLIDE 17

Example

We consider the uniform measure on [−1, 1]. Its Fourier transform gives k̃(t) = sin(t) / t :

[Figure: the measure µ(ω) and its Fourier transform k̃(t) = sin(t)/t]

As a consequence, k(x, y) = sin(x − y) / (x − y) is a valid covariance function.
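This claim is easy to check numerically: the Gram matrix of the sinc kernel at arbitrary locations should have no negative eigenvalues. A sketch (locations and sizes are arbitrary choices):

```python
import numpy as np

def sinc_kernel(x, y):
    # k(x, y) = sin(x - y) / (x - y), equal to 1 on the diagonal
    d = x[:, None] - y[None, :]
    # np.sinc(t) = sin(pi t) / (pi t), so np.sinc(d / pi) = sin(d) / d
    return np.sinc(d / np.pi)

rng = np.random.default_rng(1)
x = rng.uniform(-10, 10, size=80)
K = sinc_kernel(x, x)
eigmin = np.linalg.eigvalsh(K).min()   # should be >= 0 up to round-off
```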

SLIDE 18

Usual kernels

Bochner's theorem can be used to prove the positive definiteness of many usual stationary kernels :

The Gaussian is the Fourier transform of itself ⇒ it is psd.
Matérn kernels are the Fourier transforms of 1 / (1 + ω²)^p ⇒ they are psd.

SLIDE 19

Unusual kernels

Taking the inverse Fourier transform of a (symmetrised) sum of Gaussians gives (A. Wilson, ICML 2013) :

[Figure: the spectral measure µ(ω) and the corresponding kernel k̃(t)]

The obtained kernel is parametrised by its spectrum.

SLIDE 20

Unusual kernels

The sample paths have the following shape :

[Figure: sample paths of the GP with the spectrum-parametrised kernel]

SLIDE 21

Introduction What is a kernel ? Choosing the appropriate kernel Making new from old Effect of linear operators Application : Periodicity detection Conclusion

SLIDE 22

Changing the kernel has a huge impact on the model :

[Figure: the model obtained with a Gaussian kernel vs. with an exponential kernel]

SLIDE 23

This is because changing the kernel implies changing the prior

[Figure: prior sample paths for a Gaussian kernel vs. an exponential kernel]

SLIDE 24

In order to choose a kernel, one should gather all possible information about the function to approximate :

Is it stationary ?
Is it differentiable ? What is its regularity ?
Do we expect particular trends ?
Do we expect particular patterns (periodicity, cycles, additivity) ?

Kernels often include rescaling parameters : θ for the x axis (the length-scale) and σ for the y axis (σ² often corresponds to the GP variance). They can be tuned by maximizing the likelihood or by minimizing the prediction error.
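As a sketch of the likelihood route (toy data and a simple grid search instead of a proper optimiser; all names and constants below are illustrative):

```python
import numpy as np

def log_marginal_likelihood(X, F, sigma2, theta, jitter=1e-8):
    # log p(F) for F ~ N(0, K) under a zero-mean squared-exponential prior
    K = sigma2 * np.exp(-(X[:, None] - X[None, :])**2 / (2 * theta**2))
    K += jitter * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, F))
    return (-0.5 * F @ alpha
            - np.log(np.diag(L)).sum()
            - 0.5 * len(X) * np.log(2 * np.pi))

# hypothetical data: pick the lengthscale by maximising the likelihood on a grid
X = np.linspace(0, 1, 20)
F = np.sin(6 * X)
thetas = np.linspace(0.05, 1.0, 40)
best_theta = max(thetas, key=lambda t: log_marginal_likelihood(X, F, 1.0, t))
```

In practice one would optimise all parameters jointly with a gradient-based optimiser; the grid keeps the sketch self-contained.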

SLIDE 25

It is common to try various kernels and to assess the model accuracy. The idea is to compare some model predictions against actual values :

on a test set
using leave-one-out

Two (ideally three) things should be checked :

Is the mean accurate (MSE, Q2) ?
Do the confidence intervals make sense ?
Are the predicted covariances right ?

Furthermore, it is often interesting to try some input remappings such as x → log(x), x → exp(x), ...
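A minimal leave-one-out sketch for the mean and the Q2 score (brute-force refits for clarity; the kernel, data and helper names are our illustrative choices):

```python
import numpy as np

def loo_residuals(X, F, k, jitter=1e-8):
    # leave-one-out predictions of the GP mean, refitting n times (brute force)
    n = len(X)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        K = k(X[keep], X[keep]) + jitter * np.eye(n - 1)
        preds[i] = (k(X[i:i+1], X[keep]) @ np.linalg.solve(K, F[keep]))[0]
    return F - preds

def q2(F, residuals):
    # Q2 = 1 - (sum of squared LOO errors) / (total sum of squares); 1 is perfect
    return 1 - np.sum(residuals**2) / np.sum((F - F.mean())**2)

k_se = lambda x, y: np.exp(-(x[:, None] - y[None, :])**2 / (2 * 0.2**2))
X = np.linspace(0, 1, 15)
F = np.sin(4 * X)
res = loo_residuals(X, F, k_se)
score = q2(F, res)
```

For n observations there are closed-form LOO formulas that avoid the n refits, but the loop above makes the idea explicit.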

SLIDE 26

Introduction What is a kernel ? Choosing the appropriate kernel Making new from old Effect of linear operators Application : Periodicity detection Conclusion

SLIDE 27

Making new from old : Kernels can be : Summed together

◮ On the same space k(x, y) = k1(x, y) + k2(x, y) ◮ On the tensor space k(x, y) = k1(x1, y1) + k2(x2, y2)

Multiplied together

◮ On the same space k(x, y) = k1(x, y) × k2(x, y) ◮ On the tensor space k(x, y) = k1(x1, y1) × k2(x2, y2)

Composed with a function

◮ k(x, y) = k1(f (x), f (y))

All these operations will preserve the positive definiteness. How can this be useful ?
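These closure properties translate directly into code. The following sketch (our own helper names, not a library API) builds new kernel functions from old ones and checks psd-ness numerically:

```python
import numpy as np

def k_se(x, y, theta=0.2):
    # base squared-exponential kernel
    return np.exp(-np.subtract.outer(x, y)**2 / (2 * theta**2))

def k_sum(k1, k2):      # sum on the same space
    return lambda x, y: k1(x, y) + k2(x, y)

def k_prod(k1, k2):     # product on the same space
    return lambda x, y: k1(x, y) * k2(x, y)

def k_warp(k1, f):      # composition with a function f
    return lambda x, y: k1(f(x), f(y))

# each construction preserves positive semi-definiteness; quick numerical check
rng = np.random.default_rng(2)
x = rng.uniform(0, 1, 40)
for k in (k_sum(k_se, k_se), k_prod(k_se, k_se), k_warp(k_se, np.log1p)):
    assert np.linalg.eigvalsh(k(x, x)).min() > -1e-10
```

Returning closures makes the constructions composable: `k_sum(k_prod(k_se, k_se), k_warp(k_se, np.log1p))` is again a valid kernel.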

SLIDE 28

Sum of kernels over the same space

Example (The Mauna Loa observatory dataset)

This famous dataset compiles the monthly CO2 concentration in Hawaii since 1958.

[Figure: monthly CO2 concentration (ppm) plotted against year]

Let’s try to predict the concentration for the next 20 years.

SLIDE 29

Sum of kernels over the same space

We first consider a squared-exponential kernel :

k(x, y) = σ² exp( −(x − y)² / θ² )

[Figure: the resulting prediction over 1950–2040]

The results are terrible !

SLIDE 30

Sum of kernels over the same space

What happens if we sum two squared-exponential kernels ? k(x, y) = krbf1(x, y) + krbf2(x, y)

[Figure: the model with the sum of two squared-exponential kernels]

SLIDE 31

Sum of kernels over the same space

What happens if we sum two squared-exponential kernels ? k(x, y) = krbf1(x, y) + krbf2(x, y)

[Figure: the model with the sum of two squared-exponential kernels]

The model is drastically improved !

SLIDE 32

Sum of kernels over the same space

We can try the following kernel : k(x, y) = σ0² x² y² + krbf1(x, y) + krbf2(x, y) + kper(x, y)

[Figure: the model with the combined kernel]

SLIDE 33

Sum of kernels over the same space

We can try the following kernel : k(x, y) = σ0² x² y² + krbf1(x, y) + krbf2(x, y) + kper(x, y)

[Figure: the model with the combined kernel]

Once again, the model is significantly improved.

SLIDE 34

Sum of kernels over tensor space

Property

k(x, y) = k1(x1, y1) + k2(x2, y2) is a valid covariance structure.

[Figure: a 1D kernel in x1 + a 1D kernel in x2 = a 2D additive kernel surface]

Remark : From a GP point of view, k is the kernel of Z(x) = Z1(x1) + Z2(x2)

SLIDE 35

Sum of kernels over tensor space

We can have a look at a few sample paths from Z :

[Figure: sample paths of Z(x) = Z1(x1) + Z2(x2)]

⇒ They are additive (up to a modification).

Additive kernels are very useful for :
approximating additive functions
building models over high-dimensional input spaces

SLIDE 36

Sum of kernels over tensor space

We consider the test function f(x) = sin(4πx1) + cos(4πx2) + 2x2 and a set of 20 observations in [0, 1]².

[Figure: the test function and the observation locations]

SLIDE 37

Sum of kernels over tensor space

We obtain the following models :

Gaussian kernel : RMSE is 1.06. [Figure: mean predictor]
Additive Gaussian kernel : RMSE is 0.12. [Figure: mean predictor]

SLIDE 38

Sum of kernels over tensor space

Remarks

It is straightforward to show that the mean predictor is additive :

m(x) = (k1(x1, X1) + k2(x2, X2)) k(X, X)−1 F
     = k1(x1, X1) k(X, X)−1 F + k2(x2, X2) k(X, X)−1 F
     = m1(x1) + m2(x2)

⇒ The model shares the additive behaviour of the prior.

The sub-models can be interpreted as GP regression models with observation noise :

m1(x1) = E ( Z1(x1) | Z1(X1) + Z2(X2) = F )
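A sketch of this decomposition (the design, kernels and data below are toy choices of ours, not the slide's experiment):

```python
import numpy as np

def k1(x, y, theta=0.2):   # kernel acting on the first input dimension
    return np.exp(-np.subtract.outer(x, y)**2 / (2 * theta**2))

k2 = k1                    # same kernel family on the second dimension

rng = np.random.default_rng(3)
X = rng.uniform(0, 1, size=(20, 2))                 # hypothetical design
F = np.sin(4 * np.pi * X[:, 0]) + 2 * X[:, 1]       # additive test function
x = rng.uniform(0, 1, size=(5, 2))                  # prediction points

# additive kernel: k(x, y) = k1(x1, y1) + k2(x2, y2)
KXX = k1(X[:, 0], X[:, 0]) + k2(X[:, 1], X[:, 1]) + 1e-8 * np.eye(20)
alpha = np.linalg.solve(KXX, F)

m1 = k1(x[:, 0], X[:, 0]) @ alpha    # sub-model depending only on x1
m2 = k2(x[:, 1], X[:, 1]) @ alpha    # sub-model depending only on x2
m = m1 + m2                          # full additive mean predictor
```

Each sub-model reuses the same weights alpha = k(X, X)⁻¹F, mirroring the decomposition of m(x) above.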

SLIDE 39

Sum of kernels over tensor space

Remark

The prediction variance has interesting features :

[Figure: prediction variance over (x1, x2) with a kernel product vs. with a kernel sum]

SLIDE 40

Sum of kernels over tensor space

This property can be used to construct a design of experiment that covers the space with only cst × d points.

[Figure: prediction variance over (x1, x2) for such a design]

SLIDE 41

Product over the same space

Property

k(x, y) = k1(x, y) × k2(x, y) is a valid covariance structure.

Example

We consider the product of a squared exponential with a cosine :

[Figure: the two kernels and their product]

SLIDE 42

Product over the tensor space

Property

k(x, y) = k1(x1, y1) × k2(x2, y2) is a valid covariance structure.

Example

We multiply two squared-exponential kernels :

[Figure: a 1D kernel in x1 × a 1D kernel in x2 = a 2D kernel surface]

Calculation shows we obtain the usual 2D squared exponential kernels.

SLIDE 43

Composition with a function

Property

Let k1 be a kernel over D1 × D1 and f be an arbitrary function D → D1, then k(x, y) = k1(f (x), f (y)) is a kernel over D × D.

Proof : Σ_i Σ_j a_i a_j k(x_i, x_j) = Σ_i Σ_j a_i a_j k1(f(x_i), f(x_j)) ≥ 0, since k1 is psd (take y_i = f(x_i) in the definition).

Remarks : k corresponds to the covariance of Z(x) = Z1(f(x)). This can be seen as a (nonlinear) rescaling of the input space.

SLIDE 44

Example

We consider f(x) = 1/x and a Matérn 3/2 kernel k1(x, y) = (1 + |x − y|) e−|x−y|. We obtain :

[Figure: the resulting kernel and associated sample paths]

SLIDE 45

All these transformations can be combined !

Example

k(x, y) = f(x) f(y) k1(x, y) is a valid kernel. This can be illustrated with f(x) = 1/x and k1(x, y) = (1 + |x − y|) e−|x−y| :

[Figure: the resulting kernel and associated sample paths]

SLIDE 46

Introduction What is a kernel ? Choosing the appropriate kernel Making new from old Effect of linear operators Application : Periodicity detection Conclusion

SLIDE 47

Effect of a linear operator

Property (Ginsbourger 2013)

Let L be a linear operator that commutes with the covariance, then k(x, y) = Lx(Ly(k1(x, y))) is a kernel.

Example

We want to approximate a function [0, 1] → R that is symmetric with respect to 0.5. We will consider two linear operators :

L1 : f(x) → f(x) if x < 0.5, and f(x) → f(1 − x) if x ≥ 0.5
L2 : f(x) → (f(x) + f(1 − x)) / 2

SLIDE 48

Effect of a linear operator

Example

Associated sample paths are :

k1 = L1(L1(k)) : [Figure: sample paths]
k2 = L2(L2(k)) : [Figure: sample paths]

The differentiability is not always respected !

SLIDE 49

Effect of a linear operator

These linear operators are projections onto a space Hsym of symmetric functions :

[Figure: f in H projected onto Hsym by L1 and L2]

What about the optimal projection ? ⇒ This can be difficult... but it raises interesting questions !

SLIDE 50

Introduction What is a kernel ? Choosing the appropriate kernel Making new from old Effect of linear operators Application : Periodicity detection Conclusion

SLIDE 51

Periodicity detection

We will now discuss the detection of periodicity : given a few observations, can we extract the periodic part of a signal ?

[Figure: the observed signal]

SLIDE 52

As previously, we will build a decomposition of the process into two independent GPs : Z = Zp + Za, where Zp is a GP in the span of the Fourier basis B(t) = (sin(t), cos(t), . . . , sin(nt), cos(nt))ᵗ.

Property

It can be proved that the kernels of Zp and Za are

kp(x, y) = B(x)ᵗ G−1 B(y)
ka(x, y) = k(x, y) − kp(x, y)

where G is the Gram matrix associated with B in the RKHS.

SLIDE 53

As previously, a decomposition of the model comes with a decomposition of the kernel :

m(x) = (kp(x, X) + ka(x, X)) k(X, X)−1 F
     = kp(x, X) k(X, X)−1 F   (periodic sub-model mp)
     + ka(x, X) k(X, X)−1 F   (aperiodic sub-model ma)

and we can associate a prediction variance to the sub-models :

vp(x) = kp(x, x) − kp(x, X) k(X, X)−1 kp(X, x)
va(x) = ka(x, x) − ka(x, X) k(X, X)−1 ka(X, x)

SLIDE 54

Example

For the observations shown previously we obtain :

[Figure: full model = periodic sub-model + aperiodic sub-model]

Can we do any better ?

SLIDE 55

Initially, the kernels are parametrised by two variables, k(x, y, σ², θ), but writing k as a sum allows the parameters of the sub-kernels to be tuned independently. Let k∗ be defined as

k∗(x, y, σp², σa², θp, θa) = kp(x, y, σp², θp) + ka(x, y, σa², θa)

Furthermore, we include a fifth parameter in k∗ accounting for the period, by changing the Fourier basis :

Bω(t) = (sin(ωt), cos(ωt), . . . , sin(nωt), cos(nωt))ᵗ

SLIDE 56

MLE of the 5 parameters of k∗ gives :

[Figure: full model = periodic sub-model + aperiodic sub-model]

We will now illustrate the use of these kernels for gene expression analysis.

SLIDE 57

We can apply this method to study the circadian rhythm in organisms. We used Arabidopsis data from Edward 2006.

The dimension of the data is : 22810 genes, 13 time points.

Edward 2006 gives a list of the 3504 most periodically expressed genes. The comparison with our approach gives :

21767 genes with the same label (2461 periodic and 19306 non-periodic)
1043 genes with different labels

SLIDE 58

Let’s look at genes with different labels :

[Figure: expression profiles of eight genes with conflicting labels: At1g60810, At4g10040, At1g06290, At5g48900, At5g41480, At3g08000, At3g03900, At2g36400; the legend distinguishes "periodic for Edward" from "periodic for our approach"]

SLIDE 59

Introduction What is a kernel ? Choosing the appropriate kernel Making new from old Effect of linear operators Application : Periodicity detection Conclusion

SLIDE 60

Small recap

We have seen that :

Kernels have a huge impact on the model.
They have to reflect the prior belief on the function to approximate.
Kernels can (and should) be tailored to the problem at hand.
Although a direct proof of the positive definiteness of a function is often intractable, Bochner's theorem allows building kernels from their power spectrum.

SLIDE 61

Various operations can be applied to kernels while keeping p.s.d.ness :

Making new from old

sum
product
composition with a function

Linear operator

If we have a linear application that transforms any function into a function satisfying the desired property, it is possible to build a GP fulfilling the requirements.

SLIDE 62

C. E. Rasmussen and C. Williams. Gaussian Processes for Machine Learning. The MIT Press, 2006.
A. Berlinet and C. Thomas-Agnan. RKHS in Probability and Statistics. Kluwer Academic, 2004.
N. Durrande, D. Ginsbourger and O. Roustant. Additive covariance kernels for high-dimensional Gaussian process modeling. AFST, 2012.
N. Durrande, D. Ginsbourger, O. Roustant and L. Carraro. ANOVA kernels and RKHS of zero mean functions for model-based sensitivity analysis. JMA, 2013.
N. Durrande, J. Hensman, M. Rattray and N. D. Lawrence. Detecting periodicities with Gaussian processes. PeerJ Computer Science, 2016.
D. Ginsbourger, X. Bay, L. Carraro and O. Roustant. Argumentwise invariant kernels for the approximation of invariant functions. AFST, 2012.
