SLIDE 1

Multiple-output Gaussian processes

Mauricio A. Álvarez

Department of Computer Science, The University of Sheffield.

SLIDES 2–3

Sensor Network

[Figure: map of sensor locations along the South Coast of England.]

SLIDE 4

Jura Data Set

[Figure: maps of lead, pH level, and copper measurements over a region of the Swiss Jura.]

SLIDE 5

Contents

- Dependencies between processes: Intrinsic Coregionalization Model; Semiparametric Latent Factor Model; Linear Model of Coregionalization; Process convolutions
- Covariance fitting and Prediction: Cokriging
- Extensions: Computational complexity; Variations of LMC; Variations of PC
- Summary

SLIDES 6–16

Single-output Gaussian process

f(x) ∼ GP(0, k(x, x′))

D = {(xi, f(xi)) | i = 1, . . . , N}

f = [f(x1), . . . , f(xN)]⊤ ∼ N(0, K), where K ∈ R^{N×N} has entries k(xi, xj).

For prediction: p(f(x∗) | f)

SLIDES 17–27

Single-output Gaussian process

f(x) ∼ GP(0, k(x, x′)),  y(xi) = f(xi) + εi,  εi ∼ N(0, σ²)

D = {(xi, y(xi)) | i = 1, . . . , N}

y = [y(x1), . . . , y(xN)]⊤ ∼ N(0, K + σ²I)

For prediction: p(f(x∗) | y)
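
To make the prediction step concrete, here is a minimal NumPy sketch of single-output GP regression. The squared-exponential kernel, its hyperparameters, and the toy data are illustrative assumptions, not anything fixed by the slides.

```python
import numpy as np

def rbf(X1, X2, lengthscale=0.2, variance=1.0):
    # Squared-exponential kernel k(x, x') = v * exp(-||x - x'||^2 / (2 l^2))
    d2 = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (20, 1))                 # N = 20 training inputs
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(20)

sigma2 = 0.01                                  # noise variance sigma^2
K = rbf(X, X)                                  # K with entries k(x_i, x_j)
L = np.linalg.cholesky(K + sigma2 * np.eye(len(X)))

Xs = np.linspace(0, 1, 100)[:, None]           # test inputs x*
Ks = rbf(Xs, X)                                # cross-covariances k(x*, x_i)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
mean = Ks @ alpha                              # E[f(x*) | y]
v = np.linalg.solve(L, Ks.T)
var = rbf(Xs, Xs).diagonal() - np.sum(v**2, axis=0)  # var[f(x*) | y]
```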

SLIDES 28–38

Multiple-output Gaussian process

f1(x) ∼ GP(0, k1(x, x′)),  f2(x) ∼ GP(0, k2(x, x′))

D1 = {(xi,1, f1(xi,1)) | i = 1, . . . , N1},  D2 = {(xi,2, f2(xi,2)) | i = 1, . . . , N2}

f1 ∼ N(0, K1),  f2 ∼ N(0, K2)

Stacking the outputs into f = [f1⊤ f2⊤]⊤,

f ∼ N(0, Kf,f),  with Kf,f = [K1, 0; 0, K2].

SLIDES 39–46

Multiple-output Gaussian process

f1(x) ∼ GP(0, k1(x, x′)),  f2(x) ∼ GP(0, k2(x, x′))

D1 = {(xi,1, y1(xi,1)) | i = 1, . . . , N1},  D2 = {(xi,2, y2(xi,2)) | i = 1, . . . , N2}

y1 ∼ N(0, K1 + σ1²I),  y2 ∼ N(0, K2 + σ2²I)

Stacking the outputs into y = [y1⊤ y2⊤]⊤,

y ∼ N(0, Kf,f + Σ),  with Kf,f = [K1, 0; 0, K2] and Σ = [σ1²I, 0; 0, σ2²I].

SLIDES 47–51

Kernels for multiple outputs

f1(x) ∼ GP(0, k1(x, x′)),  f2(x) ∼ GP(0, k2(x, x′))

D1 = {(xi,1, f1(xi,1)) | i = 1, . . . , N1},  D2 = {(xi,2, f2(xi,2)) | i = 1, . . . , N2}

Kf,f = [K1, ?; ?, K2]

Goal: build a cross-covariance function cov[f1(x), f2(x′)] for the off-diagonal blocks such that Kf,f is positive semi-definite.

SLIDES 52–55

Different input configurations of the data

Isotopic data: sample sites are shared between the inputs for f1(x) and f2(x).

D1 = {(xi, f1(xi))}_{i=1}^{N},  D2 = {(xi, f2(xi))}_{i=1}^{N}

Heterotopic data: sample sites may be different.

D1 = {(xi,1, f1(xi,1))}_{i=1}^{N1},  D2 = {(xi,2, f2(xi,2))}_{i=1}^{N2}

SLIDE 56

Contents

- Dependencies between processes: Intrinsic Coregionalization Model; Semiparametric Latent Factor Model; Linear Model of Coregionalization; Process convolutions
- Covariance fitting and Prediction: Cokriging
- Extensions: Computational complexity; Variations of LMC; Variations of PC
- Summary

SLIDE 57

Intrinsic coregionalization model (ICM): two outputs

Consider two outputs f1(x) and f2(x) with x ∈ R^p.

We assume the following generative model for the outputs:

1. Sample from a GP u(x) ∼ GP(0, k(x, x′)) to obtain u1(x).
2. Obtain f1(x) and f2(x) by linearly transforming u1(x):

   f1(x) = a^1_1 u1(x)
   f2(x) = a^1_2 u1(x)
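
The two-step generative model above is easy to simulate. A short sketch, assuming an RBF covariance for u(x) and illustrative values for the weights a^1_1 and a^1_2:

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.linspace(0, 1, 200)[:, None]

def rbf(X1, X2, lengthscale=0.1):
    # Assumed covariance for u(x): k(x, x') = exp(-(x - x')^2 / (2 l^2))
    return np.exp(-0.5 * (X1 - X2.T)**2 / lengthscale**2)

K = rbf(X, X) + 1e-8 * np.eye(len(X))          # jitter for a stable Cholesky
u1 = np.linalg.cholesky(K) @ rng.standard_normal(len(X))  # step 1: one draw of u1(x)

a11, a12 = 1.0, -0.5                           # illustrative weights a^1_1, a^1_2
f1, f2 = a11 * u1, a12 * u1                    # step 2: both outputs scale the same draw
```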

SLIDES 58–63

ICM: samples

[Figures: sample paths of u1(x) and the corresponding outputs f1(x) and f2(x), for several draws.]

SLIDE 64

ICM: covariance (I)

For a fixed value of x, we can group f1(x) and f2(x) in a vector f(x) = [f1(x) f2(x)]⊤. We refer to this vector as a vector-valued function.

The covariance for f(x) is computed as

cov(f(x), f(x′)) = E{f(x)[f(x′)]⊤} − E{f(x)}[E{f(x′)}]⊤.

We compute first the term E{f(x)[f(x′)]⊤}:

E{[f1(x); f2(x)][f1(x′) f2(x′)]} = [E{f1(x)f1(x′)}, E{f1(x)f2(x′)}; E{f2(x)f1(x′)}, E{f2(x)f2(x′)}]

SLIDE 65

ICM: covariance (II)

We compute the expected values as

E{f1(x)f1(x′)} = E{a^1_1 u1(x) a^1_1 u1(x′)} = (a^1_1)² E{u1(x)u1(x′)}

E{f1(x)f2(x′)} = E{a^1_1 u1(x) a^1_2 u1(x′)} = a^1_1 a^1_2 E{u1(x)u1(x′)}

E{f2(x)f2(x′)} = E{a^1_2 u1(x) a^1_2 u1(x′)} = (a^1_2)² E{u1(x)u1(x′)}

The term E{f(x)[f(x′)]⊤} follows as

E{f(x)[f(x′)]⊤} = [(a^1_1)², a^1_1 a^1_2; a^1_1 a^1_2, (a^1_2)²] E{u1(x)u1(x′)}

The term E{f(x)} is computed as

E{[f1(x); f2(x)]} = [E{f1(x)}; E{f2(x)}] = [E{a^1_1 u1(x)}; E{a^1_2 u1(x)}] = [a^1_1; a^1_2] E{u1(x)}

SLIDE 66

ICM: covariance (III)

Putting the terms together, the covariance for f(x) follows as

cov(f(x), f(x′)) = [(a^1_1)², a^1_1 a^1_2; a^1_1 a^1_2, (a^1_2)²] E{u1(x)u1(x′)} − [a^1_1; a^1_2][a^1_1 a^1_2] E{u1(x)} E{u1(x′)}

Defining a = [a^1_1 a^1_2]⊤,

cov(f(x), f(x′)) = aa⊤ E{u1(x)u1(x′)} − aa⊤ E{u1(x)} E{u1(x′)}
                 = aa⊤ (E{u1(x)u1(x′)} − E{u1(x)} E{u1(x′)})
                 = aa⊤ k(x, x′)

We define B = aa⊤, leading to

cov(f(x), f(x′)) = B k(x, x′) = [b11, b12; b21, b22] k(x, x′)

Notice that B has rank one.
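
The rank-one structure of B = aa⊤ can be checked directly (weight values are illustrative):

```python
import numpy as np

a = np.array([1.0, -0.5])          # a = [a^1_1, a^1_2]
B = np.outer(a, a)                 # B = a a^T
print(np.linalg.matrix_rank(B))    # prints 1: one latent sample gives a rank-one B
```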

SLIDE 67

ICM: two outputs and two latent samples

We can introduce a bit more complexity in the previous model as follows.

Consider again two outputs f1(x) and f2(x) with x ∈ R^p.

We assume the following generative model for the outputs:

1. Sample twice from a GP u(x) ∼ GP(0, k(x, x′)) to obtain u1(x) and u2(x).
2. Obtain f1(x) and f2(x) by adding scaled versions of u1(x) and u2(x):

   f1(x) = a^1_1 u1(x) + a^2_1 u2(x)
   f2(x) = a^1_2 u1(x) + a^2_2 u2(x)

Notice that u1(x) and u2(x) are independent, although they share the same covariance k(x, x′).

SLIDES 68–73

ICM: samples

[Figures: independent draws of u1(x) and u2(x) and the resulting outputs f1(x), f2(x) for the ICM with two latent samples, for several draws.]

SLIDE 74

ICM: covariance

The vector-valued function f(x) can be written as

f(x) = a^1 u1(x) + a^2 u2(x),

where a^1 = [a^1_1 a^1_2]⊤ and a^2 = [a^2_1 a^2_2]⊤.

The covariance for f(x) is computed as

cov(f(x), f(x′)) = a^1(a^1)⊤ cov(u1(x), u1(x′)) + a^2(a^2)⊤ cov(u2(x), u2(x′))
                 = a^1(a^1)⊤ k(x, x′) + a^2(a^2)⊤ k(x, x′)
                 = (a^1(a^1)⊤ + a^2(a^2)⊤) k(x, x′)

We define B = a^1(a^1)⊤ + a^2(a^2)⊤, leading to

cov(f(x), f(x′)) = B k(x, x′) = [b11, b12; b21, b22] k(x, x′)

Notice that B has rank two.

SLIDES 75–81

ICM: observed data

[Figures: observed data for the two outputs f1(x) and f2(x).]

D1 = {(xi, f1(xi)) | i = 1, . . . , N},  D2 = {(xi, f2(xi)) | i = 1, . . . , N}

Stacking the outputs,

[f1⊤ f2⊤]⊤ = [f1(x1), . . . , f1(xN), f2(x1), . . . , f2(xN)]⊤ ∼ N(0, [b11 K, b12 K; b21 K, b22 K]) = N(0, B ⊗ K)

The matrix K ∈ R^{N×N} has elements k(xi, xj).

The Kronecker product between matrices C ∈ R^{c1×c2} and G ∈ R^{g1×g2}, with entries ci,j, is

C ⊗ G = [c1,1 G, · · ·, c1,c2 G; . . .; cc1,1 G, · · ·, cc1,c2 G]
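
A sketch of assembling the joint ICM covariance B ⊗ K in NumPy; the kernel, its lengthscale, and the columns of A are illustrative assumptions. Note that np.kron(B, K) produces exactly the block layout above, with block (d, d′) equal to b_{dd′} K.

```python
import numpy as np

def rbf(X1, X2, lengthscale=0.1):
    return np.exp(-0.5 * (X1 - X2.T)**2 / lengthscale**2)

X = np.linspace(0, 1, 50)[:, None]
K = rbf(X, X)                        # K with elements k(x_i, x_j)

A = np.array([[1.0, 0.3],            # columns are a^1 and a^2
              [-0.5, 0.8]])
B = A @ A.T                          # rank-two coregionalization matrix

Kff = np.kron(B, K)                  # joint covariance of [f1; f2], shape (2N, 2N)

# One joint draw: the first N entries are f1, the last N are f2
L = np.linalg.cholesky(Kff + 1e-8 * np.eye(2 * len(X)))
f = L @ np.random.default_rng(2).standard_normal(2 * len(X))
f1, f2 = f[:len(X)], f[len(X):]
```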

SLIDE 82

ICM: general case

Consider a set of functions {fd(x)}_{d=1}^{D}.

In the ICM,

fd(x) = Σ_{i=1}^{R} a^i_d ui(x),

where the functions ui(x) are GPs sampled independently, and share the same covariance function k(x, x′).

For f(x) = [f1(x) · · · fD(x)]⊤, the covariance cov[f(x), f(x′)] is given as

cov[f(x), f(x′)] = AA⊤ k(x, x′) = B k(x, x′),  where A = [a^1 a^2 · · · a^R].

The rank of B ∈ R^{D×D} is given by R.

SLIDE 83

ICM: autokrigeability

If the outputs are considered to be noise-free, prediction using the ICM under an isotopic data case is equivalent to independent prediction over each output.

This circumstance is also known as autokrigeability.

SLIDE 84

Contents

- Dependencies between processes: Intrinsic Coregionalization Model; Semiparametric Latent Factor Model; Linear Model of Coregionalization; Process convolutions
- Covariance fitting and Prediction: Cokriging
- Extensions: Computational complexity; Variations of LMC; Variations of PC
- Summary

SLIDE 85

Semiparametric Latent Factor Model (SLFM)

The ICM uses R samples ui(x) from u(x) with the same covariance function.

The SLFM uses Q samples from processes uq(x) with different covariance functions.

The SLFM with Q = 1 is the same as the ICM with R = 1.

Consider two outputs f1(x) and f2(x) with x ∈ R^p. Suppose we have Q = 2.

We assume the following generative model for the outputs:

1. Sample from a GP GP(0, k1(x, x′)) to obtain u1(x).
2. Sample from a GP GP(0, k2(x, x′)) to obtain u2(x).
3. Obtain f1(x) and f2(x) by adding scaled versions of u1(x) and u2(x):

   f1(x) = a1,1 u1(x) + a1,2 u2(x)
   f2(x) = a2,1 u1(x) + a2,2 u2(x)

SLIDES 86–91

SLFM: samples

[Figures: draws of u1(x) and u2(x) from GPs with different covariances, and the resulting outputs f1(x) and f2(x), for several draws.]

SLIDE 92

SLFM: covariance

The vector-valued function f(x) can be written as

f(x) = a1 u1(x) + a2 u2(x),

where a1 = [a1,1 a2,1]⊤ and a2 = [a1,2 a2,2]⊤.

The covariance for f(x) is computed as

cov(f(x), f(x′)) = a1(a1)⊤ cov(u1(x), u1(x′)) + a2(a2)⊤ cov(u2(x), u2(x′))
                 = a1(a1)⊤ k1(x, x′) + a2(a2)⊤ k2(x, x′)

We define B1 = a1(a1)⊤ and B2 = a2(a2)⊤, leading to

cov(f(x), f(x′)) = B1 k1(x, x′) + B2 k2(x, x′)

Notice that B1 and B2 have rank one.

SLIDES 93–97

SLFM: observed data

[Figures: observed data for the two outputs.]

D1 = {(xi, f1(xi)) | i = 1, . . . , N},  D2 = {(xi, f2(xi)) | i = 1, . . . , N}

Stacking the outputs,

[f1⊤ f2⊤]⊤ = [f1(x1), . . . , f1(xN), f2(x1), . . . , f2(xN)]⊤ ∼ N(0, B1 ⊗ K1 + B2 ⊗ K2)

The matrix K1 ∈ R^{N×N} has elements k1(xi, xj). The matrix K2 ∈ R^{N×N} has elements k2(xi, xj).
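
The SLFM analogue of the ICM sketch replaces the single Kronecker term with a sum; lengthscales and weights here are illustrative assumptions:

```python
import numpy as np

def rbf(X1, X2, lengthscale):
    return np.exp(-0.5 * (X1 - X2.T)**2 / lengthscale**2)

X = np.linspace(0, 1, 50)[:, None]
K1 = rbf(X, X, lengthscale=0.3)      # slowly varying latent process u1
K2 = rbf(X, X, lengthscale=0.05)     # quickly varying latent process u2

a1 = np.array([1.0, -0.5])           # a1 = [a_{1,1}, a_{2,1}]
a2 = np.array([0.3, 0.8])            # a2 = [a_{1,2}, a_{2,2}]
B1, B2 = np.outer(a1, a1), np.outer(a2, a2)   # rank-one coregionalization matrices

Kff = np.kron(B1, K1) + np.kron(B2, K2)       # joint SLFM covariance, shape (2N, 2N)
```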

SLIDE 98

SLFM: general case

Consider a set of functions {fd(x)}_{d=1}^{D}.

In the SLFM,

fd(x) = Σ_{q=1}^{Q} ad,q uq(x),

where the functions uq(x) are GPs with covariance functions kq(x, x′).

For f(x) = [f1(x) · · · fD(x)]⊤, the covariance cov[f(x), f(x′)] is given as

cov[f(x), f(x′)] = Σ_{q=1}^{Q} Aq Aq⊤ kq(x, x′) = Σ_{q=1}^{Q} Bq kq(x, x′),  where Aq = aq.

The rank of each Bq ∈ R^{D×D} is one.

SLIDE 99

Contents

- Dependencies between processes: Intrinsic Coregionalization Model; Semiparametric Latent Factor Model; Linear Model of Coregionalization; Process convolutions
- Covariance fitting and Prediction: Cokriging
- Extensions: Computational complexity; Variations of LMC; Variations of PC
- Summary

SLIDE 100

Linear model of coregionalization (LMC)

The LMC generalizes the ICM and the SLFM, allowing several independent samples from GPs with different covariances.

Consider a set of functions {fd(x)}_{d=1}^{D}.

In the LMC,

fd(x) = Σ_{q=1}^{Q} Σ_{i=1}^{Rq} a^i_{d,q} u^i_q(x),

where the functions u^i_q(x) are GPs with zero means and covariance functions

cov[u^i_q(x), u^{i′}_{q′}(x′)] = kq(x, x′) if i = i′ and q = q′, and zero otherwise.

SLIDE 101

LMC: interpretation

In the LMC,

fd(x) = Σ_{q=1}^{Q} Σ_{i=1}^{Rq} a^i_{d,q} u^i_q(x).

There are Q groups of samples. For each group, there are Rq samples obtained independently from the same GP with covariance kq(x, x′).

SLIDES 102–106

LMC: example

The LMC corresponds to the sum of Q ICMs.

Suppose we have D = 2, Q = 2 and Rq = 2. According to the LMC,

f1(x) = a^1_{1,1} u^1_1(x) + a^2_{1,1} u^2_1(x) + a^1_{1,2} u^1_2(x) + a^2_{1,2} u^2_2(x),
f2(x) = a^1_{2,1} u^1_1(x) + a^2_{2,1} u^2_1(x) + a^1_{2,2} u^1_2(x) + a^2_{2,2} u^2_2(x).

SLIDE 107

LMC: covariance for f(x)

For f(x) = [f1(x) · · · fD(x)]⊤, the covariance cov[f(x), f(x′)] is given as

cov[f(x), f(x′)] = Σ_{q=1}^{Q} Aq Aq⊤ kq(x, x′) = Σ_{q=1}^{Q} Bq kq(x, x′),

where Aq = [a^1_q a^2_q · · · a^{Rq}_q].

The rank of each Bq is Rq. The matrices Bq are known as the coregionalization matrices.

SLIDES 108–112

LMC: observed data

D1 = {(xi, f1(xi)) | i = 1, . . . , N},  D2 = {(xi, f2(xi)) | i = 1, . . . , N}

Stacking the outputs,

[f1⊤ f2⊤]⊤ = [f1(x1), . . . , f1(xN), f2(x1), . . . , f2(xN)]⊤ ∼ N(0, Σ_{q=1}^{Q} Bq ⊗ Kq)

The matrix Kq ∈ R^{N×N} has elements kq(xi, xj). The matrix Bq ∈ R^{D×D} has elements b^q_{ij}.
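
For general Q the joint covariance is a loop over coregionalization terms. A sketch, where the function name, the kernel list, and the A_q matrices are caller-supplied assumptions:

```python
import numpy as np

def lmc_covariance(X, kernels, A_list):
    """Joint LMC covariance sum_q kron(B_q, K_q) for isotopic inputs X.

    kernels: list of Q functions k_q(X, X) -> (N, N) array.
    A_list:  list of Q arrays A_q of shape (D, R_q); B_q = A_q A_q^T.
    """
    N = len(X)
    D = A_list[0].shape[0]
    Kff = np.zeros((D * N, D * N))
    for kq, Aq in zip(kernels, A_list):
        Bq = Aq @ Aq.T               # coregionalization matrix of rank R_q
        Kff += np.kron(Bq, kq(X, X))
    return Kff
```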

SLIDE 113

Contents

- Dependencies between processes: Intrinsic Coregionalization Model; Semiparametric Latent Factor Model; Linear Model of Coregionalization; Process convolutions
- Covariance fitting and Prediction: Cokriging
- Extensions: Computational complexity; Variations of LMC; Variations of PC
- Summary

SLIDE 114

Moving average function

Consider again a set of D functions {fd(x)}_{d=1}^{D}.

Each function can be expressed through a convolution integral between a kernel Gd(x) and a function u(x),

fd(x) = ∫_X Gd(x − z) u(z) dz = Gd(x) ∗ u(x).

For the integral to exist, it is assumed that the kernel Gd(x) is a continuous function with compact support or square-integrable.

The kernel Gd(x) is also known as the moving average function or the smoothing kernel.

In Dependent Gaussian processes (DGP), the latent function u(x) is white Gaussian noise (WGN).

SLIDES 115–117

A pictorial representation

[Figure: the latent function u(x) is passed through the smoothing kernels G1(x) and G2(x) to produce the output functions f1(x) and f2(x).]

u(x): latent function. G1(x), G2(x): smoothing kernels. f1(x), f2(x): output functions.

SLIDES 118–122

Cross-covariance between fd(x) and fd′(x′)

The cross-covariance between fd(x) and fd′(x′), cov[fd(x), fd′(x′)], is

E[∫_X Gd(x − z) u(z) dz ∫_X Gd′(x′ − z′) u(z′) dz′] − E[∫_X Gd(x − z) u(z) dz] E[∫_X Gd′(x′ − z′) u(z′) dz′]

= ∫_X ∫_X Gd(x − z) Gd′(x′ − z′) E[u(z)u(z′)] dz′ dz − ∫_X Gd(x − z) E[u(z)] dz ∫_X Gd′(x′ − z′) E[u(z′)] dz′

= ∫_X ∫_X Gd(x − z) Gd′(x′ − z′) {E[u(z)u(z′)] − E[u(z)] E[u(z′)]} dz dz′

= ∫_X ∫_X Gd(x − z) Gd′(x′ − z′) k(z, z′) dz dz′

In the DGP, k(z, z′) = σ²δ(z − z′).

SLIDE 123

Example of cov[fd(x), fd′(x′)] (I)

With white-noise u(x), the cross-covariance between fd(x) and fd′(x′) is

cov[fd(x), fd′(x′)] = σ² ∫_X Gd(x − z) Gd′(x′ − z) dz.

Example. Assume that the smoothing kernels follow a Gaussian form,

Gd(x − z) = (Sd |Pd|^{1/2} / (2π)^{p/2}) exp(−(1/2)(x − z)⊤ Pd (x − z)).

We use the identity for the product of two Gaussians,

N(x | µ1, P1^{−1}) N(x | µ2, P2^{−1}) = N(µ1 | µ2, P1^{−1} + P2^{−1}) N(x | µc, Pc^{−1}),

where µc = (P1 + P2)^{−1}(P1 µ1 + P2 µ2) and Pc^{−1} = (P1 + P2)^{−1}.

SLIDE 124

Example of cov[fd(x), fd′(x′)] (II)

The cross-covariance between fd(x) and fd′(x′) is

cov[fd(x), fd′(x′)] = σ² ∫_X Gd(x − z) Gd′(x′ − z) dz
                    = (σ² Sd Sd′ / ((2π)^{p/2} |Peqv|^{1/2})) exp(−(1/2)(x − x′)⊤ Peqv^{−1} (x − x′)),

where Peqv = Pd^{−1} + Pd′^{−1}.

Exercise. Show how to obtain the expression above.
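
The closed form above translates directly into code. `pc_cross_cov` and its test values are hypothetical names and numbers chosen for illustration:

```python
import numpy as np

def pc_cross_cov(x, xp, Sd, Sdp, Pd, Pdp, sigma2=1.0):
    """cov[f_d(x), f_d'(x')] for Gaussian smoothing kernels and white-noise u.

    Pd, Pdp are the precision matrices of the kernels (assumed symmetric PD).
    """
    p = len(x)
    Peqv = np.linalg.inv(Pd) + np.linalg.inv(Pdp)   # P_eqv = Pd^-1 + Pd'^-1
    diff = np.asarray(x) - np.asarray(xp)
    quad = diff @ np.linalg.solve(Peqv, diff)       # (x - x')^T Peqv^-1 (x - x')
    norm = (2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Peqv))
    return sigma2 * Sd * Sdp * np.exp(-0.5 * quad) / norm

# Example in p = 2 with isotropic kernels of different widths
P1 = np.eye(2) / 0.1**2
P2 = np.eye(2) / 0.3**2
print(pc_cross_cov([0.2, 0.1], [0.25, 0.1], Sd=1.0, Sdp=0.7, Pd=P1, Pdp=P2))
```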

SLIDES 125–126

PC: samples

[Figures: sample paths of f1(x) and f2(x) drawn from the process-convolution model.]

SLIDES 127–131

PC: observed data

[Figures: observed data for the two outputs.]

D1 = {(xi, f1(xi)) | i = 1, . . . , N},  D2 = {(xi, f2(xi)) | i = 1, . . . , N}

Stacking the outputs,

[f1⊤ f2⊤]⊤ = [f1(x1), . . . , f1(xN), f2(x1), . . . , f2(xN)]⊤ ∼ N(0, [Kf1,f1, Kf1,f2; Kf2,f1, Kf2,f2])

The matrix Kfd,fd ∈ R^{N×N} has elements cov[fd(x), fd(x′)]. The matrix Kfd,fd′ ∈ R^{N×N} has elements cov[fd(x), fd′(x′)].

SLIDE 132

Beyond u(x) as a white Gaussian noise

Consider again a set of D functions {fd(x)}_{d=1}^{D}.

Each function can be expressed through a convolution integral between a kernel Gd(x) and a function u(x),

fd(x) = ∫_X Gd(x − z) u(z) dz = Gd(x) ∗ u(x).

Assume now that u(x) is a GP with zero mean and covariance k(x, x′).

The cross-covariance is now given as

cov[fd(x), fd′(x′)] = ∫_X ∫_X Gd(x − z) Gd′(x′ − z′) k(z, z′) dz dz′

SLIDES 133–135

A process u(x) with covariance k(x, x′)

The cross-covariance is

cov[fd(x), fd′(x′)] = ∫_X ∫_X Gd(x − z) Gd′(x′ − z′) k(z, z′) dz dz′

Example. Assume that the smoothing kernels and the covariance for u(x) follow a Gaussian form,

Gd(x − z) = (Sd |Pd|^{1/2} / (2π)^{p/2}) exp(−(1/2)(x − z)⊤ Pd (x − z)),

k(z, z′) = (|Λ|^{1/2} / (2π)^{p/2}) exp(−(1/2)(z − z′)⊤ Λ (z − z′)).

Using again the identity for the product of two Gaussians, we get

cov[fd(x), fd′(x′)] = (Sd Sd′ / ((2π)^{p/2} |Peqv|^{1/2})) exp(−(1/2)(x − x′)⊤ Peqv^{−1} (x − x′)),

where Peqv = Pd^{−1} + Pd′^{−1} + Λ^{−1}.

SLIDES 136–137

More general process convolutions

We can include more latent processes u1(x), u2(x), . . . , uQ(x):

fd(x) = Σ_{q=1}^{Q} Σ_{i=1}^{Rq} ∫_X G^i_{d,q}(x − z) u^i_q(z) dz,

where cov[u^i_q(z), u^{i′}_{q′}(z′)] = kq(z, z′) δi,i′ δq,q′.

A general expression for cov[fd(x), fd′(x′)] follows as

cov[fd(x), fd′(x′)] = Σ_{q=1}^{Q} Σ_{i=1}^{Rq} ∫_X G^i_{d,q}(x − z) ∫_X G^i_{d′,q}(x′ − z′) kq(z, z′) dz′ dz.

SLIDES 138–142

Starting with the general expression we had before ...

Assume we have D outputs, {fd(x)}_{d=1}^{D}. The covariance between fd(x) and fd′(x′) follows

kfd,fd′(x, x′) = Σ_{q=1}^{Q} Σ_{i=1}^{Rq} ∫_X G^i_{d,q}(x − z) ∫_X G^i_{d′,q}(x′ − z′) kq(z, z′) dz′ dz.

Some particular cases:

Intrinsic Coregionalization Model [Goovaerts, 1997] or Multi-task Gaussian Processes [Bonilla et al., 2008]:

G^i_{d,q}(x − z) = a^i_{d,q} δ(x − z),  Q = 1, Rq > 1,

kfd,fd′(x, x′) = Σ_{i=1}^{R1} a^i_{d,1} a^i_{d′,1} k1(x, x′).

SLIDES 143–144

Starting with the general expression we had before ...

Intrinsic Coregionalization Model: Kf,f = B ⊗ K

[Figures: samples of f1(x) and f2(x) from the ICM with Rq = 1 (B of rank 1) and with Rq = 2 (B of rank 2).]

SLIDES 145–148

Starting with the general expression we had before ...

Assume we have D outputs, {fd(x)}_{d=1}^{D}. The covariance between fd(x) and fd′(x′) follows [Higdon, 2002, Boyle and Frean, 2005, Álvarez et al., 2012]

kfd,fd′(x, x′) = Σ_{q=1}^{Q} Σ_{i=1}^{Rq} ∫_X G^i_{d,q}(x − z) ∫_X G^i_{d′,q}(x′ − z′) kq(z, z′) dz′ dz.

Some particular cases:

Semiparametric Latent Factor Model [Teh et al., 2005]:

G^i_{d,q}(x − z) = a^i_{d,q} δ(x − z),  Rq = 1, Q > 1,

kfd,fd′(x, x′) = Σ_{q=1}^{Q} a^1_{d,q} a^1_{d′,q} kq(x, x′).

SLIDES 149–150

Starting with the general expression we had before ...

Semiparametric Latent Factor Model: Kf,f = Σ_{q=1}^{Q} aq aq⊤ ⊗ Kq

[Figures: samples of f1(x) and f2(x) from the LMC with Rq = 1 and Q = 2.]

SLIDES 151–154

Starting with the general expression we had before ...

Assume we have D outputs, {fd(x)}_{d=1}^{D}. The covariance between fd(x) and fd′(x′) follows [Higdon, 2002, Boyle and Frean, 2005, Álvarez et al., 2012]

kfd,fd′(x, x′) = Σ_{q=1}^{Q} Σ_{i=1}^{Rq} ∫_X G^i_{d,q}(x − z) ∫_X G^i_{d′,q}(x′ − z′) kq(z, z′) dz′ dz.

Some particular cases:

Linear Model of Coregionalization [Journel and Huijbregts, 1978, Goovaerts, 1997, Wackernagel, 2003]:

G^i_{d,q}(x − z) = a^i_{d,q} δ(x − z),  Rq > 1, Q > 1,

kfd,fd′(x, x′) = Σ_{q=1}^{Q} Σ_{i=1}^{Rq} a^i_{d,q} a^i_{d′,q} kq(x, x′).

SLIDES 155–156

Starting with the general expression we had before ...

Linear Model of Coregionalization: Kf,f = Σ_{q=1}^{Q} Bq ⊗ Kq

[Figures: samples of f1(x) and f2(x) from the LMC with Rq = 2 and Q = 2.]

SLIDES 157–160

Starting with the general expression we had before ...

Assume we have D outputs, {fd(x)}_{d=1}^{D}. The covariance between fd(x) and fd′(x′) follows [Higdon, 2002, Boyle and Frean, 2005, Álvarez et al., 2012]

kfd,fd′(x, x′) = Σ_{q=1}^{Q} Σ_{i=1}^{Rq} ∫_X G^i_{d,q}(x − z) ∫_X G^i_{d′,q}(x′ − z′) kq(z, z′) dz′ dz.

Some particular cases:

Dependent GPs [Higdon, 2002, Boyle and Frean, 2005]:

Q = 1, Rq = 1, k1(z, z′) = σ²δ(z − z′),

kfd,fd′(x, x′) = σ² ∫_X Gd(x − z) Gd′(x′ − z) dz.

SLIDES 161–162

Starting with the general expression we had before ...

Comparison

[Figures: samples of f1(x) and f2(x) under the ICM, the LMC, and the process convolution (PC), side by side.]

SLIDE 163

Kernels for vector-valued functions

Mauricio A. Álvarez, Lorenzo Rosasco, and Neil D. Lawrence. Kernels for Vector-Valued Functions: A Review. Foundations and Trends in Machine Learning, Vol. 4, No. 3 (2011), pages 195–266. DOI: 10.1561/2200000036.

SLIDE 164

Contents

- Dependencies between processes: Intrinsic Coregionalization Model; Semiparametric Latent Factor Model; Linear Model of Coregionalization; Process convolutions
- Covariance fitting and Prediction: Cokriging
- Extensions: Computational complexity; Variations of LMC; Variations of PC
- Summary

SLIDES 165–167

Gaussian process priors for vector-valued functions

We saw a series of models for the set of outputs {fd(x)}_{d=1}^{D} that led to a valid covariance function for the vector f(x).

For a finite number of inputs, X = {xn}_{n=1}^{N}, the prior distribution over the vector f = [f1⊤, . . . , fD⊤]⊤ is given as

f ∼ N(0, Kf,f),  with Kf,f = [Kf1,f1, Kf1,f2, · · ·, Kf1,fD; Kf2,f1, Kf2,f2, · · ·, Kf2,fD; . . .; KfD,f1, KfD,f2, · · ·, KfD,fD].

SLIDES 168–170

Noisy observations

In practice, we usually have access to noisy observations, so we model the outputs {yd(x)}_{d=1}^{D} using

yd(x) = fd(x) + εd(x),

where {εd(x)}_{d=1}^{D} are independent white Gaussian noise processes with variance σd².

The marginal likelihood is given as

p(y | X, θ) = N(y | 0, Kf,f + Σ),

where y = [y1⊤, y2⊤, . . . , yD⊤]⊤, the vector θ refers to the hyperparameters, and Σ = diag(σ1², . . . , σD²) ⊗ IN.

SLIDE 171

Hyperparameter Learning

Let D = {xn, yn}_{n=1}^{N} represent the data, and θ represent the hyperparameters of the covariance function.

The marginal likelihood for the outputs can be written as

p(y | X, θ) = N(y | 0, Kf,f + Σ),

where Kf,f ∈ R^{ND×ND} with each element given by cov[fd(xn), fd′(xn′)].

The matrix Σ represents the covariance associated with the independent noise processes.

Hyperparameters are estimated by maximizing the logarithm of the marginal likelihood.
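
As a sketch, the objective being maximized can be written as a small function and handed to any numerical optimizer (gradients are omitted here):

```python
import numpy as np

def log_marginal_likelihood(y, Kff, Sigma):
    """log N(y | 0, Kff + Sigma), evaluated via a Cholesky factorization."""
    n = len(y)
    L = np.linalg.cholesky(Kff + Sigma)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))      # equals 0.5 * log det(Kff + Sigma)
            - 0.5 * n * np.log(2 * np.pi))
```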

SLIDE 172

Predictive distribution

Prediction for a set of test inputs X∗ is done using standard Gaussian process regression techniques.

The predictive distribution is given by

p(y∗ | y, X, θ) = N(y∗ | µ∗, Ky∗,y∗),

with

µ∗ = Kf∗,f (Kf,f + Σ)^{−1} y,
Ky∗,y∗ = Kf∗,f∗ − Kf∗,f (Kf,f + Σ)^{−1} Kf∗,f⊤ + Σ∗.
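
A direct transcription of these two formulas, with assumed array layouts for the covariance blocks:

```python
import numpy as np

def predict(y, Kff, Sigma, Ksf, Kss, Sigma_star):
    """Multi-output GP predictive mean and covariance.

    Kff: (DN, DN) training covariance; Ksf: (M, DN) cross-covariance K_{f*,f};
    Kss: (M, M) test covariance K_{f*,f*}; Sigma_star: (M, M) test noise.
    """
    L = np.linalg.cholesky(Kff + Sigma)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ksf @ alpha                       # K_{f*,f} (K_{f,f} + Sigma)^-1 y
    V = np.linalg.solve(L, Ksf.T)
    cov = Kss - V.T @ V + Sigma_star       # predictive covariance
    return mu, cov
```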

SLIDE 173

Can you prove autokrigeability?

The predictive distribution is given by

p(y∗ | y, X, θ) = N(y∗ | µ∗, Ky∗,y∗),

with

µ∗ = Kf∗,f (Kf,f + Σ)^{−1} y,
Ky∗,y∗ = Kf∗,f∗ − Kf∗,f (Kf,f + Σ)^{−1} Kf∗,f⊤ + Σ∗.

Exercise: Prove that if the outputs are considered to be noise-free, prediction using the ICM under an isotopic data case is equivalent to independent prediction over each output.

SLIDE 174

Contents

- Dependencies between processes: Intrinsic Coregionalization Model; Semiparametric Latent Factor Model; Linear Model of Coregionalization; Process convolutions
- Covariance fitting and Prediction: Cokriging
- Extensions: Computational complexity; Variations of LMC; Variations of PC
- Summary

SLIDE 175

The cokriging estimator

In geostatistics, the framework that allows for optimal predictions in the multivariate case is known by the general name of cokriging [Goovaerts, 1997].

In general, the output value for fd evaluated at x∗ is estimated as

f̂d(x∗) − µd(x∗) = Σ_{s=1}^{D} Σ_{αs=1}^{ns(x∗)} λαs(x∗)[fs(xαs) − µs(xαs)],

where λαs(x∗) are the weights assigned to the output data fs(xαs), µs(xαs) are the expected values of fs(xαs), and ns(x∗) ≤ N.

Cokriging estimators need to be unbiased, E[fd(x∗) − f̂d(x∗)] = 0, and to minimize the error variance

σ²_E(x∗) = var[fd(x∗) − f̂d(x∗)].

SLIDE 176

Cokriging assumes a model for fd

Cokriging estimators differ in the form they assume for fd(x).

In general, each output function is decomposed into a residual Rd(x) and a trend µd(x),

fd(x) = Rd(x) + µd(x), ∀d.

Residuals are assumed to be Gaussian processes with zero mean.

The covariance for the residuals is denoted as kd,d(x, x′) and the cross-covariance between residuals as kd,d′(x, x′).

SLIDE 177

Simple cokriging

The simple cokriging estimator is given as

f̂d(x∗) − µd = Σ_{s=1}^{D} Σ_{αs=1}^{ns(x∗)} λαs(x∗)[fs(xαs) − µs].

It can be shown that this is an unbiased estimator.

The coefficients λαs(x∗) can be obtained by minimizing the variance σ²_E(x∗), leading to

[λ1(x∗); . . . ; λD(x∗)] = [K1,1, · · ·, K1,D; . . .; KD,1, · · ·, KD,D]^{−1} [k1,1; . . . ; kD,1],

where Kd,d′ = [kd,d′(xαd, xβd′)] and kd,1 = [kd,1(xαd, x∗)].

The predictor is then f̂d(x∗) = λ⊤f.
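
In matrix form the system above is a single linear solve. A sketch, where the block layout of the arrays is an assumption:

```python
import numpy as np

def simple_cokriging_weights(K_joint, k_target):
    """Solve the simple cokriging system for the weights lambda.

    K_joint:  joint residual covariance across all outputs, stacked in blocks
              [K_{1,1} ... K_{1,D}; ...; K_{D,1} ... K_{D,D}].
    k_target: stacked covariances [k_{1,1}; ...; k_{D,1}] between the data
              sites and f_d(x*).
    """
    return np.linalg.solve(K_joint, k_target)

# The predictor is then lam @ (f - mu) + mu_d, matching the slide.
```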

SLIDES 178–179

Contents

- Dependencies between processes: Intrinsic Coregionalization Model; Semiparametric Latent Factor Model; Linear Model of Coregionalization; Process convolutions
- Covariance fitting and Prediction: Cokriging
- Extensions: Computational complexity; Variations of LMC; Variations of PC
- Summary

SLIDE 180

Efficient approximations (I)

Learning θ through marginal likelihood maximization involves the inversion of the matrix Kf,f + Σ.

The inversion of this matrix scales as O(D³N³).

If only a small number K < N of values of u(x) is known, then the set of outputs is uniquely determined.

SLIDES 181–182

Efficient approximations (II)

Sample from p(u):

fd(x) = ∫_X Gd(x − z) u(z) dz

Sample from p(u | u), with u the vector of values of u(z) at a set of inducing points:

fd(x) ≈ ∫_X Gd(x − z) E[u(z) | u] dz

SLIDE 183

Efficient approximations

[Figure.]

SLIDE 184

Contents

- Dependencies between processes: Intrinsic Coregionalization Model; Semiparametric Latent Factor Model; Linear Model of Coregionalization; Process convolutions
- Covariance fitting and Prediction: Cokriging
- Extensions: Computational complexity; Variations of LMC; Variations of PC
- Summary

SLIDE 185

Cross-coregionalization matrices

In the LMC,

fd(x) = Σ_{q=1}^{Q} Σ_{i=1}^{Rq} a^i_{d,q} u^i_q(x).

If the basic processes u^i_q(x) are instead assumed to be nonorthogonal [Guzmán et al., 2002], we obtain the following covariance function:

cov[f(x), f(x′)] = Σ_{q=1}^{Q} Σ_{q′=1}^{Q} Bq,q′ kq,q′(x, x′),

where the Bq,q′ are cross-coregionalization matrices.

SLIDE 186

Non-stationary LMC

We can write the vector-valued function f(x) as

f(x) = A u(x),

where A = [a1 · · · aQ] and u(x) = [u1(x) · · · uQ(x)]⊤.

A non-stationary version allows A to change with x [Gelfand et al., 2004, Wilson et al., 2012]:

f(x) = A(x) u(x).

SLIDE 187

Contents

- Dependencies between processes: Intrinsic Coregionalization Model; Semiparametric Latent Factor Model; Linear Model of Coregionalization; Process convolutions
- Covariance fitting and Prediction: Cokriging
- Extensions: Computational complexity; Variations of LMC; Variations of PC
- Summary

SLIDE 188

Extensions [Calder and Cressie, 2007]

A more general form:

fd(x) = ∫ Gd(x, z) u(z) dz,   fd(x) = Σj Gd(x, zj) u(zj)

Non-stationary models:

fd(x) = ∫ G_{d,θ(x)}(x, z) u(z) dz,   fd(x) = ∫ Gd(x, z) u_{θ(z)}(x) dz

SLIDE 189

Latent force models [Álvarez et al., 2009]

Mechanistically inspired kernel smoothing functions:

Gd(t, t′) ∝ exp[−Dq(t − t′)]   (first-order ODE)

Gd(t, t′) ∝ exp[−αq(t − t′)] sin[ωq(t − t′)]   (second-order ODE)

Gd(x, x′) ∝ exp[−Σi (xi − x′i)² / (4C)]   (PDE)

SLIDE 190

Contents

- Dependencies between processes: Intrinsic Coregionalization Model; Semiparametric Latent Factor Model; Linear Model of Coregionalization; Process convolutions
- Covariance fitting and Prediction: Cokriging
- Extensions: Computational complexity; Variations of LMC; Variations of PC
- Summary

SLIDE 191

Summary

We can do multi-task learning or transfer learning with GPs.

There are different ways to build meaningful cross-covariance functions.

Once the covariance is defined, we can do all the things we know how to do with a single-output GP.

Cokriging is just prediction with GPs (with a quadratic loss function).

There are several extensions of the LMC and of PCs.

Current research: spectral representations for the joint covariance function.

SLIDE 192

References

Mauricio A. Álvarez, David Luengo, and Neil D. Lawrence. Latent force models. In David van Dyk and Max Welling, editors, Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, pages 9–16, Clearwater Beach, Florida, 16–18 April 2009. JMLR W&CP 5.

Mauricio A. Álvarez, Lorenzo Rosasco, and Neil D. Lawrence. Kernels for vector-valued functions: a review. Foundations and Trends in Machine Learning, 4(3):195–266, 2012.

Edwin V. Bonilla, Kian Ming Chai, and Christopher K. I. Williams. Multi-task Gaussian process prediction. In John C. Platt, Daphne Koller, Yoram Singer, and Sam Roweis, editors, NIPS, volume 20, Cambridge, MA, 2008. MIT Press.

Phillip Boyle and Marcus Frean. Dependent Gaussian processes. In Lawrence Saul, Yair Weiss, and Léon Bottou, editors, NIPS, volume 17, pages 217–224, Cambridge, MA, 2005. MIT Press.

Catherine A. Calder and Noel Cressie. Some topics in convolution-based spatial modeling. In Proceedings of the 56th Session of the International Statistics Institute, August 2007.

Alan E. Gelfand, Alexandra M. Schmidt, Sudipto Banerjee, and C. F. Sirmans. Nonstationary multivariate process modeling through spatially varying coregionalization. TEST, 13(2):263–312, 2004.

Pierre Goovaerts. Geostatistics for Natural Resources Evaluation. Oxford University Press, USA, 1997.

J. A. Vargas Guzmán, A. W. Warrick, and D. E. Myers. Coregionalization by linear combination of nonorthogonal components. Mathematical Geology, 34(4):405–419, 2002.

David M. Higdon. Space and space-time modelling using process convolutions. In C. Anderson, V. Barnett, P. Chatwin, and A. El-Shaarawi, editors, Quantitative Methods for Current Environmental Issues, pages 37–56. Springer-Verlag, 2002.

Andre G. Journel and Charles J. Huijbregts. Mining Geostatistics. Academic Press, London, 1978. ISBN 0-12391-050-1.

Yee Whye Teh, Matthias Seeger, and Michael I. Jordan. Semiparametric latent factor models. In Robert G. Cowell and Zoubin Ghahramani, editors, AISTATS 10, pages 333–340, Barbados, 6–8 January 2005. Society for Artificial Intelligence and Statistics.

Hans Wackernagel. Multivariate Geostatistics. Springer-Verlag, Heidelberg/New York, 2003.

Andrew Gordon Wilson, David A. Knowles, and Zoubin Ghahramani. Gaussian process regression networks. In Proceedings of the 29th International Conference on Machine Learning, ICML '12, pages 1139–1146, 2012.