DD2434 - Advanced Machine Learning: Gaussian Processes (Carl Henrik Ek)



SLIDE 1

Introduction Recap Kernels Gaussian Processes References

DD2434 - Advanced Machine Learning

Gaussian Processes

Carl Henrik Ek {chek}@csc.kth.se

Royal Institute of Technology

November 5th, 2015

Ek KTH DD2434 - Advanced Machine Learning

SLIDE 2

Last Lecture

  • General Probabilistic Modelling

▶ Probabilistic objects
▶ Marginalisation

  • Kernels

▶ Dual linear regression
▶ Implications for modelling

SLIDE 3

Introduction Recap Kernels Gaussian Processes

SLIDE 4

Regression

  • Two variates

▶ Input data xi ∈ Rq
▶ Output data yi ∈ RD

  • Relationship: f : X → Y

SLIDE 5

Regression

Uncertainty

  • We are uncertain in our data
  • This means we cannot trust

▶ our observations
▶ the mapping that we learn
▶ the predictions that we make under the mapping

SLIDE 6

Regression

Uncertainty

  • Uncertainty in outputs yi

▶ Additive noise: yi = Wxi + ϵ
▶ Gaussian distributed noise: ϵ ∼ N(0, σ²)

  • Likelihood

SLIDE 7

Regression

Uncertainty in prediction

  • Posterior

▶ conditional distribution
▶ after the relevant information has been taken into account

  • What is relevant?

▶ our belief: the prior p(W)
▶ the observations: the likelihood p(Y|W, X)

SLIDE 8

Regression

p(Y|W, X) = ∏_{i=1}^N p(yi|W, xi)   (1)

Structure

  • Do the variables co-vary?
  • Are there (in-)dependency structures that I can exploit?

SLIDE 9

Toolbox

  • 1. Formulate the prediction error (likelihood)

▶ Does the likelihood have structure?

  • 2. Formulate belief about the model in the prior

▶ Does the prior have structure?

  • 3. Reach the posterior by combining likelihood and prior
  • 4. Choose the model based on the evidence p(D|M)

SLIDE 10

p(W)

SLIDE 11

[Figure: prior p(W), posterior p(W|X, Y), and samples of W]

SLIDE 21

Conditional1

p(X|Y) = p(Y|X)p(X) / p(Y)   (2)

Conjugate Distributions

  • The posterior and the prior are in the same family
  • Relationship with all three terms

1Wikipedia, Bishop 2006, p. 2.4.2

SLIDE 22

Marginal

p(Y|X) = ∫ p(Y|W, X)p(W)dW (3)

  • Averages over our belief and how well the model fits the observations
  • "Pushes" uncertain belief in the parameters (in this case) through to the observations
  • Gaussian marginal is Gaussian

SLIDE 23

Dual Linear Regression2

[K]ij = xiᵀxj   (4)

J(a) = ½aᵀKKa − aᵀKy + ½yᵀy + (λ/2)aᵀKa   (5)

a = (K + λI)⁻¹y   (6)

2Bishop 2006, p. 6.1.

SLIDE 24

Dual Linear Regression2

[K]ij = xiᵀxj   (7)

J(a) = ½aᵀKKa − aᵀKy + ½yᵀy + (λ/2)aᵀKa   (8)

a = (K + λI)⁻¹y   (9)

y(xi) = wᵀxi = aᵀXxi = k(xi, X)ᵀ(K + λI)⁻¹y   (10)

2Bishop 2006, p. 6.1.
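Equations (7)-(10) translate into a few lines of linear algebra. A minimal sketch of dual (kernel) ridge regression with the linear kernel, assuming NumPy is available (variable names are illustrative, not the course code):

```python
import numpy as np

# A sketch of dual linear regression, Eqs. (7)-(10), with the linear
# kernel [K]_ij = x_i^T x_j; lam plays the role of lambda.
def fit_dual(X, y, lam):
    K = X @ X.T                                          # Gram matrix, Eq. (7)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)  # a = (K + lam I)^{-1} y, Eq. (9)

def predict_dual(X, a, x_star):
    k_star = X @ x_star                                  # k(x_*, X)
    return k_star @ a                                    # y(x_*), Eq. (10)

X = np.array([[0.0], [1.0], [2.0]])                      # toy data on the line y = x
y = np.array([0.0, 1.0, 2.0])
a = fit_dual(X, y, lam=1e-6)
print(predict_dual(X, a, np.array([1.5])))               # close to 1.5
```

Note that only inner products between data points are ever needed, which is what makes the swap to an arbitrary kernel possible.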

SLIDE 25

Kernels

Kernel Functions

  • A function such that

k(xi, xj) = ϕ(xi)ᵀϕ(xj)   (11)
          = ||ϕ(xi)|| ||ϕ(xj)|| cos(θ)   (12)

  • If we have k(·, ·) we never have to know the mapping ϕ(·)

SLIDE 26

The benefits of Kernels

  • Kernels allow for implicit feature mappings

▶ We do NOT need to know the feature space
▶ Example: the space can have infinite dimensionality
▶ The mapping can be non-linear but the problem remains linear!
▶ Allows for putting weird things like strings (DNA) in a vector space

SLIDE 30

This Lecture

  • Kernel Methods

▶ Implicit feature spaces
▶ Building kernels

  • Gaussian Processes

▶ Priors over the space of functions
▶ Learning parameters of kernels

SLIDE 31

Introduction Recap Kernels Gaussian Processes

SLIDE 32

Kernels

σ(X, Y) = E[(X − E[X])ᵀ(Y − E[Y])]
        = E[XᵀY] − E[X]ᵀE[Y]   {E[X] = E[Y] = 0}
        = E[XᵀY]   (13)

SLIDE 34

Kernels

σ(X, Y) = [ x11 x21 x31 ] [ y11 y12 ]
          [ x12 x22 x32 ] [ y21 y22 ]   (14)
                          [ y31 y32 ]

        = [ x11y11 + x21y21 + x31y31   x11y12 + x21y22 + x31y32 ]
          [ x12y11 + x22y21 + x32y31   x12y12 + x22y22 + x32y32 ]

SLIDE 35

Kernels

σ(X, Y) = [ x11 x21 x31 ] [ y11 y12 ]
          [ x12 x22 x32 ] [ y21 y22 ]   (15)
                          [ y31 y32 ]

        = [ x11y11 + x21y21 + x31y31   x11y12 + x21y22 + x31y32 ]
          [ x12y11 + x22y21 + x32y31   x12y12 + x22y22 + x32y32 ]

σ(Xᵀ, Yᵀ) = [ x11 x12 ] [ y11 y21 y31 ]
            [ x21 x22 ] [ y12 y22 y32 ]   (16)
            [ x31 x32 ]

          = [ x11y11 + x12y12   x11y21 + x12y22   x11y31 + x12y32 ]
            [ x21y11 + x22y12   x21y21 + x22y22   x21y31 + x22y32 ]
            [ x31y11 + x32y12   x31y21 + x32y22   x31y31 + x32y32 ]

SLIDE 36

Kernels

Kernels and covariances

  • Covariance between columns: XTY (data-dimensions)
  • Covariance between rows: XYT (data-points)
  • Kernels: k(x, y) = ϕ(x)Tϕ(y)

▶ Kernel functions are covariances between data-points

  • A kernel function describes the co-variance of the data points
  • Specific class of functions

SLIDE 42

Kernels

k(xi, xj) = σ² exp(−(xi − xj)ᵀ(xi − xj) / 2ℓ²)   (17)

Squared Exponential

  • How does the data vary along the dimensions spanned by the data
  • RBF, Squared Exponential, Exponentiated Quadratic
  • Co-variance smoothly decays with distance
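The squared-exponential co-variance of Eq. (17) is a one-liner in code; a small sketch, assuming NumPy (hyper-parameter names sigma2 and ell are illustrative):

```python
import numpy as np

# Squared-exponential (RBF) covariance, Eq. (17); sigma2 is the signal
# variance and ell the lengthscale.
def k_se(xi, xj, sigma2=1.0, ell=1.0):
    d = xi - xj
    return sigma2 * np.exp(-0.5 * float(np.dot(d, d)) / ell**2)

print(k_se(np.array([0.0]), np.array([0.0])))  # 1.0 at zero distance
print(k_se(np.array([0.0]), np.array([3.0])))  # ~0.011: decays smoothly with distance
```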

SLIDE 43

Building Kernels

Expression                             Conditions
k(x, z) = c k1(x, z)                   c - any non-negative real constant
k(x, z) = f(x)k1(x, z)f(z)             f - any real-valued function
k(x, z) = q(k1(x, z))                  q - any polynomial with non-negative coefficients
k(x, z) = exp(k1(x, z))
k(x, z) = k1(x, z) + k2(x, z)
k(x, z) = k1(x, z)k2(x, z)
k(x, z) = k3(ϕ(x), ϕ(z))               k3 - valid kernel in the space mapped by ϕ
k(x, z) = ⟨Ax, z⟩ = ⟨x, Az⟩            A - symmetric positive semi-definite matrix
k(x, z) = ka(xa, za) + kb(xb, zb)      xa and xb - not necessarily disjoint partitions of x
k(x, z) = ka(xa, za)kb(xb, zb)         ka and kb - valid kernels on their respective spaces
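The sum and product rules above can be checked numerically: a Gram matrix built from a valid kernel must stay positive semi-definite. A sketch, assuming NumPy (function names are illustrative), composing a linear and a squared-exponential kernel:

```python
import numpy as np

# Valid kernels are closed under sums (and products); compose a linear
# and a squared-exponential kernel and check the Gram matrix.
def k_lin(x, z):
    return float(np.dot(x, z))

def k_se(x, z, sigma2=1.0, ell=1.0):
    d = x - z
    return sigma2 * float(np.exp(-0.5 * np.dot(d, d) / ell**2))

def k_sum(x, z):
    return k_lin(x, z) + k_se(x, z)

# the Gram matrix of a valid kernel is positive semi-definite
X = np.random.default_rng(0).normal(size=(5, 2))
K = np.array([[k_sum(a, b) for b in X] for a in X])
print(np.linalg.eigvalsh(K).min() >= -1e-9)  # True
```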

SLIDE 44

Summary

  • Defines inner products in some space
  • We don't need to know the space, it's implicitly defined by the kernel function
  • Defines co-variance between data-points

SLIDE 47

Introduction Recap Kernels Gaussian Processes

SLIDE 48

What have you seen up till now?

  • Probabilistic modelling

▶ likelihood, prior, posterior ▶ marginalisation

  • Implicit feature spaces

▶ kernel functions

  • We have assumed the form of the mapping without uncertainty

SLIDE 50

Outline

  • General Regression
  • Introduce uncertainty in mapping
  • prior over the space of functions

SLIDE 53

Regression

Regression model,

yi = f(xi) + ϵ   (18)
ϵ ∼ N(0, σ²I)   (19)

Introduce fi as an instantiation of the function,

fi = f(xi),   (20)

as a new random variable.

SLIDE 54

Regression

Model,

p(Y, f, X, θ) = p(Y|f)p(f|X, θ)p(X)p(θ)   (21)

We want to "push" X through a mapping f of which we are uncertain,

p(f|X, θ),   (22)

a prior over instantiations of the function.

SLIDE 55

Priors over functions3

3Lecture7/gp basics.py

SLIDE 60

Gaussian Distribution

Joint distribution,

[ x1 ]     ( [ µ1 ]   [ σ(x1, x1)  σ(x1, x2) ] )
[ x2 ]  ∼ N( [ µ2 ] , [ σ(x2, x1)  σ(x2, x2) ] )   (23)

Conditional,

x2|x1 ∼ N( µ2 + σ(x2, x1)σ(x1, x1)⁻¹(x1 − µ1),
           σ(x2, x2) − σ(x2, x1)σ(x1, x1)⁻¹σ(x1, x2) )   (24)
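Equation (24) in code, for the zero-mean, unit-variance case used on the following slides; a minimal sketch (names are illustrative):

```python
# Conditioning x2 | x1 in a bivariate Gaussian, Eq. (24), with
# mu1 = mu2 = 0, unit variances, and correlation rho.
def condition(x1, rho):
    mean = rho * x1      # sigma(x2, x1) sigma(x1, x1)^{-1} (x1 - mu1)
    var = 1.0 - rho**2   # sigma(x2, x2) - sigma(x2, x1) sigma(x1, x1)^{-1} sigma(x1, x2)
    return mean, var

print(condition(1.0, rho=0.5))    # (0.5, 0.75): weak coupling, wide conditional
print(condition(1.0, rho=0.99))   # mean 0.99, variance ~0.02: x2 is pinned near x1
```

This is exactly the effect shown in the conditional-Gaussian figures below: the stronger the correlation, the narrower the conditional.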

SLIDE 61

The Gaussian Conditional4

N( [0; 0], [1  0.5; 0.5  1] )   (25)

4Lecture7/conditional gaussian.py

SLIDE 66

The Gaussian Conditional4

N( [0; 0], [1  0.99; 0.99  1] )   (30)

4Lecture7/conditional gaussian.py

SLIDE 71

The Gaussian Conditional4

N( [0; 0], [1  1; 1  1] )   (35)

4Lecture7/conditional gaussian.py

SLIDE 85

If all instantiations of the function are jointly Gaussian, such that the co-variance structure depends on how much information one observation provides about the others, we get the curve above.

SLIDE 86

Row space

  • Co-variance between each point!
  • Co-variance function is a kernel!
  • We can do all of this in the induced space, i.e. allow for any function!

SLIDE 89

Gaussian Processes5

p(f|X, θ) ∼ GP(µ(X), k(X, X))   (40)

Definition

A Gaussian Process is an infinite collection of random variables, any finite subset of which is jointly Gaussian. The process is specified by a mean function µ(·) and a co-variance function k(·, ·),

f ∼ GP(µ(·), k(·, ·))   (41)

5Bishop 2006, p. 6.4.2

SLIDE 90

Gaussian Processes5

p(f|X, θ) ∼ GP(µ(X), k(X, X))   (42)
yi = fi + ϵ   (43)
ϵ ∼ N(0, σ²I)   (44)

p(Y|X, θ) = ∫ p(Y|f)p(f|X, θ)df   (45)

Connection to Distribution

The GP is infinite, but we only observe a finite amount of data. This means that, conditioned on a finite subset of the data, the GP is just a Gaussian distribution, which is self-conjugate.

5Bishop 2006, p. 6.4.2

SLIDE 91

Gaussian Processes5

The mean function

  • Function of only the input location
  • What do I expect the function value to be, accounting only for the input location?
  • We will assume this to be constant

The co-variance function

  • Function of two input locations
  • How should information from other locations with known function values affect my estimate?
  • Encodes the behavior of the function

5Bishop 2006, p. 6.4.2

SLIDE 94

Gaussian Processes5

The Prior

p(f|X, θ) = GP(µ(x), k(x, x′))   (46)
µ(x) = 0   (47)
k(xi, xj) = σ² exp(−(xi − xj)ᵀ(xi − xj) / 2ℓ²)   (48)
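Drawing function realisations from this prior amounts to sampling a multivariate Gaussian whose covariance is the SE Gram matrix. A sketch in the spirit of the course's gp basics.py, assuming NumPy (not the actual course script):

```python
import numpy as np

# Sample realisations of f from the GP prior of Eqs. (46)-(48):
# zero mean, squared-exponential covariance.
def se_gram(X, sigma2=1.0, ell=1.0):
    d = X[:, None] - X[None, :]
    return sigma2 * np.exp(-0.5 * d**2 / ell**2)

X = np.linspace(-5.0, 5.0, 100)
K = se_gram(X) + 1e-8 * np.eye(len(X))   # jitter for numerical stability
rng = np.random.default_rng(0)
f = rng.multivariate_normal(np.zeros(len(X)), K, size=3)
print(f.shape)  # (3, 100): three function draws over 100 input locations
```

Plotting each row of f against X gives smooth random curves; shortening ell makes the draws wiggle faster.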

5Bishop 2006, p. 6.4.2

SLIDE 106

Gaussian Processes5

The (predictive) Posterior

[ f  ]     ( [ 0 ]   [ k(X, X)    k(X, x∗)  ] )
[ f∗ ]  ∼ N( [ 0 ] , [ k(x∗, X)   k(x∗, x∗) ] )   (49)

p(f∗|x∗, X, f, θ) = N( k(x∗, X)ᵀK(X, X)⁻¹f,
                       k(x∗, x∗) − k(x∗, X)ᵀK(X, X)⁻¹K(X, x∗) )   (50)
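Equation (50) is a few lines of linear algebra. A noise-free sketch, assuming NumPy and an SE covariance (names are illustrative):

```python
import numpy as np

# GP predictive posterior, Eq. (50), for noise-free observations f at
# inputs X, with a squared-exponential covariance.
def k_se(a, b, ell=1.0):
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)

def gp_posterior(X, f, x_star, jitter=1e-10):
    K = k_se(X, X) + jitter * np.eye(len(X))   # k(X, X), with jitter
    k_s = k_se(X, x_star)                      # k(X, x*)
    mean = k_s.T @ np.linalg.solve(K, f)       # k(x*, X)^T K^{-1} f
    cov = k_se(x_star, x_star) - k_s.T @ np.linalg.solve(K, k_s)
    return mean, np.diag(cov)

X = np.array([-2.0, 0.0, 2.0])
f = np.sin(X)
mean, var = gp_posterior(X, f, np.array([0.0]))
print(mean[0], var[0])  # mean ~ sin(0) = 0 and variance ~ 0 at an observed input
```

At an observed input the posterior collapses onto the data; between observations the variance grows back towards the prior, which is the behaviour shown on the following slides.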

5Bishop 2006, p. 6.4.2

SLIDE 107

Gaussian Processes5

k(x∗, X)ᵀK(X, X)⁻¹f   (51)

5Bishop 2006, p. 6.4.2

SLIDE 108

Gaussian Processes5

k(x∗, x∗) − k(x∗, X)ᵀK(X, X)⁻¹K(X, x∗)   (52)

5Bishop 2006, p. 6.4.2

SLIDE 122

Gaussian Processes5

Summary

  • A GP is a prior over function realisations
  • Introduce a new random variable as the output of the mapping
  • The joint distribution of any set of observations is Gaussian
  • The posterior (predictive) distribution is a conditional Gaussian

5Bishop 2006, p. 6.4.2

SLIDE 123

Co-variances in practice

[ y  ]     ( [ 0 ]   [ k(X, X) + σ²I   k(X, x∗)  ] )
[ f∗ ]  ∼ N( [ 0 ] , [ k(x∗, X)        k(x∗, x∗) ] )   (55)

  • The conditional distribution passes exactly through the data

▶ noise-free observations

  • Construct covariance functions by rules for building kernels

▶ k(xi, xj) = λ1kSE(xi, xj) + λ2klin(xi, xj) + λ3kwhite(xi, xj)
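The effect of the σ²I term in Eq. (55) is easy to see numerically: with it, the posterior mean no longer interpolates the data exactly. A sketch, assuming NumPy and an SE covariance (names are illustrative):

```python
import numpy as np

# Posterior mean with and without observation noise, cf. Eq. (55):
# the sigma^2 I term smooths the fit away from the data.
def k_se(a, b):
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2)

def gp_mean(X, y, x_star, noise):
    K = k_se(X, X) + noise * np.eye(len(X))
    return k_se(X, x_star).T @ np.linalg.solve(K, y)

X = np.array([-1.0, 0.0, 1.0])
y = np.array([0.0, 1.0, 0.0])
print(gp_mean(X, y, np.array([0.0]), noise=0.0)[0])  # ~1.0: passes through the data
print(gp_mean(X, y, np.array([0.0]), noise=0.5)[0])  # ~0.52: smoothed by the noise
```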

SLIDE 124

Co-variances in practice

Periodic kernel,

k(xi, xj) = σ² exp(−(2/ℓ²) sin²(π|xi − xj| / p))   (56)

Periodic functions

  • ℓ lengthscale
  • p period of function
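Equation (56) as code; a small sketch, assuming NumPy (hyper-parameter names are illustrative):

```python
import numpy as np

# Periodic covariance, Eq. (56): ell is the lengthscale, p the period.
def k_periodic(xi, xj, sigma2=1.0, ell=1.0, p=2.0):
    return sigma2 * np.exp(-2.0 * np.sin(np.pi * abs(xi - xj) / p)**2 / ell**2)

print(k_periodic(0.0, 2.0))  # 1.0: one full period apart, perfectly correlated again
print(k_periodic(0.0, 1.0))  # ~0.135 = exp(-2): half a period apart
```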

SLIDE 125

Co-variances in practice

klin(xi, xj) = xiᵀxj   (57)

k(xi, xj) = (2/π) sin⁻¹( 2xiᵀΣxj / √((1 + 2xiᵀΣxi)(1 + 2xjᵀΣxj)) )   (58)

xi = [1, x1i, . . . , xqi]ᵀ   (59)

"Computation with Infinite Neural Networks", Williams

Non-stationary functions

  • Non-stationary co-variance
  • Functions that behave differently in different parts of the domain
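Williams' arcsin kernel, eq. (58) with the augmented input of eq. (59), can be sketched as follows (Σ and the example inputs are illustrative assumptions):

```python
import numpy as np

def k_arcsin(xi, xj, Sigma):
    # augment the inputs with a bias component, eq. (59)
    xi = np.concatenate(([1.0], xi))
    xj = np.concatenate(([1.0], xj))
    # arcsin covariance of an infinite neural network, eq. (58)
    num = 2.0 * xi @ Sigma @ xj
    den = np.sqrt((1.0 + 2.0 * xi @ Sigma @ xi) * (1.0 + 2.0 * xj @ Sigma @ xj))
    return (2.0 / np.pi) * np.arcsin(num / den)
```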


slide-126
SLIDE 126

Introduction Recap Kernels Gaussian Processes References

Co-variances in practice6

[K]ij = k(xi, xj)   (60)

6/Lecture7/covariance.py
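The script /Lecture7/covariance.py is not reproduced here, but eq. (60) amounts to evaluating the kernel on every pair of inputs; a minimal sketch (the squared-exponential kernel is an assumed choice):

```python
import numpy as np

def gram_matrix(k, X):
    # [K]_ij = k(x_i, x_j), eq. (60)
    N = X.shape[0]
    K = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            K[i, j] = k(X[i], X[j])
    return K

# example kernel: squared exponential
se = lambda a, b: np.exp(-0.5 * np.sum((a - b) ** 2))
```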

slide-127
SLIDE 127

Introduction Recap Kernels Gaussian Processes References

Co-variances in practice

Summary

  • Covariance functions encode your preferences about function behaviour
  • Choosing the right co-variance is very important
  • Ask yourself what you know about the variations in the data


slide-128
SLIDE 128

Introduction Recap Kernels Gaussian Processes References

Assignment

You should now be able to do Task 2.2 of the Assignment


slide-129
SLIDE 129

Introduction Recap Kernels Gaussian Processes References

Learning in Gaussian Processes6

Hyper-parameters

  • The prior has parameters

▶ referred to as hyper-parameters ▶ the SE kernel has a lengthscale and a variance

  • Learning in GPs means inferring the hyper-parameters of the model

6Bishop 2006, p. 6.4.3

slide-130
SLIDE 130

Introduction Recap Kernels Gaussian Processes References

Learning in Gaussian Processes6

p(Y|X, θ) = ∫ p(Y|f)p(f|X, θ)df (61)

Marginal Likelihood

  • We are not interested in f directly
  • Marginalise out f!
  • The marginal of a Gaussian is Gaussian

6Bishop 2006, p. 6.4.3

slide-133
SLIDE 133

Introduction Recap Kernels Gaussian Processes References

Learning in Gaussian Processes6

Learning

  • Type-II Maximum Likelihood

θ̂ = argmaxθ p(Y|X, θ)   (64)

  • How is this different from a normal ML estimate?
  • The many exponentials in the objective make it natural to work in log-space

▶ The logarithm is monotonic ⇒ it does not alter the location of the extreme points of a function

▶ Minimising the negative log() rather than maximising the log() is purely practical

6Bishop 2006, p. 6.4.3

slide-138
SLIDE 138

Introduction Recap Kernels Gaussian Processes References

Learning in Gaussian Processes6

argmaxθ p(Y|X, θ) = argminθ −log p(Y|X, θ) = argminθ L(θ)   (69)

L(θ) = ½yᵀK⁻¹y + ½log|K| + (N/2)log(2π)   (70)

  • Can be minimised using gradient-based methods
  • Data-fit term: ½yᵀK⁻¹y
  • Complexity term: ½log|K|

6Bishop 2006, p. 6.4.3
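In practice L(θ) is usually evaluated through a Cholesky factorisation for numerical stability; a sketch (function and variable names are mine, not the lecture's):

```python
import numpy as np

def neg_log_marginal_likelihood(K, y):
    # L(theta) = 1/2 y^T K^-1 y + 1/2 log|K| + N/2 log(2 pi), eq. (70)
    L = np.linalg.cholesky(K)                            # K = L L^T, K must be positive definite
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # alpha = K^-1 y
    data_fit = 0.5 * y @ alpha                           # rewards explaining the data
    complexity = np.sum(np.log(np.diag(L)))              # equals 1/2 log|K|, penalises complexity
    return data_fit + complexity + 0.5 * len(y) * np.log(2.0 * np.pi)
```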

slide-141
SLIDE 141

Introduction Recap Kernels Gaussian Processes References

Learning in Gaussian Processes6

L(θ) = ½yᵀK⁻¹y + ½log|K| + (N/2)log(2π)

6Bishop 2006, p. 6.4.3

slide-162
SLIDE 162

Introduction Recap Kernels Gaussian Processes References

Summary

  • Kernels are covariance functions of data-points
  • Gaussian processes are priors over functions
  • GPs allow us to average over all possible functions
  • Nothing different compared to Lecture 2, just a different prior!
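The whole pipeline, prior covariance, joint Gaussian, conditional prediction, fits in a few lines (a sketch with toy data and an assumed squared-exponential kernel):

```python
import numpy as np

def se(a, b, ell=1.0):
    # squared-exponential covariance between two 1-D input sets
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

# toy training data and a test input
X = np.array([0.0, 1.0, 2.0])
y = np.sin(X)
Xs = np.array([1.5])

# covariance blocks of the joint distribution, cf. eq. (55)
noise = 1e-4
Kxx = se(X, X) + noise * np.eye(len(X))
Kxs = se(X, Xs)
Kss = se(Xs, Xs)

# posterior (predictive) distribution is a conditional Gaussian
mean = Kxs.T @ np.linalg.solve(Kxx, y)
cov = Kss - Kxs.T @ np.linalg.solve(Kxx, Kxs)
```

The predictive variance shrinks near the observed inputs, exactly the behaviour the slides describe.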


slide-166
SLIDE 166

Introduction Recap Kernels Gaussian Processes References

Next Time

Practical 1

  • November 6th 15-17 V1
  • My best friend the Gaussian

▶ derive Gaussian identities

  • Complete assignment Tasks 2.1 and 2.2


slide-168
SLIDE 168

Introduction Recap Kernels Gaussian Processes References

e.o.f.


slide-169
SLIDE 169

Introduction Recap Kernels Gaussian Processes References

References I

Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

Christopher K. I. Williams. “Computation with Infinite Neural Networks”. In: Neural Computation 10 (July 1998), pp. 1203–1216.

