SLIDE 1

Introduction to Nonparametric Bayesian Modeling and Gaussian Process Regression

Piyush Rai

  • Dept. of CSE, IIT Kanpur

(Mini-course: lecture 3) Nov 07, 2015

SLIDE 2

Recap

SLIDE 3

Optimization vs Inference

All ML problems require estimating parameters given data. Primarily two views:

  • 1. Learning as Optimization

Parameter θ is a fixed unknown. Seeks a point estimate (single best answer) for θ:

θ̂ = argmin_θ Loss(D; θ), subject to constraints on θ

Probabilistic methods such as MLE and MAP also fall in this category.

  • 2. Learning as (Bayesian) Inference

Parameter θ is a random variable with a prior distribution P(θ). Seeks a posterior distribution over the parameters:

P(θ | D) = P(D | θ) P(θ) / P(D)
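As a toy contrast between the two views (not from the slides): for coin flips with a Bernoulli likelihood, the optimization view gives a single θ̂ while a conjugate Beta prior gives a full posterior. A minimal sketch assuming numpy/scipy; the data and prior values are illustrative.

```python
import numpy as np
from scipy import stats

# Illustrative coin-flip data: 1 = heads, 0 = tails
data = np.array([1, 0, 1, 1, 0, 1, 1, 1])

# Optimization view: MLE is a single point estimate of theta
theta_mle = data.mean()

# Bayesian view: a Beta(a, b) prior is conjugate to the Bernoulli likelihood,
# so the posterior is Beta(a + #heads, b + #tails), a distribution over theta
a, b = 2.0, 2.0
posterior = stats.beta(a + data.sum(), b + len(data) - data.sum())
print(theta_mle, posterior.mean(), posterior.interval(0.95))
```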

SLIDE 4

Bayesian Learning

Prior distribution specifies our prior belief/knowledge about parameters θ. Bayesian inference updates the prior and gives the posterior.

SLIDE 9

Why be Bayesian?

Posterior P(θ | D) quantifies uncertainty in the parameters.

More robust predictions by averaging over the posterior P(θ | D):

P(dtest | θ̂)  vs  P(dtest | D) = ∫ P(dtest | θ) P(θ | D) dθ

Allows inferring hyperparameters of the model and doing model comparison.

Offers a natural way for informed data acquisition (active learning):

  • Can use the predictive posterior of unseen data points to guide data selection

Can do nonparametric Bayesian modeling.
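Continuing the illustrative coin-flip sketch above: for the conjugate Beta-Bernoulli model the posterior-averaged prediction is available in closed form, so the plug-in vs averaged contrast is easy to see.

```python
# Plug-in prediction P(dtest = 1 | theta_hat) vs posterior-averaged prediction
# P(dtest = 1 | D) = integral of theta * p(theta | D) dtheta = posterior mean
p_plugin = theta_mle            # uses the single point estimate
p_averaged = posterior.mean()   # exact under the Beta posterior
```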

SLIDE 14

Nonparametric Bayesian Learning

How big/complex should my model be? How many parameters suffice?

Model selection via cross-validation can often be expensive and impractical.

Nonparametric Bayesian models: allow an unbounded number of parameters.

  • The model can grow/shrink adaptively as we observe more and more data
  • We "let the data speak" about how complex the model needs to be

SLIDE 19

What’s a Nonparametric Bayesian Model?

An NPBayes model is NOT a model with no parameters! It has potentially infinitely many (an unbounded number of) parameters, and it has the ability to "create" new parameters if the data require so.

Some non-Bayesian models are also nonparametric, e.g., nearest neighbor regression/classification, kernel SVMs, and kernel density estimation.

NPBayes models offer the benefits of both Bayesian modeling and nonparametric modeling.

SLIDE 22

Examples of NPBayes Models

Some modeling problems and NPBayes models of choice¹:

¹ Table courtesy: Zoubin Ghahramani

SLIDE 23

Gaussian Process

A Gaussian Process (GP) is a distribution over functions f: f ∼ GP(µ, Σ), such that f's values at any finite set of points x1, . . . , xN are jointly Gaussian:

{f(x1), f(x2), . . . , f(xN)} ∼ N(µ, K)

If µ = 0, a GP is fully specified by its covariance (kernel) matrix K.

The covariance matrix is defined by a kernel function k(xn, xm). Some examples:

  • k(xn, xm) = exp(−||xn − xm||² / 2σ²): Gaussian kernel
  • k(xn, xm) = v0 exp(−(|xn − xm| / r)^α) + v1 + v2 δnm

GP based modeling also allows learning the kernel hyperparameters from data.
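A minimal numpy sketch of drawing functions from a zero-mean GP prior with the Gaussian kernel above; the grid, bandwidth, and jitter value are illustrative choices, not from the slides.

```python
import numpy as np

def gaussian_kernel(X1, X2, sigma=1.0):
    """Gaussian kernel k(xn, xm) = exp(-||xn - xm||^2 / (2 sigma^2))."""
    sq_dists = (np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :]
                - 2 * X1 @ X2.T)
    return np.exp(-sq_dists / (2 * sigma**2))

# Evaluate the GP on a grid and draw 3 functions from N(0, K)
X = np.linspace(-5, 5, 100)[:, None]
K = gaussian_kernel(X, X)
# A small diagonal jitter keeps the Cholesky factorization numerically stable
L = np.linalg.cholesky(K + 1e-8 * np.eye(len(X)))
samples = L @ np.random.randn(len(X), 3)   # columns are draws of [f(x1), ..., f(xN)]
```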

SLIDE 27

Gaussian Process

Left: some functions drawn from a GP prior N(0, K). Right: posterior over these functions after observing 5 examples {xn, yn}.

SLIDE 28

Gaussian Process Regression

Training data: {xn, yn}, n = 1, . . . , N. The response is a noisy function of the input:

yn = f(xn) + εn

Assume a zero-mean Gaussian error: p(ε | σ²) = N(ε | 0, σ²). This leads to a Gaussian likelihood model for the responses:

p(yn | f(xn)) = N(yn | f(xn), σ²)

Denote y = [y1, . . . , yN]⊤ ∈ R^N and f = [f(x1), . . . , f(xN)]⊤ ∈ R^N, and write

p(y | f) = N(y | f, σ²I_N)

In GP regression, we assume f is drawn from a GP: p(f) = N(f | 0, K).
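To make the later prediction sketches concrete, here is illustrative synthetic data generated exactly as the model assumes, yn = f(xn) + εn, continuing the same Python session as the sampling sketch above; the choice f = sin and the noise variance are arbitrary assumptions.

```python
# Synthetic training data matching y_n = f(x_n) + eps_n (f = sin is illustrative)
rng = np.random.default_rng(0)
X_train = np.sort(rng.uniform(-5, 5, size=(20, 1)), axis=0)
noise_var = 0.1
y_train = np.sin(X_train).ravel() + np.sqrt(noise_var) * rng.standard_normal(20)
```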

SLIDE 31

Gaussian Process Regression

The likelihood model: p(y | f) = N(y | f, σ²I_N)

The prior distribution: p(f) = N(f | 0, K)

The marginal distribution over the responses y:

p(y) = ∫ p(y | f) p(f) df = N(y | 0, σ²I_N + K)
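A one-step justification, implicit in the slide but worth stating: y = f + ε is a sum of independent Gaussians, so its covariance is the sum of the two covariances.

```latex
\mathbf{y} = \mathbf{f} + \boldsymbol{\epsilon}, \quad
\mathbf{f} \sim \mathcal{N}(\mathbf{0}, \mathbf{K}), \quad
\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I}_N)
\;\Rightarrow\;
\mathrm{Cov}[\mathbf{y}] = \mathbf{K} + \sigma^2 \mathbf{I}_N
```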

SLIDE 33

Making Predictions

Recall the marginal distribution over the responses y = [y1, . . . , yN]:

p(y) = N(y | 0, σ²I_N + K) = N(y | 0, C_N)

Adding the response y* of a new test point x*:

p([y, y*]) = N([y, y*] | 0, C_{N+1})

where the (N + 1) × (N + 1) matrix C_{N+1} is given by

C_{N+1} = | C_N   k* |
          | k*⊤   c  |

and k* = [k(x*, x1), . . . , k(x*, xN)]⊤, c = k(x*, x*) + σ².
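The predictive equations on the next slide follow from the standard conditioning identity for a jointly Gaussian vector, stated here for reference (the slides use it without spelling it out):

```latex
\begin{bmatrix} \mathbf{a} \\ \mathbf{b} \end{bmatrix}
\sim \mathcal{N}\!\left(\mathbf{0},
\begin{bmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{B}^\top & \mathbf{C} \end{bmatrix}\right)
\;\Rightarrow\;
p(\mathbf{b} \mid \mathbf{a}) =
\mathcal{N}\!\left(\mathbf{B}^\top \mathbf{A}^{-1} \mathbf{a},\;
\mathbf{C} - \mathbf{B}^\top \mathbf{A}^{-1} \mathbf{B}\right)
```

With a = y, b = y*, A = C_N, B = k*, and C = c, this gives exactly the m(x*) and σ²(x*) below.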

SLIDE 34

Making Predictions on Test Data

Recall p([y, y*]) = N([y, y*] | 0, C_{N+1}). The predictive distribution will be

p(y* | y) = p([y, y*]) / p(y)

p(y* | y) = N(y* | m(x*), σ²(x*))

m(x*) = k*⊤ C_N⁻¹ y

σ²(x*) = c − k*⊤ C_N⁻¹ k*

Note that for GP regression, exact inference is possible at test time!
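A minimal numpy sketch of these two formulas, reusing the gaussian_kernel helper and the illustrative X_train, y_train, noise_var from earlier; solving via a Cholesky factor is a standard way to avoid forming C_N⁻¹ explicitly.

```python
def gp_predict(X_train, y_train, X_test, kernel, noise_var):
    """Predictive mean m(x*) and variance sigma^2(x*) for GP regression."""
    C_N = kernel(X_train, X_train) + noise_var * np.eye(len(X_train))
    k_star = kernel(X_train, X_test)                  # N x M cross-kernel matrix
    c = np.diag(kernel(X_test, X_test)) + noise_var   # c = k(x*, x*) + sigma^2
    L = np.linalg.cholesky(C_N)                       # C_N = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))   # alpha = C_N^{-1} y
    v = np.linalg.solve(L, k_star)                    # v = L^{-1} k*
    mean = k_star.T @ alpha                           # m(x*) = k*^T C_N^{-1} y
    var = c - np.sum(v**2, axis=0)                    # c - k*^T C_N^{-1} k*
    return mean, var

X_test = np.linspace(-5, 5, 50)[:, None]
mean, var = gp_predict(X_train, y_train, X_test, gaussian_kernel, noise_var)
```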

SLIDE 37

Interpreting GP predictions..

Let's look at the predictions made by GP regression:

p(y* | y) = N(y* | m(x*), σ²(x*))

m(x*) = k*⊤ C_N⁻¹ y

σ²(x*) = c − k*⊤ C_N⁻¹ k*

Two interpretations for the mean prediction m(x*):

  • An SVM-like interpretation: m(x*) = k*⊤ C_N⁻¹ y = k*⊤ α = ∑_{n=1}^N k(x*, xn) αn, where α is akin to the weights of support vectors
  • A nearest neighbors interpretation: m(x*) = k*⊤ C_N⁻¹ y = w⊤ y = ∑_{n=1}^N wn yn, where w is akin to the weights of the neighbors
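Both interpretations can be read off the quantities in the gp_predict sketch above; for a single test point, the neighbor weights w could be exposed like this (an illustrative continuation, not from the slides):

```python
# For one test column k_star[:, j], the weights w satisfying m(x*) = w @ y_train:
# w^T = k*^T C_N^{-1}, i.e., w solves C_N w = k*  (C_N is symmetric)
j = 0
C_N = gaussian_kernel(X_train, X_train) + noise_var * np.eye(len(X_train))
k_star = gaussian_kernel(X_train, X_test)
w = np.linalg.solve(C_N, k_star[:, j])
m_star = w @ y_train   # matches gp_predict's mean for test point j
```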

SLIDE 38

Inferring Hyperparameters

Recall the marginal distribution over the responses y = [y1, . . . , yN]:

p(y | σ², θ) = N(y | 0, σ²I_N + K_θ)

We can maximize the (log) marginal likelihood w.r.t. σ² and the kernel hyperparameters θ to get point estimates of the hyperparameters:

log p(y | σ², θ) = −(1/2) log |σ²I_N + K_θ| − (1/2) y⊤ (σ²I_N + K_θ)⁻¹ y + const

Note: We can also put hyperpriors on the hyperparameters and infer the hyperparameters in a fully Bayesian manner.
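A sketch of evaluating this objective with the same Cholesky machinery as before; in practice one would maximize it over σ² and θ by grid search or gradients. The constant term is written out here for completeness, and the grid values are illustrative.

```python
def log_marginal_likelihood(X, y, kernel, noise_var):
    """log p(y | sigma^2, theta) = -1/2 log|C_N| - 1/2 y^T C_N^{-1} y - (N/2) log(2 pi)."""
    N = len(X)
    C_N = kernel(X, X) + noise_var * np.eye(N)
    L = np.linalg.cholesky(C_N)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    log_det = 2.0 * np.sum(np.log(np.diag(L)))   # log|C_N| from the Cholesky factor
    return -0.5 * log_det - 0.5 * y @ alpha - 0.5 * N * np.log(2.0 * np.pi)

# Illustrative hyperparameter selection by a coarse grid search over the noise variance
best = max((log_marginal_likelihood(X_train, y_train, gaussian_kernel, s2), s2)
           for s2 in [0.01, 0.05, 0.1, 0.5, 1.0])
```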

SLIDE 40

Gaussian Process Classification

Non-binary labels (multiclass, counts, etc.) can also be easily handled

SLIDE 41

GP vs (Kernel) SVM

The objective function of a soft-margin SVM looks like

(1/2) ||w||² + C ∑_{n=1}^N (1 − yn fn)+

where fn = w⊤xn and yn is the true label for xn.

Kernel SVM: fn = ∑_{m=1}^N αm k(xn, xm). Denote f = [f1, . . . , fN]⊤.

We can write ||w||² = α⊤Kα = f⊤K⁻¹f, and the kernel SVM objective becomes

(1/2) f⊤K⁻¹f + C ∑_{n=1}^N (1 − yn fn)+

The negative log of the (unnormalized) posterior over f in a GP model can be written as

(1/2) f⊤K⁻¹f − ∑_{n=1}^N log p(yn | fn) + const
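To make the structural analogy concrete: both objectives are (1/2) f⊤K⁻¹f plus a sum of per-example loss terms; only the losses differ. A small sketch, assuming a logistic likelihood p(yn | fn) = sigmoid(yn fn) with yn ∈ {−1, +1}, which is one common choice for GP classification (the slides do not fix a particular likelihood):

```python
def hinge_loss(y, f):
    """SVM's per-example loss: (1 - y f)_+."""
    return np.maximum(0.0, 1.0 - y * f)

def neg_log_likelihood(y, f):
    """-log p(y | f) under a logistic likelihood p(y | f) = sigmoid(y f)."""
    return np.log1p(np.exp(-y * f))
```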

SLIDE 45

GP vs (Kernel) SVM

Thus GPs can be interpreted as a Bayesian analogue of kernel SVMs.

Both GPs and SVMs need to deal with (storing/inverting) large kernel matrices.

  • Various approximations have been proposed to address this issue (applicable to both)

The ability to learn the kernel hyperparameters in a GP is very useful, e.g., for:

  • Learning the kernel bandwidth for Gaussian kernels: k(xn, xm) = exp(−||xn − xm||² / 2σ²)
  • Doing feature selection via Automatic Relevance Determination, with one bandwidth per dimension (see the sketch after this list):

    k(xn, xm) = exp(−∑_{d=1}^D (xnd − xmd)² / 2σd²)

  • Learning compositions of kernels for more flexible modeling: K = K_θ1 + K_θ2 + . . .
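A minimal sketch of the ARD kernel above; the lengthscales argument is assumed to hold one σd per input dimension, and a large σd effectively switches dimension d off.

```python
def ard_kernel(X1, X2, lengthscales):
    """ARD kernel: k(xn, xm) = exp(-sum_d (xnd - xmd)^2 / (2 sigma_d^2))."""
    Z1 = X1 / lengthscales          # divide each dimension d by its sigma_d
    Z2 = X2 / lengthscales
    sq_dists = (np.sum(Z1**2, 1)[:, None] + np.sum(Z2**2, 1)[None, :]
                - 2 * Z1 @ Z2.T)
    return np.exp(-0.5 * sq_dists)
```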

SLIDE 51

Other Usage of GP

  • Nonlinear dimensionality reduction: Gaussian Process Latent Variable Models
  • Bayesian optimization: optimizing functions that have an unknown functional form and are expensive to evaluate
  • Deep Gaussian Processes: data assumed to be an output of a multivariate GP, inputs to each GP are outputs of another GP, and so on
  • Many applications: robotics and control, vision, spatial statistics, and so on

SLIDE 52

Resources on Gaussian Processes

Book: Gaussian Processes for Machine Learning (freely available online)

MATLAB packages, useful to play with, build applications, and extend existing models and inference algorithms for GPs (both regression and classification):

  • GPML: http://www.gaussianprocess.org/gpml/code/matlab/doc/
  • GPStuff: http://research.cs.aalto.fi/pml/software/gpstuff/

SLIDE 53

Next Talk

  • Nonparametric Bayesian models for mixture modeling (clustering): Dirichlet Processes and the Chinese Restaurant Process
  • Nonparametric Bayesian models for latent factor modeling (dimensionality reduction): Beta Processes and the Indian Buffet Process

SLIDE 54

Thanks! Questions?
