Multiresolution Gaussian Processes
Emily Fox, ICERM 2012 (PowerPoint presentation)


SLIDE 1

Multiresolution Gaussian Processes

Joint work with David Dunson (Duke)

Emily Fox, ICERM 2012, Providence, RI

SLIDE 2

[Figure: observations vs. time for a neuronal recording]

Data from Neuronal Recordings

Many time series exhibit:

  • Long-range correlations
  • Non-Markovian dynamics

In a multivariate setting:

  • Time-varying correlations

Goals

Sometimes also…

  • Functional data analysis

    → sharing a common global trend

SLIDE 3

Magnetoencephalography (MEG)

. . .

Helmet with 102 sensors

SLIDE 4

Magnetoencephalography (MEG)

. . .

Helmet with 102 sensors


  • Long-range dependencies
  • Time-varying correlations
SLIDE 5

Trial-to-Trial Variability

  • Data are noisy (low SNR)

§ Multiple trials recorded for each stimulus

  • Each trial records the same process

§ Capture common global trajectory § Allow trial-to-trial variability

  • Functional data analysis setting
SLIDE 6

MEG Noise

SLIDE 10

Build Word-Specific Model

Stimulus: w = HOUSE

y_t ∼ N(µ^(w)(x_t), Σ^(w)(x_t))

Hierarchy captures trial-to-trial variability

SLIDE 11

Build Word-Specific Model

Capturing heteroscedasticity is key

Stimulus: w = HOUSE

y_t ∼ N(µ^(w)(x_t), Σ^(w)(x_t))

[Figure: mean µ(x) for Sensors 1–2 over Time 1–3, with covariances Σ(x₁), Σ(x₂), Σ(x₃)]

SLIDE 12

Build Word-Specific Model

Harness k-dim latent space

Stimulus: w = HOUSE

y_t ∼ N(µ^(w)(x_t), Σ^(w)(x_t)),  y_t ∈ ℝ¹⁰², latent space ℝᵏ

SLIDE 13

Low-Rank Covariance Evolution

Σ(x) = Λ(x)Λ(x)ᵀ + Σ₀

  • Λ(x): p × k matrix of “dictionary elements” λᵢⱼ(·)

§ E.g., Gaussian process elements § p × k, with k << p

Fox and Dunson, “Bayesian Nonparametric Covariance Regression”, under review.
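The low-rank construction Σ(x) = Λ(x)Λ(x)ᵀ + Σ₀ can be sketched numerically; the squared-exponential kernel, the dimensions p, k, n, and the diagonal noise floor here are illustrative assumptions, not the specification from the talk.

```python
import numpy as np

def se_kernel(xs, length_scale=0.2):
    """Squared-exponential covariance matrix over input locations xs."""
    d = xs[:, None] - xs[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

rng = np.random.default_rng(0)
p, k, n = 5, 2, 50                        # p observed dims, k << p, n inputs
xs = np.linspace(0.0, 1.0, n)
K = se_kernel(xs) + 1e-8 * np.eye(n)      # jitter for numerical stability
L = np.linalg.cholesky(K)

# Each of the p*k dictionary elements lambda_ij(.) is an independent GP draw.
Lam = np.einsum('nm,pkm->pkn', L, rng.standard_normal((p, k, n)))

Sigma0 = 0.1 * np.eye(p)

def Sigma(t):
    """Induced covariance at input index t: low-rank term plus diagonal."""
    Lt = Lam[:, :, t]                     # p x k slice Lambda(x_t)
    return Lt @ Lt.T + Sigma0

S = Sigma(10)
assert np.allclose(S, S.T)                # symmetric
assert np.all(np.linalg.eigvalsh(S) > 0)  # positive definite by construction
```

Because Λ(x)Λ(x)ᵀ is rank k and Σ₀ is full rank, Σ(x) is a valid covariance at every x while the smoothly varying part lives in a k-dimensional latent space.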

SLIDE 14

Low-Rank Covariance Evolution

Σ(x) = Λ(x)Λ(x)ᵀ + Σ₀

[Figure: Σ(x) as the low-rank product Λ(x)Λ(x)ᵀ plus Σ₀]

Fox and Dunson, “Bayesian Nonparametric Covariance Regression”, under review.

SLIDE 15

One Step Further…

Σ(x) = Θξ(x)ξ(x)ᵀΘᵀ + Σ₀

[Figure: Λ(·) = Θ ξ(·), with constant weights Θ = (θᵢⱼ) and GP dictionary elements ξᵢⱼ(·)]

Fox and Dunson, “Bayesian Nonparametric Covariance Regression”, under review.

SLIDE 16

Changing Correlations – MEG

102 sensors: Correlations between sensors change with processing of word “kick”

SLIDE 17

Mean Hierarchy

µ^(w)(x) → µ^(w,1)(x), …, µ^(w,J)(x)   (Trial 1 … Trial J)

(Note: defined in a k-dim space and projected up)

Fyshe, Fox, Dunson, and Mitchell, “Hierarchical Latent Dictionary Learning for Word Classification using Brain Activation Patterns”, AISTATS 2012.

SLIDE 18

Data Collection

  • 4 word categories, 5 words per category
  • 20 repetitions per word (400 total)

§ 15 train/word (300 total) § 5 test/word (100 total)

Animals Tools Food Buildings

Fyshe, Fox, Dunson, and Mitchell, “Hierarchical Latent Dictionary Learning for Word Classification using Brain Activation Patterns”, AISTATS 2012.

SLIDE 19

Classification Performance

Fyshe, Fox, Dunson, and Mitchell, “Hierarchical Latent Dictionary Learning for Word Classification using Brain Activation Patterns”, AISTATS 2012.

SLIDE 20

MEG Data – 1 Sensor

[Figure: observations vs. time, 3 trials, 1 sensor]

Yes:

  • Long-range correlations
  • Non-Markovian dynamics

What we missed:

  • Abrupt changes
  • Locally stationary dynamics

Long-range correlations span changepoints

SLIDE 21

MEG Data – 1 Sensor

[Figure: observations vs. time, 3 trials, 1 sensor; sample correlation matrix over time (20 trials)]

Key features:

  • Long-range correlations
  • Abrupt changes
  • Locally smooth

SLIDE 22

GPs on Nested Partition

Parent function:

  • Smooth global trajectory
  • Long-range correlations
  • Non-Markovian dynamics
  • Stationary

f⁰(x) ∼ N(0, K₀)

[Figure: draw of the parent function f⁰ at inputs x₁, x₂, …, xₙ]

Fox and Dunson, “Multiresolution Gaussian Processes”, to appear NIPS 2012.

SLIDE 23

GPs on Nested Partition

[Figure: level-1 partition into sets A¹₁ and A¹₂]

changepoint = break in stationarity

Fox and Dunson, “Multiresolution Gaussian Processes”, to appear NIPS 2012.

SLIDE 24

GPs on Nested Partition

[Figure: level-1 partition into sets A¹₁ and A¹₂]

f¹(A¹₁) ∼ GP(f⁰(A¹₁), c¹₁)

Fox and Dunson, “Multiresolution Gaussian Processes”, to appear NIPS 2012.

SLIDE 25

GPs on Nested Partition

[Figure: level-1 partition into sets A¹₁ and A¹₂]

f¹(A¹₁) ∼ GP(f⁰(A¹₁), c¹₁)
f¹(A¹₂) ∼ GP(f⁰(A¹₂), c¹₂)

Fox and Dunson, “Multiresolution Gaussian Processes”, to appear NIPS 2012.

SLIDE 26

GPs on Nested Partition

[Figure: level-1 partition into sets A¹₁ and A¹₂]

f¹(A¹₁) ∼ GP(f⁰(A¹₁), c¹₁)
f¹(A¹₂) ∼ GP(f⁰(A¹₂), c¹₂)

f¹(x) | f⁰ ∼ N(0, K₁)

Fox and Dunson, “Multiresolution Gaussian Processes”, to appear NIPS 2012.

SLIDE 27

GPs on Nested Partition

[Figure: level-2 refinement A²₁, …, A²₄ of the level-1 sets A¹₁, A¹₂]

f¹(A¹₁) ∼ GP(f⁰(A¹₁), c¹₁)
f¹(A¹₂) ∼ GP(f⁰(A¹₂), c¹₂)

Fox and Dunson, “Multiresolution Gaussian Processes”, to appear NIPS 2012.

SLIDE 28

GPs on Nested Partition

[Figure: level-2 refinement A²₁, …, A²₄ of the level-1 sets A¹₁, A¹₂]

f¹(A¹₁) ∼ GP(f⁰(A¹₁), c¹₁)
f¹(A¹₂) ∼ GP(f⁰(A¹₂), c¹₂)

. . .

Fox and Dunson, “Multiresolution Gaussian Processes”, to appear NIPS 2012.

SLIDE 29

GPs on Nested Partition

[Figure: level-2 refinement A²₁, …, A²₄ of the level-1 sets A¹₁, A¹₂]

f¹(A¹₁) ∼ GP(f⁰(A¹₁), c¹₁)
f¹(A¹₂) ∼ GP(f⁰(A¹₂), c¹₂)

fℓ(x) | fℓ⁻¹ ∼ N(0, Kℓ)

Fox and Dunson, “Multiresolution Gaussian Processes”, to appear NIPS 2012.
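The recursion above can be sketched as follows: a smooth parent f⁰, then at each level a GP drawn independently on each partition set, centered on the previous level. The deterministic midpoint cuts and the halving schedule for variance and length-scale are illustrative assumptions, not the paper's exact hyperparameter choices.

```python
import numpy as np

def se_cov(xs, var, ls):
    """Squared-exponential covariance: marginal variance var, length-scale ls."""
    d = xs[:, None] - xs[None, :]
    return var * np.exp(-0.5 * (d / ls) ** 2) + 1e-9 * np.eye(len(xs))

rng = np.random.default_rng(1)
n = 200
xs = np.linspace(0.0, 1.0, n)

# Level 0: smooth, stationary parent function f^0.
f = rng.multivariate_normal(np.zeros(n), se_cov(xs, var=1.0, ls=0.3))

partition = [np.arange(n)]                  # level 0: a single set
for level in range(1, 3):                   # two refinement levels
    children = []
    for idx in partition:
        cut = len(idx) // 2                 # midpoint changepoint, for illustration
        children += [idx[:cut], idx[cut:]]
    g = f.copy()
    for idx in children:                    # f^l drawn independently per set,
        C = se_cov(xs[idx], var=0.5 ** level, ls=0.3 * 0.5 ** level)
        g[idx] = rng.multivariate_normal(f[idx], C)   # centered on f^{l-1}
    partition, f = children, g

assert len(partition) == 4                  # leaf sets A^2_1, ..., A^2_4
assert f.shape == (n,)
```

Each level adds finer, lower-variance detail on top of its parent, so the draw is smooth within each leaf set while allowing abrupt changes at the cuts.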

SLIDE 30

GPs on Nested Partition

[Figure: nested partition A¹₁, A¹₂; A²₁, …, A²₄]

g = f^L

Fox and Dunson, “Multiresolution Gaussian Processes”, to appear NIPS 2012.

SLIDE 32

Induced Marginal GP

[Figure: nested partition A¹₁, A¹₂; A²₁, …, A²₄]

Conditioned on partition, marginalize GPs

SLIDE 33

Induced Marginal GP

[Figure: nested partition A¹₁, A¹₂; A²₁, …, A²₄; induced covariance matrix over time]

Equivalent to a GP with a partition-dependent (non-stationary) covariance function

SLIDE 34

Correlation Structure

[Figure: nested partition A¹₁, A¹₂; A²₁, …, A²₄, with locations xᵢ, xⱼ and observations yᵢ, yⱼ]

corr(yᵢ, yⱼ | A) sums contributions cA⁰(xᵢ, xⱼ) + cA¹₁(xᵢ, xⱼ) + … from each partition set shared by xᵢ and xⱼ

SLIDE 35

Correlation Structure

[Figure: nested partition A¹₁, A¹₂; A²₁, …, A²₄]

corr(yᵢ, yⱼ | A) = Σ_{ℓ=0}^{Lᵢⱼ} cℓ rℓ(xᵢ, xⱼ) / √[(σ² + Σ_{ℓ=0}^{L−1} cℓ rℓ(xᵢ, xᵢ))(σ² + Σ_{ℓ=0}^{L−1} cℓ rℓ(xⱼ, xⱼ))]

Lᵢⱼ = lowest tree level at which xᵢ and xⱼ fall in the same partition set

  • Correlation spans changepoints
  • Higher correlation for pairs sharing more partition sets

SLIDE 36

Covariance Function – Length scale

[Figure: nested partition A¹₁, A¹₂; A²₁, …, A²₄]

Length-scale hyperparam:

  • Fractal-like smoothness
  • Locally as smooth as parent fcn
  • Lower levels capture more detail
  • Only one param

SLIDE 37

Covariance Function – Variance

[Figure: nested partition A¹₁, A¹₂; A²₁, …, A²₄]

Variance hyperparam:

  • Decreasing variability from parent
  • Finite var regardless of tree depth
  • Lower levels are less influential

SLIDE 38

Covariance Function – Variance

[Figure: nested partition A¹₁, A¹₂; A²₁, …, A²₄]

Variance hyperparam:

  • Decreasing variability from parent
  • Finite var regardless of tree depth
  • Lower levels are less influential

Resulting function is similar to the higher-level function despite adding changepoints

SLIDE 39

Balanced Binary Trees

[Figure: balanced binary tree over partition sets A¹₁, A¹₂; A²₁, …, A²₄, rooted at A⁰]

SLIDE 40

Related Methods

[Figure: partition sets A²₁, …, A²₄]

Treed GPs: Gramacy and Lee 2008; Kim, Mallick, Holmes 2005
GP changepoint models: Saatci, Turner, Rasmussen 2010
Mixture of GP experts: Meeds and Osindero 2006; Rasmussen and Ghahramani 2002
Phylogenies of GPs: Jones and Moriarty 2011; Henao and Lucas 2012

Function-Valued Observations

SLIDE 41

Related Methods

[Figure: partition sets A²₁, …, A²₄]

Treed GPs: Gramacy and Lee 2008; Kim, Mallick, Holmes 2005
GP changepoint models: Saatci, Turner, Rasmussen 2010
Mixture of GP experts: Meeds and Osindero 2006; Rasmussen and Ghahramani 2002
Multiscale Gaussian models: cf. Willsky 2002

SLIDE 42

Multiple Trials

[Figure: parent f⁰ over nested partition A¹₁, A¹₂; A²₁, …, A²₄]

Multiresolution GP

SLIDE 43

Multiple Trials – Example for MEG

[Figure: parent f⁰ over nested partition A¹₁, A¹₂; A²₁, …, A²₄]

Shared parent function, shared partition

Multiresolution GP

Trial-specific process, j = 1, …, J

SLIDE 44

Multiple Trials – Example for MEG

f⁰ (shared parent function)

[Figure: one nested partition A¹₁, A¹₂; A²₁, …, A²₄ per trial, generating y⁽¹⁾, y⁽²⁾, …, y⁽ᴶ⁾]

SLIDE 45

Multiple Trials – Example for MEG

f⁰ (shared parent function)

[Figure: per-trial nested partitions generating y⁽¹⁾, y⁽²⁾, …, y⁽ᴶ⁾, with observations obtained by integrating over x (∫ dx)]

SLIDE 46

Multiple Trials – Example for MEG

f⁰ (shared parent function)

[Figure: per-trial nested partitions generating y⁽¹⁾, …, y⁽ᴶ⁾, collapsed to a graphical model with shared f⁰ and A]

Shared parent function · Shared partition · Conditionally independent trials
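The trial structure can be sketched generatively: all trials share f⁰ and the partition, and are conditionally independent given them. The kernels, noise level, and single shared changepoint below are illustrative assumptions, not the MEG model's exact settings.

```python
import numpy as np

def se_cov(xs, var, ls):
    """Squared-exponential covariance: marginal variance var, length-scale ls."""
    d = xs[:, None] - xs[None, :]
    return var * np.exp(-0.5 * (d / ls) ** 2) + 1e-9 * np.eye(len(xs))

rng = np.random.default_rng(4)
n, J = 120, 3
xs = np.linspace(0.0, 1.0, n)

f0 = rng.multivariate_normal(np.zeros(n), se_cov(xs, 1.0, 0.3))  # shared parent
cut = n // 2                                  # shared partition: one changepoint
sets = [np.arange(cut), np.arange(cut, n)]

trials = []
for j in range(J):                            # conditionally independent given f0, A
    g = f0.copy()
    for idx in sets:                          # trial-specific deviation per set
        g[idx] = rng.multivariate_normal(f0[idx], se_cov(xs[idx], 0.3, 0.1))
    trials.append(g + 0.05 * rng.standard_normal(n))  # observation noise
trials = np.asarray(trials)

assert trials.shape == (J, n)
# Trials vary individually but track the common global trajectory.
assert np.corrcoef(trials.mean(axis=0), f0)[0, 1] > 0.3
```

Averaging across trials recovers the shared trajectory, which is exactly the trial-to-trial variability structure the hierarchy is meant to capture.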

SLIDE 47

Draw from Prior

[Figure: simulated draw from the prior (observations vs. time, sample correlation matrix from 100 trials) next to MEG data (observations vs. time, sample correlation matrix from 20 trials)]

SLIDE 48

Conditioned on the Partition…

  • Posterior global trajectory

Shared parent function

[Figure: graphical model of per-trial partitions generating y⁽¹⁾, y⁽²⁾, …, y⁽ᴶ⁾]

SLIDE 49

Conditioned on the Partition…

  • Posterior global trajectory
  • Posterior predictive distribution of new trial

Shared parent function

[Figure: graphical model of per-trial partitions generating y⁽¹⁾, y⁽²⁾, …, y⁽ᴶ⁾]

SLIDE 50

Conditioned on the Partition…

  • Posterior global trajectory
  • Posterior predictive distribution of new trial
  • Marginal (conditional) likelihood

Key to inference of nested partition!

SLIDE 51

Independence Chain MCMC

  • Likelihood:
SLIDE 52

Independence Chain MCMC

  • Likelihood:
  • Prior:

§ Define distribution on changepoints (level-independent) § Easy to define uniform distribution on trees and elicit prior info

p(A) = ∏ᵢ F(zᵢ)

Throw down 2^L − 1 changepoints zᵢ according to F; deterministically merge to form partition A
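A draw from this prior can be sketched in a few lines: sample 2^L − 1 changepoints i.i.d. from F, then deterministically merge them into per-level cuts of a balanced binary tree. Uniform F and the middle-point merging rule are illustrative assumptions.

```python
import numpy as np

def nested_partition(changepoints):
    """Deterministically merge sorted changepoints into per-level cut lists
    for a balanced binary tree (the middle point splits each node)."""
    frontier, levels = [sorted(changepoints)], []
    while frontier:
        cuts, nxt = [], []
        for seg in frontier:
            m = len(seg) // 2
            cuts.append(seg[m])             # middle changepoint cuts this node
            if seg[:m]:
                nxt.append(seg[:m])
            if seg[m + 1:]:
                nxt.append(seg[m + 1:])
        levels.append(sorted(cuts))
        frontier = nxt
    return levels

rng = np.random.default_rng(2)
T, L = 300, 3
z = rng.uniform(0, T, size=2 ** L - 1)      # 2^L - 1 changepoints drawn from F
levels = nested_partition(z)

assert len(levels[0]) == 1                  # one cut at the coarsest level
assert sum(len(c) for c in levels) == 2 ** L - 1
```

Because the merging step is deterministic, the prior mass on a partition A is just the product of the F densities at its changepoints, which is what makes p(A) easy to evaluate inside MCMC.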

SLIDE 53

Independence Chain MCMC

  • Likelihood:
  • Prior:

§ Define distribution on changepoints (level-independent) § Easy to define uniform distribution on trees and elicit prior info

  • Proposal: ????

nested partition = balanced binary tree

p(A) = ∏ᵢ F(zᵢ)

SLIDE 54

Inference of Hierarchical Partition

  • Stochastic tree search tends to be inefficient
  • Can harness specific correlation structure
  • Want method to (hierarchically) find drops in correlation

[Figure: sample correlation matrix over time]

SLIDE 55

Inference of Hierarchical Partition

[Figure: sample correlation matrix over time, with cut 1 and cut 2 marked]

  • Stochastic tree search tends to be inefficient
  • Can harness specific correlation structure
  • Want method to (hierarchically) find drops in correlation
  • Think of problem as graph cutting

§ Node = time step § Edge = correlation

SLIDE 56

Normalized Cuts (Shi & Malik 2000)

[Figure: sample correlation matrix over time, with cut 1 and cut 2 marked]

  • Normalized cuts balances:

§ Amount of edge weight cut § Connectivity of component

  • Cost matrix = sample correlation matrix

W = abs(corr(Y))

SLIDE 57

Normalized Cuts

[Figure: sample correlation matrix over time, with a cutpoint marked]

  • Normalized cuts balances:

§ Amount of edge weight cut § Connectivity of component

  • Cost matrix = sample correlation matrix
  • Cost of cut:

W = abs(corr(Y))

ncut(A, B) = cut(A, B) · (1/assoc(A, V) + 1/assoc(B, V))

Encourages cutting small edge weights; penalizes cutting disconnected components
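The proposal can be sketched concretely: treat each time step as a graph node with weights W = |corr(Y)|, score every contiguous cutpoint by the Shi–Malik normalized-cut cost, and turn the inverted costs into a proposal distribution. The two-block simulated data and the inverse-cost normalization are illustrative assumptions.

```python
import numpy as np

def ncut_costs(W):
    """Normalized-cut cost of splitting nodes {0..t-1} vs {t..n-1} at each t."""
    n = W.shape[0]
    deg = W.sum(axis=1)                       # node degrees (assoc with V)
    costs = []
    for t in range(1, n):
        A, B = np.arange(t), np.arange(t, n)
        cut = W[np.ix_(A, B)].sum()           # total edge weight crossing the cut
        costs.append(cut / deg[A].sum() + cut / deg[B].sum())
    return np.array(costs)

rng = np.random.default_rng(3)
# Two locally correlated blocks with an abrupt change at t = 50:
u1, u2 = rng.standard_normal((100, 1)), rng.standard_normal((100, 1))
Y = np.concatenate([u1 + 0.3 * rng.standard_normal((100, 50)),
                    u2 + 0.3 * rng.standard_normal((100, 50))], axis=1)
W = np.abs(np.corrcoef(Y, rowvar=False))      # cost matrix W = abs(corr(Y))

costs = ncut_costs(W)
probs = (1.0 / costs) / (1.0 / costs).sum()   # low-cost cuts proposed more often

assert np.isclose(probs.sum(), 1.0)
assert 45 <= np.argmin(costs) + 1 <= 55       # minimizer near the true changepoint
```

Sampling cutpoints from `probs` rather than always taking the minimizer is what makes this usable as an MCMC proposal while still concentrating on drops in correlation.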

SLIDE 58

Normalized Cuts

[Figure: sample correlation matrix over time]

  • Normalized cuts balances:

§ Amount of edge weight cut § Connectivity of component

  • Cost matrix = sample correlation matrix
  • Cost of cut: ncut(A, B) = cut(A, B) · (1/assoc(A, V) + 1/assoc(B, V))
  • Hierarchically perform cuts

W = abs(corr(Y))

SLIDE 59

Normalized Cuts

[Figure: sample correlation matrix and the resulting normalized-cuts partition (recursive minimization)]

SLIDE 60

Normalized Cuts Proposal

  • Instead of recursive min

[Figure: sample correlation matrix with the ncut(A, B) cost over cutpoints; recursive minimization always chooses the minimizing cutpoint]

SLIDE 61

Normalized Cuts Proposal

  • Instead of recursive min
  • Use ncuts metric as proposal

[Figure: sample correlation matrix with the ncut(A, B) cost over cutpoints]

SLIDE 62

Independence Chain MCMC

  • Likelihood:
  • Prior:

§ Define distribution on changepoints (level-independent) § Easy to define uniform distribution on trees and elicit prior info

  • Proposal:

p(A) = ∏ᵢ F(zᵢ)

Complexity O(n³)
Complexity O(n²(L−1))

SLIDE 63

Independence Chain MCMC

  • Likelihood:
  • Prior:

§ Define distribution on changepoints (level-independent) § Easy to define uniform distribution on trees and elicit prior info

  • Proposal:
  • Can also interleave local node repartition proposals instead of global partition proposals

p(A) = ∏ᵢ F(zᵢ)

SLIDE 64

Node Proposals

[Figure: tree of partition sets rooted at A⁰, with levels A¹₁, A¹₂; A²₁, …, A²₄; A³₁, …, A³₈]

SLIDE 65

Node Proposals

[Figure: tree of partition sets rooted at A⁰, with levels A¹₁, A¹₂; A²₁, …, A²₄; A³₁, …, A³₈]

SLIDE 66

Node Proposals

[Figure: tree of partition sets rooted at A⁰, with levels A¹₁, A¹₂; A²₁, …, A²₄; A³₁, …, A³₈]

SLIDE 67

Node Proposals

[Figure: tree of partition sets rooted at A⁰, with levels A¹₁, A¹₂; A²₁, …, A²₄; A³₁, …, A³₈]

Equivalent to global repartition proposal!

SLIDE 68

Simulated Data

[Figure: simulated observations vs. time; sample correlation matrix (100 trials)]

SLIDE 69

Simulated Data

[Figure: simulated observations vs. time; true partition; sample correlation matrix (100 trials)]

SLIDE 70

Simulated Data

[Figure: simulated observations vs. time; true partition; ncuts partition; sample correlation matrix (100 trials)]

SLIDE 71

Simulated Data

[Figure: simulated observations vs. time; true partition; ncuts partition; MAP partition; sample correlation matrix (100 trials)]

SLIDE 72

MEG Data

  • 10 words
  • 20 repetitions per word

§ 15 training/word § 5 test/word

  • Examine one multiresolution GP per word per sensor

Buildings: Apartment, Barn, Church, Igloo, House
Tools: Chisel, Hammer, Pliers, Saw, Screwdriver

SLIDE 73

MEG Changepoints – Level 1

SLIDE 77

MEG Changepoints – Level 1

[Figure: level-1 changepoints aligned with stimulus onset, the N100 response, and semantic processing]

SLIDE 78

Baselines – Single and Hierarchical GPs

[Figure: three model structures]

Multiresolution GP: shared f⁰ over nested partition A¹₁, A¹₂; A²₁, …, A²₄, with trial-specific g⁽ʲ⁾, j = 1, …, J
Hierarchical GP: shared f⁰ with trial-specific g⁽ʲ⁾, j = 1, …, J
Single GP: f⁰ only

(cf. Fyshe et al., AISTATS 2012)

SLIDE 79

Baselines – Single and Hierarchical GPs

[Figure: three model structures]

Multiresolution GP: shared f⁰ over the nested partition with trial-specific g⁽ʲ⁾, j = 1, …, J
Hierarchical GP: no partition
Single GP: no partition, no trial-to-trial variability

(cf. Fyshe et al., AISTATS 2012)

SLIDE 80

Decrease in MSE

[Figure: % decrease in MSE vs. GP as a function of conditioning point, per lobe (Visual, Frontal, Parietal, Temporal), mGP vs. GP; example conditioned prediction showing the test trial, mGP, and hGP beyond the conditioning point]

SLIDE 81

Decrease in MSE

[Figure: % decrease in MSE per lobe (Visual, Frontal, Parietal, Temporal) vs. conditioning point, for mGP vs. hGP and mGP vs. GP]

SLIDE 82

Entire Heldout Prediction

[Figure: entire heldout prediction, observations vs. time (sec), for MLE, hGP, and mGP]

SLIDE 83

Wavelet-based Functional Mixed Models

Morris & Carroll 2006 (JRSS B)

  • Allows spiky trajectories
  • Models related functions
  • Notes:

§ Assumes regular grid of obs § Can cope with multivariate setting (not used here)

[Figure: heldout log likelihood (×10⁴), wfmm vs. mGP]

  • Examine each word and sensor independently
  • Compute heldout likelihood of 5 entire trials

SLIDE 84

Summary

Key features:

  • Long-range correlations
  • Abrupt changes
  • Locally smooth

Additionally:

  • Functional data analysis

    → sharing a common global trend

  • Irregular grid of observations
  • Tractability and interpretability

[Figure: MEG observations vs. time and sample correlation matrix]

SLIDE 85

Extensions

  • Multivariate settings

§ Input spaces § Output spaces

  • Hierarchical dependence structures

§ Partial sharing of parents in the tree § mGP factor models

  • Incorporate mGP in a functional ANOVA framework
  • Theoretical analysis

§ Posterior consistency

[Figure: Θ ξ(·) factorization with weights θᵢⱼ and GP elements ξᵢⱼ(·)]

  • Prior on multivariate partitions
  • Partition proposals: spectral clustering using graph Laplacian