Multiresolution Gaussian Processes - Emily Fox, ICERM 2012 (PowerPoint PPT Presentation)


  1. Multiresolution Gaussian Processes. Emily Fox, ICERM 2012, Providence, RI. Joint work with David Dunson (Duke)

  2. Goals: Data from Neuronal Recordings. Many time series exhibit:
  • Long-range correlations
  • Non-Markovian dynamics
  In a multivariate setting:
  • Time-varying correlations
  Sometimes also:
  • Functional data analysis → sharing a common global trend
  [Figure: observations vs. time for a neuronal recording]

  3. Magnetoencephalography (MEG). Helmet with 102 sensors. [Figure: MEG helmet and per-sensor traces for the word COW]

  4. Magnetoencephalography (MEG). Helmet with 102 sensors.
  • Long-range dependencies
  • Time-varying correlations
  [Figure: MEG helmet and per-sensor traces for the word COW]

  5. Trial-to-Trial Variability • Data are noisy (low SNR) § Multiple trials recorded for each stimulus • Each trial records the same process § Capture common global trajectory § Allow trial-to-trial variability • Functional data analysis setting

  6.-9. MEG Noise (figure-only build slides illustrating noise across trials)

  10. Build Word-Specific Model. Stimulus: w = HOUSE. $y_t \sim N(\mu^{(w)}(x_t), \Sigma^{(w)}(x_t))$. Hierarchy captures trial-to-trial variability.

  11. Build Word-Specific Model. Stimulus: w = HOUSE. $y_t \sim N(\mu^{(w)}(x_t), \Sigma^{(w)}(x_t))$. Capturing heteroscedasticity is key. [Figure: mean trajectory $\mu(x)$ with covariance ellipses $\Sigma(x_1), \Sigma(x_2), \Sigma(x_3)$ at times $x_1, x_2, x_3$, plotted in the Sensor 1 vs. Sensor 2 plane]

  12. Build Word-Specific Model. Stimulus: w = HOUSE. $y_t \sim N(\mu^{(w)}(x_t), \Sigma^{(w)}(x_t))$. Harness a k-dimensional latent space relating $\mathbb{R}^{102}$ to $\mathbb{R}^k$.
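  As a toy illustration of this observation model, the sketch below draws $y_t \sim N(\mu(x_t), \Sigma(x_t))$ with a covariance that changes over time; the specific $\mu$ and $\Sigma$ are illustrative stand-ins, not the slides' priors, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
T, p = 300, 2                 # time points; 2 "sensors" for illustration (102 in MEG)
x = np.linspace(0, 1, T)      # predictor (time)

# Stand-in mean trajectory mu(x_t) and time-varying covariance Sigma(x_t).
mu = np.stack([np.sin(2 * np.pi * x), np.cos(2 * np.pi * x)], axis=1)
y = np.empty((T, p))
for t in range(T):
    rho = 0.8 * np.sin(np.pi * x[t])            # correlation drifts over time
    Sigma_t = np.array([[1.0, rho], [rho, 1.0]]) * (0.1 + 0.4 * x[t])
    y[t] = rng.multivariate_normal(mu[t], Sigma_t)   # y_t ~ N(mu(x_t), Sigma(x_t))
```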

  13. Low-Rank Covariance Evolution. Matrix of "dictionary elements":
  • E.g., Gaussian processes
  • $p \times k$ elements, with $k \ll p$
  $$\Lambda(\cdot) = \begin{pmatrix} \lambda_{11}(\cdot) & \lambda_{12}(\cdot) \\ \lambda_{21}(\cdot) & \lambda_{22}(\cdot) \\ \vdots & \vdots \\ \lambda_{p1}(\cdot) & \lambda_{p2}(\cdot) \end{pmatrix} \quad (p \times k), \qquad \Sigma(x) = \Lambda(x)\Lambda(x)^\top + \Sigma_0$$
  Fox and Dunson, "Bayesian Nonparametric Covariance Regression", under review.

  14. Low-Rank Covariance Evolution. The dictionary functions $\lambda_{ij}(\cdot)$ evolve with the predictor, inducing $\Sigma(x) = \Lambda(x)\Lambda(x)^\top + \Sigma_0$. Fox and Dunson, "Bayesian Nonparametric Covariance Regression", under review.

  15. One Step Further… Factor the dictionary as $\Lambda(\cdot) = \Theta\,\xi(\cdot)$, with a matrix of scalar loadings $\Theta$ ($p \times 3$ here) multiplying a small matrix of Gaussian process functions $\xi(\cdot)$ ($3 \times 2$ here):
  $$\Lambda(\cdot) = \underbrace{\begin{pmatrix} \theta_{11} & \theta_{12} & \theta_{13} \\ \theta_{21} & \theta_{22} & \theta_{23} \\ \vdots & \vdots & \vdots \\ \theta_{p1} & \theta_{p2} & \theta_{p3} \end{pmatrix}}_{\Theta} \underbrace{\begin{pmatrix} \xi_{11}(\cdot) & \xi_{12}(\cdot) \\ \xi_{21}(\cdot) & \xi_{22}(\cdot) \\ \xi_{31}(\cdot) & \xi_{32}(\cdot) \end{pmatrix}}_{\xi(\cdot)}, \qquad \Sigma(x) = \Theta\,\xi(x)\,\xi(x)^\top \Theta^\top + \Sigma_0$$
  Fox and Dunson, "Bayesian Nonparametric Covariance Regression", under review.
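  A minimal numerical sketch of this construction, assuming independent squared-exponential GP draws for the dictionary functions $\xi_{ij}(\cdot)$ and Gaussian loadings $\Theta$ (these priors and all names are assumptions for illustration; the paper's actual priors differ):

```python
import numpy as np

def se_kernel(x, ell=0.2):
    """Squared-exponential kernel matrix on 1-d inputs x."""
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

rng = np.random.default_rng(0)
p, L_dict, k = 102, 3, 2        # observed dim, # dictionary rows, latent dim (k << p)
x = np.linspace(0, 1, 50)       # predictor (e.g., time)
n = len(x)

# Draw the L_dict x k dictionary functions xi_ij(.) as independent GP paths over x.
chol = np.linalg.cholesky(se_kernel(x) + 1e-8 * np.eye(n))
xi = (chol @ rng.standard_normal((n, L_dict * k))).reshape(n, L_dict, k)

Theta = rng.standard_normal((p, L_dict)) / np.sqrt(L_dict)   # scalar loadings
Sigma0 = 0.1 * np.eye(p)                                     # diagonal noise term

# Sigma(x_t) = Theta xi(x_t) xi(x_t)^T Theta^T + Sigma0: a p x p covariance that
# varies smoothly in x and has rank <= k above the diagonal term.
Sigma = [Theta @ xi[t] @ xi[t].T @ Theta.T + Sigma0 for t in range(n)]
```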

  16. Changing Correlations – MEG 102 sensors: Correlations between sensors change with processing of word “kick”

  17. Mean Hierarchy. The word-level mean $\mu^{(w)}(x)$ generates trial-specific means $\mu^{(w,1)}(x)$ (Trial 1) through $\mu^{(w,J)}(x)$ (Trial J). (Note: defined in a k-dim space and projected up.) Fyshe, Fox, Dunson, and Mitchell, "Hierarchical Latent Dictionary Learning for Word Classification using Brain Activation Patterns", AISTATS 2012.
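  A toy sketch of this two-level mean hierarchy, drawing trial-level means around a shared word-level mean (the squared-exponential kernel and scales are illustrative assumptions; see the AISTATS paper for the actual priors):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 100)
d = x[:, None] - x[None, :]
K = np.exp(-0.5 * (d / 0.2) ** 2) + 1e-8 * np.eye(len(x))  # GP kernel over time

# Word-level mean mu^(w)(x), then J trial-level means centered on it.
mu_w = rng.multivariate_normal(np.zeros(len(x)), K)
J, trial_var = 5, 0.3   # trial-to-trial variability around the global trajectory
mu_trials = [rng.multivariate_normal(mu_w, trial_var * K) for _ in range(J)]
```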

  18. Data Collection.
  • 4 word categories, 5 words per category: Food, Animals, Tools, Buildings
  • 20 repetitions per word (400 total): 15 train/word (300 total), 5 test/word (100 total)
  Fyshe, Fox, Dunson, and Mitchell, "Hierarchical Latent Dictionary Learning for Word Classification using Brain Activation Patterns", AISTATS 2012.

  19. Classification Performance Fyshe, Fox, Dunson, and Mitchell , “Hierarchical Latent Dictionary Learning for Word Classification using Brain Activation Patterns”, AISTATS 2012.

  20. MEG Data – 1 Sensor. [Figure: 3 trials from 1 sensor, observations vs. time]
  Yes:
  • Long-range correlations
  • Non-Markovian dynamics
  What we missed:
  • Abrupt changes
  • Locally stationary dynamics
  Long-range correlations span changepoints.

  21. MEG Data – 1 Sensor. [Figure: 3 trials from 1 sensor, observations vs. time; sample correlation matrix over time, computed from 20 trials]
  Key features:
  • Long-range correlations
  • Abrupt changes
  • Locally smooth

  22. GPs on Nested Partition. Parent function over inputs $x_1, x_2, x_3, \ldots, x_n$: $f_0(x) \sim N(0, K_0)$
  • Smooth global trajectory
  • Long-range correlations
  • Non-Markovian dynamics
  • Stationary
  [Figure: draw of $f_0$ and its covariance matrix $K_0$]
  Fox and Dunson, "Multiresolution Gaussian Processes", to appear NIPS 2012.

  23. GPs on Nested Partition. Partition the input space into $A^1_1$ and $A^1_2$; a changepoint = a break in stationarity. Fox and Dunson, "Multiresolution Gaussian Processes", to appear NIPS 2012.

  24. GPs on Nested Partition. On the first partition set: $f_1(A^1_1) \sim \mathrm{GP}(f_0(A^1_1), c^1_1)$. Fox and Dunson, "Multiresolution Gaussian Processes", to appear NIPS 2012.

  25. GPs on Nested Partition. $f_1(A^1_1) \sim \mathrm{GP}(f_0(A^1_1), c^1_1)$, $f_1(A^1_2) \sim \mathrm{GP}(f_0(A^1_2), c^1_2)$. Fox and Dunson, "Multiresolution Gaussian Processes", to appear NIPS 2012.

  26. GPs on Nested Partition. $f_1(A^1_1) \sim \mathrm{GP}(f_0(A^1_1), c^1_1)$, $f_1(A^1_2) \sim \mathrm{GP}(f_0(A^1_2), c^1_2)$; jointly, $f_1(x) \mid f_0 \sim N(0, K_1)$. [Figure: draw of $f_1$ and its covariance matrix $K_1$] Fox and Dunson, "Multiresolution Gaussian Processes", to appear NIPS 2012.

  27. GPs on Nested Partition. As before, $f_1(A^1_1) \sim \mathrm{GP}(f_0(A^1_1), c^1_1)$ and $f_1(A^1_2) \sim \mathrm{GP}(f_0(A^1_2), c^1_2)$; refine the partition into $A^2_1, A^2_2, A^2_3, A^2_4$. Fox and Dunson, "Multiresolution Gaussian Processes", to appear NIPS 2012.

  28. GPs on Nested Partition. Continue recursively: each level-2 set $A^2_1, \ldots, A^2_4$ is split in turn, and so on down the tree. Fox and Dunson, "Multiresolution Gaussian Processes", to appear NIPS 2012.

  29. GPs on Nested Partition. At a generic level $\ell$: $f_\ell(x) \mid f_{\ell-1} \sim N(0, K_\ell)$. [Figure: draw of $f_\ell$ and its covariance matrix $K_\ell$] Fox and Dunson, "Multiresolution Gaussian Processes", to appear NIPS 2012.
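  Putting slides 22-29 together, here is a minimal sampling sketch of the recursion under simplifying assumptions: a balanced dyadic partition, squared-exponential kernels shared within a level, and variance/length scale decaying geometrically with depth (these choices and all names are hypothetical, not the paper's exact specification):

```python
import numpy as np

def se_cov(x, var, ell):
    """Squared-exponential covariance matrix on 1-d inputs x."""
    d = x[:, None] - x[None, :]
    return var * np.exp(-0.5 * (d / ell) ** 2)

def sample_mgp(x, L=3, var0=1.0, ell0=0.5, decay=0.5, rng=None):
    """Sample g = f_L on a balanced dyadic partition: level 0 is a stationary
    global GP; at level l, each of the 2**l sets gets an independent draw
    centered at the parent function restricted to that set."""
    rng = rng or np.random.default_rng(0)
    n = len(x)
    f = rng.multivariate_normal(np.zeros(n), se_cov(x, var0, ell0) + 1e-8 * np.eye(n))
    for l in range(1, L + 1):
        var_l, ell_l = var0 * decay ** l, ell0 * decay ** l   # shrink with depth
        f_next = np.empty(n)
        for idx in np.array_split(np.arange(n), 2 ** l):      # nested partition sets
            C = se_cov(x[idx], var_l, ell_l) + 1e-8 * np.eye(len(idx))
            f_next[idx] = rng.multivariate_normal(f[idx], C)  # f_l | f_{l-1} on a set
        f = f_next
    return f

g = sample_mgp(np.linspace(0, 1, 200))   # g = f_L, with changepoints at set boundaries
```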

  30.-31. GPs on Nested Partition. The recursion over the tree of partition sets ($A^1_1, A^1_2$; $A^2_1, \ldots, A^2_4$; …) terminates at level $L$, and the observed-level function is $g = f_L$. Fox and Dunson, "Multiresolution Gaussian Processes", to appear NIPS 2012.

  32. Induced Marginal GP. Conditioned on the partition, marginalize the per-level GPs. [Diagram: partition tree $A^1_1, A^1_2$; $A^2_1, \ldots, A^2_4$]

  33. Induced Marginal GP. Equivalent to a GP with a partition-dependent (non-stationary) covariance function. [Diagram: partition tree; figure: induced covariance matrix]

  34. Correlation Structure. $\mathrm{corr}(y_i, y_j \mid A)$ accumulates level-wise kernel contributions $c^A_0(x_i, x_j) + c^A_1(x_i, x_j) + \cdots$; levels at which the locations $x_i, x_j$ no longer share a partition set contribute 0. [Diagram: partition tree $A^1_1, A^1_2$; $A^2_1, \ldots, A^2_4$; locations $x_i, x_j$ and observations $y_i, y_j$]

  35. Correlation Structure.
  • Correlation spans changepoints
  • Higher correlation for pairs sharing more partition sets
  With $L_{ij}$ the lowest tree level at which $x_i$ and $x_j$ are in the same partition set:
  $$\mathrm{corr}(y_i, y_j \mid A) = \frac{\sum_{\ell=0}^{L_{ij}} c^\ell_i(x_i, x_j)}{\sqrt{\left(\sigma^2 + \sum_{\ell=0}^{L-1} c^\ell_i(x_i, x_i)\right)\left(\sigma^2 + \sum_{\ell=0}^{L-1} c^\ell_j(x_j, x_j)\right)}}$$
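  A sketch of evaluating this correlation for a balanced dyadic partition of an interval, assuming squared-exponential kernels shared within each level (kernel forms, decay rates, and all names are assumptions for illustration):

```python
import numpy as np

def mgp_corr(xi, xj, L=3, sigma2=0.1, var0=1.0, ell0=0.5, decay=0.5,
             x_range=(0.0, 1.0)):
    """corr(y_i, y_j | A) for a balanced dyadic partition of an interval,
    with squared-exponential kernels c_l shared within each level l."""
    def c(l, a, b):
        var_l, ell_l = var0 * decay ** l, ell0 * decay ** l
        return var_l * np.exp(-0.5 * ((a - b) / ell_l) ** 2)

    def set_index(l, a):
        """Index of the level-l partition set containing location a."""
        lo, hi = x_range
        return min(int((a - lo) / (hi - lo) * 2 ** l), 2 ** l - 1)

    # L_ij: lowest tree level at which x_i and x_j share a partition set.
    L_ij = 0
    for l in range(1, L):
        if set_index(l, xi) != set_index(l, xj):
            break
        L_ij = l

    num = sum(c(l, xi, xj) for l in range(L_ij + 1))
    var_i = sigma2 + sum(c(l, xi, xi) for l in range(L))
    var_j = sigma2 + sum(c(l, xj, xj) for l in range(L))
    return num / np.sqrt(var_i * var_j)

print(mgp_corr(0.20, 0.30))   # share the level-1 set: more levels contribute
print(mgp_corr(0.45, 0.55))   # split at level 1: only the root kernel contributes
```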

  36. Covariance Function – Length Scale. Length-scale hyperparameter:
  • Fractal-like smoothness
  • Locally as smooth as parent function
  • Lower levels capture more detail
  • Only one parameter
  [Diagram: partition tree]

  37. Covariance Function – Variance. Variance hyperparameter:
  • Decreasing variability from parent
  • Finite variance regardless of tree depth
  • Lower levels are less influential
  [Diagram: partition tree]

  38. Covariance Function – Variance. Variance hyperparameter:
  • Decreasing variability from parent
  • Finite variance regardless of tree depth
  • Lower levels are less influential
  The resulting function is similar to the higher-level function despite adding changepoints.
  [Diagram: partition tree]

  39. Balanced Binary Trees. The root $A^0$ splits into $A^1_1, A^1_2$, which split into $A^2_1, A^2_2, A^2_3, A^2_4$; each parent set is the union of its two children (e.g., $A^1_1 = A^2_1 \cup A^2_2$). [Diagram: balanced binary tree of partition sets]
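  For concreteness, the nested sets of such a balanced binary tree can be generated by repeated halving of an ordered index set (a hypothetical helper, matching the partition used in the sketches above):

```python
import numpy as np

def balanced_tree(n, depth):
    """Map each level l to its 2**l contiguous partition sets over n locations."""
    return {l: np.array_split(np.arange(n), 2 ** l) for l in range(depth + 1)}

tree = balanced_tree(n=200, depth=2)
# tree[0][0] is the root A^0; tree[1] = [A^1_1, A^1_2];
# tree[2] = [A^2_1, A^2_2, A^2_3, A^2_4], with A^1_1 = A^2_1 ∪ A^2_2, etc.
```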
