Doubly Stochastic Inference for Deep Gaussian Processes


  1. Doubly Stochastic Inference for Deep Gaussian Processes
Hugh Salimbeni, Department of Computing, Imperial College London
Amazon Berlin, 29/5/2017

  2. Motivation
§ DGPs promise much, but are difficult to train
§ Fully factorised VI doesn't work well
§ We seek a variational approach that works and scales
Other recently proposed schemes [1, 2, 5] make additional approximations and require more machinery than VI

  3. Talk outline
1. Summary: Model, Inference, Results
2. Details: Model, Inference, Results
3. Questions

  4. Model
We use the standard DGP model, with one addition:
§ We include a linear (identity) mean function for all the internal layers (1D example in [4])

  5. Inference
§ We use the model conditioned on the inducing points as a conditional variational posterior
§ We impose Gaussians on the inducing points (independent between layers but full rank within layers)
§ We use sampling to deal with the intractable expectation (see the sketch below)
We never compute N × N matrices (we make no additional simplifications to the variational posterior)
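
To make "doubly stochastic" concrete, below is a minimal sketch of the estimator in NumPy. It is an illustration under assumptions, not the paper's code: each `layer` callable is assumed to return the per-point marginal mean and variance of q(f^l) at the previous layer's samples, and `likelihood_logpdf` and `kl_terms` are hypothetical names. One source of stochasticity is the minibatch; the other is the reparameterized sample propagated through the layers.

```python
import numpy as np

rng = np.random.default_rng(0)

def elbo_estimate(X, y, layers, likelihood_logpdf, kl_terms, batch_size=256):
    # Source 1 of stochasticity: a random minibatch of the data.
    N = X.shape[0]
    idx = rng.choice(N, size=batch_size, replace=False)
    h = X[idx]
    # Source 2: one reparameterized sample propagated through the layers.
    for layer in layers:
        mu, var = layer(h)                   # marginals of q(f^l) at h^{l-1}
        eps = rng.standard_normal(mu.shape)
        h = mu + np.sqrt(var) * eps          # reparameterization trick
    # Rescale the likelihood term to the full dataset and subtract the
    # KL divergence KL(q(u^l) || p(u^l)) for each layer.
    return (N / batch_size) * likelihood_logpdf(y[idx], h).sum() - sum(kl_terms)
```

Both sources of noise leave the estimator unbiased for the bound, so it can be optimized with standard stochastic gradient methods.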

  6. Results
§ We show significant improvement over single-layer models on large (∼10^6) and massive (∼10^9) data
§ Big jump in improvement over a single-layer GP with 5× the number of inducing points
§ On small data we never do worse than the single-layer model, and often better
§ We get 98.1% accuracy on MNIST with only 100 inducing points
§ We surpass all permutation-invariant methods on rectangles-images (designed to test deep vs shallow architectures)
§ Identical model/inference hyperparameters for all our models

  7. Details: The Model
We use the standard DGP model, with a linear mean function for all the internal layers:
§ If the dimensions agree we use the identity, otherwise PCA (a sketch follows this list)
§ Sensible alternative: initialise the latents to the identity (but the linear mean function works better)
§ Not-so-sensible alternative: random initialisation. Doesn't work well (the posterior is (very) multimodal)
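
A minimal sketch of that construction (the function name and the exact PCA recipe, taken here from an SVD of the centred layer inputs, are assumptions for illustration):

```python
import numpy as np

def linear_mean_function(X, d_in, d_out):
    """Build the internal-layer mean function m(H) = H W: identity weights
    when the dimensions agree, otherwise a PCA projection (assumed to use
    the top principal directions of the inputs X; requires d_out <= d_in)."""
    if d_in == d_out:
        W = np.eye(d_in)
    else:
        Xc = X - X.mean(axis=0)                        # centre the inputs
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        W = Vt[:d_out].T                               # shape (d_in, d_out)
    return lambda H: H @ W
```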

  8. The DGP: Graphical Model
[Figure: the DGP as a graphical model. Inputs X feed the first GP layer f^1 (inducing inputs Z^0, inducing outputs u^1); adding noise ε gives h^1, which feeds f^2 (Z^1, u^2); noise ε gives h^2, which feeds f^3 (Z^2, u^3), which generates y.]

  9. The DGP: Density

$$p\big(y, \{h^l, f^l, u^l\}_{l=1}^{L}\big) \;=\; \underbrace{\prod_{i=1}^{N} p\big(y_i \mid f_i^{L}\big)}_{\text{likelihood}} \;\times\; \underbrace{\prod_{l=1}^{L} p\big(h^l \mid f^l\big)\, p\big(f^l \mid u^l;\, h^{l-1}, Z^{l-1}\big)\, p\big(u^l;\, Z^{l-1}\big)}_{\text{DGP prior}}$$

(with $h^0 = X$)
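
To make the generative structure concrete, here is an illustrative draw from this prior in NumPy, with an assumed RBF kernel, zero-mean layers (the talk's model adds the linear mean function of slide 7), and hypothetical dimensions and noise level:

```python
import numpy as np

def rbf(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def sample_dgp_prior(X, layer_dims, noise_var=1e-2, rng=None):
    """One draw from the DGP prior: at each layer sample f^l | h^{l-1}
    from a zero-mean GP at the previous layer's outputs, then set
    h^l = f^l + eps with Gaussian noise eps."""
    rng = rng or np.random.default_rng(0)
    h, f = X, None                                        # h^0 = X
    for d_out in layer_dims:
        K = rbf(h, h) + 1e-8 * np.eye(h.shape[0])         # jitter for stability
        L = np.linalg.cholesky(K)
        f = L @ rng.standard_normal((h.shape[0], d_out))  # f^l ~ N(0, K) per dim
        h = f + np.sqrt(noise_var) * rng.standard_normal(f.shape)  # h^l
    return f  # f^L feeds the likelihood directly (no noise on the last layer)

# Example: a three-layer draw at 200 random 2-D inputs.
X = np.random.default_rng(1).standard_normal((200, 2))
f_L = sample_dgp_prior(X, layer_dims=[2, 2, 1])
```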

  10. Factorised Variational Posterior
[Figure: the fully factorised posterior cuts the connections between layers, with $\mathcal{N}(u^l \mid m^l, S^l)$ on the inducing outputs and $\prod_i \mathcal{N}\big(h^l_i \mid \mu^l_i, (\sigma^l_i)^2\big)$ on each hidden layer.]

  11. Our Variational Posterior
[Figure: our posterior keeps the model's full conditional structure between layers, replacing only the prior on the inducing outputs with $\mathcal{N}(u^l \mid m^l, S^l)$; the sampled layers $h^l$ retain their within-layer and between-layer correlations.]
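
With this posterior the conditionals $p(f^l \mid u^l; h^{l-1}, Z^{l-1})$ cancel between model and posterior, and the bound reduces to the familiar sparse-VI form (a reconstruction from the definitions above; the expectation is exactly the intractable one that slide 5 handles by sampling):

```latex
\mathcal{L}
  = \sum_{i=1}^{N} \mathbb{E}_{q(f_i^{L})}\!\left[\log p\big(y_i \mid f_i^{L}\big)\right]
  - \sum_{l=1}^{L} \operatorname{KL}\!\left[\mathcal{N}\big(u^{l} \mid m^{l}, S^{l}\big) \,\middle\|\, p\big(u^{l}; Z^{l-1}\big)\right]
```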

  12. Recap: 'GPs for Big Data' [3]

$$q(f, u) = p(f \mid u; X, Z)\, \mathcal{N}(u \mid m, S)$$

Marginalise $u$ from the variational posterior:

$$\int p(f \mid u; X, Z)\, \mathcal{N}(u \mid m, S)\, \mathrm{d}u \;=\; \mathcal{N}(f \mid \mu, \Sigma) \;=:\; q(f \mid m, S; X, Z) \tag{1}$$

Define the following mean and covariance functions:

$$\mu_{m,Z}(x_i) = m(x_i) + \alpha(x_i)^{\top}\big(m - m(Z)\big),$$
$$\Sigma_{S,Z}(x_i, x_j) = k(x_i, x_j) - \alpha(x_i)^{\top}\big(k(Z,Z) - S\big)\,\alpha(x_j),$$

where $\alpha(x_i) = k(Z,Z)^{-1}\, k(Z, x_i)$.
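
A direct NumPy transcription of these functions (`kernel` and `mean_fn` are assumed callables, e.g. the `rbf` from the slide-9 example, and the jitter value is an assumption for numerical stability):

```python
import numpy as np

def q_marginals(Xnew, Z, m, S, kernel, mean_fn):
    """Mean and covariance of q(f | m, S; Xnew, Z) as defined above."""
    Kzz = kernel(Z, Z) + 1e-8 * np.eye(Z.shape[0])   # jitter for stability
    Kzx = kernel(Z, Xnew)
    alpha = np.linalg.solve(Kzz, Kzx)                # alpha(x) = k(Z,Z)^{-1} k(Z,x)
    mu = mean_fn(Xnew) + alpha.T @ (m - mean_fn(Z))            # mu_{m,Z}(x)
    Sigma = kernel(Xnew, Xnew) - alpha.T @ (Kzz - S) @ alpha   # Sigma_{S,Z}(x,x')
    return mu, Sigma
```

In the DGP this is applied layer by layer with X replaced by the previous layer's samples, and only the diagonal of Σ is needed for the per-point marginals, which is how the method avoids forming N × N matrices (slide 5).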
