Deep Gaussian Processes with Importance-Weighted Variational Inference

  1. Deep Gaussian Processes with Importance-Weighted Variational Inference. Hugh Salimbeni, Vincent Dutordoir, James Hensman, Marc P. Deisenroth

  2. Problem setting: a bimodal density.

  3. Problem setting: the density changes with the input.

  4. Problem setting: skewness.
     • Bus arrival times
     • Confounding variables

  5. A possible approach: a neural network f_φ with a per-point latent variable w_n concatenated with the input x_n:
     y_n ∼ N(f_φ([x_n, w_n]), σ²),  w_n ∼ N(0, 1).
     [Figure: graphical model with f_φ and a plate over the N points x_n, w_n, y_n; plot of training data and test samples.]
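
To make the generative process concrete, here is a minimal NumPy sketch (not the authors' code) with a hypothetical fixed two-layer MLP standing in for the learned f_φ: test-time samples of y_n are drawn by resampling w_n ∼ N(0, 1) and pushing the concatenation [x_n, w_n] through the network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fixed weights for a small MLP f_phi: R^2 -> R
# (the input is the concatenation [x, w]; in practice phi is learned).
W1 = rng.normal(size=(2, 16))
b1 = np.zeros(16)
W2 = rng.normal(size=(16, 1))
b2 = np.zeros(1)

def f_phi(xw):
    """MLP mean function applied to concatenated inputs [x, w]."""
    h = np.tanh(xw @ W1 + b1)
    return (h @ W2 + b2).squeeze(-1)

def sample_y(x, sigma=0.1, n_samples=20):
    """Draw samples from p(y | x): w ~ N(0,1), y ~ N(f_phi([x, w]), sigma^2)."""
    x = np.repeat(x[None, :], n_samples, axis=0)  # replicate inputs per sample
    w = rng.normal(size=x.shape)                  # per-point latent w_n
    xw = np.stack([x, w], axis=-1)                # concatenation [x_n, w_n]
    mean = f_phi(xw)
    return mean + sigma * rng.normal(size=mean.shape)

x_test = np.linspace(-3, 3, 50)
y_samples = sample_y(x_test)  # (20, 50): 20 draws of the conditional density
```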

  6. Limitations of this approach: f_φ is a deterministic function, so the model overfits; extrapolation beyond the training data is unreliable; and there is only a small number of examples per input x_n.

  7. Another possible approach: replace the network with a non-parametric prior, f ∼ GP(µ, k):
     y_n ∼ N(f([x_n, w_n]), σ²),  w_n ∼ N(0, 1).
     Extrapolation improves, but the model underfits.
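
A minimal sketch of this prior, assuming a zero mean function and an RBF kernel (both illustrative choices, not specified by the slides): one joint draw of f ∼ GP(0, k) evaluated at the concatenated points [x_n, w_n].

```python
import numpy as np

rng = np.random.default_rng(1)

def rbf(A, B, lengthscale=1.0, variance=1.0):
    """RBF kernel k(a, b) = variance * exp(-||a - b||^2 / (2 l^2))."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * sq / lengthscale**2)

N = 100
x = np.linspace(-3, 3, N)[:, None]
w = rng.normal(size=(N, 1))            # latent variable w_n ~ N(0, 1)
xw = np.concatenate([x, w], axis=1)    # GP input is the concatenation [x_n, w_n]

K = rbf(xw, xw) + 1e-8 * np.eye(N)     # jitter for numerical stability
f = np.linalg.cholesky(K) @ rng.normal(size=N)  # f ~ GP(0, k) at the inputs
y = f + 0.1 * rng.normal(size=N)       # y_n ~ N(f([x_n, w_n]), sigma^2)
```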

  8. Our model: a two-layer deep Gaussian process with the latent variable concatenated at the input:
     y_n ∼ N(f(g([x_n, w_n])), σ²),  w_n ∼ N(0, 1),  g ∼ GP(µ₂, k₂),  f ∼ GP(µ₁, k₁).
     It extrapolates gracefully and gives a better fit to the data.
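
The same sketch extended to the two-layer model, again assuming zero mean functions and RBF kernels for illustration: first draw g at the concatenated inputs, then draw f at the resulting hidden layer, so the samples realise f(g([x_n, w_n])) from the prior (inference is sketched later).

```python
import numpy as np

rng = np.random.default_rng(2)

def rbf(A, B, lengthscale=1.0):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / lengthscale**2)

def gp_sample(X):
    """One joint draw from a zero-mean GP with an RBF kernel at inputs X."""
    K = rbf(X, X) + 1e-8 * np.eye(len(X))
    return np.linalg.cholesky(K) @ rng.normal(size=len(X))

N = 100
x = np.linspace(-3, 3, N)[:, None]
w = rng.normal(size=(N, 1))            # w_n ~ N(0, 1)
xw = np.concatenate([x, w], axis=1)    # concatenate latent with input

h = gp_sample(xw)[:, None]             # hidden layer: g([x_n, w_n])
f = gp_sample(h)                       # output layer: f(g([x_n, w_n]))
y = f + 0.1 * rng.normal(size=N)       # y_n ~ N(f(g([x_n, w_n])), sigma^2)
```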

  9. Contributions:
     • A new architecture: latent variables enter by concatenation, not addition (contrasted in the sketch below).
     • Importance-weighted variational inference, exploiting analytic results.
     • An extensive empirical comparison on all 41 UCI regression datasets.
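
A shapes-only illustration of the first point (hypothetical arrays, not the paper's code): an additive latent variable must match the input dimension, whereas a concatenated one can have any dimension and leaves the inputs themselves untouched.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=(4, 3))     # batch of 4 inputs, dimension 3

w_add = rng.normal(size=(4, 3))  # additive latent must match dim(x)
layer_in_add = x + w_add         # addition: shape (4, 3), perturbs x

w_cat = rng.normal(size=(4, 1))  # concatenated latent can be low-dimensional
layer_in_cat = np.concatenate([x, w_cat], axis=1)  # concatenation: shape (4, 4)
```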

  10. A few details of the inference:
      • Importance weighting over the latent variables w_n, with a Gaussian proposal (see the sketch after this list).
      • Variational inference over the GP layers f and g, with sparse GP posteriors.
      • Our approach exploits analytic results, leading to a tighter bound.
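
A minimal sketch of the importance-weighted estimator for one data point, with a Gaussian proposal q(w_n) = N(m, s²) and K samples. Here `log_lik` is a hypothetical stand-in for log p(y_n | x_n, w_n); in the paper this term is handled with the sparse GP posteriors and the analytic results mentioned above.

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(4)

def log_normal(z, mean, std):
    return -0.5 * np.log(2 * np.pi * std**2) - 0.5 * ((z - mean) / std) ** 2

def iw_bound(log_lik, m, s, K=10):
    """K-sample importance-weighted bound on log p(y):
    log (1/K) sum_k p(y | w_k) p(w_k) / q(w_k), with w_k ~ q = N(m, s^2).
    With K = 1 this is the standard ELBO; it tightens as K grows."""
    w = m + s * rng.normal(size=K)            # samples from the proposal q
    log_weights = (log_lik(w)                 # log p(y | w_k)
                   + log_normal(w, 0.0, 1.0)  # log prior p(w_k) = N(0, 1)
                   - log_normal(w, m, s))     # minus log proposal q(w_k)
    return logsumexp(log_weights) - np.log(K)

# Toy example: a Gaussian likelihood whose mean depends on w (hypothetical).
bound = iw_bound(lambda w: log_normal(1.3, np.sin(w), 0.1), m=0.0, s=1.0, K=50)
```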

  11. Results:
      • Latent variables in the DGP are highly beneficial.
      • Sometimes depth is enough; sometimes latent variables are enough; some datasets need both.
      • Importance-weighted VI outperforms standard VI.

  12. Thanks for listening. Poster #218. • New architecture • Importance-weighted VI • 41 datasets
