Doubly Stochastic Variational Inference for Neural Processes with Hierarchical Latent Variables



  1. Doubly Stochastic Variational Inference for Neural Processes with Hierarchical Latent Variables. Q. Wang & Herke van Hoof, Amsterdam Machine Learning Lab, ICML 2020. 1 / 69

  2. Highlights in this Work 2 / 69

  5. Highlights in this Work
     A systematic revisit of SPs with an implicit latent variable model:
     ◮ conceptualization of latent SP models
     ◮ a better comprehension of SPs with LVMs
     A novel exchangeable SP within a hierarchical Bayesian framework:
     ◮ formalization of a hierarchical SP
     ◮ a plausible approximate inference method
     Competitive performance on extensive uncertainty-aware applications:
     ◮ high-dimensional regression on simulator and real-world datasets
     ◮ classification and o.o.d. detection on image datasets
     5 / 69

  6. Outline of this Talk
     1 Motivation for SPs
     2 Study of SPs with LVMs
     3 NP with Hierarchical Latent Variables
     4 Experiments and Applications
     6 / 69

  7. Motivation for SPs 7 / 69

  12. Why Do We Need Stochastic Processes?
      The stochastic process (SP) is a mathematical tool to describe a distribution over functions. (Fig. refers to [1])
      Flexible to handle correlations among samples: significant for non-i.i.d. datasets;
      Quantify uncertainty in risk-sensitive applications: e.g. forecast p(s_{t+1} | s_t, a_t) in autonomous driving [2];
      Model distributions instead of point estimates: working as a generative model to draw more realizations [3].
      12 / 69

  15. Two Consistencies in Exchangeable SPs
      Some required properties for an exchangeable stochastic process ρ [4]:
      Marginalization Consistency. For any finite collection of random variables {y_1, y_2, ..., y_{N+M}}, the probability after marginalizing over a subset is unchanged:
      \int \rho_{x_{1:N+M}}(y_{1:N+M}) \, dy_{N+1:N+M} = \rho_{x_{1:N}}(y_{1:N})   (1.1)
      Exchangeability Consistency. Any random permutation π over the set of variables does not change the joint probability:
      \rho_{x_{1:N}}(y_{1:N}) = \rho_{x_{\pi(1:N)}}(y_{\pi(1:N)})   (1.2)
      With these two conditions, an exchangeable SP can be induced (refer to the Kolmogorov Extension Theorem).
      15 / 69
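To make the two consistencies concrete, here is a minimal numerical check for the canonical exchangeable SP, a zero-mean Gaussian process. The RBF kernel, lengthscale, jitter, and the specific points are illustrative assumptions, not part of the slides; the check only demonstrates that a kernel-based construction satisfies (1.1) and (1.2).

```python
import numpy as np
from scipy.stats import multivariate_normal

def rbf(a, b, lengthscale=1.0):
    """Squared-exponential kernel; any positive semi-definite kernel works."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / lengthscale) ** 2)

rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=5)   # a finite collection of inputs x_{1:5}
y = rng.normal(size=5)               # arbitrary outputs y_{1:5}
jitter = 1e-6
K = rbf(x, x) + jitter * np.eye(5)

# Exchangeability consistency (1.2): permuting inputs and outputs together
# leaves the joint density unchanged.
perm = rng.permutation(5)
log_p = multivariate_normal(np.zeros(5), K).logpdf(y)
K_perm = rbf(x[perm], x[perm]) + jitter * np.eye(5)
log_p_perm = multivariate_normal(np.zeros(5), K_perm).logpdf(y[perm])
print(np.isclose(log_p, log_p_perm))         # True

# Marginalization consistency (1.1): the marginal of the 5-point joint over
# y_4, y_5 (drop rows/columns of K, a Gaussian identity) equals the 3-point
# joint built directly from the kernel restricted to x_{1:3}.
log_p_marg = multivariate_normal(np.zeros(3), K[:3, :3]).logpdf(y[:3])
K3 = rbf(x[:3], x[:3]) + jitter * np.eye(3)
log_p_direct = multivariate_normal(np.zeros(3), K3).logpdf(y[:3])
print(np.isclose(log_p_marg, log_p_direct))  # True
```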

  23. SPs in Progress and Primary Concerns
      Crucial properties for SPs:
      Scalability to large-scale datasets → optimization/computational bottleneck
      Flexibility in distributions → non-Gaussian or multi-modal properties
      Extension to high dimensions → correlations among or across inputs/outputs
      Analysis on GPs/NPs:
      Gaussian Processes (GPs) → less scalable, with computational complexity O(N^3); less flexible, with Gaussian distributions
      Neural Processes (NPs) → more scalable, with computational complexity O(N); more flexible, with no explicit distributions
      23 / 69
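The complexity contrast on this slide can be made concrete with a small sketch (not from the slides): an exact GP predictive mean requires factorizing the N x N kernel matrix, the O(N^3) step, while an NP-style prediction encodes each context pair independently and mean-pools, which is O(N) in the context size. The `encode`/`decode` functions below are hypothetical, untrained stand-ins for the learned networks.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / lengthscale) ** 2)

def gp_predictive_mean(x_ctx, y_ctx, x_tgt, noise=0.1):
    # Exact GP regression: factorizing the N x N kernel matrix is the
    # O(N^3) computational bottleneck referred to on the slide.
    K = rbf(x_ctx, x_ctx) + noise ** 2 * np.eye(len(x_ctx))
    L = np.linalg.cholesky(K)                                   # O(N^3)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_ctx))
    return rbf(x_tgt, x_ctx) @ alpha

def np_style_predictive_mean(x_ctx, y_ctx, x_tgt, encode, decode):
    # NP-style prediction: each context pair is encoded independently and the
    # representations are mean-pooled, so the cost is O(N) in context size.
    r = np.mean([encode(x, y) for x, y in zip(x_ctx, y_ctx)], axis=0)
    return np.array([decode(x, r) for x in x_tgt])

# Toy stand-ins for the learned encoder/decoder networks (hypothetical;
# only here to make the sketch executable end to end).
encode = lambda x, y: np.array([x, y, x * y])
decode = lambda x, r: r[1] + 0.0 * x

x_ctx = np.linspace(-2.0, 2.0, 20)
y_ctx = np.sin(x_ctx)
x_tgt = np.linspace(-2.0, 2.0, 5)
print(gp_predictive_mean(x_ctx, y_ctx, x_tgt))
print(np_style_predictive_mean(x_ctx, y_ctx, x_tgt, encode, decode))
```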

  24. Study of SPs with LVMs 24 / 69

  28. Deep Latent Variable Model as SPs
      Here we present an implicit latent variable model for SPs.
      Generation paradigm with (potentially correlated) latent variables:
      \underbrace{z_i}_{\text{index-depend. l.v.}} = \underbrace{\phi(x_i)}_{\text{deter. term}} + \underbrace{\epsilon(x_i)}_{\text{stoch. term}}   (2.1)
      y_i = \underbrace{\varphi(x_i, z_i)}_{\text{obs. trans.}} + \underbrace{\zeta_i}_{\text{obs. noise}}   (2.2)
      Predictive distribution in SPs: let the context and the target input be C = {(x_i, y_i) | i = 1, 2, ..., N} and x_T; the computation
      p_\theta(z_T | x_C, y_C, x_T) = \frac{p(z_C, z_T)}{\int p(z_C, z_T) \, dz_C}   (2.3)
      is mostly intractable.
      28 / 69
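As a concrete illustration of the generation paradigm in Eqs. (2.1)-(2.2), the sketch below draws function realizations from a toy instance of the model. The specific choices of φ, ε, and ϕ (fixed nonlinearities and input-dependent Gaussian noise) are illustrative assumptions standing in for the learned networks; they are not the paper's parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(x):
    # Deterministic term of the index-dependent latent variable (Eq. 2.1);
    # a learned network in practice, a fixed nonlinearity in this toy sketch.
    return np.tanh(1.5 * x)

def epsilon(x, scale=0.3):
    # Stochastic term of Eq. (2.1): input-dependent noise on the latent.
    return scale * np.abs(x) * rng.normal(size=np.shape(x))

def varphi(x, z):
    # Observation transformation of Eq. (2.2), again a network in practice.
    return z * np.cos(x)

def sample_function(x, obs_noise=0.05):
    z = phi(x) + epsilon(x)                                       # Eq. (2.1)
    y = varphi(x, z) + obs_noise * rng.normal(size=np.shape(x))   # Eq. (2.2)
    return y

x = np.linspace(-3.0, 3.0, 50)
draws = np.stack([sample_function(x) for _ in range(5)])  # five function draws
print(draws.shape)  # (5, 50)
```

Because z_i is input-dependent and stochastic, repeated calls yield different functions, which is the "distribution over functions" view; once φ and ϕ are deep networks, the conditional in Eq. (2.3) has no closed form, motivating approximate inference.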
