Doubly Stochastic Variational Inference for Neural Processes with Hierarchical Latent Variables
Qi Wang & Herke van Hoof
Amsterdam Machine Learning Lab
ICML 2020
Highlights in this Work

A systematic revisit of SPs with an Implicit Latent Variable Model
  - conceptualization of latent SP models
  - comprehension of SPs with LVMs

A novel exchangeable SP within a Hierarchical Bayesian Framework
  - formalization of a hierarchical SP
  - a plausible approximate inference method

Competitive performance on extensive Uncertainty-aware Applications
  - high-dimensional regression on simulators / real-world datasets
  - classification and o.o.d. detection on image datasets
1. Motivation for SPs
2. Study of SPs with LVMs
3. NP with Hierarchical Latent Variables
4. Experiments and Applications
Motivation for SPs
The stochastic process (SP) is a mathematical tool for describing a distribution over functions (figure from [1]).
  - Flexibility to handle correlations among samples: significant for non-i.i.d. datasets;
  - Uncertainty quantification in risk-sensitive applications: e.g. forecasting p(s_{t+1} | s_t, a_t) in autonomous driving [2];
  - Modelling distributions instead of point estimates: working as a generative model that yields further realizations [3].
Some required properties for an exchangeable stochastic process ρ [4]:

Marginalization Consistency. For any finite collection of random variables {y_1, y_2, ..., y_{N+M}}, marginalizing out a subset leaves the probability of the remaining variables unchanged:
  ρ_{x_{1:N}}(y_{1:N}) = ∫ ρ_{x_{1:N+M}}(y_{1:N+M}) dy_{N+1:N+M}    (1.1)

Exchangeability Consistency. Any permutation π over the set of variables does not influence the joint probability:
  ρ_{x_{1:N}}(y_{1:N}) = ρ_{x_{π(1:N)}}(y_{π(1:N)})    (1.2)

With these two conditions, an exchangeable SP can be induced (see the Kolmogorov Extension Theorem).
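To make the two conditions concrete, here is a minimal numerical sketch, assuming an RBF-kernel GP prior stands in for ρ; the lengthscale, jitter, and seed are illustrative choices, not anything fixed by the slides.

```python
import numpy as np
from scipy.stats import multivariate_normal

def rbf_kernel(x1, x2, ls=1.0):
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

x = np.linspace(0.0, 1.0, 5)                        # inputs x_{1:N+M} with N=3, M=2
K = rbf_kernel(x, x) + 1e-6 * np.eye(5)             # GP prior covariance (with jitter)
y = np.random.default_rng(0).multivariate_normal(np.zeros(5), K)

# Marginalization consistency (Eq. 1.1): dropping rows/columns of the 5-point
# Gaussian gives the same density as the model built directly on x_{1:3}.
p_marg = multivariate_normal(np.zeros(3), K[:3, :3]).pdf(y[:3])
p_direct = multivariate_normal(np.zeros(3),
                               rbf_kernel(x[:3], x[:3]) + 1e-6 * np.eye(3)).pdf(y[:3])
assert np.allclose(p_marg, p_direct)

# Exchangeability consistency (Eq. 1.2): jointly permuting (x_i, y_i) leaves
# the joint density unchanged.
perm = np.array([2, 0, 4, 1, 3])
K_perm = rbf_kernel(x[perm], x[perm]) + 1e-6 * np.eye(5)
assert np.allclose(multivariate_normal(np.zeros(5), K).pdf(y),
                   multivariate_normal(np.zeros(5), K_perm).pdf(y[perm]))
```

Marginalizing a multivariate Gaussian only drops rows and columns of its mean and covariance, which is why both checks hold exactly up to floating-point error.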
Crucial properties for SPs:
  - Scalability to large-scale datasets → optimization / computational bottleneck
  - Flexibility in distributions → non-Gaussian or multi-modal behaviour
  - Extension to high dimensions → correlations among or across inputs/outputs

Analysis of GPs / NPs:
  - Gaussian Processes (GPs) → less scalable, with computational complexity O(N^3); less flexible, restricted to Gaussian distributions
  - Neural Processes (NPs) → more scalable, with computational complexity O(N); more flexible, with no explicit distributional form
Study of SPs with LVMs
Here we present an implicit latent variable model (LVM) for SPs.

Generation paradigm with (potentially correlated) latent variables:
  z_i = φ(x_i) + ε(x_i)    (2.1)
  y_i = ϕ(x_i, z_i) + ζ_i    (2.2)
where φ and ε produce the latent variables, ϕ is the transformation to the output space, and ζ_i is observation noise.

Predictive distribution in SPs: let the context be C = {(x_i, y_i) | i = 1, 2, ..., N} and the target input be x_T; the computation
  p_θ(z_T | x_C, y_C, x_T) = ∫ p(y_C | x_C, z_C) p(z_C, z_T) dz_C / p(y_C | x_C),    y_T ∼ p(y_T | x_T, z_T, ζ)    (2.3)
is mostly intractable.
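As a toy illustration of the generation paradigm in Eqs. (2.1)-(2.2), the sketch below samples one realization; the specific φ, ε, ϕ and noise scales are assumptions made for the example, not the parameterization used in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def phi(x):                       # deterministic part of the latent process
    return np.sin(2.0 * np.pi * x)

def epsilon(x, scale=0.1):        # input-dependent stochastic part
    return rng.normal(0.0, scale * (1.0 + x))

def varphi(x, z):                 # nonlinear transformation to the output space
    return z ** 3 + 0.5 * x

x = rng.uniform(0.0, 1.0, size=20)
z = phi(x) + epsilon(x)                              # Eq. (2.1): latent variables
y = varphi(x, z) + rng.normal(0.0, 0.05, size=20)    # Eq. (2.2): noisy observations
```

Because the latent noise passes through a nonlinear transformation before the output noise is added, the induced distribution over y given x can be non-Gaussian even though both noise sources here are Gaussian.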
The NP family approximates SPs in the form of LVMs.

GP as an exchangeable SP with (correlated) latent variables:
  ρ_{x_{1:N}}(y_{1:N}) = ∫ ∏_{i=1}^{N} p(y_i | z_i) p(z_{1:N} | x_{1:N}) dz_{1:N}    (2.4)

NP as an exchangeable SP with a global latent variable z_G:
  ρ_{x_{1:N+M}}(y_{1:N+M}) = ∫ ∏_{i=1}^{N+M} p(y_i | x_i, z_G) p(z_G) dz_G    (2.5)

Remark
Some other models, such as Hierarchical GPs [5] and Deep GPs [6], [7], can also be expressed with LVMs.
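Since z_G in Eq. (2.5) is shared by every point, the joint density is a single integral over the global latent. Below is a naive Monte Carlo sketch of that marginal, with a hypothetical stand-in decoder and a fixed observation noise assumed purely for illustration.

```python
import torch
from torch.distributions import Normal

def log_joint_np(x, y, decoder, z_dim=8, n_samples=500, obs_noise=0.1):
    # x: (N, d_x), y: (N, d_y); prior p(z_G) = N(0, I)
    z = torch.randn(n_samples, z_dim)                            # z_G^(m) ~ p(z_G)
    mu = decoder(x.unsqueeze(0).expand(n_samples, -1, -1), z)    # (M, N, d_y)
    log_p = Normal(mu, obs_noise).log_prob(y).sum(dim=(-1, -2))  # log prod_i p(y_i | x_i, z_G)
    # log (1/M) sum_m exp(log_p_m): Monte Carlo estimate of the integral in Eq. (2.5)
    return torch.logsumexp(log_p, dim=0) - torch.log(torch.tensor(float(n_samples)))

# stand-in decoder: one sampled function per z_G, applied to all inputs
decoder = lambda x, z: torch.tanh(x * z[:, None, :1] + z[:, None, 1:2])
x, y = torch.randn(6, 1), torch.randn(6, 1)
print(log_joint_np(x, y, decoder))
```

Because the same z_G sample is reused for every point, permuting the (x_i, y_i) pairs only reorders the factors of the inner product without changing its value, which is how the construction inherits exchangeability.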
A general ELBO with a context prior in NP models [1]:

  ln p_θ(y_T | x_C, y_C, x_T) ≥ E_{q_φ(z_G | x_T, y_T)}[ ln p_θ(y_T | x_T, z_G) ] − D_KL[ q_φ(z_G | x_T, y_T) ‖ p(z_G | x_C, y_C) ]    (2.6)

Statistics of the context are invariant to the order of the set instances, e.g. pooling of element-wise embeddings:

  r_i = h_θ(x_i, y_i),    r = (1/N) Σ_{i=1}^{N} r_i,    p_θ(z_C | x_C, y_C) = N(z_C | [f_μ(r), f_σ(r)])    (2.7)

(Figure: permutation-invariant encoder with element-wise MLP embeddings, pooling, and sampling, followed by the decoder.)
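A compact PyTorch sketch of the permutation-invariant encoder in Eq. (2.7); the layer widths and latent size are assumed for illustration, and the final assertion checks that the pooled statistics ignore the ordering of the context set.

```python
import torch
import torch.nn as nn

class SetEncoder(nn.Module):
    def __init__(self, x_dim=1, y_dim=1, r_dim=64, z_dim=32):
        super().__init__()
        self.h = nn.Sequential(nn.Linear(x_dim + y_dim, r_dim), nn.ReLU(),
                               nn.Linear(r_dim, r_dim))
        self.f_mu, self.f_logsigma = nn.Linear(r_dim, z_dim), nn.Linear(r_dim, z_dim)

    def forward(self, x, y):
        # x: (N, x_dim), y: (N, y_dim), one context set
        r_i = self.h(torch.cat([x, y], dim=-1))        # r_i = h_theta(x_i, y_i)
        r = r_i.mean(dim=0)                            # order-invariant pooling
        mu, sigma = self.f_mu(r), torch.exp(self.f_logsigma(r))
        return torch.distributions.Normal(mu, sigma)   # q(z_C | x_C, y_C)

enc = SetEncoder()
x, y = torch.randn(10, 1), torch.randn(10, 1)
perm = torch.randperm(10)
assert torch.allclose(enc(x, y).mean, enc(x[perm], y[perm]).mean, atol=1e-5)
```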
NP with Hierarchical Latent Variables
Our work starts from two motivations:
  - Hierarchical Bayesian structures → more expressiveness.
  - Involving local latent variables → revealing local dependencies across inputs/outputs in high-dimensional cases.

As a result, a hierarchical LVM is induced, the Doubly Stochastic Variational Neural Process (DSVNP):

  ρ_{x_{1:N+M}}(y_{1:N+M}) = ∫∫ ∏_{i=1}^{N+M} p(y_i | z_G, z_i, x_i) p(z_i | x_i, z_G) p(z_G) dz_{1:N+M} dz_G    (3.1)

Remark
DSVNP satisfies Marginalization and Exchangeability Consistency, so it is a new exchangeable SP.
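Ancestral sampling from Eq. (3.1) can be sketched as follows; the two small networks, their sizes, and the standard-normal prior over z_G are placeholders for illustration rather than the architecture reported in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

z_g_dim, z_l_dim, x_dim, y_dim, h = 32, 16, 1, 1, 64
prior_local = nn.Sequential(nn.Linear(x_dim + z_g_dim, h), nn.ReLU(),
                            nn.Linear(h, 2 * z_l_dim))          # p(z_i | x_i, z_G)
decoder = nn.Sequential(nn.Linear(x_dim + z_g_dim + z_l_dim, h), nn.ReLU(),
                        nn.Linear(h, 2 * y_dim))                # p(y_i | x_i, z_G, z_i)

x = torch.linspace(0.0, 1.0, 8).unsqueeze(-1)                   # target inputs x_{1:N+M}
z_G = torch.randn(z_g_dim)                                      # one global draw, z_G ~ p(z_G)
z_G_rep = z_G.expand(x.size(0), -1)

mu_l, s_l = prior_local(torch.cat([x, z_G_rep], dim=-1)).chunk(2, dim=-1)
z_local = mu_l + F.softplus(s_l) * torch.randn_like(mu_l)       # local latents z_i

mu_y, s_y = decoder(torch.cat([x, z_G_rep, z_local], dim=-1)).chunk(2, dim=-1)
y = mu_y + F.softplus(s_y) * torch.randn_like(mu_y)             # sampled outputs y_i
```

The single global draw couples all points, while each local draw only affects its own output, mirroring the two levels of stochasticity the model's name refers to.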
Exact inference in this hierarchical LVM is mostly intractable, hence approximate inference is used here.

Evidence lower bound for DSVNP:

  ln p_θ(y_* | x_C, y_C, x_*) ≥ E_{q_{φ_{1,1}}(z_G)} E_{q_{φ_{2,1}}(z_* | z_G, x_*, y_*)}[ ln p_θ(y_* | x_*, z_G, z_*) ]
      − D_KL[ q_{φ_{1,1}}(z_G | x_{C∪*}, y_{C∪*}) ‖ p_{φ_{1,2}}(z_G | x_C, y_C) ]
      − E_{q_{φ_{1,1}}}[ D_KL[ q_{φ_{2,1}}(z_* | z_G, x_*, y_*) ‖ p_{φ_{2,2}}(z_* | z_G, x_*) ] ]    (3.2)

(Figure: graphical models; black lines specify the generative process, blue/pink lines the recognition models.)
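Below is a single-sample Monte Carlo sketch of evaluating the bound in Eq. (3.2) with reparameterized samples; q_global, p_global, q_local, p_local, and decoder are hypothetical module handles assumed to return torch.distributions.Normal objects, not the paper's exact implementation.

```python
import torch
from torch.distributions import kl_divergence

def dsvnp_elbo(q_global, p_global, q_local, p_local, decoder,
               x_ctx, y_ctx, x_tgt, y_tgt):
    # q_{phi_1,1}(z_G | context+target) and context prior p_{phi_1,2}(z_G | x_C, y_C)
    q_zg = q_global(torch.cat([x_ctx, x_tgt]), torch.cat([y_ctx, y_tgt]))
    p_zg = p_global(x_ctx, y_ctx)
    z_g = q_zg.rsample()                            # reparameterized global sample

    # q_{phi_2,1}(z_* | z_G, x_*, y_*) and conditional prior p_{phi_2,2}(z_* | z_G, x_*)
    q_zl = q_local(z_g, x_tgt, y_tgt)
    p_zl = p_local(z_g, x_tgt)
    z_l = q_zl.rsample()                            # reparameterized local samples

    log_lik = decoder(x_tgt, z_g, z_l).log_prob(y_tgt).sum()
    kl_global = kl_divergence(q_zg, p_zg).sum()
    kl_local = kl_divergence(q_zl, p_zl).sum()      # expectation over z_G taken via z_g
    return log_lik - kl_global - kl_local           # maximize this lower bound
```

Maximizing this quantity (or minimizing its negative) with a stochastic optimizer over all module parameters is the SGVB-style training referred to on the next slide.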
Similar to NPs, DSVNP is trained with SGVB [8].

Scalable training with randomly selected context points.

Testing/forecasting with prior networks and Monte Carlo estimates:

  p(y_* | x_C, y_C, x_*) ≈ (1 / KS) Σ_{k=1}^{K} Σ_{s=1}^{S} p_θ(y_* | x_*, z_*^{(s)}, z_G^{(k)})    (3.3)

using latent variables sampled from the prior networks, z_G^{(k)} ∼ p_{φ_{1,2}}(z_G | x_C, y_C) and z_*^{(s)} ∼ p_{φ_{2,2}}(z_* | z_G^{(k)}, x_*).
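A direct sketch of the estimator in Eq. (3.3), reusing the same hypothetical module handles as the ELBO sketch above; K and S are small defaults picked for illustration.

```python
import torch

def predict_density(p_global, p_local, decoder,
                    x_ctx, y_ctx, x_star, y_star, K=10, S=10):
    vals = []
    p_zg = p_global(x_ctx, y_ctx)                   # p_{phi_1,2}(z_G | x_C, y_C)
    for _ in range(K):
        z_g = p_zg.sample()                         # z_G^(k)
        p_zl = p_local(z_g, x_star)                 # p_{phi_2,2}(z_* | z_G^(k), x_*)
        for _ in range(S):
            z_l = p_zl.sample()                     # z_*^(s)
            log_p = decoder(x_star, z_g, z_l).log_prob(y_star).sum()
            vals.append(log_p.exp())                # p_theta(y_* | x_*, z_*^(s), z_G^(k))
    return torch.stack(vals).mean()                 # average of K*S densities
```

In practice one would accumulate log-densities and use logsumexp instead of averaging raw densities, to avoid underflow when many target points are evaluated at once.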
Experiments and Applications
Discoveries in 1-D simulation experiments, in terms of fitting errors and uncertainty quantification (UQ):
  - Epistemic uncertainty on a single curve: NP/AttnNP → over-confident in some regions.
  - Interpolation on curves of an SP: AttnNP ≻ DSVNP ≻ NP ≻ CNP (fitting/UQ performance).
  - Extrapolation on curves of an SP: tough for all models in terms of fitting; NP/AttnNP →

(Figure: predictive distributions from (a) CNP, (b) NP, (c) AttnNP, (d) DSVNP.)
Investigations on (1) system identification on cart-pole transitions [9] and (2) regression on real-world datasets:
  - System identification: MSE and NLL are not in accordance; DSVNP & CNP → better UQ; DSVNP & AttnNP → lower fitting error.
  - High-dimensional regression: hierarchical latent variables advance performance significantly.
Observations in image classification and out-of-distribution (o.o.d.) detection, based on the cumulative distribution of predictive entropies:
  - MNIST: no significant difference in classification performance / o.o.d. detection (all above 99%); DSVNP → better o.o.d. detection on FMNIST/KMNIST; MC-Dropout is more robust to Gaussian/uniform noise.
  - CIFAR10: DSVNP (86.3%) ≻ MC-Dropout/CNP ≻ AttnNP/NP ≻ NN (classification performance); DSVNP → best in-domain entropy distributions and most robust to Rademacher noise.
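The o.o.d. comparison is read off the empirical cumulative distribution of predictive entropies. The sketch below shows that computation, with Dirichlet-sampled softmax outputs standing in for any of the compared models (an assumption made for the example, not real experimental data).

```python
import numpy as np

def entropy(p, eps=1e-12):
    return -(p * np.log(p + eps)).sum(axis=-1)         # one entropy per example

def entropy_cdf(pred_probs, grid):
    h = entropy(pred_probs)
    return np.array([(h <= t).mean() for t in grid])   # empirical CDF on the grid

rng = np.random.default_rng(0)
probs_in = rng.dirichlet(np.full(10, 0.1), size=1000)   # peaked predictions: low entropy
probs_ood = rng.dirichlet(np.full(10, 5.0), size=1000)  # diffuse predictions: high entropy
grid = np.linspace(0.0, np.log(10), 50)
cdf_in, cdf_ood = entropy_cdf(probs_in, grid), entropy_cdf(probs_ood, grid)
# A model with good o.o.d. behaviour keeps cdf_in high (confident in-domain)
# while cdf_ood stays low until large entropy values.
```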
Future directions:
  - More effective inference methods for our proposed hierarchical SPs
  - More expressive context latent variables using higher-order statistics
  - More exploration of uncertainty-aware decision-making problems
References
M. P. Deisenroth and C. E. Rasmussen, "PILCO: A model-based and data-efficient approach to policy search," in Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, pp. 465–472.
variational autoencoders," in Advances in Neural Information Processing Systems, 2018.
American Statistician, vol. 30, no. 4, pp. 188–189, 1976.
Asian Conference on Machine Learning, 2010, pp. 95–110.
A. Damianou and N. Lawrence, "Deep Gaussian processes," in Artificial Intelligence and Statistics, 2013, pp. 207–215.
Z. Dai, A. Damianou, J. González, and N. Lawrence, "Variational auto-encoded deep Gaussian processes," arXiv preprint arXiv:1511.06455, 2015.
D. P. Kingma and M. Welling, "Auto-encoding variational Bayes," arXiv preprint arXiv:1312.6114, 2013.
Y. Gal, R. McAllister, and C. E. Rasmussen, "Improving PILCO with Bayesian neural network dynamics models," in Data-Efficient Machine Learning Workshop, ICML, vol. 4, 2016, p. 34.