Edward: Deep Probabilistic Programming – Extended Seminar on Systems and Machine Learning


slide-1
SLIDE 1

Edward: Deep Probabilistic Programming

Extended Seminar – Systems and Machine Learning Steven Lang 13.02.2020

1

slide-2
SLIDE 2

Outline

Introduction
Refresher on Probabilistic Modeling
Deep Probabilistic Programming
Compositional Representations in Edward
Experiments
Alternatives
Conclusion

2

slide-3
SLIDE 3

Outline

Introduction
Refresher on Probabilistic Modeling
Deep Probabilistic Programming
Compositional Representations in Edward
Experiments
Alternatives
Conclusion

Introduction 3

slide-4
SLIDE 4

Motivation

◮ Nature of deep neural networks is compositional
◮ Connect layers in creative ways
◮ No worries about
  – testing (forward propagation)
  – inference (gradient-based opt., with backprop. and auto-diff.)
◮ Leads to easy development of new successful architectures

Introduction 4

slide-5
SLIDE 5

Motivation

Example CNN architectures (figures omitted):
◮ LeNet-5 (Lecun et al. 1998)
◮ VGG16 (Simonyan and Zisserman 2014)
◮ ResNet-50 (He et al. 2015)
◮ Inception-v4 (Szegedy et al. 2014)

Introduction 5

slide-6
SLIDE 6

Motivation

Goal: Achieve the composability of deep learning for

  • 1. Probabilistic models
  • 2. Probabilistic inference

Introduction 6

slide-7
SLIDE 7

Outline

Introduction
Refresher on Probabilistic Modeling
Deep Probabilistic Programming
Compositional Representations in Edward
Experiments
Alternatives
Conclusion

Refresher on Probabilistic Modeling 7

slide-8
SLIDE 8

What is a Random Variable (RV)?

◮ Random number determined by chance, e.g. the outcome of a single die roll
◮ Drawn according to a probability distribution
◮ Typical random variables in statistical machine learning:
  – input data
  – output data
  – noise
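As a tiny illustration (my own, not from the slides), sampling such an RV, here a fair die roll, might look like this in Python:

# Illustrative only: simulate one roll of a fair six-sided die.
import numpy as np

rng = np.random.default_rng(seed=0)
roll = rng.integers(low=1, high=7)  # uniform over {1, ..., 6}
print(roll)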

Refresher on Probabilistic Modeling 8

slide-9
SLIDE 9

What is a Probability Distribution?

◮ Discrete: describes the probability that the RV takes a certain value
◮ Continuous: describes the probability density at a certain value of the RV

[Figure: densities p(X) of three normal distributions with (µ = 0, σ² = 1), (µ = 2, σ² = 2), (µ = 4, σ² = 3)]

Refresher on Probabilistic Modeling 9

slide-10
SLIDE 10

What is a Probability Distribution?

◮ Discrete: describes the probability that the RV takes a certain value
◮ Continuous: describes the probability density at a certain value of the RV

Example: Normal distribution

N(x | µ, σ²) = 1/√(2πσ²) · exp(−(1/2) · ((x − µ)/σ)²)

[Figure: densities p(X) of three normal distributions with (µ = 0, σ² = 1), (µ = 2, σ² = 2), (µ = 4, σ² = 3)]
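A small sketch (my own illustration, directly evaluating the density formula above for the three parameter settings shown in the figure):

# Evaluate the normal density N(x | mu, sigma^2) from the formula above.
import numpy as np

def normal_pdf(x, mu, sigma2):
    return 1.0 / np.sqrt(2 * np.pi * sigma2) * np.exp(-0.5 * (x - mu) ** 2 / sigma2)

xs = np.linspace(-2.0, 10.0, 200)
for mu, sigma2 in [(0.0, 1.0), (2.0, 2.0), (4.0, 3.0)]:  # the three curves
    print(mu, sigma2, normal_pdf(xs, mu, sigma2).max())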

Refresher on Probabilistic Modeling 9

slide-11
SLIDE 11

Common Probability Distributions

Discrete
◮ Bernoulli
◮ Binomial
◮ Hypergeometric
◮ Poisson
◮ Boltzmann

Refresher on Probabilistic Modeling 10

slide-12
SLIDE 12

Common Probability Distributions

Discrete
◮ Bernoulli
◮ Binomial
◮ Hypergeometric
◮ Poisson
◮ Boltzmann

Continuous
◮ Uniform
◮ Beta
◮ Normal
◮ Laplace
◮ Student-t

Refresher on Probabilistic Modeling 10

slide-13
SLIDE 13

What is Inference?

◮ Answer the query P(Q | E)

  – Q: query, the set of RVs we are interested in
  – E: evidence, the set of RVs whose state we know

Refresher on Probabilistic Modeling 11

slide-14
SLIDE 14

What is Inference?

◮ Answer the query P(Q | E)

  – Q: query, the set of RVs we are interested in
  – E: evidence, the set of RVs whose state we know

◮ Example: What is the probability that
  – it has rained (Q)
  – given that we know the grass is wet (E)?

P(Has Rained = true | Grass = wet)
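As a worked sketch of such a query via Bayes' rule, P(Q | E) = P(E | Q) P(Q) / P(E); the numbers below are illustrative assumptions, not from the slides:

# Illustrative numbers only: prior P(rain) and likelihoods P(wet | rain), P(wet | no rain).
p_rain = 0.3
p_wet_given_rain = 0.9
p_wet_given_no_rain = 0.2

p_wet = p_wet_given_rain * p_rain + p_wet_given_no_rain * (1 - p_rain)
p_rain_given_wet = p_wet_given_rain * p_rain / p_wet  # Bayes' rule
print(p_rain_given_wet)  # ~0.66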

Refresher on Probabilistic Modeling 11

slide-15
SLIDE 15

Probabilistic Models

Examples (figures omitted):
◮ Bayesian Networks
◮ Markov Networks
◮ Variational Autoencoder
◮ Deep Belief Networks

Refresher on Probabilistic Modeling 12

slide-16
SLIDE 16

Outline

Introduction
Refresher on Probabilistic Modeling
Deep Probabilistic Programming
Compositional Representations in Edward
Experiments
Alternatives
Conclusion

Deep Probabilistic Programming 13

slide-17
SLIDE 17

Key Ideas

Probabilistic programming lets users
◮ specify probabilistic models as programs
◮ compile those models down into inference procedures

Deep Probabilistic Programming 14

slide-18
SLIDE 18

Key Ideas

Probabilistic programming lets users
◮ specify probabilistic models as programs
◮ compile those models down into inference procedures

Two compositional representations as first-class citizens
◮ Random variables
◮ Inference

Deep Probabilistic Programming 14

slide-19
SLIDE 19

Key Ideas

Probabilistic programming lets users
◮ specify probabilistic models as programs
◮ compile those models down into inference procedures

Two compositional representations as first-class citizens
◮ Random variables
◮ Inference

Goal

Make probabilistic programming as flexible and efficient as deep learning!

Deep Probabilistic Programming 14

slide-20
SLIDE 20

Typical PPL Trade-offs

Probabilistic programming languages typically have the following trade-off:

Deep Probabilistic Programming 15

slide-21
SLIDE 21

Typical PPL Trade-offs

Probabilistic programming languages typically have the following trade-off:
◮ Expressiveness
  – allows a rich class of models beyond graphical models
  – scales poorly w.r.t. data and model size

Deep Probabilistic Programming 15

slide-22
SLIDE 22

Typical PPL Trade-offs

Probabilistic programming languages typically have the following trade-off:
◮ Expressiveness
  – allows a rich class of models beyond graphical models
  – scales poorly w.r.t. data and model size

◮ Efficiency
  – the PPL is restricted to a specific class of models
  – inference algorithms are optimized for this specific class

Deep Probabilistic Programming 15

slide-23
SLIDE 23

Edward

Edward (Tran et al. 2017) builds on two compositional representations
◮ Random variables
◮ Inference

Deep Probabilistic Programming 16

slide-24
SLIDE 24

Edward

Edward (Tran et al. 2017) builds on two compositional representations
◮ Random variables
◮ Inference

Edward allows fitting the same model using a variety of composable inference methods
◮ Point estimation
◮ Variational inference
◮ Markov chain Monte Carlo

Deep Probabilistic Programming 16

slide-25
SLIDE 25

Edward

Key concept: no distinct model or inference block
◮ Model: a composition/collection of random variables
◮ Inference: a way of modifying parameters in that collection subject to another

Deep Probabilistic Programming 17

slide-26
SLIDE 26

Edward

Edward gets computational benefits from TensorFlow "for free":
◮ distributed training
◮ parallelism
◮ vectorization
◮ GPU support

Deep Probabilistic Programming 18

slide-27
SLIDE 27

Outline

Introduction
Refresher on Probabilistic Modeling
Deep Probabilistic Programming
Compositional Representations in Edward
Experiments
Alternatives
Conclusion

Compositional Representations in Edward 19

slide-28
SLIDE 28

Criteria for Probabilistic Models

Edward poses the following criteria on compositional representations for probabilistic models:

  • 1. Integration with computational graphs

  – nodes represent operations on data
  – edges represent data communicated between nodes

Compositional Representations in Edward 20

slide-29
SLIDE 29

Criteria for Probabilistic Models

Edward poses the following criteria on compositional representations for probabilistic models:

  • 1. Integration with computational graphs

  – nodes represent operations on data
  – edges represent data communicated between nodes

  • 2. Invariance of the representation under the graph

– graph can be reused during inference

Compositional Representations in Edward 20

slide-30
SLIDE 30

Graph Example

Computational Graph

[Figure: computational graph with variables x, y, constants z and 2, and operations +, ·, pow]

Evaluation

Compositional Representations in Edward 21

slide-31
SLIDE 31

Graph Example

Computational Graph

[Figure: computational graph with variables x, y, constants z and 2, and operations +, ·, pow]

Evaluation

  • 1. x + y

Compositional Representations in Edward 21

slide-32
SLIDE 32

Graph Example

Computational Graph

[Figure: computational graph with variables x, y, constants z and 2, and operations +, ·, pow]

Evaluation

  • 1. x + y
  • 2. (x + y) · y · z

Compositional Representations in Edward 21

slide-33
SLIDE 33

Graph Example

Computational Graph

[Figure: computational graph with variables x, y, constants z and 2, and operations +, ·, pow]

Evaluation

  • 1. x + y
  • 2. (x + y) · y · z
  • 3. 2^((x + y) · y · z)
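A minimal sketch of building and evaluating such a graph in TensorFlow 1.x graph mode (the backend Edward uses); the exact wiring and the value of the constant z are reconstructed from the evaluation steps above, so treat them as assumptions:

import tensorflow as tf  # TF 1.x graph-mode API

x = tf.placeholder(tf.float32)  # variable
y = tf.placeholder(tf.float32)  # variable
z = tf.constant(3.0)            # constant

s = x + y                       # step 1: x + y
p = s * y * z                   # step 2: (x + y) * y * z
out = tf.pow(2.0, p)            # step 3: 2^((x + y) * y * z)

with tf.Session() as sess:
    print(sess.run(out, feed_dict={x: 1.0, y: 2.0}))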

Compositional Representations in Edward 21

slide-34
SLIDE 34

Example: Beta-Bernoulli Program

Beta-Bernoulli Model

p(x, θ) = Beta(θ | 1, 1) · ∏_{n=1}^{50} Bernoulli(x_n | θ)

Compositional Representations in Edward 22

slide-35
SLIDE 35

Example: Beta-Bernoulli Program

Beta-Bernoulli Model

p(x, θ) = Beta(θ | 1, 1) · ∏_{n=1}^{50} Bernoulli(x_n | θ)

Computation Graph

[Figure: graph with node θ and its sample θ*, and node x, fed by tf.ones(50) and θ*, with its sample x*]

Compositional Representations in Edward 22

slide-36
SLIDE 36

Example: Beta-Bernoulli Program

Beta-Bernoulli Model

p(x, θ) = Beta(θ | 1, 1) · ∏_{n=1}^{50} Bernoulli(x_n | θ)

Computation Graph

[Figure: graph with node θ and its sample θ*, and node x, fed by tf.ones(50) and θ*, with its sample x*]

Edward code

theta = Beta(a=1.0, b=1.0)            # Sample from Beta dist.
x = Bernoulli(p=tf.ones(50) * theta)  # Sample from Bernoulli dist.

Compositional Representations in Edward 22
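A hedged, self-contained completion of the snippet above (the imports and the session call are my assumptions based on the Edward 1.x / TensorFlow 1.x API; the keyword names a, b and p follow the early Edward releases used on the slide):

import tensorflow as tf
from edward.models import Beta, Bernoulli

theta = Beta(a=1.0, b=1.0)            # prior over the coin bias
x = Bernoulli(p=tf.ones(50) * theta)  # 50 coin flips sharing that bias

with tf.Session() as sess:
    theta_draw, x_draw = sess.run([theta, x])  # one joint draw of (theta, x)
    print(theta_draw, x_draw)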

slide-37
SLIDE 37

Criteria for Probabilistic Inference

Edward poses the following criteria on compositional representations for probabilistic inference:

  • 1. Support for many classes of inference

Compositional Representations in Edward 23

slide-38
SLIDE 38

Criteria for Probabilistic Inference

Edward poses the following criteria on compositional representations for probabilistic inference:

  • 1. Support for many classes of inference
  • 2. Invariance of inference under the computational graph

– posterior can be further composed as part of another model

Compositional Representations in Edward 23

slide-39
SLIDE 39

Inference in Edward

Goal: calculate the posterior p(z, β | x_train; θ), given
◮ data x_train
◮ model parameters θ
◮ local variables z
◮ global variables β

Compositional Representations in Edward 24

slide-40
SLIDE 40

Inference as Stochastic Graph Optimization

Edward formalizes this as the optimization problem

min_{λ,θ} L(p(z, β | x_train; θ), q(z, β; λ))

where
◮ L is a loss function w.r.t. p and q
◮ q(z, β; λ) is an approximation of the posterior p(z, β | x_train; θ)

Note

Choice of approximation q, loss L and rules to update parameters {θ, λ} are specified by an inference algorithm.

Compositional Representations in Edward 25

slide-41
SLIDE 41

Inference in Edward

◮ ed.Inference defines and solves min_{λ,θ} L(p(z, β | x_train; θ), q(z, β; λ))

# Construct inference object
inference = ed.Inference(latent_vars={beta: qbeta, z: qz},
                         data={x: x_train})

  – Posterior variables: qbeta, qz; observed random variables: x_train

Compositional Representations in Edward 26

slide-42
SLIDE 42

Inference in Edward

◮ ed.Inference defines and solves min_{λ,θ} L(p(z, β | x_train; θ), q(z, β; λ))

# Construct inference object
inference = ed.Inference(latent_vars={beta: qbeta, z: qz},
                         data={x: x_train})

  – Posterior variables: qbeta, qz; observed random variables: x_train

◮ Build a computational graph to update parameters

inference.initialize()

Compositional Representations in Edward 26

slide-43
SLIDE 43

Inference in Edward

◮ ed.Inference defines and solves min_{λ,θ} L(p(z, β | x_train; θ), q(z, β; λ))

# Construct inference object
inference = ed.Inference(latent_vars={beta: qbeta, z: qz},
                         data={x: x_train})

  – Posterior variables: qbeta, qz; observed random variables: x_train

◮ Build a computational graph to update parameters

inference.initialize()

◮ Run computations to update parameters

while not_converged:
    inference.update()

Compositional Representations in Edward 26
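ed.Inference itself is the abstract interface; below is a hedged end-to-end sketch with one concrete subclass, ed.KLqp (variational inference). The toy model, data, and hyperparameters are assumptions for illustration, not from the slides:

import numpy as np
import tensorflow as tf
import edward as ed
from edward.models import Normal

x_train = np.random.randn(100).astype(np.float32)  # toy data

# Model: unknown mean beta with a Normal prior, Normal likelihood.
beta = Normal(loc=0.0, scale=1.0)
x = Normal(loc=tf.ones(100) * beta, scale=1.0)

# Variational approximation q(beta; lambda).
qbeta = Normal(loc=tf.Variable(0.0),
               scale=tf.nn.softplus(tf.Variable(0.0)))

inference = ed.KLqp(latent_vars={beta: qbeta}, data={x: x_train})
inference.run(n_iter=500)  # wraps initialize() + update() loop + finalize()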

slide-44
SLIDE 44

Classes of Inference

Edward supports the following classes of inference:
◮ Variational Inference
◮ Monte Carlo
◮ Generative Adversarial Networks (GANs)

Compositional Representations in Edward 27

slide-45
SLIDE 45

Composing Inferences

Inference as a collection of separate inference programs, e.g. Variational EM:

qbeta = PointMass(...)   # Global variables
qz = Categorical(...)    # Local variables

Compositional Representations in Edward 28

slide-46
SLIDE 46

Composing Inferences

Inference as a collection of separate inference programs, e.g. Variational EM:

qbeta = PointMass(...)   # Global variables
qz = Categorical(...)    # Local variables

# E-Step over local variables
inf_e = ed.VariationalInference(latent_vars={z: qz},
                                data={x: x_train, beta: qbeta})

# M-Step over global variables
inf_m = ed.MAP(latent_vars={beta: qbeta},
               data={x: x_train, z: qz})

Compositional Representations in Edward 28

slide-47
SLIDE 47

Composing Inferences

Inference as a collection of separate inference programs, e.g. Variational EM:

qbeta = PointMass(...)   # Global variables
qz = Categorical(...)    # Local variables

# E-Step over local variables
inf_e = ed.VariationalInference(latent_vars={z: qz},
                                data={x: x_train, beta: qbeta})

# M-Step over global variables
inf_m = ed.MAP(latent_vars={beta: qbeta},
               data={x: x_train, z: qz})

# Expectation-Maximization loop
while not_converged:
    inf_e.update()
    inf_m.update()

Compositional Representations in Edward 28

slide-48
SLIDE 48

Outline

Introduction
Refresher on Probabilistic Modeling
Deep Probabilistic Programming
Compositional Representations in Edward
Experiments
Alternatives
Conclusion

Experiments 29

slide-49
SLIDE 49

Benchmarks

Logistic regression using Hamiltonian Monte Carlo

Probabilistic programming system            Runtime (s)
Handwritten NumPy (1 CPU)                   534
Stan (1 CPU) (Carpenter et al. 2017)        171
PyMC3 (12 CPU) (Salvatier et al. 2015)      30.0
Edward (12 CPU)                             8.2
Handwritten TensorFlow (GPU)                5.0
Edward (GPU)                                4.9

◮ 35x speedup over Stan (1 CPU)
◮ 6x speedup over PyMC3 (12 CPU)

(CPU: 12-core Intel i7-5930K at 3.50 GHz; GPU: NVIDIA Titan X, Maxwell)

Experiments 30
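For context, a hedged sketch of what a Bayesian logistic regression with HMC might look like in Edward 1.x; the data shapes, step size, and number of samples are illustrative assumptions, not the benchmark's actual configuration:

import numpy as np
import tensorflow as tf
import edward as ed
from edward.models import Normal, Bernoulli, Empirical

N, D, T = 500, 10, 1000  # data points, features, HMC samples (assumed)
X_train = np.random.randn(N, D).astype(np.float32)
y_train = np.random.randint(0, 2, size=N).astype(np.int32)

X = tf.placeholder(tf.float32, [N, D])
w = Normal(loc=tf.zeros(D), scale=tf.ones(D))   # prior on the weights
y = Bernoulli(logits=ed.dot(X, w))              # logistic likelihood

qw = Empirical(params=tf.Variable(tf.zeros([T, D])))  # holds the HMC samples

inference = ed.HMC({w: qw}, data={X: X_train, y: y_train})
inference.run(step_size=0.01, n_steps=5)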

slide-50
SLIDE 50

Outline

Introduction
Refresher on Probabilistic Modeling
Deep Probabilistic Programming
Compositional Representations in Edward
Experiments
Alternatives
Conclusion

Alternatives 31

slide-51
SLIDE 51

Edward Successor: TensorFlow Probability (Dillon et al. 2017)

Integration into TensorFlow itself: 4-Layer architecture

  • 1. TensorFlow – Numerical operations
  • 2. Statistical Building Blocks – Distributions
  • 3. Model Building – Joint distributions, Probabilistic layers
  • 4. Probabilistic Inference – Markov Chain Monte Carlo, Variational inference, Optimizers
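For comparison with the Edward program from earlier, a minimal sketch of the Beta-Bernoulli model using TensorFlow Probability's distributions layer (assuming the tfp.distributions API and TF 2.x eager execution; not from the slides):

import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

theta = tfd.Beta(concentration1=1.0, concentration0=1.0).sample()  # prior draw
x = tfd.Bernoulli(probs=tf.ones(50) * theta).sample()              # 50 coin flips
print(theta.numpy(), x.numpy())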

Alternatives 32

slide-52
SLIDE 52

Pyro: PyTorch Probabilistic Programming (Bingham et al. 2018)

◮ PyTorch as backend
◮ Unifies modern deep learning and Bayesian modeling
◮ Focus on Stochastic Variational Inference
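The same Beta-Bernoulli model as a hedged Pyro sketch (assuming Pyro's pyro.sample / pyro.plate API; not from the slides):

import torch
import pyro
import pyro.distributions as dist

def model(data):
    theta = pyro.sample("theta", dist.Beta(1.0, 1.0))      # latent coin bias
    with pyro.plate("data", len(data)):
        pyro.sample("x", dist.Bernoulli(theta), obs=data)   # observed flips

model(torch.ones(50))  # run the generative program on dummy observations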

Alternatives 33

slide-53
SLIDE 53

Outline

Introduction
Refresher on Probabilistic Modeling
Deep Probabilistic Programming
Compositional Representations in Edward
Experiments
Alternatives
Conclusion

Conclusion 34

slide-54
SLIDE 54

Conclusion

Edward ...
◮ is a novel deep probabilistic programming language
◮ provides compositional representations for models and inference
◮ leverages computational graphs for fast, parallelizable computation

Conclusion 35

slide-55
SLIDE 55

References I

Bingham, Eli et al. (2018). “Pyro: Deep Universal Probabilistic Programming”. In: Journal of Machine Learning Research.
Carpenter, Bob et al. (2017). “Stan: A Probabilistic Programming Language”. In: Journal of Statistical Software, Articles 76.1, pp. 1–32. ISSN: 1548-7660. DOI: 10.18637/jss.v076.i01. URL: https://www.jstatsoft.org/v076/i01.
Dillon, Joshua V. et al. (2017). TensorFlow Distributions. arXiv: 1711.10604 [cs.LG].
He, Kaiming et al. (2015). “Deep Residual Learning for Image Recognition”. In: CoRR abs/1512.03385. arXiv: 1512.03385. URL: http://arxiv.org/abs/1512.03385.
Lecun, Yann et al. (1998). “Gradient-based learning applied to document recognition”. In: Proceedings of the IEEE, pp. 2278–2324.
Salvatier, John et al. (2015). Probabilistic Programming in Python using PyMC. arXiv: 1507.08050 [stat.CO].

References 36

slide-56
SLIDE 56

References II

Simonyan, Karen and Andrew Zisserman (2014). “Very Deep Convolutional Networks for Large-Scale Image Recognition”. In: CoRR abs/1409.1556. URL: http://arxiv.org/abs/1409.1556.
Szegedy, Christian et al. (2014). “Going Deeper with Convolutions”. In: CoRR abs/1409.4842. arXiv: 1409.4842. URL: http://arxiv.org/abs/1409.4842.
Tran, Dustin et al. (2017). Deep Probabilistic Programming. arXiv: 1701.03757 [stat.ML].

References 37

slide-57
SLIDE 57

Figure Sources

◮ CNNs: https://towardsdatascience.com/illustrated-10-cnn-architectures-95d78ace614d
◮ Bayesian Networks: K. Kersting, Probabilistic Graphical Models Lecture (2.), 2018
◮ Markov Models: https://en.wikipedia.org/wiki/File:A_simple_Markov_network.png
◮ Variational Autoencoder: https://lilianweng.github.io/lil-log/2018/08/12/from-autoencoder-to-beta-vae.html
◮ Deep Belief Networks: https://medium.com/analytics-army/deep-belief-networks-an-introduction-1d52bb867a25

References 38

slide-58
SLIDE 58

Example: Variational Auto-Encoder

N = 50  # mini-batch size

# Probabilistic model
z = Normal(loc=tf.zeros([N, 10]), scale=tf.ones([N, 10]))
h = Dense(256, activation="relu")(z)
x = Bernoulli(logits=Dense(28 * 28)(h))

# Variational model
qx = tf.placeholder(tf.float32, [N, 28 * 28])
qh = Dense(256, activation="relu")(qx)
qz = Normal(loc=Dense(10, activation=None)(qh),
            scale=Dense(10, activation="softplus")(qh))
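A hedged follow-up to the snippet above showing how training might be wired up: it assumes Edward's ed.KLqp accepts a data dictionary that binds both the observed variable x and the placeholder qx to the same mini-batch, and x_batch is a hypothetical [N, 784] array of binarized pixels.

# Assumed training wiring (Edward 1.x style); x_batch is hypothetical.
inference = ed.KLqp(latent_vars={z: qz}, data={x: x_batch, qx: x_batch})
inference.run(n_iter=1000)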

[Graphical model: plate over n = 1, ..., N with latent z_n and observed x_n; model parameters θ and variational parameters φ outside the plate]

Appendix 39