

SLIDE 1

Probability Functional Descent:

A Unifying Perspective on GANs, VI, and RL

Casey Chu <caseychu@stanford.edu> Jose Blanchet Peter Glynn

SLIDE 2
SLIDE 3

Deep generative models

SLIDE 4

Deep generative models · Variational inference

SLIDE 5

Deep generative models · Variational inference · Deep reinforcement learning

SLIDE 6

Probability functional

J : P(X) → ℝ

SLIDE 7

Probability functional

J : P(X) → ℝ

“gradient” ∇J

SLIDE 8

Probability functional

J : P(X) → ℝ

“gradient” ∇J

von Mises influence function

Ψ : X → ℝ
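One standard way to make the "gradient" precise is the first-order von Mises expansion of J at μ; the notation below is my paraphrase of that standard formulation, not text copied from the slides:

```latex
% The influence function \Psi_\mu of J at \mu is the first variation of J:
% perturbing \mu toward another distribution \nu changes J, to first order,
% only through the expectation of \Psi_\mu.
\[
  \left.\frac{d}{d\varepsilon}\, J\bigl(\mu + \varepsilon(\nu - \mu)\bigr)\right|_{\varepsilon = 0^{+}}
  \;=\; \mathbb{E}_{x \sim \nu}\!\left[\Psi_\mu(x)\right]
  \;-\; \mathbb{E}_{x \sim \mu}\!\left[\Psi_\mu(x)\right]
\]
% Hence choosing \mu' with a smaller expectation of \Psi_\mu decreases J to first order.
```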

SLIDE 9

Gradient descent on f : ℝⁿ → ℝ

0. Initialize x ∈ ℝⁿ arbitrarily.
1. Compute the gradient g = ∇f(x).
2. Choose x′ such that x′ · g < x · g (usually, we set x′ = x − αg).
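In Python/NumPy, a minimal sketch of this loop (the quadratic objective f(x) = ‖x‖²/2 and the step size α are arbitrary illustrative choices):

```python
import numpy as np

def grad_f(x):
    # Gradient of the illustrative objective f(x) = ||x||^2 / 2.
    return x

alpha = 0.1                    # step size
x = np.random.randn(3)         # 0. initialize x arbitrarily
for _ in range(100):
    g = grad_f(x)              # 1. compute the gradient g = grad f(x)
    x = x - alpha * g          # 2. choose x' with x'·g < x·g (here x' = x - alpha*g)
```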

SLIDE 10

Gradient descent on f : ℝⁿ → ℝ

0. Initialize x ∈ ℝⁿ arbitrarily.
1. Compute the gradient g = ∇f(x).
2. Choose x′ such that x′ · g < x · g (usually, we set x′ = x − αg).

Probability functional descent on J : P(X) → ℝ

0. Initialize a distribution μ ∈ P(X) arbitrarily.
1. Compute the influence function Ψ of J at μ.
2. Choose μ′ such that 𝔼x ~ μ′[Ψ(x)] < 𝔼x ~ μ[Ψ(x)].
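The same loop for a probability functional, as a minimal PyTorch sketch. Here μ is parametrized (one common choice, assumed for illustration) as the pushforward of Gaussian noise through a generator network, and `estimate_influence_fn` is a hypothetical placeholder for the application-specific estimator of Ψ (a discriminator for GANs, log q/p for VI, an advantage estimate for RL):

```python
import torch
from torch import nn

# mu_theta is the pushforward of N(0, I) noise through a generator network:
# sampling x ~ mu_theta means x = gen(z) with z ~ N(0, I).
gen = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(gen.parameters(), lr=1e-3)

def estimate_influence_fn(gen):
    # Hypothetical placeholder: a real instantiation would fit an estimator of the
    # influence function Psi of J at the current mu_theta (e.g. by training a
    # discriminator). Here a fixed quadratic stands in so the loop runs end to end.
    return lambda x: (x ** 2).sum(dim=1)

for step in range(1000):
    psi = estimate_influence_fn(gen)   # 1. approximate the influence function Psi at mu
    z = torch.randn(128, 8)
    x = gen(z)                         # samples x ~ mu_theta
    loss = psi(x).mean()               # Monte Carlo estimate of E_{x ~ mu_theta}[Psi(x)]
    opt.zero_grad()
    loss.backward()                    # 2. move theta so that E[Psi] decreases, Psi held fixed
    opt.step()
```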

SLIDE 11

Generative modeling

JG(μ) = D(μ || ν0)

where D is e.g. the Jensen–Shannon divergence or the Wasserstein distance.

1. Optimize the discriminator, which approximates the influence function of JG.
2. Update the generator μ (see the code sketch below).

PFD recovers:

  • Minimax GAN
  • Non-saturating GAN
  • Wasserstein GAN

Probability functional descent

1. Compute the influence function Ψ of J at μ.
2. Choose μ′ such that 𝔼x ~ μ′[Ψ(x)] < 𝔼x ~ μ[Ψ(x)].
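As a concrete instance, the non-saturating GAN version of these two steps might look like the following PyTorch sketch (the networks, learning rates, and the toy target distribution in `sample_data` are illustrative assumptions, not details from the paper):

```python
import torch
from torch import nn

gen = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 2))    # generator: samples from mu
disc = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))   # discriminator (logits)
opt_g = torch.optim.Adam(gen.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def sample_data(n):
    # Hypothetical target distribution nu_0 (a shifted Gaussian), for illustration only.
    return torch.randn(n, 2) + 2.0

for step in range(1000):
    # 1. Optimize the discriminator; per the slide, it approximates the influence
    #    function of JG(mu) = D(mu || nu_0) at the current generator distribution.
    real = sample_data(128)
    fake = gen(torch.randn(128, 8)).detach()
    d_loss = bce(disc(real), torch.ones(128, 1)) + bce(disc(fake), torch.zeros(128, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2. Update the generator mu to lower the (approximate) E_{x ~ mu}[Psi(x)],
    #    here via the usual non-saturating generator loss.
    fake = gen(torch.randn(128, 8))
    g_loss = bce(disc(fake), torch.ones(128, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```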

SLIDE 12

Variational inference

JVI(q) = KL(q(θ) || p(θ|x))

1. Compute log(q(θ)/p(x, θ)), the influence function for JVI (the negative of the ELBO integrand).
2. Update the approximate posterior q (see the code sketch below).

PFD recovers:

  • Black-box variational inference
  • Adversarial variational Bayes
  • Approximate posterior distillation

Probability functional descent

1. Compute the influence function Ψ of J at μ.
2. Choose μ′ such that 𝔼x ~ μ′[Ψ(x)] < 𝔼x ~ μ[Ψ(x)].
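A minimal sketch of the black-box VI instance with the reparameterization trick (the toy model in `log_joint`, the Gaussian form of q, and all hyperparameters are illustrative assumptions):

```python
import torch
import torch.nn.functional as F
from torch.distributions import Normal

def log_joint(theta):
    # Hypothetical model, for illustration only: unit-Gaussian prior on a scalar theta
    # and one Gaussian observation x = 3.0 with unit noise, so p(x, theta) = p(theta) p(x | theta).
    log_prior = Normal(0.0, 1.0).log_prob(theta)
    log_lik = Normal(theta, 1.0).log_prob(torch.tensor(3.0))
    return log_prior + log_lik

# Approximate posterior q(theta) = N(mu, softplus(rho)^2) over the scalar theta.
mu = torch.zeros(1, requires_grad=True)
rho = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([mu, rho], lr=1e-2)

for step in range(2000):
    q = Normal(mu, F.softplus(rho))
    theta = q.rsample((64,))       # reparameterized samples theta ~ q
    # 1. Influence function (up to an additive constant): Psi(theta) = log q(theta) - log p(x, theta).
    psi = q.log_prob(theta) - log_joint(theta)
    # 2. Update q to decrease E_{theta ~ q}[Psi(theta)], i.e. descend the KL / ascend the ELBO.
    loss = psi.mean()
    opt.zero_grad(); loss.backward(); opt.step()
```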

SLIDE 13

Reinforcement learning

JRL(π) = 𝔼π[∑t γᵗ Rₜ]

1. Approximate the advantage Qπ(s, a) − Vπ(s), the influence function for JRL.
2. Update the policy π (see the code sketch below).

PFD recovers:

  • Policy gradient
  • Actor-critic
  • Dual actor-critic

Probability functional descent

1. Compute the influence function Ψ of J at μ.
2. Choose μ′ such that 𝔼x ~ μ′[Ψ(x)] < 𝔼x ~ μ[Ψ(x)].
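A minimal one-step actor-critic sketch in this spirit (the toy bandit-style environment `env_step`, the networks, and the hyperparameters are illustrative assumptions; with a single step the observed reward r plays the role of Qπ(s, a)):

```python
import torch
from torch import nn
from torch.distributions import Categorical

policy = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))   # logits over 2 actions
value = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))    # baseline V(s)
opt_pi = torch.optim.Adam(policy.parameters(), lr=1e-3)
opt_v = torch.optim.Adam(value.parameters(), lr=1e-3)

def env_step(s, a):
    # Hypothetical one-step environment: reward 1 if the action matches sign(s[:, 0]).
    return (a == (s[:, 0] > 0).long()).float()

for step in range(2000):
    s = torch.randn(128, 4)
    dist = Categorical(logits=policy(s))
    a = dist.sample()
    r = env_step(s, a)

    # 1. Approximate the advantage Q(s, a) - V(s); in this one-step setting Q(s, a) = r
    #    and V is the learned critic, which supplies the influence-function estimate.
    v = value(s).squeeze(-1)
    adv = r - v

    # Critic regression: fit V(s) to the observed return.
    v_loss = adv.pow(2).mean()
    opt_v.zero_grad(); v_loss.backward(); opt_v.step()

    # 2. Update the policy pi toward higher advantage (a policy gradient step).
    pi_loss = -(dist.log_prob(a) * adv.detach()).mean()
    opt_pi.zero_grad(); pi_loss.backward(); opt_pi.step()
```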

SLIDE 14

Probability functional descent is a unifying perspective that enables the easy development of new algorithms.

SLIDE 15

Probability functional descent is a unifying perspective that enables the easy development of new algorithms.

  • https://www.freecodecamp.org/news/an-intuitive-introduction-to-generative-adversarial-networks-gans-7a2264a81394/
  • https://arxiv.org/abs/1710.10196
  • https://www.analyticsvidhya.com/blog/2016/06/bayesian-statistics-beginners-simple-english/
  • https://stats.stackexchange.com/questions/246117/applying-stochastic-variational-inference-to-bayesian-mixture-of-gaussian
  • http://people.csail.mit.edu/hongzi/content/publications/DeepRM-HotNets16.pdf
  • https://towardsdatascience.com/atari-reinforcement-learning-in-depth-part-1-ddqn-ceaa762a546f

SLIDE 16

Probability Functional Descent:

A Unifying Perspective on GANs, VI, and RL

Casey Chu <caseychu@stanford.edu> Jose Blanchet Peter Glynn