Accelerated Flow for Probability Distributions

Accelerated Flow for Probability Distributions Thirty-sixth International Conference on Machine Learning, Long Beach, 2019 Amirhossein Taghvaei Joint work with P. G. Mehta Coordinated Science Laboratory University of Illinois at


SLIDE 1

Accelerated Flow for Probability Distributions

Thirty-sixth International Conference on Machine Learning, Long Beach, 2019
Amirhossein Taghvaei
Joint work with P. G. Mehta
Coordinated Science Laboratory, University of Illinois at Urbana-Champaign
June 13, 2019

SLIDE 2

Objective and main idea

Euclidean space | Space of probability distributions
Gradient descent | Wasserstein gradient flow
Accelerated methods | ?

Objective: Construct accelerated flows for probability distributions.

Approach: Wibisono et al. (2017) proposed a variational formulation to construct accelerated flows on Euclidean space. Our approach is to extend the variational formulation to probability distributions.

Accelerated gradient flow for prob. dist. Amirhossein Taghvaei 1 / 6 Amirhossein

SLIDE 5

Variational formulation for Euclidean space

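Setting | Vector variables, $\mathbb{R}^d$ | Probability distributions, $\mathcal{P}_2(\mathbb{R}^d)$
Objective function | $f(x)$ | ?
Gradient flow | $\dot{x}_t = -\nabla f(x_t)$ | ?
Lagrangian | $t^3\left(\tfrac{1}{2}\lvert\dot{x}_t\rvert^2 - f(x_t)\right)$ | ?
Accelerated flow | $\ddot{x}_t = -\tfrac{3}{t}\dot{x}_t - \nabla f(x_t)$ | ?

The accelerated flow is obtained by minimizing the action integral of the Lagrangian.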
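The step from the Lagrangian to the accelerated flow can be checked with the Euler-Lagrange equation. The following is a sketch of that calculation, assuming $f$ is differentiable and $t > 0$; the symbol $L$ for the Lagrangian is introduced here for illustration only.

```latex
\begin{align*}
L(t, x, \dot{x}) &= t^3\left(\tfrac{1}{2}\lvert\dot{x}\rvert^2 - f(x)\right), \\
\frac{\partial L}{\partial \dot{x}} &= t^3 \dot{x}, \qquad
\frac{\partial L}{\partial x} = -t^3 \nabla f(x), \\
\frac{d}{dt}\frac{\partial L}{\partial \dot{x}} - \frac{\partial L}{\partial x} = 0
&\;\Longrightarrow\; t^3 \ddot{x}_t + 3t^2 \dot{x}_t + t^3 \nabla f(x_t) = 0, \\
&\;\Longrightarrow\; \ddot{x}_t = -\tfrac{3}{t}\dot{x}_t - \nabla f(x_t).
\end{align*}
```

The damping coefficient $3/t$ comes from differentiating the time weight $t^3$ and then dividing the whole equation by $t^3$.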

SLIDE 6

Wasserstein gradient flow

Setting | Vector variables, $\mathbb{R}^d$ | Probability distributions, $\mathcal{P}_2(\mathbb{R}^d)$
Objective function | $f(x)$ | $F(\rho) = D(\rho \,\|\, \rho_\infty)$
Gradient flow | $\dot{x}_t = -\nabla f(x_t)$ | $dX_t = -\nabla f(X_t)\,dt + \sqrt{2}\,dB_t$
Lagrangian | $t^3\left(\tfrac{1}{2}\lvert u_t\rvert^2 - f(x_t)\right)$, where $u_t = \dot{x}_t$ | ?
Accelerated flow | $\ddot{x}_t = -\tfrac{3}{t}\dot{x}_t - \nabla f(x_t)$ | ?

The Wasserstein gradient flow with respect to relative entropy is the Fokker-Planck equation (Jordan et al. 1998).
The Fokker-Planck equation is realized with the Langevin SDE.
The goal is to obtain accelerated forms of the SDE.
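To make the Langevin SDE concrete, here is a minimal sketch of an Euler-Maruyama discretization. It is not taken from the slides: the potential $f(x) = \lvert x\rvert^2/2$ (so that the target $\rho_\infty \propto e^{-f}$ is a standard Gaussian), the step size, and the names `grad_f` and `langevin_particles` are all assumptions chosen for illustration.

```python
import numpy as np

def grad_f(x):
    """Gradient of the assumed potential f(x) = |x|^2 / 2 (target N(0, I))."""
    return x

def langevin_particles(x0, dt, n_steps, rng):
    """Euler-Maruyama steps for dX_t = -grad f(X_t) dt + sqrt(2) dB_t.

    x0 has shape (n_particles, dim); each row is an independent particle.
    """
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x - dt * grad_f(x) + np.sqrt(2.0 * dt) * noise
    return x

rng = np.random.default_rng(0)
# Start far from the target: particles concentrated near x = 3.
x0 = 3.0 + 0.1 * rng.standard_normal((2000, 1))
x_final = langevin_particles(x0, dt=0.01, n_steps=1000, rng=rng)
print("mean:", x_final.mean(), "variance:", x_final.var())
```

After enough steps the empirical mean and variance should be close to those of the assumed target, up to sampling noise and a small discretization bias.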

SLIDE 9

Summary

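Setting | Vector variables, $\mathbb{R}^d$ | Probability distributions, $\mathcal{P}_2(\mathbb{R}^d)$
Objective function | $f(x)$ | $F(\rho) = D(\rho \,\|\, \rho_\infty)$
Gradient flow | $\dot{x}_t = -\nabla f(x_t)$ | $dX_t = -\nabla f(X_t)\,dt + \sqrt{2}\,dB_t$
Lagrangian | $t^3\left(\tfrac{1}{2}\lvert\dot{x}_t\rvert^2 - f(x_t)\right)$ | $E\left[t^3\left(\tfrac{1}{2}\lvert\dot{X}_t\rvert^2 - f(X_t) - \log\rho_t(X_t)\right)\right]$
Accelerated flow | $\ddot{x}_t = -\tfrac{3}{t}\dot{x}_t - \nabla f(x_t)$ | $\ddot{X}_t = -\tfrac{3}{t}\dot{X}_t - \nabla f(X_t) - \nabla\log\rho_t(X_t)$

The accelerated flow involves a mean-field term $\nabla\log\rho_t(X_t)$, which depends on the distribution of $X_t$.
The numerical algorithm involves a system of interacting particles.
The mean-field term is approximated in terms of the particles.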
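The interacting-particle idea can be sketched as follows. This is an illustrative simplification, not the algorithm from the talk: it assumes the potential $f(x) = \lvert x\rvert^2/2$, approximates $\rho_t$ by a Gaussian fitted to the particles (so that $\nabla\log\rho_t(x) \approx -\Sigma^{-1}(x - m)$ with empirical mean $m$ and covariance $\Sigma$), and uses a simple semi-implicit Euler step. The function names, the start time `t0`, and the step size are hypothetical.

```python
import numpy as np

def grad_f(x):
    """Gradient of the assumed potential f(x) = |x|^2 / 2 (target N(0, I))."""
    return x

def grad_log_rho_gaussian(x):
    """Gaussian approximation of the mean-field term grad log rho_t.

    Fits N(mean, cov) to the particles (rows of x) and returns
    -cov^{-1} (x_i - mean) for every particle i.
    """
    mean = x.mean(axis=0)
    centered = x - mean
    cov = centered.T @ centered / x.shape[0]
    cov += 1e-8 * np.eye(x.shape[1])  # small jitter for numerical safety
    return -np.linalg.solve(cov, centered.T).T

def accelerated_particles(x0, t0, t_end, dt):
    """Integrate X'' = -(3/t) X' - grad f(X) - grad log rho_t(X).

    Starts at t0 > 0 to avoid the singular damping term at t = 0.
    """
    x = np.array(x0, dtype=float)
    v = np.zeros_like(x)  # particles start at rest
    t = t0
    while t < t_end:
        accel = -(3.0 / t) * v - grad_f(x) - grad_log_rho_gaussian(x)
        v = v + dt * accel  # update velocity first (semi-implicit Euler)
        x = x + dt * v      # then move particles with the new velocity
        t += dt
    return x

rng = np.random.default_rng(0)
x0 = 2.0 + 0.5 * rng.standard_normal((500, 1))
x_end = accelerated_particles(x0, t0=1.0, t_end=60.0, dt=0.01)
print("mean:", x_end.mean(), "variance:", x_end.var())
```

Every particle's acceleration depends on the empirical mean and covariance of all particles, which is what makes the system interacting. The Gaussian fit is only adequate when $\rho_t$ is close to Gaussian; a multimodal target would need a different approximation of the mean-field term.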

SLIDE 13

Numerical example

Gaussian: The target distribution is Gaussian.

[Figure: log-log plot of KL(ρ_t | ρ_∞) versus t, shown with an O(1/t^2) reference rate.]
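For a Gaussian target, the KL divergence can be tracked in closed form once the particle distribution is summarized by its mean and covariance. The sketch below assumes a standard Gaussian target N(0, I) and a Gaussian fit to the particles; the slides do not say how the KL values were computed, and the function name is hypothetical.

```python
import numpy as np

def kl_gaussian_to_standard(mean, cov):
    """KL( N(mean, cov) || N(0, I) ) = 0.5 * (tr(cov) + |mean|^2 - d - log det(cov))."""
    mean = np.atleast_1d(np.asarray(mean, dtype=float))
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    d = mean.shape[0]
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("cov must be positive definite")
    return 0.5 * (np.trace(cov) + mean @ mean - d - logdet)

# A unit shift of the mean in one dimension contributes 0.5 * |mean|^2.
kl_shifted = kl_gaussian_to_standard([1.0], [[1.0]])
# A distribution equal to the target has zero divergence.
kl_same = kl_gaussian_to_standard(np.zeros(2), np.eye(2))
print(kl_shifted, kl_same)
```

Evaluating this on the empirical mean and covariance of the particles at each time step gives a curve that can be compared against a 1/t^2 reference on log-log axes.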

SLIDE 14

Numerical example

Non-Gaussian: The target distribution is a mixture of two Gaussians.

[Figure: panels at t = t0, t = t1, and t = t2, with t0, t1, t2 marked on a log-log plot of KL(ρ_t | ρ_∞) versus t, shown with an O(1/t^2) reference rate.]

Thanks for your attention. For more details come to see poster #206
