

SLIDE 1

Regret bounds for online variational inference

Pierre Alquier

ACML – Nagoya, Nov. 18, 2019

Pierre Alquier, RIKEN AIP Regret bounds for online variational inference

SLIDE 2

Co-authors

Badr-Eddine Chérief-Abdellatif, Emtiyaz Khan

Approximate Bayesian Inference team: https://emtiyaz.github.io/


SLIDE 6

Motivation

  • K. Osawa, S. Swaroop, A. Jain, R. Eschenhagen, R. E. Turner, R. Yokota, M. E. Khan (2019).

Practical Deep Learning with Bayesian Principles. NeurIPS.

1. proposes a fast algorithm to approximate the posterior,
2. applies it to train deep neural networks on CIFAR-10, ImageNet...
3. observation: improved uncertainty quantification.

Picture: Roman Bachmann.

Objective: provide a theoretical analysis of this algorithm. First step: simplified versions.



SLIDE 19

The sequential prediction problem

Sequential prediction problem

Round 1: (1) $x_1$ is given, (2) predict $y_1$ by $\hat{y}_1$, (3) $y_1$ is revealed.
Round 2: (1) $x_2$ is given, (2) predict $y_2$ by $\hat{y}_2$, (3) $y_2$ is revealed.
Round 3: (1) $x_3$ is given, (2) predict $y_3$ by $\hat{y}_3$, (3) $y_3$ is revealed.
Round 4: ...

Objective: make sure that we learn to predict well as soon as possible. Keep $\sum_{t=1}^{T} \ell(\hat{y}_t, y_t)$ as small as possible.
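The protocol above can be sketched as a short loop; the toy linear data-generating process and the `predict`/`update` interface are illustrative assumptions, not part of the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def sequential_protocol(T, predict, update, loss):
    """Generic online loop: at round t, (1) x_t is given, (2) we predict
    y_hat_t, (3) y_t is revealed and we pay loss(y_hat_t, y_t), then update."""
    total = 0.0
    for _ in range(T):
        x_t = rng.normal(size=2)                  # (1) x_t given
        y_hat = predict(x_t)                      # (2) predict y_t by y_hat_t
        y_t = float(x_t @ np.array([1.0, -2.0]))  # (3) y_t revealed (toy truth)
        total += loss(y_hat, y_t)
        update(x_t, y_hat, y_t)
    return total

# a trivial learner that always predicts 0 and never updates
cum_loss = sequential_protocol(
    T=100,
    predict=lambda x: 0.0,
    update=lambda x, y_hat, y: None,
    loss=lambda y_hat, y: (y_hat - y) ** 2,
)
```

Any learning rule plugs in through `predict`/`update`; the next slides instantiate them with gradient steps.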



SLIDE 22

Online gradient algorithm (OGA)

Given a set of predictors $\{f_\theta, \theta \in \Theta \subset \mathbb{R}^d\}$, e.g. $f_\theta(x) = \langle \theta, x \rangle$, and an initial guess $\theta_1$, OGA predicts $\hat{y}_t = f_{\theta_t}(x_t)$ and updates
$$\theta_{t+1} = \theta_t - \eta \nabla_\theta \ell_t(\theta_t), \quad \text{where } \ell_t(\theta) := \ell(f_\theta(x_t), y_t).$$
Note that $\theta_{t+1}$ can be obtained by:

1. $\theta_{t+1} = \arg\min_\theta \left\{ \left\langle \theta, \sum_{s=1}^{t} \nabla_\theta \ell_s(\theta_s) \right\rangle + \frac{\|\theta - \theta_1\|^2}{2\eta} \right\}$,
2. $\theta_{t+1} = \arg\min_\theta \left\{ \langle \theta, \nabla_\theta \ell_t(\theta_t) \rangle + \frac{\|\theta - \theta_t\|^2}{2\eta} \right\}$.
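As a sanity check, here is a minimal sketch of OGA with linear predictors and squared loss; the data, step size, and dimension are illustrative assumptions.

```python
import numpy as np

def oga(xs, ys, eta, loss_grad, d):
    """Online Gradient Algorithm with f_theta(x) = <theta, x>:
    predict y_hat_t = <theta_t, x_t>, then
    theta_{t+1} = theta_t - eta * grad_theta ell_t(theta_t)."""
    theta = np.zeros(d)  # initial guess theta_1
    preds = []
    for x_t, y_t in zip(xs, ys):
        y_hat = float(theta @ x_t)
        preds.append(y_hat)
        # chain rule: grad_theta ell(<theta, x_t>, y_t) = ell'(y_hat, y_t) * x_t
        theta = theta - eta * loss_grad(y_hat, y_t) * x_t
    return np.array(preds), theta

# squared loss ell(u, y) = (u - y)^2, so d ell / du = 2 (u - y)
rng = np.random.default_rng(1)
theta_star = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(500, 3))
Y = X @ theta_star
preds, theta_T = oga(X, Y, eta=0.02, loss_grad=lambda u, y: 2 * (u - y), d=3)
```

On this noiseless linear stream the iterate drifts to the best fixed predictor, so the per-round loss vanishes and the cumulative regret stays bounded.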



SLIDE 25

Bayesian learning and variational inference (VI)

$$\pi_{t+1}(\theta) := \pi(\theta \mid x_1, y_1, \dots, x_t, y_t) \propto \exp\left(-\eta \sum_{s=1}^{t} \ell_s(\theta)\right) \pi(\theta).$$

Not tractable in general, leading to variational approximations:
$$\tilde{\pi}_{t+1} = \arg\min_{q \in \mathcal{F}} \mathrm{KL}(q, \pi_{t+1}) = \arg\min_{q \in \mathcal{F}} \left\{ \mathbb{E}_{\theta \sim q}\left[\sum_{s=1}^{t} \ell_s(\theta)\right] + \frac{\mathrm{KL}(q, \pi)}{\eta} \right\}.$$

Formula for the online update of $\pi_{t+1}$: $\pi_{t+1}(\theta) \propto \exp(-\eta \ell_t(\theta))\, \pi_t(\theta)$. Q1: can we similarly define a sequential update for a variational approximation?
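When $\Theta$ is discretized to a finite grid, the online update $\pi_{t+1} \propto \exp(-\eta \ell_t)\, \pi_t$ is exactly tractable; a small sketch, where the grid and the fixed quadratic loss are illustrative assumptions:

```python
import numpy as np

def bayes_step(pi_t, losses_t, eta):
    """One online Bayesian update on a finite parameter grid:
    pi_{t+1}(theta) proportional to exp(-eta * ell_t(theta)) * pi_t(theta)."""
    w = pi_t * np.exp(-eta * losses_t)
    return w / w.sum()

thetas = np.linspace(-2.0, 2.0, 41)          # grid over Theta
pi = np.full(thetas.shape, 1 / len(thetas))  # uniform prior
eta = 0.5
for t in range(50):
    losses = (thetas - 1.0) ** 2             # ell_t(theta) (same loss each round)
    pi = bayes_step(pi, losses, eta)
# the sequential updates telescope: after 50 rounds pi is proportional to
# exp(-eta * sum_s ell_s(theta)) * prior, i.e. the batch posterior
```

The posterior mass concentrates around the loss minimizer $\theta = 1$, which is exactly the behaviour the regret bounds on the next slides quantify.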



SLIDE 28

Regret bounds for Bayesian inference

Theorem (classical result). Under the assumption that the loss is bounded by B, the Bayesian update leads to
$$\sum_{t=1}^{T} \mathbb{E}_{\theta \sim \pi_t}[\ell_t(\theta)] \le \inf_q \left\{ \sum_{t=1}^{T} \mathbb{E}_{\theta \sim q}[\ell_t(\theta)] + \frac{\eta B^2 T}{8} + \frac{\mathrm{KL}(q, \pi)}{\eta} \right\}.$$

Derivation of the infimum and $\eta \sim 1/\sqrt{T}$ "usually" leads to
$$\sum_{t=1}^{T} \mathbb{E}_{\theta \sim \pi_t}[\ell_t(\theta)] \le \inf_\theta \sum_{t=1}^{T} \ell_t(\theta) + O\left(\sqrt{dT \log(T)}\right).$$
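The scaling of $\eta$ comes from balancing the two remainder terms of the bound; writing $K := \mathrm{KL}(q, \pi)$, a one-line optimization makes it explicit:

```latex
\min_{\eta > 0}\left\{\frac{\eta B^2 T}{8} + \frac{K}{\eta}\right\}
\quad\text{is attained at}\quad
\eta^\star = \sqrt{\frac{8K}{B^2 T}} ,
\quad\text{with value}\quad
B\sqrt{\frac{KT}{2}} .
```

For a well-chosen $q$, $K$ is of order $d \log T$, which gives the $O(\sqrt{dT \log(T)})$ regret rate.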

Q2: can we derive similar results for online VI?



SLIDE 32

Two options for online VI

Parametric VI: $\mathcal{F} = \{q_\mu, \mu \in \mathcal{M}\}$. Mimic the two formulations of OGA, replacing $\theta$ by the variational parameter $\mu$ and the squared-distance regularizer by a KL term.

1. Sequential Variational Approximation (SVA):
$$\mu_{t+1} = \arg\min_\mu \left\{ \left\langle \mu, \sum_{s=1}^{t} \nabla_\mu \mathbb{E}_{\theta \sim q_{\mu_s}}[\ell_s(\theta)] \right\rangle + \frac{\mathrm{KL}(q_\mu, \pi)}{\eta} \right\}.$$

2. Streaming Variational Bayes (SVB):
$$\mu_{t+1} = \arg\min_\mu \left\{ \left\langle \mu, \nabla_\mu \mathbb{E}_{\theta \sim q_{\mu_t}}[\ell_t(\theta)] \right\rangle + \frac{\mathrm{KL}(q_\mu, q_{\mu_t})}{\eta} \right\}.$$


SLIDE 33

SVA & SVB are tractable, and not equivalent

Example: Gaussian prior $\theta \sim \pi = \mathcal{N}(0, s^2 I)$ and mean-field Gaussian approximation with $\mu = (m, \sigma)$.

SVA: $m_{t+1} \leftarrow m_t - \eta s^2 \bar{g}_{m,t}$, $\quad g_{t+1} \leftarrow g_t + \bar{g}_{\sigma,t}$, $\quad \sigma_{t+1} \leftarrow h(\eta s g_{t+1})\, s$,
SVB: $m_{t+1} \leftarrow m_t - \eta \sigma_t^2 \bar{g}_{m,t}$, $\quad \sigma_{t+1} \leftarrow \sigma_t\, h(\eta \sigma_t \bar{g}_{\sigma,t})$,

where $h(x) := \sqrt{1 + x^2} - x$ is applied componentwise (as is the multiplication of two vectors), and $\bar{g}_{m,t} = \frac{\partial}{\partial m} \mathbb{E}_{\theta \sim q_{m_t, \sigma_t}}[\ell_t(\theta)]$, $\bar{g}_{\sigma,t} = \frac{\partial}{\partial \sigma} \mathbb{E}_{\theta \sim q_{m_t, \sigma_t}}[\ell_t(\theta)]$.
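A minimal sketch of the SVB update in the mean-field Gaussian case; the quadratic loss with closed-form gradients is an illustrative assumption (in practice $\bar{g}_{m,t}, \bar{g}_{\sigma,t}$ would be Monte Carlo estimates):

```python
import numpy as np

def h(x):
    # h(x) = sqrt(1 + x^2) - x, applied componentwise
    return np.sqrt(1.0 + x**2) - x

def svb_step(m, sigma, g_m, g_sigma, eta):
    """One SVB step for q = N(m, diag(sigma^2)):
    m_{t+1}     = m_t - eta * sigma_t^2 * g_m,
    sigma_{t+1} = sigma_t * h(eta * sigma_t * g_sigma)."""
    return m - eta * sigma**2 * g_m, sigma * h(eta * sigma * g_sigma)

# toy loss ell_t(theta) = ||theta - theta_star||^2, for which
# E_q[ell_t] = ||m - theta_star||^2 + sum(sigma^2): closed-form gradients
theta_star = np.array([1.0, -1.0])
m, sigma = np.zeros(2), np.ones(2)
eta = 0.1
for t in range(1000):
    g_m = 2 * (m - theta_star)   # d/dm E_q[ell_t]
    g_sigma = 2 * sigma          # d/dsigma E_q[ell_t]
    m, sigma = svb_step(m, sigma, g_m, g_sigma, eta)
# m drifts toward theta_star while sigma shrinks: the approximation concentrates
```

Note that $h(x) \in (0, 1]$ for $x \ge 0$, so positive $\sigma$-gradients always shrink $\sigma$ while keeping it strictly positive, which is what makes this parametrization convenient.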



SLIDE 35

Theoretical analysis of SVA

Theorem 1. Under convexity and an L-Lipschitz assumption on the loss, and under an α-strong convexity assumption on the KL term, SVA leads to
$$\sum_{t=1}^{T} \mathbb{E}_{\theta \sim q_{\mu_t}}[\ell_t(\theta)] \le \inf_{\mu \in \mathcal{M}} \left\{ \sum_{t=1}^{T} \mathbb{E}_{\theta \sim q_\mu}[\ell_t(\theta)] + \frac{\eta L^2 T}{\alpha} + \frac{\mathrm{KL}(q_\mu, \pi)}{\eta} \right\}.$$

Application to Gaussian approximation leads to
$$\sum_{t=1}^{T} \mathbb{E}_{\theta \sim q_{\mu_t}}[\ell_t(\theta)] \le \inf_\theta \sum_{t=1}^{T} \ell_t(\theta) + (1 + o(1))\, \frac{2L}{\sqrt{\alpha}} \sqrt{dT \log(T)}.$$



SLIDE 37

Theoretical analysis of SVB

Theorem 2. Using Gaussian approximations, assuming the loss is convex and L-Lipschitz and the parameter space is bounded (diameter D), SVB with adequate η leads to
$$\sum_{t=1}^{T} \ell_t\left(\mathbb{E}_{\theta \sim q_{\mu_t}}[\theta]\right) \le \inf_\theta \sum_{t=1}^{T} \ell_t(\theta) + DL\sqrt{2T}.$$

If, moreover, the loss is H-strongly convex,
$$\sum_{t=1}^{T} \ell_t\left(\mathbb{E}_{\theta \sim q_{\mu_t}}[\theta]\right) \le \inf_\theta \sum_{t=1}^{T} \ell_t(\theta) + \frac{L^2 (1 + \log(T))}{H}.$$


SLIDE 38

Test on a simulated dataset

Figure – Average cumulative losses on different datasets for classification and regression tasks with OGA (yellow), OGA-EL (red), SVA (blue), SVB (purple) and NGVI (green).


SLIDE 39

Test on the Breast dataset

Figure – Average cumulative losses on different datasets for classification and regression tasks with OGA (yellow), OGA-EL (red), SVA (blue), SVB (purple) and NGVI (green).



SLIDE 44

Open questions

1. Analysis of SVB in the general case.
2. Analysis of the uncertainty quantification.
3. NGVI is the next step in getting closer to the algorithms used to train neural networks with Bayesian principles. But, being based on a different parametrization, it does not satisfy our convexity assumption... It uses exponential-family approximations $\{q_\mu, \mu \in \mathcal{M}\}$, where $\mu$ is the mean parameter. Denoting by $\lambda$ the natural parameter (with $\lambda = F(\mu)$),
$$\lambda_{t+1} = (1 - \rho)\lambda_t + \rho \nabla_\mu \mathbb{E}_{\theta \sim q_{\mu_t}}[\ell_t(\theta)].$$

  • M. E. Khan, D. Nielsen (2018). Fast yet Simple Natural-Gradient Descent for Variational Inference in Complex Models. ISITA.

SLIDE 45

Thank you!
