ControlVAE: Controllable Variational Autoencoder (PowerPoint PPT Presentation)

Huajie Shao, Shuochao Yao, Dachun Sun, Aston Zhang, Shengzhong Liu, Dongxin Liu, Jun Wang, Tarek Abdelzaher

University of Illinois at Urbana-Champaign; Amazon Web Services Deep Learning; Alibaba Inc.


SLIDE 1

ControlVAE: Controllable Variational Autoencoder

Huajie Shao, Shuochao Yao, Dachun Sun, Aston Zhang, Shengzhong Liu, Dongxin Liu, Jun Wang, Tarek Abdelzaher

University of Illinois at Urbana-Champaign; Amazon Web Services Deep Learning; Alibaba Inc. at Seattle

SLIDE 2

Background: VAE

Applications of VAE include machine translation and disentangled representation learning.

SLIDE 3

VAE model

Fig. The basic VAE model: encoder $q_\phi(z|x)$, decoder $p_\theta(x|z)$. The ELBO objective function is

$$\mathrm{ELBO} = \mathbb{E}_{q_\phi(z|x)}[\log p_\theta(x|z)] - D_{KL}(q_\phi(z|x)\,\|\,p(z))$$

  • Recon. term: $\mathbb{E}_{q_\phi(z|x)}[\log p_\theta(x|z)]$
  • KL-divergence: $D_{KL}(q_\phi(z|x)\,\|\,p(z))$
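For a Gaussian encoder and a Bernoulli decoder, both ELBO terms have simple closed forms. A minimal pure-Python sketch (function names are illustrative, not from the slides):

```python
import math

def gaussian_kl(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ),
    summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, logvar))

def bernoulli_log_likelihood(x, logits):
    """Reconstruction term log p(x|z) for Bernoulli outputs, summed over
    input dimensions (log-sigmoid form; assumes moderate logits)."""
    return sum(xi * li - math.log1p(math.exp(li))
               for xi, li in zip(x, logits))

def elbo(x, logits, mu, logvar):
    """ELBO = reconstruction term - KL-divergence."""
    return bernoulli_log_likelihood(x, logits) - gaussian_kl(mu, logvar)
```

Note that `gaussian_kl` returns 0 when the posterior equals the prior, which is exactly the KL-vanishing regime discussed next.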
SLIDE 4

Background

  • KL-vanishing (posterior collapse): the KL-divergence tends to zero during model training.
  • Trade-off between the KL-divergence and reconstruction quality.

[Fig. KL vanishing; trade-off between KL-divergence and recon. accuracy]
SLIDE 5

Related work

Study / Description / Cons:

  • Cost annealing (bowman2015): increases the weight 𝛾 on the KL term from 0 to 1 over N steps using a sigmoid function. Cons: still suffers from KL-vanishing.
  • 𝛽-VAE (higgins2017): assigns a large, fixed weight to the KL term in the VAE objective. Cons: the fixed weight leads to high recon. error.
  • TamingVAE (rezende2018): formulates the reconstruction loss as a constrained optimization using Lagrange multipliers. Cons: suffers from local minima; KL vanishing.
  • FactorVAE (kim2018): decomposes the KL into three terms: index-code mutual information, total correlation, and dimension-wise KL. Cons: fixed weight; high recon. error.
  • InfoVAE (zhao2017): adds a mutual information maximization term to encourage mutual information between x and z. Cons: fixed weight; cannot explicitly control the KL value.

Drawbacks of existing work:

  • 1. Fixed weight on the KL term, leading to high recon. error.
  • 2. KL vanishing (posterior collapse).
SLIDE 6

Motivation

[1] Language modeling: KL vanishing
[2] Disentanglement: information capacity (KL-divergence)
SLIDE 7

ControlVAE Framework

Fig. Framework of ControlVAE via dynamic learning. A controller compares the desired KL value with the KL-divergence measured during training and, through feedback, outputs the weight γ(t) on the KL term of the VAE objective:

$$\mathbb{E}_{q_\phi(z|x)}[\log p_\theta(x|z)] - \gamma(t)\, D_{KL}(q_\phi(z|x)\,\|\,p(z))$$

Encoder $q_\phi(z|x)$, decoder $p_\theta(x|z)$.
SLIDE 8

ControlVAE Model

Objective function:

$$\mathbb{E}_{q_\phi(z|x)}[\log p_\theta(x|z)] - \gamma(t)\, D_{KL}(q_\phi(z|x)\,\|\,p(z))$$

where γ(t) is the output of a controller.
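At each training step the KL term is weighted by the controller output γ(t). A minimal sketch of the resulting per-example loss (pure Python; function name and interface are illustrative, not the authors' code):

```python
import math

def controlvae_loss(recon_logprob, mu, logvar, gamma_t):
    """Negative ControlVAE objective for one example:
    loss = -log p(x|z) + gamma(t) * KL(q(z|x) || p(z)).
    Returns the loss and the KL value; the KL is what gets fed back
    to the controller to produce the next gamma(t)."""
    kl = 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                   for m, lv in zip(mu, logvar))
    return -recon_logprob + gamma_t * kl, kl
```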
SLIDE 9

PID control algorithm

The PID algorithm computes the controller output γ(t) from the error e(t):

$$\gamma(t) = K_p\, e(t) + K_i \int_0^t e(\tau)\, d\tau + K_d\, \frac{d e(t)}{d t}$$

§ e(t) is the error between the real KL-divergence and the set point
§ Kp is the coefficient of the proportional (P) term
§ Ki is the coefficient of the integral (I) term
§ Kd is the coefficient of the derivative (D) term
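At discrete training steps the integral becomes a running sum of errors and the derivative a difference between consecutive errors. A textbook sketch (class and method names are illustrative):

```python
class PIDController:
    """Discrete PID controller: output = Kp*e(t) + Ki*sum(e) + Kd*(e(t)-e(t-1))."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.err_sum = 0.0   # running sum approximates the integral term
        self.prev_err = 0.0

    def step(self, set_point, measured_kl):
        err = set_point - measured_kl        # e(t)
        self.err_sum += err
        d_err = err - self.prev_err          # discrete derivative
        self.prev_err = err
        return self.kp * err + self.ki * self.err_sum + self.kd * d_err
```

With e(t) = set point minus measured KL, a KL below the set point yields a positive output; ControlVAE's non-linear variant inverts this relationship so that a positive error lowers the weight instead.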

SLIDE 10

Non-linear PI Controller

The set point (the desired KL value) is an application-specific constant.

Insight of the PI controller:

§ When e(t) > 0: the measured KL(t) is very small (below the set point), so reduce γ(t) to boost the KL value;
§ When e(t) < 0: the measured KL(t) is larger than the set point, so increase γ(t) to penalize the KL term;

[Fig. KL-divergence converging to the set point over training steps t]
SLIDE 11

Non-linear PI Controller

Fig. PI controller. The non-linear PI controller computes the weight from the feedback error e(t) between the desired KL value and the measured KL(t):

$$\gamma(t) = \frac{K_p}{1 + \exp(e(t))} - K_i \sum_{j=0}^{t} e(j)$$

The first term is the P term and the second the I term.
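A minimal sketch of this update rule (the class name is illustrative; the default gains follow the tuning examples in the backup slides, Kp = 0.01 and Ki = 0.0001):

```python
import math

class NonlinearPI:
    """Non-linear PI controller:
    gamma(t) = Kp / (1 + exp(e(t))) - Ki * sum_{j<=t} e(j).

    When e(t) >> 0 (measured KL far below the set point) the P term shrinks
    toward 0, lowering gamma so the KL can grow; when e(t) < 0 the P term
    approaches Kp, raising gamma to penalize the KL term.
    """

    def __init__(self, kp=0.01, ki=0.0001):
        self.kp, self.ki = kp, ki
        self.err_sum = 0.0

    def step(self, desired_kl, measured_kl):
        e = desired_kl - measured_kl
        self.err_sum += e
        # numerically stable evaluation of Kp / (1 + exp(e))
        if e >= 0:
            p_term = self.kp * math.exp(-e) / (1.0 + math.exp(-e))
        else:
            p_term = self.kp / (1.0 + math.exp(e))
        return p_term - self.ki * self.err_sum
```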
SLIDE 12

Evaluation

Applications:

  • Language modeling: text and dialog generation
  • Disentanglement representation learning
  • Image generation

Benchmark datasets:

  • Language modeling: [1] Penn Tree Bank (PTB); [2] Switchboard (SW) telephone conversation
  • Disentanglement: dSprites
  • Image generation: CelebA
SLIDE 13

Evaluation: Language modeling (PTB data)

Baselines:
1) Cost annealing: gradually increases the weight on the KL-divergence from 0 to 1 over N steps using a sigmoid function.
2) Cyclical annealing: splits the training process into M cycles; each cycle increases the weight from 0 to 1 using a linear function.

(a) KL divergence (b) Recon. loss (c) Weight γ(t)
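The two baseline schedules can be sketched as simple functions of the training step (the slope and cycle parameters here are illustrative, not taken from the slides):

```python
import math

def cost_annealing_weight(step, n_steps, slope=0.01):
    """Sigmoid schedule: the KL weight rises from ~0 toward 1,
    reaching 0.5 at n_steps / 2. `slope` controls the steepness."""
    return 1.0 / (1.0 + math.exp(-slope * (step - n_steps / 2)))

def cyclical_annealing_weight(step, total_steps, m_cycles=4, ratio=0.5):
    """Cyclical schedule: training is split into M cycles; within each
    cycle the weight grows linearly from 0 to 1 over the first `ratio`
    fraction, then stays at 1 for the rest of the cycle."""
    cycle_len = total_steps / m_cycles
    position = (step % cycle_len) / cycle_len
    return min(position / ratio, 1.0)
```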

SLIDE 14

Evaluation: Language modeling

The Switchboard (SW) dataset is used to measure the diversity of the generated text.
SLIDE 15

Evaluation: Disentanglement (Dsprites data)

15

(a) Recon. error (b) Weight 𝛾(𝑢) (c) Disentangled factors

Baselines: 1) \beta-VAE: Burgess, C. P., Higgins, I., Pal, A., Matthey,et al. (2018). Understanding disentangling in $\beta $-

  • VAE. arXiv preprint arXiv:1804.03599.

2) FactorVAE: Kim, Hyunjik, and Andriy Mnih. "Disentangling by Factorising." In International Conference on Machine Learning, pp. 2649-2658. 2018.

slide-16
SLIDE 16

Evaluation: Disentanglement

Fig. Example of traversing a single latent dimension over the range [-3, 3] for ControlVAE (KL = 16), β-VAE (β = 100), and FactorVAE (γ = 10); the disentangled factors are x, y, scale, orientation, and shape.
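A latent traversal like the one in the figure is generated by sweeping one dimension of a latent code while holding the others fixed. A small helper sketch (`decode` stands in for any trained decoder; names are illustrative):

```python
def traverse_latent(decode, z, dim, low=-3.0, high=3.0, steps=7):
    """Vary one latent dimension over [low, high] while keeping the other
    dimensions fixed, decoding each modified code. `decode` is any function
    mapping a latent vector (list of floats) to an output."""
    outputs = []
    for i in range(steps):
        value = low + (high - low) * i / (steps - 1)
        z_mod = list(z)
        z_mod[dim] = value
        outputs.append(decode(z_mod))
    return outputs
```

Each row of the figure corresponds to one such sweep along a dimension capturing a single generative factor (x, y, scale, orientation, shape) of the dSprites data.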

SLIDE 17

Evaluation: Image generation


(a) Recon. loss (b) KL divergence

SLIDE 18

Conclusion

  • Propose a new controllable VAE, ControlVAE, that combines a PI controller with the basic VAE model.
  • Design a new non-linear PI controller to automatically tune the weight in the VAE objective.
  • ControlVAE can not only avert KL-vanishing, but also control the diversity of the generated text.
  • Achieve better disentangling and reconstruction quality than existing methods.
SLIDE 19

Thank you very much!!

Q&A
SLIDE 20

Backup

SLIDE 21

PI Parameter Tuning

  • Tune Kp (P term): when the output KL(t) is very small, the error e(t) >> 0; e.g., Kp = 0.01.
  • Tune Ki (I term): when the output KL(t) is very large, e(t) < 0; e.g., Ki = 0.001 or 0.0001.

[Fig. KL-divergence approaching the set point over training steps t]
SLIDE 22

Set Point Guideline

  • The set point of the KL-divergence is largely application specific.

Ø Text generation: slightly increase the KL-divergence, denoted by KL_vae, produced by the basic VAE or by the cost annealing method.
Ø ELBO improvement: the KL should be increased within the following bound
SLIDE 23

ELBO improvement
