ControlVAE: Controllable Variational Autoencoder (PowerPoint PPT Presentation)

Huajie Shao, Shuochao Yao, Dachun Sun, Aston Zhang, Shengzhong Liu, Dongxin Liu, Jun Wang, Tarek Abdelzaher

University of Illinois at Urbana-Champaign; Amazon Web Services Deep Learning; Alibaba Inc.


SLIDE 1

ControlVAE: Controllable Variational Autoencoder

Huajie Shao, Shuochao Yao, Dachun Sun, Aston Zhang, Shengzhong Liu, Dongxin Liu, Jun Wang, Tarek Abdelzaher

University of Illinois at Urbana-Champaign; Amazon Web Services Deep Learning; Alibaba Inc. at Seattle

SLIDE 2

Background: VAE

Applications of VAE include machine translation and disentangled representation learning.

SLIDE 3

VAE model

Fig. The basic VAE model: encoder $q_\phi(z|x)$, decoder $p_\theta(x|z)$. The ELBO objective function is

$$\mathrm{ELBO} = \mathbb{E}_{q_\phi(z|x)}[\log p_\theta(x|z)] - D_{KL}(q_\phi(z|x)\,\|\,p(z))$$

  • Recon. term: $\mathbb{E}_{q_\phi(z|x)}[\log p_\theta(x|z)]$
  • KL-divergence: $D_{KL}(q_\phi(z|x)\,\|\,p(z))$
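For a Gaussian encoder and a Bernoulli decoder, both ELBO terms have simple closed forms. A minimal pure-Python sketch (function names are illustrative, not from the slides):

```python
import math

def gaussian_kl(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ),
    summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, logvar))

def bernoulli_log_likelihood(x, logits):
    """Reconstruction term log p(x|z) for Bernoulli outputs, summed over
    input dimensions (log-sigmoid form; assumes moderate logits)."""
    return sum(xi * li - math.log1p(math.exp(li))
               for xi, li in zip(x, logits))

def elbo(x, logits, mu, logvar):
    """ELBO = reconstruction term - KL-divergence."""
    return bernoulli_log_likelihood(x, logits) - gaussian_kl(mu, logvar)
```

Note that `gaussian_kl` returns 0 when the posterior equals the prior, which is exactly the KL-vanishing regime discussed next.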
SLIDE 4

Background

  • KL-vanishing (posterior collapse): the KL-divergence tends to zero during model training.
  • Trade-off between the KL-divergence and reconstruction quality.

[Fig. KL vanishing; trade-off between KL-divergence and recon. accuracy]
SLIDE 5

Related work

Study / Description / Cons:

  • Cost annealing (bowman2015): increases the weight 𝛾 on the KL term from 0 to 1 over N steps using a sigmoid function. Cons: still suffers from KL-vanishing.
  • 𝛽-VAE (higgins2017): assigns a large, fixed weight to the KL term in the VAE objective. Cons: the fixed weight leads to high recon. error.
  • TamingVAE (rezende2018): formulates the reconstruction loss as a constrained optimization using Lagrange multipliers. Cons: suffers from local minima; KL vanishing.
  • FactorVAE (kim2018): decomposes the KL into three terms: index-code mutual information, total correlation, and dimension-wise KL. Cons: fixed weight; high recon. error.
  • InfoVAE (zhao2017): adds a mutual information maximization term to encourage mutual information between x and z. Cons: fixed weight; cannot explicitly control the KL value.

Drawbacks of existing work:

  • 1. Fixed weight on the KL term, leading to high recon. error.
  • 2. KL vanishing (posterior collapse).
SLIDE 6

Motivation

[1] Language modeling: KL vanishing
[2] Disentanglement: information capacity (KL-divergence)
SLIDE 7

ControlVAE Framework

Fig. Framework of ControlVAE via dynamic learning. A controller compares the desired KL value with the KL-divergence measured during training and, through feedback, outputs the weight γ(t) on the KL term of the VAE objective:

$$\mathbb{E}_{q_\phi(z|x)}[\log p_\theta(x|z)] - \gamma(t)\, D_{KL}(q_\phi(z|x)\,\|\,p(z))$$

Encoder $q_\phi(z|x)$, decoder $p_\theta(x|z)$.
SLIDE 8

ControlVAE Model

Objective function:

$$\mathbb{E}_{q_\phi(z|x)}[\log p_\theta(x|z)] - \gamma(t)\, D_{KL}(q_\phi(z|x)\,\|\,p(z))$$

where γ(t) is the output of a controller.
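At each training step the KL term is weighted by the controller output γ(t). A minimal sketch of the resulting per-example loss (pure Python; function name and interface are illustrative, not the authors' code):

```python
import math

def controlvae_loss(recon_logprob, mu, logvar, gamma_t):
    """Negative ControlVAE objective for one example:
    loss = -log p(x|z) + gamma(t) * KL(q(z|x) || p(z)).
    Returns the loss and the KL value; the KL is what gets fed back
    to the controller to produce the next gamma(t)."""
    kl = 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                   for m, lv in zip(mu, logvar))
    return -recon_logprob + gamma_t * kl, kl
```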
SLIDE 9

PID control algorithm

The PID algorithm computes the controller output γ(t) from the error e(t):

$$\gamma(t) = K_p\, e(t) + K_i \int_0^t e(\tau)\, d\tau + K_d\, \frac{d e(t)}{d t}$$

§ e(t) is the error between the real KL-divergence and the set point
§ Kp is the coefficient of the proportional (P) term
§ Ki is the coefficient of the integral (I) term
§ Kd is the coefficient of the derivative (D) term
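At discrete training steps the integral becomes a running sum of errors and the derivative a difference between consecutive errors. A textbook sketch (class and method names are illustrative):

```python
class PIDController:
    """Discrete PID controller: output = Kp*e(t) + Ki*sum(e) + Kd*(e(t)-e(t-1))."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.err_sum = 0.0   # running sum approximates the integral term
        self.prev_err = 0.0

    def step(self, set_point, measured_kl):
        err = set_point - measured_kl        # e(t)
        self.err_sum += err
        d_err = err - self.prev_err          # discrete derivative
        self.prev_err = err
        return self.kp * err + self.ki * self.err_sum + self.kd * d_err
```

With e(t) = set point minus measured KL, a KL below the set point yields a positive output; ControlVAE's non-linear variant inverts this relationship so that a positive error lowers the weight instead.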

SLIDE 10

Non-linear PI Controller

The set point (the desired KL value) is an application-specific constant.

Insight of the PI controller:

§ When e(t) > 0: the measured KL(t) is very small (below the set point), so reduce γ(t) to boost the KL value;
§ When e(t) < 0: the measured KL(t) is larger than the set point, so increase γ(t) to penalize the KL term;

[Fig. KL-divergence converging to the set point over training steps t]
SLIDE 11

Non-linear PI Controller

Fig. PI controller. The non-linear PI controller computes the weight from the feedback error e(t) between the desired KL value and the measured KL(t):

$$\gamma(t) = \frac{K_p}{1 + \exp(e(t))} - K_i \sum_{j=0}^{t} e(j)$$

The first term is the P term and the second the I term.
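A minimal sketch of this update rule (the class name is illustrative; the default gains follow the tuning examples in the backup slides, Kp = 0.01 and Ki = 0.0001):

```python
import math

class NonlinearPI:
    """Non-linear PI controller:
    gamma(t) = Kp / (1 + exp(e(t))) - Ki * sum_{j<=t} e(j).

    When e(t) >> 0 (measured KL far below the set point) the P term shrinks
    toward 0, lowering gamma so the KL can grow; when e(t) < 0 the P term
    approaches Kp, raising gamma to penalize the KL term.
    """

    def __init__(self, kp=0.01, ki=0.0001):
        self.kp, self.ki = kp, ki
        self.err_sum = 0.0

    def step(self, desired_kl, measured_kl):
        e = desired_kl - measured_kl
        self.err_sum += e
        # numerically stable evaluation of Kp / (1 + exp(e))
        if e >= 0:
            p_term = self.kp * math.exp(-e) / (1.0 + math.exp(-e))
        else:
            p_term = self.kp / (1.0 + math.exp(e))
        return p_term - self.ki * self.err_sum
```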
SLIDE 12

Evaluation

Applications:

  • Language modeling: text and dialog generation
  • Disentanglement representation learning
  • Image generation

Benchmark datasets:

  • Language modeling: [1] Penn Tree Bank (PTB); [2] Switchboard (SW) telephone conversation
  • Disentanglement: dSprites
  • Image generation: CelebA
SLIDE 13

Evaluation: Language modeling (PTB data)

Baselines:
1) Cost annealing: gradually increases the weight on the KL-divergence from 0 to 1 over N steps using a sigmoid function.
2) Cyclical annealing: splits the training process into M cycles; each cycle increases the weight from 0 to 1 using a linear function.

(a) KL divergence (b) Recon. loss (c) Weight γ(t)
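The two baseline schedules can be sketched as simple functions of the training step (the slope and cycle parameters here are illustrative, not taken from the slides):

```python
import math

def cost_annealing_weight(step, n_steps, slope=0.01):
    """Sigmoid schedule: the KL weight rises from ~0 toward 1,
    reaching 0.5 at n_steps / 2. `slope` controls the steepness."""
    return 1.0 / (1.0 + math.exp(-slope * (step - n_steps / 2)))

def cyclical_annealing_weight(step, total_steps, m_cycles=4, ratio=0.5):
    """Cyclical schedule: training is split into M cycles; within each
    cycle the weight grows linearly from 0 to 1 over the first `ratio`
    fraction, then stays at 1 for the rest of the cycle."""
    cycle_len = total_steps / m_cycles
    position = (step % cycle_len) / cycle_len
    return min(position / ratio, 1.0)
```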

SLIDE 14

Evaluation: Language modeling

The Switchboard (SW) dataset is used to measure the diversity of the generated text.
SLIDE 15

Evaluation: Disentanglement (Dsprites data)

15

(a) Recon. error (b) Weight 𝛾(𝑢) (c) Disentangled factors

Baselines: 1) \beta-VAE: Burgess, C. P., Higgins, I., Pal, A., Matthey,et al. (2018). Understanding disentangling in $\beta $-

  • VAE. arXiv preprint arXiv:1804.03599.

2) FactorVAE: Kim, Hyunjik, and Andriy Mnih. "Disentangling by Factorising." In International Conference on Machine Learning, pp. 2649-2658. 2018.

slide-16
SLIDE 16

Evaluation: Disentanglement

Fig. Example of traversing a single latent dimension over the range [-3, 3] for ControlVAE (KL = 16), β-VAE (β = 100), and FactorVAE (γ = 10); the disentangled factors are x, y, scale, orientation, and shape.
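A latent traversal like the one in the figure is generated by sweeping one dimension of a latent code while holding the others fixed. A small helper sketch (`decode` stands in for any trained decoder; names are illustrative):

```python
def traverse_latent(decode, z, dim, low=-3.0, high=3.0, steps=7):
    """Vary one latent dimension over [low, high] while keeping the other
    dimensions fixed, decoding each modified code. `decode` is any function
    mapping a latent vector (list of floats) to an output."""
    outputs = []
    for i in range(steps):
        value = low + (high - low) * i / (steps - 1)
        z_mod = list(z)
        z_mod[dim] = value
        outputs.append(decode(z_mod))
    return outputs
```

Each row of the figure corresponds to one such sweep along a dimension capturing a single generative factor (x, y, scale, orientation, shape) of the dSprites data.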

SLIDE 17

Evaluation: Image generation


(a) Recon. loss (b) KL divergence

SLIDE 18

Conclusion

  • Propose a new controllable VAE, ControlVAE, that combines a PI controller with the basic VAE model.
  • Design a new non-linear PI controller to automatically tune the weight in the VAE objective.
  • ControlVAE can not only avert KL-vanishing, but also control the diversity of the generated text.
  • Achieve better disentangling and reconstruction quality than existing methods.
SLIDE 19

Thank you very much!!

Q&A
SLIDE 20

Backup

SLIDE 21

PI Parameter Tuning

  • Tune Kp (P term): when the output KL(t) is very small, the error e(t) >> 0; e.g., Kp = 0.01.
  • Tune Ki (I term): when the output KL(t) is very large, e(t) < 0; e.g., Ki = 0.001 or 0.0001.

[Fig. KL-divergence approaching the set point over training steps t]
SLIDE 22

Set Point Guideline

  • The set point of the KL-divergence is largely application specific.

Ø Text generation: slightly increase the KL-divergence, denoted by KL_vae, produced by the basic VAE or by the cost annealing method.
Ø ELBO improvement: the KL should be increased within the following bound
SLIDE 23

ELBO improvement
