Learning Discrete and Continuous Factors of Data via Alternating Disentanglement


SLIDE 1

Learning Discrete and Continuous Factors of Data via Alternating Disentanglement

Yeonwoo Jeong, Hyun Oh Song

Seoul National University

ICML19

1

SLIDE 2

Motivation

Shape? square   Position x? 0.3   Position y? 0.7   Size? 0.5   Rotation? 40°

◮ Our goal is to disentangle the underlying explanatory factors of data without any supervision.

2

SLIDE 3-7

Motivation

[Figure: the same scene shown while varying one factor at a time (shape: square to ellipse; position x: 0.3 to 1; rotation: 40° to 0°; size: 0.5 to 1), with the remaining factors held fixed.]

3

SLIDE 8-9

Motivation

◮ Most recent methods focus on learning only the continuous factors of variation.

◮ Learning discrete representations is known to be a challenging problem. However, learning continuous and discrete representations together is a more challenging problem.

4

SLIDE 10

Outline

Method Experiments Conclusion

Method 5

SLIDE 11

Overview of our method

[Architecture diagram: the encoder qφ(z | x) produces continuous codes z1, . . . , zm, each with a KL regularizer weighted by βl or βh; a min cost flow solver assigns the discrete code d; the decoder pθ(x | z, d) produces the reconstruction x̂.]

Method 6

SLIDE 12

Overview of our method

◮ We propose an efficient procedure for implicitly penalizing the total correlation by controlling the information flow on each variable.

◮ We propose a method for jointly learning discrete and continuous latent variables in an alternating maximization framework.

Method 6

SLIDE 13

Limitation of β-VAE framework

◮ β-VAE sets β > 1 to penalize TC(z) for disentangled representations.

◮ However, it also penalizes the mutual information I(x; z) between the data and the latent variables.

Method 7
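This can be made precise with a standard decomposition of the KL term (used in the FactorVAE and β-TCVAE analyses; stated here for context, not taken from the slide). Assuming a factorized prior p(z) = Π_i p(zi),

E_{x∼p(x)} DKL(qφ(z | x) || p(z)) = I(x; z) + TC(z) + Σ_{i=1}^{m} DKL(q(zi) || p(zi)),

so scaling this whole term by β > 1 penalizes the mutual information I(x; z) together with TC(z).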

SLIDE 14-15

Our method

◮ We aim at penalizing TC(z) by sequentially penalizing each individual summand I(z1:i−1; zi):

TC(z) = Σ_{i=2}^{m} I(z1:i−1; zi).

◮ We implicitly minimize each summand I(z1:i−1; zi) by sequentially maximizing the left-hand side I(x; z1:i) for all i = 2, . . . , m:

I(x; z1:i) = I(x; z1:i−1) + I(x; zi) − I(z1:i−1; zi).

Method 8
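The identity above can be derived from the chain rule for mutual information, together with the fact that for a factorized encoder qφ(z | x) = Π_i qφ(zi | x) the latent dimensions are conditionally independent given x (this factorization is an assumption made explicit here, not stated on the slide):

I(x; z1:i) = I(x; z1:i−1) + I(x; zi | z1:i−1),
I(x; zi | z1:i−1) = I(x; zi) − I(z1:i−1; zi) + I(z1:i−1; zi | x),

and the last term vanishes under the factorized encoder, which gives the identity used above.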

SLIDE 16

Our method

◮ In practice, we maximize I(x; z1:i) by minimizing the reconstruction term while penalizing zi+1:m with a high β (:= βh) and the other dimensions with a small β (:= βl).

Method 9
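The cascading schedule can be written as a per-dimension vector of KL weights. The sketch below is only illustrative (the function names, the specific βh value, and the step-based relief schedule are assumptions, not the authors' released code); it shows weights that start at βh for every dimension and are switched to βl one dimension at a time as training proceeds.

import numpy as np

def cascade_beta_weights(step, m, beta_l, beta_h, steps_per_dim):
    """Per-dimension KL weights for a cascading schedule (illustrative sketch).

    All m continuous dimensions start heavily penalized with beta_h; every
    steps_per_dim training steps, one more dimension is relieved to beta_l.
    """
    n_relieved = min(m, step // steps_per_dim + 1)   # dimensions currently on beta_l
    weights = np.full(m, beta_h, dtype=np.float64)
    weights[:n_relieved] = beta_l
    return weights

def weighted_kl_to_standard_normal(mu, logvar, weights):
    """Diagonal-Gaussian KL(q(z|x) || N(0, I)) per dimension, weighted by the schedule."""
    kl_per_dim = 0.5 * (np.exp(logvar) + mu ** 2 - 1.0 - logvar)   # shape (batch, m)
    return (kl_per_dim * weights).sum(axis=1).mean()

# Example: 5 continuous dimensions, relieving one dimension every 10,000 steps.
print(cascade_beta_weights(step=25_000, m=5, beta_l=1.0, beta_h=50.0, steps_per_dim=10_000))
# -> [ 1.  1.  1. 50. 50.]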

SLIDE 17-20

Our method

[Architecture diagram, repeated over four build steps: every per-dimension KL regularizer on z1, . . . , zm starts with weight βh; the weights are then switched to βl one dimension at a time, while the min cost flow solver assigns the discrete code d and the decoder produces the reconstruction x̂.]

◮ Every latent dimension is heavily penalized with βh. The penalty on each latent dimension is then relieved one at a time with βl, in a cascading fashion.

Method 10

SLIDE 21

Graphical model

Figure: Graphical model view. Solid lines denote the generative process and dashed lines denote the inference process. x, z, and d denote the data, the continuous latent code, and the discrete latent code, respectively.

Method 11
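In symbols (using the notation that appears on the later slides; the factorization itself is only implied by the figure), the generative process is

pθ(x, z, d) = pθ(x | z, d) p(z) p(d),

while inference uses the encoder qφ(z | x) for the continuous code; the discrete configuration d is found with the min cost flow solver described on the following slides.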

SLIDE 22-23

Motivation of our method

◮ Unlike JointVAE, AAE with supervised discrete variables (AAE-S) can learn good continuous representations, since the burden of simultaneously modeling the continuous and discrete factors is relieved through supervision on the discrete factors.

◮ Inspired by these findings, our idea is to alternate between finding the most likely discrete configuration of the variables given the continuous factors, and updating the parameters (φ, θ) given the discrete configurations.

Method 12

SLIDE 24-29

Construct unary term

[Figure, built up over several steps: x(1) is decoded under each candidate one-hot code e1, . . . , eS; the resulting reconstruction terms are collected into the unary vector u(1).]

◮ The discrete latent variables are represented using one-hot encodings of each variable, d(i) ∈ {e1, . . . , eS}.

◮ uθ(i) denotes the vector of log-likelihoods log pθ(x(i) | z(i), ek) evaluated at each k ∈ [S].

Method 13
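To make the construction concrete, the sketch below forms uθ(i) for a single example using a toy linear decoder and a Gaussian observation model (both are stand-ins chosen here for illustration; the paper uses a neural decoder): each candidate one-hot code ek is decoded together with z(i), and the resulting reconstruction log-likelihood becomes the k-th unary score.

import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (NOT the paper's networks): a linear "decoder" mapping a
# continuous code z and a one-hot discrete code e_k to a reconstruction.
D_X, D_Z, S = 8, 3, 4                 # data dim, continuous dims, number of discrete values
W_z = rng.normal(size=(D_X, D_Z))
W_d = rng.normal(size=(D_X, S))

def decode(z, e):
    return W_z @ z + W_d @ e          # x_hat

def unary_terms(x, z, sigma=1.0):
    """u_k = log p(x | z, e_k) under a Gaussian observation model, for each k in [S]."""
    u = np.empty(S)
    for k in range(S):
        x_hat = decode(z, np.eye(S)[k])
        u[k] = -0.5 * np.sum((x - x_hat) ** 2) / sigma ** 2   # up to an additive constant
    return u

x = rng.normal(size=D_X)
z = rng.normal(size=D_Z)
print(unary_terms(x, z))              # one score per candidate discrete value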

SLIDE 30

Alternating minimization scheme

◮ Our goal is to maximize the variational lower bound of the following objective:

L(θ, φ) = I(x; [z, d]) − β E_{x∼p(x)} DKL(qφ(z | x) || p(z)) − λ DKL(q(d) || p(d))

◮ After rearranging the terms, we arrive at the following optimization problem:

maximize_{θ, φ}   [ maximize_{d(1), . . . , d(n)}   Σ_{i=1}^{n} uθ(i)⊺ d(i) − λ′ Σ_{i≠j} d(i)⊺ d(j) ]   − β Σ_{i=1}^{n} DKL(qφ(z | x(i)) || p(z))

subject to   d(i)⊺1 = 1,   d(i) ∈ {0, 1}^S,   ∀i,

where the bracketed inner maximization is denoted LLB(θ, φ).

Method 14
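A small observation that is not written on the slide but follows directly from the one-hot constraint: the pairwise term depends only on how many examples are assigned to each discrete value. Writing nk = Σ_i d(i)k for the number of examples assigned to value k,

Σ_{i≠j} d(i)⊺ d(j) = Σ_{k=1}^{S} nk (nk − 1),

so the inner problem trades per-example unary scores against a convex penalty on unbalanced assignments; this structure is what the minimum cost flow formulation on the next slides exploits.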

SLIDE 31-42

Finding the most likely discrete configuration

[Figure, built up over several steps: each x(i) is decoded under every candidate one-hot code e1, . . . , eS; the resulting reconstruction terms form the unary score vectors u(1), . . . , u(n), which are passed to the min cost flow solver.]

◮ With the unary terms, we solve the inner maximization problem LLB(θ, φ) over the discrete variables [d(1), . . . , d(n)].1

◮ The maximization problem can be solved exactly in polynomial time via minimum cost flow (MCF), without continuous relaxation.1

1Jeong, Y. and Song, H. O. "Efficient end-to-end learning for quantizable representations", ICML 2018.

Method 15
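The sketch below is one way to write that inner step as an explicit minimum cost flow, following the balanced-assignment idea cited above (it is an illustration, not the authors' implementation; it assumes the networkx library, uses made-up node names, and rounds costs to integers because networkx's min cost flow solver is intended for integer weights). Source arcs force one unit of flow per example, example-to-value arcs carry the negated unary scores, and unit-capacity slot arcs of increasing cost encode the penalty λ′ nk(nk − 1) exactly.

import numpy as np
import networkx as nx

def solve_discrete_configuration(U, lam, scale=1000):
    """Maximize sum_i U[i, d_i] - lam * sum_k n_k (n_k - 1) over hard assignments
    d_i in {0, ..., S-1} by solving an equivalent min cost flow problem."""
    n, S = U.shape
    G = nx.DiGraph()
    G.add_node("s", demand=-n)        # source supplies one unit of flow per example
    G.add_node("t", demand=n)
    for i in range(n):
        G.add_edge("s", ("x", i), capacity=1, weight=0)
        for k in range(S):
            # Choosing value k for example i costs -U[i, k] (we minimize cost).
            G.add_edge(("x", i), ("c", k), capacity=1,
                       weight=int(round(-scale * U[i, k])))
    for k in range(S):
        for j in range(n):
            # The j-th unit routed through value k adds 2 * lam * j, so the
            # total over n_k units is lam * n_k * (n_k - 1): the balancing term.
            G.add_edge(("c", k), ("slot", k, j), capacity=1,
                       weight=int(round(scale * 2 * lam * j)))
            G.add_edge(("slot", k, j), "t", capacity=1, weight=0)
    flow = nx.min_cost_flow(G)
    return np.array([max(range(S), key=lambda k: flow[("x", i)].get(("c", k), 0))
                     for i in range(n)])

# Example: 6 examples, 3 discrete values; lam > 0 discourages assigning every
# example to the single value with the best unary score.
U = np.random.default_rng(0).normal(size=(6, 3))
print(solve_discrete_configuration(U, lam=0.5))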

SLIDE 43-50

Updating the parameters

[Figure, built up over several steps: the min cost flow solver assigns a discrete code d(i) to each example x(i); the decoder then produces the reconstruction x̂(i) from the continuous code and the assigned d(i).]

◮ Then, we update the parameters (θ, φ) under these discrete configurations.

Method 16

SLIDE 51

Outline

Method Experiments Conclusion

Experiments 17

SLIDE 52

Notation

◮ We denote our full method as CascadeVAE.

◮ We evaluate with the disentanglement score introduced in FactorVAE and with unsupervised classification accuracy.

◮ Baselines are β-VAE, FactorVAE, and JointVAE.

Experiments 18

SLIDE 53

dSprites Dataset Example

◮ Shape (discrete): square, ellipse, heart

◮ Scale: 6 values linearly spaced in [0.5, 1]

◮ Orientation: 40 values in [0, 2π]

◮ Position X: 32 values in [0, 1]

◮ Position Y: 32 values in [0, 1]

Experiments 19
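Multiplying the factor grid sizes gives the total number of images in dSprites (a property of the dataset, added here for context):

3 shapes × 6 scales × 40 orientations × 32 × 32 positions = 737,280 images.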

SLIDE 54

Quantitative results on dSprites

Disentanglement score

Method                   m    Mean (std)      Best
β-VAE (β = 10.0)         5    70.11 (7.54)    84.62
β-VAE (β = 4.0)         10    74.41 (7.68)    88.38
FactorVAE                5    81.09 (2.63)    85.12
FactorVAE               10    82.15 (0.88)    88.25
JointVAE                 6    74.51 (5.17)    91.75
JointVAE                 4    73.06 (2.18)    75.38
CascadeVAE (βl = 1.0)    6    90.49 (5.28)    99.50
CascadeVAE (βl = 2.0)    4    91.34 (7.36)    98.62

Unsupervised classification accuracy

Method        m    Mean (std)       Best
JointVAE      6    44.79 (3.88)     53.14
JointVAE      4    43.99 (3.94)     54.11
CascadeVAE    6    78.84 (15.65)    99.66
CascadeVAE    4    76.00 (22.16)    98.72

Experiments 20

SLIDE 55

Outline

Method Experiments Conclusion

Conclusion 21

SLIDE 56

Conclusion

◮ Our experiments show that information cascading and alternating maximization over the discrete and continuous variables lead to state-of-the-art performance in 1) disentanglement score and 2) unsupervised classification accuracy.

◮ The source code is available at https://github.com/snu-mllab/DisentanglementICML19.

Conclusion 22

SLIDE 57

Latent dimension traversal in dSprites

Conclusion 23

SLIDE 58-67

β-VAE and FactorVAE

[Figure: latent traversals of z1, . . . , z5 for β-VAE (top) and FactorVAE (bottom).]

24

SLIDE 68-77

JointVAE

[Figure: latent traversals of z1, . . . , z6 for JointVAE under each discrete code d = [1 0 0], [0 1 0], and [0 0 1].]

25

SLIDE 78-87

CascadeVAE

[Figure: latent traversals of z1, . . . , z6 for CascadeVAE under each discrete code d = [1 0 0], [0 1 0], and [0 0 1].]

26