SLIDE 1

Correlated Variational Auto-Encoders

Da Tang¹, Dawen Liang², Tony Jebara¹·², Nicholas Ruozzi³

¹Columbia University  ²Netflix Inc.  ³The University of Texas at Dallas

June 11, 2019


SLIDE 3

Variational Auto-Encoders (VAEs)

◮ Learn stochastic low-dimensional latent representations for high-dimensional data:

[Figure: encoder qλ(z | x) maps data x to a latent representation z; decoder pθ(x | z) maps z back to a reconstruction of x.]

◮ Model the likelihood and the inference distribution independently among data points in the objective (the ELBO):

L(λ, θ) = Σ_{i=1}^n ( E_{qλ(zi|xi)}[log pθ(xi|zi)] − KL(qλ(zi|xi) ‖ p0(zi)) ).
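The ELBO above can be sketched numerically. A minimal numpy sketch, assuming a diagonal-Gaussian encoder qλ(z|x) = N(μ, diag(σ²)), a standard-normal prior p0, and a Bernoulli decoder; the function names here are illustrative, not from the authors' code:

```python
import numpy as np

def gaussian_kl(mu, logvar):
    """Analytic KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)

def elbo_term(x, mu, logvar, decode, rng):
    """One-sample Monte Carlo estimate of one summand of L(lambda, theta):
    E_q[log p(x|z)] - KL(q(z|x) || p0(z)), with a Bernoulli decoder."""
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps   # reparameterization trick
    probs = decode(z)                     # decoder outputs Bernoulli means
    log_px_z = np.sum(x * np.log(probs) + (1 - x) * np.log(1 - probs))
    return log_px_z - gaussian_kl(mu, logvar)
```

The full objective sums this term over i = 1, …, n; in practice both terms are computed on mini-batches and optimized with stochastic gradients.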


SLIDE 5

Motivation

◮ VAEs assume the prior is i.i.d. among data points.

◮ If we know information about correlations between data points (e.g., networked data), we can incorporate it into the generative process of VAEs.


SLIDE 7

Learning with a Correlation Graph

◮ Given an undirected correlation graph G = (V, E) for data x1, . . . , xn, where V = {v1, . . . , vn} and E = {(vi, vj) : xi and xj are correlated}.

◮ Directly applying a correlated prior over z = (z1, . . . , zn) on general undirected graphs is hard.

SLIDE 8

Correlated Priors

Define the prior of z as a uniform mixture over all Maximal Acyclic Subgraphs of G:

p_G^corr(z) = (1/|A_G|) Σ_{G′=(V,E′) ∈ A_G} p0^{G′}(z),

where A_G denotes the set of all maximal acyclic subgraphs of G.

SLIDE 9

Correlated Priors

We apply a uniform mixture over acyclic subgraphs since we have closed-form correlated distributions for acyclic graphs:

p0^{G′}(z) = ∏_{i=1}^n p0(zi) · ∏_{(vi,vj)∈E′} p0(zi, zj) / (p0(zi) p0(zj)).
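For concreteness, here is a small numpy sketch of log p0^{G′}(z) in the scalar case, assuming each marginal p0(zi) is a standard normal and each edge term p0(zi, zj) is a bivariate standard normal with correlation ρ — an illustrative choice, not necessarily the density used in the paper:

```python
import numpy as np

LOG_2PI = np.log(2.0 * np.pi)

def log_n1(z):
    """log N(z; 0, 1)."""
    return -0.5 * (LOG_2PI + z**2)

def log_n2(zi, zj, rho):
    """log of a bivariate standard normal with correlation rho."""
    q = (zi**2 - 2.0 * rho * zi * zj + zj**2) / (1.0 - rho**2)
    return -LOG_2PI - 0.5 * np.log(1.0 - rho**2) - 0.5 * q

def log_tree_prior(z, edges, rho):
    """log p0^{G'}(z) = sum_i log p0(zi)
       + sum_{(i,j) in E'} [log p0(zi, zj) - log p0(zi) - log p0(zj)]."""
    total = sum(log_n1(zi) for zi in z)
    for i, j in edges:
        total += log_n2(z[i], z[j], rho) - log_n1(z[i]) - log_n1(z[j])
    return total
```

For a two-node graph with a single edge, the marginal terms cancel and this reduces exactly to the bivariate log-density log p0(z1, z2), which is what makes the acyclic factorization consistent.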


SLIDE 12

Inference with a Weighted Objective

Define a new ELBO for general graphs:

log pθ(x) = log E_{p_G^corr(z)}[pθ(x|z)]
          ≥ (1/|A_G|) Σ_{G′∈A_G} ( E_{q_λ^{G′}(z|x)}[log pθ(x|z)] − KL(q_λ^{G′}(z|x) ‖ p0^{G′}(z)) ) := L(λ, θ),

where q_λ^{G′} is defined in the same way as for the priors:

q_λ^{G′}(z) = ∏_{i=1}^n qλ(zi|xi) · ∏_{(vi,vj)∈E′} qλ(zi, zj|xi, xj) / (qλ(zi|xi) qλ(zj|xj)).
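On a tiny graph, the average over A_G can be checked by brute force. For a connected G, the maximal acyclic subgraphs are exactly its spanning trees, so a sketch can enumerate all (n−1)-edge acyclic subsets with a union-find check; the helper names are illustrative:

```python
from itertools import combinations

def is_acyclic_spanning(n, subset):
    """Union-find check that `subset` of n-1 edges contains no cycle
    (which, with n-1 edges on n nodes, makes it a spanning tree)."""
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    for i, j in subset:
        ri, rj = find(i), find(j)
        if ri == rj:
            return False  # adding this edge would close a cycle
        parent[ri] = rj
    return True

def maximal_acyclic_subgraphs(n, edges):
    """Enumerate A_G for a connected graph: all spanning trees."""
    return [t for t in combinations(edges, n - 1) if is_acyclic_spanning(n, t)]

def edge_frequencies(n, edges):
    """Fraction of G' in A_G containing each edge: the per-edge weights
    that turn the average over A_G into a weighted sum over edges."""
    trees = maximal_acyclic_subgraphs(n, edges)
    return {e: sum(e in t for t in trees) / len(trees) for e in edges}
```

On a triangle, for example, there are three spanning trees and every edge appears in two of them, so each edge gets frequency 2/3.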


SLIDE 15

Inference with a Weighted Objective

◮ The loss function is intractable due to the potentially exponentially many acyclic subgraphs.

◮ Represent the average loss over acyclic subgraphs as a weighted average loss on edges.

[Figure: example graphs annotated with per-edge weights such as ⅔, ½, and 1.]

◮ The weighted loss is tractable. The weights can be computed from the pseudo-inverse of the Laplacian matrix of G.
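The closed form can be sketched with numpy: for a uniform distribution over spanning trees, the probability that an edge appears equals its effective resistance, which is read off the Laplacian pseudo-inverse. A sketch under that assumption (the paper's exact weighting may differ in details; the function name is illustrative):

```python
import numpy as np

def edge_weights_via_laplacian(n, edges):
    """Per-edge weights w_e = Pr[e in a uniform random spanning tree of G],
    computed as effective resistances from the Laplacian pseudo-inverse."""
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1.0
        L[j, j] += 1.0
        L[i, j] -= 1.0
        L[j, i] -= 1.0
    L_pinv = np.linalg.pinv(L)  # Moore-Penrose pseudo-inverse
    # Effective resistance of edge (i, j).
    return {
        (i, j): L_pinv[i, i] + L_pinv[j, j] - 2.0 * L_pinv[i, j]
        for i, j in edges
    }
```

On a triangle every edge gets weight 2/3, and a bridge (an edge in every spanning tree) gets weight 1 — so this avoids enumerating the subgraphs while reproducing the same per-edge averages.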

SLIDE 16

Empirical Results

Table: Link prediction test NCRR

Method      Test NCRR
VAE         0.0052 ± 0.0007
GraphSAGE   0.0115 ± 0.0025
CVAE        0.0171 ± 0.0009

Table: Spectral clustering scores

Method      NMI score
VAE         0.0031 ± 0.0059
GraphSAGE   0.0945 ± 0.0607
CVAE        0.2748 ± 0.0462

Table: User matching test RR

Method      Test RR
VAE         0.3498 ± 0.0167
CVAE        0.7129 ± 0.0096


SLIDE 18

Conclusion and Future Work

◮ CVAE accounts for correlations between data points that are known a priori. It can adopt a correlated variational density function to achieve a better variational approximation.

◮ Future work includes extending correlated VAEs to higher-order correlations.

SLIDE 19

Thanks!

Poster #219

Code available at https://github.com/datang1992/Correlated-VAEs.