Latent Normalizing Flows for Discrete Sequences


SLIDE 1

Latent Normalizing Flows for Discrete Sequences

Zachary M. Ziegler, Alexander M. Rush

School of Engineering and Applied Sciences, Harvard University

Poster #3 @ Pacific Ballroom

Latent Normalizing Flows for Discrete Sequences Poster #3 @ Pacific Ballroom 1 / 8

SLIDE 2

Motivation: Normalizing flows

For invertible $f_\theta : \epsilon \to Z$ and base density $p_\epsilon(\epsilon)$,

$$p_Z(z) = p_\epsilon\left(f_\theta^{-1}(z)\right) \left| \det \frac{\partial f_\theta^{-1}(z)}{\partial z} \right|$$

Flows generalize autoregressive models for continuous data, allowing increased model flexibility and non-autoregressive generation.

Kingma and Dhariwal 2018, van den Oord et al. 2017, Rezende and Mohamed 2015
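
The change-of-variables formula on this slide can be checked numerically in the simplest case. The sketch below assumes a one-dimensional affine flow z = mu + sigma * eps with a standard normal base density; the function names and parameter values are illustrative and do not come from the slides. Under that assumption the flow density must match the closed-form Gaussian density N(mu, sigma^2).

```python
import math


def standard_normal_logpdf(eps):
    """Log density of the base distribution p_eps, here N(0, 1)."""
    return -0.5 * (eps * eps + math.log(2.0 * math.pi))


def affine_flow_logpdf(z, mu, sigma):
    """log p_Z(z) via change of variables for z = mu + sigma * eps."""
    # Inverse map: eps = f^{-1}(z)
    eps = (z - mu) / sigma
    # The Jacobian of the inverse is 1 / sigma, so log|det| = -log|sigma|
    log_abs_det = -math.log(abs(sigma))
    return standard_normal_logpdf(eps) + log_abs_det


def gaussian_logpdf(z, mu, sigma):
    """Closed-form log density of N(mu, sigma^2), used as a reference."""
    return (-0.5 * math.log(2.0 * math.pi * sigma * sigma)
            - (z - mu) ** 2 / (2.0 * sigma * sigma))


mu, sigma = 1.5, 0.7  # illustrative values
for z in (-1.0, 0.0, 2.3):
    flow_lp = affine_flow_logpdf(z, mu, sigma)
    ref_lp = gaussian_logpdf(z, mu, sigma)
    print(f"z={z:+.1f}  flow={flow_lp:.6f}  reference={ref_lp:.6f}")
```

The two columns agree up to floating-point rounding, which is the formula above specialized to a scalar affine map.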

SLIDE 3

Goal: Flows for discrete data

For discrete sequences, autoregressive (AR) models trained by maximum likelihood (MLE) are ubiquitous. Can flows go beyond AR models for discrete sequences?

Figure: OpenNMT

SLIDE 4

Challenges and approach

1. Discrete change of variables poses theoretical and practical challenges compared to continuous change of variables.

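Figure: the flow prior maps $\epsilon_{1:T}$ to $z_{1:T}$, from which $x_1, x_2, \ldots, x_T$ are emitted.

$x \in \mathcal{V}^T$, $z \in \mathbb{R}^{T \times H}$

Latent variable model where the prior $p(z_{1:T})$ captures the dynamics of discrete data over time.
Key: weak, conditionally independent emission model.
VAE for inference, optimize ELBO.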
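
For reference, the objective named on this slide can be written out. This is the standard evidence lower bound, specialized under two assumptions that the slide does not spell out: the inference network is written $q(z_{1:T} \mid x_{1:T})$, and the conditionally independent emission model factorizes as $\prod_t p(x_t \mid z_t)$.

```latex
\log p(x_{1:T}) \;\ge\;
\underbrace{\mathbb{E}_{q(z_{1:T} \mid x_{1:T})}
  \left[ \sum_{t=1}^{T} \log p(x_t \mid z_t) \right]}_{\text{reconstruction}}
\;-\;
\underbrace{\mathrm{KL}\!\left( q(z_{1:T} \mid x_{1:T}) \,\middle\|\, p(z_{1:T}) \right)}_{\text{KL}}
```

With a weak emission model, the reconstruction term can carry little information about how tokens depend on each other, so the flow prior $p(z_{1:T})$ inside the KL term has to model the sequence dynamics.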

SLIDE 5

Challenges and approach

1. Discrete change of variables poses theoretical and practical challenges compared to continuous.

2. Discrete data is inherently highly multimodal. Specialized flows for multimodal sequences:
   Model dependencies across dimension and across time.

Figure: three flow variants relating $\epsilon_1, \epsilon_2, \epsilon_3$ to $z_1, z_2, z_3$. Panels: Autoregressive (←), Autoregressive in time (←), Non-autoregressive (→).
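
To make "dependencies across time" concrete, here is a minimal sketch of an affine flow that is autoregressive in time. It assumes a scalar latent per time step (H = 1) and a hand-written `conditioner` function in place of a learned network; both are hypothetical simplifications, not the authors' architecture.

```python
import math
import random


def conditioner(z_prev):
    """Hypothetical stand-in for a learned network: shift and log-scale from z_{t-1}."""
    shift = 0.8 * math.tanh(z_prev)
    log_scale = 0.3 * math.sin(z_prev)
    return shift, log_scale


def generate(eps):
    """Map eps_{1:T} to z_{1:T}. Sequential, because step t needs z_{t-1}."""
    z, z_prev = [], 0.0
    for e in eps:
        shift, log_scale = conditioner(z_prev)
        z_t = shift + math.exp(log_scale) * e
        z.append(z_t)
        z_prev = z_t
    return z


def invert(z):
    """Map z_{1:T} back to eps_{1:T} and return log|det d eps / d z|.

    Every step conditions only on the given z values, so the steps do not
    depend on each other's outputs.
    """
    eps, log_det, z_prev = [], 0.0, 0.0
    for z_t in z:
        shift, log_scale = conditioner(z_prev)
        eps.append((z_t - shift) * math.exp(-log_scale))
        log_det -= log_scale  # triangular Jacobian: sum of the diagonal log terms
        z_prev = z_t
    return eps, log_det


def log_prior(z):
    """log p(z_{1:T}) under a standard normal base density."""
    eps, log_det = invert(z)
    base = sum(-0.5 * (e * e + math.log(2.0 * math.pi)) for e in eps)
    return base + log_det


rng = random.Random(0)
eps = [rng.gauss(0.0, 1.0) for _ in range(5)]
z = generate(eps)
eps_recovered, _ = invert(z)
print(max(abs(a - b) for a, b in zip(eps, eps_recovered)))  # round-trip error
print(log_prior(z))
```

In this sketch, density evaluation touches only known values of z while sampling is sequential. Conditioning on eps instead of z would reverse that trade-off and make generation parallel.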

SLIDE 6

Challenges and approach

1. Discrete change of variables poses theoretical and practical challenges compared to continuous.

2. Discrete data is inherently highly multimodal. Specialized flows for multimodal sequences:
   Model dependencies across dimension and across time.
   Replace underlying affine transformation with non-linear transformation.

Figure: three panels. Example Transform (y against x), Initial and Final Densities (density against x, y), Learned Distribution (y2 against y1).
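
The benefit of a non-linear transformation can be illustrated with a toy example. A single affine map of a Gaussian base stays Gaussian, and therefore unimodal, whereas a monotone non-linear map can split the mass into two modes. The transform below, z = eps + A * tanh(B * eps), is a hypothetical choice made only for illustration; the slide does not state the form of the non-linear transformation the authors use.

```python
import math

A, B = 3.0, 2.0  # hypothetical parameters; A, B > 0 keeps the map strictly increasing


def normal_pdf(eps):
    """Standard normal base density."""
    return math.exp(-0.5 * eps * eps) / math.sqrt(2.0 * math.pi)


def transform(eps):
    """Illustrative invertible non-linear map eps -> z."""
    return eps + A * math.tanh(B * eps)


def transform_derivative(eps):
    """dz/d eps = 1 + A * B * sech^2(B * eps), always positive."""
    return 1.0 + A * B * (1.0 - math.tanh(B * eps) ** 2)


def pushforward_density(eps):
    """Density of z at the point z = transform(eps), by change of variables."""
    return normal_pdf(eps) / transform_derivative(eps)


for eps in (-1.0, 0.0, 1.0):
    print(f"z={transform(eps):+.3f}  p_Z(z)={pushforward_density(eps):.4f}")
```

The density at z = 0 comes out lower than at the two symmetric points on either side, so the transformed distribution has a dip in the middle and is bimodal even though the base density is unimodal.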

SLIDE 7

Experiments: Character-level LM, PTB

Model                          Test NLL   Reconst.   KL
LSTM                           1.41       -          -
Independent-across-time flow   2.90       0.15       2.77
Autoregressive (←)             1.42       0.10       1.37
Autoregressive in time (←)     1.46       0.10       1.43
Non-autoregressive (→)         1.63       0.21       1.55

The KL term always makes up more than 90% of the loss, indicating that the continuous flow models the vast majority of the uncertainty.
Additional experiments on polyphonic music generation.
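
The 90% figure can be reproduced from the numbers in the results table on this slide. The short check below assumes that "loss" refers to the reported test NLL; the slide does not state which denominator is meant, so treat the choice as an assumption.

```python
# (model, test NLL, reconstruction, KL), copied from the results table on this slide
rows = [
    ("Independent-across-time flow", 2.90, 0.15, 2.77),
    ("Autoregressive", 1.42, 0.10, 1.37),
    ("Autoregressive in time", 1.46, 0.10, 1.43),
    ("Non-autoregressive", 1.63, 0.21, 1.55),
]


def kl_share(test_nll, kl):
    """Fraction of the reported test NLL accounted for by the KL term (assumed denominator)."""
    return kl / test_nll


for model, test_nll, _reconst, kl in rows:
    print(f"{model:30s} KL share = {kl_share(test_nll, kl):.3f}")
```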

Figure: example character sequence "_ g r o u p s _".

SLIDE 8

Conclusions

Latent variable model for discrete sequences that models discrete dynamics in a continuous latent space with continuous flows.
See Poster #3 @ Pacific Ballroom for details of the approach, more experimental results, and a generation speed comparison.
