

SLIDE 1

Meta-Amortized Variational Inference and Learning

Kristy Choi

CS236: December 4th, 2019

SLIDE 2

Probabilistic Inference

Probabilistic inference is a particular way of viewing the world:

Prior belief + Observations = Updated (posterior) belief

Typically the beliefs are “hidden” (unobserved), and we want to model them using latent variables.
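Written as an equation, this prior-plus-observations update is Bayes' rule (with z the hidden belief and x the observations); a standard statement, implicit in the slide's picture:

```latex
% Bayes' rule: the posterior combines the prior belief with the observations
p(z \mid x) = \frac{p(x \mid z)\, p(z)}{p(x)}
```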

SLIDE 3

Probabilistic Inference

Many machine learning applications can be cast as probabilistic inference queries:

Bioinformatics, medical diagnosis, human cognition, computer vision.

SLIDE 4

Medical Diagnosis Example

[Graphical model: latent z (identity of disease) → observed x (symptoms), plate over N patients]

Goal: infer the identity of the disease given a set of observed symptoms from a patient population.

SLIDE 5

Exact Inference

[Graphical model: latent z (disease) → observed x (symptoms), plate over N patients]

By Bayes' rule, p(z|x) = p(x|z) p(z) / p(x), but the marginal p(x) = ∫ p(x|z) p(z) dz is an intractable integral: we can't compute it even if we want to. The fix is to approximate the posterior within a family of tractable distributions.

SLIDE 6

Approximate Variational Inference

[Graphical model: latent z (disease) → observed x (symptoms), plate over N patients]

Pick the best tractable approximation q to the posterior by maximizing the evidence lower bound (ELBO): log p(x) ≥ E_{q(z)}[log p(x, z) − log q(z)].

→ turned an intractable inference problem into an optimization problem

Dependence on x: learn a new q per data point (see the sketch below).
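A minimal sketch of this "new q per data point" recipe, assuming a toy model with p(z) = N(0, I) and p(x|z) = N(z, σ²I), fitting a diagonal-Gaussian q by gradient ascent on the reparameterized ELBO; the toy model and all names are illustrative, not from the slides:

```python
import torch

# Toy model: p(z) = N(0, I), p(x | z) = N(z, sigma^2 I).
# Classical (non-amortized) VI: for ONE data point x, fit a
# diagonal-Gaussian q(z) = N(mu, diag(exp(log_var))) by gradient
# ascent on the single-sample reparameterized ELBO.
def fit_q(x, latent_dim=2, sigma=0.5, steps=500, lr=1e-2):
    mu = torch.zeros(latent_dim, requires_grad=True)
    log_var = torch.zeros(latent_dim, requires_grad=True)
    opt = torch.optim.Adam([mu, log_var], lr=lr)
    for _ in range(steps):
        eps = torch.randn(latent_dim)                  # reparameterization trick
        z = mu + (0.5 * log_var).exp() * eps
        # log-densities up to additive constants (enough for gradients)
        log_p_x_given_z = -0.5 * ((x - z) ** 2).sum() / sigma ** 2
        log_p_z = -0.5 * (z ** 2).sum()
        log_q_z = -0.5 * ((z - mu) ** 2 / log_var.exp() + log_var).sum()
        loss = -(log_p_x_given_z + log_p_z - log_q_z)  # negative ELBO
        opt.zero_grad()
        loss.backward()
        opt.step()
    return mu.detach(), log_var.detach()

# Every data point needs its own optimization run:
mu, log_var = fit_q(torch.tensor([1.0, -0.5]))
```

Each new data point repeats the whole loop from scratch, which is exactly the scalability problem amortization addresses on the next slide.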

SLIDE 7

Amortized Variational Inference

[Graphical model: latent z (disease) → observed x (symptoms), plate over N patients]

→ scalability: the VAE formulation

A deterministic mapping (an encoder network) predicts z as a function of x, amortizing inference across all data points (see the sketch below).
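By contrast, a minimal sketch of amortization for the same toy setup: one shared encoder maps any x to the parameters of q(z | x), so inference becomes a single forward pass (architecture and sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Amortized inference network: maps x to the parameters of q(z | x)."""
    def __init__(self, x_dim=2, z_dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * z_dim))

    def forward(self, x):
        mu, log_var = self.net(x).chunk(2, dim=-1)
        return mu, log_var

# One shared set of weights serves every data point:
encoder = Encoder()
x = torch.randn(128, 2)                  # a batch of observations
mu, log_var = encoder(x)                 # inference = one forward pass
z = mu + (0.5 * log_var).exp() * torch.randn_like(mu)  # reparameterized sample
```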

SLIDE 8

Multiple Patient Populations

[Figure: five separate graphical models, one per patient population, each with its own latent z (disease) and observed x (symptoms), plate over N patients]

The doctor is equivalently re-learning how to diagnose an illness for every new population :/

SLIDE 9

Multiple Patient Populations

Share statistical strength across different populations to infer latent representations that transfer to similar, but previously unseen, populations (distributions).

SLIDE 10

(Naive) Meta-Amortized Variational Inference

[Figure: multiple datasets drawn from a meta-distribution over data distributions]

SLIDE 11

Meta-Amortized Variational Inference

[Figure: multiple datasets drawn from the meta-distribution, all served by a single shared meta-inference network]

SLIDE 12

Meta-Inference Network

  • The meta-inference model takes in 2 inputs:
    ○ a marginal distribution
    ○ a query point
  • Mapping: (marginal p(x), query point x) → q(z | x)
  • Parameterize the encoder with a neural network
  • Dataset: represent each marginal distribution as a set of samples

SLIDE 13

In Practice: MetaVAE

The summary network ingests samples from each dataset; the aggregation network performs inference (see the sketch below).

[Architecture diagram: samples → summary network → summary; (summary, query) → aggregation network → z → decoder_i]
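A minimal sketch of this two-network wiring, assuming a DeepSets-style mean-pooled summary and a diagonal-Gaussian q; module names and sizes are illustrative rather than the paper's exact architecture:

```python
import torch
import torch.nn as nn

class MetaInference(nn.Module):
    """Summary network over a dataset's samples + aggregation network
    that combines the dataset summary with a query point."""
    def __init__(self, x_dim=2, z_dim=2, hidden=64, summary_dim=32):
        super().__init__()
        self.summary = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, summary_dim))
        self.aggregate = nn.Sequential(
            nn.Linear(summary_dim + x_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * z_dim))

    def forward(self, dataset, query):
        # dataset: (num_samples, x_dim) -- the marginal, as a set of samples
        # query:   (batch, x_dim)       -- points to infer z for
        s = self.summary(dataset).mean(dim=0)   # permutation-invariant summary
        s = s.expand(query.shape[0], -1)        # one copy per query point
        out = self.aggregate(torch.cat([s, query], dim=-1))
        mu, log_var = out.chunk(2, dim=-1)
        return mu, log_var

net = MetaInference()
dataset = torch.randn(100, 2)      # samples representing one marginal p_i(x)
query = torch.randn(8, 2)
mu, log_var = net(dataset, query)  # q(z | query, p_i) for each query point
```

Mean pooling makes the summary permutation-invariant, which matches representing each marginal as an unordered set of samples.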

SLIDE 14

Related Work

[Figure: graphical models side by side. VAE: z → x, plate over N. MetaVAE: z → x_T conditioned on a dataset x_D, with x_D != x_T. Neural Statistician: dataset-level latent c over z → x, with x_D = x_T. Variational Homoencoder (VHE): dataset-level latent c, with x_D != x_T.]

MetaVAE avoids the restrictive assumption of a global prior over datasets, p(c).

SLIDE 15

Intuition: Clustering Mixtures of Gaussians

Learns how to cluster: for 50 datasets, MetaVAE achieves 9.9% clustering error, while VAE gets 27.9%

[Figure: learned clusters, colored by inferred assignment z = 0 vs. z = 1]
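A minimal sketch of the kind of synthetic setup this suggests, with details assumed rather than taken from the slides: each dataset is its own 2-component mixture of Gaussians, and a binary latent plays the role of the cluster assignment:

```python
import torch

def sample_mog_dataset(n=100, d=2, spread=4.0):
    """One synthetic dataset: a 2-component mixture of Gaussians with
    randomly placed means; z is the ground-truth cluster assignment."""
    means = spread * torch.randn(2, d)   # this dataset's two components
    z = torch.randint(0, 2, (n,))        # latent cluster assignment
    x = means[z] + torch.randn(n, d)
    return x, z

# A meta-distribution over datasets: every call yields a new mixture,
# so an inference network must adapt to each dataset's component means.
datasets = [sample_mog_dataset() for _ in range(50)]
```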

SLIDE 16

Learning Invariant Representations

  • Apply various transformations
  • Amortize over subsets of transformations, learn representations
  • Test representations on held-out transformations (classification)

SLIDE 17

Invariance Experiment Results

MetaVAE representations consistently outperform NS/VHE on the MNIST and NORB datasets.

SLIDE 18

Analysis

MetaVAE representations tend not to change much within a family of transformations they were amortized over, as desired.

SLIDE 19

Conclusion

  • Developed an algorithm for a family of probabilistic models: the meta-amortized inference paradigm
  • MetaVAE learns transferable representations that generalize well across similar data distributions in downstream tasks
  • Limitations:
    ○ No sampling
    ○ Semi-parametric
    ○ Arbitrary dataset construction
  • Paper: https://arxiv.org/pdf/1902.01950.pdf

SLIDE 20

Encoding Musical Style with Transformer Autoencoders

SLIDE 21

Generative Models for Music

  • Generating music is a challenging problem, as music contains structure at multiple timescales.
    ○ Periodicity, repetition
  • Coherence in style and rhythm across (long) time periods!

Raw audio: WaveNet, GANs, etc. Symbolic: RNNs, LSTMs, etc.

SLIDE 22

Music Transformer

  • Symbolic: an event-based representation that allows for the generation of expressive performances (without generating a score)
  • Current SOTA in music generation
    ○ Can generate music over 60 seconds in length
  • Attention-based
    ○ Replaces self-attention with relative attention

SLIDE 23

What We Want

  • Control music generation using either (1) performance or (2) melody + performance as conditioning
  • Generate pieces that sound similar in style to input pieces!

SLIDE 24

Transformer Autoencoder

Three ways to combine the conditioning (style) embedding with the decoder input (see the sketch below):

  1. Sum
  2. Concatenation
  3. Tile (along the temporal dimension)
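A minimal sketch of the three combination strategies, assuming a per-piece style embedding and a sequence of decoder input embeddings; shapes and variable names are illustrative:

```python
import torch

batch, time, dim = 4, 128, 512
seq = torch.randn(batch, time, dim)     # decoder input embeddings
style = torch.randn(batch, dim)         # one style embedding per piece

# 1. Sum: broadcast-add the style vector to every timestep.
combined_sum = seq + style.unsqueeze(1)                      # (batch, time, dim)

# 2. Concatenation: prepend the style vector once, as an extra "token".
combined_cat = torch.cat([style.unsqueeze(1), seq], dim=1)   # (batch, time + 1, dim)

# 3. Tile: repeat the style vector along the temporal dimension and
#    concatenate feature-wise at every timestep.
tiled = style.unsqueeze(1).expand(-1, time, -1)              # (batch, time, dim)
combined_tile = torch.cat([seq, tiled], dim=-1)              # (batch, time, 2 * dim)
```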

SLIDE 25

Quantitative Metrics

The Transformer autoencoder (both performance-only and melody & performance) outperforms baselines in generating similar pieces!

SLIDE 26

Samples

  • Conditioning performance + "Twinkle, Twinkle" melody → generated performance: "Twinkle, Twinkle" in the style of the conditioning performance
  • Conditioning performance + "Clair de Lune" melody → generated performance: "Clair de Lune" in the style of the conditioning performance

SLIDE 27

Conclusion

  • Developed a method for controllable generation with high-level controls for music
    ○ Demonstrated efficacy both quantitatively and through qualitative listening tests
  • Thanks!
    ○ Stanford: Mike Wu, Noah Goodman, Stefano Ermon
    ○ Magenta @ Google Brain: Jesse Engel, Ian Simon, Curtis "Fjord" Hawthorne, Monica Dinculescu

SLIDE 28

References

1. Edwards, H., and Storkey, A. Towards a Neural Statistician. 2016.
2. Hewitt, L. B.; Nye, M. I.; Gane, A.; Jaakkola, T.; and Tenenbaum, J. B. The Variational Homoencoder. 2018.
3. Kingma, D. P., and Welling, M. Auto-Encoding Variational Bayes. 2013.
4. Gershman, S., and Goodman, N. Amortized Inference in Probabilistic Reasoning. 2014.
5. Jordan, M. I.; Ghahramani, Z.; Jaakkola, T. S.; and Saul, L. K. An Introduction to Variational Methods for Graphical Models. 1999.
6. Blei, D. M.; Kucukelbir, A.; and McAuliffe, J. D. Variational Inference: A Review for Statisticians. 2017.
7. Huang, C.-Z. A.; Vaswani, A.; Uszkoreit, J.; Shazeer, N.; Simon, I.; Hawthorne, C.; Dai, A. M.; Hoffman, M. D.; Dinculescu, M.; and Eck, D. Music Transformer. 2019.
8. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, L.; and Polosukhin, I. Attention Is All You Need. 2017.
9. Shaw, P.; Uszkoreit, J.; and Vaswani, A. Self-Attention with Relative Position Representations. 2018.
10. https://magenta.tensorflow.org/music-transformer
11. Engel, J.; Agrawal, K. K.; Chen, S.; Gulrajani, I.; Donahue, C.; and Roberts, A. Adversarial Neural Audio Synthesis. 2019.
12. van den Oord, A.; Dieleman, S.; Zen, H.; Simonyan, K.; Vinyals, O.; Graves, A.; Kalchbrenner, N.; Senior, A.; and Kavukcuoglu, K. WaveNet: A Generative Model for Raw Audio. 2016.
13. Kalchbrenner, N.; Elsen, E.; Simonyan, K.; Noury, S.; Casagrande, N.; Lockhart, E.; Stimberg, F.; van den Oord, A.; Dieleman, S.; and Kavukcuoglu, K. Efficient Neural Audio Synthesis. 2018.