SLIDE 1

CSC2541: Differentiable Inference and Generative Models

SLIDE 2

Density estimation using Real NVP. Dinh et al., 2016

SLIDE 3

Nguyen A, Dosovitskiy A, Yosinski J, Brox T, Clune J (2016). Synthesizing the preferred inputs for neurons in neural networks via deep generator networks. Advances in Neural Information Processing Systems 29

SLIDE 4

Density estimation using Real NVP. Dinh et al., 2016

SLIDE 5

A group of people are watching a dog ride (Jamie Kiros)

SLIDE 6

Pixel Recurrent Neural Networks Aaron van den Oord, Nal Kalchbrenner, Koray Kavukcuoglu

SLIDE 7

Types of Generative Models

  • Conditional probabilistic models
  • Latent-variable probabilistic models
  • GANs
  • Invertible models
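A hedged summary of how these four families typically define a density over data x (my own gloss, not taken from the slides):

    Conditional / autoregressive:  p(x) = ∏_i p(x_i | x_{<i})
    Latent-variable:               p(x) = ∫ p(x | z) p(z) dz
    GAN (implicit):                x = G(z), z ~ p(z)    (no explicit density; trained against a discriminator)
    Invertible (flow):             p(x) = p(z) · |det ∂f⁻¹(x)/∂x|,  where z = f⁻¹(x)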
SLIDE 8

Advantages of latent variable models

  • Model checking by sampling
  • Natural way to specify models
  • Compact representations
  • Semi-supervised learning
  • Understanding factors of variation in data
SLIDE 9

Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks Alec Radford, Luke Metz, Soumith Chintala

SLIDE 10

Advantages of probabilistic latent-variable models

  • Data-efficient learning - automatic regularization, can take advantage of more information
  • Compose models - e.g. incorporate a data corruption model. Different from composing feedforward computations
  • Handle missing data (without the standard hack of just guessing the missing values using averages)
  • Predictive uncertainty - necessary for decision-making
  • Conditional predictions (e.g. if Brexit happens, the value of the pound will fall)
  • Active learning - what data would be expected to increase our confidence about a prediction
  • Cons: intractable integral over latent variables (see the sketch below)
  • Examples: medical diagnosis, image modeling
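To make the "intractable integral" concrete (standard identities, not from the slide): computing the marginal likelihood of an observation x means integrating over the latent variables, which has no closed form once p_θ(x | z) is a neural network. Variational inference, covered later in the course, works with a tractable lower bound instead:

    p_θ(x) = ∫ p_θ(x | z) p(z) dz

    log p_θ(x) ≥ E_{q_φ(z|x)}[ log p_θ(x | z) ] − KL( q_φ(z | x) ‖ p(z) )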
SLIDE 11
SLIDE 12
SLIDE 13
SLIDE 14
SLIDE 15

[1] Palmer, Wipf, Kreutz-Delgado, and Rao. Variational EM algorithms for non-Gaussian latent variable models. NIPS 2005.
[2] Ghahramani and Beal. Propagation algorithms for variational Bayesian learning. NIPS 2001.
[3] Beal. Variational algorithms for approximate Bayesian inference, Ch. 3. U of London Ph.D. Thesis 2003.
[4] Ghahramani and Hinton. Variational learning for switching state-space models. Neural Computation 2000.
[5] Jordan and Jacobs. Hierarchical Mixtures of Experts and the EM algorithm. Neural Computation 1994.
[6] Bengio and Frasconi. An Input Output HMM Architecture. NIPS 1995.
[7] Ghahramani and Jordan. Factorial Hidden Markov Models. Machine Learning 1997.
[8] Bach and Jordan. A probabilistic interpretation of Canonical Correlation Analysis. Tech. Report 2005.
[9] Archambeau and Bach. Sparse probabilistic projections. NIPS 2008.
[10] Hoffman, Bach, Blei. Online learning for Latent Dirichlet Allocation. NIPS 2010.

[Figure: family tree of classical latent-variable models, with citations [1]-[10] — Gaussian mixture model, Linear dynamical system, Hidden Markov model, Switching LDS, Canonical correlations analysis [8,9], admixture / LDA / NMF [10], Mixture of Experts [5], Driven LDS, IO-HMM [6], Factorial HMM [7]]

Courtesy of Matthew Johnson

SLIDE 16

Differentiable models

  • Model distributions implicitly by a variable pushed through a deep net:  y = fθ(x)
  • Approximate an intractable distribution by a tractable distribution parameterized by a deep net:  p(y|x) = N(y | µ = fθ(x), Σ = gθ(x))
  • Optimize all parameters using stochastic gradient descent (see the sketch below)
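A minimal sketch of this recipe (assuming PyTorch; the network sizes and toy data are placeholders, not from the course): a deep net outputs the mean and log-variance of a Gaussian p(y|x), and all parameters are fit by stochastic gradient descent on the negative log-likelihood.

    import torch
    import torch.nn as nn

    class GaussianNet(nn.Module):
        """Deep net parameterizing p(y|x) = N(y | mu = f_theta(x), Sigma = g_theta(x))."""
        def __init__(self, x_dim=10, y_dim=1, hidden=64):
            super().__init__()
            self.body = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU())
            self.mu_head = nn.Linear(hidden, y_dim)      # f_theta(x)
            self.logvar_head = nn.Linear(hidden, y_dim)  # log of g_theta(x), kept in log-space for stability

        def forward(self, x):
            h = self.body(x)
            return self.mu_head(h), self.logvar_head(h)

    # Toy placeholder data; in practice (x, y) come from the problem at hand.
    x = torch.randn(256, 10)
    y = torch.randn(256, 1)

    model = GaussianNet()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    for step in range(100):
        mu, logvar = model(x)
        dist = torch.distributions.Normal(mu, torch.exp(0.5 * logvar))
        loss = -dist.log_prob(y).mean()  # negative log-likelihood of the Gaussian
        opt.zero_grad()
        loss.backward()                  # gradients w.r.t. all parameters at once
        opt.step()                       # stochastic gradient (Adam) step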

SLIDE 17
SLIDE 18

Probabilistic graphical models
  + structured representations
  + priors and uncertainty
  + data and computational efficiency
  – rigid assumptions may not fit
  – feature engineering
  – top-down inference

Deep learning
  – neural net “goo”
  – difficult parameterization
  – can require lots of data
  + flexible
  + feature learning
  + recognition networks

SLIDE 19
SLIDE 20
SLIDE 21
SLIDE 22

Machine-learning-centric History of Generative Models

  • 1940s - 1960s Motivating probability and Bayesian inference
  • 1980s - 2000s Bayesian machine learning with MCMC
  • 1990s - 2000s Graphical models with exact inference
  • 1990s - present Bayesian Nonparametrics with MCMC (Indian Buffet process, Chinese restaurant process)

  • 1990s - 2000s Bayesian ML with mean-field variational inference
  • 1995 Helmholtz machine (almost invented variational autoencoders)
  • 2000s - present Probabilistic Programming
  • 2000s - 2013 Deep undirected graphical models (RBMs, pretraining)
  • 2010s - present Stan - Bayesian Data Analysis with HMC
  • 2000s - 2013 Autoencoders, denoising autoencoders
  • 2000s - present Invertible density estimation
  • 2013 - present Variational autoencoders
  • 2014 - present Generative adversarial nets
SLIDE 23

Frontiers

  • Generate images given captions
  • Generating large structures
  • images with consistent internal structure that are not blurry
  • videos
  • long texts
  • Discrete latent random variables
  • Generate complex discrete structures
  • Time-series models for reinforcement learning
SLIDE 24

Nguyen A, Dosovitskiy A, Yosinski J, Brox T, Clune J (2016). Synthesizing the preferred inputs for neurons in neural networks via deep generator networks. Advances in Neural Information Processing Systems 29

SLIDE 25

Density estimation using Real NVP. Dinh et al., 2016

SLIDE 26
SLIDE 27

Modeling idea: graphical models on latent variables, neural network models for observations

Composing graphical models with neural networks for structured representations and fast inference. Johnson, Duvenaud, Wiltschko, Datta, Adams, NIPS 2016
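A hedged sketch of this composition (my own notation, following the general recipe of the cited paper rather than any single model from it): keep a conjugate graphical-model prior over the latent variables, where exact message passing is available, and attach a flexible neural-network likelihood to the observations.

    p(z)        structured prior (e.g. mixture, HMM, switching LDS) — supports exact inference messages
    p(y | z)    neural network observation model, e.g. p(y_t | z_t) = N( y_t | µ_θ(z_t), Σ_θ(z_t) )
    p(y, z) = p(z) · ∏_t p(y_t | z_t)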

SLIDE 28

[Figure: unsupervised learning vs. supervised learning]

Courtesy of Matthew Johnson

SLIDE 29

[Figure: panels showing data space and latent space]

SLIDE 30
SLIDE 31

[Figure: depth-video frames, axes in mm]

Application: learn syllable representation of behavior from video

SLIDE 32
SLIDE 33

[Figure: depth-video frames, axes in mm]

[Graphical model: latent chains z1 … z7 and x1 … x7 generating observations y1 … y7, with shared parameters θ]

Courtesy of Matthew Johnson
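One plausible reading of this chain-structured model (an assumption based on the node layout, not stated on the slide): discrete states z_t switch the dynamics of continuous latents x_t, and each frame y_t is decoded from x_t with shared parameters θ:

    p(z_{1:T}, x_{1:T}, y_{1:T}) = p(z_1) p(x_1 | z_1) ∏_{t=2..T} p(z_t | z_{t−1}) p(x_t | x_{t−1}, z_t) · ∏_{t=1..T} p(y_t | x_t ; θ)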

SLIDE 34
SLIDE 35

start rear

SLIDE 36

fall from rear

SLIDE 37

grooming

SLIDE 38

From Carl Rasmussen

SLIDE 39

Seminars

  • 7 weeks of seminars, about 8 people each
  • Each day will have one or two major themes, 3-6 papers covered
  • Divided into 2-3 presentations of about 30 mins each
  • Explain main idea, relate to previous work and future directions

SLIDE 40

Class Projects

  • Develop a generative model for a new medium.
  • Generate sound given video (hard to generate raw sound)
  • Automatic onomatopoeia: Generate text ‘ka-bloom-kshhhh’ given a sound of an explosion.
  • Generating text of a specific style. For instance, generating SMILES strings representing organic molecules

SLIDE 41

Class Projects

  • Extend existing models, inference, or training. For instance:
  • Extending variational autoencoders to have infinite capacity in some sense (combining Nonparametric Bayesian methods with variational autoencoders)
  • Train a VAE or GAN for matrix decomposition
  • Explore the use of mixture distributions for approximating distributions

SLIDE 42

Class Projects

  • Apply an existing approach in a new way.
  • Missing data (not at random)
  • Automatic data cleaning (flagging suspect entries)
  • Simultaneous localization and mapping (SLAM) from scratch

SLIDE 43

Class Projects

  • Review / comparison / tutorials:
  • Approaches to generating images
  • Approaches to generating video
  • Approaches to handling discrete latent variables
  • Approaches to building invertible yet general transformations
  • Variants of the GAN training objective
  • Different types of recognition networks
  • Clearly articulate the differences between different approaches, and their strengths and weaknesses.
  • Ideally, include experiments highlighting the different properties of each method on realistic problems.

SLIDE 44

Class Project Dates

  • Project proposal due Oct 14th
  • about 2 pages, include prelim. lit search
  • Presentations: Nov 18th and 25th
  • Projects due: Dec 10th
SLIDE 45

Grades

  • Class presentations - 20%
  • Project proposal - 20%
  • Project presentation - 20%
  • Project report and code - 40%
SLIDE 46

Quiz