Composing graphical models with neural networks for structured representations and fast inference
Matt Johnson, David Duvenaud, Alex Wiltschko, Bob Datta, Ryan Adams
[Depth video frames of mouse behavior (axes in mm), segmented into syllables such as pause, rear, and dart]
[1] Lee and Glass. A Nonparametric Bayesian Approach to Acoustic Model Discovery. ACL 2012.
[2] Lee. Discovering Linguistic Structures in Speech: Models and Applications. MIT Ph.D. Thesis 2014.
Phonetic segmentation example [1,2]: /b/ /ax/ /n/ /ae/ /n/ /ax/
[Depth video frames of mouse behavior; Alexander Wiltschko, Matthew Johnson, et al., Neuron 2015]
[Figure: frames of the depth video lie on an image manifold; each frame is summarized by low-dimensional manifold coordinates. Example syllables: rear, dart.]
[1] Srivastava, Mansimov, and Salakhutdinov. Unsupervised Learning of Video Representations using LSTMs. ICML 2015.
[2] Ranzato, Marc'Aurelio, et al. Video (language) modeling: a baseline for generative models of natural videos. Preprint 2015.
[3] Sutskever, Hinton, and Taylor. The Recurrent Temporal Restricted Boltzmann Machine. NIPS 2008.
Recurrent neural networks? [1,2,3]
[Figure 1. LSTM unit]
[Figure 2. LSTM Autoencoder Model: input frames v1, v2, v3 are encoded into a learned representation, then decoded to reconstructions v̂3, v̂2, v̂1]
Probabilistic graphical models? [4,5,6]
[4] Fox, Sudderth, Jordan, and Willsky. Bayesian nonparametric inference of switching dynamic linear models. IEEE TSP 2011.
[5] Johnson and Willsky. Bayesian nonparametric hidden semi-Markov models. JMLR 2013.
[6] Murphy. Machine learning: a probabilistic perspective. MIT Press 2012.
[Diagram: unsupervised learning vs. supervised learning]
Probabilistic graphical models
+ structured representations
+ priors and uncertainty
+ data and computational efficiency
– rigid assumptions may not fit
– feature engineering
– top-down inference

Deep learning
+ flexible
+ feature learning
+ recognition networks
– neural net "goo"
– difficult parameterization
– can require lots of data
Modeling idea: graphical models on latent variables, neural network models for observations
Application: learn syllable representation of behavior from video
Inference: recognition networks output conjugate potentials, then apply fast graphical model inference
Modeling idea: graphical models on latent variables, neural network models for observations
Generative model: switching linear dynamical system
discrete states: zt+1 ∼ π(zt), with transition distributions π = (π(1), π(2), π(3))
continuous states: xt+1 = A(zt) xt + B(zt) ut, with ut ∼ N(0, I) iid and per-state dynamics parameters A(1), A(2), A(3) and B(1), B(2), B(3)
[Graphical model: discrete chain z1, …, z7 driving continuous chain x1, …, x7]
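As a concrete illustration (not part of the original slides), here is a minimal numpy sketch of ancestral sampling from this switching linear dynamical system; the dimensions and parameter values are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_slds(pi_init, pi, A, B, T):
    """Ancestral sampling from an SLDS:
    z_{t+1} ~ pi[z_t],  x_{t+1} = A[z_t] x_t + B[z_t] u_t,  u_t ~ N(0, I)."""
    K, n = A.shape[0], A.shape[1]
    z = np.zeros(T, dtype=int)
    x = np.zeros((T, n))
    z[0] = rng.choice(K, p=pi_init)
    x[0] = rng.standard_normal(n)
    for t in range(T - 1):
        z[t + 1] = rng.choice(K, p=pi[z[t]])        # discrete Markov step
        u = rng.standard_normal(B.shape[2])         # u_t ~ N(0, I)
        x[t + 1] = A[z[t]] @ x[t] + B[z[t]] @ u     # switching linear dynamics
    return z, x

# Toy parameters: K = 3 discrete states, 2-dimensional continuous state.
K, n = 3, 2
pi_init = np.full(K, 1.0 / K)
pi = 0.9 * np.eye(K) + 0.1 / (K - 1) * (1 - np.eye(K))   # sticky transitions
A = np.stack([0.95 * np.eye(n) for _ in range(K)])        # stable dynamics per state
B = np.stack([0.1 * np.eye(n) for _ in range(K)])
z, x = sample_slds(pi_init, pi, A, B, T=100)
```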
[Graphical model: global parameters θ; discrete states z1, …, z7; continuous states x1, …, x7; observations y1, …, y7]
yt | xt, γ ∼ N(µ(xt; γ), Σ(xt; γ))
[Neural network with parameters γ maps xt to µ(xt; γ) and diag(Σ(xt; γ))]
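For concreteness, a minimal sketch of such an observation model (the architecture is an assumption for illustration, not the authors' exact network): a small fully connected network with parameters γ maps a latent xt to a Gaussian mean and diagonal covariance.

```python
import numpy as np

def init_decoder(rng, n_latent, n_hidden, n_obs):
    """Randomly initialize decoder parameters gamma = (W1, b1, W2, b2)."""
    return dict(
        W1=0.1 * rng.standard_normal((n_latent, n_hidden)), b1=np.zeros(n_hidden),
        W2=0.1 * rng.standard_normal((n_hidden, 2 * n_obs)), b2=np.zeros(2 * n_obs))

def decode(gamma, x):
    """Map a latent x_t to the mean and diagonal covariance of p(y_t | x_t, gamma)."""
    h = np.tanh(x @ gamma["W1"] + gamma["b1"])
    out = h @ gamma["W2"] + gamma["b2"]
    mu, log_sigmasq = np.split(out, 2, axis=-1)
    return mu, np.exp(log_sigmasq)                 # mean and diag(Sigma)

rng = np.random.default_rng(0)
gamma = init_decoder(rng, n_latent=2, n_hidden=32, n_obs=10)
mu, sigmasq = decode(gamma, rng.standard_normal(2))
y = mu + np.sqrt(sigmasq) * rng.standard_normal(10)   # sample y_t | x_t, gamma
```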
[Graphical model with global parameters θ (latent dynamics) and γ (observation network), local latents zn, xn, and observations yn, with a plate over data points n]
p(θ): conjugate prior on global variables
p(x | θ): exponential family on local variables
p(γ): any prior on observation parameters
p(y | x, γ): neural network observation model
Example latent graphical models in this family: Gaussian mixture model [1], linear dynamical system [2], hidden Markov model [3], switching LDS [4], mixture of experts [5], IO-HMM [6], factorial HMM [7], driven LDS, canonical correlation analysis [8,9], admixture / LDA / NMF [10].

[1] Palmer, Wipf, Kreutz-Delgado, and Rao. Variational EM algorithms for non-Gaussian latent variable models. NIPS 2005.
[2] Ghahramani and Beal. Propagation algorithms for variational Bayesian learning. NIPS 2001.
[3] Beal. Variational algorithms for approximate Bayesian inference, Ch. 3. U of London Ph.D. Thesis 2003.
[4] Ghahramani and Hinton. Variational learning for switching state-space models. Neural Computation 2000.
[5] Jordan and Jacobs. Hierarchical Mixtures of Experts and the EM algorithm. Neural Computation 1994.
[6] Bengio and Frasconi. An Input Output HMM Architecture. NIPS 1995.
[7] Ghahramani and Jordan. Factorial Hidden Markov Models. Machine Learning 1997.
[8] Bach and Jordan. A probabilistic interpretation of Canonical Correlation Analysis. Tech. Report 2005.
[9] Archambeau and Bach. Sparse probabilistic projections. NIPS 2008.
[10] Hoffman, Bach, and Blei. Online learning for Latent Dirichlet Allocation. NIPS 2010.
Inference?
Mean field family: q(θ) q(x) ≈ p(θ, x | y), maximizing the evidence lower bound
L[ q(θ)q(x) ] ≜ E_{q(θ)q(x)}[ log ( p(θ, x, y) / ( q(θ) q(x) ) ) ]
with natural parameters q(θ) ↔ ηθ and q(x) ↔ ηx.

Conjugate case:
p(x | θ) is a linear dynamical system
p(y | x, θ) is linear-Gaussian
p(θ) is a conjugate prior
In terms of the natural parameters, write
L(ηθ, ηx) ≜ E_{q(θ)q(x)}[ log ( p(θ, x, y) / ( q(θ) q(x) ) ) ]
Proposition (natural gradient SVI of Hoffman et al. 2013)
With η∗x(ηθ) ≜ arg max_{ηx} L(ηθ, ηx) and LSVI(ηθ) ≜ L(ηθ, η∗x(ηθ)), the natural gradient is
∇̃ LSVI(ηθ) = ηθ⁰ + E_{q∗(x)}[ (txy(x, y), 1) ] − ηθ
With N observation sequences (plate over n = 1, …, N), the same assumptions and objective L(ηθ, ηx) apply, and the natural gradient sums over the data:
Proposition (natural gradient SVI of Hoffman et al. 2013)
∇̃ LSVI(ηθ) = ηθ⁰ + Σ_{n=1}^{N} E_{q∗(xn)}[ (txy(xn, yn), 1) ] − ηθ
where η∗x(ηθ) ≜ arg max_{ηx} L(ηθ, ηx) and LSVI(ηθ) ≜ L(ηθ, η∗x(ηθ)).
Step 1: compute evidence potentials
Step 2: run fast message passing
Step 3: compute natural gradient

[1] Johnson and Willsky. Stochastic variational inference for Bayesian time series models. ICML 2014.
[2] Foti, Xu, Laird, and Fox. Stochastic variational inference for hidden Markov models. NIPS 2014.
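To illustrate the natural gradient update itself, here is a runnable toy sketch (not the authors' code) for the simplest conjugate case, a Gaussian mean with known variance. There are no local latent variables here, so Steps 1–2 are trivial and only the Step 3 update is shown; in the SLDS setting the evidence potentials and expected statistics would instead come from Kalman smoothing and HMM message passing.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy conjugate model: y_n ~ N(mu, sigma^2) with known sigma, prior mu ~ N(mu0, sigma0^2).
# Natural parameters of a Gaussian over mu: eta = (mu / var, -1 / (2 var)).
sigma, mu0, sigma0 = 1.0, 0.0, 10.0
eta_prior = np.array([mu0 / sigma0**2, -0.5 / sigma0**2])

N = 10_000
data = rng.normal(3.0, sigma, size=N)          # true mean is 3.0

def suff_stats(y):
    """Minibatch contribution to the global natural parameters."""
    return np.array([y.sum() / sigma**2, -0.5 * len(y) / sigma**2])

eta = eta_prior.copy()
for step in range(1, 501):
    batch = rng.choice(data, size=50)                                    # minibatch
    nat_grad = eta_prior + (N / len(batch)) * suff_stats(batch) - eta    # natural gradient
    rho = (step + 10.0) ** -0.7                                          # decaying step size
    eta = eta + rho * nat_grad                                           # SVI update

post_var = -0.5 / eta[1]
post_mean = eta[0] * post_var
print(post_mean, post_var)   # posterior mean near 3.0, posterior variance near sigma^2 / N
```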
Natural gradient SVI: q∗(x) ≜ arg max_{q(x)} L[ q(θ)q(x) ]
+ optimal local factor
+ exploits conjugate graph structure
+ natural gradients
– expensive for general observations

Variational autoencoders [1,2]: q∗(x) ≜ N(x | µ(y; φ), Σ(y; φ))
+ fast for general observations
– suboptimal local factor
– does all local inference
– no natural gradients

Structured VAEs: q∗(x) ≜ ?
± optimal given conjugate evidence
+ fast for general observations
+ exploits conjugate graph structure
+ natural gradients
[1] Kingma and Welling. Auto-encoding variational Bayes. ICLR 2014.
[2] Rezende, Mohamed, and Wierstra. Stochastic backpropagation and approximate inference in deep generative models. ICML 2014.
Inference: recognition networks output conjugate potentials, then apply fast graphical model inference
Factors and natural parameters: q(θ) ↔ ηθ, q(γ) ↔ ηγ, q(x) ↔ ηx
L[ q(θ)q(γ)q(x) ] ≜ E_{q(θ)q(γ)q(x)}[ log ( p(θ, γ, x) p(y | x, γ) / ( q(θ) q(γ) q(x) ) ) ]
In terms of natural parameters,
L(ηθ, ηγ, ηx) ≜ E_{q(θ)q(γ)q(x)}[ log ( p(θ, γ, x) p(y | x, γ) / ( q(θ) q(γ) q(x) ) ) ]

Surrogate objective: replace the non-conjugate likelihood term E_{q(γ)} log p(yt | xt, γ) with a recognition-network potential ψ(xt; yt, φ):
L̂(ηθ, ηx, φ) ≜ E_{q(θ)q(γ)q(x)}[ log ( p(θ, γ, x) exp{ψ(x; y, φ)} / ( q(θ) q(γ) q(x) ) ) ]
where ψ(x; y, φ) is a conjugate potential for p(x | θ).

η∗x(ηθ, φ) ≜ arg max_{ηx} L̂(ηθ, ηx, φ)
LSVAE(ηθ, ηγ, φ) ≜ L(ηθ, ηγ, η∗x(ηθ, φ))
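As an illustration of what a recognition network that outputs conjugate potentials might look like, here is a small sketch (the architecture and parameterization are assumptions, not taken from the slides): a network maps each yt to the natural parameters of a diagonal Gaussian potential on xt.

```python
import numpy as np

def init_recognition(rng, n_obs, n_hidden, n_latent):
    """Recognition network parameters phi."""
    return dict(
        W1=0.1 * rng.standard_normal((n_obs, n_hidden)), b1=np.zeros(n_hidden),
        W2=0.1 * rng.standard_normal((n_hidden, 2 * n_latent)), b2=np.zeros(2 * n_latent))

def recognition_potential(phi, y):
    """Map an observation y_t to Gaussian natural parameters (h, J) on x_t, so that
    psi(x_t; y_t, phi) = h^T x_t - 0.5 * x_t^T diag(J) x_t (up to a constant)."""
    hid = np.tanh(y @ phi["W1"] + phi["b1"])
    out = hid @ phi["W2"] + phi["b2"]
    h, log_J = np.split(out, 2, axis=-1)
    return h, np.exp(log_J)    # linear term and diagonal precision (J > 0 by construction)
```

These Gaussian potentials play the role of the conjugate evidence terms, so the graphical model machinery (message passing for the LDS/HMM prior) can be reused unchanged.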
Proposition (log evidence lower bound)
LSVAE(ηθ, ηγ, φ) ≤ max_{ηx} L(ηθ, ηγ, ηx) ≤ log p(y)   for all ηθ, ηγ
Fact (conjugate graphical models are easy)
The local variational parameter η∗x(ηθ, φ) is easy to compute.
Moreover, if ∃ φ ∈ Rᵐ with ψ(x; y, φ) = E_{q(γ)} log p(y | x, γ), the first bound is tight:
max_φ LSVAE(ηθ, ηγ, φ) = max_{ηx} L(ηθ, ηγ, ηx) ≤ log p(y)   for all ηθ, ηγ
Proposition (easy natural gradient)
∇̃ηθ LSVAE(ηθ, ηγ, φ) = ( ηθ⁰ + E_{q∗(x | φ)}[ (tx(x), 1) ] − ηθ ) + ( ∇ηx L(ηθ, ηγ, η∗x(ηθ, φ)), 0 )

Proposition (reparameterization trick)
Estimate ∇ηγ,φ LSVAE(ηθ, ηγ, φ) with samples γ̂ ∼ q(γ) and x̂ ∼ q∗(x | φ) via
LSVAE(ηθ, ηγ, φ) ≈ log p(y | x̂, γ̂) − KL( q(θ) q(γ) q∗(x | φ) ‖ p(θ, γ, x) )
Step 1: apply recognition network
Step 2: run fast PGM algorithms
Step 3: sample, compute flat gradients
Step 4: compute natural gradient
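To make the four steps concrete, here is a minimal runnable sketch using autograd (the library linked at the end of these slides); it is not the authors' implementation. The latent graphical model is reduced to an independent N(0, I) prior on each xn, so the "message passing" of Step 2 collapses to a one-node conjugate Gaussian update; in the SLDS application Step 2 would instead run Kalman smoothing and HMM message passing, and Step 4 would apply the natural gradient from the proposition above.

```python
import autograd.numpy as np
import autograd.numpy.random as npr
from autograd import grad

rng = npr.RandomState(0)
n_obs, n_latent, n_hidden = 10, 2, 32

def init(shapes):
    return [0.1 * rng.randn(*s) for s in shapes]

phi = init([(n_obs, n_hidden), (n_hidden, 2 * n_latent)])    # recognition network
gamma = init([(n_latent, n_hidden), (n_hidden, 2 * n_obs)])  # observation (decoder) network

def mlp(params, inp):
    W1, W2 = params                       # biases omitted for brevity
    return np.dot(np.tanh(np.dot(inp, W1)), W2)

def svae_elbo(params, y, eps):
    phi, gamma = params
    # Step 1: recognition network outputs a conjugate (Gaussian) potential on x.
    out = mlp(phi, y)
    h, log_J = out[:, :n_latent], out[:, n_latent:]
    J = np.exp(log_J)
    # Step 2: "message passing" -- here a one-node conjugate update against the N(0, I) prior.
    prec = 1.0 + J                        # posterior precision (diagonal)
    mu, var = h / prec, 1.0 / prec        # posterior mean and variance of q*(x)
    # Step 3: reparameterized sample and flat-gradient objective.
    x = mu + np.sqrt(var) * eps
    dec = mlp(gamma, x)
    y_mu, y_logvar = dec[:, :n_obs], dec[:, n_obs:]
    loglik = -0.5 * np.sum((y - y_mu) ** 2 / np.exp(y_logvar) + y_logvar + np.log(2 * np.pi))
    kl = 0.5 * np.sum(mu ** 2 + var - np.log(var) - 1.0)     # KL(q(x) || N(0, I))
    return loglik - kl

elbo_grad = grad(svae_elbo)

y = rng.randn(5, n_obs)                             # a minibatch of 5 observations
eps = rng.randn(5, n_latent)
g_phi, g_gamma = elbo_grad((phi, gamma), y, eps)    # flat gradients w.r.t. phi and gamma
# Step 4 (not shown): natural gradient update of the global latent-model parameters eta_theta.
```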
[Figure: learned correspondence between data space and latent space]
[Figure: observed data, latent states, and predictions over frame index]
[Figure: optimization with natural gradients vs. flat gradients]
Application: learn syllable representation of behavior from video
start rear
fall from rear
grooming
Discovery of Heterozygous Phenotypes in Ror1b Mice
Alexander Wiltschko, Matthew Johnson, et al., Neuron 2015.
… and high and low doses of each drug
from Alex Wiltschko preprint
Modeling idea: graphical models on latent variables, neural network models for observations
Application: learn syllable representation of behavior from video
Inference: recognition networks output conjugate potentials, then apply fast graphical model inference
[1] Hashimoto, Alvarez-Melis, and Jaakkola. Word, graph and manifold embedding from Markov processes. Preprint 2015.
[2] Grosse et al. Exploiting compositionality to explore a large space of model structures. UAI 2012.
[3] Duvenaud et al. Structure discovery in nonparametric regression through compositional kernel search. ICML 2013.
Future work
[Diagram: model complexity vs. capacity]
github.com/hips/autograd
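For reference, autograd (linked above) differentiates ordinary numpy code, e.g.:

```python
import autograd.numpy as np
from autograd import grad

tanh_grad = grad(np.tanh)
print(tanh_grad(1.0))   # 0.41997..., i.e. 1 - tanh(1)^2
```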