SLIDE 1 Exploiting compositionality to explore a large space of model structures
Roger Grosse
- Dept. of Computer Science, University of Toronto
SLIDE 2 Introduction
How has the life of a machine learning engineer changed in the past decade? Many tasks that previously required human experts are starting to be automated:
- feature engineering
- algorithm configuration
- probabilistic inference → probabilistic programming (e.g. Stan)
- model selection → ?
SLIDE 3 The probabilistic modeling pipeline
Design a model
Fit the model Evaluate the model
Can we identify good models automatically? Two challenges:
- automating each stage of this pipeline
- identifying a promising set of candidate models
SLIDE 4 The probabilistic modeling pipeline
Design a model
Fit the model Evaluate the model
SLIDE 5 Matrix decompositions
Example: Senate votes, 2009-2010. The data form a matrix of Senators by votes: a row contains all of one Senator's votes, and a column is the record of votes on a single bill/motion.
SLIDE 6
Matrix decompositions
Clustering the Senators:
Observations = Cluster assignments × Cluster centers + Within-cluster variability
- cluster assignments: which cluster a Senator belongs to
- cluster centers: which groups of Senators vote for a particular bill/motion
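This decomposition view can be sketched numerically. A minimal example, assuming a synthetic stand-in for the vote matrix (the two "blocs", all sizes, and the noise level are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the vote matrix: 20 "Senators" x 12 "votes",
# generated from two underlying blocs so the clustering structure is visible.
centers = np.array([[1.0] * 6 + [-1.0] * 6,
                    [-1.0] * 6 + [1.0] * 6])   # cluster centers (2 x 12)
assign = rng.integers(0, 2, size=20)           # cluster assignment per Senator
Z = np.eye(2)[assign]                          # one-hot assignment matrix (20 x 2)
noise = 0.1 * rng.standard_normal((20, 12))    # within-cluster variability
Y = Z @ centers + noise                        # observations = assignments x centers + noise

# The decomposition view: what is left after subtracting the cluster structure
# is exactly the within-cluster variability.
residual = Y - Z @ centers
print(np.abs(residual).max())                  # small: all structure is in the two clusters
```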
SLIDE 7
Matrix decompositions
Clustering the Senators:
Observations = Cluster assignments × Cluster centers + Within-cluster variability
SLIDE 8
Matrix decompositions
Clustering the votes:
Observations = Cluster centers × Cluster assignments^T + Within-cluster variability
- cluster assignments: which cluster a vote belongs to
- cluster centers: which Senators tend to vote for one sort of bill/motion; equivalently, a row shows what sorts of bills/motions one Senator tends to vote for
SLIDE 9
Matrix decompositions
Clustering the votes:
Observations = Cluster centers × Cluster assignments^T + Within-cluster variability
SLIDE 10
Matrix decompositions
Dimensionality reduction:
Observations = (low-rank product) + Residuals
- one factor gives a low-dimensional representation of each Senator, the other a representation of each vote
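A minimal sketch of this low-rank view, using a truncated SVD (classical PCA flavor); the sizes, rank, and noise level are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical low-rank data: each row (a "Senator") is a linear combination
# of k latent directions, plus residual noise.
n, d, k = 30, 15, 2
U = rng.standard_normal((n, k))                  # representation of each Senator
V = rng.standard_normal((k, d))                  # representation of each vote
Y = U @ V + 0.05 * rng.standard_normal((n, d))   # observations = UV + residuals

# Recover the rank-k structure with a truncated SVD.
Uh, S, Vh = np.linalg.svd(Y, full_matrices=False)
Y_lowrank = Uh[:, :k] * S[:k] @ Vh[:k]           # best rank-k approximation
residual = Y - Y_lowrank
print(np.abs(residual).max())                    # on the order of the noise scale
```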
SLIDE 11
Matrix decompositions
Dimensionality reduction:
Observations = (low-rank product) + Residuals
SLIDE 12
Matrix decompositions
Co-clustering Senators and votes: combine the row and column clusterings in a single decomposition.
SLIDE 13
Matrix decompositions
Co-clustering Senators and votes: combine the row and column clusterings in a single decomposition.
SLIDE 14
Matrix decompositions
A lattice of structures: from no structure, through clustering columns, clustering rows, and dimensionality reduction, up to co-clustering, and beyond.
SLIDE 15 The probabilistic modeling pipeline
Design a model
Fit the model Evaluate the model
SLIDE 16 Building models compositionally
We build models by composing simpler motifs:
- clustering
- dimensionality reduction
- binary attributes
- heavy-tailed distributions
- periodicity
- smoothness
SLIDE 17 Building models compositionally
(Ghahramani, 1999 NIPS tutorial)
SLIDE 18 Generative models
Generation: tell a story of how datasets get generated. This gives a joint probability distribution over observations v and latent variables h: p(h, v) = p(h) p(v|h).
Posterior inference: infer a good explanation of how a particular dataset was generated, by finding likely values of the latent variables conditioned on the observations: p(h|v).
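The relationship between the joint p(h, v) and the posterior p(h|v) can be made concrete with a tiny discrete example (all probability values below are made up for illustration):

```python
import numpy as np

# Hypothetical numbers: h is one of 2 latent states, v one of 3 observations.
p_h = np.array([0.6, 0.4])                 # prior p(h)
p_v_given_h = np.array([[0.7, 0.2, 0.1],   # p(v | h=0)
                        [0.1, 0.3, 0.6]])  # p(v | h=1)

joint = p_h[:, None] * p_v_given_h         # p(h, v) = p(h) p(v|h)

v = 2                                      # observe v = 2
posterior = joint[:, v] / joint[:, v].sum()  # p(h|v) by Bayes' rule
print(posterior)                           # p(h=1 | v=2) = 0.8: h=1 explains v=2 better
```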
SLIDE 19 Space of models: building blocks
Gaussian (G): λ_i ∼ Gamma(a, b), ν_j ∼ Gamma(a, b), u_ij ∼ Normal(0, λ_i^{-1} ν_j^{-1})
Multinomial (M): π ∼ Dirichlet(α), u_i ∼ Multinomial(π)
Bernoulli (B): p_j ∼ Beta(α, β), u_ij ∼ Bernoulli(p_j)
Integration (C): u_ij = 1 if i ≥ j, and 0 otherwise (multiplying by C computes cumulative sums)
Grosse, Salakhutdinov, Freeman, and Tenenbaum, UAI 2012
SLIDE 20 Space of models: generative process
Example expression: (MG + G)M^T + G
We represent models as algebraic expressions.
- 1. Sample all leaf matrices independently from their corresponding prior distributions.
- 2. Evaluate the resulting expression.
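The two steps above can be sketched directly for (MG + G)M^T + G. A simplified sketch: the leaf samplers below use plain Gaussians and uniform cluster assignments rather than the full hyperpriors, and all sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, kr, kc = 50, 40, 3, 4   # data size and (hypothetical) cluster counts

def sample_G(rows, cols, scale=1.0):
    """Gaussian leaf: i.i.d. normal entries (precision hyperpriors omitted)."""
    return scale * rng.standard_normal((rows, cols))

def sample_M(rows, k):
    """Multinomial leaf: each row is a one-hot cluster indicator."""
    return np.eye(k)[rng.integers(0, k, size=rows)]

# Step 1: sample all leaf matrices independently from their priors.
M_rows = sample_M(n, kr)         # row-cluster assignments
G_centers = sample_G(kr, kc)     # cluster centers
G_inner = sample_G(n, kc, 0.1)   # within-cluster variability
M_cols = sample_M(m, kc)         # column-cluster assignments
G_noise = sample_G(n, m, 0.1)    # observation noise

# Step 2: evaluate the expression (MG + G)M^T + G.
Y = (M_rows @ G_centers + G_inner) @ M_cols.T + G_noise
print(Y.shape)  # (50, 40)
```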
SLIDE 21 Space of models: grammar
Component types: Gaussian (G), Multinomial (M), Bernoulli (B), Integration (C)
Starting symbol: G
Production rules:
- clustering: G → MG + G | GM^T + G
- low-rank: G → GG + G
- binary features: G → BG + G | GB^T + G
- linear dynamics: G → CG + G | GC^T + G
- sparsity: G → exp(G) ∘ G
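The space generated by these productions can be enumerated mechanically. A purely symbolic sketch that treats expressions as strings (transpose marked with ' and the elementwise product with *; no models are fitted):

```python
# Productions from the grammar; each rule rewrites one occurrence of G.
PRODUCTIONS = [
    "MG+G", "GM'+G",     # clustering (rows / columns)
    "GG+G",              # low-rank
    "BG+G", "GB'+G",     # binary features
    "CG+G", "GC'+G",     # linear dynamics
    "exp(G)*G",          # sparsity (elementwise product)
]

def expand(expr):
    """All expressions reachable by applying one production to one G."""
    out = set()
    for i, ch in enumerate(expr):
        if ch == "G":
            for rule in PRODUCTIONS:
                out.add(expr[:i] + "(" + rule + ")" + expr[i + 1:])
    return out

# Count distinct expressions reachable within a few productions from "G".
frontier, seen = {"G"}, {"G"}
for depth in range(1, 4):
    frontier = set().union(*(expand(e) for e in frontier)) - seen
    seen |= frontier
    print(depth, len(seen))
```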
SLIDE 22 Example: co-clustering
Derivation: G → GM^T + G → (MG + G)M^T + G
(productions applied: G → GM^T + G, then G → MG + G)
SLIDE 23 Examples from the literature
- no structure
- clustering
- co-clustering (e.g. Kemp et al., 2006)
- binary features (Griffiths and Ghahramani, 2005)
- sparse coding (e.g. Olshausen and Field, 1996)
- low-rank approximation (Salakhutdinov and Mnih, 2008)
- Bayesian clustered tensor factorization (Sutskever et al., 2009)
- binary matrix factorization (Meeds et al., 2006)
- random walk
- linear dynamical system
- dependent Gaussian scale mixture (e.g. Karklin and Lewicki, 2005)
- …
SLIDE 24 The probabilistic modeling pipeline
Design a model
Fit a model Evaluate the model
Posterior Inference
SLIDE 25 Algorithms: posterior inference
- Recursive initialization: to fit a model such as G → MG + G, first fit the simpler clustering model and use it to initialize the refinement.
- Implement one algorithm per production rule; share computation between models.
- Choose the model dimension using Bayesian nonparametrics.
SLIDE 26 Posterior inference algorithms
Can make use of model-specific algorithmic tricks carefully designed for individual production rules:
- high-level transition operators
- linear algebra identities, e.g. the Woodbury matrix identity:
  (A + UCV)^{-1} = A^{-1} - A^{-1} U (C^{-1} + V A^{-1} U)^{-1} V A^{-1}
- exploiting tractable substructures
- eliminating variables analytically
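The Woodbury identity is easy to check numerically. A sketch where A is diagonal (so its inverse is cheap) and the correction UCV has low rank; all sizes here are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 6, 2

# A is diagonal and well-conditioned; U C V is a rank-k correction, k << n.
A = np.diag(rng.uniform(1.0, 2.0, size=n))
U = rng.standard_normal((n, k))
C = np.diag(rng.uniform(1.0, 2.0, size=k))
V = rng.standard_normal((k, n))

# Left side: invert the full n x n matrix directly.
lhs = np.linalg.inv(A + U @ C @ V)

# Right side: Woodbury, using only the cheap inverse of A and a k x k inverse.
Ainv = np.diag(1.0 / np.diag(A))
inner = np.linalg.inv(np.linalg.inv(C) + V @ Ainv @ U)
rhs = Ainv - Ainv @ U @ inner @ V @ Ainv

print(np.allclose(lhs, rhs))  # True
```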
SLIDE 27 The probabilistic modeling pipeline
Design a model
Fit a model Evaluate the model
We evaluate models on the probability they assign to held-out subsets of the observation matrix.
SLIDE 28 The probabilistic modeling pipeline
Design a model
Fit a model Evaluate the model
We want to search over a large, open-ended space of models. Key problem: the search space is very large; over 1000 models are reachable within 3 productions. How do we choose a promising set of models to evaluate?
SLIDE 29 Algorithms: structure search
A brief history of models of natural images:
- Model patches as linear combinations of uncorrelated basis functions: the Fourier representation (Sanger, 1988).
- Model the heavy-tailed distributions of coefficients: basis functions similar to simple cells (Olshausen and Field, 1994).
- Model the dependencies between scales of coefficients: a high-level texture representation similar to complex cells (Karklin and Lewicki, 2005, 2008).
SLIDE 30
Algorithms: structure search
Refining models = applying productions. Based on this intuition, we apply a greedy search procedure:
G → MG + G → M(GM^T + G) + G → …
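The greedy procedure can be sketched as follows. Here `score` stands in for the real fit-and-evaluate step (fitting each candidate and measuring held-out likelihood), and the toy scorer at the bottom is purely hypothetical:

```python
# Greedy structure search sketch: at each level, expand the current best
# expression by every production and keep the highest-scoring refinement.
PRODUCTIONS = ["MG+G", "GM'+G", "GG+G", "BG+G", "GB'+G",
               "CG+G", "GC'+G", "exp(G)*G"]

def refinements(expr):
    """Apply each production to each occurrence of G in the expression."""
    out = []
    for i, ch in enumerate(expr):
        if ch == "G":
            out.extend(expr[:i] + "(" + r + ")" + expr[i + 1:] for r in PRODUCTIONS)
    return out

def greedy_search(score, depth=2):
    best = "G"
    for _ in range(depth):
        top = max(refinements(best), key=score)
        if score(top) <= score(best):   # stop when no refinement helps
            break
        best = top
    return best

# Hypothetical toy scorer that prefers clustering structure but penalizes size.
toy_score = lambda e: e.count("M") - 0.1 * len(e)
print(greedy_search(toy_score))
```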
SLIDE 31 Experiments: simulated data
Tested on simulated data where we know the correct structure
SLIDE 32 Experiments: simulated data
Tested on simulated data where we know the correct structure Usually chooses the correct structure in low-noise conditions
SLIDE 33 Experiments: simulated data
Tested on simulated data where we know the correct structure Usually chooses the correct structure in low-noise conditions Gracefully falls back to simpler models under heavy noise
SLIDE 34 Experiments: real-world data
Senate votes 09-10: GM^T + G → (MG + G)M^T + G
Level 1 (GM^T + G), cluster votes: 22 clusters; largest: party-line Democrat, party-line Republican, all yea; smaller clusters: votes on single issues.
Level 2 ((MG + G)M^T + G), cluster Senators: 11 clusters; no cross-party clusters.
No third-level model improves by more than 1 nat.
SLIDE 35 Experiments: real-world data
Senate votes 09-10: GM^T + G → (MG + G)M^T + G
Motion capture: CG + G → C(GG + G) + G
Data: motion capture of a person walking; each row gives the person's displacement and joint angles in one frame.
Model 1 (CG + G): independent Markov chains. Model 2 (C(GG + G) + G): correlations in joint angles.
SLIDE 36 Experiments: real-world data
Senate votes 09-10: GM^T + G → (MG + G)M^T + G
Motion capture: CG + G → C(GG + G) + G
Image patches: GG + G → (exp(G) ∘ G)G + G → (exp(GG + G) ∘ G)G + G
Data: 1,000 12x12 patches from 10 blurred and whitened images.
Model 1 (GG + G): low-rank approximation (PCA). Model 2 ((exp(G) ∘ G)G + G): sparsify the coefficients to get sparse coding. Model 3 ((exp(GG + G) ∘ G)G + G): model dependencies between the scale variables.
SLIDE 37 Experiments: real-world data
Senate votes 09-10: GM^T + G → (MG + G)M^T + G
Motion capture: CG + G → C(GG + G) + G
Image patches: GG + G → (exp(G) ∘ G)G + G → (exp(GG + G) ∘ G)G + G
Concepts: MG + G → M(GG + G) + G
Data: Mechanical Turk users' judgments on 218 questions about 1,000 entities.
Model 1 (MG + G): cluster entities; 39 clusters.
Model 2 (M(GG + G) + G): low-rank representation of the cluster centers; 8 dimensions. Dimension 1: living vs. nonliving; dimension 2: large vs. small.
SLIDE 38
David Duvenaud, James Lloyd, Roger Grosse, Josh Tenenbaum, and Zoubin Ghahramani. "Structure discovery in nonparametric regression through compositional kernel search." ICML 2013.
SLIDE 39
Compositional structure search for time series
Gaussian processes are distributions over functions, specified by kernels.
Primitive kernels: SE, Per, Lin, RQ
Composite kernels: Lin × Lin, SE × Per, Lin + Per, Lin × Per
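Kernel composition can be sketched directly: sums and products of kernels are again kernels, so composite structures like SE × Per follow immediately. A minimal example (the hyperparameter values and grid are arbitrary choices for illustration):

```python
import numpy as np

# Kernels as functions k(x, x').
def SE(l=1.0):   return lambda a, b: np.exp(-0.5 * (a - b) ** 2 / l ** 2)
def Per(p=1.0):  return lambda a, b: np.exp(-2 * np.sin(np.pi * np.abs(a - b) / p) ** 2)
def Lin(c=0.0):  return lambda a, b: (a - c) * (b - c)

# Sums and products of kernels are kernels.
def add(k1, k2): return lambda a, b: k1(a, b) + k2(a, b)
def mul(k1, k2): return lambda a, b: k1(a, b) * k2(a, b)

# Composite kernel SE x Per: locally periodic structure.
k = mul(SE(l=2.0), Per(p=1.0))

# Gram matrix on a grid of inputs (jitter keeps it positive definite).
x = np.linspace(0, 4, 20)
K = k(x[:, None], x[None, :]) + 1e-9 * np.eye(len(x))

# A draw from the GP prior: a sample from N(0, K).
rng = np.random.default_rng(4)
f = rng.multivariate_normal(np.zeros(len(x)), K)
print(f.shape)  # (20,)
```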
SLIDE 40
Compositional structure search for time series
SLIDE 41
Compositional structure search for time series
Example dataset: radio critical frequency.
SLIDE 42
…
SLIDE 43
10 minute break