

SLIDE 1

Exploiting compositionality to explore a large space of model structures

  • R. Grosse, R. Salakhutdinov, W. Freeman, & J. Tenenbaum

Best Student Paper at UAI 2012

Jan Gasthaus

Tea talk, 31st Aug 2012

1 / 15

SLIDE 10

Motivation

Goal: Given a data set, determine the right model to use for that data set

Ideal approach

◮ Implement all models ever published
◮ Fit them to the data set
◮ Compare them using some model selection criterion and pick the best

Mainly a computational problem; Proposed solution:

◮ Pick a rich class of models: matrix decomposition models
◮ Fit more complex models re-using computations from simple ones
◮ Approximate model selection criterion
◮ Greedy heuristic for exploring the space of structures, exploiting compositionality

2 / 15

SLIDE 11

In A Nutshell

Grammar for generative models for matrix factorization

◮ Express models as algebraic expressions such as MG + G
◮ Devise CFG that generates these expressions with rules like G → GG + G

Search over model structures greedily by applying the production rules and using an approximate lower bound on model score

Initialize sampling in model by using a specialized algorithm for each production rule
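To make the grammar idea concrete, here is a minimal sketch of applying production rules to a structure written as a string. Only the rule G → GG + G is quoted on the slide; the second rule, G → MG + G, is an assumption inferred from the example expression MG + G, and the names `RULES` and `expand` are hypothetical.

```python
from typing import Dict, List

# Each left-hand symbol maps to its possible right-hand sides.
# "GG + G" is the rule quoted on the slide; "MG + G" is an assumed rule,
# inferred from the example expression on the slide.
RULES: Dict[str, List[str]] = {"G": ["GG + G", "MG + G"]}


def expand(expr: str, rules: Dict[str, List[str]] = RULES) -> List[str]:
    """Return every expression obtained by rewriting exactly one symbol."""
    results: List[str] = []
    for i, symbol in enumerate(expr):
        for rhs in rules.get(symbol, []):
            # Parenthesise the right-hand side so nesting stays unambiguous.
            candidate = expr[:i] + "(" + rhs + ")" + expr[i + 1:]
            if candidate not in results:
                results.append(candidate)
    return results


if __name__ == "__main__":
    for structure in expand("MG + G"):
        print(structure)
```

Each call rewrites a single symbol, so repeated calls walk one level deeper into the space of structures at a time.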

3 / 15

SLIDE 12

Components

4 / 15

SLIDE 13

Grammar

5 / 15

SLIDE 14

Models

6 / 15

SLIDE 15

Inference: Individual Models

Initialize state using one-shot algorithm for each rule application

Latent dimensionality is determined during initialization using Bayesian nonparametric (BNP) methods

Then run simple Gibbs sampler (no details provided . . . )

7 / 15

SLIDE 16

Initialization

8 / 15

SLIDE 17

Scoring Candidate Structures

Criterion used: predictive likelihood of held-out rows and columns

◮ Marginal likelihood not feasible
◮ MSE not selective enough

Use a (stochastic) lower bound on predictive likelihood, computed using a variational approximation combined with annealed importance sampling (this is about as much detail as is in the paper . . . )
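The slide does not say how the bound is constructed, so what follows is only the standard argument for why an unbiased likelihood estimate gives a stochastic lower bound on the log-likelihood. It is not necessarily the paper's derivation. The symbols \hat{p}, x_test and x_train are notation introduced here.

```latex
% \hat{p} is any unbiased estimate of the predictive likelihood, for example
% an average of importance weights from annealed importance sampling
% (an assumption about how the estimate is obtained).
\mathbb{E}\big[\hat{p}\big] = p(x_{\text{test}} \mid x_{\text{train}})
\quad\Longrightarrow\quad
\mathbb{E}\big[\log \hat{p}\big]
\;\le\; \log \mathbb{E}\big[\hat{p}\big]
= \log p(x_{\text{test}} \mid x_{\text{train}})
```

The inequality is Jensen's inequality applied to the concave logarithm, so on average the log of the estimate underestimates the true log predictive likelihood.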

9 / 15

SLIDE 18

Search Over Structures

Greedy search following grammar

1. Start with G
2. Expand using all possible rules
3. Fit & score models
4. Keep top K models
5. Go to 2

Assumes that refining good simple models leads to good more complex models

Assumption seems to be warranted: K = 3 yields the same results as K = 1 in experiments
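The search loop can be sketched as follows, under several assumptions. The slide gives no stopping rule, so a fixed `depth` is used here. The rule G → MG + G is inferred from an example expression rather than quoted. `toy_score` is a made-up stand-in for fitting a model and computing its predictive-likelihood bound. All function and parameter names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# "GG + G" is quoted in the talk; "MG + G" is an assumed rule.
RULES: Dict[str, List[str]] = {"G": ["GG + G", "MG + G"]}


def expand(expr: str, rules: Dict[str, List[str]]) -> List[str]:
    """Rewrite exactly one symbol of expr with each applicable rule."""
    out: List[str] = []
    for i, symbol in enumerate(expr):
        for rhs in rules.get(symbol, []):
            candidate = expr[:i] + "(" + rhs + ")" + expr[i + 1:]
            if candidate not in out:
                out.append(candidate)
    return out


def greedy_search(score: Callable[[str], float],
                  rules: Dict[str, List[str]],
                  start: str = "G",
                  k: int = 1,
                  depth: int = 3) -> Tuple[float, str]:
    """Keep the top-k structures per level; return the best one seen."""
    frontier = [start]                      # step 1: start with G
    best = (score(start), start)
    for _ in range(depth):                  # fixed depth is an assumption
        candidates: List[str] = []
        for expr in frontier:               # step 2: expand with all rules
            for c in expand(expr, rules):
                if c not in candidates:
                    candidates.append(c)
        if not candidates:
            break
        # step 3: "fit and score" (here just a call to the score function)
        scored = sorted(((score(c), c) for c in candidates), reverse=True)
        top = scored[:k]                    # step 4: keep top K
        frontier = [c for _, c in top]      # step 5: go to 2
        if top[0][0] > best[0]:
            best = top[0]
    return best


def toy_score(expr: str) -> float:
    """Made-up score: prefers exactly one M and shorter expressions."""
    return -abs(expr.count("M") - 1) - 0.01 * len(expr)


if __name__ == "__main__":
    print(greedy_search(toy_score, RULES, k=1, depth=2))
```

With this toy score, `k=1` and `k=3` select the same structure, which mirrors the talk's observation only in spirit, since the score here is artificial.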

10 / 15

SLIDE 19

Results on Synthetic Data

11 / 15

SLIDE 20

Results on Real Data

12 / 15

SLIDE 21

Results on Real Data

13 / 15

SLIDE 22

Results on Real Data

14 / 15

SLIDE 23

Computing Predictive Likelihood

15 / 15