

SLIDE 1

Distributed MAP Inference for Undirected Graphical Models

Sameer Singh¹, Amarnag Subramanya², Fernando Pereira², Andrew McCallum¹

¹University of Massachusetts, Amherst MA  ²Google Research, Mountain View CA

Workshop on Learning on Cores, Clusters and Clouds (LCCC) Neural Information Processing Systems (NIPS) 2010

SLIDE 2

Motivation

  • Graphical models are used in a number of information extraction tasks
  • Recently, models have been getting larger and denser:
    • Coreference Resolution [Culotta et al. NAACL 2007]
    • Relation Extraction [Riedel et al. EMNLP 2010; Poon & Domingos EMNLP 2009]
    • Joint Inference [Finkel & Manning NAACL 2009; Singh et al. ECML 2009]
  • Inference is difficult, and approximations have been proposed:
    • LP Relaxations [Martins et al. EMNLP 2010]
    • Dual Decomposition [Rush et al. EMNLP 2010]
    • MCMC-based [McCallum et al. NIPS 2009; Poon et al. AAAI 2008]

Without parallelization, these approaches have limited scalability

SLIDE 3

Motivation

Contributions:

1 Distribute MAP Inference for a large, dense factor graph

  • 1 million variables, 250 machines

2 Incorporate sharding as variables in the model

SLIDE 4

Outline

1 Model and Inference
   Graphical Models · MAP Inference · Distributed Inference

2 Cross-Document Coreference
   Coreference Problem · Pairwise Model · Inference and Distribution

3 Hierarchical Models
   Sub-Entities · Super-Entities

4 Large-Scale Experiments

SLIDE 5

Model and Inference Coreference Hierarchical Models Large-Scale Experiments Related Work Conclusions

Factor Graphs

Represent a distribution over variables Y using factors ψ:

p(Y = y) ∝ exp Σ_{yc ⊆ y} ψc(yc)

Note: the set of factors is different for every assignment Y = y, denoted {ψ}y.

[Figure: a graph over binary variables Y1, Y2, Y3, Y4]

{ψ}0110 = {ψ01_12, ψ11_23, ψ10_34, ψ00_14}
{ψ}0111 = {ψ01_12, ψ11_23, ψ11_34, ψ11_24}

(Superscripts are the variable values; subscripts are the pair of variables the factor touches.)

Sameer Singh (UMass, Amherst) Distributed MAP Inference LCCC, NIPS 2010 Workshop 2 / 19

SLIDE 6

MAP¹ Inference

We want to find the best configuration according to the model:

ŷ = arg max_y p(Y = y) = arg max_y exp Σ_{yc ⊆ y} ψc(yc)

Computational bottlenecks:

1 The space of Y is usually enormous (exponential)
2 Even evaluating Σ_{yc ⊆ y} ψc(yc) for a single y may be polynomial

¹MAP = maximum a posteriori

SLIDE 7

MCMC for MAP Inference

Initial configuration y = y0
for (num samples):

1 Propose a change to y to get configuration y′ (usually a small change)

2 Acceptance probability (only involves computations local to the change):

α(y, y′) = min(1, (p(y′) / p(y))^(1/t))

3 if Toss(α): accept the change, y = y′

return y

where

p(y′) / p(y) = exp( Σ_{y′c ⊆ y′} ψc(y′c) − Σ_{yc ⊆ y} ψc(yc) )
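The loop above can be sketched in a few lines of Python. This is a sketch, not the authors' implementation: `score` (the summed log factor scores) and `propose` are stand-ins for model-specific pieces, and the temperature `t` is held fixed for simplicity.

```python
import math
import random

def mcmc_map(y0, score, propose, num_samples=1000, t=1.0, seed=0):
    """MCMC search for a MAP configuration (sketch).

    score(y):        summed log factor scores of configuration y
    propose(y, rng): a slightly changed copy of y
    Only the score *difference* is needed, so the normalizer cancels.
    """
    rng = random.Random(seed)
    y, best, best_score = y0, y0, score(y0)
    for _ in range(num_samples):
        y_new = propose(y, rng)
        # alpha = min(1, (p(y')/p(y))^(1/t)), computed in log space
        log_alpha = (score(y_new) - score(y)) / t
        if log_alpha >= 0 or rng.random() < math.exp(log_alpha):
            y = y_new  # Toss(alpha) accepted the change
            if score(y) > best_score:
                best, best_score = y, score(y)
    return best
```

Because each proposal is a small change, computing `score(y_new) - score(y)` only needs the factors the change touches; the full sums are shown here only to keep the sketch short.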

SLIDE 8

Mutually Exclusive Proposals

Let {ψ}y′_y be the set of factors used to evaluate a proposal y → y′, i.e. the symmetric difference

{ψ}y′_y = ({ψ}y ∪ {ψ}y′) − ({ψ}y ∩ {ψ}y′)

Consider two proposals y → ya and y → yb such that

{ψ}ya_y ∩ {ψ}yb_y = {}

Completely different sets of factors are required to evaluate these proposals, so the two proposals can be evaluated (and accepted) in parallel.
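As a concrete check, one way to detect mutually exclusive proposals is to represent each factor by the set of variables in its scope. This is an illustrative sketch; the factor and variable names are invented.

```python
def factors_touched(changed_vars, factor_scopes):
    """Factors whose scope includes any variable the proposal changes."""
    return {name for name, scope in factor_scopes.items()
            if scope & changed_vars}

def mutually_exclusive(vars_a, vars_b, factor_scopes):
    """Proposals y -> ya and y -> yb can be evaluated and accepted in
    parallel when the factor sets they touch are disjoint."""
    return not (factors_touched(vars_a, factor_scopes)
                & factors_touched(vars_b, factor_scopes))
```

For example, with chain factors over (Y1, Y2), (Y2, Y3), (Y3, Y4), a proposal changing only Y1 and one changing only Y4 touch disjoint factor sets, while proposals changing Y2 and Y3 both touch the (Y2, Y3) factor.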

SLIDE 9

Distributed Inference

[Figure: a Distributor shards variables across Inference workers; the workers' results are Combined into a single configuration and redistributed]
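A minimal sketch of the distributor loop in the figure: shard the variables, run independent inference on each shard, combine, and repeat. The sharding policy and local-inference function here are stand-ins, and the shards are processed sequentially rather than on separate machines.

```python
def distributed_map_inference(assignment, shard_of, local_infer, rounds=3):
    """Distributor / inference / combine loop (sketch).

    Each round: partition variables into shards, run inference on each
    shard independently (only proposals local to a shard are possible),
    then combine the shard results and re-shard.
    """
    assignment = dict(assignment)
    for _ in range(rounds):
        shards = {}
        for var, val in assignment.items():
            shards.setdefault(shard_of(var, val), {})[var] = val
        # in a real system each shard would go to a separate machine
        results = [local_infer(shard) for shard in shards.values()]
        combined = {}
        for result in results:  # combine step
            combined.update(result)
        assignment = combined
    return assignment
```

Because shards are disjoint, proposals evaluated inside different shards are mutually exclusive in the sense of the previous slide, so accepting them in parallel is safe.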


SLIDE 12

Outline

1 Model and Inference
   Graphical Models · MAP Inference · Distributed Inference

2 Cross-Document Coreference
   Coreference Problem · Pairwise Model · Inference and Distribution

3 Hierarchical Models
   Sub-Entities · Super-Entities

4 Large-Scale Experiments

SLIDE 13

Input Features

[Figure: mentions m1, m2, m3, m4, m5]

Define similarity between mentions, φ : M2 → R

  • φ(mi, mj) > 0: mi, mj are similar
  • φ(mi, mj) < 0: mi, mj are dissimilar

We use cosine similarity of the context bag of words: φ(mi, mj) = cosSim({c}i, {c}j) − b
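The similarity above is easy to sketch directly. The bias `b` and the bag-of-words contexts follow the slide; everything else (function names, the default value of `b`) is an illustrative choice.

```python
import math
from collections import Counter

def cos_sim(context_i, context_j):
    """Cosine similarity between two bags of context words."""
    ci, cj = Counter(context_i), Counter(context_j)
    dot = sum(ci[w] * cj[w] for w in ci)
    norm_i = math.sqrt(sum(c * c for c in ci.values()))
    norm_j = math.sqrt(sum(c * c for c in cj.values()))
    return dot / (norm_i * norm_j) if norm_i and norm_j else 0.0

def phi(context_i, context_j, b=0.5):
    """phi > 0 when mention contexts are similar, phi < 0 when dissimilar."""
    return cos_sim(context_i, context_j) - b
```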

SLIDE 14

Graphical Model

The random variables in our model are entities (E) and mentions (M). For any assignment to the entities (E = e), we define the model score:

p(E = e) ∝ exp( Σ_{mi ∼ mj} ψa(mi, mj) + Σ_{mi ≁ mj} ψr(mi, mj) )

where ψa(mi, mj) = wa φ(mi, mj) and ψr(mi, mj) = −wr φ(mi, mj)

[Figure: mentions m1, m2, m3 grouped into entity e1; mentions m4, m5 into entity e2]

For this configuration,

p(e1, e2) ∝ exp{ wa (φ12 + φ13 + φ23 + φ45) − wr (φ15 + φ25 + φ35 + φ14 + φ24 + φ34) }

Computational bottlenecks:

1 The space of E is the Bell number of the number of mentions
2 Evaluating the model score for each E = e is O(n²)
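To make the O(n²) cost concrete, a direct evaluation of the score for a given partition might look like the sketch below. `phi` is any mention-pair similarity function; the representation of a partition as a list of mention lists is an assumption for illustration.

```python
from itertools import combinations

def model_log_score(entities, phi, w_a=1.0, w_r=1.0):
    """Unnormalized log-score of a partition of mentions into entities.

    Affinity factors fire on coreferent pairs (same entity), repulsion
    factors on all other pairs; the pair loop is O(n^2) in mentions.
    """
    entity_of = {m: i for i, ms in enumerate(entities) for m in ms}
    mentions = sorted(entity_of)
    score = 0.0
    for mi, mj in combinations(mentions, 2):
        if entity_of[mi] == entity_of[mj]:
            score += w_a * phi(mi, mj)
        else:
            score -= w_r * phi(mi, mj)
    return score
```

In practice the MCMC sampler never evaluates this full sum: a proposal that moves one mention only changes the terms involving that mention.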

SLIDE 15

MCMC for MAP Inference

[Figure: a proposal moves m3 from entity e1 = {m1, m2, m3} to entity e2 = {m4, m5}]

p(e) ∝ exp{ wa (φ12 + φ13 + φ23 + φ45) − wr (φ15 + φ25 + φ35 + φ14 + φ24 + φ34) }
p(e′) ∝ exp{ wa (φ12 + φ34 + φ35 + φ45) − wr (φ15 + φ25 + φ13 + φ14 + φ24 + φ23) }

log( p(e′) / p(e) ) = wa (φ34 + φ35 − φ13 − φ23) − wr (φ13 + φ23 − φ34 − φ35)

SLIDE 16

Mutually Exclusive Proposals

[Figure: two simultaneous proposals over mentions m1 to m5 and entities e1, e2, e3 that touch disjoint factor sets]


SLIDE 19

Results

SLIDE 20

Outline

1 Model and Inference
   Graphical Models · MAP Inference · Distributed Inference

2 Cross-Document Coreference
   Coreference Problem · Pairwise Model · Inference and Distribution

3 Hierarchical Models
   Sub-Entities · Super-Entities

4 Large-Scale Experiments

SLIDE 21

Sub-Entities

  • Consider an accepted move for a mention

SLIDE 22

Sub-Entities

  • Ideally, similar mentions should also move to the same entity
  • The default proposal function does not exploit this
  • Good proposals become rarer as datasets grow

SLIDE 23

Sub-Entities

  • Include sub-entity variables
  • The model score is used to sample sub-entity variables
  • Propose moves of all mentions in a sub-entity simultaneously
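One way to realize the simultaneous move is a block proposal: pick a sub-entity, then move all of its mentions together to a random target entity, or to a brand-new one. This is a sketch with invented data structures; it assumes each sub-entity's mentions currently live inside a single entity, as the hierarchy guarantees.

```python
import random

def sub_entity_proposal(entities, sub_entities, rng):
    """Move every mention of one sub-entity together (sketch).

    entities:     list of sets of mentions
    sub_entities: dict mapping sub-entity id -> set of its mentions
    """
    sub = rng.choice(sorted(sub_entities))
    mentions = sub_entities[sub]
    source = next(i for i, ms in enumerate(entities) if mentions <= ms)
    target = rng.randrange(len(entities) + 1)  # last index = new entity
    proposal = [set(ms) for ms in entities]
    proposal[source] -= mentions
    if target == len(entities):
        proposal.append(set(mentions))
    else:
        proposal[target] |= mentions
    return [ms for ms in proposal if ms]  # drop emptied entities
```

A single such proposal covers what would otherwise take many accepted single-mention moves, which is exactly why good proposals stop being rare on large datasets.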

SLIDE 24

Super-Entities

Random Distribution

  • A random distribution may not assign similar entities to the same machine
  • The probability that similar entities end up on the same machine is small

SLIDE 25

Super-Entities

Model-Based Distribution

  • Augment the model with super-entity variables
  • Entities in the same super-entity are assigned to the same machine
  • The model score is used to sample super-entity variables
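The machine assignment can then follow the super-entities directly. In this sketch a simple hash stands in for the distributor's policy; the point is only the invariant that entities sharing a super-entity land on the same machine.

```python
def shard_by_super_entity(entity_to_super, num_machines):
    """Assign entities to machines by super-entity (sketch).

    Entities sharing a super-entity always land on the same machine,
    so proposals that merge similar entities can actually be made there.
    """
    return {entity: hash(super_entity) % num_machines
            for entity, super_entity in entity_to_super.items()}
```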

SLIDE 26

Hierarchical Representation

[Figure: hierarchy with super-entities at the top, entities in the middle, sub-entities at the bottom]

  • Factors:
    • Affinity factors between mentions in the same sub-entity, between sub-entities in the same entity, and between entities in the same super-entity
    • Repulsion factors are similarly symmetric across levels
  • Sampling: fix the variables of two levels, sample the remaining level

SLIDE 27

Evaluation

SLIDE 28

Outline

1 Model and Inference
   Graphical Models · MAP Inference · Distributed Inference

2 Cross-Document Coreference
   Coreference Problem · Pairwise Model · Inference and Distribution

3 Hierarchical Models
   Sub-Entities · Super-Entities

4 Large-Scale Experiments

SLIDE 29

Preliminary Large-Scale Experiments

Data

  • New York Times Annotated Corpus [Sandhaus, LDC 2008]: 20 years of articles (1987-2007)
  • After pruning rare names (fewer than 1000 mentions): ~1 million person name mentions

Evaluation

  • Automated labels are too noisy for evaluation
  • Instead, we estimate the speed of inference
  • trust the model to accept good proposals
  • observe the number of predicted entities

SLIDE 30

Speed of Inference

SLIDE 31

Related Work

  • GraphLab [Low et al. UAI 2010]
    • how to represent dynamic graphs
    • how to represent hierarchical models
  • Graph Splashing [Gonzalez et al. UAI 2009]
    • the graph structure changes with every configuration
    • BP messages are enormous for exponential-domain variables
  • Topic Models [Smola & Narayanamurthy VLDB 2010; Asuncion et al. NIPS 2009]
    • they face restrictions since they compute probabilities
    • we allow non-random distribution and customized proposals

SLIDE 32

Conclusions

1 propose distributed inference for graphical models
2 enable distributed cross-document coreference
3 improve sharding with latent hierarchical variables
4 demonstrate utility on large datasets

Future Work:

  • more scalability experiments
  • study mixing and convergence properties
  • add more expressive factors
  • supervision: labeled data, noisy evidence

SLIDE 33

Thanks!

Sameer Singh (sameer@cs.umass.edu), Amarnag Subramanya (asubram@google.com), Fernando Pereira (pereira@google.com), Andrew McCallum (mccallum@cs.umass.edu)