

SLIDE 1

Distributed MAP Inference for Undirected Graphical Models

Sameer Singh¹, Amarnag Subramanya², Fernando Pereira², Andrew McCallum¹

¹University of Massachusetts, Amherst MA  ²Google Research, Mountain View CA

Workshop on Learning on Cores, Clusters and Clouds (LCCC) Neural Information Processing Systems (NIPS) 2010

SLIDE 2

Motivation

  • Graphical models are used in a number of information extraction tasks
  • Recently, models have been getting larger and denser:
    • Coreference Resolution [Culotta et al. NAACL 2007]
    • Relation Extraction [Riedel et al. EMNLP 2010; Poon & Domingos EMNLP 2009]
    • Joint Inference [Finkel & Manning NAACL 2009; Singh et al. ECML 2009]
  • Inference is difficult, and approximations have been proposed:
    • LP Relaxations [Martins et al. EMNLP 2010]
    • Dual Decomposition [Rush et al. EMNLP 2010]
    • MCMC-based [McCallum et al. NIPS 2009; Poon et al. AAAI 2008]

Without parallelization, these approaches have limited scalability

SLIDE 3

Motivation

Contributions:

1 Distribute MAP Inference for a large, dense factor graph

  • 1 million variables, 250 machines

2 Incorporate sharding as variables in the model

SLIDE 4

Outline

1 Model and Inference
   Graphical Models · MAP Inference · Distributed Inference

2 Cross-Document Coreference
   Coreference Problem · Pairwise Model · Inference and Distribution

3 Hierarchical Models
   Sub-Entities · Super-Entities

4 Large-Scale Experiments

SLIDE 5

Model and Inference Coreference Hierarchical Models Large-Scale Experiments Related Work Conclusions

Factor Graphs

Represent a distribution over variables Y using factors ψ:

p(Y = y) ∝ exp Σ_{yc ⊆ y} ψc(yc)

Note: the set of factors is different for every assignment Y = y, denoted {ψ}y.

[Figure: a graph over binary variables Y1, Y2, Y3, Y4]

{ψ}0110 = {ψ01_12, ψ11_23, ψ10_34, ψ00_14}
{ψ}0111 = {ψ01_12, ψ11_23, ψ11_34, ψ11_24}

(Superscripts are the variable values; subscripts are the pair of variables the factor touches.)

Sameer Singh (UMass, Amherst) Distributed MAP Inference LCCC, NIPS 2010 Workshop 2 / 19

SLIDE 6

MAP¹ Inference

We want to find the best configuration according to the model:

ŷ = arg max_y p(Y = y) = arg max_y exp Σ_{yc ⊆ y} ψc(yc)

Computational bottlenecks:

1 The space of Y is usually enormous (exponential)
2 Even evaluating Σ_{yc ⊆ y} ψc(yc) for a single y may be polynomial

¹MAP = maximum a posteriori

SLIDE 7

MCMC for MAP Inference

Initial configuration y = y0
for (num samples):

1 Propose a change to y to get configuration y′ (usually a small change)

2 Acceptance probability (only involves computations local to the change):

α(y, y′) = min(1, (p(y′) / p(y))^(1/t))

3 if Toss(α): accept the change, y = y′

return y

where

p(y′) / p(y) = exp( Σ_{y′c ⊆ y′} ψc(y′c) − Σ_{yc ⊆ y} ψc(yc) )
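The loop above can be sketched in a few lines of Python. This is a sketch, not the authors' implementation: `score` (the summed log factor scores) and `propose` are stand-ins for model-specific pieces, and the temperature `t` is held fixed for simplicity.

```python
import math
import random

def mcmc_map(y0, score, propose, num_samples=1000, t=1.0, seed=0):
    """MCMC search for a MAP configuration (sketch).

    score(y):        summed log factor scores of configuration y
    propose(y, rng): a slightly changed copy of y
    Only the score *difference* is needed, so the normalizer cancels.
    """
    rng = random.Random(seed)
    y, best, best_score = y0, y0, score(y0)
    for _ in range(num_samples):
        y_new = propose(y, rng)
        # alpha = min(1, (p(y')/p(y))^(1/t)), computed in log space
        log_alpha = (score(y_new) - score(y)) / t
        if log_alpha >= 0 or rng.random() < math.exp(log_alpha):
            y = y_new  # Toss(alpha) accepted the change
            if score(y) > best_score:
                best, best_score = y, score(y)
    return best
```

Because each proposal is a small change, computing `score(y_new) - score(y)` only needs the factors the change touches; the full sums are shown here only to keep the sketch short.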

SLIDE 8

Mutually Exclusive Proposals

Let {ψ}y′_y be the set of factors used to evaluate a proposal y → y′, i.e. the symmetric difference

{ψ}y′_y = ({ψ}y ∪ {ψ}y′) − ({ψ}y ∩ {ψ}y′)

Consider two proposals y → ya and y → yb such that

{ψ}ya_y ∩ {ψ}yb_y = {}

Completely different sets of factors are required to evaluate these proposals, so the two proposals can be evaluated (and accepted) in parallel.
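As a concrete check, one way to detect mutually exclusive proposals is to represent each factor by the set of variables in its scope. This is an illustrative sketch; the factor and variable names are invented.

```python
def factors_touched(changed_vars, factor_scopes):
    """Factors whose scope includes any variable the proposal changes."""
    return {name for name, scope in factor_scopes.items()
            if scope & changed_vars}

def mutually_exclusive(vars_a, vars_b, factor_scopes):
    """Proposals y -> ya and y -> yb can be evaluated and accepted in
    parallel when the factor sets they touch are disjoint."""
    return not (factors_touched(vars_a, factor_scopes)
                & factors_touched(vars_b, factor_scopes))
```

For example, with chain factors over (Y1, Y2), (Y2, Y3), (Y3, Y4), a proposal changing only Y1 and one changing only Y4 touch disjoint factor sets, while proposals changing Y2 and Y3 both touch the (Y2, Y3) factor.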

SLIDE 9

Distributed Inference

[Figure: a Distributor shards variables across Inference workers; the workers' results are Combined into a single configuration and redistributed]
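A minimal sketch of the distributor loop in the figure: shard the variables, run independent inference on each shard, combine, and repeat. The sharding policy and local-inference function here are stand-ins, and the shards are processed sequentially rather than on separate machines.

```python
def distributed_map_inference(assignment, shard_of, local_infer, rounds=3):
    """Distributor / inference / combine loop (sketch).

    Each round: partition variables into shards, run inference on each
    shard independently (only proposals local to a shard are possible),
    then combine the shard results and re-shard.
    """
    assignment = dict(assignment)
    for _ in range(rounds):
        shards = {}
        for var, val in assignment.items():
            shards.setdefault(shard_of(var, val), {})[var] = val
        # in a real system each shard would go to a separate machine
        results = [local_infer(shard) for shard in shards.values()]
        combined = {}
        for result in results:  # combine step
            combined.update(result)
        assignment = combined
    return assignment
```

Because shards are disjoint, proposals evaluated inside different shards are mutually exclusive in the sense of the previous slide, so accepting them in parallel is safe.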


SLIDE 12

Outline

1 Model and Inference
   Graphical Models · MAP Inference · Distributed Inference

2 Cross-Document Coreference
   Coreference Problem · Pairwise Model · Inference and Distribution

3 Hierarchical Models
   Sub-Entities · Super-Entities

4 Large-Scale Experiments

SLIDE 13

Input Features

[Figure: mentions m1, m2, m3, m4, m5]

Define similarity between mentions, φ : M2 → R

  • φ(mi, mj) > 0: mi, mj are similar
  • φ(mi, mj) < 0: mi, mj are dissimilar

We use cosine similarity of the context bag of words: φ(mi, mj) = cosSim({c}i, {c}j) − b
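The similarity above is easy to sketch directly. The bias `b` and the bag-of-words contexts follow the slide; everything else (function names, the default value of `b`) is an illustrative choice.

```python
import math
from collections import Counter

def cos_sim(context_i, context_j):
    """Cosine similarity between two bags of context words."""
    ci, cj = Counter(context_i), Counter(context_j)
    dot = sum(ci[w] * cj[w] for w in ci)
    norm_i = math.sqrt(sum(c * c for c in ci.values()))
    norm_j = math.sqrt(sum(c * c for c in cj.values()))
    return dot / (norm_i * norm_j) if norm_i and norm_j else 0.0

def phi(context_i, context_j, b=0.5):
    """phi > 0 when mention contexts are similar, phi < 0 when dissimilar."""
    return cos_sim(context_i, context_j) - b
```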

SLIDE 14

Graphical Model

The random variables in our model are entities (E) and mentions (M). For any assignment to the entities (E = e), we define the model score:

p(E = e) ∝ exp( Σ_{mi ∼ mj} ψa(mi, mj) + Σ_{mi ≁ mj} ψr(mi, mj) )

where ψa(mi, mj) = wa φ(mi, mj) and ψr(mi, mj) = −wr φ(mi, mj)

[Figure: mentions m1, m2, m3 grouped into entity e1; mentions m4, m5 into entity e2]

For this configuration,

p(e1, e2) ∝ exp{ wa (φ12 + φ13 + φ23 + φ45) − wr (φ15 + φ25 + φ35 + φ14 + φ24 + φ34) }

Computational bottlenecks:

1 The space of E is the Bell number of the number of mentions
2 Evaluating the model score for each E = e is O(n²)
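To make the O(n²) cost concrete, a direct evaluation of the score for a given partition might look like the sketch below. `phi` is any mention-pair similarity function; the representation of a partition as a list of mention lists is an assumption for illustration.

```python
from itertools import combinations

def model_log_score(entities, phi, w_a=1.0, w_r=1.0):
    """Unnormalized log-score of a partition of mentions into entities.

    Affinity factors fire on coreferent pairs (same entity), repulsion
    factors on all other pairs; the pair loop is O(n^2) in mentions.
    """
    entity_of = {m: i for i, ms in enumerate(entities) for m in ms}
    mentions = sorted(entity_of)
    score = 0.0
    for mi, mj in combinations(mentions, 2):
        if entity_of[mi] == entity_of[mj]:
            score += w_a * phi(mi, mj)
        else:
            score -= w_r * phi(mi, mj)
    return score
```

In practice the MCMC sampler never evaluates this full sum: a proposal that moves one mention only changes the terms involving that mention.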

SLIDE 15

MCMC for MAP Inference

[Figure: a proposal moves m3 from entity e1 = {m1, m2, m3} to entity e2 = {m4, m5}]

p(e) ∝ exp{ wa (φ12 + φ13 + φ23 + φ45) − wr (φ15 + φ25 + φ35 + φ14 + φ24 + φ34) }
p(e′) ∝ exp{ wa (φ12 + φ34 + φ35 + φ45) − wr (φ15 + φ25 + φ13 + φ14 + φ24 + φ23) }

log( p(e′) / p(e) ) = wa (φ34 + φ35 − φ13 − φ23) − wr (φ13 + φ23 − φ34 − φ35)

SLIDE 16

Mutually Exclusive Proposals

[Figure: two simultaneous proposals over mentions m1 to m5 and entities e1, e2, e3 that touch disjoint factor sets]


SLIDE 19

Results

SLIDE 20

Outline

1 Model and Inference
   Graphical Models · MAP Inference · Distributed Inference

2 Cross-Document Coreference
   Coreference Problem · Pairwise Model · Inference and Distribution

3 Hierarchical Models
   Sub-Entities · Super-Entities

4 Large-Scale Experiments

SLIDE 21

Sub-Entities

  • Consider an accepted move for a mention

SLIDE 22

Sub-Entities

  • Ideally, similar mentions should also move to the same entity
  • The default proposal function does not exploit this
  • Good proposals become rarer as datasets grow

SLIDE 23

Sub-Entities

  • Include sub-entity variables
  • The model score is used to sample sub-entity variables
  • Propose moves of all mentions in a sub-entity simultaneously
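One way to realize the simultaneous move is a block proposal: pick a sub-entity, then move all of its mentions together to a random target entity, or to a brand-new one. This is a sketch with invented data structures; it assumes each sub-entity's mentions currently live inside a single entity, as the hierarchy guarantees.

```python
import random

def sub_entity_proposal(entities, sub_entities, rng):
    """Move every mention of one sub-entity together (sketch).

    entities:     list of sets of mentions
    sub_entities: dict mapping sub-entity id -> set of its mentions
    """
    sub = rng.choice(sorted(sub_entities))
    mentions = sub_entities[sub]
    source = next(i for i, ms in enumerate(entities) if mentions <= ms)
    target = rng.randrange(len(entities) + 1)  # last index = new entity
    proposal = [set(ms) for ms in entities]
    proposal[source] -= mentions
    if target == len(entities):
        proposal.append(set(mentions))
    else:
        proposal[target] |= mentions
    return [ms for ms in proposal if ms]  # drop emptied entities
```

A single such proposal covers what would otherwise take many accepted single-mention moves, which is exactly why good proposals stop being rare on large datasets.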

SLIDE 24

Super-Entities

Random Distribution

  • A random distribution may not assign similar entities to the same machine
  • The probability that similar entities end up on the same machine is small

SLIDE 25

Super-Entities

Model-Based Distribution

  • Augment the model with super-entity variables
  • Entities in the same super-entity are assigned to the same machine
  • The model score is used to sample super-entity variables
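The machine assignment can then follow the super-entities directly. In this sketch a simple hash stands in for the distributor's policy; the point is only the invariant that entities sharing a super-entity land on the same machine.

```python
def shard_by_super_entity(entity_to_super, num_machines):
    """Assign entities to machines by super-entity (sketch).

    Entities sharing a super-entity always land on the same machine,
    so proposals that merge similar entities can actually be made there.
    """
    return {entity: hash(super_entity) % num_machines
            for entity, super_entity in entity_to_super.items()}
```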

SLIDE 26

Hierarchical Representation

[Figure: hierarchy with super-entities at the top, entities in the middle, sub-entities at the bottom]

  • Factors:
    • Affinity factors between mentions in the same sub-entity, between sub-entities in the same entity, and between entities in the same super-entity
    • Repulsion factors are similarly symmetric across levels
  • Sampling: fix the variables of two levels, sample the remaining level

SLIDE 27

Evaluation

SLIDE 28

Outline

1 Model and Inference
   Graphical Models · MAP Inference · Distributed Inference

2 Cross-Document Coreference
   Coreference Problem · Pairwise Model · Inference and Distribution

3 Hierarchical Models
   Sub-Entities · Super-Entities

4 Large-Scale Experiments

SLIDE 29

Preliminary Large-Scale Experiments

Data

  • New York Times Annotated Corpus [Sandhaus, LDC 2008]: 20 years of articles (1987-2007)
  • After pruning rare names (fewer than 1000 mentions): ~1 million person name mentions

Evaluation

  • Automated labels are too noisy for evaluation
  • Instead, we estimate the speed of inference
  • trust the model to accept good proposals
  • observe the number of predicted entities

SLIDE 30

Speed of Inference

SLIDE 31

Related Work

  • GraphLab [Low et al. UAI 2010]
    • how to represent dynamic graphs
    • how to represent hierarchical models
  • Graph Splashing [Gonzalez et al. UAI 2009]
    • the graph structure changes with every configuration
    • BP messages are enormous for exponential-domain variables
  • Topic Models [Smola & Narayanamurthy VLDB 2010; Asuncion et al. NIPS 2009]
    • they face restrictions since they compute probabilities
    • we allow non-random distribution and customized proposals

SLIDE 32

Conclusions

1 propose distributed inference for graphical models
2 enable distributed cross-document coreference
3 improve sharding with latent hierarchical variables
4 demonstrate utility on large datasets

Future Work:

  • more scalability experiments
  • study mixing and convergence properties
  • add more expressive factors
  • supervision: labeled data, noisy evidence

SLIDE 33

Thanks!

Sameer Singh (sameer@cs.umass.edu), Amarnag Subramanya (asubram@google.com), Fernando Pereira (pereira@google.com), Andrew McCallum (mccallum@cs.umass.edu)