Higher-order Coreference Resolution with Coarse-to-fine Inference
Kenton Lee* Luheng He Luke Zettlemoyer University of Washington
* Now at Google
Coreference Resolution
It’s because of what both of you are doing to have things change.
I think that’s what’s… Go ahead Linda. Thanks goes to you and to the media to help us. Absolutely. Obviously we couldn’t seem loud enough to bring the attention, so our hat is off to all of you as well.
Example from Wiseman et al. (2016)
Advantages
Disadvantages
Lee et al. 2017 (EMNLP): end-to-end neural coreference resolution.
For each span i (1 ≤ i ≤ n), compute a span representation h(i) and a distribution over its candidate antecedents, P(y_i | h).
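As a concrete sketch (not the authors' code), the antecedent distribution for a span can be computed as a softmax over candidate antecedent scores plus a dummy antecedent ε whose score is fixed at 0, representing "no antecedent":

```python
import numpy as np

def antecedent_distribution(scores):
    """Softmax over candidate antecedents plus a dummy antecedent epsilon.

    `scores` holds the score of each candidate antecedent of span i;
    the dummy antecedent (no antecedent / non-mention) gets a fixed
    score of 0, following Lee et al. 2017.
    """
    full = np.append(scores, 0.0)        # epsilon appended last
    exp = np.exp(full - full.max())      # numerically stable softmax
    return exp / exp.sum()

# Two candidate antecedents plus epsilon -> a 3-way distribution.
p = antecedent_distribution(np.array([2.0, -1.0]))
```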
In the example above, local information is not sufficient to resolve the pronouns correctly; only the global cluster structure reveals the inconsistency.
Iteratively refining the span representations: the distribution P(y_all of you | h) is computed over the candidate antecedents {I, Linda, you, ε}. Likewise, P(y_you | h) is computed over {I, Linda, ε}, and the model learns a representation of "you" with respect to its antecedent "I". Recomputing P(y_all of you | h) with these refined representations can change the prediction relative to the initial P(y_all of you | h_0), because "you" now carries information about its antecedent.
The refined representations h_n(i) are computed iteratively, starting from h_0(i) = h(i):

a_n(i) = Σ_{y_i} P(y_i | h_{n−1}) · h_{n−1}(y_i)    (attention mechanism)

f_n(i) = σ(W [a_n(i), h_{n−1}(i)])    (forget gates)

h_n(i) = f_n(i) ∘ a_n(i) + (1 − f_n(i)) ∘ h_{n−1}(i)

where ∘ is element-wise multiplication. The antecedent distribution is then recomputed as P(y_i | h_n). The final coreference decision conditions on clusters of size n + 2.
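A minimal NumPy sketch of this refinement loop, under stated assumptions: the gate matrix `W_f` and the bilinear scorer `W_s` are random stand-ins for the learned parameters (the real model scores antecedents with an FFNN), and spans may only attend to earlier spans:

```python
import numpy as np

rng = np.random.default_rng(0)
num_spans, dim, N = 4, 8, 2
h = rng.standard_normal((num_spans, dim))        # h_0(i) = h(i)
W_f = rng.standard_normal((dim, 2 * dim)) * 0.1  # forget-gate weights (random stand-in)
W_s = rng.standard_normal((dim, dim)) * 0.1      # toy bilinear antecedent scorer (stand-in)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def antecedent_probs(h):
    """P(y_i | h): row-wise softmax over earlier spans only."""
    s = h @ W_s @ h.T
    valid = np.tril(np.ones((len(h), len(h))), k=-1)  # antecedent must precede span i
    valid[0, 0] = 1.0          # span 0 has no antecedents; give it a dummy self-link
    s = np.where(valid > 0, s, -1e9)
    e = np.exp(s - s.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

for n in range(1, N + 1):
    P = antecedent_probs(h)                              # P(y_i | h_{n-1})
    a = P @ h                                            # a_n(i): expected antecedent rep.
    f = sigmoid(np.concatenate([a, h], axis=1) @ W_f.T)  # f_n(i): forget gates
    h = f * a + (1.0 - f) * h                            # h_n(i): gated interpolation
```

After N iterations, `antecedent_probs(h)` plays the role of P(y_i | h_N).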
Advantages
Disadvantages: a second-order model already runs out of memory.
It’s because of what both of you are doing to have things change.
P(y_i | h) = softmax(s(i, y_i, h))

Existing scoring function:

s(i, j, h) = FFNN(h(i)) + FFNN(h(j)) + FFNN([h(i), h(j), h(i) ∘ h(j)])

where the first two terms are mention scores and the last is the antecedent score.

Coarse-to-fine scoring function adds a cheap (but inaccurate) bilinear antecedent score:

s(i, j, h) = FFNN(h(i)) + FFNN(h(j)) + h(i)ᵀ W_c h(j) + FFNN([h(i), h(j), h(i) ∘ h(j)])

The expensive FFNN antecedent scores are only computed for the top K span pairs under the cheap scores.
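A small NumPy sketch of the coarse pruning stage, under stated assumptions: `W_c` is a random stand-in for the learned bilinear weights, and `num_spans` and `K` are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
num_spans, dim, K = 6, 8, 3
h = rng.standard_normal((num_spans, dim))
W_c = rng.standard_normal((dim, dim)) * 0.1   # coarse bilinear weights (stand-in)

# Cheap coarse score for every pair, h(i)^T W_c h(j), as a single matmul.
coarse = h @ W_c @ h.T
valid = np.tril(np.ones((num_spans, num_spans)), k=-1)  # antecedent j must precede span i
coarse = np.where(valid > 0, coarse, -np.inf)

# Keep only the top-K candidate antecedents per span; the expensive
# FFNN antecedent score is then computed for these pairs alone.
top_k = np.argsort(-coarse, axis=1)[:, :K]
```

The quadratic bilinear pass is cheap because it is one matrix product; the cubic-cost FFNN scoring then touches only num_spans × K pairs.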
Dataset: English OntoNotes (CoNLL-2012)
Baseline: Lee et al. 2017 with (1) better hyperparameters (deeper LSTMs, longer spans, etc.) and (2) ELMo (Peters et al. 2018) embeddings

Results (test Avg. F1, %):

Lee et al. (2017) + ELMo + hyperparameter tuning    72.3
  + coarse-to-fine                                  72.6
  + second-order inference                          73.1