Constrained Conditional Models: Learning and Inference in Natural Language Understanding


SLIDE 1

Constrained Conditional Models
Learning and Inference in Natural Language Understanding

Dan Roth
Department of Computer Science, University of Illinois at Urbana-Champaign

December 2008, ICMLA

With thanks to:
Collaborators: Ming-Wei Chang, Vasin Punyakanok, Lev Ratinov, Nick Rizzolo, Mark Sammons, Scott Yih, Dav Zimak
Funding: ARDA, under the AQUAINT program; NSF: ITR IIS-0085836, ITR IIS-0428472, ITR IIS-0085980, SoD-HCER-0613885; a DOI grant under the Reflex program; DHS; DASH Optimization (Xpress-MP)

SLIDE 2

Nice to Meet You

SLIDE 3

Learning and Inference

  • Global decisions in which several local decisions play a role, but there are mutual dependencies on their outcome.
  • E.g., Structured Output Problems: multiple dependent output variables.
  • (Learned) models/classifiers for different sub-problems.
  • In some cases, not all models are available to be learned simultaneously; key examples in NLP are Textual Entailment and QA. In these cases, constraints may appear only at evaluation time.
  • Incorporate the models’ information, along with prior knowledge/constraints, in making coherent decisions: decisions that respect the learned models as well as domain- and context-specific knowledge/constraints.

SLIDE 4

Inference

SLIDE 5

Comprehension

(ENGLAND, June, 1989) - Christopher Robin is alive and well. He lives in England. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book. He made up a fairy tale land where Chris lived. His friends were animals. There was a bear called Winnie the Pooh. There was also an owl and a young pig, called a piglet. All the animals were stuffed toys that Chris owned. Mr. Robin made them come to life with his words. The places in the story were all near Cotchfield Farm. Winnie the Pooh was written in 1925. Children still love to read about Christopher Robin and his animal friends. Most people don't know he is a real person who is grown now. He has written two books of his own. They tell what it is like to be famous.

  • 1. Christopher Robin was born in England.
  • 2. Winnie the Pooh is a title of a book.
  • 3. Christopher Robin’s dad was a magician.
  • 4. Christopher Robin must be at least 65 now.

A process that maintains and updates a collection of propositions about the state of affairs.

This is an Inference Problem.

SLIDE 6

This Talk: Constrained Conditional Models

 A general inference framework that combines

Learning conditional models with using declarative expressive constraints

Within a constrained optimization framework

Formulate a decision process as a constrained optimization problem, or

Break up a complex problem into a set of sub-problems and require the components’ outcomes to be consistent modulo constraints

 Has been shown useful in the context of many NLP problems

SRL, Summarization, Co-reference, Information Extraction

[Roth & Yih 04, 07; Punyakanok et al. 05, 08; Chang et al. 07, 08; Clarke & Lapata 06, 07; Denis & Baldridge 07]

 Here: focus on Learning and Inference for Structured NLP Problems

SLIDE 7

Outline

 Constrained Conditional Models

Motivation

Examples

 Training Paradigms: Investigate ways for training models and combining constraints

Joint Learning and Inference vs. decoupling Learning & Inference

Guiding Semi-Supervised Learning with Constraints

Features vs. Constraints

Hard and Soft Constraints

 Examples

Semantic Parsing

Information Extraction

Pipeline processes

SLIDE 8

Inference with General Constraint Structure [Roth&Yih’04]

Dole ’s wife, Elizabeth , is a native of N.C.

Entity classifier scores (local models):

                   other   per    loc
  E1 (Dole)        0.05    0.85   0.10
  E2 (Elizabeth)   0.10    0.60   0.30
  E3 (N.C.)        0.05    0.50   0.45

Relation classifier scores (local models):

                   irrelevant   spouse_of   born_in
  R12              0.05         0.45        0.50
  R23              0.10         0.05        0.85

Improvement over no inference: 2-5%.

Some questions: How to guide the global inference? Why not learn jointly?

Models could be learned separately; constraints may come up only at decision time.
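The decision process on this slide can be reproduced with a small brute-force search standing in for the ILP solver. The scores are the ones from the slide (the assignment of the two relation score rows to R12/R23 is inferred from the example); `best_global_assignment` and the type-constraint table are illustrative names, not part of the original system:

```python
from itertools import product

# Local classifier scores, as on the slide, for
# "Dole 's wife, Elizabeth , is a native of N.C."
entity_scores = {
    "E1": {"other": 0.05, "per": 0.85, "loc": 0.10},  # Dole
    "E2": {"other": 0.10, "per": 0.60, "loc": 0.30},  # Elizabeth
    "E3": {"other": 0.05, "per": 0.50, "loc": 0.45},  # N.C.
}
relation_scores = {
    ("E1", "E2"): {"irrelevant": 0.05, "spouse_of": 0.45, "born_in": 0.50},
    ("E2", "E3"): {"irrelevant": 0.10, "spouse_of": 0.05, "born_in": 0.85},
}

# Type constraints: spouse_of needs (per, per); born_in needs (per, loc).
ALLOWED = {"spouse_of": ("per", "per"), "born_in": ("per", "loc")}

def consistent(ents, rels):
    return all(r == "irrelevant" or ALLOWED[r] == (ents[a], ents[b])
               for (a, b), r in rels.items())

def best_global_assignment():
    """Exhaustively maximize the summed local scores subject to the
    type constraints (a tiny stand-in for the ILP formulation)."""
    names, pairs = list(entity_scores), list(relation_scores)
    best, best_score = None, float("-inf")
    for etypes in product(("other", "per", "loc"), repeat=len(names)):
        ents = dict(zip(names, etypes))
        for rtypes in product(("irrelevant", "spouse_of", "born_in"),
                              repeat=len(pairs)):
            rels = dict(zip(pairs, rtypes))
            if not consistent(ents, rels):
                continue
            score = (sum(entity_scores[n][ents[n]] for n in names)
                     + sum(relation_scores[p][rels[p]] for p in pairs))
            if score > best_score:
                best, best_score = (ents, rels), score
    return best

ents, rels = best_global_assignment()
# The constraints flip E3 from "per" (locally best, 0.50) to "loc" (0.45),
# because committing to born_in for R23 is worth far more globally.
```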

SLIDE 9

Task of Interests: Structured Output

For each instance, assign values to a set of variables.

Output variables depend on each other.

Common tasks in

Natural language processing: Parsing; Semantic Parsing; Summarization; Transliteration; Co-reference resolution, …

Information extraction: Entities, Relations, …

Many pure machine learning approaches exist: Hidden Markov Models (HMMs); CRFs; Perceptrons, …

However, …

SLIDE 10

Information Extraction via Hidden Markov Models

Prediction result of a trained HMM:

Lars Ole Andersen . Program analysis and specialization for the C Programming language . PhD thesis . DIKU , University of Copenhagen , May 1994 .

(Labels assigned, in order: [AUTHOR] [TITLE] [EDITOR] [BOOKTITLE] [TECH-REPORT] [INSTITUTION] [DATE].)

Unsatisfactory results !

SLIDE 11

Strategies for Improving the Results

(Pure) Machine Learning Approaches

Higher Order HMM/CRF?

Increasing the window size?

Adding a lot of new features

Requires a lot of labeled examples

What if we only have a few labeled examples?

Any other options?

Humans can immediately tell bad outputs

The output does not make sense

Increasing the model complexity?

Can we keep the learned model simple and still make expressive decisions?

SLIDE 12

Information extraction without Prior Knowledge

Prediction result of a trained HMM:

Lars Ole Andersen . Program analysis and specialization for the C Programming language . PhD thesis . DIKU , University of Copenhagen , May 1994 .

(Labels assigned, in order: [AUTHOR] [TITLE] [EDITOR] [BOOKTITLE] [TECH-REPORT] [INSTITUTION] [DATE].)

Violates lots of natural constraints!

SLIDE 13

Examples of Constraints

Each field must be a consecutive list of words and can appear at most once in a citation.

State transitions must occur on punctuation marks.

The citation can only start with AUTHOR or EDITOR.

The words pp., pages correspond to PAGE.

Four digits starting with 20xx and 19xx are DATE.

Quotations can appear only in TITLE

…….

Easy to express pieces of “knowledge”.

Non-propositional; may use quantifiers.
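A couple of these constraints can be written down directly as Boolean functions over a predicted label sequence. This is an illustrative sketch; the function name and the exact label inventory are assumptions, not the original implementation:

```python
def satisfies_constraints(labels):
    """Check a per-token label sequence against two of the constraints:
    - the citation can only start with AUTHOR or EDITOR;
    - each field is one consecutive run of tokens (appears at most once).
    """
    if labels and labels[0] not in ("AUTHOR", "EDITOR"):
        return False
    # Collapse consecutive repeats: AUTHOR AUTHOR TITLE -> [AUTHOR, TITLE]
    runs = [labels[0]] if labels else []
    for lab in labels[1:]:
        if lab != runs[-1]:
            runs.append(lab)
    # A label repeated in the run list means a non-consecutive field.
    return len(runs) == len(set(runs))

good = ["AUTHOR", "AUTHOR", "TITLE", "TITLE", "DATE"]
bad = ["AUTHOR", "TITLE", "AUTHOR"]   # AUTHOR appears twice, non-consecutively
```

Functions like these act as hard filters on candidate outputs; the CCM framework generalizes them to weighted penalties.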

SLIDE 14

Adding constraints, we get correct results!

Without changing the model.

[AUTHOR] Lars Ole Andersen . [TITLE] Program analysis and specialization for the C Programming language . [TECH-REPORT] PhD thesis . [INSTITUTION] DIKU , University of Copenhagen , [DATE] May, 1994 .

Information Extraction with Constraints

SLIDE 15

Random Variables Y:

Conditional Distributions P (learned by models/classifiers)

Constraints C: any Boolean function defined on partial assignments (possibly with weights W).

Goal: Find the “best” assignment

The assignment that achieves the highest global performance.

This is an Integer Programming Problem

Problem Setting

(Figure: a constraints network over output variables y1 … y8, with constraints C(y1, y4) and C(y2, y3, y6, y7, y8).)

Y*=argmaxY PY subject to constraints C

(+ WC)

  • bservations
SLIDE 16

Formal Model

y* = argmax_y  Σ_i w_i φ_i(x, y)  -  Σ_i ρ_i d_{C_i}(x, y)

  • First term: the weight vector w for the “local” models, a collection of classifiers; log-linear models (HMM, CRF), or a combination.
  • Second term: the (soft) constraints component; ρ_i is the penalty for violating constraint C_i, and d_{C_i}(x, y) measures how far y is from a “legal” assignment.

How to solve? This is an Integer Linear Program. Solving using ILP packages gives an exact solution; search techniques are also possible.

How to train? How to decompose the global objective function? Should we incorporate constraints in the learning process?
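The objective can be sketched in a few lines: the local models' score minus weighted penalties for constraint violations. All names and numbers below are illustrative; a real system would search the space with an ILP solver rather than an explicit candidate list:

```python
def ccm_score(local_score, y, constraints):
    """CCM objective: w·phi(x, y) minus the weighted distance of y
    from a 'legal' assignment, summed over all constraints."""
    return local_score(y) - sum(rho * d(y) for rho, d in constraints)

def ccm_predict(candidates, local_score, constraints):
    """argmax over candidate assignments of the constrained objective."""
    return max(candidates, key=lambda y: ccm_score(local_score, y, constraints))

# Toy example (hypothetical numbers): the locally best labeling
# violates a "no duplicate labels" constraint.
candidates = [("A0", "A0"), ("A0", "A1")]
local = {("A0", "A0"): 2.0, ("A0", "A1"): 1.5}
no_dup = (10.0, lambda y: len(y) - len(set(y)))  # (rho, d): # of duplicates

best_unconstrained = ccm_predict(candidates, local.get, [])
best_constrained = ccm_predict(candidates, local.get, [no_dup])
```

With the penalty in place, the globally coherent labeling wins even though its local score is lower.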

SLIDE 17

Example: Semantic Role Labeling

I left my pearls to my daughter in my will . [I]A0 left [my pearls]A1 [to my daughter]A2 [in my will]AM-LOC .

A0 Leaver

A1 Things left

A2 Benefactor

AM-LOC Location

I left my pearls to my daughter in my will .

Special case (structured output problem): here, all the data is available at one time; in general, classifiers might be learned from different sources, at different times, in different contexts. Implications on training paradigms.

Overlapping arguments.

If A2 is present, A1 must also be present.

Who did what to whom, when, where, why, …

SLIDE 18

Semantic Role Labeling (2/2)

PropBank [Palmer et al. 05] provides a large human-annotated corpus of semantic verb-argument relations.

It adds a layer of generic semantic labels to Penn Tree Bank II.

(Almost) all the labels are on the constituents of the parse trees.

Core arguments: A0-A5 and AA

different semantics for each verb

specified in the PropBank Frame files

13 types of adjuncts labeled as AM-arg

where arg specifies the adjunct type

SLIDE 19

Algorithmic Approach

Identify argument candidates

Pruning [Xue&Palmer, EMNLP’04]

Argument Identifier

Binary classification (SNoW)

Classify argument candidates

Argument Classifier

Multi-class classification (SNoW)

Inference

Use the estimated probability distribution given by the argument classifier

Use structural and linguistic constraints

Infer the optimal global output

I left my nice pearls to her
[ [ [ [ [ ] ] ] ] ]

Identify candidate arguments (EASY); inference over the (old and new) vocabulary of candidate arguments.

SLIDE 20

Inference

I left my nice pearls to her

The output of the argument classifier often violates some constraints, especially when the sentence is long.

Finding the best legitimate output is formalized as an optimization problem and solved via Integer Linear Programming. [Punyakanok et al. 04; Roth & Yih 04, 05]

Input:

The probability estimation (by the argument classifier)

Structural and linguistic constraints

Allows incorporating expressive (non-sequential) constraints on the variables (the arguments types).

SLIDE 21

Integer Linear Programming Inference

For each argument a_i and type t:

  Set up a Boolean variable a_{i,t}, indicating whether a_i is classified as type t.

Goal is to maximize

  Σ_{i,t} score(a_i = t) · a_{i,t}

subject to the (linear) constraints.

If score(ai = t ) = P(ai = t ), the objective is to find the assignment that maximizes the expected number of arguments that are correct and satisfies the constraints.

The Constrained Conditional Model is completely decomposed during training
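A minimal stand-in for this ILP, using exhaustive search over labelings instead of a solver (feasible only for a handful of candidates; the label set and the `NULL` convention are assumptions for illustration):

```python
from itertools import product

def infer_arguments(scores):
    """scores: a list of {label: score} dicts, one per argument candidate.
    Maximize the summed score subject to 'no duplicate argument classes'
    (only the NULL label may repeat). Brute force replaces the ILP here."""
    labels = sorted(set().union(*scores))
    best, best_val = None, float("-inf")
    for assign in product(labels, repeat=len(scores)):
        used = [t for t in assign if t != "NULL"]
        if len(used) != len(set(used)):
            continue  # a real argument class was duplicated
        val = sum(s.get(t, float("-inf")) for s, t in zip(scores, assign))
        if val > best_val:
            best, best_val = assign, val
    return list(best)

# Both candidates locally prefer A0; the constraint forces the second
# one to its runner-up label.
scores = [{"A0": 0.9, "A1": 0.5, "NULL": 0.1},
          {"A0": 0.8, "A1": 0.6, "NULL": 0.1}]
```

When score(a_i = t) is a probability estimate, as on the slide, the returned assignment maximizes the expected number of correct arguments among the constraint-satisfying labelings.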

SLIDE 22

Constraints

No duplicate argument classes:

  Σ_{a ∈ POTARG} x{a = A0} ≤ 1

R-ARG (if there is an R-ARG phrase, there is an ARG phrase):

  ∀ a2 ∈ POTARG, ∃ a ∈ POTARG:  x{a2 = R-A0} ⇒ x{a = A0}

C-ARG (if there is a C-ARG phrase, there is an ARG phrase before it):

  ∀ a2 ∈ POTARG, ∃ (a ∈ POTARG) ∧ (a is before a2):  x{a2 = C-A0} ⇒ x{a = A0}

Many other possible constraints:

  Unique labels

  No overlapping or embedding

  Relations between number of arguments; order constraints

  If verb is of type A, no argument of type B

Any Boolean rule can be encoded as a linear constraint; universally quantified rules are allowed.

Joint inference can be used also to combine different SRL systems.

LBJ allows a developer to encode constraints in FOL; these are compiled into linear inequalities automatically.

SLIDE 23

Semantic Role Labeling

Screen shot from a CCG demo: http://L2R.cs.uiuc.edu/~cogcomp

Semantic parsing reveals several relations in the sentence along with their arguments.

Top ranked system in the CoNLL’05 shared task; the key difference is the inference.

This approach produces a very good semantic parser: F1 ~90%. Easy and fast: ~7 sentences/sec (using Xpress-MP).

SLIDE 24

Outline

 Constrained Conditional Models

Motivation

Examples

 Training Paradigms: Investigate ways for training models and combining constraints

Joint Learning and Inference vs. decoupling Learning & Inference

Guiding Semi-Supervised Learning with Constraints

Features vs. Constraints

Hard and Soft Constraints

 Examples

Semantic Parsing

Information Extraction

Pipeline processes

SLIDE 25

Textual Entailment

Eyeing the huge market potential, currently led by Google, Yahoo took over search company Overture Services Inc. last year  ⇒  Yahoo acquired Overture

Is it true that…? (Textual Entailment)

Overture is a search company. Google is a search company. … Google owns Overture.

Components: phrasal verb paraphrasing [Connor & Roth ’07]; entity matching [Li et al., AAAI’04, NAACL’04]; Semantic Role Labeling [Punyakanok et al. ’05, ’08]; inference for entailment [Braz et al. ’05, ’07].

SLIDE 26

Training Paradigms that Support Global Inference

Incorporating general constraints (Algorithmic Approach)

  Allow both statistical and expressive declarative constraints

  Allow non-sequential constraints (generally difficult)

Coupling vs. Decoupling Training and Inference

  Incorporating global constraints is important, but should it be done only at evaluation time, or also at training time?

  How to decompose the objective function and train in parts?

  Issues related to: modularity, efficiency and performance, availability of training data, and problem-specific considerations

SLIDE 27

Training in the presence of Constraints

General Training Paradigm:

First term: learning from data (could be further decomposed).

Second term: guiding the model by constraints.

Can choose whether the constraints’ weights are trained (and when and how), or taken into account only in evaluation.

Decompose the model (SRL case); decompose the model from the constraints.

SLIDE 28

L+I: Learning plus Inference. IBT: Inference-Based Training.

Training w/o constraints. Testing: inference with constraints.

(Figure: inputs X = {x1 … x7}, outputs Y = {y1 … y5}, local models f1(x) … f5(x).)

Learning the components together!

Cartoon: each model can be more complex and may have a view on a set of output variables.

SLIDE 29

Perceptron-based Global Learning

(Figure: inputs X = {x1 … x7}, local models f1(x) … f5(x), outputs Y.)

True global labeling Y:          -1  1  1  -1  -1
Local predictions Y’:            -1  1  1   1   1
After applying constraints Y’:   -1  1  1   1  -1

Which one is better? When and why?
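IBT can be sketched as a structured perceptron whose predictions come from constrained inference, so constraint violations feed directly into the weight updates. The toy problem and all names below are illustrative, not the talk's actual experimental setup:

```python
from itertools import product

def ibt_perceptron(data, phi, infer, dim, epochs=10):
    """Inference-Based Training: a structured perceptron where `infer`
    returns argmax over *constraint-satisfying* outputs, so constraints
    participate in learning, not just in decoding."""
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in data:
            y_hat = infer(w, x)            # constrained global prediction
            if y_hat != y:                 # additive perceptron update
                fy, fh = phi(x, y), phi(x, y_hat)
                w = [wi + a - b for wi, a, b in zip(w, fy, fh)]
    return w

# Toy structured problem: label each token 0/1; the (made-up) constraint
# allows at most one label 1 in an output.
def phi(x, y):
    match = sum(1 for a, b in zip(x, y) if a == b)
    return [match, len(x) - match]          # [# matches, # mismatches]

def infer(w, x):
    legal = [y for y in product((0, 1), repeat=len(x)) if sum(y) <= 1]
    return max(legal, key=lambda y: sum(wi * fi
                                        for wi, fi in zip(w, phi(x, y))))

data = [((0, 0, 1), (0, 0, 1)), ((1, 0, 0), (1, 0, 0)), ((0, 0, 0), (0, 0, 0))]
w = ibt_perceptron(data, phi, infer, dim=2)
```

After training, constrained inference with the learned weights recovers every gold labeling in this toy set.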

SLIDE 30

Claims

When the local models are “easy” to learn, L+I outperforms IBT.

  In many applications, the components are identifiable and easy to learn (e.g., argument, open-close, PER).

Only when the local problems become difficult to solve in isolation does IBT outperform L+I, but it needs a larger number of training examples.

When data is scarce, problems are not easy, and constraints can be used, along with a “weak” model, to label unlabeled data and improve the model.

Other training paradigms are possible

  Pipeline-like sequential models: identify a preferred ordering among components; learn the k-th model jointly with the previously learned models.

L+I: cheaper computationally; modular. IBT is better in the limit, and in other extreme cases.

SLIDE 31

Bound Prediction

  Local:   ε ≤ ε_opt + ( ( d·log m + log 1/δ ) / m )^{1/2}

  Global:  ε ≤ 0 + ( ( c·d·log m + c²·d + log 1/δ ) / m )^{1/2}

(Plot: bounds and simulated data for ε_opt = 0, 0.1, 0.2.)

L+I vs. IBT: the more identifiable the individual problems are, the better the overall performance is with L+I.

Indication for hardness of problem.

SLIDE 32

Relative Merits: SRL

Difficulty of the learning problem (# features): easy → hard

L+I is better. When the problem is artificially made harder, the tradeoff is clearer.

In some cases problems are hard due to lack of training data → semi-supervised learning.

SLIDE 33

Outline

 Constrained Conditional Models

Motivation

Examples

 Training Paradigms: Investigate ways for training models and combining constraints

Joint Learning and Inference vs. decoupling Learning & Inference

Guiding Semi-Supervised Learning with Constraints

Features vs. Constraints

Hard and Soft Constraints

 Examples

Semantic Parsing

Information Extraction

Pipeline processes

SLIDE 34

Information extraction without Prior Knowledge

Prediction result of a trained HMM:

Lars Ole Andersen . Program analysis and specialization for the C Programming language . PhD thesis . DIKU , University of Copenhagen , May 1994 .

(Labels assigned, in order: [AUTHOR] [TITLE] [EDITOR] [BOOKTITLE] [TECH-REPORT] [INSTITUTION] [DATE].)

Violates lots of natural constraints!

SLIDE 35

Examples of Constraints

Each field must be a consecutive list of words and can appear at most once in a citation.

State transitions must occur on punctuation marks.

The citation can only start with AUTHOR or EDITOR.

The words pp., pages correspond to PAGE.

Four digits starting with 20xx and 19xx are DATE.

Quotations can appear only in TITLE

…….

Easy to express pieces of “knowledge”.

Non-propositional; may use quantifiers.

SLIDE 36

Adding constraints, we get correct results!

Without changing the model.

[AUTHOR] Lars Ole Andersen . [TITLE] Program analysis and specialization for the C Programming language . [TECH-REPORT] PhD thesis . [INSTITUTION] DIKU , University of Copenhagen , [DATE] May, 1994 .

Information Extraction with Constraints

SLIDE 37

Features Versus Constraints

φ_i : X × Y → R;   C_i : X × Y → {0,1};   d : X × Y → R

 In principle, constraints and features can encode the same properties.

 In practice, they are very different:

 Features

   Local, short-distance properties, to allow tractable inference

   Propositional (grounded): e.g., true if “‘the’ followed by a Noun occurs in the sentence”

 Constraints

   Global properties

   Quantified, first-order logic expressions: e.g., true if “all y_i's in the sequence y are assigned different values”

Indeed, used differently.
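The two examples can be written out to make the contrast concrete: a grounded, local feature versus a constraint quantified over the whole output (function names are illustrative):

```python
# Feature phi_i : X x Y -> R. Propositional (grounded) and local:
# it fires on one specific lexical pattern.
def phi_the_noun(x, y):
    """1.0 iff 'the' followed by a token tagged Noun occurs in the sentence."""
    return float(any(w == "the" and t == "Noun" for w, t in zip(x, y[1:])))

# Constraint C_i : X x Y -> {0, 1}. Quantified over the entire output,
# regardless of the input x.
def c_all_different(x, y):
    """1 iff all y_i's in the sequence y are assigned different values."""
    return int(len(set(y)) == len(y))
```

The feature contributes a weighted term to the score wherever its pattern appears; the constraint is a single global predicate that inference must respect (or pay a penalty for).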

SLIDE 38

Encoding Prior Knowledge

Consider encoding the knowledge that: entities of type A and B cannot occur simultaneously in a sentence.

The “Feature” Way

  Results in a higher order HMM, CRF

  May require designing a model tailored to the knowledge/constraints

  Large number of new features: might require more labeled data

  Wastes parameters to learn indirectly knowledge we already have

The Constraints Way

  Keeps the model simple; adds expressive constraints directly

  A small set of constraints

  Allows for decision-time incorporation of constraints

SLIDE 39

Outline

 Constrained Conditional Models

Motivation

Examples

 Training Paradigms: Investigate ways for training models and combining constraints

Joint Learning and Inference vs. decoupling Learning & Inference

Guiding Semi-Supervised Learning with Constraints

Features vs. Constraints

Hard and Soft Constraints

 Examples

Semantic Parsing

Information Extraction

Pipeline processes

SLIDE 40

Guiding Semi-Supervised Learning with Constraints

(Figure: model; decision-time constraints; unlabeled data; constraints.)

In traditional semi-supervised learning, the model can drift away from the correct one.

Constraints can be used:

  At decision time, to bias the objective function towards favoring constraint satisfaction.

  At training time, to improve the labeling of unlabeled data (and thus improve the model).

SLIDE 41

Training Strategies

Hard Constraints or Weighted Constraints?

  Hard constraints: set penalties to infinity; no more degrees of violation

  Weighted constraints: need to figure out penalty values

Factored / Joint Approaches

  Factored models (L+I): learn the model weights and the constraints’ penalties separately

  Joint models (IBT): learn the model weights and the constraints’ penalties jointly

L+I vs. IBT: [Punyakanok et al. 05]

Training algorithms: L+CI, L+wCI, CIBT, wCIBT

SLIDE 42

Factored (L+I) Approaches

Learning model weights: HMM

Constraint penalties:

  Hard constraints: infinity

  Weighted constraints:  ρ_i = -log P{constraint C_i is violated in the training data}
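This penalty estimate is a one-liner; the function below is an illustrative sketch (the hard-constraint convention for zero observed violations is an assumption):

```python
import math

def constraint_penalty(num_violated, num_examples):
    """rho_i = -log P{constraint C_i is violated in the training data}.
    Rarely-violated constraints get large penalties; a constraint violated
    in every training example gets penalty 0 (it carries no information).
    A never-violated constraint is treated as hard (infinite penalty)."""
    if num_violated == 0:
        return math.inf
    return -math.log(num_violated / num_examples)
```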

SLIDE 43

Joint Approaches

Structured Perceptron

SLIDE 44

Semi-supervised Learning with Constraints

Γ = learn(T)                      (supervised learning algorithm, parameterized by γ)
For N iterations do:
    T = ∅
    For each x in the unlabeled dataset:
        {y1, …, yK} = InferenceWithConstraints(x, C, Γ)      (inference-based augmentation of the training set: feedback, with constrained inference)
        T = T ∪ {(x, yi)}, i = 1…K
    Γ = γ·Γ + (1-γ)·learn(T)      (learn from the new training data; weigh the supervised and unsupervised models)

[Chang, Ratinov, Roth, ACL’07]
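The loop above can be sketched in Python. The toy `learn` and `infer_top_k` at the bottom are purely illustrative stand-ins for a real model and constrained inference:

```python
def codl(labeled, unlabeled, learn, infer_top_k, gamma=0.9, iters=5):
    """Constraint-driven learning sketch: repeatedly retrain on
    constraint-guided self-labeled data, then interpolate with the
    previous model (gamma limits semi-supervised drift)."""
    model = learn(labeled)
    for _ in range(iters):
        t = list(labeled)                    # keep the supervised seed
        for x in unlabeled:
            for y in infer_top_k(model, x):  # K best legal labelings of x
                t.append((x, y))
        retrained = learn(t)
        model = [gamma * a + (1 - gamma) * b
                 for a, b in zip(model, retrained)]
    return model

# Purely illustrative instantiation: the "model" is [P(y = 1)] and
# inference just thresholds it (K = 1).
def learn(examples):
    return [sum(y for _, y in examples) / len(examples)]

def infer_top_k(model, x):
    return [1 if model[0] >= 0.5 else 0]

labeled = [(0, 1), (1, 1), (2, 0)]
model = codl(labeled, unlabeled=[3, 4], learn=learn, infer_top_k=infer_top_k)
```

Because γ is close to 1, the self-labeled data nudges the model rather than replacing it, which is the point of the interpolation step.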

SLIDE 45

Outline

Constrained Conditional Model

Features vs. Constraints

Inference

Training

Semi-supervised Learning

Results

Discussion

SLIDE 46

Results on Factored Model -- Citations

In all cases: the semi-supervised setting uses 1000 unlabeled examples.

In all cases: significantly better results than previously existing results. [Chang et al. ’07]
SLIDE 47

Results on Factored Model -- Advertisements

SLIDE 48

Hard Constraints vs. Weighted Constraints

Constraints are close to perfect.

Labeled data might not follow the constraints.

SLIDE 49

Factored vs. Joint Training

Using the best models for both settings:

  Factored training: HMM + weighted constraints

  Joint training: Perceptron + weighted constraints

  Same feature set. Agrees with earlier results in the supervised setting [ICML’05, IJCAI’05].

With constraints: the factored model is better; more significant with a small # of examples.

Without constraints: with few labeled examples, HMM > Perceptron; with many labeled examples, Perceptron > HMM.

SLIDE 50

Value of Constraints in Semi-Supervised Learning

(Plot: objective function value vs. # of available labeled examples.)

Learning with 10 constraints: constraints are used to bootstrap a semi-supervised learner. A poor model + constraints are used to annotate unlabeled data, which in turn is used to keep training the model.

Learning w/o constraints: 300 examples. Factored model.

SLIDE 51

Summary: Constrained Conditional Models

y* = argmax_y  Σ_i w_i φ_i(x, y)  -  Σ_i ρ_i d_{C_i}(x, y)

First term: a Conditional (Markov) Random Field. Linear objective function; typically φ(x, y) will be local functions, or φ(x, y) = φ(x).

Second term: a Constraints Network. Expressive constraints over output variables; soft, weighted constraints; specified declaratively as FOL formulae.

Clearly, there is a joint probability distribution that represents this mixed model.

We would like to:

  Learn a simple model, or several simple models

  Make decisions with respect to a complex model

Key difference from MLNs, which provide a concise definition of a model, but of the whole joint one.

SLIDE 52

Conclusion

Constrained Conditional Models combine

  Learning conditional models with using declarative expressive constraints

  Within a constrained optimization framework

Use constraints! The framework supports:

  A clean way of incorporating constraints to bias and improve decisions of supervised learning models

  Significant success on several NLP and IE tasks (often, with ILP)

  A clean way to use (declarative) prior knowledge to guide semi-supervised learning

Training protocol matters: more work needed here.

LBJ (Learning Based Java): http://L2R.cs.uiuc.edu/~cogcomp. A modeling language for Constrained Conditional Models. Supports programming along with building learned models, high-level specification of constraints, and inference with constraints.

SLIDE 53

Questions?

Thank you