Semi-Supervised Learning of Sequence Models via Method of Moments

EMNLP 2016 - Empirical Methods in Natural Language Processing, Austin, Texas, November 1-6, 2016

Zita Marinho · Shay B. Cohen · Noah A. Smith · André F. T. Martins


SLIDE 1

Semi-Supervised Learning of Sequence Models via Method of Moments

EMNLP - Empirical Methods for Natural Language Processing

Zita Marinho - IST, University of Lisbon & Robotics Institute, CMU
Shay B. Cohen - School of Informatics, University of Edinburgh
André F. T. Martins - IT, IST, University of Lisbon & Unbabel
Noah A. Smith - Computer Science & Eng., University of Washington

November 1-6, 2016, Austin, Texas

zmarinho@cmu.edu | scohen@inf.ed.ac.uk | andre.martins@unbabel.com | nasmith@cs.washington.edu

SLIDE 2

EMNLP 16 | Semi-supervised sequence labeling with MoM |

Introduction

Sequence Labeling

[Figure: the sentence "Herb fights like a ninja ." shown as observed words w1…w6 with hidden labels y1…y6; tag assignment shown: N V Pre . Det N]

  • observed data {w1, w2, w3, …, w6}
  • labels {y1, y2, y3, …, y6}

SLIDE 3

Introduction

Sequence Labeling

[Figure: the same sentence "Herb fights like a ninja ." with an alternative tag assignment: N V V . Det N]

  • observed data {w1, w2, w3, …, w6}
  • labels {y1, y2, y3, …, y6}

SLIDE 4

Introduction

Sequence Labeling

[Figure: the same sentence with another alternative tag assignment: ADJ N V . Det N]

  • observed data {w1, w2, w3, …, w6}
  • labels {y1, y2, y3, …, y6}

SLIDE 5

Introduction

Sequence Labeling

[Figure: the same sentence with all labels unknown: ? ? ? ? ? ?]

K^6 possible assignments (K labels over 6 positions)

  • observed data {w1, w2, w3, …, w6}
  • labels {y1, y2, y3, …, y6}

SLIDE 6

Introduction

Hidden Markov Model

[Figure: HMM over words w1…w6 with hidden states y1…y6]

Learn parameters?

p(yt | yt-1)    p(wt | yt)

  • supervised learning
  • unsupervised/semi-supervised (this talk)
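In the supervised case, both distributions can be estimated by counting and normalizing over a labeled corpus; a minimal sketch using the slides' toy sentence (illustrative, not the talk's implementation):

```python
from collections import Counter

def hmm_mle(tagged_sentences):
    """Count-based MLE for the HMM: p(y_t | y_t-1) and p(w_t | y_t)."""
    trans, emit = Counter(), Counter()
    for sent in tagged_sentences:
        prev = "<start>"
        for word, tag in sent:
            trans[(prev, tag)] += 1
            emit[(tag, word)] += 1
            prev = tag
        trans[(prev, "<stop>")] += 1

    def normalize(counts):
        totals = Counter()
        for (a, _), c in counts.items():
            totals[a] += c
        return {(a, b): c / totals[a] for (a, b), c in counts.items()}

    return normalize(trans), normalize(emit)

corpus = [[("Herb", "N"), ("fights", "V"), ("like", "Pre"),
           ("a", "Det"), ("ninja", "N"), (".", ".")]]
trans, emit = hmm_mle(corpus)
```

With this single sentence, the label N is followed by V once and by "." once, so p(V | N) = 0.5.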
SLIDE 7

Introduction

Hidden Markov Model

[Figure: HMM over words w1…w6 with hidden states y1…y6]

Learn parameters?

p(yt | yt-1)    p(wt | yt)

  • model can be extended to include features

Berg-Kirkpatrick et al., Painless Unsupervised Learning with Features, NAACL-HLT 2010.

  • supervised learning
  • unsupervised/semi-supervised (this talk)
SLIDE 8

Problem Statement

Maximum Likelihood estimation (MLE)

  • exact inference is hard
  • EM is sensitive to local optima (depends on initialization)
  • EM is expensive on large datasets (several inference passes)

Method of Moments estimation (MoM)

  • computationally efficient
  • no local optima
  • one pass over the data
SLIDE 9

Introduction

Hidden Markov Model: via Maximum Likelihood Estimation vs. via Method of Moments

[Table: prior work on learning HMMs and feature HMMs, unsupervised vs. semi-supervised, via MLE (✓) and via MoM; MoM entries for the semi-supervised and feature-based settings are marked "?", the gap addressed in this work]

Arora et al., A Practical Algorithm for Topic Modeling with Provable Guarantees, ICML 2013.
Cohen, Stratos, Collins, Foster and Ungar, Spectral Learning of Latent-Variable PCFGs: Algorithms and Sample Complexity, JMLR 2014.
SLIDE 10

Outline

Learning sequence models via MoM

  • 1. Learn HMM models via MoM
  • 2. Solve a QP
  • 3. Extend to feature-based model
  • 4. Experiments
SLIDE 11

Anchor Learning

Method of Moments, key insight:

  • 1. Conditional Independence: infer the label by looking at the context
  • 2. Anchor Trick: learn a proxy for labels with anchors

SLIDE 12

Anchor Learning

[Figure: HMM over the tweet "hehe its gonna b a good day" as words w1…w7 with hidden labels y1…y7, between start and stop states]

  • 1. Conditional Independence
SLIDE 13

[Figure: HMM over the tweet ":) wait now I am goin 2" as words w1…w7 with hidden labels y1…y7, between start and stop states]

context = { w-1 , w+1 }

Log-linear model

  • 1. Conditional Independence
SLIDE 14

Problem Statement

[Figure: window wt-1 wt wt+1 = "tasted like chimichangas" with labels yt-1 yt yt+1; here "like" is tagged adp; the neighboring words form its context]

  • 1. Conditional Independence
SLIDE 15

Problem Statement

[Figure: window "i like fajitas"; here "like" is tagged verb; the neighboring words form its context]

  • 1. Conditional Independence
SLIDE 16

Problem Statement

[Figure: window "i like fajitas"; the label verb for "like" is inferred from its context]

"You shall know a word by the company it keeps." (Firth, 1957)

  • 1. Conditional Independence
SLIDE 17

Anchor Learning

[Figure: HMM over the tweet "hehe its gonna b a good day" with hidden labels y1…y7]

word ⊥ context | label

  • 1. Conditional Independence
SLIDE 18

Anchor Learning

Arora et al., A Practical Algorithm for Topic Modeling with Provable Guarantees, ICML 2013.

[Figure: window wt-1, wt = "be", wt+1; "be" is an anchor word for the label verb]

p( label ≠ verb | be ) = 0    p( verb | be ) = 1

  • 2. Anchor Trick: all instances of "be" are tagged verb

SLIDE 19

Anchor Learning

More anchors per label: with more than one anchor word, the context estimates are less biased.

verb = { b, be, are, is, am, have, going }

  • 2. Anchor Trick
SLIDE 20

Anchor Learning

How to find anchors?

  • small labeled corpus
  • small lexicon

[Figure: example anchors per label. noun: Austin, airport, playground; verb: am, be, is, are, go, make, made, become; adp: so, on, of; pron: he, it, she]

  • 2. Anchor Trick
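One simple way to pick anchors from a small labeled corpus (a hypothetical `find_anchors` helper, not the paper's exact selection rule): keep words that occur often enough and almost always carry the same label.

```python
from collections import Counter

def find_anchors(tagged_words, min_count=5, threshold=0.95):
    """Anchor candidates: words that (nearly) always carry one label."""
    pair_counts = Counter(tagged_words)             # (word, label) occurrences
    word_counts = Counter(w for w, _ in tagged_words)
    anchors = {}
    for (word, label), c in pair_counts.items():
        if word_counts[word] >= min_count and c / word_counts[word] >= threshold:
            anchors.setdefault(label, []).append(word)
    return anchors

toy = ([("be", "verb")] * 10 + [("day", "noun")] * 7
       + [("like", "verb")] * 6 + [("like", "adp")] * 4)
anchors = find_anchors(toy)   # "like" is ambiguous, so it is filtered out
```

An ambiguous word like "like" (verb or adposition) never clears the threshold, which is exactly the point of the anchor property.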
SLIDE 21

Method of Moments

Estimate co-occurrences from unlabeled data.

[Figure: example sentences with wt = "like" in context: "Andrew fights like Jet Li."  "eat Fruit like cherry."  "Ann sings like me."  "Children like ice-cream."  Context positions: wt-1, wt+1, wt+2]

SLIDE 22

Method of Moments

[Figure: moment matrix Q, one column per word type; the column for "like" counts its context words (Children, cherry, ice-cream, fights, a, Jet, me, …) from the example sentences]

Q = p(context | word)
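The matrix Q can be estimated from raw text by counting context words around each token and normalizing per word type; a small sketch (the window offsets and normalization are illustrative choices, not the paper's exact configuration):

```python
import numpy as np

def build_Q(sentences, offsets=(-1, 1, 2)):
    """Empirical p(context | word): one column per word type,
    one row per (offset, context-word) pair."""
    words = sorted({w for s in sentences for w in s})
    ctxs = sorted({(d, s[i + d]) for s in sentences for i in range(len(s))
                   for d in offsets if 0 <= i + d < len(s)})
    widx = {w: j for j, w in enumerate(words)}
    cidx = {c: i for i, c in enumerate(ctxs)}
    Q = np.zeros((len(ctxs), len(words)))
    for s in sentences:
        for i, w in enumerate(s):
            for d in offsets:
                if 0 <= i + d < len(s):
                    Q[cidx[(d, s[i + d])], widx[w]] += 1
    Q /= np.maximum(Q.sum(axis=0, keepdims=True), 1e-12)  # column-normalize
    return Q, widx, cidx

sents = [["Andrew", "fights", "like", "Jet", "Li", "."],
         ["Children", "like", "ice-cream", "."]]
Q, widx, cidx = build_Q(sents)
```

Each column is a proper distribution over contexts, so the column for "like" sums to one.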

SLIDE 23

Method of Moments

[Figure: two more sentences, "Let there be love." and "Bill will be a ninja.", add a column for the word "be" (contexts: love, there, will, …) to Q]

Q = p(context | word)

SLIDE 24

Method of Moments

  • 1. Conditional Independence: word ⊥ context | label

p(context | word) = Σ_labels p(context | label) · p(label | word)

In matrix form: Q = Γ R, with
  Q = p(context | word)
  Γ = p(label | word)
  R = p(context | label)
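The factorization can be checked numerically on a toy model (all numbers made up for illustration): rows of Γ are p(label | word), rows of R are p(context | label), and their product gives each word's context distribution.

```python
import numpy as np

# toy model: 3 word types, 2 labels, 4 context types (made-up numbers)
Gamma = np.array([[1.0, 0.0],    # p(label | word), one row per word
                  [0.2, 0.8],
                  [0.5, 0.5]])
R = np.array([[0.4, 0.3, 0.2, 0.1],   # p(context | label), one row per label
              [0.1, 0.2, 0.3, 0.4]])

# conditional independence: p(context | word) = sum_label p(label|word) p(context|label)
Q = Gamma @ R
```

Each row of Q is a proper distribution because each row of Γ and R is; the first word has a deterministic label, so its context distribution equals the first row of R, which is the anchor property in miniature.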

SLIDE 25

Method of Moments

  • 1. Conditional Independence:  p(context | word) = Σ_labels p(context | label) · p(label | word)
  • 2. Anchor Trick:  estimate p(context | label) as p(context | anchors)

Matrix form: Q = Γ R, where R is now estimated directly from the anchors' contexts.

SLIDE 26

Outline

Learning sequence models via MoM

  • 1. Learn HMM models via MoM
  • 2. Solve a QP
  • 3. Extend to feature-based model
  • 4. Experiments
SLIDE 27

Method of Moments

With Q = Γ R, let q be a word's context distribution (from Q) and γ its label distribution (from Γ).

  • solve per word type (~ms each):

γ* = argmin_γ || q − R γ ||²    s.t.  0 ≤ γ ≤ 1,  Σ_labels γ = 1
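The constraint set (γ ≥ 0, Σγ = 1) is the probability simplex, so this QP can be solved per word by projected gradient; a minimal sketch (the paper does not prescribe this particular solver):

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto {x : x >= 0, sum(x) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(u) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

def solve_qp(q, R, steps=500):
    """min_gamma ||q - R.T @ gamma||^2  s.t. gamma on the simplex.
    q: a word's context distribution; R: (labels x contexts)."""
    gamma = np.full(R.shape[0], 1.0 / R.shape[0])
    lr = 0.5 / (np.linalg.norm(R, 2) ** 2 + 1e-12)   # safe step size
    for _ in range(steps):
        grad = 2.0 * R @ (R.T @ gamma - q)
        gamma = project_simplex(gamma - lr * grad)
    return gamma

R = np.array([[0.7, 0.3], [0.1, 0.9]])      # toy p(context | label)
q = R.T @ np.array([0.6, 0.4])              # word whose true gamma is (0.6, 0.4)
gamma = solve_qp(q, R)
```

On this toy instance the unconstrained minimizer already lies on the simplex, so the solver recovers γ = (0.6, 0.4).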

SLIDE 28

Method of Moments

Add a supervised regularizer:

γ* = argmin_γ || q − R γ ||² + λ || γsup − γ ||²    s.t.  0 ≤ γ ≤ 1,  Σ_labels γ = 1

SLIDE 29

Method of Moments

γ* = argmin_γ || q − R γ ||² + λ || γsup − γ ||²    s.t.  0 ≤ γ ≤ 1,  Σ_labels γ = 1

where γsup is estimated from labeled data, and q and R are estimated from unlabeled data.

SLIDE 30

Method of Moments

HMM Learning: learn parameters from the γ coefficients.

p(label) = Σ_words p(label | word) · p(word)

Observation matrix, by Bayes' Rule:

p(word | label) = p(label | word) · p(word) / p(label)
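Given the per-word solutions stacked into Γ = p(label | word) and the empirical word frequencies, the observation matrix follows directly from Bayes' rule; a minimal numpy sketch with toy numbers:

```python
import numpy as np

def observation_matrix(Gamma, p_word):
    """Bayes' rule: p(word | label) = p(label | word) p(word) / p(label)."""
    p_label = p_word @ Gamma                 # p(y) = sum_w p(y | w) p(w)
    O = Gamma * p_word[:, None] / p_label[None, :]
    return O, p_label

Gamma = np.array([[1.0, 0.0],   # p(label | word), one row per word
                  [0.2, 0.8],
                  [0.5, 0.5]])
p_word = np.array([0.5, 0.3, 0.2])
O, p_label = observation_matrix(Gamma, p_word)
```

Each column of O sums to one, i.e. every label gets a proper emission distribution.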

SLIDE 31

Method of Moments

HMM Learning:

  • Observation matrix via Bayes' Rule: p(word | label) = p(label | word) · p(word) / p(label)
  • Transition matrix: estimate from labeled data only
SLIDE 32

Outline

Learning sequence models via MoM

  • 1. Learn HMM models via MoM
  • 2. Relax the notion of anchors
  • 3. Solve a QP
  • 4. Experiments
SLIDE 33

Experiments

Semi-supervised Twitter POS tagging

  • 12 Universal POS tags
  • 200k-word Twitter dataset
  • 2.7M unlabeled tweets; 100-1000 labeled tweets

Example: "hehe its gonna b a good day" → x prt verb verb det adj noun

Petrov et al., A Universal Part-of-Speech Tagset, 2011.
Owoputi et al., Improved Part-of-Speech Tagging for Online Conversational Text with Word Clusters, 2013.

SLIDE 34

Experiments

Twitter POS tagging, 150 labeled training sequences

[Chart: HMM tagging accuracy for HMM, EM, self-training, AHMM; values as listed: 71.7, 77.2, 78.2, 84.3]

SLIDE 35

Experiments

Twitter POS tagging, 1000 labeled training sequences

[Chart: HMM tagging accuracy for HMM, EM, self-training, AHMM; values as listed: 81.1, 83.1, 86.1, 88.0]

SLIDE 36

Outline

Learning sequence models via MoM

  • 1. Learn HMM models via MoM
  • 2. Relax the notion of anchors
  • 3. Extend to feature HMM
  • 4. Experiments
SLIDE 37

Extend to features

[Figure: HMM over the tweet ":) wait now I am goin 2" as words w1…w7 with hidden labels y1…y7, between start and stop states]

Log-linear model with feature function φ(word):
  • is upper
  • is title
  • is digit
  • is url
  • starts #
  • is emoticon

Berg-Kirkpatrick et al., Painless Unsupervised Learning with Features, NAACL-HLT 2010.
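The binary features on the slide have direct string tests; a sketch of φ(word) (the URL and emoticon checks are crude illustrative regexes, not the paper's feature set):

```python
import re

def phi(word):
    """Binary feature vector phi(word) for a token."""
    return [
        int(word.isupper()),                                  # is upper
        int(word.istitle()),                                  # is title
        int(word.isdigit()),                                  # is digit
        int(bool(re.match(r"(https?://|www\.)", word))),      # is url (crude)
        int(word.startswith("#")),                            # starts #
        int(bool(re.fullmatch(r"[:;=][-']?[)(DPp]", word))),  # is emoticon (crude)
    ]
```

For example, phi(":)") fires only the emoticon feature, and phi("2") only the digit feature.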

SLIDE 38

Extend to features

[Figure: the same tweet HMM, with feature functions φ(word) on words and ψ(context) on contexts]

word ⊥ context | label

  • 1. Conditional Independence
SLIDE 39

Extend to features

Log-linear model: the same decomposition, with probabilities replaced by feature expectations:

  Q:  p(context | word)  becomes  E[ ψ(context) × φ(word) ]
  R:  p(context | label)  becomes  E[ ψ(context) | label ]
  Γ:  p(label | word)  becomes  E[ φ(word) | label ] · p(label) / E[ φ(word) ]

Matrix form: Q = Γ R, with R estimated from the anchors.

SLIDE 40

Method of Moments

Log-linear model: the same QP, now solved per feature dimension φj:

γ* = argmin_γ || q − R γ ||² + λ || γsup − γ ||²    s.t.  Σ_labels γ = 1

SLIDE 41

Method of Moments

Log-linear model: learn parameters from the γ coefficients.

Mean parameters:

µy = E[ φ(word) | label ]    with    γ = E[ φ(word) | label ] · p(label) / E[ φ(word) ]

SLIDE 42

Extend to features

Log-linear model: learn parameters?

Recover canonical parameters θy from mean parameters µy via Fenchel-Legendre duality:

θy* = argmax_θy  θy⊤ µy − log Zy

with partition function Zy = Σw exp( θy⊤ tw ), where tw is the feature vector of word w.
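Recovering canonical from mean parameters is a concave maximization: the gradient of θy⊤µy − log Zy is µy minus the model's expected feature vector, and it vanishes when the moments match. A toy gradient-ascent sketch (the feature matrix T, one row per word, is an illustrative stand-in for the talk's features):

```python
import numpy as np

def recover_theta(mu, T, steps=2000, lr=0.5):
    """Maximize theta . mu - log Z(theta), with Z = sum_w exp(theta . t_w)."""
    theta = np.zeros(T.shape[1])
    for _ in range(steps):
        logits = T @ theta
        p = np.exp(logits - logits.max())
        p /= p.sum()                       # model distribution over words
        theta += lr * (mu - p @ T)         # ascent step: mu - E_theta[t_w]
    return theta

T = np.array([[1.0, 0.0],     # t_w: feature vector of each of 3 words
              [0.0, 1.0],
              [1.0, 1.0]])
target = np.array([0.2, 0.3, 0.5])   # a distribution inside this log-linear family
mu = target @ T                      # its mean parameters
theta = recover_theta(mu, T)
p = np.exp(T @ theta - (T @ theta).max()); p /= p.sum()
```

Because the target distribution lies in the family, matching moments recovers it exactly.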

SLIDE 43

Algorithm

[Pipeline: find anchors → compute moments Q, R → solve QP for Γ and mean parameters µy → solve maxent problem for canonical parameters θy; indicative per-step wall-clock times range from seconds to 2-3 hours]

SLIDE 44

Algorithm

[Same pipeline, with supervision injected at the QP step: find anchors → compute moments → solve QP (mean parameters µy) → solve maxent problem (canonical parameters θy)]

SLIDE 45

Outline

Learning sequence models via MoM

  • 1. Learn HMM models via MoM
  • 2. Relax the notion of anchors
  • 3. Solve a QP
  • 4. Experiments
SLIDE 46

Experiments

Twitter POS tagging, 150 labeled training sequences

[Chart: feature HMM tagging accuracy for HMM, EM, self-training, AHMM; values as listed: 81.8, 81.8, 83.4, 85.3]

SLIDE 47

Experiments

Twitter POS tagging, 1000 labeled training sequences

[Chart: feature HMM tagging accuracy for HMM, EM, self-training, AHMM; values as listed: 89.1, 89.1, 89.4, 89.1]

SLIDE 48

Experiments

Twitter POS tagging

[Chart: tagging accuracy (0.70-0.95) vs. number of labeled training sequences (100-1000), comparing feature HMM, HMM, and anchor FHMM]

SLIDE 49

Experiments

Twitter POS tagging, 1000 training sequences

[Chart: training time in hours: Brown Clusters 42.0, EM 14.9, self-training 10.3, AHMM 3.8]

SLIDE 50

Conclusions

  • MoM algorithm for semi-supervised learning
  • flexible method (easy to add supervision)
  • fast to train (only one pass over the data)
  • particularly good with little supervision

Thank you!

zmarinho@cmu.edu

Support for this research was provided by the Portuguese Science and Technology Foundation (FCT) and CMU Portugal Program, grant SFRH/BD/ 52015/2012. This work has also been partially supported by the European Union under H2020 project SUMMA, grant 688139, and by FCT, through contracts UID/EEA/50008/2013, through the LearnBig project (PTDC/EEISII/7092/2014), and the GoLocal project (grant CMUPERI/TIC/0046/2014).