Semi-Supervised Learning of Sequence Models via Method of Moments




  1. Semi-Supervised Learning of Sequence Models via Method of Moments. EMNLP 2016, Empirical Methods in Natural Language Processing, Austin, Texas, November 1-6, 2016. Zita Marinho (Robotics Institute, CMU; IST, University of Lisbon) zmarinho@cmu.edu; Shay B. Cohen (School of Informatics, University of Edinburgh) scohen@inf.ed.ac.uk; Noah A. Smith (Computer Science & Eng., University of Washington) nasmith@cs.washington.edu; André F. T. Martins (Unbabel; IT, IST, University of Lisbon) andre.martins@unbabel.com

  2. Sequence Labeling. Example sentence: Herb fights like a ninja . Observed data: words {w1, w2, w3, …, w6}; hidden labels {y1, y2, y3, …, y6}. One candidate labeling: N V Pre Det N . (Introduction) EMNLP 16 | Semi-supervised sequence labeling with MoM

  3. Sequence Labeling. The same sentence, Herb fights like a ninja ., with another candidate labeling: N V V Det N . (Introduction)

  4. Sequence Labeling. And yet another candidate labeling: ADJ N V Det N . (Introduction)

  5. Sequence Labeling. With K possible tags and 6 words, there are K^6 possible assignments of labels {y1, y2, y3, …, y6} to the observed data {w1, w2, w3, …, w6}. (Introduction)

  6. Hidden Markov Model. How do we learn the parameters: transitions p(y_t | y_{t-1}) and emissions p(w_t | y_t)? • supervised learning • unsupervised/semi-supervised learning (this talk) (Introduction)

  7. Hidden Markov Model. Same question: learn transitions p(y_t | y_{t-1}) and emissions p(w_t | y_t). • supervised learning • unsupervised/semi-supervised learning (this talk) • the model can be extended to include features (Berg-Kirkpatrick et al., Painless Unsupervised Learning with Features, NAACL-HLT 2010). (Introduction)
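In the supervised case, the HMM parameters on the slide above are just normalized counts from a tagged corpus. A minimal sketch (the tag set and the tiny corpus are invented for illustration; `estimate_hmm` is a hypothetical helper, not from the paper):

```python
# Supervised HMM estimation by counting: p(y_t | y_{t-1}) and p(w_t | y_t).
from collections import Counter, defaultdict

def estimate_hmm(tagged_sentences):
    """Return transition and emission tables from (word, tag) sentences."""
    trans = defaultdict(Counter)   # trans[prev_tag][tag] = count
    emit = defaultdict(Counter)    # emit[tag][word] = count
    for sent in tagged_sentences:
        prev = "<start>"
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
        trans[prev]["<stop>"] += 1
    # normalize counts into conditional probabilities
    p_trans = {p: {t: c / sum(cs.values()) for t, c in cs.items()}
               for p, cs in trans.items()}
    p_emit = {t: {w: c / sum(cs.values()) for w, c in cs.items()}
              for t, cs in emit.items()}
    return p_trans, p_emit

corpus = [[("Herb", "N"), ("fights", "V"), ("like", "Pre"),
           ("a", "Det"), ("ninja", "N"), (".", ".")]]
p_trans, p_emit = estimate_hmm(corpus)
```

With only the one example sentence, "N" emits "Herb" and "ninja" with probability 0.5 each; the unsupervised/semi-supervised setting of the talk is exactly the case where such counts are unavailable.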

  8. Maximum Likelihood Estimation (MLE) vs. Method of Moments (MoM):
  • MLE: exact inference is hard; MoM: computationally efficient
  • MLE: EM is sensitive to local optima (depends on initialization); MoM: no local optima
  • MLE: EM is expensive on large datasets (several inference passes); MoM: a single pass over the data
  (Problem Statement)

  9. Hidden Markov Model, via Maximum Likelihood Estimation vs. via Method of Moments:
                            MLE HMM | MLE feature HMM | MoM HMM | MoM feature HMM
  semi-supervised learning:    ✓    |        ✓        |    ?    |        ?
  unsupervised learning:       ✓    |        ✓        |    ✓    |        ?
  Shay B. Cohen, Karl Stratos, Michael Collins, Dean P. Foster and Lyle Ungar, Spectral Learning of Latent-Variable PCFGs: Algorithms and Sample Complexity, JMLR 2014. Arora et al., A Practical Algorithm for Topic Modeling with Provable Guarantees, ICML 2013. (Introduction)

  10. Learning sequence models via MoM. Outline: 1. Learn HMM models via MoM 2. Solve a QP 3. Extend to a feature-based model 4. Experiments (Outline)

  11. Method of Moments. Key insights: 1. Conditional independence: infer a label by looking at its context. 2. Anchor trick: learn a proxy for labels using anchor words. (Anchor Learning)

  12. 1. Conditional Independence. Example sentence (Twitter-style): its gonna b a good day hehe, with words w1…w7, labels y1…y7, and start/stop states. (Anchor Learning)

  13. 1. Conditional Independence. Example sentence: I am goin 2 wait now :) . Context = {w−1, w+1}, modeled with a log-linear model.

  14. 1. Conditional Independence. Example: tasted like chimichangas. The middle word w_t = like has context {w_{t−1}, w_{t+1}} = {tasted, chimichangas} and label adp. (Problem Statement)

  15. 1. Conditional Independence. Example: i like fajitas. The middle word w_t = like has context {w_{t−1}, w_{t+1}} = {i, fajitas} and label verb. (Problem Statement)

  16. 1. Conditional Independence. Same example: i like fajitas, with w_t = like labeled verb. "You shall know a word by the company it keeps." (Firth, 1957) (Problem Statement)

  17. 1. Conditional Independence: word ⊥ context | label. In the example sentence its gonna b a good day hehe, each word is conditionally independent of its context given its label. (Anchor Learning)

  18. 2. Anchor Trick. If all instances of be are tagged verb, then p(verb | be) = 1 and p(label ≠ verb | be) = 0: be is an anchor word for the label verb. (Arora et al., A Practical Algorithm for Topic Modeling with Provable Guarantees, ICML 2013.) (Anchor Learning)

  19. 2. Anchor Trick. More anchors per label, e.g. verb = {b, be, are, is, am, have, going, go}: using more than one anchor word gives less biased context estimates. (Anchor Learning)

  20. 2. Anchor Trick. How to find anchors? • a small labeled corpus • a small lexicon. Example lexicon: noun: Austin, airport, playground; verb: am, be, is, are, go, make, made, become; pron: he, it, she; adp: so, on, of. (Anchor Learning)
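One simple way to extract anchors from a small labeled corpus, as the slide suggests, is to keep words whose empirical label distribution is sufficiently peaked. This is a hypothetical sketch (the function `find_anchors`, the `min_count`/`threshold` values, and the toy data are illustrative assumptions, not the paper's exact procedure):

```python
# Anchor selection: a word is an anchor for a label if (nearly) all of its
# occurrences in the labeled data carry that label.
from collections import Counter, defaultdict

def find_anchors(tagged_words, min_count=2, threshold=0.95):
    """Return {label: [anchor words]} for words seen >= min_count times whose
    dominant label accounts for at least `threshold` of their occurrences."""
    counts = defaultdict(Counter)
    for word, label in tagged_words:
        counts[word][label] += 1
    anchors = defaultdict(list)
    for word, label_counts in counts.items():
        total = sum(label_counts.values())
        label, top = label_counts.most_common(1)[0]
        if total >= min_count and top / total >= threshold:
            anchors[label].append(word)
    return dict(anchors)

data = [("be", "verb"), ("be", "verb"), ("is", "verb"), ("is", "verb"),
        ("like", "verb"), ("like", "adp"), ("Austin", "noun"), ("Austin", "noun")]
anchors = find_anchors(data)
```

Note how "like" is rejected: it appears under two labels, so p(label | like) is not peaked, exactly the ambiguity the earlier slides illustrate.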

  21. Method of Moments. Collect unlabeled co-occurrences in data: for each word w_t, its context {w_{t-1}, w_{t+1}, w_{t+2}}. Example sentences: Andrew fights like Jet Li. Ann sings like me. Eat fruit like cherry. Children like ice-cream. (Method of Moments)
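These co-occurrence counts can be accumulated in a single pass over unlabeled text. A minimal sketch, assuming a {w−1, w+1} context window as on the earlier log-linear slide (the `context_moments` helper and the position-tagged context keys are illustrative choices, not the paper's exact feature set):

```python
# Estimate the moments p(context | word) from raw, unlabeled sentences.
from collections import Counter, defaultdict

def context_moments(sentences):
    """Return Q[word] = {(position, neighbor): p(context | word)}."""
    Q = defaultdict(Counter)
    for sent in sentences:
        padded = ["<s>"] + sent + ["</s>"]
        for i in range(1, len(padded) - 1):
            w = padded[i]
            Q[w][("-1", padded[i - 1])] += 1   # left neighbor
            Q[w][("+1", padded[i + 1])] += 1   # right neighbor
    # normalize each word's context counts into a distribution
    return {w: {c: n / sum(cs.values()) for c, n in cs.items()}
            for w, cs in Q.items()}

text = [["Andrew", "fights", "like", "Jet", "Li", "."],
        ["Ann", "sings", "like", "me", "."],
        ["Children", "like", "ice-cream", "."]]
Q = context_moments(text)
```

The distribution `Q["like"]` spreads mass over contexts such as (−1, fights) and (+1, ice-cream); these columns are the raw material for the matrix on the next slides.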

  22. Method of Moments. Build the matrix Q with entries p(context | word). [Figure: Q with rows indexed by contexts and a column for the word like, filled from the sentences Andrew fights like Jet Li. / Ann sings like me. / Eat fruit like cherry. / Children like ice-cream.] (Method of Moments)

  23. Method of Moments. Q = p(context | word). [Figure: another column of Q, for the word be, filled from the sentences Let there be love. / Bill will be a ninja.] (Method of Moments)

  24. Method of Moments. 1. Conditional independence (word ⊥ context | label): p(context | word) = Σ_labels p(context | label) p(label | word), i.e. Q = R Γ, with Q: context × word, R: context × label, Γ: label × word. (Method of Moments)

  25. Method of Moments. 1. Conditional independence: p(context | word) = Σ_labels p(context | label) p(label | word). 2. Anchor trick: estimate p(context | label) as p(context | anchors of that label), so R can be read off the columns of Q at the anchor words; then Q = R Γ. (Method of Moments)

  26. Learning sequence models via MoM. Outline: 1. Learn HMM models via MoM 2. Solve a QP (next) 3. Extend to a feature-based model 4. Experiments (Outline)

  27. Method of Moments. For each word, its column q = p(context | word) of Q satisfies q = R γ, where γ = p(label | word). Solve, per word type (~ms): γ = argmin_γ ||q − Rγ||², subject to 0 ≤ γ ≤ 1 and Σ_labels γ = 1. (Method of Moments)

  28. Method of Moments. Semi-supervised variant, with a penalty toward supervised estimates: γ = argmin_γ ||q − Rγ||² + λ||γ_sup − γ||², subject to 0 ≤ γ ≤ 1 and Σ_labels γ = 1. (Method of Moments)

  29. Method of Moments. In γ = argmin_γ ||q − Rγ||² + λ||γ_sup − γ||² (subject to 0 ≤ γ ≤ 1, Σ_labels γ = 1): q and R are estimated from unlabeled data, while γ_sup is estimated from labeled data. (Method of Moments)
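The per-word QP on these slides is a least-squares problem over the probability simplex. A minimal numpy sketch, solved here by projected gradient descent with a Euclidean simplex projection (the solver choice, step size, and iteration count are assumptions; the paper does not prescribe this particular algorithm):

```python
# Per-word QP:  min_gamma ||q - R @ gamma||^2 + lam * ||gamma_sup - gamma||^2
#               s.t. gamma >= 0, sum(gamma) = 1
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / np.arange(1, len(v) + 1) > 0)[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def solve_gamma(q, R, gamma_sup=None, lam=0.0, steps=500):
    """Projected gradient descent on the regularized least-squares objective."""
    n_labels = R.shape[1]
    gamma = np.full(n_labels, 1.0 / n_labels)      # uniform start
    if gamma_sup is None:
        gamma_sup = np.zeros(n_labels)
    # step size from the Lipschitz constant of the quadratic objective
    lr = 1.0 / (2 * np.linalg.norm(R, 2) ** 2 + 2 * lam + 1e-8)
    for _ in range(steps):
        grad = 2 * R.T @ (R @ gamma - q) + 2 * lam * (gamma - gamma_sup)
        gamma = project_simplex(gamma - lr * grad)
    return gamma
```

With λ = 0 this recovers the purely unsupervised QP of slide 27; a large λ pulls γ toward the supervised estimate γ_sup, which is the semi-supervised trade-off the slide describes.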

  30. HMM Learning. Learn the parameters from the γ coefficients, γ = p(label | word). Observation matrix via Bayes' rule: p(word | label) = γ p(word) / p(label), with p(label) = Σ_words γ p(word). (Method of Moments)
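The Bayes'-rule step above is a small matrix computation. A sketch, assuming γ is stacked into a words × labels matrix (the helper name and toy inputs are illustrative):

```python
# Invert gamma = p(label | word) into the emission matrix p(word | label).
import numpy as np

def emissions_from_gamma(gamma, p_word):
    """gamma: (n_words, n_labels), rows are p(label | word);
    p_word: (n_words,) unigram probabilities.
    Returns O of shape (n_words, n_labels) with columns p(word | label)."""
    p_label = gamma.T @ p_word           # p(label) = sum_w p(label|w) p(w)
    joint = gamma * p_word[:, None]      # p(word, label) = p(label|w) p(w)
    return joint / p_label[None, :]      # Bayes' rule: p(word | label)

gamma = np.array([[1.0, 0.0],            # word 0 is an anchor for label 0
                  [0.0, 1.0],            # word 1 is an anchor for label 1
                  [0.5, 0.5]])           # word 2 is ambiguous
p_word = np.array([0.5, 0.3, 0.2])
O = emissions_from_gamma(gamma, p_word)
```

Each column of the result is a proper distribution over words, as required of an HMM emission matrix.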

  31. HMM Learning. Observation matrix: via Bayes' rule, p(word | label) = γ p(word) / p(label). Transition matrix: estimated from labeled data only. (Method of Moments)
