A Supertag-Context Model for Weakly-Supervised CCG Parser Learning

  1. A Supertag-Context Model for Weakly-Supervised CCG Parser Learning. Dan Garrette (U. Washington), Chris Dyer (CMU), Jason Baldridge (UT-Austin), Noah A. Smith (CMU)

  2. Contributions: (1) a new generative model for learning CCG parsers from weak supervision; (2) a way to select Bayesian priors that capture properties of CCG; (3) a Bayesian inference procedure to learn the parameters of our model.

  3. Type-Level Supervision • Unannotated text • Incomplete tag dictionary: word → {tags}

  4–6. Type-Level Supervision: [Figure, built up across three slides: the sentence "the lazy dogs wander" with each word's candidate supertags from the incomplete tag dictionary (the → np/n; lazy → n/n; dogs → n, np; wander → np, (s\np)/np; later frames add further candidates such as n, n/n, np/n, s\np, …), ending with the question: which tree connects them?]

  7–13. PCFG: Local Decisions: [Figure, built up across seven slides: a binary tree with root A, children B and C, and grandchildren D, E under B and F, G under C. A PCFG scores each expansion as an independent local decision: P(D E | B) × P(F G | C).]
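
To make the local-decision view concrete, here is a minimal sketch (not the authors' code; the tree and rule probabilities are hypothetical) of how a PCFG scores a tree as a product of per-expansion probabilities:

```python
# Hypothetical rule probabilities for the tree A -> (B -> D E) (C -> F G).
rule_prob = {
    ("A", ("B", "C")): 0.9,
    ("B", ("D", "E")): 0.4,
    ("C", ("F", "G")): 0.7,
}

def pcfg_prob(label, tree):
    """P(tree | label) = product of P(children | parent) over all expansions."""
    if isinstance(tree, str):           # leaf: nothing left to generate
        return 1.0
    children, subtrees = zip(*tree)     # tree = [(child_label, child_subtree), ...]
    p = rule_prob[(label, children)]    # local decision: P(D E | B), P(F G | C), ...
    for child, sub in zip(children, subtrees):
        p *= pcfg_prob(child, sub)
    return p

tree = [("B", [("D", "D"), ("E", "E")]),
        ("C", [("F", "F"), ("G", "G")])]
print(pcfg_prob("A", tree))  # 0.9 * 0.4 * 0.7 = 0.252
```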

  14–17. A New Generative Model: [Figure, built up across four slides: the same tree, but each constituent now also generates its supertag context. For node B the score becomes P(D E | B) × P(F | B, R) × P(<S> | B, L): F, the first supertag under B's right neighbor, is B's right context, and the start marker <S> is its left context; <E> marks the sentence end.] (This makes inference tricky… we'll come back to that)
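
As a rough illustration (again with hypothetical parameters, not the paper's estimates), the score of a single expansion under this model is the PCFG rule probability multiplied by the two context terms:

```python
# Hypothetical parameters: the local rule plus the two context distributions.
rule_prob      = {("B", ("D", "E")): 0.4}
left_ctx_prob  = {("B", "<S>"): 0.6}   # P(<S> | B, L): B sits at the sentence start
right_ctx_prob = {("B", "F"): 0.5}     # P(F | B, R): the supertag just right of B's span

def context_node_score(label, children, left_tag, right_tag):
    """One node's score: P(children | label) * P(left_tag | label, L)
    * P(right_tag | label, R)."""
    return (rule_prob[(label, children)]
            * left_ctx_prob[(label, left_tag)]
            * right_ctx_prob[(label, right_tag)])

print(context_node_score("B", ("D", "E"), "<S>", "F"))  # 0.4 * 0.6 * 0.5 = 0.12
```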

  18. Why CCG? • The grammar formalism itself can be used to guide learning: given any two categories, we always know whether they are combinable. • We can extract a priori context preferences before we even look at the data: adjacent categories tend to be combinable.

  19. Why CCG? [Figure: two analyses of "buy the book". With CFG labels (VB, DT, NN building up to NP, VP, S), all relationships between labels must be learned from data; with CCG categories (buy → s/np, the → np/n, book → n), the relationships are universal, intrinsic properties of the grammar.]

  20. CCG Parsing: [Derivation of "the lazy dog sleeps" with the → np/n, lazy → n/n, dog → n, sleeps → s\np: forward application (FA) combines n/n with n to give n, np/n with n to give np, and np with s\np to give s.]

  21. CCG Parsing: [The same sentence with forward composition (FC): np/n composes with n/n to give np/n, which then applies to n.]
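
A minimal sketch of these combinators (my own toy encoding of categories as nested tuples, not the paper's implementation):

```python
# Atomic categories are strings; slash categories are (result, slash, argument).

def fa(left, right):
    """Forward application (FA): X/Y + Y => X."""
    if isinstance(left, tuple) and left[1] == "/" and left[2] == right:
        return left[0]
    return None

def ba(left, right):
    r"""Backward application (BA): Y + X\Y => X."""
    if isinstance(right, tuple) and right[1] == "\\" and right[2] == left:
        return right[0]
    return None

def fc(left, right):
    """Forward composition (FC): X/Y + Y/Z => X/Z."""
    if (isinstance(left, tuple) and left[1] == "/" and
            isinstance(right, tuple) and right[1] == "/" and
            left[2] == right[0]):
        return (left[0], "/", right[2])
    return None

# "the lazy dog sleeps": the=np/n, lazy=n/n, dog=n, sleeps=s\np
the, lazy, dog, sleeps = ("np", "/", "n"), ("n", "/", "n"), "n", ("s", "\\", "np")
np_n = fc(the, lazy)    # slide 21's FC step: np/n
np   = fa(np_n, dog)    # np
s    = ba(np, sleeps)   # s
print(np_n, np, s)      # ('np', '/', 'n') np s
```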

  22–26. Supertag Context: [Figure, animated over five slides: the derivation of "the lazy dog sleeps", highlighting for each constituent the supertags immediately to its left and right, which serve as its context.]

  27. Constituent Context: Klein & Manning showed the value of modeling context with the Constituent Context Model (CCM). [Figure: the sentence "the lazy dog sleeps".] [Klein & Manning 2002]

  28–33. Constituent Context: ["Substitutability", animated over six slides: in the context DT ( __ ) VBZ, the constituents (JJ NN) "lazy dog", (NN) "dog", and (JJ JJ NN) "big lazy dog" are all substitutable for one another, so the context itself signals a noun-like (~Noun) constituent.] [Klein & Manning 2002]

  34. Supertag Context: [Figure: the same idea with supertags: in "the lazy dog sleeps", the bracketed constituent (lazy dog), with category n, has np/n as its left context and s\np as its right context.]

  35. Supertag Context • We know the constituent's label. • We know whether it is a fitting context, even before looking at the data.

  36. This Paper: (1) a new generative model for learning CCG parsers from weak supervision; (2) a way to select Bayesian priors that capture properties of CCG; (3) a Bayesian inference procedure to learn the parameters of our model.

  37. Supertag-Context Parsing: [Chart figure, standard PCFG: a CKY chart over positions 0–4 with items such as A_04, A_03, A_13 above supertags t_1…t_4 and words w_1…w_4; parameters are P(A_root) for the root and P(A → A_left A_right or w_i) for each expansion.]

  38. Supertag-Context Parsing: [Chart figure, with context: the same chart, but each item A spanning (i, j) is additionally scored by P(A → t_left) and P(A → t_right), the supertags just outside its span, with <s> and <e> at the sentence boundaries.]
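
A small sketch of how such context terms could attach to a chart item (the score tables and back-off value are hypothetical; this is an illustration, not the authors' parser):

```python
# Hypothetical context score tables: P(A -> t) for left and right contexts.
p_left  = {("np", "<s>"): 0.7, ("np", "np/n"): 0.9}
p_right = {("np", "s\\np"): 0.8, ("np", "<e>"): 0.2}

def context_factor(A, i, j, tags):
    """Context terms for an item A spanning words i..j-1: look up the
    supertags just outside the span, using <s>/<e> at the boundaries."""
    t_left  = tags[i - 1] if i > 0 else "<s>"
    t_right = tags[j] if j < len(tags) else "<e>"
    return p_left.get((A, t_left), 1e-6) * p_right.get((A, t_right), 1e-6)

tags = ["np/n", "n/n", "n", "s\\np"]     # the lazy dog sleeps
print(context_factor("np", 0, 3, tags))  # left=<s>, right=s\np: 0.7 * 0.8
```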

  39. Prior on Categories: [Figure: two derivations of "the lazy dog", one with simple categories (the → np/n, lazy → n/n, dog → n, yielding np) and one with needlessly complex categories (np\(np/n), (np\(np/n))/n); a prior over categories should prefer the simpler analysis. [Garrette, Dyer, Baldridge, and Smith 2015]]

  40. Supertag-Context Prior: P_L-prior(t_left | A) ∝ 10^5 if t_left can combine with A, 1 otherwise. [Figure: "the lazy dog sleeps" with a constituent A between context tags t_left and t_right.]

  41. Supertag-Context Prior: P_R-prior(t_right | A) ∝ 10^5 if A can combine with t_right, 1 otherwise.
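
A sketch of that prior, reusing the toy fa/ba/fc helpers from the sketch after slide 21 (the 10^5 weight is from the slides; the combinability test here is a simplified stand-in for the full CCG one):

```python
def combines(left, right):
    """Simplified combinability test: some application or composition succeeds."""
    return any(op(left, right) is not None for op in (fa, ba, fc))

def left_context_prior(candidate_tags, A):
    """P_L-prior(t_left | A): weight ~10^5 if t_left can combine with A, else 1."""
    weights = {t: (1e5 if combines(t, A) else 1.0) for t in candidate_tags}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}

tags = [("np", "/", "n"), "np", ("s", "\\", "np")]
print(left_context_prior(tags, "n"))  # np/n, which combines with n, gets almost all the mass
```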

  42. This Paper: (1) a new generative model for learning CCG parsers from weak supervision; (2) a way to select Bayesian priors that capture properties of CCG; (3) a Bayesian inference procedure to learn the parameters of our model.

  43. Type-Level Supervision: [Figure repeated from earlier: "the lazy dogs wander" with candidate supertags under each word, and the question of which tree to build.]

  44. Type-Supervised Learning: inputs are an unlabeled corpus, a tag dictionary, and universal properties of the CCG formalism.

  45. Posterior Inference • A Bayesian inference procedure will make use of our linguistically informed priors. • But we can't sample as with a PCFG: because of the context terms, we can't compute the inside chart, even with dynamic programming.

  46. Sampling via Metropolis-Hastings. Idea: sample a tree from an efficient proposal distribution (the PCFG parameters; Johnson et al. 2007), then accept according to the full distribution (the context parameters).

  47–51. Posterior Inference: [Figure, animated over five slides: the model and the priors (which prefer combinable connections) score candidate supertags for "the lazy dogs wander"; an inside chart is computed under the PCFG proposal, and a tree is sampled from it.]

  52–57. Metropolis-Hastings: [Figure, animated over six slides: a new tree is proposed alongside the existing tree; the priors (which prefer combinable connections) and the model determine whether the new tree replaces the existing one.]

  58. Metropolis-Hastings • Sample a tree based only on the PCFG parameters. • Accept based only on the context. • If the new tree is worse than the old one under the context model, it is less likely to be accepted.

  59. Experimental Results

  60. Experimental Question • When supervision is incomplete, does modeling context, and biasing toward combining contexts, help learn better parsing models?

  61. English Results: [Bar chart: parsing accuracy (0–75 scale) for the "no context", "+context", and "combinability" models as the corpus from which the tag dictionary is drawn shrinks from 250k to 25k tokens; accuracies range from the mid-50s to the mid-60s, with context and combinability giving gains over the no-context baseline.]

  62. Experimental Results: [Bar chart: parsing accuracy for the three models on English, Italian, and Chinese with a 25k-token tag-dictionary corpus; English and Italian score in the 50s, Chinese around 29–34.]

  63. Conclusion: Under weak supervision, we can use universal grammatical knowledge about context to find trees with a better global structure.

  64. Deficiency • The generative story has a "throw away" step if the context-generated nonterminals don't match the tree. • We sample only over the space of valid trees (conditioning on well-formed structures); this is a benefit of the Bayesian formulation. • See Smith 2011.

  65. Metropolis-Hastings: with current tree y and new tree y′, define P_context(y) = P_full(y) / P_pcfg(y) and P_context(y′) = P_full(y′) / P_pcfg(y′); draw z ∼ uniform(0, 1) and accept if z ≤ [P_full(y′) / P_pcfg(y′)] / [P_full(y) / P_pcfg(y)] = P_context(y′) / P_context(y).
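
A direct transcription of that accept step as a sketch (the four scores would come from the full model and the PCFG proposal; here they are just function arguments):

```python
import random

def mh_accept(p_full_new, p_pcfg_new, p_full_old, p_pcfg_old):
    """Return True if the proposed tree y' replaces the current tree y."""
    ctx_new = p_full_new / p_pcfg_new   # P_context(y') = P_full(y') / P_pcfg(y')
    ctx_old = p_full_old / p_pcfg_old   # P_context(y)  = P_full(y)  / P_pcfg(y)
    z = random.uniform(0.0, 1.0)
    return z <= ctx_new / ctx_old

# If y' looks better under the context model, the ratio exceeds 1 and it is
# always kept; otherwise it is kept with probability P_context(y')/P_context(y).
```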
