  1. Bootstrapping Dependency Grammars from Sentence Fragments via Austere Models

     Valentin I. Spitkovsky, with Daniel Jurafsky (Stanford University)
     and Hiyan Alshawi (Google Inc.)

     ICGI (2012-09-07)

  2. Introduction / Unsupervised Learning

     Why do unsupervised learning? One practical reason:
     ◮ we have lots of potentially useful data!
     ◮ but more than would be feasible to annotate...

     Yet grammar inducers use less data than supervised parsers:
     ◮ most systems train on WSJ10 (or, more recently, WSJ15)
     ◮ WSJ10 has approximately 50K tokens (5% of WSJ's 1M) in just 7K sentences
       (WSJ15's 16K sentences cover 160K tokens) -- see the sketch after this slide

     Long sentences are hard; shorter inputs can be easier:
     ◮ better chances of guessing larger fractions of correct trees
     ◮ a preference for more local structures (Smith and Eisner, 2006)
     ◮ faster training, etc. (a rich history going back to Elman, 1993)

     ... could we "start small" and still use more data?
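     The WSJ10/WSJ15 cutoffs above are simply length filters over the treebank:
     keep sentences of at most 10 (or 15) tokens, conventionally counted after
     stripping punctuation. A minimal Python sketch of such a cutoff (the toy
     corpus, tokenization, and punctuation test here are illustrative, not the
     exact preprocessing used in the literature):

        import string

        def length_subset(sentences, max_len=10):
            """Keep token lists with at most `max_len` non-punctuation tokens."""
            def content_length(tokens):
                # Count tokens containing at least one non-punctuation character.
                return sum(1 for t in tokens
                           if any(ch not in string.punctuation for ch in t))
            return [s for s in sentences if content_length(s) <= max_len]

        corpus = [
            "The cat sat on the mat .".split(),
            "When prices rose sharply last year , several large buyers cut their orders in half .".split(),
        ]
        print(len(length_subset(corpus, max_len=10)))  # 1: only the short sentence survives
        print(len(length_subset(corpus, max_len=15)))  # 2: the longer cutoff keeps both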

  3. Introduction / Previous Work

     How have long inputs been handled previously? Very carefully...
     ◮ Viterbi training (which tolerates the models' bad independence assumptions)
     ◮ + punctuation-induced constraints (partial bracketing: Pereira and Schabes, 1992)
     ◮ = punctuation-constrained Viterbi training (sketched after this slide)
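     In Viterbi (hard) EM, each iteration re-estimates the model from the single
     best parse of every sentence rather than from expected counts, and the
     punctuation constraint simply prunes candidate parses whose dependencies
     cross inter-punctuation fragments in a disallowed way. Below is a minimal
     sketch of one such check, under a simplifying assumption (not necessarily
     the paper's exact constraint) that each inter-punctuation fragment may
     attach to the rest of the tree through at most one of its words; a
     constrained Viterbi trainer would discard any candidate tree failing it.

        PUNCT = {",", ".", ";", ":", "!", "?", "--"}

        def fragments(tokens):
            """Indices of maximal runs of non-punctuation tokens."""
            frags, current = [], []
            for i, tok in enumerate(tokens):
                if tok in PUNCT:
                    if current:
                        frags.append(current)
                        current = []
                else:
                    current.append(i)
            if current:
                frags.append(current)
            return frags

        def respects_punctuation(tokens, heads):
            """heads[i] is the index of token i's head, or -1 for the root."""
            for frag in fragments(tokens):
                frag_set = set(frag)
                external = [i for i in frag if heads[i] not in frag_set]
                if len(external) > 1:  # fragment attaches outside via more than one word
                    return False
            return True

        tokens = "It probably has reduced expenditures , analysts said .".split()
        # One plausible tree: "said" is the root; punctuation heads are ignored by the check.
        heads = [2, 2, 7, 2, 3, -1, 7, -1, -1]
        print(respects_punctuation(tokens, heads))  # True: each fragment has one external link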

  4. Introduction / Example

     Example: Punctuation (Spitkovsky et al., 2011)

     [SBAR Although it probably has reduced the level of expenditures for some purchasers ] ,
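     The point of the example: the span delimited by the comma lines up with a
     complete constituent (an SBAR), presumably the kind of "sentence fragment"
     the title refers to, usable as an extra, shorter training input. A
     deliberately naive sketch of pulling such fragments out of raw text (the
     punctuation set and splitting rule are simplifications):

        import re

        def interpunct_fragments(sentence):
            """Split a sentence into the maximal spans between punctuation marks."""
            return [p for p in re.split(r"\s*[,;:.!?]+\s*", sentence) if p]

        # The slide's example, truncated at the comma as shown above.
        sentence = ("Although it probably has reduced the level of "
                    "expenditures for some purchasers ,")
        print(interpunct_fragments(sentence))
        # ['Although it probably has reduced the level of expenditures for some purchasers']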
