Bootstrapping Dependency Grammars from Sentence Fragments via Austere Models (PowerPoint PPT Presentation)

SLIDE 1

Bootstrapping Dependency Grammars from Sentence Fragments via Austere Models

Valentin I. Spitkovsky, with Daniel Jurafsky (Stanford University) and Hiyan Alshawi (Google Inc.)

Spitkovsky et al. (Stanford & Google) Incomplete Fragments / Austere Models ICGI (2012-09-07) 1 / 12

SLIDE 2

Introduction: Unsupervised Learning

Why do unsupervised learning?

  • One practical reason:
    ◮ got lots of potentially useful data!
    ◮ but more than would be feasible to annotate...

  • yet grammar inducers use less data than supervised parsers:
    ◮ most systems train on WSJ10 (or, more recently, WSJ15)
    ◮ WSJ10 has approximately 50K tokens (5% of WSJ’s 1M)
    ◮ in just 7K sentences (WSJ15’s 16K sentences cover 160K tokens)

  • long sentences are hard — shorter inputs can be easier:
    ◮ better chances of guessing larger fractions of correct trees
    ◮ preference for more local structures (Smith and Eisner, 2006)
    ◮ faster training, etc.

  — a rich history going back to Elman (1993)

  ... could we “start small” and use more data?

SLIDE 3

Introduction: Previous Work

How have long inputs been handled previously?

  very carefully...
    ◮ Viterbi training (tolerates bad independence assumptions of models)
    ◮ + punctuation-induced constraints (partial bracketing: Pereira and Schabes, 1992)
    ◮ = punctuation-constrained Viterbi training
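
To give a feel for what a punctuation-induced constraint can look like, here is a minimal sketch of one simple variant: a candidate (Viterbi) parse is accepted only if each inter-punctuation fragment attaches to the rest of the sentence through at most one external head. This is an illustration only, not the exact constraint set of Spitkovsky et al. (2011); the function and variable names are hypothetical.

```python
def respects_fragments(heads, fragments):
    """Check one simple punctuation-induced constraint on a dependency parse.

    heads[i]  : index of token i's head (-1 marks the root).
    fragments : list of (start, end) token spans, end exclusive, obtained by
                splitting the sentence at punctuation.

    Simplified constraint: each fragment may contain at most one token whose
    head lies outside that fragment (or is the root); every other token must
    attach within the fragment.
    """
    for start, end in fragments:
        external = sum(1 for i in range(start, end)
                       if heads[i] == -1 or not start <= heads[i] < end)
        if external > 1:
            return False
    return True


# Tiny example: "it fell , he said" with fragments (0, 2) and (3, 5).
heads = [1, 4, 4, 4, -1]   # it -> fell, fell -> said, he -> said, said = root
print(respects_fragments(heads, [(0, 2), (3, 5)]))   # True
```

In punctuation-constrained Viterbi training, a restriction of this kind would be enforced inside the parser's chart rather than as a post-hoc filter; the check above is only meant to make the constraint itself concrete.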

SLIDE 4

Introduction: Example

Example: Punctuation (Spitkovsky et al., 2011)

[SBAR Although it probably has reduced the level of expenditures for some purchasers], [NP utilization management] — [PP like most other cost containment strategies] — [VP doesn’t appear to have altered the long-term rate of increase in health-care costs], [NP the Institute of Medicine], [NP an affiliate of the National Academy of Sciences], [VP concluded after a two-year study].

... wouldn’t it be great if we could just break it up?
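
Breaking the example up is mechanically simple. Below is a minimal sketch that splits a tokenized sentence at punctuation and returns the inter-punctuation fragments; the punctuation set and the PTB-style tokenization (does n't, --) are assumptions made for illustration.

```python
PUNCT = {",", ".", ";", ":", "--", "(", ")", "!", "?"}   # assumed split points

def split_at_punctuation(tokens):
    """Return the list of inter-punctuation fragments (each a list of tokens)."""
    fragments, current = [], []
    for tok in tokens:
        if tok in PUNCT:
            if current:
                fragments.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        fragments.append(current)
    return fragments

sentence = ("Although it probably has reduced the level of expenditures for some "
            "purchasers , utilization management -- like most other cost containment "
            "strategies -- does n't appear to have altered the long-term rate of "
            "increase in health-care costs , the Institute of Medicine , an affiliate "
            "of the National Academy of Sciences , concluded after a two-year study .")

for fragment in split_at_punctuation(sentence.split()):
    print(len(fragment), " ".join(fragment))
```

Run on this sentence, the sketch produces the seven fragments of 12, 2, 6, 14, 4, 8, and 5 tokens that reappear in the table on slide 8.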

SLIDE 5

Data Splitting: Previous Work

How have long inputs been handled previously?

  splitting on punctuation:
    ◮ supervised parsing of long Chinese sentences (Li et al., 2005; Li et al., 2010)
    ◮ unsupervised constituent parsing (Ponvert et al., 2011)
    ◮ unsupervised chunking (Ponvert et al., 2010), via Seginer’s (2007) CCL parser

SLIDE 6

Data Splitting: What If

What if we chopped up input at punctuation?

  impact on quantity of data (with a 15-token threshold):
    ◮ number of training inputs goes up to 34,856 (from 15,922)
    ◮ number of tokens increases to 709,215 (from 163,715)
    ◮ more and simpler word sequences incorporated earlier
    ◮ much denser coverage of the available data

  but, also impact on quality of data:
    ◮ mostly phrases and clauses (75% agree with constituent boundaries)
    ◮ many fewer complete sentences exhibiting full structure
    ◮ even less representative than short sentences

  however, there is an appropriate model family (DBMs)

SLIDE 7

Data Splitting: Model

Class-based, head-outward generation (Alshawi, 1996)

[Diagram: a head of class ch generates dependents to its right (dir = R). The first dependent cd1 is generated in the adjacent position (adj = T); further dependents, such as cd2, are generated in non-adjacent positions (adj = F); generation in that direction ends with a STOP decision. Throughout, ce marks the class of the word currently at the edge of the growing span: the head itself before any dependents, then each newly generated outermost dependent.]

Model distributions:
  PROOT(ch | comp)
  PATTACH(cd | ch, dir, cross)
  PSTOP(· | dir, adj, ce, comp)
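
To make the head-outward story concrete, the sketch below generates one head's dependents in one direction using the two conditional distributions named on the slide (PROOT, which picks a fragment's root class, is not shown; it would be drawn once per input). The probability tables are passed in as plain functions, the reading of ce as "class of the word currently at the edge of the span" is an interpretation of the diagram, and the code is an illustrative sketch, not the authors' implementation.

```python
import random

def generate_dependents(head_class, direction, p_attach, p_stop,
                        comp=True, cross=False):
    """Head-outward generation of one head's dependents in one direction.

    p_stop(direction, adjacent, edge_class, comp) -> probability of stopping,
        mirroring PSTOP(. | dir, adj, ce, comp).
    p_attach(head_class, direction, cross) -> {dependent_class: probability},
        mirroring PATTACH(cd | ch, dir, cross).
    """
    dependents = []
    adjacent = True            # adj = T until the first dependent is generated
    edge_class = head_class    # ce: the head sits at the edge before any dependent
    while random.random() >= p_stop(direction, adjacent, edge_class, comp):
        classes, weights = zip(*p_attach(head_class, direction, cross).items())
        dependent = random.choices(classes, weights=weights)[0]
        dependents.append(dependent)
        edge_class = dependent   # the new outermost word now defines the edge
        adjacent = False         # adj = F for all later dependents
    return dependents

# Toy usage: a verb head generating rightward dependents from a made-up grammar.
p_stop = lambda direction, adjacent, ce, comp: 0.3 if adjacent else 0.7
p_attach = lambda ch, direction, cross: {"NN": 0.6, "IN": 0.4}
print(generate_dependents("VBZ", "R", p_attach, p_stop))
```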

SLIDE 8

Data Splitting: Example Continued

Example (cont’d): DBMs (Spitkovsky et al., 2012)

                length   type   left   right
  complete        51      S      IN     NN
  incomplete      12      SBAR   IN     NNS
                   2      NP     NN     NN
                   6      PP     IN     NNS
                  14      VP     VBZ    NNS
                   4      NP     DT     NNP
                   8      NP     DT     NNPS
                   5      VP     VBD    NN

  [The slide builds overlay DBM-1, DBM-2, and DBM-3 labels on this table.]

  partial parse forests
  “easy-first” (Goldberg and Elhadad, 2010), optional soft EM
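
The table tracks, for every fragment, its length and what sits at its left and right boundaries, which is the kind of information Dependency-and-Boundary Models condition on. A trivial helper for pulling out such a boundary signature might look as follows; the tuple layout is an illustrative choice, not the DBM feature set, and the constituent type column above (S, SBAR, NP, ...) comes from the treebank rather than from anything computable here.

```python
def boundary_signature(pos_tags, complete):
    """(length, left boundary tag, right boundary tag, completeness) of a fragment."""
    status = "complete" if complete else "incomplete"
    return (len(pos_tags), pos_tags[0], pos_tags[-1], status)

# e.g. the 2-token fragment "utilization management", tagged NN NN:
print(boundary_signature(["NN", "NN"], complete=False))
# (2, 'NN', 'NN', 'incomplete')
```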

SLIDE 9

Data Splitting: Results

We tried; it works...

  experimental setup:
    ◮ context-sensitive unsupervised tags (no gold POS)¹
    ◮ performance metric is directed dependency accuracy
    ◮ evaluation on Section 23 (all sentences)

  state-of-the-art baseline: 59.7%
    ◮ DBMs on whole inputs only
    ◮ staged training on WSJ15 → WSJ45
    ◮ strong punctuation-induced constraints for full data
    ◮ weaker constraints used in decoding for evaluation

  [Chart on this slide, for comparison (directed accuracy on Section 23, all sentences):
   59.1 context-sensitive clusters (Spitkovsky et al., 2011) [EMNLP];
   58.4 punctuation constraints (Spitkovsky et al., 2011) [CoNLL];
   57.0 (Tu and Honavar, 2012) [EMNLP-CoNLL];
   55.7 (Blunsom and Cohn, 2011) [EMNLP];
   53.3 (Gillenwater et al., 2010) [TechReport];
   53.3 (Bisk and Hockenmaier, 2012) [AAAI];
   47.9 Viterbi training (Spitkovsky et al., 2010) [CoNLL].]

  results with initially-split data — 60.2% (3.5% exact)
    ◮ can do better with simpler initial models — 61.2% (5.0% exact)
    ◮ e.g., better not to model roots of incomplete fragments
    ◮ ... as well as non-adjacency for short inputs

  ¹ nlp.stanford.edu/pubs/goldtags-data.tar.bz2:untagger.model
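
Directed dependency accuracy is the fraction of tokens whose induced head matches the gold head. A minimal sketch of the computation is below; conventions such as the root marker and whether punctuation tokens are scored are assumptions here and would have to follow the paper's evaluation setup.

```python
def directed_dependency_accuracy(gold_heads, predicted_heads):
    """Per-token head-matching accuracy over a list of sentences.

    Each sentence is a list of head indices, with -1 marking the root.
    """
    correct = total = 0
    for gold, pred in zip(gold_heads, predicted_heads):
        for g, p in zip(gold, pred):
            correct += int(g == p)
            total += 1
    return correct / total if total else 0.0

# Two toy sentences, one wrong head out of five tokens -> 0.8
gold = [[1, -1], [1, -1, 1]]
pred = [[1, -1], [2, -1, 1]]
print(directed_dependency_accuracy(gold, pred))   # 0.8
```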

SLIDE 10

Conclusion: Summary

Summary

  instead of bootstrapping dependency grammar inducers from 16K short whole sentences (160K tokens), we
    ◮ start with 35K inter-punctuation fragments (709K tokens)
    ◮ using appropriate models that can handle incomplete data
    ◮ and improved state-of-the-art accuracy by more than 2%

SLIDE 11

Conclusion: Future

Possible future directions?

  could we induce grammars from ungrammatical inputs?
    ◮ perhaps sentence prefixes and suffixes?
    ◮ could we go all the way down to n-grams?

SLIDE 12

Conclusion: Questions?

Thanks!

Any questions?

Spitkovsky et al. (Stanford & Google) Incomplete Fragments / Austere Models ICGI (2012-09-07) 12 / 12