Bootstrapping Dependency Grammars from Sentence Fragments via Austere Models (PowerPoint PPT Presentation)

SLIDE 1

Bootstrapping Dependency Grammars from Sentence Fragments via Austere Models

Valentin I. Spitkovsky, with Daniel Jurafsky (Stanford University) and Hiyan Alshawi (Google Inc.)

Spitkovsky et al. (Stanford & Google) Incomplete Fragments / Austere Models ICGI (2012-09-07) 1 / 12

SLIDE 2

Introduction: Unsupervised Learning

Why do unsupervised learning?

  • One practical reason:
    ◮ got lots of potentially useful data!
    ◮ but more than would be feasible to annotate...

  • yet grammar inducers use less data than supervised parsers:
    ◮ most systems train on WSJ10 (or, more recently, WSJ15)
    ◮ WSJ10 has approximately 50K tokens (5% of WSJ’s 1M)
    ◮ in just 7K sentences (WSJ15’s 16K sentences cover 160K tokens)

  • long sentences are hard — shorter inputs can be easier:
    ◮ better chances of guessing larger fractions of correct trees
    ◮ preference for more local structures (Smith and Eisner, 2006)
    ◮ faster training, etc.

  — a rich history going back to Elman (1993)

  ... could we “start small” and use more data?

SLIDE 3

Introduction: Previous Work

How have long inputs been handled previously?

  very carefully...
    ◮ Viterbi training (tolerates bad independence assumptions of models)
    ◮ + punctuation-induced constraints (partial bracketing: Pereira and Schabes, 1992)
    ◮ = punctuation-constrained Viterbi training
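
To give a feel for what a punctuation-induced constraint can look like, here is a minimal sketch of one simple variant: a candidate (Viterbi) parse is accepted only if each inter-punctuation fragment attaches to the rest of the sentence through at most one external head. This is an illustration only, not the exact constraint set of Spitkovsky et al. (2011); the function and variable names are hypothetical.

```python
def respects_fragments(heads, fragments):
    """Check one simple punctuation-induced constraint on a dependency parse.

    heads[i]  : index of token i's head (-1 marks the root).
    fragments : list of (start, end) token spans, end exclusive, obtained by
                splitting the sentence at punctuation.

    Simplified constraint: each fragment may contain at most one token whose
    head lies outside that fragment (or is the root); every other token must
    attach within the fragment.
    """
    for start, end in fragments:
        external = sum(1 for i in range(start, end)
                       if heads[i] == -1 or not start <= heads[i] < end)
        if external > 1:
            return False
    return True


# Tiny example: "it fell , he said" with fragments (0, 2) and (3, 5).
heads = [1, 4, 4, 4, -1]   # it -> fell, fell -> said, he -> said, said = root
print(respects_fragments(heads, [(0, 2), (3, 5)]))   # True
```

In punctuation-constrained Viterbi training, a restriction of this kind would be enforced inside the parser's chart rather than as a post-hoc filter; the check above is only meant to make the constraint itself concrete.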

SLIDE 4

Introduction: Example

Example: Punctuation (Spitkovsky et al., 2011)

[SBAR Although it probably has reduced the level of expenditures for some purchasers], [NP utilization management] — [PP like most other cost containment strategies] — [VP doesn’t appear to have altered the long-term rate of increase in health-care costs], [NP the Institute of Medicine], [NP an affiliate of the National Academy of Sciences], [VP concluded after a two-year study].

... wouldn’t it be great if we could just break it up?
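
Breaking the example up is mechanically simple. Below is a minimal sketch that splits a tokenized sentence at punctuation and returns the inter-punctuation fragments; the punctuation set and the PTB-style tokenization (does n't, --) are assumptions made for illustration.

```python
PUNCT = {",", ".", ";", ":", "--", "(", ")", "!", "?"}   # assumed split points

def split_at_punctuation(tokens):
    """Return the list of inter-punctuation fragments (each a list of tokens)."""
    fragments, current = [], []
    for tok in tokens:
        if tok in PUNCT:
            if current:
                fragments.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        fragments.append(current)
    return fragments

sentence = ("Although it probably has reduced the level of expenditures for some "
            "purchasers , utilization management -- like most other cost containment "
            "strategies -- does n't appear to have altered the long-term rate of "
            "increase in health-care costs , the Institute of Medicine , an affiliate "
            "of the National Academy of Sciences , concluded after a two-year study .")

for fragment in split_at_punctuation(sentence.split()):
    print(len(fragment), " ".join(fragment))
```

Run on this sentence, the sketch produces the seven fragments of 12, 2, 6, 14, 4, 8, and 5 tokens that reappear in the table on slide 8.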

SLIDE 5

Data Splitting: Previous Work

How have long inputs been handled previously?

  splitting on punctuation:
    ◮ supervised parsing of long Chinese sentences (Li et al., 2005; Li et al., 2010)
    ◮ unsupervised constituent parsing (Ponvert et al., 2011)
    ◮ unsupervised chunking (Ponvert et al., 2010), via Seginer’s (2007) CCL parser

SLIDE 6

Data Splitting: What If

What if we chopped up input at punctuation?

  impact on quantity of data (with a 15-token threshold):
    ◮ number of training inputs goes up to 34,856 (from 15,922)
    ◮ number of tokens increases to 709,215 (from 163,715)
    ◮ more and simpler word sequences incorporated earlier
    ◮ much denser coverage of the available data

  but, also impact on quality of data:
    ◮ mostly phrases and clauses (75% agree with constituent boundaries)
    ◮ many fewer complete sentences exhibiting full structure
    ◮ even less representative than short sentences

  however, there is an appropriate model family (DBMs)

SLIDE 7

Data Splitting: Model

Class-based, head-outward generation (Alshawi, 1996)

[Diagram: a head of class ch generates dependents to its right (dir = R). The first dependent cd1 is generated in the adjacent position (adj = T); further dependents, such as cd2, are generated in non-adjacent positions (adj = F); generation in that direction ends with a STOP decision. Throughout, ce marks the class of the word currently at the edge of the growing span: the head itself before any dependents, then each newly generated outermost dependent.]

Model distributions:
  PROOT(ch | comp)
  PATTACH(cd | ch, dir, cross)
  PSTOP(· | dir, adj, ce, comp)
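
To make the head-outward story concrete, the sketch below generates one head's dependents in one direction using the two conditional distributions named on the slide (PROOT, which picks a fragment's root class, is not shown; it would be drawn once per input). The probability tables are passed in as plain functions, the reading of ce as "class of the word currently at the edge of the span" is an interpretation of the diagram, and the code is an illustrative sketch, not the authors' implementation.

```python
import random

def generate_dependents(head_class, direction, p_attach, p_stop,
                        comp=True, cross=False):
    """Head-outward generation of one head's dependents in one direction.

    p_stop(direction, adjacent, edge_class, comp) -> probability of stopping,
        mirroring PSTOP(. | dir, adj, ce, comp).
    p_attach(head_class, direction, cross) -> {dependent_class: probability},
        mirroring PATTACH(cd | ch, dir, cross).
    """
    dependents = []
    adjacent = True            # adj = T until the first dependent is generated
    edge_class = head_class    # ce: the head sits at the edge before any dependent
    while random.random() >= p_stop(direction, adjacent, edge_class, comp):
        classes, weights = zip(*p_attach(head_class, direction, cross).items())
        dependent = random.choices(classes, weights=weights)[0]
        dependents.append(dependent)
        edge_class = dependent   # the new outermost word now defines the edge
        adjacent = False         # adj = F for all later dependents
    return dependents

# Toy usage: a verb head generating rightward dependents from a made-up grammar.
p_stop = lambda direction, adjacent, ce, comp: 0.3 if adjacent else 0.7
p_attach = lambda ch, direction, cross: {"NN": 0.6, "IN": 0.4}
print(generate_dependents("VBZ", "R", p_attach, p_stop))
```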

SLIDE 8

Data Splitting: Example Continued

Example (cont’d): DBMs (Spitkovsky et al., 2012)

                length   type   left   right
  complete        51      S      IN     NN
  incomplete      12      SBAR   IN     NNS
                   2      NP     NN     NN
                   6      PP     IN     NNS
                  14      VP     VBZ    NNS
                   4      NP     DT     NNP
                   8      NP     DT     NNPS
                   5      VP     VBD    NN

  [The slide builds overlay DBM-1, DBM-2, and DBM-3 labels on this table.]

  partial parse forests
  “easy-first” (Goldberg and Elhadad, 2010), optional soft EM
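
The table tracks, for every fragment, its length and what sits at its left and right boundaries, which is the kind of information Dependency-and-Boundary Models condition on. A trivial helper for pulling out such a boundary signature might look as follows; the tuple layout is an illustrative choice, not the DBM feature set, and the constituent type column above (S, SBAR, NP, ...) comes from the treebank rather than from anything computable here.

```python
def boundary_signature(pos_tags, complete):
    """(length, left boundary tag, right boundary tag, completeness) of a fragment."""
    status = "complete" if complete else "incomplete"
    return (len(pos_tags), pos_tags[0], pos_tags[-1], status)

# e.g. the 2-token fragment "utilization management", tagged NN NN:
print(boundary_signature(["NN", "NN"], complete=False))
# (2, 'NN', 'NN', 'incomplete')
```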

SLIDE 9

Data Splitting: Results

We tried; it works...

  experimental setup:
    ◮ context-sensitive unsupervised tags (no gold POS)¹
    ◮ performance metric is directed dependency accuracy
    ◮ evaluation on Section 23 (all sentences)

  state-of-the-art baseline: 59.7%
    ◮ DBMs on whole inputs only
    ◮ staged training on WSJ15 → WSJ45
    ◮ strong punctuation-induced constraints for full data
    ◮ weaker constraints used in decoding for evaluation

  [Chart on this slide, for comparison (directed accuracy on Section 23, all sentences):
   59.1 context-sensitive clusters (Spitkovsky et al., 2011) [EMNLP];
   58.4 punctuation constraints (Spitkovsky et al., 2011) [CoNLL];
   57.0 (Tu and Honavar, 2012) [EMNLP-CoNLL];
   55.7 (Blunsom and Cohn, 2011) [EMNLP];
   53.3 (Gillenwater et al., 2010) [TechReport];
   53.3 (Bisk and Hockenmaier, 2012) [AAAI];
   47.9 Viterbi training (Spitkovsky et al., 2010) [CoNLL].]

  results with initially-split data — 60.2% (3.5% exact)
    ◮ can do better with simpler initial models — 61.2% (5.0% exact)
    ◮ e.g., better not to model roots of incomplete fragments
    ◮ ... as well as non-adjacency for short inputs

  ¹ nlp.stanford.edu/pubs/goldtags-data.tar.bz2:untagger.model
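
Directed dependency accuracy is the fraction of tokens whose induced head matches the gold head. A minimal sketch of the computation is below; conventions such as the root marker and whether punctuation tokens are scored are assumptions here and would have to follow the paper's evaluation setup.

```python
def directed_dependency_accuracy(gold_heads, predicted_heads):
    """Per-token head-matching accuracy over a list of sentences.

    Each sentence is a list of head indices, with -1 marking the root.
    """
    correct = total = 0
    for gold, pred in zip(gold_heads, predicted_heads):
        for g, p in zip(gold, pred):
            correct += int(g == p)
            total += 1
    return correct / total if total else 0.0

# Two toy sentences, one wrong head out of five tokens -> 0.8
gold = [[1, -1], [1, -1, 1]]
pred = [[1, -1], [2, -1, 1]]
print(directed_dependency_accuracy(gold, pred))   # 0.8
```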

SLIDE 10

Conclusion: Summary

Summary

  instead of bootstrapping dependency grammar inducers from 16K short whole sentences (160K tokens), we
    ◮ start with 35K inter-punctuation fragments (709K tokens)
    ◮ using appropriate models that can handle incomplete data
    ◮ and improved state-of-the-art accuracy by more than 2%

SLIDE 11

Conclusion: Future

Possible future directions?

  could we induce grammars from ungrammatical inputs?
    ◮ perhaps sentence prefixes and suffixes?
    ◮ could we go all the way down to n-grams?

SLIDE 12

Conclusion: Questions?

Thanks!

Any questions?

Spitkovsky et al. (Stanford & Google) Incomplete Fragments / Austere Models ICGI (2012-09-07) 12 / 12