
SLIDE 1

From Baby Steps to Leapfrog: How “Less is More”

in Unsupervised Dependency Parsing

Valentin I. Spitkovsky with Hiyan Alshawi (Google Inc.) and Daniel Jurafsky (Stanford University)

Spitkovsky et al. (Stanford & Google) From Baby Steps to Leapfrog NAACL HLT (2010-06-04) 1 / 30

SLIDE 8

Overview

Idea: (At Least) Two Axes worth Scaffolding

Model (or Algorithmic) Complexity [classic NLP]

— word alignment (unsupervised), e.g., IBM models 1-5 (Brown et al., 1993)

— parsing (supervised), e.g., “coarse-to-fine” grammars (Charniak and Johnson, 2005; Petrov, 2009)

Data (or Problem / Task) Complexity [rare in NLP]

— reinforcement learning, e.g., robot navigation (Singh, 1992; Sanger, 1994)

— closest in NLP: cautious named entity classification (Collins and Singer, 1999; Yarowsky, 1995)

SLIDE 12

Overview

Outline: Three Data-Complexity-Aware Techniques

Baby Steps: scaffolding on data complexity — iterative, requires no initialization

Less is More: filtering by data complexity — batch, capable of using a good initializer

Leapfrog: a combination (best of both worlds) — intended as an efficiency hack (but performs best)

SLIDE 16

The Problem

Problem: Unsupervised Learning of Parsing

Input: Raw Text (Sentences, Tokens and POS-tags)

... By most measures, the nation’s industrial sector is now growing very slowly — if at all. Factory payrolls fell in September. So did the Federal Reserve ...

[Example dependency tree: Factory/NN payrolls/NNS fell/VBD in/IN September/NN ., with the root marked ♦]

Output: Syntactic Structures (and a Probabilistic Grammar)

SLIDE 28

The Problem

Motivation: Unsupervised (Dependency) Parsing

Insert your favorite reason(s) why you’d like to parse anything in the first place...

... adjust for any data without reference treebanks — i.e., exotic languages and/or genres (e.g., legal).

Potential applications:

◮ machine translation — word alignment, phrase extraction, reordering;

◮ web search — retrieval, query refinement;

◮ question answering, speech recognition, etc.

SLIDE 32

State-of-the-Art

State-of-the-Art: Directed Dependency Accuracy

42.2% on Section 23 (all sentences) of WSJ (Cohen and Smith, 2009)

31.7% for the (right-branching) baseline (Klein and Manning, 2004)

Scoring example:

[Dependency tree: Factory/NN payrolls/NNS fell/VBD in/IN September/NN ., with the root marked ♦]

Directed Score: 3/5 = 60% (baseline: 2/5 = 40%);
Undirected Score: 4/5 = 80% (baseline: 4/5 = 80%).
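Directed accuracy credits an arc only when both the head and the arc's direction match the reference; undirected accuracy ignores direction. A minimal scoring sketch (hypothetical encoding: each token's head as a 1-based parent index with 0 for the root ♦; not the authors' code):

```python
def dependency_scores(gold_heads, pred_heads):
    """Directed and undirected dependency accuracy for one sentence.
    Heads are lists of parent indices: position i (0-based) holds the
    1-based index of token (i+1)'s head, with 0 denoting the root."""
    n = len(gold_heads)
    # directed: the predicted head must be exactly the gold head
    directed = sum(g == p for g, p in zip(gold_heads, pred_heads))
    # undirected: a predicted arc {child, head} matches a gold arc in either direction
    gold_arcs = {frozenset((i + 1, h)) for i, h in enumerate(gold_heads)}
    undirected = sum(frozenset((i + 1, h)) in gold_arcs
                     for i, h in enumerate(pred_heads))
    return directed / n, undirected / n
```

For gold heads [2, 3, 0, 3, 4] (a plausible tree for “Factory payrolls fell in September”) and the guess [2, 3, 0, 5, 3], this yields 3/5 directed and 4/5 undirected — the same fractions as the slide's example.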

SLIDE 45

State-of-the-Art

State-of-the-Art: A Brief History

1992 — word classes (Carroll and Charniak)

1998 — greedy linkage via mutual information (Yuret)

2001 — iterative re-estimation with EM (Paskin)

2004 — right-branching baseline — valence (DMV) (Klein and Manning)

2004 — annealing techniques (Smith and Eisner)

2005 — contrastive estimation (Smith and Eisner)

2006 — structural biasing (Smith and Eisner)

2007 — common cover link representation (Seginer)

2008 — logistic normal priors (Cohen et al.)

2009 — lexicalization and smoothing (Headden et al.)

2009 — soft parameter tying (Cohen and Smith)

SLIDE 56

State-of-the-Art

State-of-the-Art: Dependency Model with Valence

a head-outward model, with word classes and valence/adjacency

(Klein and Manning, 2004)

[Diagram: the head h generates its arguments a1, a2, ... outward in each direction, then STOP]

$$P(t_h) = \prod_{dir\in\{L,R\}} P_{\text{STOP}}(c_h, dir, \underbrace{\mathbf{1}_{n=0}}_{adj}) \prod_{i=1}^{n} P(t_{a_i})\, P_{\text{ATTACH}}(c_h, dir, c_{a_i})\, \big(1 - P_{\text{STOP}}(c_h, dir, \underbrace{\mathbf{1}_{i=1}}_{adj})\big), \quad n = |\text{args}(h, dir)|$$
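Read literally, the DMV factors a tree's probability head-outward: per direction, one adjacency-conditioned continue/attach decision per argument, plus a final STOP. A recursive sketch (the data layout and parameter tables are hypothetical, not the authors' implementation):

```python
def dmv_tree_prob(head, children, cls, p_stop, p_attach):
    """Probability of the dependency subtree rooted at node `head`.
    children[node] -> {'L': [...], 'R': [...]}, arguments ordered head-outward;
    cls[node] -> its word class (POS tag); p_stop is keyed by
    (head class, direction, adjacency) and p_attach by
    (head class, direction, argument class)."""
    c_h = cls[head]
    prob = 1.0
    for d in ('L', 'R'):
        args = children.get(head, {}).get(d, [])
        n = len(args)
        prob *= p_stop[(c_h, d, n == 0)]            # final STOP (adjacent iff n == 0)
        for i, a in enumerate(args, start=1):
            prob *= 1.0 - p_stop[(c_h, d, i == 1)]  # decide to keep generating
            prob *= p_attach[(c_h, d, cls[a])]      # choose the argument's class
            prob *= dmv_tree_prob(a, children, cls, p_stop, p_attach)
    return prob
```

Note that adjacency is true only for the first decision in each direction, matching the indicators in the formula.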

SLIDE 60

State-of-the-Art

State-of-the-Art: Unsupervised Learning Engine

EM, via inside-outside re-estimation (Baker, 1979)

[Diagram (after Manning and Schütze, 1999): the outside probability α and inside probability β of a nonterminal N^j spanning words w_p ... w_q within a sentence w_1 ... w_m rooted at N^1 — treated here as a BLACK BOX]
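Whatever the black box computes, the outer loop is plain EM; a generic driver might look like this (`e_step` and `m_step` are placeholder interfaces, not the authors' code):

```python
def em(corpus, init_params, e_step, m_step, iterations=40):
    """Generic EM driver around an inside-outside 'black box':
    e_step collects expected counts under the current parameters,
    m_step renormalizes them into new parameters."""
    params = init_params
    for _ in range(iterations):
        counts = e_step(corpus, params)   # E: inside-outside expectations
        params = m_step(counts)           # M: re-estimate the multinomials
    return params
```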

SLIDE 67

State-of-the-Art

State-of-the-Art: The Standard Corpus

Training: WSJ10 (Klein, 2005)

◮ The Wall Street Journal section of the Penn Treebank Project (Marcus et al., 1993)

◮ ... stripped of punctuation, etc.

◮ ... filtered down to sentences left with no more than 10 POS tags;

◮ ... and converted to reference dependencies using “head percolation rules” (Collins, 1999).

Evaluation: Section 23 of WSJ∞ (all sentences).
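The construction above amounts to a small filter; a sketch under assumptions (the punctuation tag set and the `(word, tag)` sentence encoding are illustrative, not the exact WSJ10 recipe):

```python
# Illustrative punctuation-like tags to strip; the real recipe may differ.
PUNCT_TAGS = {'.', ',', ':', '``', "''", '-LRB-', '-RRB-'}

def make_wsjk(tagged_sentences, k=10):
    """WSJk-style filter: strip punctuation tokens, then keep only
    sentences left with at most k POS tags."""
    kept = []
    for sentence in tagged_sentences:
        stripped = [(w, t) for (w, t) in sentence if t not in PUNCT_TAGS]
        if 0 < len(stripped) <= k:
            kept.append(stripped)
    return kept
```

With k = 10 this gives WSJ10; varying k yields the WSJk series used throughout the talk.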

SLIDE 68

State-of-the-Art

State-of-the-Art: The Standard Corpus

[Plot: sentence counts (1,000s) and token counts (1,000s) of WSJk as the length cutoff k grows]


SLIDE 73

(At Least) Two Issues

Issue I: Why so little data?

extra unlabeled data helps semi-supervised parsing (Suzuki et al., 2009)

yet state-of-the-art unsupervised methods use even less than what’s available for supervised training...

we will explore (three) judicious uses of data and simple, scalable machine learning techniques

SLIDE 78

(At Least) Two Issues

Issue II: Non-convex objective...

maximizing the probability of data (sentences):

$$\hat{\theta}_{\text{UNS}} = \arg\max_{\theta} \sum_{s} \log \underbrace{\sum_{t\in T(s)} P_{\theta}(t)}_{P_{\theta}(s)}$$

supervised objective would be convex (counting):

$$\hat{\theta}_{\text{SUP}} = \arg\max_{\theta} \sum_{s} \log P_{\theta}(t^{*}(s)).$$

in general, $\hat{\theta}_{\text{SUP}} \neq \hat{\theta}_{\text{UNS}}$ and $\hat{\theta}_{\text{UNS}} \neq \tilde{\theta}_{\text{UNS}}$... (see CoNLL)

initialization matters!

SLIDE 87

(At Least) Two Issues

Issues: The Lay of the Land

[Plot: directed dependency accuracy (%) vs. training set WSJk, for three settings — Uninformed, Oracle, and K&M (Ad-Hoc Harmonic Init)]

SLIDE 97

Baby Steps

Idea I: Baby Steps ... as Non-convex Optimization

global non-convex optimization is hard ... meta-heuristic: take guesswork out of local search start with an easy (convex) case slowly extend it to the fully complex target task take tiny (cautious) steps in the problem space ... try not to stray far from relevant neighborhoods in the solution space base case: sentences of length one (trivial — no init) incremental step: smooth WSJk; re-init WSJ(k + 1) ... this really is grammar induction!

Spitkovsky et al. (Stanford & Google) From Baby Steps to Leapfrog NAACL HLT (2010-06-04) 15 / 30
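The incremental scheme above can be sketched as a short curriculum loop. This is a minimal illustration, not the authors' actual trainer: `train_em` is a toy stand-in (it merely accumulates token counts, whereas the real system runs EM over the DMV and smooths the WSJk model before re-initializing), and all names are illustrative.

```python
def gradations(corpus, max_len):
    """WSJk: all sentences of length at most k tokens."""
    return {k: [s for s in corpus if len(s) <= k] for k in range(1, max_len + 1)}

def train_em(model, data):
    # Stand-in for EM over the DMV: here it merely accumulates token counts.
    for sent in data:
        for tok in sent:
            model[tok] = model.get(tok, 0) + 1
    return model

def baby_steps(corpus, max_len):
    model = {}  # base case: WSJ1 is trivial, so no initializer is needed
    for k, data in sorted(gradations(corpus, max_len).items()):
        # each step re-initializes training on WSJk from the WSJ(k - 1) model
        model = train_em(model, data)
    return model
```

Each pass starts from the model fit to the previous gradation, so the optimizer never strays far from the preceding solution.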

slide-101
SLIDE 101

Baby Steps

Idea I: Baby Steps ... as Graduated Learning

WSJ1 — Atone (verbs!)
WSJ2 — Darkness fell. (nouns!) It is. Judge Not
WSJ3 — Become a Lobbyist (determiners!) But many have. They didn’t.

slide-113
SLIDE 113

Baby Steps

Idea I: Baby Steps ... and Related Notions

shaping (Skinner, 1938)
less is more (Kail, 1984; Newport, 1988; 1990)
starting small (Elman, 1993)
◮ scaffold on model complexity [restrict memory]
◮ scaffold on data complexity [restrict input]
controversy! (Rohde and Plaut, 1999)
stepping stones (Brown et al., 1993)
coarse-to-fine (Charniak and Johnson, 2005)
curriculum learning (Bengio et al., 2009)
continuation methods (Allgower and Georg, 1990)

successive approximations!

slide-116
SLIDE 116

Baby Steps

Idea I: Baby Steps ... Results!

[Plot: directed dependency accuracy (%) vs. WSJk (k = 5 ... 40); curves: Uninformed, Oracle, K&M (ad-hoc harmonic init), Baby Steps]

slide-121
SLIDE 121

Baby Steps

Idea I: Baby Steps ... Concerns?

ignores a good initializer
unnecessarily meticulous
excruciatingly slow!
about a year behind the state of the art (on long sentences)

slide-126
SLIDE 126

Less is More

Idea II: Less is More

short sentences are not representative (and few)
long sentences are overwhelmingly difficult ...
is there a “sweet spot” data gradation?
perhaps train where Baby Steps flatlines!

slide-130
SLIDE 130

Less is More

Idea II: Less is More ... the Learning Curve

[Plot: cross-entropy h (in bits per token) on WSJ45 vs. WSJk (k = 5 ... 45); knee in [7, 15], then a tight, flat asymptotic bound]

— automatically detect the knee: [7, 15]
— train at the “sweet spot” gradation: WSJ15
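One simple way to detect the knee of such a curve automatically is the maximum-distance-to-chord heuristic sketched below. This is an assumption for illustration only: the slide does not say which detector the authors actually used.

```python
def knee(xs, ys):
    # Perpendicular distance from each point to the chord joining the
    # curve's endpoints; the knee is the farthest point from that chord.
    (x0, y0), (x1, y1) = (xs[0], ys[0]), (xs[-1], ys[-1])
    dx, dy = x1 - x0, y1 - y0
    norm = (dx * dx + dy * dy) ** 0.5
    dists = [abs(dy * (x - x0) - dx * (y - y0)) / norm for x, y in zip(xs, ys)]
    return max(range(len(xs)), key=dists.__getitem__)
```

On a curve that drops steeply and then flattens, as the bits-per-token curve on this slide does, the detected index sits at the bend between the two regimes.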

slide-133
SLIDE 133

Less is More

Idea II: Less is More ... Results!

[Plot: directed dependency accuracy (%) vs. WSJk (k = 5 ... 40); curves: Oracle, Baby Steps, K&M, K&M∗, and Less is More]

slide-137
SLIDE 137

Less is More

Idea II: Less is More ... Concerns?

discards most of the data
beats the state of the art (on long sentences, off WSJ15)
ignores a decent complementary initialization strategy

slide-143
SLIDE 143

Leapfrog

Idea III: Leapfrog ... a Hack

use both good systems!
thorough training up to WSJ15, where it’s cheap
use both good initializers (mix their best parse trees)
execute just a few steps of EM where it’s expensive
hop from WSJ15 to WSJ45, via WSJ30 ...
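The hack can be sketched as follows (hypothetical names throughout: `em_step` stands in for one EM iteration over the DMV at gradation WSJk, and the step counts in `schedule` are illustrative, not the authors' exact settings):

```python
def mix_best_parses(parses_a, parses_b, score):
    # Per sentence, keep whichever system's best tree the scorer prefers.
    return [a if score(a) >= score(b) else b for a, b in zip(parses_a, parses_b)]

def leapfrog(init_model, em_step, schedule=((15, 40), (30, 5), (45, 5))):
    # schedule holds (gradation k, number of EM steps) pairs:
    # thorough where it's cheap (WSJ15), just a few steps where
    # sentences are long and EM is expensive (WSJ30, WSJ45).
    model = init_model  # initialized from the mixed parse trees
    for k, steps in schedule:
        for _ in range(steps):
            model = em_step(model, k)
    return model
```

Mixing the two initializers' trees gives the WSJ15 stage a head start; the later stages only need a few iterations to adapt to longer sentences.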

slide-149
SLIDE 149

Leapfrog

Idea III: Leapfrog ... Results!

[Plot: directed dependency accuracy (%) vs. WSJk (k = 5 ... 40); curves: Oracle, Uninformed, Baby Steps, K&M∗, Leapfrog]

slide-156
SLIDE 156

Results

Results: ... on Section 23 of WSJ

Right-Branching (Klein and Manning, 2004)       31.7%
DMV @10                                         34.2%
Baby Steps @15                                  39.2%
Baby Steps @45                                  39.4%
Soft Parameter Tying (Cohen and Smith, 2009)    42.2%
Less is More @15                                44.1%
Leapfrog @45                                    45.0%

slide-160
SLIDE 160

Conclusion

Summary

explored scaffolding on data complexity
awareness of data complexity does help!
beats the state of the art with older techniques

slide-166
SLIDE 166

Conclusion

Conclusion

(need a less adversarial learning algorithm)
paradox: improved performance with less data
despite discarding samples from the true (test) distribution
focusing on simple examples guides unsupervised learning
mirrors supervised boosting (Freund and Schapire, 1997)

slide-171
SLIDE 171

Conclusion

Teaser

we push the state of the art further, to 50.4% (up another 5%), using even faster and simpler methods!
... hear us at CoNLL and ACL (Spitkovsky et al., 2010)
similar approaches may apply in other settings (e.g., word alignment)
... more to come!

slide-172
SLIDE 172

Conclusion

Thanks!

Questions?
