

slide-1
SLIDE 1

Dynamic Programming for Linear-Time Incremental Parsing

Liang Huang

Information Sciences Institute University of Southern California

(Joint work with Kenji Sagae, USC/ICT)

JHU CLSP Seminar September 14, 2010

slide-2
SLIDE 2

Remembering Fred Jelinek (1932-2010)

  • Prof. Jelinek hosted my visit and this talk on what turned out to be his last day.

He was very supportive of this work, which relates to his own work on structured language models; I dedicate this talk to his memory.

slide-3
SLIDE 3

DP for Incremental Parsing

Ambiguity and Incrementality

One morning in Africa, I shot an elephant in my pajamas; how he got into my pajamas I’ll never know.

  • NLP is (almost) all about ambiguity resolution
  • human beings resolve ambiguity incrementally
slide-8
SLIDE 8

CS 562 - Intro

Ambiguities in Translation

Google translate: carefully slide

slide-13
SLIDE 13

CS 562 - Intro

If you are stolen...

Google translate: Once the theft to the police

slide-15
SLIDE 15

CS 562 - Intro

  • or even...

clear evidence that NLP is used in real life!

slide-17
SLIDE 17

DP for Incremental Parsing

Ambiguities in Parsing

I feed cats nearby in the garden ...

  • let’s focus on dependency structures for simplicity
  • ambiguous attachments of nearby and in
  • ambiguity explodes exponentially with sentence length
  • must design efficient (polynomial) search algorithm
  • typically using dynamic programming (DP); e.g. CKY

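The exponential blow-up can be made concrete: the number of binary-branching analyses of a span grows as the Catalan numbers. A quick illustrative sketch (the standard recurrence, not code from the talk):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def catalan(n):
    """C_n = number of binary trees with n internal nodes (Catalan recurrence)."""
    if n <= 1:
        return 1
    return sum(catalan(i) * catalan(n - 1 - i) for i in range(n))

# ambiguity explodes with sentence length
print(catalan(5), catalan(10), catalan(20))   # 42 16796 6564120420
```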
slide-21
SLIDE 21

DP for Incremental Parsing

But full DP is too slow...

I feed cats nearby in the garden ...

  • full DP (like CKY) is too slow (cubic-time)
  • while human parsing is fast & incremental (linear-time)
  • how about incremental parsing then?
  • yes, but only with greedy search (accuracy suffers)
  • explores tiny fraction of trees (even w/ beam search)
  • can we combine the merits of both approaches?
  • a fast, incremental parser with dynamic programming?
  • explores exponentially many trees in linear-time?
slide-24
SLIDE 24

DP for Incremental Parsing

Linear-Time Incremental DP

  • incremental parsing (e.g. shift-reduce): fast (linear-time), but greedy search (Nivre 04; Collins/Roark 04; ...)
  • full DP (e.g. CKY): principled search, but slow (cubic-time) (Eisner 96; Collins 99; ...)
  • this work: fast shift-reduce parsing with dynamic programming ☺ ☺

slide-25
SLIDE 25

DP for Incremental Parsing

Big Picture

  • natural languages parsed by humans: psycholinguistics
  • natural languages parsed by computers: NLP ☹
  • programming languages parsed by computers: compiler theory (LR, LALR, ...)

slide-27
SLIDE 27

DP for Incremental Parsing

Preview of the Results

  • very fast linear-time dynamic programming parser
  • best reported dependency accuracy on PTB/CTB
  • explores exponentially many trees (and outputs forest)

[plots: parsing time (secs) vs. sentence length for Charniak, Berkeley, MST, and this work; number of trees explored (10^0 to 10^10) vs. sentence length: DP exponential, non-DP beam search fixed]

slide-30
SLIDE 30

DP for Incremental Parsing

Outline

  • Motivation
  • Incremental (Shift-Reduce) Parsing
  • Dynamic Programming for Incremental Parsing
  • Experiments

12

slide-31
SLIDE 31

DP for Incremental Parsing

Shift-Reduce Parsing

I feed cats nearby in the garden.

step  action     stack (top at right)    queue
0     -          (empty)                 I feed cats ...
1     shift      I                       feed cats nearby ...
2     shift      I feed                  cats nearby in ...
3     l-reduce   feed(I)                 cats nearby in ...
4     shift      feed(I) cats            nearby in the ...
5a    r-reduce   feed(I, cats)           nearby in the ...
5b    shift      feed(I) cats nearby     in the garden ...

here feed(I) denotes the tree rooted at feed with dependent I; steps 5a and 5b are the two choices at step 5: a shift-reduce conflict

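The walkthrough above can be sketched in code. This is an illustrative toy, not the talk's parser: a tree is a (head, children) pair, and the three actions follow the slides.

```python
# Toy shift / l-reduce / r-reduce on "I feed cats ...". A tree is (head, children);
# all names and data structures here are illustrative, not the talk's implementation.

def shift(stack, queue):
    stack.append((queue.pop(0), []))      # push next word as a one-node tree
    return stack, queue

def l_reduce(stack, queue):
    s0, s1 = stack.pop(), stack.pop()     # s1 becomes a left dependent of s0
    stack.append((s0[0], [s1] + s0[1]))
    return stack, queue

def r_reduce(stack, queue):
    s0, s1 = stack.pop(), stack.pop()     # s0 becomes a right dependent of s1
    stack.append((s1[0], s1[1] + [s0]))
    return stack, queue

stack, queue = [], "I feed cats nearby in the garden .".split()
for action in (shift, shift, l_reduce, shift, r_reduce):   # steps 1-4 and 5a
    stack, queue = action(stack, queue)
print(stack)   # [('feed', [('I', []), ('cats', [])])]
```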
slide-39
SLIDE 39

DP for Incremental Parsing

Choosing Parser Actions

  • score each action using features f and weights w
  • features are drawn from a local window
  • abstraction (or signature) of a state -- this inspires DP!
  • weights trained by structured perceptron (Collins 02)

21

... s2 s1 s0 | q0 q1 ...
  ← stack      queue →

features: (s0.w, s0.rc, q0, ...) = (cats, nearby, in, ...)

[example state: stack ... feed(I) cats(nearby); queue: in the garden ...]

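The scoring rule on this slide is a plain linear model, score(action) = w · f(state, action). A sketch with made-up feature strings and weights (real weights come from perceptron training):

```python
# Linear action scoring: score = sum of the weights of the fired features.
# Feature templates mirror the slide (s0.w, s0.rc, q0); the weights are made up.

def features(state, action):
    s0_w, s0_rc, q0 = state               # stack-top word, its rightmost child, next word
    return [f"s0.w={s0_w}~{action}",
            f"s0.rc={s0_rc}~{action}",
            f"q0={q0}~{action}"]

def score(weights, state, action):
    return sum(weights.get(f, 0.0) for f in features(state, action))

weights = {"s0.w=cats~shift": 0.5, "q0=in~shift": 0.2, "s0.rc=nearby~r-reduce": 0.4}
state = ("cats", "nearby", "in")          # the slide's example: (cats, nearby, in, ...)
best = max(("shift", "l-reduce", "r-reduce"), key=lambda a: score(weights, state, a))
print(best)   # shift
```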
slide-40
SLIDE 40

DP for Incremental Parsing

Greedy Search

  • each state => three new states (shift, l-reduce, r-reduce)
  • the search space is exponential
  • greedy search: always pick the best next state
slide-42
SLIDE 42

DP for Incremental Parsing

Beam Search

24

  • each state => three new states (shift, l-reduce, r-reduce)
  • the search space is exponential
  • beam search: always keep top-b states
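A sketch of the beam variant, with parser states abstracted to (stack size, queue position) and a stand-in scoring function (both illustrative):

```python
# Beam search over parser states: expand each state with the three actions,
# then prune to the top-b successors. States and scores are deliberately toy.

def beam_search(n_words, b, score_fn):
    beam = [(0, 0, 0.0)]                  # (stack size, queue position, score)
    for _ in range(2 * n_words - 1):      # a full parse takes exactly 2n-1 actions
        successors = []
        for stack, pos, s in beam:
            if pos < n_words:                                  # shift
                successors.append((stack + 1, pos + 1, s + score_fn("shift")))
            if stack >= 2:                                     # the two reduces
                successors.append((stack - 1, pos, s + score_fn("l-reduce")))
                successors.append((stack - 1, pos, s + score_fn("r-reduce")))
        beam = sorted(successors, key=lambda x: -x[2])[:b]     # keep top-b
    return beam

final = beam_search(7, b=8, score_fn=lambda a: 1.0 if a == "shift" else 0.5)
print(final[0])   # (1, 7, 10.0): one tree on the stack, all 7 words consumed
```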
slide-43
SLIDE 43

DP for Incremental Parsing

Dynamic Programming

  • each state => three new states (shift, l-reduce, r-reduce)
  • key idea of DP: share common subproblems
  • merge equivalent states => polynomial space

“graph-structured stack” (Tomita, 1988)

[plot: number of trees explored (10^0 to 10^10) vs. sentence length; DP: exponential, non-DP beam search: fixed]

each DP state corresponds to exponentially many non-DP states

slide-48
SLIDE 48

DP for Incremental Parsing

Merging Equivalent States

  • two states are equivalent if they agree on features
  • because same features guarantee same cost
  • the shift-reduce conflict: after "I feed cats", either r-reduce (cats becomes a dependent of feed) or shift "nearby"
  • assume features only look at the root of s0; then two states are equivalent if they agree on the root of s0
  • the two conflicting derivations later reach equivalent states, which are merged: (local) ambiguity-packing!
  • the merged states, linked by sh/re arcs, form a graph-structured stack

[diagrams: the two analyses of "I feed cats nearby in the garden" on the stack/queue (... s2 s1 s0 | q0 q1 ...); their sh/re successor arcs over ... feed, ... cats, ... nearby, ... in converge as equivalent states merge]

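The packing operation itself is ordinary DP: keep one entry per feature signature, with the best score for it. A toy sketch (the two-field signature and the numbers are illustrative):

```python
# Ambiguity-packing sketch: successor states that share a feature signature
# are merged, keeping the best score (the graph-structured-stack idea).
# The signature here, (root of s0, queue position), is a toy stand-in.

def expand_and_merge(packed, successors_fn):
    merged = {}
    for sig, score in packed.items():
        for new_sig, delta in successors_fn(sig):
            # same signature => same future costs, so keep only the best score
            if score + delta > merged.get(new_sig, float("-inf")):
                merged[new_sig] = score + delta
    return merged

# after the shift-reduce conflict, both derivations shift "nearby":
packed = {("cats", 3): 1.0, ("feed", 3): 0.8}       # two competing states
step = expand_and_merge(packed, lambda sig: [(("nearby", 4), 0.5)])
print(step)   # {('nearby', 4): 1.5} : the two states packed into one
```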
slide-61
SLIDE 61

DP for Incremental Parsing

Theory: Polynomial-Time DP

  • this DP is exact and polynomial-time if features are:
  • a) bounded -- for polynomial time
  • features can only look at a local window
  • b) monotonic -- for correctness (optimal substructure)
  • features should draw no more info from trees farther away from the stack top than from trees closer to the top

  • both are intuitive: a) always true; b) almost always true

35

... s2 s1 s0 q0 q1 ...

← stack queue →

slide-62
SLIDE 62

DP for Incremental Parsing

Theory: Monotonic History

  • related: grammar refinement by annotation (Johnson, 1998)
  • annotate vertical context history (e.g., parent)
  • monotonicity: can’t annotate the grandparent without annotating the parent (otherwise DP would fail)
  • our features: left-context history instead of vertical context
  • similarly, can’t annotate s2 without annotating s1
  • but we can always design a “minimum monotonic superset”

[diagram: vertical context (parent, grandparent) vs. left context on the stack (s0, s1, s2)]

slide-63
SLIDE 63

DP for Incremental Parsing

Related Work

  • Graph-Structured Stack (Tomita 88): Generalized LR
  • GSS is just a chart viewed from left to right (e.g. Earley 70)
  • this line of work started w/ Lang (1974); stuck since 1990
  • b/c explicit LR table is impossible with modern grammars
  • Jelinek (2004) independently rediscovered GSS
  • We revived and advanced this line of work in two aspects
  • theoretical: implicit LR table based on features
  • merge and split on-the-fly; no pre-compilation needed
  • monotonic feature functions guarantee correctness (new)
  • practical: achieved linear-time performance with pruning

slide-65
SLIDE 65

DP for Incremental Parsing

Jelinek (2004)

In: M. Johnson, S. Khudanpur, M. Ostendorf, and R. Rosenfeld (eds.): Mathematical Foundations of Speech and Language Processing, 2004

[figure annotations: “graph-structured stack!”; “I don’t know anything about this paper...”]

slide-68
SLIDE 68

DP for Incremental Parsing

Jelinek (2004)

  • structured language model as graph-structured stack
  • p_SLM(a | has, show) vs. p_3gram(a | its, host)

see also (Chelba and Jelinek, 98; 00; Xu, Chelba, Jelinek, 02)

slide-70
SLIDE 70

Experiments

slide-71
SLIDE 71

DP for Incremental Parsing

Speed Comparison

41

  • 5 times faster with the same parsing accuracy

[bar chart: time (hours), non-DP vs. DP]

slide-72
SLIDE 72

DP for Incremental Parsing

Correlation of Search and Parsing

42

[scatter plot: dependency accuracy (92.2 to 93.1) vs. average model score (2365 to 2395), DP vs. non-DP]

  • better search quality <=> better parsing accuracy
slide-73
SLIDE 73

DP for Incremental Parsing

Search Space: Exponential

43

[plot: number of trees explored (10^0 to 10^10) vs. sentence length; DP: exponential, non-DP: fixed (beam-width)]

slide-74
SLIDE 74

DP for Incremental Parsing

N-Best / Forest Oracles

44

[plot: oracle accuracy vs. k; DP forest oracle (98.15), DP k-best in forest, non-DP k-best in beam]

slide-75
SLIDE 75

DP for Incremental Parsing

Better Search => Better Learning

45

  • DP leads to faster and better learning w/ perceptron
slide-76
SLIDE 76

DP for Incremental Parsing

Learning Details: Early Updates

  • greedy search: update at first error
  • beam search: update when gold is pruned (Collins/Roark 04)
  • DP search: also update when gold is “merged” (new!)
  • b/c we know gold can’t make it back to the top

[learning curves: DP vs. non-DP]

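A sketch of perceptron training with early update: decode with a beam, stop the moment the gold action prefix falls off, and update on that prefix. The feature function, action set, and toy example are illustrative stand-ins; the DP variant additionally fires the update when gold is merged below the top.

```python
# Structured perceptron with early update (after Collins & Roark 04):
# stop decoding once the gold prefix leaves the beam, update on the prefix.
# Actions, features, and the toy example below are illustrative stand-ins.

def early_update(weights, gold, actions, feat, b):
    beam = [((), 0.0)]                    # (action prefix, prefix score)
    for t in range(len(gold)):
        successors = [(p + (a,), s + weights.get(feat(p, a), 0.0))
                      for p, s in beam for a in actions]
        beam = sorted(successors, key=lambda x: -x[1])[:b]
        gold_prefix = tuple(gold[:t + 1])
        if all(p != gold_prefix for p, _ in beam):
            best = beam[0][0]             # early update: reward gold, punish best
            for i in range(len(gold_prefix)):
                gf, bf = feat(gold_prefix[:i], gold_prefix[i]), feat(best[:i], best[i])
                weights[gf] = weights.get(gf, 0.0) + 1.0
                weights[bf] = weights.get(bf, 0.0) - 1.0
            return weights
    return weights

w = early_update({("0", "shift"): -1.0}, gold=["shift", "l-reduce"],
                 actions=["shift", "l-reduce", "r-reduce"],
                 feat=lambda p, a: (str(len(p)), a), b=1)
print(w)   # gold's shift feature rises from -1.0 to 0.0; the wrong action gets -1.0
```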
slide-77
SLIDE 77

DP for Incremental Parsing

Parsing Time vs. Sentence Length

  • parsing speed (scatter plot) compared to other parsers

[scatter plot: parsing time (secs, up to 1.4) vs. sentence length (0 to 70); Charniak O(n^2.5), Berkeley O(n^2.4), MST O(n^2), this work O(n)]

slide-80
SLIDE 80

DP for Incremental Parsing

Final Results

  • much faster than major parsers (even with Python!)
  • first linear-time incremental dynamic programming parser
  • best reported dependency accuracy on Penn Treebank

parser                    time    complexity  trees searched
McDonald et al 05 - MST   0.12    O(n^2)      exponential
Koo et al 08 baseline*    -       O(n^4)      exponential
Zhang & Clark 08 single   0.11    O(n)        constant
this work                 0.04    O(n)        exponential
Charniak 00               0.49    O(n^2.5)    exponential
Petrov & Klein 07         0.21    O(n^2.4)    exponential

[bar chart: dependency accuracies (scale 89 to 93): 90.2, 91.4, 92.0, 92.1, 92.4, 92.5 across the six parsers]

*at this ACL: Koo & Collins 10: 93.0 with O(n^4)

slide-83
SLIDE 83

DP for Incremental Parsing

Final Results on Chinese

  • also the best parsing accuracy on Chinese
  • Penn Chinese Treebank (CTB 5)
  • all numbers below use gold-standard POS tags

[bar chart (scale 70 to 85): word, non-root, and root accuracies for Duan et al. 2007, Zhang & Clark 08 (single), and this work; values shown: 85.5 84.7 84.4; 85.2 84.3 83.9; 78.3 76.7 73.7]

slide-84
SLIDE 84

DP for Incremental Parsing

Conclusion

  • incremental parsing (e.g. shift-reduce): fast (linear-time), but greedy search
  • full dynamic programming (e.g. CKY): principled search, but slow (cubic-time)
  • this work: linear-time shift-reduce parsing w/ dynamic programming ☺✓

slide-86
SLIDE 86

DP for Incremental Parsing

Zoom out to Big Picture...

  • natural languages parsed by humans: psycholinguistics
  • natural languages parsed by computers: NLP ☹ → ☺?
  • programming languages parsed by computers: compiler theory
  • still a long way to go...

slide-87
SLIDE 87

DP for Incremental Parsing

Thank You

  • a general theory of DP for shift-reduce parsing
  • as long as features are bounded and monotonic
  • fast, accurate DP parser release coming soon:
  • http://www.isi.edu/~lhuang
  • future work
  • adapt to constituency parsing (straightforward)
  • other grammar formalisms like CCG and TAG
  • integrate POS tagging into the parser
  • integrate semantic interpretation

52

slide-88
SLIDE 88

DP for Incremental Parsing

How I was invited to give this talk

  • Fred attended ACL 2010 in Sweden
  • Mark Johnson mentioned this work to him
  • Fred saw my co-author Kenji Sagae giving the talk
  • but didn’t realize it was Kenji; he thought it was me
  • he emailed me (but mis-spelled my name in the address)
  • not getting a reply, he asked Kevin Knight to “forward it to Liang Haung or his student Sagae.”
  • Fred complained that my paper was very hard to read: “As you can see, I am completely confused!” And he was right.
  • finally he said, “come here to give a talk and explain it.”

53