NLP Programming Tutorial 8 – Phrase Structure Parsing

Graham Neubig
Nara Institute of Science and Technology (NAIST)




Interpreting Language is Hard!

I saw a girl with a telescope

  • “Parsing” resolves structural ambiguity in a formal way
Two Types of Parsing

  • Dependency parsing: focuses on relations between words
  • Phrase structure parsing: focuses on identifying phrases and their recursive structure

I saw a girl with a telescope

[Figure: dependency and phrase-structure analyses of the sentence; the phrase-structure tree uses POS tags PRP VBD DT NN IN DT NN and phrase labels NP, PP, VP, S]


Recursive Structure?

I saw a girl with a telescope

[Figure: the tree is built up recursively; the POS tags PRP VBD DT NN IN DT NN combine into phrases NP, PP, and VP, then into S, with one NP attachment still marked "???"]

Different Structure, Different Interpretation

I saw a girl with a telescope

[Figure: the "???" can be resolved in two ways: the PP "with a telescope" attaches either to the VP or to the NP "a girl", giving two different trees and two different interpretations]


Non-Terminals, Pre-Terminals, Terminals

I saw a girl with a telescope

[Figure: in the tree, words such as "I" and "telescope" are terminals; POS tags such as PRP and NN, which dominate a single terminal, are pre-terminals; phrase labels such as NP, VP, and S are non-terminals]


Parsing as a Prediction Problem

  • Given a sentence X, predict its parse tree Y
  • A type of "structured" prediction (similar to POS tagging, word segmentation, etc.)

[Figure: the sentence "I saw a girl with a telescope" paired with its parse tree]


Probabilistic Model for Parsing

  • Given a sentence X, predict the most probable parse tree Y:

argmax_Y P(Y∣X)

[Figure: the sentence with its highest-probability parse tree]


Probabilistic Generative Model

  • We assume some probabilistic model generated the parse tree Y and the sentence X jointly: P(Y, X)
  • The parse tree with the highest joint probability also has the highest conditional probability given X:

argmax_Y P(Y∣X) = argmax_Y P(Y, X)


Probabilistic Context Free Grammar (PCFG)

  • How do we define a joint probability for a parse tree?

P( [parse tree of "I saw a girl with a telescope"] ) = ?


Probabilistic Context Free Grammar (PCFG)

  • PCFG: Define probability for each node

I saw a girl with a telescope

[Figure: each node of the tree is annotated with its rule probability, e.g. P(S → NP VP), P(VP → VBD NP PP), P(PP → IN NP), P(NP → DT NN), P(PRP → "I"), P(NN → "telescope")]


Probabilistic Context Free Grammar (PCFG)

  • PCFG: Define probability for each node
  • Parse tree probability is product of node probabilities

P(tree) = P(S → NP VP)
        * P(NP → PRP)       * P(PRP → "I")
        * P(VP → VBD NP PP) * P(VBD → "saw")
        * P(NP → DT NN)     * P(DT → "a")   * P(NN → "girl")
        * P(PP → IN NP)     * P(IN → "with")
        * P(NP → DT NN)     * P(DT → "a")   * P(NN → "telescope")
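Read as code, this product is just a recursive multiplication over the tree's rules. A minimal Python sketch, using made-up rule probabilities and a hypothetical `tree_prob` helper (a smaller tree than the slides' example, for brevity):

```python
# Hypothetical PCFG rule probabilities (illustrative values only)
rule_prob = {
    ("S", ("NP", "VP")): 0.8,
    ("NP", ("PRP",)): 0.5,
    ("VP", ("VBD", "NP")): 0.4,
    ("NP", ("NN",)): 0.5,
    ("PRP", ("I",)): 0.4,
    ("VBD", ("saw",)): 0.05,
    ("NN", ("him",)): 0.1,
}

def tree_prob(tree):
    """P(tree) = product of the probabilities of all rules in the tree.
    A tree is (label, children); a leaf child is a plain word string."""
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rule_prob[(label, rhs)]
    for child in children:
        if not isinstance(child, str):
            p *= tree_prob(child)  # multiply in the subtree's rules
    return p

tree = ("S", [("NP", [("PRP", ["I"])]),
              ("VP", [("VBD", ["saw"]), ("NP", [("NN", ["him"])])])])
print(tree_prob(tree))  # 0.8 * 0.5 * 0.4 * 0.4 * 0.05 * 0.5 * 0.1
```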


Probabilistic Parsing

  • Given this model, parsing is the algorithm to find

argmax_Y P(Y, X)

  • Can we use the Viterbi algorithm as we did before?
  • Answer: No!
  • Reason: Parse candidates are not graphs, but hypergraphs.


What is a Hypergraph?

  • Let's say we have two parse trees for "I saw a girl with a telescope"
  • Most parts of the two trees are the same!

[Figure: both trees contain the nodes PRP 0,1, VBD 1,2, DT 2,3, NN 3,4, IN 4,5, DT 5,6, NN 6,7, NP 0,1, NP 2,4, NP 5,7, PP 4,7, VP 1,7, S 0,7; the second tree additionally contains NP 2,7]


  • Create a graph with all the same nodes and all the edges of both trees

[Figure: the merged graph containing every node of both trees, including NP 2,7]


  • Overlay the edges of the first tree, then the edges of the second tree: at VP 1,7 there are two choices! Choose the red edge and you get the first tree; choose the blue edge and you get the second tree.

[Figure: the merged hypergraph with the first tree's edges in red and the second tree's edges in blue]


Why a “Hyper”graph?

  • The "degree" of an edge is its number of children
  • The degree of a hypergraph is the maximum degree over all of its edges
  • A graph is a hypergraph of degree 1!

[Figure: degree-1 edges such as PRP 0,1 → "I" and VBD 1,2 → "saw"; a degree-2 edge VP 1,7 → VBD 1,2 NP 2,7; a degree-3 edge VP 1,7 → VBD 1,2 NP 2,4 PP 4,7]


Weighted Hypergraphs

  • Like graphs, we can add weights to hypergraph edges
  • Use the negative log probability of each rule

[Figure: the hypergraph with edge weights such as −log(P(S → NP VP)), −log(P(VP → VBD NP PP)), −log(P(VP → VBD NP)), and −log(P(PRP → "I"))]


Solving Hypergraphs

  • Parsing = finding the minimum-weight path through a hypergraph
  • For graphs we can do this with the Viterbi algorithm:
  • Forward: calculate the score of the best path to each state
  • Backward: recover the best path
  • For hypergraphs, the algorithm is almost identical!
  • Inside: calculate the score of the best subtree for each node
  • Outside: recover the best tree

Review: Viterbi Algorithm (Forward Step)

[Figure: a graph with nodes 0-3 and weighted edges e1 (0→1, 2.5), e2 (0→2, 1.4), e3 (1→2, 4.0), e4 (1→3, 2.1), e5 (2→3, 2.3)]

best_score[0] = 0
for each node in the graph (ascending order)
    best_score[node] = ∞
    for each incoming edge of node
        score = best_score[edge.prev_node] + edge.score
        if score < best_score[node]
            best_score[node] = score
            best_edge[node] = edge
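The forward step can be sketched as runnable Python on the slides' example graph. The edge tuples and the `viterbi_forward` helper name are illustrative, not part of the tutorial's code:

```python
import math

# The example graph from the slides: (name, prev_node, next_node, score)
edges = [("e1", 0, 1, 2.5), ("e2", 0, 2, 1.4), ("e3", 1, 2, 4.0),
         ("e4", 1, 3, 2.1), ("e5", 2, 3, 2.3)]

def viterbi_forward(edges, num_nodes):
    best_score = [0.0] + [math.inf] * (num_nodes - 1)
    best_edge = [None] * num_nodes
    for node in range(1, num_nodes):          # ascending order
        for name, prev, nxt, score_edge in edges:
            if nxt != node:
                continue                      # not an incoming edge of node
            score = best_score[prev] + score_edge
            if score < best_score[node]:
                best_score[node] = score
                best_edge[node] = name
    return best_score, best_edge

scores, back = viterbi_forward(edges, 4)
print(scores)  # approximately [0.0, 2.5, 1.4, 3.7]
print(back)    # [None, 'e1', 'e2', 'e5']
```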


Example:

Initialize: best_score[0] = 0
Check e1: score = 0 + 2.5 = 2.5 (< ∞) → best_score[1] = 2.5, best_edge[1] = e1
Check e2: score = 0 + 1.4 = 1.4 (< ∞) → best_score[2] = 1.4, best_edge[2] = e2
Check e3: score = 2.5 + 4.0 = 6.5 (> 1.4) → no change!
Check e4: score = 2.5 + 2.1 = 4.6 (< ∞) → best_score[3] = 4.6, best_edge[3] = e4
Check e5: score = 1.4 + 2.3 = 3.7 (< 4.6) → best_score[3] = 3.7, best_edge[3] = e5


Result of Forward Step

[Figure: the solved graph, with best scores 0.0, 2.5, 1.4, 3.7 at nodes 0-3]

best_score = ( 0.0, 2.5, 1.4, 3.7 )
best_edge = ( NULL, e1, e2, e5 )


Review: Viterbi Algorithm (Backward Step)

[Figure: the solved example graph from the forward step]

best_path = [ ]
next_edge = best_edge[best_edge.length - 1]
while next_edge != NULL
    add next_edge to best_path
    next_edge = best_edge[next_edge.prev_node]
reverse best_path
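Continuing the example, the backward step can be sketched in Python. The `prev_node` map and the `viterbi_backward` helper name are illustrative; `best_edge` is the result of the forward step above:

```python
# Back-pointers produced by the forward step on the example graph
best_edge = [None, "e1", "e2", "e5"]
# Source node of each edge (names are illustrative)
prev_node = {"e1": 0, "e2": 0, "e3": 1, "e4": 1, "e5": 2}

def viterbi_backward(best_edge, prev_node):
    best_path = []
    next_edge = best_edge[len(best_edge) - 1]  # start from the last node
    while next_edge is not None:
        best_path.append(next_edge)
        next_edge = best_edge[prev_node[next_edge]]
    best_path.reverse()
    return best_path

print(viterbi_backward(best_edge, prev_node))  # ['e2', 'e5']
```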


Example of Backward Step

[Figure: the solved example graph]

Initialize: best_path = [], next_edge = best_edge[3] = e5
Process e5: best_path = [e5], next_edge = best_edge[2] = e2
Process e2: best_path = [e5, e2], next_edge = best_edge[0] = NULL
Reverse: best_path = [e2, e5]


Inside Step for Hypergraphs:

  • Find the score of the best subtree of VP1,7

[Figure: VP 1,7 has two incoming hyperedges, e1: VP → VBD 1,2 NP 2,4 PP 4,7, and e2: VP → VBD 1,2 NP 2,7]

score(e1) = −log(P(VP → VBD NP PP)) + best_score[VBD1,2] + best_score[NP2,4] + best_score[PP4,7]
score(e2) = −log(P(VP → VBD NP)) + best_score[VBD1,2] + best_score[NP2,7]

best_edge[VP1,7] = argmin_{e1,e2} score(e)
best_score[VP1,7] = score(best_edge[VP1,7])


Building Hypergraphs from Grammars

  • OK, we can solve hypergraphs, but what we have is a grammar and a sentence
  • How do we build a hypergraph?

A Grammar:
P(S → NP VP) = 0.8       P(S → PRP VP) = 0.2
P(VP → VBD NP PP) = 0.6  P(VP → VBD NP) = 0.4
P(NP → DT NN) = 0.5      P(NP → NN) = 0.5
P(PRP → "I") = 0.4       P(VBD → "saw") = 0.05
P(DT → "a") = 0.6        ...

A Sentence: I saw a girl with a telescope


CKY Algorithm

  • The CKY (Cocke-Kasami-Younger) algorithm creates and solves hypergraphs
  • The grammar must be in Chomsky normal form (CNF): every rule has either two non-terminals or one terminal on its right side
  • Rules can be converted into CNF:

S → NP VP       OK
S → PRP VP      OK
VP → VBD NP     OK
VP → VBD NP PP  Not OK! → binarize: VP → VBD VP' and VP' → NP PP
NP → PRP        Not OK! → combine with PRP → "I" into NP_PRP → "I"
NP → NN         Not OK! (unary) → combined with its pre-terminal rule in the same way
PRP → "I"       OK
VBD → "saw"     OK
DT → "a"        OK
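Binarization in the style of VP → VBD VP', VP' → NP PP can be sketched as a small function. The `binarize` helper and the primed-symbol naming scheme are illustrative assumptions, not the tutorial's code:

```python
def binarize(lhs, rhs):
    """Split an n-ary rule lhs → rhs into binary CNF rules by
    introducing primed helper symbols (VP', VP'', ...)."""
    rules = []
    while len(rhs) > 2:
        new_sym = lhs + "'"                 # fresh helper non-terminal
        rules.append((lhs, [rhs[0], new_sym]))
        lhs, rhs = new_sym, rhs[1:]         # binarize the remainder
    rules.append((lhs, list(rhs)))
    return rules

print(binarize("VP", ["VBD", "NP", "PP"]))
# [('VP', ['VBD', "VP'"]), ("VP'", ['NP', 'PP'])]
```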


CKY Algorithm

  • Start by expanding all rules for terminals, with scores
  • Expand all possible nodes for span 0,2
  • Expand all possible nodes for span 1,3
  • Expand all possible nodes for span 0,3

[Figure: the CKY chart for "I saw him" is filled bottom-up: first the pre-terminal and unary nodes (PRP 0,1, NP 0,1, VBD 1,2, VP 1,2, PRP 2,3, NP 2,3), then S 0,2 (score 0.5 + 3.2 + 1.0 = 4.7) and SBAR 0,2 (5.3), then VP 1,3 (5.0), then S 0,3 (5.9) and SBAR 0,3 (6.1)]


CKY Algorithm

  • Find the S that covers the entire sentence and its best edge
  • Expand its left child and right child recursively until we have our tree

[Figure: starting from S 0,3 in the chart for "I saw him", following the best edges downward recovers the complete tree]


Printing Parse Trees

  • Standard text format for parse tree: “Penn Treebank”

[Figure: the subtree for "with a telescope": a PP over IN and NP, with the NP over DT and NN]

(PP (IN with) (NP (DT a) (NN telescope)))
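The bracketed format can be produced by a short recursive function. This is a sketch; the `tb_format` name and the (label, children) tree representation are assumptions, not the tutorial's code:

```python
def tb_format(tree):
    """Render a (label, children) tree in Penn Treebank bracket notation;
    a leaf child is a plain word string."""
    label, children = tree
    parts = [c if isinstance(c, str) else tb_format(c) for c in children]
    return "(" + label + " " + " ".join(parts) + ")"

pp = ("PP", [("IN", ["with"]),
             ("NP", [("DT", ["a"]), ("NN", ["telescope"])])])
print(tb_format(pp))  # (PP (IN with) (NP (DT a) (NN telescope)))
```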


Printing Parse Trees

  • Hypergraphs printed recursively, starting at top:

I saw a girl with a telescope

[Figure: the full parse tree of the sentence]

print(S0,7) = "(S " + print(NP0,1) + " " + print(VP1,7) + ")"
print(NP0,1) = "(NP " + print(PRP0,1) + ")"
print(PRP0,1) = "(PRP I)"
...


Pseudo-Code


CKY Pseudo-Code: Read Grammar

# Read a grammar in the format "lhs \t rhs \t prob \n"
make list nonterm   # list of (lhs, rhs1, rhs2, log_prob)
make map preterm    # map preterm[rhs] = [ (lhs, log_prob) ... ]
for rule in grammar_file
    split rule into lhs, rhs, prob (with "\t")   # rule P(lhs → rhs) = prob
    split rhs into rhs_symbols (with " ")
    if length(rhs_symbols) == 1:   # if this is a pre-terminal rule
        add (lhs, log(prob)) to preterm[rhs]
    else:                          # otherwise it is a non-terminal rule
        add (lhs, rhs_symbols[0], rhs_symbols[1], log(prob)) to nonterm
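A runnable Python sketch of this step (the `read_grammar` name and the three-line toy grammar are illustrative; the tab-separated format follows the pseudocode):

```python
import math
from collections import defaultdict

def read_grammar(lines):
    """Read rules in the format "lhs \t rhs \t prob"."""
    nonterm = []                 # (lhs, rhs1, rhs2, log_prob)
    preterm = defaultdict(list)  # word -> [(lhs, log_prob), ...]
    for line in lines:
        lhs, rhs, prob = line.strip().split("\t")
        rhs_symbols = rhs.split(" ")
        if len(rhs_symbols) == 1:   # pre-terminal rule
            preterm[rhs].append((lhs, math.log(float(prob))))
        else:                       # binary non-terminal rule
            nonterm.append((lhs, rhs_symbols[0], rhs_symbols[1],
                            math.log(float(prob))))
    return nonterm, preterm

grammar = ["S\tNP VP\t1.0", "NP\tI\t0.5", "VP\tsaw\t0.5"]
nonterm, preterm = read_grammar(grammar)
print(nonterm)        # [('S', 'NP', 'VP', 0.0)]
print(preterm["I"])   # [('NP', log 0.5)]
```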


CKY Pseudo-Code: Add Pre-Terminals

split line into words
make map best_score   # index: sym_i,j   value: best log prob
make map best_edge    # index: sym_i,j   value: (lsym_i,k, rsym_k,j)
# Add the pre-terminal symbols
for i in 0 .. length(words)-1:
    for lhs, log_prob in preterm where P(lhs → words[i]) > 0:
        best_score[lhs_i,i+1] = log_prob
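In Python this step might look as follows; the tiny `preterm` map here is hand-filled with illustrative probabilities rather than read from the tutorial's grammar file:

```python
import math
from collections import defaultdict

# Illustrative pre-terminal rules: word -> [(lhs, log_prob), ...]
preterm = {"I": [("PRP", math.log(0.4))],
           "saw": [("VBD", math.log(0.05))],
           "him": [("PRP", math.log(0.4))]}

words = "I saw him".split(" ")
best_score = defaultdict(lambda: -math.inf)  # key: (sym, i, j)
best_edge = {}

# Add a pre-terminal node sym_i,i+1 for every rule producing words[i]
for i in range(len(words)):
    for lhs, log_prob in preterm.get(words[i], []):
        best_score[(lhs, i, i + 1)] = log_prob

print(sorted(best_score))
# [('PRP', 0, 1), ('PRP', 2, 3), ('VBD', 1, 2)]
```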


CKY Pseudo-Code: Combine Non-Terminals

for j in 2 .. length(words):    # j is the right side of the span
    for i in j-2 .. 0:          # i is the left side (note: reverse order!)
        for k in i+1 .. j-1:    # k is where the second child begins
            # Try every grammar rule log(P(sym → lsym rsym)) = log_prob
            for sym, lsym, rsym, log_prob in nonterm:
                # Both children must have a probability
                if best_score[lsym_i,k] > -∞ and best_score[rsym_k,j] > -∞:
                    # Find the log probability of this node/edge
                    my_lp = best_score[lsym_i,k] + best_score[rsym_k,j] + log_prob
                    # If this is the best edge, update
                    if my_lp > best_score[sym_i,j]:
                        best_score[sym_i,j] = my_lp
                        best_edge[sym_i,j] = (lsym_i,k, rsym_k,j)
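A self-contained Python sketch of the same loops on a toy grammar (the grammar, probabilities, and hand-filled pre-terminal scores are illustrative, not the tutorial's data):

```python
import math
from collections import defaultdict

# Toy binary rules: (sym, lsym, rsym, log_prob)
nonterm = [("S", "NP", "VP", math.log(1.0)),
           ("VP", "VBD", "NP", math.log(1.0))]
words = "I saw him".split(" ")
best_score = defaultdict(lambda: -math.inf)
best_edge = {}
# Pre-terminal scores, as if filled in by the previous step
for sym, i, lp in [("NP", 0, math.log(0.5)), ("VBD", 1, math.log(0.05)),
                   ("NP", 2, math.log(0.5))]:
    best_score[(sym, i, i + 1)] = lp

n = len(words)
for j in range(2, n + 1):            # right side of the span
    for i in range(j - 2, -1, -1):   # left side (reverse order!)
        for k in range(i + 1, j):    # where the second child begins
            for sym, lsym, rsym, log_prob in nonterm:
                left = best_score[(lsym, i, k)]
                right = best_score[(rsym, k, j)]
                if left > -math.inf and right > -math.inf:
                    my_lp = left + right + log_prob
                    if my_lp > best_score[(sym, i, j)]:
                        best_score[(sym, i, j)] = my_lp
                        best_edge[(sym, i, j)] = ((lsym, i, k), (rsym, k, j))

print(best_edge[("S", 0, 3)])  # (('NP', 0, 1), ('VP', 1, 3))
```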


CKY Pseudo-Code: Print Tree

print(S_0,length(words))   # print the "S" that spans all the words

subroutine print(sym_i,j):
    if sym_i,j exists in best_edge:   # for non-terminals
        return "(" + sym + " " + print(best_edge[sym_i,j][0]) + " "
                               + print(best_edge[sym_i,j][1]) + ")"
    else:                             # for terminals
        return "(" + sym + " " + words[i] + ")"
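A Python sketch of this subroutine; the `print_tree` name and the hand-filled `best_edge` chart (as CKY would produce for "I saw him") are illustrative:

```python
words = "I saw him".split(" ")
# Back-pointers as the combination step would leave them
best_edge = {("S", 0, 3): (("NP", 0, 1), ("VP", 1, 3)),
             ("VP", 1, 3): (("VBD", 1, 2), ("NP", 2, 3))}

def print_tree(node):
    sym, i, j = node
    if node in best_edge:   # non-terminal: recurse on both children
        left, right = best_edge[node]
        return "(" + sym + " " + print_tree(left) + " " + print_tree(right) + ")"
    else:                   # (pre-)terminal: emit the word it covers
        return "(" + sym + " " + words[i] + ")"

print(print_tree(("S", 0, len(words))))
# (S (NP I) (VP (VBD saw) (NP him)))
```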



Exercise

  • Write cky.py
  • Test the program
  • Input: test/08-input.txt
  • Grammar: test/08-grammar.txt
  • Answer: test/08-output.txt
  • Run the program on actual data:
  • data/wiki-en-test.grammar, data/wiki-en-short.tok
  • Visualize the trees
  • script/print-trees.py < wiki-en-test.trees
  • (Requires NLTK: http://nltk.org/)
  • Challenge: think of a way to handle unknown words

Thank You!