Algorithms for NLP: Parsing I


slide-1
SLIDE 1

Parsing I

Yulia Tsvetkov – CMU Slides: Ivan Titov – University of Edinburgh, Taylor Berg-Kirkpatrick – CMU/UCSD, Dan Klein – UC Berkeley

Algorithms for NLP

slide-2
SLIDE 2

Ambiguity

▪ I saw a girl with a telescope

slide-3
SLIDE 3

Parsing

▪ INPUT: The move followed a round of similar increases by other lenders, reflecting a continuing decline in that market
▪ OUTPUT: a syntactic parse tree (shown as a figure on the slide)

slide-4
SLIDE 4

A Supervised ML Problem

Canadian Utilities had 1988 revenue of $ 1.16 billion , mainly from its natural gas and electric utility businesses in Alberta , where the company serves about 800,000 customers .

▪ Data for parsing experiments:
▪ Penn WSJ Treebank = 50,000 sentences with associated trees
▪ Usual set-up: 40,000 training, 2,400 test

[from Michael Collins slides]

slide-5
SLIDE 5

Outline

▪ Syntax: intro, CFGs, PCFGs
▪ CFGs: Parsing
▪ PCFGs: Parsing
▪ Parsing evaluation

slide-6
SLIDE 6

Syntax

slide-7
SLIDE 7

Syntax

▪ The study of the patterns of formation of sentences and phrases from words

▪ my dog (Pron N)
▪ the dog (Det N)
▪ the cat (Det N)
▪ the large cat (Det Adj N)
▪ the black cat (Det Adj N)
▪ ate a sausage (V Det N)

slide-8
SLIDE 8

Syntax

▪ The study of the patterns of formation of sentences and phrases from words

▪ Borders with semantics and morphology are sometimes blurred

Afyonkarahisarlılaştırabildiklerimizdenmişsinizcesine

in Turkish means "as if you are one of the people that we thought to be originating from Afyonkarahisar" [wikipedia]

slide-9
SLIDE 9

Parsing

▪ The process of predicting syntactic representations
▪ Syntactic representations

▪ Different types of syntactic representations are possible, for example:

Constituent (a.k.a. phrase-structure) tree

slide-10
SLIDE 10

Constituent trees

▪ Internal nodes correspond to phrases
▪ S: a sentence
▪ NP (Noun Phrase): my dog, a sandwich, lakes, …
▪ VP (Verb Phrase): ate a sausage, barked, …
▪ PP (Prepositional Phrase): with a friend, in a car, …

▪ Nodes immediately above words are PoS tags (a.k.a. preterminals)
▪ PN: pronoun
▪ D: determiner
▪ V: verb
▪ N: noun
▪ P: preposition

slide-11
SLIDE 11

Bracketing notation

▪ It is often convenient to represent a tree as a bracketed sequence

(S (NP (PN My) (N dog)) (VP (V ate) (NP (D a) (N sausage))))
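The bracketed notation maps directly onto a nested data structure. A minimal sketch (not from the slides) of reading such a string into a `(label, children)` tree in Python:

```python
def parse_bracketed(s):
    """Parse a bracketed tree like '(S (NP (PN My) (N dog)) ...)' into
    nested (label, children) tuples; bare tokens are terminal words."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse_node():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]          # non-terminal or preterminal label
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())
            else:
                children.append(tokens[pos])  # a terminal (word)
                pos += 1
        pos += 1                     # skip the closing ')'
        return (label, children)

    return parse_node()

tree = parse_bracketed("(S (NP (PN My) (N dog)) (VP (V ate) (NP (D a) (N sausage))))")
```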

slide-12
SLIDE 12

Parsing

▪ The process of predicting syntactic representations
▪ Syntactic representations

▪ Different types of syntactic representations are possible, for example:

Constituent (a.k.a. phrase-structure) tree
Dependency tree

slide-13
SLIDE 13

Dependency trees

▪ Nodes are words (along with PoS tags)
▪ Directed arcs encode syntactic dependencies between them
▪ Labels are types of relations between the words
▪ poss: possessive
▪ dobj: direct object
▪ nsubj: subject
▪ det: determiner

[dependency tree for "My dog ate a sausage": root → ate; ate → dog (nsubj); dog → My (poss); ate → sausage (dobj); sausage → a (det)]


slide-15
SLIDE 15

Recovering shallow semantics

▪ Some semantic information can be (approximately) derived from syntactic information
▪ Subjects (nsubj) are (often) agents ("initiators / doers of an action")
▪ Direct objects (dobj) are (often) patients ("affected entities")

▪ But even for agents and patients, consider:
▪ Mary is baking a cake in the oven
▪ A cake is baking in the oven

▪ In general, this is not trivial even for the most shallow forms of semantics
▪ E.g., consider prepositions: in can encode direction, position, temporal information, …

slide-16
SLIDE 16

Constituent and dependency representations

▪ Constituent trees can (potentially) be converted to dependency trees ▪ Dependency trees can (potentially) be converted to constituent trees

slide-17
SLIDE 17

Constituent trees

▪ Internal nodes correspond to phrases
▪ S: a sentence
▪ NP (Noun Phrase): my dog, a sandwich, lakes, …
▪ VP (Verb Phrase): ate a sausage, barked, …
▪ PP (Prepositional Phrase): with a friend, in a car, …

▪ Nodes immediately above words are PoS tags (a.k.a. preterminals)
▪ PN: pronoun
▪ D: determiner
▪ V: verb
▪ N: noun
▪ P: preposition

slide-18
SLIDE 18

Constituency Tests

▪ How do we know what nodes go in the tree?
▪ Classic constituency tests:
▪ Substitution by proform
▪ Movement
▪ Clefting
▪ Preposing
▪ Passive
▪ Modification
▪ Coordination/Conjunction
▪ Ellipsis/Deletion

slide-19
SLIDE 19

Conflicting Tests

▪ Constituency isn’t always clear

▪ Units of transfer:
▪ think about ~ penser à
▪ talk about ~ hablar de

▪ Phonological reduction:
▪ I will go → I’ll go
▪ I want to go → I wanna go
▪ à le centre → au centre

La vélocité des ondes sismiques ("the velocity of seismic waves")

slide-20
SLIDE 20

CFGs

slide-21
SLIDE 21

Context Free Grammar (CFG)

▪ Other grammar formalisms: LFG, HPSG, TAG, CCG…

Grammar (CFG)
ROOT → S
S → NP VP
NP → DT NN
NP → NN NNS
NP → NP PP
VP → VBP NP
VP → VBP NP PP
PP → IN NP

Lexicon
NN → interest
NNS → raises
VBP → interest
VBZ → raises
…

slide-22
SLIDE 22

Treebank Sentences

slide-23
SLIDE 23

CFGs


slide-31
SLIDE 31

Context-Free Grammars

▪ A context-free grammar is a 4-tuple <N, T, S, R>

▪ N: the set of non-terminals
▪ Phrasal categories: S, NP, VP, ADJP, etc.
▪ Parts-of-speech (pre-terminals): NN, JJ, DT, VB

▪ T: the set of terminals (the words)

▪ S: the start symbol
▪ Often written as ROOT or TOP
▪ Not usually the sentence non-terminal S

▪ R: the set of rules
▪ Of the form X → Y1 Y2 … Yk, with X, Yi ∈ N
▪ Examples: S → NP VP, VP → VP CC VP
▪ Also called rewrites, productions, or local trees
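The 4-tuple definition above can be made concrete in code. This is an illustrative sketch with a hypothetical toy grammar (not the one on the slides): rules map each non-terminal to its possible right-hand sides, and a top-down random derivation expands symbols until only terminals remain.

```python
import random

# Hypothetical toy grammar in the <N, T, S, R> format described above.
RULES = {
    "ROOT": [["S"]],
    "S":  [["NP", "VP"]],
    "NP": [["D", "N"]],
    "VP": [["V", "NP"]],
    "D":  [["a"], ["the"]],
    "N":  [["dog"], ["sausage"]],
    "V":  [["ate"], ["saw"]],
}
TERMINALS = {"a", "the", "dog", "sausage", "ate", "saw"}

def generate(symbol="ROOT"):
    """Top-down random derivation: expand non-terminals until only words remain."""
    if symbol in TERMINALS:
        return [symbol]
    rhs = random.choice(RULES[symbol])          # pick one production for this symbol
    return [w for y in rhs for w in generate(y)]

sentence = generate()
```

Every derivation from this particular grammar has the shape D N V D N, so it always yields a five-word sentence.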

slide-32
SLIDE 32

An example grammar

▪ Preterminal rules generate words given PoS tags; the remaining rules are called inner rules
▪ Example constituents licensed by the grammar:
▪ (NP A girl), (VP ate a sandwich)
▪ (V ate), (NP a sandwich), (VP saw a girl), (PP with a telescope)
▪ (NP a girl), (PP with a sandwich), (P with), (NP with a sandwich), (D a), (N sandwich)

slide-33
SLIDE 33

Why context-free?

Whether something can be a subtree depends only on its phrase type (e.g., VP), not on the surrounding context

slide-34
SLIDE 34

Why context-free?

Whether something can be a subtree depends only on its phrase type (e.g., VP), not on the surrounding context; here the result is not grammatical

slide-35
SLIDE 35

Coordination ambiguity

▪ Here, the coarse VP and NP categories cannot enforce subject-verb agreement in number, resulting in coordination ambiguity

This tree would be ruled out if the context were somehow captured (subject-verb agreement). "Bark" can be either a noun or a verb

Coordination

slide-36
SLIDE 36

Ambiguities

slide-37
SLIDE 37

Why is parsing hard? Ambiguity

▪ Prepositional phrase attachment ambiguity


slide-39
SLIDE 39

PP Ambiguity

Put the block in the box on the table in the kitchen
▪ 3 prepositional phrases, 5 interpretations:

▪ Put the block ((in the box on the table) in the kitchen)
▪ Put the block (in the box (on the table in the kitchen))
▪ Put ((the block in the box) on the table) in the kitchen
▪ Put (the block (in the box on the table)) in the kitchen
▪ Put (the block in the box) (on the table in the kitchen)

▪ The general case: the number of analyses grows with the Catalan numbers
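As a quick sanity check on the claim above, the Catalan numbers can be computed directly from the closed form, and C_3 = 5 matches the five interpretations of the three-PP sentence:

```python
from math import comb

def catalan(n):
    # C_n = C(2n, n) / (n + 1): counts the binary bracketings of n+1 items,
    # i.e., the number of attachment analyses in the general case.
    return comb(2 * n, n) // (n + 1)

counts = [catalan(n) for n in range(6)]
```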

slide-40
SLIDE 40

A typical tree from a standard dataset (Penn treebank WSJ)

Canadian Utilities had 1988 revenue of $ 1.16 billion , mainly from its natural gas and electric utility businesses in Alberta , where the company serves about 800,000 customers .

[from Michael Collins slides]

slide-41
SLIDE 41

Syntactic Ambiguities I

▪ Prepositional phrases: They cooked the beans in the pot on the stove with handles.
▪ Particle vs. preposition: The puppy tore up the staircase.
▪ Complement structures: The tourists objected to the guide that they couldn’t hear. She knows you like the back of her hand.
▪ Gerund vs. participial adjective: Visiting relatives can be boring. Changing schedules frequently confused passengers.

slide-42
SLIDE 42

Syntactic Ambiguities II

▪ Modifier scope within NPs: impractical design requirements; plastic cup holder
▪ Multiple gap constructions: The chicken is ready to eat. The contractors are rich enough to sue.
▪ Coordination scope: Small rats and mice can squeeze into holes or cracks in the wall.

slide-43
SLIDE 43

Dark Ambiguities

▪ Dark ambiguities: most analyses are shockingly bad (meaning they don’t have an interpretation you can get your mind around)
▪ Unknown words and new usages
▪ Solution: we need mechanisms to focus attention on the best analyses; probabilistic techniques do this

This analysis corresponds to the correct parse of “This is panic buying!”

slide-44
SLIDE 44

How to Deal with Ambiguity?

▪ We want to score all the derivations to encode how plausible they are

Put the block in the box on the table in the kitchen

slide-45
SLIDE 45

PCFGs

slide-46
SLIDE 46

Probabilistic Context-Free Grammars

▪ A context-free grammar is a tuple <N, T, S, R>

▪ N: the set of non-terminals
▪ Phrasal categories: S, NP, VP, ADJP, etc.
▪ Parts-of-speech (pre-terminals): NN, JJ, DT, VB

▪ T: the set of terminals (the words)

▪ S: the start symbol
▪ Often written as ROOT or TOP
▪ Not usually the sentence non-terminal S

▪ R: the set of rules
▪ Of the form X → Y1 Y2 … Yk, with X, Yi ∈ N
▪ Examples: S → NP VP, VP → VP CC VP
▪ Also called rewrites, productions, or local trees

▪ A PCFG adds:
▪ A top-down production probability per rule, P(Y1 Y2 … Yk | X)

slide-47
SLIDE 47

PCFGs

Associate a probability with each rule (the slide shows a probability next to every grammar and lexicon rule; the probabilities for each left-hand side sum to 1).

Now we can score a tree as the product of the probabilities of the rules used in its derivation.


slide-56
SLIDE 56

PCFG Estimation

slide-57
SLIDE 57

ML estimation

▪ A treebank: a collection of sentences annotated with constituent trees
▪ The estimated probability of a rule (maximum likelihood estimate):

P(X → α) = count(X → α) / count(X)

i.e., the number of times the rule is used in the corpus, divided by the number of times the nonterminal X appears in the treebank

▪ Smoothing is helpful
▪ Especially important for preterminal rules
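The maximum likelihood estimate described above can be sketched directly. The "treebank" here is just a hypothetical list of rule occurrences extracted from trees:

```python
from collections import Counter

def mle(rule_occurrences):
    """Maximum likelihood estimate: P(X -> alpha) = count(X -> alpha) / count(X)."""
    rule_counts = Counter(rule_occurrences)
    lhs_counts = Counter(lhs for lhs, _ in rule_occurrences)  # count(X)
    return {(lhs, rhs): c / lhs_counts[lhs]
            for (lhs, rhs), c in rule_counts.items()}

# Hypothetical rule occurrences read off a tiny treebank.
probs = mle([
    ("NP", ("D", "N")),
    ("NP", ("D", "N")),
    ("NP", ("NP", "PP")),
    ("VP", ("V", "NP")),
])
```

By construction, the probabilities for each left-hand side sum to 1, which is exactly the per-nonterminal distribution a PCFG needs.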

slide-58
SLIDE 58

Distribution over trees

▪ We defined a distribution over production rules for each nonterminal
▪ Our goal was to define a distribution over parse trees
▪ Unfortunately, not all PCFGs give rise to a proper distribution over trees: the sum of the probabilities of all trees the grammar can generate may be less than 1
▪ Good news: any PCFG estimated with the maximum likelihood procedure is always proper (Chi and Geman, 1998)
slide-59
SLIDE 59

Penn Treebank: peculiarities

▪ Wall Street Journal: around 40,000 annotated sentences, 1,000,000 words

▪ Fine-grained part-of-speech tags (45), e.g., for verbs:
▪ VBD: verb, past tense
▪ VBG: verb, gerund or present participle
▪ VBP: verb, present (non-3rd person singular)
▪ VBZ: verb, present (3rd person singular)
▪ MD: modal

▪ Flat NPs (no attempt to disambiguate NP attachment)

slide-60
SLIDE 60

CKY Parsing

slide-61
SLIDE 61

Parsing

▪ Parsing is search through the space of all possible parses
▪ e.g., we may want any parse, all parses, or (for a PCFG) the highest-scoring parse:

arg max_{T ∈ G(x)} P(T)

▪ Bottom-up:
▪ Start from the words and attempt to construct the full tree

▪ Top-down:
▪ Start from the start symbol and attempt to expand it to derive the sentence

slide-62
SLIDE 62

CKY algorithm (aka CYK)

▪ Cocke-Kasami-Younger algorithm
▪ Independently discovered in the late 1960s / early 1970s

▪ An efficient bottom-up parsing algorithm for (P)CFGs
▪ Can be used both for the recognition and the parsing problem
▪ Very important in NLP (and beyond)

▪ We will start with the non-probabilistic version

slide-63
SLIDE 63

Constraints on the grammar

▪ The basic CKY algorithm supports only rules in Chomsky Normal Form (CNF):
▪ Unary preterminal rules X → w (generation of words given PoS tags)
▪ Binary inner rules X → Y Z

slide-64
SLIDE 64

Constraints on the grammar

▪ The basic CKY algorithm supports only rules in Chomsky Normal Form (CNF)
▪ Any CFG can be converted to an equivalent CNF grammar
▪ Equivalent means that they define the same language
▪ However, the (syntactic) trees will look different
▪ This can be addressed by defining transformations that allow easy reverse transformation

slide-65
SLIDE 65

Transformation to CNF form

▪ What one needs to do to convert to CNF:
▪ Get rid of unary rules: not a problem, as our CKY algorithm will support unary rules
▪ Get rid of n-ary rules: crucial to process these, as binarization is required for efficient parsing

slide-66
SLIDE 66

Transformation to CNF form: binarization

▪ Consider
▪ How do we get a set of binary rules which are equivalent?


slide-68
SLIDE 68

Transformation to CNF form: binarization

▪ Consider
▪ How do we get a set of binary rules which are equivalent?
▪ A more systematic way to refer to the new non-terminals

slide-69
SLIDE 69

Transformation to CNF form: binarization

▪ Instead of binarizing rules, we can binarize trees during preprocessing:
▪ Can be easily reversed in postprocessing
▪ Also known as lossless Markovization in the context of PCFGs
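The binarization step can be sketched as follows: an n-ary rule X → Y1 … Yk is replaced by a right-branching chain of binary rules over fresh intermediate symbols. The `@X_i` naming scheme here is illustrative, not the one on the slides:

```python
def binarize(lhs, rhs):
    """Convert X -> Y1 ... Yk (k > 2) into an equivalent right-branching chain
    of binary rules, introducing fresh intermediate symbols @X_i."""
    rhs = list(rhs)
    if len(rhs) <= 2:
        return [(lhs, tuple(rhs))]       # already unary or binary: keep as-is
    rules = []
    current = lhs
    for i in range(len(rhs) - 2):
        new_sym = f"@{lhs}_{i + 1}"      # fresh non-terminal for the remainder
        rules.append((current, (rhs[i], new_sym)))
        current = new_sym
    rules.append((current, (rhs[-2], rhs[-1])))
    return rules
```

Reversing the transformation is easy: collapsing every `@`-symbol back into its parent recovers the original n-ary tree, which is why the transformation is lossless.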

slide-70
SLIDE 70

CKY: Parsing task

▪ We are given
▪ a grammar <N, T, S, R>
▪ a sequence of words w

▪ Our goal is to produce a parse tree for w

slide-71
SLIDE 71

CKY: Parsing task

▪ We are given
▪ a grammar <N, T, S, R>
▪ a sequence of words w

▪ Our goal is to produce a parse tree for w
▪ We need an easy way to refer to substrings of w

▪ Indices refer to fenceposts
▪ Span (i, j) refers to the words between fenceposts i and j
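Fencepost indexing as described above places positions between the words, so span (i, j) is exactly Python's slice `words[i:j]`. A tiny sketch:

```python
# Fenceposts: 0 My 1 dog 2 ate 3 a 4 sausage 5
words = ["My", "dog", "ate", "a", "sausage"]

def span(words, i, j):
    """Return the words between fenceposts i and j."""
    return words[i:j]
```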

slide-72
SLIDE 72

Parsing one word


slide-75
SLIDE 75

Parsing longer spans

Check through all C1, C2, mid


slide-78
SLIDE 78

CKY in action

Preterminal rules Inner rules

slide-79
SLIDE 79

Preterminal rules Inner rules

Chart (aka parsing triangle)


slide-89
SLIDE 89

Preterminal rules Inner rules

Check about unary rules


slide-93
SLIDE 93

Preterminal rules Inner rules

Check about unary rules: no unary rules here


slide-96
SLIDE 96

CKY in action

Preterminal rules Inner rules

Check about unary rules: no unary rules here


slide-99
SLIDE 99

Preterminal rules Inner rules

mid=1

slide-100
SLIDE 100

Preterminal rules Inner rules

mid=2

slide-101
SLIDE 101

Preterminal rules Inner rules

Apparently the sentence is ambiguous for this grammar (the grammar overgenerates)
slide-102
SLIDE 102

Ambiguity

No subject-verb agreement, and "poison" is used as an intransitive verb

slide-103
SLIDE 103

CKY more formally

The chart can be represented by a Boolean 3D array chart[min][max][label]

▪ chart[min][max][C] is true if the signature (min, max, C) has already been added to the chart, and false otherwise

Here we assume that labels (C) are integer indices

slide-104
SLIDE 104

Implementation: preterminal rules

slide-105
SLIDE 105

Implementation: binary rules

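The preterminal and binary steps above can be sketched as a recognizer for a CNF grammar. This follows the chart[min][max][label] layout described earlier; the toy grammar is hypothetical and unary chains are ignored for brevity:

```python
from collections import defaultdict

def cky_recognize(words, preterm, binary, start="S"):
    """Boolean CKY: True iff the grammar can derive the sentence from `start`.
    preterm: word -> list of PoS tags; binary: (C1, C2) -> list of parents."""
    n = len(words)
    chart = defaultdict(bool)              # (min, max, label) -> bool
    for i, w in enumerate(words):          # preterminal rules: spans of length 1
        for tag in preterm.get(w, ()):
            chart[(i, i + 1, tag)] = True
    for length in range(2, n + 1):         # longer spans, bottom-up
        for lo in range(0, n - length + 1):
            hi = lo + length
            for mid in range(lo + 1, hi):  # try every split point
                for (c1, c2), parents in binary.items():
                    if chart[(lo, mid, c1)] and chart[(mid, hi, c2)]:
                        for p in parents:
                            chart[(lo, hi, p)] = True
    return chart[(0, n, start)]

# Hypothetical toy grammar for illustration.
PRETERM = {"my": ["D"], "a": ["D"], "dog": ["N"], "sausage": ["N"], "ate": ["V"]}
BINARY = {("D", "N"): ["NP"], ("V", "NP"): ["VP"], ("NP", "VP"): ["S"]}
```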

slide-106
SLIDE 106
slide-107
SLIDE 107

Unary rules

▪ How do we integrate unary rules C → C1?


slide-109
SLIDE 109

Unary rules

▪ How do we integrate unary rules C → C1?

But we forgot something!


slide-111
SLIDE 111

Unary closure

▪ What if the grammar contains two rules such that C can be derived from A by a chain of rules (e.g., A → B and B → C)?
▪ One could support chains in the algorithm, but it is easier to extend the grammar to its transitive closure

Convenient for programming reasons in the PCFG case

slide-112
SLIDE 112

Algorithm analysis

Time complexity?

slide-113
SLIDE 113

Algorithm analysis

Time complexity? O(n³|R|), where |R| is the number of rules in the grammar

slide-114
SLIDE 114

Practical time complexity

slide-115
SLIDE 115

Probabilistic CKY

slide-116
SLIDE 116

PCFGs

slide-117
SLIDE 117

CKY with PCFGs

▪ The chart is represented by a 3D array of floats: chart[min][max][label]

▪ It stores the probability of the most probable subtree with a given signature

▪ chart[0][n][S] will store the probability of the most probable full parse tree

slide-118
SLIDE 118

Intuition

For every C, choose C1, C2 and mid such that P(C → C1 C2) · p(T1) · p(T2) is maximal, where T1 and T2 are the best left and right subtrees.

slide-119
SLIDE 119

Implementation: preterminal rules

slide-120
SLIDE 120

Implementation: binary rules

slide-121
SLIDE 121

Unary rules

▪ Similarly to the CFG case: after producing scores for signatures (C, i, j), try to improve the scores by applying unary rules (and rule chains)
▪ If a score improves, update it
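The probabilistic chart-filling step for binary rules can be sketched as follows: instead of Booleans, the chart stores the best (maximum) probability per signature. The grammar and its probabilities are hypothetical, and unary-closure handling is omitted for brevity:

```python
def pcky(words, preterm, binary, start="S"):
    """Viterbi CKY: probability of the most probable parse, or 0.0 if none.
    preterm: word -> {tag: prob}; binary: (C1, C2) -> {parent: rule prob}."""
    n = len(words)
    chart = {}                                   # (lo, hi, label) -> best prob
    for i, w in enumerate(words):                # preterminal rules
        for tag, p in preterm.get(w, {}).items():
            chart[(i, i + 1, tag)] = p
    for length in range(2, n + 1):               # longer spans, bottom-up
        for lo in range(0, n - length + 1):
            hi = lo + length
            for mid in range(lo + 1, hi):
                for (c1, c2), parents in binary.items():
                    p1 = chart.get((lo, mid, c1), 0.0)
                    p2 = chart.get((mid, hi, c2), 0.0)
                    if p1 and p2:
                        for parent, p_rule in parents.items():
                            cand = p_rule * p1 * p2   # P(C -> C1 C2) * p(T1) * p(T2)
                            if cand > chart.get((lo, hi, parent), 0.0):
                                chart[(lo, hi, parent)] = cand
    return chart.get((0, n, start), 0.0)

# Hypothetical toy PCFG for illustration.
PRETERM = {"my": {"D": 1.0}, "a": {"D": 1.0}, "dog": {"N": 0.5},
           "sausage": {"N": 0.5}, "ate": {"V": 1.0}}
BINARY = {("D", "N"): {"NP": 0.4}, ("V", "NP"): {"VP": 0.5}, ("NP", "VP"): {"S": 1.0}}
```

In a full implementation one would also store backpointers alongside each best probability, which is what the tree-recovery slide below relies on.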


slide-124
SLIDE 124

Unary (reflexive transitive) closure

Note that this is not a PCFG anymore, as the rule probabilities do not sum to 1 for each parent

The fact that a rule is composite needs to be stored to recover the true tree

What about loops, e.g., cycles of unary rules?

slide-125
SLIDE 125

Recovery of the tree

▪ For each signature we store backpointers to the elements from which it was built (the rule and, for binary rules, the midpoint)

▪ Start recovering from [0, n, S]

▪ Be careful with unary rules
▪ Basically, you can assume that a unary rule from the closure was always used (possibly the trivial one, C → C)

slide-126
SLIDE 126

Speeding up the algorithm (approximate search)

Any ideas?

slide-127
SLIDE 127

Speeding up the algorithm

▪ Basic pruning (roughly):
▪ For every span (i, j), store only labels whose probability is at most N times smaller than the probability of the most probable label for that span
▪ Check not all rules, but only rules whose child labels have non-zero probability

▪ Coarse-to-fine pruning:
▪ Parse with a smaller (simpler) grammar, precompute (posterior) probabilities for each span, and keep only the spans/labels with non-negligible probability under the simpler grammar

slide-128
SLIDE 128

Parsing evaluation

▪ Intrinsic evaluation:
▪ Automatic: evaluate against annotation provided by human experts (gold standard) according to some predefined measure
▪ Manual: … according to human judgment

▪ Extrinsic evaluation: score a syntactic representation by comparing how well a system using this representation performs on some task
▪ E.g., use the syntactic representation as input to a semantic analyzer, and compare the analyzer’s results when using syntax predicted by different parsers

slide-129
SLIDE 129

Standard evaluation setting in parsing

▪ Automatic intrinsic evaluation is used: parsers are evaluated against a gold standard provided by linguists

▪ There is a standard split into parts:
▪ training set: used for estimation of model parameters
▪ development set: used for tuning the model (initial experiments)
▪ test set: final experiments to compare against previous work

slide-130
SLIDE 130

Automatic evaluation of constituent parsers

▪ Exact match: percentage of trees predicted correctly
▪ Bracket score: measures how well individual phrases (and their boundaries) are identified; the most standard measure, and the one we will focus on
▪ Crossing brackets: percentage of predicted phrase boundaries that cross gold-standard boundaries

slide-131
SLIDE 131

Bracket score

▪ The most standard score is the bracket score
▪ It regards a tree as a collection of brackets, the subtree signatures used in CKY
▪ The set of brackets predicted by a parser is compared against the set of brackets in the tree annotated by a linguist
▪ Precision, recall and F1 are used as scores
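The comparison of predicted and gold bracket sets can be sketched directly; brackets are (label, i, j) signatures, and the example sets below are hypothetical:

```python
def bracket_scores(pred, gold):
    """Precision, recall and F1 over sets of (label, i, j) brackets."""
    pred, gold = set(pred), set(gold)
    tp = len(pred & gold)                      # brackets predicted correctly
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical predicted vs. gold brackets for a 5-word sentence.
pred = {("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)}
gold = {("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)}
p, r, f1 = bracket_scores(pred, gold)
```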

slide-132
SLIDE 132

Preview: F1 bracket score