SLIDE 1

CS 4650/7650: Natural Language Processing

Dependency Parsing

Diyi Yang Presenting: Yuval Pinter (uvp@)

SLIDE 2

Representing Sentence Structure

SLIDE 3

Constituent (Phrase-Structure) Representation

SLIDE 4

Dependency Representation

SLIDE 5

Dependency Representation

SLIDE 6

Dependency Representation

SLIDE 7

Dependency vs Constituency

◼ Constituency structures explicitly represent

◼ Phrases (nonterminal nodes)
◼ Structural categories (nonterminal labels)

◼ Dependency structures explicitly represent

◼ Head-dependent relations (directed arcs)
◼ Functional categories (arc labels)
◼ Possibly some structural categories (parts of speech)

SLIDE 8

Dependency vs Constituency

SLIDE 9

Dependency Representation

“CoNLL format”
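
The format is one token per line with ten tab-separated columns (in the CoNLL-U variant: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC; older CoNLL editions differ slightly). A minimal Python sketch of reading it; the sentence is illustrative, not from the slide:

# One sentence in the 10-column CoNLL-U format; fields are tab-separated.
conllu = ("1\tI\tI\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
          "2\tbooked\tbook\tVERB\t_\t_\t0\troot\t_\t_\n"
          "3\ta\ta\tDET\t_\t_\t4\tdet\t_\t_\n"
          "4\tflight\tflight\tNOUN\t_\t_\t2\tobj\t_\t_")

for line in conllu.splitlines():
    cols = line.split("\t")
    form, head, deprel = cols[1], cols[6], cols[7]
    print(f"{form} <--{deprel}-- token {head}")   # head 0 means the root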

SLIDE 10

Dependency Relations

SLIDE 11

Grammatical Functions

Selected dependency relations from the Universal Dependencies set

SLIDE 12

Dependency Constraints

◼ Syntactic structure is complete (connectedness)

◼ Connectedness can be enforced by adding a special root node

◼ Syntactic structure is hierarchical (acyclicity)

◼ There is a unique path from the root to each vertex

◼ Every word has at most one syntactic head (single-head constraint)

◼ Except the root, which has no incoming arcs

◼ These constraints make the dependencies a tree (see the sketch below)
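
These constraints are easy to verify mechanically. A minimal sketch over a head array, assuming tokens are numbered from 1 and head 0 denotes the artificial root:

# heads[i] is the head of token i+1; 0 denotes the root.
# The array representation enforces the single-head constraint by itself;
# the walk below checks connectedness and acyclicity together.
def is_valid_tree(heads):
    for i in range(1, len(heads) + 1):
        seen, node = set(), i
        while node != 0:
            if node in seen:              # revisited a node: cycle
                return False
            seen.add(node)
            node = heads[node - 1]
    return True                           # every token reaches the root

print(is_valid_tree([2, 0, 4, 2]))        # a valid tree -> True
print(is_valid_tree([2, 3, 1, 0]))        # 1 -> 2 -> 3 -> 1 cycle -> False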

SLIDE 13

Projectivity

◼ Projective parse

◼ Arcs don’t cross each other
◼ Mostly true for English

◼ Non-projective structures are needed to account for

◼ Long-distance dependencies
◼ Flexible word order

SLIDE 14

Projectivity

◼ Dependency grammars do not normally assume that all dependency trees are projective, because some linguistic phenomena can only be represented by non-projective trees

◼ But many parsers assume that the output trees are projective

◼ Reasons:

◼ Conversion from constituency to dependency
◼ The most widely used families of parsing algorithms impose projectivity (a projectivity check is sketched below)
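
A minimal projectivity check over the same head-array representation as before: two arcs cross exactly when their spans strictly interleave.

def is_projective(heads):
    # heads[i] is the head of token i+1; 0 denotes the root.
    arcs = [(h, d) for d, h in enumerate(heads, start=1)]
    for h1, d1 in arcs:
        lo1, hi1 = sorted((h1, d1))
        for h2, d2 in arcs:
            lo2, hi2 = sorted((h2, d2))
            if lo1 < lo2 < hi1 < hi2:     # strictly interleaved endpoints
                return False
    return True

print(is_projective([2, 0, 4, 2]))        # True
print(is_projective([0, 1, 1, 2]))        # arc 1->3 crosses arc 2->4 -> False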

SLIDE 15

Dependency Treebanks

◼ The major English dependency treebanks are converted from the WSJ sections of the PTB (Marcus et al., 1993)

◼ The OntoNotes project (Hovy et al., 2006; Weischedel et al., 2011) adds conversational telephone speech, weblogs, usenet newsgroups, broadcast, and talk shows in English, Chinese and Arabic

◼ Annotated dependency treebanks have been created for morphologically rich languages such as Czech, Hindi and Finnish, e.g., the Prague Dependency Treebank (Bejcek et al., 2013)

◼ https://universaldependencies.org/ (122 treebanks, 71 languages)

◼ Different schemas exist - not all treebanks follow the same attachment rules

SLIDE 16

The Parsing Problem

SLIDE 17

The Parsing Problem

◼ This is equivalent to finding a spanning tree in the complete graph containing all possible arcs

SLIDE 18

Evaluation

◼ Which is bigger?

SLIDE 19

Evaluation

◼ Which is bigger?
◼ Does 90% sound like a lot?
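
The slides leave the metrics implicit; the standard ones (assumed here) are UAS, the fraction of tokens with the correct head, and LAS, the fraction with the correct head and the correct relation label:

def attachment_scores(gold, pred):
    # gold and pred are lists of (head, label) pairs, one per token.
    assert len(gold) == len(pred)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / len(gold)
    las = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    return uas, las

gold = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (4, "det"), (3, "obj")]
print(attachment_scores(gold, pred))      # (0.75, 0.75)

By definition LAS can never exceed UAS, which is one answer to "which is bigger".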

SLIDE 20

Parsing Algorithms

◼ Graph based

◼ Minimum Spanning Tree for a sentence
◼ McDonald et al.’s (2005) MSTParser
◼ Martins et al.’s (2009) Turbo Parser

◼ Transition based

◼ Greedy choice of local transitions guided by a good classifier
◼ Deterministic
◼ MaltParser (Nivre et al., 2008)

SLIDE 21

Graph-Based Parsing Algorithms

◼ Start with a fully-connected directed graph
◼ Find a Minimum Spanning Tree
◼ Chu and Liu (1965) and Edmonds (1967) algorithm (first steps sketched below)
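
A sketch of the first two steps illustrated on the following slides (greedy selection of the best incoming arc, then cycle detection); a full implementation would also contract any cycle, recurse, and expand:

def greedy_heads(score, n):
    # score[h][d] is the weight of arc h -> d; node 0 is the artificial root.
    heads = [0] * (n + 1)
    for d in range(1, n + 1):
        heads[d] = max((h for h in range(n + 1) if h != d),
                       key=lambda h: score[h][d])
    return heads

def find_cycle_node(heads):
    # Follow head pointers upward; revisiting a node means it lies on a cycle.
    for start in range(1, len(heads)):
        seen, node = set(), start
        while node != 0 and node not in seen:
            seen.add(node)
            node = heads[node]
        if node != 0:
            return node
    return None       # no cycle: the greedy choice is already the best tree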

SLIDE 22

Chu-Liu Edmonds Algorithm

SLIDE 23

Chu-Liu Edmonds Algorithm

◼ Select best incoming edge for each node

SLIDE 24

Chu-Liu Edmonds Algorithm

◼ Subtract its score from all incoming edges

SLIDE 25

Chu-Liu Edmonds Algorithm

◼ Contract nodes if there are cycles

SLIDE 26

Chu-Liu Edmonds Algorithm

◼ Recursively compute MST

SLIDE 27

Chu-Liu Edmonds Algorithm

◼ Expand contracted nodes

SLIDE 28

Chu-Liu Edmonds Algorithm

◼ Expand contracted nodes

Who sees a potential problem?

SLIDE 29

Scores

◼ Word forms, lemmas, and parts of speech of the headword and its dependent

◼ Corresponding features from the contexts before, after, and between the words
◼ Word embeddings / contextual embeddings from an LSTM or Transformer
◼ The dependency relation itself
◼ The direction of the relation (to the right or left)
◼ The distance from the head to the dependent
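
For illustration, a hand-crafted feature map for one candidate arc in this spirit (the token representation and feature names here are made up):

def arc_features(sent, head, dep, relation):
    # sent: list of {'form': ..., 'pos': ...} dicts indexed by position.
    h, d = sent[head], sent[dep]
    return {
        f"hw={h['form']}|dw={d['form']}": 1.0,    # head/dependent word forms
        f"hp={h['pos']}|dp={d['pos']}": 1.0,      # head/dependent POS tags
        f"rel={relation}": 1.0,                   # the relation itself
        f"dir={'R' if dep > head else 'L'}": 1.0, # direction of the arc
        f"dist={min(abs(dep - head), 5)}": 1.0,   # bucketed distance
    }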

SLIDE 30

Parsing Algorithms

◼ Graph based

◼ Minimum Spanning Tree for a sentence
◼ McDonald et al.’s (2005) MSTParser
◼ Martins et al.’s (2009) Turbo Parser

◼ Transition based

◼ Greedy choice of local transitions guided by a good classifier
◼ Deterministic
◼ MaltParser (Nivre et al., 2008)

SLIDE 31

Transition Based Parsing

◼ Greedy discriminative dependency parser
◼ Motivated by a stack-based approach called shift-reduce parsing

  • Originally developed for analyzing programming languages (Aho & Ullman, 1972)

SLIDE 32

Configuration

◼ Basic transition-based parser. The parser examines the top two elements of the stack and selects an action based on consulting an oracle that examines the current configuration
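
A minimal sketch of such a configuration in Python (the representation is the standard one: a stack, a buffer, and the arcs built so far; the names are illustrative):

from dataclasses import dataclass, field

@dataclass
class Configuration:
    stack: list                              # token indices, top at the end
    buffer: list                             # remaining input, front at index 0
    arcs: set = field(default_factory=set)   # (head, dependent, label) triples

    def is_final(self):
        # Done when the buffer is empty and only the root remains on the stack.
        return not self.buffer and len(self.stack) == 1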

SLIDE 33

Configuration

SLIDE 34

Operations

At each step choose:

  • Shift
SLIDE 35

Operations

At each step choose:

  • Shift
  • LeftArc (Reduce left)
SLIDE 36

Operations

At each step choose:

  • Shift
  • LeftArc (Reduce left)
  • RightArc (Reduce right)
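
A sketch of the three operations over the Configuration sketched earlier, assuming arc-standard conventions (LeftArc and RightArc act on the top two stack items, as described above):

def shift(c):
    c.stack.append(c.buffer.pop(0))   # move the front of the buffer to the stack

def left_arc(c, label):
    dependent = c.stack.pop(-2)       # second item becomes the dependent
    c.arcs.add((c.stack[-1], dependent, label))

def right_arc(c, label):
    dependent = c.stack.pop()         # top item becomes the dependent
    c.arcs.add((c.stack[-1], dependent, label))
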
SLIDE 37

Shift-Reduce Parsing

SLIDES 38-49

Shift-Reduce Parsing (the worked example continues step by step across these slides; the accompanying figures were not captured)

SLIDE 50

Shift-Reduce Parsing

Oracle decisions can correspond to unlabeled or labeled arcs

SLIDE 51

Training an Oracle

◼ The Oracle is a supervised classifier that learns a function from the configuration to the next operation

◼ How to extract the training set?
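
One standard answer (the slide leaves it open): replay each gold tree with a static oracle that deterministically picks the correct transition; the (configuration, action) pairs it visits become the training set. A sketch for arc-standard:

def oracle_action(c, gold_heads):
    # gold_heads maps each token index to its gold head.
    if len(c.stack) >= 2:
        top, second = c.stack[-1], c.stack[-2]
        if gold_heads[second] == top:
            return "LEFTARC"
        if gold_heads[top] == second and all(
                gold_heads[w] != top for w in c.buffer):
            return "RIGHTARC"   # only once top has collected all its dependents
    return "SHIFT"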

SLIDE 52

Training an Oracle

SLIDE 53

Training an Oracle: Features

◼ POS, word forms, lemmas on the stack/buffer
◼ Morphological features for some languages
◼ Previous relations
◼ Conjunction features

SLIDE 54

Learning

◼ Before 2014: SVMs
◼ After 2014: Neural Nets

SLIDE 55

Chen & Manning 2014

SLIDE 56

Chen & Manning 2014
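
A toy numpy sketch of the idea: look up embeddings for a fixed set of items from the configuration, concatenate them, apply one hidden layer with the paper's cube activation, and score each transition. The sizes here are illustrative, and the real model also embeds POS tags and arc labels and is trained end to end:

import numpy as np

rng = np.random.default_rng(0)
E  = rng.normal(size=(5000, 50))        # word embedding table
W1 = rng.normal(size=(18 * 50, 200))    # 18 configuration slots -> hidden layer
W2 = rng.normal(size=(200, 3))          # scores for SHIFT / LEFTARC / RIGHTARC

def score_transitions(slot_ids):        # slot_ids: 18 token ids from the config
    x = E[slot_ids].reshape(-1)         # look up and concatenate embeddings
    h = (x @ W1) ** 3                   # the cube nonlinearity from the paper
    return h @ W2                       # one score per transition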

SLIDE 57

Stack LSTM (Dyer et al. 2015)

◼ Instead of recalculating features from scratch at each step, the configuration representation is updated incrementally by a neural network

SLIDE 58

Limitations of Transition Parsers

◼ Oracle prediction - early mistakes are very expensive. Solutions:

◼ Different transition systems (arc-standard vs. arc-eager)
◼ Beam Search

SLIDE 59

Limitations of Transition Parsers

◼ Oracle prediction - early mistakes are very expensive. Solutions:

◼ Different transition systems (arc-standard vs. arc-eager)
◼ Beam Search (sketched below)

◼ Can only produce projective trees. Solutions:

◼ Complicate the transition system (SWAP action)
◼ Apply post-parsing, language-specific rules
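
A generic beam-search sketch: keep the k highest-scoring partial parses instead of one greedy path. The hooks legal_actions, apply_action, and score are hypothetical and would be supplied by the parser; apply_action must return a new configuration rather than mutating in place:

from heapq import nlargest

def beam_parse(initial, k, legal_actions, apply_action, score, is_final):
    beam = [(0.0, initial)]
    while not all(is_final(c) for _, c in beam):
        candidates = [(s, c) for s, c in beam if is_final(c)]
        candidates += [(s + score(c, a), apply_action(c, a))
                       for s, c in beam if not is_final(c)
                       for a in legal_actions(c)]
        beam = nlargest(k, candidates, key=lambda sc: sc[0])
    return max(beam, key=lambda sc: sc[0])[1]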

SLIDE 60

Summary

◼ Graph based

◼ + Exact or close-to-exact decoding
◼ - Weaker features

◼ Transition based

◼ + Fast
◼ + Rich features of context
◼ - Greedy decoding
◼ - Projective only