Natural Language Processing (CSEP 517): Dependency Syntax and Parsing


  1. Natural Language Processing (CSEP 517): Dependency Syntax and Parsing
     Noah Smith, © 2017 University of Washington
     nasmith@cs.washington.edu
     May 1, 2017

  2. To-Do List
     ◮ Online quiz: due Sunday
     ◮ Read: Kübler et al. (2009, ch. 1, 2, 6)
     ◮ A3 due May 7 (Sunday)
     ◮ A4 due May 14 (Sunday)

  3. Dependencies
     Informally, you can think of dependency structures as a transformation of phrase structures that
     ◮ maintains the word-to-word relationships induced by lexicalization,
     ◮ adds labels to them, and
     ◮ eliminates the phrase categories.
     There are also linguistic theories built on dependencies (Tesnière, 1959; Mel'čuk, 1987), as well as treebanks corresponding to those.
     ◮ Free(r) word-order languages (e.g., Czech)

  4. Dependency Tree: Definition
     Let x = ⟨x_1, ..., x_n⟩ be a sentence. Add a special root symbol as "x_0." A dependency tree consists of a set of tuples ⟨p, c, ℓ⟩, where
     ◮ p ∈ {0, ..., n} is the index of a parent,
     ◮ c ∈ {1, ..., n} is the index of a child, and
     ◮ ℓ ∈ L is a label.
     Different annotation schemes define different label sets L, and different constraints on the set of tuples. Most commonly:
     ◮ The tuple is represented as a directed edge from x_p to x_c with label ℓ.
     ◮ The directed edges form an arborescence (directed tree) with x_0 as the root (sometimes denoted root).
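
To make the definition concrete, here is a minimal Python sketch (mine, not the slides'): a dependency tree stored as ⟨p, c, ℓ⟩ tuples, with a check that the edges form an arborescence rooted at x_0. All names are illustrative.

```python
from collections import namedtuple

# A dependency arc: parent index p, child index c, label l.
Arc = namedtuple("Arc", ["parent", "child", "label"])

def is_arborescence(arcs, n):
    """Check that arcs over words x_1..x_n form a directed tree rooted at x_0."""
    parents = {}
    for a in arcs:
        if a.child in parents:       # a word with two parents is not a tree
            return False
        parents[a.child] = a.parent
    if set(parents) != set(range(1, n + 1)):
        return False                 # every word x_1..x_n needs exactly one parent
    children = {}
    for c, p in parents.items():
        children.setdefault(p, []).append(c)
    # every word must be reachable from the root x_0 (this also rules out cycles)
    seen, stack = set(), [0]
    while stack:
        node = stack.pop()
        seen.add(node)
        stack.extend(children.get(node, []))
    return seen == set(range(n + 1))
```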

  5. Example
     [Figure: phrase-structure tree for "we wash our cats": S → NP VP; NP → Pronoun (we); VP → Verb (wash) NP; NP → Determiner (our) Noun (cats).]

  6. Example
     [Figure: the same phrase-structure tree for "we wash our cats", with head children marked.]

  7. Example
     [Figure: the phrase-structure tree with heads, lexicalized — each node is annotated with its head word (S_wash, NP_we, VP_wash, NP_cats).]

  8. Example
     [Figure: "bare bones" dependency tree for "we wash our cats".]

  9. Example
     [Figure: dependency tree for "we wash our cats who stink".]

  10. Example
      [Figure: dependency tree for "we vigorously wash our cats who stink".]

  11. Content Heads vs. Function Heads (credit: Nathan Schneider)
      [Figure: two dependency analyses of "little kids were always watching birds with fish" — one choosing content words as heads, the other choosing function words.]

  12. Labels
      [Figure: labeled dependency tree for "kids saw birds with fish", with arcs labeled root, sbj, dobj, prep, pobj.]
      Key dependency relations captured in the labels include: subject, direct object, preposition object, adjectival modifier, adverbial modifier. In this lecture, I will mostly not discuss labels, to keep the algorithms simpler.
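
Using the Arc type from the sketch above, the slide's labeled tree can be written out as data. The figure itself did not survive extraction, so the prepositional attachment below (with modifying birds) is one plausible reading:

```python
# The labeled tree for "kids saw birds with fish"
# (word indices: kids=1, saw=2, birds=3, with=4, fish=5):
tree = [Arc(0, 2, "root"),  # root -> saw
        Arc(2, 1, "sbj"),   # saw -> kids
        Arc(2, 3, "dobj"),  # saw -> birds
        Arc(3, 4, "prep"),  # birds -> with (one plausible attachment)
        Arc(4, 5, "pobj")]  # with -> fish
assert is_arborescence(tree, 5)
```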

  13. Coordination Structures
      [Figure: dependency tree for "we vigorously wash our cats and dogs who stink".]
      The bugbear of dependency syntax.

  14. Example
      [Figure: coordination analysis of "we vigorously wash our cats and dogs who stink".]
      Make the first conjunct the head?

  15. Example
      [Figure: coordination analysis of "we vigorously wash our cats and dogs who stink".]
      Make the coordinating conjunction the head?

  16. Example
      [Figure: coordination analysis of "we vigorously wash our cats and dogs who stink".]
      Make the second conjunct the head?

  17. Dependency Schemes
      ◮ Transform the treebank: define "head rules" that can select the head child of any node in a phrase-structure tree, and label the dependencies.
      ◮ More powerful, less local rule sets, possibly collapsing some words into arc labels. Stanford dependencies are a popular example (de Marneffe et al., 2006).
      ◮ Direct annotation.
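
To illustrate the first bullet, here is a hedged sketch of head-rule conversion in Python. The rule table is invented for the running "we wash our cats" example; real head-rule sets (e.g., Collins-style rules) are much larger:

```python
# Illustrative head rules: for each phrase label, an ordered list of
# preferred head-child labels.
HEAD_RULES = {
    "S": ["VP"],
    "VP": ["Verb"],
    "NP": ["Noun", "Pronoun"],
}

def find_head(label, children):
    """Pick the head child of a node by the first matching rule."""
    for wanted in HEAD_RULES.get(label, []):
        for child in children:
            if child[0] == wanted:
                return child
    return children[0]  # fallback: leftmost child

def to_dependencies(tree, arcs):
    """tree = (label, children) or (pos, word). Returns the node's head word
    and fills `arcs` with (head, dependent) pairs."""
    label, rest = tree
    if isinstance(rest, str):        # preterminal: (pos, word)
        return rest
    head_child = find_head(label, rest)
    head = to_dependencies(head_child, arcs)
    for child in rest:
        if child is not head_child:
            arcs.append((head, to_dependencies(child, arcs)))
    return head

# "we wash our cats" from the earlier example:
tree = ("S", [("NP", [("Pronoun", "we")]),
              ("VP", [("Verb", "wash"),
                      ("NP", [("Determiner", "our"), ("Noun", "cats")])])])
arcs = []
root = to_dependencies(tree, arcs)
print(root, arcs)  # wash [('cats', 'our'), ('wash', 'cats'), ('wash', 'we')]
```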

  18. Three Approaches to Dependency Parsing
      1. Dynamic programming with the Eisner algorithm.
      2. Transition-based parsing with a stack.
      3. Chu-Liu-Edmonds algorithm for arborescences.

  19. Dependencies and Grammar
      Context-free grammars can be used to encode dependency structures. For every head word and constellation of dependent children:
      N_head → N_leftmost-sibling ... N_head ... N_rightmost-sibling
      And for every v ∈ V: N_v → v and S → N_v.

  20. Dependencies and Grammar (continued)
      A bilexical dependency grammar binarizes the dependents, generating only one per rule.

  21. Dependencies and Grammar (continued)
      Such a grammar can produce only projective trees, which are (informally) trees in which the arcs don't cross.
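
The projectivity condition is easy to operationalize: a tree is projective iff no arc's span strictly crosses another's. A minimal sketch:

```python
def is_projective(arcs):
    """arcs: (parent, child) index pairs, with 0 as the root."""
    spans = [(min(p, c), max(p, c)) for p, c in arcs]
    for (a, b) in spans:
        for (x, y) in spans:
            # crossing: one endpoint strictly inside (a, b), the other outside
            if a < x < b < y:
                return False
    return True

# "we wash our cats": root->wash, wash->we, wash->cats, cats->our
print(is_projective([(0, 2), (2, 1), (2, 4), (4, 3)]))  # True
```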

  22. Bilexical Dependency Grammar: Derivation
      [Figure: derivation tree for "we wash our cats", with nonterminals S, N_wash, N_we, N_our, N_cats.]
      Naïvely, the CKY algorithm will require O(n^5) runtime. Why? (Each chart item must record its head word as well as its span, so combining two items ranges over five word indices: the two endpoints, the split point, and the two heads.)

  23. CKY for Bilexical Context-Free Grammars
      [Figure: two combination rules. Left: an N_{x_h} item over ⟨i, j⟩ and an N_{x_c} item over ⟨j+1, k⟩ combine, with weight p(N_{x_h} N_{x_c} | N_{x_h}), into an N_{x_h} item over ⟨i, k⟩. Right: the mirror image, an N_{x_c} item over ⟨i, j⟩ and an N_{x_h} item over ⟨j+1, k⟩ combine, with weight p(N_{x_c} N_{x_h} | N_{x_h}), into an N_{x_h} item over ⟨i, k⟩.]
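
A naive Python sketch of this bilexical CKY makes the O(n^5) runtime visible as five nested loops over word indices. Here `attach(h, c)` is an assumed log-probability for head x_h taking child x_c, standing in for p(N_{x_h} N_{x_c} | N_{x_h}); the unary word scores and the root score p(N_{x_h} | S) are omitted for brevity:

```python
import math
from collections import defaultdict

def naive_bilexical_cky(words, attach):
    n = len(words)
    # best[(i, k, h)]: best score of an N_{x_h} constituent spanning words i..k
    best = defaultdict(lambda: -math.inf)
    for i in range(n):
        best[(i, i, i)] = 0.0
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            k = i + length - 1
            for j in range(i, k):                  # split point
                for h in range(i, j + 1):          # head of the left piece
                    for c in range(j + 1, k + 1):  # head of the right piece
                        parts = best[(i, j, h)] + best[(j + 1, k, c)]
                        # the left head takes the right head as a child...
                        s = parts + attach(h, c)
                        if s > best[(i, k, h)]:
                            best[(i, k, h)] = s
                        # ...or vice versa
                        s = parts + attach(c, h)
                        if s > best[(i, k, c)]:
                            best[(i, k, c)] = s
    # goal: the whole sentence, headed by some word x_h
    return max(best[(0, n - 1, h)] for h in range(n))
```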

  24. CKY Example
      [Figure: CKY chart for "we wash our cats", with items N_we, N_wash, N_our, N_cats combining up to N_wash and the goal item S.]

  25. Dependency Parsing with the Eisner Algorithm (Eisner, 1996)
      [Figure: the four item shapes — a right triangle ⟨h, d⟩, a left triangle ⟨d, h⟩, a right trapezoid ⟨h, c⟩, and a left trapezoid ⟨c, h⟩.]
      Items:
      ◮ Both triangles indicate that x_d is a descendant of x_h.
      ◮ Both trapezoids indicate that x_c can be attached as the child of x_h.
      ◮ In all cases, the words "in between" are descendants of x_h.

  26. Dependency Parsing with the Eisner Algorithm (Eisner, 1996)
      Initialization: for each i, a left triangle and a right triangle over ⟨i, i⟩, with weights p(x_i | N_{x_i}) and 1.
      Goal: a left triangle over ⟨1, i⟩ and a right triangle over ⟨i, n⟩ combine, with weight p(N_{x_i} | S), into the goal item.

  27. Dependency Parsing with the Eisner Algorithm (Eisner, 1996)
      Attaching a left dependent: a right triangle over ⟨i, j⟩ and a left triangle over ⟨j+1, k⟩ combine, with weight p(N_{x_i} N_{x_k} | N_{x_k}), into a left trapezoid over ⟨i, k⟩.
      Completing a left child: a left triangle over ⟨i, j⟩ and a left trapezoid over ⟨j, k⟩ combine into a left triangle over ⟨i, k⟩.

  28. Dependency Parsing with the Eisner Algorithm (Eisner, 1996)
      Attaching a right dependent: a right triangle over ⟨i, j⟩ and a left triangle over ⟨j+1, k⟩ combine, with weight p(N_{x_i} N_{x_k} | N_{x_i}), into a right trapezoid over ⟨i, k⟩.
      Completing a right child: a right trapezoid over ⟨i, j⟩ and a right triangle over ⟨j, k⟩ combine into a right triangle over ⟨i, k⟩.
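
Putting slides 25–28 together, here is a sketch of the Eisner algorithm computing the best projective parse score in O(n^3). It uses the common artificial-root formulation (x_0 as root) rather than the slides' p(N_{x_i} | S) goal rule; the two coincide if score(0, i) is taken as the root probability. `score(h, c)` is an assumed arc weight standing in for the rule probabilities above, and backpointers for recovering the tree are omitted:

```python
import math

def eisner(n, score):
    """Best projective parse score for words x_1..x_n; x_0 is the root."""
    NEG = -math.inf
    # C = complete (triangles), I = incomplete (trapezoids);
    # direction 0 = left (head at right endpoint), 1 = right (head at left endpoint)
    C = [[[NEG] * (n + 1) for _ in range(n + 1)] for _ in range(2)]
    I = [[[NEG] * (n + 1) for _ in range(n + 1)] for _ in range(2)]
    for i in range(n + 1):
        C[0][i][i] = C[1][i][i] = 0.0   # single-word triangles
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            k = i + length
            # attach: a right triangle over (i, j) + a left triangle over (j+1, k)
            best = max(C[1][i][j] + C[0][j + 1][k] for j in range(i, k))
            if i > 0:                        # the root x_0 never becomes a child
                I[0][i][k] = best + score(k, i)  # left arc: x_k takes child x_i
            I[1][i][k] = best + score(i, k)      # right arc: x_i takes child x_k
            # complete: extend a triangle with a trapezoid
            C[0][i][k] = max(C[0][i][j] + I[0][j][k] for j in range(i, k))
            C[1][i][k] = max(I[1][i][j] + C[1][j][k] for j in range(i + 1, k + 1))
    return C[1][0][n]   # best full parse headed by the root
```

The O(n^3) bound comes from the three free indices i, j, k: unlike the naive bilexical CKY above, an Eisner item's head is always at one of its endpoints, so no extra head indices are needed.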

  29. Eisner Algorithm Example
      [Figure: Eisner chart for "we wash our cats", culminating in the goal item.]

  30. Three Approaches to Dependency Parsing
      1. Dynamic programming with the Eisner algorithm.
      2. Transition-based parsing with a stack.
      3. Chu-Liu-Edmonds algorithm for arborescences.

  31. Transition-Based Parsing
      ◮ Process x once, from left to right, making a sequence of greedy parsing decisions.

  32. Transition-Based Parsing (continued)
      ◮ Formally, the parser is a state machine (not a finite-state machine) whose state is represented by a stack S and a buffer B.

  33. Transition-Based Parsing (continued)
      ◮ Initialize the buffer to contain x and the stack to contain the root symbol.

  34. Transition-Based Parsing (continued)
      ◮ The "arc standard" transition set (Nivre, 2004):
        ◮ shift: move the word at the front of the buffer B onto the stack S.
        ◮ right-arc: u = pop(S); v = pop(S); push(S, v → u).
        ◮ left-arc: u = pop(S); v = pop(S); push(S, v ← u).
        (For labeled parsing, add labels to the right-arc and left-arc transitions.)

  35. Transition-Based Parsing (continued)
      ◮ During parsing, apply a classifier to decide which transition to take next, greedily. No backtracking.
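
Here is a minimal sketch of the arc-standard machine in Python. The action sequence is supplied by hand; in the slides' setup, a trained classifier would choose each action:

```python
def arc_standard(words, actions):
    """Run a fixed action sequence; returns (head, dependent) arcs."""
    buffer = list(range(1, len(words) + 1))  # word indices; 0 is the root
    stack = [0]
    arcs = []
    for act in actions:
        if act == "shift":
            stack.append(buffer.pop(0))
        elif act == "right-arc":   # u = pop(S); v = pop(S); v -> u
            u, v = stack.pop(), stack.pop()
            arcs.append((v, u))
            stack.append(v)
        elif act == "left-arc":    # u = pop(S); v = pop(S); v <- u
            u, v = stack.pop(), stack.pop()
            arcs.append((u, v))
            stack.append(u)
    return arcs

# Parsing "we wash our cats" (x_1..x_4):
words = ["we", "wash", "our", "cats"]
actions = ["shift", "shift", "left-arc",   # we <- wash
           "shift", "shift", "left-arc",   # our <- cats
           "right-arc",                    # wash -> cats
           "right-arc"]                    # root -> wash
print(arc_standard(words, actions))
# [(2, 1), (4, 3), (2, 4), (0, 2)]
```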

  36. Transition-Based Parsing: Example
      Stack S: root
      Buffer B: we vigorously wash our cats who stink
      Actions: (none yet)

  37. Transition-Based Parsing: Example
      Stack S: root we
      Buffer B: vigorously wash our cats who stink
      Actions: shift

  38. Transition-Based Parsing: Example
      Stack S: root we vigorously
      Buffer B: wash our cats who stink
      Actions: shift shift

  39. Transition-Based Parsing: Example
      Stack S: root we vigorously wash
      Buffer B: our cats who stink
      Actions: shift shift shift

  40. Transition-Based Parsing: Example
      Stack S: root we wash (with dependent vigorously)
      Buffer B: our cats who stink
      Actions: shift shift shift left-arc

  41. Transition-Based Parsing: Example
      Stack S: root wash (with dependents we and vigorously)
      Buffer B: our cats who stink
      Actions: shift shift shift left-arc left-arc
