Transition-Based Dependency Parsing, Saarbrücken, December 23rd 2011



slide-1
SLIDE 1

Transition-Based Dependency Parsing

Saarbrücken, December 23rd 2011 David Przybilla – davida@coli.uni-saarland.de

slide-2
SLIDE 2

Outline

  • 1. MaltParser
  • 2. Transition-Based Parsing
      – a. Example
      – b. Oracle
  • 3. Integrating Graph-Based and Transition-Based Parsing
  • 4. Non-Projective Dependency Parsing
slide-3
SLIDE 3

MaltParser

  • Works for different languages (no tuning for a specific language)
  • Language independent: accurate parsing for a wide variety of languages
  • Accuracy between 80% and 90%
  • Deterministic

Treebank (input) → MaltParser → Dependency Parser, transition based (output)
slide-4
SLIDE 4

Transition Based Parsing

Stack: … $W_x$ $W_k$

Buffer: $W_i$ $W_{i+1}$ $W_{i+2}$ …

Transitions (actions):

  • Shift
  • Left-arc
  • Right-arc
  • Reduction

slide-5
SLIDE 5
slide-6
SLIDE 6

Example

John hit the ball

Stack Buffer John hit the ball

slide-7
SLIDE 7

Example

John hit the ball

Stack Buffer John hit the ball

Transition = Shift

slide-8
SLIDE 8

Example

John hit the ball

Stack Buffer John hit the ball

Transition = Left-arc

Arcs: Subj

Only if $h(\text{John}) = 0$

slide-9
SLIDE 9

Example

John hit the ball

Stack Buffer hit the ball

Transition = Shift

Arcs: Subj

slide-10
SLIDE 10

Example

John hit the ball

Stack Buffer hit the ball

Transition = Shift

Arcs: Subj

slide-11
SLIDE 11

Example

John hit the ball

Stack Buffer hit the ball

Transition = Left-arc

Arcs: Subj, Det

Only if $h(\text{the}) = 0$

slide-12
SLIDE 12

Example

John hit the ball

Stack Buffer hit ball

Transition = Right-arc

Arcs: Subj, Det, Obj

Only if $h(\text{ball}) = 0$

slide-13
SLIDE 13

Example

John hit the ball

Stack Buffer hit ball

Arcs: Subj, Det, Obj

Buffer is empty = terminal configuration
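
The walkthrough on the example slides can be replayed in a few lines of Python. This is a minimal sketch, assuming arc-eager style definitions of the four transitions (Shift, Left-arc, Right-arc, Reduction) and treating a word with no entry in `heads` as having h(w) = 0. The names `Config`, `shift`, `left_arc`, `right_arc`, `reduction` and `replay` are illustrative and not taken from MaltParser.

```python
from dataclasses import dataclass, field


@dataclass
class Config:
    """Parser configuration: a stack, a buffer and the arcs built so far."""
    stack: list
    buffer: list
    heads: dict = field(default_factory=dict)  # dependent -> (head, label)


def shift(c):
    # Move the first buffer word onto the stack.
    c.stack.append(c.buffer.pop(0))


def left_arc(c, label):
    # Arc from the first buffer word to the stack top, then pop the stack top.
    dep = c.stack[-1]
    assert dep not in c.heads, "only if h(stack top) = 0"
    c.heads[dep] = (c.buffer[0], label)
    c.stack.pop()


def right_arc(c, label):
    # Arc from the stack top to the first buffer word, then push that word.
    dep = c.buffer[0]
    assert dep not in c.heads, "only if h(first buffer word) = 0"
    c.heads[dep] = (c.stack[-1], label)
    c.stack.append(c.buffer.pop(0))


def reduction(c):
    # Pop the stack top, allowed only once it already has a head.
    assert c.stack[-1] in c.heads, "only if h(stack top) != 0"
    c.stack.pop()


def replay(words, transitions):
    """Apply (function, *args) transitions; the buffer must end up empty."""
    c = Config(stack=[], buffer=list(words))
    for action, *args in transitions:
        action(c, *args)
    assert not c.buffer, "terminal configuration requires an empty buffer"
    return c


# Words double as dictionary keys here, so this toy only works when
# no word is repeated in the sentence.
final = replay(
    ["John", "hit", "the", "ball"],
    [(shift,), (left_arc, "Subj"), (shift,), (shift,),
     (left_arc, "Det"), (right_arc, "Obj")],
)
print(final.stack)  # words left on the stack
print(final.heads)  # dependent -> (head, label)
```

The run ends with "hit" and "ball" on the stack and an empty buffer, which corresponds to the terminal configuration of the example. A real parser would use word positions rather than word strings as identifiers.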

slide-14
SLIDE 14

Transition Based Parsing

Before:

Stack: … $W_x$ $W_k$

Buffer: $W_i$ $W_{i+1}$ $W_{i+2}$ …

Reduction

After:

Stack: … $W_x$

Buffer: $W_i$ $W_{i+1}$ $W_{i+2}$ …

Sentence: $W_x\ W_k \dots W_i\ W_{i+1} \dots$

Only if $h(W_k) \neq 0$

slide-15
SLIDE 15

Oracle

  • Greedy algorithm: choose the locally optimal transition, hoping it will lead to the global optimum
  • It makes the transition-based algorithm deterministic.
  • Without it, there might be more than one possible transition from one configuration to another
  • Construct the optimal transition sequence for the input sentence
  • How to build the oracle? Build a classifier
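
To train such a classifier, one needs transition sequences for sentences whose trees are already known. The sketch below shows one possible static oracle that reads the sequence off a gold tree. It assumes a projective tree, words numbered 1..n, head 0 for a word without a head, and arc-eager style transitions; `static_oracle` is a hypothetical name, not a MaltParser function.

```python
def static_oracle(gold_head, n):
    """Return a transition sequence that rebuilds a projective gold tree.

    gold_head maps each word index 1..n to its head index (0 = no head).
    """
    stack, buffer = [], list(range(1, n + 1))
    attached = set()   # words that have already received their head
    sequence = []
    while buffer:
        first = buffer[0]
        top = stack[-1] if stack else None
        if top is not None and gold_head[top] == first:
            # The first buffer word is the head of the stack top.
            sequence.append("Left-arc")
            attached.add(top)
            stack.pop()
        elif top is not None and gold_head[first] == top:
            # The stack top is the head of the first buffer word.
            sequence.append("Right-arc")
            attached.add(first)
            stack.append(buffer.pop(0))
        elif (top is not None and top in attached
              and all(gold_head[d] != top for d in buffer)):
            # The stack top has its head and no dependents left to collect.
            sequence.append("Reduction")
            stack.pop()
        else:
            sequence.append("Shift")
            stack.append(buffer.pop(0))
    return sequence


# "John hit the ball": John <- hit, the <- ball, ball <- hit, hit has no head.
print(static_oracle({1: 2, 2: 0, 3: 4, 4: 2}, 4))
```

Each (configuration, transition) pair visited along such a sequence can serve as one training instance for the classifier.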
slide-16
SLIDE 16

Classifier

The Classifier

Classes:

  • Shift
  • Left-arc
  • Right-arc
  • Reduction

Feature Vector (Features)

  • POS of words in the Buffer and Stack
  • Words themselves
  • The first word in the Stack
  • The L Word in the Buffer
  • The current arcs in the Graph
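
As a rough illustration of how a configuration could be turned into such a feature vector, here is a minimal sketch. The feature names (`s0.word`, `b0.pos`, ...), the choice of positions and the POS tags are assumptions made for this example; they are not MaltParser's actual feature model.

```python
def extract_features(stack, buffer, words, tags, heads):
    """Describe a configuration as a dict of string-valued features.

    stack and buffer hold word positions; heads maps dependent -> head
    and stands for the arcs built so far.
    """
    feats = {}
    if stack:
        s0 = stack[-1]
        feats["s0.word"] = words[s0]
        feats["s0.pos"] = tags[s0]
        feats["s0.has_head"] = str(s0 in heads)
    if buffer:
        b0 = buffer[0]
        feats["b0.word"] = words[b0]
        feats["b0.pos"] = tags[b0]
        feats["b0.has_head"] = str(b0 in heads)
    if len(buffer) > 1:
        feats["b1.pos"] = tags[buffer[1]]
    return feats


words = ["John", "hit", "the", "ball"]
tags = ["NNP", "VBD", "DT", "NN"]   # assumed tags, for illustration only
# Configuration after Shift, Left-arc, Shift: "hit" on the stack, John attached.
feats = extract_features([1], [2, 3], words, tags, {0: 1})
print(feats)
```

A dictionary like this can be one-hot encoded and passed to any multi-class classifier whose classes are the four transitions.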

slide-17
SLIDE 17

Results of the MaltParser

  • Evaluation Metrics:
      – ASU (Unlabeled Attachment Score): proportion of tokens assigned the correct head
      – ASL (Labeled Attachment Score): proportion of tokens assigned the correct head and the correct dependency type
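
Both scores are simple token-level ratios. The following sketch computes them for a single sentence, assuming each token is represented as a `(head, label)` pair; the function name and the toy prediction are made up for illustration.

```python
def attachment_scores(gold, predicted):
    """Return (unlabeled, labeled) attachment scores for aligned token lists.

    Each token is a (head, label) pair.
    """
    assert gold and len(gold) == len(predicted)
    total = len(gold)
    # Unlabeled: only the head has to match.
    unlabeled = sum(g[0] == p[0] for g, p in zip(gold, predicted)) / total
    # Labeled: head and dependency type both have to match.
    labeled = sum(g == p for g, p in zip(gold, predicted)) / total
    return unlabeled, labeled


# "John hit the ball": the toy prediction gets every head right
# but mislabels the last arc.
gold = [(2, "Subj"), (0, "Root"), (4, "Det"), (2, "Obj")]
predicted = [(2, "Subj"), (0, "Root"), (4, "Det"), (2, "Subj")]
asu, asl = attachment_scores(gold, predicted)
print(asu, asl)
```

Since a labeled match requires an unlabeled match, ASL can never exceed ASU.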

slide-18
SLIDE 18

Results of the MaltParser

Languages placed between two poles: more flexible word order and rich morphology ↔ more inflexible word order and 'poor' morphology

Languages: English, Chinese, Czech, Turkish, Danish, Dutch, Italian, Swedish, German

Goal: evaluate whether MaltParser can do reasonably accurate parsing for a wide variety of languages

slide-19
SLIDE 19

Results of the MaltParser

slide-20
SLIDE 20

Results of the MaltParser

  • Results:
  • Above 80% unlabeled dependency accuracy (ASU) for all languages
  • Morphological richness and word order are the cause of variation across languages. In general, accuracy is lower for languages like Czech and Turkish; there are more non-projective structures in those languages.
  • It is difficult to do cross-language comparison:
      – big differences in the amount of annotated data
      – existence of accurate POS taggers, etc.
  • State of the art for Italian, Swedish, Danish, Turkish

slide-21
SLIDE 21

Graph Based vs Transition Based

Graph Based

  • Search for the optimal graph (highest-scoring graph)
  • Globally trained (global optimum)
  • Limited history of parsing decisions
  • Less rich feature representation

Transition Based

  • Search for the optimal graph by finding the best transition between two states (locally optimal decisions)
  • Locally trained (configurations)
  • Rich history of parsing decisions
  • Richer features, but error propagation (greedy algorithm)

slide-22
SLIDE 22

Graph Based vs Transition Based

Graph Based (MST)

  • Better for long dependencies
  • More accurate for dependents that are:
      – Verbs
      – Adjectives
      – Adverbs

Transition Based (Malt)

  • Better for short dependencies
  • More accurate for dependents that are:
      – Nouns
      – Pronouns

Integrate Both Approaches

slide-23
SLIDE 23

Integrating Graph and Transition Based

  • Integrate both approaches at learning time.

  • Base MSTParser guided by Malt:

Treebank T → MaltParser (transition-based parser) → Parsed T → MSTParser

  • Base MaltParser guided by MST:

Treebank T → MSTParser → Parsed T → MaltParser (transition-based parser)

slide-24
SLIDE 24

Features used in the Integration

MSTParser guided by Malt

  • Is arc $(i, j, *)$ in $G_{malt}$?
  • Is arc $(i, j, l)$ in $G_{malt}$?
  • Is arc $(i, j, *)$ not in $G_{malt}$?
  • Identity of $l'$ such that $(i, j, l')$ is in $G_{malt}$
  • ..

MaltParser guided by MST

  • Is arc $(S_0, B_0, *)$ in $G_{mst}$?
  • Is arc $(B_0, S_0, *)$ in $G_{mst}$?
  • Head direction of $B_0$ in $G_{mst}$ (left, right, root, ..)
  • Identity of $l'$ such that $(*, B_0, l')$ is in $G_{mst}$

$S_0$ = first element of the Stack, $B_0$ = first element of the Buffer
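
The guide features for the transition-based parser can be read directly off the other parser's output. Below is a minimal sketch, assuming that an arc (i, j, l) has head i, dependent j and label l, and that the MST parse is stored as a dictionary from dependent to `(head, label)`. The function name and the feature keys are hypothetical.

```python
def mst_guide_features(s0, b0, g_mst):
    """Guide features for a transition-based parser, read off an MST parse.

    s0 is the first element of the stack, b0 the first element of the buffer.
    g_mst maps each dependent to (head, label); head 0 means root.
    """
    b0_head, b0_label = g_mst[b0]
    s0_head, _ = g_mst[s0]
    if b0_head == 0:
        direction = "root"
    elif b0_head < b0:
        direction = "left"
    else:
        direction = "right"
    return {
        "arc(S0,B0,*)": b0_head == s0,   # S0 is the head of B0 in the MST parse
        "arc(B0,S0,*)": s0_head == b0,   # B0 is the head of S0 in the MST parse
        "head_dir(B0)": direction,
        "label(*,B0,l')": b0_label,
    }


# Toy MST output for "John hit the ball" (positions 1..4).
g_mst = {1: (2, "Subj"), 2: (0, "Root"), 3: (4, "Det"), 4: (2, "Obj")}
guide = mst_guide_features(2, 4, g_mst)
print(guide)
```

These values would be appended to the regular features of the configuration, so the classifier can learn when to trust the other parser's suggestion.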

slide-25
SLIDE 25

Results of Integration

ASL (correct head and correct label)

slide-26
SLIDE 26

Results of Integration

ASL (correct head and correct label)

slide-27
SLIDE 27

Results of Integration

ASL (correct head and correct label)

slide-28
SLIDE 28

Results of Integration

  • Graph-based models predict long arcs better
  • Each model learns strengths from the other
  • The integration actually improves accuracy
  • Chaining more systems does not yield better accuracy

slide-29
SLIDE 29

Non-Projectivity

  • Some sentences have long-distance dependencies which cannot be parsed with this algorithm
  • Because it only considers relations between neighboring words
  • 25% or more of the sentences in some languages are non-projective
  • Useful for some languages with fewer constraints on word order
  • Harder problem: there could be relations over unbounded distances.

slide-30
SLIDE 30

Non-Projectivity

A dependency tree $T$ is projective if, for every arc $(W_i, W_j, \mathit{rel})$, there is a path from $W_i$ to $W_k$ whenever $W_k$ is between $W_i$ and $W_j$.

From 'Scheduled' $W_2$ there is an arc to $W_5$; however, there is no way to get to $W_4$, $W_3$ from $W_2$.
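
The definition translates almost directly into a check over a list of heads. This is a minimal sketch: `is_projective` is an illustrative name, and the head indices for the second sentence are an assumed textbook-style analysis of "A hearing is scheduled on the issue today", not data taken from the slide.

```python
def is_projective(heads):
    """heads[d] is the head of word d (1-based); heads[0] is an unused placeholder.

    A head value of 0 stands for the artificial root.
    """
    def dominated_by(node, ancestor):
        # Follow head links upward from node until the root is reached.
        while node != 0:
            node = heads[node]
            if node == ancestor:
                return True
        return False

    for dep in range(1, len(heads)):
        head = heads[dep]
        lo, hi = sorted((head, dep))
        # Every word strictly between head and dependent must be reachable
        # from the head.
        for k in range(lo + 1, hi):
            if not dominated_by(k, head):
                return False
    return True


# "John hit the ball": John <- hit, hit <- root, the <- ball, ball <- hit.
print(is_projective([0, 2, 0, 4, 2]))

# Assumed analysis: A <- hearing, hearing <- is, is <- root, scheduled <- is,
# on <- hearing, the <- issue, issue <- on, today <- scheduled.
print(is_projective([0, 2, 3, 0, 3, 2, 7, 5, 4]))
```

In the second tree the arc from position 2 to position 5 spans positions 3 and 4, neither of which is reachable from position 2, so the check fails.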

slide-31
SLIDE 31

Non-Projectivity

  • Why would the previous transition algorithm not be able to generate this tree?

Stack Buffer is hearing On … …

'is' can never be reduced

'hearing' and 'on' will never get an arc

slide-32
SLIDE 32

Handling Non-Projectivity

  • Add a new transition: "Swap"

Stack Buffer $W_k$ $W_i$ $W_{i+1}$

swap

Stack Buffer $W_k$ $W_i$ $W_{i+1}$

  • Re-orders the initial input sentence
slide-33
SLIDE 33

Non-Projectivity

Stack Buffer is hearing On … …

swap

Stack Buffer is .. Hearing On …
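
A minimal sketch of a Swap transition, assuming the formulation in Nivre (2009) from the reference list, where Swap takes the second-topmost stack item and puts it back at the front of the buffer. Whether the diagrams on these slides depict exactly this variant is an assumption; `swap` and the position numbers are illustrative.

```python
def swap(stack, buffer):
    """Move the second-topmost stack item back to the front of the buffer.

    Items are word positions, with 0 reserved for an artificial root.
    """
    assert len(stack) >= 2, "Swap needs two items on the stack"
    second, top = stack[-2], stack[-1]
    # Only swap real words that are still in their original sentence order,
    # so the same pair can never be swapped twice.
    assert 0 < second < top, "items must be in sentence order"
    del stack[-2]
    buffer.insert(0, second)


stack, buffer = [1, 2, 3], [4, 5]
swap(stack, buffer)
print(stack, buffer)
```

After the call, position 2 sits at the front of the buffer and will be shifted again later, which is how the parser effectively re-orders the input so that words of a non-projective arc become adjacent.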

slide-34
SLIDE 34

Non-Projective Dependency Parsing

  • Useful for some languages with fewer constraints on word order

Theoretically

  • Best case $O(N)$, that is: no swaps
  • Worst case $O(N^2)$
slide-35
SLIDE 35

Results Non-Projective Dependency Parsing

Running Time

  • Tested on 5 languages (Danish, Arabic, Czech, Slovene, Turkish)
  • In practice the running time is $O(N)$.

Parsing Accuracy

  • Criteria
      – Attachment Score: percentage of tokens with correct head and dependency label
      – Exact match: completely correct labeled dependency tree
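
The two criteria differ in granularity: the attachment score counts tokens, exact match counts whole sentences. A small sketch under the assumption that each tree is a list of `(head, label)` pairs, with made-up toy data:

```python
def attachment_score(gold_trees, predicted_trees):
    """Share of tokens, over all sentences, with correct head and label."""
    pairs = [(g, p)
             for gold, pred in zip(gold_trees, predicted_trees)
             for g, p in zip(gold, pred)]
    return sum(g == p for g, p in pairs) / len(pairs)


def exact_match(gold_trees, predicted_trees):
    """Share of sentences whose predicted labeled tree is completely correct."""
    assert gold_trees and len(gold_trees) == len(predicted_trees)
    hits = sum(g == p for g, p in zip(gold_trees, predicted_trees))
    return hits / len(gold_trees)


tree = [(2, "Subj"), (0, "Root"), (4, "Det"), (2, "Obj")]
wrong = [(2, "Subj"), (0, "Root"), (4, "Det"), (2, "Subj")]  # one bad label
gold_trees = [tree, tree]
predicted_trees = [tree, wrong]

as_score = attachment_score(gold_trees, predicted_trees)
em_score = exact_match(gold_trees, predicted_trees)
print(as_score, em_score)
```

A single wrong token costs one eighth of the attachment score here but half of the exact match score, which is why exact match is the stricter criterion.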
slide-36
SLIDE 36

Results Non-Projective Dependency Parsing

  • Systems compared
      – $S_u$ = allowing non-projective
      – $S_p$ = just projective
      – $S_{pp}$ = handling non-projectivity as post-processing
  • AS: percentage of tokens with correct head and dependency label
  • EM: completely correct labeled dependency tree
slide-37
SLIDE 37

Results Non-Projective Dependency Parsing

  • AS
      – Performance of $S_u$ is better for Czech and Slovene: there are more non-projective arcs in these languages.
      – Otherwise, in AS $S_u$ is lower than $S_p$; however, the drop is not really significant
      – For Arabic the results are not meaningful since there are only 11 non-projective arcs in the whole set
  • EM
      – $S_u$ outperforms all other parsers.
      – The positive effect of $S_u$ depends on the non-projective arcs in the language

slide-38
SLIDE 38
slide-39
SLIDE 39

References

  • Joakim Nivre, Jens Nilsson, Johan Hall, Atanas Chanev, Gülsen Eryigit, Sandra Kübler, Svetoslav Marinov, and Erwin Marsi. MaltParser: a language-independent system for data-driven dependency parsing. Natural Language Engineering, 13(1):1–41, 2007.
  • Joakim Nivre and Ryan McDonald. Integrating graph-based and transition-based dependency parsers. In Proceedings of ACL-08: HLT, pages 950–958, Columbus, Ohio, June 2008.
  • Joakim Nivre. Non-projective dependency parsing in expected linear time. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 351–359, Suntec, Singapore, 2009.
  • Sandra Kübler, Ryan McDonald, and Joakim Nivre. Dependency Parsing. Morgan & Claypool Publishers, 2009.