

SLIDE 1

Hebrew Dependency Parsing: Initial Results

Yoav Goldberg, Michael Elhadad
IWPT 2009, Paris

SLIDES 2–5

Motivation

◮ We want a Hebrew Dependency parser
◮ Initial steps:
  ◮ Know Hebrew
  ◮ Create Hebrew Dependency Treebank
  ◮ Experiment with existing state-of-the-art systems
◮ Next year:
  ◮ Do better

SLIDE 6

Know Hebrew

SLIDE 7

Know Hebrew

◮ Relatively free constituent order
  ◮ Suitable for a dependency-based representation
  ◮ Mostly SVO, but OVS and VSO are also possible; verbal arguments appear before or after the verb:
    ◮ went from-Israel to-Paris
    ◮ went to-Paris from-Israel
    ◮ to-Paris from-Israel went
    ◮ to-Paris went from-Israel
    ◮ ...
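A minimal sketch (hypothetical encoding, not from the paper) of why this suits a dependency representation: encode each token as a (form, head index) pair, and the set of head-dependent arcs comes out the same no matter where the arguments land.

```python
# A sentence as (form, head_index) pairs; heads are 1-based, 0 = root.
# Two surface orders of the glossed Hebrew example above.
svo = [("went", 0), ("from-Israel", 1), ("to-Paris", 1)]
fronted = [("to-Paris", 3), ("from-Israel", 3), ("went", 0)]

def arcs(sent):
    """Order-independent set of (head_form, dependent_form) arcs."""
    forms = [form for form, _ in sent]
    return {(forms[h - 1] if h else "ROOT", form) for form, h in sent}

assert arcs(svo) == arcs(fronted)  # same tree under both word orders
```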

SLIDE 8

Know Hebrew

◮ Relatively free constituent order
◮ Rich morphology
  ◮ Many word forms
  ◮ Agreement – noun/adj, verb/subj: should help parsing!
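A toy sketch of why agreement should help (the feature dictionaries are hypothetical, not the parsers' actual feature code): agreement can veto head-dependent pairs whose gender or number clash.

```python
def agree(head_feats, dep_feats):
    """True unless gender or number are specified on both sides and clash."""
    return all(head_feats.get(f) == dep_feats.get(f)
               for f in ("gender", "number")
               if f in head_feats and f in dep_feats)

# verb/subj: a feminine-singular verb should not take a masculine subject
assert not agree({"gender": "f", "number": "sg"},
                 {"gender": "m", "number": "sg"})
assert agree({"gender": "f"}, {"gender": "f", "number": "pl"})
```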

SLIDE 9

Know Hebrew

◮ Relatively free constituent order
◮ Rich morphology
◮ Agglutination
  ◮ Many function words are attached to the next token
  ◮ Together with rich morphology ⇒ Very High Ambiguity
  ◮ Leaves of the tree are not known in advance!
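A minimal sketch of the resulting ambiguity, assuming a toy transliterated prefix lexicon (invented for illustration, far smaller than any real Hebrew lexicon): one written token can split into several function-word prefixes plus a word, so the leaves the parser sees depend on which segmentation is chosen.

```python
# Toy transliterated prefix lexicon (an assumption for illustration):
# w "and", k "as", sh "that", h "the", b "in", l "to", m "from"
PREFIXES = ("w", "k", "sh", "h", "b", "l", "m")

def segmentations(token):
    """Yield every split of token into known prefixes plus a nonempty rest."""
    yield [token]  # the whole token as one word, no prefixes
    for p in PREFIXES:
        if token.startswith(p) and len(token) > len(p):
            for rest in segmentations(token[len(p):]):
                yield [p] + rest

print(list(segmentations("bbit")))
# [['bbit'], ['b', 'bit'], ['b', 'b', 'it']]  -- e.g. "in (a) house"
```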

SLIDE 10

Hebrew Dependency Treebank

◮ Converted from the Hebrew Constituency Treebank (V2)
  ◮ Some heads are marked in the Treebank
  ◮ For the others: an (extended) head-percolation table from Reut Tsarfaty
◮ 6220 sentences
◮ 34 non-projective sentences
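A sketch of how head percolation typically works (the table entries below are invented for illustration; the actual extended table is Reut Tsarfaty's): each constituent picks a head child by scanning its children against a per-label priority list, and dependencies then link the other children to that head.

```python
# Hypothetical, simplified percolation entries: label -> (scan direction,
# child categories in priority order). Not the real table.
TABLE = {
    "S":  ("left",  ["VB", "VP"]),
    "NP": ("right", ["NN", "NP"]),
    "PP": ("left",  ["IN"]),
}

def head_child(label, children):
    """children: list of (category, node); return the head child's node."""
    direction, priorities = TABLE.get(label, ("left", []))
    scan = children if direction == "left" else list(reversed(children))
    for cat in priorities:
        for child_cat, node in scan:
            if child_cat == cat:
                return node
    return scan[0][1]  # fallback: first child in scan direction

# e.g. the preposition heads the PP:
assert head_child("PP", [("IN", "in"), ("NP", "house")]) == "in"
```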

SLIDES 11–16

Hebrew Dependency Treebank

◮ Choice of heads
  ◮ Prepositions are the heads of PPs
  ◮ Relativizers are the heads of relative clauses
  ◮ The main verb is the head of an infinitive verb
  ◮ Coordinators are the heads of conjunctions ← hard for parsers
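A toy tree (invented, glossed sentence) under these conventions, written as (id, form, head) triples with 0 for the root:

```python
# "sat in the-house and in the-garden" (glossed, invented example).
tree = [
    (1, "sat", 0),         # main verb is the root
    (2, "in", 4),          # preposition heads its PP; the PP is a conjunct,
                           # so it attaches to the coordinator
    (3, "the-house", 2),   # noun depends on its preposition
    (4, "and", 1),         # the coordinator heads the whole conjunction
    (5, "in", 4),          # second conjunct's preposition
    (6, "the-garden", 5),  # noun depends on its preposition
]
```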

SLIDE 17

Hebrew Dependency Treebank

Dependency labels

◮ Marked in TBv2:
  ◮ OBJ
  ◮ SUBJ
  ◮ COMP
◮ Trivially added:
  ◮ ROOT
  ◮ suffix-inflections
◮ We are investigating ways of adding more labels
◮ This work focuses on unlabeled dependency parsing.

SLIDE 18

Experiments

SLIDE 19

Parameters

◮ Graph-based vs. transition-based?
◮ How important is lexicalization?
◮ Does morphology help?

SLIDE 20

Parsers

◮ Transition-based: MALTPARSER (Joakim Nivre)
  ◮ MALT: MaltParser, out-of-the-box feature set
  ◮ MALT-ARA: MaltParser, Arabic-optimized feature set (should make use of morphology)
◮ Graph-based: MSTPARSER (Ryan McDonald)
  ◮ MST1: first-order MST parser
  ◮ MST2: second-order MST parser

SLIDE 21

Experimental Setup

◮ Oracle setting: use gold morphology/tagging/segmentation
◮ Pipeline setting: use tagger-based morphology/tagging/segmentation

SLIDE 22

Results

Features     MST1   MST2   MALT   MALT-ARA
−MORPH
Full Lex     83.60  84.31  80.77  80.32
Lex 20       82.99  84.52  79.69  79.40
Lex 100      82.56  83.12  78.66  78.56
+MORPH
Full Lex     83.60  84.39  80.77  80.73
Lex 20       83.60  84.77  79.69  79.84
Lex 100      83.23  83.80  78.66  78.56

Table: oracle token segmentation and POS-tagging.

Features     MST1   MST2   MALT   MALT-ARA
−MORPH
Full Lex     75.64  76.38  73.03  72.94
Lex 20       75.48  76.41  72.04  71.88
Lex 100      74.97  75.49  70.93  70.73
+MORPH
Full Lex     73.90  74.62  73.03  73.43
Lex 20       73.56  74.41  72.04  72.30
Lex 100      72.90  73.78  70.93  70.97

Table: tagger token segmentation and POS-tagging.
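Since the deck focuses on unlabeled parsing (slide 17), these numbers are unlabeled attachment accuracy. A minimal sketch of the metric (not the authors' evaluation script):

```python
def unlabeled_attachment_score(gold_heads, pred_heads):
    """Percentage of tokens whose predicted head matches the gold head."""
    assert len(gold_heads) == len(pred_heads)
    hits = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return 100.0 * hits / len(gold_heads)

print(round(unlabeled_attachment_score([0, 1, 1], [0, 1, 2]), 2))  # 66.67
```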

SLIDE 23

Results

Best oracle result: 84.77%
Best real result: 76.41%

SLIDES 24–26

Results

MST2 > MST1 > MALT

◮ Simply a better model
◮ Partly because of coordination representation

SLIDE 27

Results

Lexical items appearing > 20 times ∼ all lexical items
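A sketch of what such a cutoff usually means in practice (the *UNKNOWN* placeholder is an assumption for illustration, not the parsers' documented behavior): "Lex 20" keeps only word forms seen more than 20 times and replaces the rest before training.

```python
from collections import Counter

def apply_lexical_cutoff(sentences, threshold=20, unk="*UNKNOWN*"):
    """Replace word forms seen at most `threshold` times with a placeholder."""
    counts = Counter(w for sent in sentences for w in sent)
    return [[w if counts[w] > threshold else unk for w in sent]
            for sent in sentences]
```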

SLIDE 28

Results

With Oracle Morphology

◮ Morphological features don’t really help

SLIDE 29

Results

With Tagger Morphology

◮ Morphological features help MALT a little
◮ Morphological features hurt MST a lot

SLIDE 30

Where do we go from here?

◮ We have a Hebrew Dependency Treebank
◮ Realistic performance is still too low
◮ Current models don't utilize morphological information well
  ◮ Can we do better?
◮ The pipeline model hurts performance
  ◮ Can we do parsing, tagging, and segmentation jointly?