

SLIDE 1

INRIA

PASSAGE:

From French Parser Evaluation to Large Sized Treebanks

http://atoll.inria.fr/passage
Éric de la Clergerie (INRIA), Olivier Hamon (ELDA-LIPN), Djamel Mostefa (ELDA), Christelle Ayache (ELDA), Patrick Paroubek (CNRS/LIMSI), Anne Vilnat (CNRS/LIMSI)
LREC'08, Marrakech, May 29th, 2008

INRIA É. de la Clergerie & al PASSAGE 05/29/08 1 / 20


SLIDE 3

From EASy to Passage

EASy (2003–2006): French Technolangue program; the first French parsing evaluation campaign; 15 parsers.

PASSAGE (2007–2009): French ANR MDCA project; evaluation and much more: a dynamic treebank.

Benefits from EASy: several parsers for French already exist, and these parsers are able to produce EASy annotations.



SLIDE 9

Entering a virtuous loop between tools and resources

[Diagram: raw corpus (100M words) → Parsers 1/2/3 → Annotations 1/2/3 → ROVER merging; each parser's annotations are evaluated against the EASy treebank (80K words); lexicon acquisition/integration feeds back into the parsers; the merged annotations are exploited.]


SLIDE 10

Entering a virtuous loop between tools and resources

[Diagram: raw corpus (100M words) → Parsers 1/2/3 → Annotations 1/2/3 → ROVER merging → validation → reference treebank (400K words); each parser's annotations are evaluated; lexicon acquisition/integration feeds back into the parsers; the merged annotations are exploited.]
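The loop in this diagram can be sketched in code. All function bodies below are toy stand-ins of my own (not the project's actual parsers, ROVER, or validation tools), just to make the data flow concrete:

```python
# Toy sketch of the "virtuous loop": parsers annotate a raw corpus, a
# ROVER-style merge keeps majority annotations, validated output becomes
# the treebank, and the acquired lexicon feeds back into the parsers.
from collections import Counter

def rover_merge(outputs):
    """Keep annotations produced by a strict majority of parsers."""
    votes = Counter(a for out in outputs for a in set(out))
    return {a for a, n in votes.items() if n > len(outputs) / 2}

def virtuous_loop(corpus, parsers, validate, iterations=2):
    treebank, lexicon = [], set()
    for _ in range(iterations):
        outputs = [parse(corpus, lexicon) for parse in parsers]  # Annotations 1..n
        merged = rover_merge(outputs)          # ROVER merging
        treebank = validate(merged)            # controlled validation
        lexicon |= {a[0] for a in treebank}    # toy "lexicon acquisition"
    return treebank

# Toy parsers: each annotates only the words it knows; the growing
# lexicon widens their coverage on later iterations.
def p1(corpus, lex): return [(w, "GN") for w in corpus if w in {"chat"} | lex]
def p2(corpus, lex): return [(w, "GN") for w in corpus if w in {"chat", "chien"} | lex]
def p3(corpus, lex): return [(w, "GN") for w in corpus if w in {"chien"} | lex]

result = virtuous_loop(["chat", "chien"], [p1, p2, p3], validate=sorted)
print(result)  # [('chat', 'GN'), ('chien', 'GN')]
```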



SLIDE 12

Consortium

ALPAGE (INRIA & Paris 7), LIR/LIMSI, TALARIS/LORIA, LIC2M/CEA-LIST, ELDA, TAGMATICA, LPL, SYNAPSE, XRCE, LIRMM


SLIDE 13

10 parsers cooperating

A unique opportunity and a source of diversity (formalisms, technologies, ...).

Parser      From       Nature
FRMG        INRIA      TIG/TAG + DYALOG
SXLFG       INRIA      LFG + SYNTAX
LLP2        LORIA      TAG
LIMA        CEA-LIST   Rule system
TAGPARSER   TAGMATICA  Induction + rules
GP1 & GP2   LPL        Property grammars
CORDIAL     SYNAPSE    Rule-based
SYGMART     LIRMM
XIP         XRCE       Rule-based cascade


SLIDE 14

Exploiting large corpora

Treebanks are very valuable for NLP, but rare and costly to develop. On the other hand, large amounts of electronic French documents are easy to access:

Corpus               Size          Type
EASy Corpus          1M words      multi-style
Wikipedia Fr         ~86M words    collaborative encyclopedia
Wikisources          ~80M words    free literary texts
Monde Diplomatique   18M words     journalistic
FRANTEXT             20M words     free literary texts
Europarl             28M words     European Parliament debates
JRC-Acquis           39M words     European law corpus
Ester                1M words      speech transcriptions
Total (current)      > 270M words



SLIDE 17

EASy annotations

Based on 6 kinds of chunks and 14 kinds of dependencies

Type   Explanation
GN     Nominal chunk
NV     Verbal kernel
GA     Adjectival chunk
GR     Adverbial chunk
GP     Prepositional chunk
PV     Prepositional verbal kernel

Type     Anchors                    Explanation
SUJ-V    subject, verb              subject-verb dependency
AUX-V    auxiliary, verb            auxiliary-verb dependency
COD-V    object, verb               direct objects
CPL-V    complement, verb           other verb complements
MOD-V    modifier, verb             verb modifiers
COMP     complementizer, verb       subordinate sentences
ATB-SO   attribute, verb            verb attribute
MOD-N    modifier, noun             noun modifiers
MOD-A    modifier, adjective        adjective modifiers
MOD-R    modifier, adverb           adverb modifiers
MOD-P    modifier, preposition      preposition modifiers
COORD    coordinator, left, right   coordination
APPOS    first, second              apposition
JUXT     first, second              juxtaposition
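As an illustration, the chunk and relation annotations above could be modeled as simple typed records. This is my own sketch, not the official EASy annotation format:

```python
# Toy model of EASy-style annotations: chunks are typed token spans,
# relations are typed links between chunks or tokens.
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    type: str       # one of GN, NV, GA, GR, GP, PV
    start: int      # first token index
    end: int        # last token index (inclusive)

@dataclass(frozen=True)
class Relation:
    type: str       # e.g. SUJ-V, MOD-N, COORD
    anchors: tuple  # chunk/token ids, in the order given by the table

# "Le chat dort": [Le chat]_GN [dort]_NV, with a subject-verb relation.
gn = Chunk("GN", 0, 1)
nv = Chunk("NV", 2, 2)
suj = Relation("SUJ-V", (0, 1))   # ids of the GN and NV chunks
print(gn.type, nv.type, suj.type)
```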


SLIDE 18

EASy annotations (cont’d)


SLIDE 19

Evaluating the parsers

Expertise from EASy:

[EASy] End of 2004 (1 week). Best results (f-measure): chunks > 80%, dependencies > 50%.

[Passage1] Fall 2007 (2 months, closed on Dec. 21st, 2007). Best results (f-measure): chunks > 90%, dependencies > 60%.

◮ Objective: calibrate the ROVER

[Passage2] End of 2009

◮ To be done on new data (from the reference treebank)
◮ Objective: assess the evolution of the parsers


SLIDE 23

Data sets

EASy corpus (1M words): 40K tokenized sentences; journalistic, literary, oral, mail, medical, ...

easydev (76K words): 4K annotated sentences, known to participants.

easytest: 400 new annotated sentences.

passagedev (900K words): un-tokenized text from wikipedia, wikinews, wikibooks, europarl, jrc-acquis, ester, lemonde.


SLIDE 24

WEB-based evaluation server

Use of a WEB-based evaluation server:

◮ Centralized information/data
◮ Allows multiple evaluations
◮ Instant feedback for participants: precision, recall, f-measure, plots, logs, ...

Procedure:

◮ Server opened for 2 months
◮ Participants upload their outputs
◮ Each submitted output is automatically evaluated on the easydev data set ⇒ immediate feedback
◮ Results are kept on the server (at most ten kept)
◮ Before the end, each participant selects a primary submission
◮ After the closing, participants can access the results for their primary submission on the easytest data set

Conclusion: a very positive initiative

◮ Participant P5 submitted more than 50 runs, improving its f-measure on chunks from 92.5% to 96% in a few weeks
◮ ⇒ the server has been re-opened for new submissions
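The metrics the server reports can be sketched as follows. This is a minimal illustration of precision/recall/f-measure over annotation sets, not the actual PASSAGE server code:

```python
# Minimal scoring sketch: annotations are modeled as hashable tuples,
# e.g. ("GN", start, end) for chunks or ("SUJ-V", subj, verb) for relations.

def precision_recall_f(submitted, gold):
    """Return (precision, recall, f-measure) for two annotation sets."""
    submitted, gold = set(submitted), set(gold)
    correct = len(submitted & gold)
    p = correct / len(submitted) if submitted else 0.0
    r = correct / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

gold = {("GN", 0, 2), ("NV", 3, 4), ("SUJ-V", 1, 3)}
run = {("GN", 0, 2), ("NV", 3, 5), ("SUJ-V", 1, 3)}   # one chunk boundary wrong
p, r, f = precision_recall_f(run, gold)
# 2 of 3 annotations correct, so p = r = f = 2/3
```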


SLIDE 25

Results on Chunks and Relations (on Test data)

Chunks (10 systems):
  F-measure f    #systems
  > 90%          7
  > 80%          3

Relations (7 systems):
  F-measure f    #systems
  > 60%          3
  > 50%          2
  > 40%          2


SLIDE 26

Result landscape (easytest data set)

Performance on chunks seems very stable with respect to corpus and annotation types for this specific system. Performance on relations is less stable and more dependent on the relation types. Do we find the same properties for the other systems?


SLIDE 27

System stability (easytest data set)

Tested using a weighted variance. It is important to assess the level of stability (confidence) of each system with respect to corpus type (very good variances, especially for chunks), annotation type (larger variances, especially for relations), and possibly more specific contexts.
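One plausible reading of the weighted variance used here is the following sketch; weighting each type's f-measure by its annotation count is my assumption, as the slides do not specify the weighting:

```python
# Weighted variance as a stability measure: low variance of per-type
# f-measures means the system behaves uniformly across corpus styles
# or annotation types.

def weighted_variance(scores, weights):
    """Variance of `scores` under non-negative `weights` (normalized internally)."""
    total = sum(weights)
    mean = sum(s * w for s, w in zip(scores, weights)) / total
    return sum(w * (s - mean) ** 2 for s, w in zip(scores, weights)) / total

# f-measures per corpus style, weighted by style size (invented numbers):
f_by_style = [0.92, 0.95, 0.90]
sizes = [1000, 3000, 1000]
print(weighted_variance(f_by_style, sizes))
```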


SLIDE 28

The ROVER: combining the annotations

Using a ROVER (Recognizer Output Voting Error Reduction) to combine the various sets of annotations:

◮ tried in speech recognition, tagging, translation, ...
◮ based on majority voting
◮ parser weights derived from the evaluation results, per kind of chunk and relation, corpus style, ...
◮ feedback on the agreement between basic and weighted majority voting
◮ control through manual validation
◮ iterative process

Issues: ensuring the coherence of the ROVER annotations

◮ on chunks: the parsers already give good results, and the effects are mostly local (chunks are small)
◮ on dependencies: more complex; dependency properties have to be enforced (single governor, projectivity, ...)
◮ on the relationships between chunks and dependencies
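Two of the dependency properties mentioned above can be checked as in this sketch (my own illustration; representing dependencies as (dependent, governor) token-index pairs is an assumption):

```python
# Coherence checks a merged dependency set must satisfy.

def single_governor(edges):
    """edges: list of (dependent, governor). True if no token has two governors."""
    deps = [d for d, _ in edges]
    return len(deps) == len(set(deps))

def projective(edges):
    """True if no two arcs cross when drawn above the sentence."""
    spans = [tuple(sorted(e)) for e in edges]
    for i, (a, b) in enumerate(spans):
        for c, d in spans[i + 1:]:
            # crossing: exactly one endpoint strictly inside the other arc
            if a < c < b < d or c < a < d < b:
                return False
    return True

edges = [(0, 2), (1, 2), (3, 2)]      # token 2 governs tokens 0, 1 and 3
print(single_governor(edges))         # True
print(projective(edges))              # True
print(projective([(0, 2), (1, 3)]))   # crossing arcs: False
```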


SLIDE 29

ROVER: preliminary algorithm

Apply the following steps first on chunks, then on relations (constrained by the selected chunks).

Confidence of an annotation a of type τ in corpus c, produced by participant i with rank r:

    conf_i(a) = (|systems| − (r − 1)) · prec_{c,τ}(i)

Annotation a is selected if

    p(a) = ( Σ_i conf_i(a) ) / |{i | i returns a}| ≥ max_i prec_{c,τ}(i)

Still very preliminary: many parameters to try (selection order, confidence weighting, selection threshold, consistency modeling, annotation similarities, ...). The current algorithm favors precision over recall (maybe to be changed).
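The selection rule above can be sketched directly for one annotation type on one corpus. The participants, ranks, and precisions below are invented for illustration:

```python
# Sketch of the preliminary ROVER selection rule: an annotation's confidence
# combines each producing parser's rank and precision, and the averaged
# confidence must reach the best participant's precision.

def rover_select(candidates, rank, prec):
    """candidates: dict annotation -> set of participants that produced it.
    rank, prec: dicts participant -> rank / precision for this (c, τ).
    rank 1 is the best system."""
    n = len(rank)                    # |systems|
    threshold = max(prec.values())   # max_i prec_{c,τ}(i)
    selected = []
    for a, voters in candidates.items():
        # conf_i(a) = (|systems| - (rank_i - 1)) * prec_{c,τ}(i)
        confs = [(n - (rank[i] - 1)) * prec[i] for i in voters]
        p_a = sum(confs) / len(voters)
        if p_a >= threshold:
            selected.append(a)
    return selected

rank = {"P1": 1, "P2": 2, "P3": 3}
prec = {"P1": 0.9, "P2": 0.8, "P3": 0.6}
candidates = {
    ("GN", 0, 2): {"P1", "P2", "P3"},  # produced by all three systems
    ("NV", 3, 5): {"P3"},              # produced by the weakest only
}
print(rover_select(candidates, rank, prec))  # [('GN', 0, 2)]
```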


SLIDE 30

ROVER: preliminary results (on 6 participants)

Bold: cells where the ROVER has the winning precision. Even when it is not the best, the ROVER is not far from the best participant, and it provides a confidence level for each annotation.



SLIDE 32

Next steps

Applying machine learning techniques to fine-tune the ROVER; the results are already encouraging.

Deploying a larger-scale infrastructure to work on large corpora: coupling the WEB-based evaluation server, the ROVER, and the EASYREF prototype WEB service (to view and edit annotations) ⇒ an iterative, collaborative, controlled process progressively covering the full Passage corpus (> 100M words).

Thank you!
