

SLIDE 1

EASY, Evaluation of Parsers of French: what are the results?

  • P. Paroubek*, I. Robba*, A. Vilnat*, C. Ayache**

LREC 2008, Marrakech


SLIDE 2

General presentation

EASY: Syntactic Parser Evaluation

  • one of the 8 evaluation campaigns of the EVALDA platform, which is itself part of the Technolangue program
  • 5 corpus providers, 12 participants, 15 runs

The steps:

1. at first:
  • to define the annotation
  • to collect and to annotate the corpora
  • to modify the parsers to fulfill the demands of EASY
2. to define the evaluation measures
3. to evaluate the parser results
4. to combine the results of the parsers

SLIDE 3

Outline

1. Corpus
2. Annotation of the reference
3. Evaluation measures
4. Performance
5. First ROVER test
6. Conclusion and perspectives

SLIDE 4

Corpus

Different linguistic types:

  • newspaper articles from Le Monde (as usual...)
  • literary texts from ATILF databases
  • medical texts, for specialized texts
  • questions, with EQueR, a specific syntactic form
  • manually transcribed parliamentary debates, "controlled" oral transcriptions
  • web pages and e-mails, to go further in the direction of hybrid forms

Globally: 40,000 sentences, 770,000 words.

SLIDE 5

Annotation of the reference

Choices made with all the participants: small, non-embedded constituents, plus dependency relations.

6 kinds of constituents:

  • GN for Noun Phrase, as le petit chat,
  • GP for Prepositional Phrase, as de la maison or comme eux,
  • NV for Verb Kernel, including clitics, as j’ai or souffert,
  • PV for Verb Kernel introduced by a Preposition, as de venir,
  • GA for Adjectival Phrase, used for postponed adjectives in French, which are not included in GN,
  • GR for Adverb Phrase, as longtemps.
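As a concrete reading of this inventory, here is a minimal Python sketch; the enum name and the comments are illustrative, not part of the EASY guidelines:

```python
from enum import Enum

class Constituent(Enum):
    """The six EASY constituent types (non-embedded chunks)."""
    GN = "Noun Phrase"                     # le petit chat
    GP = "Prepositional Phrase"            # de la maison, comme eux
    NV = "Verb Kernel"                     # j'ai, souffert (clitics included)
    PV = "Prepositional Verb Kernel"       # de venir
    GA = "Adjectival Phrase"               # postponed adjectives, outside GN
    GR = "Adverb Phrase"                   # longtemps
```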

SLIDE 6

Annotation of the reference: the relations

14 kinds of dependencies:

  • SUJ-V (subject), AUX-V (auxiliary), COD-V (direct object), CPL-V (verb complement) and MOD-V (verb modifier) for the different verb complements,
  • COMP (complementor),
  • ATB-SO (attribute of the subject or of the object),
  • MOD-N, MOD-A, MOD-R, MOD-P (modifier of the noun, the adjective, the adverb or the proposition, respectively),
  • COORD (coordination), APP (apposition), JUXT (juxtaposition).
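A relation instance ties one of these labels to two spans. The record below is a sketch: the label set comes from the slide, while representing spans as (start, end) token offsets is an assumption about how the annotations could be stored:

```python
from dataclasses import dataclass

# The 14 EASY relation labels listed above.
RELATION_LABELS = {
    "SUJ-V", "AUX-V", "COD-V", "CPL-V", "MOD-V",  # verb complements
    "COMP", "ATB-SO",
    "MOD-N", "MOD-A", "MOD-R", "MOD-P",           # modifiers
    "COORD", "APP", "JUXT",
}

@dataclass(frozen=True)
class Relation:
    """A typed dependency between a source span and a target span."""
    label: str               # one of RELATION_LABELS
    source: tuple[int, int]  # (start, end) token offsets of the source span
    target: tuple[int, int]  # (start, end) token offsets of the target span
```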

SLIDE 7

Annotation of the reference: an example from literary corpus

[Figure: dependency annotation of the sentence “Longtemps j’ai été comme eux et j’ai souffert du même malaise”, with the relations suj-v, aux-v, cpl-v, mod-n, mod-v and coord drawn above the words.]

Figure: Tentative translation: For a long time, I have lived as they do, and I suffered from the same unease.

SLIDE 8

Evaluation measures

Precision, recall and f-measure:

  • for constituents
  • for relations
  • for both of them

For each parser:

  • for each kind of constituent
  • for each relation
  • for each genre of sub-corpus
  • or globally
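These are the standard definitions; a minimal sketch, assuming the counts of matched, hypothesis and reference annotations have already been computed:

```python
def precision_recall_f(matched: int, n_hyp: int, n_ref: int) -> tuple[float, float, float]:
    """Standard precision, recall and f-measure over annotation counts."""
    p = matched / n_hyp if n_hyp else 0.0   # fraction of parser output that is correct
    r = matched / n_ref if n_ref else 0.0   # fraction of the reference that is found
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```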
SLIDE 9

Evaluation measures: which comparisons?

Different equality measures between two text spans from R (reference) and H (hypothesis):

  • equality: H = R, the least permissive
  • unitary fuzziness: |H \ R| ≤ 1
  • inclusion: H ⊂ R
  • barycenter: 2·|R ∩ H| / (|R| + |H|) > 0.25
  • intersection: R ∩ H ≠ ∅, the most lenient
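Modelling a text span as the set of its token positions (an assumption about the encoding, not the campaign's actual format), the five comparisons can be written directly:

```python
def equality(h: set[int], r: set[int]) -> bool:
    return h == r                      # strictest: identical spans

def unitary_fuzziness(h: set[int], r: set[int]) -> bool:
    return len(h - r) <= 1             # at most one hypothesis token outside R

def inclusion(h: set[int], r: set[int]) -> bool:
    return h <= r                      # hypothesis span contained in reference

def barycenter(h: set[int], r: set[int]) -> bool:
    return 2 * len(r & h) / (len(r) + len(h)) > 0.25  # overlap ratio threshold

def intersection(h: set[int], r: set[int]) -> bool:
    return bool(r & h)                 # most lenient: any overlap at all
```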

SLIDE 10

Evaluation measures: which comparisons?

Two constituents are considered equal if they have the same type and their text spans are equal. Two relations are considered equal if they have the same type and their respective source and target spans are equal.
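With span predicates like the ones sketched above, both matching rules are short; the attribute names here are hypothetical:

```python
def constituents_match(c1, c2, span_match=lambda h, r: h == r) -> bool:
    """Same constituent type and matching spans; any predicate from the
    previous sketch (inclusion, barycenter, ...) can be passed as span_match."""
    return c1.label == c2.label and span_match(c1.span, c2.span)

def relations_match(r1, r2, span_match=lambda h, r: h == r) -> bool:
    # same relation type, and both endpoints match span-wise
    return (r1.label == r2.label
            and span_match(r1.source, r2.source)
            and span_match(r1.target, r2.target))
```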

SLIDE 11

Evaluation measures for constituents: global results


Figure: Results of the 15 parsers for constituents in precision/recall/f-measure (in this order), globally for all sub-corpora and all annotations together.

SLIDE 12

Evaluation measures for relations: global results


Figure: Results of the 15 parsers for relations in precision/recall/f-measure (in this order), globally for all sub-corpora and all annotations together.

SLIDE 13

Parser obtaining the best precision

[Bar chart: precision per sub-corpus (ALL, MED, ORAL, MAIL, WEB, QUEST, PARLM, MONDE, LITTR) and per relation (ALL, SV, XV, COD, CV, ATB, CMP, MN, MV, MA, MR, MP, CRD, AP, JXT), scale 0 to 1.]

Figure: Results for relations of the parser obtaining the best precision measure

SLIDE 14

Parser obtaining the best recall

[Bar chart: recall per sub-corpus and per relation, same layout as the precision figure.]

Figure: Results for relations of the parser obtaining the best recall measure

SLIDE 15

Parser obtaining the best f-measure

[Bar chart: f-measure per sub-corpus and per relation, same layout as the two preceding figures.]

Figure: Results for relations of the parser obtaining the best f-measure

SLIDE 16

First conclusions

First results are interesting:

  • relations: the best systems reach an average f-measure near 0.60,
  • high variability of results for relation annotation, but some parsers manage to preserve the same level of performance across text genres,
  • there is still an important amount of work to do on syntactic phenomena which are rarely or never handled by current parsers (the apposition or juxtaposition relations, or when coordinations are combined together or mixed up with ellipses),
  • the best performances are obtained by different parsers (different performance profiles), so there is a priori a relatively important margin for performance increase, which could be obtained by combining the annotations of different parsers (a minimal sketch follows).
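ROVER, borrowed from speech-recognition evaluation, combines system outputs by voting. The sketch below is a deliberately simple majority vote over relation triples, not the combination algorithm actually used in the campaign:

```python
from collections import Counter

def rover_combine(system_outputs: list[set], min_votes: int = 2) -> set:
    """Keep an annotation when at least min_votes systems propose it.

    system_outputs: one set of (label, source_span, target_span) triples
    per parser. A minimal majority-vote sketch only.
    """
    votes = Counter()
    for annotations in system_outputs:
        votes.update(annotations)      # each system votes once per annotation
    return {ann for ann, n in votes.items() if n >= min_votes}
```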
SLIDE 17

First ROVER test

[Bar chart: ROVER results per sub-corpus and per relation, same layout as the preceding figures.]

Figure: Relative gain of performance in precision against the best precision result

SLIDE 18

Comparative precision results

[Chart: “Relations precision (front view)”, comparing ROVER, P8, P3 and P10 per sub-corpus and per relation, same layout as the preceding figures.]

Figure: Compared precisions of the ROVER and the three best systems

SLIDE 19

Conclusion and perspectives

From EASY to PASSAGE...

  • the first campaign deploying the evaluation paradigm at full scale for syntactic parsers of French, with a black-box evaluation scheme using objective quantitative measures,
  • creation of a working group on parsing evaluation,
  • the beginning of PASSAGE... in a few minutes!