Variability and fixedness of multiword expressions

  1. MWEs Annotation Lexical encoding Variability measure Conclusions Project Bibliography. Variability and fixedness of multiword expressions: multilingual annotation, lexical encoding and variability measures. Agata Savary, Université de Tours, France. LIMSI seminar, 27 March 2018, Orsay. 1/29

  2. Multiword expressions. MWEs are word combinations that exhibit lexical, syntactic, semantic, pragmatic and/or statistical irregularities. Examples: all of a sudden, a hot dog, to pay a visit, to pull one’s leg. They encompass heterogeneous objects: idioms, compounds, light verb constructions, rhetorical figures, institutionalised phrases and named entities. Pervasive feature: non-compositional semantics, i.e. the meaning of an MWE cannot be deduced from the meanings of its components and from its syntactic structure in a way deemed regular for the given language. MWEs show a varying degree of syntactic variability (flexibility), especially verbal MWEs (VMWEs).

  3. Morpho-syntactic variability of VMWEs. N tourne la page ‘N turns the page’ ⇒ ‘N stops dealing with sth.’ (More) regular properties: free subject: Jean/il/elle tourne la page ‘Jean turns the page’ ⇒ ‘Jean stops dealing with sth.’; verb inflection: Jean tournera la page ‘Jean will turn the page’ ⇒ ‘Jean will stop dealing with sth.’; noun modification: Jean a tourné la page de la politique ‘Jean turned the page of politics’ ⇒ ‘Jean stopped dealing with politics’; passive: La page a été tournée ‘the page was turned’ ⇒ ‘someone stopped dealing with sth.’; determiner alternation: ?Jean tourne la/cette/une page ‘Jean turns the/this/a page’ ⇒ ‘Jean stops dealing with sth.’; ... (More) idiosyncratic properties: lexicalized verb and object: #Jean pivote la feuille ‘Jean rotates the sheet’; no verb reduction: *La page que Jean tourne est une page ‘The page that Jean turns is a page’; no noun inflection: ?Jean a tourné plusieurs pages ‘Jean turned several pages’; ...
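
The judgements above can be recorded as a small variability profile for each expression. The sketch below is a hypothetical encoding that assumes one acceptability mark per property. The property names (e.g. `lexical_substitution`) and the `Judgement` and `properties_with` names are illustrative only. They do not come from any existing lexicon format.

```python
from enum import Enum


class Judgement(Enum):
    """Acceptability marks used in the examples above."""
    OK = "ok"          # unmarked: the variant keeps the idiomatic reading
    DOUBTFUL = "?"     # questionable variant
    EXCLUDED = "*/#"   # ungrammatical (*) or idiomatic reading lost (#)


# Hypothetical property names; the marks are copied from the examples for "N tourne la page".
TOURNER_LA_PAGE = {
    "free_subject": Judgement.OK,
    "verb_inflection": Judgement.OK,
    "noun_modification": Judgement.OK,
    "passive": Judgement.OK,
    "determiner_alternation": Judgement.DOUBTFUL,
    "lexical_substitution": Judgement.EXCLUDED,   # #Jean pivote la feuille
    "verb_reduction": Judgement.EXCLUDED,
    "noun_inflection": Judgement.DOUBTFUL,
}


def properties_with(profile, mark):
    """Return the sorted names of the properties that carry the given mark."""
    return sorted(name for name, judgement in profile.items() if judgement is mark)


if __name__ == "__main__":
    # prints ['determiner_alternation', 'noun_inflection']
    print(properties_with(TOURNER_LA_PAGE, Judgement.DOUBTFUL))
```

A flat dictionary keeps each expression's irregularities explicit. It does not, however, factor out the properties that many expressions share, which is the redundancy issue raised for lexical encoding.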

  4. Scale-wise morpho-syntactic variability of VMWEs. [Table: N₀ V (Det N)₁ expressions checked against the properties noun modification, determiner alternation, noun inflection, verb reduction, verb inflection, free subject, free object, free verb and passive. Rows: N prend la pomme ‘N takes the apple’; N prend une décision ‘N takes a decision’ ⇒ ‘N makes a decision’; N tourne la page ‘N turns the page’ ⇒ ‘N stops dealing with sth.’; N prend la porte ‘N takes the door’ ⇒ ‘N leaves (because forced)’.]

  5. MWE variability/regularity: state of the art. Lexical encoding of the VMWE variability profile [Gross (1986), Mel’čuk et al. (1988)]; theory-neutral lexical encoding of VMWE irregularity [Grégoire (2010), Przepiórkowski et al. (2014), McShane et al. (2015)]; MWE flexibility as a matter of scale [Gross (1988)]; MWE flexibility as a result of decomposability [Nunberg et al. (1994), Sheinfux et al. (2017)]; VMWE encoding in computational grammars: HPSG [Sag et al. (2002), Copestake et al. (2002), Villavicencio et al. (2004), Bond et al. (2015), Herzig Sheinfux et al. (2015)], LFG [Attia (2006)], TAG, an MWE-friendly formalism [Abeillé and Schabes (1989), Abeillé and Schabes (1996), Vaidya et al. (2014), Lichte and Kallmeyer (2016)]; MWE variant conflation in NLP [Jacquemin (2001), Krstev et al. (2014)]; variability as a major challenge in NLP [Savary and Jacquemin (2003), Hachey et al. (2013), Constant et al. (2017)]; restricted variability as a hint in MWE identification [Fazly et al. (2009), Tsvetkov and Wintner (2014), Buljan and Šnajder (2017)].

  6. MWE variability/regularity: challenges. Annotation: few treebanks have full-fledged VMWE annotation, and annotation practices are heterogeneous [Rosén et al. (2015), Savary et al. (2018)]. Lexical encoding: account for the irregularity of an MWE while avoiding redundancy; share VMWE lexicons. Processing: measure VMWE variability [Fazly et al. (2009), Pasquer et al. (2018)]; conflate VMWE variants [Pasquer et al. (2018)].

  7. VMWE annotation in the PARSEME scientific network. Methodology: 22 languages (6 non-Indo-European); unified annotation guidelines in the form of decision trees driven by linguistic tests, with examples in many languages; universal categories, with room for language-specific categories and tests; close links with Universal Dependencies treebanking.

  8. VMWE variability in the PARSEME guidelines. Prototypical form: the head verb is in the active voice and in a finite form; the other lexicalized components depend either on the verb or on another lexicalized component, e.g. elle prend une décision. Meaning-preserving variants: analytical tenses (elle a pris une décision), relative clauses (la décision qu’elle prend), non-finite clauses (la décision prise, en prenant une décision), diathesis alternation (la décision sera prise), interposed modifiers (prendre une série de décisions). Canonical form: the prototypical form, or the most neutral form that keeps the idiomatic reading, e.g. elle prend une décision, les carottes sont cuites. Variant neutralization during annotation: texts contain VMWE variants, but linguistic tests apply to the canonical form.
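
Variant neutralization implies that every surface variant listed above is traced back to the same canonical form. One crude way to sketch this, assuming pre-lemmatized input, is an order-free check that every lexicalized lemma of the canonical form occurs in the sentence. The lemmatization, the `LEXICALIZED` tuple and the `contains_vmwe` helper are all hypothetical. This is neither the PARSEME annotation procedure nor the variant-conflation method of Pasquer et al. (2018).

```python
from collections import Counter

# Lexicalized components of "prendre une décision" (the determiner is not lexicalized).
LEXICALIZED = ("prendre", "décision")

# Hand-lemmatized variants from the guidelines above (hypothetical lemmatization;
# a real pipeline would read lemmas from a tagger or a treebank).
VARIANTS = {
    "elle a pris une décision": ["elle", "avoir", "prendre", "un", "décision"],
    "la décision qu'elle prend": ["le", "décision", "que", "elle", "prendre"],
    "la décision prise": ["le", "décision", "prendre"],
    "en prenant une décision": ["en", "prendre", "un", "décision"],
    "la décision sera prise": ["le", "décision", "être", "prendre"],
    "prendre une série de décisions": ["prendre", "un", "série", "de", "décision"],
}


def contains_vmwe(lemmas, lexicalized=LEXICALIZED):
    """True if each lexicalized lemma occurs in `lemmas` at least as often as required.

    Word order, inflection and intervening words are ignored, which lets all the
    variants match; it also means that literal co-occurrences are accepted."""
    needed = Counter(lexicalized)
    available = Counter(lemmas)
    return all(available[lemma] >= count for lemma, count in needed.items())


if __name__ == "__main__":
    for surface, lemmas in VARIANTS.items():
        print(f"{surface!r}: {contains_vmwe(lemmas)}")
    print(contains_vmwe(["elle", "prendre", "le", "porte"]))  # False: no 'décision'
```

Every variant in `VARIANTS` matches, while elle prend la porte does not. The same test would also accept coincidental or literal co-occurrences of the two lemmas, so a filter of this kind can only propose candidates.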

  9. VMWE typology (v. 1.1). Universal categories (valid for all languages): light verb constructions (LVCs), namely LVC.full (to give a lecture) and LVC.cause (to grant rights); verbal idioms (VIDs), e.g. to call it a day. Quasi-universal categories (valid for many languages): inherently reflexive verbs (IRVs), e.g. (FR) s’évanouir ‘to faint’; verb-particle constructions (VPCs), namely VPC.full (to do in ‘to kill’) and VPC.semi (to eat up ‘to eat completely’); multi-verb constructions (MVCs), e.g. to let go. Experimental (optional) category: inherently adpositional verbs (IAVs), e.g. to come across sth/sb, to rely on sth/sb.
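
The three tiers of the typology can be made explicit as a label inventory. The sketch assumes that each language selects the quasi-universal categories it actually has and may opt in to IAV. The `Tier` enum and the `label_set` function are hypothetical; the label strings follow the category names listed above.

```python
from enum import Enum


class Tier(Enum):
    UNIVERSAL = "universal"              # valid for all languages
    QUASI_UNIVERSAL = "quasi-universal"  # valid for many languages
    EXPERIMENTAL = "experimental"        # optional


# Category labels of the v1.1 typology above, mapped to their tier.
CATEGORY_TIER = {
    "LVC.full": Tier.UNIVERSAL,
    "LVC.cause": Tier.UNIVERSAL,
    "VID": Tier.UNIVERSAL,
    "IRV": Tier.QUASI_UNIVERSAL,
    "VPC.full": Tier.QUASI_UNIVERSAL,
    "VPC.semi": Tier.QUASI_UNIVERSAL,
    "MVC": Tier.QUASI_UNIVERSAL,
    "IAV": Tier.EXPERIMENTAL,
}


def labels_in(tier):
    """All labels belonging to one tier."""
    return {label for label, t in CATEGORY_TIER.items() if t is tier}


def label_set(quasi_universal=(), experimental=False):
    """Labels available to one language: every universal label, the quasi-universal
    labels the language is assumed to have, and IAV if the language opts in."""
    unknown = set(quasi_universal) - labels_in(Tier.QUASI_UNIVERSAL)
    if unknown:
        raise ValueError(f"not quasi-universal labels: {sorted(unknown)}")
    labels = labels_in(Tier.UNIVERSAL) | set(quasi_universal)
    if experimental:
        labels |= labels_in(Tier.EXPERIMENTAL)
    return labels
```

For instance, `label_set(["IRV"])` gives the three universal labels plus IRV. Rejecting universal labels passed as quasi-universal keeps the two tiers from being mixed up.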

  10. PARSEME VMWE corpus and shared task (v. 1.0). Corpus 1.0 [Savary et al. (2018)]:
       Corpus     Sentences   Tokens      VMWEs    Licence
       Training   230,062     4,536,603   52,724   CC v4
       Testing    44,314      902,601     9,494
     Shared task 1.0 [Savary et al. (2017)] & 1.1: 7 systems, all languages covered, evaluation measures including VMWE variability.
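
The corpus figures can be turned into a few size ratios that make the two splits easier to compare. This is plain arithmetic on the numbers in the table. The `SPLITS` dictionary and the `densities` function are illustrative names.

```python
# Figures copied from the corpus 1.0 table above (all languages together).
SPLITS = {
    "training": {"sentences": 230_062, "tokens": 4_536_603, "vmwes": 52_724},
    "testing": {"sentences": 44_314, "tokens": 902_601, "vmwes": 9_494},
}


def densities(split):
    """Simple size ratios for one corpus split."""
    return {
        "tokens_per_sentence": split["tokens"] / split["sentences"],
        "vmwes_per_100_sentences": 100 * split["vmwes"] / split["sentences"],
        "vmwes_per_1000_tokens": 1000 * split["vmwes"] / split["tokens"],
    }


if __name__ == "__main__":
    for name, split in SPLITS.items():
        stats = densities(split)
        print(name, {key: round(value, 2) for key, value in stats.items()})
```

The ratios for the two parts are:

- Training: about 19.7 tokens per sentence, 22.9 VMWEs per 100 sentences and 11.6 VMWEs per 1,000 tokens.
- Testing: about 20.4 tokens per sentence, 21.4 VMWEs per 100 sentences and 10.5 VMWEs per 1,000 tokens.

In both parts this amounts to roughly one annotated VMWE every four to five sentences.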

  11. XMG [Crabbé et al. (2013), Petitjean et al. (2016)]. A language that is object-oriented (objects, classes, inheritance); declarative (grammaticality is defined in terms of constraints rather than procedures); notationally expressive (modularity, inheritance, conjunction/disjunction of tree fragments, namespaces); and extensible to new dimensions (semantics, frames, etc.), formalisms (IG, etc.) and linguistic principles (e.g. clitic ordering). A metagrammar compiler (one per target formalism, here FS-LTAG), i.e. a constraint solver that produces minimal tree models respecting the constraints.
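
The conjunction/disjunction of tree fragments can be illustrated with a toy model. In this model a class is a list of alternatives, and each alternative is the set of fragments to combine. This is a hypothetical Python sketch, not XMG syntax. Real XMG classes combine tree descriptions, and the compiler's constraint solver computes the minimal trees; here only fragment names are tracked. The fragment names are borrowed from the n0Vn1 examples in this talk, and the helpers `fragment`, `conj` and `disj` are illustrative.

```python
from itertools import product


def fragment(name):
    """A class made of a single tree fragment: one alternative with one fragment."""
    return [frozenset([name])]


def conj(*descriptions):
    """Conjunction: pick one alternative from each operand and merge the fragment sets."""
    return [frozenset().union(*choice) for choice in product(*descriptions)]


def disj(*descriptions):
    """Disjunction: keep the alternatives of every operand."""
    return [alternative for d in descriptions for alternative in d]


# Hypothetical class in the spirit of n0Vn1: active verb morphology and a canonical
# subject, with the object realised either canonically or as a clitic.
N0VN1_CLASS = conj(
    fragment("activeVerbMorphology"),
    fragment("CanonicalSubject"),
    disj(fragment("CanonicalObject"), fragment("CliticObject")),
)

if __name__ == "__main__":
    for alternative in N0VN1_CLASS:
        print(sorted(alternative))
```

The disjunction yields two alternatives, one per object realization. These correspond to the two n0Vn1 trees used for Jean prend la porte and Jean la prend. Each further disjunction multiplies the number of alternatives, which illustrates how a modest number of classes can compile into a much larger set of trees.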

  12. FrenchTAG: a French XMG metagrammar [Crabbé et al. (2013)]. XMG implementation of the syntactic TAG grammar of French by [Abeillé (2002)]: 285 XMG classes and 96 families (classes assigned to lexemes), compiled into 9,045 TAG trees; a toy lexicon of 555 lexemes, including 248 verbs. Example: Jean prend la porte ‘John takes the door’ ⇒ ‘John leaves because he is forced to’. The XMG metagrammar covers the literal reading (by compositionality) but not the idiomatic reading.

  13. Morphology (simplified)

  14. Lemmas: trivial classes. [Figure: elementary trees of the trivial classes: stddeterminer → auxiliary tree N(D⋄ N∗); propename → N⋄; noun → N⋄; CliticT → CL⋄.]

  15. From metagrammar to parsing: n0Vn1 (Jean prend la porte). [Figure: the metagrammar tree fragments inherited by n0Vn1 (activeVerbMorphology, CanonicalSubject, CanonicalObject), the resulting grammar tree over the nodes S, N↓, VN and V⋄, the derivation tree and the derived tree.]
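
The literal derivation of Jean prend la porte can be sketched with the two TAG operations: substitution at the N↓ nodes, and adjunction of the determiner's auxiliary tree at N∗. The sketch makes the following assumptions:

- The elementary trees are simplified: n0Vn1 is S(N↓ VN(V⋄) N↓), nouns and proper names are N⋄, and the determiner is N(D⋄ N∗).
- The feature structures of FS-LTAG are ignored.
- The derivation tree is not recorded.
- All names (`Node`, `substitute`, `adjoin`, `derive_literal`) are hypothetical.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    kind: str = "inner"        # "inner", "anchor" (⋄), "subst" (↓) or "foot" (∗)
    word: str | None = None    # lexical item under an anchor node
    children: list[Node] = field(default_factory=list)


def frontier(tree: Node) -> list[Node]:
    """Leaves of the tree, left to right."""
    if not tree.children:
        return [tree]
    return [leaf for child in tree.children for leaf in frontier(child)]


def anchor(tree: Node, word: str) -> Node:
    """Return a copy of an elementary tree with `word` placed under its anchor."""
    tree = copy.deepcopy(tree)
    for leaf in frontier(tree):
        if leaf.kind == "anchor":
            leaf.word = word
    return tree


def substitute(tree: Node, initial: Node) -> Node:
    """Fill the leftmost open substitution node labelled like `initial`; return that node."""
    for leaf in frontier(tree):
        if leaf.kind == "subst" and leaf.label == initial.label:
            leaf.kind, leaf.word, leaf.children = initial.kind, initial.word, initial.children
            return leaf
    raise ValueError(f"no open substitution node labelled {initial.label}")


def adjoin(target: Node, auxiliary: Node) -> None:
    """Adjoin `auxiliary` at `target`: the old subtree at `target` moves to the foot node."""
    auxiliary = copy.deepcopy(auxiliary)
    foot = next(leaf for leaf in frontier(auxiliary) if leaf.kind == "foot")
    if not auxiliary.label == foot.label == target.label:
        raise ValueError("root, foot and target labels must match")
    foot.kind, foot.word, foot.children = target.kind, target.word, target.children
    target.kind, target.word, target.children = auxiliary.kind, auxiliary.word, auxiliary.children


def sentence_of(tree: Node) -> str:
    """The derived sentence: the words on the frontier."""
    return " ".join(leaf.word for leaf in frontier(tree) if leaf.word)


# Simplified elementary trees (assumed shapes, no feature structures).
N0VN1 = Node("S", children=[
    Node("N", "subst"),
    Node("VN", children=[Node("V", "anchor")]),
    Node("N", "subst"),
])
NOUN = Node("N", "anchor")                                          # noun / proper name: N⋄
DET = Node("N", children=[Node("D", "anchor"), Node("N", "foot")])  # determiner: N(D⋄ N∗)


def derive_literal() -> Node:
    tree = anchor(N0VN1, "prend")
    substitute(tree, anchor(NOUN, "Jean"))          # subject
    obj = substitute(tree, anchor(NOUN, "porte"))   # object
    adjoin(obj, anchor(DET, "la"))                  # determiner adjoined on the object noun
    return tree


if __name__ == "__main__":
    print(sentence_of(derive_literal()))  # Jean prend la porte
```

Nothing in this derivation distinguishes the idiomatic reading ‘John leaves because he is forced to’ from the literal one. The grammar only assembles the compositional structure.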

  16. From metagrammar to parsing: n0Vn1 (Jean la prend). [Figure: the metagrammar tree fragments inherited by n0Vn1 (activeVerbMorphology, CanonicalSubject, CliticObject), the resulting grammar tree over the nodes S, N↓, VN, CL↓ and V⋄, the derivation tree and the derived tree.]

  17. XMG classes
