 
MWEs Annotation Lexical encoding Variability measure Conclusions Project Bibliography

Variability and fixedness of multiword expressions: multilingual annotation, lexical encoding and variability measures
Agata Savary
Université de Tours, France
LIMSI Seminar, 27 March 2018, Orsay

Multiword expressions
Word combinations which exhibit lexical, syntactic, semantic, pragmatic and/or statistical irregularities.
  Examples: all of a sudden, a hot dog, to pay a visit, to pull one’s leg
Encompass heterogeneous objects: idioms, compounds, light verb constructions, rhetorical figures, institutionalised phrases or named entities.
Pervasive feature: non-compositional semantics. The meaning of an MWE cannot be deduced from the meanings of its components and from its syntactic structure in a way deemed regular for the given language.
Varying degree of syntactic variability (flexibility), especially in verbal MWEs (VMWEs).

Morpho-syntactic variability of VMWEs
N tourne la page ‘N turns the page’ ⇒ ‘N stops dealing with sth.’

(More) regular properties
  Free subject: Jean/il/elle tourne la page ‘Jean turns the page’ ⇒ ‘Jean stops dealing with sth.’
  Verb inflection: Jean tournera la page ‘Jean will turn the page’ ⇒ ‘Jean will stop dealing with sth.’
  Noun modification: Jean a tourné la page de la politique ‘Jean turned the page of politics’ ⇒ ‘Jean stopped dealing with politics’
  Passive: La page a été tournée ‘The page was turned’ ⇒ ‘Someone stopped dealing with sth.’
  Determiner alternation: ?Jean tourne la/cette/une page ‘Jean turns the/this/a page’ ⇒ ‘Jean stops dealing with sth.’
  ...

(More) idiosyncratic properties
  Lexicalized verb and object: #Jean pivote la feuille ‘Jean rotates the sheet’
  No verb reduction: *La page que Jean tourne est une page ‘The page that Jean turns is a page’
  No noun inflection: ?Jean a tourné plusieurs pages ‘Jean turned several pages’
  ...

Scale-wise morpho-syntactic variability of VMWEs
[Table: expressions of the form N0 V (Det N)1 checked against nine properties: noun modification, determiner alternation, noun inflection, verb reduction, verb inflection, free subject, free object, free verb, passive. Only the number of marks per row survives, not their columns.]
  N prend la pomme ‘N takes the apple’: all nine properties marked
  N prend une décision ‘N takes a decision’ ⇒ ‘N makes a decision’: seven properties marked
  N tourne la page ‘N turns the page’ ⇒ ‘N stops dealing with sth.’: four properties marked, two questionable (?)
  N prend la porte ‘N takes the door’ ⇒ ‘N leaves (because forced)’: two properties marked
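
A variability profile of this kind can be written down as a small data structure. The sketch below encodes the judgments given for N tourne la page on the previous slide and, purely as an illustrative assumption, averages them into one number by counting an accepted property as 1, a questionable one (?) as 1/2 and a rejected one as 0. The property identifiers, the reading of # and * as "no", and the scoring scheme are all hypothetical; this is not a measure taken from the talk.

```python
from fractions import Fraction

# The nine properties of the scale (identifier spelling is an assumption).
PROPERTIES = [
    "noun_modification", "det_alternation", "noun_inflection",
    "verb_reduction", "verb_inflection", "free_subject",
    "free_object", "free_verb", "passive",
]

# Hypothetical weights: accepted = 1, questionable (?) = 1/2, rejected = 0.
WEIGHT = {"yes": Fraction(1), "?": Fraction(1, 2), "no": Fraction(0)}

# Judgments for "N tourne la page", read off the previous slide.
TOURNE_LA_PAGE = {
    "noun_modification": "yes",
    "det_alternation": "?",
    "noun_inflection": "?",
    "verb_reduction": "no",
    "verb_inflection": "yes",
    "free_subject": "yes",
    "free_object": "no",   # lexicalized object
    "free_verb": "no",     # lexicalized verb
    "passive": "yes",
}

# A free combination such as "N prend la pomme" accepts every property.
PREND_LA_POMME = {prop: "yes" for prop in PROPERTIES}


def variability_score(profile):
    """Average the weighted judgments over all nine properties."""
    missing = set(PROPERTIES) - set(profile)
    if missing:
        raise ValueError(f"profile lacks judgments for: {sorted(missing)}")
    return sum(WEIGHT[profile[prop]] for prop in PROPERTIES) / len(PROPERTIES)


if __name__ == "__main__":
    print(variability_score(TOURNE_LA_PAGE))
    print(variability_score(PREND_LA_POMME))
```

Exact fractions are used so that two profiles can be compared without floating-point noise; any other weighting of ? would be equally defensible.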

MWE variability/regularity: state of the art
lexical encoding of VMWE variability profile [Gross (1986), Mel’čuk et al. (1988)]
theory-neutral lexical encoding of VMWE irregularity [Grégoire (2010), Przepiórkowski et al. (2014), McShane et al. (2015)]
MWE flexibility as a matter of scale [Gross (1988)]
MWE flexibility as a result of decomposability [Nunberg et al. (1994), Sheinfux et al. (2017)]
VMWE encoding in computational grammars:
  HPSG [Sag et al. (2002), Copestake et al. (2002), Villavicencio et al. (2004), Bond et al. (2015), Herzig Sheinfux et al. (2015)]
  LFG [Attia (2006)]
  TAG (MWE-friendly formalism) [Abeillé and Schabes (1989), Abeillé and Schabes (1996), Vaidya et al. (2014), Lichte and Kallmeyer (2016)]
MWE variant conflation in NLP [Jacquemin (2001), Krstev et al. (2014)]
variability as a major challenge in NLP [Savary and Jacquemin (2003), Hachey et al. (2013), Constant et al. (2017)]
restricted variability as a hint in MWE identification [Fazly et al. (2009), Tsvetkov and Wintner (2014), Buljan and Šnajder (2017)]

MWE variability/regularity: challenges
annotation:
  few treebanks with a full-fledged VMWE annotation
  heterogeneous annotation practices [Rosén et al. (2015), Savary et al. (2018)]
lexical encoding:
  account for the irregularity of an MWE while avoiding redundancy
  share VMWE lexicons across resources
processing:
  measure VMWE variability [Fazly et al. (2009), Pasquer et al. (2018)]
  conflate VMWE variants [Pasquer et al. (2018)]

VMWE annotation in the PARSEME scientific network
Methodology
  22 languages (6 non-Indo-European),
  unified annotation guidelines as decision trees driven by linguistic tests (with examples in many languages),
  universal categories, with room for language-specific categories and tests,
  close links with Universal Dependencies treebanking.

VMWE variability in the PARSEME guidelines
Prototypical form: the head verb is in active voice and finite form; other lexicalized components depend either on the verb or on another lexicalized component.
  elle prend une décision ‘she takes a decision’ ⇒ ‘she makes a decision’
Meaning-preserving variants:
  analytical tenses: elle a pris une décision
  relative clauses: la décision qu’elle prend
  non-finite clauses: la décision prise, en prenant une décision
  diathesis alternation: la décision sera prise
  interposed modifiers: prendre une série de décisions
Canonical form: the prototypical or most neutral form that keeps the idiomatic reading
  elle prend une décision
  les carottes sont cuites ‘the carrots are cooked’ ⇒ ‘it is too late, the game is up’
Variant neutralization during annotation:
  texts contain VMWE variants,
  linguistic tests apply to the canonical form.
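
Variant neutralization can be illustrated with a deliberately naive sketch: occurrences whose lexicalized components share the same set of lemmas are grouped under one key. The lemma lists below are supplied by hand and the function name `conflate` is hypothetical; this is a lemma-set heuristic chosen for illustration, not the procedure of the PARSEME guidelines or of any cited system.

```python
from collections import defaultdict

# (surface form, lemmas of the lexicalized components), annotated by hand.
# The determiner is assumed not to be lexicalized in "prendre une décision".
OCCURRENCES = [
    ("elle prend une décision", ("prendre", "décision")),
    ("elle a pris une décision", ("prendre", "décision")),
    ("la décision sera prise", ("décision", "prendre")),
    ("prendre une série de décisions", ("prendre", "décision")),
    ("elle prend la porte", ("prendre", "porte")),
]


def conflate(occurrences):
    """Group surface forms by the unordered set of their lexicalized lemmas."""
    groups = defaultdict(list)
    for surface, lemmas in occurrences:
        # frozenset ignores word order, so passives and relatives fall together.
        groups[frozenset(lemmas)].append(surface)
    return dict(groups)


if __name__ == "__main__":
    for key, surfaces in conflate(OCCURRENCES).items():
        print(sorted(key), "->", surfaces)
```

A set of lemmas ignores both word order and inflection, which is enough for the variants listed above, but it would also merge literal and idiomatic occurrences of the same words.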

VMWE typology (v. 1.1)
Universal categories (valid for all languages):
  light verb constructions (LVCs)
    LVC.full: to give a lecture
    LVC.cause: to grant rights
  verbal idioms (VIDs): to call it a day
Quasi-universal categories (valid for many languages):
  inherently reflexive verbs (IRVs): (FR) s’évanouir ‘to faint’
  verb-particle constructions (VPCs)
    VPC.full: to do in ‘to kill’
    VPC.semi: to eat up ‘to eat completely’
  multi-verb constructions (MVCs): to let go
Experimental (optional) category:
  inherently adpositional verbs (IAVs): to come across sth/sb, to rely on sth/sb

PARSEME VMWE corpus and shared task (v. 1.0)
Corpus 1.0 [Savary et al. (2018)]
  Corpus     Sentences   Tokens      VMWEs    Licence
  Training   230,062     4,536,603   52,724   CC v4
  Testing    44,314      902,601     9,494
Shared task 1.0 [Savary et al. (2017)] & 1.1
  7 systems, all languages covered,
  evaluation measures including VMWE variability.
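
To give a feel for how dense the annotation is, the counts in the corpus table can be turned into two simple ratios. The sketch below is plain arithmetic on those counts; the function name `density` and the choice of ratios are illustrative assumptions, not statistics reported in the talk.

```python
# Counts copied from the corpus 1.0 table: (sentences, tokens, VMWEs).
CORPUS = {
    "Training": (230_062, 4_536_603, 52_724),
    "Testing": (44_314, 902_601, 9_494),
}


def density(sentences, tokens, vmwes):
    """Return (VMWEs per sentence, tokens per VMWE)."""
    return vmwes / sentences, tokens / vmwes


if __name__ == "__main__":
    for name, counts in CORPUS.items():
        per_sentence, tokens_per_vmwe = density(*counts)
        print(f"{name}: {per_sentence:.3f} VMWEs/sentence, "
              f"{tokens_per_vmwe:.1f} tokens/VMWE")
```

In both splits this comes out at roughly one annotated VMWE for every four to five sentences.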

XMG [Crabbé et al. (2013), Petitjean et al. (2016)]
A language that is:
  object-oriented: objects, classes, inheritance
  declarative: grammaticality is defined in terms of constraints rather than procedures
  notationally expressive: modularity, inheritance, conjunction/disjunction of tree fragments, namespaces
  extensible: to new dimensions (semantics, frames, etc.), formalisms (IG, etc.) and linguistic principles (e.g. clitic ordering)
A metagrammar compiler (one for each target language, here FS-LTAG)
  constraint solver: produces minimal tree models respecting the constraints

FrenchTAG: French XMG metagrammar [Crabbé et al. (2013)]
XMG implementation of the syntactic TAG grammar of French by [Abeillé (2002)]
285 XMG classes, 96 families (classes assigned to lexemes), compiled into 9045 TAG trees
toy lexicon of 555 lexemes, including 248 verbs
Example: Jean prend la porte ‘John takes the door’ ⇒ ‘John leaves because he is forced to’
  XMG covers literal readings (by compositionality)
  XMG does not cover idiomatic readings

Morphology (simplified)

Lemmas
Trivial classes
[Figure: arrows from the classes stddeterminer, propename, noun and CliticT to tree fragments; node labels in the figure: N, N⋄, N⋄, CL⋄, D⋄, N∗]

From metagrammar to parsing: n0Vn1 (Jean prend la porte)
Metagrammar tree fragments inherited by n0Vn1: activeVerbMorphology, CanonicalSubject, CanonicalObject
[Figure: the three tree fragments, the resulting grammar tree, the derivation tree and the derived tree; node labels include S, VN, N↓ and V⋄]

From metagrammar to parsing: n0Vn1 (Jean la prend)
Metagrammar tree fragments inherited by n0Vn1: activeVerbMorphology, CanonicalSubject, CliticObject
[Figure: the three tree fragments, the resulting grammar tree, the derivation tree and the derived tree; node labels include S, VN, N↓, CL↓ and V⋄]
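
The two n0Vn1 slides differ only in how the object is realized, which is the kind of alternation a metagrammar states once as a disjunction. The sketch below mimics that idea in Python: a family is a conjunction of slots, each slot a disjunction of fragment names, and expanding the family enumerates every combination. It is not XMG syntax and does no tree constraint solving; the slot layout and the function name `expand` are illustrative assumptions, and only the fragment names come from the slides.

```python
from itertools import product

# Conjunction of slots; each slot is a disjunction of fragment names.
N0VN1 = [
    ["activeVerbMorphology"],              # verb morphology
    ["CanonicalSubject"],                  # subject realization
    ["CanonicalObject", "CliticObject"],   # object realization
]


def expand(family):
    """Enumerate every choice of one fragment per slot."""
    return [tuple(choice) for choice in product(*family)]


if __name__ == "__main__":
    for fragments in expand(N0VN1):
        print(" + ".join(fragments))
```

Adding a further alternative to one slot multiplies the number of combinations, which is how a few hundred classes can compile into several thousand trees.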

XMG classes