IC1207 COST Action PARSEME: PARSing and Multi-Word Expressions

  1. IC1207 COST Action PARSEME: PARSing and Multi-Word Expressions. Towards linguistic precision and computational efficiency in natural language processing. Agata Savary, Linguistics Circle, University of Malta, 11 June 2013. Sections: COST, PARSEME, Working Groups, Management, Events.

  2. COST: an inter-governmental framework (founded in 1971) for the coordination of nationally funded European research, funded by FP7 via the ESF (European Science Foundation).

  3. What is a COST Action? Bottom-up approach: scientific challenges are defined by researchers. Topics: major challenges in the foundations of science. Objective: to overcome research fragmentation. COST supports cooperation and dissemination (meetings, workshops, short-term missions, training schools) and provides no direct research funding. It plays a precursor role for other European programmes. Typically a large number of partners is involved (about 20 countries); experts from non-COST countries are admitted (up to 4); important roles are given to Early-Stage Researchers (< PhD + 8 years). Budget: 129,000–156,000 euros per year for all partners. Proposal selectivity: 6%.

  4. COST instruments: Scientific/Working Group meetings; Workshops and Seminars; Short-Term Scientific Missions (STSMs), lasting from 2 weeks to 3/6 months; Training Schools; Dissemination.

  5. PARSEME: PARSing and Multi-word Expressions. General aim: increasing and enhancing the ICT support of the European multilingual heritage. Objectives: to put multilingualism in the focus of linguistic and technological studies; to establish a long-lasting collaboration of Natural Language Processing (NLP) experts within a cross-lingual, cross-theoretical and cross-methodological research network; to bridge the gap between linguistic precision and computational efficiency in NLP applications.

  7. Key problem: Multi-Word Expressions (MWEs). Example: "The prime time speech by first lady Michelle Obama set the house on fire. She made crystal clear which issues she took to heart, but she was preaching to the choir." Facts: MWEs are prevalent (40% of text items); MWEs are complex phenomena involving different levels of language (lexicon, syntax, meaning, ...); MWEs are still not sufficiently understood; MWEs are under-represented in language resources and tools; MWEs are hard to detect, understand, translate, etc.
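
To make the detection problem concrete, here is a minimal sketch of the simplest possible approach: greedy longest-match lookup of contiguous expressions in a hand-written lexicon, run on the example sentences from the slide. The lexicon `MWE_LEXICON` and the function `find_mwes` are hypothetical names introduced for illustration; they are not part of any PARSEME resource. The lookup uses surface forms only, so it already fails on inflected or discontinuous variants, which is one reason MWEs are hard to detect.

```python
from typing import List, Tuple

# Hypothetical toy lexicon: lower-cased surface forms only, no lemmatisation.
MWE_LEXICON = {
    ("prime", "time"),
    ("first", "lady"),
    ("set", "the", "house", "on", "fire"),
    ("crystal", "clear"),
    ("took", "to", "heart"),
    ("preaching", "to", "the", "choir"),
}
MAX_LEN = max(len(entry) for entry in MWE_LEXICON)


def find_mwes(tokens: List[str]) -> List[Tuple[int, int, str]]:
    """Greedy longest-match lookup of contiguous MWEs.

    Returns (start, end, text) triples with an exclusive end index.
    """
    lowered = [t.lower() for t in tokens]
    spans = []
    i = 0
    while i < len(lowered):
        match_len = 0
        # Try the longest candidate first, down to two tokens.
        for n in range(min(MAX_LEN, len(lowered) - i), 1, -1):
            if tuple(lowered[i:i + n]) in MWE_LEXICON:
                match_len = n
                break
        if match_len:
            spans.append((i, i + match_len, " ".join(tokens[i:i + match_len])))
            i += match_len
        else:
            i += 1
    return spans


text = (
    "The prime time speech by first lady Michelle Obama set the house on fire . "
    "She made crystal clear which issues she took to heart , "
    "but she was preaching to the choir ."
)
spans = find_mwes(text.split())
for start, end, mwe in spans:
    print(start, end, mwe)
```

A variant such as "takes it to heart" would not be found by this lookup, since it is neither an exact surface match nor contiguous.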

  9. Consortium: 75 members (official and unofficial); 25 COST countries; 3 experts from 2 non-COST countries (USA, Brazil); multidisciplinary experts: linguists, computational linguists, computer scientists, psycholinguists, industry practitioners, ...; different linguistic frameworks: CCG (Combinatory Categorial Grammar), DG (Dependency Grammar), HPSG (Head-driven Phrase Structure Grammar), LFG (Lexical Functional Grammar), TAG (Tree Adjoining Grammar), ...; two methodological trends: knowledge-based and data-driven.

  10. Languages: 23 languages from 9 European language families. Celtic: Gaelic. Germanic: English, Danish, Dutch, German, Icelandic, Norwegian, Swedish. Finno-Ugric: Estonian, Hungarian. Hellenic: Greek. Romance: French, Italian, Portuguese, Spanish. Semitic: Hebrew, Maltese. Slavic: Bulgarian, Czech, Polish, Serbian, Macedonian. Turkic: Turkish. Dialects: British vs. American English; Belgian vs. Swiss vs. France French; European vs. Brazilian Portuguese.

  11. Working Groups. WG1: Lexicon/Grammar Interface; WG2: Parsing Techniques for MWEs; WG3: Hybrid Parsing of MWEs; WG4: Annotating MWEs in Treebanks. Crossing barriers between different levels of linguistic processing, different linguistic frameworks, and different methodological frameworks. Each member expressed interest in at least 2 WGs (at the full proposal period).

  14. WG1: Lexicon/Grammar Interface. Challenge 1: simultaneously account for the fixed character of MWEs and their similarities to regular syntactic structures. (FR) Marie a pris le train de 18 heures (‘Marie took the 6 p.m. train’). ⇒ Marie a pris un train. Marie a pris un train de 6 heures de l’après-midi. (FR) Marie a pris un train de banlieue (‘a suburban train’). ⇒ Marie a pris un train. Marie a pris un *train de faubourg. (FR) Le gouvernement a pris un train de mesures (‘a series of measures’). ⇏ *Le gouvernement a pris un train. Le gouvernement a pris un *train d’moyens.

  15. WG1: Lexicon/Grammar Interface. Challenge 2: represent parsing phenomena at the lexicon level (agreement, discontinuity and free word order). (FR) assistant approvisionneur, assistants approvisionneurs, assistante approvisionneuse, assistantes approvisionneuses (agree in gender and number). (FR) bateau mouche, bateaux mouches (agree in number only). (EN) He has finally made up his bloody mind. (PL) bezwzględna większość, większość bezwzględna (‘absolute majority’). (PL) panna młoda, *młoda panna (‘bride’). Challenge 3: enrich existing lexicons and valence dictionaries with MWEs. Challenge 4: design cost-saving abstract models of MWEs’ properties, automatically mapped to different grammar formalisms.
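
As an illustration of Challenge 2, the following minimal sketch shows one way a lexicon entry could declare agreement and word-order constraints, which a checker then applies to a candidate token sequence. Everything here is an assumption made for illustration: the class names `MWEEntry` and `Token`, the feature names `gen` and `num`, and the lemma values are hypothetical and do not come from the presentation. Discontinuity (as in "made up his bloody mind") is not covered, and agreement is left unchecked for the Polish entries.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class MWEEntry:
    lemmas: Tuple[str, ...]   # component lemmas in canonical order
    agree: Tuple[str, ...]    # features that all components must share
    fixed_order: bool         # False if the components may be permuted


@dataclass
class Token:
    lemma: str
    feats: Dict[str, str]


def licenses(entry: MWEEntry, tokens: List[Token]) -> bool:
    """Check whether a contiguous token sequence satisfies a lexicon entry."""
    lemmas = tuple(t.lemma for t in tokens)
    if entry.fixed_order:
        if lemmas != entry.lemmas:
            return False
    elif sorted(lemmas) != sorted(entry.lemmas):
        return False
    # Every listed feature must be present and identical on all components.
    for feat in entry.agree:
        values = {t.feats.get(feat) for t in tokens}
        if len(values) != 1 or None in values:
            return False
    return True


# Hypothetical entries modelled on the slide's examples.
bateau_mouche = MWEEntry(("bateau", "mouche"), agree=("num",), fixed_order=True)
assistant = MWEEntry(("assistant", "approvisionneur"), agree=("gen", "num"), fixed_order=True)
panna_mloda = MWEEntry(("panna", "młody"), agree=(), fixed_order=True)
wiekszosc = MWEEntry(("większość", "bezwzględny"), agree=(), fixed_order=False)

# "bateaux mouches": genders differ, but only number agreement is required.
bateaux_mouches = [
    Token("bateau", {"gen": "m", "num": "pl"}),
    Token("mouche", {"gen": "f", "num": "pl"}),
]
print(licenses(bateau_mouche, bateaux_mouches))
```

Keeping the constraints declarative in the entry, rather than in parser code, is what would allow the same description to be mapped to several grammar formalisms, which is the direction Challenge 4 points to.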

  17. WG2: Parsing Techniques for MWEs. Challenge 1: design an interoperable MWE representation for different syntactic frameworks: HPSG, LFG, TAG, CCG, DG, ...

  18. WG2: Parsing Techniques for MWEs. Challenge 2: reduce the cost of grammar production. Challenge 3: enhance parsing speed and precision by reducing spurious ambiguity in MWEs. (FR) Il a lu ce livre d’un auteur étranger (‘He read this book by a foreign author’). (FR) Il a lu ce livre d’un seul coup (‘He read this book in one go’). Challenge 4: express the semantics of MWEs in parse structures.
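
One common way to cut the ambiguity shown in the two French sentences is to merge a known MWE into a single token before parsing, so that the parser never considers attaching "d'un seul coup" to "livre". The sketch below illustrates that pre-processing step only; it is an assumption about one possible technique, not the method proposed by the Action. The lexicon `ADVERBIAL_MWES`, the tag `ADV`, the placeholder tag `?` and the function `merge_mwes` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical lexicon: contiguous MWE (lower-cased) -> category of the merged token.
ADVERBIAL_MWES: Dict[Tuple[str, ...], str] = {
    ("d'un", "seul", "coup"): "ADV",
}


def merge_mwes(tokens: List[str]) -> List[Tuple[str, str]]:
    """Replace each known MWE by one (form, tag) pair.

    Tokens outside any MWE keep the placeholder tag '?', to be filled
    in by a regular tagger later.
    """
    out = []
    i = 0
    while i < len(tokens):
        for entry, tag in ADVERBIAL_MWES.items():
            n = len(entry)
            if tuple(t.lower() for t in tokens[i:i + n]) == entry:
                out.append(("_".join(tokens[i:i + n]), tag))
                i += n
                break
        else:
            # No lexicon entry starts at position i.
            out.append((tokens[i], "?"))
            i += 1
    return out


merged = merge_mwes("Il a lu ce livre d'un seul coup .".split())
unmerged = merge_mwes("Il a lu ce livre d'un auteur étranger .".split())
print(merged)
print(unmerged)
```

In the first sentence the prepositional sequence becomes a single adverbial token; in the second nothing is merged and the regular attachment to "livre" remains available.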
