MWE-WN Community discussion, Florence, August 2, 2019


  1. MWE-WN Community discussion, Florence, August 2, 2019

  2. Agenda
     ● Feedback from the joint workshop
     ● MWE-related announcements
     ● SIGLEX
     ● The future of the PARSEME corpus and shared task

  3. Feedback

  4. Feedback from bringing 2 communities together
     ● 2 communities
       ○ MWE workshop: organised by SIGLEX since 2003 (15th edition)
       ○ WordNet: 9 past Global WordNet Conferences
     ● MWE-WN 2019:
       ○ Research track:
         ■ 37 submissions: 35 on MWEs, 13 on WN
         ■ 20 selected papers (12 long, 8 short): 6 cover both topics
         ■ 54% selectivity rate
       ○ Dissemination track (for previously published papers):
         ■ 0 submissions
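The research-track figures on this slide can be cross-checked with a few lines of arithmetic. This is only a sanity check on the numbers quoted above. The overlap count assumes that every submission addressed at least one of the two topics, which the slide does not state.

```python
# Figures quoted on the slide.
submissions = 37
selected = 20          # 12 long + 8 short papers
on_mwe = 35
on_wn = 13

# Share of submissions that were selected.
selectivity = selected / submissions

# Inclusion-exclusion, ASSUMING each submission covers MWEs, WN, or both.
both_topics_submitted = on_mwe + on_wn - submissions

print(f"selectivity rate: {selectivity:.0%}")                      # 54%
print(f"submissions covering both topics: {both_topics_submitted}")  # 11
```

Under that assumption, 11 of the 37 submissions addressed both topics, against 6 of the 20 selected papers.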

  5. Feedback from participants
     ● Added value from bringing 2 communities together
     ● Future research directions
     ● How to further develop synergies?

  6. Announcements

  7. Phraseology and Multiword Expressions
     ● Book series at Language Science Press, Berlin
     ● Open access, collaborative proofreading
     ● Recently published
       ○ Yannick Parmentier, Jakub Waszczuk (eds.) Representation and parsing of multiword expressions: Current trends
     ● Published in 2018:
       ○ Manfred Sailer, Stella Markantonatou (eds.) Multiword expressions: Insights from a multi-lingual perspective
       ○ Stella Markantonatou, Carlos Ramisch, Agata Savary, Veronika Vincze (eds.) Multiword expressions at length and in depth: Extended papers from the MWE 2017 workshop
     ● 2 other books in the pipeline, new book proposals are welcome
     ● Project to establish a shared MWE bibliography attached to a typology of research questions (cf. LAW-MWE-CxG 2019 business meeting); contributors are welcome

  8. MWE research questions (slide from 2018)
     ● Motivations
       ○ The CL/NLP community is becoming increasingly engineering-oriented.
       ○ It is often hard to understand the underlying research issues, the theoretical hypotheses which the experimental science is trying to (in)validate.
       ○ See also Joakim Nivre's ACL 2017 presidential address (fast science vs. slow science)
     ● Aim: better formulate the research questions and hypotheses underlying the activities of the MWE community (see a draft)
     ● Objectives
       ○ Better understanding of the state of the art and perspectives of MWE research
       ○ Make MWE research more interesting
       ○ Lead the efforts of the community towards important challenges to be addressed
       ○ Pave the way towards convergences with other communities

  9. UD-PARSEME coordination
     ● MWE working group at UDs
     ● Dagstuhl Seminar "Universals of Linguistic Idiosyncrasy in Multilingual Computational Linguistics", 21-26 June 2020, Dagstuhl, Germany
       ○ Objectives
         ■ Theoretical: To deepen the understanding of language universals, and of linguistic idiosyncrasy in particular...
         ■ Practical: To harness idiosyncrasy in treebanking frameworks, in computationally tractable ways...
         ■ Networking: To promote a higher degree of convergence to universalism-driven initiatives...
     ● COST Action proposal UniDive (Universality, diversity and idiosyncrasy in language technology), to be submitted 5 Sept 2019

  10. Other announcements from the audience

  11. SIGLEX

  12. SIGLEX
      ● SIGLEX is expected to change its constitution soon
        ○ Fewer officers (4 + 2 section representatives, instead of 8)
        ○ Shorter mandate for section representatives (2 years instead of 3)
        ○ Double mandate for the 4 other officers (2+2 years)
        ○ Referendum about the changes
          ■ Email was sent to SIGLEX members on 4 May
          ■ Please vote by 5 August 2019!
      ● Elections to SIGLEX
        ○ To be run in fall 2019
        ○ Candidates needed for the MWE section representative position (2020-2022)
        ○ Candidates also welcome for a SIGLEX Vice-President and Vice-Secretary

  13. SIGLEX-MWE section
      ● The MWE section of SIGLEX also has a constitution and a Standing Committee
        ○ 1 elected representative
          ■ Agata Savary (2016-2019)
          ■ New representative to be elected in fall 2019
        ○ 4 nominated officers
          ■ (2018-2020): Jelena Mitrović, Carla Parra Escartín; remaining for 1 more year
          ■ (2017-2019): Francis Bond, Styliani Markantonatou; stepping down
          ■ Candidates needed for the 2 open positions (2-year term)
          ■ Conditions: be a member of the Section (and of SIGLEX) and have published research work on topics related to MWEs
          ■ Deadlines:
            ● Expressions of interest: 30 August
            ● Beginning of mandate: end of September

  14. SIGLEX-MWE section
      ● MWE 2020 workshop
        ○ Continue joint workshops with other communities?
        ○ UD-MWE workshop in 2020 (ACL?) or 2021 (EACL?)
          ● 2020 consistent with UD-PARSEME dynamics
          ● Two close UD-PARSEME events in 2020?
          ● A COST action can fund workshops in Europe (EACL 2021?) but not in the USA (ACL 2020?)
        ○ Other ideas for 2020?
          ■ Jelena: Rhetorical figures: metaphor, simile, irony (cf. Workshop on Figurative Language Processing)
          ■ Carla: Multilingual aspects of MWEs (lexicons, alignment, discovery, translation, ...)
          ■ Stella: Broadly understood idiomaticity, also in the use of single words
          ■ ...
        ○ The new SC will be in charge...

  15. PARSEME corpus and shared task

  16. PARSEME corpus
      ● PARSEME corpus edition 1.1 (Ramisch et al., 2018)
        ○ 20 languages, 6 million tokens, 80,000 verbal MWE annotations
        ○ Openly available on LINDAT/CLARIN
      ● Future developments
        ○ Unifying PARSEME and UD guidelines
        ○ Annotating new MWE categories (implies prior work on annotation guidelines)
          ■ Nominal MWEs:
            ● non-compositional NPs (hot dog),
            ● MW named entities (Red Sea),
            ● complex terms (recurrent neural network)
          ■ Adjectival MWEs: crystal clear, as busy as a bee
        ○ New languages (call for language leaders)
      ● Continuous corpus enhancements (regular releases)

  17. PARSEME shared task on weakly supervised VMWE identification?
      ● Objectives:
        ○ Boost performance on unseen data, cf. (Savary et al. 2019)
        ○ Boost MWE lexicon development
      ● Input data:
        ○ Closed track:
          ■ PARSEME training corpus
          ■ Large non-annotated corpus (parsed?)
          ■ Mechanism to project a lexicon on a raw corpus
          ■ Baseline system
        ○ Open track:
          ■ Closed track input
          ■ Any external data, including handmade MWE lexicons
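The slide does not say what the "mechanism to project a lexicon on a raw corpus" would look like. As one possible reading, the minimal sketch below matches lexicon entries against a lemmatised sentence regardless of word order, within a window. The function name `project_lexicon`, the `max_span` limit, and the representation of entries as tuples of lemmas are all assumptions made for illustration, not part of the proposal.

```python
from itertools import product

def project_lexicon(lemmas, lexicon, max_span=6):
    """Find candidate occurrences of lexicon entries in one lemmatised sentence.

    lemmas   -- list of token lemmas for the sentence
    lexicon  -- iterable of entries, each a non-empty tuple of component lemmas
    max_span -- hypothetical limit on the distance between first and last component
    """
    found = set()
    for entry in lexicon:
        # For each component lemma, the token positions where it occurs.
        positions = [[i for i, lem in enumerate(lemmas) if lem == comp]
                     for comp in entry]
        # Try every way of assigning one position to each component.
        for combo in product(*positions):
            uses_distinct_tokens = len(set(combo)) == len(combo)
            if uses_distinct_tokens and max(combo) - min(combo) < max_span:
                found.add((tuple(entry), tuple(sorted(combo))))
    return sorted(found)

# "The decision was finally made by her", lemmatised by hand.
sentence = ["the", "decision", "be", "finally", "make", "by", "she"]
lexicon = [("make", "decision"), ("take", "place")]
matches = project_lexicon(sentence, lexicon)
print(matches)  # [(('make', 'decision'), (1, 4))]
```

Matching on unordered lemmas lets the passive, discontinuous occurrence be found, but it also over-generates (literal readings, accidental co-occurrences), which is presumably why a parsed corpus is under consideration.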

  19. PARSEME shared task on weakly supervised VMWE identification?
      ● System output
        ○ Discovered (+pre-existing) lexicon
        ○ List of queries for projecting the lexicon on the test corpus
        ○ Identified MWEs on the test corpus
      ● Evaluation
        ○ F-measure on data unseen in the training corpus
        ○ Experimentally, (lexical, morphological, syntactic) diversity measure
        ○ Global F-measure
      ● When?
        ○ Culminating event at the MWE 2020 workshop?
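To make the two F-measures concrete, here is a minimal sketch of how a global score and an unseen-only score could be computed. It assumes that a VMWE counts as unseen when its multiset of lemmas is never annotated in the training corpus, and that predictions are filtered by the same criterion. The slide does not define either point, and the data structures and function names are hypothetical.

```python
def lemma_key(lemmas):
    # Order-insensitive key standing in for the multiset of lemmas.
    return tuple(sorted(lemmas))

def f_measure(gold, pred):
    # Exact-match F1 between two sets of annotated MWEs.
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def unseen_f_measure(train_lemma_sets, gold, pred):
    # Keep only MWEs whose lemma multiset was never annotated in training (assumption).
    seen = {lemma_key(lemmas) for lemmas in train_lemma_sets}
    gold_unseen = {mwe for mwe in gold if lemma_key(mwe[1]) not in seen}
    pred_unseen = {mwe for mwe in pred if lemma_key(mwe[1]) not in seen}
    return f_measure(gold_unseen, pred_unseen)

# Each MWE is (token_positions, lemmas); toy data for illustration only.
train = [("make", "decision")]
gold = {((1, 4), ("decision", "make")),
        ((7, 8), ("pay", "visit")),
        ((10, 11), ("take", "walk"))}
pred = {((1, 4), ("decision", "make")),
        ((7, 8), ("pay", "visit")),
        ((12, 13), ("have", "look"))}

global_f = f_measure(gold, pred)
unseen_f = unseen_f_measure(train, gold, pred)
print(f"global F = {global_f:.2f}, unseen F = {unseen_f:.2f}")
# global F = 0.67, unseen F = 0.50
```

In this toy case the seen expression is identified correctly and lifts the global score, while the unseen-only score isolates how well a system generalises beyond the training annotations.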
