MWE-WN Community discussion
Florence, August 2, 2019
Agenda
○ Feedback from the joint workshop
○ MWE-related announcements
○ SIGLEX
○ The future of the PARSEME corpus and shared task
○ MWE workshop - organised by SIGLEX since 2003 (15th edition)
○ WordNet - 9 past Global WordNet Conferences
○ Research track:
  ■ 37 submissions: 35 on MWEs, 13 on WN
  ■ 20 selected papers (12 long, 8 short): 6 cover both topics
  ■ 54% acceptance rate
○ Dissemination track (for previously published papers):
  ■ 0 submissions
○ Yannick Parmentier, Jakub Waszczuk (eds.) Representation and parsing of multiword expressions: Current trends
○ Manfred Sailer, Stella Markantonatou (eds.) Multiword expressions: Insights from a multi-lingual perspective
○ Stella Markantonatou, Carlos Ramisch, Agata Savary, Veronika Vincze (eds.) Multiword expressions at length and in depth: Extended papers from the MWE 2017 workshop
research questions (cf. LAW-MWE-CxG 2019 business meeting) - contributors are welcome
○ The CL/NLP community is becoming increasingly engineering-oriented.
○ It is often hard to understand the underlying research issues, the theoretical hypotheses which the experimental science is trying to (in)validate.
○ See also Joakim Nivre's ACL 2017 presidential address (fast science vs. slow science)
the activities of the MWE community - see a draft
○ Better understanding of the state of the art and perspectives of MWE research
○ Make MWE research more interesting
○ Lead the efforts of the community towards important challenges to be addressed
○ Pave the way towards convergences with other communities
Computational Linguistics", 21-26 June 2020, Dagstuhl, Germany
○ Objectives
  ■ Theoretical: To deepen the understanding of language universals, and of linguistic idiosyncrasy in particular...
  ■ Practical: To harness idiosyncrasy in treebanking frameworks, in computationally tractable ways...
  ■ Networking: To promote a higher degree of convergence to universalism-driven initiatives...
language technology) - to be submitted 5 Sept 2019
○ Fewer officers (4 + 2 section representatives, instead of 8)
○ Shorter mandate for section representatives (2 years instead of 3)
○ Double mandate for the 4 other officers (2+2 years)
○ Referendum about the changes
  ■ Email was sent to SIGLEX members on 4 May
  ■ Please vote by 5 August 2019!
○ To be run in fall 2019
○ Candidates needed for the MWE section representative position (2020-2022)
○ Candidates also welcome for a SIGLEX Vice-President and Vice-Secretary
Committee
○ 1 elected representative
  ■ Agata Savary (2016-2019)
  ■ new representative to be elected in fall 2019
○ 4 nominated officers
  ■ (2018-2020) Jelena Mitrović, Carla Parra Escartín; remaining for 1 more year
  ■ (2017-2019) Francis Bond, Styliani Markantonatou; stepping down
  ■ Candidates needed for the 2 open positions (2-year term)
  ■ Conditions: be a member of the Section (and of SIGLEX) and have published research work on topics related to MWEs
  ■ Deadlines:
○ Continue joint workshops with other communities?
○ UD-MWE workshop in 2020 (ACL?) or 2021 (EACL?)
(ACL 2020?)
○ Other ideas for 2020?
  ■ Jelena: Rhetorical figures: metaphor, simile, irony (cf. Workshop on Figurative Language Processing)
  ■ Carla: Multilingual aspects of MWEs (lexicons, alignment, discovery, translation, ...)
  ■ Stella: Idiomaticity in a broad sense, including in the use of single words
  ■ ...
○ The new SC will be in charge...
○ 20 languages, 6 million tokens, 80,000 verbal MWE annotations
○ Openly available on LINDAT/CLARIN:
○ Unifying PARSEME and UD guidelines
○ Annotating new MWE categories (implies prior work on annotation guidelines)
  ■ Nominal MWEs:
  ■ Adjectival MWEs: crystal clear, as busy as a bee
○ New languages (call for language leaders)
○ Boost performance on unseen data - cf. (Savary et al. 2019)
○ Boost MWE lexicon development
○ Closed track
  ■ PARSEME training corpus
  ■ Large non-annotated corpus (parsed?)
  ■ Mechanism to project a lexicon on a raw corpus
  ■ Baseline system
○ Open track:
  ■ Closed track input
  ■ Any external data, including handmade MWE lexicons
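One way to picture the "mechanism to project a lexicon on a raw corpus" is the minimal sketch below. It is only an illustration under stated assumptions, not the mechanism planned for the shared task: the corpus is assumed to be already lemmatized, a lexicon entry is assumed to be an ordered tuple of lemmas, and the names `project_entry`, `project_lexicon` and `max_gap` are hypothetical.

```python
from typing import List, Tuple

Entry = Tuple[str, ...]  # assumption: a lexicon entry is an ordered tuple of lemmas


def project_entry(lemmas: List[str], entry: Entry, max_gap: int = 2) -> List[Tuple[int, ...]]:
    """Find occurrences of `entry` in one lemmatized sentence.

    Components must appear in order, with at most `max_gap` intervening
    tokens between two consecutive components (hypothetical constraint).
    """
    matches = []
    for start, lemma in enumerate(lemmas):
        if lemma != entry[0]:
            continue
        positions = [start]
        for component in entry[1:]:
            prev = positions[-1]
            # Search the next component in a bounded window after the previous one.
            window = range(prev + 1, min(prev + 2 + max_gap, len(lemmas)))
            nxt = next((i for i in window if lemmas[i] == component), None)
            if nxt is None:
                break
            positions.append(nxt)
        else:
            # Reached only if every component was found.
            matches.append(tuple(positions))
    return matches


def project_lexicon(sentences: List[List[str]], lexicon: List[Entry], max_gap: int = 2):
    """Project every lexicon entry on every sentence of a lemmatized corpus."""
    hits = []
    for sent_id, lemmas in enumerate(sentences):
        for entry in lexicon:
            for positions in project_entry(lemmas, entry, max_gap):
                hits.append((sent_id, entry, positions))
    return hits


# Toy data, invented for illustration only.
sentences = [
    ["she", "take", "a", "quick", "decision", "."],
    ["he", "decide", "quickly", "."],
]
lexicon = [("take", "decision")]
hits = project_lexicon(sentences, lexicon)
```

On the toy data, the discontinuous occurrence "take ... decision" is found in the first sentence only. A realistic projection would also need to handle word-order variation and morphosyntactic constraints, which this sketch ignores.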
○ Discovered (+pre-existing) lexicon
○ List of queries for projecting the lexicon on the test corpus
○ Identified MWEs on the test corpus
○ F-measure on data unseen in the training corpus
○ Experimentally, (lexical, morphological, syntactic) diversity measure
○ Global F-measure
○ Culminating event at the MWE 2020 workshop?
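The difference between the global F-measure and the F-measure on unseen data can be sketched as follows. This is a hedged illustration, not the official evaluation script: it assumes exact-match scoring per MWE, and it assumes that an MWE counts as "unseen" when its sorted lemma tuple never occurs among the annotated MWEs of the training corpus. The names `Mwe`, `f_measure` and `unseen_only` are hypothetical.

```python
from typing import FrozenSet, Set, Tuple

# Assumption: an MWE occurrence is (sentence id, token positions, sorted lemmas).
Mwe = Tuple[int, FrozenSet[int], Tuple[str, ...]]


def f_measure(gold: Set[Mwe], pred: Set[Mwe]) -> float:
    """Exact-match F1 between gold and predicted MWE occurrences."""
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def unseen_only(mwes: Set[Mwe], train_lemma_keys: Set[Tuple[str, ...]]) -> Set[Mwe]:
    """Keep only MWEs whose lemma tuple was not annotated in the training corpus."""
    return {m for m in mwes if m[2] not in train_lemma_keys}


# Toy data, invented for illustration only.
train_lemma_keys = {("decision", "take")}
gold = {
    (0, frozenset({1, 4}), ("decision", "take")),
    (1, frozenset({1, 2, 3}), ("bucket", "kick", "the")),
}
pred = {
    (0, frozenset({1, 4}), ("decision", "take")),
}

global_f = f_measure(gold, pred)
unseen_f = f_measure(
    unseen_only(gold, train_lemma_keys),
    unseen_only(pred, train_lemma_keys),
)
```

In this toy case the system finds the MWE it saw in training and misses the unseen one, so the global score stays well above zero while the unseen score drops to zero. That gap is what a separate unseen-data metric is meant to expose.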