
SLIDE 1

MWE-WN Community discussion

Florence, August 2, 2019


SLIDE 2

Agenda

Feedback from the joint workshop
MWE-related announcements
SIGLEX
The future of the PARSEME corpus and shared task


SLIDE 3

Feedback


SLIDE 4

Feedback from bringing 2 communities together

2 communities

○ MWE workshop - organised by SIGLEX since 2003 (15th edition)
○ WordNet - 9 past Global WordNet Conferences

MWE-WN 2019:

○ Research track:
  ■ 37 submissions: 35 on MWEs, 13 on WN
  ■ 20 selected papers (12 long, 8 short), 6 of which cover both topics
  ■ 54% acceptance rate
○ Dissemination track (for previously published papers):
  ■ 0 submissions


SLIDE 5

Feedback from participants

Added value from bringing 2 communities together
Future research directions
How to further develop synergies?


SLIDE 6

Announcements


SLIDE 7

Phraseology and Multiword Expressions

Book series at Language Science Press, Berlin
Open access, collaborative proofreading
Recently published

○ Yannick Parmentier, Jakub Waszczuk (eds.) Representation and parsing of multiword expressions: Current trends

Published in 2018:

○ Manfred Sailer, Stella Markantonatou (eds.) Multiword expressions: Insights from a multi-lingual perspective
○ Stella Markantonatou, Carlos Ramisch, Agata Savary, Veronika Vincze (eds.) Multiword expressions at length and in depth: Extended papers from the MWE 2017 workshop

Two other books in the pipeline; new book proposals are welcome
Project to establish a shared MWE bibliography linked to a typology of research questions (cf. LAW-MWE-CxG 2019 business meeting); contributors are welcome


SLIDE 8

MWE research questions (slide from 2018)

Motivations

○ The CL/NLP community is becoming increasingly engineering-oriented.
○ It is often hard to understand the underlying research issues, i.e. the theoretical hypotheses that the experiments are trying to (in)validate.
○ See also Joakim Nivre's ACL 2017 presidential address (fast science vs. slow science)

Aim: better formulate the research questions and hypotheses underlying the activities of the MWE community - see a draft

Objectives

○ Better understand the state of the art and the perspectives of MWE research
○ Make MWE research more interesting
○ Direct the community's efforts towards important open challenges
○ Pave the way towards convergence with other communities


SLIDE 9

UD-PARSEME coordination

MWE working group within UD (Universal Dependencies)
Dagstuhl Seminar "Universals of Linguistic Idiosyncrasy in Multilingual Computational Linguistics", 21-26 June 2020, Dagstuhl, Germany

○ Objectives
  ■ Theoretical: To deepen the understanding of language universals, and of linguistic idiosyncrasy in particular...
  ■ Practical: To harness idiosyncrasy in treebanking frameworks, in computationally tractable ways...
  ■ Networking: To promote a higher degree of convergence to universalism-driven initiatives...

COST Action proposal UniDive (Universality, diversity and idiosyncrasy in language technology), to be submitted 5 Sept 2019


SLIDE 10

Other announcements from the audience


SLIDE 11

SIGLEX


SLIDE 12

SIGLEX

SIGLEX is expected to change its constitution soon

○ Fewer officers (4 + 2 section representatives, instead of 8)
○ Shorter mandate for section representatives (2 years instead of 3)
○ Double mandate for the 4 other officers (2+2 years)
○ Referendum about the changes
  ■ Email sent to SIGLEX members on 4 May
  ■ Please vote by 5 August 2019!

Elections to SIGLEX

○ To be run in fall 2019
○ Candidates needed for the MWE section representative position (2020-2022)
○ Candidates also welcome for SIGLEX Vice-President and Vice-Secretary


SLIDE 13

SIGLEX-MWE section

The MWE section of SIGLEX also has a constitution and a Standing Committee

○ 1 elected representative
  ■ Agata Savary (2016-2019)
  ■ New representative to be elected in fall 2019
○ 4 nominated officers
  ■ (2018-2020) Jelena Mitrović, Carla Parra Escartín: remaining for 1 more year
  ■ (2017-2019) Francis Bond, Styliani Markantonatou: stepping down
  ■ Candidates needed for the 2 open positions (2-year term)
  ■ Conditions: be a member of the Section (and of SIGLEX) and have published research on MWE-related topics
  ■ Deadlines:
      Expressions of interest: 30 August
      Beginning of mandate: end of September


SLIDE 14

SIGLEX-MWE section

MWE 2020 workshop

○ Continue joint workshops with other communities?
○ UD-MWE workshop in 2020 (ACL?) or 2021 (EACL?)
  ■ 2020 is consistent with the UD-PARSEME dynamics
  ■ Two close UD-PARSEME events in 2020?
  ■ A COST Action can fund workshops in Europe (EACL 2021?) but not in the USA (ACL 2020?)
○ Other ideas for 2020?
  ■ Jelena: Rhetorical figures: metaphor, simile, irony (cf. Workshop on Figurative Language Processing)
  ■ Carla: Multilingual aspects of MWEs (lexicons, alignment, discovery, translation, ...)
  ■ Stella: Idiomaticity in a broad sense, including in the use of single words
  ■ ...
○ The new SC will be in charge...


SLIDE 15

PARSEME corpus and shared task


SLIDE 16

PARSEME corpus

PARSEME corpus edition 1.1 (Ramisch et al., 2018)

○ 20 languages, 6 million tokens, 80,000 verbal MWE annotations
○ Openly available on LINDAT/CLARIN:

Future developments

○ Unifying PARSEME and UD guidelines
○ Annotating new MWE categories (requires prior work on annotation guidelines)
  ■ Nominal MWEs:
      non-compositional NPs (hot dog)
      multiword named entities (Red Sea)
      complex terms (recurrent neural network)
  ■ Adjectival MWEs: crystal clear, as busy as a bee
○ New languages (call for language leaders)

Continuous corpus enhancements (regular releases)


SLIDE 17

PARSEME shared task on weakly supervised VMWE identification?

Objectives:

○ Boost performance on unseen data (cf. Savary et al., 2019)
○ Boost MWE lexicon development

Input data:

○ Closed track
  ■ PARSEME training corpus
  ■ Large non-annotated corpus (parsed?)
  ■ Mechanism to project a lexicon on a raw corpus
  ■ Baseline system
○ Open track
  ■ Closed track input
  ■ Any external data, including handmade MWE lexicons
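The slide leaves the "mechanism to project a lexicon on a raw corpus" unspecified. A minimal, purely illustrative sketch, assuming a lexicon that maps the lemmas of each verbal MWE (in a fixed order) to a category label, is surface matching on a lemmatized sentence with a bounded gap between components. The function `project_lexicon`, its `max_gap` parameter and the lexicon format are hypothetical, not part of any PARSEME tool.

```python
from typing import Dict, List, Tuple

# Hypothetical lexicon format (assumption): VMWE lemmas in a fixed order,
# mapped to a category label such as "LVC.full" or "VID".
Lexicon = Dict[Tuple[str, ...], str]


def project_lexicon(lemmas: List[str], lexicon: Lexicon, max_gap: int = 3):
    """Return (token_indices, category) pairs for each lexicon entry whose
    components occur in the lemmatized sentence in lexicon order, with at most
    max_gap intervening tokens between consecutive components."""
    matches = []
    for components, category in lexicon.items():
        for start, lemma in enumerate(lemmas):
            if lemma != components[0]:
                continue
            indices = [start]
            for component in components[1:]:
                # Look for the next component within the allowed gap.
                window = range(indices[-1] + 1,
                               min(indices[-1] + 2 + max_gap, len(lemmas)))
                nxt = next((j for j in window if lemmas[j] == component), None)
                if nxt is None:
                    break
                indices.append(nxt)
            else:
                # Every component was found: record the match.
                matches.append((tuple(indices), category))
    return matches


lexicon = {("take", "walk"): "LVC.full", ("kick", "bucket"): "VID"}
sentence = ["she", "take", "a", "short", "walk", "yesterday"]  # lemmatized
print(project_lexicon(sentence, lexicon))             # [((1, 4), 'LVC.full')]
print(project_lexicon(sentence, lexicon, max_gap=1))  # []
```

Surface matching of this kind misses reordered or passivized occurrences (e.g. "the walk she took"); a parsed corpus would allow syntax-aware queries instead.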


SLIDE 18


SLIDE 19

PARSEME shared task on weakly supervised VMWE identification?

System output

○ Discovered (+ pre-existing) lexicon
○ List of queries for projecting the lexicon on the test corpus
○ Identified MWEs on the test corpus

Evaluation

○ F-measure on data unseen in the training corpus
○ Experimentally, a (lexical, morphological, syntactic) diversity measure
○ Global F-measure
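The proposed evaluation contrasts a global F-measure with one restricted to MWEs unseen in training. A minimal sketch, assuming exact-match scoring and assuming that an occurrence counts as "seen" when some training VMWE has the same multiset of lemmas; the tuple representation and the function names are illustrative, and the diversity measure is not covered.

```python
from typing import Iterable, Set, Tuple

# Illustrative representation of one VMWE occurrence (assumption):
# (sentence id, sorted token indices, lemma multiset as a sorted tuple).
Occurrence = Tuple[int, Tuple[int, ...], Tuple[str, ...]]


def lemma_key(lemmas: Iterable[str]) -> Tuple[str, ...]:
    """Represent a multiset of lemmas independently of word order."""
    return tuple(sorted(lemmas))


def f_measure(gold: Set[Occurrence], pred: Set[Occurrence]) -> float:
    """Exact-match F1: a prediction counts only if it equals a gold occurrence."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def unseen_f_measure(gold, pred, train_keys) -> float:
    """F1 restricted to occurrences whose lemma multiset never occurs in training."""
    gold_unseen = {o for o in gold if o[2] not in train_keys}
    pred_unseen = {o for o in pred if o[2] not in train_keys}
    return f_measure(gold_unseen, pred_unseen)


train_keys = {lemma_key(["take", "decision"])}
gold = {
    (0, (1, 3), lemma_key(["take", "decision"])),  # seen in training
    (1, (0, 2), lemma_key(["kick", "bucket"])),    # unseen
}
pred = gold | {(2, (4, 5), lemma_key(["take", "place"]))}  # one false positive

print(round(f_measure(gold, pred), 3))                     # global F: 0.8
print(round(unseen_f_measure(gold, pred, train_keys), 3))  # unseen F: 0.667
```

The gap between the two scores is what the shared task would aim to narrow: a system that only memorizes training VMWEs scores well globally but poorly on the unseen subset.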

When?

○ Culminating event at the MWE 2020 workshop?
