MWE-WN Community discussion, Florence, August 2, 2019


SLIDE 1

MWE-WN Community discussion

Florence, August 2, 2019

SLIDE 2

Agenda

  • Feedback from the joint workshop
  • MWE-related announcements
  • SIGLEX
  • The future of the PARSEME corpus and shared task

SLIDE 3

Feedback

SLIDE 4

Feedback from bringing 2 communities together

  • 2 communities
    ○ MWE workshop - organised by SIGLEX since 2003 (15th edition)
    ○ WordNet - 9 past Global WordNet Conferences
  • MWE-WN 2019:
    ○ Research track:
      ■ 37 submissions: 35 on MWEs, 13 on WN
      ■ 20 selected papers (12 long, 8 short): 6 cover both topics
      ■ 54% acceptance rate
    ○ Dissemination track (for previously published papers):
      ■ 0 submissions

SLIDE 5

Feedback from participants

  • Added value from bringing 2 communities together
  • Future research directions
  • How to further develop synergies?

SLIDE 6

Announcements

SLIDE 7

Phraseology and Multiword Expressions

  • Book series at Language Science Press, Berlin
  • Open access, collaborative proofreading
  • Recently published:
    ○ Yannick Parmentier, Jakub Waszczuk (eds.) Representation and parsing of multiword expressions: Current trends
  • Published in 2018:
    ○ Manfred Sailer, Stella Markantonatou (eds.) Multiword expressions: Insights from a multi-lingual perspective
    ○ Stella Markantonatou, Carlos Ramisch, Agata Savary, Veronika Vincze (eds.) Multiword expressions at length and in depth: Extended papers from the MWE 2017 workshop
  • 2 other books in the pipeline; new book proposals are welcome
  • Project to establish a shared MWE bibliography attached to a typology of research questions (cf. LAW-MWE-CxG 2019 business meeting) - contributors are welcome

SLIDE 8

MWE research questions (slide from 2018)

  • Motivations
    ○ The CL/NLP community is becoming increasingly engineering-oriented.
    ○ It is often hard to understand the underlying research issues, i.e. the theoretical hypotheses that the experimental science is trying to (in)validate.
    ○ See also Joakim Nivre's ACL 2017 presidential address (fast science vs. slow science)
  • Aim: better formulate the research questions and hypotheses underlying the activities of the MWE community - see a draft
  • Objectives
    ○ Better understand the state of the art and the perspectives of MWE research
    ○ Make MWE research more interesting
    ○ Steer the community's efforts towards the important challenges to be addressed
    ○ Pave the way towards convergence with other communities

SLIDE 9

UD-PARSEME coordination

  • MWE working group at UDs
  • Dagstuhl Seminar "Universals of Linguistic Idiosyncrasy in Multilingual Computational Linguistics", 21-26 June 2020, Dagstuhl, Germany
    ○ Objectives
      ■ Theoretical: To deepen the understanding of language universals, and of linguistic idiosyncrasy in particular...
      ■ Practical: To harness idiosyncrasy in treebanking frameworks, in computationally tractable ways...
      ■ Networking: To promote a higher degree of convergence to universalism-driven initiatives...
  • COST Action proposal UniDive (Universality, diversity and idiosyncrasy in language technology) - to be submitted 5 Sept 2019

SLIDE 10

Other announcements from the audience

SLIDE 11

SIGLEX

SLIDE 12

SIGLEX

  • SIGLEX is expected to change its constitution soon
    ○ Fewer officers (4 + 2 section representatives, instead of 8)
    ○ Shorter mandate for section representatives (2 years instead of 3)
    ○ Double mandate for the 4 other officers (2+2 years)
    ○ Referendum about the changes
      ■ Email was sent to SIGLEX members on 4 May
      ■ Please vote by 5 August 2019!
  • Elections to SIGLEX
    ○ To be run in fall 2019
    ○ Candidates needed for the MWE section representative position (2020-2022)
    ○ Candidates also welcome for a SIGLEX Vice-President and Vice-Secretary

SLIDE 13

SIGLEX-MWE section

  • The MWE section of SIGLEX also has a constitution and a Standing Committee
    ○ 1 elected representative
      ■ Agata Savary (2016-2019)
      ■ New representative to be elected in fall 2019
    ○ 4 nominated officers
      ■ (2018-2020): Jelena Mitrović, Carla Parra Escartín; remaining for 1 more year
      ■ (2017-2019): Francis Bond, Styliani Markantonatou; stepping down
      ■ Candidates needed for the 2 open positions (2-year term)
      ■ Conditions: be a member of the Section (and of SIGLEX) and have published research work on topics related to MWEs
      ■ Deadlines:
        • Expressions of interest: 30 August
        • Beginning of mandate: end of September

SLIDE 14

SIGLEX-MWE section

  • MWE 2020 workshop
    ○ Continue joint workshops with other communities?
    ○ UD-MWE workshop in 2020 (ACL?) or 2021 (EACL?)
      ■ 2020 consistent with UD-PARSEME dynamics
      ■ Two close UD-PARSEME events in 2020?
      ■ A COST action can fund workshops in Europe (EACL 2021?) but not in the USA (ACL 2020?)
    ○ Other ideas for 2020?
      ■ Jelena: Rhetorical figures: metaphor, simile, irony (cf. Workshop on Figurative Language Processing)
      ■ Carla: Multilingual aspects of MWEs (lexicons, alignment, discovery, translation,...)
      ■ Stella: Idiomaticity in a broad sense, also in the use of single words
      ■ ...
    ○ The new SC will be in charge...

SLIDE 15

PARSEME corpus and shared task

SLIDE 16

PARSEME corpus

  • PARSEME corpus edition 1.1 (Ramisch et al., 2018)
    ○ 20 languages, 6 million tokens, 80,000 verbal MWE annotations
    ○ Openly available on LINDAT/CLARIN
  • Future developments
    ○ Unifying PARSEME and UD guidelines
    ○ Annotating new MWE categories (implies prior work on annotation guidelines)
      ■ Nominal MWEs:
        • non-compositional NPs (hot dog),
        • MW named entities (Red Sea),
        • complex terms (recurrent neural network)
      ■ Adjectival MWEs: crystal clear, as busy as a bee
    ○ New languages (call for language leaders)
  • Continuous corpus enhancements (regular releases)

SLIDE 17

PARSEME shared task on weakly supervised VMWE identification?

  • Objectives:
    ○ Boost performance on unseen data - cf. (Savary et al. 2019)
    ○ Boost MWE lexicon development
  • Input data:
    ○ Closed track:
      ■ PARSEME training corpus
      ■ Large non-annotated corpus (parsed?)
      ■ Mechanism to project a lexicon on a raw corpus
      ■ Baseline system
    ○ Open track:
      ■ Closed track input
      ■ Any external data, including handmade MWE lexicons
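
The "mechanism to project a lexicon on a raw corpus" is not specified on the slide. One possible reading, given purely as an illustrative sketch, is to match each lexicon entry (taken here to be a set of lemmas) against a lemmatised sentence, allowing a few intervening tokens. The function name `project_lexicon`, the `max_gap` parameter and the lemma-set representation are all assumptions, not part of the proposal.

```python
from typing import FrozenSet, List, Set, Tuple

# Hypothetical representation: a lexicon entry is a set of lemmas,
# a match is (entry, positions of the matched tokens in the sentence).
Entry = FrozenSet[str]
Match = Tuple[Entry, FrozenSet[int]]


def project_lexicon(lemmas: List[str], lexicon: Set[Entry], max_gap: int = 2) -> Set[Match]:
    """Find lexicon entries whose lemmas all occur within a short window.

    The window spans len(entry) + max_gap tokens, so up to max_gap
    foreign tokens may sit between the components of an expression.
    """
    matches: Set[Match] = set()
    for entry in lexicon:
        window = len(entry) + max_gap
        for start, lemma in enumerate(lemmas):
            if lemma not in entry:
                continue
            # First position of each entry lemma inside the window opening at `start`.
            found = {}
            for i in range(start, min(start + window, len(lemmas))):
                if lemmas[i] in entry and lemmas[i] not in found:
                    found[lemmas[i]] = i
            if len(found) == len(entry):
                matches.add((entry, frozenset(found.values())))
    return matches


# Toy example (invented data): "he takes the decision to leave", lemmatised.
sentence = ["he", "take", "the", "decision", "to", "leave"]
lexicon = {frozenset({"take", "decision"})}
print(project_lexicon(sentence, lexicon))
```

A matcher this simple also returns literal or coincidental co-occurrences, so its output is better seen as a candidate list than as final annotations.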

SLIDE 18

SLIDE 19

PARSEME shared task on weakly supervised VMWE identification?

  • System output
    ○ Discovered (+pre-existing) lexicon
    ○ List of queries for projecting the lexicon on the test corpus
    ○ Identified MWEs on the test corpus
  • Evaluation
    ○ F-measure on data unseen in the training corpus
    ○ Experimentally, a (lexical, morphological, syntactic) diversity measure
    ○ Global F-measure
  • When?
    ○ Culminating event at the MWE 2020 workshop?
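
The two F-measures listed under Evaluation can be illustrated with a minimal sketch. It assumes that an MWE counts as "unseen" when its set of lemmas never occurs as an annotated MWE in the training corpus, and that a prediction is correct only if it covers exactly the same tokens as a gold annotation. Both assumptions, as well as the names `f_measure` and `unseen_f_measure`, are illustrative choices and not the official shared task definition.

```python
from typing import FrozenSet, List, Set, Tuple

# Hypothetical annotation format: (sentence id, token positions, lemmas).
Annotation = Tuple[int, FrozenSet[int], FrozenSet[str]]


def f_measure(gold: List[Annotation], pred: List[Annotation]) -> float:
    """Exact-match F1 over (sentence id, token positions) pairs."""
    gold_spans = {(sid, toks) for sid, toks, _ in gold}
    pred_spans = {(sid, toks) for sid, toks, _ in pred}
    true_pos = len(gold_spans & pred_spans)
    precision = true_pos / len(pred_spans) if pred_spans else 0.0
    recall = true_pos / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def unseen_f_measure(gold: List[Annotation], pred: List[Annotation],
                     train_lemma_sets: Set[FrozenSet[str]]) -> float:
    """F1 restricted to MWEs whose lemma set is absent from the training data."""
    unseen_gold = [a for a in gold if a[2] not in train_lemma_sets]
    unseen_pred = [a for a in pred if a[2] not in train_lemma_sets]
    return f_measure(unseen_gold, unseen_pred)


# Toy data (invented): one MWE seen in training, the others unseen.
train = {frozenset({"take", "place"})}
gold = [
    (1, frozenset({2, 3}), frozenset({"take", "place"})),
    (2, frozenset({0, 4}), frozenset({"pull", "leg"})),
    (3, frozenset({1, 2}), frozenset({"kick", "bucket"})),
]
pred = [
    (1, frozenset({2, 3}), frozenset({"take", "place"})),
    (2, frozenset({0, 4}), frozenset({"pull", "leg"})),
    (4, frozenset({5, 6}), frozenset({"make", "sense"})),
]
print(round(f_measure(gold, pred), 3))                # global F-measure: 0.667
print(round(unseen_f_measure(gold, pred, train), 3))  # unseen-only F-measure: 0.5
```

In the toy data the global score is higher than the unseen-only score because the expression already seen in training is identified correctly, which is the kind of gap the unseen-data objective targets.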
