Multiword expressions: Getting the taste of things to come (MWE 2017)

SLIDE 1

Multiword expressions: Getting the taste of things to come

MWE 2017 Workshop: Panel discussion

SLIDE 2

Outline

  • 1. Announcements
  • 2. SIGLEX MWE Section
  • 3. Shared Task

SLIDE 3

Announcements

SLIDE 4

Phraseology and Multiword Expressions (PMWE)

A new series with Language Science Press: http://langsci-press.org/catalog/series/pmwe

Editors

  • Agata Savary (University of Tours, Blois, France)
  • Manfred Sailer (Goethe University Frankfurt a.M., Germany)
  • Yannick Parmentier (University of Orléans, France)
  • Victoria Rosén (University of Bergen, Norway)
  • Mike Rosner (University of Malta, Malta)

SLIDE 5

Two volumes are about to be published with PMWE as a result of collaborative work within the IC1207 COST Action PARSEME.

  • “MWE representation and parsing”, Yannick Parmentier and Jakub Waszczuk (editors)
  • “Multiword Expressions: Insights from a Multi-lingual Perspective”, Manfred Sailer and Stella Markantonatou (editors)

SLIDE 6

New PMWE volume

...with extended versions of selected papers from:

  • EACL MWE 2017 (main track)
  • the shared task
  • the wider community

SLIDE 7

Other announcements?

SLIDE 8

SIGLEX-MWE Section

SLIDE 9

SIGLEX

  • SIGLEX = ACL Special Interest Group on the Lexicon

○ Organising and endorsing events:

■ *SEM, SemEval, MWE workshop, MUMTTT workshop

○ Adam Kilgarriff prize
○ 2 sections: SemEval, MWE

  • SIGLEX board:

○ 8 people elected for 3 years
○ One representative per section
○ Skype meeting every 3 months

SLIDE 10

SIGLEX-MWE Section

  • Currently about 210 members
  • New members still welcome
  • To join, subscribe to the mailing list:

○ multiword-expressions@lists.sourceforge.net

  • Natural follow-up of PARSEME:

○ Integration of PARSEME outcomes into a larger international context

  • Activities:

○ MWE workshop (yearly)
○ Stabilizing the MUMTTT workshop
○ Others (shared tasks, books, joint events with other SIGs)?

SLIDE 11

Need for SIGLEX-MWE core group

  • The MWE community is becoming large; it should no longer be led by a single person

  • An official core group is needed:

○ SIGLEX-MWE representative + 3-4 other people

  • Responsibilities:

○ naming the organizers of the annual MWE workshop (to be approved by the SIGLEX board)
○ animating the community
○ maintaining the website and the mailing list

SLIDE 12

SIGLEX-MWE core group - legitimacy

  • From nomination

○ By the SIGLEX board, on a proposal from the previous members (?)
○ Advantage: balance can be ensured (continents, language families, gender, age, CS/linguistics expertise, etc.)
○ Drawback: non-democratic principle

  • From elections

○ Advantages: democratic principle
○ Drawbacks:

■ Balance not ensured
■ The elected people may have problems working as a team

SLIDE 13

SIGLEX-MWE core group - mandate

  • 3 years, coinciding with the SIGLEX board mandate

+ Simplicity
- Transfer of experience not ensured

  • 2 years, 2+2 overlapping mandates

+ Transfer of competences ensured
+ Important for the shared task infrastructure
- More organizational effort, frequent elections/nominations

SLIDE 14

Brainstorming

  • Core group nomination vs. election
  • Criteria for nomination

○ working in the area
○ balance wrt. languages, continents, gender, CS/linguistics background
○ proposal by the SIGLEX-MWE representative, validation by the SIGLEX board

  • Instruments for election

○ As for the SIGLEX board (nominating officers, electronic vote, …)

  • Mandate duration
  • Different MWE workshop chairs year after year?
  • Next MWE workshop venue (preferably outside Europe)

SLIDE 15

Shared Task

SLIDE 16

Goals of this discussion

  • Discuss the achievements of the first shared task
  • Gather feedback from workshop attendees, especially shared task participants

○ What worked well?
○ What could have been better?

  • Present our ideas for the next edition(s)
  • Gather feedback and suggestions for the next edition(s)

SLIDE 17

Shared Task 1.0 (2017)

  • "Universal" guidelines for annotating verbal MWEs (VMWEs)
  • Freely available annotated corpora in 18 languages

○ 3 language families (Romance, Slavic, Germanic) + others
○ More than 60k annotated VMWEs across all languages

  • Task definition: identify which tokens are lexicalized components of a VMWE

○ Allowing discontinuities, overlap, and nesting

  • 7 VMWE identification systems submitted:

○ 6 in the closed track
○ 1 in the open track
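To make the task definition concrete, here is a minimal Python sketch; it is not the official shared-task file format or evaluation script. It assumes a hypothetical representation in which each VMWE is a category label plus the set of token indices of its lexicalized components, so discontinuous and overlapping VMWEs need no special encoding (nested ones fit the same scheme). The example uses the coordinated light-verb constructions in "He took a walk and a shower", where "took" is shared by two overlapping LVCs and each LVC is discontinuous. The scoring function `mwe_prf` is an assumed exact-match, MWE-based precision/recall/F1 that ignores categories; the measures used in the shared task may differ in detail.

```python
from typing import FrozenSet, Set, Tuple

# Hypothetical representation (an assumption, not the official file format):
# a VMWE is a category label plus the frozen set of 0-based indices of its
# lexicalized tokens. Gaps and shared tokens need no special encoding.
VMWE = Tuple[str, FrozenSet[int]]


def vmwe(category: str, *indices: int) -> VMWE:
    """Build a VMWE from a category label and its lexicalized token indices."""
    return (category, frozenset(indices))


def is_discontinuous(mwe: VMWE) -> bool:
    """True if non-member tokens occur between the first and last component."""
    idx = sorted(mwe[1])
    return bool(idx) and idx[-1] - idx[0] + 1 != len(idx)


def overlaps(a: VMWE, b: VMWE) -> bool:
    """True if two VMWEs share at least one token."""
    return bool(a[1] & b[1])


def mwe_prf(gold: Set[VMWE], pred: Set[VMWE]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 over token sets (categories ignored)."""
    gold_sets = {tokens for _, tokens in gold}
    pred_sets = {tokens for _, tokens in pred}
    correct = len(gold_sets & pred_sets)
    p = correct / len(pred_sets) if pred_sets else 0.0
    r = correct / len(gold_sets) if gold_sets else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


if __name__ == "__main__":
    tokens = ["He", "took", "a", "walk", "and", "a", "shower"]
    walk = vmwe("LVC", 1, 3)    # took ... walk
    shower = vmwe("LVC", 1, 6)  # took ... shower
    print(" ".join(tokens[i] for i in sorted(shower[1])))  # took shower
    print(is_discontinuous(walk), overlaps(walk, shower))  # True True
    p, r, f = mwe_prf({walk, shower}, {walk})  # a system finding only one LVC
    print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # P=1.00 R=0.50 F1=0.67
```

Ignoring categories keeps the sketch short; a category-aware variant would compare (category, token set) pairs instead of bare token sets.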

SLIDE 18

Shared Task 1.0 Achievements

  • We have produced a valuable new resource
  • We gained experience with "universal" guidelines
  • We have a large group of highly motivated contributors
  • We have the infrastructure in place

○ Work organization into languages, language groups, etc.
○ Dynamic guidelines with multilingual examples
○ Customizable annotation platform FLAT
○ Dedicated tools to verify coherence and silence (missed annotations)
○ File formats and evaluation tools
○ Communication tools: mailing lists, git issues, websites

SLIDE 19

Shared Task 1.0 - how can we improve?

  • Double annotation was possible only for a sample
  • Guidelines still have fuzzy areas

○ Definition of predicative nouns
○ Meaning shift for IReflV (inherently reflexive verbs)
○ ...

  • Cross-lingual homogenization, especially within language families
  • Small amount of annotated data for some languages
  • Development of in-house "adjudication" tools
  • Suggestions?

SLIDE 20

Next edition(s)

  • Shared task 1.1 (2018)

○ Extension of the first edition with additional and better data
○ Keep the focus on token-based identification of VMWEs
○ To be submitted to SemEval 2018

  • Shared task 2.0 (2019)

○ New task definition
○ Extension to new MWE categories
○ To be submitted to CoNLL 2019 (?)

SLIDE 21

Shared Task 1.1 (2018)

  • Cover new languages

○ English, Asian languages: Japanese, Chinese, Korean, Hindi
○ Other languages?

  • Enhanced guidelines

○ Intensive use of the OTH category in some languages
○ Creation of language-specific categories (e.g. compound verbs)
○ Reformulation and clarification of LVC tests (see Issues)

  • Enhanced annotation quality

○ Double annotation and/or mandatory coherence check

  • Add missing CoNLL-U files with better dependencies
  • Extend corpora, especially for "small" languages
  • Annotate new test sets for the shared task evaluation

SLIDE 22

Shared Task 2.0 (2019)

  • Extension of the task

○ Joint parsing and MWE identification
○ MWE and named entity identification

  • Cover other MWE categories, not only verbal

○ Adjectival, adverbial, nominal, terms, similes

  • Other ideas?

SLIDE 23

Who wants to join or pursue the adventure?

  • Annotators?
  • Language leaders?
  • Language group leaders?
  • Technical experts?
  • Coordinators?

Spread the word!

SLIDE 24

Thank you!
