SemEval-2013 Task 13: Word Sense Induction for Graded and Non-Graded Senses
SLIDE 1

SemEval-2013 Task 13: Word Sense Induction for Graded and Non-Graded Senses

David Jurgens
Dipartimento di Informatica, Sapienza Università di Roma
jurgens@di.uniroma1.it

Ioannis Klapaftis
Search Technology Center Europe, Microsoft
ioannisk@microsoft.com

SLIDE 2
  • Introduction
  • Task Overview
  • Data
  • Evaluation
  • Results
SLIDE 3

Which meaning of the word is being used?

John sat on the chair.

  • 1. a seat for one person, with a support for the back
  • 2. the position of professor
  • 3. the officer who presides at the meetings of an organization
SLIDE 4

Which meaning of the word is being used?

John sat on the chair.

  • 1. a seat for one person, with a support for the back
  • 2. the position of professor
  • 3. the officer who presides at the meetings of an organization

This is the problem of Word Sense Disambiguation (WSD)

SLIDE 5

What are the meanings of a word?

  • It was dark outside
  • Her dress was a dark green
  • We didn’t ask what dark purpose the knife was for
  • It was too dark to see
  • I light candles when it gets dark
  • These are some dark glasses
  • The dark blue clashed with the yellow
  • The project was made with dark designs

SLIDE 6

What are the meanings of a word?

  • It was dark outside
  • Her dress was a dark green
  • We didn’t ask what dark purpose the knife was for
  • It was too dark to see
  • I light candles when it gets dark
  • These are some dark glasses
  • The dark blue clashed with the yellow
  • The project was made with dark designs

This is the problem of Word Sense Induction (WSI)

SLIDE 7
  • Introduction
  • Task Overview
  • Data
  • Evaluation
  • Results
SLIDE 8

Task 13 Overview

WSD systems either induce senses or use WordNet; systems and lexicographers annotate the same text, and we measure the similarity of the annotations

SLIDE 9

Why another WSD/WSI task?

SLIDE 10

Why another WSD/WSI task?

  • Application-based (Task 11)
  • Annotation-focused (this task)

SLIDE 11

WSD Evaluation is tied to Inter-Annotator Agreement (IAA)

If lexicographers can’t agree on which meaning is present, WSD systems will do no better.

SLIDE 12

Why might humans not agree?

SLIDE 13

He struck them with full force.

SLIDE 14

He struck them with full force.

strike#v#1 “deliver a sharp blow”

He’s probably fighting someone!

SLIDE 15

He struck them with full force.

strike#v#10 “produce by manipulating keys”

He’s clearly playing a piano!

SLIDE 16

He struck them with full force.

strike#v#19 “form by stamping”

I thought he was minting coins the old-fashioned way

SLIDE 17

He struck them with full force.

  • strike#v#1 “deliver a sharp blow”
  • strike#v#10 “produce by manipulating keys”
  • strike#v#19 “form by stamping”

Only one sense is correct, but contextual ambiguity makes it impossible to determine which one.

SLIDE 18

She handed the paper to her professor

SLIDE 19
  • paper#n#1 - a material made of cellulose
  • paper#n#2 - an essay or assignment

She handed the paper to her professor

Multiple, mutually compatible meanings

SLIDE 20
  • paper#n#1 - a material made of cellulose (a physical property)
  • paper#n#2 - an essay or assignment

She handed the paper to her professor

Multiple, mutually compatible meanings

SLIDE 21

Multiple, mutually compatible meanings

  • paper#n#1 - a material made of cellulose (a physical property)
  • paper#n#2 - an essay or assignment (a functional property)

She handed the paper to her professor

SLIDE 22

Parallel literal and metaphoric interpretations

  • dark#a#1 – devoid of or deficient in light or brightness; shadowed or black
  • dark#a#5 – secret

We commemorate our births from out of the dark centers of women
SLIDE 23

Annotators will use multiple senses if you let them

  • Véronis (1998)
  • Murray and Green (2004)
  • Erk et al. (2009, 2012)
  • Jurgens (2012)
  • Passonneau et al. (2012)
  • Navigli et al. (2013) - Task 12
  • Korkontzelos et al. (2013) - Task 5
SLIDE 24

New in Task 13: More Ambiguity!

WSD systems either induce senses or use WordNet; systems and lexicographers annotate the same text, and we measure the similarity of the annotations

SLIDE 25

Task 13 models explicitly annotating instances with...

  • Ambiguity
  • Non-exclusive property-based senses in the sense inventory
  • Concurrent literal and metaphoric interpretations

SLIDE 26

Task 13 annotation has lexicographers and WSD systems use multiple senses with weights

The student handed her paper to the professor

SLIDE 27
  • paper%1:10:01:: – an essay (Definitely! 100%)
  • paper%1:27:00:: – a material made of cellulose pulp

The student handed her paper to the professor

Task 13 annotation has lexicographers and WSD systems use multiple senses with weights

SLIDE 28
  • paper%1:10:01:: – an essay (Definitely! 100%)
  • paper%1:27:00:: – a material made of cellulose pulp (Sort of? 30%)

The student handed her paper to the professor

Task 13 annotation has lexicographers and WSD systems use multiple senses with weights

SLIDE 29

Potential Applications

  • Identifying “less bad” translations in ambiguous contexts
  • Potentially preserving ambiguity across translations
  • Detecting poetic or figurative usages
  • Providing more accurate evaluations when WSD systems detect multiple senses

SLIDE 30
  • Introduction
  • Task Overview
  • Data
  • Evaluation
  • Results
SLIDE 31

Task 13 Data

  • Drawn from the Open ANC
  • Both written and spoken
  • 50 target lemmas: 20 nouns, 20 verbs, 10 adjectives
  • 4,664 instances total
SLIDE 32

Annotation Process

  1. Use methods from Jurgens (2013) to get MTurk annotations

SLIDE 33

Annotation Process

  1. Use methods from Jurgens (2013) to get MTurk annotations
  2. Achieve high (> 0.8) agreement

SLIDE 34

Annotation Process

  1. Use methods from Jurgens (2013) to get MTurk annotations
  2. Achieve high (> 0.8) agreement
  3. Analyze annotations and discover Turkers are agreeing but are also wrong

SLIDE 35

Annotation Process

  1. Use methods from Jurgens (2013) to get MTurk annotations
  2. Achieve high (> 0.8) agreement
  3. Analyze annotations and discover Turkers are agreeing but are also wrong
  4. Annotate the data ourselves

SLIDE 36

Annotation Setup

  • Rate the applicability of each sense on a scale from one to five
  • One indicates the sense doesn’t apply
  • Five indicates the sense exactly applies (a possible rating-to-weight rescaling is sketched below)
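The slides do not show how these 1–5 ratings become the [0, 1] applicability weights used later in the evaluation, so the linear rescaling below is purely an illustrative assumption:

```python
def rating_to_weight(rating: int) -> float:
    """Assumed linear rescaling of a 1-5 applicability rating to [0, 1]:
    1 -> 0.0 ("doesn't apply"), 5 -> 1.0 ("exactly applies")."""
    return (rating - 1) / 4

print([rating_to_weight(r) for r in (1, 3, 5)])  # [0.0, 0.5, 1.0]
```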
SLIDE 37

Multiple sense annotation rates

[Chart: senses per instance (axis 1–1.4) by genre. Spoken: face-to-face, telephone. Written: fiction, journal, letter, non-fiction, technical, travel guides.]

SLIDE 38
  • Introduction
  • Task Overview
  • Data
  • Evaluation
  • Results
SLIDE 39

Evaluating WSI and WSD Systems

  • Lexicographer Evaluation
  • WSD Evaluation

SLIDE 40

WSI Evaluations

  • It was dark outside
  • Her dress was a dark green
  • We didn’t ask what dark purpose the knife was for

SLIDE 41

WSI Evaluations

  • It was dark outside
  • Her dress was a dark green
  • We didn’t ask what dark purpose the knife was for
  • It was too dark to see
  • I light candles when it gets dark
  • Dark nights and short days
  • Make it dark red
  • These are some dark glasses
  • The dark blue clashed with the yellow
  • The project was made with dark designs
  • He had that dark look in his eyes

SLIDE 42

WSI Evaluations

[Same usages as SLIDE 41.]

SLIDE 43

WSI Evaluations

The project was made with dark designs

Lexicographer

SLIDE 44

WSI Evaluations

The project was made with dark designs

Lexicographer vs. WSI System

SLIDE 45

WSI Evaluations

The project was made with dark designs

Lexicographer vs. WSI System: how similar are the clusters of usages?

SLIDE 46

The complication of fuzzy clusters

Lexicographer vs. WSI System

SLIDE 47

The complication of fuzzy clusters

Lexicographer vs. WSI System: clusters may be overlapping (an instance belongs to several clusters) and have partial membership (graded weights)

SLIDE 48

Evaluation 1: Fuzzy B-Cubed

Lexicographer vs. WSI System: how similar are the clusters of this item in both solutions? (See the sketch below.)
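As a rough illustration of the idea, here is a minimal B-Cubed sketch for overlapping (but not graded) clusterings, in the style of Amigó et al.'s (2009) extended B-Cubed; the task's fuzzy variant additionally handles partial membership, and the item and cluster names are invented:

```python
def bcubed(gold, system):
    """Extended B-Cubed precision/recall for overlapping clusterings.
    gold and system map each item to the set of clusters it belongs to."""
    items = list(gold)
    prec = rec = 0.0
    for e in items:
        # Items sharing a system (resp. gold) cluster with e; both lists
        # are non-empty because e always shares its own clusters.
        p_pairs = [e2 for e2 in items if system[e] & system[e2]]
        r_pairs = [e2 for e2 in items if gold[e] & gold[e2]]
        prec += sum(min(len(system[e] & system[e2]), len(gold[e] & gold[e2]))
                    / len(system[e] & system[e2]) for e2 in p_pairs) / len(p_pairs)
        rec += sum(min(len(system[e] & system[e2]), len(gold[e] & gold[e2]))
                   / len(gold[e] & gold[e2]) for e2 in r_pairs) / len(r_pairs)
    return prec / len(items), rec / len(items)

# Gold puts usage i2 in both senses; the induced solution does not.
gold = {"i1": {"literal"}, "i2": {"literal", "metaphor"}, "i3": {"metaphor"}}
induced = {"i1": {"c1"}, "i2": {"c1"}, "i3": {"c2"}}
print(bcubed(gold, induced))  # (1.0, 0.666...)
```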

SLIDE 49

Evaluation 2: Fuzzy Normalized Mutual Information

Lexicographer vs. WSI System: how much information does this cluster give us about the cluster(s) of its items in the other solution? (See the sketch below.)
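For intuition only, the hard-clustering base case of NMI can be computed with scikit-learn; the task's measure generalizes mutual information to fuzzy, overlapping covers, which this snippet does not attempt:

```python
from sklearn.metrics import normalized_mutual_info_score

# Cluster ids for six usages under the gold standard and an induced
# solution (hard, non-overlapping labels only).
gold_labels = [0, 0, 1, 1, 2, 2]
induced_labels = [0, 0, 0, 1, 2, 2]
print(normalized_mutual_info_score(gold_labels, induced_labels))
```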
SLIDE 50

Why two measures?

  • B-Cubed: performance with the same sense distribution
  • NMI: performance independent of the sense distribution

SLIDE 51

WSD Evaluations

SLIDE 52

WSD Evaluations

WSD systems either induce senses or use WordNet
SLIDE 53

WSD Evaluations

WSD systems either induce senses or use WordNet. For WSI systems, learn a mapping function that converts an induced labeling to a WordNet labeling:

  • 80% used to learn the mapping
  • 20% used for testing
  • Used the Jurgens (2012) method for mapping

(A simplified sketch of the mapping step follows.)
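In the sketch below, each induced cluster is simply sent to the gold sense it most often co-occurs with in the 80% split, whereas the task used the graded mapping method of Jurgens (2012); the cluster ids and splits are invented for the example:

```python
from collections import Counter, defaultdict

def learn_mapping(pairs):
    """Map each induced cluster id to the gold WordNet sense it most
    often co-occurs with in the mapping split (a most-frequent-sense
    stand-in for the graded mapping of Jurgens, 2012)."""
    counts = defaultdict(Counter)
    for cluster, sense in pairs:
        counts[cluster][sense] += 1
    return {c: cnt.most_common(1)[0][0] for c, cnt in counts.items()}

# 80% split: (induced cluster, gold sense key) for each instance.
mapping_split = [("c1", "paper%1:10:01::"), ("c1", "paper%1:10:01::"),
                 ("c2", "paper%1:27:00::")]
mapping = learn_mapping(mapping_split)

# 20% split: convert the induced labeling into a WordNet labeling.
print([mapping[c] for c in ["c1", "c2", "c1"]])
```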

SLIDE 54

WSD Evaluations

  1. Which senses apply?
  2. Which senses apply more?
  3. How much does each sense apply?

SLIDE 55

WSD Evaluations

1. Which senses apply?

Gold = {wn1, wn2}, Test = {wn1}

Jaccard Index = |Gold ∩ Test| / |Gold ∪ Test| (sketched below)
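A minimal sketch of this detection measure, using the slide's example sets (the sense names are illustrative):

```python
def jaccard_index(gold: set, test: set) -> float:
    """Jaccard index over the sets of applicable senses:
    |Gold intersect Test| / |Gold union Test|."""
    if not gold and not test:
        return 1.0
    return len(gold & test) / len(gold | test)

print(jaccard_index({"wn1", "wn2"}, {"wn1"}))  # 0.5
```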

SLIDE 56

WSD Evaluations

2. Which senses apply more?

Gold = {wn1: 0.5, wn2: 1.0, wn3: 0.9}, i.e. wn2 > wn3 > wn1
Test = {wn1: 0.6, wn2: 1.0}, i.e. wn2 > wn1 > wn3

Scored with Kendall’s tau similarity with positional weighting (sketched below)
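A sketch of the ranking comparison using SciPy's weighted Kendall's tau, whose default hyperbolic positional weigher is a stand-in rather than the task's exact scorer; the [-1, 1] correlation is rescaled to a [0, 1] similarity:

```python
from scipy.stats import weightedtau

gold = {"wn1": 0.5, "wn2": 1.0, "wn3": 0.9}  # ranking: wn2 > wn3 > wn1
test = {"wn1": 0.6, "wn2": 1.0}              # ranking: wn2 > wn1 > wn3

# Align both weight vectors over the union of senses (missing = 0).
senses = sorted(set(gold) | set(test))
g = [gold.get(s, 0.0) for s in senses]
t = [test.get(s, 0.0) for s in senses]

# Positional weighting: disagreements near the top of the ranking
# cost more than disagreements near the bottom.
tau = weightedtau(g, t).correlation
print((tau + 1) / 2)
```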

SLIDE 57

WSD Evaluations

3. How much does each sense apply?

Scored with Weighted Normalized Discounted Cumulative Gain (WNDCG; sketched below)
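A sketch of the core NDCG computation over graded sense weights, reusing the previous slide's example values; the task's WNDCG adds weighting on top of this skeleton, which is not attempted here:

```python
import math

def ndcg(gold: dict, test: dict) -> float:
    """NDCG with gold applicability weights as gains: rank senses by the
    system's weights, discount by position, normalize by the ideal rank."""
    senses = set(gold) | set(test)
    system_order = sorted(senses, key=lambda s: test.get(s, 0.0), reverse=True)
    ideal_order = sorted(senses, key=lambda s: gold.get(s, 0.0), reverse=True)

    def dcg(order):
        return sum(gold.get(s, 0.0) / math.log2(i + 2)
                   for i, s in enumerate(order))

    return dcg(system_order) / dcg(ideal_order)

gold = {"wn1": 0.5, "wn2": 1.0, "wn3": 0.9}
test = {"wn1": 0.6, "wn2": 1.0}
print(round(ndcg(gold, test), 3))  # 0.971
```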

SLIDE 58

WSD Evaluations

  • All measures are bounded in [0,1]

Answering 3 instances with scores 1, 0.9, 0.8 gives Avg: 0.9; answering 4 instances with scores 1, 0.8, 0.8, 0.7 gives Avg: 0.825

SLIDE 59

WSD Evaluations

  • All measures are bounded in [0,1]
  • Extend recall to be the average across all instances, answered or not (see the sketch below)

Answering 3 of 4 instances with scores 1, 0.9, 0.8 gives Avg: 0.9 but Recall: 2.7/4 = 0.675; answering all 4 with scores 1, 0.8, 0.8, 0.7 gives Avg: 0.825 and Recall: 0.825
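In numbers, with the slide's scores (unanswered instances count as 0):

```python
def average_and_recall(scores, n_instances):
    """Average over answered instances vs. recall over all instances."""
    return sum(scores) / len(scores), sum(scores) / n_instances

print(average_and_recall([1, 0.9, 0.8], 4))       # ~(0.9, 0.675)
print(average_and_recall([1, 0.8, 0.8, 0.7], 4))  # ~(0.825, 0.825)
```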

SLIDE 60

Teams

  • AI-KU (WSI): Lexical Substitution + Clustering

SLIDE 61

Teams

  • AI-KU (WSI): Lexical Substitution + Clustering
  • Unimelb (WSI): Topic Modeling

SLIDE 62

Teams

  • AI-KU (WSI): Lexical Substitution + Clustering
  • Unimelb (WSI): Topic Modeling
  • UoS (WSI): Graph Clustering

SLIDE 63

Teams

  • AI-KU (WSI): Lexical Substitution + Clustering
  • Unimelb (WSI): Topic Modeling
  • UoS (WSI): Graph Clustering
  • La Sapienza (WSD): PageRank over the WordNet graph

SLIDE 64

WSI Baselines

  • One cluster per instance (1c1inst)
  • One cluster (all instances in a single cluster)

SLIDE 65

WSD Baselines

  • MFS: all instances labeled with the most frequent sense (MFS) from SemCor
  • Ranked Senses: all instances labeled with all senses, proportionally weighted by their frequency in SemCor

(Both baselines are sketched below.)
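A rough NLTK-based sketch of both baselines, assuming the WordNet data is installed via nltk.download('wordnet'); WordNet orders senses by SemCor frequency and Lemma.count() exposes the SemCor counts, so this only approximates the official baselines (the add-one smoothing is my choice to avoid all-zero counts):

```python
from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

def mfs_baseline(lemma, pos):
    """MFS baseline: WordNet lists senses by SemCor frequency, so the
    first synset approximates the most frequent sense."""
    return wn.synsets(lemma, pos)[0]

def ranked_senses_baseline(lemma, pos):
    """Ranked-senses baseline: all senses, weighted proportionally to
    their SemCor frequency (add-one keeps unseen senses non-zero)."""
    counts = {s: 1 + sum(l.count() for l in s.lemmas() if l.name() == lemma)
              for s in wn.synsets(lemma, pos)}
    total = sum(counts.values())
    return {s.name(): c / total for s, c in counts.items()}

print(mfs_baseline("paper", wn.NOUN).name())
print(ranked_senses_baseline("paper", wn.NOUN))
```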

SLIDE 66
  • Introduction
  • Task Overview
  • Data
  • Evaluation
  • Results
SLIDE 67

WSI Results

[Chart: Fuzzy B-Cubed (axis 0.175–0.7) and Fuzzy NMI (axis 0.02–0.08) for One Cluster, 1c1inst, AI-KU, AI-KU (add 1000), AI-KU (add 1000, remove 5), Unimelb (5p), Unimelb (50k), UoS (WN), and UoS (Top).]

SLIDE 68

WSD Results

[Chart: Detection, Ranking, and Weighting scores (axis 0.175–0.7) for AI-KU (add+rem), Unimelb (50k), UoS (Top), La Sapienza #2, One cluster (WSI), 1c1inst (WSI), SemCor MFS, and SemCor Ranked.]

SLIDE 69

WSD Results

[Same chart as SLIDE 68.]

SLIDE 70

WSD Results

[Same chart as SLIDE 68.]

SLIDE 71

Issues with Evaluation

Multi-sense annotation rate: Trial data 100%, Test data 11%

Task 13's evaluation measures were specifically designed for multiple senses, yet few test instances have them.

SLIDE 72

Evaluation #2

  • Modify the WSI mapping procedure to produce only a single sense
  • Modify WSD systems to retain only the highest-weighted sense

SLIDE 73

WSD Results for single-sense instances

[Chart: F-1 scores (axis 0.175–0.7) for AI-KU, Unimelb (50k), UoS (Top), La Sapienza (#2), One Cluster, SemCor MFS, and One Cluster Per Instance; plotted values include 0.217, 0.477, 0.569, 0.6, 0.605, and 0.641.]

SLIDE 74

Conclusions

  • Multiple-sense annotation offers a way to improve annotation by making ambiguity explicit
  • WSI offers some hope for creating highly accurate semi-supervised systems

SLIDE 75

Future Work

  • Embed this application in a task
  • Task 11 extension with multiple labels?
  • Have systems annotate why an instance needs multiple senses
  • Build the WSI sense mapping on an external tuning corpus

SLIDE 76

Summary

  • All resources released on the Task 13 website: http://www.cs.york.ac.uk/semeval-2013/task13/
  • All evaluation scoring and IAA code released on Google Code: https://code.google.com/p/cluster-comparison-tools/
  • Annotations (hopefully) being folded into MASC

SLIDE 77

Any questions?

  • Inquiry
  • The subject matter at issue
  • A sentence of inquiry
  • Doubtfulness
  • Formal proposals for action
  • Marriage proposals

SemEval-2013 Task 13: Word Sense Induction for Graded and Non-Graded Senses

David Jurgens and Ioannis Klapaftis
jurgens@di.uniroma1.it ioannisk@microsoft.com