SLIDE 1

Making Sense of Word Sense

24 February 2011, Deutsche Gesellschaft für Sprachwissenschaft (DGfS), Göttingen. Rebecca J. Passonneau, Vikas Bhardwaj, Ansaf Salleb‐Aouissi (Columbia University); Nancy Ide (Vassar College)

SLIDE 2

Outline

  • The word sense conundrum
  • The MASC Project
  • WordNet and sense annotation
  • MASC annotation rounds
  • Round 2: Multiple trained annotators
  • Interannotator agreement and beyond
  • Round 2: Mechanical turkers
  • Machine learning from labels versus features
  • Conclusion

SLIDE 3

Word sense conundrum

  • Adam Kilgarriff, 2003, "I don't believe in word senses"
    – Abstractions from corpus clusters
    – "Corpus citations . . . are the basic objects in the ontology"
  • James Pustejovsky, 1991, The Generative Lexicon
    – No fixed set of conceptual primitives
    – A fixed number of generative devices
    – Lexical semantics is an interface between commonsense knowledge and linguistic form

SLIDE 4

Zipf’s Law

An epiphenomenon of . . .

  • Words (types or tokens)
  • Senses
  • Many other phenomena (Newman, M. E. J., 2005): city population, books sold, net worth, . . .
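A quick way to see the rank–frequency relationship empirically: count word tokens in any large text and check that frequency times rank stays roughly constant. A minimal sketch (`corpus.txt` is a placeholder for any sizable plain-text file, not an ANC file):

```python
from collections import Counter

# Count token frequencies in any large plain-text file (placeholder path).
with open("corpus.txt", encoding="utf-8") as f:
    counts = Counter(f.read().lower().split())

# Under Zipf's law, frequency is roughly proportional to 1/rank,
# so freq * rank should stay roughly constant down the list.
for rank, (word, freq) in enumerate(counts.most_common(10), start=1):
    print(f"{rank:2d}  {word:15s} freq={freq:8d}  freq*rank={freq * rank}")
```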

SLIDE 5

Granularity

Concepts versus comparisons of experience

  • Infinite divisibility of reality: how fine‐grained should a cluster be?
    – WordNet senses for primitive, Adj:
      1. Belonging to an early stage of development
      2. Characteristic of an ancestral type
      3. Preliterate or non‐industrial societies
      4. Created by one without formal training
  • Shared experience: the basis of social reality, and of ways of verbalizing social reality
    – Senses 1–3: anthropology; sense 4: art history

SLIDE 6

Corpus‐based Sense Classes

  • Ontological questions: deferred
    – How are clusters used as basic ontological objects?
    – How is commonsense knowledge represented?
  • Identify same/different contexts, within limits:
    – Same? ". . . a primitive granite boar, carved in prehistoric times"; ". . . has a primitive Easter‐island look"
    – Different? "Bin Laden's training camps were primitive"; ". . . one or more of the primitive gluing or ungluing operations"

SLIDE 7

Outline

  • The word sense conundrum
  • The MASC Project
  • WordNet and sense annotation
  • MASC annotation rounds
  • Round 2: Multiple trained annotators
  • Interannotator agreement and beyond
  • Round 2: Mechanical turkers
  • Machine learning from labels versus features
  • Conclusion

SLIDE 8

American National Corpus


  • 100 Million Words
  • Completely unrestricted
  • Post 1990 American English
  • Many genres
SLIDE 9

MASC: Manually Annotated Sub‐Corpus

Participants

  • Nancy Ide (PI, NSF CRI; Vassar College)
  • Collin Baker (ICSI; FrameNet)
  • Christiane Fellbaum (Princeton; WordNet)
  • Rebecca J. Passonneau (Columbia Univ.)

Size: 500,000 words

  • Manually validated automatic annotations
  • Manual annotations

Selected annotations

  • Token, sentence, lemma (validated)
  • Named entities (validated)
  • WordNet (manual: 1.5‐million‐word sentence corpus)
  • FrameNet (manual: 150K words)

http://www.anc.org/MASC/

SLIDE 10

MASC Corpus

  • Three releases
    – MASC I: 82K words, released May 2010
    – MASC I‐II: 142K words, release date March 2011
    – MASC I‐III: 500K words, release date July 2011
  • Fourteen types of annotation
    – Manually validated automatic, e.g., NP chunks
    – Manual, e.g., word sense
  • Twenty genres, evenly balanced
  • Freely available from the MASC website, and from:
    – LDC
    – NLTK

SLIDE 11

Genre

  #  Genre                 Words     % of corpus
  1  Court transcript       20,817    4
  2  Debate transcript      32,325    6
  3  Email                  20,470    4
  4  Essay                  25,590    5
  5  Fiction                25,681    5
  6  Gov't documents        24,605    5
  7  Journal                25,635    5
  8  Letters                24,750    5
  9  Newspaper/newswire     17,951    4
 10  Non-fiction            25,182    5
 11  Spoken                 25,783    5
 12  Technical              25,426    5
 13  Travel guides          26,708    5
 14  Twitter                24,180    5
 15  Blog                   25,000*   5
 16  Ficlets                25,000*   5
 17  Movie script           28,240    6
 18  Poetry                 25,000*   5
 19  Spam                   25,000*   5
 20  Jokes                  25,000*   5
     Total                 498,343

* Figures marked with an asterisk: the texts have not yet been chosen.

SLIDE 12

Outline

  • The word sense conundrum
  • The MASC Project
  • WordNet and sense annotation
  • MASC annotation rounds
  • Round 2: Multiple trained annotators
  • Interannotator agreement and beyond
  • Round 2: Mechanical turkers
  • Machine learning from labels versus features
  • Conclusion

SLIDE 13

Word Sense Annotation Goals

  • Freely available word sense corpus
  • Harmonize WordNet and FrameNet
  • Investigate moderately polysemous words (avg.=7)
  • Large sentence‐based corpus

  – 100 words, balanced for part of speech
  – 1000 sentences per word
  – Avg. sentence length in MASC I > 20 words
  – 2‐million‐word corpus, representing 700 senses

  • Provide measures of interannotator agreement

  – Chance‐corrected coefficients
  – Krippendorff's Alpha

SLIDE 14

WordNet Sense Information

  • SENSEID: a unique identifier
  • SYNSET: a list of synonymous senses (SENSEIDs)
  • DEFINITION: a phrase
  • EXAMPLES: a list of glosses
  • FREQUENCY COUNT: an integer
  • Nouns also have domain, . . . etc.; verbs have verb group, . . . etc.; adjectives have attributes, . . . etc.
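Since the MASC annotations use WordNet 3.0, and MASC itself is distributed through NLTK, this same sense information can be inspected programmatically. A small sketch with NLTK's WordNet reader (assumes `nltk.download('wordnet')` has been run; output reflects whatever WordNet version NLTK ships):

```python
from nltk.corpus import wordnet as wn

# Walk the noun senses of "time": synset members, definition, example glosses.
for i, synset in enumerate(wn.synsets("time", pos=wn.NOUN), start=1):
    print(f"Sense {i}: {synset.name()}")
    print("  synset:    ", ", ".join(synset.lemma_names()))
    print("  definition:", synset.definition())
    print("  examples:  ", "; ".join(synset.examples()))
```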

SLIDE 15

WordNet Senses: time (noun)

8 WordNet senses used:

1. (time1, clip2) an instance or single occasion for some event: "this time he succeeded"; "he called four times"; "he could do ten at a clip"
2. a period of time considered as a resource under your control and sufficient to accomplish something: "take time to smell the roses"; "I didn't have time to finish"; "it took more than half my time"
3. an indefinite period (usually marked by specific attributes or activities): "he waited a long time"; "the time of year for planting"; "he was a great actor in his time"
4. a suitable moment: "it is time to go"
5. the continuum of experience in which events pass from the future through the present to the past
6. a person's experience on a particular occasion: "he had a time holding back the tears"; "they had a good time together"
7. (time7, clock_time1) a reading of a point in time as given by a clock: "do you know what time it is?"; "the time is 10 o'clock"
8. (time8, fourth_dimension1) the fourth coordinate that is required (along with three spatial dimensions) to specify a physical event

2 WordNet senses not used:

9. (time9, meter4, metre3) rhythm as given by division into parts of equal duration
10. (time10, prison_term1, sentence3) the period of time a prisoner is imprisoned: "he served a prison term of 15 months"; "his sentence was 5 to 10 years"; "he is doing time in the county jail"

SLIDE 16

Senses of time‐N

Sense   # Instances   Definition
  1         171       An instance or single occasion for an event
  2         131       A period of time . . . sufficient to accomplish something
  3         427       An indefinite period
  4          59       A suitable moment
  5          34       The continuum of experience . . . the future . . .
  6          19       A person's experience on a particular occasion
  7          38       A reading of a point in time as given by a clock
  8          47       The fourth coordinate . . . to specify an event
Total       926

SLIDE 17

Senses of time‐N

Sense   # Instances   Definition
  1         171       An instance or single occasion for an event
  2         131       A period of time . . . sufficient to accomplish something
  3         427       An indefinite period
  4          59       A suitable moment
  5          34       The continuum of experience . . . the future . . .
  6          19       A person's experience on a particular occasion
  7          38       A reading of a point in time as given by a clock
  8          47       The fourth coordinate . . . to specify an event
Total       926

Example (sense 1): "When the bride and groom came together for the first time . . ."

SLIDE 18

Senses of time‐N

Sense   # Instances   Definition
  1         171       An instance or single occasion for an event
  2         131       A period of time . . . sufficient to accomplish something
  3         427       An indefinite period
  4          59       A suitable moment
  5          34       The continuum of experience . . . the future . . .
  6          19       A person's experience on a particular occasion
  7          38       A reading of a point in time as given by a clock
  8          47       The fourth coordinate . . . to specify an event
Total       926

Example (sense 4): "A time for a youngster to enjoy the fun and benefits of camp . . ."

SLIDE 19

Senses of time‐N

Sense   # Instances   Definition
  1         171       An instance or single occasion for an event
  2         131       A period of time . . . sufficient to accomplish something
  3         427       An indefinite period
  4          59       A suitable moment
  5          34       The continuum of experience . . . the future . . .
  6          19       A person's experience on a particular occasion
  7          38       A reading of a point in time as given by a clock
  8          47       The fourth coordinate . . . to specify an event
Total       926

Example (sense 5): "Turn back the hands of time and remember when you . . ."

SLIDE 20

Outline

  • The word sense conundrum
  • The MASC Project
  • WordNet and sense annotation
  • MASC annotation rounds
  • Round 2: Multiple trained annotators
  • Interannotator agreement and beyond
  • Round 2: Mechanical turkers
  • Machine learning from labels versus features
  • Conclusion

SLIDE 21

MASC Sense Annotation Rounds

  • 1050 sentences per lemma‐pos
    – N up to 1050 from MASC
    – If N < 1050, balance from OANC
  • Pre‐annotation sample: sense inventory revision
    – Random selection of 50 instances
    – WordNet 3.0 sense annotation
    – Multiple annotators
      • Typically 2–3; 6 in Round 2
      • Same core group of Vassar and Columbia undergraduates
      • Highly trained
    – Sense revision (to be added to WordNet 3.1)
  • Core annotation
    – 100‐sentence subsample
      • FrameNet annotation
      • Multiple annotators (typically 2–3) for interannotator agreement
    – 900 sentences, one annotator per sentence (not always the same annotator)

SLIDE 22

Annotation Tool


  • Loads WordNet 3.0
    – Sense number
    – Definition
    – Glosses
    – Synset
  • Other labels
    – Collocation
    – Wrong POS
    – No sense applies
    – Not enough context
  • Subversion
  • Comment field used during pre‐annotation sample

SLIDE 23

Outline

  • The word sense conundrum
  • The MASC Project
  • WordNet and sense annotation
  • MASC annotation rounds
  • Round 2: Multiple trained annotators
  • Interannotator agreement and beyond
  • Round 2: Mechanical turkers
  • Machine learning from labels versus features
  • Conclusion

SLIDE 24

Multiple Annotators

Poesio and Artstein, Reliability of Anaphoric Annotation Reconsidered, 2005

  • Anaphora
    – Lack of a unique interpretation in the context of occurrence is attributed to a problematic annotation scheme, to be fixed by less specific representations (e.g., word senses: Buitelaar, 1998; Palmer et al., 2005)
    – Applies to polysemy, but not to many other cases, e.g., anaphora
  • Word sense
    – Need for revision of the sense inventory
    – Need for underspecification: annotators disagree unsystematically
    – Need to account for differences in interpretation: annotators disagree systematically

SLIDE 25

Interannotator Agreement

Instances i, annotators j, annotation values k

  • Percent agreement: proportion of i where all j pick the same k
    – What proportion of instances have unanimity?
    – Primarily a measure of coverage
    – Does not generalize well to multiple annotators
    – Does not take chance agreement into account
    – Sensitive to data skew
  • Agreement coefficient (Krippendorff's Alpha): proportion of agreement above that predicted by chance
    – Same interpretation independent of data skew
    – Handles multiple annotators
    – Primarily a measure of variance
    – Does not indicate coverage

SLIDE 26

Example Agreement Matrix

Instance:  1  2  3  4  5  6  7  8  9  10
Ann1:      a  b  b  b  c  a  c  c  b  a
Ann2:      a  c  b  a  c  c  c  b  b  b
Ann3:      a  a  b  a  c  b  c  b  b  c

  • Percent agreement
    – 0.50 (5 of 10 columns are unanimous)
    – 0.63 (19 of 30 cells match their column's majority label)
  • Agreement coefficients
    – Krippendorff's Alpha: 0.301, chance‐corrected with the pooled label distribution, e.g., p(a) = 8/30 = 0.27
    – Cohen's Kappa: 0.278, chance‐corrected with per‐annotator label distributions, e.g., p_Ann1(a) = p_Ann3(a) = 3/10 = 0.30; p_Ann2(a) = 2/10 = 0.20
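These quantities are easy to check with NLTK's `nltk.metrics.agreement` module (NLTK is already cited above as a MASC distribution channel). The coefficients may differ slightly from the slide's 0.301/0.278 depending on the small-sample correction each implementation applies; the point is the contrast between unanimity and chance-corrected agreement:

```python
from nltk.metrics.agreement import AnnotationTask

# The 3-annotator, 10-instance matrix from this slide.
matrix = {
    "Ann1": list("abbbcaccba"),
    "Ann2": list("acbacccbbb"),
    "Ann3": list("aabacbcbbc"),
}

# Percent agreement: proportion of instances labeled unanimously.
columns = list(zip(*matrix.values()))
unanimity = sum(len(set(col)) == 1 for col in columns) / len(columns)
print(f"percent agreement (unanimity): {unanimity:.2f}")   # 0.50

# Chance-corrected coefficients; NLTK expects (coder, item, label) triples.
triples = [(coder, f"i{i}", label)
           for coder, labels in matrix.items()
           for i, label in enumerate(labels)]
task = AnnotationTask(data=triples)
print(f"Krippendorff's alpha:         {task.alpha():.3f}")
print(f"Cohen's kappa (avg pairwise): {task.kappa():.3f}")
```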

SLIDE 27

Alpha Scores on Round 2, Trained

Lemma   POS    WN Senses   Senses Used   Alpha   Outliers   Alpha (outliers removed)
long    Adj        9            4         0.67       1        0.80
fair    Adj       10            6         0.54       2        0.63
quiet   Adj        6            7         0.49       –        0.49
time    Noun       7            7         0.68       –        0.68
work    Noun      10            8         0.62       –        0.62
land    Noun      11            9         0.49       1        0.54
tell    Verb      12           10         0.46       –        0.46
say     Verb       8            8         0.38       2        0.52
show    Verb      11           10         0.46       1        0.48
know    Verb      11           10         0.38       2        0.63

SLIDE 28

Outline

  • The word sense conundrum
  • The MASC Project
  • WordNet and sense annotation
  • MASC annotation rounds
  • Round 2: Multiple trained annotators
  • Interannotator agreement and beyond
  • Round 2: Mechanical turkers
  • Machine learning from labels versus features
  • Conclusion

SLIDE 29

Interpreting Interannotator Agreement

  • Krippendorff
    – ≥ 0.67 supports tentative conclusions
    – ≥ 0.80 good reliability
  • Landis & Koch
    – 0.21–0.40 fair
    – 0.41–0.60 moderate
    – 0.61–0.80 substantial
  • Poesio & Artstein
    – No single threshold applicable for all purposes
    – 0.70 for many NLP annotations
  • Passonneau
    – Paradigmatic reliability analysis
    – E.g., significance tests based on uses of different annotators' labels

SLIDE 30

Observed Variation in Alpha

  • Part‐of‐speech effect
    – Adjectives and nouns have higher Alpha than verbs
    – Only a partial explanation of the variation
  • Within each part of speech, Alpha varies
  • Poor (inverse) correlation of Alpha with the number of senses available (r = −0.38)
  • Modest inverse correlation of Alpha with the number of senses used (r = −0.56)

SLIDE 31

Anveshan: Annotation Variance Estimation

  • For use with data from multiple annotators
    – Identify outliers among annotators
    – Find subsets of annotators with similar behavior (systematic disagreement)
    – Identify confusable senses
  • Variation can occur due to differences among
    – Annotators (expertise)
    – Items (difficulty)
    – Label sets (number and similarity of labels)
  • Uses Kullback–Leibler divergence and Jensen–Shannon divergence

SLIDE 32

Anveshan Basics

  • For each annotator $a_i$ and sense $s_j$, compute the annotator's sense distribution:

    $$P_{a_i}(S = s_j) = \frac{\mathrm{count}(s_j, a_i)}{\sum_k \mathrm{count}(s_k, a_i)}$$

  • For each annotator $a_i$, compute the average of all other annotators' sense distributions:

    $$\bar{P}_{a_i}(S = s_j) = \frac{1}{n-1} \sum_{m \neq i} P_{a_m}(S = s_j)$$

  • Compute the leverage of each $a_i$ against that average, where

    $$\mathrm{Lev}(P, Q) = \sum_k \lvert P(k) - Q(k) \rvert$$

SLIDE 33

Anveshan Basics, Continued

  • Compute the Kullback–Leibler divergence of each annotator's sense distribution against the average of all other annotators' sense distributions:

    $$\mathrm{KLD}(P, Q) = \sum_i P(i) \log \frac{P(i)}{Q(i)}$$

  • Compute the Jensen–Shannon divergence of each annotator's sense distribution against every other annotator's sense distribution:

    $$\mathrm{JSD}(P, Q) = \tfrac{1}{2}\,\mathrm{KLD}(P, M) + \tfrac{1}{2}\,\mathrm{KLD}(Q, M), \quad \text{where } M = \frac{P + Q}{2}$$
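All three quantities are short to compute over annotators' sense distributions. A minimal numpy sketch, applied to made-up distributions for three annotators (illustrative values, not MASC data):

```python
import numpy as np

def kld(p, q, eps=1e-12):
    """Kullback-Leibler divergence: KLD(P, Q) = sum_i P(i) log(P(i)/Q(i))."""
    p, q = np.asarray(p, float) + eps, np.asarray(q, float) + eps
    return float(np.sum(p * np.log(p / q)))

def jsd(p, q):
    """Jensen-Shannon divergence via the mixture M = (P + Q) / 2."""
    m = (np.asarray(p, float) + np.asarray(q, float)) / 2
    return 0.5 * kld(p, m) + 0.5 * kld(q, m)

def leverage(p, q):
    """Lev(P, Q) = sum_k |P(k) - Q(k)|."""
    return float(np.abs(np.asarray(p, float) - np.asarray(q, float)).sum())

# Hypothetical sense distributions for 3 annotators over 4 senses;
# the third annotator is constructed to look like an outlier.
P = np.array([[0.70, 0.20, 0.05, 0.05],
              [0.65, 0.25, 0.05, 0.05],
              [0.20, 0.10, 0.40, 0.30]])

n = len(P)
for i in range(n):
    avg_others = P[np.arange(n) != i].mean(axis=0)  # average of all other annotators
    print(f"annotator {i}: Lev={leverage(P[i], avg_others):.3f}  "
          f"KLD={kld(P[i], avg_others):.3f}  JSD={jsd(P[i], avg_others):.3f}")
```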

SLIDE 34

Outliers

[Figure: KLD for each annotator, long‐Adj]

SLIDE 35

Systematic Disagreements

[Figure: JSD and Alpha for show‐Verb]

SLIDE 36

Confusability of Senses

[Figure: sense distributions (WN1–WN11) for say‐Verb, per annotator (A101, A103) and overall]

SLIDE 37

Outline

  • The word sense conundrum
  • The MASC Project
  • WordNet and sense annotation
  • MASC annotation rounds
  • Round 2: Multiple trained annotators
  • Interannotator agreement and beyond
  • Round 2: Mechanical turkers
  • Machine learning from labels versus features
  • Conclusion

SLIDE 38

Mechanical Turkers

  • Two adjectives, 150 sentences per adjective
    – 15 HITs, 10 sentences per HIT
    – 13 turkers; work was rejected for turkers who did not complete all 15 HITs
    – long (9 WN senses, all used): Alpha = 0.15
    – fair (10 WN senses, all used): Alpha = 0.25

Label distributions (150 sentences × 13 turkers = 1950 judgments per word):

fair:   wn1   wn2   wn3   wn4   wn5   wn6   wn7   wn8   wn9   wn10   Other   Total
        891   437    52    60    63    21    21   135    10     14     236    1950

long:   wn1   wn2   wn3   wn4   wn5   wn6   wn7   wn8   wn9   Other   Total
        659   458   115   110   160   156    66    56    64     115    1950

SLIDE 39

Outline

  • The word sense conundrum
  • The MASC Project
  • WordNet and sense annotation
  • MASC annotation rounds
  • Round 2: Multiple trained annotators
  • Interannotator agreement and beyond
  • Round 2: Mechanical turkers
  • Machine learning from labels versus features
  • Conclusion

SLIDE 40

Comparison of Two Learning Paradigms

  • Ground truth labels from the author
  • Unsupervised learning from multilabels
    – Maximum likelihood estimates obtained using EM
  • Supervised learning from features (see the sketch below)
    – Features:
      • Word and sentence length features
      • Tf*Idf
      • Named entities
      • DAL features (Dictionary of Affect in Language)
    – SVMLight
    – 4‐fold cross‐validation
    – Best C values
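The original experiments used SVMLight; a rough present-day stand-in for the same setup, with tf·idf features only and a 4-fold grid search over C, might look like this scikit-learn sketch (the sentences and sense labels are toy placeholders; the slide's length, named-entity, and DAL features would be concatenated alongside the tf·idf vectors):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy placeholders: sentences containing the target word, each labeled
# with a WordNet sense id (the real setup has ~1000 sentences per word).
sentences = [
    "it is time to go home now",
    "he waited a very long time",
    "now is the time to act",
    "the time of year for planting",
    "surely it is time to leave",
    "he was a great actor in his time",
    "high time that we went",
    "an indefinite time passed quietly",
]
senses = ["wn4", "wn3", "wn4", "wn3", "wn4", "wn3", "wn4", "wn3"]

# Tf*Idf features feeding a linear SVM, in place of SVMLight.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("svm", LinearSVC()),
])

# 4-fold cross-validation, searching for the best C value.
search = GridSearchCV(pipeline, {"svm__C": [0.01, 0.1, 1, 10, 100]}, cv=4)
search.fit(sentences, senses)
print(search.best_params_, round(search.best_score_, 3))
```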

SLIDE 41

Machine Learning from MultiLabels

  • GLAD (Whitehill et al., 2009, "Whose vote should count more?")
  • Graphical model with hidden variables (a simplified sketch follows):
    – True labels (Z)
    – Labeler accuracy (α)
    – Image (item) difficulty (β)
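The slide does not spell out the estimation procedure, so the following is a minimal, binary-label EM sketch in the spirit of GLAD rather than the authors' implementation: P(label = z) is modeled as σ(αβ), the E-step computes the posterior over the hidden true labels, and the M-step takes a few gradient steps on α and β (the real model constrains β > 0 and adds priors):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def glad_em(labels, n_em=50, n_grad=25, lr=0.05):
    """Minimal binary GLAD-style EM. labels: (n_annotators, n_items) 0/1 array.
    Returns posterior P(z_j=1), annotator accuracies alpha, item parameters beta."""
    n_ann, n_items = labels.shape
    alpha, beta = np.ones(n_ann), np.ones(n_items)
    for _ in range(n_em):
        # E-step: posterior over the hidden true label z_j, uniform prior.
        s = np.clip(sigmoid(np.outer(alpha, beta)), 1e-6, 1 - 1e-6)  # P(l_ij == z_j)
        log1 = np.where(labels == 1, np.log(s), np.log(1 - s)).sum(axis=0)
        log0 = np.where(labels == 0, np.log(s), np.log(1 - s)).sum(axis=0)
        q = sigmoid(log1 - log0)                                     # P(z_j = 1)
        # M-step: gradient ascent on the expected complete-data log-likelihood.
        c = np.where(labels == 1, q, 1 - q)   # P(annotator i is correct on item j)
        for _ in range(n_grad):
            s = np.clip(sigmoid(np.outer(alpha, beta)), 1e-6, 1 - 1e-6)
            resid = c - s
            alpha = alpha + lr * (resid * beta).sum(axis=1)
            beta = beta + lr * (resid * alpha[:, None]).sum(axis=0)
    return q, alpha, beta

# Toy check: four reliable annotators and one near-random one.
rng = np.random.default_rng(0)
z = rng.integers(0, 2, size=30)
noisy = lambda acc: np.where(rng.random(30) < acc, z, 1 - z)
labels = np.stack([noisy(0.9), noisy(0.9), noisy(0.9), noisy(0.9), noisy(0.5)])
q, alpha, beta = glad_em(labels)
print("recovered label accuracy:", ((q > 0.5) == z).mean())
print("alphas (random annotator typically lowest):", np.round(alpha, 2))
```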

SLIDE 42

Learning Results, Fair

Model   Sense         Annotators   Recall   Precision   F measure   Accuracy
GLAD    Fair‐j, WN1   MASC          0.92      0.94        0.93        0.93
GLAD    Fair‐j, WN1   AMT           1.00      0.71        0.85        0.79
GLAD    Fair‐j, WN1   Both          1.00      0.74        0.87        0.82
SVM     Fair‐j, WN1   NA            1.00      0.65        0.82        0.72
GLAD    Fair‐j, WN2   MASC          0.69      0.48        0.59        0.83
GLAD    Fair‐j, WN2   AMT           0.81      0.93        0.87        0.96
GLAD    Fair‐j, WN2   Both          0.81      0.93        0.87        0.96
SVM     Fair‐j, WN2   NA            0.60      0.33        0.46        0.68

SLIDE 43

Learning Results, Long

Model   Sense         Annotators   Recall   Precision   F measure   Accuracy
GLAD    Long‐j, WN1   MASC          0.88      0.84        0.86        0.84
GLAD    Long‐j, WN1   AMT           1.00      0.98        0.99        0.99
GLAD    Long‐j, WN1   Both          1.00      0.98        0.99        0.99
SVM     Long‐j, WN1   NA            1.00      0.61        0.80        0.63
GLAD    Long‐j, WN2   MASC          0.74      0.80        0.77        0.83
GLAD    Long‐j, WN2   AMT           0.79      0.94        0.86        0.90
GLAD    Long‐j, WN2   Both          0.95      0.97        0.96        0.97
SVM     Long‐j, WN2   NA            0.85      0.83        0.84        0.66

SLIDE 44

Summary of Learning Results

  • GLAD
    – Performs better on 13 turkers than on 6 trained annotators, apart from fair‐adj, WN1; why?
    – Combining trained and untrained labels:
      • Improvement for long‐adj, WN2
      • Degradation for fair‐adj, WN1
      • No improvement for long‐adj, WN1 or fair‐adj, WN2
    – No consistent pattern of results
    – No apparent correlation of instance difficulty with features, or of annotator expertise with interannotator agreement
  • SVM
    – Performance not quite as good as GLAD

SLIDE 45

Outline

  • The word sense conundrum
  • The MASC Project
  • WordNet and sense annotation
  • MASC annotation rounds
  • Round 2: Multiple trained annotators
  • Interannotator agreement and beyond
  • Round 2: Mechanical turkers
  • Machine learning from labels versus features
  • Conclusion

SLIDE 46

Resnik and Yarowsky Proposal

1999, Distinguishing Systems and Distinguishing Senses

  • Use cross‐entropy, KLD, or a related measure to compare a system's probability distribution over senses, $P_S(cs_i \mid w_i, context_i)$, against the annotated sense distribution

Senses of interest        System 1   System 2   System 3   System 4
monetary                    0.47       0.85       0.28       1.00
stake or share              0.42       0.05       0.24       0.00
benefit, advantage          0.06       0.05       0.24       0.00
intellectual curiosity      0.05       0.05       0.24       0.00
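Using the distributions from the table, the comparison reduces to a divergence computation per system. A sketch (the "gold" annotated distribution below is invented for illustration, and zero entries are smoothed so the divergence stays finite):

```python
import numpy as np

def kld(p, q, eps=1e-6):
    """KLD(P, Q) = sum_i P(i) log(P(i)/Q(i)), with smoothing for zeros."""
    p = np.asarray(p, float) + eps; p /= p.sum()
    q = np.asarray(q, float) + eps; q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

# P_S(cs_i | w_i, context_i) per system, from the slide's table
# (senses: monetary, stake/share, benefit/advantage, intellectual curiosity).
systems = {
    "System 1": [0.47, 0.42, 0.06, 0.05],
    "System 2": [0.85, 0.05, 0.05, 0.05],
    "System 3": [0.28, 0.24, 0.24, 0.24],
    "System 4": [1.00, 0.00, 0.00, 0.00],
}
gold = [0.50, 0.40, 0.05, 0.05]   # hypothetical annotated sense distribution

# Lower divergence = system distribution closer to the annotators'.
for name, dist in systems.items():
    print(f"{name}: KLD(gold || system) = {kld(gold, dist):.3f}")
```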

SLIDE 47

Conclusion

  • Annotators can agree well above chance on a fine‐grained sense inventory
  • Disagreement can be systematic
    – Sense confusion
    – Subsets of annotators with different interpretations
  • Ground truth as a distribution over senses
  • Evaluation by comparison of sense distributions
  • Learning methods that take into account:
    – Distribution of sense probabilities
    – Features
    – Item difficulty
    – Annotator expertise
    – Sense difficulty
