Word Sense Disambiguation: Predominant Sense Acquisition (Lisa Beinborn)



SLIDE 1

Word Sense Disambiguation

Predominant Sense Acquisition

Lisa Beinborn
Seminar: Language Processing for different domains and genres

SLIDE 2

Outline

Domain-Specific Word Sense Disambiguation

♦ Idea

Automatic Method

♦ McCarthy et al. 2004

Evaluation

SLIDE 3

Problem

Words can have different senses

Star

♦ Celestial body
♦ Shape
♦ Celebrity

SLIDE 4

Base solutions

1) Use supervised machine learning with SemCor

♦ SemCor = subset of the Brown Corpus
♦ Open-class words are sense-tagged

2) Take the most frequent sense

♦ Skewed sense distributions

Problem: not enough data
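
The second base solution can be sketched in a few lines. This is a minimal illustration, assuming a sense-tagged corpus is available as (word, sense) pairs; the toy tokens below are hypothetical and are not taken from SemCor.

```python
from collections import Counter, defaultdict

# Hypothetical sense-tagged tokens as (word, sense) pairs; toy data, not SemCor.
tagged_tokens = [
    ("star", "celestial body"),
    ("star", "celestial body"),
    ("star", "celebrity"),
    ("star", "shape"),
    ("return", "income"),
    ("return", "income"),
    ("return", "tennis stroke"),
]


def most_frequent_senses(tokens):
    """Map each word to the sense it is tagged with most often."""
    counts = defaultdict(Counter)
    for word, sense in tokens:
        counts[word][sense] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


mfs = most_frequent_senses(tagged_tokens)
print(mfs["star"])    # celestial body
print(mfs["return"])  # income
```

Every occurrence of a word then receives the same label, which is why the quality of the baseline depends on how well the tagged corpus matches the target text.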

SLIDE 5

Ideas

♦ One sense prevails in a given discourse
♦ The most frequent sense often depends on the domain
♦ No domain-specific sense-tagged corpora available

Automatically induce the predominant sense

SLIDE 6

Automatic Method [McCarthy et al. 2004]

Get senses s_i for word w from the sense inventory

SLIDE 7

Automatic Method [McCarthy et al. 2004]

Get senses s_i for word w from WordNet

Rank them

♦ The ranking depends on the training corpus

SLIDE 8

Distributional Similarity

Consider the k nearest neighbours

♦ Words that appear in the same context
♦ The star revealed…
♦ The actor revealed…

Build a thesaurus with k = 50

"nearest" ≈ distributional similarity score (dss)
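
The neighbour lookup can be illustrated with a small sketch. It assumes each word is described by counts of the contexts it occurs in, and it uses cosine similarity as a simple stand-in for dss; this is not the similarity measure used in the cited work, and the context counts are hypothetical.

```python
import math
from collections import Counter

# Hypothetical context counts (e.g. verbs a noun occurs with); toy numbers.
contexts = {
    "star": Counter({"reveal": 3, "shine": 4, "sign": 2}),
    "actor": Counter({"reveal": 4, "sign": 3, "play": 5}),
    "planet": Counter({"shine": 3, "orbit": 5}),
    "bank": Counter({"lend": 6, "sign": 1}),
}


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[f] * b[f] for f in a if f in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_neighbours(word, k):
    """Return the k most similar other words as (neighbour, score) pairs."""
    scores = [
        (other, cosine(contexts[word], contexts[other]))
        for other in contexts
        if other != word
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


neighbours = nearest_neighbours("star", k=2)
print([name for name, _ in neighbours])
```

With these toy counts "actor" and "planet" come out ahead of "bank", because they share contexts such as "reveal" and "shine" with "star". The slide uses k = 50 on a real corpus.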

SLIDE 9

Contribution of neighbours

Different neighbours share different senses with the word

♦ actor: celebrity
♦ planet: celestial body
♦ circle: shape

How can these relations be inferred?

SLIDE 10

Semantic Similarity

sss' = semantic similarity score

♦ Closeness of two senses

For each neighbour n

♦ Get senses s_x
♦ Calculate sss'(s_i, s_x)
♦ sss(s_i, n) = max sss'

Neighbours: {actor, planet, …}
s_x(actor): {role player, worker, …}
sss'(celebrity, role player) = 0.7

SLIDE 11

Semantic Similarity

sss' = semantic similarity score

♦ Closeness of two senses

For each neighbour n

♦ Get senses s_x
♦ Calculate sss'(s_i, s_x)
♦ sss(s_i, n) = max sss'

Neighbours: {actor, planet, …}
s_x(actor): {role player, worker, …}
sss'(celebrity, role player) = 0.7
sss'(celebrity, worker) = 0.5
sss(celebrity, actor) = 0.7
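
The max step can be written out directly. The sketch below hard-codes the two sss' values from the example; the lookup tables and the function name are hypothetical, and in practice sss' would come from a WordNet-based similarity measure.

```python
# Senses s_x of each neighbour (only the example from the slide).
neighbour_senses = {"actor": ["role player", "worker"]}

# Pairwise sense scores sss'(s_i, s_x), taken from the example above.
sense_pair_score = {
    ("celebrity", "role player"): 0.7,
    ("celebrity", "worker"): 0.5,
}


def sss(sense, neighbour):
    """sss(s_i, n): best sss' between s_i and any sense of neighbour n."""
    return max(
        sense_pair_score.get((sense, s_x), 0.0)
        for s_x in neighbour_senses[neighbour]
    )


print(sss("celebrity", "actor"))  # 0.7
```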

SLIDE 12

Prevalence Score

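PrevalenceScore(w, s_i) = \sum_{n_j \in N_w} dss(w, n_j) \cdot \frac{sss(s_i, n_j)}{\text{normalization}}

with normalization = \sum_{s_{i'} \in senses(w)} sss(s_{i'}, n_j), following McCarthy et al. 2004

Ranks the senses of word w

♦ N_w: the 50 nearest neighbours of w
♦ dss(w, n_j): scored neighbours
♦ sss(s_i, n_j) / normalization: weighted by the normalized semantic similarity of sense and neighbour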

SLIDE 13

Prevalence Score

PrevalenceScore(w, s_i) = \sum_{n_j \in N_w} dss(w, n_j) \cdot \frac{sss(s_i, n_j)}{\text{normalization}}

♦ Each summand: contribution of neighbour n_j to sense s_i
♦ The sum over n_j: contribution of all neighbours to sense s_i
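
A small worked sketch of the ranking for w = "star". It assumes the normalization term is the sum of sss over all senses of w for the given neighbour, as in McCarthy et al. 2004. All numbers are hypothetical except sss(celebrity, actor) = 0.7 from the earlier example, and only three neighbours are used instead of 50.

```python
# dss(w, n_j) for w = "star"; hypothetical scores from some training corpus.
dss = {"actor": 0.5, "planet": 0.4, "circle": 0.1}

# sss(s_i, n_j) for each sense of "star"; hypothetical apart from 0.7.
sss = {
    "celebrity": {"actor": 0.7, "planet": 0.1, "circle": 0.1},
    "celestial body": {"actor": 0.2, "planet": 0.8, "circle": 0.2},
    "shape": {"actor": 0.1, "planet": 0.1, "circle": 0.7},
}


def prevalence_score(sense):
    """Sum each neighbour's dss, weighted by its normalized sss to the sense."""
    total = 0.0
    for neighbour, score in dss.items():
        # Assumed normalization: sss summed over all senses of w.
        normalization = sum(sss[s][neighbour] for s in sss)
        if normalization > 0:
            total += score * sss[sense][neighbour] / normalization
    return total


ranking = sorted(sss, key=prevalence_score, reverse=True)
for sense in ranking:
    print(sense, round(prevalence_score(sense), 2))
```

With these numbers "celestial body" ranks first. Neighbours drawn from a different corpus would give different dss values and possibly a different ranking, which is how the method picks up domain effects.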

SLIDE 14

Evaluation

Sense rankings for a sample of nouns

Corpora

♦ BNC
♦ Finance
♦ Sports

SLIDE 15

Word Selection F&S

♦ Only polysemous nouns
♦ At least one synset (WN) labeled with sports
♦ At least one synset labeled with economics

Examples:

♦ F&S (17): manager, record, score, check, return, competition, club, …

Manual sense annotation

SLIDE 16

Sense Distribution

SLIDE 17

Additional sets

Selected based on salience

♦ Most salient words in the domain
♦ Salience computed by frequency

Sets

♦ S_sal (8): fan, star, transfer, striker, goal, title, …
♦ F_sal (8): package, chip, bank, market, strike, …
♦ eq_sal (7): will, phase, half, top, performance, …
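
The slide only says that salience is computed by frequency. One simple way to read this, shown here purely as an assumption, is the ratio of a word's smoothed relative frequency in the domain corpus to its relative frequency in a reference corpus. The function names, the add-one smoothing and the toy sentences are all hypothetical.

```python
from collections import Counter

# Hypothetical toy corpora; real corpora would be far larger.
sports_tokens = "the striker scored a goal and the fan cheered the goal".split()
general_tokens = (
    "the bank raised the rate and the market fell after the goal was missed"
).split()


def smoothed_relative_frequency(word, tokens):
    """Relative frequency with add-one smoothing so unseen words stay non-zero."""
    return (Counter(tokens)[word] + 1) / (len(tokens) + 1)


def salience(word, domain_tokens, reference_tokens):
    """Assumed salience: domain relative frequency over reference frequency."""
    return smoothed_relative_frequency(word, domain_tokens) / (
        smoothed_relative_frequency(word, reference_tokens)
    )


for word in ("striker", "goal", "the"):
    print(word, round(salience(word, sports_tokens, general_tokens), 2))
```

Under this reading, words like "striker" score well above 1 for the sports corpus, while function words such as "the" stay close to 1.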

SLIDE 18

Sense Distribution

Even in domain-specific corpora, ambiguity is still present, though there is less of it than in general text

The domain-specific sense is not always the predominant sense in a domain-specific corpus

♦ but it is more frequent than in a general corpus

SLIDE 19

Example

Return = a tennis stroke

♦ Not the most frequent sense in SPORTS
♦ Frequency = 19
♦ Absent in FINANCE and BNC

SLIDE 20

Results

When applied to the corresponding domain, the McCarthy et al. 2004 method beats the random baseline and the SemCor first sense (SemCor FS) in all cases
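
One plausible way to read this comparison is as tagging accuracy: every token of a word is labelled with its predicted predominant sense and checked against the manual annotation, while the random baseline picks uniformly among the word's senses. The sketch below assumes exactly that setup; the gold labels, sense inventory and predictions are hypothetical.

```python
# Hypothetical manually annotated tokens as (word, gold sense) pairs.
gold = [
    ("return", "tennis stroke"),
    ("return", "income"),
    ("return", "income"),
    ("star", "celebrity"),
]

# Hypothetical sense inventory and predicted predominant senses.
senses = {
    "return": ["income", "tennis stroke", "coming back"],
    "star": ["celestial body", "shape", "celebrity"],
}
predominant = {"return": "income", "star": "celebrity"}


def first_sense_accuracy(gold_tokens, predicted):
    """Share of tokens whose gold sense equals the predicted predominant sense."""
    correct = sum(1 for word, sense in gold_tokens if predicted[word] == sense)
    return correct / len(gold_tokens)


def random_baseline_accuracy(gold_tokens, inventory):
    """Expected accuracy when a sense is chosen uniformly at random per token."""
    return sum(1 / len(inventory[word]) for word, _ in gold_tokens) / len(gold_tokens)


print(first_sense_accuracy(gold, predominant))
print(random_baseline_accuracy(gold, senses))
```

The same accuracy function can score a ranking acquired from the domain corpus and a first sense taken from SemCor, which makes the comparisons on this slide directly comparable.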

SLIDE 21

Results

APPR = training on appropriate domain
SC = SemCor

SLIDE 22

Results

Training on the appropriate domain makes sense for all words

Assumption: salient words benefit more

SLIDE 23

Conclusions

Automatic acquisition of predominant senses from domain-specific corpora outperforms the automatic acquisition from SemCor for the sample words

But: still an approximation, lots of problematic cases

Better: use local context for disambiguation

SLIDE 24

Conclusions

The automatic method is cheaper

Use the method if no manually tagged data is available, or if the data seems to be inappropriate for the word and domain

SLIDE 25

Questions?

SLIDE 26

Thank you!

SLIDE 27

References

Diana McCarthy, Rob Koeling, Julie Weeds and John Carroll, 2004. Finding Predominant Word Senses in Untagged Text. Proceedings of ACL 04, Barcelona, Spain.

Rob Koeling, Diana McCarthy and John Carroll, 2005. Domain-Specific Sense Distributions and Predominant Sense Acquisition. EMNLP 2005.

Diana McCarthy, Rob Koeling, Julie Weeds and John Carroll, 2007. Unsupervised Acquisition of Predominant Word Senses. Computational Linguistics 33(4), pp. 553–590.

SLIDE 28

References

Distributional Similarity

Julie Weeds, 2003. Measures and Applications of Lexical Distributional Similarity. Ph.D. thesis, Department of Informatics, University of Sussex, Brighton, UK.

Semantic Similarity

Siddharth Patwardhan, Satanjeev Banerjee and Ted Pedersen, 2003. Using measures of semantic relatedness for word sense disambiguation. In Proceedings of CICLing 2003, pp. 241–257, Mexico City, Mexico.