Word Sense Disambiguation: Predominant Sense Acquisition
Lisa Beinborn
Seminar: Language Processing for Different Domains and Genres
Word Sense Disambiguation, Lisa Beinborn
Outline
Domain-Specific Word Sense Disambiguation
♦ Idea
Automatic Method
♦ McCarthy et al. 2004
Evaluation
Problem
Words can have different senses: star
♦ celestial body
♦ shape
♦ celebrity
Base solutions
1) Use supervised machine learning with SemCor
♦ SemCor = subset of the Brown Corpus
♦ Open-class words are sense-tagged
2) Take the most frequent sense
♦ Skewed sense distributions
Problem: not enough data
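The most-frequent-sense heuristic from 2) can be sketched in a few lines. The corpus, words, and sense labels below are toy stand-ins, not real SemCor data:

```python
from collections import Counter

def mfs_baseline(tagged_corpus):
    """Most-frequent-sense heuristic: for each word, pick the sense it is
    tagged with most often in a sense-tagged corpus such as SemCor."""
    counts = {}
    for word, sense in tagged_corpus:
        counts.setdefault(word, Counter())[sense] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

# Toy stand-in for a sense-tagged corpus (not real SemCor data):
corpus = [("star", "celebrity"), ("star", "celebrity"),
          ("star", "celestial_body")]
print(mfs_baseline(corpus)["star"])  # celebrity
```

With skewed sense distributions this baseline is hard to beat, but it needs tagged data for every word, which is exactly what is missing.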
Ideas
One sense prevails in a given discourse
The most frequent sense often depends on the domain
No domain-specific sense-tagged corpora available
Automatically induce the predominant sense
Automatic Method [McCarthy et al. 2004]
Get senses s_i for word w from the sense inventory
Automatic Method [McCarthy et al. 2004]
Get senses s_i for word w from WordNet
Rank them
♦ the ranking depends on the training corpus
Distributional Similarity
Consider the k nearest neighbours
♦ words that appear in the same contexts
♦ The star revealed …
♦ The actor revealed …
Build a thesaurus with k = 50
“nearest“ ≈ distributional similarity score (dss)
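A distributional thesaurus like this can be sketched as follows. The original work uses a more sophisticated similarity measure (see the Weeds 2003 reference); this minimal version uses plain cosine similarity over context-word counts, which is an assumption for illustration only:

```python
from collections import Counter
from math import sqrt

def context_vectors(sentences, window=2):
    """Collect context-word counts within +/- window for every word."""
    vecs = {}
    for sent in sentences:
        for i, w in enumerate(sent):
            ctx = sent[max(0, i - window):i] + sent[i + 1:i + 1 + window]
            vecs.setdefault(w, Counter()).update(ctx)
    return vecs

def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def nearest_neighbours(word, vecs, k=50):
    """The k words most distributionally similar to `word` (the dss ranking)."""
    scores = [(other, cosine(vecs[word], v))
              for other, v in vecs.items() if other != word]
    return sorted(scores, key=lambda p: p[1], reverse=True)[:k]

# The slides' example: "star" and "actor" occur in the same contexts.
sents = [["the", "star", "revealed", "a", "secret"],
         ["the", "actor", "revealed", "a", "secret"]]
vecs = context_vectors(sents)
print(nearest_neighbours("star", vecs, k=3))
```

On this toy corpus, "actor" comes out as the nearest neighbour of "star" because their context vectors are identical.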
Contribution of neighbours
Different neighbours share different senses with the word
♦ actor → celebrity
♦ planet → celestial body
♦ circle → shape
How can these relations be inferred?
Semantic Similarity
sss′ = semantic similarity score
♦ closeness of two senses
For each neighbour n:
♦ get its senses s_x
♦ calculate sss′(s_i, s_x)
♦ sss(s_i, n) = max over s_x of sss′(s_i, s_x)
Example:
Neighbours: {actor, planet, …}
s_x(actor): {role player, worker, …}
sss′(celebrity, role player) = 0.7
sss′(celebrity, worker) = 0.5
sss(celebrity, actor) = 0.7
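The max-over-senses step can be made concrete with the slide's own numbers. The sense inventory and sss′ table below are hand-coded stand-ins for WordNet and a real sense-similarity measure:

```python
# Hand-coded stand-ins for WordNet senses and a sense-similarity measure;
# the similarity values are the ones from the slide.
NEIGHBOUR_SENSES = {"actor": ["role_player", "worker"]}
SSS_PRIME = {
    ("celebrity", "role_player"): 0.7,
    ("celebrity", "worker"): 0.5,
}

def sss(sense_i, neighbour):
    """sss(s_i, n) = max over the neighbour's senses s_x of sss'(s_i, s_x)."""
    return max((SSS_PRIME.get((sense_i, sx), 0.0)
                for sx in NEIGHBOUR_SENSES[neighbour]), default=0.0)

print(sss("celebrity", "actor"))  # 0.7, via the role_player sense
```

Taking the maximum means a neighbour only needs one sense close to s_i to count as evidence for it.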
Prevalence Score
PrevalenceScore(w, s_i) = Σ_{n_j ∈ N_w} [ dss(w, n_j) × sss(s_i, n_j) / Σ_{s_i′ ∈ senses(w)} sss(s_i′, n_j) ]

Ranks the senses of word w
♦ 50 nearest neighbours
♦ scored by dss
♦ weighted by the normalized semantic similarity of sense and neighbour
Prevalence Score
PrevalenceScore(w, s_i) = Σ_{n_j ∈ N_w} [ dss(w, n_j) × sss(s_i, n_j) / Σ_{s_i′ ∈ senses(w)} sss(s_i′, n_j) ]

The summand: contribution of neighbour n_j to sense s_i
The sum over all n_j: contribution of all neighbours to sense s_i
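A minimal sketch of the prevalence score. The dss and sss values below are invented for illustration; only the formula's structure follows the slides:

```python
# Toy data in the spirit of the slides' running example; all similarity
# numbers below are invented for illustration.
SENSES = {"star": ["celebrity", "celestial_body", "shape"]}
NEIGHBOURS = [("actor", 0.9), ("planet", 0.6)]   # (n_j, dss(w, n_j)) pairs
SSS = {   # sss(s_i, n_j): semantic similarity of sense to neighbour
    ("celebrity", "actor"): 0.7, ("celebrity", "planet"): 0.1,
    ("celestial_body", "actor"): 0.1, ("celestial_body", "planet"): 0.9,
    ("shape", "actor"): 0.2, ("shape", "planet"): 0.2,
}

def prevalence_score(word, sense_i):
    """Sum over neighbours n_j of dss(w, n_j) * sss(s_i, n_j), normalized
    per neighbour by the total sss of all senses of w to n_j."""
    score = 0.0
    for n_j, dss in NEIGHBOURS:
        norm = sum(SSS[(s, n_j)] for s in SENSES[word])
        score += dss * SSS[(sense_i, n_j)] / norm
    return score

ranking = sorted(SENSES["star"], key=lambda s: prevalence_score("star", s),
                 reverse=True)
print(ranking)  # ['celebrity', 'celestial_body', 'shape']
```

With these numbers, "actor" is the closer neighbour, so the celebrity sense of "star" wins the ranking.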
Evaluation
Sense rankings for a sample of nouns
Corpora:
♦ BNC
♦ Finance
♦ Sports
Word Selection: F&S
Only polysemous nouns
At least one synset (WordNet) labeled with sports
At least one synset labeled with economics
Example:
♦ F&S (17): manager, record, score, check, return, competition, club, …
Manual sense annotation
Sense Distribution
Additional sets
Selected based on salience
♦ most salient words in a domain
♦ salience computed by frequency
Sets:
♦ S_sal (8): fan, star, transfer, striker, goal, title, …
♦ F_sal (8): package, chip, bank, market, strike, …
♦ eq_sal (7): will, phase, half, top, performance, …
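The slides only say salience is "computed by frequency"; one plausible formulation is the ratio of a word's relative frequency in the domain corpus to its relative frequency in a general corpus. The function below is an assumption in that spirit, not necessarily Koeling et al.'s exact measure:

```python
def salience(word, domain_counts, general_counts, domain_total, general_total):
    """Frequency-ratio salience (an assumed formulation, not necessarily
    the authors' exact measure): how much more frequent the word is in
    the domain corpus than in a general corpus, with add-one smoothing
    for words unseen in the general corpus."""
    p_domain = domain_counts.get(word, 0) / domain_total
    p_general = (general_counts.get(word, 0) + 1) / general_total
    return p_domain / p_general

# Toy counts: "goal" is overrepresented in the sports corpus.
sports = {"goal": 50, "the": 300}
bnc = {"goal": 5, "the": 300}
print(salience("goal", sports, bnc, 1000, 10000))
```

Under any such ratio measure, function words like "the" score low while domain vocabulary like "goal" scores high.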
Sense Distribution
Even in domain-specific corpora, ambiguity is still present, though less than in general text
The domain-specific sense is not always the predominant sense in a domain-specific corpus
♦ but it is more frequent there than in the general corpus
Example
return = a tennis stroke
♦ not the most frequent sense in SPORTS
♦ frequency = 19
♦ absent in FINANCE and BNC
Results
When applied to the corresponding domain, the McCarthy et al. 2004 method beats the random baseline and the SemCor first-sense heuristic in all cases
Results
APPR = training on the appropriate domain
SC = SemCor
Results
Training on the appropriate domain makes sense for all words
Assumption: salient words benefit more
Conclusions
Automatic acquisition of predominant senses from domain-specific corpora outperforms automatic acquisition from SemCor for the sample words
But: still an approximation, with many problematic cases
Better: use local context for disambiguation
Conclusions
The automatic method is cheaper
Use the method if no manually tagged data is available, or if the available data seems inappropriate for the word and domain
Questions?
Thank you!
References
Diana McCarthy, Rob Koeling, Julie Weeds and John Carroll, 2004. Finding Predominant Word Senses in Untagged Text. Proceedings of ACL 2004, Barcelona, Spain.
Rob Koeling, Diana McCarthy and John Carroll, 2005. Domain-Specific Sense Distributions and Predominant Sense Acquisition. Proceedings of EMNLP 2005.
Diana McCarthy, Rob Koeling, Julie Weeds and John Carroll, 2007. Unsupervised Acquisition of Predominant Word Senses. Computational Linguistics 33(4), pp. 553–590.
References
Distributional Similarity
Julie Weeds, 2003. Measures and Applications of Lexical Distributional Similarity. Ph.D. thesis, Department of Informatics, University of Sussex, Brighton, UK.
Semantic Similarity