A SEMANTIC UNSUPERVISED LEARNING APPROACH TO WORD SENSE - - PowerPoint PPT Presentation





SLIDE 1

A SEMANTIC UNSUPERVISED LEARNING APPROACH TO WORD SENSE DISAMBIGUATION

Dissertation Presentation April 4, 2018 Dian I. Martin

SLIDE 2

Presentation Overview

■ Background
■ LSA-WSD Approach
■ Word Importance in a Sentence
■ Automatic Word Sense Induction
■ Automatic Word Sense Disambiguation
■ Future Research

SLIDE 3

THE PROBLEM

WORD SENSE DISAMBIGUATION (WSD): WHICH SENSE OF A WORD IS BEING USED IN A GIVEN CONTEXT?

Mowing the lawn was a hard task for the little boy. The boxer threw a hard left to the chin of his opponent.

SLIDE 4

WSD

Multiple Meanings = Different Word Senses
All Word Senses = Word Definition

SLIDE 5

Two WSD Tasks

Sense Discovery

Determine all the senses for a target word, word A.

Sense Identification

Determine in which sense a target word, word A, is being used in a particular context.

SLIDE 6

WSD Approaches

A Priori Knowledge

■ Dictionary-based or knowledge-based methods
■ Supervised methods
■ Minimally supervised methods

No A Priori Knowledge

■ Unsupervised methods

SLIDE 7

WSD Applications

To name a few …
■ Any NLP application
■ Information retrieval
■ Text mining
■ Information extraction
■ Lexicography
■ Educational applications
■ Analysis of the learning system

SLIDE 8

LSA-WSD APPROACH

An unsupervised algorithm for automated WSD

SLIDE 9

Latent Semantic Analysis

Unsupervised Learning Algorithm

■ Represents a cognitive model
■ Mimics human learning
■ Many applications where an LSA-based learning system (LS) has simulated human knowledge:
  – Essay grading
  – Interactive auto-tutors
  – Synonym tests
  – Text comprehension
  – Summarization feedback

SLIDE 10

Compositionality Constraint

The meaning of a document is the sum of the meanings of the terms that it contains. The meaning of a term is defined by all the contexts in which it does and does not appear.

SLIDE 11

LSA-Based Learning System

SLIDE 12

Latent Semantic Analysis (LSA)

■ Text => Term x Document (TD) matrix
■ TD matrix => Weighted TD matrix
■ Weighted TD matrix => Singular Value Decomposition (SVD)
■ SVD => Term vectors and Document vectors
■ Term vectors => Projections
■ Vector comparisons => Semantic Similarity
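The pipeline above can be sketched end-to-end in plain NumPy. The corpus, the log local weighting, and the k=2 dimensionality here are toy stand-ins for illustration only, not the dissertation's actual setup:

```python
import numpy as np

# Toy corpus; the actual corpora contain 150K-250K documents.
docs = [
    "the bank of the river was muddy",
    "the river flowed past the bank",
    "the bank approved the loan",
    "she deposited cash at the bank",
]
vocab = sorted({w for d in docs for w in d.split()})
t_index = {w: i for i, w in enumerate(vocab)}

# Text => Term x Document (TD) matrix of raw counts
td = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        td[t_index[w], j] += 1

# TD matrix => weighted TD matrix (log local weight only here;
# a global weight such as entropy would normally be applied as well)
weighted = np.log(td + 1)

# Weighted TD matrix => SVD; keep k dimensions
U, s, Vt = np.linalg.svd(weighted, full_matrices=False)
k = 2
term_vecs = U[:, :k] * s[:k]       # term vectors
doc_vecs = Vt[:k, :].T * s[:k]     # document vectors

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Vector comparisons => semantic similarity
print(cosine(term_vecs[t_index["river"]], term_vecs[t_index["muddy"]]))
```

Projections of new sentences (fold-in) and all the clustering in later slides operate on these term and document vectors.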

SLIDE 13

LSA-WSD Approach: Sense Discovery

Semantic Mean Clustering (SMC):
■ Sentence clustering (sentclusters)
■ Synonym clustering (synclusters)

SLIDE 14

LSA-WSD Approach: Sense Identification

For a given target word and particular context:

■ Map the sentence or context into the LSA semantic space
■ Determine the closest cluster
■ The closest cluster identifies the sense

SLIDE 15

Document Collections
Document Set              | # Documents | # Sentences | # Unique Words
Grade Level A 150K        | 162,777     | 1,955,690   | 141,252
Grade Level B 150K        | 162,845     | 1,958,077   | 141,774
Grade Level A 200K        | 209,365     | 2,503,308   | 162,295
Grade Level B 200K        | 209,423     | 2,503,697   | 162,308
Grade Level Unique A 200K | 196,261     | 2,309,345   | 164,940
Grade Level Unique B 200K | 196,262     | 2,306,918   | 164,975
Grade Level A 250K        | 259,847     | 3,099,118   | 182,492
Grade Level B 250K        | 260,059     | 3,097,901   | 182,311
News A 200K               | 200,000     | 2,782,399   | 254,236
News B 200K               | 200,000     | 2,781,141   | 255,640

SLIDE 16

WORD IMPORTANCE IN A SENTENCE

Finding adequate contexts to use in sentence clustering for deriving senses for a target word.

SLIDE 17

Word Importance: 3 Questions

■ Does sentence length have an impact on the importance of a word in a sentence?
■ Are there specific words that never contribute or always contribute to the meaning of a sentence?
■ How often do sentences have important words, ones that contribute notably to the meaning of the sentence?

SLIDE 18

Cosine Impact Value (CIV)

Determine the impact of a word on the meaning of a sentence:

  • Project the sentence with and without the target word into the LSA semantic space
  • Compute the cosine similarity between the two projections (the CIV)

The CIV has an inverse relationship with the impact of a word on the meaning of a sentence.
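The CIV computation can be sketched as below. The 2-D term vectors are hypothetical, and summing term vectors stands in for the full LSA fold-in projection:

```python
import numpy as np

def project(words, term_vecs):
    """Fold a sentence into the semantic space as the sum of its term
    vectors (a simplification of the standard LSA fold-in projection)."""
    return np.sum([term_vecs[w] for w in words if w in term_vecs], axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def civ(sentence, target, term_vecs):
    """Cosine Impact Value: cosine between the sentence projected with
    and without the target word. A lower CIV means greater impact."""
    words = sentence.split()
    with_target = project(words, term_vecs)
    without_target = project([w for w in words if w != target], term_vecs)
    return cosine(with_target, without_target)

# Hypothetical 2-D term vectors purely for illustration
term_vecs = {
    "the": np.array([0.9, 0.1]), "boy": np.array([0.7, 0.4]),
    "mowed": np.array([0.2, 0.9]), "lawn": np.array([0.1, 0.95]),
}
print(civ("the boy mowed the lawn", "lawn", term_vecs))
```

A word whose removal barely changes the sentence projection yields a CIV near 1.0; an important word pulls the CIV well below 1.0.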

SLIDE 19

Cosine Impact Values Calculated

To identify a general indicator of word importance, consider:

■ Sentences of length two or greater
■ Sentences of lengths 2 to 19 for the grade level document set
■ Sentences of lengths 10 to 32 for the news document set
■ Each word in each of these sentences
■ Each of the 234,568,429 resulting CIVs

SLIDE 20

Effect of Sentence Length on Word Importance

SLIDE 21

Distribution of CIVs for Sentence Length Ten

SLIDE 22

Distribution of CIVs for Different Sentence Lengths for a Document Collection

SLIDE 23

Word Characteristics for Word Importance in a Sentence

SLIDE 24

Appearance of Important Words in Sentences

SLIDE 25

Word Importance Observations

■ A CIV of 0.90 determines individual importance of a word to the meaning of a sentence
■ Few words in a corpus, less than 7%, are important to one or more sentences in which they appear
■ Words that are always important to the meaning of the sentences in which they appear are nouns
■ The majority of sentences contain at least one important word
■ Sentences of length four or less generally contain all important words
■ As sentence length increases, individual word importance decreases
■ Corpus size and content did not have an effect on word importance measures

SLIDE 26

WORD SENSE INDUCTION

Step 1 in LSA-WSD approach: The automatic discovery of the possible word senses for a given word.

SLIDE 27

Creating the Learning System (LS)

■ Precursor to Word Sense Induction (WSI)
■ WSI is dependent on the knowledge contained in the LS
■ Just as different humans determine senses differently, so will different WSI systems
■ An LSA-based LS is beneficial for deriving senses indicative of a particular learner or domain
■ Two document collections of 200K documents from each source were used in the WSI experiments

SLIDE 28

Clustering Expectations

■ Items would be evenly distributed across individual clusters
■ Outliers an anomaly – obscure sense or noise?
■ Singleton clusters not desirable
■ All items in one cluster – one sense discovered or multi-sense?

SLIDE 29

Target Words

bank interest pretty batch keep raise build line sentence capital masterpiece serve enjoy monkey turkey hard palm work

SLIDE 30

Sense Discovery with Sentclusters

WSI Experiments using sentclustering (cluster sentences with SMC) for a target word:

  1. All sentences vs. important word set
  2. Determining appropriate clusters
  3. Larger grade level LS
  4. Different source for LS and sentences
  5. Augmented sentence vector
  6. Sentence with target word removed

Problem: Multi-sense cluster

SLIDE 31

Senses Induced Using Sentclusters for the Target Word bank

WSC | # in Cluster | Example sentences
1   | 1  | Bits of broken shell lie on the sunny bank.
2   | 2  | The bank was held up. The bank held Arncaster's mortgage.
3   | 1  | She retrieved the shopping bags and hurried to the bottle bank.
4   | 1  | They walked from bank to bank.
5   | 74 | The Brickster was a bank robber. In the bank, Mark goes up to a teller. In my bank, one quarter goes CLANK. "My piggy bank," Slither said. There's one hiding in the bushes on the bank. She does a perfect cannonball from the mossy bank. Sunny squinted, searching her memory bank.

SLIDE 32

Sense Discovery with Synclusters

■ Examine the meaning of the target word by examining words close to it within the LSA-based learning system
■ Embedded in the term vector are all the senses of the term
■ Separate senses by clustering synonyms based on cosine similarity
■ The top k terms closest to the target word are clustered by SMC
■ The closest word to the centroid of a word sense cluster (WSC) is the identifier for the cluster
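The synclustering steps can be sketched as follows. The exact SMC procedure is not spelled out on these slides, so a simple greedy centroid clustering stands in for it, and the term vectors are hypothetical:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def syncluster(target, term_vecs, k=5, threshold=0.35):
    """Cluster the k terms closest to `target` by cosine similarity.
    A greedy centroid clustering stands in for SMC here."""
    others = [w for w in term_vecs if w != target]
    # Top k terms closest to the target word
    top = sorted(others, key=lambda w: -cosine(term_vecs[w], term_vecs[target]))[:k]
    clusters = []  # each cluster: {"members": [...], "centroid": vec}
    for w in top:
        v = term_vecs[w]
        best, best_cos = None, threshold
        for c in clusters:
            s = cosine(v, c["centroid"])
            if s > best_cos:
                best, best_cos = c, s
        if best is None:
            clusters.append({"members": [w], "centroid": v.copy()})
        else:
            best["members"].append(w)
            best["centroid"] = np.mean([term_vecs[m] for m in best["members"]], axis=0)
    return clusters

# Hypothetical 2-D vectors mimicking two senses of "bank" (river vs. money)
term_vecs = {
    "bank": np.array([0.7, 0.7]),
    "river": np.array([1.0, 0.05]), "shore": np.array([0.95, 0.1]),
    "money": np.array([0.05, 1.0]), "loan": np.array([0.1, 0.95]),
}
clusters = syncluster("bank", term_vecs, k=4, threshold=0.9)
print([c["members"] for c in clusters])
```

With these vectors the four neighbors separate into a river-sense cluster and a money-sense cluster, each of which would then be labeled by the word closest to its centroid.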

SLIDE 33

Top 100 Closest Words to the Target Word bank

bank, current, boatmen, deposit, monongahela, riffles, wading, sandbars, unimpeachable, banks, raft, canoe, loan, paddle, snags, riverside, portage, potomac, downstream, tributary, steamboat, willows, reeds, money, narmada, bills, marshy, riverbank, barge, footbridge, nashua, cash, shallows, rhadamanthus, swift, spanned, upstream, steamboats, flood, riverbed, ferryman, creek, cocytus, sawmills, river, muddy, ferrymen, barges, boatman, cononka, radarscope, padding, rapids, eddies, dammed, paddled, riverbanks, savings, insect-tortured, mississippi, downriver, bluffs, bottomlands, tributaries, dams, flowing, shallow, damming, dam, levee, sandbar, thames, rafts, bottomland, waterfall, meander, upriver, gorge, flatboats, midstream, headwaters, creeks, waded, murky, bridge, flatboat, robb, canal, silt, watercourse, overhanging, platte, flowed, bend, stream, countercurrents, poling, poled, crossing, riverboat

SLIDE 34

WSCs Discovered Using Synclusters for the Target Word bank

Grade Level LS

WSC | # in WSC | WSC Descriptor | Next closest words                  | Cosine between bank and WSC centroid
1   | 93       | downstream     | river, rapids, upstream, riverbank  | 0.78
2   | 6        | money          | bills, cash, savings, loan          | 0.51

News LS

WSC | # in WSC | WSC Descriptor | Next closest words                    | Cosine between bank and WSC centroid
1   | 88       | banks          | banking, deposits, bankers, lending   | 0.78
2   | 9        | rates          | interest, reserve, mortgage, discount | 0.36
3   | 1        | finance        |                                       | 0.21
4   | 1        | manages        |                                       | 0.21

SLIDE 35

WSCs Discovered Using Synclusters

■ Target word: sentence
  Grade Level LS => 1 WSC => spelling
  News LS => 1 WSC => prison

■ Target word: raise

Grade Level Learning System

WSC Label | # in WSC | Cosine to Centroid
money     | 71       | 0.57
raised    | 2        | 0.55
crops     | 6        | 0.50
support   | 6        | 0.37

News Learning System

WSC Label | # in WSC | Cosine to Centroid
increases | 11       | 0.58
funds     | 4        | 0.50
tax       | 26       | 0.48
interest  | 4        | 0.38

SLIDE 36

Synclusters

■ Produced reasonable results
■ Candidate WSCs should have a cosine similarity between centroid and target word > 0.35
■ Can be applied to any word in the document collection
■ Allows for user refinement of candidate WSCs
■ The two learning systems were not equal in their representation of knowledge

SLIDE 37

User Input to Synclustering

WSI Software Input:

  • n # (# of synonyms)
  • l # (cluster inclusion threshold)

New Input:

  • k # (cluster threshold min)
  • m # (min # in WSC)

Saved WSCs => Induced Senses
Examine WSCs => Done or Refine?

SLIDE 38

WORD SENSE IDENTIFICATION

Step 2 in LSA-WSD approach: Determine in which sense a target word is used in a particular context.

SLIDE 39

Two Methods to Determine the Correct Sense for a Sentence

Synonym Replacement (SR) Method

SentA = Original Sentence
For each WSC derived from synclustering:
1. SentB = original sentence with the target word replaced by the WSC identifier
2. Project SentA and SentB into the LSA semantic space
3. Compute the cosine between the projections
The cluster with the highest cosine similarity identifies the word sense.

Context Comparison (CC) Method

SentA = Original Sentence
1. Remove the target word from SentA
2. Project SentA into the LSA semantic space
3. For each WSC derived from synclustering, compute the cosine between the projection and the WSC centroid vector
The cluster with the highest cosine similarity identifies the word sense.
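The CC method can be sketched as below (the SR method differs in comparing the original sentence against a copy with the target word replaced by each WSC identifier). The term vectors and WSC centroids are hypothetical, and sum-of-term-vectors stands in for the full LSA fold-in projection:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def project(words, term_vecs):
    """Sum-of-term-vectors stand-in for the LSA fold-in projection."""
    return np.sum([term_vecs[w] for w in words if w in term_vecs], axis=0)

def cc_identify(sentence, target, wsc_centroids, term_vecs):
    """Context Comparison: project the sentence without the target word
    and pick the WSC whose centroid is closest by cosine similarity."""
    words = [w for w in sentence.split() if w != target]
    ctx = project(words, term_vecs)
    return max(wsc_centroids, key=lambda label: cosine(ctx, wsc_centroids[label]))

# Hypothetical 2-D vectors and WSC centroids for two senses of "bank"
term_vecs = {
    "muddy": np.array([1.0, 0.1]), "shore": np.array([0.9, 0.2]),
    "cash":  np.array([0.1, 1.0]), "teller": np.array([0.2, 0.9]),
    "the":   np.array([0.5, 0.5]),
}
wsc_centroids = {
    "river": np.array([0.95, 0.15]),  # riverbank sense
    "money": np.array([0.15, 0.95]),  # financial sense
}
print(cc_identify("the muddy shore near the bank", "bank", wsc_centroids, term_vecs))
```

The winning cosine value itself can be reported alongside the label, which is what gives the CC method its confidence measure.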

SLIDE 40

Test Sentences for the WSD Task for the Word line

Annotated WSC Label | Sentences using line in this sense

zone (line marked on a field or court)
  Jackie stepped to the line and dropped in both foul shots.
  Jim plowed forward to stop the quarterback from reaching the goal line.

assonance (line of poetry)
  The pattern of stressed and unstressed syllables discernible in a line of poetry has been analyzed in order to determine whether the line follows an iambic or a dactylic or an anapestic metrical arrangement.
  Each stanza has eight lines.

bait (line on a fishing rod)
  He reeled in the line and bent the pole.
  He cast out his line.

horizontal (mathematical term for a line or lines in particular directions)
  The curved line represents the variation of voltage in the signal.
  Draw a horizontal line above the vertical line.

ahead (line marking the starting point or finishing point in a race)
  Matthew dashed across the finish line.
  I crossed the finish line, jogged to a stop, and kneeled on the cinders, breathing deeply.

Different sense
  The workers would build them on a moving assembly line.

Ambiguous sense
  Hold the line a minute, Diane.

SLIDE 41

Number of Times Word Senses Correctly Identified in Context

SLIDE 42

Outcomes of Automatic Disambiguation

■ The CC method performed the best at identifying the sense of a target word in a given context
■ The CC method identified the correct sense 84% of the time
■ The cosine similarity measure produced by the CC method appears to provide information about the confidence of the sense identification

SLIDE 43

CONCLUSION AND FUTURE RESEARCH

The LSA-WSD approach to automated WSI and WSD is a SUCCESS! What next?

SLIDE 44

Word Importance in a Sentence

Conclusions

■ The importance of a word to the meaning of a sentence can be computed by the CIV
■ Words with a CIV less than 0.90 are primary contributors to the meaning of a sentence
■ Most words had a small impact on the meaning of a sentence
■ The majority of sentences in the corpora contained at least one important word
■ All words contribute to the meaning of a sentence (compositionality constraint)
■ Corpus size and content did not have an observable effect on word importance

Future Research

■ First of its kind using the LSA-based LS
■ Can be extended to apply to sentence importance within a document, or a sub-part of text
■ Apply word importance to other applications, for example educational settings

SLIDE 45

Automated Word Sense Induction

Conclusions

■ Sentclustering was able to discover multiple senses for a target word, but a multi-sense cluster remained
■ Synclustering produced reasonable results for sense discovery:
  – Candidate WSCs should have a cosine similarity between centroid and target word > 0.35
  – The grade level LS produced more broad-based results
  – Senses can be induced for any word
  – Can be a semi-supervised system

Future Research

■ Refine sentclustering:
  – Use human-annotated sentences
  – Secondary processing of multi-sense clusters
■ More experimentation using synclustering:
  – Induce senses for more words
  – Define the number of synonyms to use and the optimal cluster inclusion threshold

SLIDE 46

Automated Word Sense Disambiguation

Conclusions

■ Two methods were considered
■ The CC method produced more accurate results, identifying the correct sense for a target word within context sentences 84% of the time
■ The cosine similarity measure gives a degree of confidence in the CC method

Future Research

■ Other methods can be tried
■ More sentences need to be tested for further validation and generalization of results

SLIDE 47

Broader Implications of This Research

Evaluation of the LS

The LSA-WSD system can:
■ Indicate how well an LS knows the senses of a word
■ Help to define the body of knowledge and use of language captured in the LS
■ Guide the creation of an LS for use in different applications

Automation of the Tasks

The LSA-WSD system can:
■ Produce a viable unsupervised, automated system for both WSI and WSD tasks
■ Be adapted to different languages or updated when language changes
■ Be used for different applications

SLIDE 48

THANK YOU!

Questions?

SLIDE 49

WSD History

■ 1940s: Distinct task in machine translation
■ 1950s: Turing's revelation
■ 1960s: Bar-Hillel's assessment; progress stalled
■ 1970s: Automated WSD in artificial intelligence (AI) approaches
■ 1980s: Turning point for WSD: word experts, Lesk algorithm, polarized words
■ 1990s: Development of WordNet, Senseval workshop, statistical revolution
■ 2000s: Development of many different approaches

SLIDE 50

Local and Global Weighting

■ Local functions: normalize the term frequency (tf) within each context
■ Global functions: normalize the global term frequency (gf) across all contexts
■ Log-entropy weighting is generally applied within the LSA-based learning system
■ Log: log(tf_jk + 1) approximates the simple growth of standard learning
■ Entropy: 1 + Σ_k (p_jk · log p_jk) / log n, where p_jk = tf_jk / gf_j for term j and document k; estimates the degree to which observing a term indicates the context in which it appears
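The standard log-entropy weighting used in LSA-based learning systems can be sketched directly:

```python
import numpy as np

def log_entropy_weight(td):
    """Apply log-entropy weighting to a term x document count matrix:
    local weight log(tf + 1) scaled by a global entropy weight
    g_j = 1 + sum_k (p_jk * log p_jk) / log n, with p_jk = tf_jk / gf_j."""
    td = np.asarray(td, dtype=float)
    n_docs = td.shape[1]
    gf = td.sum(axis=1, keepdims=True)            # global frequency of each term
    p = np.where(td > 0, td / gf, 0.0)            # p_jk = tf_jk / gf_j
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(p > 0, p * np.log(p), 0.0)
    g = 1.0 + plogp.sum(axis=1) / np.log(n_docs)  # entropy global weight
    return np.log(td + 1.0) * g[:, np.newaxis]

# A term spread evenly across contexts carries little information (weight ~0);
# a term concentrated in one context keeps its full weight.
td = np.array([[1, 1, 1, 1],    # uniform term
               [4, 0, 0, 0]])   # concentrated term
w = log_entropy_weight(td)
print(w.round(3))
```

The entropy term drives the weight of uninformative, evenly distributed words toward zero, which is exactly why observing a weighted term "indicates the context in which it appears."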

SLIDE 51

Singular Value Decomposition

SLIDE 52

Document Overlap Ratio

SLIDE 53

Sentence Length Comparisons Between Corpora