SLIDE 1

Understanding Text with Knowledge-Bases and Random Walks

Eneko Agirre ixa2.si.ehu.es/eneko

IXA NLP Group University of the Basque Country

MAVIR, 2011

Agirre (UBC) Knowledge-Bases and Random Walks MAVIR 2011 1 / 54

SLIDE 2

Random Walks on Large Graphs

WWW, PageRank and Google

source: http://opte.org

SLIDE 4

Random Walks on Large Graphs

Linked Data

SLIDE 5

Random Walks on Large Graphs

Wikipedia (DBpedia)

SLIDE 6

Random Walks on Large Graphs

WordNet

SLIDE 7

Random Walks on Large Graphs

Unified Medical Language System

SLIDE 8

Random Walks on Large Graphs

sources: http://sixdegrees.hu/ http://www2.research.att.com/~yifanhu/ http://www.cise.ufl.edu/research/sparse/matrices/Gleich/ http://www.ebremer.com/

SLIDE 12

Text Understanding

Understanding of broad language, what’s behind the surface strings:
“Barcelona boss says that Jose Mourinho is ’the best coach in the world’”

End systems that we would like to build: natural dialogue, speech recognition, machine translation; improving parsing, semantic role labeling, information retrieval, question answering

SLIDE 13

Text Understanding

From string to semantic representation (First Order Logic):
“Barcelona coach praises Jose Mourinho.”
∃ e1, x1, x2, x3 such that FC_Barcelona(x1) ∧ coach:n:1(x2) ∧ praise:v:2(e1,x2,x3) ∧ José_Mourinho(x3)
Disambiguation: Concepts, Entities and Semantic Roles
Quantifiers, modality, negation, etc.
Inference and Reasoning: Barcelona coach praises Mourinho ∼ Guardiola honors Mourinho ... with respect to some Knowledge Base

SLIDE 15

Text Understanding: Knowledge Bases and Random Walks

Focus on the following tasks on inference and disambiguation:
Map words in context to KB concepts (Word Sense Disambiguation)
Similarity between concepts and words
Similarity to improve ad-hoc information retrieval
Applied to WordNet(s), UMLS, Wikipedia
Excellent results
Open source software and data: http://ixa2.si.ehu.es/ukb/

SLIDE 17

Outline

1. WordNet, PageRank and Personalized PageRank
2. Random walks for WSD
3. Adapting WSD to domains
4. WSD on the biomedical domain
5. Random walks for similarity
6. Similarity and Information Retrieval
7. Conclusions

SLIDE 18

WordNet, PageRank and Personalized PageRank

Outline

1. WordNet, PageRank and Personalized PageRank
2. Random walks for WSD
3. Adapting WSD to domains
4. WSD on the biomedical domain
5. Random walks for similarity
6. Similarity and Information Retrieval
7. Conclusions

SLIDE 19

WordNet, PageRank and Personalized PageRank

Wordnet, Pagerank and Personalized PageRank (with Aitor Soroa)

WordNet is the most widely used hierarchically organized lexical database for English (Fellbaum, 1998)
Broad coverage of nouns, verbs, adjectives, adverbs
Main unit: synset (concept)

coach#1, manager#3, handler#2 someone in charge of training an athlete or a team.

Relations between concepts: synonymy (built-in), hyperonymy, antonymy, meronymy, entailment, derivation, gloss
Closely linked versions in several languages

SLIDE 20

WordNet, PageRank and Personalized PageRank

Wordnet

Example of hypernym relations: coach#1 → trainer → leader → person → organism → ... → entity
synonyms: manager, handler
gloss words (and synsets): charge, train (verb), athlete, team
hyponyms: baseball coach, basketball coach, conditioner, football coach
instance: John McGraw
domain: sport, athletics
derivation: coach (verb), managership, manage (verb), handle (verb)

SLIDE 21

WordNet, PageRank and Personalized PageRank

Wordnet

Representing WordNet as a graph:
Nodes represent concepts
Edges represent relations (undirected)
In addition, directed edges from words to their corresponding concepts (senses)

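This graph structure can be sketched with plain adjacency lists (a minimal illustration of the slide's conventions, not the actual UKB data structures; the synsets and relations are a hand-picked fragment):

```python
from collections import defaultdict

# Hand-picked fragment of WordNet-style relations (illustrative only).
concept_edges = [
    ("coach#n1", "trainer#n1"),           # hyperonym
    ("coach#n1", "sport#n1"),             # domain
    ("coach#n2", "teacher#n1"),           # hyperonym
    ("coach#n5", "public_transport#n1"),  # hyperonym
    ("coach#n5", "fleet#n2"),             # holonym
    ("coach#n5", "seat#n1"),              # holonym
]
word_senses = {"coach": ["coach#n1", "coach#n2", "coach#n5"]}

graph = defaultdict(set)
for a, b in concept_edges:       # relations between concepts: undirected
    graph[a].add(b)
    graph[b].add(a)
for word, senses in word_senses.items():
    graph[word].update(senses)   # word -> sense edges: directed only
```

Concept relations are added in both directions, while word nodes only point into their senses, matching the undirected/directed split described above.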

SLIDE 22

WordNet, PageRank and Personalized PageRank

Wordnet

[Figure: fragment of the WordNet graph around the word coach: the word node coach links to its senses coach#n1, coach#n2 and coach#n5, which connect to trainer#n1, managership#n3, sport#n1, handle#v6, teacher#n1, tutorial#n1, public_transport#n1, fleet#n2 and seat#n1 via hyperonym, holonym, domain and derivation edges]

SLIDE 23

WordNet, PageRank and Personalized PageRank

Random Walks: PageRank

Given a graph, PageRank ranks nodes according to their relative structural importance. If an edge from ni to nj exists, a vote from ni to nj is produced; the strength of the vote depends on the rank of ni: the more important ni is, the more strength its votes will have.

PageRank is more commonly viewed as the result of a random walk process: the rank of ni represents the probability of a random walk over the graph ending on ni, at a sufficiently large time.

SLIDE 24

WordNet, PageRank and Personalized PageRank

Random Walks: PageRank

G: graph with N nodes n1, ..., nN
di: outdegree of node i
M: N × N transition matrix, with Mji = 1/di if an edge from i to j exists, and 0 otherwise

PageRank equation: Pr = cMPr + (1 − c)v
The first term models a surfer that follows edges; the second, a surfer that randomly jumps to any node (teleport); the damping factor c controls how these two terms are combined.

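The fixed point of Pr = cMPr + (1 − c)v is typically computed by power iteration. A minimal dense-matrix sketch (the function name and the toy 3-node cycle are illustrative, not from the talk):

```python
import numpy as np

def pagerank(M, c=0.85, n_iter=30, v=None):
    """Iterate Pr = c*M*Pr + (1-c)*v.
    M[j, i] = 1/d_i if an edge i -> j exists, 0 otherwise."""
    n = M.shape[0]
    if v is None:
        v = np.full(n, 1.0 / n)  # uniform teleport vector: plain PageRank
    pr = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        pr = c * (M @ pr) + (1 - c) * v
    return pr

# Toy 3-node cycle 0 -> 1 -> 2 -> 0: by symmetry all ranks are equal.
M = np.array([[0., 0., 1.],
              [1., 0., 0.],
              [0., 1., 0.]])
pr = pagerank(M)
```

With a column-stochastic M the total probability mass is preserved at every step, so pr stays a distribution.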

SLIDE 29

WordNet, PageRank and Personalized PageRank

Random Walks: Personalized PageRank

Pr = cMPr + (1 − c)v
PageRank: v is a stochastic normalized vector with elements 1/N, giving equal probabilities to all nodes in case of random jumps

Personalized PageRank, non-uniform v (Haveliwala 2002)

Assign stronger probabilities to certain kinds of nodes Bias PageRank to prefer these nodes

For example, if we concentrate all the mass on node i:
All random jumps return to ni
Rank of i will be high
The high rank of i will make all the nodes in its vicinity also receive a high rank
The importance of node i given by the initial v spreads along the graph

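Personalization changes only the teleport vector v in the same iteration. A toy sketch (a hypothetical 4-node path graph, not from the talk) showing how mass concentrated on one node raises the ranks of its vicinity:

```python
import numpy as np

def personalized_pagerank(M, v, c=0.85, n_iter=30):
    # Same fixed-point iteration as PageRank; only the teleport vector differs.
    v = np.asarray(v, dtype=float)
    pr = v.copy()
    for _ in range(n_iter):
        pr = c * (M @ pr) + (1 - c) * v
    return pr

# Undirected path 0 - 1 - 2 - 3, column-normalized by degree.
M = np.array([[0., .5, 0., 0.],
              [1., 0., .5, 0.],
              [0., .5, 0., 1.],
              [0., 0., .5, 0.]])
v = np.array([1., 0., 0., 0.])   # all teleport mass on node 0
pr = personalized_pagerank(M, v)
# Nodes near the teleport node outrank the far end of the path.
```

Running it, node 3 (farthest from the teleport node) receives the lowest rank, while nodes 0 and 1 dominate.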

SLIDE 32

Random walks for WSD

Outline

1. WordNet, PageRank and Personalized PageRank
2. Random walks for WSD
3. Adapting WSD to domains
4. WSD on the biomedical domain
5. Random walks for similarity
6. Similarity and Information Retrieval
7. Conclusions

SLIDE 33

Random walks for WSD

Word Sense Disambiguation (WSD)

Goal: determine senses of the open-class words in a text.

“Nadal is sharing a house with his uncle and coach, Toni.” “Our fleet comprises coaches from 35 to 58 seats.”

Knowledge Base (e.g. WordNet):

coach#1 someone in charge of training an athlete or a team.
coach#2 a person who gives private instruction (as in singing, acting, etc.).
coach#3 a railcar where passengers ride.
coach#4 a carriage pulled by four horses with one driver.
coach#5 a vehicle carrying many passengers; used for public transport.

SLIDE 35

Random walks for WSD

Word Sense Disambiguation (WSD)

Supervised corpus-based WSD performs best
Train classifiers on hand-tagged data (typically SemCor)
Data sparseness, e.g. coach 20 examples (20,0,0,0,0,0), bank 48 examples (25,20,2,1,0...)
Results decrease when train/test come from different sources (even Brown vs. BNC), and decrease even more when train/test come from different domains

Knowledge-based WSD
Uses the information in a KB (WordNet)
Performs close to, but lower than, the Most Frequent Sense baseline (MFS, supervised)
Issues: vocabulary coverage, relation coverage

SLIDE 37

Random walks for WSD

Domain adaptation

Deploying NLP techniques in real applications is challenging, especially for WSD:
Sense distributions change across domains
Data sparseness hurts more
Context overlap is reduced
New senses, new terms

But some words get fewer interpretations in domains: bank in finance, coach in sports

SLIDE 39

Random walks for WSD

Knowledge-based WSD using random walks (with Aitor Soroa, Oier Lopez de Lacalle)

Use information in WordNet for disambiguation:

“Our fleet comprises coaches from 35 to 58 seats.”

Traditional approach (Patwardhan et al. 2007):

Compare each target sense of coach with those of the words in the context (fleet, comprise, blue, seat)
Using semantic relatedness between pairs of senses (6x4x3x8x9 = 5184 combinations)
Alternative to the combinatorial explosion: each word disambiguated individually (6x(4+3+8+9) = 144)

sim(coach#1,fleet#1) + sim(coach#1,fleet#2) + ... + sim(coach#1,seat#1) ...
sim(coach#2,fleet#1) + sim(coach#2,fleet#2) + ... + sim(coach#2,seat#1) ...
...

Graph-based methods

Exploit the structural properties of the graph underlying WordNet
Find globally optimal solutions
Disambiguate large portions of text in one go
Principled solution to the combinatorial explosion

SLIDE 41

Random walks for WSD

Using PageRank for WSD

Given a graph representation of the LKB, PageRank over the whole WordNet would get a context-independent ranking of word senses. We would like: given an input text, disambiguate all open-class words in the input, taking the rest as context.

Two alternatives:
1. Create a context-sensitive subgraph and apply PageRank over it (Navigli and Lapata, 2007; Agirre et al. 2008)
2. Use Personalized PageRank over the complete graph, initializing v with the context words

SLIDE 43

Random walks for WSD

Using PageRank for WSD

Given a graph representation of the LKB PageRank over the whole WordNet would get a context-independent ranking of word senses We would like:

Given an input text, disambiguate all open-class words in the input taking the rest as context

Two alternatives

1

Create a context-sensitive subgraph and apply PageRank over it (Navigli and Lapata, 2007; Agirre et al. 2008)

2

Use Personalized PageRank over the complete graph, initializing v with the context words

Agirre (UBC) Knowledge-Bases and Random Walks MAVIR 2011 21 / 54

slide-43
SLIDE 43

Random walks for WSD

Using Personalized PageRank (PPR)

For each word Wi, i = 1 ... m in the context:
Initialize v with uniform probabilities over the words Wi
Context words act as source nodes injecting probability mass into the concept graph
Run Personalized PageRank
Choose the highest-ranking sense for the target word

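The procedure can be sketched end-to-end on a made-up miniature graph (illustrative only: the node set, helper names and context are invented, and the real method runs over the whole WordNet graph):

```python
import numpy as np

def ppr(M, v, c=0.85, n_iter=30):
    v = np.asarray(v, dtype=float)
    pr = v.copy()
    for _ in range(n_iter):
        pr = c * (M @ pr) + (1 - c) * v
    return pr

def build_matrix(nodes, edges):
    """Column-stochastic matrix: M[j, i] = 1/outdegree(i) for each edge i -> j."""
    idx = {n: i for i, n in enumerate(nodes)}
    M = np.zeros((len(nodes), len(nodes)))
    out = {n: sum(1 for a, _ in edges if a == n) for n in nodes}
    for a, b in edges:
        M[idx[b], idx[a]] = 1.0 / out[a]
    return M, idx

# Tiny graph: word nodes point to their senses; concept edges go both ways.
nodes = ["coach", "fleet", "coach#n1", "coach#n5", "fleet#n2", "trainer#n1"]
edges = [("coach", "coach#n1"), ("coach", "coach#n5"),
         ("fleet", "fleet#n2"),
         ("coach#n5", "fleet#n2"), ("fleet#n2", "coach#n5"),
         ("coach#n1", "trainer#n1"), ("trainer#n1", "coach#n1")]
M, idx = build_matrix(nodes, edges)

# Teleport vector: uniform mass over the context words, as on the slide.
context = ["coach", "fleet"]
v = np.zeros(len(nodes))
for w in context:
    v[idx[w]] = 1.0 / len(context)

pr = ppr(M, v)
best = max(["coach#n1", "coach#n5"], key=lambda s: pr[idx[s]])
# The vehicle sense coach#n5 wins: it is reinforced through fleet#n2.
```

Because fleet keeps injecting mass that flows into fleet#n2 and on to coach#n5, that sense outranks the trainer sense coach#n1 in this context.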

SLIDE 44

Random walks for WSD

Using Personalized PageRank (PPR)

[Figure: Personalized PageRank over the WordNet graph for the context “coach fleet comprise ... seat”: the context words inject probability mass into their senses (coach#n1, coach#n2, coach#n5, fleet#n2, comprise#v1, seat#n1, ...), and the mass spreads through related concepts such as trainer#n1, teacher#n1 and public_transport#n1]

SLIDE 45

Random walks for WSD

Experiment setting

Two datasets: Senseval-2 All Words (S2AW) and Senseval-3 All Words (S3AW), both labelled with WordNet 1.7 tags
Create input contexts of at least 20 words, adding the sentences immediately before and after if the original is too short
PageRank settings: damping factor (c) 0.85, end after 30 iterations

SLIDE 46

Random walks for WSD

Results and comparison to related work (S2AW)

Senseval-2 All Words dataset

System     All    N     V     Adj.   Adv.
Mih05      54.2   57.5  36.5  56.7   70.9
Sinha07    56.4   65.6  32.3  61.4   60.2
Tsatsa07   49.2   –     –     –      –
PPR        58.6   70.4  38.9  58.3   70.1
MFS        60.1   71.2  39.0  61.1   75.4

(Mihalcea, 2005) Pairwise Lesk between senses, then PageRank.
(Sinha & Mihalcea, 2007) Several similarity measures, voting, fine-tuning for each PoS. Development over S3AW.
(Tsatsaronis et al., 2007) Subgraph BFS over WordNet 1.7 and eXtended WN, then spreading activation.

SLIDE 48

Random walks for WSD

Comparison to related work (S3AW)

System     All    N     V     Adj.   Adv.
Mih05      52.2   –     –     –      –
Sinha07    52.4   60.5  40.6  54.1   100.0
Nav10      –      61.9  36.1  62.8   –
PPR        57.4   64.1  46.9  62.6   92.9
MFS        62.3   69.3  53.6  63.7   92.9

(Mihalcea, 2005) Pairwise Lesk between senses, then PageRank.
(Sinha & Mihalcea, 2007) Several similarity measures, voting, fine-tuning for each PoS. Development over S3AW.
(Navigli & Lapata, 2010) Subgraph DFS(3) over WordNet 2.0 plus proprietary relations, several centrality algorithms.

SLIDE 50

Adapting WSD to domains

Outline

1. WordNet, PageRank and Personalized PageRank
2. Random walks for WSD
3. Adapting WSD to domains
4. WSD on the biomedical domain
5. Random walks for similarity
6. Similarity and Information Retrieval
7. Conclusions

SLIDE 51

Adapting WSD to domains

Adapting WSD to domains

How could we improve WSD performance without tagging new data from the domain or adapting WordNet manually to the domain? What would happen if we apply PPR-based WSD to specific domains?

Personalized PageRank over context:
“... has never won a league title as coach but took Parma to success ...”

Personalized PageRank over related words:
Get related words from a distributional thesaurus
coach: manager, captain, player, team, striker, ...

SLIDE 54

Adapting WSD to domains

Experiments

Dataset with examples from the BNC and the Sports and Finance sections of Reuters (Koeling et al. 2005)
41 nouns: salient in either domain or with senses linked to these domains
Sense inventory: WordNet v. 1.7.1
300 examples for each of the 41 nouns, roughly 100 examples per word and corpus

Experiments:
Supervised: train MFS, SVM, k-NN on SemCor examples
Personalized PageRank (same damping factor and iterations as before), using either the context or 50 related words (Koeling et al. 2005) (BNC, Sports, Finance)

SLIDE 56

Adapting WSD to domains

Results

Systems                            BNC     Sports  Finances
Baselines    Random                *19.7   *19.2   *19.5
             SemCor MFS            *34.9   *19.6   *37.1
Supervised   SVM                   *38.7   *25.3   *38.7
             k-NN                  42.8    *30.3   *43.4
PPR          Context               43.8    *35.6   *46.9
             Related words         *37.7   51.5    59.3
(Koeling et al. 2005)              *40.7   *43.3   *49.7
Skyline      Test MFS              *52.0   *77.8   *82.3

Supervised (MFS, SVM, k-NN) very low (see test MFS)
PPR on context: best for BNC (* marks statistical significance)
PPR on related words: best for Sports and Finance, and improves over Koeling et al., who use pairwise WordNet similarity.

SLIDE 57

WSD on the biomedical domain

Outline

1. WordNet, PageRank and Personalized PageRank
2. Random walks for WSD
3. Adapting WSD to domains
4. WSD on the biomedical domain
5. Random walks for similarity
6. Similarity and Information Retrieval
7. Conclusions

SLIDE 58

WSD on the biomedical domain

UMLS and biomedical text (with Aitor Soroa and Mark Stevenson)

Ambiguities are often believed not to occur in specific domains:
“On the Use of Cold Water as a Powerful Remedial Agent in Chronic Disease.”
“Intranasal ipratropium bromide for the common cold.”
Yet 11.7% of the phrases in abstracts added to MEDLINE in 1998 were ambiguous (Weeber et al. 2011)

Unified Medical Language System (UMLS) Metathesaurus Concept Unique Identifiers (CUIs):
C0234192: Cold (Cold Sensation) [Physiologic Function]
C0009264: Cold (cold temperature) [Natural Phenomenon or Process]
C0009443: Cold (Common Cold) [Disease or Syndrome]

SLIDE 60

WSD on the biomedical domain

WSD and biomedical text

Thesauri in the Metathesaurus:
Alcohol and Other Drugs, Medical Subject Headings, CRISP Thesaurus, SNOMED Clinical Terms, etc.

Relations in the Metathesaurus between CUIs:
parent, can be qualified by, related possibly synonymous, related other

We applied random walks over a graph of CUIs.
Evaluated on NLM-WSD, 50 ambiguous terms (100 instances each)

KB                           #CUIs    #relations  Acc.  Terms
AOD                          15,901   58,998      51.5  4
MSH                          278,297  1,098,547   44.7  9
CSP                          16,703   73,200      60.2  3
SNOMEDCT                     304,443  1,237,571   62.5  29
all above                    572,105  2,433,324   64.4  48
all relations                –        5,352,190   68.1  50
combined with cooc.          –        –           73.7  50
(Jimeno and Aronson, 2011)   –        –           68.4  50

SLIDE 62

Random walks for similarity

Outline

1. WordNet, PageRank and Personalized PageRank
2. Random walks for WSD
3. Adapting WSD to domains
4. WSD on the biomedical domain
5. Random walks for similarity
6. Similarity and Information Retrieval
7. Conclusions

SLIDE 63

Random walks for similarity

Similarity

Given two words or multiword expressions, estimate how similar they are: cord / smile, gem / jewel, magician / oracle
Features shared, belonging to the same class
Relatedness is a more general relationship, including other relations like topical relatedness or meronymy: king / cabbage, movie / star, journey / voyage
Typically implemented as calculating a numeric value of similarity/relatedness.

SLIDE 66

Random walks for similarity

Similarity and WSD

gem / jewel, movie / star. WSD and similarity are closely intertwined: similarity between words is based on similarity between senses (implicitly doing disambiguation), and WSD uses similarity between senses in context.


slide-67
SLIDE 67

Random walks for similarity

Similarity examples

RG dataset                        WordSim353 dataset
cord / smile            0.02      king / cabbage            0.23
rooster / voyage        0.04      professor / cucumber      0.31
...                               ...
glass / jewel           1.78      investigation / effort    4.59
magician / oracle       1.82      movie / star              7.38
...                               ...
cemetery / graveyard    3.88      journey / voyage          9.29
automobile / car        3.92      midday / noon             9.29
midday / noon           3.94      tiger / tiger            10.00
80 pairs, 51 subjects             353 pairs, 16 subjects
Similarity                        Similarity and relatedness

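Systems for this task are usually evaluated by the Spearman rank correlation between their scores and the human ratings. The sketch below uses the no-ties closed form 1 - 6*sum(d^2)/(n*(n^2-1)), which only holds when neither list has tied values; the gold values are four of the RG ratings above, while the system scores are invented for illustration.

```python
def spearman(x, y):
    """Spearman rank correlation between two score lists (assumes no ties)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Gold RG ratings for cord/smile, glass/jewel, magician/oracle, automobile/car,
# against hypothetical system scores.
gold = [0.02, 1.78, 1.82, 3.92]
system = [0.10, 0.55, 0.40, 0.90]
print(spearman(gold, system))  # -> 0.8 (one swapped pair among four)
```

A perfect ranking gives 1.0 regardless of the absolute scores, which is why Spearman (rather than, say, mean error) is the standard metric for similarity datasets.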

slide-68
SLIDE 68

Random walks for similarity

Similarity

Two main approaches:

Knowledge-based (Roget's Thesaurus, WordNet, etc.) and corpus-based, also known as distributional similarity (co-occurrences).

Many potential applications:

Overcoming brittleness (exact word match); NLP subtasks (parsing, semantic role labeling); information retrieval; question answering; summarization; machine translation optimization and evaluation; inference (textual entailment).


slide-70
SLIDE 70

Random walks for similarity

Random walks for similarity (with Aitor Soroa, Montse Cuadros, German Rigau)

Based on (Hughes and Ramage, 2007). Given a pair of words (w1, w2):

Initialize the teleport probability mass on w1. Run Personalized PageRank, obtaining a vector for w1. Initialize with w2 and obtain the vector for w2. Measure the similarity between the two vectors (e.g. cosine).

Experiment settings:

Damping value c = 0.85; calculations finish after 30 iterations.

Variations for Knowledge Base:

WordNet 3.0: WordNet relations, gloss relations, other relations.

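The walk-and-compare procedure can be sketched in a few lines of Python. The graph below is a tiny invented stand-in for WordNet (node names and edges are hypothetical); the damping value (0.85) and 30 iterations match the experiment settings, and similarity is the cosine of the two Personalized PageRank vectors.

```python
import math

# Tiny hypothetical concept graph standing in for WordNet (undirected edges).
GRAPH = {
    "gem": ["jewel", "stone"],
    "jewel": ["gem", "ornament"],
    "stone": ["gem", "rock"],
    "ornament": ["jewel"],
    "rock": ["stone"],
}

def personalized_pagerank(graph, seeds, damping=0.85, iterations=30):
    """Power iteration where the teleport mass is concentrated on the seeds."""
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in graph}
    rank = dict(teleport)
    for _ in range(iterations):
        rank = {
            n: (1 - damping) * teleport[n]
               + damping * sum(rank[m] / len(graph[m])
                               for m in graph if n in graph[m])
            for n in graph
        }
    return rank

def cosine(u, v):
    dot = sum(u[n] * v[n] for n in u)
    return dot / (math.sqrt(sum(x * x for x in u.values()))
                  * math.sqrt(sum(x * x for x in v.values())))

v_gem, v_jewel, v_rock = (personalized_pagerank(GRAPH, [w])
                          for w in ("gem", "jewel", "rock"))
print(cosine(v_gem, v_jewel) > cosine(v_gem, v_rock))  # similar pair scores higher
```

With the real WordNet graph the vectors range over all synsets, so each word is compared through its full probability distribution rather than a single sense.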

slide-72
SLIDE 72

Random walks for similarity

Dataset and results

Method                               Source     Spearman
(Gabrilovich and Markovitch, 2007)   Wikipedia  0.75
WordNet 3.0 + Knownets               WordNet    0.71
WordNet 3.0 + glosses                WordNet    0.68
(Agirre et al., 2009)                Corpora    0.66
(Finkelstein et al., 2007)           LSA        0.56
(Hughes and Ramage, 2007)            WordNet    0.55
(Jarmasz, 2003)                      WordNet    0.35


slide-73
SLIDE 73

Similarity and Information Retrieval

Outline

1. WordNet, PageRank and Personalized PageRank
2. Random walks for WSD
3. Adapting WSD to domains
4. WSD on the biomedical domain
5. Random walks for similarity
6. Similarity and Information Retrieval
7. Conclusions


slide-74
SLIDE 74

Similarity and Information Retrieval

Similarity and Information Retrieval (with Arantxa Otegi and Xabier Arregi)

Document expansion (a.k.a. clustering and smoothing) has been shown to be successful in ad-hoc IR. Use WordNet and similarity to expand documents. Example:

I can’t install DSL because of the antivirus program, any hints?

You should turn off virus and anti-spy software. And thats done within each of the softwares themselves. Then turn them back on later after setting up any DSL softwares.

Method:

Initialize the random walk with the document words. Retrieve the top k synsets. Introduce the words of those k synsets in a secondary index. When retrieving, use both the primary and secondary indexes.

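The expansion step can be sketched as follows. The concept graph, its sense names, and the word lists are all invented for illustration (a real run walks the full WordNet graph); the walk reuses the damping value (0.85) and 30 iterations from earlier in the talk, and the returned terms are what would populate the secondary index.

```python
# Hypothetical concept graph: each concept lists its words and outgoing edges.
CONCEPTS = {
    "virus-1":     {"words": ["virus", "malware"],      "edges": ["software-1", "antivirus-1"]},
    "antivirus-1": {"words": ["antivirus", "anti-spy"], "edges": ["virus-1", "software-1"]},
    "software-1":  {"words": ["software", "program"],   "edges": ["virus-1", "antivirus-1"]},
    "dsl-1":       {"words": ["dsl", "broadband"],      "edges": ["software-1"]},
}

def personalized_pagerank(seeds, damping=0.85, iterations=30):
    """Random walk over the concept graph, teleporting to the seed concepts."""
    teleport = {c: (1.0 / len(seeds) if c in seeds else 0.0) for c in CONCEPTS}
    rank = dict(teleport)
    for _ in range(iterations):
        rank = {
            c: (1 - damping) * teleport[c]
               + damping * sum(rank[m] / len(CONCEPTS[m]["edges"])
                               for m in CONCEPTS if c in CONCEPTS[m]["edges"])
            for c in CONCEPTS
        }
    return rank

def expand_document(doc_words, k=2):
    """Words of the top-k concepts after the walk, destined for a secondary index."""
    seeds = [c for c, d in CONCEPTS.items() if set(d["words"]) & set(doc_words)]
    rank = personalized_pagerank(seeds)
    top = sorted(rank, key=rank.get, reverse=True)[:k]
    expansion = {w for c in top for w in CONCEPTS[c]["words"]} - set(doc_words)
    return sorted(expansion)

doc = ["turn", "off", "virus", "software"]
print(expand_document(doc))  # expansion terms for the secondary index
```

Note that expansion terms never overwrite the original text: they go into a separate index, so retrieval can weight the two fields independently.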

slide-75
SLIDE 75

Similarity and Information Retrieval

Example

You should turn off virus and anti-spy software. And thats done within each of the softwares themselves. Then turn them back on later after setting up any DSL softwares.


slide-76
SLIDE 76

Similarity and Information Retrieval

Example


slide-77
SLIDE 77

Similarity and Information Retrieval

Example

I can’t install DSL because of the antivirus program, any hints?


slide-78
SLIDE 78

Similarity and Information Retrieval

Experiments

BM25 ranking function. Combine two indexes: original words and expansion terms. Parameters: k1, b (BM25), λ (index combination), k (concepts in expansion).

Three collections: Robust at CLEF 2009, Yahoo! Answers, ResPubliQA (IR for QA).

Summary of results:

Default parameters: 1.43%-4.90% improvement on all 3 datasets. Optimized parameters: 0.98%-2.20% improvement on 2 datasets. Carrying parameters over: 5.77%-19.77% improvement in 4 out of 6 cases.

Robust behaviour, particularly on short documents.

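One natural way to combine the two indexes is a linear interpolation of BM25 scores with λ; the exact combination used in the experiments may differ, so this is only a sketch. The toy corpus and expansion terms are invented for illustration, and k1 and b take the common BM25 defaults.

```python
import math
from collections import Counter

def bm25_scores(docs, query, k1=1.2, b=0.75):
    """BM25 score of each document (a list of tokens) for the query terms."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

# Primary index holds the original words; the secondary index holds the
# expansion terms produced by the random walk (invented here).
primary = [["install", "dsl", "antivirus"], ["bake", "bread", "oven"]]
secondary = [["software", "virus", "broadband"], ["flour", "yeast"]]

def combined(query, lam=0.5):
    p = bm25_scores(primary, query)
    s = bm25_scores(secondary, query)
    return [lam * a + (1 - lam) * e for a, e in zip(p, s)]

# "software" only matches through the expansion index, yet document 0 wins.
print(combined(["software", "dsl"]))
```

The interpolation weight λ plays the same role as the λ parameter listed above: λ = 1 falls back to plain BM25 over the original words.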

slide-81
SLIDE 81

Conclusions

Outline

1. WordNet, PageRank and Personalized PageRank
2. Random walks for WSD
3. Adapting WSD to domains
4. WSD on the biomedical domain
5. Random walks for similarity
6. Similarity and Information Retrieval
7. Conclusions


slide-82
SLIDE 82

Conclusions

Conclusions

Knowledge-based method for similarity and WSD, based on random walks; exploits the whole structure of the underlying KB efficiently.

Performance:

Similarity: best KB algorithm, comparable with systems using 1.6 Tword corpora, slightly below ESA. WSD: best KB algorithm on the S2AW, S3AW and Domains datasets.

WSD and domains:

Better than supervised WSD when adapting to domains (Sports, Finance). Best KB algorithm on biomedical texts.


slide-84
SLIDE 84

Conclusions

Conclusions

Applications: ad-hoc information retrieval, with performance gains and robustness.

Easily ported to other languages

Provides cross-lingual similarity; the only requirement is having a WordNet.

Publicly available at http://ixa2.si.ehu.es/ukb

Both programs and data (WordNet, UMLS), including a program to construct graphs from new KBs (e.g. Wikipedia). GPL license, open source, free.


slide-87
SLIDE 87

Conclusions

Future work

Domains and WSD: interrelation of domain classification and WSD. Named entity disambiguation using Wikipedia. Information retrieval: other options to combine similarity information. Moving to sentence similarity: SemEval task on Semantic Textual Similarity, http://www.cs.york.ac.uk/semeval-2012/task6/


slide-88
SLIDE 88

Conclusions

Understanding Text with Knowledge-Bases and Random Walks

Eneko Agirre ixa2.si.ehu.es/eneko

IXA NLP Group University of the Basque Country

MAVIR, 2011


slide-89
SLIDE 89

Conclusions

References I

Agirre, E., Arregi, X. and Otegi, A. (2010). Document Expansion Based on WordNet for Robust IR. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling), pp. 9–17.

Agirre, E., de Lacalle, O. L. and Soroa, A. (2009). Knowledge-Based WSD on Specific Domains: Performing better than Generic Supervised WSD. In Proceedings of IJCAI, Pasadena, USA.

Agirre, E. and Soroa, A. (2009). Personalizing PageRank for Word Sense Disambiguation. In Proceedings of EACL-09, Athens, Greece.


slide-90
SLIDE 90

Conclusions

References II

Agirre, E., Soroa, A., Alfonseca, E., Hall, K., Kravalova, J. and Pasca, M. (2009). A Study on Similarity and Relatedness Using Distributional and WordNet-based Approaches. In Proceedings of the Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL), Boulder, USA.

Agirre, E., Soroa, A. and Stevenson, M. (2010). Graph-based Word Sense Disambiguation of Biomedical Documents. Bioinformatics 26, 2889–2896.

Agirre, E., Cuadros, M., Rigau, G. and Soroa, A. (2010). Exploring Knowledge Bases for Similarity. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), pp. 373–377, European Language Resources Association (ELRA), Valletta, Malta.


slide-91
SLIDE 91

Conclusions

References III

Otegi, A., Arregi, X. and Agirre, E. (2011). Query Expansion for IR using Knowledge-Based Relatedness. In Proceedings of the International Joint Conference on Natural Language Processing.

Stevenson, M., Agirre, E. and Soroa, A. (2011). Exploiting Domain Information for Word Sense Disambiguation of Medical Documents. Journal of the American Medical Informatics Association, in press, 1–6.

Yeh, E., Ramage, D., Manning, C., Agirre, E. and Soroa, A. (2009). WikiWalk: Random Walks on Wikipedia for Semantic Relatedness. In Proceedings of the ACL Workshop TextGraphs-4: Graph-based Methods for Natural Language Processing.
