Using UMLS CUIs for WSD in the Biomedical Domain, Bridget T. McInnes (PowerPoint presentation)



SLIDE 1

1 09/11/07

Using UMLS CUIs for WSD in the Biomedical Domain

Bridget T. McInnes¹ Ted Pedersen² and John Carlis¹

University of Minnesota Twin Cities¹ and University of Minnesota Duluth²

SLIDE 2

What is WSD?

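Word sense disambiguation (WSD) is the task of automatically determining the intended sense of an ambiguous word from its context.

The culture count doubled.

Culture: Laboratory Culture or Anthropological Culture?

The candidate senses come from a sense inventory.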

SLIDE 3

Sense Inventory: UMLS

The Unified Medical Language System (UMLS) contains a list of Concept Unique Identifiers (CUIs), which are the concepts (senses) associated with a word or term.

Culture: Laboratory Culture (C0430400) or Anthropological Culture (C0010453)

SLIDE 4

UMLS: Semantic Network

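The Semantic Network is a framework encoded with different semantic and syntactic structures: each concept is assigned one or more semantic types, and the semantic types are connected by semantic relations.

Anthropological Culture (C0010453): semantic type Idea or Concept

Laboratory Culture (C0430400): semantic type Laboratory Procedure

The diagram also links these semantic types to the semantic type Mental Process through the semantic relations assesses_effect_of and result_of.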
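To make the sense inventory concrete, here is a minimal sketch of it as an in-memory lookup table. It hard-codes only the two senses of "culture" and their semantic types as shown on the slides; the names `Concept`, `SENSE_INVENTORY`, and `candidate_senses` are hypothetical, and a real system would load these entries from the UMLS distribution rather than typing them in.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    cui: str                       # Concept Unique Identifier
    name: str                      # preferred concept name
    semantic_types: tuple[str, ...]


# Hypothetical hard-coded inventory holding only the two senses from the slides.
# A real system would build this from the UMLS files (e.g. MRCONSO / MRSTY).
SENSE_INVENTORY: dict[str, list[Concept]] = {
    "culture": [
        Concept("C0430400", "Laboratory Culture", ("Laboratory Procedure",)),
        Concept("C0010453", "Anthropological Culture", ("Idea or Concept",)),
    ],
}


def candidate_senses(word: str) -> list[Concept]:
    """Return the candidate concepts (senses) for a word, or [] if unknown."""
    return SENSE_INVENTORY.get(word.lower(), [])


for concept in candidate_senses("Culture"):
    print(concept.cui, concept.name, concept.semantic_types)
```

The WSD task is then to pick one entry from `candidate_senses(word)` for each occurrence of the word.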

SLIDE 5

MetaMap

A concept mapping system that maps text to concepts in the UMLS.

It provides a wealth of information for all words in a document: phrasal information, and the part of speech (POS), CUI, and semantic types of each word.

SLIDE 6

Example

The culture count doubled.

count

CUI: Count (C0750480); semantic type: Idea or Concept (idcn); POS: noun

doubled

CUI: Duplicate (C0205173); semantic type: Functional Concept (ftcn); POS: verb
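The per-word information in this example can be held in a small record type. This is a sketch only: `TokenAnnotation` and `cuis_in` are hypothetical names, the fields mirror what the slide lists rather than MetaMap's actual output format, and the only data used is the two annotations shown above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TokenAnnotation:
    """One word's MetaMap-style annotation (hypothetical record, not MetaMap's format)."""
    word: str
    cui: str
    concept_name: str
    semantic_type: str  # four-letter abbreviation as on the slide, e.g. "idcn"
    pos: str


# The two annotations shown for "The culture count doubled."
annotations = [
    TokenAnnotation("count", "C0750480", "Count", "idcn", "noun"),
    TokenAnnotation("doubled", "C0205173", "Duplicate", "ftcn", "verb"),
]


def cuis_in(annots: list[TokenAnnotation]) -> list[str]:
    """Collect the CUIs in text order; these are what a CUI-based classifier sees."""
    return [a.cui for a in annots]


print(cuis_in(annotations))  # ['C0750480', 'C0205173']
```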

SLIDE 7

Supervised Approaches

Leroy and Rindflesch 2005

Semantic types, semantic relations, part-of-speech, and head information (from MetaMap)

Joshi, Pedersen and Maclin 2005

Unigrams that occur in the same sentence as the ambiguous word, and unigrams that occur in the same abstract as the ambiguous word

Liu, Teller and Friedman 2004

Unigrams, direction and orientation of unigrams, and collocations

SLIDE 8

Questions

SLIDE 9

Questions

Would UMLS CUIs be an improvement over semantic types?

SLIDE 10

Questions

Would UMLS CUIs be an improvement over semantic types?

Would the biomedical-specific CUI features be an improvement over the more general unigram features?

SLIDE 11

Questions

Would UMLS CUIs be an improvement over semantic types?

Would the biomedical-specific CUI features be an improvement over the more general unigram features?

Would increasing the context window in which surrounding CUIs are found improve the results?

SLIDE 12

Our supervised approach

Algorithm:

Naïve Bayes from the WEKA data mining package, evaluated with 10-fold cross-validation

Features:

UMLS CUIs obtained from MetaMap

that occur in the same sentence as the ambiguous word more than once (s-1-cui)

that occur in the same abstract as the ambiguous word more than once (a-1-cui)
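One way to build the two feature sets is sketched below. It assumes that "more than one time" means a CUI's frequency pooled over all training instances for one ambiguous word exceeds a cutoff of 1; the slides do not say exactly how frequency is counted, so the `cutoff` parameter keeps that choice explicit. `Instance`, `select_features`, and `vectorize` are hypothetical names, and the three training instances are toy data.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass
class Instance:
    sentence_cuis: list[str]  # CUIs MetaMap found in the ambiguous word's sentence
    abstract_cuis: list[str]  # CUIs MetaMap found anywhere in the abstract
    sense: str = "None"       # gold sense label (a CUI, or "None")


def select_features(train: list[Instance], context: str, cutoff: int = 1) -> list[str]:
    """Return CUIs seen more than `cutoff` times in `context` over the training data.

    context="sentence" approximates s-1-cui; context="abstract" approximates a-1-cui.
    ASSUMPTION: counts are pooled over all training instances.
    """
    if context not in ("sentence", "abstract"):
        raise ValueError("context must be 'sentence' or 'abstract'")
    counts: Counter[str] = Counter()
    for inst in train:
        counts.update(getattr(inst, f"{context}_cuis"))
    return sorted(cui for cui, n in counts.items() if n > cutoff)


def vectorize(inst: Instance, context: str, features: list[str]) -> list[int]:
    """Binary vector: 1 if a selected CUI appears in the instance's context."""
    present = set(getattr(inst, f"{context}_cuis"))
    return [1 if cui in present else 0 for cui in features]


# Toy training data (not from the NLM-WSD dataset)
train = [
    Instance(["C0750480", "C0205173"], ["C0750480", "C0205173", "C0007634"], "C0430400"),
    Instance(["C0750480"], ["C0750480", "C1521828"], "C0430400"),
    Instance(["C0205173"], ["C0007634", "C1521828"], "C0010453"),
]
s_feats = select_features(train, "sentence")  # ['C0205173', 'C0750480']
a_feats = select_features(train, "abstract")  # ['C0007634', 'C0750480', 'C1521828']
print(vectorize(train[0], "abstract", a_feats))  # [1, 1, 0]
```

The abstract context sees more CUIs per instance, so the a-1-cui feature set is typically larger than the s-1-cui set built from the same training data.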

SLIDE 13

Example

... The culture count doubled. The cells multiplied by twice the expected rate ...

Sentence: C0750480 Count (2), C0205173 Duplicate (1) ...

Abstract: C0750480 Count (2), C0205173 Duplicate (3), C0007634 Cells (4), C1517001 Expected (1), C1521828 Rate (3) ...

SLIDE 14

Algorithm

Example instances → extract relevant CUIs → training data / test data → Naïve Bayes algorithm → sense-tagged test data
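The train/test pipeline can be sketched with a from-scratch Bernoulli Naïve Bayes over binary CUI-presence vectors. This is a stand-in, not WEKA's `NaiveBayes`, and the folds are shuffled but not stratified by sense, which a production setup would likely do. `train_nb`, `predict_nb`, and `cross_validate` are hypothetical names, and the data is a toy example.

```python
from __future__ import annotations

import math
import random
from collections import Counter


def train_nb(X: list[list[int]], y: list[str], alpha: float = 1.0) -> dict:
    """Fit Bernoulli Naive Bayes with add-alpha (Laplace) smoothing."""
    n_features = len(X[0])
    model = {}
    for c, n_c in Counter(y).items():
        ones = [0] * n_features  # how often each feature is 1 for sense c
        for xi, yi in zip(X, y):
            if yi == c:
                for j, v in enumerate(xi):
                    ones[j] += v
        prior = math.log(n_c / len(y))
        probs = [(ones[j] + alpha) / (n_c + 2 * alpha) for j in range(n_features)]
        model[c] = (prior, probs)
    return model


def predict_nb(model: dict, x: list[int]) -> str:
    """Return the sense with the highest log posterior for binary vector x."""
    def log_posterior(c: str) -> float:
        prior, probs = model[c]
        return prior + sum(math.log(p if v else 1.0 - p) for p, v in zip(probs, x))
    return max(model, key=log_posterior)


def cross_validate(X: list[list[int]], y: list[str], k: int = 10, seed: int = 0) -> float:
    """Accuracy under k-fold cross-validation (shuffled, unstratified folds)."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i in idx if i not in held_out]
        model = train_nb([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(predict_nb(model, X[i]) == y[i] for i in fold)
    return correct / len(y)


# Toy data: feature 0 signals one sense, feature 1 the other
X = [[1, 0]] * 10 + [[0, 1]] * 10
y = ["C0430400"] * 10 + ["C0010453"] * 10
model = train_nb(X, y)
print(predict_nb(model, [1, 0]))  # C0430400
print(cross_validate(X, y))       # 1.0
```

Working in log space avoids underflow when an abstract contributes many CUI features.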

SLIDE 15

Dataset

National Library of Medicine's Word Sense Disambiguation (NLM-WSD) Dataset

50 ambiguous words from 1998 MEDLINE abstracts

100 instances for each of the 50 words

Each instance has been tagged by MetaMap

The target word in each instance was manually assigned a UMLS concept or None

Average number of concepts per ambiguous word is 2.26 (not including None)
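The dataset figures imply two simple totals, worked here as a check. Here $k_w$ denotes the number of candidate concepts for word $w$, and the second line assumes 2.26 is the mean over all 50 words (rounded to two decimals, which forces the integer total).

```latex
\begin{aligned}
N_{\text{instances}} &= 50 \times 100 = 5000 \\
\bar{k} &= \frac{1}{50}\sum_{w=1}^{50} k_w = 2.26
\;\Longrightarrow\;
\sum_{w=1}^{50} k_w = 2.26 \times 50 = 113 \quad \text{(concepts, excluding None)}
\end{aligned}
```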

SLIDE 16

Data subsets

Liu subset

Liu, Teller and Friedman 2004: 22 of the 50 words in NLM-WSD

Leroy subset

Leroy and Rindflesch 2005: 15 of the 50 words in NLM-WSD

Joshi subset

Joshi, Pedersen and Maclin 2005: 28 of the 50 words in NLM-WSD (the union of the Leroy and Liu subsets)

SLIDE 17

Results

SLIDE 18

Results for Question 1

Would CUIs be an improvement over semantic types?

SLIDE 19

Comparative results with Leroy and Rindflesch 2005

Accuracy using the Leroy subset (bar chart): s-1-cui 71%, a-1-cui 74.5%, s-0-Leroy 65.6%

SLIDE 20

Significance of Differences

Pairwise t-test

s-1-cui (71%) and s-0-Leroy (65.6%)

p <= 0.001

a-1-cui (74.5%) and s-0-Leroy (65.6%)

p <= 0.00005
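For reference, the paired t statistic behind comparisons like these can be computed with the standard library alone. The slides do not say what the paired samples are; this sketch assumes per-word accuracies for the two feature sets, and the numbers below are made up. `paired_t` is a hypothetical helper; for a p-value, compare `t` against a t distribution with `df` degrees of freedom (SciPy's `scipy.stats.ttest_rel` returns both).

```python
import math
from statistics import mean, stdev


def paired_t(a: list, b: list) -> tuple:
    """Paired-samples t statistic and degrees of freedom for a vs. b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical per-word accuracies for two feature sets (not the paper's data)
cui_acc = [0.80, 0.75, 0.90, 0.70]
baseline_acc = [0.70, 0.72, 0.85, 0.60]
t, df = paired_t(cui_acc, baseline_acc)
print(f"t = {t:.3f}, df = {df}")  # t = 3.934, df = 3
```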

SLIDE 21

Results for Question 2

Would the biomedical-specific CUI features be an improvement over the more general unigram features?

SLIDE 22

Comparative results with Joshi, Pedersen and Maclin 2005

Accuracy using the Joshi subset (bar chart): s-1-cui 77.7%, a-1-cui 80.0%, s-4-Joshi 79.3%, a-4-Joshi 82.5%

SLIDE 23

Significance of Results

Pairwise t-test

s-1-cui (77.7%) and s-4-Joshi (79.3%)

p < 0.135

a-1-cui (80.0%) and a-4-Joshi (82.5%)

p < 0.003

SLIDE 24

Results for Question 3

Would increasing the size of the context window in which surrounding CUIs are found improve the results, as seen by Joshi, Pedersen and Maclin using unigrams?

SLIDE 25

Comparative results across context window sizes

Accuracy using the NLM-WSD dataset (bar chart): s-1-cui 83.3%, a-1-cui 85.6%

SLIDE 26

Significance of Results

Pairwise t-test

s-1-cui (83.3%) and a-1-cui (85.6%)

p < 0.0006

SLIDE 27

Comparative results with Liu, Teller and Friedman 2004

Accuracy using the Liu subset (bar chart): a-1-cui 81.9%, s-0-Liu 85.5%

SLIDE 28

Significance of Results

Pairwise t-test

a-1-cui (81.9%) and s-1-Liu (85.5%)

p < 0.001

SLIDE 29

Conclusions

CUIs result in more accurate disambiguation than semantic types and are comparable to unigrams.

Incorporating more surrounding context improves the results.

MetaMap generates useful information that can be used as features for supervised disambiguation.

SLIDE 30

Future Work

Combination approach

Exploring additional UMLS features

Unsupervised approach using information from the UMLS

SLIDE 31

Software and Data

CuiTools version 0.05

http://cuitools.sourceforge.net

NLM-WSD Dataset

http://wsd.nlm.nih.gov

Pairwise t-test

http://www.quantitativeskills.com/sisa/statistics/