WordNet Ontology as a Geographical Information Resource

SLIDE 1

WordNet Ontology as a Geographical Information Resource

Davide Buscaldi
Dpto. Sistemas Informáticos y Computación (DSIC)
Universidad Politécnica de Valencia

Valencia, Nov. 15th 2005

SLIDE 2

Plan of the talk

  • The Geographical Information Retrieval task
  • WordNet (in brief)
  • Exploiting WordNet:
      – Query Expansion
      – Index Terms Expansion
  • Results
  • Conclusions

SLIDE 3

The Geographical Information Retrieval Task

  • Actually GIR is ambiguous:
      – (Geographic Information) Retrieval**
      – Geographical (Information Retrieval)*
  • In this case:
      – “Retrieval of information involving some kind of spatial awareness”* (Fred Gey @ GeoCLEF 2005)
      – E.g. “Find news about riots in France.”
  • Not to be confused with GIR as a particular aspect of Spatial Information Retrieval**
      – E.g. “What is the river flowing through Paris?”

SLIDE 4

Common GIR Issues (1)

  • (Almost) The same Geographical Entity can be indicated in several different (and sometimes ambiguous) manners:
      – United Kingdom of Great Britain and Northern Ireland
      – United Kingdom, UK, U.K. + Ireland, Eire
      – Great Britain, GB + Ireland
      – Reino Unido, Gran Bretagna
      – British Isles

SLIDE 5

Common GIR Issues (2)

  • Missing explicit geographical information:
      – E.g., consider the following text:

        “On Sunday mornings, the covered market opposite the station in the leafy suburb of Aulnay-sous-Bois - barely half an hour's drive from central Paris - spills opulently on to the streets and boulevards.”

Although the text is talking about events in France, the geographical entity (GE) France itself is never mentioned.

SLIDE 6

The WordNet Ontology

  • Lexical resource containing nouns, verbs, adjectives and adverbs organized into synonym sets (synsets)
      – each synset represents one underlying lexical concept
      – various relations link the synonym sets:
          • Hypernymy (is-a relation)
          • Meronymy (has-part relation)
          • Holonymy (part-of relation)
  • Available at
      – http://wordnet.princeton.edu/perl/webwn

SLIDE 7

Geographical Conceptual Networks in WordNet

[Figure: network of holonym and meronym links between the nodes UK, England, N. Ireland, Scotland, Wales, British Isles, Great Britain, Ireland (Hibernia) and Ireland (Eire)]

SLIDE 8

Exploiting WordNet

  • WordNet can help in addressing most GIR issues
  • Solve synonymy:
      – E.g. synset corresponding to “U.K.”:
          • {United Kingdom, UK, U.K., Great Britain, GB, Britain, United Kingdom of Great Britain and Northern Ireland}
  • Find missing (geographical) information:
      – Meronymy (“has member/part” relationship)
      – Holonymy (“is member/part of” relationship)
  • Two solutions tested:
      – Query Expansion (QE)
      – Index Terms Expansion (ITE)

SLIDE 9

Query Expansion

  • Expand the geographical terms of the query with their synonyms and (some) meronyms
      – Geographical terms are identified through the WordNet ontology (words having the synset {region, location} among their hypernyms)
      – Meronyms are kept only if the word “capital” appears in their definition (gloss) or in the meronym synset itself

SLIDE 10

Query Expansion - Example

“Foreign minorities in Germany”

  – “Germany” appears in the synset: {Germany, Federal Republic of Germany, Deutschland, FRG}
  – The following meronyms contain the word “capital”:
      • Berlin, German capital
      • Bonn (was the capital of Germany between 1949 and 1989)
      • Munich, Muenchen (capital of Bavaria)
      • Aachen, Aken, Aix-la-Chapelle (formerly Charlemagne's northern capital)
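
The expansion in this example can be sketched in a few lines of Python. This is a minimal sketch over a hand-written toy lexicon, not the system presented in the talk: `TOY_WORDNET`, `GEO_SYNSET`, `is_geographical`, `capital_meronyms` and `expand_query` are hypothetical names, glosses are abbreviated, and the Dresden entry is an assumption added only to show a meronym being filtered out. Splitting the query on whitespace is also a simplification that handles single-word place names only.

```python
# Toy stand-in for WordNet; every entry is hand-written for illustration.
GEO_SYNSET = frozenset({"region", "location"})

TOY_WORDNET = {
    "Germany": {
        "synonyms": ["Germany", "Federal Republic of Germany", "Deutschland", "FRG"],
        "hypernyms": [GEO_SYNSET],
        "meronyms": [
            # Gloss omitted: matched through the lemma "German capital".
            {"lemmas": ["Berlin", "German capital"], "gloss": ""},
            {"lemmas": ["Bonn"],
             "gloss": "was the capital of Germany between 1949 and 1989"},
            {"lemmas": ["Munich", "Muenchen"], "gloss": "capital of Bavaria"},
            {"lemmas": ["Aachen", "Aken", "Aix-la-Chapelle"],
             "gloss": "formerly Charlemagne's northern capital"},
            # Assumed entry with a paraphrased gloss; it should be dropped.
            {"lemmas": ["Dresden"], "gloss": "a city in southeastern Germany"},
        ],
    },
}


def is_geographical(term):
    """A term is geographical if {region, location} is among its hypernyms."""
    entry = TOY_WORDNET.get(term)
    return entry is not None and GEO_SYNSET in entry["hypernyms"]


def capital_meronyms(term):
    """Keep meronyms with 'capital' in the gloss or in the synset itself."""
    kept = []
    for meronym in TOY_WORDNET[term]["meronyms"]:
        in_gloss = "capital" in meronym["gloss"].lower()
        in_synset = any("capital" in lemma.lower() for lemma in meronym["lemmas"])
        if in_gloss or in_synset:
            kept.extend(meronym["lemmas"])
    return kept


def expand_query(query):
    """Append synonyms and capital meronyms after each geographical term."""
    expanded = []
    for token in query.split():
        expanded.append(token)
        if is_geographical(token):
            synonyms = TOY_WORDNET[token]["synonyms"]
            expanded.extend(s for s in synonyms if s != token)
            expanded.extend(capital_meronyms(token))
    return expanded


expanded = expand_query("Foreign minorities in Germany")
```

With the toy data, `expanded` keeps the four original query words and adds the other Germany synonyms plus the Berlin, Bonn, Munich and Aachen lemmas; Dresden is left out because "capital" appears in neither its gloss nor its synset.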

SLIDE 11

Index Terms Expansion

  • Find geographical terms in the text collection
      – openNLP Named Entities detector (http://opennlp.sourceforge.net)
  • Put all their holonyms and synonyms into a special geo index
      – Search Engine used: Lucene (http://lucene.jakarta.org)
  • Label geographical terms in the query with the geo search field:
      – E.g. “riots in France” -> text:riots geo:France

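
The query labelling step can be illustrated with a short Python sketch. It is only an approximation under stated assumptions: `GEO_TERMS` stands in for whatever geographical-term lookup is actually used, `STOPWORDS` and `label_query` are hypothetical, and dropping stopwords such as "in" is inferred from the example above rather than stated on the slide. The output is a plain string in the `field:term` style of the example, not a call into the Lucene API.

```python
# Hypothetical lookup tables, hand-written for illustration only.
GEO_TERMS = {"France", "Paris", "Europe"}
STOPWORDS = {"in", "of", "the", "near", "and"}


def label_query(query):
    """Prefix each remaining query word with the field it is searched in."""
    parts = []
    for token in query.split():
        if token.lower() in STOPWORDS:
            continue  # assumed: function words are not searched
        field = "geo" if token in GEO_TERMS else "text"
        parts.append(f"{field}:{token}")
    return " ".join(parts)


labelled = label_query("riots in France")
```

For the slide's example, `labelled` is the string `text:riots geo:France`.
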
SLIDE 12

Index Terms Expansion - Example

“On Sunday mornings, the covered market opposite the station in the leafy suburb of Aulnay-sous-Bois - barely half an hour's drive from central Paris - spills opulently on to the streets and boulevards.”

From WordNet:

  • Paris, French capital, capital of France, city of light
  • France, French Republic
  • Europe
  • Northern hemisphere

[Diagram labels: “To standard index” (the text) and “To geographical index” (the terms from WordNet)]

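
The indexing side of this example can be sketched as follows. This is a toy illustration, not the Lucene-based implementation: `SYNONYMS` and `HOLONYM` are hand-written tables that copy the WordNet terms listed above, `geo_terms_for` and `build_document` are hypothetical helpers, and the named-entity detection step is assumed to have already produced the list of places.

```python
# Hand-written tables mirroring the WordNet terms listed on the slide.
SYNONYMS = {
    "Paris": ["Paris", "French capital", "capital of France", "city of light"],
    "France": ["France", "French Republic"],
    "Europe": ["Europe"],
    "Northern hemisphere": ["Northern hemisphere"],
}
HOLONYM = {
    "Paris": "France",
    "France": "Europe",
    "Europe": "Northern hemisphere",
}


def geo_terms_for(place):
    """Collect the synonyms of a place and of every holonym above it."""
    terms = []
    current = place
    while current is not None:
        terms.extend(SYNONYMS.get(current, [current]))
        current = HOLONYM.get(current)  # None once the chain ends
    return terms


def build_document(text, detected_places):
    """Return the two fields: raw text and expanded geographical terms."""
    geo = []
    for place in detected_places:
        for term in geo_terms_for(place):
            if term not in geo:  # keep each term once, in first-seen order
                geo.append(term)
    return {"text": text, "geo": geo}


doc = build_document("barely half an hour's drive from central Paris", ["Paris"])
```

Here `doc["geo"]` contains France, Europe and Northern hemisphere even though none of them occurs in `doc["text"]`, which is how a `geo:France` query could match a text that never mentions France.
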
SLIDE 13

Experiment Setup

  • GeoCLEF 2005 collection and queries
      – Los Angeles Times 1994
      – Glasgow Herald 1995
  • “Topic Description” runs:
      – Typical TD from queries:
          • “Shark attacks near California and Australia”
          • “Vegetable exporters of Europe”
          • “Holidays in the Scottish Trossachs”
  • 1000 results returned for each query

SLIDE 14

Results - Query Expansion

[Chart: precision (0% to 100%) against recall levels 1 to 10; legend: Clean System, with QE]

SLIDE 15

QE - Error Analysis

  • Why did it perform so badly?
  • Two major errors:
      – Inconsistent expansions
          • E.g. “Sacramento” expanding California in the query: “Shark attacks in California”
      – Ambiguity
          • E.g. “Europe” in “Vegetable exporters of Europe”
          • WordNet returns three senses for “Europe”:
              1. Europe as continent
              2. Europe as the European Union
              3. Europe as the set of nations on the European continent

SLIDE 16

Results - Index Terms Expansion

[Chart: precision (0% to 100%) against recall levels 1 to 10; legend: Clean System, with ITE]

SLIDE 17

Conclusions

  • ITE better than QE
      – Seems to be less sensitive to ambiguity problems
      – However: it needs NE recognition during the indexing phase (not trivial)
  • WordNet can be used as a Geographical Information Resource
      – To be evaluated against a specialized resource like the TGN (http://www.getty.edu/research/conducting_research/vocabularies/tgn/)

SLIDE 18

Thank you! Grazie! Gracias! Dhanyavaad! (Hindi) Manjuthe! (Telugu) Shukria! (Urdu)