
The University of Lisbon at GeoCLEF 2007



  1. The University of Lisbon at GeoCLEF 2007 Nuno Cardoso, David Cruz, Marcirio Chaves and Mário J. Silva {ncardoso, dcruz, mchaves, mjs}@xldb.di.fc.ul.pt

  2. In 2006... Results revealed some limitations:
     ● Assigning a single geographic concept as the scope of each document limited the geo-ranking.
     ● Some topics were not handled properly (e.g., “diamond trade in Angola and South Africa”).
     ● The classic IR approach still prevails!
     The University of Lisbon at GeoCLEF 2007, Budapest, 21st Sept. 2007

  3. For 2007:
     ● Challenge: outperform classic IR.
     ● Generation of geographic signatures for both queries and documents.
     ● Geographic query expansion focused on features, feature types and spatial relationships.
     ● Geographic ranking improvements.

  4. XLDB's 2007 GIR system
     [Architecture diagram. Components shown: GeoCLEF topics (queries); QueOnde query parser (what, spatial relation, where); QuerCol query expansion (QE terms + query signature); Geographic Ontology; Faísca text mining (document signatures); Sidra5 indexing of the GeoCLEF documents (term index, geo index); Sidra5 ranking, producing the GeoCLEF runs.]

  5. XLDB's 2007 GIR system: 1. Query Processing
     [Architecture diagram repeated.]
     • Splits the query into < what, spat. rel., where >.
     • Recognizes features & feature types.
     Example: “Sea traffic in Portuguese islands” = < Sea traffic, in, Portugal islands >
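
The split into < what, spatial relation, where > can be illustrated with a minimal sketch. It assumes a small hand-written list of spatial relations and a single regular expression; the slides do not describe how QueOnde actually parses queries, so `SPATIAL_RELATIONS` and `split_query` are hypothetical names. The sketch also does not normalize "Portuguese" to the feature "Portugal", which the real parser appears to do.

```python
import re

# Hypothetical relation list; the slides do not enumerate the relations QueOnde knows.
SPATIAL_RELATIONS = ["north of", "south of", "east of", "west of", "near", "around", "in"]


def split_query(query):
    """Split a query into (what, spatial_relation, where), or return None."""
    # Try longer relations first when several could match at the same position.
    ordered = sorted(SPATIAL_RELATIONS, key=len, reverse=True)
    alternatives = "|".join(re.escape(rel) for rel in ordered)
    pattern = rf"^(?P<what>.+?)\s+(?P<rel>{alternatives})\s+(?P<where>.+)$"
    match = re.match(pattern, query.strip(), flags=re.IGNORECASE)
    if match is None:
        return None  # no spatial relation found: treat as a non-geographic query
    return match.group("what"), match.group("rel").lower(), match.group("where")


print(split_query("Sea traffic in Portuguese islands"))
```

Because the `what` group is non-greedy, the query is split at the first spatial relation it contains; a production parser would need ontology lookups to decide which candidate split is the right one.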

  6. XLDB's 2007 GIR system: 1. Query Processing
     [Architecture diagram repeated.]
     1. Term expansion: Blind Relevance Feedback.
     2. Geographic expansion: based on query type, driven by spatial rel., features & feat. types.
     Example: Portugal islands → Madeira, Porto Santo, São Miguel, Faial, Pico, S. Jorge, Terceira, etc...

  7. XLDB's 2007 GIR system: 2. Text Mining
     [Architecture diagram repeated.]
     Searching documents for geographic evidence in < feat + feat types > and < feat types $ feat > patterns.
     Ex: Lisbon Airport; Airport of Lisbon.
     Faísca generates document signatures (ID, ConfMeas):
     LA072694-0011: 5668[1.00]; 2230[0.33]; 4555[0.33]; 4556[0.33]; 4557[0.33]
     LA072694-0012: 5388[1.00]; 5389[1.00]; 5390[1.00]; 12097[1.00]; 6653[0.67]
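
As an illustration of the signature notation on this slide, here is a minimal sketch that reads one signature line into a mapping from ID to confidence measure. It assumes the `DOCID: id[conf]; id[conf]; ...` layout shown above and that the IDs identify concepts in the geographic ontology; `parse_signature` is a hypothetical helper, not part of Faísca.

```python
import re


def parse_signature(line):
    """Parse 'DOCID: id[conf]; id[conf]; ...' into (doc_id, {id: confidence})."""
    doc_id, _, body = line.partition(":")
    # Each entry is an integer ID followed by a confidence in square brackets.
    entries = re.findall(r"(\d+)\[([0-9.]+)\]", body)
    return doc_id.strip(), {int(cid): float(conf) for cid, conf in entries}


doc_id, signature = parse_signature(
    "LA072694-0011: 5668[1.00]; 2230[0.33]; 4555[0.33]; 4556[0.33]; 4557[0.33]"
)
print(doc_id, signature)
```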

  8. XLDB's 2007 GIR system: 3. Geographic Ranking
     [Architecture diagram repeated.]
     Sidra5: based on MG4J.
     • Generates Term & Geo index.
     • Uses NormBM25 for text weight.
     • For geographic weight... how to measure geographic relevance between query signatures and doc signatures?

  9. XLDB's 2007 GIR system: 4. Geographic Reasoning
     [Architecture diagram repeated.]
     • Query parsing patterns
     • Text mining patterns
     • Geographic query expansion

  10. Blind Relevance Feedback
     [Flow diagram: initial query → initial retrieval → initial run → blind rel. feedback → final query → final retrieval → final run.]
     Initial query: “sea traffic in Portuguese islands”
     Final query: “(sea | ocean | overseas) & (traffic | routes | cruising | ...) & (boats | fishing | ...) in SCOPE”
     SCOPE: Madeira, Porto Santo, Pico, Faial, S. Jorge, Graciosa, Terceira, ...

  11. Blind Relevance Feedback
     [Flow diagram: initial query → initial retrieval → initial run → blind rel. feedback → final query → final retrieval → final run.]
     When is it better to insert the geographic restrictions?

  12. Blind Relevance Feedback
     When is it better to insert the geographic restrictions? BEFORE Relevance Feedback?
     Topic: “sea traffic in Portuguese islands”
     Initial query: “sea traffic in Madeira, Porto Santo, Pico, Faial, S. Jorge, Graciosa, Terceira, ...”
     Final query: “(sea | ocean | overseas) & (traffic | routes | cruising | ...) & (boats | fishing | ...) in Madeira, Porto Santo, Pico, Faial, S. Jorge, Graciosa, Terceira, ...”

  13. Blind Relevance Feedback
     When is it better to insert the geographic restrictions? ...or AFTER Relevance Feedback?
     Topic: “sea traffic in Portuguese islands”
     Initial query: “sea traffic in Portugal”
     Final query: “(sea | ocean | overseas) & (traffic | routes | cruising | ...) & (boats | fishing | ...) in Madeira, Porto Santo, Pico, Faial, S. Jorge, Graciosa, Terceira, ...”
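
The two orderings differ only in what the initial retrieval sees. The following is a rough sketch under stated assumptions: `ONTOLOGY`, `toy_feedback` and `build_queries` are hypothetical stand-ins, the ontology is a one-entry dictionary, and the feedback step simply appends fixed terms instead of mining them from the top-ranked documents of the initial run.

```python
# Toy ontology (hypothetical): maps a place description to concrete features.
ONTOLOGY = {"Portuguese islands": ["Madeira", "Porto Santo", "Pico", "Faial"]}


def toy_feedback(terms, places):
    """Stand-in for blind RF; a real system would mine terms from the initial run."""
    return terms + ["ocean", "routes"]


def build_queries(terms, where, geo_before_rf, feedback=toy_feedback):
    """Return (initial_query, final_query), each a (terms, places) pair."""
    expanded_places = ONTOLOGY.get(where, [where])
    # BEFORE: the initial retrieval already uses the expanded place names.
    # AFTER: the initial retrieval uses the unexpanded place description.
    initial_places = expanded_places if geo_before_rf else [where]
    initial_query = (terms, initial_places)
    final_terms = feedback(terms, initial_places)
    # In both variants the final query carries the expanded place names.
    final_query = (final_terms, expanded_places)
    return initial_query, final_query


before = build_queries(["sea", "traffic"], "Portuguese islands", geo_before_rf=True)
after = build_queries(["sea", "traffic"], "Portuguese islands", geo_before_rf=False)
print(before[0])
print(after[0])
```

With a real feedback step the final terms would also differ between the two variants, because the documents retrieved by the initial query are different; the toy feedback hides that effect.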

  14. Geographic relevance
     ● In 2006: one GeoSim for each pair (s_query, s_doc).
     ● In 2007: multiple GeoSim for each pair (sign_query, sign_doc).
     ● Combination of multiple GeoSim values into a single GeoScore.

  15. Geographic relevance (cont.)
     GeoSim combinations: Mean, Maximum, Boolean.
     Query: Tourist attractions in Hungary.
     Document 1: (...) there are many tourist attractions (...) in Hungary, (...) near Portugal, and (...) in Australia.
     Document 2: (...) there are many tourist attractions (...) in Budapest.
     GeoSim x ConfMeas:
       Document 1: Hungary 1.00; Portugal 0.15; Australia 0.05
       Document 2: Budapest 0.60
     GeoScore:
       Document 1: Mean 0.40; Max. 1.00; Bool. 1.00
       Document 2: Mean 0.60; Max. 0.60; Bool. 0.00
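
The GeoScore values on this slide can be reproduced with a short sketch. The Mean and Maximum combinations follow directly from their names; the Boolean rule used here (1.00 if any GeoSim x ConfMeas value reaches 1.00, else 0.00) is an assumption inferred from the two example documents, and `geo_score` is a hypothetical function name.

```python
def geo_score(values, mode):
    """Combine a document's GeoSim x ConfMeas values into a single GeoScore."""
    if not values:
        return 0.0  # assumption: a document with no geographic evidence scores zero
    if mode == "mean":
        return sum(values) / len(values)
    if mode == "max":
        return max(values)
    if mode == "boolean":
        # Assumed rule: full score only when some value is an exact match (1.00).
        return 1.0 if max(values) >= 1.0 else 0.0
    raise ValueError(f"unknown combination mode: {mode}")


doc1 = [1.00, 0.15, 0.05]  # Hungary, Portugal, Australia
doc2 = [0.60]              # Budapest

for name, values in (("Document 1", doc1), ("Document 2", doc2)):
    scores = {m: round(geo_score(values, m), 2) for m in ("mean", "max", "boolean")}
    print(name, scores)
```

The example shows the trade-off between the combinations: Mean penalizes Document 1 for mentioning unrelated places, while Boolean gives Document 2 no credit even though Budapest lies within Hungary.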

  16. Experiments
     #1: Baseline using classic IR approach. Geographic QE before RF, but just terms: no GeoScore.
     #2: Geographic IR approach. Geographic QE before or after RF.
     #3: Geographic IR approach. Test the GeoSim combinations.

  17. Experiments
     [Results chart; labels shown: IR, GIR, IR/GIR.]

  18. Questions raised:
     ● Geographic QE before blind RF seems to help. Why?
     ● Shouldn't term query expansion be independent of geography?
     ● ...or are geographic terms also good thematic terms?
     ● Classic IR was outperformed by the IR/GIR run: does it mean that we are finally using GIR the right way?
     ● Boolean and Maximum GeoSim combinations are still inconclusive... and also dependent on the quality of the ontology.

  19. Future Work
     ● Interesting results...
     ● Why does blind RF perform better with geographic criteria? Is it statistically significant?
     ● Outperforming classic IR: coincidence... or not?
     ● Feature type-oriented query expansion has its merits.
     ● Next step: mature the GIR system for further experiments.

  20. The end.
     ● Thank you for your attention.
     ● Questions?
