Enriching search results with semantic metadata
POLITECNICO DI MILANO Corso di Laurea in Ingegneria Informatica
Giuseppe Alberto Mangano 665701
Relatore: Prof. Marco Colombetti Correlatore: Ing. David Laniado
Information Retrieval
– the first, and currently the most used method
– simple matching between query and document terms
– good results with very large sets of documents
– the classic VSM: TF-IDF (Salton, Wong, Yang, 1975)
– a user will mainly use free text queries
– three main stages:
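As a minimal illustration of the TF-IDF vector space model named above, the following Python sketch weights terms by raw term frequency times log(N/df) and ranks documents by cosine similarity against a free-text query. The function names (`rank`, `vectorize`, `cosine`) and the sample documents are assumptions made for this example; it is not the weighting variant used by any particular search engine.

```python
import math
from collections import Counter


def vectorize(tokens, idf):
    """Build a {term: tf * idf} vector; terms unknown to the collection are skipped."""
    tf = Counter(tokens)
    return {t: tf[t] * idf[t] for t in tf if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Return (doc_index, score) pairs, best match first."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    doc_vectors = [vectorize(tokens, idf) for tokens in tokenized]
    query_vector = vectorize(query.lower().split(), idf)
    scored = [(i, cosine(query_vector, vec)) for i, vec in enumerate(doc_vectors)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


docs = ["bed and breakfast in Legnano", "hotel in Milan", "pet shop"]
results = rank("breakfast Legnano", docs)
```

With these sample documents, only the first one shares terms with the query, so it is ranked first and the other two score zero.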
➢ index expansion
– performed by associating to certain terms of a document
– the document can be retrieved by matching the searched
➢ query expansion
– performed by expanding the terms of the query to match
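The contrast between the two kinds of expansion can be sketched in a few lines of Python. Everything below is hypothetical: the `BROADER` and `NARROWER` tables, the `expand` and `matches` helpers, and the sample terms are assumptions chosen for illustration, not the data or code behind these slides.

```python
# Hypothetical lookup tables: place -> broader places, and the reverse direction.
BROADER = {"legnano": ["milan", "lombardy"]}
NARROWER = {"milan": ["legnano"]}


def expand(terms, related):
    """Return the original terms followed by every related term found in the table."""
    expanded = list(terms)
    for term in terms:
        expanded.extend(related.get(term, []))
    return expanded


def matches(query_terms, doc_terms):
    """Plain keyword matching: true if any query term occurs in the document."""
    doc_set = set(doc_terms)
    return any(term in doc_set for term in query_terms)


doc = ["hotel", "legnano"]
query = ["milan"]

plain = matches(query, doc)                               # no shared term
via_index = matches(query, expand(doc, BROADER))          # extra terms added at indexing time
via_query = matches(expand(query, NARROWER), doc)         # extra terms added at query time
```

In this sketch both strategies retrieve a document that plain matching misses. The difference is where the extra terms live: in the index (paid once, at indexing time) or in the query (paid on every search).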
– Employing metadata in the form of payloads
– a geographical database
– it covers all countries and
dog isNarrowerThan pet
pet isBroaderThan cat
pet isNarrowerThan animal
bed and breakfast isRelatedTo sleep
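Relations of this kind can be held as (subject, predicate, object) triples and walked to collect every broader concept of a term. The sketch below is one possible reading, assuming that `isBroaderThan` is simply the inverse of `isNarrowerThan`; the `broader_terms` helper is a hypothetical name introduced for this example.

```python
TRIPLES = [
    ("dog", "isNarrowerThan", "pet"),
    ("pet", "isBroaderThan", "cat"),
    ("pet", "isNarrowerThan", "animal"),
    ("bed and breakfast", "isRelatedTo", "sleep"),
]


def broader_terms(term, triples):
    """Collect all broader concepts of `term`, nearest first (breadth-first walk)."""
    parents = {}
    for subj, pred, obj in triples:
        if pred == "isNarrowerThan":
            parents.setdefault(subj, []).append(obj)
        elif pred == "isBroaderThan":
            # Assumption: "a isBroaderThan b" means the same as "b isNarrowerThan a".
            parents.setdefault(obj, []).append(subj)
    result = []
    frontier = [term]
    while frontier:
        current = frontier.pop(0)
        for parent in parents.get(current, []):
            if parent != term and parent not in result:
                result.append(parent)
                frontier.append(parent)
    return result


dog_ancestors = broader_terms("dog", TRIPLES)
cat_ancestors = broader_terms("cat", TRIPLES)
```

Here `isRelatedTo` is deliberately ignored by the walk, since it does not express a hierarchy.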
ORIGINAL DOCUMENT
bed and breakfast in Legnano

WHITESPACE TOKENIZATION
[bed] [and] [breakfast] [in] [Legnano]

ADDED GEONAMES TERMS
[bed] [and] [breakfast] [in] [Legnano]
[6537118]-{0.1} [Europe]-{0.0256} [Italy]-{0.064} [Lombardy]-{0.16} [Milan]-{0.4}

ADDED ONTOLOGY TERMS
[bed and breakfast] [in] [Legnano]
[sleep]-{0.2} [accommodation]-{0.4} [6537118]-{0.1} [Europe]-{0.0256} [Italy]-{0.064} [Lombardy]-{0.16} [Milan]-{0.4}
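The geographic payload values in the example above are consistent with a weight that shrinks by a constant factor of 0.4 for each step up the hierarchy (0.4, 0.16, 0.064, 0.0256). The slides do not state this rule, so the sketch below is an inference from the numbers, not a description of the actual implementation. The `hierarchy_payloads` function and its `decay` parameter are hypothetical, and the 0.1 weight attached to the GeoNames identifier is not covered.

```python
def hierarchy_payloads(ancestors, decay=0.4):
    """Assign decay**(level + 1) to each ancestor, given nearest-first order.

    Assumption: the weight is multiplied by `decay` at every level of the
    hierarchy; values are rounded to 4 decimals to match the example.
    """
    return {
        name: round(decay ** (level + 1), 4)
        for level, name in enumerate(ancestors)
    }


payloads = hierarchy_payloads(["Milan", "Lombardy", "Italy", "Europe"])
```

Under this assumption a broader region contributes less to the score than a nearer one, so a search for "Milan" ranks a Legnano document above what a search for "Europe" would.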
– GeoNames parser
– Ontology parser
– Shingle matching algorithm (for multiword terms)
– Payloads
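One way to picture shingle matching for multiword terms: slide a window over the token stream, join adjacent tokens into candidate shingles, and replace the longest shingle found in a known vocabulary with a single token. The sketch below is a greedy longest-match version written for illustration; `merge_multiword_terms`, `vocabulary` and `max_len` are hypothetical names, and the actual algorithm may differ.

```python
def merge_multiword_terms(tokens, vocabulary, max_len=3):
    """Greedily merge runs of tokens that form a known multiword term.

    `vocabulary` holds lowercase multiword terms; `max_len` caps the shingle size.
    """
    result = []
    i = 0
    while i < len(tokens):
        # Try the longest shingle first, down to two tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate.lower() in vocabulary:
                result.append(candidate)
                i += n
                break
        else:
            # No shingle starting here is a known term: keep the single token.
            result.append(tokens[i])
            i += 1
    return result


tokens = ["bed", "and", "breakfast", "in", "Legnano"]
merged = merge_multiword_terms(tokens, {"bed and breakfast"})
```

On the running example this turns five whitespace tokens into `[bed and breakfast] [in] [Legnano]`, so the ontology lookup can be done on the multiword term as a whole.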
– extends Lucene's DefaultSimilarity (scoring)
– uses PayloadHelper's decodeFloat
– overrides scorePayload (which returns 1 by default)
– a payload-aware Query
– it invokes the overridden scorePayload method
– we extend Solr's QParserPlugin to create custom query
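The payload mechanism described above can be mimicked outside Lucene to show the idea. The Python sketch below is a stand-in, not Lucene or Solr code: it assumes a payload is a 4-byte big-endian IEEE 754 float, that a term without a payload keeps a factor of 1, and that the payload factor simply multiplies the base term score. The function names are hypothetical.

```python
import struct


def encode_float(value):
    """Pack a float into 4 bytes (assumed layout: big-endian IEEE 754)."""
    return struct.pack(">f", value)


def decode_float(payload):
    """Inverse of encode_float."""
    return struct.unpack(">f", payload)[0]


def score_payload(payload):
    """Return the stored weight, or 1.0 when the term carries no payload."""
    if payload is None:
        return 1.0
    return decode_float(payload)


def payload_score(base_score, payload):
    """Assumption: the payload factor multiplies the ordinary term score."""
    return base_score * score_payload(payload)


original_term = payload_score(2.0, None)                 # term from the document text
added_term = payload_score(2.0, encode_float(0.4))       # term added with weight 0.4
```

A term that was in the original text keeps its full score, while a term added during enrichment is damped by its weight. This is what gives control over how much the semantic metadata affects the ranking.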
limited by the gap between the way machines work and the way we think
search engines fail to retrieve, while ensuring control over the ranking process
– handling polysemy
– storing data in an SQL database
– tuning boost values
– query expansion