Enriching search results with semantic metadata Giuseppe Alberto - PowerPoint PPT Presentation

POLITECNICO DI MILANO
Degree Programme in Computer Engineering (Corso di Laurea in Ingegneria Informatica)
Enriching search results with semantic metadata
Giuseppe Alberto Mangano 665701
Supervisor: Prof. Marco Colombetti
Co-supervisor: Ing. David Laniado

Information Retrieval Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers) [Manning et al., 2009] 2 Giuseppe A. Mangano Enriching search results with semantic metadata

Syntactic Search: overview
● Syntactic Search
  – the first, and currently the most used method
  – simple matching between query and document terms
  – good results with very large sets of documents
● Vector Space Model
  – the classic VSM: TF-IDF (Salton, Wong, Yang - 1975)
  – a user will mainly use free text queries
  – three main stages:
    ● document indexing
    ● weighting of indexed terms
    ● computing similarities between query and documents
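The three stages of the Vector Space Model can be sketched in a few lines of plain Python. This is only an illustration under simple assumptions (whitespace tokenization, a smoothed logarithmic IDF, cosine similarity); it is not Lucene's scoring formula, and the function names are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    """Stage 1 (indexing): lowercase and split on whitespace."""
    return text.lower().split()

def build_idf(docs_tokens):
    """Inverse document frequency for every term in the collection."""
    n_docs = len(docs_tokens)
    df = Counter(term for tokens in docs_tokens for term in set(tokens))
    return {term: math.log(n_docs / count) + 1.0 for term, count in df.items()}

def tfidf_vector(tokens, idf):
    """Stage 2 (weighting): term frequency times inverse document frequency."""
    tf = Counter(tokens)
    # Terms unknown to the collection get no weight at all.
    return {term: freq * idf[term] for term, freq in tf.items() if term in idf}

def cosine(a, b):
    """Stage 3 (similarity): cosine of the angle between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank(query, docs):
    """Return (score, document) pairs, best match first."""
    docs_tokens = [tokenize(d) for d in docs]
    idf = build_idf(docs_tokens)
    q_vec = tfidf_vector(tokenize(query), idf)
    scores = [(cosine(q_vec, tfidf_vector(t, idf)), d)
              for t, d in zip(docs_tokens, docs)]
    return sorted(scores, key=lambda pair: pair[0], reverse=True)
```

With the documents "bed and breakfast in Legnano" and "hotel in Milan", a query for "Legnano" ranks the first document on top, while a query for "sleep" scores zero everywhere because no document contains that term.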

Syntactic Search: limitations
The indexed document “bed and breakfast in Legnano” can be retrieved with queries such as “bed and breakfast” or “Legnano”, but cannot be matched with “sleep” or “Milan”, even though the document may be relevant to the information needs of a user who enters these terms.

Semantic Search
● based on the computation of semantic relations between concepts
● it exploits the meaning of words using data from semantic networks to generate more relevant results
➢ index expansion
  – performed by associating with certain terms of a document other terms obtained from semantic networks
  – the document can be retrieved by matching the searched terms with the ones added semantically
➢ query expansion
  – performed by expanding the terms of the query to match additional documents already indexed
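The difference between the two expansion strategies can be shown with a toy inverted index. The `RELATED` table below is a hypothetical stand-in for a semantic network, and all function names are illustrative assumptions rather than part of the prototype.

```python
# Hypothetical related-term table standing in for a semantic network.
RELATED = {
    "legnano": ["milan"],
}

def expand(terms):
    """Return the terms plus every related term found in RELATED."""
    expanded = list(terms)
    for term in terms:
        for extra in RELATED.get(term, []):
            if extra not in expanded:
                expanded.append(extra)
    return expanded

def build_index(docs, expand_index=False):
    """Map each term to the set of ids of the documents containing it."""
    index = {}
    for doc_id, text in enumerate(docs):
        terms = text.lower().split()
        if expand_index:
            # Index expansion: related terms are stored with the document.
            terms = expand(terms)
        for term in terms:
            index.setdefault(term, set()).add(doc_id)
    return index

def search(index, query, expand_query=False):
    """Return the ids of documents matching any query term."""
    terms = query.lower().split()
    if expand_query:
        # Query expansion: related terms are added to the query instead.
        terms = expand(terms)
    hits = set()
    for term in terms:
        hits |= index.get(term, set())
    return hits
```

With index expansion the document about Legnano is also found by the query "Milan"; with query expansion the query "Legnano" also reaches documents that mention only Milan. Neither variant weights the added terms, which is the job of the payloads discussed later in the slides.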

Goal
● Create a search engine prototype that enhances traditional Syntactic Search methods with the semantic expansion of terms present in documents and query strings.
  – Using metadata in the form of payloads associated with the terms added during expansion, we want to keep control over the ranking process, so that it directly reflects the possible decrease in relevance of documents retrieved through semantics.

Apache Lucene
● a free, open source information retrieval library, originally written in Java
● Lucene is an API (not an application) that handles the indexing, searching and retrieving of documents

Apache Solr
● Solr is an open source standalone enterprise search server based on Lucene

Lucene's Token Stream
● The fundamental output generated by the analysis process
● Each token usually represents an individual word of the analyzed text
● A token carries with it a text value (the word itself) as well as some metadata: the start and end offsets in the original text, a token type, a position increment and an optional payload
● The token position increment value relates the current token to the previous one
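The token metadata listed above can be modelled with a small data class. This is a plain Python sketch, not Lucene's Token class: the field names and the `geonames` type label are assumptions chosen for illustration. A position increment of 0 places a token at the same position as the previous one, which is how an added term can sit alongside the word it expands.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Token:
    text: str                       # the word itself
    start_offset: int               # start offset in the original text
    end_offset: int                 # end offset in the original text
    type: str = "word"              # token type
    position_increment: int = 1     # distance from the previous token
    payload: Optional[bytes] = None # optional per-token metadata

def whitespace_tokens(text):
    """Split on whitespace and record the character offsets of each word."""
    tokens = []
    cursor = 0
    for word in text.split():
        start = text.index(word, cursor)
        end = start + len(word)
        tokens.append(Token(word, start, end))
        cursor = end
    return tokens

def positions(tokens):
    """Absolute positions obtained by summing the position increments."""
    result = []
    position = -1
    for token in tokens:
        position += token.position_increment
        result.append(position)
    return result
```

For "bed and breakfast in Legnano", appending `Token("Milan", 21, 28, type="geonames", position_increment=0)` gives "Milan" the same position and offsets as "Legnano".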

Data sources
● Ontologies
  – dog isNarrowerThan pet
  – pet isBroaderThan cat
  – pet isNarrowerThan animal
  – bed and breakfast isRelatedTo sleep
● GeoNames
  – a geographical database available through various Web Services, under a Creative Commons attribution license
  – it covers all countries and contains over eight million placenames and other data such as latitude, longitude, elevation, population, administrative subdivision, and postal codes
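The ontology relations shown on this slide can be read as subject, predicate, object triples. The sketch below stores them in a list and looks up the terms related to a given term; it does not use JENA, and the transitive walk along isNarrowerThan is an assumption about how broader concepts might be collected.

```python
# The four example relations from the slide, as (subject, predicate, object).
TRIPLES = [
    ("dog", "isNarrowerThan", "pet"),
    ("pet", "isBroaderThan", "cat"),
    ("pet", "isNarrowerThan", "animal"),
    ("bed and breakfast", "isRelatedTo", "sleep"),
]

def related_terms(term, triples=TRIPLES):
    """Objects of every triple whose subject is `term`, grouped by predicate."""
    result = {}
    for subject, predicate, obj in triples:
        if subject == term:
            result.setdefault(predicate, []).append(obj)
    return result

def broader_terms(term, triples=TRIPLES):
    """Follow isNarrowerThan links step by step (assumed to be transitive)."""
    chain = []
    current = term
    seen = {term}  # guards against cycles in the data
    while True:
        nxt = next(
            (o for s, p, o in triples
             if s == current and p == "isNarrowerThan" and o not in seen),
            None,
        )
        if nxt is None:
            return chain
        chain.append(nxt)
        seen.add(nxt)
        current = nxt
```

Here `broader_terms("dog")` walks from dog to pet to animal, and `related_terms("bed and breakfast")` returns the single isRelatedTo link to sleep.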

Expansion Example
ORIGINAL DOCUMENT
  bed and breakfast in Legnano
WHITESPACE TOKENIZATION
  [bed] [and] [breakfast] [in] [Legnano]
ADDED GEONAMES TERMS
  [bed] [and] [breakfast] [in] [Legnano]
  terms added for [Legnano]: [6537118] -{0.1} [Europe] -{0.0256} [Italy] -{0.064} [Lombardy] -{0.16} [Milan] -{0.4}
ADDED ONTOLOGY TERMS
  [bed and breakfast] [in] [Legnano]
  terms added for [bed and breakfast]: [sleep] -{0.2} [accommodation] -{0.4}
  terms added for [Legnano]: [6537118] -{0.1} [Europe] -{0.0256} [Italy] -{0.064} [Lombardy] -{0.16} [Milan] -{0.4}
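The slide does not say how the payload values were chosen. One observation is that the four GeoNames place names carry values equal to successive powers of 0.4 (Milan 0.4, Lombardy 0.16, Italy 0.064, Europe 0.0256), so a geometric decay along the administrative hierarchy reproduces them. The sketch below encodes that reading as an assumption; the function name and the `factor` parameter are hypothetical, and the values 0.1 for the identifier 6537118 and 0.2 for "sleep" do not follow this pattern.

```python
def hierarchy_payloads(ancestors, factor=0.4):
    """Assign factor**k to the k-th ancestor, with k = 1 for the nearest one.

    Assumption: the weight decays geometrically with the distance from the
    original place name. Values are rounded to four decimals for display.
    """
    return [
        (name, round(factor ** level, 4))
        for level, name in enumerate(ancestors, start=1)
    ]

# Ancestors of Legnano, nearest first, as they appear in the example.
payloads = hierarchy_payloads(["Milan", "Lombardy", "Italy", "Europe"])
```

A smaller factor penalizes distant ancestors more strongly, which matches the stated goal of reflecting the decrease in relevance of documents retrieved through added terms.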

Implementation (1)
● SemanticFilter (our custom analyzer)
  – GeoNames parser
    ● Java API for XML Processing
  – Ontology parser
    ● JENA (a Semantic Web framework for Java)
  – Shingle matching algorithm (for multiword terms)
  – Payloads
    ● a byte array of information associated with a term
    ● encodeFloat of Lucene's PayloadHelper class
    ● setPayload of Lucene's Token class
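The slides do not detail the shingle matching algorithm. A minimal sketch of the general idea, assuming a greedy longest-match over word n-grams (shingles) against a set of known multiword terms, could look as follows. The function names and the `max_size` parameter are hypothetical.

```python
def shingles(tokens, size):
    """All contiguous runs of `size` tokens, joined with single spaces."""
    return [
        " ".join(tokens[start:start + size])
        for start in range(len(tokens) - size + 1)
    ]

def merge_multiword_terms(tokens, known_terms, max_size=3):
    """Replace the longest shingle found in known_terms with a single token."""
    merged = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, down to two-word shingles.
        for size in range(min(max_size, len(tokens) - i), 1, -1):
            candidate = " ".join(tokens[i:i + size])
            if candidate in known_terms:
                merged.append(candidate)
                i += size
                break
        else:
            # No multiword term starts here: keep the single token.
            merged.append(tokens[i])
            i += 1
    return merged
```

Applied to the tokens of "bed and breakfast in Legnano" with "bed and breakfast" as the only known multiword term, this yields the three tokens shown in the expansion example, so that the ontology lookup can then be done on the merged term.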

Implementation (2)
● PayloadBoostingSimilarity
  – extends Lucene's DefaultSimilarity (scoring)
  – uses PayloadHelper's decodeFloat
  – overrides scorePayload (which returns 1 by default)
● BoostingTermQuery
  – a payload-aware Query
  – it invokes the overridden scorePayload method
● PayloadQParserPlugin
  – we extend Solr's QParserPlugin to create custom query structures
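The round trip of a payload, from a float stored as bytes at index time to a boost factor applied at search time, can be sketched without Lucene. The functions below are plain Python stand-ins and not the PayloadHelper or Similarity API; the 4-byte big-endian layout is an assumption of this sketch, and multiplying a base score by the decoded value is a simplification of how a payload-aware query might use it.

```python
import struct

def encode_float(value):
    """Pack a float into 4 bytes (big-endian, single precision; assumed layout)."""
    return struct.pack(">f", value)

def decode_float(payload):
    """Inverse of encode_float."""
    return struct.unpack(">f", payload)[0]

def score_payload(payload):
    """Boost factor for a matched term: 1.0 when no payload is stored."""
    return 1.0 if payload is None else decode_float(payload)

def boosted_score(base_score, payload):
    """Scale the score of a match by the payload of the matched term."""
    return base_score * score_payload(payload)
```

A term from the original text carries no payload and keeps its full score, while a term added by expansion with payload 0.4 contributes 40% of that score, so documents retrieved only through added terms rank lower.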

Index Expansion

Document tokenization

GeoNames parser

Ontology parser

Query input

Query processing (tokenization, analysis)

Match highlighting

Scoring (1)

Scoring (2)

Conclusions
● Traditional syntactic-only search, albeit reliable and efficient, is greatly limited by the gap between the way machines work and the way we think
● Our search engine enriches search results with documents that traditional search engines fail to retrieve, while ensuring control over the ranking process
● FUTURE DEVELOPMENTS
  – handling polysemy
  – storing data in an SQL database
  – tuning boost values
  – query expansion
    ● e.g. D: “bed and breakfast in Monza”; Q: “visiting Legnano”
    ● Solr's query-side support for payloads - SOLR-1337 (5th August '09)

Q & A Questions?
