Outline Information Retrieval (IR) Syntactic IR Problems of - PowerPoint PPT Presentation

Concept Search: Semantics Enabled Syntactic Search
Fausto Giunchiglia, Uladzimir Kharkevich, Ilya Zaihrayeu
June 2nd, 2008, Tenerife, Spain

Outline
- Information Retrieval (IR)
- Syntactic IR
- Problems of Syntactic IR
- Semantic Continuum
- Concept Search (C-Search)
- C-Search via Inverted Indices
- Preliminary Evaluation
- Conclusion and Future work

Information Retrieval (IR)
- IR can be represented as a mapping function:
  IR: Q → D
  - Q: natural language queries which specify user information needs
  - D: a set of documents in the document collection which meet these needs, (optionally) ordered according to the degree of relevance
[Figure: example document collection and example queries]

Information Retrieval System
IR_System = <Model, Data_Structure, Term, Match>
- Model – IR models used for document and query representations, for computing query answers, and for relevance ranking.
  - Bag of words model (representation)
  - Boolean Model, Vector Space Model, Probabilistic Model (retrieval)
- Data_Structure – data structures used for indexing and retrieval.
  - Inverted Index
  - Signature File
- Term – an atomic element in document and query representations.
  - a word or a multi-word phrase
- Match – the matching technique used for term matching.
  - a syntactic matching of words or phrases:
    - search for equivalent words
    - search for words with common prefixes
    - search for words within a certain edit distance of a given word
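The three kinds of syntactic matching listed under Match can be illustrated with a small Python sketch. It is not code from the talk: the helper names `edit_distance` and `syntactic_match`, the `mode` values, and the toy vocabulary are hypothetical, and edit distance is assumed to be the standard Levenshtein distance.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def syntactic_match(query_word: str, vocabulary: list[str],
                    mode: str = "exact", max_distance: int = 1) -> list[str]:
    """Return the vocabulary words that syntactically match query_word."""
    q = query_word.lower()
    if mode == "exact":   # equivalent words
        return [w for w in vocabulary if w.lower() == q]
    if mode == "prefix":  # words sharing the query as a prefix
        return [w for w in vocabulary if w.lower().startswith(q)]
    if mode == "edit":    # words within max_distance edits
        return [w for w in vocabulary if edit_distance(w.lower(), q) <= max_distance]
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical toy vocabulary.
VOCABULARY = ["dog", "dogs", "dot", "cat", "carnivores"]
```

None of the three modes relates "dog" to "carnivores", which is the kind of gap listed under Related Concepts in the problems of syntactic IR.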

Syntactic IR (Ex. Inv. Index)
[Figure: inverted-index example for query Q3]
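A word-level inverted index with Boolean AND retrieval, the classic syntactic setup, can be sketched as follows. The toy documents, the tokenizer, and the function names are hypothetical and only stand in for the example shown on the slide.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and split on anything that is not a letter or a digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map every word to the set of ids of the documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def boolean_and(index: dict[str, set[str]], query: str) -> set[str]:
    """Boolean retrieval: ids of documents that contain every query word."""
    postings = [index.get(word, set()) for word in tokenize(query)]
    return set.intersection(*postings) if postings else set()


# Hypothetical toy collection.
DOCS = {
    "D1": "A laptop computer is on a coffee table",
    "D2": "A little dog and a huge cat",
    "D3": "Wooden desk for a computer",
}
INDEX = build_inverted_index(DOCS)
```

With these documents, the query "computer table" returns D1 (a laptop computer on a coffee table), while "carnivores" returns nothing even though D2 mentions a dog and a cat.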

Problems of Syntactic IR
(I) Ambiguity of Natural Language
- Polysemy: one word ↔ multiple meanings
  - e.g., baby is a young mammal or a human child
- Synonymy: different words ↔ same meaning
  - e.g., mark and print: a visible indication made on a surface
(II) Complex Concepts
- Syntactic IR does not take into account complex concepts formed by natural language phrases (e.g., noun phrases).
  - E.g., the query "Computer table" syntactically matches the document "A laptop computer is on a coffee table"
(III) Related Concepts
- Syntactic IR does not take into account related concepts:
  - E.g., carnivores (flesh-eating mammals) is more general than dog OR cat

Syntactic IR
- We can think of Syntactic IR as a point in a space of IR approaches.
[Figure: Syntactic IR at the origin (0, 0, 0), Pure Syntax, of a space whose axes start at NL, Word, and String Similarity]

(1) Ambiguity: Natural Language (NL) → Formal Language (FL)
[Figure: the NL2FL axis runs from NL at the origin (0, 0, 0), Pure Syntax, to FL at 1]
- E.g., baby → C(baby): a human child
- print → C(print): a visible indication made on a surface

(2) Complex Concepts: Words → Multi-word Phrases
[Figure: the W2P axis runs from Word at the origin through +Noun Phrase, +Verb Phrase, … to Free Text at 1, alongside the NL2FL axis]
- E.g., Computer table → C(computer table)
- A laptop computer is on a coffee table → {C(laptop computer), C(coffee table)}

(3) Related Concepts: String similarity → Knowledge
[Figure: the KNOW axis runs from String Similarity at the origin (0, 0, 0), Pure Syntax, with intermediate points +Lexical Knowledge and +Statistical knowledge, up to Complete Ontological Knowledge at 1, alongside the NL2FL and W2P axes]
- E.g., "carnivores" ≠ "dog" → C(carnivores) ⊒ C(dog)

Semantic Continuum
[Figure: the NL2FL, W2P, and KNOW axes span a continuum from Pure Syntax (0, 0, 0) to Full Semantics (1, 1, 1); C-Search is placed inside this space]

C-Search in Semantic Continuum
- NL2FL-axis: lack of background knowledge
  - It is not always possible to find a concept which corresponds to a given word (e.g., the concept does not exist in the lexical database). In this case, the word itself is used as the identifier for a concept.
- W2P-axis: descriptive phrases
  - (Complex) concepts are extracted from descriptive phrases
  - descriptive phrase ::= noun phrase {OR noun phrase}
  - E.g., C(A little dog OR a huge cat) = (little-2 ⊓ dog-1) ⊔ (huge-1 ⊓ cat-3)
- KNOW-axis: lexical knowledge
  - We use synonyms, hyponyms, and hypernyms
  - Semantic Matching → search for related complex concepts
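The W2P and NL2FL points above can be sketched in Python: a descriptive phrase is split on OR into noun phrases, each noun phrase becomes a conjunctive clause of atomic concepts, and the result is a DNF formula. This is a simplified stand-in, not the C-Search extractor: a hard-coded, hypothetical `LEXICON` replaces the lexical database and word sense disambiguation, and dropping a few determiners replaces real noun-phrase chunking. Words missing from the lexicon fall back to the word itself, as described for the NL2FL axis.

```python
# Hypothetical toy lexicon: word -> already disambiguated sense id.
LEXICON = {"little": "little-2", "dog": "dog-1", "huge": "huge-1", "cat": "cat-3"}
# Hypothetical stand-in for noun-phrase chunking: just drop determiners.
STOP_WORDS = {"a", "an", "the"}


def atomic_concept(word: str) -> str:
    # NL2FL fallback: with no concept in the lexicon, the word is the identifier.
    return LEXICON.get(word, word)


def descriptive_phrase_to_concept(phrase: str) -> frozenset:
    """Translate 'noun phrase {OR noun phrase}' into a DNF formula:
    a frozenset of conjunctive clauses, each a frozenset of atomic concepts."""
    clauses = []
    for noun_phrase in phrase.split(" OR "):
        words = [w for w in noun_phrase.lower().split() if w not in STOP_WORDS]
        clauses.append(frozenset(atomic_concept(w) for w in words))
    return frozenset(clauses)


concept = descriptive_phrase_to_concept("A little dog OR a huge cat")
# concept represents (little-2 ⊓ dog-1) ⊔ (huge-1 ⊓ cat-3)
```

Representing clauses as sets makes ⊓ and ⊔ order-insensitive, so "dog little" and "little dog" yield the same clause.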

C-Search in Semantic Continuum
[Figure: C-Search placed between Pure Syntax (0, 0, 0) and Full Semantics (1, 1, 1): at NL&FL on the NL2FL axis, at +Descriptive Phrase (between +Noun Phrase and +Verb Phrase) on the W2P axis, and at lexical knowledge on the KNOW axis]

C-Search via Inverted Indices
- Moving from Syntactic IR to C-Search does not require the introduction of new data structures or retrieval models.
- The current implementation of C-Search:
  - Model – Bag of concepts (representation), Boolean Model (retrieval), Vector Space Model (ranking)
  - Data_Structure – Inverted Index
  - Term – an atomic or a complex concept
  - Match – semantic matching of concepts
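Because a term is now a concept, the usual Vector Space Model machinery applies unchanged to a bag of concepts. The slides do not specify the weighting scheme, so this sketch assumes plain TF-IDF with cosine similarity; it also assumes semantic matching has already been done, so the Boolean step only keeps documents that share a concept id with the query. The concept ids, the documents, and the `rank` helper are hypothetical.

```python
import math
from collections import Counter

# Hypothetical documents, each already reduced to a bag of concept ids.
DOCS = {
    "D1": ["laptop_computer-1", "coffee_table-1"],
    "D2": ["computer_table-1"],
    "D3": ["computer_table-1", "lamp-1"],
}


def tfidf_vectors(docs):
    """TF-IDF weights per document, with a smoothed idf = ln(N / df) + 1."""
    n = len(docs)
    df = Counter(c for concepts in docs.values() for c in set(concepts))
    idf = {c: math.log(n / k) + 1.0 for c, k in df.items()}
    vectors = {d: {c: tf * idf[c] for c, tf in Counter(cs).items()}
               for d, cs in docs.items()}
    return vectors, idf


def cosine(u, v):
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def rank(query_concepts, docs):
    vectors, idf = tfidf_vectors(docs)
    query = {c: tf * idf.get(c, 0.0) for c, tf in Counter(query_concepts).items()}
    # Boolean step (here an OR): keep documents sharing at least one concept.
    hits = [d for d, vec in vectors.items() if any(c in vec for c in query)]
    # Ranking step: order the hits by cosine similarity to the query.
    return sorted(((d, cosine(query, vectors[d])) for d in hits),
                  key=lambda pair: pair[1], reverse=True)


ranking = rank(["computer_table-1"], DOCS)
```

D1, which contains C(laptop computer) and C(coffee table), is not retrieved for C(computer table), and D2 ranks above D3 because D3 also contains an unrelated concept.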

C-Search (Ex. Inv. Index)
[Figure: inverted-index example in which the terms are concepts]

Concept Matching
- Goal: to find the set of document concepts matching a query concept:
  $C_{ms}(C_q) = \{ C_d \mid C_d \sqsubseteq C_q \}$
- 1st approach: directly via S-Match
  - Sequentially iterate through all document concepts
  - Compare each document concept with the query concept (using S-Match)
  - Collect those concepts for which S-Match returns more specific (⊑)
  - It can be slow! (because the number of document concepts is > 10E6)
- 2nd approach: via Inverted Indices (brief overview)
  - A-Index → index atomic concepts by more general atomic concepts
  - ⊓-Index → index conjunctive clauses by their components (i.e., atomic concepts)
  - ⊔-Index → index DNF formulas by their components (i.e., conjunctive clauses)
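The 1st approach can be sketched as a linear scan. S-Match itself is not reproduced here; instead, a simple structural check over a hypothetical hypernym table decides C_d ⊑ C_q: an atom is more specific than another if it reaches it through hypernym links, a conjunctive clause is more specific than another if each atom of the general clause is implied by some atom of the specific one, and a DNF formula is more specific than the query if each of its clauses is more specific than some clause of the query. This check is sound but, unlike a full reasoner, not complete. The document concept C4 is hypothetical.

```python
# Hypothetical background knowledge: atom -> its direct hypernyms.
HYPERNYMS = {
    "dog": {"canine"}, "wolf": {"canine"},
    "cat": {"feline"}, "lion": {"feline"},
    "canine": {"carnivore"}, "feline": {"carnivore"},
}


def atom_leq(a, b):
    """True if atom a equals b or reaches b through hypernym links."""
    stack, seen = [a], set()
    while stack:
        x = stack.pop()
        if x == b:
            return True
        if x not in seen:
            seen.add(x)
            stack.extend(HYPERNYMS.get(x, ()))
    return False


def clause_leq(specific, general):
    # Every atom of the general clause is implied by some atom of the specific one.
    return all(any(atom_leq(a, b) for a in specific) for b in general)


def dnf_leq(doc_formula, query_formula):
    # Every clause of the document formula is below some clause of the query.
    return all(any(clause_leq(c, q) for q in query_formula) for c in doc_formula)


def match_by_scan(doc_concepts, query_formula):
    """1st approach: compare the query with every document concept in turn."""
    return {cid for cid, formula in doc_concepts.items()
            if dnf_leq(formula, query_formula)}


DOC_CONCEPTS = {
    "C1": frozenset({frozenset({"little", "dog"}), frozenset({"huge", "cat"})}),
    "C4": frozenset({frozenset({"little", "dog"})}),
}
CANINE_OR_FELINE = frozenset({frozenset({"canine"}), frozenset({"feline"})})
```

The scan touches every document concept for every query, which is why this approach becomes slow with millions of document concepts.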

Concept Indices (An example)
- Let us consider the following concept:
  C1 = (little-2 ⊓ dog-1) ⊔ (huge-1 ⊓ cat-3)
- Fragments of concept indices for document concept C1:

Concept | ⊔-index
C2 (little ⊓ dog) | C1, …
C3 (huge ⊓ cat) | C1, …
… | …

Concept | ⊓-index
A1 (little) | C2, …
A2 (dog) | C2, …
A3 (huge) | C3, …
A4 (cat) | C3, …
… | …

Concept | A-index
A5 (canine) | A2, …
A6 (feline) | A4, …
… | …
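The three indices for C1 can be built in a single pass over the document concepts, as in this sketch. The hypernym table and the function names are hypothetical, atoms are written without sense numbers, and conjunctive clauses are used directly as ⊔-index keys instead of ids such as C2 and C3.

```python
from collections import defaultdict

# Hypothetical background knowledge: atom -> its direct hypernyms.
HYPERNYMS = {"dog": {"canine"}, "cat": {"feline"}}

# Document concept C1 = (little ⊓ dog) ⊔ (huge ⊓ cat) as a DNF of clauses.
DOC_CONCEPTS = {
    "C1": frozenset({frozenset({"little", "dog"}), frozenset({"huge", "cat"})}),
}


def more_general(atom):
    """All atoms reachable from `atom` through hypernym links."""
    result, stack = set(), list(HYPERNYMS.get(atom, ()))
    while stack:
        x = stack.pop()
        if x not in result:
            result.add(x)
            stack.extend(HYPERNYMS.get(x, ()))
    return result


def build_indices(doc_concepts):
    a_index = defaultdict(set)    # A-index: general atom -> more specific atoms
    and_index = defaultdict(set)  # ⊓-index: atom -> conjunctive clauses containing it
    or_index = defaultdict(set)   # ⊔-index: clause -> DNF formulas containing it
    for formula_id, formula in doc_concepts.items():
        for clause in formula:
            or_index[clause].add(formula_id)
            for atom in clause:
                and_index[atom].add(clause)
                for general in more_general(atom):
                    a_index[general].add(atom)
    return a_index, and_index, or_index


a_index, and_index, or_index = build_indices(DOC_CONCEPTS)
```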

Concept Retrieval (An example)
0. Query concept: Cq = canine ⊔ feline
1. For each atomic concept → more specific atomic concepts
   - Search A-Index
   - E.g., canine → {dog, wolf, …} and feline → {cat, lion, …}
2. For each atomic concept → more specific conjunctive clauses
   - Search ⊓-Index
   - E.g., dog → {C2 = little ⊓ dog, …} and cat → {C3 = huge ⊓ cat, …}
   - (Note that canine → {C2 = little ⊓ dog, …} and feline → {C3 = huge ⊓ cat, …})
3. For each disjunctive clause → more specific conjunctive clauses
   - Union of conjunctive clauses
   - E.g., canine ⊔ feline → {C2 = little ⊓ dog, C3 = huge ⊓ cat, …}
4. For each disjunctive clause → more specific DNF formulas
   - Search ⊔-Index
   - E.g., canine ⊔ feline → {C1 = (little ⊓ dog) ⊔ (huge ⊓ cat), …}
5. …
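Steps 1 to 4 can be sketched over hard-coded index fragments mirroring the index tables for C1. Two things here are assumptions rather than content of the slides: the query is treated as a conjunction of disjunctive clauses (Cq = canine ⊔ feline is a single such clause), so matching clauses are intersected across query clauses; and, because the slide elides what follows step 4, a candidate formula is kept only if all of its clauses match, using a hypothetical forward map `CLAUSES_OF`.

```python
# Hypothetical index fragments for C1 = C2 ⊔ C3, with C2 = little ⊓ dog, C3 = huge ⊓ cat.
A_INDEX = {"canine": {"dog", "wolf"}, "feline": {"cat", "lion"}}  # atom -> more specific atoms
AND_INDEX = {"little": {"C2"}, "dog": {"C2"}, "huge": {"C3"}, "cat": {"C3"}}  # atom -> clauses
OR_INDEX = {"C2": {"C1"}, "C3": {"C1"}}  # clause -> DNF formulas containing it
CLAUSES_OF = {"C1": {"C2", "C3"}}        # hypothetical forward map: formula -> its clauses


def specific_atoms(atom):
    """Step 1: the atom itself plus its more specific atoms from the A-index."""
    return {atom} | A_INDEX.get(atom, set())


def clauses_for_atom(atom):
    """Step 2: clauses containing some atom at least as specific as `atom`."""
    return set().union(*(AND_INDEX.get(a, set()) for a in specific_atoms(atom)))


def clauses_for_disjunction(disjunction):
    """Step 3: union over the atoms of one disjunctive query clause."""
    return set().union(*(clauses_for_atom(a) for a in disjunction))


def retrieve(query_cnf):
    """Step 4: DNF formulas more specific than a query given as a list of
    disjunctive clauses (each a set of atomic concepts)."""
    if not query_cnf:
        return set()
    # A document clause must be more specific than every disjunctive query clause.
    matching = set.intersection(*(clauses_for_disjunction(d) for d in query_cnf))
    # Candidate formulas come from the ⊔-index ...
    candidates = set().union(*(OR_INDEX.get(c, set()) for c in matching))
    # ... and are kept only if every one of their clauses matches.
    return {f for f in candidates if CLAUSES_OF[f] <= matching}


print(retrieve([{"canine", "feline"}]))  # {'C1'}
```

A query of just canine does not return C1, because its clause huge ⊓ cat is not more specific than canine.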

Evaluation: Settings
- Data_set_1: Home sub-tree of the DMoz web directory
  - Document set: documents classified to nodes (29506)
  - Query set: concatenation of a node's label and its parent's label (890)
  - Relevance judgment: node-document links
- Data_set_2: the only difference from Data_set_1 is:
  - Document set: concatenation of titles and descriptions of documents in DMoz
- WordNet is used as the lexical DB
- GATE is used as the NLP tool
- Lucene is used as the inverted index

Evaluation results
[Figure: results for Data_set_1 and Data_set_2]

Conclusion and Future work
- Conclusion
  - In C-Search, syntactic IR is extended with a semantics layer
  - C-Search performs as well as syntactic search while allowing for an improvement when semantics is available
  - In principle, C-Search supports a continuum from purely syntactic IR to fully semantic IR, in which indexing and retrieval can be performed at any point of the continuum depending on how much semantics is available
- Future work
  - Development of a more accurate concept extraction algorithm
  - Development of document relevance metrics based on both syntactic and semantic similarities of query and document descriptions
  - Allow semantic scope (such as equivalence, more/less general, disjoint)
  - Comparing the performance of the proposed solution with state-of-the-art syntactic IR systems using a syntactic IR benchmark

Thank You!
