Models for Retrieval and Browsing

Models for Retrieval and Browsing - Fuzzy Set, Extended Boolean, Generalized Vector Space Models
Berlin Chen 2004
Reference: 1. Modern Information Retrieval, Chapter 2

Taxonomy of Classic IR Models
• User Task
  – Retrieval: Adhoc, Filtering
  – Browsing
• Classic Models: Boolean, Vector, Probabilistic
  – Set Theoretic: Fuzzy, Extended Boolean
  – Algebraic: Generalized Vector, Latent Semantic Indexing (LSI), Neural Networks
  – Probabilistic: Inference Network, Belief Network, and the probability-based Hidden Markov Model, Probabilistic LSI, Language Model
• Structured Models: Non-Overlapping Lists, Proximal Nodes
• Browsing: Flat, Structure Guided, Hypertext
IR 2004 – Berlin Chen

Outline
• Alternative Set Theoretic Models
  – Fuzzy Set Model (Fuzzy Information Retrieval)
  – Extended Boolean Model
• Alternative Algebraic Models
  – Generalized Vector Space Model

Fuzzy Set Model
• Premises
  – Docs and queries are represented through sets of keywords, so the matching between them is vague
    • Keywords cannot completely describe the user's information need or the doc's main theme (aboutness)
    • [Figure: a retrieval model matches one keyword set $w_s, w_p, w_q, \ldots$ against another keyword set $w_i, w_j, w_k, \ldots$]
    • Example keyword pairs that name the same thing: "President Chen" (陳總統) vs. "Chen Shui-bian" (陳水扁); "Bei-er-gao" (北二高) vs. "Northern Second Freeway" (北部第二高速公路)
  – For each query term (keyword)
    • Define a fuzzy set in which each doc has a degree of membership (between 0 and 1)

Fuzzy Set Model (cont.)
• Fuzzy Set Theory
  – Framework for representing classes (sets) whose boundaries are not well defined
  – Key idea is to introduce the notion of a degree of membership associated with the elements of a set
  – This degree of membership varies from 0 to 1 and allows modeling the notion of marginal membership
    • 0 → no membership
    • 1 → full membership
  – Thus, membership is now gradual instead of abrupt
    • Unlike conventional Boolean logic
• Here we define a fuzzy set for each query (or index) term, so each doc has a degree of membership in this set.

Fuzzy Set Model (cont.)
• Definition
  – A fuzzy subset $A$ of a universe of discourse $U$ is characterized by a membership function $\mu_A : U \to [0,1]$
    • Which associates with each element $u$ of $U$ a number $\mu_A(u)$ in the interval $[0,1]$
  – Let $A$ and $B$ be two fuzzy subsets of $U$. Also, let $\bar{A}$ be the complement of $A$. Then,
    • Complement: $\mu_{\bar{A}}(u) = 1 - \mu_A(u)$
    • Union: $\mu_{A \cup B}(u) = \max(\mu_A(u), \mu_B(u))$
    • Intersection: $\mu_{A \cap B}(u) = \min(\mu_A(u), \mu_B(u))$
  – [Figure: Venn diagram of $A$ and $B$ inside $U$, with an element $u$]
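The three definitions above can be sketched in a few lines of Python. This is only an illustration: representing a fuzzy set as a dict from element to membership degree, and the names `complement`, `union`, `intersection`, `universe`, `A`, and `B`, are assumptions made here and do not come from the slides.

```python
from typing import Dict, Iterable

# A fuzzy set is sketched as a mapping element -> membership degree in [0, 1].
# Elements missing from the mapping are treated as having membership 0.
FuzzySet = Dict[str, float]


def complement(a: FuzzySet, universe: Iterable[str]) -> FuzzySet:
    """mu_notA(u) = 1 - mu_A(u)."""
    return {u: 1.0 - a.get(u, 0.0) for u in universe}


def union(a: FuzzySet, b: FuzzySet, universe: Iterable[str]) -> FuzzySet:
    """mu_{A u B}(u) = max(mu_A(u), mu_B(u))."""
    return {u: max(a.get(u, 0.0), b.get(u, 0.0)) for u in universe}


def intersection(a: FuzzySet, b: FuzzySet, universe: Iterable[str]) -> FuzzySet:
    """mu_{A n B}(u) = min(mu_A(u), mu_B(u))."""
    return {u: min(a.get(u, 0.0), b.get(u, 0.0)) for u in universe}


# Hypothetical universe of three docs with made-up membership degrees.
universe = ["d1", "d2", "d3"]
A = {"d1": 0.2, "d2": 0.9}
B = {"d1": 0.5, "d3": 0.4}

print(union(A, B, universe))
print(intersection(A, B, universe))
print(complement(A, universe))
```

With crisp memberships (only 0 or 1) these operations reduce to the ordinary Boolean set operations.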

Fuzzy Set Model (cont.)
• Fuzzy information retrieval: defining term relationships
  – Fuzzy sets are modeled based on a thesaurus
  – This thesaurus can be constructed from a term-term correlation matrix (also called a keyword connection matrix)
    • $\vec{c}$: a term-term correlation matrix
    • $c_{i,l}$: a normalized correlation factor for terms $k_i$ and $k_l$, ranged from 0 to 1
      $c_{i,l} = \frac{n_{i,l}}{n_i + n_l - n_{i,l}}$
      $n_i$: no. of docs that contain $k_i$ (likewise $n_l$ for $k_l$)
      $n_{i,l}$: no. of docs that contain both $k_i$ and $k_l$
      (the counting unit can be docs, paragraphs, sentences, ...)
    • We now have the notion of proximity among index terms
  – The relationship is symmetric!
    $\mu_{k_i}(k_l) = c_{i,l} = c_{l,i} = \mu_{k_l}(k_i)$
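A minimal sketch of how the correlation factor $c_{i,l} = n_{i,l} / (n_i + n_l - n_{i,l})$ could be computed from co-occurrence counts. The function name `term_correlation`, the representation of each doc as a set of terms, and the toy collection are assumptions for illustration only.

```python
from typing import Dict, List, Set, Tuple


def term_correlation(docs: List[Set[str]]) -> Dict[Tuple[str, str], float]:
    """Return c[(k_i, k_l)] = n_il / (n_i + n_l - n_il) for every term pair."""
    terms = sorted(set().union(*docs))
    # n[k]: number of docs that contain term k
    n = {k: sum(1 for d in docs if k in d) for k in terms}
    c = {}
    for ki in terms:
        for kl in terms:
            # n_il: number of docs that contain both k_i and k_l
            n_il = sum(1 for d in docs if ki in d and kl in d)
            # The denominator is >= 1 because every term occurs in some doc.
            c[(ki, kl)] = n_il / (n[ki] + n[kl] - n_il)
    return c


# Hypothetical toy collection of four docs over the terms a, b, c.
docs = [{"a", "b"}, {"a", "c"}, {"b", "c"}, {"a"}]
c = term_correlation(docs)
print(c[("a", "b")], c[("b", "c")], c[("a", "a")])
```

The sketch counts whole docs; as the slide notes, the same counts could be taken over paragraphs or sentences instead.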

Fuzzy Set Model (cont.)
• The union and intersection operations are modified here
  – [Figure: Venn diagram of $A_1$ and $A_2$ inside $U$ for an element $u$ with $a = \mu_{A_1}(u)$ and $b = \mu_{A_2}(u)$]
    $ab + (1-a)b + a(1-b) = ab + b - ab + a - ab = 1 - (1 - a - b + ab) = 1 - (1-a)(1-b)$
  – Union: algebraic sum (instead of max)
    $\mu_{A_1 \cup A_2 \cup \cdots \cup A_n}(u) = \mu_{\bigcup_j A_j}(u) = 1 - \prod_{j=1}^{n} \left(1 - \mu_{A_j}(u)\right)$
    i.e., the complement of a negative algebraic product
    • For two sets: $\mu_{A_1 \cup A_2}(u) = \mu_{A_1}(u)\mu_{A_2}(u) + \left(1 - \mu_{A_1}(u)\right)\mu_{A_2}(u) + \mu_{A_1}(u)\left(1 - \mu_{A_2}(u)\right) = 1 - \prod_{j=1}^{2}\left(1 - \mu_{A_j}(u)\right)$
  – Intersection: algebraic product (instead of min)
    $\mu_{A_1 \cap A_2 \cap \cdots \cap A_n}(u) = \mu_{\bigcap_j A_j}(u) = \prod_{j=1}^{n} \mu_{A_j}(u)$
    • For two sets: $\mu_{A_1 \cap A_2}(u) = \mu_{A_1}(u)\,\mu_{A_2}(u)$
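The modified operators can be checked numerically. The sketch below assumes Python 3.8+ (for `math.prod`); the names `algebraic_sum` and `algebraic_product` and the sample values are illustrative and not from the slides.

```python
from math import prod
from typing import Iterable


def algebraic_product(memberships: Iterable[float]) -> float:
    """Intersection: product of the membership degrees."""
    return prod(memberships)


def algebraic_sum(memberships: Iterable[float]) -> float:
    """Union: complement of the product of the complements."""
    return 1.0 - prod(1.0 - m for m in memberships)


a, b = 0.3, 0.6  # made-up membership degrees of one element u in A1 and A2

# Two-set identity from the slide: ab + (1-a)b + a(1-b) = 1 - (1-a)(1-b)
lhs = a * b + (1 - a) * b + a * (1 - b)
rhs = algebraic_sum([a, b])
print(lhs, rhs)

# Compared with max/min, every operand now contributes to the result.
print(max(a, b), algebraic_sum([a, b]))
print(min(a, b), algebraic_product([a, b]))
```

For memberships in [0, 1] the algebraic sum is never smaller than the max and the algebraic product is never larger than the min.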

Fuzzy Set Model (cont.)
  – The degree of membership between a doc $d_j$ and an index term $k_i$ is an algebraic sum (a doc is a union of index terms)
    $\mu_{k_i,d_j} = \mu_{\bigcup_{k_l \in d_j} k_l}(k_i) = 1 - \prod_{k_l \in d_j} \left(1 - \mu_{k_l}(k_i)\right) = 1 - \prod_{k_l \in d_j} \left(1 - c_{i,l}\right)$
    • [Figure: term $k_i$ linked to the doc terms $k_a$ and $k_b$ with correlations $c_{i,a}$, $c_{i,b}$ and complements $1 - c_{i,a}$, $1 - c_{i,b}$]
    • Computes an algebraic sum over all terms in the doc $d_j$
      – Implemented as the complement of a negative algebraic product
  – A doc $d_j$ belongs to the fuzzy set associated to the term $k_i$ if its own terms are related to $k_i$
    • If there is at least one index term $k_l$ of $d_j$ which is strongly related to the index $k_i$ ($c_{i,l} \sim 1$), then $\mu_{k_i,d_j} \sim 1$
      – $k_i$ is a good fuzzy index for doc $d_j$
      – And vice versa
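As a rough illustration of $\mu_{k_i,d_j} = 1 - \prod_{k_l \in d_j}(1 - c_{i,l})$, the sketch below looks up correlation factors in a dict keyed by term pairs. The function name `doc_membership`, the term names, and the correlation values are hypothetical.

```python
from math import prod
from typing import Dict, Iterable, Tuple


def doc_membership(ki: str, doc_terms: Iterable[str],
                   c: Dict[Tuple[str, str], float]) -> float:
    """Degree of membership of a doc in the fuzzy set of index term ki."""
    return 1.0 - prod(1.0 - c[(ki, kl)] for kl in doc_terms)


# Made-up correlation factors between an index term "ki" and three doc terms.
c = {
    ("ki", "k1"): 0.9,  # strongly related to ki
    ("ki", "k2"): 0.1,  # weakly related
    ("ki", "k3"): 0.2,  # weakly related
}

strong = doc_membership("ki", ["k1", "k2"], c)  # contains a strongly related term
weak = doc_membership("ki", ["k2", "k3"], c)    # only weakly related terms
print(strong, weak)
```

A single strongly related term pushes the membership close to 1, while a doc with only weakly related terms stays low, which matches the "at least one strongly related term" remark above.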

Fuzzy Set Model (cont.)
• Example:
  – Query $q = k_a \wedge (k_b \vee \neg k_c)$
  – Disjunctive normal form:
    $q_{dnf} = (k_a \wedge k_b \wedge k_c) \vee (k_a \wedge k_b \wedge \neg k_c) \vee (k_a \wedge \neg k_b \wedge \neg k_c) = cc_1 + cc_2 + cc_3$
    where each $cc_i$ is a conjunctive component
  – $D_a$ is the fuzzy set of docs associated to the term $k_a$
  – [Figure: Venn diagram of $D_a$, $D_b$, $D_c$ with the regions $cc_1$, $cc_2$, $cc_3$ marked]
  – Degree of membership?

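Fuzzy Set Model (cont.)
• Degree of membership for a doc $d_j$ in the fuzzy answer set $D_q$
  – [Figure: Venn diagram of $D_a$, $D_b$, $D_c$ with the regions $cc_1$, $cc_2$, $cc_3$ marked]
  – Algebraic sum over the conjunctive components (complement of a negative algebraic product):
    $\mu_{q,d_j} = \mu_{cc_1 \cup cc_2 \cup cc_3,\, d_j} = 1 - \prod_{i=1}^{3} \left(1 - \mu_{cc_i,d_j}\right)$
    $= 1 - \left(1 - \mu_{a \cap b \cap c,\, d_j}\right)\left(1 - \mu_{a \cap b \cap \neg c,\, d_j}\right)\left(1 - \mu_{a \cap \neg b \cap \neg c,\, d_j}\right)$
  – Algebraic product inside each conjunctive component:
    $= 1 - \left(1 - \mu_{a,d_j}\mu_{b,d_j}\mu_{c,d_j}\right) \times \left(1 - \mu_{a,d_j}\mu_{b,d_j}(1 - \mu_{c,d_j})\right) \times \left(1 - \mu_{a,d_j}(1 - \mu_{b,d_j})(1 - \mu_{c,d_j})\right)$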
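The closed form for the example query $q = k_a \wedge (k_b \vee \neg k_c)$ can be evaluated directly. In this sketch the function name `query_membership` and the input values are assumptions; the arithmetic follows the three conjunctive components given above.

```python
def query_membership(mu_a: float, mu_b: float, mu_c: float) -> float:
    """mu_{q,dj} for q = ka AND (kb OR NOT kc), via its three conjunctive components."""
    cc1 = mu_a * mu_b * mu_c                  # ka AND kb AND kc
    cc2 = mu_a * mu_b * (1.0 - mu_c)          # ka AND kb AND NOT kc
    cc3 = mu_a * (1.0 - mu_b) * (1.0 - mu_c)  # ka AND NOT kb AND NOT kc
    # Algebraic sum of the components = complement of the product of complements.
    return 1.0 - (1.0 - cc1) * (1.0 - cc2) * (1.0 - cc3)


# Crisp memberships reproduce the Boolean truth table of the query.
print(query_membership(1.0, 1.0, 1.0))  # doc satisfies the query
print(query_membership(1.0, 0.0, 1.0))  # doc violates (kb OR NOT kc)

# Graded memberships give a graded degree of relevance.
print(query_membership(0.5, 0.5, 0.5))
```

With memberships restricted to 0 and 1 the result is exactly the Boolean value of the query, so the fuzzy model can be read as a graded generalization of Boolean retrieval.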

Fuzzy Set Model (cont.)
• Advantages
  – The correlations among index terms are considered
  – A graded degree of relevance between queries and docs can be obtained
• Disadvantages
  – Fuzzy IR models have been discussed mainly in the literature associated with fuzzy theory
  – Experiments with standard test collections are not available

Extended Boolean Model
Salton et al., 1983
• Motive
  – Extend the Boolean model with the functionality of partial matching and term weighting
    • E.g.: in the Boolean model, for the query $q = k_x \wedge k_y$ ("Chen Shui-bian AND Lu Hsiu-lien"), a doc that contains only one of $k_x$ and $k_y$ is as irrelevant as another doc that contains neither of them
    • How about the disjunctive query $q = k_x \vee k_y$ ("Chen Shui-bian OR Lu Hsiu-lien")?
  – Combine Boolean query formulations with characteristics of the vector model
    • Term weighting → a ranking can be obtained
    • Algebraic distances for similarity measures

Extended Boolean Model (cont.)
• Term weighting
  – The weight for the term $k_x$ in a doc $d_j$ is
    $w_{x,j} = tf_{x,j} \times \frac{idf_x}{\max_i idf_i}$
    • $tf_{x,j}$: normalized frequency
    • $idf_x / \max_i idf_i$: normalized idf, ranged from 0 to 1
    • $w_{x,j}$ is normalized to lie between 0 and 1
• Assume two index terms $k_x$ and $k_y$ were used
  – Let $x$ denote the weight $w_{x,j}$ of term $k_x$ on doc $d_j$
  – Let $y$ denote the weight $w_{y,j}$ of term $k_y$ on doc $d_j$
  – The doc vector $\vec{d}_j = (w_{x,j}, w_{y,j})$ is represented as $d_j = (x, y)$
  – Queries and docs can be plotted in a two-dimensional map
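A small sketch of the weight $w_{x,j} = tf_{x,j} \times idf_x / \max_i idf_i$. The slide does not say how the frequency is normalized or how idf is defined, so two assumptions are made here and marked in the comments: the raw frequency is divided by the largest term frequency in the doc, and $idf = \log(N / n)$. The function name `term_weights` and the toy docs are also illustrative.

```python
import math
from collections import Counter
from typing import Dict, List


def term_weights(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return, per doc, a dict term -> weight in [0, 1]."""
    n_docs = len(docs)
    # Document frequency of each term.
    df = Counter(t for d in docs for t in set(d))
    # ASSUMPTION: idf = log(N / n); the slide only says "idf".
    idf = {t: math.log(n_docs / n) for t, n in df.items()}
    max_idf = max(idf.values())
    weights = []
    for d in docs:
        freq = Counter(d)
        max_freq = max(freq.values())
        w = {}
        for t, f in freq.items():
            # ASSUMPTION: tf normalized by the largest frequency in the doc.
            tf = f / max_freq
            # Guard against the degenerate case where every idf is 0.
            w[t] = tf * idf[t] / max_idf if max_idf > 0 else 0.0
        weights.append(w)
    return weights


# Hypothetical collection: "y" occurs in every doc, so its idf (and weight) is 0.
docs = [["x", "x", "y"], ["y", "z"], ["y"]]
w = term_weights(docs)
print(w[0])  # the first doc as a point (x, y) = (w["x"], w["y"])
```

Because both factors lie in [0, 1], each doc maps to a point inside the unit square, which is what the two-dimensional map relies on.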

Extended Boolean Model (cont.)
• If the query is $q = k_x \wedge k_y$ (conjunctive query)
  – The docs near the point (1,1) are preferred
  – The similarity measure is defined as (2-norm model, Euclidean distance)
    $sim(q_{and}, d) = 1 - \sqrt{\frac{(1-x)^2 + (1-y)^2}{2}}$
  – [Figure: AND plot in the $(k_x, k_y)$ plane between (0,0) and (1,1), with $x = w_{x,j}$, $y = w_{y,j}$ and the docs $d_j$ and $d_{j+1}$ marked]

Extended Boolean Model (cont.)
• If the query is $q = k_x \vee k_y$ (disjunctive query)
  – The docs far from the point (0,0) are preferred
  – The similarity measure is defined as (2-norm model, Euclidean distance)
    $sim(q_{or}, d) = \sqrt{\frac{x^2 + y^2}{2}}$
  – [Figure: OR plot in the $(k_x, k_y)$ plane between (0,0) and (1,1), with $x = w_{x,j}$, $y = w_{y,j}$ and the docs $d_j$ and $d_{j+1}$ marked]

Extended Boolean Model (cont.)
• The similarity measures $sim(q_{or}, d)$ and $sim(q_{and}, d)$ also lie between 0 and 1
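The two 2-norm similarity measures are easy to try out on sample points. In this sketch the function names `sim_and` and `sim_or` and the sample weights are illustrative; the formulas are the ones given for the conjunctive and disjunctive queries.

```python
import math


def sim_and(x: float, y: float) -> float:
    """1 minus the normalized Euclidean distance from (x, y) to the point (1, 1)."""
    return 1.0 - math.sqrt(((1.0 - x) ** 2 + (1.0 - y) ** 2) / 2.0)


def sim_or(x: float, y: float) -> float:
    """Normalized Euclidean distance from (x, y) to the point (0, 0)."""
    return math.sqrt((x ** 2 + y ** 2) / 2.0)


# Corner cases: both measures stay within [0, 1].
print(sim_and(1.0, 1.0), sim_and(0.0, 0.0))
print(sim_or(1.0, 1.0), sim_or(0.0, 0.0))

# A doc that contains only k_x (weight 1) and not k_y (weight 0):
# unlike in the plain Boolean model, the AND query still gives partial credit.
print(sim_and(1.0, 0.0), sim_or(1.0, 0.0))
```

The last line shows the partial matching that motivated the model: a doc with only one of the two terms scores above a doc with neither for the AND query, and below a doc with both for the OR query.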
