
Models for Retrieval and Browsing

Models for Retrieval and Browsing - Fuzzy Set, Extended Boolean, Generalized Vector Space Models. Berlin Chen, 2004. Reference: 1. Modern Information Retrieval, Chapter 2.


  1. Models for Retrieval and Browsing - Fuzzy Set, Extended Boolean, Generalized Vector Space Models
     Berlin Chen, 2004
     Reference: 1. Modern Information Retrieval, Chapter 2

  2. Taxonomy of Classic IR Models
     • User task: Retrieval (Adhoc, Filtering)
       – Classic Models: Boolean, Vector, Probabilistic
       – Set Theoretic: Fuzzy, Extended Boolean
       – Algebraic: Generalized Vector, Latent Semantic Indexing (LSI), Neural Networks
       – Probabilistic: Inference Network, Belief Network, Hidden Markov Model, Probabilistic LSI, Language Model (the last three are labeled probability-based)
       – Structured Models: Non-Overlapping Lists, Proximal Nodes
     • User task: Browsing
       – Browsing: Flat, Structure Guided, Hypertext
     IR 2004 – Berlin Chen

  3. Outline
     • Alternative Set Theoretic Models
       – Fuzzy Set Model (Fuzzy Information Retrieval)
       – Extended Boolean Model
     • Alternative Algebraic Models
       – Generalized Vector Space Model

  4. Fuzzy Set Model
     • Premises
       – Docs and queries are represented through sets of keywords, so the matching between them is vague
         • Keywords cannot completely describe the user's information need or the doc's main theme (aboutness)
         • [Figure: a retrieval model matching two keyword sets, $w_s, w_p, w_q, \ldots$ and $w_i, w_j, w_k, \ldots$; e.g., "President Chen", "North 2nd Freeway" (abbreviated forms) versus "Chen Shui-bian", "Northern Second Freeway" (full forms)]
       – For each query term (keyword)
         • Define a fuzzy set in which each doc has a degree of membership (between 0 and 1)

  5. Fuzzy Set Model (cont.)
     • Fuzzy Set Theory
       – A framework for representing classes (sets) whose boundaries are not well defined
       – The key idea is to introduce the notion of a degree of membership associated with the elements of a set
       – This degree of membership varies from 0 to 1 and allows modeling the notion of marginal membership
         • 0 → no membership
         • 1 → full membership
       – Thus, membership is now gradual instead of abrupt
         • Unlike conventional Boolean logic
     • Here we define a fuzzy set for each query (or index) term, so each doc has a degree of membership in this set.

  6. Fuzzy Set Model (cont.)
     • Definition
       – A fuzzy subset $A$ of a universe of discourse $U$ is characterized by a membership function $\mu_A : U \to [0,1]$
         • It associates with each element $u$ of $U$ a number $\mu_A(u)$ in the interval $[0,1]$
       – Let $A$ and $B$ be two fuzzy subsets of $U$. Also, let $\bar{A}$ be the complement of $A$. Then,
         • Complement: $\mu_{\bar{A}}(u) = 1 - \mu_A(u)$
         • Union: $\mu_{A \cup B}(u) = \max(\mu_A(u), \mu_B(u))$
         • Intersection: $\mu_{A \cap B}(u) = \min(\mu_A(u), \mu_B(u))$
     • [Figure: Venn diagram of $A$ and $B$ inside $U$, with an element $u$]
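The three operations in this definition can be sketched in a few lines of Python. This is only an illustration: the universe, the element names and the membership degrees are made-up values, and storing a fuzzy set as a dictionary is an assumption of this sketch.

```python
# Fuzzy subsets of a small universe U, stored as {element: membership degree}.
# The element names and the degrees are made-up illustrative values.
A = {"u1": 0.2, "u2": 0.7, "u3": 1.0}
B = {"u1": 0.5, "u2": 0.4, "u3": 0.0}


def complement(a):
    """mu_notA(u) = 1 - mu_A(u)."""
    return {u: 1.0 - m for u, m in a.items()}


def union(a, b):
    """mu_{A or B}(u) = max(mu_A(u), mu_B(u)); both sets share the same universe."""
    return {u: max(a[u], b[u]) for u in a}


def intersection(a, b):
    """mu_{A and B}(u) = min(mu_A(u), mu_B(u)); both sets share the same universe."""
    return {u: min(a[u], b[u]) for u in a}


print(union(A, B))
print(intersection(A, B))
print(complement(A))
```

With degrees restricted to 0 and 1, these functions reduce to the ordinary set complement, union and intersection.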

  7. Fuzzy Set Model (cont.)
     • Fuzzy information retrieval: defining term relationships
       – Fuzzy sets are modeled based on a thesaurus
       – This thesaurus can be constructed from a term-term correlation matrix (also called a keyword connection matrix)
         • $\vec{c}$: a term-term correlation matrix
         • $c_{i,l}$: a normalized correlation factor for terms $k_i$ and $k_l$, ranging from 0 to 1
           $c_{i,l} = \frac{n_{i,l}}{n_i + n_l - n_{i,l}}$
           $n_i$: number of docs that contain $k_i$ ($n_l$ likewise for $k_l$)
           $n_{i,l}$: number of docs that contain both $k_i$ and $k_l$
           (the counting unit can be docs, paragraphs, sentences, ...)
         • We now have the notion of proximity among index terms
       – The relationship is symmetric:
         $\mu_{k_i}(k_l) = c_{i,l} = c_{l,i} = \mu_{k_l}(k_i)$
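As a rough sketch of how such a correlation matrix could be built, the Python below counts co-occurrences over a toy collection. The documents and terms are hypothetical, and the function name `correlation` is introduced only for this example.

```python
from itertools import combinations

# Hypothetical toy collection: each doc is the set of index terms it contains.
docs = [
    {"fuzzy", "set", "retrieval"},
    {"fuzzy", "logic"},
    {"retrieval", "model"},
    {"fuzzy", "retrieval"},
]


def correlation(docs):
    """Return c[(k_i, k_l)] = n_il / (n_i + n_l - n_il) for every term pair."""
    terms = sorted(set().union(*docs))
    # n[t]: number of docs that contain term t
    n = {t: sum(t in d for d in docs) for t in terms}
    # A term is fully correlated with itself: n_i / (n_i + n_i - n_i) = 1
    c = {(t, t): 1.0 for t in terms}
    for ki, kl in combinations(terms, 2):
        n_il = sum(ki in d and kl in d for d in docs)
        value = n_il / (n[ki] + n[kl] - n_il)
        c[(ki, kl)] = c[(kl, ki)] = value  # symmetric by construction
    return c


c = correlation(docs)
print(c[("fuzzy", "retrieval")])  # 2 shared docs / (3 + 3 - 2) = 0.5
```

The denominator is the number of docs containing at least one of the two terms, so it is never zero for terms drawn from the collection itself.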

  8. Fuzzy Set Model (cont.)
     • The union and intersection operations are modified here
       – Union: algebraic sum (instead of max)
         • For two sets, with $a = \mu_{A_1}(u)$ and $b = \mu_{A_2}(u)$, summing the three regions of the Venn diagram of $A_1$ and $A_2$ in $U$:
           $\mu_{A_1 \cup A_2}(u) = ab + (1-a)b + a(1-b)$
           $= ab + b - ab + a - ab$
           $= 1 - (1 - a - b + ab)$
           $= 1 - (1-a)(1-b)$
         • In general: $\mu_{A_1 \cup A_2 \cup \cdots \cup A_n}(u) = \mu_{\bigcup_j A_j}(u) = 1 - \prod_{j=1}^{n} \left(1 - \mu_{A_j}(u)\right)$, where $\prod_{j=1}^{n} \left(1 - \mu_{A_j}(u)\right)$ is a negative algebraic product
       – Intersection: algebraic product (instead of min)
         • $\mu_{A_1 \cap A_2}(u) = \mu_{A_1}(u)\,\mu_{A_2}(u)$
         • In general: $\mu_{A_1 \cap A_2 \cap \cdots \cap A_n}(u) = \mu_{\bigcap_j A_j}(u) = \prod_{j=1}^{n} \mu_{A_j}(u)$
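A minimal Python sketch of the two modified operations follows; the function names are chosen for this example and the input degrees are arbitrary. It also checks numerically that the two-set algebraic sum equals $a + b - ab$.

```python
from math import prod


def algebraic_sum(memberships):
    """Union: 1 minus the negative algebraic product of the degrees."""
    return 1.0 - prod(1.0 - m for m in memberships)


def algebraic_product(memberships):
    """Intersection: plain product of the degrees."""
    return prod(memberships)


a, b = 0.5, 0.5  # arbitrary membership degrees of one element u in A1 and A2
print(algebraic_sum([a, b]))      # same value as a + b - a*b
print(algebraic_product([a, b]))
```

Unlike max and min, both results change whenever any single degree changes, which is what lets every term contribute to the final score.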

  9. Fuzzy Set Model (cont.)
     – The degree of membership between a doc $d_j$ and an index term $k_i$ is an algebraic sum (a doc is a union of index terms):
       $\mu_{k_i,d_j} = \mu_{\bigcup_{k_l \in d_j} k_l}(k_i) = 1 - \prod_{k_l \in d_j} \left(1 - \mu_{k_l}(k_i)\right) = 1 - \prod_{k_l \in d_j} \left(1 - c_{i,l}\right)$
       • Computes an algebraic sum over all terms in the doc $d_j$
         – Implemented as the complement of a negative algebraic product
     – [Figure: term $k_i$ linked to the terms $k_a$ and $k_b$ of doc $d_j$ with correlations $c_{i,a}$, $c_{i,b}$ and complements $1 - c_{i,a}$, $1 - c_{i,b}$]
     – A doc $d_j$ belongs to the fuzzy set associated with the term $k_i$ if its own terms are related to $k_i$
       • If there is at least one index term $k_l$ of $d_j$ which is strongly related to the index term $k_i$ ($c_{i,l} \sim 1$), then $\mu_{k_i,d_j} \sim 1$
         – $k_i$ is a good fuzzy index for doc $d_j$
         – And vice versa
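The membership formula on this slide can be sketched as below. The correlation values and term names are invented for illustration, and `doc_membership` is a hypothetical helper name.

```python
from math import prod

# Hypothetical correlation factors: c[k_i][k_l] for a tiny vocabulary.
c = {
    "fuzzy": {"fuzzy": 1.0, "vague": 0.5, "engine": 0.0},
    "vague": {"fuzzy": 0.5, "vague": 1.0, "engine": 0.0},
    "engine": {"fuzzy": 0.0, "vague": 0.0, "engine": 1.0},
}


def doc_membership(term, doc_terms, c):
    """mu_{k_i, d_j} = 1 - product over k_l in d_j of (1 - c_{i,l})."""
    return 1.0 - prod(1.0 - c[term][kl] for kl in doc_terms)


# The doc never mentions "fuzzy", yet it gets a non-zero degree through "vague".
print(doc_membership("fuzzy", ["vague", "engine"], c))  # 1 - (0.5 * 1.0) = 0.5
# A doc that contains the term itself has c = 1 for it, so the degree is 1.
print(doc_membership("fuzzy", ["fuzzy", "engine"], c))  # 1 - (0.0 * 1.0) = 1.0
```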

  10. Fuzzy Set Model (cont.)
      • Example:
        – Query $q = k_a \wedge (k_b \vee \neg k_c)$
        – Disjunctive normal form, where each $cc_i$ is a conjunctive component:
          $q_{dnf} = (k_a \wedge k_b \wedge k_c) \vee (k_a \wedge k_b \wedge \neg k_c) \vee (k_a \wedge \neg k_b \wedge \neg k_c) = cc_1 + cc_2 + cc_3$
        – $D_a$ is the fuzzy set of docs associated with the term $k_a$
        – Degree of membership?
      • [Figure: Venn diagram of $D_a$, $D_b$, $D_c$ with the regions $cc_1$, $cc_2$, $cc_3$ marked]
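One way to see where the three conjunctive components come from is to enumerate the truth table of the query, as in this small Python sketch. The enumeration approach is an illustration chosen here, not something the slides prescribe.

```python
from itertools import product


def q(ka, kb, kc):
    """The example query: k_a AND (k_b OR NOT k_c)."""
    return bool(ka and (kb or not kc))


# Every (k_a, k_b, k_c) assignment that satisfies q is one conjunctive component.
components = [bits for bits in product([1, 0], repeat=3) if q(*bits)]
print(components)  # [(1, 1, 1), (1, 1, 0), (1, 0, 0)]
```

The three tuples correspond to $cc_1$, $cc_2$ and $cc_3$ in that order, with 0 marking a negated term.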

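  11. Fuzzy Set Model (cont.)
      • Degree of membership for a doc $d_j$ in the fuzzy answer set $D_q$
        – Algebraic sum over the conjunctive components, computed through a negative algebraic product:
          $\mu_{q,d_j} = \mu_{cc_1 \cup cc_2 \cup cc_3,\, d_j} = 1 - \prod_{i=1}^{3} \left(1 - \mu_{cc_i,d_j}\right)$
          $= 1 - \left(1 - \mu_{a \cap b \cap c,\, d_j}\right)\left(1 - \mu_{a \cap b \cap \bar{c},\, d_j}\right)\left(1 - \mu_{a \cap \bar{b} \cap \bar{c},\, d_j}\right)$
        – Algebraic product inside each conjunctive component:
          $= 1 - \left(1 - \mu_{a,d_j}\,\mu_{b,d_j}\,\mu_{c,d_j}\right) \times \left(1 - \mu_{a,d_j}\,\mu_{b,d_j}\,(1 - \mu_{c,d_j})\right) \times \left(1 - \mu_{a,d_j}\,(1 - \mu_{b,d_j})\,(1 - \mu_{c,d_j})\right)$
      • [Figure: Venn diagram of $D_a$, $D_b$, $D_c$ with the regions $cc_1$, $cc_2$, $cc_3$ marked]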
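The final expression can be evaluated directly once the three per-term degrees of a doc are known. The sketch below hard-codes the example query $q = k_a \wedge (k_b \vee \neg k_c)$; the function name and the input values are assumptions for illustration.

```python
def query_membership(mu_a, mu_b, mu_c):
    """Degree of membership of a doc in the answer set of q = ka AND (kb OR NOT kc)."""
    # Algebraic product inside each conjunctive component
    cc1 = mu_a * mu_b * mu_c
    cc2 = mu_a * mu_b * (1 - mu_c)
    cc3 = mu_a * (1 - mu_b) * (1 - mu_c)
    # Algebraic sum across the components
    return 1 - (1 - cc1) * (1 - cc2) * (1 - cc3)


print(query_membership(1, 0, 0))        # crisp doc matching cc3 -> 1
print(query_membership(1, 0, 1))        # crisp doc matching no component -> 0
print(query_membership(0.5, 0.5, 0.5))  # partially matching doc, value in (0, 1)
```

With crisp 0/1 degrees the result agrees with ordinary Boolean evaluation of the query; intermediate degrees give a graded score that can be used for ranking.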

  12. Fuzzy Set Model (cont.)
      • Advantages
        – The correlations among index terms are considered
        – A degree of relevance between queries and docs can be obtained
      • Disadvantages
        – Fuzzy IR models have been discussed mainly in the literature associated with fuzzy theory
        – Experiments with standard test collections are not available

  13. Extended Boolean Model (Salton et al., 1983)
      • Motive
        – Extend the Boolean model with the functionality of partial matching and term weighting
          • E.g.: in the Boolean model, for the query $q = k_x \wedge k_y$ (e.g., "Chen Shui-bian AND Lu Hsiu-lien"), a doc that contains either $k_x$ or $k_y$ is as irrelevant as another doc that contains neither of them
          • How about the disjunctive query $q = k_x \vee k_y$ (e.g., "Chen Shui-bian OR Lu Hsiu-lien")?
        – Combine Boolean query formulations with characteristics of the vector model
          • Term weighting → a ranking can be obtained
          • Algebraic distances for similarity measures

  14. Extended Boolean Model (cont.)
      • Term weighting
        – The weight for the term $k_x$ in a doc $d_j$ is
          $w_{x,j} = tf_{x,j} \times \frac{idf_x}{\max_i idf_i}$
          where $tf_{x,j}$ is the normalized frequency and $\frac{idf_x}{\max_i idf_i}$ is the normalized idf, each ranging from 0 to 1
          • $w_{x,j}$ is normalized to lie between 0 and 1
      • Assume two index terms $k_x$ and $k_y$ are used
        – Let $x$ denote the weight $w_{x,j}$ of term $k_x$ on doc $d_j$
        – Let $y$ denote the weight $w_{y,j}$ of term $k_y$ on doc $d_j$
        – The doc vector $\vec{d}_j = (w_{x,j}, w_{y,j})$ is represented as $d_j = (x, y)$
        – Queries and docs can be plotted in a two-dimensional map
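The weighting formula can be sketched as follows. The slide does not spell out how tf and idf are computed, so this example assumes the common choices tf = raw frequency divided by the largest frequency in the doc and idf = log(N / n_i); the function name, the term names and all counts are hypothetical.

```python
from math import log


def weights(freqs, doc_freq, n_docs):
    """Return {term: w} with w = normalized tf * (idf / max idf).

    freqs:    {term: raw frequency in this doc}
    doc_freq: {term: number of docs containing the term}, for the whole vocabulary
    Assumes tf = freq / max freq in the doc and idf = log(N / n_i),
    and that at least one term has idf > 0.
    """
    idf = {t: log(n_docs / df) for t, df in doc_freq.items()}
    max_idf = max(idf.values())
    max_freq = max(freqs.values())
    return {t: (f / max_freq) * (idf[t] / max_idf) for t, f in freqs.items()}


# Hypothetical collection of 8 docs and a doc containing kx twice and ky four times.
w = weights({"kx": 2, "ky": 4}, {"kx": 2, "ky": 4, "kz": 8}, n_docs=8)
print(w)  # both weights come out near 0.5: kx is rarer, ky is more frequent
```

Because both factors lie in [0, 1], so does every weight, which is what allows docs to be plotted inside the unit square.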

  15. Extended Boolean Model (cont.)
      • If the query is $q = k_x \wedge k_y$ (conjunctive query)
        – The docs near the point (1,1) are preferred
        – The similarity measure is defined as (2-norm model, Euclidean distance)
          $sim(q_{and}, d) = 1 - \sqrt{\frac{(1-x)^2 + (1-y)^2}{2}}$
      • [Figure: docs $d_j$ and $d_{j+1}$ plotted in the unit square with axes $k_x$ ($x = w_{x,j}$) and $k_y$ ($y = w_{y,j}$); the AND region is centered on the point (1,1)]

  16. Extended Boolean Model (cont.)
      • If the query is $q = k_x \vee k_y$ (disjunctive query)
        – The docs far from the point (0,0) are preferred
        – The similarity measure is defined as (2-norm model, Euclidean distance)
          $sim(q_{or}, d) = \sqrt{\frac{x^2 + y^2}{2}}$
      • [Figure: docs $d_j$ and $d_{j+1}$ plotted in the unit square with axes $k_x$ ($x = w_{x,j}$) and $k_y$ ($y = w_{y,j}$); the OR region is centered on the point (0,0)]

  17. Extended Boolean Model (cont.)
      • The similarity measures $sim(q_{or}, d)$ and $sim(q_{and}, d)$ also lie between 0 and 1
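As a minimal sketch of the two 2-norm similarity measures for a two-term query, the Python below takes the doc weights $x = w_{x,j}$ and $y = w_{y,j}$, both assumed to lie in [0, 1]. The function names are chosen for this example.

```python
from math import sqrt


def sim_or(x, y):
    """Disjunctive query: normalized distance from the point (0, 0)."""
    return sqrt((x * x + y * y) / 2)


def sim_and(x, y):
    """Conjunctive query: 1 minus the normalized distance from the point (1, 1)."""
    return 1 - sqrt(((1 - x) ** 2 + (1 - y) ** 2) / 2)


# A doc that contains only k_x (x = 1, y = 0) is partially relevant to both
# queries, unlike in the plain Boolean model.
print(sim_and(1, 0))  # about 0.29
print(sim_or(1, 0))   # about 0.71
```

Dividing by 2 before taking the square root normalizes by the diagonal of the unit square, which keeps both measures between 0 and 1.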
