

SLIDE 1

Models for Retrieval and Browsing

  • Fuzzy Set, Extended Boolean, Generalized Vector Space Models

Berlin Chen 2004

Reference:

  • 1. Modern Information Retrieval. Chapter 2
SLIDE 2

IR 2004 – Berlin Chen 2

Taxonomy of Classic IR Models

  • User Task
– Retrieval: Ad hoc, Filtering
– Browsing
  • Classic Models: Boolean, Vector, Probabilistic
  • Set Theoretic: Fuzzy, Extended Boolean
  • Algebraic: Generalized Vector, Latent Semantic Indexing (LSI), Neural Networks
  • Probabilistic (probability-based): Inference Network, Belief Network, Hidden Markov Model, Probabilistic LSI, Language Model
  • Structured Models: Non-Overlapping Lists, Proximal Nodes
  • Browsing: Flat, Structure Guided, Hypertext

SLIDE 3

Outline

  • Alternative Set Theoretic Models

– Fuzzy Set Model (Fuzzy Information Retrieval)
– Extended Boolean Model

  • Alternative Algebraic Models

– Generalized Vector Space Model

SLIDE 4

Fuzzy Set Model

  • Premises

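– Docs and queries are represented through sets of keywords, so the matching between them is vague
    • Keywords cannot completely describe the user's information need or the doc's main theme
– For each query term (keyword)
    • Define a fuzzy set in which each doc has a degree of membership (0 to 1)
    • The degree of membership expresses the doc's "aboutness" with respect to the term

[Figure: a retrieval model matching query keywords (wi, wj, wk, ...) against doc keywords (ws, wp, wq, ...). Example of vague matching: the abbreviated keywords "President Chen" and "N. 2nd Freeway" versus the full forms "Chen Shui-bian" and "Northern Second Freeway"]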

SLIDE 5

Fuzzy Set Model (cont.)

  • Fuzzy Set Theory

– Framework for representing classes (sets) whose boundaries are not well defined
– Key idea is to introduce the notion of a degree of membership associated with the elements of a set
– This degree of membership varies from 0 to 1 and allows modeling the notion of marginal membership
    • 0 → no membership
    • 1 → full membership
– Thus, membership is now a gradual notion instead of an abrupt one
    • Unlike conventional Boolean logic

Here we will define a fuzzy set for each query (or index) term, thus each doc has a degree of membership in this set.

SLIDE 6

Fuzzy Set Model (cont.)

  • Definition

– A fuzzy subset $A$ of a universe of discourse $U$ is characterized by a membership function $\mu_A: U \to [0,1]$
    • Which associates with each element $u$ of $U$ a number $\mu_A(u)$ in the interval $[0,1]$
– Let $A$ and $B$ be two fuzzy subsets of $U$. Also, let $\bar{A}$ be the complement of $A$. Then,
    • Complement: $\mu_{\bar{A}}(u) = 1 - \mu_A(u)$
    • Union: $\mu_{A \cup B}(u) = \max(\mu_A(u), \mu_B(u))$
    • Intersection: $\mu_{A \cap B}(u) = \min(\mu_A(u), \mu_B(u))$

[Figure: two overlapping fuzzy subsets $A$ and $B$ of $U$, with an element $u$ in $A \cap B$]
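The three operations above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample membership values are assumptions, not part of the original slides.

```python
def fuzzy_complement(mu_a: float) -> float:
    """Membership of u in the complement of A."""
    return 1.0 - mu_a


def fuzzy_union(mu_a: float, mu_b: float) -> float:
    """Membership of u in A union B (max rule)."""
    return max(mu_a, mu_b)


def fuzzy_intersection(mu_a: float, mu_b: float) -> float:
    """Membership of u in A intersect B (min rule)."""
    return min(mu_a, mu_b)


# Hypothetical membership degrees of one element u in A and in B.
mu_a, mu_b = 0.7, 0.4
print(fuzzy_complement(mu_a), fuzzy_union(mu_a, mu_b), fuzzy_intersection(mu_a, mu_b))
```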

SLIDE 7

Fuzzy Set Model (cont.)

  • Fuzzy information retrieval

– Fuzzy sets are modeled based on a thesaurus
– This thesaurus can be constructed from a term-term correlation matrix (also called a keyword connection matrix)
    • $\vec{c}$: a term-term correlation matrix
    • $c_{i,l}$: a normalized correlation factor for terms $k_i$ and $k_l$, ranging from 0 to 1

$$c_{i,l} = \frac{n_{i,l}}{n_i + n_l - n_{i,l}}$$

    • $n_i$: number of docs that contain $k_i$
    • $n_{i,l}$: number of docs that contain both $k_i$ and $k_l$ (co-occurrence may also be counted over paragraphs, sentences, ...)
    • We now have the notion of proximity among index terms
– The relationship is symmetric, which defines the term relationship

$$\mu_{k_i}(k_l) = c_{i,l} = c_{l,i} = \mu_{k_l}(k_i)$$
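As a rough sketch of how the correlation factors $c_{i,l}$ might be computed from doc-level co-occurrence counts: the function name `correlation_matrix` and the toy collection below are hypothetical, chosen only to illustrate the formula.

```python
from itertools import combinations


def correlation_matrix(docs):
    """docs: non-empty list of sets of terms.
    Returns a dict mapping (k_i, k_l) to c_il = n_il / (n_i + n_l - n_il)."""
    terms = sorted(set().union(*docs))
    # n_i: number of docs that contain term k_i
    n = {k: sum(1 for d in docs if k in d) for k in terms}
    c = {(k, k): 1.0 for k in terms}  # a term is fully correlated with itself
    for ki, kl in combinations(terms, 2):
        n_il = sum(1 for d in docs if ki in d and kl in d)
        value = n_il / (n[ki] + n[kl] - n_il)
        c[(ki, kl)] = c[(kl, ki)] = value  # symmetric relationship
    return c


# Toy collection (an assumption for illustration only).
docs = [{"fuzzy", "set"}, {"fuzzy", "logic"}, {"set", "theory"}]
c = correlation_matrix(docs)
print(c[("fuzzy", "set")])
```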

SLIDE 8

Fuzzy Set Model (cont.)

  • The union and intersection operations are modified here

– Union: algebraic sum (instead of max)

$$\mu_{A_1 \cup A_2}(u) = \mu_{A_1}(u) + \mu_{A_2}(u) - \mu_{A_1}(u)\,\mu_{A_2}(u) = 1 - \prod_{j=1}^{2}\bigl(1 - \mu_{A_j}(u)\bigr)$$

$$\mu_{A_1 \cup A_2 \cup \cdots \cup A_n}(u) = 1 - \prod_{j=1}^{n}\bigl(1 - \mu_{A_j}(u)\bigr)$$

    • The product on the right is a negative algebraic product, since

$$a + b - ab = 1 - 1 + a + b - ab = 1 - (1 - a - b + ab) = 1 - (1 - a)(1 - b)$$

– Intersection: algebraic product (instead of min)

$$\mu_{A_1 \cap A_2}(u) = \mu_{A_1}(u)\,\mu_{A_2}(u)$$

$$\mu_{A_1 \cap A_2 \cap \cdots \cap A_n}(u) = \prod_{j=1}^{n} \mu_{A_j}(u)$$

[Figure: two overlapping fuzzy subsets $A_1$ and $A_2$ of $U$ containing an element $u$]
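A minimal sketch of the modified operations, assuming membership degrees are passed as plain floats; the function names are illustrative and not from the slides.

```python
import math


def algebraic_sum(memberships):
    """Fuzzy union as the complement of a negative algebraic product."""
    return 1.0 - math.prod(1.0 - m for m in memberships)


def algebraic_product(memberships):
    """Fuzzy intersection as a plain product of membership degrees."""
    return math.prod(memberships)


a, b = 0.3, 0.6
# For two sets the algebraic sum reduces to a + b - ab.
print(algebraic_sum([a, b]), a + b - a * b)
print(algebraic_product([a, b]))
```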

SLIDE 9

Fuzzy Set Model (cont.)

– The degree of membership between a doc $d_j$ and an index term $k_i$

$$\mu_{k_i,d_j} = \mu_{d_j}(k_i) = \mu_{\bigcup_{k_l \in d_j} k_l}(k_i) = 1 - \prod_{k_l \in d_j}\bigl(1 - \mu_{k_l}(k_i)\bigr) = 1 - \prod_{k_l \in d_j}\bigl(1 - c_{i,l}\bigr)$$

    • Computes an algebraic sum over all terms in the doc $d_j$ (a doc is a union of index terms)
– Implemented as the complement of a negative algebraic product
– A doc $d_j$ belongs to the fuzzy set associated with the term $k_i$ if its own terms are related to $k_i$
    • If there is at least one index term $k_l$ of $d_j$ which is strongly related to the index term $k_i$ ($c_{i,l} \sim 1$), then $\mu_{k_i,d_j} \sim 1$
– $k_i$ is a good fuzzy index for doc $d_j$
– And vice versa

[Figure: a doc with terms $k_a$ and $k_b$ linked to $k_i$ through $c_{i,a}$ and $c_{i,b}$, contributing the factors $1 - c_{i,a}$ and $1 - c_{i,b}$]
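The membership formula can be illustrated with a short sketch. The dictionary layout for the correlation factors and the sample values are assumptions made for this example.

```python
import math


def doc_membership(term_i, doc_terms, c):
    """mu_{k_i,d_j} = 1 - prod over k_l in d_j of (1 - c[i,l]).
    c maps (k_i, k_l) pairs to correlation factors; missing pairs count as 0."""
    return 1.0 - math.prod(1.0 - c.get((term_i, kl), 0.0) for kl in doc_terms)


# Hypothetical correlation factors between k_i and the two terms of a doc.
c = {("ki", "ka"): 0.9, ("ki", "kb"): 0.2}
mu = doc_membership("ki", ["ka", "kb"], c)
print(mu)  # 1 - (1 - 0.9) * (1 - 0.2)
```

One strongly related term (here `ka`) is enough to push the membership close to 1, which matches the remark on the slide.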

SLIDE 10

Fuzzy Set Model (cont.)

  • Example:

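– Query $q = k_a \wedge (k_b \vee \neg k_c)$
– Disjunctive normal form: $q_{dnf} = (k_a \wedge k_b \wedge k_c) \vee (k_a \wedge k_b \wedge \neg k_c) \vee (k_a \wedge \neg k_b \wedge \neg k_c) = cc_1 + cc_2 + cc_3$, where each $cc_i$ is a conjunctive component
– $D_a$ is the fuzzy set of docs associated with the term $k_a$
– Degree of membership?

[Figure: three overlapping fuzzy sets $D_a$, $D_b$, $D_c$ with the regions $cc_1$, $cc_2$, $cc_3$ marked]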

SLIDE 11

Fuzzy Set Model (cont.)

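  • Degree of membership for a doc $d_j$ in the fuzzy answer set $D_q$

$$\mu_{q,d_j} = \mu_{cc_1 \cup cc_2 \cup cc_3,\,d_j} = 1 - \prod_{i=1}^{3}\bigl(1 - \mu_{cc_i,d_j}\bigr)$$

$$= 1 - \bigl(1 - \mu_{a \cap b \cap c,\,d_j}\bigr)\bigl(1 - \mu_{a \cap b \cap \neg c,\,d_j}\bigr)\bigl(1 - \mu_{a \cap \neg b \cap \neg c,\,d_j}\bigr)$$

$$= 1 - \bigl(1 - \mu_{a,d_j}\mu_{b,d_j}\mu_{c,d_j}\bigr) \times \bigl(1 - \mu_{a,d_j}\mu_{b,d_j}(1 - \mu_{c,d_j})\bigr) \times \bigl(1 - \mu_{a,d_j}(1 - \mu_{b,d_j})(1 - \mu_{c,d_j})\bigr)$$

– The union of $cc_1$, $cc_2$, $cc_3$ is an algebraic sum, computed as the complement of a negative algebraic product
– Each conjunctive component is an algebraic product

[Figure: three overlapping fuzzy sets $D_a$, $D_b$, $D_c$ with the regions $cc_1$, $cc_2$, $cc_3$ marked]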
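The degree of membership for this example query can be evaluated directly. The sketch below hard-codes the three conjunctive components of $q = k_a \wedge (k_b \vee \neg k_c)$; the function name and the input values are illustrative assumptions.

```python
def query_membership(mu_a: float, mu_b: float, mu_c: float) -> float:
    """Membership of a doc in the fuzzy answer set of q = ka AND (kb OR NOT kc)."""
    cc1 = mu_a * mu_b * mu_c                # ka AND kb AND kc
    cc2 = mu_a * mu_b * (1 - mu_c)          # ka AND kb AND NOT kc
    cc3 = mu_a * (1 - mu_b) * (1 - mu_c)    # ka AND NOT kb AND NOT kc
    # Algebraic sum of the three components.
    return 1 - (1 - cc1) * (1 - cc2) * (1 - cc3)


print(query_membership(0.5, 0.5, 0.5))
# With crisp 0/1 memberships the result agrees with the Boolean query.
print(query_membership(1, 0, 0), query_membership(1, 0, 1))
```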

SLIDE 12

Fuzzy Set Model (cont.)

  • Advantages

– The correlations among index terms are considered
– A degree of relevance between queries and docs can be obtained

  • Disadvantages

– Fuzzy IR models have been discussed mainly in the literature associated with fuzzy theory
– Experiments with standard test collections are not available

SLIDE 13

Extended Boolean Model

  • Motive

– Extend the Boolean model with the functionality of partial matching and term weighting
    • E.g.: in the Boolean model, for the query $q = k_x \wedge k_y$ (e.g., "Chen Shui-bian AND Lu Hsiu-lien"), a doc that contains only one of $k_x$ and $k_y$ is as irrelevant as a doc that contains neither of them
    • How about the disjunctive query $q = k_x \vee k_y$ (e.g., "Chen Shui-bian OR Lu Hsiu-lien")?
– Combine Boolean query formulations with characteristics of the vector model
    • Term weighting
    • Algebraic distances for similarity measures (a ranking can be obtained)

Salton et al., 1983

SLIDE 14

Extended Boolean Model (cont.)

  • Term weighting

– The weight for the term $k_x$ in a doc $d_j$ is

$$w_{x,j} = tf_{x,j} \times \frac{idf_x}{\max_i idf_i}$$

    • $tf_{x,j}$ is the normalized frequency and $idf_x / \max_i idf_i$ is the normalized idf
    • $w_{x,j}$ is normalized to lie between 0 and 1
  • Assume two index terms $k_x$ and $k_y$ were used
– Let $x = w_{x,j}$ denote the weight of term $k_x$ on doc $d_j$
– Let $y = w_{y,j}$ denote the weight of term $k_y$ on doc $d_j$
– The doc vector $\vec{d}_j = (w_{x,j}, w_{y,j})$ is represented as $d_j = (x, y)$
– Queries and docs can be plotted in a two-dimensional map
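A small sketch of the weighting formula. The slides do not define idf, so the common form $\log(N / n_x)$ is assumed here; the collection size, doc frequencies, and normalized term frequencies are made-up values for illustration.

```python
import math


def idf(n_docs: int, doc_freq: int) -> float:
    """Assumed idf definition: log(N / n_x). Not specified in the slides."""
    return math.log(n_docs / doc_freq)


def weight(tf_norm: float, idf_x: float, max_idf: float) -> float:
    """w_{x,j} = tf_{x,j} * idf_x / max_i idf_i, with tf already normalized to [0, 1]."""
    return tf_norm * idf_x / max_idf


N = 8                                        # hypothetical collection size
idfs = {"kx": idf(N, 2), "ky": idf(N, 4)}    # hypothetical doc frequencies
max_idf = max(idfs.values())

w_x = weight(0.5, idfs["kx"], max_idf)       # kx has the largest idf
w_y = weight(1.0, idfs["ky"], max_idf)
print(w_x, w_y)
```

Because both factors lie between 0 and 1, the resulting weight does too, so the doc can be placed in the unit square.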

SLIDE 15

Extended Boolean Model (cont.)

  • If the query is q=kx ∧ ky (conjunctive query)
  • The docs near the point (1,1) are preferred
  • The similarity measure is defined as

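$$sim(q_{and}, d) = 1 - \sqrt{\frac{(1 - x)^2 + (1 - y)^2}{2}}$$

– 2-norm model (Euclidean distance), with $x = w_{x,j}$ and $y = w_{y,j}$ for $\vec{d}_j = (w_{x,j}, w_{y,j})$

[Figure: docs $d_j$ and $d_{j+1}$ plotted in the unit square between (0,0) and (1,1), with axes $k_x$ and $k_y$, for the AND query]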

SLIDE 16

Extended Boolean Model (cont.)

  • If the query is q=kx ∨ ky (disjunctive query)
  • The docs far from the point (0,0) are preferred
  • The similarity measure is defined as

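$$sim(q_{or}, d) = \sqrt{\frac{x^2 + y^2}{2}}$$

– 2-norm model (Euclidean distance), with $x = w_{x,j}$ and $y = w_{y,j}$

[Figure: docs $d_j$ and $d_{j+1}$ plotted in the unit square between (0,0) and (1,1), with axes $k_x$ and $k_y$, for the OR query]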

SLIDE 17

Extended Boolean Model (cont.)

  • The similarity measures $sim(q_{or}, d)$ and $sim(q_{and}, d)$ also lie between 0 and 1
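The two 2-norm similarity measures can be checked numerically with a short sketch. The function names are illustrative; `x` and `y` stand for the weights $w_{x,j}$ and $w_{y,j}$.

```python
import math


def sim_and(x: float, y: float) -> float:
    """Conjunctive query: 1 minus the normalized distance from the point (1, 1)."""
    return 1.0 - math.sqrt(((1 - x) ** 2 + (1 - y) ** 2) / 2)


def sim_or(x: float, y: float) -> float:
    """Disjunctive query: normalized distance from the point (0, 0)."""
    return math.sqrt((x ** 2 + y ** 2) / 2)


# A doc containing only kx is no longer treated like a doc containing neither term.
print(sim_and(1.0, 0.0), sim_and(0.0, 0.0))
print(sim_or(1.0, 0.0), sim_or(1.0, 1.0))
```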

SLIDE 18

Extended Boolean Model (cont.)

  • Generalization

– t index terms are used → t-dimensional space
– p-norm model, $1 \le p \le \infty$

$$q_{and} = k_1 \wedge^p k_2 \wedge^p \cdots \wedge^p k_m \qquad q_{or} = k_1 \vee^p k_2 \vee^p \cdots \vee^p k_m$$

$$sim(q_{and}, d) = 1 - \left(\frac{(1 - x_1)^p + (1 - x_2)^p + \cdots + (1 - x_m)^p}{m}\right)^{1/p}$$

$$sim(q_{or}, d) = \left(\frac{x_1^p + x_2^p + \cdots + x_m^p}{m}\right)^{1/p}$$

– Some interesting properties
    • $p = 1$: $sim(q_{or}, d) = sim(q_{and}, d) = \dfrac{x_1 + x_2 + \cdots + x_m}{m}$
    • $p = \infty$: $sim(q_{or}, d) \approx \max(x_i)$ and $sim(q_{and}, d) \approx \min(x_i)$, just like the formulas of fuzzy logic
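A minimal sketch of the p-norm similarities and their two limiting cases. The function names are assumptions; `p = math.inf` is handled as the max/min limit rather than by evaluating the formula.

```python
import math


def sim_or_p(xs, p):
    """p-norm similarity for a disjunctive query over weights xs."""
    if math.isinf(p):
        return max(xs)
    return (sum(x ** p for x in xs) / len(xs)) ** (1 / p)


def sim_and_p(xs, p):
    """p-norm similarity for a conjunctive query over weights xs."""
    if math.isinf(p):
        return min(xs)
    return 1.0 - (sum((1 - x) ** p for x in xs) / len(xs)) ** (1 / p)


xs = [0.2, 0.8]
print(sim_or_p(xs, 1), sim_and_p(xs, 1))        # both reduce to the mean
print(sim_or_p(xs, 100), sim_and_p(xs, 100))    # approach max and min
```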

SLIDE 19

Extended Boolean Model (cont.)

  • Example query 1: $q = (k_1 \wedge^p k_2) \vee^p k_3$
– Processed by grouping the operators in a predefined order

$$sim(q, d) = \left(\frac{\left(1 - \left(\dfrac{(1 - x_1)^p + (1 - x_2)^p}{2}\right)^{1/p}\right)^p + x_3^p}{2}\right)^{1/p}$$

  • Example query 2: $q = (k_1 \vee^2 k_2) \wedge^\infty k_3$
– Combination of different algebraic distances

$$sim(q, d) = \min\left(\left(\frac{x_1^2 + x_2^2}{2}\right)^{1/2},\; x_3\right)$$

SLIDE 20

Extended Boolean Model (cont.)

  • Advantages

– A hybrid model including properties of both the set theoretic models and the algebraic models
    • Relax the Boolean algebra by interpreting Boolean operations in terms of algebraic distances
  • Disadvantages
– Distributive operation does not hold for ranking computation
    • E.g.: $q_1 = (k_1 \wedge^2 k_2) \vee^2 k_3$ and $q_2 = (k_1 \vee^2 k_3) \wedge^2 (k_2 \vee^2 k_3)$ give $sim(q_1, d) \neq sim(q_2, d)$

$$sim(q_1, d) = \left(\frac{\left(1 - \left(\dfrac{(1 - x_1)^2 + (1 - x_2)^2}{2}\right)^{1/2}\right)^2 + x_3^2}{2}\right)^{1/2}$$

$$sim(q_2, d) = 1 - \left(\frac{\left(1 - \left(\dfrac{x_1^2 + x_3^2}{2}\right)^{1/2}\right)^2 + \left(1 - \left(\dfrac{x_2^2 + x_3^2}{2}\right)^{1/2}\right)^2}{2}\right)^{1/2}$$

– Assumes mutual independence of index terms
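The failure of distributivity can be seen on a single numeric case. The sketch below evaluates the two logically equivalent queries with the 2-norm operators; the helper names and the weights $(x_1, x_2, x_3) = (1, 0, 0)$ are chosen only for illustration.

```python
import math


def p_or(xs, p=2):
    """p-norm OR over a list of weights or sub-query similarities."""
    return (sum(x ** p for x in xs) / len(xs)) ** (1 / p)


def p_and(xs, p=2):
    """p-norm AND over a list of weights or sub-query similarities."""
    return 1.0 - (sum((1 - x) ** p for x in xs) / len(xs)) ** (1 / p)


def sim_q1(x1, x2, x3):
    """q1 = (k1 AND k2) OR k3"""
    return p_or([p_and([x1, x2]), x3])


def sim_q2(x1, x2, x3):
    """q2 = (k1 OR k3) AND (k2 OR k3), the distributed form of q1"""
    return p_and([p_or([x1, x3]), p_or([x2, x3])])


s1 = sim_q1(1.0, 0.0, 0.0)
s2 = sim_q2(1.0, 0.0, 0.0)
print(s1, s2)  # the two values differ although q1 and q2 are Boolean-equivalent
```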

SLIDE 21

Generalized Vector Model

  • Premise

– Classic models enforce independence of index terms
– For the Vector model
    • The set of term vectors $\{\vec{k}_1, \vec{k}_2, \ldots, \vec{k}_t\}$ is linearly independent and forms a basis for the subspace of interest
    • Frequently, this means pairwise orthogonality: $\forall i \neq j,\ \vec{k}_i \cdot \vec{k}_j = 0$ (in a more restrictive sense)
  • Wong et al. (1985) proposed an interpretation
– The index term vectors are linearly independent, but not pairwise orthogonal
    • Generalized Vector Model

SLIDE 22

Generalized Vector Model (cont.)

  • Key idea

– The index term vectors that form the basis of the space are not orthogonal and are represented in terms of smaller components (minterms)
  • Notations
– $\{k_1, k_2, \ldots, k_t\}$: the set of all terms
– $w_{i,j}$: the weight associated with $[k_i, d_j]$
– Minterms: binary indicators (0 or 1) of all patterns of occurrence of terms within documents
    • Each represents one kind of co-occurrence of index terms in a specific document

SLIDE 23

Generalized Vector Model (cont.)

  • Representations of minterms

– $2^t$ minterms
    • $m_1 = (0,0,\ldots,0)$
    • $m_2 = (1,0,\ldots,0)$
    • $m_3 = (0,1,\ldots,0)$
    • $m_4 = (1,1,\ldots,0)$: points to the docs where only index terms $k_1$ and $k_2$ co-occur and the other index terms do not appear
    • $m_5 = (0,0,1,\ldots,0)$
    • ...
    • $m_{2^t} = (1,1,1,\ldots,1)$: points to the docs containing all the index terms
– $2^t$ minterm vectors
    • $\vec{m}_1 = (1,0,0,0,0,\ldots,0)$
    • $\vec{m}_2 = (0,1,0,0,0,\ldots,0)$
    • $\vec{m}_3 = (0,0,1,0,0,\ldots,0)$
    • $\vec{m}_4 = (0,0,0,1,0,\ldots,0)$
    • $\vec{m}_5 = (0,0,0,0,1,\ldots,0)$
    • ...
    • $\vec{m}_{2^t} = (0,0,0,0,0,\ldots,1)$
– The pairwise orthogonal vectors $\vec{m}_i$ associated with the minterms $m_i$ serve as the basis for the generalized vector space
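The numbering of minterms above follows a simple binary pattern, which the following sketch makes explicit. The helper names are hypothetical; the ordering (first term as the least significant bit) is inferred from the listing $m_1, \ldots, m_5$.

```python
from itertools import product


def minterm_index(pattern):
    """1-based index r of the minterm m_r for an occurrence pattern (k1, ..., kt)."""
    return 1 + sum(bit << i for i, bit in enumerate(pattern))


def all_minterms(t):
    """All 2^t occurrence patterns, ordered m_1, m_2, ..., m_{2^t}."""
    return [tuple(reversed(bits)) for bits in product((0, 1), repeat=t)]


def minterm_vector(r, t):
    """One-hot basis vector of length 2^t associated with minterm m_r."""
    return tuple(1 if pos == r else 0 for pos in range(1, 2 ** t + 1))


minterms = all_minterms(3)
print(minterms)
print(minterm_index((1, 1, 0)), minterm_vector(4, 3))
```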

SLIDE 24

Generalized Vector Model (cont.)

  • Minterm vectors are pairwise orthogonal, but this does not mean that the index terms are independent
– Each minterm specifies a kind of dependence among index terms
– That is, the co-occurrence of index terms inside docs in the collection induces dependencies among these index terms

SLIDE 25

Generalized Vector Model (cont.)

  • The vector associated with the term $k_i$ is represented by summing up all minterms containing it and normalizing

$$\vec{k}_i = \frac{\sum_{\forall r,\ g_i(m_r) = 1} c_{i,r}\,\vec{m}_r}{\sqrt{\sum_{\forall r,\ g_i(m_r) = 1} c_{i,r}^2}}$$

$$c_{i,r} = \sum_{d_j \,|\, g_l(\vec{d}_j) = g_l(m_r)\ \text{for all}\ l} w_{i,j}$$

– $g_i(m_r)$ indicates whether the index term $k_i$ is in the minterm $m_r$
– The sum for $c_{i,r}$ runs over all the docs whose term co-occurrence pattern exactly coincides with that of the minterm $m_r$
  • The weight associated with the pair $[k_i, m_r]$ sums up the weights of the term $k_i$ in all the docs which have a term occurrence pattern given by $m_r$
  • Notice that for a collection of size $N$, only $N$ minterms affect the ranking (and not $2^N$)

SLIDE 26

Generalized Vector Model (cont.)

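  • The similarity between the query and doc is calculated in the space of minterm vectors

$$\vec{d}_j = \sum_i w_{i,j}\,\vec{k}_i \;\Rightarrow\; \vec{d}_j = \sum_r s_{d_j,r}\,\vec{m}_r$$

$$\vec{q} = \sum_i w_{i,q}\,\vec{k}_i \;\Rightarrow\; \vec{q} = \sum_r s_{q,r}\,\vec{m}_r$$

– t-dimensional (classic vector model)

$$sim(q, d_j) = \frac{\sum_i w_{i,q}\,w_{i,j}}{\sqrt{\sum_i w_{i,q}^2}\,\sqrt{\sum_i w_{i,j}^2}}$$

– $2^t$-dimensional (generalized vector model)

$$sim(q, d_j) = \frac{\sum_r s_{q,r}\,s_{d_j,r}}{\sqrt{\sum_r s_{q,r}^2}\,\sqrt{\sum_r s_{d_j,r}^2}}$$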

SLIDE 27

Generalized Vector Model (cont.)

  • Example (a system with three index terms)

Term weights and minterm of each doc (blank cells in the original are written as 0):

        k1   k2   k3   minterm
  d1     2    0    1   m6
  d2     1    0    0   m2
  d3     0    1    3   m7
  d4     2    0    0   m2
  d5     1    2    4   m8
  d6     1    2    0   m4
  d7     0    5    0   m3
  q      1    2    3

Minterm patterns:

  minterm   k1   k2   k3
  m1         0    0    0
  m2         1    0    0
  m3         0    1    0
  m4         1    1    0
  m5         0    0    1
  m6         1    0    1
  m7         0    1    1
  m8         1    1    1

Term vector for $k_1$:

$$\vec{k}_1 = \frac{c_{1,2}\vec{m}_2 + c_{1,4}\vec{m}_4 + c_{1,6}\vec{m}_6 + c_{1,8}\vec{m}_8}{\sqrt{c_{1,2}^2 + c_{1,4}^2 + c_{1,6}^2 + c_{1,8}^2}}$$

$$c_{1,2} = w_{1,2} + w_{1,4} = 1 + 2 = 3,\quad c_{1,4} = w_{1,6} = 1,\quad c_{1,6} = w_{1,1} = 2,\quad c_{1,8} = w_{1,5} = 1$$

$$\vec{k}_1 = \frac{3\vec{m}_2 + 1\vec{m}_4 + 2\vec{m}_6 + 1\vec{m}_8}{\sqrt{3^2 + 1^2 + 2^2 + 1^2}}$$

Term vector for $k_2$:

$$\vec{k}_2 = \frac{c_{2,3}\vec{m}_3 + c_{2,4}\vec{m}_4 + c_{2,7}\vec{m}_7 + c_{2,8}\vec{m}_8}{\sqrt{c_{2,3}^2 + c_{2,4}^2 + c_{2,7}^2 + c_{2,8}^2}}$$

$$c_{2,3} = w_{2,7} = 5,\quad c_{2,4} = w_{2,6} = 2,\quad c_{2,7} = w_{2,3} = 1,\quad c_{2,8} = w_{2,5} = 2$$

$$\vec{k}_2 = \frac{5\vec{m}_3 + 2\vec{m}_4 + 1\vec{m}_7 + 2\vec{m}_8}{\sqrt{5^2 + 2^2 + 1^2 + 2^2}}$$

Term vector for $k_3$:

$$\vec{k}_3 = \frac{c_{3,5}\vec{m}_5 + c_{3,6}\vec{m}_6 + c_{3,7}\vec{m}_7 + c_{3,8}\vec{m}_8}{\sqrt{c_{3,5}^2 + c_{3,6}^2 + c_{3,7}^2 + c_{3,8}^2}}$$

$$c_{3,5} = 0,\quad c_{3,6} = w_{3,1} = 1,\quad c_{3,7} = w_{3,3} = 3,\quad c_{3,8} = w_{3,5} = 4$$

$$\vec{k}_3 = \frac{0\vec{m}_5 + 1\vec{m}_6 + 3\vec{m}_7 + 4\vec{m}_8}{\sqrt{0^2 + 1^2 + 3^2 + 4^2}}$$

SLIDE 28

Generalized Vector Model (cont.)

  • Example: Ranking

$$\vec{k}_1 = \frac{3\vec{m}_2 + 1\vec{m}_4 + 2\vec{m}_6 + 1\vec{m}_8}{\sqrt{15}} \qquad \vec{k}_2 = \frac{5\vec{m}_3 + 2\vec{m}_4 + 1\vec{m}_7 + 2\vec{m}_8}{\sqrt{34}} \qquad \vec{k}_3 = \frac{1\vec{m}_6 + 3\vec{m}_7 + 4\vec{m}_8}{\sqrt{26}}$$

Doc $d_1$ in the minterm space:

$$\vec{d}_1 = 2\vec{k}_1 + 1\vec{k}_3 = \underbrace{\frac{2 \cdot 3}{\sqrt{15}}}_{s_{d_1,2}}\vec{m}_2 + \underbrace{\frac{2 \cdot 1}{\sqrt{15}}}_{s_{d_1,4}}\vec{m}_4 + \underbrace{\left(\frac{2 \cdot 2}{\sqrt{15}} + \frac{1 \cdot 1}{\sqrt{26}}\right)}_{s_{d_1,6}}\vec{m}_6 + \underbrace{\frac{1 \cdot 3}{\sqrt{26}}}_{s_{d_1,7}}\vec{m}_7 + \underbrace{\left(\frac{2 \cdot 1}{\sqrt{15}} + \frac{1 \cdot 4}{\sqrt{26}}\right)}_{s_{d_1,8}}\vec{m}_8$$

Query $q$ in the minterm space:

$$\vec{q} = 1\vec{k}_1 + 2\vec{k}_2 + 3\vec{k}_3 = \underbrace{\frac{1 \cdot 3}{\sqrt{15}}}_{s_{q,2}}\vec{m}_2 + \underbrace{\frac{2 \cdot 5}{\sqrt{34}}}_{s_{q,3}}\vec{m}_3 + \underbrace{\left(\frac{1 \cdot 1}{\sqrt{15}} + \frac{2 \cdot 2}{\sqrt{34}}\right)}_{s_{q,4}}\vec{m}_4 + \underbrace{\left(\frac{1 \cdot 2}{\sqrt{15}} + \frac{3 \cdot 1}{\sqrt{26}}\right)}_{s_{q,6}}\vec{m}_6 + \underbrace{\left(\frac{2 \cdot 1}{\sqrt{34}} + \frac{3 \cdot 3}{\sqrt{26}}\right)}_{s_{q,7}}\vec{m}_7 + \underbrace{\left(\frac{1 \cdot 1}{\sqrt{15}} + \frac{2 \cdot 2}{\sqrt{34}} + \frac{3 \cdot 4}{\sqrt{26}}\right)}_{s_{q,8}}\vec{m}_8$$

The similarity between the query and doc is calculated in the space of minterm vectors:

$$sim(q, d_1) = cosine(\vec{q}, \vec{d}_1) = \frac{\sum_r s_{q,r}\,s_{d_1,r}}{\sqrt{\sum_r s_{q,r}^2}\,\sqrt{\sum_r s_{d_1,r}^2}}$$

$$= \frac{s_{q,2}s_{d_1,2} + s_{q,4}s_{d_1,4} + s_{q,6}s_{d_1,6} + s_{q,7}s_{d_1,7} + s_{q,8}s_{d_1,8}}{\sqrt{s_{q,2}^2 + s_{q,3}^2 + s_{q,4}^2 + s_{q,6}^2 + s_{q,7}^2 + s_{q,8}^2}\;\sqrt{s_{d_1,2}^2 + s_{d_1,4}^2 + s_{d_1,6}^2 + s_{d_1,7}^2 + s_{d_1,8}^2}}$$
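The whole example can be reproduced with a short script. This is a sketch under stated assumptions: blank table cells are read as weight 0, minterms are indexed from 0 internally (so $m_2$ is position 1), and all variable and function names are illustrative.

```python
import math

# Term weights (k1, k2, k3) for each doc, read from the example table.
docs = {
    "d1": (2, 0, 1), "d2": (1, 0, 0), "d3": (0, 1, 3), "d4": (2, 0, 0),
    "d5": (1, 2, 4), "d6": (1, 2, 0), "d7": (0, 5, 0),
}
query = (1, 2, 3)
T = 3           # number of index terms
M = 2 ** T      # number of minterms


def minterm_position(weights):
    """0-based position of the minterm matching a doc's term occurrence pattern."""
    return sum((1 if w > 0 else 0) << i for i, w in enumerate(weights))


# c[i][r]: sum of the weights of term k_{i+1} over the docs with pattern m_{r+1}.
c = [[0.0] * M for _ in range(T)]
for weights in docs.values():
    r = minterm_position(weights)
    for i, w in enumerate(weights):
        c[i][r] += w

# Normalized term vectors k_i expressed in the minterm basis.
k = []
for i in range(T):
    norm = math.sqrt(sum(v * v for v in c[i]))
    k.append([v / norm for v in c[i]])


def to_minterm_space(weights):
    """Coordinates s_r of a doc or query vector in the minterm basis."""
    return [sum(weights[i] * k[i][r] for i in range(T)) for r in range(M)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


q_vec = to_minterm_space(query)
sim_q_d1 = cosine(q_vec, to_minterm_space(docs["d1"]))
ranking = sorted(docs, key=lambda d: cosine(q_vec, to_minterm_space(docs[d])), reverse=True)
print(sim_q_d1)
print(ranking)
```

Only the minterms that actually occur in the collection receive nonzero coefficients, which is why the cost depends on the number of docs rather than on $2^t$.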

SLIDE 29

Generalized Vector Model (cont.)

  • Term Correlation

– The degree of correlation between the terms $k_i$ and $k_j$ can now be computed as

$$\vec{k}_i \cdot \vec{k}_j = \sum_{\forall r \,|\, g_i(m_r) = 1 \,\wedge\, g_j(m_r) = 1} c_{i,r} \times c_{j,r}$$

    • Does it not need to be normalized? (because we have done it before!)

SLIDE 30

Generalized Vector Model (cont.)

  • Advantages

– The model considers correlations among index terms
– The model does introduce interesting new ideas

  • Disadvantages

– It is not clear in which situations it is superior to the standard vector model
– Computation costs are higher