Information Retrieval Tutorial 4: Vector Space Model

Professor: Michel Schellekens
TA: Ang Gao
University College Cork
2012-11-15

Outline

1. Review

Simple Boolean vs. ranking of the result set

Simple Boolean retrieval returns matching documents in no particular order. Google (and most well-designed Boolean engines) rank the result set: they rank good hits (according to some estimator of relevance) higher than bad hits.

Ranked retrieval

Thus far, our queries have been Boolean:

- Documents either match or don't.
- Good for expert users with a precise understanding of their needs and of the collection.
- Also good for applications: applications can easily consume thousands of results.
- Not good for the majority of users:
  - Most users are not capable of writing Boolean queries...
  - ...or they are, but they think it's too much work.
  - Most users don't want to wade through thousands of results. This is particularly true of web search.

Problem with Boolean search: feast or famine

Boolean queries often produce either too few (= 0) or too many (thousands of) results. In Boolean retrieval, it takes a lot of skill to come up with a query that yields a manageable number of hits: AND gives too few; OR gives too many.

Scoring as the basis of ranked retrieval

We wish to rank documents that are more relevant higher than documents that are less relevant. How can we accomplish such a ranking of the documents in the collection with respect to a query? Assign a score, say in [0, 1], to each query-document pair. This score measures how well the document and the query "match".

Take 1: Jaccard coefficient

A commonly used measure of the overlap of two sets. Let A and B be two sets (A ≠ ∅ or B ≠ ∅). The Jaccard coefficient is

    jaccard(A, B) = |A ∩ B| / |A ∪ B|

- jaccard(A, A) = 1
- jaccard(A, B) = 0 if A ∩ B = ∅
- A and B don't have to be the same size.
- It always assigns a number between 0 and 1.

Jaccard coefficient: Example

Problem 1: What is the query-document match score that the Jaccard coefficient computes for:

- Query: "University College Cork"
- Document: "Cork City Tourism guide"

The two term sets share one term (Cork) out of six distinct terms in total, so jaccard(q, d) = 1/6.
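As a quick sketch (not part of the original tutorial), the Jaccard score over term sets can be computed directly with Python's set operations:

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |A intersect B| / |A union B|, assuming A or B non-empty."""
    return len(a & b) / len(a | b)

# Problem 1 from above: one shared term ("cork") out of six distinct terms.
query = set("university college cork".lower().split())
doc = set("cork city tourism guide".lower().split())
score = jaccard(query, doc)  # -> 1/6
```

Lowercasing here stands in for whatever term normalization the retrieval system applies.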

What's wrong with Jaccard?

- It doesn't consider term frequency (tf): how many occurrences a term has.
- Rare terms are more informative than frequent terms; Jaccard does not consider this information (idf).
- We also need a more sophisticated way of normalizing for the length of a document.

tf-idf weighting

The tf-idf weight of a term is the product of its tf weight and its idf weight:

    w_{t,d} = (1 + log tf_{t,d}) · log(N / df_t)

where the first factor is the tf weight and the second is the idf weight (N is the number of documents in the collection).

- This is the best-known weighting scheme in information retrieval.
- The term frequency tf_{t,d} of term t in document d is defined as the number of times that t occurs in d.
- df_t is the document frequency: the number of documents that t occurs in. df_t is an inverse measure of the informativeness of term t.
- idf_t = log(N / df_t) is a measure of the informativeness of the term.
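The weighting formula can be sketched as a small Python helper. This is illustrative only; it uses natural logarithms, which is what the tutorial's worked example below assumes (the log base is a free choice in general):

```python
import math

def tf_idf_weight(tf: int, df: int, n_docs: int) -> float:
    """w_{t,d} = (1 + log tf_{t,d}) * log(N / df_t), natural log.

    Returns 0.0 when the term does not occur in the document (tf == 0).
    """
    if tf == 0:
        return 0.0
    return (1 + math.log(tf)) * math.log(n_docs / df)
```

Note that a term occurring in every document (df = N) gets weight 0: it carries no discriminating information.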

Computing TF-IDF: An Example

Problem 2: Given a document containing terms with the following frequencies:

    A(3), B(2), C(1)

Assume the collection contains 10,000 documents and the document frequencies of these terms are:

    A(50), B(1300), C(250)

Calculate the tf-idf weight for A, B, C in this document (natural logarithms):

- A: (1 + log(3)) · log(10000/50) = 11.119
- B: (1 + log(2)) · log(10000/1300) = 3.454
- C: (1 + log(1)) · log(10000/250) = 3.689
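The three weights above can be checked with a few lines of Python (natural log, matching the example; the dictionary of (tf, df) pairs is just the problem data restated):

```python
import math

def w(tf: int, df: int, n: int = 10_000) -> float:
    # (1 + ln tf) * ln(N / df), as in Problem 2
    return (1 + math.log(tf)) * math.log(n / df)

weights = {t: w(tf, df) for t, (tf, df) in
           {"A": (3, 50), "B": (2, 1300), "C": (1, 250)}.items()}
```

Observe that C outweighs B despite its lower term frequency: C is the rarer, hence more informative, term.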

Binary incidence matrix

               Anthony &   Julius   The       Hamlet   Othello   Macbeth   ...
               Cleopatra   Caesar   Tempest
    Anthony        1          1        0         0        0         1
    Brutus         1          1        0         1        0         0
    Caesar         1          1        0         1        1         1
    Calpurnia      0          1        0         0        0         0
    Cleopatra      1          0        0         0        0         0
    mercy          1          0        1         1        1         1
    worser         1          0        1         1        1         0
    ...

Each document is represented as a binary vector ∈ {0, 1}^|V|.


Count matrix

               Anthony &   Julius   The       Hamlet   Othello   Macbeth   ...
               Cleopatra   Caesar   Tempest
    Anthony       157        73       0         0        0         1
    Brutus          4       157       0         2        0         0
    Caesar        232       227       0         2        1         0
    Calpurnia       0        10       0         0        0         0
    Cleopatra      57         0       0         0        0         0
    mercy           2         0       3         8        5         8
    worser          2         0       1         1        1         5
    ...

Each document is now represented as a count vector ∈ N^|V|.


Binary → count → weight matrix

               Anthony &   Julius   The       Hamlet   Othello   Macbeth   ...
               Cleopatra   Caesar   Tempest
    Anthony      5.25       3.18     0.0       0.0      0.0       0.35
    Brutus       1.21       6.10     0.0       1.0      0.0       0.0
    Caesar       8.59       2.54     0.0       1.51     0.25      0.0
    Calpurnia    0.0        1.54     0.0       0.0      0.0       0.0
    Cleopatra    2.85       0.0      0.0       0.0      0.0       0.0
    mercy        1.51       0.0      1.90      0.12     5.25      0.88
    worser       1.37       0.0      0.11      4.15     0.25      1.95
    ...

Each document is now represented as a real-valued vector of tf-idf weights ∈ R^|V|.


Summary: Ranked retrieval in the vector space model

- Represent the query as a weighted tf-idf vector.
- Represent each document as a weighted tf-idf vector.
- Compute the cosine similarity between the query vector and each document vector.
  - Euclidean distance is large for vectors of different lengths: long documents and short documents (or queries) would be positioned far apart, even when they are about the same topic.
  - The angle between two semantically identical documents is 0 (cosine similarity 1).
- Rank the documents with respect to the query.
- Return the top K (e.g., K = 10) to the user.

Cosine similarity between query and document

    cos(q, d) = sim(q, d) = (q · d) / (|q| |d|)
              = Σ_{i=1..|V|} q_i d_i / ( sqrt(Σ_{i=1..|V|} q_i²) · sqrt(Σ_{i=1..|V|} d_i²) )

- q_i is the tf-idf weight of term i in the query.
- d_i is the tf-idf weight of term i in the document.
- |q| and |d| are the lengths of q and d.
- This is the cosine similarity of q and d... or, equivalently, the cosine of the angle between q and d.
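The formula translates directly into code. A minimal sketch, assuming dense vectors over a shared vocabulary and non-zero norms:

```python
import math

def cosine_similarity(q: list[float], d: list[float]) -> float:
    """cos(q, d) = (q . d) / (|q| |d|) for two tf-idf vectors
    indexed by the same vocabulary."""
    dot = sum(qi * di for qi, di in zip(q, d))
    norm_q = math.sqrt(sum(qi * qi for qi in q))
    norm_d = math.sqrt(sum(di * di for di in d))
    return dot / (norm_q * norm_d)
```

Because of the normalization by |q| and |d|, scaling a vector by any positive constant leaves the similarity unchanged, which is exactly the length-normalization property motivated above.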

Ranked retrieval in the vector space model: example

Example: Consider these documents:

- Doc1: "Shipment of gold damaged in a fire"
- Doc2: "Delivery of silver arrived in a silver truck"
- Doc3: "Shipment of gold arrived in a truck"

Compute the tf-idf weights for each term in each document, then rank the three documents by computed score for the query "gold silver truck".

First, for each document and the query, we compute all vector lengths (zero terms ignored). Next, we compute all dot products (zero products ignored). Finally, we calculate the similarity values and rank the documents.
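The three steps above can be sketched end to end in Python. This is a minimal illustration, assuming the (1 + log tf) · log(N/df) weighting with natural logarithms from earlier; helper names like tfidf_vector are ours, not the tutorial's:

```python
import math
from collections import Counter

docs = {
    "Doc1": "shipment of gold damaged in a fire",
    "Doc2": "delivery of silver arrived in a silver truck",
    "Doc3": "shipment of gold arrived in a truck",
}
query = "gold silver truck"

N = len(docs)
# Document frequency: in how many documents each term occurs.
df = Counter(t for text in docs.values() for t in set(text.split()))

def tfidf_vector(text: str) -> dict[str, float]:
    # w_{t,d} = (1 + ln tf) * ln(N / df_t); all query terms occur in
    # the collection here, so df[t] is never zero.
    tf = Counter(text.split())
    return {t: (1 + math.log(c)) * math.log(N / df[t]) for t, c in tf.items()}

def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    # Sparse dot product over u's terms; zero products are skipped implicitly.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

qv = tfidf_vector(query)
scores = {name: cosine(qv, tfidf_vector(text)) for name, text in docs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```

Doc2 ranks first: "silver" occurs twice there and nowhere else, so it carries both a high tf weight and a high idf weight.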

Ranked retrieval in the vector space model: exercise

Problem 3: Consider these documents:

- Doc1: a a b e c
- Doc2: b c a c c
- Doc3: e b d

Compute the tf-idf weights for each term in each document, then rank the three documents by computed score for the query "a c d".
