
Ad-hoc IR: Introduction (CS490W: Web Information Search & Management)



  1. CS-490W: Web Information Search & Management
     Basic Concepts of Information Retrieval
     Luo Si, Department of Computer Science, Purdue University

     Ad-hoc IR: Introduction
     Ad-hoc Information Retrieval:
     - Search a collection of documents to find relevant documents that satisfy different information needs (i.e., queries)
     - Example: Web search

     Basic Concepts of IR: Outline
     Basic Concepts of Information Retrieval:
     - Task definition of Ad-hoc IR
       - Terminologies and concepts
       - Overview of retrieval models
     - Text representation
       - Indexing
       - Text preprocessing
     - Evaluation
       - Evaluation methodology
       - Evaluation metrics

     Ad-hoc IR: Introduction
     Ad-hoc Information Retrieval:
     - Search a collection of documents to find relevant documents that satisfy different information needs (i.e., queries)
       (Slide annotations: the collection is "Relatively Stable"; the information needs are marked "Changes".)
     - Queries are created and used dynamically; they change fast
     - "Ad-hoc": "formed or used for specific or immediate problems or needs" (Merriam-Webster's Collegiate Dictionary)
     Ad-hoc IR vs. Filtering
     - Filtering: Queries are stable (e.g., Asian High-Tech) while the collection changes (e.g., news)
     - More on filtering in later lectures

     Ad-hoc IR: Terminologies
     Terminologies:
     - Query
       - Representative data of the user's information need: text (default) and other media
     - Document
       - Data candidate to satisfy the user's information need: text (default) and other media
     - Database | Collection | Corpus
       - A set of documents
     - Corpora
       - A set of databases
       - Valuable corpora from TREC (Text REtrieval Conference)

     Content Based Filtering
     Filtering: Information Needs are Stable
     - The system should make a delivery decision on the fly when a document "arrives"
     [Figure: a user profile ("Asian High-Tech") feeding a Filtering System]
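The contrast between ad-hoc retrieval and filtering drawn on these slides can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the names `tokenize`, `adhoc_search`, and `filter_stream` are hypothetical, and a document "matches" when it shares at least one whitespace-separated word with the query or profile, which is far cruder than any real retrieval model.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return set(text.lower().split())


def adhoc_search(collection, query):
    """Ad-hoc IR: the collection is relatively stable, the query is new each time."""
    terms = tokenize(query)
    return [doc_id for doc_id, text in collection.items()
            if terms & tokenize(text)]


def filter_stream(profile, stream):
    """Filtering: the profile is stable, documents arrive one at a time."""
    terms = tokenize(profile)
    for doc_id, text in stream:
        # The delivery decision is made as soon as the document arrives.
        if terms & tokenize(text):
            yield doc_id


# Hypothetical toy data, not taken from the slides.
collection = {"d1": "asian high-tech exports grow", "d2": "local weather report"}
arrivals = [("n1", "asian markets rally"), ("n2", "sports scores")]

adhoc_hits = adhoc_search(collection, "weather")
delivered = list(filter_stream("asian high-tech", arrivals))
```

In the ad-hoc case the same collection is searched again for every new query; in the filtering case the same profile is applied to every newly arriving document.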

  2. Ad-hoc IR: Basic Process
     [Figure: Information Need -> Representation -> Query; Indexed Objects <- Representation; Query and Indexed Objects feed the Retrieval Model, which produces Retrieved Objects; Evaluation/Feedback]

     Types of Retrieval Models
     - Exact Match (Document Selection)
       - Example: Boolean Retrieval Method
       - Query defines the exact retrieval criterion
       - Relevance is a binary variable; a document is either relevant (i.e., matches the query) or irrelevant (i.e., mismatches)
       - Result is a set of documents
         - Documents are unordered
         - Often in reverse-chronological order (e.g., PubMed)
     [Figure: documents that are an exact match are returned; the rest are ignored]

     Ad-hoc IR: Overview of Retrieval Model
     Retrieval Models (representative system in parentheses):
     - Boolean
     - Vector space
       - Basic vector space (SMART)
       - Extended Boolean
     - Probabilistic models
       - Statistical language models (Lemur)
       - Two-Poisson model (Okapi)
       - Bayesian inference networks (Inquery)
     - Citation/Link analysis models
       - PageRank (Google)
       - Hubs & authorities (Clever)

     Types of Retrieval Models
     - Best Match (Document Ranking)
       - Example: Most probabilistic models
       - Query describes the desired retrieval criterion
       - Degree of relevance is a continuous/integral variable; each document matches the query to some degree
       - Result is a ranked list (top ones match better)
       - Often returns a partial list (e.g., rank threshold)
     [Figure: best matches returned in rank order, with the score and the +/- mark shown on the slide]
       Doc1  0.99  +
       Doc2  0.90  +
       Doc3  0.85  +
       Doc4  0.82  -
       Doc5  0.81  +
       Doc6  0.79  -
       ...

     Ad-hoc IR: Overview of Retrieval Model
     Retrieval Model: determine whether a document is relevant to a query
     - Relevance is difficult to define
       - Varies by judges
       - Varies by context (i.e., jointly by a set of documents and queries)
     - Different retrieval methods estimate relevance differently
       - Word occurrence of document and query
       - In a probabilistic framework, P(query|document) or P(Relevance|query,document)
       - Estimate semantic consistency between query and document

     Types of Retrieval Models
     Exact Match (Selection) vs. Best Match (Ranking)
     - Best Match is usually more accurate/effective
       - Does not need a precise query; a representative query generates good results
       - Users have control to explore the ranked list: view more if they need every piece; view less if they need only the one or two most relevant
     - Exact Match
       - Hard to define the precise query; it is either too strict (terms are too specific) or too coarse (terms are too general)
       - Users have no control over the returned results
       - Still prevalent in some markets (e.g., legal retrieval)
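To make the exact-match versus best-match distinction concrete, here is a minimal Python sketch. It is not one of the models named on the slides: `boolean_and` implements a plain conjunctive Boolean query, and `best_match` ranks by summed raw term frequency, a stand-in chosen only for brevity. The toy collection `DOCS` and both function names are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy collection, not taken from the slides.
DOCS = {
    "doc1": "iran population growth strains economy",
    "doc2": "ipod battery life review",
    "doc3": "iran economy and oil exports economy",
}


def boolean_and(query, docs):
    """Exact match: unordered set of documents containing every query term."""
    terms = query.lower().split()
    return {doc_id for doc_id, text in docs.items()
            if all(term in text.lower().split() for term in terms)}


def best_match(query, docs, top_k=10):
    """Best match: score by summed query-term frequency, return a ranked list."""
    terms = query.lower().split()
    scored = []
    for doc_id, text in docs.items():
        counts = Counter(text.lower().split())
        score = sum(counts[term] for term in terms)
        if score > 0:
            scored.append((doc_id, score))
    # Highest score first; ties broken by document id for a stable order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]  # partial list, cut off at a rank threshold


selected = boolean_and("economy oil", DOCS)
ranked = best_match("economy oil", DOCS)
```

For the query "economy oil", the Boolean selection keeps only doc3, while the ranked list also surfaces doc1 at a lower score, which is the kind of partial match an exact-match system silently drops.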

  3. Ad-hoc IR: Basic Process
     [Figure: Information Need -> Representation -> Query; Indexed Objects <- Representation; Query and Indexed Objects feed the Retrieval Model, which produces Retrieved Objects; Evaluation/Feedback]

     Text Representation: TREC Format
       <DOC>
       <DOCNO> AP900101-0001 </DOCNO>
       <FILEID>AP-NR-01-01-90 2345EDT</FILEID>
       <FIRST>r i PM-Iran-Population Bjt 01-01 0777</FIRST>
       <SECOND>PM-Iran-Population, Bjt,0800</SECOND>
       <HEAD>Iran Moves To Curb A Baby Boom That Threatens Its Economic Future</HEAD>
       <HEAD>An AP Extra</HEAD>
       <BYLINE>By ED BLANCHE</BYLINE>
       <BYLINE>Associated Press Writer</BYLINE>
       <DATELINE>NICOSIA, Cyprus (AP) </DATELINE>
       <TEXT>
       Iran's government is intensifying a birth control program _ despite opposition from radicals _ because the country's fast-growing population is imposing strains on a struggling economy.
       …………
       </TEXT>
       </DOC>

     Text Representation: What you see
       It never leaves my side, April 6, 2002
       Reviewer: "dage456" (Carmichael, CA USA) - See all my reviews
       It fits in the palm of your hand and is the size of a deflated wallet (wonder where the money went).
       I have had my ipod now for 4 months and cannot imagine how I used to get by with my old rio 600 with is 64 megs of ram and.. usb connection. Because of its size this little machine goes with my everywhere and its ten hour battery life means I can listen to stuff all day long.
       Pros: size, both physical and capacity.
       design: It looks beautiful
       controls: simple and very easy to use
       connection: FIREWIRE!!
       Cons: needs the ability to bookmark. I use my ipod mostly for audiobooks. the ipod needs to include a bookmark feature for those like me.
     From Amazon Customer Review of IPod

     Text Representation: Indexing
     Indexing: associate a document/query with a set of keys
     - Manual or human indexing
       - Indexers assign keywords or key concepts (e.g., libraries, Medline, Yahoo!); often a small vocabulary
       - Significant human effort; may not be thorough
     - Automatic indexing
       - An indexing program assigns words, phrases, or other features; often a large vocabulary
       - No human effort

     Text Representation: What the computer sees
       <table><tr><td valign="top"> Reviewer:</td> <td><a href="http://www.amazon.com/exec/obidos/tg/cm/member-glance/-/AJF9GJKJ8UGNX/1/ref=cm_cr_auth/002-1193904-0468830?%5Fencoding=UTF8"><span style="font-weight: bold;">"dage456"</span></a> (Carmichael, CA USA) - <a href="http://www.amazon.com/gp/cdp/member-reviews/AJF9GJKJ8UGNX/ref=cm_cr_auth/002-1193904-0468830?ie=UTF8"> See all my reviews</a></td></tr></table>It fits in the palm of your hand and is the size of a deflated wallet (wonder where the money went). <p>I have had my ipod now for 4 months and cannot imagine how I used to get by with my old rio 600 with is 64 megs of ram and.. usb connection. Because of its size this little machine goes with my everywhere and its ten hour battery life means I can listen to stuff all day long.<p>Pros: size, both physical and capacity.<br>design: It looks beautiful<br>controls: simple and very easy to use<p>connection: FIREWIRE!!<p>Cons: needs the ability to bookmark. I use my ipod mostly for audiobooks. the ipod needs to include a bookmark feature for those like me.<br /><br />
     From Amazon Customer Review of IPod

     Text Representation: Indexing
     Controlled Vocabulary vs. Full Text
     - Controlled vocabulary indexing
       - Assign words from a small vocabulary or a node from an ontology
       - Often done manually, but can be done by learning algorithms
     - Full-text indexing
       - Often indexes with an uncontrolled vocabulary drawn from the full text
       - Done automatically; a good algorithm can generate more representative keywords/key concepts
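As a rough illustration of automatic full-text indexing over TREC-formatted text, the sketch below pulls the `<DOCNO>` and `<TEXT>` fields out of each `<DOC>` block with regular expressions and builds a small inverted index. The helper names `parse_trec` and `build_index` are hypothetical, the sample is a shortened copy of the slide's AP document, and the word pattern (runs of lowercase letters and digits) is an assumption rather than the preprocessing used in the course.

```python
import re
from collections import defaultdict

# Shortened copy of the slide's sample document (most fields omitted).
SAMPLE = """<DOC>
<DOCNO> AP900101-0001 </DOCNO>
<HEAD>Iran Moves To Curb A Baby Boom That Threatens Its Economic Future</HEAD>
<TEXT>
Iran's government is intensifying a birth control program
</TEXT>
</DOC>"""

DOC_RE = re.compile(r"<DOC>(.*?)</DOC>", re.DOTALL)
DOCNO_RE = re.compile(r"<DOCNO>\s*(.*?)\s*</DOCNO>", re.DOTALL)
TEXT_RE = re.compile(r"<TEXT>(.*?)</TEXT>", re.DOTALL)


def parse_trec(raw):
    """Yield (docno, text) pairs from TREC-style SGML."""
    for body in DOC_RE.findall(raw):
        docno = DOCNO_RE.search(body)
        text = TEXT_RE.search(body)
        if docno and text:
            yield docno.group(1), text.group(1).strip()


def build_index(docs):
    """Map each lowercase word to the set of documents that contain it."""
    index = defaultdict(set)
    for docno, text in docs:
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(docno)
    return index


docs = list(parse_trec(SAMPLE))
index = build_index(docs)
```

Every word of the text becomes an index key with no human effort, which is why the vocabulary of an automatic full-text index tends to be large and uncontrolled.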
