Web Search Basics: Introduction to Information Retrieval, INF 141 / CS 121

Web Search Basics
Introduction to Information Retrieval, INF 141 / CS 121
Donald J. Patterson
Content adapted from Hinrich Schütze (http://www.informationretrieval.org)

Overview
• Introduction
• Classic Information Retrieval
• Web IR
• Sponsored Search
• Web Search Basics
• Size of the Web
• Web Users
• Spam

Classic Information Retrieval
Classic IR assumptions
• Corpus: a fixed document collection
• Goal: retrieve information content relevant to an information need

Classic Information Retrieval
Classic IR goal
• Classic “relevance”
  • For each query Q and stored document D in a corpus, there exists a relevance score R(Q,D)
  • R(Q,D) is the user- and context-specific score R(Q,D,U,C) averaged over users U and contexts C
  • Maximize R(Q,D) instead of R(Q,D,U,C)
    • Context is ignored
    • Individuals are ignored
    • Corpus is static
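To make the averaging concrete, here is a minimal sketch assuming a hypothetical table of per-user, per-context scores R(Q,D,U,C) on a 0..1 scale (the query, documents, users, and scores are invented for illustration). Classic IR collapses them into one R(Q,D) per document and ranks on that, so every user gets the same answer.

```python
from statistics import mean

# Hypothetical judgments R(Q, D, U, C): relevance of document D to query Q
# as judged by user U in context C (assumed 0..1 scale, invented values).
judgments = {
    ("jaguar", "d1"): {("alice", "zoo"): 0.9, ("bob", "cars"): 0.1, ("carol", "cars"): 0.2},
    ("jaguar", "d2"): {("alice", "zoo"): 0.2, ("bob", "cars"): 0.9, ("carol", "cars"): 0.8},
}

def classic_relevance(query, doc):
    """R(Q, D): average of R(Q, D, U, C) over all users U and contexts C."""
    return mean(list(judgments[(query, doc)].values()))

def rank(query, docs):
    """Classic IR ranking: sort by R(Q, D); user and context are ignored."""
    return sorted(docs, key=lambda d: classic_relevance(query, d), reverse=True)

print(rank("jaguar", ["d1", "d2"]))
```

Here d2 averages higher, so it is ranked first for everyone, even though the user searching in a zoo context would have preferred d1. That lost information is exactly what “Context is ignored, individuals are ignored” means.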

Web Information Retrieval
Web IR: Differences from traditional IR
• On the web, search and ads are intricately connected
• The web is huge
• The web is a rapidly changing collection
• There is spam on the web
  • Adversarial IR
  • Huge difference from traditional IR
• One interface for hugely divergent needs
  • Queries, Maps, Stocks, Weather, Calculations

Web Information Retrieval
History
• Early keyword-based engines
  • (1995-1997) AltaVista, Excite, Infoseek, Inktomi
• Paid placement ranking
  • Goto.com -> Overture.com -> Yahoo!
  • Results based on an auction for keyword placement

Web Information Retrieval
History
• (1998+) Link-based ranking pioneered by Google
  • Links added the idea of “authoritativeness” to “relevance”
  • Blew away all early engines save Inktomi
  • Great user experience looking for a business model
• Meanwhile, Goto/Overture’s annual revenues were nearing $1 billion

Web Information Retrieval
History
• Result
  • Google:
    • Added paid placement ads on the side
    • Kept them differentiated from the search results
  • Yahoo! built a similar architecture
    • Bought Overture for paid placement
    • Bought Inktomi for search

Sponsored Search
[Figure: search results page with two regions labeled “Ads” and one labeled “Algorithmic Results”]

Sponsored Search
Ads vs. Search Results
• Google has maintained that ads (based on vendors bidding for search queries) do not affect vendors’ rankings in search results

Sponsored Search
Ranking of ads
• Other search engines (Yahoo!, MSN) have made similar statements on occasion
  • Any of them can change at any time
• Facebook is currently testing the waters with ads in its “News Feed”
• We will ignore the possibility of paid placement ads being interspersed in search results

Sponsored Search
Ranking of ads
• Goto model:
  • Rank according to how much the advertiser pays
• Current model:
  • Balance auction price and relevance
  • Irrelevant ads (few click-throughs)
    • Decrease opportunities for relevant ads
    • Harm the user experience
• Idea: Well-targeted advertising is good for everyone
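The two models can be contrasted in a few lines. This is a sketch under an assumption the slide does not make: “relevance” is approximated by an estimated click-through rate (CTR), and “balancing” is done by ranking on bid × CTR, the expected revenue per impression when the advertiser pays per click. The advertisers, bids, and rates are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Ad:
    advertiser: str
    bid: float  # price offered per click (hypothetical, in dollars)
    ctr: float  # estimated click-through rate, used as a stand-in for relevance (assumption)

def goto_rank(ads):
    """Goto model: rank purely by how much the advertiser pays."""
    return sorted(ads, key=lambda ad: ad.bid, reverse=True)

def balanced_rank(ads):
    """Assumed balance of price and relevance: rank by bid * CTR,
    i.e. expected revenue per impression under pay-per-click."""
    return sorted(ads, key=lambda ad: ad.bid * ad.ctr, reverse=True)

ads = [
    Ad("irrelevant-but-rich", bid=5.00, ctr=0.001),
    Ad("well-targeted", bid=1.00, ctr=0.050),
]
print([a.advertiser for a in goto_rank(ads)])
print([a.advertiser for a in balanced_rank(ads)])
```

Under the Goto model the high bidder wins the top slot; under bid × CTR the well-targeted ad wins (0.05 vs. 0.005 expected dollars per impression), which is the “good for everyone” argument: the user sees a relevant ad and the engine earns more.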

Sponsored Search
Paying for advertisements
• CPM
  • “Cost Per Mille” (per thousand impressions)
  • Pay for 1000 eyeballs
  • Important for branding campaigns
• CPC
  • “Cost Per Click”
  • Pay only when a user clicks on the ad
  • Important for sales campaigns
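The two billing schemes differ only in what is counted. A minimal sketch with hypothetical rates (not from the slides): a campaign shown 200,000 times that draws 2,000 clicks.

```python
def cpm_cost(impressions, cpm_rate):
    """CPM: pay cpm_rate for every 1,000 impressions, whether or not anyone clicks."""
    return impressions / 1000 * cpm_rate

def cpc_cost(clicks, cpc_rate):
    """CPC: pay cpc_rate only for impressions that turned into clicks."""
    return clicks * cpc_rate

impressions, clicks = 200_000, 2_000  # hypothetical campaign, 1% click-through
print(cpm_cost(impressions, 2.00))   # $2.00 CPM
print(cpc_cost(clicks, 0.50))        # $0.50 CPC
```

With these numbers CPM costs $400 and CPC costs $1,000. The advertiser under CPC pays nothing if nobody clicks, which suits sales campaigns; under CPM the advertiser pays for exposure alone, which suits branding.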

Web Search Basics
The Web Corpus
• No design/coordination
• Distributed content creation, linking
  • “Democratization of publishing”
• Content includes truth, lies, contradictions, etc.
• Unstructured data (text, HTML)
• Semi-structured data (XML, annotated photos)
• Structured data (databases)
• Scale is much larger than previous text corpora

Web Search Basics
The Web Corpus
• Growth: slowing from “doubling every few months”, but still expanding

Web Search Basics
Dynamic Content
• Content can be dynamically generated
  • There is no static HTML version
  • Flight status information, Evite responses
  • Assembled on request (“?” in URL is a clue)
[Figure: The User -> Browser -> Application Server -> Databases, assembling the status page for Flight AA715 on request; photo: flickr:crankyT]
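The “?” clue is easy to check mechanically. A minimal sketch using Python’s standard urllib.parse: a non-empty query string marks a URL as probably dynamic. This is only the heuristic stated on the slide, not a complete test; the example URLs are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

def looks_dynamic(url):
    """Slide heuristic: a query string ('?...') suggests the page is assembled
    on request rather than served from a static HTML file."""
    return bool(urlsplit(url).query)

def query_params(url):
    """Parameters the application server would use to build the page."""
    return parse_qs(urlsplit(url).query)

url = "https://example.com/status?flight=AA715"
print(looks_dynamic(url), query_params(url))
print(looks_dynamic("https://example.com/about.html"))
```

The heuristic misses pages that are generated without a query string (see the next slide) and treats a bare trailing “?” with no parameters as static.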

Web Search Basics
Dynamic Content
• Most (truly) dynamic content is ignored by web spiders
  • Too much to index
  • Static information is more important for search
  • Spider traps look dynamic
• Actually, a lot of “static” content is also assembled on the fly
  • ASP, PHP, JSP, ads, etc.
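A spider trap is a site that generates an unbounded number of distinct URLs, for example a calendar whose “next” link never runs out. The sketch below shows one assumed defensive filter a crawler might apply. The thresholds MAX_DEPTH and MAX_REPEATS are hypothetical values chosen for illustration, and real crawlers combine several defenses beyond this one.

```python
from collections import Counter
from urllib.parse import urlsplit

MAX_DEPTH = 8    # hypothetical cap on the number of path segments
MAX_REPEATS = 2  # hypothetical cap on how often one segment may repeat

def may_be_spider_trap(url):
    """Flag URLs whose paths look endlessly generated, such as
    /calendar/next/next/next or paths nested implausibly deep."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if len(segments) > MAX_DEPTH:
        return True
    return any(count > MAX_REPEATS for count in Counter(segments).values())

print(may_be_spider_trap("https://example.com/calendar/next/next/next"))
print(may_be_spider_trap("https://example.com/docs/intro.html"))
```

Rejecting suspicious URLs up front keeps the crawl frontier finite. The cost is occasional false positives on legitimately deep sites, which is why the limits are tunable parameters rather than fixed rules.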
