Algorithms for Web Indexing and Searching

SLIDE 1

Algorithms for Web Indexing and Searching

Gerth Stølting Brodal and Rolf Fagerberg Fall 2002

SLIDE 2

Course Motivation

How does Google work?

SLIDE 3

Course Motivation

How does Google work?
⇓
How do search engines work?

SLIDE 4

Course Motivation

How does Google work?
⇓
How do search engines work?
⇓
Algorithms for web indexing and searching

SLIDE 5

Course Outline

  1. Introduction to Course
  2. General Anatomy of Web Search Engines
  3. Building Blocks of Search Engines

(a) Web Crawlers

  • Anatomy of crawlers
  • Crawling strategy

(b) Index

  • Inverted files
  • Suffix trees
  • Signature files
  • Compression
  • Issues of efficient construction
  • Duplicate removal

SLIDE 6

Course Outline

(c) Types of Queries

(d) Ranking

  • Text-based methods

    – Vector-based methods
    – Latent semantic indexing

  • Link-based methods

    – PageRank
    – HITS
    – SALSA
    – Others

SLIDE 7

Course Outline

  4. Further topics

(a) Clustering
(b) Automatic Categorization/Hierarchy Building
(c) Evaluation of search engines
(d) Structure of and Models for the Web Graph
(e) Data Mining

SLIDE 8

Formal Course Description

Prerequisites: dADS
Literature: Handouts
Course language: Danish or English
Credits: 2 points/10 ECTS
Evaluation: Programming project
Course page: http://www.daimi.au.dk/~gerth/webalg02/index.html

SLIDE 9

Programming Project

Implement a Web Search Engine

SLIDE 10

Programming Project

Implement a Web Search Engine

Distributed project

Groups (2–4 persons) doing:

  • Web crawling
  • Index building
  • Ranking
  • Query interface

Start: index Aarhus University website

Goal: index domain .dk
