
1998: Enter Link Analysis

  • uses hyperlink structure to focus the relevant set
  • combines the traditional IR score with a popularity score

1998: Page and Brin (PageRank); Kleinberg (HITS)

Web Information Retrieval

IR before the Web = traditional IR
IR on the Web = web IR

How is the Web different from other document collections?

  • It’s huge.

– over 10 billion pages, average page size of 500KB
– 20 times the size of the Library of Congress print collection
– Deep Web: 400 times bigger than the Surface Web

  • It’s dynamic.

– content changes: 40% of pages change in a week, 23% of .com pages change daily
– size changes: billions of pages added each year

  • It’s self-organized.

– no standards, review process, or formats
– errors, falsehoods, link rot, and spammers!

A Herculean Task!

  • Ah, but it’s hyperlinked!

– Vannevar Bush’s 1945 memex

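Elements of a Web Search Engine

[Figure: block diagram of a web search engine. Components: WWW, Crawler Module, Page Repository, Indexing Module, Indexes (Content Index, Structure Index, Special-purpose indexes), Query Module, Ranking Module, and the User, with queries going in and results coming back; part of the system is marked as query-independent.]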

The Ranking Module (generates popularity scores)

  • Measure the importance of each page
  • The measure should be independent of any query

– Primarily determined by the link structure of the Web
– Tempered by some content considerations

  • Compute these measures off-line, long before any queries are processed
  • Google’s PageRank© technology distinguishes it from all competitors

Google’s PageRank = Google’s $$$$$

Google’s PageRank

(Lawrence Page & Sergey Brin 1998)

The Google Goals

  • Create a PageRank r(P) that is not query dependent

⊲ Off-line calculations: no query-time computation

  • Let the Web vote with in-links

⊲ But not by simple link counts
– One link to P from Yahoo! is important
– Many links to P from me are not

  • Share the vote

⊲ Yahoo! casts many “votes”, so the value of a vote from Yahoo! is diluted
⊲ If Yahoo! “votes” for n pages, then P receives only r(Y)/n credit from Y

PageRank

The Definition

$$r(P_i) = \sum_{P \in B_{P_i}} \frac{r(P)}{|P|}$$

where $B_{P_i}$ = {all pages pointing to $P_i$} and $|P|$ = number of out-links from $P$.

Successive Refinement

Start with $r_0(P_i) = 1/n$ for all pages $P_1, P_2, \ldots, P_n$.

Iteratively refine the rankings for each page:

$$r_1(P_i) = \sum_{P \in B_{P_i}} \frac{r_0(P)}{|P|}$$

$$r_2(P_i) = \sum_{P \in B_{P_i}} \frac{r_1(P)}{|P|}$$

$$\vdots$$

$$r_{j+1}(P_i) = \sum_{P \in B_{P_i}} \frac{r_j(P)}{|P|}$$
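The successive-refinement recurrence can be sketched in Python as below. This is a minimal illustration, not the authors' code: the three-page link structure and the function name `successive_refinement` are hypothetical, and the sketch assumes every page has at least one out-link (a page with none would simply pass no rank on).

```python
def successive_refinement(out_links, iterations=100):
    """Iterate r_{j+1}(P_i) = sum over P in B_{P_i} of r_j(P) / |P|."""
    pages = list(out_links)
    n = len(pages)
    r = {p: 1.0 / n for p in pages}  # r_0(P_i) = 1/n for every page
    for _ in range(iterations):
        new_r = {p: 0.0 for p in pages}
        for p, targets in out_links.items():
            # Page p splits its current rank equally among its |p| out-links.
            for q in targets:
                new_r[q] += r[p] / len(targets)
        r = new_r
    return r


# Hypothetical three-page web: A -> B, A -> C, B -> C, C -> A
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = successive_refinement(links)
print({p: round(v, 4) for p, v in ranks.items()})  # {'A': 0.4, 'B': 0.2, 'C': 0.4}
```

For this hypothetical graph the iterates settle at r(A) = r(C) = 0.4 and r(B) = 0.2, which satisfy the definition: r(B) = r(A)/2, r(C) = r(A)/2 + r(B), and r(A) = r(C).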

In Matrix Notation

After step k:

$$\pi_k^T = [\,r_k(P_1),\ r_k(P_2),\ \ldots,\ r_k(P_n)\,]$$

$$\pi_{k+1}^T = \pi_k^T H, \qquad h_{ij} = \begin{cases} 1/|P_i| & \text{if } i \to j \\ 0 & \text{otherwise} \end{cases}$$

PageRank vector: $\pi^T = \lim_{k \to \infty} \pi_k^T$ = left eigenvector of $H$ for the eigenvalue 1, provided that the limit exists.
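In matrix form the same iteration is a vector-matrix product, and its limit can be cross-checked against a left eigenvector of H. A NumPy sketch, assuming a hypothetical three-page web (P1 → P2, P1 → P3, P2 → P3, P3 → P1) for which the iteration happens to converge:

```python
import numpy as np

# Hypothetical three-page web: P1 -> P2, P1 -> P3, P2 -> P3, P3 -> P1.
# Row i holds h_ij = 1/|P_i| in every column j with i -> j, and 0 otherwise.
H = np.array([
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
])

# Power iteration in matrix form: pi_{k+1}^T = pi_k^T H (row vector times matrix).
pi_power = np.full(3, 1.0 / 3.0)
for _ in range(200):
    pi_power = pi_power @ H

# Eigenvector view: a left eigenvector of H is a right eigenvector of H^T.
eigenvalues, eigenvectors = np.linalg.eig(H.T)
k = np.argmin(np.abs(eigenvalues - 1.0))  # index of the eigenvalue closest to 1
pi_eig = np.real(eigenvectors[:, k])
pi_eig = pi_eig / pi_eig.sum()            # scale so the entries sum to 1

print(np.round(pi_power, 4), np.round(pi_eig, 4))
```

Both routes give (0.4, 0.2, 0.4) for this graph. `np.linalg.eig` returns right eigenvectors as columns, which is why the sketch works with H^T.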

Tiny Web

[Figure: directed graph of a six-page Tiny Web, pages P1 to P6]

$$H = \begin{pmatrix}
0 & 1/2 & 1/2 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
1/3 & 1/3 & 0 & 0 & 1/3 & 0 \\
0 & 0 & 0 & 0 & 1/2 & 1/2 \\
0 & 0 & 0 & 1/2 & 0 & 1/2 \\
0 & 0 & 0 & 1 & 0 & 0
\end{pmatrix}$$

(rows and columns ordered P1, P2, ..., P6)

⊲ A random walk on the Web Graph
⊲ PageRank = $\pi_i$ = amount of time spent at $P_i$
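A sketch that builds H from out-link lists, assuming the standard link structure of this six-page example (P1 → P2, P3; P2 → nothing; P3 → P1, P2, P5; P4 → P5, P6; P5 → P4, P6; P6 → P4). The dictionary `OUT_LINKS` and the helper `hyperlink_matrix` are illustrative names, not part of the slides.

```python
import numpy as np

# Assumed Tiny Web out-links (pages numbered 1..6); P2 has no out-links.
OUT_LINKS = {1: [2, 3], 2: [], 3: [1, 2, 5], 4: [5, 6], 5: [4, 6], 6: [4]}


def hyperlink_matrix(out_links):
    """Build H with h_ij = 1/|P_i| if P_i links to P_j, else 0."""
    n = len(out_links)
    H = np.zeros((n, n))
    for i, targets in out_links.items():
        for j in targets:
            H[i - 1, j - 1] = 1.0 / len(targets)  # shift to 0-based indices
    return H


H = hyperlink_matrix(OUT_LINKS)
print(H.sum(axis=1))  # row sums: 1 for every page with out-links, 0 for P2
```

Every row with out-links sums to 1; the all-zero row for P2 is the dead end discussed next.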

Ranking with a Random Surfer

  • Rank each page corresponding to a search term by the number and quality of votes cast for that page. Hyperlink as vote.

[Figure: a random surfer following the hyperlinks of the Tiny Web graph; the walk is a Markov chain]

Page 2 is a dangling node.

Tiny Web

[Figure: the six-page Tiny Web graph]

⊲ Dead-end page (nothing to click on): a “dangling node”. P2 has no out-links, so its row of H is all zeros.
⊲ $\pi^T = (0, 1, 0, 0, 0, 0)$ is an eigenvector ⟹ Page P2 is a “rank sink”
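One way to see the trouble numerically: iterating $\pi_{k+1}^T = \pi_k^T H$ on the Tiny Web loses rank at every step, because whatever reaches P2 has nowhere to go. A NumPy sketch starting from the uniform vector, assuming the same Tiny Web link structure (P1 → P2, P3; P3 → P1, P2, P5; P4 → P5, P6; P5 → P4, P6; P6 → P4):

```python
import numpy as np

# Tiny Web hyperlink matrix H (rows/columns P1..P6); the P2 row is all zeros.
H = np.array([
    [0,   1/2, 1/2, 0,   0,   0  ],
    [0,   0,   0,   0,   0,   0  ],
    [1/3, 1/3, 0,   0,   1/3, 0  ],
    [0,   0,   0,   0,   1/2, 1/2],
    [0,   0,   0,   1/2, 0,   1/2],
    [0,   0,   0,   1,   0,   0  ],
])

pi = np.full(6, 1 / 6)        # pi_0: uniform start
totals = []
for _ in range(100):
    pi = pi @ H               # pi_{k+1}^T = pi_k^T H
    totals.append(pi.sum())   # total rank left after this step

print(round(totals[0], 4), round(totals[-1], 4))
```

After the first step the total is 5/6 (P2's share is gone), and it settles at about 0.6: P4, P5 and P6 keep what they hold, while the rank in P1 and P3 drains partly into P2 and partly into P5.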

The Fix

Allow Web Surfers To Make Random Jumps

[Figure: the random surfer “teleports” to a random page]

– Replace zero rows with $e^T/n = (1/n, 1/n, \ldots, 1/n)$

$$S = \begin{pmatrix}
0 & 1/2 & 1/2 & 0 & 0 & 0 \\
1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 \\
1/3 & 1/3 & 0 & 0 & 1/3 & 0 \\
0 & 0 & 0 & 0 & 1/2 & 1/2 \\
0 & 0 & 0 & 1/2 & 0 & 1/2 \\
0 & 0 & 0 & 1 & 0 & 0
\end{pmatrix}$$

– $S = H + a\,e^T/6$ is now row stochastic ⟹ $\rho(S) = 1$ (here $a$ is the dangling-node indicator: $a_i = 1$ if $P_i$ has no out-links, 0 otherwise)
– Perron says ∃ $\pi^T \ge 0$ s.t. $\pi^T = \pi^T S$ with $\sum_i \pi_i = 1$
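The dangling-node fix $S = H + a\,e^T/n$ can be sketched in NumPy for the Tiny Web, assuming the same link structure as in H. The vector `a` is computed as the indicator of all-zero rows; the final line checks that every row of S sums to 1 and that the spectral radius is 1.

```python
import numpy as np

# Tiny Web hyperlink matrix H (rows/columns P1..P6); the P2 row is all zeros.
H = np.array([
    [0,   1/2, 1/2, 0,   0,   0  ],
    [0,   0,   0,   0,   0,   0  ],
    [1/3, 1/3, 0,   0,   1/3, 0  ],
    [0,   0,   0,   0,   1/2, 1/2],
    [0,   0,   0,   1/2, 0,   1/2],
    [0,   0,   0,   1,   0,   0  ],
])
n = H.shape[0]

a = (H.sum(axis=1) == 0).astype(float)     # dangling-node indicator vector
S = H + np.outer(a, np.ones(n)) / n        # S = H + a e^T / n

rho = np.max(np.abs(np.linalg.eigvals(S)))  # spectral radius of S
print(S.sum(axis=1), round(rho, 6))
```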

Nasty Problem

The Web Is Not Strongly Connected

S is reducible: in the Tiny Web, once the surfer enters {P4, P5, P6} it can never leave, so P1, P2 and P3 cannot be reached from there.

– Reducible ⟹ PageRank vector is not well defined
– Frobenius says S needs to be irreducible to ensure a unique $\pi^T > 0$ s.t. $\pi^T = \pi^T S$ with $\sum_i \pi_i = 1$
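Reducibility can be checked from the link graph alone: a nonnegative matrix is irreducible exactly when its graph is strongly connected. A minimal sketch using depth-first reachability; the helper names `reachable_from` and `is_irreducible` are hypothetical, and S is the Tiny Web matrix with P2's row replaced by 1/6 (same link structure assumed as above).

```python
import numpy as np


def reachable_from(M, start):
    """Indices reachable from `start` by following positive entries of M."""
    seen, stack = {start}, [start]
    while stack:
        i = stack.pop()
        for j in np.nonzero(M[i] > 0)[0]:
            j = int(j)
            if j not in seen:
                seen.add(j)
                stack.append(j)
    return seen


def is_irreducible(M):
    """A nonnegative matrix is irreducible iff its graph is strongly connected."""
    n = M.shape[0]
    return all(len(reachable_from(M, i)) == n for i in range(n))


S = np.array([
    [0,   1/2, 1/2, 0,   0,   0  ],
    [1/6, 1/6, 1/6, 1/6, 1/6, 1/6],
    [1/3, 1/3, 0,   0,   1/3, 0  ],
    [0,   0,   0,   0,   1/2, 1/2],
    [0,   0,   0,   1/2, 0,   1/2],
    [0,   0,   0,   1,   0,   0  ],
])

print(is_irreducible(S))                            # False
print(sorted(p + 1 for p in reachable_from(S, 3)))  # pages reachable from P4: [4, 5, 6]
```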

Irreducibility Is Not Enough

Could Get Trapped Into A Cycle ($P_i \to P_j \to P_i$)
– The powers $S^k$ fail to converge
– $\pi_{k+1}^T = \pi_k^T S$ fails to converge

Convergence Requirement
– Perron–Frobenius requires S to be primitive
– No eigenvalues other than λ = 1 on the unit circle
– Frobenius proved S is primitive ⟺ $S^k > 0$ for some k
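Both failure modes can be seen on a hypothetical two-page cycle. The sketch below shows the iterates oscillating, and tests primitivity by checking whether some power $M^k$ is entrywise positive; a primitive n×n matrix always has $M^k > 0$ by $k = (n-1)^2 + 1$ (Wielandt's bound), so checking up to that power is enough. The matrices `C` and `D` and the helper `is_primitive` are illustrative only, and repeated dense products like this are meant for tiny examples, not web-scale matrices.

```python
import numpy as np


def is_primitive(M):
    """Test M^k > 0 for some k, up to Wielandt's bound k = (n - 1)^2 + 1."""
    n = M.shape[0]
    P = np.eye(n)
    for _ in range((n - 1) ** 2 + 1):
        P = P @ M
        if np.all(P > 0):
            return True
    return False


# Hypothetical two-page cycle P1 -> P2 -> P1: irreducible but not primitive.
C = np.array([[0.0, 1.0], [1.0, 0.0]])
pi = np.array([1.0, 0.0])
history = []
for _ in range(4):
    pi = pi @ C                # pi_{k+1}^T = pi_k^T C
    history.append(pi.copy())
print(history)                 # alternates between [0, 1] and [1, 0]
print(is_primitive(C))         # False

# Adding a self-link at P1 breaks the strict cycle and makes the matrix primitive.
D = np.array([[0.5, 0.5], [1.0, 0.0]])
print(is_primitive(D))         # True
```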

The Google Fix

Allow A Random Jump From Any Page
– $G = \alpha S + (1 - \alpha)E > 0$, $E = ee^T/n$, $0 < \alpha < 1$
– $G = \alpha H + uv^T > 0$, $u = \alpha a + (1 - \alpha)e$, $v^T = e^T/n$
– PageRank vector $\pi^T$ = left-hand Perron vector of G
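A dense sketch of the Google matrix for the Tiny Web (same link structure assumed as in H). The slides do not fix α; the value 0.9 below is an assumption, as is the stopping tolerance. The code also checks that $\alpha S + (1 - \alpha)E$ and $\alpha H + uv^T$ produce the same matrix, then runs the power method to approximate the left-hand Perron vector.

```python
import numpy as np

alpha = 0.9                                  # assumed damping factor
H = np.array([
    [0,   1/2, 1/2, 0,   0,   0  ],
    [0,   0,   0,   0,   0,   0  ],
    [1/3, 1/3, 0,   0,   1/3, 0  ],
    [0,   0,   0,   0,   1/2, 1/2],
    [0,   0,   0,   1/2, 0,   1/2],
    [0,   0,   0,   1,   0,   0  ],
])
n = H.shape[0]
e = np.ones(n)
a = (H.sum(axis=1) == 0).astype(float)       # dangling-node indicator
S = H + np.outer(a, e) / n                   # S = H + a e^T / n
G = alpha * S + (1 - alpha) * np.outer(e, e) / n   # G = alpha S + (1 - alpha) E

u = alpha * a + (1 - alpha) * e
assert np.allclose(G, alpha * H + np.outer(u, e / n))  # same as alpha H + u v^T
assert np.all(G > 0)                          # positive, hence primitive

pi = e / n
for _ in range(1000):
    pi_next = pi @ G                          # pi_{k+1}^T = pi_k^T G
    if np.abs(pi_next - pi).sum() < 1e-12:
        pi = pi_next
        break
    pi = pi_next
print(np.round(pi, 4))
```

With these assumptions the power method gives roughly (.037, .054, .042, .375, .206, .286).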

Ranking with a Random Surfer

  • If a page is “important,” it gets lots of votes from other important pages, which means the random surfer visits it often.
  • Simply count the number of times, or proportion of time, the surfer spends on each page to create a ranking of webpages.

[Figure: random surfer on the Tiny Web graph]

Proportion of Time: Page 1 = .04, Page 2 = .05, Page 3 = .04, Page 4 = .38, Page 5 = .20, Page 6 = .29

Ranked List of Pages: Page 4, Page 6, Page 5, Page 2, Page 1, Page 3
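The counting idea can be simulated directly. This sketch assumes α = 0.9 (not stated on the slides), the Tiny Web links, and a surfer who teleports uniformly with probability 1 − α and always teleports from a dangling page, which is the walk described by G. The function name `simulate_surfer`, the step count and the seed are arbitrary choices.

```python
import random
from collections import Counter

# Assumed Tiny Web out-links (pages 1..6); page 2 is a dangling node.
OUT_LINKS = {1: [2, 3], 2: [], 3: [1, 2, 5], 4: [5, 6], 5: [4, 6], 6: [4]}


def simulate_surfer(out_links, alpha=0.9, steps=200_000, seed=0):
    """Estimate the proportion of time a random surfer spends on each page."""
    rng = random.Random(seed)
    pages = list(out_links)
    page = rng.choice(pages)
    visits = Counter()
    for _ in range(steps):
        visits[page] += 1
        if out_links[page] and rng.random() < alpha:
            page = rng.choice(out_links[page])  # follow a random out-link
        else:
            page = rng.choice(pages)            # teleport (always, from a dangling page)
    return {p: visits[p] / steps for p in pages}


shares = simulate_surfer(OUT_LINKS)
ranking = sorted(shares, key=shares.get, reverse=True)
print({p: round(s, 2) for p, s in shares.items()})
print(ranking)
```

The estimated shares come out close to the proportions listed above, with pages 4, 6 and 5 on top. Pages 1 and 3 are close enough that their relative order depends on the exact model and on sampling noise; the exact stationary values for this model are about .037 for page 1 and .042 for page 3.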

The Google Fix

Some Happy Accidents

– $x^T G = \alpha x^T H + \beta v^T$ (with $\beta = \alpha x^T a + 1 - \alpha$ when $x^T e = 1$)

Sparse computations with the original link structure

– $|\lambda_2(G)| \le \alpha$ (the eigenvalues of G other than 1 are α times those of S, and for the Web $\lambda_2(S) = 1$ typically, giving $\lambda_2(G) = \alpha$)

Convergence rate controllable by Google engineers

– $v^T$ can be any positive probability vector in $G = \alpha H + uv^T$
– The choice of $v^T$ allows for personalization
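The first happy accident means G never has to be formed: each step needs only a pass over the links plus a rank-one correction. A sketch, assuming α = 0.9 and the Tiny Web links; `pagerank_sparse` is a hypothetical helper, and the personalization vector weighted toward P1 is an arbitrary illustration.

```python
import numpy as np

# Assumed Tiny Web out-links (0-based page indices); index 1 (P2) is dangling.
OUT_LINKS = [[1, 2], [], [0, 1, 4], [4, 5], [3, 5], [3]]


def pagerank_sparse(out_links, alpha=0.9, v=None, tol=1e-12, max_iter=1000):
    """Power method using x^T G = alpha x^T H + beta v^T, touching only the links."""
    n = len(out_links)
    v = np.full(n, 1.0 / n) if v is None else np.asarray(v, dtype=float)
    x = v.copy()                                   # start from v (sums to 1)
    for _ in range(max_iter):
        y = np.zeros(n)
        for i, targets in enumerate(out_links):
            for j in targets:
                y[j] += alpha * x[i] / len(targets)  # alpha x^T H, one link at a time
        beta = 1.0 - y.sum()          # equals alpha x^T a + 1 - alpha when x sums to 1
        x_next = y + beta * v
        if np.abs(x_next - x).sum() < tol:
            return x_next
        x = x_next
    return x


uniform = pagerank_sparse(OUT_LINKS)
personal = pagerank_sparse(OUT_LINKS, v=[0.5, 0.1, 0.1, 0.1, 0.1, 0.1])
print(np.round(uniform, 4))
print(np.round(personal, 4))
```

With uniform v this matches the dense computation with G; shifting teleportation weight toward P1 raises P1's score (to roughly 0.11 here).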

PageRank Issues

  • Computing PageRank: simulation, eigensystem, linear system; accuracy
  • Power-law distribution: sensitivity, spamming
  • Link strategies
  • Overuse
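The linear-system route in the first bullet can be sketched briefly. From $\pi^T = \pi^T G$ with $G = \alpha H + uv^T$ one gets $\pi^T(I - \alpha H) = \gamma v^T$ for the positive scalar $\gamma = \alpha \pi^T a + 1 - \alpha$, so solving $x^T(I - \alpha H) = v^T$ and normalizing x gives $\pi^T$. The sketch assumes α = 0.9, a uniform v, and the Tiny Web matrix H; $I - \alpha H$ is nonsingular because $\alpha H$ has spectral radius at most α < 1.

```python
import numpy as np

alpha = 0.9                                    # assumed damping factor
H = np.array([
    [0,   1/2, 1/2, 0,   0,   0  ],
    [0,   0,   0,   0,   0,   0  ],
    [1/3, 1/3, 0,   0,   1/3, 0  ],
    [0,   0,   0,   0,   1/2, 1/2],
    [0,   0,   0,   1/2, 0,   1/2],
    [0,   0,   0,   1,   0,   0  ],
])
n = H.shape[0]
v = np.full(n, 1.0 / n)                        # uniform teleportation vector

# Solve x^T (I - alpha H) = v^T, i.e. (I - alpha H)^T x = v, then normalize.
x = np.linalg.solve((np.eye(n) - alpha * H).T, v)
pi = x / x.sum()
print(np.round(pi, 4))
```

The normalized solution is the same vector the power method on G converges to, obtained here without any iteration.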