Link Analysis Stony Brook University CSE545, Spring 2019 The Web , - PowerPoint PPT Presentation

Link Analysis Stony Brook University CSE545, Spring 2019

The Web , circa 1998

The Web , circa 1998 Match keywords, language ( information retrieval ) Explore directory

The Web , circa 1998 Easy to game with “term spam” Time-consuming; Match keywords, language ( information retrieval ) Not open-ended Explore directory

Enter PageRank ...

PageRank Key Idea: Consider the citations of the website.

PageRank Key Idea: Consider the citations of the website. Who links to it? and what are their citations?

PageRank Key Idea: Consider the citations of the website. Who links to it? and what are their citations? Innovation 1: What pages would a “random Web surfer” end up at? Innovation 2: Not just own terms but what terms are used by citations?

PageRank A B C View 1: Flow Model: in-links as votes D E F Innovation 1: What pages would a “random Web surfer” end up at? Innovation 2: Not just own terms but what terms are used by citations?

PageRank View 1: Flow Model: in-links as votes Innovation 1: What pages would a “random Web surfer” end up at? Innovation 2: Not just own terms but what terms are used by citations?

PageRank View 1: Flow Model: in-links (citations) as votes but, citations from important pages should count more. => Use recursion to figure out if each page is important. Innovation 1: What pages would a “random Web surfer” end up at? Innovation 2: Not just own terms but what terms are used by citations?

A B PageRank View 1: Flow Model: C D How to compute? Each page ( j ) has an importance (i.e. rank, r j ) ( n j is |out-links|)

A B PageRank r A /1 r B /4 View 1: Flow Model: C D r C /2 r D = r A /1 + r B /4 + r C /2 How to compute? Each page ( j ) has an importance (i.e. rank, r j ) ( n j is |out-links|)

PageRank A B View 1: Flow Model: C D How to compute? Each page ( j ) has an importance (i.e. rank, r j ) ( n j is |out-links|)

PageRank A B View 1: Flow Model: C D A System of Equations: How to compute? Each page ( j ) has an importance (i.e. rank, r j ) ( n j is |out-links|)

PageRank A B View 1: Flow Model: Solve C D How to compute? Each page ( j ) has an importance (i.e. rank, r j ) ( n j is |out-links|)

PageRank A B C D to \ from A B C D A 0 1/2 1 0 B 1/3 0 0 1/2 C 1/3 0 0 1/2 D 1/3 1/2 0 0 Transition Matrix, M

View 2: Matrix Formulation A B C D to \ from A B C D A 0 1/2 1 0 B 1/3 0 0 1/2 C 1/3 0 0 1/2 D 1/3 1/2 0 0 Transition Matrix, M

Innovation: What pages would a “random Web surfer” end up at? A B View 2: Matrix Formulation C D to \ from A B C D A 0 1/2 1 0 B 1/3 0 0 1/2 C 1/3 0 0 1/2 D 1/3 1/2 0 0 Transition Matrix, M

Innovation: What pages would a “random Web surfer” end up at? To Start, all are equally likely at ¼ A B View 2: Matrix Formulation C D to \ from A B C D A 0 1/2 1 0 B 1/3 0 0 1/2 C 1/3 0 0 1/2 D 1/3 1/2 0 0 Transition Matrix, M

Innovation: What pages would a “random Web surfer” end up at? To Start, all are equally likely at ¼: ends up at D A B View 2: Matrix Formulation C D to \ from A B C D A 0 1/2 1 0 B 1/3 0 0 1/2 C 1/3 0 0 1/2 D 1/3 1/2 0 0 Transition Matrix, M

Innovation: What pages would a “random Web surfer” end up at? To Start, all are equally likely at ¼: ends up at D C and B are then equally likely: ->D->B=¼*½; ->D->C=¼*½ A B View 2: Matrix Formulation C D to \ from A B C D A 0 1/2 1 0 B 1/3 0 0 1/2 C 1/3 0 0 1/2 D 1/3 1/2 0 0 Transition Matrix, M

Innovation: What pages would a “random Web surfer” end up at? To Start, all are equally likely at ¼: ends up at D C and B are then equally likely: ->D->B=¼*½; ->D->C=¼*½ Ends up at C: then A is only option: ->D->C->A = ¼*½*1 A B View 2: Matrix Formulation C D to \ from A B C D A 0 1/2 1 0 B 1/3 0 0 1/2 C 1/3 0 0 1/2 D 1/3 1/2 0 0 Transition Matrix, M

Innovation: What pages would a “random Web surfer” end up at? ... A B View 2: Matrix Formulation C D to \ from A B C D A 0 1/2 1 0 B 1/3 0 0 1/2 C 1/3 0 0 1/2 D 1/3 1/2 0 0 Transition Matrix, M

Innovation: What pages would a “random Web surfer” end up at? To start: N=4 nodes, so r = [¼, ¼, ¼, ¼,] A B View 2: Matrix Formulation C D to \ from A B C D A 0 1/2 1 0 B 1/3 0 0 1/2 C 1/3 0 0 1/2 D 1/3 1/2 0 0 Transition Matrix, M

Innovation: What pages would a “random Web surfer” end up at? To start: N=4 nodes, so r = [¼, ¼, ¼, ¼,] after 1st iteration: M·r = [3/8, 5/24, 5/24, 5/24] A B View 2: Matrix Formulation C D to \ from A B C D A 0 1/2 1 0 B 1/3 0 0 1/2 C 1/3 0 0 1/2 D 1/3 1/2 0 0 Transition Matrix, M

Innovation: What pages would a “random Web surfer” end up at? To start: N=4 nodes, so r = [¼, ¼, ¼, ¼,] after 1st iteration: M·r = [3/8, 5/24, 5/24, 5/24] after 2nd iteration: M(M·r) = M 2 ·r = [15/48, 11/48, … ] A B View 2: Matrix Formulation C D to \ from A B C D A 0 1/2 1 0 B 1/3 0 0 1/2 C 1/3 0 0 1/2 D 1/3 1/2 0 0 Transition Matrix, M

Innovation: What pages would a “random Web surfer” end up at? To start: N=4 nodes, so r = [¼, ¼, ¼, ¼,] after 1st iteration: M·r = [3/8, 5/24, 5/24, 5/24] after 2nd iteration: M(M·r) = M 2 ·r = [15/48, 11/48, … ] A B Power iteration algorithm C D r [0] = [1/N, … , 1/N], initialize: r [-1]=[0,...,0] to \ from A B C D while (err_norm( r[t] , r[t-1] )>min_err): A 0 1/2 1 0 B 1/3 0 0 1/2 C 1/3 0 0 1/2 D 1/3 1/2 0 0 err_norm( v1, v2 ) = | v1 - v2 | #L1 norm “Transition Matrix”, M

Innovation: What pages would a “random Web surfer” end up at? To start: N=4 nodes, so r = [¼, ¼, ¼, ¼,] after 1st iteration: M·r = [3/8, 5/24, 5/24, 5/24] after 2nd iteration: M(M·r) = M 2 ·r = [15/48, 11/48, … ] A B Power iteration algorithm C D r [0] = [1/N, … , 1/N], initialize: r [-1]=[0,...,0] to \ from A B C D while (err_norm( r[t] , r[t-1] )>min_err): A 0 1/2 1 0 r [t+1] = M·r [t] B 1/3 0 0 1/2 t+=1 solution = r [t] C 1/3 0 0 1/2 D 1/3 1/2 0 0 err_norm( v1, v2 ) = | v1 - v2 | #L1 norm “Transition Matrix”, M

As err_norm gets smaller we are moving toward: r = M·r View 3: Eigenvectors: Power iteration algorithm r [0] = [1/N, … , 1/N], initialize: r [-1]=[0,...,0] while (err_norm( r[t] , r[t-1] )>min_err): r [t+1] = M·r [t] t+=1 solution = r [t] err_norm( v1, v2 ) = | v1 - v2 | #L1 norm

As err_norm gets smaller we are moving toward: r = M·r View 3: Eigenvectors: We are actually just finding the eigenvector of M. . . . e h t s d n i f Power iteration algorithm x is an r [0] = [1/N, … , 1/N], initialize: eigenvector of A if: r [-1]=[0,...,0] A · x = 𝛍 · x while (err_norm( r[t] , r[t-1] )>min_err): r [t+1] = M·r [t] t+=1 solution = r [t] err_norm( v1, v2 ) = | v1 - v2 | #L1 norm (Leskovec at al., 2014; http://www.mmds.org/)

As err_norm gets smaller we are moving toward: r = M·r View 3: Eigenvectors: We are actually just finding the eigenvector of M. . . . e h t s d n i f Power iteration algorithm x is an r [0] = [1/N, … , 1/N], initialize: eigenvector of A if: r [-1]=[0,...,0] A · x = 𝛍 · x while (err_norm( r[t] , r[t-1] )>min_err): r [t+1] = M·r [t] 𝛍 = 1 (eigenvalue for 1st principal eigenvector) t+=1 since columns of M sum to 1. solution = r [t] Thus, if r is x , then Mr=1r err_norm( v1, v2 ) = sum(| v1 - v2 |) #L1 norm

View 4: Markov Process Where is surfer at time t+1? p(t+1) = M · p(t) Suppose: p(t+1) = p(t), then p(t) is a stationary distribution of a random walk . Thus, r is a stationary distribution. Probability of being at given node.

View 4: Markov Process Where is surfer at time t+1? p(t+1) = M · p(t) Suppose: p(t+1) = p(t), then p(t) is a stationary distribution of a random walk . Thus, r is a stationary distribution. Probability of being at given node. aka 1st order Markov Process ● Rich probabilistic theory. One finding: ○ Stationary distributions have a unique distribution if: ■ No “dead-ends” : a node can’t propagate its rank ■ No “spider traps” : set of nodes with no way out. Also known as being stochastic , irreducible , and aperiodic.

Link Analysis Stony Brook University CSE545, Spring 2019 The Web , - PowerPoint PPT Presentation

Link Analysis Stony Brook University CSE545, Spring 2019 The Web , circa 1998 The Web , circa 1998 Match keywords, language ( information retrieval ) Explore directory The Web , circa 1998 Easy to game with term spam Time-consuming;

Corporate Presentation September 2018 About Link REIT About Link REIT Link is Our Portfolio (1)

10 GHz Microwave Link 10 GHz Microwave Link 10 GHz Microwave Link 10 GHz Microwave Link Project

Vertex Standard EVX-Link Training EVX-Link Training What is the EVX-Link EVX-Link is a fast

Changing the Game - The De-Linking Paradigm Old Way Our Way De-Link De-Link Link Link

Teacher Teacher-Student Data Link Teacher Teacher Student Data Link Student Data Link Student

ESCom and Scottish Environment LINK Phoebe Cochrane Scottish Environment LINK May 2014

An introduction to link homology Marco Mackaay CAMGSD and Universidade do Algarve 2 September,

RT-Link: A Time-Synchronized Link Protocol Anthony Rowe, Rahul Mangharam, Raj Rajkumar C

Data-link layer Da Data ta-link link layer er Referred to as layer 2 Physical

Chapter 5: The Data Link Layer Chapter 5 Link Layer and LANs Our goals: understand

Lecture 6: Wireless Link Layer, Lecture 6: Wireless Link Layer, MAC protocols, CSMA MAC

Chapter 5: The Data Link Layer Chapter 5 Link Layer and LANs Our goals: understand

Direct Link Networks Direct Link Networks 10/11/06 UIUC - CS/ECE438, Fall 2006 2 Direct Link

Link Analysis & Social Media: A New And Powerful Investigation Tactic Link Analysis &

Access Link at UHCprovider.com Sign in to Link by clicking on the Link button in the top right

Modification 630R UK Link Gemini considerations April 2018 Gemini UK Link Gemini holds

Data Mining Techniques CS 6220 - Section 3 - Fall 2016 Lecture 17: Link Analysis Jan-Willem van

Link Analysis Lecture 7 Link Analysis November 29, 2017 1 CS6220 Data Mining Techniques

KAGRA data management and analysis N.Kanda, on behalf of KAGRA collaboration The 3rd KAGRA

Asia Pacific Advanced etwork APA

INFO 4300 / CS4300 Information Retrieval slides adapted from Hinrich Sch utzes, linked from

11/26/12 F ailures Problems:Links and switches could fail Advertisements could get lost

Recent Developments in GNU Autotools Ralf Wildenhues GHM July 2010 The Hague Netherlands

Data-Intensive Distributed Computing CS 431/631 451/651 (Fall 2020) Part 4: Analyzing Graphs

Link Analysis Stony Brook University CSE545, Spring 2019 The Web , - PowerPoint PPT Presentation

Link Analysis Stony Brook University CSE545, Spring 2019 The Web , circa 1998 The Web , circa 1998 Match keywords, language ( information retrieval ) Explore directory The Web , circa 1998 Easy to game with term spam Time-consuming;

Corporate Presentation September 2018 About Link REIT About Link REIT Link is Our Portfolio (1)

10 GHz Microwave Link 10 GHz Microwave Link 10 GHz Microwave Link 10 GHz Microwave Link Project

Vertex Standard EVX-Link Training EVX-Link Training What is the EVX-Link EVX-Link is a fast

Changing the Game - The De-Linking Paradigm Old Way Our Way De-Link De-Link Link Link

Teacher Teacher-Student Data Link Teacher Teacher Student Data Link Student Data Link Student

ESCom and Scottish Environment LINK Phoebe Cochrane Scottish Environment LINK May 2014

An introduction to link homology Marco Mackaay CAMGSD and Universidade do Algarve 2 September,

RT-Link: A Time-Synchronized Link Protocol Anthony Rowe, Rahul Mangharam, Raj Rajkumar C

Data-link layer Da Data ta-link link layer er Referred to as layer 2 Physical

Chapter 5: The Data Link Layer Chapter 5 Link Layer and LANs Our goals: understand

Lecture 6: Wireless Link Layer, Lecture 6: Wireless Link Layer, MAC protocols, CSMA MAC

Chapter 5: The Data Link Layer Chapter 5 Link Layer and LANs Our goals: understand

Direct Link Networks Direct Link Networks 10/11/06 UIUC - CS/ECE438, Fall 2006 2 Direct Link

Link Analysis &amp; Social Media: A New And Powerful Investigation Tactic Link Analysis &amp;

Access Link at UHCprovider.com Sign in to Link by clicking on the Link button in the top right

Modification 630R UK Link Gemini considerations April 2018 Gemini UK Link Gemini holds

Data Mining Techniques CS 6220 - Section 3 - Fall 2016 Lecture 17: Link Analysis Jan-Willem van

Link Analysis Lecture 7 Link Analysis November 29, 2017 1 CS6220 Data Mining Techniques

KAGRA data management and analysis N.Kanda, on behalf of KAGRA collaboration The 3rd KAGRA

Asia Pacific Advanced etwork APA

INFO 4300 / CS4300 Information Retrieval slides adapted from Hinrich Sch utzes, linked from

11/26/12 F ailures Problems:Links and switches could fail Advertisements could get lost

Recent Developments in GNU Autotools Ralf Wildenhues GHM July 2010 The Hague Netherlands

Data-Intensive Distributed Computing CS 431/631 451/651 (Fall 2020) Part 4: Analyzing Graphs

Link Analysis & Social Media: A New And Powerful Investigation Tactic Link Analysis &