Web Dynamics
Part 2 – Modeling static and evolving graphs
2.1 The Web graph and its static properties
2.2 Generative models for random graphs
2.3 Measures of node importance
Summer Term 2009

Notation: Graphs
• G = (V(G), E(G)). We will drop G when the graph is clear from the context.
– directed graph: E(G) ⊆ V(G) × V(G)
– undirected graph: E(G) ⊆ {{v,w} ⊆ V(G)}
• Degrees of nodes in directed graphs:
– indegree of node n: indeg(n) = |{(v,w) ∈ E(G) : w = n}|
– outdegree of node n: outdeg(n) = |{(v,w) ∈ E(G) : v = n}|
• Degree of node n in an undirected graph:
– deg(n) = |{e ∈ E(G) : n ∈ e}|
• Distributions of degree, indegree, outdegree:
$$P_{\deg,G}(k) = \frac{|\{n \in V(G) : \deg(n) = k\}|}{|V(G)|}$$
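The degree distribution defined above can be computed directly from an edge list. The following is a minimal sketch, assuming a directed graph given as a list of (v, w) pairs; the function name `degree_distribution` and the toy graph are illustrative only and not part of the lecture.

```python
from collections import Counter

def degree_distribution(nodes, edges, kind="in"):
    """Return {k: fraction of nodes whose in-/outdegree equals k}."""
    if kind == "in":
        counts = Counter(w for _, w in edges)   # indeg(n) = |{(v,w) in E : w = n}|
    elif kind == "out":
        counts = Counter(v for v, _ in edges)   # outdeg(n) = |{(v,w) in E : v = n}|
    else:
        raise ValueError("kind must be 'in' or 'out'")
    # Nodes without any matching edge have degree 0 and must be counted too.
    histogram = Counter(counts.get(n, 0) for n in nodes)
    return {k: c / len(nodes) for k, c in sorted(histogram.items())}

# Hypothetical toy graph with four nodes.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("c", "b"), ("a", "c"), ("d", "a")]
p_in = degree_distribution(nodes, edges, "in")
p_out = degree_distribution(nodes, edges, "out")
print(p_in, p_out)
```

Dividing by |V(G)| rather than by the number of nodes that appear in some edge keeps isolated nodes in the distribution, as in the definition.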

Web Graph W
• Nodes are URLs on the Web
– No dynamic pages, often only HTML-like pages
• Edges correspond to links
– directed edges, sparse
• Highly dynamic, impossible to grab a snapshot at any fixed time
⇒ large-scale crawls as approximations/samples

Degree distributions
• Assuming the average indegree is 3, what would be the shape of $P_{\mathrm{in},W}$?

Degree distributions
[Figure: fraction of nodes vs. degree]

Power Law Distributions
Distribution P(k) follows a power law if
$$P(k) = C \cdot k^{-\beta}$$
for a real constant C > 0 and a real exponent β > 0 (needs normalization to become a probability distribution).
Moments of order m are finite iff β > m+1:
$$E[X^m] = \sum_{k=1}^{\infty} k^m \cdot P(k) = C \cdot \sum_{k=1}^{\infty} k^{m-\beta} = C \cdot \zeta(\beta - m)$$
Heavy-tailed distribution: P(k) decays polynomially to 0.
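The moment condition can be illustrated numerically. The sketch below assumes a truncated support k = 1..k_max (the exact normalization would use the zeta function); the helper names `power_law_pmf` and `moment` are illustrative. For β = 2.1 the first moment settles as k_max grows, while the second moment keeps growing, in line with the condition β > m+1.

```python
def power_law_pmf(beta, k_max):
    """Normalized P(k) = C * k**(-beta) on the truncated support k = 1..k_max."""
    weights = [k ** (-beta) for k in range(1, k_max + 1)]
    c = 1.0 / sum(weights)              # normalization constant C
    return [c * w for w in weights]

def moment(pmf, m):
    """m-th moment sum_k k**m * P(k) for a pmf given on k = 1..len(pmf)."""
    return sum((k ** m) * p for k, p in enumerate(pmf, start=1))

# beta = 2.1: mean is finite (beta > 2), second moment is not (beta < 3).
for k_max in (10**3, 10**4, 10**5):
    pmf = power_law_pmf(2.1, k_max)
    print(k_max, moment(pmf, 1), moment(pmf, 2))
```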

Power-law distributions in log-log scale
Parameter fitting in log-log scale (fit a linear function)
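Taking logarithms of P(k) = C·k^(−β) gives log P(k) = log C − β·log k, so a power law is a straight line with slope −β in log-log scale. A minimal sketch of the linear fit, assuming ordinary least squares on the (log k, log P(k)) points; `fit_power_law` is a hypothetical helper name, and the input here is synthetic, noise-free data.

```python
import math

def fit_power_law(ks, ps):
    """Least-squares line through (log k, log P(k)); returns (beta, C)."""
    xs = [math.log(k) for k in ks]
    ys = [math.log(p) for p in ps]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)   # slope = -beta, intercept = log C

# Synthetic example: exact power law with beta = 2.1 and C = 0.5.
ks = list(range(1, 101))
ps = [0.5 * k ** -2.1 for k in ks]
beta, c = fit_power_law(ks, ps)
print(beta, c)
```

On noisy empirical data this simple fit can be sensitive to the sparsely populated tail, so the result should be read as a rough estimate.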

Degree distributions of the Web
Based on an AltaVista crawl in May 1999 (203 million URLs, 1466 million links)
A. Broder et al.: Graph structure in the Web, Computer Networks 33:309-320, 2000
[Figure: indegree distribution, β = 2.1; outdegree distribution, β = 2.72]

Examples for Power Laws in the Web
• Web page sizes
• Web page access statistics
• Web browsing behavior
• Web page connectivity
• Web connected components size

More graphs with power-law degrees
• Connectivity of Internet routers and hosts
• Call graphs in telephone networks
• Power grid of the western United States
• Citation networks
• Collaborators of Paul Erdős
• Collaboration graph of actors (IMDB)

Scale-Freeness
Scaling k by a constant factor yields a proportional change in P(k), independent of the absolute value of k:
$$P(ak) = C \cdot (ak)^{-\beta} = C \cdot a^{-\beta} \cdot k^{-\beta} = a^{-\beta} \cdot P(k)$$
(similar to 80/20 or 90/10 rules)
Additionally: results are often independent of graph size (Web or single domain)
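A quick numeric check of this property, as a sketch with arbitrarily chosen values C = 1, β = 2.1 and a = 10: the ratio P(ak)/P(k) is the same at every k for the power law, while for an exponentially decaying function (used here only as a contrast) it depends on k.

```python
import math

def power_law(k, c=1.0, beta=2.1):
    return c * k ** (-beta)

def exponential(k, rate=0.1):
    return math.exp(-rate * k)

a = 10.0
# The same ratio a**(-beta) at k = 1, 10, 1000 for the power law ...
power_ratios = [power_law(a * k) / power_law(k) for k in (1, 10, 1000)]
# ... but a k-dependent ratio for the exponential.
exp_ratios = [exponential(a * k) / exponential(k) for k in (1, 10, 100)]
print(power_ratios, exp_ratios)
```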

Zipfian vs. Power-Law
Zipfian distribution: power-law distribution of ranks, not numbers
• Input: map item → value (e.g., terms and their count)
• Sort items by descending value (any tie breaking)
• Plot (k, value of item at position k) pairs and consider their distribution
Important example: frequency of words in large texts (but: also occurs in completely random texts)
Other related laws:
• Benford's Law: distribution of first digits in numbers
• Heaps' Law: number of distinct words in a text
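The three steps above (map items to values, sort descending, pair each position with its value) can be sketched as follows. The helper name `rank_value_pairs` and the sample sentence are illustrative assumptions.

```python
from collections import Counter

def rank_value_pairs(values):
    """Sort item values in descending order and pair each with its rank k = 1, 2, ..."""
    ordered = sorted(values.values(), reverse=True)   # ties broken arbitrarily
    return list(enumerate(ordered, start=1))

# Hypothetical input: terms and their counts in a tiny text.
text = "the cat and the dog and the bird"
counts = Counter(text.split())
pairs = rank_value_pairs(counts)
print(pairs)
```

Plotting these pairs in log-log scale for a large text is what produces the familiar Zipf plot.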

Example: Term distribution in Wikipedia
http://en.wikipedia.org/wiki/File:Wikipedia-n-zipf.png
[Figure: term frequency vs. term rank]
The most popular words are "the", "of" and "and" (so-called "stopwords")

Diameters
How many clicks away are two pages?
For two nodes u,v ∈ V: d(u,v) is the minimal length of a path from u to v
Scale-free graphs: d has a Normal distribution (Albert, 1999)
• Average path length
– E[d] = O(log n), n number of nodes
– For the Web: E[d] ≈ 0.35 + 2.06·$\log_{10} n$ (avg. 21 hops distance)
– Undirected: O(ln ln n) (Cohen & Havlin, 2003)
• Maximal path length ("diameter")
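For a graph small enough to hold in memory, d(u,v) can be computed by breadth-first search from every node. The following is a minimal sketch, assuming the graph is given as a dict mapping each node to its out-neighbors; the names `bfs_distances` and `path_length_stats` are illustrative. Averages are taken over connected ordered pairs only, since d(u,v) is undefined when no path exists.

```python
from collections import deque

def bfs_distances(adj, source):
    """Minimal path lengths d(source, v) for all v reachable from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def path_length_stats(adj):
    """Return (average distance, maximal distance, fraction of connected ordered pairs)."""
    nodes = set(adj) | {v for targets in adj.values() for v in targets}
    total = longest = connected = 0
    for u in nodes:
        for v, d in bfs_distances(adj, u).items():
            if v != u:
                total += d
                longest = max(longest, d)
                connected += 1
    pairs = len(nodes) * (len(nodes) - 1)
    if connected == 0:
        return 0.0, 0, 0.0
    return total / connected, longest, connected / pairs

# Hypothetical toy graph: a directed 3-cycle a -> b -> c -> a plus d -> a.
adj = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
avg, longest, frac = path_length_stats(adj)
print(avg, longest, frac)
```

All-pairs BFS costs O(n·(n+m)) time, so on Web-scale crawls such statistics are estimated from a sample of start nodes instead.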

Diameters
From Broder et al., 2000:
• only 24% of node pairs are connected through a directed path
• average connected directed distance: 16
• average connected undirected distance: 7
⇒ small world only for connected nodes!

Connected components
(Their sample of the) Web graph contains
• one giant weakly connected component with 91% of nodes
• one giant strongly connected component with 28% of nodes
(even after removing well-connected nodes)
A. Broder et al.: Graph structure in the Web, Computer Networks 33:309-320, 2000
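The two notions of component can be illustrated on a small edge list. This sketch is not the method used by Broder et al.; it relies on two elementary facts: the weakly connected component of a node is everything reachable when edge directions are ignored, and its strongly connected component is the intersection of the nodes reachable forward and backward. The names `reachable` and `largest_components` are illustrative.

```python
from collections import defaultdict

def reachable(adj, start):
    """All nodes reachable from start by following edges in adj (depth-first)."""
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def largest_components(edges):
    """Return (size of largest weakly connected component, size of largest SCC)."""
    fwd, bwd, und = defaultdict(set), defaultdict(set), defaultdict(set)
    nodes = set()
    for u, v in edges:
        fwd[u].add(v)
        bwd[v].add(u)
        und[u].add(v)
        und[v].add(u)
        nodes.update((u, v))

    def largest(component_of):
        best, todo = 0, set(nodes)
        while todo:
            comp = component_of(next(iter(todo)))
            best = max(best, len(comp))
            todo -= comp            # components partition the node set
        return best

    wcc = largest(lambda n: reachable(und, n))
    scc = largest(lambda n: reachable(fwd, n) & reachable(bwd, n))
    return wcc, scc

# Hypothetical toy graph: a 3-cycle (1, 2, 3) with an outgoing and an incoming tendril.
edges = [(1, 2), (2, 3), (3, 1), (3, 4), (5, 4)]
print(largest_components(edges))
```

This brute-force approach is quadratic in the worst case; linear-time SCC algorithms (e.g. Tarjan's or Kosaraju's) are the usual choice for large graphs.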

Bow-Tie Structure of the Web
A. Broder et al.: Graph structure in the Web, Computer Networks 33:309-320, 2000

Connectivity of Power-Law Graphs
(Undirected) connectivity depends on β:
• β < 1: connected with high probability
• 1 < β < 2: one giant component of size O(n), all others of size O(1)
• 2 < β < $\beta_0$ = 3.4785: one giant component of size O(n), all others of size O(log n)
• β > $\beta_0$: no giant component with high probability
(Aiello et al., 2001)

Block structure of Web links
S.D. Kamvar et al.: Exploiting the block structure of the Web for computing PageRank, WWW conference, 2003

Neighborhood sizes
N(h): number of pairs of nodes at distance ≤ h
When the average degree is 3, how many neighbors can be expected at distance 1, 2, 3, …?
1 hop: 3 neighbors
2 hops: 3·3 = 9 neighbors
h hops: $3^h$ neighbors

Neighborhood sizes
N(h): number of pairs of nodes at distance ≤ h
When the average degree is 3, how many neighbors can be expected at/up to distance 1, 2, 3, …?
1 hop: 3 neighbors
2 hops: 3·3 = 9 neighbors
h hops: $3^h$ neighbors
Not true in general! (duplicates ⇒ over-estimation)
$N(h) \propto h^H$ (hop exponent) [Faloutsos et al., 1999]
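The over-estimation is easy to see on a small example. The sketch below computes N(h) by breadth-first search, counting ordered pairs (u, v) with u ≠ v (a convention assumed here; other definitions include self-pairs). On a ring, where every node has degree 2, N(h) grows linearly in h rather than like 2^h. The function name `neighborhood_function` and the ring graph are illustrative.

```python
from collections import deque

def neighborhood_function(adj, max_h):
    """N(h) for h = 0..max_h: number of ordered pairs (u, v), u != v, at distance <= h."""
    counts = [0] * (max_h + 1)
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            if dist[u] == max_h:        # do not explore beyond max_h hops
                continue
            for v in adj[u]:
                if v not in dist:       # each node is counted once, at its minimal distance
                    dist[v] = dist[u] + 1
                    counts[dist[v]] += 1
                    queue.append(v)
    for h in range(1, max_h + 1):       # turn "exactly h" into "at most h"
        counts[h] += counts[h - 1]
    return counts

# Hypothetical example: undirected ring with 20 nodes, every node has degree 2.
n = 20
ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
n_of_h = neighborhood_function(ring, 4)
print(n_of_h)
```

Here N(h) = 40·h for small h, i.e. a hop exponent of H = 1, whereas the naive estimate would predict exponential growth.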

Neighborhood sizes
Intuition: H ~ "fractal dimensionality" of the graph
[Figure: example graphs with $N(h) \propto h^2$ and $N(h) \propto h^1$]

Web Dynamics
Part 2 – Modeling static and evolving graphs
2.1 The Web graph and its static properties
2.2 Generative models for random graphs
2.3 Measures of node importance

Requirements for a Web graph model
• Online: the number of nodes and edges changes with time
• Power-law: the degree distribution follows a power law with exponent β > 2
• Small-world: the average distance is much smaller than O(n)
• Possibly more features of the Web graph…

Random Graphs: Erdős-Rényi
G(n,p) for undirected random graphs:
• Fix n (number of nodes)
• For each pair of nodes, independently add an edge with uniform probability p
Degree distribution: binomial
$$P_{\deg}(k) = \binom{n-1}{k} \, p^k (1-p)^{n-1-k}$$
where $\binom{n-1}{k}$ picks k out of n-1 targets and $p^k (1-p)^{n-1-k}$ is the probability to have exactly k edges.
$p = \frac{\ln n}{n}$ is the threshold for the connectivity of G(n,p)
⇒ cannot be used to model the Web graph (no power-law degree distribution)
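The G(n,p) construction and the binomial degree formula translate almost literally into code. This is a minimal sketch using only the standard library; the names `gnp` and `binomial_degree_pmf`, the seed parameter, and the values n = 200, p = 0.05 are illustrative assumptions.

```python
import math
import random
from itertools import combinations

def gnp(n, p, seed=None):
    """Undirected G(n,p): each of the n*(n-1)/2 node pairs becomes an edge with probability p."""
    rng = random.Random(seed)
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]

def binomial_degree_pmf(n, p, k):
    """P_deg(k) = C(n-1, k) * p**k * (1-p)**(n-1-k)."""
    return math.comb(n - 1, k) * p ** k * (1 - p) ** (n - 1 - k)

n, p = 200, 0.05
edges = gnp(n, p, seed=1)
degree = [0] * n
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

# Compare the empirical mean degree with the expected value (n-1)*p.
print(sum(degree) / n, (n - 1) * p)
```

The binomial tail decays exponentially fast, which is why very high degrees are essentially absent in G(n,p).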

Example: p = 0.01
http://upload.wikimedia.org/wikipedia/commons/1/13/Erdos_generated_network-p0.01.jpg

Preferential attachment
Idea (Barabási & Albert, 1999):
• mimic the creation of links on the Web
• links to "important" pages are more likely than links to random pages
Generation algorithm:
• Start with a set of $M_0$ nodes
• When a new node is added, add m ≤ $M_0$ random edges; probability of adding an edge to node v:
$$\frac{\deg(v)}{\sum_{w} \deg(w)}$$
Result: power-law degree distribution with β = 2.9 for $M_0$ = m = 5 (from simulation)
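The generation algorithm can be sketched as below. Two details are assumptions of this sketch, not statements from the slide: while all degrees are still zero (the very first step), targets are picked uniformly among the initial nodes, and targets are drawn with replacement, so multiple edges to the same node are allowed. The function name `preferential_attachment` is illustrative.

```python
import random

def preferential_attachment(m0, m, steps, seed=None):
    """Grow a graph from m0 isolated nodes; each new node adds m degree-proportional edges."""
    if not 1 <= m <= m0:
        raise ValueError("need 1 <= m <= m0")
    rng = random.Random(seed)
    edges = []
    # Every node appears in `endpoints` once per incident edge, so a uniform
    # draw from this list selects node v with probability deg(v) / sum_w deg(w).
    endpoints = []
    for new in range(m0, m0 + steps):
        if endpoints:
            targets = [rng.choice(endpoints) for _ in range(m)]
        else:
            # Assumption: all degrees are 0 in the first step, so choose uniformly.
            targets = rng.sample(range(m0), m)
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges

edges = preferential_attachment(m0=5, m=5, steps=1000, seed=0)
print(len(edges))
```

Because targets are drawn before the new node's own endpoints are appended, a new node never links to itself.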

Analysis of Preferential Attachment
(Using "mean field" analysis and assuming continuous time, see Baldi et al.)
After t steps: $M_0 + t$ nodes, $tm$ edges
Consider node v with $k_v(t)$ edges after step t:
$$k_v(t+1) - k_v(t) = m \cdot \frac{k_v(t)}{2mt} = \frac{k_v(t)}{2t}$$
(considering expectations, allowing multiple edges)
$$\frac{\partial k_v}{\partial t} = \frac{k_v}{2t}$$
(assuming continuous time, considering the differential equation)
with initial condition $k_v(t_v) = m$ ($t_v$: time when v was added)
This can be solved as
$$k_v(t) = m \sqrt{\frac{t}{t_v}}$$
(older nodes grow faster than younger ones)
Further analysis shows that
$$P(k) = \frac{2m^2}{k^3}$$
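The intermediate steps can be filled in as follows. The first two lines solve the differential equation by separation of variables. The last two lines sketch the "further analysis" under an additional assumption that is standard in mean-field treatments but not stated on the slide: the arrival time $t_v$ of a random node is uniformly distributed over the $M_0 + t$ nodes.

```latex
\begin{align*}
\frac{\partial k_v}{\partial t} = \frac{k_v}{2t}
  &\;\Longrightarrow\; \frac{dk_v}{k_v} = \frac{dt}{2t}
  \;\Longrightarrow\; \ln k_v = \tfrac{1}{2}\ln t + c \\
k_v(t_v) = m
  &\;\Longrightarrow\; k_v(t) = m\sqrt{t/t_v} \\
P[k_v(t) < k] &= P\!\left[t_v > \frac{m^2 t}{k^2}\right]
  = 1 - \frac{m^2 t}{k^2 (M_0 + t)} \\
P(k) &= \frac{\partial}{\partial k} P[k_v(t) < k]
  = \frac{2 m^2 t}{(M_0 + t)\, k^3}
  \;\xrightarrow{t \to \infty}\; \frac{2 m^2}{k^3}
\end{align*}
```

The resulting exponent β = 3 is close to the value β = 2.9 observed in simulation.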

Properties and extensions
• Diameter of generated graphs:
– O(log n) for m = 1
– O(log n / log log n) for m ≥ 2
• Extension to directed edges:
– randomly choose the direction of each added edge
– consider indegree and outdegree for edge choice
• Extensions to generate different distributions (where β ≠ 3): mixtures of operations
– Allow addition of edges between existing nodes
– Allow rewiring of edges
• Extensions for node and edge deletion required
