Weighted Graphs and Disconnected Components: Patterns and a Generator
Mary McGlohon, Leman Akoglu, Christos Faloutsos
Carnegie Mellon University, School of Computer Science
McGlohon, Akoglu, Faloutsos KDD08
Disconnected components
- In graphs, a largest (giant) connected component, the GCC, emerges.
- What about the smaller-size components?
- How do they emerge, and join with the large one?
“Disconnected” components
Weighted edges
- Graphs have heavy-tailed degree distribution.
- What can we also say about these edges?
- How are they repeated, or otherwise weighted?
Our goals
- Observe “next-largest connected components” (NLCC’s)
– Q1: How does the GCC emerge?
– Q2: How do NLCC’s emerge and join with the GCC?
- Find properties that govern edge weights
– Q3: How does the total weight of the graph relate to the number of edges?
– Q4: How do the weights of nodes relate to degree?
– Q5: Does this relation change with the graph?
- Q6: Can we produce an emergent, generative model?
Outline
- Motivation
- Related work
- Preliminaries
- Data
- Observations
- Model
- Summary
Properties of networks
- Small diameter (“small world” phenomenon)
– [Milgram 67] [Leskovec, Horvitz 07]
- Heavy-tailed degree distribution
– [Barabasi, Albert 99] [Faloutsos, Faloutsos, Faloutsos 99]
- Densification
– [Leskovec, Kleinberg, Faloutsos 05]
- “Middle region” components as well as GCC and singletons
– [Kumar, Novak, Tomkins 06]
Generative Models
- Erdos-Renyi model [Erdos, Renyi 60]
- Preferential Attachment [Barabasi, Albert 99]
- Forest Fire model [Leskovec, Kleinberg, Faloutsos 05]
- Kronecker multiplication [Leskovec, Chakrabarti, Kleinberg, Faloutsos 07]
- Edge Copying model [Kumar, Raghavan, Rajagopalan, Sivakumar, Tomkins, Upfal 00]
- “Winners don’t take all” [Pennock, Flake, Lawrence, Glover, Giles 02]
Diameter
- Diameter of a graph is the “longest shortest path”.
- Effective diameter is the distance within which 90% of connected node pairs can reach each other.
[Figure: example graph on nodes n1 to n7 with diameter = 3]
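The two definitions above can be sketched with breadth-first search. This is a minimal illustration, not the authors' code: the seven-node graph is a hypothetical stand-in for the figure (whose edges are not recoverable from the slide), and the effective diameter is computed as an integer hop count without the interpolation some papers use.

```python
import math
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameters(adj, fraction=0.9):
    """Return (diameter, effective diameter) of an undirected graph."""
    # Shortest-path length of every ordered pair of distinct, connected nodes.
    lengths = []
    for s in adj:
        lengths.extend(d for t, d in bfs_distances(adj, s).items() if t != s)
    lengths.sort()
    # Smallest distance d such that at least `fraction` of pairs are within d.
    k = math.ceil(fraction * len(lengths)) - 1
    return lengths[-1], lengths[k]

# Hypothetical example graph (assumed edges, chosen so that the diameter is 3).
edge_list = [("n1", "n2"), ("n2", "n3"), ("n3", "n4"),
             ("n2", "n5"), ("n3", "n6"), ("n3", "n7")]
adj = {}
for u, v in edge_list:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

diameter, effective = diameters(adj)
```

Running BFS from every node is quadratic in the graph size, so for graphs as large as those studied here a sampled or approximate computation would be needed.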
Unipartite Networks
- Postnet: Posts in blogs, hyperlinks between them
- Blognet: Aggregated Postnet, repeated edges
- Patent: Patent citations
- NIPS: Academic citations
- Arxiv: Academic citations
- NetTraffic: Packets, repeated edges
- Autonomous Systems (AS): Packets, repeated edges
[Figure: example graph on nodes n1 to n7 with edge weights 10, 1.2, 8.3, 2, 6, 1]
Unipartite Networks
- (Nodes, Edges, Time span)
- Postnet: 250K, 218K, 80 days
- Blognet: 60K, 125K, 80 days
- Patent: 4M, 8M, 17 yrs
- NIPS: 2K, 3K, 13 yrs
- Arxiv: 30K, 60K, 13 yrs
- NetTraffic: 21K, 3M, 52 mo
- AS: 12K, 38K, 6 mo
Bipartite Networks
- IMDB: Actor-movie network
- Netflix: User-movie ratings
- DBLP: repeated edges
– Author-Keyword
– Keyword-Conference
– Author-Conference
- US Election Donations: $ weights, repeated edges
– Orgs-Candidates
– Individuals-Orgs
[Figure: example bipartite graph between nodes n1 to n4 and m1 to m3, with edge weights 10, 1.2, 2, 1, 5, 6]
Bipartite Networks
- (Nodes, Edges, Time span)
- IMDB: 757K, 2M, 114 yr
- Netflix: 125K, 14M, 72 mo
- DBLP: 25 yr
– Author-Keyword: 27K, 189K
– Keyword-Conference: 10K, 23K
– Author-Conference: 17K, 22K
- US Election Donations: 22 yr
– Orgs-Candidates: 23K, 877K
– Individuals-Orgs: 6M, 10M
Observation 1: Gelling Point
Q1: How does the GCC emerge?
- Most real graphs display a gelling point, or burning off period.
- The gelling point is marked by a spike in diameter; after it, graphs exhibit typical behavior.
[Figure: diameter over time for IMDB, with t=1914 marked]
Observation 2: NLCC behavior
Q2: How do NLCC’s emerge and join with the GCC? Do they continue to grow in size? Do they shrink? Stabilize?
- After the gelling point, the GCC takes off, but NLCC’s remain constant or oscillate.
[Figure: connected component (CC) size over time for IMDB]
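One way to reproduce a plot like this on your own data is to stream time-stamped edges through a union-find structure and record the three largest component sizes after each edge. The sketch below is an assumption about how such a measurement could be done, not the authors' tooling; the edge list at the end is made up.

```python
class DisjointSet:
    """Union-find that keeps component sizes for root nodes only."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        if x not in self.parent:          # first time we see this node
            self.parent[x] = x
            self.size[x] = 1
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra              # attach smaller tree under larger
        self.size[ra] += self.size.pop(rb)

def component_sizes_over_time(timed_edges):
    """Return [(t, gcc, second, third)] recorded after each (t, u, v) edge."""
    ds = DisjointSet()
    history = []
    for t, u, v in sorted(timed_edges):
        ds.union(u, v)
        top = sorted(ds.size.values(), reverse=True)[:3]
        top += [0] * (3 - len(top))       # pad when fewer than 3 components
        history.append((t, *top))
    return history

# Hypothetical time-stamped edges.
timed_edges = [(1, "a", "b"), (2, "c", "d"), (3, "e", "f"),
               (4, "b", "c"), (5, "g", "h")]
history = component_sizes_over_time(timed_edges)
```

Sorting all component sizes at every step is simple but slow; for large graphs, recording sizes only at chosen timestamps keeps the cost manageable.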
Observation 3
Q3: How does the total weight of the graph relate to the number of edges?
Observation 3: Fortification Effect
- Does total $ grow linearly with the number of checks?
- Weight additions follow a power law with respect to the number of edges: W(t) ∝ E(t)^w
– W(t): total weight of graph at time t
– E(t): total edges of graph at time t
– w is the power-law (PL) exponent
– 1.01 < w < 1.5, i.e. super-linear!
– (more checks, even more $)
[Figure: total $ versus number of checks for Orgs-Candidates, with 1980 and 2004 marked]
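Given cumulative snapshots of E(t) and W(t), the exponent w can be estimated as the slope of a straight-line fit in log-log space. The sketch below assumes ordinary least squares, which may differ from the fitting procedure used for the slides, and the snapshot values are synthetic (generated with exponent 1.3), not data from the talk.

```python
import math

def power_law_exponent(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    var = sum((a - mx) ** 2 for a in lx)
    return cov / var

# Synthetic snapshots: total edges E(t) and total weight W(t) = E(t)**1.3.
edges = [100, 1_000, 10_000, 100_000]
weights = [e ** 1.3 for e in edges]

w = power_law_exponent(edges, weights)   # slope of the log-log line
```

A slope above 1 corresponds to the super-linear "more checks, even more $" behavior.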
Observations 4 and 5
Q4: How do the weights of nodes relate to degree?
Q5: Does this relation change over time?
Observation 4: Snapshot Power Law
- At any time, the total incoming weight of a node follows a power law in its in-degree, with exponent iw.
– 1.01 < iw < 1.26, super-linear
- More donors, even more $
- E.g. John Kerry: $10M received, from 1K donors
[Figure: in-weights ($) versus edges (# donors) for Orgs-Candidates]
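To measure iw on one snapshot, each node's in-degree and total in-weight are first aggregated from the weighted edge list, then fitted in log-log space. The sketch below makes two assumptions not stated on the slide: in-degree counts distinct sources (so repeated edges count once), and the fit is ordinary least squares. The donor and candidate names and amounts are synthetic.

```python
import math
from collections import defaultdict

def in_degree_and_weight(weighted_edges):
    """Map each target node to (in-degree, total in-weight)."""
    sources = defaultdict(set)      # distinct sources per target
    weight = defaultdict(float)     # summed weight per target
    for src, dst, w in weighted_edges:
        sources[dst].add(src)
        weight[dst] += w
    return {node: (len(sources[node]), weight[node]) for node in sources}

def snapshot_exponent(stats):
    """Least-squares slope of log(in-weight) against log(in-degree)."""
    xs = [math.log(d) for d, _ in stats.values()]
    ys = [math.log(w) for _, w in stats.values()]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

# Synthetic snapshot: a candidate with d donors receives d**1.2 in total.
edges = []
for c, d in enumerate([1, 10, 100, 1000]):
    for donor in range(d):
        edges.append((f"donor{donor}", f"cand{c}", d ** 1.2 / d))

stats = in_degree_and_weight(edges)
iw = snapshot_exponent(stats)
```

Repeating this on snapshots taken at different times gives the exponent-over-time curve of Observation 5.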
Observation 5: Snapshot Power Law
- For a given graph, this exponent is constant over time.
[Figure: exponent over time for Orgs-Candidates]
Q6: Is there a generative, “emergent” model?
Goals of model
- a) Emergent, intuitive behavior
- b) Shrinking diameter
- c) Constant NLCC’s
- d) Densification power law
- e) Power-law degree distribution
= “Butterfly” Model
Butterfly model in action
- A node joins a network, with its own parameter pstep (“Curiosity”).
- With (global) phost (“Cross-disciplinarity”), chooses a random host.
– With (global) plink (“Friendliness”), creates link
– With pstep, travels to random neighbor. Repeat.
- Once there are no more “steps”, repeat “host” procedure:
– With phost, choose new host, possibly link, etc.
– Until no more steps, and no more hosts.
[Figure: animation of the incoming node visiting hosts and their neighbors on an example graph with nodes n1 to n8]
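The procedure above can be sketched as a short simulation. This is a reading of the slides, not the authors' implementation, and it fills in several details by assumption: hosts are drawn uniformly from existing nodes (possibly the same one twice), the walk uses the graph as it was before the new node's own links are added, and the function name `butterfly` and its signature are hypothetical.

```python
import random

def butterfly(n_nodes, p_host=0.3, p_link=0.5, seed=0):
    """Grow an undirected graph; returns {node: set of neighbors}."""
    rng = random.Random(seed)
    adj = {0: set()}
    for new in range(1, n_nodes):
        p_step = rng.random()              # node-specific "curiosity", U(0,1)
        links = set()
        # With probability p_host pick a random host; repeat until the coin fails.
        while rng.random() < p_host:
            current = rng.randrange(new)   # existing nodes are 0 .. new-1
            while True:
                if rng.random() < p_link:  # "friendliness": link to visited node
                    links.add(current)
                neighbors = adj[current]
                # With probability p_step walk on to a random neighbor.
                if not neighbors or rng.random() >= p_step:
                    break
                current = rng.choice(sorted(neighbors))
        # A node with no host, or no successful link, starts a new component.
        adj[new] = set(links)
        for v in links:
            adj[v].add(new)
    return adj

graph = butterfly(2000, seed=1)
n_edges = sum(len(nbrs) for nbrs in graph.values()) // 2
```

Because a node may choose several hosts, it can link into several components at once and merge them; because it may link to none, small components keep appearing.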
a) Emergent, intuitive behavior
Novelties of model:
- Nodes link with probability
– May choose host, but not link (start new component)
- Incoming nodes are “social butterflies”
– May have several hosts (merges components)
- Some nodes are friendlier than others
– pstep different for each node
– This creates power-law degree distribution (theorem)
Validation of Butterfly
- Chose the following parameters:
– phost = 0.3
– plink = 0.5
– pstep ~ U(0,1)
- Ran 10 simulations
- 100,000 nodes per simulation
b) Shrinking diameter
- In the model, gelling usually occurred around N=20,000
[Figure: diameter versus number of nodes, with N=20,000 marked]
c) Oscillating NLCC’s
- Constant / oscillating NLCC’s
[Figure: NLCC size versus number of nodes, with N=20,000 marked]
d) Densification power law
- Densification exponent a:
– Our datasets had a between 1.03 and 1.7
– In [Leskovec+05KDD], a between 1.1 and 1.7
– Simulation produced a between 1.1 and 1.2
[Figure: edges versus nodes, with N=20,000 marked]
e) Power-law degree distribution
- Exponents approx. -2
[Figure: count versus degree]
Summary
- Studied several diverse public graphs
– Measured at many timestamps
– Unipartite and bipartite
– Blogs, citations, real-world, network traffic
– Largest was 6 million nodes, 10 million edges
Summary
- Observations on unweighted graphs:
– A1: The GCC emerges at the “gelling point”
– A2: NLCC’s are of constant / oscillating size
- Observations on weighted graphs:
– A3: Total weight increases super-linearly with edges
– A4: Nodes’ weights increase super-linearly with degree, power law exponent iw
– A5: iw remains constant over time
- A6: Intuitive, emergent generative “butterfly” model that matches these properties
References
[Barabasi+99] Barabasi, A. L. & Albert, R. (1999), 'Emergence of scaling in random networks', Science 286(5439), 509-512.
[Erdos+60] Erdos, P. & Renyi, A. (1960), 'On the evolution of random graphs', Publ. Math. Inst. Hungary. Acad. Sci. 5, 17-61.
[Faloutsos+99] Faloutsos, M.; Faloutsos, P. & Faloutsos, C. (1999), 'On Power-law Relationships of the Internet Topology', SIGCOMM, 251-262.
[Kumar+99] Kumar, R.; Raghavan, P.; Rajagopalan, S.; Sivakumar, D.; Tomkins, A. & Upfal, E. (2000), 'Stochastic models for the Web graph', Proceedings of the 41st FOCS, pp. 57-65.
[Kumar+06] Kumar, R.; Novak, J. & Tomkins, A. (2006), 'Structure and evolution of online social networks', in 'KDD '06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining', pp. 611-617.
[Leskovec+05KDD] Leskovec, J.; Kleinberg, J. & Faloutsos, C. (2005), 'Graphs over time: densification laws, shrinking diameters and possible explanations', in 'KDD '05'.
[Leskovec+07] Leskovec, J. & Faloutsos, C. (2007), 'Scalable modeling of real graphs using Kronecker Multiplication', ICML 2007.
[Milgram+67] Milgram, S. (1967), 'The small-world problem', Psychology Today 2, 60-67.
[Pennock+02] 'Winners don’t take all: Characterizing the competition for links on the web', PNAS 2002.
[Wang+2002] Wang, M.; Madhyastha, T.; Chang, N. H.; Papadimitriou, S. & Faloutsos, C. (2002), 'Data Mining Meets Performance Evaluation: Fast Algorithms for Modeling
Contact us
Leman Akoglu, www.andrew.cmu.edu/~lakoglu, lakoglu@cs.cmu.edu
Christos Faloutsos, www.cs.cmu.edu/~christos, christos@cs.cmu.edu
Mary McGlohon, www.cs.cmu.edu/~mmcgloho, mmcgloho@cs.cmu.edu
Entropy plots [Wang+2002]
- From time series data, begin with resolution r = T/2.
- Record entropy H_r.
- Recursively take finer resolutions.
[Figure: Δ weights over time, and entropy versus resolution]
Entropy Plots
- Self-similarity: linear plot (slope s = 0.59 in the example shown)
- Uniform: slope of plot s = 1.
- Point mass: s = 0
[Figure: entropy versus resolution, with example time series for the uniform and point-mass cases]
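The entropy-plot procedure can be sketched as follows. The sketch assumes that "resolution" level k means splitting the series into 2^k equal windows (k = 1 being two halves of length T/2), that entropy is measured in bits over each window's share of the total weight, and that the series length is a power of two; none of these details are spelled out on the slides.

```python
import math

def entropy_plot(series):
    """Return [(k, H_k)]: entropy of the series split into 2**k equal windows."""
    n = len(series)                 # assumed to be a power of two
    total = sum(series)
    points = []
    k = 1
    while 2 ** k <= n:
        width = n // 2 ** k
        masses = [sum(series[i:i + width]) / total for i in range(0, n, width)]
        h = -sum(p * math.log2(p) for p in masses if p > 0)
        points.append((k, h))
        k += 1
    return points

def slope(points):
    """Least-squares slope of entropy against resolution level."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

uniform = [1.0] * 64                # weight spread evenly over time
point_mass = [0.0] * 63 + [5.0]     # all weight added in a single time tick

s_uniform = slope(entropy_plot(uniform))
s_point = slope(entropy_plot(point_mass))
```

The two synthetic series reproduce the limiting cases on the slide: the uniform series gains one bit per level, while the point mass never gains any. Bursty but self-similar traffic falls in between.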