Weighted Graphs and Disconnected Components: Patterns and a Generator


slide-1
SLIDE 1

Weighted Graphs and Disconnected Components

Patterns and a Generator

Mary McGlohon, Leman Akoglu, Christos Faloutsos

Carnegie Mellon University School of Computer Science

slide-2
SLIDE 2

McGlohon, Akoglu, Faloutsos KDD08

slide-3
SLIDE 3
“Disconnected” components

  • In graphs, a largest connected component emerges.
  • What about the smaller-size components?
  • How do they emerge, and join with the large one?

slide-4
SLIDE 4

Weighted edges

  • Graphs have heavy-tailed degree distribution.
  • What can we also say about these edges?
  • How are they repeated, or otherwise weighted?
slide-5
SLIDE 5

Our goals

  • Observe “next-largest connected components” (NLCCs)
    – Q1: How does the giant connected component (GCC) emerge?
    – Q2: How do NLCCs emerge and join with the GCC?
  • Find properties that govern edge weights
    – Q3: How does the total weight of the graph relate to the number of edges?
    – Q4: How do the weights of nodes relate to degree?
    – Q5: Does this relation change with the graph?
  • Q6: Can we produce an emergent, generative model?

slide-6
SLIDE 6

Outline

  • Motivation
  • Related work
  • Preliminaries
  • Data
  • Observations
  • Model
  • Summary

slide-7
SLIDE 7

Properties of networks

  • Small diameter (“small world” phenomenon)
    – [Milgram 67] [Leskovec, Horvitz 07]
  • Heavy-tailed degree distribution
    – [Barabasi, Albert 99] [Faloutsos, Faloutsos, Faloutsos 99]
  • Densification
    – [Leskovec, Kleinberg, Faloutsos 05]
  • “Middle region” components as well as GCC and singletons
    – [Kumar, Novak, Tomkins 06]

slide-8
SLIDE 8

Generative Models

  • Erdos-Renyi model [Erdos, Renyi 60]
  • Preferential Attachment [Barabasi, Albert 99]
  • Forest Fire model [Leskovec, Kleinberg, Faloutsos 05]
  • Kronecker multiplication [Leskovec, Chakrabarti, Kleinberg, Faloutsos 07]
  • Edge Copying model [Kumar, Raghavan, Rajagopalan, Sivakumar, Tomkins, Upfal 00]
  • “Winners don’t take all” [Pennock, Flake, Lawrence, Glover, Giles 02]

slide-9
SLIDE 9

Outline

  • Motivation
  • Related work
  • Preliminaries
  • Data
  • Observations
  • Model
  • Summary

slide-10
SLIDE 10

Diameter

  • Diameter of a graph is the “longest shortest path”.
  • Effective diameter is the distance at which 90% of nodes can be reached.

[Figure: example graph on nodes n1 to n7, with diameter = 3]
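The two definitions above can be sketched in a few lines of Python. This is an illustration only, not the authors' code. It assumes an undirected, unweighted graph stored as an adjacency dictionary, and it reads "90% of nodes can be reached" as "the smallest distance d within which at least 90% of connected ordered node pairs lie", which is one common reading of effective diameter. The function names and the 7-node example graph are hypothetical; the graph is not the one drawn on the slide.

```python
import math
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameters(adj, fraction=0.9):
    """Return (diameter, effective diameter), counting connected pairs only."""
    pair_dists = []
    for s in adj:
        pair_dists.extend(d for t, d in bfs_distances(adj, s).items() if t != s)
    if not pair_dists:
        return 0, 0
    pair_dists.sort()
    # Index of the first distance that covers `fraction` of all connected pairs.
    k = max(0, math.ceil(fraction * len(pair_dists)) - 1)
    return pair_dists[-1], pair_dists[k]

# Hypothetical example: two hubs (n2, n3) joined by an edge, each with leaves.
edges = [("n1", "n2"), ("n2", "n3"), ("n3", "n4"),
         ("n2", "n5"), ("n3", "n6"), ("n3", "n7")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

diam, eff = diameters(adj)
print("diameter:", diam, "effective diameter:", eff)
```

Running one BFS per node is quadratic in the worst case, so on graphs the size of those in this talk a sampled or approximate variant would be needed.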

slide-13
SLIDE 13

Outline

  • Motivation
  • Related work
  • Preliminaries
  • Data
  • Observations
  • Model
  • Summary

slide-14
SLIDE 14

Unipartite Networks

  • Postnet: Posts in blogs, hyperlinks between them
  • Blognet: Aggregated Postnet, repeated edges
  • Patent: Patent citations
  • NIPS: Academic citations
  • Arxiv: Academic citations
  • NetTraffic: Packets, repeated edges
  • Autonomous Systems (AS): Packets, repeated edges

[Figure: example graph on nodes n1 to n7; one build labels an edge with a repeat count (3), another labels the edges with weights 10, 1.2, 8.3, 2, 6, 1]

slide-17
SLIDE 17

Unipartite Networks

  • (Nodes, Edges, Timestamps)
  • Postnet: 250K, 218K, 80 days
  • Blognet: 60K, 125K, 80 days
  • Patent: 4M, 8M, 17 yrs
  • NIPS: 2K, 3K, 13 yrs
  • Arxiv: 30K, 60K, 13 yrs
  • NetTraffic: 21K, 3M, 52 mo
  • AS: 12K, 38K, 6 mo

slide-18
SLIDE 18

Bipartite Networks

  • IMDB: Actor-movie network
  • Netflix: User-movie ratings
  • DBLP: repeated edges
    – Author-Keyword
    – Keyword-Conference
    – Author-Conference
  • US Election Donations: $ weights, repeated edges
    – Orgs-Candidates
    – Individuals-Orgs

[Figure: example bipartite graph between nodes n1 to n4 and m1 to m3, with edge weights 10, 1.2, 2, 1, 5, 6]

slide-21
SLIDE 21

Bipartite Networks

  • IMDB: 757K, 2M, 114 yr
  • Netflix: 125K, 14M, 72 mo
  • DBLP: 25 yr
    – Author-Keyword: 27K, 189K
    – Keyword-Conference: 10K, 23K
    – Author-Conference: 17K, 22K
  • US Election Donations: 22 yr
    – Orgs-Candidates: 23K, 877K
    – Individuals-Orgs: 6M, 10M

slide-22
SLIDE 22

Outline

  • Motivation
  • Related work
  • Preliminaries
  • Data
  • Observations
  • Model
  • Summary

slide-23
SLIDE 23

Observation 1: Gelling Point

Q1: How does the GCC emerge?

slide-24
SLIDE 24

Observation 1: Gelling Point

  • Most real graphs display a gelling point, or burning-off period.
  • After the gelling point, they exhibit typical behavior. The gelling point is marked by a spike in diameter.

[Figure: IMDB, diameter vs. time; gelling point at t=1914]

slide-25
SLIDE 25

Observation 2: NLCC behavior

Q2: How do NLCC’s emerge and join with the GCC? Do they continue to grow in size? Do they shrink? Stabilize?

slide-26
SLIDE 26

Observation 2: NLCC behavior

  • After the gelling point, the GCC takes off, but NLCC’s remain constant or oscillate.

[Figure: IMDB, connected-component (CC) size vs. time]
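One way to reproduce a plot of this kind is to replay the edges in time order and track component sizes with a union-find structure. The sketch below is an illustration under assumptions, not the authors' pipeline: the input format (`(timestamp, u, v)` tuples), the class and function names, and the toy edge list are all hypothetical.

```python
class UnionFind:
    """Disjoint sets over hashable items; `size` holds one entry per root."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # path compression
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size.pop(rb)

def component_sizes_over_time(timed_edges, top=3):
    """Return [(t, sizes of the `top` largest components after all edges at t)]."""
    uf = UnionFind()
    history = []
    current_t = None
    for t, u, v in sorted(timed_edges, key=lambda e: e[0]):
        if current_t is not None and t != current_t:
            history.append((current_t, sorted(uf.size.values(), reverse=True)[:top]))
        current_t = t
        uf.union(u, v)
    if current_t is not None:
        history.append((current_t, sorted(uf.size.values(), reverse=True)[:top]))
    return history

# Hypothetical toy stream: two small components merge, then a third joins.
timed_edges = [(1, "a", "b"), (1, "c", "d"), (2, "e", "f"),
               (2, "b", "c"), (3, "f", "g"), (3, "d", "e")]
history = component_sizes_over_time(timed_edges)
for t, sizes in history:
    print(t, sizes)
```

The first entry of each list is the GCC size; the second and third are the next-largest components, whose sizes stay small while the GCC grows in this toy stream.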

slide-27
SLIDE 27

Outline

  • Motivation
  • Related work
  • Preliminaries
  • Data
  • Observations
  • Model
  • Summary

slide-28
SLIDE 28

Observation 3

Q3: How does the total weight of the graph relate to the number of edges?

slide-29
SLIDE 29

Observation 3: Fortification Effect

  • Does total $ grow in proportion to the number of checks ($ = # checks)?

[Figure: Orgs-Candidates, |$| vs. |Checks|, 1980 to 2004]

slide-30
SLIDE 30

Observation 3: Fortification Effect

  • Weight additions follow a power law with respect to the number of edges: W(t) ∝ E(t)^w
    – W(t): total weight of graph at t
    – E(t): total edges of graph at t
    – w is the power-law (PL) exponent
    – 1.01 < w < 1.5: super-linear!
    – (more checks, even more $)

[Figure: Orgs-Candidates, |$| vs. |Checks|, 1980 to 2004]
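The exponent w can be estimated by a straight-line fit in log-log space. The sketch below is a minimal illustration, not the authors' fitting procedure. It assumes records of the form `(timestamp, source, target, weight)`, counts E(t) as the number of distinct (source, target) pairs seen so far, and sums every record's weight into W(t). The function names and the toy records are hypothetical.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of log y = slope * log x + intercept.

    Returns (slope, intercept); the slope is the power-law exponent.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    slope = sxy / sxx
    return slope, my - slope * mx

def cumulative_edges_and_weight(records):
    """Return [(t, E(t), W(t))] with one entry per distinct timestamp."""
    seen = set()
    total_w = 0.0
    out = []
    for t, u, v, w in sorted(records, key=lambda r: r[0]):
        if out and out[-1][0] == t:
            out.pop()  # keep only the final totals for each timestamp
        seen.add((u, v))
        total_w += w
        out.append((t, len(seen), total_w))
    return out

# Hypothetical toy records: (year index, donor, candidate, dollars).
records = [(1, "a", "x", 10.0), (1, "b", "x", 5.0),
           (2, "a", "x", 20.0), (2, "c", "y", 1.0)]
series = cumulative_edges_and_weight(records)
w_exp, _ = fit_power_law([e for _, e, _ in series], [w for _, _, w in series])
print(series, w_exp)
```

A fitted slope above 1 corresponds to the super-linear behavior described on the slide; two toy points are of course far too few for a meaningful estimate.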

slide-31
SLIDE 31

Observations 4 and 5

Q4: How do the weights of nodes relate to degree?

Q5: Does this relation change over time?

slide-32
SLIDE 32

Observation 4: Snapshot Power Law

  • At any time, the total incoming weight of a node follows a power law in its in-degree, with PL exponent iw.
    – 1.01 < iw < 1.26, super-linear
  • More donors, even more $

[Figure: Orgs-Candidates, in-weights ($) vs. edges (# donors); e.g. John Kerry, $10M received from 1K donors]
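A short worked version of the relation may help. The symbols inw and ind (in-weight and in-degree of node i at time t) are notation assumed here for illustration; the slide only names the exponent iw. The numbers are arithmetic on the figures quoted above, not additional results.

```latex
% Snapshot power law: in-weight versus in-degree at a fixed time t
\[
  \mathit{inw}_i(t) \;\propto\; \mathit{ind}_i(t)^{\,iw},
  \qquad 1.01 < iw < 1.26 .
\]
% Doubling a node's in-degree multiplies its in-weight by 2^{iw}:
\[
  2^{1.01} \approx 2.01, \qquad 2^{1.26} \approx 2.39 .
\]
% Slide example: \$10M received from 1K donors is \$10K per donor on average.
\[
  \frac{\$10\,\mathrm{M}}{1\,\mathrm{K}\ \text{donors}}
  = \$10\,\mathrm{K}\ \text{per donor}.
\]
```

Under a purely linear relation, doubling the number of donors would exactly double the dollars; the super-linear exponent is what makes the factor exceed 2.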

slide-33
SLIDE 33

Observation 5: Snapshot Power Law

  • For a given graph, this exponent is constant over time.

[Figure: Orgs-Candidates, exponent vs. time]

slide-34
SLIDE 34

Outline

  • Motivation
  • Related work
  • Preliminaries
  • Data
  • Observations
  • Q6: Is there a generative, “emergent” model?
  • Summary
slide-35
SLIDE 35

Goals of model

  • a) Emergent, intuitive behavior
  • b) Shrinking diameter
  • c) Constant NLCC’s
  • d) Densification power law
  • e) Power-law degree distribution

= “Butterfly” Model

slide-37
SLIDE 37

Butterfly model in action

  • A node joins a network, with its own parameter pstep (“Curiosity”).
  • With (global) phost (“Cross-disciplinarity”), it chooses a random host.
    – With (global) plink (“Friendliness”), it creates a link.
    – With pstep, it travels to a random neighbor. Repeat.

[Figure: example graph on nodes n1 to n8, built up over the steps above]

slide-43
SLIDE 43

Butterfly model in action

  • Once there are no more “steps”, repeat the “host” procedure:
    – With phost, choose a new host, possibly link, etc.
    – Until no more steps, and no more hosts.

[Figure: example graph on nodes n1 to n8, built up over the steps above]
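The procedure walked through above can be sketched as a small generator. This is one reading of the slides, not the authors' implementation, and several details are assumptions: each arriving node draws pstep from U(0,1) (as in the validation settings), host selection repeats with probability phost, the walk at each host decides on a link with probability plink and then continues with probability pstep, and links are added only after the arriving node has finished all its walks. The function name `butterfly_graph` and its signature are hypothetical.

```python
import random

def butterfly_graph(n_nodes, p_host=0.3, p_link=0.5, seed=0):
    """Grow an undirected graph node by node; returns {node: set(neighbors)}."""
    rng = random.Random(seed)
    adj = {0: set()}                      # start from a single isolated node
    for new in range(1, n_nodes):
        p_step = rng.random()             # per-node parameter, pstep ~ U(0,1)
        links = set()
        while rng.random() < p_host:      # with phost, pick (another) random host
            current = rng.randrange(new)  # hosts are the existing nodes 0..new-1
            while True:
                if rng.random() < p_link: # with plink, link to the visited node
                    links.add(current)
                neighbors = sorted(adj[current])
                # Stop the walk at a dead end, or with probability 1 - pstep.
                if not neighbors or rng.random() >= p_step:
                    break
                current = rng.choice(neighbors)
        # A node that picked no host, or never linked, starts a new component.
        adj[new] = set(links)
        for v in links:
            adj[v].add(new)
    return adj

graph = butterfly_graph(2000, seed=1)
n_edges = sum(len(nb) for nb in graph.values()) // 2
print(len(graph), "nodes,", n_edges, "edges")
```

A node that never links stays isolated, which is how small disconnected components arise in this sketch, and a node that links through several hosts can merge components.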

slide-47
SLIDE 47

a) Emergent, intuitive behavior

Novelties of model:

  • Nodes link with probability
    – May choose host, but not link (start new component)
  • Incoming nodes are “social butterflies”
    – May have several hosts (merges components)
  • Some nodes are friendlier than others
    – pstep different for each node
    – This creates power-law degree distribution (theorem)

slide-48
SLIDE 48

Validation of Butterfly

  • Chose following parameters:
    – phost = 0.3
    – plink = 0.5
    – pstep ~ U(0,1)
  • Ran 10 simulations
  • 100,000 nodes per simulation

slide-49
SLIDE 49

b) Shrinking diameter

  • Shrinking diameter
    – In model, gelling usually occurred around N=20,000

[Figure: diameter vs. nodes, with N=20,000 marked]

slide-50
SLIDE 50
c) Oscillating NLCC’s

  • Constant / oscillating NLCC’s

[Figure: NLCC size vs. nodes, with N=20,000 marked]

slide-51
SLIDE 51

d) Densification power law

  • Densification:
    – Our datasets had a = (1.03, 1.7)
    – In [Leskovec+05KDD], a = (1.1, 1.7)
    – Simulation produced a = (1.1, 1.2)

[Figure: edges vs. nodes, with N=20,000 marked]

slide-52
SLIDE 52

e) Power-law degree distribution

  • Power-law degree distribution
    – Exponents approx -2

[Figure: count vs. degree]

slide-53
SLIDE 53

Summary

  • Studied several diverse public graphs
    – Measured at many timestamps
    – Unipartite and bipartite
    – Blogs, citations, real-world, network traffic
    – Largest was 6 million nodes, 10 million edges

slide-54
SLIDE 54

Summary

  • Observations on unweighted graphs:
    – A1: The GCC emerges at the “gelling point”
    – A2: NLCC’s are of constant / oscillating size
  • Observations on weighted graphs:
    – A3: Total weight increases super-linearly with edges
    – A4: Node’s weights increase super-linearly with degree, power law exponent iw
    – A5: iw remains constant over time
  • A6: Intuitive, emergent generative “butterfly” model, that matches properties

slide-55
SLIDE 55

References

[Barabasi+99] Barabasi, A. L. & Albert, R. (1999), 'Emergence of scaling in random networks', Science 286(5439), 509-512.

[Erdos+60] Erdos, P. & Renyi, A. (1960), 'On the evolution of random graphs', Publ. Math. Inst. Hungary. Acad. Sci. 5, 17-61.

[Faloutsos+99] Faloutsos, M.; Faloutsos, P. & Faloutsos, C. (1999), 'On Power-law Relationships of the Internet Topology', SIGCOMM, 251-262.

[Kumar+00] Kumar, R.; Raghavan, P.; Rajagopalan, S.; Sivakumar, D.; Tomkins, A. & Upfal, E. (2000), 'Stochastic models for the Web graph', Proceedings of the 41st FOCS, pp. 57-65.

[Kumar+06] Kumar, R.; Novak, J. & Tomkins, A. (2006), Structure and evolution of online social networks, in 'KDD '06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining', pp. 611-617.

[Leskovec+05KDD] Leskovec, J.; Kleinberg, J. & Faloutsos, C. (2005), Graphs over time: densification laws, shrinking diameters and possible explanations, in 'KDD '05'.

[Leskovec+07] Leskovec, J. & Faloutsos, C. (2007), Scalable modeling of real graphs using Kronecker Multiplication, ICML 2007.

[Milgram+67] Milgram, S. (1967), 'The small-world problem', Psychology Today 2, 60-67.

[Pennock+02] Pennock, Flake, Lawrence, Glover & Giles (2002), Winners don’t take all: Characterizing the competition for links on the web, PNAS 2002.

[Wang+2002] Wang, M.; Madhyastha, T.; Chang, N. H.; Papadimitriou, S. & Faloutsos, C. (2002), 'Data Mining Meets Performance Evaluation: Fast Algorithms for Modeling

slide-56
SLIDE 56

Contact us

  • Leman Akoglu: www.andrew.cmu.edu/~lakoglu, lakoglu@cs.cmu.edu
  • Christos Faloutsos: www.cs.cmu.edu/~christos, christos@cs.cmu.edu
  • Mary McGlohon: www.cs.cmu.edu/~mmcgloho, mmcgloho@cs.cmu.edu

slide-59
SLIDE 59
Entropy plots [Wang+2002]

  • From time series data, begin with resolution r = T/2.
  • Record entropy H_r.
  • Recursively take finer resolutions.

[Figure: time series of Δ weights vs. time, and the resulting plot of entropy vs. resolution]

slide-63
SLIDE 63

Entropy Plots

  • Self-similarity → linear plot (example shown: slope s = 0.59)
  • Uniform: slope of plot s = 1
  • Point mass: s = 0
  • Bursty: 0.2 < s < 0.9

[Figure: entropy vs. resolution, with example time series for the uniform and point-mass cases]
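The entropy-plot recipe can be sketched as follows. This is an illustration under assumptions rather than the method of [Wang+2002] verbatim: the series length is taken to be a power of two, level k splits the series into 2^k equal bins (so level 1 corresponds to bins of length T/2), entropy is measured in bits, and the slope is a plain least-squares fit over all levels. The function names are hypothetical.

```python
import math

def entropy_plot(series):
    """Return [(k, H_k)]: entropy in bits of the series split into 2**k equal bins."""
    n = len(series)
    if n < 2 or n & (n - 1):
        raise ValueError("series length must be a power of two")
    total = float(sum(series))  # assumed positive
    points = []
    for k in range(1, n.bit_length()):
        width = n >> k  # bin length at this resolution
        probs = [sum(series[i:i + width]) / total for i in range(0, n, width)]
        h = -sum(p * math.log2(p) for p in probs if p > 0)
        points.append((k, h))
    return points

def plot_slope(points):
    """Least-squares slope of entropy against resolution level."""
    n = len(points)
    mx = sum(k for k, _ in points) / n
    my = sum(h for _, h in points) / n
    sxx = sum((k - mx) ** 2 for k, _ in points)
    sxy = sum((k - mx) * (h - my) for k, h in points)
    return sxy / sxx

uniform = [1] * 16                    # weight spread evenly over time
point_mass = [0] * 5 + [7] + [0] * 10 # all weight added at a single time
print(plot_slope(entropy_plot(uniform)), plot_slope(entropy_plot(point_mass)))
```

The two extreme inputs reproduce the slopes quoted on the slide (1 for uniform, 0 for a point mass); bursty series fall in between.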