Spreading Rumours without the Network (Alessandro Epasto)



slide-1
SLIDE 1

Spreading Rumours without the Network

P. Brach*, A. Panconesi°, P. Sankowski*
*U. of Warsaw, °Sapienza U. Rome

Alessandro Epasto

slide-2
SLIDE 2

Diffusive processes on graphs are an important paradigm in several fields:

  • Systems: How can information be spread over a network?
  • Social Networks: Why do posts become viral?
  • Sociology: What makes innovations or products gain acceptance?
  • Epidemiology: How do diseases spread?

We consider various models of information diffusion: Push, Pull and SIR.

Rumour Spreading

slide-3
SLIDE 3

Most known results are asymptotic bounds on the completion time:

  • At most O(n log(n)) (Feige et al., 1990).
  • Fast in Erdős-Rényi and preferential attachment graphs (Elsässer et al. 2006, Chierichetti et al. 2009).
  • Fast in high-conductance graphs (Chierichetti et al. 2010, Giakkoupis et al. 2011).

Background

slide-4
SLIDE 4

Goal #1: Beyond asymptotics

We are interested in the expected number of informed nodes for each time step of the process

Notice: this is known only for very simple graphs (e.g. Clique, Pittel ’87)

[Plot: number of informed nodes at each time step of the process.]

Our Goal

slide-5
SLIDE 5

Goal #2: Prediction with limited information

Motivation: real networks are often unavailable


Our Goal

Caveat: this is clearly an ill-posed question. Surprisingly, however, it is possible for real social networks.

slide-6
SLIDE 6

A simpler problem: model the unknown graph by a known random graph generation process.

Random graph model


How Can we Achieve this?

slide-7
SLIDE 7

A simpler problem: model the unknown graph by a known random graph generation process.

Random graph model


How Can we Achieve this?

Real Graph

Prediction

slide-8
SLIDE 8

Which Graph Model?

We use the configuration model as the random graph model.

SIR on the configuration model matches real post diffusions on Twitter (Goel et al., 2013):

  • Distribution of popularity of posts.
  • Virality of the diffusion.
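
For readers unfamiliar with the configuration model, the following is a minimal sketch of the standard stub-pairing construction for an undirected degree sequence. The function name and the example degree sequence are illustrative assumptions, not taken from the talk.

```python
import random
from collections import Counter

def configuration_model(degrees, seed=None):
    """Return a random multigraph (as an edge list) with the given degree sequence."""
    if sum(degrees) % 2 != 0:
        raise ValueError("the degree sequence must have an even sum")
    rng = random.Random(seed)
    # One "stub" (half-edge) per unit of degree.
    stubs = [node for node, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    # Pairing consecutive stubs of a uniformly shuffled list gives a uniform matching.
    return [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs), 2)]

# Hypothetical degree sequence, used only for illustration.
degrees = [3, 2, 2, 1, 1, 1]
edges = configuration_model(degrees, seed=0)

# Check that every node ends up with its prescribed degree.
realised = Counter()
for u, v in edges:
    realised[u] += 1
    realised[v] += 1  # a self-loop (u == v) contributes 2, as usual
```

The result may contain self-loops and parallel edges; the sketch keeps them rather than rejecting the sample.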
slide-9
SLIDE 9

A predictor algorithm for the configuration model, covering the Push, Pull and SIR processes:

  • Space efficient: very large graphs can fit in memory.
  • Provably exact on random graphs.

The algorithm accurately predicts both the popularity and the virality on real social networks.

Our Contribution

slide-10
SLIDE 10
  • The diffusion processes;
  • Our algorithm(s);
  • Experimental evaluation;
  • Conclusions.

Outline of the Talk

slide-11
SLIDE 11

The Push-Pull Process

slide-12
SLIDE 12

Push-Pull Protocol

PUSH

slide-13
SLIDE 13

PUSH

Push-Pull Protocol

slide-14
SLIDE 14

Push-Pull Protocol

PUSH

slide-15
SLIDE 15

Push-Pull Protocol

PUSH

slide-16
SLIDE 16

Push-Pull Protocol

PULL

slide-17
SLIDE 17

Push-Pull Protocol

PULL

slide-18
SLIDE 18

Push-Pull Protocol

PULL
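
The slides above show the protocol only as pictures. A minimal sketch of one synchronous round, assuming the usual textbook definition (every node contacts one uniformly random neighbour; an informed caller pushes the rumour, an uninformed caller pulls it), could look as follows. The adjacency-list format and the function name are illustrative assumptions.

```python
import random

def push_pull_round(adj, informed, rng):
    """Run one synchronous round and return the new set of informed nodes."""
    newly = set()
    for v, neighbours in adj.items():
        if not neighbours:
            continue
        u = rng.choice(neighbours)  # v contacts one random neighbour
        if v in informed and u not in informed:
            newly.add(u)            # PUSH: v sends the rumour to u
        elif v not in informed and u in informed:
            newly.add(v)            # PULL: v asks u and learns the rumour
    # Nodes informed during this round start spreading only in the next round.
    return informed | newly

# Toy path graph 0 - 1 - 2 - 3 (hypothetical example input).
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
rng = random.Random(1)
informed = {0}
rounds = 0
while len(informed) < len(adj):
    informed = push_pull_round(adj, informed, rng)
    rounds += 1
```

Dropping the `elif` branch gives the pure Push process; dropping the first branch gives pure Pull.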

slide-19
SLIDE 19

SIR Process

SIR

slide-20
SLIDE 20

SIR Process

SIR

slide-21
SLIDE 21

SIR Process

SIR

slide-22
SLIDE 22

SIR Process

SIR

slide-23
SLIDE 23

SIR Process

SIR

slide-24
SLIDE 24

SIR Process

SIR

slide-25
SLIDE 25

SIR Process

SIR
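
The slides do not spell out which SIR variant is used. As an illustration only, the sketch below assumes a simple discrete-time variant: each infected node infects each susceptible neighbour independently with probability `beta` and then recovers after a single step. The parameter `beta` and the one-step infectious period are assumptions made for this example.

```python
import random

def sir_step(adj, state, beta, rng):
    """Advance the process by one step; state maps node -> 'S', 'I' or 'R'."""
    new_state = dict(state)
    for v, s in state.items():
        if s != "I":
            continue
        for u in adj[v]:
            # Only nodes that were susceptible at the start of the step can be infected.
            if state[u] == "S" and rng.random() < beta:
                new_state[u] = "I"
        new_state[v] = "R"  # assumed: a node is infectious for exactly one step
    return new_state

def run_sir(adj, source, beta, seed=None):
    """Run until no infected node remains; return the number of recovered nodes."""
    rng = random.Random(seed)
    state = {v: "S" for v in adj}
    state[source] = "I"
    while "I" in state.values():
        state = sir_step(adj, state, beta, rng)
    return sum(1 for s in state.values() if s == "R")

# Toy cycle on five nodes (hypothetical example input).
cycle = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
size_all = run_sir(cycle, source=0, beta=1.0, seed=0)
size_none = run_sir(cycle, source=0, beta=0.0, seed=0)
```

In this reading, the final count of recovered nodes corresponds to the popularity of a post.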

slide-26
SLIDE 26

Our Algorithm

slide-27
SLIDE 27

Simulate two random processes: the network generation and the rumour spreading. Naive algorithm:

  • Generate a random network G.
  • Simulate rumour spreading on G.
  • Run several times in parallel and average.

Space bottleneck: Real networks are too large to fit in main memory!

Naive Solution
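
The naive algorithm can be sketched as below for the Push process on an undirected configuration model. This is an illustrative reconstruction, not the authors' code: the function names are hypothetical, the runs are executed sequentially rather than in parallel, and the source node is drawn uniformly at random.

```python
import random

def random_graph(degrees, rng):
    """Sample a configuration-model multigraph as adjacency lists."""
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    adj = {v: [] for v in range(len(degrees))}
    for i in range(0, len(stubs) - 1, 2):
        a, b = stubs[i], stubs[i + 1]
        adj[a].append(b)
        adj[b].append(a)
    return adj

def push_curve(adj, source, steps, rng):
    """Return the number of informed nodes after 0, 1, ..., steps rounds of Push."""
    informed = {source}
    curve = [1]
    for _ in range(steps):
        # Every informed node pushes the rumour to one random neighbour.
        targets = {rng.choice(adj[v]) for v in informed if adj[v]}
        informed |= targets
        curve.append(len(informed))
    return curve

def naive_prediction(degrees, steps, runs, seed=None):
    """Average the informed-node curve over independent graph/process samples."""
    rng = random.Random(seed)
    total = [0] * (steps + 1)
    for _ in range(runs):
        adj = random_graph(degrees, rng)  # the whole graph is held in memory
        source = rng.randrange(len(degrees))
        for t, count in enumerate(push_curve(adj, source, steps, rng)):
            total[t] += count
    return [x / runs for x in total]

# Hypothetical input: 20 nodes of degree 3.
prediction = naive_prediction([3] * 20, steps=10, runs=50, seed=0)
```

The call to `random_graph` is where the space bottleneck appears: each run stores all n nodes and m edges.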

slide-28
SLIDE 28

We can reduce the space from O(n+m) to O(n) in directed graphs, and even to o(n) in undirected ones. This is a significant reduction in practice, not only asymptotically!

Deferred decision principle: the topology is discovered as nodes become involved in the rumour spreading process, and is immediately forgotten.

Our Approach
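
To illustrate the deferred decision principle in isolation, the sketch below pairs the stubs of a configuration model lazily: a stub's partner is drawn only when the process first asks for it. This is not the authors' algorithm; in particular it remembers the pairs it has revealed, whereas the talk describes forgetting them. The class and method names are hypothetical.

```python
import random

class LazyMatching:
    """Pairs the stubs of a configuration model only when a stub is first queried."""

    def __init__(self, num_stubs, seed=None):
        if num_stubs % 2 != 0:
            raise ValueError("need an even number of stubs")
        self.rng = random.Random(seed)
        self.free = list(range(num_stubs))    # stubs not yet paired
        self.pos = {s: s for s in self.free}  # stub -> index in self.free
        self.partner = {}                     # pairs revealed so far

    def _remove(self, stub):
        # Swap-remove keeps deletion from the free list O(1).
        i = self.pos.pop(stub)
        last = self.free.pop()
        if last != stub:
            self.free[i] = last
            self.pos[last] = i

    def reveal(self, stub):
        """Return the partner of `stub`, drawing it at random on first use."""
        if stub not in self.partner:
            self._remove(stub)
            other = self.rng.choice(self.free)  # uniform over the remaining free stubs
            self._remove(other)
            self.partner[stub] = other
            self.partner[other] = stub
        return self.partner[stub]

matching = LazyMatching(6, seed=0)
first = matching.reveal(0)
```

Because each partner is drawn uniformly from the stubs that are still free, the revealed pairs are distributed as in a matching sampled up front, but nothing is generated for parts of the graph the rumour never reaches.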

slide-29
SLIDE 29

Only the local neighbourhood determines the evolution of the process. We do not store the edges of the graph.

Intuition

Node v:

  • Num. informed in-neighbours
  • Num. informed out-neighbours
slide-30
SLIDE 30

Undirected Graphs

We use an efficient matrix representation:

  • High-degree nodes are stored individually.
  • Low-degree nodes are stored in a K x K matrix.

slide-31
SLIDE 31

Graph                  Nodes   Matrix size   Saving in space
Livejournal            5M      176           98%
Facebook (estimates)   720M    <5000         >97%

Undirected Graphs

For power-law graphs of exponent $\alpha$ the cost is $n^{\frac{2}{1+\alpha}}$.

In practice the entire Facebook graph could fit in a few gigabytes.
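
One way to see how a cost of the form $n^{2/(1+\alpha)}$ can arise is a back-of-the-envelope balance. The assumptions here are not stated on the slide: nodes of degree above a threshold $K$ are stored individually at constant cost each, the matrix costs $K^2$ cells, and the number of nodes of degree at least $K$ scales as $n K^{1-\alpha}$.

\[
K^2 \approx n K^{1-\alpha}
\;\Longrightarrow\;
K \approx n^{\frac{1}{1+\alpha}}
\;\Longrightarrow\;
\text{cost} \approx K^2 \approx n^{\frac{2}{1+\alpha}} .
\]

For $\alpha > 1$ the exponent is below 1, which is consistent with sublinear space in undirected graphs.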

slide-32
SLIDE 32

Results on Random Graphs

slide-33
SLIDE 33

Results on Random Graphs

The model prediction is perfect. This can be proved formally.

[Plot: number of privy nodes vs. time, actual process and prediction.]

slide-34
SLIDE 34

Results on Real Graphs

slide-35
SLIDE 35

The model is qualitatively accurate for the social networks we tested.

Slashdot

Social Networks - Push

[Plot: number of privy nodes vs. time, actual process and prediction.]

slide-36
SLIDE 36

Livejournal

More Social Networks - Push

[Plot: number of privy nodes vs. time, actual process and prediction.]

slide-37
SLIDE 37

DBLP

More Social Networks - Push

[Plot: number of privy nodes vs. time, actual process and prediction.]

slide-38
SLIDE 38

Non-Social Networks - Push

Web Stanford

For non-social networks the prediction is not accurate.

slide-39
SLIDE 39

Prediction performance strongly depends on the network class:

  • Very good for social networks: friendship graphs, trust networks, collaboration networks.
  • Poor for non-social networks: web graphs, road networks, etc.

This dichotomy has been observed in other contexts: degree correlations, graph compressibility, etc. What is the reason for this phenomenon?

Results

slide-40
SLIDE 40

The neighbourhood function F(t) of a graph measures how many pairs of nodes are at distance <= t. This measure has been shown to tell apart social and non-social graphs.

Neighbourhood Function
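
As a concrete illustration of the definition, the sketch below computes the neighbourhood function exactly with one breadth-first search per node. Counting ordered pairs of distinct nodes is an assumed convention, and the function name and toy graph are hypothetical. Exact computation takes time proportional to n times m, so it suits small graphs only.

```python
from collections import deque

def neighbourhood_function(adj, max_t):
    """Return F with F[t] = number of ordered pairs (u, v), u != v, at distance <= t."""
    at_distance = [0] * (max_t + 1)
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            if dist[v] == max_t:
                continue  # do not explore beyond distance max_t
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    at_distance[dist[u]] += 1
                    queue.append(u)
    # Turn per-distance counts into cumulative counts.
    F, total = [], 0
    for count in at_distance:
        total += count
        F.append(total)
    return F

# Toy path graph 0 - 1 - 2 - 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
F = neighbourhood_function(path, max_t=3)
```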

slide-41
SLIDE 41

Neighbourhood F. vs Prediction Quality

Slashdot Neighbourhood F.

Slashdot Prediction - SIR

Social graphs have a neighbourhood function close to the configuration model.

slide-42
SLIDE 42

Neighbourhood F. vs Prediction Quality

Web Graph Neighbourhood F.

Web Graph Prediction - SIR

Non-social graphs have a neighbourhood function far from the configuration model.

[Plot 1: number of nodes vs. distance, actual graph and configuration model. Plot 2: number of infected nodes vs. time, actual process and prediction.]

slide-43
SLIDE 43

Neighbourhood F. vs Prediction Quality

The correlation is strong and statistically significant.

[Plot: Correlation, Neighbourhood F. vs Prediction Error. Axes: MAPE and Neighbourhood F. L2/n norm. Series: SIR and PUSH, each with a linear fit.]
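
The two quantities named on the plot axes can be sketched as follows. MAPE is the standard mean absolute percentage error. The reading of "L2/n norm" as the Euclidean distance between two neighbourhood functions divided by the number of nodes n is an assumption, since the slides do not define it. Function names and sample values are hypothetical.

```python
import math

def mape(actual, predicted):
    """Mean absolute percentage error between two equally long curves."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("curves must be non-empty and of equal length")
    # Assumes every actual value is non-zero (e.g. at least one informed node).
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)

def normalised_l2(f_real, f_model, n):
    """Assumed reading of 'L2/n': Euclidean distance between two curves, divided by n."""
    if len(f_real) != len(f_model):
        raise ValueError("curves must have equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f_real, f_model))) / n

# Made-up numbers, used only to exercise the functions.
error = mape([100, 200], [110, 180])
gap = normalised_l2([0, 3], [0, 7], n=4)
```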

slide-44
SLIDE 44
  • Rumour spreading processes can be predicted accurately in social graphs based on very limited information on the graph.
  • Our predictor is provably correct and space efficient.
  • We characterise the class of graphs that can be predicted based on the Neighbourhood Function.
  • We would like to extend our model to more nuanced diffusion processes.

Conclusion

slide-45
SLIDE 45

Thank you for your attention!