Spreading Rumours without the Network
Alessandro Epasto
P. Brach*, A. Panconesi°, P. Sankowski*. *U. of Warsaw °Sapienza U. Rome
Rumour Spreading
Diffusive processes on graphs are an important paradigm in several fields:
We consider various models of information diffusion: Push, Pull and SIR.
Most known results are asymptotic bounds on the completion time:
Attachment (Elsasser et al. 2006, Chierichetti et al. 2009).
We are interested in the expected number of informed nodes for each time step of the process
Notice: this is known only for very simple graphs (e.g. Clique, Pittel ’87)
[Figure: informed nodes over time]
Motivation: real networks are often unavailable
Caveat: this is clearly an ill-posed question… but, surprisingly, it is possible for real social networks.
A simpler problem: model the unknown graph by a known random graph generation process.
Random graph model
Real Graph
Prediction
We use the configuration model as random graph model.
SIR on the configuration model matches real post diffusions in Twitter (Goel et al., 2013):
A predictor algorithm for the configuration model for the Push, Pull and SIR Processes:
memory.
The algorithm accurately predicts both the popularity and the virality on real social networks.
Naive algorithm: simulate two random processes, the network generation and the rumour spreading.
Space bottleneck: Real networks are too large to fit in main memory!
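The naive approach can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes an undirected configuration model built by random stub pairing and a synchronous Push process, and the function names `configuration_model` and `push` are hypothetical. Note that the whole adjacency structure is kept in memory, which is exactly the bottleneck described above.

```python
import random

def configuration_model(degrees, rng):
    """Build adjacency lists by pairing half-edges (stubs) at random."""
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    adj = [[] for _ in degrees]
    # Pair consecutive stubs; a leftover stub (odd total degree) is dropped.
    for i in range(0, len(stubs) - 1, 2):
        a, b = stubs[i], stubs[i + 1]
        adj[a].append(b)
        adj[b].append(a)
    return adj

def push(adj, source, rounds, rng):
    """Push: in each round every informed node informs one random neighbour."""
    informed = {source}
    counts = [1]  # number of informed nodes after each round
    for _ in range(rounds):
        newly = set()
        for u in informed:
            if adj[u]:
                newly.add(rng.choice(adj[u]))
        informed |= newly
        counts.append(len(informed))
    return counts

if __name__ == "__main__":
    rng = random.Random(42)
    graph = configuration_model([3] * 1000, rng)  # O(n + m) memory
    print(push(graph, 0, 30, rng))
```

Averaging `counts` over many independent runs gives an empirical estimate of the expected number of informed nodes per time step.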
We can reduce the space to O(n), versus O(n+m), in directed graphs and even to o(n) in undirected ones. This is a significant reduction in practice, not only asymptotically!
Deferred decision principle: the topology is discovered as nodes become involved in the rumour spreading process, and is immediately forgotten.
Only the local neighbourhood determines the evolution of the process. We do not store the edges of the graph.
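A small sketch of the deferred decision idea, under stated assumptions: an undirected configuration model, a discrete-time SIR process in which each infected node is infectious for one round and transmits along each of its edges with probability `beta`, and rejection sampling to pick a random unpaired half-edge. The function name `sir_deferred` and its parameters are hypothetical, and this is not the authors' algorithm; it only shows that per-node counters (O(n) space) are enough and that no edge is ever stored.

```python
import random

def sir_deferred(degrees, seed_node, beta, rng):
    """SIR on a configuration-model graph whose edges are never stored."""
    free = list(degrees)        # unpaired half-edges per node
    total_free = sum(free)
    max_deg = max(free)
    state = ["S"] * len(free)   # S = susceptible, I = infected, R = removed
    state[seed_node] = "I"
    infected = [seed_node]
    history = [1]               # cumulative number of nodes ever infected

    def draw_stub_owner():
        # Rejection sampling: returns node v with probability
        # proportional to free[v]; requires total_free > 0.
        while True:
            v = rng.randrange(len(free))
            if rng.random() * max_deg < free[v]:
                return v

    while infected:
        newly = []
        for u in infected:
            # Reveal the edges of u one at a time, then forget them.
            while free[u] > 0:
                free[u] -= 1
                total_free -= 1
                if total_free == 0:
                    break       # odd leftover half-edge, nothing to pair with
                v = draw_stub_owner()
                free[v] -= 1
                total_free -= 1
                if state[v] == "S" and rng.random() < beta:
                    state[v] = "I"
                    newly.append(v)
            state[u] = "R"
        infected = newly
        history.append(history[-1] + len(newly))
    return history

if __name__ == "__main__":
    print(sir_deferred([3] * 10000, 0, 0.6, random.Random(7)))
```

Each edge is paired at the moment one of its endpoints becomes infectious, so the outcome has the same distribution as first generating the graph and then running the process on it, while only the degree counters and node states are kept.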
[Figure: a node v and its in-neighbours]
We use an efficient matrix representation.
High-degree nodes are stored individually; low-degree nodes are stored in a K x K matrix.
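The slides do not say how the K x K matrix is indexed. As an illustration only, the sketch below assumes that low-degree nodes are grouped by their (in-degree, out-degree) pair, so that the matrix holds counts of interchangeable nodes, while nodes with a degree of K or more are kept in a separate list. The function name `compress_degrees` is hypothetical.

```python
def compress_degrees(degree_pairs, K):
    """Split nodes into a K x K count matrix and a list of high-degree nodes.

    degree_pairs: iterable of (in_degree, out_degree), one pair per node.
    Assumption: nodes with the same degree pair are interchangeable,
    so only their count needs to be stored.
    """
    matrix = [[0] * K for _ in range(K)]
    high = []
    for d_in, d_out in degree_pairs:
        if d_in < K and d_out < K:
            matrix[d_in][d_out] += 1        # low-degree node: count only
        else:
            high.append((d_in, d_out))      # high-degree node: stored individually
    return matrix, high

if __name__ == "__main__":
    pairs = [(1, 2), (1, 2), (0, 0), (5, 1)]
    matrix, high = compress_degrees(pairs, K=3)
    cells = len(matrix) * len(matrix[0]) + len(high)
    print(cells, "stored cells for", len(pairs), "nodes")
```

On a toy input the matrix is larger than the node list; the saving only appears when the number of nodes is much larger than K squared.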
Graph                  Nodes   Matrix size   Saving in space
Livejournal            5M      176           98%
Facebook (estimates)   720M    <5000         >97%
For power law graphs of exponent α the cost is O(n^{2/(1+α)}).
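An exponent of the form 2/(1+α) can be motivated by balancing the two storage terms. This is a back-of-the-envelope sketch, not the authors' analysis: it assumes that the number of nodes with degree above K scales as n K^{1-α}, that the matrix costs K^2 cells, and that the threshold K can be chosen freely.

```latex
\begin{align*}
\mathrm{space}(K) &\approx \underbrace{K^{2}}_{\text{matrix}}
                   + \underbrace{n\,K^{1-\alpha}}_{\text{high-degree nodes}} \\
K^{2} = n\,K^{1-\alpha}
  &\iff K^{1+\alpha} = n
  \iff K = n^{1/(1+\alpha)} \\
\mathrm{space}\bigl(n^{1/(1+\alpha)}\bigr) &\approx 2\,n^{2/(1+\alpha)}
\end{align*}
```

For any α > 1 the exponent 2/(1+α) is below 1, so under these assumptions the space is sublinear in n.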
The model prediction is perfect
[Figure: number of privy nodes vs. time; actual process and prediction]
The model is qualitatively accurate for the social networks we tested.
[Figures, three panels: number of privy nodes vs. time; actual process and prediction]
Web Stanford
For non-social networks the prediction is not accurate.
Prediction performance strongly depends on the network class:
trust networks, collaboration networks.
networks, etc.
This dichotomy has been observed in other contexts: degree correlations, graph compressibility, etc.
What is the reason for this phenomenon?
The neighbourhood function F(t) of a graph measures how many pairs of nodes are at distance <= t. This measure has been shown to tell apart social and non-social graphs.
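For small graphs the neighbourhood function can be computed exactly with one breadth-first search per node. The sketch below is illustrative only: it assumes F(t) counts ordered pairs of distinct nodes, which is one common convention and not necessarily the one used in the talk, and the function name `neighbourhood_function` is hypothetical. Large graphs would need an approximate method instead.

```python
from collections import deque

def neighbourhood_function(adj, t_max):
    """Return [F(0), ..., F(t_max)] for a graph given as adjacency lists.

    F(t) = number of ordered pairs (u, v), u != v, with dist(u, v) <= t.
    """
    F = [0] * (t_max + 1)
    for s in range(len(adj)):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            if dist[u] == t_max:
                continue                # do not explore beyond t_max
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for v, d in dist.items():
            if v != s:
                F[d] += 1               # pairs at distance exactly d
    for t in range(1, t_max + 1):
        F[t] += F[t - 1]                # prefix sums: distance <= t
    return F

if __name__ == "__main__":
    path = [[1], [0, 2], [1]]           # path graph 0 - 1 - 2
    print(neighbourhood_function(path, 2))  # [0, 4, 6]
```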
[Figure panels: Slashdot neighbourhood function; Slashdot prediction (SIR)]
Social graphs have a neighbourhood function close to the configuration model.
[Figure panels: Web graph neighbourhood function; Web graph prediction (SIR)]
Non-Social graphs have a neighbourhood function far from the configuration model.
[Figure panels: number of nodes vs. distance (actual graph, configuration model); number of infected nodes vs. time (actual process, prediction)]
The correlation is strong and statistically significant.
[Figure: Correlation, neighbourhood function vs. prediction error. Axes: MAPE and neighbourhood function L2/n norm. Series: SIR and PUSH, each with a linear fit]
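The prediction error in the plot is reported as MAPE. A minimal sketch of the standard mean absolute percentage error between an actual and a predicted curve is given below; how the authors sampled the curves is not stated, so comparing the two series point by point at the same time steps is an assumption, and the function name `mape` is hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error between two equally long series.

    Assumes every actual value is non-zero.
    """
    if len(actual) != len(predicted):
        raise ValueError("series must have the same length")
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)

if __name__ == "__main__":
    actual = [100, 200, 400]      # made-up numbers, for illustration only
    predicted = [110, 180, 400]
    print(mape(actual, predicted))
```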
accurately in social graphs based on very limited information on the graph.
predicted based on the Neighbourhood Function.
diffusion processes.