Rumor Source Detection in the SIR Model: A Sample Path Approach
Kai Zhu, Lei Ying
Arizona State University
Presented by Bao Yuanyuan
1
Kai Zhu, Lei Ying. Information Source Detection in the SIR Model: A Sample Path Based Approach. Information Theory and Application Workshop (ITA 2013).
Estimator with Sparse Observations. IEEE INFOCOM 2014.
2
Background
– Of the top 100 hottest events on Sina Weibo between January 2012 and January 2013, one third were rumors.
3
When Hurricane Sandy hit, rumors about “confirmed flooding” of the New York Stock Exchange, failure of the Old Bridge Township water system, and bodies of victims being found in Seaside Heights circulated on Twitter and resulted in social panic.
Background
4
A rumor that the president of Syria was dead spread rapidly on Twitter and among the population, leading to a sharp, quick increase in the price of oil.
Background
5
A rumor about explosions at the White House injuring President Obama, tweeted by a news agency, made the Dow plunge more than 140 points; the temporary loss of market cap in the S&P 500 alone totaled $136.5 billion.
Background
6
Here comes the problem!
sequence
Given a snapshot of the diffusion process at time t, tell which node is the source of the diffusion.
7
Rumor Source Detection Problem
[Figure: example network with nodes labeled 1 to 9]
Given a snapshot of the diffusion process at time t, which node is the source of the diffusion? (Topology is also known.)
8
Related Work
SI Model: Susceptible → Infected
SIR Model: Susceptible → Infected → Recovered
9
Related Work
Transactions on Information Theory, Vol. 57, No. 8, August 2011.
10
Limitations
for viral epidemics.
consideration.
– A contraband material uploader may delete the file;
– Anti-virus software removes the virus;
– A user deletes the rumor from his/her microblog.
11
Challenge
We can only distinguish infected nodes from healthy nodes (susceptible nodes and recovered nodes). Susceptible nodes and recovered nodes are indistinguishable.
12
PROBLEM FORMULATION
13
THE SIR MODEL FOR INFORMATION PROPAGATION
is the set of edges.
infected (I), and recovered (R).
and the state of node v in time slot t is denoted by Xv(t).
state I and is the information source.
X(t) = {Xv(t), v ∈ V} is a Markov chain.
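The model on this slide can be sketched as a small simulation. This is only an illustration under assumptions that the slide text does not state: the per-slot infection probability q, the per-slot recovery probability p, the order of events within a slot (infection attempts first, then recovery), and the function names simulate_sir and snapshot are all hypothetical.

```python
import random

S, I, R = "S", "I", "R"  # susceptible, infected, recovered


def simulate_sir(adj, source, t, q=0.5, p=0.2, seed=None):
    """Run a discrete-time SIR process for t time slots.

    adj:    dict mapping each node to a list of its neighbors (graph G)
    source: the information source, in state I at time 0
    q, p:   ASSUMED per-slot infection and recovery probabilities
    Returns the sample path X[0..t] as a list of dicts node -> state.
    """
    rng = random.Random(seed)
    state = {v: S for v in adj}
    state[source] = I
    path = [dict(state)]
    for _ in range(t):
        nxt = dict(state)
        for v, s in state.items():
            if s != I:
                continue
            # An infected node tries to infect each susceptible neighbor.
            for u in adj[v]:
                if state[u] == S and rng.random() < q:
                    nxt[u] = I
            # The infected node then recovers with probability p.
            if rng.random() < p:
                nxt[v] = R
        state = nxt
        path.append(dict(state))
    return path


def snapshot(state):
    """Observation Y: 1 for infected, 0 for susceptible or recovered."""
    return {v: 1 if s == I else 0 for v, s in state.items()}
```

Because the next state depends only on the current state, the sequence of states returned by simulate_sir is one sample path of the Markov chain X(t).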
14
INFORMATION SOURCE DETECTION
to identify v* given the graph G and Y.
15
An Example of Information Propagation
(infection time, recovery time)
If we observe the network at the end of the time slot 3, then the snapshot of the network is Y={0,1,0,1,0,1,1}.
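How such a snapshot follows from (infection time, recovery time) pairs can be sketched as below. The times used here are hypothetical values chosen to reproduce the snapshot Y quoted on the slide; they are not read from the original figure. The convention that a node recovering in slot t already counts as recovered at the end of slot t is also an assumption.

```python
def snapshot_from_times(times, t):
    """Build the observation Y at the end of time slot t.

    times: list of (infection_time, recovery_time) pairs, one per node;
           None means the event has not happened.
    A node is reported as 1 only if it is infected and not yet recovered.
    """
    y = []
    for infected_at, recovered_at in times:
        is_infected = (
            infected_at is not None
            and infected_at <= t
            and (recovered_at is None or recovered_at > t)
        )
        y.append(1 if is_infected else 0)
    return y


# Hypothetical (infection time, recovery time) pairs for 7 nodes.
times = [
    (0, 2),        # source: infected at 0, recovered at 2
    (1, None),     # infected, still infected
    (None, None),  # never infected (susceptible)
    (2, None),
    (1, 3),        # recovered in slot 3
    (2, None),
    (3, None),
]
print(snapshot_from_times(times, 3))  # [0, 1, 0, 1, 0, 1, 1]
```

Note that the first, third and fifth nodes all show up as 0, although one is the recovered source, one is susceptible and one is recovered. This is the indistinguishability that makes the problem hard.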
16
MAXIMUM LIKELIHOOD DETECTION
infection process from 0 to t.
formulated as a maximum likelihood detection problem:
sample path X[0,t] given the source is node v.
If source = v1, there exist X(1), X(2), …, X(t); Pr(X[0,t])
If source = v2, there exist X(1), X(2), …, X(t); Pr(X[0,t])
…
If source = vn, there exist X(1), X(2), …, X(t); Pr(X[0,t])
Max Pr(X[0,t])
17
CURSE OF DIMENSIONALITY
If Yv = 1, we need to decide the infection time: O(t) possible choices.
If Yv = 0, we need to decide the infection time and the recovery time: O(t^2) possible choices.
Even for a fixed t, the number of possible sample paths is at least t^N.
18
SAMPLE PATH BASED DETECTION
MLE: To identify the sample path X*[0,t*] that most likely leads to Y: Where . The source node associated with X*[0,t*] is then viewed as the information source.
19
are still difficult to obtain.
properties of the optimal sample paths.
SAMPLE PATH BASED DETECTION ON TREE NETWORKS
20
Infection Eccentricity
– Eccentricity of v: maximum distance between v and any other node in the graph.
– Jordan center: the nodes having the minimum eccentricity.
– Infection eccentricity of v: maximum distance between v and any infected node.
– Jordan infection center: the nodes with the minimum infection eccentricity.
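The definitions above can be computed directly with breadth-first search. This is a minimal sketch, assuming an unweighted, connected graph given as an adjacency dict and a non-empty set of infected nodes; the function names are illustrative only.

```python
from collections import deque


def bfs_distances(adj, start):
    """Hop distance from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def infection_eccentricity(adj, v, infected):
    """Maximum distance between v and any infected node."""
    dist = bfs_distances(adj, v)
    return max(dist[u] for u in infected)


def jordan_infection_centers(adj, infected):
    """Nodes with the minimum infection eccentricity."""
    ecc = {v: infection_eccentricity(adj, v, infected) for v in adj}
    best = min(ecc.values())
    return [v for v in adj if ecc[v] == best]


# Path graph 1 - 2 - 3 - 4 - 5 with infected nodes 2 and 4.
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(jordan_infection_centers(adj, {2, 4}))  # [3]
```

In this example the Jordan infection center is node 3, which is not itself observed as infected. That is consistent with the SIR setting, where the source may already have recovered when the snapshot is taken.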
21
source associated with the
sample path=Node with the minimum infection eccentricity.
I. The time duration of the optimal sample path equals the infection eccentricity of node vr.
II. The optimal sample path starting from a node with a smaller infection eccentricity is more likely to occur. (the
infection eccentricity occurs with a higher probability.)
infection center.
SAMPLE PATH BASED DETECTION ON TREE NETWORKS
22
I. The time duration of the optimal sample path equals the infection eccentricity of node vr.
Assuming the information source is vr, analyze the time duration of the optimal sample path, where t*_vr is the time duration of the optimal sample path in which vr is the source.
The time duration of the optimal sample path equals the infection eccentricity of node vr.
23
I. The time duration of the optimal sample path equals the infection eccentricity of node vr.
24
two sample path is one.
– Divide all possible infection topologies Y into countable subsets {yk}, where yk is the set of infection topologies in which the largest distance from vr to an infected node is k.
– Use induction over k to prove (2).
for k=n+1.
t*_vr is the minimum amount of time required to produce the observed infection topology. The minimum time required is equal to the maximum distance from vr to an infected node.
25
t*_vr is the minimum amount of time required to produce the observed infection topology.
Infection eccentricity = maximum distance from vr to an infected node.
II. The optimal sample path starting from a node with smaller infection eccentricity is more likely to occur.
*=tv *+1;
I=1;
*=[0, tu*], construct Xv=[0, tv*], which occurs with a higher probability.
26
Jordan infection center.
a larger minimum infection eccentricity, then there exists a path from u to v along which the infection eccentricity monotonically decreases.
v, can conclude that the optimal sample path rooted at node v is more likely to occur than the optimal sample path rooted at node u.
infection center.
27
containing its identity (ID) to its neighbors.
nodes, it claims itself as the information source and the algorithm terminates.
maximum infection closeness (inverse of the sum of distances from a node to all infected nodes)
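The message-passing idea on this slide can be sketched as a centralized simulation. Only fragments of the steps survive in the slide text, so the following are assumptions: rounds are synchronous, every node forwards each newly received ID to all neighbors in the next round, the graph is connected, and ties among nodes that finish in the same round are broken by maximum infection closeness. The function name reverse_infection is illustrative.

```python
def reverse_infection(adj, infected):
    """Return the first node to collect the IDs of all infected nodes.

    adj:      dict mapping each node to a list of its neighbors
    infected: iterable of infected nodes (those with Y = 1)
    """
    infected = set(infected)
    if not infected:
        raise ValueError("at least one infected node is required")
    # received[v][i] = round in which v first received the ID of node i
    received = {v: ({v: 0} if v in infected else {}) for v in adj}
    # IDs each node received in the previous round and must forward now
    frontier = {v: ({v} if v in infected else set()) for v in adj}
    rnd = 0
    while True:
        done = [v for v in adj if len(received[v]) == len(infected)]
        if done:
            # With synchronous rounds, the arrival round equals the hop
            # distance, so maximum infection closeness (1 / sum of
            # distances) is the same as the minimum sum of arrival rounds.
            return min(done, key=lambda v: sum(received[v].values()))
        rnd += 1
        new_frontier = {v: set() for v in adj}
        for v in adj:
            for u in adj[v]:
                for i in frontier[v]:
                    if i not in received[u]:
                        received[u][i] = rnd
                        new_frontier[u].add(i)
        if not any(new_frontier.values()):
            raise ValueError("graph is not connected")
        frontier = new_frontier


# Path graph 1 - 2 - 3 - 4 - 5 with infected nodes 2 and 4.
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(reverse_infection(adj, {2, 4}))  # 3
```

The round in which a node first holds every infected ID equals its infection eccentricity, so the node returned is a node with the minimum infection eccentricity.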
28
Reverse Infection Algorithm
Performance Analysis
path based approach, within a constant distance from the actual source with high probability, independent of the number of infected nodes and the time at which the snapshot Y was taken.
29
Tree network
– No more than 100
– Detection rate is almost the same as that of MLE.
– About 20% higher than that of closeness centrality when the degree is small.
networks
– Higher than 60% when g>6.
– Higher than that of closeness centrality; the average difference is 8.86%.
30
Tree network
– The number of children of each node follows a binomial distribution X~B(g’, β), with g’=10 and β from 0.1 to 0.9.
– RI outperforms the closeness centrality algorithm by 10.16% on average.
31
Real World Network
– 10670 nodes and 22002 edges
– More than 80% are no more than two hops away from the actual sources.
– 7066 nodes and 100736 links
– More than 90% are no more than two hops away from the actual sources.
– 4941 nodes and 6594 links
– The peak
the reverse infection algorithm appears at the third hop versus the seventeenth hop under random guessing.
32
Conclusion
a node with minimum infection eccentricity
and demonstrate the effectiveness.
33
34