SLIDE 1 Using Partial Probes to Infer Network States
Pavan Rangudu◦, Bijaya Adhikari∗, B. Aditya Prakash∗, Anil Vullikanti∗◦
∗Department of Computer Science, Virginia Tech
◦NDSSL, Biocomplexity Institute, Virginia Tech
Contact: badityap@cs.vt.edu
SLIDE 2 Motivation
- Network nodes and links fail dynamically
- Networks are often not fully known because of privacy constraints
- Our focus: if some failed nodes are known, can we infer the
states of the remaining nodes?
Figure: Node failures in the Internet; traffic jam in a road network
Prior work fails to address the problem directly.
SLIDE 3 Our model
- Graph G(V, E) with a set I ⊆ V of nodes that have failed
- Geographically correlated failure model [Agarwal et al., 2013]
- Single seed of the failure, with probability ps(v) of node v
being the seed
- Correlated failure model: F(u|v) denotes the probability that
node u fails given that v has failed
- Assume independence, i.e., F(u1, u2|v) = F(u1|v) · F(u2|v)
- Motivation: attacks or natural disasters in infrastructure
networks
- Probes: subset Q ⊆ I of failed nodes is known
- Objective: find the set I − Q
Figure: A toy road network with node failures
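The generative model on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `sample_failures`, the dictionary `p_seed`, and the callable `F(u, s)` are hypothetical. The parameter `gamma` is assumed to be the probability that a failed node is probed, applied independently per node; the slide itself only states that Q ⊆ I is known.

```python
import random

def sample_failures(nodes, p_seed, F, gamma, rng=None):
    """Draw one seed, one failure set I and one probe set Q (hypothetical helper)."""
    rng = rng or random.Random()
    # Pick the single seed s with probability p_seed[s].
    seed = rng.choices(nodes, weights=[p_seed[v] for v in nodes], k=1)[0]
    # Given the seed, each node u fails independently with probability F(u | seed).
    failed = {u for u in nodes if rng.random() < F(u, seed)}
    # Assumption: each failed node is observed independently with probability gamma,
    # so the probe set Q is always a subset of the failure set I.
    probed = {u for u in failed if rng.random() < gamma}
    return seed, failed, probed
```

The inference task is then to recover `failed - probed` from `probed` alone, given `p_seed` and `F`.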
SLIDE 4 Our approach: Minimum Description Length
- Model cost $L(|Q|, |I|, I)$ has three components:
$L(|Q|, |I|, I) = L(|Q|) + L(|I| \mid |Q|) + L(I \mid |Q|, |I|)$
- $L(|Q|) = -\log \Pr(|Q|)$, by using the Shannon-Fano code
- $L(|I| \mid |Q|) = -\log \frac{\Pr(|Q| \mid |I|)\,\Pr(|I|)}{\Pr(|Q|)}$
- $L(I \mid |Q|, |I|) = -\log \Pr(I \mid |Q|, |I|) = -\log \Pr(I \mid |I|)$
- Data cost: description of $Q^+ = I \setminus Q$ (assuming no observation errors)
$L(Q^+ \mid I) = -\log\bigl(\gamma^{|Q|}(1 - \gamma)^{|Q^+|}\bigr) = -|Q| \log(\gamma) - (|I| - |Q|) \log(1 - \gamma)$
SLIDE 5 Problem Description
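Model Cost
$L(|Q|, |I|, I) = L(|Q|) + L(|I| \mid |Q|) + L(I \mid |Q|, |I|)$
$= -\log \binom{|I|}{|Q|} - |Q| \log(\gamma) - (|I| - |Q|) \log(1 - \gamma) - \log \Bigl[ \sum_{s \in V} p_s(s) \prod_{v \in I} F(v \mid s) \prod_{v' \in V \setminus I} \bigl(1 - F(v' \mid s)\bigr) \Bigr]$ (*after algebra)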
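The "after algebra" step can be sketched as follows. The derivation assumes Bayes' rule for the middle term and assumes that each failed node is probed independently with probability γ; the binomial form of Pr(|Q| | |I|) is inferred from the closed form of the model cost rather than stated explicitly on the slides.

```latex
\begin{align*}
L(|Q|) + L(|I| \mid |Q|)
  &= -\log \Pr(|Q|) - \log \frac{\Pr(|Q| \mid |I|)\,\Pr(|I|)}{\Pr(|Q|)}
   = -\log \Pr(|Q| \mid |I|) - \log \Pr(|I|), \\
L(I \mid |Q|, |I|)
  &= -\log \Pr(I \mid |I|) = -\log \Pr(I) + \log \Pr(|I|), \\
L(|Q|, |I|, I)
  &= -\log \Pr(|Q| \mid |I|) - \log \Pr(I), \\
\Pr(|Q| \mid |I|)
  &= \binom{|I|}{|Q|}\,\gamma^{|Q|}(1 - \gamma)^{|I| - |Q|}, \\
\Pr(I)
  &= \sum_{s \in V} p_s(s) \prod_{v \in I} F(v \mid s)
     \prod_{v' \in V \setminus I} \bigl(1 - F(v' \mid s)\bigr).
\end{align*}
```

The Pr(|I|) terms cancel, which is why the final model cost depends only on the binomial probe term and on the probability of the failure set itself.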
Problem Formulation
Given G, ps, F(·), Q, find I that minimizes the total MDL cost:
$L(|Q|, |I|, I, Q) = -\log \Bigl[ \binom{|I|}{|Q|} \sum_{s \in V} p_s(s) \prod_{v \in I} F(v \mid s) \prod_{v' \in V \setminus I} \bigl(1 - F(v' \mid s)\bigr) \Bigr] - 2|Q| \log(\gamma) - 2(|I| - |Q|) \log(1 - \gamma)$
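The total MDL cost above can be evaluated directly. The sketch below is one possible implementation, not the authors' code: the name `total_mdl_cost`, the dictionary `p_seed`, and the callable `F(v, s)` for F(v | s) are hypothetical, natural logarithms are used, and 0 < γ < 1 is assumed. The sum over seeds is computed with a log-sum-exp step because the products become very small on large networks.

```python
import math

def total_mdl_cost(I, Q, nodes, p_seed, F, gamma):
    """Total MDL cost L(|Q|, |I|, I, Q) in nats (hypothetical helper).

    Assumes 0 < gamma < 1, Q is a subset of I, and F(v, s) returns F(v | s).
    """
    I, Q = set(I), set(Q)
    if not Q <= I:
        raise ValueError("Q must be a subset of I")
    # One log-term per seed s:
    # log ps(s) + sum_{v in I} log F(v|s) + sum_{v' not in I} log(1 - F(v'|s))
    log_terms = []
    for s in nodes:
        if p_seed[s] <= 0:
            continue  # a seed with zero probability contributes nothing to the sum
        term = math.log(p_seed[s])
        for v in nodes:
            p = F(v, s) if v in I else 1.0 - F(v, s)
            if p <= 0:
                term = -math.inf  # this seed cannot explain the set I
                break
            term += math.log(p)
        if term > -math.inf:
            log_terms.append(term)
    if not log_terms:
        return math.inf  # no seed gives I positive probability
    # log-sum-exp: log of the sum over seeds, computed stably.
    m = max(log_terms)
    log_pr_I = m + math.log(sum(math.exp(t - m) for t in log_terms))
    log_binom = math.log(math.comb(len(I), len(Q)))
    return (-log_binom - log_pr_I
            - 2 * len(Q) * math.log(gamma)
            - 2 * (len(I) - len(Q)) * math.log(1 - gamma))
```

As a sanity check, with two nodes, uniform seeds, F ≡ 0.5, γ = 0.5, I = {a, b} and Q = {a}, the cost works out to 5 log 2.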
SLIDE 6
Algorithm Greedy
Input: Instance (V, Q, p, P, γ)
Output: Solution Î that minimizes L(|Q|, |Î|, Î, Q)
for each s ∈ V do
    for each k ∈ [|Q|, |V|] do
        I_s(k) ← top k − |Q| nodes in V \ Q with highest weight f(s, v)
        I_s(k) ← I_s(k) ∪ Q
    end for
end for
S ← {I_s(k) : s ∈ V, k ∈ [|Q|, |V|]}
Î ← arg min_{I ∈ S} L(|Q|, |I|, I, Q)
return Î
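The pseudocode for Algorithm Greedy can be sketched in Python as below. Two points are assumptions rather than facts from the slides: the weight f(s, v) is not defined here, so the sketch ranks nodes by F(v | s) as a stand-in, and the MDL cost is passed in as a caller-supplied function `cost(I)`. The names `greedy`, `F`, and `cost` are hypothetical. The sketch re-evaluates the cost from scratch for every candidate, so it is not claimed to match the running time stated in the analysis.

```python
def greedy(nodes, Q, F, cost):
    """Sketch of Algorithm Greedy (hypothetical helper).

    nodes: list of all nodes V
    Q:     set of probed (known failed) nodes
    F:     callable, F(v, s) = F(v | s); used as an assumed stand-in for f(s, v)
    cost:  callable, cost(I) = total MDL cost of candidate failure set I
    """
    Q = set(Q)
    unprobed = [v for v in nodes if v not in Q]
    best_set, best_cost = None, float("inf")
    for s in nodes:
        # Rank unprobed nodes for this candidate seed, highest weight first.
        ranked = sorted(unprobed, key=lambda v: F(v, s), reverse=True)
        # extra = k - |Q| runs over 0 .. |V| - |Q|, i.e. k in [|Q|, |V|].
        for extra in range(len(ranked) + 1):
            candidate = Q | set(ranked[:extra])  # I_s(k) = top nodes plus Q
            c = cost(candidate)
            if c < best_cost:
                best_set, best_cost = candidate, c
    # best_set stays None if every candidate has infinite cost.
    return best_set, best_cost
```

Every candidate contains Q by construction, so the returned set always respects the constraint Q ⊆ I.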
SLIDE 7
Analysis of Greedy
Theorem: (Additive Approximation)
Let I* be the set minimizing the MDL cost, and let I denote the solution computed by Algorithm Greedy. Then L(|Q|, |I|, I, Q) ≤ L(|Q|, |I*|, I*, Q) + log(n), where n is the number of seed nodes.
Running time
Algorithm Greedy runs in O(|V|³) time
SLIDE 8 Experiments
- Baseline: local improvement algorithm LocalSearch
- Datasets
- Synthetic grid
- 60 × 60 grid
- Uniform seed probability ps(·)
- Conditional failure probability distribution using model of
[Agarwal et al., 2013]: F(v | s) = 1 − d(s, v), where d(·) is (normalized) distance
- Real datasets: Seed and conditional failure probability
distributions computed from data
- JAM data from WAZE for Boston: road network with 2650
nodes.
- WEATHER data from WAZE for Boston: road network with
1520 nodes.
- POWER-GRID: network of 24 nodes from Electric disturbance
events
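The synthetic grid setup can be reproduced roughly as follows. This is a sketch under stated assumptions: the slides do not say which distance is used or how it is normalized, so Euclidean distance divided by the grid diagonal is assumed here, and the names `grid_model`, `p_seed`, and `F` are hypothetical.

```python
import math

def grid_model(rows=60, cols=60):
    """Uniform seed probabilities and F(v | s) = 1 - d(s, v) on a rows x cols grid."""
    nodes = [(r, c) for r in range(rows) for c in range(cols)]
    # Uniform seed probability ps(.) over all grid cells.
    p_seed = {v: 1.0 / len(nodes) for v in nodes}
    # Assumption: d is Euclidean distance divided by the grid diagonal, so 0 <= d <= 1.
    diag = math.hypot(rows - 1, cols - 1) or 1.0  # avoid dividing by zero on a 1 x 1 grid

    def F(v, s):
        return 1.0 - math.hypot(v[0] - s[0], v[1] - s[1]) / diag

    return nodes, p_seed, F
```

Under this normalization the seed itself fails with probability 1 and the failure probability decays linearly to 0 at the opposite corner of the grid.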
SLIDE 9
WAZE dataset
Visualization of Waze dataset. Partitions in the 119 × 78 grid represent nodes in our network.
SLIDE 10 Takeaways
Results for JAM dataset
Figure: Precision, recall, F1 score, and MDL cost ratio versus gamma for Algorithm LocalSearch and Algorithm Greedy; comparison of the MDL costs (MDL cost ratio L(I, Q)/L(I*, Q) versus gamma) for Quick, Local, and Greedy.
- Our MDL-based approach helps identify missing failures
- Promising approach for other problems with missing
information