Fast Reliability Search in Uncertain Graphs Arijit Khan, Francesco - - PowerPoint PPT Presentation

SLIDE 1

Fast Reliability Search in Uncertain Graphs

Arijit Khan, Francesco Bonchi, Aristides Gionis, Francesco Gullo

Systems Group, ETH Zurich; Yahoo Labs, Spain; Aalto University, Finland

SLIDE 2

Uncertain Graphs

[Figure: example uncertain graph with nodes S, T, U, V, W and edge probabilities 0.5, 0.7, 0.6, 0.5, 0.1, 0.2, 0.3, 0.6]

• Social Network
• Traffic Network
• Ad-hoc Mobile Network
• Protein-interaction Network

SLIDE 3

Motivation

Mobile Ad-hoc Network: find the set of sink nodes to which a source node can deliver a packet with high probability

[Figure: Packet Delivery Probability in Mobile Ad-hoc Network; example uncertain graph with nodes S, T, U, V, W]

Traffic Network: find a set of target locations reachable from a source location with high probability

Social Network: find a set of users who could be influenced with high probability by a target user

SLIDE 6

Reliability in Uncertain Graphs

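[Figure: an Uncertain Graph with nodes S, T, U, V, W and edge probabilities 0.5, 0.7, 0.6, 0.5, 0.1, 0.2, 0.3, 0.6; sampling its edges yields a Certain Graph (Possible World)]

Sample Edges: each arc a is kept independently with probability p(a), which produces a certain graph (possible world) G'. The reliability of T from S averages an identity (indicator) function over all possible worlds:

$R(S, T) = \sum_{G' \sqsubseteq G} \Pr(G') \, I_{G'}(S, T)$, with $\Pr(G') = \prod_{a \in G'} p(a) \prod_{a \in G \setminus G'} (1 - p(a))$,

where $I_{G'}(S, T) = 1$ if T is reachable from S in $G'$ and 0 otherwise.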
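To make the possible-world definition concrete, here is a minimal brute-force sketch that enumerates all $2^m$ worlds of a tiny graph. The function names and the four-arc toy graph are hypothetical (the arcs are the ones quoted on the verification slide, not the full example graph), and arcs are assumed independent. It is only usable for a handful of arcs, which is exactly why the problem calls for sampling or indexing.

```python
from itertools import product
from collections import deque

def reachable(source, target, arcs):
    """BFS in a certain graph given as a list of (u, v) arcs."""
    adj = {}
    for u, v in arcs:
        adj.setdefault(u, []).append(v)
    seen, queue = {source}, deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return True
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False

def exact_reliability(source, target, prob_arcs):
    """Sum Pr(G') * I_G'(source, target) over every possible world G'.
    prob_arcs: list of (u, v, p) with independent arc probabilities."""
    total = 0.0
    for present in product((False, True), repeat=len(prob_arcs)):
        pr, world = 1.0, []
        for (u, v, p), keep in zip(prob_arcs, present):
            pr *= p if keep else (1.0 - p)
            if keep:
                world.append((u, v))
        if reachable(source, target, world):  # identity function I_G'(S, T)
            total += pr
    return total

# Hypothetical toy graph: two arc-disjoint paths S-U-V and S-W-V.
ARCS = [("S", "U", 0.5), ("S", "W", 0.7), ("U", "V", 0.2), ("W", "V", 0.3)]
print(round(exact_reliability("S", "V", ARCS), 3))  # 0.289
```

As a sanity check, the two paths are independent here, so $R(S, V) = 1 - (1 - 0.10)(1 - 0.21) = 0.289$.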

SLIDE 7

Reliability Search in Uncertain Graphs

#P-complete problem: Given an uncertain graph G, a probability threshold η ∈ (0, 1), and a source node S in G, find all nodes in G that are reachable from S with probability greater than or equal to the threshold η.
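Written as a set, and assuming the possible-world reliability $R(S, v)$ (the probability that v is reachable from S when each arc a is present independently with probability p(a)), the answer can be stated as follows; the name $\mathrm{Ans}$ is a hypothetical label:

```latex
R(S, v) = \sum_{G' \sqsubseteq G} \Pr(G')\, I_{G'}(S, v),
\qquad
\mathrm{Ans}(S, \eta) = \{\, v \in V(G) : R(S, v) \ge \eta \,\}
```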

SLIDE 8

Related Work

• Two-terminal reliability
• All-terminal reliability
• K-terminal reliability
• Monte-Carlo (MC) sampling
• Distance-constraint reliability: RHT sampling (Jin et al., VLDB 2011)

SLIDE 9

Baseline – MC Simulation + BFS

[Figure: Uncertain Graph (nodes S, T, U, V, W) → MC Sampling + BFS on each Certain Graph (Possible World), repeated for the chosen Number of Samples]
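The baseline can be sketched as follows: draw K possible worlds, run one BFS from the source in each, and count how often every node is reached. This is a minimal sketch; the function names, the fixed seed and the toy graph are hypothetical. Each sample touches every arc once and runs one BFS, which matches the $O(K(m + n))$ cost of MC sampling.

```python
import random
from collections import deque

def mc_bfs_reliability(source, prob_arcs, num_samples, seed=0):
    """Estimate R(source, v) for every node v: K rounds of
    'sample a possible world, then BFS from source'."""
    rng = random.Random(seed)
    hits = {}
    for _ in range(num_samples):
        # Keep each arc independently with its probability p.
        adj = {}
        for u, v, p in prob_arcs:
            if rng.random() < p:
                adj.setdefault(u, []).append(v)
        # BFS from the source inside this sampled world.
        seen, queue = {source}, deque([source])
        while queue:
            u = queue.popleft()
            for v in adj.get(u, []):
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        for v in seen:
            hits[v] = hits.get(v, 0) + 1
    return {v: c / num_samples for v, c in hits.items()}

def reliability_search_mc(source, prob_arcs, eta, num_samples=10_000, seed=0):
    """Return the nodes whose estimated reliability from source is >= eta."""
    est = mc_bfs_reliability(source, prob_arcs, num_samples, seed)
    return {v for v, r in est.items() if r >= eta}

# Hypothetical toy graph; exact reliabilities are S: 1, W: 0.7, U: 0.5, V: 0.289.
ARCS = [("S", "U", 0.5), ("S", "W", 0.7), ("U", "V", 0.2), ("W", "V", 0.3)]
print(sorted(reliability_search_mc("S", ARCS, eta=0.4)))  # ['S', 'U', 'W']
```

Note that every query pays for K full-graph samples, even for nodes that are obviously far below η; that is the cost the RQ-tree aims to avoid.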

SLIDE 10

Can We Be More Efficient?

Given a source node S and a probability threshold η ∈ (0, 1), can we quickly determine the nodes that are certainly not reachable from S with probability greater than or equal to η?

[Figure: example uncertain graph with nodes S, T, U, V, W; η = 0.5]

Indexing (offline); Filtering + Verification (online)

SLIDE 11

RQ-Tree Index

[Figure: RQ-Tree Index over the example uncertain graph (nodes S, U, W, V, T). Clusters: {S, U, W, V, T} (root), {V, T}, {S, U, W}, {S, W}, and single-node leaves U, V, T, W, S; clusters are annotated with Uout(S, *) = 0.8, 0.496, 0 and 0.8; η = 0.5]
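The filtering step can be sketched as a walk up the tree from the leaf that holds S: stop at the first (smallest) cluster C whose upper bound Uout(S, C) drops below η, and keep C as the candidate set. Every node outside C is reached with probability at most Uout(S, C) < η, so pruning it never loses an answer. This is a minimal sketch; the function name is hypothetical, the bounds are taken as given input, and the assignment of the slide's values (0.8, 0.8, 0.496, 0) to the clusters {S}, {S, W}, {S, U, W} and the root is our reading of the figure, not something the slide states explicitly.

```python
def filter_candidates(ancestor_chain, eta):
    """ancestor_chain: [(cluster, u_out), ...] from the leaf {source} up to the root,
    where u_out bounds Pr[source reaches some node outside cluster].
    Return the first (smallest) cluster whose bound is below eta."""
    for cluster, u_out in ancestor_chain:
        if u_out < eta:
            return cluster
    raise ValueError("the chain should end at the root, whose bound is 0")

# Hypothetical mapping of the slide's bounds to clusters (our assumption).
CHAIN = [
    (frozenset({"S"}), 0.8),
    (frozenset({"S", "W"}), 0.8),
    (frozenset({"S", "U", "W"}), 0.496),
    (frozenset({"S", "U", "W", "V", "T"}), 0.0),  # root: nothing lies outside it
]
print(sorted(filter_candidates(CHAIN, eta=0.5)))  # ['S', 'U', 'W']
```

With η = 0.5 the walk stops at {S, U, W}, so V and T are pruned without any sampling.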

SLIDE 13

RQ-Tree: Filtering

[Figure: RQ-Tree Index with clusters {S, U, W, V, T}, {S, U, W}, {S, W}, {S} and {V, T}, annotated with Uout(S, *) = 0.8, 0.496, 0 and 0.8; η = 0.5]

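Max-Flow Min-Cut Based Upper Bound:

Edge capacity: $c(a) = -\log(1 - p(a))$

Compute the max-flow $f$ from S to the nodes outside cluster C:

$U_{out}(S, C) = 1 - \exp(-f)$

Every path from S to a node outside C uses at least one arc of a minimum cut, whose capacities sum to $f$; all arcs of that cut are absent with probability $\prod_{a} (1 - p(a)) = \exp(-f)$, so S reaches outside C with probability at most $1 - \exp(-f)$.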

Benefits:

• No false negatives (recall = 1)
• Computation is limited to the inside of cluster C
• Incremental max-flow computation
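A minimal sketch of the bound, assuming independent arcs with p(a) < 1: all nodes outside C are contracted into one super-sink, capacities are $-\log(1 - p(a))$, and a plain Edmonds-Karp max-flow gives $f$. The slides do not say which max-flow algorithm is used, and this sketch recomputes the flow from scratch rather than incrementally; the names `max_flow`, `u_out` and the toy graph are hypothetical.

```python
import math
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp; cap is a dict-of-dicts residual map, modified in place."""
    flow = 0.0
    while True:
        parent, queue = {s: None}, deque([s])
        while queue and t not in parent:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        bottleneck, v = math.inf, t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:  # push flow, update residual arcs
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] = cap[v].get(u, 0.0) + bottleneck
            v = u
        flow += bottleneck

def u_out(source, cluster, prob_arcs):
    """Upper bound on Pr[source reaches some node outside cluster]."""
    sink = ("outside",)  # hypothetical super-sink standing for all nodes outside C
    cap = {sink: {}}
    for u, v, p in prob_arcs:
        if u not in cluster:
            continue  # a path must leave C through an arc that starts inside C
        head = v if v in cluster else sink
        cap.setdefault(u, {})
        cap.setdefault(head, {})
        # c(a) = -log(1 - p(a)); parallel arcs add up, matching independent arcs
        cap[u][head] = cap[u].get(head, 0.0) - math.log(1.0 - p)
    if source not in cap:
        return 0.0
    return 1.0 - math.exp(-max_flow(cap, source, sink))

ARCS = [("S", "U", 0.5), ("S", "W", 0.7), ("U", "V", 0.2), ("W", "V", 0.3)]
print(round(u_out("S", {"S", "U", "W"}, ARCS), 2))  # 0.44
```

On this toy graph the minimum cut is {U-V, W-V}, giving $1 - 0.8 \cdot 0.7 = 0.44$, which indeed bounds the exact reliability of V (0.289).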

SLIDE 14

RQ-Tree: Verification

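Sampling-based Verification: MC sampling + BFS over the subgraph formed by the candidate set
• Pros: high precision, high recall
• Cons: verification could still be relatively expensive

Lower-Bound-based Verification: Most-Likely Path
• Pros: precision = 1, high efficiency
• Cons: lower recall

[Figure: example with S-U (0.5), U-V (0.2), S-W (0.7), W-V (0.3)]

Pr(S-U-V) = 0.5 * 0.2 = 0.10
Pr(S-W-V) = 0.7 * 0.3 = 0.21
Most-Likely Path: S-W-V

If every arc of the most likely path is present, V is reachable from S, so its probability (0.21) is a lower bound on R(S, V); reporting only nodes whose bound reaches η therefore never produces a false positive.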
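The most likely path can be found with Dijkstra's algorithm once each arc gets length $-\log p(a)$, because minimizing the sum of $-\log p$ maximizes the product of probabilities. This is a minimal sketch under that standard transformation; the function names and the toy graph are hypothetical, and `lb_verify` restricts the search to the subgraph induced by the candidate set.

```python
import heapq
import math

def most_likely_paths(source, prob_arcs):
    """Dijkstra with arc length -log p(a). Returns
    ({node: best single-path probability}, parent map)."""
    adj = {}
    for u, v, p in prob_arcs:
        if p > 0.0:
            adj.setdefault(u, []).append((v, -math.log(p)))
    dist, parent = {source: 0.0}, {source: None}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            if d + w < dist.get(v, math.inf):
                dist[v], parent[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return {v: math.exp(-d) for v, d in dist.items()}, parent

def path_to(parent, target):
    """Follow parent pointers back from target (assumes target was reached)."""
    path = []
    while target is not None:
        path.append(target)
        target = parent[target]
    return path[::-1]

def lb_verify(source, candidates, prob_arcs, eta):
    """Keep candidates whose most-likely-path probability, a lower bound on
    reliability, is >= eta; only arcs inside the candidate set are used."""
    sub = [(u, v, p) for u, v, p in prob_arcs if u in candidates and v in candidates]
    prob, _ = most_likely_paths(source, sub)
    return {v for v in candidates if prob.get(v, 0.0) >= eta}

ARCS = [("S", "U", 0.5), ("S", "W", 0.7), ("U", "V", 0.2), ("W", "V", 0.3)]
prob, parent = most_likely_paths("S", ARCS)
print(path_to(parent, "V"), round(prob["V"], 2))  # ['S', 'W', 'V'] 0.21
```

The lower recall is easy to see here: with η = 0.25, `lb_verify` drops V (path probability 0.21) even though its true reliability is 0.289.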

SLIDE 15

RQ-Tree: Online Complexity

MC Sampling: $O(K(m + n))$

Recursive Sampling [VLDB '11]: $O(n^{2d})$

RQ-Tree + MC-Sampling-based Verification [Our Method]: $O(\tilde{m} + \tilde{n} + K(\tilde{m} + \tilde{n}))$

RQ-Tree + Lower-Bound-based Verification [Our Method]: $O(\tilde{m} + \tilde{n})$

K = number of samples, m = number of edges, n = number of nodes, $\tilde{n}$ = number of nodes in the candidate set, $\tilde{m}$ = number of edges induced by the candidate nodes, d = diameter of the graph

SLIDE 16

RQ-Tree Index Construction

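[Figure: Uncertain Graph with nodes S, U, W, V, T and edge probabilities 0.5, 0.7, 0.6, 0.5, 0.1, 0.2, 0.3, 0.6, and the RQ-Tree Index built from it: {S, U, W, V, T}, {V, T}, {S, U, W}, {S, W}, and single-node leaves]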

Hierarchical clustering: minimum-cut balanced bipartition using METIS, with edge weight $w(a) = -\log(1 - p(a))$
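A minimal sketch of the index construction, assuming a tiny graph: METIS is replaced here by an exhaustive search for the minimum-cut balanced bipartition (only feasible for a handful of nodes), arcs are treated as undirected for partitioning, and the recursion stops at single nodes. The arc V-T with probability 0.6 is hypothetical, added so that T is connected; on this toy graph the recursion produces the clusters {V, T}, {S, U, W} and {S, W} shown in the figure. All function names are hypothetical.

```python
import math
from itertools import combinations

def edge_weights(prob_arcs):
    """w(a) = -log(1 - p(a)) per undirected node pair (parallel arcs summed)."""
    weights = {}
    for u, v, p in prob_arcs:
        key = tuple(sorted((u, v)))
        weights[key] = weights.get(key, 0.0) - math.log(1.0 - p)
    return weights

def cut_weight(part, nodes, weights):
    """Total weight of edges inside `nodes` with exactly one endpoint in `part`."""
    return sum(w for (u, v), w in weights.items()
               if u in nodes and v in nodes and ((u in part) != (v in part)))

def balanced_min_bisection(nodes, weights):
    """Exhaustive minimum-cut balanced bipartition: a stand-in for METIS."""
    best = None
    for combo in combinations(sorted(nodes), len(nodes) // 2):
        part = frozenset(combo)
        w = cut_weight(part, nodes, weights)
        if best is None or w < best[0]:
            best = (w, part)
    return best[1], frozenset(nodes) - best[1]

def build_rq_tree(nodes, weights):
    """Recursively bisect down to single nodes; returns (cluster, [children])."""
    nodes = frozenset(nodes)
    if len(nodes) == 1:
        return (nodes, [])
    left, right = balanced_min_bisection(nodes, weights)
    return (nodes, [build_rq_tree(left, weights), build_rq_tree(right, weights)])

def show(tree, depth=0):
    print("  " * depth + str(sorted(tree[0])))
    for child in tree[1]:
        show(child, depth + 1)

ARCS = [("S", "U", 0.5), ("S", "W", 0.7), ("U", "V", 0.2), ("W", "V", 0.3),
        ("V", "T", 0.6)]  # the V-T arc is hypothetical
NODES = {"S", "U", "W", "V", "T"}
WEIGHTS = edge_weights(ARCS)
show(build_rq_tree(NODES, WEIGHTS))
```

The weight $-\log(1 - p(a))$ makes a cut cheap exactly when its arcs are unlikely to exist together, which is also what keeps the later $U_{out}$ bounds small.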

SLIDE 17

Experimental Results

Dataset Characteristics

| Dataset | # Nodes | # Edges | Arc probability: mean ± SD | Arc probability: quartiles |
|---|---|---|---|---|
| DBLP | 684,911 | 4,569,982 | 0.14 ± 0.11 | {0.09, 0.09, 0.18} |
| Flickr | 78,322 | 20,343,018 | 0.09 ± 0.06 | {0.06, 0.07, 0.09} |
| BioMine | 1,008,201 | 13,445,048 | 0.27 ± 0.21 | {0.12, 0.22, 0.36} |

SLIDE 18

Accuracy Results

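Precision

| Dataset | RQ-Tree-MC (η=0.4) | RQ-Tree-MC (η=0.6) | RQ-Tree-MC (η=0.8) | RQ-Tree-LB (η=0.4) | RQ-Tree-LB (η=0.6) | RQ-Tree-LB (η=0.8) |
|---|---|---|---|---|---|---|
| DBLP | 0.96 | 0.99 | 0.99 | 1 | 1 | 1 |
| Flickr | 0.97 | 0.98 | 0.98 | 1 | 1 | 1 |
| BioMine | 0.95 | 0.96 | 0.97 | 1 | 1 | 1 |

Recall

| Dataset | RQ-Tree-MC (η=0.4) | RQ-Tree-MC (η=0.6) | RQ-Tree-MC (η=0.8) | RQ-Tree-LB (η=0.4) | RQ-Tree-LB (η=0.6) | RQ-Tree-LB (η=0.8) |
|---|---|---|---|---|---|---|
| DBLP | 0.99 | 0.99 | 1.00 | 0.75 | 0.87 | 0.91 |
| Flickr | 0.98 | 0.99 | 0.99 | 0.76 | 0.79 | 0.83 |
| BioMine | 0.97 | 0.98 | 0.98 | 0.77 | 0.81 | 0.85 |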

SLIDE 19

Efficiency Results

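Online query-processing time (sec)

| Dataset | RQ-Tree-MC (η=0.4) | RQ-Tree-MC (η=0.6) | RQ-Tree-MC (η=0.8) | RQ-Tree-LB (η=0.4) | RQ-Tree-LB (η=0.6) | RQ-Tree-LB (η=0.8) | MC (all η) |
|---|---|---|---|---|---|---|---|
| DBLP | 43 | 40 | 36 | 1.50 | 0.60 | 0.60 | 588 |
| Flickr | 60 | 59 | 55 | 0.21 | 0.20 | 0.17 | 114 |
| BioMine | 6062 | 5417 | 4974 | 1.00 | 0.50 | 0.50 | 25,608 |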

SLIDE 20

Pruning Capacity of Filtering Phase


Precision of Filtering Phase

SLIDE 21

RQ-Tree in Influence Maximization

The RQ-Tree index can also be used for multi-source reliability queries and for influence maximization.

[Figures: Expected Spread (Last.FM); Top-k Seed Finding Time (Last.FM)]

SLIDE 22

Conclusion

We presented an indexing method (RQ-tree) for answering online reliability queries efficiently and effectively.

The RQ-tree works particularly well with lower arc probabilities and with higher probability thresholds.

In future work, we shall study reliability search queries when the arc probabilities are not independent.

SLIDE 23

Questions?