SCALABLE DISTRIBUTED SUBGRAPH ENUMERATION
AUTHORS: LONGBIN LAI, LU QIN, XUEMIN LIN, YING ZHANG, LIJUN CHANG
OUTLINE
PROBLEM DEFINITION
ALGORITHM FRAMEWORK
TWINTWIG JOIN - VLDB'15
SEED
EXPERIMENTS
CONCLUSION
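SUBGRAPH ENUMERATION
PROBLEM DEFINITION
Given a data graph G and a pattern graph P, subgraph enumeration aims to find all subgraphs g ⊆ G (matches) that are isomorphic to P.
[Figure: pattern graph P with vertices v1-v4 and data graph G with vertices u1-u6]
Match: (v1, v2, v3, v4) → (u1, u2, u5, u3)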
Match: (v1, v2, v3, v4) → (u4, u2, u3, u5)
Match: (v1, v2, v3, v4) → (u6, u3, u2, u5)
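The problem statement can be made concrete with a brute-force sketch. This is not the distributed method of the talk, only a naive baseline that tries every injective assignment of pattern vertices to data vertices. The function name `enumerate_matches` and the toy graphs are hypothetical, since the edges of the P and G drawn on the slide are not recoverable from the text.

```python
from itertools import permutations

def enumerate_matches(pattern_edges, data_edges):
    """Return every injective mapping of pattern vertices to data vertices
    that sends each pattern edge onto a data edge."""
    p_vertices = sorted({v for e in pattern_edges for v in e})
    d_vertices = sorted({u for e in data_edges for u in e})
    d_edge_set = {frozenset(e) for e in data_edges}
    found = []
    for image in permutations(d_vertices, len(p_vertices)):
        f = dict(zip(p_vertices, image))
        if all(frozenset((f[a], f[b])) in d_edge_set for a, b in pattern_edges):
            found.append(f)
    return found

# Hypothetical toy graphs (not the ones drawn on the slide).
triangle = [("v1", "v2"), ("v2", "v3"), ("v1", "v3")]
data = [("u1", "u2"), ("u2", "u3"), ("u1", "u3"), ("u3", "u4")]

matches = enumerate_matches(triangle, data)
print(len(matches))                                # 6 mappings
print({frozenset(m.values()) for m in matches})    # one distinct subgraph
```

The single triangle of the toy graph is reported six times, once per automorphism of the pattern. Partial-order constraints on the pattern vertices, such as the `v1 < v2 < v3 < v4` annotations in the experiment slides, are one way to keep a single mapping per subgraph.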
PATTERN DECOMPOSITION
[Figure: pattern graph P (v1-v4) and data graph G (u1-u6)]
Join units: p0 on {v1, v2, v4}, p1 on {v2, v4, v3}, p2 on {v4, v3}
P = p0 ∪ p1 ∪ p2
WHAT CAN BE JOIN UNITS
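Graph storage: $\Phi(G) = \{G_u \mid u \in V(G)\}$, kept as key-value pairs $(u; G_u)$
$G_u$ is the local graph of $u$, with $u \in V(G_u)$
$\bigcup_{u \in V(G)} E(G_u) = E(G)$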
A structure $p$ can be a join unit if $R_G(p) = \bigcup_{u \in V(G)} R_{G_u}(p)$
$R_G(p)$: the matches of $p$ in $G$
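The join-unit condition can be checked on a small example. The sketch below assumes the star-shaped local graph (a vertex plus its incident edges) that the deck introduces later as the simple graph storage, and it uses a hypothetical four-edge data graph. The helper names `matches`, `star_local_graphs` and `is_join_unit` are illustrative only, and the check tests the condition on one graph rather than proving it in general.

```python
from itertools import permutations

def matches(pattern_edges, data_edges):
    """All injective vertex mappings that send each pattern edge to a data edge."""
    pv = sorted({v for e in pattern_edges for v in e})
    dv = sorted({u for e in data_edges for u in e})
    es = {frozenset(e) for e in data_edges}
    found = set()
    for image in permutations(dv, len(pv)):
        f = dict(zip(pv, image))
        if all(frozenset((f[a], f[b])) in es for a, b in pattern_edges):
            found.add(tuple(sorted(f.items())))
    return found

def star_local_graphs(edges):
    """Phi(G): map each vertex u to the edge set of its star-shaped local graph G_u."""
    phi = {}
    for a, b in edges:
        phi.setdefault(a, set()).add(frozenset((a, b)))
        phi.setdefault(b, set()).add(frozenset((a, b)))
    return phi

def is_join_unit(pattern_edges, edges):
    """Check R_G(p) == union over u of R_{G_u}(p) on this particular graph."""
    local = set()
    for gu in star_local_graphs(edges).values():
        local |= matches(pattern_edges, [tuple(e) for e in gu])
    return local == matches(pattern_edges, edges)

# Hypothetical data graph: a triangle u1-u2-u3 with a pendant vertex u4.
G = [("u1", "u2"), ("u1", "u3"), ("u2", "u3"), ("u3", "u4")]
two_star = [("c", "a"), ("c", "b")]
triangle = [("a", "b"), ("b", "c"), ("a", "c")]

phi = star_local_graphs(G)
print(set().union(*phi.values()) == {frozenset(e) for e in G})  # edge coverage holds
print(is_join_unit(two_star, G))   # True: every 2-star lies inside some local star
print(is_join_unit(triangle, G))   # False: no star-shaped local graph contains a triangle
```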
JOIN PLAN (TREE)
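$P = p_0 \cup p_1 \cup p_2 \cup p_3$
$R(P) = R(p_0) \bowtie R(p_1) \bowtie R(p_2) \bowtie R(p_3)$
[Figure: join tree with root $R(P)$, intermediate nodes $R(P'_1)$ and $R(P'_2)$, and leaves $R(p_0)$, $R(p_1)$, $R(p_2)$, $R(p_3)$]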
The matches of each join unit can be computed online and independently in each local graph.
[Figure: two join trees for $R(P)$ over the leaves $R(p_0)$, $R(p_1)$, $R(p_2)$, $R(p_3)$, with intermediate results $R(P'_1)$ and $R(P'_2)$: a left-deep tree and a bushy tree]
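To illustrate that a left-deep tree and a bushy tree compute the same $R(P)$ while grouping the joins differently, here is a small single-machine sketch. It assumes a hypothetical path pattern v1-v2-v3-v4-v5 whose four join units are single edges, and a hypothetical five-vertex data graph. The `join` helper is a plain nested-loop natural join, not the distributed join of the talk.

```python
def join(left, right):
    """Natural join of two match relations (lists of dicts), keeping only
    injective results: distinct pattern vertices map to distinct data vertices."""
    out = []
    for l in left:
        for r in right:
            shared = l.keys() & r.keys()
            if all(l[k] == r[k] for k in shared):
                merged = {**l, **r}
                if len(set(merged.values())) == len(merged):
                    out.append(merged)
    return out

def edge_matches(a, b, data_edges):
    """R(p) for a single-edge join unit (a, b): both orientations of each data edge."""
    return ([{a: x, b: y} for x, y in data_edges]
            + [{a: y, b: x} for x, y in data_edges])

def canon(rel):
    """Order-independent form of a relation, for comparing two plans."""
    return sorted(tuple(sorted(m.items())) for m in rel)

# Hypothetical data graph and join units p0..p3 = (v1,v2), (v2,v3), (v3,v4), (v4,v5).
data = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 5)]
R = [edge_matches(f"v{i + 1}", f"v{i + 2}", data) for i in range(4)]

left_deep = join(join(join(R[0], R[1]), R[2]), R[3])
bushy = join(join(R[0], R[1]), join(R[2], R[3]))

print(canon(left_deep) == canon(bushy))  # True
print(len(left_deep))                    # 4
```

Both plans return the same four matches (the paths 1-2-3-4-5 and 1-2-5-4-3, each in two directions). What differs between plans is the size of the intermediate relations, which is what a plan optimiser tries to keep small.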
DESCRIBE THE ALGORITHMS
decomposition
SIMPLE GRAPH STORAGE
TWINTWIG JOIN - VLDB2015
Local graph $G_u$:
$V(G_u) = \{u\} \cup N(u)$
$E(G_u) = \{(u, u') \mid u' \in N(u)\}$
[Figure: data graph G (u1-u6) with local graphs $G_{u_1}$ on {u1, u2, u3} and $G_{u_2}$ on {u2, u1, u3, u4, u5}]
… Star as the join unit
A node with degree 1,000,000 will generate $10^{18}$ 3-stars
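The order of magnitude on this slide can be reproduced with a short calculation. The sketch assumes that a star match is counted as an ordered choice of distinct leaves among the neighbours of the centre, which gives roughly d^k matches for a k-leaf star on a vertex of degree d. The function name `star_matches` is hypothetical.

```python
from math import comb, perm

def star_matches(degree, leaves, ordered=True):
    """Number of `leaves`-leaf stars centred on one vertex of the given degree.
    ordered=True counts mappings (leaf order matters), ordered=False counts leaf sets."""
    return perm(degree, leaves) if ordered else comb(degree, leaves)

d = 1_000_000
print(f"{star_matches(d, 3):.2e}")                 # 1.00e+18 ordered 3-stars
print(f"{star_matches(d, 3, ordered=False):.2e}")  # 1.67e+17 unordered 3-stars
print(f"{star_matches(d, 2):.2e}")                 # 1.00e+12 ordered 2-stars
```

Dropping from three leaves to two reduces the count for this single vertex by a factor of about one million, which is the motivation for restricting join units to small stars.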
SIMPLE GRAPH STORAGE
TWINTWIG JOIN
Any join that uses stars as the join units can instead be solved using twintwigs, with at most the same (and often much lower) cost
LEFT-DEEP JOIN PLAN
TWINTWIG JOIN
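[Figure: left-deep join plan. p0 on (v1, v2, v4) is joined with p1 on (v2, v4, v3) to give a partial result on (v1, v2, v3, v4), which is then joined with p2 on (v4, v3) to give the matches of P on (v1, v2, v3, v4)]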
DRAWBACKS
TWINTWIG JOIN
With only twintwigs as join units, too many intermediate results
edge twintwigs
TwinTwigJoin
$10^{12}$
[Figure: join plans for a six-vertex pattern (v1-v6) over the matches $R(p_0)$, $R(p_1)$, $R(p_2)$, $R(p_3)$ of its join units]
Optimal solution is a bushy join
MOTIVATIONS
SEED - VLDB'17
star and clique as the join units
SCP GRAPH STORAGE
SEED
Local graph $G_u^+$:
$V(G_u^+) = V(G_u) = \{u\} \cup N(u)$
$E(G_u^+) = E(G_u) \cup \{(u', u'') \mid (u', u'') \in E(G) \wedge u', u'' \in N(u)\}$
NEIGHBOUR EDGES: the edges $E(G_u)$ between $u$ and its neighbours
TRIANGLE EDGES: the added edges $(u', u'')$ between two neighbours of $u$
[Figure: data graph G (u1-u6) with local graphs $G_{u_1}^+$ and $G_{u_2}^+$]
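The construction of the SCP local graph can be sketched directly from its definition. The edge list below is hypothetical: it is chosen only so that N(u1) = {u2, u3} and N(u2) = {u1, u3, u4, u5}, as in the figure, and the remaining edges of the slide's graph are not recoverable. The function name `scp_local_graph` is likewise illustrative.

```python
from itertools import combinations

def scp_local_graph(u, adj):
    """Return (vertices, neighbour_edges, triangle_edges) of the local graph G_u^+."""
    nbrs = adj[u]
    vertices = {u} | nbrs
    # Neighbour edges: E(G_u), the edges between u and each of its neighbours.
    neighbour_edges = {frozenset((u, w)) for w in nbrs}
    # Triangle edges: edges of G whose two endpoints are both neighbours of u.
    triangle_edges = {frozenset((a, b))
                      for a, b in combinations(sorted(nbrs), 2)
                      if b in adj[a]}
    return vertices, neighbour_edges, triangle_edges

# Hypothetical edge list, consistent with the neighbourhoods of u1 and u2 on the slide.
edges = [("u1", "u2"), ("u1", "u3"), ("u2", "u3"),
         ("u2", "u4"), ("u2", "u5"), ("u4", "u5"), ("u5", "u6")]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

v1, ne1, te1 = scp_local_graph("u1", adj)
v2, ne2, te2 = scp_local_graph("u2", adj)
print(sorted(v1), [sorted(e) for e in te1])          # u1's local graph gains edge (u2, u3)
print(sorted(v2), sorted(sorted(e) for e in te2))    # u2's gains (u1, u3) and (u4, u5)
```

Because every clique that contains u lies inside $G_u^+$, cliques as well as stars can be matched locally under this storage.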
SCP GRAPH STORAGE
SEED
both star and clique as the join units
for each local graph
OPTIMAL BUSHY JOIN PLAN
SEED
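$E_P$: an execution plan of $P$, with cost $C(E_P)$
$C(P)$: the cost associated with the pattern $P$
Optimal plan: the $E_P$ for which $C(E_P)$ is minimised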
$R(P') = R(P'_l) \bowtie R(P'_r)$
[Figure: the execution plan $E_{P'}$ of a sub-pattern $P'$ joins the plans $E_{P'_l}$ and $E_{P'_r}$ of its left and right parts $P'_l$ and $P'_r$]
$C(E_{P'}) = \min_{P'_l \subset P' \,\wedge\, P'_r = P' \setminus P'_l} \{ C(E_{P'_l}) + C(P'_l) + C(E_{P'_r}) + C(P'_r) \}$
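The recurrence lends itself to dynamic programming over sub-patterns. The sketch below makes several simplifying assumptions that are not taken from the slides: sub-patterns are represented as sets of join units rather than sets of edges, the plan cost of a single join unit is taken to be 0, and the size estimates in `SIZES` are invented numbers for a chain of four units, with disconnected unit sets given infinite cost. The names `optimal_plan`, `size` and `SIZES` are hypothetical.

```python
from functools import lru_cache
from itertools import combinations

def optimal_plan(units, size):
    """Return (cost, tree) minimising C(E_P) under the slide's recurrence.
    units: iterable of join-unit names; size(frozenset) -> estimated C(P')."""
    @lru_cache(maxsize=None)
    def best(sub):
        if len(sub) == 1:
            (p,) = sub
            return 0, p  # assumption: a join unit is matched directly in the local graphs
        items = sorted(sub)
        anchor, rest = items[0], items[1:]
        result = (float("inf"), None)
        # Keep `anchor` on the left so that each unordered split is tried once.
        for k in range(len(rest)):
            for extra in combinations(rest, k):
                left = frozenset((anchor, *extra))
                right = sub - left
                cl, tl = best(left)
                cr, tr = best(right)
                cost = cl + size(left) + cr + size(right)
                if cost < result[0]:
                    result = (cost, (tl, tr))
        return result
    return best(frozenset(units))

# Invented size estimates for a chain p0-p1-p2-p3; unlisted (disconnected) sets cost infinity.
SIZES = {frozenset(k): v for k, v in {
    ("p0",): 100, ("p1",): 100, ("p2",): 100, ("p3",): 100,
    ("p0", "p1"): 50, ("p1", "p2"): 5000, ("p2", "p3"): 50,
    ("p0", "p1", "p2"): 2000, ("p1", "p2", "p3"): 2000,
}.items()}

def size(sub):
    return SIZES.get(sub, float("inf"))

cost, tree = optimal_plan(["p0", "p1", "p2", "p3"], size)
print(cost, tree)  # 500 (('p0', 'p1'), ('p2', 'p3'))
```

With these invented sizes the bushy plan costs 500, while both left-deep alternatives cost 2450 because they have to materialise a three-unit intermediate result of estimated size 2000.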
SETUP
EXPERIMENTS
[Figure: the seven query patterns q1-q7, each annotated with partial-order constraints on its vertices]
Node    Instance    vCPU  Memory  Disk
master  m3.xlarge   4     15GB    2 x 40GB SSD
slave   c3.4xlarge  16    30GB    2 x 160GB SSD
RESULTS
EXPERIMENTS
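[Figure: running time (s) of SEED+O, TT and PSgL on datasets yt and lj; y-axis from 10^1 to 10^4, then INF. Bar labels: 134, 134, 220, 220]
[Figure: running time (s) of SEED+O, TT and PSgL on datasets yt and lj; y-axis from 10^1 to 10^4, then INF. Bar labels: 29, 612, 107, 5206]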
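[Figure: q3 (pattern on v1-v4, v1 < v2 < v3 < v4). Running time (s) of SEED+O, TT and PSgL on datasets yt and lj; y-axis from 10^1 to 10^4, then INF. Bar labels: 28, 63, 279, 60, 1281, 5071]
[Figure: q4 (pattern on v1-v5, v2 < v5). Running time (s) of SEED+O, TT and PSgL on datasets yt and lj; y-axis from 10^1 to 10^4, then INF. Bar labels: 780, 3282, 1686]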
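[Figure: q5 (pattern on v1-v6, v3 < v5). Running time (s) of SEED+O, TT and PSgL on datasets yt and lj; y-axis from 10^1 to 10^4, then INF. Bar labels: 306, 5814]
[Figure: q6 (pattern on v1-v5, v2 < v5, v3 < v4). Running time (s) of SEED+O, TT and PSgL on datasets yt and lj; y-axis from 10^1 to 10^4, then INF. Bar labels: 66, 229, 850, 1013, 6968]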
[Figure: running time (s) of SEED+O, TT and PSgL on datasets yt and lj; y-axis from 10^1 to 10^4, then INF. Bar labels: 29, 129, 493, 1206]
CONCLUSION
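Join-based algorithms to solve subgraph enumeration:
TwinTwigJoin: simple graph storage (twintwig as the join units) + optimal left-deep join
SEED: SCP graph storage (star and clique as the join units) + optimal bushy join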