GRAIL: Scalable Reachability Index for Large Graphs


SLIDE 1

Problem Definition & Motivation Background Our Approach : GRAIL Experiments Conclusion & Future Work

GRAIL: Scalable Reachability Index for Large Graphs

Hilmi Yıldırım (1), Vineet Chaoji (2), Mohammed J. Zaki (1)

(1) Rensselaer Polytechnic Institute, Troy, NY
(2) Yahoo! Labs, Bangalore, India

VLDB 2010, 14 September

SLIDE 2

Outline

  • Problem Definition & Motivation
  • Background
      – Related Work
      – Interval Labeling
  • Our Approach : GRAIL
      – Index Construction
      – Querying
  • Experiments
      – Experimental Setup & Datasets
      – Results and Comparison with Other Methods
      – Sensitivity to Different Graph Types and Parameters
  • Conclusion & Future Work

SLIDE 3

Problem Definition

Reachability Query : Given two vertices u and v in a directed acyclic graph (DAG) G, is there a path from u to v?

  • Simple in undirected graphs
  • Any directed graph can be transformed into a DAG (by collapsing each strongly connected component into a single node)

[Figure: example DAG on vertices A to J]

  • Query(B, I): Reachable
  • Query(D, B): Not Reachable
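
For reference, a reachability query can always be answered without any index by searching from u. The sketch below is a minimal illustration of that baseline; the adjacency-list representation and the small graph are hypothetical and are not the DAG drawn on the slide.

```python
def reachable(adj, u, v):
    """Return True if there is a directed path from u to v (iterative DFS)."""
    stack, seen = [u], {u}
    while stack:
        x = stack.pop()
        if x == v:
            return True
        for y in adj.get(x, ()):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return False


# Hypothetical example graph (not the one in the figure).
example = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(reachable(example, "a", "d"), reachable(example, "d", "a"))
```

Each query costs O(n + m) in the worst case, which is the cost an index is meant to avoid.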

SLIDE 4

Motivation

Traditional Applications

  • Class hierarchies, GIS, dependency graphs

Trending Applications

  • Semantic Web
  • Biological networks
  • Citation graphs

Motivation

  • Existing methods do not scale for large and dense graphs

SLIDE 6

Related Work

Method                                Construction Time   Query Time       Index Size
Opt. Tree Cover (Agrawal et al. 89)   O(nm)               O(n)             O(n^2)
GRIPP (Trissl et al. 07)              O(m + n)            O(m − n)         O(m + n)
Dual Labeling (Wang et al. 06)        O(n + m + t^3)      O(1)             O(n + t^2)
PathTree (Jin et al. 08)              O(mk)               O(mk)/O(mn)      O(nk)
2HOP (Cohen et al. 03)                O(n^4)              O(√m)            O(n√m)
HOPI (Schenkel et al. 05)             O(n^3)              O(√m)            O(n√m)
GRAIL (this paper)                    O(d(n + m))         O(d)/O(n + m)    O(dn)

Baselines
Full Transitive Closure               O(nm)               O(1)             O(n^2)
DFS/BFS                               O(1)                O(n + m)         O(1)

SLIDE 17

Interval Labeling on a Tree

Post-Order Labeling

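  • Interval of u is [s(u), e(u)]
  • e(u) is the post-order value of node u
  • s(u) is the minimum of e(v) over all v with u ⇒ v (u reaches v, including v = u)

[Figure: example tree with its final interval labels]

node    root     1       2       3       4       5       6       7       8       9
label   [1,10]   [1,6]   [7,9]   [1,4]   [5,5]   [7,8]   [7,7]   [1,3]   [1,1]   [2,2]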
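
The labeling rule above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the tree edges are an assumption inferred from the intervals on the slide (the drawing itself is not in the extracted text), and the root id 0 is made up because the root's id is missing from the extraction.

```python
def label_tree(children, root):
    """Assign (s(u), e(u)) to every node of a rooted tree.

    e(u) is the post-order rank of u; s(u) is the smallest rank in u's subtree.
    """
    labels, rank = {}, 0

    def visit(u):
        nonlocal rank
        start = None
        for c in children.get(u, ()):
            visit(c)
            s_c = labels[c][0]
            start = s_c if start is None else min(start, s_c)
        rank += 1
        labels[u] = (rank if start is None else start, rank)

    visit(root)
    return labels


def contains(a, b):
    """True if interval b lies inside interval a."""
    return a[0] <= b[0] and b[1] <= a[1]


# ASSUMPTION: edges reconstructed from the slide's intervals; root id 0 is hypothetical.
tree = {0: [1, 2], 1: [3, 4], 2: [5], 3: [7], 5: [6], 7: [8, 9]}
labels = label_tree(tree, 0)
print(labels[1], labels[9], contains(labels[1], labels[9]))
```

On a tree, u reaches v exactly when the interval of u contains the interval of v, so a query is a constant-time comparison.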

SLIDE 21

Interval Labeling on a DAG

  • False positives on DAGs, such as 6 → 9
  • Variants of Interval Labeling
      – Tree Cover
      – Optimal Tree Cover
      – GRIPP
      – PathTree

[Figure: example DAG labeled by a single traversal]

node    root     1       2       3       4       5       6       7       8       9
label   [1,10]   [1,6]   [1,9]   [1,4]   [1,5]   [1,8]   [1,7]   [1,3]   [1,1]   [2,2]
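
The false positive can be reproduced with a single-traversal labeling of a DAG. The sketch below is an illustration under stated assumptions: the edge list is a reconstruction inferred from the labels and exception lists shown in this deck (the drawing is not in the extracted text), the root id 0 is hypothetical, and the function names are made up for this example.

```python
def label_dag(adj, root):
    """One DFS over a DAG; every node is labeled exactly once.

    e(u) is the post-order rank; s(u) is the minimum of e(u) and s(c) over all
    children c, including children already visited through another parent.
    """
    labels, rank = {}, 0

    def visit(u):
        nonlocal rank
        for c in adj.get(u, ()):
            if c not in labels:
                visit(c)
        rank += 1
        labels[u] = (min([rank] + [labels[c][0] for c in adj.get(u, ())]), rank)

    visit(root)
    return labels


def contains(a, b):
    return a[0] <= b[0] and b[1] <= a[1]


def reaches(adj, u, v):
    """Ground truth by plain DFS."""
    stack, seen = [u], {u}
    while stack:
        x = stack.pop()
        if x == v:
            return True
        for y in adj.get(x, ()):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return False


# ASSUMPTION: reconstructed edges; 2->3, 4->8 and 6->8 are the non-tree edges.
dag = {0: [1, 2], 1: [3, 4], 2: [5, 3], 3: [7], 4: [8], 5: [6], 6: [8], 7: [8, 9]}
labels = label_dag(dag, 0)
print(labels[6], labels[9], contains(labels[6], labels[9]), reaches(dag, 6, 9))
```

Node 6 receives [1,7], which contains [2,2] of node 9 although 9 is not reachable from 6. Containment therefore only says "possibly reachable" on a DAG, while non-containment still proves non-reachability.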

SLIDE 25

Interval Labeling on a DAG

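  • False positives on DAGs, such as 6 → 9
  • Variants of Interval Labeling
      – Tree Cover
      – Optimal Tree Cover
      – GRIPP
      – PathTree

[Figure: the same DAG with the tree labels, plus three additional intervals shown in the figure: [1,1], [1,1] and [1,4]]

node    root     1       2       3       4       5       6       7       8       9
label   [1,10]   [1,6]   [7,9]   [1,4]   [5,5]   [7,8]   [7,7]   [1,3]   [1,1]   [2,2]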


SLIDE 27

GRAIL : Graph Reachability Indexing via RAndomized Interval Labeling

Key Observations

  • No false negatives: if u reaches v, the interval of u contains the interval of v.
  • Interval labeling is repeatable: different traversals give different labelings.

GRAIL Index Construction

  • For each dimension of the index
      – Generate a randomized post-order labeling
  • Each label corresponds to one dimension of the hyperrectangle that the node represents.
  • Each new dimension reduces the number of exceptions.
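
The construction loop above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `seed` parameter and the recursive DFS are choices made for this example, and the example graph is a reconstruction assumed from the labels shown elsewhere in the deck.

```python
import random


def build_index(adj, nodes, d, seed=0):
    """Return {node: [(s, e), ...]} with one interval per random traversal."""
    rng = random.Random(seed)
    nodes = list(nodes)
    non_roots = {c for kids in adj.values() for c in kids}
    roots = [u for u in nodes if u not in non_roots]
    index = {u: [] for u in nodes}

    for _ in range(d):
        labels = {}
        rank = 0

        def visit(u):
            nonlocal rank
            kids = list(adj.get(u, ()))
            rng.shuffle(kids)  # a random child order gives a different labeling
            for c in kids:
                if c not in labels:
                    visit(c)
            rank += 1
            labels[u] = (min([rank] + [labels[c][0] for c in kids]), rank)

        order = roots[:]
        rng.shuffle(order)
        for r in order:
            if r not in labels:
                visit(r)
        for u in nodes:
            index[u].append(labels[u])
    return index


def contains_all(index, u, v):
    """True if the hyperrectangle of u contains that of v in every dimension."""
    return all(a[0] <= b[0] and b[1] <= a[1] for a, b in zip(index[u], index[v]))


def reaches(adj, u, v):
    """Ground truth by plain DFS, used only to check the index."""
    stack, seen = [u], {u}
    while stack:
        x = stack.pop()
        if x == v:
            return True
        for y in adj.get(x, ()):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return False


# ASSUMPTION: reconstructed example DAG with a hypothetical root id 0.
dag = {0: [1, 2], 1: [3, 4], 2: [5, 3], 3: [7], 4: [8], 5: [6], 6: [8], 7: [8, 9]}
index = build_index(dag, range(10), d=3, seed=1)
```

Each traversal costs O(n + m), so d traversals give the O(d(n + m)) construction time and O(dn) index size listed for GRAIL in the related-work table. For very deep graphs the recursive DFS would need to be replaced by an iterative one.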

SLIDE 30

GRAIL in action

  • Many exceptions after the first traversal.

node   exceptions (E)     direct (E_d)       indirect (E_i)
2      {1, 4}             ∅                  {1, 4}
4      {3, 7, 9}          {3, 7, 9}          ∅
5      {1, 3, 4, 7, 9}    ∅                  {1, 3, 4, 7, 9}
6      {1, 3, 4, 7, 9}    {1, 3, 4, 7, 9}    ∅

  • Most of them are eliminated after the second traversal.

node   exceptions (E)     direct (E_d)       indirect (E_i)
4      {3, 7, 9}          {3, 7, 9}          ∅

  • Traversal Strategy
      – Randomized
      – Fixed Reverse Pairs

[Figure: example DAG with two-dimensional labels]

node    first traversal   second traversal
root    [1,10]            [1,10]
1       [1,6]             [1,9]
2       [1,9]             [1,7]
3       [1,4]             [1,6]
4       [1,5]             [1,8]
5       [1,8]             [1,3]
6       [1,7]             [1,2]
7       [1,3]             [1,5]
8       [1,1]             [1,1]
9       [2,2]             [4,4]
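
The effect of the second traversal can be checked by computing the exception lists for one and for two dimensions. The sketch below is an illustration under stated assumptions: the edges are reconstructed from the labels and exception tables on the slide, the root id 0 is hypothetical, and the two child orders were chosen so that the traversals reproduce the slide's labels.

```python
def label_dag(adj, root):
    """Single-traversal interval labeling of a DAG, visiting children in list order."""
    labels, rank = {}, 0

    def visit(u):
        nonlocal rank
        for c in adj.get(u, ()):
            if c not in labels:
                visit(c)
        rank += 1
        labels[u] = (min([rank] + [labels[c][0] for c in adj.get(u, ())]), rank)

    visit(root)
    return labels


def descendants(adj, u):
    """All nodes reachable from u, excluding u itself."""
    stack, seen = [u], set()
    while stack:
        for y in adj.get(stack.pop(), ()):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen


def exceptions(adj, nodes, dims):
    """Map each node to its false positives under the given list of labelings."""
    out = {}
    for u in nodes:
        reach = descendants(adj, u)
        fp = {v for v in nodes
              if v != u and v not in reach
              and all(L[u][0] <= L[v][0] and L[v][1] <= L[u][1] for L in dims)}
        if fp:
            out[u] = fp
    return out


# ASSUMPTION: reconstructed edges; the second order visits the root's children in reverse.
first = {0: [1, 2], 1: [3, 4], 2: [5, 3], 3: [7], 4: [8], 5: [6], 6: [8], 7: [8, 9]}
second = dict(first)
second[0] = [2, 1]
L1, L2 = label_dag(first, 0), label_dag(second, 0)
print(exceptions(first, range(10), [L1]))
print(exceptions(first, range(10), [L1, L2]))
```

With one labeling the exception lists match the first table; adding the second labeling leaves only node 4 with {3, 7, 9}, as in the second table.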


SLIDE 32

Querying Strategies

Exception Lists

  • Extract the list of false positives for each node and keep it in a hash table.
  • On the query (u, v)
      – If v is in the exception list of u, return false.
      – Otherwise, return whether the label of u contains the label of v.
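
A query against exception lists then needs only a hash lookup and d interval comparisons. The sketch below is a minimal illustration: the labels and the remaining exception list are taken from the two-traversal example in this deck, while the function name, the dictionary layout and the root id 0 are assumptions made for this example.

```python
def query_with_exceptions(index, exceptions, u, v):
    """Answer reachability from the labels plus precomputed exception lists."""
    if v in exceptions.get(u, ()):
        return False  # known false positive
    return all(a[0] <= b[0] and b[1] <= a[1]
               for a, b in zip(index[u], index[v]))


# Two-dimensional labels from the example; root id 0 is hypothetical.
index = {
    0: [(1, 10), (1, 10)], 1: [(1, 6), (1, 9)], 2: [(1, 9), (1, 7)],
    3: [(1, 4), (1, 6)], 4: [(1, 5), (1, 8)], 5: [(1, 8), (1, 3)],
    6: [(1, 7), (1, 2)], 7: [(1, 3), (1, 5)], 8: [(1, 1), (1, 1)],
    9: [(2, 2), (4, 4)],
}
exceptions = {4: {3, 7, 9}}  # what remains after the second traversal

print(query_with_exceptions(index, exceptions, 1, 7))
print(query_with_exceptions(index, exceptions, 4, 3))
```

The price of this constant-time answer is the cost of computing and storing the exception lists, which is what the pruning strategy avoids.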

Pruning Search Space

  • No extra data such as exception lists, so construction time stays linear
  • Perform a DFS during the query, pruning every child whose hyperrectangle does not contain the target node
  • Very fast at answering queries for non-reachable pairs
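
The pruned search can be sketched as below. This is an illustration rather than the authors' code: the labels are those of the two-traversal example, while the edge list is a reconstruction assumed from those labels, the root id 0 is hypothetical, and the function names are made up.

```python
def contains(index, u, v):
    """True if the hyperrectangle of u contains that of v in every dimension."""
    return all(a[0] <= b[0] and b[1] <= a[1]
               for a, b in zip(index[u], index[v]))


def query(adj, index, u, v):
    """DFS from u that only follows children whose labels contain the target."""
    if not contains(index, u, v):
        return False  # non-containment proves non-reachability
    stack, seen = [u], {u}
    while stack:
        x = stack.pop()
        if x == v:
            return True
        for c in adj.get(x, ()):
            if c not in seen and contains(index, c, v):
                seen.add(c)
                stack.append(c)
    return False


# ASSUMPTION: reconstructed edges for the example DAG.
dag = {0: [1, 2], 1: [3, 4], 2: [5, 3], 3: [7], 4: [8], 5: [6], 6: [8], 7: [8, 9]}
index = {
    0: [(1, 10), (1, 10)], 1: [(1, 6), (1, 9)], 2: [(1, 9), (1, 7)],
    3: [(1, 4), (1, 6)], 4: [(1, 5), (1, 8)], 5: [(1, 8), (1, 3)],
    6: [(1, 7), (1, 2)], 7: [(1, 3), (1, 5)], 8: [(1, 1), (1, 1)],
    9: [(2, 2), (4, 4)],
}
print(query(dag, index, 1, 5), query(dag, index, 1, 7))
```

A negative query often ends at the first containment test, in O(d) time; a positive query or a false positive falls back to a search that is O(n + m) in the worst case.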

SLIDE 38

Querying with GRAIL

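  • Query (1, 5): Not Reachable. The label of 5, [1,8],[1,3], is not contained in the label of 1, [1,6],[1,9].
  • Query (1, 7): Reachable. The label of 7, [1,3],[1,5], is contained in the label of 1; the animation frames then show the labels of nodes 4, 8 and 3 being examined in turn before 7 is reached.

[Figure: example DAG with the two-dimensional labels visible in the final frame]

node    root            1              2              3              6              7              9
label   [1,10],[1,10]   [1,6],[1,9]    [1,9],[1,7]    [1,4],[1,6]    [1,7],[1,2]    [1,3],[1,5]    [2,2],[4,4]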


SLIDE 42

Datasets

Small Sparse

  • graphs from metabolic networks, pathway and genome databases
  • 10K nodes with average degree ≤ 1.2
  • also used in most of the other reachability papers

Small Dense

  • graphs from citation databases, gene ontology and semantic databases
  • 10K nodes with average degree varying from 2 to 11
  • used in the 3HOP paper

Large Real

  • full citation graphs from publication and patent databases, UniProt and GO
  • 700K to 25M nodes with average degrees from 0.5 to 5

Large Synthetic

  • random directed acyclic graphs
  • 10M nodes with degree up to 10 and 100M nodes with degree up to 5

SLIDE 43

Small Sparse Datasets

[Figure: three charts over the datasets agrocyc, mtbrv and amaze, on logarithmic axes: construction time (ms), query time (ms) and index size (number of entries). Methods: GRAIL, HLSS, INT, Dual, PathTree, 3HOP; the query-time chart also includes DFS.]

  • Construction Time: GRAIL is several orders of magnitude faster than all the others
  • Query Time: PathTree is the fastest, and even DFS is not much worse than the others
  • Index Size: INT has the smallest index, while the others have comparable sizes

SLIDE 44

Small Dense Datasets

[Figure: three charts over the datasets yago, pubmed and arxiv, on logarithmic axes: construction time (ms), query time (ms) and index size (number of entries). Methods: GRAIL, HLSS, INT, Dual, PathTree, 3HOP; the query-time chart also includes DFS.]

  • GRAIL has the smallest index size and the lowest construction time
  • PathTree still has the best query time
  • GRAIL is 10-20 times faster than DFS

SLIDE 45

Large Real Graphs

              Construction (sec)   Query Time (sec)       Index Size
Dataset       GRAIL (d=5)          GRAIL (d=5)   DFS      GRAIL (d=5)
cit-patents   61.9                 1.5           43.5     37747680
citeseer      1.7                  0.09          0.05     2775788
citeseerx     19.8                 12.4          198.4    26272704
go-uniprot    32.6                 0.2           0.4      27871824
uniprot22m    5.1                  0.13          0.04     6381776
uniprot100m   58.8                 0.18          0.08     64349180
uniprot150m   96.6                 0.18          0.09     100150400

  • Only GRAIL and DFS scale on these datasets
  • PathTree exceeds the memory limit on cit-patents and citeseerx, and the time limit on the others
  • GRAIL outperforms pure DFS by 2-27 times on dense data
  • On very sparse data DFS can be up to 3 times faster

SLIDE 46

Large Synthetic Graphs

                  Construction (sec)   Query Time (sec)       Index Size
Size       Deg.   GRAIL (d=5)          GRAIL (d=5)   DFS      GRAIL (d=5)
rand10m    2      128                  0.1           0.6      100M
rand10m    5      226                  5.8           90       100M
rand10m    10     407                  1415          –(t)     100M
rand100m   2      1169                 0.25          0.76     800M
rand100m   5      1084                 20            131      400M

–(t): time limit exceeded

  • PathTree indexed rand10m2x in 537 secs, with a query time of 0.2 secs, using ≈ 70M entries
  • It aborted on the remaining datasets
  • GRAIL is 3-15 times faster than DFS in query time
  • On rand10m10x DFS exceeds the time limit of 6 hours

SLIDE 47

Number of Traversals

[Figure: three plots of construction time and query time against the number of traversals (2 to 5); the first plot is in ms, the other two in seconds.]

  • More traversals increase construction time
  • More traversals decrease query time up to some point (see ecoo)
  • d ≅ min(5, avg. deg) in practice
  • Determining the best d is open

SLIDE 48

Effect of Reachability

Dataset        QT_positive / QT_random
human          13.17
arxiv          0.8
cit-patents    2.06
citeseerx      30.2
rand10m5x      3.31

[Figure: speedup against DFS per dataset (human, arxiv, cit-patents, citeseerx, rand10m5x), for random queries and for positive queries]

  • GRAIL performs worse when all queries are positive (2 to 30 times slower)
  • However, it is still better than DFS; the speedup drops by about a factor of 10

SLIDE 49

Effect of Density

[Figure: two plots of construction time (sec) and query time (sec) against average degree]

  • Experimented on synthetic DAGs with 10M nodes and average degrees varying from 2 to 10
  • Both the construction time and the query time increase with increasing density
  • Other methods suffer when density increases, even in small graphs
  • The number of traversals in GRAIL can be increased to handle denser graphs

SLIDE 50

Conclusion

  • GRAIL : a lightweight indexing scheme
      – easy to implement and scalable
      – based on interval labeling
      – has linear construction time
      – able to index very large and dense graphs on which existing methods fail
  • Future Work
      – Exploring other labeling strategies instead of random traversals
      – Obtaining bounds on the number of traversals and exceptions
      – Experiments on disk-resident graphs
      – Handling dynamic graphs

SLIDE 51

Questions?

SLIDE 52

Datasets

Dataset    Nodes   Edges   Deg
agrocyc    12684   13657   1.07
amaze      3710    3947    1.06
anthra     12499   13327   1.07
ecoo       12620   13575   1.08
human      38811   39816   1.01
kegg       3617    4395    1.22
mtbrv      9602    10438   1.09
nasa       5605    6538    1.17
vchocyc    9491    10345   1.09
xmark      6080    7051    1.16

Table: Small Sparse Real

Dataset    Nodes   Edges   Deg
arxiv      6000    66707   11.12
citeseer   10720   44258   4.13
go         6793    13361   1.97
pubmed     9000    40028   4.45
yago       6642    42392   6.38

Table: Small Dense Real

Dataset       Nodes      Edges      Deg
citeseer      693947     312282     0.45
citeseerx     6540399    15011259   2.30
cit-patents   3774768    16518947   4.38
go-uniprot    6967956    34770235   4.99
uniprot22m    1595444    1595442    1.00
uniprot100m   16087295   16087293   1.00
uniprot150m   25037600   25037598   1.00

Table: Large Real

Dataset      Nodes   Edges   Deg
rand10m2x    10M     20M     2
rand10m5x    10M     50M     5
rand10m10x   10M     100M    10
rand100m2x   100M    200M    2
rand100m5x   100M    500M    5

Table: Large Synthetic


SLIDE 54

Interval Labeling on a DAG

[Figure: second example DAG on nodes 1 to 10. Intervals shown in the figure, in reading order: [1,11] [1,8] [9,10] [2,2] [1,1] [2,2] [7,7] [2,5] [2,6] [2,5] [9,9] [2,2] [2,3] [4,4] [2,2] [2,2]]

  • Direct application gives false positives such as 3 → 7
  • Variants of Interval Labeling
      – Tree Cover
      – Optimal Tree Cover
      – GRIPP
      – PathTree

SLIDE 55

Exception Lists

[Figure: number line from 1 to 15 showing the interval of a node x, the intervals of its children c1 to c4, and the exceptions e1 to e4]

Direct Exceptions : If L_u contains L_v but no child of u has a label that contains L_v, the exception between u and v is called a direct exception.

Indirect Exceptions : If at least one child of u has v as an exception, the exception between u and v is called an indirect exception.
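
The split into direct and indirect exceptions can be sketched as follows. This is an illustration under stated assumptions: the exception lists are those of the single-traversal example earlier in the deck, the child lists come from a reconstruction of that example's edges, and the code applies the indirect rule and treats every remaining exception as direct.

```python
def split_exceptions(adj, exceptions):
    """Return (direct, indirect) exception sets per node.

    An exception v of u is indirect if some child of u also has v as an
    exception; every other exception of u is counted as direct.
    """
    direct, indirect = {}, {}
    for u, exc in exceptions.items():
        via_child = {v for v in exc
                     if any(v in exceptions.get(c, ()) for c in adj.get(u, ()))}
        indirect[u] = via_child
        direct[u] = exc - via_child
    return direct, indirect


# ASSUMPTION: reconstructed edges of the nine-node example with hypothetical root 0.
dag = {0: [1, 2], 1: [3, 4], 2: [5, 3], 3: [7], 4: [8], 5: [6], 6: [8], 7: [8, 9]}
exceptions = {2: {1, 4}, 4: {3, 7, 9}, 5: {1, 3, 4, 7, 9}, 6: {1, 3, 4, 7, 9}}
direct, indirect = split_exceptions(dag, exceptions)
print(direct)
print(indirect)
```

On this example the result agrees with the direct and indirect columns of the exception table: nodes 4 and 6 hold only direct exceptions, while nodes 2 and 5 inherit theirs from a child.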