PAPER DISCUSSION: Adding Regular Expressions to Graph Reachability - - PowerPoint PPT Presentation

SLIDE 1

PAPER DISCUSSION: Adding Regular Expressions to Graph Reachability and Pattern Queries

Xilun Wu Purdue University

SLIDE 2

Contribution

  • Describe reachability queries (RQs) and graph pattern queries (PQs) using a tractable subset of regular expressions.
  • Define queries using graph simulation instead of subgraph isomorphism.
  • Low polynomial time algorithms for the containment, equivalence and minimization problems for RQs and PQs.
  • An O(N^2) algorithm for RQ answering and an O(N^3) algorithm for PQ answering.

SLIDE 3

Problem Definition

  • Reachability queries: decide whether there exists a path from one node to another.
  • Graph pattern queries: find all subgraphs of a graph that are isomorphic to a pattern graph.

Example graph: multiple edge types (fa, fn, sa, sn) indicate different relationships.

SLIDE 4

Problem Definition

Q1: find all biologists (nodes C) who support “cloning”, along with those doctors (nodes B) who are friends-nemeses (via fn) of some users supported by C within 2 hops (via fa≤2).

Q2:
  • 1. “Alice”’s friends-nemeses (via fn) who are doctors and are against “cloning”.
  • 2. Biologists who support “cloning research” and are connected within 2 hops (via fa relationships) to someone who is within 2 hops of person D (via sa).
  • 3. A group of scientists whose friends all share the same view on cloning.
  • 4. These biologists are against those doctor friends of Alice, and vice versa, via paths of certain patterns.

Identify connectivity via a path: (a) with edges of particular types and patterns, and (b) with a bound on its length (hops).
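The following minimal Python sketch makes requirements (a) and (b) concrete. It checks whether the edge colors along a single path match a bounded expression such as fa≤2 fn. The encoding is an assumption for illustration only:

  • The expression is a hypothetical list of (color, bound) pieces.
  • Each piece consumes between 1 and bound edges of its color, with bounds assumed to be at least 1.
  • A bound of None stands for an unbounded c+ piece.
  • The expression is assumed to be nonempty.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical encoding: [(color, max hops)], None = unbounded (c+).
Regex = List[Tuple[str, Optional[int]]]


def path_matches(colors: Sequence[str], regex: Regex) -> bool:
    """True if the edge colors of a path match c1^{<=k1} ... cm^{<=km}.

    Each piece consumes at least one and at most k edges of its color.
    """
    # A state is (piece index, edges consumed by that piece so far).
    states = {(0, 0)}
    for color in colors:
        nxt = set()
        for i, used in states:
            # Stay in piece i and consume one more edge of its color.
            c, k = regex[i]
            if color == c and (k is None or used < k):
                nxt.add((i, used + 1))
            # Close piece i (it needs >= 1 edge) and open piece i + 1.
            if used >= 1 and i + 1 < len(regex) and color == regex[i + 1][0]:
                nxt.add((i + 1, 1))
        states = nxt
    # Accept only if the last piece has been entered and is nonempty.
    return any(i == len(regex) - 1 and used >= 1 for i, used in states)
```

For Q1, `path_matches(["fa", "fa", "fn"], [("fa", 2), ("fn", 1)])` accepts the path. A path with three fa edges before the fn edge is rejected by the hop bound. Tracking a set of states, rather than matching greedily, also handles adjacent pieces of the same color.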

SLIDE 5

Notation

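G = (V, E, fA, fC), where:
  • V is the set of nodes and E ⊆ V × V is the set of edges;
  • fA assigns each node a tuple of attributes;
  • fC assigns each edge a color.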

SLIDE 6

Notation (Pattern Query - Unfinished)

G = (V, E, fA, fC)

SLIDE 7

Subgraph Isomorphism

  • Pattern graph Q, subgraph Gs of data graph G
  • Q matches Gs if there exists a bijective function f: VQ → VGs such that
  – for each node u in Q, u and f(u) have the same label;
  – (u, u′) is an edge in Q if and only if (f(u), f(u′)) is an edge in Gs.
  • Goodness:
  – keeps the exact structure (topology) between Q and Gs.
  • Badness:
  – may return exponentially many matched subgraphs;
  – the decision problem is NP-complete;
  – in certain scenarios, too restrictive to find matches.

These drawbacks hinder the usability of subgraph isomorphism in emerging applications, e.g., social networks.
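The following brute-force Python sketch tests the two bijection conditions directly. It tries every bijection f: VQ → VGs, which shows why exhaustive matching is expensive: there are |V|! candidates. The dict-of-labels and set-of-edge-pairs encoding and the name `find_isomorphism` are illustrative assumptions. The sketch checks a given subgraph Gs and does not search G for Gs.

```python
from itertools import permutations
from typing import Dict, Hashable, Optional, Set, Tuple

Labels = Dict[Hashable, str]
Edges = Set[Tuple[Hashable, Hashable]]


def find_isomorphism(q_labels: Labels, q_edges: Edges,
                     g_labels: Labels, g_edges: Edges) -> Optional[dict]:
    """Return a bijection f: V_Q -> V_Gs satisfying both conditions, or None."""
    q_nodes, g_nodes = list(q_labels), list(g_labels)
    if len(q_nodes) != len(g_nodes):
        return None  # no bijection exists between node sets of different sizes
    for image in permutations(g_nodes):  # |V|! candidate bijections
        f = dict(zip(q_nodes, image))
        # Condition 1: u and f(u) carry the same label.
        if any(q_labels[u] != g_labels[f[u]] for u in q_nodes):
            continue
        # Condition 2: (u, u') in E_Q iff (f(u), f(u')) in E_Gs. Because f is a
        # bijection, this holds exactly when the mapped edge set equals E_Gs.
        if {(f[u], f[w]) for u, w in q_edges} == set(g_edges):
            return f
    return None
```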
SLIDE 8

Graph Simulation

  • Given pattern graph Q(Vq, Eq) and data graph G(V, E), a binary relation R ⊆ Vq × V is said to be a match if, for each (u, v) ∈ R:
  – (1) u and v have the same label; and
  – (2) for each edge (u, u′) ∈ Eq, there exists an edge (v, v′) ∈ E such that (u′, v′) ∈ R.
  • Graph G matches pattern Q via graph simulation if there exists a total match relation M, i.e.,
  – for each u ∈ Vq, there exists v ∈ V such that (u, v) ∈ M.

  • Goodness:
  – quadratic-time solvable: subgraph isomorphism is NP-complete, while graph simulation takes O(n^2) time;
  – returns a single unique matched subgraph.
  • Badness:
  – loses structure topology (how much is an open question).
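The definition suggests a simple fixpoint:
  • start from all label-compatible pairs (condition 1);
  • repeatedly drop pairs that violate condition 2 until nothing changes.

The following Python sketch does exactly that and returns the maximum match relation, or None when the relation is not total. It is a naive illustration, not the quadratic-time algorithm behind the O(n^2) bound on this slide. The function name and input encoding are assumptions.

```python
from typing import Dict, Hashable, Optional, Set, Tuple


def max_simulation(q_labels: Dict[Hashable, str], q_edges: Set[Tuple],
                   g_labels: Dict[Hashable, str], g_edges: Set[Tuple]
                   ) -> Optional[Dict[Hashable, Set]]:
    """Maximum match relation as {u: {v, ...}}, or None if it is not total."""
    succ = {v: set() for v in g_labels}
    for v, w in g_edges:
        succ[v].add(w)
    # Condition (1): u and v must have the same label.
    sim = {u: {v for v in g_labels if g_labels[v] == q_labels[u]}
           for u in q_labels}
    changed = True
    while changed:
        changed = False
        for u, u2 in q_edges:
            # Condition (2): v needs a successor v2 with (u2, v2) still in R.
            keep = {v for v in sim[u] if succ[v] & sim[u2]}
            if keep != sim[u]:
                sim[u] = keep
                changed = True
    # G matches Q only if every pattern node keeps at least one match.
    return sim if all(sim.values()) else None
```

With pattern SE → Bio and data edges se1 → b1, the sketch keeps se1 and drops se2. Both b1 and b2 remain matches of Bio, because Bio has no outgoing pattern edges to check.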

SLIDE 9

Graph Simulation

Subgraph Isomorphism is too strict for emerging applications!

Example: set up a team to develop a new software product. Graph simulation returns F3, F4 and F5; subgraph isomorphism returns an empty answer!

SLIDE 10

Graph Simulation Loses Structures

  • Connected pattern graphs can match disconnected subgraphs.
  • Cyclic pattern graphs can match tree subgraphs.

In both examples (figures of pattern Q and subgraph Gs), the match sets are S(HR) = {HR}, S(SE) = {SE}, S(Bio) = {Bio1, Bio2}.

These motivate us to propose a new matching model!

SLIDE 11

Complexity (Bounded Regex)

  • 1. The subgraph isomorphism problem is NP-complete.
  • 2. Bounded graph simulation, in contrast, can be computed in polynomial time.
  • 3. "Bounded" means that bounds on the number of hops are allowed.

SLIDE 12

To Be Continued

  • 1. Two efficient algorithms for RQs
  • 2. Two efficient algorithms for PQs
  • 3. Containment, equivalence, and minimization problems for RQs and PQs: complexity bounds and algorithms
  • 4. Evaluations
  • 5. Perhaps more on graph simulation and subgraph isomorphism

SLIDE 13

PAPER DISCUSSION: Adding Regular Expressions to Graph Reachability and Pattern Queries (Cont’d)

Xilun Wu Purdue University

SLIDE 14

Content

  • 1. Two efficient algorithms for RQs
  • 2. Two efficient algorithms for PQs
SLIDE 15

Reachability Queries (RQs)

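  • 1. Data graph G = (V, E, fA, fC).
  • 2. Reachability query Qr = (u1, u2, fu1, fu2, fe), where fu1 and fu2 are predicates on nodes and fe is a regular expression on edge colors.
  • 3. (v1, v2) is a match of (u1, u2) if the nodes and the path between them satisfy the predicates: v1 ∼ u1 (v1 satisfies fu1), v2 ∼ u2 (v2 satisfies fu2), and (v1, v2) ≈ fe (some path from v1 to v2 matches fe).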
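The following minimal Python sketch evaluates an RQ by BFS over (node, piece, hops used) states, which amounts to a product of the data graph with the bounded expression fe. This is an illustrative technique, not one of the two methods on the following slides. It assumes:

  • fe is a nonempty list of (color, bound) pieces with bounds of at least 1, or None for c+;
  • a plain dict of node attributes stands in for fA;
  • the predicates fu1 and fu2 are given as Python callables.

All names are hypothetical.

```python
from collections import deque
from typing import Callable, Dict, List, Optional, Set, Tuple

Graph = Dict[str, List[Tuple[str, str]]]  # node -> [(successor, edge color)]
Regex = List[Tuple[str, Optional[int]]]   # [(color, max hops)], None = c+


def reachable_by_regex(g: Graph, src: str, regex: Regex) -> Set[str]:
    """Nodes v2 such that some path src -> v2 has colors matching regex."""
    start = (src, 0, 0)  # (node, piece index, edges used by that piece)
    seen, queue, hits = {start}, deque([start]), set()
    while queue:
        v, i, used = queue.popleft()
        if i == len(regex) - 1 and used >= 1:
            hits.add(v)
        c, k = regex[i]
        for w, color in g.get(v, []):
            moves = []
            if color == c and (k is None or used < k):
                # For c+ we only need to remember "at least one edge used".
                moves.append((w, i, 1 if k is None else used + 1))
            if used >= 1 and i + 1 < len(regex) and color == regex[i + 1][0]:
                moves.append((w, i + 1, 1))
            for state in moves:
                if state not in seen:
                    seen.add(state)
                    queue.append(state)
    return hits


def answer_rq(g: Graph, attrs: Dict[str, dict],
              f_u1: Callable[[dict], bool], f_u2: Callable[[dict], bool],
              f_e: Regex) -> Set[Tuple[str, str]]:
    """All matches (v1, v2) of Qr = (u1, u2, f_u1, f_u2, f_e)."""
    return {(v1, v2)
            for v1 in attrs if f_u1(attrs[v1])
            for v2 in reachable_by_regex(g, v1, f_e) if f_u2(attrs[v2])}
```

The state space is finite: bounded pieces count up to their bound, and c+ pieces collapse to a single "nonempty" state. The BFS therefore terminates even on cyclic graphs.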

SLIDE 16

Graph Pattern Queries (PQs)

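  • 1. Data graph G = (V, E, fA, fC).
  • 2. Pattern query Qp = (Vp, Ep, fv, fe), where fv assigns a predicate to each pattern node and fe assigns a regular expression to each pattern edge.
  • 3. (V1, E1) is a match of (Vp, Ep) if its nodes and edges satisfy the predicates.
  • 4. PQs can be answered using RQs.
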
SLIDE 17

Answer RQs

  • 1. Two methods:
  – shortest distance matrix;
  – bi-directional BFS with an auxiliary LRU cache.
  • 2. Start with single-color RQs: a multi-color RQ with F = F1F2…Fk can be decomposed into k single-color RQs F1, …, Fk.
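The following small Python sketch shows the decomposition step. It assumes a hypothetical textual syntax in which pieces are separated by spaces and each piece takes one of three forms:

  • `c`: read as exactly one c-edge (bound 1);
  • `c<=k`: one to k c-edges;
  • `c+`: one or more c-edges (bound None).

Each resulting (color, bound) pair is one single-color piece Fi.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical token syntax: "fa", "fa<=2" or "fa+".
TOKEN = re.compile(r"([A-Za-z]\w*)(?:<=(\d+)|(\+))?")


def decompose(expr: str) -> List[Tuple[str, Optional[int]]]:
    """Split F = F1 F2 ... Fk into single-color pieces [(color, bound), ...]."""
    pieces = []
    for tok in expr.split():
        m = TOKEN.fullmatch(tok)
        if m is None:
            raise ValueError(f"unsupported token: {tok!r}")
        color, bound, plus = m.groups()
        # c+ -> None (unbounded), c<=k -> k, plain c -> 1.
        pieces.append((color, None if plus else int(bound) if bound else 1))
    return pieces
```

For example, `decompose("fa<=2 fn")` yields the two pieces of Q1, each of which can then be answered as a single-color RQ.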

SLIDE 18

Single-Color RQs

  • 1. Shortest distance matrix Mk
  – Mk contains the pairwise shortest distances between nodes, with a bound k on the number of edges; the third dimension is the color. Mk[v1][v2][c] is the distance from v1 to v2 along a color-c path of no more than k edges.
  – Assumption: Mk is precomputed in O((m + 1)|V|^2 + |V|(|V| + |E|)) time.
  – Answer time: (_, _) in O(|V|^2); (v1, _) or (_, v2) in O(|V|); (v1, v2) in O(1).
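The following Python sketch builds and probes Mk, assuming edges are given as (v, w, color) triples. The design choices are assumptions:

  • only finite entries are stored, in a dict keyed by (v1, v2, c), rather than a dense |V| × |V| × colors array;
  • the dict is filled by one bounded BFS per source node and color. This is a straightforward construction, not necessarily the one behind the precomputation bound above.

The names `build_mk` and `rq_pair` are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Tuple

Key = Tuple[Hashable, Hashable, str]  # (v1, v2, color)


def build_mk(nodes: Iterable[Hashable],
             edges: Iterable[Tuple[Hashable, Hashable, str]],
             k: int) -> Dict[Key, int]:
    """Mk[(v1, v2, c)] = shortest nonempty c-colored path length, if <= k."""
    nodes = list(nodes)
    succ: Dict[Tuple[Hashable, str], list] = {}
    for v, w, c in edges:
        succ.setdefault((v, c), []).append(w)
    colors = {c for (_, c) in succ}
    mk: Dict[Key, int] = {}
    for c in colors:
        for src in nodes:
            # BFS seeded with src's c-successors so every path has >= 1 edge.
            dist = {}
            queue = deque()
            for w in succ.get((src, c), []):
                if w not in dist:
                    dist[w] = 1
                    queue.append(w)
            while queue:
                v = queue.popleft()
                if dist[v] == k:
                    continue  # do not expand beyond the hop bound k
                for w in succ.get((v, c), []):
                    if w not in dist:
                        dist[w] = dist[v] + 1
                        queue.append(w)
            for v2, d in dist.items():
                mk[(src, v2, c)] = d
    return mk


def rq_pair(mk: Dict[Key, int], v1, v2, c: str, bound: int) -> bool:
    """(v1, v2) query for c^{<=bound}, assuming bound <= k: one O(1) lookup."""
    return mk.get((v1, v2, c), bound + 1) <= bound
```

In a dense-array layout, a (v1, _) query scans one row and a (_, _) query scans the whole color slice. That scanning is where the O(|V|) and O(|V|^2) answer times come from.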

SLIDE 19

Single-Color RQs

  • 2. Bi-directional BFS with an auxiliary LRU cache
  – Iterate BFS from the source and the destination until the two visited sets intersect or one of them becomes empty.
  – Answer time: (_, _) in O(|V|^2(|V| + |E|)); (v1, _) or (_, v2) in O(|V|(|V| + |E|)); (v1, v2) in O(|V| + |E|).
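The following Python sketch implements the bounded single-color search, assuming (v, w, color) edge triples and v1 ≠ v2. It alternates between growing the forward set from the source and the backward set from the destination, expanding the smaller frontier first. It stops when any of the following holds:

  • the two sets meet;
  • one frontier becomes empty;
  • k hops have been spent.

Answers are memoized with Python's `functools.lru_cache` as a stand-in for the auxiliary LRU cache. The function names and the cache size are assumptions.

```python
from functools import lru_cache
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def _expand(front: List, seen: Set, adj: Dict, c: str) -> List:
    """One BFS level over c-colored edges; returns the newly reached nodes."""
    nxt = []
    for v in front:
        for w in adj.get((v, c), []):
            if w not in seen:
                seen.add(w)
                nxt.append(w)
    return nxt


def make_reach(edges: Iterable[Tuple[Hashable, Hashable, str]]):
    fwd_adj: Dict = {}
    bwd_adj: Dict = {}
    for v, w, c in edges:
        fwd_adj.setdefault((v, c), []).append(w)
        bwd_adj.setdefault((w, c), []).append(v)

    @lru_cache(maxsize=4096)  # hypothetical cache size
    def reach(src, dest, c: str, k: int) -> bool:
        """True if a c-colored path src -> dest has at most k edges (src != dest)."""
        fwd, bwd = {src}, {dest}
        f_front, b_front = [src], [dest]
        hops = 0
        while hops < k and f_front and b_front:
            hops += 1  # each iteration spends one hop on one side
            if len(f_front) <= len(b_front):
                f_front = _expand(f_front, fwd, fwd_adj, c)
                if any(x in bwd for x in f_front):
                    return True  # the sets meet within `hops` <= k edges
            else:
                b_front = _expand(b_front, bwd, bwd_adj, c)
                if any(x in fwd for x in b_front):
                    return True
        return False

    return reach
```

Any node where the two sets meet lies on a c-path of at most `hops` edges. An empty frontier means one side is exhausted without meeting, so no such path exists.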

SLIDE 20

Multi-Color RQs

  • 1. Shortest distance matrix
  – A multi-color RQ with F = F1F2…Fk is decomposed into k single-color RQs: (_, v2); (_, _); …; (v1, _).
  – Each single-color RQ is answered with Mk: (_, _) in O(|V|^2); (v1, _) or (_, v2) in O(|V|); (v1, v2) in O(1).
  – Answer time: O(k|V|^2).
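The following Python sketch handles the (v1, v2) case. It assumes Mk is given as a dict keyed by (v1, v2, color) holding bounded shortest distances, and that every bound in F is at most k. The evaluation proceeds in three steps:

  • the first piece is a (v1, _) lookup;
  • each later piece maps the current set of intermediate nodes to their successors in O(|V|^2);
  • the answer is whether v2 survives, giving O(k|V|^2) overall.

The function name is hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

Key = Tuple[Hashable, Hashable, str]  # (v1, v2, color)


def multi_color_rq(mk: Dict[Key, int], nodes: Iterable[Hashable], v1, v2,
                   regex: List[Tuple[str, int]]) -> bool:
    """(v1, v2) query for F = F1 F2 ... Fk, each Fi = (color, bound <= k)."""
    nodes = list(nodes)
    frontier = {v1}
    for c, bound in regex:
        # Nodes reachable from the current frontier by a c-path of <= bound edges.
        frontier = {y for x in frontier for y in nodes
                    if mk.get((x, y, c), bound + 1) <= bound}
        if not frontier:
            return False  # no intermediate node survives; stop early
    return v2 in frontier
```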

SLIDE 21

Multi-Color RQs

  • 2. Bi-directional BFS with an auxiliary LRU cache
  – Extend the sets from the source and the destination; the k single-color RQs are (v1, _), (_, v2); (_, _); …; (_, _).
  – Terminate when the two sets intersect or one of them becomes empty.
  – Each single-color RQ costs (_, _) in O(|V|^2(|V| + |E|)); (v1, _) or (_, v2) in O(|V|(|V| + |E|)); (v1, v2) in O(|V| + |E|).
  – Answer time: O(k|V|^2(|V| + |E|)).

SLIDE 22

Graph Pattern Queries (PQs)

  • 1. JoinMatch
  • 2. SplitMatch
SLIDE 23

JoinMatch

  • 1. Create a candidate match set for each pattern node.
  • 2. Use RQs to remove ineligible nodes from the candidate sets.
  • 3. The input graph has to be a DAG; otherwise, compute its strongly connected components (SCCs) instead.
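The following simplified Python sketch covers steps 1 and 2. It assumes the candidate sets have already been computed from the node predicates, and that a hypothetical RQ oracle `rq(v, w, fe)` is available, e.g., one of the RQ methods above. It repeatedly removes each pattern edge's rmv-set until a fixpoint is reached. It ignores the DAG/SCC ordering of step 3 and simply iterates, so it is not JoinMatch itself.

```python
from typing import Callable, Dict, Hashable, List, Set, Tuple


def join_match_sketch(cand: Dict[Hashable, Set],
                      pattern_edges: List[Tuple[Hashable, Hashable, str]],
                      rq: Callable[[Hashable, Hashable, str], bool]
                      ) -> Dict[Hashable, Set]:
    """Refine candidate sets; return {} if some pattern node loses all matches."""
    mat = {u: set(vs) for u, vs in cand.items()}
    changed = True
    while changed:
        changed = False
        for u, u2, fe in pattern_edges:
            # rmv-set: candidates of u with no fe-path to any candidate of u2.
            rmv = {v for v in mat[u] if not any(rq(v, w, fe) for w in mat[u2])}
            if rmv:
                mat[u] -= rmv
                changed = True
    return mat if all(mat.values()) else {}
```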

SLIDE 24

JoinMatch

SLIDE 25

SplitMatch

  • 1. Treat query nodes and graph nodes uniformly.
  • 2. Group nodes into blocks; each block contains nodes from both the query and the data graph.
  • 3. Compute a partition-relation pair, which is recursively refined by splitting the blocks based on the constraints (the same rmv-set concept as in JoinMatch).

SLIDE 26

SplitMatch

SLIDE 27

SplitMatch

SLIDE 28

SplitMatch