Grammar-Based Graph Compression, Fabian Peternek, October 25, 2016


slide-1
SLIDE 1

Grammar-Based Graph Compression

Fabian Peternek October 25, 2016

slide-3
SLIDE 3

Use of Grammar-Based Compression: Speed-up Algorithms

Some queries can be answered in polynomial time on a grammar, e.g.,

Strings

◮ Equality ($O(n^2 \log^* n)$, see also [Lohrey, 2012])
◮ NFA membership ($O(n|A|^3)$, [Plandowski and Rytter, 1999])
◮ Pattern matching ($O(n^2 \log n)$, [Jez, 2012])

Trees

◮ Equality (PTIME, [Busatto, Maneth, Lohrey, 2008])
◮ TA membership (PTIME, [Lohrey, Maneth, Schmidt-Schauß, 2012])
◮ Isomorphism (PTIME, [Lohrey, Maneth, P, 2015])

Speed-up proportional to compression ratio

For graph grammars?

◮ Reachability ($O(|G|)$, [Maneth and P, 2016])
◮ Regular Path Queries ($O(|G||A|)$, [Maneth and P, 2016])
◮ Others?

slide-23
SLIDE 23

Computing Small Grammars

Finding a smallest grammar is NP-complete [Charikar et al., 2005].

⇒ Approximations

One Approximation: RePair compression for Strings

Input string (drawn as a path with one labeled edge per character): a b c a b c a b c

Digrams:

◮ a b: 3 times
◮ b c: 3 times
◮ c a: 2 times

Step 1: replace every occurrence of a b by A, adding the rule A → a b. The string becomes A c A c A c.

Digrams:

◮ A c: 3 times
◮ c A: 2 times

Step 2: replace every occurrence of A c by B, adding the rule B → A c. The string becomes B B B.

Result: start string B B B with rules A → a b and B → A c.

Size:

◮ Original string: 9 edges, 10 nodes
◮ Grammar: 7 edges, 10 nodes
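The RePair walk-through on strings can be sketched in a few lines of Python. This is a minimal illustration, not the implementation behind the slides: the function names `count_digrams` and `repair` are hypothetical, ties between equally frequent digrams are broken by first appearance, and the input is assumed not to contain the nonterminal names that get generated (A, B, ...).

```python
from collections import Counter
from itertools import count


def count_digrams(seq):
    """Count non-overlapping occurrences of each adjacent pair, left to right."""
    counts = Counter()
    last_start = {}  # pair -> start index of its most recently counted occurrence
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        if last_start.get(pair) == i - 1:
            continue  # overlaps the occurrence counted just before (e.g. "aaa")
        counts[pair] += 1
        last_start[pair] = i
    return counts


def repair(text):
    """Repeatedly replace a most frequent digram by a fresh nonterminal."""
    seq = list(text)
    rules = {}
    # Assumption: nonterminals are named A, B, ..., then N26, N27, ...
    names = (chr(ord("A") + k) if k < 26 else f"N{k}" for k in count())
    while True:
        counts = count_digrams(seq)
        if not counts:
            break
        pair, freq = max(counts.items(), key=lambda kv: kv[1])
        if freq < 2:
            break  # no digram occurs twice, so no rule would pay off
        nonterminal = next(names)
        rules[nonterminal] = pair
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nonterminal)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


start, rules = repair("abcabcabc")
```

On the input `abcabcabc` this reproduces the two replacement steps of the slide: first `a b`, then `A c`.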

slide-30
SLIDE 30

RePair for Graphs

Goal: Extend RePair to graphs with hyperedge replacement graph grammars [Habel, 1992]

[Figure: example graph with nine edges, three each labeled a, b, and c]

Digrams (pairs of adjacent edges, external nodes numbered 1 and 2):

◮ a b: 3 times
◮ b c: 3 times
◮ c a: 2 times (not 4: the occurrences overlap)

Step 1: replace every occurrence of a b by a nonterminal edge A, with rule A → a b. Remaining edges: c c c A A A.

Digrams (external nodes numbered 1 and 3):

◮ A c: 3 times
◮ c A: 2 times

Step 2: replace every occurrence of A c by a nonterminal edge B, with rule B → A c. Remaining edges: B B B.

Size of original graph: 17
Size of grammar: 15

slide-38
SLIDE 38

What if all nodes have a degree > 2?

[Figure: a ring-shaped graph in which every node has degree greater than 2. A digram with external nodes 1, 2, 3 is replaced by a nonterminal edge A of rank 3 (size = 3). A digram with external nodes 1, 2, 3, 4 is replaced by a nonterminal edge B of rank 4 (size = 4), leaving B B B B B B. Pairs of B edges are then replaced by C with rule C → B B, leaving C C C.]

For ring with $2^n + 1$ nodes: Size of finished grammar $19 + 14n \in O(n)$.

slide-43
SLIDE 43

Hypergraphs

Definition: Hypergraph

Quintuple g = (V, E, att, lab, ext) with

◮ nodes V = {1, . . . , n},
◮ edges E = {e1, . . . , em},
◮ an attachment mapping att : E → V∗,
◮ an edge labeling lab : E → Σ, and
◮ a string of external nodes ext ∈ V∗.

Two additional conditions:

(C1) ∀e ∈ E : the nodes in att(e) are pairwise distinct.
(C2) The nodes in ext are pairwise distinct.

Example

[Figure: hypergraph with nodes 1, 2, 3, an a-labeled edge e1, a b-labeled edge e2, and an A-labeled hyperedge e3]

◮ V = {1, 2, 3}
◮ E = {e1, e2, e3}
◮ att(e1) = 12, att(e2) = 23, att(e3) = 213
◮ lab(e1) = a, lab(e2) = b, lab(e3) = A
◮ ext = 31
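The quintuple and the two conditions (C1), (C2) map directly onto a small data structure. The sketch below is only an illustration: the class name `Hypergraph`, its field names, and the method `is_valid` are assumptions made for this example, and the edge set E is represented implicitly by the keys of `att` and `lab`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypergraph:
    nodes: frozenset  # V
    att: dict         # edge id -> tuple of attached nodes (a string over V)
    lab: dict         # edge id -> edge label
    ext: tuple        # external nodes, in order

    def is_valid(self) -> bool:
        """Check that att/ext only mention nodes of V and satisfy (C1), (C2)."""
        strings = list(self.att.values()) + [self.ext]
        in_v = all(v in self.nodes for s in strings for v in s)
        distinct = all(len(set(s)) == len(s) for s in strings)
        return in_v and distinct and self.att.keys() == self.lab.keys()


# The example from the slide: att(e1) = 12, att(e2) = 23, att(e3) = 213, ext = 31.
example = Hypergraph(
    nodes=frozenset({1, 2, 3}),
    att={"e1": (1, 2), "e2": (2, 3), "e3": (2, 1, 3)},
    lab={"e1": "a", "e2": "b", "e3": "A"},
    ext=(3, 1),
)

# An edge attached twice to the same node violates (C1).
broken = Hypergraph(frozenset({1, 2}), {"e1": (1, 1)}, {"e1": "a"}, ())
```

Storing `att` and `ext` as tuples keeps the order of attachment, which matters because both are strings over V rather than sets.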

slide-44
SLIDE 44

Hypergraphs – Further Properties

Let g = (V, E, att, lab, ext) be a hypergraph, and e ∈ E an edge.

Notation

◮ Rank: rank(e) = |att(e)|, rank(g) = |ext|
◮ HGR(Σ): Set of all hypergraphs over Σ.

Size Definition

◮ $|e| = 1$ if $\mathrm{rank}(e) \le 2$, and $|e| = \mathrm{rank}(e)$ else.
◮ $|g|_V = |V|$
◮ $|g|_E = \sum_{e \in E} |e|$
◮ $|g| = |g|_V + |g|_E$
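As a quick check of the size definition, the following sketch computes |g| for the example hypergraph of the previous slide (three nodes, two edges of rank 2 and one hyperedge of rank 3). The helper names `edge_size` and `hypergraph_size` are hypothetical.

```python
def edge_size(att):
    """|e| = 1 for edges of rank at most 2, otherwise the rank itself."""
    rank = len(att)
    return 1 if rank <= 2 else rank


def hypergraph_size(num_nodes, attachments):
    """|g| = |g|_V + |g|_E, with |g|_E the sum of all edge sizes."""
    return num_nodes + sum(edge_size(att) for att in attachments)


# att(e1) = 12, att(e2) = 23, att(e3) = 213 over V = {1, 2, 3}
size = hypergraph_size(3, [(1, 2), (2, 3), (2, 1, 3)])
```

Here the node part contributes 3 and the edge part contributes 1 + 1 + 3, so the example has size 8.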

slide-46
SLIDE 46

Hyperedge Replacement Grammars

Definition: Hyperedge Replacement Grammar (HR grammar)

Triple G = (N, P, S) with

◮ a ranked alphabet N ⊆ N,
◮ rules P = {(A, g) | A ∈ N, g ∈ HGR(Σ), where rank(A) = rank(g)}
◮ an initial hypergraph S ∈ HGR(Σ ∪ N).

Size of an HR-grammar:

◮ $|G| = |S| + \sum_{(A,h) \in P} |h|$

Straight-line HR grammar (SL-HR grammar)

We call an HR grammar straight-line if it is acyclic and deterministic.

Note: An SL-HR grammar represents exactly one graph, denoted by val(G).
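To make val(G) concrete, here is a minimal sketch of how a straight-line grammar could be expanded into the graph it represents. It assumes integer node ids, graphs given as `(nodes, edges, ext)` triples with edges as `(label, att)` pairs, and an acyclic grammar with one rule per nonterminal (otherwise the loop would not terminate). The function name `derive` and this representation are choices made for the example only.

```python
from itertools import count


def derive(start, rules):
    """Expand every nonterminal edge of `start` until only terminal edges remain."""
    nodes, edges, ext = start
    nodes = list(nodes)
    fresh = count(max(nodes) + 1)  # supply of new node ids for internal nodes
    work = list(edges)
    terminal_edges = []
    while work:
        label, att = work.pop()
        if label not in rules:
            terminal_edges.append((label, att))
            continue
        rule_nodes, rule_edges, rule_ext = rules[label]
        # External nodes of the rule are identified with the nodes the edge is
        # attached to; every other node of the rule gets a fresh copy.
        mapping = dict(zip(rule_ext, att))
        for v in rule_nodes:
            if v not in mapping:
                mapping[v] = next(fresh)
                nodes.append(mapping[v])
        work.extend((l, tuple(mapping[v] for v in a)) for l, a in rule_edges)
    return nodes, terminal_edges, ext


# Grammar from the string example: S = B B B, A -> a b, B -> A c.
rules = {
    "A": ([1, 2, 3], [("a", (1, 3)), ("b", (3, 2))], (1, 2)),
    "B": ([1, 2, 3], [("A", (1, 3)), ("c", (3, 2))], (1, 2)),
}
start = ([1, 2, 3, 4], [("B", (1, 2)), ("B", (2, 3)), ("B", (3, 4))], ())
val_nodes, val_edges, _ = derive(start, rules)

# Read the derived path back as a string, starting at node 1.
successor = {att[0]: (label, att[1]) for label, att in val_edges}
word, v = "", 1
while v in successor:
    label, v = successor[v]
    word += label
```

The derived graph is the original path with 10 nodes and 9 edges, spelling the string the grammar was built from.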

slide-47
SLIDE 47

Problem: Occurrence Counting

Problem

Finding a maximum-cardinality set of non-overlapping occurrences.

◮ Reducible to maximum matching
◮ Cubic complexity (e.g., Blossom algorithm)

Approach: Greedy counting following a fixed node order

◮ Gives exact result for strings and trees
  ◮ Left-to-right on strings
  ◮ Post-order for trees
◮ Only an approximation for graphs.
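The effect of the visiting order on greedy counting can be seen on a tiny example. The sketch below assumes that two occurrences overlap exactly when they share an edge, and it takes the candidate occurrences as an already ordered list; the name `greedy_count` and the edge names `e1` to `e4` are hypothetical.

```python
def greedy_count(occurrences):
    """Pick occurrences in the given order, skipping any that reuse an edge."""
    used, chosen = set(), []
    for occurrence in occurrences:
        if used.isdisjoint(occurrence):
            chosen.append(occurrence)
            used |= set(occurrence)
    return chosen


# A path of four unlabeled edges e1-e2-e3-e4; every pair of adjacent edges
# is a candidate occurrence of the digram "two consecutive edges".
left_to_right = [{"e1", "e2"}, {"e2", "e3"}, {"e3", "e4"}]
middle_first = [{"e2", "e3"}, {"e1", "e2"}, {"e3", "e4"}]

good = greedy_count(left_to_right)
bad = greedy_count(middle_first)
```

Visiting the occurrences from left to right finds two disjoint occurrences, while starting in the middle blocks both neighbors and finds only one.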

slide-60
SLIDE 60

Node Order influences Result

Greedy Occurrence Finding

[Figure: greedy search for occurrences of a digram in an example graph under two different node orders. Occurrences are numbered in the order they are found: 1 to 5 under the first order, 1 to 4 under the second.]

slide-67
SLIDE 67

Our Order

Order that worked well in experiments:

◮ start with node degrees
◮ recursively refine using the neighborhoods until fixpoint reached

FP-Order

Example on a graph with five nodes:

◮ Initial labels (node degrees): 1 1 3 2 1
◮ Refined labels (own label followed by the labels of the neighbors): (1, 3) (1, 3) (3, 1, 1, 2) (2, 1, 3) (1, 2)
◮ Order the labels lexicographically: (1, 2) ⇒ 1, (1, 3) ⇒ 2, (2, 1, 3) ⇒ 3, (3, 1, 1, 2) ⇒ 4
◮ New labels: 2 2 4 3 1
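The refinement can be sketched as follows. The code assumes, based on the example labels, that a node's refined label is its own label followed by the sorted labels of its neighbors, and that labels are replaced by their lexicographic rank after every round. The function name `fp_order` and the node names `u`, `v`, `w`, `x`, `y` are hypothetical; the graph is one that matches the degree sequence 1 1 3 2 1.

```python
def fp_order(adjacency):
    """Refine node labels, starting from degrees, until they stop changing."""
    labels = {v: len(neighbors) for v, neighbors in adjacency.items()}
    while True:
        # Signature: own label, then the neighbors' labels in sorted order.
        signatures = {
            v: (labels[v], *sorted(labels[u] for u in adjacency[v]))
            for v in adjacency
        }
        # Rank the distinct signatures lexicographically, starting at 1.
        ranks = {s: i + 1 for i, s in enumerate(sorted(set(signatures.values())))}
        refined = {v: ranks[signatures[v]] for v in adjacency}
        if refined == labels:
            return labels
        labels = refined


# Two leaves u, v hang off w; w is connected to x; x is connected to leaf y.
adjacency = {
    "u": ["w"],
    "v": ["w"],
    "w": ["u", "v", "x"],
    "x": ["w", "y"],
    "y": ["x"],
}
order = fp_order(adjacency)
```

For this graph the first round yields the labels 2 2 4 3 1 and the second round leaves them unchanged, so the fixpoint is reached. Nodes that still share a label (here `u` and `v`) would need a tie-breaking rule that the slides do not specify.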

slide-68
SLIDE 68

Compressible?


slide-69
SLIDE 69

Limits of HR-Grammar Based Compression

Fact: HR grammars cannot represent families of unbounded tree-width.

Conjecture

Let Cn be a complete graph with 2n nodes.

1. There is no SL-HR grammar G with val(G) = Cn and |G| ∈ O(n).

Note: Some compression is still possible.

slide-70
SLIDE 70

Another Family of Unbounded Tree-Width


slide-74
SLIDE 74

Bounded Tree-Width: Hopeless?

[Figure: an n × 2n grid in which repeated parts are replaced by nonterminal edges A, shown together with the rule for A.]

The grammar represents the n × 2n-grid, but can be further compressed to size O(n).

slide-77
SLIDE 77

Encoding SL-HR grammars

Typical result

◮ Relatively large remaining graph S
◮ Relatively small rules

To encode the resulting grammars into a file:

◮ k²-tree method for the start graph
◮ One tree per edge label
◮ Simple edge lists for the production rules

Note:

1. This is currently not an in-memory data structure, but only file compression.
2. We use our own implementation of the k²-tree method, which is not as optimized.
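For readers unfamiliar with the k²-tree method, the sketch below builds the two bit sequences of a k²-tree from a 0/1 adjacency matrix: `T` for the internal levels and `L` for the leaves. It follows the general idea of the structure (recursively split the matrix into k² submatrices and store one bit per submatrix), not the implementation used for the experiments. The function name `k2_tree_bits` is hypothetical, and the matrix is padded with zeros to a power of k.

```python
def k2_tree_bits(matrix, k=2):
    """Return (T, L): level-order bits of the internal nodes and of the leaves."""
    n = len(matrix)
    size = k
    while size < n:
        size *= k  # pad the matrix to side length k^h

    def cell(i, j):
        return matrix[i][j] if i < n and j < n else 0

    def has_one(row, col, side):
        return any(cell(i, j) for i in range(row, row + side)
                   for j in range(col, col + side))

    T, L = [], []
    level = [(0, 0, size)]  # submatrices still to be split: (row, col, side)
    while level:
        next_level = []
        for row, col, side in level:
            sub = side // k
            for i in range(k):
                for j in range(k):
                    r, c = row + i * sub, col + j * sub
                    if sub == 1:
                        L.append(cell(r, c))  # a single matrix cell
                    else:
                        bit = 1 if has_one(r, c, sub) else 0
                        T.append(bit)
                        if bit:
                            next_level.append((r, c, sub))
        level = next_level
    return T, L


matrix = [
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
T, L = k2_tree_bits(matrix)
```

Empty submatrices cost a single 0 bit and are never split further, which is why sparse adjacency matrices compress well with this method.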

slide-82
SLIDE 82

Simple edges

Simple edges, i.e., those of rank 1 or 2, are encoded via adjacency matrices.

Example

[Figure: example graph on nodes 1 to 4 with two edges labeled A, one edge labeled B, and two hyperedges labeled C, shown with one adjacency matrix per simple-edge label. The matrix for A has two 1-entries, the matrix for B has one.]

slide-84
SLIDE 84

Hyperedges (rank ≥ 3)

For hyperedges, we use incidence matrices

Example

[Figure: example graph on nodes 1 to 4 with the incidence matrix for label C; the two C-labeled hyperedges of rank 3 give six 1-entries.]

Problem: Order lost.

Additional file contains dictionary of att-permutations.

e1 → (3, 1, 2)
e2 → (2, 1, 3)

slide-85
SLIDE 85

Rules

Stored as a simple edge list, using δ-coded encodings of

1. Number of edges in the rule
2. For every edge
   ◮ number of nodes it is attached to,
   ◮ whether it is nonterminal, and
   ◮ the node-ids it is attached to, and whether the node is external

The next rule is concatenated directly afterwards; the rules are ordered by their nonterminal.
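Assuming that "δ-coded" refers to the Elias delta code, the encoding of a single positive integer and of a concatenated sequence can be sketched as below. Bits are kept as strings of `0` and `1` for readability, the function names are hypothetical, and how flags and zero values are shifted into the positive range is not stated on the slide and is left out here.

```python
def elias_delta_encode(n):
    """Elias delta code of an integer n >= 1 as a string of bits."""
    if n < 1:
        raise ValueError("Elias delta codes are defined for n >= 1")
    binary = bin(n)[2:]               # binary form of n, leading bit is 1
    length_bits = bin(len(binary))[2:]  # binary form of the bit length of n
    # Gamma-code the length, then append n without its leading 1 bit.
    return "0" * (len(length_bits) - 1) + length_bits + binary[1:]


def elias_delta_decode(bits, pos=0):
    """Decode one value starting at `pos`; return (value, next position)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    pos += zeros
    length = int(bits[pos:pos + zeros + 1], 2)
    pos += zeros + 1
    value = int("1" + bits[pos:pos + length - 1], 2)
    return value, pos + length - 1


def decode_all(bits):
    """Decode a concatenation of Elias delta codes."""
    values, pos = [], 0
    while pos < len(bits):
        value, pos = elias_delta_decode(bits, pos)
        values.append(value)
    return values


numbers = [2, 3, 1, 10, 7]
stream = "".join(elias_delta_encode(n) for n in numbers)
```

Because every code word is self-delimiting, the fields of a rule and the rules themselves can be concatenated without separators, as the slide describes.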

slide-86
SLIDE 86

Experimental Results – Graphs

Network Graphs

Graph          |V|          |E|
CA-AstroPh     18 772       396 160
CA-CondMat     23 133       186 936
CA-GrQc        5 242        28 980
DBLP60-70      24 246       23 677
Email-Enron    36 692       367 662
Email-EuAll    265 214      420 045
NotreDame      325 729      1 497 134
Wiki-Talk      2 394 385    5 021 410
Wiki-Vote      7 115        103 689

RDF Graphs

Graph                     |V|        |E|          |Σ|
Specific properties en    609 014    819 764      71
Types ru                  642 340    642 364      1
Types es                  818 657    819 780      1
Types de with en          618 708    1 810 909    1
Identica                  16 355     29 683       12
Jamendo                   438 975    1 047 898    25

slide-88
SLIDE 88

Experimental Results – Setting

Orders evaluated

◮ Natural
◮ BFS
◮ Degree
◮ FP

Parameters used

◮ Maximal rank of NT: 4

Methods compared to

◮ k²-tree [Brisaboa et al., 2014]
◮ Dense Substructure Removal/k²-tree [Hernández and Navarro, 2014]
◮ List-Merge [Grabowski and Bieniecki, 2014]

Note: These are all in-memory data structures

slide-89
SLIDE 89

Experimental Results – Impact of Order

[Figure: chart of bpe (axis from 2 to 18) for the node orders FP, Degree, BFS, and Nat on CA-AstroPh, Email-Enron, Email-EuAll, NotreDame, Jamendo, and Specific Properties en]

slide-90
SLIDE 90

Experimental Results – Networks

[Figure: chart of bpe (axis from 2 to 22) for gRePair, gRePair + DSR, k²-tree, HN, and LM on Email-EuAll, NotreDame, Wiki-Talk, Wiki-Vote, CA-AstroPh, CA-CondMat, CA-GrQc, and Email-Enron]

slide-91
SLIDE 91

Experimental Results – RDF

Graph                     gRePair    k²-tree
Specific properties en    12.70      27.29
Types ru                  0.01       7.53
Types es                  0.03       9.38
Types de with en          1.21       5.06
Identica                  8.35       14.31
Jamendo                   6.79       7.72

slide-93
SLIDE 93

Speed-Up Algorithms

1. Reachability

Theorem

Given an SL-HR grammar G and nodes s, t from val(G), it can be decided in time O(|G|) whether there exists a path from s to t in val(G).

Method:

◮ Compute compressed paths within G to the nodes representing s, t.
◮ Bottom-up compute pairs of external nodes that are reachable.
◮ Find the (“external”) nodes that are reachable in S from s and the nodes that can reach t from S.
◮ Solve reachability for these two sets of nodes classically.
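The bottom-up step of the method can be illustrated with a small sketch. For every rule it computes which external node can reach which other external node, reusing the results of the nonterminals that occur in the rule. The sketch makes several assumptions that are not on the slide: terminal edges have rank 2 and are directed from their first to their second attached node, the nonterminals are passed in bottom-up order, and a plain breadth-first search is used per external node, so this does not achieve the O(|G|) bound of the theorem. All names are hypothetical.

```python
from collections import deque


def reachable_from(source, successors):
    """Set of nodes reachable from `source` (including itself) by BFS."""
    seen, queue = {source}, deque([source])
    while queue:
        u = queue.popleft()
        for v in successors.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def external_reachability(rules, bottom_up):
    """Map each nonterminal to the pairs (i, j) with ext[i] reaching ext[j]."""
    reach = {}
    for nonterminal in bottom_up:
        edges, ext = rules[nonterminal]
        successors = {}
        for label, att in edges:
            if label in rules:
                # Nonterminal edge: use the summary computed earlier.
                pairs = [(att[i], att[j]) for i, j in reach[label]]
            else:
                pairs = [(att[0], att[1])]
            for u, v in pairs:
                successors.setdefault(u, set()).add(v)
        summary = set()
        for i, s in enumerate(ext):
            seen = reachable_from(s, successors)
            summary |= {(i, j) for j, t in enumerate(ext) if j != i and t in seen}
        reach[nonterminal] = summary
    return reach


# Rules as (edges, ext): A -> a b, B -> A c, and C containing a reversed B edge.
rules = {
    "A": ([("a", (1, 3)), ("b", (3, 2))], (1, 2)),
    "B": ([("A", (1, 3)), ("c", (3, 2))], (1, 2)),
    "C": ([("B", (2, 1))], (1, 2)),
}
reach = external_reachability(rules, ["A", "B", "C"])
```

With these summaries, a nonterminal edge in the start graph can be treated like a handful of ordinary edges between its attached nodes, without expanding the rule.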

slide-96
SLIDE 96

Speed-Up Algorithms

2. Regular Path Queries

Regular path query: Regular expression α over edge alphabet.
Satisfied by s, t: if there is a path from s to t that matches α.

Uncompressed algorithm:

1. Compute NFA A for α
2. Treat graph g as if it were an NFA Ag with initial state s and final state t
3. Compute product automaton for L(Ag) ∩ L(A) and test for emptiness

Theorem

Given an SL-HR grammar G, nodes s, t from val(G), and a regular path query α, it can be decided in time O(|G||A|) whether there exists a path satisfying α from s to t in val(G).

Main idea:

◮ Compute SL-HR grammar G′ representing the product automaton of val(G) and A.
◮ Then solve reachability.
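The uncompressed algorithm can be sketched as a breadth-first search over the product of graph nodes and NFA states, which explores the product automaton on the fly and tests it for emptiness. The sketch assumes that the NFA for α is already given (step 1 is not implemented), has no ε-transitions, and is passed as a transition dictionary; the function name `rpq_holds` and the parameter names are hypothetical.

```python
from collections import deque


def rpq_holds(graph_edges, s, t, delta, start_state, final_states):
    """Is there a path from s to t whose edge labels are accepted by the NFA?

    graph_edges: list of (source, label, target)
    delta: dict mapping (state, label) to a set of successor states
    """
    outgoing = {}
    for u, label, v in graph_edges:
        outgoing.setdefault(u, []).append((label, v))

    start = (s, start_state)
    seen, queue = {start}, deque([start])
    while queue:
        node, state = queue.popleft()
        if node == t and state in final_states:
            return True  # the product automaton accepts some word
        for label, next_node in outgoing.get(node, ()):
            for next_state in delta.get((state, label), ()):
                pair = (next_node, next_state)
                if pair not in seen:
                    seen.add(pair)
                    queue.append(pair)
    return False


# Path 1 -a-> 2 -b-> 3 -c-> 4 and a hand-built NFA for the expression a b* c.
edges = [(1, "a", 2), (2, "b", 3), (3, "c", 4)]
delta = {(0, "a"): {1}, (1, "b"): {1}, (1, "c"): {2}}
```

For example, `rpq_holds(edges, 1, 4, delta, 0, {2})` succeeds because the path from 1 to 4 spells `abc`, while starting at node 2 fails because the required leading `a` is missing.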

slide-98
SLIDE 98

Outlook

More expressive formalism: nonterminal nodes instead of hyperedges.

NR grammar representing all complete graphs

S : A
A → A a | a
{(A, a), (a, a)}

Why choose HR instead? Digram definition. Unclear with NR: neighboring nodes and edge, but what about embedding?

Updated Conjecture

Let Cn be a complete graph with 2n nodes.

1. There is no SL-HR grammar G with val(G) = Cn and |G| ∈ O(n).
2. There is an SL-NR grammar G with val(G) = Cn and |G| ∈ O(n).

slide-99
SLIDE 99

Open Questions/Future Work

Querying

◮ Neighborhood queries are possible, but linear in height of grammar
◮ Would need efficient in-memory data structures for the grammar

Hypergraphs

◮ Better way of representing hypergraphs?