Grammar-Based Graph Compression
Fabian Peternek
October 25, 2016

Use of Grammar-Based Compression: Speed-up Algorithms
Some queries can be answered in polynomial time on a grammar, e.g.,
Strings
◮ Equality (O(n^2 log* n), see also [Lohrey, 2012])
◮ NFA membership (O(n|A|^3), [Plandowski and Rytter, 1999])
◮ Pattern matching (O(n^2 log n), [Jez, 2012])
Trees
◮ Equality (PTIME, [Busatto, Maneth, Lohrey, 2008])
◮ TA membership (PTIME, [Lohrey, Maneth, Schmidt-Schauß, 2012])
◮ Isomorphism (PTIME, [Lohrey, Maneth, P, 2015])
Speed-up proportional to compression ratio
For graph grammars?
◮ Reachability (O(|G|), [Maneth and P, 2016])
◮ Regular Path Queries (O(|G||A|), [Maneth and P, 2016])
◮ Others?

Computing Small Grammars
Finding a smallest grammar is NP-complete [Charikar et al., 2005].
⇒ Approximations
One Approximation: RePair compression for Strings
Input string: abcabcabc (drawn as a path whose edges are labeled a b c a b c a b c)
Digrams:
◮ a b: 3 times
◮ b c: 3 times
◮ c a: 2 times
Replace a b by A, with rule A → a b. Result: A c A c A c
Digrams:
◮ A c: 3 times
◮ c A: 2 times
Replace A c by B, with rule B → A c. Result: B B B
Size:
◮ Original string: 9 edges, 10 nodes
◮ Grammar: 7 edges, 10 nodes
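
The RePair steps above can be sketched in Python. This is a minimal, quadratic-time illustration and not the implementation behind the talk. The function names (`count_digrams`, `replace`, `repair`) are made up here, nonterminals are assumed to be the uppercase letters A, B, ... (so the input should not contain uppercase letters), and ties between equally frequent digrams are assumed to go to the digram seen first.

```python
from collections import Counter
from string import ascii_uppercase


def count_digrams(seq):
    """Count non-overlapping occurrences of each adjacent pair, left to right."""
    counts = Counter()
    last_end = {}  # digram -> index where its last counted occurrence ended
    for i in range(len(seq) - 1):
        d = (seq[i], seq[i + 1])
        if last_end.get(d, -1) < i:  # skip occurrences overlapping the previous one
            counts[d] += 1
            last_end[d] = i + 1
    return counts


def replace(seq, digram, nt):
    """Replace occurrences of digram by the nonterminal nt, left to right."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == digram:
            out.append(nt)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def repair(text):
    """Return (start sequence, rules); assumes at most 26 rules are needed."""
    seq, rules = list(text), {}
    names = iter(ascii_uppercase)  # hypothetical naming scheme for nonterminals
    while True:
        counts = count_digrams(seq)
        if not counts:
            break
        best = max(counts, key=counts.get)  # first digram with maximal count
        if counts[best] < 2:                # replacing it would not save anything
            break
        nt = next(names)
        rules[nt] = best
        seq = replace(seq, best, nt)
    return seq, rules


start, rules = repair("abcabcabc")
print(start, rules)
```

For `abcabcabc` this yields the start sequence B B B with the rules A → a b and B → A c, which is 3 + 2 + 2 = 7 symbols in total and agrees with the 7 edges counted on the slide.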

RePair for Graphs
Goal: Extend RePair to graphs with hyperedge replacement graph grammars [Habel, 1992]
[Figure: example graph with three edges each labeled a, b, and c]
Digram occurrences (external nodes numbered 1, 2):
◮ a b: 3 times
◮ b c: 3 times
◮ c a: 2 times (not 4, because the occurrences overlap)
After replacing a b by A (rule A → a b), the graph has the edges c c c A A A, with digrams:
◮ A c: 3 times
◮ c A: 2 times
After replacing A c by B (rule B → A c), the graph has the edges B B B.
◮ Size of original graph: 17
◮ Size of grammar: 15

What if all nodes have a degree > 2?
[Figure: ring-shaped example graph in which every node has degree > 2, compressed step by step]
◮ Hyperedge A attached to nodes 1 2 3: size = 3
◮ Hyperedge B attached to nodes 1 2 3 4: size = 4
◮ Rule C → B B, with C attached to nodes 1 2 3 4; the start graph then consists of C C C
For ring with 2n+1 nodes: Size of finished grammar 19 + 14n ∈ O(n).

Hypergraphs
Definition: Hypergraph
Quintuple g = (V, E, att, lab, ext) with
◮ nodes V = {1, . . . , n},
◮ edges E = {e1, . . . , em},
◮ an attachment mapping att : E → V∗,
◮ an edge labeling lab : E → Σ, and
◮ a string of external nodes ext ∈ V∗.
Two additional conditions:
(C1) ∀e ∈ E : the nodes in att(e) are pairwise distinct.
(C2) The nodes in ext are pairwise distinct.
Example
[Figure: hypergraph with nodes 1, 2, 3, an a-edge e1, a b-edge e2, and a hyperedge e3 labeled A]
◮ V = {1, 2, 3}
◮ E = {e1, e2, e3}
◮ att(e1) = 12, att(e2) = 23, att(e3) = 213
◮ lab(e1) = a, lab(e2) = b, lab(e3) = A
◮ ext = 31
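
As a rough illustration, the quintuple can be written down as a small Python data class. The class name `Hypergraph`, the method `is_valid`, and the choice to store E implicitly as the keys of `att` are assumptions made for this sketch only; the values are the ones from the example above.

```python
from dataclasses import dataclass


@dataclass
class Hypergraph:
    nodes: set       # V
    att: dict        # edge id -> tuple of attached nodes (att); its keys form E
    lab: dict        # edge id -> label (lab)
    ext: tuple = ()  # string of external nodes (ext)

    def is_valid(self):
        """Check (C1), (C2), and that every referenced node and edge exists."""
        seqs = list(self.att.values()) + [self.ext]
        distinct = all(len(set(s)) == len(s) for s in seqs)    # (C1) and (C2)
        known = all(v in self.nodes for s in seqs for v in s)  # nodes are in V
        return distinct and known and self.att.keys() == self.lab.keys()


g = Hypergraph(
    nodes={1, 2, 3},
    att={"e1": (1, 2), "e2": (2, 3), "e3": (2, 1, 3)},
    lab={"e1": "a", "e2": "b", "e3": "A"},
    ext=(3, 1),
)
print(g.is_valid())
```

An edge attached twice to the same node, such as `att={"e": (1, 1)}`, violates (C1) and is rejected by `is_valid`.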

Hypergraphs – Further Properties
Let g = (V, E, att, lab, ext) be a hypergraph, and e ∈ E an edge.
Notation
◮ Rank: rank(e) = |att(e)|, rank(g) = |ext|
◮ HGR(Σ): set of all hypergraphs over Σ.
Size Definition
◮ |e| = 1 if rank(e) ≤ 2, and |e| = rank(e) otherwise.
◮ |g|_V = |V|
◮ |g|_E = Σ_{e∈E} |e|
◮ |g| = |g|_V + |g|_E
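
The size definition can be checked on the three-node example hypergraph from the previous slide. The helper names `edge_size` and `graph_size` are hypothetical; the graph is passed as a node set plus a dictionary from edge ids to attachment tuples.

```python
def edge_size(attachment):
    """|e| = 1 for rank <= 2, otherwise |e| = rank(e)."""
    rank = len(attachment)
    return 1 if rank <= 2 else rank


def graph_size(nodes, att):
    """|g| = |g|_V + |g|_E."""
    return len(nodes) + sum(edge_size(a) for a in att.values())


nodes = {1, 2, 3}
att = {"e1": (1, 2), "e2": (2, 3), "e3": (2, 1, 3)}
print(graph_size(nodes, att))  # 3 nodes + edge sizes (1 + 1 + 3) = 8
```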

Hyperedge Replacement Grammars
Definition: Hyperedge Replacement Grammar (HR grammar)
Triple G = (N, P, S) with
◮ a ranked alphabet N of nonterminals,
◮ rules P = {(A, g) | A ∈ N, g ∈ HGR(Σ), where rank(A) = rank(g)}
◮ an initial hypergraph S ∈ HGR(Σ ∪ N).
Size of an HR-grammar:
◮ |G| = |S| + Σ_{(A,h)∈P} |h|
Straight-line HR grammar (SL-HR grammar)
We call an HR grammar straight-line if it is acyclic and deterministic, i.e., no nonterminal can derive itself and every nonterminal has exactly one rule.
Note: An SL-HR grammar represents exactly one graph, denoted by val(G).
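
To make val(G) concrete, the following sketch expands a straight-line grammar into its terminal edges. The representation is an assumption for illustration only: graphs are given as lists of `(label, attachment)` pairs, each rule is a pair `(ext, edges)`, isolated nodes are ignored, and the function name `expand` is made up. Replacing a nonterminal edge merges the external nodes of the right-hand side with the nodes the edge is attached to and copies all other nodes freshly.

```python
from itertools import count


def expand(start_edges, rules):
    """Return the terminal edges of val(G).

    start_edges: list of (label, att) pairs of the start graph S.
    rules: nonterminal -> (ext, edges), the unique right-hand side of each
           nonterminal; ext lists its external nodes in order.
    """
    fresh = count()                     # supplies new ids for internal nodes
    result = []
    stack = list(reversed(start_edges))
    while stack:
        label, att = stack.pop()
        if label not in rules:          # terminal edge: keep it
            result.append((label, att))
            continue
        ext, edges = rules[label]
        # External nodes are merged with the nodes of the replaced edge;
        # every other node of the right-hand side gets a fresh copy.
        mapping = dict(zip(ext, att))
        for _, rhs_att in edges:
            for v in rhs_att:
                if v not in mapping:
                    mapping[v] = ("new", next(fresh))
        new_edges = [(l, tuple(mapping[v] for v in a)) for l, a in edges]
        stack.extend(reversed(new_edges))
    return result


# The string grammar from the RePair example: S = B B B, A -> a b, B -> A c.
rules = {
    "A": (("x", "z"), [("a", ("x", "y")), ("b", ("y", "z"))]),
    "B": (("x", "z"), [("A", ("x", "y")), ("c", ("y", "z"))]),
}
start = [("B", (0, 1)), ("B", (1, 2)), ("B", (2, 3))]
edges = expand(start, rules)
print("".join(label for label, _ in edges))
```

The expansion terminates because the grammar is acyclic, and it is unambiguous because every nonterminal has exactly one rule.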

Problem: Occurrence Counting
Problem
Find a maximum-cardinality set of non-overlapping occurrences.
◮ Reducible to maximum matching
◮ Cubic complexity (e.g., Blossom algorithm)
Approach: Greedy counting following a fixed node order
◮ Gives exact result for strings and trees
  ◮ Left-to-right on strings
  ◮ Post-order for trees
◮ Only an approximation for graphs.
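
For strings, the claim that greedy left-to-right counting is exact can be spot-checked against a brute-force search. Both function names are hypothetical, and the brute-force version is exponential, so it is only meant for very short inputs.

```python
from itertools import combinations


def greedy_count(s, digram):
    """Count non-overlapping occurrences of a two-letter digram, left to right."""
    found, i = 0, 0
    while i + 1 < len(s):
        if s[i:i + 2] == digram:
            found += 1
            i += 2          # jump past the occurrence so the next one cannot overlap
        else:
            i += 1
    return found


def optimal_count(s, digram):
    """Largest set of pairwise non-overlapping occurrences, by exhaustive search."""
    starts = [i for i in range(len(s) - 1) if s[i:i + 2] == digram]
    for k in range(len(starts), 0, -1):
        for chosen in combinations(starts, k):
            # Two occurrences overlap when their start positions differ by 1.
            if all(b - a >= 2 for a, b in zip(chosen, chosen[1:])):
                return k
    return 0


for text in ["aaaaa", "abcabcabc", "aabaaa"]:
    print(text, greedy_count(text, "aa"), optimal_count(text, "aa"))
```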

Node Order influences Result
Greedy Occurrence Finding
[Figure: occurrences of a digram are searched greedily in an example graph. Following one node order numbers 5 occurrences; following a different node order numbers only 4.]
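
The effect can be reproduced on a much smaller graph than the one on the slide. The sketch below assumes a simplified notion of digram (any two unlabeled, undirected edges sharing a node) and a hypothetical helper `greedy_pairs`; it is not the occurrence search of the actual compressor.

```python
def greedy_pairs(edges, order):
    """Greedily pair up unused edges that share a node, visiting nodes in order."""
    used, occurrences = set(), []
    for v in order:
        free = [e for e in edges if v in e and e not in used]
        # Pair the free edges at v two at a time; a leftover edge stays unused.
        for e1, e2 in zip(free[0::2], free[1::2]):
            occurrences.append((e1, e2))
            used.update((e1, e2))
    return occurrences


# A path with five nodes and four edges.
path = [(1, 2), (2, 3), (3, 4), (4, 5)]
print(len(greedy_pairs(path, [3, 1, 2, 4, 5])))  # starts in the middle: 1 occurrence
print(len(greedy_pairs(path, [2, 4, 1, 3, 5])))  # better order: 2 occurrences
```

Starting at the middle node consumes the two inner edges and leaves the two outer edges without a partner, so only one occurrence is found even though two exist.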

Our Order
Order that worked well in experiments:
◮ start with node degrees
◮ recursively refine using the neighborhoods until fixpoint reached
FP-Order
Example for a graph with five nodes:
◮ Initial labels (node degrees): 1 1 3 2 1
◮ Refined labels (own label followed by the neighbors' labels): (1, 3) (1, 3) (3, 1, 1, 2) (2, 1, 3) (1, 2)
◮ Order the labels lexicographically: (1, 2) ⇒ 1, (1, 3) ⇒ 2, (2, 1, 3) ⇒ 3, (3, 1, 1, 2) ⇒ 4
◮ New labels: 2 2 4 3 1
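
The refinement loop can be sketched as follows. The function name `fp_order` and the adjacency-list input are assumptions, and since the slide's figure is lost, the example graph is merely one graph with the degree sequence 1 1 3 2 1 that reproduces the labels shown above.

```python
def fp_order(adjacency):
    """adjacency: dict node -> list of neighbors (undirected graph)."""
    labels = {v: len(nbrs) for v, nbrs in adjacency.items()}  # start with degrees
    for _ in range(len(adjacency)):                           # at most n refinements
        # New label: own label followed by the sorted labels of the neighbors.
        tuples = {v: (labels[v], *sorted(labels[u] for u in adjacency[v]))
                  for v in adjacency}
        # Replace each tuple by its rank in lexicographic order.
        ranks = {t: i + 1 for i, t in enumerate(sorted(set(tuples.values())))}
        refined = {v: ranks[tuples[v]] for v in adjacency}
        if refined == labels:                                 # fixpoint reached
            break
        labels = refined
    return labels


# Hypothetical graph: leaves p and q hang off r, and r - s - t form a path.
graph = {"p": ["r"], "q": ["r"], "r": ["p", "q", "s"], "s": ["r", "t"], "t": ["s"]}
print(fp_order(graph))
```

On this graph the first refinement already yields the labels 2 2 4 3 1 for p, q, r, s, t, and the second refinement leaves them unchanged.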
Compressible?

Limits of HR-Grammar Based Compression
Fact: HR grammars cannot represent families of unbounded tree-width.
Conjecture
Let Cn be a complete graph with 2n nodes.
1. There is no SL-HR grammar G with val(G) = Cn and |G| ∈ O(n).
Note: Some compression is still possible.
Another Family of Unbounded Tree-Width

Bounded Tree-Width: Hopeless?
[Figure: grid graph of dimensions n and 2n, built from copies of a nonterminal A with a rule A → ...]
Represents n × 2n-grid, but can be further compressed to size O(n).

Encoding SL-HR grammars
Typical result
◮ Relatively large remaining graph S
◮ Relatively small rules
To encode the resulting grammars into a file:
◮ k²-tree method for the start graph
  ◮ One tree per edge label
◮ Simple edge lists for the production rules
Note:
1. This is currently not an in-memory data structure, but only file compression.
2. We use our own implementation of the k²-tree method, which is not as optimized.
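
For readers unfamiliar with k²-trees, the following sketch builds the two bit arrays of a k²-tree from an adjacency matrix. It follows the general construction (split the matrix into k² submatrices, write one bit per submatrix, recurse into the non-empty ones level by level) and is not the implementation used for the experiments. The function name `k2_tree` is made up, and the matrix side length is assumed to be a power of k and at least k.

```python
def k2_tree(matrix, k=2):
    """Return (T, L): bits of the internal levels and of the last level."""
    n = len(matrix)           # assumed to be a power of k, with n >= k
    T, L = [], []
    level = [(0, 0, n)]       # submatrices (row, col, size) still to be split
    while level:
        next_level = []
        for r, c, size in level:
            step = size // k
            for i in range(k):
                for j in range(k):
                    r0, c0 = r + i * step, c + j * step
                    # 1 if the submatrix contains at least one edge, else 0.
                    bit = int(any(matrix[x][y]
                                  for x in range(r0, r0 + step)
                                  for y in range(c0, c0 + step)))
                    if step == 1:
                        L.append(bit)          # single cells form the last level
                    else:
                        T.append(bit)
                        if bit:                # only non-empty parts are refined
                            next_level.append((r0, c0, step))
        level = next_level
    return T, L


single_edge = [[1, 0, 0, 0],
               [0, 0, 0, 0],
               [0, 0, 0, 0],
               [0, 0, 0, 0]]
print(k2_tree(single_edge))
```

Empty regions of the matrix cost a single 0 bit, which is why the method suits sparse graphs. With one tree per edge label, each label gets its own matrix.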

Simple edges
Simple edges, i.e., those of rank 1 or 2, are encoded via adjacency matrices.
Example
[Figure: graph with nodes 1 to 4, two hyperedges labeled C, and simple edges labeled A, B, A; one adjacency matrix is shown for label A (two entries set) and one for label B (one entry set)]

Hyperedges (rank ≥ 3)
For hyperedges, we use incidence matrices
Example
[Figure: the example graph with nodes 1 to 4 and the incidence matrix for its two hyperedges labeled C]
Problem: Order lost.
Additional file contains dictionary of att-permutations.
◮ e1 → (3, 1, 2)
◮ e2 → (2, 1, 3)
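
One way to read this scheme is sketched below. The incidence matrix only records which nodes a hyperedge touches, and the permutation states, for each position of att, which of the touched nodes (in ascending order) sits there. The function names, the row-per-edge layout, and the concrete attachments (4, 1, 2) and (3, 2, 4) are assumptions chosen so that the permutations match the ones on the slide; the original figure may differ.

```python
def encode_hyperedges(num_nodes, attachments):
    """attachments: list of node tuples (1-based ids), one per hyperedge."""
    matrix, permutations = [], []
    for att in attachments:
        # Incidence row: 1 for every node the hyperedge is attached to.
        matrix.append([int(v in att) for v in range(1, num_nodes + 1)])
        ordered = sorted(att)
        # 1-based position in the sorted node list of each attached node.
        permutations.append(tuple(ordered.index(v) + 1 for v in att))
    return matrix, permutations


def decode_hyperedges(matrix, permutations):
    """Recover the ordered attachments from incidence rows and permutations."""
    result = []
    for row, perm in zip(matrix, permutations):
        ordered = [v + 1 for v, bit in enumerate(row) if bit]
        result.append(tuple(ordered[p - 1] for p in perm))
    return result


atts = [(4, 1, 2), (3, 2, 4)]          # hypothetical attachments of e1 and e2
matrix, perms = encode_hyperedges(4, atts)
print(matrix, perms)
print(decode_hyperedges(matrix, perms) == atts)
```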

Rules
Stored as a simple edge list, using δ-coded encodings of
1. Number of edges in the rule
2. For every edge
  ◮ number of nodes it is attached to,
  ◮ whether it is nonterminal, and
  ◮ the node-ids it is attached to, and whether the node is external
The rules are concatenated one after another, ordered by their nonterminal.
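
A possible bit layout for one rule is sketched below, assuming that δ refers to Elias delta codes. Everything beyond the list above is an assumption: the function names, the use of a single bit for each yes/no flag, and node ids starting at 1 (Elias delta codes only cover positive integers). The actual file format may differ.

```python
def elias_gamma(n):
    """Elias gamma code: unary length prefix followed by the binary digits."""
    bits = bin(n)[2:]
    return "0" * (len(bits) - 1) + bits


def elias_delta(n):
    """Elias delta code of a positive integer as a bit string."""
    bits = bin(n)[2:]
    # Gamma-code the bit length, then append the digits without the leading 1.
    return elias_gamma(len(bits)) + bits[1:]


def encode_rule(edges, external):
    """edges: list of (is_nonterminal, attached node ids); ids start at 1."""
    out = [elias_delta(len(edges))]              # 1. number of edges
    for is_nt, att in edges:
        out.append(elias_delta(len(att)))        # number of attached nodes
        out.append("1" if is_nt else "0")        # nonterminal flag (one bit, assumed)
        for v in att:
            out.append(elias_delta(v))           # node id
            out.append("1" if v in external else "0")  # external flag (assumed)
    return "".join(out)


# Rule A -> a b from the string example: nodes 1 - 2 - 3, with 1 and 3 external.
print(encode_rule([(False, (1, 2)), (False, (2, 3))], external={1, 3}))
```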

Experimental Results – Graphs
Network Graphs
Graph                        |V|         |E|
CA-AstroPh                18 772     396 160
CA-CondMat                23 133     186 936
CA-GrQc                    5 242      28 980
DBLP60-70                 24 246      23 677
Email-Enron               36 692     367 662
Email-EuAll              265 214     420 045
NotreDame                325 729   1 497 134
Wiki-Talk              2 394 385   5 021 410
Wiki-Vote                  7 115     103 689
RDF Graphs
Graph                        |V|         |E|   |Σ|
Specific properties en   609 014     819 764    71
Types ru                 642 340     642 364     1
Types es                 818 657     819 780     1
Types de with en         618 708   1 810 909     1
Identica                  16 355      29 683    12
Jamendo                  438 975   1 047 898    25

Experimental Results – Setting
Orders evaluated
◮ Natural
◮ BFS
◮ Degree
◮ FP
Parameters used
◮ Maximal rank of NT: 4
Methods compared to
◮ k²-tree [Brisaboa et al., 2014]
◮ Dense Substructure Removal/k²-tree [Hernández and Navarro, 2014]
◮ List-Merge [Grabowski and Bieniecki, 2014]
Note: These are all in-memory data structures

Experimental Results – Impact of Order
[Figure: bar chart of bits per edge (bpe, axis from 2 to 18) for the orders FP, Degree, BFS, and Nat on CA-AstroPh, Email-Enron, Email-EuAll, NotreDame, Jamendo, and Specific Properties en]

Experimental Results – Networks
[Figure: bar chart of bits per edge (bpe, axis from 2 to 22) for gRePair, gRePair+DSR, k²-tree, HN, and LM on Email-EuAll, NotreDame, Wiki-Talk, Wiki-Vote, CA-AstroPh, CA-CondMat, CA-GrQc, and Email-Enron]

Experimental Results – RDF
Graph                    gRePair   k²-tree
Specific properties en     12.70     27.29
Types ru                    0.01      7.53
Types es                    0.03      9.38
Types de with en            1.21      5.06
Identica                    8.35     14.31
Jamendo                     6.79      7.72

Speed-Up Algorithms
1. Reachability
Theorem
Given an SL-HR grammar G and nodes s, t from val(G), it can be decided in time O(|G|) whether there exists a path from s to t in val(G).
Method:
◮ Compute compressed paths within G to the nodes representing s, t.
◮ Bottom-up, compute the pairs of external nodes where one is reachable from the other.
◮ Find the (“external”) nodes that are reachable in S from s and the nodes in S that can reach t.
◮ Solve reachability for these two sets of nodes classically.
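
The bottom-up step can be illustrated with a simplified sketch that only handles the case where s and t are both nodes of the start graph S. It makes several assumptions that are not from the talk: terminal edges are directed edges of rank 2 given as `(source, target)`, rules are given as `(ext, edges)` pairs, the caller supplies the nonterminals in an order where dependencies come first, and all function names are made up. As written it is not linear in |G|; it only shows how summaries of external-node reachability replace nonterminal edges.

```python
def reachable_from(plain_edges, source):
    """Nodes reachable from source along directed (u, v) pairs."""
    seen, todo = {source}, [source]
    while todo:
        u = todo.pop()
        for a, b in plain_edges:
            if a == u and b not in seen:
                seen.add(b)
                todo.append(b)
    return seen


def skeleton(edges, summaries):
    """Replace each summarized nonterminal edge by shortcut edges."""
    plain = []
    for label, att in edges:
        if label in summaries:
            plain += [(att[i], att[j]) for i, j in summaries[label]]
        else:
            plain.append(att)      # terminal edge of rank 2: (source, target)
    return plain


def summarize(rules, order):
    """Bottom-up: for each nonterminal, pairs (i, j) with ext[i] reaching ext[j]."""
    summaries = {}
    for nt in order:               # dependencies must come first in this order
        ext, edges = rules[nt]
        plain = skeleton(edges, summaries)
        summaries[nt] = {(i, j)
                         for i, s in enumerate(ext)
                         for j, t in enumerate(ext)
                         if i != j and t in reachable_from(plain, s)}
    return summaries


def reachable_in_start(start_edges, rules, order, s, t):
    """Decide reachability between two nodes of S without expanding the grammar."""
    plain = skeleton(start_edges, summarize(rules, order))
    return t in reachable_from(plain, s)


rules = {
    "A": (("x", "z"), [("a", ("x", "y")), ("b", ("y", "z"))]),
    "B": (("x", "z"), [("A", ("x", "y")), ("c", ("y", "z"))]),
}
start = [("B", (0, 1)), ("B", (1, 2)), ("B", (2, 3))]
print(reachable_in_start(start, rules, ["A", "B"], 0, 3))
print(reachable_in_start(start, rules, ["A", "B"], 3, 0))
```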

Speed-Up Algorithms
2. Regular Path Queries
Regular path query: Regular expression α over edge alphabet.
Satisfied by s, t: if there is a path from s to t that matches α.
Uncompressed algorithm:
1. Compute NFA A for α
2. Treat graph g as if it were an NFA Ag with initial state s and final state t
3. Compute product automaton for L(Ag) ∩ L(A) and test for emptiness
Theorem
Given an SL-HR grammar G, nodes s, t from val(G), and a regular path query α, it can be decided in time O(|G||A|) whether there exists a path satisfying α from s to t in val(G).
Main idea:
◮ Compute SL-HR grammar G′ representing the product automaton of val(G) and A.
◮ Then solve reachability.
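
The uncompressed algorithm can be sketched as a search over pairs of graph node and NFA state, which explores the reachable part of the product automaton instead of building it in full. The sketch assumes that the NFA for α is already given as a transition dictionary without ε-transitions; the function name `rpq` and the input format are hypothetical.

```python
from collections import deque


def rpq(edges, nfa, start_state, final_states, s, t):
    """Is there a path from s to t whose label sequence the NFA accepts?

    edges: list of (u, label, v); nfa: dict (state, label) -> set of states.
    """
    seen = {(s, start_state)}
    queue = deque(seen)
    while queue:
        node, state = queue.popleft()
        if node == t and state in final_states:
            return True                      # the intersection is non-empty
        for u, label, v in edges:
            if u != node:
                continue
            # Follow the graph edge and the NFA transition simultaneously.
            for nxt in nfa.get((state, label), ()):
                if (v, nxt) not in seen:
                    seen.add((v, nxt))
                    queue.append((v, nxt))
    return False


# NFA for the expression a b* c with states q0, q1, q2 (q2 accepting).
nfa = {("q0", "a"): {"q1"}, ("q1", "b"): {"q1"}, ("q1", "c"): {"q2"}}
graph = [(0, "a", 1), (1, "b", 2), (2, "c", 3)]
print(rpq(graph, nfa, "q0", {"q2"}, 0, 3))
print(rpq(graph, nfa, "q0", {"q2"}, 1, 3))
```

The first query succeeds because the path from 0 to 3 spells a b c; the second fails because the path from 1 to 3 spells b c, which does not match a b* c.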

Outlook
More expressive formalism: nonterminal nodes instead of hyperedges.
NR grammar representing all complete graphs
[Figure: start graph S consisting of a node A; rule A → A a | a; connection relation {(A, a), (a, a)}]
Why choose HR instead? Digram definition. Unclear with NR: neighboring nodes and edge, but what about embedding?
Updated Conjecture
Let Cn be a complete graph with 2n nodes.
1. There is no SL-HR grammar G with val(G) = Cn and |G| ∈ O(n).
2. There is an SL-NR grammar G with val(G) = Cn and |G| ∈ O(n).

Open Questions/Future Work
Querying
◮ Neighborhood queries are possible, but linear in height of grammar
◮ Would need efficient in-memory data structures for the grammar
Hypergraphs
◮ Better way of representing hypergraphs?