

slide-1
SLIDE 1

Storing Set Families More Compactly with Top ZDDs

18th Symposium on Experimental Algorithms, June 16-18, 2020

  • K. Matsuda (The University of Tokyo)
  • S. Denzumi (The University of Tokyo)
  • K. Sadakane (The University of Tokyo)
slide-2
SLIDE 2

Abstract

  • Purpose

– Compress a zero-suppressed binary decision diagram (ZDD) ≒ labeled binary directed acyclic graph (DAG)

  • Method

– Extend a tree compression algorithm to DAGs

  • Result

– Theoretical: exponentially smaller than the input in the best case
– Experimental: smaller than the related DenseZDD in almost all cases


slide-3
SLIDE 3

Contents

  • Preliminary

– ZDD
– Tree compression algorithms

  • Proposed data structure

– Construction algorithm
– Complexity analysis

  • Experiment
  • Conclusion


slide-4
SLIDE 4

Preliminary

ZDD, DAG compression, top tree compression

slide-5
SLIDE 5

ZDD

  • Zero-suppressed binary decision diagram [Minato 93]

– Labeled binary directed acyclic graph
– Represents a family of sets
– Shares equivalent subgraphs

  • Terminology

– Branching nodes
⁎ Label
⁎ 0-edges and 1-edges
– Sink nodes
⁎ Top (⊤) or bottom (⊥)


[Figure: example ZDD with branching nodes labeled 1 to 3 and sink nodes ⊥ and ⊤, representing the family { {1, 2}, {1, 3}, {2, 3} }]
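To make the terminology concrete, here is a minimal Python sketch of a ZDD for the family { {1, 2}, {1, 3}, {2, 3} }. The node names (`n1`, `n2a`, `n2b`, `n3`), the dictionary layout, and the `family` helper are illustrative assumptions, not the representation used by the authors.

```python
# Sinks: BOT is the empty family, TOP is the family containing only the empty set.
BOT, TOP = "bot", "top"

# Hypothetical layout: node id -> (label, 0-edge target, 1-edge target).
# The node labeled 3 is shared by both nodes labeled 2.
zdd = {
    "n3": (3, BOT, TOP),
    "n2a": (2, BOT, "n3"),
    "n2b": (2, "n3", TOP),
    "n1": (1, "n2a", "n2b"),
}


def family(node):
    """Return the family of sets represented by the sub-ZDD rooted at `node`."""
    if node == BOT:
        return set()
    if node == TOP:
        return {frozenset()}
    label, lo, hi = zdd[node]
    # 0-edge: sets without `label`; 1-edge: sets that contain `label`.
    return family(lo) | {s | {label} for s in family(hi)}


sets = family("n1")
print(sorted(sorted(s) for s in sets))
```

Following a 1-edge adds the node's label to the set, following a 0-edge skips it, and every path that ends at ⊤ contributes one set to the family.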

slide-6
SLIDE 6

Tree compression methods

  • Tree grammar

– Based on grammar compression for strings [Charikar et al. 05]
– Traversal on the compressed representation requires time linear in the size of the grammar [Busatto et al. 04], [Lohrey et al. 13]

  • Succinct data structures

– Unlabeled tree: LOUDS [Jacobson 89], BP [Munro, Raman 01]
– Labeled tree: [Ferragina et al. 09]


slide-7
SLIDE 7

Tree compression

  • Transform-based compression

– Shares equivalent substructures
– DAG compression [Downey et al. 80]
⁎ Shares all equivalent subtrees
– Top DAG compression [Bille et al. 13]
⁎ Shares equivalent subcomponents


slide-8
SLIDE 8

DAG compression

  • Compresses labeled trees into DAGs [Downey et al. 80]

– Shares all equivalent subtrees

[Figure: DAG compression]

slide-9
SLIDE 9

Problem of DAG comp.

  • Cannot compress substructures that repeat vertically
  • Example: a simple structure that is nevertheless not compressed

[Figure: DAG compression of the example; size n]
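The behaviour of DAG compression, including its weakness on vertically repeating structure, can be sketched with a small hash-consing routine. The nested-tuple tree encoding and the helper names (`dag_compress`, `chain`, `full_binary`) are assumptions made for this illustration only.

```python
def dag_compress(tree):
    """Count the nodes of the DAG obtained by sharing all equivalent subtrees.

    `tree` is a nested tuple (label, child, child, ...).
    """
    table = {}  # (label, child ids) -> id of the shared DAG node

    def visit(t):
        label, *children = t
        key = (label, tuple(visit(c) for c in children))
        return table.setdefault(key, len(table))

    visit(tree)
    return len(table)


def chain(n):
    """A path of n nodes that all carry the same label."""
    t = ("a",)
    for _ in range(n - 1):
        t = ("a", t)
    return t


def full_binary(depth):
    """A complete binary tree whose nodes all carry the same label."""
    if depth == 0:
        return ("a",)
    return ("a", full_binary(depth - 1), full_binary(depth - 1))


print(dag_compress(full_binary(3)))  # 15 tree nodes collapse to 4 DAG nodes
print(dag_compress(chain(8)))        # 8 tree nodes stay 8 DAG nodes
```

The complete binary tree shrinks because its two halves are identical subtrees. The chain does not shrink at all: every subtree has a different height, so no two subtrees are equivalent even though the same pattern repeats from top to bottom.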

slide-10
SLIDE 10

Top DAG compression

  • Compresses labeled trees [Bille et al. 13]

– Transform the input tree into a top tree, then compress the top tree by DAG compression

[Figure: input tree]

slide-11
SLIDE 11

Top DAG compression

  • In comparison to DAG compression:

– [Best case] O(n / log_σ n) times smaller
– [Worst case] O(log_σ n) times larger

  • Greedy construction [Bille et al. 13]

– #node = O(n log log_σ n / log_σ n)
– Proof is in [Hübschle-Schneider and Raman 15]

  • Optimal construction [Lohrey et al. 17], [Dudek, Gawrychowski 18]

– #node = O(n / log_σ n) (information-theoretic lower bound)

(n: #node of the input tree, σ: #label)


slide-12
SLIDE 12

Top tree

  • A binary tree 𝒯 that represents the way to decompose the input tree T

– Each node of the top tree corresponds to a cluster of T
– The root of the top tree corresponds to the whole of T
– A cluster is the subgraph induced by a set of connected edges
– Every cluster has at most 2 boundary nodes
– A cluster is made by a horizontal or vertical merge of 2 clusters that have the same node as a boundary node


slide-13
SLIDE 13

Top tree

  • A binary tree 𝒯 that represents the way to decompose the input tree T

– A cluster is the subgraph induced by a set of connected edges
– Every cluster has at most 2 boundary nodes

[Figure: a cluster]

slide-14
SLIDE 14

Top tree

  • Example:

[Figure: input tree T with nodes 1 to 7 and its top tree 𝒯, with merge nodes H, H, H, V, V and merge types (c), (c), (e), (b), (b)]

slide-15
SLIDE 15

Top tree

  • Example:

[Figure: input tree T with nodes 1 to 7 and its top DAG 𝒯D, with merge nodes H, H, H, V, V]

slide-16
SLIDE 16

Advantage of top DAG

  • Top DAG compression can share identical substructures that appear at different heights

[Figure: DAG compression (label: n) versus top DAG compression (labels: log n, n)]
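As a rough illustration of why sharing across heights helps, the sketch below builds a top tree for a chain of 2^k identically labeled edges and then applies DAG compression to it. It assumes that only vertical merges of adjacent clusters are needed on a chain. The function name and the tuple encoding are hypothetical, and this is a simplification of the greedy construction rather than the implementation of [Bille et al. 13].

```python
def top_dag_size_of_chain(k):
    """Number of top DAG nodes for a chain of 2**k identically labeled edges."""
    table = {}  # structural key -> id of the shared top DAG node

    def intern(key):
        return table.setdefault(key, len(table))

    # Every edge of the chain is a leaf cluster; all leaves are equivalent.
    clusters = [intern(("edge", "a"))] * (2 ** k)

    # Each round merges adjacent clusters pairwise (vertical merges),
    # halving the number of clusters until one cluster covers the chain.
    while len(clusters) > 1:
        clusters = [
            intern(("V", clusters[i], clusters[i + 1]))
            for i in range(0, len(clusters), 2)
        ]
    return len(table)


print(top_dag_size_of_chain(10))  # chain of 1024 edges -> 11 top DAG nodes
```

All clusters created in the same round are structurally identical, so each round adds a single node to the top DAG. The result has k + 1 nodes for a chain of 2^k edges, whereas plain DAG compression of the same chain shares nothing.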

slide-17
SLIDE 17

Two types of merging

  • Vertical merge: (a), (b)
  • Horizontal merge: (c), (d), (e)

[Figure: the merge types (a) to (e), from [Bille et al. 13]]

slide-18
SLIDE 18

Horizontal merge

  • Merges two clusters that have the same node as their top boundary node

[Figure: example of merge type (c): top tree node H with left child A and right child B, and the corresponding clusters A and B]

slide-19
SLIDE 19

Vertical merge

  • Merges two clusters in which the bottom boundary node of one is the top boundary node of the other

[Figure: example of merge type (a): top tree node V with left child A and right child B, and the corresponding clusters A and B]

slide-20
SLIDE 20

Top tree construction

  • The top tree is not uniquely determined by the input tree
  • Greedy construction

– Repeat steps 1 to 3 until the tree T becomes a single edge
– 1. Choose as many pairs of clusters that can be horizontally merged as possible
– 2. From the remaining nodes, choose as many pairs of clusters that can be vertically merged as possible
– 3. Merge all the pairs chosen in steps 1 and 2


slide-21
SLIDE 21

Greedy construction

  • Example

[Figure: tree T with nodes 1 to 7; top tree 𝒯 not yet built]

slide-22
SLIDE 22

Greedy construction

  • Example

[Figure: tree T with nodes 1 to 7; top tree 𝒯 with merge nodes H, H, H]

slide-23
SLIDE 23

Greedy construction

  • Example

[Figure: tree T reduced to nodes 1, 2, 3, 4; top tree 𝒯 with merge nodes H, H, H]

slide-24
SLIDE 24

Greedy construction

  • Example

[Figure: tree T with nodes 1, 2, 3, 4; top tree 𝒯 with merge nodes H, H, H, V]

slide-25
SLIDE 25

Greedy construction

  • Example

[Figure: tree T reduced to nodes 1, 2, 4; top tree 𝒯 with merge nodes H, H, H, V]

slide-26
SLIDE 26

Greedy construction

  • Example

[Figure: tree T with nodes 1, 2, 4; top tree 𝒯 with merge nodes H, H, H, V, V]

slide-27
SLIDE 27

Greedy construction

  • Example

[Figure: tree T reduced to nodes 1, 4; top tree 𝒯 with merge nodes H, H, H, V, V]

slide-28
SLIDE 28

Complexity: Greedy method

  • n: #node of an input tree, σ: #label

Theorem [Bille et al. 13]: The height of the top tree made by greedy construction is O(log n).

Theorem [Hübschle-Schneider, Raman 15]: The number of nodes of the top DAG obtained by applying DAG compression to the top tree made by greedy construction is O((n log log_σ n) / log_σ n).

slide-29
SLIDE 29

Operations on top DAG

  • The following operations take O(log n) time
    (x: the x-th node in DFS order, T(x): the subtree rooted at x)

– access(x): label of x
– parent(x): preorder number of the parent of x
– depth(x): depth of x
– height(x): height of x
– size(x): number of nodes in T(x)
– firstchild(x): preorder number of the first child of x
– nextsibling(x): preorder number of the next sibling of x
– la(x, i): preorder number of the i-th ancestor of x
– nca(x, y): preorder number of the nearest common ancestor of x and y


slide-30
SLIDE 30

Proposed method

Construction algorithm Experiment

Top ZDD

slide-31
SLIDE 31

Construction of top ZDD

  • 1. Find a spanning tree of the input ZDD

– The edges not included in the spanning tree are called non-tree edges

  • 2. Transform the spanning tree into a top tree by greedy construction
  • 3. For each non-tree edge, store the edge at the nearest common ancestor of its source node and destination node (edges pointing to sink nodes are an exception)
  • 4. Share equivalent subtrees
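Step 1 can be sketched as follows. The slides do not say which spanning tree is chosen, so the depth-first traversal here is an assumption, as are the dictionary layout, the node names, and the simplification of treating sink nodes like ordinary nodes. Steps 2 to 4 are not covered.

```python
BOT, TOP = "bot", "top"

# Hypothetical ZDD for { {1, 2}, {1, 3}, {2, 3} }:
# node id -> (label, 0-edge target, 1-edge target).
zdd = {
    "n3": (3, BOT, TOP),
    "n2a": (2, BOT, "n3"),
    "n2b": (2, "n3", TOP),
    "n1": (1, "n2a", "n2b"),
}


def spanning_tree(zdd, root):
    """Split the edges reachable from `root` into tree and non-tree edges.

    Edges are triples (source, edge type 0 or 1, destination).
    """
    tree_edges, non_tree_edges = [], []
    seen = {root}
    stack = [root]
    while stack:
        u = stack.pop()
        if u not in zdd:  # sink nodes have no outgoing edges
            continue
        _, lo, hi = zdd[u]
        for bit, v in ((0, lo), (1, hi)):
            if v in seen:
                # v was already reached by another edge: keep this one aside
                non_tree_edges.append((u, bit, v))
            else:
                seen.add(v)
                tree_edges.append((u, bit, v))
                stack.append(v)
    return tree_edges, non_tree_edges


tree_edges, non_tree_edges = spanning_tree(zdd, "n1")
print(len(tree_edges), len(non_tree_edges))
```

Every node other than the root is entered by exactly one tree edge, so the tree edges form a spanning tree of the reachable nodes. The remaining edges are the non-tree edges that step 3 attaches to top tree nodes.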


slide-32
SLIDE 32

Example of construction

  • Step 0. Input

[Figure: original ZDD with labels 1 to 4]

slide-33
SLIDE 33

Example of construction

  • Step 1. Find a spanning tree

[Figure: original ZDD with labels 1 to 4, and its spanning tree with nodes 1 to 7 and ⊥]

slide-34
SLIDE 34

Example of construction

  • Step 2. Construct a top tree

[Figure: original ZDD with labels 1 to 4, spanning tree with nodes 1 to 7 and ⊥, and top tree with merge nodes H, H, H, V, V]

slide-35
SLIDE 35

Example of construction

  • Step 3. Store non tree edges

[Figure: original ZDD with labels 1 to 4, spanning tree with nodes 1 to 7 and ⊥, and top tree with merge nodes H, H, H, V, V holding the non-tree edges]

slide-36
SLIDE 36

Example of construction

  • Step 4. Share equivalent subtrees

[Figure: original ZDD with labels 1 to 4, spanning tree with nodes 1 to 7 and ⊥, and the resulting top ZDD with merge nodes H, H, V, V after sharing]

slide-37
SLIDE 37

Theoretical results

  • Examples

Theorem: Edge traversal takes O(log² n) time.

Theorem: The memory usage of a top ZDD is O(log n) in the best case.

slide-38
SLIDE 38

Experiment

  • Compared data structures

– Top ZDD: memory usage (bytes)
– DenseZDD: memory usage (bytes)
⁎ Static ZDD using succinct data structures [Denzumi et al. 14]
– ZDD: (2n log n + n log σ)/8 (n: #node, σ: #label) (bytes)

  • Data sets

– {S ⊆ U | |S| ≤ B}, where |U| = A
– {S ⊆ U | ∀e∈S, ∃f∈S s.t. |e – f| ≤ B} ∪ U, where U = {1, ..., A}
– 2^U, where |U| = A
– Solutions of knapsack problems
– Sets of matching edges of graphs
– Frequent item sets
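The plain ZDD baseline can be evaluated with a few lines of Python. This sketch assumes that the logarithms are base 2 and rounded up to whole bits, and that the bit count is divided by 8 to obtain bytes, as in the (…)/8 column of the result tables. The function names are hypothetical.

```python
def ceil_log2(x: int) -> int:
    """Smallest k with 2**k >= x, for x >= 1 (bits needed to index x values)."""
    return (x - 1).bit_length()


def plain_zdd_bytes(n: int, num_labels: int) -> float:
    """Size estimate for a pointer-based ZDD.

    Each of the n nodes stores two child pointers of ceil(log2 n) bits
    and one label of ceil(log2 num_labels) bits.
    """
    bits = 2 * n * ceil_log2(n) + n * ceil_log2(num_labels)
    return bits / 8


# Power set 2^U: assumed to need one branching node per element of U.
print(plain_zdd_bytes(1000, 1000))
print(plain_zdd_bytes(50000, 50000))
```

Under these assumptions the function gives 3,750 and 300,000 bytes for the power-set data with A = 1000 and A = 50000, which matches the baseline values reported in the results for 2^U.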


slide-39
SLIDE 39

Experimental results 1/6

  • Data: For an underlying set U of size A, {S ⊆ U | |S| ≤ B}
  • Top ZDDs are 2 to 20 times smaller than DenseZDDs

| Data | top ZDD | DenseZDD | (2n log n + n log c)/8 |
|---|---|---|---|
| A = 100, B = 50 | 3,823 | 9,544 | 9,882 |
| A = 400, B = 200 | 13,614 | 146,550 | 206,025 |
| A = 1000, B = 500 | 43,151 | 966,519 | 1,440,375 |

(Sizes in bytes)

slide-40
SLIDE 40

Experimental results 2/6

  • Data: For an underlying set U of size A, {S ⊆ U | ∀e∈S, ∃f∈S s.t. |e – f| ≤ B} ∪ U
  • Top ZDDs are 100 to 125 times smaller than DenseZDDs

| Data | top ZDD | DenseZDD | (2n log n + n log c)/8 |
|---|---|---|---|
| A = 500, B = 250 | 2,431 | 227,798 | 321,594 |
| A = 1000, B = 500 | 2,511 | 321,594 | 1,440,375 |

(Sizes in bytes)

slide-41
SLIDE 41

Experimental results 3/6

  • Data: For an underlying set U of size A, the power set 2^U
  • Top ZDDs are highly effective because the inputs have a simple structure

| Data | top ZDD | DenseZDD | (2n log n + n log c)/8 |
|---|---|---|---|
| A = 1000 | 2,254 | 4,185 | 3,750 |
| A = 50000 | 2,464 | 178,764 | 300,000 |

(Sizes in bytes)

slide-42
SLIDE 42

Experimental results 4/6

  • Data: Solutions of knapsack problems

– w_i ∈ [1, W]: random weights (w_i ≥ w_{i+1}), C: capacity

  • Top ZDDs are better than DenseZDDs

| Data | top ZDD | DenseZDD | (2n log n + n log c)/8 |
|---|---|---|---|
| A = 100, W = 1000, C = 10000 | 1,658,494 | 1,730,401 | 2,444,405 |
| A = 200, W = 100, C = 5000 | 1,032,596 | 1,516,840 | 2,181,688 |
| A = 1000, W = 100, C = 1000 | 2,080,925 | 2,929,191 | 4,491,025 |
| A = 5000, W = 100, C = 200 | 1,135,613 | 1,740,841 | 2,884,279 |
| A = 1000, W = 10, C = 1000 | 1,382,933 | 2,618,970 | 3,990,350 |
| A = 1000, W = 100, C = 1000 | 565,500 | 656,728 | 1,056,907 |

slide-43
SLIDE 43

Experimental results 5/6

  • Data: Sets of matching edges of graphs
  • Top ZDDs lose in one case, but even there they are less than 1.5 times larger than DenseZDDs

| Graph | top ZDD | DenseZDD | (2n log n + n log c)/8 |
|---|---|---|---|
| 8 × 8 grid graph | 12,196 | 16,150 | 18,014 |
| Complete graph K12 | 23,038 | 16,304 | 25,340 |
| "Interroute" | 30,780 | 39,831 | 50,144 |

(Sizes in bytes. "Interroute" is a graph of a real network: http://www.topology-zoo.org/dataset.html)

slide-44
SLIDE 44

Experimental results 6/6

  • Data: Frequent item sets (p: threshold)
  • The differences between top ZDDs and DenseZDDs are small

| Data set | top ZDD | DenseZDD | (2n log n + n log c)/8 |
|---|---|---|---|
| "mushroom" (p = 0.001) | 104,580 | 91,757 | 123,576 |
| "retail" (p = 0.00025) | 59,854 | 65,219 | 62,766 |
| "T40I10D100K" (p = 0.005) | 224,378 | 188,400 | 248,656 |

(Sizes in bytes. Data are from http://fimi.uantwerpen.be/data/)

slide-45
SLIDE 45

Conclusion

  • Proposed a compression method for ZDDs

– Extends top tree compression
– Chooses a spanning tree from the input ZDD
– Unlike DenseZDD, top ZDD does not separate the spanning tree and the non-tree edges
– Experiments showed the efficiency of top ZDDs

  • Future work

– Dynamic programming on top ZDDs
– Faster operations on top ZDDs
– Finding better spanning trees for compression
– Complexity of finding an optimal spanning tree
– Applying the proposed method to general DAGs


slide-46
SLIDE 46

Thank you for listening!