Graph Compression Using Quadtrees Dr. Amlan Chatterjee Computer - - PowerPoint PPT Presentation

graph compression using
SMART_READER_LITE
LIVE PREVIEW

Graph Compression Using Quadtrees Dr. Amlan Chatterjee Computer - - PowerPoint PPT Presentation

Graph Compression Using Quadtrees Dr. Amlan Chatterjee Computer Science Department California State University, Dominguez Hills April 27, 2016 Outline Storing graphs 1 Quadtree representation of graphs 2 Special graphs 3 Modifying


slide-1
SLIDE 1

Graph Compression Using Quadtrees

  • Dr. Amlan Chatterjee

Computer Science Department California State University, Dominguez Hills April 27, 2016

slide-2
SLIDE 2

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

2

Outline

Storing graphs Quadtree representation of graphs Special graphs Modifying graphs for efficient storage Hybrid approach Other representations

1 2 3 4 5 6

slide-3
SLIDE 3

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

3

Outline

Storing graphs Quadtree representation of graphs Special graphs Modifying graphs for efficient storage Hybrid approach Other representations

1 6 2 3 4 5

slide-4
SLIDE 4

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

4

  • Consider the following graph G = (V,E)

The adjacency matrix representation is given by: 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0

Storing Graphs

slide-5
SLIDE 5

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

5

  • Consider the following graph G = (V,E)

The adjacency list representation is given by:

Storing Graphs

slide-6
SLIDE 6

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

6

  • Consider the following graph G = (V,E)

The adjacency array representation is given by:

Storing Graphs

slide-7
SLIDE 7

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

7

Storing Graphs (contd.)

  • The values in the adjacency matrix can be stored

using boolean data type. For the sample graph G = (V, E), where |V| = 8, it would require 8x8 = 64 bytes.

  • Since the value is either 0 or 1, using bits instead of

boolean the size of the adjacency matrix can be reduced

  • Size required to store adjacency matrix using bit array

for the sample graph G: (nxn)/8 bytes = 8x8/8 bytes = 8 bytes

slide-8
SLIDE 8

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

8

Storing Graphs (contd.)

  • For the sample graph G = (V, E), where |V| = n and

|E| = m, using boolean data type and assuming 64-bit pointer (i.e., 8 byte pointer), the space required for the adjacency list is: 2m*64 + 2mlog(n) + nlogn

(size of pointers) (node numbers) (size of each list)

  • Similarly, the space required for the adjacency array

is: n*64 + 2mlog(n) + nlogn

(size of pointers) (node numbers) (size of each list)

slide-9
SLIDE 9

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

9

Storing Graphs (contd.)

Considering the previous example graph G = (V,E) For the above graph, n = 8, m = 16. The adjacency matrix size is: 64 bits The adjacency list size is: 2168 bits The adjacency array size is: 632 bits

slide-10
SLIDE 10

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

10

Other techniques to store graphs

  • Other than using adjacency matrix, adjacency

list and adjacency array, the following are some

  • ther common techniques to store graphs
  • Unordered edge sequences:

– The data is represented as pair values, each indicating the pair of vertices where an edge exists

  • Incidence matrix

– Two edges are said to be incident if they share a vertex; incidence matrix contains data with respect to edges

slide-11
SLIDE 11

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

11

Outline

Storing graphs Quadtree representation of graphs Special graphs Modifying graphs for efficient storage Hybrid approach Other representations

6 3 4 5 2 1

slide-12
SLIDE 12

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

12

Quadtree

  • Quadtree is a data structure which is used to

normally represent images using partitioning of the two dimensional space by recursively subdividing into four quadrants or regions; each internal node of the quadtree has exactly four children

  • Quadtrees can also be used to store graphs

efficiently

slide-13
SLIDE 13

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

13

Sample Data Points in a 2-D space & Quadtree

slide-14
SLIDE 14

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

14

The adjacency matrix: size 64bits Quadtree representation: 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 Size: 10 bits (5 elements) Graph G = (V,E)

Quadtree representation of a graph

slide-15
SLIDE 15

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

15

Quadtree representation of a graph (contd.)

  • Given a quadtree, the entire graph information can be stored in

the form of an array using bits

  • The quadrants are converted and stored according to the row

major order of the adjacency matrix

  • The contents of the bit array are stored as follows

0: all 0's in quadrant 1: all 1's in quadrant 2: 0’s in diagonal, and rest 1’s 3: the quadrant needs to be expanded further

  • Since there are only 4 types of values, using 2 bits for each is

enough

slide-16
SLIDE 16

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

16

Byte representation of the Quadtree: The adjacency matrix: size 64bits Quadtree representation: 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 Graph G = (V,E) Q = {3, 0, 1, 1, 0}

Quadtree representation of a graph (contd.)

slide-17
SLIDE 17

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

17

  • Consider the following graph

The adjacency matrix: Quadtree: 0 1 0 1 0 1 0 1 1 0 1 0 1 0 1 0 0 1 0 1 0 1 0 1 1 0 1 0 1 0 1 0 0 1 0 1 0 1 0 1 1 0 1 0 1 0 1 0 0 1 0 1 0 1 0 1 1 0 1 0 1 0 1 0

Size: 21 elements: 42 bits

Graph G = (V,E)

Representing Graphs

slide-18
SLIDE 18

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

18

  • The previous example graph using different

numbering

The adjacency matrix: Quadtree: 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 Size: 5 elements: 10 bits Graph G = (V,E)

Representing Graphs (contd.)

slide-19
SLIDE 19

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

19

Numbering matters

  • The numbering of the nodes of the graph G = (V, E)

matters when represented using quadtrees

  • The adjacency matrix representation varies

according to the node numbering

  • In quadtrees, quadrants with uniform values don’t

expand further, while others do increasing the

  • verall space required
slide-20
SLIDE 20

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

20

Problem statement

  • The size required for representing the graphs is

directly proportional to the number of quadrants that are non-uniform

  • Since the adjacency matrix varies with the

numbering of the nodes, some combinations might be better than others

  • Therefore, the problem at hand can be stated

as: Given a graph G = (V,E), does there exist a numbering Y: v -> v', such that the number of quadrants that have to be expanded is the smallest

slide-21
SLIDE 21

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

21

Outline

Storing graphs Quadtree representation of graphs Special graphs Modifying graphs for efficient storage Hybrid approach Other representations

6 4 5 1 3 2

slide-22
SLIDE 22

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

22

Special graphs

Following are some of the special graphs that we consider for representing using quadtrees:

  • a. Complete bipartite graph
  • b. Complete k-partite graph
  • c. Block graphs
  • d. Chordal graphs
slide-23
SLIDE 23

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

23

  • a. Complete bipartite graph

1 2 3 4 5 6 7 8

0 0 0 1 1 1 1 1 0 0 0 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1 1 0 0 0 0 0 The adjacency matrix: Quadtree: Graph G = (V,E)

Size: Adjacency matrix: 64 bits Quadtree: 82 bits

Special graphs (contd.)

slide-24
SLIDE 24

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

24

  • b. Complete k-partite graph

3 1 2 5 4 8 6 7

0 0 0 1 1 1 1 1 0 0 0 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 0 0 0 1 1 1 1 1 0 0 0 The adjacency matrix: Quadtree: Graph G = (V,E)

Size: Adjacency matrix: 64 bits Quadtree: 106 bits

Special graphs (contd.)

slide-25
SLIDE 25

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

25

  • c. Block graphs

1 2 3 4 5 6 7 8

0 1 1 1 0 0 0 0 1 0 1 1 0 0 0 0 1 1 0 1 0 0 0 0 1 1 1 0 1 1 1 1 0 0 0 1 0 1 1 1 0 0 0 1 1 0 1 1 0 0 0 1 1 1 0 1 0 0 0 1 1 1 1 0 The adjacency matrix: Quadtree: Graph G = (V,E)

Size: Adjacency matrix: 64 bits Quadtree: 58 bits

Special graphs (contd.)

slide-26
SLIDE 26

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

26

  • d. Chordal graphs
  • Definition: An undirected graph G = (V, E) is chordal

(triangulated, rigid circuit) if every cycle of length greater than three has a chord: namely an edge connecting two non-consecutive vertices on the cycle.

3 1 2 5 4 8 6 7

Given graph G Quadtree representation Size: Adjacency matrix: 64 bits Quadtree: 61 elements; 122 bits 0 1 1 0 0 0 0 0 1 0 1 1 1 0 0 1 1 1 0 1 0 0 0 0 0 1 1 0 1 1 0 0 0 1 0 1 0 1 1 1 0 0 0 1 1 0 1 0 0 0 0 0 1 1 0 0 0 1 0 0 1 0 0 0 The adjacency matrix:

Special graphs (contd.)

slide-27
SLIDE 27

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

27

Special graphs (contd.)

Chordal graphs (contd.) In a graph G = (V, E), a vertex v is called simplicial if and only if the subgraph of G induced by the vertex set {v} ∪ N(v) is a complete graph, where N(v) is the set of neighboring vertices of v. A graph G on n vertices is said to have a perfect elimination ordering if and only if there is an ordering {v1, . . . vn} of G’s vertices, such that each vi is simplicial in the subgraph induced by the vertices {v1, . . . vi}. The chordal graphs may also be characterized as the graphs that have perfect elimination orderings.

slide-28
SLIDE 28

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

28

Chordal graphs (contd.) Perfect Elimination Ordering (PEO): Using the sample graph G=(V,E), a PEO is shown below

3 1 2 5 4 8 6 7

Given graph G PEO: 1 ,3 , 8 , 7 , 6 , 2 , 4 , 5

Special graphs (contd.)

slide-29
SLIDE 29

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

29

Chordal graphs (contd.) Renumbering the nodes according to the PEO; the old numbers are shown in parentheses

Given graph G

0 1 0 0 0 1 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 1 0 1 0 0 0 0 1 0 0 1 0 0 0 1 0 0 1 1 1 1 1 0 0 0 1 1 0 1 0 0 1 1 0 1 0 0 1 1 1 1 1 0

The adjacency matrix:

Size: 45 elements: 90 bits So, the size decreases by 32 bits by renumbering according to PEO

Quadtree representation

Special graphs (contd.)

slide-30
SLIDE 30

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

30

Outline

Storing graphs Quadtree representation of graphs Special graphs Modifying graphs for efficient storage Hybrid approach Other representations

6 5 1 2 4 3

slide-31
SLIDE 31

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

31

Consider the following graph

0 1 0 0 0 1 0 1 1 0 0 0 0 1 1 0 0 0 0 0 0 1 0 1 0 0 0 0 1 0 0 1 0 0 0 1 0 0 1 1 1 1 1 0 0 0 1 1 0 1 0 0 1 1 0 1 1 0 1 1 1 1 1 0

The adjacency matrix: Quadtree representation

Size: 37 elements: 74 bits

Modifying graphs

slide-32
SLIDE 32

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

32

Consider the following graph

0 1 0 0 0 1 0 1 1 0 0 0 0 1 1 0 0 0 0 0 0 1 0 1 0 0 0 0 1 0 0 1 0 0 0 1 0 0 1 1 1 1 1 0 0 0 1 1 0 1 0 0 1 1 0 1 1 0 1 1 1 1 1 0

The adjacency matrix: Quadtree representation

Size: 37 elements: 74 bits This edge has been added to the PEO numbered chordal graph Additional edge info: 6 bits Total space required to store the PEO numbered chordal graph: 80 bits Space reduced by: 10 bits

Modifying graphs

slide-33
SLIDE 33

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

33

Consider the following graph

0 1 0 0 0 1 0 1 1 0 0 0 0 1 1 0 0 0 0 0 0 1 0 1 0 0 0 0 1 0 1 0 0 0 0 1 0 0 1 1 1 1 1 0 0 0 1 1 0 1 0 1 1 1 0 1 1 0 1 0 1 1 1 0

The adjacency matrix: Quadtree representation

Size: 29 elements: 58 bits

Modifying graphs (Contd.)

slide-34
SLIDE 34

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

34

Consider the following graph

0 1 0 0 0 1 0 1 1 0 0 0 0 1 1 0 0 0 0 0 0 1 0 1 0 0 0 0 1 0 1 0 0 0 0 1 0 0 1 1 1 1 1 0 0 0 1 1 0 1 0 1 1 1 0 1 1 0 1 0 1 1 1 0

The adjacency matrix: Quadtree representation

Size: 29 elements: 58 bits These edges have been added to the PEO numbered chordal graph Edge (4,8) has been removed Edge information for 3 edges need to be stored (2 removed, 1 added) Extra space required: 18 bits (6 bits per edge) Total space: 76 bits Space Reduced by : 14 bits

Modifying graphs (Contd.)

slide-35
SLIDE 35

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

35

Modifying graphs (Contd.)

  • Further modifications can be made to the chordal graph

to reduce space required

  • In addition to adding (1,8), (4,7) and removing (4,8), the

following needs to change – Add (2,5) – Remove (2,6)

  • These changes reduce the quadtree size to 42 bits;

additional 30 bits are required to store the modified edge information

  • The total size for this case is 72 bits, which is

significantly reduced from the original size of 122 bits for the chordal graph

slide-36
SLIDE 36

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

36

Outline

Storing graphs Quadtree representation of graphs Special graphs Modifying graphs for efficient storage Hybrid approach Other representations

6 1 2 3 5 4

slide-37
SLIDE 37

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

37

  • For many cases, the quadtree representation for

graphs of size 8 nodes require more than 64 bits, which is inefficient compared to the adjacency matrix representation

  • However, for larger graphs, the quadtree approach is

efficient compared to other data structures

  • Even for larger graphs, when the quadrant reduces to

8x8 bits, the quadtree would require more space for further reductions

  • Therefore, a hybrid approach, where the recursive

division of the quadrants stop whenever the quadrant size reaches 8x8 is a better technique

Hybrid approach

slide-38
SLIDE 38

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

38

  • In the byte representation of the quadtree, an

additional bit for each node is required to indicate whether the quadrant is further expanded or represented using adjacency matrix

  • Although this would need additional bits, overall the

space required decreases.

Hybrid approach (Contd.)

slide-39
SLIDE 39

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

39

Outline

Storing graphs Quadtree representation of graphs Special graphs Modifying graphs for efficient storage Hybrid approach Other representations

1 2 3 4 6 5

slide-40
SLIDE 40

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

40

1-bit coding

  • Real-world graphs are usually sparse, and most
  • f the quadrants consist of just 0's in them.
  • Hence, instead of using 2-bits to represent each

element, the method can be modified to use 1- bit for each element; in this case a quadrant with all 0's is represented by a 0, else it is broken into smaller quadrants which is denoted by a 1.

slide-41
SLIDE 41

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

41

Comparison of sizes: 1024 node graph

Technique Size (bits) % Compared to Adjacency matrix Adjacency Matrix 1048576 100 2-bit 261370 24.93 1-bit 130873 12.48

slide-42
SLIDE 42

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

42

What about using 3 bit coding?

slide-43
SLIDE 43

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

43

Comparison of sizes: 1024 node graph

Technique Size (bits) % Compared to Adjacency matrix Adjacency Matrix 1048576 100 2-bit 261370 24.93 1-bit 130873 12.48 3-bit 300227 28.63 3-bit hybrid 125538 11.97 3-bit hybrid + folding 62889 6

slide-44
SLIDE 44

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

44

Topological Information helps?

  • The patterns used for the 3-bit compression

are fixed for all graphs

  • Using the topology for specific graphs or

domains, relevant patterns can be chosen

  • This provides an additional 30%

compression

slide-45
SLIDE 45

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

45

Conclusion

  • Comparing with the adjacency matrix

representation, all the techniques achieve more than 70% compression.

  • The techniques with 1-bit and 3-bits outperform

the 2-bit one

  • Since real-world graphs are sparse, the 3-bit

technique does not reach it’s potential with many of the patterns reporting low counts of

  • ccurrences.
  • In other domains, where the graphs are denser,

the 3-bit compression schemes should be able to take advantage of the common patterns and perform better.

slide-46
SLIDE 46

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

46

  • Questions..??
slide-47
SLIDE 47

Graph Compression Using Quadtrees 4/27/2016 Amlan Chatterjee

47 [1] L.S. Chandran, L. Ibarra, F. Ruskey, J. Sawada, Generating and characterizing the perfect elimination orderings of a chordal graph, Theoretical Computer Science, 307 (2003), pp. 303–317 [2] H. Samet, Using quadtree to represent spatial data, NATO ASI Series, Vol. F18, pp.229 – 247, 1985

[3] M. Nelson, S. Radhakrishnan, A. Chatterjee, and C. N. Sekharan. On compressing massive streaming graphs with Quadtrees, 2015 IEEE International Conference on Big Data, pages 2409–2417, Oct 2015.

References