
Towards Graph (Re-)Compression
Design decisions and first results
Stefan Böttcher, University of Paderborn

Why re-compression of a compressed graph?
Large graphs → it takes a "long time" to find a "good" compression.
Idea: instead,
- do any compression "fast" and in parallel on small sub-graphs → get compressed sub-graphs "fast"
- re-compress the compressed sub-graphs → re-compression time depends on the size of the compressed sub-graphs

Overview of steps towards re-compressed graphs
For each data structure, compression comes first and re-compression second:
- strings: string compression, string re-compression
- ordered trees: ordered tree compression, ordered tree re-compression
- unordered trees: unordered tree compression, unordered tree re-compression
- graphs: graph compression, graph re-compression
The steps lead from strings to ordered trees, from ordered trees to unordered trees, and from unordered trees to graphs.

Why digram-based compression?
S → b c d e c d e c d
is compressed to
S → b N e N e N
N → c d
and then to
S → b N M M
M → e N
N → c d
Replacing digram occurrences
- uses a "look for the smallest repeated pattern first" approach
- substitutes larger, frequently occurring patterns in multiple steps

(Re-)Compression by replacing a most frequent digram
S → b c d b c d
then
S → b N b N
N → c d
then
S → M M
M → b N
N → c d
then
S → M M
M → b c d

(Re-)Compression algorithm for strings / trees / graphs:
while at least one digram occurs more than once
- choose a most frequent digram D (e.g. c d)
- (if re-compression: isolate all occurrences of D by smart inlining)
- replace each occurrence of digram D by a new nonterminal N, which is thereafter treated as a terminal, i.e. not cut off again
- introduce a grammar rule (e.g. N → c d)
- inline rules called only once (e.g. N → c d)
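The loop above can be sketched for the string case as follows. This is only an illustration under assumptions, not the author's implementation: the function names and the nonterminal names N0, N1, ... are hypothetical, terminals are assumed never to be named like nonterminals, ties between equally frequent digrams are broken arbitrarily, and rules called only once are inlined after the loop rather than inside it.

```python
from collections import Counter


def count_digrams(seq):
    """Count non-overlapping occurrences of each adjacent pair in seq."""
    counts, last_pos = Counter(), {}
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        if last_pos.get(pair) == i - 1:
            continue  # overlaps the previous occurrence, e.g. in "a a a"
        counts[pair] += 1
        last_pos[pair] = i
    return counts


def replace_digram(seq, digram, name):
    """Replace occurrences of digram by name, scanning left to right."""
    out, i = [], 0
    while i < len(seq):
        if tuple(seq[i:i + 2]) == digram:
            out.append(name)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def inline_single_use(start, rules):
    """Inline every rule that is called exactly once."""
    while True:
        uses = Counter(s for body in [start, *rules.values()] for s in body)
        once = [name for name in rules if uses[name] == 1]
        if not once:
            return start, rules
        name = once[0]
        body = rules.pop(name)

        def subst(seq):
            return [x for s in seq for x in (body if s == name else [s])]

        start = subst(start)
        rules = {head: subst(rhs) for head, rhs in rules.items()}


def compress(seq):
    """Repeatedly replace a most frequent digram by a new nonterminal."""
    start, rules = list(seq), {}
    while True:
        counts = count_digrams(start)
        if not counts or counts.most_common(1)[0][1] < 2:
            break  # no digram occurs more than once
        digram = counts.most_common(1)[0][0]
        name = f"N{len(rules)}"  # hypothetical naming scheme
        rules[name] = list(digram)
        start = replace_digram(start, digram, name)
    return inline_single_use(start, rules)


def expand(start, rules):
    """Decompress a grammar back into the list of terminals."""
    out = []
    for s in start:
        out.extend(expand(rules[s], rules) if s in rules else [s])
    return out
```

For the slide's input, `compress("b c d b c d".split())` yields a start rule with two copies of one nonterminal whose rule body is b c d, which matches the final grammar S → M M, M → b c d up to renaming.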

Digrams for strings and for trees
A digram is a pair of typed items (c,d) in a given relationship r.
String: b c d e c d e c d
- digram (c,d) with r is "d follows c"
Tree: [Figure: example tree over the labels b, c, d, e, compressed with the rule N → c(y1, d)]
- digram (c,d) with r is "d is the second child of c"
Unordered tree: [Figure: example tree over the labels b, c, d, e] edge order does not matter, like in graphs
- digram (c,d) with r is "d is a child of c"
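The three digram notions above differ only in the relationship r, which a short sketch can make concrete. The tree encoding (a pair of label and child list) and the example tree are assumptions for illustration and are not taken from the slide's figure; the string counter counts all adjacent pairs, including overlapping ones.

```python
from collections import Counter


def string_digrams(seq):
    """Digram (c, d) with r = 'd follows c'."""
    return Counter(zip(seq, seq[1:]))


def ordered_tree_digrams(tree):
    """Digram (c, i, d) with r = 'd is the i-th child of c'."""
    label, children = tree
    counts = Counter()
    for i, child in enumerate(children, start=1):
        counts[(label, i, child[0])] += 1
        counts += ordered_tree_digrams(child)
    return counts


def unordered_tree_digrams(tree):
    """Digram (c, d) with r = 'd is a child of c'; child order is ignored."""
    label, children = tree
    counts = Counter()
    for child in children:
        counts[(label, child[0])] += 1
        counts += unordered_tree_digrams(child)
    return counts


# Hypothetical example tree: root a with the subtrees c(b, d) and c(e, d).
example = ("a", [("c", [("b", []), ("d", [])]),
                 ("c", [("e", []), ("d", [])])])
```

In this example tree, "d is the second child of c" occurs twice, so replacing it with a rule such as N → c(y1, d) would pay off, while "b is the first child of c" occurs only once.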

Digrams for a graph with labeled nodes and labeled edges
A digram is a pair of typed items (c,d) in a given relationship r.
Graph: [Figure: example graph with the labels b, c, d, e, f]
- digram (f,b) with r is "nodes f and b are connected by a hyperedge from f to b"
- digram (d,e) with r is "there is a node shared by an incoming hyperedge d and an outgoing hyperedge e"
- digram (b,e) with r is "node b has an outgoing hyperedge e"
- digram (d,b) with r is "node b has an incoming hyperedge d"
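The four relationships can be enumerated with a small sketch. It simplifies the slide's setting: edges are plain directed edges with one source and one target rather than general hyperedges, all occurrences are counted even if they overlap, and the relationship tags ("node-node", "in-out", "node-out", "in-node") as well as the example graph are hypothetical names chosen here.

```python
from collections import Counter, defaultdict


def graph_digrams(node_labels, edges):
    """Count digram occurrences in a graph.

    node_labels maps node id -> label; edges is a list of
    (source id, edge label, target id) triples.
    """
    incoming, outgoing = defaultdict(list), defaultdict(list)
    counts = Counter()
    for src, label, dst in edges:
        outgoing[src].append(label)
        incoming[dst].append(label)
        # two nodes connected by an edge from src to dst
        counts[("node-node", node_labels[src], node_labels[dst])] += 1
    for node, node_label in node_labels.items():
        for out_label in outgoing[node]:
            # the node has an outgoing edge
            counts[("node-out", node_label, out_label)] += 1
        for in_label in incoming[node]:
            # the node has an incoming edge
            counts[("in-node", in_label, node_label)] += 1
            for out_label in outgoing[node]:
                # an incoming and an outgoing edge share this node
                counts[("in-out", in_label, out_label)] += 1
    return counts


# Hypothetical graph: f --d--> b --e--> c
nodes = {1: "f", 2: "b", 3: "c"}
edges = [(1, "d", 2), (2, "e", 3)]
digrams = graph_digrams(nodes, edges)
```

A real graph compressor would also have to decide which overlapping occurrences may be replaced together; that choice is left out of this sketch.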

Re-compression of a compressed string / tree / graph
A string / tree / graph
S → d c d c d c
that has been compressed to
S → d N N c
N → c d
can be re-compressed to
S → M M M
M → d c
to get a better compression.

Re-compress a compressed string: 1. Count digrams
S → d N N c
N → c d

digram generator    generated digram
d N                 d c
N                   c d (occurs twice)
N N                 d c
N c                 d c

→ (d,c) with r = "c follows d" is the most frequent digram in the decompressed string (three occurrences, versus two for (c,d))

2. Isolate a most frequent digram by smart inlining
Task: isolate the most frequent digram (d,c) with r = "c follows d"
S → d c N c N c
N → c e f g d
Needed: partial decompression of N to isolate d from N.
New rules that isolate d from the end of N:
N → N-d d
N-d → c e f g
Trick: inline the rewritten rule N → N-d d instead of N → c e f g d:
S → d c N-d d c N-d d c
Finally, substitute the digrams (d,c) with a new nonterminal M:
S → M N-d M N-d M
M → d c
N-d → c e f g
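The isolation step for this example can be sketched as follows. It is a simplified illustration, not the author's algorithm: it only isolates a terminal at the end of a rule (isolating a terminal at the front, or one hidden inside a nested nonterminal, would need the same idea applied recursively), and the function names are hypothetical. The naming N-d follows the slide.

```python
def isolate_last(rules, name):
    """Split the last terminal off rule `name` and inline only that split."""
    *prefix, last = rules[name]
    if last in rules:
        raise ValueError("sketch handles only a terminal at the end of the rule")
    rest = f"{name}-{last}"  # e.g. N-d, the rule for N without its final d
    new_rules = {}
    for head, body in rules.items():
        if head == name:
            continue
        # inline N -> N-d d at every call of N instead of the full rule body
        new_rules[head] = [x for s in body
                           for x in ([rest, last] if s == name else [s])]
    new_rules[rest] = prefix
    return new_rules


def replace_digram(seq, digram, name):
    """Replace occurrences of digram by name, scanning left to right."""
    out, i = [], 0
    while i < len(seq):
        if tuple(seq[i:i + 2]) == digram:
            out.append(name)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def replace_in_grammar(rules, digram, name):
    """Replace the now isolated digram in every rule body and add its rule."""
    new_rules = {head: replace_digram(body, digram, name)
                 for head, body in rules.items()}
    new_rules[name] = list(digram)
    return new_rules


grammar = {"S": "d c N c N c".split(), "N": "c e f g d".split()}
isolated = isolate_last(grammar, "N")
recompressed = replace_in_grammar(isolated, ("d", "c"), "M")
```

The point of the trick is that only one symbol of N is exposed: the body c e f g stays shared in N-d, so the grammar grows by a constant amount instead of by the full length of N at every call.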

Re-compress a compressed ordered tree: 1. Count digrams
How to count all digrams generated by tree grammars?
[Figure: example rules A → C(b, D), C → r(…) with parameters y1 and y2, D → h(…), and the tree derived from A; A, C, D may be called several times]
The parent node (C) does not determine a digram, but the child (D) does.
Each non-root, non-parameter node (e.g. D) in the RHS of each rule of an SLT grammar represents (a child of) a digram
→ count the calls of rule A for the digram represented by child node D
→ O(size(G)) [ICDE2016]

2. Smarter inlining needed for ordered tree grammars
To isolate a digram:
- isolate the root terminal of the tree generated by D
- isolate the parent of the 2nd parameter of the tree generated by C
[Figure: example rules A → C(b, D), C → r(…) with parameters y1 and y2, D → h(…), and the tree derived from A]
This needs smarter inlining:
[Figure: the rules for C and A rewritten in terms of the new nonterminals C-r and C-e]

Tree grammar re-compression: compression ratio

                                      EXI-Weblog  EXI-Telecomp  NCBI    Medline  XMark    Treebank
#edges                                39          39            71      13096    34649    52266
compression ratio                     0 %         0.05 %        0.06 %  4.71 %   7.94 %   20.67 %
compression ratio with max blow-up    0 %         0.09 %        0.11 %  4.89 %   11.38 %  21.26 %

[Chart: sizes of the intermediate grammar (max) and the final grammar, on an axis from 0% to 200%]
Smarter inlining yields intermediate blow-ups of a factor of 2 at most.

Document generated from a seed by 5000 updates:
- re-compression after every 100 updates: blow-ups of a factor of 5 at most
- without re-compression: blow-up up to a factor of 400
