
Why re-compression of a compressed graph?



  1. Towards Graph (Re-)Compression: Design decisions and first results. Stefan Böttcher, University of Paderborn

  2. Why re-compression of a compressed graph?
     - Large graphs → it takes a "long time" to find a "good" compression.
     - Idea: instead, do any compression "fast" and in parallel on small sub-graphs → get compressed sub-graphs "fast".
     - Re-compress the compressed sub-graphs → the re-compression time depends on the size of the compressed sub-graph.

  3. Overview of steps towards re-compressed graphs
     - string compression, string re-compression
     - ordered tree compression, ordered tree re-compression
     - unordered tree compression, unordered tree re-compression
     - graph compression, graph re-compression
     The steps lead from strings to ordered trees, from ordered trees to unordered trees, and from unordered trees to graphs, for both compression and re-compression.

  4. Why digram-based compression?
     S → b c d e c d e c d
     S → b N e N e N,  N → c d
     S → b N M M,  M → e N
     - Replacing digram occurrences uses a "look for smallest repeated pattern first" approach.
     - Larger frequently occurring patterns are substituted in multiple steps.

  5. (Re-)Compression by replacing a most frequent digram
     S → b c d b c d
     S → b N b N,  N → c d
     S → M M,  M → b N,  N → c d
     S → M M,  M → b c d
     (Re-)Compression algorithm for strings / trees / graphs:
     while at least one digram occurs more than once
       - choose a most frequent digram D (e.g. c d)
       - (if re-compression: isolate all occurrences of D by smart inlining)
       - replace each occurrence of digram D by a new nonterminal N, which is thereafter treated as a terminal, i.e. not cut off again
       - introduce a grammar rule (e.g. N → c d)
     inline rules called only once (e.g. N → c d)
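The loop on slide 5 can be sketched for plain strings as follows. This is a minimal illustration, not the author's implementation: the function names, the generated nonterminal names `N1`, `N2`, ... and the tie-breaking rule (first digram encountered wins) are assumptions, and inlining of rules called only once is assumed to run once after the loop.

```python
from collections import Counter


def count_digrams(seq):
    """Count non-overlapping occurrences of each adjacent pair in seq."""
    counts = Counter()
    last_end = {}  # digram -> index of the second item of its last counted occurrence
    for i in range(len(seq) - 1):
        digram = (seq[i], seq[i + 1])
        if last_end.get(digram, -1) >= i:  # overlaps the previous occurrence (e.g. "aaa")
            continue
        counts[digram] += 1
        last_end[digram] = i + 1
    return counts


def replace_digram(seq, digram, nonterminal):
    """Replace each occurrence of digram, scanning left to right."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == digram:
            out.append(nonterminal)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def inline_single_use(start, rules):
    """Inline every rule that is called exactly once."""
    while True:
        uses = Counter(s for rhs in [start, *rules.values()] for s in rhs if s in rules)
        once = [nt for nt in rules if uses[nt] == 1]
        if not once:
            return start, rules
        nt = once[0]
        body = rules.pop(nt)

        def substitute(rhs):
            return [t for s in rhs for t in (body if s == nt else [s])]

        start = substitute(start)
        rules = {head: substitute(rhs) for head, rhs in rules.items()}


def compress(text):
    """Return (start rule body, rules) for a string of one-character terminals."""
    start, rules = list(text), {}
    while True:
        counts = count_digrams(start)
        if not counts:
            break
        digram, freq = counts.most_common(1)[0]
        if freq < 2:  # no digram occurs more than once
            break
        nonterminal = f"N{len(rules) + 1}"  # hypothetical naming scheme
        rules[nonterminal] = list(digram)
        start = replace_digram(start, digram, nonterminal)
    return inline_single_use(start, rules)


start, rules = compress("bcdbcd")
print(start, rules)
```

For the input `b c d b c d` this sketch picks the digram (b,c) first rather than (c,d), but after inlining it arrives at the same final grammar shape as the slide: a start rule with two calls of one nonterminal that derives `b c d`.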

  6. Digrams for strings and for trees
     A digram is a pair of typed items (c,d) in a given relationship r.
     - String: b c d e c d e c d; digram (c,d) with r is "d follows c"
     - Tree: digram (c,d) with r is "d is the second child of c"
       [Figure: example ordered tree with nodes b, c, d, e and a digram rule for N with parameter y1]
     - Unordered tree: edge order does not matter, like in graphs; digram (c,d) with r is "d is a child of c"
       [Figure: example unordered tree with nodes b, c, d, e]
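To make the role of the relationship r concrete, the following sketch counts digrams for the three cases on slide 6. The tree encoding as `(label, [children])` tuples and the function names are assumptions made for this illustration; the counts are raw counts and do not exclude overlapping occurrences.

```python
from collections import Counter


def string_digrams(s):
    """Digram (c, d) with r = "d follows c"."""
    return Counter(zip(s, s[1:]))


def tree_digrams(tree, ordered=True):
    """Count parent/child digrams in a tree given as (label, [children]).

    ordered=True:  key (c, i, d) means "d is the i-th child of c"
    ordered=False: key (c, d)    means "d is a child of c"
    """
    counts = Counter()
    stack = [tree]
    while stack:
        label, children = stack.pop()
        for index, child in enumerate(children, start=1):
            key = (label, index, child[0]) if ordered else (label, child[0])
            counts[key] += 1
            stack.append(child)
    return counts


# Hypothetical example tree: c(b, d(c(e, d)))
tree = ("c", [("b", []), ("d", [("c", [("e", []), ("d", [])])])])

print(string_digrams("bcdecdecd")[("c", "d")])
print(tree_digrams(tree)[("c", 2, "d")])
print(tree_digrams(tree, ordered=False)[("c", "d")])
```

In the ordered case the child position is part of the digram type, so "d is the first child of c" and "d is the second child of c" are counted separately; in the unordered case they fall together.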

  8. Digrams for a graph with labeled nodes and labeled edges
     A digram is a pair of typed items (c,d) in a given relationship r.
     [Figure: example graph with the labels b, c, d, e, f]
     - digram (f,b) with r is "nodes f and b are connected by a hyperedge from f to b"
     - digram (d,e) with r is "there is a node shared by an incoming hyperedge d and an outgoing hyperedge e"
     - digram (b,e) with r is "node b has an outgoing hyperedge e"
     - digram (d,b) with r is "node b has an incoming hyperedge d"

  12. Re-compression of a compressed string / tree / graph
      A string / tree / graph
      S → d c d c d c
      that has been compressed to
      S → d N N c,  N → c d
      can be recompressed to
      S → M M M,  M → d c
      to get a better compression.
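As a quick check of the example on slide 12, the sketch below expands both grammars and compares their sizes. Measuring grammar size as the total number of symbols on all right-hand sides is an assumption chosen for this illustration; the function names are hypothetical.

```python
def expand(symbol, rules):
    """Fully decompress a symbol of a straight-line string grammar."""
    if symbol not in rules:
        return symbol  # terminal
    return "".join(expand(s, rules) for s in rules[symbol])


def grammar_size(rules):
    """Total number of symbols on all right-hand sides (assumed size measure)."""
    return sum(len(rhs) for rhs in rules.values())


compressed = {"S": ["d", "N", "N", "c"], "N": ["c", "d"]}
recompressed = {"S": ["M", "M", "M"], "M": ["d", "c"]}

print(expand("S", compressed), grammar_size(compressed))
print(expand("S", recompressed), grammar_size(recompressed))
```

Both grammars derive `dcdcdc`, but the first one has 6 right-hand-side symbols (no gain over the uncompressed string), while the re-compressed one has 5.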

  13. Re-compress a compressed string: 1. Count digrams
      S → d N N c,  N → c d
      digram generator    generated digram
      d N                 d c
      N                   c d (occurs twice)
      N N                 d c
      N c                 d c
      → (d,c) with r = "c follows d" is the most frequent digram in the decompressed graph
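The table above can be reproduced without decompressing the grammar. The sketch below is one possible way to do this for straight-line string grammars, under stated assumptions: each adjacent pair in a rule body generates the digram (last terminal of the left item, first terminal of the right item), weighted by how often the rule is called. The function name and the data layout are hypothetical, and overlapping occurrences of digrams such as (a,a) are not treated specially.

```python
from collections import Counter


def grammar_digram_counts(rules, start="S"):
    """Count digrams of the decompressed string directly on the grammar."""

    def first(sym):  # first terminal generated by sym
        return first(rules[sym][0]) if sym in rules else sym

    def last(sym):  # last terminal generated by sym
        return last(rules[sym][-1]) if sym in rules else sym

    call_cache = {start: 1}

    def calls(nt):  # how often rule nt is expanded when deriving start
        if nt not in call_cache:
            call_cache[nt] = sum(
                rhs.count(nt) * calls(head)
                for head, rhs in rules.items()
                if nt in rhs
            )
        return call_cache[nt]

    counts = Counter()
    for head, rhs in rules.items():
        for left, right in zip(rhs, rhs[1:]):
            counts[(last(left), first(right))] += calls(head)
    return counts


compressed = {"S": ["d", "N", "N", "c"], "N": ["c", "d"]}
print(grammar_digram_counts(compressed))
```

For the grammar of slide 13 this yields three occurrences of (d,c), generated by `d N`, `N N` and `N c`, and two occurrences of (c,d), generated by the two calls of `N`.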

  14. 2. Isolate a most frequent digram by smart inlining
      Task: isolate the most frequent digram (d,c) with r = "c follows d"
      S → d c N c N c,  N → c e f g d
      Needed: partial decompression of N to isolate d from N.
      New rules that isolate d from the end of N:
      N → N-d d,  N-d → c e f g
      S → d c N-d d c N-d d c
      Trick: inline the rewritten rule N → N-d d instead of N → c e f g d.
      Finally, substitute digrams (d,c) with a new nonterminal M:
      S → M N-d M N-d M,  M → d c
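The three steps of slide 14 can be traced in a small sketch for the string case. The helper names and the dictionary representation of the grammar are assumptions for illustration; the helper nonterminal is named `"N-d"` to mirror the slide.

```python
def isolate_last(rules, nt):
    """Split nt -> body x into nt -> helper x and helper -> body."""
    *body, last = rules[nt]
    helper = f"{nt}-{last}"  # e.g. "N-d": N without its final d
    return {**rules, nt: [helper, last], helper: body}


def inline_everywhere(rules, nt):
    """Replace every call of nt by its (short, rewritten) body and drop the rule."""
    body = rules[nt]
    return {
        head: [t for s in rhs for t in (body if s == nt else [s])]
        for head, rhs in rules.items()
        if head != nt
    }


def replace_digram(rules, head, digram, new_nt):
    """Replace each occurrence of digram in the body of rule head by new_nt."""
    rhs, out, i = rules[head], [], 0
    while i < len(rhs):
        if i + 1 < len(rhs) and (rhs[i], rhs[i + 1]) == digram:
            out.append(new_nt)
            i += 2
        else:
            out.append(rhs[i])
            i += 1
    return {**rules, head: out, new_nt: list(digram)}


grammar = {"S": ["d", "c", "N", "c", "N", "c"], "N": ["c", "e", "f", "g", "d"]}
step1 = isolate_last(grammar, "N")           # N -> N-d d,  N-d -> c e f g
step2 = inline_everywhere(step1, "N")        # S -> d c N-d d c N-d d c
step3 = replace_digram(step2, "S", ("d", "c"), "M")
print(step3)
```

The point of the trick is visible in `step2`: only the two-symbol rule `N → N-d d` is inlined, so the long body `c e f g` stays shared in `N-d` and the intermediate grammar grows only slightly.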

  15. Re-compress a compressed ordered tree: 1. Count digrams
      How to count all digrams generated by tree grammars? (A, C, D may be called several times.)
      [Figure: example tree grammar with rules for A, C and D and parameters y1, y2]
      - The parent node (C) does not determine a digram, but the child (D) does.
      - Each non-root, non-parameter node (e.g. D) in the RHS of each rule of an SLT grammar represents (a child of) a digram
        → count calls of rule A for the digram represented by child node D
        → O(size(G)) [ICDE2016]

  17. 2. Smarter inlining needed for ordered tree grammars
      To isolate a digram:
      - isolate root terminal of tree generated by D
      - isolate parent of 2nd parameter of tree generated by C
      [Figure: example tree grammar with rules for A, C and D and parameters y1, y2]
      Needs smarter inlining:
      [Figure: rewritten rules for C, A, C-r and C-e with parameters y1, y2]

  18. Tree grammar re-compression: compression ratio
                                          EXI-Weblog  EXI-Telecomp  NCBI    Medline  XMark    Treebank
      #edges                              39          39            71      13096    34649    52266
      compression ratio                   0 %         0.05 %        0.06 %  4.71 %   7.94 %   20.67 %
      compression ratio with max blow-up  0 %         0.09 %        0.11 %  4.89 %   11.38 %  21.26 %
      [Figure: size of intermediate grammar vs. final grammar, scale 0% to 200%]
      - Smarter inlining yields intermediate blow-ups of factor 2 at most.
      - Document generated from seed by 5000 updates:
        - re-compression after every 100 updates: blow-ups of a factor of 5 at most
        - without re-compression: blow-up up to a factor of 400
