Most of the slides are borrowed from the authors' original - PowerPoint PPT Presentation

SEG5010 presentation
GRAPH COMPRESSION AND SUMMARIZATION
Wei Zhang
Dept. of Information Engineering
The Chinese University of Hong Kong

• Most of the slides are borrowed from the authors' original presentation.
  • http://www.cs.umd.edu/~saket/pubs/sigmod2008.ppt
  • http://videolectures.net/kdd09_kumar_ocsn/

GRAPH SUMMARIZATION WITH BOUNDED ERROR
• Saket Navlakha (UMCP)
• Rajeev Rastogi (Yahoo! Labs, India)
• Nisheeth Shrivastava (Bell Labs India)

LARGE GRAPHS
• Many interactions can be represented as graphs
  • Webgraphs: search engine, etc.
  • Netflow graphs (which IPs talk to each other): traffic patterns, security, worm attacks
  • Social (friendship) networks: mine user communities, viral marketing
  • Email exchanges: security, virus spread, spam detection
  • Market basket data: customer profiles, targeted advertising
• Need to compress, understand
  • Webgraph ~ 50 billion edges; social networks ~ few million, growing quickly
  • Compression reduces size to one-tenth (webgraphs)
[Figure: example graphs labeled Webgraph (yahoo.com, cnn.com, jokes.com), Netflow (10.1.1.1, 20.20.2.2), Social Networks and email]

OUR APPROACH
• Graph Compression (reference encoding)
  • Not applicable to all graphs: use urls, node labels for compression
  • Resulting structure is hard to visualize/interpret
• Graph Clustering
  • Nice summary, works for generic graphs
  • No compression: needs the same memory to store the graph itself
• Our MDL-based representation R = (S,C)
  • S is a high-level summary graph: compact, highlights dominant trends, easy to visualize
  • C is a set of edge corrections: help in reconstructing the graph
  • Compression based on MDL principle: minimize cost of S+C
    • Information-theoretic approach; parameter-less; applicable to any graph
  • Novel Approximate Representation: reconstructs graph with bounded error ($\epsilon$); results in better compression

HOW DO WE COMPRESS?
• Compression possible (S)
  • Many nodes with similar neighborhoods
    • Communities in social networks; link-copying in webpages
  • Collapse such nodes into supernodes (clusters) and the edges into superedges
    • Bipartite subgraph to two supernodes and a superedge
    • Clique to supernode with a "self-edge"
[Figure: nodes a, b, c linked to nodes d, e, f, g; summary with supernodes X = {d,e,f,g} and Y = {a,b,c}]

HOW DO WE COMPRESS?
• Compression possible (S)
  • Many nodes with similar neighborhoods
    • Communities in social networks; link-copying in webpages
  • Collapse such nodes into supernodes (clusters) and the edges into superedges
    • Bipartite subgraph to two supernodes and a superedge
    • Clique to supernode with a "self-edge"
• Need to correct mistakes (C)
  • Most superedges are not complete
    • Nodes don't have exact same neighbors: friends in social networks
  • Remember edge-corrections
    • Edges not present in superedges (-ve corrections)
    • Extra edges not counted in superedges (+ve corrections)
• Minimize overall storage cost = S+C
[Figure: original graph on nodes a to j, cost = 14 edges. Summary with supernodes X = {d,e,f,g}, Y = {a,b,c} and nodes h, i, j; corrections +(a,h), +(c,i), +(c,j), -(a,d); cost = 5 (1 superedge + 4 corrections)]
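The two costs in the figure can be checked by counting. The sketch below rebuilds the example graph as it can be read off the figure labels (an assumption: a, b, c each link to d, e, f, g except for the missing edge (a,d), plus the three extra edges named in the corrections) and compares the plain edge count with the size of S plus C. The variable names are illustrative only.

```python
from itertools import product

# Example graph as read from the slide's figure (assumed layout):
# a, b, c each link to d, e, f, g, except that edge (a, d) is missing.
X = ["d", "e", "f", "g"]
Y = ["a", "b", "c"]
edges = {frozenset(p) for p in product(Y, X)} - {frozenset(("a", "d"))}
# Three extra edges that fall outside the bipartite block.
edges |= {frozenset(("a", "h")), frozenset(("c", "i")), frozenset(("c", "j"))}

# Summary S: a single superedge between supernodes X and Y.
superedges = [("X", "Y")]
# Corrections C exactly as listed on the slide.
corrections = [("+", "a", "h"), ("+", "c", "i"), ("+", "c", "j"), ("-", "a", "d")]

original_cost = len(edges)                          # one unit per edge
summary_cost = len(superedges) + len(corrections)   # cost of S + C
print(original_cost, summary_cost)  # prints: 14 5
```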

REPRESENTATION STRUCTURE R=(S,C)
• Summary $S(V_S, E_S)$
  • Each supernode $v$ represents a set of nodes $A_v$
  • Each superedge $(u,v)$ represents all pairs of edges $\pi_{uv} = A_u \times A_v$
• Corrections C: {(a,b); a and b are nodes of G}
• Supernodes are key, superedges/corrections easy
  • $E_{uv}$: actual edges of G between $A_u$ and $A_v$
  • Cost with $(u,v)$ = $1 + |\pi_{uv} - E_{uv}|$
  • Cost without $(u,v)$ = $|E_{uv}|$
  • Choose the minimum; this decides whether edge $(u,v)$ is in S
[Figure: summary with supernodes X = {d,e,f,g}, Y = {a,b,c} and nodes h, i, j; C = {+(a,h), +(c,i), +(c,j), -(a,d)}; below it, the original graph on nodes a to j]
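Once the supernodes are fixed, the min-cost rule above determines S and C pair by pair. The following is a minimal sketch of that rule, not the authors' implementation: the function name `build_representation`, the data layout, and the tie-break (no superedge when both costs are equal) are assumptions made for illustration, as is treating a supernode's self-edge as all pairs inside it.

```python
from itertools import combinations, product

def build_representation(edges, supernodes):
    """Pick superedges and corrections for a fixed partition via the min-cost rule."""
    edges = {frozenset(e) for e in edges}
    S, C = set(), set()
    names = sorted(supernodes)
    # Every unordered pair of distinct supernodes, plus each supernode with itself.
    pairs = list(combinations(names, 2)) + [(n, n) for n in names]
    for u, v in pairs:
        if u == v:
            # Self-edge: all pairs of nodes inside the supernode (assumed reading).
            pi_uv = {frozenset(p) for p in combinations(supernodes[u], 2)}
        else:
            pi_uv = {frozenset(p) for p in product(supernodes[u], supernodes[v])}
        E_uv = pi_uv & edges                   # edges of G actually present
        cost_with = 1 + len(pi_uv - E_uv)      # superedge plus -ve corrections
        cost_without = len(E_uv)               # one +ve correction per edge
        if cost_with < cost_without:           # tie-break is an assumption
            S.add((u, v))
            C |= {("-", *sorted(e)) for e in pi_uv - E_uv}
        else:
            C |= {("+", *sorted(e)) for e in E_uv}
    return S, C

# The slide's example: a, b, c link to d, e, f, g except (a, d), plus three extras.
X, Y = ["d", "e", "f", "g"], ["a", "b", "c"]
edges = [(y, x) for y, x in product(Y, X) if (y, x) != ("a", "d")]
edges += [("a", "h"), ("c", "i"), ("c", "j")]
supernodes = {"X": X, "Y": Y, "h": ["h"], "i": ["i"], "j": ["j"]}

S, C = build_representation(edges, supernodes)
print(S)
print(sorted(C))
```

For the pair (X, Y), 11 of the 12 possible edges exist, so the cost with the superedge is 1 + 1 = 2 against 11 without it. Every other pair is cheaper without a superedge, which gives the slide's single superedge and four corrections.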

REPRESENTATION STRUCTURE R=(S,C)
• Summary $S(V_S, E_S)$
  • Each supernode $v$ represents a set of nodes $A_v$
  • Each superedge $(u,v)$ represents all pairs of edges $\pi_{uv} = A_u \times A_v$
• Corrections C: {(a,b); a and b are nodes of G}
• Supernodes are key, superedges/corrections easy
  • $E_{uv}$: actual edges of G between $A_u$ and $A_v$
  • Cost with $(u,v)$ = $1 + |\pi_{uv} - E_{uv}|$
  • Cost without $(u,v)$ = $|E_{uv}|$
  • Choose the minimum; this decides whether edge $(u,v)$ is in S
• Reconstructing the graph from R
  • For all superedges $(u,v)$ in S, insert all pairs of edges $\pi_{uv}$
  • For all +ve corrections +(a,b), insert edge (a,b)
  • For all -ve corrections -(a,b), delete edge (a,b)
[Figure: summary with supernodes X = {d,e,f,g}, Y = {a,b,c} and nodes h, i, j; C = {+(a,h), +(c,i), +(c,j), -(a,d)}; below it, the original graph on nodes a to j]
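The three reconstruction steps can be written down directly. This is a small sketch under assumed data structures (supernodes as a dict of node lists, corrections as `(sign, a, b)` tuples); the function name `reconstruct` is hypothetical, and a self-edge is taken to mean all pairs inside one supernode.

```python
from itertools import combinations, product

def reconstruct(supernodes, superedges, corrections):
    """Rebuild the edge set of G from R = (S, C)."""
    edges = set()
    # Step 1: expand every superedge (u, v) into all pairs pi_uv.
    for u, v in superedges:
        if u == v:
            pairs = combinations(supernodes[u], 2)  # self-edge (assumed reading)
        else:
            pairs = product(supernodes[u], supernodes[v])
        edges |= {frozenset(p) for p in pairs}
    # Steps 2 and 3: apply +ve corrections (insert) and -ve corrections (delete).
    for sign, a, b in corrections:
        if sign == "+":
            edges.add(frozenset((a, b)))
        else:
            edges.discard(frozenset((a, b)))
    return edges

# R = (S, C) from the slide's figure.
supernodes = {"X": ["d", "e", "f", "g"], "Y": ["a", "b", "c"]}
superedges = [("X", "Y")]
corrections = [("+", "a", "h"), ("+", "c", "i"), ("+", "c", "j"), ("-", "a", "d")]

G = reconstruct(supernodes, superedges, corrections)
print(len(G))  # prints: 14
```

The result has the 14 edges of the original example graph: the 12 pairs under the superedge, minus (a,d), plus the three +ve corrections.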


APPROXIMATE REPRESENTATION $R_\epsilon$
• Approximate representation
  • Recreating the input graph exactly is not always necessary
  • Reasonable approximation enough: to compute communities, anomalous traffic patterns, etc.
  • Use approximation leeway to get further cost reduction
• Generic Neighbor Query
  • Given node $v$, find its neighbors $N_v$ in G
  • Apx-nbr set $N'_v$ estimates $N_v$ with $\epsilon$-accuracy
  • Bounded error: $error(v) = |N'_v - N_v| + |N_v - N'_v| \le \epsilon |N_v|$
    • Number of neighbors added or deleted is at most an $\epsilon$-fraction of the true neighbors
• Intuition for computing $R_\epsilon$
  • If correction (a,d) is deleted, it adds error for both a and d
  • From exact representation R for G, remove (maximum) corrections s.t. $\epsilon$-error guarantees still hold
[Figure: supernodes X = {d,e,f,g} and Y = {a,b} with C = {-(a,d), -(a,f)}, shown before and after approximation. For $\epsilon$ = .5, we can remove one correction of a]
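The error bound and the "remove corrections while the guarantee holds" intuition can be sketched as follows. This is a simple one-pass heuristic for illustration, not the APX-MDL or APX-GREEDY algorithm from the talk, and it does not claim to remove the maximum number of corrections. It takes the bound as non-strict (error(v) at most $\epsilon |N_v|$) and assumes every -ve correction lies under a superedge, so dropping it makes that edge appear in the approximate graph. All function names are hypothetical. The example reuses the graph from the earlier "How do we compress?" slide.

```python
from itertools import product

def neighbors(edges, v):
    """Neighbor set of v in a set of 2-element frozenset edges."""
    return {next(iter(e - {v})) for e in edges if v in e}

def error(true_edges, approx_edges, v):
    """error(v) = |N'_v - N_v| + |N_v - N'_v|."""
    n_true, n_apx = neighbors(true_edges, v), neighbors(approx_edges, v)
    return len(n_apx - n_true) + len(n_true - n_apx)

def within_bound(true_edges, approx_edges, v, eps):
    return error(true_edges, approx_edges, v) <= eps * len(neighbors(true_edges, v))

def drop_corrections(true_edges, corrections, eps):
    """One pass: drop a correction if both of its endpoints stay within the bound."""
    approx = set(true_edges)      # with all corrections kept, R is exact
    kept = []
    for sign, a, b in corrections:
        e = frozenset((a, b))
        # Dropping -(a,b) adds the edge; dropping +(a,b) loses it.
        trial = approx | {e} if sign == "-" else approx - {e}
        # Only the neighbor sets of a and b change, so only they need checking.
        if all(within_bound(true_edges, trial, v, eps) for v in (a, b)):
            approx = trial
        else:
            kept.append((sign, a, b))
    return kept, approx

X, Y = ["d", "e", "f", "g"], ["a", "b", "c"]
G = {frozenset(p) for p in product(Y, X)} - {frozenset(("a", "d"))}
G |= {frozenset(("a", "h")), frozenset(("c", "i")), frozenset(("c", "j"))}
C = [("+", "a", "h"), ("+", "c", "i"), ("+", "c", "j"), ("-", "a", "d")]

kept, approx = drop_corrections(G, C, eps=0.5)
print(kept)
```

With eps = 0.5, only -(a,d) is dropped in this example: a has four true neighbors and d has two, so one wrong neighbor each stays within the bound. The +ve corrections are kept because h, i and j each have a single neighbor, and losing it would exceed half of their neighborhood.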

COMPARISON WITH EXISTING TECHNIQUES
• Webgraph compression [Adler-DCC-01]
  • Use nodes sorted by urls: not applicable to other graphs
  • More focus on bitwise compression: represent sequence of neighbors (ids) using smallest bits
• Clique stripping [Feder-pods-99]
  • Collapses edges of complete bi-partite subgraph into single cluster
  • Only compresses very large, complete bi-cliques
• Representing webgraphs [Raghavan-icde-03]
  • Represent webgraphs as SNodes, Sedges
  • Use urls of nodes for compression (not applicable for other graphs)
  • No concept of approximate representation
[Figure: bipartite example on nodes a, b, c and d, e, f, g, shown before and after compression]

OUTLINE
• Compressed graph
  • MDL representation R=(S,C); $\epsilon$-representation
• Computing R
  • GREEDY, RANDOMIZED
• Computing $R_\epsilon$
  • APX-MDL, APX-GREEDY
• Experimental results
• Conclusions and future work
