
Mining Large Dynamic Graphs and Tensors
Kijung Shin, Ph.D. Candidate (kijungs@cs.cmu.edu)
Thesis Committee: Prof. Christos Faloutsos (Chair), Prof. Tom M. Mitchell, Prof. Leman Akoglu, Prof. Philip S. Yu


  1. Fully Dynamic Graph Stream
  • Our model of a large and fully-dynamic graph
  • Discrete time t, starting from 1 and ever increasing
  • At each time t, a change in the input graph arrives
  ◦ change: either an insertion or a deletion of an edge
  Example (given): at times t = 1, 2, 3, 4, 5, …, the changes +(a,b), +(a,c), +(b,c), −(a,b), +(b,d), … arrive; the graph itself is never materialized.

  2. Problem Definition
  • Given:
  ◦ a fully-dynamic graph stream (possibly infinite)
  ◦ memory space (finite)
  • Estimate: the count of triangles
  • To Minimize: estimation error
  Example: from the input changes +(a,b), +(a,c), +(b,c), −(a,b), +(b,d), …, output an estimate of the number of triangles at each time t.

  3. Roadmap
  • T1. Structure Analysis (Part 1)
  ◦ T1.1 Triangle Counting
  ▪ Handling Deletions (§6)
  ◦ Problem Definition
  ◦ Proposed Method: ThinkD <<
  ◦ Experiments
  ▪ …
  • T2. Anomaly Detection (Part 2)
  • T3. Behavior Modeling (Part 3)
  • Future Directions
  • Conclusions

  4. Overview of ThinkD
  • Maintains and updates Δ̂
  ◦ the number of (non-deleted) triangles that it has observed
  • How it processes an insertion: arrive → count → test → store (if Yes)
  - arrive: an insertion of an edge arrives
  - count: count new triangles and increase Δ̂
  - test: toss a coin
  - store: store the edge in memory

  5. Overview of ThinkD (cont.)
  • Maintains and updates Δ̂
  ◦ the number of (non-deleted) triangles that it has observed
  • How it processes a deletion: arrive → count → test → delete (if stored)
  - arrive: a deletion of an edge arrives
  - count: count deleted triangles and decrease Δ̂
  - test: test whether the edge is stored in memory
  - delete: delete the edge from memory
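To make the two flows above concrete, here is a minimal Python sketch of ThinkD-FAST's update loop (hedged: the class and variable names are illustrative, and the thesis version adds details this sketch omits):

```python
import random
from collections import defaultdict

class ThinkDFastSketch:
    """Illustrative sketch of ThinkD-FAST; not the reference implementation."""

    def __init__(self, p):
        self.p = p                     # edge-sampling probability
        self.adj = defaultdict(set)    # adjacency sets of the stored (sampled) edges
        self.observed = 0              # Delta-hat: observed non-deleted triangles

    def process(self, sign, u, v):
        # count: triangles that (u, v) closes among the stored edges
        closed = len(self.adj[u] & self.adj[v])
        if sign == '+':
            self.observed += closed          # count before the coin toss
            if random.random() < self.p:     # test: Bernoulli trial
                self.adj[u].add(v)           # store
                self.adj[v].add(u)
        else:
            self.observed -= closed          # count deleted triangles
            if v in self.adj[u]:             # test: is the edge stored?
                self.adj[u].remove(v)        # delete
                self.adj[v].remove(u)

    def estimate(self):
        return self.observed / self.p ** 2   # unbiased estimate (Theorem 1)
```

Feeding it the example stream +(a,b), +(a,c), +(b,c), −(a,b), +(b,d) exercises both the insertion and deletion paths; by Theorem 1 below, `estimate()` is centered on the true count in expectation.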

  6. Why is ThinkD Accurate?
  • ThinkD (Think before You Discard):
  ◦ every arrived change is used to update Δ̂: arrive → count (update Δ̂) → test → store/delete or discard
  • Triest-FD [DERU17]:
  ◦ some changes are discarded without being used to update Δ̂: arrive → test → update Δ̂ and store/delete (if Yes) or discard (if No) → information loss!

  7. Two Versions of ThinkD
  • Q1: how to test in the test step; Q2: how to estimate the count of all triangles from Δ̂
  • ThinkD-FAST: simple and fast
  ◦ independent Bernoulli trials with probability p
  • ThinkD-ACC: accurate and parameter-free
  ◦ random pairing [GLH08]

  8. Unbiasedness of ThinkD-FAST
  • Δ̂/p²: estimated count of all triangles
  • Δ: true count of all triangles
  • [Theorem 1] At any time t, 𝔼[Δ̂/p²] = Δ, i.e., Δ̂/p² is an unbiased estimate of Δ
  • Proof and the variance of Δ̂/p²: see the thesis
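A one-line intuition for Theorem 1 in the insertion-only case (the fully-dynamic proof, which also handles deletions, is in the thesis):

```latex
% A triangle is observed when its last edge arrives iff its two earlier
% edges were both stored, which happens independently with probability p each:
\mathbb{E}\bigl[\hat{\Delta}\bigr]
  = \sum_{\text{triangles}} p^{2}
  = p^{2}\,\Delta
  \quad\Longrightarrow\quad
  \mathbb{E}\bigl[\hat{\Delta}/p^{2}\bigr] = \Delta .
```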

  9. ThinkD-ACC: More Accurate
  • Disadvantage of ThinkD-FAST:
  ◦ setting the parameter p is not trivial
  ▪ small p → underutilized memory → inaccurate estimation
  ▪ large p → out-of-memory error
  • ThinkD-ACC uses Random Pairing [GLH08]
  ◦ always utilizes memory as fully as possible
  ◦ gives more accurate estimation

  10. Scalability of ThinkD
  • Let k be the size of memory
  • For processing t changes in the input stream:
  • [Theorem 2] The time complexity of ThinkD-ACC is O(k · t): linear in the data size t
  • [Theorem 3] If p = O(k/t), the time complexity of ThinkD-FAST is O(k · t)

  11. Advantages of ThinkD
  • Fast & Accurate: outperforms competitors
  • Scalable: linear data scalability (Theorems 2 & 3)
  • Theoretically Sound: unbiased estimates (Theorem 1)

  12. Roadmap
  • T1. Structure Analysis (Part 1)
  ◦ T1.1 Triangle Counting
  ▪ Handling Deletions (§6)
  ◦ Problem Definition
  ◦ Proposed Method: ThinkD
  ◦ Experiments <<
  ▪ …
  • T2. Anomaly Detection (Part 2)
  • T3. Behavior Modeling (Part 3)
  • …

  13. Experimental Settings
  • Competitors: Triest-FD [DERU17] & ESD [HS17]
  ◦ triangle counting in fully-dynamic graph streams
  • Datasets: insertions (edges in real graphs) + deletions (a random 20%)
  ◦ Synthetic (100B edges), Social Networks (1.8B+ edges), Citation (16M+), Web (6M+), Trust (0.7M+)

  14. EXP1. Variance Analysis
  • ThinkD is accurate with small variance
  [Figure: estimates of Triest-FD, ThinkD-FAST, and ThinkD-ACC vs. the true count as the number of processed changes grows]

  15. EXP2. Scalability
  • [Theorems 2 & 3] ThinkD is scalable
  [Figure: running time of ThinkD-ACC and ThinkD-FAST vs. the number of changes, up to 100 billion changes]

  16. EXP3. Space & Accuracy
  • ThinkD outperforms its best competitors
  [Figure: estimation error (ratio) vs. memory budget (ratio) for Triest-FD, ESD, ThinkD-FAST, and ThinkD-ACC]

  17. EXP4. Speed & Accuracy
  • ThinkD outperforms its best competitors
  [Figure: estimation error (ratio) vs. running time (sec) for Triest-FD, ESD, ThinkD-FAST, and ThinkD-ACC]

  18. Advantages of ThinkD
  • Fast & Accurate: outperforms competitors
  • Scalable: linear data scalability
  • Theoretically Sound: unbiased estimates

  19. Summary of §6
  • We propose ThinkD (Think Before you Discard)
  ◦ for accurate triangle counting
  ◦ in large and fully-dynamic graphs
  • Fast & Accurate: outperforms competitors
  • Scalable: linear data scalability
  • Theoretically Sound: unbiased estimates
  • Download ThinkD

  20. Organization of the Thesis (Recall)
            Part 1. Structure Analysis   Part 2. Anomaly Detection    Part 3. Behavior Modeling
  Graphs    Triangle Count (§§3-6),      Anomalous Subgraph (§9)      Purchase Behavior (§14)
            Summarization (§7)
  Tensors   Summarization (§8)           Dense Subtensors (§§10-13)   Progression (§15)

  21. T1.2 Summarization
  • "Given a web-scale graph or tensor, how can we succinctly represent it?"
  [Figure: an input graph on nodes a-g and its summary graph with supernodes {a,b}, {c,d,e}, {f,g}]
  • §7: Summarizing Graphs
  • §8: Summarizing Tensors (via Tucker Decomposition)
  ◦ external-memory algorithm with 1,000× improved scalability

  22. Roadmap
  • T1. Structure Analysis (Part 1)
  ◦ …
  ◦ T1-2. Summarization (§§7-8)
  ▪ Summarizing Graphs (§7)
  ◦ Problem Definition <<
  ◦ Proposed Method: SWeG
  ◦ Experiments
  ▪ …
  • T2. Anomaly Detection (Part 2)
  • T3. Behavior Modeling (Part 3)
  • …
  K. Shin, A. Ghoting, M. Kim, and H. Raghavan, "SWeG: Lossless and Lossy Summarization of Web-Scale Graphs", WWW 2019

  23. Graph Summarization: Example
  • Input graph (with 9 edges) on nodes a-g
  • Output (with 6 edges): a summary graph with supernodes {a,b}, {c,d,e}, {f,g}, plus residual edges +(d,g), −(a,d), −(c,e)

  24. Graph Summarization [NRS08]
  • Given: an input graph
  • Find:
  ◦ a summary graph
  ◦ positive and negative residual graphs
  • To Minimize: the edge count (≈ description length)
  Example: summary graph with supernodes {a,b}, {c,d,e}, {f,g}; positive residual graph with +(d,g); negative residual graph with −(a,d), −(c,e)

  25. Restoration: Example
  • Summarized graph (with 6 edges): supernodes {a,b}, {c,d,e}, {f,g} and residual edges +(d,g), −(a,d), −(c,e)
  • Restored graph (with 9 edges): expand each superedge into all node pairs it represents, then add the positive residual edges and remove the negative ones
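The restoration rule is mechanical, so a short sketch may help (hedged: the function and the superedge layout below are illustrative reconstructions that merely match the slide's edge counts):

```python
from itertools import combinations

def restore(supernodes, summary_edges, pos_residual, neg_residual):
    """Expand superedges into node pairs, then apply the residual corrections."""
    edges = set()
    for A, B in summary_edges:
        if A == B:  # a self-loop on a supernode encodes all pairs inside it
            edges |= {frozenset(p) for p in combinations(supernodes[A], 2)}
        else:
            edges |= {frozenset((u, v)) for u in supernodes[A] for v in supernodes[B]}
    edges |= {frozenset(e) for e in pos_residual}   # add positive residual edges
    edges -= {frozenset(e) for e in neg_residual}   # remove negative residual edges
    return edges

# The slide's example: 3 superedges + 3 residual edges restore the 9 input edges.
supernodes = {'A': 'ab', 'B': 'cde', 'C': 'fg'}
restored = restore(supernodes,
                   summary_edges=[('A', 'B'), ('B', 'B'), ('C', 'C')],
                   pos_residual=[('d', 'g')],
                   neg_residual=[('a', 'd'), ('c', 'e')])
assert len(restored) == 9
```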

  26. Why Graph Summarization?
  • Summarization:
  ◦ the summary graph is easy to visualize and interpret
  • Compression:
  ◦ supports efficient neighbor queries (discussed in the thesis)
  ◦ applicable to lossy compression (discussed in the thesis)
  ◦ combinable with other graph-compression techniques
  ▪ the outputs are also graphs

  27. Challenge: Scalability!
  [Figure: compression performance vs. maximum size of graphs handled — Greedy [NRS08], Randomized [NRS08], VoG [KKVF14], and SAGS [KNL15] reach millions to tens of millions of edges, while SWeG reaches billions: a 10,000× improvement]

  28. Our Contribution: SWeG
  • We develop SWeG (Summarizing Web-scale Graphs):
  ◦ Fast with Concise Outputs
  ◦ Memory Efficient
  ◦ Scalable

  29. Roadmap
  • T1. Structure Analysis (Part 1)
  ◦ …
  ◦ T1-2. Summarization (§§7-8)
  ▪ Summarizing Graphs (§7)
  ◦ Problem Definition
  ◦ Proposed Method: SWeG <<
  ◦ Experiments
  ▪ …
  • T2. Anomaly Detection (Part 2)
  • T3. Behavior Modeling (Part 3)
  • …

  30. Terminologies
  • Summary graph S: supernodes A = {a,b}, B = {c,d,e}, C = {f,g}
  • Residual graph R: the positive residual graph R⁺ (here +(d,g)) and the negative residual graph R⁻ (here −(a,d), −(c,e))
  • Encoding cost when supernodes A and B are merged:
  Saving(A, B) := 1 − Cost(A ∪ B) / (Cost(A) + Cost(B))
  where Cost(A ∪ B) is the encoding cost of the merged supernode, and Cost(A), Cost(B) are the encoding costs of A and B
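As a sketch of where these costs come from, a common [NRS08]-style encoding charges, for each pair of supernodes, the cheaper of (i) listing the edges between them as positive residuals or (ii) one superedge plus the missing pairs as negative residuals (hedged: a simplification of the cost actually used in SWeG):

```python
def cost(A, candidate_neighbors, edges_between):
    """Approximate encoding cost of supernode A (a frozenset of nodes)."""
    total = 0
    for B in candidate_neighbors:
        e = edges_between(A, B)  # input edges with one end in A, the other in B
        if e == 0:
            continue
        pairs = (len(A) * (len(A) - 1) // 2) if A == B else len(A) * len(B)
        # either e positive residuals, or 1 superedge + (pairs - e) negatives
        total += min(e, pairs - e + 1)
    return total

def saving(A, B, candidate_neighbors, edges_between):
    """Saving(A, B) = 1 - Cost(A u B) / (Cost(A) + Cost(B))."""
    merged = A | B
    others = [X for X in candidate_neighbors if X != A and X != B]
    return 1 - cost(merged, others + [merged], edges_between) / (
        cost(A, candidate_neighbors, edges_between)
        + cost(B, candidate_neighbors, edges_between))
```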

  31. Overview of SWeG
  • Inputs: input graph G, number of iterations T
  • Outputs: summary graph S, residual graph R (or R⁺ and R⁻)
  • Procedure:
  ◦ S0: Initializing Step
  ◦ repeat T times:
  ▪ S1-1: Dividing Step
  ▪ S1-2: Merging Step
  ◦ S2: Compressing Step (optional)

  32. Overview: Initializing Step
  • Summary graph S = G, residual graph R = ∅
  • Each node starts as its own supernode: {a}, {b}, {c}, {d}, {e}, {f}, {g}

  33. Overview: Dividing Step
  • Divides the supernodes into groups
  ◦ MinHashing (used), EigenSpoke, Min-Cut, etc.
  [Figure: the supernodes {a}-{g} split into candidate groups]
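A minimal sketch of the MinHash-based dividing step (hedged: SWeG's actual implementation works on shingles of neighborhoods and reuses hashes across iterations; the names here are illustrative, and `neighbors` is a dict mapping each node to its set of neighbors):

```python
import random
from collections import defaultdict

def divide(supernodes, neighbors, seed=0):
    """Group supernodes by the minimum hash over their members' neighborhoods;
    supernodes with similar neighborhoods tend to collide into one group."""
    rng = random.Random(seed)
    node_hash = defaultdict(rng.random)  # lazily draw one random hash per node
    groups = defaultdict(list)
    for sn in supernodes:                # sn is a frozenset of nodes
        key = min(node_hash[u] for v in sn for u in (neighbors[v] | {v}))
        groups[key].append(sn)
    return list(groups.values())
```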

  34. Overview: Merging Step
  • Merges some supernodes within each group if Saving > θ(t)
  [Figure: {a} and {b} merge into {a,b}, {d} and {e} into {d,e}, and {f} and {g} into {f,g}; {c} stays]

  35. Overview: Merging Step (cont.)
  • Summary graph S so far: supernodes {a,b}, {c}, {d,e}, {f,g}
  • Residual graph R so far: +(d,g), −(a,d)

  36. Overview: Dividing Step
  • Divides the current supernodes into groups: {a,b}, {c}, {d,e}, {f,g}

  37. Overview: Merging Step
  • Merges some supernodes within each group if Saving > θ(t)
  ◦ {c} and {d,e} merge into {c,d,e}; {a,b} and {f,g} stay

  38. Overview: Merging Step (cont.)
  • Final summary graph S: supernodes {a,b}, {c,d,e}, {f,g}
  • Final residual graph R: +(d,g), −(a,d), −(c,e)

  39. Overview: Merging Step (cont.)
  • Merges some supernodes within each group if Saving > θ(t)
  • Decreasing threshold θ(t) = (1 + t)⁻¹
  ◦ early iterations: exploration across groups
  ◦ later iterations: exploitation within each group
  ◦ ~30% better compression than a constant threshold θ(t) = 0
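A brute-force sketch of one merging pass over a single group (hedged: it assumes a two-argument `saving(A, B)` closure, e.g., the earlier sketch with the graph arguments bound, and SWeG selects candidate pairs far more cleverly):

```python
def merge_group(group, t, saving):
    """Greedily merge pairs in `group` while some pair's Saving exceeds theta(t)."""
    theta = 1.0 / (1 + t)            # decreasing threshold theta(t) = (1 + t)^-1
    merged_something = True
    while merged_something and len(group) > 1:
        merged_something = False
        # brute force: find the best pair in the group
        gain, i, j = max(((saving(a, b), i, j)
                          for i, a in enumerate(group)
                          for j, b in enumerate(group) if i < j),
                         key=lambda x: x[0])
        if gain > theta:             # accept the merge only above the threshold
            a, b = group[i], group[j]
            group = [sn for k, sn in enumerate(group) if k not in (i, j)]
            group.append(a | b)      # the merged supernode
            merged_something = True
    return group
```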

  40. Overview: Compressing Step
  • Compresses each output graph (S, R⁺, and R⁻)
  • Any off-the-shelf graph-compression algorithm can be used:
  ◦ Boldi-Vigna [BV04]
  ◦ VNMiner [BC08]
  ◦ Graph Bisection [DKKO+16]
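Putting the steps together, a hedged end-to-end driver built from the `divide` and `merge_group` sketches above (the optional compressing step is delegated to the off-the-shelf tools the slide lists):

```python
def sweg(nodes, neighbors, T, saving):
    """Illustrative top-level loop of SWeG; `saving` is a two-argument closure."""
    supernodes = [frozenset([v]) for v in nodes]                  # S0: initializing
    for t in range(1, T + 1):
        next_round = []
        for group in divide(supernodes, neighbors, seed=t):       # S1-1: dividing
            next_round.extend(merge_group(group, t, saving))      # S1-2: merging
        supernodes = next_round
    return supernodes  # S2 (optional): compress S, R+, R- with external tools
```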

  41. Parallel & Distributed Processing
  • Map stage: compute MinHashes in parallel
  • Shuffle stage: divide supernodes using MinHashes
  • Reduce stage: process groups independently in parallel
  • No need to load the entire graph in memory!
  [Figure: supernodes partitioned by MinHash value into three groups, each merged independently]
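A hedged sketch of how the two steps map onto one MapReduce round (illustrative stage functions, not the actual Hadoop implementation; `node_hash` is a precomputed dict of random hashes):

```python
def map_stage(sn, neighbors, node_hash):
    """Emit (MinHash key, supernode); only sn's neighborhood is needed."""
    key = min(node_hash[u] for v in sn for u in (neighbors[v] | {v}))
    return key, sn

def reduce_stage(key, group, t, saving):
    """Each group is small enough to be merged independently in memory."""
    return merge_group(list(group), t, saving)
```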

  42. Roadmap
  • T1. Structure Analysis (Part 1)
  ◦ …
  ◦ T1-2. Summarization (§§7-8)
  ▪ Summarizing Graphs (§7)
  ◦ Problem Definition
  ◦ Proposed Method: SWeG
  ◦ Experiments <<
  ▪ …
  • T2. Anomaly Detection (Part 2)
  • T3. Behavior Modeling (Part 3)
  • …

  43. Experimental Settings
  • 13 real-world graphs (10K - 20B edges): social, collaboration, citation, web, …
  • Graph-summarization algorithms compared:
  ◦ Greedy [NRS08], Randomized [NRS08], SAGS [KNL15]

  44. EXP1. Speed and Compression
  • SWeG outperforms its competitors
  [Figure: running time vs. output size for SWeG and the competitors]

  45. Advantages of SWeG (Recall)
  • Fast with Concise Outputs
  • Memory Efficient
  • Scalable

  46. EXP2. Memory Efficiency
  • SWeG loads ≤ 0.1-4% of the edges in main memory at once
  [Figure: memory usage vs. input-graph size — up to 294× and 1209× smaller than the input graph]

  47. Advantages of SWeG (Recall)
  • Fast with Concise Outputs
  • Memory Efficient
  • Scalable

  48. EXP3. Effect of Iterations
  • About 20 iterations are enough

  49. EXP4. Data Scalability
  • SWeG is linear in the number of edges
  [Figure: running time vs. number of edges for SWeG (single machine) and SWeG (Hadoop), up to ≥ 20 billion edges]

  50. EXP5. Machine Scalability
  • SWeG (Hadoop) scales up with the number of machines

  51. Advantages of SWeG (Recall)
  • Fast with Concise Outputs
  • Memory Efficient
  • Scalable

  52. Summary of §7
  • We propose SWeG (Summarizing Web-scale Graphs)
  ◦ for summarizing large-scale graphs
  • Fast with Concise Outputs
  • Memory Efficient
  • Scalable

  53. Contributions and Impact (Part 1)
  • Triangle-counting algorithms [ICDM17, PKDD18, PAKDD18]
  • Summarization algorithms [WSDM17, WWW19]
  • Patent on SWeG: filed by LinkedIn Inc.
  • Open-source software (ThinkD, SWeG): downloaded 82 times — github.com/kijungs

  54. Organization of the Thesis (Recall)
            Part 1. Structure Analysis   Part 2. Anomaly Detection    Part 3. Behavior Modeling
  Graphs    Triangle Count (§§3-6),      Anomalous Subgraph (§9)      Purchase Behavior (§14)
            Summarization (§7)
  Tensors   Summarization (§8)           Dense Subtensors (§§10-13)   Progression (§15)

  55. T2. Anomaly Detection (Part 2)
  • "How can we detect anomalies or fraudsters in large dynamic graphs (or tensors)?"
  • Hint: fraudsters tend to form dense subgraphs
  [Figure: a graph in which benign users are sparsely connected while a suspicious group forms a dense subgraph]

  56. T2-1. Utilizing Patterns
  • T2-1. Patterns and Anomalies in Dense Subgraphs (§9)
  ◦ "What are patterns in dense subgraphs?"
  ◦ "What are anomalies deviating from the patterns?"
  K. Shin, T. Eliassi-Rad, and C. Faloutsos, "Patterns and Anomalies in k-Cores of Real-world Graphs with Applications", KAIS 2018 (formerly, ICDM 2016)

  57. T2-2. Utilizing Side Information
  [Figure: an accounts × items matrix extended with side information into a tensor]

  58. T2-2. Utilizing Side Information
  • "How can we detect dense subtensors in large dynamic data?"
  • T2-2. Detecting Dense Subtensors (§§11-13)
  ◦ In-memory Algorithm (§11)
  ◦ Distributed Algorithm for Web-scale Tensors (§12)
  ◦ Incremental Algorithms for Dynamic Tensors (§13)

  59. Contributions and Impact (Part 2)
  • Patterns in dense subgraphs [ICDM16]
  ◦ Award: best paper candidate at ICDM 2016
  ◦ Class: …
  • Algorithms for dense subtensors [PKDD16, WSDM17, KDD17]
  ◦ Real-world usage: …
  • Open-source software: downloaded 257 times — github.com/kijungs

  60. Organization of the Thesis (Recall)
            Part 1. Structure Analysis   Part 2. Anomaly Detection    Part 3. Behavior Modeling
  Graphs    Triangle Count (§§3-6),      Anomalous Subgraph (§9)      Purchase Behavior (§14)
            Summarization (§7)
  Tensors   Summarization (§8)           Dense Subtensors (§§10-13)   Progression (§15)

  61. T3. Behavior Modeling (Part 3)
  • "How can we model the behavior of individuals in graph and tensor data?"
  [Figure: a social network; a behavior log on social media]
  • T3-1. Modeling Purchase Behavior in a Social Network (§14)
  • T3-2. Modeling Progression of Users of Social Media (§15)
  ◦ "How do users evolve over time on social media?"

  62. Roadmap
  • T1. Structure Analysis (Part 1)
  • T2. Anomaly Detection (Part 2)
  • T3. Behavior Modeling (Part 3)
  ◦ T3-1. Modeling Purchases (§14) <<
  ◦ …
  • Future Directions
  • Conclusions
  K. Shin, E. Lee, D. Eswaran, and A. D. Procaccia, "Why You Should Charge Your Friends for Borrowing Your Stuff", IJCAI 2017

  63. Sharable Goods: Question
  • Portable crib, IKEA toolkit, DVDs
  • "What do they have in common?"

  64. Sharable Goods: Properties
  • Used occasionally
  • Shared with friends
  • Not shared with strangers

  65. Motivation: Social Inefficiency
                           Popular (shares with many)   Lonely (shares with few)
  Efficiency of Purchase   High                         Low
  Likelihood of Purchase   can be Low                   can be High
                           (likely to borrow)           (likely to buy)
  • Q1: "How large can social inefficiency be?"
  • Q2: "How to nudge people towards efficiency?"

  66. Roadmap
  • T1. Structure Analysis (Part 1)
  • T2. Anomaly Detection (Part 2)
  • T3. Behavior Modeling (Part 3)
  ◦ T3-1. Modeling Purchases (§14)
  ▪ Toy Example <<
  ▪ Game-theoretic Model
  ▪ Best Rental-fee Search
  ◦ …
  • Future Directions
  • Conclusions

  67. Social Network
  • Consider a social network, which is a graph
  ◦ Nodes: people
  ◦ Edges: friendship
  [Figure: a small network including Alice, Bob, and Carol]
  • "How many people should buy an IKEA toolkit for everyone to use it?"

  68. Socially Optimal Decision
  • The answer is: at least 2
  • Socially optimal:
  ◦ everyone uses a toolkit
  ◦ with minimum purchases (i.e., with minimum cost)
  [Figure: only Alice and Bob buy; everyone else borrows from a friend]
  • "Does everyone want to stick to their current decisions?"

  69. Individually Optimal Decision
  • The answer is No
  • Individually optimal:
  ◦ everyone best-responds to the others' decisions
  • Socially inefficient (suboptimal):
  ◦ 4 purchases happen when 2 are optimal
  [Figure: after best responses, 4 people end up buying]
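The precise game-theoretic model is in §14; purely as an illustration of the slide's point, here is a toy best-response dynamic in which a person buys exactly when no friend buys (all modeling choices in this sketch are assumptions, not the thesis's model):

```python
def best_response_dynamics(neighbors, buyers=None):
    """Toy dynamic: a person buys iff no friend currently buys, and drops the
    purchase once a friend buys. A fixed point is individually optimal:
    nobody can improve by unilaterally switching."""
    buyers = set() if buyers is None else set(buyers)
    changed = True
    while changed:
        changed = False
        for v in sorted(neighbors):               # sequential best responses
            friend_buys = any(u in buyers for u in neighbors[v])
            if not friend_buys and v not in buyers:
                buyers.add(v); changed = True     # must buy to use the good
            elif friend_buys and v in buyers:
                buyers.remove(v); changed = True  # cheaper to borrow
    return buyers

# A star network: the popular center vs. 4 lonely leaves.
star = {'center': {'l1', 'l2', 'l3', 'l4'},
        'l1': {'center'}, 'l2': {'center'}, 'l3': {'center'}, 'l4': {'center'}}
print(best_response_dynamics(star, buyers={'l1'}))  # all 4 leaves end up buying
```

On this star, best responses drive all four leaves to buy even though a single purchase by the popular center would serve everyone, matching the popular-vs-lonely intuition from slide 65.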

  70. Social Inefficiency
  • An individually optimal outcome with 6 purchases
  [Figure: a larger network, including Carol and Dan, in which 6 people buy]
  • "How can we prevent this social inefficiency?"
