
Mining Large Dynamic Graphs and Tensors
Kijung Shin, Ph.D. Candidate (kijungs@cs.cmu.edu)
Thesis Committee: Prof. Christos Faloutsos (Chair), Prof. Tom M. Mitchell, Prof. Leman Akoglu, Prof. Philip S. Yu


  1. Fully Dynamic Graph Stream
  • Our model of a large and fully-dynamic graph
  • Discrete time t, starting from 1 and ever increasing
  • At each time t, a change in the input graph arrives
  ◦ change: either an insertion or a deletion of an edge
  Example (given): at times t = 1, 2, 3, 4, 5, …, the changes +(a,b), +(a,c), +(b,c), −(a,b), +(b,d), … arrive; the graph itself is never materialized.

  2. Problem Definition
  • Given:
  ◦ a fully-dynamic graph stream (possibly infinite)
  ◦ memory space (finite)
  • Estimate: the count of triangles
  • To Minimize: estimation error
  Example: from the input changes +(a,b), +(a,c), +(b,c), −(a,b), +(b,d), …, output an estimate of the number of triangles at each time t.

  3. Roadmap
  • T1. Structure Analysis (Part 1)
  ◦ T1.1 Triangle Counting
  ▪ Handling Deletions (§6)
  ◦ Problem Definition
  ◦ Proposed Method: ThinkD <<
  ◦ Experiments
  ▪ …
  • T2. Anomaly Detection (Part 2)
  • T3. Behavior Modeling (Part 3)
  • Future Directions
  • Conclusions

  4. Overview of ThinkD
  • Maintains and updates Δ̂
  ◦ the number of (non-deleted) triangles that it has observed
  • How it processes an insertion: arrive → count → test → store (if Yes)
  - arrive: an insertion of an edge arrives
  - count: count new triangles and increase Δ̂
  - test: toss a coin
  - store: store the edge in memory

  5. Overview of ThinkD (cont.)
  • Maintains and updates Δ̂
  ◦ the number of (non-deleted) triangles that it has observed
  • How it processes a deletion: arrive → count → test → delete (if stored)
  - arrive: a deletion of an edge arrives
  - count: count deleted triangles and decrease Δ̂
  - test: test whether the edge is stored in memory
  - delete: delete the edge from memory
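To make the two flows above concrete, here is a minimal Python sketch of ThinkD-FAST's update loop (hedged: the class and variable names are illustrative, and the thesis version adds details this sketch omits):

```python
import random
from collections import defaultdict

class ThinkDFastSketch:
    """Illustrative sketch of ThinkD-FAST; not the reference implementation."""

    def __init__(self, p):
        self.p = p                     # edge-sampling probability
        self.adj = defaultdict(set)    # adjacency sets of the stored (sampled) edges
        self.observed = 0              # Delta-hat: observed non-deleted triangles

    def process(self, sign, u, v):
        # count: triangles that (u, v) closes among the stored edges
        closed = len(self.adj[u] & self.adj[v])
        if sign == '+':
            self.observed += closed          # count before the coin toss
            if random.random() < self.p:     # test: Bernoulli trial
                self.adj[u].add(v)           # store
                self.adj[v].add(u)
        else:
            self.observed -= closed          # count deleted triangles
            if v in self.adj[u]:             # test: is the edge stored?
                self.adj[u].remove(v)        # delete
                self.adj[v].remove(u)

    def estimate(self):
        return self.observed / self.p ** 2   # unbiased estimate (Theorem 1)
```

Feeding it the example stream +(a,b), +(a,c), +(b,c), −(a,b), +(b,d) exercises both the insertion and deletion paths; by Theorem 1 below, `estimate()` is centered on the true count in expectation.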

  6. Why is ThinkD Accurate?
  • ThinkD (Think before You Discard):
  ◦ every arrived change is used to update Δ̂: arrive → count (update Δ̂) → test → store/delete or discard
  • Triest-FD [DERU17]:
  ◦ some changes are discarded without being used to update Δ̂: arrive → test → update Δ̂ and store/delete (if Yes) or discard (if No) → information loss!

  7. Two Versions of ThinkD
  • Q1: how to test in the test step; Q2: how to estimate the count of all triangles from Δ̂
  • ThinkD-FAST: simple and fast
  ◦ independent Bernoulli trials with probability p
  • ThinkD-ACC: accurate and parameter-free
  ◦ random pairing [GLH08]

  8. Unbiasedness of ThinkD-FAST
  • Δ̂/p²: estimated count of all triangles
  • Δ: true count of all triangles
  • [Theorem 1] At any time t, 𝔼[Δ̂/p²] = Δ, i.e., Δ̂/p² is an unbiased estimate of Δ
  • Proof and the variance of Δ̂/p²: see the thesis
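A one-line intuition for Theorem 1 in the insertion-only case (the fully-dynamic proof, which also handles deletions, is in the thesis):

```latex
% A triangle is observed when its last edge arrives iff its two earlier
% edges were both stored, which happens independently with probability p each:
\mathbb{E}\bigl[\hat{\Delta}\bigr]
  = \sum_{\text{triangles}} p^{2}
  = p^{2}\,\Delta
  \quad\Longrightarrow\quad
  \mathbb{E}\bigl[\hat{\Delta}/p^{2}\bigr] = \Delta .
```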

  9. ThinkD-ACC: More Accurate
  • Disadvantage of ThinkD-FAST:
  ◦ setting the parameter p is not trivial
  ▪ small p → underutilized memory → inaccurate estimation
  ▪ large p → out-of-memory error
  • ThinkD-ACC uses Random Pairing [GLH08]
  ◦ always utilizes memory as fully as possible
  ◦ gives more accurate estimation

  10. Scalability of ThinkD
  • Let k be the size of memory
  • For processing t changes in the input stream:
  • [Theorem 2] The time complexity of ThinkD-ACC is O(k · t): linear in the data size t
  • [Theorem 3] If p = O(k/t), the time complexity of ThinkD-FAST is O(k · t)

  11. Advantages of ThinkD
  • Fast & Accurate: outperforms competitors
  • Scalable: linear data scalability (Theorems 2 & 3)
  • Theoretically Sound: unbiased estimates (Theorem 1)

  12. Roadmap
  • T1. Structure Analysis (Part 1)
  ◦ T1.1 Triangle Counting
  ▪ Handling Deletions (§6)
  ◦ Problem Definition
  ◦ Proposed Method: ThinkD
  ◦ Experiments <<
  ▪ …
  • T2. Anomaly Detection (Part 2)
  • T3. Behavior Modeling (Part 3)
  • …

  13. Experimental Settings
  • Competitors: Triest-FD [DERU17] & ESD [HS17]
  ◦ triangle counting in fully-dynamic graph streams
  • Datasets: insertions (edges in real graphs) + deletions (a random 20%)
  ◦ Synthetic (100B edges), Social Networks (1.8B+ edges), Citation (16M+), Web (6M+), Trust (0.7M+)

  14. EXP1. Variance Analysis
  • ThinkD is accurate with small variance
  [Figure: estimates of Triest-FD, ThinkD-FAST, and ThinkD-ACC vs. the true count as the number of processed changes grows]

  15. EXP2. Scalability
  • [Theorems 2 & 3] ThinkD is scalable
  [Figure: running time of ThinkD-ACC and ThinkD-FAST vs. the number of changes, up to 100 billion changes]

  16. EXP3. Space & Accuracy
  • ThinkD outperforms its best competitors
  [Figure: estimation error (ratio) vs. memory budget (ratio) for Triest-FD, ESD, ThinkD-FAST, and ThinkD-ACC]

  17. EXP4. Speed & Accuracy
  • ThinkD outperforms its best competitors
  [Figure: estimation error (ratio) vs. running time (sec) for Triest-FD, ESD, ThinkD-FAST, and ThinkD-ACC]

  18. Advantages of ThinkD
  • Fast & Accurate: outperforms competitors
  • Scalable: linear data scalability
  • Theoretically Sound: unbiased estimates

  19. Summary of §6
  • We propose ThinkD (Think Before you Discard)
  ◦ for accurate triangle counting
  ◦ in large and fully-dynamic graphs
  • Fast & Accurate: outperforms competitors
  • Scalable: linear data scalability
  • Theoretically Sound: unbiased estimates
  • Download ThinkD

  20. Organization of the Thesis (Recall)
            Part 1. Structure Analysis   Part 2. Anomaly Detection    Part 3. Behavior Modeling
  Graphs    Triangle Count (§§3-6),      Anomalous Subgraph (§9)      Purchase Behavior (§14)
            Summarization (§7)
  Tensors   Summarization (§8)           Dense Subtensors (§§10-13)   Progression (§15)

  21. T1.2 Summarization
  • "Given a web-scale graph or tensor, how can we succinctly represent it?"
  [Figure: an input graph on nodes a-g and its summary graph with supernodes {a,b}, {c,d,e}, {f,g}]
  • §7: Summarizing Graphs
  • §8: Summarizing Tensors (via Tucker Decomposition)
  ◦ external-memory algorithm with 1,000× improved scalability

  22. Roadmap
  • T1. Structure Analysis (Part 1)
  ◦ …
  ◦ T1-2. Summarization (§§7-8)
  ▪ Summarizing Graphs (§7)
  ◦ Problem Definition <<
  ◦ Proposed Method: SWeG
  ◦ Experiments
  ▪ …
  • T2. Anomaly Detection (Part 2)
  • T3. Behavior Modeling (Part 3)
  • …
  K. Shin, A. Ghoting, M. Kim, and H. Raghavan, "SWeG: Lossless and Lossy Summarization of Web-Scale Graphs", WWW 2019

  23. Graph Summarization: Example
  • Input graph (with 9 edges) on nodes a-g
  • Output (with 6 edges): a summary graph with supernodes {a,b}, {c,d,e}, {f,g}, plus residual edges +(d,g), −(a,d), −(c,e)

  24. Graph Summarization [NRS08]
  • Given: an input graph
  • Find:
  ◦ a summary graph
  ◦ positive and negative residual graphs
  • To Minimize: the edge count (≈ description length)
  Example: summary graph with supernodes {a,b}, {c,d,e}, {f,g}; positive residual graph with +(d,g); negative residual graph with −(a,d), −(c,e)

  25. Restoration: Example
  • Summarized graph (with 6 edges): supernodes {a,b}, {c,d,e}, {f,g} and residual edges +(d,g), −(a,d), −(c,e)
  • Restored graph (with 9 edges): expand each superedge into all node pairs it represents, then add the positive residual edges and remove the negative ones
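The restoration rule is mechanical, so a short sketch may help (hedged: the function and the superedge layout below are illustrative reconstructions that merely match the slide's edge counts):

```python
from itertools import combinations

def restore(supernodes, summary_edges, pos_residual, neg_residual):
    """Expand superedges into node pairs, then apply the residual corrections."""
    edges = set()
    for A, B in summary_edges:
        if A == B:  # a self-loop on a supernode encodes all pairs inside it
            edges |= {frozenset(p) for p in combinations(supernodes[A], 2)}
        else:
            edges |= {frozenset((u, v)) for u in supernodes[A] for v in supernodes[B]}
    edges |= {frozenset(e) for e in pos_residual}   # add positive residual edges
    edges -= {frozenset(e) for e in neg_residual}   # remove negative residual edges
    return edges

# The slide's example: 3 superedges + 3 residual edges restore the 9 input edges.
supernodes = {'A': 'ab', 'B': 'cde', 'C': 'fg'}
restored = restore(supernodes,
                   summary_edges=[('A', 'B'), ('B', 'B'), ('C', 'C')],
                   pos_residual=[('d', 'g')],
                   neg_residual=[('a', 'd'), ('c', 'e')])
assert len(restored) == 9
```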

  26. Why Graph Summarization?
  • Summarization:
  ◦ the summary graph is easy to visualize and interpret
  • Compression:
  ◦ supports efficient neighbor queries (discussed in the thesis)
  ◦ applicable to lossy compression (discussed in the thesis)
  ◦ combinable with other graph-compression techniques
  ▪ the outputs are also graphs

  27. Challenge: Scalability!
  [Figure: compression performance vs. maximum size of graphs handled — Greedy [NRS08], Randomized [NRS08], VoG [KKVF14], and SAGS [KNL15] reach millions to tens of millions of edges, while SWeG reaches billions: a 10,000× improvement]

  28. Our Contribution: SWeG
  • We develop SWeG (Summarizing Web-scale Graphs):
  ◦ Fast with Concise Outputs
  ◦ Memory Efficient
  ◦ Scalable

  29. Roadmap
  • T1. Structure Analysis (Part 1)
  ◦ …
  ◦ T1-2. Summarization (§§7-8)
  ▪ Summarizing Graphs (§7)
  ◦ Problem Definition
  ◦ Proposed Method: SWeG <<
  ◦ Experiments
  ▪ …
  • T2. Anomaly Detection (Part 2)
  • T3. Behavior Modeling (Part 3)
  • …

  30. Terminologies
  • Summary graph S: supernodes A = {a,b}, B = {c,d,e}, C = {f,g}
  • Residual graph R: the positive residual graph R⁺ (here +(d,g)) and the negative residual graph R⁻ (here −(a,d), −(c,e))
  • Encoding cost when supernodes A and B are merged:
  Saving(A, B) := 1 − Cost(A ∪ B) / (Cost(A) + Cost(B))
  where Cost(A ∪ B) is the encoding cost of the merged supernode, and Cost(A), Cost(B) are the encoding costs of A and B
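As a sketch of where these costs come from, a common [NRS08]-style encoding charges, for each pair of supernodes, the cheaper of (i) listing the edges between them as positive residuals or (ii) one superedge plus the missing pairs as negative residuals (hedged: a simplification of the cost actually used in SWeG):

```python
def cost(A, candidate_neighbors, edges_between):
    """Approximate encoding cost of supernode A (a frozenset of nodes)."""
    total = 0
    for B in candidate_neighbors:
        e = edges_between(A, B)  # input edges with one end in A, the other in B
        if e == 0:
            continue
        pairs = (len(A) * (len(A) - 1) // 2) if A == B else len(A) * len(B)
        # either e positive residuals, or 1 superedge + (pairs - e) negatives
        total += min(e, pairs - e + 1)
    return total

def saving(A, B, candidate_neighbors, edges_between):
    """Saving(A, B) = 1 - Cost(A u B) / (Cost(A) + Cost(B))."""
    merged = A | B
    others = [X for X in candidate_neighbors if X != A and X != B]
    return 1 - cost(merged, others + [merged], edges_between) / (
        cost(A, candidate_neighbors, edges_between)
        + cost(B, candidate_neighbors, edges_between))
```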

  31. Overview of SWeG
  • Inputs: input graph G, number of iterations T
  • Outputs: summary graph S, residual graph R (or R⁺ and R⁻)
  • Procedure:
  ◦ S0: Initializing Step
  ◦ repeat T times:
  ▪ S1-1: Dividing Step
  ▪ S1-2: Merging Step
  ◦ S2: Compressing Step (optional)

  32. Overview: Initializing Step
  • Summary graph S = G, residual graph R = ∅
  • Each node starts as its own supernode: {a}, {b}, {c}, {d}, {e}, {f}, {g}

  33. Overview: Dividing Step
  • Divides the supernodes into groups
  ◦ MinHashing (used), EigenSpoke, Min-Cut, etc.
  [Figure: the supernodes {a}-{g} split into candidate groups]
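A minimal sketch of the MinHash-based dividing step (hedged: SWeG's actual implementation works on shingles of neighborhoods and reuses hashes across iterations; the names here are illustrative, and `neighbors` is a dict mapping each node to its set of neighbors):

```python
import random
from collections import defaultdict

def divide(supernodes, neighbors, seed=0):
    """Group supernodes by the minimum hash over their members' neighborhoods;
    supernodes with similar neighborhoods tend to collide into one group."""
    rng = random.Random(seed)
    node_hash = defaultdict(rng.random)  # lazily draw one random hash per node
    groups = defaultdict(list)
    for sn in supernodes:                # sn is a frozenset of nodes
        key = min(node_hash[u] for v in sn for u in (neighbors[v] | {v}))
        groups[key].append(sn)
    return list(groups.values())
```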

  34. Overview: Merging Step
  • Merges some supernodes within each group if Saving > θ(t)
  [Figure: {a} and {b} merge into {a,b}, {d} and {e} into {d,e}, and {f} and {g} into {f,g}; {c} stays]

  35. Overview: Merging Step (cont.)
  • Summary graph S so far: supernodes {a,b}, {c}, {d,e}, {f,g}
  • Residual graph R so far: +(d,g), −(a,d)

  36. Overview: Dividing Step
  • Divides the current supernodes into groups: {a,b}, {c}, {d,e}, {f,g}

  37. Overview: Merging Step
  • Merges some supernodes within each group if Saving > θ(t)
  ◦ {c} and {d,e} merge into {c,d,e}; {a,b} and {f,g} stay

  38. Overview: Merging Step (cont.)
  • Final summary graph S: supernodes {a,b}, {c,d,e}, {f,g}
  • Final residual graph R: +(d,g), −(a,d), −(c,e)

  39. Overview: Merging Step (cont.)
  • Merges some supernodes within each group if Saving > θ(t)
  • Decreasing threshold θ(t) = (1 + t)⁻¹
  ◦ early iterations: exploration across groups
  ◦ later iterations: exploitation within each group
  ◦ ~30% better compression than a constant threshold θ(t) = 0
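A brute-force sketch of one merging pass over a single group (hedged: it assumes a two-argument `saving(A, B)` closure, e.g., the earlier sketch with the graph arguments bound, and SWeG selects candidate pairs far more cleverly):

```python
def merge_group(group, t, saving):
    """Greedily merge pairs in `group` while some pair's Saving exceeds theta(t)."""
    theta = 1.0 / (1 + t)            # decreasing threshold theta(t) = (1 + t)^-1
    merged_something = True
    while merged_something and len(group) > 1:
        merged_something = False
        # brute force: find the best pair in the group
        gain, i, j = max(((saving(a, b), i, j)
                          for i, a in enumerate(group)
                          for j, b in enumerate(group) if i < j),
                         key=lambda x: x[0])
        if gain > theta:             # accept the merge only above the threshold
            a, b = group[i], group[j]
            group = [sn for k, sn in enumerate(group) if k not in (i, j)]
            group.append(a | b)      # the merged supernode
            merged_something = True
    return group
```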

  40. Overview: Compressing Step
  • Compresses each output graph (S, R⁺, and R⁻)
  • Any off-the-shelf graph-compression algorithm can be used:
  ◦ Boldi-Vigna [BV04]
  ◦ VNMiner [BC08]
  ◦ Graph Bisection [DKKO+16]
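Putting the steps together, a hedged end-to-end driver built from the `divide` and `merge_group` sketches above (the optional compressing step is delegated to the off-the-shelf tools the slide lists):

```python
def sweg(nodes, neighbors, T, saving):
    """Illustrative top-level loop of SWeG; `saving` is a two-argument closure."""
    supernodes = [frozenset([v]) for v in nodes]                  # S0: initializing
    for t in range(1, T + 1):
        next_round = []
        for group in divide(supernodes, neighbors, seed=t):       # S1-1: dividing
            next_round.extend(merge_group(group, t, saving))      # S1-2: merging
        supernodes = next_round
    return supernodes  # S2 (optional): compress S, R+, R- with external tools
```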

  41. Parallel & Distributed Processing
  • Map stage: compute MinHashes in parallel
  • Shuffle stage: divide supernodes using MinHashes
  • Reduce stage: process groups independently in parallel
  • No need to load the entire graph in memory!
  [Figure: supernodes partitioned by MinHash value into three groups, each merged independently]
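A hedged sketch of how the two steps map onto one MapReduce round (illustrative stage functions, not the actual Hadoop implementation; `node_hash` is a precomputed dict of random hashes):

```python
def map_stage(sn, neighbors, node_hash):
    """Emit (MinHash key, supernode); only sn's neighborhood is needed."""
    key = min(node_hash[u] for v in sn for u in (neighbors[v] | {v}))
    return key, sn

def reduce_stage(key, group, t, saving):
    """Each group is small enough to be merged independently in memory."""
    return merge_group(list(group), t, saving)
```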

  42. Roadmap
  • T1. Structure Analysis (Part 1)
  ◦ …
  ◦ T1-2. Summarization (§§7-8)
  ▪ Summarizing Graphs (§7)
  ◦ Problem Definition
  ◦ Proposed Method: SWeG
  ◦ Experiments <<
  ▪ …
  • T2. Anomaly Detection (Part 2)
  • T3. Behavior Modeling (Part 3)
  • …

  43. Experimental Settings
  • 13 real-world graphs (10K - 20B edges): social, collaboration, citation, web, …
  • Graph-summarization algorithms compared:
  ◦ Greedy [NRS08], Randomized [NRS08], SAGS [KNL15]

  44. EXP1. Speed and Compression
  • SWeG outperforms its competitors
  [Figure: running time vs. output size for SWeG and the competitors]

  45. Advantages of SWeG (Recall)
  • Fast with Concise Outputs
  • Memory Efficient
  • Scalable

  46. EXP2. Memory Efficiency
  • SWeG loads ≤ 0.1-4% of the edges in main memory at once
  [Figure: memory usage vs. input-graph size — up to 294× and 1209× smaller than the input graph]

  47. Advantages of SWeG (Recall)
  • Fast with Concise Outputs
  • Memory Efficient
  • Scalable

  48. EXP3. Effect of Iterations
  • About 20 iterations are enough

  49. EXP4. Data Scalability
  • SWeG is linear in the number of edges
  [Figure: running time vs. number of edges for SWeG (single machine) and SWeG (Hadoop), up to ≥ 20 billion edges]

  50. EXP5. Machine Scalability
  • SWeG (Hadoop) scales up with the number of machines

  51. Advantages of SWeG (Recall)
  • Fast with Concise Outputs
  • Memory Efficient
  • Scalable

  52. Summary of §7
  • We propose SWeG (Summarizing Web-scale Graphs)
  ◦ for summarizing large-scale graphs
  • Fast with Concise Outputs
  • Memory Efficient
  • Scalable

  53. Contributions and Impact (Part 1)
  • Triangle-counting algorithms [ICDM17, PKDD18, PAKDD18]
  • Summarization algorithms [WSDM17, WWW19]
  • Patent on SWeG: filed by LinkedIn Inc.
  • Open-source software (ThinkD, SWeG): downloaded 82 times — github.com/kijungs

  54. Organization of the Thesis (Recall)
            Part 1. Structure Analysis   Part 2. Anomaly Detection    Part 3. Behavior Modeling
  Graphs    Triangle Count (§§3-6),      Anomalous Subgraph (§9)      Purchase Behavior (§14)
            Summarization (§7)
  Tensors   Summarization (§8)           Dense Subtensors (§§10-13)   Progression (§15)

  55. T2. Anomaly Detection (Part 2)
  • "How can we detect anomalies or fraudsters in large dynamic graphs (or tensors)?"
  • Hint: fraudsters tend to form dense subgraphs
  [Figure: a graph in which benign users are sparsely connected while a suspicious group forms a dense subgraph]

  56. T2-1. Utilizing Patterns
  • T2-1. Patterns and Anomalies in Dense Subgraphs (§9)
  ◦ "What are patterns in dense subgraphs?"
  ◦ "What are anomalies deviating from the patterns?"
  K. Shin, T. Eliassi-Rad, and C. Faloutsos, "Patterns and Anomalies in k-Cores of Real-world Graphs with Applications", KAIS 2018 (formerly, ICDM 2016)

  57. T2-2. Utilizing Side Information
  [Figure: an accounts × items matrix extended with side information into a tensor]

  58. T2-2. Utilizing Side Information
  • "How can we detect dense subtensors in large dynamic data?"
  • T2-2. Detecting Dense Subtensors (§§11-13)
  ◦ In-memory Algorithm (§11)
  ◦ Distributed Algorithm for Web-scale Tensors (§12)
  ◦ Incremental Algorithms for Dynamic Tensors (§13)

  59. Contributions and Impact (Part 2)
  • Patterns in dense subgraphs [ICDM16]
  ◦ Award: best paper candidate at ICDM 2016
  ◦ Class: …
  • Algorithms for dense subtensors [PKDD16, WSDM17, KDD17]
  ◦ Real-world usage: …
  • Open-source software: downloaded 257 times — github.com/kijungs

  60. Organization of the Thesis (Recall)
            Part 1. Structure Analysis   Part 2. Anomaly Detection    Part 3. Behavior Modeling
  Graphs    Triangle Count (§§3-6),      Anomalous Subgraph (§9)      Purchase Behavior (§14)
            Summarization (§7)
  Tensors   Summarization (§8)           Dense Subtensors (§§10-13)   Progression (§15)

  61. T3. Behavior Modeling (Part 3)
  • "How can we model the behavior of individuals in graph and tensor data?"
  [Figure: a social network; a behavior log on social media]
  • T3-1. Modeling Purchase Behavior in a Social Network (§14)
  • T3-2. Modeling Progression of Users of Social Media (§15)
  ◦ "How do users evolve over time on social media?"

  62. Roadmap
  • T1. Structure Analysis (Part 1)
  • T2. Anomaly Detection (Part 2)
  • T3. Behavior Modeling (Part 3)
  ◦ T3-1. Modeling Purchases (§14) <<
  ◦ …
  • Future Directions
  • Conclusions
  K. Shin, E. Lee, D. Eswaran, and A. D. Procaccia, "Why You Should Charge Your Friends for Borrowing Your Stuff", IJCAI 2017

  63. Sharable Goods: Question
  • Portable crib, IKEA toolkit, DVDs
  • "What do they have in common?"

  64. Sharable Goods: Properties
  • Used occasionally
  • Shared with friends
  • Not shared with strangers

  65. Motivation: Social Inefficiency
                           Popular (shares with many)   Lonely (shares with few)
  Efficiency of Purchase   High                         Low
  Likelihood of Purchase   can be Low                   can be High
                           (likely to borrow)           (likely to buy)
  • Q1: "How large can social inefficiency be?"
  • Q2: "How to nudge people towards efficiency?"

  66. Roadmap
  • T1. Structure Analysis (Part 1)
  • T2. Anomaly Detection (Part 2)
  • T3. Behavior Modeling (Part 3)
  ◦ T3-1. Modeling Purchases (§14)
  ▪ Toy Example <<
  ▪ Game-theoretic Model
  ▪ Best Rental-fee Search
  ◦ …
  • Future Directions
  • Conclusions

  67. Social Network
  • Consider a social network, which is a graph
  ◦ Nodes: people
  ◦ Edges: friendship
  [Figure: a small network including Alice, Bob, and Carol]
  • "How many people should buy an IKEA toolkit for everyone to use it?"

  68. Socially Optimal Decision
  • The answer is: at least 2
  • Socially optimal:
  ◦ everyone uses a toolkit
  ◦ with minimum purchases (i.e., with minimum cost)
  [Figure: only Alice and Bob buy; everyone else borrows from a friend]
  • "Does everyone want to stick to their current decisions?"

  69. Individually Optimal Decision
  • The answer is No
  • Individually optimal:
  ◦ everyone best-responds to the others' decisions
  • Socially inefficient (suboptimal):
  ◦ 4 purchases happen when 2 are optimal
  [Figure: after best responses, 4 people end up buying]
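The precise game-theoretic model is in §14; purely as an illustration of the slide's point, here is a toy best-response dynamic in which a person buys exactly when no friend buys (all modeling choices in this sketch are assumptions, not the thesis's model):

```python
def best_response_dynamics(neighbors, buyers=None):
    """Toy dynamic: a person buys iff no friend currently buys, and drops the
    purchase once a friend buys. A fixed point is individually optimal:
    nobody can improve by unilaterally switching."""
    buyers = set() if buyers is None else set(buyers)
    changed = True
    while changed:
        changed = False
        for v in sorted(neighbors):               # sequential best responses
            friend_buys = any(u in buyers for u in neighbors[v])
            if not friend_buys and v not in buyers:
                buyers.add(v); changed = True     # must buy to use the good
            elif friend_buys and v in buyers:
                buyers.remove(v); changed = True  # cheaper to borrow
    return buyers

# A star network: the popular center vs. 4 lonely leaves.
star = {'center': {'l1', 'l2', 'l3', 'l4'},
        'l1': {'center'}, 'l2': {'center'}, 'l3': {'center'}, 'l4': {'center'}}
print(best_response_dynamics(star, buyers={'l1'}))  # all 4 leaves end up buying
```

On this star, best responses drive all four leaves to buy even though a single purchase by the popular center would serve everyone, matching the popular-vs-lonely intuition from slide 65.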

  70. Social Inefficiency
  • An individually optimal outcome with 6 purchases
  [Figure: a larger network, including Carol and Dan, in which 6 people buy]
  • "How can we prevent this social inefficiency?"
