SLIDE 1

Mining Large Dynamic Graphs and Tensors

Kijung Shin Ph.D. Candidate (kijungs@cs.cmu.edu)

SLIDE 2

Thesis Committee

  • Prof. Christos Faloutsos (Chair)
  • Prof. Tom M. Mitchell
  • Prof. Leman Akoglu
  • Prof. Philip S. Yu


SLIDE 3


What Do Real Graphs Look Like?

  • Part 1 (Chapters 3 - 8)
SLIDE 4


How to Spot Anomalies?

  • Part 2 (Chapters 9 - 13)
SLIDE 5

How to Model Behavior?


  • Part 3 (Chapters 14 - 15)
SLIDE 6

Graphs are Everywhere!


SLIDE 7

Graphs are Large and Dynamic

  • Large: many nodes, more edges
    ▪ 2B+ active users, 500M+ products, 40B+ web pages, 5M+ articles
  • Dynamic: additions/deletions of nodes and edges

SLIDE 8

.. and with Rich Side Information

  • Rich: timestamps, scores, text, etc.

[Figure: product reviews carrying scores and text, from "Poor Quality" to "Satisfied ☺"]

SLIDE 9

Simple Graphs are Matrices

[Figure: a graph and its adjacency matrix, with an entry of 1 per edge]

SLIDE 10

Rich Graphs are Tensors

  • Tensors: multi-dimensional arrays
  • A 3-order tensor is a 3-dimensional array; adding stars gives a 4-order tensor, and adding text a 5-order tensor

SLIDE 11

Thesis Goal and Focus

  • Goal: to fully understand and utilize large dynamic graphs and tensors
  • Our Focus: to develop scalable algorithms for
    ▪ T1. Structure Analysis (Part 1)
    ▪ T2. Anomaly Detection (Part 2)
    ▪ T3. Behavior Modeling (Part 3)
SLIDE 12

Tasks and Their Relation

  • Given large dynamic graphs or tensors:
    ▪ T1. Structure: "What do they look like?"
    ▪ T2. Anomaly: "How to spot anomalies?" (by contrast with the typical structure)
    ▪ T3. Behavior: "How to model behavior?"

SLIDE 13

Our Tools for Scalability

  • We design (sub-)linear algorithms
    ▪ approximation, sampling, streaming, out-of-core, parallel
  • Running on big data platforms
  • Exploiting empirical patterns in data
    ▪ locality, power laws, etc.
SLIDE 14

Organization of the Thesis

           Part 1. Structure Analysis   Part 2. Anomaly Detection     Part 3. Behavior Modeling
Graphs     Triangle Count (§§ 3-6)      Anomalous Subgraph (§ 9)      Purchase Behavior (§ 14)
           Summarization (§ 7)
Tensors    Summarization (§ 8)          Dense Subtensors (§§ 10-13)   Progression (§ 15)

SLIDE 15

Focuses of This Presentation

           Part 1. Structure Analysis   Part 2. Anomaly Detection     Part 3. Behavior Modeling
Graphs     Triangle Count (§§ 3-6)      Anomalous Subgraph (§ 9)      Purchase Behavior (§ 14)
           Summarization (§ 7)
Tensors    Summarization (§ 8)          Dense Subtensors (§§ 10-13)   Progression (§ 15)

SLIDE 16

Roadmap

  • T1. Structure Analysis (Part 1) <<
  • T2. Anomaly Detection (Part 2)
  • T3. Behavior Modeling (Part 3)
  • Future Directions
  • Conclusion


SLIDE 17

  • T1. Structure Analysis (Part 1)

"Given a large graph (or tensor), how can we analyze its structure?"

  • T1-1. Triangle Counting: basic statistics (e.g., triangle counts) of the input graph on nodes a-g, which feed structure measures:
    ▪ density, clustering coefficients, transitivity ratio, triangle connectivity
  • T1-2. Summarization: a summary graph over supernodes {a, b}, {c, d, e}, {f, g}

SLIDE 18

T1-1. Triangle Counting (§§ 4-6)

"Given a large dynamic graph, how can we track the count of triangles accurately with sub-linear memory?"

[Figure: a stream of edge insertions and deletions; the triangle count to be tracked changes over time]

SLIDE 19

T1-1. Triangle Counting (§§ 4-6)

[Figure: an edge stream whose edges carry creation timestamps (9:02 AM, 9:08 AM, 9:21 AM)]

  • How can we exploit temporal patterns? (§4)
  • How can we make good use of multiple machines? (§5)
  • How can we handle removed edges? (§6)

SLIDE 20

Roadmap

  • T1. Structure Analysis (Part 1)
  • T1-1. Triangle Counting

▪Handling Deletions (§6) <<

▪ …

  • T2. Anomaly Detection (Part 2)
  • T3. Behavior Modeling (Part 3)
  • Future Directions
  • Conclusions

  • K. Shin, J. Kim, B. Hooi, C. Faloutsos, "Think before You Discard: Accurate Triangle Counting in Graph Streams with Deletions", ECML/PKDD 2018

SLIDE 21

Triangles in a Graph

  • A triangle is 3 nodes connected to each other
  • The count of triangles is an important primitive
  • Applications:
    ▪ community detection, spam detection, link prediction
  • Structure measures:
    ▪ transitivity ratio, clustering coefficients, trussness
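These definitions are concrete enough to code directly; here is a minimal exact counter (a sketch in Python; the function name and the deliberately simple triple enumeration are ours):

```python
from itertools import combinations

def triangle_count(edges):
    """Count triangles exactly: a triangle is 3 nodes connected to each other."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # examine each triple of nodes once (O(n^3); fine for toy graphs only)
    return sum(1 for u, v, w in combinations(sorted(adj), 3)
               if v in adj[u] and w in adj[u] and w in adj[v])

print(triangle_count([(1, 2), (1, 3), (2, 3), (2, 4)]))  # prints 1
```

Practical algorithms instead intersect neighbor sets edge by edge, which is the idea the streaming methods below build on.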

SLIDE 22

Remaining Challenge

  • Counting triangles in real-world graphs:
    ▪ Large: not fitting in main memory
    ▪ Fully dynamic: both growing and shrinking
  • Examples: online social networks, citation networks, call networks, the Web

SLIDE 23

Previous Work

  • Given: a large and fully-dynamic graph
  • To estimate: the count of triangles accurately
  • Prior streaming methods either handle only growing graphs (MASCOT [LJK18], Triest-IMPR [DERU17], WRS [Shi17]) or support deletions at a cost in accuracy (ESD [HS17], Triest-FD [DERU17])
  • ThinkD (proposed): accurate on large, fully-dynamic graphs

SLIDE 24

Our Contribution: ThinkD

  • We develop ThinkD (Think before You Discard):
    ▪ Fast & Accurate: outperforming competitors
    ▪ Scalable: linear data scalability
    ▪ Theoretically Sound: unbiased estimates

SLIDE 25

Roadmap

  • T1. Structure Analysis (Part 1)
  • T1-1. Triangle Counting

▪Handling Deletions (§6)

  • Problem Definition <<
  • Proposed Method: ThinkD
  • Experiments

▪ …

  • T2. Anomaly Detection (Part 2)
  • T3. Behavior Modeling (Part 3)


SLIDE 26

Fully Dynamic Graph Stream

  • Our model for a large and fully-dynamic graph
  • Discrete time t, starting from 1 and ever increasing
  • At each time t, a change in the input graph arrives
    ▪ change: either an insertion or deletion of an edge
  • The graph itself is not materialized

Time t           1         2         3         4         5        …
Change (given)   +(a, b)   +(a, c)   +(b, c)   −(a, b)   +(b, d)  …
SLIDE 32

Problem Definition

  • Given:
    ▪ a fully-dynamic graph stream (possibly infinite)
    ▪ memory space (finite)
  • Estimate: the count of triangles
  • To Minimize: estimation error

Time t            1         2         3         4         5        …
Changes (given)   +(a, b)   +(a, c)   +(b, c)   −(a, b)   +(b, d)  …
# Triangles       (to be estimated at each time)
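To make the problem concrete, the stream can be encoded as (operation, endpoint, endpoint) tuples (our encoding). The exact baseline below materializes the whole graph, which is precisely what the memory constraint forbids at scale, and reports the quantity an estimator must track:

```python
from collections import defaultdict

# the toy stream from the slide, in our tuple encoding
stream = [('+', 'a', 'b'), ('+', 'a', 'c'), ('+', 'b', 'c'),
          ('-', 'a', 'b'), ('+', 'b', 'd')]

def exact_triangle_counts(stream):
    """Materialize the graph and report the exact triangle count after each change."""
    adj, count, history = defaultdict(set), 0, []
    for op, u, v in stream:
        if op == '-':
            adj[u].discard(v); adj[v].discard(u)
        delta = len(adj[u] & adj[v])            # triangles through edge (u, v)
        count += delta if op == '+' else -delta
        if op == '+':
            adj[u].add(v); adj[v].add(u)
        history.append(count)
    return history

print(exact_triangle_counts(stream))  # [0, 0, 1, 0, 0]
```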

SLIDE 33

Roadmap

  • T1. Structure Analysis (Part 1)
  • T1-1. Triangle Counting

▪Handling Deletions (§6)

  • Problem Definition
  • Proposed Method: ThinkD <<
  • Experiments

▪ …

  • T2. Anomaly Detection (Part 2)
  • T3. Behavior Modeling (Part 3)
  • Future Directions
  • Conclusions


SLIDE 34

Overview of ThinkD

  • ThinkD maintains and updates Δ̂:
    ▪ the number of (non-deleted) triangles that it has observed
  • How it processes an insertion:
    ▪ arrive: an insertion of an edge arrives
    ▪ count: count the new triangles and increase Δ̂
    ▪ test: toss a coin to decide whether to keep the edge
    ▪ store: if so, store the edge in memory

SLIDE 35

Overview of ThinkD (cont.)

  • ThinkD maintains and updates Δ̂:
    ▪ the number of (non-deleted) triangles that it has observed
  • How it processes a deletion:
    ▪ arrive: a deletion of an edge arrives
    ▪ count: count the deleted triangles and decrease Δ̂
    ▪ test: test whether the edge is stored in memory
    ▪ delete: if so, delete the edge from memory

A sketch of both update paths follows.
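A minimal sketch of the sampling-based variant described on the following slides (ThinkD-FAST); class and attribute names are ours, and the real implementation manages memory more carefully:

```python
import random
from collections import defaultdict

class ThinkDFast:
    """Sketch: keep each arriving edge with probability p, but always update
    the triangle count first -- think *before* you discard."""
    def __init__(self, p):
        self.p = p                    # sampling probability
        self.adj = defaultdict(set)   # sampled edges kept in memory
        self.observed = 0             # Δ̂: observed (non-deleted) triangles

    def process(self, op, u, v):
        common = self.adj[u] & self.adj[v]   # count: triangles through (u, v)
        if op == '+':
            self.observed += len(common)     # count before the coin toss
            if random.random() < self.p:     # test
                self.adj[u].add(v); self.adj[v].add(u)   # store
        else:
            self.observed -= len(common)     # count deleted triangles
            self.adj[u].discard(v); self.adj[v].discard(u)  # delete if stored

    def estimate(self):
        # a triangle is observed only if its two earlier edges were both
        # stored, which happens with probability p**2
        return self.observed / self.p ** 2
```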

SLIDE 36

Why is ThinkD Accurate?

  • ThinkD (Think before You Discard):
    ▪ every arrived change is used to update Δ̂: count first, then test whether to store or delete
  • Triest-FD [DERU17]:
    ▪ some changes are discarded without being used to update Δ̂ → information loss!

SLIDE 37

Two Versions of ThinkD

  • Q1: how to test in the test step? Q2: how to estimate the count of all triangles from Δ̂?
  • ThinkD-FAST: simple and fast
    ▪ test: independent Bernoulli trials with probability p
  • ThinkD-ACC: accurate and parameter-free
    ▪ test: random pairing [GLH08]

SLIDE 38

Unbiasedness of ThinkD-FAST

[Theorem 1] At any time t, 𝔼[Δ̂ / p²] = Δ

  • Δ̂ / p²: the estimated count of all triangles (an unbiased estimate of Δ)
  • Δ: the true count of all triangles
  • Proof and the variance of Δ̂ / p²: see the thesis
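Theorem 1 is easy to sanity-check empirically. Assuming the ThinkDFast sketch above, averaging Δ̂ / p² over many independent runs should approach the true count; a 5-clique insertion stream, which contains exactly 10 triangles, makes a convenient test:

```python
import random, statistics
from itertools import combinations

clique_stream = [('+', u, v) for u, v in combinations('abcde', 2)]  # 10 triangles

p, runs, estimates = 0.5, 2000, []
for seed in range(runs):
    random.seed(seed)
    counter = ThinkDFast(p)
    for change in clique_stream:
        counter.process(*change)
    estimates.append(counter.estimate())

print(statistics.mean(estimates))  # ≈ 10.0, as Theorem 1 predicts
```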

SLIDE 39

ThinkD-ACC: More Accurate

  • Disadvantage of ThinkD-FAST:
    ▪ setting the parameter p is not trivial
    ▪ small p → underutilized memory → inaccurate estimation
    ▪ large p → out-of-memory error
  • ThinkD-ACC uses Random Pairing [GLH08]
    ▪ always utilizes memory as fully as possible
    ▪ gives more accurate estimation

SLIDE 40

Scalability of ThinkD

  • Let k be the size of memory
  • For processing t changes in the input stream:
    ▪ [Theorem 2] The time complexity of ThinkD-ACC is O(k · t)
    ▪ [Theorem 3] If p = O(k/t), the time complexity of ThinkD-FAST is O(k · t)
  • Both are linear in the data size t

SLIDE 41

Advantages of ThinkD

  • Fast & Accurate: outperforming competitors
  • Scalable: linear data scalability (Theorems 2 & 3)
  • Theoretically Sound: unbiased estimates (Theorem 1)

SLIDE 42

Roadmap

  • T1. Structure Analysis (Part 1)
  • T1-1. Triangle Counting

▪Handling Deletions (§6)

  • Problem Definition
  • Proposed Method: ThinkD
  • Experiments <<

▪ …

  • T2. Anomaly Detection (Part 2)
  • T3. Behavior Modeling (Part 3)


SLIDE 43

Experimental Settings

  • Competitors: Triest-FD [DERU17] & ESD [HS17]
    ▪ triangle counting in fully-dynamic graph streams
  • Implementations:
  • Datasets: Trust (0.7M+ edges), Web (6M+), Citation (16M+), Social Networks (1.8B+), Synthetic (100B)
    ▪ insertions (the edges of each graph) + deletions (a random 20%)

SLIDE 44

  • EXP1. Variance Analysis

[Plot: estimated vs. true triangle count over the number of processed changes, for Triest-FD, ThinkD-ACC, and ThinkD-FAST]

ThinkD is accurate with small variance

SLIDE 45

  • EXP2. Scalability [Theorems 2 & 3]

[Plot: running time vs. number of changes, up to 100 billion changes (synthetic dataset), for ThinkD-ACC and ThinkD-FAST]

ThinkD is scalable

SLIDE 46

  • EXP3. Space & Accuracy

[Plot: estimation error (ratio) vs. memory budget (ratio), for ThinkD-ACC, ThinkD-FAST, Triest-FD, and ESD]

ThinkD outperforms its best competitors

SLIDE 47

  • EXP4. Speed & Accuracy

[Plot: estimation error (ratio) vs. running time (sec), for ThinkD-ACC, ThinkD-FAST, Triest-FD, and ESD]

ThinkD outperforms its best competitors

SLIDE 48

Advantages of ThinkD

  • Fast & Accurate: outperforming competitors
  • Scalable: linear data scalability
  • Theoretically Sound: unbiased estimates

SLIDE 49

Summary of §6

  • We propose ThinkD (Think before You Discard)
    ▪ for accurate triangle counting
    ▪ in large and fully-dynamic graphs
  • Fast & Accurate: outperforming competitors
  • Scalable: linear data scalability
  • Theoretically Sound: unbiased estimates
  • Code: available for download

SLIDE 50

Organization of the Thesis (Recall)

           Part 1. Structure Analysis   Part 2. Anomaly Detection     Part 3. Behavior Modeling
Graphs     Triangle Count (§§ 3-6)      Anomalous Subgraph (§ 9)      Purchase Behavior (§ 14)
           Summarization (§ 7)
Tensors    Summarization (§ 8)          Dense Subtensors (§§ 10-13)   Progression (§ 15)

SLIDE 51

T1-2. Summarization

"Given a web-scale graph or tensor, how can we succinctly represent it?"

[Figure: input graph on nodes a-g → summary graph over supernodes {a, b}, {c, d, e}, {f, g}]

  • §7: Summarizing Graphs
  • §8: Summarizing Tensors (via Tucker Decomposition)
    ▪ an external-memory algorithm with 1,000× improved scalability
SLIDE 52

Roadmap

  • T1. Structure Analysis (Part 1)
  • T1-2. Summarization (§§ 7-8)

▪Summarizing Graphs (§ 7)

  • Problem Definition <<
  • Proposed Method: SWeG
  • Experiments

▪…

  • T2. Anomaly Detection (Part 2)
  • T3. Behavior Modeling (Part 3)

  • K. Shin, A. Ghoting, M. Kim, and H. Raghavan, "SWeG: Lossless and Lossy Summarization of Web-Scale Graphs", WWW 2019

SLIDE 53

Graph Summarization: Example

[Figure: step-by-step summarization of a graph on nodes a-g (9 edges): merging {a, b} adds the residual edge −(a, d); merging {c, d, e} adds −(c, e) and +(d, g); merging {f, g} completes the output, a summary graph plus residuals with 6 edges in total]

SLIDE 54

Graph Summarization [NRS08]

  • Given: an input graph
  • Find:
    ▪ a summary graph
    ▪ positive and negative residual graphs
  • To Minimize: the edge count (≈ description length)

[Figure: input graph on nodes a-g → summary graph over {a, b}, {c, d, e}, {f, g}; positive residual +(d, g); negative residuals −(a, d), −(c, e)]

SLIDE 55

Restoration: Example

[Figure: restoration reverses the merges: starting from the summary graph (6 edges) over {a, b}, {c, d, e}, {f, g} with residuals −(a, d), −(c, e), +(d, g), expanding the supernodes and applying the residual edges recovers the original graph (9 edges)]
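Restoration is mechanical; a minimal sketch under our representation (a superedge (A, B) encodes all node pairs across A and B, and (A, A) all pairs inside A):

```python
from itertools import combinations

def restore(summary_edges, members, residual_plus, residual_minus):
    """Rebuild the original edge set from a summary.
    members: supernode -> set of its nodes; summary_edges: supernode pairs;
    residual_plus / residual_minus: edges added / removed after expansion."""
    edges = set()
    for A, B in summary_edges:
        if A == B:
            edges |= {frozenset(p) for p in combinations(members[A], 2)}
        else:
            edges |= {frozenset((u, v)) for u in members[A] for v in members[B]}
    edges |= {frozenset(e) for e in residual_plus}   # corrections: add
    edges -= {frozenset(e) for e in residual_minus}  # corrections: remove
    return edges
```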

SLIDE 56

Why Graph Summarization?

  • Summarization:
    ▪ the summary graph is easy to visualize and interpret
  • Compression:
    ▪ supports efficient neighbor queries
    ▪ applicable to lossy compression
    ▪ combinable with other graph compression techniques, since the outputs are also graphs
  • (these uses are discussed in the thesis)

SLIDE 57

Challenge: Scalability!

[Chart: compression performance vs. maximum manageable graph size. Greedy [NRS08] and VoG [KKVF14] handle graphs with millions of edges, Randomized [NRS08] tens of millions, and SAGS [KNL15] billions; SWeG pushes the maximum size by 10,000×]

SLIDE 58

Our Contribution: SWeG

  • We develop SWeG (Summarizing Web-scale Graphs):
    ▪ Fast with Concise Outputs
    ▪ Memory Efficient
    ▪ Scalable

SLIDE 59

Roadmap

  • T1. Structure Analysis (Part 1)
  • T1-2. Summarization (§§ 7-8)

▪Summarizing Graphs (§ 7)

  • Problem Definition
  • Proposed Method: SWeG <<
  • Experiments

▪…

  • T2. Anomaly Detection (Part 2)
  • T3. Behavior Modeling (Part 3)


SLIDE 60

Terminology

  • Summary graph S: supernodes A = {a, b}, B = {c, d, e}, C = {f, g}
  • Residual graph R: positive residual graph R+ (e.g., +(d, g)) and negative residual graph R− (e.g., −(a, d), −(c, e))
  • Saving(A, B) := 1 − Cost(A ∪ B) / (Cost(A) + Cost(B))
    ▪ Cost(A ∪ B): the encoding cost when A and B are merged
    ▪ Cost(A), Cost(B): the encoding costs of A and B

SLIDE 61

Overview of SWeG

  • Inputs: input graph G; number of iterations T
  • Outputs: summary graph S; residual graph R (or R+ and R−)
  • Procedure (a skeleton follows):
    ▪ S0: Initializing Step
    ▪ repeat T times:
      - S1-1: Dividing Step
      - S1-2: Merging Step
    ▪ S2: Compressing Step (optional)
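A skeleton of this outer loop (a sketch under our representation; the dividing, merging, cost, and encoding routines are passed in as functions, and sketches of the first two follow the corresponding slides below):

```python
def sweg(adjacency, T, divide, merge_group, encode, cost):
    """Outer loop of SWeG (sketch). adjacency: node -> set of neighbors."""
    supernodes = {frozenset([v]) for v in adjacency}    # S0: one supernode per node
    for t in range(1, T + 1):
        theta = (1 + t) ** -1                           # decreasing merge threshold
        merged = set()
        for group in divide(supernodes, adjacency):     # S1-1: dividing
            merged |= merge_group(group, cost, theta)   # S1-2: merging
        supernodes = merged
    return encode(supernodes, adjacency)                # derive S, R+, R− (then S2)
```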

SLIDE 62

Overview: Initializing Step

  • S0: Initializing Step <<
  • repeat T times:
    ▪ S1-1: Dividing Step
    ▪ S1-2: Merging Step
  • S2: Compressing Step (optional)

  • Each node starts as its own supernode: A = {a}, B = {b}, C = {c}, D = {d}, E = {e}, F = {f}, G = {g}
  • Summary graph S = input graph G; residual graph R = ∅

SLIDE 63

Overview: Dividing Step

  • S0: Initializing Step
  • repeat T times:
    ▪ S1-1: Dividing Step <<
    ▪ S1-2: Merging Step
  • S2: Compressing Step (optional)

  • Divides supernodes into groups
    ▪ by MinHashing (used); alternatives include EigenSpoke, Min-Cut, etc.

A sketch of this step follows.
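A minimal sketch of MinHash-based dividing (ours; it uses a single random permutation and assumes every supernode member has at least one neighbor):

```python
import random

def minhash_divide(supernodes, adjacency, seed=0):
    """Group supernodes by the MinHash of their members' neighborhoods,
    so supernodes with similar neighbors tend to land in the same group."""
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    rank = dict(zip(nodes, rng.sample(range(len(nodes)), len(nodes))))
    groups = {}
    for s in supernodes:
        h = min(rank[u] for v in s for u in adjacency[v])  # MinHash of neighborhoods
        groups.setdefault(h, set()).add(s)
    return list(groups.values())
```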

SLIDE 64

Overview: Merging Step

  • S0: Initializing Step
  • repeat T times:
    ▪ S1-1: Dividing Step
    ▪ S1-2: Merging Step <<
  • S2: Compressing Step (optional)

  • Merge some supernodes within each group if Saving > θ(t)
    ▪ e.g., {a, b}, {c}, {d, e}, {f, g}

SLIDE 65

Overview: Merging Step (cont.)

  • S0: Initializing Step
  • repeat T times:
    ▪ S1-1: Dividing Step
    ▪ S1-2: Merging Step
  • S2: Compressing Step (optional)

  • After the first iteration: summary graph S over {a, b}, {c}, {d, e}, {f, g}; residual graph R with −(a, d), +(d, g)

SLIDE 66

Overview: Dividing Step

  • S0: Initializing Step
  • repeat T times:
    ▪ S1-1: Dividing Step <<
    ▪ S1-2: Merging Step
  • S2: Compressing Step (optional)

  • Divides the current supernodes {a, b}, {c}, {d, e}, {f, g} into groups again

SLIDE 67

Overview: Merging Step

  • S0: Initializing Step
  • repeat T times:
    ▪ S1-1: Dividing Step
    ▪ S1-2: Merging Step <<
  • S2: Compressing Step (optional)

  • Merge some supernodes within each group if Saving > θ(t)
    ▪ e.g., {c} and {d, e} merge into {c, d, e}, giving {a, b}, {c, d, e}, {f, g}

SLIDE 68

Overview: Merging Step (cont.)

  • S0: Initializing Step
  • repeat T times:
    ▪ S1-1: Dividing Step
    ▪ S1-2: Merging Step
  • S2: Compressing Step (optional)

  • Result: summary graph S over {a, b}, {c, d, e}, {f, g}; residual graph R with −(a, d), −(c, e), +(d, g)

SLIDE 69

Overview: Merging Step (cont.)

  • S0: Initializing Step
  • repeat T times:
    ▪ S1-1: Dividing Step
    ▪ S1-2: Merging Step <<
  • S2: Compressing Step (optional)

  • Merge some supernodes within each group if Saving > θ(t)
  • Decreasing threshold θ(t) = (1 + t)⁻¹
    ▪ exploration of other groups (while θ is high)
    ▪ exploitation within each group (as θ decreases)
    ▪ ~30% better compression than a fixed θ(t) = 0

A sketch of the merging step follows.

SLIDE 70

Overview: Compressing Step

  • S0: Initializing Step
  • repeat T times:
    ▪ S1-1: Dividing Step
    ▪ S1-2: Merging Step
  • S2: Compressing Step (optional) <<

  • Compress each output graph (S, R+, and R−)
  • Use any off-the-shelf graph-compression algorithm:
    ▪ Boldi-Vigna [BV04], VNMiner [BC08], Graph Bisection [DKKO+16]

SLIDE 71

Parallel & Distributed Processing

  • No need to load the entire graph in memory!
  • Map stage: compute MinHashes in parallel
  • Shuffle stage: divide supernodes using MinHashes
  • Reduce stage: process groups independently in parallel

[Figure: supernodes {a}, {b}, {c} (MinHash = 1), {d}, {e} (MinHash = 2), and {f}, {g} (MinHash = 3) are shuffled into groups, each merged in parallel]

SLIDE 72

Roadmap

  • T1. Structure Analysis (Part 1)
  • ...
  • T1-2. Summarization (§§ 7-8)

▪Summarizing Graphs (§ 7)

  • Problem Definition
  • Proposed Method: SWeG
  • Experiments <<

▪…

  • T2. Anomaly Detection (Part 2)
  • T3. Behavior Modeling (Part 3)


SLIDE 73

Experimental Settings

  • 13 real-world graphs (10K - 20B edges): social, collaboration, citation, web, …
  • Graph summarization algorithms:
    ▪ Greedy [NRS08], Randomized [NRS08], SAGS [KNL15]
  • Implementations:

SLIDE 74

  • EXP1. Speed and Compression

[Plot: running time vs. output size for SWeG and the competing algorithms]

SWeG outperforms its competitors

SLIDE 75

Advantages of SWeG (Recall)

  • Fast with Concise Outputs
  • Memory Efficient
  • Scalable

SLIDE 76

  • EXP2. Memory Efficiency

[Plot: memory usage vs. input graph size; SWeG uses 294×-1209× less memory than the input graph]

SWeG loads only 0.1-4% of the edges in main memory at once
SLIDE 77

Advantages of SWeG (Recall)

  • Fast with Concise Outputs
  • Memory Efficient
  • Scalable

SLIDE 78

  • EXP3. Effect of Iterations

[Plot: compression vs. number of iterations]

About 20 iterations are enough

SLIDE 79

  • EXP4. Data Scalability

[Plot: running time vs. number of edges (up to ≥ 20 billion), for SWeG (Hadoop) and SWeG (single machine)]

SWeG is linear in the number of edges

SLIDE 80

  • EXP5. Machine Scalability

[Plot: speed-up vs. number of machines for SWeG (Hadoop)]

SWeG scales up

SLIDE 81

Advantages of SWeG (Recall)

  • Fast with Concise Outputs
  • Memory Efficient
  • Scalable

SLIDE 82

Summary of §7

  • We propose SWeG (Summarizing Web-scale Graphs)
    ▪ for summarizing large-scale graphs
  • Fast with Concise Outputs
  • Memory Efficient
  • Scalable

SLIDE 83

Contributions and Impact (Part 1)

  • Triangle counting algorithms [ICDM17, PKDD18, PAKDD18]
  • Summarization algorithms [WSDM17, WWW19]
  • Patent on SWeG: filed by LinkedIn Inc.
  • Open-source software (ThinkD, SWeG): downloaded 82 times (github.com/kijungs)

SLIDE 84

Organization of the Thesis (Recall)

           Part 1. Structure Analysis   Part 2. Anomaly Detection     Part 3. Behavior Modeling
Graphs     Triangle Count (§§ 3-6)      Anomalous Subgraph (§ 9)      Purchase Behavior (§ 14)
           Summarization (§ 7)
Tensors    Summarization (§ 8)          Dense Subtensors (§§ 10-13)   Progression (§ 15)

SLIDE 85

  • T2. Anomaly Detection (Part 2)

"How can we detect anomalies or fraudsters in large dynamic graphs (or tensors)?"

  • Hint: fraudsters tend to form dense subgraphs
  • Challenge: telling them apart from benign dense subgraphs

SLIDE 86

T2-1. Utilizing Patterns

  • T2-1. Patterns and Anomalies in Dense Subgraphs (§ 9)

"What are patterns in dense subgraphs?" "What are anomalies deviating from the patterns?"

  • K. Shin, T. Eliassi-Rad, C. Faloutsos, "Patterns and Anomalies in k-Cores of Real-world Graphs with Applications", KAIS 2018 (formerly, ICDM 2016)

SLIDE 87

T2-2. Utilizing Side Information

[Figure: accounts rating items, forming a tensor with side information]

SLIDE 88

T2-2. Utilizing Side Information

"How can we detect dense subtensors in large dynamic data?"

  • T2-2. Detecting Dense Subtensors (§§ 11-13)
    ▪ In-memory Algorithm (§ 11)
    ▪ Distributed Algorithm for Web-scale Tensors (§ 12)
    ▪ Incremental Algorithms for Dynamic Tensors (§ 13)

SLIDE 89

Contributions and Impact (Part 2)

  • Patterns in dense subgraphs [ICDM16]
    ▪ Award: best paper candidate at ICDM 2016
    ▪ Class:
  • Algorithms for dense subtensors [PKDD16, WSDM17, KDD17]
    ▪ Real-world usage:
  • Open-source software: downloaded 257 times (github.com/kijungs)

SLIDE 90

Organization of the Thesis (Recall)

           Part 1. Structure Analysis   Part 2. Anomaly Detection     Part 3. Behavior Modeling
Graphs     Triangle Count (§§ 3-6)      Anomalous Subgraph (§ 9)      Purchase Behavior (§ 14)
           Summarization (§ 7)
Tensors    Summarization (§ 8)          Dense Subtensors (§§ 10-13)   Progression (§ 15)

SLIDE 91

  • T3. Behavior Modeling (Part 3)

"How can we model the behavior of individuals in graph and tensor data?"

[Figure: a social network (graph); a behavior log on social media (tensor)]

  • T3-1. Modeling Purchase Behavior in a Social Network (§14)
  • T3-2. Modeling Progression of Users of Social Media (§15)
    ▪ "How do users evolve over time on social media?"

SLIDE 92

Roadmap

  • T1. Structure Analysis (Part 1)
  • T2. Anomaly Detection (Part 2)
  • T3. Behavior Modeling (Part 3)
  • T3-1. Modeling Purchases (§14) <<
  • Future Directions
  • Conclusions

  • K. Shin, E. Lee, D. Eswaran, A. D. Procaccia, "Why You Should Charge Your Friends for Borrowing Your Stuff", IJCAI 2017

SLIDE 93

Sharable Goods: Question

"What do they have in common?"

  • A portable crib, an IKEA toolkit, DVDs

SLIDE 94

Sharable Goods: Properties

  • Used occasionally
  • Shared with friends
  • Not shared with strangers

SLIDE 95

Motivation: Social Inefficiency

                             Popular (share with many)      Lonely (share with few)
Efficiency of a Purchase     High                            Low
Likelihood of a Purchase     can be Low (likely to borrow)   can be High (likely to buy)

Q1 "How large can social inefficiency be?"
Q2 "How to nudge people towards efficiency?"

SLIDE 96

Roadmap

  • T1. Structure Analysis (Part 1)
  • T2. Anomaly Detection (Part 2)
  • T3. Behavior Modeling (Part 3)
  • T3-1. Modeling Purchases (§14)

▪Toy Example <<
▪Game-theoretic Model
▪Best Rental-fee Search

  • Future Directions
  • Conclusions

SLIDE 97

Social Network

  • Consider a social network, which is a graph
    ▪ Nodes: people
    ▪ Edges: friendship

[Figure: a small social network including Alice, Bob, and Carol]

"How many people should buy an IKEA toolkit for everyone to use it?"

SLIDE 98

Socially Optimal Decision

  • The answer is at least 2
  • Socially optimal:
    ▪ everyone uses a toolkit
    ▪ with minimum purchases (i.e., with minimum cost)

[Figure: Alice and Bob buy toolkits; everyone else can borrow from a friend]

"Does everyone want to stick to their current decisions?"

SLIDE 99

Individually Optimal Decision

  • The answer is No
  • Individually optimal:
    ▪ everyone best-responds to the others' decisions
  • Socially inefficient (suboptimal):
    ▪ 4 purchases happen when 2 are optimal

SLIDE 100

Social Inefficiency

  • An individually optimal outcome with 6 purchases (e.g., including Carol and Dan)

"How can we prevent this social inefficiency?"

SLIDE 101

Moving toward Social Optimum

  • Recall the socially optimal outcome (Alice and Bob buy)

"How can we make people stick with this socially optimal outcome?"

SLIDE 102

Imposing Rental Fee

  • Renters pay a rental fee for getting permanent access
  • The rental fee is half the price of a toolkit

"Does everyone want to stick to their current decisions?"

SLIDE 103

Socially & Individually Optimal

  • The answer is Yes
    ▪ Alice & Bob: are paid more than the price
    ▪ The others: renting is cheaper than buying
  • Individually optimal
  • Socially optimal with the minimum (2) purchases

SLIDE 104

Socially & Individually Optimal

  • The answer is Yes
    ▪ Alice & Bob: are paid more than the price
    ▪ The others: renting is cheaper than buying

Q1 "How do rental fees affect social inefficiency?"
Q2 "What are the 'socially optimal' rental fees?"

SLIDE 105

Roadmap

  • T1. Structure Analysis (Part 1)
  • T2. Anomaly Detection (Part 2)
  • T3. Behavior Modeling (Part 3)
  • T3-1. Modeling Purchases (§14)

▪Toy Example
▪Game-theoretic Model <<
▪Best Rental-fee Search

  • Future Directions
  • Conclusions

SLIDE 106

Formal Game-theoretic Model

  • Players: nodes in a social network
  • Strategies:
    ▪ buy a good / rent a good from a friend who has one
  • Nash Equilibrium (NE): an individually optimal outcome
  • Social Optimum: a socially optimal outcome
  • Inefficiency of an NE:
    ▪ (# purchases in the NE) / (# purchases in a social optimum)
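For intuition: under the requirement that everyone uses the good with as few purchases as possible, the social optimum is a minimum dominating set of the network. A brute-force sketch of both quantities (ours; exponential, so toy graphs only):

```python
from itertools import combinations

def min_purchases(adj):
    """Social optimum: fewest owners so each node owns or has an owning friend."""
    nodes = list(adj)
    for k in range(1, len(nodes) + 1):
        for owners in map(set, combinations(nodes, k)):
            if all(v in owners or owners & adj[v] for v in nodes):
                return k
    return len(nodes)

def inefficiency(adj, ne_owners):
    """Inefficiency of an NE: purchases in the NE over the social optimum."""
    return len(ne_owners) / min_purchases(adj)

# a star: the center alone covers everyone, yet all leaves owning is also an NE
star = {'c': {'x', 'y', 'z'}, 'x': {'c'}, 'y': {'c'}, 'z': {'c'}}
print(min_purchases(star))                             # 1
print(inefficiency(star, ne_owners={'x', 'y', 'z'}))   # 3.0
```

The star already shows inefficiency growing with the graph size, the effect that Theorem 2 on the next slide makes precise.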

SLIDE 107

Inefficiency without Rental Fee

  • [Theorem 1] Existence of NEs
    ▪ In every social network, there exists an NE.
  • [Theorem 2] Inefficiency without Rental Fee
    ▪ There exists a social network with n nodes where all NEs have Θ(n) inefficiency.

[Figure: an input graph on n nodes whose social optimum has 2 owners, while its best NE has n/2 owners]

SLIDE 108

Inefficiency with Rental Fee

  • [Theorem 3] Inefficiency with Rental Fee
    ▪ In every social network, if price/3 < rental fee < price/2, then there exists a socially optimal NE; otherwise …

[Figure: the same input graph; with a proper rental fee, the social optimum (2 owners) becomes an NE]

SLIDE 109

Roadmap

  • T1. Structure Analysis (Part 1)
  • T2. Anomaly Detection (Part 2)
  • T3. Behavior Modeling (Part 3)
  • T3-1. Modeling Purchases (§14) <<

▪Toy Example
▪Game-theoretic Model
▪Best Rental-fee Search <<

  • Future Directions
  • Conclusions

SLIDE 110

Finding Best Rental Fee

  • Given:
    ▪ a social network
    ▪ a sharable good with price p
  • Find: a range of rental fees
  • To Minimize: the inefficiencies of NEs

SLIDE 111

Searching NEs (SGG-Nash)

  • A linear-time algorithm for searching NEs
  • Gives different NEs depending on the initial strategies
  • Procedure (a sketch follows):
    ▪ randomly initialize strategies
    ▪ repeat until an NE is reached:
      - for each node: optimize its strategy while fixing the others'
  • [Theorem 4] Convergence: in every social network, an NE is reached within 3 iterations.
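A sketch of such best-response dynamics under a toy payoff model (our simplification: an owner pays the price and collects the fee from each non-owning neighbor; a non-owner pays the fee if some neighbor owns, and otherwise must buy):

```python
def sgg_nash_sketch(adj, price, fee, max_sweeps=10):
    """Best-response dynamics in the spirit of SGG-Nash (toy payoffs, ours)."""
    owns = {v: False for v in adj}       # initial strategies (could be randomized)
    for _ in range(max_sweeps):          # cf. Theorem 4: few sweeps suffice
        changed = False
        for v in adj:
            buy_cost = price - fee * sum(not owns[u] for u in adj[v])
            rent_cost = fee if any(owns[u] for u in adj[v]) else float('inf')
            best = buy_cost < rent_cost  # optimize v's strategy, fixing the others
            if owns[v] != best:
                owns[v], changed = best, True
        if not changed:                  # no one wants to deviate: an NE
            break
    return owns

star = {'c': {'x', 'y', 'z'}, 'x': {'c'}, 'y': {'c'}, 'z': {'c'}}
print(sgg_nash_sketch(star, price=1.0, fee=0.4))  # only the center buys
```

With the fee chosen inside the (price/3, price/2) range of Theorem 3, the dynamics here settle on the socially optimal single owner.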

SLIDE 112

Scalability of SGG-Nash

  • SGG-Nash is linear in the number of edges

[Plot: running time vs. number of edges]
SLIDE 113

Best Rental Fee in Real Graphs

[Plots: average inefficiency vs. rental fee (relative to price) on real datasets, with the theoretically best range from Theorem 3 marked]

  • Inefficiency is minimized consistently when price/3 < rental fee < price/2

SLIDE 114

Summary of §14

  • Game-theoretic Model: the sharable good game
  • Theoretical Analysis:
    ▪ existence of NEs
    ▪ inefficiency of NEs
  • Algorithm: for linear-time NE search
  • Suggestion: "socially optimal" rental fees

SLIDE 115

Contributions and Impact (Part 3)

  • Tools for purchase modeling [IJCAI17]
    ▪ suggest 'socially optimal' rental fees
    ▪ Media:
  • Tools for progression modeling [WWW18]
    ▪ scale to datasets with a trillion records
    ▪ Real-world usage:

SLIDE 116

Organization of the Thesis (Recall)

           Part 1. Structure Analysis   Part 2. Anomaly Detection     Part 3. Behavior Modeling
Graphs     Triangle Count (§§ 3-6)      Anomalous Subgraph (§ 9)      Purchase Behavior (§ 14)
           Summarization (§ 7)
Tensors    Summarization (§ 8)          Dense Subtensors (§§ 10-13)   Progression (§ 15)

SLIDE 117

Roadmap

  • T1. Structure Analysis (Part 1)
  • T2. Anomaly Detection (Part 2)
  • T3. Behavior Modeling (Part 3)
  • Future Directions <<
  • Conclusions


SLIDE 118

Vision: Algorithms for "Big Data"

[Diagram: scalable algorithms for big-data mining, grounded in models & theory, leading to directions D1-D3]

SLIDE 119

D1: Distributed Graph Stream Processing

"How to analyze large dynamic graphs on a cluster of machines?"

  • Current: Triangle Counting (§5)
  • Short Term: more graph mining tasks
  • Mid Term: a programming model (generalization)
  • Long Term: a system / platform

SLIDE 120

D2: Detecting Adversarial Anomalies

[Diagram: anomalies/fraudsters improve to avoid the detection system, and the detection algorithms improve in turn]

  • Current: algorithms for static anomalies
  • Short to Long Term: algorithms that are costly to avoid, so that the cost of avoiding them outweighs the anomalies' profits

SLIDE 121

D3: Co-Evolution of Beliefs and Graphs

"How to model the co-evolution of nodes' beliefs and edges?"

[Diagram: change of beliefs ↔ change of edges]

  • Current: a regression model [PKDD18]
  • Short Term: game theory / Nash equilibrium
  • Mid Term: prediction algorithms
  • Long Term: reducing polarization

SLIDE 122

Roadmap

  • T1. Structure Analysis (Part 1)
  • T2. Anomaly Detection (Part 2)
  • T3. Behavior Modeling (Part 3)
  • T3-1. Modeling Purchases (§14)
  • T3-2. Modeling Progression (§15) (skipped)
  • Future Directions
  • Conclusions <<

SLIDE 123

Conclusion

  • Goal: "to fully understand and utilize large dynamic graphs and tensors"
  • Contributions: developing scalable algorithms for
    ▪ T1. Structure Analysis (Part 1)
    ▪ T2. Anomaly Detection (Part 2)
    ▪ T3. Behavior Modeling (Part 3)
  • Impact: open-source software and datasets (github.com/kijungs)

SLIDE 124

References

[1] K. Shin, B. Hooi, and C. Faloutsos, "M-Zoom: Fast Dense-Block Detection in Tensors with Quality Guarantees", ECML/PKDD 2016 (§11)
[2] K. Shin, T. Eliassi-Rad, and C. Faloutsos, "CoreScope: Graph Mining Using k-Core Analysis - Patterns, Anomalies and Algorithms", ICDM 2016 (§9)
[3] K. Shin, "WRS: Waiting Room Sampling for Accurate Triangle Counting in Real Graph Streams", ICDM 2017 (§4)
[4] K. Shin, B. Hooi, J. Kim, and C. Faloutsos, "D-Cube: Dense-Block Detection in Terabyte-Scale Tensors", WSDM 2017 (§12)
[5] K. Shin, E. Lee, D. Eswaran, and A. D. Procaccia, "Why You Should Charge Your Friends for Borrowing Your Stuff", IJCAI 2017 (§14)
[6] K. Shin, B. Hooi, J. Kim, and C. Faloutsos, "DenseAlert: Incremental Dense-Subtensor Detection in Tensor Streams", KDD 2017 (§13)
[7] J. Oh, K. Shin, E. E. Papalexakis, C. Faloutsos, and H. Yu, "S-HOT: Scalable High-Order Tucker Decomposition", WSDM 2017 (§8)

SLIDE 125

References (cont.)

[8] K. Shin, B. Hooi, and C. Faloutsos, "Fast, Accurate and Flexible Algorithms for Dense Subtensor Mining", TKDD 2018 (§11)
[9] K. Shin, M. Shafiei, M. Kim, A. Jain, and H. Raghavan, "Discovering Progression Stages in Trillion-Scale Behavior Logs", WWW 2018 (§15)
[10] K. Shin, T. Eliassi-Rad, and C. Faloutsos, "Patterns and Anomalies in k-Cores of Real-world Graphs with Applications", KAIS 2018 (§9)
[11] K. Shin, M. Hammoud, E. Lee, J. Oh, and C. Faloutsos, "Tri-Fly: Distributed Estimation of Global and Local Triangle Counts in Graph Streams", PAKDD 2018 (§5)
[12] K. Shin, J. Kim, B. Hooi, and C. Faloutsos, "Think before You Discard: Accurate Triangle Counting in Graph Streams with Deletions", ECML/PKDD 2018 (§6)
[13] K. Shin, A. Ghoting, M. Kim, and H. Raghavan, "SWeG: Lossless and Lossy Summarization of Web-Scale Graphs", WWW 2019 (§7)

SLIDE 126

Thank You!

  • Sponsors:
  • Admins:
  • Collaborators:


SLIDE 127

Thank You!

  • Homepage (Software & Datasets): http://www.cs.cmu.edu/~kijungs/defense/