
slide-1
SLIDE 1

A foray into graph mining

Neil Shah April 15th, 2019

slide-2
SLIDE 2

(Graph) data is prevalent

  • 2.5 exabytes of data produced every day
  • 90% generated in the last 2 years
  • Data is produced as the product of a highly interconnected world

[Platform statistics: 1.3 billion users; 1 billion daily mobile views; 244 million users; 480 million products; 187 million daily actives; 3.5 billion daily snaps]

slide-3
SLIDE 3

(Graph) data shapes perspectives

  • Movie recommendation
  • Search engine ranking
  • Product purchasing
  • Social platform interaction
slide-4
SLIDE 4

What’s in a graph?

  • Graphs consist of nodes, edges and attributes
  • ex: Facebook social network where
  • nodes = individuals
  • edges = friendship
  • attributes = gender (node), # of messages exchanged (edge)
  • Graphs can easily model relationships between entities
  • Who-follows-whom on a social network
  • Who-buys-what on an e-commerce platform
  • Who-calls-whom using a certain cellular provider
slide-5
SLIDE 5

Roadmap

  • Preliminaries
  • Notable graph properties
  • Cool applications
  • Recommendation and ranking
  • Clustering
  • Anomaly detection
  • Takeaways
slide-6
SLIDE 6

Roadmap

  • Preliminaries
  • Notable graph properties
  • Cool applications
  • Recommendation and ranking
  • Clustering
  • Anomaly detection
  • Takeaways
slide-7
SLIDE 7

Graph preliminaries – directionality

[Figure: two users-by-users example graphs on u1–u11, illustrating edge directionality]

slide-8
SLIDE 8

Graph preliminaries – degree

  • Degree: # of adjacent edges
  • Degree(u7) = 2

[Figure: users-by-users example graph on u1–u11]

slide-9
SLIDE 9

Graph preliminaries – out- and in-degree

  • Degree: # of adjacent edges
  • Out-degree: # outgoing edges
  • In-degree: # incoming edges
  • Out-degree(u4) = 1
  • In-degree(u6) = 2

[Figure: users-by-users example graph on u1–u11]

slide-10
SLIDE 10

Graph preliminaries – weighted degree

  • Weighted degree: total sum of adjacent edge weights
  • i.e. “how many times did two users communicate”
  • Weighted-degree(u6) = 7

[Figure: weighted users-by-users graph on u1–u11; edge weights 3, 4, 1, 2, 9, 1, 6]

slide-11
SLIDE 11

Graph preliminaries – ego(net)

  • Ego: single, central node
  • Ego network (egonet): nodes and edges within one “hop” from ego
  • Egonet(u7) =
  • Nodes {u7, u3, u5}
  • Edges {u7-u3, u7-u5}

[Figure: users-by-users example graph on u1–u11]

slide-12
SLIDE 12

Graph preliminaries – connectivity

  • Two nodes are connected if there is a path between them.
  • A graph is fully connected if all node pairs are connected.
  • u1 and u8 are connected
  • u3 and u5 are connected
  • u1 and u9 are not connected
  • This graph is not fully connected

[Figure: users-by-users example graph on u1–u11]

slide-13
SLIDE 13

Graph preliminaries – node and edge types

  • A heterogeneous graph has multiple node and/or edge types.
  • Users and products
  • Who-buys-what and who-rates-what

[Figure: bipartite users-by-products graph; users u1–u6, products p1–p5]

slide-14
SLIDE 14

Graph preliminaries – matrix representation

  • Graph connectivity can be summarized in an adjacency matrix.
  • Ai,j = # (or weight) of edges from node i to j
  • A is usually very sparse (makes compact representations possible!)

[Figure: users-by-users graph on u1–u11 and its users × users adjacency matrix, with 1s marking edges]

slide-15
SLIDE 15

Roadmap

  • Preliminaries
  • Notable graph properties
  • Cool applications
  • Recommendation and ranking
  • Clustering
  • Anomaly detection
  • Takeaways
slide-16
SLIDE 16

Key question: What does a graph “look like”?

  • At first look… large, unwieldy and seemingly random.
  • Spoiler: In actuality, most real-world graphs are far from random.

Lyon ’03: Trace-route paths on the internet
slide-17
SLIDE 17

A quick detour: “Random” graphs

  • Erdos-Renyi random graph model: graph G(n,p)
  • n = number of nodes
  • p = probability of an edge between two nodes (independent edges)
  • Expected # of edges: p · n(n−1)/2
  • Degree distribution: P(k) = C(n−1, k) · p^k · (1−p)^(n−1−k) (binomial)

Babaoglu ’18
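The model is simple enough to sketch in a few lines. The following sketch (names and parameters mine, not from the slides) samples G(n, p) and compares the realized edge count against the expected p · n(n−1)/2:

```python
import random

def erdos_renyi(n, p, seed=0):
    """Sample an Erdos-Renyi graph G(n, p): each of the n*(n-1)/2
    possible undirected edges appears independently with probability p."""
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if rng.random() < p]

n, p = 200, 0.05
edges = erdos_renyi(n, p)
expected_edges = p * n * (n - 1) / 2  # E[#edges] = p * C(n, 2) = 995 here
```

With n = 200 and p = 0.05 the realized count concentrates near 995, and each node’s degree is Binomial(n − 1, p), as on the slide.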

slide-18
SLIDE 18

What about real graphs?

  • X-axis: degree, Y-axis: frequency/probability
  • Degree distributions of real graphs are not “random”
  • What exactly are they, then?

Log(# posts) vs. log(# users) log(# visitors) vs. log(# sites) log(# peers) vs. log(# routers)

Faloutsos ‘99 Viswanath ‘09 Adamic ‘02

slide-19
SLIDE 19

The “scale-free” property

  • Real-world graphs are often scale-free, meaning that their degree distribution obeys a power law: p(k) ∝ k^(−γ)
  • Scaling the input by a multiple simply results in proportional scaling of the whole function
  • Power laws are linear in log-log scales
  • Typical exponent: 2 ≤ γ ≤ 3

log(# visitors) vs. log(# sites)

slide-20
SLIDE 20

Scale-freeness is evident in many domains

Newman ‘05

slide-21
SLIDE 21

Why are many real graphs scale-free?

  • Hypothesis: preferential attachment, or a “rich-get-richer” effect
  • Generative process to construct a network:
  • Start with m0 nodes, each with at least 1 edge
  • At each timestep, add a new node with m edges connecting it to m already existing nodes
  • Probability of the new node connecting to node i depends on its degree di as Π(i) = di / Σj dj
  • Many real-world variants of this effect: academic citations, recommendation, virality
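The generative process above can be sketched directly. Below is a minimal preferential-attachment sketch (function names are mine, and seeding with a small clique of m + 1 nodes is an arbitrary choice for the initial nodes):

```python
import random

def barabasi_albert(n, m, seed=0):
    """Grow a graph by preferential attachment: each arriving node adds m
    edges, choosing existing nodes with probability proportional to degree."""
    rng = random.Random(seed)
    # small fully-connected seed so every node starts with >= 1 edge
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    pool = [u for e in edges for u in e]  # node appears once per unit of degree
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(pool))  # degree-proportional choice
        for t in targets:
            edges.append((new, t))
            pool += [new, t]
    return edges

edges = barabasi_albert(500, 2)
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1
```

Unlike G(n, p), the resulting degree distribution is heavy-tailed: a few early nodes become hubs with degree far above the average of roughly 2m.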

slide-22
SLIDE 22

Real graphs have “small-world” effects

  • How “far apart” are nodes in real graphs?
  • Interestingly, not very far! The typical number is 6. You may have heard of the “six degrees of separation”
  • Milgram ’69: avg. # of hops for a letter to travel from Nebraska to Boston was 6.2 (sample size 64)
  • Leskovec ’08: avg. distance between node pairs on MSN messenger has mode 6 (sample size 180M nodes and 1.3B edges)

slide-23
SLIDE 23

What causes the small-world effect?

  • Hypothesis: The abundance of hubs, or high-degree nodes
  • Even though most nodes aren’t connected to most other nodes, they are connected to hubs, which facilitate paths

slide-24
SLIDE 24

How do real graphs “grow” over time?

  • Consider a time-evolving graph G
  • If it has n(t) nodes and e(t) edges at time t…
  • Suppose that n(t + 1) = 2n(t)
  • What is e(t + 1)?
  • Not only is it > 2e(t); the growth is actually superlinear and follows e(t) ∝ n(t)^α (power law!) with 1 ≤ α ≤ 2, generally

slide-25
SLIDE 25

Real graphs exhibit densification

  • Avg. out-degree increases over time

Power-law in # edges vs. # nodes (over time)

slide-26
SLIDE 26

Moreover, the graph diameter shrinks

  • Graph diameter = max(distance between node pairs)
  • Leskovec ’05 shows that diameter actually shrinks over time, instead of growing. In other words, nodes tend to get closer
  • Hypothesis: Once again due to prevalence and growth of hubs

slide-27
SLIDE 27

Much more work done on graph behaviors

  • Generative graph models (Leskovec ‘05)
  • Patterns in sizes of connected components (Kang ‘10)
  • Node in-degree (popularity) over time (McGlohon ‘07)
  • Duration of calls in phone-call networks (Vaz de Melo ‘10)
  • Temporal structure evolution (Shah ‘15)

… the list goes on

slide-28
SLIDE 28

Roadmap

  • Preliminaries
  • Notable graph properties
  • Cool applications
  • Recommendation and ranking
  • Clustering
  • Anomaly detection
  • Takeaways
slide-29
SLIDE 29

Key question: how can we leverage graphs for recommendation/ranking tasks?

  • Measuring webpage importance
  • Link prediction and recommendation
  • Local methods
  • Global methods
slide-30
SLIDE 30

PageRank for large-scale search engines

  • Key problem: how to prioritize/curate a large (ever-growing) hyperlinked body of pages by importance and relevance?
  • Key idea: leverage the hyperlink citation graph (page-links-page) to rank page importance according to connectivity patterns
  • 150 million web pages → 1.7 billion links

Backlinks and forward links:
  • A and B are C’s backlinks
  • C is A and B’s forward link

Content adapted from Li ‘09

slide-31
SLIDE 31

Simplified PageRank

  • u: a web page
  • Bu: the set of u’s backlinks
  • Nv: the number of forward links of page v
  • c: the normalization factor to make R a probability distribution
  • R(u) = c · Σ over v ∈ Bu of R(v)/Nv
  • Simplified PageRank is the stationary probability dist. of a random walk on the graph; a surfer keeps clicking successive pages at random.

Idea: each page equally distributes its own PageRank to its forward links recursively. “An important page has many important pages pointing to it”

slide-32
SLIDE 32

Simplified PageRank

PageRank Calculation: first iteration

Adjacency matrix transposed and column-normalized (accounts for equal neighbor distribution) Yahoo Amzn MS Initial PageRank scores Read as “Amazon gives ½ of its own PageRank to Yahoo and Microsoft each”

slide-33
SLIDE 33

Simplified PageRank

PageRank Calculation: second iteration

Adjacency matrix transposed and column-normalized (accounts for equal neighbor distribution) Yahoo Amzn MS Initial PageRank scores Read as “Amazon gives ½ of its own PageRank to Yahoo and Microsoft each”

slide-34
SLIDE 34

Simplified PageRank

Adjacency matrix transposed and column-normalized (accounts for equal neighbor distribution) Yahoo Amzn MS Initial PageRank scores

Convergence after some iterations

Read as “Amazon gives ½ of its own PageRank to Yahoo and Microsoft each”
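The iterations on these slides can be reproduced with a few lines of power iteration. The sketch below assumes the classic three-page link structure the annotations suggest (Yahoo → {Yahoo, Amzn}, Amzn → {Yahoo, MS}, MS → {Amzn}); the exact links are my assumption, not stated in the deck:

```python
def simplified_pagerank(M, iters=100):
    """Power iteration r <- M r, where M is the transposed, column-normalized
    adjacency matrix: M[i][j] = 1/out_degree(j) if page j links to page i."""
    n = len(M)
    r = [1.0 / n] * n
    for _ in range(iters):
        r = [sum(M[i][j] * r[j] for j in range(n)) for i in range(n)]
    return r

# Assumed 3-page web; each column (a source page) sums to 1.
# Column "Amzn" reads: Amazon gives 1/2 of its PageRank to Yahoo and MS each.
M = [[0.5, 0.5, 0.0],   # into Yahoo
     [0.5, 0.0, 1.0],   # into Amzn
     [0.0, 0.5, 0.0]]   # into MS
ranks = simplified_pagerank(M)  # converges to [0.4, 0.4, 0.2]
```

The fixed point solves r = M r with the entries summing to 1, exactly the stationary distribution of the random surfer.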

slide-35
SLIDE 35

Problem with Simplified PageRank

A loop:

During each iteration, the loop accumulates rank but never distributes rank to other pages!

slide-36
SLIDE 36

The problem in practice

Adjacency matrix transposed and column-normalized (accounts for equal neighbor distribution) Yahoo Amzn MS Initial PageRank scores Read as “Microsoft gives all of its PageRank to Microsoft”

slide-37
SLIDE 37

The problem in practice

Adjacency matrix transposed and column-normalized (accounts for equal neighbor distribution) Yahoo Amzn MS Initial PageRank scores Read as “Microsoft gives all of its PageRank to Microsoft”

slide-38
SLIDE 38

The problem in practice

Adjacency matrix transposed and column-normalized (accounts for equal neighbor distribution) Yahoo Amzn MS Initial PageRank scores Read as “Microsoft gives all of its PageRank to Microsoft” All roads lead to Microsoft

slide-39
SLIDE 39

A modified solution: (true) PageRank

  • This subtle modification solves the problem of “sinks”
  • PageRanks converge to the dominant eigenvector of the appropriately configured/normalized adjacency matrix, due to Markov chain theory! Cool!
  • Modified PageRank is the same as the simple model, with the exception of the surfer having a random jump probability.

E(u): a distribution of ranks of web pages that the surfer can jump to when he/she “gets bored” after clicking on successive links.

slide-40
SLIDE 40

A modified solution: PageRank

Adjacency matrix transposed and column-normalized (accounts for equal neighbor distribution) Yahoo Amzn MS Initial PageRank scores 20% random jump probability

slide-41
SLIDE 41

PageRank converges quickly and produces empirically good results

  • PR (322 Million Links): 52 iterations
  • PR (161 Million Links): 45 iterations
  • Scaling factor is roughly linear in log n
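The modified model is a small change to the iteration. Below is a sketch with a 20% random-jump probability (d = 0.8), matching slide 40; the graph itself, including the Microsoft self-loop “sink”, is my illustrative assumption:

```python
def pagerank(out_links, d=0.8, iters=100):
    """PageRank with teleportation: with prob. d the surfer follows a link,
    with prob. 1 - d they jump to a uniformly random page.
    out_links maps each page to its forward links (no dangling pages here)."""
    pages = list(out_links)
    n = len(pages)
    r = {u: 1.0 / n for u in pages}
    for _ in range(iters):
        nxt = {u: (1 - d) / n for u in pages}
        for u in pages:
            share = d * r[u] / len(out_links[u])
            for v in out_links[u]:
                nxt[v] += share
        r = nxt
    return r

# MS links only to itself, but the random jump keeps it from absorbing all rank.
graph = {"yahoo": ["yahoo", "amzn"], "amzn": ["yahoo", "ms"], "ms": ["ms"]}
r = pagerank(graph)  # ms ranks highest, but yahoo/amzn stay nonzero
```

Without the jump, the self-loop would accumulate all the rank, as slides 36–38 illustrate.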
slide-42
SLIDE 42

Key question: how can we leverage graphs for recommendation/ranking tasks?

  • Measuring webpage importance
  • Link prediction and recommendation
  • Local methods
  • Global methods
slide-43
SLIDE 43

Exploiting local structure for predicting links

  • Key problem: given what we know about interactions in a graph G, what nodes should we recommend a user u to promote engagement?
  • Key idea: measure affiliation between u and other nodes by u’s graph neighborhood!

slide-44
SLIDE 44

Rich literature in previous measures

Liben-Nowell ‘04

Users-by-users
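As a concrete sketch of two classic local measures from that literature (common neighbors and Adamic/Adar), here is a toy implementation; the small friendship graph is hypothetical:

```python
import math

def local_link_scores(adj, u):
    """Score each non-neighbor v of u by (common neighbors, Adamic/Adar).
    Adamic/Adar weights each shared neighbor w by 1/log(degree(w)),
    so rare shared neighbors count more than hub neighbors."""
    scores = {}
    for v in adj:
        if v == u or v in adj[u]:
            continue
        common = adj[u] & adj[v]
        aa = sum(1 / math.log(len(adj[w])) for w in common if len(adj[w]) > 1)
        scores[v] = (len(common), aa)
    return scores

adj = {
    "u1": {"u2", "u3"}, "u2": {"u1", "u3"}, "u3": {"u1", "u2", "u4"},
    "u4": {"u3", "u5"}, "u5": {"u4"},
}
scores = local_link_scores(adj, "u1")  # u4 shares neighbor u3 with u1; u5 shares none
```

Ranking candidates by these scores gives a simple “people you may know” style recommender.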

slide-45
SLIDE 45

Key question: how can we leverage graphs for recommendation/ranking tasks?

  • Measuring webpage importance
  • Link prediction and recommendation
  • Local methods
  • Global methods
slide-46
SLIDE 46

Exploiting global structure for predicting links

  • Key problem: given what we know about interactions in a graph G, what nodes should we recommend a user u to promote engagement?
  • Key idea: measure affiliation between u and other nodes via a latent factor model/embedding that compactly encodes “interests”

slide-47
SLIDE 47

Singular value decomposition

  • Used for low-rank matrix approximation
  • Rank-k SVD reduces matrix A into k latent factors/dense blocks/communities
  • U and V capture “involvement” of nodes
  • Σ denotes factor “strength”

A ≈ U Σ Vᵀ, where A is n×m, U is n×k, Σ is k×k, and Vᵀ is k×m; Σ = diag(σ1, …, σk) with σ1 ≥ σ2 ≥ … ≥ σk

slide-48
SLIDE 48

Singular value decomposition

  • Used for low-rank matrix approximation
  • Rank-k SVD reduces matrix A into k latent factors/dense blocks/communities
  • U and V capture “involvement” of nodes
  • Σ denotes factor “strength”

n users × m videos: A ≈ σ1 u1 v1ᵀ + σ2 u2 v2ᵀ + σ3 u3 v3ᵀ + …
[Figure: factors pair user groups with content, e.g. “music lovers”/“artist spotlights”, “adrenaline junkies”/“action movies”, “dabbling cooks”/“baking shows”]

slide-49
SLIDE 49

Recommendation from latent factors

  • SVD effectively constructs vector embeddings in a k-dimensional space which “summarize” user/item affinities towards latent factors
  • Compute vector similarity between user-user or user-item vectors (depending on application)
  • Cosine similarity/dot product are common choices

Koren ‘09
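A small sketch of this pipeline (the toy ratings matrix and factor count k = 2 are my assumptions; NumPy’s SVD does the factorization):

```python
import numpy as np

# Hypothetical users-by-videos ratings matrix with two taste groups.
A = np.array([[5, 4, 0, 0],
              [4, 5, 0, 1],
              [0, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                              # keep the top-k latent factors
user_vecs = U[:, :k] * s[:k]       # k-dim user embeddings
item_vecs = Vt[:k].T * s[:k]       # k-dim item embeddings

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Users 0 and 1 load on the same latent factor ("taste"), users 0 and 2
# do not, so cosine(user 0, user 1) is high while cosine(user 0, user 2) is low.
```

Nearest neighbors in this embedding space (by cosine or dot product) become the recommendations.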

slide-50
SLIDE 50

Recommendation from latent factors

  • SVD effectively constructs vector embeddings in a k-dimensional

space which “summarize” user/item affinities towards latent factors

  • Compute vector similarity between user-user or user-item vectors

(depending on application)

  • Cosine similarity/dot product are common choices

Koren ‘09

slide-51
SLIDE 51

Roadmap

  • Preliminaries
  • Notable graph properties
  • Cool applications
  • Recommendation and prediction
  • Clustering
  • Anomaly detection
  • Takeaways
slide-52
SLIDE 52

Graph clustering for knowledge extraction

  • Key problem: what can we learn about group dynamics from graph interactions? Are there natural “clusters” of behaviors?
  • Key idea: tightly-knit graph interactions form graph clusters, which can indicate community behaviors. These are useful for
  • Behavioral understanding
  • Computational load balancing
  • Graph compression
  • Visualization

[Figure: advertiser–query graph]

slide-53
SLIDE 53

Finding graph clusters

  • Given a graph G, we want to find clusters
  • Need to:
  • Formalize the notion of a cluster
  • Design an algorithm that will find sets of nodes that are good clusters

Content adapted from Leskovec ‘10

slide-54
SLIDE 54

Clustering objective functions

  • Essentially all objectives use the intuition that a good cluster S has
  • Many edges internally
  • Few edges pointing outside
  • Simplest objective function: Conductance
  • Small conductance corresponds to good clusters
  • There are many other formalizations of roughly this intuition
  • Graph objectives are generally hard to optimize directly. Greedy/approximate algorithms are common

slide-55
SLIDE 55

Clustering objective functions

  • Single-criterion (considers either internal or external)
  • Modularity: m − E(m)
  • Modularity Ratio: m/E(m)
  • Volume: Σu d(u) = 2m + c
  • Edges cut: c
  • Multi-criterion (considers both)
  • Conductance: c/(2m + c)
  • Expansion: c/n
  • Density: 1 − m/n²
  • Cut Ratio: c/(n(N − n))
  • Normalized Cut: c/(2m + c) + c/(2(M − m) + c)
  • Max-ODF: max frac. of edges of a node pointing outside S
  • Average-ODF: avg. frac. of edges of a node pointing outside S
  • Flake-ODF: frac. of nodes with more than ½ edges inside S

Notation: n = nodes in S; m = edges in S; c = edges pointing outside S; N = nodes in G; M = edges in G
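With the notation above (m internal edges, c cut edges, so vol(S) = 2m + c), conductance takes a few lines to compute; the two-triangle example graph below is my own:

```python
def conductance(edges, S):
    """Conductance of node set S: cut / min(vol(S), vol(rest)), where
    vol(S) = 2m + c is the sum of degrees inside S. Lower is better."""
    S = set(S)
    cut = vol_in = vol_out = 0
    for u, v in edges:
        vol_in += (u in S) + (v in S)
        vol_out += (u not in S) + (v not in S)
        if (u in S) != (v in S):
            cut += 1
    return cut / min(vol_in, vol_out)

# Two triangles joined by a single bridge edge: splitting at the bridge
# gives conductance c/(2m + c) = 1/(2*3 + 1) = 1/7.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
```

An arbitrary set like {0, 3} that straddles the bridge scores far worse (conductance 1.0), matching the intuition that good clusters cut few edges relative to their volume.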
slide-56
SLIDE 56

Multiple types of clustering algorithms

  • Global spectral
  • Compute graph Laplacian matrix L = D − A
  • Find the eigenvector for the 2nd-smallest eigenvalue of L
  • Split by sign to get a partitioning of nodes (related to graph “cut”)
  • Recurse to get more clusters
  • Local spectral
  • Pick random seed node
  • Build local clusters around seed nodes based on random walk/PageRank
  • Prune cluster from graph and repeat
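The global spectral recipe above fits in a few lines with NumPy (a dense sketch; real implementations use sparse eigensolvers):

```python
import numpy as np

def spectral_split(A):
    """Split a graph in two: form L = D - A, take the eigenvector of the
    2nd-smallest eigenvalue (the Fiedler vector), and split nodes by sign."""
    L = np.diag(A.sum(axis=1)) - A
    _, vecs = np.linalg.eigh(L)          # eigenpairs in ascending order
    fiedler = vecs[:, 1]
    pos = [i for i in range(len(A)) if fiedler[i] >= 0]
    neg = [i for i in range(len(A)) if fiedler[i] < 0]
    return pos, neg

# Two triangles joined by a bridge edge (2, 3): the split recovers them.
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
part1, part2 = spectral_split(A)
```

Recursing on each part would yield a hierarchy of clusters, as the slide describes.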
slide-57
SLIDE 57

Flow-based algorithms

  • METIS: multi-level graph partitioning
  • If it’s too expensive to partition a big graph… coarsen it into a smaller graph
  • If it’s still too big, keep coarsening
  • Compute a partition and uncoarsen the graph
  • Improve heuristically
  • Swap vertices
  • Local search
slide-58
SLIDE 58

Measuring clustering algorithm performance

  • How to quantify performance:
  • What is the score of clusters across a range of sizes?
  • Network Community Profile (NCP) (Leskovec ‘08)
  • The score of the best cluster of size k
slide-59
SLIDE 59

NCPs for a real graph (LiveJournal)

  • 500 node comms. from Local Spectral
  • 500 node comms. from METIS

Interestingly, Local Spectral clusters are more compact and tighter, despite having higher (worse) conductance than METIS!

slide-60
SLIDE 60

NCPs for various objectives (Local Spectral)

  • Multiple objectives can be pretty similar
  • Conductance
  • Expansion
  • Normalized Cut
  • Cut-ratio
  • Avg-ODF
  • Max-ODF prefers small clusters, Flake-ODF prefers large clusters
  • Internal density not very good (large clusters are very sparse)

slide-61
SLIDE 61

You should know…

  • Many types of clustering objectives and algorithms -- can use NCP to analyze them
  • Not many “good” large clusters – real graphs are complicated!
  • Different types bias for various aspects (cluster size, internal and external connectivity)
  • Overemphasis on clustering objectives can actually lead to “bad” looking clusters according to human intuition

slide-62
SLIDE 62

Roadmap

  • Preliminaries
  • Notable graph properties
  • Cool applications
  • Recommendation and prediction
  • Clustering
  • Anomaly detection
  • Takeaways
slide-63
SLIDE 63

Graph-based anomaly detection

  • Key problem: what kinds of anomalous behaviors exist in real graphs, and can we find such anomalies automatically?
  • Key idea: we can identify various types of “anomalous” behaviors by building null/normal models and penalizing excessive deviation
  • Node-based anomalies
  • Group anomalies (too large, too dense to be a real community)

slide-64
SLIDE 64

Anomalies in graphs: important applications

  • Email networks
  • Spammers
  • Computer networks
  • Hackers/port scanning
  • Phone-call networks
  • Telemarketers
  • Social networks
  • Fake engagement
slide-65
SLIDE 65

Major goal

  • How to go from a graph to a quantitative model/pattern?
slide-66
SLIDE 66

Local, egonet-based anomaly detection

  • What does a typical node look like?
  • Can’t say much about just a node in isolation
  • Let’s consider the egonets!
  • For each node,
  • extract egonet (=1-step-away neighbors)
  • extract features (#edges, total weight, etc.)
  • extract patterns (norms)
  • compare with the rest of the population (detect anomalies)

Users-by-users

Content adapted from Akoglu ‘10
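The per-node recipe above can be sketched as follows. This is a simplification of the approach in Akoglu ‘10: I fit a single power law by least squares in log-log space and score nodes by distance to the fit line; names and the toy feature list are mine:

```python
import math

def egonet_features(adj, ego):
    """N_i = #neighbors of the ego; E_i = #edges inside the egonet
    (the ego, its neighbors, and all edges among them)."""
    nodes = {ego} | adj[ego]
    E = sum(1 for u in nodes for v in adj[u] if v in nodes and u < v)
    return len(adj[ego]), E

def distance_scores(features):
    """Fit log E = a*log N + b by least squares; score each point by its
    vertical distance to the fit line (far-off egonets look anomalous)."""
    xs = [math.log(n) for n, e in features]
    ys = [math.log(e) for n, e in features]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return [abs(y - (a * x + b)) for x, y in zip(xs, ys)]

# Mostly star-like egonets (E ~ N) plus one clique-like egonet (4, 10):
features = [(2, 2), (3, 3), (4, 4), (5, 5), (4, 10)]
scores = distance_scores(features)  # the clique-like egonet scores highest
```

On the slides that follow, this distance score is combined with a point-outlierness score rather than used alone.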

slide-67
SLIDE 67

What is anomalous?

  • Not obvious!
slide-68
SLIDE 68

What is anomalous?

Near-star: telemarketer, port scanner, people adding friends indiscriminately, etc.
Near-clique: tightly connected people, terrorist groups?, discussion group, etc.
Heavy vicinity: too much money wrt number of accounts, high donation wrt number of donors, etc.
Dominant heavy link: single-minded, tight company

slide-69
SLIDE 69

Basic features to study

  • Ni: number of neighbors (degree) of ego i
  • Ei: number of edges in egonet i
  • Wi: total weight of egonet i
  • λw,i: 1st eigenvalue of the weighted adjacency matrix of egonet i
slide-70
SLIDE 70
  • Obs. 1: Egonet Density Power Law: Ei ∝ Ni^α, with 1 ≤ α ≤ 2

Differentiates “dense” from “sparse” neighborhoods

slide-71
SLIDE 71
  • Obs. 2: Egonet Weight Power Law: Wi ∝ Ei^β, with β ≥ 1

Differentiates “heavy” from “light” neighborhoods

slide-72
SLIDE 72
  • Obs. 3: Egonet λw–Weight Power Law: λw,i ∝ Wi^γ, with 0.5 ≤ γ ≤ 1

Differentiates “uniform” distribution from “dominant” heavy edges

slide-73
SLIDE 73

Scoring node anomalies

Anomaly ≈ violates our “laws” + far away from most points

score_dist = distance to fitting line
score_outl = outlierness score
score = func(score_dist, score_outl)

slide-74
SLIDE 74

Triaging anomalies

✓ can interpret the type of anomaly
✓ can sort nodes wrt their outlierness scores

slide-75
SLIDE 75

Interesting results: Blog post-to-post graph

Part of a group of posts who all link to each other Post linking to many other posts indiscriminately

slide-76
SLIDE 76

Interesting results: Committee-to-candidate donations graph

$87M - DNC $25M - RNC

slide-77
SLIDE 77

Interesting results: Author-to-conference publishing graph

Has published 40 papers, but to the same conference (and nowhere else) Have published hundreds of papers, to almost as many conferences!

slide-78
SLIDE 78

Group anomalies on graphs

[Figure: group anomaly example showing Alice’s, Bob’s, and Carol’s neighborhoods]

Content adapted from Shin ‘16

slide-79
SLIDE 79

Fraud forms dense blocks

[Figure: accounts × restaurants adjacency matrix; fraud forms a dense block]

slide-80
SLIDE 80

Tensor modeling for attributed graphs

  • Natural dense blocks are sparse on the time axis (formed gradually)
  • Suspicious dense blocks are also dense on the time axis (due to synchronous behavior)
  • Suspicious dense blocks are denser than natural dense blocks in the tensor model

[Figure: accounts × restaurants × timestamp adjacency tensor; a cell indicates that account i rates restaurant j at time t]

slide-81
SLIDE 81

Applications

  • Dense blocks signal anomalies/fraud in many multi-attribute graphs
  • TCP dumps: src IP × dst IP × timestamp
  • Wikipedia revision history: user × page × timestamp
  • Time-evolving social networks: src user × dst user × timestamp

slide-82
SLIDE 82

How to find dense blocks in such tensors?

  • Exact solutions are combinatorial and intractable
  • Greedy solutions and heuristics are practical (i.e. greedily optimize a “suspiciousness” metric)
  • What metric?

Assume a block (subtensor) B, with side lengths n1 × n2 × n3, in a 3-way tensor T:
  • Size(B): n1 + n2 + n3
  • Vol(B): n1 × n2 × n3
  • Mass(B): sum of entries in B

Some notable choices:
  • Traditional density: ρd(B, T) = Mass(B)/Vol(B) (maximized by a single entry with max. value)
  • Arithmetic avg. degree: ρa(B, T) = Mass(B)/Size(B)
  • Geometric avg. degree: ρg(B, T) = Mass(B)/∛Vol(B)

slide-83
SLIDE 83

Detecting a single dense block

  • Greedy search method
  • Starts from the entire tensor

5 3 0 4 6 1 2 0 0

1 0 1

ρ = 2.9

slide-84
SLIDE 84

Detecting a single dense block

  • Remove a slice to maximize density ρ

5 3 0 4 6 1 2 0 0

ρ = 3

slide-85
SLIDE 85

Detecting a single dense block

5 3 4 6 2 0

ρ = 3.3

  • Remove a slice to maximize density ρ
slide-86
SLIDE 86

Detecting a single dense block

5 3 4 6 2 0

ρ = 3.6

  • Remove a slice to maximize density ρ
slide-87
SLIDE 87

Detecting a single dense block

  • Output: return the densest block so far

5 3 4 6 2 0

ρ = 3.6

slide-88
SLIDE 88

Handling multiple blocks

  • Remove found blocks before finding others

Find & Remove Find & Remove Find & Remove Restore
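A minimal 2-way (matrix) sketch of this greedy procedure, using the arithmetic-average-degree density Mass/Size from slide 82. The toy matrix and names are my own, and the real M-Zoom algorithm works on N-way tensors with efficient slice bookkeeping:

```python
def density(T, rows, cols):
    """Arithmetic average degree: mass of the block / size of the block."""
    return sum(T[i][j] for i in rows for j in cols) / (len(rows) + len(cols))

def greedy_densest_block(T):
    """Greedy sketch: start from the whole matrix, repeatedly remove the
    minimum-mass row or column slice (cf. Theorem 1 on the next slide),
    and return the densest block seen along the way."""
    rows, cols = set(range(len(T))), set(range(len(T[0])))
    best = (density(T, rows, cols), set(rows), set(cols))
    while len(rows) > 1 and len(cols) > 1:
        r = min(rows, key=lambda i: sum(T[i][j] for j in cols))
        c = min(cols, key=lambda j: sum(T[i][j] for i in rows))
        if sum(T[r][j] for j in cols) <= sum(T[i][c] for i in rows):
            rows.remove(r)
        else:
            cols.remove(c)
        d = density(T, rows, cols)
        if d > best[0]:
            best = (d, set(rows), set(cols))
    return best

# Hypothetical ratings matrix with an injected dense block in rows/cols {0, 1}.
T = [[9, 8, 0, 0],
     [8, 9, 0, 1],
     [0, 0, 1, 0],
     [1, 0, 0, 1]]
d, rs, cs = greedy_densest_block(T)  # recovers the dense block
```

To handle multiple blocks as above, one would zero out the found block’s entries and rerun the search.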

slide-89
SLIDE 89

Algorithm details

  • Theorem 1 [Remove Minimum Mass First]: Among slices in the same mode, removing the slice with minimum mass is always best (e.g. slice masses 12 > 9 > 2: remove the mass-2 slice first)
  • Theorem 2 [Approximation Guarantee]: ρ(B, T) ≥ (1/N) · ρ(B*, T), where ρ is the density metric, T the input tensor, N its order, and B* the densest block

slide-90
SLIDE 90

Practical discoveries

TCP connections forming the densest blocks are network attacks
[Figure: first three blocks found; src IP × dst IP × timestamp]

slide-91
SLIDE 91

Practical discoveries

First three blocks found by M-Zoom. Page edit wars: 11 users revised 10 pages 2,305 times within 16 hours. (user × page × timestamp)

slide-92
SLIDE 92

Takeaways

  • Graphs provide a means of describing interactions between objects
  • Almost all real graphs are “non-random” and obey various patterns
  • Considerable literature in graph mining focuses on learning to leverage large-scale interaction patterns to
  • Recommend users new content based on what they might like
  • Identify interesting group behaviors and community norms
  • Discover abnormalities that correspond to fraud or “audit-worthy” events