Focused Clustering and Outlier Detection in Large Attributed Graphs - - PowerPoint PPT Presentation



SLIDE 1

Focused Clustering and Outlier Detection in Large Attributed Graphs

ACM SIG-KDD, August 26, 2014

Bryan Perozzi, Leman Akoglu
Stony Brook University

Patricia Iglesias Sánchez*, Emmanuel Müller*†
*Karlsruhe Institute of Technology †University of Antwerp

SLIDE 2

Attributed Graphs

• Attributed graph: each node has one or more properties
• Examples: Age, School, Relationship Status

Bryan Perozzi Focused Clustering and Outlier Detection in Large Attributed Graphs 2

SLIDE 3

Focused Mining of Attributed Graphs

• Numerous attributes (e.g., Facebook profiles)
• Many are irrelevant for most queries
• Ex: When trying to sell mortgages
  • Useful: Income, Credit Score, Employer
  • Not useful: Hair Color, # Apps Installed
• Ex: When trying to sell makeup
  • Useful: Hair Color, Skin Tone, Gender
  • Not useful: Shoe Size


Users have a focus, so algorithms need a focus too!


SLIDE 4

Adding Focus to Algorithms


• Users provide examples of the kind of similarity they are interested in.
• We infer the similarity function that matters to them.

[Figure: user → examples → infer focus attributes → focus → task]

SLIDE 5

Outline

• Introduction
• New Problem: Focused Clustering & Outliers
• Our Approach: FocusCO
• Evaluation
• Conclusion


SLIDE 6

Focused Clusters and Outliers: Problem

Given:
1) a graph with node attributes
2) exemplar nodes from the user

• Infer attribute weights/relevance
• Extract focused clusters that are
  1) dense in structure
  2) coherent in the “heavy” attributes (called the “focus”)
• Detect focused outliers: nodes that deviate in focus attribute values


SLIDE 7

An Example


• Users provide examples of nodes they consider similar.
  • Ex: ‘Yann LeCun’ and ‘Foster Provost’
• We learn a focus:
  • Education Level, Location
• We extract clusters that agree with the focus
• We detect outliers that don’t agree with the focus

SLIDE 8

Related Work

Properties compared: Graph Clustering, Attributed Graphs, Attribute Subspace, User Preference, Outlier Detection

• METIS, Spectral
• Parallel Nibble, BigClam
• CoPaM, Gamer: ✓ ✓ ✓
• CODA: ✓ ✓ ✓
• GOutRank, ConSub: ✓ ✓ ✓
• FocusCO: ✓ ✓ ✓ ✓ ✓


SLIDE 9

FocusCO: sketch

[Figure: user-provided examples → infer “focus” attribute(s) from the node attributes (e.g., age, gender, location) → detect focused clusters & outliers]
SLIDE 10

Focus attribute inference

Input: set of similar nodes, Cex

  • 1. Construct a set of similar pairs, PS, by pairing the user examples in Cex together
  • 2. Construct a set of dissimilar pairs, PD, by randomly sampling pairs (u,v)
  • 3. Learn a distance metric under which PS pairs are close together and PD pairs are far apart
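Steps 1 and 2 can be sketched in Python as below. This is an illustrative sketch, not the authors' code: the function names `similar_pairs` and `dissimilar_pairs` are hypothetical, and skipping pairs that already appear in PS when sampling PD is our own assumption.

```python
import itertools
import random
from typing import Iterable, List, Sequence, Set, Tuple

Pair = Tuple[int, int]  # unordered node pair stored as (smaller id, larger id)


def similar_pairs(c_ex: Sequence[int]) -> List[Pair]:
    """Step 1: pair every two user exemplars in Cex to form PS."""
    return list(itertools.combinations(sorted(set(c_ex)), 2))


def dissimilar_pairs(num_nodes: int, num_pairs: int,
                     exclude: Iterable[Pair] = (), seed: int = 0) -> List[Pair]:
    """Step 2: uniformly sample distinct random pairs (u, v) to form PD.

    Pairs in `exclude` (e.g. PS) are skipped; this exclusion is an
    illustrative assumption, not necessarily what FocusCO does.
    """
    skip: Set[Pair] = {(min(u, v), max(u, v)) for u, v in exclude}
    available = num_nodes * (num_nodes - 1) // 2 - len(skip)
    if num_pairs > available:
        raise ValueError("not enough distinct node pairs to sample from")
    rng = random.Random(seed)
    pairs: Set[Pair] = set()
    while len(pairs) < num_pairs:
        u, v = rng.sample(range(num_nodes), 2)  # two distinct node ids
        pair = (min(u, v), max(u, v))
        if pair not in skip:
            pairs.add(pair)
    return sorted(pairs)


if __name__ == "__main__":
    ps = similar_pairs([4, 1, 7])
    pd = dissimilar_pairs(num_nodes=20, num_pairs=5, exclude=ps, seed=42)
    print("PS:", ps)  # [(1, 4), (1, 7), (4, 7)]
    print("PD:", pd)
```

Storing pairs in canonical `(min, max)` order makes (u,v) and (v,u) the same pair, so PD never double-counts a pair.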

SLIDE 11

Distance Metric Learning

[Figure: the nodes × attributes feature matrix, in which PS and PD pairs are intermixed, is turned by metric learning [Xing, et al 2002] into a focused attribute vector under which PS pairs are closer together]
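Below is a minimal Python sketch of learning a diagonal metric (one non-negative weight per attribute) in the spirit of Xing et al. (2002): minimize the weighted squared distances of PS pairs while keeping the PD pairs spread out, i.e. minimize mean over PS of Σ_k a_k (x_uk − x_vk)² − log(mean over PD of sqrt(Σ_k a_k (x_uk − x_vk)²)) subject to a ≥ 0. Assumptions: we use means instead of sums (a rescaling chosen for step-size stability), plain projected gradient descent, and a final rescaling to sum 1. FocusCO's exact objective, regularization, and optimizer may differ, and `learn_focus_weights` is a hypothetical name.

```python
import numpy as np


def learn_focus_weights(X, ps, pd, steps=500, lr=0.1, eps=1e-12):
    """Learn non-negative per-attribute weights a (a diagonal metric).

    Minimizes  mean_{PS} sum_k a_k (x_uk - x_vk)^2
             - log( mean_{PD} sqrt(sum_k a_k (x_uk - x_vk)^2) )
    by projected gradient descent, then rescales a to sum to 1.
    """
    X = np.asarray(X, dtype=float)
    ds = np.array([(X[u] - X[v]) ** 2 for u, v in ps])  # squared diffs of PS pairs
    dd = np.array([(X[u] - X[v]) ** 2 for u, v in pd])  # squared diffs of PD pairs
    mean_ds = ds.mean(axis=0)
    a = np.ones(X.shape[1])  # start from uniform weights
    for _ in range(steps):
        r = np.sqrt(dd @ a + eps)  # weighted distance of each PD pair
        # gradient of the PS term minus gradient of the log-PD term
        grad = mean_ds - (dd / (2.0 * r[:, None])).mean(axis=0) / r.mean()
        a = np.maximum(a - lr * grad, 0.0)  # project back onto a >= 0
    total = a.sum()
    return a / total if total > 0 else a
```

Attributes on which the exemplar pairs agree but random pairs differ end up with large weights; those weights play the role of the focus.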

SLIDE 12

FocusCO: sketch

[Figure: user-provided examples → infer “focus” attribute(s) from the node attributes (e.g., age, gender, location) → detect focused clusters & outliers]
SLIDE 13

FocusCO: Cluster Extraction

• Local clustering algorithm
  • Does not cluster the whole graph
  • Expands a cluster around a starting set
• Two procedures:
  1. Finding good candidate sets to start at
  2. Growing clusters


SLIDE 14

Finding nodes to cluster around

1. We reweight the graph's edges using the focus.
2. We keep only highly weighted edges.
3. The connected components of the remaining graph are our seed sets.

[Figure: a seed set]
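The three seed-finding steps can be sketched as follows. Assumptions, for illustration only: an edge's weight is 1 / (1 + focus-weighted Euclidean distance between its endpoints' attribute vectors), a fixed `threshold` decides which edges count as "highly weighted", and components are found with union-find. `focus_edge_weight` and `find_seed_sets` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence, Tuple


def focus_edge_weight(x_u: Sequence[float], x_v: Sequence[float],
                      beta: Sequence[float]) -> float:
    """Map the focus-weighted distance between two nodes to a weight in (0, 1]."""
    d = math.sqrt(sum(b * (a - c) ** 2 for a, c, b in zip(x_u, x_v, beta)))
    return 1.0 / (1.0 + d)


def find_seed_sets(edges: List[Tuple[int, int]], X: Sequence[Sequence[float]],
                   beta: Sequence[float], threshold: float) -> List[List[int]]:
    parent: Dict[int, int] = {}

    def find(u: int) -> int:
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    # Steps 1-2: reweight every edge by the focus and keep only the heavy ones
    for u, v in edges:
        if focus_edge_weight(X[u], X[v], beta) >= threshold:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[rv] = ru

    # Step 3: connected components of the kept edges are the seed sets
    groups: Dict[int, List[int]] = {}
    for u in list(parent):
        groups.setdefault(find(u), []).append(u)
    return [sorted(g) for g in groups.values()]


if __name__ == "__main__":
    X = [[0.0, 5.0], [0.0, -5.0], [0.1, 0.0], [9.0, 0.0], [9.0, 1.0]]
    beta = [1.0, 0.0]  # focus on attribute 0, ignore attribute 1
    edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
    print(sorted(find_seed_sets(edges, X, beta, threshold=0.5)))  # [[0, 1, 2], [3, 4]]
```

In the usage example, nodes 0 and 1 differ sharply in attribute 1, but because the focus gives that attribute zero weight, their edge is kept; the edge (2, 3) is dropped because the nodes differ in the focus attribute.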

SLIDE 15

Growing a Focused Cluster

  • 1. Clustering objective: conductance, weighted by the focus
  • 2. At each step in cluster expansion:
      2.1 Examine the boundary nodes
      2.2 Add the node with the best ∆
      2.3 Record the best structural node
  • 3. Focused outliers: best structural nodes that were left out of the cluster

[Figure legend: cluster member, focused outlier]
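A simplified Python sketch of the growing step follows. Assumptions, not taken from the slides: focused conductance is cut_w(S) / min(vol_w(S), vol_w(V − S)) over focus edge weights; "best structural node" means the best boundary node under the same formula with every edge weight set to 1; expansion adds one node at a time and stops as soon as no boundary node lowers focused conductance. Any refinement the actual FocusCO procedure performs is omitted, and all function names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[int, Dict[int, float]]  # node -> {neighbor: focus edge weight}


def build_graph(edges: Iterable[Tuple[int, int, float]]) -> Graph:
    g: Graph = {}
    for u, v, w in edges:  # undirected: store both directions
        g.setdefault(u, {})[v] = w
        g.setdefault(v, {})[u] = w
    return g


def conductance(g: Graph, S: Set[int]) -> float:
    """cut(S) / min(vol(S), vol(V - S)) using weighted degrees."""
    cut = vol_s = total = 0.0
    for u, nbrs in g.items():
        for v, w in nbrs.items():
            total += w
            if u in S:
                vol_s += w
                if v not in S:
                    cut += w
    denom = min(vol_s, total - vol_s)
    return cut / denom if denom > 0 else 1.0


def grow_cluster(g: Graph, seed: Set[int], max_steps: int = 100):
    plain = {u: {v: 1.0 for v in nbrs} for u, nbrs in g.items()}  # structure only
    cluster = set(seed)
    best_structural: Set[int] = set()
    for _ in range(max_steps):
        boundary = {v for u in cluster for v in g[u] if v not in cluster}
        if not boundary:
            break
        # 2.1-2.2: best boundary node under focused conductance (ties -> smaller id)
        best = min(boundary, key=lambda v: (conductance(g, cluster | {v}), v))
        # 2.3: best boundary node under unweighted (structural) conductance
        best_structural.add(
            min(boundary, key=lambda v: (conductance(plain, cluster | {v}), v)))
        if conductance(g, cluster | {best}) >= conductance(g, cluster):
            break  # no boundary node lowers the focused conductance
        cluster.add(best)
    # 3: best structural nodes that never joined the cluster are focused outliers
    return cluster, best_structural - cluster
```

For example, with a triangle {0, 1, 2} of weight-1.0 edges and a node 3 that touches all three, but only through weight-0.01 edges, node 3 is the best node by structure alone, yet the focused objective prefers node 2; growing from the seed {0, 1} then returns the cluster {0, 1, 2} and reports node 3 as a focused outlier.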

SLIDE 16

Experimental setup

• Synthetic and real-world graphs
• Performance measures:
  • Cluster quality: NMI
  • Outlier accuracy: precision, F1
• Compared to:
  • CODA [Gao+’10]
  • METIS (no outlier detection) [Karypis+’98]
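The evaluation measures can be sketched in Python as below. NMI is normalized here by sqrt(H(true) · H(pred)), which is one common variant; the slides do not say which normalization was used. Returning 1.0 when both labelings are single clusters is our own convention. `nmi` and `precision_f1` are hypothetical helper names.

```python
import math
from collections import Counter
from typing import Hashable, Sequence, Set, Tuple


def nmi(labels_true: Sequence[Hashable], labels_pred: Sequence[Hashable]) -> float:
    """Normalized mutual information, normalized by sqrt(H(true) * H(pred))."""
    n = len(labels_true)
    if n == 0 or n != len(labels_pred):
        raise ValueError("label sequences must be non-empty and of equal length")
    cu, cv = Counter(labels_true), Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    mi = sum((c / n) * math.log(c * n / (cu[u] * cv[v]))
             for (u, v), c in joint.items())
    hu = -sum((c / n) * math.log(c / n) for c in cu.values())
    hv = -sum((c / n) * math.log(c / n) for c in cv.values())
    if hu == 0.0 or hv == 0.0:
        return 1.0 if hu == hv else 0.0  # convention for single-cluster labelings
    return mi / math.sqrt(hu * hv)


def precision_f1(true_outliers: Set[int], predicted: Set[int]) -> Tuple[float, float]:
    """Precision and F1 of a predicted outlier set against the ground truth."""
    tp = len(true_outliers & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(true_outliers) if true_outliers else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return precision, f1
```

NMI is invariant to renaming cluster labels, so `nmi([0, 0, 1, 1], [1, 1, 0, 0])` is 1.0, while an independent labeling such as `[0, 1, 0, 1]` scores 0.0.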


SLIDE 17

Focused clustering performance

9 clusters (3 with focus 1, 3 with focus 2, 3 unfocused); 5 focus attributes.

SLIDE 18

Focused clustering performance


SLIDE 19

Outlier detection performance

The number of deflated focus attributes increases from left to right (outliers become easier to detect).

SLIDE 20

Disney: Amazon co-purchase graph


Images are Focused Outliers

SLIDE 21

DBLP co-authorship graph

The focused outlier publishes in IR (information retrieval).

SLIDE 22

Political blogs citation graph

The focused outlier did not mention Waas.

SLIDE 23

Summary

• A new graph mining paradigm in which the focus steers graph mining according to user preference
• A new problem formulation: Focused Clustering & Outlier Detection

Thanks! Any questions? Bryan Perozzi (bperozzi@cs.stonybrook.edu)

[Figure: user → examples → infer focus attributes → focus → clustering]