An Embedding Approach to Anomaly Detection - PowerPoint PPT Presentation

SLIDE 1

An Embedding Approach to Anomaly Detection

Renjun Hu¹, Charu Aggarwal², Shuai Ma¹, and Jinpeng Huai¹

¹SKLSDE Lab, Beihang University, China; ²IBM T. J. Watson Research Center, USA

SLIDE 2

Motivation

  • Anomaly detection
  • Identification of patterns in data that do not conform to expected behaviors [Chandola et al. 2009]
  • Useful in a wide variety of applications
  • In networks, anomaly detection has broader meanings
  • Application-specific significance
  • Possibility to improve the performance of network-centric mining tasks such as community detection and classification

  • V. Chandola, A. Banerjee, and V. Kumar. Anomaly detection: A survey. ACM Comput. Surv. 41(3), 2009.

SLIDE 3

Motivation

  • Structural hole theory [Burt 1992, 2004]
  • Theory of social capital
  • A structural hole is a gap between two nodes who have complementary sources to information

Burt, Ronald S. (1992). Structural Holes: The Social Structure of Competition. Harvard University Press.
Burt, Ronald S. (2004). Structural Holes and Good Ideas. American Journal of Sociology 110(2): 349–399.

  • Node A (social broker) is more likely to get novel information than B, even though they have the same number of links.
  • Prof. Ronald S. Burt

How to detect social brokers? A formal quantitative definition is needed in the first place!

SLIDE 4

Motivation

  • Structural inconsistencies
  • Nodes that connect to a number of diverse influential communities
  • Detect social brokers quantitatively
  • Anomalousness from homophily [McPherson et al. 2001]
  • Linked nodes have similar properties
  • Fundamental to a wide variety of algorithms in network science
    E.g., community detection, collective classification, link prediction, influence analysis
  • Violated by structural inconsistencies
  • M. McPherson, L. Smith-Lovin, and J. Cook. Birds of a feather: Homophily in social networks. Annual Review of Sociology, Vol. 27: 415–444, 2001.

SLIDE 5

Motivation

  • Structural inconsistencies
  • Nodes that connect to a number of diverse influential communities
  • Detect social brokers quantitatively
  • The presence of structural inconsistencies may:
  • have a substantial impact on network structure
    E.g., all nodes tend to form one large cluster
  • prevent effective application of network mining algorithms
    E.g., community detection algorithms struggle to find meaningful clusters

SLIDE 6

Outline

  • Anomaly detection model
  • Graph embedding
  • A quantitative measure of anomaly
  • Algorithm optimization techniques
  • Evaluation

SLIDE 7

Why graph embedding?

  • Structural inconsistencies
  • Connect to a number of diverse influential communities
  • Evaluate the diversity or similarity of nodes. How?
  • Graph embedding
  • Associate each node with a multidimensional vector
  • Preserve local linkage structure (instead of global structure)
  • Each dimension corresponds to a community in the network
  • To node B, node A is more similar than C, even though they have the same (global) distance from B.

[Figure: example graph with nodes A, B, C]

SLIDE 8

Why graph embedding?

  • Structural inconsistencies
  • Connect to a number of diverse influential communities
  • An alternative option: community detection followed by anomaly detection
  • Does not distinguish anomalies from normal nodes
  • The presence of anomalies affects the results of community detection
  • Community detection is a heavy task
  • Fails to detect structural inconsistencies!

SLIDE 9

Graph embedding

  • Given an undirected graph G = (V, E), associate each node i with a d-dimensional vector Xi
  • V = {1, 2, …, n}
  • d: number of communities
  • Xi: correlation between node i and the d communities

A reasonable choice of d suffices for anomaly detection; it is not necessary to use the number of real-life communities.

SLIDE 10

Graph embedding

  • Computation: minimizing objective function O
  • Given an undirected graph G = (V, E), associate each node i with a d-dimensional vector Xi
  • Goal: preserve local linkage structure
  • Connected nodes should have similar values of Xi
  • Disconnected nodes should have diverse values of Xi
  • n: number of nodes in G, m: number of edges in G
  • α: balancing factor that regulates the importance of the two components in O
  • The embedding ensures that 0 ≤ ‖Xi − Xj‖² ≤ 1

$$O = \sum_{(i,j) \in E} \lVert X_i - X_j \rVert^2 + \alpha \cdot \sum_{(i,j) \notin E} \left(1 - \lVert X_i - X_j \rVert^2\right), \qquad \alpha = \frac{m}{n^2 - m}$$
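As a sketch of how O can be evaluated directly, the Python below sums over all node pairs, so it does O(n²·d) work, which is exactly the cost the later optimization slides try to avoid. Assumptions not stated in the slides: nodes are numbered 0..n−1, vectors are plain lists, the graph has no self-loops, and the non-edge sum ranges over unordered pairs i < j; the helper names are hypothetical.

```python
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance ||a - b||^2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def objective(X, edges):
    """Exact objective O of the embedding X (one vector per node).

    Sums over every unordered pair i < j, so the cost is O(n^2 * d).
    Assumes nodes are 0..n-1 and the edge list has no self-loops.
    """
    n = len(X)
    edge_set = {frozenset(e) for e in edges}  # undirected, duplicates collapsed
    m = len(edge_set)
    alpha = m / (n * n - m)  # balancing factor alpha = m / (n^2 - m)

    # Linked pairs are pulled together: penalize their squared distance.
    linked = sum(sq_dist(X[i], X[j]) for i, j in map(tuple, edge_set))
    # Unlinked pairs are pushed apart: penalize 1 - squared distance.
    unlinked = sum(
        1.0 - sq_dist(X[i], X[j])
        for i, j in combinations(range(n), 2)
        if frozenset((i, j)) not in edge_set
    )
    return linked + alpha * unlinked


X = [[0.0], [0.5], [1.0]]
print(objective(X, [(0, 1)]))  # 0.34375
```

In the usage line, the single edge contributes 0.25, the non-edge pairs contribute 0 + 0.75, and α = 1/8, giving 0.25 + 0.75/8.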

SLIDE 11

A quantitative measure

  • NB(i): how node i connects to communities
  • Inspired by structural inconsistencies and structural holes (social brokers)
  • Connect to a number of diverse influential communities
  • Bridge across complementary sources

$$NB(i) = \left(y_i^1, \ldots, y_i^d\right) = \sum_{(i,j) \in E} \left(1 - \lVert X_i - X_j \rVert\right) \cdot X_j$$

  • AScore(i): the anomalousness of node i

$$AScore(i) = \sum_{k=1}^{d} \frac{y_i^k}{y_i^*}, \qquad y_i^* = \max\left\{y_i^1, \ldots, y_i^d\right\}$$

  • Detect anomalies as the nodes with AScore(i) > thre, where thre is a threshold
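A minimal Python sketch of NB(i), AScore(i), and threshold-based detection, written directly from the two formulas above. Assumptions: the graph is given as a hypothetical adjacency dict `adj`, embeddings are plain lists, and a node whose NB vector has no positive entry (e.g., an isolated node) gets score 0.0, since the slides do not say how such nodes are handled.

```python
import math


def dist(a, b):
    """Euclidean distance ||a - b|| (not squared, as in NB(i))."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def neighbor_profile(i, X, adj):
    """NB(i) = sum over neighbors j of (1 - ||X_i - X_j||) * X_j."""
    y = [0.0] * len(X[i])
    for j in adj[i]:
        w = 1.0 - dist(X[i], X[j])  # closer neighbors weigh more
        for k, x_jk in enumerate(X[j]):
            y[k] += w * x_jk
    return y


def ascore(y):
    """AScore = sum_k y^k / y*, with y* = max_k y^k."""
    y_star = max(y)
    if y_star <= 0.0:
        return 0.0  # assumption: undefined in the slides, treated as not anomalous
    return sum(y) / y_star


def detect(X, adj, thre):
    """Nodes whose AScore exceeds the threshold thre."""
    return [i for i in adj if ascore(neighbor_profile(i, X, adj)) > thre]
```

For instance, a node whose two neighbors sit at the same distance but in two different dimensions gets an NB vector with two equal entries, and therefore AScore = 2; a profile dominated by one dimension scores close to 1.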

SLIDE 12

Example

  • Optimality of embedding, i.e., minimum value of O
  • Small values within groups, because of missing edges
  • No values across groups
  • Certain values for the red node (no better embedding)
  • Anomalousness of nodes
  • AScore(red) = 4 (equal values in the dimensions of NB(red))
  • AScore(i) ≈ 1 for the others (NB(i) has only one dominating dimension)

$$O = \sum_{(i,j) \in E} \lVert X_i - X_j \rVert^2 + \alpha \cdot \sum_{(i,j) \notin E} \left(1 - \lVert X_i - X_j \rVert^2\right)$$

$$AScore(i) = \sum_{k=1}^{d} \frac{y_i^k}{y_i^*}, \qquad y_i^* = \max\left\{y_i^1, \ldots, y_i^d\right\}$$

The red node is detected as an anomaly!
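The value 4 follows directly from the AScore formula. As a worked check, assume (hypothetically, since the example is only shown as a figure) that d = 4, that the red node has one neighbor in each of the four groups, that these neighbors are embedded as the unit vectors e_1, …, e_4, and that all four lie at the same distance δ < 1 from X_red:

```latex
\begin{aligned}
NB(\text{red}) &= \sum_{k=1}^{4} (1 - \delta)\, e_k = (1 - \delta)\,(1, 1, 1, 1),\\
y_{\text{red}}^{*} &= 1 - \delta,\\
AScore(\text{red}) &= \frac{4\,(1 - \delta)}{1 - \delta} = 4.
\end{aligned}
```

By contrast, a node whose neighbors all lie in its own group has NB(i) concentrated in one dimension, so the sum barely exceeds the maximum and AScore(i) ≈ 1.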

SLIDE 13

Outline

  • Anomaly detection model
  • Algorithm optimization techniques
  • Sampling
  • Graph partitioning based initialization
  • Dimension reduction
  • Evaluation

SLIDE 14

Issues in the model

  • Objective function O is a sum over O(n²) terms
  • Prohibitive for large social networks
  • Optimizing O uses a gradient descent method
  • Critically dependent on a good initialization
  • Dimensionality of embedding (i.e., d) could be large
  • E.g., 8,353 for YouTube and 6,288,363 for Orkut [Yang & Leskovec 2012]
  • J. Yang and J. Leskovec. Defining and evaluating network communities based on ground-truth. In ICDM, 2012.

SLIDE 15

Sampling

  • Objective function O is a sum over O(n²) terms

$$O = \sum_{(i,j) \in E} \lVert X_i - X_j \rVert^2 + \alpha \cdot \sum_{(i,j) \notin E} \left(1 - \lVert X_i - X_j \rVert^2\right), \qquad \alpha = \frac{m}{n^2 - m}$$

  • Observation: balancing factor α is close to 0
  • Very inefficient
  • Possible to approximately represent O by sampling
  • Sampled objective function O, with |Es| = |E| = m:

$$O \approx \sum_{(i,j) \in E} \lVert X_i - X_j \rVert^2 + \sum_{(i,j) \in E_s} \left(1 - \lVert X_i - X_j \rVert^2\right), \qquad E_s \subset \left\{(i,j) \mid (i,j) \notin E\right\}$$
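Why m samples suffice: treating the non-edge sum as having n² − m terms, as the denominator of α suggests, α times that sum equals m times the average non-edge term, and summing the same term over m uniformly sampled non-edges estimates exactly that. The sketch below draws Es without replacement by rejection sampling; the sampling scheme, the fixed seed, and the function names are assumptions, not details given in the slides.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance ||a - b||^2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sampled_objective(X, edges, seed=0):
    """Approximate O by replacing the alpha-weighted non-edge sum with a
    plain sum over |E_s| = |E| = m uniformly sampled non-edges."""
    rng = random.Random(seed)
    n = len(X)
    edge_set = {frozenset(e) for e in edges}
    m = len(edge_set)
    if m > n * (n - 1) // 2 - m:
        raise ValueError("too few non-edges to sample |E| of them")

    # Rejection sampling: draw random node pairs until m distinct non-edges are found.
    sample = set()
    while len(sample) < m:
        pair = frozenset(rng.sample(range(n), 2))
        if pair not in edge_set:
            sample.add(pair)

    # Both sums now have m terms, so one evaluation costs O(m * d).
    linked = sum(sq_dist(X[i], X[j]) for i, j in map(tuple, edge_set))
    unlinked = sum(1.0 - sq_dist(X[i], X[j]) for i, j in map(tuple, sample))
    return linked + unlinked
```

Rejection sampling is cheap here because social networks are sparse: almost every random pair is a non-edge, so the loop rarely retries.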

SLIDE 16

Graph partitioning based initialization

  • Optimizing O uses a gradient descent method
  • Critically dependent on a good initialization
  • Incorporating graph partitioning (METIS) for initialization
  • Pi: partition number of node i
  • A good initialization means a small value of O
  • Densely connected nodes have similar values of Xi
  • Nodes across groups have diverse values of Xi

$$X_i = \left(x_i^1, x_i^2, \ldots, x_i^d\right), \qquad x_i^j = \begin{cases} 1, & j = P_i \\ 0, & j \neq P_i \end{cases}$$
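The initialization rule above is a one-hot encoding of each node's partition. A minimal sketch, assuming the partition numbers Pi are already available as 0-based integers in [0, d) (the slides obtain them from METIS; calling METIS is not shown here); the function name is hypothetical.

```python
def init_from_partition(P, d):
    """One-hot initial embedding: x_i^j = 1 if j == P[i], else 0."""
    X = []
    for p in P:
        if not 0 <= p < d:
            raise ValueError(f"partition id {p} outside [0, {d})")
        row = [0.0] * d
        row[p] = 1.0  # node starts entirely in its own partition's dimension
        X.append(row)
    return X


print(init_from_partition([0, 2, 1], 3))
```

Nodes in the same partition start with identical vectors, which matches the goal that densely connected nodes have similar values of Xi.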

SLIDE 17

Dimension reduction

  • The complete d dimensions are unnecessary
  • Nodes typically connect to a limited number of communities
  • A limited number of communities suffices to ascertain anomalies
  • Data approximation (k+β reduction)
  • Only maintain (k+β) dimensions for the embedding of each node
  • k: the maximum number of communities a node connects to
  • β: tolerance for mistakes when determining the k communities
  • k ≪ d and β ≪ d, e.g., 10 and 2 for a network with n = 10⁶
  • Dimensionality of embedding (i.e., d) can be large

[Figure: (Gordon) Hughes effect]
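The slides do not spell out how the k+β retained dimensions are chosen. One plausible reading, sketched below purely as an assumption, is to keep each node's k+β largest coordinates in a sparse map, so that a distance computation touches at most 2(k+β) dimensions instead of d. Function names are hypothetical.

```python
import heapq


def reduce_vector(x, k, beta):
    """Keep the k + beta largest coordinates of x as a sparse {dim: value} map.

    Zero entries are dropped, since they carry no community membership.
    """
    top = heapq.nlargest(k + beta, range(len(x)), key=lambda t: x[t])
    return {t: x[t] for t in top if x[t] != 0.0}


def sparse_sq_dist(a, b):
    """||a - b||^2 for sparse vectors; cost is O(k + beta), not O(d)."""
    dims = a.keys() | b.keys()
    return sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in dims)
```

For example, `reduce_vector([0.1, 0.5, 0.0, 0.3, 0.2], k=1, beta=1)` keeps `{1: 0.5, 3: 0.3}`; storing n such maps gives the O(n·(k+β)) space listed on the next slide.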

SLIDE 18

Impacts of optimization techniques

Technique | Space | Efficiency | Effectiveness
Sampling | / | Prev.: O(n²·d); After: O(m·d) | Remains effective (from experiments)
Graph partitioning | / | Prev.: 0; After: O(n + m + d·log(d)) | Provides a good initialization
k+β reduction | Prev.: O(n·d); After: O(n·(k+β)) | Prev.: O(t·m·d); After: O(t·m·(k+β)) | Slightly improves effectiveness

t: number of iterations

SLIDE 19

Outline

  • Anomaly detection model
  • Algorithm optimizations
  • Evaluation

SLIDE 20

Experimental settings

  • Datasets

Dataset | # of nodes | # of edges | Description
Amazon | 334,863 | 925,872 | Product co-purchasing
DBLP | 1,150,852 | 5,098,175 | Co-authorship
Synthetic | 10⁵ – 4×10⁶ | m = n^1.15 | LFR benchmark graph

  • Anomalies are injected into the Synthetic data to provide ground truth
  • Algorithms
  • Embed(d): embedding with d dimensions
  • Embed(k+β): embedding with k+β reduction
  • Oddball: based on violations of power laws of egonet-based features
  • MDS(d): similar to Embed(d), except that it uses multi-dimensional scaling for the embedding (preserving global structure)

  • Parameters: d = n/500, k = avgDeg (average degree), β = k/4
  • Implementation: C++, Core i5 3.10GHz, 16GB of memory

SLIDE 21

Case study on DBLP

  • Different people with the same name: Wei Wang
  • 84 people named Wei Wang [DBLP, May 10 2016]
  • University of Waterloo (Canada), Fudan University (China), University of California, San Diego (USA), etc.
  • People with many collaborators in diverse institutes: Dr. Ajith Abraham
  • Director of intelligence research labs, which have members from more than 100 countries
  • Works in a multi-disciplinary environment involving machine intelligence, cyber security, sensor networks and data mining
  • Teaches at 23 universities all over the world

SLIDE 22

Quality study: modularity

  • Modularity measures the strength of division of a network into communities
  • Using modularity to evaluate the improvement in the effectiveness of community detection

Table 1: Improvement of modularity

Dataset | Oddball | Embed(d) | Embed(k+β)
Amazon | 2.1% | 2.8% | 3.0%
DBLP | 4.2% | 4.1% | 5.6%
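For reference, the standard Newman-Girvan modularity of a given partition can be computed as below. This is a sketch of the measure only: the slides do not state which community detection algorithm produced the partitions or exactly how detected anomalies were handled before modularity was recomputed, and the function name is hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity Q = sum_c [ L_c / m - (D_c / 2m)^2 ].

    edges: list of undirected edges (u, v), no duplicates or self-loops.
    community: dict mapping each node to its community label.
    L_c = number of edges inside community c, D_c = total degree of c.
    """
    m = len(edges)
    inside = defaultdict(int)
    degree = defaultdict(int)
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single edge: Q = 2 * (3/7 - 1/4) = 5/14.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(modularity(edges, {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}))
```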

SLIDE 23

Quality study: F1 measure

  • On Synthetic data with ground-truth of anomalies
  • Mixing parameter μ: fraction of inter-group edges (i.e., as μ increases, the strength of community structure decreases)

Table 2: F1 score of anomalies

Setting | Oddball | Embed(d) | Embed(k+β)
Varying graph sizes | 70% | 88% | 89%
Varying μ | 68% | 86% | 88%
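Table 2 reports F1 against the injected ground-truth anomalies. The standard F1 score over sets of node ids is sketched below for reference; how the detection threshold was set in these experiments is not stated in the slides, and the function name is hypothetical.

```python
def f1_score(predicted, truth):
    """Harmonic mean of precision and recall over sets of node ids."""
    predicted, truth = set(predicted), set(truth)
    tp = len(predicted & truth)  # correctly detected anomalies
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(truth)
    return 2 * precision * recall / (precision + recall)
```

For example, reporting nodes {1, 2, 3, 4} when the injected anomalies are {2, 3, 4, 5, 6} gives precision 0.75, recall 0.6, and F1 = 2/3.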

SLIDE 24

Impacts on quality: d & embedding

Table 3: MDS(d) vs. Embed(d) using F1 measure

d | MDS(d) | Embed(d)
200 | 11.3% | 89.4%
400 | 13.6% | 90.6%
600 | 12.7% | 89.8%
800 | 7.9% | 85.5%
1000 | 11.3% | 88.8%
Average | 11.3% | 88.8%

  • Multi-dimensional scaling fails to effectively detect anomalies
  • Our approach works well as long as d falls into a reasonable range
  • Synthetic data, n = 400K, n/500 = 800

SLIDE 25

Efficiency study

[Figure: running time comparison; x: out of memory exception]

Table 4: Running time comparison

Dataset | E(k+β)/E(d) | E(k+β)/MDS(d)
Amazon | 35.3% | 25.0%
DBLP | 23.4% | 13.1%
Synthetic | 25.6% | 13.2%

SLIDE 26

Summary

  • An embedding approach
  • Preserves local linkage structure of networks
  • A quantitative measure AScore inspired by structural inconsistencies and structural holes
  • Three algorithm optimization techniques
  • Structural inconsistencies
  • Nodes that connect to a number of diverse influential communities
  • A formal quantitative definition of social brokers
  • Quality and efficiency results
  • Modularity increases 2.9%, 4.9% and 6.9% on Amazon, DBLP and Synthetic data
  • F1 measure is 88% on Synthetic data
  • Running time increases reasonably w.r.t. graph size

SLIDE 27

Thanks! Q & A
