An Embedding Approach to Anomaly Detection




  1. An Embedding Approach to Anomaly Detection
     Renjun Hu (1), Charu Aggarwal (2), Shuai Ma (1), and Jinpeng Huai (1)
     (1) SKLSDE Lab, Beihang University, China
     (2) IBM T. J. Watson Research Center, USA

  2. Motivation
     Anomaly detection
       • Identification of patterns in data that do not conform to expected behaviors [Chandola et al. 2009]
       • Useful in a wide variety of applications
     In networks, anomaly detection has broader meanings
       • Application-specific significance
       • Possibility to improve the performance of network-centric mining tasks such as community detection and classification
     Reference:
       V. Chandola, A. Banerjee, and V. Kumar. Anomaly detection: A survey. ACM Comput. Surv. 41(3), 2009.

  3. Motivation
     Structural hole theory [Burt 1992, 2004]
       • A theory of social capital (Prof. Ronald S. Burt)
       • A structural hole is a gap between two nodes that have complementary sources of information
       • Node A (a social broker) is more likely to get novel information than node B, even though they have the same number of links
     How to detect social brokers? A formal quantitative definition is needed in the first place!
     References:
       Burt, Ronald S. (1992). Structural Holes: The Social Structure of Competition. Harvard University Press.
       Burt, Ronald S. (2004). Structural Holes and Good Ideas. American Journal of Sociology 110 (2): 349–399.

  4. Motivation
     Structural inconsistencies
       • Nodes that connect to a number of diverse influential communities
       • Detect social brokers quantitatively
     Anomalousness from homophily [McPherson et al. 2001]
       • Homophily: linked nodes have similar properties
       • Fundamental to a wide variety of algorithms in network science, e.g., community detection, collective classification, link prediction, influence analysis
       • Violated by structural inconsistencies
     Reference:
       M. McPherson, L. Smith-Lovin, and J. Cook. Birds of a feather: Homophily in social networks. Annual Review of Sociology, Vol. 27: 415-444, 2001.

  5. Motivation
     Structural inconsistencies
       • Nodes that connect to a number of diverse influential communities
       • Detect social brokers quantitatively
     The presence of structural inconsistencies may:
       • have a substantial impact on network structure, e.g., all nodes tend to form one large cluster
       • prevent effective applications of network mining algorithms, e.g., it is hard for community detection algorithms to achieve meaningful clusters

  6. Outline
     Anomaly detection model
       • Graph embedding
       • A quantitative measure of anomaly
     Algorithm optimization techniques
     Evaluation

  7. Why graph embedding?
     Structural inconsistencies
       • Connect to a number of diverse influential communities
     Evaluate the diversity or similarity of nodes. How?
       • To node B, node A is more similar than node C, even though A and C have the same (global) distance from B.
     Graph embedding
       • Associate each node with a multidimensional vector
       • Preserve local linkage structure (instead of global structure)
       • Each dimension corresponds to a community in the network

  8. Why graph embedding?
     Structural inconsistencies
       • Connect to a number of diverse influential communities
     An alternative option: community detection followed by anomaly detection
       • Community detection does not distinguish anomalies from normal nodes
       • The presence of anomalies affects the results of community detection
       • Community detection is a heavy task
       • Fails to detect structural inconsistencies!

  9. Graph embedding
     Given an undirected graph $G = (V, E)$, associate each node $i$ with a $d$-dimensional vector $X_i$
       • $V = \{1, 2, \ldots, n\}$
       • $d$: number of communities
       • $X_i$: correlation between node $i$ and the $d$ communities
     A reasonable selection of $d$ suffices for anomaly detection. It is not necessary to use the number of real-life communities.

  10. Graph embedding
      Given an undirected graph $G = (V, E)$, associate each node $i$ with a $d$-dimensional vector $X_i$
      Goal: preserve local linkage structure
        • Connected nodes should have similar values of $X_i$
        • Disconnected nodes should have diverse values of $X_i$
      Computation: minimize the objective function $O$
        $$O = \sum_{(i,j) \in E} \lVert X_i - X_j \rVert^2 + \alpha \cdot \sum_{(i,j) \notin E} \left( \lVert X_i - X_j \rVert - 1 \right)^2, \qquad \alpha = \frac{m}{\binom{n}{2} - m}$$
        • $n$: number of nodes in $G$; $m$: number of edges in $G$
        • $\alpha$: balancing factor that regulates the importance of the two components in $O$
        • The embedding ensures that $0 \le \lVert X_i - X_j \rVert^2 \le 1$
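
The objective on this slide can be evaluated directly on a small graph. The following is a minimal Python sketch, assuming $\lVert \cdot \rVert$ is the Euclidean norm, nodes are numbered 0..n-1, and the embedding is a list of lists. The names `dist` and `objective` are illustrative assumptions and not the authors' C++ implementation.

```python
import math
from itertools import combinations


def dist(x, y):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def objective(X, edges):
    """Evaluate O = sum_E ||Xi-Xj||^2 + alpha * sum_notE (||Xi-Xj|| - 1)^2.

    X     : list of n embedding vectors (assumed layout)
    edges : iterable of undirected edges (i, j) over nodes 0..n-1
    """
    n = len(X)
    edge_set = {frozenset(e) for e in edges}
    m = len(edge_set)
    non_edges = n * (n - 1) // 2 - m
    # alpha = m / (C(n, 2) - m), as defined on the slide
    alpha = m / non_edges if non_edges else 0.0

    linked, unlinked = 0.0, 0.0
    for i, j in combinations(range(n), 2):
        d = dist(X[i], X[j])
        if frozenset((i, j)) in edge_set:
            linked += d ** 2            # connected nodes: penalize distance
        else:
            unlinked += (d - 1.0) ** 2  # disconnected nodes: penalize distance != 1
    return linked + alpha * unlinked
```

Looping over all node pairs is only practical for toy graphs, since the number of pairs grows quadratically in n.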

  11. A quantitative measure
      Inspired by structural inconsistencies and structural holes (social brokers)
        • Connect to a number of diverse influential communities
        • Bridge across complementary sources
      $NB(i)$: how node $i$ connects to communities
        $$NB(i) = \left( y_i^1, \ldots, y_i^d \right) = \sum_{(i,j) \in E} \left( 1 - \lVert X_i - X_j \rVert \right) \cdot X_j$$
      $AScore(i)$: the anomalousness of node $i$
        $$AScore(i) = \sum_{k=1}^{d} \frac{y_i^k}{y_i^*}, \qquad y_i^* = \max \left\{ y_i^1, \ldots, y_i^d \right\}$$
        • Detect anomalies by $AScore(i) > thre$
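
The two definitions above can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes the Euclidean norm, an adjacency dictionary `adj` mapping each node to its neighbours, and a score of 0 for nodes whose NB vector has no positive entry (a convention chosen here, not stated in the slides). The function names `nb` and `ascore` are hypothetical.

```python
import math


def dist(x, y):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def nb(i, X, adj):
    """NB(i) = sum over neighbours j of (1 - ||Xi - Xj||) * Xj."""
    y = [0.0] * len(X[i])
    for j in adj[i]:
        w = 1.0 - dist(X[i], X[j])  # closer neighbours contribute more
        for k, x_jk in enumerate(X[j]):
            y[k] += w * x_jk
    return y


def ascore(i, X, adj):
    """AScore(i) = sum_k y_i^k / max_k y_i^k."""
    y = nb(i, X, adj)
    y_max = max(y)
    if y_max <= 0.0:
        return 0.0  # assumption: undefined ratio is reported as 0
    return sum(y) / y_max


# Toy data: node 0 is linked to one node in each of four communities.
s = 1 / math.sqrt(2)
X = [
    [0.3, 0.3, 0.3, 0.3],  # node 0: spread evenly over all dimensions
    [s, 0.0, 0.0, 0.0],
    [0.0, s, 0.0, 0.0],
    [0.0, 0.0, s, 0.0],
    [0.0, 0.0, 0.0, s],
    [s, 0.0, 0.0, 0.0],    # node 5: same community as node 1
]
adj = {0: [1, 2, 3, 4], 1: [0, 5], 2: [0], 3: [0], 4: [0], 5: [1]}
scores = {i: ascore(i, X, adj) for i in adj}
```

In this toy setup the NB vector of node 0 has equal entries in all four dimensions, so its score is close to 4, while node 5 has a single non-zero dimension and scores 1.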

  12. Example
      Optimality of embedding, i.e., minimum value of $O$
        $$O = \sum_{(i,j) \in E} \lVert X_i - X_j \rVert^2 + \alpha \cdot \sum_{(i,j) \notin E} \left( \lVert X_i - X_j \rVert - 1 \right)^2$$
        • Small values within groups because of missing edges
        • No values across groups
        • Certain values for the red node (no better embedding)
      Anomalousness of nodes
        $$AScore(i) = \sum_{k=1}^{d} \frac{y_i^k}{y_i^*}, \qquad y_i^* = \max \left\{ y_i^1, \ldots, y_i^d \right\}$$
        • $AScore(red) = 4$ (equal values in the dimensions of $NB(red)$)
        • $AScore(i) \approx 1$ for the others ($NB(i)$ has only one dominating dimension)
      The red node is detected as an anomaly!

  13. Outline
      Anomaly detection model
      Algorithm optimization techniques
        • Sampling
        • Graph partitioning based initialization
        • Dimension reduction
      Evaluation

  14. Issues in the model
      Objective function $O$ is a sum over $O(n^2)$ terms
        • Prohibitive for large social networks
      Optimizing $O$ uses a gradient descent method
        • Critically dependent on a good initialization
      Dimensionality of the embedding (i.e., $d$) could be large
        • E.g., 8,353 for YouTube and 6,288,363 for Orkut [Yang & Leskovec 2012]
      Reference:
        J. Yang and J. Leskovec. Defining and evaluating network communities based on ground-truth. In ICDM, 2012.
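
For reference, a gradient descent step needs the partial derivative of $O$ with respect to each $X_i$. The following is a sketch derived from the objective as stated on the Graph embedding slide, assuming $\lVert \cdot \rVert$ is the Euclidean norm, $X_i \neq X_j$ for the non-edge terms, and ignoring any constraints on $X_i$. The step size $\eta$ is a hypothetical symbol that does not appear in the slides.

```latex
\frac{\partial O}{\partial X_i}
  = 2 \sum_{j:\,(i,j) \in E} (X_i - X_j)
  + 2\alpha \sum_{j \neq i:\,(i,j) \notin E}
      \left( \lVert X_i - X_j \rVert - 1 \right)
      \frac{X_i - X_j}{\lVert X_i - X_j \rVert},
\qquad
X_i \leftarrow X_i - \eta \, \frac{\partial O}{\partial X_i}
```

The second sum ranges over every non-neighbour of node $i$, which is where the quadratic number of terms comes from.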

  15. Sampling
      Objective function $O$ is a sum over $O(n^2)$ terms
        $$O = \sum_{(i,j) \in E} \lVert X_i - X_j \rVert^2 + \alpha \cdot \sum_{(i,j) \notin E} \left( \lVert X_i - X_j \rVert - 1 \right)^2, \qquad \alpha = \frac{m}{\binom{n}{2} - m}$$
      Observation: the balancing factor $\alpha$ is close to 0
        • Very inefficient to sum over all non-edges
        • Possible to approximately represent $O$ by sampling
      Sampled objective function $O$
        $$O \approx \sum_{(i,j) \in E} \lVert X_i - X_j \rVert^2 + \sum_{(i,j) \in E_s} \left( \lVert X_i - X_j \rVert - 1 \right)^2, \qquad E_s \subset \{ (i,j) \mid (i,j) \notin E \}$$
        • $|E_s| = |E| = m$
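
The sampling idea can be sketched as follows. This is a minimal Python sketch under stated assumptions: non-edges are drawn uniformly at random by rejection sampling (the slides do not say how $E_s$ is drawn), nodes are numbered 0..n-1, and the names `balancing_factor`, `sample_non_edges`, and `sampled_objective` are hypothetical.

```python
import math
import random


def balancing_factor(n, m):
    """alpha = m / (C(n, 2) - m), the balancing factor of the full objective."""
    return m / (n * (n - 1) // 2 - m)


def sample_non_edges(n, edges, seed=0):
    """Draw |E_s| = m distinct non-edges (assumed: uniform rejection sampling)."""
    edge_set = {frozenset(e) for e in edges}
    m = len(edge_set)
    # Never ask for more non-edges than the graph actually has.
    target = min(m, n * (n - 1) // 2 - m)
    rng = random.Random(seed)
    sampled = set()
    while len(sampled) < target:
        i, j = rng.sample(range(n), 2)  # two distinct nodes
        pair = frozenset((i, j))
        if pair not in edge_set:
            sampled.add(pair)
    return [tuple(sorted(p)) for p in sampled]


def dist(x, y):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def sampled_objective(X, edges, non_edges):
    """Edge term plus the unweighted term over the sampled non-edges E_s."""
    linked = sum(dist(X[i], X[j]) ** 2 for i, j in edges)
    unlinked = sum((dist(X[i], X[j]) - 1.0) ** 2 for i, j in non_edges)
    return linked + unlinked
```

For a graph of the size of the Amazon dataset listed in the experimental settings (334,863 nodes and 925,872 edges), `balancing_factor` evaluates to the order of 10^-5, which illustrates why $\alpha$ is described as close to 0. Rejection sampling is a reasonable choice only for sparse graphs, where almost every random pair is a non-edge.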

  16. Graph partitioning based initialization
      Optimizing $O$ uses a gradient descent method
        • Critically dependent on a good initialization
      A good initialization means a small value of $O$
        • Densely connected nodes have similar values of $X_i$
        • Nodes across groups have diverse values of $X_i$
      Incorporating graph partitioning (METIS) for initialization
        • $P_i$: partition number of node $i$
        $$X_i = \left( x_i^1, \ldots, x_i^d \right), \qquad x_i^j = \begin{cases} \frac{1}{\sqrt{2}} & j = P_i \\ 0 & j \neq P_i \end{cases}$$
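
Given partition labels, the initialization above amounts to a scaled one-hot encoding. The sketch below assumes the labels are already available as integers in 0..d-1 (for example from a METIS run, which is not invoked here) and assumes the non-zero entry is 1/sqrt(2), so that nodes in different partitions start at Euclidean distance 1. The function name `init_embedding` is hypothetical.

```python
import math


def init_embedding(partition, d):
    """Build X_i with x_i^j = 1/sqrt(2) if j == P_i, else 0.

    partition : list where partition[i] is the partition number P_i of node i
                (assumed to be precomputed, 0-based, and smaller than d)
    d         : number of embedding dimensions
    """
    s = 1.0 / math.sqrt(2.0)  # assumed value of the non-zero entry
    X = []
    for p in partition:
        row = [0.0] * d
        row[p] = s
        X.append(row)
    return X


# Three nodes: nodes 0 and 1 share a partition, node 2 is in another one.
X0 = init_embedding([0, 0, 1], d=2)
```

With this start, nodes in the same partition coincide (distance 0) and nodes in different partitions sit at distance 1, so both kinds of terms in the objective begin small when the partitioning matches the link structure.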

  17. Dimension reduction
      Dimensionality of the embedding (i.e., $d$) can be large
      The complete $d$ dimensions are unnecessary
        • Nodes typically connect to a limited number of communities
        • A limited number of communities suffices to ascertain anomalies
        • (Gordon) Hughes effect
      Data approximation ($k + \beta$ reduction)
        • Only maintain $(k + \beta)$ dimensions for the embedding of each node
        • $k$: the maximum number of communities a node connects to
        • $\beta$: slack that tolerates mistakes when determining the $k$ communities
        • $k \ll d$ and $\beta \ll d$, e.g., 10 and 2 for a network with $n = 10^6$
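
One way to picture the $(k + \beta)$ reduction is as a sparse per-node vector that keeps only a few dimensions. The sketch below assumes the retained dimensions are those with the largest values, which is an illustrative choice; the slides do not specify the selection rule. The function name `reduce_dims` is hypothetical.

```python
import heapq


def reduce_dims(x, k, beta):
    """Keep the (k + beta) largest entries of x as a sparse {dimension: value} map.

    Selecting by largest value is an assumption made for illustration.
    """
    keep = heapq.nlargest(k + beta, range(len(x)), key=lambda j: x[j])
    return {j: x[j] for j in keep}


# A 5-dimensional vector reduced to k + beta = 3 stored dimensions.
sparse = reduce_dims([0.1, 0.0, 0.6, 0.05, 0.3], k=2, beta=1)
```

Storing a dictionary of at most $k + \beta$ entries per node is what brings the space from $n \cdot d$ down to $n \cdot (k + \beta)$.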

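  18. Impacts of optimization techniques
      | Technique          | Space                                  | Efficiency                                                               | Effectiveness                           |
      |--------------------|----------------------------------------|--------------------------------------------------------------------------|-----------------------------------------|
      | Sampling           | /                                      | Prev.: O(n^2 ∙ d); After: O(m ∙ d)                                       | Remain effective (from experiments)     |
      | Graph partitioning | /                                      | Prev.: 0; After: O(n + m + d ∙ log(d))                                   | Provide a good initialization           |
      | k+β reduction      | Prev.: O(n ∙ d); After: O(n ∙ (k + β)) | Prev.: O(t ∙ m ∙ d); After: O(t ∙ m ∙ (k + β)), where t = # of iterations | Slightly improve effectiveness          |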

  19. Outline
      Anomaly detection model
      Algorithm optimizations
      Evaluation

  20. Experimental settings
      Datasets
        | Dataset   | # of nodes      | # of edges | Description           |
        |-----------|-----------------|------------|-----------------------|
        | Amazon    | 334,863         | 925,872    | Product co-purchasing |
        | DBLP      | 1,150,852       | 5,098,175  | Co-authorship         |
        | Synthetic | 10^5 to 4x10^6  | m = n^1.15 | LFR-benchmark graph   |
        • Anomaly injection on Synthetic data for ground-truth of anomalies
      Algorithms
        • Embed(d): embedding of d dimensions
        • Embed(k+β): embedding with k+β reduction
        • Oddball: based on violation of power-laws of egonet-based features
        • MDS(d): similar to Embed(d), except using multi-dimensional scaling for embedding (preserves global structure)
      Parameters: d = n/500, k = avgDeg, β = k/4
      Implementation: C++, Core i5 3.10GHz, 16GB of memory
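
To make the parameter rules concrete, the sketch below derives d, k, and β from the node and edge counts. It assumes avgDeg is the usual average degree 2m/n of an undirected graph and that each value is rounded to the nearest integer with a floor of 1; the rounding is an assumption, since the slides do not say how fractional values are handled. The function name `default_parameters` is hypothetical.

```python
def default_parameters(n, m):
    """Return (d, k, beta) following d = n/500, k = avgDeg, beta = k/4.

    Rounding to the nearest integer (minimum 1) is an assumption.
    """
    d = max(1, round(n / 500))
    avg_deg = 2 * m / n          # average degree of an undirected graph
    k = max(1, round(avg_deg))
    beta = max(1, round(k / 4))
    return d, k, beta


# Node and edge counts taken from the dataset table above.
amazon = default_parameters(334_863, 925_872)
dblp = default_parameters(1_150_852, 5_098_175)
```

Under these rounding assumptions, the Amazon graph yields d = 670, k = 6, β = 2, and the DBLP graph yields d = 2302, k = 9, β = 2.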
