Detecting Anomalies in Inter-host Communication Graph



  1. Detecting Anomalies in Inter-host Communication Graph
     Jan. 14, 2009
     Keisuke ISHIBASHI*, Tsuyoshi KONDOH*, Shigeaki HARADA§, Tatsuya MORI§, Ryoichi KAWAHARA§, Shoichiro ASANO¶
     * NTT Information Platform Labs. § NTT Service Integration Labs. ¶ National Institute of Informatics
     FloCon 2009

  2. Outline
     • Anomalous traffic detection
     • Inter-host communication graph
     • Anomalies in communication graph
     • Detecting method for graph anomaly
       – Similarities between graphs
     • Experimental results
       – Synthesized traffic
       – Actual traffic

  3. Anomalous traffic detection
     • DDoS attacks, network failures, etc.: can be detected as a sudden change in traffic volume
     • Worm scans or botnet C&C traffic: cannot be found as a volume change
       – Their traffic volume is very small and is buried in normal traffic
     • May be found as a sudden change in traffic pattern, not volume
     • Traffic pattern
       – Entropy: can reveal traffic characteristics per host
       – Communication pattern between hosts: can reveal anomalous traffic that appears as an inter-host communication pattern

  4. Communication pattern between hosts
     • Can be represented as a graph
     • Communication graphs for anomalous traffic
       – Some of them are difficult to detect with conventional methods
     • Conventional methods: monitoring entropies in number of flows, etc.
     [Figure: example communication graphs labeled "Worm scan", "Botnet", and "P2P Botnet"; node labels include worm-infected hosts, C&C servers, and victims; annotation: "More difficult to detect"]

  5. Time series of communication graph

  6. Challenge
     • How to detect an anomaly (change) in a time series of graphs?
     • Visualization or animation of the communication graph [Yurcik06]
       – Useful especially for digging into anomalous events by hand
       – However, eyeballing by a human operator is needed to detect an anomalous event
     • Automated detection: need to define a similarity between graphs S(G_t, G_{t+1}), where G_t and G_{t+1} are the graphs at times t and t+1
       – Can judge as an anomaly if S(G_t, G_{t+1}) suddenly decreases
     [Figure: graphs at t=0, t=1, t=2, t=3 with the similarities S(G_0, G_1), S(G_1, G_2), S(G_2, G_3) between consecutive graphs]
     • [Yurcik06] William Yurcik, "VisFlowConnect-IP: A Link-Based Visualization of NetFlows for Security Monitoring," 18th Annual FIRST Conference, June 2006.
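The detection rule on this slide ("judge as an anomaly if the similarity suddenly decreases") can be sketched as follows. This is only an illustration: the slides do not say how "suddenly" is quantified, so the function name `sudden_drops` and the `max_drop` threshold are assumptions.

```python
def sudden_drops(similarities, max_drop=0.3):
    """Return the time indices t at which S(G_{t-1}, G_t) fell sharply.

    similarities[t] is the similarity between graph t and graph t+1's
    predecessor in whatever order the caller computed them. max_drop is a
    hypothetical threshold, not a value taken from the slides.
    """
    flagged = []
    for t in range(1, len(similarities)):
        # A "sudden decrease" is modeled as a large fall from the previous value.
        if similarities[t - 1] - similarities[t] > max_drop:
            flagged.append(t)
    return flagged


series = [0.98, 0.97, 0.40, 0.95, 0.96]
print(sudden_drops(series))  # prints [2]
```

A threshold on the step-to-step difference is the simplest choice; a running mean or quantile would be an equally valid way to define "sudden".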

  7. Similarities between graphs
     • Graph kernel
       – Define an "inner product"-like function f(•, •), a.k.a. a kernel, on non-linear spaces [Kashima03]
     • Edit distance
       – Number of operations needed to change graph G into G' [Bunke06]
       – Operations: add/remove edges/nodes
     • Can be used to detect anomalies in graph time series
     • Difficult to identify the source of an anomaly
     • [Kashima03] H. Kashima et al., "Marginalized kernels between labeled graphs," In Proc. ICML 2003, pp. 321-328.
     • [Bunke06] H. Bunke et al., "Computer Network Monitoring and Abnormal Event Detection Using Graph Matching and Multidimensional Scaling," LNCS Vol. 4065, 2006.
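As a rough illustration of the edit-distance idea, the sketch below counts node and edge additions/removals between two graphs. It assumes every host has a unique label (for example its IP address), so nodes can be matched by name and no graph-matching search is needed; that assumption, the function name, and the unit cost per operation are choices made here, not details from the slides.

```python
def graph_edit_distance(nodes_a, edges_a, nodes_b, edges_b):
    """Count add/remove operations turning graph A into graph B.

    Assumes uniquely labeled nodes, so the distance reduces to the size of
    the symmetric differences of the node sets and of the edge sets.
    Edges are (source, destination) tuples.
    """
    node_ops = len(set(nodes_a) ^ set(nodes_b))  # nodes to add or remove
    edge_ops = len(set(edges_a) ^ set(edges_b))  # edges to add or remove
    return node_ops + edge_ops


g_t = ({"h1", "h2", "h3"}, {("h1", "h2"), ("h1", "h3")})
g_next = ({"h1", "h2", "h4"}, {("h1", "h2"), ("h1", "h4")})
print(graph_edit_distance(*g_t, *g_next))  # prints 4
```

The result is a single number per pair of graphs, which matches the slide's point: it can flag a change but does not by itself say which hosts caused it.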

  8. Linear feature space projection
     • Linear feature space projection [Ide04]
       – Mapping a graph to a vector in a linear space that represents the features of the graph
     • As the feature vector, adopt the principal eigenvector of the adjacency matrix of the graph
       – ≈ PageRank vector
       – Dimension of the linear space: number of nodes in the graph
     [Figure: communication graph (Host1, Host2, Host3) → adjacency matrix → principal eigenvector → feature vector]
     Adjacency matrix in the figure:
             1  2  3
          1  -  1  1
          2  1  -  0
          3  1  0  -
     • [Ide04] Tsuyoshi Ide and Hisashi Kashima, "Eigenspace-based Anomaly Detection in Computer Systems," In Proc. 10th ACM SIGKDD Conference (KDD2004), Seattle, WA, USA, 2004.
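A minimal sketch of the feature extraction, using the three-host adjacency matrix from this slide with the diagonal taken as 0. Power iteration is one common way to obtain a principal eigenvector; the slides do not say which solver was used. The iteration is applied to A + I rather than A (an assumption made here): the shift leaves the eigenvectors unchanged but avoids the oscillation that plain power iteration shows on bipartite graphs such as this star.

```python
import math


def principal_eigenvector(adj, iters=200, tol=1e-12):
    """Unit-length principal eigenvector of a non-negative adjacency matrix."""
    n = len(adj)
    if n == 0:
        return []
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        # Multiply by (A + I): same eigenvectors as A, eigenvalues shifted by 1.
        w = [v[i] + sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        if max(abs(a - b) for a, b in zip(v, w)) < tol:
            return w
        v = w
    return v


# Host1 talks to Host2 and Host3; Host2 and Host3 do not talk to each other.
A = [[0, 1, 1],
     [1, 0, 0],
     [1, 0, 0]]
print([round(x, 4) for x in principal_eigenvector(A)])  # prints [0.7071, 0.5, 0.5]
```

The hub (Host1) gets the largest element, which is the sense in which the vector resembles a PageRank-style importance score.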

  9. Anomaly detection using feature vector
     • Periodically generate a communication graph from observed traffic data, and calculate the feature vectors of the graphs
     • Calculate the similarity (cosine similarity) between the graph and the previous one
     • Judge as an anomaly if the similarity suddenly decreases
     [Figure: feature vectors in a space whose axes are the vector elements for Host1, Host2, and Host3; the vectors at time t and t+1 have high similarity, while the vector at time t+2 has low similarity and is detected as an anomaly]
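The cosine similarity between two consecutive feature vectors can be computed as in the sketch below. The function name and the convention of returning 0.0 for an all-zero vector are assumptions made for this example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for an empty graph
    return dot / (norm_u * norm_v)


vec_t = [0.70, 0.50, 0.50]    # hypothetical feature vector at time t
vec_t1 = [0.72, 0.49, 0.49]   # time t+1: nearly the same direction
vec_t2 = [0.05, 0.05, 0.99]   # time t+2: direction changed
print(cosine_similarity(vec_t, vec_t1) > cosine_similarity(vec_t1, vec_t2))  # prints True
```

Because both vectors must have the same dimension, each element has to refer to the same host (or the same hash bucket) in every time slot.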

  10. Compressing adjacency matrix
     • In a large communication graph, calculating the principal eigenvector of the adjacency matrix may be difficult.
     • Compress the adjacency matrix by combining a hash matrix and a Bloom filter
       – Hash the source address and the destination address (e.g. 192.168.0.1 → 10.0.0.1) to a row Hash(SrcIP) and a column Hash(DstIP) of an M × M matrix
       – Hash the source-destination pair, H(192.168.0.1.10.0.0.1), and check with the Bloom filter whether the pair is new or not
       – If new, then increment the corresponding cell
     [Figure: example hash matrix, rows Hash(SrcIP), columns Hash(DstIP)]
                1  2  3  M
             1  1  1  1  1
             2  1  1  2  1
             3  1  1  0  1
             M  0  1  1  0

  11. Experimental results
     • Observed data: 24-hour packet capture on a 1 Gbps link
     • Use packets with ports 135/445 (scans) and 6667 (IRC)
       – The current Python implementation cannot handle the whole traffic
       – Focus on botnet-related traffic
     • Generate a graph every minute
     • Hash matrix size: 1280 × 1280
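The preprocessing described on this slide (keep only the listed ports, build one graph per minute) could be sketched as below. The record layout `(timestamp_seconds, src_ip, dst_ip, dst_port)` and the choice to match on the destination port are assumptions; the slides only say that packets "with ports 135/445/6667" were used.

```python
from collections import defaultdict

# Ports named on the slide: 135/445 (scans) and 6667 (IRC).
BOTNET_PORTS = {135, 445, 6667}


def edges_per_minute(records):
    """Group (src, dst) edges into one edge set per one-minute time slot.

    records: iterable of (timestamp_seconds, src_ip, dst_ip, dst_port),
    a hypothetical layout chosen for this example.
    """
    graphs = defaultdict(set)
    for ts, src, dst, dport in records:
        if dport in BOTNET_PORTS:
            graphs[int(ts // 60)].add((src, dst))
    return dict(graphs)


records = [
    (0.5, "a", "b", 445),
    (10.0, "a", "b", 445),   # same edge, same minute: stored once
    (61.0, "a", "c", 6667),
    (62.0, "a", "d", 80),    # port not of interest: dropped
]
print(edges_per_minute(records))  # prints {0: {('a', 'b')}, 1: {('a', 'c')}}
```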

  12. Time series of similarities of feature vectors
     • Several sudden decreases in similarity
     • Try to find the source of the anomaly for the first one
     [Figure: similarity (y-axis, 0 to 1.2) versus elapsed time (x-axis, 0:00 to 0:00 in 3-hour steps) for the eigvec series]

  13. Comparison of graphs before/after the anomaly
     • By comparing graphs and/or vectors before/after the anomaly, we can identify the source of the anomaly
     • Comparing vectors is well suited to automated identification
     • In this case: sudden large virus scan
     [Figure: plot with series degvec-before and eigvec-before; x-axis 0 to 1400, y-axes 0 to 8000 and 0 to 1]
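One simple way to automate the comparison of vectors before and after an anomaly is to rank the elements by how much they changed, as sketched here. The ranking criterion (absolute difference) and the function name are assumptions; the slides do not specify how the comparison was done.

```python
def top_changed_elements(before, after, k=3):
    """Indices of the k vector elements that changed most, largest first."""
    diffs = [(abs(a - b), i) for i, (b, a) in enumerate(zip(before, after))]
    # Sort by decreasing change; ties are broken by the lower index.
    diffs.sort(key=lambda item: (-item[0], item[1]))
    return [i for _, i in diffs[:k]]


before = [0.70, 0.50, 0.50, 0.00]
after = [0.10, 0.50, 0.40, 0.85]
print(top_changed_elements(before, after, k=2))  # prints [3, 0]
```

With a hash-compressed matrix, each returned index is a hash bucket rather than a single host, so mapping it back to addresses needs a separate lookup of which hosts fell into that bucket.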

  14. Evaluation with synthesized anomaly cluster
     • Which types of anomaly, and how large an anomaly, can be detected by the proposed method?
     • Evaluation using synthesized anomalies can answer the above question
     • First, mesh clusters of various sizes are inserted into the actual communication graph, and the similarity to the original graph is calculated
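A sketch of how a synthetic mesh cluster could be inserted into an adjacency matrix. It assumes the mesh hosts are new nodes that are fully connected to each other and not connected to any existing host; the slides do not state how the cluster was attached, so this is only one possible reading, and the function names are hypothetical.

```python
def insert_mesh(adj, mesh_size):
    """Return a copy of adj extended by a full mesh of mesh_size new nodes."""
    n = len(adj)
    total = n + mesh_size
    # Existing rows gain zero columns: no edges to the new mesh nodes.
    out = [row[:] + [0] * mesh_size for row in adj]
    for i in range(n, total):
        # Each mesh node links to every other mesh node, but not to itself.
        out.append([1 if (j >= n and j != i) else 0 for j in range(total)])
    return out


def pad(adj, extra):
    """Original graph with `extra` isolated nodes, to match dimensions."""
    n = len(adj)
    out = [row[:] + [0] * extra for row in adj]
    out.extend([0] * (n + extra) for _ in range(extra))
    return out


A = [[0, 1, 1],
     [1, 0, 0],
     [1, 0, 0]]
with_mesh = insert_mesh(A, 3)
original = pad(A, 3)
print(len(with_mesh), sum(map(sum, with_mesh)), sum(map(sum, original)))  # prints 6 10 4
```

Padding the original graph with isolated nodes gives both matrices the same dimension, so their feature vectors can be compared element by element.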

  15. Evaluation with synthesized anomaly cluster
     • With mesh size > 70, the similarity decreases and the anomaly can be found
     [Figure: similarity (y-axis, 0 to 1.2) versus number of mesh nodes (x-axis, 0 to 120) for the degvec and eigvec series]

  16. Conclusion
     • Summary
       – Propose a method to detect anomalies in communication graphs
         • Project the graph into a linear feature space, and compare the similarities between feature vectors
       – Evaluate using actual traffic data
         • Found a sudden large worm scan
     • Future work
       – Apply to other traffic data to find out which types of anomaly the proposed method can detect
       – Faster implementation

  17. Acknowledgement
     • This study was supported in part by the Ministry of Internal Affairs and Communications of Japan.
