Detecting Anomalies in Inter-hosts Communication Graph




SLIDE 1

Flocon2009 1

Detecting Anomalies in Inter-hosts Communication Graph

Jan. 14, 2009

Keisuke ISHIBASHI*, Tsuyoshi KONDOH*, Shigeaki HARADA§, Tatsuya MORI§, Ryoichi KAWAHARA§, Shoichiro ASANO¶

*NTT Information Platform Labs. §NTT Service Integration Labs. ¶National Information Institute

SLIDE 2

Outline

  • Anomalous traffic detection
  • Inter-host communication graph
  • Anomalies in communication graph
  • Detecting method for graph anomaly

– Similarities between graphs

  • Experimental results

– Synthesized traffic
– Actual traffic

SLIDE 3

Anomalous traffic detection

  • DDoS attacks, network failures, etc.: can be detected as a sudden change in traffic volume
  • Worm scans or botnet C&C traffic: cannot be found as a volume change

– Their traffic volume is very small and buried in normal traffic

  • May be found as a sudden change in traffic pattern, not volume
  • Traffic pattern

– Entropy: can reveal traffic characteristics per host
– Communication pattern between hosts: can reveal anomalous traffic that appears as an inter-host communication pattern
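To make the per-host entropy idea concrete, here is a minimal sketch that computes the Shannon entropy of each source host's destination distribution. The flow format (`(src, dst)` pairs) and the function names are assumptions for illustration; the slides do not say which entropy features are monitored.

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(counts):
    """Shannon entropy (in bits) of a histogram given as a collection of counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def destination_entropy_per_host(flows):
    """Map each source host to the entropy of the destinations it contacts."""
    per_src = defaultdict(Counter)
    for src, dst in flows:
        per_src[src][dst] += 1
    return {src: shannon_entropy(list(c.values())) for src, c in per_src.items()}


# Hypothetical flows: host "a" always talks to one peer, host "b" spreads out.
flows = [("a", "x")] * 4 + [("b", "p"), ("b", "q"), ("b", "r"), ("b", "s")]
entropy = destination_entropy_per_host(flows)
```

A host that contacts many distinct destinations equally often gets a high entropy, while a host with a single peer gets zero.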

SLIDE 4

Communication pattern between hosts

  • Can be represented as a graph
  • Communication graphs for anomalous traffic

– Some of them are difficult to detect with conventional methods

  • Conventional methods: monitoring entropies of the number of flows, etc.

[Figure: communication graphs for anomalous traffic. Panels: Worm scan; Botnet; P2P Botnet. Node labels: C&C server, botnet, victims, worm-infected hosts. Annotation: "More difficult to detect".]
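As a rough illustration of representing communication as a graph, the sketch below groups flow records into one directed graph per time window. The record layout `(timestamp, src, dst)`, the 60-second default window, and the name `build_graphs` are assumptions, not details taken from the talk.

```python
from collections import defaultdict


def build_graphs(flows, window_seconds=60):
    """Group (timestamp, src, dst) records into one directed graph per window.

    Each graph is a dict mapping src -> {dst: number_of_flows}.
    """
    graphs = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for timestamp, src, dst in flows:
        window = int(timestamp // window_seconds)
        graphs[window][src][dst] += 1
    # Convert nested defaultdicts to plain dicts, ordered by window index.
    return {
        window: {src: dict(dsts) for src, dsts in graph.items()}
        for window, graph in sorted(graphs.items())
    }


# Hypothetical flow records (timestamps in seconds).
flows = [
    (0, "10.0.0.1", "10.0.0.2"),
    (30, "10.0.0.1", "10.0.0.2"),
    (45, "10.0.0.3", "10.0.0.2"),
    (70, "10.0.0.2", "10.0.0.1"),
]
graphs = build_graphs(flows)
```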

SLIDE 5

Time series of communication graph

SLIDE 6

Challenge

  • How to detect an anomaly (change) in a time series of graphs?
  • Visualization or animation of the communication graph [Yurcik06]

– Especially useful for digging into an anomalous event by hand
– However, a human operator must eyeball the display to detect an anomalous event

  • Automated detection: need to define a similarity between graphs, $S(G_t, G_{t+1})$, where $G_t$ and $G_{t+1}$ are the graphs at times $t$ and $t+1$

– Can judge an anomaly if $S(G_t, G_{t+1})$ suddenly decreases

[Figure: graphs at t=0, t=1, t=2, t=3, with similarities $S(G_0, G_1)$, $S(G_1, G_2)$, $S(G_2, G_3)$ between consecutive graphs]

  • [Yurcik06] William Yurcik, "VisFlowConnect-IP: A Link-Based Visualization of NetFlows for Security Monitoring," 18th Annual FIRST Conference, June 2006.
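The detection loop described on this slide can be sketched independently of the similarity measure. In the example below the similarity is a stand-in (the Jaccard index of the two edge sets), not the measure used in the talk, and the drop threshold of 0.5 is an arbitrary assumption.

```python
def jaccard_edge_similarity(edges_a, edges_b):
    """Stand-in for S(G, G'): Jaccard index of two edge sets."""
    if not edges_a and not edges_b:
        return 1.0
    return len(edges_a & edges_b) / len(edges_a | edges_b)


def similarity_series(graphs, similarity=jaccard_edge_similarity):
    """Return [S(G_0, G_1), S(G_1, G_2), ...] for consecutive graphs."""
    return [similarity(a, b) for a, b in zip(graphs, graphs[1:])]


def sudden_decreases(series, drop=0.5):
    """Indices i where series[i] is lower than series[i - 1] by more than `drop`."""
    return [i for i in range(1, len(series)) if series[i - 1] - series[i] > drop]


# Hypothetical example: a scanner "z" appears in the fourth graph.
normal = {("a", "b"), ("b", "c"), ("c", "a")}
scan = {("z", host) for host in "abcdefgh"}
graphs = [normal, normal, normal, normal | scan, normal | scan]

series = similarity_series(graphs)
alerts = sudden_decreases(series)
```

Here `series[i]` compares graph `i` with graph `i + 1`, so an alert at index 2 points at the change between the third and fourth graphs.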

SLIDE 7

Similarities between graphs

  • Graph kernel

– Define an "inner product"-like function f(·, ·), a.k.a. a kernel, on the space of non-linear spaces [Kashima03]

  • Edit distance

– Number of operations needed to change graph G into G' [Bunke06]
– Operations: add/remove edges/nodes

  • Can be used to detect anomalies in graph time series
  • Difficult to identify the source of an anomaly
  • [Kashima03] H. Kashima et al., "Marginalized kernels between labeled graphs," In Proc. ICML 2003, pp. 321-328.
  • [Bunke06] H. Bunke et al., "Computer Network Monitoring and Abnormal Event Detection Using Graph Matching and Multidimensional Scaling," LNCS Vol. 4065, 2006.
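As a small worked example of the edit-distance idea, the sketch below counts node and edge additions/removals between two graphs. It assumes every node has a unique label (for example an IP address), so matching nodes between G and G' is trivial and the distance reduces to the sizes of two symmetric differences. The function name and the toy graphs are hypothetical.

```python
def edit_distance(nodes_a, edges_a, nodes_b, edges_b):
    """Number of node/edge additions and removals that turn graph A into graph B.

    Assumes uniquely labeled nodes, so no node matching has to be searched for.
    """
    node_ops = len(nodes_a ^ nodes_b)  # nodes present in exactly one graph
    edge_ops = len(edges_a ^ edges_b)  # edges present in exactly one graph
    return node_ops + edge_ops


# Hypothetical graphs: a new host h4 appears and contacts h1 and h2.
nodes_before = {"h1", "h2", "h3"}
edges_before = {("h1", "h2"), ("h2", "h3")}
nodes_after = {"h1", "h2", "h3", "h4"}
edges_after = {("h1", "h2"), ("h2", "h3"), ("h4", "h1"), ("h4", "h2")}

distance = edit_distance(nodes_before, edges_before, nodes_after, edges_after)
```

The result is a single number (one node plus two edges here), which is consistent with the remark above that the source of the change is not directly visible from the distance alone.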

SLIDE 8

Linear feature space projection

  • Linear feature space projection [Ide04]

– Mapping a graph to a vector in a linear space that represents the features of the graph

  • As the feature vector, adopt the principal eigenvector of the adjacency matrix of the graph

– ≈ PageRank vector
– Dimension of the linear space: number of nodes in the graph

[Figure: a communication graph among Host1, Host2, and Host3 is written as an adjacency matrix with rows and columns 1 to 3; its principal eigenvector, with one element per host, is the feature vector]

  • [Ide04] Tsuyoshi Ide and Hisashi Kashima, "Eigenspace-based Anomaly Detection in Computer Systems," In Proc. 10th ACM SIGKDD Conference (KDD2004), Seattle, WA, USA, 2004.
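One common way to obtain a principal eigenvector is power iteration; the sketch below is an illustration under stated assumptions rather than the authors' implementation. It assumes a symmetric, non-negative adjacency matrix and iterates with A + I, which has the same eigenvectors as A but avoids the oscillation plain power iteration shows on bipartite graphs. The 3-host matrix is a made-up example, since the matrix on the slide did not survive extraction.

```python
import math


def principal_eigenvector(adj, iterations=200, tol=1e-12):
    """Power iteration on a symmetric, non-negative adjacency matrix.

    `adj` is a list of lists. Returns a unit-length vector with non-negative
    entries, one element per node.
    """
    n = len(adj)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        # Multiply by (A + I): the extra v[i] term is the identity part.
        w = [v[i] + sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        if max(abs(a - b) for a, b in zip(v, w)) < tol:
            return w
        v = w
    return v


# Hypothetical graph: Host2 communicates with both Host1 and Host3.
adj = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
feature = principal_eigenvector(adj)
```

In this toy graph the element for Host2 comes out largest, reflecting that it is the most connected host.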

SLIDE 9

Anomaly detection using feature vector

  • Periodically generate a communication graph from observed traffic data, and calculate the feature vector of each graph
  • Calculate the similarity (cosine similarity) between the feature vector of the graph and that of the previous one
  • Judge as an anomaly if the similarity suddenly decreases

[Figure: feature vectors at times t, t+1, and t+2, drawn on axes for the vector elements of Host1, Host2, and Host3. High similarity between consecutive vectors is normal; low similarity is detected as an anomaly.]
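The three steps above can be sketched as follows, assuming the feature vectors have already been computed and share the same element order. The fixed threshold of 0.8 is an assumption used only to make "suddenly decreases" concrete; the slides do not give a threshold.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def detect_anomalies(vectors, threshold=0.8):
    """Indices t where the similarity of vectors[t - 1] and vectors[t] is below threshold."""
    return [
        t
        for t in range(1, len(vectors))
        if cosine_similarity(vectors[t - 1], vectors[t]) < threshold
    ]


# Hypothetical feature vectors for Host1..Host3 at three consecutive times.
vectors = [
    [0.50, 0.70, 0.50],
    [0.50, 0.72, 0.48],
    [0.90, 0.10, 0.40],
]
anomalies = detect_anomalies(vectors)
```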

SLIDE 10

Compressing adjacency matrix

  • In a large communication graph, calculating the principal eigenvector of the adjacency matrix may be difficult.
  • Compress the adjacency matrix by combining a hash matrix and a Bloom filter

[Figure: an M×M hash matrix indexed by Hash(SrcIP) and Hash(DstIP). For a flow such as 192.168.0.1 → 10.0.0.1: (1) hash the source and destination addresses to select a cell; (2) hash the source-destination pair, H(192.168.0.1, 10.0.0.1), into a Bloom filter to check whether the pair is new; (3) if it is new, increment the corresponding cell.]

SLIDE 11

Experimental results

  • Observed data: 24 hours of packet capture data on a 1 Gbps link
  • Use packets with ports 135/445 (scans) and 6667 (IRC)

– The current Python implementation cannot handle the whole traffic
– Focus on botnet-related traffic

  • Generate graphs every minute
  • Hash matrix size: 1280×1280

SLIDE 12

Time series of similarities of feature vectors

  • Several sudden decreases in similarity
  • Try to find the source of the anomaly for the first one

[Figure: similarity of the eigenvector feature vectors (eigvec) versus elapsed time over 24 hours, 0:00 to 0:00. Axes: Elapsed time (x), Similarity (y).]

SLIDE 13

Comparison of graphs before/after the anomaly

  • By comparing graphs and/or vectors before/after the anomaly, we can identify the source of the anomaly
  • Comparing vectors is well suited to automated identification
  • In this case: a sudden large virus scan

[Figure: degree vector (degvec-before) and eigenvector (eigvec-before) before the anomaly]
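One simple way to automate the vector comparison is to rank vector elements by how much they changed across the anomaly. This is a sketch of that idea under assumptions: the slides do not state how the comparison was done, and the function name, labels, and values below are hypothetical. With a compressed matrix, each label would be a hash bin rather than a single host.

```python
def top_changed_elements(before, after, labels, k=3):
    """Return the k labels whose vector elements changed most, largest change first."""
    changes = [(abs(a - b), label) for b, a, label in zip(before, after, labels)]
    changes.sort(key=lambda item: item[0], reverse=True)
    return [label for _, label in changes[:k]]


# Hypothetical feature vectors just before and just after an anomaly.
labels = ["Host1", "Host2", "Host3", "Host4"]
before = [0.5, 0.7, 0.5, 0.1]
after = [0.2, 0.3, 0.2, 0.9]

suspects = top_changed_elements(before, after, labels, k=2)
```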

SLIDE 14

Evaluation with synthesized anomaly cluster

  • Which types of anomaly, and how large an anomaly, can be detected by the proposed method?
  • Evaluation using a synthesized anomaly can answer this question
  • First, a mesh cluster of various sizes is inserted into the actual communication graph, and the similarity between the resulting graph and the original graph is calculated
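The synthetic-anomaly step can be sketched as below. Two assumptions are made that the slides do not confirm: a "mesh cluster" is taken to be a fully connected group of nodes, and the cluster nodes are new hosts that do not overlap with the existing graph. Edges are undirected and stored as `frozenset` pairs.

```python
from itertools import combinations


def insert_mesh_cluster(edges, size, prefix="mesh"):
    """Return a copy of an undirected edge set with a full mesh of `size` new nodes added."""
    nodes = [f"{prefix}-{i}" for i in range(size)]
    mesh = {frozenset(pair) for pair in combinations(nodes, 2)}
    return set(edges) | mesh


# Hypothetical original graph with two edges.
original = {frozenset(("a", "b")), frozenset(("b", "c"))}
augmented = insert_mesh_cluster(original, size=5)
```

A full mesh of n nodes adds n(n-1)/2 edges, so the injected structure grows quadratically with the mesh size being swept in the evaluation.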

SLIDE 15

Evaluation with synthesized anomaly cluster

  • With mesh size > 70, the similarity decreases and the anomaly can be found

[Figure: similarity versus number of mesh nodes (20 to 120) for the degree vector (degvec) and the eigenvector (eigvec). Axes: Num of mesh nodes (x), Similarity (y).]

SLIDE 16

Conclusion

  • Summary

– Propose a method to detect anomalies in communication graphs

  • Projection of graphs into linear feature spaces, and comparison of the similarities between feature vectors

– Evaluate using actual traffic data

  • Found a sudden large worm scan
  • Future work

– Apply to other traffic data to find out which types of anomaly the proposed method can detect
– Faster implementation

SLIDE 17

Acknowledgement

  • This study was supported in part by the Ministry of Internal Affairs and Communications of Japan.