Conformal Clustering and its Application to Botnet Traffic



SLIDE 1

Conformal Clustering and its Application to Botnet Traffic

Giovanni Cherubin, Ilia Nouretdinov, Alexander Gammerman, Roberto Jordaney, Zhi Wang, Davide Papini, Lorenzo Cavallaro

SLIDE 2

Netflow, network traces

[Figure: diagram with the labels “Bot”, “Internet” and “netflow”.]

Netflow fields: Date, Duration, IP_src, Port_src, IP_dst, Port_dst, TCP/UDP, Sent Packets, Recv Packets, Sent Bytes, Recv Bytes, Tot Packets, Tot Bytes, Flags, …

SLIDE 3

Netflow, network traces

            Date         Duration   TCP/UDP   Sent Bytes   Port_dst   …
netflow_1   1248089563   2939       TCP       503          445
netflow_2   1248089702   51         TCP       354          139
…
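The two example rows can be held in a small record type. The following is a minimal sketch, assuming whitespace-separated columns in the order shown on the slide; the `Netflow` class and the `parse_netflow` helper are hypothetical names, and reading `Date` as a Unix timestamp is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Netflow:
    """One netflow record, restricted to the columns shown on the slide."""
    date: int        # assumed to be a Unix timestamp in seconds
    duration: int    # unit is not stated on the slide
    protocol: str    # "TCP" or "UDP"
    sent_bytes: int
    port_dst: int


def parse_netflow(line: str) -> Netflow:
    """Parse one row: Date Duration TCP/UDP Sent_Bytes Port_dst."""
    date, duration, protocol, sent_bytes, port_dst = line.split()
    return Netflow(int(date), int(duration), protocol, int(sent_bytes), int(port_dst))


rows = [
    "1248089563 2939 TCP 503 445",
    "1248089702 51 TCP 354 139",
]
flows = [parse_netflow(r) for r in rows]
```

The remaining netflow fields (IP addresses, packet counts, flags) would be added to the record in the same way.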

SLIDE 4

Conformal Predictor

Inputs: a data set D, a new object zn and a non-conformity measure A.
Output: pn, the p-value of zn.
Question answered: does zn conform to D at confidence level 1-ε?
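A conformal p-value is the fraction of objects in D together with zn that are at least as non-conforming as zn. The sketch below illustrates this under stated assumptions: the nearest-neighbour distance stands in for the non-conformity measure A, and the names `nn_distance`, `p_value` and `conforms` are hypothetical.

```python
import math


def nn_distance(others, z):
    """Toy non-conformity measure A: distance from z to its nearest neighbour."""
    return min(math.dist(z, o) for o in others)


def p_value(D, z_new, A):
    """Fraction of objects in D + [z_new] scoring at least as high as z_new."""
    bag = list(D) + [z_new]
    # Score every object against the rest of the bag.
    scores = [A(bag[:i] + bag[i + 1:], z) for i, z in enumerate(bag)]
    alpha_new = scores[-1]
    return sum(1 for a in scores if a >= alpha_new) / len(bag)


def conforms(p, epsilon):
    """z_new conforms to D at confidence 1 - epsilon when p exceeds epsilon."""
    return p > epsilon


D = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
p_typical = p_value(D, (0.5, 0.5), nn_distance)
p_outlier = p_value(D, (10.0, 10.0), nn_distance)
```

With these four training objects, the central point is as typical as any other object, while the distant point is the single most non-conforming object in the bag and therefore receives the smallest possible p-value, 1/5.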

SLIDE 5

CP for anomaly detection

[Laxhammar11, Smith14]

[Figure: plot with axes x1 and x2.]

SLIDE 6

Conformal Clustering

  • Conformal Predictors in an unsupervised setting.
  • Controls the number of objects left outside the clusters (through ε).
  • Regulates the “depth” of clusters.
SLIDE 7

[Figure: training objects plotted on axes x1 and x2.]

SLIDE 8

[Figure: training objects plotted on axes x1 and x2.]

SLIDE 9

[Figure: p-values grid over axes x1 and x2; example cell values 0.1, 0.1, 0.2, 0.1, 0.0, 0.3, …, 0.3.]
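A p-values grid can be built by treating every grid point as a candidate new object and computing its conformal p-value against the training objects. The sketch below is a brute-force illustration, not the authors' implementation; the k-NN score (sum of distances to the k nearest neighbours) and the names `knn_score`, `p_value` and `p_value_grid` are assumptions.

```python
import math


def knn_score(others, z, k=1):
    """Sum of distances from z to its k nearest neighbours in others."""
    return sum(sorted(math.dist(z, o) for o in others)[:k])


def p_value(train, z, k=1):
    """Conformal p-value of z with respect to the training objects."""
    bag = list(train) + [z]
    scores = [knn_score(bag[:i] + bag[i + 1:], b, k) for i, b in enumerate(bag)]
    return sum(s >= scores[-1] for s in scores) / len(bag)


def p_value_grid(train, xs, ys, k=1):
    """Return grid[r][c], the p-value of the grid point (xs[c], ys[r])."""
    return [[p_value(train, (x, y), k) for x in xs] for y in ys]


train = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
xs = ys = [0.5, 5.0]
grid = p_value_grid(train, xs, ys)
```

In this toy example only the grid point at (0.5, 0.5), which lies among the training objects, receives a high p-value. The cost grows with the number of grid points times the square of the number of training objects, so a real grid would need a cheaper update scheme.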

SLIDE 10

[Figure: axes x1 and x2; the p-values grid thresholded with respect to ε = 0.1.]

SLIDE 11

[Figure: axes x1 and x2; neighbouring rule applied to the grid.]
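Thresholding the grid at ε and then joining neighbouring cells amounts to labelling connected components. The sketch below assumes that two cells are neighbours when they are adjacent horizontally or vertically, which the slide does not state; `conformal_clusters` is a hypothetical name.

```python
from collections import deque


def conformal_clusters(grid, epsilon):
    """Label 4-connected groups of cells whose p-value exceeds epsilon.

    Returns a grid of the same shape: 0 marks cells outside every cluster,
    positive integers are cluster labels.
    """
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] <= epsilon or labels[r][c]:
                continue
            # Start a new cluster and flood-fill it breadth first.
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if inside and grid[nr][nc] > epsilon and not labels[nr][nc]:
                        labels[nr][nc] = next_label
                        queue.append((nr, nc))
    return labels


grid = [
    [0.3, 0.2, 0.0, 0.0],
    [0.2, 0.1, 0.0, 0.4],
    [0.0, 0.0, 0.0, 0.3],
]
labels = conformal_clusters(grid, epsilon=0.1)
```

Here the cell with p-value exactly 0.1 is left outside, and the two remaining groups of cells form two separate clusters. Raising ε removes more cells and can split a cluster into several.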

SLIDE 12

[Figure: test set plotted on axes x1 and x2.]

SLIDE 13

[Figure: resulting clusters on axes x1 and x2.]

SLIDE 14

Our Approach

  • Each network trace produces a feature vector.
  • Normalisation.
  • Dimensionality reduction (t-SNE).
  • Non-conformity measures: k-NN, KDE.
  • Performance measures: Purity, Average P-Value.
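The normalisation step and the two non-conformity measures listed above can be sketched as follows. This is an illustration under assumptions: min-max scaling is one possible normalisation (the slide does not say which is used), the k-NN score is taken as the sum of distances to the k nearest neighbours, and the KDE score as the negated Gaussian kernel density estimate with bandwidth h. All function names are hypothetical. The t-SNE step is left out; an implementation is available in scikit-learn as `sklearn.manifold.TSNE`.

```python
import math


def min_max_normalise(vectors):
    """Scale each feature to [0, 1]; constant features map to 0.0."""
    lo = [min(col) for col in zip(*vectors)]
    hi = [max(col) for col in zip(*vectors)]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(vec, lo, hi)]
        for vec in vectors
    ]


def knn_nonconformity(others, z, k):
    """Sum of distances from z to its k nearest neighbours in others."""
    return sum(sorted(math.dist(z, o) for o in others)[:k])


def kde_nonconformity(others, z, h):
    """Negated Gaussian kernel density estimate at z with bandwidth h."""
    d = len(z)
    norm = len(others) * (h ** d) * (2 * math.pi) ** (d / 2)
    density = sum(math.exp(-math.dist(z, o) ** 2 / (2 * h * h)) for o in others) / norm
    return -density


normalised = min_max_normalise([[2939, 503], [51, 354]])
knn = knn_nonconformity([(3.0, 4.0), (6.0, 8.0)], (0.0, 0.0), k=2)
near = kde_nonconformity([(0.0, 0.0), (1.0, 0.0)], (0.5, 0.0), h=1.0)
far = kde_nonconformity([(0.0, 0.0), (1.0, 0.0)], (5.0, 0.0), h=1.0)
```

Both measures grow as an object moves away from the rest of the data, so `far` is larger than `near`. The parameters k and h play the roles varied in the results slide.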
SLIDE 15

Performance Measures

Purity

  • How “pure” are the clusters.
  • For the same ε the number of clusters is not influenced.

Average P-Value

  • Efficiency criterion.
  • Size of the prediction set.
  • The smaller the prediction set, the better.

[Figure: p-values grid; example cell values 0.1, 0.1, 0.2, 0.1, 0.0, 0.3, …, 0.3.]
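Both measures are simple to compute once clusters and a p-values grid are available. The sketch below assumes the usual definition of purity (the share of clustered objects carrying the majority label of their cluster) and takes the Average P-Value to be the mean over all grid cells; how objects left outside every cluster are counted is not stated on the slide, so they are simply not passed in here. The function names are hypothetical.

```python
from collections import Counter


def purity(cluster_ids, true_labels):
    """Share of objects that carry the majority label of their cluster."""
    by_cluster = {}
    for cid, label in zip(cluster_ids, true_labels):
        by_cluster.setdefault(cid, Counter())[label] += 1
    total = sum(sum(counts.values()) for counts in by_cluster.values())
    return sum(max(counts.values()) for counts in by_cluster.values()) / total


def average_p_value(grid):
    """Mean p-value over all grid cells; smaller means tighter clusters."""
    cells = [p for row in grid for p in row]
    return sum(cells) / len(cells)


example_purity = purity([1, 1, 1, 2, 2], ["a", "a", "b", "b", "b"])
example_apv = average_p_value([[0.0, 0.5], [0.25, 0.25]])
```

In the example, cluster 1 holds two "a" objects and one "b", cluster 2 holds two "b" objects, so four of five objects match their cluster's majority label.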

SLIDE 16

Results (ε=0.2)

k-NN non-conformity measure

k        1       2       3       4       5       …   10
APV      0.129   0.139   0.141   0.147   0.160   …   0.193
Purity   0.99    0.97    0.97    0.96    0.96    …   0.92

KDE (Gaussian kernel) non-conformity measure

h        0.001   0.005   0.01    0.05    0.1     …   1.0
APV      0.404   0.332   0.299   0.165   0.130   …   0.221
Purity   1.00    0.98    1.00    0.99    0.99    …   0.92

SLIDE 17

Future work

  • Avoid dimensionality reduction, reduce complexity.
  • New criteria of accuracy.
  • New non-conformity measures based on previous work in botnet detection (e.g. BotFinder).

  • Detection: “malicious” and “benign” data.
SLIDE 18

Bibliography

  • [Vovk05] V. Vovk et al., Algorithmic learning in a random world. Springer, 2005.
  • [Maaten08] L. van der Maaten et al., Visualizing data using t-SNE. Journal of Machine Learning Research, 2008.
  • [Laxhammar11] R. Laxhammar et al., Sequential conformal anomaly detection in trajectories based on Hausdorff distance, 2011.
  • [Lei13] J. Lei et al., A conformal prediction approach to explore functional data, 2013.
  • [Smith14] J. Smith et al., Anomaly Detection of Trajectories with Kernel Density Estimation by Conformal Prediction. Artificial Intelligence Applications and Innovations, Springer, 2014.

SLIDE 19

Thanks

SLIDE 20

Conformal Clustering and its Application to Botnet Traffic

Giovanni Cherubin, Ilia Nouretdinov, Alexander Gammerman, Roberto Jordaney, Zhi Wang, Davide Papini, Lorenzo Cavallaro