
A Labeled Data Set For Flow-based Intrusion Detection



  1. A Labeled Data Set For Flow-based Intrusion Detection • Anna Sperotto, Ramin Sadre, Frank van Vliet, Aiko Pras • Design and Analysis of Communication Systems, University of Twente, The Netherlands • NMRG Workshop on NetFlow/IPFIX Usage in Network Management, Maastricht, July 30, 2010

  2. Contents • Operational experience in trace collection • Experimental setup • Data processing and labeling • The labeled data set

  3. Introduction [Figure: system 1 evaluated on trace 1: 30% attacks; system 2 evaluated on trace 2: 85% attacks] • Systems are evaluated on proprietary traces • There is no shared ground truth • Results cannot be directly compared

  4. Data set requirements We want the data set to be: • realistic • completely and correctly labeled • labeled within an acceptable time • of sufficient size. These requirements determine the collection setup.

  5-7. Measurement scale • NETWORK: realistic, not complete, it does not scale • SUBNETWORK: realistic, not complete • SINGLE HOST (honeypot): realistic, enhanced logging

  8. Setup [Figure: HONEYPOT offering ssh, http and ftp, with ssh session transcripts, on a XEN SERVER; traffic captured with tcpdump] • Commonly used services with enhanced logging • Direct connection to the Internet • Exposure to attacks • Complete tcpdump of the traffic (flows created offline)

  9. Data set creation [Figure: processing pipeline from TRAFFIC DUMP and LOGS (plus TYPESCRIPTS) through FLOWS and EVENTS, ALERT GENERATION/CORRELATION, and CLUSTERING & CAUSALITY to the LABELLED DATASET] Preprocessing • packets → flows: $F = (I_{src}, I_{dst}, P_{src}, P_{dst}, \mathit{Pckts}, \mathit{Octs}, T_{start}, T_{end}, \mathit{Flags}, \mathit{Prot})$ • logs → log events: $L = (T, I_{src}, P_{src}, I_{dst}, P_{dst}, \mathit{Descr}, \mathit{Auto}, \mathit{Succ}, \mathit{Corr})$
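As a rough illustration of the packets → flows preprocessing step, the sketch below aggregates packets into unidirectional flows carrying the fields of $F$. It is not softflowd, which the authors used: it assumes packets are already decoded into hypothetical `Packet` records, keys each flow on (source IP, destination IP, source port, destination port, protocol), and closes a flow after an assumed 60 s idle timeout. Real flow exporters also apply active timeouts and TCP FIN/RST handling.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Packet:
    # Hypothetical decoded packet record (not a real library type).
    ts: float          # capture time in seconds
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: int         # IP protocol number, e.g. 6 = TCP
    length: int        # bytes on the wire
    tcp_flags: int = 0


@dataclass
class Flow:
    # Mirrors F = (I_src, I_dst, P_src, P_dst, Pckts, Octs, T_start, T_end, Flags, Prot)
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    packets: int
    octets: int
    t_start: float
    t_end: float
    flags: int
    proto: int


def packets_to_flows(packets: Iterable[Packet], idle_timeout: float = 60.0) -> list[Flow]:
    """Aggregate packets into unidirectional 5-tuple flows (assumed idle-timeout expiry)."""
    active: dict[tuple, Flow] = {}
    finished: list[Flow] = []
    for p in sorted(packets, key=lambda p: p.ts):
        key = (p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.proto)
        flow = active.get(key)
        if flow is not None and p.ts - flow.t_end > idle_timeout:
            # Same 5-tuple but idle for too long: close the old flow, start a new one.
            finished.append(active.pop(key))
            flow = None
        if flow is None:
            active[key] = Flow(p.src_ip, p.dst_ip, p.src_port, p.dst_port,
                               1, p.length, p.ts, p.ts, p.tcp_flags, p.proto)
        else:
            flow.packets += 1
            flow.octets += p.length
            flow.t_end = p.ts
            flow.flags |= p.tcp_flags  # cumulative OR of all TCP flags seen
    finished.extend(active.values())
    return sorted(finished, key=lambda f: f.t_start)
```

A reply from the server produces its own flow with source and destination swapped; pairing the two directions is left to the correlation procedure.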

  10. Data set creation • The correlation process results in alerts $A = (T, \mathit{Descr}, \mathit{Auto}, \mathit{Succ}, \mathit{Serv}, \mathit{Type})$

  11. Correlation procedure [Figure: the flows F1 and F2 of the honeypot (HP) and its LOGS are correlated into an ALERT A, producing the labeled pairs (F1, A) and (F2, A)]

  12. Cluster and causality [Figure: flows are linked to alerts; alerts are grouped into a cluster alert by the cluster relation and linked to each other by the causality relation] • Hierarchical view of the alerts, which enriches the data set with extra information on the traffic • Simple alerts are grouped into cluster alerts • High-level view of malicious activities
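A minimal sketch of this hierarchical view, assuming alerts are nodes and both the cluster relation and the causality relation are stored as (parent, child) pairs, as the ALERTS_CLUSTERING and ALERTS_CAUSALITY tables on the implementation slides suggest. The class and method names are hypothetical. The slides state that clustering and causality were established manually from the typescripts, so the code only stores and queries the relations; it does not infer them.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    id: int
    description: str
    flow_ids: list[int] = field(default_factory=list)  # flows correlated to this alert


@dataclass
class AlertHierarchy:
    alerts: dict[int, Alert] = field(default_factory=dict)
    clustering: list[tuple[int, int]] = field(default_factory=list)  # (cluster alert, member)
    causality: list[tuple[int, int]] = field(default_factory=list)   # (cause, consequence)

    def add(self, alert: Alert) -> None:
        self.alerts[alert.id] = alert

    def cluster(self, parent_id: int, child_ids: list[int]) -> None:
        self.clustering.extend((parent_id, c) for c in child_ids)

    def link_cause(self, cause_id: int, effect_id: int) -> None:
        self.causality.append((cause_id, effect_id))

    def flows_under(self, alert_id: int) -> set[int]:
        """Flows of an alert plus, recursively, those of its cluster members (assumes no cycles)."""
        flows = set(self.alerts[alert_id].flow_ids)
        for parent, child in self.clustering:
            if parent == alert_id:
                flows |= self.flows_under(child)
        return flows

    def consequences(self, alert_id: int) -> list[int]:
        """Alerts directly caused by the given alert."""
        return [effect for cause, effect in self.causality if cause == alert_id]
```

For example, a hypothetical cluster alert "ssh brute-force attack" could group several failed and one accepted login alert, and the accepted login could be linked by causality to later IRC traffic; `flows_under` then returns every flow belonging to the attack.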

  13. Implementation • Packets to flows: softflowd (automatic) • Logs to log events: shell scripts (semi-automatic); discriminating between manual and automated attacks (manual) • Alert correlation: correlation procedure, extensible to other attacks (semi-automatic) • Cluster and causality: analysis of typescripts (manual)
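The logs → log events step was done with shell scripts. The Python sketch below shows the same idea for SSH, under stated assumptions: log lines are OpenSSH syslog messages of the form "Accepted/Failed password for USER from IP port N", timestamps are UTC and lack a year (supplied by the caller), and the honeypot address and SSH port are known. `LogEvent` and `parse_sshd_line` are hypothetical names. The `auto` field is left as `None` because, per the slide, discriminating manual from automated attacks was done by hand.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Assumed OpenSSH authentication message inside a syslog line.
SSHD_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ sshd\[\d+\]: "
    r"(?P<result>Accepted|Failed) (?P<method>\S+) for (?:invalid user )?(?P<user>\S+) "
    r"from (?P<ip>\S+) port (?P<port>\d+)"
)


@dataclass
class LogEvent:
    # Mirrors L = (T, I_src, P_src, I_dst, P_dst, Descr, Auto, Succ, Corr)
    t: float
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    descr: str
    auto: Optional[bool]   # manual vs automated: decided by hand, unknown at parse time
    succ: bool
    corr: bool = False     # becomes True once the event is correlated to a flow


def parse_sshd_line(line: str, year: int, honeypot_ip: str,
                    ssh_port: int = 22) -> Optional[LogEvent]:
    """Turn one sshd syslog line into a LogEvent, or None if it is not an auth message."""
    m = SSHD_RE.match(line)
    if m is None:
        return None
    when = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
    return LogEvent(
        t=when.replace(tzinfo=timezone.utc).timestamp(),
        src_ip=m["ip"], src_port=int(m["port"]),
        dst_ip=honeypot_ip, dst_port=ssh_port,
        descr=f"ssh {m['result'].lower()} {m['method']} for {m['user']}",
        auto=None,
        succ=m["result"] == "Accepted",
    )
```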

  14. The Dataset • Dump file: 24 GB • Flows: 14M • Alerts: 7.6M • Flow breakdown by protocol: TCP 14,151,511; ICMP 18,038; UDP 583 • Flow breakdown by service: SSH 13,942,629; AUTH/IDENT 191,339; OTHERS 18,970; HTTP 9,798; IRC 7,383; FTP 13 [Figure: bar charts of the number of flows, log scale]
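The two flow breakdowns on this slide can be cross-checked: the per-protocol and per-service counts should each add up to the same total, close to the 14M flows reported. A short Python check, where the dictionaries simply restate the slide's numbers:

```python
# Flow counts restated from the slide.
by_protocol = {"TCP": 14_151_511, "UDP": 583, "ICMP": 18_038}
by_service = {"SSH": 13_942_629, "AUTH/IDENT": 191_339, "OTHERS": 18_970,
              "HTTP": 9_798, "IRC": 7_383, "FTP": 13}

total = sum(by_protocol.values())
# Both breakdowns must describe the same set of flows.
assert total == sum(by_service.values()) == 14_170_132

# Share of each service in the data set, largest first.
for name, n in sorted(by_service.items(), key=lambda kv: -kv[1]):
    print(f"{name:<11}{n:>12,}{100 * n / total:>8.2f}%")
```

Both sums come to 14,170,132 flows, so SSH alone accounts for roughly 98% of the data set.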

  15. The Dataset • Dump file: 24 GB • Flows: 14M • Alerts: 7.6M • Alert breakdown: SSH OUT 7,591,869; AUTH/IDENT IN 95,664; ICMP IN 16,382; SSH IN 8,756; HTTP 5,317; IRC OUT 3,692; SSH OUT 35; SSH IN 10; FTP 6; AUTH/IDENT OUT 6; HTTP 4 [Figure: two bar charts of the number of alerts, one on a log scale and one on a linear scale from 0 to 40]

  16. The Dataset • We labeled 98.5% of the flows and 99.99% of the alerts • Mainly malicious traffic: • SSH brute-force attacks • automated HTTP connections • A small percentage of side-effect traffic: • auth/ident on port 113 • IRC traffic

  17. Conclusions • We presented the first labeled data set for flow-based intrusion detection • http://traces.simpleweb.org/ • Semi-automated correlation process • manual intervention is still needed • The data set consists mainly of malicious traffic • it needs to be extended with benign traffic

  18. Conclusions • Reactions: • Since publication (October 2009), about 7 requests • We do not monitor downloads from the web page • In contact with Philipp Winter (Hagenberg University, Austria): MSc project “Inductive Intrusion Detection in Flow-Based Network Data using One-Class Support Vector Machines”

  19. Implementation (database schema) • NETFLOWS (id, src_ip, dst_ip, packets, octets, start_time, start_msec, end_time, end_msec, src_port, dst_port, tcp_flag, prot) • NETFLOW_ALERTS (flowid, alertid) • ALERTS (id, timestamp, description, automated, succeeded, service, type) • ALERT_TYPES (id, description) • SERVICES (id, description) • ALERTS_CLUSTERING (parent, child) • ALERTS_CAUSALITY (parent, child)
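The schema on this slide can be sketched as SQLite DDL through Python's standard `sqlite3` module. Column types, the REFERENCES clauses, and the sample rows are assumptions, since the slide lists only table and column names. The query shows how flow labels could be read back by joining NETFLOWS to ALERTS through NETFLOW_ALERTS.

```python
import sqlite3

# Table and column names from the slide; types and references are assumed.
SCHEMA = """
CREATE TABLE SERVICES    (id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE ALERT_TYPES (id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE ALERTS (
    id INTEGER PRIMARY KEY, timestamp INTEGER, description TEXT,
    automated INTEGER, succeeded INTEGER,
    service INTEGER REFERENCES SERVICES(id),
    type INTEGER REFERENCES ALERT_TYPES(id)
);
CREATE TABLE NETFLOWS (
    id INTEGER PRIMARY KEY, src_ip TEXT, dst_ip TEXT,
    packets INTEGER, octets INTEGER,
    start_time INTEGER, start_msec INTEGER, end_time INTEGER, end_msec INTEGER,
    src_port INTEGER, dst_port INTEGER, tcp_flag INTEGER, prot INTEGER
);
CREATE TABLE NETFLOW_ALERTS (
    flowid INTEGER REFERENCES NETFLOWS(id), alertid INTEGER REFERENCES ALERTS(id)
);
CREATE TABLE ALERTS_CLUSTERING (
    parent INTEGER REFERENCES ALERTS(id), child INTEGER REFERENCES ALERTS(id)
);
CREATE TABLE ALERTS_CAUSALITY (
    parent INTEGER REFERENCES ALERTS(id), child INTEGER REFERENCES ALERTS(id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Hypothetical sample rows: one SSH flow labeled with one alert.
conn.execute("INSERT INTO SERVICES VALUES (1, 'ssh')")
conn.execute("INSERT INTO ALERT_TYPES VALUES (1, 'CONN')")
conn.execute("INSERT INTO ALERTS VALUES (1, 1222164001, 'ssh failed password', 1, 0, 1, 1)")
conn.execute("INSERT INTO NETFLOWS VALUES (1, '192.0.2.7', '198.51.100.1', 12, 1480, "
             "1222164000, 0, 1222164003, 250, 51234, 22, 27, 6)")
conn.execute("INSERT INTO NETFLOW_ALERTS VALUES (1, 1)")

# Read back each labeled flow with its alert and service description.
rows = conn.execute("""
    SELECT f.src_ip, f.dst_port, a.description, s.description
    FROM NETFLOWS f
    JOIN NETFLOW_ALERTS na ON na.flowid = f.id
    JOIN ALERTS a ON a.id = na.alertid
    JOIN SERVICES s ON s.id = a.service
""").fetchall()
print(rows)
```

Keeping the flow-to-alert link in a separate table allows one alert to label several flows, such as the request and response flows F1 and F2.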

  20. Correlation procedure (Algorithm 1)
    procedure ProcessFlowsForService(s: service)
      for all incoming flows $F_1$ for the service s do
        retrieve the matching response flow $F_2$ such that
          $F_2.I_{src} = F_1.I_{dst} \land F_2.I_{dst} = F_1.I_{src} \land F_2.P_{src} = F_1.P_{dst} \land F_2.P_{dst} = F_1.P_{src}$
          $\land\; F_1.T_{start} \le F_2.T_{start} \le F_1.T_{start} + \delta$
          with the smallest $F_2.T_{start} - F_1.T_{start}$;
        retrieve a matching log event $L$ such that
          $L.I_{src} = F_1.I_{src} \land L.I_{dst} = F_1.I_{dst} \land L.P_{src} = F_1.P_{dst} \land L.P_{dst} = F_1.P_{src}$
          $\land\; F_1.T_{start} \le L.T \le F_1.T_{end} \land \lnot L.\mathit{Corr}$
          with the smallest $L.T - F_1.T_{start}$;
        if $L$ exists then
          create alert $A = (L.T, L.\mathit{Descr}, L.\mathit{Auto}, L.\mathit{Succ}, s, \mathrm{CONN})$;
          correlate $F_1$ to $A$;
          if $F_2$ exists then
            correlate $F_2$ to $A$;
          end if
          $L.\mathit{Corr} \leftarrow \mathrm{true}$;
        end if
      end for
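A direct Python transcription of Algorithm 1, offered as a sketch under assumptions. Flows and log events are held in memory as hypothetical dataclasses, candidates are found by linear scans rather than database queries, and the log-event port condition is applied exactly as written ($L.P_{src} = F_1.P_{dst}$, $L.P_{dst} = F_1.P_{src}$). Each label is returned as a (flow id, alert) pair, mirroring the (F1, A) and (F2, A) pairs of the correlation diagram.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Flow:
    id: int
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    t_start: float
    t_end: float


@dataclass
class LogEvent:
    t: float
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    descr: str
    auto: bool
    succ: bool
    corr: bool = False


@dataclass
class Alert:
    # A = (T, Descr, Auto, Succ, Serv, Type)
    t: float
    descr: str
    auto: bool
    succ: bool
    serv: str
    type: str


def process_flows_for_service(s: str, incoming: list[Flow], all_flows: list[Flow],
                              events: list[LogEvent], delta: float) -> list[tuple[int, Alert]]:
    """Correlate the incoming flows of service s with log events; return (flow id, alert) labels."""
    labels: list[tuple[int, Alert]] = []
    for f1 in incoming:
        # Response flow: reversed endpoints, starting no later than delta after F1.
        responses = [f2 for f2 in all_flows
                     if f2.src_ip == f1.dst_ip and f2.dst_ip == f1.src_ip
                     and f2.src_port == f1.dst_port and f2.dst_port == f1.src_port
                     and f1.t_start <= f2.t_start <= f1.t_start + delta]
        f2: Optional[Flow] = min(responses, key=lambda f: f.t_start - f1.t_start, default=None)
        # Unused log event within F1's lifetime (port condition as written in Algorithm 1).
        candidates = [e for e in events
                      if e.src_ip == f1.src_ip and e.dst_ip == f1.dst_ip
                      and e.src_port == f1.dst_port and e.dst_port == f1.src_port
                      and f1.t_start <= e.t <= f1.t_end and not e.corr]
        ev = min(candidates, key=lambda e: e.t - f1.t_start, default=None)
        if ev is not None:
            alert = Alert(ev.t, ev.descr, ev.auto, ev.succ, s, "CONN")
            labels.append((f1.id, alert))
            if f2 is not None:
                labels.append((f2.id, alert))
            ev.corr = True  # a log event is used for at most one alert
    return labels
```

The linear scans make this quadratic in the number of flows; at the scale of this data set (about 14M flows), an index on the endpoint fields, for example in the SQL tables of the implementation slide, would be needed.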
