
Metadata format for benchmarking anomaly detection algorithms

Youki Kadobayashi (NICT / NAIST)
youki-k <at> is.naist.jp

10th CAIDA-WIDE workshop / 1st CAIDA-WIDE-CASFI workshop, August 2008

Anomaly detection algorithms: The problem

  • We are still in the dark ages
  • Incompatible datasets
  • Incomparable results
  • No technical means of accurately communicating the results of anomaly detection, even when we share a common dataset
  • No way to benchmark the algorithms' performance

Metadata format for anomaly detection algorithms

  • A separate file for each algorithm
  • XML-based
  • Layout: header, {record1, record2, …}
  • Envelope information: relies on DatCat tools
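
A minimal Python sketch of reading one algorithm's label file, assuming a hypothetical layout in which a root element holds one header element followed by record elements. The element names, the encoding of record fields as XML attributes, and the "*" wildcard are illustrative assumptions, not part of the proposal; envelope metadata is left to DatCat as stated above.

```python
import xml.etree.ElementTree as ET

# Hypothetical example file; tag and attribute names are assumptions.
EXAMPLE = """<anomaly_labels>
  <header>
    <algorithm_name>example-detector</algorithm_name>
    <datcat_dataset_name>example-trace</datcat_dataset_name>
  </header>
  <record src="192.0.2.1" dst="*" start_time="1217548800"
          end_time="1217548860" anomaly_type="portscan" anomaly_value="0.9"/>
  <record src="*" dst="198.51.100.7" start_time="1217549000"
          end_time="1217549100" anomaly_type="ddos" anomaly_value="0.7"/>
</anomaly_labels>"""


def load_labels(xml_text):
    """Return (header dict, list of record dicts) for one algorithm's file."""
    root = ET.fromstring(xml_text)
    header_elem = root.find("header")
    if header_elem is None:
        raise ValueError("label file has no <header> element")
    header = {child.tag: (child.text or "").strip() for child in header_elem}
    records = []
    for rec in root.findall("record"):
        fields = dict(rec.attrib)
        # Convert numeric fields; src/dst stay strings so "*" survives.
        fields["start_time"] = int(fields["start_time"])
        fields["end_time"] = int(fields["end_time"])
        fields["anomaly_value"] = float(fields["anomaly_value"])
        records.append(fields)
    return header, records
```

Because each algorithm writes its own file, comparing two detectors on the same dataset would then amount to loading two such files side by side.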

Header

  • Algorithm name
  • Algorithm version
  • Algorithm URL
  • Parameters given to the algorithm
  • Date of analysis
  • Analyst name
  • Analyst organization
  • Target dataset
  • DatCat dataset name
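
The header fields listed above might be serialized as follows. This is a sketch under stated assumptions: one XML child element per field, one hypothetical param element per algorithm parameter, and an ISO 8601 string for the date. The Header class and all tag names are illustrative, not taken from the proposal.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Header:
    """Header fields from the slide; XML tag names are assumptions."""
    algorithm_name: str
    algorithm_version: str
    algorithm_url: str
    parameters: dict = field(default_factory=dict)
    analysis_date: str = ""        # assumed ISO 8601, e.g. "2008-08-01"
    analyst_name: str = ""
    analyst_organization: str = ""
    target_dataset: str = ""
    datcat_dataset_name: str = ""

    def to_element(self):
        """Build a <header> element with one child per text field."""
        elem = ET.Element("header")
        for name in ("algorithm_name", "algorithm_version", "algorithm_url",
                     "analysis_date", "analyst_name", "analyst_organization",
                     "target_dataset", "datcat_dataset_name"):
            ET.SubElement(elem, name).text = getattr(self, name)
        params = ET.SubElement(elem, "parameters")
        for key, value in sorted(self.parameters.items()):
            # Each parameter becomes <param name="..." value="..."/>.
            ET.SubElement(params, "param", name=key, value=str(value))
        return elem
```

ET.tostring(header.to_element(), encoding="unicode") then gives the header text. Recording the parameters alongside the algorithm version is what would let a later reader rerun or compare an analysis under identical settings.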

Record

  • Each record consists of: src, dst, start_time, end_time, anomaly_type, anomaly_value
  • A file may contain an arbitrary number of records
  • Either src or dst can be a wildcard
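
A record with wildcard support might look like the following minimal sketch. It assumes "*" as the wildcard marker, IPv4 addresses, UNIX-second timestamps with an inclusive end_time, and reads "either src or dst" as "at most one of them"; all of these are assumptions, and Record.matches is a hypothetical helper for deciding whether a packet falls under a label.

```python
import ipaddress
from dataclasses import dataclass

WILDCARD = "*"  # assumed wildcard marker; the slides do not fix a syntax


@dataclass(frozen=True)
class Record:
    src: str             # IPv4 address or WILDCARD
    dst: str             # IPv4 address or WILDCARD
    start_time: int      # UNIX seconds
    end_time: int        # UNIX seconds, inclusive
    anomaly_type: str
    anomaly_value: float

    def __post_init__(self):
        if self.src == WILDCARD and self.dst == WILDCARD:
            # Assumption: at most one of src/dst may be a wildcard.
            raise ValueError("src and dst cannot both be wildcards")
        for addr in (self.src, self.dst):
            if addr != WILDCARD:
                ipaddress.IPv4Address(addr)  # raises ValueError if malformed
        if self.end_time < self.start_time:
            raise ValueError("end_time precedes start_time")

    def matches(self, src, dst, timestamp):
        """True if a packet (src, dst, timestamp) falls under this record."""
        return ((self.src == WILDCARD or self.src == src)
                and (self.dst == WILDCARD or self.dst == dst)
                and self.start_time <= timestamp <= self.end_time)
```

A wildcard on one side lets a single record describe, for example, one host scanning many destinations.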

API

  • label_data(int handle, in_addr_t src, in_addr_t dst, time_t start, time_t end, string anomaly_type, float anomaly_value)
  • label_data_ex(int handle, in_addr_t[] src, in_addr_t[] dst, time_t start, time_t end, string anomaly_type, float anomaly_value)

Slicing

  • Slice anomalous segments out of pcap data
  • Select segments by anomaly_type and anomaly_value
  • Cut the pcap data according to each selected record's start_time and end_time
  • Useful for generating synthetic datasets
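
A minimal sketch of the slicing step, assuming the classic libpcap file format (a 24-byte global header, then a 16-byte header per packet) with microsecond timestamps; pcapng and nanosecond captures are not handled. Records are selected by anomaly_type and a minimum anomaly_value, and a packet is kept when its timestamp falls inside a selected record's [start_time, end_time]. Filtering on src/dst is omitted for brevity, and windows_for and slice_pcap are hypothetical names.

```python
import struct

GLOBAL_HEADER_LEN = 24  # classic libpcap global header
RECORD_HEADER_LEN = 16  # ts_sec, ts_usec, incl_len, orig_len


def _byte_order(magic):
    """Map the 4 magic bytes to a struct byte-order prefix."""
    if magic == b"\xd4\xc3\xb2\xa1":
        return "<"  # little-endian, microsecond timestamps
    if magic == b"\xa1\xb2\xc3\xd4":
        return ">"  # big-endian, microsecond timestamps
    raise ValueError("only classic microsecond pcap files are supported")


def windows_for(records, anomaly_type, min_value):
    """Time windows of records with a matching type and a high enough value."""
    return [(r["start_time"], r["end_time"]) for r in records
            if r["anomaly_type"] == anomaly_type
            and r["anomaly_value"] >= min_value]


def slice_pcap(data, windows):
    """Return pcap bytes holding only packets inside any [start, end] window."""
    order = _byte_order(data[:4])
    out = bytearray(data[:GLOBAL_HEADER_LEN])  # keep the original global header
    offset = GLOBAL_HEADER_LEN
    while offset + RECORD_HEADER_LEN <= len(data):
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack_from(
            order + "IIII", data, offset)
        end = offset + RECORD_HEADER_LEN + incl_len
        if end > len(data):
            break  # truncated final packet
        ts = ts_sec + ts_usec / 1_000_000
        if any(start <= ts <= stop for start, stop in windows):
            out += data[offset:end]
        offset = end
    return bytes(out)
```

Because the original global header is copied unchanged, the output is itself a valid capture of the same link type.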

Merging

  • Insert pcap slice B into pcap slice A
  • At a particular time offset
  • Useful for benchmarking anomaly detection algorithms with synthetic datasets
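
The merge can be sketched as a timestamp shift followed by an ordered merge. This assumes both slices have already been decoded into time-sorted lists of (timestamp, frame) pairs, and that the offset is measured from the first packet of slice A; the slides do not define the reference point, and merge_slices is a hypothetical name.

```python
import heapq


def merge_slices(base, inserted, offset):
    """Insert slice B (`inserted`) into slice A (`base`) so that B's first
    packet lands `offset` seconds after A's first packet.

    Both inputs are lists of (timestamp, frame) sorted by timestamp; the
    result is a single time-ordered list.
    """
    if not inserted:
        return list(base)
    anchor = base[0][0] if base else 0.0
    shift = anchor + offset - inserted[0][0]
    shifted = [(ts + shift, frame) for ts, frame in inserted]
    # heapq.merge keeps the output sorted; the key avoids comparing frames.
    return list(heapq.merge(base, shifted, key=lambda pkt: pkt[0]))
```

Since the shift is known, the inserted anomaly's time window in the merged trace is known exactly, which is presumably what makes such synthetic traces usable as labeled ground truth.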

Comparison

  • Visualize the detected anomalies along a timeline
  • Compute coverage and support, and generate an HTML table
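
The slides do not define coverage and support, so the following sketch uses hypothetical definitions. Two records overlap when their time windows intersect and their src/dst agree, with "*" matching anything. The coverage of algorithm A by algorithm B is the fraction of A's records that overlap some record of B, and the support of a record is the number of algorithms that reported an overlapping record. The timeline visualization is not shown.

```python
import html


def overlaps(a, b):
    """Assumed rule: time windows intersect and addresses agree ("*" = any)."""
    def same(x, y):
        return x == "*" or y == "*" or x == y
    return (a["start_time"] <= b["end_time"] and b["start_time"] <= a["end_time"]
            and same(a["src"], b["src"]) and same(a["dst"], b["dst"]))


def coverage(mine, others):
    """Hypothetical metric: fraction of `mine` overlapping any record in `others`."""
    if not mine:
        return 0.0
    hits = sum(1 for r in mine if any(overlaps(r, o) for o in others))
    return hits / len(mine)


def support(record, results):
    """Hypothetical metric: number of algorithms reporting an overlapping record."""
    return sum(1 for recs in results.values()
               if any(overlaps(record, o) for o in recs))


def coverage_table(results):
    """HTML table: row A, column B, cell = coverage of A's records by B."""
    names = sorted(results)
    header = "".join(f"<th>{html.escape(n)}</th>" for n in names)
    rows = ["<table>", f"<tr><th></th>{header}</tr>"]
    for a in names:
        cells = "".join(
            f"<td>{coverage(results[a], results[b]):.2f}</td>" for b in names)
        rows.append(f"<tr><th>{html.escape(a)}</th>{cells}</tr>")
    rows.append("</table>")
    return "\n".join(rows)
```

Here results maps each algorithm name to its list of record dicts, for example as loaded from the per-algorithm files.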

Current status

  • Implementation in progress
  • Your comments are welcome
  • youki-k <at> is.naist.jp