Kargus: A Highly‐scalable Software‐based Intrusion Detection System
M. Asim Jamshed*, Jihyung Lee†, Sangwoo Moon†, Insu Yun*, Deokjin Kim, Sungryoul Lee, Yung Yi, KyoungSoo Park*
* Networked & Distributed
[Figure labels: NIDS, Attack]
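[Figure: NIDS processing pipeline]
Packet Acquisition -> Preprocessing (Decode, Flow management, Reassembly) -> Multi‐string Pattern Matching -> Rule Options Evaluation -> Output
Multi‐string Pattern Matching: Match Failure (Innocent Flow) or Match Success (continue to Rule Options Evaluation)
Rule Options Evaluation: Evaluation Failure (Innocent Flow) or Evaluation Success (Malicious Flow, sent to Output)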
alert tcp $EXTERNAL_NET any -> $HTTP_SERVERS 80 (msg:"possible attack attempt BACKDOOR optix runtime detection"; content:"/whitepages/page_me/100.html";
pcre:"/body=\x2521\x2521\x2521Optix\s+Pro\s+v\d+\x252E\d+\S+sErver\s+Online\x2521\x2521\x2521/")
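The two detection stages can be illustrated with the rule above. The following is a minimal sketch, not the Kargus or Snort implementation: it assumes that the `content` string acts as the cheap first-stage filter and that the `pcre` option is evaluated only when that string is found. A plain substring test stands in for a real multi-string matcher such as Aho‐Corasick, and the function name `inspect` and the sample payloads are hypothetical.

```python
import re

# First-stage pattern, copied from the rule's content option.
CONTENT = b"/whitepages/page_me/100.html"

# Second-stage pattern, copied from the rule's pcre option.
# \x25 is '%', so \x2521 matches the literal text "%21".
PCRE = re.compile(
    rb"body=\x2521\x2521\x2521Optix\s+Pro\s+v\d+\x252E\d+\S+sErver\s+Online"
    rb"\x2521\x2521\x2521"
)

def inspect(payload: bytes) -> str:
    """Classify one payload with a two-stage check (illustrative only)."""
    # Stage 1: cheap string match; most traffic is expected to stop here.
    if CONTENT not in payload:
        return "innocent (match failure)"
    # Stage 2: the expensive regular expression runs only on stage-1 hits.
    if PCRE.search(payload) is None:
        return "innocent (evaluation failure)"
    return "malicious"

if __name__ == "__main__":
    samples = [
        b"GET /index.html HTTP/1.1",
        b"GET /whitepages/page_me/100.html HTTP/1.1",
        b"POST /whitepages/page_me/100.html HTTP/1.1\r\n\r\n"
        b"body=%21%21%21Optix Pro v1%2E33_sErver Online%21%21%21",
    ]
    for s in samples:
        print(inspect(s))
```

Ordering the checks this way keeps the regular expression off the common path, which is the reason the pipeline separates pattern matching from rule option evaluation.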
[Figure: Cores 1-5 with 10 Gbps NIC A and 10 Gbps NIC B; Cores 7-11 with 10 Gbps NIC C and 10 Gbps NIC D]
[Figure: Cores 1-5 are each paired with one receive queue per NIC: Rx Q A1-A5 on 10 Gbps NIC A and Rx Q B1-B5 on 10 Gbps NIC B]
[Figure: Engine Thread stages: Packet Acquisition, Preprocess, Multi‐string Matching, Rule Option Evaluation. The Engine Thread offloads work through the Multi‐string Matching Queue and the PCRE Matching Queue to the GPU Dispatcher Thread, which runs Multi‐string Matching and PCRE Matching on the GPU.]
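The hand-off between an engine thread and the dispatcher thread can be sketched as a producer/consumer queue that is drained in batches. This is only an illustration under stated assumptions: a single engine thread, one queue instead of separate multi‐string and PCRE queues, and a CPU function `match_batch` standing in for the GPU kernel. `PATTERNS`, `BATCH_SIZE`, and all function names are hypothetical.

```python
import queue
import threading

PATTERNS = [b"attack", b"/whitepages/page_me/100.html"]  # hypothetical signatures
BATCH_SIZE = 4                                            # hypothetical batch size

def match_batch(payloads):
    """Stand-in for the GPU kernel: scan every payload of a batch in one call."""
    return [any(p in payload for p in PATTERNS) for payload in payloads]

def dispatcher(offload_q, results):
    """Drain the offload queue in batches; None is the shutdown sentinel."""
    done = False
    while not done:
        items = [offload_q.get()]            # block until some work arrives
        while len(items) < BATCH_SIZE:       # then take whatever else is ready
            try:
                items.append(offload_q.get_nowait())
            except queue.Empty:
                break
        if None in items:
            done = True
            items = [it for it in items if it is not None]
        if items:
            ids, payloads = zip(*items)
            for pkt_id, hit in zip(ids, match_batch(payloads)):
                results[pkt_id] = hit

def engine(packets, offload_q):
    """Engine thread body: enqueue each payload, then signal shutdown."""
    for pkt_id, payload in enumerate(packets):
        offload_q.put((pkt_id, payload))
    offload_q.put(None)

def run(packets):
    offload_q = queue.Queue()
    results = {}
    worker = threading.Thread(target=dispatcher, args=(offload_q, results))
    worker.start()
    engine(packets, offload_q)
    worker.join()
    return [results[i] for i in range(len(packets))]

if __name__ == "__main__":
    print(run([b"GET /index.html", b"an attack payload", b""]))
```

Batching at the dispatcher amortizes the fixed cost of each offload across many packets, at the price of some added latency per packet.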
[Figure: Cores 1-5 each run an engine thread with the stages Packet Acquisition, Preprocess, Multi‐string Matching, and Rule Option Evaluation; Core 6 runs the GPU Dispatcher Thread. Annotation: Single thread pinned at core 1.]
▲ Kargus configuration on a dual-NUMA hexa-core machine with 4 NICs and 2 GPUs
[Figure: Internal packet queue (per engine); axis: Queue Length]
[Figure: Throughput (Gbps) vs. Packet size (Bytes: 64, 218, 256, 818, 1024, 1518) for MIDeA, Snort w/ PF_Ring, Kargus CPU‐only, and Kargus CPU/GPU]
Actual payload analyzing bandwidth
[Figure: Throughput (Gbps) vs. Packet size (Bytes: 64, 256, 1024, 1518); series: Kargus at 25%, 50%, and 100%; Snort+PF_Ring at 25%, 50%, and 100%]
[Figure: Power Consumption (Watts) vs. Offered Incoming Traffic (Gbps: 5, 10, 20, 33) at packet size 1518 B; series: Kargus w/o LB (polling), Kargus w/o LB, Kargus w/ LB]
UPDATE | MIDeA* | KARGUS | OUTCOME
Packet acquisition | PF_RING | PacketShader I/O | 70% lower CPU utilization
Detection engine | GPU‐support for Aho‐Corasick | GPU‐support for Aho‐Corasick & PCRE | 65% faster detection rate
Architecture | Process‐based | Thread‐based | 1/6 GPU memory usage
Batch processing | Batching only for detection engine (GPU) | Batching from packet acquisition to output | 1.9x higher throughput
Power‐efficient | Always GPU (does not offload is too small) | Opportunistic offloading to GPUs (Ingress traffic rate) | 15% power saving
* G. Vasiliadis, M. Polychronakis, and S. Ioannidis, “MIDeA: a multi‐parallel intrusion detection architecture”, ACM CCS 2011
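Opportunistic offloading driven by ingress load can be sketched as a small hysteresis policy. This is an assumption-laden illustration rather than the policy Kargus uses: it takes the per-engine queue length as a proxy for the ingress traffic rate, and the class `OffloadPolicy`, its two thresholds, and their values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class OffloadPolicy:
    """Decide per engine whether to offload matching to the GPU (illustrative)."""
    high: int = 64          # hypothetical: start offloading at this queue length
    low: int = 16           # hypothetical: fall back to CPU at or below this length
    use_gpu: bool = False   # current mode

    def update(self, queue_len: int) -> bool:
        # Two thresholds give hysteresis, so the mode does not flap when the
        # queue length hovers around a single cut-off value.
        if not self.use_gpu and queue_len >= self.high:
            self.use_gpu = True
        elif self.use_gpu and queue_len <= self.low:
            self.use_gpu = False
        return self.use_gpu

def trace(policy, lengths):
    """Return the offload decision after each observed queue length."""
    return [policy.update(n) for n in lengths]

if __name__ == "__main__":
    print(trace(OffloadPolicy(), [0, 30, 64, 40, 17, 16, 30]))
```

Under light load the engine stays on the CPU and the GPU can remain idle, which is the intuition behind the power saving listed in the table.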
0x6d5a 0x6d5a 0x6d5a 0x6d5a 0x6d5a 0x6d5a 0x6d5a 0x6d5a 0x6d5a 0x6d5a 0x6d5a 0x6d5a 0x6d5a 0x6d5a 0x6d5a 0x6d5a 0x6d5a 0x6d5a 0x6d5a 0x6d5a 0x6d5a 0x56da 0x255b 0x0ec2 0x4167 0x253d 0x43a3 0x8fb0 0xd0ca 0x2bcb 0xae7b 0x30b4 0x77cb 0x2d3a 0x8030 0xf20c 0x6a42 0xb73b 0xbeac 0x01fa
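The two 40-byte values read like receive-side scaling (RSS) hash keys: the one beginning 0x6d5a 0x56da matches the widely documented default key from Microsoft's RSS verification suite, and the other repeats the 16-bit word 0x6d5a. Assuming that reading, the sketch below shows why a key with a 16-bit period matters: a Toeplitz hash over (source IP, destination IP, source port, destination port) then returns the same value when source and destination are swapped, so both directions of a flow map to the same queue. The function names and the sample flow are illustrative, and taking the hash modulo the queue count simplifies the indirection table a real NIC uses.

```python
import ipaddress
import struct

# Keys as they appear above (20 words of 16 bits each).
SYMMETRIC_KEY = bytes.fromhex("6d5a" * 20)
DEFAULT_KEY = bytes.fromhex(
    "6d5a 56da 255b 0ec2 4167 253d 43a3 8fb0 d0ca 2bcb "
    "ae7b 30b4 77cb 2d3a 8030 f20c 6a42 b73b beac 01fa"
)

def toeplitz_hash(key: bytes, data: bytes) -> int:
    """32-bit Toeplitz hash: XOR a sliding 32-bit key window per set input bit."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i, byte in enumerate(data):
        for b in range(8):
            if byte & (0x80 >> b):
                # Window starts at the same bit offset as the input bit.
                shift = key_bits - 32 - (i * 8 + b)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result

def flow_bytes(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> bytes:
    """Pack an IPv4/TCP 4-tuple in the order the hash consumes it."""
    return (
        ipaddress.IPv4Address(src_ip).packed
        + ipaddress.IPv4Address(dst_ip).packed
        + struct.pack("!HH", src_port, dst_port)
    )

def rx_queue(key: bytes, flow: bytes, num_queues: int = 5) -> int:
    """Simplified queue selection (real NICs index an indirection table)."""
    return toeplitz_hash(key, flow) % num_queues

if __name__ == "__main__":
    fwd = flow_bytes("66.9.149.187", "161.142.100.80", 2794, 1766)
    rev = flow_bytes("161.142.100.80", "66.9.149.187", 1766, 2794)
    for name, key in (("repeating", SYMMETRIC_KEY), ("default", DEFAULT_KEY)):
        print(name, hex(toeplitz_hash(key, fwd)), hex(toeplitz_hash(key, rev)))
```

The symmetry follows from the key's period: swapping the two addresses moves each input bit by 32 positions and swapping the ports moves it by 16, and a key that repeats every 16 bits presents the same 32-bit window at both positions.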
[Figure labels: Control, Cache, ALU (x6)]
[Figure: Throughput (Gbps) vs. the number of packets in a batch (32 to 16,384 pkts/batch); series: GPU throughput (2B per entry), CPU throughput (2.15 Gbps)]
[Figure: Throughput (Gbps) vs. the number of packets in a batch (32 to 16,384 pkts/batch); series: GPU throughput, CPU throughput (0.52 Gbps)]
[Figure: Performance Speedup vs. Packet Size (Bytes: 64, 128, 256, 512, 1024, 1518); series: Innocent Traffic, Malicious Traffic]
[Figure: Latency (msec) vs. Packet Size (Bytes: 64 to 128); series: GPU total latency, CPU total latency, GPU pattern matching latency, CPU pattern matching latency]
[Figure: Receiving Throughput (Gbps) and CPU Utilization (%) vs. Packet Size (bytes: 64, 128, 256, 512, 1024, 1518); series: PCAP polling, PSIO, PCAP polling CPU %, PSIO CPU %; data labels shown: 0.4, 0.8, 1.5, 2.9, 5.0, 6.7]