GoldenEye: stream-based network packet inspection using GPUs


FERMILAB-SLIDES-18-122-CD. GoldenEye: stream-based network packet inspection using GPUs. Qian Gong, Wenji Wu, Phil DeMar. The 43rd IEEE Conference on Local Computer Networks, October 4, 2018.


slide-1
SLIDE 1

GoldenEye: stream-based network packet inspection using GPUs

Qian Gong, Wenji Wu, Phil DeMar
The 43rd IEEE Conference on Local Computer Networks, October 4, 2018

FERMILAB-SLIDES-18-122-CD. This manuscript has been authored by Fermi Research Alliance, LLC under Contract No. DE-AC02-07CH11359 with the U.S. Department of Energy, Office of Science, Office of High Energy Physics.

slide-2
SLIDE 2
Outline

  • Motivation of GPU-based traffic analysis
  • Framework of GPU-based traffic analysis
  • Performance evaluation
  • Conclusion & future work

4/18/2019 GoldenEye: stream-based network packet inspection using GPUs, IEEE LCN 2018 2

slide-3
SLIDE 3
Network Traffic Analysis

  • Network traffic analysis tools provide indispensable information for:
      • Operation & management
      • Performance troubleshooting
      • Network security
      • Statistical purposes
  • Basic functions:
      • Profile traffic activities
      • Scan traffic content for suspicious patterns or signatures

[Figure: Internet, border router, port-mirrored / optical traffic tap, traffic analysis system, internal network.]

slide-4
SLIDE 4

Task Overview

Stateful packet processing

  • Track and maintain the states of network functions:
      • TCP connections
      • Sub-string matches in intrusion detection systems

Timely response

  • Fast and reliable network data processing at link speed

Protect traffic integrity

  • Packets should not be lost in the processing cycle

Challenges in data and state management
slide-5
SLIDE 5

Data Management Challenges and Solutions

Challenges:

  • High-speed networks
      • 10/25/40GE-connected servers
      • 100GE backbone technologies are commonplace
      • Millions of packets generated & transmitted per second
  • Complex packet analysis algorithms
      • Algorithms are increasingly complex as security threats become more sophisticated
  • Need a flexible and programmable computing platform
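To put "millions of packets per second" into numbers, the sketch below computes the theoretical frame rate of a fully loaded Ethernet link. It assumes the standard 20 bytes of per-frame wire overhead (7-byte preamble, 1-byte start-of-frame delimiter, 12-byte inter-frame gap); the function name is illustrative and does not come from the slides.

```python
WIRE_OVERHEAD_BYTES = 20  # preamble (7) + start-of-frame delimiter (1) + inter-frame gap (12)


def packets_per_second(link_bps: float, frame_bytes: int) -> float:
    """Upper bound on frames per second for a fully loaded Ethernet link."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


if __name__ == "__main__":
    for gbps in (10, 40, 100):
        for frame in (64, 1518):  # minimum and maximum standard frame sizes
            rate = packets_per_second(gbps * 1e9, frame)
            print(f"{gbps:>3} GE, {frame:>4}-byte frames: {rate / 1e6:7.2f} Mpps")
```

At 100 Gbit/s this gives roughly 148.8 million minimum-size frames per second, and about 8.1 million per second even when every frame is full size.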
slide-6
SLIDE 6

Data Management Solutions

Solutions:

  • Heterogeneous data management
  • GPU-centric computing
      • The GPU is specialized for data-parallel, high-throughput computation
      • Thousands of cores for massive parallelism
      • Tolerance of memory latency

Features
High compute power               Varies   X
High memory bandwidth            Varies   X
Easy programmability             X   ✓   ✓
Data-parallel execution model    X   ✓   ✓
slide-7
SLIDE 7

Packet Processing on Heterogeneous Architectures

Data processing flow:

  • The CPU receives packets from the NIC, parses their headers, and batches them in an input buffer.
  • When a specified batch size or a preset time limit is reached, the input buffer is transferred to GPU memory via PCIe.
  • A set of GPU kernels is then launched to perform tasks such as IP address matching, cryptographic operations, and deep packet inspection.
  • The results are transferred back to CPU memory to guide further actions.

[Figure: packets move from the NIC to CPU memory, are staged back-to-back, and cross the PCIe bus to GPU memory.]

Packet batching can be a feasible way to improve GPU utilization, but it makes stateful packet processing more difficult.
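The "batch size or time limit" rule in the data processing flow can be sketched on the host side as follows. This is a minimal illustration, not GoldenEye's code: `PacketBatcher`, its parameters, and the `flush` callback (which stands in for the PCIe copy and kernel launch) are all assumed names.

```python
import time
from typing import Callable, List, Optional


class PacketBatcher:
    """Collect packets and hand them off when the batch is full or too old."""

    def __init__(self, max_packets: int, max_wait_s: float,
                 flush: Callable[[List[bytes]], None],
                 clock: Callable[[], float] = time.monotonic):
        self.max_packets = max_packets
        self.max_wait_s = max_wait_s
        self.flush = flush            # stands in for the PCIe copy + kernel launch
        self.clock = clock
        self.batch: List[bytes] = []
        self.started: Optional[float] = None

    def add(self, packet: bytes) -> None:
        if not self.batch:
            self.started = self.clock()   # the timer starts with the first packet
        self.batch.append(packet)
        if len(self.batch) >= self.max_packets:
            self._emit()

    def poll(self) -> None:
        """Call periodically so a slow trickle of packets is not held forever."""
        if self.batch and self.clock() - self.started >= self.max_wait_s:
            self._emit()

    def _emit(self) -> None:
        batch, self.batch, self.started = self.batch, [], None
        self.flush(batch)
```

The time limit bounds the extra latency that batching adds when traffic is light, while the size limit keeps each transfer large enough to use the GPU well when traffic is heavy.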
slide-8
SLIDE 8

State Management Challenges

Challenge 1: flow management & stream reassembly

  • Stateful network functions must both track the states of network connections and scan network packets at a per-flow level.
  • Flow state management and stream reassembly require state synchronization when dealing with packets from the same connection.
  • Data parallelism is limited when fewer simultaneous TCP connections are present.

The conventional hash-based approach requires atomic locks for packets from the same TCP connection and is prone to ambiguity caused by hash-key collisions.

slide-9
SLIDE 9

State Management Challenges

Challenge 2: inter-batch state connection

  • Stateful packet inspection must detect signatures that straddle packet boundaries.
  • The GPU’s batch-processing mechanism requires maintaining connection states and tracking potential sub-matches across input batches.

[Figure: packets 1.1 to 1.8, 2.1 to 2.2, and 3.1 to 3.2 arriving over time in packet batch 1 and packet batch 2, illustrating intra-batch stream reassembly, a sequence gap, an out-of-order packet that arrives in a subsequent batch, and cross-batch pattern matching of malicious patterns.]

Stateful packet inspection must detect and memorize the sub-matches across input batches.
slide-10
SLIDE 10
State Management Solutions

  • Parallel flow management and stream processing via GPU sort and prefix-scan
      • Sort and prefix-scan are extremely fast on GPUs (over ten billion elements/sec).
  • Inter-batch network function state connection
      • Developed a buffer-free, cross-packet/batch pattern matching algorithm.
      • Combine the state and context information with packets in subsequent batches.

[Figure: GPU primitive libraries support the GPU packet analysis modules (flow state tracking, payload reassembly); per-flow state and TCP data streams are carried as states from the (k-1)th batch processing to the kth batch processing.]

Allow on-line packets to come through, but retain and update the state information.
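How sort and prefix-scan can replace a locked hash table for flow grouping is easiest to see in a small sequential sketch. The code below uses Python's built-in sort and `itertools.accumulate` where a GPU would use parallel primitives; the function and type names are assumptions made for illustration.

```python
from itertools import accumulate
from typing import List, Tuple

FlowKey = Tuple[str, int, str, int]  # (src IP, src port, dst IP, dst port)


def group_by_flow(keys: List[FlowKey]) -> Tuple[List[int], List[int]]:
    """Return (order, flow_ids): packet indices sorted by flow key, and a dense
    flow id for each sorted position, computed without per-flow locking."""
    # Step 1: sort packet indices by flow key (a GPU would use a parallel sort).
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    # Step 2: flag the first packet of each flow (key differs from its left neighbour).
    heads = [1 if j == 0 or keys[order[j]] != keys[order[j - 1]] else 0
             for j in range(len(order))]
    # Step 3: an inclusive prefix sum over the flags numbers the flows 0, 1, 2, ...
    flow_ids = [s - 1 for s in accumulate(heads)]
    return order, flow_ids
```

Because packets of one flow end up adjacent after the sort, every later step only compares neighbours, so packets from the same connection never contend for a shared hash bucket.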
slide-11
SLIDE 11

GoldenEye Network Traffic Analysis Framework

GoldenEye modules:

  • Packet capture & pre-processing
  • BPF filter
  • Stream processor
  • Traffic statistic summary
  • Deep packet inspection
  • State buffer

[Figure: a network traffic source feeds the BPF filter and packet capture engine; the GPU domain contains the stream processor, traffic statistic analysis, DPI (string/regex match), and state storage; results go to external applications such as IPS/IDS, traffic monitoring, and traffic engineering.]

slide-12
SLIDE 12

Packet Capture and Processing on Multicore Systems

Logic:

  • Multithreaded packet capture and pre-processing
  • Queue packets for batch processing
  • Dual buffers for concurrent data transfer and GPU computing

[Figure: a multi-queue NIC steers network traffic into receive queues RQ 1 to RQ n; cores 1 to n of the multi-core host each run a capture engine and pre-processing and fill packet batches at the host; packet buffers A and B feed GPU processing on the GPU system, and core n+1 passes analysis results to external applications.]
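The dual-buffer idea can be sketched with two reusable buffers that circulate between a producer and a consumer. This is only an analogy in plain Python threads: the consumer stands in for the GPU side, the producer for the capture cores, and `run_double_buffered` is a hypothetical name.

```python
import queue
import threading
from typing import Callable, Iterable, List


def run_double_buffered(batches: Iterable[List[bytes]],
                        process: Callable[[List[bytes]], None]) -> None:
    """Fill one buffer while the other is being processed."""
    free = queue.Queue()   # buffers ready to be filled
    full = queue.Queue()   # buffers ready to be processed
    free.put([])           # packet buffer A
    free.put([])           # packet buffer B

    def consumer() -> None:          # stands in for the GPU side
        while True:
            buf = full.get()
            if buf is None:          # sentinel: the producer is done
                return
            process(buf)
            buf.clear()
            free.put(buf)            # hand the buffer back for refilling

    worker = threading.Thread(target=consumer)
    worker.start()
    for batch in batches:            # stands in for the capture cores
        buf = free.get()             # blocks only if both buffers are busy
        buf.extend(batch)
        full.put(buf)
    full.put(None)
    worker.join()
```

With two buffers the producer can fill one while the other is in use, so transfer and computation overlap; the producer stalls only when processing falls a full buffer behind.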
slide-13
SLIDE 13

GPU-centric Stream Processor

Tasks:

  • Monitor the states of TCP connections.
  • Reassemble TCP packets into bi-directional byte streams.

Implementation:

  • Stream reassembly: sort packets into streams by their TCP 4-tuples and sequence numbers.
  • Flow state tracking: compare the stream states against existing connections.
  • Stream normalization: rescan flow-reassembled packets and remove retransmissions.

[Figure: packets in a batch pass through stream reassembly into flow-reassembled packets; stream normalization removes retransmissions, and the hash table of flow records is updated with the flow states.]
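Stream reassembly and normalization within one batch can be illustrated with a short sequential sketch. It makes simplifying assumptions that the slides do not state: sequence numbers do not wrap around, each flow is one direction of a connection, and a segment that follows a sequence gap is simply skipped. `Segment` and `reassemble` are hypothetical names.

```python
from typing import Dict, List, NamedTuple, Tuple

Flow = Tuple[str, int, str, int]      # TCP 4-tuple


class Segment(NamedTuple):
    flow: Flow
    seq: int                          # sequence number of the first payload byte
    payload: bytes


def reassemble(batch: List[Segment]) -> Dict[Flow, bytes]:
    """Sort a batch by (flow, seq), then drop bytes that were already seen."""
    streams: Dict[Flow, bytearray] = {}
    next_seq: Dict[Flow, int] = {}
    for seg in sorted(batch, key=lambda s: (s.flow, s.seq)):
        expected = next_seq.setdefault(seg.flow, seg.seq)
        if seg.seq > expected:
            continue  # sequence gap: a real system would keep this for a later batch
        fresh = seg.payload[expected - seg.seq:]   # trims retransmitted bytes
        streams.setdefault(seg.flow, bytearray()).extend(fresh)
        next_seq[seg.flow] = expected + len(fresh)
    return {flow: bytes(data) for flow, data in streams.items()}
```

For example, segments with sequence numbers 100 (`hello`), 103 (`lowo`), and 105 (`world`) arriving in any order reassemble to `helloworld`, because the overlapping bytes are trimmed during normalization.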
slide-14
SLIDE 14

Traffic Statistical Analysis

Main strategy:

  • Similar to the TCP flow management function, GoldenEye’s statistical aggregation module is built with a set of primitive GPU sort and prefix-scan operations.

Example use cases:

  • Host traffic monitoring
  • Heavy-hitter detection
  • DoS detection
  • Capacity planning

Src IP      Src Port   Dest IP     Dest Port   Proto   Pkt Sent   Byte Sent
131.2.3.0   80         10.1.2.4    998         TCP     32         16484
10.1.2.4    998        131.2.3.0   80          TCP     121        179841
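A per-flow packet and byte summary of this kind can be derived from a sort followed by a prefix sum, which the following sequential sketch imitates. The names `flow_totals` and `heavy_hitters` and the threshold parameter are illustrative assumptions, not part of GoldenEye.

```python
from itertools import accumulate
from typing import Dict, List, Tuple

Key = Tuple[str, int, str, int, str]  # src IP, src port, dst IP, dst port, protocol


def flow_totals(packets: List[Tuple[Key, int]]) -> Dict[Key, Tuple[int, int]]:
    """Map each flow key to (packets sent, bytes sent)."""
    if not packets:
        return {}
    ordered = sorted(packets, key=lambda p: p[0])               # sort by flow key
    byte_scan = list(accumulate(size for _, size in ordered))   # inclusive prefix sum
    # The last position of each flow is where the key changes (or the batch ends).
    ends = [i for i in range(len(ordered))
            if i == len(ordered) - 1 or ordered[i][0] != ordered[i + 1][0]]
    totals, prev_end, prev_bytes = {}, -1, 0
    for end in ends:
        # Differences between consecutive flow ends give per-flow counts and bytes.
        totals[ordered[end][0]] = (end - prev_end, byte_scan[end] - prev_bytes)
        prev_end, prev_bytes = end, byte_scan[end]
    return totals


def heavy_hitters(totals: Dict[Key, Tuple[int, int]], min_bytes: int) -> List[Key]:
    """Flows whose byte count reaches the (assumed) threshold."""
    return sorted(k for k, (_, nbytes) in totals.items() if nbytes >= min_bytes)
```

The per-flow totals fall out of subtracting the scan value at one flow's end from the value at the next flow's end, so no per-flow counter needs to be updated atomically.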

slide-15
SLIDE 15

Stream-based Deep Packet Inspection

Tasks:

  • Intra-batch pattern matching:
      • Perform pattern matching over stream-reassembled packets in the same batch.
  • Inter-batch pattern matching:
      • Detect and reconstruct signature patterns that straddle batch boundaries.
slide-16
SLIDE 16

Intra-batch Signature Matching

General pattern matching algorithm: hybrid-FA (HFA) [1]

  • HFA compresses the states of a DFA: any subset of states whose expansion would cause state explosion is kept in NFA form.

Implementation of intra-batch pattern matching:

  • Each GPU thread takes one packet.
  • Perform cross-packet detection on a per-flow basis.

[1] Michela Becchi, Patrick Crowley, “A hybrid finite automaton for practical deep packet inspection”, 2007 ACM CoNEXT conference.

slide-17
SLIDE 17
Inter-batch Signature Matching

  • In-sequence pattern matching
      • The matching process for subsequent stream fragments continues from the last FA states of the previous fragments.
  • Out-of-order pattern matching
      • Look for regex suffixes in out-of-order streams.
      • Recover the string of previous matches and concatenate it to stream fragments that arrive later to fill the hole.

[Figure: two timelines of received and newly arrived packets against a signature pattern. In-sequence pattern matching: the scan continues from the last FA (HFA) states. Out-of-order pattern matching: the detected suffix string is recovered and concatenated to the end of subsequent data streams in subsequent batches.]
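The in-sequence case can be demonstrated with any deterministic automaton: store only the final state per flow and resume from it in the next batch. The sketch below substitutes a simple single-string DFA for the hybrid-FA used in the slides; `build_dfa`, `scan`, `scan_batch`, and `flow_state` are illustrative names.

```python
from typing import Dict, List, Tuple


def build_dfa(pattern: str) -> List[Dict[str, int]]:
    """State s means 'the last s characters seen equal pattern[:s]'."""
    dfa = []
    for s in range(len(pattern) + 1):
        row = {}
        for ch in set(pattern):
            text = pattern[:s] + ch
            k = min(len(pattern), len(text))
            while k and not text.endswith(pattern[:k]):
                k -= 1            # fall back to the longest prefix that still fits
            row[ch] = k
        dfa.append(row)
    return dfa


def scan(dfa: List[Dict[str, int]], data: str, state: int = 0) -> Tuple[int, List[int]]:
    """Scan one fragment; return (final state, end offsets of matches)."""
    accept, hits = len(dfa) - 1, []
    for i, ch in enumerate(data):
        state = dfa[state].get(ch, 0)   # characters outside the pattern reset to 0
        if state == accept:
            hits.append(i)
    return state, hits


flow_state: Dict[str, int] = {}          # per-flow state kept between batches


def scan_batch(dfa: List[Dict[str, int]], flow: str, fragment: str) -> List[int]:
    state, hits = scan(dfa, fragment, flow_state.get(flow, 0))
    flow_state[flow] = state             # only the state is stored, not the payload
    return hits
```

With the pattern `attack`, scanning `xxatt` in one batch and `ackyy` in the next reports a match ending at offset 2 of the second fragment, whereas scanning `ackyy` from the initial state finds nothing. No payload from the first batch has to be buffered or rescanned.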

slide-18
SLIDE 18

Out-of-order Pattern Matching

Suffix-regex detection: search streams with all potential initial states

  • Parallelize over both the out-of-order stream fragments and the possible initial states of a search

Suffix-string reconstruction:

  • Retain the first and the last FA states of any partial matches
  • Reconstruct sub-regexes by relating states to their “depths” in the original regexes

Example: search for suffix-regexes in the stream “cdegggh”
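Searching from all potential initial states can be sketched for a fragment that arrives before its predecessor: scan it once per initial state, then pick the right result when the missing fragment shows up. The code again uses a single-string DFA in place of the hybrid-FA and does not cover suffix-string reconstruction; all names are assumptions.

```python
from typing import Dict, List, Tuple


def build_dfa(pattern: str) -> List[Dict[str, int]]:
    """State s means 'the last s characters seen equal pattern[:s]'."""
    dfa = []
    for s in range(len(pattern) + 1):
        row = {}
        for ch in set(pattern):
            text = pattern[:s] + ch
            k = min(len(pattern), len(text))
            while k and not text.endswith(pattern[:k]):
                k -= 1
            row[ch] = k
        dfa.append(row)
    return dfa


def scan(dfa: List[Dict[str, int]], data: str, state: int = 0) -> Tuple[int, List[int]]:
    """Scan one fragment; return (final state, end offsets of matches)."""
    accept, hits = len(dfa) - 1, []
    for i, ch in enumerate(data):
        state = dfa[state].get(ch, 0)
        if state == accept:
            hits.append(i)
    return state, hits


def scan_from_all_states(dfa: List[Dict[str, int]],
                         fragment: str) -> Dict[int, Tuple[int, List[int]]]:
    """Scan an out-of-order fragment once per possible initial state.
    The scans are independent, so they could run in parallel."""
    return {s: scan(dfa, fragment, s) for s in range(len(dfa))}


if __name__ == "__main__":
    dfa = build_dfa("abcd")
    late = scan_from_all_states(dfa, "cdyy")   # arrives first, predecessor missing
    state, _ = scan(dfa, "xxab")               # the missing fragment arrives later
    print(late[state])                         # result for the true initial state
```

Once the earlier fragment has been scanned, its final state selects the one precomputed result that is valid, and the results for all other initial states are discarded.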

slide-19
SLIDE 19

Optimization w/ Chained Expression

Logic:

  • Speed up DPI with string-based filters
  • Reduce memory consumption by breaking complex regexes into chained pieces

Strategy:

  • Convert a regex into one of three forms: <str><regex>, <regex><str>, or <regex><str><regex>.
  • Process packet streams through the string filter first.
  • The regex engine is not triggered until the string that guards it is found.
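For the `<str><regex>` form, the guard-then-regex strategy might look like the sketch below. It uses Python's `re` module as a stand-in for the regex engine; the `ChainedRule` class and the sample signature are invented for illustration and are not taken from the Snort rule sets.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ChainedRule:
    literal: bytes       # the <str> part, checked with a cheap substring search
    tail: re.Pattern     # the <regex> part that must follow the literal

    def search(self, stream: bytes) -> Optional[Tuple[int, int]]:
        """Return (start, end) of the first full match, or None."""
        pos = stream.find(self.literal)
        while pos != -1:
            # The regex engine runs only after the guarding string is found.
            m = self.tail.match(stream, pos + len(self.literal))
            if m:
                return pos, m.end()
            pos = stream.find(self.literal, pos + 1)
        return None


if __name__ == "__main__":
    rule = ChainedRule(b"cmd=", re.compile(rb"[a-z]+;rm\s+-rf"))
    print(rule.search(b"GET /?cmd=ls;rm  -rf /"))   # guard found, regex runs
    print(rule.search(b"GET /index.html"))           # guard absent, regex never runs
```

Streams that never contain the guarding string are rejected by the substring search alone, which is where the speed-up comes from when most traffic is benign.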
slide-20
SLIDE 20

Performance Evaluation

Testbed systems:

  • Dual Intel E5-2650 v4 CPUs (12 cores per socket)
  • NVIDIA K40 GPU

Traffic traces:

  • Traffic source
      • Intrusion detection dataset created by the Canadian Institute for Cybersecurity (CICIDS)
      • Science data flows mirrored from the border router at Fermilab (Fermilab)
  • Traffic pattern

Trace      Trace size   # of packets   # of TCP connections   Mean packet size
Fermilab   9.8 GB       8.7 × 10^6     54.78 × 10^3           1118 bytes
CICIDS     17.1 GB      44.34 × 10^6   1.10 × 10^6            386 bytes

Regex datasets:

  • 104 and 192 spyware and malware signatures taken from a Snort 2.9.7.2 snapshot
slide-21
SLIDE 21

Performance Evaluation

Memory footprint

  • Stream reassembly

Traffic trace   Default buffer & reassembly   GoldenEye
Fermilab        307.81 MB                     44.20 MB
CICIDS          186.32 MB                     48.05 MB

  • Chained regex FAs

Regex     Chained regex: string-filter   Chained regex: regex engine   DFA                       HFA
malware   25.62 MB                       5510 states, 5.46 MB          exploded, N/A             ~2 × 10^6 states, ~2 GB
spyware   26.64 MB                       3637 states, 3.74 MB          ~7 × 10^6 states, ~7 GB   ~1 × 10^6 states, ~1 GB
slide-22
SLIDE 22

Performance Evaluation

Flow tracking & TCP reassembly

  • Performance with real traffic

                              Fermilab        CICIDS
GoldenEye w/o PCIe transfer   623.30 Gbit/s   335.73 Gbit/s
GoldenEye w/ PCIe transfer    552.30 Gbit/s   232.487 Gbit/s
Libnids [1] (12 CPU cores)    186.65 Gbit/s   31.102 Gbit/s

  • Scalability with the number of concurrent TCP connections

[Figure: plot against the number of concurrent TCP connections (in thousands, 5 to 150).]

[1] R. Wojtczuk, “Libnids”, http://libnids.sourceforge.net

slide-23
SLIDE 23

Performance Evaluation

  • Stream-based regex matching

Regex     Fermilab        CICIDS
malware   105.55 Gbit/s   62.13 Gbit/s
spyware   108.69 Gbit/s   64.95 Gbit/s

  • Consolidated DPI application

[Figure: bar chart for the malware and spyware regex sets on the Fermilab and CICIDS2012 traces.]

Hyperscan: “a high-performance multiple regex matching library”, https://01.org/hyperscan

slide-24
SLIDE 24

Conclusion & Future Work

GoldenEye:

  • Provides a fast, memory-efficient packet processing framework for GPU platforms, capable of statistical and stream-based payload analysis.
  • Reassembles TCP streams on the GPU and matches signature patterns across packets, without requiring the system to buffer and rescan packets or limit scanning to a fixed window of historical data.

Future directions:

  • Continue to add new features to support ever more complex network tasks.
  • Combine our packet processing functions with advanced learning algorithms to build automated, behavior-based network detection.