SLIDE 1

Co-Evaluation of Pattern Matching Algorithms on IoT Devices with Embedded GPUs

Charalampos Stylianopoulos Simon Kindström Magnus Almgren Olaf Landsiedel Marina Papatriantafilou

Distributed Computing and Systems

SLIDE 2

Motivation

• IoT security is a concern
• Recent attacks:
  • Show that IoT security is lacking
    • Mirai botnet
    • Attacks on a casino's aquarium thermostat
  • Underline the need for countermeasures

SLIDE 3

Motivation

Standard security countermeasures (e.g. network intrusion detection systems, NIDS) can be applied

  • on the IoT devices themselves
  • on the entry point to the network of IoT devices

SLIDE 4

Motivation

• Challenges
  • Resource-constrained devices
  • More connected devices -> more traffic to inspect
• NIDS
  • Performance bottleneck
  • Not tailored to the hardware

SLIDE 5

Motivation: Pattern matching

Pattern matching = the core functionality of NIDS (more than 70% of running time [1])

Goal: compare all network traffic against all malicious signatures, i.e. search for all patterns, anywhere in the network stream.

Input stream: … http://some.site.com/get.asp?f=/etc/passwd … GET HTTP try_backdoor…

Pattern set: … /etc/passwd admin.dll get.asp backdoor …

[1] "Generating realistic workloads for network intrusion detection systems", Antonatos et al.
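
The goal stated on this slide can be illustrated with a deliberately naive baseline: scan the stream once per pattern and report every occurrence. This is only a sketch for reference, not one of the evaluated algorithms; the function name `find_all` is made up for illustration, and the stream and patterns are the ones shown on the slide with the elided parts ("…") left out.

```python
def find_all(stream: bytes, patterns: list[bytes]) -> list[tuple[int, bytes]]:
    """Return (offset, pattern) for every occurrence of every pattern."""
    matches = []
    for pattern in patterns:
        start = stream.find(pattern)
        while start != -1:
            matches.append((start, pattern))
            # Continue one byte later so overlapping occurrences are found too.
            start = stream.find(pattern, start + 1)
    return sorted(matches)


# Example data taken from the slide (elided parts omitted).
stream = b"http://some.site.com/get.asp?f=/etc/passwd GET HTTP try_backdoor"
patterns = [b"/etc/passwd", b"admin.dll", b"get.asp", b"backdoor"]
matches = find_all(stream, patterns)
```

The cost grows with the number of patterns, since the stream is rescanned for each one. That is the motivation for the multi-pattern algorithms discussed later in the deck.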

SLIDE 6

Motivation: New Devices

• Opportunities
  • IoT/embedded hardware is evolving
  • New hardware features
    • Example: ODROID single-board computers with embedded Graphics Processing Units (GPUs)

Making use of those features is an open issue

SLIDE 7

Our work

• The questions we are trying to answer in this work:
  • Which algorithms to use?
  • Which hardware characteristics affect the performance?
  • How to create new algorithms that make best use of those characteristics?

SLIDE 8

Our work

• Co-evaluation of pattern matching algorithms
  • Evaluate existing implementations
  • Influence the design of new ones
• Target embedded GPUs
  • Deep look into their architectural features
• Extensive evaluation
  • Different datasets and patterns
  • Energy efficiency

SLIDE 9

Outline

• Background
  • GPU computing
• Our Benchmark
• Evaluation

SLIDE 10

Background

• General-Purpose GPU computing (GPGPU)
  • Besides graphics, GPUs can be used for general tasks as well
  • Highly parallel architecture
• Pattern matching on a GPU: not a new thing [1][2][3]
  • Not much work on embedded GPUs

[1] "Gnort: High Performance Network Intrusion Detection Using Graphics Processors", Vasiliadis et al., RAID 2008
[2] "APUNet: Revitalizing GPU as Packet Processing Accelerator", Go et al., NSDI 2017
[3] "A highly-efficient memory-compression scheme for GPU-accelerated intrusion detection systems", Bellekens et al., SINCONF 2017

SLIDE 11

Background

• The platform

[Figure: block diagram of the platform]

Source: "Energy efficient run-time mapping and thread partitioning of concurrent OpenCL applications on CPU-GPU MPSoCs"

SLIDE 12

Background

Important characteristics (unique to embedded GPUs)

• Small number of cores/threads
• No main memory on the GPU
  • Shared main memory between CPU and GPU
• No local memory on chip
• Vectorization in each GPU thread
• Separate instruction counter per GPU thread
  • No need to worry about divergent execution

SLIDE 13

Outline

• Background
• Our Benchmark
  • Algorithms
  • Optimizations
• Evaluation

SLIDE 14

Algorithms

Representative algorithms from two categories:

                       CPU             GPU
State machine based    Aho-Corasick
Filtering based        DFC

SLIDE 15

Algorithms (CPU)

The Aho-Corasick algorithm

• Used in many Network Intrusion Detection Systems
• Builds a State Machine (SM) from all the patterns
• Traverses the SM reading the input byte by byte

Benefits

  • Only one lookup per input byte

Limitations

  • Poor cache locality
  • Data dependencies

"Efficient String Matching: An Aid to Bibliographic Search", A. Aho, M. Corasick, Communications of the ACM, 1975
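
The two steps above (build a state machine, then traverse it byte by byte) can be sketched as follows. This is a minimal textbook-style version, not the implementation evaluated in this work, and the names `build_automaton` and `search` are illustrative. It keeps failure links to stay compact, so a byte may trigger more than one dictionary lookup. The "one lookup per input byte" property holds for the fully expanded transition table, which this sketch does not build.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure and output tables from all patterns."""
    goto = [{}]   # goto[state][byte] -> next state
    fail = [0]    # failure link of each state
    out = [[]]    # patterns recognised when entering each state
    for pattern in patterns:
        state = 0
        for byte in pattern:
            if byte not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][byte] = len(goto) - 1
            state = goto[state][byte]
        out[state].append(pattern)
    # Breadth-first pass: depth-1 states keep failure link 0.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for byte, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and byte not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(byte, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search(stream, automaton):
    """Traverse the state machine once over the input, byte by byte."""
    goto, fail, out = automaton
    state, matches = 0, []
    for pos, byte in enumerate(stream):
        while state and byte not in goto[state]:
            state = fail[state]
        state = goto[state].get(byte, 0)
        for pattern in out[state]:
            matches.append((pos - len(pattern) + 1, pattern))
    return matches


automaton = build_automaton([b"he", b"she", b"his", b"hers"])
result = sorted(search(b"ushers", automaton))
```

Each step depends on the state produced by the previous byte, which is the data dependency listed under Limitations. On large pattern sets the transition tables are also touched in an irregular order, consistent with the poor cache locality noted on the slide.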

SLIDE 16

Algorithms (CPU)

The DFC algorithm

• Creates a filter from the patterns
• Quickly filters out parts of the input

Pattern set: … activate admin.dll backdoor get.asp …

Filter (8 KB): one bit per entry (ac, ad, ab, ... ba, bb, ... ge, ...), set to 1 for entries that occur in the pattern set. Fits in cache!

Input stream: … this is an input

"DFC: Accelerating String Pattern Matching for Network Applications", Choi et al., NSDI '16
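
A filter of this kind can be sketched as a bitmap with one bit per two-byte value: 2^16 bits, which is exactly 8 KB. The sketch assumes the filter is indexed by the first two bytes of each pattern, which matches the "ac, ad, ... ba ... ge" entries on the slide, and that every pattern is at least two bytes long. The names `build_filter` and `candidates` are illustrative and not taken from the DFC code.

```python
FILTER_BITS = 1 << 16  # one bit per possible 2-byte value


def build_filter(patterns):
    """Set one bit for the first two bytes of every pattern."""
    bitmap = bytearray(FILTER_BITS // 8)  # 8192 bytes = 8 KB
    for pattern in patterns:
        index = pattern[0] | (pattern[1] << 8)
        bitmap[index >> 3] |= 1 << (index & 7)
    return bitmap


def candidates(stream, bitmap):
    """Yield the input positions whose two-byte window passes the filter."""
    for pos in range(len(stream) - 1):
        index = stream[pos] | (stream[pos + 1] << 8)
        if bitmap[index >> 3] & (1 << (index & 7)):
            yield pos


bitmap = build_filter([b"activate", b"admin.dll", b"backdoor", b"get.asp"])
clean = list(candidates(b"this is an input", bitmap))
suspicious = list(candidates(b"try_backdoor", bitmap))
```

For the slide's input "this is an input" no position passes, so nothing further is done. For `try_backdoor` two positions pass: the real match at "ba" and a false positive at "ac" (the prefix of "activate"). False positives like this are why a later verification step is needed.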

SLIDE 17

Algorithms (CPU)

The DFC algorithm (continued)

• Progressive filtering
  • in cache
• Verification
  • in memory

[Figure: initial filter, followed by pattern-length-specific filters, followed by hash tables]

Benefits

  • Cache locality (on filtering)
  • No data dependencies

Limitations

  • Verification phase is costly

"DFC: Accelerating String Pattern Matching for Network Applications", Choi et al., NSDI '16
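
The filter-then-verify flow on this slide can be sketched as below. It is a simplified illustration, not DFC's actual data layout. Python sets stand in for the cache-resident bitmap filters, there are only two length classes (2-3 bytes, and 4 bytes or more), and the hash tables are plain dictionaries keyed by a pattern prefix. All names are hypothetical and every pattern is assumed to be at least two bytes long.

```python
from collections import defaultdict


def build_tables(patterns):
    """Build two filters and two hash tables (short and long patterns)."""
    first = set()                     # initial filter: first two bytes
    second = set()                    # extra filter for long patterns: bytes 2..3
    short_table = defaultdict(list)   # patterns of 2-3 bytes, keyed by 2-byte prefix
    long_table = defaultdict(list)    # patterns of 4+ bytes, keyed by 4-byte prefix
    for p in patterns:
        first.add(p[:2])
        if len(p) >= 4:
            second.add(p[2:4])
            long_table[p[:4]].append(p)
        else:
            short_table[p[:2]].append(p)
    return first, second, short_table, long_table


def scan(stream, tables):
    """Filter every position first; verify only the positions that pass."""
    first, second, short_table, long_table = tables
    matches = []
    for pos in range(len(stream) - 1):
        if stream[pos:pos + 2] not in first:
            continue  # rejected by the initial filter, no table access
        for p in short_table.get(stream[pos:pos + 2], ()):
            if stream.startswith(p, pos):          # verification
                matches.append((pos, p))
        if stream[pos + 2:pos + 4] in second:      # progressive filtering
            for p in long_table.get(stream[pos:pos + 4], ()):
                if stream.startswith(p, pos):      # verification
                    matches.append((pos, p))
    return matches


tables = build_tables([b"/etc/passwd", b"admin.dll", b"get.asp", b"backdoor", b"GET"])
stream = b"http://some.site.com/get.asp?f=/etc/passwd GET HTTP try_backdoor"
found = scan(stream, tables)
```

Each position is handled independently of the previous one, which corresponds to the "no data dependencies" benefit. The costly part is the full comparison against the table entries, which only runs for positions that survive the filters.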

SLIDE 18

Algorithms

Representative algorithms from two categories:

• State machine based: Aho-Corasick (CPU), PFAC [1] (GPU)
• Filtering based: DFC (CPU), DFC (GPU)
• HYBRID (GPU)

[1] "Accelerating Pattern Matching Using a Novel Parallel Algorithm on GPUs", Lin et al., TOC 2013
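
As commonly described, PFAC (Parallel Failureless Aho-Corasick) drops the failure links of Aho-Corasick and starts one independent traversal at every input position, stopping as soon as no transition exists. The sketch below runs those traversals sequentially, whereas on a GPU each start position would map to its own thread. It is an illustration under that reading of [1], not the code used in the evaluation, and the names are made up.

```python
def build_trie(patterns):
    """Build a plain trie (no failure links) over all patterns."""
    trie = [{}]      # trie[state][byte] -> next state
    ends = [None]    # pattern that ends at each state, if any
    for pattern in patterns:
        state = 0
        for byte in pattern:
            if byte not in trie[state]:
                trie.append({})
                ends.append(None)
                trie[state][byte] = len(trie) - 1
            state = trie[state][byte]
        ends[state] = pattern
    return trie, ends


def match_from(stream, start, trie, ends):
    """Work of one 'thread': report the patterns that begin at `start`."""
    state, found = 0, []
    for pos in range(start, len(stream)):
        state = trie[state].get(stream[pos])
        if state is None:
            break  # no valid transition: this traversal terminates
        if ends[state] is not None:
            found.append(ends[state])
    return found


def pfac(stream, patterns):
    trie, ends = build_trie(patterns)
    # On a GPU, each start position would be handled by a separate thread.
    return [(start, p)
            for start in range(len(stream))
            for p in match_from(stream, start, trie, ends)]


result = pfac(b"ushers", [b"he", b"she", b"his", b"hers"])
```

Most traversals end after one or two bytes, so the per-thread work is small and the threads do not depend on one another.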

SLIDE 19

Hardware-oriented optimizations

Relevant aspects that we investigate:

• Memory mapping vs data transfers
  • 2-5X faster with memory mapping
• Placement of the filters
  • Global memory
  • Texture memory
  • Local memory
• Vectorization
  • No significant speedup

More in the paper…

SLIDE 20

Outline

• Background
• Our Benchmark
• Evaluation

SLIDE 21

Evaluation Methodology

Hardware

CPU       4 ARM big.LITTLE
GPU       ARM Mali-T628 (6 shader cores)
Memory    2GB RAM
Sensors   On-board energy sensors

Datasets

  • 3 publicly available traffic traces
  • 1 randomly generated data set

Malicious Patterns

  • 2183 patterns (from Snort)
  • 5000 patterns (emergingthreats.net)

SLIDE 22

Evaluation Methodology

• Goal of the evaluation:
  1. How fast we can process the input (execution time)
  2. How much energy we spend on processing (energy consumption)
  3. Effect of datasets and number of patterns
  4. Influence the design of new algorithms
• Versions:
  • CPU: Aho-Corasick, DFC
  • GPU: PFAC, DFC on GPU (with/without vectorization), HYBRID (with/without vectorization)

SLIDE 23

Evaluation Results

• Experiment 1: execution time breakdown

[Figure: execution time breakdown for the CPU versions and the GPU versions (including vectorized variants); labelled components include CPU->GPU and post-processing]

(Post-processing = output which and how many patterns matched, on the CPU)

SLIDE 24

Evaluation Results

• Experiment 2: energy consumption

SLIDE 25

Evaluation Results

• Experiment 3: effect of datasets and #patterns

[Figure panels: 2183 patterns, 5000 patterns]

SLIDE 26

Evaluation Results

• Experiment 4: configuring Hybrid

  • Bigger filter = slower access time (green trend, left y-axis)
  • Higher hit ratio -> less verification (red trend, right y-axis)
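
The trade-off can be illustrated with a toy calculation that is not taken from the experiment. A filter indexed by more input bytes needs more memory, but lets fewer input positions through to verification. The sketch assumes a direct-indexed bitmap, uses the fraction of positions that pass the filter as a stand-in metric (the slide's "hit ratio" may be defined differently), and uses hypothetical function names.

```python
def filter_bytes(width: int) -> int:
    """Size of a direct-indexed bitmap with one bit per `width`-byte value."""
    return 2 ** (8 * width) // 8


def pass_ratio(stream: bytes, patterns: list[bytes], width: int) -> float:
    """Fraction of input positions that the filter sends on to verification."""
    prefixes = {p[:width] for p in patterns}
    windows = len(stream) - width + 1
    passed = sum(stream[i:i + width] in prefixes for i in range(windows))
    return passed / windows


patterns = [b"activate", b"admin.dll", b"backdoor", b"get.asp"]
stream = b"this is an input"
for width in (1, 2):
    print(width, filter_bytes(width), pass_ratio(stream, patterns, width))
```

With these inputs, the 1-byte filter takes 32 bytes and still passes the position of the "a" in "an". The 2-byte filter takes 8 KB and passes nothing. Larger filters continue this trend but eventually stop fitting in cache, which is one plausible reading of the slower access time shown on the left axis.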

SLIDE 27

Conclusions & Future Work

• Conclusions
  • New hardware features (embedded GPUs) can alleviate the bottleneck of pattern matching
  • Architecture characteristics are important for high performance and low energy consumption
  • It is possible to design new algorithms tailored to the hardware
• Future Work
  • Overlap CPU/GPU execution (heterogeneous design)
  • More algorithms and devices (e.g. Nvidia's Jetson Nano)
  • Integrate with existing systems (e.g. Snort)
• Code available online

SLIDE 28

Backup Slides

SLIDE 29

Background (1/3)

• Snort
  • The de facto NIDS
  • Signature based (malicious signatures are known in advance)
  • The main pipeline looks like this:

[Figure: Snort pipeline; the stage that includes pattern matching accounts for more than 70% of running time]

SLIDE 30

Algorithms (CPU)

The Aho-Corasick algorithm

• Used in many Network Intrusion Detection Systems
• Builds a State Machine (SM) from all the patterns
• Traverses the SM reading the input byte by byte

Benefits

  • Only one lookup per input byte

Limitations

  • Poor cache locality
  • Data dependencies

"Efficient String Matching: An Aid to Bibliographic Search", A. Aho, M. Corasick, Communications of the ACM, 1975

SLIDE 31

Related work

• State machine based
  • Aho-Corasick
    "Efficient String Matching: An Aid to Bibliographic Search", A. Aho, M. Corasick, Communications of the ACM, 1975

    Benefits
      • Only one lookup per input byte

    Limitations
      • Poor cache locality
      • Data dependencies

• Filter based
  • DFC
    "DFC: Accelerating String Pattern Matching for Network Applications", Choi et al., NSDI '16

    Pattern set: … activate admin.dll backdoor get.asp …
    Filter (8 KB): ac ad ab ... ba bb ... ge ... (bits: … 1 … 1 1 … 1 1 1 1)

    Benefits
      • Cache locality (on filtering)
      • No data dependencies

    Limitations
      • Much of the hardware remains underutilized (e.g. vector instructions?)