Co-Evaluation of Pattern Matching Algorithms on IoT Devices with Embedded GPUs
Charalampos Stylianopoulos, Simon Kindström, Magnus Almgren, Olaf Landsiedel, Marina Papatriantafilou
Distributed Computing and Systems

Motivation
- IoT security is a concern
- Recent attacks:
  - Show that IoT security is lacking
    - Mirai botnet
    - Attacks on a casino's aquarium thermostat
  - Underline the need for countermeasures

Motivation
Standard security countermeasures (e.g. NIDS) can be applied
  - on the IoT devices themselves
  - on the entry point to the network of IoT devices

Motivation
- Challenges
  - Resource-constrained devices
  - More connected devices -> more traffic to inspect
- NIDS
  - Performance bottleneck
  - Not tailored to hardware

Motivation: Pattern matching
Pattern matching = the core functionality of NIDS
Goal: Compare all network traffic against all malicious signatures. Search for all patterns, anywhere in the network stream.
Input stream: … http://some.site.com/get.asp?f=/etc/passwd … GET HTTP try_backdoor …
Pattern set: … /etc/passwd, admin.dll, get.asp, backdoor …
Pattern matching takes more than 70% of running time [1]
[1] "Generating realistic workloads for network intrusion detection systems", Antonatos et al.

Motivation: New Devices
- Opportunities
  - IoT/embedded hardware is evolving
  - New hardware features
    - Example: ODROID single-board computers with embedded Graphics Processing Units (GPUs)
Making use of those features is an open issue

Our work
- The questions we are trying to answer in this work:
  - Which algorithms to use?
  - What are the hardware characteristics that affect the performance?
  - How to create new algorithms that make best use of those characteristics?

Our work
- Co-evaluation of pattern matching algorithms
  - Evaluate existing implementations
  - Influence the design of new ones
- Target embedded GPUs
  - Deep look into their architectural features
- Extensive evaluation
  - Different datasets, patterns
  - Energy efficiency

Outline
- Background
  - GPU computing
- Our Benchmark
- Evaluation

Background
- General Purpose GPU computing (GPGPU)
  - Other than graphics, GPUs can be used for general tasks as well
  - Highly parallel architecture
- Pattern matching on a GPU: not a new thing
  - Not much work on embedded GPUs
[1] "Gnort: High Performance Network Intrusion Detection Using Graphics Processors", Vasiliadis et al., RAID 2008
[2] "APUNet: Revitalizing GPU as Packet Processing Accelerator", Go et al., NSDI 2017
[3] "A highly-efficient memory-compression scheme for GPU-accelerated intrusion detection systems", Bellekens et al., SINCONF 2017

Background
- The platform
Source: "Energy efficient run-time mapping and thread partitioning of concurrent OpenCL applications on CPU-GPU MPSoCs"

Background
Important characteristics (unique to embedded GPUs)
- Small number of cores/threads
- No main memory on the GPU
  -> Shared main memory between CPU and GPU
- No local memory on chip
- Vectorization in each GPU thread
- Separate instruction counter per GPU thread
  -> No need to worry about divergent execution

Outline
- Background
- Our Benchmark
  - Algorithms
  - Optimizations
- Evaluation

Algorithms
Representative algorithms from two categories:
        Filtering based    State machine based
CPU     DFC                Aho-Corasick
GPU

Algorithms (CPU): The Aho-Corasick algorithm
- Used in many Network Intrusion Detection Systems
- Builds a State Machine (SM) from all the patterns
- Traverses the SM reading the input byte by byte
Benefits
  - Only one lookup per input byte
Limitations
  - Poor cache locality
  - Data dependencies
"Efficient String Matching: An Aid to Bibliographic Search", A. Aho, M. Corasick, ACM Comm. '75
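
A minimal Python sketch of the state-machine idea on this slide, assuming patterns and input are byte strings. The names build_automaton and search are illustrative and are not taken from the presented benchmark. Note that this version follows failure links while scanning instead of precomputing the full transition table, so it does not give the strict one-lookup-per-byte behaviour of the table-driven form.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure and output tables from byte-string patterns."""
    goto = [{}]   # goto[state][byte] -> next state
    fail = [0]    # failure link for each state
    out = [[]]    # patterns recognised on reaching each state
    for pat in patterns:
        state = 0
        for b in pat:
            if b not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][b] = len(goto) - 1
            state = goto[state][b]
        out[state].append(pat)
    # Breadth-first pass: states at depth 1 keep failure link 0.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for b, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and b not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(b, 0)
            out[t] = out[t] + out[fail[t]]
    return goto, fail, out


def search(data, automaton):
    """Return (start_offset, pattern) for every match in data."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for i, b in enumerate(data):
        # Each step depends on the previous state: the data dependency
        # listed under Limitations.
        while state and b not in goto[state]:
            state = fail[state]
        state = goto[state].get(b, 0)
        for pat in out[state]:
            matches.append((i - len(pat) + 1, pat))
    return matches


patterns = [b"/etc/passwd", b"admin.dll", b"get.asp", b"backdoor"]
automaton = build_automaton(patterns)
print(search(b"http://some.site.com/get.asp?f=/etc/passwd", automaton))
# [(21, b'get.asp'), (31, b'/etc/passwd')]
```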

Algorithms (CPU): The DFC algorithm
- Creates a filter from patterns
- Quickly filters out parts of the input
Pattern set: … activate, admin.dll, backdoor, get.asp …
Filter (8 KB): one bit per two-byte sequence (… ab, ac, ad, … ba, bb, … ge …). Fits in cache!
Input stream: … this is an input
"DFC: Accelerating String Pattern Matching for Network Applications", Choi et al., NSDI '16
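
A rough Python sketch of the filtering step on this slide. It assumes the filter holds one bit per two-byte sequence (65536 bits = 8 KB) and that each pattern sets the bit for its first two bytes. The names build_filter and candidates are hypothetical, and patterns shorter than two bytes are rejected to keep the sketch short.

```python
FILTER_BYTES = 8192  # 65536 bits, one per possible two-byte sequence


def build_filter(patterns):
    """Set one bit for the first two bytes of every pattern."""
    bits = bytearray(FILTER_BYTES)
    for pat in patterns:
        if len(pat) < 2:
            raise ValueError("this sketch only handles patterns of 2+ bytes")
        key = pat[0] | (pat[1] << 8)
        bits[key >> 3] |= 1 << (key & 7)
    return bits


def candidates(data, bits):
    """Return input offsets whose two-byte window passes the filter."""
    hits = []
    for i in range(len(data) - 1):
        # Every window is checked independently of the others,
        # so there is no data dependency between positions.
        key = data[i] | (data[i + 1] << 8)
        if bits[key >> 3] & (1 << (key & 7)):
            hits.append(i)
    return hits


bits = build_filter([b"activate", b"admin.dll", b"backdoor", b"get.asp"])
print(candidates(b"this is an input", bits))  # []
print(candidates(b"try_backdoor", bits))      # [4, 5]
```

In the second call, offset 5 ("ac") passes the filter only because "activate" shares that prefix. It is a false positive that a later verification step has to discard.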

Algorithms (CPU): The DFC algorithm (continued)
- Progressive filtering
  - in cache
- Verification
  - in memory
[Figure: initial filter, then pattern-length-specific filters, then hash tables; length classes shown: 1 B, 2-3 B, 4-7 B, 8+ B]
Benefits
  - Cache locality (on filtering)
  - No data dependencies
Limitations
  - Verification phase is costly
"DFC: Accelerating String Pattern Matching for Network Applications", Choi et al., NSDI '16
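
A minimal Python sketch of the verification step, under simplifying assumptions: a plain dict keyed by the first two bytes of each pattern stands in for the hash tables, and the length-specific filters are left out. The names build_tables and verify are hypothetical.

```python
def build_tables(patterns):
    """Group patterns by their first two bytes (stand-in for the hash tables)."""
    tables = {}
    for pat in patterns:
        tables.setdefault(pat[:2], []).append(pat)
    return tables


def verify(data, pos, tables):
    """Compare every pattern sharing the two-byte prefix at pos against the input."""
    found = []
    for pat in tables.get(data[pos:pos + 2], []):
        # Full comparison against the input: the costly part,
        # since the tables live in main memory rather than in cache.
        if data.startswith(pat, pos):
            found.append(pat)
    return found


tables = build_tables([b"activate", b"admin.dll", b"backdoor", b"get.asp"])
print(verify(b"try_backdoor", 4, tables))  # [b'backdoor']
print(verify(b"try_backdoor", 5, tables))  # []
```

Offsets 4 and 5 are the positions a two-byte prefix filter would flag in this input. Only offset 4 survives verification.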

Algorithms
Representative algorithms from two categories:
        Filtering based    State machine based
CPU     DFC                Aho-Corasick
GPU     DFC (GPU)          PFAC [1]
GPU     HYBRID
[1] "Accelerating Pattern Matching Using a Novel Parallel Algorithm on GPUs", Lin et al., TOC 2013

Hardware-oriented optimizations
Relevant aspects that we investigate:
- Memory mapping vs data transfers
  - 2-5X faster with memory mapping
- Placement of the filters
  - Global memory
  - Texture memory
  - Local memory
- Vectorization
  - No significant speedup
More in the paper…

Outline
- Background
- Our Benchmark
- Evaluation

Evaluation Methodology
Hardware
  CPU:     4 ARM big.LITTLE
  GPU:     ARM Mali-T628 (6 shader cores)
  Memory:  2 GB RAM
  Sensors: On-board energy sensors
Datasets
  - 3 publicly available traffic traces
  - 1 randomly generated data set
Malicious patterns
  - 2183 patterns (from Snort)
  - 5000 patterns (emergingthreats.net)

Evaluation Methodology
- Goal of the evaluation:
  1. How fast we can process the input (execution time)
  2. How much energy we spend on processing (energy consumption)
  3. Effect of datasets and number of patterns
  4. Influence the design of new algorithms
- Versions:
  - CPU: Aho-Corasick, DFC
  - GPU: PFAC, DFC on GPU (w/wo vectorization), HYBRID (w/wo vectorization)

Evaluation Results
- Experiment 1: execution time breakdown
[Figure: CPU versions and GPU versions; labels: CPU->GPU, Post-processing, Vect]
(Post-processing = Output which and how many patterns matched, on the CPU)

Evaluation Results
- Experiment 2: energy consumption

Evaluation Results
- Experiment 3: effect of datasets and #patterns
[Figure panels: 2183 patterns, 5000 patterns]

Evaluation Results
- Experiment 4: configuring Hybrid
Bigger filter =
  - Slower access time (green trend, left y-axis)
  - Higher hit ratio -> less verification (red trend, right y-axis)

Conclusions & Future Work
- Conclusions
  - New hardware features (embedded GPUs) can alleviate the bottleneck of pattern matching
  - Architecture characteristics are important for high performance and low energy consumption
  - Possible to design new algorithms tailored to the hardware
- Future Work
  - Overlap CPU/GPU execution (heterogeneous design)
  - More algorithms and devices (e.g. Nvidia's Jetson Nano)
  - Integrate with existing systems (e.g. Snort)
- Code available online

Backup Slides

Background (1/3)
- Snort
  - The de facto NIDS
  - Signature based (malicious signatures are known in advance)
  - In the main pipeline, the stage that includes pattern matching takes more than 70% of running time

Related work
- State machine based: Aho-Corasick
  Benefits
    - Only one lookup per input byte
  Limitations
    - Poor cache locality
    - Data dependencies
- Filter based: DFC
  Pattern set: … activate, admin.dll, backdoor, get.asp …
  Filter (8 KB)
  Benefits
    - Cache locality (on filtering)
    - No data dependencies
  Limitations
    - Much of the hardware remains underutilized, e.g. vector instructions?
"Efficient String Matching: An Aid to Bibliographic Search", A. Aho, M. Corasick, ACM Comm. '75
"DFC: Accelerating String Pattern Matching for Network Applications", Choi et al., NSDI '16
