
Co-Evaluation of Pattern Matching Algorithms on IoT Devices with Embedded GPUs



  1. Co-Evaluation of Pattern Matching Algorithms on IoT Devices with Embedded GPUs
     - Charalampos Stylianopoulos, Simon Kindström, Magnus Almgren, Olaf Landsiedel, Marina Papatriantafilou
     - Distributed Computing and Systems

  2. Motivation
     - IoT security is a concern
     - Recent attacks:
       - Show that IoT security is lacking
         - Mirai botnet
         - Attacks on a casino's aquarium thermostat
       - Underline the need for countermeasures

  3. Motivation
     - Standard security countermeasures (e.g. NIDS) can be applied
       - on the IoT devices themselves
       - on the entry point to the network of IoT devices

  4. Motivation
     - Challenges
       - Resource-constrained devices
       - More connected devices -> more traffic to inspect
     - NIDS
       - Performance bottleneck
       - Not tailored to hardware

  5. Motivation: Pattern matching
     - Pattern matching = the core functionality of NIDS
     - Goal: compare all network traffic against all malicious signatures
     - Search for all patterns, anywhere in the network stream
     - Accounts for more than 70% of running time [1]
     - Figure, input stream: … http://some.site.com/get.asp?f=/etc/passwd … GET HTTP try_backdoor…
     - Figure, pattern set: …, /etc/passwd, admin.dll, get.asp, backdoor, …

     [1] "Generating realistic workloads for network intrusion detection systems", Antonatos et al.

  6. Motivation: New Devices
     - Opportunities
       - IoT/embedded hardware is evolving
       - New hardware features
         - Example: ODROID single-board computers with embedded Graphics Processing Units (GPUs)
     - Making use of those features is an open issue

  7. Our work
     - The questions we are trying to answer in this work:
       - Which algorithms to use?
       - Which hardware characteristics affect the performance?
       - How to create new algorithms that make the best use of those characteristics?

  8. Our work
     - Co-evaluation of pattern matching algorithms
       - Evaluate existing implementations
       - Influence the design of new ones
     - Target embedded GPUs
       - Deep look into their architectural features
     - Extensive evaluation
       - Different datasets and patterns
       - Energy efficiency

  9. Outline
     - Background
       - GPU computing
     - Our Benchmark
     - Evaluation

  10. Background
      - General-Purpose GPU computing (GPGPU)
        - Besides graphics, GPUs can be used for general tasks as well
        - Highly parallel architecture
      - Pattern matching on a GPU: not a new thing
        - Not much work on embedded GPUs

      [1] "Gnort: High Performance Network Intrusion Detection Using Graphics Processors", Vasiliadis et al., RAID 2008
      [2] "APUNet: Revitalizing GPU as Packet Processing Accelerator", Go et al., NSDI 2017
      [3] "A highly-efficient memory-compression scheme for GPU-accelerated intrusion detection systems", Bellekens et al., SINCONF 2017

  11. Background
      - The platform (figure)
      - Source: "Energy efficient run-time mapping and thread partitioning of concurrent OpenCL applications on CPU-GPU MPSoCs"

  12. Background: important characteristics (unique to embedded GPUs)
      - Small number of cores/threads
      - No main memory on the GPU
        - Shared main memory between CPU and GPU
      - No local memory on chip
      - Vectorization in each GPU thread
      - Separate instruction counter per GPU thread
        - No need to worry about divergent execution

  13. Outline
      - Background
      - Our Benchmark
        - Algorithms
        - Optimizations
      - Evaluation

  14. Algorithms
      - Representative algorithms from two categories: filtering based and state machine based
      - CPU: Aho-Corasick (state machine based), DFC (filtering based)
      - GPU: (introduced later in the deck)

  15. Algorithms (CPU): the Aho-Corasick algorithm
      - Used in many Network Intrusion Detection Systems
      - Builds a State Machine (SM) from all the patterns
      - Traverses the SM reading the input byte by byte
      - Benefits
        - Only one lookup per input byte
      - Limitations
        - Poor cache locality
        - Data dependencies

      "Efficient String Matching: An Aid to Bibliographic Search", A. Aho, M. Corasick, ACM Comm. '75
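
To make the state-machine idea concrete, here is a minimal Python sketch of Aho-Corasick. It is an illustration only, not the benchmarked implementation: the names `build_automaton` and `search` are hypothetical, and the sketch follows failure links at match time, whereas a fully expanded transition table would give exactly one lookup per input byte as stated on the slide.

```python
from collections import deque

def build_automaton(patterns):
    """Build goto/fail/output tables from a list of byte-string patterns."""
    goto = [{}]   # goto[state][byte] -> next state
    fail = [0]    # failure link of each state
    out = [[]]    # patterns recognised when a state is reached
    for pat in patterns:
        state = 0
        for b in pat:
            if b not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][b] = len(goto) - 1
            state = goto[state][b]
        out[state].append(pat)
    # Breadth-first pass: compute failure links level by level.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for b, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and b not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(b, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out

def search(data, automaton):
    """Scan data byte by byte; return (start_offset, pattern) for every match."""
    goto, fail, out = automaton
    state, matches = 0, []
    for i, b in enumerate(data):
        # Each step depends on the previous state (the data dependency).
        while state and b not in goto[state]:
            state = fail[state]
        state = goto[state].get(b, 0)
        for pat in out[state]:
            matches.append((i - len(pat) + 1, pat))
    return matches

automaton = build_automaton([b"he", b"she", b"his", b"hers"])
print(search(b"ushers", automaton))
# [(1, b'she'), (2, b'he'), (2, b'hers')]
```

The loop in `search` cannot process byte `i + 1` before byte `i`, which is the data dependency listed under limitations.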

  16. Algorithms (CPU): the DFC algorithm
      - Creates a filter from the patterns
      - Quickly filters out parts of the input
      - The filter (8 KB) fits in cache
      - Figure: pattern set (…, activate, admin.dll, backdoor, get.asp, …); filter bitmap indexed by two-byte prefixes (…, ab, ac, ad, …, ba, bb, …, ge, …); input stream (… this is an input …)

      "DFC: Accelerating String Pattern Matching for Network Applications", Choi et al., NSDI '16
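
The filter on this slide can be sketched as a bitmap with one bit per possible two-byte prefix (65536 bits = 8 KB). The Python below is a simplified illustration under that assumption; `build_filter` and `candidate_positions` are hypothetical names, and one-byte patterns, which a real filter must also handle, are rejected here.

```python
def build_filter(patterns):
    """Return a 65536-bit (8 KB) bitmap indexed by each pattern's first two bytes."""
    bitmap = bytearray(65536 // 8)
    for pat in patterns:
        if len(pat) < 2:
            raise ValueError("this sketch only handles patterns of 2+ bytes")
        idx = (pat[0] << 8) | pat[1]
        bitmap[idx >> 3] |= 1 << (idx & 7)
    return bitmap

def candidate_positions(data, bitmap):
    """Input positions whose two-byte window passes the filter."""
    hits = []
    for i in range(len(data) - 1):
        idx = (data[i] << 8) | data[i + 1]
        if bitmap[idx >> 3] & (1 << (idx & 7)):
            hits.append(i)
    return hits

patterns = [b"activate", b"admin.dll", b"backdoor", b"get.asp"]
bitmap = build_filter(patterns)
print(candidate_positions(b"this is an input", bitmap))  # []
print(candidate_positions(b"try_backdoor", bitmap))      # [4, 5]
```

In the second call, position 5 passes only because "ac" is also the prefix of "activate". Such false positives are why a filter hit still has to be verified against the full patterns.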

  17. Algorithms (CPU): the DFC algorithm (continued)
      - Progressive filtering
        - In cache
        - Initial filter, then pattern-length-specific filters (1 B, 2-3 B, 4-7 B, 8+ B)
      - Verification
        - In memory
        - Hash tables
      - Benefits
        - Cache locality (on filtering)
        - No data dependencies
      - Limitations
        - Verification phase is costly

      "DFC: Accelerating String Pattern Matching for Network Applications", Choi et al., NSDI '16
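
The two phases above can be sketched as follows. This is a deliberately reduced Python illustration, not the layout used by DFC itself: it assumes only two length classes (2-3 bytes and 4+ bytes), uses Python dictionaries in place of the hash tables, and all function names are hypothetical.

```python
def _index(two_bytes):
    return (two_bytes[0] << 8) | two_bytes[1]

def _set_bit(bitmap, idx):
    bitmap[idx >> 3] |= 1 << (idx & 7)

def _test_bit(bitmap, idx):
    return bool(bitmap[idx >> 3] & (1 << (idx & 7)))

def build(patterns):
    """Build two small filters plus per-length-class verification tables."""
    first = bytearray(8192)    # initial filter: bytes 0-1 of every pattern
    second = bytearray(8192)   # extra filter: bytes 2-3 of patterns with 4+ bytes
    short_table, long_table = {}, {}
    for pat in patterns:
        if len(pat) < 2:
            raise ValueError("this sketch only handles patterns of 2+ bytes")
        _set_bit(first, _index(pat[0:2]))
        if len(pat) >= 4:
            _set_bit(second, _index(pat[2:4]))
            long_table.setdefault(pat[:4], []).append(pat)
        else:
            short_table.setdefault(pat[:2], []).append(pat)
    return first, second, short_table, long_table

def match(data, tables):
    """Filter every position, then verify only the survivors."""
    first, second, short_table, long_table = tables
    found = []
    for i in range(len(data) - 1):
        window = data[i:i + 2]
        if not _test_bit(first, _index(window)):
            continue  # most positions stop here, touching only the small filter
        for pat in short_table.get(window, []):
            if data.startswith(pat, i):
                found.append((i, pat))
        # Progressive step: a second filter before the costly long-pattern check.
        if i + 4 <= len(data) and _test_bit(second, _index(data[i + 2:i + 4])):
            for pat in long_table.get(data[i:i + 4], []):
                if data.startswith(pat, i):
                    found.append((i, pat))
    return found

tables = build([b"activate", b"admin.dll", b"backdoor", b"get.asp"])
print(match(b"GET /get.asp?f=admin.dll try_backdoor", tables))
```

Each position is handled independently of the others, which is the "no data dependencies" benefit. The full-pattern comparison happens only in the verification tables, which is where the cost noted under limitations arises.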

  18. Algorithms
      - Representative algorithms from two categories: filtering based and state machine based
      - CPU: Aho-Corasick, DFC
      - GPU: DFC (GPU), PFAC [1], HYBRID

      [1] "Accelerating Pattern Matching Using a Novel Parallel Algorithm on GPUs", Lin et al., TOC 2013
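
The slide only names PFAC, so the following is a rough sketch of the general idea behind it rather than the evaluated code: the state machine has no failure links, and one thread is started per input position, each stopping as soon as no transition exists. A plain Python loop stands in for the GPU threads, and all names are hypothetical.

```python
def build_trie(patterns):
    """Trie without failure links: goto[state][byte] -> next state."""
    goto, ends = [{}], [None]
    for pat in patterns:
        state = 0
        for b in pat:
            if b not in goto[state]:
                goto.append({})
                ends.append(None)
                goto[state][b] = len(goto) - 1
            state = goto[state][b]
        ends[state] = pat
    return goto, ends

def thread_work(data, start, goto, ends):
    """Work of one thread: walk from `start` until no transition exists."""
    state, found = 0, []
    for pos in range(start, len(data)):
        state = goto[state].get(data[pos])
        if state is None:
            break  # no failure link to follow; this thread is finished
        if ends[state] is not None:
            found.append((start, ends[state]))
    return found

def pfac(data, patterns):
    """Sequential stand-in for launching one thread per input position."""
    goto, ends = build_trie(patterns)
    matches = []
    for start in range(len(data)):
        matches.extend(thread_work(data, start, goto, ends))
    return matches

print(pfac(b"ushers", [b"he", b"she", b"his", b"hers"]))
# [(1, b'she'), (2, b'he'), (2, b'hers')]
```

Because every call to `thread_work` is independent, the starting positions can be processed in any order or in parallel.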

  19. Hardware-oriented optimizations
      - Relevant aspects that we investigate:
        - Memory mapping vs. data transfers
          - 2-5X faster with memory mapping
        - Placement of the filters
          - Global memory
          - Texture memory
          - Local memory
        - Vectorization
          - No significant speedup
      - More in the paper…

  20. Outline
      - Background
      - Our Benchmark
      - Evaluation

  21. Evaluation Methodology
      - Hardware
        - CPU: 4 ARM big.LITTLE
        - GPU: ARM Mali-T628 (6 shader cores)
        - Memory: 2 GB RAM
        - Sensors: on-board energy sensors
      - Datasets
        - 3 publicly available traffic traces
        - 1 randomly generated data set
      - Malicious patterns
        - 2183 patterns (from Snort)
        - 5000 patterns (emergingthreats.net)

  22. Evaluation Methodology
      - Goals of the evaluation:
        1. How fast we can process the input (execution time)
        2. How much energy we spend on processing (energy consumption)
        3. Effect of datasets and number of patterns
        4. Influence the design of new algorithms
      - Versions:
        - CPU: Aho-Corasick, DFC
        - GPU: PFAC, DFC on GPU (with/without vectorization), HYBRID (with/without vectorization)

  23. Evaluation Results
      - Experiment 1: execution time breakdown
      - Figure: execution time of the CPU versions and the GPU versions; labels include CPU->GPU, Post-processing and Vect
      - Post-processing = output which and how many patterns matched, on the CPU

  24. Evaluation Results
      - Experiment 2: energy consumption (figure)

  25. Evaluation Results
      - Experiment 3: effect of datasets and number of patterns (figure panels: 2183 patterns, 5000 patterns)

  26. Evaluation Results
      - Experiment 4: configuring Hybrid
      - Bigger filter =
        - Slower access time (green trend, left y-axis)
        - Higher hit ratio -> less verification (red trend, right y-axis)
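
The slides do not describe how HYBRID sizes its filter, so the sketch below only illustrates one side of the trade-off, under the assumption that the filter is indexed by the first `k` bytes of each pattern: a longer prefix means a larger bitmap and fewer input positions left for verification. Access time is not modelled, and `pass_count` and `filter_bits` are hypothetical names.

```python
def pass_count(data, patterns, k):
    """Input positions whose k-byte window equals some pattern's k-byte prefix."""
    prefixes = {p[:k] for p in patterns if len(p) >= k}
    return sum(1 for i in range(len(data) - k + 1) if data[i:i + k] in prefixes)

def filter_bits(k):
    """Size in bits of a bitmap with one bit per possible k-byte prefix."""
    return 256 ** k

patterns = [b"activate", b"admin.dll", b"backdoor", b"get.asp"]
data = b"try_backdoor"
for k in (1, 2, 3):
    print(k, filter_bits(k) // 8, "bytes,", pass_count(data, patterns, k), "candidates")
# 1 32 bytes, 2 candidates
# 2 8192 bytes, 2 candidates
# 3 2097152 bytes, 1 candidates
```

The candidate count can only stay the same or shrink as `k` grows, while the bitmap grows by a factor of 256 per extra byte and soon stops fitting in cache.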

  27. Conclusions & Future Work
      - Conclusions
        - New hardware features (embedded GPUs) can alleviate the bottleneck of pattern matching
        - Architecture characteristics are important for high performance and low energy consumption
        - It is possible to design new algorithms tailored to the hardware
      - Future Work
        - Overlap CPU/GPU execution (heterogeneous design)
        - More algorithms and devices (e.g. Nvidia's Jetson Nano)
        - Integrate with existing systems (e.g. Snort)
      - Code available online

  28. Backup Slides

  29. Background (1/3)
      - Snort
        - The de facto NIDS
        - Signature based (malicious signatures are known in advance)
        - Figure: the main pipeline; the stage that includes pattern matching takes more than 70% of running time

  30. Algorithms (CPU): the Aho-Corasick algorithm
      - Used in many Network Intrusion Detection Systems
      - Builds a State Machine (SM) from all the patterns
      - Traverses the SM reading the input byte by byte
      - Benefits
        - Only one lookup per input byte
      - Limitations
        - Poor cache locality
        - Data dependencies

      "Efficient String Matching: An Aid to Bibliographic Search", A. Aho, M. Corasick, ACM Comm. '75

  31. Related work
      - State machine based: Aho-Corasick
        - Benefits
          - Only one lookup per input byte
        - Limitations
          - Poor cache locality
          - Data dependencies
      - Filter based: DFC
        - Figure: pattern set (…, activate, admin.dll, backdoor, get.asp, …) and filter bitmap (8 KB) indexed by two-byte prefixes (…, ab, ac, ad, …, ba, bb, …, ge, …)
        - Benefits
          - Cache locality (on filtering)
          - No data dependencies
        - Limitations
          - Much of the hardware remains underutilized, e.g. vector instructions?

      "Efficient String Matching: An Aid to Bibliographic Search", A. Aho, M. Corasick, ACM Comm. '75
      "DFC: Accelerating String Pattern Matching for Network Applications", Choi et al., NSDI '16
