
Generalized Pattern Matching Micro-Engine



  1. Generalized Pattern Matching Micro-Engine
     Yuanwei Fang*, Raihan Rasool‡, Dilip Vasudevan*, Andrew A. Chien*†
     * University of Chicago, † Argonne National Laboratory, ‡ King Faisal University

  2. Big Data Applications
     ◦ Deep Packet Inspection
     ◦ Bioinformatics (DNA Alignment)
     ◦ JSON/XML Parsing
     ◦ Signal Triggering
     6/24/2014 UNIVERSITY OF CHICAGO

  3. Deep Packet Inspection
     High speed network: 100 Gb/s
     Growing number of patterns: 6000 Snort rules
     Speed requirement: > 75 Tera DFAops/s
     Power budget: 200 W
     Energy efficiency requirement: > 375 Gops/J
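The requirement figures on this slide line up under one simple reading, sketched below. The derivation is an assumption, since the slide only states the results: each rule is treated as its own DFA, and each DFA performs one transition (one DFAop) per input byte.

```python
# Back-of-envelope check of the Deep Packet Inspection requirements.
# Assumption (not stated on the slide): one DFA per rule, one DFAop per
# DFA per input byte.

LINK_RATE_BITS_PER_S = 100e9   # 100 Gb/s network
NUM_RULES = 6000               # Snort rules
POWER_BUDGET_W = 200           # watts


def required_dfaops_per_s(link_bits_per_s: float, num_rules: int) -> float:
    """Transitions per second needed to keep up with the link."""
    bytes_per_s = link_bits_per_s / 8
    return bytes_per_s * num_rules


def required_ops_per_joule(dfaops_per_s: float, power_w: float) -> float:
    """Energy efficiency needed to stay inside the power budget."""
    return dfaops_per_s / power_w


ops = required_dfaops_per_s(LINK_RATE_BITS_PER_S, NUM_RULES)
eff = required_ops_per_joule(ops, POWER_BUDGET_W)
print(f"{ops / 1e12:.0f} Tera DFAops/s, {eff / 1e9:.0f} Gops/J")
```

Under that assumption, 12.5 GB/s times 6000 rules gives 75 Tera DFAops/s, and dividing by 200 W gives 375 Gops/J, matching the slide.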

  4. Bioinformatics (DNA Alignment)
     Genome size: 130G base pairs
     Bioinformatics database: millions of species
     Speed requirement: > 1 Tera DFAops/s
     Power budget: 200 W
     Energy efficiency requirement: > 5 Gops/J

  5. Deterministic Finite Automata (DFA)
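The slide's figure did not survive extraction. As a reminder of what one DFAop is, here is a minimal table-driven DFA sketch. The automaton, its state numbering, and the names `TRANSITIONS`, `step`, and `run_dfa` are illustrative assumptions, not taken from the slides.

```python
from typing import Dict, Set, Tuple

# Hypothetical example DFA: accepts any string that contains "ab".
# States: 0 = start, 1 = just saw "a", 2 = saw "ab" (accepting, absorbing).
TRANSITIONS: Dict[Tuple[int, str], int] = {
    (0, "a"): 1,
    (1, "a"): 1,
    (1, "b"): 2,
}
ACCEPTING: Set[int] = {2}


def step(state: int, ch: str) -> int:
    """One DFAop: a single table lookup from (state, input) to next state."""
    if state in ACCEPTING:
        return state  # accepting state is absorbing in this example
    return TRANSITIONS.get((state, ch), 0)  # unlisted pairs fall back to start


def run_dfa(text: str) -> bool:
    """Feed every character through the DFA and report acceptance."""
    state = 0
    for ch in text:
        state = step(state, ch)
    return state in ACCEPTING


print(run_dfa("xxab"), run_dfa("ba"))
```

Each input character costs exactly one lookup, which is why DFA throughput is counted in DFAops per second.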

  6. Programmable Approaches
     Target: Intel Xeon E5-2600, 17G DFAops/second at 130 W (0.13 Gops/J)

  7. Approach
     ◦ Workload
        M input characters (M DFA transitions per rule)
        N DFA rules run over the M input characters
     ◦ Goal
        Compute N x M transitions efficiently
     ◦ Approach
        Parallelize DFA execution
        Fused instructions
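The N x M workload can be sketched in software as N transition tables advanced in lockstep over the same M input bytes. This is only an emulation under assumptions: the substring-matching DFAs, the KMP-style table construction, and the names `build_dfa` and `run_lockstep` are hypothetical and stand in for whatever rule compiler and hardware the authors use.

```python
from typing import List, Tuple

Table = List[List[int]]  # table[state][byte] -> next state


def build_dfa(pattern: bytes) -> Table:
    """KMP-style DFA for 'input contains pattern' (pattern must be non-empty).

    State len(pattern) is accepting and absorbing.
    """
    m = len(pattern)
    table: Table = [[0] * 256 for _ in range(m + 1)]
    table[0][pattern[0]] = 1
    restart = 0  # state reached by the longest proper border so far
    for j in range(1, m):
        table[j] = table[restart][:]      # mismatch: behave like restart state
        table[j][pattern[j]] = j + 1      # match: advance
        restart = table[restart][pattern[j]]
    table[m] = [m] * 256                  # stay accepted once matched
    return table


def run_lockstep(tables: List[Table], data: bytes) -> Tuple[List[int], int]:
    """Advance all N DFAs by one transition per input byte."""
    states = [0] * len(tables)
    transitions = 0
    for byte in data:  # M input characters
        # This per-byte update of all N states is the part to parallelize.
        states = [t[s][byte] for t, s in zip(tables, states)]
        transitions += len(tables)
    return states, transitions


patterns = [b"abc", b"bcd", b"xyz"]
tables = [build_dfa(p) for p in patterns]
states, count = run_lockstep(tables, b"zabcdz")
matched = [s == len(p) for s, p in zip(states, patterns)]
print(matched, count)
```

With N = 3 rules and M = 6 input bytes the loop performs 18 transitions, which is the N x M count the slide refers to.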

  8. What Is a Micro-Engine?
     The Generalized Pattern Matching Micro-Engine (GenPM) is one micro-engine of the 10x10 approach.
     [Figure: block diagram showing Local Memory, a Basic RISC CPU, GenPM, and numbered micro-engines (2, 3, 4, 6, 7, 8), each with its own I-Cache, above a Shared L1 Data Cache]

  9. GenPM Micro Architecture

  10. Fused Instructions: Multi-Step [Figure, animation frame 1: datapath with a String buffer holding "a b c", Acc_Vec (0, 1), Current State and Next State registers, Accept, ALU, address into Local Mem, and ENB]

  11. Fused Instructions: Multi-Step [Figure, animation frame 2: same datapath, Acc_Vec (0, 1)]

  12. Fused Instructions: Multi-Step [Figure, animation frame 3: same datapath, Acc_Vec (0, 1, 0)]

  13. Fused Instructions: Multi-Step [Figure, animation frame 4: same datapath, Acc_Vec (0, 1, 0), followed by a CHECK of Acc_Vec]
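The figure labels suggest that one fused instruction consumes several buffered characters, records one accept bit per step in Acc_Vec, and leaves the final state for the next instruction, with a later CHECK locating the match. The sketch below emulates that reading in software. The semantics are inferred from the labels only, and the DFA and the names `multi_step` and `check` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical DFA: state 2 means "the input so far ends with 'ab'".
NEXT: Dict[Tuple[int, str], int] = {
    (0, "a"): 1,
    (1, "a"): 1,
    (1, "b"): 2,
    (2, "a"): 1,
}
ACCEPT: Set[int] = {2}


def multi_step(state: int, buffer: str) -> Tuple[int, List[int]]:
    """Emulate one multi-step instruction over a short string buffer.

    Returns the next state and an accept vector with one bit per step.
    """
    acc_vec: List[int] = []
    for ch in buffer:
        state = NEXT.get((state, ch), 0)  # unlisted pairs return to start
        acc_vec.append(1 if state in ACCEPT else 0)
    return state, acc_vec


def check(acc_vec: List[int]) -> List[int]:
    """Return the step offsets at which the DFA was in an accepting state."""
    return [i for i, bit in enumerate(acc_vec) if bit]


next_state, acc = multi_step(0, "abc")
print(next_state, acc, check(acc))
```

For the buffer "a b c" this example DFA yields the accept bits 0, 1, 0, so the check reports a match ending at offset 1. Keeping the per-step bits is what lets software recover the precise position after a multi-character step.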

  14. Parallel DFA: Vector Instruction [Figure: SSE ADD, one addition per vector lane]

  15. Parallel DFA: Vector Instruction [Figure: GMVSNEXT, one DFAop per vector lane]

  16. GenPM Code Example [Code listing not captured; annotated phases:]
     ◦ Data movement
     ◦ Multi-step parallel DFA execution
     ◦ Find precise matching position

  17. Methodology
     • Design space: parallelism and step length
     • Baseline
        • 32-bit 6-stage in-order RISC
        • 4GB DDR3 DRAM
        • 32KB L1 I-cache, 24KB L1 D-cache, 512KB L2 (modeled on Intel Silverthorne)
     • GenPM
        • 1MB local memory (up to 64 banks)
        • Vector and fused instructions
     • Performance/Power Model
        • Core: 32nm synthesis by Synopsys Processor Designer
        • Memories: MARSSx86/CACTI 6 + DRAMSim2
     • Workload
        • 64 Snort rules from the 2.9.5.6 snapshot, 10KB random network dump

  18. Performance Speedup [Figure: bar chart of speedup versus RISC (y-axis 0 to 3000) against step length (1, 8, 16) for GenPM_8way and GenPM_64way; bar labels in the extraction: 36, 243, 289, 300, 1947, 2498]

  19. Energy Efficiency [Figure: bar chart of energy improvement versus RISC (y-axis 0 to 1200) against step length (1, 8, 16) for GenPM_8way and GenPM_64way; bar labels in the extraction: 31, 151, 174, 213, 861, 980]

  20. Throughput/watt (absolute) [Figure: bar chart of throughput per watt in Gops/J (y-axis 0 to 40) against step length (1, 8, 16) for GenPM_8way and GenPM_64way]
     Scaled to a 75 W chip, GenPM delivers > 2.6 Tera DFAops/second
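The scaling claim can be turned back into an efficiency figure and compared with the Xeon number from slide 6. This is plain arithmetic on values quoted in the slides. Since 2.6 Tera DFAops/second is given as a lower bound, the results are lower bounds too.

```python
# Efficiency implied by the 75 W scaling claim, versus the Xeon baseline.
CHIP_POWER_W = 75.0
GENPM_DFAOPS_PER_S = 2.6e12   # "> 2.6 Tera DFAops/second" (lower bound)
XEON_GOPS_PER_J = 0.13        # Intel Xeon E5-2600 figure from slide 6

genpm_gops_per_j = GENPM_DFAOPS_PER_S / CHIP_POWER_W / 1e9
ratio_vs_xeon = genpm_gops_per_j / XEON_GOPS_PER_J

print(f"GenPM: {genpm_gops_per_j:.1f} Gops/J, "
      f"about {ratio_vs_xeon:.0f}x the Xeon figure")
```

The result is roughly 34.7 Gops/J, which fits under the 40 Gops/J top of the chart's y-axis.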

  21. Energy Breakdown [Figure: stacked bars of total energy (0% to 100%) for RISC, GenPM_8B_1S, GenPM_8B_8S, GenPM_8B_16S, GenPM_64B_1S, GenPM_64B_8S, GenPM_64B_16S; components: LM, L1_I, L1_D, L2, DRAM, Core; annotation: LM_max = 83%]

  22. General Comparison

  23. Related Work
     ASIC: [Brodie et al., ISCA 2006], [Titan IC Systems RXP], [Cisco SCE]
     FPGA: [Yang Xu et al., ANCS 2011], [T. Song et al., INFOCOM 2008], [I. Sourdis et al., VLSI 2008]
     CPU: [Mytkowicz et al., ASPLOS 2014], [Intel Hyperscan]
     GPU: [G. Vasiliadis et al., CCS 2011], [C.-H. Lin et al., INFOCOM 2012]
     SoC: [C. Johnson et al., ISSCC 2010], [Cavium Octeon], [IBM PowerEN]

  24. Summary
     • GenPM is a high-performance, energy-efficient accelerator for pattern matching workloads
     • The ISA exploits parallelism and multi-step execution
     • Scaled to a 75 W chip, GenPM delivers > 2.6 Tera DFAops/second
     • GenPM approaches ASIC efficiency and integrates it into a programmable core

  25. Future Work
     • DFA table compression
     • Scale up with multiple GenPM micro-engines
     • Explore more applications

  26. Acknowledgements
     • Defense Advanced Research Projects Agency (DARPA)
     • Agilent Technologies (now Keysight Technologies)
     • Synopsys Academic Program
     • Dr. Tung Hoang and members of the Large Scale Systems Group in the Department of Computer Science
