


  1. Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices Authors: Yu-Hsin Chen, Tien-Ju Yang, Joel S. Emer, Vivienne Sze Presented by: Florian Mahlknecht Systems Group | | Florian Mahlknecht 2020-04-31 1 / 37

  2. Energy Efficient Multimedia Systems Group ▪ Professor Vivienne Sze

  3. Design Principles: Efficiency, Latency, Flexibility

  4. Motivation: Energy Efficiency ▪ 6 GB of data every 30 seconds ▪ Avg. 2.5 kW (slide credits: Prof. Sze, see Wired 02.06.2018)

  5. Motivation: Latency ▪ < 100 ms (Lin et al., 2018)

  6. Motivation: Edge Processing

  7. Motivation: Flexibility (Sze et al., 2017)

  8. Recall core operations ▪ Per MAC: 4 memory transfers, 2 FLOPs ▪ Parallelizable! (Sze et al., 2017)
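The per-MAC cost can be made concrete with a toy Python loop (illustrative only, not the paper's implementation) that counts the 3 reads + 1 write and the multiply-add for every MAC of a 1D convolution:

```python
def conv1d_naive(weights, ifmap):
    """Naive 1D CONV, counting memory transfers and FLOPs per MAC."""
    reads = writes = flops = 0
    out_len = len(ifmap) - len(weights) + 1
    ofmap = [0.0] * out_len
    for o in range(out_len):        # each output is independent -> parallelizable
        for k in range(len(weights)):
            w = weights[k]          # read 1: filter weight
            x = ifmap[o + k]        # read 2: input activation
            p = ofmap[o]            # read 3: partial sum
            ofmap[o] = p + w * x    # write 1: partial sum; 2 FLOPs (mul + add)
            reads += 3
            writes += 1
            flops += 2
    return ofmap, reads, writes, flops
```

Every MAC thus moves four values for only two arithmetic operations, which is why the data movement, not the arithmetic, dominates the energy budget.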

  9. Architecture Overview ▪ CPU / GPU: vector / thread parallelism, centralized control for the ALUs, data fetched from memory ▪ Spatial architecture: processing engines with local memory and local control logic (Sze et al., 2017)

  10. Memory access cost ▪ One DRAM access costs roughly the energy of 20,000 8-bit additions (Hennessy, 2019)
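The ~20,000x figure can be sanity-checked against the commonly quoted 45 nm process energy numbers (the picojoule values below are assumptions from that literature, not taken from this deck):

```python
# Back-of-envelope energy ratio, assumed ~45 nm technology numbers:
ADD_8BIT_PJ = 0.03       # 8-bit integer add, picojoules
DRAM_ACCESS_PJ = 640.0   # off-chip DRAM access, picojoules

ratio = DRAM_ACCESS_PJ / ADD_8BIT_PJ
print(f"DRAM access ~= {ratio:,.0f}x an 8-bit add")
```

This is the motivation for keeping data on chip and reusing it as much as possible before touching DRAM.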

  11. Memory access cost on Spatial Architecture (Sze et al., 2017)

  12. Memory access speed (slide credits: Prof. Koumoutsakos ETH)

  13. Exploiting reuse opportunities ▪ Convolutional Reuse ▪ Filter Reuse ▪ Ifmap Reuse (Chen et al., 2017)

  14. Row Stationary Dataflow ▪ Split the 2D convolution into 1D CONV primitives ▪ Map each primitive onto one PE: 1 row of filter weights, 1 row of ifmap, sliding window ▪ Rows remain stationary: psums and weights kept in the PE's local registers (Chen et al., 2017)
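A minimal sketch of one such 1D primitive (function name is illustrative, not from the paper): a single PE keeps one row of filter weights resident in its register file and slides it over one ifmap row, accumulating each psum locally before passing it on:

```python
def pe_1d_primitive(weight_row, ifmap_row):
    """One row-stationary 1D CONV primitive, as run on a single PE."""
    out = []
    for start in range(len(ifmap_row) - len(weight_row) + 1):
        psum = 0.0                                 # accumulated in local registers
        for k, w in enumerate(weight_row):
            psum += w * ifmap_row[start + k]       # sliding-window MAC
        out.append(psum)
    return out
```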

  15. Row Stationary 2D convolution ▪ filter(1,2,3) x input(1,2,3), filter(1,2,3) x input(2,3,4), filter(1,2,3) x input(3,4,5) → ofmap column 1 (1, row_size) ▪ Filter rows reused horizontally ▪ Ifmaps reused diagonally ▪ Psums accumulated vertically (Chen et al., 2017)
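How the 1D primitives compose into a 2D convolution can be sketched as follows (names are illustrative): the PE at (filter row r, column c) convolves filter row r with ifmap row r + c, and the psums produced within one PE column are accumulated vertically into one ofmap row:

```python
def conv1d_row(wrow, xrow):
    """1D primitive: one filter row slid over one ifmap row."""
    n = len(xrow) - len(wrow) + 1
    return [sum(w * xrow[s + k] for k, w in enumerate(wrow)) for s in range(n)]

def conv2d_row_stationary(filt, ifmap):
    """2D CONV composed of 1D primitives, row-stationary style."""
    R, H = len(filt), len(ifmap)
    ofmap = []
    for c in range(H - R + 1):                           # one PE column per ofmap row
        rows = [conv1d_row(filt[r], ifmap[r + c]) for r in range(R)]
        ofmap.append([sum(col) for col in zip(*rows)])   # vertical psum accumulation
    return ofmap
```

Note that ifmap row r + c is needed by several (r, c) pairs along a diagonal, which is exactly the diagonal ifmap reuse on the slide.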

  16. AlexNet example mapping (Chen et al., 2017)

  17. Row Stationary Dataflow (Sze et al., 2017)

  18. Eyeriss v1 ▪ processes layer by layer (Chen et al., 2017)

  19. Scalability: Eyeriss v1 (Chen et al., 2019)

  20. Scalability: Eyeriss v2 (Chen et al., 2019)

  21. Design Principles Eyeriss v2: Efficiency, Latency, Flexibility, Scalability

  22. Hierarchical Mesh Network ▪ Replaces a flat multicast network ▪ PEs and GLB banks grouped into clusters ▪ Hierarchical structure (Chen et al., 2019)

  23. Why use a Mesh? (Chen et al., 2019)

  24. Mesh operation modes ▪ CONV layer ▪ DW CONV layer ▪ FC layer (Chen et al., 2019)

  25. Eyeriss v2 Architecture (Chen et al., 2019)

  26. Network for input activations (Chen et al., 2019)

  27. Architecture Hierarchy (Chen et al., 2019)

  28. Specification (Chen et al., 2019)

  29. Exploit sparsity ▪ Process weights and activations in CSC format ▪ Latency gain from skipping zeros (Chen et al., 2019)
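A minimal sketch of CSC (compressed sparse column) storage and a matrix-vector multiply that only issues MACs for nonzeros, which is where the latency gain comes from (function names are illustrative, not from the paper):

```python
def to_csc(matrix):
    """Encode a dense 2D list as CSC: values, row indices, column offsets."""
    data, row_idx, col_ptr = [], [], [0]
    for c in range(len(matrix[0])):
        for r in range(len(matrix)):
            if matrix[r][c] != 0:
                data.append(matrix[r][c])   # nonzero values, column-major
                row_idx.append(r)           # row index per nonzero
        col_ptr.append(len(data))           # start offset of each column
    return data, row_idx, col_ptr

def spmv_csc(n_rows, data, row_idx, col_ptr, x):
    """y = M @ x over the CSC encoding; zeros never enter the loop."""
    y = [0] * n_rows
    for c in range(len(col_ptr) - 1):
        for i in range(col_ptr[c], col_ptr[c + 1]):
            y[row_idx[i]] += data[i] * x[c]  # MAC only on nonzero weights
    return y
```

With pruned networks, most weights are zero, so the inner loop runs over far fewer elements than a dense multiply would.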

  30. Exploit sparsity (Chen et al., 2019)

  31. Results for Eyeriss v2 (Chen et al., 2019)

  32. Comparison (Chen et al., 2019)

  33. Conclusion ▪ > 10x improvement over v1 ▪ Processes sparse weights and iacts in the compressed domain ▪ Flexibility from high bandwidth to high data reuse, across a variety of filter shapes ▪ Possible extension: add a cache ▪ Efficiency, Latency, Flexibility, Scalability

  34. Final comment (slide credits: Dr. Sergio Martin ETH)

  35. Final comment

  36. References and image credits
  1. Hennessy, J. L. Computer Architecture: A Quantitative Approach (Morgan Kaufmann, 2019).
  2. Sze, V., Chen, Y.-H., Yang, T.-J. & Emer, J. S. Efficient Processing of Deep Neural Networks: A Tutorial and Survey. Proc. IEEE 105, 2295–2329 (2017).
  3. Chen, Y.-H., Yang, T.-J., Emer, J. S. & Sze, V. Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices. IEEE J. Emerg. Sel. Topics Circuits Syst. 9, 292–308 (2019).
  4. Chen, Y.-H., Krishna, T., Emer, J. S. & Sze, V. Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks. IEEE J. Solid-State Circuits 52, 127–138 (2017).
  5. Lin, S.-C. et al. The Architectural Implications of Autonomous Driving: Constraints and Acceleration. In Proc. 23rd Int. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 751–766 (ACM, 2018). doi:10.1145/3173162.3173191.
  Images: MIT EEMS Group (slide 2), Audi (slide 4), Nvidia (slide 5), WabisabiLearning (slide 6), iStock.com/VictoriaBar (slide 31)
  Online video talks: slideslive.com, youtu.be/WbLQqPw_n88

  37. Q & A

  38. Additional slides for interested readers

  39. Energy Efficiency (Strubell et al., 2019)

  40. Motivation: Flexibility, Software

  41. Eyeriss v1 Implementation ▪ Customized Caffe framework run on an NVIDIA development board ▪ Xilinx FPGA serves as PCI controller (Chen et al., 2017)

  42. Eyeriss v1 PE architecture (Chen et al., 2017)

  43. Router implementation details

  44. Network for psums (Chen et al., 2019)

  45. Network for weights (Chen et al., 2019)
