
Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices
Authors: Yu-Hsin Chen, Tien-Ju Yang, Joel S. Emer, Vivienne Sze
Presented by: Florian Mahlknecht, Systems Group, 2020-04-31


  1. Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices

  2. Energy Efficient Multimedia Systems Group ▪ Professor Vivienne Sze

  3. Design Principles ▪ Efficiency ▪ Latency ▪ Flexibility

  4. Motivation: Energy Efficiency ▪ 6 GB of data every 30 seconds ▪ avg. 2.5 kW (slide credits: Prof. Sze; see Wired, 02.06.2018)

  5. Motivation: Latency ▪ < 100 ms (Lin et al., 2018)

  6. Motivation: Edge Processing

  7. Motivation: Flexibility (Sze et al., 2017)

  8. Recall core operations ▪ each MAC: 4 memory transfers, 2 FLOPs ▪ parallelizable! (Sze et al., 2017)
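The per-MAC cost can be made concrete with a naive loop. Each multiply-accumulate performs 2 FLOPs (one multiply, one add) and, with no data reuse, touches memory 4 times: filter weight read, input activation read, partial-sum read, partial-sum write. A minimal sketch in plain Python (function and variable names are illustrative, not from the paper):

```python
def conv1d_naive(weights, ifmap):
    """Naive 1D convolution. Without reuse, every MAC implies 4
    memory transfers (weight read, ifmap read, psum read, psum write)
    and 2 FLOPs (one multiply, one add)."""
    out_len = len(ifmap) - len(weights) + 1
    ofmap = [0] * out_len
    for x in range(out_len):
        for i, w in enumerate(weights):
            # the one MAC: 2 FLOPs, 4 memory transfers in the naive model
            ofmap[x] = ofmap[x] + w * ifmap[x + i]
    return ofmap

print(conv1d_naive([1, 2, 3], [1, 2, 3, 4, 5]))  # → [14, 20, 26]
```

The independence of the output positions is what makes the loop parallelizable, which is exactly what the accelerator's PE array exploits.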

  9. Architecture Overview ▪ CPU / GPU (temporal architecture): centralized control for the ALUs, data fetched from memory ▪ Processing Engines (spatial architecture): local memory and control logic in each PE (Sze et al., 2017)

  10. Memory access cost ▪ one DRAM access costs roughly as much energy as 20,000 8-bit additions (Hennessy, 2019)

  11. Memory access cost on a Spatial Architecture (Sze et al., 2017)

  12. Memory access speed (slide credits: Prof. Koumoutsakos, ETH)

  13. Exploiting reuse opportunities ▪ Convolutional Reuse ▪ Filter Reuse ▪ Ifmap Reuse (Chen et al., 2017)

  14. Row Stationary Dataflow ▪ split the 2D convolution into 1D CONV primitives ▪ map each primitive onto one PE: 1 row of filter weights, 1 row of ifmap ▪ the row pair remains stationary: psums and weights kept in local registers, the ifmap row moves through as a sliding window (Chen et al., 2017)

  15. Row Stationary 2D convolution ▪ ofmap column 1 = filter(1,2,3) × input(1,2,3), filter(1,2,3) × input(2,3,4), filter(1,2,3) × input(3,4,5) ▪ filter rows reused horizontally ▪ ifmap rows reused diagonally ▪ psums accumulated vertically (Chen et al., 2017)
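The composition described above can be sketched in software: each PE runs a 1D convolution primitive with its filter row held stationary, and a column of PEs assembles one ofmap row by accumulating partial sums vertically. This is a behavioral sketch of the dataflow, not the hardware mapping; all names are illustrative.

```python
def conv1d_pe(filter_row, ifmap_row):
    """One PE: 1D CONV primitive. The filter row stays stationary
    while a sliding window of the ifmap row passes through."""
    out = []
    for x in range(len(ifmap_row) - len(filter_row) + 1):
        out.append(sum(w * ifmap_row[x + i] for i, w in enumerate(filter_row)))
    return out

def conv2d_row_stationary(filt, ifmap):
    """Assemble the 2D convolution from 1D primitives: for ofmap row y,
    PE r convolves filter row r with ifmap row y + r, and the partial
    sums are accumulated vertically across the PE column."""
    R, H = len(filt), len(ifmap)
    ofmap = []
    for y in range(H - R + 1):
        psum = conv1d_pe(filt[0], ifmap[y])
        for r in range(1, R):
            row_out = conv1d_pe(filt[r], ifmap[y + r])
            psum = [a + b for a, b in zip(psum, row_out)]
        ofmap.append(psum)
    return ofmap
```

Note the reuse the slide lists: a filter row is reused for every ofmap row (horizontally), an ifmap row feeds several ofmap rows (diagonally), and psums only ever move vertically between PEs.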

  16. AlexNet example mapping (Chen et al., 2017)

  17. Row Stationary Dataflow (Sze et al., 2017)

  18. Eyeriss v1 ▪ processes the network layer by layer (Chen et al., 2017)

  19. Scalability: Eyeriss v1 (Chen et al., 2019)

  20. Scalability: Eyeriss v2 (Chen et al., 2019)

  21. Design Principles, Eyeriss v2 ▪ Efficiency ▪ Latency ▪ Flexibility ▪ Scalability

  22. Hierarchical Mesh Network ▪ Eyeriss v1: flat multicast network ▪ Eyeriss v2: PEs and GLB banks grouped into clusters ▪ clusters connected in a hierarchical structure (Chen et al., 2019)

  23. Why use a Mesh? (Chen et al., 2019)

  24. Mesh operation modes ▪ CONV layer ▪ DW CONV layer ▪ FC layer (Chen et al., 2019)

  25. Eyeriss v2 Architecture (Chen et al., 2019)

  26. Network for input activations (Chen et al., 2019)

  27. Architecture Hierarchy (Chen et al., 2019)

  28. Specification (Chen et al., 2019)

  29. Exploit sparsity ▪ process weights and input activations in CSC (compressed sparse column) format ▪ latency gain from skipping zeros (Chen et al., 2019)
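The CSC idea can be illustrated with a minimal software sketch: a dense matrix is encoded as nonzero values, their row indices, and per-column pointers, and computation then touches only the nonzeros. This is a simplified illustration of the format, not Eyeriss v2's exact on-chip data layout; all names are illustrative.

```python
def to_csc(matrix):
    """Encode a dense matrix column-by-column into CSC form:
    data (nonzero values), row_idx (row of each nonzero), and
    col_ptr (where each column's nonzeros start in data)."""
    data, row_idx, col_ptr = [], [], [0]
    rows, cols = len(matrix), len(matrix[0])
    for c in range(cols):
        for r in range(rows):
            if matrix[r][c] != 0:
                data.append(matrix[r][c])
                row_idx.append(r)
        col_ptr.append(len(data))
    return data, row_idx, col_ptr

def csc_matvec(data, row_idx, col_ptr, x, rows):
    """y = A @ x computed directly from CSC: only nonzero entries
    are read and multiplied, so zero-valued work is skipped."""
    y = [0] * rows
    for c in range(len(col_ptr) - 1):
        for k in range(col_ptr[c], col_ptr[c + 1]):
            y[row_idx[k]] += data[k] * x[c]
    return y
```

Because the zeros are neither stored nor multiplied, sparsity saves both data movement and MAC cycles, which is where the latency gain on the slide comes from.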

  30. Exploit sparsity (Chen et al., 2019)

  31. Results for Eyeriss v2 (Chen et al., 2019)

  32. Comparison (Chen et al., 2019)

  33. Conclusion ▪ > 10× improvement over v1 ▪ sparse weights and iacts processed in the compressed domain ▪ flexibility from high bandwidth to high data reuse, covering the variety of filter shapes ▪ possible extension: add a cache ▪ Efficiency, Latency, Flexibility, Scalability

  34. Final comment (slide credits: Dr. Sergio Martin, ETH)

  35. Final comment

  36. References and image credits
  1. Hennessy, J. L. & Patterson, D. A. Computer Architecture: A Quantitative Approach. (Morgan Kaufmann, 2019).
  2. Sze, V., Chen, Y.-H., Yang, T.-J. & Emer, J. S. Efficient Processing of Deep Neural Networks: A Tutorial and Survey. Proc. IEEE 105, 2295–2329 (2017).
  3. Chen, Y.-H., Yang, T.-J., Emer, J. S. & Sze, V. Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices. IEEE J. Emerg. Sel. Topics Circuits Syst. 9, 292–308 (2019).
  4. Chen, Y.-H., Krishna, T., Emer, J. S. & Sze, V. Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks. IEEE J. Solid-State Circuits 52, 127–138 (2017).
  5. Lin, S.-C. et al. The Architectural Implications of Autonomous Driving: Constraints and Acceleration. In Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 751–766 (ACM, 2018). doi:10.1145/3173162.3173191.
  Images: MIT EEMS Group (slide 2), Audi (slide 4), Nvidia (slide 5), WabisabiLearning (slide 6), iStock.com/VictoriaBar (slide 31)
  Online video talks: slideslive.com, youtu.be/WbLQqPw_n88

  37. Q & A

  38. Additional slides for interested readers

  39. Energy Efficiency (Strubell et al., 2019)

  40. Motivation: Flexibility, Software

  41. Eyeriss v1 Implementation ▪ customized Caffe framework run on an NVIDIA development board ▪ Xilinx serves as PCI controller (Chen et al., 2017)

  42. Eyeriss v1 PE architecture (Chen et al., 2017)

  43. Router implementation details

  44. Network for psums (Chen et al., 2019)

  45. Network for weights (Chen et al., 2019)
