Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices
Authors: Yu-Hsin Chen, Tien-Ju Yang, Joel S. Emer, Vivienne Sze
Presented by: Florian Mahlknecht
Systems Group | Florian Mahlknecht | 2020-04-31
▪ Professor Vivienne Sze
Energy Efficient Multimedia Systems Group
Design Principles
Efficiency Latency Flexibility
▪ 6 GB of data every 30 seconds
Motivation: Energy Efficiency
(slide credits: Prof. Sze, see Wired 02.06.2018)
Motivation: Latency
(Lin et al., 2018)
▪ < 100 ms
Motivation: Edge Processing
Motivation: Flexibility
(Sze et al., 2017)
Recall core operations
(Sze et al., 2017)
▪ 4 memory transfers, 2 FLOPs per MAC
▪ Parallelizable!
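The per-MAC cost above can be made explicit. A minimal sketch (not the Eyeriss implementation; function and variable names are illustrative) of one multiply-accumulate step, annotated with its four memory transfers and two FLOPs:

```python
# One MAC step of a convolution, written out to show where the
# 4 memory transfers and 2 FLOPs occur.

def mac(weight_mem, ifmap_mem, psum_mem, w_idx, i_idx, p_idx):
    w = weight_mem[w_idx]   # transfer 1: read filter weight
    x = ifmap_mem[i_idx]    # transfer 2: read input activation
    p = psum_mem[p_idx]     # transfer 3: read partial sum
    p = p + w * x           # 2 FLOPs: one multiply, one add
    psum_mem[p_idx] = p     # transfer 4: write updated partial sum
    return p

# Different output positions touch independent psums, so the
# surrounding loops are parallelizable.
mac([2], [3], [1], 0, 0, 0)
```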
Architecture Overview
▪ Temporal architectures (CPU / GPU): vector / thread parallelism, centralized control for ALUs, data fetched from memory
▪ Spatial architectures: processing engines with local memory and control logic
(Sze et al., 2017)
Memory access cost
(Hennessy, 2019)
▪ A DRAM access costs roughly 20,000× the energy of an 8-bit addition
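A back-of-the-envelope model shows why this ratio dominates the design. The only sourced number below is the ~20,000× ratio; the layer size is hypothetical and energy is measured in add-equivalent units:

```python
# Toy energy model: 8-bit add = 1 unit, DRAM access ~20,000 units.
ADD_8BIT = 1
DRAM_ACCESS = 20_000

def layer_energy(num_macs, dram_accesses_per_mac):
    # 2 FLOPs per MAC, approximated as 2 add-equivalent units
    compute = num_macs * 2 * ADD_8BIT
    memory = num_macs * dram_accesses_per_mac * DRAM_ACCESS
    return compute, memory

# Worst case: all 4 operands per MAC go to DRAM.
compute, memory = layer_energy(num_macs=10**6, dram_accesses_per_mac=4)
# Memory energy dwarfs compute energy by four orders of magnitude,
# which is why dataflows that maximize local reuse matter.
```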
Memory access cost on Spatial Architecture
(Sze et al., 2017)
Memory access speed
(slide credits: Prof. Koumoutsakos, ETH)
Exploiting reuse opportunities
(Chen et al., 2017)
▪ Convolutional Reuse
▪ Filter Reuse
▪ Ifmap Reuse
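The three reuse opportunities can be quantified from the layer shape alone. A sketch with hypothetical dimensions (stride 1, no padding; variable names follow the usual conv-layer notation, not any Eyeriss source):

```python
# Reuse factors for one conv layer: how many times each datum can be
# used after a single fetch.
H = W = 32      # ifmap height / width
R = S = 3       # filter height / width
M = 64          # number of filters (output channels)
N = 4           # batch size

E, F = H - R + 1, W - S + 1   # output height / width

conv_reuse   = E * F   # convolutional reuse: each weight slides over all output positions
filter_reuse = N       # filter reuse: each filter is applied to every ifmap in the batch
ifmap_reuse  = M       # ifmap reuse: each ifmap is processed by all M filters
```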
?
Row Stationary Dataflow
(Chen et al., 2017)
Split into 1D CONV primitives:
▪ 1 row of weights
▪ 1 row of ifmap
Map each primitive onto 1 PE:
▪ Row pairs remain stationary: psum and weights in local registers
▪ Sliding window
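The 1D primitive a single PE executes can be sketched as follows (stride 1 assumed; a plain-Python stand-in for the PE datapath, not the actual hardware):

```python
# 1D convolution primitive: one filter row and one ifmap row stay
# resident in the PE; a window slides across the ifmap row to
# produce one row of partial sums.

def conv1d_primitive(filter_row, ifmap_row):
    R = len(filter_row)
    out_len = len(ifmap_row) - R + 1
    psum_row = [0] * out_len          # psums held in local registers
    for e in range(out_len):          # sliding window over the ifmap row
        for r in range(R):
            psum_row[e] += filter_row[r] * ifmap_row[e + r]
    return psum_row

conv1d_primitive([1, 2, 3], [1, 2, 3, 4, 5])
```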
Row Stationary 2D convolution
(Chen et al., 2017)
▪ Filter rows reused horizontally
▪ Ifmaps reused diagonally
▪ Psums accumulated vertically
Sliding window:
▪ filter (1,2,3) × input (1,2,3)
▪ filter (1,2,3) × input (2,3,4)
▪ filter (1,2,3) × input (3,4,5)
(1, row_size)
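The 2D mapping above can be sketched in software (stride 1 assumed; the 1D helper is redefined here so the block is self-contained, and the code models only the psum flow, not the physical PE array):

```python
# Row Stationary 2D convolution built from 1D primitives: filter row r
# meets ifmap row (e + r) in one PE; the psums for output row e are
# accumulated vertically across the PE column.

def conv1d(filter_row, ifmap_row):
    R = len(filter_row)
    return [sum(filter_row[r] * ifmap_row[e + r] for r in range(R))
            for e in range(len(ifmap_row) - R + 1)]

def conv2d_row_stationary(filt, ifmap):
    R = len(filt)
    E = len(ifmap) - R + 1
    out = []
    for e in range(E):                 # one PE column per output row
        psum = None
        for r in range(R):             # vertical psum accumulation
            partial = conv1d(filt[r], ifmap[e + r])
            psum = partial if psum is None else [a + b for a, b in zip(psum, partial)]
        out.append(psum)
    return out
```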
Alex Net example mapping
(Chen et al., 2017)
Row Stationary Dataflow
(Sze et al., 2017)
Eyeriss v1
(Chen et al., 2017)
▪ layer by layer
Scalability Eyeriss v1
(Chen et al., 2019)
Scalability Eyeriss v2
(Chen et al., 2019)
Design Principles Eyeriss v2
Efficiency Latency Flexibility Scalability
Hierarchical Mesh Network
(Chen et al., 2019)
▪ Flat multicast network
▪ PEs and GLB grouped into clusters
▪ Hierarchical structure
Why use a Mesh?
(Chen et al., 2019)
Mesh operation modes
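The different operation modes trade bandwidth against reuse. An illustrative sketch (not the actual HM-NoC design; mode names and the routing function are simplified stand-ins) of how a router bank might deliver source data to destination clusters:

```python
# Simplified routing modes: broadcast maximizes reuse (one datum to
# all clusters), unicast maximizes bandwidth (one-to-one), and
# multicast sits in between (each source feeds a group of clusters).

def route(sources, num_dests, mode):
    if mode == "broadcast":
        return [sources[0]] * num_dests
    if mode == "multicast":
        group = num_dests // len(sources)
        return [sources[d // group] for d in range(num_dests)]
    if mode == "unicast":
        return list(sources[:num_dests])
    raise ValueError(f"unknown mode: {mode}")
```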
(Chen et al., 2019)
Eyeriss v2 Architecture
(Chen et al., 2019)
Network for input activations
(Chen et al., 2019)
Architecture Hierarchy
(Chen et al., 2019)
Specification
(Chen et al., 2019)
Exploit sparsity
(Chen et al., 2019)
Exploit sparsity
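Eyeriss v2 processes sparse weights and iacts directly in compressed form. A sketch in that spirit, encoding one column of a weight matrix in compressed-sparse style and skipping zero entries during the MAC loop (single-column simplification; full CSC also keeps column pointers, and the function names here are illustrative):

```python
# Compressed storage for one weight column plus a sparse dot product:
# only nonzero weights are stored and only they trigger MACs.

def csc_encode_column(col):
    data = [v for v in col if v != 0]                  # nonzero values
    rows = [i for i, v in enumerate(col) if v != 0]    # their row indices
    return data, rows

def sparse_column_dot(data, rows, iact):
    # zeros are skipped entirely: no fetch, no multiply
    return sum(w * iact[r] for w, r in zip(data, rows))

data, rows = csc_encode_column([0, 3, 0, 0, 5])
sparse_column_dot(data, rows, [1, 2, 3, 4, 5])
```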
(Chen et al., 2019)
Results for Eyeriss v2
(Chen et al., 2019)
Comparison
(Chen et al., 2019)
Conclusion
Efficiency Latency Flexibility Scalability
▪ > 10× improvement over v1
▪ Processes sparse weights and iacts in the compressed domain
▪ Flexible from high bandwidth to high data reuse, for filter shape variety
▪ Possible extension: add a cache
Final comment
(slide credits: Dr. Sergio Martin, ETH)
Final comment
References and image credits
▪ Chen, Y.-H., Yang, T.-J., Emer, J. S. & Sze, V. Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices. IEEE J. Emerg. Sel. Topics Circuits Syst. 9, 292–308 (2019).
▪ Chen, Y.-H., Krishna, T., Emer, J. S. & Sze, V. Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks. IEEE J. Solid-State Circuits 52, 127–138 (2017).
▪ Lin, S.-C. et al. In Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems 751–766 (ACM, 2018). doi:10.1145/3173162.3173191.
Images: MIT EEMS Group (slide 2), Audi (slide 4), Nvidia (slide 5), WabisabiLearning (slide 6), iStock.com/VictoriaBar (slide 31)
Online video talks:
Additional slides for interested readers
Energy Efficiency
(Strubell et al., 2019)
Motivation: Flexibility, Software
Eyeriss v1 Implementation
(Chen et al., 2017)
▪ Customized Caffe framework runs on an NVIDIA development board
▪ A Xilinx chip serves as PCI controller
Eyeriss v1 PE architecture
(Chen et al., 2017)
Router implementation details
Network for psums
(Chen et al., 2019)
Network for weights
(Chen et al., 2019)