
Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices

Authors: Yu-Hsin Chen, Tien-Ju Yang, Joel S. Emer, Vivienne Sze
Presented by: Florian Mahlknecht
Systems Group | Florian Mahlknecht | 2020-04-31



SLIDE 2

▪ Professor Vivienne Sze

Energy Efficient Multimedia Systems Group

SLIDE 3

Design Principles

Efficiency Latency Flexibility

SLIDE 4

Motivation: Energy Efficiency

▪ 6 GB of data every 30 seconds
▪ Avg. 2.5 kW

(slide credits: Prof. Sze, see Wired 02.06.2018)
SLIDE 5

Motivation: Latency

(Lin et al., 2018)

< 100ms

SLIDE 6

Motivation: Edge Processing

SLIDE 7

Motivation: Flexibility

(Sze et al., 2017)
SLIDE 8

Recall core operations

(Sze et al., 2017)

▪ 4 Memory transfers, 2 FLOP
▪ Parallelizable!
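The counts on this slide follow from the multiply-and-accumulate (MAC) at the core of CONV and FC layers. A minimal sketch, assuming the four memory transfers are three reads (weight, input activation, previous partial sum) and one write (updated partial sum); the function and counter names are illustrative and not taken from the cited tutorial:

```python
def mac(weight, iact, psum, counters):
    """One multiply-and-accumulate step with naive memory-traffic accounting."""
    counters["reads"] += 3    # assumption: weight, input activation, previous psum
    counters["flops"] += 2    # one multiply, one add
    counters["writes"] += 1   # assumption: updated psum written back
    return psum + weight * iact


def dot(weights, iacts):
    """Dot product as a chain of MACs; returns the result and the access counts."""
    counters = {"reads": 0, "writes": 0, "flops": 0}
    psum = 0
    for w, x in zip(weights, iacts):
        psum = mac(w, x, psum, counters)
    return psum, counters


result, counts = dot([1, 2, 3], [4, 5, 6])
print(result, counts)
```

MACs that contribute to different outputs do not depend on each other, which is why the workload parallelizes well.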

SLIDE 9

Architecture Overview

▪ GPU / CPU
▪ Vector / Thread
▪ Centralized control for ALUs
▪ data from memory

▪ Processing Engines
▪ Local memory
▪ Control logic

(Sze et al., 2017)
SLIDE 10

Memory access cost

(Hennessy, 2019)

▪ A DRAM access costs about 20’000 × the energy of an 8-bit addition

SLIDE 11

Memory access cost on Spatial Architecture

(Sze et al., 2017)
SLIDE 12

Memory access speed

(slide credits: Prof. Koumoutsakos ETH)
SLIDE 13

Exploiting reuse opportunities

(Chen et al., 2017)

▪ Convolutional Reuse
▪ Filter Reuse
▪ Ifmap Reuse

?
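The three reuse types can be put into numbers for a single CONV layer. The sketch below is an illustration under stated assumptions: stride 1, interior pixels only, and the symbols N (batch size), M (number of filters), E x F (ofmap plane) and R x S (filter plane) are assumed shape names, not values from the slides:

```python
def reuse_factors(N, M, E, F, R, S):
    """Upper bounds on how many MACs touch the same data value (stride 1 assumed)."""
    return {
        # Convolutional reuse: a weight slides over every ofmap position,
        # and an ifmap pixel is covered by up to R*S window positions.
        "conv_weight": E * F,
        "conv_iact": R * S,
        # Filter reuse: the same filter is applied to every ifmap in the batch.
        "filter": N,
        # Ifmap reuse: the same ifmap is read by every filter.
        "ifmap": M,
    }


f = reuse_factors(N=4, M=64, E=56, F=56, R=3, S=3)
uses_per_weight = f["conv_weight"] * f["filter"]   # MACs per weight value
uses_per_iact = f["conv_iact"] * f["ifmap"]        # MACs per ifmap pixel (upper bound)
print(uses_per_weight, uses_per_iact)
```

Every one of those uses is a chance to serve the value from local storage instead of fetching it from DRAM again.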

SLIDE 14

Row Stationary Dataflow

(Chen et al., 2017)

Split into 1D CONV primitives:
▪ 1 row of weights
▪ 1 row of ifmap

Map each primitive on 1 PE:
▪ Row pairs remain stationary:
  • psum and weights in local register
▪ Sliding window
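A minimal sketch of one such 1D CONV primitive as it might run inside a single PE, assuming stride 1 and no padding. The function name `pe_conv1d` is hypothetical; plain Python lists stand in for the PE's local storage:

```python
def pe_conv1d(weight_row, ifmap_row):
    """One PE: convolve one filter row with one ifmap row (stride 1, no padding)."""
    S = len(weight_row)                  # the weight row stays in the PE throughout
    psum_row = []
    for start in range(len(ifmap_row) - S + 1):
        window = ifmap_row[start:start + S]   # sliding window over the ifmap row
        psum = 0                              # accumulated locally, one psum per window
        for w, x in zip(weight_row, window):
            psum += w * x
        psum_row.append(psum)
    return psum_row


print(pe_conv1d([1, 0, -1], [1, 2, 4, 7, 11]))   # prints [-3, -5, -7]
```

Consecutive windows share all but one ifmap value, so only one new activation has to enter the PE per output psum.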

SLIDE 15

Row Stationary 2D convolution

(Chen et al., 2017)

▪ Filter rows reused horizontally
▪ Ifmaps reused diagonally
▪ Psums accumulated vertically

filter(1,2,3) x input(1,2,3)
filter(1,2,3) x input(2,3,4)
filter(1,2,3) x input(3,4,5)

  • fmap column 1

(1, row_size)
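The mapping can be sketched as a small simulation. It assumes a logical PE array with one PE row per filter row and one PE column per ofmap row, stride 1 and no padding; `conv1d` and `row_stationary_conv2d` are hypothetical helper names, and the loops run sequentially where the hardware would run in parallel:

```python
def conv1d(weight_row, ifmap_row):
    """1D CONV primitive executed by a single PE."""
    S = len(weight_row)
    return [sum(w * x for w, x in zip(weight_row, ifmap_row[i:i + S]))
            for i in range(len(ifmap_row) - S + 1)]


def row_stationary_conv2d(filt, ifmap):
    """2D convolution built from 1D primitives on an R x E logical PE array."""
    R = len(filt)
    E = len(ifmap) - R + 1               # number of ofmap rows = number of PE columns
    ofmap = []
    for col in range(E):
        acc = None
        for r in range(R):
            # PE (r, col) holds filter row r (same along a PE row: horizontal reuse)
            # and ifmap row r + col (same along a diagonal: diagonal reuse).
            psums = conv1d(filt[r], ifmap[r + col])
            # Psums of one PE column are summed: vertical accumulation.
            acc = psums if acc is None else [a + b for a, b in zip(acc, psums)]
        ofmap.append(acc)
    return ofmap


filt = [[1, 1, 1]] * 3
ifmap = [[5 * i + j for j in range(5)] for i in range(5)]
ofmap = row_stationary_conv2d(filt, ifmap)
print(ofmap[0])   # first ofmap row, from filter rows 1-3 x input rows 1-3
```

With a 3-row filter and a 5-row input this reproduces the three pairings listed on the slide: input rows (1,2,3), (2,3,4) and (3,4,5) each yield one ofmap row.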

SLIDE 16

AlexNet example mapping

(Chen et al., 2017)
SLIDE 17

Row Stationary Dataflow

(Sze et al., 2017)
SLIDE 18

Eyeriss v1

(Chen et al., 2017)

▪ layer by layer

SLIDE 19

Scalability Eyeriss v1

(Chen et al., 2019)
SLIDE 20

Scalability Eyeriss v2

(Chen et al., 2019)
SLIDE 21

Design Principles Eyeriss v2

Efficiency Latency Flexibility Scalability

SLIDE 22

Hierarchical Mesh Network

(Chen et al., 2019)

▪ Flat multicast network
▪ PEs and GLB grouped into clusters
▪ Hierarchical structure

SLIDE 23

Why use a Mesh?

(Chen et al., 2019)
SLIDE 24

Mesh operation modes

(Chen et al., 2019)
  • CONV layer
  • DW CONV layer
  • FC layer
SLIDE 25

Eyeriss v2 Architecture

(Chen et al., 2019)
SLIDE 26

Network for input activations

(Chen et al., 2019)
SLIDE 27

Architecture Hierarchy

(Chen et al., 2019)
SLIDE 28

Specification

(Chen et al., 2019)
SLIDE 29

Exploit sparsity

(Chen et al., 2019)
  • process in CSC format
  • Latency gain
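To illustrate why processing in a compressed format can reduce latency, here is a sketch using textbook compressed sparse column (CSC) storage. It is not the on-chip encoding from Chen et al. (2019), which differs in detail; the array names `data`, `row_idx` and `col_ptr` are assumptions made for this example:

```python
def to_csc(matrix):
    """Encode a dense matrix (list of rows) column by column, keeping only nonzeros."""
    data, row_idx, col_ptr = [], [], [0]
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0
    for c in range(n_cols):
        for r in range(n_rows):
            if matrix[r][c] != 0:
                data.append(matrix[r][c])
                row_idx.append(r)
        col_ptr.append(len(data))        # end of column c in data / row_idx
    return data, row_idx, col_ptr


def csc_matvec(data, row_idx, col_ptr, x, n_rows):
    """Compute y = W x directly on the compressed weights; returns y and the MAC count."""
    y = [0] * n_rows
    macs = 0
    for c, xc in enumerate(x):
        if xc == 0:
            continue                     # zero activation: the whole column is skipped
        for k in range(col_ptr[c], col_ptr[c + 1]):
            y[row_idx[k]] += data[k] * xc   # only nonzero weights are visited
            macs += 1
    return y, macs


W = [[1, 0, 0],
     [0, 0, 2],
     [3, 0, 4]]
data, row_idx, col_ptr = to_csc(W)
y, macs = csc_matvec(data, row_idx, col_ptr, [1, 5, 0], n_rows=3)
print(data, row_idx, col_ptr, y, macs)
```

In this toy case only 2 of the 9 dense MACs are executed, because both zero weights and zero activations are skipped without ever being expanded.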
SLIDE 30

Exploit sparsity

(Chen et al., 2019)
SLIDE 31

Results for Eyeriss v2

(Chen et al., 2019)
SLIDE 32

Comparison

(Chen et al., 2019)
SLIDE 33

Conclusion

Efficiency Latency Flexibility Scalability

▪ > 10x improvement over v1
▪ processing sparse weights and iacts in compressed domain
▪ flexibility from high bandwidth to high data reuse, for filter shape variety
▪ extend with cache

SLIDE 34

Final comment

(slide credits: Dr. Sergio Martin ETH)
SLIDE 35

Final comment

SLIDE 36
References and image credits

  • 1. Hennessy, J. L. Computer architecture: a quantitative approach. (Morgan Kaufmann Publishers, 2019).
  • 2. Sze, V., Chen, Y.-H., Yang, T.-J. & Emer, J. S. Efficient Processing of Deep Neural Networks: A Tutorial and Survey. Proc. IEEE 105, 2295–2329 (2017).
  • 3. Chen, Y.-H., Yang, T.-J., Emer, J. S. & Sze, V. Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices. IEEE J. Emerg. Sel. Topics Circuits Syst. 9, 292–308 (2019).
  • 4. Chen, Y.-H., Krishna, T., Emer, J. S. & Sze, V. Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks. IEEE J. Solid-State Circuits 52, 127–138 (2017).
  • 5. Lin, S.-C. et al. The Architectural Implications of Autonomous Driving: Constraints and Acceleration. in Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems 751–766 (ACM, 2018). doi:10.1145/3173162.3173191.

Images: MIT EEMS Group (slide 2), Audi (slide 4), Nvidia (slide 5), WabisabiLearning (slide 6), iStock.com/VictoriaBar (slide 31)

Online video talks:

  • slideslive.com
  • youtu.be/WbLQqPw_n88
SLIDE 37

Q & A

SLIDE 38

Additional slides for interested readers

SLIDE 39

Energy Efficiency

(Strubell et al., 2019)
SLIDE 40

Motivation: Flexibility, Software

SLIDE 41

Eyeriss v1 Implementation

(Chen et al., 2017)

▪ Customized Caffe framework runs on an NVIDIA development board
▪ Xilinx serves as PCI controller

SLIDE 42

Eyeriss v1 PE architecture

(Chen et al., 2017)
SLIDE 43

Router implementation details

SLIDE 44

Network for psums

(Chen et al., 2019)
SLIDE 45

Network for weights

(Chen et al., 2019)