
CNVLUTIN: Ineffectual-neuron-free DNN computing



  1. CNVLUTIN: Ineffectual-neuron-free DNN computing. J. Albericio, P. Judd, T. Hetherington*, T. Aamodt*, N. Enright Jerger, A. Moshovos. Please cite the original source.

  3. DNNs = SIMD Heaven. [Figure: multiply (x) and add (+) units, 100's to 1000's.]

  8. CNVLUTIN: Smarter SIMD. 52% performance, 2x ED²P, out-of-the-box networks.

  9. Outline
     1. What’s a CNN?
     2. A wide SIMD design
     3. CNVLUTIN: Skipping neurons in a wide SIMD design
     4. Evaluation
     5. Our approach

  10. What’s a CNN? [Figure: an input image passes through 10's of layers and is labeled "Korean mask!".]

  11. What’s a CNN? [Figure build: neurons (input) and synapses (filters) produce neurons (output).]


  19. What’s a CNN? Each of the 10's of layers applies Convolution, ReLU, and Pool before the network outputs "Korean mask!".

  20. What’s a CNN? CNN typical layer: Convolution (inner products), ReLU (negatives to 0), Pool (data size reduction). [Figure: ReLU plotted over inputs from -3 to 3.]
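The typical layer on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes a 1-D input, a single filter, and max pooling (the slide only says "data size reduction"), and the function names `conv1d`, `relu`, `max_pool`, and `layer` are hypothetical.

```python
def conv1d(neurons, synapses):
    """Convolution: each output is the inner product of the filter
    (synapses) with one window of the input (neurons)."""
    k = len(synapses)
    return [sum(n * s for n, s in zip(neurons[i:i + k], synapses))
            for i in range(len(neurons) - k + 1)]


def relu(neurons):
    """ReLU: negatives become 0, everything else passes through."""
    return [max(0, n) for n in neurons]


def max_pool(neurons, size=2):
    """Pooling (assumed here to be max pooling): shrink the data by
    keeping one value per non-overlapping window."""
    return [max(neurons[i:i + size])
            for i in range(0, len(neurons) - size + 1, size)]


def layer(neurons, synapses):
    """One typical layer: Convolution, then ReLU, then Pool."""
    return max_pool(relu(conv1d(neurons, synapses)))


# Made-up input, purely for illustration.
print(layer([1, 2, 3, -4, 5, -6, 7], [1, -1]))
```

Note that the ReLU step is where the zero-valued neurons come from: every negative inner product turns into a 0 that the next layer would otherwise still multiply.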

  21. ~90% of the time is spent in convolutions.

  22. Lots of Runtime Zeroes. [Figure: fraction of zero neurons in multiplications for Alexnet, Google, NiN, VGG19, VGG_M, VGG_S, and AVG; y-axis from 0 to 0.6.] Waste of time and energy! The zeros are dynamically generated, so they are not predictable.
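A small sketch of why these zeros cannot be removed ahead of time: they appear only after ReLU is applied to values that depend on the input. The pre-activation values below are made up for illustration and are not measurements from the slides; `relu` and `zero_fraction` are hypothetical helper names.

```python
def relu(values):
    """Negatives become 0, as in the ReLU step of a typical layer."""
    return [max(0, v) for v in values]


def zero_fraction(neurons):
    """Fraction of neurons that are exactly zero."""
    return sum(1 for n in neurons if n == 0) / len(neurons)


# Two hypothetical inputs reaching the same layer (invented values).
pre_activation_a = [3, -1, 0, 5, -2, -7, 4, -4]
pre_activation_b = [-3, 1, 2, -5, 2, 7, -4, 4]

for pre in (pre_activation_a, pre_activation_b):
    neurons = relu(pre)
    print(neurons, zero_fraction(neurons))
```

The zeros land in different positions and in different numbers for each input, which is the sense in which they are dynamically generated.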

  25. How to compute DNNs: DaDianNao*. [Figure: NBin feeds 16 neuron lanes (lane 0 to lane 15); SB (eDRAM) supplies the synapses of filters 0 to 15; inner-product units IP0 to IP15 multiply (x), add (+), apply f, and write to NBout.] *Chen et al., MICRO 2014

  26. Processing in DaDianNao. [Figure: neuron lanes 0 to 15 paired with synapse lanes 0 to 15 of filters 0 to 15.] Multiplication of corresponding neuron and synapse elements.
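The lock-step processing shown on these slides can be modeled roughly as follows. This is a simplified sketch under stated assumptions, not the DaDianNao implementation: it uses 2 lanes instead of 16, counts one cycle per step, and the name `lockstep_unit` and the data layout (`filter_synapses[f][lane][t]`) are hypothetical.

```python
def lockstep_unit(neuron_lanes, filter_synapses):
    """All neuron lanes advance together. At step t, every filter
    multiplies neuron_lanes[lane][t] by its own synapse for that lane
    and step, and accumulates the products.

    Returns (one partial sum per filter, number of cycles)."""
    steps = len(neuron_lanes[0])
    assert all(len(lane) == steps for lane in neuron_lanes)
    sums = [0] * len(filter_synapses)
    for t in range(steps):  # one cycle per step, even if neurons are 0
        for f, synapse_lanes in enumerate(filter_synapses):
            sums[f] += sum(neuron_lanes[lane][t] * synapse_lanes[lane][t]
                           for lane in range(len(neuron_lanes)))
    return sums, steps


# Two neuron lanes and two filters, with invented values.
neuron_lanes = [[1, 0, 2], [0, 0, 3]]
filter_synapses = [
    [[1, 1, 1], [1, 1, 1]],   # filter 0: synapse lane 0, synapse lane 1
    [[2, 0, 1], [0, 5, -1]],  # filter 1
]
print(lockstep_unit(neuron_lanes, filter_synapses))
```

In this model the middle step multiplies only zeros yet still costs a cycle, which is the waste the following slides set out to remove.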

  29. Zero-skipping in DaDianNao? [Figure: after zero removal, the neuron lanes hold different numbers of neurons, so neurons no longer line up with the synapse lanes of filters 0 to 15.] Lanes can no longer operate in lock-step!

  34. CNVLUTIN: Decoupling Lanes. [Figure: DaDianNao pairs neuron lanes 0 to 15 with synapse lanes 0 to 15 of each filter (0 to 15); CNVLUTIN regroups them into subunits 0 to 15, where subunit i holds neuron lane i and synapse lane i of filters 0 to 15.]

  35. CNVLUTIN: Decoupling Lanes. [Figure: each subunit keeps only the non-zero neurons of its neuron lane together with their offsets, and multiplies them with its synapse lane for filters 0 to 15. Example: subunit 0 holds neurons 1 1 2 with offsets 3 2 1; subunit 15 holds neurons 1 1 1 with offsets 2 1 0.]
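A rough model of the decoupled lanes follows. It is a sketch under simplifying assumptions rather than the CNVLUTIN design itself: it uses 2 subunits instead of 16, assumes each subunit spends one cycle per non-zero neuron (at least one cycle in total), and ignores synchronization between bricks. The names `zero_free` and `decoupled_unit` are hypothetical.

```python
def zero_free(lane):
    """Keep only non-zero neurons, each tagged with its original
    position (offset) in the lane."""
    return [(n, off) for off, n in enumerate(lane) if n != 0]


def decoupled_unit(neuron_lanes, filter_synapses):
    """Each subunit walks its own zero-free neuron list. The offset
    picks the synapse that the neuron would have met in lock-step.

    Returns (one partial sum per filter, cycles of the slowest subunit)."""
    sums = [0] * len(filter_synapses)
    cycles = []
    for lane_id, lane in enumerate(neuron_lanes):
        pairs = zero_free(lane)
        for n, off in pairs:
            for f, synapse_lanes in enumerate(filter_synapses):
                sums[f] += n * synapse_lanes[lane_id][off]
        cycles.append(max(1, len(pairs)))  # assumed: at least one cycle
    return sums, max(cycles)


# Invented values: two neuron lanes and two filters.
neuron_lanes = [[1, 0, 2], [0, 0, 3]]
filter_synapses = [
    [[1, 1, 1], [1, 1, 1]],   # filter 0: synapse lane 0, synapse lane 1
    [[2, 0, 1], [0, 5, -1]],  # filter 1
]
print(decoupled_unit(neuron_lanes, filter_synapses))
```

The partial sums are the ones a lock-step pass over the same data would produce, because skipped neurons contribute nothing; only the cycle count changes (2 here instead of 3 steps).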

  38. CNVLUTIN: Ineffectual-neuron Filtering. [Figure: an encoder, eDRAM, and a dispatcher sit between layer i and layer i+1.]

  41. CNVLUTIN: Ineffectual-neuron Filtering (layer i, encoder, eDRAM, dispatcher, layer i+1). Example with three bricks of 4 neurons, listed as Brick 2 | Brick 1 | Brick 0: neurons 7 6 5 0 | 0 0 0 0 | 0 2 1 0; packed neurons in eDRAM 0 7 6 5 | 0 0 0 0 | 0 0 2 1; offsets in eDRAM 0 3 2 1 | 0 0 0 0 | 0 0 2 1. The unit buffers then hold ZF neurons 7 6 5 0 2 1 with offsets 3 2 1 0 2 1.
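The brick example on this slide can be reproduced with a short sketch. It reflects one reading of the figure, not the authors' hardware: bricks are written here in offset order (offset 0 first, so the slide's right-to-left "7 6 5 0" becomes `[0, 5, 6, 7]`), and an all-zero brick is assumed to still emit a single (0, offset 0) entry, as the unit-buffer row suggests. `encode_brick` and `dispatch` are hypothetical names.

```python
def encode_brick(brick):
    """Pack the non-zero neurons of a brick to the front and record
    each one's original offset; pad both lists with zeros.

    brick[i] is the neuron at offset i."""
    pairs = [(n, off) for off, n in enumerate(brick) if n != 0]
    pad = len(brick) - len(pairs)
    packed = [n for n, _ in pairs] + [0] * pad
    offsets = [off for _, off in pairs] + [0] * pad
    return packed, offsets


def dispatch(bricks):
    """Stream of (neuron, offset) pairs that reaches the unit buffers.
    Assumption: an all-zero brick still contributes one (0, 0) entry."""
    stream = []
    for brick in bricks:
        packed, offsets = encode_brick(brick)
        count = max(1, sum(1 for n in packed if n != 0))
        stream.extend(zip(packed[:count], offsets[:count]))
    return stream


# Bricks 0, 1, 2 from the slide, in offset order.
bricks = [[0, 1, 2, 0], [0, 0, 0, 0], [0, 5, 6, 7]]
for brick in bricks:
    print(brick, encode_brick(brick))
print(dispatch(bricks))
```

Read right to left, the dispatched stream matches the slide's unit buffers: neurons 7 6 5 0 2 1 with offsets 3 2 1 0 2 1.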

  44. CNVLUTIN: Computation Slicing. [Figure: the computation is sliced across neuron lane 0, neuron lane 1, …, neuron lane 15.]

  45. Methodology
      - In-house timing simulator: baseline + CNVLUTIN
      - Logic + SRAM: synthesis on 65nm TSMC
      - eDRAM model: Destiny
      - DNNs: trained models from the Caffe model zoo

  46. Area: only +4.5% area overhead.

  47. Speedup: ineffectual = 0. [Figure: speedup (higher is better) for Alexnet, Google, NiN, VGG19, VGG_M, VGG_S, and Geo; y-axis from 0 to 2.] 1.37x performance on average.

  50. Loosening the Ineffectual Neuron Criterion. [Figure labels: CNVLUTIN, zero.] “If all you have is a hammer, everything looks like a nail” (Maslow’s hammer). Neuron values: 37 0 13 10 15 1 123 0 0 7 1 3 0 1 20 0 18 31 0 33. Example: consider a neuron ineffectual if value < 2.

  52. Speedup: ineffectual >= 0. [Figure: speedup (higher is better) for Alexnet, Google, NiN, VGG19, VGG_M, VGG_S, and Geo, comparing "only 0's" with "0's and more"; y-axis from 0 to 2.] 1.52x performance, no accuracy lost.


  55. Loosening the Ineffectual Neuron Criterion. [Figure labels: CNVLUTIN, zero.] “If all you have is a hammer, everything looks like a nail” (Maslow’s hammer). Neuron values: 37 0 13 10 15 1 123 0 0 7 1 3 0 1 20 0 18 31 0 33. Example: consider a neuron ineffectual if value < 8.
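The effect of loosening the criterion can be checked on the neuron values shown on these slides. This is a minimal sketch: it treats the 20 values as a flat list and only counts how many neurons would be skipped; `is_ineffectual` and `skipped` are hypothetical names, and the sketch says nothing about the accuracy impact, which the slides report from the authors' own evaluation.

```python
# Neuron values as they appear on the slide, read as a flat list.
NEURONS = [37, 0, 13, 10, 15, 1, 123, 0, 0, 7,
           1, 3, 0, 1, 20, 0, 18, 31, 0, 33]


def is_ineffectual(value, threshold=None):
    """With no threshold only zeros are ineffectual; with a threshold,
    every value below it is treated as ineffectual too."""
    if threshold is None:
        return value == 0
    return value < threshold


def skipped(neurons, threshold=None):
    """Number of neurons that would be skipped under the criterion."""
    return sum(1 for n in neurons if is_ineffectual(n, threshold))


for threshold in (None, 2, 8):
    print(threshold, skipped(NEURONS, threshold), "of", len(NEURONS))
```

On this list, skipping only zeros removes 6 of 20 neurons, value < 2 removes 9, and value < 8 removes 11, so a looser criterion leaves fewer multiplications to perform.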
