Scalar Arithmetic Multiple Data Customizable Precision for Deep Neural Networks

Andrew Anderson, Michael Doyle, and David Gregg. Lero, Trinity College Dublin ({aanderso,mjdoyle,dgregg}@tcd.ie). ARITH, Kyoto, June 2019.


  1. Scalar Arithmetic Multiple Data Customizable Precision for Deep Neural Networks
     Andrew Anderson, Michael Doyle, and David Gregg
     Lero, Trinity College Dublin
     {aanderso,mjdoyle,dgregg}@tcd.ie
     ARITH, Kyoto, June 2019

  2. DNN Convolution
     Figure: Multi-channel multi-kernel convolution

  3. DNN Convolution
     for (unsigned m = 0; m < kernels; m++)
       for (unsigned h = 0; h < img_h / stride_h; h++)
         for (unsigned w = 0; w < img_w / stride_w; w++)
           for (unsigned c = 0; c < channels; c++)
             for (unsigned y = 0; y < k; y++)
               for (unsigned x = 0; x < k; x++)
                 output[m][h][w] +=
                     input[c][((h * stride_h) + y) - (k / 2)]
                             [((w * stride_w) + x) - (k / 2)]
                     * kernel[m][c][y][x];
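The loop nest on the DNN Convolution slide indexes the input at ((h * stride_h) + y) - (k / 2), which falls outside the image near the borders. The following is a minimal runnable sketch of the same loop nest, not the authors' implementation. It assumes float data, flat row-major arrays, and zero padding outside the image; the function name conv_direct and its parameter layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Direct multi-channel, multi-kernel convolution.
   Assumptions (not from the slides): float data, flat row-major arrays,
   zero padding outside the image.
   input:  channels x img_h x img_w
   kernel: kernels x channels x k x k
   output: kernels x (img_h / stride_h) x (img_w / stride_w) */
void conv_direct(const float *input, const float *kernel, float *output,
                 unsigned kernels, unsigned channels,
                 unsigned img_h, unsigned img_w,
                 unsigned k, unsigned stride_h, unsigned stride_w)
{
    assert(stride_h > 0 && stride_w > 0);
    unsigned out_h = img_h / stride_h;
    unsigned out_w = img_w / stride_w;

    for (unsigned m = 0; m < kernels; m++)
        for (unsigned h = 0; h < out_h; h++)
            for (unsigned w = 0; w < out_w; w++) {
                float acc = 0.0f;
                for (unsigned c = 0; c < channels; c++)
                    for (unsigned y = 0; y < k; y++)
                        for (unsigned x = 0; x < k; x++) {
                            /* Signed coordinates: the k/2 offset can go negative. */
                            long iy = (long)(h * stride_h + y) - (long)(k / 2);
                            long ix = (long)(w * stride_w + x) - (long)(k / 2);
                            if (iy < 0 || ix < 0 ||
                                iy >= (long)img_h || ix >= (long)img_w)
                                continue; /* zero padding contributes nothing */
                            acc += input[((size_t)c * img_h + (size_t)iy) * img_w
                                         + (size_t)ix]
                                 * kernel[(((size_t)m * channels + c) * k + y) * k + x];
                        }
                /* Unlike the slide, this overwrites rather than accumulates. */
                output[((size_t)m * out_h + h) * out_w + w] = acc;
            }
}
```

With a 3 x 3 image of ones, a single 3 x 3 kernel of ones, and stride 1, the centre output sums nine products while a corner output sums only the four that fall inside the image.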

  4. Quantized Arithmetic
     DNN weights occupy huge amounts of space in FP32. VGG-19 network: 548 MB.
     Figure: But we want to use them on this! OpenMV Cam: 512 KB RAM, 2 MB ROM, 216 MHz Cortex-M7

  5. Quantized Arithmetic
     In deep learning we have it very easy!
     ◮ Network training compensates for arithmetic error
     ◮ Often, noisy arithmetic actually helps (with overfitting)!
     Lots of research about how harshly DNN weights can be quantized
     ◮ Can go to integer (eventually!)
     ◮ Can go down to one (1) bit ('binarized' nets)
     ◮ But we don't want to do all our work on FPGA...
     ◮ In fact, commodity hardware is ideal.

  6. The Simple Approach
     Convert to native arithmetic
     Figure: uint4_t expanded to uint8_t. Two 4xuint4_t operands, each widened to 4xuint8_t:
       00000010 00001110 00001001 00000101
       00000101 00001101 00000010 00001011
     ◮ Can use native SIMD
     ◮ Space overhead only in registers (not memory)
     ◮ Extra precision in intermediate results (for free)
     ◮ Easy to mix and match number formats (e.g. uint6_t + uint4_t)
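The widening step of the simple approach can be sketched as follows. This is an illustrative sketch, not code from the talk: the function names are hypothetical, and placing lane 0 in the least significant bits is an assumption. The two test operands are the bit patterns shown in the figure (0x2E95 and 0x5D2B before widening).

```c
#include <assert.h>
#include <stdint.h>

/* Widen four packed 4-bit values (in a uint16_t) to four 8-bit lanes
   (in a uint32_t). Lane 0 is the least significant nibble (assumption). */
uint32_t expand_u4x4_to_u8x4(uint16_t packed)
{
    uint32_t out = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint32_t v = ((uint32_t)packed >> (4 * lane)) & 0xFu;
        out |= v << (8 * lane);
    }
    return out;
}

/* Narrow back to 4 bits per lane by keeping the low nibble of each byte,
   i.e. lane-wise arithmetic modulo 16. */
uint16_t narrow_u8x4_to_u4x4(uint32_t wide)
{
    uint16_t out = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint32_t v = (wide >> (8 * lane)) & 0xFu;
        out = (uint16_t)(out | (v << (4 * lane)));
    }
    return out;
}

/* Lane-wise add of two packed uint4x4 values via the widened form.
   The 8-bit lanes hold sums up to 30 without carrying into a neighbour. */
uint16_t add_u4x4_via_u8(uint16_t a, uint16_t b)
{
    assert(sizeof(uint32_t) == 4);
    uint32_t wide = expand_u4x4_to_u8x4(a) + expand_u4x4_to_u8x4(b);
    return narrow_u8x4_to_u4x4(wide);
}
```

The widened sum keeps the full 5-bit result in each byte, which is the "extra precision in intermediate results" noted on the slide; precision is lost only when narrowing for storage.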

  7. Quantized Arithmetic
     Figure: Example SWAR operation. 4 × 4-bit words packed into a 16-bit scalar register (uint16_t holding 4xuint4_t)
       0010 1110 1001 0101
       0101 1101 0010 1011
       carries between adjacent words when the two rows are added: 1 0 1
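A small sketch of the packing in this figure, and of what goes wrong when the packed register is added with an ordinary scalar add. The helper names are hypothetical and the lane order (lane 0 in the least significant nibble) is an assumption; the operands are the two bit patterns from the figure.

```c
#include <assert.h>
#include <stdint.h>

/* Pack four 4-bit values into a uint16_t, v[0] in the lowest nibble. */
uint16_t pack_u4x4(const uint8_t v[4])
{
    uint16_t r = 0;
    for (int lane = 0; lane < 4; lane++) {
        assert(v[lane] < 16); /* each value must fit in 4 bits */
        r = (uint16_t)(r | (v[lane] << (4 * lane)));
    }
    return r;
}

/* Extract one 4-bit lane. */
uint8_t lane_u4(uint16_t packed, int lane)
{
    return (uint8_t)((packed >> (4 * lane)) & 0xF);
}

/* Ordinary scalar add: carries are free to cross lane boundaries. */
uint16_t naive_add(uint16_t a, uint16_t b)
{
    return (uint16_t)(a + b);
}
```

For the figure's operands, lane 0 computes 5 + 11 = 16, which wraps to 0 and carries into lane 1. Lane 1 then holds 9 + 2 + 1 = 12 instead of the intended 11, so the naive sum is 0x8BC0 rather than the lane-wise result 0x7BB0.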

  8. SIMD Within A Register (SWAR)
     Dealing with overflow
     Figure: Spacer bits. The top bit of each 4-bit word is masked out of both operands (shown as X), the masked operands are added, and the top bits are then combined back into the result with xor.
     Temporary spacer bits exist only in intermediate values; they are not written to the data format in memory.
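The masked add followed by xor is a standard SWAR idiom and can be sketched as below. This is a sketch under stated assumptions rather than the authors' code: the function names and mask constants are illustrative, and the lanes are 4 bits wide in a uint16_t as on the previous slide.

```c
#include <assert.h>
#include <stdint.h>

#define LOW_MASK 0x7777u /* low three bits of every 4-bit lane */
#define MSB_MASK 0x8888u /* top bit of every 4-bit lane */

/* Lane-wise add modulo 16 of four packed 4-bit values.
   The top bit of each lane is cleared before the add, so it acts as a
   temporary spacer that absorbs the carry out of the low three bits. */
uint16_t swar_add_u4x4(uint16_t a, uint16_t b)
{
    uint16_t low = (uint16_t)((a & LOW_MASK) + (b & LOW_MASK)); /* masked add */
    uint16_t top = (uint16_t)((a ^ b) & MSB_MASK);              /* top bits, no carry */
    return (uint16_t)(low ^ top);                               /* fold top bits back in */
}

/* Reference: unpack each lane, add modulo 16, repack. */
uint16_t lanewise_add_reference(uint16_t a, uint16_t b)
{
    uint16_t r = 0;
    for (int lane = 0; lane < 4; lane++) {
        unsigned la = (a >> (4 * lane)) & 0xFu;
        unsigned lb = (b >> (4 * lane)) & 0xFu;
        r = (uint16_t)(r | (((la + lb) & 0xFu) << (4 * lane)));
    }
    return r;
}
```

For the operands 0x2E95 and 0x5D2B the masked add gives 0x7B38, the xor of the top bits gives 0x0088, and the final result is 0x7BB0, which matches the lane-wise sums 7, 11, 11, 0.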

  9. SIMD Within A Register (SWAR)
     Figure: Convolutional substructure in scalar integer multiplication
       operands (uint32_t, 4xuint4_t): i0 i1 i2 i3 and k1 k2 k3 0
       unsigned integer multiply, partial products summed:
         k3i0 k3i1 k3i2 k3i3
         k2i0 k2i1 k2i2 k2i3
         k1i0 k1i1 k1i2 k1i3
       result: uint64_t, 8xuint8_t
     Long multiplication is discrete convolution over digit sequences

  10. SIMD Within A Register (SWAR)
      Figure: Convolution (same multiplication diagram as on slide 9)
      k × i subword multiplies and (k − 1) × (i − 1) additions with a single instruction
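The claim that one integer multiply performs a small convolution can be checked with a sketch like the following. It is illustrative only: the function names are hypothetical, values sit in 8-bit lanes with lane 0 least significant (an assumption), and it relies on every output sum fitting in 8 bits, which holds for the small test values used here but not for arbitrary 4-bit inputs.

```c
#include <assert.h>
#include <stdint.h>

/* Pack n small values into 8-bit lanes of a uint64_t, v[0] lowest. */
uint64_t pack_u8_lanes(const uint8_t *v, int n)
{
    assert(n >= 0 && n <= 8);
    uint64_t r = 0;
    for (int j = 0; j < n; j++)
        r |= (uint64_t)v[j] << (8 * j);
    return r;
}

/* Full 1D convolution of 4 inputs with 3 kernel values using a single
   64-bit multiply. Assumes each of the 6 output sums is below 256, so no
   carry leaks from one lane into the next. */
void conv_by_multiply(const uint8_t in[4], const uint8_t ker[3], uint8_t out[6])
{
    uint64_t product = pack_u8_lanes(in, 4) * pack_u8_lanes(ker, 3);
    for (int n = 0; n < 6; n++)
        out[n] = (uint8_t)(product >> (8 * n));
}

/* Reference: out[n] = sum of in[j] * ker[l] over all j + l == n. */
void conv_reference(const uint8_t in[4], const uint8_t ker[3], unsigned out[6])
{
    for (int n = 0; n < 6; n++)
        out[n] = 0;
    for (int j = 0; j < 4; j++)
        for (int l = 0; l < 3; l++)
            out[j + l] += (unsigned)in[j] * ker[l];
}
```

With 4 inputs and 3 kernel values the multiply covers 12 subword products and 6 additions, in line with the k × i and (k − 1) × (i − 1) counts on the slide. Multiplication yields a true convolution, so to get the correlation form used in the DNN loop nest the kernel values would be packed in reverse order.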

  11. Results
      Figure: Performance with Temporary Spacer bits. Plot title: SAMD Convolution with Temporary Spacer Bits (ARM Cortex-A57). Vertical axis: execution time (ns), 0 to 3x10^9. Layers: conv3-1, conv3-2, conv4-1, conv4-2, conv4-3. Series: direct-sum2d and SAMD2 through SAMD8.

  12. Results
      Figure: Performance with Permanent Spacer bits. Plot title: SAMD Convolution with Permanent Spacer Bits (ARM Cortex-A57). Vertical axis: execution time (ns), 0 to 2.5x10^9. Layers: conv3-1, conv3-2, conv4-1, conv4-2, conv4-3. Series: direct-sum2d and SAMD2 through SAMD8.

  13. Future Work
      ◮ All-SAMD network (nonlinearities & utility ops)
      ◮ Codesign HW Integer Support Instructions
      ◮ GPU (but microcontrollers don't have GPUs (yet!))

  14. Thanks for listening!
