A Case for Dynamic Activation Quantization in CNNs


  1. A Case for Dynamic Activation Quantization in CNNs Karl Taht, Surya Narayanan, Rajeev Balasubramonian University of Utah

  2. Overview • Background • Proposal • Search Space • Architecture • Results • Future Work

  3. Improving CNN Efficiency • Stripes: Bit-Serial Deep Neural Network Computing • Per-layer bit precisions net significant savings with <1% accuracy loss • Brute-force approach to find the best quantization – retraining at each step! • Good end result, but expensive! • Weighted-Entropy-Based Quantization for Deep Neural Networks • Quantize both weights and activations • Guided search to find optimal quantization (entropy and clustering) • Still requires retraining, still a passive approach • Question: can we exploit adaptive reduced precision during inference?
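
A minimal sketch of the kind of per-layer activation quantization these schemes rely on, assuming a simple uniform quantizer over non-negative (post-ReLU) activations; the function name and NumPy interface are illustrative, not taken from Stripes or the weighted-entropy work:

    import numpy as np

    def quantize_activations(x, bits):
        """Uniformly quantize a non-negative activation tensor to `bits` bits."""
        levels = 2 ** bits - 1
        scale = x.max() / levels if x.max() > 0 else 1.0
        codes = np.clip(np.round(x / scale), 0, levels)   # integer codes in [0, levels]
        return codes * scale                              # dequantized approximation

    # Lower per-layer bit widths trade accuracy for energy/throughput:
    acts = np.random.rand(1, 16, 32, 32).astype(np.float32)
    err_8bit = np.abs(quantize_activations(acts, 8) - acts).mean()
    err_4bit = np.abs(quantize_activations(acts, 4) - acts).mean()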

  4. Proposal: Adaptive Quantization Approach (AQuA) • Most images contain regions of information irrelevant to the classification task • Can we avoid such computations altogether? • Quantize those regions completely, down to 0 bits • More simply – crop them!
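
A minimal sketch of the "crop them" idea, assuming square feature maps in (H, W, C) layout; cropping the border is the limiting case of quantizing those activations to 0 bits, so the convolutions that would consume them are skipped entirely (crop_activation is an illustrative helper, not the paper's code):

    import numpy as np

    def crop_activation(fmap, n):
        """Drop `n` rows and columns from each border of an (H, W, C) feature map."""
        h, w = fmap.shape[:2]
        return fmap[n:h - n, n:w - n, :]

    fmap = np.random.rand(56, 56, 64).astype(np.float32)
    cropped = crop_activation(fmap, 5)   # 46x46x64: roughly a third fewer activations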

  5. Proposal: Activation Cropping

  6. Proposal: Activation Cropping • Concept (figure callouts): "Save computations here", "Add lightweight predictor here"

  7. Search Space – How to Crop • Exploit domain knowledge • Information is typically centered within the image (>55% in our tests) • Utilize a regular pattern • Less control logic required • Maps easier to different hardware • Added bonus: while objects are centered, the majority of area (and thus computation) is on the outside! (Figure: N×N image with a centered region of interest)

  8. Proposal: Activation Cropping • Concept: scale feature maps proportionally (figure: crop sizes N = 25, 10, 8, 5, 2 across successive layers)
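
One way to read "scale feature maps proportionally": keep the cropped fraction of the input constant as spatial sizes shrink through the network. The helper below and its layer sizes are assumptions for illustration; the slide's own N = 25, 10, 8, 5, 2 sequence corresponds to the layer geometry used in the talk:

    def proportional_crops(input_size, input_crop, layer_sizes):
        """Scale an input-image crop width to each layer's spatial size."""
        frac = input_crop / input_size
        return [round(frac * s) for s in layer_sizes]

    # e.g. a 25-pixel crop on a 224x224 input mapped to later layer sizes
    print(proportional_crops(224, 25, [224, 112, 56, 28, 14]))
    # -> [25, 12, 6, 3, 2]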

  9. Search Space – Crop Directions • We consider 16 possible crops as permutations of top, bottom, left, and right crops, encoded as a vector: [ TOP, BOTTOM, LEFT, RIGHT ] • Unlike traditional pruning, AQuA can exploit image-based information to enhance pruning options. (Figure: example crop vectors [0 1 0 0], [1 0 0 0], [0 0 1 0], [0 0 0 1], [0 1 0 1], [1 0 1 1])
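
A small sketch of the 16-way crop encoding, assuming a fixed crop width n applied on whichever sides the [TOP, BOTTOM, LEFT, RIGHT] vector flags (apply_crop is illustrative, not the paper's implementation):

    from itertools import product
    import numpy as np

    # All 16 crop vectors [TOP, BOTTOM, LEFT, RIGHT]
    CROP_VECTORS = list(product((0, 1), repeat=4))

    def apply_crop(fmap, crop_vec, n):
        """Crop `n` rows/columns from each side flagged in [T, B, L, R]."""
        t, b, l, r = crop_vec
        h, w = fmap.shape[:2]
        return fmap[n * t : h - n * b, n * l : w - n * r, :]

    fmap = np.random.rand(56, 56, 64).astype(np.float32)
    right_only = apply_crop(fmap, (0, 0, 0, 1), 10)   # 56 x 46 x 64
    top_bottom = apply_crop(fmap, (1, 1, 0, 0), 10)   # 36 x 56 x 64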

  10. Quantifying Potentials • While maintaining the original Top-1 accuracy, 75% of images can tolerate some type of crop! • Greater savings with Top-5 predictions • Technique invariant to weight quantization (Figure: number of edges cropped per weight set)

  11. Exploiting Energy Savings with ISAAC • Activation cropping technique can be applied to any architecture • We use the ISAAC accelerator due to its flexibility • Future work includes leveraging additional variable precision techniques (Figure: ISAAC crossbar with inputs, 1-bit/2-bit weight cells, and 8-bit outputs)

  12. Weight Precision Savings • With 2-bit cells, 10-bit weights span 5 crossbar columns and 16-bit weights span 8 columns • Through the multiplexed 8-bit ADC, this means 5 vs. 8 ADC operations (Figure: 10-bit and 16-bit weight layouts across 1-bit/2-bit columns)
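
The column counts above follow directly from packing weight bits into 2-bit cells, as in ISAAC; a worked-arithmetic sketch (the helper name is illustrative):

    import math

    def crossbar_columns(weight_bits, bits_per_cell=2):
        """Columns per weight, and hence ADC reads per input-bit step."""
        return math.ceil(weight_bits / bits_per_cell)

    print(crossbar_columns(10))   # 5 columns -> 5 ADC operations
    print(crossbar_columns(16))   # 8 columns -> 8 ADC operations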

  13. “FlexPoint” Support • Same 10-bit / 16-bit column layout with the multiplexed 8-bit ADC • Can vary the shift amount to compute fixed-point computations with different exponents
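
The slide gives no implementation details, but one loose sketch of "varying the shift amount" is the usual shift-and-add combination of per-column partial sums, with an extra programmable exponent applied at the end (the interface below is assumed, not ISAAC's actual datapath):

    def combine_columns(column_sums, bits_per_cell=2, exponent=0):
        """Shift-and-add per-column ADC results into one fixed-point value.

        column_sums[i] is the contribution of the i-th (least-significant)
        2-bit weight slice; changing `exponent` changes the final shift,
        yielding fixed-point results with different scales.
        """
        total = 0
        for i, s in enumerate(column_sums):
            total += s << (i * bits_per_cell)
        return total << exponent if exponent >= 0 else total >> -exponent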

  14. Activation Quantization Savings • K-bit activations (inputs) require K time steps (Figure: buffered k-bit inputs streamed bit-serially into 1-bit/2-bit columns, one bit per time step 1..k, producing 8-bit outputs)
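
A minimal functional model of the bit-serial evaluation the figure depicts, assuming unsigned k-bit activations and one crossbar pass per activation bit; cropped activations simply never enter the input buffer, so their time steps and energy are saved (bit_serial_dot is illustrative, not ISAAC's actual pipeline):

    import numpy as np

    def bit_serial_dot(acts, weights, k):
        """Dot product computed one activation bit per time step (k steps total)."""
        acc = 0
        for t in range(k):                      # time step t
            bits = (acts >> t) & 1              # bit t of every activation
            acc += int(np.dot(bits, weights)) << t
        return acc

    acts = np.array([5, 3, 200, 17], dtype=np.int64)     # 8-bit activations
    weights = np.array([1, 2, 1, 3], dtype=np.int64)
    assert bit_serial_dot(acts, weights, 8) == int(np.dot(acts, weights))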

  15. Activation Quantization Savings • Fewer computations means increasing throughput, reducing area requirements, and lowering energy • K-bit activations (inputs) require K time steps (same bit-serial figure as above)

  16. Naive Approach – Crop Everything • Substantial energy savings at a cost to accuracy • Theoretically, can save over 33% energy and maintain original accuracy!

  17. Overall Energy Savings • Adaptive quantization saves 33% on average compared to an uncropped baseline. • Technique can be applied in conjunction with weight quantization techniques with nearly identical relative savings

  18. Future Work • Predict unimportant regions using a “0th” layer with just a few gradient-based kernels • Use variable low-precision computations (not just cropping) • Quantify energy and latency changes due to the additional prediction step, but fewer overall computations (Figure: original image, Sobel gradient, and detected unimportant regions)
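
A rough sketch of what a gradient-based "0th layer" predictor could look like, using a plain Sobel filter and a simple keep-enough-gradient-energy rule; the kernels match the slide's Sobel example, but the thresholding heuristic and function names are assumptions:

    import numpy as np

    def sobel_energy(img):
        """Sobel gradient magnitude of a grayscale image (H, W)."""
        kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)
        ky = kx.T
        h, w = img.shape
        gx = np.zeros((h, w), dtype=np.float32)
        gy = np.zeros((h, w), dtype=np.float32)
        for i in range(1, h - 1):
            for j in range(1, w - 1):
                patch = img[i - 1:i + 2, j - 1:j + 2]
                gx[i, j] = (patch * kx).sum()
                gy[i, j] = (patch * ky).sum()
        return np.hypot(gx, gy)

    def predict_crop(img, candidates=(2, 5, 10), keep=0.95):
        """Largest symmetric border crop that keeps `keep` of the gradient energy."""
        energy = sobel_energy(img)
        total = energy.sum() + 1e-8
        best = 0
        for n in candidates:
            if energy[n:-n, n:-n].sum() / total >= keep:
                best = n
        return best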

  19. Conclusion • Adaptive quantization saves 33% on average compared to an uncropped baseline. • Technique can be applied in conjunction with weight quantization techniques with nearly identical relative savings

  20. Thank you! Questions?
