
PoET-BiN: Power Efficient Tiny Binary Neurons - Sivakumar Chidambaram (MLSys presentation, 03 March 2020)



1. PoET-BiN: Power Efficient Tiny Binary Neurons
Sivakumar Chidambaram¹, J.M. Pierre Langlois², Jean Pierre David¹
¹Department of Electrical Engineering, ²Department of Computer and Software Engineering
Polytechnique Montréal, Montréal, Canada
03 March 2020

2. Contents
- Introduction
- Background
- PoET-BiN
- Experimental setup and results
- Conclusion

3. Real-time deep learning use cases
Examples:
- Autonomous driving (source: www.alten.com/sector/automotive/next-generation-camera-based-adas-development/)
- CCTV monitoring (source: www.munhwa.com/news/view.html?no=2019100101)
- Translation (source: www.firebase.google.com/docs/ml-kit/translation)
Required attributes:
- Accuracy
- Latency and throughput
- Power and energy constraints
- Memory and hardware costs

4. Computation needs
Exponential rise in computations.

5. Current Deep Learning Software Acceleration Techniques
Quantized neural networks:
- Quantization of weights and activations: binarization, ternarization, multi-bit quantization
- Helps generalization on unseen data
Pruning:
- Removes certain neurons from the vanilla neural network
- A bagging technique that averages various randomly pruned networks
- Introduces noise into the system, which helps it perform better on unseen data
Sparsification (sparse matrix multiplication):
- Removes connections between neurons
- Reduces the number of multiplications and additions
- Reduces the number of memory reads
These techniques are implemented on hardware devices such as FPGAs, microprocessors, microcontrollers, etc.
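As a toy illustration of the binarization idea above (not the specific quantization schemes surveyed on this slide), weights and activations can be replaced by their signs, so every product in a matrix-vector multiply collapses to a sign flip:

```python
# Toy binarization sketch: replace full-precision weights and activations
# with their signs, so each multiply is just a sign flip (XNOR-style on
# hardware). Shapes and values are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4))
a = rng.standard_normal(4)

w_bin = np.sign(w)   # weights in {-1, +1}
a_bin = np.sign(a)   # activations in {-1, +1}

y = w_bin @ a_bin    # every partial product is +/-1; only additions remain
print(y)
```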

6. Hardware: FPGAs
- Go-to device for rapid prototyping of accelerators
- FPGAs consist of Arithmetic Logical Modules (ALMs), programmable interconnects, IOs, and BRAMs
- ALMs are the main computational unit; a typical FPGA has 100,000s of ALMs
- Each ALM has a LookUp Table (LUT) with up to 8 inputs and up to 2 outputs
- Programmed using Hardware Description Languages
Source: https://hackaday.com/2018/03/01/another-introduction-to-fpgas/

7. Problem Definition and Objectives
Problem definition:
- Vanilla neural networks are computation-, power-, and area-intensive
- Current acceleration approaches are still computationally intensive
- Quantized neural networks and pruning are not optimized for FPGAs
Objectives/contributions:
- A modified Decision Tree training algorithm to better match LUTs with a fixed number of inputs
- The Reduced Input Neural Circuit (RINC): a LUT-based architecture founded on modified Decision Trees and a hierarchical version of the well-known AdaBoost algorithm to efficiently implement a network of binary neurons
- A sparsely connected output layer for multiclass classification
- The PoET-BiN architecture, consisting of multiple RINC modules and a sparsely connected output layer
- Automatic VHDL code generation of the PoET-BiN architecture for FPGA implementation

8. Binary Decision Trees
What are binary DTs?
- Inputs (X) and outputs (Y) are binary
- The Decision Tree is created node by node, from root to leaves
- At each node, the feature that most reduces the entropy is chosen
- The tree divides the representation space to classify the data
[Figure: a small binary Decision Tree. The root tests X2 = True (Yes -> Y = False); the No branch tests X5 = True (Yes -> Y = True, No -> Y = False).]
Challenges:
- Classifying large datasets requires larger Decision Trees
- Large trees result in large hardware implementations: complex and power-hungry
- To implement them effectively on FPGAs, we need small Decision Trees of ≤ 6 inputs, so each fits in one LUT
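A minimal sketch of the entropy-driven split selection described on this slide, assuming boolean feature and label arrays (the helper names are ours, not the authors'):

```python
# Pick the binary feature whose split most reduces label entropy,
# as in standard Decision Tree node construction.
import numpy as np

def entropy(y):
    """Binary entropy of a boolean label vector."""
    p = y.mean()
    if p in (0.0, 1.0):
        return 0.0
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def best_feature(X, y):
    """Return the feature index whose split leaves the lowest
    weighted label entropy (i.e. reduces entropy the most)."""
    def after_split(j):
        m = X[:, j]
        h = 0.0
        for side in (m, ~m):
            if side.any():
                h += side.mean() * entropy(y[side])
        return h
    return min(range(X.shape[1]), key=after_split)

# Tiny usage example with 4 samples and 2 boolean features:
X = np.array([[True, False], [True, True], [False, False], [False, True]])
y = np.array([True, True, False, False])
print(best_feature(X, y))  # 0: feature 0 separates the labels perfectly
```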

9. RINC-0: Modified Decision Tree Algorithm
- Modified DT algorithm: level-based entropy reduction rather than node-based
- Decision Trees are restricted by the number of inputs (I)
- A node-wise, off-the-shelf 6-input Decision Tree would have only 7 leaf nodes
- A level-wise Decision Tree has 2^6 = 64 leaf nodes: more granularity
- The resulting tree maps directly onto a LUT: the P input bits I_0 ... I_{P-1} form the address (0 to 2^P - 1), and the bit stored at each address is the tree's output
[Figures: a node-based Decision Tree, a level-based Decision Tree over inputs I0-I5, and the equivalent LUT (address | output).]
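To make the DT-to-LUT mapping concrete, here is a small sketch that enumerates all 2^P input patterns of a trained tree and stores the predictions as a truth table; the nested-dict tree format and the toy tree are our illustrative assumptions, not the authors' data structures:

```python
# Turn a trained decision tree over P binary inputs into a 2^P-entry LUT.
from itertools import product

P = 6  # one FPGA LUT typically has up to 6 inputs

def predict(tree, x):
    """Walk a tree of the form {'feature': i, 'yes': ..., 'no': ...};
    leaves are plain booleans."""
    while isinstance(tree, dict):
        tree = tree['yes'] if x[tree['feature']] else tree['no']
    return tree

def tree_to_lut(tree):
    """Enumerate all 2^P binary input patterns; the pattern is the LUT
    address, the tree's prediction is the stored bit."""
    return [predict(tree, bits) for bits in product((False, True), repeat=P)]

# Example: a tiny tree on features 2 and 5, echoing the earlier figure.
toy_tree = {'feature': 2,
            'yes': False,
            'no': {'feature': 5, 'yes': True, 'no': False}}
lut = tree_to_lut(toy_tree)
print(len(lut))  # 64 entries for P = 6
```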

10. RINC-1: Incorporating AdaBoost
- A single Decision Tree is a weak classifier
- Ensemble methods such as boosting and bagging create strong classifiers from weak classifiers
- We use the well-known AdaBoost algorithm:
  - The weak classifiers are created serially
  - The samples are initially given equal weights
  - The first weak classifier is trained on the data
  - The weights of misclassified samples are increased, so each subsequent classifier focuses on the incorrectly classified samples
  - Each classifier is assigned a weight based on the number of samples it classifies correctly
  - A weighted sum of all the weak classifier outputs forms the strong classifier
Source: https://packtpub.com/book/bigdataandbusinessintelligence/adaboost-classifier
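The loop below sketches the standard AdaBoost updates the slide lists (equal initial weights, upweighting of misclassified samples, a per-classifier weight); `fit_weak` is a hypothetical stand-in for the modified Decision Tree trainer, and labels are assumed to be in {-1, +1}:

```python
# Compact AdaBoost sketch matching the bullet points above.
import numpy as np

def adaboost(X, y, fit_weak, rounds):
    n = len(y)
    w = np.full(n, 1.0 / n)              # samples start with equal weight
    ensemble = []
    for _ in range(rounds):
        h = fit_weak(X, y, w)            # weak classifier on weighted data
        pred = h(X)                      # vector of +/-1 predictions
        err = w[pred != y].sum()
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
        w *= np.exp(-alpha * y * pred)   # misclassified samples gain weight
        w /= w.sum()
        ensemble.append((alpha, h))
    # Strong classifier: sign of the weighted vote of all weak classifiers.
    return lambda X: np.sign(sum(a * h(X) for a, h in ensemble))
```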

11. RINC-1 Module
- A RINC-1 module groups up to P RINC-0 Decision Trees; their binary outputs are combined by a weighted sum (weights W_0 ... W_{P-1}) followed by a threshold comparison (the MAT module)
- The MAC and threshold operations can themselves be implemented in a LUT
- However, P Decision Trees with P^2 total inputs are not enough compared to a MAC operation in a neural network: a neuron can have up to 4096 inputs, versus 36 (when P = 6) in a RINC-1 module
- Hence, we introduce the hierarchical AdaBoost algorithm
[Figure: a RINC-1 module. P RINC-0 blocks, each reading P inputs (I_0..I_{P-1}, I_P..I_{2P-1}, ..., up to I_{P^2-1}), feed a weighted sum and threshold comparator implemented as a LUT.]
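Because the weighted vote in a RINC-1 module depends on only P bits (the RINC-0 outputs), all 2^P outcomes can be precomputed, which is why the MAT module fits in a single LUT. A sketch with placeholder weights and threshold:

```python
# Precompute the MAT (weighted sum + threshold) decision for every
# possible combination of the P RINC-0 output bits.
from itertools import product

def vote_lut(weights, threshold):
    """Tabulate (sum of w_i over set bits) >= threshold for all bit patterns."""
    P = len(weights)
    return {bits: sum(w for w, b in zip(weights, bits) if b) >= threshold
            for bits in product((0, 1), repeat=P)}

lut = vote_lut(weights=[0.8, 0.5, 1.2, 0.3, 0.9, 0.7], threshold=2.0)
# At inference, the FPGA only indexes the table with the 6 RINC-0 output bits:
print(lut[(1, 0, 1, 0, 1, 0)])  # True: 0.8 + 1.2 + 0.9 = 2.9 >= 2.0
```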

12. RINC-2: Hierarchical AdaBoost
- A RINC-2 module combines P RINC-1 modules (each built from P RINC-0 trees) through another weighted sum and threshold (MAT module), covering inputs I_0 ... I_{P^3 - 1}
- RINC-2 modules have adequate capacity to represent MAC operations
- They can only be used for binary classification
[Figure: a RINC-2 module. P RINC-1 blocks, each aggregating P RINC-0 LUTs with weights W_00 ... W_(P-1)(P-1), feed a final weighted threshold comparator implemented as a LUT.]
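A compact sketch of the hierarchical composition, using illustrative helpers (not the authors' implementation): each level applies the same thresholded vote, so the receptive field grows from P bits (RINC-0) to P^2 (RINC-1) to P^3 (RINC-2), i.e. 6 -> 36 -> 216 when P = 6:

```python
from itertools import product

def make_rinc0(lut, input_ids):
    """RINC-0: a P-input LUT reading a fixed subset of the input bits."""
    return lambda x: lut[tuple(x[i] for i in input_ids)]

def make_vote(children, weights, threshold):
    """RINC-1 / RINC-2 level: thresholded weighted vote over child modules."""
    return lambda x: sum(w for c, w in zip(children, weights) if c(x)) >= threshold

# Example wiring with a placeholder parity LUT: six RINC-0s feed one RINC-1.
P = 6
parity_lut = {bits: sum(bits) % 2 == 0 for bits in product((0, 1), repeat=P)}
rinc0s = [make_rinc0(parity_lut, range(k * P, (k + 1) * P)) for k in range(P)]
rinc1 = make_vote(rinc0s, weights=[1.0] * P, threshold=3.0)
print(rinc1([0] * (P * P)))  # all-zero input: every parity LUT fires -> True
```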

13. Binary to Multiclass Classification
Current methods:
- Multiclass DTs: costly to implement
- One-vs-All classification: leads to a reduction in accuracy
Our approach:
- A sparsely connected intermediate layer before the final output layer for multiclass classification
- The intermediate layer replaces the last fully connected hidden layer; its inputs are the features extracted from the convolution layers (unrolled), and it is implemented as LUTs using our DT algorithm
- Only P neurons of the intermediate layer are connected to each neuron in the output layer
- The neurons in the output layer need multiple bits to represent the probabilities and therefore cannot be binary
[Figure: the many fully connected hidden layers are replaced by the LUT-based intermediate layer feeding a sparsely connected output layer.]
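A minimal sketch of the sparsely connected output layer under our own illustrative assumptions (fixed connectivity and random placeholder weights): each of the C class neurons reads only P binary intermediate outputs, and the resulting class scores are multi-bit:

```python
# Sparse output layer: each class neuron has fan-in P instead of full fan-in.
import numpy as np

C, P = 10, 6                                   # classes, fan-in per neuron
rng = np.random.default_rng(0)
intermediate = rng.integers(0, 2, size=C * P)  # binary RINC outputs
conn = np.arange(C * P).reshape(C, P)          # P distinct inputs per class
W = rng.standard_normal((C, P))                # learned multi-bit weights

scores = (W * intermediate[conn]).sum(axis=1)  # one multi-bit score per class
print(scores.argmax())                         # predicted class
```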

14. Experimental Setup
Network variants:
- Vanilla Network (A1): FE -> hidden layers -> output layer of classifier
- Binary Feature Representation Network (A2): FE with binary activations -> hidden layers -> output layer of classifier
- Teacher Network (A3): FE with binary activations -> hidden layers -> intermediate layer with binary activations -> output layer of classifier
- Final Architecture (A4): FE with binary activations -> RINC classifiers -> sparsely connected output layer
Table - Network architectures:
| Architecture (Arch.)                         | Symbol | Dataset  |
| LeNet: FE - (512 FC) - (10 FC)               | M1     | MNIST    |
| VGG-11: FE - (4096 FC) - (4096 FC) - (10 FC) | C1     | CIFAR-10 |
| VGG-11: FE - (2048 FC) - (2048 FC) - (10 FC) | S1     | SVHN     |
