

  1. A Scalable Time-based Integrate-and-Fire Neuromorphic Core with Brain-Inspired Leak and Local Lateral Inhibition Capabilities. Muqing Liu, Luke R. Everson, Chris H. Kim, Dept. of ECE, University of Minnesota, Minneapolis, MN (liux3300@umn.edu)

  2. Outline
     • Background
     • Time-Based Neural Networks
     • Leaky Neuron and Local Lateral Inhibition
     • Digit Recognition Application
     • Measurement Results
     • Conclusion

  3. Neuromorphic Computing
     [Figures: biological neuron model and artificial neuron model; synaptic weights are excitatory (+) or inhibitory (-). Image: http://juanribon.com/design/nerve-cell-body-diagram.php]
     • Biological neuron behavior: weight multiplication (synapse) → weight integration (cell body) → threshold comparison & fire.
     • Applications: image recognition/classification, natural language processing, speech recognition, etc.
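     To make the multiply-integrate-threshold-fire behavior of the artificial neuron model concrete, here is a minimal Python sketch; the function name and the example threshold are illustrative, not taken from the chip.

```python
# Minimal sketch of the artificial neuron model described above.
def neuron_fire(inputs, weights, threshold):
    """Return 1 (spike) if the weighted input sum reaches the threshold."""
    membrane = sum(x * w for x, w in zip(inputs, weights))  # weight multiplication + integration
    return 1 if membrane >= threshold else 0                # threshold comparison & fire

# Example: excitatory (+) and inhibitory (-) synaptic weights.
print(neuron_fire([1, 0, 1], [+2, +3, -1], threshold=1))    # prints 1 (the neuron fires)
```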

  4. Prior Arts: Deep Learning Processors
     [Die photos and annotations: Eyeriss DCNN accelerator [1] (TSMC 65nm LP 1P9M, ~4000µm × 4000µm die, 108KB on-chip memory) and DNPU reconfigurable CNN-RNN processor [2] (65nm 1P8M CMOS); further figure annotations include 3.9 TOPS/W @ 1.1V, 235mW @ 1.1V, and 21mW @ 1.1V.]
     • Eyeriss: peak performance 16.8–42.0 GOPS (1 OP = 1 MAC); power 278mW @ 1V.
     • Circuit/architecture innovations:
       − Data reuse in convolutional neural networks.
       − Utilizing sparsity via data gating/zero skipping.
       − Reduced weight precision, e.g., binary neural networks.
     [1] Y.-H. Chen, et al., ISSCC, 2016. [2] D. Shin, et al., ISSCC, 2017.

  5. Prior Arts: Emerging NVM-based Implementations
     [Figures: memristor-based crossbar array [3]; PCM-based crossbar array [4].]
     • Comparison with CMOS implementations:
       − Pros: compact, analog computation.
       − Cons: susceptible to noise, immature process.
     [3] K.-H. Kim, et al., Nano Lett., Dec. 2011. [4] D. Kuzum, et al., Nano Lett., Jun. 2011.

  6. Time-based vs. Digital Implementation
     • Both implementations compute the weighted sum y = Σ_i x_i·w_i.
       − Time-based: y = Delay_1 + Delay_2 + ··· + Delay_i = x_1·w_1 + x_2·w_2 + ··· + x_i·w_i, i.e., each product is encoded as a programmable delay and the products accumulate in the time domain.
       − Digital: N-bit × M-bit multipliers, an adder, accumulation (Σ), then an activation function.
     • Comparison:
       − Core circuits: time-based uses programmable delay circuits; digital uses multipliers & adders.
       − Pros: time-based is area and power efficient; digital offers high resolution.
       − Cons: time-based has moderate resolution; digital has large area and power consumption.
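     As a rough illustration of the delay-domain MAC on this slide, the behavioral sketch below accumulates per-stage delays proportional to x_i·w_i and checks that the total matches the digital dot product. T_UNIT is an arbitrary illustrative delay unit, not a chip parameter.

```python
# Behavioral sketch: the weighted sum y = sum(x_i * w_i) computed as an accumulated delay.
T_UNIT = 1  # delay units contributed per unit of x*w (illustrative)

def time_based_mac(x, w):
    total_delay = 0
    for xi, wi in zip(x, w):
        total_delay += xi * wi * T_UNIT   # Delay_i is proportional to x_i * w_i
    return total_delay

def digital_mac(x, w):
    return sum(xi * wi for xi, wi in zip(x, w))   # multipliers + adder + accumulate

x = [1, 0, 1, 1]   # binary input pixels
w = [3, 5, 2, 7]   # 3-bit weights
assert time_based_mac(x, w) == digital_mac(x, w) * T_UNIT
print(time_based_mac(x, w))   # 12 delay units, encoding y = 12
```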

  7. Comparison with Previous Time-based Neural Networks

  8. Proposed Time-based Neural Net
     [Block diagram: a digitally controlled oscillator (DCO) with 128 programmable delay stages. Pairs of stages take inputs X_0, X_1 ... X_126, X_127 and 3-bit weights W_0,1<2:0> ... W_126,127<2:0> stored in local SRAM cells; EN_DCO enables the ring. The DCO period is T_DCO = Σ_i Delay_i ∝ Σ_i X_i·w_i.]
     [Threshold compare & fire: an 8-bit counter (C_7 ... C_0, built from resettable flip-flops) counts DCO cycles and generates SPIKE when the threshold is reached; neuron control logic applies the LEAK and LLI signals, implementing a leaky integrate-and-fire neuron with local lateral inhibition.]

  9. Proposed Time-based Neural Net: Programmable Delay Stage
     [Schematic and unit cell layout (2 stages, 8.1µm × 5.9µm; BL/BLB omitted for simplicity, WL shown): each stage holds its 3-bit weight W_i<2:0> in 3 SRAM cells, and the weight bits w_i<2>, w_i<1>, w_i<0> switch binary-weighted load capacitors 4C, 2C, C onto the stage.]
     • Input pixel X_i: determines whether a stage is activated or not.
     • Weight W_i<2:0>: determines how many capacitors are turned on as load in that stage.
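     A purely behavioral sketch of this stage and of one DCO-based neuron, in Python. T_BASE, T_LSB and the threshold default are illustrative placeholders rather than measured values; only the structure (binary-weighted 3-bit load, 128 stages, counter with a spiking threshold) follows the slides.

```python
import random

# Behavioral model of the programmable delay stage and of one DCO-based neuron.
T_BASE = 50e-12   # fixed delay of one stage (illustrative)
T_LSB  = 10e-12   # extra delay per unit capacitor C switched in (illustrative)

def stage_delay(x_i, w_i):
    """x_i: binary input pixel; w_i<2:0>: 3-bit weight selecting 4C/2C/C loads."""
    load = (w_i & 0b111) if x_i else 0   # stage contributes load only when activated
    return T_BASE + load * T_LSB

def dco_period(x, w):
    """128-stage DCO: T_DCO = sum of stage delays, proportional to sum(x_i * w_i)."""
    return sum(stage_delay(xi, wi) for xi, wi in zip(x, w))

def spike_rate(x, w, threshold=16):
    """Counter fires a SPIKE every `threshold` DCO cycles (reset time ignored)."""
    return 1.0 / dco_period(x, w) / threshold

x = [random.randint(0, 1) for _ in range(128)]   # binary inputs
w = [random.randint(0, 7) for _ in range(128)]   # 3-bit weights
print(f"f_DCO ~ {1e-6 / dco_period(x, w):.1f} MHz, "
      f"spike rate ~ {spike_rate(x, w) / 1e6:.2f} Mspike/s")
```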

  10. 64×128 Time-based Neural Network
     • 8 DCO cores are grouped together to implement local lateral inhibition.
     • 64 DCO neuromorphic cores in total.
     • 121 out of 128 DCO stages are used as programmable inputs.
     • The remaining 7 stages are reserved for calibration.
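     Since 121 = 11 × 11 and the measurement slides use 11×11 images, one input image presumably maps onto the 121 programmable stages of a core. The sketch below shows such a mapping; the row-major ordering and the binarization threshold are assumptions for illustration.

```python
# Sketch: mapping an 11x11 image onto the 121 programmable stages of one
# 128-stage DCO core (the remaining 7 stages are reserved for calibration).
import numpy as np

N_STAGES, N_INPUT, N_CAL = 128, 121, 7

def image_to_stage_inputs(img_11x11, binarize_at=0.5):
    assert img_11x11.shape == (11, 11)
    x = (img_11x11.flatten() > binarize_at).astype(int)   # 121 binary pixels (row-major, assumed)
    x = np.concatenate([x, np.zeros(N_CAL, dtype=int)])   # calibration stages left idle
    assert x.size == N_STAGES
    return x

img = np.random.rand(11, 11)
print(image_to_stage_inputs(img)[:16])
```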

  11. Frequency Calibration and Linearity Test
     [Plots: DCO frequency (a.u.) versus a 3-bit code sweep (000 to 111), showing linearity and the spread across DCOs before and after calibration.]
     • Frequency variation between 10 DCOs:
       − Before calibration: 1.17%; after calibration: 0.10%.
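     The slides do not describe the calibration algorithm itself. As one plausible illustration only (an assumption, not the chip's documented procedure), the sketch below trims each DCO toward a common target period using its 7 reserved calibration stages.

```python
# Illustrative-only calibration sketch: enable 0..7 calibration stages per DCO
# to pull its period toward a common target and reduce DCO-to-DCO variation.
def calibrate(dco_periods, cal_step, n_cal=7):
    """dco_periods: uncalibrated DCO periods (s); cal_step: delay added per
    enabled calibration stage (s). Returns a per-DCO calibration code (0..7)."""
    target = max(dco_periods)                 # slowest DCO sets the target
    codes = []
    for t in dco_periods:
        best = min(range(n_cal + 1),
                   key=lambda c: abs((t + c * cal_step) - target))
        codes.append(best)
    return codes

periods = [10.00e-9, 10.05e-9, 9.92e-9, 10.11e-9]   # example spread
print(calibrate(periods, cal_step=0.03e-9))          # e.g. [4, 2, 6, 0]
```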

  12. Bio-Inspired Features: Leaky Neuron and Local Lateral Inhibition (LLI)
     [Figures: electrical model of the cell membrane [3]; lateral inhibition illustrated by the Mach band illusion [4].]
     • Leaky neuron: ions diffuse through the neuron cell membrane, so the integrated potential decays over time.
     • Local lateral inhibition: an active neuron strives to suppress the activities of its neighbors.
     [3] W. Gerstner, et al., Neuronal Dynamics. [4] Wikipedia.

  13. Time-based Leak and LLI
     [Diagrams: time-based leaky integrate & fire neuron (DCO → 8-bit counter C_7 ... C_0 → compare & fire → SPIKE) and time-based local lateral inhibition across a group of 8 neurons (SPIKE<0> ... SPIKE<7> resetting bits in the neighboring counters).]
     • Leak enabled: the LSB of every counter is reset periodically.
     • LLI enabled:
       − Specific bits in the neighboring counters are reset after a DCO spikes.
       − The fastest DCO resets the other DCOs more often than it is reset by others.
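     A behavioral sketch of the leak and LLI mechanisms acting on the counters, in Python. The leak period, the choice of which neighbor bits are cleared, and the ring neighborhood are illustrative assumptions; only "reset the LSB periodically" and "reset bits in neighboring counters on a spike" come from the slide.

```python
# Behavioral sketch of time-based leak and local lateral inhibition (LLI).
THRESHOLD   = 16            # spiking threshold (N = 16 per the measurement notes)
LEAK_PERIOD = 4             # clear the LSB every 4 time steps (assumption)
LLI_MASK    = 0b0000_0011   # neighbor counter bits cleared on a spike (assumption)

def run(dco_counts_per_step, steps):
    """dco_counts_per_step[j]: DCO cycles neuron j accumulates per time step."""
    n = len(dco_counts_per_step)
    counters = [0] * n
    spikes   = [0] * n
    for t in range(steps):
        for j in range(n):
            counters[j] += dco_counts_per_step[j]
            if t % LEAK_PERIOD == 0:
                counters[j] &= ~0b1            # leak: reset the LSB
        for j in range(n):
            if counters[j] >= THRESHOLD:       # compare & fire
                spikes[j]   += 1
                counters[j]  = 0
                for k in (j - 1, j + 1):       # inhibit neighbors in the group of 8
                    counters[k % n] &= ~LLI_MASK
    return spikes

# The fastest neuron (3 cycles/step) suppresses its neighbors, sharpening contrast.
print(run([3, 2, 2, 2, 2, 2, 2, 2], steps=200))
```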

  14. Leak and LLI
     [Plots: spike frequency vs. DCO number, comparing *None vs. Leak and *None vs. LLI; both show sharper contrast. *None: no leak and no LLI, basic DCO operation.]
     • Leak: uniformly lowers the spiking frequency.
     • LLI: preferentially lowers the spiking frequency.
     • Goal: higher contrast between different neuron outputs.

  15. Handwritten Digit Recognition
     • Input database: MNIST.
     • Learning method: supervised learning.
     • Learning network: single-layer & multi-layer perceptron networks.

  16. Single-layer Digit Recognition
     • Single-layer architecture: proof-of-concept for the time-based neural network.
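     The slides do not give the off-chip training details. As a rough illustration only (assumptions: 11×11 binarized digit images, plain perceptron updates, weights quantized to a 3-bit range after training), a single-layer network for this core could be prepared like this:

```python
# Off-chip training sketch (an assumption, not the authors' documented flow).
import numpy as np

rng = np.random.default_rng(0)
# Stand-in data; in practice this would be MNIST downsampled to 11x11 pixels.
X = rng.integers(0, 2, size=(1000, 121))        # binary pixels
y = rng.integers(0, 10, size=1000)              # digit labels 0..9

W = np.zeros((10, 121))
for epoch in range(5):                          # simple perceptron updates
    for xi, yi in zip(X, y):
        pred = np.argmax(W @ xi)
        if pred != yi:
            W[yi]  += xi
            W[pred] -= xi

# Quantize to 3-bit weights (0..7) for the programmable delay stages.
W_q = np.clip(np.round(7 * (W - W.min()) / (W.max() - W.min() + 1e-9)), 0, 7)
print(W_q.astype(int)[0, :11])                  # first row of quantized weights
```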

  17. Multi-layer Digit Recognition
     • Multi-layer architecture: demonstrates the scalability of the core.

  18. Measurement Results (65nm LP CMOS, 1.2V, 25°C)
     [Bar chart: recognition accuracy (%), y-axis 82–94, comparing Measured (*None), Measured (Leaky), and Simulation for three configurations: single-layer with 11×11 images, two-layer with 11×11 images, and two-layer with 4-patch 22×22 images. *None: no leak and no LLI, basic DCO operation.]
     • Measured recognition accuracy from hardware is comparable to software simulation results.

  19. Measurement Results (65nm LP CMOS, 1.2V, 25°C)
     [Bar chart: spike count vs. output digit (0–9, target digit marked), comparing *None and LLI.]
     • Spike count difference between digit “2” and “0”:
       − Without LLI: 1.7%; with LLI: 17.7%.

  20. Measurement Results (65nm LP CMOS, 25°C)
     [Plot: DCO frequency (MHz, 0–350) and power per DCO (µW, 0–100) vs. supply voltage (0.6–1.3V).]
     • Wide operating range: 0.7V ~ 1.2V.

  21. Performance Comparison
     • This work: application: handwritten digit recognition; function: multi-layer perceptron network classification; circuit type: time-based; technology: 65nm; area: 0.24mm² (64 DCOs); voltage: 1.2V; frequency: 99MHz (nominal DCO freq.); power: 320.4µW/DCO; power efficiency: 309G ÷ N spikes/s/W (N = spiking threshold, note a).
     • ASSCC '17 [5]: application: handwriting recognition; function: deep neural network; circuit type: time-based; technology: 65nm; area: 3.61mm² (32K PEs); voltage: 0.45V; power efficiency: 48.2 TSOp/s/W; hardware efficiency: 76.5 GE/PE.
     • ISSCC '16 [6]: application: object detection + intention prediction; function: convolutional neural network; circuit type: analog + digital; technology: 65nm; area: 16.0mm²; voltage: 1.2V; frequency: 250MHz; power: 330mW; power efficiency: 862 GOPS/W.
     • VLSI '15 [7]: application: object recognition; function: spiking LCA; circuit type: digital; technology: 65nm; area: 1.8mm²; frequency: 40MHz (inference); power: 3.65mW; power efficiency: 5.7pJ/pixel (memory + logic).
     • Comparison in each prior work's metric (this work vs. prior work):
       − vs. [5]: 37.4 TSOp/s/W (note b) vs. 48.2 TSOp/s/W; 16.6 GE/PE (note c) vs. 76.5 GE/PE.
       − vs. [6]: 37.4 TOPS/W (note d) vs. 862 GOPS/W.
       − vs. [7]: 0.43 pJ/pixel, logic only (note e) vs. 5.7 pJ/pixel (memory + logic).
     Notes:
       a. N = 16 in our measurements.
       b. SOp/s/W: synaptic operations (SOp) per second per watt. In the DCO-based time-domain neural network, one DCO oscillation is equivalent to 121 SOp.
       c. 1 GE = 1.44µm² (65nm); PE: processing element.
       d. One operation is defined as one multiply-and-accumulate (MAC). In the DCO-based time-domain neural network, one DCO oscillation is equivalent to 121 3-bit MACs.
       e. Uses a spiking threshold of 16 and accounts only for the power of the core logic circuits; memory power is not included, since the weights are not updated during inference.
     [5] D. Miyashita, et al., ASSCC, 2017. [6] K. J. Lee, et al., ISSCC, 2016. [7] J. K. Kim, et al., VLSI, 2015.
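     Notes a and b allow the spikes/s/W figure to be converted into SOp/s/W. A quick sanity check of the 37.4 TSOp/s/W number (assuming each spike corresponds to N DCO oscillations, since N is the spiking threshold of the counter):

```python
# Sanity check of the reported power efficiency, using notes a and b above.
N = 16                                   # spiking threshold (note a)
spikes_per_s_per_W = 309e9 / N           # reported: 309G / N spikes/s/W
sop_per_oscillation = 121                # note b: one DCO oscillation = 121 SOp
sop_per_spike = N * sop_per_oscillation  # assumption: one spike = N oscillations
print(spikes_per_s_per_W * sop_per_spike / 1e12)   # ~37.4 TSOp/s/W
```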

  22. Die Photo and Performance Summary

  23. Conclusion
     • The neural network function is computed in the time domain using standard digital circuits with high area and power efficiency.
     • Brain-inspired leak and local lateral inhibition features are implemented to enhance the contrast between neuron outputs.
     • 65nm test chip measurements confirm 91% handwritten digit recognition accuracy.
