A Scalable Time-based Integrate-and-Fire Neuromorphic Core with Brain-Inspired Leak and Local Lateral Inhibition Capabilities
Muqing Liu, Luke R. Everson, Chris H. Kim
Dept. of ECE, University of Minnesota, Minneapolis, MN
liux3300@umn.edu
Biological neuron model
* synaptic weights: excitatory (+) or inhibitory (-)
Artificial neuron model
http://juanribon.com/design/nerve-cell-body-diagram.php
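A minimal sketch of the artificial neuron model above, assuming a simple step activation (the slide does not specify one). Positive weights act as excitatory (+) synapses and negative weights as inhibitory (-) synapses. The function name `artificial_neuron` and its `threshold` parameter are hypothetical.

```python
from typing import Sequence


def artificial_neuron(x: Sequence[float], w: Sequence[float], threshold: float = 0.0) -> int:
    """Weighted sum of the inputs followed by a step activation (assumed).

    Positive entries of w model excitatory synapses, negative entries
    model inhibitory synapses.
    """
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    s = sum(xi * wi for xi, wi in zip(x, w))
    return 1 if s > threshold else 0


# Excitation dominates: 0.6 - 0.2 = 0.4 > 0, so the neuron fires.
print(artificial_neuron([1, 1, 0], [0.6, -0.2, 0.9]))
# Inhibition dominates: 0.2 - 0.6 = -0.4 <= 0, so it stays silent.
print(artificial_neuron([1, 1, 0], [0.2, -0.6, 0.9]))
```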
Eyeriss: DCNN Accelerator [1]: TSMC 65nm LP 1P9M; Peak Performance: 16.8 – 42.0 GOPS (1OP = 1MAC); Power: 278mW @ 1V
DNPU: Reconfigurable CNN-RNN Processor [2]: 65nm 1P8M CMOS; 3.9 TOPS/W @ 1.1V; Power: 21mW @ 1.1V, 235mW @ 1.1V
Die photos: 4000µm × 4000µm
[1] Y.-H. Chen, et al., ISSCC, 2016. [2] D. Shin, et al., ISSCC, 2017.
Memristor based crossbar array [3], PCM based crossbar array [4]
[3] K.-H. Kim, et al., Nano Lett., Dec. 2011. [4] D. Kuzum, et al., Nano Lett., Jun. 2011.
Digital neural network: N-bit multipliers compute each product x_i·w_i, an M-bit adder accumulates the products, and the sum passes through an activation function:
$$y = \sum_i x_i w_i = x_1 w_1 + x_2 w_2 + \cdots + x_i w_i$$
Time-based neural network: each product x_i·w_i is mapped to a delay, Delay_i, and the products are accumulated in time:
$$\sum_i \mathrm{Delay}_i = \mathrm{Delay}_1 + \mathrm{Delay}_2 + \cdots + \mathrm{Delay}_i \propto y = \sum_i x_i w_i$$

| | Time-based | Digital |
|---|---|---|
| Core circuits | Programmable delay circuits | Multipliers & adders |
| Pros | Area and power efficient | High resolution |
| Cons | Moderate resolution | Large area and power consumption |
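The contrast between the two approaches can be sketched numerically: the digital path multiplies and adds explicitly, while the time-based path only needs each stage's delay to be proportional to x_i·w_i, because delays of cascaded stages add up on their own. This is an idealized model; the unit delay `t_unit_ps` and the function names are hypothetical, and inputs are taken as binary with 3-bit weights (0 to 7) for illustration.

```python
from typing import Sequence


def digital_mac(x: Sequence[int], w: Sequence[int]) -> int:
    """Digital approach: explicit multiplies followed by an adder."""
    return sum(xi * wi for xi, wi in zip(x, w))


def time_based_mac(x: Sequence[int], w: Sequence[int], t_unit_ps: float = 10.0) -> float:
    """Time-based approach (idealized): each product maps to a delay and the
    delays of cascaded stages simply add along the signal path.

    t_unit_ps is a hypothetical delay per unit of x_i * w_i.
    """
    total_delay_ps = 0.0
    for xi, wi in zip(x, w):
        delay_i = xi * wi * t_unit_ps   # Delay_i proportional to x_i * w_i
        total_delay_ps += delay_i       # accumulation happens "for free" in time
    return total_delay_ps


x = [1, 0, 1, 1]   # binary inputs
w = [3, 7, 5, 2]   # 3-bit weights
y = digital_mac(x, w)          # 3 + 5 + 2 = 10
delay = time_based_mac(x, w)   # 10 units * 10 ps = 100 ps
assert delay == y * 10.0       # total delay is proportional to y
```

In a real delay line each stage also has a fixed intrinsic delay, so the total is affine rather than strictly proportional in y; the sketch omits that offset.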
Leaky Integrate & Fire, Local Lateral Inhibition
Neuron core (figure): a DCO with 128 programmable delay stages, enabled by EN_DCO. Inputs X0, X1, …, X126, X127 drive the stages, each paired with a 3-bit weight (W0,1<2:0>, W2,3<2:0>, …, W126,127<2:0>) stored in SRAM. The DCO output clocks an 8b counter built from D flip-flops (C0, C1, …, C6, C7, each with reset rst). Compare & Fire compares the count against the Threshold and issues SPIKE; the neuron control logic handles LEAK and LLI.
$$T_{DCO} \propto \sum_i \mathrm{Delay}_i \propto \sum_i w_i X_i$$
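A behavioral sketch of the neuron core: the DCO period grows with the weighted input sum, the 8b counter counts DCO cycles, and Compare & Fire emits a spike and resets the counter when the threshold is reached. The slide only states proportionality; here each stage is assumed to have a fixed intrinsic delay `d0_ps` plus a weight-dependent part `dw_ps * w_i * x_i`, and the ring-oscillator period is taken as twice the total loop delay. All parameter values and function names are hypothetical.

```python
from typing import Sequence

N_STAGES = 128      # DCO with 128 programmable delay stages
COUNTER_BITS = 8    # 8b counter


def dco_period_ps(x: Sequence[int], w: Sequence[int],
                  d0_ps: float = 10.0, dw_ps: float = 2.0) -> float:
    """Period of a ring-oscillator DCO (hypothetical stage model).

    delay_i = d0 + dw * w_i * x_i; a ring oscillator needs one rising and one
    falling pass around the loop, so T_DCO = 2 * sum(delay_i), which grows
    linearly with sum(w_i * x_i).
    """
    assert len(x) == len(w) == N_STAGES
    return 2.0 * sum(d0_ps + dw_ps * wi * xi for xi, wi in zip(x, w))


def count_spikes(period_ps: float, window_ps: float, threshold: int) -> int:
    """Count DCO cycles with an 8-bit counter; fire and reset at the threshold."""
    assert 0 < threshold < 2 ** COUNTER_BITS
    counter, spikes = 0, 0
    for _ in range(int(window_ps // period_ps)):  # one increment per DCO cycle
        counter += 1
        if counter >= threshold:                  # Compare & Fire
            spikes += 1
            counter = 0                           # rst
    return spikes


t_idle = dco_period_ps([0] * N_STAGES, [0] * N_STAGES)  # 2 * 128 * 10 ps = 2560 ps
t_busy = dco_period_ps([1] * N_STAGES, [7] * N_STAGES)  # 2 * 128 * 24 ps = 6144 ps
print(count_spikes(t_idle, 1e6, 100), count_spikes(t_busy, 1e6, 100))
```

In this model a larger weighted sum lengthens the DCO period, so fewer cycles (and spikes) fit in a fixed window; the spiking threshold N sets how many DCO cycles make one spike.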
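Programmable Delay Stage: the delay of each stage is set by input Xi and a binary-weighted load capacitance (C, 2C, 4C) selected by the weight bits wi<0>, wi<1>, wi<2>, which are held in 3 SRAM cells (WL shown; BL, BLB omitted for simplicity).
Unit cell layout (2 stages): 5.9µm × 8.1µm, with 3 SRAM cells per stage.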
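A first-order sketch of one programmable delay stage, assuming the binary-weighted capacitors 4C, 2C and C are connected only when Xi = 1 and the corresponding weight bit is 1, and that the stage delay scales as R·C_load. The parasitic capacitance, unit capacitance and drive resistance values, and all function names, are hypothetical.

```python
from typing import Tuple

Bits = Tuple[int, int, int]  # (w<2>, w<1>, w<0>) as read from the 3 SRAM cells


def weight_from_bits(bits: Bits) -> int:
    """Decode the 3-bit weight stored in the SRAM cells."""
    w2, w1, w0 = bits
    return 4 * w2 + 2 * w1 + w0


def stage_load_cap_fF(x: int, bits: Bits,
                      c_unit_fF: float = 1.0, c_par_fF: float = 2.0) -> float:
    """Parasitic cap plus the switched 4C / 2C / C, gated by input X_i."""
    w2, w1, w0 = bits
    switched = x * (w2 * 4 * c_unit_fF + w1 * 2 * c_unit_fF + w0 * c_unit_fF)
    return c_par_fF + switched


def stage_delay_ps(x: int, bits: Bits, r_kohm: float = 5.0,
                   c_unit_fF: float = 1.0, c_par_fF: float = 2.0) -> float:
    """First-order RC delay; kOhm * fF = ps."""
    return r_kohm * stage_load_cap_fF(x, bits, c_unit_fF, c_par_fF)


bits = (1, 0, 1)                     # w_i = 5
extra = stage_delay_ps(1, bits) - stage_delay_ps(0, bits)
print(weight_from_bits(bits), extra)  # extra delay = R * C * x_i * w_i
```

The binary weighting is what makes the added delay linear in the decoded weight: the extra delay is R·C·x_i·w_i on top of a constant offset set by the parasitic load.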
Lateral inhibition: Mach band illusion [4]. Electrical modeling of cell membrane [3].
[3] W. Gerstner, et al., Neuronal Dynamics. [4] Wikipedia.
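How lateral inhibition produces the Mach band illusion can be sketched with a textbook one-dimensional center-surround model: each unit is inhibited by a fraction k of its two nearest neighbors. This illustrates the biological idea only, not the chip's circuit; the inhibition strength `k` and the function name are hypothetical.

```python
from typing import List, Sequence


def lateral_inhibition(signal: Sequence[float], k: float = 0.2) -> List[float]:
    """Each unit is inhibited by a fraction k of its two nearest neighbors.

    Boundaries are handled by replicating the edge value.
    """
    n = len(signal)
    out = []
    for i in range(n):
        left = signal[max(i - 1, 0)]
        right = signal[min(i + 1, n - 1)]
        out.append(signal[i] - k * (left + right))
    return out


step = [1.0] * 4 + [2.0] * 4          # a dark-to-bright luminance step
resp = lateral_inhibition(step)
# Flat regions settle at ~0.6 (dark) and ~1.2 (bright); next to the edge the
# dark side dips to ~0.4 and the bright side overshoots to ~1.4 (Mach bands).
print([round(v, 2) for v in resp])
```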
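Time-based Leaky Integrate & Fire Neuron: the DCO output (From DCO) clocks a counter of D flip-flops (C0, C1, …, C6, C7, each with reset rst). Compare & Fire compares the count against CThreshold and issues SPIKE. LEAK is implemented as an LSB reset of the counter.
Time-based Local Lateral Inhibition (LLI): the spikes of neighboring neurons (SPIKE<0>, SPIKE<1>, SPIKE<2>, …, SPIKE<7>) are combined (Σ) to drive a neighbor counter bit reset.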
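A behavioral sketch of LEAK and LLI acting on a row of counter-based neurons. Each neuron's counter advances once per DCO cycle, modeled here as once every `periods[i]` simulation steps. LEAK clears the counter LSB on a fixed schedule, and LLI clears one counter bit of the two nearest neighbors whenever a neuron spikes. The leak schedule (`leak_every`), the LLI bit position (`lli_bit`), and the nearest-neighbor topology are assumptions; the slides do not specify them.

```python
from typing import List, Sequence

COUNTER_MASK = 0xFF  # 8-bit counter (C0..C7)


def simulate(periods: Sequence[int], steps: int, threshold: int,
             leak_every: int = 0, lli_bit: int = -1) -> List[int]:
    """Return the spike count of each neuron.

    periods[i]: neuron i's counter increments once every periods[i] steps.
    leak_every: if > 0, clear bit C0 of every counter every leak_every steps
                (hypothetical LEAK schedule).
    lli_bit:    if >= 0, a spike from neuron i clears this bit in the counters
                of neurons i-1 and i+1 (hypothetical LLI bit choice).
    """
    assert 0 < threshold <= COUNTER_MASK
    n = len(periods)
    counters = [0] * n
    spikes = [0] * n
    for t in range(1, steps + 1):
        fired = []
        for i, p in enumerate(periods):
            if t % p == 0:                      # one DCO cycle of neuron i
                counters[i] = (counters[i] + 1) & COUNTER_MASK
            if counters[i] >= threshold:        # Compare & Fire
                spikes[i] += 1
                counters[i] = 0                 # rst after firing
                fired.append(i)
        if leak_every > 0 and t % leak_every == 0:
            for i in range(n):
                counters[i] &= ~1               # LEAK: LSB reset
        if lli_bit >= 0:
            for i in fired:
                for j in (i - 1, i + 1):        # nearest neighbors only (assumption)
                    if 0 <= j < n:
                        counters[j] &= ~(1 << lli_bit)  # neighbor counter bit reset
    return spikes


print(simulate([2, 1, 2], 64, 8))             # [4, 8, 4]: no LLI
print(simulate([2, 1, 2], 64, 8, lli_bit=2))  # [0, 8, 0]: the strongest neuron silences its neighbors
```

Because both mechanisms only ever clear counter bits, they can delay spikes but never add them; in this toy setting that is what sharpens the contrast between strongly and weakly driven neurons.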
Spike frequency vs. DCO No.: *None vs. LEAK, and *None vs. LLI. Both LEAK and LLI give sharper contrast.
*None: No leak and no LLI, basic DCO operation.
Measured (*None), Simulation, and Measured (Leaky) results (axis range 82 to 94) for three configurations: single-layer with 11x11 images, two-layer with 11x11 images, and two-layer with 4-patch 22x22 images. 65nm LP CMOS, 1.2V, 25°C.
DCO Frequency (MHz, 70 to 350) and Power (µW per DCO, 20 to 100) versus supply voltage (0.6 to 1.3V). 65nm LP CMOS, 25°C.
[5] D. Miyashita, et al., ASSCC, 2017. [6] K. J. Lee, et al., ISSCC, 2016. [7] J. K. Kim, et al., VLSI, 2015.
Performance Comparison

| | This work | ASSCC'17 [5] | ISSCC'16 [6] | VLSI'15 [7] |
|---|---|---|---|---|
| Application | Hand writing recognition | Hand writing recognition | Object detection + intention prediction | Object Recognition |
| Technology | 65nm | 65nm | 65nm | 65nm |
| Area | 0.24mm² (64 DCOs), 16.6GE/PE^c | 3.61mm² (32K PEs) | 16.0mm² | 1.8mm² |
| Voltage | 1.2V | | 1.2V | 0.45V |
| Frequency | 99MHz (nominal DCO freq.) | | 250MHz | 40MHz (Inference) |
| Function | Multi-layer perceptron network | … neural network | Deep neural network | Spiking LCA with classification |
| Circuit Type | Time-based | Analog + Digital | | |
| Power | | | 330mW | 3.65mW |
| Power Efficiency | 309G ÷ N spikes/s/W (N = spiking threshold^a) | 48.2TSOp/s/W | 862GOPS/W | 5.7pJ/pixel (memory+logic), 0.43pJ/pixel (logic)^e |
| Hardware Efficiency | 37.4TSOp/s/W^b | 48.2TSOp/s/W | | |

Note:
… (SOp). In DCO based time-domain neural network, one … to 121 SOp.
… processing element.
… defined as one multiplication and accumulation (MAC). In DCO based time-domain neural network, one DCO is equivalent to 121 3-bit MACs.
… and only accounted for the power consumption of core logic circuits; memory power is not included, since weights are not updated during inference.
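Two of this work's table entries can be cross-checked with simple arithmetic. Reading "309G ÷ N spikes/s/W" at N = 1 as 309G DCO cycles/s/W, and taking one DCO cycle as 121 3-bit MACs (SOp), gives about 37.4 TSOp/s/W. Dividing the unit cell layout (5.9µm × 8.1µm, 2 stages) by an assumed 65nm NAND2 gate-equivalent area of 1.44µm² gives about 16.6 GE per delay stage. Both the N = 1 reading and the 1.44µm² GE area are assumptions not stated in the slides; the sketch only shows that the quoted numbers are consistent with them.

```python
# Figures quoted in the comparison table and on the unit-cell layout slide.
EFF_AT_N1 = 309e9          # "309G / N spikes/s/W", read here at N = 1 (assumption)
SOP_PER_DCO = 121          # one DCO (cycle, assumed) ~ 121 3-bit MACs
UNIT_CELL_UM2 = 5.9 * 8.1  # unit cell layout holding 2 delay stages
STAGES_PER_CELL = 2
NAND2_GE_UM2 = 1.44        # ASSUMPTION: area of one 65nm NAND2 gate equivalent


def spikes_per_s_per_w(n_threshold: int) -> float:
    """A neuron fires once every N DCO cycles, so spike efficiency scales as 1/N."""
    return EFF_AT_N1 / n_threshold


def hardware_efficiency_tsop() -> float:
    """DCO cycles/s/W times synaptic operations per DCO cycle, in TSOp/s/W."""
    return EFF_AT_N1 * SOP_PER_DCO / 1e12


def ge_per_pe() -> float:
    """Area of one programmable delay stage (PE) in NAND2 gate equivalents."""
    return UNIT_CELL_UM2 / STAGES_PER_CELL / NAND2_GE_UM2


print(f"{hardware_efficiency_tsop():.1f} TSOp/s/W")  # prints 37.4 TSOp/s/W
print(f"{ge_per_pe():.1f} GE/PE")                    # prints 16.6 GE/PE
```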