A Scalable Time-based Integrate-and-Fire Neuromorphic Core with Brain-Inspired Leak and Local Lateral Inhibition Capabilities

SLIDE 1

A Scalable Time-based Integrate-and-Fire Neuromorphic Core with Brain-Inspired Leak and Local Lateral Inhibition Capabilities

Muqing Liu, Luke R. Everson, Chris H. Kim

  • Dept. of ECE, University of Minnesota, Minneapolis, MN

liux3300@umn.edu

SLIDE 2

Outline

  • Background
  • Time Based Neural Networks
  • Leaky Neuron and Local Lateral Inhibition
  • Digit Recognition Application
  • Measurement Results
  • Conclusion

SLIDE 3

Neuromorphic Computing

  • Biological neuron behavior: Weight multiplication (synapse) → Weight integration (cell body) → Threshold comparison & fire.
  • Applications: Image recognition/classification, natural language processing, speech recognition, etc.

Figure: Biological neuron model and artificial neuron model. Synaptic weights are excitatory (+) or inhibitory (-). Image source: http://juanribon.com/design/nerve-cell-body-diagram.php
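The three steps in the first bullet (weight multiplication, integration, threshold comparison) can be sketched in a few lines of Python. This is only an illustration of the artificial neuron model, not the circuit presented in the talk; the function name `neuron_fires`, the example inputs, and the threshold value are assumptions.

```python
def neuron_fires(inputs, weights, threshold):
    """Return True when the weighted sum of the inputs reaches the threshold."""
    # Synapse: multiply each input by its weight
    # (positive = excitatory, negative = inhibitory).
    products = [x * w for x, w in zip(inputs, weights)]
    # Cell body: integrate (sum) the weighted inputs.
    membrane = sum(products)
    # Threshold comparison: fire only if the integrated value is large enough.
    return membrane >= threshold


# Hypothetical example: two excitatory synapses and one inhibitory synapse.
fires_all_active = neuron_fires([1, 1, 1], [2, 3, -1], threshold=4)  # sum = 4
fires_one_silent = neuron_fires([1, 0, 1], [2, 3, -1], threshold=4)  # sum = 1
```

With all three inputs active the weighted sum just reaches the threshold and the neuron fires; silencing the strongest excitatory input keeps it below the threshold.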

SLIDE 4

Prior Arts: Deep Learning Processor

  • Circuit/Architecture innovations:

− Data reuse in convolutional neural network.
− Utilize sparsity by data gating/zero skipping.
− Reduced weight precision (binary neural networks).

Figure: Die photos and specifications of two prior deep learning processors.

− Eyeriss: DCNN Accelerator [1]. TSMC 65nm LP 1P9M. Peak performance: 16.8 – 42.0 GOPS (1 OP = 1 MAC). Power: 278mW @ 1V.
− DNPU: Reconfigurable CNN-RNN Processor [2]. 65nm 1P8M CMOS.
− Other labels recovered from the figure: 108KB, 4000µm × 4000µm, 21mW @ 1.1V, 3.9 TOPS/W @ 1.1V, 235mW @ 1.1V.

[1] Y.-H. Chen, et al., ISSCC, 2016.
[2] D. Shin, et al., ISSCC, 2017.

SLIDE 5

Prior Arts: Emerging NVM based Implementation

  • Comparison with CMOS implementation:

− Pros: Compact, analog computation.
− Cons: Susceptible to noise, immature process.

Figure: Memristor based crossbar array [3]; PCM based crossbar array [4].

[3] K.-H. Kim, et al., Nano Lett., Dec. 2011.
[4] D. Kuzum, et al., Nano Lett., Jun. 2011.

SLIDE 6

Time-based vs. Digital Implementation

  • Time-based neural network: each product xi·wi is represented as a delay, and the delays accumulate over time.

$\sum_i \mathrm{Delay}_i = \mathrm{Delay}_1 + \mathrm{Delay}_2 + \cdots + \mathrm{Delay}_i$, representing $y = \sum_i x_i \cdot w_i = x_1 \cdot w_1 + x_2 \cdot w_2 + \cdots + x_i \cdot w_i$

  • Digital neural network: N-bit multipliers compute each xi·wi, an M-bit adder accumulates the products, and the activation follows.

$y = \sum_i x_i \cdot w_i = x_1 \cdot w_1 + x_2 \cdot w_2 + \cdots + x_i \cdot w_i$

                 Time-based                     Digital
Core circuits    Programmable delay circuits    Multipliers & adders
Pros             Area and power efficient       High resolution
Cons             Moderate resolution            Large area and power consumption
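The equivalence between the two columns can be illustrated with a short Python sketch: if every stage contributes a delay proportional to xi·wi, the total propagation time encodes the same weighted sum that multipliers and an adder would compute. The unit delay `T_UNIT_PS` and the example values are hypothetical and chosen only for illustration.

```python
T_UNIT_PS = 10.0  # hypothetical delay per unit of x*w, in picoseconds


def digital_mac(xs, ws):
    """Digital style: explicit multipliers followed by an adder."""
    return sum(x * w for x, w in zip(xs, ws))


def time_based_mac(xs, ws, t_unit_ps=T_UNIT_PS):
    """Time-based style: each stage adds a delay proportional to x*w.

    The total propagation time through the chain is the accumulated sum.
    """
    elapsed_ps = 0.0
    for x, w in zip(xs, ws):
        elapsed_ps += x * w * t_unit_ps  # Delay_i of this stage
    return elapsed_ps


xs = [1, 0, 1, 1]   # example inputs
ws = [3, 7, 2, 5]   # example weights
total_delay_ps = time_based_mac(xs, ws)
# Dividing by the unit delay recovers the value of the digital MAC.
recovered = total_delay_ps / T_UNIT_PS
```

In this idealized model the result is exact; in hardware the delay resolution is finite, which is the "moderate resolution" trade-off listed in the table.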

SLIDE 7

Comparison with Previous Time-based Neural Network

SLIDE 8

Proposed Time-based Neural Net

Figure: Leaky integrate & fire neuron with local lateral inhibition. Labels recovered from the schematic:

  • DCO with 128 programmable delay stages and an EN_DCO signal.
  • Stage inputs X0,X1, X2,X3, ..., X124,X125, X126,X127 with SRAM-stored weights W0,1<2:0>, W2,3<2:0>, ..., W124,125<2:0>, W126,127<2:0>.
  • 8b counter built from D flip-flops (D, Q, QB, rst) with outputs C0, C1, ..., C6, C7.
  • Compare & Fire block with a Threshold input and a SPIKE output.
  • Neuron control logic with LLI and LEAK inputs.

$T_{DCO} = \sum_i \mathrm{Delay}_i \propto \sum_i w_i \cdot X_i$
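A behavioral sketch of this neuron, assuming that one DCO period equals the sum of the stage delays, that the counter advances once per oscillation, and that it restarts after each spike. The function names and the integer time units are illustrative assumptions, not taken from the slides.

```python
def dco_period(stage_delays):
    """One oscillation period, taken as the sum of the stage delays."""
    return sum(stage_delays)


def count_spikes(stage_delays, threshold, window):
    """Count the spikes emitted during a time window.

    Assumption: the counter increments once per DCO oscillation, the neuron
    fires when the count reaches `threshold`, and the counter then restarts.
    """
    period = dco_period(stage_delays)
    oscillations = window // period
    return oscillations // threshold


# Hypothetical example in arbitrary integer time units:
# one DCO with light stage loads and one with heavy stage loads.
fast_dco = [1] * 128   # period = 128
slow_dco = [2] * 128   # period = 256
window = 128 * 16 * 10

fast_spikes = count_spikes(fast_dco, threshold=16, window=window)
slow_spikes = count_spikes(slow_dco, threshold=16, window=window)
```

Under these assumptions a DCO with larger accumulated delay oscillates more slowly and therefore spikes less often within the same window.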

SLIDE 9

Proposed Time-based Neural Net

  • Input pixel: Xi

− Determines whether a stage is activated or not.

  • Weight: Wi<2:0>

− Determines how many capacitors are turned on as load in that stage.

Figure: Programmable delay stage with input Xi and word line WL. Three SRAM cells per stage store wi<2>, wi<1>, wi<0>, which switch in load capacitors 4C, 2C and C. Unit cell layout (2 stages) with labeled dimensions 5.9µm and 8.1µm. BL and BLB are omitted for simplicity.
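The stage behavior described above can be sketched as a small load model. It assumes that an inactive stage (Xi = 0) contributes only a fixed base load and that the stage delay scales with the load; `base_load` and `unit_cap` are hypothetical parameters in arbitrary units.

```python
def stage_load(x, w, base_load=1.0, unit_cap=1.0):
    """Capacitive load of one delay stage, in arbitrary units.

    x is the 1-bit input pixel Xi; w is the 3-bit weight Wi<2:0>.
    Assumption: when x is 0 the stage is not activated and only the
    base load remains.
    """
    if x not in (0, 1):
        raise ValueError("input pixel must be 0 or 1")
    if not 0 <= w <= 7:
        raise ValueError("weight must fit in 3 bits")
    # Bits wi<2>, wi<1>, wi<0> switch in capacitors of 4C, 2C and C.
    switched_units = sum(((w >> bit) & 1) * (1 << bit) for bit in range(3))
    return base_load + x * switched_units * unit_cap


active_load = stage_load(1, 0b101)    # 4C + C switched in
inactive_load = stage_load(0, 0b111)  # stage not activated
```

Because the capacitors are binary weighted, the switched load equals the integer value of the weight, so eight load levels are available per stage.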

SLIDE 10

64x128 Time-based Neural Network

  • 8 DCO cores are grouped together to implement local lateral inhibition.
  • 64 DCO neuromorphic cores in total.
  • 121 out of 128 DCO stages are used as programmable inputs.
  • Remaining 7 stages are reserved for calibration.
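The array organization reduces to a few lines of arithmetic. The sketch below only restates the numbers on this slide; the constant names are made up for illustration, and the observation that 121 equals 11 × 11 is a note about the 11x11 input images used in the measurement slides.

```python
TOTAL_CORES = 64           # DCO neuromorphic cores
CORES_PER_LLI_GROUP = 8    # cores grouped for local lateral inhibition
STAGES_PER_DCO = 128       # programmable delay stages per DCO
INPUT_STAGES = 121         # stages used as programmable inputs
BITS_PER_WEIGHT = 3        # Wi<2:0>

calibration_stages = STAGES_PER_DCO - INPUT_STAGES
lli_groups = TOTAL_CORES // CORES_PER_LLI_GROUP
total_input_weights = TOTAL_CORES * INPUT_STAGES
total_input_weight_bits = total_input_weights * BITS_PER_WEIGHT

# 121 input stages are enough for one 11x11 image per core.
pixels_per_11x11_image = 11 * 11
```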

SLIDE 11

Frequency Calibration and Linearity Test

  • Frequency variation between 10 DCOs

− Before calibration: 1.17%, after calibration: 0.10%.

Figure: Frequency (a.u.) plots for the frequency calibration and the linearity test.

SLIDE 12

Bio-Inspired Features: Leaky Neuron and Local Lateral Inhibition (LLI)

  • Leaky neuron: Ions diffuse through the neuron cell.
  • Local lateral inhibition: Active neuron strives to suppress the activities of its neighbors.

Figure: Electrical modeling of cell membrane [3]; lateral inhibition: Mach band illusion [4].

[3] W. Gerstner, et al., Neuronal Dynamics.
[4] Wikipedia.

SLIDE 13

Time-based Leak and LLI

Figure: Time-based leaky integrate & fire neuron and time-based local lateral inhibition (LLI). Labels recovered from the schematic: counter of D flip-flops (D, Q, QB, rst) clocked from the DCO, with outputs C0, C1, ..., C6, C7; LEAK (LSB reset); LLI (neighbor counter bit reset); Compare & Fire block with Threshold input and SPIKE output; SPIKE<0>, SPIKE<1>, SPIKE<2>, ..., SPIKE<7>.

  • Leak enabled:

− LSB of every counter is reset periodically.

  • LLI enabled:

− Specific bits in the neighboring counters are reset after a DCO spikes.
− The fastest DCO resets the other DCOs more often than it is reset by others.
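The two mechanisms can be explored with a small behavioral simulation. It is a sketch under stated assumptions rather than a model of the actual circuit: time advances in integer ticks, each DCO completes one oscillation every `periods[i]` ticks, the counter restarts after a spike, and `leak_interval` and `lli_mask` are hypothetical parameters selecting how often the LSB is cleared and which neighbor bits are cleared.

```python
def simulate(periods, threshold, ticks, leak_interval=None, lli_mask=0):
    """Count spikes per neuron in a group of DCO-driven counters.

    periods[i]: number of ticks per oscillation of DCO i.
    leak_interval: if set, the LSB of every counter is cleared every
        `leak_interval` ticks (leak).
    lli_mask: bits cleared in the other counters whenever one neuron
        spikes (local lateral inhibition); 0 disables LLI.
    """
    counters = [0] * len(periods)
    spikes = [0] * len(periods)
    for tick in range(1, ticks + 1):
        if leak_interval and tick % leak_interval == 0:
            counters = [c & ~1 for c in counters]  # periodic LSB reset
        for i, period in enumerate(periods):
            if tick % period != 0:
                continue
            counters[i] += 1  # one DCO oscillation completed
            if counters[i] >= threshold:
                spikes[i] += 1
                counters[i] = 0  # assumed restart after firing
                for j in range(len(counters)):
                    if j != i:
                        counters[j] &= ~lli_mask  # inhibit the neighbors
    return spikes


periods = [2, 3]  # hypothetical: DCO 0 is faster than DCO 1
baseline = simulate(periods, threshold=4, ticks=48)
with_leak = simulate(periods, threshold=4, ticks=48, leak_interval=5)
with_lli = simulate(periods, threshold=4, ticks=48, lli_mask=0b10)
```

In this toy setup the baseline spike counts are 6 and 4, while enabling LLI gives 5 and 2: the slower neuron loses proportionally more spikes because it is reset by its faster neighbor more often than it resets that neighbor.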
SLIDE 14

Leak and LLI

  • Leak: Uniformly lower spiking frequency.
  • LLI: Preferentially lower spiking frequency.
  • Goal: Higher contrast between different neuron outputs.

Figure: Spike frequency vs. DCO No., comparing *None with LEAK and *None with LLI. Both plots are annotated "Sharper contrast".

*None: No leak and no LLI, basic DCO operation.

SLIDE 15

Handwritten Digit Recognition

  • Input database: MNIST.
  • Learning method: Supervised learning.
  • Learning network: Single-layer & multi-layer perceptron network.

SLIDE 16

Single-layer Digit Recognition

  • Single-layer architecture: Proof-of-concept for time-based neural network.

SLIDE 17

Multi-layer Digit Recognition

  • Multi-layer architecture: Demonstrates the scalability of the core.

SLIDE 18

Measurement Results

  • Measured recognition accuracy from hardware is comparable to software simulation results.

Figure: Recognition accuracy (%), axis from 82 to 94, for three configurations (single-layer with 11x11 images, two-layer with 11x11 images, two-layer with 4-patch 22x22 images), each comparing Simulation, Measured (*None) and Measured (Leaky). 65nm LP CMOS, 1.2V, 25°C.

*None: No leak and no LLI, basic DCO operation.

SLIDE 19

Measurement Results

  • Spike count difference between digit “2” and “0”

− Without LLI: 1.7%, with LLI: 17.7%.

Figure: Spike count (axis from 100 to 1700) vs. digit, with the target digit marked, comparing *None and LLI. Annotated differences: 1.7% and 17.7%. 65nm LP CMOS, 1.2V, 25°C.

SLIDE 20

Measurement Results

  • Wide operating range: 0.7V ~ 1.2V.

Figure: DCO frequency (MHz, axis from 20 to 100) and power (µW per DCO, axis from 70 to 350) vs. supply voltage (V, axis from 0.6 to 1.3). 65nm LP CMOS, 25°C.
SLIDE 21

Performance Comparison

| | This work | ASSCC'17 [5] | ISSCC'16 [6] | VLSI'15 [7] |
|---|---|---|---|---|
| Application | Handwriting recognition | Handwriting recognition | Object detection + intention prediction | Object recognition |
| Technology | 65nm | 65nm | 65nm | 65nm |
| Area | 0.24mm² (64 DCOs) | 3.61mm² (32K PEs) | 16.0mm² | 1.8mm² |
| Voltage | 1.2V | - | 1.2V | 0.45V |
| Frequency | 99MHz (nominal DCO freq.) | - | 250MHz | 40MHz (inference) |
| Function | Multi-layer perceptron network | Convolutional neural network | Deep neural network | Spiking LCA with classification |
| Circuit type | Time-based | Time-based | Analog + Digital | Digital |
| Power | 320.4µW/DCO | - | 330mW | 3.65mW |
| Power efficiency | 309G ÷ N spikes/s/W, N = spiking threshold (a); 37.4 TSOp/s/W (b); 37.4 TOPS/W (d); 0.43pJ/pixel, logic (e) | 48.2 TSOp/s/W | 862 GOPS/W | 5.7pJ/pixel (memory+logic) |
| Hardware efficiency | 16.6 GE/PE (c) | 76.5 GE/PE | - | - |

Notes

  • a. N=16 in our measurements.
  • b. SOp/s/W: Synaptic operation (SOp). In DCO based time-domain neural network, one oscillation of DCO is equivalent to 121 SOp.
  • c. 1GE: 1.44µm² (65nm). PE: processing element.
  • d. Operation: One operation is defined as one multiplication and accumulation (MAC). In DCO based time-domain neural network, one oscillation of DCO is equivalent to 121 3-bit MAC.
  • e. Used spiking threshold of 16, and only accounted for the power consumption of core logic circuits. Memory power is not included, since weight is not updated during the inference.

[5] D. Miyashita, et al., ASSCC, 2017.
[6] K. J. Lee, et al., ISSCC, 2016.
[7] J. K. Kim, et al., VLSI, 2015.
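The efficiency figures reported for this work can be cross-checked with a short calculation. The sketch below is one plausible derivation, not necessarily the authors' method: it assumes the per-pixel energy is the energy of one spike (16 oscillations) divided by 121 inputs, and that the area per processing element is half of the 5.9µm × 8.1µm two-stage unit cell from the programmable delay stage slide. All variable names are illustrative.

```python
F_DCO_HZ = 99e6             # nominal DCO frequency
P_DCO_W = 320.4e-6          # power per DCO
N_THRESHOLD = 16            # spiking threshold used in the measurements
SOP_PER_OSC = 121           # synaptic operations per DCO oscillation
UNIT_CELL_UM2 = 5.9 * 8.1   # unit cell layout area, covers 2 stages
GE_UM2 = 1.44               # area of one gate equivalent in 65nm

osc_per_joule = F_DCO_HZ / P_DCO_W               # oscillations/s/W
spikes_per_joule = osc_per_joule / N_THRESHOLD   # spikes/s/W
sop_per_joule = osc_per_joule * SOP_PER_OSC      # SOp/s/W

# Assumed derivation: energy of one spike spread over 121 input pixels.
energy_per_spike_j = P_DCO_W / F_DCO_HZ * N_THRESHOLD
energy_per_pixel_j = energy_per_spike_j / SOP_PER_OSC

# Assumed derivation: one processing element is one delay stage.
ge_per_pe = UNIT_CELL_UM2 / 2 / GE_UM2

print(f"{osc_per_joule / 1e9:.0f}G oscillations/s/W")
print(f"{sop_per_joule / 1e12:.1f} TSOp/s/W")
print(f"{energy_per_pixel_j * 1e12:.2f} pJ/pixel")
print(f"{ge_per_pe:.1f} GE/PE")
```

Under these assumptions the results round to 309G, 37.4 TSOp/s/W, 0.43 pJ/pixel and 16.6 GE/PE, which agrees with the "This work" column.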

SLIDE 22

Die Photo and Performance Summary

SLIDE 23

Conclusion

  • Neural network function is computed in time domain using standard digital circuits with high area and power efficiency.
  • Implemented brain-inspired leak and local lateral inhibition features to enhance the contrast between neuron outputs.
  • 65nm test chip measurements confirm 91% hand-written digit recognition accuracy.