  1. Neuro-Inspired Processor Design for On-Chip Learning and Classification with CMOS and Resistive Synapses
     Jae-sun Seo, School of ECEE, Arizona State University
     The 13th Korea-U.S. Forum on Nanotechnology, September 26, 2016

  2. ML Literature (DNN) vs. Neuromorphic (SNN) (Courtesy: Nuance; Song, PLoS Biol. 2005)
     ● Dense connectivity vs. sparse connectivity
     ● Learning done offline vs. online learning
     ● Back-propagation (requires labeled data) vs. STDP, SRDP, Reward (biological evidence)
     ● MNIST 99.79%, ImageNet 95% vs. MNIST 99.08%, ImageNet N/A
     ● What about unlabeled data? Adaptable for input change or customization? vs. continuous learning & detection
     ● Full computation on each layer → high power vs. sparse spiking, attention → low power

  3. Neuromorphic Core with On-Chip STDP (Seo, CICC, 2011)
     ● Under STDP learning, when neuron K spikes, all synapses on row K and column K may update
     ● Transposable SRAM: single-cycle read & write in both row and column directions
     ● Efficient pre- and post-synaptic update
     ● Near-threshold operation
     ● Pattern recognition
     [Die photo (2.05mm x 2.05mm): fully functional base design (64K synapse array), slim-neuron variant (64K synapse array), 4-b synapse variant (256K synapse array), low-leakage variant (64K synapse array); 20X retention mode]
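The row/column access pattern behind the transposable SRAM can be sketched in software; a minimal NumPy sketch, where the trace variables, learning rate, and 4-bit clipping range are illustrative assumptions, not values from the chip:

```python
import numpy as np

def stdp_update_on_spike(W, k, pre_trace, post_trace, lr=1.0):
    """Sketch of the access pattern on the slide: when neuron k spikes,
    synapses on row k (k as post-synaptic neuron) potentiate based on
    recent pre-synaptic activity, and synapses on column k (k as
    pre-synaptic neuron) depress where the post-synaptic neuron fired
    earlier. A transposable SRAM makes both the row access and the
    column access single-cycle in hardware; NumPy slicing stands in
    for that here.
    """
    W[k, :] += lr * pre_trace    # row k: LTP from recent pre-synaptic spikes
    W[:, k] -= lr * post_trace   # column k: LTD toward earlier post-synaptic spikes
    np.clip(W, 0, 15, out=W)     # assumed 4-bit synapses saturate at 0..15
    return W
```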

  4. Versatile Learning in Neuromorphic Core
     ● Various STDP learning rules (Feldman, Neuron 2012): weight change Δw subject to LTP/LTD timing windows around Δt
     ● Counter-based updates when a neuron spikes:
       LTP: w = w + [pre cnt.] + [post cnt.]
       LTD: w = w − [post cnt.] − [pre cnt.]
     ● A versatile neurosynaptic core to support various learning rules, large fan-in/-out, sparse connectivity
     ● Triplet STDP (Pfister, J. of Neuroscience, 2006; Gjorgjieva, PNAS 2011)
       ● post-pre-post: post-neuron spike & pre-neuron timing & post-neuron timing
       ● pre-post-pre: pre-neuron spike & post-neuron timing & pre-neuron timing
     [Figure: pre-synaptic neurons N1_0..N1_3 and post-synaptic neurons N3_0..N3_3 with spike counters, synapse weights wa0..wa3 and wb0..wb3; when N3 spikes, synapses wb* update; multi-factor and triplet-STDP timing diagrams]
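The counter-based LTP/LTD rules above translate directly to code; a minimal sketch, where the saturation bounds (`wmin=0`, `wmax=15`, matching a 4-bit synapse) are an assumption and not stated on this slide:

```python
def ltp(w, pre_cnt, post_cnt, wmax=15):
    """LTP: w = w + [pre cnt.] + [post cnt.], saturating at wmax."""
    return min(w + pre_cnt + post_cnt, wmax)

def ltd(w, pre_cnt, post_cnt, wmin=0):
    """LTD: w = w - [post cnt.] - [pre cnt.], saturating at wmin."""
    return max(w - post_cnt - pre_cnt, wmin)
```

The counters track recent pre- and post-synaptic spike activity, so the same two values drive both potentiation and depression, only with opposite signs.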

  5. Feedforward Excitation & Inhibition
     ● Joint feed-forward excitation and inhibition [1]
     ● For a small number of inhibitory neurons, add pre=>inh and inh=>post synapses
     ● Balance excitatory & inhibitory synaptic inputs (Vogels, Science, 2011)
     [Architecture: layer (i) neurons send spike packets with timing info over axons to a 1024x256 synapse array with decoder, driving 256 layer (i+1) neurons; recurrent connections through inhibitory neurons (TX => Inhib., Inh. => RX)]
     [1] Diehl, Front. of Neuroscience, 2015

  6. Neural Spike Sorting Processor (for deep brain sensing & stimulation)
     ● Signals from invasive electrodes contain spikes from multiple neurons
     ● Online, unsupervised neuromorphic spike-sorting processor: raw signal => detection & alignment => clustering (neuromorphic processor) => sorted output
     ● Weight update through STDP
     ● Start with K=2; automatically increases the number of output neurons if the spike difference is large enough (self-organizing map)
     ● Collaboration with Columbia University (ISLPED 2015)
     [Network: 8-bit encoder over 32 samples; inputs I_1..I_m, hidden neurons H_1..H_N, output neurons Z_1..Z_K]
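The grow-on-demand clustering described above can be sketched as follows; a minimal sketch, where `dist_thresh` and `lr` are illustrative parameters (on the processor the winner's weights are moved by STDP, not by this explicit centroid update):

```python
import numpy as np

def cluster_spikes(spikes, dist_thresh=4.0, lr=0.05):
    """Self-organizing clustering sketch: start with K=2 output
    neurons and add a new one whenever an incoming spike waveform
    is far from every existing cluster.
    """
    centroids = [spikes[0].copy(), spikes[1].copy()]   # K = 2 initially
    labels = []
    for s in spikes:
        d = [np.linalg.norm(s - c) for c in centroids]
        k = int(np.argmin(d))
        if d[k] > dist_thresh:            # spike difference large enough:
            centroids.append(s.copy())    # add an output neuron
            k = len(centroids) - 1
        else:                             # move the winner toward the input
            centroids[k] += lr * (s - centroids[k])
        labels.append(k)
    return np.array(labels), centroids
```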

  7. Experimental Results: Clustering Accuracy
     ● 65nm GP, high-Vth, 0.5x0.5mm² die; 9.3µW/ch at 0.3V
     ● Spike-sorting accuracy more reliable than other low-complexity algorithms such as Osort: avg. accuracy 91% vs. 69%
     ● Layout of the design is dominated by memory elements, as is power
     [Plots: receptive fields for a dataset containing 4 clusters in 3000 spikes; frequency (MHz) vs. VDD (0.2V to 1.1V) with 26µW/ch and 9.3µW/ch operating points, at 70 and 2.5 spikes/s/neuron; accuracy (%) of proposed (avg. 91%) vs. Osort-based (avg. 69%) across datasets D2, D3, D4, D4*, W-D1, W-D2; layout breakdown: input neurons, synapse array, decoder, output neurons, others]

  8. Neuromorphic Computing w/ NVMs
     ● Emerging NVMs (e.g. RRAM) could alleviate the power/area bottleneck of conventional memories
     ● Read rows in parallel: weighted-sum current
     ● Peripheral CMOS read: current-to-digital converter
     ● 130nm testchip: RRAM array + CMOS read circuits (under testing)
     [Simulation waveforms: V_in, RE, and V_spike (0.0V to 0.53V) over 1 to 8 ns, for a 4ns read timing window]
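The parallel weighted-sum read can be modeled numerically; a minimal sketch, assuming idealized cell conductances and modeling the current-to-digital converter as simple 4-bit quantization (the quantization scheme is an assumption, not from the slide):

```python
import numpy as np

def weighted_sum_read(G, v_in):
    """RRAM crossbar read sketch: with all rows driven in parallel,
    each column's current is the dot product of the row input
    voltages and that column's cell conductances, I_j = sum_i V_i * G_ij.
    The column currents are then digitized (here: normalized and
    rounded to 4 bits, standing in for the current-to-digital converter).
    """
    i_col = v_in @ G                                        # all rows read at once
    return np.round(i_col / i_col.max() * 15).astype(int)   # 4-bit ADC model
```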

  9. Summary
     ● Neuromorphic computing hardware
     ● 45nm testchip with on-chip STDP learning
     ● Versatile learning neuromorphic core & architecture
     ● 65nm spike clustering processor
     ● Emerging NVM arrays + peripheral read/write circuits
     ● Future research with circuit-device-architecture co-design and optimization

  10. Collaborators
     ● ASU
       ● Faculty: Yu Cao, Shimeng Yu, Chaitali Chakrabarti, Sarma Vrudhula, Visar Berisha
       ● Students: Minkyu Kim, Deepak Kadetotad, Shihui Yin, Abinash Mohanty, Yufei Ma
     ● Intel: Gregory Chen, Ram Krishnamurthy
     ● Columbia University: Mingoo Seok, Qi Wang
