The N3XT Technology for Brain-Inspired Computing
H.-S. Philip Wong
Department of Electrical Engineering, Stanford University
Stanford SystemX Alliance, 2015.04.15
[Opening image slides; sources: Google, vrworld.com, BDC Magazine, Gogamguro.com. One slide: 1988 Winter Olympic Games in Calgary, Canada.]
Application | Hardware used | Estimated power consumption
Emulating 4.5% of human brain (10^13 synapses, 10^9 neurons) | Blue Gene/P: 36,864 nodes, 147,456 cores | 2.9 MW (LINPACK)
Deep sparse autoencoder (10^9 synapses, 10M images) | 1,000 CPUs (16,000 cores) | ~100 kW (cores only)
Convolutional neural net (60M synapses, 650K neurons) | 2 GPUs | 1,200 W
Restricted Boltzmann machine (28M synapses, 69,888 neurons) | GPU; CPU | 550 W; 65 W
Processing 1 s of speech using a deep neural network | GPU; CPU (4 cores) | 238 W; 80 W

The brain-emulation entry is large scale; the remaining entries are small to moderate scale.
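To put these entries on one axis, a quick back-of-envelope divides power by synapse count. The ~20 W and ~10^15-synapse figures for the human brain are common estimates added for reference; they are not from the table.

```python
# Watts per synapse for a few of the table's entries plus a brain estimate.
entries = {
    "Blue Gene/P brain emulation": (2.9e6, 1e13),   # (watts, synapses)
    "Deep sparse autoencoder":     (1.0e5, 1e9),
    "ConvNet on 2 GPUs":           (1.2e3, 6e7),
    "Human brain (estimate)":      (2.0e1, 1e15),
}
for name, (watts, synapses) in entries.items():
    print(f"{name:>28}: {watts / synapses:.1e} W/synapse")
```

The gap between any of the hardware rows and the biological estimate is many orders of magnitude, which is the motivation for the rest of the talk.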
A map of brain-inspired computing by algorithm and hardware:

Algorithms: biology-based models/algorithms vs. conventional ML algorithms.
Hardware: conventional hardware (CPU, GPU, supercomputers, etc.), conventional hardware with analog non-volatile memory synapses, and neuromorphic hardware.

- Biology-based models on conventional hardware: brain emulation on BlueGene [7], HTM [3]
- Conventional ML on conventional hardware: "Cats on YouTube" ANNs: ConvNets, DNNs, DBNs [10-13]
- Biology-based models on neuromorphic hardware: Human Brain Project [20], TrueNorth [16], SpiNNaker [19]
- Biology-based models with analog non-volatile memory synapses: Hebbian learning and spike-based ANNs using PCM, RRAM, CBRAM
- Conventional ML with analog non-volatile memory synapses: ANN, RBM, and sparse learning using PCM, RRAM
N3XT: monolithic 3D integration of heterogeneous technologies, interleaved with thermal management layers:
- 3D resistive RAM for massive storage
- MRAM for quick access
- 1D CNFET and 2D FET layers for compute and RAM access
- 1D CNFET and 2D FET layers for compute, power, and clock

Aly et al., IEEE Computer, 2015
Candidate synaptic devices:

Metal-oxide resistive switching memory (RRAM): TiN top and bottom electrodes with a TiOx/HfOx switching layer in SiO2; an oxygen-vacancy filament forms between the electrodes. Demonstrated at small dimensions (25 nm, 12 nm, and 10 nm features shown).

Phase change memory (PCM): a phase change material between top and bottom electrodes, with an isolation layer defining the switching region; a cross-section of a partially reset state shows polycrystalline c-GST next to an amorphous region between TiN electrodes in SiO2.

Conductive bridge memory (CBRAM): an active top electrode over a solid electrolyte (e.g., Cu ions in the electrolyte); metal atoms form a conductive filament down to the bottom electrode.
Requirement: analog resistance change.
[Measured resistance (10^3 to 10^5 Ohm) vs. programming pulse number over ~2,500 pulses, with a zoomed view of pulses 2,100-2,300: the resistance moves in gradual increments rather than switching abruptly.]
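As a toy illustration of what "analog" means here, the sketch below nudges a device conductance up or down with SET/RESET pulses. The update model and its constants are assumptions for illustration, not a fit to the measured curves above.

```python
import numpy as np

rng = np.random.default_rng(0)
G_MIN, G_MAX = 1e-5, 1e-3            # conductance window ~ 10^3-10^5 Ohm

def apply_pulse(g, potentiate, step=0.02, sigma=0.3):
    """One SET (potentiate) or RESET (depress) pulse with cycle-to-cycle noise."""
    dg = step * ((G_MAX - g) if potentiate else -(g - G_MIN))
    return float(np.clip(g + dg * (1 + sigma * rng.standard_normal()), G_MIN, G_MAX))

g = G_MIN
for n in range(1, 101):
    g = apply_pulse(g, potentiate=(n % 50) < 25)   # alternate SET/RESET bursts
    if n % 25 == 0:
        print(f"pulse {n:3d}: R = {1 / g:9.0f} Ohm")
```

The soft saturation toward G_MIN/G_MAX mirrors the gradual, bounded resistance steps in the measurement.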
Synaptic plasticity measured in these devices:

- Various time constants: synaptic weight change Δw (%) vs. spike timing Δt (ms), with exponential fits for LTP-1/2/3 (τ = 11, 20.7, 29.5 ms) and LTD-1/2/3 (τ = -11.3, -18.6, -29 ms).
- Various STDP kernels: different Δw vs. Δt kernel shapes can be realized.
- Weight update saturation: Δw (%) vs. number of pre/post spike pairs for Δt = 10, 15, 20, 25, 35, 45 ms; the weight change saturates as spike pairs accumulate.
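A minimal sketch of the exponential STDP kernel behind the "various time constants" panel, using the τ values read off the fits above; the peak amplitudes A_PLUS/A_MINUS are assumed for illustration, not measured data.

```python
import numpy as np

A_PLUS, A_MINUS = 60.0, -40.0   # peak weight change in percent (assumed)

def stdp_dw(dt_ms, tau_ltp=20.7, tau_ltd=-18.6):
    """Weight change (%) for a pre/post spike pair separated by dt_ms.

    dt_ms > 0 (pre before post) -> potentiation; dt_ms < 0 -> depression.
    tau_ltd is negative, matching the sign convention of the measured fits.
    """
    if dt_ms >= 0:
        return A_PLUS * np.exp(-dt_ms / tau_ltp)
    return A_MINUS * np.exp(-dt_ms / tau_ltd)

for dt in (5.0, 20.0, -5.0, -20.0):
    print(f"dt = {dt:+5.1f} ms -> dw = {stdp_dw(dt):+6.1f} %")
```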
ENIGMA: elements of hyperdimensional computing, also known as vector symbolic architectures or holographic reduced representation (Kanerva, Cognitive Computation, 1(2):139-159, 2009):

- Information is carried by hyperdimensional vectors (e.g., D = 10,000).
- Vector algebra: multiplication for binding, addition for bundling; the results compose further.
- Noisy vectors are cleaned up to exact ones using content-addressable memory.
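A minimal sketch of binding and bundling with dense binary hypervectors. The record encoding and helper names (rand_hv, bundle, etc.) are illustrative choices, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10_000                      # e.g., D = 10,000

def rand_hv():
    """A random dense binary hypervector."""
    return rng.integers(0, 2, D, dtype=np.uint8)

def bind(a, b):
    """Binding: XOR in the {0,1} system (multiplication in {-1,+1})."""
    return a ^ b

def bundle(*vs):
    """Bundling: elementwise addition with a majority threshold."""
    return (np.sum(vs, axis=0) > len(vs) / 2).astype(np.uint8)

def hamming(a, b):
    """Normalized Hamming distance; ~0.5 for unrelated vectors."""
    return float(np.mean(a != b))

# Encode a record {name: alice, age: thirty} by binding roles to fillers.
name_r, age_r = rand_hv(), rand_hv()
alice, thirty = rand_hv(), rand_hv()
record = bundle(bind(name_r, alice), bind(age_r, thirty), rand_hv())

# Unbinding the 'name' role gives a noisy 'alice'; a content-addressable
# memory would clean it up to the exact stored vector.
noisy = bind(record, name_r)
print(hamming(noisy, alice))    # ~0.25: recognizably 'alice'
print(hamming(noisy, thirty))   # ~0.5: unrelated
```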
The hyperdimensional computing pipeline:

- Application: letters, image features, phonemes, DNA sequences, ... projected into hyperdimensional space.
- Representation: random vectors of 1k to 10k bits (e.g., 011101010101..., 101100100101...).
- Computation: MAP kernels (Multiplication-Addition-Permutation): v1 × v2, sum(v1, v2, ...), perm(v1).
- Inference: measure the 'distance' between learned vectors and an unknown vector and pick the 'closest' (recognition, classification, reasoning, etc.).

Research spans the algorithms, the system and circuit design, and associative memory enabled by novel device technologies.
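A sketch of the inference step as an associative-memory lookup: one learned prototype per class, nearest match by Hamming distance. The class names and the noise level are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 10_000

def noisy_copy(v, flip=0.2):
    """Flip a fraction of bits to mimic an unseen, noisy example."""
    return v ^ (rng.random(D) < flip).astype(np.uint8)

# Associative memory: one learned prototype hypervector per class.
prototypes = {name: rng.integers(0, 2, D, dtype=np.uint8)
              for name in ("letters", "phonemes", "dna")}

# Inference: the unknown vector is classified by minimum Hamming distance.
query = noisy_copy(prototypes["phonemes"])
best = min(prototypes, key=lambda k: np.mean(prototypes[k] != query))
print(best)   # -> "phonemes"
```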
Hardware: 3D RRAM with FinFET bit-line select, i.e., 3D RRAM plus low-power access transistors and address decoders, giving low-power computation and high-density inter-layer vias.
[Cross-section: four stacked RRAM layers, Layer 1 (L1) through Layer 4 (L4), with TiN (20 nm) and TiN/Ti (50 nm) electrodes (BE/TE); 50 nm scale bar.]
MAP kernels executed directly in the 3D RRAM array:

- Multiplication: in the {-1, 1} system, elementwise multiplication is equivalent to XOR in the {0, 1} system.
- Addition: measured current vs. addition cycle on a 4-layer 3D vertical RRAM (symbols: 1-pillar experiment; line: 2-pillar emulated), stepping through bit patterns 11111111, 01111111, 00111111, 00011111, ...
- Permutation.

[Measured waveforms: 200 ns VDD/gnd pulses across layers L1-L4, panels (a) and (b). Logic evaluation: resistance vs. logic evaluation cycle for inputs A, B, C, D, with input AB = pillar address = 10, C = 0, D = 1. Measured HRS: 400 kΩ to 1 MΩ; measured LRS: ~10 kΩ.]
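The multiplication/XOR equivalence that lets binding run as an array operation is easy to verify; a quick check under the mapping 0 ↔ +1, 1 ↔ -1:

```python
import numpy as np

a01 = np.array([0, 1, 1, 0, 1], dtype=np.uint8)
b01 = np.array([1, 1, 0, 0, 1], dtype=np.uint8)

to_pm1 = lambda v: (1 - 2 * v.astype(np.int8))        # 0 -> +1, 1 -> -1
to_01 = lambda v: ((1 - v) // 2).astype(np.uint8)     # +1 -> 0, -1 -> 1

prod = to_pm1(a01) * to_pm1(b01)                      # multiply in {-1,+1}
assert np.array_equal(to_01(prod), a01 ^ b01)         # equals XOR in {0,1}
print(a01 ^ b01)                                      # [1 0 1 0 0]
```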
Aly et al., IEEE Computer, 2015
Acknowledgments: Gert Cauwenberghs, Siddharth Joshi, Emre Neftci (UC San Diego); Jinfeng Kang (Peking U); Chung Lam, SangBum Kim, Matt Brightsky; K.S. Lee, J.M. Shieh, W.K. Yen, ... (NDL, Taiwan)
Funding: Non-Volatile Memory Technology Research Initiative; E2CDA "Enigma"
Open questions:
1. Functionality → performance/Watt, performance/mm² → variability → reliability
2. Scale up (system size), scale down (device size)
3. Role of variability (functionality, performance)
4. Fan-in / fan-out, hierarchical connections, power delivery
5. Low voltage: wire energy ≅ device energy (see the sketch after this list)
6. Stochastic learning behavior → statistical learning rules
7. Meta-plasticity (internal state variables)
8. Timing as an internal variable
9. Learning rules: biological? AI?
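A rough calculation behind point 5. The wire capacitance per unit length and the device switching energy below are typical assumed values, not numbers from the talk.

```python
C_WIRE_F_PER_M = 0.2e-9   # ~0.2 fF/um of on-chip wire capacitance (typical)
E_DEVICE_J = 1e-12        # ~1 pJ assumed NVM device switching energy

def wire_energy(length_m, vdd):
    """Energy for one full-swing bit on a wire, E = C * V^2."""
    return C_WIRE_F_PER_M * length_m * vdd ** 2

for vdd in (1.0, 0.5, 0.2):
    e = wire_energy(1e-3, vdd)   # a 1 mm on-chip wire
    print(f"VDD = {vdd} V: wire {e * 1e12:.2f} pJ vs device {E_DEVICE_J * 1e12:.1f} pJ")
# At ~1 V, a millimeter of wire already costs a sizable fraction of a device
# switch; lowering VDD (and shortening wires via 3D integration) attacks both.
```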
Monolithic 3D integration is a must, and it MUST be a low-temperature process.