The N3XT Technology for Brain-Inspired Computing
H.-S. Philip Wong, Stanford University


SLIDE 1

The N3XT Technology for Brain-Inspired Computing

H.-S. Philip Wong
Department of Electrical Engineering, Stanford University
Stanford SystemX Alliance, 2015.04.15

SLIDE 2

Source: Google

SLIDE 3

Source: vrworld.com

SLIDE 4

Source: BDC Magazine

SLIDE 5

1988 Winter Olympic Games in Calgary, Canada

SLIDE 6

SLIDE 7

Source: Gogamguro.com

SLIDE 8

100s of kW

Source: Google

SLIDE 9

Scale Up Requires Energy Efficiency

Application | Hardware used | Estimated power consumption
Emulating 4.5% of the human brain: 10^13 synapses, 10^9 neurons | Blue Gene/P: 36,864 nodes, 147,456 cores | 2.9 MW (LINPACK)
Deep sparse autoencoder: 10^9 synapses, 10M images | 1,000 CPUs (16,000 cores) | ~100 kW (cores only)
Convolutional neural net: 60M synapses, 650K neurons | 2 GPUs | 1,200 W
Restricted Boltzmann machine: 28M synapses, 69,888 neurons | GPU; CPU | 550 W; 65 W
Processing 1 s of speech using a deep neural network | GPU; CPU (4 cores) | 238 W; 80 W

(The first two rows are large scale; the last three are small to moderate scale.)

  • S. B. Eryilmaz et al., IEDM 2015
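To see why scale-up demands energy efficiency, a back-of-envelope extrapolation from the first row of the table is instructive. A sketch in Python; the ~20 W figure for the biological brain is a common estimate consistent with the "incandescent light bulb" comparison quoted on the next slide, not a number from the table:

```python
# Extrapolate the Blue Gene/P brain-emulation row to a full brain.
fraction_emulated = 0.045   # 4.5% of the human brain
power_used_mw = 2.9         # MW consumed for that fraction (LINPACK)

full_brain_mw = power_used_mw / fraction_emulated
print(f"Full-brain emulation at this efficiency: ~{full_brain_mw:.0f} MW")  # ~64 MW

brain_power_w = 20          # assumed: rough power budget of the biological brain
gap = full_brain_mw * 1e6 / brain_power_w
print(f"Efficiency gap vs. biology: ~{gap:.0e}x")   # roughly a million-fold
```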
SLIDE 10

These nanotechnology innovations will have to be developed in close coordination with new computer architectures, and will likely be informed by our growing understanding of the brain—a remarkable, fault-tolerant system that consumes less power than an incandescent light bulb.

SLIDES 11-19

Approaches of Neuromorphic Hardware

(This matrix is built up incrementally across slides 11-19; the complete version:)

Hardware \ Algorithms | Biology-based models / algorithms | Conventional ML algorithms
Conventional hardware (CPU, GPU, supercomputers, etc.) | Brain emulation on BlueGene [7]; HTM [3] | "Cats on YouTube" ANNs: ConvNets, DNNs, DBNs [10-13]
Conventional hardware with analog non-volatile memory synapses | Hebbian learning, spike-based ANN (PCM, RRAM, CBRAM) | ANN, RBM, sparse learning (PCM, RRAM)
Neuromorphic hardware | Human Brain Project [20]; TrueNorth [16]; SpiNNaker [19] |
SLIDE 20

Many of these breakthroughs will require new kinds of nanoscale devices and materials integrated into three-dimensional systems and may take a decade or more to achieve.

SLIDES 21-22

N3XT Nanosystems

[Diagram: computation immersed in memory; layers of memory and computing logic joined by ultra-dense vertical connections. Impossible with today's technologies.]

SLIDE 23

N3XT: Computation Immersed in Memory

[Layer stack, top to bottom:]
- MRAM: quick access
- 3D Resistive RAM: massive storage
- 1D CNFET / 2D FET: compute, RAM access (two such layers)
- 1D CNFET / 2D FET: compute, power, clock
- Thermal layers interleaved between the above
- Joined by ultra-dense, fine-grained vias (not TSV); silicon compatible
SLIDE 24

Aly et al., IEEE Computer, 2015

SLIDE 25

Non-Volatile Memory (NVM)

[Device schematics and TEM images; scale markers 25 nm, 12 nm, 10 nm:]
- Metal-oxide resistive switching memory (RRAM): TiN top and bottom electrodes around a TiOx/HfOx switching layer on SiO2; an oxygen-vacancy filament, formed and ruptured by oxygen-ion motion between the electrodes, carries the current.
- Phase change memory (PCM): TiN top and bottom electrodes; a phase change material switching region (polycrystalline c-GST, shown with an amorphous region in the partially reset state) over SiO2 isolation.
- Conductive bridge memory (CBRAM): active top electrode, solid electrolyte, bottom electrode; metal atoms (e.g., Cu ions) form a filament through the electrolyte.

  • D. Kuzum et al., Nano Lett. 2013; Y. Wu et al., IEDM 2013; A. Calderoni et al., IMW 2014

SLIDE 26

Non-Volatile Memory (NVM) → Synapse

  • Analog programmable
  • Scalable to a few nm
  • Stackable in 3D

[Same RRAM, PCM, and CBRAM device schematics as Slide 25.]

  • D. Kuzum et al., Nano Lett. 2013; Y. Wu et al., IEDM 2013; A. Calderoni et al., IMW 2014

SLIDE 27

Nanoscale Memory as Synaptic Weights

Synaptic updates in the brain are the basis for learning.
Requirement: analog resistance change.

[Plots: resistance (10^3 to 10^5 Ohm) vs. pulse number for a phase-change synapse; trains of partial-RESET and partial-SET pulses step the resistance through a 100-step grey scale (1% resolution).]

  • D. Kuzum et al., Nano Lett., p. 2179 (2012)
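As a software stand-in, the grey-scale programming above can be modeled as a counter with 100 levels. A toy sketch; the uniform step size and linear response are simplifying assumptions, not device physics:

```python
class PCMSynapse:
    """Toy 100-level phase-change synapse: partial-SET pulses nudge the
    conductance up, partial-RESET pulses nudge it down (level count follows
    the slide's 100-step, 1% grey scale; all else is assumed)."""

    def __init__(self, levels=100):
        self.levels = levels
        self.state = levels // 2          # start mid-range

    def partial_set(self):                # small crystallization step
        self.state = min(self.levels - 1, self.state + 1)

    def partial_reset(self):              # small amorphization step
        self.state = max(0, self.state - 1)

    @property
    def weight(self):                     # normalized synaptic weight in [0, 1]
        return self.state / (self.levels - 1)

syn = PCMSynapse()
for _ in range(30):
    syn.partial_set()                     # a train of 30 partial-SET pulses
print(f"weight after 30 SET pulses: {syn.weight:.2f}")   # ~0.8
```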
SLIDE 28

Nanoscale Memory Can Emulate Biological Synaptic Behavior

[Measured phase-change synapse data:]
- STDP (spike-timing-dependent plasticity): synaptic weight change Δw (%) vs. spike timing Δt (ms).
- Various time constants: exponential fits LTP-1/2/3 and LTD-1/2/3 with time constants of 11, 20.7, 29.5 ms and -11.3, -18.6, -29 ms.
- Various STDP kernels: four different Δw-vs-Δt shapes programmed on the same device.
- Weight update saturation: Δw vs. number of pre/post spike pairs, for pulse timings from 10 ms to 45 ms.

  • D. Kuzum et al., Nano Lett., p. 2179 (2012)
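The measured behavior is well captured by the standard exponential STDP kernel. A sketch using time constants from the slide's range; the amplitudes are illustrative, not fitted values:

```python
import numpy as np

def stdp_dw(dt_ms, a_ltp=0.6, a_ltd=-0.3, tau_ltp=20.7, tau_ltd=18.6):
    """Exponential STDP kernel: fractional weight change vs. spike timing
    dt = t_post - t_pre. Time constants are picked from the ~11-30 ms
    range reported on this slide; amplitudes are illustrative."""
    dt = np.asarray(dt_ms, dtype=float)
    return np.where(dt >= 0,
                    a_ltp * np.exp(-dt / tau_ltp),   # dt > 0: potentiation (LTP)
                    a_ltd * np.exp(dt / tau_ltd))    # dt < 0: depression (LTD)

for dt in (5, 20, -5, -20):
    print(f"dt = {dt:+3d} ms -> dw = {100 * float(stdp_dw(dt)):+.1f}%")
```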
SLIDE 32

Hyperdimensional (HD) Computing

  • Information is carried in a high-dimensional representation (e.g., D = 10,000).
  • Variables and values are combined into a "holistic" record using vector algebra: multiplication for binding, addition for bundling.
  • A composed vector can in turn become a component in further composition.
  • A holistic record is decoded with (inverse) multiplication.
  • Approximate results of vector operations are matched to exact ones using content-addressable memory.

ENIGMA. Elements of hyperdimensional computing, also known as vector symbolic architectures or holographic reduced representation (Kanerva, Cognitive Computation, 1(2):139-159, 2009).
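These operations fit in a few lines of NumPy. A minimal sketch of bind, bundle, and decode; the role/filler names and the seed are illustrative, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10_000                       # dimensionality, as on the slide

def hv():                        # random bipolar {-1, +1} hypervector
    return rng.choice([-1, 1], size=D)

def cos(a, b):                   # cosine similarity between hypervectors
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Bind role-filler pairs with elementwise multiplication, then bundle
# them with addition + sign into one "holistic" record.
country, capital, usa, washington = hv(), hv(), hv(), hv()
record = np.sign(country * usa + capital * washington)

# Decode: bipolar multiplication is its own inverse, so record * country
# is a noisy copy of usa; clean it up with a nearest-neighbor search over
# stored items (the content-addressable-memory step).
query = record * country
items = {"usa": usa, "washington": washington}
print(max(items, key=lambda name: cos(query, items[name])))   # -> usa
```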

SLIDES 33-36

HD Computing Layers

(Built up across slides 33-36; the full stack:)

- Application: letters, image features, phonemes, DNA sequences, ...
- Representation: inputs projected into hyperdimensional space as random vectors of 1k-10k bits (e.g., 011101010101..., 101100100101...).
- Computation (algorithms): MAP kernels (Multiplication-Addition-Permutation): v1 * v2, sum(v1, v2, ...), perm(v1).
- Inference (system and circuit design): measure the "distance" between learned and unseen vectors to find the "closest" match (recognition, classification, reasoning, etc.).
- Associative memory enabled by novel device technologies.
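A minimal sketch of the MAP kernels on binary vectors, with normalized Hamming distance as the "closest?" measure. The dimension, seed, and three-vector majority bundling are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
D = 1_000                                    # 1k-bit vectors (low end of the range)

v1, v2, v3 = (rng.integers(0, 2, D, dtype=np.uint8) for _ in range(3))

bound   = np.bitwise_xor(v1, v2)             # Multiplication (binding)
bundled = ((v1.astype(int) + v2 + v3) >= 2).astype(np.uint8)  # Addition (majority)
shifted = np.roll(v1, 1)                     # Permutation (encodes sequence order)

def dist(a, b):                              # normalized Hamming distance
    return np.count_nonzero(a != b) / len(a)

print(dist(bundled, v1))                     # ~0.25: a bundle stays near its parts
print(dist(bundled, rng.integers(0, 2, D, dtype=np.uint8)))   # ~0.5: unrelated
```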


SLIDE 37

Hyperdimensional (HD) Computing

  • Monolithic 3D enables:
    – Energy-efficient classification
    – Area-efficient HD projection → use RRAM variability and stochastic switching
  • 3D RRAM + low-power access transistors + address decoders
  • Low-power computation
  • High-density inter-layer vias
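The idea of using stochastic RRAM switching for area-efficient HD projection can be sketched as follows. The Bernoulli switching model with p = 0.5 is an assumption for illustration, not measured device behavior:

```python
import numpy as np

rng = np.random.default_rng(2)

def item_memory(symbols, D=10_000, p_switch=0.5):
    """Sketch of HD projection from stochastic switching: each RRAM cell
    is modeled as switching with probability p_switch, so one array write
    yields an independent random binary hypervector per input symbol."""
    return {s: (rng.random(D) < p_switch).astype(np.uint8) for s in symbols}

memory = item_memory("abc")      # one random hypervector per symbol
print(memory["a"][:16])
```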

SLIDE 38

3D Enables In-Memory Computing

[TEM cross-section: four stacked RRAM layers (L1-L4) with TiN bottom electrodes and a TiN/Ti (50 nm) top electrode, TiN (20 nm) films, ~50 nm scale; 3D RRAM with FinFET bit-line select.]

  • H. Li et al., Symp. VLSI Tech., 2016
SLIDE 39

MAP Kernels: 3D RRAM Approach

Key HD operations: multiplication, addition, permutation.
- Multiplication in the {-1, 1} system is equivalent to XOR in the {0, 1} system.
- Addition and permutation are evaluated directly on the 3D RRAM array: VDD/gnd patterns applied across layers L1-L4 with ~200 ns pulses, read out as resistance or summed current.

[Measured data on a 4-layer 3D vertical RRAM: HRS 400 kΩ-1 MΩ, LRS ~10 kΩ; resistance → logic evaluation (input AB = pillar address; outputs C = 0, D = 1); current vs. addition cycle (symbols: 1-pillar experiment; lines: 2-pillar emulated).]
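The stated equivalence between multiplication in the {-1, 1} system and XOR in the {0, 1} system can be checked directly. A sketch; x → 1 - 2x is the usual encoding between the two systems:

```python
import numpy as np

rng = np.random.default_rng(3)
a01, b01 = rng.integers(0, 2, 8), rng.integers(0, 2, 8)

# Encode {0,1} -> {-1,1} with x -> 1 - 2x, multiply, then map back.
prod = (1 - 2 * a01) * (1 - 2 * b01)        # multiplication in the {-1,1} system
back = (1 - prod) // 2                      # decode back to {0,1}

assert np.array_equal(back, np.bitwise_xor(a01, b01))   # identical to XOR
print(back)
```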

SLIDE 40

3D Integration of Memory and Logic Circuits within the Same Layer and Across Layers

[Diagram: Logic (Si CMOS) for neuron circuits and communication; synapses/weights in CNFET / 2D FET + RRAM layers; synapses/weights in 3D RRAM.]

SLIDE 41

Nano-Engineered Computing Systems Technology (N3XT)

Aly et al., IEEE Computer, 2015

SLIDE 42

Students and Post-Docs

SLIDE 43

Collaborators

Gert Cauwenberghs, Siddharth Joshi, Emre Neftci (UC San Diego); Jinfeng Kang (Peking U); Chung Lam, SangBum Kim, Matt Brightsky; K. S. Lee, J. M. Shieh, W. K. Yen… (NDL, Taiwan)

SLIDE 44

Sponsors

Non-Volatile Memory Technology Research Initiative; E2CDA “ENIGMA”

SLIDE 45

Stanford SystemX Alliance

SLIDE 46

Non-Volatile Memory Technology Research Initiative (NMTRI) @ Stanford University

SLIDE 47

End of Talk. Questions?


SLIDE 48

Open Research Questions

1. Functionality → performance/Watt, performance/m² → variability → reliability
2. Scale up (system size), scale down (device size)
3. Role of variability (functionality, performance)
4. Fan-in / fan-out, hierarchical connections, power delivery
5. Low voltage (wire energy ≅ device energy; see the sketch after this list)
6. Stochastic learning behavior → statistical learning rules
7. Meta-plasticity (internal state variables)
8. Timing as an internal variable
9. Learning rules: biological? AI?
10. Algorithm-device co-design
11. Materials/fabrication: monolithic 3D integration is a must, and it MUST be low temperature
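Question 5 in numbers: a back-of-envelope comparison shows why wire energy pushes toward low voltage. All values here are generic rules of thumb (assumptions), not figures from this talk:

```python
# When does moving a bit across a chip cost as much as switching a device?
c_per_um = 0.2e-15          # ~0.2 fF/um of on-chip wire capacitance (assumed)
length_um = 1_000           # a 1 mm on-chip wire
v = 1.0                     # supply voltage in volts

e_wire = c_per_um * length_um * v**2        # CV^2 energy per wire transition
e_device = 1e-15                            # ~1 fJ per device switch (assumed)

print(f"wire: {e_wire * 1e15:.0f} fJ, device: {e_device * 1e15:.0f} fJ")
print(f"ratio: {e_wire / e_device:.0f}x")   # wires dominate; energy scales as V^2
```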