Learning on Silicon: Overview



SLIDE 1
  • G. Cauwenberghs

520.776 Learning on Silicon

Learning on Silicon: Overview

Gert Cauwenberghs

Johns Hopkins University
gert@jhu.edu
520.776 Learning on Silicon

http://bach.ece.jhu.edu/gert/courses/776

SLIDE 2

Learning on Silicon: Overview

  • Adaptive Microsystems
      – Mixed-signal parallel VLSI
      – Kernel machines
  • Learning Architecture
      – Adaptation, learning and generalization
      – Outer-product incremental learning
  • Technology
      – Memory and adaptation
          • Dynamic analog memory
          • Floating gate memory
      – Technology directions
          • Silicon on Sapphire
  • System Examples

SLIDE 3

Massively Parallel Distributed VLSI Computation

  • Neuromorphic
      – distributed representation
      – local memory and adaptation
      – sensory interface
      – physical computation
      – internally analog, externally digital
  • Scalable
      – throughput scales linearly with silicon area
  • Ultra Low-Power
      – factor 100 to 10,000 less energy than CPU or DSP

Example: VLSI analog-to-digital vector quantizer (Cauwenberghs and Pedroni, 1997)

SLIDE 4

Learning on Silicon

[Figure: adaptive SYSTEM with parameters {p_i}, INPUTS and OUTPUTS, and error measure ε(p), shown once with an external REFERENCE and once with a MODEL]

Adaptation:

– necessary for robust performance under variable and unpredictable conditions
– also compensates for imprecisions in the computation
– avoids ad-hoc programming, tuning, and manual parameter adjustment

Learning:

– generalization of output to previously unknown, although similar, stimuli
– system identification to extract relevant environmental parameters

SLIDE 5

Adaptive Elements

Adaptation:

  • Autozeroing (high-pass filtering): outputs
  • Offset Correction: outputs
      – e.g. Image Non-Uniformity Correction
  • Equalization / Deconvolution: inputs, outputs
      – e.g. Source Separation; Adaptive Beamforming

Learning:

  • Unsupervised Learning: inputs, outputs
      – e.g. Adaptive Resonance; LVQ; Kohonen
  • Supervised Learning: inputs, outputs, targets
      – e.g. Least Mean Squares; Backprop
  • Reinforcement Learning: reward/punishment

SLIDE 6

Example: Learning Vector Quantization (LVQ)

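[Figure: n × m array of distance cells δ(a_j, α^i_j) with inputs a_1 … a_m and templates α^1 … α^n; the row distances d(a, α^i) feed a winner-take-all (WTA) circuit that outputs the winner k]

Distance Calculation:

$d(a, \alpha^i) = \sum_j \delta(a_j, \alpha^i_j) = \sum_j |a_j - \alpha^i_j|^\nu$

Winner-Take-All Selection:

$k = \arg\min_i d(a, \alpha^i)$

Training:

$\alpha^k_j \leftarrow (1 - \lambda)\, \alpha^k_j + \lambda\, a_j$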
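The three LVQ steps on this slide (distance calculation, winner-take-all selection, training of the winning template) can be sketched in a few lines of Python. This is a minimal software illustration, not the analog VLSI implementation; the function names, the default exponent ν = 1, and the example vectors are assumptions made for the sketch.

```python
def distance(a, template, nu=1):
    """d(a, alpha^i) = sum_j |a_j - alpha^i_j|^nu."""
    return sum(abs(aj - tj) ** nu for aj, tj in zip(a, template))


def winner(a, templates, nu=1):
    """Winner-take-all: index k of the template with the smallest distance."""
    return min(range(len(templates)), key=lambda i: distance(a, templates[i], nu))


def train_step(a, templates, lam=0.1, nu=1):
    """Move only the winning template toward the input:
    alpha^k_j <- (1 - lam) * alpha^k_j + lam * a_j."""
    k = winner(a, templates, nu)
    templates[k] = [(1 - lam) * t + lam * aj for t, aj in zip(templates[k], a)]
    return k


# Hypothetical example: two 2-D templates and one input vector.
templates = [[0.0, 0.0], [1.0, 1.0]]
k = train_step([0.9, 0.7], templates, lam=0.5)
print(k, templates)
```

Only the winning row is modified, which matches the local, per-template update implied by the training rule.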

SLIDE 7

Incremental Outer-Product Learning in Neural Nets

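[Figure: synapse p_ij connecting node j (activation x_j, error e_j) to node i (activation x_i, error e_i)]

Multi-Layer Perceptron:

$x_i = f\left(\sum_j p_{ij}\, x_j\right)$

Outer-Product Learning Update:

$\Delta p_{ij} = \eta\, x_j \cdot e_i$

– Hebbian (Hebb, 1949): $e_i = x_i$
– LMS Rule (Widrow-Hoff, 1960): $e_i = f'_i \cdot (x_i^{target} - x_i)$
– Backpropagation (Werbos, Rumelhart, LeCun): $e_j = f'_j \cdot \sum_i p_{ij}\, e_i$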
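The outer-product update and its error terms can be illustrated with a short Python sketch. It assumes f = tanh (so f' = 1 − x²), a single layer trained on one pattern, and hypothetical weights, inputs, and targets; none of these choices come from the slide.

```python
import math


def forward(p, x_in):
    """x_i = f(sum_j p_ij x_j), with f = tanh as an assumed nonlinearity."""
    return [math.tanh(sum(pij * xj for pij, xj in zip(row, x_in))) for row in p]


def outer_product_update(p, x_in, e_out, eta):
    """Delta p_ij = eta * x_j * e_i, applied in place."""
    for i, row in enumerate(p):
        for j in range(len(row)):
            row[j] += eta * x_in[j] * e_out[i]


def lms_error(x_out, target):
    """e_i = f'_i * (x_i^target - x_i), with f'_i = 1 - x_i^2 for tanh."""
    return [(1 - xi * xi) * (ti - xi) for xi, ti in zip(x_out, target)]


def backprop_error(p, x_in, e_out):
    """e_j = f'_j * sum_i p_ij e_i, assuming x_j is itself a tanh output."""
    return [(1 - xj * xj) * sum(p[i][j] * e_out[i] for i in range(len(p)))
            for j, xj in enumerate(x_in)]


def squared_error(p, x_in, target):
    return sum((t - x) ** 2 for t, x in zip(target, forward(p, x_in)))


# Hypothetical single-layer example trained with the LMS error term.
p = [[0.1, -0.2], [0.0, 0.3]]
x_in, target = [1.0, 0.5], [0.5, -0.5]
err_before = squared_error(p, x_in, target)
for _ in range(200):
    outer_product_update(p, x_in, lms_error(forward(p, x_in), target), eta=0.5)
err_after = squared_error(p, x_in, target)
print(err_before, err_after)
```

The same `outer_product_update` serves all three rules; only the way the error vector is produced differs.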

SLIDE 8

Technology

Incremental Adaptation:

– Continuous-Time: $C \frac{d}{dt} V_{stored} = I_{adapt}$
– Discrete-Time: $C \, \Delta V_{stored} = Q_{adapt}$

Storage:

– Volatile capacitive storage (incremental refresh)
– Non-volatile storage (floating gate)

Precision:

– Only polarity of the increments is critical (not amplitude).
– Adaptation compensates for inaccuracies in the analog implementation of the system.

[Figure: storage capacitor C holding Vstored and driven by Iadapt, and a floating-gate transistor (terminals G, D, S) adapted by Qadapt]
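The claim that only the polarity of the increments is critical can be illustrated with a sign-based variant of LMS, in which every parameter moves by a fixed step and only the direction depends on the data. The sign-sign rule, the Gaussian inputs, the step size, and the target weights below are assumptions chosen for illustration; the slide does not name a specific rule.

```python
import random


def sign(v):
    """Return +1, -1, or 0 according to the polarity of v."""
    return (v > 0) - (v < 0)


def polarity_only_lms(w_true, steps=20000, step_size=0.001, seed=0):
    """Identify a linear system using fixed-amplitude increments whose
    polarity is sign(error) * sign(input)."""
    rng = random.Random(seed)
    w = [0.0] * len(w_true)
    for _ in range(steps):
        x = [rng.gauss(0.0, 1.0) for _ in w_true]
        err = sum((wt - wi) * xi for wt, wi, xi in zip(w_true, w, x))
        # The increment amplitude is constant; only its polarity carries information.
        w = [wi + step_size * sign(err) * sign(xi) for wi, xi in zip(w, x)]
    return w


w_true = [0.5, -0.3, 0.8]   # hypothetical target parameters
w = polarity_only_lms(w_true)
print(w)
```

The learned parameters settle in a small neighborhood of the targets whose size is set by the step amplitude, which is why a coarse but correctly signed increment is sufficient.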

SLIDE 9

Floating-Gate Non-Volatile Memory and Adaptation

Paul Hasler, Chris Diorio, Carver Mead, …

  • Hot electron injection
      – ‘Hot’ electrons injected from drain onto floating gate of M1.
      – Injection current is proportional to drain current and exponential in floating-gate to drain voltage (~5V).
  • Tunneling
      – Electrons tunnel through thin gate oxide from floating gate onto high-voltage (~30V) n-well.
      – Tunneling voltage decreases with decreasing gate oxide thickness.
  • Source degeneration
      – Short-channel M2 improves stability of closed-loop adaptation (Vd open-circuit).
      – M2 is not required if adaptation is regulated (Vd driven).
  • Current scaling
      – In subthreshold, Iout is exponential both in the floating gate charge and in the control voltage Vg.

[Figure: floating-gate circuit with output current Iout]

SLIDE 10

Dynamic Analog Memory Using Quantization and Refresh

Autonomous Active Refresh Using A/D/A Quantization:

– Allows for an excursion margin around discrete quantization levels, provided the rate of refresh is sufficiently fast.
– Supports digital format for external access.
– Trades analog depth for storage stability.

[Figure: refresh loop with A/D and D/A converters around the stored parameter p_i, digital port D, and write control WR]

SLIDE 11

Binary Quantization and Partial Incremental Refresh

Problems with Standard Refresh Schemes:

– Systematic offsets in the A/D/A loop
– Switch charge injection (clock feedthrough) during refresh
– Random errors in the A/D/A quantization

Binary Quantization:

– Avoids errors due to analog refresh
– Uses a charge pump with precisely controlled polarity of increments

Partial Incremental Refresh:

– Partial increments avoid catastrophic loss of information in the presence of random errors and noise in the quantization
– Robustness to noise and errors increases with smaller increment amplitudes

SLIDE 12

Binary Quantization and Partial Incremental Refresh

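[Figure: binary quantizer Q(p_i) taking the values +1 and –1 as a function of p_i, with resolution ∆ between the discrete levels p^d_1 … p^d_4 and refresh increments +δ and –δ]

$p_i(k+1) = p_i(k) - \delta \, Q(p_i(k))$

– Resolution ∆
– Increment size δ
– Worst-case drift rate (|dp/dt|) r
– Period of refresh cycle T

$r\,T < \delta \ll \Delta$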
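A small simulation can make the refresh rule and the condition r T < δ << ∆ concrete. The sketch below assumes a particular binary quantizer (+1 in the upper half of each interval of width ∆, –1 in the lower half, so that stable levels sit at interval midpoints), a constant downward drift of r T per cycle, and hypothetical numeric values; the slide itself does not fix these details.

```python
def q_binary(p, resolution):
    """Assumed binary quantizer: +1 in the upper half of each interval of
    width `resolution`, -1 in the lower half."""
    return 1 if (p % resolution) >= resolution / 2 else -1


def refresh(p0, resolution, step, drift_per_cycle, cycles):
    """Apply drift, then p(k+1) = p(k) - step * Q(p(k)), for `cycles` cycles."""
    p = p0
    for _ in range(cycles):
        p -= drift_per_cycle                  # worst-case leakage r*T in one cycle
        p -= step * q_binary(p, resolution)   # partial increment of fixed size
    return p


# Hypothetical values satisfying r*T < delta << Delta.
RES, STEP, DRIFT = 1.0, 0.05, 0.01
p_refreshed = refresh(2.7, RES, STEP, DRIFT, cycles=200)
p_unrefreshed = 2.7 - 200 * DRIFT             # same drift with no refresh
print(p_refreshed, p_unrefreshed)
```

With refresh, the stored value is pulled to the nearest stable level (2.5 here) and then dithers around it by at most about δ; without refresh, the same drift carries it across two quantization intervals.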

SLIDE 13

Functional Diagram of Partial Incremental Refresh

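[Figure: functional diagram in which p_i(k) feeds the binary quantizer Q; the output Q(p_i(k)), scaled to ±δ, is summed with DRIFT and NOISE and accumulated through a unit delay z^-1]

  • Similar in function and structure to the technique of delta-sigma modulation
  • Supports efficient and robust analog VLSI implementation, using a binary controlled charge pump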

SLIDE 14

Analog VLSI Implementation Architectures

[Figure: two memory cell architectures, each with a capacitor C storing p_i, an increment/decrement device I/D with inputs EN and INCR/DECR, and a quantizer Q producing Q(p_i(k)) from p_i(k); the second architecture adds a SEL input]

  • An increment/decrement device I/D is provided for every memory cell, serving refresh increments locally.
  • The binary quantizer Q is more elaborate to implement, and one instance can be time-multiplexed among several memory cells.

SLIDE 15

Charge Pump Implementation of the I/D Device

[Figure: charge pump I/D circuit with transistors MP and MN, bias voltages Vb INCR and Vb DECR, control inputs EN and INCR/DECR, and output p_i]

Binary controlled polarity of increment/decrement

– INCR/DECR controls polarity of current

Accurate amplitude over wide dynamic range of increments

– EN controls duration of current
– Vb INCR and Vb DECR control amplitude of subthreshold current
– No clock feedthrough charge injection (gates at constant potentials)
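Because EN sets the duration of the current and the bias voltage sets its amplitude, the size of one increment follows from ∆V = I ∆t / C. The arithmetic can be sketched as below; the 1 pF capacitance, 1 ms pulse, and 20 µV target step are illustrative assumptions, and the function names are hypothetical.

```python
def voltage_step(current_a, duration_s, capacitance_f, increment=True):
    """Signed voltage change from a constant current pulse: dV = I * dt / C.
    `increment` plays the role of the INCR/DECR polarity control."""
    magnitude = current_a * duration_s / capacitance_f
    return magnitude if increment else -magnitude


def current_for_step(delta_v, duration_s, capacitance_f):
    """Current needed to move the stored voltage by delta_v in one pulse."""
    return capacitance_f * delta_v / duration_s


# Illustrative numbers: 20 uV step, 1 ms enable pulse, 1 pF storage capacitor.
i_needed = current_for_step(20e-6, 1e-3, 1e-12)
print(i_needed)                                              # on the order of 20 fA
print(voltage_step(i_needed, 1e-3, 1e-12, increment=False))  # negative step
```

Currents this small are consistent with operating the pump transistors in subthreshold, where the bias voltage sets the current over many decades.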

SLIDE 16

Dynamic Memory and Incremental Adaptation

[Figure: memory cell schematic with a 1 pF storage capacitor C holding Vstored, adaptation current Iadapt delivering charge ∆Qadapt, and control signals ENp, ENn, POL, Vbp, Vbn]

[Plot (a): Voltage Decrement ∆Vstored (V), logarithmic axis, versus Gate Voltage Vbn (V) from 0.1 to 0.6, for ∆t = 0, 23 µsec, 1 msec, and 40 msec]

[Plot (b): Voltage Increment ∆Vstored (V), logarithmic axis, versus Gate Voltage Vbp (V) from 0.1 to 0.6, for ∆t = 23 µsec, 1 msec, and 40 msec]

SLIDE 17

A/D/A Quantizer for Digital Write and Read Access

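[Figure: A/D/A quantizer Q built from a D/A converter and comparator, with analog port A connected to p_i, digital port D, write control WR, and output Q(p_i(k))]

Integrated bit-serial (MSB-first) D/A and SA A/D converter:

– Partial Refresh: Q(.) from LSB of (n+1)-bit A/D conversion
– Digital Read Access: n-bit A/D conversion
– Digital Write Access: n-bit D/A; WR; Q(.) from COMP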
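The relation between the bit-serial, MSB-first successive-approximation conversion and the refresh polarity can be sketched as follows. The unit reference voltage, the ideal comparator, and the mapping of LSB = 1 to Q = +1 are assumptions for illustration; the slide only states that Q(.) is taken from the LSB of an (n+1)-bit conversion.

```python
def sar_convert(v, n_bits, v_ref=1.0):
    """MSB-first successive approximation of v in [0, v_ref).
    Returns the list of bits, most significant first."""
    bits = []
    dac = 0.0
    weight = v_ref / 2
    for _ in range(n_bits):
        trial = dac + weight
        bit = 1 if v >= trial else 0   # ideal comparator decision
        if bit:
            dac = trial                # keep the bit in the D/A output
        bits.append(bit)
        weight /= 2
    return bits


def refresh_polarity(v, n_bits, v_ref=1.0):
    """Q(.) from the LSB of an (n+1)-bit conversion (assumed mapping:
    LSB 1 -> +1, LSB 0 -> -1)."""
    lsb = sar_convert(v, n_bits + 1, v_ref)[-1]
    return 1 if lsb else -1


print(sar_convert(0.3, 3))        # n-bit read access
print(refresh_polarity(0.3, 2))   # below the midpoint of its 2-bit interval
print(refresh_polarity(0.45, 2))  # above the midpoint of the same interval
```

Under this mapping the extra bit tells whether the stored value lies above or below the midpoint of its n-bit interval, so decrementing when Q = +1 and incrementing when Q = –1 holds the value near the midpoint, away from the code boundaries used for read access.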

SLIDE 18

Dynamic Analog Memory Retention

[Plot: distribution (%/mV) of capacitor voltage (V) between 2.29 and 2.33 V, for the codes 01111110, 01111111, 10000000, and 10000001]

[Plot: P(LSB = "1") from 0.0 to 1.0 versus input voltage (V) between 2.29 and 2.33 V]

– 10^9 cycles mean time between failure
– 8 bit effective resolution
– 20 µV increments/decrements
– 200 µm × 32 µm in 2 µm CMOS

SLIDE 19

Silicon on Sapphire

Peregrine UTSi process

– Higher integration density
– Drastically reduced bulk leakage
    • Improved analog memory retention
– Transparent substrate
    • Adaptive optics applications

SLIDE 20

The Credit Assignment Problem

or How to Learn from Delayed Rewards

[Figure: SYSTEM with parameters {p_i}, INPUTS and OUTPUTS, receiving external reinforcement r(t); an ADAPTIVE CRITIC produces the internal reinforcement r*(t)]

External, discontinuous reinforcement signal r(t).

Adaptive Critics:

– Heuristic Dynamic Programming (Werbos, 1977)
– Reinforcement Learning (Sutton and Barto, 1983)
– TD(λ) (Sutton, 1988)
– Q-Learning (Watkins, 1989)
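As one concrete instance of an adaptive critic, tabular TD(λ) can be sketched on a toy problem where the reward arrives only at the end of an episode. The deterministic five-state chain, the discount factor, the learning rate, and the accumulating eligibility traces are all assumptions chosen to keep the sketch small; they are not taken from the slide.

```python
def td_lambda_chain(n_states=5, gamma=0.9, lam=0.8, alpha=0.1, episodes=500):
    """Tabular TD(lambda) with accumulating traces on a chain 0 -> 1 -> ... -> end.
    The only reward (1.0) is delivered on leaving the last state."""
    values = [0.0] * n_states
    for _ in range(episodes):
        traces = [0.0] * n_states
        for s in range(n_states):
            last = s == n_states - 1
            reward = 1.0 if last else 0.0
            v_next = 0.0 if last else values[s + 1]
            td_error = reward + gamma * v_next - values[s]
            traces[s] += 1.0
            for i in range(n_states):
                # Credit is assigned to recently visited states via their traces.
                values[i] += alpha * td_error * traces[i]
                traces[i] *= gamma * lam
    return values


values = td_lambda_chain()
print(values)
```

The learned values approach γ raised to the number of steps remaining before the reward, so early states receive discounted credit for a reward they never observe directly.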

SLIDE 21

Reinforcement Learning Classifier for Binary Control

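[Figure: circuit schematic of the classifier cell, with select lines SELhor and SELvert, control signals UPD, LOCK, and HYST, bias voltages Vbp, Vbn, Vαp, Vαn, and Vδ, and signals e_k, r̂, q_k, q_vert, y_k, and y_vert]

[Figure: binary control example with state variables x1(t) and x2(t), control u(t), and output y(t) taking the values y = –1 and y = 1]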

SLIDE 22

Adaptive Optical Wavefront Correction

with Marc Cohen, Tim Edwards and Mikhail Vorontsov

[Figure: anatomy of the eye, labeled cornea, iris, lens, zonule fibers, retina, and optic nerve]

SLIDE 23

Gradient Flow Source Localization and Separation

with Milutin Stanacevic and George Zweig

[Figure and equations: gradient flow relations between spatial averages and differences of the sensor signals x, the source signals s_l(t), and their time derivatives scaled by the delays τ; sensor array with 1 cm scale]

Digital LMS adaptive 3-D bearing estimation

– 2 µsec resolution at 2 kHz clock
– 30 µW power dissipation

[Figure: chip micrograph, 3 mm × 3 mm]

SLIDE 24

The Kerneltron: Support Vector “Machine” in Silicon

Genov and Cauwenberghs, 2001

  • 512 inputs, 128 support vectors
  • 3 mm × 3 mm in 0.5 µm CMOS
  • “Computational memories” in hybrid DRAM/CCD technology
  • Internally analog, externally digital
  • Low bit-rate, serial I/O interface
  • 6 GMACS throughput @ 6 mW power

[Figure: chip micrograph showing the 512 × 128 CID/DRAM array and 128 ADCs]
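The throughput and power figures translate directly into an energy per multiply-accumulate. The short calculation below uses only the numbers on the slide; the further assumption that one full pass over the array costs one MAC per cell (512 × 128) is made here for illustration and is not stated in the source.

```python
def energy_per_mac(power_w, macs_per_second):
    """Energy per multiply-accumulate in joules."""
    return power_w / macs_per_second


N_INPUTS, N_SUPPORT_VECTORS = 512, 128
THROUGHPUT = 6e9   # MACs per second (6 GMACS)
POWER = 6e-3       # watts (6 mW)

e_mac = energy_per_mac(POWER, THROUGHPUT)
macs_per_pass = N_INPUTS * N_SUPPORT_VECTORS   # assumed: one MAC per array cell
passes_per_second = THROUGHPUT / macs_per_pass

print(e_mac)              # about 1e-12 J, i.e. roughly 1 pJ per MAC
print(macs_per_pass)      # 65536
print(passes_per_second)  # roughly 9e4 full array passes per second
```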