

SLIDE 1

Model-Free Stochastic Perturbative Adaptation and Optimization

Gert Cauwenberghs
Johns Hopkins University, gert@jhu.edu
520.776 Learning on Silicon
http://bach.ece.jhu.edu/gert/courses/776

SLIDE 2

OUTLINE

• Model-Free Learning
  – Model Complexity
  – Compensation of Analog VLSI Mismatch
• Stochastic Parallel Gradient Descent
  – Algorithmic Properties
  – Mixed-Signal Architecture
  – VLSI Implementation
• Extensions
  – Learning of Continuous-Time Dynamics
  – Reinforcement Learning
• Model-Free Adaptive Optics
  – AdOpt VLSI Controller
  – Adaptive Optics "Quality" Metrics
  – Applications to Laser Communication and Imaging

SLIDE 3

The Analog Computing Paradigm

• Local functions are efficiently implemented with minimal circuitry, exploiting the physics of the devices.
• Excessive global interconnects are avoided:
  – Currents or charges are accumulated along a single wire.
  – Voltage is distributed along a single wire.

Pros:
  – Massive Parallelism
  – Low Power Dissipation
  – Real-Time, Real-World Interface
  – Continuous-Time Dynamics

Cons:
  – Limited Dynamic Range
  – Mismatches and Nonlinearities (WYDINWYG: What You Design Is Not What You Get)

SLIDE 4

Effect of Implementation Mismatches

[Figure: system with adjustable parameters {p_i}, inputs and outputs, and performance error ε(p) measured against a reference.]

Associative Element:
  – Mismatches can be properly compensated by adjusting the parameters p_i accordingly, provided sufficient degrees of freedom are available to do so.

Adaptive Element:
  – Requires precise implementation.
  – The accuracy of the implemented polarity (rather than amplitude) of the parameter update increments Δp_i is the performance-limiting factor (illustrated in the sketch below).
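The polarity-versus-amplitude point is easy to check in simulation: descent still converges when each update's amplitude is scrambled by a large random gain (mimicking device mismatch), but not when polarities are unreliable. A minimal NumPy sketch, with a toy quadratic error standing in for ε(p) and all rates chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(6)

def descend(amp_jitter=0.0, flip_prob=0.0, steps=400):
    # Minimize eps(p) = ||p - 1||^2 with corrupted per-parameter updates.
    p = np.zeros(8)
    for _ in range(steps):
        grad = 2.0 * (p - 1.0)
        gain = 1.0 + amp_jitter * rng.uniform(-1.0, 1.0, p.shape)    # amplitude mismatch
        sign = np.where(rng.random(p.shape) < flip_prob, -1.0, 1.0)  # polarity errors
        p -= 0.05 * gain * sign * grad
    return np.sum((p - 1.0) ** 2)

print(descend())                    # ideal updates: converges to machine precision
print(descend(amp_jitter=0.9))      # +/-90% amplitude mismatch: still converges
print(descend(flip_prob=0.5))       # random polarity: stalls far from the optimum
```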

SLIDE 5

Example: LMS Rule

A linear perceptron under supervised learning,
$$y_i(k) = \sum_j p_{ij}\, x_j(k)$$
with gradient descent on the error
$$\varepsilon = \tfrac{1}{2} \sum_k \sum_i \left( y_i^{target}(k) - y_i(k) \right)^2$$
reduces to an incremental outer-product update rule,
$$\Delta p_{ij}(k) = -\eta\, \frac{\partial \varepsilon(k)}{\partial p_{ij}} = \eta\, x_j(k)\, \left( y_i^{target}(k) - y_i(k) \right)$$
with scalable, modular implementation in analog VLSI.
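A minimal NumPy sketch of this LMS update for a single linear layer; the toy data, dimensions, and learning rate are illustrative, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy supervised task: recover a hidden linear map from input/target pairs.
P_true = rng.standard_normal((3, 5))      # generates the targets
X = rng.standard_normal((100, 5))         # inputs x_j(k)
Y = X @ P_true.T                          # targets y_i^target(k)

P = np.zeros((3, 5))                      # learned parameters p_ij
eta = 0.05                                # learning rate

for epoch in range(50):
    for x, y_target in zip(X, Y):
        y = P @ x                              # y_i(k) = sum_j p_ij x_j(k)
        P += eta * np.outer(y_target - y, x)   # dp_ij = eta (y_i^target - y_i) x_j
print("weight error:", np.linalg.norm(P - P_true))
```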

SLIDE 6

Incremental Outer-Product Learning in Neural Nets

[Figure: network nodes j and i coupled by weights p_ij, with activities x_j, x_i and error signals e_j, e_i.]

Multi-Layer Perceptron:
$$x_i = f\Big( \sum_j p_{ij}\, x_j \Big)$$
Outer-Product Learning Update:
$$\Delta p_{ij} = \eta\, x_j \cdot e_i$$
– Hebbian (Hebb, 1949): $e_i = x_i$
– LMS Rule (Widrow-Hoff, 1960): $e_i = f'_i \cdot \left( x_i^{target} - x_i \right)$
– Backpropagation (Werbos, Rumelhart, LeCun): $e_j = f'_j \cdot \sum_i p_{ij}\, e_i$
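To make the shared structure concrete, here is a hedged NumPy sketch of a two-layer perceptron in which every layer uses the same outer-product update Δp_ij = η x_j e_i and only the error signal differs (LMS form at the output, backpropagated form at the hidden layer). The network sizes, tanh nonlinearity, and single training pattern are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
f = np.tanh
fprime = lambda u: 1.0 - np.tanh(u) ** 2

P1 = 0.5 * rng.standard_normal((4, 3))    # first-layer weights p_ij
P2 = 0.5 * rng.standard_normal((2, 4))    # second-layer weights
eta = 0.1

def train_step(x, target):
    global P1, P2
    u1 = P1 @ x; h = f(u1)                # hidden activities x_i = f(sum_j p_ij x_j)
    u2 = P2 @ h; o = f(u2)                # output activities
    e2 = fprime(u2) * (target - o)        # output error (LMS / Widrow-Hoff form)
    e1 = fprime(u1) * (P2.T @ e2)         # backpropagated: e_j = f'_j sum_i p_ij e_i
    P2 += eta * np.outer(e2, h)           # identical outer-product form at every
    P1 += eta * np.outer(e1, x)           # layer: dp_ij = eta x_j e_i
    return np.sum((target - o) ** 2)

x = rng.standard_normal(3)
for step in range(200):
    err = train_step(x, np.array([0.5, -0.5]))
print("squared error:", err)
```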

SLIDE 7

Gradient Descent Learning

Minimize ε(p) by iterating
$$p_i(k+1) = p_i(k) - \eta\, \frac{\partial \varepsilon}{\partial p_i}(k)$$
from calculation of the gradient
$$\frac{\partial \varepsilon}{\partial p_i} = \sum_l \sum_m \frac{\partial \varepsilon}{\partial y_l} \cdot \frac{\partial y_l}{\partial x_m} \cdot \frac{\partial x_m}{\partial p_i}$$

Implementation Problems:
– Requires an explicit model of the internal network dynamics.
– Sensitive to model mismatches and noise in the implemented network and learning system.
– Amount of computation typically scales strongly with the number of parameters.

SLIDE 8

Gradient-Free Approach to Error-Descent Learning

Avoid the model sensitivity of gradient descent by observing the parameter dependence of the performance error directly on the network, rather than calculating gradient information from a pre-assumed model of the network.

Stochastic Approximation:
– Multi-dimensional Kiefer-Wolfowitz (Kushner & Clark, 1978)
– Function Smoothing Global Optimization (Styblinski & Tang, 1990)
– Simultaneous Perturbation Stochastic Approximation (Spall, 1992)

Hardware-Related Variants:
– Model-Free Distributed Learning (Dembo & Kailath, 1990)
– Noise Injection and Correlation (Anderson & Kerns; Kirk et al., 1992-93)
– Stochastic Error Descent (Cauwenberghs, 1993)
– Constant Perturbation, Random Sign (Alspector et al., 1993)
– Summed Weight Neuron Perturbation (Flower & Jabri, 1993)

SLIDE 9

Stochastic Error-Descent Learning

Minimize ε(p) by iterating
$$p(k+1) = p(k) - \mu\, \hat\varepsilon(k)\, \pi(k)$$
from observation of the gradient in the direction of $\pi(k)$,
$$\hat\varepsilon(k) = \tfrac{1}{2} \left[ \varepsilon\big(p(k)+\pi(k)\big) - \varepsilon\big(p(k)-\pi(k)\big) \right]$$
with random uncorrelated binary components of the perturbation vector $\pi(k)$:
$$\pi_i(k) = \pm\sigma\,; \qquad E\big(\pi_i(k)\,\pi_j(l)\big) \approx \sigma^2\, \delta_{ij}\, \delta_{kl}$$

Advantages:
– No explicit model knowledge is required.
– Robust in the presence of noise and model mismatches.
– Computational load is significantly reduced.
– Allows simple, modular, and scalable implementation.
– Convergence properties similar to exact gradient descent.
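A minimal sketch of stochastic error descent against a black-box error; on chip, ε would be the physically measured network error, so the quadratic test function, dimensions, and step sizes below are placeholders:

```python
import numpy as np

rng = np.random.default_rng(2)

def error(p):
    # Black-box performance error; only evaluations are available, no model.
    return np.sum((p - 1.0) ** 2)

p = np.zeros(10)                 # parameters
mu, sigma = 0.5, 0.1             # learning rate and perturbation amplitude

for k in range(2000):
    pi = sigma * rng.choice([-1.0, 1.0], size=p.shape)   # pi_i(k) = +/- sigma
    eps_hat = 0.5 * (error(p + pi) - error(p - pi))      # directional gradient estimate
    p -= mu * eps_hat * pi                               # p(k+1) = p(k) - mu eps_hat pi

print("final error:", error(p))
```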

SLIDE 10

Stochastic Perturbative Learning Cell Architecture

[Cell architecture: the global network evaluates ε(p(t) + φ(t) π(t)); each local cell adds its perturbation π_i(t), modulated by φ(t), to the stored parameter p_i(t), demodulates the error into ε̂(t), and accumulates the local update −η ε̂(t) π_i(t) through a register (z⁻¹).]

$$\hat\varepsilon(k) = \tfrac{1}{2} \left[ \varepsilon\big(p(k)+\pi(k)\big) - \varepsilon\big(p(k)-\pi(k)\big) \right]$$
$$p(k+1) = p(k) - \mu\, \hat\varepsilon(k)\, \pi(k)$$

SLIDE 11

Stochastic Perturbative Learning Circuit Cell

[Circuit cell: charge-pump update of the parameter stored on C_store, with enables EN_p, EN_n, polarity POL, and bias voltages V_bp, V_bn; the perturbation π_i is applied through C_perturb between levels V_σ+ and V_σ−, yielding p_i(t) + φ(t) π_i(t), and the update polarity is set by π_i and sign(ε̂).]

SLIDE 12

Charge Pump Characteristics

[Measured charge-pump characteristics: (a) voltage decrement ΔV_stored vs. gate voltage V_bn and (b) voltage increment ΔV_stored vs. gate voltage V_bp (0.1 to 0.6 V; ΔV on a log scale, 10⁻⁵ to 10⁻¹ V), for pulse widths Δt = 23 µsec, 1 msec, 40 msec, and Δt = 0. Inset: charge-pump cell with storage capacitor C, adaptation current I_adapt, charge increment ΔQ_adapt, and controls EN_p, EN_n, POL, V_bp, V_bn.]

SLIDE 13

Supervised Learning of Recurrent Neural Dynamics

[Chip architecture: a 6-neuron recurrent network with weight matrix W_11 … W_66, thresholds θ_1 … θ_6 and offsets W_off, reference current I_ref, horizontal and vertical perturbation probes π_0 … π_6, update activation and probe multiplexing, binary quantization Q(·) of the probes, and teacher forcing of the observed states x_1(t), x_2(t) toward the targets x_1^T(t), x_2^T(t).]

The network is trained to emulate a dynamical system
$$\frac{d\mathbf{x}}{dt} = F(\mathbf{p}, \mathbf{x}, \mathbf{y})\,; \qquad \mathbf{z} = G(\mathbf{x})$$
with external inputs y(t), internal state x(t), and outputs z(t).
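A hedged sketch of how the same perturbative rule extends to continuous-time dynamics: the error is accumulated over an integrated trajectory and the whole weight matrix is perturbed in parallel. The Euler integration, network size, constant input, and sine target are illustrative, and teacher forcing is omitted; the point is only that the update descends the trajectory error:

```python
import numpy as np

rng = np.random.default_rng(3)
N, dt, T = 6, 0.05, 200

def trajectory_error(W):
    # Integrate dx/dt = -x + tanh(W x) + input and penalize deviation of
    # x_0(t) from a target waveform (the teacher signal).
    x = np.zeros(N)
    err = 0.0
    for t in range(T):
        x = x + dt * (-x + np.tanh(W @ x) + 0.1)
        err += (x[0] - np.sin(2 * np.pi * t * dt)) ** 2
    return err * dt

W = 0.5 * rng.standard_normal((N, N))
mu, sigma = 0.05, 0.05
print("before:", trajectory_error(W))
for k in range(300):
    Pi = sigma * rng.choice([-1.0, 1.0], size=W.shape)
    eps = 0.5 * (trajectory_error(W + Pi) - trajectory_error(W - Pi))
    W -= mu * eps * Pi                   # same rule, applied to the weight matrix
print("after:", trajectory_error(W))
```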

SLIDE 14

The Credit Assignment Problem, or How to Learn from Delayed Rewards

[Figure: adaptive critic system wrapped around the network {p_i}, with inputs, outputs, external reinforcement r(t), and internal reinforcement r*(t).]

– External, discontinuous reinforcement signal r(t).
– Adaptive Critics:

  • Discrimination Learning (Grossberg, 1975)
  • Heuristic Dynamic Programming (Werbos, 1977)
  • Reinforcement Learning (Sutton and Barto, 1983)
  • TD(λ) (Sutton, 1988)
  • Q-Learning (Watkins, 1989)
SLIDE 15

Reinforcement Learning

(Barto and Sutton, 1983)

Locally tuned, address-encoded neurons:
$$\chi(t) \in \{0, \ldots, 2^n - 1\}\,: \text{ $n$-bit address encoding of the state space}$$
$$y(t) = y_{\chi(t)}\,: \text{ classifier output}\,; \qquad q(t) = q_{\chi(t)}\,: \text{ adaptive critic}$$

Adaptation of classifier and adaptive critic:
$$y_k(t+1) = y_k(t) + \alpha\, \hat r(t)\, e_k(t)$$
$$q_k(t+1) = q_k(t) + \beta\, \hat r(t)\, e_k(t)$$
– eligibilities:
$$e_k(t+1) = \lambda\, e_k(t) + (1 - \lambda)\, \delta_{k\,\chi(t)}$$
– internal reinforcement:
$$\hat r(t) = r(t) + \gamma\, q(t) - q(t-1)$$
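A hedged sketch of these tabular actor-critic updates on a toy chain task; the action-bearing eligibility of the actor follows Barto and Sutton's ASE/ACE construction, and the state coding, rates, exploration noise, and reward are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4                                  # n-bit state address
S = 2 ** n                             # number of address-encoded states
y = np.zeros(S)                        # classifier outputs y_k
q = np.zeros(S)                        # adaptive critic values q_k
e_y = np.zeros(S)                      # actor eligibilities (carry the action)
e_q = np.zeros(S)                      # critic eligibilities
alpha, beta, gamma, lam = 0.2, 0.2, 0.9, 0.8

chi = 0                                # current state address chi(t)
for t in range(20000):
    # Binary action from the classifier, with exploration noise.
    a = 1 if y[chi] + 0.1 * rng.standard_normal() > 0 else -1
    chi_next = min(max(chi + a, 0), S - 1)
    r = 1.0 if chi_next == S - 1 else 0.0        # delayed external reward at the goal
    r_hat = r + gamma * q[chi_next] - q[chi]     # internal reinforcement
    e_y *= lam; e_y[chi] += (1.0 - lam) * a      # eligibility traces
    e_q *= lam; e_q[chi] += (1.0 - lam)
    y += alpha * r_hat * e_y                     # classifier update
    q += beta * r_hat * e_q                      # critic update
    if r > 0:                                    # restart the episode at the goal
        chi = 0
        e_y[:] = 0.0
        e_q[:] = 0.0
    else:
        chi = chi_next
```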

SLIDE 16

Reinforcement Learning Classifier for Binary Control

[Chip architecture: 64 reinforcement-learning neurons. Each neuron combines an adaptive critic cell q_k, an action-network cell y_k, neuron (state) select lines SEL_hor, SEL_vert, and a state eligibility e_k, with charge-pump controls (UPD, V_bp, V_bn, V_αp, V_αn, V_δ) and hysteretic latches (LOCK, HYST). The quantized state x_1(t), x_2(t) selects a neuron, whose binary action y = ±1 drives the control u(t), adapted under internal reinforcement r̂.]

SLIDE 17

A Biological Adaptive Optics System

[Diagram of the human eye: cornea, iris, lens, zonule fibers, retina, optic nerve, brain.]

SLIDE 18

Wavefront Distortion and Adaptive Optics

• Imaging
  – defocus
  – motion
• Laser beam
  – beam wander/spread
  – intensity fluctuations
SLIDE 19

Adaptive Optics: Conventional Approach

– Performs phase conjugation
  • assumes intensity is unaffected
– Complex
  • requires an accurate wavefront phase sensor (Shack-Hartmann; Zernike nonlinear filter; etc.)
  • computationally intensive control system
SLIDE 20

Adaptive Optics: Model-Free Integrated Approach

[Figure: incoming wavefront impinging on a wavefront corrector with N elements u_1, …, u_n, …, u_N.]

– Optimizes a direct measure J of optical performance ("quality metric")
– No (explicit) model information is required
  • any type of quality metric J or wavefront corrector (MEMS, LC, …)
  • no need for a wavefront phase sensor
– Tolerates imprecision in the implementation of the updates
  • system-level precision is limited by the accuracy of the measured J
SLIDE 21

Adaptive Optics Controller Chip

Optimization by Parallel Perturbative Stochastic Gradient Descent

[Figure: adaptive optics loop. The wavefront corrector applies phase Φ(u), the performance metric sensor measures J(u) from the image, and the AdOpt VLSI wavefront controller sets the control vector u.]

SLIDE 22

Adaptive Optics Controller Chip

Optimization by Parallel Perturbative Stochastic Gradient Descent

[Figure: the same loop with the controls perturbed to u+δu; the corrector applies Φ(u+δu) and the sensor measures J(u+δu).]

SLIDE 23

Adaptive Optics Controller Chip

Optimization by Parallel Perturbative Stochastic Gradient Descent

[Figure: two-sided perturbation. The controller applies u+δu and u−δu in turn, measures J(u+δu) and J(u−δu), and forms the difference δJ to update each channel:]

$$u_j(k+1) = u_j(k) + \gamma\, \delta J(k)\, \delta u_j(k)$$

SLIDE 24

Parallel Perturbative Stochastic Gradient Descent Architecture

[Architecture: each channel stores u_j in a register (z⁻¹ loop), adds a perturbation δu_j from ternary levels {−1, 0, +1}, and correlates the shared metric difference δJ with its own δu_j to form the update γ δJ δu_j.]

Uncorrelated perturbations:
$$\langle \delta u_i\, \delta u_j \rangle = \sigma^2\, \delta_{ij}$$
$$u_j(k+1) = u_j(k) + \gamma\, \delta J(k)\, \delta u_j(k)$$
$$\delta J^{(k)} = \tfrac{1}{2} \left[ J\big(u_1^{(k)}+\delta u_1^{(k)}, \ldots, u_N^{(k)}+\delta u_N^{(k)}\big) - J\big(u_1^{(k)}-\delta u_1^{(k)}, \ldots, u_N^{(k)}-\delta u_N^{(k)}\big) \right]$$

SLIDE 25

Parallel Perturbative Stochastic Gradient Descent: Mixed-Signal Architecture

[Mixed-signal architecture: the perturbations δu_j = ±1 are Bernoulli-distributed digital bits; the metric difference δJ is split into a digital polarity sgn(δJ) and an analog amplitude γ|δJ|, with ternary update levels {−1, 0, +1}.]

$$u_j(k+1) = u_j(k) + \gamma\, |\delta J(k)|\; \mathrm{sgn}\big(\delta J(k)\big)\, \delta u_j(k)$$

The amplitude γ|δJ(k)| is handled in the analog domain; the polarity sgn(δJ(k)) δu_j(k) in the digital domain.

SLIDE 26

Wavefront Controller VLSI Implementation

Edwards, Cohen, Cauwenberghs, Vorontsov & Carhart (1999)

AdOpt mixed-mode chip, 2.2 mm², 1.2 µm CMOS
• Controls 19 channels
• Interfaces with LC SLM or MEMS mirrors

Per-channel update procedure:
1. Generate Bernoulli-distributed δu_j^{(k)} with |δu_j^{(k)}| = σ and sign bits π_j^{(k)} = sgn(δu_j^{(k)}) = ±1.
2. Decompose δJ^{(k)} into amplitude |δJ^{(k)}| and sign sgn(δJ^{(k)}).
3. Update u_j^{(k+1)} = u_j^{(k)} + γ |δJ^{(k)}| · sgn(δJ^{(k)}) π_j^{(k)}, with the polarity sgn(δJ^{(k)}) π_j^{(k)} computed digitally as an XOR of the two sign bits.
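A software analogue of this decomposed update loop, with the polarity formed as an XOR of sign bits as on the chip. The 19 channels match the AdOpt chip, but the stand-in quality metric J and all constants are assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 19                                    # channels, as on the AdOpt chip
u_opt = rng.uniform(-1, 1, N)             # unknown optimal corrector settings

def J(u):
    # Stand-in quality metric, maximal when the wavefront is corrected.
    return 1.0 / (1.0 + np.sum((u - u_opt) ** 2))

u = np.zeros(N)
gamma, sigma = 20.0, 0.05

for k in range(2000):
    pi = rng.choice([-1, 1], size=N)            # Bernoulli sign bits pi_j
    du = sigma * pi                             # |du_j| = sigma
    dJ = 0.5 * (J(u + du) - J(u - du))          # differential metric measurement
    # Digital polarity from an XOR of sign bits; analog amplitude gamma*|dJ|.
    polarity = np.where((dJ < 0) ^ (pi < 0), -1.0, 1.0)
    u += gamma * abs(dJ) * sigma * polarity     # stochastic gradient ascent on J
print("J(u) =", J(u))
```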

SLIDE 27

Wavefront Controller Chip-Level Characterization

SLIDE 28

Wavefront Controller System-Level Characterization (LC SLM)

HEX-127 LC SLM

SLIDE 29

Wavefront Controller System-Level Characterization (SLM)

[Plots: J(u) alternately maximized and minimized 100 times; min and max traces shown.]

SLIDE 30

Wavefront Controller System-Level Characterization (MEMS)

Weyrauch, Vorontsov, Bifano, Hammer, Cohen & Cauwenberghs

[Figure: response function of the MEMS mirror (OKO).]

SLIDE 31

Wavefront Controller System-Level Characterization

(over 500 trials)

SLIDE 32

Quality Metrics

imaging (Horn, 1968; Delbruck, 1999):
$$J_{\mathrm{image}} = \int \big| \nabla I(\vec r, t) \big|^2\, d\vec r$$

laser comm:
$$J_{\mathrm{comm}} = \text{bit-error rate}$$

laser beam focusing (Muller et al., 1974; Vorontsov et al., 1996):
$$J_{\mathrm{beam}} = F\big\{ I(\vec r, t) \big\}$$

SLIDE 33

Image Quality Metric Chip

Cohen, Cauwenberghs, Vorontsov & Carhart (2001)

$$\mathrm{IQM} = \sum_{i,j}^{N,M} I_{i,j}\, \big( K * I \big)_{i,j}$$
with 3 × 3 convolution kernel
$$K = \begin{bmatrix} 0 & -1 & 0 \\ -1 & +4 & -1 \\ 0 & -1 & 0 \end{bmatrix}$$

[Chip photo: 2 mm die, 22 × 22 pixel array, 120 µm pixels; each pixel multiplies the image I_{i,j} by the edge image (K ∗ I)_{i,j}.]
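A NumPy sketch of the metric the chip computes in parallel analog circuitry; np.roll gives periodic boundaries (a simplification of the finite pixel array), and the step-edge test image is illustrative:

```python
import numpy as np

def iqm(I):
    # Edge image (K * I): +4 center weight, -1 at the four nearest neighbors.
    KI = (4 * I
          - np.roll(I, 1, 0) - np.roll(I, -1, 0)
          - np.roll(I, 1, 1) - np.roll(I, -1, 1))
    return np.sum(I * KI)                  # IQM = sum_ij I_ij (K * I)_ij

# Sharper focus scores higher: a step edge vs. a blurred copy of it.
I = np.zeros((22, 22))
I[:, 11:] = 1.0
I_blur = (np.roll(I, 1, 1) + I + np.roll(I, -1, 1)) / 3.0
print(iqm(I), ">", iqm(I_blur))
```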

SLIDE 34

Architecture: (3 × 3) Edge and Corner Kernels

[Kernel diagram: center weight +4, four nearest-neighbor weights −1.]
SLIDE 35

Image Quality Metric Chip Characterization: Experimental Setup

SLIDE 36

Image Quality Metric Chip Characterization: Experimental Results

Uniformly illuminated; dynamic range ≈ 20.

SLIDE 37

Image Quality Metric Chip Characterization: Experimental Results

SLIDE 38

Beam Variance Metric Chip

Cohen, Cauwenberghs, Vorontsov & Carhart (2001)

$$\mathrm{BVM} = N\, M\; \sum_{i,j}^{N,M} I_{i,j}^{\,2} \;\Big/\; \Big( \sum_{i,j}^{N,M} I_{i,j} \Big)^{2}$$

[Chip layout: 22 × 22 pixel array at 70 µm pitch; a perimeter of dummy pixels surrounds the active 20 × 20 array.]
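A one-function NumPy sketch of this normalized second-moment metric; the Gaussian test beams are illustrative:

```python
import numpy as np

def bvm(I):
    # BVM = N*M * sum(I^2) / (sum I)^2: scale-invariant beam concentration.
    N, M = I.shape
    return N * M * np.sum(I ** 2) / np.sum(I) ** 2

yy, xx = np.mgrid[-10:10:20j, -10:10:20j]
focused = np.exp(-(xx**2 + yy**2) / (2 * 1.5**2))   # tight beam -> high BVM
spread  = np.exp(-(xx**2 + yy**2) / (2 * 6.0**2))   # spread beam -> low BVM
print(bvm(focused), ">", bvm(spread))
```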

SLIDE 39

With a nonlinear pixel response $I^{u}_{i,j} \propto I_{i,j}^{\,1+\kappa}$, the measured metric is approximately
$$\mathrm{BVM} \approx M\, N\; \sum_{i,j}^{N,M} \big( I^{u}_{i,j} \big)^{2} \;\Big/\; \Big( \sum_{i,j}^{N,M} I^{u}_{i,j} \Big)^{2}$$

SLIDE 40

Beam Variance Metric Chip Characterization: Experimental Setup

SLIDE 41

Beam Variance Metric Chip Characterization: Experimental Results

Dynamic range ≈ 5.

SLIDE 42

Beam Variance Metric Chip Characterization: Experimental Results

SLIDE 43

Beam Variance Metric Sensor in the Loop: Laser Receiver Setup

SLIDE 44

Beam Variance Metric Sensor in the Loop

SLIDE 45

Conclusions

• Computational primitives of adaptation and learning are naturally implemented in analog VLSI, and allow compensation for inaccuracies in the physical implementation of the system under adaptation.
• Care should still be taken to avoid inaccuracies in the implementation of the adaptive element itself. Nevertheless, this can be achieved by ensuring the correct polarity, rather than amplitude, of the parameter update increments.
• Adaptation algorithms based on physical observation of the "performance" gradient in parameter space are better suited to analog VLSI implementation than algorithms based on a calculated gradient.
• Among the most generally applicable learning architectures are those that operate on reinforcement signals, and those that blindly extract and classify signals.
• Model-free adaptive optics leads to an efficient and robust analog implementation of the control algorithm, using a criterion that can be freely chosen to accommodate different wavefront correctors and different imaging or laser communication applications.