Model-Free Stochastic Perturbative Adaptation and Optimization


1. Model-Free Stochastic Perturbative Adaptation and Optimization
   Gert Cauwenberghs, Johns Hopkins University
   gert@jhu.edu
   520.776 Learning on Silicon
   http://bach.ece.jhu.edu/gert/courses/776

2. Model-Free Stochastic Perturbative Adaptation and Optimization
   OUTLINE
   • Model-Free Learning
     – Model Complexity
     – Compensation of Analog VLSI Mismatch
   • Stochastic Parallel Gradient Descent
     – Algorithmic Properties
     – Mixed-Signal Architecture
     – VLSI Implementation
   • Extensions
     – Learning of Continuous-Time Dynamics
     – Reinforcement Learning
   • Model-Free Adaptive Optics
     – AdOpt VLSI Controller
     – Adaptive Optics “Quality” Metrics
     – Applications to Laser Communication and Imaging

3. The Analog Computing Paradigm
   • Local functions are efficiently implemented with minimal circuitry, exploiting the physics of the devices.
   • Excessive global interconnects are avoided:
     – Currents or charges are accumulated along a single wire.
     – Voltage is distributed along a single wire.
   Pros:
     – Massive Parallelism
     – Low Power Dissipation
     – Real-Time, Real-World Interface
     – Continuous-Time Dynamics
   Cons:
     – Limited Dynamic Range
     – Mismatches and Nonlinearities (WYDINWYG)

4. Effect of Implementation Mismatches
   [Diagram: adaptive system with inputs, outputs, parameters p_i, and performance error ε(p) measured against a reference.]
   Associative Element:
     – Mismatches can be properly compensated by adjusting the parameters p_i accordingly, provided sufficient degrees of freedom are available to do so.
   Adaptive Element:
     – Requires precise implementation.
     – The accuracy of the implemented polarity (rather than amplitude) of the parameter update increments Δp_i is the performance-limiting factor.

5. Example: LMS Rule
   A linear perceptron under supervised learning:
     y_i(k) = Σ_j p_ij x_j(k)
     ε = ½ Σ_k Σ_i ( y_i^target(k) − y_i(k) )²
   with gradient descent:
     Δp_ij(k) = −η ∂ε(k)/∂p_ij = η x_j(k) · ( y_i^target(k) − y_i(k) )
   reduces to an incremental outer-product update rule, with scalable, modular implementation in analog VLSI.
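As an illustration of the outer-product form of this update, here is a minimal NumPy sketch of one LMS step for a linear perceptron; the function and variable names (lms_update, eta, y_target) are illustrative, not taken from the original slides.

```python
import numpy as np

def lms_update(p, x, y_target, eta=0.01):
    """One LMS step for a linear perceptron y = p @ x.

    The weight change is the outer product of the output error
    (y_target - y) with the input x, scaled by the learning rate eta:
    delta p_ij = eta * (y_i^target - y_i) * x_j.
    """
    y = p @ x                              # forward pass: y_i = sum_j p_ij x_j
    error = y_target - y                   # output error
    p = p + eta * np.outer(error, x)       # incremental outer-product update
    return p, 0.5 * float(error @ error)   # updated weights, instantaneous error
```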

6. Incremental Outer-Product Learning in Neural Nets
   Multi-Layer Perceptron:
     x_i = f( Σ_j p_ij x_j )
   Outer-Product Learning Update:
     Δp_ij = η x_j · e_i
   with the error signal e_i given by:
     – Hebbian (Hebb, 1949): e_i = x_i
     – LMS Rule (Widrow-Hoff, 1960): e_i = f′_i · ( x_i^target − x_i )
     – Backpropagation (Werbos, Rumelhart, LeCun): e_j = f′_j · Σ_i p_ij e_i
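The backpropagation rule above propagates the output error backward through the same weights used in the forward pass. A minimal NumPy sketch for a two-layer network with tanh units follows; the layer structure, tanh nonlinearity, and names (W1, W2, eta) are illustrative assumptions.

```python
import numpy as np

def outer_product_step(W1, W2, x, x_target, eta=0.01):
    """One outer-product update delta p_ij = eta * x_j * e_i for a
    two-layer perceptron with tanh units (f'(a) = 1 - tanh(a)**2).

    Output layer error:  e_i = f'_i * (x_i^target - x_i)   (LMS rule)
    Hidden layer error:  e_j = f'_j * sum_i p_ij e_i        (backpropagation)
    """
    h = np.tanh(W1 @ x)                   # hidden activations
    y = np.tanh(W2 @ h)                   # output activations
    e_out = (1 - y**2) * (x_target - y)   # output error
    e_hid = (1 - h**2) * (W2.T @ e_out)   # backpropagated hidden error
    W2 = W2 + eta * np.outer(e_out, h)
    W1 = W1 + eta * np.outer(e_hid, x)
    return W1, W2
```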

7. Gradient Descent Learning
   Minimize ε(p) by iterating:
     p_i(k+1) = p_i(k) − η ∂ε(k)/∂p_i
   from calculation of the gradient:
     ∂ε/∂p_i = Σ_l Σ_m ( ∂ε/∂y_l ) · ( ∂y_l/∂x_m ) · ( ∂x_m/∂p_i )
   Implementation Problems:
     – Requires an explicit model of the internal network dynamics.
     – Sensitive to model mismatches and noise in the implemented network and learning system.
     – Amount of computation typically scales strongly with the number of parameters.
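To make the model dependence concrete, here is a minimal sketch of the chain-rule gradient computation above, assuming the three Jacobians are supplied by an explicit model of the network; the array shapes and names are illustrative, and any mismatch between this model and the physical hardware corrupts the computed gradient.

```python
import numpy as np

def model_based_gradient(dE_dy, dy_dx, dx_dp):
    """Chain-rule gradient dE/dp_i = sum_l sum_m dE/dy_l * dy_l/dx_m * dx_m/dp_i.

    dE_dy : shape (L,)    derivatives of the error w.r.t. the outputs y_l
    dy_dx : shape (L, M)  Jacobian of the outputs w.r.t. the internal states x_m
    dx_dp : shape (M, N)  Jacobian of the internal states w.r.t. the parameters p_i
    All three must come from an explicit model of the network.
    """
    return dE_dy @ dy_dx @ dx_dp

def gradient_descent_step(p, grad, eta=0.01):
    """p_i(k+1) = p_i(k) - eta * dE(k)/dp_i"""
    return p - eta * grad
```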

8. Gradient-Free Approach to Error-Descent Learning
   Avoid the model sensitivity of gradient descent by observing the parameter dependence of the performance error on the network directly, rather than calculating gradient information from a pre-assumed model of the network.
   Stochastic Approximation:
     – Multi-dimensional Kiefer-Wolfowitz (Kushner & Clark, 1978)
     – Function Smoothing Global Optimization (Styblinski & Tang, 1990)
     – Simultaneous Perturbation Stochastic Approximation (Spall, 1992)
   Hardware-Related Variants:
     – Model-Free Distributed Learning (Dembo & Kailath, 1990)
     – Noise Injection and Correlation (Anderson & Kerns; Kirk et al., 1992-93)
     – Stochastic Error Descent (Cauwenberghs, 1993)
     – Constant Perturbation, Random Sign (Alspector et al., 1993)
     – Summed Weight Neuron Perturbation (Flower & Jabri, 1993)

9. Stochastic Error-Descent Learning
   Minimize ε(p) by iterating:
     p(k+1) = p(k) − µ ε̂(k) π(k)
   from observation of the gradient in the direction of π(k):
     ε̂(k) = ½ [ ε( p(k) + π(k) ) − ε( p(k) − π(k) ) ]
   with random uncorrelated binary components of the perturbation vector π(k):
     π_i(k) = ±σ ;  E( π_i(k) π_j(l) ) ≈ σ² δ_ij δ_kl
   Advantages:
     – No explicit model knowledge is required.
     – Robust in the presence of noise and model mismatches.
     – Computational load is significantly reduced.
     – Allows simple, modular, and scalable implementation.
     – Convergence properties similar to exact gradient descent.
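Only two error evaluations are needed per iteration, regardless of the number of parameters. A minimal NumPy sketch of this parallel perturbative update follows, assuming the performance error ε(p) can be measured for perturbed parameter vectors; the callable eps and the constants mu and sigma are illustrative.

```python
import numpy as np

def stochastic_error_descent_step(p, eps, mu=0.1, sigma=0.2, rng=np.random):
    """One stochastic error-descent step p(k+1) = p(k) - mu * eps_hat * pi.

    eps: callable returning the scalar performance error for a parameter vector.
    All parameters are perturbed simultaneously by random binary components
    +/- sigma; the differential error observation eps_hat estimates the
    gradient along pi, and every parameter is updated in parallel.
    """
    pi = sigma * rng.choice([-1.0, 1.0], size=p.shape)   # random binary perturbation
    eps_hat = 0.5 * (eps(p + pi) - eps(p - pi))          # differential error measurement
    return p - mu * eps_hat * pi

# Example: descend a quadratic error without access to its gradient.
# target = np.ones(10)
# eps = lambda p: float(np.sum((p - target) ** 2))
# p = np.zeros(10)
# for _ in range(2000):
#     p = stochastic_error_descent_step(p, eps)          # p drifts toward target
```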

10. Stochastic Perturbative Learning Cell Architecture
   [Architecture: each local cell stores p_i(t), applies the perturbation φ(t) π_i(t), and accumulates the update increment −η ε̂(t) π_i(t) in a local register (z⁻¹); the perturbed error ε( p(t) + φ(t) π(t) ) is evaluated globally by the network.]
   ε̂(k) = ½ [ ε( p(k) + π(k) ) − ε( p(k) − π(k) ) ]
   p(k+1) = p(k) − µ ε̂(k) π(k)

11. Stochastic Perturbative Learning Circuit Cell
   [Circuit cell: the perturbed parameter p_i(t) + φ(t) π_i(t) is held on a storage capacitor C_store; the binary perturbation π_i couples V_σ+ or V_σ− through C_perturb, and a charge pump biased by V_bp / V_bn and gated by EN_p / EN_n applies increments whose polarity POL is derived from sign(ε̂) and the local perturbation π_i.]

12. Charge Pump Characteristics
   [Charge-pump schematic (storage capacitor C, gates EN_p / EN_n, biases V_bp / V_bn, adaptation charge ΔQ_adapt) and measured characteristics: voltage increments and decrements ΔV_stored (10⁻⁵ to 1 V, log scale) versus gate voltages V_bp and V_bn (0 to 0.6 V), for pulse widths Δt of 23 µsec, 1 msec, and 40 msec.]

13. Supervised Learning of Recurrent Neural Dynamics
   [Chip architecture: a 6-neuron continuous-time recurrent network with weight matrix W_ij and thresholds θ_i (referenced to I_ref and W_off), binary quantization Q(·) of the states, teacher forcing of target trajectories x_i^T(t), and multiplexed update/probe activation of the perturbations π_i^H.]
   Dynamical system:
     dx/dt = F( p, x, y )
     z = G( x )
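The same parallel perturbative rule can be applied to the parameters of a continuous-time dynamical system by taking the error to be the accumulated mismatch between the network trajectory and a teacher trajectory. The sketch below is a minimal illustration under that assumption; the specific dynamics (leaky tanh units), Euler integration, and names are illustrative stand-ins, not the circuit equations of the chip.

```python
import numpy as np

def trajectory_error(W, theta, x_target, dt=0.01):
    """Time-averaged squared error of the recurrent dynamics
    dx/dt = -x + tanh(W x + theta) against a target trajectory x_target[t]."""
    x = np.zeros(W.shape[0])
    err = 0.0
    for xt in x_target:
        x = x + dt * (-x + np.tanh(W @ x + theta))   # Euler step of the dynamics
        err += float(np.sum((xt - x) ** 2))
    return err / len(x_target)

def perturbative_dynamics_step(W, theta, x_target, mu=0.01, sigma=0.05, rng=np.random):
    """Stochastic error descent over all weights and thresholds in parallel."""
    piW = sigma * rng.choice([-1.0, 1.0], size=W.shape)
    pit = sigma * rng.choice([-1.0, 1.0], size=theta.shape)
    e_hat = 0.5 * (trajectory_error(W + piW, theta + pit, x_target)
                   - trajectory_error(W - piW, theta - pit, x_target))
    return W - mu * e_hat * piW, theta - mu * e_hat * pit
```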

14. The Credit Assignment Problem, or: How to Learn from Delayed Rewards
   [Diagram: adaptive system with parameters p_i, driven by an external reinforcement signal r(t) and an internal reinforcement r*(t) produced by an adaptive critic.]
   – External, discontinuous reinforcement signal r(t).
   – Adaptive Critics:
     • Discrimination Learning (Grossberg, 1975)
     • Heuristic Dynamic Programming (Werbos, 1977)
     • Reinforcement Learning (Sutton and Barto, 1983)
     • TD(λ) (Sutton, 1988)
     • Q-Learning (Watkins, 1989)

15. Reinforcement Learning (Barto and Sutton, 1983)
   Locally tuned, address-encoded neurons:
     χ(t) ∈ {0, ..., 2ⁿ − 1} : n-bit address encoding of the state space
     y(t) = y_χ(t) : classifier output
     q(t) = q_χ(t) : adaptive critic
   Adaptation of classifier and adaptive critic:
     y_k(t+1) = y_k(t) + α r̂(t) e_k(t) y_k(t)
     q_k(t+1) = q_k(t) + β r̂(t) e_k(t)
   – eligibilities: e_k(t+1) = λ e_k(t) + (1 − λ) δ_kχ(t)
   – internal reinforcement: r̂(t) = r(t) + γ q(t) − q(t−1)
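These updates can be summarized in a small tabular actor-critic sketch; the class below is a hypothetical illustration of the listed rules, with illustrative constants and a random nonzero initialization of y (needed because the classifier update is multiplicative in y_k).

```python
import numpy as np

class AddressEncodedRL:
    """Tabular classifier (y) and adaptive critic (q) with eligibility traces
    over 2**n address-encoded states, following the update rules listed above."""

    def __init__(self, n_bits, alpha=0.1, beta=0.1, gamma=0.95, lam=0.8, seed=0):
        n_states = 2 ** n_bits
        rng = np.random.default_rng(seed)
        self.y = 0.1 * rng.standard_normal(n_states)  # classifier outputs
        self.q = np.zeros(n_states)                   # adaptive critic values
        self.e = np.zeros(n_states)                   # eligibility traces
        self.q_prev = 0.0                             # q(t-1)
        self.alpha, self.beta, self.gamma, self.lam = alpha, beta, gamma, lam

    def step(self, state, r):
        """One adaptation step for the current address-encoded state chi(t) = state
        and external reinforcement r(t); returns the binary control action."""
        r_hat = r + self.gamma * self.q[state] - self.q_prev   # internal reinforcement
        self.e *= self.lam                                     # eligibility decay ...
        self.e[state] += 1.0 - self.lam                        # ... refresh at visited state
        self.y += self.alpha * r_hat * self.e * self.y         # classifier update
        self.q += self.beta * r_hat * self.e                   # adaptive critic update
        self.q_prev = self.q[state]
        return 1 if self.y[state] >= 0 else -1                 # binary action u(t)
```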

16. Reinforcement Learning Classifier for Binary Control
   [Chip architecture: an array of 64 reinforcement learning neurons, each containing state-eligibility and adaptive-critic cells; the quantized state (x_1(t), x_2(t)) selects a neuron through horizontal and vertical select lines (SEL_hor, SEL_vert), and the action network produces a binary action u(t) with y = ±1.]

17. A Biological Adaptive Optics System
   [Diagram of the human eye and visual pathway: cornea, iris, lens, zonule fibers, retina, optic nerve, brain.]

18. Wavefront Distortion and Adaptive Optics
   • Imaging:
     – defocus
     – motion
   • Laser beam:
     – beam wander/spread
     – intensity fluctuations

19. Adaptive Optics: Conventional Approach
   – Performs phase conjugation
     • assumes intensity is unaffected
   – Complex
     • requires an accurate wavefront phase sensor (Shack-Hartmann; Zernike nonlinear filter; etc.)
     • computationally intensive control system

20. Adaptive Optics: Model-Free Integrated Approach
   [Diagram: incoming wavefront corrected by a wavefront corrector with N elements u_1, ..., u_n, ..., u_N.]
   – Optimizes a direct measure J of optical performance (“quality metric”)
   – No (explicit) model information is required
     • any type of quality metric J and wavefront corrector (MEMS, LC, ...)
     • no need for a wavefront phase sensor
   – Tolerates imprecision in the implementation of the updates
     • system-level precision limited by the accuracy of the measured J

21. Adaptive Optics Controller Chip
   Optimization by Parallel Perturbative Stochastic Gradient Descent
   [System loop: the AdOpt VLSI wavefront controller drives the wavefront corrector Φ(u); the corrected image reaches the performance metric sensor, and the measured quality metric J(u) is fed back to the controller to update u.]
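In this loop the stochastic parallel gradient descent rule is applied directly to the measured quality metric. A minimal sketch follows, assuming a callable measure_J that returns the scalar metric for a given drive vector u and that J is to be maximized; the names, gain, and perturbation amplitude are illustrative, not the AdOpt chip's actual parameters.

```python
import numpy as np

def spgd_control_step(u, measure_J, gain=0.5, sigma=0.05, rng=np.random):
    """One parallel perturbative update of the N wavefront-corrector drives u.

    measure_J: callable returning the optical quality metric J(u), e.g. detected
    power in a receiver aperture. All N elements are perturbed simultaneously by
    +/- sigma; the differential metric measurement estimates the gradient along
    the perturbation, and u is moved to increase J (ascent on the metric).
    """
    pi = sigma * rng.choice([-1.0, 1.0], size=u.shape)   # parallel binary perturbation
    dJ = 0.5 * (measure_J(u + pi) - measure_J(u - pi))   # differential metric measurement
    return u + gain * dJ * pi                            # ascend the quality metric

# Illustrative closed loop on a synthetic metric (hypothetical 37-element corrector):
# u_opt = np.linspace(-1.0, 1.0, 37)
# measure_J = lambda u: -float(np.sum((u - u_opt) ** 2))
# u = np.zeros(37)
# for _ in range(3000):
#     u = spgd_control_step(u, measure_J)                # u converges toward u_opt
```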
