SLIDE 1

Local Representation Alignment: A Biologically Motivated Algorithm for Training Neural Systems

Alexander G. Ororbia II, The Neural Adaptive Computing (NAC) Laboratory, Rochester Institute of Technology

SLIDE 2

Collaborators

  • The Pennsylvania State University
    • Dr. C. Lee Giles
    • Dr. Daniel Kifer
  • Rochester Institute of Technology (RIT)
    • Dr. Ifeoma Nwogu (computer vision)
    • Dr. Travis Desell (neuro-evolution, distributed computing)
  • Students
    • Ankur Mali (PhD student, Penn State, co-advised w/ Dr. C. Lee Giles)
    • Timothy Zee (PhD student, RIT, co-advised w/ Dr. Ifeoma Nwogu)
    • Abdelrahman Elsiad (PhD student, RIT, co-advised w/ Dr. Travis Desell)

SLIDE 3

Objectives

  • Context: credit assignment & algorithmic alternatives
    • Backpropagation of errors (backprop)
    • Feedback alignment algorithms
    • Target propagation (TP)
    • Contrastive Hebbian learning (CHL)
    • Equilibrium propagation (EP), Contrastive Divergence (CD)
  • Discrepancy Reduction – a family of learning procedures
    • Error-Driven Local Representation Alignment (LRA/LRA-E)
    • Adaptive Noise Difference Target Propagation (DTP-σ)
  • Experimental Results & Variations
  • Conclusions

SLIDE 4

SLIDE 5

Figure: components of a neural learning system, including credit-assignment algorithms (Backprop, CHL, LRA), optimizers (SGD, Adam, RMSprop), loss functions (MSE, MAE, CNLL), datasets (MNIST), and models (MLP, AE, BM, RNN).

MLP = Multilayer perceptron; AE = Autoencoder; BM = Boltzmann machine

SLIDE 6

Problems with Backprop

Global optimization, back-prop through whole graph.

  • The global feedback pathway
  • Vanishing/exploding gradients (even worse in recurrent networks!)
  • The weight transport problem
  • High sensitivity to initialization
  • Activation constraints/conditions
  • Requires the system to be fully differentiable → difficulty handling discrete-valued functions
  • Requires sufficient linearity → adversarial samples

SLIDE 7

Feedforward Inference

Illustration: forward propagation in a multilayer perceptron (MLP) to collect activities (shared across most algorithms, e.g., backprop, random feedback alignment, direct feedback alignment, local representation alignment)
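A minimal NumPy sketch of this shared forward pass, collecting pre-activations and post-activations for later credit assignment. The layer sizes, tanh nonlinearity, weight scale, and the omission of biases are illustrative assumptions, not the slide's exact setup.

```python
import numpy as np

def forward(x, W, phi=np.tanh):
    """Feedforward inference: collect pre-activations z and post-activations h."""
    h = [x]          # h[0] is the input
    z = [None]       # no pre-activation for the input layer
    for Wl in W:
        z.append(Wl @ h[-1])     # pre-activation of this layer
        h.append(phi(z[-1]))     # post-activation of this layer
    return z, h

# Example: a 784-256-256-10 MLP with randomly initialized weights (illustrative).
rng = np.random.default_rng(0)
W = [rng.normal(0, 0.05, size=(256, 784)),
     rng.normal(0, 0.05, size=(256, 256)),
     rng.normal(0, 0.05, size=(10, 256))]
z, h = forward(rng.normal(size=784), W)
```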

SLIDE 8

SLIDE 9

SLIDE 10

SLIDE 11

SLIDE 12

SLIDE 13

Backpropagation of Errors

SLIDE 14

Conducting credit assignment using the activities produced by the inference pass

SLIDE 15

Pass error signal back through post-activations (get derivatives w.r.t. pre-activations)

SLIDE 16

Pass error signal back through the (incoming) synaptic weights to get the error signal transmitted to post-activations in the layer below

SLIDE 17

Repeat the previous steps, layer by layer (recursive treatment of backprop procedure)
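A hedged sketch of this layer-by-layer recursion, reusing the `forward` routine sketched earlier; the mean-squared-error output signal and tanh derivative are illustrative choices rather than anything fixed by the talk.

```python
def backprop_deltas(z, h, y, W, dphi=lambda a: 1.0 - np.tanh(a) ** 2):
    """Recursively push the output error back through weights and activation derivatives."""
    L = len(W)
    delta = [None] * (L + 1)
    delta[L] = (h[L] - y) * dphi(z[L])          # output-layer error signal (MSE assumed)
    for l in range(L - 1, 0, -1):
        # pass error through the transpose of the layer-above weights,
        # then through the local activation derivative
        delta[l] = (W[l].T @ delta[l + 1]) * dphi(z[l])
    # weight gradients are local outer products: dW[l] = delta[l+1] h[l]^T
    return [np.outer(delta[l + 1], h[l]) for l in range(L)]
```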

SLIDE 18

SLIDE 19

SLIDE 20

Random Feedback Alignment

SLIDE 21

SLIDE 22

Pass error signal back through post-activations (get derivatives w.r.t. pre-activations)

SLIDE 23

Pass error signal back through fixed, random alignment weights (replaces backprop's step of passing the error through the transpose of the feedforward weights)
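A sketch of the same backward recursion with backprop's weight transpose replaced by fixed random feedback matrices `B`, reusing `W` and `rng` from the earlier sketches; the shapes and initialization of `B` are assumptions of this sketch.

```python
def rfa_deltas(z, h, y, W, B, dphi=lambda a: 1.0 - np.tanh(a) ** 2):
    """Random feedback alignment: error flows back through fixed random matrices B,
    not through the transpose of the learned feedforward weights."""
    L = len(W)
    delta = [None] * (L + 1)
    delta[L] = (h[L] - y) * dphi(z[L])
    for l in range(L - 1, 0, -1):
        delta[l] = (B[l] @ delta[l + 1]) * dphi(z[l])   # B[l] has the shape of W[l].T
    return [np.outer(delta[l + 1], h[l]) for l in range(L)]

# Fixed random feedback weights, drawn once and never trained (illustrative scale).
B = [None] + [rng.normal(0, 0.05, size=Wl.T.shape) for Wl in W[1:]]
```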

SLIDE 24

Repeat previous steps (similar to backprop)

SLIDE 25

SLIDE 26

SLIDE 27

Direct Feedback Alignment

SLIDE 28

SLIDE 29

Pass error signal back through post-activations (get derivatives w.r.t. pre-activations)

SLIDE 30

Pass error signal along first set of direct alignment weights to second layer

SLIDE 31

Pass error signal along next set of direct alignment weights to first layer

SLIDE 32

Treat the signals propagated along direct alignment connections as proxies for error derivatives and run them through post-activations in each layer, respectively
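A sketch of this procedure: the output error is projected directly to every hidden layer through its own fixed random matrix and treated as a proxy derivative. The matrices `D` and their shapes are assumptions of this sketch; `W` and `rng` are reused from the earlier sketches.

```python
def dfa_deltas(z, h, y, W, D, dphi=lambda a: 1.0 - np.tanh(a) ** 2):
    """Direct feedback alignment: one hop from the output error to each hidden layer."""
    L = len(W)
    e = h[L] - y                                # output error (MSE assumed)
    delta = [None] * (L + 1)
    delta[L] = e * dphi(z[L])
    for l in range(1, L):
        # direct random projection of the output error, used as a proxy derivative
        delta[l] = (D[l] @ e) * dphi(z[l])
    return [np.outer(delta[l + 1], h[l]) for l in range(L)]

# One fixed random matrix per hidden layer: D[l] maps dim(e) -> dim(h[l]) (illustrative).
D = [None] + [rng.normal(0, 0.05, size=(Wl.shape[0], W[-1].shape[0])) for Wl in W[:-1]]
```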

SLIDE 33

SLIDE 34

Comparison of the error signals: Backpropagation of Errors, Random Feedback Alignment, Direct Feedback Alignment (equations shown on the slide).
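The equations themselves were not captured in this transcript. For reference, the standard textbook forms of the three per-layer error signals, in commonly used notation (W for feedforward weights, B for fixed random feedback matrices of appropriate shape, e for the output error, φ′ for the activation derivative), are approximately:

```latex
% Backpropagation of errors: error descends through the transposed feedforward weights
\delta^{\ell} = \big( (W^{\ell+1})^{\top} \delta^{\ell+1} \big) \odot \phi'(z^{\ell})

% Random feedback alignment: the transpose is replaced by a fixed random matrix
\delta^{\ell} = \big( B^{\ell+1} \, \delta^{\ell+1} \big) \odot \phi'(z^{\ell})

% Direct feedback alignment: the output error e is projected straight to layer \ell
\delta^{\ell} = \big( B^{\ell} \, e \big) \odot \phi'(z^{\ell})
```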

SLIDE 35

Global versus Local Signals

Global optimization, back-prop through whole graph. Local optimization, back-prop through sub-graphs.

SLIDE 36

Global versus Local Signals

Global optimization, back-prop through whole graph. Local optimization, back-prop through sub-graphs.

Global feedback pathway. Will these yield coherent models?

SLIDE 37

Equilibrium Propagation

(Figure: negative phase vs. positive phase)

SLIDE 38

The Discrepancy Reduction Family

  • General process (Ororbia et al., 2017 Adapt):
    • 1) Search for latent representations that better explain the input/output (targets)
    • 2) Reduce the mismatch between the currently “guessed” representations & the target representations
  • Sum of internal, local losses (in nats) → total discrepancy (akin to a “pseudo-energy”)
  • Coordinated local learning rules
  • Algorithms
    • Difference target propagation (DTP) (Lee et al., 2014)
    • DTP-σ (Ororbia et al., 2019)
    • LRA (Ororbia et al., 2018; Ororbia et al., 2019)
    • Others – targets could come from an external, interacting process
      • NPC (neural predictive coding; Ororbia et al., 2017/2018/2019)
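A minimal sketch of the “total discrepancy” quantity from this list: a sum of local, per-layer losses between the currently guessed representations and their targets. The squared-error local loss here is only a placeholder; the talk's own local loss (e.g., the Cauchy loss shown later) can be substituted.

```python
def total_discrepancy(z_guess, z_target,
                      local_loss=lambda a, b: 0.5 * float(np.sum((a - b) ** 2))):
    """Sum of local, per-layer losses between guessed and target representations."""
    return sum(local_loss(zg, zt) for zg, zt in zip(z_guess, z_target))
```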

SLIDE 39

Adaptive Noise Difference Target Propagation (DTP-σ)

Image adapted from (Lillicrap et al., 2018)

(Figure labels: z_L, ẑ_L, z_{L-1}, ẑ_{L-1}, g(z_L), g(ẑ_L), where g(·) is the learned feedback/inverse mapping.)
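As background for this slide, the difference target propagation correction of (Lee et al., 2014), which DTP-σ builds on, forms the target for the layer below using the learned inverse/feedback mapping g(·). DTP-σ, as presented in the talk, makes the Gaussian noise used in this procedure adaptive rather than fixed; the exact noise placement and adaptation rule are not captured in this transcript.

```latex
% Difference target propagation (Lee et al., 2014): the target for the layer below is
% the current activity plus a difference correction through the learned inverse g(.)
\hat{z}_{L-1} \;=\; z_{L-1} \;-\; g(z_{L}) \;+\; g(\hat{z}_{L})
```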

SLIDE 40

Error-Driven Local Representation Alignment (LRA-E)

SLIDE 41

SLIDE 42

Transmit error along error feedback weights, and error-correct the post-activations using the transmitted displacement/delta

SLIDE 43

Calculate the local error in the layer below, measuring the discrepancy between the original post-activation and the error-corrected post-activation

SLIDE 44

Repeat the past several steps, error-correcting each layer further down within the network/system

SLIDE 45

SLIDE 46

Optional…substitute & repeat!
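A compact sketch of the LRA-E pass walked through on the last few slides: transmit the error above through error-feedback weights, error-correct the post-activation toward a target, record the new local error, and repeat downward. The correction step size `beta`, the variable names, and the shapes of the error-feedback weights `E` are assumptions of this sketch, not the paper's exact formulation; `h`, `y`, `W`, and `rng` are reused from the earlier sketches.

```python
def lra_e_pass(h, y, E, beta=0.1):
    """Error-driven local representation alignment (sketch): compute per-layer
    targets and local error units e[l] = h[l] - target[l], from the top down."""
    L = len(h) - 1
    target = [None] * (L + 1)
    e = [None] * (L + 1)
    e[L] = h[L] - y                        # top-layer mismatch with the desired output
    for l in range(L - 1, 0, -1):
        delta = E[l] @ e[l + 1]            # transmit the error above through feedback weights
        target[l] = h[l] - beta * delta    # error-correct this layer's post-activation
        e[l] = h[l] - target[l]            # local discrepancy driving this sub-graph's update
    # forward weights can then be updated locally, e.g. dW[l] ~ np.outer(e[l + 1], h[l])
    return target, e

# Error-feedback weights: E[l] maps layer (l+1)'s error back to layer l (illustrative shapes).
E = [None] + [rng.normal(0, 0.05, size=(W[l].shape[1], W[l].shape[0])) for l in range(1, len(W))]
```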

SLIDE 47

Aligning Local Representations

  • Credit assignment by optimizing subgraphs linked by error units

The Cauchy local loss:
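The formula itself did not survive extraction; a typical form of the Cauchy (log-penalty) local loss between a layer's activity z and its target t, with the scale taken to be 1 as an assumption, is:

```latex
\mathcal{L}\!\left(\mathbf{z}^{\ell}, \mathbf{t}^{\ell}\right) \;=\; \sum_{j} \log\!\left( 1 + \left( z^{\ell}_{j} - t^{\ell}_{j} \right)^{2} \right)
```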

SLIDE 48

Aligning Local Representations

  • Credit assignment by optimizing subgraphs linked by error units, motivated/inspired by (Rao & Ballard, 1999)

SLIDE 49

Aligning Local Representations

  • Credit assignment by optimizing subgraphs linked by error units, motivated/inspired by (Rao & Ballard, 1999)

There is more than one way to compute these changes

SLIDE 50

Some Experimental Results

SLIDE 51

Experimental Results

(Figure: sample inputs from MNIST (digits 7 and 3) and Fashion MNIST (Trousers, Dress, Shirt).)

(Ororbia et al., 2018 Bio)

SLIDE 52

Acquired Filters

Third-level filters acquired, after a single pass through the data, by a tanh network trained by (a) backprop, (b) LRA. (Panels, left to right: Backprop, LRA.)

SLIDE 53

Visualization of Topmost Post-Activities

SLIDE 54

(Plots: angle between the LRA, DFA, & DTP-σ updates and backprop's updates; total discrepancy measured while training with LRA-E.)

SLIDE 55

Training Deep (& Thin) Networks

Equilibrium Propagation (8 layers): MNIST 59.03%, Fashion MNIST 67.33%
Equilibrium Propagation (3 layers): MNIST 6.00%, Fashion MNIST 16.71%

(Ororbia et al., 2018 Credit)

SLIDE 56

Training Networks from Null Initialization

(Results shown for LWTA and SLWTA networks.)

(Ororbia et al., 2018 Credit)

SLIDE 57

Training Stochastic Networks


(Ororbia et al., 2018 Credit)

SLIDE 58

If time permits…let’s talk about modeling time…

SLIDE 59

Training Neural Temporal/Recurrent Models

The Parallel Temporal Neural Coding Network (P-TNCN) (Ororbia et al., 2018 Continual)

  • Integrating LRA into recurrent networks – the result is the Temporal Neural Coding Network

SLIDE 60

Removing Back-Propagation through Time!

  • Each step in time entails: 1) generate a hypothesis, 2) error-correct in light of evidence (see the sketch below)
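A schematic sketch of this two-phase step with no backpropagation through time: each step forms a prediction from the current state, measures the mismatch against the observed frame, and corrects the state locally before moving on. The function names `predict` and `correct` and the state-update form are placeholders; the actual P-TNCN equations are given in (Ororbia et al., 2018 Continual).

```python
def run_sequence(x_seq, state, predict, correct):
    """Per-step hypothesize-then-correct loop: no gradients flow across time steps."""
    errors = []
    for x_t in x_seq:
        x_hat = predict(state)        # 1) generate a hypothesis for the current observation
        e_t = x_t - x_hat             # mismatch with the actual evidence
        state = correct(state, e_t)   # 2) error-correct the state locally, then move on
        errors.append(e_t)
    return state, errors
```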

SLIDE 61

SLIDE 62

SLIDE 63

Conclusions

  • Backprop has issues; alignment algorithms fix only one of them (weight transport)
  • Other algorithms such as DTP or EP are slow…
  • Discrepancy reduction
    • Local representation alignment (LRA/LRA-E)
    • Adaptive noise difference target propagation (DTP-σ)
    • Showed promising results: stable and performant compared to alternatives such as Equilibrium Propagation & alignment algorithms
    • Can work with non-differentiable operators (discrete/stochastic)
    • Can be used to train recurrent/temporal models too!

SLIDE 64

Questions?

SLIDE 65

References

  • (Ororbia et al., 2018 Credit) -- Alexander G. Ororbia II, Ankur Mali, Daniel Kifer, and C. Lee Giles. “Deep Credit Assignment by Aligning Local Distributed Representations”. arXiv:1803.01834 [cs.LG].
  • (Ororbia et al., 2018 Continual) -- Alexander G. Ororbia II, Ankur Mali, C. Lee Giles, and Daniel Kifer. “Continual Learning of Recurrent Neural Networks by Locally Aligning Distributed Representations”. arXiv:1810.07411 [cs.LG].
  • (Ororbia et al., 2017 Adapt) -- Alexander G. Ororbia II, Patrick Haffner, David Reitter, and C. Lee Giles. “Learning to Adapt by Minimizing Discrepancy”. arXiv:1711.11542 [cs.LG].
  • (Ororbia et al., 2018 Lifelong) -- Alexander G. Ororbia II, Ankur Mali, Daniel Kifer, and C. Lee Giles. “Lifelong Neural Predictive Coding: Sparsity Yields Less Forgetting when Learning Cumulatively”. arXiv:1905.10696 [cs.LG].
  • (Ororbia et al., 2018 Bio) -- Alexander G. Ororbia II and Ankur Mali. “Biologically Motivated Algorithms for Propagating Local Target Representations”. In: Thirty-Third AAAI Conference on Artificial Intelligence.