SLIDE 1

Local Representation Alignment: A Biologically Motivated Algorithm for Training Neural Systems

Alexander G. Ororbia II, The Neural Adaptive Computing (NAC) Laboratory, Rochester Institute of Technology

SLIDE 2

Collaborators

  • The Pennsylvania State University
    • Dr. C. Lee Giles
    • Dr. Daniel Kifer
  • Rochester Institute of Technology (RIT)
    • Dr. Ifeoma Nwogu (computer vision)
    • Dr. Travis Desell (neuro-evolution, distributed computing)
  • Students
    • Ankur Mali (PhD student, Penn State, co-advised w/ Dr. C. Lee Giles)
    • Timothy Zee (PhD student, RIT, co-advised w/ Dr. Ifeoma Nwogu)
    • Abdelrahman Elsiad (PhD student, RIT, co-advised w/ Dr. Travis Desell)

SLIDE 3

Objectives

  • Context: credit assignment & algorithmic alternatives
    • Backpropagation of errors (backprop)
    • Feedback alignment algorithms
    • Target propagation (TP)
    • Contrastive Hebbian learning (CHL)
    • Equilibrium propagation (EP), Contrastive Divergence (CD)
  • Discrepancy Reduction – a family of learning procedures
    • Error-Driven Local Representation Alignment (LRA/LRA-E)
    • Adaptive Noise Difference Target Propagation (DTP-σ)
  • Experimental Results & Variations
  • Conclusions

SLIDE 4

SLIDE 5

Figure: components of a neural learning system, including credit-assignment algorithms (Backprop, CHL, LRA), optimizers (SGD, Adam, RMSprop), loss functions (MSE, MAE, CNLL), datasets (MNIST), and models (MLP, AE, BM, RNN).

MLP = Multilayer perceptron; AE = Autoencoder; BM = Boltzmann machine

SLIDE 6

Problems with Backprop

Global optimization, back-prop through whole graph.

  • The global feedback pathway
  • Vanishing/exploding gradients (even worse in recurrent networks!)
  • The weight transport problem
  • High sensitivity to initialization
  • Activation constraints/conditions
  • Requires the system to be fully differentiable → difficulty handling discrete-valued functions
  • Requires sufficient linearity → adversarial samples

SLIDE 7

Feedforward Inference

Illustration: forward propagation in a multilayer perceptron (MLP) to collect activities (shared across most algorithms, e.g., backprop, random feedback alignment, direct feedback alignment, local representation alignment)
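A minimal NumPy sketch of this shared forward pass, collecting pre-activations and post-activations for later credit assignment. The layer sizes, tanh nonlinearity, weight scale, and the omission of biases are illustrative assumptions, not the slide's exact setup.

```python
import numpy as np

def forward(x, W, phi=np.tanh):
    """Feedforward inference: collect pre-activations z and post-activations h."""
    h = [x]          # h[0] is the input
    z = [None]       # no pre-activation for the input layer
    for Wl in W:
        z.append(Wl @ h[-1])     # pre-activation of this layer
        h.append(phi(z[-1]))     # post-activation of this layer
    return z, h

# Example: a 784-256-256-10 MLP with randomly initialized weights (illustrative).
rng = np.random.default_rng(0)
W = [rng.normal(0, 0.05, size=(256, 784)),
     rng.normal(0, 0.05, size=(256, 256)),
     rng.normal(0, 0.05, size=(10, 256))]
z, h = forward(rng.normal(size=784), W)
```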

SLIDE 8

SLIDE 9

SLIDE 10

SLIDE 11

SLIDE 12

SLIDE 13

Backpropagation of Errors

SLIDE 14

Conducting credit assignment using the activities produced by the inference pass

SLIDE 15

Pass error signal back through post-activations (get derivatives w.r.t. pre-activations)

SLIDE 16

Pass error signal back through the (incoming) synaptic weights to get the error signal transmitted to post-activations in the layer below

SLIDE 17

Repeat the previous steps, layer by layer (recursive treatment of backprop procedure)
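A hedged sketch of this layer-by-layer recursion, reusing the `forward` routine sketched earlier; the mean-squared-error output signal and tanh derivative are illustrative choices rather than anything fixed by the talk.

```python
def backprop_deltas(z, h, y, W, dphi=lambda a: 1.0 - np.tanh(a) ** 2):
    """Recursively push the output error back through weights and activation derivatives."""
    L = len(W)
    delta = [None] * (L + 1)
    delta[L] = (h[L] - y) * dphi(z[L])          # output-layer error signal (MSE assumed)
    for l in range(L - 1, 0, -1):
        # pass error through the transpose of the layer-above weights,
        # then through the local activation derivative
        delta[l] = (W[l].T @ delta[l + 1]) * dphi(z[l])
    # weight gradients are local outer products: dW[l] = delta[l+1] h[l]^T
    return [np.outer(delta[l + 1], h[l]) for l in range(L)]
```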

SLIDE 18

SLIDE 19

SLIDE 20

Random Feedback Alignment

SLIDE 21

SLIDE 22

Pass error signal back through post-activations (get derivatives w.r.t. pre-activations)

SLIDE 23

Pass error signal back through fixed, random alignment weights (replaces backprop's step of passing the error through the transpose of the feedforward weights)
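A sketch of the same backward recursion with backprop's weight transpose replaced by fixed random feedback matrices `B`, reusing `W` and `rng` from the earlier sketches; the shapes and initialization of `B` are assumptions of this sketch.

```python
def rfa_deltas(z, h, y, W, B, dphi=lambda a: 1.0 - np.tanh(a) ** 2):
    """Random feedback alignment: error flows back through fixed random matrices B,
    not through the transpose of the learned feedforward weights."""
    L = len(W)
    delta = [None] * (L + 1)
    delta[L] = (h[L] - y) * dphi(z[L])
    for l in range(L - 1, 0, -1):
        delta[l] = (B[l] @ delta[l + 1]) * dphi(z[l])   # B[l] has the shape of W[l].T
    return [np.outer(delta[l + 1], h[l]) for l in range(L)]

# Fixed random feedback weights, drawn once and never trained (illustrative scale).
B = [None] + [rng.normal(0, 0.05, size=Wl.T.shape) for Wl in W[1:]]
```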

SLIDE 24

Repeat previous steps (similar to backprop)

SLIDE 25

SLIDE 26

SLIDE 27

Direct Feedback Alignment

SLIDE 28

SLIDE 29

Pass error signal back through post-activations (get derivatives w.r.t. pre-activations)

SLIDE 30

Pass error signal along first set of direct alignment weights to second layer

SLIDE 31

Pass error signal along next set of direct alignment weights to first layer

SLIDE 32

Treat the signals propagated along direct alignment connections as proxies for error derivatives and run them through post-activations in each layer, respectively
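A sketch of this procedure: the output error is projected directly to every hidden layer through its own fixed random matrix and treated as a proxy derivative. The matrices `D` and their shapes are assumptions of this sketch; `W` and `rng` are reused from the earlier sketches.

```python
def dfa_deltas(z, h, y, W, D, dphi=lambda a: 1.0 - np.tanh(a) ** 2):
    """Direct feedback alignment: one hop from the output error to each hidden layer."""
    L = len(W)
    e = h[L] - y                                # output error (MSE assumed)
    delta = [None] * (L + 1)
    delta[L] = e * dphi(z[L])
    for l in range(1, L):
        # direct random projection of the output error, used as a proxy derivative
        delta[l] = (D[l] @ e) * dphi(z[l])
    return [np.outer(delta[l + 1], h[l]) for l in range(L)]

# One fixed random matrix per hidden layer: D[l] maps dim(e) -> dim(h[l]) (illustrative).
D = [None] + [rng.normal(0, 0.05, size=(Wl.shape[0], W[-1].shape[0])) for Wl in W[:-1]]
```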

SLIDE 33

SLIDE 34

Comparison of the error signals: Backpropagation of Errors, Random Feedback Alignment, Direct Feedback Alignment (equations shown on the slide).
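The equations themselves were not captured in this transcript. For reference, the standard textbook forms of the three per-layer error signals, in commonly used notation (W for feedforward weights, B for fixed random feedback matrices of appropriate shape, e for the output error, φ′ for the activation derivative), are approximately:

```latex
% Backpropagation of errors: error descends through the transposed feedforward weights
\delta^{\ell} = \big( (W^{\ell+1})^{\top} \delta^{\ell+1} \big) \odot \phi'(z^{\ell})

% Random feedback alignment: the transpose is replaced by a fixed random matrix
\delta^{\ell} = \big( B^{\ell+1} \, \delta^{\ell+1} \big) \odot \phi'(z^{\ell})

% Direct feedback alignment: the output error e is projected straight to layer \ell
\delta^{\ell} = \big( B^{\ell} \, e \big) \odot \phi'(z^{\ell})
```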

SLIDE 35

Global versus Local Signals

Global optimization, back-prop through whole graph. Local optimization, back-prop through sub-graphs.

SLIDE 36

Global versus Local Signals

Global optimization, back-prop through whole graph. Local optimization, back-prop through sub-graphs.

Global feedback pathway. Will these yield coherent models?

SLIDE 37

Equilibrium Propagation

(Figure: negative phase vs. positive phase)

SLIDE 38

The Discrepancy Reduction Family

  • General process (Ororbia et al., 2017 Adapt):
    • 1) Search for latent representations that better explain the input/output (targets)
    • 2) Reduce the mismatch between the currently “guessed” representations & the target representations
  • Sum of internal, local losses (in nats) → total discrepancy (akin to a “pseudo-energy”)
  • Coordinated local learning rules
  • Algorithms
    • Difference target propagation (DTP) (Lee et al., 2014)
    • DTP-σ (Ororbia et al., 2019)
    • LRA (Ororbia et al., 2018; Ororbia et al., 2019)
    • Others – targets could come from an external, interacting process
      • NPC (neural predictive coding; Ororbia et al., 2017/2018/2019)
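A minimal sketch of the “total discrepancy” quantity from this list: a sum of local, per-layer losses between the currently guessed representations and their targets. The squared-error local loss here is only a placeholder; the talk's own local loss (e.g., the Cauchy loss shown later) can be substituted.

```python
def total_discrepancy(z_guess, z_target,
                      local_loss=lambda a, b: 0.5 * float(np.sum((a - b) ** 2))):
    """Sum of local, per-layer losses between guessed and target representations."""
    return sum(local_loss(zg, zt) for zg, zt in zip(z_guess, z_target))
```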

SLIDE 39

Adaptive Noise Difference Target Propagation (DTP-σ)

Image adapted from (Lillicrap et al., 2018)

(Figure labels: z_L, ẑ_L, z_{L-1}, ẑ_{L-1}, g(z_L), g(ẑ_L), where g(·) is the learned feedback/inverse mapping.)
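As background for this slide, the difference target propagation correction of (Lee et al., 2014), which DTP-σ builds on, forms the target for the layer below using the learned inverse/feedback mapping g(·). DTP-σ, as presented in the talk, makes the Gaussian noise used in this procedure adaptive rather than fixed; the exact noise placement and adaptation rule are not captured in this transcript.

```latex
% Difference target propagation (Lee et al., 2014): the target for the layer below is
% the current activity plus a difference correction through the learned inverse g(.)
\hat{z}_{L-1} \;=\; z_{L-1} \;-\; g(z_{L}) \;+\; g(\hat{z}_{L})
```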

SLIDE 40

Error-Driven Local Representation Alignment (LRA-E)

SLIDE 41

SLIDE 42

Transmit error along error feedback weights, and error-correct the post-activations using the transmitted displacement/delta

SLIDE 43

Calculate the local error in the layer below, measuring the discrepancy between the original post-activation and the error-corrected post-activation

SLIDE 44

Repeat the past several steps, error-correcting each layer further down within the network/system

SLIDE 45

SLIDE 46

Optional…substitute & repeat!
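A compact sketch of the LRA-E pass walked through on the last few slides: transmit the error above through error-feedback weights, error-correct the post-activation toward a target, record the new local error, and repeat downward. The correction step size `beta`, the variable names, and the shapes of the error-feedback weights `E` are assumptions of this sketch, not the paper's exact formulation; `h`, `y`, `W`, and `rng` are reused from the earlier sketches.

```python
def lra_e_pass(h, y, E, beta=0.1):
    """Error-driven local representation alignment (sketch): compute per-layer
    targets and local error units e[l] = h[l] - target[l], from the top down."""
    L = len(h) - 1
    target = [None] * (L + 1)
    e = [None] * (L + 1)
    e[L] = h[L] - y                        # top-layer mismatch with the desired output
    for l in range(L - 1, 0, -1):
        delta = E[l] @ e[l + 1]            # transmit the error above through feedback weights
        target[l] = h[l] - beta * delta    # error-correct this layer's post-activation
        e[l] = h[l] - target[l]            # local discrepancy driving this sub-graph's update
    # forward weights can then be updated locally, e.g. dW[l] ~ np.outer(e[l + 1], h[l])
    return target, e

# Error-feedback weights: E[l] maps layer (l+1)'s error back to layer l (illustrative shapes).
E = [None] + [rng.normal(0, 0.05, size=(W[l].shape[1], W[l].shape[0])) for l in range(1, len(W))]
```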

SLIDE 47

Aligning Local Representations

  • Credit assignment by optimizing subgraphs linked by error units

The Cauchy local loss:
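The formula itself did not survive extraction; a typical form of the Cauchy (log-penalty) local loss between a layer's activity z and its target t, with the scale taken to be 1 as an assumption, is:

```latex
\mathcal{L}\!\left(\mathbf{z}^{\ell}, \mathbf{t}^{\ell}\right) \;=\; \sum_{j} \log\!\left( 1 + \left( z^{\ell}_{j} - t^{\ell}_{j} \right)^{2} \right)
```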

SLIDE 48

Aligning Local Representations

  • Credit assignment by optimizing subgraphs linked by error units, motivated/inspired by (Rao & Ballard, 1999)

SLIDE 49

Aligning Local Representations

  • Credit assignment by optimizing subgraphs linked by error units, motivated/inspired by (Rao & Ballard, 1999)

There is more than one way to compute these changes

SLIDE 50

Some Experimental Results

SLIDE 51

Experimental Results

(Figure: sample inputs from MNIST (digits 7 and 3) and Fashion MNIST (Trousers, Dress, Shirt).)

(Ororbia et al., 2018 Bio)

SLIDE 52

Acquired Filters

Third-level filters acquired, after a single pass through the data, by a tanh network trained by (a) backprop, (b) LRA. (Panels, left to right: Backprop, LRA.)

SLIDE 53

Visualization of Topmost Post-Activities

SLIDE 54

(Plots: angle between the LRA, DFA, & DTP-σ updates and backprop's updates; total discrepancy measured while training with LRA-E.)

SLIDE 55

Training Deep (& Thin) Networks

Equilibrium Propagation (8 layers): MNIST 59.03%, Fashion MNIST 67.33%
Equilibrium Propagation (3 layers): MNIST 6.00%, Fashion MNIST 16.71%

(Ororbia et al., 2018 Credit)

SLIDE 56

Training Networks from Null Initialization

(Results shown for LWTA and SLWTA networks.)

(Ororbia et al., 2018 Credit)

SLIDE 57

Training Stochastic Networks


(Ororbia et al., 2018 Credit)

SLIDE 58

If time permits…let’s talk about modeling time…

SLIDE 59

Training Neural Temporal/Recurrent Models

The Parallel Temporal Neural Coding Network (P-TNCN) (Ororbia et al., 2018 Continual)

  • Integrating LRA into recurrent networks – the result is the Temporal Neural Coding Network

SLIDE 60

Removing Back-Propagation through Time!

  • Each step in time entails: 1) generate a hypothesis, 2) error-correct in light of evidence (see the sketch below)
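A schematic sketch of this two-phase step with no backpropagation through time: each step forms a prediction from the current state, measures the mismatch against the observed frame, and corrects the state locally before moving on. The function names `predict` and `correct` and the state-update form are placeholders; the actual P-TNCN equations are given in (Ororbia et al., 2018 Continual).

```python
def run_sequence(x_seq, state, predict, correct):
    """Per-step hypothesize-then-correct loop: no gradients flow across time steps."""
    errors = []
    for x_t in x_seq:
        x_hat = predict(state)        # 1) generate a hypothesis for the current observation
        e_t = x_t - x_hat             # mismatch with the actual evidence
        state = correct(state, e_t)   # 2) error-correct the state locally, then move on
        errors.append(e_t)
    return state, errors
```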

SLIDE 61

SLIDE 62

SLIDE 63

Conclusions

  • Backprop has issues; alignment algorithms fix only one of them (weight transport)
  • Other algorithms such as DTP or EP are slow…
  • Discrepancy reduction
    • Local representation alignment (LRA/LRA-E)
    • Adaptive noise difference target propagation (DTP-σ)
    • Showed promising results: stable and performant compared to alternatives such as Equilibrium Propagation & alignment algorithms
    • Can work with non-differentiable operators (discrete/stochastic)
    • Can be used to train recurrent/temporal models too!

SLIDE 64

Questions?

SLIDE 65

References

  • (Ororbia et al., 2018 Credit) -- Alexander G. Ororbia II, Ankur Mali, Daniel Kifer, and C. Lee Giles. “Deep Credit Assignment by Aligning Local Distributed Representations”. arXiv:1803.01834 [cs.LG].
  • (Ororbia et al., 2018 Continual) -- Alexander G. Ororbia II, Ankur Mali, C. Lee Giles, and Daniel Kifer. “Continual Learning of Recurrent Neural Networks by Locally Aligning Distributed Representations”. arXiv:1810.07411 [cs.LG].
  • (Ororbia et al., 2017 Adapt) -- Alexander G. Ororbia II, Patrick Haffner, David Reitter, and C. Lee Giles. “Learning to Adapt by Minimizing Discrepancy”. arXiv:1711.11542 [cs.LG].
  • (Ororbia et al., 2018 Lifelong) -- Alexander G. Ororbia II, Ankur Mali, Daniel Kifer, and C. Lee Giles. “Lifelong Neural Predictive Coding: Sparsity Yields Less Forgetting when Learning Cumulatively”. arXiv:1905.10696 [cs.LG].
  • (Ororbia et al., 2018 Bio) -- Alexander G. Ororbia II and Ankur Mali. “Biologically Motivated Algorithms for Propagating Local Target Representations”. In: Thirty-Third AAAI Conference on Artificial Intelligence.