SLIDE 1

Online Memorization of Random Firing Sequences by a Recurrent Neural Network

Patrick Murer and Hans-Andrea Loeliger, ETH Zürich

ISIT 2020

Signal and Information Processing Laboratory Institut für Signal- und Informationsverarbeitung

SLIDE 2

Setting the Stage

  • Background: Spiking neural networks

– Models of biological neural networks
– Candidates for neuromorphic hardware
– A mode of mathematical signal processing

  • This paper:

– Fully connected recurrent neural network
– Memorize long sequences of binary vectors
– Using quasi-Hebbian (i.e., “local”) learning rules

This paper is not directly related to nonspiking recurrent neural networks (LSTM etc.).


SLIDE 3

Preview of Main Results

  • Single-pass quasi-Hebbian memorization is possible...

  • ...but requires more resources (neurons, connections) than multi-pass memorization.

  • Multi-pass memorization achieves O(1) bits per connection (i.e., per synapse), which beats the Hopfield network.

  • Perhaps useful for understanding short-term memory vs. long-term memory in neuroscience.


SLIDE 4

Fully Connected Recurrent Neural Network Model

Network with L = 4 neurons which produces y[1], y[2], . . . ∈ {0, 1}^L:

[Block diagram: each neuron ξℓ, ℓ = 1, . . . , 4, sees the full state y[k] = (y1[k], . . . , y4[k]), and its output yℓ[k+1] = ξℓ(y[k]) ∈ {0, 1} is fed back through a unit delay z−1.]


SLIDE 5

Neurons with Bounded Disturbance

[Same block diagram as on Slide 4.]

Each neuron is a mapping ξℓ : ℝ^L → {0, 1} defined as

  y ↦ ξℓ(y) := 1 if ⟨y, wℓ⟩ + ηℓ ≥ θℓ, and 0 otherwise,

where ⟨y, wℓ⟩ := wℓ^T y, i.e., the output is a threshold on a linear combination of the inputs.

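As a minimal sketch of this neuron model (the function name, the convention of stacking the wℓ as rows of a matrix W, and the uniform distribution of the disturbance are our own choices; the slides only require |ηℓ| ≤ η):

```python
import numpy as np

def step(y, W, theta, eta_bound=0.0, rng=None):
    """One network update y[k] -> y[k+1].

    Neuron ell fires iff <y, w_ell> + eta_ell >= theta_ell.
    W holds the weight vectors w_ell as rows, theta the thresholds;
    the disturbances eta_ell are drawn uniformly from [-eta_bound, eta_bound]
    (any bounded disturbance would do).
    """
    rng = rng if rng is not None else np.random.default_rng()
    eta = rng.uniform(-eta_bound, eta_bound, size=W.shape[0])
    return (W @ y + eta >= theta).astype(np.uint8)
```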

SLIDE 6

Neurons with Bounded Disturbance

[Same block diagram as on Slide 4.]

  • The disturbance (or error) ηℓ is bounded as −η ≤ ηℓ ≤ η, ℓ = 1, . . . , L.

  • The bound η will be allowed to grow linearly with L.


SLIDE 7

Memorizing Firing Sequences

  • The goal is to reproduce a firing sequence of length N, given in the form of a matrix

      A = (a1, . . . , aN) ∈ {0, 1}^{L×N}

    with columns a1, . . . , aN ∈ {0, 1}^L.

  • Thus, if the network is initialized with an arbitrary column y[0] = an, then it should produce the sequence

      y[k] = a(k+n) mod N, k = 1, 2, . . . ,

    with a0 := aN.

  • By contrast, a Hopfield network memorizes static vectors.

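A sketch of this success criterion (the helper name is ours; noise-free for simplicity, and with the columns of A indexed from 0 in code):

```python
import numpy as np

def memorizes(A, W, theta, n0=0, num_steps=None):
    """Check that the noise-free network, initialized with column n0 of A,
    reproduces the cyclic continuation a_{(k+n0) mod N}, k = 1, 2, ...."""
    L, N = A.shape
    num_steps = num_steps if num_steps is not None else 2 * N
    y = A[:, n0].astype(np.uint8)
    for k in range(1, num_steps + 1):
        y = (W @ y >= theta).astype(np.uint8)   # eta_ell = 0
        if not np.array_equal(y, A[:, (n0 + k) % N]):
            return False
    return True
```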

SLIDE 8

Quasi-Hebbian Learning

Given A = (aℓ,n), we consider learning rules of the following form: starting from wℓ^(0) ∈ ℝ^L, the weights are updated recursively by

  wℓ^(n) = wℓ^(n−1) + ∆wℓ,n, n = 1, . . . , K,

where ∆wℓ,n depends only on aℓ,n and an−1, and perhaps also on wℓ^(n−1).

  • These restrictions essentially agree with those of Hebbian learning...
  • ...but Hebbian learning is normally unsupervised.
  • Suitable for hardware implementation (biological or neuromorphic).


SLIDE 9

Single-pass vs. Multi-pass Memorization

Single-pass

Exactly one pass through the data, i.e., K = N, with

  ∆wℓ,n := aℓ,n (an−1 − p 1L),

where 1L := (1, 1, . . . , 1)^T ∈ ℝ^L and 0 < p < 1.

Multi-pass

Multiple passes through the data, i.e., K ≫ N, with

  ∆wℓ,n := β(n) (aℓ,n − ⟨an−1, wℓ^(n−1)⟩) an−1,

for some step size β(n) > 0.


SLIDE 10

Single-pass Memorization

Recall the single-pass rule (Slide 9): exactly one pass through the data, i.e., K = N, with

  ∆wℓ,n := aℓ,n (an−1 − p 1L),

where 1L := (1, 1, . . . , 1)^T ∈ ℝ^L and 0 < p < 1.


SLIDE 11

Single-pass Memorization of Random Firing Sequences

  • We analyze the probability of perfect memorization for a random matrix A ∈ {0, 1}^{L×N} with i.i.d. entries aℓ,n parameterized by p := Pr[aℓ,n = 1], which we denote by A i.i.d. ∼ Ber(p)^{L×N}.

  • Then, for ℓ = 1, . . . , L, we fix the weights to wℓ := wℓ^(N), where

      wℓ^(n) := wℓ^(n−1)                  if aℓ,n = 0,
      wℓ^(n) := wℓ^(n−1) + an−1 − p 1L    if aℓ,n = 1,

    with wℓ^(0) := 0, and the thresholds to θℓ := θ := (1/4) L p (1 − p).

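In matrix form, the whole single-pass rule collapses to a single product. A sketch under the conventions above (the vectorization and the experiment's parameter values are ours):

```python
import numpy as np

def single_pass_weights(A, p):
    """Single-pass weights w_ell = sum over n with a_{ell,n} = 1 of (a_{n-1} - p*1_L),
    with a_0 := a_N (cyclic shift), and thresholds theta = L*p*(1-p)/4."""
    L, N = A.shape
    A_prev = np.roll(A, 1, axis=1)           # columns a_N, a_1, ..., a_{N-1}
    W = A.astype(float) @ (A_prev - p).T     # row ell is w_ell = w_ell^(N)
    theta = np.full(L, L * p * (1 - p) / 4)
    return W, theta

# Quick experiment, reusing memorizes() from the Slide 7 sketch;
# memorization succeeds w.h.p. only when N is small relative to L:
rng = np.random.default_rng(0)
L, N, p = 4000, 10, 0.5
A = (rng.random((L, N)) < p).astype(np.uint8)
W, theta = single_pass_weights(A, p)
print(memorizes(A, W, theta))                # True w.h.p. for these sizes
```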

SLIDE 12

Main Result

Let EA be the event that the memorization of A is not perfect.

Theorem (Upper Bound on Pr[EA])

For all integers L ≥ 1, N ≥ 2, all 0 < p < 1, and A i.i.d. ∼ Ber(p)^{L×N}, the recurrent network with weights w1, . . . , wL and threshold θ as defined above, with disturbance bound η := η̃ · θ, 0 < η̃ < 1, and initialized with any column of A, will reproduce a periodic extension of A such that

  Pr[EA] < 2LN e^(−c1 L/N) + LN e^(−c2 L),

with c1 := (1/8)(1 − η̃)² p² (1 − p)² and c2 := DKL((1 + η̃)p/2 ∥ p).

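The bound is easy to evaluate numerically. A sketch (we read DKL as the binary Kullback–Leibler divergence in nats, between the Bernoulli parameters (1 + η̃)p/2 and p, since it sits in an exponent alongside c1):

```python
import numpy as np

def error_bound(L, N, p=0.5, eta_tilde=0.125):
    """Theorem's upper bound 2*L*N*exp(-c1*L/N) + L*N*exp(-c2*L)."""
    c1 = (1 - eta_tilde)**2 * p**2 * (1 - p)**2 / 8
    q = (1 + eta_tilde) / 2 * p                                   # shifted firing rate
    c2 = q * np.log(q / p) + (1 - q) * np.log((1 - q) / (1 - p))  # D_KL(q || p)
    return 2 * L * N * np.exp(-c1 * L / N) + L * N * np.exp(-c2 * L)

print(error_bound(L=5.0e6, N=1000))   # approx. 1e-3, consistent with Slide 14
```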

SLIDE 13

Main Result – Dependence of N on L

  • A sufficient condition for the upper bound on Pr[EA] to vanish as L → ∞ is

      N ≤ N*(L) := c1 L / ln(L²).

    In contrast, the upper bound on Pr[EA] diverges to +∞ as L → ∞ for

      N1(L) := c1 L / (ln(L²))^r, 0 < r < 1, and
      N2(L) := γ N*(L), γ > 1.

  • N*(·) grows faster than L ↦ γ L^q for 0 < q < 1, γ > 0, i.e.,

      lim_{L→∞} γ L^q / N*(L) = 0.

  • Asymptotically, almost-square matrices are memorizable:

      ∀ε > 0 ∃Lε ∈ ℕ : L · N*(L) ≥ L^(2−ε) for all L ≥ Lε.


SLIDE 14

Main Result – Dependence of L on N

[Plot, N (horizontal, 10^1 to 10^4) versus L (vertical, 10^4 to 10^8), both on logarithmic scales: the value of L required for the upper bound on Pr[EA] to equal 10^−3, 10^−6, 10^−9, 10^−12 (from bottom to top), for p = 1/2 and η̃ = 1/8.]


SLIDE 15

Multi-pass Memorization

Recall the multi-pass rule (Slide 9): multiple passes through the data, i.e., K ≫ N, with

  ∆wℓ,n := β(n) (aℓ,n − ⟨an−1, wℓ^(n−1)⟩) an−1,

for some step size β(n) > 0.


SLIDE 16

An Elementary Analysis using Least-squares

For fixed ℓ ∈ {1, . . . , L}, consider the least-squares problem

  min_{wℓ ∈ ℝ^L} Σ_{n=1}^{N} (⟨an−1, wℓ⟩ − aℓ,n)² = min_{wℓ ∈ ℝ^L} ‖Ã wℓ − ãℓ‖²,

where Ã ∈ ℝ^{N×L} has rows aN^T, a1^T, . . . , aN−1^T, and ãℓ := (aℓ,1, . . . , aℓ,N)^T ∈ ℝ^N.

Note that Ã is the transpose of (aN, a1, . . . , aN−1) ∈ ℝ^{L×N}, i.e., of A cyclically shifted by one time step, and ãℓ is the ℓ-th row of A turned into a column vector.

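A sketch of this construction, solving for all ℓ jointly (the helper name is ours):

```python
import numpy as np

def least_squares_weights(A):
    """Minimize ||A_tilde @ w_ell - a_tilde_ell||^2 for every ell at once.

    A_tilde is A cyclically shifted by one time step, transposed
    (rows a_N^T, a_1^T, ..., a_{N-1}^T); a_tilde_ell is row ell of A.
    """
    A_tilde = np.roll(A, 1, axis=1).T.astype(float)   # N x L
    targets = A.T.astype(float)                       # column ell is a_tilde_ell
    W_cols, *_ = np.linalg.lstsq(A_tilde, targets, rcond=None)
    return W_cols.T                                   # row ell is w_ell
```

With these weights, ⟨an−1, wℓ⟩ approximates aℓ,n ∈ {0, 1}, so a natural threshold (our assumption; the slide does not specify one here) is θℓ = 1/2.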

SLIDE 17

Learning the Weights by Gradient Descent Methods

The least-squares problem can be solved by

  • gradient descent:

      wℓ^(n) = wℓ^(n−1) + β(n) Ã^T (ãℓ − Ã wℓ^(n−1)),

    which converges to a minimizer if β(n) = β and 0 < β < 2 / λmax(Ã^T Ã);

  • stochastic gradient descent:

      wℓ^(n) = wℓ^(n−1) + β(n) (aℓ,n − ⟨an−1, wℓ^(n−1)⟩) an−1,

    which fulfills the conditions of quasi-Hebbian learning.

Clearly, the performance of both methods depends highly on the initial guess wℓ^(0) ∈ ℝ^L and the step size β(n) > 0.

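A sketch of the stochastic-gradient variant with a constant step size, updating all L neurons in parallel (the default step size and pass count are illustrative choices of ours):

```python
import numpy as np

def multi_pass_sgd(A, beta=None, num_passes=500):
    """Quasi-Hebbian SGD: w_ell += beta * (a_{ell,n} - <a_{n-1}, w_ell>) * a_{n-1},
    cycling through n = 1, ..., N, so that K = num_passes * N."""
    L, N = A.shape
    beta = beta if beta is not None else 1.0 / L   # keeps beta * ||a_{n-1}||^2 < 2 (LMS-style stability)
    A_prev = np.roll(A, 1, axis=1).astype(float)   # a_{n-1}, with a_0 := a_N
    W = np.zeros((L, L))                           # row ell is w_ell, w_ell^(0) = 0
    for _ in range(num_passes):
        for n in range(N):
            x = A_prev[:, n]
            err = A[:, n] - W @ x                  # one error per neuron
            W += beta * np.outer(err, x)
    return W
```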

SLIDE 18

Sufficient Condition for Memorizable Matrices

  • If rank(Ã) = N, then

      min_{wℓ ∈ ℝ^L} ‖Ã wℓ − ãℓ‖² = 0,

    which implies that A is (perfectly) memorizable, i.e., Pr[EA] = 0.

  • For L ≥ N, 0 < p ≤ 1/2, and A i.i.d. ∼ Ber(p)^{L×N}, it follows from [1] that

      Pr[rank(Ã) = N] ≥ 1 − (1 − p + oN(1))^N → 1 as N → ∞,

    where lim_{N→∞} oN(1) = 0.

Theorem

Any A i.i.d. ∼ Ber(p)^{L×N} with N = L is memorizable with probability tending to one as N → ∞.

[1] K. Tikhomirov, “Singularity of random Bernoulli matrices,” Annals of Mathematics, vol. 191, no. 2, pp. 593–634, March 2020.

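A quick empirical check of the rank condition at small scale (parameters illustrative):

```python
import numpy as np

# Does the shifted matrix A_tilde have full column rank N for a square A?
rng = np.random.default_rng(1)
L = N = 200
p = 0.5
A = (rng.random((L, N)) < p).astype(float)
A_tilde = np.roll(A, 1, axis=1).T              # N x L, here square
print(np.linalg.matrix_rank(A_tilde) == N)     # True with high probability
```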

SLIDE 19

Asymptotic Memorization Capacities

The memorization capacity is the total number of bits which a network is able to memorize; thus

  Cmem ≥ log2 |Atypical|  [bits],

where Atypical is a typical set of matrices for A i.i.d. ∼ Ber(p)^{L×N}, and

  lim_{L→∞} (1/L) log2 |Atypical| = Hb(p) N.

Asymptotic capacity:

                         Hopfield           Single-pass                  Multi-pass
  bits                   (1/2) L²/ln(L)     (1/2) c1 Hb(p) L²/ln(L)      Hb(p) L²
  bits per neuron        (1/2) L/ln(L)      (1/2) c1 Hb(p) L/ln(L)       Hb(p) L
  bits per connection    (1/2) / ln(L)      (1/2) c1 Hb(p) / ln(L)       Hb(p)

Recall that c1 := (1/8)(1 − η̃)² p² (1 − p)² and Hb(p) := −p log2(p) − (1 − p) log2(1 − p).

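For concreteness, the bits-per-connection row can be evaluated numerically (a sketch using the constants defined on the slides):

```python
import numpy as np

def bits_per_connection(L, p=0.5, eta_tilde=0.125):
    """Asymptotic bits per connection for the three schemes in the table."""
    Hb = -p * np.log2(p) - (1 - p) * np.log2(1 - p)   # binary entropy
    c1 = (1 - eta_tilde)**2 * p**2 * (1 - p)**2 / 8
    return {"Hopfield":    0.5 / np.log(L),
            "single-pass": 0.5 * c1 * Hb / np.log(L),
            "multi-pass":  Hb}

print(bits_per_connection(L=10**6))
# -> roughly {'Hopfield': 0.036, 'single-pass': 0.00022, 'multi-pass': 1.0}
```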

SLIDE 20

Conclusion

  • Single-pass quasi-Hebbian learning is possible; its capacity scales like that of a Hopfield network...

  • ...but it requires more resources (neurons, synapses) than multi-pass memorization.

  • Multi-pass memorization achieves O(1) bits per synapse, which beats the Hopfield network.

  • Perhaps useful for understanding short-term memory vs. long-term memory in neuroscience.
