SLIDE 1

Online Memorization of Random Firing Sequences by a Recurrent Neural Network

Patrick Murer and Hans-Andrea Loeliger, ETH Zürich

ISIT 2020

Signal and Information Processing Laboratory Institut für Signal- und Informationsverarbeitung

SLIDE 2

Setting the Stage

  • Background: Spiking neural networks

– Models of biological neural networks
– Candidates for neuromorphic hardware
– A mode of mathematical signal processing

  • This paper:

– Fully connected recurrent neural network
– Memorize long sequences of binary vectors
– Using quasi-Hebbian (i.e., “local”) learning rules

This paper is not directly related to nonspiking recurrent neural networks (LSTM etc.).


SLIDE 3

Preview of Main Results

  • Single-pass quasi-Hebbian memorization is possible...

  • ...but requires more resources (neurons, connections) than multi-pass memorization.

  • Multi-pass memorization achieves O(1) bits per connection (i.e., per synapse), which beats the Hopfield network.

  • Perhaps useful for understanding short-term memory vs. long-term memory in neuroscience.


SLIDE 4

Fully Connected Recurrent Neural Network Model

Network with L = 4 neurons which produces y[1], y[2], . . . ∈ {0, 1}^L:

[Block diagram: each neuron ξℓ, ℓ = 1, . . . , 4, sees the full state y[k] = (y1[k], . . . , y4[k]), and its output yℓ[k+1] = ξℓ(y[k]) ∈ {0, 1} is fed back through a unit delay z−1.]


SLIDE 5

Neurons with Bounded Disturbance

[Same block diagram as on Slide 4.]

Each neuron is a mapping ξℓ : ℝ^L → {0, 1} defined as

  y ↦ ξℓ(y) := 1 if ⟨y, wℓ⟩ + ηℓ ≥ θℓ, and 0 otherwise,

where ⟨y, wℓ⟩ := wℓ^T y, i.e., the output is a threshold on a linear combination of the inputs.

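As a minimal sketch of this neuron model (the function name, the convention of stacking the wℓ as rows of a matrix W, and the uniform distribution of the disturbance are our own choices; the slides only require |ηℓ| ≤ η):

```python
import numpy as np

def step(y, W, theta, eta_bound=0.0, rng=None):
    """One network update y[k] -> y[k+1].

    Neuron ell fires iff <y, w_ell> + eta_ell >= theta_ell.
    W holds the weight vectors w_ell as rows, theta the thresholds;
    the disturbances eta_ell are drawn uniformly from [-eta_bound, eta_bound]
    (any bounded disturbance would do).
    """
    rng = rng if rng is not None else np.random.default_rng()
    eta = rng.uniform(-eta_bound, eta_bound, size=W.shape[0])
    return (W @ y + eta >= theta).astype(np.uint8)
```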

SLIDE 6

Neurons with Bounded Disturbance

[Same block diagram as on Slide 4.]

  • The disturbance (or error) ηℓ is bounded as −η ≤ ηℓ ≤ η, ℓ = 1, . . . , L.

  • The bound η will be allowed to grow linearly with L.


SLIDE 7

Memorizing Firing Sequences

  • The goal is to reproduce a firing sequence of length N, given in the form of a matrix

      A = (a1, . . . , aN) ∈ {0, 1}^{L×N}

    with columns a1, . . . , aN ∈ {0, 1}^L.

  • Thus, if the network is initialized with an arbitrary column y[0] = an, then it should produce the sequence

      y[k] = a(k+n) mod N, k = 1, 2, . . . ,

    with a0 := aN.

  • By contrast, a Hopfield network memorizes static vectors.

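A sketch of this success criterion (the helper name is ours; noise-free for simplicity, and with the columns of A indexed from 0 in code):

```python
import numpy as np

def memorizes(A, W, theta, n0=0, num_steps=None):
    """Check that the noise-free network, initialized with column n0 of A,
    reproduces the cyclic continuation a_{(k+n0) mod N}, k = 1, 2, ...."""
    L, N = A.shape
    num_steps = num_steps if num_steps is not None else 2 * N
    y = A[:, n0].astype(np.uint8)
    for k in range(1, num_steps + 1):
        y = (W @ y >= theta).astype(np.uint8)   # eta_ell = 0
        if not np.array_equal(y, A[:, (n0 + k) % N]):
            return False
    return True
```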

SLIDE 8

Quasi-Hebbian Learning

Given A = (aℓ,n), we consider learning rules of the following form: starting from wℓ^(0) ∈ ℝ^L, the weights are updated recursively by

  wℓ^(n) = wℓ^(n−1) + ∆wℓ,n, n = 1, . . . , K,

where ∆wℓ,n depends only on aℓ,n and an−1, and perhaps also on wℓ^(n−1).

  • These restrictions essentially agree with those of Hebbian learning...
  • ...but Hebbian learning is normally unsupervised.
  • Suitable for hardware implementation (biological or neuromorphic).


SLIDE 9

Single-pass vs. Multi-pass Memorization

Single-pass

Exactly one pass through the data, i.e., K = N, with

  ∆wℓ,n := aℓ,n (an−1 − p 1L),

where 1L := (1, 1, . . . , 1)^T ∈ ℝ^L and 0 < p < 1.

Multi-pass

Multiple passes through the data, i.e., K ≫ N, with

  ∆wℓ,n := β(n) (aℓ,n − ⟨an−1, wℓ^(n−1)⟩) an−1,

for some step size β(n) > 0.


SLIDE 10

Single-pass Memorization

Recall the single-pass rule (Slide 9): exactly one pass through the data, i.e., K = N, with

  ∆wℓ,n := aℓ,n (an−1 − p 1L),

where 1L := (1, 1, . . . , 1)^T ∈ ℝ^L and 0 < p < 1.


SLIDE 11

Single-pass Memorization of Random Firing Sequences

  • We analyze the probability of perfect memorization for a random matrix A ∈ {0, 1}^{L×N} with i.i.d. entries aℓ,n parameterized by p := Pr[aℓ,n = 1], which we denote by A i.i.d. ∼ Ber(p)^{L×N}.

  • Then, for ℓ = 1, . . . , L, we fix the weights to wℓ := wℓ^(N), where

      wℓ^(n) := wℓ^(n−1)                  if aℓ,n = 0,
      wℓ^(n) := wℓ^(n−1) + an−1 − p 1L    if aℓ,n = 1,

    with wℓ^(0) := 0, and the thresholds to θℓ := θ := (1/4) L p (1 − p).

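In matrix form, the whole single-pass rule collapses to a single product. A sketch under the conventions above (the vectorization and the experiment's parameter values are ours):

```python
import numpy as np

def single_pass_weights(A, p):
    """Single-pass weights w_ell = sum over n with a_{ell,n} = 1 of (a_{n-1} - p*1_L),
    with a_0 := a_N (cyclic shift), and thresholds theta = L*p*(1-p)/4."""
    L, N = A.shape
    A_prev = np.roll(A, 1, axis=1)           # columns a_N, a_1, ..., a_{N-1}
    W = A.astype(float) @ (A_prev - p).T     # row ell is w_ell = w_ell^(N)
    theta = np.full(L, L * p * (1 - p) / 4)
    return W, theta

# Quick experiment, reusing memorizes() from the Slide 7 sketch;
# memorization succeeds w.h.p. only when N is small relative to L:
rng = np.random.default_rng(0)
L, N, p = 4000, 10, 0.5
A = (rng.random((L, N)) < p).astype(np.uint8)
W, theta = single_pass_weights(A, p)
print(memorizes(A, W, theta))                # True w.h.p. for these sizes
```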

SLIDE 12

Main Result

Let EA be the event that the memorization of A is not perfect.

Theorem (Upper Bound on Pr[EA])

For all integers L ≥ 1, N ≥ 2, all 0 < p < 1, and A i.i.d. ∼ Ber(p)^{L×N}, the recurrent network with weights w1, . . . , wL and threshold θ as defined above, with disturbance bound η := η̃ · θ, 0 < η̃ < 1, and initialized with any column of A, will reproduce a periodic extension of A such that

  Pr[EA] < 2LN e^(−c1 L/N) + LN e^(−c2 L),

with c1 := (1/8)(1 − η̃)² p² (1 − p)² and c2 := DKL((1 + η̃)p/2 ∥ p).

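The bound is easy to evaluate numerically. A sketch (we read DKL as the binary Kullback–Leibler divergence in nats, between the Bernoulli parameters (1 + η̃)p/2 and p, since it sits in an exponent alongside c1):

```python
import numpy as np

def error_bound(L, N, p=0.5, eta_tilde=0.125):
    """Theorem's upper bound 2*L*N*exp(-c1*L/N) + L*N*exp(-c2*L)."""
    c1 = (1 - eta_tilde)**2 * p**2 * (1 - p)**2 / 8
    q = (1 + eta_tilde) / 2 * p                                   # shifted firing rate
    c2 = q * np.log(q / p) + (1 - q) * np.log((1 - q) / (1 - p))  # D_KL(q || p)
    return 2 * L * N * np.exp(-c1 * L / N) + L * N * np.exp(-c2 * L)

print(error_bound(L=5.0e6, N=1000))   # approx. 1e-3, consistent with Slide 14
```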

SLIDE 13

Main Result – Dependence of N on L

  • A sufficient condition for the upper bound on Pr[EA] to vanish as L → ∞ is

      N ≤ N*(L) := c1 L / ln(L²).

    In contrast, the upper bound on Pr[EA] diverges to +∞ as L → ∞ for

      N1(L) := c1 L / (ln(L²))^r, 0 < r < 1, and
      N2(L) := γ N*(L), γ > 1.

  • N*(·) grows faster than L ↦ γ L^q for 0 < q < 1, γ > 0, i.e.,

      lim_{L→∞} γ L^q / N*(L) = 0.

  • Asymptotically, almost-square matrices are memorizable:

      ∀ε > 0 ∃Lε ∈ ℕ : L · N*(L) ≥ L^(2−ε) for all L ≥ Lε.


SLIDE 14

Main Result – Dependence of L on N

[Plot, N (horizontal, 10^1 to 10^4) versus L (vertical, 10^4 to 10^8), both on logarithmic scales: the value of L required for the upper bound on Pr[EA] to equal 10^−3, 10^−6, 10^−9, 10^−12 (from bottom to top), for p = 1/2 and η̃ = 1/8.]


SLIDE 15

Multi-pass Memorization

Recall the multi-pass rule (Slide 9): multiple passes through the data, i.e., K ≫ N, with

  ∆wℓ,n := β(n) (aℓ,n − ⟨an−1, wℓ^(n−1)⟩) an−1,

for some step size β(n) > 0.


SLIDE 16

An Elementary Analysis using Least-squares

For fixed ℓ ∈ {1, . . . , L}, consider the least-squares problem

  min_{wℓ ∈ ℝ^L} Σ_{n=1}^{N} (⟨an−1, wℓ⟩ − aℓ,n)² = min_{wℓ ∈ ℝ^L} ‖Ã wℓ − ãℓ‖²,

where Ã ∈ ℝ^{N×L} has rows aN^T, a1^T, . . . , aN−1^T, and ãℓ := (aℓ,1, . . . , aℓ,N)^T ∈ ℝ^N.

Note that Ã is the transpose of (aN, a1, . . . , aN−1) ∈ ℝ^{L×N}, i.e., of A cyclically shifted by one time step, and ãℓ is the ℓ-th row of A turned into a column vector.

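A sketch of this construction, solving for all ℓ jointly (the helper name is ours):

```python
import numpy as np

def least_squares_weights(A):
    """Minimize ||A_tilde @ w_ell - a_tilde_ell||^2 for every ell at once.

    A_tilde is A cyclically shifted by one time step, transposed
    (rows a_N^T, a_1^T, ..., a_{N-1}^T); a_tilde_ell is row ell of A.
    """
    A_tilde = np.roll(A, 1, axis=1).T.astype(float)   # N x L
    targets = A.T.astype(float)                       # column ell is a_tilde_ell
    W_cols, *_ = np.linalg.lstsq(A_tilde, targets, rcond=None)
    return W_cols.T                                   # row ell is w_ell
```

With these weights, ⟨an−1, wℓ⟩ approximates aℓ,n ∈ {0, 1}, so a natural threshold (our assumption; the slide does not specify one here) is θℓ = 1/2.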

SLIDE 17

Learning the Weights by Gradient Descent Methods

The least-squares problem can be solved by

  • gradient descent:

      wℓ^(n) = wℓ^(n−1) + β(n) Ã^T (ãℓ − Ã wℓ^(n−1)),

    which converges to a minimizer if β(n) = β and 0 < β < 2 / λmax(Ã^T Ã);

  • stochastic gradient descent:

      wℓ^(n) = wℓ^(n−1) + β(n) (aℓ,n − ⟨an−1, wℓ^(n−1)⟩) an−1,

    which fulfills the conditions of quasi-Hebbian learning.

Clearly, the performance of both methods depends highly on the initial guess wℓ^(0) ∈ ℝ^L and the step size β(n) > 0.

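A sketch of the stochastic-gradient variant with a constant step size, updating all L neurons in parallel (the default step size and pass count are illustrative choices of ours):

```python
import numpy as np

def multi_pass_sgd(A, beta=None, num_passes=500):
    """Quasi-Hebbian SGD: w_ell += beta * (a_{ell,n} - <a_{n-1}, w_ell>) * a_{n-1},
    cycling through n = 1, ..., N, so that K = num_passes * N."""
    L, N = A.shape
    beta = beta if beta is not None else 1.0 / L   # keeps beta * ||a_{n-1}||^2 < 2 (LMS-style stability)
    A_prev = np.roll(A, 1, axis=1).astype(float)   # a_{n-1}, with a_0 := a_N
    W = np.zeros((L, L))                           # row ell is w_ell, w_ell^(0) = 0
    for _ in range(num_passes):
        for n in range(N):
            x = A_prev[:, n]
            err = A[:, n] - W @ x                  # one error per neuron
            W += beta * np.outer(err, x)
    return W
```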

SLIDE 18

Sufficient Condition for Memorizable Matrices

  • If rank(Ã) = N, then

      min_{wℓ ∈ ℝ^L} ‖Ã wℓ − ãℓ‖² = 0,

    which implies that A is (perfectly) memorizable, i.e., Pr[EA] = 0.

  • For L ≥ N, 0 < p ≤ 1/2, and A i.i.d. ∼ Ber(p)^{L×N}, it follows from [1] that

      Pr[rank(Ã) = N] ≥ 1 − (1 − p + oN(1))^N → 1 as N → ∞,

    where lim_{N→∞} oN(1) = 0.

Theorem

Any A i.i.d. ∼ Ber(p)^{L×N} with N = L is memorizable with probability tending to one as N → ∞.

[1] K. Tikhomirov, “Singularity of random Bernoulli matrices,” Annals of Mathematics, vol. 191, no. 2, pp. 593–634, March 2020.

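A quick empirical check of the rank condition at small scale (parameters illustrative):

```python
import numpy as np

# Does the shifted matrix A_tilde have full column rank N for a square A?
rng = np.random.default_rng(1)
L = N = 200
p = 0.5
A = (rng.random((L, N)) < p).astype(float)
A_tilde = np.roll(A, 1, axis=1).T              # N x L, here square
print(np.linalg.matrix_rank(A_tilde) == N)     # True with high probability
```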

SLIDE 19

Asymptotic Memorization Capacities

The memorization capacity is the total number of bits which a network is able to memorize; thus

  Cmem ≥ log2 |Atypical|  [bits],

where Atypical is a typical set of matrices for A i.i.d. ∼ Ber(p)^{L×N}, and

  lim_{L→∞} (1/L) log2 |Atypical| = Hb(p) N.

Asymptotic capacity:

                         Hopfield           Single-pass                  Multi-pass
  bits                   (1/2) L²/ln(L)     (1/2) c1 Hb(p) L²/ln(L)      Hb(p) L²
  bits per neuron        (1/2) L/ln(L)      (1/2) c1 Hb(p) L/ln(L)       Hb(p) L
  bits per connection    (1/2) / ln(L)      (1/2) c1 Hb(p) / ln(L)       Hb(p)

Recall that c1 := (1/8)(1 − η̃)² p² (1 − p)² and Hb(p) := −p log2(p) − (1 − p) log2(1 − p).

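For concreteness, the bits-per-connection row can be evaluated numerically (a sketch using the constants defined on the slides):

```python
import numpy as np

def bits_per_connection(L, p=0.5, eta_tilde=0.125):
    """Asymptotic bits per connection for the three schemes in the table."""
    Hb = -p * np.log2(p) - (1 - p) * np.log2(1 - p)   # binary entropy
    c1 = (1 - eta_tilde)**2 * p**2 * (1 - p)**2 / 8
    return {"Hopfield":    0.5 / np.log(L),
            "single-pass": 0.5 * c1 * Hb / np.log(L),
            "multi-pass":  Hb}

print(bits_per_connection(L=10**6))
# -> roughly {'Hopfield': 0.036, 'single-pass': 0.00022, 'multi-pass': 1.0}
```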

SLIDE 20

Conclusion

  • Single-pass quasi-Hebbian learning is possible; its capacity scales like that of a Hopfield network...

  • ...but it requires more resources (neurons, synapses) than multi-pass memorization.

  • Multi-pass memorization achieves O(1) bits per synapse, which beats the Hopfield network.

  • Perhaps useful for understanding short-term memory vs. long-term memory in neuroscience.
