Associative memories
9/25/2014
Memorized associations are ubiquitous.
Stimulus → Response: e.g., a face → "Bill"
Key properties: noise tolerance (generalization); graceful saturation.
van Heerden, 1963; Willshaw, Longuet-Higgins, 1960s
Storage: φ_{r,s}(x) = (r ∗ s)(x) = ∫ r(ζ) s(x − ζ) dζ
Retrieval: ŝ(x) = ∫ r(τ) φ_{r,s}(τ + x) dτ
Mathematically, this is the convolution-correlation scheme from class 4.
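A minimal numerical sketch of this scheme (discrete and circular, with FFTs standing in for the optical convolution; the helper names cconv/ccorr and all parameters are my own): store each pair as the circular convolution of stimulus and response, superpose the traces, and retrieve by correlating a stimulus with the trace.

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 1024, 5                    # signal length, number of stored pairs

def cconv(a, b):
    # Circular convolution via FFT.
    return np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)).real

def ccorr(a, b):
    # Circular correlation via FFT (approximate inverse of cconv for noise-like a).
    return np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)).real

# Noise-like stimulus and response signals, variance 1/N so autocorrelations are ~delta.
stim = rng.standard_normal((P, N)) / np.sqrt(N)
resp = rng.standard_normal((P, N)) / np.sqrt(N)

# Storage: superpose the convolution traces of all pairs.
trace = sum(cconv(stim[i], resp[i]) for i in range(P))

# Retrieval: correlate the probe stimulus with the trace, then "clean up" by
# comparing against the known responses; the best match is the stored partner.
probe = ccorr(stim[0], trace)
print("best matching response:", int(np.argmax(resp @ probe)))   # -> 0
```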
Steinbuch, 1962; Willshaw et al., 1969
Before long, it was realized that better results could be obtained with a simpler, more neurally plausible framework. Let's explore a simple Hebbian scheme: we have input and output lines, and we strengthen a synapse when its input and output lines are active together.
Input / Stimulus Output / Response
What happens here depends on the specific choice of learning rule.
Additive Hebb rule:
M = Σ_{i=1}^{n} R_i S_i^T
(diagram: stimulus lines S1, S2, S3; response lines R1, R2, R3)
Retrieval: present a stimulus S_j on the input lines and read out R̂ = M S_j.
R̂_j = M S_j = Σ_{i=1}^{n} R_i S_i^T S_j
    = Σ_{i≠j} R_i (S_i^T S_j) + R_j ‖S_j‖²
The first term is crosstalk from the other stored pairs; the second is the desired response. If the S_k are orthonormal, the crosstalk vanishes and retrieval is exact.
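A small numpy sketch of the rule and its retrieval behavior (sizes and variable names are my own): with orthonormal stimulus columns the crosstalk term vanishes and retrieval is exact, while random non-orthogonal stimuli leave visible crosstalk.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n = 64, 10                                      # pattern size, number of stored pairs

S, _ = np.linalg.qr(rng.standard_normal((N, n)))   # orthonormal stimulus columns: S.T @ S = I
R = rng.standard_normal((N, n))                    # arbitrary response columns

M = R @ S.T                                        # additive Hebb rule: sum_i R_i S_i^T

# Retrieval R_hat_j = M S_j is exact here because S_i^T S_j = delta_ij.
print("orthonormal S, max error:", np.abs(M @ S - R).max())     # ~1e-15

# With merely random (non-orthogonal) stimuli, the crosstalk term remains.
S2 = rng.standard_normal((N, n)) / np.sqrt(N)
M2 = R @ S2.T
print("random S, max error:     ", np.abs(M2 @ S2 - R).max())   # order 1
```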
Recall that the optimal memory matrix is M* = R S†. If the columns of S are linearly independent, then S† = (S^T S)^{-1} S^T, giving M* = R (S^T S)^{-1} S^T. So if the columns of S (the S_i) are orthonormal, M* = R S^T, which is exactly what we got from the simple Hebb rule.
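A quick numerical check of this (again with toy sizes of my own choosing): for linearly independent but non-orthonormal stimuli, the pseudoinverse memory M* = R S† (np.linalg.pinv) retrieves exactly, while the plain Hebb matrix does not.

```python
import numpy as np

rng = np.random.default_rng(2)
N, n = 64, 10
S = rng.standard_normal((N, n))               # linearly independent, not orthonormal columns
R = rng.standard_normal((N, n))

M_hebb = R @ S.T                              # simple additive Hebb rule
M_star = R @ np.linalg.pinv(S)                # M* = R S^+ = R (S^T S)^{-1} S^T here

print("Hebb rule error:    ", np.abs(M_hebb @ S - R).max())   # large crosstalk
print("pseudoinverse error:", np.abs(M_star @ S - R).max())   # ~1e-13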
How much information can a matrix memory store?
Model:
P binary patterns of size N; each input pattern (column of S) has m_S nonzeros, and each output pattern (column of R) has m_R nonzeros.
Hebbian storage, with each entry of M clipped at one.
Retrieval: R̂_jk = 1 if [M S_j]_k ≥ τ, and 0 otherwise.
To choose the threshold τ, note that in the absence of noise, a stored stimulus S_j drives every output unit k with R_jk = 1 to exactly [M S_j]_k = m_S (each of the m_S active input lines contributes a clipped weight of one). So in order to recover all the ones in R, we had better set τ = m_S.
Sparsity parameters
The chance of a given weight in M remaining zero throughout the learning process is
(1 − m_S m_R / N²)^P ≈ e^{−P m_S m_R / N²} ≡ 1 − q.
The probability of a spurious one in R̂ is the probability of exceeding τ purely by chance; the expected number of spurious ones per retrieval is N q^{m_S}. So the largest m_S we can choose before making the first error satisfies
N q^{m_S} = 1 ⇒ m_S = −log N / log q.
Taking q = 1/2 (half the weights set), 1 − q = e^{−P m_S m_R / N²} = 1/2 gives
P m_S m_R = N² log 2.
This is quite good: P m_S m_R ≈ 0.69 N², on the order of the number of synapses.
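A rough simulation of the clipped binary memory just analyzed, with illustrative parameter choices of my own (P m_S m_R kept somewhat below N² log 2):

```python
import numpy as np

rng = np.random.default_rng(3)
N, mS, mR, P = 1000, 10, 10, 4000        # P*mS*mR/N^2 = 0.4 < log(2)

def sparse_pattern(m):
    # Binary vector of length N with exactly m ones.
    v = np.zeros(N)
    v[rng.choice(N, size=m, replace=False)] = 1.0
    return v

S = np.array([sparse_pattern(mS) for _ in range(P)])    # stimuli,   shape (P, N)
R = np.array([sparse_pattern(mR) for _ in range(P)])    # responses, shape (P, N)

# Hebbian storage with clipping: a weight is 1 if any stored pair ever co-activated it.
M = (R.T @ S > 0).astype(float)
q = M.mean()
print("fraction of weights set q =", round(q, 3),
      " predicted:", round(1 - np.exp(-P * mS * mR / N**2), 3))

# Retrieval of pattern j with threshold tau = mS.
j = 0
R_hat = (M @ S[j] >= mS).astype(int)
misses   = int(((R[j] == 1) & (R_hat == 0)).sum())   # always 0: stored ones reach exactly mS
spurious = int(((R[j] == 0) & (R_hat == 1)).sum())   # expected ~ N * q**mS, tiny here
print("missed ones:", misses, " spurious ones:", spurious)
```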
The decision to represent memory items as sparse, high-dimensional vectors has some interesting consequences. High-dimensional spaces are counterintuitive.
In 1000 dimensions, 0.001 of all patterns are within 451 bits of a given point, and all but 0.001 are within 549 bits.
So almost all point-pairs are "noise-like": the distance between two randomly chosen points is nearly always close to the chance level of 500 bits.
Kanerva, 1988
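These numbers are easy to check empirically (a sampling sketch; the exact values come from the Binomial(1000, 1/2) tail):

```python
import numpy as np

rng = np.random.default_rng(4)
D, n = 1000, 20000

x = rng.integers(0, 2, D)                    # one reference point in {0,1}^1000
Y = rng.integers(0, 2, (n, D))               # many random points
dists = (Y != x).sum(axis=1)                 # Hamming distances

print("mean distance:           ", dists.mean())            # ~500
print("fraction within 451 bits:", (dists <= 451).mean())   # ~0.001
print("fraction within 549 bits:", (dists <= 549).mean())   # ~0.999
```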
Linking concepts: almost all pairs of points are far apart, but there are multiple "linking" points that are close to both.
Short, live experiment.
Marr, 1969; Albus, 1971
The cerebellum produces smooth, coordinated motor movements (and may be involved in cognition as well). Striking case: a 24-year-old Chinese woman found to be living without a cerebellum.
Mossy fibers: contextual input. Purkinje axons: motor output.
Learn associations straight from context to actions, so you don’t have to “think” before doing.
But how does training work? How do the right patterns “appear” on the output (Purkinje) lines?
Climbing fibers: motor teaching input, from the rest of the brain.
There's a remarkable 1-to-1 correspondence between climbing fibers and Purkinje cells: each climbing fiber wraps around and around its Purkinje cell, making hundreds of synapses, and a single climbing-fiber action potential can make the Purkinje cell spike.
We said that sparsity was a key property. How is that manifested here?
Granule cells: sparsification.
There are 50 billion granule cells, about 3/4 of the brain's neurons. They're tiny.
They re-code the mossy fiber input into a larger space in which the signal can be sparser. Each granule cell responds to a small combination of mossy fiber inputs (codons), hypothesized to be primitive input features.
We’ve discussed how to store S-R pairs, but human cognition goes way beyond this.
Relations
As before, "concepts" are activation vectors: Kettle, Jar, Green, Gray.
How do we represent Green(Jar) & Gray(Kettle)?
Maybe we should just have all of these patterns fire at once? But then how do we know we don't have Gray(Jar) & Green(Kettle)? Or, worse, Jar(Kettle) & Green & Gray?
We need a way to bind predicates to arguments.
Binding operator ⊗ and conjunction operator ⊕: represent the scene as (Green ⊗ Jar) ⊕ (Gray ⊗ Kettle).
How should we choose ⊗ and ⊕?
Smolensky's answer: take ⊗ to be the tensor (outer) product and ⊕ to be addition, so the scene becomes Green Jar^T + Gray Kettle^T.
Paul Smolensky, "Tensor Product Variable Binding", 1990
This is just a matrix memory: the sum of outer products Green Jar^T + Gray Kettle^T has the same form as M = Σ_i R_i S_i^T, and probing with a role vector retrieves (approximately) its bound filler.
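A small sketch of tensor-product binding with random vectors (dimensions and names are illustrative): bind a filler to a role with an outer product, superpose with +, and unbind a role by matrix-vector multiplication, exactly as in a matrix memory.

```python
import numpy as np

rng = np.random.default_rng(5)
d = 200

def unit(d):
    v = rng.standard_normal(d)
    return v / np.linalg.norm(v)

green, gray = unit(d), unit(d)       # fillers (colors)
jar, kettle = unit(d), unit(d)       # roles (objects)

# Binding = outer product, conjunction = addition: Green(Jar) & Gray(Kettle).
T = np.outer(green, jar) + np.outer(gray, kettle)

# Unbinding: probe with the role vector -- this is matrix-memory retrieval.
color_of_jar = T @ jar               # = green*(jar.jar) + gray*(kettle.jar) ~ green
print("similarity to green:", round(float(green @ color_of_jar), 3))   # ~1
print("similarity to gray: ", round(float(gray  @ color_of_jar), 3))   # ~0
```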
We can store some interesting "data structures," like a stack:
S = Σ_i f_i ⊗ r_i
with linearly independent "indexing roles" r_i and fillers f_i drawn from some alphabet.

Or a tree with leaves A, B, C:
T = A ⊗ r0 + [B ⊗ r0 + C ⊗ r1] ⊗ r1
  = A ⊗ r0 + B ⊗ r0 ⊗ r1 + C ⊗ r1 ⊗ r1
  = A ⊗ r0 + B ⊗ r10 + C ⊗ r11
(Note the dimensions with respect to these sums.)

Even better, we can do symbolic operations with matrices:
Extraction (car, cdr): A = W_ex0 T, B = W_ex0 W_ex1 T = W_ex01 T (one matrix).
Construction (cons): T = [A, T'] = W_cons0 A + W_cons1 T'.
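A sketch of the "one matrix per symbolic operation" idea (my own construction via Kronecker products; I use a balanced tree [[A, B], [C, D]] so the dimensions line up, sidestepping the dimension bookkeeping noted above): with orthonormal roles, cons is a fixed matrix that embeds a child into a slot, car/cdr are fixed matrices that extract it, and composing two extractions is still one matrix.

```python
import numpy as np

rng = np.random.default_rng(6)
d = 8                                                  # filler dimension (illustrative)
r0, r1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])    # orthonormal role vectors
A, B, C, D = (rng.standard_normal(d) for _ in range(4))

def W_cons(r, k):
    # Embed a k-dimensional child into the slot named by role r: shape (2k, k).
    return np.kron(r[:, None], np.eye(k))

def W_ex(r, k):
    # Extract the k-dimensional child bound to role r: shape (k, 2k).
    return np.kron(r[None, :], np.eye(k))

# Construction (cons): depth-1 pairs, then the depth-2 tree [[A, B], [C, D]].
AB = W_cons(r0, d) @ A + W_cons(r1, d) @ B              # lives in R^{2d}
CD = W_cons(r0, d) @ C + W_cons(r1, d) @ D
T  = W_cons(r0, 2 * d) @ AB + W_cons(r1, 2 * d) @ CD    # lives in R^{4d}

# Extraction: car(cdr(T)) = C, and the two steps compose into ONE matrix.
W_ex10 = W_ex(r0, d) @ W_ex(r1, 2 * d)                  # shape (d, 4d)
print("recovered C exactly:", np.allclose(W_ex10 @ T, C))   # True
```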
One more example: go from passive-voice sentences to a predicate representation.
Input parse (roles P, Aux, V, by, A):
[Few movies] are admired by [skeptical critics]
F(T) = W T, where
W = W_cons0 [W_ex1 W_ex0 W_ex1] + W_cons1 [W_cons0 (W_ex1 W_ex1 W_ex1) + W_cons1 W_ex0]
This is just one matrix multiplication.
Output: Admire([Few movies], [skeptical critics])
Grammaticality can be represented as constraints (weights) on which role/filler pairs can co-occur. (Note the departure from matrix memories, in which the entries in the matrix were synapses.) These weights give us an energy function and evolution dynamics, and the network settles into low-energy (= high-grammaticality) states.
Other frameworks: vector-symbolic architectures. Idea: represent role/filler bindings as vectors rather than tensors, so everything stays the same size.
Circular convolution (Plate)
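A minimal sketch of Plate's circular-convolution binding (holographic reduced representations); the parameters here are illustrative. Binding keeps everything the same size, and unbinding by circular correlation is approximate, so in practice a cleanup step compares the result against known fillers.

```python
import numpy as np

rng = np.random.default_rng(7)
d = 512

def hrr():
    # HRR-style random vector: components ~ N(0, 1/d).
    return rng.standard_normal(d) / np.sqrt(d)

def bind(a, b):      # circular convolution
    return np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)).real

def unbind(a, b):    # circular correlation, approximate inverse of bind
    return np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)).real

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

green, gray, jar, kettle = hrr(), hrr(), hrr(), hrr()

# Green(Jar) & Gray(Kettle): the bindings and their sum all stay in R^d.
T = bind(jar, green) + bind(kettle, gray)

probe = unbind(jar, T)                                   # noisy copy of green
print("cosine to green:", round(cos(probe, green), 2))   # clearly positive
print("cosine to gray: ", round(cos(probe, gray), 2))    # near 0
```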
Other frameworks: Recursive Auto-Associative Memory (RAAM). Learn the compression function: a tree with leaves A, B, C, D is encoded as W[W[W[B, C], D], A]. Just like in an ordinary autoencoder, we learn the same weights for encoding (W) and decoding (W^T). Pollack (1990)
Modern applications: Socher et al., 2013. Similar to RAAMs in that the model learns a parametrized compression function; the details are different.
(Baselines for comparison: bag of words, bigrams.)