Catastrophic Interference and Focused Online Learning: A Minimalist - PowerPoint PPT Presentation

Catastrophic Interference and Focused Online Learning: A Minimalist Example Learning: A More Interesting Example Approximate the running average of: x 1 , x 2 , …., x n Δμ = ε ( x i – μ ) Example: 2, 4, 6 ε = 0.1 60 epochs Catastrophic Interference I: Catastrophic Interference and Focused Focused Learning Learning: A More Interesting Example We presented 60 times {2, 4, 6} (interleaved learning). What would happen if instead we presented: 60 times {2}, then 60 times {4}, then 60 times {6} (focused learning)? Catastrophic interference: Learning about new stuff in a focused manner makes you forget the old stuff.

Rumelhart and Todd (1993) The Benefits of Interleaved Learning Rumelhart and Todd (1993) Catastrophic Interference II: Role of Learning Rate Back to our toy example: Δμ = ε ( x i – μ ) Assuming interleaved learning, should ε be large or small? Answer: Smaller ε will lead to more accurate (but slower) learning.

Incompatible Goals Distributed Versus Sparse Representations Quickly learn specific information Learn general information Distributed representations: Each pattern is represented across (e.g., what dogs are) (e.g., your neighbor’s dog bit you) most units Requires Slow learning (generalize over many Fast learning (often one example) examples) Sparse representations: Each pattern is represented in a small percentage of units Interleaved learning (don’t want to forget Focused learning all about cats when you learn about dogs) Distributed representations (allow Sparse representations (avoid interference) generalization) Are we stuck? Catastrophic Interference III: Complementary Learning Systems Distributed Versus Sparse Representations Sparse representations : Learning about one pattern doesn’t If we can’t have it all in the same system, let’s have two affect what you know about other patterns systems! System 1 : Remember specifics, i.e., learn quickly about new information and minimize interference High learning rate, sparse representations This is good: It prevents interference. This is also bad: Doesn’t allow generalization. System 2 : Learn slowly the overall structure of the environment Distributed representations : Learning about one pattern Small learning rate, distributed representations affects what you know about other patterns Requires interleaved learning System 1 trains System 2 with many interleaved presentations of the individual examples that System 1 learns quickly! This is good: It allows generalization. This is also bad: Interference.

Complementary Learning Systems Long-Term Potentiation in the in the Brain Hippocampus It seems that the brain has two such systems! Figure from http://www.arts.uwaterloo.ca/~bfleming/psych261/lec15no7.htm Fast learning of specific Slow learning of general information: information: High-frequency stimulation of the Schaffer collaterals leads to higher EPSPs Hippocampal system Neocortex in CA1 cells The Hippocampus as an Autoassociator An Autoassociative Network in Region CA3 of the Hippocampus? Features of autoassociative networks Pattern completion : Give the network a bit of a stored pattern (i.e., a cue) and it can reproduce (i.e., recall) the rest Learning is fast if ε is high (approx. 1) Interference is minimized if patterns An autoassociative network don’t overlap Some people have suggested that the extensive recurrent collaterals in CA3 form an autoassociative memory (e.g., Rolls, 1998) Hebbian learning: ∆ w ij = ε a i a j The other areas may be involved in compressing/decompressing information

Complementary Learning Systems Anterograde and Retrograde Amnesia: in the Brain Animals Hippocampal system : Remember specifics, i.e., learn new After lesions to the hippocampus, animals show information quickly and minimize interference anterograde and temporally graded retrograde amnesia for memories that involve cue configurations High learning rate, sparse representations Examples: Neocortex : Learn slowly the overall structure of the environment  Negative patterning Small learning rate, distributed representations tone  reward; light  reward; tone & light  no reward Requires interleaved learning  Spatial learning Hippocampus trains neocortex over time with many interleaved presentations of the specific memories the hippocampus has stored Place Cells in the Rat Hippocampus Anterograde and Retrograde Amnesia: Humans After lesions to the hippocampal system, humans: – Cannot form new explicit memories (e.g., episodic memories): anterograde amnesia – Lose most such memories that were laid out (relatively) shortly before the lesion , but can remember older ones: temporally graded retrograde amnesia

Fast learning: Long-term potentiation (LTP) Role of the Hippocampus Overall, the hippocampus seems to be important for fast learning of arbitrary conjunctive representations – Episodic memories – Spatial learning – Contextual fear conditioning Properties of the Hippocampus Preserved Learning and Memory After For fast learning of arbitrary conjunctive representations, one needs: Hippocampal Lesions Widespread connectivity Sparse representations to many brain areas (to minimize interference) Skills (e.g., tracing a figure viewed in a mirror, reading mirror-reversed print, learning how to ride a bicycle) Suggests a spared system for slow, gradual learning: Neocortex Repetition priming (e.g., reading aloud a pronounceable non-word) Some forms of classical conditioning , such as simple associations between CS and US (e.g., tone → shock) Likely dependent on other systems (e.g. amygdala): Multiple memory systems (e.g., Rolls, 2000)

Where Are We? Training the Neocortex Hippocampus seems good for fast learning of arbitrary Lots of evidence that during ‘off-line’ periods (e.g., slow-wave associations sleep or even quiet wakefulness) there is reactivation of memories in the hippocampus and neocortex Neocortex seems good to slowly learn the general structure of Examples: a domain (e.g., skill learning) • Neurons that are active during a task tend to be more active in subsequent sleep • Neurons whose activity is correlated during a task tend to have The only thing missing for our story is to show that the more correlated activity in subsequent sleep hippocampus trains the neocortex with interleaved presentations of the memories it stores These could be signs of the hippocampus training the cortex Consolidation A Specific Consolidation Experiment Memories seem to be Monkeys were trained on a set of 100 binary discriminations stored first in the (one object has food, the other one doesn’t) hippocampal system and are then consolidated : slowly transferred to the neocortex That’s why older memories are not lost with hippocampal lesions Why is this useful? Zola-Morgan and Squire (1990) To allow slow, interleaved learning in the neocortex!

A More Abstract Model Model of the Zola-Morgan and Squire (1990) Experiment Balance between hippocampal decay, probability of reinstatement Target = 1 if first object has reward Training the network: 1 day = 1 epoch of memories, and neocortical Target = 0 otherwise On every epoch: learning rate is important • Background trials • Learning trials from the experiment • Reinstated trials from hippocampus For example, if hippocampal memories decay too fast given ‘Model’ of hippocampus: the neocortical learning rate, Each experience forms a hippocampal there won’t be enough time for memory trace with strength 1. consolidation Every day, that strength decays by D. The probability of reinstatement is Object 1 Object 2 There is a neat mathematical proportional to that strength (multiplied by a ‘reinstatement’ parameter). Neocortex (hippocampus not formulation of all this (see paper) modeled explicitly) Consolidation Simulation Results Memories seem to be stored first in the hippocampal system and are then consolidated : slowly transferred to the neocortex That’s why older memories are not lost with hippocampal lesions Why is this useful? To allow slow, interleaved learning in the neocortex!

Summary Conclusions Hippocampus: Quickly learn Neocortex: Learn general specific information (e.g., your information We started with computational considerations: neighbor’s dog bit you) (e.g., what dogs are) – Fast learning of specific information cannot be done in the same Uses system as gradual learning of general information (catastrophic interference) Slow learning (generalize over many Fast learning (often one example) examples) – It seems that one needs two systems Interleaved learning (don’t want to We found that the brain probably has two such systems (!) forget all about cats when you learn about dogs) Several people had proposed that there might be consolidation of Sparse representations (avoid Distributed representations (allow memory from the hippocampus to the neocortex. But interference) generalization) McClelland, McNaughton, and O’Reilly explained why the brain is organized this way: it uses complementary systems that work together to solve incompatible computational goals Hippocampus trains neocortex to guarantee interleaved learning

Catastrophic Interference and Focused Online Learning: A Minimalist - PowerPoint PPT Presentation

Catastrophic Interference and Focused Online Learning: A Minimalist Example Learning: A More Interesting Example Approximate the running average of: x 1 , x 2 , ., x n = ( x i ) Example: 2, 4, 6 = 0.1 60 epochs

RCPGP Overview Catastrophic Plan Review San Francisco Bay Area Regional Catastrophic

Northern California Catastrophic Flood Response Plan (NCCFRP) Overview Why a Catastrophic Flood

Minimum energy catastrophic disruptions D.J. Scheeres The University of Michigan Catastrophic

The wave model of light explains diffraction and interference. 31 Diffraction and Interference

A Broadcast Approach Maha Zohdy, Ali Tajer, Shlomo Shamai RPI RPI Technion ISIT'20 1

Online Learning Lorenzo Rosasco MIT, 9.520 L. Rosasco Online Learning About this class Goal

Orbital Slots and Spectrum Use in an Era of Interference Interference and Telecommunication

Overcoming Catastrophic Forgetting with Unlabeled Data in the Wild Presenters: Nikhil Kannan, Ying

LAURA GRAEN Tobacco Industry Interference: The Facts and More Tobacco Industry Interference

How I learned to stop worrying and love interference Pierre de Vries ISART 2010 ISART 2010 1

Technician License Course Chapter 8 Lesson Module 17 Interference, Remote & Automatic

Treating Interference as Noise is Optimal for Covert Communication over Interference Channels

Renan Kazazoglu, Alper Ucar, Ediz Cetin, Izzet Kale Outline Introduction Multiple Access

Passive Intermodulation (PIM), an interference challenge for the radio Passive Intermodulation

Centennial 1917 - 2017 1 Electromagnetic Interference Interference to garage door openers

Interference Alignment via Message-Passing Message-Passing M. Guillaud Motivation Maxime

EA Anamnesis: Towards an approach for Enterprise Architecture rationalization Georgios

Neuroanatomy primer Take home goals If you want to know this properly, go to:

Arlo Clark-Foos, Ph.D. Medial Temporal Lobes Henry Molaison (HM) (1926-2008) Consequences of

A High-Level Intro to CUDA CS5220 Fall 2015 What is CUDA? C ompute U nified D evice A

Lecturer: Dr. Benjamin Amponsah, Dept. of Psychology, UG, Legon Contact Information:

The Business Data Scientists Their View of Our World The Big Data Junk Yard Their View on our

Curing Regular Expressions Matching Algorithm s from I nsom nia, Am nesia, and Acalculia

1 Panelists Lynn Lobash Associate Director of Readers Services New York Public Library Renee

Catastrophic Interference and Focused Online Learning: A Minimalist - PowerPoint PPT Presentation

Catastrophic Interference and Focused Online Learning: A Minimalist Example Learning: A More Interesting Example Approximate the running average of: x 1 , x 2 , ., x n = ( x i ) Example: 2, 4, 6 = 0.1 60 epochs

RCPGP Overview Catastrophic Plan Review San Francisco Bay Area Regional Catastrophic

Northern California Catastrophic Flood Response Plan (NCCFRP) Overview Why a Catastrophic Flood

Minimum energy catastrophic disruptions D.J. Scheeres The University of Michigan Catastrophic

The wave model of light explains diffraction and interference. 31 Diffraction and Interference

A Broadcast Approach Maha Zohdy, Ali Tajer, Shlomo Shamai RPI RPI Technion ISIT'20 1

Online Learning Lorenzo Rosasco MIT, 9.520 L. Rosasco Online Learning About this class Goal

Orbital Slots and Spectrum Use in an Era of Interference Interference and Telecommunication

Overcoming Catastrophic Forgetting with Unlabeled Data in the Wild Presenters: Nikhil Kannan, Ying

LAURA GRAEN Tobacco Industry Interference: The Facts and More Tobacco Industry Interference

How I learned to stop worrying and love interference Pierre de Vries ISART 2010 ISART 2010 1

Technician License Course Chapter 8 Lesson Module 17 Interference, Remote &amp; Automatic

Treating Interference as Noise is Optimal for Covert Communication over Interference Channels

Renan Kazazoglu, Alper Ucar, Ediz Cetin, Izzet Kale Outline Introduction Multiple Access

Passive Intermodulation (PIM), an interference challenge for the radio Passive Intermodulation

Centennial 1917 - 2017 1 Electromagnetic Interference Interference to garage door openers

Interference Alignment via Message-Passing Message-Passing M. Guillaud Motivation Maxime

EA Anamnesis: Towards an approach for Enterprise Architecture rationalization Georgios

Neuroanatomy primer Take home goals If you want to know this properly, go to:

Arlo Clark-Foos, Ph.D. Medial Temporal Lobes Henry Molaison (HM) (1926-2008) Consequences of

A High-Level Intro to CUDA CS5220 Fall 2015 What is CUDA? C ompute U nified D evice A

Lecturer: Dr. Benjamin Amponsah, Dept. of Psychology, UG, Legon Contact Information:

The Business Data Scientists Their View of Our World The Big Data Junk Yard Their View on our

Curing Regular Expressions Matching Algorithm s from I nsom nia, Am nesia, and Acalculia

1 Panelists Lynn Lobash Associate Director of Readers Services New York Public Library Renee

Technician License Course Chapter 8 Lesson Module 17 Interference, Remote & Automatic