Neural Turing Machines Can neural nets learn programs? - PowerPoint PPT Presentation

Neural Turing Machines
Can neural nets learn programs?
Alex Graves, Greg Wayne, Ivo Danihelka

Contents
1. Introduction
2. Foundational Research
3. Neural Turing Machines
4. Experiments
5. Conclusions

Introduction
• First application of machine learning to logical flow and external memory
• Extends the capabilities of neural networks by coupling them to external memory
• Analogous to a Turing machine (TM), which couples a finite state machine to an infinite tape
• RNNs have been shown to be Turing-complete (Siegelmann et al. ’95)
• Unlike a TM, an NTM is completely differentiable, so it can be trained end to end with gradient descent

Foundational Research
• Neuroscience and Psychology
  – Concept of “working memory”: short-term memory storage and rule-based manipulation
  – Also known as “rapidly created variables”
  – Observational neuroscience results in the prefrontal cortex and basal ganglia of monkeys

Foundational Research
• Neuroscience and Psychology
• Cognitive Science and Linguistics
  – AI and cognitive science were contemporaneous from the 1950s to the 1970s
  – The two fields parted ways when neural nets were criticized as (Fodor et al. ’88):
    • Incapable of “variable binding”, e.g. in “Mary spoke to John”, binding “Mary” to the subject role and “John” to the object role
    • Incapable of handling variable-sized input
  – This criticism motivated recurrent-network research to handle variable binding and variable-length input
  – The role of recursive processing in human evolution was a hotly debated topic (Pinker vs. Chomsky)

Foundational Research
• Neuroscience and Psychology
• Cognitive Science and Linguistics
• Recurrent Neural Networks
  – Broad class of machines with distributed and dynamic state
  – Long Short-Term Memory (LSTM) RNNs were designed to address vanishing and exploding gradients
  – Natively handle variable-length structures

Neural Turing Machines

Neural Turing Machines
1. Reading
  – M_t is the N × M memory matrix at time t (N locations, each holding a vector of size M)
  – w_t
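
As a minimal sketch of a read, assuming the formulation in the NTM paper (Graves, Wayne and Danihelka, 2014): w_t is a weighting over the N memory locations whose entries are non-negative and sum to 1, and the read vector is the weighted sum of memory rows, r_t = Σ_i w_t(i) M_t(i). The function name `ntm_read` and the list-of-lists memory layout are illustrative choices, not part of the original slides.

```python
from typing import List

Matrix = List[List[float]]  # N rows (memory locations), each a vector of length M


def ntm_read(memory: Matrix, weights: List[float]) -> List[float]:
    """Blurry read r_t = sum_i w_t(i) * M_t(i).

    Assumes `weights` has one non-negative entry per location and sums to 1.
    """
    if len(weights) != len(memory):
        raise ValueError("expected one weight per memory location")
    width = len(memory[0])
    read = [0.0] * width
    for w_i, row in zip(weights, memory):
        for j in range(width):
            read[j] += w_i * row[j]
    return read


memory = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
print(ntm_read(memory, [0.5, 0.5, 0.0]))  # [0.5, 0.5]
```

Because the read is a weighted average rather than a hard lookup, it is differentiable with respect to both the weights and the memory contents.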

Neural Turing Machines
2. Writing involves both erasing and adding
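
The erase-then-add write can be sketched the same way. Following the NTM paper, each write head emits an erase vector e_t (entries in [0, 1]) and an add vector a_t, both of length M; each row is first scaled by 1 - w_t(i) e_t and then receives w_t(i) a_t. The function `ntm_write` is a hypothetical helper for illustration; it returns a new matrix instead of mutating memory in place.

```python
from typing import List

Matrix = List[List[float]]


def ntm_write(memory: Matrix, weights: List[float],
              erase: List[float], add: List[float]) -> Matrix:
    """Erase-then-add write; returns the updated memory as a new matrix."""
    updated: Matrix = []
    for w_i, row in zip(weights, memory):
        # Erase step: M~_t(i) = M_{t-1}(i) * (1 - w_t(i) * e_t), elementwise
        erased = [m * (1.0 - w_i * e) for m, e in zip(row, erase)]
        # Add step: M_t(i) = M~_t(i) + w_t(i) * a_t
        updated.append([m + w_i * a for m, a in zip(erased, add)])
    return updated


memory = [[1.0, 1.0], [1.0, 1.0]]
# Fully focus on location 0: wipe its first element and add 3 to its second.
print(ntm_write(memory, [1.0, 0.0], erase=[1.0, 0.0], add=[0.0, 3.0]))
# [[0.0, 4.0], [1.0, 1.0]]
```

A weight of 0 leaves a row untouched, while a weight of 1 applies the full erase and add; intermediate weights give a partial, still differentiable, update.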

Neural Turing Machines
• 3. Addressing
  – 1. Focusing by content
    • Each head produces a key vector k_t of length M
    • A content-based weighting w_t^c is generated from a similarity measure between k_t and each memory row, using a ‘key strength’ β_t
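
Content focusing can be sketched as a softmax over similarities. Following the NTM paper, we assume K is cosine similarity and w_t^c(i) = exp(β_t K[k_t, M_t(i)]) / Σ_j exp(β_t K[k_t, M_t(j)]). The small `eps` in the denominator (to avoid dividing by zero for an all-zero row) and the max-subtraction for numerical stability are our additions, not part of the slides.

```python
import math
from typing import List

Matrix = List[List[float]]


def cosine_similarity(u: List[float], v: List[float], eps: float = 1e-8) -> float:
    """K[u, v] = u.v / (|u| |v|); eps guards against zero-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norms + eps)


def content_weighting(memory: Matrix, key: List[float], beta: float) -> List[float]:
    """w^c_t(i) = softmax_i(beta_t * K[k_t, M_t(i)])."""
    scores = [beta * cosine_similarity(key, row) for row in memory]
    top = max(scores)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


memory = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(content_weighting(memory, key=[1.0, 0.0], beta=10.0))
```

A larger β_t concentrates the weighting on the best-matching row; β_t = 0 gives a uniform weighting over all locations.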

Neural Turing Machines
• 3. Addressing
  – 2. Interpolation
    • Each head emits a scalar interpolation gate g_t
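
Interpolation blends the new content weighting with the head's weighting from the previous step. Following the NTM paper, w_t^g = g_t w_t^c + (1 - g_t) w_{t-1}. In the paper g_t comes from a sigmoid and so lies strictly between 0 and 1; this sketch (with the illustrative name `interpolate`) also accepts the endpoints to show the two extreme cases.

```python
from typing import List


def interpolate(w_content: List[float], w_prev: List[float], g: float) -> List[float]:
    """w^g_t = g_t * w^c_t + (1 - g_t) * w_{t-1}."""
    if not 0.0 <= g <= 1.0:
        raise ValueError("interpolation gate g must lie in [0, 1]")
    return [g * c + (1.0 - g) * p for c, p in zip(w_content, w_prev)]


print(interpolate([1.0, 0.0], [0.0, 1.0], g=0.25))  # [0.25, 0.75]
```

With g_t = 1 the head uses the content weighting alone; with g_t = 0 it ignores content and keeps the previous weighting, which is what makes purely location-based iteration possible.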

Neural Turing Machines
• 3. Addressing
  – 3. Convolutional shift
    • Each head emits a distribution s_t over the allowable integer shifts
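
The shift step is a circular convolution of the gated weighting with the shift distribution, w̃_t(i) = Σ_j w_t^g(j) s_t(i - j), with indices taken modulo N as in the NTM paper. This sketch represents s_t as a dictionary from allowed integer shifts to probabilities; that representation and the name `circular_shift` are illustrative assumptions.

```python
from typing import Dict, List


def circular_shift(w: List[float], shift_dist: Dict[int, float]) -> List[float]:
    """w~_t(i) = sum_j w^g_t(j) * s_t((i - j) mod N), summed over the allowed shifts."""
    n = len(w)
    shifted = [0.0] * n
    for i in range(n):
        for shift, prob in shift_dist.items():
            # Location i receives mass from location i - shift (wrapping around).
            shifted[i] += prob * w[(i - shift) % n]
    return shifted


focus = [1.0, 0.0, 0.0, 0.0]
print(circular_shift(focus, {+1: 1.0}))  # [0.0, 1.0, 0.0, 0.0]
print(circular_shift(focus, {-1: 1.0}))  # [0.0, 0.0, 0.0, 1.0]
```

Spreading probability over several shifts, e.g. {-1: 0.1, 0: 0.8, +1: 0.1}, blurs the weighting across neighbouring locations, which is why a sharpening step follows.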

Neural Turing Machines
• 3. Addressing
  – 4. Sharpening
    • Each head emits a scalar sharpening parameter γ_t
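
Sharpening counteracts blurring by raising each weight to the power γ_t and renormalizing, w_t(i) = w̃_t(i)^γ_t / Σ_j w̃_t(j)^γ_t, with γ_t ≥ 1 in the NTM paper. A minimal sketch, with `sharpen` as an illustrative name:

```python
from typing import List


def sharpen(w: List[float], gamma: float) -> List[float]:
    """w_t(i) = w~_t(i)**gamma / sum_j w~_t(j)**gamma (gamma >= 1)."""
    if gamma < 1.0:
        raise ValueError("sharpening exponent gamma should be >= 1")
    powered = [x ** gamma for x in w]
    total = sum(powered)
    return [x / total for x in powered]


blurred = [0.1, 0.8, 0.1]
print(sharpen(blurred, gamma=1.0))
print(sharpen(blurred, gamma=3.0))
```

With γ_t = 1 the weighting is (up to rounding) unchanged; larger γ_t pushes the mass toward the largest entry, here location 1.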

Neural Turing Machines
• 3. Addressing (putting it all together)
  – This can operate in three complementary modes:
    • A weighting can be chosen by the content system without any modification by the location system
    • A weighting produced by the content addressing system can be chosen and then shifted
    • A weighting from the previous time step can be rotated without any input from the content-based addressing system
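
Putting the four steps together, one addressing pass can be sketched as below; the three modes correspond to particular settings of g_t and s_t. This is a minimal sketch assuming the equations of the NTM paper; the function `address`, its argument order, and the toy memory are illustrative, not taken from the slides.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]


def _cosine(u: List[float], v: List[float], eps: float = 1e-8) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norms + eps)


def address(memory: Matrix, w_prev: List[float], key: List[float], beta: float,
            g: float, shift_dist: Dict[int, float], gamma: float) -> List[float]:
    """One addressing pass: content -> interpolation -> shift -> sharpening."""
    # 1. Focusing by content: softmax of beta * cosine similarity.
    scores = [beta * _cosine(key, row) for row in memory]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    w_c = [e / z for e in exps]
    # 2. Interpolation with the previous weighting.
    w_g = [g * c + (1.0 - g) * p for c, p in zip(w_c, w_prev)]
    # 3. Circular convolution with the shift distribution.
    n = len(w_g)
    w_s = [sum(p * w_g[(i - s) % n] for s, p in shift_dist.items()) for i in range(n)]
    # 4. Sharpening.
    powered = [x ** gamma for x in w_s]
    z = sum(powered)
    return [x / z for x in powered]


def argmax(w: List[float]) -> int:
    return max(range(len(w)), key=lambda i: w[i])


memory = [[1.0, 0.0], [0.0, 1.0], [0.0, -1.0], [-1.0, 0.0]]
key, prev = [1.0, 0.0], [0.0, 0.0, 1.0, 0.0]

# Mode 1: content only (g = 1, no shift).
print(argmax(address(memory, prev, key, 50.0, 1.0, {0: 1.0}, 1.0)))  # 0
# Mode 2: content weighting, then shift by +1.
print(argmax(address(memory, prev, key, 50.0, 1.0, {1: 1.0}, 1.0)))  # 1
# Mode 3: ignore content (g = 0) and rotate the previous weighting by +1.
print(address(memory, prev, key, 50.0, 0.0, {1: 1.0}, 1.0))  # [0.0, 0.0, 0.0, 1.0]
```

In mode 3 the key and β_t have no effect, so the head steps to the next location regardless of what is stored there, much like a tape head moving one cell.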
