SLIDE 1

EXTENDING SEMANTIC AND EPISODIC MEMORY

TO SUPPORT ROBUST DECISION MAKING

(FA2386-10-1-4127) PI: John E. Laird (University of Michigan)

Graduate Students: Nate Derbinsky, Mitchell Bloch, Mazin Assanie

AFOSR Program Review:

Mathematical and Computational Cognition Program
Computational and Machine Intelligence Program
Robust Decision Making in Human-System Interface Program
(Jan 28 – Feb 1, 2013, Washington, DC)

SLIDE 2

EXTENDING SEMANTIC AND EPISODIC MEMORY (JOHN LAIRD)

Objective: Develop algorithms that support effective, general, and scalable long-term memory:

1. Effective: retrieves useful knowledge
2. General: effective across a variety of tasks
3. Scalable: supports large amounts of knowledge and long agent lifetimes, with manageable growth in memory and computational requirements

Technical Approach:

1. Analyze multiple tasks and domains to determine exploitable regularities.
2. Develop algorithms that exploit those regularities.
3. Embed within a general cognitive architecture.
4. Perform formal analyses and empirical evaluations across multiple domains.

DoD Benefit: Develop science and technology to support:

  • Intelligent, knowledge-rich autonomous systems that have long-term existence, such as autonomous vehicles (ONR, DARPA: ACTUV).
  • Large-scale, long-term cognitive models (AFRL)

Budget (Actual / Planned, $K): FY11: $99 / $158; FY12: $195 / $165; FY13: $205 / $176
Annual Progress Report Submitted? FY11: Y; FY12: Y; FY13: N
Project End Date: June 29, 2013

SLIDE 3

LIST OF PROJECT GOALS

1. Episodic memory (experiential & contextualized)

– Expand functionality
– Improve efficiency of storage (memory) and retrieval (time)

2. Semantic memory (context independent)

– Enhance retrieval
– Automatic generalization

3. Cognitive capabilities that leverage episodic and semantic memory functionality

– Reusing prior experience, noticing familiar situations, …

4. Evaluate on real-world domains

5. Extended Goal

– Competence-preserving selective retention across multiple memories

SLIDE 4

PROGRESS TOWARDS GOALS

1. Episodic memory

– Expand functionality (recognition)

  • [AAAI 2012b]

– Improve efficiency of storage (memory) and retrieval (time)

  • Exploits temporal contiguity, structural regularity, high cue structural selectivity, high temporal selectivity, and low cue-feature co-occurrence
  • For many different cues and many different tasks, no significant slowdown with experience: runs for days of real time (tens of millions of episodes), faster than real time.

  • [ICCBR 2009; BRIMS 2011; AAMAS 2012]

2. Semantic memory

– Enhance retrieval

  • Evaluated multiple bias functions: conclude base-level (exponential) activation works best (sketched at the end of this slide)
  • Developed an efficient approximate algorithm that maintains high (>90%) validity

– 30-100x faster than prior retrieval algorithms (non-base-level activation), on a 3x larger data set
– Sub-linear slowdown as memory size increases

  • Exploits small node outdegree and high selectivity, but not low co-occurrence of cue features.
  • [ICCM 2010; AISB 2011; AAAI 2011]
  • Current research: how to use context – collaboration with Braden Phillips (University of Adelaide) on special-purpose hardware to support spreading activation in semantic memory

– Automatic generalization

  • Current research: Leverage data maintained for episodic memory
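
For reference, the winning bias function above is base-level activation: an element's activation is the log of summed power-law decays of the times since its past accesses. Below is a minimal sketch of the exact computation and of the hybrid approximation of Petrov (2006) that efficient implementations commonly build on; function names and parameters are illustrative, not Soar's API.

    import math

    def base_level_activation(access_times, now, d=0.5):
        """Exact base-level activation: B = ln( sum_j (now - t_j)^(-d) )."""
        return math.log(sum((now - t) ** -d for t in access_times))

    def approx_base_level_activation(recent, n_total, t_first, now, d=0.5):
        """Hybrid approximation (Petrov, 2006): keep the k most recent access
        times exactly; approximate the older n_total - k accesses by
        integrating the decay curve from the oldest retained access back
        to the first access ever."""
        ages = sorted(now - t for t in recent)      # ages of the k retained accesses
        total = sum(a ** -d for a in ages)
        k = len(ages)
        if n_total > k:
            t_k, t_n = ages[-1], now - t_first      # oldest retained age, first-ever age
            total += (n_total - k) * (t_n ** (1 - d) - t_k ** (1 - d)) / ((1 - d) * (t_n - t_k))
        return math.log(total)

The efficiency results above concern computing retrievals under this bias at scale; the sketch only fixes the quantity being approximated.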

SLIDE 5

PROGRESS TOWARDS GOALS

3. Cognitive capabilities that leverage episodic and semantic memory functionality

– Episodic memory

  • Seven distinct capabilities: recognition, prospective memory, virtual sensing, action modeling, …
  • [BRIMS 2011; ACS 2011b; AAAI 2012a]

– Semantic memory

  • Support reconstruction of forgotten working memory
  • [ACS 2011; ICCM 2012a]

4. Evaluate on real world domains

– Episodic memory

  • Multiple domains including mobile robotics, games, planning problems, linguistics
  • [BRIMS 2011; AAAI 2012a]

– Semantic memory

  • Word sense disambiguation, mobile robotics
  • [ICCM 2010; BRIMS 2011; AAAI 2011]

5. Competence-preserving retention/forgetting

– Working memory

  • Automatic management of working memory to improve the scalability of episodic memory, utilizing semantic memory

  • [ACS 2011; ICCM 2012b; Cog Sys 2013]

– Procedural memory

  • Automatic management of procedural memory using the same algorithms as in working-memory management
  • [ICCM 2012b; Cog Sys 2013]

SLIDE 6

NEW GOALS

  • Dynamic determination of value-functions for reinforcement learning to support robust decision making.

– [ACS 2012; AAAI submitted]

SLIDE 7

OVERVIEW

  • Goal:

– Online learning and decision making in novel domains with very large state spaces
– No a priori knowledge of which features are most important

  • Approach:

– Reinforcement learning with adaptive value-function determination using hierarchical tile coding
– Only online, incremental methods need apply!

  • Hypothesis:

– Will lead to more robust decision making and learning over small changes to the environment and task

SLIDE 8

REINFORCEMENT LEARNING FOR ACTION SELECTION

  • Choose an action based on the expected (Q) value stored in a value function
    – The value function maps from situation-action pairs to expected values.
  • The value function is updated based on the reward received and the expected future reward (Q-learning: off-policy)

[Diagram] From state S1, the agent selects among actions a1–a5 using the value function (s_i, a_j) → q_ij; taking a4 leads to state S2 and a reward for (s2, a4).
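
A minimal sketch of this loop, assuming a tabular value function and epsilon-greedy selection (the slides do not specify the exploration policy); all names are illustrative.

    import random
    from collections import defaultdict

    Q = defaultdict(float)  # value function: (state, action) -> expected value q

    def choose_action(state, actions, epsilon=0.1):
        """Pick the action with the highest stored Q value (epsilon-greedy)."""
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])

    def q_update(s, a, reward, s_next, actions, alpha=0.1, gamma=0.99):
        """Off-policy Q-learning: move Q(s,a) toward the received reward plus
        the discounted best value of the successor state, regardless of
        which action is actually taken next."""
        target = reward + gamma * max(Q[(s_next, a2)] for a2 in actions)
        Q[(s, a)] += alpha * (target - Q[(s, a)])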

SLIDE 9

VALUE-FUNCTION FOR LARGE STATE SPACES

  • (s_i, a_j) → q_ij
  • s_i = (f_1, f_2, f_3, f_4, f_5, f_6, … f_n)
  • Usually only a subset of the features is relevant
  • Including irrelevant features slows learning
  • Omitting relevant features gives suboptimal asymptotic performance
  • How to get the best of both?
  • First step: hierarchical tile coding (Sutton & Barto, 1998)
  • Initial results for propositional representations in Puddle World and Mountain Car

SLIDE 10

PUDDLE WORLD

SLIDE 11

PUDDLE WORLD

[Diagram] Hierarchical tilings of the state space: 2x2, 4x4, 8x8.

Q-value for (s_i, a_j) = Σ_t q(s_i^t, a_j), summed over the tilings t (where s_i^t is the tile of tiling t containing s_i), as opposed to averaged. More abstract tilings (2x2) get more updates, which form the baseline for subtilings. Each update is distributed across all tiles that contribute to the Q-value.

  • Explored a variety of credit-assignment distributions: 1/sqrt(updates), even, 1/updates, … (see the sketch below)
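
A minimal sketch of this scheme, assuming a 2-D state in [0,1) x [0,1) as in Puddle World: each tiling contributes one weight per (tile, action), Q(s,a) is the sum (not the average) of the contributing weights, and each TD update is distributed across the contributing tiles, here by the 1/sqrt(updates) rule. Class and method names are illustrative.

    import math
    from collections import defaultdict

    class HierarchicalTileCoder:
        def __init__(self, resolutions=(2, 4, 8)):
            self.resolutions = resolutions     # 2x2, 4x4, 8x8 tilings
            self.w = defaultdict(float)        # (resolution, tile, action) -> weight
            self.updates = defaultdict(int)    # per-tile update counts

        def tiles(self, state):
            """One tile per tiling for a state in [0,1) x [0,1)."""
            x, y = state
            return [(r, (int(x * r), int(y * r))) for r in self.resolutions]

        def q(self, state, action):
            """Q(s,a) is the SUM of the contributing tiles' weights, so coarse
            tilings form a baseline that finer tilings refine."""
            return sum(self.w[(r, t, action)] for r, t in self.tiles(state))

        def update(self, state, action, td_error, alpha=0.1):
            """Distribute the TD error across contributing tiles, weighted
            here by 1/sqrt(updates), one of the distributions explored."""
            tiles = self.tiles(state)
            credit = [1.0 / math.sqrt(1 + self.updates[(r, t, action)]) for r, t in tiles]
            z = sum(credit)
            for (r, t), c in zip(tiles, credit):
                self.w[(r, t, action)] += alpha * td_error * c / z
                self.updates[(r, t, action)] += 1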
SLIDE 12

[Chart] Puddle World: Single Level Tilings. Cumulative reward/episodes vs. actions (thousands) for 4x4, 8x8, 16x16, 32x32, and 64x64 tilings.

SLIDE 13

[Chart] Puddle World: Single Level Tilings Expanded. Cumulative reward/episodes vs. actions (thousands) for 4x4, 8x8, and 16x16 tilings.

SLIDE 14

[Chart] Puddle World: Includes Static Hierarchical Tiling 1-64. Cumulative reward/episodes vs. actions (thousands) for 4x4, 8x8, 16x16, and the 1-64 static hierarchy.

SLIDE 15

MOUNTAIN CAR

SLIDE 16

[Chart] Mountain Car: Static Tilings. Cumulative reward/episodes vs. actions (thousands) for 16x16, 32x32, 64x64, 128x128, and 256x256 tilings.

SLIDE 17

[Chart] Mountain Car: Static Tilings Expanded. Cumulative reward/episodes vs. actions (thousands) for 16x16, 32x32, 64x64, 128x128, and 256x256 tilings.

SLIDE 18

[Chart] Mountain Car: Includes Static Hierarchical Tiling. Cumulative reward/episodes vs. actions (thousands) for 16x16 through 256x256 tilings and the 1-256 static hierarchy.

SLIDE 19

WHY DOES HIERARCHICAL TILING WORK?

  • Abstract Q values serve as a starting point for learning more specific Q values, so the specific values require less learning
  • Exploits a locality assumption:
    – There is continuity in the mapping from feature space to Q values at multiple levels of refinement
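
A toy illustration of that baseline effect, reusing the HierarchicalTileCoder sketch from Slide 11 (made-up numbers): training on one state also raises the coarse 2x2 tile it lies in, so a never-visited state in the same quadrant starts from a non-zero estimate.

    htc = HierarchicalTileCoder(resolutions=(2, 4, 8))

    # Repeatedly pull Q((0.1, 0.1), north) toward a return of 1.0.
    for _ in range(100):
        q = htc.q((0.10, 0.10), "north")
        htc.update((0.10, 0.10), "north", td_error=1.0 - q)

    print(htc.q((0.10, 0.10), "north"))  # ~1.0: fully trained state
    print(htc.q((0.40, 0.40), "north"))  # > 0: shares only the 2x2 tile,
                                         # inheriting the coarse baseline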

SLIDE 20

FOR LARGE STATE SPACES, HOW TO AVOID HUGE MEMORY COSTS?

  • Hypothesis: a non-uniform tiling is sufficient
  • How to do this incrementally and online?
  • Split a tile if its mean Cumulative Absolute Bellman Error (CABE) is half a standard deviation above the mean
    – CABE is stored proportionally to the credit assignment and the learning rate.
    – The mean and standard deviation of CABE are tracked 100% incrementally at low computational cost

  • Incremental and online algorithm
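
A minimal sketch of that splitting rule, under stated assumptions: per-tile CABE is accumulated in proportion to the tile's credit and the learning rate, the mean and standard deviation of observed CABE values are maintained incrementally (Welford's method), and a tile is flagged for splitting when its CABE exceeds the mean by half a standard deviation. The slides do not specify the exact bookkeeping, so names and details here are illustrative.

    import math

    class SplitTracker:
        """Tracks per-tile cumulative absolute Bellman error (CABE) and,
        incrementally, the mean/std of observed CABE values."""
        def __init__(self):
            self.n, self.mean, self.m2 = 0, 0.0, 0.0   # Welford accumulators
            self.cabe = {}                             # tile -> CABE

        def record(self, tile, bellman_error, credit, alpha):
            # CABE accumulates proportionally to credit assignment and learning rate.
            x = self.cabe.get(tile, 0.0) + abs(bellman_error) * credit * alpha
            self.cabe[tile] = x
            # Welford's O(1) update of the running mean and variance.
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)

        def should_split(self, tile):
            if self.n < 2:
                return False
            std = math.sqrt(self.m2 / (self.n - 1))
            return self.cabe.get(tile, 0.0) > self.mean + 0.5 * std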

SLIDE 21

PUDDLE WORLD

[Diagram] Dynamic, non-uniform tiling of Puddle World: regions refined at mixed resolutions (1x1, 2x2, 4x4, 8x8).

SLIDE 22

ANALYSIS AND EXPECTED RESULTS

  • Might lose performance because it takes time to “grow” the tiling.
  • Might gain performance because updates are not wasted on useless details.
  • Expect many fewer “active” Q values

SLIDE 23

[Chart] Puddle World: Static Hierarchical Tiling Reward and Memory Usage. Cumulative reward/episodes and Q-value count vs. actions (thousands) for the 1-64 static tiling (reward and memory curves).

SLIDE 24

[Chart] Puddle World: Static and Dynamic Hierarchical Tiling Reward and Memory Usage. Cumulative reward/episodes and Q-value count vs. actions (thousands) for 1-64 static and 1-64 dynamic (reward and memory curves).

SLIDE 25

[Chart] Puddle World: Static and Dynamic Hierarchical Tiling Reward and Memory Usage. Same comparison as the previous slide, annotated:

Dynamic memory is 9% of static at 10,000 actions

SLIDE 26

SLIDE 27

[Image] Tile decomposition for move(north), 10K actions.

SLIDE 28

[Image] Tile decomposition for move(south), 10K actions.

SLIDE 29

[Chart] Mountain Car: Even Credit Assignment. Cumulative reward/episodes and Q-value count vs. actions (thousands) for 1-256 static and 1-256 dynamic (reward and memory curves).

SLIDE 30

[Chart] Mountain Car: Inverse Log Credit Assignment. Cumulative reward/episodes and Q-value count vs. actions (thousands) for 1-256 static and 1-256 dynamic (reward and memory curves).

SLIDE 31

[Chart] Mountain Car: Inverse Log Credit Assignment. Same comparison as the previous slide, annotated:

Dynamic memory is 6% of static at 50,000 actions

SLIDE 32

[Image] Tile decomposition for move(right), 50K actions.

SLIDE 33

[Image] Tile decomposition for move(right), 1,000K actions.

SLIDE 34

[Image] Tile decomposition for move(left), 50K actions.

SLIDE 35

[Image] Tile decomposition for move(left), 1,000K actions.

SLIDE 36

[Image] Tile decomposition for move(idle), 50K actions.

SLIDE 37

[Image] Tile decomposition for move(idle), 1,000K actions.

SLIDE 38

RELATED WORK

  • (McCallum, 1996) Reinforcement Learning with Selective Perception and Hidden State
    – Not strict hierarchies, but similar motivation for relational representations

Two levels with independent updating and no adaptive splitting:

  • (Taylor & Stone, 2005) Behavior Transfer for Value-Function-Based Reinforcement Learning
  • (Zheng, Luo & Lv, 2006) Control Double Inverted Pendulum by Reinforcement Learning with Double CMAC Network
  • (Grzes, 2010) Improving Exploration in Reinforcement Learning through Domain Knowledge and Parameter Analysis

Maintain data on the fringe of the hierarchy:

  • (Munos & Moore, 1999) Variable Resolution Discretization in Optimal Control
    – Splits periodically; maintains a fringe of the hierarchy and splits the top f% of cells to minimize the standard deviation of influence and variance; no time-based online performance data; requires an action model
  • (Whiteson, Taylor, & Stone, 2007) Adaptive Tile Coding for Value Function Approximation
    – Splits when the Bellman error for any tile has not decreased in N steps; maintains the fringe of the hierarchy and splits the tile that maximally reduces Bellman error or maximally improves the policy

SLIDE 39

[Chart] Whiteson Policy and Value Results Compared to Our Results. Cumulative reward/episodes vs. actions (thousands) for Whiteson value-based splitting, Whiteson policy-based splitting, our static hierarchy, and our dynamic hierarchy.

SLIDE 40

ROBUSTNESS

“A characteristic describing a model’s, test’s, or system’s ability to effectively perform while its variables or assumptions are altered.”

Puddle World
1. Change the position of the goal
2. Change the size of the puddles
3. Increase the stochasticity of actions

Mountain Car
1. Change the force of actions

Hypotheses
– Hierarchical tiling should be robust to small changes
– Incremental tiling should have similar performance

SLIDE 41

CHANGES IN GOAL POSITION

[Diagram] Puddle World with four alternative goal positions, labeled 1-4.

SLIDE 42

[Chart] PW: Different Goals with Static Hierarchy. Cumulative reward/episodes vs. actions (thousands) for static hierarchies on goals 1-4.

SLIDE 43

[Chart] PW: Different Goals with Static Hierarchy. Cumulative reward/episodes vs. actions (thousands) for goals 1-4.

[Chart] PW: Different Goals with Dynamic Hierarchy. Cumulative reward/episodes vs. actions (thousands) for goals 1-4.

SLIDE 44

[Chart] Different Goals with Static Hierarchy. Cumulative reward/episodes vs. actions (thousands) for goals 1-4.

[Chart] Different Goals with Dynamic Hierarchy. Cumulative reward/episodes vs. actions (thousands) for goals 1-4.

[Chart] Transfer from Original to New Goal with Static Hierarchy. Cumulative reward/episodes vs. actions (thousands) for goals 1-4.

[Chart] Transfer from Original to New Goal with Dynamic Hierarchy. Cumulative reward/episodes vs. actions (thousands) for goals 1-4.

SLIDE 45

FUTURE WORK

1. Short term

– Develop a criterion for stopping refinement
– More research on robustness
– Better understanding of different credit-assignment policies

2. Research on choosing which dimensions should be expanded

3. Expand to relational representations

– No longer a strict hierarchy
– Must decide which relations/features should be included

  • What meta-data to maintain?
  • Can we use additional background knowledge?

4. Embed within a cognitive architecture (Soar)

– Already have prototype implementation for continuous features

SLIDE 46

LIST OF PUBLICATIONS ATTRIBUTED TO THE GRANT

1. [AAAI submitted] Bloch, M., & Laird, J. E. (submitted) Incremental Hierarchical Tile Coding in Reinforcement Learning. AAAI.
2. [Cog Sys 2013] Derbinsky, N., & Laird, J. E. (2013) Effective and Efficient Forgetting of Learned Knowledge in Soar’s Working and Procedural Memories. Cognitive Systems Research.
3. [ACS 2012] Laird, J. E., Derbinsky, N., & Tinkerhess, M. (2012) Online Determination of Value-Function Structure and Action-Value Estimates for Reinforcement Learning in a Cognitive Architecture. Advances in Cognitive Systems, Volume 2. Palo Alto, CA.
4. [AAMAS 2012] Derbinsky, N., Li, J., & Laird, J. E. (2012) Evaluating Algorithmic Scaling in a General Episodic Memory (Extended Abstract). Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems (AAMAS). Valencia, Spain.
5. [ICCM 2012a] Derbinsky, N., & Laird, J. E. (2012) Efficient Decay via Base-Level Activation. Proceedings of the 11th International Conference on Cognitive Modeling (ICCM). Berlin, Germany. Best Poster.
6. [ICCM 2012b] Derbinsky, N., & Laird, J. E. (2012) Competence-Preserving Retention of Learned Knowledge in Soar’s Working and Procedural Memories. Proceedings of the 11th International Conference on Cognitive Modeling (ICCM). Berlin, Germany.
7. [AAAI 2012a] Derbinsky, N., Li, J., & Laird, J. E. (2012) A Multi-Domain Evaluation of Scaling in a General Episodic Memory. Proceedings of the 26th AAAI Conference on Artificial Intelligence. Toronto, Canada.
8. [AAAI 2012b] Li, J., Derbinsky, N., & Laird, J. E. (2012) Functional Interactions Between Encoding and Recognition of Semantic Knowledge. Proceedings of the 26th AAAI Conference on Artificial Intelligence. Toronto, Canada. [AFOSR & ONR]
9. [BRIMS 2011] Laird, J. E., Derbinsky, N., & Voigt, J. (2011) Performance Evaluation of Declarative Memory Systems in Soar. Proceedings of the 20th Behavior Representation in Modeling & Simulation Conference (BRIMS), 33-40. Sundance, UT.
10. [AISB 2011] Derbinsky, N., & Laird, J. E. (2011) A Preliminary Functional Analysis of Memory in the Word Sense Disambiguation Task. Proceedings of the 2nd Symposium on Human Memory for Artificial Agents, AISB, 25-29. York, England.
11. [AAAI 2011] Derbinsky, N., & Laird, J. E. (2011) A Functional Analysis of Historical Memory Retrieval Bias in the Word Sense Disambiguation Task. Proceedings of the 25th National Conference on Artificial Intelligence (AAAI), 663-668. San Francisco, CA.
12. [ACS 2011] Derbinsky, N., & Laird, J. E. (2011) Effective and Efficient Management of Soar’s Working Memory via Base-Level Activation. Papers from the 2011 AAAI Fall Symposium Series: Advances in Cognitive Systems (ACS), 82-89. Arlington, VA.
13. [ICCM 2010] Derbinsky, N., Laird, J. E., & Smith, B. (2010) Towards Efficiently Supporting Large Symbolic Declarative Memories. Proceedings of the 10th International Conference on Cognitive Modeling (ICCM), 49-54. Philadelphia, PA.