Extending Semantic and Episodic Memory to Support Robust Decision Making


  1. Extending Semantic and Episodic Memory to Support Robust Decision Making (FA2386-10-1-4127)
  PI: John E. Laird (University of Michigan)
  Graduate Students: Nate Derbinsky, Mitchell Bloch, Mazin Assanie
  AFOSR Program Review: Mathematical and Computational Cognition Program; Computational and Machine Intelligence Program; Robust Decision Making in Human-System Interface Program (Jan 28 – Feb 1, 2013, Washington, DC)

  2. Extending Semantic and Episodic Memory (John Laird)

  Objective: Develop algorithms that support effective, general, and scalable long-term memory:
  1. Effective: retrieves useful knowledge
  2. General: effective across a variety of tasks
  3. Scalable: supports large amounts of knowledge and long agent lifetimes, with manageable growth in memory and computational requirements

  Technical Approach:
  1. Analyze multiple tasks and domains to determine exploitable regularities.
  2. Develop algorithms that exploit those regularities.
  3. Embed within a general cognitive architecture.
  4. Perform formal analyses and empirical evaluations across multiple domains.

  Budget (Actual/Planned, $K): FY11: $99/$158; FY12: $195/$165; FY13: $205/$176
  Annual Progress Report Submitted? FY11: Y; FY12: Y; FY13: N
  Project End Date: June 29, 2013

  DoD Benefit: Develop science and technology to support:
  • Intelligent, knowledge-rich autonomous systems that have long-term existence, such as autonomous vehicles (ONR; DARPA: ACTUV)
  • Large-scale, long-term cognitive models (AFRL)

  3. List of Project Goals
  1. Episodic memory (experiential & contextualized)
     – Expand functionality
     – Improve efficiency of storage (memory) and retrieval (time)
  2. Semantic memory (context independent)
     – Enhance retrieval
     – Automatic generalization
  3. Cognitive capabilities that leverage episodic and semantic memory functionality
     – Reusing prior experience, noticing familiar situations, …
  4. Evaluate on real-world domains
  5. Extended goal
     – Competence-preserving selective retention across multiple memories

  4. Progress Towards Goals
  1. Episodic memory
     – Expand functionality (recognition)
       • [AAAI 2012b]
     – Improve efficiency of storage (memory) and retrieval (time)
       • Exploits temporal contiguity, structural regularity, high cue structural selectivity, high temporal selectivity, and low cue-feature co-occurrence
       • For many different cues and many different tasks, no significant slowdown with experience: runs for days of real time (tens of millions of episodes), faster than real time
       • [ICCBR 2009; BRIMS 2011; AAMAS 2012]
  2. Semantic memory
     – Enhance retrieval
       • Evaluated multiple bias functions; conclusion: base-level (exponential) activation works best
       • Developed an efficient approximate algorithm that maintains high (>90%) validity
         – 30-100x as fast as prior retrieval algorithms (non-base-level activation) on a 3x larger data set
         – Sublinear slowdown as memory size increases
       • Exploits small node outdegree and high selectivity, but not low co-occurrence of cue features
       • [ICCM 2010; AISB 2011; AAAI 2011]
       • Current research: how to use context – collaboration with Braden Phillips (University of Adelaide) on special-purpose hardware to support spreading activation in semantic memory
     – Automatic generalization
       • Current research: leverage data maintained for episodic memory
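The base-level (exponential) activation bias that slide 4 concludes works best can be illustrated with the standard ACT-R-style formula, where a memory element's activation is the log of summed, power-law-decayed access recencies. This is a minimal sketch of that textbook equation, not the project's approximate algorithm; the function name and sample access times are invented for the example.

```python
import math

def base_level_activation(access_times, now, decay=0.5):
    """ACT-R-style base-level activation: ln(sum_j (now - t_j)^-d).

    Each past access at time t_j contributes (now - t_j)^-decay, so
    recent and frequent accesses yield higher activation."""
    return math.log(sum((now - t) ** -decay for t in access_times))

# A memory accessed recently and often outranks a stale one.
recent = base_level_activation([1.0, 5.0, 9.0], now=10.0)
stale = base_level_activation([1.0, 2.0], now=10.0)
```

At retrieval time, candidates matching a cue would be ranked by this bias, so the approximation only needs to find the top-activation match rather than score every element exactly.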

  5. Progress Towards Goals
  3. Cognitive capabilities that leverage episodic and semantic memory functionality
     – Episodic memory
       • Seven distinct capabilities: recognition, prospective memory, virtual sensing, action modeling, …
       • [BRIMS 2011; ACS 2011b; AAAI 2012a]
     – Semantic memory
       • Support reconstruction of forgotten working memory
       • [ACS 2011; ICCM 2012a]
  4. Evaluate on real-world domains
     – Episodic memory
       • Multiple domains including mobile robotics, games, planning problems, linguistics
       • [BRIMS 2011; AAAI 2012a]
     – Semantic memory
       • Word-sense disambiguation, mobile robotics
       • [ICCM 2010; BRIMS 2011; AAAI 2011]
  5. Competence-preserving retention/forgetting
     – Working memory
       • Automatic management of working memory to improve the scalability of episodic memory, utilizing semantic memory
       • [ACS 2011; ICCM 2012b; Cog Sys 2013]
     – Procedural memory
       • Automatic management of procedural memory using the same algorithms as in working-memory management
       • [ICCM 2012b; Cog Sys 2013]

  6. New Goals
  • Dynamic determination of value functions for reinforcement learning to support robust decision making
    – [ACS 2012; AAAI submitted]

  7. Overview
  • Goal:
    – Online learning and decision making in novel domains with very large state spaces
    – No a priori knowledge of which features are most important
  • Approach:
    – Reinforcement learning with adaptive value-function determination using hierarchical tile coding
    – Only online, incremental methods need apply!
  • Hypothesis:
    – Will lead to more robust decision making and learning over small changes to environment and task

  8. Reinforcement Learning for Action Selection
  • Choose an action based on the expected (Q) value stored in a value function
    – The value function maps from situation-action pairs to expected values: (s_i, a_j) → q_ij
  • The value function is updated based on the reward received and the expected future reward (Q-learning: off-policy)
  [Diagram: states S_1 and S_2 with candidate actions a_1 … a_5; perception and internal structures determine the state, reward drives updates to the value function (s_i, a_j) → q_ij]
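The off-policy Q-learning update described above can be sketched in a few lines: the stored value for the taken situation-action pair moves toward the received reward plus the discounted best value available in the next state. This is a generic tabular sketch; the state and action names, step size, and discount are illustrative assumptions, not values from the project.

```python
from collections import defaultdict

def q_learning_step(Q, state, action, reward, next_state, actions,
                    alpha=0.1, gamma=0.95):
    """Off-policy Q-learning: move Q(s, a) toward
    reward + gamma * max_a' Q(s', a')."""
    best_next = max(Q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])

Q = defaultdict(float)           # (state, action) -> q value, default 0.0
actions = ["left", "right"]
q_learning_step(Q, "s1", "right", reward=1.0, next_state="s2", actions=actions)
```

Because the update uses the max over next-state actions rather than the action actually taken next, learning is off-policy: the agent can explore with any behavior policy while still estimating the greedy policy's values.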

  9. Value Function for Large State Spaces
  • (s_i, a_j) → q_ij
  • s_i = (f_1, f_2, f_3, f_4, f_5, f_6, …, f_n)
  • Usually only a subset of the features is relevant
  • Including irrelevant features slows learning
  • Excluding relevant features yields suboptimal asymptotic performance
  • How do we get the best of both?
  • First step: hierarchical tile coding (Sutton & Barto, 1998)
  • Initial results for propositional representations in Puddle World and Mountain Car
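Tile coding turns a continuous state into a small set of discrete features by laying grids over the state space; in the hierarchical variant, grids at several resolutions are active at once. A minimal sketch, assuming a unit-square state space and invented function names (Puddle World's actual coordinate ranges may differ):

```python
def tile_index(x, y, resolution):
    """Map a point in the unit square [0, 1)^2 to a tile id
    in a resolution x resolution grid."""
    col = min(int(x * resolution), resolution - 1)
    row = min(int(y * resolution), resolution - 1)
    return (resolution, row, col)

def active_tiles(x, y, resolutions=(2, 4, 8)):
    """Hierarchical tiling: the same point activates exactly one tile
    at every resolution level."""
    return [tile_index(x, y, r) for r in resolutions]

print(active_tiles(0.3, 0.7))  # [(2, 1, 0), (4, 2, 1), (8, 5, 2)]
```

Coarse tiles (2x2) generalize broadly but cannot distinguish nearby states; fine tiles (8x8) can, at the cost of needing more data, which is exactly the trade-off the hierarchy is meant to balance.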

  10. Puddle World

  11. Puddle World: 2x2, 4x4, 8x8 Tilings
  • Q-value for (s_i, a_j) = Σ_t q(s_it, a_j), summed over the active tile at each tiling level t (as opposed to averaged)
  • More abstract tilings (2x2) get more updates, which form the baseline for subtilings
  • Each update is distributed across all tiles that contribute to the Q-value
    – Explored a variety of distributions: 1/sqrt(updates), even, 1/updates, …
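The summed Q-value and distributed update above can be sketched as follows, using the 1/sqrt(updates) crediting scheme from the slide's list. The data-structure names and the exact normalization are assumptions for illustration, not the project's implementation.

```python
import math
from collections import defaultdict

weights = defaultdict(float)  # (tile, action) -> weight contribution
updates = defaultdict(int)    # (tile, action) -> update count

def q_value(tiles, action):
    # Q(s, a) is the SUM over the active tile at each tiling level
    # (as opposed to the average).
    return sum(weights[(t, action)] for t in tiles)

def distribute_update(tiles, action, td_error, alpha=0.1):
    # Spread one TD update across all contributing tiles, crediting each
    # tile by 1/sqrt(its update count) so heavily-updated coarse tiles
    # absorb less of the correction over time.
    credit = [1.0 / math.sqrt(updates[(t, action)] + 1) for t in tiles]
    total = sum(credit)
    for t, c in zip(tiles, credit):
        weights[(t, action)] += alpha * td_error * (c / total)
        updates[(t, action)] += 1

tiles = [(2, 1, 0), (4, 2, 1), (8, 5, 2)]  # one active tile per level
distribute_update(tiles, "north", td_error=1.0)
```

Early on, the coarse tile soaks up most of the signal and acts as a baseline; as its update count grows, later corrections shift toward the finer subtiles.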

  12. Puddle World: Single-Level Tilings
  [Chart: cumulative reward/episode vs. actions (thousands, 0-200) for single-level tilings 4x4, 8x8, 16x16, 32x32, and 64x64]

  13. Puddle World: Single-Level Tilings Expanded
  [Chart: cumulative reward/episode vs. actions (thousands, 0-50) for tilings 4x4, 8x8, and 16x16]

  14. Puddle World: Includes Static Hierarchical Tiling 1-64
  [Chart: cumulative reward/episode vs. actions (thousands, 0-50) for 4x4, 8x8, 16x16, and the static 1-64 hierarchical tiling]

  15. Mountain Car

  16. Mountain Car: Static Tilings
  [Chart: cumulative reward/episode vs. actions (thousands, 0-1000) for tilings 16x16, 32x32, 64x64, 128x128, and 256x256]

  17. Mountain Car: Static Tilings Expanded
  [Chart: cumulative reward/episode vs. actions (thousands, 0-100) for tilings 16x16, 32x32, 64x64, 128x128, and 256x256]

  18. Mountain Car: Includes Static Hierarchical Tiling
  [Chart: cumulative reward/episode vs. actions (thousands, 0-100) for tilings 16x16 through 256x256 and the static 1-256 hierarchical tiling]

  19. Why Does Hierarchical Tiling Work?
  • Abstract Q-values serve as starting points for learning more specific Q-values, so the specific values require less learning
  • Exploits a locality assumption
    – There is continuity in the mapping from feature space to Q-values at multiple levels of refinement

  20. For Large State Spaces, How Do We Avoid Huge Memory Costs?
  • Hypothesis: non-uniform tiling is sufficient
  • How do we do this incrementally and online?
  • Split a tile if its mean Cumulative Absolute Bellman Error (CABE) is half a standard deviation above the mean
    – CABE is accumulated in proportion to the credit assignment and the learning rate
    – The means and standard deviations of CABE are tracked 100% incrementally at low computational cost
  • The algorithm is incremental and online
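The split criterion above requires running means and standard deviations maintained incrementally; a common O(1)-per-sample way to do that is Welford's online algorithm. A sketch of that bookkeeping and the half-standard-deviation split test, with class and function names invented for illustration:

```python
import math

class CabeStats:
    """Incrementally track the mean and standard deviation of cumulative
    absolute Bellman error (CABE) across tiles (Welford's algorithm)."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def record(self, cabe):
        self.n += 1
        delta = cabe - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (cabe - self.mean)

    @property
    def std(self):
        return math.sqrt(self.m2 / self.n) if self.n > 1 else 0.0

def should_split(tile_cabe, population):
    # Split a tile whose CABE is half a standard deviation above the mean.
    return tile_cabe > population.mean + 0.5 * population.std
```

Because only `n`, `mean`, and `m2` are stored, the split decision costs constant time and memory per recorded error, which is what makes the refinement usable online.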

  21. Puddle World: Adaptive Tiling
  [Figure: non-uniform tilings refined from 1x1 through 2x2, 4x4, and 8x8 tiles]

  22. Analysis and Expected Results
  • Might lose performance, because it takes time to "grow" the tiling
  • Might gain performance, because updates are not wasted on useless details
  • Expect many fewer "active" Q-values

  23. Puddle World: Static Hierarchical Tiling – Reward and Memory Usage
  [Chart: cumulative reward/episode (left axis) and number of Q-values (right axis, 0-60000) vs. actions (thousands, 0-20) for the static 1-64 tiling]

  24. Puddle World: Static and Dynamic Hierarchical Tiling – Reward and Memory Usage
  [Chart: cumulative reward/episode (left axis) and number of Q-values (right axis, 0-60000) vs. actions (thousands, 0-20), comparing the static and dynamic 1-64 tilings on both reward and memory]
