SLIDE 10
An algorithm: Q-learning

- Initialise Q(s, a) arbitrarily
- Repeat (for each episode):
  - Initialise s
  - Repeat (for each step of episode):
    1. Choose a from s (ε-greedy policy derived from Q):
       a = argmax_a Q(s, a) with probability 1 − ε (exploit), a random action with probability ε (explore)
    2. Take action a, observe r, s′
    3. Update estimate of Q (learn):
       Q(s, a) ← Q(s, a) + α [ r + γ max_{a′} Q(s′, a′) − Q(s, a) ]
       (r: immediate reward; γ max_{a′} Q(s′, a′): estimated future reward)
    - s ← s′
  - Until s is terminal
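The loop above can be sketched in code. This is a minimal illustration of tabular Q-learning with an ε-greedy policy; the toy chain environment, reward of 1 at the terminal state, and the hyperparameter values (alpha, gamma, eps, episode count) are illustrative assumptions, not taken from the slide.

```python
import random

N_STATES = 5          # states 0..4; state 4 is terminal (assumed toy environment)
ACTIONS = [0, 1]      # 0 = move left, 1 = move right

def step(s, a):
    """Deterministic chain: reward 1 only on reaching the terminal state."""
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s2 == N_STATES - 1 else 0.0
    return r, s2

def q_learning(episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    random.seed(seed)
    # Initialise Q(s, a) arbitrarily (here: all zeros)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = 0  # initialise s
        while s != N_STATES - 1:  # until s is terminal
            # 1. Choose a from s: epsilon-greedy policy derived from Q
            if random.random() < eps:
                a = random.choice(ACTIONS)                    # explore
            else:
                a = max(ACTIONS, key=lambda a: Q[(s, a)])     # exploit
            # 2. Take action a, observe r, s'
            r, s2 = step(s, a)
            # 3. Learn: Q(s,a) <- Q(s,a) + alpha [r + gamma max_a' Q(s',a') - Q(s,a)]
            best_next = max(Q[(s2, a2)] for a2 in ACTIONS)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2  # s <- s'
    return Q

Q = q_learning()
# After learning, the greedy policy should move right in every non-terminal state.
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

Note how the update combines the immediate reward `r` with the estimated future reward `gamma * best_next`, exactly the two terms labelled on the slide.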