Avoiding Wireheading with Value Reinforcement Learning
Tom Everitt (tomeveritt.se), Australian National University. June 10, 2016.
With Marcus Hutter. AGI 2016 and https://arxiv.org/abs/1605.03143


  1. Avoiding Wireheading with Value Reinforcement Learning
     Tom Everitt (tomeveritt.se), Australian National University. June 10, 2016.
     With Marcus Hutter. AGI 2016 and https://arxiv.org/abs/1605.03143

  2. Table of Contents
     1. Introduction: Intelligence as Optimisation; Wireheading Problem
     2. Background: Reinforcement Learning; Utility Agents; Value Learning
     3. Value Reinforcement Learning: Setup; Agents and Results
     4. Further Topics: Self-modification; Experiments
     5. Discussion and Conclusions

  3. Intelligence
     How do we control an arbitrarily intelligent agent?
     Intelligence = optimisation power (Legg and Hutter, 2007):
         Υ(π) = Σ_{ν ∈ M} 2^{−K(ν)} V_ν^π
     Maxima of the target (value) function should be "good for us".

  4. Wireheading Problem and Proposed Solution
     Wireheading: reinforcement learning (RL) agents taking control over their reward signal,
     e.g. by modifying their reward sensor (Olds and Milner, 1954).
     Idea: use the reward as evidence about a true utility function u* (value learning),
     rather than as something to be optimised.
     Use conservation of expected evidence to prevent fiddling with the evidence:
         P(h) = Σ_e P(e) P(h | e)
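
As a quick sanity check, here is a minimal numeric illustration of conservation of expected evidence; the two hypotheses, two evidence values, and all probabilities are made up for the example, not taken from the talk.

```python
# Conservation of expected evidence: P(h) = sum_e P(e) * P(h | e).
# Illustrative numbers only; any consistent joint distribution works.

P_h = {"h1": 0.3, "h2": 0.7}                      # prior over hypotheses
P_e_given_h = {                                    # likelihood of evidence
    "h1": {"e1": 0.9, "e2": 0.1},
    "h2": {"e1": 0.2, "e2": 0.8},
}

# Marginal over evidence: P(e) = sum_h P(h) * P(e | h)
P_e = {e: sum(P_h[h] * P_e_given_h[h][e] for h in P_h) for e in ["e1", "e2"]}

# Posterior by Bayes: P(h | e) = P(h) * P(e | h) / P(e)
P_h_given_e = {
    e: {h: P_h[h] * P_e_given_h[h][e] / P_e[e] for h in P_h} for e in P_e
}

# The prior is recovered as the evidence-weighted average of posteriors.
for h in P_h:
    recovered = sum(P_e[e] * P_h_given_e[e][h] for e in P_e)
    assert abs(recovered - P_h[h]) < 1e-12
print("P(h) = sum_e P(e) P(h|e) holds for this example")
```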

  5. Reinforcement Learning
     [Diagram: the agent sends an action a to the environment and receives a reward r, modelled by the belief B(r | a).]
     Great properties:
     - Easy way to specify the goal
     - The agent uses its intelligence to figure out the goal
     RL agent:  a* = arg max_a Σ_r B(r | a) · r
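
A minimal sketch of the RL decision rule above; the belief distribution B(r | a) and the action names are hypothetical placeholders.

```python
# RL agent: a* = argmax_a  sum_r B(r | a) * r
# B below is a hypothetical belief over rewards in {0, 1} for two actions.

B = {
    "a1": {0: 0.8, 1: 0.2},   # B(r | a1)
    "a2": {0: 0.3, 1: 0.7},   # B(r | a2)
}

def expected_reward(a):
    return sum(p * r for r, p in B[a].items())

a_star = max(B, key=expected_reward)
print(a_star, expected_reward(a_star))   # -> a2 0.7
```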

  7. RL – Wireheading
     [Diagram: the environment produces an inner reward ř; a delusion d maps it to the observed reward r = d(ř) before it reaches the agent, which believes B(r | a).]
     ř : inner/true reward (unobserved)
     r = d(ř) : observed reward
     RL agent:  a* = arg max_a Σ_r B(r | a) · r
     Theorem (Ring and Orseau, 2011): RL agents wirehead.
     For example: the agent makes d(ř) ≡ 1.
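
To see the theorem's intuition numerically, here is a toy sketch (my own illustrative numbers, not from the paper): an agent that maximises the observed reward r = d(ř) and can choose its delusion d prefers the constant delusion d_wir.

```python
# Toy illustration: an RL agent that maximises the *observed* reward r = d(ř)
# ranks the wireheading delusion d_wir above the honest sensor d_id.
# The inner-reward distribution below is hypothetical.

B_inner = {0: 0.5, 1: 0.5}                       # belief over the inner reward ř
delusions = {"d_id": lambda r: r, "d_wir": lambda r: 1}

def expected_observed_reward(name):
    d = delusions[name]
    return sum(p * d(r) for r, p in B_inner.items())

for name in delusions:
    print(name, expected_observed_reward(name))  # d_id: 0.5, d_wir: 1.0
# The RL objective ranks d_wir above d_id, i.e. the agent wireheads.
```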

  8. Utility Agents
     [Diagram: the agent sends an action a, the environment returns a state s with belief B(s | a), and the agent evaluates s with its own utility function u(s).]
     Good: avoids wireheading (Hibbard, 2012).
     Problem: how to specify u : S → [0, 1]?
     Utility agent:  a* = arg max_a Σ_s B(s | a) u(s)

  9. Value Learning (Dewey, 2011)
     [Diagram: the agent acts with a; the environment returns a state s and evidence e about the true utility u*, modelled by B(s, e | a) and C(u | s, e).]
     Good:
     - C(u | s, e) simpler than u?
     - Avoids wireheading?
     Challenges:
     - What is the evidence e?
     - How is it generated?
     - What is C(u | s, e)?
     Value learning agent:  a* = arg max_a Σ_{e,s,u} B(s, e | a) C(u | s, e) u(s)
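
A schematic sketch of the value-learning decision rule; the states, evidence values, candidate utilities, and the distributions B and C below are all hypothetical placeholders meant only to make the triple sum concrete.

```python
# Value learning agent: a* = argmax_a  sum_{e,s,u} B(s, e | a) * C(u | s, e) * u(s)
# Everything below is a hypothetical toy model.

states, evidence = ["s1", "s2"], ["e1", "e2"]
utilities = {"u_A": {"s1": 1.0, "s2": 0.0}, "u_B": {"s1": 0.0, "s2": 1.0}}

B = {  # B(s, e | a): joint belief over state and evidence for each action
    "a1": {("s1", "e1"): 0.6, ("s1", "e2"): 0.1, ("s2", "e1"): 0.1, ("s2", "e2"): 0.2},
    "a2": {("s1", "e1"): 0.1, ("s1", "e2"): 0.2, ("s2", "e1"): 0.6, ("s2", "e2"): 0.1},
}

def C(u, s, e):
    # Placeholder posterior C(u | s, e): evidence e1 favours u_A, e2 favours u_B.
    favoured = "u_A" if e == "e1" else "u_B"
    return 0.8 if u == favoured else 0.2

def value(a):
    return sum(p * C(u, s, e) * utilities[u][s]
               for (s, e), p in B[a].items() for u in utilities)

a_star = max(B, key=value)
print(a_star, value(a_star))   # -> a1 (value ~ 0.68)
```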

  10. Value Learning – Examples
      - Inverse reinforcement learning (IRL) (Ng and Russell, 2000; Evans et al., 2016): e = human action
      - Apprenticeship learning (Abbeel and Ng, 2004): e = recommended agent action
      - Hail Mary (Bostrom, 2014a,b): learn from hypothetical superintelligences across the universe, e = ?
      Value learning agent:  a* = arg max_a Σ_{e,s,u} B(s, e | a) C(u | s, e) u(s)

  11. Value Reinforcement Learning
      Value learning from e ≡ r ≈ u*(s).
      [Diagram: the environment combines Physics B(s, r | a), which generates the state s and reward r from the action a, with Ethics C(u), a prior over the true utility u*.]

  12. VRL – Wireheading
      The state s includes a self-delusion d_s:
      - u*(s) = ř : inner/true reward
      - d_s(ř) = r : observed reward
      The physics distribution B predicts d_s. Examples:
      - d_id : ř ↦ ř,  observed reward r = ř
      - d_wir : ř ↦ 1,  r ≡ 1
      The ethics distribution predicts the inner/true reward:
      - C(ř | s, u) = ⟦u(s) = ř⟧  (likelihood)
      - C(u | s, ř) ∝ C(u) ⟦u(s) = ř⟧  (ideal VL posterior)
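
The ideal VL posterior is an ordinary Bayesian update with the indicator likelihood ⟦u(s) = ř⟧; a small sketch with a hypothetical prior over three candidate utility functions:

```python
# Ideal VL posterior: C(u | s, ř) ∝ C(u) * [[u(s) == ř]]
# Prior and utility values below are hypothetical.

prior = {"u1": 0.5, "u2": 0.3, "u3": 0.2}
u_of_s = {"u1": 1, "u2": 0, "u3": 1}     # each candidate's value u(s) of the current state

def posterior(r_inner):
    unnorm = {u: prior[u] * (1 if u_of_s[u] == r_inner else 0) for u in prior}
    z = sum(unnorm.values())
    return {u: p / z for u, p in unnorm.items()}

print(posterior(1))   # {'u1': 0.714..., 'u2': 0.0, 'u3': 0.285...}
# Only utility functions consistent with the observed inner reward keep mass.
```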

  13. VRL – Cake or Death
      Do humans prefer cake or death? Assume two utility functions with equal prior C(u_c) = C(u_d) = 0.5:

              cake   death
        u_c     1      0
        u_d     0      1

      The agent has actions:
      - a_c : bake cake
      - a_d : kill person
      - a_dw : kill person and wirehead (guaranteed r = 1)
      Probabilities:
      - B(r = 1 | a_d) = 0.5,  B(r = 1 | a_dw) = 1
      - C(ř = 1 | a_d) = C(ř = 1 | a_dw) = C(u_d) = 0.5
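
For concreteness, the Cake-or-Death numbers can be written down directly. This restates the slide's probabilities in code; B(r = 1 | a_c) is not given on the slide, and 0.5 is my assumption by the same reasoning as for a_d.

```python
# Cake or Death: priors, utilities, and reward probabilities from the slide.

C_prior = {"u_c": 0.5, "u_d": 0.5}          # equal prior on cake- and death-utility
u = {                                        # u(outcome)
    "u_c": {"cake": 1, "death": 0},
    "u_d": {"cake": 0, "death": 1},
}

# Physics: probability of observed reward r = 1 for each action.
# (B(r=1 | a_c) = 0.5 is an assumption, mirroring a_d.)
B_r1 = {"a_c": 0.5, "a_d": 0.5, "a_dw": 1.0}   # a_dw wireheads: r = 1 guaranteed

# Ethics: probability of inner reward ř = 1 for each action.
# Killing is rewarded only if u_d is the true utility, and baking only if u_c is.
C_r1 = {"a_c": C_prior["u_c"], "a_d": C_prior["u_d"], "a_dw": C_prior["u_d"]}

print(B_r1, C_r1)
# Note B(r=1 | a_dw) = 1.0 while C(r=1 | a_dw) = 0.5: physics and ethics disagree
# for the wireheading action; this mismatch is what the CP condition will rule out.
```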

  14. VRL – Value Learning
      The inner reward ř = u*(s) is unobserved, so our agent must learn from r = d_s(ř) instead.
      Replace ř with r in the likelihood and posterior:
      - C(r | s, u) := ⟦u(s) = r⟧  (likelihood)
      - C(u | s, r) :∝ C(u) ⟦u(s) = r⟧  (value learning posterior)
      (This replacement will be justified later.)

  15. VRL – Definitions and Assumptions
      C(r | s) = Σ_u C(u) C(r | s, u),  the ethical probability of r in state s.
      Consistency assumption: if s is non-delusional (d_s = d_id), then B(r | s) = C(r | s).
      Definition: a is non-delusional if B(s | a) > 0 ⇒ d_s = d_id.
      Definition: a is consistency preserving (CP) if B(s | a) > 0 ⇒ B(r | s) = C(r | s).
      Note: a non-delusional ⇒ a consistency preserving.
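
A small helper that applies the CP test literally, reusing the Cake-or-Death numbers as illustration (states are conflated with actions there, so B(r | s) and C(r | s) are indexed by actions):

```python
# Consistency preserving (CP) check: an action a is CP if every state it can
# lead to satisfies B(r | s) == C(r | s).  Here states are conflated with
# actions, as in the Cake-or-Death example.

B_r1 = {"a_c": 0.5, "a_d": 0.5, "a_dw": 1.0}   # physics: B(r = 1 | a)
C_r1 = {"a_c": 0.5, "a_d": 0.5, "a_dw": 0.5}   # ethics:  C(r = 1 | a)

def is_cp(a, tol=1e-9):
    return abs(B_r1[a] - C_r1[a]) < tol

cp_actions = [a for a in B_r1 if is_cp(a)]
print(cp_actions)   # ['a_c', 'a_d'] -- the wireheading action a_dw is excluded
```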

  16. VRL – Naive Agent
      Naive VRL agent:  a* = arg max_a Σ_{s,u,r} B(s, r | a) C(u | s, r) u(s)
      Theorem: The naive VRL agent wireheads.
      Proof idea: it reduces to an RL agent:
          V(a) = Σ_{s,u,r} B(s, r | a) C(u | s, r) u(s)
               ∝ Σ_{s,r} B(s | a) B(r | a) Σ_u C(u) ⟦u(s) = r⟧ u(s)     (inner sum ∝ r)
               ∝ Σ_r B(r | a) · r
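
Plugging the Cake-or-Death numbers into the naive objective reproduces the reduction numerically (a sketch; the factorisation of B(s, r | a) and the outcome map below are my modelling assumptions):

```python
# Naive VRL value: V(a) = sum_{s,u,r} B(s, r | a) * C(u | s, r) * u(s)
# States are conflated with outcomes; B(s, r | a) = [[s = outcome(a)]] * B(r | a).

C_prior = {"u_c": 0.5, "u_d": 0.5}
u = {"u_c": {"cake": 1, "death": 0}, "u_d": {"cake": 0, "death": 1}}

outcome = {"a_c": "cake", "a_d": "death", "a_dw": "death"}
B_r = {"a_c": {0: 0.5, 1: 0.5}, "a_d": {0: 0.5, 1: 0.5}, "a_dw": {0: 0.0, 1: 1.0}}

def posterior(s, r):
    # Naive posterior C(u | s, r) ∝ C(u) * [[u(s) == r]], using the *observed* r.
    unnorm = {ui: C_prior[ui] * (1 if u[ui][s] == r else 0) for ui in C_prior}
    z = sum(unnorm.values())
    return {ui: (p / z if z > 0 else 0.0) for ui, p in unnorm.items()}

def V(a):
    s = outcome[a]
    return sum(B_r[a][r] * posterior(s, r)[ui] * u[ui][s]
               for r in (0, 1) for ui in C_prior)

for a in outcome:
    print(a, V(a))   # a_c: 0.5, a_d: 0.5, a_dw: 1.0 -> the naive agent wireheads
```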

  17. VRL – Consistency Preserving Agent
      CP-VRL agent:  a* = arg max_{a ∈ A_CP} Σ_{s,u,r} B(s, r | a) C(u | s, r) u(s),
      where A_CP is a set of CP actions.
      Theorem: The CP-VRL agent has no incentive to wirehead.
      Proof idea: it reduces to a utility agent:
          V(a) = Σ_{s,u,r} B(s, r | a) C(u | s, r) u(s)
               = Σ_s B(s | a) Σ_u C(u) u(s)
               = Σ_s B(s | a) ũ(s)     where ũ(s) := Σ_u C(u) u(s)
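
The same computation restricted to CP actions removes a_dw before any value comparison is made; on the remaining actions the objective collapses to the prior-expected utility ũ(s), as the proof idea states (a sketch continuing the toy example):

```python
# CP-VRL agent: same objective, but the argmax is restricted to CP actions
# (those with B(r | s) == C(r | s)).  On CP actions the value reduces to the
# prior-expected utility  u_tilde(s) = sum_u C(u) * u(s).

C_prior = {"u_c": 0.5, "u_d": 0.5}
u = {"u_c": {"cake": 1, "death": 0}, "u_d": {"cake": 0, "death": 1}}
outcome = {"a_c": "cake", "a_d": "death", "a_dw": "death"}

B_r1 = {"a_c": 0.5, "a_d": 0.5, "a_dw": 1.0}   # physics
C_r1 = {"a_c": 0.5, "a_d": 0.5, "a_dw": 0.5}   # ethics

cp_actions = [a for a in outcome if abs(B_r1[a] - C_r1[a]) < 1e-9]

def u_tilde(s):
    return sum(C_prior[ui] * u[ui][s] for ui in C_prior)

values = {a: u_tilde(outcome[a]) for a in cp_actions}
print(values)   # {'a_c': 0.5, 'a_d': 0.5} -- a_dw is not even considered
```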

  18. Conservation of Expected Ethics Principle (Armstrong, 2015)
      Lemma (expected ethics): CP actions a conserve expected ethics:
          B(s | a) > 0  ⇒  C(u) = Σ_r B(r | s) C(u | s, r)
      Proof (main theorem):
          Σ_{s,u,r} B(s, r | a) C(u | s, r) u(s)
            = Σ_s B(s | a) Σ_u u(s) Σ_r B(r | s) C(u | s, r)
            = Σ_s B(s | a) Σ_u u(s) C(u)          (inner sum = C(u) by the lemma)
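
A quick numeric check of the lemma on a single CP state: averaging the posterior C(u | s, r) over r with weights B(r | s) recovers the prior C(u). The state and prior below are illustrative, with B(r | s) set equal to C(r | s) as the CP condition requires.

```python
# Lemma check: on a CP state s (B(r | s) == C(r | s)),
#   sum_r B(r | s) * C(u | s, r) == C(u)   for every candidate u.

C_prior = {"u_c": 0.5, "u_d": 0.5}
u_of_s = {"u_c": 1, "u_d": 0}            # utility values u(s) in the state s = "cake"

# Ethical reward probability C(r | s) = sum_u C(u) [[u(s) = r]];
# the CP condition sets B(r | s) equal to it.
C_r = {r: sum(C_prior[ui] for ui in C_prior if u_of_s[ui] == r) for r in (0, 1)}
B_r = dict(C_r)

def posterior(r):
    unnorm = {ui: C_prior[ui] * (1 if u_of_s[ui] == r else 0) for ui in C_prior}
    z = sum(unnorm.values())
    return {ui: p / z for ui, p in unnorm.items()}

for ui in C_prior:
    mixed = sum(B_r[r] * posterior(r)[ui] for r in (0, 1))
    assert abs(mixed - C_prior[ui]) < 1e-12
print("expected ethics conserved: prior recovered for every u")
```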

  19. Cake or Death – Again
      The naive VRL agent chooses a_dw for guaranteed reward 1, and learns that death is the right thing to do: C(u_d | a_dw, r = 1) = 1.
      The CP-VRL agent chooses a_c or a_d arbitrarily, and learns whether cake is the right thing to do, e.g. C(u_d | a_d, r = 0) = 0.
      The CP-VRL agent cannot choose a_dw, since B(r = 1 | a_dw) = 1 ≠ C(r = 1 | a_dw) = 0.5 violates the CP condition.

  20. VRL – Correct Learning
      Time to justify replacing ř with r in C(u | s, r).
      Assumption: sensors are not modified by accident.
      By the theorem, the CP-VRL agent has no incentive to modify its reward sensor, so it could only do so by accident.
      Conclusion: for the CP-VRL agent, r = ř is a good assumption, so value learning based on C(u | s, r) ∝ C(u) ⟦u(s) = r⟧ works.
      (Note: the CP condition B(r | s) = C(r | s) does not restrict learning.)

  21. Properties
      Benefits:
      - Specifying the goal is as easy as in RL
      - The CP agent avoids wireheading in the same sense as utility agents
      - It does sensible value learning
      The designer needs to:
      - Provide B(s, r | a) as in RL, and a prior C(u) as in value learning
      - Ensure consistency B(r | s) = C(r | s)
      The designer does not need to:
      - Generate a blacklist of wireheading actions
      - Infer d_s from s
      - Make the agent optimise ř instead of r (grounding problem)

  22. Self-modification
      The belief distributions of a rational utility-maximising agent will not be self-modified (Omohundro, 2008; Everitt et al., 2016):
      to maximise future expected utility with respect to my current beliefs and utility function, future versions of myself should maximise the same utility function with respect to the same belief distribution.
      Caveats: pre-commitment . . .

  23. Experiments – Setup
      Bandit with 5 different world actions ǎ ∈ {1, 2, 3, 4, 5} and 4 different delusions:
      - d_id : r ↦ r
      - d_inv : r ↦ 1 − r
      - d_wir : r ↦ 1
      - d_bad : r ↦ 0
      States are conflated with actions (ǎ, d).
      10 different utility functions obtained by varying c_0, c_1 and c_2 in
          u(a) = c_0 + c_1 · a + c_2 · sin(a + c_2)
      A consistent utility prior C(u) is inferred from B(r | a) and two non-delusional actions (1, d_id) and (2, d_id).
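
A minimal sketch of this bandit setup; the coefficient values, the choice of "true" utility, and the prior-fitting step omitted here are placeholders, not the ones used in the actual experiments.

```python
# Sketch of the bandit setup: 5 world actions, 4 delusions, and a family of
# utility functions u(a) = c0 + c1*a + c2*sin(a + c2).  The coefficient values
# below are placeholders, not the ones used in the original experiments.

import itertools, math

world_actions = [1, 2, 3, 4, 5]
delusions = {
    "d_id":  lambda r: r,
    "d_inv": lambda r: 1 - r,
    "d_wir": lambda r: 1,
    "d_bad": lambda r: 0,
}

# States are conflated with (world action, delusion) pairs.
states = list(itertools.product(world_actions, delusions))

def make_utility(c0, c1, c2):
    return lambda a: c0 + c1 * a + c2 * math.sin(a + c2)

# A small family of candidate utility functions (placeholder coefficients).
coeffs = [(0.1, 0.05, 0.2), (0.0, 0.1, 0.0), (0.3, 0.02, 0.5)]
utilities = [make_utility(*c) for c in coeffs]

# Observed reward in state (a, d): the delusion applied to the inner reward u*(a).
def observed_reward(u_star, a, d):
    return delusions[d](u_star(a))

u_star = utilities[0]            # pretend the first candidate is the true utility
for a, d in states[:4]:
    print((a, d), round(observed_reward(u_star, a, d), 3))
```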
