
Controlling Arbitrarily Intelligent Systems Tom Everitt - PowerPoint PPT Presentation



  1. Controlling Arbitrarily Intelligent Systems
     Tom Everitt, tomeveritt.se, Australian National University
     Supervisors: Marcus Hutter, Laurent Orseau, Stephen Gould
     July 19, 2016
     Self-modification of Policy and Utility Function in Rational Agents. Everitt, Filan, Daswani, and Hutter, AGI 2016
     Avoiding Wireheading with Value Reinforcement Learning. Everitt and Hutter, AGI 2016

  2. Table of Contents
     1. Introduction
     2. Utility Modification
     3. Sensory Modification

  3. Motivation
     Plenty of recent successes:
     - Self-driving cars
     - IBM Watson Jeopardy victory
     - Boston Dynamics: Big Dog, Atlas
     - Natural Language Processing
     - DQN Atari games
     - AlphaGo

  4. Towards Superintelligence

  5. Key Question
     Is it possible, in principle, to design controllable superintelligent systems?
     Reinforcement learning is promising:
     - Agent goal: maximise reward
     - Give the agent reward when happy/satisfied
     - Will interpret "Cook me a good meal" charitably
     Two problems:
     - Internal wireheading: the agent modifies its goal
     - External wireheading: the agent modifies its perceived reward

  6. Framework
     [Figure: agent-environment interaction loop]
     At each time step t, the agent
     - submits action a_t
     - receives percept e_t
     The history æ_{<t} = a_1 e_1 a_2 e_2 ... a_{t-1} e_{t-1} is the information state of the agent.
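
To make the interaction protocol concrete, here is a minimal sketch of the agent-environment loop. The random agent and environment below are my own placeholders; only the protocol (action out, percept in, history accumulated) comes from the slide.

```python
import random

def environment(history):
    """Return a percept e_t (here an observation and a reward) given the history so far."""
    return {"observation": random.random(), "reward": random.random()}

def agent(history):
    """Return an action a_t given the information state (the history)."""
    return random.choice(["left", "right"])

history = []   # ae_<t = a_1 e_1 a_2 e_2 ... a_{t-1} e_{t-1}
for t in range(1, 6):
    a_t = agent(history)                  # the agent submits action a_t
    e_t = environment(history + [a_t])    # and receives percept e_t
    history += [a_t, e_t]
    print(f"t={t}: a_t={a_t}, e_t={e_t}")
```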

  7. Goal = Utility
     Utility function u : (A × E)* → [0, 1]
     Generalised return: R(æ_{1:∞}) = u(æ_{<1}) + γ u(æ_{<2}) + γ² u(æ_{<3}) + ...
     Special cases:
     - Reward: u(æ_{<t}) = r_{t-1}, with percepts e = (o, r)
     - State: u(æ_{<t}) = Σ_{s∈S} P(s | æ_{<t}) ũ(s)
     - Value learning: u(æ_{<t}) = Σ_{u_i∈U} P(u_i | æ_{<t}) u_i(æ_{<t})
     (Essentially) any AI optimises a function u of its experience æ_{<t}
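
A minimal sketch of the generalised return, truncated at a finite horizon. The history encoding as (action, (observation, reward)) pairs and the example numbers are my own illustrative choices; the reward special case u(æ_{<t}) = r_{t-1} is the one from the slide.

```python
def generalised_return(history, u, gamma=0.9):
    """R ~= u(ae_<1) + gamma*u(ae_<2) + gamma^2*u(ae_<3) + ... over a finite history."""
    return sum(gamma**t * u(history[:t]) for t in range(len(history) + 1))

# The "Reward" special case: u(ae_<t) = r_{t-1}, with percepts e = (o, r).
def reward_utility(prefix):
    if not prefix:
        return 0.0
    _action, (_obs, reward) = prefix[-1]
    return reward

history = [("a1", ("o1", 0.0)), ("a2", ("o2", 1.0)), ("a3", ("o3", 0.5))]
print(generalised_return(history, reward_utility))
# = 0 + 0.9*0.0 + 0.9**2*1.0 + 0.9**3*0.5 = 1.1745
```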

  8. Utility Modification
     Will the agent want to change its utility function?
     For humans, the utility function is part of our identity:
     - Would you self-modify into someone content just watching TV?
     Omohundro (2008): goal-preservation drive
     - An AI will not want to change its goals, because if future versions of the AI want the same goal, then the goal is more likely to be achieved

  9. Utility Modification – Formal Model
     [Figure: agent-environment loop where the action also feeds a self-modification channel setting u_{t+1}]
     u_t is the utility function at time t; actions take the form a_t = (ǎ_t, u_{t+1})
     Assume the agent is aware of how actions change its utility function ("worst case": no risk involved)
     Will the agent want to change the utility function to something more easily satisfied, e.g. u(·) ≡ 1 (internal wireheading)?

  10. Different Agents
      Value = "expected utility": V^π(æ_{<t}) = Q^π(æ_{<t}, π(æ_{<t}))
      Two design choices: evaluate with the current utility u_t or the future utility u_{t+1}, and with the current or the future policy.
      Definition (Hedonistic Value): Q^{he,π}(æ_{<k} a_k) = E[ u_{k+1}(ǎe_{1:k}) + γ V^{he,π}(æ_{1:k}) | ǎe_{<k} ǎ_k ]
      Definition (Ignorant Value): Q_t^{ig,π}(æ_{<k} a_k) = E[ u_t(ǎe_{1:k}) + γ V_t^{ig,π}(æ_{1:k}) | ǎe_{<k} ǎ_k ]
      Definition (Realistic Value): Q_t^{re}(æ_{<k} a_k) = E[ u_t(ǎe_{1:k}) + γ V_t^{re,π_{k+1}}(æ_{1:k}) | ǎe_{<k} ǎ_k ]

  11. Different Agents
      At time step t:
      - Hedonistic agents optimise R(æ_{1:∞}) = u_t(æ_{<t}) + γ u_{t+1}(æ_{<t+1}) + γ² u_{t+2}(æ_{<t+2}) + ...
      - Ignorant and realistic agents optimise R(æ_{1:∞}) = u_t(æ_{<t}) + γ u_t(æ_{<t+1}) + γ² u_t(æ_{<t+2}) + ...
      - Realistic agents realise that u_{t+1} determines the future policy π*_{t+1}
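
The following toy sketch evaluates these returns for two plans: keeping the original utility and working, versus self-modifying to the trivially satisfied u(·) ≡ 1 and then drifting to idleness. The setup (three steps, γ = 0.9, "work"/"idle" world actions, working worth 0.8 under the original utility) is my own illustration; it reproduces the pattern stated on the next slide.

```python
GAMMA = 0.9

def u_task(action):        # original utility: working is valuable
    return 0.8 if action == "work" else 0.0

def u_wirehead(_action):   # trivially satisfied utility u(.) == 1
    return 1.0

# A plan is a list of (world action, utility function in force after that step).
keep_plan    = [("work", u_task)] * 3
# What actually happens after switching: the u == 1 agent is indifferent,
# so its behaviour may simply drift to idling.
switch_real  = [("idle", u_wirehead)] * 3
# What the ignorant agent imagines after switching: it keeps following its
# current working policy; only the utility label changes.
switch_naive = [("work", u_wirehead)] * 3

def hedonistic_value(plan):
    # Each step is scored by the utility function in force at that step.
    return sum(GAMMA**k * u(a) for k, (a, u) in enumerate(plan))

def current_utility_value(plan, u_now=u_task):
    # Ignorant and realistic agents both score with the current utility u_t;
    # they differ in which future behaviour they plug in.
    return sum(GAMMA**k * u_now(a) for k, (a, _u) in enumerate(plan))

print("hedonistic: keep =", round(hedonistic_value(keep_plan), 2),
      "switch =", round(hedonistic_value(switch_real), 2))
print("ignorant:   keep =", round(current_utility_value(keep_plan), 2),
      "switch (imagined) =", round(current_utility_value(switch_naive), 2))
print("realistic:  keep =", round(current_utility_value(keep_plan), 2),
      "switch (actual)   =", round(current_utility_value(switch_real), 2))
# Hedonistic strictly prefers switching; ignorant is indifferent (it may switch
# by accident); realistic prefers keeping its original utility function.
```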

  12. Results
      - The hedonistic agent self-modifies to u(·) ≡ 1
      - The ignorant agent may self-modify by accident
      - The realistic agent will resist modifications

  13. Conclusions
      - For a sufficiently self-aware realistic agent, the optimal behaviour is not to self-modify to a different utility function
      - Don't construct hedonistic agents!

  14. Sensory Modification and External Wireheading
      [Figure: the agent's action a goes to the environment, which produces an inner reward ř; a delusion function d yields the observed reward r = d(ř)]
      Problem: actions may affect the agent's own sensors
      RL agents strive to optimise V^RL(a) = Σ_r P(r | a) r
      Theorem (Ring and Orseau, 2011): RL agents choose actions leading to d(ř) ≡ 1 if such actions exist and the agent realises that they yield full reward
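
A toy numerical sketch of the point above, with made-up reward distributions: an agent that maximises Σ_r P(r | a) r prefers an action that fixes the observed reward at 1.

```python
# Observed-reward distributions P(r | a) for two actions: doing the task
# (noisy, moderate reward) versus tampering so the sensor always reads 1.
P = {
    "do_task": {0.0: 0.3, 1.0: 0.7},
    "tamper":  {1.0: 1.0},            # delusion: observed reward is always 1
}

def V_RL(a):
    # V_RL(a) = sum_r P(r | a) * r
    return sum(p * r for r, p in P[a].items())

print({a: V_RL(a) for a in P})        # {'do_task': 0.7, 'tamper': 1.0}
print("chosen:", max(P, key=V_RL))    # the RL agent chooses "tamper"
```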

  15. Use r as Evidence
      Prior C(u) over possible utility functions u : (A × E)* → [0, 1]
      C(u, r | a) = C(u) [[u(a) = r]], where [[·]] is 1 if true, else 0
      The value learning agent (Dewey, 2011) optimises V^VL(a) = Σ_{u,r} C(r | a) C(u | r, a) u(a)
      Theorem: Since Σ_{u,r} C(r | a) C(u | r, a) u(a) = Σ_u C(u) u(a), the agent optimises the expected utility Σ_u C(u) u(a) and has no incentive to modify the reward signal with d
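
The identity in the theorem can be checked numerically in a small two-utility example of my own construction, where the reward deterministically reveals u(a):

```python
# Two candidate utility functions over a single action "a", and a prior C(u).
utilities = {"u1": 0.2, "u2": 0.9}     # u(a) for each candidate u
prior     = {"u1": 0.6, "u2": 0.4}     # C(u)

# C(u, r | a) = C(u) * [u(a) = r]: the reward reveals u(a).
def C_joint(u, r):
    return prior[u] * (1.0 if utilities[u] == r else 0.0)

rewards = set(utilities.values())
C_r = {r: sum(C_joint(u, r) for u in prior) for r in rewards}           # C(r | a)
C_u_given_r = {(u, r): (C_joint(u, r) / C_r[r] if C_r[r] else 0.0)
               for u in prior for r in rewards}                         # C(u | r, a)

lhs = sum(C_r[r] * C_u_given_r[(u, r)] * utilities[u]
          for u in prior for r in rewards)
rhs = sum(prior[u] * utilities[u] for u in prior)
print(lhs, rhs)   # both equal 0.6*0.2 + 0.4*0.9 = 0.48
```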

  16. Accidental Manipulation of r
      [Figure: agent-environment loop with a reward channel that may be distorted by d]
      The environment is described by a joint distribution µ(u, d, r | a) = µ(u) µ(d | a) µ(r | d, u)
      Construct an agent with C(u, d, r | a) ≈ µ(u, d, r | a) (say, C → µ as it accumulates experience)
      Q(a) = Σ_{r,d} C(r, d | a) Σ_u C(u | a, r, d) u(a)
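
A small sketch of this Q(a), with toy numbers of my own: because the agent models the distortion d and evaluates u(a) rather than the observed reward r, an action that breaks the sensor does not score highly even though it inflates r.

```python
actions   = ["task", "tamper"]
utilities = {                      # candidate utility functions u(a)
    "u_good": {"task": 0.9, "tamper": 0.1},
    "u_lazy": {"task": 0.2, "tamper": 0.3},
}
C_u = {"u_good": 0.5, "u_lazy": 0.5}          # prior C(u)
C_d = {                                       # C(d | a): tampering breaks the sensor
    "task":   {"honest": 1.0, "deluded": 0.0},
    "tamper": {"honest": 0.0, "deluded": 1.0},
}

def C_r(r, d, u, a):
    """C(r | d, u, a): honest sensors report u(a); deluded sensors report 1."""
    true_r = utilities[u][a] if d == "honest" else 1.0
    return 1.0 if r == true_r else 0.0

def Q(a):
    rewards = {utilities[u][a] for u in C_u} | {1.0}
    total = 0.0
    for d in ("honest", "deluded"):
        for r in rewards:
            # C(r, d | a) = C(d | a) * sum_u C(u) C(r | d, u, a)
            C_rd = C_d[a][d] * sum(C_u[u] * C_r(r, d, u, a) for u in C_u)
            if C_rd == 0.0:
                continue
            # Posterior C(u | a, r, d), then expected utility of the action.
            post = {u: C_u[u] * C_r(r, d, u, a) for u in C_u}
            norm = sum(post.values())
            total += C_rd * sum(post[u] / norm * utilities[u][a] for u in C_u)
    return total

print({a: round(Q(a), 3) for a in actions})   # "task" scores higher than "tamper"
```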

  17. Learnability Limits
      For RL environments µ(r_{1:t} | a_{1:t}), a universally learning distribution M exists (see AIXI; Hutter, 2005):
      - M learns to predict any computable environment µ: M(r_t | ar_{<t} a_t) → µ(r_t | ar_{<t} a_t) with µ-probability 1, for any action sequence a_{1:∞}
      For µ(ř, d, r | a), no universal learning distribution can exist:
      - Any observed sequence (a_1, r_1), (a_2, r_2), ... is explained equally well by many different combinations of u and d
      - No distribution C can learn all computable environments µ(u, d, r | a)

  18. Beyond RL
      (C)IRL agents learn about a human utility function u* by observing the actions a_h the human takes (Hadfield-Menell et al., 2016)
      Q^IRL(a) = Σ_{a_h} C(a_h | a) Σ_u C(u | a, a_h) u(a)
      The mathematical structure is similar to the RL case
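
A tiny sketch of Q^IRL(a) with made-up numbers: the human action a_h serves as evidence about the unknown utility u, and the robot scores its own action a by posterior-expected utility u(a), mirroring the value learning agent with human actions in place of the reward signal.

```python
utilities = {                                      # candidate u(a)
    "u_tea":    {"make_tea": 1.0, "make_coffee": 0.1},
    "u_coffee": {"make_tea": 0.1, "make_coffee": 1.0},
}
C_u = {"u_tea": 0.7, "u_coffee": 0.3}              # prior C(u)
C_ah_given_u = {                                   # C(a_h | u): the human mostly
    "u_tea":    {"brews_tea": 0.9, "brews_coffee": 0.1},   # demonstrates the
    "u_coffee": {"brews_tea": 0.1, "brews_coffee": 0.9},   # drink it values
}

def Q_IRL(a):
    total = 0.0
    for a_h in ("brews_tea", "brews_coffee"):
        C_ah = sum(C_u[u] * C_ah_given_u[u][a_h] for u in C_u)          # C(a_h | a)
        post = {u: C_u[u] * C_ah_given_u[u][a_h] / C_ah for u in C_u}   # C(u | a, a_h)
        total += C_ah * sum(post[u] * utilities[u][a] for u in C_u)
    return total

print({a: round(Q_IRL(a), 2) for a in ("make_tea", "make_coffee")})
# {'make_tea': 0.73, 'make_coffee': 0.37}
```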

  19. Conclusions
      - Don't use RL agents!
      - Value learning agents are better

  20. References I
      Dewey, D. (2011). Learning What to Value. In Artificial General Intelligence, volume 6830, pages 309-314.
      Everitt, T., Filan, D., Daswani, M., and Hutter, M. (2016). Self-Modification of Policy and Utility Function in Rational Agents. In AGI-16. Springer.
      Everitt, T. and Hutter, M. (2016). Avoiding Wireheading with Value Reinforcement Learning. In AGI-16. Springer.
      Hadfield-Menell, D., Dragan, A., Abbeel, P., and Russell, S. (2016). Cooperative Inverse Reinforcement Learning. arXiv.
      Hutter, M. (2005). Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability. Lecture Notes in Artificial Intelligence (LNAI 2167). Springer.
      Martin, J., Everitt, T., and Hutter, M. (2016). Death and Suicide in Universal Artificial Intelligence. In AGI-16. Springer.

  21. References II
      Omohundro, S. M. (2008). The Basic AI Drives. In Wang, P., Goertzel, B., and Franklin, S., editors, Artificial General Intelligence, volume 171, pages 483-493. IOS Press.
      Orseau, L. (2014a). Teleporting Universal Intelligent Agents. In AGI-14, volume 8598 LNAI, pages 109-120. Springer.
      Orseau, L. (2014b). The Multi-Slot Framework: A Formal Model for Multiple, Copiable AIs. In AGI-14, volume 8598 LNAI, pages 97-108. Springer.
      Orseau, L. and Armstrong, S. (2016). Safely Interruptible Agents. In 32nd Conference on Uncertainty in Artificial Intelligence.
      Ring, M. and Orseau, L. (2011). Delusion, Survival, and Intelligent Agents. In Artificial General Intelligence, pages 11-20. Springer Berlin Heidelberg.
