

SLIDE 1

AGI Safety and Understanding

Tom Everitt (ANU) 2017-08-18 tomeveritt.se

SLIDE 2

AGI Safety

“How can we control something that is smarter than ourselves?”

  • Key problems:

– Value Loading / Value Learning
– Corrigibility
– Self-preservation

[Image source: https://www.scientificamerican.com/article/skeptic-agenticity/]

SLIDE 3

Value Loading

  • Teach the AI relevant high-level concepts

– Human
– Happiness
– Moral rules

(requires understanding)

  • Define goal in these terms:

“Maximise human happiness subject to moral constraints”
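As a very rough sketch (not from the slides) of what such a goal specification could look like once the concepts are learned, with an assumed happiness measure $H$ and an assumed set of morally forbidden states $\mathcal{C}$:

```latex
\max_{\pi}\ \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty}\gamma^{t}\,H(s_t)\right]
\quad\text{subject to}\quad
\Pr\nolimits_{\pi}\bigl[\exists t:\ s_t\in\mathcal{C}\bigr]=0
```

Both $H$ and $\mathcal{C}$ are precisely the high-level concepts (happiness, moral rules) the AI must first understand; without them the objective cannot even be written down.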

SLIDE 4

The Evil Genie Effect

  • Goal: Cure Cancer!
  • AI-generated plan:

1. Make lots of money by beating humans at stock market predictions
2. Solve a few genetic engineering challenges
3. Synthesize a supervirus that wipes out the human species
4. No more cancer

[Image: King Midas, source: https://anentrepreneurswords.files.wordpress.com/2014/06/king-midas.jpg]

=> Explicit goal specification is a bad idea

SLIDE 5

Value Learning

[Image: cartoon, source: http://www.markstivers.com/wordpress/?p=955]

SLIDE 6

Reinforcement Learning

(AIXI, Q-learning, ...)

  • Requires no understanding
  • Some problems:

– Hard to program reward function
– Laborious to give reward manually
– Catastrophic exploration
– Wireheading

[Image source: http://diysolarpanelsv.com/man-jumping-off-a-cliff-clipart.html]
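For concreteness, a minimal tabular Q-learning sketch (the environment, action set, and hyperparameters are illustrative placeholders, not from the slides). It shows why no understanding is required: the agent optimises whatever number arrives on the reward channel, which is also where blind exploration and wireheading enter.

```python
# A minimal tabular Q-learning sketch. The environment, action set, and
# hyperparameters are illustrative placeholders.
import random
from collections import defaultdict

Q = defaultdict(float)            # Q[(state, action)] -> estimated return
alpha, gamma, epsilon = 0.1, 0.99, 0.1
ACTIONS = [0, 1, 2, 3]            # hypothetical action set

def step(state, action):
    """Placeholder environment: returns (next_state, reward)."""
    return state, 0.0             # a real environment goes here

def act(state):
    # epsilon-greedy: with probability epsilon, just "try something".
    # This blind exploration is the catastrophic-exploration problem above.
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(s, a, r, s2):
    # Q-learning update: the agent optimises whatever number r arrives on
    # the reward channel, with no notion of what it stands for (hence
    # wireheading: corrupting the channel is as good as doing the task).
    best_next = max(Q[(s2, a2)] for a2 in ACTIONS)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
```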

SLIDE 7

RL Extensions 1: Human Preferences

  • Learn reward function from human preferences
  • Recent OpenAI / Google DeepMind paper:

– Show the human short video clips

  • Understanding required:

– How to communicate scenarios to the human? What are the salient features?
– Which scenarios are possible / plausible / relevant?
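A minimal sketch of the underlying idea in the Christiano et al. paper: fit a reward model so that a Bradley-Terry model of clip comparisons matches the human's answers. The linear reward, feature dimension, and toy clips below are assumptions for illustration; the paper itself uses neural networks.

```python
# Sketch of reward modelling from pairwise preferences. The linear reward,
# feature dimension, and random toy clips are assumptions for illustration.
import numpy as np

w = np.zeros(4)                        # parameters of the learned reward

def reward(clip):                      # predicted reward per timestep
    return clip @ w                    # r_hat(s) = w . phi(s)

def pref_prob(clip_a, clip_b):
    # Bradley-Terry: P(a preferred over b) from summed predicted reward
    return 1.0 / (1.0 + np.exp(reward(clip_b).sum() - reward(clip_a).sum()))

def update(clip_a, clip_b, human_prefers_a, lr=0.1):
    # one gradient step on the cross-entropy of the comparison label
    global w
    p = pref_prob(clip_a, clip_b)
    w = w + lr * (human_prefers_a - p) * (clip_a.sum(0) - clip_b.sum(0))

# toy usage: two random "clips" of 10 timesteps x 4 features each
rng = np.random.default_rng(0)
clip_a, clip_b = rng.normal(size=(10, 4)), rng.normal(size=(10, 4))
update(clip_a, clip_b, human_prefers_a=1.0)
```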

SLIDE 8

RL Extensions 2: (Cooperative) Inverse Reinforcement Learning

  • Learn reward function from human actions

– Actions are preference statements

  • Helicopter flight (Abbeel et al., 2006)
  • Understanding required:

– Detect action (cf. soccer kick, Bitcoin purchase)
– Infer desire from action
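A minimal sketch of the "infer desire from action" step, assuming a Boltzmann-rational human and a bandit-style setting where the value of an action is just its immediate reward. Action names, features, and the rationality parameter beta are hypothetical, and the genuinely hard part the slide points at (detecting that a soccer kick or a Bitcoin purchase is "an action" at all) is assumed solved.

```python
# Sketch of inferring reward weights from observed human actions, assuming
# a Boltzmann-rational human: P(a) proportional to exp(beta * Q(s, a)),
# with Q(s, a) = w . phi(a). All names and numbers here are hypothetical.
import numpy as np

ACTIONS = ["kick_ball", "buy_bitcoin", "wave"]
FEATURES = np.array([[1.0, 0.0],       # phi(a) for each action
                     [0.0, 1.0],
                     [0.5, 0.5]])
beta = 2.0                             # assumed human rationality

def action_logprobs(w):
    z = beta * (FEATURES @ w)          # beta * Q(s, a)
    return z - np.log(np.exp(z).sum())

def fit(observed_actions, steps=200, lr=0.1):
    # maximum-likelihood reward weights given the observed human choices
    w = np.zeros(2)
    idx = [ACTIONS.index(a) for a in observed_actions]
    for _ in range(steps):
        p = np.exp(action_logprobs(w))
        # grad of log P(a_i): beta * (phi(a_i) - expected phi under p)
        w += lr * sum(beta * (FEATURES[i] - p @ FEATURES) for i in idx)
    return w

print(fit(["kick_ball", "kick_ball", "wave"]))  # weights tilt toward feature 0
```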

SLIDE 9

Limited oversight

  • Inverse RL:

– No oversight required (in theory)

  • Learning from Human Preferences:

– More data-efficient than RL if queries are well-chosen

SLIDE 10

Catastrophic exploration

  • RL: “Let’s try!”
  • Human Preferences: “Hey Human, should I try?”
  • Inverse RL: “What did the human do?”
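The middle option can be sketched as a simple gate in the action loop, in the spirit of Trial without Error (Saunders et al., 2017, in the references). The overseer check, fallback action, and agent interface are all hypothetical placeholders.

```python
# Sketch of human-gated exploration: a human overseer (or a blocker trained
# on human labels) vetoes proposed actions before they are executed.
SAFE_FALLBACK = "no_op"                   # hypothetical do-nothing action

def overseer_vetoes(state, action):
    """Placeholder for the human / learned-blocker judgment."""
    return False                          # replace with a real check or query

def gated_act(agent, state):
    action = agent.act(state)             # RL: "Let's try!"
    if overseer_vetoes(state, action):    # Human: "No, don't."
        return SAFE_FALLBACK
    return action
```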

SLIDE 11

Wireheading

  • RL: Each state is “self-estimating” its reward
  • Human Pref. and Inverse RL: Wireheaded states can be “verified” from outside
  • (Everitt et al., IJCAI-17)
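The wireheading problem can be stated in one line, roughly following the corrupted-reward framing of that IJCAI-17 paper (notation adapted): the agent only ever sees a possibly corrupted reward $\hat{R}$, not the true reward $\dot{R}$:

```latex
\hat{R}(s) = C\bigl(s,\dot{R}(s)\bigr),
\qquad
\mathbb{E}\Bigl[\sum\nolimits_t \gamma^t\,\hat{R}(s_t)\Bigr]
\ \neq\
\mathbb{E}\Bigl[\sum\nolimits_t \gamma^t\,\dot{R}(s_t)\Bigr]
```

Standard RL estimates each state's reward from inside that state, so the agent is drawn to states where the corruption $C$ inflates $\hat{R}$; preference and demonstration data supply reward information about a state from outside it, which is what allows wireheaded states to be “verified”.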
SLIDE 12

Corrigibility

  • Agent should allow for software corrections and shutdown
  • Until recently considered a separate problem

(Hadfield-Menell et al., 2016; Wangberg et al., AGI-17)

  • A human pressing the shutdown button is a

– strong preference statement /
– easily interpretable action

that the AI should shut down now
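The off-switch intuition can be checked numerically. In this toy version of the off-switch game (Hadfield-Menell et al., in the references), the robot is uncertain about the utility U of its plan, and a rational human presses the button exactly when U < 0; the Gaussian belief over U is an assumption for illustration.

```python
# Toy numerical version of the off-switch game: compare acting unilaterally,
# deferring to a rational human, and switching off. Belief is assumed Gaussian.
import numpy as np

belief_over_U = np.random.default_rng(0).normal(0.5, 1.0, size=100_000)

act_now    = belief_over_U.mean()                   # bypass the human
defer      = np.maximum(belief_over_U, 0.0).mean()  # wait; human blocks U < 0
switch_off = 0.0

print(f"act now: {act_now:.3f}, defer: {defer:.3f}, switch off: {switch_off}")
```

Since E[max(U, 0)] >= E[U] and >= 0, deferring weakly dominates both alternatives: uncertainty about its own objective gives the agent a positive reason to leave the button pressable.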

SLIDE 13

Self-Preservation

(of values, corrigibility, software, hardware, ...)

  • Everitt et al., AGI-16: (some) agents naturally want to self-preserve
  • Need understanding of self
  • Self-understanding?

– AIXI, Q-learning (off-policy RL)
– SARSA, Policy Gradient (on-policy RL)
– Cognitive architectures
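The off-policy / on-policy split matters here because of how each rule models the agent's own future behaviour. These are the standard textbook updates; the link to self-preservation is, roughly, the AGI-16 paper's observation:

```latex
\text{Q-learning (off-policy):}\quad
Q(s,a)\ \leftarrow\ Q(s,a)+\alpha\bigl[r+\gamma\max_{a'}Q(s',a')-Q(s,a)\bigr]

\text{SARSA (on-policy):}\quad
Q(s,a)\ \leftarrow\ Q(s,a)+\alpha\bigl[r+\gamma\,Q(s',a')-Q(s,a)\bigr]
```

The max in Q-learning implicitly assumes the future agent will act greedily with respect to the current values, effectively ignoring the possibility that its own policy gets modified; SARSA bootstraps from the action $a'$ actually taken, so changes to the agent's policy show up in its value estimates. The two families therefore reason differently about self-preservation.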

SLIDE 14

Summary

  • Understand

– Concepts => specify goals => EVIL GENIE
– Ask and interpret preferences => RL from Human Preferences
– Identify and interpret human actions => Inverse RL
– Self-understanding

  • Properties

– Limited oversight
– Safe(r) exploration
– Less/no wireheading
– Corrigibility
– Self-preservation

SLIDE 15

References

  • Deep Reinforcement Learning from Human Preferences. Christiano et al., NIPS 2017.
  • Reinforcement Learning with a Corrupted Reward Channel. Everitt et al., IJCAI 2017.
  • Trial without Error: Towards Safe Reinforcement Learning via Human Intervention. Saunders et al., arXiv 2017.
  • Cooperative Inverse Reinforcement Learning. Hadfield-Menell et al., NIPS 2016.
  • The Off-Switch Game. Hadfield-Menell et al., arXiv 2016.
  • A Game-Theoretic Analysis of the Off-Switch Game. Wangberg et al., AGI 2017.
  • Self-Modification of Policy and Utility Function in Rational Agents. Everitt et al., AGI 2016.
  • Superintelligence: Paths, Dangers, Strategies. Bostrom, 2014.
  • An Application of Reinforcement Learning to Aerobatic Helicopter Flight. Abbeel et al., NIPS 2006.