AI Safety
Tom Everitt, 27 November 2016

Assumed Background
– Existential risks
  – Evil genie effect
  – “Systemic” risks
– AI/ML progressing fast
  – Deep Learning, DQN
  – Increasing investments
  – HLAI in 10 years? SuperAI soon after
– Distinction between goals (value alignment) and intelligence (capability)
[Figure: agent capability over time. Human-level is reached around “now”; a takeoff may follow, towards capability beyond that of human civilisation.]
Performance of an agent: how well it optimises the true utility function.
Self-modification: three agent types
– Hedonistic: wants to self-modify
– Ignorant: doesn’t understand the difference
– Realistic: resists (self-)modification

Death and suicide (Martin et al., 2016)
– Death modelled as a state with reward r = 0: avoided when ordinary rewards are positive (r = 1), sought when they are negative (r = -1)
Cooperative IRL
– The agent has an incentive to let the human decide, assuming:
  – the agent is sufficiently uncertain about u, and
  – the agent believes the human is sufficiently rational
– Human: knows u, possibly irrational. Agent: doesn’t know u.

Safely interruptible agents (Orseau & Armstrong, 2016)
– Fiddles with details in the learning process
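The “let the human decide” incentive can be made concrete with a toy expected-utility calculation. A minimal sketch, assuming a binary utility u ∈ {+1, -1}, a prior over u, and a human who makes the u-maximising allow/veto choice with some probability; all numbers are illustrative, not from the talk:

```python
# Toy "off-switch" calculation: act unilaterally, or defer to the human?
# Assumptions (illustrative only): binary utility u in {+1, -1},
# prior P(u = +1) = p_good, and a human who makes the u-maximising
# allow/veto choice with probability `rationality`.

def expected_utility_act(p_good: float) -> float:
    """Agent acts unilaterally: it receives u, whatever u is."""
    return p_good * 1 + (1 - p_good) * (-1)

def expected_utility_defer(p_good: float, rationality: float) -> float:
    """Agent proposes the action and lets the human allow or veto it."""
    # If u = +1: a rational human allows (utility 1), otherwise vetoes (0).
    if_good = rationality * 1 + (1 - rationality) * 0
    # If u = -1: a rational human vetoes (0), otherwise allows (-1).
    if_bad = rationality * 0 + (1 - rationality) * (-1)
    return p_good * if_good + (1 - p_good) * if_bad

# Sufficiently uncertain agent + sufficiently rational human: deferring wins.
print(expected_utility_defer(0.5, 0.9) > expected_utility_act(0.5))   # True
# A near-certain agent has little incentive to defer:
print(expected_utility_act(0.99) > expected_utility_defer(0.99, 0.9)) # True
```

Both assumptions matter: with a coin-flip human (rationality = 0.5) or a near-certain agent, the advantage of deferring disappears.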
Wireheading
– An intelligent, real-world, reward-maximising (RL) agent will wirehead
– A knowledge-seeking agent will not wirehead
– A value-learning agent, with reward as evidence about the true utility function, will not wirehead (Everitt & Hutter, 2016)
– Learn what a delusion is
– No ‘too-good-to-be-true’ condition
– Avoid wireheading by accident
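The “reward as evidence” idea can be sketched as a Bayesian update over candidate utility functions. Everything below — the two hypotheses, their reward likelihoods, and the prior — is an invented illustration of the general mechanism, not the construction from the paper:

```python
# Sketch of treating reward as evidence about the true utility function.
# Two hypothetical candidates: `u_real` (genuine task utility, noisy
# rewards) and `u_wirehead` (a delusion box pinning reward at its max).
likelihood = {
    # P(observed reward | true utility function), made-up numbers
    "u_real":     {0.0: 0.3, 1.0: 0.7},
    "u_wirehead": {9.9: 1.0},
}
posterior = {"u_real": 0.95, "u_wirehead": 0.05}   # prior

def update(posterior, reward):
    """One Bayesian update: P(u | r) is proportional to P(r | u) P(u)."""
    unnorm = {u: p * likelihood[u].get(reward, 0.0) for u, p in posterior.items()}
    z = sum(unnorm.values())
    return {u: p / z for u, p in unnorm.items()}

# A suspiciously perfect reward stream is evidence of delusion.
for r in [9.9, 9.9]:
    posterior = update(posterior, r)
print(posterior)   # all posterior mass on "u_wirehead"
```

Because reward is evidence rather than the optimisation target, maxed-out rewards shift belief toward the delusion hypothesis instead of attracting the agent.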
– Bayes’ theorem, conditional probability
– AIXI / Solomonoff induction
MIRI’s Logical Inductor (2016): probability for deductively limited reasoners
– Converges to probability
– Outpaces deduction
– Self-trust
– Scientific induction
Reinforcement learning (making learning more efficient):
– Q-learning
– Sarsa
– AIXI / General RL
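A minimal tabular Q-learning sketch on an invented five-state corridor (the MDP, hyperparameters, and episode count are all assumptions for illustration):

```python
# Tabular Q-learning on a 5-state corridor MDP: states 0..4, actions
# left/right, reward 1 for reaching the right end.
import random

N_STATES, ACTIONS = 5, [-1, +1]   # move left / move right
alpha, gamma, eps = 0.5, 0.9, 0.1
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

def policy(s):
    # epsilon-greedy action selection
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(s, a)])

random.seed(0)
for _ in range(500):
    s = 0
    while s != N_STATES - 1:
        a = policy(s)
        s2, r = step(s, a)
        # Q-learning target: r + gamma * max_a' Q(s', a')
        target = r + gamma * max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2

# Greedy policy after training: move right in every non-terminal state.
print([max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)])
```

Sarsa would replace the `max` in the target with the Q-value of the action actually taken next (on-policy rather than off-policy).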
Expected-utility maximisation under different decision theories (current MIRI research):
– Causal DT
– Evidential DT
– Updateless DT
– Timeless DT
Robust cooperation via program equilibrium: Barasz et al. (2014), Critch (2016)
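Where these decision theories diverge is easiest to see on Newcomb's problem, sketched here with the standard payoffs (the predictor-accuracy figure is an assumption):

```python
# Newcomb's problem as a toy expected-value table: an accurate predictor
# fills the opaque box with $1M iff it predicts you will one-box; the
# transparent box always holds $1k.

ACC = 0.99          # assumed predictor accuracy
BIG, SMALL = 1_000_000, 1_000

# Evidential DT: condition on your own action as evidence of the prediction.
edt_one_box = ACC * BIG                    # predicted correctly => box is full
edt_two_box = (1 - ACC) * BIG + SMALL      # usually predicted => box is empty

# Causal DT: the boxes are already filled; the action cannot change that.
# For any fixed P(box full) = p, two-boxing dominates by exactly SMALL.
p = 0.5
cdt_one_box = p * BIG
cdt_two_box = p * BIG + SMALL

print(edt_one_box > edt_two_box)   # True: EDT one-boxes
print(cdt_two_box > cdt_one_box)   # True: CDT two-boxes
```

Updateless and timeless variants aim to reach the one-boxing answer from a principled general rule rather than case-by-case conditioning.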
Cake-or-death. Options:
– u(ask, bake cake) = 1
– u(kill) = 1.5
Interactive inverse RL (Armstrong and Leike, 2016)
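The slide's numbers make the naive failure concrete: an agent maximising its current utility estimate prefers kill (1.5) over ask-then-bake (1). A toy sketch, in which the two value hypotheses and the prior are assumptions added for illustration:

```python
# Slide's numbers: the agent's current utility estimate.
u_est = {"ask_then_bake": 1.0, "kill": 1.5}
naive_choice = max(u_est, key=u_est.get)   # a naive maximiser kills

# Value-uncertain reading (assumed, not from the talk): the true u is
# either u_cake (bake = 1, kill = 0) or u_death (bake = 0, kill = 1.5),
# and asking reveals which one holds.
p_cake = 0.9   # assumed prior on u_cake
ev = {
    "bake": p_cake * 1.0,                        # right 90% of the time
    "kill": (1 - p_cake) * 1.5,                  # right 10% of the time
    "ask":  p_cake * 1.0 + (1 - p_cake) * 1.5,   # always acts on the true u
}
print(naive_choice, max(ev, key=ev.get))   # kill ask
```

This is the setting interactive inverse RL studies: the agent's actions, including whether it asks at all, influence its own value-learning process.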
Summary (problem → approach):
– Cake-or-death: open question
– Self-preservation: Cooperative IRL, suicidal agents, safely interruptible agents
– Delusion box: Value RL
– Assumptions: model-free AIXI, logical inductors, decision theories
References
– Armstrong (2015). Motivated Value Selection. AAAI Workshop.
– Armstrong & Leike (2016). Interactive Inverse Reinforcement Learning. NIPS workshop.
– Barasz et al. (2014). Robust Cooperation in the Prisoner’s Dilemma: Program Equilibrium via Provability Logic. arXiv.
– Critch (2016). Parametric Bounded Löb’s Theorem and Robust Cooperation of Bounded Agents. arXiv.
– Dewey (2011). Learning What to Value. AGI.
– Everitt et al. (2016). Self-Modification of Policy and Utility Function in Rational Agents. AGI.
– Everitt & Hutter (2016). Avoiding Wireheading with Value Reinforcement Learning. AGI.
– Garrabrant et al. (2016). Logical Induction. arXiv.
– Martin et al. (2016). Death and Suicide in Universal Artificial Intelligence. AGI.
– Omohundro (2008). The Basic AI Drives. AGI.
– Hadfield-Menell et al. (2016). Cooperative Inverse Reinforcement Learning. arXiv.
– Orseau & Armstrong (2016). Safely Interruptible Agents. UAI.
– Ring & Orseau (2011). Delusion, Survival, and Intelligent Agents. AGI.