SLIDE 1 Applying Formal Verification to Reflective Reasoning
- R. Kumar¹
- B. Fallenstein²
¹Data61, CSIRO and UNSW
ramana@intelligence.org
²Machine Intelligence Research Institute
benya@intelligence.org
Artificial Intelligence for Theorem Proving, Obergurgl 2017
SLIDE 2
Who am I?
Ramana Kumar
PhD, University of Cambridge
Researcher, Data61, CSIRO
Theorem Proving in HOL
SLIDE 3
Context: Beneficial AI
Source: Future of Humanity Institute, Oxford. See also: https://intelligence.org/why-ai-safety/
SLIDE 4
Context: Beneficial AI
Technical Agenda
SLIDE 5
Context: Beneficial AI
Technical Agenda
Highly Reliable Agent Design
SLIDE 6
Context: Beneficial AI
Technical Agenda
Highly Reliable Agent Design
◮ Foundations
◮ Basic problems lacking in-principle solutions
SLIDE 7
Context: Beneficial AI
Technical Agenda
Highly Reliable Agent Design
◮ Foundations
◮ Basic problems lacking in-principle solutions
(Note: This is not MIRI’s only research agenda.)
SLIDE 8
One problem within MIRI's 2014 agenda happened to align with my expertise: theorem proving and self-verification
SLIDE 9
Problem Statement
SLIDE 10
Problem Statement Design a system that
◮ always satisfies some safety property,
◮ but is otherwise capable of arbitrary self-improvement.
SLIDE 11
Problem of Self-Trust
Too little self-trust: cannot make simple self-modifications.
Too much self-trust: unsound reasoning about successors.
SLIDE 12
Overview
Reflective Reasoning
◮ Self-Modifying Agents
◮ Vingean Reflection
◮ Suggester-Verifier Architecture
◮ Problem and Partial Solutions
Implementation
◮ Botworld
◮ Formalisation in HOL
SLIDE 13
Reflective Reasoning
SLIDE 14 The Agent Framework
environment ⇄ agent (π)
action: π(oa_{1:n}) = a_{n+1}
SLIDE 15 The Agent Framework
environment ⇄ agent (π)
action: π(oa_{1:n}) = a_{n+1}
Cartesian boundary
◮ agent computed outside environment
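As a toy illustration (my own sketch, not from the talk; all names are hypothetical), the Cartesian framework is a loop in which the policy is an ordinary function of the observation-action history, computed outside the environment's transition function:

```python
# Sketch of the Cartesian agent framework: the policy pi maps the
# observation-action history oa_{1:n} to the next action a_{n+1},
# and is computed outside the environment.

def run_cartesian(pi, observe, transition, init_state, steps):
    """Run the agent loop, returning the full observation-action history."""
    history = []              # oa_{1:n}: alternating observations and actions
    state = init_state
    for _ in range(steps):
        obs = observe(state)
        action = pi(history + [obs])    # a_{n+1} = pi(oa_{1:n} o_{n+1})
        history += [obs, action]
        state = transition(state, action)
    return history

# Toy instance: the environment is a counter; the policy echoes the
# latest observation.
history = run_cartesian(
    pi=lambda h: h[-1],                 # act on the most recent observation
    observe=lambda s: s,
    transition=lambda s, a: s + 1,
    init_state=0,
    steps=3,
)
# history == [0, 0, 1, 1, 2, 2]
```

The point of the sketch is the Cartesian boundary: `pi` never appears inside `transition`, so the environment cannot inspect or modify the agent.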
SLIDE 16
Reality is not Cartesian
agent embedded inside the environment
SLIDE 17
Reality is not Cartesian
agent embedded inside the environment
π_n(o_n) = (a_{n+1}, π_{n+1})
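The non-Cartesian picture can be sketched the same way (again my own illustration, hypothetical names): the policy's output includes its own successor, so self-modification is just returning a different function:

```python
# Sketch of the embedded, self-modifying view: each step the policy
# returns both an action and its successor policy,
# pi_n(o_n) = (a_{n+1}, pi_{n+1}).

def run_embedded(pi, observations):
    actions = []
    for obs in observations:
        action, pi = pi(obs)    # the policy may replace itself here
        actions.append(action)
    return actions

# Toy policies: `cautious` acts conservatively until it sees a large
# observation, then rewrites itself into `bold`.
def bold(obs):
    return obs * 2, bold

def cautious(obs):
    if obs > 5:
        return obs, bold        # self-modification: the successor differs
    return 0, cautious

acts = run_embedded(cautious, [1, 2, 7, 3])
# acts == [0, 0, 7, 6]
```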
SLIDE 18
Vingean Principle One can reason only abstractly about a stronger reasoner
SLIDE 19
Vingean Principle One can reason only abstractly about a stronger reasoner
Relevance
Self-improving system must reason about programs it cannot run: its successors
SLIDE 20
Vingean Principle One can reason only abstractly about a stronger reasoner
Relevance
Self-improving system must reason about programs it cannot run: its successors
Approach
Formal logic as a model of abstract reasoning
SLIDE 21
Suggester-Verifier Architecture
SLIDE 22 Suggester-Verifier Architecture
Suggester (sophisticated, untrusted): proposes (π, a) with a proof
Verifier: passes on (π, a), or falls back to the default (π, a)
SLIDE 23 Suggester-Verifier Architecture
Suggester (sophisticated, untrusted): proposes (π, a) with a proof
Verifier: passes on (π, a), or falls back to the default (π, a)
Verify: ⊢ u(h(π, a)) ≥ u(h(default))
SLIDE 24 Suggester-Verifier Architecture
Suggester (sophisticated, untrusted): proposes (π, a) with a proof
Verifier: passes on (π, a), or falls back to the default (π, a)
Verify: ⊢ u(h(π, a)) ≥ u(h(default))  (≈ Safe(a))
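A toy rendering of the control flow (my own sketch, hypothetical names; a real verifier checks a proof object rather than re-evaluating utilities, which is the whole point in the Vingean setting, where the verifier cannot simulate the suggestion):

```python
# Sketch of the suggester-verifier check: an untrusted suggester proposes
# (policy, action) with a certificate; the verifier accepts only if the
# certified inequality u(h(pi, a)) >= u(h(default)) is established,
# otherwise the default is kept.

def suggester_verifier(default, suggestion, utility, outcome):
    """Return the (policy, action) pair that will actually be executed."""
    if suggestion is None:
        return default                   # nothing proposed: run the default
    policy, action, proof_ok = suggestion
    # Stand-in for proof checking: accept only when the certified
    # inequality in fact holds.
    if proof_ok and utility(outcome((policy, action))) >= utility(outcome(default)):
        return (policy, action)
    return default                       # unverified suggestions are discarded

# Toy instance: outcomes are just the proposed action's value.
default = ("noop", 0)
chosen = suggester_verifier(default, ("pi2", 3, True),
                            utility=lambda x: x, outcome=lambda pa: pa[1])
# chosen == ("pi2", 3)
```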
SLIDE 25
Problem with Self-Modification
Argument for Safety of Successor
◮ To create a successor, must prove that its actions will be safe
◮ If the successor follows the s-v architecture, it will only take actions it has proven to be safe
◮ However, concluding that an action is actually safe from a proof is problematic.
SLIDE 26 Problem with Self-Modification
Argument for Safety of Successor
◮ To create a successor, must prove that its actions will be safe
◮ If the successor follows the s-v architecture, it will only take actions it has proven to be safe
◮ However, concluding that an action is actually safe from a proof is problematic. This principle, T ⊢ □_T ϕ =⇒ ϕ, is inconsistent (Gödel/Löb).
SLIDE 27
Partial Solutions
Descending Trust
T100 ⊢ □_{T99} ϕ =⇒ ϕ,  T99 ⊢ □_{T98} ϕ =⇒ ϕ, . . .
SLIDE 28
Partial Solutions
Descending Trust
T100 ⊢ □_{T99} ϕ =⇒ ϕ,  T99 ⊢ □_{T98} ϕ =⇒ ϕ, . . .
Model Polymorphism
0 < κ,  T ⊢ ∀n. □_T ϕ(n̄) =⇒ ϕ[κ − 1/κ](n)
SLIDE 29
Model Polymorphism
0 < κ,  T ⊢ ∀n. □_T ϕ(n̄) =⇒ ϕ[κ − 1/κ](n)
SLIDE 30
Model Polymorphism
0 < κ,  T ⊢ ∀n. □_T ϕ(n̄) =⇒ ϕ[κ − 1/κ](n)
If Safe(a) ≡ ∀n. Safe(a, n)
Take ϕ(n) ≡ n ≤ κ =⇒ Safe(a, n)
SLIDE 31
Model Polymorphism
0 < κ,  T ⊢ ∀n. □_T ϕ(n̄) =⇒ ϕ[κ − 1/κ](n)
If Safe(a) ≡ ∀n. Safe(a, n)
Take ϕ(n) ≡ n ≤ κ =⇒ Safe(a, n)
∀a. □_T (∀n ≤ t + 1 + κ. Safe(ā, n)) =⇒ ∀n ≤ t + κ. Safe(a, n)
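Spelling out the step from the schema to the instance above (my reconstruction, assuming the schema as stated on the slide):

```latex
\text{Schema: } 0 < \kappa, \qquad
  T \vdash \forall n.\ \Box_T\,\varphi(\bar n) \Rightarrow \varphi[\kappa-1/\kappa](n)
\\[4pt]
\text{Take } \varphi(n) \equiv n \le \kappa \Rightarrow \mathrm{Safe}(a, n).
\text{ The schema then yields}
\\[4pt]
T \vdash \forall n.\ \Box_T\bigl(\bar n \le \kappa \Rightarrow \mathrm{Safe}(\bar a, \bar n)\bigr)
  \;\Rightarrow\; \bigl(n \le \kappa - 1 \Rightarrow \mathrm{Safe}(a, n)\bigr)
```

That is, proving safety up to the (nonstandard) bound κ buys actual safety up to κ − 1: each generation of successors spends one step of κ rather than one full theory in a descending-trust tower.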
SLIDE 32
Implementation
SLIDE 33
Botworld: Concrete Framework for Embedded Agents
Robots can construct/inspect/destroy/program other robots
SLIDE 34
Botworld Formalisation
Semantics
◮ step : state → state
SLIDE 35
Botworld Formalisation
Semantics
◮ step : state → state
◮ Robots run policies in CakeML
SLIDE 36
Botworld Formalisation
Semantics
◮ step : state → state
◮ Robots run policies in CakeML
Counterfactuals
◮ state-with-hole for proposed action
SLIDE 37
Botworld Formalisation
Semantics
◮ step : state → state
◮ Robots run policies in CakeML
Counterfactuals
◮ state-with-hole for proposed action
◮ steph : state-with-hole → action → (observation, state) option
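As a toy model of this interface (my own sketch; the actual formalisation defines steph in HOL over Botworld states with CakeML policies), a state-with-hole can be represented as a state plus a completion function:

```python
# Sketch of the counterfactual interface: a state-with-hole is a state in
# which one robot's action is left unspecified; steph fills the hole with
# a proposed action and returns that robot's observation together with
# the next state, or None if the action is invalid there.

def steph(state_with_hole, action):
    """Fill the hole with `action`; return (observation, next_state) or None."""
    state, valid_actions, fill = state_with_hole
    if action not in valid_actions:
        return None                      # invalid proposal: no counterfactual
    next_state = fill(state, action)     # complete the state and step it
    return next_state["visible"], next_state

# Toy one-cell world: the robot may move or wait, and observes its last act.
def fill(state, action):
    return {"pos": state["pos"] + (1 if action == "move" else 0),
            "visible": action}

swh = ({"pos": 0}, {"move", "wait"}, fill)
result = steph(swh, "move")
# result == ("move", {"pos": 1, "visible": "move"})
```

The option result is what lets the verifier reason about a proposed action without committing to it: every counterfactual is evaluated against the same hole.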
SLIDE 38 Suggester-Verifier Implementation
sv(π_default, σ, obs):
- 1. (π, a) = run π_default
- 2. (π′, a′, thm) = run σ(obs, π, a)
- 3. Check that thm has the correct form
- 4. Write (π′, a′) if the check passes, otherwise (π, a)
SLIDE 39 Suggester-Verifier Implementation
sv(π_default, σ, obs):
- 1. (π, a) = run π_default
- 2. (π′, a′, thm) = run σ(obs, π, a)
- 3. Check that thm has the correct form
- 4. Write (π′, a′) if the check passes, otherwise (π, a)
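The four numbered steps above can be sketched as follows (hypothetical Python names; the real sv is a CakeML program verified in HOL, and `check_thm` stands in for checking a genuine theorem object):

```python
# Sketch of the suggester-verifier top level, mirroring steps 1-4.

def sv(pi_default, sigma, obs, check_thm):
    pi, a = pi_default()                    # 1. run the default policy
    suggestion = sigma(obs, pi, a)          # 2. run the untrusted suggester
    if suggestion is not None:
        pi2, a2, thm = suggestion
        if check_thm(thm, pi2, a2):         # 3. does thm have the right form?
            return pi2, a2                  # 4. write the suggestion ...
    return pi, a                            #    ... or else the default

# Toy run: the suggester proposes an action with a well-formed "theorem".
out = sv(pi_default=lambda: ("pi0", "wait"),
         sigma=lambda obs, pi, a: ("pi1", "move", ("safe", "pi1", "move")),
         obs=None,
         check_thm=lambda thm, pi, a: thm == ("safe", pi, a))
# out == ("pi1", "move")
```

Note that only step 3 is trusted: the suggester σ may be arbitrarily sophisticated, and a malformed or missing suggestion simply falls through to the default.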
Reflection Library
Automation for: LCA k̄ =⇒ P implies LCA (k + 1) =⇒ P
SLIDE 40
Implementation Challenge
Project Proposal
Build a Botworld agent that self-modifies into a provably safe agent of the same architecture.
SLIDE 41
Implementation Challenge
Project Proposal
Build a Botworld agent that self-modifies into a provably safe agent of the same architecture.
Eventual Project
Discover how far theorem proving technology is from implementing the above...
SLIDE 42
Outlook
Implementing a Self-Improving Botworld Agent
◮ Looks possible, but with more effort than anticipated
◮ I would estimate 4 person-years.
SLIDE 43
Outlook
Implementing a Self-Improving Botworld Agent
◮ Looks possible, but with more effort than anticipated
◮ I would estimate 4 person-years (building on > 25 in prerequisites)
SLIDE 44
Outlook
Implementing a Self-Improving Botworld Agent
◮ Looks possible, but with more effort than anticipated
◮ I would estimate 4 person-years (building on > 25 in prerequisites)
◮ Improvements on model polymorphism would be nice!
SLIDE 45
Outlook
Implementing a Self-Improving Botworld Agent
◮ Looks possible, but with more effort than anticipated
◮ I would estimate 4 person-years (building on > 25 in prerequisites)
◮ Improvements on model polymorphism would be nice!
Theorem Proving for AI
◮ Specifications Needed!
SLIDE 46
Outlook
Implementing a Self-Improving Botworld Agent
◮ Looks possible, but with more effort than anticipated
◮ I would estimate 4 person-years (building on > 25 in prerequisites)
◮ Improvements on model polymorphism would be nice!
Theorem Proving for AI
◮ Specifications Needed!
◮ Novel Architectures for AI Systems, e.g., improve on Suggester-Verifier to support logical induction and non-proof-based reasoning
SLIDE 47
Outlook
Implementing a Self-Improving Botworld Agent
◮ Looks possible, but with more effort than anticipated
◮ I would estimate 4 person-years (building on > 25 in prerequisites)
◮ Improvements on model polymorphism would be nice!
Theorem Proving for AI
◮ Specifications Needed!
◮ Novel Architectures for AI Systems, e.g., improve on Suggester-Verifier to support logical induction and non-proof-based reasoning
◮ Reducing Problems to Functional Correctness