slide-1
SLIDE 1

Applying Formal Verification to Reflective Reasoning

  • R. Kumar [1]
  • B. Fallenstein [2]

[1] Data61, CSIRO and UNSW (ramana@intelligence.org)

[2] Machine Intelligence Research Institute (benya@intelligence.org)

Artificial Intelligence for Theorem Proving, Obergurgl 2017

slide-2
SLIDE 2

Who am I?

Ramana Kumar

PhD, University of Cambridge
Researcher, Data61, CSIRO
Theorem Proving in HOL

slide-3
SLIDE 3

Context: Beneficial AI

Source: Future of Humanity Institute, Oxford. See also: https://intelligence.org/why-ai-safety/

slide-7
SLIDE 7

Context: Beneficial AI

Technical Agenda

Highly Reliable Agent Design

  • Foundations
  • Basic problems lacking in-principle solutions

(Note: This is not MIRI’s only research agenda.)

slide-8
SLIDE 8

One problem in MIRI’s 2014 agenda happened to align with my expertise: theorem proving and self-verification.

slide-10
SLIDE 10

Problem Statement

Design a system that

  • always satisfies some safety property,
  • but is otherwise capable of arbitrary self-improvement.

slide-11
SLIDE 11

Problem of Self-Trust

Too little self-trust: cannot make simple self-modifications.
Too much self-trust: unsound reasoning about successors.

slide-12
SLIDE 12

Overview

Reflective Reasoning

  • Self-Modifying Agents
  • Vingean Reflection
  • Suggester-Verifier Architecture
  • Problem and Partial Solutions

Implementation

  • Botworld
  • Formalisation in HOL

slide-13
SLIDE 13

Reflective Reasoning

slide-15
SLIDE 15

The Agent Framework

[Diagram: the environment sends observation + reward to the agent (π); the agent sends an action back]

π(oa_{1:n}) = a_{n+1}

Cartesian boundary

  • agent computed outside the environment
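In this framework the policy π maps the whole observation/action history to the next action, π(oa_{1:n}) = a_{n+1}. A minimal Python sketch of the loop (the echo rule and all names are illustrative, not part of any formalisation):

```python
def policy(history):
    """pi(oa_{1:n}) = a_{n+1}: pick the next action from the history.
    Toy rule: echo the most recent observation; act 0 before any input."""
    return history[-1][0] if history else 0

def run_episode(observations):
    """Drive the agent on a fixed observation stream; collect its actions.
    The policy is computed outside the environment (Cartesian boundary)."""
    history, actions = [], []
    for obs in observations:
        action = policy(history)
        history.append((obs, action))
        actions.append(action)
    return actions
```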

slide-17
SLIDE 17

Reality is not Cartesian

[Diagram: the agent sits inside the environment; no Cartesian boundary]

π_n(o_n) = (a_{n+1}, π_{n+1})
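Without the Cartesian boundary, the policy is part of the state it acts on: each step returns both an action and a successor policy, π_n(o_n) = (a_{n+1}, π_{n+1}). A toy Python sketch (the counter rule is invented for illustration):

```python
def counting_policy(count=0):
    """pi_n: maps an observation to (action, successor policy).
    The successor carries updated internal state, so the agent
    effectively rewrites itself at every step."""
    def step(obs):
        return obs + count, counting_policy(count + 1)
    return step

def run(policy, observations):
    """At each step the current policy is replaced by its successor."""
    actions = []
    for obs in observations:
        action, policy = policy(obs)
        actions.append(action)
    return actions
```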

slide-20
SLIDE 20

Vingean Principle

One can reason only abstractly about a stronger reasoner.

Relevance

A self-improving system must reason about programs it cannot run: its successors.

Approach

Formal logic as a model of abstract reasoning

slide-24
SLIDE 24

Suggester-Verifier Architecture

Suggester sophisticated, untrusted

  • bservation

Verifier π, a proof

  • r

default π, a Verify: ⊢ u(h(π, a)) ≥ u(h(default)) (≈ Safe(a))
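The acceptance condition can be sketched as follows. Note the hedge: in the real architecture the inequality u(h(π, a)) ≥ u(h(default)) is established by checking an object-logic proof supplied by the suggester; evaluating u directly, as below, is only a stand-in, and all names are illustrative:

```python
def verify(u, h, proposal, default):
    """Accept the suggester's (pi', a') only if it is at least as good as
    the default: u(h(proposal)) >= u(h(default)).
    Here the proof obligation is replaced by direct evaluation of u."""
    return proposal if u(h(proposal)) >= u(h(default)) else default

# Toy instantiation: the 'history' of a (policy, action) pair is just the
# action's payoff, and utility is the identity.
h = lambda pa: pa[1]
u = lambda payoff: payoff
```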

slide-26
SLIDE 26

Problem with Self-Modification

Argument for Safety of Successor

  • To create a successor, must prove that its actions will be safe
  • If the successor follows the s-v architecture, it will only take actions it has proven to be safe
  • However, to conclude that an action is actually safe from a proof is problematic: this principle, T ⊢ □_T φ ⇒ φ, is inconsistent (Gödel/Löb)
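The obstruction is Löb's theorem. For any consistent theory T with enough arithmetic and provability predicate □_T:

```latex
% Löb's theorem: if T proves the reflection instance for \varphi,
% then T already proves \varphi outright.
T \vdash \Box_T \varphi \rightarrow \varphi
  \quad\Longrightarrow\quad
T \vdash \varphi
% So adding the schema \Box_T \varphi \rightarrow \varphi for every
% \varphi is inconsistent: instantiate \varphi := \bot to get T \vdash \bot.
```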
slide-28
SLIDE 28

Partial Solutions

Descending Trust: T_100 ⊢ □_{T_99} φ ⇒ φ,  T_99 ⊢ □_{T_98} φ ⇒ φ,  …

Model Polymorphism: 0 < κ,  T ⊢ ∀n. □_T φ(n̄) ⇒ φ[κ−1/κ](n)

slide-31
SLIDE 31

Model Polymorphism

0 < κ,  T ⊢ ∀n. □_T φ(n̄) ⇒ φ[κ−1/κ](n)

If Safe(a) ≡ ∀n. Safe(a, n), take φ(n) ≡ n ≤ κ ⇒ Safe(a, n).

∀a. □_T(∀n ≤ t+1+κ. Safe(ā, n)) ⇒ ∀n ≤ t+κ. Safe(a, n)

slide-32
SLIDE 32

Implementation

slide-33
SLIDE 33

Botworld: Concrete Framework for Embedded Agents

Robots can construct/inspect/destroy/program other robots

slide-37
SLIDE 37

Botworld Formalisation

Semantics

  • step : state → state
  • Robots run policies in CakeML

Counterfactuals

  • state-with-hole for proposed action
  • steph : state-with-hole → action → (obs, state) option
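The counterfactual machinery can be sketched as follows; the real step and steph are HOL functions over grids of cells and CakeML policies, so everything here (integer positions, the failure condition) is a simplifying assumption:

```python
def step(state):
    """step : state -> state (toy version).
    Each robot's position advances by its pending action (a velocity)."""
    return {name: pos + act for name, (pos, act) in state.items()}

def steph(state_with_hole, action):
    """steph : state-with-hole -> action -> (obs, state) option.
    Fill the focal robot's missing action with the proposed one, take a
    step, and return (observation, next state), or None if the focal
    robot is gone (hypothetical failure condition)."""
    (focal_name, focal_pos), others = state_with_hole
    full = dict(others)
    full[focal_name] = (focal_pos, action)   # plug the hole
    nxt = step(full)
    if focal_name not in nxt:
        return None
    return nxt[focal_name], nxt              # toy observation: own position
```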

slide-39
SLIDE 39

Suggester-Verifier Implementation

sv(π_default, σ, obs):

  1. (π, a) = run π_default
  2. (π′, a′, thm) = run σ(obs, π, a)
  3. Check thm has the correct form
  4. Write (π, a) or (π′, a′) accordingly
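The four steps above can be sketched as one function. "Running" a policy and the well-formedness check on thm are stand-ins here: the real implementation executes CakeML code and checks a HOL theorem object:

```python
def sv(default_policy, suggester, obs):
    """Suggester-verifier top level (sketch).
    Fall back to the default (pi, a) unless the untrusted suggester
    supplies a proposal whose theorem has the required form."""
    pi, a = default_policy(obs)                      # 1. run the default
    proposal = suggester(obs, pi, a)                 # 2. run the suggester
    if proposal is not None:
        pi2, a2, thm = proposal
        # 3. toy 'correct form': thm claims u(pi2, a2) >= u(pi, a)
        if thm == ("geq", ("u", (pi2, a2)), ("u", (pi, a))):
            return pi2, a2                           # 4. write the proposal
    return pi, a                                     # 4. write the default
```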

Reflection Library

Automation for: (LCA k̄ ⇒ P) implies (LCA (k+1) ⇒ P)

slide-41
SLIDE 41

Implementation Challenge

Project Proposal: Build a Botworld agent that self-modifies into a provably safe agent of the same architecture.

Eventual Project: Discover how far theorem proving technology is from implementing the above...

slide-47
SLIDE 47

Outlook

Implementing a Self-Improving Botworld Agent

  • Looks possible, but with more effort than anticipated
  • I would estimate 4 person-years (building on > 25 person-years of prerequisites)
  • Improvements on model polymorphism would be nice!

Theorem Proving for AI

  • Specifications Needed!
  • Novel Architectures for AI Systems, e.g., improve on Suggester-Verifier to support logical induction and non-proof-based reasoning
  • Reducing Problems to Functional Correctness