

  1. Applying Formal Verification to Reflective Reasoning
     R. Kumar, Data61, CSIRO and UNSW (ramana@intelligence.org)
     B. Fallenstein, Machine Intelligence Research Institute (benya@intelligence.org)
     Artificial Intelligence for Theorem Proving, Obergurgl 2017

  2. Who am I?
     Ramana Kumar
     PhD, University of Cambridge
     Researcher, Data61, CSIRO
     Theorem Proving in HOL

  3. Context: Beneficial AI
     Source: Future of Humanity Institute, Oxford.
     See also: https://intelligence.org/why-ai-safety/


  7. Context: Beneficial AI
     Technical Agenda: Highly Reliable Agent Design
     ◮ Foundations
     ◮ Basic problems lacking in-principle solutions
     (Note: This is not MIRI’s only research agenda.)

  8. One problem in MIRI’s 2014 agenda aligned with my expertise: theorem proving and self-verification.


  10. Problem Statement
     Design a system that
     ◮ always satisfies some safety property,
     ◮ but is otherwise capable of arbitrary self-improvement.

  11. Problem of Self-Trust
     Too little self-trust: cannot make simple self-modifications.
     Too much self-trust: unsound reasoning about successors.

  12. Overview
     Reflective Reasoning
     ◮ Self-Modifying Agents
     ◮ Vingean Reflection
     ◮ Suggester-Verifier Architecture
     ◮ Problem and Partial Solutions
     Implementation
     ◮ Botworld
     ◮ Formalisation in HOL

  13. Reflective Reasoning


  15. The Agent Framework
     [Diagram: the agent (π) receives observation+reward from the environment and emits an action.]
     π(oa_{1:n}) = a_{n+1}
     Cartesian boundary
     ◮ agent computed outside the environment
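The Cartesian framework on this slide can be sketched in code. A minimal, hypothetical Python rendering, in which the history type, the policy signature, and the toy environment are all illustrative stand-ins rather than anything from the talk:

```python
# Hypothetical sketch of the Cartesian agent framework: the agent is a
# function of the whole observation-action history, and is computed
# *outside* the environment it interacts with.
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]          # [(observation+reward, action), ...]
Policy = Callable[[History, str], str]   # pi(history, latest obs) -> next action

def run_episode(pi: Policy, env_step: Callable[[str], str],
                first_obs: str, steps: int) -> History:
    """Alternate the policy and the environment for a fixed number of steps."""
    history: History = []
    obs = first_obs
    for _ in range(steps):
        action = pi(history, obs)        # a_{n+1} = pi(oa_{1:n} extended with o_{n+1})
        history.append((obs, action))
        obs = env_step(action)           # the environment responds with the next observation
    return history

# A trivial policy and environment for illustration only:
echo_policy: Policy = lambda history, obs: "act:" + obs
env = lambda action: "obs" + str(len(action))
```

Note the asymmetry this sketch makes concrete: `pi` is an ordinary function outside the loop's state, which is exactly the Cartesian boundary the next slide questions.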


  17. Reality is not Cartesian
     [Diagram: the agent is embedded inside the environment.]
     π_n(o_n) = (a_{n+1}, ⌜π_{n+1}⌝)
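In the embedded setting the policy returns both an action and (the code of) its successor. A hypothetical Python sketch of π_n(o_n) = (a_{n+1}, ⌜π_{n+1}⌝), with all names illustrative:

```python
# Sketch of the non-Cartesian (embedded) setting: at each step the policy
# returns an action together with its successor policy, modelling
# pi_n(o_n) = (a_{n+1}, <pi_{n+1}>).
from typing import Callable, List, Tuple

Policy = Callable[[str], Tuple[str, "Policy"]]

def make_counter_policy(n: int) -> Policy:
    """A toy policy that records how many times it has been rewritten."""
    def pi(obs: str) -> Tuple[str, "Policy"]:
        action = f"step{n}:{obs}"
        successor = make_counter_policy(n + 1)   # self-modification
        return action, successor
    return pi

def run(pi: Policy, observations: List[str]) -> List[str]:
    actions = []
    for obs in observations:
        action, pi = pi(obs)    # from now on, the environment runs the successor
        actions.append(action)
    return actions
```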


  20. Vingean Principle
     One can reason only abstractly about a stronger reasoner.
     Relevance: a self-improving system must reason about programs it cannot run: its successors.
     Approach: formal logic as a model of abstract reasoning.


  24. Suggester-Verifier Architecture
     [Diagram: observation → Suggester (sophisticated, untrusted) → proposal (π, a) with proof → Verifier → (π, a) if verified, else default.]
     Verify: ⊢ u(h(π, a)) ≥ u(h(default))   (≈ Safe(a))


  26. Problem with Self-Modification
     Argument for Safety of Successor
     ◮ To create a successor, the agent must prove that the successor’s actions will be safe
     ◮ If the successor follows the suggester-verifier architecture, it will only take actions it has proven to be safe
     ◮ However, concluding that an action is actually safe from a proof of its safety is problematic.
     The required principle, T ⊢ □_T⌜ϕ⌝ ⟹ ϕ, is inconsistent. (Gödel/Löb)
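The inconsistency traces back to Löb's theorem; the standard statement (not spelled out on the slide) can be recorded as:

```latex
% L\"ob's theorem: for any sentence $\varphi$,
% if $T$ proves the reflection principle for $\varphi$, then $T$ proves $\varphi$:
\[
  T \vdash \bigl(\Box_T\ulcorner\varphi\urcorner \implies \varphi\bigr)
  \quad\Longrightarrow\quad
  T \vdash \varphi
\]
% Hence a theory containing the reflection schema for *every* $\varphi$
% proves every $\varphi$, including $\bot$, so the schema is inconsistent.
```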


  28. Partial Solutions
     Descending Trust
     T_100 ⊢ □_{T_99}⌜ϕ⌝ ⟹ ϕ,   T_99 ⊢ □_{T_98}⌜ϕ⌝ ⟹ ϕ,   …
     Model Polymorphism
     0 < κ,   T ⊢ ∀n. □_T⌜ϕ(n̄)⌝ ⟹ ϕ[κ−1/κ](n)


  31. Model Polymorphism
     0 < κ,   T ⊢ ∀n. □_T⌜ϕ(n̄)⌝ ⟹ ϕ[κ−1/κ](n)
     If Safe(a) ≡ ∀n. Safe(a, n), take ϕ(n) ≡ (n ≤ κ ⟹ Safe(a, n)).
     ∀a. □⌜∀n ≤ t+1+κ. Safe(ā, n)⌝ ⟹ ∀n ≤ t+κ. Safe(a, n)
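Unpacking the instantiation: substituting ϕ(n) ≡ (n ≤ κ ⟹ Safe(a, n)) into the schema, and noting that ϕ[κ−1/κ] replaces κ by κ−1, gives (my reconstruction from the surrounding formulas):

```latex
\[
  T \vdash \forall n.\;
    \Box_T\ulcorner \bar n \le \kappa \implies \mathrm{Safe}(a, \bar n) \urcorner
    \;\implies\;
    \bigl(n \le \kappa - 1 \implies \mathrm{Safe}(a, n)\bigr)
\]
% Each reflection step trades $\kappa$ for $\kappa - 1$: safety proved
% inside the box up to $t + 1 + \kappa$ steps yields safety up to
% $t + \kappa$ steps outside, matching the displayed consequence.
```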

  32. Implementation

  33. Botworld: A Concrete Framework for Embedded Agents
     Robots can construct/inspect/destroy/program other robots.


  37. Botworld Formalisation
     Semantics
     ◮ step : state → state
     ◮ Robots run policies in CakeML
     Counterfactuals
     ◮ state-with-hole for a proposed action
     ◮ steph : state-with-hole → action → (obs, state) option
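The counterfactual interface can be illustrated with a toy model. This is not the HOL formalisation; the world state, the `step` semantics, and the destruction condition are all invented for illustration. Only the shape of `steph` (fill the hole with an action; return the robot's observation and the successor state, or nothing if the robot does not survive) follows the slide:

```python
# Toy model of Botworld's counterfactual interface: a "state-with-hole"
# is a state missing one robot's action; `steph` fills the hole and, if
# the robot survives the step, returns its observation and the next state.
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass
class StateWithHole:
    cells: Dict[str, int]     # toy world state (illustrative)
    robot: str                # the robot whose action is the hole

def step(state: Dict[str, int]) -> Dict[str, int]:
    """Toy total semantics: every cell's value ticks up by one."""
    return {k: v + 1 for k, v in state.items()}

def steph(swh: StateWithHole, action: str) -> Optional[Tuple[str, Dict[str, int]]]:
    """Fill the hole with `action`; None models the robot being destroyed."""
    if action == "self-destruct":
        return None
    filled = dict(swh.cells)
    filled[swh.robot] = filled.get(swh.robot, 0)
    nxt = step(filled)
    obs = f"{swh.robot} did {action}"
    return obs, nxt
```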


  39. Suggester-Verifier Implementation
     sv(π_default, σ, obs):
     1. (π, a) = run π_default
     2. (π′, a′, thm) = run σ (obs, π, a)
     3. Check that thm has the correct form
     4. Write (π, a) or (π′, a′) accordingly
     Reflection Library
     Automation for: □⌜LCA k̄ ⟹ P⌝ implies LCA (k+1) ⟹ P
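The four steps of sv above can be sketched as an executable Python stub. The theorem representation and the `thm_has_correct_form` check are illustrative stand-ins for the HOL/CakeML versions:

```python
# Minimal executable sketch of the suggester-verifier loop: run the
# default policy, run the untrusted suggester, and accept its proposal
# only if it comes with a theorem of the trusted form.
from typing import Callable, Optional, Tuple

Proposal = Tuple[str, str, Optional[str]]   # (policy', action', thm or None)

def thm_has_correct_form(thm: Optional[str], action: str) -> bool:
    """Stand-in check: the verifier only trusts a theorem of the right
    shape about the proposed action (here a tagged string)."""
    return thm is not None and thm == f"safe:{action}"

def sv(default: Tuple[str, str],
       suggester: Callable[[str, str, str], Proposal],
       obs: str) -> Tuple[str, str]:
    pi, a = default                           # 1. run the default policy
    pi2, a2, thm = suggester(obs, pi, a)      # 2. run the untrusted suggester
    if thm_has_correct_form(thm, a2):         # 3. check the theorem's form
        return pi2, a2                        # 4a. write the suggestion
    return pi, a                              # 4b. fall back to the default
```

The key design point the sketch preserves: the suggester can be arbitrarily sophisticated and untrusted, because only the (small, trusted) form check gates what gets written.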


  41. Implementation Challenge
     Project Proposal: build a Botworld agent that self-modifies into a provably safe agent of the same architecture.
     Eventual Project: discover how far theorem-proving technology is from implementing the above...


  47. Outlook
     Implementing a Self-Improving Botworld Agent
     ◮ Looks possible, but with more effort than anticipated
     ◮ Estimated 4 person-years (building on more than 25 person-years of prerequisites)
     ◮ Improvements on model polymorphism would be nice!
     Theorem Proving for AI
     ◮ Specifications needed!
     ◮ Novel architectures for AI systems, e.g., improving on suggester-verifier to support logical induction and non-proof-based reasoning
     ◮ Reducing problems to functional correctness
