
Sequential Extensions of Causal and Evidential Decision Theory



  1. Sequential Extensions of Causal and Evidential Decision Theory
     Tom Everitt, Jan Leike, and Marcus Hutter
     http://jan.leike.name/
     ADT’15, 29 September 2015

  2. Outline
     ◮ Agent Models
     ◮ Decision Theory
     ◮ Sequential Decision Making
     ◮ Conclusion
     ◮ References


  4. Dualistic Agent Model
     [Figure: agent and environment exchanging action a_t and percept e_t]
     Goal: maximize expected utility $\mathbb{E}\big[\sum_{t=1}^{m} u(e_t)\big]$
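A minimal Python sketch of the dualistic interaction loop, in which the agent and the environment are separate functions that communicate only through actions and percepts. The names `run_episode`, `Agent`, and `Environment` are hypothetical, and the loop returns the realized utility of a single episode; with a stochastic environment one would average it over many episodes to estimate the expectation in the goal.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces (not from the slides).
History = List[Tuple[str, str]]              # a_1 e_1 ... a_{t-1} e_{t-1}
Agent = Callable[[History], str]             # history -> action a_t
Environment = Callable[[History, str], str]  # history and a_t -> percept e_t


def run_episode(agent: Agent, env: Environment,
                u: Callable[[str], float], m: int) -> float:
    """Run m interaction steps and return the realized sum of u(e_t), t = 1..m."""
    history: History = []
    total = 0.0
    for _ in range(m):
        a = agent(history)   # the agent acts on the history only
        e = env(history, a)  # the environment sees the action, not the agent's code
        history.append((a, e))
        total += u(e)
    return total


if __name__ == "__main__":
    def alternating_agent(h: History) -> str:
        return "stay" if len(h) % 2 == 0 else "move"

    def pay_for_stay(h: History, a: str) -> str:
        return "reward" if a == "stay" else "nothing"

    def utility(e: str) -> float:
        return 1.0 if e == "reward" else 0.0

    print(run_episode(alternating_agent, pay_for_stay, utility, m=4))  # 2.0
```

The separation into two functions is the point of the dualistic model: nothing in `env` can inspect or depend on how `agent` computes its action.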


  6. Physicalistic Agent Model
     [Figure: agent (with a self-model and an environment model) and environment, with hidden state s, action a_t, and percept e_t]
     Goal: maximize expected utility $\mathbb{E}\big[\sum_{t=1}^{m} u(e_t)\big]$


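  8. Newcomb’s Problem
     Presented by [Nozick, 1969]. A highly reliable predictor has put $1,000,000 in an opaque box if and only if it predicted that you would take only the opaque box; a transparent box always contains $1,000.
     Actions: (1) take only the opaque box or (2) take both boxes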


  11. Reasoning Causally
      Causal decision theory (CDT): take the action that causes the best outcome
      $$\arg\max_{a \in \mathcal{A}} \sum_{e \in \mathcal{E}} \mu(e \mid do(a))\, u(e) \qquad \text{(CDT)}$$
      [Gibbard and Harper, 1978; Lewis, 1981; Skyrms, 1982; Joyce, 1999; Weirich, 2012]
      In Newcomb’s problem: whatever the opaque box already contains, taking both boxes causes you to have $1,000 more
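A minimal sketch of the CDT rule on Newcomb's problem, assuming the standard payoffs ($1,000,000 in the opaque box if and only if one-boxing was predicted, $1,000 in the transparent box), utility linear in dollars, and a prior `prior_one_box` over the hidden prediction s. All names are hypothetical. Because do(a) leaves the already-made prediction untouched, the belief about s is the same for both actions, so two-boxing comes out ahead by exactly $1,000 whatever the prior.

```python
# Standard Newcomb payoffs; utility is assumed linear in dollars, u(e) = e.
ACTIONS = ("one-box", "two-box")
PREDICTIONS = ("one-box", "two-box")  # hidden state s: the predictor's forecast


def payoff(s: str, a: str) -> int:
    """Outcome e in dollars for prediction s and action a."""
    opaque = 1_000_000 if s == "one-box" else 0
    transparent = 1_000 if a == "two-box" else 0
    return opaque + transparent


def cdt_value(a: str, prior_one_box: float) -> float:
    """sum_e mu(e | do(a)) u(e): intervening on a leaves the belief P(s) unchanged."""
    p_s = {"one-box": prior_one_box, "two-box": 1.0 - prior_one_box}
    return sum(p_s[s] * payoff(s, a) for s in PREDICTIONS)


def cdt_choice(prior_one_box: float) -> str:
    """argmax over actions of the causal expected utility."""
    return max(ACTIONS, key=lambda a: cdt_value(a, prior_one_box))


if __name__ == "__main__":
    for q in (0.0, 0.5, 0.99):
        print(q, cdt_choice(q))  # two-box for every prior q
```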


  14. Reasoning Evidentially
      Evidential decision theory (EDT): take the action that gives the best news about the outcome
      $$\arg\max_{a \in \mathcal{A}} \sum_{e \in \mathcal{E}} \mu(e \mid a)\, u(e) \qquad \text{(EDT)}$$
      [Jeffrey, 1983; Briggs, 2014; Ahmed, 2014]
      In Newcomb’s problem: taking just the opaque box is good news because that means it likely contains $1,000,000
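A matching sketch of the EDT rule, under the same assumptions (standard Newcomb payoffs, utility linear in dollars) plus one more: the predictor is right with probability `accuracy` whichever action is taken, so conditioning on the action shifts the belief about the prediction s. All names are hypothetical.

```python
ACTIONS = ("one-box", "two-box")


def payoff(s: str, a: str) -> int:
    """Outcome e in dollars for prediction s and action a (u(e) = e assumed)."""
    opaque = 1_000_000 if s == "one-box" else 0
    transparent = 1_000 if a == "two-box" else 0
    return opaque + transparent


def edt_value(a: str, accuracy: float) -> float:
    """sum_e mu(e | a) u(e): the action is evidence about the prediction s.

    Assumes P(s = a | a) = accuracy for both actions.
    """
    other = "two-box" if a == "one-box" else "one-box"
    p_s_given_a = {a: accuracy, other: 1.0 - accuracy}
    return sum(p * payoff(s, a) for s, p in p_s_given_a.items())


def edt_choice(accuracy: float) -> str:
    """argmax over actions of the evidential expected utility."""
    return max(ACTIONS, key=lambda a: edt_value(a, accuracy))


if __name__ == "__main__":
    print(edt_choice(0.99))  # one-box
    print(edt_choice(0.5))   # two-box: a coin-flip predictor carries no evidence
```

With `accuracy = 0.99` the evidential values are 990,000 for one-boxing and 11,000 for two-boxing.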


  20. Newcomblike Problems
      Newcomblike problems are problems where your actions are not independent of the (unobservable) environment state.
      Newcomblike problems are actually quite common!
      ◮ People predict each other all the time
      ◮ Prediction does not need to be perfect
      ◮ Example: an environment that knows your source code
      ◮ Example: a multi-agent setting with multiple copies of the same agent
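A short calculation of how imperfect the prediction can be, assuming the standard Newcomb payoffs, utility linear in dollars, and a predictor that is right with probability p whichever action you take. Conditioning on the action, as an evidential reasoner does:

```latex
\begin{aligned}
\mathbb{E}[u \mid \text{one-box}] &= p \cdot 1{,}000{,}000 \\
\mathbb{E}[u \mid \text{two-box}] &= (1-p) \cdot 1{,}001{,}000 + p \cdot 1{,}000 = (1-p) \cdot 1{,}000{,}000 + 1{,}000 \\
\mathbb{E}[u \mid \text{one-box}] > \mathbb{E}[u \mid \text{two-box}] &\iff (2p-1) \cdot 1{,}000{,}000 > 1{,}000 \iff p > 0.5005
\end{aligned}
```

So under these assumptions the action already carries decisive news about the hidden state for a predictor only slightly better than chance.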


  22. Sequential Decision Making


  24. The Causal Graph
      [Figure: causal graphs. One-shot: hidden state s, action a, percept e. Sequential: hidden state s, actions and percepts a_1, e_1, a_2, e_2, ...]

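  25. Notation
      ◮ $\text{æ}_{<t} = a_1 e_1 \dots a_{t-1} e_{t-1}$ denotes the history
      ◮ $\mu : (\mathcal{A} \times \mathcal{E})^* \times \mathcal{A} \to \Delta(\mathcal{E})$ denotes the environment model
      ◮ $\pi : (\mathcal{A} \times \mathcal{E})^* \to \mathcal{A}$ is my policy
      ◮ $m \in \mathbb{N}$ is the horizon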
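One way to read these signatures as Python type aliases, assuming actions and percepts are plain strings. The aliases and the helpers `extend`, `is_distribution`, `uniform_model`, and `repeat_policy` are hypothetical illustrations, not part of the slides.

```python
from typing import Callable, Dict, Tuple

Action = str
Percept = str
# æ_{<t} = a_1 e_1 ... a_{t-1} e_{t-1}: a finite sequence of (action, percept) pairs
History = Tuple[Tuple[Action, Percept], ...]
# µ : (A × E)* × A → Δ(E): a history and the current action map to a percept distribution
EnvModel = Callable[[History, Action], Dict[Percept, float]]
# π : (A × E)* → A: a deterministic policy maps histories to actions
Policy = Callable[[History], Action]


def extend(h: History, a: Action, e: Percept) -> History:
    """Append one interaction step: æ_{<t} a_t e_t."""
    return h + ((a, e),)


def is_distribution(d: Dict[Percept, float], tol: float = 1e-9) -> bool:
    """Check that a value of µ is a probability distribution over percepts."""
    return all(p >= 0.0 for p in d.values()) and abs(sum(d.values()) - 1.0) < tol


def uniform_model(h: History, a: Action) -> Dict[Percept, float]:
    """Toy environment model: percepts 'x' and 'y' are always equally likely."""
    return {"x": 0.5, "y": 0.5}


def repeat_policy(h: History) -> Action:
    """Toy policy: repeat the previous action, starting with 'a'."""
    return h[-1][0] if h else "a"


mu: EnvModel = uniform_model
pi: Policy = repeat_policy
```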


  28. Sequential Evidential Decision Theory
      Sequential action-evidential decision theory (SAEDT):
      $$V^{\text{aev}}(\text{æ}_{<t} a_t) := \sum_{e_t} \underbrace{\mu(e_t \mid \text{æ}_{<t} a_t)}_{\mu(e_t \mid \text{past},\, a_t)} \Big( u(e_t) + \underbrace{V^{\text{aev}}(\text{æ}_{<t} a_t e_t)}_{\text{future utility}} \Big)$$
      Sequential policy-evidential decision theory (SPEDT):
      $$V^{\text{pev}}(\text{æ}_{<t} a_t) := \sum_{e_t} \underbrace{\mu(e_t \mid \text{æ}_{<t} a_t, \pi_{t+1:m})}_{\mu(e_t \mid \text{past},\, \pi)} \Big( u(e_t) + \underbrace{V^{\text{pev}}(\text{æ}_{<t} a_t e_t)}_{\text{future utility}} \Big)$$
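A minimal expectimax sketch of the SAEDT recursion on a hypothetical repeated Newcomb problem: the hidden state s is a prediction fixed for all m rounds, each round pays the standard Newcomb payoff, and the percept is that payoff. Assumptions not taken from the slides: a uniform prior over s, a self-model in which the agent acts as predicted with probability `acc` (0 < acc < 1), u(e) = e, and the usual convention that the value of a history ending in a percept is the maximum over the next action (zero at the horizon m). SPEDT is not covered, since it additionally conditions on the future policy.

```python
from typing import Dict, Tuple

ACTIONS = ("one-box", "two-box")
STATES = ("one-box", "two-box")        # hidden state s: the (fixed) prediction
History = Tuple[Tuple[str, int], ...]  # ((a_1, e_1), ..., (a_{t-1}, e_{t-1}))


def payoff(s: str, a: str) -> int:
    """Percept e_t = dollars received in one round; u(e) = e is assumed."""
    return (1_000_000 if s == "one-box" else 0) + (1_000 if a == "two-box" else 0)


def self_model(a: str, s: str, acc: float) -> float:
    """Hypothetical self-model P(a | s): the agent acts as predicted with prob. acc."""
    return acc if a == s else 1.0 - acc


def posterior_aev(h: History, a_t: str, acc: float) -> Dict[str, float]:
    """P(s | æ_{<t} a_t): past actions, past percepts and the current action are evidence."""
    w = {}
    for s in STATES:
        p = 0.5 * self_model(a_t, s, acc)  # uniform prior times P(a_t | s)
        for a, e in h:
            p *= self_model(a, s, acc) * (1.0 if payoff(s, a) == e else 0.0)
        w[s] = p
    z = sum(w.values())
    return {s: p / z for s, p in w.items()}


def mu_aev(h: History, a_t: str, acc: float) -> Dict[int, float]:
    """µ(e_t | æ_{<t} a_t) = sum_s P(s | æ_{<t} a_t) P(e_t | s, a_t); drops zero-probability percepts."""
    dist: Dict[int, float] = {}
    for s, p in posterior_aev(h, a_t, acc).items():
        if p > 0.0:
            e = payoff(s, a_t)
            dist[e] = dist.get(e, 0.0) + p
    return dist


def v_aev(h: History, a_t: str, m: int, acc: float) -> float:
    """V^aev(æ_{<t} a_t) = sum_e µ(e | æ_{<t} a_t) (u(e) + V^aev(æ_{<t} a_t e))."""
    total = 0.0
    for e, p in mu_aev(h, a_t, acc).items():
        h_next = h + ((a_t, e),)
        future = 0.0 if len(h_next) == m else max(v_aev(h_next, a, m, acc) for a in ACTIONS)
        total += p * (e + future)
    return total


def saedt_action(h: History, m: int, acc: float) -> str:
    """Pick a_t maximizing V^aev(æ_{<t} a_t)."""
    return max(ACTIONS, key=lambda a: v_aev(h, a, m, acc))


if __name__ == "__main__":
    print(saedt_action((), m=2, acc=0.9))                         # one-box
    print(saedt_action((("one-box", 1_000_000),), m=2, acc=0.9))  # two-box
```

Once the round-1 payoff reveals s, the current action is no longer news about s, so this sketched agent switches to two-boxing in round 2.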


  31. Sequential Causal Decision Theory
      Sequential causal decision theory (SCDT):
      $$V^{\text{cau}}(\text{æ}_{<t} a_t) := \sum_{e_t \in \mathcal{E}} \underbrace{\mu(e_t \mid \text{æ}_{<t}, do(a_t))}_{\mu(e_t \mid \text{past},\, do(a_t))} \Big( u(e_t) + \underbrace{V^{\text{cau}}(\text{æ}_{<t} a_t e_t)}_{\text{future utility}} \Big)$$
      Proposition (Policy-Causal = Action-Causal). For all histories $\text{æ}_{<t}$ and percepts $e_t$:
      $$\mu(e_t \mid \text{æ}_{<t}, do(a_t)) = \mu(e_t \mid \text{æ}_{<t}, do(\pi_{t:m})).$$
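The same hypothetical repeated Newcomb problem under the SCDT recursion (fixed prediction s, standard per-round payoffs, uniform prior over s, self-model accuracy `acc` with 0 < acc < 1, u(e) = e, value of a history ending in a percept taken as the maximum over the next action). The extra assumption here is that s is the only confounder, so µ(e_t | æ_{<t}, do(a_t)) is computed by updating the belief about s on the past history only, not on the current action. All names are hypothetical.

```python
from typing import Dict, Tuple

ACTIONS = ("one-box", "two-box")
STATES = ("one-box", "two-box")        # hidden state s: the (fixed) prediction
History = Tuple[Tuple[str, int], ...]  # ((a_1, e_1), ..., (a_{t-1}, e_{t-1}))


def payoff(s: str, a: str) -> int:
    """Percept e_t = dollars received in one round; u(e) = e is assumed."""
    return (1_000_000 if s == "one-box" else 0) + (1_000 if a == "two-box" else 0)


def self_model(a: str, s: str, acc: float) -> float:
    """Hypothetical self-model P(a | s): the agent acts as predicted with prob. acc."""
    return acc if a == s else 1.0 - acc


def posterior_cau(h: History, acc: float) -> Dict[str, float]:
    """P(s | æ_{<t}): past actions and percepts are evidence; the current action is not."""
    w = {}
    for s in STATES:
        p = 0.5  # uniform prior over s
        for a, e in h:
            p *= self_model(a, s, acc) * (1.0 if payoff(s, a) == e else 0.0)
        w[s] = p
    z = sum(w.values())
    return {s: p / z for s, p in w.items()}


def mu_cau(h: History, a_t: str, acc: float) -> Dict[int, float]:
    """µ(e_t | æ_{<t}, do(a_t)) = sum_s P(s | æ_{<t}) P(e_t | s, a_t); drops zero-probability percepts."""
    dist: Dict[int, float] = {}
    for s, p in posterior_cau(h, acc).items():
        if p > 0.0:
            e = payoff(s, a_t)
            dist[e] = dist.get(e, 0.0) + p
    return dist


def v_cau(h: History, a_t: str, m: int, acc: float) -> float:
    """V^cau(æ_{<t} a_t) = sum_e µ(e | æ_{<t}, do(a_t)) (u(e) + V^cau(æ_{<t} a_t e))."""
    total = 0.0
    for e, p in mu_cau(h, a_t, acc).items():
        h_next = h + ((a_t, e),)
        future = 0.0 if len(h_next) == m else max(v_cau(h_next, a, m, acc) for a in ACTIONS)
        total += p * (e + future)
    return total


def scdt_action(h: History, m: int, acc: float) -> str:
    """Pick a_t maximizing V^cau(æ_{<t} a_t)."""
    return max(ACTIONS, key=lambda a: v_cau(h, a, m, acc))


if __name__ == "__main__":
    print(scdt_action((), m=2, acc=0.9))  # two-box
```

Here the sketched agent two-boxes from the first round, since with an empty history its belief about s is the prior for either action.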


  33. Examples
      Problem                 action-evidential   policy-evidential   causal
      Newcomb                 ✓                   ✓                   ×
      Newcomb w/ precommit    ×                   ✓                   ✓
      Newcomb w/ looking      ×                   ×                   ×
      Toxoplasmosis           ×                   ×                   ✓
      Seq. Toxoplasmosis      ×                   ×                   ✓
      Formal description in [Everitt et al., 2015] and source code at http://jan.leike.name


  41. Conclusion
      ◮ How should physicalistic agents make decisions?
      ◮ Answer from (philosophical) decision theory: EDT, CDT
      ◮ Extended to sequential decision making
      Which decision theory is better?
      ◮ In the end, what matters is whether you win (get the most utility)
      ◮ Neither EDT nor CDT models the environment as containing the agent itself
      ◮ Neither EDT nor CDT wins on every example
      ◮ How physicalistic agents should make decisions optimally remains an open problem
