Information Particle Filter Tree: An Online Algorithm for POMDPs with Belief-Based Rewards on Continuous Domains (PowerPoint presentation, ICML 2020)


  1. Information Particle Filter Tree: An Online Algorithm for POMDPs with Belief-Based Rewards on Continuous Domains. Johannes Fischer* and Ömer Sahin Tas* (*equal contribution). International Conference on Machine Learning 2020. www.kit.edu. KIT – The Research University in the Helmholtz Association.

  2. POMDPs: Model decision problems under uncertainty.

  3. POMDPs: Model decision problems under uncertainty. Cover uncertainties in models, in the environment, and in the future behavior of others.

  4. POMDPs: Model decision problems under uncertainty. Cover uncertainties in models, in the environment, and in the future behavior of others. Figure: Probabilistic graphical model of a POMDP.

  5. POMDPs: Model decision problems under uncertainty. Cover uncertainties in models, in the environment, and in the future behavior of others. Reasoning takes place in a high-dimensional belief space → difficult to solve! Figure: Probabilistic graphical model of a POMDP.

  6. POMDPs: Model decision problems under uncertainty. Cover uncertainties in models, in the environment, and in the future behavior of others. Reasoning takes place in a high-dimensional belief space → difficult to solve! Can POMDP solvers be improved by considering information? Figure: Probabilistic graphical model of a POMDP.
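
For reference, the belief update that makes exact reasoning in the belief space expensive is the standard Bayes filter. The symbols T (transition model), Z (observation model), and η (normalizer) below are our own notation, not taken from the slides.

    b'(s') = \eta \, Z(o \mid s', a) \int_{\mathcal{S}} T(s' \mid s, a) \, b(s) \, \mathrm{d}s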

  7. Information Measures: The optimal value function V* and information measures have a similar shape → "more information = higher value". Figure: Shape of the optimal value function and the negative entropy.

  8. Information Measures: The optimal value function V* and information measures have a similar shape → "more information = higher value". Motivation: speed up planning and allow active information gathering. Figure: Shape of the optimal value function and the negative entropy.
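
The negative differential entropy is the canonical such information measure: it is largest for peaked (certain) beliefs and smallest for diffuse ones. The formula below is the textbook definition; that the talk uses exactly this measure is our assumption.

    \mathcal{I}(b) = -H(b) = \int_{\mathcal{S}} b(s) \, \log b(s) \, \mathrm{d}s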

  9. ρPOMDPs: Figure: Probabilistic graphical model of a POMDP.

  10. ρPOMDPs: An extension of the POMDP framework with a belief-dependent reward model [1]. Figure: Probabilistic graphical model of a POMDP. [1] Araya-López et al., "A POMDP Extension with Belief-dependent Rewards," 2010.

  11. ρPOMDPs: An extension of the POMDP framework with a belief-dependent reward model [1]. Solvers exist only for discrete problems with piecewise linear and convex rewards, and they compute policies offline. Figure: Probabilistic graphical model of a POMDP. [1] Araya-López et al., "A POMDP Extension with Belief-dependent Rewards," 2010.

  12. ρPOMDPs: An extension of the POMDP framework with a belief-dependent reward model [1]. Solvers exist only for discrete problems with piecewise linear and convex rewards, and they compute policies offline. How can POMDPs on continuous domains be solved online? Figure: Probabilistic graphical model of a POMDP. [1] Araya-López et al., "A POMDP Extension with Belief-dependent Rewards," 2010.
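
A common way to instantiate such a belief-dependent reward is to add an information term to the expected state reward. The weight λ and the use of negative entropy below are illustrative assumptions, not necessarily the exact form used in the talk.

    \rho(b, a) = \int_{\mathcal{S}} r(s, a) \, b(s) \, \mathrm{d}s \; + \; \lambda \, \bigl( -H(b) \bigr)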

  13. Approach - Information Particle Filter Tree: Adapt an MCTS-based POMDP solver, approximate the belief by particle sets, and evaluate information measures directly on those particle sets. Figure: Simulation phase of IPFT.

  14. Approach - Information Particle Filter Tree: Adapt an MCTS-based POMDP solver, approximate the belief by particle sets, and evaluate information measures directly on those particle sets → an online, anytime algorithm for continuous problems. Figure: Simulation phase of IPFT.
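
A minimal sketch of the kind of particle belief propagation such an approach relies on: a generic bootstrap particle filter step. The callables transition_sample and observation_likelihood are assumptions of ours, standing in for the problem's generative model.

    import numpy as np

    def particle_filter_step(particles, weights, action, observation,
                             transition_sample, observation_likelihood, rng=None):
        """One generic particle belief update: propagate every particle through
        the (assumed) generative transition model, then reweight each particle
        by the likelihood of the observation that was actually received."""
        rng = np.random.default_rng() if rng is None else rng
        propagated = np.array([transition_sample(s, action, rng) for s in particles])
        likelihoods = np.array([observation_likelihood(observation, s, action)
                                for s in propagated])
        new_weights = weights * likelihoods
        new_weights = new_weights / new_weights.sum()  # normalize the belief
        return propagated, new_weights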

  15. Potential-Based Reward Shaping: In general, reward shaping changes the optimal policy.

  16. Potential-Based Reward Shaping: In general, reward shaping changes the optimal policy. BUT: the optimal policy is invariant under potential-based reward shaping in the infinite-horizon case [2]. [2] Eck et al., "Potential-based reward shaping for finite horizon online POMDP planning," 2016.

  17. Potential-Based Reward Shaping: In general, reward shaping changes the optimal policy. BUT: the optimal policy is invariant under potential-based reward shaping in the infinite-horizon case [2]. V* serves as a particularly effective potential. [2] Eck et al., "Potential-based reward shaping for finite horizon online POMDP planning," 2016.
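
In potential-based shaping the reward is augmented by the discounted difference of a potential function Φ over beliefs; this is the standard formulation from the shaping literature, written here in our own notation. With Φ = V*, the shaped rewards already encode the long-term value, which is the sense in which V* serves as a particularly effective potential.

    F(b, a, b') = \gamma \, \Phi(b') - \Phi(b), \qquad \rho_{\Phi}(b, a, b') = \rho(b, a) + F(b, a, b')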

  18. Information-Theoretic Reward Shaping: Information measures have a similar shape to V* and are convex on the belief space → use them as a heuristic for V*. Figure: Shape of the optimal value function and the negative entropy.

  19. Information-Theoretic Reward Shaping: Information measures have a similar shape to V* and are convex on the belief space → use them as a heuristic for V*. This yields two potential-based shaping functions, discounted information gain and undiscounted information gain (see the formulation below). Figure: Shape of the optimal value function and the negative entropy.
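
One plausible formalization of the two shaping functions, assuming the negative entropy -H(b) is used as the potential; the exact measure and discounting in the talk may differ.

    \text{discounted:} \quad F_{\gamma}(b, a, b') = \gamma \bigl(-H(b')\bigr) - \bigl(-H(b)\bigr) = H(b) - \gamma H(b')
    \text{undiscounted:} \quad F_{1}(b, a, b') = H(b) - H(b')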

  21. Solving POMDPs in Continuous Domains: Based on the Particle Filter Tree (PFT) algorithm [3]. MCTS → continuous states; double progressive widening (DPW) → continuous actions & observations. [3] Sunberg and Kochenderfer, "Online Algorithms for POMDPs with Continuous State, Action, and Observation Spaces," 2018. Figure: Simulation phase of PFT.

  22. Solving POMDPs in Continuous Domains: Based on the Particle Filter Tree (PFT) algorithm [3]. MCTS → continuous states; double progressive widening (DPW) → continuous actions & observations. PFT solves the belief MDP on small weighted particle sets and updates node values with the mean particle return (a sketch of the DPW criterion follows below). [3] Sunberg and Kochenderfer, "Online Algorithms for POMDPs with Continuous State, Action, and Observation Spaces," 2018. Figure: Simulation phase of PFT.
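
A minimal sketch of the double progressive widening test used in MCTS variants such as PFT-DPW; the constants k and alpha are hypothetical hyperparameters, and the exact widening rule in the cited algorithm may differ slightly.

    def dpw_allows_new_child(num_children: int, num_visits: int,
                             k: float = 4.0, alpha: float = 0.5) -> bool:
        """Double progressive widening: a node visited n times may hold at most
        k * n**alpha children; only while below that bound is a new action or
        observation sampled, otherwise an existing child is revisited."""
        return num_children < k * max(num_visits, 1) ** alpha

    # Example: with k=4 and alpha=0.5, a node visited 25 times can widen
    # to at most 4 * sqrt(25) = 20 children.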

  23. Solving POMDPs in Continuous Domains - Information Particle Filter Tree (IPFT): A particle set approximates the belief. Figure: Simulation phase of IPFT.

  24. Solving POMDPs in Continuous Domains - Information Particle Filter Tree (IPFT): A particle set approximates the belief, and information measures are evaluated on the weighted particle sets. Figure: Simulation phase of IPFT.

  25. Solving POMDPs in Continuous Domains - Information Particle Filter Tree (IPFT): A particle set approximates the belief, and information measures are evaluated on the weighted particle sets, e.g. with a particle-based kernel density estimate (see the sketch below). Figure: Simulation phase of IPFT.
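
A minimal sketch of how a belief entropy estimate over a weighted particle set can be obtained with a Gaussian kernel density estimate; the bandwidth choice, the 1-D state, and the exact estimator are illustrative assumptions rather than the paper's implementation.

    import numpy as np

    def kde_entropy(particles: np.ndarray, weights: np.ndarray,
                    bandwidth: float = 0.1) -> float:
        """Estimate the differential entropy H(b) of a 1-D belief represented
        by weighted particles, using a Gaussian kernel density estimate:
            b(s) ~= sum_j w_j * N(s; s_j, bandwidth^2)
            H(b) ~= -sum_i w_i * log b(s_i)
        """
        weights = weights / weights.sum()
        diffs = particles[:, None] - particles[None, :]               # (N, N) pairwise differences
        kernel = np.exp(-0.5 * (diffs / bandwidth) ** 2) / (bandwidth * np.sqrt(2 * np.pi))
        density_at_particles = kernel @ weights                       # b(s_i) for each particle
        return float(-(weights * np.log(density_at_particles + 1e-12)).sum())

    # A tightly clustered belief yields a lower entropy estimate than a diffuse one:
    rng = np.random.default_rng(0)
    w = np.ones(200) / 200
    print(kde_entropy(rng.normal(0.0, 0.1, 200), w))   # relatively certain belief
    print(kde_entropy(rng.normal(0.0, 1.0, 200), w))   # relatively uncertain belief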
