Information Particle Filter Tree: An Online Algorithm for POMDPs with Belief-Based Rewards on Continuous Domains – PowerPoint PPT Presentation


SLIDE 1

Information Particle Filter Tree: An Online Algorithm for POMDPs with Belief-Based Rewards on Continuous Domains

Johannes Fischer* and Ömer Sahin Tas* (*equal contribution)

KIT – The Research University in the Helmholtz Association · www.kit.edu

International Conference on Machine Learning 2020

SLIDES 2–6

Information Particle Filter Tree Algorithm for Continuous POMDPs · Introduction · IPFT · Experiments · Conclusion · Reward Shaping · ICML, July 2020

POMDPs

Figure: Probabilistic graphical model of a POMDP.

POMDPs model decision problems under uncertainty. They cover uncertainties in:

  • models
  • the environment
  • the future behavior of others

Reasoning takes place in a high-dimensional belief space → difficult to solve!

Can POMDP solvers be improved by considering information?
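The phrase "reasoning in belief space" can be grounded with a minimal particle-filter belief update in Python. This is an illustrative sketch, not code from the talk; the 1D transition and observation models are hypothetical placeholders.

```python
import math
import random

def particle_filter_update(particles, action, observation,
                           transition, obs_likelihood):
    """One bootstrap particle filter step: propagate, weight, resample."""
    # Propagate every particle through the stochastic transition model.
    predicted = [transition(s, action) for s in particles]
    # Weight each predicted particle by the observation likelihood.
    weights = [obs_likelihood(observation, s) for s in predicted]
    if sum(weights) == 0.0:
        return predicted  # degenerate case: keep the unweighted prediction
    # Resample with replacement proportionally to the weights.
    return random.choices(predicted, weights=weights, k=len(particles))

# Hypothetical 1D models: noisy additive dynamics, Gaussian observations.
def transition(s, a):
    return s + a + random.gauss(0.0, 0.1)

def obs_likelihood(o, s, sigma=0.5):
    return math.exp(-0.5 * ((o - s) / sigma) ** 2)

random.seed(0)
belief = [random.uniform(-1.0, 1.0) for _ in range(1000)]
belief = particle_filter_update(belief, 0.5, 0.6, transition, obs_likelihood)
mean = sum(belief) / len(belief)  # posterior concentrates near the observation
```

Even this toy example shows why planning over beliefs is hard: the "state" of the planner is an entire particle set, not a single point.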

SLIDES 7–8

Information Measures

The optimal value function 𝑉∗ and information measures have a similar shape:
→ "more information = higher value"

Figure: Shape of optimal value function and negative entropy.

Motivation:

  • speed up planning
  • allow active information gathering

SLIDES 9–12

POMDPs with Belief-Dependent Rewards

Figure: Probabilistic graphical model of a POMDP.

Extension of the POMDP framework: a belief-dependent reward model [1].

Solvers exist only for:

  • discrete problems
  • piecewise linear and convex
  • offline computation

How can POMDPs on continuous domains be solved online?

[1] Araya-López et al., "A POMDP Extension with Belief-dependent Rewards" (2010)
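A belief-dependent reward can be made concrete with a small sketch: a negative-entropy reward that pays more for more certain beliefs, a standard choice in the framework of [1]. The discrete belief here is illustrative, not from the slides.

```python
import math

def neg_entropy_reward(belief):
    """Belief-dependent reward rho(b) = -H(b) for a discrete belief.

    More certain (lower-entropy) beliefs earn a higher reward, which is
    what lets a planner value information gathering directly."""
    return sum(p * math.log(p) for p in belief if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]   # maximally uncertain belief
peaked  = [0.97, 0.01, 0.01, 0.01]   # nearly certain belief

assert neg_entropy_reward(peaked) > neg_entropy_reward(uniform)
```

Such a reward depends on the belief itself, not on any single state, which is exactly why classical state-reward POMDP solvers cannot handle it directly.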

SLIDES 13–14

Approach – Information Particle Filter Tree

  • adapt an MCTS-based POMDP solver
  • approximate the belief by particles
  • evaluate n particle sets

→ Online anytime algorithm
→ Continuous problems

Figure: Simulation phase of IPFT.

SLIDES 15–17

Potential-Based Reward Shaping

In general, reward shaping changes the optimal policy.

BUT: the optimal policy is invariant under potential-based reward shaping for infinite horizons [2].

𝑉∗ serves as a particularly effective potential.

[2] Eck et al., "Potential-based reward shaping for finite horizon online POMDP planning" (2016)
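The invariance claim can be checked numerically: with a potential-based shaping term F(b, b′) = γΦ(b′) − Φ(b), the shaping contributions telescope, so the discounted return changes only by the policy-independent constant −Φ(b₀). The rewards and potential values below are arbitrary made-up numbers for illustration.

```python
def shaped_rewards(rewards, potentials, gamma):
    """Potential-based shaping: r'_t = r_t + gamma*Phi(b_{t+1}) - Phi(b_t)."""
    return [r + gamma * potentials[t + 1] - potentials[t]
            for t, r in enumerate(rewards)]

def discounted_return(rewards, gamma):
    return sum(r * gamma ** t for t, r in enumerate(rewards))

gamma = 0.9
rewards = [1.0, 0.0, 2.0, -1.0]
# Phi(b_0) .. Phi(b_4); terminal potential 0 so the telescope closes
# exactly (for infinite horizons the discounted tail vanishes instead).
potentials = [3.0, 1.5, 0.2, 4.0, 0.0]

G  = discounted_return(rewards, gamma)
Gs = discounted_return(shaped_rewards(rewards, potentials, gamma), gamma)
# The shaped return differs from the original by exactly -Phi(b_0),
# a constant independent of the policy, so argmax over policies is unchanged.
assert abs((Gs - G) + potentials[0]) < 1e-9
```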

SLIDES 18–20

Information-Theoretic Reward Shaping

Information measures have a shape similar to 𝑉∗ and are convex on the belief space.
→ Use them as a heuristic for 𝑉∗.

Two potential-based shaping functions:

  • discounted information gain
  • undiscounted information gain

Figure: Shape of optimal value function and negative entropy.

SLIDES 21–22

Solving POMDPs in Continuous Domains

Based on the Particle Filter Tree (PFT) algorithm [3]:

  • MCTS → continuous states
  • Double Progressive Widening (DPW) → continuous actions & observations
  • solves the belief MDP
  • small weighted particle sets
  • update with the mean particle return

Figure: Simulation phase of PFT.

[3] Sunberg and Kochenderfer, "Online Algorithms for POMDPs with Continuous State, Action, and Observation Spaces" (2018)
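The double progressive widening rule used here can be sketched as a single criterion that caps the number of sampled action/observation children as a function of the node's visit count; the constants k and α below are hypothetical tuning parameters, not values from the talk.

```python
def may_expand(num_children, num_visits, k=4.0, alpha=0.5):
    """Progressive widening criterion: a tree node may add a new sampled
    child (action or observation) only while
        |children| <= k * N(node)**alpha,
    so the branching factor grows sublinearly in the visit count and the
    search tree stays finite even over continuous spaces. Applying the
    rule at both action and observation layers gives *double* PW."""
    return num_children <= k * num_visits ** alpha

# With k=4, alpha=0.5: a node visited 100 times allows at most ~40 children.
assert may_expand(num_children=3, num_visits=1)
assert not may_expand(num_children=41, num_visits=100)
```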

SLIDES 23–27

Solving POMDPs in Continuous Domains – Information Particle Filter Tree (IPFT)

  • a particle set approximates the belief
  • evaluate n weighted particle sets
  • particle-based kernel density estimate
  • averaging over many particle sets leads to a better entropy estimate

→ IPFT can solve arbitrary POMDPs on continuous domains.

Figure: Simulation phase of IPFT.
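The entropy-estimation idea can be sketched as follows: build a Gaussian kernel density estimate from a small particle set, take a Monte-Carlo estimate of the differential entropy from the same particles, and average the estimator over several independent sets to reduce its variance. The bandwidth, set size, and number of sets below are illustrative choices, not the paper's settings.

```python
import math
import random

def kde_entropy(particles, bandwidth=0.25):
    """Monte-Carlo entropy estimate H(b) ~ -(1/m) * sum_i log p_KDE(x_i),
    where p_KDE is a Gaussian kernel density estimate built from the
    same particle set."""
    m = len(particles)
    norm = 1.0 / (m * bandwidth * math.sqrt(2 * math.pi))
    total = 0.0
    for x in particles:
        density = norm * sum(math.exp(-0.5 * ((x - y) / bandwidth) ** 2)
                             for y in particles)
        total += math.log(density)
    return -total / m

random.seed(1)
# Average the estimator over several small particle sets, as IPFT does,
# instead of relying on a single noisy estimate.
estimates = [kde_entropy([random.gauss(0.0, 1.0) for _ in range(20)])
             for _ in range(30)]
avg_entropy = sum(estimates) / len(estimates)
# For reference: the true differential entropy of N(0,1) is
# 0.5 * log(2*pi*e), roughly 1.42.
```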

SLIDES 28–30

Experiments – Light Dark

  • Goal: execute 𝑏 = 0 at 𝑡 = 0
  • Consider action spaces

Continuous variant:

  • continuous state space
  • transition noise
  • increased observation noise

Figure: Light Dark environment.
Figure: Continuous Light Dark environment.
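The defining feature of Light Dark, namely that observations are informative only near the light, can be sketched with a hypothetical 1D observation model; the noise schedule below is illustrative, not the benchmark's exact definition.

```python
import random
import statistics

def light_dark_observe(x, light=5.0):
    """Hypothetical Light Dark observation: the agent observes its position
    with Gaussian noise whose standard deviation grows with the distance
    to the 'light' region, so it can localize only near the light."""
    sigma = 0.5 + abs(x - light)  # illustrative noise schedule
    return random.gauss(x, sigma)

random.seed(0)
near_light  = [light_dark_observe(5.0) for _ in range(2000)]
in_the_dark = [light_dark_observe(0.0) for _ in range(2000)]
noise_near = statistics.pstdev(near_light)
noise_dark = statistics.pstdev(in_the_dark)
# Observations are far more informative near the light, which is why an
# information-seeking planner first detours to the light to localize.
```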

SLIDES 31–33

Results – Light Dark

Table: Mean reward and standard deviation of 1000 simulations.

Figure: Exemplary trajectories of POMCPOW (left) and IPFT (right) in the Continuous Light Dark problem.

SLIDES 34–35

Laser Tag

Figure: Laser Tag problem.

Table: Mean reward and standard deviation of 1000 simulations.

SLIDE 36

Hyperparameter Sensitivity Analysis

Figure: Mean reward and standard deviation of 1000 simulations of the Continuous Light Dark problem for different parameters.

SLIDES 37–40

Conclusion

Can POMDP solvers be improved by considering information?
→ Information-theoretic reward shaping helps by guiding the agent to informative beliefs.

How can POMDPs on continuous domains be solved online?
→ IPFT combines the PFT algorithm with POMDPs with belief-based rewards: a general online solver for continuous POMDPs.

Figure: Simulation phase of IPFT.