Information Particle Filter Tree: An Online Algorithm for POMDPs with Belief-Based Rewards on Continuous Domains – PowerPoint PPT Presentation


SLIDE 1

Information Particle Filter Tree: An Online Algorithm for POMDPs with Belief-Based Rewards on Continuous Domains

Johannes Fischer* and Ömer Sahin Tas* (*equal contribution)

KIT – The Research University in the Helmholtz Association · www.kit.edu

International Conference on Machine Learning 2020

SLIDES 2–6

Information Particle Filter Tree Algorithm for Continuous POMDPs · Introduction · IPFT · Experiments · Conclusion · Reward Shaping · ICML, July 2020

POMDPs

Figure: Probabilistic graphical model of a POMDP.

POMDPs model decision problems under uncertainty. They cover uncertainties in:

  • models
  • the environment
  • the future behavior of others

Reasoning takes place in a high-dimensional belief space → difficult to solve!

Can POMDP solvers be improved by considering information?
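The phrase "reasoning in belief space" can be grounded with a minimal particle-filter belief update in Python. This is an illustrative sketch, not code from the talk; the 1D transition and observation models are hypothetical placeholders.

```python
import math
import random

def particle_filter_update(particles, action, observation,
                           transition, obs_likelihood):
    """One bootstrap particle filter step: propagate, weight, resample."""
    # Propagate every particle through the stochastic transition model.
    predicted = [transition(s, action) for s in particles]
    # Weight each predicted particle by the observation likelihood.
    weights = [obs_likelihood(observation, s) for s in predicted]
    if sum(weights) == 0.0:
        return predicted  # degenerate case: keep the unweighted prediction
    # Resample with replacement proportionally to the weights.
    return random.choices(predicted, weights=weights, k=len(particles))

# Hypothetical 1D models: noisy additive dynamics, Gaussian observations.
def transition(s, a):
    return s + a + random.gauss(0.0, 0.1)

def obs_likelihood(o, s, sigma=0.5):
    return math.exp(-0.5 * ((o - s) / sigma) ** 2)

random.seed(0)
belief = [random.uniform(-1.0, 1.0) for _ in range(1000)]
belief = particle_filter_update(belief, 0.5, 0.6, transition, obs_likelihood)
mean = sum(belief) / len(belief)  # posterior concentrates near the observation
```

Even this toy example shows why planning over beliefs is hard: the "state" of the planner is an entire particle set, not a single point.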

SLIDES 7–8

Information Measures

The optimal value function 𝑉∗ and information measures have a similar shape:
→ "more information = higher value"

Figure: Shape of optimal value function and negative entropy.

Motivation:

  • speed up planning
  • allow active information gathering

SLIDES 9–12

POMDPs with Belief-Dependent Rewards

Figure: Probabilistic graphical model of a POMDP.

Extension of the POMDP framework: a belief-dependent reward model [1].

Solvers exist only for:

  • discrete problems
  • piecewise linear and convex
  • offline computation

How can POMDPs on continuous domains be solved online?

[1] Araya-López et al., "A POMDP Extension with Belief-dependent Rewards" (2010)
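A belief-dependent reward can be made concrete with a small sketch: a negative-entropy reward that pays more for more certain beliefs, a standard choice in the framework of [1]. The discrete belief here is illustrative, not from the slides.

```python
import math

def neg_entropy_reward(belief):
    """Belief-dependent reward rho(b) = -H(b) for a discrete belief.

    More certain (lower-entropy) beliefs earn a higher reward, which is
    what lets a planner value information gathering directly."""
    return sum(p * math.log(p) for p in belief if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]   # maximally uncertain belief
peaked  = [0.97, 0.01, 0.01, 0.01]   # nearly certain belief

assert neg_entropy_reward(peaked) > neg_entropy_reward(uniform)
```

Such a reward depends on the belief itself, not on any single state, which is exactly why classical state-reward POMDP solvers cannot handle it directly.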

SLIDES 13–14

Approach – Information Particle Filter Tree

  • adapt an MCTS-based POMDP solver
  • approximate the belief by particles
  • evaluate n particle sets

→ Online anytime algorithm
→ Continuous problems

Figure: Simulation phase of IPFT.

SLIDES 15–17

Potential-Based Reward Shaping

In general, reward shaping changes the optimal policy.

BUT: the optimal policy is invariant under potential-based reward shaping for infinite horizons [2].

𝑉∗ serves as a particularly effective potential.

[2] Eck et al., "Potential-based reward shaping for finite horizon online POMDP planning" (2016)
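The invariance claim can be checked numerically: with a potential-based shaping term F(b, b′) = γΦ(b′) − Φ(b), the shaping contributions telescope, so the discounted return changes only by the policy-independent constant −Φ(b₀). The rewards and potential values below are arbitrary made-up numbers for illustration.

```python
def shaped_rewards(rewards, potentials, gamma):
    """Potential-based shaping: r'_t = r_t + gamma*Phi(b_{t+1}) - Phi(b_t)."""
    return [r + gamma * potentials[t + 1] - potentials[t]
            for t, r in enumerate(rewards)]

def discounted_return(rewards, gamma):
    return sum(r * gamma ** t for t, r in enumerate(rewards))

gamma = 0.9
rewards = [1.0, 0.0, 2.0, -1.0]
# Phi(b_0) .. Phi(b_4); terminal potential 0 so the telescope closes
# exactly (for infinite horizons the discounted tail vanishes instead).
potentials = [3.0, 1.5, 0.2, 4.0, 0.0]

G  = discounted_return(rewards, gamma)
Gs = discounted_return(shaped_rewards(rewards, potentials, gamma), gamma)
# The shaped return differs from the original by exactly -Phi(b_0),
# a constant independent of the policy, so argmax over policies is unchanged.
assert abs((Gs - G) + potentials[0]) < 1e-9
```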

SLIDES 18–20

Information-Theoretic Reward Shaping

Information measures have a shape similar to 𝑉∗ and are convex on the belief space.
→ Use them as a heuristic for 𝑉∗.

Two potential-based shaping functions:

  • discounted information gain
  • undiscounted information gain

Figure: Shape of optimal value function and negative entropy.

SLIDES 21–22

Solving POMDPs in Continuous Domains

Based on the Particle Filter Tree (PFT) algorithm [3]:

  • MCTS → continuous states
  • Double Progressive Widening (DPW) → continuous actions & observations
  • solves the belief MDP
  • small weighted particle sets
  • update with the mean particle return

Figure: Simulation phase of PFT.

[3] Sunberg and Kochenderfer, "Online Algorithms for POMDPs with Continuous State, Action, and Observation Spaces" (2018)
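The double progressive widening rule used here can be sketched as a single criterion that caps the number of sampled action/observation children as a function of the node's visit count; the constants k and α below are hypothetical tuning parameters, not values from the talk.

```python
def may_expand(num_children, num_visits, k=4.0, alpha=0.5):
    """Progressive widening criterion: a tree node may add a new sampled
    child (action or observation) only while
        |children| <= k * N(node)**alpha,
    so the branching factor grows sublinearly in the visit count and the
    search tree stays finite even over continuous spaces. Applying the
    rule at both action and observation layers gives *double* PW."""
    return num_children <= k * num_visits ** alpha

# With k=4, alpha=0.5: a node visited 100 times allows at most ~40 children.
assert may_expand(num_children=3, num_visits=1)
assert not may_expand(num_children=41, num_visits=100)
```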

SLIDES 23–27

Solving POMDPs in Continuous Domains – Information Particle Filter Tree (IPFT)

  • a particle set approximates the belief
  • evaluate n weighted particle sets
  • particle-based kernel density estimate
  • averaging over many particle sets leads to a better entropy estimate

→ IPFT can solve arbitrary POMDPs on continuous domains.

Figure: Simulation phase of IPFT.
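The entropy-estimation idea can be sketched as follows: build a Gaussian kernel density estimate from a small particle set, take a Monte-Carlo estimate of the differential entropy from the same particles, and average the estimator over several independent sets to reduce its variance. The bandwidth, set size, and number of sets below are illustrative choices, not the paper's settings.

```python
import math
import random

def kde_entropy(particles, bandwidth=0.25):
    """Monte-Carlo entropy estimate H(b) ~ -(1/m) * sum_i log p_KDE(x_i),
    where p_KDE is a Gaussian kernel density estimate built from the
    same particle set."""
    m = len(particles)
    norm = 1.0 / (m * bandwidth * math.sqrt(2 * math.pi))
    total = 0.0
    for x in particles:
        density = norm * sum(math.exp(-0.5 * ((x - y) / bandwidth) ** 2)
                             for y in particles)
        total += math.log(density)
    return -total / m

random.seed(1)
# Average the estimator over several small particle sets, as IPFT does,
# instead of relying on a single noisy estimate.
estimates = [kde_entropy([random.gauss(0.0, 1.0) for _ in range(20)])
             for _ in range(30)]
avg_entropy = sum(estimates) / len(estimates)
# For reference: the true differential entropy of N(0,1) is
# 0.5 * log(2*pi*e), roughly 1.42.
```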

SLIDES 28–30

Experiments – Light Dark

  • Goal: execute 𝑏 = 0 at 𝑡 = 0
  • Consider action spaces

Continuous variant:

  • continuous state space
  • transition noise
  • increased observation noise

Figure: Light Dark environment.
Figure: Continuous Light Dark environment.
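The defining feature of Light Dark, namely that observations are informative only near the light, can be sketched with a hypothetical 1D observation model; the noise schedule below is illustrative, not the benchmark's exact definition.

```python
import random
import statistics

def light_dark_observe(x, light=5.0):
    """Hypothetical Light Dark observation: the agent observes its position
    with Gaussian noise whose standard deviation grows with the distance
    to the 'light' region, so it can localize only near the light."""
    sigma = 0.5 + abs(x - light)  # illustrative noise schedule
    return random.gauss(x, sigma)

random.seed(0)
near_light  = [light_dark_observe(5.0) for _ in range(2000)]
in_the_dark = [light_dark_observe(0.0) for _ in range(2000)]
noise_near = statistics.pstdev(near_light)
noise_dark = statistics.pstdev(in_the_dark)
# Observations are far more informative near the light, which is why an
# information-seeking planner first detours to the light to localize.
```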

SLIDES 31–33

Results – Light Dark

Table: Mean reward and standard deviation of 1000 simulations.

Figure: Exemplary trajectories of POMCPOW (left) and IPFT (right) in the Continuous Light Dark problem.

SLIDES 34–35

Laser Tag

Figure: Laser Tag problem.

Table: Mean reward and standard deviation of 1000 simulations.

SLIDE 36

Hyperparameter Sensitivity Analysis

Figure: Mean reward and standard deviation of 1000 simulations of the Continuous Light Dark problem for different parameters.

SLIDES 37–40

Conclusion

Can POMDP solvers be improved by considering information?
→ Information-theoretic reward shaping helps by guiding the agent to informative beliefs.

How can POMDPs on continuous domains be solved online?
→ IPFT combines the PFT algorithm with POMDPs with belief-based rewards: a general online solver for continuous POMDPs.

Figure: Simulation phase of IPFT.