Planning and Optimization
G4. Asymptotically Suboptimal Monte-Carlo Methods
Gabriele Röger and Thomas Keller
Universität Basel
December 5, 2018

Content: Motivation, Monte-Carlo Methods, HOP, Policy Simulation, Sparse Sampling, Summary
$$\hat{V}^k(s) = \frac{1}{N^k(s)} \sum_{i=1}^{k} C_i(s)$$
where $N^k(s) \le k$ counts the state-value estimates for state $s$ in the first $k$ iterations of the algorithm and $C_k(s)$ is the cost of the $k$-th iteration for state $s$ (assuming $C_i(s) = 0$ for iterations without an estimate for $s$).
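The Monte-Carlo backup is a per-state running average of the iteration costs. The following Python sketch is a minimal illustration under our own assumptions: the class `MonteCarloBackup` and its methods are hypothetical, and each iteration is assumed to report a cost only for the states it actually estimated. All other states implicitly get $C_k(s) = 0$ and their counter $N^k(s)$ is not incremented, so their estimate is unchanged.

```python
from collections import defaultdict
from typing import Dict, Hashable


class MonteCarloBackup:
    """Hypothetical helper: V^k(s) = (1 / N^k(s)) * sum_{i=1..k} C_i(s)."""

    def __init__(self) -> None:
        self.count: Dict[Hashable, int] = defaultdict(int)      # N^k(s)
        self.total: Dict[Hashable, float] = defaultdict(float)  # sum_i C_i(s)

    def record_iteration(self, costs: Dict[Hashable, float]) -> None:
        """Record one iteration: `costs` maps every state estimated in this
        iteration to C_k(s). Missing states contribute C_k(s) = 0 and keep
        their counter, so their average is unaffected."""
        for state, cost in costs.items():
            self.count[state] += 1
            self.total[state] += cost

    def value(self, state: Hashable) -> float:
        """Current estimate V^k(s); only defined once N^k(s) > 0."""
        n = self.count.get(state, 0)
        if n == 0:
            raise KeyError(f"no estimate for state {state!r} yet")
        return self.total[state] / n


if __name__ == "__main__":
    backup = MonteCarloBackup()
    backup.record_iteration({"s0": 7.0, "s1": 6.0})
    backup.record_iteration({"s0": 9.0})  # no estimate for s1 in this iteration
    print(backup.value("s0"), backup.value("s1"))  # 8.0 6.0
```

Storing the sum and the counter (rather than the list of all costs) keeps each backup constant-time per state.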
[Figure: Monte-Carlo backups on an example search tree rooted at s0; the node labels show the costs sampled in each iteration and the resulting state-value estimates, averaged over successive iterations]
The determinized problems can be solved with domain-dependent knowledge (e.g., in games like Bridge or Skat) or with a classical planner (FF-Hindsight, Yoon et al., 2008).
[Figure: example MDP with states s0 to s6 and actions a1 and a2 applicable in s0; outcome probabilities 2/5 and 3/5; costs 10, 20 and 6]

[Figure: the two determinizations of this MDP, sampled with probability 60% and 40%]
With $k \to \infty$: $\hat{Q}^k(s_0, a_1) \to 4$ and $\hat{Q}^k(s_0, a_2) \to 6$.
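To illustrate why hindsight optimization is only asymptotically suboptimal, the following Python sketch computes the HOP estimate on a small hypothetical MDP (not the example from the figure): in s0, action a2 reaches the goal at cost 6, while a1 leads at cost 0 to s1, where two actions b and c each cost 0 or 20 with probability 1/2. A determinization fixes one outcome per state-action pair, and HOP averages the optimal costs of the determinizations, so in hindsight it always "knows" which of b and c turns out cheap. All names (`MDP`, `hop_q`, `hop_estimate`, ...) are hypothetical, and enumerating every determinization is only feasible for toy problems like this one.

```python
import itertools
import math
import random
from typing import Dict, Iterator, List, Tuple

# Hypothetical toy MDP: state -> action -> [(probability, cost, successor)].
# States without actions ("goal") are terminal with cost-to-go 0.
Outcome = Tuple[float, float, str]
MDP: Dict[str, Dict[str, List[Outcome]]] = {
    "s0": {"a1": [(1.0, 0.0, "s1")], "a2": [(1.0, 6.0, "goal")]},
    "s1": {
        "b": [(0.5, 0.0, "goal"), (0.5, 20.0, "goal")],
        "c": [(0.5, 0.0, "goal"), (0.5, 20.0, "goal")],
    },
    "goal": {},
}
# A determinization fixes one (cost, successor) per (state, action) pair.
Det = Dict[Tuple[str, str], Tuple[float, str]]


def det_cost(mdp, det: Det, state: str) -> float:
    """Optimal cost-to-go in the deterministic problem (acyclic MDP assumed)."""
    if not mdp[state]:
        return 0.0
    return min(det_q(mdp, det, state, a) for a in mdp[state])


def det_q(mdp, det: Det, state: str, action: str) -> float:
    cost, succ = det[(state, action)]
    return cost + det_cost(mdp, det, succ)


def determinizations(mdp) -> Iterator[Tuple[float, Det]]:
    """Enumerate every determinization with its probability (toy problems only)."""
    keys = [(s, a) for s, acts in mdp.items() for a in acts]
    for combo in itertools.product(*(mdp[s][a] for s, a in keys)):
        prob = math.prod(p for p, _, _ in combo)
        yield prob, {k: (c, succ) for k, (_, c, succ) in zip(keys, combo)}


def hop_q(mdp, state: str, action: str) -> float:
    """Limit of the HOP estimate: expected optimal cost over determinizations."""
    return sum(p * det_q(mdp, det, state, action) for p, det in determinizations(mdp))


def sample_determinization(mdp, rng: random.Random) -> Det:
    det: Det = {}
    for s, acts in mdp.items():
        for a, outcomes in acts.items():
            _, c, succ = rng.choices(outcomes, weights=[p for p, _, _ in outcomes])[0]
            det[(s, a)] = (c, succ)
    return det


def hop_estimate(mdp, state: str, action: str, k: int, seed: int = 0) -> float:
    """HOP estimate from k sampled determinizations."""
    rng = random.Random(seed)
    return sum(det_q(mdp, sample_determinization(mdp, rng), state, action)
               for _ in range(k)) / k


def true_q(mdp, state: str, action: str) -> float:
    """Exact expected cost for comparison (Bellman recursion, acyclic MDP)."""
    return sum(p * (c + true_v(mdp, succ)) for p, c, succ in mdp[state][action])


def true_v(mdp, state: str) -> float:
    if not mdp[state]:
        return 0.0
    return min(true_q(mdp, state, a) for a in mdp[state])


if __name__ == "__main__":
    for a in ("a1", "a2"):
        print(a, hop_q(MDP, "s0", a), true_q(MDP, "s0", a))
```

In this toy MDP, HOP converges to $5$ for a1 and $6$ for a2 and therefore prefers a1, although the true expected costs are $Q^\star(s_0, a_1) = 10$ and $Q^\star(s_0, a_2) = 6$: no number of samples corrects this optimistic bias.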
- Sample outcomes of all actions ⇒ deterministic (classical) planning problem.
- Compute a policy by solving the sample.
- Simulate the policy.
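The three steps can be sketched in Python on a small hypothetical MDP: in s0, action a2 reaches the goal at cost 6, while a1 leads at cost 0 to s1, where two actions b and c each cost 0 or 20 with probability 1/2. Each iteration samples a determinization, derives a greedy policy from its optimal deterministic costs (a plain recursion suffices because the toy MDP is acyclic), and then simulates that policy with freshly sampled outcomes; the average simulated cost is the estimate. The function names, the tie-breaking rule (first action wins) and the solver are our assumptions, not the lecture's.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical toy MDP: state -> action -> [(probability, cost, successor)].
Outcome = Tuple[float, float, str]
MDP: Dict[str, Dict[str, List[Outcome]]] = {
    "s0": {"a1": [(1.0, 0.0, "s1")], "a2": [(1.0, 6.0, "goal")]},
    "s1": {
        "b": [(0.5, 0.0, "goal"), (0.5, 20.0, "goal")],
        "c": [(0.5, 0.0, "goal"), (0.5, 20.0, "goal")],
    },
    "goal": {},  # terminal
}
Det = Dict[Tuple[str, str], Tuple[float, str]]


def sample_outcome(outcomes: List[Outcome], rng: random.Random) -> Outcome:
    return rng.choices(outcomes, weights=[p for p, _, _ in outcomes])[0]


def sample_determinization(mdp, rng: random.Random) -> Det:
    """Step 1: fix one sampled outcome for every (state, action) pair."""
    det: Det = {}
    for s, acts in mdp.items():
        for a, outcomes in acts.items():
            _, cost, succ = sample_outcome(outcomes, rng)
            det[(s, a)] = (cost, succ)
    return det


def det_q(mdp, det: Det, state: str, action: str) -> float:
    """Optimal deterministic cost after applying `action` in `state`."""
    cost, succ = det[(state, action)]
    if not mdp[succ]:
        return cost
    return cost + min(det_q(mdp, det, succ, a) for a in mdp[succ])


def solve(mdp, det: Det) -> Dict[str, str]:
    """Step 2: greedy policy w.r.t. the sample's optimal costs (ties: first action)."""
    return {s: min(acts, key=lambda a: det_q(mdp, det, s, a))
            for s, acts in mdp.items() if acts}


def simulate(mdp, policy: Dict[str, str], state: str, action: str,
             rng: random.Random) -> float:
    """Step 3: apply `action`, then follow `policy` with fresh random outcomes."""
    total = 0.0
    next_action: Optional[str] = action
    while next_action is not None:
        _, cost, state = sample_outcome(mdp[state][next_action], rng)
        total += cost
        next_action = policy.get(state)  # None once a terminal state is reached
    return total


def policy_simulation_estimate(mdp, state: str, action: str, k: int,
                               seed: int = 0) -> float:
    rng = random.Random(seed)
    total = 0.0
    for _ in range(k):
        det = sample_determinization(mdp, rng)
        total += simulate(mdp, solve(mdp, det), state, action, rng)
    return total / k


if __name__ == "__main__":
    for a in ("a1", "a2"):
        print(a, policy_simulation_estimate(MDP, "s0", a, 2000, seed=1))
```

Because the simulated outcomes are drawn independently of the sampled determinization, a policy that picks b or c because it was cheap in the sample still faces the true 50/50 outcome. In this toy MDP the estimate for a1 therefore approaches 10 rather than an optimistic hindsight value, and a2 (cost 6) is preferred.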
[Figure: example search tree rooted at s0, showing the costs sampled in each iteration and the resulting state-value estimates over successive iterations]