Facilitating Testing and Debugging of Markov Decision Processes (PowerPoint Presentation)

SLIDE 1: Facilitating Testing and Debugging of Markov Decision Processes with Interactive Visualization. Sean McGregor, Hailey Buckingham, Thomas G. Dietterich, Rachel Houtman, Claire Montgomery, and Ronald Metoyer
SLIDE 2: What are Markov Decision Processes (MDPs)? Sequential Decision Making Under Uncertainty
  • Autonomous Helicopter
  • Wildfire Suppression
  • Mountain Car
  • Logistics
  • Medical Diagnosis
SLIDE 3: Outline
  • 1. Markov Decision Processes (MDPs): Basic Introduction, Testing
  • 2. MDPvis: Design, Testing Examples, MDPvis Use Case Study
  • 3. Concluding
SLIDE 4: Notation, M = ⟨S, A, P, R, γ, P0⟩ (MDPs: Basic Introduction)
  • S: all states of the world
  • A: available actions
  • P: state transition probabilities (simulators)
  • R(s, a): rewards
  • γ ∈ (0, 1): discount
  • P0: starting state distribution
  • π(s) → a: policy
Puterman, M. (1994). Markov Decision Processes: Discrete Stochastic Dynamic Programming (1st ed.). Wiley-Interscience.
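The tuple above can be sketched as a small data structure. A minimal illustration only; the names `MDP` and `sample_episode` are hypothetical, not from the MDPvis codebase:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class MDP:
    states: List[str]                      # S: all states of the world
    actions: List[str]                     # A: available actions
    transition: Callable[[str, str], str]  # P: samples a next state (simulator)
    reward: Callable[[str, str], float]    # R(s, a): rewards
    gamma: float                           # γ ∈ (0, 1): discount
    start: Callable[[], str]               # P0: samples a starting state

def sample_episode(mdp: MDP, policy: Callable[[str], str], horizon: int) -> float:
    """Roll the MDP forward under a policy π(s) -> a; return the discounted return."""
    s, total, discount = mdp.start(), 0.0, 1.0
    for _ in range(horizon):
        a = policy(s)
        total += discount * mdp.reward(s, a)
        discount *= mdp.gamma
        s = mdp.transition(s, a)
    return total
```

Representing P and P0 as sampling functions rather than probability tables matches the slide's note that the transition model is a set of simulators.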
SLIDE 5: Motivating Domain of Wildfire. Starting in 1935, the United States adopted the “10 AM policy.” We need a more nuanced approach.
Houtman, R. M., Montgomery, C. A., Gagnon, A. R., Calkin, D. E., Dietterich, T. G., McGregor, S., & Crowley, M. (2013). Allowing a Wildfire to Burn: Estimating the Effect on Future Fire Suppression Costs. International Journal of Wildland Fire, 22(7), 871–882.
http://www.fs.fed.us/sites/default/files/2015-Fire-Budget-Report.pdf
SLIDE 6: Modeling Wildfire
  • S: all the possible configurations of trees/ignitions
  • P0: a snapshot of the current forest, with a random fire
  • A: suppress or let-burn
  • R(s, a): timber harvest revenue, suppression expense
  • γ ∈ (0, 1): 0.96 (Forest Service standard)
  • P: several simulators
  • π(s) → a: suppress all fires
This represents a challenging and more general class of MDPs:
  • High-dimensional states
  • Large state space
  • Integrates several simulators
SLIDE 7: The simulation pipeline: P0 → Simulators → Optimizer → Rewards → Policy.
SLIDES 8–20: One pass through the pipeline, step by step:
  • Start with today's landscape (P0)
  • Generate an ignition and weather
  • Select an action
  • Apply fire suppression effort: $(95,000) fire suppression costs
  • Update vegetation for wildfire
  • Update vegetation for harvest: $20,000 harvest revenue
  • Repeat: generate an ignition and weather, select an action, apply suppression effort ($(15,000) fire suppression costs), update vegetation for wildfire and for harvest ($20,000 harvest revenue)
  • Continue until reaching the horizon
SLIDE 21: The result is a high-dimensional probabilistic time series. And this is just one of many!
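The loop the slides step through can be sketched as follows. This is a toy illustration: the dynamics are made up, and the dollar figures from the slides appear only as placeholder constants, not as the authors' simulator:

```python
import random

def rollout(horizon=10, seed=0):
    """One pass through the slides' loop, with toy dynamics and placeholder numbers."""
    rng = random.Random(seed)
    landscape = {"fuel": 1.0}                   # stand-in for today's landscape (P0)
    rewards = []
    for _ in range(horizon):
        fire = rng.random()                     # generate an ignition and weather
        suppress = fire > 0.5                   # select an action (placeholder policy)
        cost = -95_000 * fire if suppress else 0.0  # fire suppression costs
        if not suppress:
            landscape["fuel"] *= 0.9            # update vegetation for wildfire
        harvest = 20_000 * landscape["fuel"]    # update vegetation for harvest
        rewards.append(cost + harvest)          # continue until reaching the horizon
    return rewards                              # one sampled reward time series
```

A real wildfire state is high-dimensional (every tree stand), so the single `fuel` scalar here stands in for a full landscape snapshot.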
SLIDE 22: Monte Carlo Rollouts: many starting states are drawn from P0 and rolled through the Simulators → Optimizer → Rewards → Policy pipeline.
SLIDE 23: All visited states influence the optimizer.
SLIDE 24: Update the policy.
SLIDE 25: The rollout distribution changes!
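The rollout-then-update loop on these slides can be sketched as below. Both the one-dimensional simulator and the threshold "optimizer" are deliberately toy stand-ins for the wildfire system:

```python
import random

def monte_carlo_rollouts(policy, n=100, horizon=20, seed=0):
    """Draw n rollouts from a toy one-dimensional simulator."""
    rng = random.Random(seed)
    episodes = []
    for _ in range(n):
        s, traj = rng.random(), []              # sample a starting state from P0
        for _ in range(horizon):
            a = policy(s)
            traj.append((s, a))                 # every visited state is recorded
            s = min(1.0, s + rng.uniform(-0.1, 0.1))  # toy simulator step
        episodes.append(traj)
    return episodes

def update_policy(episodes):
    """Toy 'optimizer': all visited states influence the new policy threshold."""
    visits = [s for ep in episodes for s, _ in ep]
    mean_s = sum(visits) / len(visits)
    return lambda s: "suppress" if s > mean_s else "let-burn"
```

Because `update_policy` depends on every visited state, re-running `monte_carlo_rollouts` with the returned policy yields a different rollout distribution, which is exactly the testing difficulty slide 25 points at.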
SLIDE 26: MDP Testing Challenges
  • Bugs are probabilistically expressed in a high-dimensional temporal dataset.
  • The dataset changes with changes to parameters.
  • The optimizer sees more of the state and policy space than the user.
Testing requires exploring rollouts and parameters.
SLIDE 27: MDP Debugging and Fault Isolation
  • Deactivate/modify components to isolate a fault, e.g. balance reward magnitude and frequency.
Debug the MDP specification and its integration with parameter changes.
SLIDE 28: Testing and Debugging Process (MDPs: Testing/Debugging)
  • 1. Generate rollouts
  • 2. Visualize the data
  • 3. Change parameters
SLIDE 29: Outline (revisited)
  • 1. Markov Decision Processes (MDPs): Basic Introduction, Testing
  • 2. MDPvis: Design, Testing Examples, MDPvis Use Case Study
  • 3. Concluding
SLIDE 30: Introducing MDPvis (MDPvis: Design)
SLIDE 31: What are the elements of the MDPvis design?
  • Parameters
  • History
  • Distributions at Time Step
  • Distributions Through Time
  • State Snapshots
SLIDE 32: Parameter Areas
SLIDE 33: History Area
SLIDE 34: Visualization Areas
SLIDE 35: State Variable Distributions for a Fixed Time Step (time step 9)
SLIDE 36: State Variable Distributions for a Fixed Time Step, Comparison π1 − π2 (π1: Let-Burn, π2: Suppress-All).
SLIDES 37–43: Building the comparison step by step: rescale the two histograms to a shared axis, take the difference in counts, and re-plot.
SLIDE 44: Let-Burn dominates Suppress-All in this time step.
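The rescale / difference-in-counts / re-plot steps can be sketched with NumPy histograms. The bin count is illustrative; MDPvis itself is a web visualization, so this is only the arithmetic behind the view:

```python
import numpy as np

def histogram_difference(values_pi1, values_pi2, bins=10):
    """Shared-axis histograms for two rollout sets, then the count difference."""
    lo = min(np.min(values_pi1), np.min(values_pi2))   # rescale: shared bin edges
    hi = max(np.max(values_pi1), np.max(values_pi2))
    edges = np.linspace(lo, hi, bins + 1)
    counts1, _ = np.histogram(values_pi1, bins=edges)
    counts2, _ = np.histogram(values_pi2, bins=edges)
    return edges, counts1 - counts2                    # difference in counts; re-plot this
```

A positive bar in the difference means the π1 (Let-Burn) rollouts occupy that bin more often, which is how slide 44's "dominates" reading is made.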
SLIDE 45: State Variable Distributions through Time (all time steps)
SLIDE 46: Each chart plots the 0th, 50th, and 100th percentiles of the rollout set across all time steps, with event numbers marked.
SLIDES 47–48: Comparison π1 − π2 (π1: Let-Burn, π2: Suppress-All): shading shows whether the π1 or π2 percentile is greater. Let-Burn is always better across all time steps.
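The fan chart on these slides can be computed per time step with NumPy percentiles. A minimal sketch, assuming the rollouts for one state variable arrive as a 2-D array:

```python
import numpy as np

def percentile_fan(rollouts, percentiles=(0, 50, 100)):
    """rollouts: (n_rollouts, n_timesteps) values of one state variable.

    Returns, for each requested percentile, its trace across time steps,
    matching the 0th / 50th / 100th percentile bands on the slide."""
    data = np.asarray(rollouts, dtype=float)
    return {p: np.percentile(data, p, axis=0) for p in percentiles}
```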
SLIDE 49: State Details: allow the MDP simulator to generate state visualizations (MDPvis: Integration).
SLIDE 50: Parameter Space Analysis (PSA). “[PSA] is the systematic variation of model input parameters, generating outputs for each combination of parameters, and investigating the relation between parameter settings and corresponding outputs.”
Categories:
  • Sensitivity
  • Optimization
  • Outliers
  • Partition
  • Uncertainty
  • Fitting
Sedlmair, M., Heinzl, C., Bruckner, S., Piringer, H., & Möller, T. (2014). Visual Parameter Space Analysis: A Conceptual Framework. IEEE Transactions on Visualization and Computer Graphics, 20(12).
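The quoted definition can be sketched as a grid sweep. The `simulator` argument is a stand-in for any rollout-generating function, not part of MDPvis:

```python
from itertools import product

def parameter_sweep(simulator, grid):
    """Run the simulator once for every combination of parameter settings.

    grid: dict mapping parameter name -> list of values to try."""
    names = sorted(grid)
    results = []
    for combo in product(*(grid[n] for n in names)):
        params = dict(zip(names, combo))
        results.append((params, simulator(**params)))  # setting -> output pair
    return results
```

Investigating the relation between settings and outputs then amounts to inspecting the returned pairs, which is what the MDPvis parameter areas and comparison views support interactively.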
SLIDE 51: Is the policy sensitive to the state? (Sensitivity · Optimization · Outliers · Partition · Uncertainty · Fitting)
  • Interaction: 1. Brush the suppression-choice histogram to select Let-Burn
  • Expectation: 2. Date is a determinant of the suppression choice
  • Buggy result: 3. Date does not determine the suppression choice
SLIDE 52: Is the optimization sensitive to the reward signal?
  • Interaction: 1. Zero out harvest rewards; 2. Re-optimize and generate rollouts
  • Expectation: 3. We don't suppress fires if we can't harvest trees
  • Buggy result: 4. We spend money on suppression
SLIDE 53: Are the largest fires realistic?
  • Interaction: 1. Change the year to the one with the largest fire; 2. Brush the histogram to view the largest fires
  • Expectation: 3. The fire break prevents spread
  • Buggy result: 4. The fire break doesn't prevent spread: the fire spreads!
SLIDE 54: Do policies partition the state space?
  • Interaction: 1. Generate suppress-all rollouts; 2. Generate let-burn-all rollouts; 3. Click the “compare rollouts” button
  • Expectation: 4. Let-burn-all fires will be larger in the present, and smaller in the future
  • Buggy result: 5. Let-burn-all fires are the same in the present, and larger in the future
SLIDE 55: Use Case Study of MDPvis (MDPvis: Evaluation). We tested a new wildfire policy domain:
  • Visualization developer: 1 Ph.D. student in Computer Science
  • New fire domain developer: 1 Ph.D. student in Forestry
  • Wildfire optimization expert: 1 faculty research assistant
We found numerous bugs.
SLIDE 56: Evaluation of MDPvis. Viewing the largest fires in the rollouts (the largest fire, the second-largest fire, and the harvest areas) revealed that fires are not spreading east! The bug was hidden by harvests except in the most extreme fire.
SLIDE 57: Evaluation of MDPvis
  • 1. Compare: same model with different policies
  • 2. Expect: same ignition date in both rollout sets
  • 3. Actual: policies change the weather
SLIDE 58: Conclusion. We need visualization IDEs for MDPs!
SLIDE 59: Interactive demo at MDPVis.github.io (* not robust to many simultaneous requests)
SLIDE 60: Thanks
  • Reviewers: <you know who you are>
  • Advisor: Thomas Dietterich
  • Research group: Ronald Metoyer, Claire Montgomery, Rachel Houtman, Mark Crowley, Hailey Buckingham
  • Funder: National Science Foundation
This material is based upon work supported by the National Science Foundation under Grant No. 1331932. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
SLIDE 61: Questions? Contact: VLHCC@SeanBMcGregor.com · Twitter: @SeanMcGregor · MDPVis.github.io
SLIDE 62: Come to the Full Demo!
SLIDE 63: How consistent is the policy for small changes to the model?
  • Interaction: 1. Optimize and generate rollouts; 2. Add air tankers to the model; 3. Optimize and generate rollouts; 4. Click the “Compare Rollouts” button
  • Expectation: 5. The policy is identical
  • Buggy result: 6. Many differences in the policy distribution
SLIDE 64: Does the growth rate match the historical dataset?
  • Pre-process: 1. Add a variable for the growth percentile within the historic data
  • Interaction: 2. Assign the policy to the historical policy (suppress all)
  • Expectation: 3. The percentiles meet the y-axis at their label
  • Buggy result: 4. The percentiles are unpredictable
SLIDE 65: Let's Construct a Simple MDP: “Pixel Forest,” M = ⟨S, P0, A, R, γ, P⟩ (MDPs: Basic Introduction). Two states, S0 and S1; two actions, a0: Do Nothing and a1: Remove Fuels. Transitions between the states occur with probabilities 0.9, 0.1, and 1.0, and the discounted rewards are 3γ^t, 1γ^t, 100γ^t, and 0γ^t.
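The two-state example invites solving by value iteration. A sketch under γ = 0.96 (the discount used earlier in the talk); which probability and reward attaches to which state-action pair is an assumption here, marked as illustrative in the comments:

```python
GAMMA = 0.96  # discount from the wildfire slides (Forest Service standard)

# ASSUMED reading of the diagram: the assignment of probabilities and rewards
# to state-action pairs below is illustrative, not taken from the slide.
P = {  # P[s][a] = [(next_state, probability), ...]
    "S0": {"a0": [("S0", 0.9), ("S1", 0.1)], "a1": [("S0", 1.0)]},
    "S1": {"a0": [("S1", 1.0)], "a1": [("S0", 0.1), ("S1", 0.9)]},
}
R = {  # R[s][a]; magnitudes borrowed from the slide's reward list (3, 1, 100, 0)
    "S0": {"a0": 3.0, "a1": 1.0},
    "S1": {"a0": 0.0, "a1": 100.0},
}

def value_iteration(tol=1e-8):
    """Standard value iteration on the assumed Pixel Forest tables."""
    V = {s: 0.0 for s in P}
    while True:
        newV = {s: max(R[s][a] + GAMMA * sum(p * V[t] for t, p in P[s][a])
                       for a in P[s]) for s in P}
        if max(abs(newV[s] - V[s]) for s in P) < tol:
            return newV
        V = newV
```

Because γ < 1, the Bellman update is a contraction and the loop converges; swapping in the diagram's true tables would only change the numbers, not the procedure.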