SLIDE 1 Facilitating Testing and Debugging of Markov Decision Processes with Interactive Visualization
Sean McGregor, Hailey Buckingham, Thomas G. Dietterich, Rachel Houtman, Claire Montgomery, and Ronald Metoyer
SLIDE 2 What are Markov Decision Processes (MDPs)?
Sequential Decision Making Under Uncertainty
Examples: Wildfire Suppression · Mountain Car · Logistics · Medical Diagnosis
SLIDE 3 Outline
- 1. Markov Decision Processes (MDPs): Basic Introduction · Testing
- 2. MDPvis: Design · Testing Examples · Use Case Study
SLIDE 4 Notation, M = ⟨S,A,P,R,γ,P0⟩
Puterman, M. (1994). Markov Decision Processes: Discrete Stochastic Dynamic Programming (1st ed.). Wiley-Interscience.
- S: All states of the world
- P0: Starting state distribution
- A: Available actions
- R(s, a): Rewards
- γ ∈ (0, 1): Discount
- P: State transition probabilities (simulators)
- π(s) → a: Policy
MDPs: Basic Introduction
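The tuple above can be sketched in a few lines of Python. This is a minimal illustration, not MDPvis code: the class and function names are hypothetical, and P and P0 are represented as sampling functions, matching their role as simulators.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical container mirroring M = <S, A, P, R, gamma, P0>.
@dataclass
class MDP:
    states: List[str]                      # S: all states of the world
    actions: List[str]                     # A: available actions
    transition: Callable[[str, str], str]  # P: samples the next state (simulator)
    reward: Callable[[str, str], float]    # R(s, a): rewards
    gamma: float                           # discount, in (0, 1)
    start: Callable[[], str]               # P0: samples a starting state

def discounted_return(rewards: List[float], gamma: float) -> float:
    """Sum of gamma^t * r_t over one reward sequence."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))
```

The discounted return is what the optimizer maximizes in expectation over rollouts.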
SLIDE 5 Motivating Domain of Wildfire
Starting in 1935, the United States adopted the “10 AM policy”: suppress every wildfire by 10 AM the morning after it is reported. We need a more nuanced approach.
MDPs: Basic Introduction
Houtman, R. M., Montgomery, C. A., Gagnon, A. R., Calkin, D. E., Dietterich, T. G., McGregor, S., & Crowley, M. (2013). Allowing a Wildfire to Burn: Estimating the Effect on Future Fire Suppression Costs. International Journal of Wildland Fire, 22(7), 871–882.
http://www.fs.fed.us/sites/default/files/2015-Fire-Budget-Report.pdf
SLIDE 6 Modeling Wildfire
MDPs: Basic Introduction
- S: All the possible configurations of trees/ignitions
- P0: A snapshot of the current forest, with a random fire
- A: Suppress or let-burn
- R(s, a): Timber harvest revenue, suppression expense
- γ ∈ (0, 1): 0.96 (Forest Service standard)
- P: Several simulators
- π(s) → a: Suppress all fires
Represents a challenging and more general class of MDPs:
- High-dimensional states
- Large state space
- Integrates several simulators
SLIDE 7
MDPs: Basic Introduction
[Diagram: the starting state distribution P0 feeds the Simulators, which connect to the Optimizer, Rewards, and Policy]
SLIDE 8 Start with Today’s Landscape
SLIDE 9 Generate an ignition and weather
SLIDE 10 Generate an ignition and weather (continued)
SLIDE 11 Select an Action
SLIDE 12 Fire Suppression Effort: $(95,000) Fire Suppression Costs
SLIDE 13 Update Vegetation for Wildfire
SLIDE 14 Update Vegetation for Harvest: $20,000 Harvest Revenue
SLIDE 15 Generate an ignition and weather
SLIDE 16 Select an Action
SLIDE 17 Fire Suppression Effort: $(15,000) Fire Suppression Costs
SLIDE 18 Update Vegetation for Wildfire
SLIDE 19 Update Vegetation for Harvest: $20,000 Harvest Revenue
SLIDE 20 (Continue Until Reaching the Horizon)
SLIDE 21 A High Dimensional Probabilistic Time Series …And this is just one of many!
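The stepping sequence above can be sketched as a loop. Everything below is a hypothetical stand-in for the real wildfire simulators: the state, the ignition model, the costs, and the revenue are toy values chosen only to mirror the slide captions.

```python
import random

def rollout(policy, horizon, rng):
    """Run one trajectory from today's landscape to the horizon."""
    state = {"fuel": 1.0}             # start with today's landscape (a draw from P0)
    rewards = []
    for _ in range(horizon):
        fire = rng.random() < 0.5     # generate an ignition and weather (toy model)
        action = policy(state, fire)  # select an action: "suppress" or "let-burn"
        cost = 95_000 if (fire and action == "suppress") else 0
        state["fuel"] *= 0.9 if fire else 1.1   # update vegetation for wildfire
        revenue = 20_000              # update vegetation for harvest (toy revenue)
        rewards.append(revenue - cost)
    return rewards

rng = random.Random(0)
rewards = rollout(lambda s, f: "suppress", horizon=5, rng=rng)
```

Each call produces one high-dimensional probabilistic time series; a rollout set repeats this many times.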
SLIDE 22 Monte Carlo Rollouts
MDPs: Basic Introduction
[Diagram: many rollouts, each starting from a draw of P0]
SLIDE 23 All visited states influence the optimizer
SLIDE 24 Update Policy
SLIDE 25 The Rollout Distribution Changes!
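The Monte Carlo step can be sketched as follows. `simulate` is a hypothetical stand-in for a full rollout with made-up numbers; the point is only that the return estimate is an average over rollouts, and that changing the policy changes the distribution being averaged.

```python
import random
import statistics

def simulate(suppress_prob, rng):
    """Toy discounted return for one rollout under a toy policy parameter."""
    ret, discount = 0.0, 1.0
    for _ in range(10):
        revenue = rng.gauss(20_000, 5_000)                    # toy harvest revenue
        cost = 95_000 if rng.random() < suppress_prob else 0  # toy suppression cost
        ret += discount * (revenue - cost)
        discount *= 0.96                                      # Forest Service discount
    return ret

def evaluate(suppress_prob, n_rollouts, seed=0):
    """Monte Carlo estimate of the expected return for one policy setting."""
    rng = random.Random(seed)
    return statistics.mean(simulate(suppress_prob, rng) for _ in range(n_rollouts))
```

After every policy update the optimizer must regenerate rollouts, because the old rollout set no longer reflects the new policy's distribution.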
SLIDE 26 MDP Testing Challenges
- Bugs are expressed probabilistically in a high-dimensional temporal dataset.
- The dataset changes with changes to parameters.
- The optimizer sees more of the state and policy space than the user.
Testing requires exploring both rollouts and parameters.
SLIDE 27 MDP Debugging and Fault Isolation
- Deactivate or modify components to isolate a fault
  - e.g., balance reward magnitude and frequency
Debug the MDP specification and integration with parameter changes
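The deactivate-a-component tactic can be sketched as toggling reward terms and re-running rollouts. The component names follow the wildfire domain, but the toggling mechanism itself is a hypothetical illustration, not the MDPvis interface.

```python
def total_reward(components, disabled=()):
    """Sum reward components for one time step, skipping deactivated ones."""
    return sum(value for name, value in components.items() if name not in disabled)

# One toy time step: a harvest and a suppressed fire.
step = {"harvest": 20_000, "suppression": -95_000}

full = total_reward(step)                              # all components active
no_harvest = total_reward(step, disabled={"harvest"})  # isolate the suppression signal
```

If a bug disappears when one component is deactivated, that component (or its integration) is where to look.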
SLIDE 28 Testing and Debugging Process
MDPs: Testing/Debugging
- 1. Optimize and generate rollouts
- 2. Visualize the data
- 3. Change parameters
SLIDE 29 Outline
- 1. Markov Decision Processes (MDPs): Basic Introduction · Testing
- 2. MDPvis: Design · Testing Examples · Use Case Study
SLIDE 30 Introducing MDPvis
MDPvis: Design
SLIDE 31 What are the elements of the MDPvis design?
MDPvis: Design
- Parameters
- History
- Distributions at a Time Step
- Distributions Through Time
- State Snapshots
SLIDE 32 Parameter Areas
SLIDE 33 History Area
SLIDE 34 Visualization Areas
SLIDE 35 State Variable Distributions for a Fixed Time Step (time step 9)
SLIDE 36 State Variable Distributions for a Fixed Time Step
Comparison π1 – π2 (π1: Let-Burn, π2: Suppress-All)
SLIDE 37–40 (animation frames building up the comparison)
SLIDE 41 Rescale
SLIDE 42 Take Difference in Counts
SLIDE 43 Re-plot
SLIDE 44 Let-Burn Dominates Suppress-All in this time step
SLIDE 45 State Variable Distributions through Time
[Fan chart over all time steps]
SLIDE 46 State Variable Distributions through Time
[Fan chart: 0th, 50th, and 100th percentiles across all time steps, indexed by event number]
SLIDE 47 Comparison π1 – π2 (π1: Let-Burn, π2: Suppress-All); color shows whether the π1 or π2 percentile is greater
SLIDE 48 Let-Burn is Always Better Across All Time Steps
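The fan chart above reduces a rollout set to per-time-step percentiles. A minimal sketch with made-up rollout values (the real charts draw many percentile bands, not just three):

```python
import statistics

# Rows are rollouts, columns are time steps (toy values).
rollouts = [
    [1, 4, 9],
    [2, 5, 7],
    [3, 6, 8],
]

def percentile_bands(rollouts):
    """Return (0th, 50th, 100th) percentile of the rollout set at each time step."""
    bands = []
    for t in range(len(rollouts[0])):
        column = sorted(r[t] for r in rollouts)
        bands.append((column[0], statistics.median(column), column[-1]))
    return bands

bands = percentile_bands(rollouts)
```

A policy comparison then plots the difference between the two policies' bands at each time step.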
SLIDE 49 State Details
Allow the MDP simulator to generate state visualizations
[Thumbnails of simulator-generated state snapshots]
MDPvis: Integration
SLIDE 50 Parameter Space Analysis (PSA)
MDPs: Testing/Debugging
Categories: Sensitivity · Optimization · Outliers · Partition · Uncertainty · Fitting
“[PSA] is the systematic variation of model input parameters, generating outputs for each combination of parameters, and investigating the relation between parameter settings and corresponding outputs.”
Sedlmair, M., Heinzl, C., Bruckner, S., Piringer, H., & Möller, T. (2014). Visual Parameter Space Analysis: A Conceptual Framework. IEEE Transactions on Visualization and Computer Graphics, 20(12).
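The quoted definition is essentially a parameter sweep. A minimal sketch, where `run_model` is a hypothetical stand-in for optimizing and generating rollouts under one parameter setting:

```python
from itertools import product

def run_model(discount, suppression_bias):
    """Toy output for one parameter combination (stand-in for a rollout set)."""
    return discount * 100 - suppression_bias

# Systematically vary the inputs...
grid = {
    "discount": [0.9, 0.96],
    "suppression_bias": [0, 10],
}

# ...generate an output for each combination...
results = {
    combo: run_model(*combo)
    for combo in product(grid["discount"], grid["suppression_bias"])
}
# ...then investigate how outputs relate to parameter settings.
```

MDPvis supports this loop interactively: its parameter areas set the combination, and its charts show the resulting rollout outputs.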
SLIDE 51 MDPs: Testing/Debugging
Sensitivity · Optimization · Outliers · Partition · Uncertainty · Fitting
Is the policy sensitive to the state?
- Interaction: 1. Brush the suppression choice to select Let-Burn
- Expectation: 2. Date is a determinant of the suppression choice
- Buggy Result: 3. Date does not determine the suppression choice
SLIDE 52 MDPs: Testing/Debugging
Sensitivity · Optimization · Outliers · Partition · Uncertainty · Fitting
Is the optimization sensitive to the reward signal?
- Interaction: 1. Zero out harvest rewards; 2. Re-optimize and generate rollouts
- Expectation: 3. We don’t suppress fires if we can’t harvest trees
- Buggy Result: 4. We spend money on suppression
SLIDE 53 MDPs: Testing/Debugging
Sensitivity · Optimization · Outliers · Partition · Uncertainty · Fitting
Are the largest fires realistic?
- Interaction: 1. Change the year to the one with the largest fire; 2. Brush the histogram to view the largest fires
- Expectation: 3. Fire break prevents spread (no spread)
- Buggy Result: 4. Fire break doesn’t prevent spread (spread!)
SLIDE 54 MDPs: Testing/Debugging
Sensitivity · Optimization · Outliers · Partition · Uncertainty · Fitting
Do policies partition the state space?
- Interaction: 1. Generate suppress-all rollouts; 2. Generate let-burn-all rollouts; 3. Click the “compare rollouts” button
- Expectation: 4. Let-burn-all fires will be larger in the present, and smaller in the future
- Buggy Result: 5. Let-burn-all fires are the same in the present, and larger in the future
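The “compare rollouts” interaction reduces to computing the same statistic over two rollout sets and inspecting the difference. A minimal sketch with made-up fire sizes (MDPvis does this per histogram bin and per percentile, not just for one median):

```python
import statistics

# Toy fire sizes at one time step under each policy.
suppress_all = [10, 12, 11, 13, 9]
let_burn_all = [10, 12, 11, 13, 9]   # identical now; divergence would appear later

def median_difference(a, b):
    """Difference of medians between two rollout sets for one variable."""
    return statistics.median(a) - statistics.median(b)

same_now = median_difference(let_burn_all, suppress_all)
```

A zero difference in the present combined with a growing difference in later time steps is exactly the partition signature the slide describes.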
SLIDE 55 Use Case Study of MDPvis
MDPvis: Evaluation
We tested a new wildfire policy domain:
- Visualization Developer: 1 Ph.D. student in Computer Science
- New Fire Domain Developer: 1 Ph.D. student in Forestry
- Wildfire Optimization Expert: 1 faculty research assistant
We found numerous bugs.
SLIDE 56 Evaluation of MDPvis
MDPvis: Evaluation
Viewed the largest fires in the rollouts
[State snapshots: the largest fire, the second largest fire, and harvest areas]
Bug found: fires are not spreading east! The bug was hidden by harvests except in the most extreme fire.
SLIDE 57 Evaluation of MDPvis
MDPvis: Evaluation
- 1. Compare: same model with different policies
- 2. Expect: same ignition date in both rollout sets
- 3. Actual: policies change the weather
SLIDE 58 Conclusion
Concluding
Summary: We need visualization IDEs for MDPs!
SLIDE 59 MDPVis.github.io
Concluding
Interactive Demo
* Not robust to many simultaneous requests
SLIDE 60 Thanks
Concluding
- Reviewers: <you know who you are>
- Advisor: Thomas Dietterich
- Research Group: Ronald Metoyer, Claire Montgomery, Rachel Houtman, Mark Crowley, Hailey Buckingham
- Funder: National Science Foundation
This material is based upon work supported by the National Science Foundation under Grant No. 1331932. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
MDPVis.github.io
SLIDE 61 Questions?
Concluding
Contact: VLHCC@SeanBMcGregor.com · Twitter: @SeanMcGregor
MDPVis.github.io
SLIDE 62 Come to the Full Demo!
Concluding
Contact: VLHCC@SeanBMcGregor.com · Twitter: @SeanMcGregor
MDPVis.github.io
SLIDE 63 MDPs: Testing/Debugging
Sensitivity · Optimization · Outliers · Partition · Uncertainty · Fitting
How consistent is the policy for small changes to the model?
- Interaction: 1. Optimize and generate rollouts; 2. Add air tankers to the model; 3. Optimize and generate rollouts; 4. Click the “Compare Rollouts” button
- Expectation: 5. Policy is identical
- Buggy Result: 6. Many differences in the policy distribution
SLIDE 64 MDPs: Testing/Debugging
Sensitivity · Optimization · Outliers · Partition · Uncertainty · Fitting
Does the growth rate match the historical dataset?
- Pre-Process: 1. Add a variable for the growth percentile within the historic data
- Interaction: 2. Assign the policy to the historical policy (suppress all)
- Expectation: 3. The percentiles meet the y-axis at their label
- Buggy Result: 4. The percentiles are unpredictable
SLIDE 65 Let’s Construct a Simple MDP: “Pixel Forest”
⟨S, P0, A, R, γ, P⟩
States: S0, S1
Actions: a0: Do Nothing · a1: Remove Fuels
[Transition diagram with probabilities 0.9, 0.1, 1.0, 1.0, 0.1, 0.9]
MDPs: Basic Introduction
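The Pixel Forest transition function can be written as a small lookup table. The exact arrow-to-probability assignment in the diagram is an assumption here: this sketch reads a0 (Do Nothing) as letting the pixel drift between fuel states and a1 (Remove Fuels) as deterministically returning it to S0.

```python
# Assumed reading of the Pixel Forest diagram; the six probabilities
# (0.9, 0.1, 1.0, 1.0, 0.1, 0.9) are assigned to arrows hypothetically.
P = {
    ("S0", "a0"): {"S0": 0.9, "S1": 0.1},  # Do Nothing: fuels may accumulate
    ("S0", "a1"): {"S0": 1.0},             # Remove Fuels: stay fuel-free
    ("S1", "a0"): {"S0": 0.1, "S1": 0.9},  # Do Nothing: fuels usually persist
    ("S1", "a1"): {"S0": 1.0},             # Remove Fuels: reset to S0
}

def next_state_distribution(state, action):
    """Look up P(s' | s, a) for the two-state Pixel Forest MDP."""
    return P[(state, action)]
```

Even a two-state MDP like this exercises the full ⟨S, P0, A, R, γ, P⟩ machinery, which is what makes it a useful unit-test case before moving to the full wildfire domain.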