Statistical Model Checking for Markov Decision Processes

Probabilistic MC and Statistical MC - Probabilistic BLTL

The decision problem of MC in fully probabilistic settings is to determine whether, for a given parameter $\theta$,

$$P_\sigma(\{\pi : \pi \models \varphi\}) \le \theta$$

Proposition: This is a well-posed problem.

David Henriques (CMU) SMC for MDPs QEST’12 11 / 37

Probabilistic MC and Statistical MC - We should be so lucky...

We may not have a scheduler, but we still want to guarantee properties, so we make claims that hold for all schedulers, no matter how adversarial.

The (decision) problem of MC for MDPs is to determine whether, for a given parameter $\theta$,

$$P_\sigma(\{\pi : \pi \models \varphi\}) \le \theta \quad \text{for all } \sigma$$

SMC for MDPs - Summary

1 Markov Decision Processes
2 Probabilistic MC and Statistical MC
3 SMC for MDPs
4 Why does it work?
5 Experimental Validation

SMC for MDPs - Basic idea

“Learn the most adversarial scheduler (or a good enough approximation) by successively refining an initial guess”

SMC for MDPs - Scheduler Evaluation

Same ideas as classical Statistical Model Checking.

[Figure: evaluation pipeline. Inputs: a fully probabilistic system (the MDP plus a scheduler $\sigma$), a BLTL formula, e.g. $\varphi \equiv p_1 \, U^{<12} (G^{<10}(\neg p_3))$, and a probability threshold $\theta$. Stages: sample traces, evaluate traces, hypothesis testing, sufficient statistical evidence, answer.]
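
As a rough illustration of evaluating one fixed scheduler by sampling, the sketch below estimates the satisfaction probability from a fixed number of traces and compares the estimate with $\theta$. The slide only names "hypothesis testing"; the Hoeffding sample-size bound used here is a swapped-in, simpler technique, and the names `hoeffding_samples`, `evaluate_scheduler` and the callback `sample_trace_satisfies` are hypothetical.

```python
import math
import random
from typing import Callable, Tuple


def hoeffding_samples(delta: float, alpha: float) -> int:
    """Traces needed so that |estimate - p| <= delta with probability >= 1 - alpha."""
    return math.ceil(math.log(2.0 / alpha) / (2.0 * delta ** 2))


def evaluate_scheduler(sample_trace_satisfies: Callable[[], bool],
                       theta: float,
                       delta: float = 0.05,
                       alpha: float = 0.05) -> Tuple[bool, float]:
    """Estimate P(trace satisfies phi); report whether the estimate is <= theta."""
    n = hoeffding_samples(delta, alpha)
    hits = sum(1 for _ in range(n) if sample_trace_satisfies())
    estimate = hits / n
    return estimate <= theta, estimate


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in (assumption) for "simulate the MDP under sigma, check phi on the trace".
    holds, p_hat = evaluate_scheduler(lambda: rng.random() < 0.3, theta=0.5)
    print(holds, round(p_hat, 3))
```

The callback hides the model: any simulator that returns whether one sampled trace satisfies the formula can be plugged in. Estimates that land within `delta` of $\theta$ are inconclusive under this simplified scheme.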

SMC for MDPs - Scheduler Evaluation

For each state-action pair crossed by a sample, record whether that sample satisfied $\varphi$.

The empirical quality $\hat{Q}^\sigma$ of a visited $(s, a)$ is

$$\hat{Q}^\sigma(s, a) = \frac{\#\,(s, a)\ \text{seen in satisfying traces}}{\#\,\text{times}\ (s, a)\ \text{was seen}}$$

As the number of samples grows, the empirical quality converges to the true quality:

$$\hat{Q}^\sigma(s, a) \xrightarrow{\#\text{samples} \to \infty} Q^\sigma(s, a) \equiv P(\pi \models \varphi \mid (s, a) \in \pi)$$

Example (figure): state $s$ has three actions $a$, $b$ and $c$.
- $a$: 500 tries, 500 successes, so $Q(s, a) = 1$
- $b$: 1000 tries, 0 successes, so $Q(s, b) = 0$
- $c$: 700 tries, 525 successes, so $Q(s, c) = 3/4$
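
The counting behind the empirical quality can be sketched as follows. The trace representation (a list of crossed pairs plus a flag saying whether the trace satisfied the formula), the name `empirical_quality`, and the choice to count a pair once per trace are assumptions made for this example. The demo data reproduces the try and success counts from the slide's example.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Dict, Hashable, Iterable, List, Tuple

Pair = Tuple[Hashable, Hashable]   # a (state, action) pair
Trace = Tuple[List[Pair], bool]    # pairs crossed, and whether phi held


def empirical_quality(traces: Iterable[Trace]) -> Dict[Pair, Fraction]:
    """Q_hat(s, a) = (# satisfying traces through (s, a)) / (# traces through (s, a))."""
    seen = defaultdict(int)
    good = defaultdict(int)
    for pairs, satisfied in traces:
        for pair in set(pairs):    # assumption: count each pair once per trace
            seen[pair] += 1
            if satisfied:
                good[pair] += 1
    return {pair: Fraction(good[pair], n) for pair, n in seen.items()}


# Counts taken from the slide's example for state s.
example_traces = ([([("s", "a")], True)] * 500
                  + [([("s", "b")], False)] * 1000
                  + [([("s", "c")], True)] * 525
                  + [([("s", "c")], False)] * 175)
example_q = empirical_quality(example_traces)
```

Exact fractions are used so that the result for action `c` is exactly 3/4 rather than a rounded float.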

SMC for MDPs - Scheduler Improvement

The new scheduler $\sigma'$ is obtained from $\sigma$ by giving higher probability to transitions with higher quality.

Update rule:

$$\sigma'(s, a) = \frac{\hat{Q}^\sigma(s, a)}{\sum_{b \in \mathcal{A}} \hat{Q}^\sigma(s, b)}$$

Example (figure): with $Q(s, a) = 1$, $Q(s, b) = 0$ and $Q(s, c) = 3/4$,
- $\sigma'(s, a) = 1 / (1 + 3/4 + 0) = 4/7$
- $\sigma'(s, b) = 0$
- $\sigma'(s, c) = 3/7$
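
A minimal sketch of the update rule for a single state, assuming the qualities are given as a dictionary from action to value. The function name `improve` and the error raised when all qualities are zero are choices made for this example; the slide does not say what happens in that case.

```python
from fractions import Fraction
from typing import Dict


def improve(q_hat: Dict[str, Fraction]) -> Dict[str, Fraction]:
    """sigma'(s, a) = Q_hat(s, a) / sum_b Q_hat(s, b), for one fixed state s."""
    total = sum(q_hat.values())
    if total == 0:
        # Assumption: the rule is undefined when every quality is zero.
        raise ValueError("all qualities are zero")
    return {action: q / total for action, q in q_hat.items()}


# Qualities from the slide's example.
example_sigma = improve({"a": Fraction(1), "b": Fraction(0), "c": Fraction(3, 4)})
```

With the example values this yields 4/7 for `a`, 0 for `b` and 3/7 for `c`, matching the figure.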

SMC for MDPs - History and Greediness

What if we explore too little? In case there are state-action pairs such that $\hat{Q}(s, a) = 0$, keep a history parameter $h$ and update instead

$$\sigma'(s, a) = h\,\sigma(s, a) + (1 - h)\,\frac{\hat{Q}^\sigma(s, a)}{\sum_{b \in \mathcal{A}} \hat{Q}^\sigma(s, b)}$$

This avoids “blocking” transitions.

Example (figure): with the previous scheduler $\sigma(s, a) = \sigma(s, b) = \sigma(s, c) = 1/3$ and the plain update rule giving $4/7$, $0$ and $3/7$,
- $\sigma'(s, a) = 1/3 \cdot h + 4/7 \cdot (1 - h)$
- $\sigma'(s, b) = 1/3 \cdot h + 0 \cdot (1 - h) > 0$
- $\sigma'(s, c) = 1/3 \cdot h + 3/7 \cdot (1 - h)$
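
The history-weighted update can be sketched for one state as below. The function name `improve_with_history` and the value $h = 1/2$ in the demo are assumptions; the slide gives no concrete value for $h$.

```python
from fractions import Fraction
from typing import Dict


def improve_with_history(sigma: Dict[str, Fraction],
                         q_hat: Dict[str, Fraction],
                         h: Fraction) -> Dict[str, Fraction]:
    """sigma'(s, a) = h * sigma(s, a) + (1 - h) * Q_hat(s, a) / sum_b Q_hat(s, b)."""
    total = sum(q_hat.values())
    if total == 0:
        # Assumption: left undefined here, as the slide does not cover this case.
        raise ValueError("all qualities are zero")
    return {a: h * sigma[a] + (1 - h) * q_hat[a] / total for a in sigma}


uniform = {"a": Fraction(1, 3), "b": Fraction(1, 3), "c": Fraction(1, 3)}
qualities = {"a": Fraction(1), "b": Fraction(0), "c": Fraction(3, 4)}
blended = improve_with_history(uniform, qualities, h=Fraction(1, 2))
```

For $h = 1/2$ the blended probabilities are 19/42, 1/6 and 8/21 for `a`, `b` and `c`, so action `b` keeps a positive probability even though its empirical quality is zero.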

SMC for MDPs - History and Greediness

What if we explore too much? Keep a greediness parameter $\epsilon$ and give all probability to the best action except for $\epsilon$, which is distributed according to the update rule. This avoids slow updates.

Example (figure): starting from the update-rule values $4/7$, $0$ and $3/7$, with $a$ the best action, the figure gives
- $\sigma'(s, a) = \epsilon + 4/7 \cdot (1 - \epsilon)$
- $\sigma'(s, b) = 0 \cdot (1 - \epsilon)$
- $\sigma'(s, c) = 3/7 \cdot (1 - \epsilon)$

In the figure's formulas $\epsilon$ is the share added to the best action and $1 - \epsilon$ the share spread by the update rule, which is the reverse of the wording above.
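
The greedy mixture can be sketched with a single explicit parameter, here called `greedy_weight` (a hypothetical name): the share of probability handed directly to the best action, with the remainder spread according to the update-rule distribution. Reading the slide's sentence literally corresponds to `greedy_weight = 1 - epsilon`, while the formulas in the slide's figure correspond to `greedy_weight = epsilon`; the demo value 1/10 is an assumption.

```python
from fractions import Fraction
from typing import Dict


def greedy_mix(base: Dict[str, Fraction],
               greedy_weight: Fraction) -> Dict[str, Fraction]:
    """Give `greedy_weight` to the best action and spread the rest like `base`."""
    best = max(base, key=base.get)   # ties go to the first action encountered
    mixed = {a: (1 - greedy_weight) * p for a, p in base.items()}
    mixed[best] += greedy_weight
    return mixed


base = {"a": Fraction(4, 7), "b": Fraction(0), "c": Fraction(3, 7)}
eps = Fraction(1, 10)
as_in_figure = greedy_mix(base, eps)       # eps + 4/7 * (1 - eps) for action a
as_in_prose = greedy_mix(base, 1 - eps)    # (1 - eps) + 4/7 * eps for action a
```

Both readings produce a valid distribution; they differ only in how aggressively probability mass moves to the best action.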

SMC for MDPs - If at first you don’t succeed...

If $\sigma$ makes $P_\sigma(\{\pi : \pi \models \varphi\}) > \theta$, the property is surely false.

If not:
- we may be converging towards a local optimum;
- the property may be true.

SMC for MDPs - If at first you don’t succeed...

Algorithms like this are called “false-biased Monte Carlo algorithms”.

[Figure: Input → Algorithm → answer. A False answer can be trusted; a True answer has to be reconsidered a couple of times.]

Theorem: Confidence increases exponentially with the number of times we restart.
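
The exponential gain from restarts follows the usual argument for one-sided Monte Carlo algorithms, sketched below. It assumes that runs are independent and that, when the property is false, a single run finds a violating scheduler with probability at least `p`; the value of `p` and all function names are hypothetical, since the slide states the theorem without numbers.

```python
import math
from typing import Callable


def wrong_true_bound(p: float, restarts: int) -> float:
    """Upper bound on the chance that every one of `restarts` runs misses a violation."""
    return (1.0 - p) ** restarts


def restarts_needed(p: float, target: float) -> int:
    """Smallest k with (1 - p)**k <= target, for 0 < p < 1 and 0 < target < 1."""
    if not (0.0 < p < 1.0 and 0.0 < target < 1.0):
        raise ValueError("p and target must lie strictly between 0 and 1")
    return math.ceil(math.log(target) / math.log(1.0 - p))


def check_with_restarts(run_once: Callable[[], bool], restarts: int) -> bool:
    """Answer False as soon as one run refutes the property; otherwise True."""
    return all(run_once() for _ in range(restarts))
```

A False answer ends the loop immediately because it can be trusted, whereas a True answer only becomes credible after several independent runs agree.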

Section 4: Why does it work?

Why does it work? - Value

Definition [Value]: The value of a state $s$ under a scheduler $\sigma$ is defined as

$$V^\sigma(s) = P(\pi \models \varphi \mid (s, a) \in \pi,\ a \in A(s))$$

Notice that the MC problem can be reduced to finding the value at the initial state, $V^\sigma(s_i)$.

$$V^\sigma(s) = \sum_{a \in \mathcal{A}(s)} \sigma(s, a)\, Q^\sigma(s, a)$$
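
The weighted-sum form of the value can be checked numerically with a short sketch. The function name `state_value` is hypothetical, and the demo reuses the qualities 1, 0 and 3/4 from the earlier scheduler evaluation example while holding them fixed for both schedulers, which is a simplification made for illustration.

```python
from fractions import Fraction
from typing import Dict


def state_value(sigma_s: Dict[str, Fraction], q_s: Dict[str, Fraction]) -> Fraction:
    """V(s) = sum over actions a of sigma(s, a) * Q(s, a)."""
    return sum((sigma_s[a] * q_s[a] for a in sigma_s), Fraction(0))


q_s = {"a": Fraction(1), "b": Fraction(0), "c": Fraction(3, 4)}
uniform_s = {"a": Fraction(1, 3), "b": Fraction(1, 3), "c": Fraction(1, 3)}
improved_s = {"a": Fraction(4, 7), "b": Fraction(0), "c": Fraction(3, 7)}

value_uniform = state_value(uniform_s, q_s)     # 7/12
value_improved = state_value(improved_s, q_s)   # 25/28
```

Shifting probability towards higher-quality actions raises the weighted sum from 7/12 to 25/28 in this example.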

Why does it work? - Value

Definition [Local Update]: Let $\sigma$ and $\sigma'$ be two schedulers. The local update of $\sigma$ by $\sigma'$ in $s$, written $\sigma[\sigma(s) \to \sigma'(s)]$, is the scheduler that behaves like $\sigma$ everywhere but in $s$, where it behaves as $\sigma'$.

[Figure: schedulers $\sigma$ and $\sigma'$ at state $s$, and the local update $\sigma[\sigma(s) \to \sigma'(s)]$.]
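
A local update is easy to express once schedulers are stored as nested dictionaries from state to action to probability. That representation (memoryless schedulers) and the name `local_update` are assumptions made for this sketch.

```python
from fractions import Fraction
from typing import Dict

Scheduler = Dict[str, Dict[str, Fraction]]   # state -> action -> probability


def local_update(sigma: Scheduler, sigma_prime: Scheduler, s: str) -> Scheduler:
    """Return sigma[sigma(s) -> sigma'(s)]: sigma everywhere except at state s."""
    updated = {state: dict(dist) for state, dist in sigma.items()}   # copy, do not mutate
    updated[s] = dict(sigma_prime[s])
    return updated


sigma = {"s": {"a": Fraction(1, 2), "b": Fraction(1, 2)},
         "t": {"a": Fraction(1), "b": Fraction(0)}}
sigma_prime = {"s": {"a": Fraction(1), "b": Fraction(0)},
               "t": {"a": Fraction(0), "b": Fraction(1)}}
patched = local_update(sigma, sigma_prime, "s")
```

The result follows `sigma_prime` at `s` and `sigma` at `t`, and the two input schedulers are left unchanged.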

Why does it work? - Value

Theorem [SB]: Let $\sigma$ and $\sigma'$ be two schedulers such that $\forall s \in S : V^{\sigma[\sigma(s) \to \sigma'(s)]}(s) \ge V^\sigma(s)$. Then $\forall s \in S : V^{\sigma'}(s) \ge V^\sigma(s)$.

Corollary: Let $\sigma$ be the input scheduler and $\sigma'$ be the output of Scheduler Improvement. Then $\forall s \in S : V^{\sigma'}(s) \ge V^\sigma(s)$ and, in particular, $V^{\sigma'}(s_i) \ge V^\sigma(s_i)$.

Section 5: Experimental Validation

Experimental Validation

We divided models into three categories:
- Highly structured models
- Structured models
- Unstructured models

Comparisons were made against PRISM, a state-of-the-art probabilistic model checker.

Experimental Validation - Highly Structured Models

- CSMA: Carrier Sense, Multiple Access protocol
- WLAN: IEEE 802.11 wireless LAN protocol

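Experimental Validation - Highly Structured Models

For each model, the rows give the thresholds $\theta$ tried, the SMC output (out: T = true, F = false) and the running time t. The PRISM column gives the probability computed by PRISM and its running time.

| Model | Row | 1 | 2 | 3 | 4 | 5 | PRISM |
|---|---|---|---|---|---|---|---|
| CSMA 3 4 | θ | 0.5 | 0.8 | 0.85 | 0.9 | 0.95 | |
| | out | F | F | F | T | T | 0.86 |
| | t | 1.7 | 11.5 | 35.9 | 115.7 | 111.9 | 136 |
| CSMA 3 6 | θ | 0.3 | 0.4 | 0.45 | 0.5 | 0.8 | |
| | out | F | F | F | T | T | 0.48 |
| | t | 2.5 | 9.4 | 18.8 | 133.9 | 119.3 | 2995 |
| CSMA 4 4 | θ | 0.5 | 0.7 | 0.8 | 0.9 | 0.95 | |
| | out | F | F | F | F | T | 0.93 |
| | t | 3.5 | 3.7 | 17.5 | 69.0 | 232.8 | 16244 |
| CSMA 4 6 | θ | 0.5 | 0.7 | 0.8 | 0.9 | 0.95 | |
| | out | F | F | F | F | F | timeout |
| | t | 3.7 | 4.1 | 4.2 | 26.2 | 258.9 | timeout |
| WLAN 5 | θ | 0.1 | 0.15 | 0.2 | 0.25 | 0.5 | |
| | out | F | F | T | T | T | 0.18 |
| | t | 4.9 | 11.1 | 124.7 | 104.7 | 103.2 | 1.6 |
| WLAN 6 | θ | 0.1 | 0.15 | 0.2 | 0.25 | 0.5 | |
| | out | F | F | T | T | T | 0.18 |
| | t | 5.0 | 11.3 | 127.0 | 104.9 | 102.9 | 1.6 |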
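
One way to read these results is to check each SMC answer against the PRISM value: the property "probability at most $\theta$" should come out true exactly when the PRISM probability does not exceed $\theta$. The sketch below does that for the rows on this slide. It assumes that the PRISM figure is the exact maximum probability over schedulers; the names `ROWS`, `agrees` and `disagreements` are hypothetical.

```python
from typing import List, Tuple

# (model, thresholds, SMC outputs, PRISM probability), copied from the slide.
# CSMA 4 6 is left out because PRISM timed out and reports no probability.
ROWS: List[Tuple[str, List[float], str, float]] = [
    ("CSMA 3 4", [0.5, 0.8, 0.85, 0.9, 0.95], "FFFTT", 0.86),
    ("CSMA 3 6", [0.3, 0.4, 0.45, 0.5, 0.8], "FFFTT", 0.48),
    ("CSMA 4 4", [0.5, 0.7, 0.8, 0.9, 0.95], "FFFFT", 0.93),
    ("WLAN 5", [0.1, 0.15, 0.2, 0.25, 0.5], "FFTTT", 0.18),
    ("WLAN 6", [0.1, 0.15, 0.2, 0.25, 0.5], "FFTTT", 0.18),
]


def agrees(theta: float, out: str, prism_probability: float) -> bool:
    """True when the SMC answer matches 'PRISM probability <= theta'."""
    return (out == "T") == (prism_probability <= theta)


def disagreements() -> List[Tuple[str, float]]:
    """List every (model, theta) where the SMC answer and PRISM differ."""
    return [(model, theta)
            for model, thetas, outs, prob in ROWS
            for theta, out in zip(thetas, outs)
            if not agrees(theta, out, prob)]
```

On the rows listed, `disagreements()` returns an empty list, so every SMC answer is consistent with the PRISM probability wherever PRISM finished.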

Experimental Validation - Highly Structured Models

Takeaways:
- Symmetry makes the number of “meaningful” actions relatively small.
- SMC works well in highly structured systems.
- Exact methods still work best in most cases.

Experimental Validation - Structured Models

Motion Planning: two robots move around an n by n plant.

$$P_{\le \theta}\Big( \big[\mathit{Safe}_1 \, U^{\le 30} \big(\mathit{pickup}_1 \wedge [\mathit{Safe}'_1 \, U^{\le 30} \mathit{RendezVous}]\big)\big] \wedge \big[\mathit{Safe}_2 \, U^{\le 30} \big(\mathit{pickup}_2 \wedge [\mathit{Safe}'_2 \, U^{\le 30} \mathit{RendezVous}]\big)\big] \Big)$$


Statistical Model Checking for Markov Decision Processes
David Henriques
Joint work with João Martins, Paolo Zuliani, André Platzer and Edmund M. Clarke
QEST, September 18th, 2012
