SLIDE 1

Hybrid System Falsification and Reinforcement Learning

Formal Method for Cyber-Physical Systems

Clovis Eberhart David Sprunger

National Institute of Informatics, Japan

SOKENDAI lesson, July 1, 8, and 22

1 / 31

SLIDE 2

Quick reminder

Falsification:

• a method to find counterexamples to a property,
• useful in the world of formal methods,
• a black-box method,
• relies on optimisation algorithms.

Hybrid system:

• continuous and discrete parameters,
• non-linear behaviour,
• very expressive.

Formulas:

• expressed in a temporal logic,
• boolean and robustness semantics.

2 / 31

SLIDE 3

1. Refining robustness
2. Time staging
3. Coverage-based falsification

3 / 31

SLIDE 4

Table of Contents

1. Refining robustness
2. Time staging
3. Coverage-based falsification

4 / 31


SLIDE 6

Refining robustness

Why?

• more expressivity (i.e., finer modelling)
• more techniques (e.g., optimisation techniques work better)

Attention

more expressivity ⇝ more complex algorithms (here, however, only sliding-window algorithms)

5 / 31


SLIDE 10

Space-time robustness

Donzé, A. and Maler, O. Robust satisfaction of temporal logic over real-valued signals. FORMATS 2010.

Until now, robustness is spatial. Problems:
• all these signals satisfy ✸_[a,b] x > 0 with the same robustness,
• the similarity between these two signals is lost when computing ρ(σ, ✸_[a,b] x > 0)
⇝ missing a temporal component.

6 / 31

SLIDE 11

Adding time

Assumption: set P = {p1, . . . , pn} of atomic propositions. Standard boolean semantics: χ(σ, ϕ, t).

Time robustness

θ−(σ, p, t) = χ(σ, p, t) · max{d ≥ 0 | ∀t′ ∈ [t − d, t]. χ(σ, p, t′) = χ(σ, p, t)}
θ+(σ, p, t) = χ(σ, p, t) · max{d ≥ 0 | ∀t′ ∈ [t, t + d]. χ(σ, p, t′) = χ(σ, p, t)}
θs(σ, ¬ϕ, t) = −θs(σ, ϕ, t)
. . .
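As a discrete-time sketch (assuming the boolean signal is sampled at a fixed step dt and encoded as ±1 values, like the characteristic function χ; the function names are illustrative, not from the slides), θ+ and θ− can be computed by scanning how long the current truth value persists:

```python
def theta_plus(chi, t, dt=1.0):
    """Right time robustness: signed duration for which the sampled
    boolean signal chi (a list of +1/-1 values) keeps its value at index t."""
    d = 0
    while t + d + 1 < len(chi) and chi[t + d + 1] == chi[t]:
        d += 1
    return chi[t] * d * dt

def theta_minus(chi, t, dt=1.0):
    """Left time robustness: signed duration since chi last changed value."""
    d = 0
    while t - d - 1 >= 0 and chi[t - d - 1] == chi[t]:
        d += 1
    return chi[t] * d * dt
```

For chi = [1, 1, 1, -1], theta_plus at index 0 is 2·dt: the signal stays true for two more samples before flipping, and the +1 sign records that it is currently true.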

7 / 31

SLIDE 12

Interpreting θ+ and θ−

θ+(σ, ϕ, t) = s > 0: σ ⊨ ϕ for at least time s
θ+(σ, ϕ, t) = s < 0: σ ⊭ ϕ for at least time s
θ−(σ, ϕ, t) = s > 0: σ ⊨ ϕ since at least time s
θ−(σ, ϕ, t) = s < 0: σ ⊭ ϕ since at least time s

8 / 31


SLIDE 14

Space-time Robustness

Assumption: atomic propositions are functions (e.g., x² + y²). Standard robustness semantics: ρ(σ, ϕ, t).

Space-time robustness

For any c ∈ R:
θ+_c(σ, f, t) = θ+(χ_c(σ, f, t)),
θ−_c(σ, f, t) = θ−(χ_c(σ, f, t)),
θs_c(σ, ¬ϕ, t) = −θs_c(σ, ϕ, t).
. . .
Interpretation: θ+_c(σ, ϕ, t) = s > 0: ρ(σ, ϕ, t) > c for at least time s, . . .
Remarks:
• hopefully more efficient
• how to choose c?
• not more expressive
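A minimal sketch of θ+_c on sampled data: threshold the real-valued signal at level c into a ±1 signal, then measure how long it stays on the same side (the names and the fixed sampling step dt are assumptions, not from the slides):

```python
def chi_c(samples, c):
    """Threshold a sampled real-valued signal at level c into a +1/-1 signal."""
    return [1 if v > c else -1 for v in samples]

def theta_plus_c(samples, c, t, dt=1.0):
    """Space-time robustness theta^+_c: right time robustness of the
    c-thresholded signal at sample index t."""
    chi = chi_c(samples, c)
    d = 0
    while t + d + 1 < len(chi) and chi[t + d + 1] == chi[t]:
        d += 1
    return chi[t] * d * dt
```

This also makes the "how to choose c?" remark concrete: each c yields a different thresholded signal, hence a different time robustness.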

9 / 31

SLIDE 15

More flexibility

Akazaki, T. and Hasuo, I. Time robustness in MTL and expressivity in hybrid system falsification. CAV 2015.

Spatial robustness:
Temporal robustness:

10 / 31

SLIDE 16

AvSTL

Syntax

AP ::= x < r | x ≤ r | x > r | x ≥ r
ϕ ::= ⊤ | ⊥ | AP | ¬ϕ | ϕ ∨ ϕ | ϕ ∧ ϕ | ϕ U_I ϕ | ϕ R_I ϕ | ϕ Ū_I ϕ | ϕ R̄_I ϕ
(the overlined Ū_I, R̄_I are the averaged versions of until and release)

Semantics

ρ+(σ, x < r, t) = max{0, r − σ(x)(t)}
ρ−(σ, x < r, t) = min{0, r − σ(x)(t)}
. . .
ρ+(σ, ¬ϕ, t) = ρ−(σ, ϕ, t)
ρ+(σ, ϕ Ū_[a,b] ψ, t) = (1 / (b − a)) ∫_a^b ρ+(σ, ϕ U_{[a,b]∩[0,τ]} ψ, t) dτ
. . .
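The averaged modalities replace the single max over a window by an average of best-so-far values. A discrete sketch of an averaged "eventually" over sample indices a..b (a Riemann-sum stand-in for the integral above; the function name is illustrative):

```python
def rho_avg_eventually(values, a, b):
    """Discrete averaged-eventually: mean over tau in [a, b] of
    max(values[a..tau]), approximating (1/(b-a)) * integral."""
    best = float("-inf")
    acc = 0.0
    for tau in range(a, b + 1):
        best = max(best, values[tau])  # robustness of F over the prefix [a, tau]
        acc += best
    return acc / (b - a + 1)
```

Unlike a plain max, this rewards signals that cross the threshold early, which is exactly the temporal information the averaged semantics is meant to keep.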

11 / 31

SLIDE 17

Example

Robustnesses: ρ+, ρ−
ϕ = x ≥ 0:
ϕ = F_I (x ≥ 0):
Consequences:
• temporal aspects
• spatial aspects

12 / 31

SLIDE 18

Expressivity

expeditiousness: F_[0,a] ϕ
deadline: F_[0,a] ϕ ∨ F_[a,b] ϕ
persistence: G_[0,a] ϕ ∧ G_[a,b] ϕ

13 / 31

SLIDE 19

Experimental results

14 / 31

SLIDE 20

Table of Contents

1. Refining robustness
2. Time staging
3. Coverage-based falsification

15 / 31

SLIDE 21

Time staging

Zhang, Z., Ernst, G., Sedwards, S., Arcaini, P., and Hasuo, I. Two-Layered Falsification of Hybrid Systems Guided by Monte Carlo Tree Search. EMSOFT 2018.
Ernst, G., Sedwards, S., Zhang, Z., and Hasuo, I. Fast Falsification of Hybrid Systems using Probabilistically Adaptive Input. QEST 2019.

Idea

σout causally dependent on σin
optimisation methods blind to this dependence
⇝ modify the algorithm to take it into account

16 / 31

SLIDE 22

A picture is worth a thousand words

17 / 31

SLIDE 23

High-Level Algorithm

Alternate between:
• Monte-Carlo Tree Search to find a good zone,
• hill-climbing to find a good point in the zone.
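The hill-climbing half can be sketched as a simple coordinate search that greedily lowers the robustness value (the step size, iteration budget, and names are assumptions, not the paper's implementation):

```python
def hill_climb(rho, u0, step=0.1, iters=100):
    """Coordinate hill-climbing: nudge each coordinate of u by +/-step
    whenever that lowers the robustness value rho(u)."""
    u = list(u0)
    best = rho(u)
    for _ in range(iters):
        improved = False
        for i in range(len(u)):
            for delta in (step, -step):
                cand = list(u)
                cand[i] += delta
                val = rho(cand)
                if val < best:
                    u, best = cand, val
                    improved = True
        if not improved:
            break  # local minimum at this step size
    return u, best
```

A negative returned `best` would mean the candidate input falsifies the property within the explored zone.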

18 / 31

SLIDE 24

Monte-Carlo Tree Search

Each node is equipped with:
• a robustness estimate,
• a number of visits.
To choose a node, balance between:
• an exploitation score (bigger with smaller robustness estimates),
• an exploration score (bigger with fewer visits to the node).
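This balance is the classic UCB1 trade-off from bandit algorithms. A sketch of such a node score, where the sign flip makes smaller robustness estimates more attractive (the normalisation and names are hypothetical, not from the paper):

```python
import math

def node_score(rob_estimate, visits, parent_visits, c=math.sqrt(2)):
    """UCB1-style score: the exploitation term grows as the robustness
    estimate shrinks; the exploration term grows when visits are few."""
    exploit = -rob_estimate
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore
```

Tree search would descend by repeatedly picking the child maximising this score.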

19 / 31

SLIDE 25

Robustness estimates

To get robustness estimates: complete the signal by pure hill-climbing. For example, for a newly-expanded node:

20 / 31

SLIDE 26

Experimental results

Interpretation: MCTS explores more, so:
• better results on hard problems,
• slower on simple problems.

21 / 31

SLIDE 27

Adaptive Las Vegas Tree Search

To build signal σ incrementally:
• randomly choose a level l of "granularity" (initially, low granularity is favoured),
• choose σ′ = D_l(σ), where D_l chooses "finer" signals for large l (shorter time, more precise value),
• adapt D_l according to ρ(σσ′, ϕ, t).
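The adaptive level choice can be sketched as follows: each granularity level carries a weight, levels are drawn proportionally to their weights, and a level's weight is increased when it led to low robustness (the weights and update rule are illustrative assumptions, not the QEST 2019 scheme):

```python
import random

def choose_level(weights):
    """Draw a level index with probability proportional to its weight."""
    total = sum(weights)
    r = random.uniform(0.0, total)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return len(weights) - 1

def reward_level(weights, level, bonus=0.1):
    """Increase a level's weight after it helped lower the robustness."""
    weights[level] += bonus
```

Starting with most weight on coarse levels matches the slide's remark that low granularity is initially favoured.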

22 / 31

SLIDE 28

Experimental results

Interpretation:
• falsifying signals are often coarse, or slight variations of such, so they are explored very fast by this algorithm,
• robustness scores that concern discrete variables are hard for optimisation algorithms to handle (not continuous).

23 / 31

SLIDE 29

Table of Contents

1. Refining robustness
2. Time staging
3. Coverage-based falsification

24 / 31

SLIDE 30

Idea

Adimoolam, A., Dang, T., Donzé, A., Kapinski, J., and Jin, X. Classification and coverage-based falsification for embedded control systems. CAV 2017.

Trade-off between exploration and exploitation:
• define a coverage metric of the input space,
• alternate between:
  • a global search to classify the search space into zones,
  • local searches on the promising zones to converge to a minimum.

25 / 31

SLIDE 31

High-level algorithm

Input: tmax
Output: a u such that M(u) ⊭ ϕ

S = sample N points at random;
R = zones(S);
while t < tmax do
    subdivide(R);
    S += biased-sampling(R);
    S += singularity-sampling(R);
    S += local-search(R);
end
for u in S do
    if ρ(u) < 0 then return u end
end
return None
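A runnable skeleton of this loop, with the subdivision and the three sampling strategies collapsed into plain random sampling as a stand-in (`model_rho` and `sample` are assumed callables, not names from the paper):

```python
def falsify(model_rho, sample, t_max, n_init=20):
    """Coverage-based falsification skeleton: gather candidate inputs,
    then return the first one whose robustness is negative."""
    S = [sample() for _ in range(n_init)]
    for _ in range(t_max):
        # Full algorithm: subdivide(R), then biased, singularity and
        # local-search sampling on promising zones. Stand-in: one more sample.
        S.append(sample())
    for u in S:
        if model_rho(u) < 0:
            return u  # counterexample: M(u) does not satisfy phi
    return None
```

The three sampling strategies only change how `S` is grown; the falsification check at the end is the same.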

26 / 31

SLIDE 32

Subdivision

Goal: divide the search space into rectangles with different average robustnesses.
Input: R a list of rectangles, S a list of sampled points, K a threshold
Output: a list of subdivided rectangles

for r in R do
    pop(R, r);
    if |S ∩ r| > K then
        H = argmin_H(Γ_H(R, S), H a hyperplane);
        push(R, r ∩ H−, r ∩ H+);
    end
end

Γ_(d,r,p)(R, S) = Σ_{x ∈ S∩R} e_(d,r,p)(x)
e_(d,r,p)(x) = max{p(ρ(x) − µ)(x_d − r), 0}
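The partition score above can be computed directly from samples. A sketch where `points` pairs each sample x with its robustness ρ(x), µ is the mean robustness in the rectangle, and (d, r, p) describe a candidate axis-aligned cut x_d = r with polarity p ∈ {−1, +1} (the parameter names follow the slide; the data layout is an assumption):

```python
def gamma_score(points, mu, d, r, p):
    """Sum of e_(d,r,p)(x) = max(p * (rho(x) - mu) * (x_d - r), 0)
    over sampled points (x, rho); the subdivision step selects the
    hyperplane minimising this score, per the argmin in the slide."""
    return sum(max(p * (rho - mu) * (x[d] - r), 0.0) for x, rho in points)
```

A term contributes only when a point's robustness deviation and its side of the cut agree in the direction p penalises, so the minimising cut best separates low- from high-robustness samples.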

27 / 31

SLIDE 33

Samplings

Biased sampling

Goal: increase coverage and decrease robustness.
Idea: sample according to a weighted sum of two distributions:
• P^i_c: proportional to the number of unoccupied cells in rectangle R_i,
• P^i_r: takes into consideration how the robustness of sampled points varies from the average.

Singularity sampling

Goal: sample more in rectangles with “singular” samples (robustness much lower than average in rectangle).

28 / 31

SLIDE 34

Local search

Goal: converge to a minimum faster by using local search with a good seed.

29 / 31

SLIDE 35

Experimental results

Interpretation: other methods got caught in local minima.

30 / 31

SLIDE 36

Conclusion

different notions of robustness:

• can be more expressive,
• can make algorithms more efficient.

time staging:

• explores more, hence can solve harder problems.

coverage-based falsification:

• theoretical result (if there exists an ε-robust counterexample, there is a grid size such that the algorithm will find it),
• coverage helps falsification by exploring more, thus avoiding local minima.

31 / 31