SLIDE 1

Evaluating LTL Satisfiability Solvers

Viktor Schuppan

Supported by the Provincia Autonoma di Trento (project EMTELOS).

j.w.w. Luthfi Darmawan

Supported by the European Master’s Program in Computational Logic (EMCL).

ATVA’11, Taipei, Taiwan, October 11–14, 2011

slide-2
SLIDE 2

LTL Satisfiability – (Our) Motivation

Verification gains momentum ⇒ specifications become an object of interest.

Investigation of specifications:
  • Property Simulation (e.g., RAT [PSC+06])
    – Example traces, possibly with constraints
    – Makes properties executable
  • Property Assurance (e.g., RAT [PSC+06])
    – Possibilities, assertions
  • Sanity Checks (e.g., RAT [PSC+06]; [RV10]; [FKSFV08])
    – Satisfiability, non-validity, non-redundancy

These boil down to LTL satisfiability. (Note: checks beyond satisfiability are important, too.)
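As a toy illustration of what an LTL satisfiability solver decides (and not how any of the tools compared here work), the following self-contained Python sketch evaluates LTL over ultimately periodic words u·v^ω and brute-forces a small lasso model; the tuple encoding and all names are our own.

```python
from itertools import product

# Formulas as nested tuples:
#   ("true",)  ("ap", "a")  ("not", f)  ("and", f, g)  ("or", f, g)
#   ("X", f)   ("U", f, g)
def F(f):  # F f := true U f
    return ("U", ("true",), f)

def G(f):  # G f := not F not f
    return ("not", F(("not", f)))

def holds(f, trace, loop, i=0):
    """Truth of f at position i of the lasso: after trace[-1] the word
    continues at position `loop`."""
    n = len(trace)
    nxt = lambda j: j + 1 if j + 1 < n else loop
    op = f[0]
    if op == "true":
        return True
    if op == "ap":
        return f[1] in trace[i]
    if op == "not":
        return not holds(f[1], trace, loop, i)
    if op == "and":
        return holds(f[1], trace, loop, i) and holds(f[2], trace, loop, i)
    if op == "or":
        return holds(f[1], trace, loop, i) or holds(f[2], trace, loop, i)
    if op == "X":
        return holds(f[1], trace, loop, nxt(i))
    if op == "U":
        seen, j = set(), i
        while j not in seen:       # positions start repeating after <= n steps
            seen.add(j)
            if holds(f[2], trace, loop, j):
                return True
            if not holds(f[1], trace, loop, j):
                return False
            j = nxt(j)
        return False               # cycled without ever reaching f[2]
    raise ValueError(op)

def sat_bounded(f, aps, max_len=3):
    """Search for an ultimately periodic model with at most max_len states."""
    states = [frozenset(a for a, b in zip(aps, bits) if b)
              for bits in product([False, True], repeat=len(aps))]
    for n in range(1, max_len + 1):
        for trace in product(states, repeat=n):
            for loop in range(n):
                if holds(f, trace, loop):
                    return trace, loop
    return None  # no model up to the bound (not a proof of unsatisfiability)

a = ("ap", "a")
print(sat_bounded(("and", G(a), F(("not", a))), ["a"]))  # unsatisfiable: None
print(sat_bounded(("and", F(a), F(("not", a))), ["a"]))  # finds a lasso
```

Real solvers replace this exponential enumeration with tableaux, temporal resolution, or reductions to model checking, which is exactly what the rest of the talk compares.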

Author: V. Schuppan

SLIDE 3

LTL Satisfiability – (Our) Motivation

Some specific triggers:
  • Antichains for LTL satisfiability ([WDMR08])
    – Claims an advantage of ALASKA over NuSMV-BDD
  • LTL satisfiability solver comparison by Rozier and Vardi ([RV10])
    – Focus on LTL satisfiability via explicit and BDD-based symbolic model checking, i.e., no SAT-based, temporal-resolution, or tableau-based tools
  • Interest in Temporal Resolution and its potential for extraction of unsatisfiable cores ([Sch10])

No recent evaluation of LTL satisfiability solvers using a broad range of algorithms, a broad range of benchmarks, and comprehensive criteria is available.

SLIDE 4

Objective

Objective: compare the performance of off-the-shelf solvers for propositional LTL satisfiability.

This is a comparison of tools (as opposed to one of algorithms):
  • Different features (e.g., preprocessing/simplification).
  • Different maturity.
  • Different programming languages.

SLIDE 5

Outline


  • 1. Introduction
  • 2. LTL Solvers
  • 3. Benchmarks
  • 4. Methodology
  • 5. Findings
  • 6. Conclusions

SLIDE 6

Selection of LTL Solvers

Reduction to Model Checking
  • Selected: ALASKA, NuSMV-BDD, NuSMV-SBMC
  • Ruled out: explicit state model checkers [RV10], Cadence SMV (BDDs) [RV10], SAL [RV10], VIS (BDDs)
  • The last hardware model checking competition [HWMCC10] focuses on safety

Tableau-Based Algorithms
  • Selected: LWB, pltl
  • Ruled out: TWB, LTL Tableau [GKS09]

Temporal Resolution
  • Selected: TRP++, TSPASS
  • Ruled out: TeMP [HKR+04,LH10]

SLIDE 7

Selection of Benchmarks

  • Use families from previous comparisons [WDMR08,RV10,HS02].
    – But: restrict the number of instances in the random category.
  • Add families not used for LTL satisfiability before: acacia, amba, forobots.
  • Create new families: O1formula, O2formula, phltl.
  • Scale up families.
  • Add variants that enforce non-trivial behavior.

To my knowledge this is the most comprehensive set of benchmarks used to compare propositional LTL satisfiability solvers to date.
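To give a flavor of how scalable crafted families are produced, here is a hypothetical generator sketch for two classic scalable pattern shapes (a conjunction of eventualities and of recurring eventualities). The function names, the textual syntax, and the choice of patterns are ours for illustration; the actual rozier pattern and schuppan families use their own, different definitions.

```python
def e_pattern(n):
    """Conjunction of eventualities: F p1 & ... & F pn (satisfiable)."""
    return " & ".join(f"(F p{i})" for i in range(1, n + 1))

def c_pattern(n):
    """Recurring eventualities: G F p1 & ... & G F pn (satisfiable)."""
    return " & ".join(f"(G (F p{i}))" for i in range(1, n + 1))

# Scaling n yields ever larger instances of the same shape:
print(e_pattern(3))  # (F p1) & (F p2) & (F p3)
```

Families generated this way are useful precisely because their difficulty can be dialed up while the semantic structure stays fixed.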

SLIDE 8

Benchmarks

Family               | Description                                                              | #Inst./uns. | Max. |φ| | Source

Category application:
acacia               | Arbiters and traffic light controllers                                   | 71/–        | 426      | [FJR09]
alaska lift          | Elevator specifications                                                  | 136/34      | 4450     | [WDMR08]
alaska szymanski     | Mutual exclusion protocol                                                | 4/–         | 183      | [WDMR08]
anzu amba            | Microcontroller buffer architecture                                      | 51/–        | 6173     | [BGJ+07a]
anzu genbuf          | Generalized buffer                                                       | 60/–        | 5805     | [BGJ+07b]
forobots             | Model of a robot with properties                                         | 39/25       | 636      | [BDF09]

Category crafted:
rozier counter       | 4 variants of a serial counter                                           | 78/–        | 751      | [RV10]
rozier pattern       | 8 scalable patterns to trigger difficulties in LTL to Büchi translators  | 244/–       | 7992     | [RV10]
schuppan O1/2formula | Exponential behavior in some solvers                                     | 54/42       | 6001     | (new)
schuppan phltl       | Temporal variant of pigeonhole                                           | 18/10       | 40501    | (new)

Category random:
rozier formulas      | Obtained by generating a syntax tree [DGV99]                             | 2000/57     | 185      | [RV10]
trp                  | Obtained by lifting propositional CNF into a fixed temporal structure    | 970/397     | 1422     | [HS02]

SLIDE 9

Setup, Flow

Flow
  • Preliminary Stage
    – Purpose: reduce the number of configurations for TRP++ and TSPASS
    – 10 second time out
  • Main Stage
    – All remaining configurations for all solvers
    – 60 second time out
  • Graphical Evaluation
    – Choose one winning configuration per solver based on highest score

Setup
  • Hardware/Software
    – Intel Xeon 3.0 GHz
    – 4 GB memory
    – Red Hat Linux 5.4, 64 bit kernel 2.6.18
  • Measure time, memory with run [BJ]
  • No shuffling of benchmarks
  • One run per instance and solver configuration
  • Memory out: 2 GB

SLIDE 10

Scoring

Objective:
  • Score by highest number of solved instances; break ties by lower time taken on solved instances. (Frequently used.)

Problem:
  • Benchmark families with very different numbers of instances.
  • The smallest family has 4 instances; the largest has > 2000.

Solution:
  • Arrange benchmark families in a tree.
  • Assign equal weight to the (immediate) children of each node.

Caveat:
  • The weight of an instance may change between different scores, e.g., share of solved instances (all instances count) vs. run time on solved instances (only solved instances count).
  • The weight of an instance may change between different solvers for the same score, e.g., run time on solved instances (only instances solved by the particular solver count).
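The tree-weighted score can be sketched in a few lines; the family tree and solved/unsolved flags below are made-up placeholders, not the experimental data.

```python
# Nested dicts form the benchmark tree; leaves are True (solved) / False.
tree = {
    "application": {"acacia": {"i1": True, "i2": False},
                    "forobots": {"i1": True}},
    "random": {"trp": {f"i{k}": k % 2 == 0 for k in range(8)}},
}

def weighted_share(node):
    """Equal weight to the immediate children of each node, recursively,
    so a 4-instance family counts as much as a 2000-instance one."""
    if isinstance(node, bool):          # a single instance
        return 1.0 if node else 0.0
    return sum(weighted_share(c) for c in node.values()) / len(node)

print(weighted_share(tree))  # 0.625
```

Note how the eight trp instances together carry the same weight as the two-instance acacia family at their respective levels, which is exactly the point of the tree weighting.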

SLIDE 11

Cactus Plots vs. Contour/Discrete Raw Data Plots

Cactus Plots
  • are standard;
  • easily allow identifying the winner when ranking by highest number of solved instances with ties broken by time spent on solved instances;
  • break the correlation between different solvers on the same instance.

Contour/Discrete Raw Data Plots
  • retain the correlation between different solvers on the same instance;
  • easily allow identifying similar and complementary behavior;
  • easily show the performance of a solver on subfamilies;
  • easily show the difficulty of instances and subfamilies.

SLIDE 12

Correctness

We found 1 or 2 bugs each in ALASKA, NuSMV, TRP++, and TSPASS.
  • Kindly fixed quickly by the tool authors.

We also found bugs in LWB.
  • We contacted the developers but received no response.
  • 187 out of 7446 instances (known) buggy:
    – 13 wrong results;
    – the others abnormal termination.
  • Hors concours.

SLIDE 13

Winning Configurations per Tool

We select one winning configuration (column max) per tool. Model construction disabled (sat and unsat instances).

tool       | winning configuration                            | max   | min   | vbs
ALASKA     | noc nos nob                                      | 0.581 | 0.322 | 0.595
LWB        | mod                                              | 0.740 | 0.656 | 0.800
NuSMV-BDD  | dcx fflt dyn elbwd                               | 0.743 | 0.607 | 0.823
NuSMV-SBMC | nodcx c                                          | 0.723 | 0.651 | 0.726
pltl       | tree                                             | 0.694 | 0.687 | 0.702
TRP++      | s r noal bfs nop fsr                             | 0.752 | 0.593 | 0.776
TSPASS     | ext nogrp nosev sub nosls rfmrr norbmrr nomod mor | 0.667 | 0.479 | 0.670

Numbers: weighted share of solved instances. vbs: virtual best solver.

SLIDE 14

Run Times (Contour/Discrete Raw Data Plots)

[Figure: run times as contour/discrete raw data plots, one row per solver (ALASKA, LWB, NuSMV-BDD, NuSMV-SBMC, pltl, TRP++, TSPASS) and one panel per category: application (acacia, alaska-lift, alaska-szymanski, anzu-amba, anzu-genbuf, forobots), crafted (rozier-counter, rozier-pattern, schuppan), random (rozier-formulas, trp).]

Legend: ≤ 0.1 sec; > 0.1 sec, ≤ 1 sec; > 1 sec, ≤ 10 sec; > 10 sec, ≤ 60 sec; unsolved.

SLIDE 15

Run Times Sat vs. Unsat Instances

[Figure: per-solver plots (ALASKA, LWB, NuSMV-BDD, NuSMV-SBMC, pltl, TRP++, TSPASS) of run time (y-axes) against number of solved instances (x-axes), shown separately for satisfiable and unsatisfiable instances of the alaska-lift, forobots, rozier-formulas, and trp families. Instances selected to “equalize” features.]

SLIDE 16

Run Times by Instance Size

[Figure: per-solver plots (ALASKA, LWB, NuSMV-BDD, NuSMV-SBMC, pltl, TRP++, TSPASS) of run time (y-axes) against instance size (x-axes) for the application, crafted, rozier-formulas, and trp benchmarks.]

SLIDE 17

Model Sizes

[Figure: model sizes in number of states (1 to 20480, log scale) over all satisfiable instances, for ALASKA, LWB, NuSMV-BDD, NuSMV-SBMC, TSPASS, and the virtual best solver (vbs).]

SLIDE 18

Potential of a Portfolio Solver (Perfect Oracle)

[Figure: for every portfolio combination of the seven solvers, weighted share of solved instances (left y-axis, 0.5–1) and weighted average run time on solved instances (right y-axis, 0.1–10 sec, log scale).]

Mode: perfect oracle selects best portfolio member for any given instance. Best case scenario for portfolio solvers without communication between portfolio members. Unrealistic. Left y-axis: weighted share of solved instances. Right y-axis: weighted average run time on solved instances [seconds].
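Computing the perfect-oracle portfolio from per-instance run times is straightforward; the solver names are real, but the times below are invented placeholders for illustration.

```python
# times[solver][instance] = run time in seconds, None = unsolved.
times = {
    "LWB":   {"i1": 0.2, "i2": None, "i3": 5.0},
    "TRP++": {"i1": 3.0, "i2": 0.7,  "i3": None},
}

def perfect_oracle(times):
    """Per instance, the oracle picks the fastest member that solves it
    (None if no member does)."""
    instances = next(iter(times.values()))
    best = {}
    for inst in instances:
        solved = [t[inst] for t in times.values() if t[inst] is not None]
        best[inst] = min(solved) if solved else None
    return best

print(perfect_oracle(times))  # {'i1': 0.2, 'i2': 0.7, 'i3': 5.0}
```

In this toy example the oracle portfolio solves all three instances even though neither member does on its own, which is the complementarity the plot quantifies.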

SLIDE 19

Potential of a Portfolio Solver (Perfect Task Switcher)

[Figure: for every portfolio combination of the seven solvers, weighted share of solved instances (left y-axis, 0.5–1) and weighted average run time on solved instances (right y-axis, 0.1–10 sec, log scale), under the perfect task switcher.]

Mode: task switching with no overhead and infinitely small time slices between portfolio members. Reference case scenario for portfolio solvers without communication between portfolio members. Should be beaten. Left y-axis: weighted share of solved instances. Right y-axis: weighted average run time on solved instances [seconds].
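Under these idealized assumptions the switcher's per-instance time has a closed form: with k members sharing the processor equally, the fastest member finishes after k times its stand-alone run time. A small sketch (our own formulation, with the 60-second time out from the setup):

```python
def switcher_time(member_times, timeout=60.0):
    """Idealized round-robin over k members with infinitesimal slices:
    each member runs at 1/k speed, so the instance is solved at
    k * min(t_i). None marks members that never solve the instance."""
    k = len(member_times)
    solved = [t for t in member_times if t is not None]
    if not solved:
        return None
    t = k * min(solved)
    return t if t <= timeout else None

print(switcher_time([0.5, 10.0, None]))  # 1.5
print(switcher_time([None, 50.0]))       # None: 2 * 50 exceeds the time out
```

The second example shows why the switcher is only a reference point: splitting the budget can push an otherwise-solvable instance past the time out.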

SLIDE 20

Potential of a Portfolio Solver (Fast Presolver)

Even a simplistic portfolio solver can yield considerable benefits.

Pick the 4 best 2-configuration portfolios. Run one member as a fast presolver [XHH+08] for a short time. If that fails, run the other solver for the remaining time.

Results (share: weighted share of solved instances; time: weighted average run time on solved instances [sec]):

                      | 1st as fast presolver     | 2nd as fast presolver
                      | 1 second    | 2 seconds   | 1 second    | 2 seconds
portfolio             | share  time | share  time | share  time | share  time
(LWB, TRP++)          | 0.880  1.09 | 0.885  1.30 | 0.841  1.26 | 0.850  1.45
(LWB, TSPASS)         | 0.868  0.88 | 0.874  1.10 | 0.850  1.20 | 0.858  1.48
(NuSMV-SBMC, TRP++)   | 0.823  1.03 | 0.841  1.18 | 0.860  0.97 | 0.862  1.31
(NuSMV-SBMC, TSPASS)  | 0.813  1.00 | 0.831  1.21 | 0.837  1.17 | 0.840  1.42

Reference:

                      | 1st in isolation | 2nd in isolation | perfect oracle | perf. task switcher
portfolio             | share  time      | share  time      | share  time    | share  time
(LWB, TRP++)          | 0.740  2.59      | 0.752  3.03      | 0.896  0.89    | 0.894  1.12
(LWB, TSPASS)         | 0.740  2.59      | 0.667  1.91      | 0.889  1.16    | 0.881  1.27
(NuSMV-SBMC, TRP++)   | 0.723  1.47      | 0.752  3.03      | 0.880  1.11    | 0.874  1.37
(NuSMV-SBMC, TSPASS)  | 0.723  1.47      | 0.667  1.91      | 0.867  1.41    | 0.853  1.60
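The fast-presolver scheme can be simulated per instance as follows (our own sketch; the 1–2 second cutoffs and 60-second time out are those of the experiments, the example run times are invented):

```python
def presolver_time(t_pre, t_main, cutoff=1.0, timeout=60.0):
    """Run the presolver for up to `cutoff` seconds; if it fails, spend
    the remaining budget on the main solver. None = instance not solved.
    t_pre/t_main are the members' stand-alone run times (None = never)."""
    if t_pre is not None and t_pre <= cutoff:
        return t_pre                    # presolver finishes within the cutoff
    if t_main is not None and cutoff + t_main <= timeout:
        return cutoff + t_main          # cutoff is wasted, main solver finishes
    return None

print(presolver_time(0.3, 50.0))   # 0.3
print(presolver_time(None, 10.0))  # 11.0
print(presolver_time(5.0, None))   # None
```

The scheme trades at most `cutoff` seconds of overhead on hard instances for near-instant answers on the many easy ones, which is why its scores approach the oracle's in the tables above.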

SLIDE 21

A Performance Advantage of ALASKA over NuSMV-BDD?

[Figure: scatter plots of run times (0.1 sec to 60 sec, plus time out (to) and memory out (mo)) comparing ALASKA with NuSMV-BDD using EL backward and EL forward fixed point computation.]

[WDMR08] claims an advantage of ALASKA over NuSMV-BDD. A difference beyond representation (antichains vs. BDDs) was the direction of the fixed point computations. (Forward fixed point computation was not available in NuSMV-BDD when [WDMR08] was done.) Using appropriate options in NuSMV and forward rather than backward fixed point computation, ALASKA does not outperform NuSMV-BDD.

SLIDE 22

The End

Summary
  • Identification of reference solvers with options at instance level.
  • No solver dominates; rather, complementary behavior. We don’t declare any single solver to be the winner.
  • A portfolio approach seems worth trying.
  • Benchmarks, data, and more plots available: http://www.schuppan.de/viktor/atva11/.

Future Work
  • Check out participants of HWMCC’11.
  • Consider explicit state model checkers that handle the property on-the-fly.
  • Have a proper competition?

SLIDE 23

Thanks

Thanks to

  • ... you for your attention,
  • ... J.-F. Raskin and N. Maquet for help with ALASKA and for hosting the first author for one week,
  • ... R. Goré and F. Widmann for help with pltl,
  • ... B. Konev and M. Ludwig for help with TRP++ and TSPASS,
  • ... C. Dixon for the forobots family,
  • ... B. Jobstmann and G. Hofferek for help with the amba family,
  • ... K. Rozier for feedback,
  • ... A. Artale for supervising the second author’s MSc thesis,
  • ... the ES group at FBK, esp. A. Cimatti, A. Mariotti, and M. Roveri, for discussion and support.

Questions?

SLIDE 24

References

BDF09: A. Behdenna, C. Dixon, and M. Fisher. Deductive verification of simple foraging robotic behaviours. International Journal of Intelligent Computing and Cybernetics, 2009.

BGJ+07a: R. Bloem, S. Galler, B. Jobstmann, N. Piterman, A. Pnueli, and M. Weiglhofer. Automatic hardware synthesis from specifications: a case study. DATE’07.

BGJ+07b: R. Bloem, S. Galler, B. Jobstmann, N. Piterman, A. Pnueli, and M. Weiglhofer. Specify, compile, run: Hardware from PSL. COCV’07.

FJR09: E. Filiot, N. Jin, and J.-F. Raskin. An antichain algorithm for LTL realizability. CAV’09.

FKSFV08: D. Fisman, O. Kupferman, S. Sheinvald-Faragy, and M. Vardi. A framework for inherent vacuity. HVC’08.

GKS09: V. Goranko, A. Kyrikov, and D. Shkatov. Tableau tool for testing satisfiability in LTL: Implementation and experimental analysis. M4M’09.

HKR+04: U. Hustadt, B. Konev, A. Riazanov, and A. Voronkov. TeMP: A temporal monodic prover. IJCAR’04.

HS02: U. Hustadt and R. Schmidt. Scientific benchmarking with temporal logic decision procedures. KR’02.

HWMCC10: A. Biere and K. Claessen. Hardware Model Checking Competition (presentation, slides only). HWVW’10.

LH10: M. Ludwig and U. Hustadt. Implementing a fair monodic temporal logic prover. AI Communications, 23(2–3):69–96, 2010.

PSC+06: I. Pill, S. Semprini, R. Cavada, M. Roveri, R. Bloem, and A. Cimatti. Formal analysis of hardware requirements. DAC’06.

RV10: K. Rozier and M. Vardi. LTL satisfiability checking. STTT, 12(2):123–137, 2010.

Sch10: V. Schuppan. Towards a notion of unsatisfiable and unrealizable cores for LTL. Science of Computer Programming, 2010. In press.

WDMR08: M. De Wulf, L. Doyen, N. Maquet, and J.-F. Raskin. Antichains: Alternative algorithms for LTL satisfiability and model-checking. TACAS’08.

XHH+08: L. Xu, F. Hutter, H. Hoos, and K. Leyton-Brown. SATzilla: Portfolio-based algorithm selection for SAT. J. Artif. Intell. Res. (JAIR), 32:565–606, 2008.
