  1. Evaluating LTL Satisfiability Solvers. Viktor Schuppan (supported by the Provincia Autonoma di Trento, project EMTELOS), j.w.w. Luthfi Darmawan (supported by the European Master’s Program in Computational Logic, EMCL). ATVA’11, Taipei, Taiwan, October 11–14, 2011

  2. LTL Satisfiability – (Our) Motivation
     Verification gains momentum ⇒ specifications become an object of interest in their own right. Investigation of specifications:
     – Property simulation (e.g., RAT [PSC+06]): example traces, possibly with constraints; makes properties executable.
     – Property assurance (e.g., RAT [PSC+06]): possibilities, assertions.
     – Sanity checks (e.g., RAT [PSC+06]; [RV10]; [FKSFV08]): satisfiability, non-validity, non-redundancy.
     These checks boil down to LTL satisfiability; see the reductions sketched below. (Note: checks beyond satisfiability are important, too.)
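
For concreteness, here is how the non-validity and non-redundancy checks reduce to satisfiability. This is my own sketch of the standard reductions, not something taken from the slides:

```latex
% My sketch of the standard reductions (not from the slides).
% Validity: a formula is valid iff its negation has no model.
\[
  \varphi \text{ is valid}
    \;\Longleftrightarrow\; \neg\varphi \text{ is unsatisfiable}
\]
% Redundancy: a conjunct of a specification is redundant iff it is
% implied by the remaining conjuncts.
\[
  \varphi_j \text{ is redundant in } \textstyle\bigwedge_i \varphi_i
    \;\Longleftrightarrow\;
  \Big(\bigwedge_{i \neq j} \varphi_i\Big) \wedge \neg\varphi_j
    \text{ is unsatisfiable}
\]
```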

  3. LTL Satisfiability – (Our) Motivation
     Some specific triggers:
     – Antichains for LTL satisfiability ([WDMR08]): claims an advantage of ALASKA over NuSMV-BDD.
     – LTL satisfiability solver comparison by Rozier and Vardi ([RV10]): focuses on LTL satisfiability via explicit-state and BDD-based symbolic model checking, i.e., no SAT-based, temporal-resolution, or tableau-based tools.
     – Interest in temporal resolution and its potential for extraction of unsatisfiable cores ([Sch10]).
     No recent evaluation of LTL satisfiability solvers is available that uses a broad range of algorithms, a broad range of benchmarks, and comprehensive criteria.

  4. Objective
     Compare the performance of off-the-shelf solvers for propositional LTL satisfiability. This is a comparison of tools (as opposed to one of algorithms):
     – different features (e.g., preprocessing/simplification);
     – different maturity;
     – different programming languages.

  5. Outline
     1. Introduction  2. LTL Solvers  3. Benchmarks  4. Methodology  5. Findings  6. Conclusions

  6. Selection of LTL Solvers
     Reduction to model checking:
     – Selected: ALASKA, NuSMV-BDD, NuSMV-SBMC.
     – Ruled out: explicit-state model checkers [RV10], Cadence SMV (BDDs) [RV10], SAL [RV10], VIS (BDDs).
     – The last hardware model checking competition [HWMCC10] focuses on safety.
     Tableau-based algorithms:
     – Selected: LWB, pltl.
     – Ruled out: TWB, LTL Tableau [GKS09].
     Temporal resolution:
     – Selected: TRP++, TSPASS.
     – Ruled out: TeMP [HKR+04,LH10].

  7. Selection of Benchmarks
     – Use families from previous comparisons [WDMR08,RV10,HS02], but restrict the number of instances in the random category.
     – Add families not used for LTL satisfiability before: acacia, amba, forobots.
     – Create new families: O1formula, O2formula, phltl.
     – Scale up families.
     – Add variants that enforce non-trivial behavior.
     To my knowledge this is the most comprehensive set of benchmarks used to compare propositional LTL satisfiability solvers to date.

  8. Benchmarks
     For each family: description; #instances/of which unsatisfiable; max. |φ|; source.
     Category application:
     – acacia: Arbiters and traffic light controllers; 71/-; 426; [FJR09]
     – alaska lift: Elevator specifications; 136/34; 4450; [WDMR08]
     – alaska szymanski: Mutual exclusion protocol; 4/-; 183; [WDMR08]
     – anzu amba: Microcontroller bus architecture; 51/-; 6173; [BGJ+07a]
     – anzu genbuf: Generalized buffer; 60/-; 5805; [BGJ+07b]
     – forobots: Model of a robot with properties; 39/25; 636; [BDF09]
     Category crafted:
     – rozier counter: 4 variants of a serial counter; 78/-; 751; [RV10]
     – rozier pattern: 8 scalable patterns to trigger difficulties in LTL to Büchi translators; 244/-; 7992; [RV10]
     – schuppan O1formula/O2formula: Exponential behavior in some solvers; 54/42; 6001
     – schuppan phltl: Temporal variant of pigeonhole; 18/10; 40501
     Category random:
     – rozier formulas: Obtained by generating a syntax tree [DGV99]; 2000/57; 185; [RV10]
     – trp: Obtained by lifting propositional CNF into a fixed temporal structure; 970/397; 1422; [HS02]

  9. Setup, Flow
     Flow:
     – Preliminary stage: purpose: reduce the number of configurations for TRP++ and TSPASS; 10 second time out.
     – Main stage: all remaining configurations for all solvers; 60 second time out.
     – Graphical evaluation: choose one winning configuration per solver based on the highest score.
     Setup:
     – Hardware/software: Intel Xeon 3.0 GHz; 4 GB memory; Red Hat Linux 5.4, 64 bit kernel 2.6.18.
     – Measure time and memory with run [BJ].
     – No shuffling of benchmarks.
     – One run per instance and solver configuration.
     – Memory out: 2 GB.
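
The slides measure time and memory with the run tool [BJ]. Purely as an illustration of the stated limits (60 second time out, 2 GB memory out), and not the setup actually used, a minimal Python sketch of running one solver configuration on one instance could look like this:

```python
import resource
import subprocess

TIME_LIMIT_S = 60          # main-stage time out from the slide
MEM_LIMIT_BYTES = 2 << 30  # 2 GB memory out from the slide

def _limit_memory():
    # Cap the child's address space as a rough stand-in for a memory out.
    resource.setrlimit(resource.RLIMIT_AS, (MEM_LIMIT_BYTES, MEM_LIMIT_BYTES))

def run_solver(cmd):
    """Run one solver on one instance; cmd is a hypothetical command line,
    e.g. ['pltl', 'instance.ltl']."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=TIME_LIMIT_S, preexec_fn=_limit_memory)
        return proc.returncode, proc.stdout
    except subprocess.TimeoutExpired:
        return None, "time out"
```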

  10. Scoring
     Objective:
     – Score by the highest number of solved instances; break ties by lower time taken on solved instances. (Frequently used.)
     Problem:
     – Benchmark families have very different numbers of instances: the smallest family has 4 instances, the largest more than 2000.
     Solution (see the sketch below):
     – Arrange the benchmark families in a tree.
     – Assign equal weight to the (immediate) children of each node.
     Caveats:
     – The weight of an instance may change between different scores, e.g., share of solved instances (all instances count) vs. run time on solved instances (only solved instances count).
     – The weight of an instance may change between different solvers for the same score, e.g., run time on solved instances (only the instances solved by the particular solver count).
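
The tree-based weighting can be made concrete with a small sketch. This is my own illustration of the scheme described above, not code from the evaluation, and the example tree is hypothetical:

```python
# Each node's weight is split equally among its children, so every family
# (leaf) ends up with weight 1/(product of branching factors on its path),
# regardless of how many instances it contains.

def weighted_share(node, solved_fraction):
    """node: family name (leaf) or dict of child nodes;
    solved_fraction: fraction of solved instances per family."""
    if isinstance(node, str):                 # leaf = one benchmark family
        return solved_fraction.get(node, 0.0)
    children = list(node.values())
    return sum(weighted_share(c, solved_fraction) for c in children) / len(children)

# Hypothetical tree mirroring the three categories of slide 8 (families abbreviated).
tree = {
    "application": {"acacia": "acacia", "alaska lift": "alaska lift"},
    "crafted":     {"rozier counter": "rozier counter", "phltl": "phltl"},
    "random":      {"rozier formulas": "rozier formulas", "trp": "trp"},
}
print(weighted_share(tree, {"acacia": 1.0, "trp": 0.5}))  # 0.25
```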

  11. Cactus Plots vs. Contour/Discrete Raw Data Plots
     Cactus plots (see the sketch below)
     – are standard;
     – make it easy to identify the winner when ranking by the highest number of solved instances, with ties broken by time spent on solved instances;
     – but break the correlation between different solvers on the same instance.
     Contour/discrete raw data plots
     – retain the correlation between different solvers on the same instance;
     – make it easy to identify similar and complementary behavior;
     – make it easy to see the performance of a solver on subfamilies;
     – make it easy to see the difficulty of instances and subfamilies.
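
For readers unfamiliar with cactus plots, here is a minimal sketch (my own, using matplotlib and toy data, not the paper's plotting scripts): for each solver, the run times of its solved instances are sorted and plotted against the number of instances solved within that time.

```python
import matplotlib.pyplot as plt

def cactus_plot(results):
    """results: dict mapping solver name -> run times (seconds) of its solved instances."""
    for solver, times in results.items():
        times = sorted(times)
        # x = number of instances solved, y = time needed for the x-th easiest one
        plt.plot(range(1, len(times) + 1), times, marker="o", label=solver)
    plt.xlabel("number of solved instances")
    plt.ylabel("run time [seconds]")
    plt.yscale("log")
    plt.legend()
    plt.show()

# Toy data for two hypothetical solvers.
cactus_plot({"solver A": [0.1, 0.5, 2.0, 30.0], "solver B": [0.2, 0.3, 45.0]})
```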

  12. Correctness
     – We found 1 or 2 bugs each in ALASKA, NuSMV, TRP++, and TSPASS; they were kindly fixed quickly by the tool authors.
     – We also found bugs in LWB. We contacted the developers but received no response. 187 out of 7446 instances are known to be affected: 13 wrong results, the others abnormal terminations. LWB therefore runs hors concours.

  13. Winning Configurations per Tool
     We select one winning configuration (column max) per tool. For each tool: winning configuration (max / min / vbs).
     – ALASKA: noc nos nob (0.581 / 0.322 / 0.595)
     – LWB: mod (0.740 / 0.656 / 0.800)
     – NuSMV-BDD: dcx fflt dyn elbwd (0.743 / 0.607 / 0.823)
     – NuSMV-SBMC: nodcx c (0.723 / 0.651 / 0.726)
     – pltl: tree (0.694 / 0.687 / 0.702)
     – TRP++: s r noal bfs nop fsr (0.752 / 0.593 / 0.776)
     – TSPASS: ext nogrp nosev sub nosls rfmrr- norbmrr nomod mor (0.667 / 0.479 / 0.670)
     Numbers: weighted share of solved instances. vbs: virtual best solver. Note: model construction disabled (sat and unsat instances).

  14. Run Times (Contour/Discrete Raw Data Plots)
     [Figure: contour/discrete raw data plots of run times for ALASKA, LWB, NuSMV-BDD, NuSMV-SBMC, pltl, TRP++, and TSPASS. Application category: acacia, alaska lift, anzu amba, anzu genbuf, forobots. Crafted category: rozier counter, rozier pattern, schuppan. Random category: rozier formulas, trp. Shading: ≤ 0.1 sec; > 0.1 sec, ≤ 1 sec; > 1 sec, ≤ 10 sec; > 10 sec, ≤ 60 sec; unsolved.]

  15. Run Times: Sat vs. Unsat Instances
     [Figure: per-solver plots (ALASKA, LWB, NuSMV-BDD, NuSMV-SBMC, pltl, TRP++, TSPASS) for the alaska lift, forobots, rozier formulas, and trp families; x-axes: number of solved instances, y-axes: run time; key: sat vs. unsat. Instances selected to “equalize” features.]

  16. Run Times by Instance Size
     [Figure: per-solver plots (ALASKA, LWB, NuSMV-BDD, NuSMV-SBMC, pltl, TRP++, TSPASS) for the application and crafted categories and the rozier formulas and trp families; x-axes: instance size, y-axes: run time.]

  17. Model Sizes
     [Figure: number of states of the models found, on a log scale, for all satisfiable instances, for vbs, ALASKA, LWB, NuSMV-BDD, NuSMV-SBMC, and TSPASS.]

  18. Potential of a Portfolio Solver (Perfect Oracle)
     Mode: a perfect oracle selects the best portfolio member for any given instance. This is the best-case scenario for portfolio solvers without communication between portfolio members; unrealistic.
     [Figure: for each portfolio composition (subsets of the solvers, up to “all”), the weighted share of solved instances (left y-axis, 0.5 to 1) and the weighted average run time on solved instances in seconds (right y-axis, 0.1 to 10, log scale).]
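
The perfect-oracle score itself is easy to state: for each instance the oracle takes the fastest portfolio member, and an instance counts as solved if any member solves it within the time limit. The sketch below is my own illustration with a made-up data format; it is unweighted for brevity, whereas the slides use the weighted scores of slide 10.

```python
def oracle_portfolio(results, members):
    """results: dict instance -> dict solver -> run time in seconds (None if unsolved);
    returns (share of solved instances, average run time on solved instances)."""
    solved_times = []
    for per_solver in results.values():
        times = [per_solver[s] for s in members if per_solver.get(s) is not None]
        if times:                      # solved by at least one portfolio member
            solved_times.append(min(times))
    share = len(solved_times) / len(results)
    avg = sum(solved_times) / len(solved_times) if solved_times else float("nan")
    return share, avg

# Toy example with two hypothetical portfolio members on three instances.
results = {"i1": {"A": 0.5, "B": None},
           "i2": {"A": None, "B": 2.0},
           "i3": {"A": None, "B": None}}
print(oracle_portfolio(results, ["A", "B"]))  # (0.666..., 1.25)
```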
