SLIDE 1

Evaluating LTL Satisfiability Solvers

Viktor Schuppan

Supported by the Provincia Autonoma di Trento (project EMTELOS).

j.w.w. Luthfi Darmawan

Supported by the European Master’s Program in Computational Logic (EMCL).

ATVA’11, Taipei, Taiwan, October 11–14, 2011

slide-2
SLIDE 2

LTL Satisfiability – (Our) Motivation

Verification gains momentum ⇒ specifications become an object of interest.

Investigation of specifications:
  • Property Simulation (e.g., RAT [PSC+06])
    – Example traces, possibly with constraints
    – Makes properties executable
  • Property Assurance (e.g., RAT [PSC+06])
    – Possibilities, assertions
  • Sanity Checks (e.g., RAT [PSC+06]; [RV10]; [FKSFV08])
    – Satisfiability, non-validity, non-redundancy

These boil down to LTL satisfiability. (Note: checks beyond satisfiability are important, too.)
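As a toy illustration of what an LTL satisfiability solver decides (and not how any of the tools compared here work), the following self-contained Python sketch evaluates LTL over ultimately periodic words u·v^ω and brute-forces a small lasso model; the tuple encoding and all names are our own.

```python
from itertools import product

# Formulas as nested tuples:
#   ("true",)  ("ap", "a")  ("not", f)  ("and", f, g)  ("or", f, g)
#   ("X", f)   ("U", f, g)
def F(f):  # F f := true U f
    return ("U", ("true",), f)

def G(f):  # G f := not F not f
    return ("not", F(("not", f)))

def holds(f, trace, loop, i=0):
    """Truth of f at position i of the lasso: after trace[-1] the word
    continues at position `loop`."""
    n = len(trace)
    nxt = lambda j: j + 1 if j + 1 < n else loop
    op = f[0]
    if op == "true":
        return True
    if op == "ap":
        return f[1] in trace[i]
    if op == "not":
        return not holds(f[1], trace, loop, i)
    if op == "and":
        return holds(f[1], trace, loop, i) and holds(f[2], trace, loop, i)
    if op == "or":
        return holds(f[1], trace, loop, i) or holds(f[2], trace, loop, i)
    if op == "X":
        return holds(f[1], trace, loop, nxt(i))
    if op == "U":
        seen, j = set(), i
        while j not in seen:       # positions start repeating after <= n steps
            seen.add(j)
            if holds(f[2], trace, loop, j):
                return True
            if not holds(f[1], trace, loop, j):
                return False
            j = nxt(j)
        return False               # cycled without ever reaching f[2]
    raise ValueError(op)

def sat_bounded(f, aps, max_len=3):
    """Search for an ultimately periodic model with at most max_len states."""
    states = [frozenset(a for a, b in zip(aps, bits) if b)
              for bits in product([False, True], repeat=len(aps))]
    for n in range(1, max_len + 1):
        for trace in product(states, repeat=n):
            for loop in range(n):
                if holds(f, trace, loop):
                    return trace, loop
    return None  # no model up to the bound (not a proof of unsatisfiability)

a = ("ap", "a")
print(sat_bounded(("and", G(a), F(("not", a))), ["a"]))  # unsatisfiable: None
print(sat_bounded(("and", F(a), F(("not", a))), ["a"]))  # finds a lasso
```

Real solvers replace this exponential enumeration with tableaux, temporal resolution, or reductions to model checking, which is exactly what the rest of the talk compares.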

Author: V. Schuppan

SLIDE 3

LTL Satisfiability – (Our) Motivation

Some specific triggers:
  • Antichains for LTL satisfiability ([WDMR08])
    – Claims an advantage of ALASKA over NuSMV-BDD
  • LTL satisfiability solver comparison by Rozier and Vardi ([RV10])
    – Focus on LTL satisfiability via explicit and BDD-based symbolic model checking, i.e., no SAT-based, temporal-resolution, or tableau-based tools
  • Interest in Temporal Resolution and its potential for extraction of unsatisfiable cores ([Sch10])

No recent evaluation of LTL satisfiability solvers using a broad range of algorithms, a broad range of benchmarks, and comprehensive criteria is available.

SLIDE 4

Objective

Objective: compare the performance of off-the-shelf solvers for propositional LTL satisfiability.

This is a comparison of tools (as opposed to one of algorithms):
  • Different features (e.g., preprocessing/simplification).
  • Different maturity.
  • Different programming languages.

SLIDE 5

Outline


  • 1. Introduction
  • 2. LTL Solvers
  • 3. Benchmarks
  • 4. Methodology
  • 5. Findings
  • 6. Conclusions

SLIDE 6

Selection of LTL Solvers

Reduction to Model Checking
  • Selected: ALASKA, NuSMV-BDD, NuSMV-SBMC
  • Ruled out: explicit state model checkers [RV10], Cadence SMV (BDDs) [RV10], SAL [RV10], VIS (BDDs)
  • The last hardware model checking competition [HWMCC10] focuses on safety

Tableau-Based Algorithms
  • Selected: LWB, pltl
  • Ruled out: TWB, LTL Tableau [GKS09]

Temporal Resolution
  • Selected: TRP++, TSPASS
  • Ruled out: TeMP [HKR+04,LH10]

SLIDE 7

Selection of Benchmarks

  • Use families from previous comparisons [WDMR08,RV10,HS02].
    – But: restrict the number of instances in the random category.
  • Add families not used for LTL satisfiability before: acacia, amba, forobots.
  • Create new families: O1formula, O2formula, phltl.
  • Scale up families.
  • Add variants that enforce non-trivial behavior.

To my knowledge this is the most comprehensive set of benchmarks used to compare propositional LTL satisfiability solvers to date.
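To give a flavor of how scalable crafted families are produced, here is a hypothetical generator sketch for two classic scalable pattern shapes (a conjunction of eventualities and of recurring eventualities). The function names, the textual syntax, and the choice of patterns are ours for illustration; the actual rozier pattern and schuppan families use their own, different definitions.

```python
def e_pattern(n):
    """Conjunction of eventualities: F p1 & ... & F pn (satisfiable)."""
    return " & ".join(f"(F p{i})" for i in range(1, n + 1))

def c_pattern(n):
    """Recurring eventualities: G F p1 & ... & G F pn (satisfiable)."""
    return " & ".join(f"(G (F p{i}))" for i in range(1, n + 1))

# Scaling n yields ever larger instances of the same shape:
print(e_pattern(3))  # (F p1) & (F p2) & (F p3)
```

Families generated this way are useful precisely because their difficulty can be dialed up while the semantic structure stays fixed.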

SLIDE 8

Benchmarks

Family               | Description                                                              | #Inst./uns. | Max. |φ| | Source

Category application:
acacia               | Arbiters and traffic light controllers                                   | 71/–        | 426      | [FJR09]
alaska lift          | Elevator specifications                                                  | 136/34      | 4450     | [WDMR08]
alaska szymanski     | Mutual exclusion protocol                                                | 4/–         | 183      | [WDMR08]
anzu amba            | Microcontroller buffer architecture                                      | 51/–        | 6173     | [BGJ+07a]
anzu genbuf          | Generalized buffer                                                       | 60/–        | 5805     | [BGJ+07b]
forobots             | Model of a robot with properties                                         | 39/25       | 636      | [BDF09]

Category crafted:
rozier counter       | 4 variants of a serial counter                                           | 78/–        | 751      | [RV10]
rozier pattern       | 8 scalable patterns to trigger difficulties in LTL to Büchi translators  | 244/–       | 7992     | [RV10]
schuppan O1/2formula | Exponential behavior in some solvers                                     | 54/42       | 6001     | (new)
schuppan phltl       | Temporal variant of pigeonhole                                           | 18/10       | 40501    | (new)

Category random:
rozier formulas      | Obtained by generating a syntax tree [DGV99]                             | 2000/57     | 185      | [RV10]
trp                  | Obtained by lifting propositional CNF into a fixed temporal structure    | 970/397     | 1422     | [HS02]

SLIDE 9

Setup, Flow

Flow
  • Preliminary Stage
    – Purpose: reduce the number of configurations for TRP++ and TSPASS
    – 10 second time out
  • Main Stage
    – All remaining configurations for all solvers
    – 60 second time out
  • Graphical Evaluation
    – Choose one winning configuration per solver based on highest score

Setup
  • Hardware/Software
    – Intel Xeon 3.0 GHz
    – 4 GB memory
    – Red Hat Linux 5.4, 64 bit kernel 2.6.18
  • Measure time, memory with run [BJ]
  • No shuffling of benchmarks
  • One run per instance and solver configuration
  • Memory out: 2 GB

SLIDE 10

Scoring

Objective:
  • Score by highest number of solved instances; break ties by lower time taken on solved instances. (Frequently used.)

Problem:
  • Benchmark families with very different numbers of instances.
  • The smallest family has 4 instances; the largest has > 2000.

Solution:
  • Arrange benchmark families in a tree.
  • Assign equal weight to the (immediate) children of each node.

Caveat:
  • The weight of an instance may change between different scores, e.g., share of solved instances (all instances count) vs. run time on solved instances (only solved instances count).
  • The weight of an instance may change between different solvers for the same score, e.g., run time on solved instances (only instances solved by the particular solver count).
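The tree-weighted score can be sketched in a few lines; the family tree and solved/unsolved flags below are made-up placeholders, not the experimental data.

```python
# Nested dicts form the benchmark tree; leaves are True (solved) / False.
tree = {
    "application": {"acacia": {"i1": True, "i2": False},
                    "forobots": {"i1": True}},
    "random": {"trp": {f"i{k}": k % 2 == 0 for k in range(8)}},
}

def weighted_share(node):
    """Equal weight to the immediate children of each node, recursively,
    so a 4-instance family counts as much as a 2000-instance one."""
    if isinstance(node, bool):          # a single instance
        return 1.0 if node else 0.0
    return sum(weighted_share(c) for c in node.values()) / len(node)

print(weighted_share(tree))  # 0.625
```

Note how the eight trp instances together carry the same weight as the two-instance acacia family at their respective levels, which is exactly the point of the tree weighting.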

SLIDE 11

Cactus Plots vs. Contour/Discrete Raw Data Plots

Cactus Plots
  • are standard;
  • easily allow identifying the winner when ranking by highest number of solved instances with ties broken by time spent on solved instances;
  • break the correlation between different solvers on the same instance.

Contour/Discrete Raw Data Plots
  • retain the correlation between different solvers on the same instance;
  • easily allow identifying similar and complementary behavior;
  • easily show the performance of a solver on subfamilies;
  • easily show the difficulty of instances and subfamilies.

SLIDE 12

Correctness

We found 1 or 2 bugs each in ALASKA, NuSMV, TRP++, and TSPASS.
  • Kindly fixed quickly by the tool authors.

We also found bugs in LWB.
  • We contacted the developers but received no response.
  • 187 out of 7446 instances (known) buggy:
    – 13 wrong results;
    – the others abnormal termination.
  • Hors concours.

SLIDE 13

Winning Configurations per Tool

We select one winning configuration (column max) per tool. Model construction disabled (sat and unsat instances).

tool       | winning configuration                            | max   | min   | vbs
ALASKA     | noc nos nob                                      | 0.581 | 0.322 | 0.595
LWB        | mod                                              | 0.740 | 0.656 | 0.800
NuSMV-BDD  | dcx fflt dyn elbwd                               | 0.743 | 0.607 | 0.823
NuSMV-SBMC | nodcx c                                          | 0.723 | 0.651 | 0.726
pltl       | tree                                             | 0.694 | 0.687 | 0.702
TRP++      | s r noal bfs nop fsr                             | 0.752 | 0.593 | 0.776
TSPASS     | ext nogrp nosev sub nosls rfmrr norbmrr nomod mor | 0.667 | 0.479 | 0.670

Numbers: weighted share of solved instances. vbs: virtual best solver.

SLIDE 14

Run Times (Contour/Discrete Raw Data Plots)

[Figure: run times as contour/discrete raw data plots, one row per solver (ALASKA, LWB, NuSMV-BDD, NuSMV-SBMC, pltl, TRP++, TSPASS) and one panel per category: application (acacia, alaska-lift, alaska-szymanski, anzu-amba, anzu-genbuf, forobots), crafted (rozier-counter, rozier-pattern, schuppan), random (rozier-formulas, trp).]

Legend: ≤ 0.1 sec; > 0.1 sec, ≤ 1 sec; > 1 sec, ≤ 10 sec; > 10 sec, ≤ 60 sec; unsolved.

SLIDE 15

Run Times Sat vs. Unsat Instances

[Figure: per-solver plots (ALASKA, LWB, NuSMV-BDD, NuSMV-SBMC, pltl, TRP++, TSPASS) of run time (y-axes) against number of solved instances (x-axes), shown separately for satisfiable and unsatisfiable instances of the alaska-lift, forobots, rozier-formulas, and trp families. Instances selected to “equalize” features.]

SLIDE 16

Run Times by Instance Size

[Figure: per-solver plots (ALASKA, LWB, NuSMV-BDD, NuSMV-SBMC, pltl, TRP++, TSPASS) of run time (y-axes) against instance size (x-axes) for the application, crafted, rozier-formulas, and trp benchmarks.]

SLIDE 17

Model Sizes

[Figure: model sizes in number of states (1 to 20480, log scale) over all satisfiable instances, for ALASKA, LWB, NuSMV-BDD, NuSMV-SBMC, TSPASS, and the virtual best solver (vbs).]

SLIDE 18

Potential of a Portfolio Solver (Perfect Oracle)

[Figure: for every portfolio combination of the seven solvers, weighted share of solved instances (left y-axis, 0.5–1) and weighted average run time on solved instances (right y-axis, 0.1–10 sec, log scale).]

Mode: perfect oracle selects best portfolio member for any given instance. Best case scenario for portfolio solvers without communication between portfolio members. Unrealistic. Left y-axis: weighted share of solved instances. Right y-axis: weighted average run time on solved instances [seconds].
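Computing the perfect-oracle portfolio from per-instance run times is straightforward; the solver names are real, but the times below are invented placeholders for illustration.

```python
# times[solver][instance] = run time in seconds, None = unsolved.
times = {
    "LWB":   {"i1": 0.2, "i2": None, "i3": 5.0},
    "TRP++": {"i1": 3.0, "i2": 0.7,  "i3": None},
}

def perfect_oracle(times):
    """Per instance, the oracle picks the fastest member that solves it
    (None if no member does)."""
    instances = next(iter(times.values()))
    best = {}
    for inst in instances:
        solved = [t[inst] for t in times.values() if t[inst] is not None]
        best[inst] = min(solved) if solved else None
    return best

print(perfect_oracle(times))  # {'i1': 0.2, 'i2': 0.7, 'i3': 5.0}
```

In this toy example the oracle portfolio solves all three instances even though neither member does on its own, which is the complementarity the plot quantifies.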

SLIDE 19

Potential of a Portfolio Solver (Perfect Task Switcher)

[Figure: for every portfolio combination of the seven solvers, weighted share of solved instances (left y-axis, 0.5–1) and weighted average run time on solved instances (right y-axis, 0.1–10 sec, log scale), under the perfect task switcher.]

Mode: task switching with no overhead and infinitely small time slices between portfolio members. Reference case scenario for portfolio solvers without communication between portfolio members. Should be beaten. Left y-axis: weighted share of solved instances. Right y-axis: weighted average run time on solved instances [seconds].
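Under these idealized assumptions the switcher's per-instance time has a closed form: with k members sharing the processor equally, the fastest member finishes after k times its stand-alone run time. A small sketch (our own formulation, with the 60-second time out from the setup):

```python
def switcher_time(member_times, timeout=60.0):
    """Idealized round-robin over k members with infinitesimal slices:
    each member runs at 1/k speed, so the instance is solved at
    k * min(t_i). None marks members that never solve the instance."""
    k = len(member_times)
    solved = [t for t in member_times if t is not None]
    if not solved:
        return None
    t = k * min(solved)
    return t if t <= timeout else None

print(switcher_time([0.5, 10.0, None]))  # 1.5
print(switcher_time([None, 50.0]))       # None: 2 * 50 exceeds the time out
```

The second example shows why the switcher is only a reference point: splitting the budget can push an otherwise-solvable instance past the time out.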

SLIDE 20

Potential of a Portfolio Solver (Fast Presolver)

Even a simplistic portfolio solver can yield considerable benefits.

Pick the 4 best 2-configuration portfolios. Run one member as a fast presolver [XHH+08] for a short time. If that fails, run the other solver for the remaining time.

Results (share: weighted share of solved instances; time: weighted average run time on solved instances [sec]):

                      | 1st as fast presolver     | 2nd as fast presolver
                      | 1 second    | 2 seconds   | 1 second    | 2 seconds
portfolio             | share  time | share  time | share  time | share  time
(LWB, TRP++)          | 0.880  1.09 | 0.885  1.30 | 0.841  1.26 | 0.850  1.45
(LWB, TSPASS)         | 0.868  0.88 | 0.874  1.10 | 0.850  1.20 | 0.858  1.48
(NuSMV-SBMC, TRP++)   | 0.823  1.03 | 0.841  1.18 | 0.860  0.97 | 0.862  1.31
(NuSMV-SBMC, TSPASS)  | 0.813  1.00 | 0.831  1.21 | 0.837  1.17 | 0.840  1.42

Reference:

                      | 1st in isolation | 2nd in isolation | perfect oracle | perf. task switcher
portfolio             | share  time      | share  time      | share  time    | share  time
(LWB, TRP++)          | 0.740  2.59      | 0.752  3.03      | 0.896  0.89    | 0.894  1.12
(LWB, TSPASS)         | 0.740  2.59      | 0.667  1.91      | 0.889  1.16    | 0.881  1.27
(NuSMV-SBMC, TRP++)   | 0.723  1.47      | 0.752  3.03      | 0.880  1.11    | 0.874  1.37
(NuSMV-SBMC, TSPASS)  | 0.723  1.47      | 0.667  1.91      | 0.867  1.41    | 0.853  1.60
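The fast-presolver scheme can be simulated per instance as follows (our own sketch; the 1–2 second cutoffs and 60-second time out are those of the experiments, the example run times are invented):

```python
def presolver_time(t_pre, t_main, cutoff=1.0, timeout=60.0):
    """Run the presolver for up to `cutoff` seconds; if it fails, spend
    the remaining budget on the main solver. None = instance not solved.
    t_pre/t_main are the members' stand-alone run times (None = never)."""
    if t_pre is not None and t_pre <= cutoff:
        return t_pre                    # presolver finishes within the cutoff
    if t_main is not None and cutoff + t_main <= timeout:
        return cutoff + t_main          # cutoff is wasted, main solver finishes
    return None

print(presolver_time(0.3, 50.0))   # 0.3
print(presolver_time(None, 10.0))  # 11.0
print(presolver_time(5.0, None))   # None
```

The scheme trades at most `cutoff` seconds of overhead on hard instances for near-instant answers on the many easy ones, which is why its scores approach the oracle's in the tables above.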

SLIDE 21

A Performance Advantage of ALASKA over NuSMV-BDD?

[Figure: scatter plots of run times (0.1 sec to 60 sec, plus time out (to) and memory out (mo)) comparing ALASKA with NuSMV-BDD using EL backward and EL forward fixed point computation.]

[WDMR08] claims an advantage of ALASKA over NuSMV-BDD. A difference beyond representation (antichains vs. BDDs) was the direction of the fixed point computations. (Forward fixed point computation was not available in NuSMV-BDD when [WDMR08] was done.) Using appropriate options in NuSMV and forward rather than backward fixed point computation, ALASKA does not outperform NuSMV-BDD.

SLIDE 22

The End

Summary
  • Identification of reference solvers with options at instance level.
  • No solver dominates; rather, complementary behavior. We don’t declare any single solver to be the winner.
  • A portfolio approach seems worth trying.
  • Benchmarks, data, and more plots available: http://www.schuppan.de/viktor/atva11/.

Future Work
  • Check out participants of HWMCC’11.
  • Consider explicit state model checkers that handle the property on-the-fly.
  • Have a proper competition?

SLIDE 23

Thanks

Thanks to

  • ... you for your attention,
  • ... J.-F. Raskin and N. Maquet for help with ALASKA and for hosting the first author for one week,
  • ... R. Goré and F. Widmann for help with pltl,
  • ... B. Konev and M. Ludwig for help with TRP++ and TSPASS,
  • ... C. Dixon for the forobots family,
  • ... B. Jobstmann and G. Hofferek for help with the amba family,
  • ... K. Rozier for feedback,
  • ... A. Artale for supervising the second author’s MSc thesis,
  • ... the ES group at FBK, esp. A. Cimatti, A. Mariotti, and M. Roveri, for discussion and support.

Questions?

SLIDE 24

References

BDF09: A. Behdenna, C. Dixon, and M. Fisher. Deductive verification of simple foraging robotic behaviours. International Journal of Intelligent Computing and Cybernetics, 2009.

BGJ+07a: R. Bloem, S. Galler, B. Jobstmann, N. Piterman, A. Pnueli, and M. Weiglhofer. Automatic hardware synthesis from specifications: a case study. DATE’07.

BGJ+07b: R. Bloem, S. Galler, B. Jobstmann, N. Piterman, A. Pnueli, and M. Weiglhofer. Specify, compile, run: Hardware from PSL. COCV’07.

FJR09: E. Filiot, N. Jin, and J.-F. Raskin. An antichain algorithm for LTL realizability. CAV’09.

FKSFV08: D. Fisman, O. Kupferman, S. Sheinvald-Faragy, and M. Vardi. A framework for inherent vacuity. HVC’08.

GKS09: V. Goranko, A. Kyrikov, and D. Shkatov. Tableau tool for testing satisfiability in LTL: Implementation and experimental analysis. M4M’09.

HKR+04: U. Hustadt, B. Konev, A. Riazanov, and A. Voronkov. TeMP: A temporal monodic prover. IJCAR’04.

HS02: U. Hustadt and R. Schmidt. Scientific benchmarking with temporal logic decision procedures. KR’02.

HWMCC10: A. Biere and K. Claessen. Hardware Model Checking Competition (presentation, slides only). HWVW’10.

LH10: M. Ludwig and U. Hustadt. Implementing a fair monodic temporal logic prover. AI Communications, 23(2–3):69–96, 2010.

PSC+06: I. Pill, S. Semprini, R. Cavada, M. Roveri, R. Bloem, and A. Cimatti. Formal analysis of hardware requirements. DAC’06.

RV10: K. Rozier and M. Vardi. LTL satisfiability checking. STTT, 12(2):123–137, 2010.

Sch10: V. Schuppan. Towards a notion of unsatisfiable and unrealizable cores for LTL. Science of Computer Programming, 2010. In press.

WDMR08: M. De Wulf, L. Doyen, N. Maquet, and J.-F. Raskin. Antichains: Alternative algorithms for LTL satisfiability and model-checking. TACAS’08.

XHH+08: L. Xu, F. Hutter, H. Hoos, and K. Leyton-Brown. SATzilla: Portfolio-based algorithm selection for SAT. J. Artif. Intell. Res. (JAIR), 32:565–606, 2008.
