Methodology for Comparison and Ranking of SAT Solvers, Mladen Nikolić



SLIDE 1

Methodology for Comparison and Ranking of SAT Solvers

Mladen Nikolić
Third Workshop on Formal and Automated Theorem Proving and Applications, January 29, 2010

SLIDE 2

Overview

1. Introduction
2. Preliminaries
3. Methodology
4. Evaluation
5. Related work
6. Conclusions

SLIDE 3

Overview

1. Introduction
2. Preliminaries
3. Methodology
4. Evaluation
5. Related work
6. Conclusions

SLIDE 4

Comparison of SAT solvers

• SAT solvers
• Importance of SAT solver comparison:
  • A large number of modifications are proposed each year
  • Their usefulness is not self-evident
  • We need to discriminate better between good and bad ideas
• Current approach:
  • Unreliable
  • Sometimes inconclusive
  • No discussion of whether the observed difference could have arisen by chance

SLIDE 5

Motivation

Solver            Graph coloring      Industrial
                  Best    Worst       Best    Worst
MiniSAT 09z       180     157         159     112
minisat cumr r    190     180         150     108
minisat2          200     183         140      93
MiniSat2hack      200     183         141      94

SLIDE 6

Main goals

• Eliminate chance effects from the comparison
• Decide whether there is an overall positive or negative effect
• Give information on the statistical significance of the difference

SLIDE 7

Main difficulties

• Censored observations
• Comparison of distributions of solving times for one instance
• Combining conclusions obtained on individual instances

SLIDE 8

Overview

1. Introduction
2. Preliminaries
3. Methodology
4. Evaluation
5. Related work
6. Conclusions

SLIDE 9

Statistical hypothesis testing

• Null hypothesis $H_0$
• Test statistic $T$
• $p = P(|T| \geq |t| \mid H_0)$, where $t$ is the observed value of $T$
• If $p < \alpha$, reject $H_0$
• Effect size
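The slide lists the generic testing recipe. As a minimal illustration only (not the test this methodology uses later), the Python sketch below runs a two-sided permutation test: the statistic T is the difference of sample means, the p value estimates P(|T| ≥ |t| | H0) by reassigning group labels at random, and H0 is rejected when p < α. The function name `permutation_test`, the choice of statistic, and the data are hypothetical.

```python
import random


def permutation_test(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test of H0: samples a and b come from one distribution.

    T = mean(a) - mean(b). The p value estimates P(|T| >= |t| | H0) by randomly
    reassigning the pooled observations to two groups of the original sizes.
    """
    def mean(xs):
        return sum(xs) / len(xs)

    rng = random.Random(seed)          # fixed seed keeps the estimate reproducible
    t_obs = mean(a) - mean(b)
    pooled = list(a) + list(b)
    k = len(a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        t = mean(pooled[:k]) - mean(pooled[k:])
        # Small tolerance so float rounding does not hide labellings that tie t_obs.
        if abs(t) >= abs(t_obs) - 1e-12:
            extreme += 1
    # Add-one correction: the observed labelling counts as one permutation.
    return (extreme + 1) / (n_perm + 1)


alpha = 0.05
p = permutation_test([1.2, 1.5, 1.1, 1.4, 1.3], [2.2, 2.6, 2.1, 2.4, 2.5])
print(p, "reject H0" if p < alpha else "do not reject H0")
```

With two fully separated groups of five, only 2 of the 252 possible labellings are as extreme as the observed one, so the estimated p value should land near 0.008, well below α.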

SLIDE 10

Comparing two distributions

SLIDE 11

Point biserial correlation

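Point biserial correlation ρpb can be estimated by

$$r_{pb} = \frac{\sum_{i=1}^{N}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{N}(X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{N}(Y_i - \bar{Y})^2}}$$

$$\rho_{pb},\ r_{pb} \in [-1, +1]$$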
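A direct transcription of the estimator in Python follows. The slide does not say how the two variables are coded; this sketch assumes the usual point biserial setup in which X pools the solving times of both solvers and Y is a 0/1 indicator of which solver produced each time (1 for the first solver, a hypothetical choice). Under that coding, a positive r_pb means the first solver tends to be slower. The function name `point_biserial` is also hypothetical.

```python
import math


def point_biserial(times_a, times_b):
    """Estimate r_pb for two samples of solving times.

    X: the pooled solving times. Y: 1 for times from times_a, 0 for times_b
    (an assumed coding). r_pb is the Pearson correlation between X and Y.
    """
    x = list(times_a) + list(times_b)
    y = [1.0] * len(times_a) + [0.0] * len(times_b)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# The first solver is faster on every run, so r_pb is negative under this coding.
print(point_biserial([1, 2, 3], [4, 5, 6]))
```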

SLIDE 12

Point biserial correlation

SLIDE 13

Handling censored data

Gehan statistic $W_G$:

$$E(W_G) = P(X > Y) - P(X < Y)$$

$$\frac{1 - E(W_G)}{2} = P(X < Y)$$

SLIDE 14

Overview

1. Introduction
2. Preliminaries
3. Methodology
4. Evaluation
5. Related work
6. Conclusions

SLIDE 15

Sketch of the methodology

• $H_0$: no difference in solver performance
• Choose the level of statistical significance α
• For each formula $F_i$, calculate a difference $d_i$ between the two solvers' samples of solving times on $F_i$
• Under the null hypothesis, the average of the $d_i$ should not be too large
• Estimate the p value and check whether the average difference is significant
• Check and interpret the effect size

SLIDE 16

Choice of function d

What could be a good choice for function d?

ρpb? π = P(X < Y )?

SLIDE 17

Choice of function d

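Theorem. Under some reasonable conditions, the following relations hold:

$$W_G = \frac{S_R S_Y}{n_1 n_2}\, r_{pb} \qquad (1)$$

$$\frac{\operatorname{var}(W_G)}{\dfrac{S_R^2 S_Y^2}{n_1^2 n_2^2}\,\operatorname{var}(r_{pb})} \to 1 \quad (n_1 + n_2 \to \infty) \qquad (2)$$

where $S_X = \sqrt{\sum_{i=1}^{n_1+n_2} (X_i - \bar{X})^2}$.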
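The slide leaves R undefined. Assuming R_i is the Gehan score of observation i summed over all other observations in the pooled sample (the per-observation form of Gehan's statistic) and Y_i is a 0/1 indicator of the first solver, relation (1) holds exactly when W_G is normalized by n1·n2: the scores sum to zero and pairs within one solver's sample cancel. The Python sketch below checks this numerically on made-up solving times; `check_identity`, the timeout, and the data are hypothetical.

```python
import math


def score(a, a_cens, b, b_cens):
    """Gehan pair score: +1 if a is definitely longer than b, -1 if definitely shorter, else 0."""
    if a_cens and b_cens:
        return 0
    if not a_cens and not b_cens:
        return (a > b) - (a < b)
    if a_cens:
        return 1 if b < a else 0
    return -1 if a < b else 0


def check_identity(xs, ys, timeout):
    """Return (normalized W_G, S_R * S_Y / (n1 * n2) * r_pb) for two samples."""
    n1, n2 = len(xs), len(ys)
    n = n1 + n2
    # Pool both samples; runs reaching the timeout are censored at the timeout.
    pooled = [(min(t, timeout), t >= timeout) for t in xs + ys]
    y = [1.0] * n1 + [0.0] * n2          # indicator: 1 marks the first solver
    # R_i: score of observation i summed over all other pooled observations.
    r = [sum(score(*p, *q) for j, q in enumerate(pooled) if j != i)
         for i, p in enumerate(pooled)]
    w_g = sum(score(*p, *q) for p in pooled[:n1] for q in pooled[n1:]) / (n1 * n2)
    mr, my = sum(r) / n, sum(y) / n
    s_r = math.sqrt(sum((v - mr) ** 2 for v in r))
    s_y = math.sqrt(sum((v - my) ** 2 for v in y))
    r_pb = sum((a - mr) * (b - my) for a, b in zip(r, y)) / (s_r * s_y)
    return w_g, s_r * s_y / (n1 * n2) * r_pb


print(check_identity([0.3, 1.2, 0.7, 0.2], [0.9, 5.0, 0.1, 1.6, 0.4], timeout=1.5))
```

The two printed numbers agree up to floating-point rounding, which is relation (1) under the stated assumptions about R.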

SLIDE 18

Determining statistical significance

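How is the average of the $d_i$ distributed (choosing $r_{pb}$ for $d_i$)?

$$\bar{z} = \frac{1}{M}\sum_{i=1}^{M} z(r_i)$$

$$\bar{z} \sim N\left(\frac{1}{M}\sum_{i=1}^{M} z(\rho_i),\ \frac{1}{M^2}\sum_{i=1}^{M}\frac{\operatorname{var}(r_i)}{(1 - r_i^2)^2}\right)$$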
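The slide does not name the transformation z. The variance term var(r_i)/(1 − r_i²)² is what the delta method gives for Fisher's transformation z(r) = atanh(r) = ½ ln((1 + r)/(1 − r)), whose derivative is 1/(1 − r²), so this sketch assumes that choice. Assuming further that H0 means ρ_i = 0 for every formula, the mean of the normal distribution is 0 and a two-sided z test follows. How var(r_i) is estimated is not shown on the slide, so the caller supplies it; `combined_z_test` and the inputs are hypothetical.

```python
import math


def combined_z_test(rs, variances):
    """Combine per-formula correlations r_i into one two-sided z test.

    Assumes z(r) = atanh(r) (Fisher's transformation) and caller-supplied
    estimates of var(r_i). Under H0 every rho_i = 0, so E(z_bar) = 0.
    """
    m = len(rs)
    z_bar = sum(math.atanh(r) for r in rs) / m
    # Delta method: var(z(r)) ~ var(r) / (1 - r^2)^2, since dz/dr = 1 / (1 - r^2).
    var_z_bar = sum(v / (1 - r * r) ** 2 for r, v in zip(rs, variances)) / m ** 2
    stat = z_bar / math.sqrt(var_z_bar)
    p = math.erfc(abs(stat) / math.sqrt(2))   # two-sided p value under N(0, 1)
    return z_bar, p


print(combined_z_test([0.3, 0.35, 0.25], [0.01, 0.01, 0.01]))
```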

SLIDE 19

Determining effect size

Averages of estimates of ρpb or π on individual formulae
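A minimal sketch of this averaging, assuming the per-formula estimates are already available; here the π estimates are derived from per-formula normalized Gehan statistics using (1 − W_G)/2 = P(X < Y) from the censored-data slide. The function name and the numbers are hypothetical.

```python
def effect_sizes(r_pbs, gehan_ws):
    """Average per-formula effect-size estimates.

    r_pbs:    per-formula estimates of rho_pb.
    gehan_ws: per-formula normalized Gehan statistics; each is converted to an
              estimate of pi = P(X < Y) via pi = (1 - W_G) / 2.
    """
    mean_r = sum(r_pbs) / len(r_pbs)
    pis = [(1 - w) / 2 for w in gehan_ws]
    mean_pi = sum(pis) / len(pis)
    return mean_r, mean_pi


print(effect_sizes([0.2, 0.4], [0.0, -0.5]))
```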

SLIDE 20

Ranking

• Potential problems with transitivity:

$$P(A > B) > \tfrac{1}{2},\ P(B > C) > \tfrac{1}{2} \not\Rightarrow P(A > C) > \tfrac{1}{2}$$

• Kendall-Wei method
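The transitivity problem can be reproduced with three hypothetical competitors whose samples behave like non-transitive dice, reading "A > B" as "a draw from A exceeds a draw from B". The Kendall-Wei idea is sketched here in one common formulation, assumed rather than taken from the slides: the scores are the Perron eigenvector of the matrix of pairwise preference probabilities, found by power iteration. The zero diagonal, the iteration count, and all names are illustrative.

```python
from fractions import Fraction
from itertools import product


def p_greater(a, b):
    """P(A > B) for independent uniform draws from the finite samples a and b."""
    wins = sum(1 for x, y in product(a, b) if x > y)
    return Fraction(wins, len(a) * len(b))


def kendall_wei(m, iters=200):
    """Scores as the Perron eigenvector of preference matrix m (power iteration),
    normalized to sum to 1."""
    n = len(m)
    v = [1.0 / n] * n
    for _ in range(iters):
        w = [sum(float(m[i][j]) * v[j] for j in range(n)) for i in range(n)]
        total = sum(w)
        v = [x / total for x in w]
    return v


# Hypothetical competitors with a cyclic preference (non-transitive dice).
A, B, C = [2, 4, 9], [1, 6, 8], [3, 5, 7]
samples = [A, B, C]
m = [[Fraction(0) if i == j else p_greater(samples[i], samples[j])
      for j in range(3)] for i in range(3)]
print(p_greater(A, B), p_greater(B, C), p_greater(A, C))  # 5/9 5/9 4/9
print(kendall_wei(m))
```

A beats B and B beats C with probability 5/9, yet A beats C only with probability 4/9. Because the preferences form a symmetric cycle, the eigenvector gives all three the same score of 1/3; power iteration converges here because the matrix is nonnegative and its square is strictly positive.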

SLIDE 21

Overview

1. Introduction
2. Preliminaries
3. Methodology
4. Evaluation
5. Related work
6. Conclusions

SLIDE 22

Results of comparison

α = 0.05. Only the difference between S3 and S4 is not statistically significant.

         ρpb                               π
         S1      S2      S3      S4        S1      S2      S3      S4
S1       ·       0.326   0.636   0.636     ·       0.320   0.140   0.141
S2      −0.326   ·       0.465   0.464     0.680   ·       0.239   0.239
S3      −0.636  −0.465   ·       0.010     0.860   0.761   ·       0.506
S4      −0.636  −0.464  −0.010   ·         0.859   0.761   0.494   ·

SLIDE 23

How many shuffles do we need?

SLIDE 24

Overview

1. Introduction
2. Preliminaries
3. Methodology
4. Evaluation
5. Related work
6. Conclusions

SLIDE 25

Related work

• Daniel Le Berre, Laurent Simon (2004): shuffling might be important for SAT solver comparison
• Franc Brglez et al. (2005, 2007): use of standard statistical tests to compare two solvers on one instance, yielding a p value (statistical significance)

SLIDE 26

Overview

1. Introduction
2. Preliminaries
3. Methodology
4. Evaluation
5. Related work
6. Conclusions

SLIDE 27

Conclusions

• The current approach is unreliable
• New, statistically founded methodology:
  • Offers more reliable information
  • Could make identifying good ideas easier
• Total computational cost can actually stay the same
