


Sound and Quasi-Complete Detection of Infeasible Test Requirements

Robin David, Sébastien Bardin, Mickaël Delahaye, Nickolaï Kosmatov

30 July 2015


Outline

Introduction Overview Checking assertion validity Implementation Experiments Conclusion

CEA - 30 July 2015


Context

Testing process:
- generate a test input
- run it and check for errors
- estimate coverage: if enough, stop; else loop

Coverage criteria (decision, MC/DC, mutants, etc.) play a major role:
- generate tests, decide when to stop, assess the quality of testing
- definition: a systematic way of deriving test requirements


The enemy: infeasible test requirements
- waste generation effort, yield imprecise coverage ratios
- cause: structural coverage criteria are ... structural
- detecting infeasible test requirements is undecidable
→ recognized as a hard and important issue in testing


Testing-oriented, but the scope goes beyond testing: an original combination of two formal methods.



Our goals and results

→ Focus on white-box (structural) coverage criteria

Goals:
- automatic detection of infeasible test requirements
- sound method (thus, incomplete)
- applicable to a large class of coverage criteria
- strong detection power, reasonable detection speed
- rely as much as possible on existing verification methods

Results (promising):
- automatic, sound and generic method
- new combination of existing verification technologies
- experimental results: strong detection power (95%), reasonable detection speed (≤ 1 s/objective), improved test generation
- yet to be proved: scalability on large programs?


Take away

- VA ⊕ WP better than VA or WP alone
- Frama-C plug-in


Background: labels

Annotate programs with labels [ICST 2014]: a predicate attached to a specific program instruction.

A label (loc, ϕ) is covered if a test execution:
- reaches the instruction at loc
- satisfies the predicate ϕ

Good for us:
- labels easily encode a large class of coverage criteria [see below]
- labels are in the scope of standard program analysis techniques
- infeasible label (loc, ϕ) ⇔ valid assertion (loc, assert ¬ϕ)


Infeasible labels, valid assertions

int g(int x, int a) {
    int res;
    if (x + a >= x)
        res = 1;
    else
        res = 0;
    //l1: res == 0    // infeasible
}


int g(int x, int a) {
    int res;
    if (x + a >= x)
        res = 1;
    else
        res = 0;
    //@ assert res != 0;    // valid
}


Standard coverage criteria

Also weak mutation, GACC (weak MC/DC), etc.


Overview of the approach

- labels as a unifying criterion
- label infeasibility ⇔ assertion validity
- state-of-the-art verification for assertion checking



Checking assertion validity

Two broad categories of sound assertion checkers:

Value Analysis (VA): state approximation
- compute an invariant of the program
- then analyze all assertions (labels) in one run

Weakest-Precondition calculus (WP): goal-oriented checking
- perform a dedicated check for each assertion
- a single check is usually easier, but there are many of them


Focus: checking assertion validity (2)

                               VA    WP
sound for assert validity      ✓     ✓
blackbox reuse                 ✓     ✓
local precision                ×     ✓
calling context                ✓*    ×
calls / loop effects           ✓     ×
global precision               ×     ×
scalability wrt. #labels       ✓     ×
scalability wrt. code size     ×     ✓

* hypothesis: VA is interprocedural



VA and WP may fail

int main() {
    int a = nondet(0 .. 20);
    int x = nondet(0 .. 1000);
    return g(x, a);
}

int g(int x, int a) {
    int res;
    if (x + a >= x)
        res = 1;
    else
        res = 0;
    //@ assert res != 0;    // both VA and WP fail
}



Proposal: VA ⊕ WP (1)

Goal: get the best of the two worlds.
- idea: VA passes to WP the global information it lacks
- which information, and how to transfer it?
  - VA computes (internally) some form of invariants
  - WP naturally takes assumptions into account (//@ assume)
→ Solution: VA exports its invariants in the form of WP assumptions (Frama-C → ACSL)

Note: no manually inserted WP assumptions.



VA⊕WP succeeds!

int main() {
    int a = nondet(0 .. 20);
    int x = nondet(0 .. 1000);
    return g(x, a);
}

int g(int x, int a) {
    //@ assume 0 <= a <= 20;
    //@ assume 0 <= x <= 1000;
    int res;
    if (x + a >= x)
        res = 1;
    else
        res = 0;
    //@ assert res != 0;    // VA ⊕ WP succeeds
}


Proposal: VA ⊕ WP (2)

Exported invariants:
- only names appearing in the program → independent from memory size
- non-relational information → linear in VA
- only numerical information → sets, intervals, congruence


- soundness: OK as long as VA is sound
- exhaustivity of the export only affects deductive power
- the right trade-off in practice: exhaustive export has very low overhead


Invariant export strategies

Parameter annotations:

int fun(int a, int b, int c) {
    //@ assume a [...];
    //@ assume b [...];
    //@ assume c [...];
    int x = c;
    //@ assert a < b;
    if (a < b) { ... } else { ... }
}


Label annotations:

int fun(int a, int b, int c) {
    int x = c;
    //@ assume a [...];
    //@ assume b [...];
    //@ assert a < b;
    if (a < b) { ... } else { ... }
}


Complete annotations:

int fun(int a, int b, int c) {
    //@ assume a [...];
    //@ assume b [...];
    //@ assume c [...];
    int x = c;
    //@ assume x [...];
    //@ assume a [...];
    //@ assume b [...];
    //@ assert a < b;
    if (a < b) { ... } else { ... }
}

Conclusion: complete annotations have a very slight overhead (but label annotations are experimentally the best trade-off).


Summary

                               VA    WP    VA ⊕ WP
sound for assert validity      ✓     ✓     ✓
blackbox reuse                 ✓     ✓     ✓
local precision                ×     ✓     ✓
calling context                ✓     ×     ✓
calls / loop effects           ✓     ×     ✓
global precision               ×     ×     ×
scalability wrt. #labels       ✓     ×     ✓
scalability wrt. code size     ×     ✓     ?



Implementation inside LTest

[Workflow diagram: Program → Label Annotation → Annotated Program → {Test Generation, Test Execution (existing test suite), Uncoverable Detection} → Coverage Report, Uncoverable Labels]

LTest, a Frama-C plugin:
- sound detection!
- several modes: VA, WP, VA ⊕ WP
- based on PathCrawler for DSE⋆ and test generation
- service cooperation: services share label statuses (Covered, Infeasible, ?)



Experiments

RQ1: How effective are the static analyzers in detecting infeasible test requirements?
RQ2: To what extent can we improve test generation by detecting infeasible test requirements?

Standard (test generation) benchmarks [Siemens, Verisec, Mediabench]:
- 12 programs (50-300 LOC), 3 criteria (CC, MCC, WM)
- 26 (program, coverage criterion) pairs
- 1,270 test requirements, 121 infeasible ones


RQ1: detection power

         #Lab     #Inf    VA #d (%d)    WP #d (%d)    VA ⊕ WP #d (%d)
Total    1,270    121     84 (69%)      73 (60%)      118 (98%)
Min                       (0%)          (0%)          2 (67%)
Max               29      29 (100%)     15 (100%)     29 (100%)
Mean              4.7     3.2 (63%)     2.8 (82%)     4.5 (95%)

#d: number of detected infeasible labels; %d: ratio of detected infeasible labels

Verification view: VA ⊕ WP performs better than VA or WP alone.
Testing view: VA ⊕ WP achieves almost perfect detection.


RQ2: impact on test generation

→ Report a more accurate coverage ratio.

Coverage ratio reported by DSE⋆:

Detection method    None      VA       WP       VA ⊕ WP    Perfect*
Total               90.5%     96.9%    95.9%    99.2%      100.0%
Min                 61.54%    80.0%    67.1%    91.7%      100.0%
Max                 100.00%   100.0%   100.0%   100.0%     100.0%
Mean                91.10%    96.6%    97.1%    99.2%      100.0%

* preliminary, manual detection of infeasible labels

→ Speed up test generation. Beware: it can be slower in the worst case. Gain: max 55×, mean 2.2× (with RT).



Conclusion

Challenge: detection of infeasible test requirements.

Results: an automatic, sound and generic method
- relies on labels and a new combination VA ⊕ WP
- promising experimental results:
  - strong detection power (95%)
  - reasonable detection speed (≤ 1 s/objective)
  - improved test generation (better coverage ratios, speedup)

Future work:
- scalability on larger programs
- explore trade-offs of VA ⊕ WP
- applications to verification (safety) and security

→ LTest available at http://micdel.fr/ltest.html


Questions?

Direction de la Recherche Technologique
Département d'Ingénierie des Logiciels et des Systèmes
Laboratoire de Sûreté des Logiciels
Commissariat à l'énergie atomique et aux énergies alternatives
Institut Carnot CEA LIST
Centre de Saclay — 91191 Gif-sur-Yvette Cedex
Établissement public à caractère industriel et commercial — RCS Paris B 775 685 019