Pruning the Search Space in Path-based Test Generation

SLIDE 1

Motivation SE Heuristics Experiments Conclusion

Pruning the Search Space in Path-based Test Generation

Sébastien Bardin, sebastien.bardin@cea.fr

CEA-LIST, Software Security Labs

(joint work with Philippe Herrmann)

Sébastien Bardin, Philippe Herrmann 1/31

SLIDE 2

Context

Automatic test data generation from source code (STDG)

  • The test suite must achieve a global structural coverage objective: all instructions, all branches, etc.
  • The oracle generation issue is not considered: assume an external automatic oracle
      • perfect oracle (back-to-back testing)
      • partial oracle (assertions / contracts)

SLIDE 3

Symbolic Execution

Symbolic Execution (SE) is a very fruitful approach for STDG

  • efficiency
  • robustness

SE in a nutshell

  • Constraint-based reasoning: translate a part of the program into a logical formula ϕ, such that a solution of ϕ is a relevant test datum (TD)
  • Path-based approach: focus on a single path at once + enumerate (bounded) paths
      • simple formulas, only conjunctions (no quantifier / fixpoint)
  • Concolic paradigm: combination of symbolic and dynamic execution
      • robustness to “difficult-to-model” programming features

SLIDE 4

A few prototypes

  • PathCrawler (CEA), 2004
  • Dart (Bell Labs), Cute (Uni. of Illinois / Berkeley), 2005
  • Exe (Stanford), 2006
  • Jpf (NASA), 2007
  • Osmose (CEA), Sage (Microsoft), Pex (Microsoft), 2008

SLIDE 5

Main Limitations

Two major bottlenecks for Symbolic Execution

  1. constraint solving (along a single path)
  2. number of paths

Path explosion phenomenon

  • nesting of loops and conditional instructions
  • inlining of function calls

Moreover: SE requires a user-defined path bound k

  • things get worse if k is over-estimated
  • sometimes, very long paths are needed to exhibit specific behaviours

Our goal: lower the path explosion in SE

SLIDE 6

Not all Paths are Relevant for STDG

Irrelevant paths

  • In practice, SE enumerates all k-paths
  • But the true goal is to cover “items” (instructions, branches)
  • Some paths are very unlikely to improve the current coverage

Idea: detect a priori irrelevant paths to discard them and lower the path explosion

Our results

  1. three complementary heuristics to prune likely redundant paths
  2. implementation in the Osmose tool and experiments

SLIDE 7

Outline

  • Context
  • Symbolic Execution
  • Heuristics
  • Experiments
  • Conclusion

SLIDE 8

Path Predicate

  • $\pi$: a finite path of the program $P$
  • $D$: the input space of $P$
  • $V \in D$: an input vector

Path predicate: a path predicate for $\pi$ is a formula $\varphi_\pi$ interpreted on $D$ such that if $V \models \varphi_\pi$, then the execution of $P$ on $V$ exercises $\pi$ at runtime.

More formally: let $\pi = \xrightarrow{t_1} \xrightarrow{t_2} \dots \xrightarrow{t_n}$

  • the greatest path predicate is $\bar{\varphi}_\pi = \mathit{wpre}(t_1, \mathit{wpre}(t_2, \dots \mathit{wpre}(t_n, \top)))$
  • a path predicate is any $\varphi_\pi$ such that $\varphi_\pi \Rightarrow \bar{\varphi}_\pi$

A path predicate is typically computed via strongest postcondition
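As an illustration of the forward (strongest-postcondition style) computation, here is a minimal Python sketch. It is not the Osmose implementation: the transition encoding (`"assign"` / `"assume"`), the function names `path_predicate` and `solve`, and the brute-force search standing in for a constraint solver are all assumptions made for this example.

```python
from itertools import product

def path_predicate(path, inputs):
    """Forward symbolic execution of one path.

    path   : list of ("assign", var, expr) or ("assume", cond) transitions,
             where expr and cond are functions of a store (dict var -> value).
    inputs : names of the input variables.
    Returns the conjuncts of a path predicate, each one a function of the
    input vector V only.
    """
    # Symbolic store: every variable is a function of the input vector V.
    store = {v: (lambda V, v=v: V[v]) for v in inputs}
    conjuncts = []
    for kind, *args in path:
        snapshot = dict(store)  # store as seen by this transition

        def evaluate(V, snapshot=snapshot):
            return {name: f(V) for name, f in snapshot.items()}

        if kind == "assign":
            var, expr = args
            store[var] = lambda V, expr=expr, ev=evaluate: expr(ev(V))
        else:  # "assume": the branch condition must hold along this path
            (cond,) = args
            conjuncts.append(lambda V, cond=cond, ev=evaluate: cond(ev(V)))
    return conjuncts

def solve(conjuncts, inputs, domain):
    """Brute-force stand-in for a solver: first V satisfying all conjuncts."""
    for values in product(domain, repeat=len(inputs)):
        V = dict(zip(inputs, values))
        if all(c(V) for c in conjuncts):
            return V
    return None

# Path: x := a + 1 ; assume x > 3 ; y := x * b ; assume y == 8
PATH = [
    ("assign", "x", lambda s: s["a"] + 1),
    ("assume", lambda s: s["x"] > 3),
    ("assign", "y", lambda s: s["x"] * s["b"]),
    ("assume", lambda s: s["y"] == 8),
]
print(solve(path_predicate(PATH, ["a", "b"]), ["a", "b"], range(-5, 6)))
```

The predicate is a pure conjunction of branch conditions rewritten over the inputs, which is why path-based approaches only need quantifier-free formulas.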

SLIDE 9

Framework of Symbolic Execution

Path-based test data generation

  1. choose an uncovered (k-bounded) path π
  2. compute one of its path predicates ϕπ
  3. solve ϕπ: a solution is a TD exercising path π
  4. update coverage; if something is still left to cover, go to 1

Parameter 1 - Logical theory: not relevant here
Parameter 2 - Path enumeration strategy: here, standard DFS
Extension - Concolic execution

SLIDE 10

Symbolic Execution, Basic Procedure (BP)

  • choose a path
  • compute its path predicate, solve it, update the coverage
  • choose the next path by DFS backtracking, and so on
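The basic procedure can be sketched in a few lines of Python. Everything concrete here is an assumption made for illustration: the toy program `run` (a chain of three conditionals, each testing one bit of the input), the brute-force `solve` standing in for path-predicate computation plus constraint solving, and the name `basic_procedure`.

```python
K = 3  # number of chained conditionals in the toy program (assumption)

def run(x):
    """Toy instrumented program: branch i tests bit i of the input x.
    Returns the executed path as a tuple of (branch, outcome) items."""
    return tuple((f"b{i}", bool((x >> i) & 1)) for i in range(K))

def solve(prefix, domain):
    """Brute-force stand-in for a solver: find an input whose execution
    starts with the given path prefix, or None if there is none."""
    for x in domain:
        if run(x)[:len(prefix)] == prefix:
            return x
    return None

def basic_procedure(domain):
    goal = {(f"b{i}", o) for i in range(K) for o in (True, False)}
    covered, tests = set(), []

    def explore(prefix):
        if covered >= goal:
            return                      # nothing left to cover: stop
        x = solve(prefix, domain)       # path predicate + solving
        if x is None:
            return                      # infeasible prefix
        path = run(x)
        tests.append(x)
        covered.update(path)            # update coverage
        # DFS backtracking: flip the deepest decision first.
        for i in reversed(range(len(prefix), len(path))):
            branch, outcome = path[i]
            explore(path[:i] + ((branch, not outcome),))

    explore(())
    return tests, covered

tests, covered = basic_procedure(range(8))
print(tests, len(covered))
```

On this toy program the procedure produces five test data before all six branch outcomes are covered, although two well-chosen inputs (0 and 7) would be enough.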

SLIDE 16

Heuristic 1 : Look-Ahead (LA)

Procedure BP tries to cover a new path at each iteration, BUT this new path does not necessarily cover new items

  • the resolution time is wasted
  • more useless paths will be explored from this prefix

[Figure: CFG of the example function main, with True / False branches]

On the example, full coverage requires at most 3 TD, while there are ≈ $2^{k+1}$ paths of length ≤ k

SLIDE 17

Idea

Check if uncovered items may be reached from the current instruction. If not, solve the current prefix but do not expand it

Optimistic check based on the CFG abstraction of the program

The Look-Ahead heuristic enjoys nice properties

  • soundness: only redundant paths are discarded
  • relative completeness: BP+LA always achieves the same coverage as BP
  • path reduction: BP+LA never explores more paths than BP

Difficulty: efficient computation of the (CFG) reachability set
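A minimal sketch of the Look-Ahead check, grafted onto a DFS test generator. The toy program (a chain of `K` conditionals testing the bits of the input), the brute-force `solve`, and the helper `reachable_items` that plays the role of the CFG reachability set are assumptions for illustration, not the Osmose code.

```python
K = 3  # number of chained conditionals in the toy program (assumption)

def run(x):
    """Toy program: branch i tests bit i of x; returns the executed path."""
    return tuple((f"b{i}", bool((x >> i) & 1)) for i in range(K))

def solve(prefix, domain):
    """Brute-force stand-in for solving the predicate of a path prefix."""
    for x in domain:
        if run(x)[:len(prefix)] == prefix:
            return x
    return None

def reachable_items(depth):
    """CFG abstraction of the chain: after `depth` decisions, only the
    outcomes of the remaining conditionals can still be reached."""
    return {(f"b{i}", o) for i in range(depth, K) for o in (True, False)}

def generate(domain, look_ahead):
    goal = reachable_items(0)
    covered, tests = set(), []

    def explore(prefix):
        if covered >= goal:
            return
        x = solve(prefix, domain)
        if x is None:
            return
        path = run(x)
        tests.append(x)
        covered.update(path)
        # Look-Ahead: the prefix has been solved, but it is expanded only
        # if some uncovered item may still be reached from its last node.
        if look_ahead and not (reachable_items(len(prefix)) - covered):
            return
        for i in reversed(range(len(prefix), len(path))):
            branch, outcome = path[i]
            explore(path[:i] + ((branch, not outcome),))

    explore(())
    return tests, covered

print(generate(range(8), look_ahead=False)[0])
print(generate(range(8), look_ahead=True)[0])
```

With the check enabled the toy run needs four test data instead of five and reaches the same coverage, because the prefix that could only lead to already covered branch outcomes is solved but not expanded.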

SLIDE 18

Reachability Set Computation

Procedure ReachSet : node → Set(node)

The standard worklist algorithm has the following problems in our context

  • all reachability sets are computed at the same time, even if BP will not use all of them
  • it is not designed for interprocedural or context-sensitive analysis

SLIDE 19

Reachability Set Computation (2)

Efficient interprocedural analysis

Efficient computation

  • lazy computation
  • computation cache

Interprocedural analysis

  • compact representation of sets of nodes: manipulate CFG nodes and Call Graph (CG) nodes
  • function summaries: propagate reachable CG nodes (from CG)
  • lazy computation and computation cache extend to CG
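The "lazy computation + computation cache" part can be sketched as follows. This is only an intraprocedural illustration under assumed names (`make_reach_set`, a CFG given as a successor dictionary); the compact CFG/CG node representation and the function summaries are not modelled.

```python
def make_reach_set(cfg):
    """cfg maps each node to the list of its successors.
    Returns (reach_set, cache): reach_set(node) is computed on demand and
    memoised, so nodes that are never queried cost nothing."""
    cache = {}

    def reach_set(node):
        if node in cache:
            return cache[node]
        seen, work = set(), list(cfg.get(node, ()))
        while work:
            n = work.pop()
            if n in seen:
                continue
            seen.add(n)
            if n in cache:
                seen |= cache[n]        # reuse an earlier result
            else:
                work.extend(cfg.get(n, ()))
        cache[node] = frozenset(seen)
        return cache[node]

    return reach_set, cache

# Small CFG with a loop between b and d (hypothetical example).
CFG = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["b"], "e": ["a"]}
reach_set, cache = make_reach_set(CFG)
print(sorted(reach_set("b")), sorted(reach_set("a")), sorted(cache))
```

Only completed results are cached and reused, which keeps the sketch correct on cyclic graphs; the set for `e` is never built because nobody asks for it.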

SLIDE 20

Reachability Set Computation (3)

Context-sensitive analysis

  • the current stack is passed as an argument
  • if the current node can reach a ret instruction, then the procedure is recursively launched on the top of the stack (return site)

ReachSet-context(node, stack, rset) :
    c := ReachSet(node); r := c ∪ rset
    if (stack.empty or ret ∉ c) then return r
    else return ReachSet-context(stack.top, stack.tail, r)

Remark: the computation cache is still a map from node to set, rather than a map from (node, stack) to set
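A runnable Python rendering of ReachSet-context, following the prose description: the recursion continues on the return site only when a ret instruction is reachable from the current node. The toy CFG, the node names, and the choice to include a node in its own reachability set are assumptions for this sketch.

```python
from functools import lru_cache

# Toy intraprocedural edges (assumption): main = m1 -> m2 -> m3, where m1
# calls f and m2 is the return site; f = f1 -> {f2, f3}, f3 loops forever.
EDGES = {"m1": ["m2"], "m2": ["m3"], "m3": [],
         "f1": ["f2", "f3"], "f2": [], "f3": ["f3"]}
RETS = frozenset({"f2", "m3"})  # nodes holding a ret instruction

@lru_cache(maxsize=None)        # cache keyed by node only, not (node, stack)
def reach_set(node):
    """Intraprocedural reachability, including the node itself."""
    seen, work = set(), [node]
    while work:
        n = work.pop()
        if n not in seen:
            seen.add(n)
            work.extend(EDGES[n])
    return frozenset(seen)

def reach_set_context(node, stack, rset=frozenset()):
    """stack is a tuple of return sites, innermost call first."""
    c = reach_set(node)
    r = c | rset
    if not stack or not (c & RETS):
        return r                # no caller left, or no ret reachable
    return reach_set_context(stack[0], stack[1:], r)

print(sorted(reach_set_context("f1", ("m2",))))
print(sorted(reach_set_context("f3", ("m2",))))
```

From `f1` the callee can return, so the nodes after the return site `m2` are added; from `f3` no ret is reachable and the caller's nodes are left out.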

SLIDE 21

Heuristic 2 : Max-CallDepth (MCD)

  • Nested function calls are often the major source of path explosion
  • BP explores all the paths in callees
  • But in unit testing, only paths of the top-level function need to be covered

[Figure: CFGs of main and function f, with nodes c =?= 0, call f, b =?= 0, b := 1, b := 0, Return, and True / False branches]

Example: only two TD are needed to cover the main function, but there are ≈ $2^{k+1}$ paths

SLIDE 22

Idea

(claim) top-level paths rarely depend only on specific behaviours in deep function calls

MCD heuristic: prevent backtracking in deeply nested function calls

Implementation

  • a user-defined mcd parameter
  • a counter depth, updated by call and ret
  • branching is performed only if depth ≤ mcd

Theoretically: take care, the MCD heuristic is not sound
Empirically: experimental results show a very large pruning and no loss in coverage (see the Experiments section)
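The depth counter can be sketched as a filter over the events of an executed path. The event encoding, the function name `branch_points`, and the convention that the top-level function runs at depth 0 are assumptions made for this example.

```python
def branch_points(path, mcd):
    """Return the positions of the decisions the DFS may backtrack on.

    path : list of ("call", f), ("ret", f) or ("branch", cond, outcome).
    mcd  : user-defined maximal call depth (top-level function = depth 0).
    """
    depth = 0
    eligible = []
    for i, event in enumerate(path):
        kind = event[0]
        if kind == "call":
            depth += 1
        elif kind == "ret":
            depth -= 1
        elif kind == "branch" and depth <= mcd:
            eligible.append(i)   # branching allowed only if depth <= mcd
    return eligible

# One execution of a main function that branches, calls f, then branches again.
PATH = [
    ("branch", "c == 0", False),
    ("call", "f"),
    ("branch", "b == 0", True),
    ("ret", "f"),
    ("branch", "c == 0", True),
]
print(branch_points(PATH, mcd=0))  # decisions of main only
print(branch_points(PATH, mcd=1))  # decisions of main and of f
```

Decisions taken deeper than `mcd` are still executed along the path; they are simply never used as backtracking points, which is why the heuristic can miss coverage in principle.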

SLIDE 23

Heuristic 3 : Solve-First (SF)

DFS has two main drawbacks in our context

  • if # TD is limited, DFS focuses only on a deep narrow portion of the program (slow coverage speed)
  • longer (and more complex?) prefixes are solved first

Example: assume #node = 2n+1, all paths are feasible, goal = instruction coverage

  • only two TD are necessary
  • BP+LA: n+1 TD

[Figure: CFG of the example, with branches labelled true]

SLIDE 24

Idea

Slight modification of the concolic DFS procedure

  • at a choice point, choose which branch B1 will be followed (symbolically) first
  • immediately solve the other branch B2 (TD2), execute TD2 and update coverage info, store TD2
  • execute symbolically the procedure through branch B1 (as usual)
  • when backtracking through B2, TD2 can be retrieved if needed

Mostly the DFS symbolic execution, except that along a given prefix, every alternative branch has been concretely expanded once

  • minimal overhead
  • along a path, shorter prefixes are solved first
  • some distant portions of the program (in a DFS ordering) are exercised very early
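The steps above can be sketched as a variant of a concolic DFS. As before, the toy program `run`, the brute-force `solve`, the dictionary `stored` that holds each TD2, and the name `solve_first` are assumptions for illustration, not the Osmose implementation.

```python
K = 3  # number of chained conditionals in the toy program (assumption)

def run(x):
    """Toy program: branch i tests bit i of x; returns the executed path."""
    return tuple((f"b{i}", bool((x >> i) & 1)) for i in range(K))

def solve(prefix, domain):
    """Brute-force stand-in for solving the predicate of a path prefix."""
    for x in domain:
        if run(x)[:len(prefix)] == prefix:
            return x
    return None

def solve_first(domain):
    goal = {(f"b{i}", o) for i in range(K) for o in (True, False)}
    covered, tests, stored = set(), [], {}

    def record(x):
        tests.append(x)
        covered.update(run(x))

    def explore(x, start):
        path = run(x)
        alternatives = []
        # Forward pass along the path driven by x (branch B1 at each choice
        # point): solve and execute the other branch B2 immediately.
        for i in range(start, len(path)):
            if covered >= goal:
                return
            branch, outcome = path[i]
            other = path[:i] + ((branch, not outcome),)
            td2 = solve(other, domain)
            if td2 is not None:
                stored[other] = td2
                record(td2)
                alternatives.append((i, other))
        # Backtracking pass, deepest choice point first, reusing stored TD2.
        for i, other in reversed(alternatives):
            if covered >= goal:
                return
            explore(stored[other], i + 1)

    first = domain[0]   # arbitrary initial input, as in concolic execution
    record(first)
    explore(first, 0)
    return tests, covered

print(solve_first(range(8)))
```

In this toy run the alternatives are solved in order of increasing prefix length (1, then 2, then 3), whereas plain DFS backtracking starts with the longest prefix.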

SLIDE 25

Idea (2)

SLIDE 35

Summary

                 relative        # path       implementation
                 completeness    reduction    in BP
Look-Ahead       yes             always       efficient reach. test
Max-CallDepth    no              not sure     easy
Solve-First      yes             not sure     easy (concolic setting)

SLIDE 37

About experiments

Heuristics implemented in the Osmose tool (SE for executable files)
Small C programs cross-compiled to C509 and PPC architectures
Configuration: Intel Pentium M 2 GHz, RAM 1.2 GBytes, Linux

program           #I     #Br    #F    CD    #T
check-pressure    59     10     3     1     4
square 3x3        272    46     1           43
square 4x4        274    46     1           123
hysteresis        91     16     2     1     35
merge             56     24     3     1     70
triangle          102    38     5     3     15
ppc-square 4x4    226    30     1           125
ppc-hysteresis    76     16     2     1     251
ppc-merge         188    16     3     2     2
ppc-triangle      40     18     3     2     19

The three square rows list only four values in the source; the single value between #Br and #T is shown under #F.

#I: no. of instructions
#Br: no. of branches
#F: no. of functions
CD: maximal call depth
#T: no. of tests (full Br cover)

SLIDE 38

Results

Notations: BP (Basic Procedure), UT (Unit Testing)

Comparisons

  • BP+LA vs BP
  • BP+UT+MCD vs BP+UT
  • BP+SF vs BP

Each cell reads time | #path.

         average benefit    win-loss (W/D/L)    max benefit     max loss
LA       -57% | -57%        7/2/1 | 8/2/0       -80% | -85%     +4% | none
MCD      -85% | -72%        5/1/0 | 5/1/0       -97% | -80%     none | none
SF+LA    -61% | -80%        4/0/5 | 5/0/4       -86% | -98%     +120% | +50%

SLIDE 39

Summary (2)

         theoretical                   empirical
         relative        # path        relative        # path
         completeness    reduction     completeness    reduction
LA       yes             always        yes             -57%
MCD      no              not sure      yes             -72%
SF+LA    yes             not sure      yes             -80%

SLIDE 40

Other experiments

LA overhead: the reachability set is computed, but the inclusion test always answers yes

                                 overhead
                                 mean      variability
RS computed on backtrack only    +0%       +0% - +1%
RS computed at each branch       +2.4%     +0% - +7%

SLIDE 42

Related work (1)

Path enumeration strategy for better coverage speed

  • best-first search (Exe, Sage, Pex): active prefixes are ranked, and the best one is expanded
  • hybrid search (Cute): DFS + random

Redundant paths

  • discard a path prefix if similar to an already expanded path prefix
      • rwset (Exe), state caching / state abstraction (Jpf)
  • discard a path prefix when it cannot reach an interesting state
      • Yogi and the Synergy approach

Concurrent systems and interleaving

  • dynamic partial orders (Cute)

SLIDE 43

Related work (2)

Function calls

Techniques similar to MCD

  • when the maximal depth is reached, a call returns ⊤ (Jpf)
  • function concretisation (Cute) can also be used for path pruning

Other techniques

  • lazy handling of function calls via uninterpreted symbols (Sage)
  • incremental construction of a summary function (Dart)
  • user-defined function specification (PathCrawler)

SLIDE 44

Conclusion

We propose three heuristics to perform path pruning in Symbolic Execution

  • easy to implement, whatever the path enumeration strategy is
  • all three techniques are complementary

Very encouraging results for Look-Ahead and Max-CallDepth on limited benchmarks
Solve-First shows a positive global gain, but much more variability

Future work

  • experiments on larger programs and with other path search methods
  • application to search-based testing?
