Balancing Soundness and Efficiency for Practical Testing of Configurable Systems



slide-1
SLIDE 1

Balancing Soundness and Efficiency for Practical Testing of Configurable Systems

Sabrina Souto, UEPB, Brazil, sabrinadfs@gmail.com
Marcelo d’Amorim, UFPE, Brazil, damorim@cin.ufpe.br
Rohit Gheyi, UFCG, Brazil, rohit@dsc.ufcg.edu.br

slide-2
SLIDE 2

Configurable Systems

Many other examples!

[Figure: a configurable system and its configurations]

slide-3
SLIDE 3

Bugs in Configurable Systems

Configuration-related bug!

[Figure: a configurable system, its configurations, and a configuration-related bug]

slide-4
SLIDE 4

Testing Configurable Systems

[Figure: tests for a monolithic system compared with tests for a configurable system and its configurations]

slide-5
SLIDE 5

Limitations of Existing Techniques

[Plot axes: Efficiency (#samples) and Efficacy (#failures)]

slide-6
SLIDE 6

Limitations of Existing Techniques

[Plot axes: Efficiency (#samples) and Efficacy (#failures)]

  • Exhaustive: finds all bugs, but is very expensive

slide-7
SLIDE 7

Limitations of Existing Techniques

[Plot axes: Efficiency (#samples) and Efficacy (#failures)]

  • Exhaustive: finds all bugs, but is very expensive
  • Default: very efficient, but can miss bugs

slide-8
SLIDE 8

Limitations of Existing Techniques

[Plot axes: Efficiency (#samples) and Efficacy (#failures)]

  • Exhaustive: finds all bugs, but is very expensive
  • Default: very efficient, but can miss bugs
  • Sampling: tries to find bugs with fewer samples, but produces false positives and false negatives

slide-9
SLIDE 9

Limitations of Existing Techniques

[Plot axes: Efficiency (#samples) and Efficacy (#failures)]

  • Exhaustive: finds all bugs, but is very expensive
  • Default: very efficient, but can miss bugs
  • Sampling: tries to find bugs with fewer samples, but produces false positives and false negatives
  • Dynamic (SPLat [FSE’13, SPLC’15]): considers code and test, but may not scale in all cases

slide-10
SLIDE 10

Limitations of Existing Techniques

[Plot axes: Efficiency (#samples) and Efficacy (#failures)]

  • Exhaustive
  • Default
  • Sampling
  • Dynamic (SPLat)

Sampling + SPLat = S-SPLat

slide-11
SLIDE 11

Example

Techniques compared: SPLat, S-SPLat (one-enabled), Sampling (one-enabled)

slide-12
SLIDE 12

Example

Notepad

  • 17 configuration variables
  • Only 3 are reached by toolbar()

[Figure: Test]

Techniques compared: SPLat, S-SPLat (one-enabled), Sampling (one-enabled)

slide-13
SLIDE 13

Example

Notepad

  • 17 configuration variables
  • Only 3 are reached by toolbar()

[Figure: Test]

Techniques compared: SPLat, S-SPLat (one-enabled), Sampling (one-enabled)

  • Sampling (one-enabled): 17 configurations

slide-14
SLIDE 14

Example

Notepad

  • 17 configuration variables
  • Only 3 are reached by toolbar()

[Figure: Test]

Techniques compared: SPLat, S-SPLat (one-enabled), Sampling (one-enabled)

  • Sampling (one-enabled): 17 configurations
  • SPLat: 6 configurations

slide-15
SLIDE 15

Example

Notepad

  • 17 configuration variables
  • Only 3 are reached by toolbar()

[Figure: Test]

Techniques compared: SPLat, S-SPLat (one-enabled), Sampling (one-enabled)

  • Sampling (one-enabled): 17 configurations
  • SPLat: 6 configurations
  • S-SPLat (one-enabled): 2 configurations
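
The configuration counts in this example can be related to the size of the configuration space with a small sketch. It assumes boolean configuration variables and uses placeholder names (`V1` to `V17`), since the slides do not list Notepad's actual variables. `one_enabled` is a hypothetical helper that follows the usual description of the one-enabled heuristic (each sampled configuration enables exactly one variable); it is not taken from the S-SPLat implementation.

```python
from typing import Dict, List

Config = Dict[str, bool]


def one_enabled(variables: List[str]) -> List[Config]:
    """Return one configuration per variable: that variable on, all others off."""
    return [{v: (v == chosen) for v in variables} for chosen in variables]


# Placeholder names (assumption): the real Notepad variables are not in the slides.
variables = [f"V{i}" for i in range(1, 18)]

samples = one_enabled(variables)
exhaustive = 2 ** len(variables)  # every on/off combination of 17 variables
reachable_bound = 2 ** 3          # upper bound when a test reaches only 3 variables

print(len(samples), exhaustive, reachable_bound)  # prints: 17 131072 8
```

One-enabled sampling over all 17 variables produces 17 configurations no matter which variables the test actually reads. A dynamic technique that varies only the 3 reached variables needs at most 8 runs, and 8 is only an upper bound: the slide reports 6 for SPLat.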

slide-16
SLIDE 16

S-SPLat

Input

  • Tests
  • Feature Model (Optional)
  • Instrumented Configurable System
  • Sampling Heuristic

Output

Tests executed with reachable and satisfiable configurations: C1, T1; C2, T1; C1, T2; C5, T2; C4, T3; …

slide-17
SLIDE 17

S-SPLat

Input

  • Tests
  • Feature Model (Optional)
  • Instrumented Configurable System
  • Sampling Heuristic

For all tests:

  • Find reachable variables
  • Look for next reachable configuration
  • Check: sampling heuristic, feature model
  • Yes: run the test Ti
  • Otherwise: look for next reachable configuration

Output

Tests executed with reachable and satisfiable configurations: C1, T1; C2, T1; C1, T2; C5, T2; C4, T3; …
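
The loop on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `explore`, `accept`, `toy_test`, and `at_most_one_enabled` are hypothetical names, variables are boolean and default to off when first read, the feature-model check is omitted, and the toy test does not correspond to any test from the talk. Reachable variables are discovered by recording which ones a test reads, and the next candidate is found by backtracking over that record.

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, bool]
Reader = Callable[[str], bool]


def explore(test: Callable[[Reader], None],
            accept: Callable[[Config], bool] = lambda cfg: True) -> List[Config]:
    """Run `test` once per reachable configuration whose forced part passes `accept`."""
    executed: List[Config] = []
    stack: List[Tuple[str, bool]] = []  # variables in first-read order with forced values
    while True:
        forced = dict(stack)
        if accept(forced):  # check the sampling heuristic before running
            seen: Config = {}
            order: List[str] = []

            def read(name: str) -> bool:
                if name not in seen:  # first read: variable becomes reachable
                    seen[name] = forced.get(name, False)
                    order.append(name)
                return seen[name]

            test(read)
            executed.append(dict(seen))
            stack = [(n, seen[n]) for n in order]
        # Backtrack: drop trailing enabled entries, then enable the last disabled one.
        while stack and stack[-1][1]:
            stack.pop()
        if not stack:
            return executed
        stack[-1] = (stack[-1][0], True)


def at_most_one_enabled(partial: Config) -> bool:
    """A partial configuration stays compatible with one-enabled sampling."""
    return sum(partial.values()) <= 1


def toy_test(read: Reader) -> None:
    # Hypothetical test: which variables are read depends on the value of A.
    if read("A"):
        read("B")
        read("C")
    else:
        read("B")


print(len(explore(toy_test)))                       # prints: 6
print(len(explore(toy_test, at_most_one_enabled)))  # prints: 3
```

Without a filter the sketch runs the toy test under every reachable configuration; with the filter, candidates that enable more than one variable are skipped without running the test. The counts apply to this toy test only and are not meant to reproduce the Notepad numbers.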

slide-18
SLIDE 18

EVALUATION


slide-19
SLIDE 19

Research Questions

RQ1: Which heuristics maximize efficiency (#samples)?
RQ2: Which heuristics maximize efficacy (#failures)?
RQ3: Which heuristics (basic or combination) maximize efficiency and efficacy?

slide-20
SLIDE 20

Scenarios

Software Product Lines (SPLs): 8 subjects

GCC: Version 4.8.2 and Version 6.1

  • All existing tests
  • All existing options
  • 3,557 tests
  • 50 most frequently cited options in bug reports
  • 17K+ tests
  • 2k+ variables

slide-21
SLIDE 21

Evaluation: SPLs

slide-22
SLIDE 22

Evaluation Techniques

SPLs: 8 subjects

Techniques:

  1. SPLat
  2. SPLat + med
  3. SPLat + oe
  4. SPLat + od
  5. SPLat + pw
  6. SPLat + ran

[ICSE’16, ASE’14]
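
To make the abbreviations concrete, here is a minimal sketch of three of the heuristics. It rests on assumptions: the slides spell out only "one-enabled" (oe) and "most-enabled-disabled" (med), so reading od as one-disabled and ran as random is an assumption, as are the function names and the 50% enable probability in the random sampler. The definitions follow the way these heuristics are usually described in the sampling literature; pairwise (pw) is left out because it needs a covering-array generator.

```python
import random
from typing import Dict, List

Config = Dict[str, bool]


def one_disabled(options: List[str]) -> List[Config]:
    """od (assumed meaning): one configuration per option, that option off, the rest on."""
    return [{o: (o != chosen) for o in options} for chosen in options]


def most_enabled_disabled(options: List[str]) -> List[Config]:
    """med: two configurations, everything on and everything off."""
    return [{o: True for o in options}, {o: False for o in options}]


def random_sample(options: List[str], n: int, seed: int = 0) -> List[Config]:
    """ran (assumed meaning): n configurations with each option on with probability 0.5."""
    rng = random.Random(seed)
    return [{o: rng.random() < 0.5 for o in options} for _ in range(n)]


options = ["A", "B", "C", "D"]  # placeholder option names
print(len(one_disabled(options)), len(most_enabled_disabled(options)))  # prints: 4 2
```

The sample sizes differ by construction: od grows linearly with the number of options, while med always yields two configurations.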

slide-23
SLIDE 23

RQ1: Which heuristics maximize efficiency (#samples)?


Findings

RQ2: Which heuristics maximize efficacy (#failures)?

SPLat+ SPLat+


SPLat and SPLat+ SPLat+

slide-24
SLIDE 24

Findings

RQ3: Which heuristics maximize efficiency (#samples) and efficacy (#failures)?

Combinations of heuristics

  • oe x od x med x pw
  • c1 = oe+od
  • c2 = oe+med
  • c3 = oe+pw
  • …
  • c11 = oe+od+med+pw
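
The count of 11 combinations follows from choosing two or more of the four basic heuristics: C(4,2) + C(4,3) + C(4,4) = 6 + 4 + 1 = 11. The sketch below enumerates them in the order oe, od, med, pw. That this ordering is how the talk numbers c4 to c10 is an assumption; it does agree with every combination the slides name explicitly (c1, c2, c3, c7, and c11).

```python
from itertools import combinations

heuristics = ["oe", "od", "med", "pw"]

# All subsets of size 2, 3, and 4, smallest subsets first.
combos = [c
          for size in range(2, len(heuristics) + 1)
          for c in combinations(heuristics, size)]

named = {f"c{i}": "+".join(c) for i, c in enumerate(combos, start=1)}

for name, members in named.items():
    print(name, "=", members)
```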

slide-25
SLIDE 25

Findings

RQ3: Which heuristics maximize efficiency (#samples) and efficacy (#failures)?

[Plot axes: #failures and #samples]

  • SPLat did not scale for some subjects.
  • The sampling heuristics reduced the number of samples explored by SPLat while retaining its ability to reveal failures.
  • SPLat+c11 (oe + od + med + pw) optimized #failures at the expense of #samples.
  • SPLat+Most-enabled-disabled optimized #samples at the expense of #failures.

slide-26
SLIDE 26

Evaluation


slide-27
SLIDE 27

Evaluation Techniques

GCC: Version 4.8.2 and Version 6.1

Techniques:

  1. SPLat
  2. SPLat + med
  3. SPLat + oe
  4. SPLat + od
  5. SPLat + pw
  6. SPLat + ran

[ICSE’16, ASE’14]

slide-28
SLIDE 28

Findings

Version 6.1

RQ1: Which heuristics maximize efficiency (#samples)?
RQ2: Which heuristics maximize efficacy (#failures)?

SPLat+ and SPLat+ SPLat+ SPLat+ SPLat+

slide-29
SLIDE 29

Findings

RQ3: Which heuristics maximize efficiency (#samples) and efficacy (#failures)?

Version 6.1

[Plot axes: #bugs and #samples]

Bugs found: 2 new bugs reported.

It is preferable to pick the best-performing heuristics in the leftmost group: the best choices!

slide-30
SLIDE 30

Findings

RQ3: Which heuristics maximize efficiency (#samples) and efficacy (#failures)?

Version 4.8.2

[Plot axes: #bugs and #samples]

Bugs found: All five bugs were captured. SPLat+c2 (oe+med) found all bugs with a relatively small number of samples.

slide-31
SLIDE 31

Lessons Learned

  • For SPLs: c11 (oe+od+med+pw)
  • For GCC: c2 (oe+med)
  • For SPLs and GCC: c7 (oe+od+med)
  • [ICSE 2016] A comparison of 10 sampling algorithms for configurable systems.
  • Combine different simple heuristics
  • Avoid heuristics with a large number of requirements

slide-32
SLIDE 32

S-SPLat found a good balance between bugs and samples

The sampling heuristics helped to reduce the number of samples explored by SPLat without losing the ability to find bugs

S-SPLat could deal with scalability

It revealed bugs in potentially large configuration spaces

https://sabrinadfs.github.io/s-splat/ sabrinadfs@gmail.com

slide-33
SLIDE 33

BACKUP SLIDES


slide-34
SLIDE 34

RQ1: #samples

[Plot: #samples per technique]

  • SPLat and ran explored many samples.
  • med explored the smallest sample sets.
  • od explored the largest sample sets.

RQ2: #failures

[Plot: #failures per technique]

  • od and pw found almost the same number of failures as SPLat, but they required far fewer samples.

slide-35
SLIDE 35

RQ3: #samples x #failures

[Plot axes: #failures and #samples]

  • Combinations of heuristics
  • oe x od x med x pw
  • c1 = oe+od
  • c2 = oe+med
  • c3 = oe+pw
  • …
  • c11 = oe+od+med+pw

SPLat and med optimize one dimension at the expense of the other.
c11 (oe + od + med + pw) performed consistently well in all cases.
The sampling heuristics reduced the number of samples explored by SPLat while retaining its ability to reveal failures.

slide-36
SLIDE 36

Version 6.1

RQ1: #samples

[Plot: #samples per technique]

RQ2: #bugs

[Plot: #bugs per technique]

  • pw found more failures. It was one of the most expensive techniques.
  • oe and od found almost the same number of failures as pw, but with far fewer samples.

slide-37
SLIDE 37

Discussion

  • c2 found all crashes with a relatively low number of configurations
  • c7 performed better: it detected most failures and crashes with a relatively small number of configurations
  • Combine different simple heuristics instead of using one that entails a larger number of test requirements
  • S-SPLat is promising for revealing errors in potentially large configuration spaces

slide-38
SLIDE 38

Handling Constraints


SPLs

Complex models

  • 54% of the selected configurations are invalid
  • 43% of failures are false positives

GCC

The use of validation is not necessary

  • Crashes were only revealed in valid configurations

The techniques performed consistently with and without feature constraints
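
As an illustration of what validating configurations against a feature model involves, the sketch below filters an exhaustive set of configurations with two constraints. Everything in it is hypothetical: the option names, the constraints ("B requires A", "C excludes D"), and the `is_valid` helper are invented for the example and do not come from the subjects in the talk. A failure that shows up only in configurations rejected by such a check is presumably what the slide counts as a false positive.

```python
from itertools import product
from typing import Callable, Dict, List

Config = Dict[str, bool]
Constraint = Callable[[Config], bool]

# Hypothetical feature model: B requires A, and C excludes D.
feature_model: List[Constraint] = [
    lambda c: (not c["B"]) or c["A"],
    lambda c: not (c["C"] and c["D"]),
]


def is_valid(cfg: Config, model: List[Constraint]) -> bool:
    """A configuration is valid when it satisfies every constraint."""
    return all(rule(cfg) for rule in model)


options = ["A", "B", "C", "D"]
all_configs = [dict(zip(options, bits))
               for bits in product([False, True], repeat=len(options))]
valid = [c for c in all_configs if is_valid(c, feature_model)]

print(len(valid), "of", len(all_configs), "configurations are valid")  # prints: 9 of 16 configurations are valid
```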

slide-39
SLIDE 39

Additional Evaluations

S-SPLat x Regular Sampling

  • Regular Sampling detected the same bugs as S-SPLat, but with more configurations.

Random Sampling with more rates: 10% and 30%

  • New results are proportional to the change in the sampling rates of random.

slide-40
SLIDE 40

Threats to Validity and Limitations

  • The selection of subjects
  • We used subjects from a variety of sources, including a large configurable system with hundreds of options
  • Possible implementation errors
  • We thoroughly checked our implementation and our experimental results
  • Our datasets and implementations are publicly available: https://sabrinadfs.github.io/s-splat/
  • SPLat currently only supports systems with dynamically bound feature variables
  • It remains to investigate how SPLat and S-SPLat would perform on systems with #ifdef variability