

SLIDE 1

Sampling Effect on Performance Prediction of Configurable Systems : A Case Study

Juliana Alves Pereira, Mathieu Acher, Hugo Martin, Jean-Marc Jézéquel


SLIDES 2–6

Configurable systems

Pros

  • Adaptive
  • Lots of options

Cons

  • Lots of options (and interactions)
  • Increasingly complex

Machine learning to the rescue

SLIDES 7–12

Machine Learning and Configurable systems

  • Sampling
  • Measuring
  • Learning
  • Validation
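The four stages above make up the usual pipeline for learning performance models of configurable systems. A minimal end-to-end sketch in Python; the three binary options, their effect values, and the naive per-option learner are illustrative stand-ins (the study itself relies on performance-influence models), not the authors' implementation:

```python
import itertools
import random

# Toy configurable system: three hypothetical binary options whose (assumed)
# true influences on performance are purely additive.
OPTIONS = ["cabac", "ref", "bframes"]
EFFECTS = {"cabac": 5.0, "ref": 2.0, "bframes": 3.0}

def measure(config):
    """Measuring (stand-in): base cost plus the effect of each enabled option."""
    return 10.0 + sum(EFFECTS[o] for o in OPTIONS if config[o])

def sample(n, rng):
    """Sampling: n configurations drawn uniformly at random."""
    return [{o: rng.random() < 0.5 for o in OPTIONS} for _ in range(n)]

def learn(configs, perfs):
    """Learning (naive sketch): estimate one additive influence per option as
    the difference of mean performance with the option on vs. off."""
    mean_perf = sum(perfs) / len(perfs)
    rate, influence = {}, {}
    for o in OPTIONS:
        on = [p for c, p in zip(configs, perfs) if c[o]]
        off = [p for c, p in zip(configs, perfs) if not c[o]]
        rate[o] = len(on) / len(perfs)
        influence[o] = sum(on) / len(on) - sum(off) / len(off) if on and off else 0.0
    return lambda c: mean_perf + sum(influence[o] * (c[o] - rate[o]) for o in OPTIONS)

def mre(predict, configs, perfs):
    """Validation: Mean Relative Error against measured performance."""
    return sum(abs(predict(c) - p) / p for c, p in zip(configs, perfs)) / len(perfs)

rng = random.Random(42)
train = sample(200, rng)                          # 1. sampling
train_perf = [measure(c) for c in train]          # 2. measuring
model = learn(train, train_perf)                  # 3. learning
space = [dict(zip(OPTIONS, bits))                 # 4. validation on the whole space
         for bits in itertools.product([False, True], repeat=len(OPTIONS))]
error = mre(model, space, [measure(c) for c in space])
```

With the additive toy oracle, the learned influences approximate the true effects and the validation MRE stays small.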

SLIDES 13–16

Distance-Based Sampling of Software Configuration Spaces

  • C. Kaltenecker, A. Grebhahn, N. Siegmund, J. Guo and S. Apel, "Distance-Based Sampling of Software Configuration Spaces," 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE), Montreal, QC, Canada, 2019, pp. 1084-1094.
  • Proposing a new sampling solution: distance-based sampling
  • Empirical study on 10 subject systems and 6 sampling strategies
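The core idea of distance-based sampling can be sketched in a few lines. In the simplified form below, the distance of a configuration is its number of enabled options; a distance is drawn uniformly, then a random configuration at that distance. This is only a sketch: the real approach must also respect feature-model constraints, which are ignored here.

```python
import random

def distance_based_sample(n_options, n_samples, rng):
    """Simplified distance-based sampling (after Kaltenecker et al., without
    constraints): pick a distance d uniformly at random, then a random
    configuration with exactly d options enabled. This spreads samples evenly
    across distances, whereas plain random sampling concentrates around
    n_options / 2 enabled options."""
    samples = []
    for _ in range(n_samples):
        d = rng.randint(0, n_options)              # uniform over distances 0..n
        enabled = set(rng.sample(range(n_options), d))  # random config at distance d
        samples.append([i in enabled for i in range(n_options)])
    return samples

rng = random.Random(0)
configs = distance_based_sample(20, 1000, rng)
distances = [sum(c) for c in configs]  # roughly uniform over 0..20
```

Note the contrast: with 20 options, plain random sampling would almost never produce the all-off configuration (probability 2^-20), while distance-based sampling yields it about once every 21 draws.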

SLIDES 17–20

Sampling strategies

  • Coverage-based
  • Solver-based
  • Randomized solver-based
  • Random
  • Distance-based
  • Diversified distance-based
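The strategies differ mainly in how they pick configurations from the constrained space. As a baseline for intuition, plain random sampling over a constrained space can be sketched as rejection sampling; the `is_valid` predicate below is a stand-in for a feature-model check (solver-based strategies delegate this to a SAT solver instead, since rejection is hopeless when valid configurations are rare):

```python
import random

def random_valid_sample(n_options, is_valid, n_samples, rng, max_tries=100000):
    """Random sampling over the *valid* configuration space via rejection:
    draw uniformly over all 2^n configurations and keep only those that
    satisfy the constraints."""
    samples, tries = [], 0
    while len(samples) < n_samples and tries < max_tries:
        tries += 1
        config = [rng.random() < 0.5 for _ in range(n_options)]
        if is_valid(config):
            samples.append(config)
    return samples

# Example constraint: option 0 and option 1 are mutually exclusive.
excl = lambda c: not (c[0] and c[1])
rng = random.Random(1)
sample = random_valid_sample(10, excl, 50, rng)
```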

SLIDES 21–22

Subject systems

  • 7z
  • BerkeleyDB-C
  • Dune MGS
  • HIPAcc
  • Java GC
  • LLVM
  • LRZIP
  • Polly
  • VPXENC
  • x264

Experiment setup

  • Machine learning based on multiple linear regression with forward feature selection
  • Accuracy measured as Mean Relative Error (MRE)
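The learning setup named above can be sketched as greedy forward feature selection on top of multiple linear regression, scored by MRE. This is a simplified pure-Python stand-in (no interaction terms, selection scored on training error), not the implementation used in the study:

```python
import itertools

def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit(X, y, features):
    """Multiple linear regression of y on an intercept plus the chosen feature
    columns, via the normal equations."""
    cols = [[1.0] * len(y)] + [[float(row[f]) for row in X] for f in features]
    k, n = len(cols), len(y)
    A = [[sum(cols[i][t] * cols[j][t] for t in range(n)) for j in range(k)] for i in range(k)]
    b = [sum(cols[i][t] * y[t] for t in range(n)) for i in range(k)]
    beta = solve(A, b)
    return lambda row: beta[0] + sum(beta[i + 1] * row[f] for i, f in enumerate(features))

def mre(pred, X, y):
    """Mean Relative Error of a model over measured configurations."""
    return sum(abs(pred(r) - t) / t for r, t in zip(X, y)) / len(y)

def forward_selection(X, y):
    """Greedy forward feature selection: repeatedly add the option that most
    reduces the error; stop when no remaining option improves it."""
    selected, best = [], mre(fit(X, y, []), X, y)
    remaining = set(range(len(X[0])))
    while remaining:
        err, feat = min((mre(fit(X, y, selected + [f]), X, y), f) for f in remaining)
        if err >= best - 1e-9:
            break
        selected.append(feat)
        remaining.remove(feat)
        best = err
    return selected, best

# Toy data: performance = 10 + 5*opt0 + 2*opt1; opt2 is irrelevant.
X = [list(bits) for bits in itertools.product([0, 1], repeat=3)]
y = [10.0 + 5 * r[0] + 2 * r[1] for r in X]
selected, err = forward_selection(X, y)  # picks opt0, then opt1, skips opt2
```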

SLIDE 23

Results

  • Coverage-based sampling is dominant at small sample sizes
  • Diversified distance-based sampling is dominant at larger sample sizes
  • Diversified distance-based sampling is close to random sampling in accuracy, and even better in some cases

SLIDE 24

Is it true?

SLIDES 25–29

Replicating the experiment

  • Subject system: x264, a video encoder
  • Changing the input video: 17 videos
  • Changing the measured non-functional property
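Measuring the two non-functional properties amounts to timing one encoder run and reading the size of its output. A sketch; the x264 flags in the comment are purely illustrative, and a trivial stand-in command keeps the example runnable without x264 installed:

```python
import os
import subprocess
import sys
import tempfile
import time

def measure(cmd, output_path):
    """Measure one encoder run: wall-clock encoding time and the size of the
    encoded output file (the two properties used in the replication)."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    elapsed = time.perf_counter() - start
    return elapsed, os.path.getsize(output_path)

# With a real x264 build this would look like (flags purely illustrative):
#   measure(["x264", "--no-cabac", "-o", "out.264", "input.y4m"], "out.264")
# Stand-in command writing 1024 bytes, so the sketch runs anywhere:
out = os.path.join(tempfile.mkdtemp(), "out.bin")
cmd = [sys.executable, "-c", f"open(r'{out}', 'wb').write(b'x' * 1024)"]
elapsed, size = measure(cmd, out)
```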

SLIDES 30–33

Experimental setup

What varies?

  • Sampling strategy (6 strategies)
  • Sample size (3 sample sizes)
  • Encoded video (17 videos) 🔵
  • System configuration (1152 configurations)
  • Measured property (encoding time, encoding size) 🔵

What doesn’t vary?

  • Learning algorithm (performance-influence model)
  • Learning algorithm hyperparameters
  • Configurable software (x264) 🔵
  • Version 🔶
  • Hardware 🔶
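Multiplying out the varied dimensions gives the size of the experimental grid; a quick sanity check under the numbers listed above (repetitions per cell aside):

```python
# Dimensions varied in the replication, as listed above.
strategies = 6   # sampling strategies
sizes = 3        # sample sizes
videos = 17      # input videos
properties = 2   # encoding time, encoding size

# One prediction experiment per combination, each trained on a sample drawn
# from the 1152 measured configurations of a video.
cells = strategies * sizes * videos * properties
```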

SLIDE 34

Results

SLIDES 35–41

[Results table for encoding time]

SLIDES 42–47

[Results table for encoding size]

SLIDES 48–51

Results

  • High variation between videos and between non-functional properties
  • Encoding time:
    ○ Results similar to the original study
    ○ Random sampling dominant over diversified distance-based sampling
  • Encoding size:
    ○ Random sampling and randomized solver-based sampling overall dominant
    ○ Most strategies reach good and similar accuracy at larger sample sizes

SLIDES 52–55

Replicability

  • Fully replicable experiment
  • Dataset of video encoding times and sizes available
  • Docker image with all data and scripts for performance prediction and results aggregation: https://github.com/jualvespereira/ICPE2020

SLIDES 56–59

What’s next?

  • How do version and hardware affect sampling effectiveness?
  • How does the machine learning technique affect sampling effectiveness?
  • How to leverage the fact that some sampling strategies overperform by focusing on important options?

SLIDES 60–63

Conclusion

  • Random sampling is a strong baseline, hard to beat
  • Diversified distance-based sampling is a strong alternative
  • Researchers should be aware that the effectiveness of sampling strategies can be biased by the inputs and the performance property used