S V V .lu software verification & validation Achieving - - PowerPoint PPT Presentation



slide-1
SLIDE 1

SVV.lu (software verification & validation)

Achieving Scalability in Software Testing with Machine Learning and Metaheuristic Search

Lionel Briand

slide-2
SLIDE 2

Definition of Software Testing

  • ISTQB: “Software testing is a process of executing a program or application with the intent of finding the software bugs. It can also be stated as the process of validating and verifying that a software program or application or product meets the business and technical requirements that guided its design and development.”

2

slide-3
SLIDE 3

Scope

  • The main challenge in testing software systems is scalability
  • Addressing scalability entails effective automation
  • Lessons learned from industrial research collaborations: satellite, automotive, finance, energy …
  • Experiences from combining metaheuristic search, machine learning, and other AI techniques, in addressing testing scalability

3

slide-4
SLIDE 4

Scalability

  • The extent to which a technique can be applied on large or complex artifacts (e.g., input spaces, code, models) and still provide useful, automated support with acceptable effort, CPU time, and memory

4

slide-5
SLIDE 5

Collaborative Research @ SnT

5

  • Research in context
  • Addresses actual needs
  • Well-defined problem
  • Long-term collaborations
  • Our lab is the industry
slide-6
SLIDE 6

SVV Dept.

6

  • Established in 2012, part of the SnT centre
  • Requirements Engineering, Security Analysis, Design Verification, Automated Testing, Runtime Monitoring

  • ~ 25 lab members
  • Partnerships with industry
  • ERC Advanced grant
slide-7
SLIDE 7

Outline

  • Overview, problem definition
  • Example research projects with industry partners:
  • Vulnerability testing (Banking)
  • Testing advanced driver assistance systems
  • Testing controllers (automotive)
  • Stress testing critical task deadlines (Energy)
  • Reflections and lessons learned

7

slide-8
SLIDE 8

Introduction

8

slide-9
SLIDE 9

Software Testing

[Diagram] From a SW representation (e.g., specifications) and the SW code, derive test cases, execute them, and compare the test results against expected results or properties (the test oracle): either Test Result == Oracle or Test Result != Oracle. Automation is the key!

slide-10
SLIDE 10

Search-Based Software Testing

  • Express the test generation problem as a search or optimization problem
  • Search for test input data with certain properties, i.e., constraints
  • Non-linearity of software (if, loops, …): complex, discontinuous, non-linear search spaces (Baresel)
  • Many search algorithms (metaheuristics), from local search to global search, e.g., Hill Climbing, Simulated Annealing and Genetic Algorithms

[Figure] Fitness landscape over the input domain, from “Search-Based Software Testing: Past, Present and Future” (Phil McMinn): Genetic Algorithms are global searches, sampling many points across the input domain; random search may fail to fulfil low-probability test goals, since only a small portion of the input domain denotes the required test data and randomly-generated inputs can miss it.
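To make the search formulation concrete, here is a minimal hill-climbing sketch (an illustrative assumption, not code from the talk): the fitness is a branch-distance-style measure that guides the search toward an input covering a hard-to-reach branch; all names and constants are hypothetical.

```python
import random

def branch_distance(x: int) -> int:
    """Fitness for covering the (hypothetical) branch `if x == 4242:`.
    Zero means the branch is covered; smaller is better."""
    return abs(x - 4242)

def hill_climb(start: int, max_steps: int = 100_000) -> int:
    """Local search: repeatedly move to the best neighbour while it improves."""
    current = start
    for _ in range(max_steps):
        if branch_distance(current) == 0:
            break
        neighbours = [current - 100, current - 1, current + 1, current + 100]
        best = min(neighbours, key=branch_distance)
        if branch_distance(best) >= branch_distance(current):
            break  # local optimum (this landscape is unimodal, so it is global)
        current = best
    return current

random.seed(1)
found = hill_climb(random.randint(-10_000, 10_000))
```

On rugged, discontinuous landscapes such a local search gets trapped in local optima, which is exactly why global metaheuristics like Genetic Algorithms are listed above.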

slide-11
SLIDE 11

Vulnerability Testing

11

slide-12
SLIDE 12

[Chart] X-Force Threat Intelligence Index 2017 (https://www.ibm.com/security/xforce/): code injection 42%, manipulated data structures 32%, collect and analyze information 9%, indicator 4%, employ probabilistic techniques 3%, manipulate system resources 3%, subvert access control 3%, abuse existing functionality 2%, engage in deceptive … 2%.

More than 40% of all attacks were injection attacks (e.g., SQLi)

slide-13
SLIDE 13

Web Applications

13

[Diagram] Client → Server → SQL Database

slide-14
SLIDE 14

Web Applications

14

Web form: Username = str1, Password = str2

SQL query:

SELECT * FROM Users WHERE (usr = 'str1' AND psw = 'str2')

Result: the matching row (Name = John, Surname = Smith, …)

[Diagram] Client → Server → SQL Database

slide-15
SLIDE 15

Injection Attacks

15

Web form: Username = ') OR 1=1 --, Password left empty

SQL query:

SELECT * FROM Users WHERE (usr = '' AND psw = '') OR 1=1 --

Query result: the entire Users table (Aria Stark, John Snow, …)

[Diagram] Client → Server → SQL Database
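The mechanics of this bypass are easy to reproduce with a few lines of deliberately vulnerable string concatenation (an illustrative sketch; the function name is hypothetical):

```python
def build_query(usr: str, psw: str) -> str:
    """Deliberately vulnerable: user input is concatenated straight into SQL.
    Never build queries this way; use parameterized queries instead."""
    return "SELECT * FROM Users WHERE (usr = '" + usr + "' AND psw = '" + psw + "')"

legit = build_query("str1", "str2")

# The payload closes the quote and parenthesis, appends a tautology,
# and `--` comments out the remainder of the query.
attack = build_query("') OR 1=1 --", "")
```

The resulting attack string makes the WHERE clause always true, so the whole Users table is returned.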

slide-16
SLIDE 16

Protection Layers

[Diagram] Protection layers between client and database: a Web Application Firewall in front of the server, data input validation and sanitization in the application, and a database firewall in front of the SQL database

16

slide-17
SLIDE 17

Web Application Firewalls (WAFs)

17

[Diagram] The WAF sits in front of the server: malicious requests are blocked, legitimate ones pass through

slide-18
SLIDE 18

WAF Rule Set

18

Rule set of Apache ModSecurity

https://github.com/SpiderLabs/ModSecurity

slide-19
SLIDE 19

Misconfigured WAFs

19

A legitimate request BLOCKED is a false positive; a malicious request ALLOWED through is a false negative.

slide-20
SLIDE 20

Grammar-based Attack Generation

  • BNF grammar for SQLi attacks
  • Random strategy: randomly selected production rules are applied recursively until only terminals are left
  • The random strategy is not efficient for bypassing attacks that are difficult to find
  • Machine learning? Search?
  • How to guide the search? How can ML help?
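The random strategy above can be sketched in a few lines; the toy grammar is a hypothetical, heavily reduced stand-in for the actual SQLi BNF grammar:

```python
import random

# Toy stand-in for the SQLi attack grammar: each nonterminal maps to a list
# of alternative productions; each production is a sequence of symbols.
GRAMMAR = {
    "<attack>": [["'", " ", "<bool>", "<cmt>"]],
    "<bool>": [["OR", " ", "<true>"], ["||", " ", "<true>"]],
    "<true>": [['"a"="a"'], ["1=1"]],
    "<cmt>": [["#"], ["-- "]],
}

def derive(symbol: str, rng: random.Random) -> str:
    """Apply randomly selected production rules recursively
    until only terminals are left."""
    if symbol not in GRAMMAR:  # terminal symbol
        return symbol
    production = rng.choice(GRAMMAR[symbol])
    return "".join(derive(s, rng) for s in production)

rng = random.Random(0)
attacks = [derive("<attack>", rng) for _ in range(5)]
```

Pure random derivation samples rule choices uniformly, which is why rare bypassing attacks are unlikely to be hit without guidance.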
slide-21
SLIDE 21

Anatomy of SQLi attacks

21

Bypassing attack: ' OR "a"="a"#

[Diagram] Derivation tree of the attack under the SQLi grammar, with nonterminals <START>, <sQuoteContext>, <sq>, <wsp>, <sqliAttack>, <cmt>, <boolAttack>, <opOR>, <boolTrueExpr>, <binaryTrue>, <dq>, <ch>, <opEq> expanding to ' OR "a"="a" #. The derivation tree is decomposed into a set S of attack slices.

slide-22
SLIDE 22

Learning Attack Patterns

22

Training set: each attack Ai is encoded as a binary vector over slices S1 … Sn and labeled with its outcome, e.g., A1 = (1, 1, 0, …, 0) Passed; A2 = (0, 1, 0, …, 0) Blocked; …; Am = (1, 1, 1, 1, …, 1) Blocked.

[Diagram] A decision tree over the slices (S4, S3, S2, Sn, S1, …) classifies attacks as Passed or Blocked.

  • Random trees
  • Random forest
slide-23
SLIDE 23

Learning Attack Patterns

23

S1 S2 S3 S4 … Sn Outcome A1 1 1 0 … 0 Passed A2 0 1 0 … 0 Blocked … … … … … … … … Am 1 1 1 1 … 1 Blocked

Passed Blocked

S4

Yes No Yes No Yes No

S3 S2 Sn S1 …

Training Set Decision Tree Attack Pattern S2 ∧ ¬ Sn ∧ S1
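A heavily simplified sketch of pattern extraction (the actual work uses decision-tree learners such as random trees and random forests; this greedy rule induction is an illustrative assumption): build a conjunction of slice conditions that every passing attack satisfies and that excludes as many blocked attacks as possible.

```python
def learn_pattern(rows, n_slices):
    """rows: (slice_vector, outcome) pairs, outcome 'Passed' or 'Blocked'.
    Greedily conjoin conditions (Si or ¬Si) that hold for ALL Passed
    attacks and exclude the most Blocked attacks."""
    passed = [v for v, label in rows if label == "Passed"]
    blocked = [v for v, label in rows if label == "Blocked"]
    pattern = []
    while blocked:
        candidates = []
        for i in range(n_slices):
            for want in (1, 0):
                if all(v[i] == want for v in passed):
                    excluded = sum(1 for v in blocked if v[i] != want)
                    candidates.append((excluded, i, want))
        if not candidates:
            break
        excluded, i, want = max(candidates)
        if excluded == 0:
            break  # no remaining condition separates Passed from Blocked
        blocked = [v for v in blocked if v[i] == want]
        pattern.append(f"S{i+1}" if want else f"¬S{i+1}")
    return pattern

# Toy data: passing attacks all contain S1 and S2 and lack S4.
rows = [
    ([1, 1, 0, 0], "Passed"),
    ([1, 1, 1, 0], "Passed"),
    ([0, 1, 0, 0], "Blocked"),
    ([1, 0, 1, 0], "Blocked"),
    ([1, 1, 0, 1], "Blocked"),
]
pattern = learn_pattern(rows, n_slices=4)
```

On the toy data this recovers the conjunction S1 ∧ S2 ∧ ¬S4, analogous to the S2 ∧ ¬Sn ∧ S1 pattern on the slide.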

slide-24
SLIDE 24

Generating Attacks via ML and EAs

[Diagram] Machine learning (the decision tree over slices, classifying attacks as Passed or Blocked) is combined with an Evolutionary Algorithm (EA) that iteratively refines successful attack conditions.

slide-25
SLIDE 25

Some Results

[Charts] Distinct successful attacks over time against Apache ModSecurity and against industrial WAFs: machine learning-driven attack generation led to more distinct, successful attacks being discovered faster

slide-26
SLIDE 26

Related Work

  • Automated repair of WAFs
  • Automated testing targeting XML and SQL injections in web applications

26

slide-27
SLIDE 27

Testing Advanced Driving Assistance Systems

27

slide-28
SLIDE 28

Cyber-Physical Systems

  • A system of collaborating computational elements controlling physical entities

28

slide-29
SLIDE 29

Advanced Driver Assistance Systems (ADAS)

29

Automated Emergency Braking (AEB), Pedestrian Protection (PP), Lane Departure Warning (LDW), Traffic Sign Recognition (TSR)

slide-30
SLIDE 30

Automotive Environment

  • Highly varied environments, e.g., road topology, weather, buildings and pedestrians …
  • Huge number of possible scenarios, e.g., determined by trajectories of pedestrians and cars
  • ADAS play an increasingly critical role
  • A challenge for testing

30

slide-31
SLIDE 31

Advanced Driver Assistance Systems (ADAS)

Decisions are made over time based on sensor data

31

[Diagram] ADAS loop: sensors/camera observe the environment; the controller makes a decision; actuators act back on the environment

slide-32
SLIDE 32

A General and Fundamental Shift

  • Increasingly, it is easier to learn behavior from data using machine learning than to specify and code it
  • Deep learning, reinforcement learning …
  • Example: neural networks (deep learning)
  • Millions of weights learned
  • No explicit code, no specifications
  • Verification, testing?

32

slide-33
SLIDE 33

CPS Development Process

Model-in-the-Loop stage: functional modeling of controllers, plant, and decision logic with continuous and discrete Simulink models; model simulation and testing. Architecture modeling (structure, behavior, traceability) with system engineering models (SysML); analysis: model execution and testing, model-based testing, traceability and change impact analysis, …

Software-in-the-Loop stage: (partial) code generation.

Hardware-in-the-Loop stage: deployed executables on the target platform, with hardware (sensors, …) and analog simulators; testing at this stage is expensive.

slide-34
SLIDE 34

Automotive Environment

  • Highly varied environments, e.g., road topology, weather, buildings and pedestrians …
  • Huge number of possible scenarios, e.g., determined by trajectories of pedestrians and cars
  • ADAS play an increasingly critical role
  • A challenge for testing

34

slide-35
SLIDE 35

Our Goal

  • Developing an automated testing technique for ADAS
  • To help engineers efficiently and effectively explore the complex test input space of ADAS
  • To identify critical (failure-revealing) test scenarios
  • Characterization of the input conditions that lead to the most critical situations, e.g., safety violations

slide-36
SLIDE 36

Automated Emergency Braking System (AEB)

“Brake-request” when braking is needed to avoid collisions

[Diagram] The vision (camera) sensor provides objects’ position/speed to the decision-making component, which commands the brake controller.

slide-37
SLIDE 37

Example Critical Situation

  • “AEB properly detects a pedestrian in front of the car with a high degree of certainty and applies braking, but an accident still happens where the car hits the pedestrian at a relatively high speed”

37

slide-38
SLIDE 38

Testing ADAS

38

On-road testing vs. simulation-based (model) testing, using a simulator based on physical/mathematical models

slide-39
SLIDE 39

Testing via Physics-based Simulation

39

[Diagram] The ADAS (SUT) is connected to a Matlab/Simulink simulator and model covering the physical plant (vehicle / sensors / actuators), other cars, pedestrians, and the environment (weather / roads / traffic signs); test inputs go in, time-stamped test outputs come out.

slide-40
SLIDE 40

AEB Domain Model

[Class diagram] Test Scenario (simulationTime: Real, timeStep: Real) is composed of:
  • Weather: visibility: VisibilityRange, fog: Boolean, fogColor: FogColor; subclasses Normal, Rain (rainType: RainType), Snow (snowType: SnowType); OCL constraint: {self.fog = false implies self.visibility = “300” and self.fogColor = None}
  • Road: frictionCoeff: Real; subclasses Straight, Ramped (height: RampHeight), Curved (radius: CurvedRadius)
  • Vehicle: v0: Real
  • Pedestrian: position and speed attributes (Real)
  • AEB Output: TTC: Real, certaintyOfDetection: Real, braking: Boolean, plus output functions (F1, F2)
Static inputs are the weather/road/vehicle parameters; dynamic inputs are xp, yp, vp, θp (pedestrian) and vc (car).
Enumerations: RainType {ModerateRain, HeavyRain, VeryHeavyRain, ExtremeRain}; SnowType {ModerateSnow, HeavySnow, VeryHeavySnow, ExtremeSnow}; FogColor {DimGray, Gray, DarkGray, Silver, LightGray, None}; CurvedRadius (CR) {5, 10, 15, 20, 25, 30, 35, 40}; RampHeight (RH) {4, 6, 8, 10, 12}; VisibilityRange {10, 20, …, 300}

slide-41
SLIDE 41

ADAS Testing Challenges

  • Test input space is large, complex and multidimensional
  • Explaining failures and fault localization are difficult
  • Execution of physics-based simulation models is computationally expensive

41

slide-42
SLIDE 42

Black-Box Search-based Testing

42

[Diagram] Test input generation (NSGA-II) selects the best tests and generates new ones; the (candidate) test inputs are evaluated by simulating every candidate and computing the fitness functions, whose values feed back into the search. Inputs: input data ranges/dependencies, the simulator, and fitness functions defined based on oracles. Output: test cases revealing worst-case system behaviors.

slide-43
SLIDE 43

Search: Genetic Evolution

43

[Diagram] Genetic evolution loop: initial input → fitness computation → selection → breeding

slide-44
SLIDE 44

Better Guidance

  • Fitness computations rely on simulations and are very expensive
  • Search needs better guidance

44

slide-45
SLIDE 45

Decision Trees

45

Partition the input space into homogeneous regions.

[Decision tree] All points (count 1200: 79% non-critical, 21% critical) are split on vp0 (≥ 7.2 km/h vs. < 7.2 km/h), then on θp0 (< 218.6 vs. ≥ 218.6) and on road topology (CR = 5, Straight, RH = [4 − 12] m vs. CR = [10 − 40] m), yielding leaves whose criticality ranges from 2% to 69%.

slide-46
SLIDE 46

Genetic Evolution Guided by Classification

46

[Diagram] Genetic evolution loop with classification: initial input → fitness computation → classification → selection → breeding

slide-47
SLIDE 47

Search Guided by Classification

47

[Diagram] As before, test input generation (NSGA-II) and test evaluation alternate, but each iteration now builds a classification tree, selects/generates tests in the fittest regions, and then applies the genetic operators. Output: test cases revealing worst-case system behaviors, plus a characterization of critical input regions.
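A drastically simplified, one-dimensional illustration of the idea (the real approach uses NSGA-II with decision trees over many dimensions; all names and constants here are hypothetical): partition the domain into intervals, score each interval by its best observed fitness, and concentrate the next generation of samples in the most critical interval.

```python
import random

def expensive_fitness(x: float) -> float:
    """Stand-in for a costly simulation; lower values are more critical."""
    return (x - 3.0) ** 2

def region_guided_search(lo=0.0, hi=10.0, bins=10, per_gen=30, gens=8, seed=0):
    rng = random.Random(seed)
    evaluated = [(x, expensive_fitness(x))
                 for x in (rng.uniform(lo, hi) for _ in range(per_gen))]
    width = (hi - lo) / bins
    for _ in range(gens):
        # "Classify" the domain: best observed fitness per interval
        best_by_bin = {}
        for x, f in evaluated:
            b = min(int((x - lo) / width), bins - 1)
            best_by_bin[b] = min(best_by_bin.get(b, float("inf")), f)
        critical = min(best_by_bin, key=best_by_bin.get)
        # Sample the next generation inside the most critical interval
        region_lo = lo + critical * width
        news = [rng.uniform(region_lo, region_lo + width) for _ in range(per_gen)]
        evaluated += [(x, expensive_fitness(x)) for x in news]
    return min(evaluated, key=lambda t: t[1])

best_x, best_f = region_guided_search()
```

Focusing sampling on the regions the classifier marks as critical spends the expensive simulation budget where failures are most likely.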

slide-48
SLIDE 48

NSGAII-DT vs. NSGAII

48

NSGAII-DT outperforms NSGAII

[Plots] HV, GD, and SP over time (2 to 24 hours) for NSGAII-DT vs. NSGAII.

slide-49
SLIDE 49

Testing Controllers

49

slide-50
SLIDE 50

Dynamic Continuous Controllers

50

slide-51
SLIDE 51

MiL Test Cases

51

[Diagram] A MiL test case assigns input signals S1, S2, S3 over time; model simulation produces the output signal(s). Two example test cases are shown.

slide-52
SLIDE 52
Simple Example

  • Supercharger bypass flap controller:
  ✓ flap position is bounded within [0..1]
  ✓ implemented in MATLAB/Simulink
  ✓ 34 (sub-)blocks decomposed into 6 abstraction levels

[Diagram] Supercharger bypass flap: flap position = 0 (open), flap position = 1 (closed)

52

slide-53
SLIDE 53

MiL Testing of Controllers

[Diagram] Test input: the desired value steps from an initial desired value to a final desired value at T/2, over a simulation of length T. Test output: the actual value tracking the desired value. The controller (SUT) receives the error between the desired and actual values and drives the plant model; the system output is fed back as the actual value.

53

slide-54
SLIDE 54

Configurable Controllers at MiL

[Diagram] PID controller with plant model: e(t) = desired(t) − actual(t); output(t) = KP·e(t) + KI·∫e(t)dt + KD·de(t)/dt. The error terms are time-dependent variables; KP, KI, and KD are configuration parameters.
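The PID law on this slide can be simulated directly. Below is a minimal discrete-time sketch in which a simple integrator plant stands in for the real Simulink plant model; the gains, time step, and plant are illustrative assumptions:

```python
def simulate_pid(desired=1.0, kp=2.0, ki=0.5, kd=0.1, dt=0.01, steps=2000):
    """Discrete PID: output(t) = KP*e(t) + KI*integral(e) + KD*de/dt,
    driving an integrator plant (actual' = output)."""
    actual, integral = 0.0, 0.0
    prev_error = desired - actual
    for _ in range(steps):
        error = desired - actual
        integral += error * dt
        derivative = (error - prev_error) / dt
        output = kp * error + ki * integral + kd * derivative
        actual += output * dt  # Euler step of the integrator plant
        prev_error = error
    return actual

final_value = simulate_pid()
```

Search-based MiL testing then varies the step inputs (ID, FD) and the configuration parameters KP, KI, KD to find the worst-case deviation between desired and actual value.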

54

slide-55
SLIDE 55

Requirements and Test Objectives

[Diagram] The desired value (input) steps from Initial Desired (ID) to Final Desired (FD) at T/2; the actual value (output) must satisfy smoothness, responsiveness, and stability requirements.

55

slide-56
SLIDE 56

A Search-Based Test Approach

What are the worst-case (Initial Desired, Final Desired) inputs?

  • Search directed by model execution feedback
  • Controller’s dynamic behavior can be complex
  • Meta-heuristic search in (large) input space: finding worst-case inputs
  • Possible because of the automated oracle (feedback loop)
  • Different worst cases for different requirements

56

slide-57
SLIDE 57

Initial Solution

[Diagram] 1. Exploration: the controller-plant model and objective functions based on the requirements produce a HeatMap diagram over (Initial Desired, Final Desired); a domain expert selects a list of critical regions. 2. Single-state search within those regions yields worst-case scenarios (desired vs. actual value over time).

57

slide-58
SLIDE 58

Results

  • We found much worse scenarios during MiL testing than our partner had found so far
  • These scenarios are also run at the HiL level, where testing is much more expensive: MiL results => test selection for HiL
  • But further research was needed:
  • Simulations are expensive
  • Configuration parameters

58

slide-59
SLIDE 59

Final Solution

[Diagram] 1. Exploration with dimensionality reduction; 2. Search with surrogate modeling. The controller model (Simulink) and objective functions feed the process; a domain expert inspects a regression tree over the list of critical partitions; the output is a set of worst-case scenarios.

  • Visualization of the 8-dimensional space using regression trees
  • Dimensionality reduction to identify the significant variables (Elementary Effect Analysis)
  • Surrogate modeling to predict the fitness function and speed up the search (neural network)

59

slide-60
SLIDE 60

Regression Tree

[Regression tree] All points (count 1000, mean 0.007822, std dev 0.0049497) are split on FD ≥ 0.43306 vs. FD < 0.43306, then on ID ≥ 0.64679 vs. ID < 0.64679, and finally on Cal5 thresholds (0.020847 and 0.014827), yielding leaves with counts 373, 182, 244, 70, and 131 and clearly separated mean fitness values (from 0.0047594 up to 0.0134555).

60

slide-61
SLIDE 61

Surrogate Modeling

Any supervised learning or statistical technique providing fitness predictions with confidence intervals can serve as a surrogate:

  • 1. Predicted higher fitness with high confidence: move to the new position, no simulation
  • 2. Predicted lower fitness with high confidence: do not move to the new position, no simulation
  • 3. Low confidence in the prediction: run a simulation

[Plot] Surrogate model vs. the real fitness function over the input x.
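The three-way decision can be captured in a few lines; the surrogate is assumed to return a prediction together with a confidence-interval half-width (names and threshold are hypothetical):

```python
def surrogate_decision(predicted, ci_halfwidth, current_fitness, max_uncertainty=0.1):
    """Decide on a candidate move using a surrogate prediction.
    Returns 'move', 'stay', or 'simulate'."""
    if ci_halfwidth > max_uncertainty:
        return "simulate"  # 3. low confidence: pay for a real simulation
    if predicted - ci_halfwidth > current_fitness:
        return "move"      # 1. confidently higher fitness: move, no simulation
    if predicted + ci_halfwidth < current_fitness:
        return "stay"      # 2. confidently lower fitness: stay, no simulation
    return "simulate"      # interval straddles the current fitness: simulate
```

Only the uncertain cases trigger a real simulation, which is where the speed-up comes from.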

61

slide-62
SLIDE 62

Results

✓ Our approach is able to identify more critical violations of the controller requirements than had been found with default/fixed configurations or by manual testing.

[Table] Worst cases found per requirement (stability / smoothness / responsiveness):
  • MiL testing, different configurations: 2.2% deviation, 24% over/undershoot, 170 ms response time
  • MiL testing, fixed configurations: 20% over/undershoot, 80 ms response time
  • Manual MiL testing: 5% over/undershoot, 50 ms response time

62

slide-63
SLIDE 63

Schedulability Analysis and Testing

63

slide-64
SLIDE 64

Problem and Context

  • Schedulability analysis encompasses techniques that try to predict whether (critical) tasks are schedulable, i.e., meet their deadlines
  • Stress testing runs carefully selected test cases that have a high probability of leading to deadline misses
  • Stress testing is complementary to schedulability analysis
  • Testing is typically expensive, e.g., hardware in the loop
  • Finding stress test cases is difficult

64

slide-65
SLIDE 65

Finding Stress Test Cases is Hard

65

[Diagram] Jobs j0, j1, j2 arrive at times at0, at1, at2 and must finish before deadlines dl0, dl1, dl2. Two timelines over a period T show that j1 can miss its deadline dl1 depending on when at2 occurs.
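The effect can be reproduced with a toy non-preemptive FIFO scheduler (a strong simplification; the actual work encodes a real-time OS scheduler as constraints, and all job parameters below are illustrative):

```python
def fifo_schedule(jobs):
    """Single-core, non-preemptive FIFO: run jobs in arrival order and
    return the names of jobs that miss their deadlines.
    jobs: (name, arrival, duration, deadline) tuples."""
    misses, now = [], 0
    for name, arrival, duration, deadline in sorted(jobs, key=lambda j: j[1]):
        now = max(now, arrival) + duration  # start when the core is free
        if now > deadline:
            misses.append(name)
    return misses

base = [("j0", 0, 2, 3), ("j1", 3, 2, 5)]
# j2 arriving after j1: every job meets its deadline
no_miss = fifo_schedule(base + [("j2", 4, 2, 9)])
# j2 arriving before j1 delays j1 past dl1 = 5
miss = fifo_schedule(base + [("j2", 2, 2, 9)])
```

Stress testing searches the feasible ranges of arrival times such as at2 for exactly these deadline-missing schedules.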

slide-66
SLIDE 66

Challenges and Solutions

  • Ranges for arrival times form a very large input space
  • Task interdependencies and properties constrain what parts of the space are feasible
  • Solution: we re-expressed the problem as a constraint optimization problem and used a combination of constraint programming (IBM CPLEX) and meta-heuristic search (GA)

66

slide-67
SLIDE 67

Constraint Optimization

67

A constraint optimization problem is formed from:
  • Static properties of tasks (constants)
  • Dynamic properties of tasks (variables)
  • The performance requirement (objective function)
  • The OS scheduler behaviour (constraints)

slide-68
SLIDE 68

Combining CP and GA

68

[Figure] Overview of GA+CP: solutions in the initial population of the GA evolve into …

slide-69
SLIDE 69

Case Study

69

[Diagram] A real-time operating system on a multicore architecture, with drivers (software-hardware interface), control modules, and alarm devices (hardware). The system monitors gas leaks and fire in oil extraction platforms.
slide-70
SLIDE 70

Summary

  • We provided a solution for generating stress test cases by combining meta-heuristic search and constraint programming
  • Meta-heuristic search (GA) identifies high-risk regions in the input space
  • Constraint programming (CP) finds provably worst-case schedules within these (limited) regions
  • Achieves (nearly) GA efficiency and CP effectiveness
  • Our approach can be used both for stress testing and schedulability analysis (assumption free)

70

slide-71
SLIDE 71

Reflecting

71

slide-72
SLIDE 72

Search-Based Solutions

  • Versatile
  • Helps relax assumptions compared to exact approaches
  • Helps decrease modeling requirements
  • Scalability, e.g., easy to parallelize
  • Requires massive empirical studies
  • Search is rarely sufficient by itself

72

slide-73
SLIDE 73

Multidisciplinary Approach

  • Single-technology approaches rarely work in practice
  • Combined search with:
  • Machine learning
  • Solvers, e.g., CP, SMT
  • Statistical approaches, e.g., sensitivity analysis
  • System and environment modeling and simulation

73

slide-74
SLIDE 74

Objectives

  • Reduce search space
  • Better guide and focus search
  • Compute fitness and provide guidance
  • Avoid expensive and useless fitness computations
  • Explain failures (e.g., decision trees)
  • Get more guarantees (e.g., constraint programming)

74

slide-75
SLIDE 75

Acknowledgements

  • Shiva Nejati
  • Reza Matinnejad
  • Raja Ben Abdessalem
  • Stefano Di Alesio
  • Dennis Appelt
  • Annibale Panichella

75

slide-76
SLIDE 76

Selected References

  • L. Briand et al., “Testing the untestable: Model testing of complex software-intensive systems”, IEEE/ACM ICSE 2016, V2025
  • R. Matinnejad et al., “MiL Testing of Highly Configurable Continuous Controllers: Scalable Search Using Surrogate Models”, IEEE/ACM ASE 2014 (Distinguished Paper Award)
  • S. Di Alesio et al., “Combining genetic algorithms and constraint programming to support stress testing of task deadlines”, ACM Transactions on Software Engineering and Methodology (TOSEM), 25(1):4, 2015
  • R. Ben Abdessalem et al., “Testing Vision-Based Control Systems Using Learnable Evolutionary Algorithms”, IEEE/ACM ICSE 2018
  • D. Appelt et al., “A Machine Learning-Driven Evolutionary Approach for Testing Web Application Firewalls”, to appear in IEEE Transactions on Reliability
  • More on: https://wwwen.uni.lu/snt/people/lionel_briand?page=Publications

76

slide-77
SLIDE 77

SVV.lu (software verification & validation)

Achieving Scalability in Software Testing with Machine Learning and Metaheuristic Search

Lionel Briand