SVV.lu — software verification & validation
Achieving Scalability in Software Testing with Machine Learning and Metaheuristic Search
Lionel Briand
Definition of Software Testing
- ISTQB: “Software testing is a process of executing a program or application with the intent of finding software bugs. It can also be stated as the process of validating and verifying that a software program, application, or product meets the business and technical requirements that guided its design and development.”
2
Scope
- The main challenge in testing software systems is
scalability
- Addressing scalability entails effective automation
- Lessons learned from industrial research collaborations:
satellite, automotive, finance, energy …
- Experiences from combining metaheuristic search,
machine learning, and other AI techniques, in addressing testing scalability
3
Scalability
- The extent to which a technique can be applied to large or complex artifacts (e.g., input spaces, code, models) and still provide useful, automated support with acceptable effort, CPU time, and memory
4
Collaborative Research @ SnT
5
- Research in context
- Addresses actual needs
- Well-defined problem
- Long-term collaborations
- Our lab is the industry
SVV Dept.
6
- Established in 2012, part of the SnT centre
- Requirements Engineering, Security Analysis, Design Verification,
Automated Testing, Runtime Monitoring
- ~ 25 lab members
- Partnerships with industry
- ERC Advanced grant
Outline
- Overview, problem definition
- Example research projects with industry partners:
- Vulnerability testing (Banking)
- Testing advanced driver assistance systems
- Testing controllers (automotive)
- Stress testing critical task deadlines (Energy)
- Reflections and lessons learned
7
Introduction
8
Software Testing
9
(Figure: testing process) SW representation (e.g., specifications) → derive test cases → execute test cases on SW code → get test results → compare against expected results or properties (test oracle): [Test Result == Oracle] pass, [Test Result != Oracle] fail
Automation!
Search-Based Software Testing
- Express test generation problem
as a search or optimization problem
- Search for test input data with
certain properties, i.e., constraints
- Non-linearity of software (if, loops, …): complex, discontinuous, non-linear search spaces (Baresel)
- Many search algorithms
(metaheuristics), from local search to global search, e.g., Hill Climbing, Simulated Annealing and Genetic Algorithms
(Figure: fitness landscape over the input domain)
- Genetic Algorithms are global searches, sampling many points of the input domain in parallel
- Random search may fail to fulfil low-probability test goals: only a small portion of the input domain denotes the required test data, so randomly-generated inputs are unlikely to hit it
“Search-Based Software Testing: Past, Present and Future”, Phil McMinn
10
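As a toy illustration of why guided search can beat random sampling, the sketch below compares random search with a simple hill climber using an adaptive step size. The fitness is a hypothetical branch-distance for a made-up target branch `x == 4242` (everything here is an illustrative assumption, not the deck's actual setup):

```python
import random

def fitness(x):
    """Branch-distance-style fitness for a hypothetical target branch
    `if x == 4242:` — 0 means the test input covers the branch."""
    return abs(x - 4242)

def random_search(budget=1000, lo=0, hi=1_000_000):
    """Baseline: keep the best of `budget` uniformly random inputs."""
    return min((random.randint(lo, hi) for _ in range(budget)), key=fitness)

def hill_climb(lo=0, hi=1_000_000, max_evals=10_000):
    """Local search with an adaptive step: double the step while improving,
    halve it when stuck (a simplified pattern search)."""
    x, step, evals = random.randint(lo, hi), 1, 0
    while fitness(x) > 0 and evals < max_evals:
        improved = False
        for nb in (x - step, x + step):
            evals += 1
            if fitness(nb) < fitness(x):
                x, improved = nb, True
                break
        step = step * 2 if improved else max(1, step // 2)
    return x

random.seed(1)
print(fitness(hill_climb()))   # guided search covers the branch: prints 0
```

The fitness gradient gives the climber direction; random search only wins by luck, which is exactly the low-probability-goal problem noted above.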
Vulnerability Testing
11
(Chart: attack type distribution)
- Code Injection: 42%
- Manipulated data structures: 32%
- Collect and analyze information: 9%
- Indicator: 4%
- Employ probabilistic techniques: 3%
- Manipulate system resources: 3%
- Subvert access control: 3%
- Abuse existing functionality: 2%
- Engage in deceptive …: 2%
X-Force Threat Intelligence Index 2017
12
https://www.ibm.com/security/xforce/
More than 40% of all attacks were injection attacks (e.g., SQLi)
Web Applications
13
Server SQL Database Client
Web Applications
14
Web form: Username = str1, Password = str2
SQL query: SELECT * FROM Users WHERE (usr = ‘str1’ AND psw = ‘str2’)
Result: Name Surname … (John Smith …)
(Client → Server → SQL Database)
Injection Attacks
15
Web form input (Password field): ‘) OR 1=1 --
SQL query: SELECT * FROM Users WHERE (usr = ‘’ AND psw = ‘’) OR 1=1 --
Query result: every row in Users (Name Surname … Aria Stark … John Snow … …)
(Client → Server → SQL Database)
Protection Layers
(Figure: protection layers between client and SQL database — data input validation and sanitization, web application firewall, database firewall)
16
Web Application Firewalls (WAFs)
17
(Figure: the WAF filters out malicious requests and lets legitimate traffic through to the server)
WAF Rule Set
18
Rule set of Apache ModSecurity
https://github.com/SpiderLabs/ModSecurity
Misconfigured WAFs
19
- Legitimate request BLOCKED → false positive
- Attack ALLOWED through → false negative
Grammar-based Attack Generation
- BNF grammar for SQLi attacks
- Random strategy: randomly selected production rules are
applied recursively until only terminals are left
- Random strategy not efficient for bypassing attacks that are
difficult to find
- Machine learning? Search?
- How to guide the search? How can ML help?
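The random strategy above can be sketched in a few lines. The grammar below is a toy, deliberately non-recursive stand-in for the deck's BNF for SQLi attacks (the real grammar is recursive and far larger); `derive` applies randomly selected production rules until only terminals remain:

```python
import random

# Toy grammar in the spirit of the deck's BNF for SQLi attacks
# (illustrative only — not the actual rule set, and non-recursive
# so the sketch always terminates).
GRAMMAR = {
    "<attack>":     [["<sq>", "<wsp>", "<boolAttack>", "<wsp>", "<cmt>"]],
    "<boolAttack>": [["OR", "<wsp>", "<trueExpr>"]],
    "<trueExpr>":   [["<dq>", "<ch>", "<dq>", "=", "<dq>", "<ch>", "<dq>"],
                     ["1", "=", "1"]],
    "<sq>": [["'"]], "<dq>": [["\""]], "<wsp>": [[" "]],
    "<ch>": [["a"], ["b"]], "<cmt>": [["#"], ["--"]],
}

def derive(symbol="<attack>"):
    """Random strategy: apply randomly selected production rules until
    only terminals are left."""
    if symbol not in GRAMMAR:            # terminal symbol
        return symbol
    rule = random.choice(GRAMMAR[symbol])
    return "".join(derive(s) for s in rule)

random.seed(0)
print(derive())   # prints one random attack string, e.g. of the form ' OR 1=1 --
```

Uniformly random derivations like this rarely produce the specific rule combinations that bypass a WAF, which is the inefficiency the ML/search combination addresses.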
Anatomy of SQLi attacks
21
Bypassing attack: ‘ OR“a”=“a”#
Derivation tree (from the BNF grammar): <START> → <sq> <wsp> <sqliAttack> <cmt>; <sqliAttack> → <boolAttack> → <opOR> <boolTrueExpr>; <boolTrueExpr> → <binaryTrue> → <dq> <ch> <dq> <opEq> <dq> <ch> <dq>, yielding ‘ _ OR”a”=“a” #
Attack slices: S = { subtrees of the derivation tree }
Learning Attack Patterns
22
Training set (one row per attack, one column per slice):
     S1  S2  S3  S4  …  Sn | Outcome
A1    1   1   0      …   0 | Passed
A2    0   1   0      …   0 | Blocked
…     …   …   …      …   … | …
Am    1   1   1   1  …   1 | Blocked
(Figure: decision tree over slices S1…Sn, with Yes/No branches classifying attacks as Passed or Blocked)
- Random trees
- Random forest
Learning Attack Patterns
23
(Same training set and decision tree as on the previous slide)
Attack pattern (machine learning) extracted from a tree path leading to “Passed”: S2 ∧ ¬Sn ∧ S1
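The pattern-extraction idea can be sketched without a full decision-tree learner. The hypothetical training set below mirrors the slide's slice matrix; instead of a tree, a simpler conjunction extractor (an illustrative substitute, not the deck's method) finds literals shared by all passed attacks that rule out all blocked ones:

```python
# Hypothetical training set: one row per attack, 0/1 columns for slice
# presence (S1..S4), and whether the attack passed the WAF.
attacks = [
    ([1, 1, 0, 0], "Passed"),
    ([1, 1, 0, 1], "Passed"),
    ([0, 1, 0, 0], "Blocked"),
    ([1, 0, 1, 0], "Blocked"),
]

def learn_pattern(data):
    """Extract a conjunctive attack pattern: literals (slice present/absent)
    shared by every passed attack and violated by every blocked one."""
    passed = [s for s, o in data if o == "Passed"]
    blocked = [s for s, o in data if o == "Blocked"]
    literals = []
    for i in range(len(data[0][0])):
        values = {s[i] for s in passed}
        if len(values) == 1:             # slice is constant over passed attacks
            literals.append((i, values.pop()))
    # sanity check: the pattern must rule out every blocked attack
    assert all(any(s[i] != v for i, v in literals) for s in blocked)
    return literals

print(learn_pattern(attacks))   # [(0, 1), (1, 1), (2, 0)], i.e. S1 ∧ S2 ∧ ¬S3
```

A decision tree generalizes this: each root-to-leaf path is exactly such a conjunction, with impurity-based splits handling noisy, non-separable data.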
Generating Attacks via ML and EAs
24
Evolutionary Algorithm (EA)
Iteratively refine successful attack conditions
(Figure: the decision tree’s “Passed” paths guide the EA toward attack patterns that bypass the WAF)
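A minimal sketch of the evolutionary side, under heavy simplifying assumptions: the WAF is mocked by one regex signature, grammar-level mutations are reduced to string tweaks, and fitness is just "does the WAF let it through" (the deck's EA instead refines attack conditions from the learned tree):

```python
import random, re

def waf_blocks(payload):
    """Mock WAF standing in for a real rule set: blocks one simple signature."""
    return re.search(r"OR\s+1=1", payload, re.IGNORECASE) is not None

SEEDS = ["' OR 1=1 --", "' OR 2=2 --", "' OR \"a\"=\"a\" #"]

def mutate(payload):
    """Grammar-level mutations are reduced here to string tweaks."""
    tweaks = [lambda p: p.replace("=", " = "),
              lambda p: p.replace("OR", "||"),
              lambda p: p.lower()]
    return random.choice(tweaks)(payload)

def evolve(population, generations=20, size=10):
    """Keep the variants the WAF lets through and breed new ones by mutation."""
    for _ in range(generations):
        population += [mutate(random.choice(population)) for _ in range(size)]
        population.sort(key=waf_blocks)      # bypassing payloads sort first
        population = population[:size]
    return [p for p in population if not waf_blocks(p)]

random.seed(3)
bypasses = evolve(SEEDS[:])
print(len(bypasses) > 0)   # True: bypassing variants survive selection
```

Selection keeps whatever slips past the filter, so the population drifts toward the blind spots of the rule set — the same pressure the tree-guided EA applies more systematically.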
Some Results
25
(Charts: number of distinct successful attacks found over time, for Apache ModSecurity and industrial WAFs)
Machine learning-driven attack generation discovered more distinct, successful attacks, faster
Related Work
- Automated repair of WAFs
- Automated testing targeting XML and SQL injections in web
applications
26
Testing Advanced Driving Assistance Systems
27
Cyber-Physical Systems
- A system of collaborating computational elements controlling
physical entities
28
Advanced Driver Assistance Systems (ADAS)
29
Automated Emergency Braking (AEB) Pedestrian Protection (PP) Lane Departure Warning (LDW) Traffic Sign Recognition (TSR)
Automotive Environment
- Highly varied environments, e.g., road topology, weather, building
and pedestrians …
- Huge number of possible scenarios, e.g., determined by
trajectories of pedestrians and cars
- ADAS play an increasingly critical role
- A challenge for testing
30
Advanced Driver Assistance Systems (ADAS)
Decisions are made over time based on sensor data
31
Sensors Controller Actuators Decision Sensors /Camera Environment ADAS
A General and Fundamental Shift
- Increasingly, it is easier to learn behavior from data using machine learning than to specify and code it
- Deep learning, reinforcement learning …
- Example: Neural networks (deep learning)
- Millions of weights learned
- No explicit code, no specifications
- Verification, testing?
32
CPS Development Process
33
- Model-in-the-Loop stage — functional modeling (controllers, plant, decision): continuous and discrete Simulink models; model simulation and testing
- Architecture modelling (structure, behavior, traceability): system engineering modeling (SysML); analysis: model execution and testing, model-based testing, traceability and change impact analysis, …
- Software-in-the-Loop stage — (partial) code generation
- Hardware-in-the-Loop stage — deployed executables on target platform; hardware (sensors, …), analog simulators; testing (expensive)
Our Goal
- Developing an automated testing technique
for ADAS
35
- To help engineers efficiently and
effectively explore the complex test input space of ADAS
- To identify critical (failure-revealing) test
scenarios
- Characterization of input conditions that
lead to most critical situations, e.g., safety violations
Automated Emergency Braking System (AEB)
36
“Brake-request” when braking is needed to avoid collisions
(Figure: camera sensor → vision → objects’ position/speed → decision making → brake controller)
Example Critical Situation
- “AEB properly detects a pedestrian in front of the car with a
high degree of certainty and applies braking, but an accident still happens where the car hits the pedestrian with a relatively high speed”
37
Testing ADAS
38
On-road testing vs. simulation-based (model) testing using a simulator based on physical/mathematical models
Testing via Physics-based Simulation
39
ADAS (SUT) ↔ simulator (Matlab/Simulink) with models of:
- the physical plant (vehicle / sensors / actuators)
- other cars
- pedestrians
- the environment (weather / roads / traffic signs)
Test input → simulation → time-stamped test output
AEB Domain Model
(Class diagram, flattened here as a list)
- Test Scenario: simulationTime: Real, timeStep: Real
- Weather: visibility: VisibilityRange, fog: Boolean, fogColor: FogColor; subclasses Normal, Rain (rainType: RainType), Snow (snowType: SnowType); {OCL} self.fog=false implies self.visibility = “300” and self.fogColor = None
- Road: frictionCoeff: Real; subclasses Straight, Ramped (height: RampHeight), Curved (radius: CurvedRadius)
- Vehicle: v0: Real
- Pedestrian: four Real attributes
- Mobile object: position vector; Position: x: Real, y: Real
- AEB Output: TTC: Real, certaintyOfDetection: Real, braking: Boolean; output functions: two Reals
- «enumeration» RainType: ModerateRain, HeavyRain, VeryHeavyRain, ExtremeRain
- «enumeration» SnowType: ModerateSnow, HeavySnow, VeryHeavySnow, ExtremeSnow
- «enumeration» FogColor: DimGray, Gray, DarkGray, Silver, LightGray, None
- «enumeration» CurvedRadius (CR): 5, 10, 15, 20, 25, 30, 35, 40
- «enumeration» RampHeight (RH): 4, 6, 8, 10, 12
- «enumeration» VisibilityRange: 10, 20, 30, …, 300
- Static input: Weather, Road, Vehicle (v0); dynamic input: xp, yp, vp, θp (pedestrian), vc, v1, v2, v3 (vehicle); output: F1, F2
ADAS Testing Challenges
- Test input space is large, complex and multidimensional
- Explaining failures and fault localization are difficult
- Execution of physics-based simulation models is computationally
expensive
41
Black-Box Search-based Testing
42
- Test input generation (NSGA-II): select best tests, generate new tests → (candidate) test inputs
- Evaluating test inputs: simulate every (candidate) test, compute fitness functions → fitness values
- Inputs: input data ranges/dependencies + simulator + fitness functions defined based on oracles
- Output: test cases revealing worst-case system behaviors
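The generate-simulate-select loop can be sketched with a single-objective GA (the deck uses multi-objective NSGA-II). The simulator is a made-up cheap stub, and the scenario encoding (car speed, pedestrian speed, pedestrian orientation) is an illustrative subset of the domain model:

```python
import random

def simulate(scenario):
    """Stub for the expensive physics-based simulator: returns a criticality
    fitness (lower = closer to an accident). Entirely made up for the sketch."""
    v0, vp, theta = scenario
    return abs(100 - 2 * v0 - 5 * vp) + abs(theta - 90) / 10

def random_scenario():
    return [random.uniform(10, 90),    # initial car speed v0 (km/h)
            random.uniform(1, 18),     # pedestrian speed vp (km/h)
            random.uniform(0, 360)]    # pedestrian orientation (degrees)

def crossover(a, b):
    return [random.choice(pair) for pair in zip(a, b)]

def mutate(s):
    s = s[:]
    s[random.randrange(len(s))] *= random.uniform(0.9, 1.1)
    return s

def search(pop_size=20, generations=30):
    """Evolve scenarios toward worst-case (most critical) behavior."""
    pop = [random_scenario() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=simulate)                     # select the most critical
        parents = pop[:pop_size // 2]              # elitism: parents survive
        pop = parents + [mutate(crossover(*random.sample(parents, 2)))
                         for _ in range(pop_size - len(parents))]
    return min(pop, key=simulate)

random.seed(7)
worst = search()
print(simulate(worst) < 50)   # evolved scenario is far more critical than average
```

Each `simulate` call stands for minutes of physics-based simulation in the real setting, which is why the later slides add classification to avoid wasting evaluations.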
Search: Genetic Evolution
43
Initial input Fitness computation Selection Breeding
Better Guidance
- Fitness computations rely on simulations and are very
expensive
- Search needs better guidance
44
Decision Trees
45
Partition the input space into homogeneous regions
(Figure: decision tree over test inputs. Root: 1200 points, 79% “non-critical” / 21% “critical”. Splits on pedestrian speed vp0 (< vs. ≥ 7.2 km/h), pedestrian orientation θp0 (< vs. ≥ 218.6), and road topology (CR = 5, Straight, RH = [4−12] m vs. CR = [10−40] m). Leaves range from 98% “non-critical” / 2% “critical” to 31% “non-critical” / 69% “critical”.)
Genetic Evolution Guided by Classification
46
Initial input Fitness computation Classification Selection Breeding
Search Guided by Classification
47
- Test input generation (NSGA-II): build a classification tree, select/generate tests in the fittest regions, apply genetic operators → (candidate) test inputs
- Evaluating test inputs: simulate every (candidate) test, compute fitness functions → fitness values
- Inputs: input data ranges/dependencies + simulator + fitness functions defined based on oracles
- Output: test cases revealing worst-case system behaviors + a characterization of critical input regions
NSGAII-DT vs. NSGAII
48
NSGAII-DT outperforms NSGAII
(Plots: hypervolume (HV), generational distance (GD), and spread (SP) over time (2–24 h), NSGAII-DT vs. NSGAII)
Testing Controllers
49
Dynamic Continuous Controllers
50
MiL Test Cases
51
Model simulation: input signals (S1, S2, S3 over time t) → output signal(s)
(Figure: two test cases, each a different set of input signals over time)
- Supercharger bypass flap controller:
- Flap position is bounded within [0..1]
- Implemented in MATLAB/Simulink
- 34 (sub-)blocks decomposed into 6 abstraction levels
- Flap position = 0 (open), flap position = 1 (closed)
Simple Example
52
(Figure: test input — a step from an initial desired value to a final desired value at T/2; test output — the actual value tracking the desired value over [0, T])
Closed loop: desired value → (+/−) error → controller (SUT) → system output → plant model → actual value (fed back)
MiL Testing of Controllers
53
Configurable Controllers at MiL
(Figure: PID controller and plant model)
e(t) = desired(t) − actual(t)
output(t) = K_P · e(t) + K_I · ∫ e(t) dt + K_D · de(t)/dt
Time-dependent variables: desired(t), actual(t), e(t); configuration parameters: K_P, K_I, K_D
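The control law above can be exercised in a minimal closed-loop simulation. The first-order plant model and the gain values below are illustrative assumptions, not the supercharger controller from the case study:

```python
# Minimal closed-loop simulation of the PID law above on a hypothetical
# first-order plant (gains and plant model are illustrative assumptions).
def pid_simulate(desired, kp=2.0, ki=1.0, kd=0.1, dt=0.01, steps=2000):
    actual, integral, prev_error = 0.0, 0.0, 0.0
    trace = []
    for _ in range(steps):
        error = desired - actual                  # e(t) = desired(t) - actual(t)
        integral += error * dt                    # accumulate the integral of e(t)
        derivative = (error - prev_error) / dt    # approximate de(t)/dt
        u = kp * error + ki * integral + kd * derivative
        prev_error = error
        actual += (u - actual) * dt               # first-order plant response
        trace.append(actual)
    return trace

trace = pid_simulate(1.0)
print(abs(trace[-1] - 1.0) < 0.05)   # the output settles at the desired value
```

The `trace` is exactly the kind of time-stamped output the test objectives quantify: its overshoot measures smoothness, its settling time responsiveness, and its final deviation stability.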
54
Requirements and Test Objectives
(Figure: step from Initial Desired (ID) to Final Desired (FD) at T/2; the actual value (output) tracks the desired value (input); annotated requirements: smoothness, responsiveness, stability)
55
A Search-Based Test Approach
Initial Desired (ID) Final Desired (FD)
Worst Case(s)?
- Search directed by model
execution feedback
- Controller’s dynamic behavior
can be complex
- Meta-heuristic search in (large)
input space: Finding worst case inputs
- Possible because of an automated oracle (feedback loop)
- Different worst cases for
different requirements
56
Initial Solution
1. Exploration (controller-plant model + objective functions based on requirements) → HeatMap diagram → list of critical regions, reviewed by a domain expert
2. Single-state search within the selected regions → worst-case scenarios
(Figures: heatmap over Initial Desired × Final Desired in [0.0, 1.0]; plot of desired vs. actual value over time)
57
Results
- We found much worse scenarios during MiL testing than our
partner had found so far
- These scenarios are also run at the HiL level, where testing is
much more expensive: MiL results => test selection for HiL
- But further research was needed:
- Simulations are expensive
- Configuration parameters
58
Final Solution
1. Exploration with dimensionality reduction (controller model (Simulink) + objective functions) → regression tree → list of critical partitions, reviewed by a domain expert
2. Search with surrogate modeling → worst-case scenarios
- Visualization of the 8-dimension space using regression trees
- Dimensionality reduction to identify the significant variables (Elementary Effect Analysis)
- Surrogate modeling to predict the fitness function and speed up the search (neural network)
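The Elementary Effect (Morris-style) screening step can be sketched as follows. The 8-dimensional model function is made up so that only two dimensions matter; the method averages one-at-a-time finite differences per dimension:

```python
import random

def model(x):
    """Stand-in for the controller fitness over an 8-dimensional input; only
    dimensions 0 and 5 matter (a made-up function for the sketch)."""
    return 3 * x[0] + 0.01 * x[3] + 2 * x[0] * x[5]

def elementary_effects(f, dims=8, delta=0.1, trajectories=20):
    """Morris-style screening: average |f(x + delta*e_i) - f(x)| / delta per
    dimension; a large mean effect marks a significant variable."""
    mu = [0.0] * dims
    for _ in range(trajectories):
        x = [random.random() for _ in range(dims)]
        base = f(x)
        for i in range(dims):
            y = x[:]
            y[i] += delta
            mu[i] += abs(f(y) - base) / delta
    return [m / trajectories for m in mu]

random.seed(0)
effects = elementary_effects(model)
print([i for i, e in enumerate(effects) if e > 0.5])   # the significant dimensions
```

Screening out insignificant dimensions before the search is what makes an 8-dimensional space tractable for both the regression tree and the surrogate.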
59
Regression Tree
(Figure: regression tree over all 1000 points (mean 0.007822, std dev 0.0049497). First split on FD ≥ 0.43306 (574 points, mean 0.0059513) vs. FD < 0.43306 (426 points, mean 0.0103425); further splits on ID ≥/< 0.64679 and on Cal5 thresholds 0.020847 and 0.014827; leaf means range from 0.0047594 to 0.0134555.)
60
Surrogate Modeling
Any supervised learning or statistical technique providing fitness predictions with confidence intervals
- 1. Predict higher fitness with high
confidence: Move to new position, no simulation
- 2. Predict lower fitness with high
confidence: Do not move to new position, no simulation
- 3. Low confidence in prediction:
Simulation
(Figure: surrogate model approximating the real fitness function over input x, with confidence intervals)
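The three-case decision rule above can be sketched with a toy surrogate. Here the "confidence interval" is replaced by a crude distance-to-nearest-sample proxy, and the expensive simulation by a cheap function — both illustrative assumptions, not the deck's neural-network surrogate:

```python
import math

def real_fitness(x):
    """Stands in for an expensive simulation (cheap here, by assumption)."""
    return math.sin(3 * x) + x * x / 10

class Surrogate:
    """Toy nearest-neighbour surrogate with a distance-based confidence proxy
    (the deck uses a supervised model with real confidence intervals)."""
    def __init__(self):
        self.samples = []                    # (x, fitness) from real simulations
    def add(self, x):
        self.samples.append((x, real_fitness(x)))
    def predict(self, x):
        nearest = min(self.samples, key=lambda s: abs(s[0] - x))
        return nearest[1], abs(nearest[0] - x)   # prediction, uncertainty

def maybe_move(surrogate, current, candidate, tol=0.05):
    """Returns (new position, whether a real simulation was spent)."""
    pred, uncertainty = surrogate.predict(candidate)
    cur_fit = real_fitness(current)          # assume current is already simulated
    if uncertainty < tol:                    # cases 1 and 2: trust the prediction
        return (candidate if pred < cur_fit else current), False
    surrogate.add(candidate)                 # case 3: low confidence — simulate
    return (candidate if real_fitness(candidate) < cur_fit else current), True

s = Surrogate()
for x in [0.0, 1.0, 2.0, 3.0]:
    s.add(x)
print(maybe_move(s, current=2.0, candidate=2.001)[1])  # False: no simulation needed
print(maybe_move(s, current=2.0, candidate=5.0)[1])    # True: prediction too uncertain
```

Simulations are only paid for where the model is unsure, which is the whole point: most search moves are resolved by prediction alone.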
61
Results
- Our approach identified more critical violations of the controller requirements than had been found with default/fixed configurations or by manual testing
(Table: worst cases found per requirement)
                                  Smoothness            Responsiveness
MiL testing, different configs:   24% over/undershoot   170 ms response time  (stability: 2.2% deviation)
MiL testing, fixed configs:       20% over/undershoot    80 ms response time
Manual MiL testing:                5% over/undershoot    50 ms response time
62
Schedulability Analysis and Testing
63
Problem and Context
- Schedulability analysis encompasses techniques that try to
predict whether (critical) tasks are schedulable, i.e., meet their deadlines
- Stress testing runs carefully selected test cases that have
a high probability of leading to deadline misses
- Stress testing is complementary to schedulability analysis
- Testing is typically expensive, e.g., hardware in the loop
- Finding stress test cases is difficult
64
Finding Stress Test Cases is Hard
65
(Figure: two schedules over time units 1–9. Jobs j0, j1, j2 arrive at at0, at1, at2 and must finish before deadlines dl0, dl1, dl2; j1 can miss its deadline dl1 depending on when at2 occurs.)
Challenges and Solutions
- Ranges for arrival times form a very large input space
- Task interdependencies and properties constrain what
parts of the space are feasible
- Solution: we re-expressed the problem as a constraint optimization problem and used a combination of constraint programming (IBM CPLEX) and meta-heuristic search (GA)
66
Constraint Optimization
67
Constraint Optimization Problem:
- Static properties of tasks → constants
- Dynamic properties of tasks → variables
- Performance requirement → objective function
- OS scheduler behaviour → constraints
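The mapping can be made concrete with a deliberately tiny instance. Task durations and deadlines are invented constants, the scheduler is a toy non-preemptive FIFO policy, and exhaustive enumeration over a small discrete arrival space stands in for CPLEX:

```python
from itertools import product

# Static task properties (constants): (duration, relative deadline) per job —
# values invented for the sketch.
JOBS = [(2, 3),   # j0
        (3, 5),   # j1
        (2, 4)]   # j2

def max_lateness(arrivals):
    """Toy single-core, non-preemptive FIFO scheduler (the constraints): jobs
    run in arrival order; returns the worst completion-minus-deadline value."""
    order = sorted(range(len(JOBS)), key=lambda i: arrivals[i])
    t, worst = 0, float("-inf")
    for i in order:
        t = max(t, arrivals[i]) + JOBS[i][0]    # start when core free & job arrived
        worst = max(worst, t - (arrivals[i] + JOBS[i][1]))
    return worst

# Dynamic properties (variables): arrival times, each in a small discrete range.
# Objective: maximise lateness — exhaustive search stands in for CPLEX here.
best = max(product(range(5), repeat=3), key=max_lateness)
print(best, max_lateness(best))   # (0, 0, 0) 3: all jobs arrive together, j2 misses dl2 by 3
```

A stress test case is any arrival vector with positive maximum lateness; real instances have far too many variables for enumeration, which is where CP and GA come in.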
Combining CP and GA
68
- (Fig. 3) Overview of GA+CP: solutions in the initial GA population evolve toward high-risk regions, where CP then searches for provably worst-case schedules
Case Study
69
- System monitoring gas leaks and fire in oil extraction platforms
- Drivers (software-hardware interface), control modules, alarm devices (hardware)
- Multicore architecture, real-time operating system
Summary
- We provided a solution for generating stress test cases by combining
meta-heuristic search and constraint programming
- Meta-heuristic search (GA) identifies high risk regions in the
input space
- Constraint programming (CP) finds provably worst-case
schedules within these (limited) regions
- Achieve (nearly) GA efficiency and CP effectiveness
- Our approach can be used both for stress testing and
schedulability analysis (assumption free)
70
Reflecting
71
Search-Based Solutions
- Versatile
- Helps relax assumptions compared to exact approaches
- Helps decrease modeling requirements
- Scalability, e.g., easy to parallelize
- Requires massive empirical studies
- Search is rarely sufficient by itself
72
Multidisciplinary Approach
- Single-technology approaches rarely work in practice
- Combined search with:
- Machine learning
- Solvers, e.g., CP, SMT
- Statistical approaches, e.g., sensitivity analysis
- System and environment modeling and simulation
73
Objectives
- Reduce search space
- Better guide and focus search
- Compute fitness and provide guidance
- Avoid expensive and useless fitness computations
- Explain failures (e.g., decision trees)
- Get more guarantees (e.g., constraint programming)
74
Acknowledgements
- Shiva Nejati
- Reza Matinnejad
- Raja Ben Abdessalem
- Stefano Di Alesio
- Dennis Appelt
- Annibale Panichella
75
Selected References
- L. Briand et al. “Testing the untestable: Model testing of complex software-intensive systems”,
IEEE/ACM ICSE 2016, V2025
- R. Matinnejad et al., “MiL Testing of Highly Configurable Continuous Controllers: Scalable Search
Using Surrogate Models”, IEEE/ACM ASE 2014 (Distinguished paper award)
- S. Di Alesio et al. “Combining genetic algorithms and constraint programming to support stress
testing of task deadlines”, ACM Transactions on Software Engineering and Methodology (TOSEM), 25(1):4, 2015
- R. Ben Abdessalem et al., "Testing Vision-Based Control Systems Using Learnable Evolutionary
Algorithms”, IEEE/ACM ICSE 2018
- D. Appelt et al., “A Machine Learning-Driven Evolutionary Approach for Testing Web Application
Firewalls”, To appear in IEEE Transactions on Reliability
- More on: https://wwwen.uni.lu/snt/people/lionel_briand?page=Publications
76