Automatic Detection of Performance Deviations in Load Testing of Large Scale Systems
Haroon Malik
Software Analysis and Intelligence Lab (SAIL), Queen's University, Kingston, Canada

Large scale systems need to satisfy performance constraints.
PERFORMANCE PROBLEMS
- System not responding fast enough
- Consuming too much of an important resource
- Hanging and/or crashing under heavy load

Symptoms include:
- High response time
- Increased latency
- Low throughput under load
LOAD TESTING
Performance analysts use load testing to detect performance problems early, before they become critical field problems.
LOAD TESTING STEPS
1. Environment Setup
2. Load Test Execution
3. Load Test Analysis
4. Report Generation
2. LOAD TEST EXECUTION
[Diagram: load generators 1 and 2 drive the system under test; a monitoring tool records performance counters into a performance repository]
CHALLENGES WITH LOAD TEST ANALYSIS
1. Limited Knowledge
2. Large Number of Counters
I propose 4 methodologies, 3 unsupervised and 1 supervised, to automatically analyze load test results.
All of them use performance counters to construct a performance signature.
PERFORMANCE COUNTERS ARE HIGHLY CORRELATED
Example counters: %CPU Idle, %CPU Busy, Byte Commits, Disk Writes/sec, % Cache Faults/sec, Bytes Received
[Diagram: counters grouped by subsystem: CPU, Disk (IOPS), Network, Memory, Transactions/sec]
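To make the redundancy concrete, here is a minimal sketch (not from the talk; the counter values are made up) that measures pairwise correlation among a few counters with pandas:

```python
# Minimal sketch: pairwise correlation among performance counters.
# The values below are illustrative, not real load test data.
import pandas as pd

counters = pd.DataFrame({
    "%CPU Idle":       [90, 70, 50, 30, 10],
    "%CPU Busy":       [10, 30, 50, 70, 90],
    "Disk writes/sec": [12, 25, 41, 60, 83],
    "Bytes received":  [1e4, 3e4, 5e4, 7e4, 9e4],
})

# Pearson correlations near +/-1 flag redundant counters,
# which motivates reducing them to a compact signature.
print(counters.corr())
```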
HIGH LEVEL OVERVIEW OF OUR METHODOLOGIES
[Pipeline: Input Load Test (baseline test, new test) -> Data Preparation (sanitization, standardization) -> Signature Generation -> Deviation Detection -> Performance Report]
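A minimal sketch of the data preparation step, assuming rows are timestamped observations and columns are counters; the exact sanitization rules are not spelled out in the talk:

```python
# Minimal sketch of "Data Preparation": sanitize, then standardize.
# Assumes rows = observations, columns = performance counters.
import pandas as pd

def prepare(load_test: pd.DataFrame) -> pd.DataFrame:
    clean = load_test.dropna()  # sanitization: drop incomplete observations
    # standardization: z-score each counter so different units are comparable
    return (clean - clean.mean()) / clean.std()
```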
UNSUPERVISED SIGNATURE GENERATION
- Clustering Methodology: Load Test -> Data Reduction -> Clustering -> Extracting Centroids -> Signature
- Random Sampling Methodology: Load Test -> Random Sampling -> Signature
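A minimal sketch of these two signatures; using scikit-learn's k-means is my assumption, since the talk does not fix a clustering algorithm:

```python
# Minimal sketch of the two simplest unsupervised signatures.
# k-means is an assumption; the talk does not name the clusterer.
import numpy as np
from sklearn.cluster import KMeans

def clustering_signature(obs: np.ndarray, k: int = 3) -> np.ndarray:
    """Cluster the observations and keep the centroids as the signature."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(obs)
    return km.cluster_centers_

def random_sampling_signature(obs: np.ndarray, n: int = 50) -> np.ndarray:
    """Keep a uniform random sample of observations as the signature."""
    rng = np.random.default_rng(0)
    idx = rng.choice(len(obs), size=min(n, len(obs)), replace=False)
    return obs[idx]
```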
- PCA Methodology: Load Test -> Dimension Reduction (PCA) -> Identifying Top k Performance Counters (mapping, ranking; the analyst tunes a weight parameter) -> Signature
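A minimal sketch of the PCA step; weighting each counter's loadings by explained variance is my illustrative reading of the mapping/ranking step, not the talk's exact formula:

```python
# Minimal sketch: rank counters by their PCA loadings.
# The variance-weighted ranking is an illustrative choice.
import numpy as np
from sklearn.decomposition import PCA

def pca_signature(obs: np.ndarray, counter_names: list, k: int = 5) -> list:
    pca = PCA().fit(obs)
    # Map: weight each counter's loading by the variance its component explains
    importance = np.abs(pca.components_).T @ pca.explained_variance_ratio_
    top = np.argsort(importance)[::-1][:k]  # Rank: indices of the top-k counters
    return [counter_names[i] for i in top]
```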
SUPERVISED SIGNATURE GENERATION
- WRAPPER Methodology: Prepared Load Test -> Labeling (only for baseline) -> Partitioning the Data (SPC1, SPC2, ..., SPC10) -> Attribute Selection (OneR, Genetic Search) -> Identifying Top k Performance Counters (i. count, ii. % frequency) -> Signature
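A minimal sketch of the WRAPPER idea with scikit-learn stand-ins; univariate F-score selection replaces the slide's OneR and Genetic Search purely for illustration, and the 10-way partitioning follows the SPC1..SPC10 step:

```python
# Minimal sketch of WRAPPER-style signature generation.
# SelectKBest/f_classif stand in for the slide's OneR + Genetic Search.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.feature_selection import SelectKBest, f_classif

def wrapper_signature(obs, labels, counter_names, k=5, partitions=10):
    counts = np.zeros(len(counter_names))
    for part_idx, _ in KFold(n_splits=partitions).split(obs):  # SPC1..SPC10
        sel = SelectKBest(f_classif, k=k).fit(obs[part_idx], labels[part_idx])
        counts += sel.get_support()         # count: times each counter is chosen
    freq = counts / partitions              # % frequency across partitions
    top = np.argsort(freq)[::-1][:k]
    return [counter_names[i] for i in top]
```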
DEVIATION DETECTION TECHNIQUES
- Using Control Charts: for the Clustering and Random Sampling methodologies
- Using Methodology-Specific Techniques: for the PCA and WRAPPER methodologies
CONTROL CHART
[Chart: performance counter value over time for the baseline load test and the new load test, with the baseline center line (CL) and the baseline LCL/UCL]
The Upper/Lower Control Limits (UCL/LCL) are the upper and lower limits of a counter's range under the normal behavior of the system.
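A minimal sketch of control-chart detection; deriving the limits as 3-sigma bands around the baseline mean is a common convention and my assumption here, since the talk defines the limits only conceptually:

```python
# Minimal sketch: flag new-test observations outside the baseline's limits.
# The 3-sigma limits are an assumption, not the talk's stated formula.
import numpy as np

def control_chart_violations(baseline: np.ndarray, new_test: np.ndarray) -> float:
    cl = baseline.mean()           # baseline center line
    lcl = cl - 3 * baseline.std()  # lower control limit
    ucl = cl + 3 * baseline.std()  # upper control limit
    outside = (new_test < lcl) | (new_test > ucl)
    return outside.mean()          # violation ratio for this counter
```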
DEVIATION DETECTION
- Clustering and Random Sampling: compare the baseline signature and the new test signature with a control chart -> Performance Report
- PCA Approach: compare the PCA counter weights of the baseline signature and the new test signature -> Performance Report
- WRAPPER Approach: apply logistic regression to the baseline signature and the new test signature -> Performance Report
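A minimal sketch of the two methodology-specific checks; the absolute-difference comparison of PCA weights and the per-observation scoring are my illustrative choices, not the talk's exact procedure:

```python
# Minimal sketch of the PCA and WRAPPER deviation checks.
import numpy as np
from sklearn.linear_model import LogisticRegression

def pca_weight_deviation(baseline_w: np.ndarray, new_w: np.ndarray) -> np.ndarray:
    # Counters whose PCA weights shift the most are the likely deviations
    return np.abs(baseline_w - new_w)

def wrapper_deviation(baseline_obs, baseline_labels, new_obs) -> np.ndarray:
    # Train on the labeled baseline, then score each new-test observation
    model = LogisticRegression(max_iter=1000).fit(baseline_obs, baseline_labels)
    return model.predict(new_obs)  # Pass/Fail prediction per observation
```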
CASE STUDY
RQ: How effective are our signature-based approaches in detecting performance deviations in load tests?
Evaluation using precision, recall, and F-measure. An ideal approach should predict a minimal and correct set of performance deviations.
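A minimal sketch of these metrics over a set of injected faults; the variable names are placeholders:

```python
# Minimal sketch: precision, recall, and F-measure for reported deviations.
def precision_recall_f(true_deviations: set, reported: set):
    tp = len(true_deviations & reported)  # correctly reported deviations
    precision = tp / len(reported) if reported else 0.0
    recall = tp / len(true_deviations) if true_deviations else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure
```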
SUBJECT OF STUDY
System: Open Source (DVD Store); Domain: E-commerce
Type of data:
1. Data from our experiments with an open source benchmark application (DVD Store)
System: Industrial System; Domain: Telecom
Type of data:
1. Load test repository
2. Data from our experiments on the company's testing platform
FAULT INJECTION
- Software Failure: CPU Stress, Memory Stress
- Abnormal Workload: Operator Errors, Interfering Workload, Unscheduled Replication
CASE STUDY FINDINGS
1. Effectiveness: Precision/Recall/F-measure
2. Practical Differences
CASE STUDY FINDINGS (Effectiveness)
[Chart: precision, recall, and F-measure of the WRAPPER, PCA, Clustering, and Random Sampling approaches]
Random Sampling has the lowest effectiveness. On average, and in all experiments, PCA performs better than the Clustering approach. WRAPPER dominates the best unsupervised approach, i.e., PCA.
Overall, both the WRAPPER and PCA approaches achieve an excellent balance of high precision and recall for deviation detection (on average 0.95 and 0.94 for WRAPPER, and 0.82 and 0.84 for PCA).
CASE STUDY FINDINGS (Practical Differences)
1. Real Time Analysis
2. Stability
3. Manual Overhead
REAL TIME ANALYSIS
WRAPPER detects deviations on a per-observation basis. PCA requires a certain number of observations before it can report (wait time).
STABILITY
We refer to ‘Stability’ as the ability of an approach to remain effective while its signature size is reduced.
[Charts: F-measure vs. signature size for the unsupervised (PCA) and supervised (WRAPPER) approaches on both subject systems]
The WRAPPER methodology is more stable than the PCA approach.
MANUAL OVERHEAD
The WRAPPER approach requires all observations of the baseline performance counter data to be labeled as Pass/Fail. Marking each observation is time consuming.