A Novel Automated Approach for Software Effort Estimation Based on Data Augmentation
Liyan Song (1,2), Leandro L. Minku (1), Xin Yao (1,2)
(1) University of Birmingham, UK; (2) Southern University of Science and Technology, China
Software Effort Estimation
A Novel SEE Approach Based on Data Augmentation Leandro Minku http://www.cs.bham.ac.uk/~minkull
(e.g., in person-hours).
can be problematic.
Previous projects are used as training data for a Machine Learning Algorithm:

Project id   x1 = size   x2 = reliability   x3 = language   …   y = effort
1            1000        medium             Java            …   850
2            1000        low                Matlab          …   500
3            900         large              C#              …   1000
…            …           …                  …               …   …

New project x → Machine Learning Algorithm → required effort y = ?
algorithms to try to tackle this issue.
We generate additional synthetic projects based on existing ones.
Synthetic projects can enrich the representativeness of the area where they are generated, potentially leading to better SEE models.
probability.
With probability τ, uniformly sample a new value from: {v_1, v_2, …, v_k} \ {x_ic}
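A minimal sketch of this step, assuming the feature takes one of a finite set of values (for example, a programming language). The function name `mutate_categorical` and its parameters are hypothetical and are not taken from the slides.

```python
import random


def mutate_categorical(value, allowed_values, tau, rng=random):
    """With probability tau, replace `value` by a different allowed value.

    The replacement is drawn uniformly from the allowed values excluding
    the current one; otherwise the original value is kept.
    """
    alternatives = [v for v in allowed_values if v != value]
    if alternatives and rng.random() < tau:
        return rng.choice(alternatives)
    return value


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed only to make the demo repeatable
    print(mutate_categorical("Java", ["Java", "C#", "Matlab"], tau=0.3, rng=rng))
```

Excluding the current value means that a mutation, when it happens, always changes the feature.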
Sample a new value from B(n = 2·x_ic, p = 1/2)
[Figure: probability of each sampled value; x-axis from x_ic−3 to x_ic+3, y-axis: Probability]
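A minimal sketch of the binomial sampling, assuming the current feature value is a non-negative integer. With n equal to twice the value and p = 1/2, the mean of the distribution (n·p) equals the current value, so samples concentrate around it. The function name `sample_binomial_around` is hypothetical.

```python
import random


def sample_binomial_around(value, rng=random):
    """Draw from B(n = 2 * value, p = 1/2); the mean of the draw is `value`."""
    n = 2 * value
    # A binomial draw is the number of successes in n fair coin flips.
    return sum(1 for _ in range(n) if rng.random() < 0.5)


if __name__ == "__main__":
    rng = random.Random(7)  # fixed seed only to make the demo repeatable
    print([sample_binomial_around(3, rng) for _ in range(10)])
```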
Sample a new value from x_ic + N(0, σ²), where σ is a pre-defined parameter that should assume small values.
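A minimal sketch of the Gaussian displacement, assuming the feature is numeric. The function name `displace_numeric` is hypothetical; it also returns the noise term so that a caller can reuse it later, which is a design choice of this sketch rather than something stated on the slide.

```python
import random


def displace_numeric(value, sigma, rng=random):
    """Return (value + noise, noise), with noise drawn from N(0, sigma^2)."""
    noise = rng.gauss(0.0, sigma)
    return value + noise, noise


if __name__ == "__main__":
    rng = random.Random(3)  # fixed seed only to make the demo repeatable
    new_size, noise = displace_numeric(1000.0, sigma=0.1, rng=rng)
    print(new_size, noise)
```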
Sample a new value from y + sign(e) · |N(0, σ²)|, where e is the sum of all Normal values used to displace the numeric size-related features.
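A minimal sketch of the effort displacement, assuming the Normal values used for the size-related features are available as a list. The function name `displace_effort` is hypothetical, and treating sign(0) as 0 (leaving the effort unchanged) is an assumption of this sketch.

```python
import random


def displace_effort(effort, size_noises, sigma, rng=random):
    """Shift `effort` in the direction of the summed size displacements.

    e is the sum of the Normal values applied to the size-related features;
    the effort moves by |N(0, sigma^2)| with the same sign as e.
    """
    e = sum(size_noises)
    sign = (e > 0) - (e < 0)  # assumption: sign(0) = 0
    return effort + sign * abs(rng.gauss(0.0, sigma))


if __name__ == "__main__":
    rng = random.Random(5)  # fixed seed only to make the demo repeatable
    print(displace_effort(850.0, [0.3, -0.1], sigma=0.5, rng=rng))
```

Tying the sign of the effort change to e keeps the synthetic project consistent: if the size features grew overall, the effort does not shrink.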
performance.
help improve the baseline predictive performance.
scale.
Data sets:
ISBSG (International Software Benchmarking Standards Group)
SEACRAFT (Software Engineering Artifacts Can Really Assist Future Tasks)
[Table: Training set size and #Data/#Fea per data set]
help improve prediction performance over its baseline? When? Could it be detrimental?
approach significantly improved MAElog, according to Wilcoxon Rank Sum tests with Holm-Bonferroni corrections across data sets.
data sets.
training set size.
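The slides do not define MAElog. A minimal sketch, assuming it denotes the mean absolute error between log-transformed actual and predicted efforts (lower is better); the function name `mae_log` and this definition are assumptions, not taken from the slides.

```python
import math


def mae_log(actual, predicted):
    """Mean absolute error computed on the natural log of the efforts.

    Assumes all efforts are strictly positive and both lists are non-empty
    and of equal length.
    """
    errors = [abs(math.log(a) - math.log(p)) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    print(mae_log([850.0, 500.0, 1000.0], [800.0, 650.0, 900.0]))
```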
Improvements were frequently large when training sets were small or medium, especially for the small training sets.
[Figure: MAElog for Small Training Set Size]
Improvements were frequently medium or large when training sets were small or medium.
[Figure: MAElog for Small Training Set Size]
Improvements had small or insignificant effect size for all training set sizes, but there was no significant detrimental effect.
[Figure: MAElog for Small Training Set Size]
improvement varies depending on the baseline model?
magnitude of improvement varies depending on the baseline model?
data and large noise.
entire space.
neighbours, reducing the effect of synthetic data.
against the existing data augmentation approach from the SEE literature?
The proposed approach always performs similarly or better across data sets, with larger effect sizes for small or medium training sets when using LR, ATLM, RVM or RT.
[Figure: MAElog for Small Training Set Size; panels labelled SMOTE]
training sets when using LR/ATLM and RT/RVM.
better robustness to large noise. Their effect depends on intrinsic aspects of the base learner such as globality and locality.
existing data augmentation approach for SEE. The effect size is larger especially for small/medium training sets when using LR/ATLM and RT/RVM.
The proposed approach can help to improve predictive performance when there is a lack of training data.