Using ML to Design a Flexible LOC Counter

MaLTeSQuE 2017, Feb 21st, 2017, Klagenfurt. Using ML to Design a Flexible LOC Counter. Mirosław Ochodek, Miroslaw Staron, Dominik Bargowski, Wilhelm Meding, Regina Hebig. Workshop on Machine Learning Techniques for Software Quality Evaluation.


SLIDE 1

Using ML to Design a Flexible LOC Counter

Mirosław Ochodek, Miroslaw Staron, Dominik Bargowski, Wilhelm Meding, Regina Hebig

MaLTeSQuE 2017, Feb 21st, 2017, Klagenfurt
Workshop on Machine Learning Techniques for Software Quality Evaluation

SLIDE 2

Software size

Size is used for:

  • Defect density = #Defects / Size
  • Cost prediction
  • Productivity
  • Metrics normalization

SLIDE 3

The Problem

Introduces (unknown) measurement error, problems with reliability of the measurement, difficulties in measuring multi-language code bases…

Improving Measurement Certainty by Using Calibration to Find Systematic Measurement Error – A Case of Lines of Code Measure

Miroslaw Staron¹, Darko Durisic², and Rakesh Rana¹
¹ Computer Science and Engineering, University of Gothenburg, Sweden, miroslaw.staron/rakesh.rana@gu.se
² Volvo Car Group, Sweden, darko.durisic@volvocars.com

Abstract. Base measures such as the number of lines of code are often used to make predictions about such phenomena as project effort, product quality or maintenance effort. However, quite often we rely on the measurement instruments where the exact algorithm for calculating the value of the measure is not known. The objective of our research is to explore how we can increase the certainty of base measures in software engineering. We conduct a benchmarking study where we use four measurement instruments for lines of code measurement with unknown certainty to measure five code bases. Our results show that we can adjust the measurement values by as much as 2% knowing the systematic error of the tool. We conclude that calibrating the measurement instruments can significantly contribute to increased accuracy in measurement processes in software engineering. This will impact the accuracy of predictions (e.g. of effort in software projects) and therefore increase the cost-efficiency of software engineering processes.

1 Introduction

With the introduction of the measurement information model in the international ISO/IEC 15939 standard for measurement processes the discipline of software engineering evolved from discussing metrics in general to categorizing them into three categories – base measures, derived measures and indicators. The use of base measures is fundamental for the construction of derived measures and indicators. The base measures are also the types of measures which are collected directly and are a result of a measurement method. In many cases this measurement method is an automated algorithm (e.g. a script) which we can refer to as the measurement instrument which quantifies an attribute of interest into a number. Since in software engineering we do not have reference measurement etalons as we do in other disciplines (e.g. kilogram or meter for physics), we often rely on arbitrary definitions of the base quantities. One of such quantities is the size of programs measured as the number of lines of code. Even though the number of lines of code of a given program is a deterministic and fully quantifiable

Output: 2512 LOC

Four tools: error (vs. median) up to ~20%
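A minimal sketch of how an "error vs. median" figure could be computed for several counters run on the same code base. Only the value 2512 comes from the slide; the tool names and the other three counts are hypothetical, and the deviation measure (signed relative difference from the median) is an assumption about what the slide means.

```python
from statistics import median


def relative_error_vs_median(counts):
    """Return each tool's signed relative deviation from the median count."""
    mid = median(counts.values())
    return {tool: (value - mid) / mid for tool, value in counts.items()}


# Hypothetical LOC counts reported by four tools for the same code base.
# Only 2512 appears on the slide; the remaining values are made up.
tool_counts = {"tool_a": 2512, "tool_b": 2400, "tool_c": 2600, "tool_d": 3000}

errors = relative_error_vs_median(tool_counts)
worst_tool = max(errors, key=lambda tool: abs(errors[tool]))

for tool, error in errors.items():
    print(f"{tool}: {error:+.1%}")
print("largest deviation:", worst_tool)
```

With an even number of tools the median falls between the two middle counts, so no tool necessarily has zero error.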

SLIDE 4

Potential solutions

A tool based on Programming Language (PL) parsers

  • Explicitly known rules for counting that can be somehow formulated
  • 100% accurate according to the rules
  • Requires implementation for each PL
  • Can also be implemented to allow for some configuration of rules (however, probably somewhat limited)

A machine learning (ML) approach

  • It is difficult to explicitly define the rules (either not known or too complex)
  • Learns from examples (requires a training set)
  • Classification error depends on the quality of the training set
  • Doesn't require a new implementation for a new language (however, may require a new training set)

SLIDE 6

Idea of the solution

  • Flexible lines of code counter (CCFlex)

– A user teaches the tool which lines should be counted based on a sample (a training set)

[Figure labels: 10 LOC; Justification]

SLIDE 7

Idea of the solution

SLIDE 8

Feature acquisition

Each line is characterized by a set of features and its decision class (count or ignore). We parse the text to extract those features.

File type   #Characters   If     …   Decision class
java        25            TRUE   …   Count
…           …             …      …   …
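To make the "features plus decision class" representation concrete, here is a small sketch that learns a count/ignore rule from labelled lines. It uses a one-feature decision stump, which is a deliberately simpler stand-in for the rule and tree learners named later in the deck (PART, JRip, J48). The three features, the sample lines and all function names are illustrative assumptions, not part of CCFlex.

```python
def featurize(line):
    """Map a raw line to a small numeric feature dict (illustrative subset)."""
    stripped = line.strip()
    return {
        "length": len(stripped),
        "semicolons": line.count(";"),
        "has_comment": int("//" in line or "/*" in line or "*/" in line
                           or stripped.startswith("*")),
    }


def train_stump(rows, labels):
    """Pick the (feature, threshold, polarity) with the fewest training errors."""
    best = None
    for feature in rows[0]:
        for threshold in sorted({row[feature] for row in rows}):
            for count_if_above in (True, False):
                predicted = [(row[feature] >= threshold) == count_if_above
                             for row in rows]
                errors = sum(p != y for p, y in zip(predicted, labels))
                if best is None or errors < best[0]:
                    best = (errors, feature, threshold, count_if_above)
    _, feature, threshold, count_if_above = best
    return lambda row: (row[feature] >= threshold) == count_if_above


# Hypothetical training sample: (line, should it be counted?)
sample = [
    ("int total = 0;", True),
    ("// running total", False),
    ("", False),
    ("total += price;", True),
    (" * Returns the sum.", False),
    ("return total;", True),
]
rows = [featurize(text) for text, _ in sample]
labels = [label for _, label in sample]
classify = train_stump(rows, labels)

new_lines = ["x = y;", "/* note */"]
decisions = [classify(featurize(line)) for line in new_lines]
print(decisions)
```

On this toy sample the stump ends up keying on the semicolon count; a realistic training set would need a richer model.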

SLIDE 9

Feature acquisition

  • Plain text (F01–F04):

– File extension
– Full and trimmed length (characters)
– Tokens

  • Programming language (F05–F19):

– Assignment
– Brackets
– Class
– Comment
– Semicolons
– …

ID   Name             Type     Description
F01  File extension   Nominal  The extension of the file (e.g., java, cpp).
F02  Full length      Numeric  The number of characters in the line.
F03  Length           Numeric  The number of characters in the line after removing all leading and trailing whitespace characters.
F04  Tokens           Numeric  The number of tokens in the line (the line is split on whitespace characters).
F05  Semicolons       Numeric  The number of semicolons in the line.
F06  Comments         Boolean  The line includes any of //, /*, */ or, after trimming, starts with *.
F07  Assignments      Numeric  The number of single assignment signs in the line (=).
F08  Brackets         Numeric  The number of brackets: (, ) in the line.
F09  Square brackets  Numeric  The number of square brackets: [, ] in the line.
F10  Curly brackets   Numeric  The number of curly brackets: {, } in the line.
F11  Class            Boolean  The word "class" appears in the line.
F12  For              Boolean  The word "for" appears in the line.
F13  If               Boolean  The word "if" appears in the line.
F14  While            Boolean  The word "while" appears in the line.
F15  Case             Boolean  The word "case" appears in the line.
F16  Try              Boolean  The word "try" appears in the line.
F17  Catch            Boolean  The word "catch" appears in the line.
F18  Expect           Boolean  The word "expect" appears in the line.
F19  Member access    Numeric  Counts member accessors: . or ->
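The predefined features F01–F19 could be extracted roughly as follows. This is a sketch under assumptions: the feature key names are made up, keyword features use whole-word matching, and a "single assignment sign" is taken to mean an `=` that is not part of `==`, `!=`, `<=` or `>=`. The slides do not specify these details.

```python
import re

KEYWORDS = ("class", "for", "if", "while", "case", "try", "catch", "expect")
# Assumption: '=' counts only when it is not part of ==, !=, <= or >=.
SINGLE_ASSIGN = re.compile(r"(?<![=!<>])=(?!=)")


def extract_features(line, filename):
    """Return the predefined features F01-F19 for one line of a file."""
    stripped = line.strip()
    features = {
        "F01_extension": filename.rsplit(".", 1)[-1] if "." in filename else "",
        "F02_full_length": len(line),
        "F03_length": len(stripped),
        "F04_tokens": len(line.split()),
        "F05_semicolons": line.count(";"),
        "F06_comments": any(m in line for m in ("//", "/*", "*/"))
                        or stripped.startswith("*"),
        "F07_assignments": len(SINGLE_ASSIGN.findall(line)),
        "F08_brackets": line.count("(") + line.count(")"),
        "F09_square_brackets": line.count("[") + line.count("]"),
        "F10_curly_brackets": line.count("{") + line.count("}"),
    }
    # F11-F18: keyword presence (whole-word match is an assumption).
    for number, word in enumerate(KEYWORDS, start=11):
        features[f"F{number}_{word}"] = re.search(rf"\b{word}\b", line) is not None
    features["F19_member_access"] = line.count(".") + line.count("->")
    return features


features = extract_features("    if (items[i].size() == 0) {", "Cart.java")
for name, value in features.items():
    print(name, value)
```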
SLIDE 10

Feature acquisition

  • Bag of words approach (automatic)

– Tokenize on: ()[]{}!@#$%^&*-=;:'"\|`~,.<>/?
– Treat each split character as a token
– Calculate thresholds:

  • Frequency of the token in the code base (min. 5)
  • % of files the token is present in (min. 25%)

– If the thresholds are met:

  • Fi: the number of times token i occurs in a line
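The bag-of-words steps above can be sketched as below. The split characters and the two thresholds (min. 5 occurrences, min. 25% of files) follow the slide; the function names, the sample files and the choice to also split on whitespace are assumptions made for illustration.

```python
import re
from collections import Counter

SPLIT_CHARS = "()[]{}!@#$%^&*-=;:'\"\\|`~,.<>/?"
_ESCAPED = re.escape(SPLIT_CHARS)
# A token is either one split character or a run of other non-space characters.
TOKEN_PATTERN = re.compile("[" + _ESCAPED + "]|[^\\s" + _ESCAPED + "]+")


def tokenize(line):
    """Split a line into tokens, keeping each split character as a token."""
    return TOKEN_PATTERN.findall(line)


def select_vocabulary(files, min_count=5, min_file_share=0.25):
    """Keep tokens that are frequent enough and spread over enough files."""
    total = Counter()
    files_with_token = Counter()
    for lines in files.values():
        tokens_in_file = Counter(t for line in lines for t in tokenize(line))
        total.update(tokens_in_file)
        files_with_token.update(tokens_in_file.keys())
    return sorted(
        t for t in total
        if total[t] >= min_count
        and files_with_token[t] / len(files) >= min_file_share
    )


def line_vector(line, vocabulary):
    """Feature Fi = number of times vocabulary token i occurs in the line."""
    counts = Counter(tokenize(line))
    return [counts[t] for t in vocabulary]


# Hypothetical code base: file name -> lines.
files = {
    "A.java": ["int a = 1;", "int b = 2;", "int d = 4;", "return a;"],
    "B.java": ["int c = 3;", "int e = 5;", "// note", "return c;"],
    "C.java": ["x();", "y();"],
    "D.java": ["/* license */"],
}
vocabulary = select_vocabulary(files)
print(vocabulary)
print(line_vector("int total = a;", vocabulary))
```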

SLIDE 11

Preliminary validation

  • RQ1: What level of prediction quality can be achieved by the proposed approach?
  • RQ2: How does automatic feature acquisition affect the classification quality?
  • RQ3: How does the choice of classification algorithm affect the classification quality?

SLIDE 12

Code databases

  • 2402 physical lines of code in total

– Eclipse: 475 LOC
– Jasper Reports: 757 LOC
– Spring MVC: 1170 LOC

  • ELOC (Count 1492 / Ignore 910)
  • Subjective (Count 1237 / Ignore 1165)

SLIDE 13

Validation schemes

10 × 10-fold cross-validation (18 schemes)

  • two datasets

– ELOC
– Subjective

  • three feature sets

– All: F01–F19 plus automatically acquired features
– Auto: F01–F04 plus automatically acquired features
– Predefined: F01–F19

  • three classification algorithms (PART, JRip, J48)
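The 18 schemes are the 2 × 3 × 3 combinations listed above, each evaluated with 10 repetitions of 10-fold cross-validation. The sketch below enumerates them and generates plain (unstratified) repeated k-fold splits over the 2402 lines; the splitting function is an illustration only and may differ from what the authors' tooling does, for example with respect to stratification.

```python
import random
from itertools import product

DATASETS = ("ELOC", "Subjective")
FEATURE_SETS = ("All", "Auto", "Predefined")
CLASSIFIERS = ("PART", "JRip", "J48")

# Every dataset / feature set / classifier combination: 2 * 3 * 3 = 18.
schemes = list(product(DATASETS, FEATURE_SETS, CLASSIFIERS))


def repeated_kfold(n_items, folds=10, repeats=10, seed=0):
    """Yield (repeat, fold, train_indices, test_indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(repeats):
        order = list(range(n_items))
        rng.shuffle(order)
        for fold in range(folds):
            test = order[fold::folds]  # every folds-th item of the shuffled order
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            yield repeat, fold, train, test


splits = list(repeated_kfold(2402))
print(len(schemes), "schemes,", len(splits), "train/test splits per scheme")
```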

SLIDE 14

Prediction quality measures

  • Accuracy
  • Precision
  • Recall
  • F-score
  • Matthews Correlation Coefficient (MCC)
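For reference, the five measures listed above can be computed from a binary confusion matrix using their standard definitions, with "count" as the positive class. The confusion-matrix values below are hypothetical and are not results from the study; the sketch also assumes every denominator is non-zero.

```python
from math import sqrt


def quality_measures(tp, fp, fn, tn):
    """Standard binary classification measures; assumes non-zero denominators."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
        "mcc": mcc,
    }


# Hypothetical confusion matrix for 200 classified lines.
measures = quality_measures(tp=90, fp=5, fn=10, tn=95)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which makes it less sensitive to an uneven Count/Ignore split.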

SLIDE 15

Results

RQ1: What level of prediction quality can be achieved by the proposed approach?

SLIDE 16

Results

Dataset     Feature set  Classifier  Accuracy %   Precision  Recall     F-Measure  MCC
ELOC        All          PART        99.55±0.45   1.00±0.01  1.00±0.00  1.00±0.00  0.99±0.01
ELOC        All          JRip        99.53±0.47   1.00±0.01  1.00±0.00  1.00±0.00  0.99±0.01
ELOC        All          J48         99.60±0.41   1.00±0.01  1.00±0.00  1.00±0.00  0.99±0.01
ELOC        Predefined   PART        99.53±0.46   1.00±0.01  1.00±0.00  1.00±0.00  0.99±0.01
ELOC        Predefined   JRip        99.56±0.46   1.00±0.01  1.00±0.00  1.00±0.00  0.99±0.01
ELOC        Predefined   J48         99.60±0.41   1.00±0.01  1.00±0.00  1.00±0.00  0.99±0.01
ELOC        Auto         PART        99.38±0.47   1.00±0.01  0.99±0.01  0.99±0.01  0.99±0.01
ELOC        Auto         JRip        99.28±0.47   1.00±0.01  0.99±0.01  0.99±0.01  0.98±0.01
ELOC        Auto         J48         99.18±0.54   1.00±0.01  0.99±0.01  0.99±0.01  0.98±0.01
Subjective  All          PART        97.34±1.14   0.98±0.01  0.97±0.02  0.97±0.01  0.95±0.02
Subjective  All          JRip        96.54±1.20   0.98±0.01  0.95±0.02  0.97±0.01  0.93±0.02
Subjective  All          J48         97.18±1.07   0.98±0.01  0.97±0.02  0.97±0.01  0.94±0.02
Subjective  Predefined   PART        95.05±1.45   0.97±0.02  0.93±0.02  0.95±0.01  0.90±0.03
Subjective  Predefined   JRip        95.32±1.44   0.97±0.02  0.93±0.02  0.95±0.02  0.91±0.03
Subjective  Predefined   J48         95.10±1.42   0.97±0.02  0.94±0.02  0.95±0.01  0.90±0.03
Subjective  Auto         PART        97.33±1.08   0.98±0.01  0.97±0.02  0.97±0.01  0.95±0.02
Subjective  Auto         JRip        96.38±1.14   0.98±0.01  0.95±0.02  0.96±0.01  0.93±0.02
Subjective  Auto         J48         97.08±1.09   0.98±0.01  0.96±0.02  0.97±0.01  0.94±0.02

SLIDE 17

Results

(Results table as on Slide 16.)

  • Very high accuracy: 95.05–99.60%
  • Higher accuracy for ELOC

SLIDE 18

Results

(Results table as on Slide 16.)

  • Very high Precision and Recall (0.93–1.00)
  • Slight preference towards Precision
  • Small standard deviations

SLIDE 19

Results

RQ2: How does automatic feature acquisition affect the classification quality?

SLIDE 20

Results

(Results table as on Slide 16.)

  • All features provided the best results for both datasets
  • Predefined slightly better than Auto for ELOC and worse for Subjective

SLIDE 21

Automatic feature acquisition

Selected features per dataset and feature set:

  • ELOC, All: Brackets, Comments, Semicolons, Full length
  • ELOC, Predefined: Brackets, Comments, Full length, Semicolons
  • ELOC, Auto: Freq. of "*", Freq. of "(", Freq. of ";", Freq. of "/", Full length
  • Subjective, All: Assignment, Freq. of "*", Freq. of "available", Freq. of ":", Freq. of "has", Freq. of "implied", Freq. of "license", Freq. of "none", Freq. of "reserved", Freq. of "return", Freq. of "see", Freq. of "software", Full length, Length, Tokens
  • Subjective, Predefined: Assignment, Comments, If, While, Full length, Length, Semicolons, Tokens
  • Subjective, Auto: Freq. of "*", Freq. of "available", Freq. of ":", Freq. of "=", Freq. of "has", Freq. of "implied", Freq. of "license", Freq. of "none", Freq. of "reserved", Freq. of "return", Freq. of "see", Freq. of "software", Full length, Length, Tokens

Feature selection: WEKA WrapperSubsetEval (classifier: J48) with the BestFirst search method (selection based on Accuracy and RMSE, five folds, threshold = 0.01).

SLIDE 22

Results

RQ3: How does the choice of classification algorithm affect the classification quality?

SLIDE 23

Results

(Results table as on Slide 16.)

  • Nearly no differences between the selected algorithms
  • PART >? J48 >? JRip

SLIDE 24

Limitations & hard cases

  • Block comments
  • Multiple meaningful lines of code in one line
  • A single meaningful line in many lines
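A small illustration of why the first two cases are hard for a classifier that looks at one line at a time. It reuses the F06 comment rule from the feature table; the sample lines are hypothetical, and this is an interpretation of the limitation rather than an analysis taken from the talk.

```python
def has_comment_marker(line):
    """F06 as defined in the feature table: //, /*, */ or a leading * after trimming."""
    stripped = line.strip()
    return any(m in line for m in ("//", "/*", "*/")) or stripped.startswith("*")


# Block comment whose inner lines carry no comment marker of their own.
block_comment = [
    "/*",
    "   This text sits inside a block comment",
    "   but has nothing that marks it as one.",
    "*/",
]
markers = [has_comment_marker(line) for line in block_comment]
print(markers)

# Two statements on one physical line: a per-line count/ignore decision
# can contribute at most 1 to the total, although 2 statements are present.
two_statements = "int a = 1; int b = 2;"
statements_on_line = two_statements.count(";")
print(statements_on_line)
```

The inner comment lines look like ordinary text to per-line features, so recognizing them would require context from neighbouring lines.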

SLIDE 25

Questions