
SLIDE 1

Mining Source Code Descriptions from Developer Communications

Sebastiano Panichella, Jairo Aponte, Massimiliano Di Penta, Andrian Marcus, Gerardo Canfora

SLIDE 2

Context: Software Project

[Diagram labels: Developer; Source Code; Documentation (Class diagram, Sequence diagram); Program Comprehension; Maintenance Tasks]

SLIDE 3

Context: Software Project

[Diagram labels: Developer; understanding; Source Code; Documentation (Class diagram, Sequence diagram); Program Comprehension; Difficult; Maintenance Tasks]

SLIDE 4

Context: Software Project

[Diagram labels: Developer; understanding; Source Code; describes; Documentation (Class diagram, Sequence diagram); Program Comprehension; Difficult; Maintenance Tasks]

SLIDE 5

Coming back to the reality...

Context: Software Project

[Diagram labels: Developer; understanding; Source Code; Program Comprehension; Difficult; Maintenance Tasks]

SLIDE 6

In such situations, developers need to infer knowledge from:

- the source code itself;
- source code descriptions in external artifacts.

Idea: We argue that messages exchanged among contributors/developers are a useful source of information to help understand source code.

Developer

SLIDE 7

...

When call the method IndexSplitter.split(File destDir, String[] segs) from the Lucene cotrib directory (contrib/misc/src/java/org/apache/lucene/index) it creates an index with segments descriptor file with wrong data. Namely wrong is the number representing the name of segment that would be created next in this index.

...

CLASS: IndexSplitter
METHOD: split

SLIDE 8

A Five-Step Approach for Mining Method Descriptions

Developer

SLIDE 9

Step 1: Downloading emails/bug reports and tracing them onto classes

Two heuristics

- The discussion contains a fully-qualified class name (e.g., org.apache.lucene.analysis.MappingCharFilter); or
- the email contains a file name (e.g., MappingCharFilter.java).

For bug reports, we complement the above heuristics by matching the bug ID of each closed bug to the commit notes, thereby tracing the bug report to the files changed in that commit.

Developer Discussion

When call the method IndexSplitter.split(File destDir, String[] segs) from the Lucene cotrib directory (contrib/misc/src/java/org/apache/lucene/index) it creates an index with segments descriptor file with wrong data. Namely wrong is the number representing the name of segment that would be created next in this index.

    public void split(File destDir, String[] segs) throws IOException {
      destDir.mkdirs();
      FSDirectory destFSDir = FSDirectory.open(destDir);
      SegmentInfos destInfos = new SegmentInfos
    }

If some of the segments of the index already has this name this results either to impossibility to create new segment or in crating of an corrupted segment.
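The tracing heuristics of Step 1 could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the list of known classes, and the "PREFIX-number" bug ID format are hypothetical and are not taken from the slides.

```python
import re


def classes_mentioned(text, known_classes):
    """Return the known fully-qualified class names that a discussion mentions.

    A class is traced when the text contains its fully-qualified name
    (e.g. org.apache.lucene.analysis.MappingCharFilter) or its file name
    (e.g. MappingCharFilter.java).
    """
    traced = set()
    for fqn in known_classes:
        simple_name = fqn.rsplit(".", 1)[-1]
        # \b keeps "Foo.java" from matching inside "MyFoo.java"
        file_pattern = r"\b" + re.escape(simple_name) + r"\.java\b"
        # The name must not be the tail or the head of a longer dotted name
        fqn_pattern = r"(?<![\w.])" + re.escape(fqn) + r"(?!\w)"
        if re.search(fqn_pattern, text) or re.search(file_pattern, text):
            traced.add(fqn)
    return traced


def bug_ids_in_commit(commit_note, prefix="LUCENE-"):
    """Return bug IDs found in a commit note.

    The 'PREFIX-number' ID format is an assumption made for illustration.
    """
    return set(re.findall(re.escape(prefix) + r"\d+", commit_note))
```

A bug report whose ID appears in a commit note would then be traced to the files changed in that commit; how the commit history is read is left out of this sketch.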

SLIDE 10

Step 1: Downloading emails/bug reports and tracing them onto classes

CLASS: IndexSplitter


SLIDE 11

Step 2: Extracting paragraphs

Two heuristics

- We treat as paragraphs the text sections separated by one or more blank lines.
- We prune source code fragments and/or stack traces out of the paragraphs, using an approach inspired by the work of Bacchelli et al.

Developer Discussion

PAR 1: When call the method IndexSplitter.split(File destDir, String[] segs) from the Lucene cotrib directory (contrib/misc/src/java/org/apache/lucene/index) it creates an index with segments descriptor file with wrong data. Namely wrong is the number representing the name of segment that would be created next in this index.

PAR 2:

    public void split(File destDir, String[] segs) throws IOException {
      destDir.mkdirs();
      FSDirectory destFSDir = FSDirectory.open(destDir);
      SegmentInfos destInfos = new SegmentInfos
    }

PAR 3: If some of the segments of the index already has this name this results either to impossibility to create new segment or in crating of an corrupted segment.
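A minimal sketch of Step 2, splitting on blank lines and then dropping paragraphs that look like code or stack traces. The line-level cues below are simplified assumptions used as a stand-in; they are not the technique of Bacchelli et al., and all names are hypothetical.

```python
import re

# Assumed, simplified cues for code and stack-trace lines.
CODE_LINE = re.compile(r"[;{}]\s*$|^\s*(public|private|protected|import|package)\b")
STACK_LINE = re.compile(r"^\s*at\s+[\w.$]+\(.*\)\s*$|^\s*[\w.]+(Exception|Error)\b")


def split_paragraphs(message):
    """Split a message into paragraphs separated by one or more blank lines."""
    parts = re.split(r"\n\s*\n", message.strip())
    return [p.strip() for p in parts if p.strip()]


def is_code_or_trace(paragraph):
    """True when more than half of the lines look like code or a stack trace."""
    lines = [line for line in paragraph.splitlines() if line.strip()]
    hits = sum(1 for line in lines if CODE_LINE.search(line) or STACK_LINE.search(line))
    return hits * 2 > len(lines)


def extract_text_paragraphs(message):
    """Keep only the natural-language paragraphs of a message."""
    return [p for p in split_paragraphs(message) if not is_code_or_trace(p)]
```

On the discussion shown above, a filter of this kind would be expected to keep PAR 1 and PAR 3 and to drop PAR 2.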

SLIDE 12

Step 2: Extracting paragraphs

SLIDE 13

Step 3: Tracing paragraphs onto methods

A paragraph is traced onto a method only if it satisfies both of the following conditions:

A) the paragraph contains the keyword “method”;
B) the method name is followed by an open parenthesis, i.e., we match “foo(”.

Developer Discussion, PAR 1:

When call the method IndexSplitter.split(File destDir, String[] segs) from the Lucene cotrib directory it creates an index with segments descriptor file with wrong data. Namely wrong is the number representing the name of segment that would be created next in this index.

...

CLASS: IndexSplitter
METHOD: split(

PAR 1 satisfies both A) (“method”) and B) (“split(”).
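The two conditions of Step 3 can be checked with two regular expressions. The sketch below is illustrative only; the function name is hypothetical, and matching “method” case-insensitively (so that “Method” and “methods” also count) is an assumption.

```python
import re


def methods_described(paragraph, method_names):
    """Return the method names a paragraph is traced onto.

    Condition A: the paragraph contains the keyword "method".
    Condition B: the method name is followed by an open parenthesis, e.g. "foo(".
    """
    if not re.search(r"\bmethod", paragraph, re.IGNORECASE):
        return []
    traced = []
    for name in method_names:
        # (?<!\w) stops "split(" from matching inside "resplit("
        if re.search(r"(?<!\w)" + re.escape(name) + r"\(", paragraph):
            traced.append(name)
    return traced
```

The candidate method names would come from the class the discussion was traced to in Step 1.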

SLIDE 14

Step 4: Heuristic-based Filtering

We defined a set of heuristics that assign each paragraph a score, and we use them to further filter the paragraphs associated with methods.

Example paragraph:

...
Problem seems to come from MainMethodeSearchEngine in org.eclipse.jdt.internal.ui.launcher The Method searchMainMethods ,there's a call to addSubTypes(List, IProgressMonitor, IJavaSearchScope) Method if includesSubtypes flag is ON. This method add all types sub-types as soon as the given scope encloses them without testing if sub-types have a main! After return IType[] before the excecution
...

CLASS: MainMethodSearchEngine
METHOD: searchMainMethods(IProgressMonitor, IJavaSearchScope, boolean)

SLIDE 15

Step 4: Heuristic-based Filtering

a) Method parameters: s1 = percentage of the method's parameters mentioned in the paragraph, a value between 0 and 1. s1 = 1 if the method does not have parameters.

CLASS: MainMethodSearchEngine
METHOD: searchMainMethods(IProgressMonitor, IJavaSearchScope, boolean)

% parameters = 100% -> s1 = 1

SLIDE 16

Step 4: Heuristic-based Filtering

b) Syntactic descriptions (mentioning return values): s2 = 1 if the paragraph contains the keyword “return”, 0 otherwise. s2 = 1 if the method is void.

CLASS: MainMethodSearchEngine
METHOD: searchMainMethods(IProgressMonitor, IJavaSearchScope, boolean)

% parameters = 100% -> s1 = 1
Keyword “return” found -> s2 = 1

SLIDE 17

Step 4: Heuristic-based Filtering

c) Overriding/Overloading: s3 = 1 if any of the “overload” or “override” keywords appears in the paragraph, 0 otherwise.
d) Method invocations: s4 = 1 if any of the “call” or “execute” keywords appears in the paragraph, 0 otherwise.

CLASS: MainMethodSearchEngine
METHOD: searchMainMethods(IProgressMonitor, IJavaSearchScope, boolean)

% parameters = 100% -> s1 = 1
SCORE = 1 + 0 + 1 = 2 (s2 = 1, s3 = 0, s4 = 1)

SLIDE 18

Step 4: Heuristic-based Filtering

We selected paragraphs that have:

1. s1 ≥ thP = 0.5
2. s2 + s3 + s4 ≥ thH = 1

Example: % parameters = 100% -> s1 = 1 ≥ 0.5, and SCORE = 1 + 0 + 1 = 2 ≥ 1, so the paragraph is selected (OK).
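The Step 4 scores and thresholds could be computed along these lines. This is a sketch under assumptions: the function names are hypothetical, keywords are matched as case-insensitive substrings, and a parameter counts as mentioned when either its type or its name occurs in the paragraph.

```python
TH_P = 0.5  # thP: minimum share of parameters mentioned (s1)
TH_H = 1    # thH: minimum value of s2 + s3 + s4


def score_paragraph(paragraph, params, returns_void):
    """Compute (s1, s2, s3, s4) for a paragraph traced onto a method.

    params is a list of (type, name) pairs; counting a parameter as mentioned
    when its type or its name occurs in the text is an assumption.
    """
    text = paragraph.lower()
    if params:
        mentioned = sum(
            1 for ptype, pname in params
            if ptype.lower() in text or pname.lower() in text
        )
        s1 = mentioned / len(params)
    else:
        s1 = 1.0  # methods without parameters get the full score
    s2 = 1 if returns_void or "return" in text else 0
    s3 = 1 if "overload" in text or "override" in text else 0
    s4 = 1 if "call" in text or "execute" in text else 0
    return s1, s2, s3, s4


def is_selected(s1, s2, s3, s4):
    """Apply both thresholds: s1 >= thP and s2 + s3 + s4 >= thH."""
    return s1 >= TH_P and (s2 + s3 + s4) >= TH_H
```

For the searchMainMethods example, with made-up parameter names for the three parameters, this yields s1 = 1, s2 = 1, s3 = 0 and s4 = 1, which agrees with the values shown on the slides.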

SLIDE 19

Step 5: Similarity-based Filtering

We rank the filtered paragraphs by their textual similarity with the method they are likely describing.

Removing:
- English stop words;
- programming language keywords.

Using:
- camel case splitting on the remaining words;
- Vector Space Model.

METHOD     PARAGRAPH     SCORE   SIMILARITY
Method_3   Paragraph_4   2.5     96.1%
Method_1   Paragraph_1   2.5     95.6%
Method_2   Paragraph_2   1.5     97.4%
Method_3   Paragraph_3   1.5     86.2%
Method_1   Paragraph_3   1.5     79.0%
Method_3   Paragraph_2   1.5     77.5%
Method_2   Paragraph_4   1.5     64.3%
Method_2   Paragraph_3   1.3     83.2%
Method_3   Paragraph_1   1.3     73.9%
Method_2   Paragraph_1   1.3     68.7%
Method_1   Paragraph_4   1.3     53.6%

SLIDE 20

Step 5: Similarity-based Filtering

Paragraphs are retained when their similarity with the method is at or above the threshold: th ≥ 0.80.
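A minimal sketch of the Step 5 similarity filter. Several details are assumptions: the stop-word and keyword lists are tiny illustrative samples, the vectors use raw term frequencies (the slides only name the Vector Space Model, not a weighting scheme), and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative lists; real stop-word and keyword lists would be longer.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "is", "it", "and", "with", "this", "that"}
JAVA_KEYWORDS = {"public", "private", "void", "new", "return", "throws",
                 "if", "else", "for", "class", "static", "int"}


def tokenize(text):
    """Split identifiers on camel case, lowercase, drop stop words and keywords."""
    tokens = []
    for word in re.findall(r"[A-Za-z]+", text):
        # "destDir" -> ["dest", "Dir"], "FSDirectory" -> ["FS", "Directory"]
        parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", word)
        tokens.extend(p.lower() for p in parts)
    return [t for t in tokens if t not in STOP_WORDS and t not in JAVA_KEYWORDS]


def cosine_similarity(text_a, text_b):
    """Cosine similarity between raw term-frequency vectors (assumed weighting)."""
    va, vb = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def rank_paragraphs(method_source, paragraphs, threshold=0.80):
    """Return (similarity, paragraph) pairs at or above the threshold, best first."""
    scored = [(cosine_similarity(method_source, p), p) for p in paragraphs]
    kept = [pair for pair in scored if pair[0] >= threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)
```

This sketch ranks by similarity alone; combining the ranking with the Step 4 score, as the table on the slides suggests, is left out.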

SLIDE 21

Empirical Study

Goal: analyze source code descriptions in developer discussions

Purpose: investigate how developer discussions describe methods of Java source code

Quality focus: finding good method descriptions in messages exchanged among contributors/developers

Context: bug reports and mailing lists of two Java projects, Apache Lucene and Eclipse

SLIDE 22

Context

SLIDE 23

Research Questions

RQ1 (method coverage): How many methods from the analyzed software systems are described by the paragraphs identified by the proposed approach?

RQ2 (precision): How precise is the proposed approach in identifying method descriptions?

RQ3 (missing descriptions): How many potentially good method descriptions are missed by the approach?

SLIDE 24

RQ1: How many methods from the analyzed software systems are described by the paragraphs identified by the proposed approach?

SLIDE 25

RQ1: How many methods from the analyzed software systems are described by the paragraphs identified by the proposed approach?

SLIDE 26

RQ1: How many methods from the analyzed software systems are described by the paragraphs identified by the proposed approach?

SLIDE 27

RQ2: How precise is the proposed approach in identifying method descriptions?

We sampled 250 descriptions from each project

SLIDE 28

RQ2: How precise is the proposed approach in identifying method descriptions?

We sampled 250 descriptions from each project

SLIDE 29

RQ2: How precise is the proposed approach in identifying method descriptions?

We sampled 250 descriptions from each project

SLIDE 30

RQ3: How many potentially good method descriptions are missed by the approach?

We sampled 100 descriptions from each project.

TABLE III: The analysis of a sample of 100 paragraphs traced to methods, but not satisfying the Step 4 heuristic

System          True Negatives   False Negatives
Eclipse         78               22
Apache Lucene   67               33

SLIDE 31

Conclusion

SLIDE 32

Conclusion

SLIDE 33

Conclusion

SLIDE 34