Mining Source Code Descriptions from Developer Communications
Sebastiano Panichella, Jairo Aponte, Massimiliano Di Penta, Andrian Marcus, Gerardo Canfora
Context: Software Project
Diagram: the documentation (class diagrams, sequence diagrams) describes the source code; the developer relies on both to understand the system for program comprehension and maintenance tasks, where understanding is difficult.
Coming back to reality...
Context: Software Project
Diagram: the developer faces the source code without the documentation; understanding it for program comprehension and maintenance tasks is difficult.
Idea
In such situations, developers need to infer knowledge from:
- the source code itself;
- source code descriptions in external artifacts.
We argue that messages exchanged among contributors/developers are a useful source of information to help understand source code.
When call the method IndexSplitter.split(File destDir, String[] segs) from the Lucene cotrib directory (contrib/misc/src/java/org/apache/lucene/index) it creates an index with segments descriptor file with wrong data. Namely wrong is the number representing the name of segment that would be created next in this index.
CLASS: IndexSplitter METHOD: split
A Five-Step Approach for Mining Method Descriptions
Step 1: Downloading emails/bug reports and tracing them onto classes
Two heuristics:
- the discussion contains a fully-qualified class name (e.g., org.apache.lucene.analysis.MappingCharFilter); or
- the email contains a file name (e.g., MappingCharFilter.java).
For bug reports, we complement the above heuristics by matching the bug ID of each closed bug against the commit notes, thereby tracing the bug report to the files changed in that commit.
Developer Discussion
When call the method IndexSplitter.split(File destDir, String[] segs) from the Lucene cotrib directory (contrib/misc/src/java/org/apache/lucene/index) it creates an index with segments descriptor file with wrong data. Namely wrong is the number representing the name of segment that would be created next in this index.
    public void split(File destDir, String[] segs) throws IOException {
      destDir.mkdirs();
      FSDirectory destFSDir = FSDirectory.open(destDir);
      SegmentInfos destInfos = new SegmentInfos();
      ...
    }
If some of the segments of the index already has this name this results either to impossibility to create new segment or in crating of an corrupted segment.
CLASS: IndexSplitter
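The two Step 1 heuristics and the bug-ID matching can be sketched as plain text matching. This is a minimal sketch, assuming the list of known fully-qualified class names and the commit log (note text plus changed files) are already available; the class `ClassTracer`, the record `Commit`, and the bug ID format used in the example are hypothetical, not the authors' tooling.

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

/** Hypothetical sketch of the Step 1 tracing heuristics. */
final class ClassTracer {

    /** A commit: its log note and the files it changed (assumed to be mined elsewhere). */
    record Commit(String note, List<String> changedFiles) {}

    // Matches `word` only when it is not glued to other identifier characters,
    // so "BUG-101" does not match inside "BUG-1010".
    private static Pattern standalone(String word) {
        return Pattern.compile("(?<![\\w.])" + Pattern.quote(word) + "(?!\\w)");
    }

    /** Heuristic 1: fully-qualified class name; heuristic 2: source file name (Foo.java). */
    static Set<String> traceToClasses(String discussion, Collection<String> knownClasses) {
        Set<String> traced = new TreeSet<>();
        for (String fqn : knownClasses) {
            String fileName = fqn.substring(fqn.lastIndexOf('.') + 1) + ".java";
            if (standalone(fqn).matcher(discussion).find()
                    || standalone(fileName).matcher(discussion).find()) {
                traced.add(fqn);
            }
        }
        return traced;
    }

    /** Bug reports: collect the files changed by every commit whose note cites the bug ID. */
    static Set<String> traceBugToFiles(String bugId, List<Commit> commits) {
        Pattern idPattern = standalone(bugId);
        Set<String> files = new TreeSet<>();
        for (Commit c : commits) {
            if (idPattern.matcher(c.note()).find()) {
                files.addAll(c.changedFiles());
            }
        }
        return files;
    }
}
```

The lookaround guards are a design choice of this sketch: plain `contains` would wrongly trace `Foo` from a mention of `FooBar`, or a bug to an unrelated commit whose ID shares a prefix.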
Step 2: Extracting paragraphs
Two heuristics:
- we consider as paragraphs the text sections separated by one or more blank lines;
- we prune source code fragments and/or stack traces out of the paragraph descriptions, using an approach inspired by the work of Bacchelli et al.
Developer Discussion
PAR 1: When call the method IndexSplitter.split(File destDir, String[] segs) from the Lucene cotrib directory (contrib/misc/src/java/org/apache/lucene/index) it creates an index with segments descriptor file with wrong data. Namely wrong is the number representing the name of segment that would be created next in this index.
PAR 2: public void split(File destDir, String[] segs) throws IOException { destDir.mkdirs(); FSDirectory destFSDir = FSDirectory.open(destDir); SegmentInfos destInfos = new SegmentInfos(); ... }
PAR 3: If some of the segments of the index already has this name this results either to impossibility to create new segment or in crating of an corrupted segment.
Extracted paragraph (PAR 1): When call the method IndexSplitter.split(File destDir, String[] segs) from the Lucene cotrib directory it creates an index with segments descriptor file with wrong data. Namely wrong is the number representing the name of segment that would be created next in this index.
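Paragraph extraction can be sketched as a blank-line split followed by a pruning pass. This is a simplified stand-in, not the Bacchelli et al. approach the slide cites: it assumes a line is code if it ends with `;`, `{` or `}`, and a stack-trace frame if it looks like `at pkg.Class.method(File.java:N)`. The class name `ParagraphExtractor` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch of Step 2: blank-line splitting plus a crude code/stack-trace filter. */
final class ParagraphExtractor {

    // One or more blank (or whitespace-only) lines separate two paragraphs.
    private static final Pattern PARAGRAPH_BREAK = Pattern.compile("\\R\\s*\\R");
    // A Java stack-trace frame such as "at org.foo.Bar.baz(Bar.java:42)".
    private static final Pattern STACK_FRAME = Pattern.compile("at [\\w$.<>]+\\(.*\\)");

    static List<String> extract(String message) {
        List<String> paragraphs = new ArrayList<>();
        for (String block : PARAGRAPH_BREAK.split(message)) {
            StringBuilder prose = new StringBuilder();
            for (String line : block.split("\\R")) {
                String t = line.trim();
                // Skip empty lines, stack-trace frames, and lines that end like Java statements or blocks.
                if (t.isEmpty() || STACK_FRAME.matcher(t).matches() || looksLikeCode(t)) {
                    continue;
                }
                if (prose.length() > 0) {
                    prose.append(' ');
                }
                prose.append(t);
            }
            // A paragraph made only of code or stack frames is dropped entirely.
            if (prose.length() > 0) {
                paragraphs.add(prose.toString());
            }
        }
        return paragraphs;
    }

    // Assumption: a statement-like line ending is a good enough signal of source code.
    private static boolean looksLikeCode(String line) {
        return line.endsWith(";") || line.endsWith("{") || line.endsWith("}");
    }
}
```

On the IndexSplitter message, the code paragraph (PAR 2) is pruned completely, while the two prose paragraphs survive.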
Step 3: Tracing paragraphs onto methods
A paragraph is traced onto a method only if it satisfies both of the following conditions:
A) it contains the keyword “method”; and
B) it contains the method name followed by an open parenthesis, i.e., we match “foo(”.
Example: PAR 1 of the developer discussion contains “method” (A) and “split(” (B), so it is traced to CLASS: IndexSplitter, METHOD: split.
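The two Step 3 conditions translate directly into two regular-expression checks. A minimal sketch; the class name `MethodTracer` is hypothetical, and treating “method” as case-insensitive with an optional plural is an assumption the slides do not state.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of Step 3: does a paragraph describe a given method? */
final class MethodTracer {

    // Condition A: the keyword "method" (assumption: case-insensitive, "methods" also accepted).
    private static final Pattern METHOD_KEYWORD =
            Pattern.compile("\\bmethods?\\b", Pattern.CASE_INSENSITIVE);

    static boolean tracesTo(String paragraph, String methodName) {
        boolean conditionA = METHOD_KEYWORD.matcher(paragraph).find();
        // Condition B: the method name immediately followed by "(", e.g. "split(".
        // The leading \b stops "split" from matching inside "mysplit(".
        boolean conditionB = Pattern.compile("\\b" + Pattern.quote(methodName) + "\\(")
                .matcher(paragraph).find();
        return conditionA && conditionB;
    }
}
```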
Step 4: Heuristic based Filtering
We defined a set of heuristics that assign each paragraph a score, used to further filter the paragraphs associated with methods.
Example:
“... Problem seems to come from MainMethodeSearchEngine in org.eclipse.jdt.internal.ui.launcher
The Method searchMainMethods, there's a call to addSubTypes(List, IProgressMonitor, IJavaSearchScope) Method if includesSubtypes flag is ON. This method add all types sub-types as soon as the given scope encloses them without testing if sub-types have a main! After return IType[] before the excecution ...”
CLASS: MainMethodSearchEngine
METHOD: searchMainMethods(IProgressMonitor, IJavaSearchScope, boolean)
a) Method parameters: s1 = fraction of the method's parameters mentioned in the paragraph, a value between 0 and 1; s1 = 1 if the method has no parameters.
b) Syntactic descriptions (mentioning return values): s2 = 1 if the paragraph contains the keyword “return”, 0 otherwise; s2 = 1 if the method is void.
c) Overriding/Overloading: s3 = 1 if either the “overload” or the “override” keyword appears in the paragraph, 0 otherwise.
d) Method invocations: s4 = 1 if either the “call” or the “execute” keyword appears in the paragraph, 0 otherwise.
We selected paragraphs that have:
1. s1 ≥ thP = 0.5
2. s2 + s3 + s4 ≥ thH = 1
Example (searchMainMethods): 100% of the parameters are mentioned, so s1 = 1 ≥ 0.5; the paragraph contains “return” and “call” but neither “overload” nor “override”, so s2 + s3 + s4 = 1 + 0 + 1 = 2 ≥ 1 and the paragraph is selected.
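The four scores and the two thresholds can be sketched as follows. This is a minimal sketch under stated assumptions: a parameter counts as “mentioned” if its type or its name appears as a whole identifier (the slides do not say how mentions are detected), and the keywords are matched as case-insensitive substrings. `HeuristicFilter`, `Param`, `MethodInfo` and `Scores` are hypothetical names.

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical sketch of the Step 4 scores s1..s4 and the thresholds thP and thH. */
final class HeuristicFilter {

    static final double TH_P = 0.5; // threshold on s1
    static final int TH_H = 1;      // threshold on s2 + s3 + s4

    /** A declared parameter of the candidate method. */
    record Param(String type, String name) {}

    /** Hypothetical method descriptor: name, parameters, and whether it returns void. */
    record MethodInfo(String name, List<Param> params, boolean isVoid) {}

    record Scores(double s1, int s2, int s3, int s4) {
        /** Selection rule: s1 >= thP and s2 + s3 + s4 >= thH. */
        boolean selected() {
            return s1 >= TH_P && s2 + s3 + s4 >= TH_H;
        }
    }

    static Scores score(String paragraph, MethodInfo m) {
        // s1: fraction of parameters mentioned by type or by name; 1 when there are none.
        double s1 = 1.0;
        if (!m.params().isEmpty()) {
            long mentioned = m.params().stream()
                    .filter(p -> mentions(paragraph, p.type()) || mentions(paragraph, p.name()))
                    .count();
            s1 = (double) mentioned / m.params().size();
        }
        String text = paragraph.toLowerCase(Locale.ROOT);
        int s2 = (m.isVoid() || text.contains("return")) ? 1 : 0;                  // return values
        int s3 = (text.contains("overload") || text.contains("override")) ? 1 : 0; // overriding/overloading
        int s4 = (text.contains("call") || text.contains("execute")) ? 1 : 0;      // invocations
        return new Scores(s1, s2, s3, s4);
    }

    // Case-insensitive match of a whole identifier, not a fragment of a longer one.
    private static boolean mentions(String text, String identifier) {
        return Pattern.compile("(?<!\\w)" + Pattern.quote(identifier) + "(?!\\w)",
                Pattern.CASE_INSENSITIVE).matcher(text).find();
    }
}
```

In the usage check below, the parameter names of `searchMainMethods` are invented for illustration (the boolean is named after the `includesSubtypes` flag in the message); with them the sketch reproduces the slide's s1 = 1 and s2 + s3 + s4 = 2.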
Step 5: Similarity based Filtering
We rank the filtered paragraphs by their textual similarity to the method they are likely describing. We:
- remove English stop words;
- remove programming language keywords;
- apply camel-case splitting to the remaining words;
- compute the similarity with a Vector Space Model.
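The normalization and similarity steps can be sketched with term-frequency vectors and cosine similarity. This is a minimal sketch, assuming raw term frequencies (the slides only say “Vector Space Model”, so the original may use a different weighting such as tf-idf) and tiny illustrative stop-word and keyword lists; `TextSimilarity` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of Step 5 text normalization and Vector Space Model similarity. */
final class TextSimilarity {

    // Tiny illustrative lists; real stop-word and keyword lists are much longer.
    private static final Set<String> STOP_WORDS =
            Set.of("a", "an", "and", "from", "in", "is", "it", "of", "or", "the", "this", "to", "when", "with");
    private static final Set<String> JAVA_KEYWORDS =
            Set.of("boolean", "class", "int", "new", "public", "return", "static", "throws", "void");

    /** Term-frequency vector: drop stop words and keywords, then camel-case split the rest. */
    static Map<String, Integer> termVector(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.split("[^A-Za-z]+")) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (lower.isEmpty() || STOP_WORDS.contains(lower) || JAVA_KEYWORDS.contains(lower)) {
                continue;
            }
            // "IndexSplitter" -> "Index", "Splitter"; "IJavaSearchScope" -> "I", "Java", "Search", "Scope".
            for (String part : token.split("(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")) {
                tf.merge(part.toLowerCase(Locale.ROOT), 1, Integer::sum);
            }
        }
        return tf;
    }

    /** Cosine of the angle between two term-frequency vectors (0 if either is empty). */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
        }
        for (int v : b.values()) {
            normB += v * v;
        }
        return (normA == 0 || normB == 0) ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

The similarity of a (method, paragraph) pair would then be `cosine(termVector(methodText), termVector(paragraph))`, where what goes into `methodText` (signature only, or body and comments too) is left open here.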
METHOD     PARAGRAPH     SCORE   SIMILARITY
Method_3   Paragraph_4   2.5     96.1%
Method_1   Paragraph_1   2.5     95.6%
Method_2   Paragraph_2   1.5     97.4%
Method_3   Paragraph_3   1.5     86.2%
Method_1   Paragraph_3   1.5     79.0%
Method_3   Paragraph_2   1.5     77.5%
Method_2   Paragraph_4   1.5     64.3%
Method_2   Paragraph_3   1.3     83.2%
Method_3   Paragraph_1   1.3     73.9%
Method_2   Paragraph_1   1.3     68.7%
Method_1   Paragraph_4   1.3     53.6%
Threshold: th = 0.80; pairs whose similarity falls below th are discarded.
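The final ranking and cut-off can be sketched over rows like those in the table. A minimal sketch, assuming the threshold th applies to the similarity column and that pairs are ordered by score and then by similarity, as the table's row order suggests; `SimilarityFilter` and `Candidate` are hypothetical names.

```java
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of the Step 5 ranking and similarity threshold. */
final class SimilarityFilter {

    /** A (method, paragraph) pair with its Step 4 score and its similarity in [0, 1]. */
    record Candidate(String method, String paragraph, double score, double similarity) {}

    static List<Candidate> rankAndFilter(List<Candidate> candidates, double th) {
        return candidates.stream()
                .filter(c -> c.similarity() >= th)                              // keep pairs with similarity >= th
                .sorted(Comparator.comparingDouble(Candidate::score).reversed() // higher score first
                        .thenComparing(Comparator.comparingDouble(Candidate::similarity).reversed()))
                .toList();
    }
}
```

Applied to the eleven rows of the table with th = 0.80, this sketch keeps five pairs: Method_3/Paragraph_4, Method_1/Paragraph_1, Method_2/Paragraph_2, Method_3/Paragraph_3 and Method_2/Paragraph_3.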
Empirical Study
- Goal: analyze source code descriptions in developer discussions
- Purpose: investigate how developer discussions describe methods of Java source code
- Quality focus: finding good method descriptions in messages exchanged among contributors/developers
- Context: bug reports and mailing lists of two Java projects, Apache Lucene and Eclipse
Research Questions
- RQ1 (method coverage): How many methods from the analyzed software systems are described by the paragraphs identified by the proposed approach?
- RQ2 (precision): How precise is the proposed approach in identifying method descriptions?
- RQ3 (missing descriptions): How many potentially good method descriptions are missed by the approach?
RQ1: How many methods from the analyzed software systems are described by the paragraphs identified by the proposed approach?
RQ2: How precise is the proposed approach in identifying method descriptions?
We sampled 250 descriptions from each project.
RQ3: How many potentially good method descriptions are missed by the approach?
TABLE III: The analysis of a sample of 100 paragraphs traced to methods, but not satisfying the Step 4 heuristic
System          True Negatives   False Negatives
Eclipse         78               22
Apache Lucene   67               33
We sampled 100 descriptions from each project.