Mining Version Histories to Guide Software Changes
by T. Zimmermann, P. Weißgerber, S. Diehl, A. Zeller
in IEEE Transactions on Software Engineering,
Vol. 31, No. 6, June 2005
The Idea
Can we make similar suggestions for software changes?
Extend the Eclipse IDE with a new preference
Preferences are stored in a field fKeys[]
Which of the 27,000 files? Which of the 20,000 classes? Which of the 200,000 methods?
fKeys[] and initDefaults() use the same variables
Usage does not induce change
Usage can be detected only within the source code
Eclipse has 12,000 non-Java files
Programmer who changed fKeys[] also changed …
The CVS archive for Eclipse has more than 47,000
An entity is a triple (c, i, p), where c is the syntactic category, i the identifier, and p the parent entity
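The (c, i, p) notation can be sketched as a small data type. This is only an illustration: the class name `Entity`, its field names, and the example values are assumptions, not ROSE's actual data model.

```python
from typing import NamedTuple, Optional


class Entity(NamedTuple):
    """Hypothetical representation of an entity as a (c, i, p) triple."""
    category: str               # c: syntactic category, e.g. "file", "class", "field"
    identifier: str             # i: identifier of the entity
    parent: Optional["Entity"]  # p: parent entity (None at the top level)


# Made-up example values: a field nested in a class nested in a file.
source_file = Entity("file", "Example.java", None)
example_class = Entity("class", "Example", source_file)
keys_field = Entity("field", "fKeys[]", example_class)
```

Because each entity carries its parent, a change to a field can also be read as a change to the enclosing class or file by walking up the `parent` links.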
ROSE retrieves changes and transactions from CVS
CVS provides only file versioning
Per-file changes are grouped into transactions
Files -> Transactions -> Sliding window approach
Two subsequent changes by the same author, at most 200 seconds apart
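A minimal sketch of the sliding-window grouping, assuming each per-file change is a record with an author, a timestamp in seconds, and a path. The names `Change` and `group_into_transactions` are hypothetical, and only the two criteria named above (same author, 200-second gap) are used; a real grouping may compare further attributes such as the log message.

```python
from collections import namedtuple

# Hypothetical record for one per-file change; timestamp is in seconds.
Change = namedtuple("Change", ["author", "timestamp", "path"])


def group_into_transactions(changes, window=200):
    """Group per-file changes into transactions with a sliding time window.

    A change joins the author's open transaction if it follows that
    author's previous change by at most `window` seconds. The window
    slides: the gap is measured from the previous change, not from the
    first change of the transaction.
    """
    transactions = []
    open_by_author = {}  # author -> (timestamp of last change, transaction list)
    for change in sorted(changes, key=lambda c: c.timestamp):
        last = open_by_author.get(change.author)
        if last is not None and change.timestamp - last[0] <= window:
            last[1].append(change)
            open_by_author[change.author] = (change.timestamp, last[1])
        else:
            current = [change]
            transactions.append(current)
            open_by_author[change.author] = (change.timestamp, current)
    return transactions


# Made-up example data.
changes = [
    Change("alice", 0, "a"),
    Change("alice", 150, "b"),
    Change("alice", 300, "c"),   # 300 s after the first change, 150 s after the previous one
    Change("bob", 310, "d"),
    Change("alice", 600, "e"),   # gap of 300 s, so a new transaction starts
]
grouped = [[c.path for c in t] for t in group_into_transactions(changes)]
```

Note that the third change is still part of the first transaction even though it lies more than 200 seconds after the first change; that is what makes the window sliding rather than fixed.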
Branches and Merges in CVS
ROSE ignores changes that affect more than 30 entities
Rules are mined from transactions
Rules are mined with the Apriori algorithm [Agrawal’94]
The generated rules have the form:
The rules have a probabilistic interpretation
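To make the probabilistic reading concrete, here is a deliberately simplified sketch. It is not the Apriori algorithm and not ROSE's implementation: it only counts pairs of entities that changed together and derives a support count and a confidence for single-antecedent, single-consequent rules. The function name, thresholds, and toy data are assumptions.

```python
from collections import Counter
from itertools import permutations


def mine_pair_rules(transactions, min_support=1, min_confidence=0.5):
    """Return {(antecedent, consequent): (support, confidence)} for pairs.

    support    = number of transactions containing both entities
    confidence = support / number of transactions containing the antecedent,
                 i.e. an estimate of P(consequent changes | antecedent changes)
    """
    item_count = Counter()
    pair_count = Counter()
    for transaction in transactions:
        items = set(transaction)
        item_count.update(items)
        # Ordered pairs, so both a -> b and b -> a are counted.
        pair_count.update(permutations(sorted(items), 2))

    rules = {}
    for (antecedent, consequent), support in pair_count.items():
        confidence = support / item_count[antecedent]
        if support >= min_support and confidence >= min_confidence:
            rules[(antecedent, consequent)] = (support, confidence)
    return rules


# Toy transactions with placeholder entity names.
toy_transactions = [["a", "b"], ["a", "b", "c"], ["a"]]
toy_rules = mine_pair_rules(toy_transactions, min_support=2, min_confidence=0.5)
```

In the toy data, `a` changed three times and `b` changed with it twice, so the rule `a -> b` has support 2 and confidence 2/3, while `b -> a` has support 2 and confidence 1.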
The programmer performs a change – “a situation”: ROSE suggests further changes by applying matching
Matching rule = situation = antecedent
The suggestion = union of the consequents of all the
The # of rules depends on support count and
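The matching step can be sketched as follows, assuming rules are stored as (antecedent, consequent, confidence) tuples of frozensets and a rule matches when its antecedent equals the situation. The function name `suggest`, the rule layout, and the ranking by confidence are assumptions made for this illustration.

```python
def suggest(situation, rules):
    """Union of the consequents of all rules whose antecedent equals the situation.

    Entities are ranked by the highest confidence of any matching rule
    that proposes them (the ranking is an assumption of this sketch).
    """
    situation = frozenset(situation)
    best_confidence = {}
    for antecedent, consequent, confidence in rules:
        if antecedent == situation:
            for entity in consequent:
                previous = best_confidence.get(entity, 0.0)
                best_confidence[entity] = max(previous, confidence)
    # Highest confidence first; ties broken by name for a stable order.
    return sorted(best_confidence, key=lambda e: (-best_confidence[e], e))


# Toy rules with placeholder entity names.
toy_rules = [
    (frozenset({"a"}), frozenset({"b"}), 0.9),
    (frozenset({"a"}), frozenset({"b", "c"}), 0.6),
    (frozenset({"x"}), frozenset({"y"}), 1.0),
]
```

With these toy rules, `suggest({"a"}, toy_rules)` returns `["b", "c"]`, and a situation that no rule matches yields an empty list, meaning no suggestion is made.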
If something is added to software, there is no way to
E.g., the developer adds “Foo” constant to Comp.java
ROSE can do that in “operation” dimension
GCC arrays that define the cost of different assembler
The arrays have been altered 9 times; 9 out of 11 times,
Python and C files – detecting evolutionary couplings
It would require cross-language program analysis to
POSTGRES documentation
The ROSE server determines coupling and rules
The ROSE client guides the programmer along related
How good are rules at predicting changes?
Training period: ROSE infers rules from the past
Evaluation period: ROSE applies the mined rules
In the evaluation period, every transaction T is checked:
Navigation: given one change in T, does ROSE point to
Error Prevention: given all but one change from T, does
Closure: given all changes of T, does ROSE stay silent?
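One way to read the three checks is as query generators over a transaction T: each produces a query (the changes given to the tool) and the changes the tool is expected to name. The sketch below assumes a transaction is a collection of entity names; the function names are hypothetical.

```python
def navigation_queries(transaction):
    """One change is given; the rest of the transaction is expected."""
    items = set(transaction)
    return [({e}, items - {e}) for e in sorted(items)]


def prevention_queries(transaction):
    """All changes but one are given; the missing one is expected."""
    items = set(transaction)
    return [(items - {e}, {e}) for e in sorted(items)]


def closure_query(transaction):
    """All changes are given; nothing further is expected."""
    return set(transaction), set()


# Toy transaction with placeholder entity names.
toy_transaction = ["a", "b", "c"]
```

A transaction with n changes yields n navigation queries, n prevention queries, and a single closure query.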
Granularity
Files and functions
Maintenance
No additions or deletions
Multiple Dimensions
What is the benefit of add_to and del_from?
History
How much history? Usefulness over time? Quality or
Recent Changes
Relevance of old changes
Recall: How many relevant entities are returned?
Precision: How many of the returned entities are
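The two measures can be computed per query as below, using the standard set-based definitions. The function name and the convention of returning 1.0 when a set is empty are assumptions of this sketch.

```python
def precision_recall(returned, relevant):
    """Precision and recall of a returned set against the relevant set.

    precision = |returned ∩ relevant| / |returned|
    recall    = |returned ∩ relevant| / |relevant|
    An empty denominator is treated as a perfect score (an assumption).
    """
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 1.0
    recall = hits / len(relevant) if relevant else 1.0
    return precision, recall


# Toy example: three suggestions, two of them among four relevant entities.
toy_precision, toy_recall = precision_recall({"b", "c", "d"}, {"b", "c", "e", "f"})
```

In the toy example, two of the three returned entities are relevant (precision 2/3) and two of the four relevant entities are returned (recall 1/2).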
The programmer has changed one single entity. Can
The programmer has changed all entities but one.
The programmer made all necessary changes. How often
Given one initial item, ROSE makes predictions in 66
On average, the predictions contain 33 percent of all
For those queries for which ROSE makes
In 3 percent of the queries where one item is missing,
A warning predicts 75 percent of the items that need to
ROSE’s warning about missing items should be taken
Only 2 percent of all transactions cause a false alarm (!)
8 projects; 100,000 transactions
CVS limitation
Taxonomies: identify patterns of changes
Sequence rules: detect rules across multiple
Further data sources: log messages, bug databases
Refactoring: ROSE does not recognize renamings of
Program analysis: can improve the overall approach
Rule presentation: visualization of rules can help