Mining Version Histories to Guide Software Changes




slide-1
SLIDE 1

Mining Version Histories to Guide Software Changes

by T. Zimmermann, P. Weißgerber, S. Diehl, A. Zeller

in IEEE Transactions on Software Engineering, Vol. 31, No. 6, June 2005
slide-2
SLIDE 2

The Idea

Can we make similar suggestions for software changes?

slide-3
SLIDE 3

Extending Eclipse Preferences

 Extend the Eclipse IDE with a new preference

 Preferences are stored in a field fKeys[]

slide-4
SLIDE 4

Extending Eclipse Preferences

 What else do you need to change?

 Which of the 27,000 files?

 Which of the 20,000 classes?

 Which of the 200,000 methods?

 Program analysis

 fKeys[] and initDefaults() use the same variables

 Usage does not induce change

 Usage can be detected only within the source code

 Eclipse has 12,000 non-Java files

slide-5
SLIDE 5

Learning from History

 Programmer who changed fKeys[] also changed …

slide-6
SLIDE 6

From CVS to Transactions

 The CVS archive for Eclipse has more than 47,000 transactions

slide-7
SLIDE 7

ROSE in a Nutshell

slide-8
SLIDE 8

Changes -> Transactions -> Rules

Entity: a triple (c, i, p), where c is the syntactic category, i is the identifier, and p is the parent entity

Example: (method, initDefaults(), (class, Comp, (file, Comp.java, …)))

Operations on entities: add_to, del_from, alter

Transaction – the set of changes simultaneously submitted by a developer to a version archive
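
The entity triple and the transaction can be sketched as plain data types. This is a minimal illustration, not ROSE's actual data model: the names Entity and Change, the "field" category, and the placement of fKeys[] in class Comp are assumptions made for the example, and None stands in for the parent that the slide elides with "…".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    category: str               # c: syntactic category, e.g. "file", "class", "method"
    identifier: str             # i: name of the entity
    parent: Optional["Entity"]  # p: enclosing entity (None where the slide elides it)


@dataclass(frozen=True)
class Change:
    operation: str  # one of "add_to", "del_from", "alter"
    entity: Entity


# The example entity from the slide, built from the outside in.
comp_file = Entity("file", "Comp.java", None)
comp_class = Entity("class", "Comp", comp_file)
init_defaults = Entity("method", "initDefaults()", comp_class)

# A transaction is the set of changes submitted together.
# The second change is hypothetical and only illustrates the shape.
transaction = frozenset({
    Change("alter", init_defaults),
    Change("alter", Entity("field", "fKeys[]", comp_class)),
})
```

Frozen dataclasses are hashable, so changes can be stored in sets and compared by value, which is what set-based rule mining needs.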

slide-9
SLIDE 9

Getting Syntactic Entities

slide-10
SLIDE 10

Light-Weight Analysis with ROSE

slide-11
SLIDE 11

Light-Weight Analysis with ROSE

ROSE analyzes C/C++, JAVA, PYTHON, TEX and TEXINFO files

We get modified methods, variables and subsections

slide-12
SLIDE 12

Changes -> Transactions -> Rules

 ROSE retrieves changes and transactions from CVS [Berliner’90]

 CVS provides only file versioning

 Per-file changes are grouped into transactions

 Files -> Transactions: sliding window approach [Fogel’02]

 Two subsequent changes by the same author, at most 200 seconds apart, belong to the same transaction

 Branches and merges in CVS

 ROSE ignores changes that affect more than 30 entities
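
The sliding-window grouping can be sketched as follows. This is an illustration under stated assumptions, not ROSE's implementation: the FileChange record and the sample log are hypothetical, and the sketch uses only the two criteria named on the slide (same author, 200-second gap).

```python
from collections import namedtuple

# Hypothetical record of one per-file CVS check-in (timestamp in seconds).
FileChange = namedtuple("FileChange", "author timestamp path")

WINDOW_SECONDS = 200  # maximum gap between subsequent changes, from the slide


def group_into_transactions(changes, window=WINDOW_SECONDS):
    """Group per-file changes into transactions with a sliding time window."""
    transactions = []
    last = {}  # author -> (timestamp of that author's latest change, its transaction)
    for change in sorted(changes, key=lambda c: c.timestamp):
        previous = last.get(change.author)
        if previous is not None and change.timestamp - previous[0] <= window:
            current = previous[1]  # still inside the window: same transaction
        else:
            current = []           # gap too large or new author: new transaction
            transactions.append(current)
        current.append(change)
        # The window slides: the gap is measured from the latest change.
        last[change.author] = (change.timestamp, current)
    return transactions


log = [
    FileChange("alice", 0, "Comp.java"),
    FileChange("alice", 150, "plugin.properties"),
    FileChange("bob", 160, "README"),
    FileChange("alice", 300, "Other.java"),  # 150 s after alice's previous change
    FileChange("alice", 900, "Late.java"),
]
transactions = group_into_transactions(log)
```

Because the window slides, the change at 300 seconds joins the transaction that started at 0 seconds even though the two are more than 200 seconds apart.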

slide-13
SLIDE 13

Changes -> Transactions -> Rules

 Rules are mined from transactions

 Rules are mined with the Apriori algorithm [Agrawal’94]

 The generated rules have the form:

antecedent(s) => consequent(s)

 The rules have a probabilistic interpretation

Evidence: support count (# of transactions) and confidence (the strength of the correspondence)
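
The two evidence measures can be computed directly from a list of transactions. The sketch below shows only the measures, not the Apriori algorithm itself; the function names and the toy history are made up for illustration.

```python
def support_count(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = frozenset(items)
    return sum(1 for t in transactions if items <= t)


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing the antecedent that also contain the consequent."""
    antecedent = frozenset(antecedent)
    both = antecedent | frozenset(consequent)
    antecedent_count = support_count(transactions, antecedent)
    if antecedent_count == 0:
        return 0.0  # no evidence at all for this antecedent
    return support_count(transactions, both) / antecedent_count


# Toy history: each transaction is the set of entities changed together.
history = [
    frozenset({"fKeys[]", "initDefaults()", "plugin.properties"}),
    frozenset({"fKeys[]", "initDefaults()"}),
    frozenset({"fKeys[]", "README"}),
    frozenset({"README"}),
]

rule_support = support_count(history, {"fKeys[]", "initDefaults()"})
rule_confidence = confidence(history, {"fKeys[]"}, {"initDefaults()"})
```

In this toy history the rule fKeys[] => initDefaults() has a support count of 2 and a confidence of 2/3: fKeys[] changed in three transactions, and initDefaults() changed with it in two of them.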

slide-14
SLIDE 14

Evolutionary Coupling

slide-15
SLIDE 15

Evolutionary Coupling

slide-16
SLIDE 16

Evolutionary Coupling

Support: How much evidence (= simultaneous changes)?

Confidence: How relevant is the coupling for the participants?

slide-17
SLIDE 17

Evolutionary Coupling

Support: How much evidence (= simultaneous changes)?

Confidence: How relevant is the coupling for the participants?

slide-18
SLIDE 18

Applying Rules

 The programmer performs a change (“a situation”)

 ROSE suggests further changes by applying matching rules

 A rule matches when its antecedent equals the situation

 The suggestion is the union of the consequents of all matching rules

 The number of rules depends on the support count and confidence
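
Applying rules to a situation can be sketched as below. The Rule record, the threshold defaults, and the sample rules are hypothetical, and ranking the suggestions by confidence is an assumption of this sketch rather than something the slide states.

```python
from collections import namedtuple

# Hypothetical rule record: antecedent and consequent are frozensets of entity names.
Rule = namedtuple("Rule", "antecedent consequent support confidence")


def suggest(rules, situation, min_support=1, min_confidence=0.5):
    """Union of the consequents of all matching rules that pass the thresholds."""
    situation = frozenset(situation)
    suggestions = {}
    for rule in rules:
        if rule.antecedent != situation:
            continue  # a rule matches only if its antecedent equals the situation
        if rule.support < min_support or rule.confidence < min_confidence:
            continue  # too little evidence
        for entity in rule.consequent - situation:
            best = suggestions.get(entity, 0.0)
            suggestions[entity] = max(best, rule.confidence)
    # Assumed ranking: highest confidence first, name as tie-breaker.
    return sorted(suggestions, key=lambda e: (-suggestions[e], e))


rules = [
    Rule(frozenset({"fKeys[]"}), frozenset({"initDefaults()"}), 7, 0.9),
    Rule(frozenset({"fKeys[]"}), frozenset({"plugin.properties"}), 4, 0.6),
    Rule(frozenset({"fKeys[]"}), frozenset({"README"}), 1, 0.1),
    Rule(frozenset({"README"}), frozenset({"INSTALL"}), 3, 0.8),
]

print(suggest(rules, {"fKeys[]"}))
```

Raising min_support or min_confidence shrinks the set of applicable rules, which is the trade-off the last bullet refers to.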

slide-19
SLIDE 19

Multi-Dimensional Rules

 If something is added to the software, there is no way to predict the change based on history

 E.g., the developer adds a “Foo” constant to Comp.java

 ROSE can do that in the “operation” dimension

slide-20
SLIDE 20

Examples of Rules

 GCC arrays that define the cost of different assembler operations for INTEL CPUs

 The arrays have been altered 9 times; 9 out of 11 times, the change is triggered by a change in the type:

slide-21
SLIDE 21

Examples of Rules

 Python and C files: detecting evolutionary couplings in different programming languages

 It would require cross-language program analysis to detect this coupling

slide-22
SLIDE 22

Examples of Rules

 POSTGRES documentation

slide-23
SLIDE 23

ROSE Server and Client

 The ROSE server determines coupling and rules

 The ROSE client guides the programmer along related changes

slide-24
SLIDE 24

Evaluation

 How good are rules at predicting changes?

 Training period: ROSE infers rules from the past

 Evaluation period: ROSE applies the mined rules

 In the evaluation period, every transaction T is checked:

 Navigation: given one change in T, does ROSE point to further changes in T?

 Error Prevention: given all but one change from T, does ROSE point to the missing change?

 Closure: given all changes of T, does ROSE stay silent?
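
The three checks can be expressed as (query, expected result) pairs derived from a transaction T. This is a sketch of the setup as described on the slide; the function names are hypothetical.

```python
def navigation_queries(transaction):
    """One query per entity: given that single change, expect the rest of T."""
    t = frozenset(transaction)
    return [(frozenset({e}), t - {e}) for e in sorted(t)]


def prevention_queries(transaction):
    """One query per entity: given all changes but that one, expect the missing change."""
    t = frozenset(transaction)
    return [(t - {e}, frozenset({e})) for e in sorted(t)]


def closure_query(transaction):
    """Given all changes of T, the expected result is empty (ROSE should stay silent)."""
    t = frozenset(transaction)
    return (t, frozenset())


T = {"a", "b", "c"}
nav = navigation_queries(T)
prev = prevention_queries(T)
closure = closure_query(T)
```

A transaction with n changes yields n navigation queries, n prevention queries, and one closure query.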

slide-25
SLIDE 25

Evaluating Additional Questions

 Granularity

 Files and functions

 Maintenance

 No additions or deletions

 Multiple Dimensions

 What is the benefit of add_to and del_from?

 History

 How much history? Usefulness over time? Quality of recommendations depending on the development cycle and releases?

 Recent Changes

 Relevance of old changes

slide-26
SLIDE 26

Projects Used for Evaluation

slide-27
SLIDE 27

Precision vs. Recall

 Recall: How many relevant entities are returned?

 Precision: How many of the returned entities are relevant?
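
Both measures follow directly from the set of returned entities and the set of relevant ones. In this minimal sketch the function name is hypothetical, and returning 1.0 for an empty set is a convention assumed here, not something stated on the slide.

```python
def precision_recall(returned, relevant):
    """Precision and recall of a set of suggestions against the expected entities."""
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant
    # Assumed convention: an empty set counts as perfect rather than undefined.
    precision = len(hits) / len(returned) if returned else 1.0
    recall = len(hits) / len(relevant) if relevant else 1.0
    return precision, recall


# Four suggestions, two of them among the three entities that actually changed.
precision, recall = precision_recall({"a", "b", "c", "d"}, {"a", "b", "x"})
```

Here precision is 2/4 and recall is 2/3: half of the suggestions are correct, and two of the three relevant entities are found.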

slide-28
SLIDE 28

Precision vs. Feedback / Support Count vs. Confidence

slide-29
SLIDE 29

Results: Navigation, Prevention, Closure

slide-30
SLIDE 30

Navigation, Prevention, Closure

 The programmer has changed one single entity. Can ROSE suggest other entities that should be changed?

 The programmer has changed all necessary entities but one. Does ROSE find the missing one?

 The programmer has made all necessary changes. How often does ROSE still suggest a missing change?

slide-31
SLIDE 31

Results for Fine Granularity

slide-32
SLIDE 32

Results: Navigation

 Given one initial item, ROSE makes predictions in 66 percent of all queries

 On average, the predictions contain 33 percent of all items changed

 For those queries for which ROSE makes recommendations, in 70 percent of the cases, a correct location is within ROSE’s topmost three suggestions

slide-33
SLIDE 33

Results: Prevention and Closure

 In 3 percent of the queries where one item is missing, ROSE issues a correct warning

 A warning predicts 75 percent of the items that need to be changed

 ROSE’s warning about missing items should be taken seriously …

 Only 2 percent of all transactions cause a false alarm (!)

slide-34
SLIDE 34

Results for Coarse Granularity

slide-35
SLIDE 35

Results for Maintenance

 ROSE shows the best predictive power for changes to existing entities

slide-36
SLIDE 36

Threats to Validity

 Kinds of version histories and software projects

 8 projects; 100,000 transactions

 Transactions do not record the order

 CVS limitation

 Quality of transactions?

 User studies?

slide-37
SLIDE 37

Summary

For stable systems like GCC, ROSE gives precise suggestions (recommendations in 63% of transactions; precision of 30%; in 90% of all recommendations, the three topmost suggestions contain a correct entity)

For rapidly changing systems like KOFFICE, the most useful suggestions are at the file level (because predicting new functions is out of reach for any approach)

The predictive power of ROSE is best during maintenance phases

In about 2-7% of all erroneous transactions, ROSE correctly detects the missing change (only 2% of all transactions cause a false alarm)

ROSE detects coupling between non-program entities (e.g. docs, manuals, mappings)

slide-38
SLIDE 38

Future Work

 Taxonomies: identify patterns of changes

 Sequence rules: detect rules across multiple transactions

 Further data sources: log messages, bug databases

 Refactoring: ROSE does not recognize renamings of methods or files

 Program analysis: can improve the overall approach

 Rule presentation: visualization of rules can help

slide-39
SLIDE 39

Downloads

ROSE is publicly available as a plug-in for ECLIPSE

For details and downloads visit http://www.st.cs.uni-sb.de/softevo