SLIDE 1

Automatic Inference of Structural Changes for Matching Across Program Versions

Miryung Kim, David Notkin, Dan Grossman
Computer Science & Engineering, University of Washington

SLIDE 2

Code Matching Problem

[Figure: methods of the old version P and the new version P’]
P: Bar.Bar(), Bar.mC(int), Foo.mA(), Foo.mB(), Foo.mC(), Boo.mA(bool), Boo.mB(bool)
P’: Bar.Bar(), Bar.mC(int), Foo.mA(float), Foo.mB(float), Foo.mC(), Bar.mA(bool), Boo.mA(int), Boo.mB(int)
SLIDE 3

Our Approach: Matching with Change Rules

[Figure: method lists of P and P’, as on Slide 2, with a “Change Rules” label]
SLIDE 4

Our Approach: Matching with Change Rules

[Figure: method lists of P and P’, as on Slide 2, with a “Change Rules” label]

Change rule: all methods in the Boo class take an int argument instead of bool.
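A minimal Python sketch of what such a rule does for matching, assuming method headers are plain strings such as Boo.mA(bool): the rule is applied to every method of P, and a match is kept only when the transformed header exists in P’. The function names boo_bool_to_int and match_with_rule are hypothetical; this is an illustration, not the authors' implementation.

```python
# Method headers of the two versions, copied from the example figure.
P = ["Bar.Bar()", "Bar.mC(int)", "Foo.mA()", "Foo.mB()", "Foo.mC()",
     "Boo.mA(bool)", "Boo.mB(bool)"]
P_NEW = ["Bar.Bar()", "Bar.mC(int)", "Foo.mA(float)", "Foo.mB(float)",
         "Foo.mC()", "Bar.mA(bool)", "Boo.mA(int)", "Boo.mB(int)"]


def boo_bool_to_int(header):
    """Hypothetical rule: methods of class Boo take int instead of bool.

    Returns the transformed header, or None when the rule does not apply.
    """
    name, _, rest = header.partition("(")
    params = [p for p in rest.rstrip(")").split(",") if p]
    if not name.startswith("Boo.") or "bool" not in params:
        return None
    return name + "(" + ",".join("int" if p == "bool" else p for p in params) + ")"


def match_with_rule(old, new, rule):
    """Pair each old header with its rule image, if that image exists in new."""
    new_set = set(new)
    matches = {}
    for header in old:
        target = rule(header)
        if target is not None and target in new_set:
            matches[header] = target
    return matches


print(match_with_rule(P, P_NEW, boo_bool_to_int))
# {'Boo.mA(bool)': 'Boo.mA(int)', 'Boo.mB(bool)': 'Boo.mB(int)'}
```

Because each match is justified by the rule rather than by name similarity alone, Boo.mA(bool) is paired with Boo.mA(int) and not with the similarly named Bar.mA(bool).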

SLIDE 5

Motivations for Matching Code

  • A fundamental building block for mining software repositories
  • Also a basis for classic software evolution research and tools
    • Software version merging
    • Regression testing
    • Profile propagation
SLIDE 6

Matching Is Challenging

  • Matching is hard due to code addition and deletion, copy and paste, refactorings, etc.
  • The delta between two versions can be very large.
  • For many uses, matching results must be concise and comprehensible.
SLIDE 7

Outline

  • background
  • our rule-based matching approach
  • inference algorithm
  • evaluation
  • potential applications of change rules
SLIDE 8

Matching Problem ≈ Change Identification Problem

The problem of identifying code matches ≈ the problem of identifying changes
SLIDE 9

Existing Approaches

  • diff, Syntactic Diff (CDiff), Semantic Diff, JDiff, origin analysis, refactoring reconstruction tools, etc.
  • These approaches individually compare code elements at particular granularities using similarity measures.
SLIDE 10

Limitations of Existing Approaches

[Figure: versions P and P’]
SLIDE 11

Limitations of Existing Approaches

[Figure: method lists of P and P’, as on Slide 2]
SLIDE 12

Limitations of Existing Approaches

[Figure: method lists of P and P’, as on Slide 2]
SLIDE 13

Limitations of Existing Approaches

[Figure: method lists of P and P’, as on Slide 2]
SLIDE 14

Limitations of Existing Approaches

[Figure: method lists of P and P’, as on Slide 2]

Cannot disambiguate among many potential matches (e.g., does Boo.mA(bool) correspond to Boo.mA(int) or to Bar.mA(bool)?)
SLIDE 15

Limitations of Existing Approaches

[Figure: method lists of P and P’, as on Slide 2]

Difficult to spot inconsistent and incomplete changes (e.g., Foo.mA() and Foo.mB() gain a float parameter, but Foo.mC() does not)
SLIDE 16

Limitations of Existing Approaches

[Figure: versions P and P’]

Output is an unstructured, usually lengthy list of matches
SLIDE 17

Limitations of Existing Approaches

[Figure: versions P and P’]

Output is an unstructured, usually lengthy list of matches

  • move axis drawing classes from chart to chart.axis
  • add boolean input arg to all chart creation APIs
SLIDE 18

Limitations of Existing Approaches

[Figure: versions P and P’]

Output is an unstructured, usually lengthy list of matches
SLIDE 19

Outline

  ✓ background
  • our rule-based matching approach
  • inference algorithm
  • evaluation
  • potential applications of change rules
SLIDE 20
Our Rule-based Matching Approach

  • Our change rule can concisely describe a set of related refactorings and API changes at or above the method header level.
  • Our tool automatically infers a set of likely change rules between two versions of a program.
SLIDE 21

Our Contribution 1. Comprehensibility

[Figure: versions P and P’]

Represent a high-level change pattern using a change rule ➡ Easy to understand change intent

  • move axis drawing classes from chart to chart.axis
    for all x in chart.*Axis*.*(*) packageReplace(x, chart, chart.axis)
  • add boolean input arg to all chart creation APIs
    for all x in Factory.create*Chart(*) argAppend(x, boolean)
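The two transformations above can be read as functions on method headers. The Python sketch below models a header as a small immutable record and implements only packageReplace and argAppend; the MethodHeader type, its field names, and example headers such as chart.NumberAxis.draw(Graphics2D) are hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class MethodHeader:
    package: str                   # e.g. "chart"
    cls: str                       # class name
    name: str                      # method name
    params: Tuple[str, ...] = ()   # parameter types, in order

    def __str__(self):
        prefix = self.package + "." if self.package else ""
        return f"{prefix}{self.cls}.{self.name}({','.join(self.params)})"


def package_replace(m, old_pkg, new_pkg):
    """packageReplace(x, old, new): move x's class from old_pkg to new_pkg."""
    return replace(m, package=new_pkg) if m.package == old_pkg else m


def arg_append(m, arg_type):
    """argAppend(x, t): add a parameter of type t at the end of x's signature."""
    return replace(m, params=m.params + (arg_type,))


axis_method = MethodHeader("chart", "NumberAxis", "draw", ("Graphics2D",))
print(package_replace(axis_method, "chart", "chart.axis"))
# chart.axis.NumberAxis.draw(Graphics2D)
print(arg_append(MethodHeader("chart", "Factory", "createBarChart"), "boolean"))
# chart.Factory.createBarChart(boolean)
```

Keeping headers immutable means each transformation returns a new header, which can then be looked up directly in the set of headers of P’.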

SLIDE 22

Our Contribution 2. Conciseness

[Figure: versions P and P’ with change rules R1, R2, R3, R4, R5, R6]

Concisely represent large deltas using a small number of change rules
SLIDE 23

Our Contribution 3. High Recall

[Figure: method lists of P and P’, as on Slide 2, with marks X and O]

Find matches evidenced by a more general change pattern ➡ Improved recall
SLIDE 24

Our Contribution 4. Explicit Exceptions

[Figure: method lists of P and P’, as on Slide 2]

Our rule encodes exceptions explicitly ➡ Easy to notice inconsistent and incomplete changes

for all x in Foo.m*() except {Foo.mC()} argAppend(x, float)
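The exception can be derived mechanically: apply the transformation to every method in scope and check whether the result exists in P’. A Python sketch, assuming headers are strings and using the standard library's fnmatch.fnmatchcase as a stand-in for scope matching (* matches any characters); arg_append and evaluate_rule are hypothetical helper names.

```python
from fnmatch import fnmatchcase

# Method headers of the two versions, copied from the example figure.
P = ["Bar.Bar()", "Bar.mC(int)", "Foo.mA()", "Foo.mB()", "Foo.mC()",
     "Boo.mA(bool)", "Boo.mB(bool)"]
P_NEW = ["Bar.Bar()", "Bar.mC(int)", "Foo.mA(float)", "Foo.mB(float)",
         "Foo.mC()", "Bar.mA(bool)", "Boo.mA(int)", "Boo.mB(int)"]


def arg_append(header, arg_type):
    """Append one parameter type to a header string such as 'Foo.mA()'."""
    name, _, rest = header.partition("(")
    params = [p for p in rest.rstrip(")").split(",") if p]
    return f"{name}({','.join(params + [arg_type])})"


def evaluate_rule(old, new, scope, transform):
    """Split the old methods in scope into matches and exceptions."""
    new_set = set(new)
    matches, exceptions = {}, []
    for header in old:
        if not fnmatchcase(header, scope):
            continue                     # method is outside the rule's scope
        target = transform(header)
        if target in new_set:
            matches[header] = target
        else:
            exceptions.append(header)    # in scope, but transformed header absent
    return matches, exceptions


matches, exceptions = evaluate_rule(P, P_NEW, "Foo.m*()",
                                    lambda h: arg_append(h, "float"))
print(matches)     # {'Foo.mA()': 'Foo.mA(float)', 'Foo.mB()': 'Foo.mB(float)'}
print(exceptions)  # ['Foo.mC()']
```

The exception list is exactly what the rule's except {Foo.mC()} clause records.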

SLIDE 25

Change Rule

[Figure: versions P and P’]

for all x:method in scope transformation(x)
SLIDE 26

Scope

  • We use a regular expression, with * as a wildcard, to denote a set of methods
  • e.g. chart.Factory.create*Chart(*)
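A minimal Python sketch of scope matching, assuming * is the only special character and matches any (possibly empty) run of characters, while everything else, including dots and parentheses, is literal. The names scope_to_regex and in_scope and the example headers are hypothetical.

```python
import re


def scope_to_regex(scope):
    """Translate a scope such as 'chart.Factory.create*Chart(*)' into a regex.

    Assumption: '*' matches any (possibly empty) run of characters and every
    other character, including '.' and parentheses, is matched literally.
    """
    return re.compile(".*".join(re.escape(piece) for piece in scope.split("*")))


def in_scope(scope, header):
    return scope_to_regex(scope).fullmatch(header) is not None


headers = [                              # hypothetical method headers
    "chart.Factory.createChart()",
    "chart.Factory.createBarChart(int)",
    "chart.Factory.createLegend()",
    "chart.Plot.createChart()",
]
print([h for h in headers if in_scope("chart.Factory.create*Chart(*)", h)])
# ['chart.Factory.createChart()', 'chart.Factory.createBarChart(int)']
```

The standard library's fnmatch.fnmatchcase behaves similarly for these patterns, but it also treats ? and [...] as special, so an explicit translation keeps the semantics limited to *.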
SLIDE 27

Transformations At or Above the Level of Method Header

  • 9 types of transformations, including ones that:
    • replace the name of a package, class, or method
    • replace the return type
    • modify the input signature, etc.
SLIDE 28

Change Rule with Exceptions

[Figure: versions P and P’]

for all x:method in (scope - exceptions) transformation(x)
SLIDE 29

Example Change Rule

P: Factory.createChart(), Factory.createBarChart(), ..., Factory.createPieChart(), Factory.createLineChart()
P’: Factory.createChart(int), Factory.createBarChart(int), ..., Factory.createPieChart(), Factory.createLineChart(int)

Chart creation APIs were changed to take an additional int parameter.
SLIDE 30

Example Change Rule

P: Factory.createChart(), Factory.createBarChart(), ..., Factory.createPieChart(), Factory.createLineChart()
P’: Factory.createChart(int), Factory.createBarChart(int), ..., Factory.createPieChart(), Factory.createLineChart(int)

for all x in Factory.create*Chart(*) argAppend(x, [int])
SLIDE 31

Example Change Rule

P: Factory.createChart(), Factory.createBarChart(), ..., Factory.createPieChart(), Factory.createLineChart()
P’: Factory.createChart(int), Factory.createBarChart(int), ..., Factory.createPieChart(), Factory.createLineChart(int)

for all x in Factory.create*Chart(*) except {Factory.createPieChart()} argAppend(x, [int])
(14 matches and 1 exception)
SLIDE 32

Outline

  ✓ background
  ✓ our rule-based matching approach
  • inference algorithm
  • evaluation
  • potential applications of change rules
SLIDE 33

Inference Algorithm Overview

Input: two versions of a program
Output: a set of likely change rules

  • 1. Generate seed matches
  • 2. Generate candidate rules by generalizing seed matches
  • 3. Evaluate and select candidate rules (greedy algorithm)
SLIDE 34

Step 1: Generate Seed Matches

  • Seed matches provide hints about likely changes.
  • We generate seeds based on the textual similarity between two method headers.
  • Seed matches need not all be correct matches.

Example: Foo.getBar(int) and Foo.getBar(bool), textual similarity: 0.75
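The slide does not specify the similarity measure. One measure that reproduces 0.75 for this pair is the Dice coefficient over camel-case name tokens: Foo, get, Bar, int versus Foo, get, Bar, bool share 3 tokens out of 4 + 4, giving 2 × 3 / 8 = 0.75. The Python sketch below assumes that measure, a naive all-pairs search, and an arbitrary 0.7 threshold; all names are hypothetical.

```python
import re
from collections import Counter

# Camel-case tokenizer: 'Foo.getBar(int)' -> ['Foo', 'get', 'Bar', 'int']
TOKEN = re.compile(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])")


def similarity(a, b):
    """Dice coefficient over the multisets of name tokens (assumed metric)."""
    ta, tb = Counter(TOKEN.findall(a)), Counter(TOKEN.findall(b))
    shared = sum((ta & tb).values())
    total = sum(ta.values()) + sum(tb.values())
    return 2 * shared / total if total else 0.0


def seed_matches(old, new, threshold=0.7):
    """Naive all-pairs seeding: best-scoring new header per old header."""
    seeds = []
    for a in old:
        best = max(new, key=lambda b: similarity(a, b), default=None)
        if best is not None and similarity(a, best) >= threshold:
            seeds.append((a, best))
    return seeds


print(similarity("Foo.getBar(int)", "Foo.getBar(bool)"))  # 0.75
```

Since seeds only need to be hints, a cheap measure like this is enough; wrong seeds simply produce candidate rules that the later selection step does not pick.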

SLIDE 35

Step 2: Generate Candidate Rules for each seed [x, y]

Given a seed match [Foo.getBar(int), Boo.getBar(bool)]:
Transformations = { replaceArg(x, int, bool), replaceClass(x, Foo, Boo) }
Scopes = { *.*(*), Foo.*(*), ..., *.get*(*), *.*Bar(*), ..., Foo.get*(int), ... }
Candidate Rules = { for all x in *.*(*) replaceArg(x, int, bool), for all x in Foo.*(*) replaceClass(x, Foo, Boo), ..., for all x in *.*(*) replaceArg(x, int, bool) AND replaceClass(x, Foo, Boo) }

  • Compare x and y and reverse engineer a set of transformations, T.
  • Based on x, guess a set of scopes, S.
  • Generate candidate rules for each pair in S × PowerSet(T).
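A Python sketch of this step under simplifying assumptions of our own: headers have the form Class.method(params) (packages are ignored), transformations are detected only for a changed class, a changed method name, or same-arity parameter type changes, and scopes are formed by replacing the class, the method name (or a camel-case prefix or suffix of it), and the parameter list with *. The transformation names follow the slides; the helper functions are hypothetical.

```python
import re
from itertools import combinations, product

CAMEL = r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])"


def parse(header):
    """'Foo.getBar(int)' -> ('Foo', 'getBar', ('int',))"""
    qualified, _, rest = header.partition("(")
    cls, _, method = qualified.rpartition(".")
    params = tuple(p for p in rest.rstrip(")").split(",") if p)
    return cls, method, params


def transformations(x, y):
    """Reverse engineer header-level differences between seed x and y."""
    (cx, mx, px), (cy, my, py) = parse(x), parse(y)
    found = []
    if cx != cy:
        found.append(("replaceClass", cx, cy))
    if mx != my:
        found.append(("procedureReplace", mx, my))
    if len(px) == len(py):               # arity changes are ignored here
        found += [("replaceArg", a, b) for a, b in zip(px, py) if a != b]
    return found


def name_patterns(name):
    """The name itself, '*', and camel-case prefix/suffix wildcards."""
    parts = re.findall(CAMEL, name)
    pats = {name, "*"}
    for i in range(1, len(parts)):
        pats.add("".join(parts[:i]) + "*")
        pats.add("*" + "".join(parts[i:]))
    return pats


def scopes(x):
    cls, method, params = parse(x)
    return {f"{c}.{m}({p})"
            for c, m, p in product(name_patterns(cls), name_patterns(method),
                                   {",".join(params), "*"})}


def candidate_rules(x, y):
    """Pair every guessed scope with every non-empty subset of T."""
    t = transformations(x, y)
    subsets = [s for k in range(1, len(t) + 1) for s in combinations(t, k)]
    return [(s, ts) for s in scopes(x) for ts in subsets]


seed = ("Foo.getBar(int)", "Boo.getBar(bool)")
print(transformations(*seed))
# [('replaceClass', 'Foo', 'Boo'), ('replaceArg', 'int', 'bool')]
print(len(scopes(seed[0])), len(candidate_rules(*seed)))  # 16 48
```

For the slide's seed this yields 2 transformations, 16 scopes (including *.*(*), Foo.*(*), *.get*(*), *.*Bar(*) and Foo.get*(int)) and 16 × 3 = 48 candidate rules; the sketch skips the empty subset of T, which would describe no change.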

SLIDE 36

Step 3: Evaluate and Select Rules

  • Greedily select a small subset of candidate rules that explain a large number of matches.
  • In each iteration:
    • evaluate all candidate rules
    • select the valid rule with the largest number of matches
    • exclude the matched methods from the set of remaining unmatched methods
  • Repeat until no rule can find any additional matches.
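A Python sketch of the greedy loop under a simplifying assumption: every candidate rule has already passed the validity check and is represented by the set of (old, new) matches it would produce. The rule strings and match sets are hypothetical, built from the P/P’ example; ties are broken by dictionary order.

```python
def greedy_select(candidates):
    """Greedily pick rules that explain the most not-yet-matched old methods.

    candidates: dict mapping a rule description to the set of (old, new)
    matches it produces. Ties are broken by dictionary order.
    """
    remaining = dict(candidates)
    matched = set()                      # old methods already explained
    selected = []

    def gain(rule):
        # Matches this rule would add, ignoring already-matched old methods.
        return {(a, b) for a, b in remaining[rule] if a not in matched}

    while remaining:
        best = max(remaining, key=lambda r: len(gain(r)))
        new_matches = gain(best)
        if not new_matches:              # no rule finds additional matches
            break
        selected.append((best, sorted(new_matches)))
        matched.update(a for a, _ in new_matches)
        del remaining[best]
    return selected


candidates = {
    "for all x in Boo.*(bool) replaceArg(x, bool, int)":
        {("Boo.mA(bool)", "Boo.mA(int)"), ("Boo.mB(bool)", "Boo.mB(int)")},
    "for all x in Foo.m*() except {Foo.mC()} argAppend(x, float)":
        {("Foo.mA()", "Foo.mA(float)"), ("Foo.mB()", "Foo.mB(float)")},
    "for all x in Boo.mA(bool) replaceClass(x, Boo, Bar)":
        {("Boo.mA(bool)", "Bar.mA(bool)")},
}
for rule, found in greedy_select(candidates):
    print(rule, found)
```

In this run the first two rules are selected; the third candidate, which would pair Boo.mA(bool) with Bar.mA(bool), adds nothing once Boo.mA(bool) is explained, so the loop stops.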
SLIDE 37

Finding Exceptions

P: Factory.createChart(), Factory.createBarChart(), Factory.createPieChart(), Factory.createLineChart()
P’: Factory.createChart(int), Factory.createBarChart(int), Factory.createPieChart(), Factory.createLineChart(int)

for all x in Factory.create*Chart(*) argAppend(x, [int])

A rule is valid if # exceptions < ε × |scope|.
SLIDE 38

Finding Exceptions

P: Factory.createChart(), Factory.createBarChart(), Factory.createPieChart(), Factory.createLineChart()
P’: Factory.createChart(int), Factory.createBarChart(int), Factory.createPieChart(), Factory.createLineChart(int)

for all x in Factory.create*Chart(*) except {Factory.createPieChart()} argAppend(x, [int])
(3 matches, 1 exception)

A rule is valid if # exceptions < ε × |scope|.
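As a worked instance of the validity test: the scope contains the four methods createChart(), createBarChart(), createPieChart() and createLineChart(), and createPieChart() is the only exception. The slide does not state ε, so the derivation below only shows which values of ε accept this rule.

```latex
\#\text{exceptions} < \varepsilon \times |\text{scope}|
\quad\text{with}\quad \#\text{exceptions} = 1,\; |\text{scope}| = 4:
\qquad 1 < 4\varepsilon \iff \varepsilon > \tfrac{1}{4}
```

For instance, a hypothetical ε = 0.3 accepts the rule (1 < 1.2), while ε = 0.25 rejects it (1 < 1 is false).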

SLIDE 39

Optimizations

  • We create and evaluate rules on demand, exploiting two properties:
    • 1. Candidate rules have a subsumption structure, e.g. *.*.*(*Axis) ⊂ *.*.*(*)
    • 2. The nature of the greedy algorithm
  • Running time: a few seconds (usual check-ins), 7 minutes on average (releases)
SLIDE 40

Outline

  ✓ background
  ✓ our rule-based matching approach
  ✓ inference algorithm
  • evaluation
  • potential applications of change rules
SLIDE 41

Quantitative Evaluation

  • Precision
  • Recall
  • Conciseness = |Matches| / |Rules| (M/R ratio)
  • We created evaluation data sets by manually inspecting our results combined with the results from other tools.
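The three measures can be computed directly from sets of matches. A Python sketch, assuming matches are (old, new) header pairs, that the correct set comes from the manually built evaluation data, and that M/R counts the found matches; the example pairs are hypothetical.

```python
def evaluate(found, correct, num_rules):
    """Precision, recall, and M/R ratio for a set of found matches.

    found, correct: sets of (old_header, new_header) pairs.
    num_rules: number of change rules that produced `found`.
    """
    true_positives = len(found & correct)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(correct) if correct else 0.0
    mr_ratio = len(found) / num_rules if num_rules else 0.0
    return precision, recall, mr_ratio


found = {("A.f()", "A.f(int)"), ("A.g()", "A.g(int)"), ("B.h()", "C.h()")}
correct = {("A.f()", "A.f(int)"), ("A.g()", "A.g(int)"),
           ("B.h()", "B.h(int)"), ("D.k()", "D.k(int)")}
print(evaluate(found, correct, num_rules=2))
# (0.6666666666666666, 0.5, 1.5)
```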

SLIDE 42

Rule-based Matching Results for Three Release Archives

Median (Min ~ Max)  JFreeChart            jHotDraw              jEdit
                    (17 release pairs)    (4 release pairs)     (4 release pairs)
Precision           94% (78~100%)         99% (82~100%)         93% (87~95%)
Recall              93% (70~100%)         99% (92~100%)         98% (95~100%)
M/R ratio           3.50 (1.20~135.23)    2.54 (1.00~244.26)    1.73 (1.23~2.39)
SLIDE 43

Rule-based Matching Results for Three Release Archives

[Figure: two plots against the percentage of found rules (0% to 100%): precision and recall of found matches, and recall of found matches, for JFreeChart 0.9.8-0.9.9, JHotDraw 5.3-5.41, and jEdit 4.0-4.1]

Top 20% of the rules find over 55% of the matches. Top 40% of the rules find over 70% of the matches.
SLIDE 44

Comparison with Three Existing Tools

  • UMLDiff [Xing and Stroulia 05]
  • Refactoring Reconstruction [Weißgerber and Diehl 06]
  • Automatic Renaming Identification [S. Kim, Pan, and Whitehead 05]
SLIDE 45

Comparison: Recall & Precision

Programs                          Other’s Recall  Our Recall  Other’s Prec.  Our Prec.
[XS05]  JFreeChart (18 releases)  92%             98%         99%            97%
[WD06]  jEdit (2715 check-ins)    72%             96%         93%            98%
        Tomcat (5096 check-ins)   82%             89%         89%            93%
[KPW05] jEdit (1189 check-ins)    70%             96%         98%            96%
        ArgoUML (4683 check-ins)  82%             95%         98%            94%
SLIDE 46

Comparison: Recall & Precision

[Table as on Slide 45]

6 to 26 percentage points higher recall with roughly the same precision
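The recall and precision differences behind this summary can be recomputed from the table on Slide 45. A small Python check; the dictionary keys are our own row labels, and the numbers are copied from the table.

```python
# (other tool's value, our value) in percent, copied from the table.
recall = {
    "JFreeChart (18 releases)": (92, 98),
    "jEdit (2715 check-ins)": (72, 96),
    "Tomcat (5096 check-ins)": (82, 89),
    "jEdit (1189 check-ins)": (70, 96),
    "ArgoUML (4683 check-ins)": (82, 95),
}
precision = {
    "JFreeChart (18 releases)": (99, 97),
    "jEdit (2715 check-ins)": (93, 98),
    "Tomcat (5096 check-ins)": (89, 93),
    "jEdit (1189 check-ins)": (98, 96),
    "ArgoUML (4683 check-ins)": (98, 94),
}

# Differences in percentage points (ours minus theirs).
recall_gain = {k: ours - other for k, (other, ours) in recall.items()}
precision_gain = {k: ours - other for k, (other, ours) in precision.items()}
print(min(recall_gain.values()), max(recall_gain.values()))        # 6 26
print(min(precision_gain.values()), max(precision_gain.values()))  # -4 5
```

Recall gains range from 6 to 26 percentage points, while precision differs by -4 to +5 points.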

SLIDE 47

Comparison: Conciseness

Programs                          Other’s Results     Our Results  Our Improvement
[XS05]  JFreeChart (18 releases)  4004 refactorings   939 rules    77% decrease in size
[WD06]  jEdit (2715 check-ins)    1218 refactorings   906 rules    26% decrease in size
        Tomcat (5096 check-ins)   2700 refactorings   1033 rules   62% decrease in size
[KPW05] jEdit (1189 check-ins)    1430 matches        1119 rules   22% decrease in size
        ArgoUML (4683 check-ins)  3819 matches        2127 rules   44% decrease in size
SLIDE 48

Comparison: Conciseness

[Table as on Slide 47]

22 to 77% reduction in the size of matching results
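Each "decrease in size" figure on Slide 47 is consistent with 1 minus the ratio of our rule count to the other tool's result count. A small Python check; the row labels are ours and the numbers are copied from the table.

```python
# (other tool's result size, our number of rules), copied from the table.
sizes = {
    "JFreeChart (18 releases)": (4004, 939),
    "jEdit (2715 check-ins)": (1218, 906),
    "Tomcat (5096 check-ins)": (2700, 1033),
    "jEdit (1189 check-ins)": (1430, 1119),
    "ArgoUML (4683 check-ins)": (3819, 2127),
}

# Decrease in size = 1 - (our rules / their refactorings or matches).
reductions = {k: 1 - ours / theirs for k, (theirs, ours) in sizes.items()}
for k, r in reductions.items():
    print(f"{k}: {r:.0%} decrease")
```

The printed values are 77%, 26%, 62%, 22% and 44%, matching the table and giving the 22 to 77% range.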
SLIDE 49

Outline

  ✓ background
  ✓ our rule-based matching approach
  ✓ inference algorithm
  ✓ evaluation
  • potential applications of change rules
    • bug finding, documentation assistant, API catch-up, API evolution analysis, etc.
SLIDE 50

Potential App: Bug Finding Tool

for all x in J*.addTitle(Title) except {JThermometer.addTitle(Title)} procedureReplace(x, addTitle, addSubtitle)

P: JFreeChart.addTitle, JThermometer.addTitle, JLineChart.addTitle, JPieChart.addTitle, ...
P’: JFreeChart.addSubtitle, JThermometer.addSubitle, JLineChart.addSubtitle, JPieChart.addSubtitle, ...

Dynamic dispatching of JFreeChart.addSubtitle does not work properly: in P’, JThermometer’s method is named addSubitle (misspelled), which is why the rule lists JThermometer.addTitle(Title) as an exception.
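A rule's exceptions can be turned into warnings. The Python sketch below uses the headers from the slide, with the (Title) parameter taken from the rule's scope, flags every in-scope method whose expected renamed header is missing from P’, and lists same-class headers that do exist in P’ as hints. The reporting format and the helper names are hypothetical, and fnmatch.fnmatchcase stands in for scope matching.

```python
from fnmatch import fnmatchcase

# Headers from the slide; the (Title) parameter is taken from the rule's scope.
P = ["JFreeChart.addTitle(Title)", "JThermometer.addTitle(Title)",
     "JLineChart.addTitle(Title)", "JPieChart.addTitle(Title)"]
P_NEW = ["JFreeChart.addSubtitle(Title)", "JThermometer.addSubitle(Title)",
         "JLineChart.addSubtitle(Title)", "JPieChart.addSubtitle(Title)"]


def procedure_replace(header, old_name, new_name):
    """procedureReplace(x, old, new): rename the method, keep class and params."""
    qualified, paren, rest = header.partition("(")
    cls, _, name = qualified.rpartition(".")
    if name != old_name:
        return header
    return f"{cls}.{new_name}{paren}{rest}"


def suspicious_exceptions(old, new, scope, old_name, new_name):
    """Report in-scope methods whose renamed header is missing from `new`."""
    new_set = set(new)
    report = []
    for header in old:
        if not fnmatchcase(header, scope):
            continue
        expected = procedure_replace(header, old_name, new_name)
        if expected not in new_set:
            cls = header.partition("(")[0].rpartition(".")[0]
            hints = [h for h in new if h.startswith(cls + ".")]  # same class
            report.append((header, expected, hints))
    return report


for header, expected, hints in suspicious_exceptions(
        P, P_NEW, "J*.addTitle(Title)", "addTitle", "addSubtitle"):
    print(f"{header}: expected {expected}, found {hints}")
# JThermometer.addTitle(Title): expected JThermometer.addSubtitle(Title), found ['JThermometer.addSubitle(Title)']
```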

SLIDE 51

Conclusions

  • Matching is a basis for a variety of software engineering research and tools.
  • Our approach is the first to automatically infer structural changes and concisely represent them as a set of change rules.
  • Our tool finds matches with high precision and recall.
SLIDE 52

Acknowledgment

David Notkin

University of Washington

Dan Grossman

University of Washington

Sunghun Kim, Jim Whitehead

University of California, Santa Cruz

Peter Weißgerber, Stephan Diehl

University of Trier

Zhenchang Xing, Eleni Stroulia

University of Alberta