Semantic Slicing of Software Version Histories
Yi Li / U Toronto Julia Rubin / MIT Marsha Chechik / U Toronto ASE 2015 / Lincoln, NE
Motivation
v1.3.8 (Feb, 2015): release [1.3.8]
  make 'groovy.sandbox.blacklist' append-only
  avoid NullPointerException if optional Groovy jar is removed
  updated docs to use v1.3.6 as current
  prepare for next development iteration (1.3.7-SNAPSHOT)
  make groovy sandbox method blacklist dynamically additive
  …
v1.3.6 (Nov, 2014): release [1.3.6]
30 authors, 67 commits, 87 files changed
base target
Options?
Existing version control tools:
inaccurate!
  // comment
  int boo1() {
+ {return (new Bar()).y;}
  }
  class Bar {
+   int y = 0;
    static int bar1(int x) {return x - 1;}
Exploit existing artifacts:
and semantics
History: sequence of commits
+ Criterion: set of tests
Sub-history:
  well-formed: compiles
  semantics-preserving: passes the tests
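The definition above can be read as a checkable property. The following is a minimal sketch, not CSlicer's interface: commits are represented only by identifiers, and the two predicates `compiles` and `passesTests` are hypothetical stand-ins for building the program and running the criterion tests. The commit names C1 to C5 and the toy oracles in `main` are illustrative assumptions.

```java
import java.util.List;
import java.util.function.Predicate;

final class SliceCheck {
    /** True if candidate keeps the relative order of the commits in history. */
    static boolean isSubHistory(List<String> history, List<String> candidate) {
        int pos = 0;
        for (String commit : history) {
            if (pos < candidate.size() && candidate.get(pos).equals(commit)) {
                pos++;
            }
        }
        return pos == candidate.size();
    }

    /** A sub-history is a semantic slice if it is well-formed and passes the tests. */
    static boolean isSemanticSlice(List<String> history, List<String> candidate,
                                   Predicate<List<String>> compiles,
                                   Predicate<List<String>> passesTests) {
        return isSubHistory(history, candidate)
                && compiles.test(candidate)
                && passesTests.test(candidate);
    }

    public static void main(String[] args) {
        List<String> history = List.of("C1", "C2", "C3", "C4", "C5");
        // Toy oracles (assumptions): C5 reads a field that C4 introduces,
        // and the test needs the changes made by C2 and C5.
        Predicate<List<String>> compiles = h -> !h.contains("C5") || h.contains("C4");
        Predicate<List<String>> passes = h -> h.contains("C2") && h.contains("C5");
        System.out.println(isSemanticSlice(history, List.of("C2", "C4", "C5"), compiles, passes));
        System.out.println(isSemanticSlice(history, List.of("C2", "C5"), compiles, passes));
    }
}
```

With these toy oracles the first candidate is accepted and the second is rejected, because it would not compile.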
v1.0:
class A {
  int g() {return 0;}
}
class B {
  static int f(int x) {return x + 1;}
}

C1:
  class A {
+   // comment
    int g() {return 0;}

C2:
  class B {
    static int f(int x)
-     {return x + 1;}
+     {return x - 1;}
  }

C3:
    // comment
    int g()
-     {return 0;}
+     {return (new B()).y;}
  }
  class B {
+   int y = 0;
    static int f(int x) {return x - 1;}

C4:
  class A {
+   int x;
    // comment
    int g()

C5:
  class A {
    int x;
+   int g()
+     {return B.f(x);}
    // comment
    int h()

v1.1:
class A {
  int x;
  int g() {return B.f(x);}
  // comment
  int h() {return (new B()).y;}
}
class B {
  int y = 0;
  static int f(int x) {return x - 1;}
}

Test case: a.g()==-1
class TestA {
  public void t1() {
    A a = new A();
    assertEquals(-1, a.g());
  }
}

Program obtained without C3, which still passes the test:
class A {
  int x;
  int g() {return B.f(x);}
  // comment
  int h() {return 0;}
}
class B {
  static int f(int x) {return x - 1;}
}
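The running example can be checked directly. The sketch below reproduces the program that results from leaving out C3 and runs the test case against it. The JUnit call `assertEquals` is replaced by a plain check so that the block is self-contained; the annotations naming the commits are a reading of the diffs, not tool output.

```java
// Program built from C1, C2, C4 and C5 only (C3 is left out).
class A {
    int x;                        // added by C4; an int field defaults to 0
    int g() { return B.f(x); }    // added by C5
    // comment
    int h() { return 0; }         // body untouched because C3 is not applied
}

class B {
    static int f(int x) { return x - 1; }   // updated by C2
}

final class TestA {
    /** The test case of the deck: a.g() must equal -1. */
    static void t1() {
        A a = new A();
        if (a.g() != -1) {
            throw new AssertionError("expected -1 but got " + a.g());
        }
    }

    public static void main(String[] args) {
        t1();
        System.out.println("t1 passed");
    }
}
```

Since `x` is 0 and `f` subtracts 1, `g()` evaluates to -1 without the field `B.y` or the changed body of `h()`.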
Dependency Types (with definitions and examples)
Functional: required for maintaining the semantic behaviours (e.g., passing the same tests). Example: C2, which updates the body of f(int).
Compilation: required for maintaining the well-formedness of the program (e.g., being free from compilation errors). Example: C4, which inserts the field int x.
Hunk: specific to text-based version control systems (e.g., Git). Example: C1, which inserts the line "// comment".

Dependency Hierarchy
Functional Core: Correctness
Structural Glue Code: Well-formedness
Textual Contexts: Applicability
CSlicer Overview
Input: history H = p0, …, pk and tests T = t1, …, tm
Compute: Λ and Π
AST Diff: ∆i between pi-1 and pi
Slicing core: maps (Λ, Π, ∆i) to ∆i’
Output: ∆1’, …, ∆k’, which form the sliced history H’
Simplified language model:
algorithmic extensions
Compare two abstract syntax trees: Pi-1 and Pi
Pi-1 nodes: foo B A h() f(int)
Pi nodes: foo B A y:int h() f(int)
∆i: Ins(y:int,B), Upd(A.f(int))
Edit Operations:
+ Ins((x,n,v),y)
* Upd(x,v)
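To make the edit operations concrete, here is a minimal sketch of a diff over a flattened program model. It assumes each version is a map from a qualified entity name to its source text, which is much simpler than a real AST differencing algorithm; the class name `AstDiffSketch` and the `Del` output are illustrative choices, and the sample data encodes a change like C3 of the running example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class AstDiffSketch {
    /** Lists the Ins, Upd and Del operations that turn `before` into `after`. */
    static List<String> diff(Map<String, String> before, Map<String, String> after) {
        List<String> ops = new ArrayList<>();
        // Sorted iteration keeps the result deterministic.
        for (Map.Entry<String, String> e : new TreeMap<>(after).entrySet()) {
            String old = before.get(e.getKey());
            if (old == null) {
                ops.add("Ins(" + e.getKey() + ")");      // entity is new
            } else if (!old.equals(e.getValue())) {
                ops.add("Upd(" + e.getKey() + ")");      // entity changed its text
            }
        }
        for (String name : new TreeMap<>(before).keySet()) {
            if (!after.containsKey(name)) {
                ops.add("Del(" + name + ")");            // entity disappeared
            }
        }
        return ops;
    }

    public static void main(String[] args) {
        Map<String, String> before = Map.of(
                "A.g()", "return 0;",
                "B.f(int)", "return x - 1;");
        Map<String, String> after = Map.of(
                "A.g()", "return (new B()).y;",
                "B.f(int)", "return x - 1;",
                "B.y", "int y = 0;");
        System.out.println(diff(before, after));   // prints [Upd(A.g()), Ins(B.y)]
    }
}
```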
Functional Set:
during test execution

Pk:
class A {
  int x;
  int g() {return B.f(x);}
  // comment
  int h() {return (new B()).y;}
}
class B {
  int y = 0;
  static int f(int x) {return x - 1;}
}

Test case: a.g()==-1
AST nodes of Pk: foo B A x:int y:int h() g() f(int)
Compilation Set:
functional set
Inference Rules:
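The inference rules themselves did not survive in this transcript, so the following is only a sketch under an assumption: the compilation set is taken to be the functional set closed under a "needed for compilation" relation. The relation `refs` is hand-written for the program Pk of the running example and is not tool output; the functional set {A.g(), B.f(int)} is the set of entities executed by the test a.g()==-1.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class CompilationSetSketch {
    /** Closes `functional` under the "needed for compilation" relation `refs`. */
    static Set<String> compilationSet(Set<String> functional, Map<String, List<String>> refs) {
        Set<String> result = new TreeSet<>(functional);
        Deque<String> work = new ArrayDeque<>(functional);
        while (!work.isEmpty()) {
            String entity = work.pop();
            for (String needed : refs.getOrDefault(entity, List.of())) {
                if (result.add(needed)) {   // first time we see this entity
                    work.push(needed);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed reference structure of Pk: each entity needs its enclosing
        // class and whatever its body mentions.
        Map<String, List<String>> refs = Map.of(
                "A.g()", List.of("A", "A.x", "B.f(int)"),
                "A.h()", List.of("A", "B", "B.y"),
                "A.x", List.of("A"),
                "B.f(int)", List.of("B"),
                "B.y", List.of("B"));
        Set<String> functional = Set.of("A.g()", "B.f(int)");
        System.out.println(compilationSet(functional, refs));
        // prints [A, A.g(), A.x, B, B.f(int)]
    }
}
```

Under this assumption A.h() and B.y stay outside the compilation set, because nothing executed by the test refers to them.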
Change Matrix: maps atomic changes to commits
[Figure: change matrix over the commits C1 to C5 and the entities // comment, B.f(int), B.y, A.h(), A.x and A.g(); cells are marked + (Ins) or * (Upd) and classified as Functional or Compilation]
General Slicing Rules:
affecting method lookup
[Figure: change matrix over the atomic changes δ1, δ2, δ4, δ5 and the commits C1, C3, C4, C5; legend: + Ins, - Del, * Upd; cells classified as Functional or Compilation]
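Only fragments of the slicing rules are legible here, so the sketch below uses an assumed, simplified rule: an update is kept when its entity is in the functional set, and an insertion is kept when its entity is in the compilation set. Deletions and the method-lookup rule are left out. The assignment of atomic changes to C1 through C5 is a reading of the running example's diffs, and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class SliceCoreSketch {
    /** One atomic change: op is "Ins" or "Upd", entity is the changed node. */
    static final class Change {
        final String op;
        final String entity;
        Change(String op, String entity) { this.op = op; this.entity = entity; }
    }

    /** Assumed rule: keep Upd of functional entities and Ins of compilation-set entities. */
    static boolean needed(Change c, Set<String> functional, Set<String> compilation) {
        if (c.op.equals("Upd")) {
            return functional.contains(c.entity);
        }
        return c.op.equals("Ins") && compilation.contains(c.entity);
    }

    /** Returns the commits that contain at least one needed change, in history order. */
    static List<String> slice(Map<String, List<Change>> history,
                              Set<String> functional, Set<String> compilation) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, List<Change>> commit : history.entrySet()) {
            for (Change c : commit.getValue()) {
                if (needed(c, functional, compilation)) {
                    kept.add(commit.getKey());
                    break;
                }
            }
        }
        return kept;
    }

    /** The running example, read off the diffs C1..C5 (an assumption). */
    static Map<String, List<Change>> example() {
        Map<String, List<Change>> h = new LinkedHashMap<>();
        h.put("C1", List.of(new Change("Ins", "// comment")));
        h.put("C2", List.of(new Change("Upd", "B.f(int)")));
        h.put("C3", List.of(new Change("Upd", "A.h()"), new Change("Ins", "B.y")));
        h.put("C4", List.of(new Change("Ins", "A.x")));
        h.put("C5", List.of(new Change("Ins", "A.g()")));
        return h;
    }

    public static void main(String[] args) {
        Set<String> functional = Set.of("A.g()", "B.f(int)");
        Set<String> compilation = Set.of("A", "B", "A.g()", "A.x", "B.f(int)");
        System.out.println(slice(example(), functional, compilation));   // prints [C2, C4, C5]
    }
}
```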
Side-effects (Git):
white cells
[Figure: the change matrix over the atomic changes δ1 to δ5 and the commits C1, C3, C4, C5, later extended with a commit CN; legend: + Ins, - Del, * Upd; cells classified as Functional or Compilation]
Hunk Dependency
HunkDeps(H’)
Specific to text-based version control
Applicable history slice
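One way to picture HunkDeps(H’) is as a blame lookup followed by a transitive closure. The sketch below is a simplified text model, not Git's patch machinery: a commit is reduced to the lines it adds and the context lines its hunks expect, and lines are identified by their content alone. The `Patch` class, the method names and the example data (read off the diffs C1 to C5) are assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class HunkDepsSketch {
    /** A commit reduced to the lines it adds and the context lines its hunks expect. */
    static final class Patch {
        final List<String> added;
        final List<String> context;
        Patch(List<String> added, List<String> context) {
            this.added = added;
            this.context = context;
        }
    }

    /** Commits outside `slice` whose text is needed, directly or transitively, as hunk context. */
    static Set<String> hunkDeps(Map<String, Patch> history, Set<String> slice) {
        Map<String, String> lastWriter = new HashMap<>();    // line -> commit that added it
        Map<String, Set<String>> direct = new HashMap<>();   // commit -> commits it leans on
        for (Map.Entry<String, Patch> e : history.entrySet()) {   // must be in history order
            Set<String> deps = new TreeSet<>();
            for (String line : e.getValue().context) {
                String writer = lastWriter.get(line);
                if (writer != null) {
                    deps.add(writer);
                }
            }
            direct.put(e.getKey(), deps);
            for (String line : e.getValue().added) {
                lastWriter.put(line, e.getKey());
            }
        }
        Set<String> extra = new TreeSet<>();
        Deque<String> work = new ArrayDeque<>(slice);
        while (!work.isEmpty()) {
            for (String dep : direct.getOrDefault(work.pop(), Set.of())) {
                if (!slice.contains(dep) && extra.add(dep)) {
                    work.push(dep);
                }
            }
        }
        return extra;
    }

    static Map<String, Patch> example() {
        Map<String, Patch> h = new LinkedHashMap<>();
        h.put("C1", new Patch(List.of("// comment"), List.of("class A {", "int g() {return 0;}")));
        h.put("C2", new Patch(List.of("{return x - 1;}"), List.of("static int f(int x)")));
        h.put("C3", new Patch(List.of("{return (new B()).y;}", "int y = 0;"),
                List.of("// comment", "class B {")));
        h.put("C4", new Patch(List.of("int x;"), List.of("class A {", "// comment")));
        h.put("C5", new Patch(List.of("int g()", "{return B.f(x);}"), List.of("int x;", "// comment")));
        return h;
    }

    public static void main(String[] args) {
        System.out.println(hunkDeps(example(), Set.of("C2", "C4", "C5")));   // prints [C1]
    }
}
```

In this model the semantic slice {C2, C4, C5} needs C1 for its textual context, which agrees with the running example, where the sliced program still contains the line "// comment".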
Research questions
Subjects

Project         # Java Files   LOC      # Authors
Hadoop          5,861          1,291K   169
Elasticsearch   3,865          616K     649
Maven           1,048          142K     78
CSlicer         141            18K      2
trunk feature
Feature branch
periodically
Case Study:
related to the feature
Average Reduction:
~80%!
Reduction depends on:
Large test suites
Good committing style
[Figure: bar chart of length(slice) / length(history), axis from 0.0% to 90.0%, for Had 1 to 3 (Hadoop), Elastic 1 to 3 (Elasticsearch), Maven 1 to 3 and CSlicer 1 to 3; series: SLICE(H') and HUNK]
functional & compilation set computation
effects on performance for large projects
CSlicer time breakdown: HUNK 22%, SLICE 8%, COMP 52%, FUNC 17%
Change Representation
SIGMOD’96]
Change Impact Analysis
CSlicer: history semantic slicing
comprehension; functionality transferring …
What’s next?
bitbucket.org/liyistc/gitslice
Semantic Slicing
Dependency Hierarchy
CSlicer Overview
Experiments