Untangling Composite Commits Using Program Slicing. Ward Muylaert and Coen De Roover (@wardmuylaert and @oniroi), Software Languages Lab, Vrije Universiteit Brussel. SCAM 2018, Madrid, Spain.


slide-1
SLIDE 1

Untangling Composite Commits Using Program Slicing

Ward Muylaert and Coen De Roover (@wardmuylaert and @oniroi)
Software Languages Lab, Vrije Universiteit Brussel, Brussels, Belgium
SCAM 2018, Madrid, Spain

slide-2
SLIDE 2

Two Types of Commits

  • Single-task commit
  • Composite commit

slide-3
SLIDE 3

Composite Commit Difficulties

  • Revert
  • Integrate
  • Research
  • Understand

slide-4
SLIDE 4

Prevalence of Composite Commits

  • “15%”: K. Herzig et al., “The impact of tangled code changes on defect prediction models,” Empirical Software Engineering, 2015.
  • “17% - 29%”: Y. Tao et al., “Partitioning Composite Code Changes to Facilitate Code Review,” in MSR, 2015.
  • “11% - 39%”: H. A. Nguyen et al., “Filtering Noise in Mixed-Purpose Fixing Commits to Improve Defect Prediction and Localization,” in ISSRE, 2013.

slide-5
SLIDE 5

Ideal

slide-6
SLIDE 6

Ideal

Hypothesis: Related changes affect source code from the same program slice. A commit may be decomposed using the created program slices.

slide-7
SLIDE 7

Overview of our Approach

  • Commit
  • ChangeDistiller per changed file
  • SDG per changed file
  • Slice SDG on each change
  • Group changes

slide-8
SLIDE 8

  public FigActionState() {
-   _bigPort = new FigRRect(10 + 1, 10 + 1, 90 - 2, 25 - 2, Color.cyan, Color.cyan);
+   bigPort = new FigRRect(10 + 1, 10 + 1, 90 - 2, 25 - 2, Color.cyan, Color.cyan);
-   _bigPort.setCornerRadius(_bigPort.getHalfHeight());
+   bigPort.setCornerRadius(bigPort.getHalfHeight());
-   _cover = new FigRRect(10, 10, 90, 25, Color.black, Color.white);
+   cover = new FigRRect(10, 10, 90, 25, Color.black, Color.white);
-   _cover.setCornerRadius(_cover.getHalfHeight());
+   cover.setCornerRadius(cover.getHalfHeight());
-   _bigPort.setLineWidth(0);
+   bigPort.setLineWidth(0);
-   addFig(_bigPort);
+   addFig(bigPort);
-   addFig(_cover);
+   addFig(cover);
  }

slide-9
SLIDE 9

AST tree differencing

Edit operations

  • Insert
  • Move
  • Delete
  • Update

Example: AST before commit (with node x), AST after commit (with node y)

  1. Delete x
  2. Insert y

  • R. Stevens et al., “Extracting executable transformations from distilled code changes,” in SANER, 2017.
  • B. Fluri et al., “Change distilling: Tree differencing for fine-grained source code change extraction,” IEEE Transactions on Software Engineering, 2007.

slide-10
SLIDE 10

  • UpdateSimpleName on line 2: _bigPort → bigPort
  • UpdateSimpleName on line 3: _bigPort → bigPort
  • UpdateSimpleName on line 3: _bigPort → bigPort
  • UpdateSimpleName on line 6: _bigPort → bigPort
  • UpdateSimpleName on line 7: _bigPort → bigPort
  • UpdateSimpleName on line 4: _cover → cover
  • UpdateSimpleName on line 5: _cover → cover
  • UpdateSimpleName on line 5: _cover → cover
  • UpdateSimpleName on line 8: _cover → cover

slide-11
SLIDE 11

  • TinyPDG: Y. Higo et al., “Enhancing quality of code clone detection with program dependency graph,” in Working Conference on Reverse Engineering, 2009.
  • Inter-procedural: S. Horwitz et al., “Interprocedural slicing using dependence graphs,” ACM Transactions on Programming Languages and Systems, 1990.

slide-12
SLIDE 12

FigActionState

  • _bigPort = new FigRRect()
  • _bigPort.setCR(_bigPort.getHH())
  • addFig(_bigPort)
  • _bigPort.setLW(0)
  • _cover = new FigRRect()
  • addFig(_cover)
  • _cover.setCR(_cover.getHH())

Edges: control dependence, data dependence
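Slicing such a graph amounts to a reachability walk over its dependence edges. The following is a minimal sketch, not the authors' tool: the class `SliceSketch`, its methods, and the concrete edges in `example()` are assumptions made for illustration (the slide text does not list the edges), and a backward slice is assumed as the slicing direction.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SliceSketch {
    // For each node: the nodes it depends on (control or data dependence).
    private final Map<String, List<String>> dependsOn = new HashMap<>();

    /** Records that 'target' depends on 'source'. */
    public void addDependence(String source, String target) {
        dependsOn.computeIfAbsent(target, k -> new ArrayList<>()).add(source);
    }

    /** Backward slice: the criterion plus everything it transitively depends on. */
    public Set<String> backwardSlice(String criterion) {
        Set<String> slice = new LinkedHashSet<>();
        Deque<String> worklist = new ArrayDeque<>();
        worklist.push(criterion);
        while (!worklist.isEmpty()) {
            String node = worklist.pop();
            if (slice.add(node)) { // visit each node once
                for (String dep : dependsOn.getOrDefault(node, List.of())) {
                    worklist.push(dep);
                }
            }
        }
        return slice;
    }

    /** Hypothetical edges for the FigActionState example (assumed, not from the slides). */
    public static SliceSketch example() {
        SliceSketch g = new SliceSketch();
        String entry = "FigActionState";
        String newPort = "_bigPort = new FigRRect()";
        String newCover = "_cover = new FigRRect()";
        String[] portUses = {"_bigPort.setCR(_bigPort.getHH())", "_bigPort.setLW(0)", "addFig(_bigPort)"};
        String[] coverUses = {"_cover.setCR(_cover.getHH())", "addFig(_cover)"};
        g.addDependence(entry, newPort);   // control dependence on the method entry
        g.addDependence(entry, newCover);
        for (String use : portUses) {
            g.addDependence(entry, use);   // control dependence
            g.addDependence(newPort, use); // data dependence on the assignment
        }
        for (String use : coverUses) {
            g.addDependence(entry, use);
            g.addDependence(newCover, use);
        }
        return g;
    }

    public static void main(String[] args) {
        System.out.println(example().backwardSlice("_bigPort.setLW(0)"));
    }
}
```

Under these assumed edges, the slice for `_bigPort.setLW(0)` contains only the method entry and the `_bigPort` assignment, and none of the `_cover` statements.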

slide-13
SLIDE 13

UpdateSimpleName on line 6: _bigPort → bigPort

FigActionState

  • _bigPort = new FigRRect()
  • _bigPort.setCR(_bigPort.getHH())
  • addFig(_bigPort)
  • _bigPort.setLW(0)
  • _cover = new FigRRect()
  • addFig(_cover)
  • _cover.setCR(_cover.getHH())

slide-16
SLIDE 16

  • UpdateSimpleName on line 2: _bigPort → bigPort
  • UpdateSimpleName on line 3: _bigPort → bigPort
  • UpdateSimpleName on line 3: _bigPort → bigPort
  • UpdateSimpleName on line 6: _bigPort → bigPort
  • UpdateSimpleName on line 7: _bigPort → bigPort
  • UpdateSimpleName on line 4: _cover → cover
  • UpdateSimpleName on line 5: _cover → cover
  • UpdateSimpleName on line 5: _cover → cover
  • UpdateSimpleName on line 8: _cover → cover

slide-17
SLIDE 17

  • U/L2: _bigPort → bigPort
  • U/L3: _bigPort → bigPort
  • U/L3: _bigPort → bigPort
  • U/L6: _bigPort → bigPort
  • U/L7: _bigPort → bigPort
  • U/L4: _cover → cover
  • U/L5: _cover → cover
  • U/L5: _cover → cover
  • U/L8: _cover → cover

slide-18
SLIDE 18

→ Change is in the slice associated with the other change

  • U/L2: _bigPort → bigPort
  • U/L3: _bigPort → bigPort
  • U/L3: _bigPort → bigPort
  • U/L6: _bigPort → bigPort
  • U/L7: _bigPort → bigPort
  • U/L4: _cover → cover
  • U/L5: _cover → cover
  • U/L5: _cover → cover
  • U/L8: _cover → cover

slide-19
SLIDE 19

  • U/L2: _bigPort → bigPort
  • U/L3: _bigPort → bigPort
  • U/L3: _bigPort → bigPort
  • U/L6: _bigPort → bigPort
  • U/L7: _bigPort → bigPort
  • U/L4: _cover → cover
  • U/L5: _cover → cover
  • U/L5: _cover → cover
  • U/L8: _cover → cover

→ Transitive closure: Cluster 1, Cluster 2
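Taking the transitive closure of the "is in the slice of" relation can be sketched with a union-find structure. This is an illustrative sketch rather than the authors' implementation: the class `GroupChanges`, the integer change ids, and the example pairs in `main` are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupChanges {
    /**
     * Clusters change ids 0..n-1. Each pair {a, b} in 'related' states that
     * change a lies in the slice associated with change b.
     */
    public static List<List<Integer>> cluster(int n, int[][] related) {
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i; // every change starts in its own cluster
        }
        for (int[] pair : related) {
            parent[find(parent, pair[0])] = find(parent, pair[1]); // merge clusters
        }
        // Collect members per root; clusters appear in order of their smallest id.
        Map<Integer, List<Integer>> byRoot = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            byRoot.computeIfAbsent(find(parent, i), k -> new ArrayList<>()).add(i);
        }
        return new ArrayList<>(byRoot.values());
    }

    private static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]]; // path halving keeps trees shallow
            x = parent[x];
        }
        return x;
    }

    public static void main(String[] args) {
        // Hypothetical ids: 0, 1, 2 are _bigPort updates; 3, 4 are _cover updates.
        int[][] related = {{0, 1}, {0, 2}, {3, 4}};
        System.out.println(cluster(5, related));
    }
}
```

Because merging is symmetric and transitive, two changes end up in the same cluster whenever a chain of slice relations connects them, even if neither lies directly in the other's slice.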

slide-20
SLIDE 20

  public FigActionState() {
-   _bigPort = new FigRRect(10 + 1, 10 + 1, 90 - 2, 25 - 2, Color.cyan, Color.cyan);
+   bigPort = new FigRRect(10 + 1, 10 + 1, 90 - 2, 25 - 2, Color.cyan, Color.cyan);
-   _bigPort.setCornerRadius(_bigPort.getHalfHeight());
+   bigPort.setCornerRadius(bigPort.getHalfHeight());
-   _cover = new FigRRect(10, 10, 90, 25, Color.black, Color.white);
+   cover = new FigRRect(10, 10, 90, 25, Color.black, Color.white);
-   _cover.setCornerRadius(_cover.getHalfHeight());
+   cover.setCornerRadius(cover.getHalfHeight());
-   _bigPort.setLineWidth(0);
+   bigPort.setLineWidth(0);
-   addFig(_bigPort);
+   addFig(bigPort);
-   addFig(_cover);
+   addFig(cover);
  }

slide-21
SLIDE 21

Dataset

  • K. Herzig et al., “The impact of tangled code changes on defect prediction models,” Empirical Software Engineering, 2015.

  • K. Herzig et al., “The impact of tangled code changes,” in MSR, 2013.

5 Java open source projects

994 commits → Automatic filtering → 504 commits → Manual verification → 388 commits

slide-22
SLIDE 22

RQ1: Composite Commits Identified Correctly?

Our tool ↓ / Dataset →   Single-task      Composite
Single-task              True negative    False negative
Composite                False positive   True positive

slide-23
SLIDE 23

RQ1: Composite Commits Identified Correctly?

OK, but not great. Still requires complementary tools.

slide-24
SLIDE 24

RQ2: Individual Tasks in a Commit Correctly Identified?

  • Number of reported tasks
  • Manual analysis of tool output
  • Tool vs. Dataset

Our tool ↓ / Dataset →   Single-task      Composite
Single-task              True positive    False negative
Composite                False positive   True negative

slide-25
SLIDE 25

RQ2: Individual Tasks in a Commit Correctly Identified?

  • Number of reported tasks: ~25% match
  • Manual analysis of tool output: 58% OK, 32% too small, 10% too large
  • Tool vs. Dataset: ~70% F-Score

Our tool ↓ / Dataset →   Single-task      Composite
Single-task              True positive    False negative
Composite                False positive   True negative

Identified parts in a commit are too fine-grained, but stay within their tasks.
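For reference, an F-Score is derived from confusion-matrix counts as the harmonic mean of precision and recall. The sketch below uses hypothetical counts chosen only to make the arithmetic easy to follow. They are not the study's data, and the class `FScore` is an assumption.

```java
public class FScore {
    /** Share of reported positives that are correct. Undefined (NaN) when tp + fp is 0. */
    public static double precision(int tp, int fp) {
        return tp / (double) (tp + fp);
    }

    /** Share of actual positives that were found. Undefined (NaN) when tp + fn is 0. */
    public static double recall(int tp, int fn) {
        return tp / (double) (tp + fn);
    }

    /** F1 score: harmonic mean of precision and recall. */
    public static double f1(int tp, int fp, int fn) {
        double p = precision(tp, fp);
        double r = recall(tp, fn);
        return 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        // Hypothetical counts, not taken from the evaluation.
        int tp = 70, fp = 30, fn = 30;
        System.out.printf("precision=%.2f recall=%.2f f1=%.2f%n",
                precision(tp, fp), recall(tp, fn), f1(tp, fp, fn));
    }
}
```

True negatives do not enter the formula, so the score only reflects how well the class treated as positive is recognised.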

slide-26
SLIDE 26

Conclusion

Paper and more via https://soft.vub.ac.be/~wmuylaer/publications/

  • Identifying the type of commit works
  • Identified parts are more fine-grained than the actual tasks
  • Our technique would ideally be complemented by other techniques

slide-27
SLIDE 27

Discussion Generation Machine

  1. Should we look for ways to prevent this type of problem at the VCS level? Are there good enough ways to go about this?
  2. Tools aimed at one IDE are terrible for reuse