SLIDE 1

Identifying change patterns in software history

Jason Dagit | Galois, Inc

SLIDE 2

Motivation

Tools to detect changes exist. For example, traditional line-based diff:

  • Pro: diff is very general and programming language agnostic
  • Con: diff is not structurally aware:

    if( foo ){
      bar;
    }

versus

    if( foo )
    {
      bar;
    }

We need tools for interpreting changes.
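The limitation above can be illustrated with a minimal sketch. It assumes Python's standard `difflib` as a stand-in for a line-based diff, and uses a crude token comparison (the hypothetical helper `tokens`) as a proxy for structure; it is not the AST-based diff described later in the talk.

```python
import difflib
import re

before = "if( foo ){\n  bar;\n}\n"
after = "if( foo )\n{\n  bar;\n}\n"

# Line-based view: every line that differs is reported as removed or added.
diff = difflib.unified_diff(before.splitlines(), after.splitlines(), lineterm="")
line_changes = [
    line for line in diff
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]

def tokens(src):
    """Split source into identifiers and single punctuation characters,
    ignoring whitespace (an assumed, simplified notion of structure)."""
    return re.findall(r"\w+|[^\w\s]", src)

print(line_changes)                     # lines a textual diff reports as changed
print(tokens(before) == tokens(after))  # whether the token streams agree
```

The line diff reports changes even though only the brace placement moved, while the token streams of the two versions are identical.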

© 2013 Galois, Inc. All Rights Reserved.

SLIDE 3

Motivation

Common looping pattern with loop counter initialized to zero:

    for (□ = 0; □ < □; □) {
        □
    }

(□ marks a hole in the pattern.)

We also want to see how source code changes.

SLIDE 4

Example from Clojure: Related edits

Our tool found these related edits:

PersistentArrayMap.java

      public Object kvreduce(IFn f, Object init){
        for(int i=0;i < array.length;i+=2){
          init = f.invoke(init, array[i], array[i+1]);
    -     if(RT.isReduced(init))
    -       return ((IDeref)init).deref();
        }
        return init;
      }

PersistentHashMap.java

      public Object kvreduce(IFn f, Object init){
    -   for(INode node : array){
    -     if(node != null){
    +   for(INode node : array)
    +   {
    +     if(node != null)
            init = node.kvreduce(f, init);
    -       if(RT.isReduced(init))
    -         return ((IDeref)init).deref();
    -     }
    -   }
    +   }
        return init;
      }

SLIDE 5

Approach

Key Idea: We can find structural patterns by generalizing sufficiently similar difference trees.

  • Difference trees computed using structural diff of AST
  • Similarity is measured using a tree edit distance score
  • Generalization is accomplished through antiunification

SLIDE 6

Workflow

Workflow diagram:

  • Source code version history (a.c, b.c)
  • Compare sequential versions of each file (a.c v1, a.c v2) with treediff
  • Forest of diff subtrees
  • Group by similarity (tree similarity)
  • Antiunify each group to obtain patterns (P1, P2)

SLIDE 7

ATerms

i++;

AAppl "ExpStmt" [AAppl "PostIncrement" [AAppl "ExpName" [AAppl "Name" [AList [AAppl "Ident" [AAppl "\"i\"" []]]]]]]

Generic tree structure: programming language agnostic. Easy to modify parsers to generate ATerms.
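As a rough illustration of why such a generic tree is convenient, the term for `i++;` can be modelled with two node kinds. This is a sketch only: the class names mirror the `AAppl` and `AList` constructors on the slide, but the Python classes and the `size` helper are assumptions made for this example, not the ATerm library's API.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class AAppl:
    """Constructor application: a label applied to child terms."""
    label: str
    children: Tuple["ATerm", ...] = ()

@dataclass(frozen=True)
class AList:
    """Ordered list of child terms."""
    items: Tuple["ATerm", ...] = ()

ATerm = Union[AAppl, AList]

def size(t: ATerm) -> int:
    """Count nodes; every AAppl and every AList counts as one node."""
    kids = t.children if isinstance(t, AAppl) else t.items
    return 1 + sum(size(k) for k in kids)

# The statement `i++;` from the slide, built by hand.
i_incr = AAppl("ExpStmt", (
    AAppl("PostIncrement", (
        AAppl("ExpName", (
            AAppl("Name", (
                AList((
                    AAppl("Ident", (AAppl('"i"'),)),
                )),
            )),
        )),
    )),
))

print(size(i_incr))
```

Because every language construct is just a labelled node, functions such as `size` work unchanged on trees produced by any parser.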

SLIDE 8

Structural diff

Here X(Y, Z) denotes a node X with children Y and Z.

    treediff(A(B, C), A(B(D), F)) = A(B(lefthole(D)), mismatch(C, F))

Keep just the differences with a bit of context:

    ta = A(mismatch(C, F))
    tb = B(lefthole(D))

Output also gives us an edit distance.
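The example can be reproduced with a deliberately naive sketch that compares children position by position. This is an assumption made for illustration, not the algorithm used by the tool; the `Node` class, the `term` renderer, and the `righthole` marker (the mirror image of `lefthole`) are hypothetical names introduced here.

```python
from dataclasses import dataclass
from itertools import zip_longest
from typing import Optional, Tuple

@dataclass(frozen=True)
class Node:
    label: str
    children: Tuple["Node", ...] = ()

def term(t: Node) -> str:
    """Render a tree as label(child, ...)."""
    if not t.children:
        return t.label
    return t.label + "(" + ", ".join(term(c) for c in t.children) + ")"

def treediff(a: Optional[Node], b: Optional[Node]) -> str:
    """Naive positional diff of two ordered trees, rendered as a term."""
    if a is None:            # subtree exists only on the right
        return "lefthole(" + term(b) + ")"
    if b is None:            # subtree exists only on the left (assumed marker name)
        return "righthole(" + term(a) + ")"
    if a.label != b.label:   # roots disagree: report both subtrees
        return "mismatch(" + term(a) + ", " + term(b) + ")"
    kids = [treediff(x, y) for x, y in zip_longest(a.children, b.children)]
    return a.label + ("(" + ", ".join(kids) + ")" if kids else "")

left = Node("A", (Node("B"), Node("C")))
right = Node("A", (Node("B", (Node("D"),)), Node("F")))
print(treediff(left, right))
```

A positional comparison like this cannot realign children after an insertion in the middle of a child list, which is one reason a real structural diff needs an edit-distance computation.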

SLIDE 9

Workflow

(Workflow diagram repeated from slide 6.)

SLIDE 10

Similarity grouping

We define the similarity score by:

$$\Delta(t_a, t_b) := \frac{\min(d(t_a, t_b),\ d(t_b, t_a))}{\max(|t_a|, |t_b|)}$$

where $d$ is the tree edit distance score.

Similarity matrix $D$ given by $D_{ij} = \Delta(t_i, t_j)$.

Given threshold $\tau \in [0, 1]$ we say $t_i$ and $t_j$ are similar if $D_{ij} \geq \tau$.

Group trees such that all elements in the group are within $\tau$.
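The definitions above can be sketched as follows. Several pieces are assumptions made for illustration: trees are `(label, children)` tuples, `|t|` is taken to be the node count, `matched` is a simple stand-in for the edit distance score d (it counts nodes that agree under a top-down positional comparison), and the greedy grouping strategy is one possible reading of "all elements in the group are within τ".

```python
def size(t):
    """Number of nodes in a (label, children) tree."""
    return 1 + sum(size(c) for c in t[1])

def matched(a, b):
    """Stand-in for d: nodes that agree under a top-down positional comparison."""
    if a[0] != b[0]:
        return 0
    return 1 + sum(matched(x, y) for x, y in zip(a[1], b[1]))

def delta(a, b, d=matched):
    """Similarity score: min(d(a, b), d(b, a)) / max(|a|, |b|)."""
    return min(d(a, b), d(b, a)) / max(size(a), size(b))

def group(trees, tau, d=matched):
    """Greedy grouping: a tree joins the first group in which it is
    similar (score >= tau) to every member, else it starts a new group."""
    groups = []
    for t in trees:
        for g in groups:
            if all(delta(t, u, d) >= tau for u in g):
                g.append(t)
                break
        else:
            groups.append([t])
    return groups

loop0 = ("For", (("Init", (("0", ()),)), ("Less", ()), ("Body", ())))
loop1 = ("For", (("Init", (("1", ()),)), ("Less", ()), ("Body", ())))
ret = ("Return", (("Name", ()),))

for tau in (0.0, 0.5, 0.9):
    print(tau, len(group([loop0, loop1, ret], tau)))
```

With these toy trees, a low threshold puts everything in one group, and raising the threshold splits the groups apart.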

SLIDE 11

ANTLR similarity groups with τ = 0.01

10 similarity groups from ANTLR source, when τ = 0.01: 7 are patterns:

; if( ) ; if( ) { } ; return ; for( : ) ; for( = ; < ; ) ; throw RuntimeException ( + );

SLIDE 12

ANTLR similarity groups with τ = 0.01

3 are constants (no holes):

    try {
      walker.grammarSpec();
    } catch (RecognitionException re){
      ErrorManager.internalError("bad grammar AST structure", re);
    }

    while (sp != StackLimitedNFAToDFAConverter.NFA_EMPTY_STACK_CONTEXT) {
      n++;
      sp = sp.parent;
    }

    switch (gtype) {
      case ANTLRParser.LEXER_GRAMMAR:
        return legalLexerOptions.contains(key);
      case ANTLRParser.PARSER_GRAMMAR:
        return legalParserOptions.contains(key);
      case ANTLRParser.TREE_GRAMMAR:
        return legalTreeParserOptions.contains(key);
      default:
        return legalParserOptions.contains(key);
    }

SLIDE 13

Workflow

(Workflow diagram repeated from slide 6.)

SLIDE 14

Antiunification

    au(A(B, C), A(B(D), F)) = (A(1, 2), substl, substr)

where

    substl = {1 → B, 2 → C}
    substr = {1 → B(D), 2 → F}
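A minimal first-order antiunification sketch reproduces this example. It assumes trees are `(label, children)` tuples and that two nodes generalize structurally only when both label and number of children agree; otherwise the pair is replaced by a numbered variable. The function name `antiunify` and the tuple encoding are choices made for this illustration.

```python
def antiunify(a, b):
    """Return (pattern, substl, substr) for two (label, children) trees.

    Variables are numbered from 1 and appear in the pattern as leaves
    whose label is an int.
    """
    substl, substr, seen = {}, {}, {}

    def go(x, y):
        # Same label and arity: keep the node and generalize the children.
        if x[0] == y[0] and len(x[1]) == len(y[1]):
            return (x[0], tuple(go(p, q) for p, q in zip(x[1], y[1])))
        # Otherwise introduce a variable, reusing it for a repeated pair.
        if (x, y) not in seen:
            var = len(seen) + 1
            seen[(x, y)] = var
            substl[var] = x
            substr[var] = y
        return (seen[(x, y)], ())

    return go(a, b), substl, substr

B, C, F = ("B", ()), ("C", ()), ("F", ())
left = ("A", (B, C))
right = ("A", (("B", (("D", ()),)), F))

pattern, substl, substr = antiunify(left, right)
print(pattern)
print(substl)
print(substr)
```

Applying `substl` to the pattern gives back the left tree and applying `substr` gives back the right tree, which is what makes the pattern a generalization of both.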

SLIDE 15

Similarity groups versus threshold

What happens to similarity groups when we vary the threshold?

[Figure: number of groups (y-axis, 0 to 30) versus threshold (x-axis, 0 to 1), with one series each for additions, deletions, and modifications.]

Number of additions, deletions, and modifications by threshold for the Clojure source.

SLIDE 16

Patterns as a function of threshold

Generic loop pattern, τ = 0.15:

    for (□ = □; □ < □; □) {
        □
    }

Loop counter is initialized to zero, τ = 0.25:

    for (□ = 0; □ < □; □) {
        □
    }

Loop termination criterion becomes more specific, τ = 0.35:

    for (□ = 0; □ < □.□; □) {
        □
    }

SLIDE 17

Future work

  • We only consider structural patterns
      • Example: We don’t detect design patterns
  • Not semantically aware
      • Example: changing the name of a loop variable leads to
  • Generate rewrite rules based on before and after patterns
  • Use patterns for searching as a structural grep-like mechanism
  • Correlate patterns with bug fixes

SLIDE 18

Thank you! Questions?

This work was supported in part by the US Department of Energy Office of Science, Advanced Scientific Computing Research contract no. DE-SC0004968. Additional support was provided by Galois, Inc.