SLIDE 1

1 Understanding Source Code Evolution Using Abstract Syntax Tree Matching

Motivation

Underlying question: How does software change?

In: two versions of a program
Out: a picture of the changes

Relevance
  • Software development
  • Software engineering

SLIDE 2

Objective and Approach

Summarize C program changes:
  • Functions (body AST, prototype)
  • Global variables (type and initializer)
  • Types
      • Structs/Unions (fields deleted / added / type changed)
      • Typedefs
      • Enums

Our approach: AST matching
  • Accurate; handles renamings
  • Scales to real-world applications, e.g., Apache, the Linux kernel, OpenSSH

SLIDE 3

Raw Output

    struct "net_device": 1 fields changed type: "accept_fastpath"
    struct "reiserfs_journal": 1 fields deleted: "j_dummy_inode"
    struct "reiserfs_journal": 1 fields added: "j_dirty_buffers"
    function "block_read_full_page": 1 arguments changed type: "get_block"
    function "ext2_readdir": 1 arguments changed type: "filldir___0"
    + function "inetdev_changename"
    + function "__ide_dma_good_drive"
    + function "ide_unplugged_outbsync"
    + function "inode_init_once"
    - function "target_cpus"
    - function "ide_dmafunc_verbose"
    + typedef "cisco_proto"
    - typedef "ide_ioctl_proc"
    + global var "idecd"

Linux 2.4.20 vs 2.4.21

SLIDE 4

The Renaming Problem

Version 1:

    typedef int sz_t;
    struct foo { int i; };
    int count;
    void f(int a) {
        struct foo sf;
        sz_t c = 2;
        sf.i = a + c;
        count++;
    }

Version 2:

    typedef int size_t;
    struct bar { int i; };
    int counter;
    void f(int b) {
        struct bar sb;
        size_t d = 2;
        sb.i = b + d;
        counter++;
    }

Same program, syntactic changes only

SLIDE 5

Abstract Syntax Tree Matching

[Figure: AST matching pipeline]
  Program Version 1 -> Parsing -> AST 1
  Program Version 2 -> Parsing -> AST 2
  AST 1, AST 2 -> AST Matching: AST Traversal (Map Generation) -> Renaming Detection -> AST Traversal (Change Detection) -> Changes & Statistics

Compare ASTs for functions with the same name.
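The matching starts by pairing functions across the two versions by name; only same-named functions have their ASTs compared. A minimal C sketch of that pairing step, assuming each version has already been reduced to an array of distinct function names (the real tool works on CIL's parsed program; `FuncPairing` and `classify_functions` are hypothetical names):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result of pairing top-level functions by name. */
typedef struct {
    int matched; /* name in both versions: the two ASTs get compared */
    int added;   /* name only in version 2 */
    int deleted; /* name only in version 1 */
} FuncPairing;

/* Linear search; a real tool would use a hash table. */
static int contains(const char *const *names, size_t n, const char *name) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(names[i], name) == 0)
            return 1;
    return 0;
}

/* Assumes each array holds the distinct function names of one version. */
static FuncPairing classify_functions(const char *const *v1, size_t n1,
                                      const char *const *v2, size_t n2) {
    FuncPairing p = {0, 0, 0};
    for (size_t i = 0; i < n1; i++) {
        if (contains(v2, n2, v1[i]))
            p.matched++;
        else
            p.deleted++;
    }
    for (size_t j = 0; j < n2; j++)
        if (!contains(v1, n1, v2[j]))
            p.added++;
    /* With distinct names, every version-2 function is matched or added. */
    assert((size_t)(p.matched + p.added) == n2);
    return p;
}
```

For instance, pairing {f, g, target_cpus} with {f, g, inode_init_once, inetdev_changename} yields 2 matched, 2 added and 1 deleted. A renamed function shows up here as one deletion plus one addition; telling the two apart is the job of renaming detection.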

SLIDE 6

AST Traversal - Name Map Generation

Version 1:

    void f(int a) {
        struct foo sf;
        sz_t c = 2;
        sf.i = a + c;
        count++;
    }

Version 2:

    void f(int b) {
        struct bar sb;
        size_t d = 2;
        sb.i = b + d;
        counter++;
    }

[Figure: ASTs of f in Version 1 and Version 2]

Name Map (Version 1 -> Version 2):
  a -> b
  c -> d
  sf -> sb
  count -> counter
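The name map can be pictured as a lockstep walk over the two ASTs that pairs identifiers found at the same position. The sketch below assumes a toy node type (`Node`, `NameMap` and `build_name_map` are hypothetical, not CIL's API), skips identical pairs such as i -> i because the map on this slide lists only differing names, and stops descending wherever the two trees differ in shape, which is one plausible policy rather than the tool's documented one:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy AST node: identifiers carry a name; every other node
   carries only a label (operator, statement kind, constant) and children. */
typedef struct Node {
    const char *label;
    const char *name;              /* non-NULL only for identifiers */
    const struct Node *kids[4];
    int nkids;
} Node;

typedef struct { const char *from, *to; } NamePair;

#define MAX_PAIRS 32
typedef struct { NamePair pairs[MAX_PAIRS]; int n; } NameMap;

/* Record from -> to, skipping identical names and duplicate entries. */
static void map_add(NameMap *m, const char *from, const char *to) {
    if (strcmp(from, to) == 0)
        return;
    for (int i = 0; i < m->n; i++)
        if (strcmp(m->pairs[i].from, from) == 0 &&
            strcmp(m->pairs[i].to, to) == 0)
            return;
    assert(m->n < MAX_PAIRS);
    m->pairs[m->n].from = from;
    m->pairs[m->n].to = to;
    m->n++;
}

/* Walk both ASTs in lockstep, pairing identifiers at the same position.
   Where labels or child counts differ, stop descending. */
static void build_name_map(const Node *a, const Node *b, NameMap *m) {
    if (a == NULL || b == NULL)
        return;
    if (a->name != NULL && b->name != NULL) {
        map_add(m, a->name, b->name);
        return;
    }
    if (a->name != NULL || b->name != NULL)
        return;                    /* identifier vs. non-identifier */
    if (strcmp(a->label, b->label) != 0 || a->nkids != b->nkids)
        return;
    for (int i = 0; i < a->nkids; i++)
        build_name_map(a->kids[i], b->kids[i], m);
}
```

Run on the ASTs of the statements `sf.i = a + c; count++;` and `sb.i = b + d; counter++;`, it records sf -> sb, a -> b, c -> d and count -> counter, in that traversal order.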

SLIDE 7

AST Traversal - Type Map Generation

Version 1:

    void f(int a) {
        struct foo sf;
        sz_t c = 2;
        sf.i = a + c;
        count++;
    }

Version 2:

    void f(int b) {
        struct bar sb;
        size_t d = 2;
        sb.i = b + d;
        counter++;
    }

Declared types in f:
  Version 1: a : int, sf : struct foo, c : sz_t
  Version 2: b : int, sb : struct bar, d : size_t

Type Map (Version 1 -> Version 2):
  int -> int
  struct foo -> struct bar
  sz_t -> size_t
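Type map generation can reuse the name map: each Version 1 variable is looked up under its Version 2 name, and the pair of declared types is recorded. In this sketch types are plain strings, `Decl`, `PairMap` and `build_type_map` are hypothetical, and, as on the slide, identical pairs such as int -> int are kept:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical records: a declaration (name and declared type as text)
   and a map entry from -> to, used for both name and type maps. */
typedef struct { const char *name, *type; } Decl;
typedef struct { const char *from, *to; } Pair;

#define MAX_PAIRS 32
typedef struct { Pair pairs[MAX_PAIRS]; int n; } PairMap;

static void map_add(PairMap *m, const char *from, const char *to) {
    for (int i = 0; i < m->n; i++)
        if (strcmp(m->pairs[i].from, from) == 0 &&
            strcmp(m->pairs[i].to, to) == 0)
            return;                          /* already recorded */
    assert(m->n < MAX_PAIRS);
    m->pairs[m->n++] = (Pair){from, to};
}

static const char *type_of(const Decl *ds, int n, const char *name) {
    for (int i = 0; i < n; i++)
        if (strcmp(ds[i].name, name) == 0)
            return ds[i].type;
    return NULL;
}

/* Version 2 name of a Version 1 variable: renamed per the name map,
   otherwise unchanged. */
static const char *counterpart(const PairMap *names, const char *name) {
    for (int i = 0; i < names->n; i++)
        if (strcmp(names->pairs[i].from, name) == 0)
            return names->pairs[i].to;
    return name;
}

/* Record type(old) -> type(new) for every variable present in both
   versions; identical pairs such as int -> int are kept. */
static void build_type_map(const Decl *v1, int n1, const Decl *v2, int n2,
                           const PairMap *names, PairMap *types) {
    for (int i = 0; i < n1; i++) {
        const char *t2 = type_of(v2, n2, counterpart(names, v1[i].name));
        if (t2 != NULL)                      /* NULL: variable deleted */
            map_add(types, v1[i].type, t2);
    }
}
```

With a : int, sf : struct foo, c : sz_t on one side, b : int, sb : struct bar, d : size_t on the other, and the name map a -> b, sf -> sb, c -> d, the result is int -> int, struct foo -> struct bar, sz_t -> size_t.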

SLIDE 8

Abstract Syntax Tree Matching

[Figure: AST matching pipeline, same as slide 5]

A is renamed to B iff
  • A -> B is in the map
  • A is deleted
  • B is added

Name/Type Maps -> Name/Type Bijections
Traverse the ASTs in parallel, computing changes
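The renaming rule can be sketched as a filter over map entries. How the maps are turned into bijections is not spelled out on the slide; the sketch assumes the simplest reading, keeping only entries whose source and target each occur exactly once. `Pair` and `detect_renamings` are hypothetical names:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical map entry from -> to (Version 1 name -> Version 2 name). */
typedef struct { const char *from, *to; } Pair;

static int count_from(const Pair *m, int n, const char *s) {
    int c = 0;
    for (int i = 0; i < n; i++)
        c += strcmp(m[i].from, s) == 0;
    return c;
}

static int count_to(const Pair *m, int n, const char *s) {
    int c = 0;
    for (int i = 0; i < n; i++)
        c += strcmp(m[i].to, s) == 0;
    return c;
}

static int in_set(const char *const *set, int n, const char *s) {
    for (int i = 0; i < n; i++)
        if (strcmp(set[i], s) == 0)
            return 1;
    return 0;
}

/* Keep only one-to-one entries (assumed reading of "map -> bijection"),
   then apply the rule: A renamed to B iff A -> B is in the map, A is
   deleted (absent from Version 2) and B is added (absent from Version 1).
   Writes renamings to out; returns how many were found. */
static int detect_renamings(const Pair *map, int n,
                            const char *const *v1, int n1,
                            const char *const *v2, int n2,
                            Pair *out, int cap) {
    int k = 0;
    for (int i = 0; i < n; i++) {
        const char *a = map[i].from, *b = map[i].to;
        if (count_from(map, n, a) != 1 || count_to(map, n, b) != 1)
            continue;                 /* ambiguous: not part of a bijection */
        if (in_set(v2, n2, a) || in_set(v1, n1, b))
            continue;                 /* A not deleted, or B not added */
        assert(k < cap);
        out[k++] = map[i];
    }
    return k;
}
```

For example, with the map {count -> counter, x -> y, x -> z, tmp -> n}, only count -> counter is reported: x maps to two names, and tmp still exists in Version 2.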

SLIDE 9

AST Traversal - Change Detection

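Version 1:

    typedef int sz_t;
    struct foo { int i; sz_t f; };

Version 2:

    typedef int size_t;
    struct foo { long long i; size_t f; double e; };

Type Map: sz_t -> size_t

Fields of struct foo:
  Version 1: i : int, f : sz_t
  Version 2: i : long long, f : size_t, e : double

Output:

    struct foo: field i changed type: int -> long long
                field e added

Field f is not reported: its type changed only from sz_t to size_t, which the type map records as a renaming.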
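Change detection for a struct can be sketched as follows: fields are matched by name, and a type difference is ignored when the type map relates the two types. Field renamings are out of scope for this hypothetical `diff_struct`, which also prints one self-contained line per change rather than the tool's exact output format:

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical inputs: a struct field and a type-map entry, as strings. */
typedef struct { const char *name, *type; } Field;
typedef struct { const char *from, *to; } Pair;

/* Output buffer that refuses to truncate. */
typedef struct { char *buf; size_t cap, used; } Out;

static void emit(Out *o, const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int w = vsnprintf(o->buf + o->used, o->cap - o->used, fmt, ap);
    va_end(ap);
    assert(w >= 0 && (size_t)w < o->cap - o->used);
    o->used += (size_t)w;
}

/* Two types count as unchanged if equal or related by the type map. */
static int same_type(const Pair *tmap, int nt, const char *t1, const char *t2) {
    if (strcmp(t1, t2) == 0)
        return 1;
    for (int i = 0; i < nt; i++)
        if (strcmp(tmap[i].from, t1) == 0 && strcmp(tmap[i].to, t2) == 0)
            return 1;
    return 0;
}

static const Field *find_field(const Field *fs, int n, const char *name) {
    for (int i = 0; i < n; i++)
        if (strcmp(fs[i].name, name) == 0)
            return &fs[i];
    return NULL;
}

/* Compare one struct across versions; write one line per change into buf
   and return the number of changes. */
static int diff_struct(const char *sname,
                       const Field *f1, int n1, const Field *f2, int n2,
                       const Pair *tmap, int nt, char *buf, size_t cap) {
    Out o = {buf, cap, 0};
    int changes = 0;
    assert(cap > 0);
    buf[0] = '\0';
    for (int i = 0; i < n1; i++) {          /* deleted or retyped fields */
        const Field *g = find_field(f2, n2, f1[i].name);
        if (g == NULL) {
            emit(&o, "struct %s: field %s deleted\n", sname, f1[i].name);
            changes++;
        } else if (!same_type(tmap, nt, f1[i].type, g->type)) {
            emit(&o, "struct %s: field %s changed type: %s -> %s\n",
                 sname, f1[i].name, f1[i].type, g->type);
            changes++;
        }
    }
    for (int j = 0; j < n2; j++) {          /* added fields */
        if (find_field(f1, n1, f2[j].name) == NULL) {
            emit(&o, "struct %s: field %s added\n", sname, f2[j].name);
            changes++;
        }
    }
    return changes;
}
```

On the two versions of struct foo with the type map sz_t -> size_t, it reports that i changed type from int to long long and that e was added, and nothing for f.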

SLIDE 10

Implementation

Parsing via CIL toolkit
  • Merges the whole program into a single, preprocessed file

Fast
  • Scales linearly: 400,000 LOC in 1 minute

Generates different output formats
  • Raw differences, summaries, density trees

SLIDE 11

Summary Statistics

    ------ Functions ------
    Version1 : 7697   Version2 : 7881   added : 232   deleted : 48
    locals/formals changed name : 3   arguments type changes : 19   return types changes : 15

    ------ Structs/Unions ------
    Version1 : 1214   Version2 : 1233   added : 17   deleted : 1
    field type changes : 15   field count changes : 19

    ------ Typedefs ------
    Version1 : 487   Version2 : 469   added : 13   deleted : 31
    base type changes : 2

    ------ Global Variables ------
    Version1 : 8027   Version2 : 8074   added : 43   deleted : 16
    var type changes : 11   var val changes : 51

    ------ Enums ------
    Version1 : 33   Version2 : 31   deleted : 2
    item count changes : 1   var exp changes : 20

Linux 2.4.20 vs 2.4.21

SLIDE 12

Density Trees

    / : 111
      include/ : 101
        linux/ : 96
          fs.h : 4
          ide.h : 80
          reiserfs_fs_sb.h : 1
          reiserfs_fs_i.h : 2
          sched.h : 1
          wireless.h : 1
          hdreg.h : 7
        net/ : 2
          tcp.h : 1
          sock.h : 1
        asm-i386/ : 3
          io_apic.h : 3
      drivers/ : 9
        char/ : 1
          agp/ : 1
            agp.h : 1
        ide/ : 8
          ide-pci.c : 8
      net/ : 1
        ipv4/ : 1
          ip_fragment.c : 1

Struct/Union field additions

Linux 2.4.20 vs 2.4.21
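Each directory count in a density tree is the sum of the counts of the files beneath it (for example, linux/ : 96 is 4 + 80 + 1 + 2 + 1 + 1 + 7). A minimal sketch of that aggregation by path prefix; `FileCount` and `dir_total` are hypothetical, and the file paths are as reconstructed from the slide:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical input: a per-file change count, path relative to the
   source root, e.g. "include/linux/ide.h". */
typedef struct { const char *path; int count; } FileCount;

/* Count shown at a directory node: the sum over every file beneath it.
   dir is "" for the root "/", otherwise a path ending in '/'. */
static int dir_total(const FileCount *files, int n, const char *dir) {
    size_t len = strlen(dir);
    assert(len == 0 || dir[len - 1] == '/');
    int total = 0;
    for (int i = 0; i < n; i++)
        if (strncmp(files[i].path, dir, len) == 0)
            total += files[i].count;
    return total;
}
```

Feeding it the thirteen files in the tree above reproduces the slide's totals, e.g. 111 for /, 101 for include/ and 9 for drivers/.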

SLIDE 13

Case Studies: OpenSSH, Vsftpd, Apache

  • OpenSSH changes most frequently
  • Deletions infrequent, relative to additions

Functions & global variables: how often added and deleted?

SLIDE 14

Case Studies: OpenSSH, Vsftpd, Apache

  • Function bodies do change a lot
  • Function prototypes do not change much

How often do function bodies and prototypes change?

SLIDE 15

Related Approaches

Standard diff
  • Low-level
  • Verbose: the Linux 2.4.20 -> 2.4.21 patch is 21 MB

Release notes
  • High-level
  • Possibly incomplete

SLIDE 16

Summary

Approach for reporting changes to C programs
  • AST matching
  • Variety of changes at several levels of detail
  • Accurate
  • Scalable

Soon to be available at http://www.cs.umd.edu/~neamtiu/evolution