SLIDE 1

Release Pattern Discovery via Partitioning: Methodology and Case Study

Abram Hindle, Michael W. Godfrey, Richard C. Holt

Software Architecture Group, David R. Cheriton School of Computer Science, University of Waterloo, Canada

{ahindle,migod,holt}@cs.uwaterloo.ca

Abram Hindle 1

SLIDE 2

Introduction

  • Methodology for analyzing revisions around releases
  • Discover project behaviour
  • Automated process extraction from change histories (version control)
  • Release time is both the end of one iteration and the start of the next.

SLIDE 3

Introduction

  • Value of Process Discovery
    – Verify what programmers are doing
    – Extract successful processes
    – Avoid unsuccessful processes
    – Do not have to rely on witnesses to the development

SLIDE 4

Introduction

  • For each class of revision, does the frequency of those revisions increase (or decrease) preceding (or following) the time of the release?

SLIDE 5

Terminology

  • Revision
  • Major and Minor Releases
  • Revision Classes
    – Source, Test, Build, and Documentation Revisions
  • Release Pattern

SLIDE 6

Methodology

  • Extract
  • Partition
  • Aggregate
  • Analyze
    – STBD Notation

SLIDE 7

Figure 1: Revisions and releases over time. Extract the revisions

SLIDE 8

Figure 2: Partitioned revisions and releases over time
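
The slides name four revision classes but do not say how a revision is assigned to one. The following is a minimal sketch of the partition step, assuming classification by file path; the rule table `CLASS_RULES`, the default class and the function names are hypothetical and not taken from the talk.

```python
from collections import defaultdict

# Hypothetical path heuristics; the slides do not state the actual rules.
# Rules are checked in order, so the first matching class wins.
CLASS_RULES = [
    ("test", ("test/", "tests/", "mysql-test/")),
    ("doc", ("docs/", "doc/", "readme", ".txt", ".texi")),
    ("build", ("makefile", "configure", ".am", ".m4", ".cmake")),
]


def classify(path):
    """Return 'source', 'test', 'build' or 'doc' for a changed file path."""
    lowered = path.lower()
    for revision_class, needles in CLASS_RULES:
        if any(needle in lowered for needle in needles):
            return revision_class
    return "source"  # assumed default for anything unmatched


def partition(revisions):
    """Group (timestamp, path) revisions by revision class."""
    classes = defaultdict(list)
    for timestamp, path in revisions:
        classes[classify(path)].append((timestamp, path))
    return dict(classes)
```

Substring matching is crude and will misclassify some paths; a real study would likely need project-specific rules.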

SLIDE 9

Figure 3: Partitioned revisions and releases over time, separated

SLIDE 10

Figure 4: Partitioned revisions aggregated per day

SLIDE 11

Figure 5: Partitioned revisions aggregated per day and smoothed
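
Per-day aggregation and smoothing can be sketched as follows. The talk mentions "flat windows" elsewhere, so a flat (unweighted) trailing moving average is assumed here; the function names and the trailing alignment of the window are illustrative choices, not the authors' implementation.

```python
from collections import Counter
from datetime import date, timedelta


def daily_counts(revision_dates):
    """Count revisions per calendar day, filling empty days with 0."""
    counts = Counter(revision_dates)
    first, last = min(counts), max(counts)
    days = [first + timedelta(days=i) for i in range((last - first).days + 1)]
    return days, [counts[day] for day in days]


def smooth(values, window=14):
    """Flat trailing moving average; early days average over fewer values."""
    smoothed = []
    running = 0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        smoothed.append(running / min(i + 1, window))
    return smoothed
```

`daily_counts` expects at least one date and is meant to be called once per revision class, so that each class gets its own daily series.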

SLIDE 12

Figure 6: Select the revisions around release times

SLIDE 13

Figure 7: Aligned revisions aggregated

SLIDE 14

Figure 8: Align and aggregate revisions of each class
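
One way to read the select, align and aggregate steps is to re-index every revision by its day offset from a release and then sum across releases. The sketch below assumes that reading; `align` and its `interval` parameter are hypothetical names, and revisions that fall near several releases are counted once per release.

```python
from collections import Counter
from datetime import date


def align(revision_dates, release_dates, interval=42):
    """Sum revisions of one class by day offset from each release.

    Offset -3 means three days before a release, +3 three days after.
    The result has 2 * interval + 1 entries; index `interval` is day 0.
    """
    totals = Counter()
    for release in release_dates:
        for revised in revision_dates:
            offset = (revised - release).days
            if -interval <= offset <= interval:
                totals[offset] += 1
    return [totals[offset] for offset in range(-interval, interval + 1)]
```

The nested loop is quadratic in the worst case, which is acceptable for a sketch; sorting the dates first would allow a faster windowed scan.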

SLIDE 15

Figure 9: Analysis: averages and linear regressions
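
The analysis step compares averages and linear regressions before and after a release. A minimal sketch using ordinary least squares against the day index is shown below; the input is assumed to be an aligned series of length 2 * interval + 1 with the release day in the middle, and all names are hypothetical.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def slope(values):
    """Least-squares slope of values against their index 0..n-1 (n >= 2)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    denominator = sum((x - x_mean) ** 2 for x in range(n))
    return numerator / denominator


def summarize(aligned, interval):
    """Split an aligned series at the release day and summarize each half."""
    before = aligned[:interval]        # offsets -interval .. -1
    after = aligned[interval + 1:]     # offsets +1 .. +interval
    return {
        "mean_before": mean(before),
        "mean_after": mean(after),
        "slope_before": slope(before),
        "slope_after": slope(after),
    }
```

Excluding the release day itself from both halves is a design choice made here for symmetry; the slides do not say how day 0 is handled.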

SLIDE 16

STBD Notation

  • Shows relative revision frequency around a release
  • Shows slope of the linear regression around a release
  • Prefixes: Source S, Test T, Build B and Docs D
    – + more before a release or positive slope
    – - more after a release or negative slope
    – = equal before and after a release or flat slope
    – ? undecided
  • Examples: S+T+B+D+, S-T-B-D-, S+T+B-D=
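
To make the notation concrete, the sketch below builds an STBD string from per-class symbols, using the frequency reading of the symbols (more revisions before versus after a release). The `tolerance` parameter and the function names are assumptions for illustration only.

```python
# Class names paired with their STBD prefixes, in notation order.
PREFIXES = (("source", "S"), ("test", "T"), ("build", "B"), ("doc", "D"))


def frequency_symbol(before, after, tolerance=0.0):
    """'+' if more revisions before the release, '-' if more after, else '='."""
    if before - after > tolerance:
        return "+"
    if after - before > tolerance:
        return "-"
    return "="


def stbd(symbols):
    """Join per-class symbols into one string, e.g. 'S+T+B-D='.

    Classes without a symbol are shown as '?' (undecided).
    """
    return "".join(prefix + symbols.get(name, "?") for name, prefix in PREFIXES)
```

For example, `stbd({"source": "+", "test": "+", "build": "-", "doc": "="})` yields the third example string from the slide.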

SLIDE 17

Case Study of MySQL

  • Popular Open Source RDBMS
  • Evaluated parallel branches: 3.23, 4.0, 4.1, 5.0, 5.1
  • BitKeeper repository; used bt2csv to extract change log and revision information
  • Aggregated per day
  • 33 Major Releases and 563 Minor Releases across all branches
  • Analyzed with bt2csv, HiraldoGrok, GNUPlot, R

SLIDE 18

Case Study of MySQL

  • Extraction
    – Extract both revisions and release events
    – Extraction Tools for Revisions
      ∗ softChange - extractor for CVS; also provides the schema of the extracted data
      ∗ bt2csv - extractor for BitKeeper; extracts into a softChange schema

SLIDE 19

Case Study of MySQL

  • Extraction
    – Extract Releases
      ∗ Manual
      ∗ VCS Tags, Changelogs, Manuals, date-stamps in FTP repositories.
      ∗ The MySQL manual contained release info

SLIDE 20

Project       Source     Test    Build    Doc
MySQL 3.23     4 220    1 410      421     21
MySQL 4.0     11 593    4 936    1 033     34
MySQL 4.1     31 451   16 430    2 990     88
MySQL 5.0     45 946   26 373    3 908    105
MySQL 5.1     52 897   31 389    4 772    122
Total        259 822  104 528   24 095  4 137

Table 1: Total Number of Revisions per class

SLIDE 21

[Plot: MySQL 5.1 Histogram (log). x-axis: Linearly increasing bins (100); y-axis: Proportion (log scale). Series: SRC, TEST, BUILD, DOC.]

Figure 10: Distribution of revision classes for MySQL 5.1

SLIDE 22

Project      Major      Minor      All
MySQL 3.23   S-T+B-D+   S+T+B+D+   S+T+B+D+
MySQL 4.0    S+T+B-D+   S+T?B?D+   S+T?B?D+
MySQL 4.1    S+T+B-D=   S+T+B?D+   S+T+B?D+
MySQL 5.0    S+T+B-D+   S+T+B?D+   S+T+B?D+
MySQL 5.1    S+T+B-D+   S+T-B+D+   S+T-B?D+

Table 2: Summary of revision frequencies before and after release using majority voting, where ’?’ means no majority (voting over intervals of 7, 14, 31 and 42 days)
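
The voting step can be sketched as below, assuming "majority" means strictly more than half of the interval results agree; the slides do not define the rule more precisely. The function names and the sample strings in the usage note are invented for illustration and are not the study's data.

```python
from collections import Counter


def majority(votes):
    """Return the symbol holding more than half of the votes, else '?'."""
    symbol, count = Counter(votes).most_common(1)[0]
    return symbol if count > len(votes) / 2 else "?"


def vote(per_interval):
    """Combine STBD strings, one per interval, into a single summary string."""
    summary = ""
    for position in range(0, 8, 2):      # positions of S, T, B, D
        prefix = per_interval[0][position]
        symbols = [pattern[position + 1] for pattern in per_interval]
        summary += prefix + majority(symbols)
    return summary
```

With four made-up interval results, `vote(["S+T+B-D+", "S+T-B-D+", "S+T+B+D+", "S-T-B+D+"])` gives `"S+T?B?D+"`: Source and Docs have a clear majority, while Test and Build split two against two.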

SLIDE 26

[Plot: MySQL 5.1 - test - Before and After - Major releases: 31 days, Flat windows of size 14. x-axis: Day; y-axis: Sum of revisions. Series: Sum of Releases per day Before, Sum of Releases per day After, Linear Regression of Before, Linear Regression of After.]

Figure 11: Windowed plot of Test revisions

SLIDE 27

Project      Before     After      Both
MySQL 3.23   S-T-B+D+   S+T-B+D=   S+T-B+D+
MySQL 4.0    S+T-B-D-   S+T-B+D=   S+T-B+D-
MySQL 4.1    S+T-B-D+   S-T-B+D+   S-T-B+D+
MySQL 5.0    S+T-B-D-   S-T-B+D-   S-T-B+D+
MySQL 5.1    S+T-B-D-   S+T-B-D+   S+T-B+D+

Table 6: Linear Regressions of daily revisions class totals: + indicates a positive slope, - indicates a negative slope, = indicates a slope near 0 (Major releases, 42 day interval)
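
Turning regression slopes into the symbols used in this table could look like the sketch below. The slides only say "near 0", so the cutoff `epsilon` is a hypothetical parameter, as are the function names and the slope values in the usage note.

```python
# Class names paired with their STBD prefixes, in notation order.
ORDER = (("source", "S"), ("test", "T"), ("build", "B"), ("doc", "D"))


def slope_symbol(slope, epsilon=0.01):
    """Map a regression slope to '+', '-' or '=' (within epsilon of zero)."""
    if abs(slope) <= epsilon:
        return "="
    return "+" if slope > 0 else "-"


def slope_pattern(slopes, epsilon=0.01):
    """Build an STBD string from a dict of per-class slopes."""
    return "".join(
        prefix + slope_symbol(slopes[name], epsilon) for name, prefix in ORDER
    )
```

For instance, `slope_pattern({"source": 0.8, "test": -0.3, "build": 0.2, "doc": 0.001})` returns `"S+T-B+D="`. A sensible `epsilon` depends on the scale of the daily totals, so it would need to be chosen per project.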

SLIDE 29

Case Study of MySQL

  • Notable behavior
    – Frequencies of S+T+D+ were common for most Major and Minor Releases
    – Frequency of B- was common for Major Releases
    – MySQL probably doesn’t follow a test-first methodology (S+T- in slope across release)
      ∗ S+T- does not imply test first
    – Consistency and Inconsistency across branches

SLIDE 30

Future Work

  • Characterize the whole process instead of just release time
  • More analysis techniques
  • Analyze the difference between Major and Minor releases
  • Study more projects to make broader, more global generalizations

SLIDE 31

Summary

  • Provided a methodology to generalize about behaviour
  • Characterized release time behaviour via partitioning
  • Provided an initial step towards automated process extraction
  • Showed that partitioning revisions allows for a more process-based analysis
  • Characterized release patterns of MySQL

SLIDE 32

Thank you

  • Any Questions?

SLIDE 33

Project      Major  Minor  All
MySQL 3.23       2     68   70
MySQL 4.0        4    110  114
MySQL 4.1        4    110  114
MySQL 5.0        4    110  114
MySQL 5.1        4    110  114
Total           33    563  595

Table 8: Number of Major and Minor Releases in each branch (note that MySQL 4.0 to 5.1 share the same releases)
