Release Pattern Discovery via Partitioning: Methodology and Case Study

  1. Release Pattern Discovery via Partitioning: Methodology and Case Study
     Abram Hindle, Michael W. Godfrey, Richard C. Holt
     Software Architecture Group, David R. Cheriton School of Computer Science
     University of Waterloo, Canada
     {ahindle, migod, holt}@cs.uwaterloo.ca

  2. Introduction
     • A methodology for analyzing revisions around releases
     • Discover project behaviour
     • Automated process extraction from change histories (version control)
     • Release time marks both the end of one iteration and the start of the next

  3. Introduction
     • Value of process discovery
       – Verify what programmers are doing
       – Extract successful processes
       – Avoid unsuccessful processes
       – No need to rely on witnesses to the development

  4. Introduction
     • For each class of revision, does the frequency of those revisions increase (or decrease) before (or after) a release?

  5. Terminology
     • Revision
     • Major and minor releases
     • Revision classes
       – Source, test, build, and documentation revisions
     • Release pattern

  6. Methodology
     • Extract
     • Partition
     • Aggregate
     • Analyze
       – STBD notation

  7. Extract the revisions. Figure 1: Revisions and releases over time.

  8. Figure 2: Partitioned revisions and releases over time
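
Partitioning assigns each revision to one of the four classes. The slides do not state the classification rules, so the following is only a minimal sketch, assuming a revision's class can be guessed from the path of the file it touches. The directory names (including a mysql-test directory), suffix lists, and the function name classify are hypothetical heuristics, not the rules used in the study.

```python
from typing import Optional

# Hypothetical heuristics: the slides do not state the actual partitioning rules.
TEST_DIRS = {"test", "tests", "mysql-test"}
TEST_SUFFIXES = (".test", ".result")
BUILD_NAMES = {"makefile", "makefile.am", "makefile.in", "configure.in",
               "configure.ac", "cmakelists.txt"}
BUILD_SUFFIXES = (".am", ".m4", ".mk")
DOC_SUFFIXES = (".txt", ".texi", ".html")
DOC_NAMES = {"readme", "changelog", "news"}
SOURCE_SUFFIXES = (".c", ".cc", ".cpp", ".h", ".hpp", ".y", ".yy", ".l")


def classify(path: str) -> Optional[str]:
    """Return 'S', 'T', 'B' or 'D' for a file path, or None if no rule matches."""
    lowered = path.lower()
    *dirs, name = lowered.split("/")
    # Test rules are checked first so that, e.g., a .c file under tests/ counts as T.
    if any(d in TEST_DIRS for d in dirs) or name.endswith(TEST_SUFFIXES):
        return "T"
    # Build rules come before doc rules so CMakeLists.txt is not taken as a .txt doc.
    if name in BUILD_NAMES or name.endswith(BUILD_SUFFIXES):
        return "B"
    if (dirs and dirs[0] == "docs") or name in DOC_NAMES or name.endswith(DOC_SUFFIXES):
        return "D"
    if name.endswith(SOURCE_SUFFIXES):
        return "S"
    return None
```

Checking the test rule first is a design choice: a file can look like source code and still belong to the test suite.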

  9. Figure 3: Partitioned revisions and releases over time, separated

  10. Figure 4: Partitioned revisions aggregated per day
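
Aggregating per day turns the partitioned revisions into one daily count per class. A minimal sketch, assuming each revision has already been reduced to a (timestamp, class) pair; the function name daily_counts and the sample revisions are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime


def daily_counts(revisions):
    """Map each revision class to a Counter of {calendar date: number of revisions}.

    `revisions` is an iterable of (datetime, class) pairs, e.g. (ts, "S").
    """
    per_class = defaultdict(Counter)
    for timestamp, rev_class in revisions:
        per_class[rev_class][timestamp.date()] += 1
    return dict(per_class)


# Illustrative input, not data from the study.
revs = [
    (datetime(2007, 1, 1, 9, 30), "S"),
    (datetime(2007, 1, 1, 17, 5), "S"),
    (datetime(2007, 1, 2, 8, 0), "T"),
]
counts = daily_counts(revs)
```

Which calendar day a revision falls on depends on the time zone of its timestamp, so timestamps should be normalized before counting.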

  11. Figure 5: Partitioned revisions aggregated per day and smoothed
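
The slides do not say which smoother was used, although Figure 11 mentions flat windows of size 14. As an assumption, a flat (boxcar) moving average over a dense list of daily counts could look like this; the window is centred and shrinks at the ends of the series. The name flat_smooth is hypothetical.

```python
def flat_smooth(series, window):
    """Centred flat (unweighted) moving average of a list of daily counts.

    Near the ends of the series the window is truncated and only the
    available days are averaged.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        lo = max(0, i - half)
        hi = min(len(series), i - half + window)
        chunk = series[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


print(flat_smooth([0, 3, 0, 3, 0], 3))  # prints [1.5, 1.0, 2.0, 1.0, 1.5]
```

For an even window such as 14, this covers the seven days before the current day, the day itself, and the six days after.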

  12. Figure 6: Select the revisions around release times

  13. Figure 7: Aligned revisions, aggregated

  14. Figure 8: Align and aggregate revisions of each class
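
Selecting the days around each release and aligning them on the release date lets counts from many releases be summed at each offset. A minimal sketch, assuming the per-day counts of one revision class are held in a date-keyed mapping and the release dates are known; aligned_totals and the sample data are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def aligned_totals(daily, releases, days):
    """Sum one class's daily counts at each offset from every release date.

    daily:    mapping of date -> number of revisions (missing days count as 0)
    releases: iterable of release dates
    days:     half-width of the window; offsets run from -days to +days
    Returns {offset: total}, where offset 0 is the release day itself.
    """
    totals = {offset: 0 for offset in range(-days, days + 1)}
    for release in releases:
        for offset in totals:
            totals[offset] += daily.get(release + timedelta(days=offset), 0)
    return totals


# Illustrative input, not data from the study.
daily = Counter({date(2007, 1, 10): 2, date(2007, 1, 11): 1, date(2007, 2, 9): 4})
print(aligned_totals(daily, [date(2007, 1, 10), date(2007, 2, 10)], 1))
# prints {-1: 4, 0: 2, 1: 1}
```

When releases are close together, their windows overlap and the same revision can be counted more than once; this sketch does not try to handle that.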

  15. Figure 9: Analysis: averages and linear regressions
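
The analysis compares the average daily count before and after a release and fits least-squares lines to each side. A minimal sketch over aligned totals of the form {offset: total}, assuming (as an illustrative choice, not taken from the slides) that the release day itself is excluded from the before and after halves but included in the combined fit. The names analyze and ols_slope are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def ols_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def analyze(totals):
    """Averages and regression slopes for {offset: total} aligned on a release.

    Needs at least two offsets on each side of the release day (offset 0).
    """
    before = sorted(o for o in totals if o < 0)
    after = sorted(o for o in totals if o > 0)
    both = sorted(totals)
    return {
        "mean_before": mean([totals[o] for o in before]),
        "mean_after": mean([totals[o] for o in after]),
        "slope_before": ols_slope(before, [totals[o] for o in before]),
        "slope_after": ols_slope(after, [totals[o] for o in after]),
        "slope_both": ols_slope(both, [totals[o] for o in both]),
    }


# Illustrative input, not data from the study.
result = analyze({-2: 4, -1: 3, 0: 5, 1: 2, 2: 0})
```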

  16. STBD Notation
      • Shows relative revision frequency around a release
      • Shows the slope of the linear regression around a release
      • Prefixes: Source S, Test T, Build B, and Docs D, each followed by one symbol:
        – + more revisions before a release, or a positive slope
        – - more revisions after a release, or a negative slope
        – = equal before and after a release, or a flat slope
        – ? undecided
      • Examples: S+T+B+D+, S-T-B-D-, S+T+B-D=
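
Encoding a comparison in STBD notation needs a rule for when two values count as equal and when a slope counts as flat. The slides do not give thresholds, so the tolerances rel_tol and eps below, and the function names, are hypothetical; the sketch reproduces the example S+T+B-D= from the slide.

```python
def compare_symbol(before, after, rel_tol=0.05):
    """'+' if there were more revisions before the release, '-' if more after,
    '=' if the two differ by no more than rel_tol of the larger value."""
    scale = max(abs(before), abs(after))
    if scale == 0 or abs(before - after) <= rel_tol * scale:
        return "="
    return "+" if before > after else "-"


def slope_symbol(slope, eps=0.01):
    """'+' for a positive slope, '-' for a negative one, '=' if |slope| <= eps."""
    if abs(slope) <= eps:
        return "="
    return "+" if slope > 0 else "-"


def stbd(symbols):
    """Join per-class symbols, e.g. {'S': '+', ...}, in S, T, B, D order."""
    return "".join(cls + symbols[cls] for cls in "STBD")


# Illustrative (before, after) averages, not data from the study.
freq = {"S": (10, 5), "T": (8, 4), "B": (2, 3), "D": (1, 1)}
print(stbd({cls: compare_symbol(*pair) for cls, pair in freq.items()}))  # S+T+B-D=
```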

  17. Case Study of MySQL
      • Popular open source RDBMS
      • Evaluated parallel branches: 3.23, 4.0, 4.1, 5.0, 5.1
      • BitKeeper repository; bt2csv used to extract change log and revision information
      • Revisions aggregated per day
      • 33 major releases and 563 minor releases across all branches
      • Analyzed with bt2csv, HiraldoGrok, gnuplot, and R

  18. Case Study of MySQL
      • Extraction
        – Extract both revisions and release events
        – Extraction tools for revisions
          ∗ softChange: extractor for CVS; also defines the schema of the extracted data
          ∗ bt2csv: extractor for BitKeeper; extracts into the softChange schema

  19. Case Study of MySQL
      • Extraction
        – Extract releases
          ∗ Manual extraction
          ∗ From VCS tags, changelogs, manuals, and date-stamps in FTP repositories
          ∗ The MySQL manual contained release information

  20. Table 1: Total number of revisions per class

      Project        Source      Test    Build    Doc
      MySQL 3.23      4,220     1,410      421     21
      MySQL 4.0      11,593     4,936    1,033     34
      MySQL 4.1      31,451    16,430    2,990     88
      MySQL 5.0      45,946    26,373    3,908    105
      MySQL 5.1      52,897    31,389    4,772    122
      Total         259,822   104,528   24,095  4,137

  21. Figure 10: Distribution of revision classes for MySQL 5.1
      [Histogram: "MySQL 5.1 Histogram (log)". x-axis: linearly increasing bins (100); y-axis: proportion (log scale). Series: SRC, TEST, BUILD, DOC.]

  22. Table 2: Summary of revision frequencies before and after release, using majority voting over intervals of 7, 14, 31, and 42 days ('?' means no majority)

      Project      Major       Minor       All
      MySQL 3.23   S-T+B-D+    S+T+B+D+    S+T+B+D+
      MySQL 4.0    S+T+B-D+    S+T?B?D+    S+T?B?D+
      MySQL 4.1    S+T+B-D=    S+T+B?D+    S+T+B?D+
      MySQL 5.0    S+T+B-D+    S+T+B?D+    S+T+B?D+
      MySQL 5.1    S+T+B-D+    S+T-B+D+    S+T-B?D+
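
Table 2 combines the symbols obtained for several interval lengths by majority vote. A minimal sketch, assuming a strict majority (more than half of the intervals) is required, so a 2-2 split over the four intervals yields '?'; the slides do not say exactly how ties were handled, and the names majority and vote_stbd are hypothetical.

```python
from collections import Counter


def majority(votes):
    """Return the symbol given by more than half of the votes, otherwise '?'."""
    symbol, count = Counter(votes).most_common(1)[0]
    return symbol if count > len(votes) / 2 else "?"


def vote_stbd(per_interval):
    """Combine STBD strings computed for several intervals into one string.

    per_interval maps an interval length in days to a code such as 'S+T+B-D+'.
    """
    votes = {cls: [] for cls in "STBD"}
    for code in per_interval.values():
        # Codes are pairs: a class letter followed by its symbol.
        for i in range(0, len(code), 2):
            votes[code[i]].append(code[i + 1])
    return "".join(cls + majority(votes[cls]) for cls in "STBD")


# Illustrative input, not data from the study.
print(vote_stbd({7: "S+T+B-D+", 14: "S+T-B-D+", 31: "S+T+B+D+", 42: "S+T-B-D+"}))
# prints S+T?B-D+
```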

  26. Figure 11: Windowed plot of Test revisions
      [Plot: "MySQL 5.1 - test - Before and After - Major releases: 31 days, Flat windows of size 14". x-axis: day (-40 to 40); y-axis: sum of revisions. Series: sum per day before and after the release, and the linear regression of each.]

  27. Table 6: Linear regressions of daily revision class totals (major releases, 42-day interval): + indicates a positive slope, - a negative slope, = a slope near 0

      Project      Before      After       Both
      MySQL 3.23   S-T-B+D+    S+T-B+D=    S+T-B+D+
      MySQL 4.0    S+T-B-D-    S+T-B+D=    S+T-B+D-
      MySQL 4.1    S+T-B-D+    S-T-B+D+    S-T-B+D+
      MySQL 5.0    S+T-B-D-    S-T-B+D-    S-T-B+D+
      MySQL 5.1    S+T-B-D-    S+T-B-D+    S+T-B+D+

  29. Case Study of MySQL
      • Notable behavior
        – S+T+D+ frequencies were common for most major and minor releases
        – B- frequencies were common for major releases
        – MySQL probably does not follow a test-first methodology (S+T- slopes across releases)
          ∗ S+T- does not imply test-first
        – Both consistency and inconsistency across branches

  30. Future Work
      • Characterize the whole process instead of just the release time
      • More analysis techniques
      • Analyze the differences between major and minor releases
      • Study more projects to make broader generalizations