 
Replication and Robust Results
Jim Herbsleb
School of Computer Science
Carnegie Mellon University
jdh@cs.cmu.edu
http://conway.isri.cmu.edu/~jdh/
Science Is Based on a Peculiar Logic
• Experimental method
  • Relationship => hypothesis
  • Hypothesis is true
  • Conclude relationship is true
• Affirming the consequent
  • A => B
  • B is true
  • Conclude A
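Why "affirming the consequent" is not a valid deduction can be checked with a small truth table. The sketch below is only an illustration; the function names `implies` and `counterexamples` are made up for this example. It lists every assignment in which both premises (A => B, and B) hold while the conclusion A is false.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Material implication: 'a => b' is false only when a is true and b is false."""
    return (not a) or b


def counterexamples() -> list[tuple[bool, bool]]:
    """Assignments (A, B) where 'A => B' and 'B' both hold but A is false."""
    return [
        (a, b)
        for a, b in product([False, True], repeat=2)
        if implies(a, b) and b and not a
    ]


print(counterexamples())  # [(False, True)]
```

One counterexample exists (A false, B true), so a confirmed hypothesis supports the relationship without proving it.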
Many-Layered Problem
[Diagram: layers Theory, Relationship, Hypothesis, Measures]
Robust Results
• Results consistent as “irrelevant” things vary
Multi-site Delay
Modification Request (MR) work interval: last modification - first modification, in days
All changes for 2-year period
[Bar chart, Network Element A: multi-site 12.7 days, single-site 4.9 days]
Herbsleb, J.D. & Mockus, A. (2003). An Empirical Study of Speed and Communication in Globally-Distributed Software Development. IEEE Transactions on Software Engineering, 29, 3, pp. 1-14.
Modeling Interval

Variable            Measure used in models
MR interval         Log of number of days, first delta to last delta
Number of people    Log of number of people
Diffusion           Log of number of modules touched by change
Size                Log of number of delta
Time                Date
Severity            Is high severity
Fix                 Is fix
Multi-site          Set of sites of all actors has more than one element
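The measures in this table can be sketched in code as follows. This is a minimal illustration, not the authors' implementation: the `Delta` record, its field names, and the sample data are hypothetical, and the `+ 1` inside the interval logarithm is an assumption added here to avoid log(0) for same-day MRs (the slides do not say how zero-day intervals were handled). Time, Severity, and Fix are omitted because they are read directly from the MR record.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Delta:
    """One change (delta) belonging to an MR. All field names are hypothetical."""
    day: date
    author: str
    site: str
    module: str


def mr_measures(deltas: list[Delta]) -> dict:
    """Compute the model variables for one MR from its deltas."""
    if not deltas:
        raise ValueError("an MR needs at least one delta")
    days = (max(d.day for d in deltas) - min(d.day for d in deltas)).days
    return {
        # Assumption: +1 day so that a zero-day interval has a finite log.
        "log_interval": math.log(days + 1),
        "log_people": math.log(len({d.author for d in deltas})),
        "log_diffusion": math.log(len({d.module for d in deltas})),
        "log_size": math.log(len(deltas)),
        # Multi-site: the set of sites of all actors has more than one element.
        "multi_site": len({d.site for d in deltas}) > 1,
    }


# Hypothetical MR with three deltas by two people at two sites.
example = [
    Delta(date(2000, 1, 3), "alice", "site_1", "mod_a"),
    Delta(date(2000, 1, 10), "bob", "site_2", "mod_b"),
    Delta(date(2000, 1, 12), "alice", "site_1", "mod_a"),
]
measures = mr_measures(example)
print(measures["multi_site"])  # True
```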
H1: Multi-site work just takes longer
H2: Multi-site MRs are larger, take longer
H3: Multi-site MRs are more diffuse, take longer
H4: Multi-site MRs involve more people, take longer
[Diagram nodes: Multi-site, Number of People, Size, Diffusion, Work Interval]
Graphical model of work interval for Network Element A
Hypotheses H1-H4 on the graphical model for Network Element A
[Diagram nodes: Multi-site, Number of People, Size, Diffusion, Work Interval]
Edge labels, in the order they appear on the slide: 199.7 / 0.27, 154.1 / 0.24, 35.9 / 0.12, 148.9 / 0.25
The Decision . . .
• Published in ICSE
• What next?
  • Declare victory and move on?
  • Replicate with different data?
• What was different?
  • Locations
  • People
  • Product
  • Software type
Multi-site Delay
Modification Request (MR) work interval: last modification - first modification, in days
All changes for 2-year period
[Bar chart, days]

                     multi-site   single-site
Network Element A       12.7          4.9
Network Element B       18.1          6.9

Herbsleb, J.D. & Mockus, A. (2003). An Empirical Study of Speed and Communication in Globally-Distributed Software Development. IEEE Transactions on Software Engineering, 29, 3, pp. 1-14.
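One rough way to compare the two network elements is the ratio of multi-site to single-site interval. The sketch below assumes the bars read as 12.7 and 4.9 days for Network Element A and 18.1 and 6.9 days for Network Element B (multi-site first); that pairing is a reading of the chart labels, and the variable names are illustrative.

```python
# Days read off the bar chart (assumed pairing of bars to labels).
chart_days = {
    "A": {"multi": 12.7, "single": 4.9},
    "B": {"multi": 18.1, "single": 6.9},
}

# Ratio of multi-site to single-site interval for each network element.
ratios = {
    element: days["multi"] / days["single"]
    for element, days in chart_days.items()
}

for element, ratio in ratios.items():
    print(f"Network Element {element}: {ratio:.2f}x")
# Network Element A: 2.59x
# Network Element B: 2.62x
```

Under that reading, the absolute intervals differ between A and B while the relative delay is nearly the same.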
Graphical model of work interval for Network Element A (left) and B (right)
[Diagram nodes in both models: Multi-site, Number of People, Size, Diffusion, Work Interval]
Edge labels, in the order they appear on the slide:
  A: 199.7 / 0.27, 154.1 / 0.24, 35.9 / 0.12, 148.9 / 0.25
  B: 2009.7 / 0.55, 566.8 / 0.25, 701.7 / 0.34, 96.2 / -0.13
Thoughts on Replication
• Replicating the result was a bit scary
• What do we do if the results are different?
• But that’s science
• How similar must results be?
Closer? More Differentiated?
• Would we have learned more from a closer replication?
• From a more differentiated replication?
• Differentiated how?
• What would we have learned?
Replication is Always about Generalization
• Close replication
  • Generalize over concrete instances
• Differentiated replication
  • Generalize over additional variables
• External replication
  • Generalize over experimenters/labs
What Do You Learn?
[Chart: amount of learning (vertical axis) against type of replication, from closer to more differentiated (horizontal axis)]
• Same result: effect unlikely to be anomalous; robust effect
• Different result: original result was anomalous; many possible causes
Most of Science is Replication
[Diagram: layers Theory, Relationship, Hypothesis, Measures]