An Exploratory Study Into the Prevalence of Botched Code Integrations


  1. An Exploratory Study Into the Prevalence of Botched Code Integrations. Ward Muylaert (@wardmuylaert) and Coen De Roover (@oniroi), Software Languages Lab, Vrije Universiteit Brussel, Brussels, Belgium

  2. Textual Conflicts

         function foo(x) {
           if (x < 0) {
             return -x;
           }
           return x;
         }

  3. Textual Conflicts

     Base version:

         function foo(x) {
           if (x < 0) {
             return -x;
           }
           return x;
         }

     Changed version:

         function foo(x) {
           if (x < 0) {
             return -x;
           }
           return 0;
         }

  4. Textual Conflicts

     Base version:

         function foo(x) {
           if (x < 0) {
             return -x;
           }
           return x;
         }

     First changed version:

         function foo(x) {
           if (x < 0) {
             return -x;
           }
           return 0;
         }

     Second changed version:

         function foo(x) {
           if (x < 0) {
             return -x;
           }
           return 1;
         }

     Merge result: ?
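
The textual conflict on the slide above can be reproduced with a very small line-wise three-way merge. This is only a sketch: the function name `threeWayMerge` is hypothetical, and it assumes all three versions have the same number of lines (lines are modified, never inserted or deleted), which real merge tools do not require.

```javascript
// Minimal line-wise three-way merge (illustrative only).
// Assumption: base, ours, and theirs have the same number of lines.
function threeWayMerge(base, ours, theirs) {
  const merged = [];
  const conflicts = [];
  for (let i = 0; i < base.length; i++) {
    if (ours[i] === theirs[i]) {
      merged.push(ours[i]); // both sides agree (changed or unchanged)
    } else if (ours[i] === base[i]) {
      merged.push(theirs[i]); // only theirs changed this line
    } else if (theirs[i] === base[i]) {
      merged.push(ours[i]); // only ours changed this line
    } else {
      // Both sides changed the same line in different ways.
      conflicts.push({ line: i + 1, ours: ours[i], theirs: theirs[i] });
      merged.push(null); // placeholder: needs a human decision
    }
  }
  return { merged, conflicts };
}

const base = [
  "function foo(x) {",
  "  if (x < 0) {",
  "    return -x;",
  "  }",
  "  return x;",
  "}",
];
// One branch returns 0, the other returns 1, as on the slide.
const ours = base.map((l) => (l === "  return x;" ? "  return 0;" : l));
const theirs = base.map((l) => (l === "  return x;" ? "  return 1;" : l));

const result = threeWayMerge(base, ours, theirs);
console.log(result.conflicts);
```

Both branches rewrite the same line, so the sketch reports one conflict on line 5 instead of guessing which `return` should win.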

  5. Semantic Conflicts

         function foo(x) {
           if (x < 0) {
             return -x;
           }
           return x;
         }

  6. Semantic Conflicts

     Base version:

         function foo(x) {
           if (x < 0) {
             return -x;
           }
           return x;
         }

     Same function with a call added:

         function foo(x) {
           if (x < 0) {
             return -x;
           }
           return x;
         }
         foo(-10);

  7. Semantic Conflicts

     Base version:

         function foo(x) {
           if (x < 0) {
             return -x;
           }
           return x;
         }

     First changed version, with a call added:

         function foo(x) {
           if (x < 0) {
             return -x;
           }
           return x;
         }
         foo(10);

     Second changed version:

         function foo(x) {
           if (x < 0) {
             return -x;
           }
           return 1;
         }

     Merge result: ?
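
A short sketch of why the merge on the slide above is a semantic conflict rather than a textual one. It assumes that one branch only adds the call `foo(10)` while the other only changes the final `return`; the names `fooBase`, `fooChanged`, and `callAddedByOtherBranch` are hypothetical and exist only for this illustration.

```javascript
// Base version of foo from the slides.
function fooBase(x) {
  if (x < 0) {
    return -x;
  }
  return x;
}

// One branch changes the final return statement.
function fooChanged(x) {
  if (x < 0) {
    return -x;
  }
  return 1;
}

// The other branch leaves foo untouched and adds a call (assumed here
// to rely on foo(10) giving back 10).
function callAddedByOtherBranch(foo) {
  return foo(10);
}

// The two edits touch different lines, so a textual merge succeeds
// and combines the changed foo with the added call.
const beforeMerge = callAddedByOtherBranch(fooBase);
const afterMerge = callAddedByOtherBranch(fooChanged);

console.log({ beforeMerge, afterMerge }); // { beforeMerge: 10, afterMerge: 1 }
```

No merge tool would flag this, since no line is edited twice. Only running the code (for example, in a test suite on a CI service) reveals that the call now sees a different result.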

  8. Outline: Research Questions, Dataset, Research Method, Result

     Research Questions

     - RQ1: How often does code integration lead to semantic conflicts?
     - RQ2: How much effort is needed to fix semantic conflicts after code integration?
     - RQ3: How long does it take to fix semantic conflicts after code integration?

  9. GitHub

  10. GitHub: GHTorrent, a 150 GB MySQL dump with 400M commits. Source: Georgios Gousios, The GHTorrent Dataset and Tool Suite, MSR 2013.

  11. Continuous Delivery: Commit → Build → Unit tests → Acceptance tests → … → Release

  12. Continuous Delivery: Commit → Build → Unit tests → Acceptance tests → … → Release. Travis CI

  13. Continuous Delivery: Commit → Build → Unit tests → Acceptance tests → … → Release. Travis CI. TravisTorrent: 1300 projects with 100+ builds, 50+ stars, and recent commits. Source: Moritz Beller et al., Oops, My Tests Broke the Build: An Analysis of Travis CI Builds with GitHub, PeerJ Preprints, 2016.

  14. Travis CI API

      - Builds: state, started_at, finished_at, commit_id, repository_id, …
      - Commits: commit_id, sha, branch, message, committed_at, …
      - Repositories: repository_id, slug, description, …

      Source: Travis CI - API Reference, https://docs.travis-ci.com/api

  17. Travis CI API

      - Builds: state, started_at, finished_at, commit_id, repository_id, …
      - Commits: commit_id, sha, branch, message, committed_at, …
      - Repositories: repository_id, slug (i.e., user/repo), description, …

      Source: Travis CI - API Reference, https://docs.travis-ci.com/api

  18. Travis CI API

      - Builds: state, started_at, finished_at, commit_id, repository_id, …
      - Commits: commit_id, sha, branch, message, committed_at, …
      - Repositories: repository_id, slug (i.e., user/repo), description, …

      1.1M builds for those same 1300 projects.

      Source: Travis CI - API Reference, https://docs.travis-ci.com/api

  19. Identify Merge Commits

          SELECT c.sha, dcs.state, dcs.repository_id
          FROM (
            SELECT DISTINCT sha, state, repository_id
            FROM WardTravisCommits AS wtc
            INNER JOIN WardTravisBuilds AS wtb ON wtb.commit_id = wtc.id
          ) AS dcs
          INNER JOIN commits AS c ON c.sha = dcs.sha
          INNER JOIN commit_parents AS cp ON c.id = cp.commit_id
          GROUP BY c.id, dcs.state
          HAVING COUNT(*) > 1  -- number of parents

  20. Identify Merge Commits

          SELECT c.sha, dcs.state, dcs.repository_id
          FROM (
            -- get commits and their builds
            SELECT DISTINCT sha, state, repository_id
            FROM WardTravisCommits AS wtc
            INNER JOIN WardTravisBuilds AS wtb ON wtb.commit_id = wtc.id
          ) AS dcs
          INNER JOIN commits AS c ON c.sha = dcs.sha
          INNER JOIN commit_parents AS cp ON c.id = cp.commit_id
          GROUP BY c.id, dcs.state
          HAVING COUNT(*) > 1  -- number of parents

  21. Identify Merge Commits

          SELECT c.sha, dcs.state, dcs.repository_id
          FROM (
            -- get commits and their builds
            SELECT DISTINCT sha, state, repository_id
            FROM WardTravisCommits AS wtc
            INNER JOIN WardTravisBuilds AS wtb ON wtb.commit_id = wtc.id
          ) AS dcs
          INNER JOIN commits AS c ON c.sha = dcs.sha
          -- as long as they have two or more parents
          INNER JOIN commit_parents AS cp ON c.id = cp.commit_id
          GROUP BY c.id, dcs.state
          HAVING COUNT(*) > 1  -- number of parents
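
The core idea of the query above, that a merge commit is a commit with more than one row in `commit_parents`, can be sketched in memory as follows. The sample rows and the function name `findMergeCommits` are made up for illustration, and the sketch ignores the join against build states that the SQL performs.

```javascript
// Hypothetical rows shaped like the commit_parents table:
// one row per (commit, parent) pair.
const commitParents = [
  { commit_id: 1, parent_id: 0 },
  { commit_id: 2, parent_id: 1 },
  { commit_id: 3, parent_id: 1 },
  { commit_id: 4, parent_id: 2 }, // commit 4 has two parents,
  { commit_id: 4, parent_id: 3 }, // so it is a merge commit
];

// Count parent rows per commit (GROUP BY) and keep commits with
// more than one parent (HAVING COUNT(*) > 1).
function findMergeCommits(rows) {
  const parentCount = new Map();
  for (const { commit_id } of rows) {
    parentCount.set(commit_id, (parentCount.get(commit_id) || 0) + 1);
  }
  return [...parentCount]
    .filter(([, count]) => count > 1)
    .map(([commitId]) => commitId);
}

const mergeCommits = findMergeCommits(commitParents);
console.log(mergeCommits); // [ 4 ]
```

As a later slide notes, counting parents misses integrations done through history rewriting or patch application, because those leave single-parent commits.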

  22. Enough Merge Commits? 584 projects with 50+ builds of merge commits. [Box plot; marked values: 75, 114, 217.5]

  23. Enough Merge Commits? 584 projects with 50+ builds of merge commits. [Box plot; marked values: 75, 114, 217.5] Caveats: history rewriting, patch application.

  24. Dataset. 100+ builds, 50+ stars, recent commits: 1300 projects

  25. Dataset. 100+ builds, 50+ stars, recent commits: 1300 projects. Still on Travis CI: 1248 projects

  26. Dataset. 100+ builds, 50+ stars, recent commits: 1300 projects. Still on Travis CI: 1248 projects. 50+ builds of merge commits: 584 projects

  27. What Is Failure? Travis CI

  28. What Is Failure? Travis CI build states: ✓ passed, ✘ failed, ✘ errored, cancelled, started

  29. Failed Merge Commit Builds per Project. [Box plot; marked values: 6.7%, 15.3%, 29.0%, 62.6%]

  30. Failed Merge Commit Builds per Project. [Box plot; marked values: 6.7%, 15.3%, 29.0%, 62.6%] In ½ of the projects, ⅙ of merges fail. In ¼ of the projects, ⅓ of merges fail.
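
A per-project failure rate like the one plotted above could be computed along these lines. This is a sketch under assumptions: `failed` and `errored` count as failures (the two states marked ✘ on the earlier slide), `cancelled` and `started` builds are left out of the denominator, and the sample data and the name `failureRatePerProject` are invented for the example.

```javascript
// Assumption: these two states count as a failed build.
const FAILURE_STATES = new Set(["failed", "errored"]);
// Assumption: only builds that ran to a verdict are counted at all.
const FINISHED_STATES = new Set(["passed", "failed", "errored"]);

// builds: [{ repository_id, state }], one entry per merge commit build.
function failureRatePerProject(builds) {
  const tallies = new Map();
  for (const { repository_id, state } of builds) {
    if (!FINISHED_STATES.has(state)) continue; // skip cancelled / started
    const tally = tallies.get(repository_id) || { failed: 0, total: 0 };
    tally.total += 1;
    if (FAILURE_STATES.has(state)) tally.failed += 1;
    tallies.set(repository_id, tally);
  }
  const rates = new Map();
  for (const [repo, { failed, total }] of tallies) {
    rates.set(repo, failed / total);
  }
  return rates;
}

// Made-up sample: two projects, five merge commit builds each.
const sampleBuilds = [
  { repository_id: 1, state: "passed" },
  { repository_id: 1, state: "passed" },
  { repository_id: 1, state: "failed" },
  { repository_id: 1, state: "errored" },
  { repository_id: 1, state: "cancelled" },
  { repository_id: 2, state: "passed" },
  { repository_id: 2, state: "passed" },
  { repository_id: 2, state: "passed" },
  { repository_id: 2, state: "failed" },
  { repository_id: 2, state: "started" },
];

const rates = failureRatePerProject(sampleBuilds);
console.log(rates); // project 1 -> 0.5, project 2 -> 0.25
```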

  31. Dataset revisited. 100+ builds, 50+ stars, recent commits: 1300 projects. Still on Travis CI: 1248 projects. 50+ builds of merge commits: 584 projects. Without outliers: 559 projects

  32. How Much Effort in Fixing the Build? Example build sequence: 1, 2, 3, 4, 5, 6, 7. NBTF (Number of Builds To Fix): 4

  33. How Much Effort in Fixing the Build? [Box plot of NBTF; marked values: 2, 4, 16]

  34. How Much Effort in Fixing the Build? [Box plot of NBTF; marked values: 2, 4, 16] Most builds are fixed easily. ¼ are not or are ignored.

  35. How Much Effort in Fixing the Build? [Box plot of NBTF; marked values: 2, 4, 16] Most builds are fixed easily. ¼ are not or are ignored. Caveat: NBTF is only a proxy for effort.

  36. How Long to Fix the Build? Example build sequence: build 1 at 13:00, build 2 at 13:10, build 3 at 13:30, build 4 at 14:00, build 5 at 14:45, build 6 at 15:00, build 7 at 15:20. TTF (Time To Fix): 1h50m
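
The two metrics can be illustrated on the example timeline above. The transcript does not say which builds pass or fail, so the states below are an assumption chosen to match the reported values (NBTF 4, TTF 1h50m): build 2 is taken as the failing merge commit build and build 6 as the first passing build after it. The helper names are hypothetical.

```javascript
// Timestamps come from the slide; the states are ASSUMED for illustration.
const builds = [
  { n: 1, at: "13:00", state: "passed" },
  { n: 2, at: "13:10", state: "failed" }, // failing merge commit build (assumed)
  { n: 3, at: "13:30", state: "failed" },
  { n: 4, at: "14:00", state: "failed" },
  { n: 5, at: "14:45", state: "failed" },
  { n: 6, at: "15:00", state: "passed" }, // first passing build again (assumed)
  { n: 7, at: "15:20", state: "passed" },
];

// Convert "HH:MM" to minutes since midnight.
function toMinutes(hhmm) {
  const [hours, minutes] = hhmm.split(":").map(Number);
  return hours * 60 + minutes;
}

// Starting at the broken build, look for the next passing build.
// NBTF = number of builds after the broken one, up to and including the fix.
// TTF  = time elapsed between the broken build and the fixing build.
function fixMetrics(sequence, brokenIndex) {
  for (let i = brokenIndex + 1; i < sequence.length; i++) {
    if (sequence[i].state === "passed") {
      return {
        nbtf: i - brokenIndex,
        ttfMinutes: toMinutes(sequence[i].at) - toMinutes(sequence[brokenIndex].at),
      };
    }
  }
  return null; // never fixed within the observed builds
}

const metrics = fixMetrics(builds, 1); // index 1 is build 2
console.log(metrics); // { nbtf: 4, ttfMinutes: 110 }
```

110 minutes is the 1h50m shown on the slide. The sketch also makes the stated caveats concrete: NBTF counts builds rather than work done, and TTF measures elapsed time, not time spent on the fix.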

  37. How Long to Fix the Build?

      | Time range (disjoint)       | Amount | Relative | Cumulative |
      |-----------------------------|--------|----------|------------|
      | Less than 1 hour            | 8031   | 28.2%    | 28.2%      |
      | Less than 12 hours          | 6062   | 21.3%    | 49.4%      |
      | Less than 1 day             | 2392   | 8.4%     | 57.8%      |
      | Less than 7 days            | 5556   | 19.5%    | 77.3%      |
      | Less than 1 month (30 days) | 3321   | 11.6%    | 88.9%      |
      | Less than 1 year (365 days) | 2967   | 10.4%    | 99.3%      |
      | More than 1 year            | 198    | 0.7%     | 100%       |

  38. How Long to Fix the Build?

      | Time range (disjoint)       | Amount | Relative | Cumulative |
      |-----------------------------|--------|----------|------------|
      | Less than 1 hour            | 8031   | 28.2%    | 28.2%      |
      | Less than 12 hours          | 6062   | 21.3%    | 49.4%      |
      | Less than 1 day             | 2392   | 8.4%     | 57.8%      |
      | Less than 7 days            | 5556   | 19.5%    | 77.3%      |
      | Less than 1 month (30 days) | 3321   | 11.6%    | 88.9%      |
      | Less than 1 year (365 days) | 2967   | 10.4%    | 99.3%      |
      | More than 1 year            | 198    | 0.7%     | 100%       |

      Most builds are fixed within a day.

  39. How Long to Fix the Build?

      | Time range (disjoint)       | Amount | Relative | Cumulative |
      |-----------------------------|--------|----------|------------|
      | Less than 1 hour            | 8031   | 28.2%    | 28.2%      |
      | Less than 12 hours          | 6062   | 21.3%    | 49.4%      |
      | Less than 1 day             | 2392   | 8.4%     | 57.8%      |
      | Less than 7 days            | 5556   | 19.5%    | 77.3%      |
      | Less than 1 month (30 days) | 3321   | 11.6%    | 88.9%      |
      | Less than 1 year (365 days) | 2967   | 10.4%    | 99.3%      |
      | More than 1 year            | 198    | 0.7%     | 100%       |

      Most builds are fixed within a day. Caveat: time taken ≠ time worked on.
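
The Relative and Cumulative columns follow directly from the Amount column, which the short calculation below reproduces. It also shows why the second cumulative value is 49.4% although 28.2% + 21.3% is 49.5%: the cumulative share is computed from the running count, not by adding rounded percentages. The variable names are only for this sketch.

```javascript
// Amounts copied from the table; each range excludes the previous ones.
const rows = [
  ["Less than 1 hour", 8031],
  ["Less than 12 hours", 6062],
  ["Less than 1 day", 2392],
  ["Less than 7 days", 5556],
  ["Less than 1 month (30 days)", 3321],
  ["Less than 1 year (365 days)", 2967],
  ["More than 1 year", 198],
];

// Total number of fixed builds across all ranges.
const total = rows.reduce((sum, [, amount]) => sum + amount, 0);

let running = 0;
const table = rows.map(([range, amount]) => {
  running += amount; // running count up to and including this range
  return {
    range,
    amount,
    relative: ((100 * amount) / total).toFixed(1),
    cumulative: ((100 * running) / total).toFixed(1),
  };
});

console.log(total); // 28527
console.table(table);
```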

  40. Number of Builds To Fix vs Time To Fix [figure]
