How Do Centralized and Distributed Version Control Systems Impact Software Changes?



slide-1
SLIDE 1

How Do Centralized and Distributed Version Control Systems Impact Software Changes?

Caius Brindescu Mihai Codoban Sergii Shmarkatiuk Danny Dig


slide-2
SLIDE 2

GitHub is the main “forge” for OSS projects

SourceForge: 300K repos
GitHub: 4.6M repos

slide-3
SLIDE 3

What’s the difference?

                        Git                   SVN
History                 Local to every user   On the server
Commits                 Private, local        Centralized, public
Branching and merging   Cheap                 Expensive
History                 Modifiable            “Set in stone”

slide-15
SLIDE 15

What are we missing?

Developers: Are they using the tools to their full potential?
Managers: Is switching to Git good?
Researchers: How does this new paradigm affect mining software repositories?
Tool Builders: Are they building the right tools?

slide-17
SLIDE 17

Survey

820 participants

85% from industry
56% have over 10 years of experience
51% work in teams of 6 or larger
slide-18
SLIDE 18

Repository Analysis

132 repositories (52 SVN, 51 Git, 29 Hybrid)
358K commits
409M LOC

slide-20
SLIDE 20

Git is the most used VCS

Share of survey respondents by VCS:
Git: 53%
SVN: 20%
Hg: 12%
MS TFS: 9%
CVS: 1%
Other: 5%

slide-21
SLIDE 21

We identified 3 themes

  • 1. Impact of VCS on developer’s behavior

RQ 1: Does the type of VCS affect the size of commits?
RQ 2: Do developers split their commits into logical units of change? How do they do it?
RQ 3: How often and why do developers squash their commits?
RQ 4: Why do developers prefer one Version Control System over another?
RQ 5: Does the VCS influence the frequency with which developers commit?

  • 2. Impact of the team size on the VCS

RQ 6: Does team size affect the choice of VCS?
RQ 7: Are larger teams more likely to use Issue Tracking Systems (ITS)?
RQ 8: Does team size affect the size of commits?
RQ 9: Does team size influence commit squashing?

  • 3. Impact of the VCS on the software process

RQ 10: Does the type of VCS influence the presence and the number of issue tracking labels (ITL)?
RQ 11: Is there a correlation between the number of ITLs in the commit message and the commit size?
RQ 12: How does the size of commits vary in time?
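
RQ 10 and RQ 11 both depend on counting issue tracking labels in commit messages. The slides do not say how labels were detected, so the following is only a rough sketch of one way to do it. The two label shapes it recognizes ("#123" and "PROJ-123") and the name count_issue_labels are assumptions made for illustration, not the study's method.

```python
import re

# Assumed label shapes (hypothetical, not taken from the study):
#   "#123"      GitHub-style issue reference
#   "PROJ-123"  JIRA-style key: uppercase project code, dash, number
ITL_PATTERN = re.compile(r"(?<!\w)#\d+\b|\b[A-Z][A-Z0-9]+-\d+\b")


def count_issue_labels(message: str) -> int:
    """Return the number of issue tracking labels found in a commit message."""
    return len(ITL_PATTERN.findall(message))


if __name__ == "__main__":
    messages = [
        "Fix NPE in parser (#42)",
        "CORE-17, CORE-18: refactor cache",
        "Update README",
    ]
    for msg in messages:
        print(count_issue_labels(msg), msg)
```

A heuristic like this can over-count: a token such as "UTF-8" has the same shape as a JIRA-style key, so a real analysis would need per-project patterns.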

slide-23
SLIDE 23

RQ1: Does the type of VCS affect commit size?

For Git and SVN the difference was statistically significant.

Mean commit size (LOC):
SVN: 40.06
Git: 23.20

slide-24
SLIDE 24

RQ1: Does the type of VCS affect commit size?

“Git promotes the idea that your commit space is not inflicting pain on anybody else […] it promotes small frequent commits […] rather than the 5pm commit”

slide-25
SLIDE 25

RQ1: Does the type of VCS affect commit size?

For repositories that transitioned, there was no statistically significant difference.

[Bar chart: mean commit size (LOC) for Hybrid-SVN and Hybrid-Git; the plotted means are 25.72 and 23.02]

slide-28
SLIDE 28

RQ1: Does the type of VCS affect commit size?

Git repositories have commits 34% smaller than SVN repositories, in terms of LOC.

One possible explanation is that each developer commits to their own local repo, with no need to synchronize or merge their changes.

Hybrid repos keep the same commit size because of existing policies.

Old habits die hard
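
To make "commit size in LOC" concrete, here is a minimal sketch of measuring it from the text produced by `git log --numstat --format=commit:%H`. It assumes that size means added plus deleted lines and that binary files are ignored; the slides do not state the exact definition the study used, and the function name commit_sizes is made up for this example.

```python
from __future__ import annotations

from statistics import mean


def commit_sizes(numstat_log: str) -> list[int]:
    """Return LOC changed per commit from `git log --numstat --format=commit:%H` text.

    Assumption: size = added + deleted lines. Binary files, which numstat
    reports as "-", are skipped.
    """
    sizes: list[int] = []
    for line in numstat_log.splitlines():
        if line.startswith("commit:"):
            sizes.append(0)  # start a new commit
            continue
        parts = line.split("\t")  # numstat rows are "added<TAB>deleted<TAB>path"
        if len(parts) == 3 and sizes and parts[0].isdigit() and parts[1].isdigit():
            sizes[-1] += int(parts[0]) + int(parts[1])
    return sizes


if __name__ == "__main__":
    sample = (
        "commit:a1\n\n10\t2\tsrc/a.py\n3\t0\tREADME\n"
        "commit:b2\n\n-\t-\tlogo.png\n5\t5\tsrc/b.py\n"
    )
    sizes = commit_sizes(sample)
    print(sizes, mean(sizes))
```

Running the same function over an SVN-derived and a Git-derived history would give the two distributions whose means are compared on these slides.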

slide-29
SLIDE 29

Implications

Smaller commits make it easier to “bisect” the tree.
Git offers better tools for splitting commits.
Some repositories migrate from one paradigm to the other; this might bias the results.
Changing the VCS is not enough.
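
The point about bisecting can be illustrated with a small sketch of the idea behind `git bisect`: a binary search over an ordered history for the first commit that fails a check. This is a simplified model on a linear list of commits, not Git's actual implementation, and it assumes the last commit is known to be bad and that every commit after the first bad one is also bad.

```python
from __future__ import annotations

from typing import Callable, Sequence


def first_bad_commit(
    commits: Sequence[str], is_bad: Callable[[str], bool]
) -> tuple[str, int]:
    """Binary-search a linear history for the first bad commit.

    Returns the commit and the number of checks performed.
    Assumes commits[-1] is bad and badness persists once introduced.
    """
    lo, hi = 0, len(commits) - 1  # hi always points at a known-bad commit
    checks = 0
    while lo < hi:
        mid = (lo + hi) // 2
        checks += 1
        if is_bad(commits[mid]):
            hi = mid  # the first bad commit is at mid or earlier
        else:
            lo = mid + 1  # the first bad commit is after mid
    return commits[lo], checks


if __name__ == "__main__":
    history = [f"c{i}" for i in range(8)]
    bad = {"c5", "c6", "c7"}
    print(first_bad_commit(history, lambda c: c in bad))
```

The search only narrows the problem down to one commit, so the smaller that commit is, the fewer lines remain to inspect by hand.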

slide-31
SLIDE 31

RQ2: Do developers split their changes?

Separating the changes to the working copy into multiple, separate commits

[Diagram: changes to file1.txt, file2.txt, and file3.txt in the working copy are divided between Commit 1 and Commit 2]

slide-33
SLIDE 33

RQ2: Do developers split their changes?

Split their changes: Git 81%, SVN 68%
Group their changes: Git 13%, SVN 27%
Other: Git 6%, SVN 6%

“[changes] should be logically separated to easily allow [the] commit message to drive [the] review”

slide-35
SLIDE 35

RQ2: Do developers split their changes?

How developers split their changes:
By implementation: Git 37%, SVN 22%
By issue: Git 45%, SVN 62%
Policy: Git 6%, SVN 5%
Other: Git 12%, SVN 11%

“[Git] gives useful tools for splitting or merging commits”

slide-38
SLIDE 38

RQ2: Do developers split their changes?

76% of developers split their commits. The percentage is higher for Git (81.25%), compared to SVN (67.89%).

We attribute this to an easier commit process.

Overall, developers choose to split their commits based on the issue they belong to.
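
As a rough illustration of splitting by issue, the sketch below groups pending working-copy changes into one commit per issue. The Change type, the issue labels, and the file-to-issue assignment are all made up for the example; in practice a developer decides which change belongs to which issue.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    path: str
    issue: str  # hypothetical issue label the developer attaches to the change


def split_by_issue(changes: list[Change]) -> dict[str, list[str]]:
    """Group changed paths so that each issue becomes its own commit."""
    groups: dict[str, list[str]] = defaultdict(list)
    for change in changes:
        groups[change.issue].append(change.path)
    return dict(groups)


if __name__ == "__main__":
    pending = [
        Change("file1.txt", "ISSUE-1"),
        Change("file2.txt", "ISSUE-2"),
        Change("file3.txt", "ISSUE-1"),
    ]
    for issue, paths in split_by_issue(pending).items():
        print(issue, paths)
```

This groups whole files only. Splitting several unrelated edits inside one file is the harder, manual part.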

slide-40
SLIDE 40

RQ2: Do developers split their changes?

For Git, more users (37%) split changes based on implementation details than in SVN (22%).

“Each commit is one cohesive change […] (like ‘sphere class can now calculate its own volume’); user level features usually take many commits.”
slide-42
SLIDE 42

Implications

Splitting changes makes it easier to perform other operations such as cherry-picking.
For mining software repositories, Git might be better since it allows smaller atomic changes.
Splitting changes is a manual and tedious process.
Tool builders could make their tools support this process.

slide-43
SLIDE 43

RQ3: Why do developers prefer one VCS over another?

Killer features: Git 46%, SVN 11%
Old habit: Git 23%, SVN 42%
Ease of use: Git 20%, SVN 42%
Personal pref.: Git 2%, SVN 1%
Other: Git 9%, SVN 5%

slide-51
SLIDE 51

RQ3: Why do developers prefer one VCS over another?

“You get to commit to a local repository and make your changes public only when they are ready”

“I found the commit process very straightforward […]”

slide-52
SLIDE 52

RQ3: Why do developers prefer one VCS over another?

Git is preferred because of its “killer features”.
SVN is preferred because it is easier to use and because of familiarity.

slide-53
SLIDE 53

Implications

Tool builders should focus on features that complement the developer’s workflow.
While Git has a steep learning curve, it does allow for better ways to manage your changes.

slide-54
SLIDE 54

RQ4: Do developers squash their commits?

What is squashing?
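
Squashing replaces a run of commits with a single commit that has the same combined effect. In Git this is typically done with an interactive rebase or with `git merge --squash`. The sketch below is only a toy model of the idea: a commit is represented as a message plus a net line change per file, which is a simplification assumed for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Commit:
    message: str
    # Toy representation: path -> net lines changed by this commit
    changes: dict[str, int] = field(default_factory=dict)


def squash(commits: list[Commit], message: str) -> Commit:
    """Collapse several commits into one with the same combined changes."""
    combined: dict[str, int] = {}
    for commit in commits:
        for path, delta in commit.changes.items():
            combined[path] = combined.get(path, 0) + delta
    return Commit(message=message, changes=combined)


if __name__ == "__main__":
    work = [
        Commit("wip", {"a.py": 5}),
        Commit("fix typo", {"a.py": -1, "b.py": 2}),
    ]
    print(squash(work, "Add feature"))
```

The intermediate messages ("wip", "fix typo") disappear from the result, which is exactly the path-taken information that squashing discards.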

slide-55
SLIDE 55

RQ4: Do developers squash their commits?

Git users who squash their commits:
Yes: 37%
No: 55%
N/A: 8.62%

slide-56
SLIDE 56

RQ4: Do developers squash their commits?

Why do they do it? Developers using Git mention two different reasons:
(a) grouping several changes together
(b) they only care about the final solution, not the path they took to get there

slide-57
SLIDE 57

RQ4: Do developers squash their commits?

Over 1/3 of developers squash their commits.
Large teams squash commits more often than small ones.

slide-58
SLIDE 58

Implications

Tool builders could allow for non-destructive history modifications, e.g., hierarchical commits.
Git allows users to change history before they make it public or available to others.

slide-59
SLIDE 59

Threats

Squashing
Age bias
OSS vs. proprietary software

slide-60
SLIDE 60

Conclusions

The commit size is smaller in Git than SVN.
Developers split their changes more often in Git, using a finer granularity.
1/3 of developers use squashing to change the history.
Teams of all sizes predominantly prefer Git (71%).

cope.eecs.oregonstate.edu/VCStudy