How Has Forking Changed in the Last 20 Years? A Study of Hard Forks

slide-1
SLIDE 1

How Has Forking Changed in the Last 20 Years? A Study of Hard Forks on GitHub

Shurui Zhou, Bogdan Vasilescu, Christian Kästner

slide-2
SLIDE 2

Shurui Zhou University of Toronto Assistant Prof. (Fall 2020)

Bogdan Vasilescu

Software Engineering Ph.D. Program

Christian Kästner

slide-3
SLIDE 3

Upstream Fork/Branch

Forking

slide-4
SLIDE 4

Upstream Fork/Branch

→ Splitting off a community: a need of a community that was not fulfilled by the original project.

Traditional Notion of Forking

slide-5
SLIDE 5
  • Technical reason

Motivations for Forking

slide-6
SLIDE 6
  • Technical reason
  • Governance disputes

Motivations for Forking

slide-7
SLIDE 7
  • Technical reason
  • Personal reasons
  • Governance disputes
  • Discontinuation of the original project
  • Commercial forks
  • Legal reasons

Motivations for Forking

slide-8
SLIDE 8

[Timeline figure: forking events since 1977, with events marked at '93, '99, '02, '05, '08, '11, '14, and '17]

Timeline of Some Open-Source Forking Events

slide-9
SLIDE 9

Fork-Based Development Changed Everything

slide-10
SLIDE 10

→ Fork a repository to start CONTRIBUTING to a project [1].

[1] Fork a repo. https://help.github.com/en/github/getting-started-with-github/fork-a-repo

Fork-Based Development

slide-11
SLIDE 11

#Forks      #GitHub Projects
>50         114,120
>500        9,164
>1,000      2,236
>5,000      198
>10,000     72
>100,000    2

[GHTorrent 2019-06]

Fork-based Dev. Becomes Popular

slide-12
SLIDE 12

Different kinds of Forks

slide-13
SLIDE 13

Controversial Discussion of Hard forks

  • Free and open-source licenses
  • Guaranteeing flexibility
  • Fostering disruptive innovations
  • Fragmenting a community
  • Leading to confusion for both maintainers and contributors

slide-14
SLIDE 14

Fork-Based Dev. Changed Everything

slide-15
SLIDE 15

Hard Forks in Social Coding Era

Family tree of 3D printer firmware

slide-16
SLIDE 16

Hard Forks in Social Coding Era

slide-17
SLIDE 17

Research Question

How have perceptions and practices around hard forks changed?

slide-18
SLIDE 18

Research Question

How have perceptions and practices around hard forks changed?

slide-19
SLIDE 19

Mixed Methods

  • Repository Mining
  • Interview

slide-20
SLIDE 20

Mixed Methods

Repository Mining

  • Heuristics to identify candidate hard forks
  • Filtering false positives
  • Card sorting
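
The first two mining steps above can be sketched as a two-stage filter. This is only an illustration of the idea, not the study's actual heuristics: the `Fork` record, its field names, the repository names, and every threshold are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Fork:
    """Hypothetical summary of one fork; field names are illustrative."""
    name: str
    stars: int
    own_commits: int          # commits in the fork that are not in upstream
    merged_back_commits: int  # fork commits that were later merged upstream
    renamed: bool             # fork uses a different project name than upstream


def is_candidate_hard_fork(fork: Fork, min_stars: int = 3,
                           min_own_commits: int = 50) -> bool:
    """Stage 1: cheap heuristic. Both thresholds are assumed values."""
    has_attention = fork.stars >= min_stars
    diverges = fork.renamed or fork.own_commits >= min_own_commits
    return has_attention and diverges


def is_likely_false_positive(fork: Fork) -> bool:
    """Stage 2: drop forks whose work mostly flows back upstream."""
    if fork.own_commits == 0:
        return True
    return fork.merged_back_commits / fork.own_commits > 0.5


def find_hard_forks(forks: list[Fork]) -> list[str]:
    """Names of forks that pass stage 1 and are not filtered by stage 2."""
    return [f.name for f in forks
            if is_candidate_hard_fork(f) and not is_likely_false_positive(f)]


# Made-up example data.
forks = [
    Fork("alice/widget", stars=1, own_commits=2, merged_back_commits=2, renamed=False),
    Fork("bob/widget-ng", stars=40, own_commits=300, merged_back_commits=5, renamed=True),
    Fork("carol/widget", stars=12, own_commits=80, merged_back_commits=70, renamed=False),
]
print(find_hard_forks(forks))
```

Keeping the cheap heuristic separate from the false-positive filter mirrors the order of the bullets: a broad candidate set first, then pruning. The card-sorting step is manual and is not modeled here.
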
slide-21
SLIDE 21

Traditional Notion of Forking

Visualizing Fork Activities

Commit graph of fork: tmyroadctfig/jnode

Commit history of both fork and upstream
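
One way to prepare data for such a view is to split commit IDs into those shared by fork and upstream and those unique to each side. The sketch below assumes the two histories are already available as ordered lists of commit IDs; the IDs are made up, and this is not necessarily how the commit graph shown in the talk was produced.

```python
def classify_commits(upstream: list[str], fork: list[str]) -> dict[str, list[str]]:
    """Split commit IDs into shared, upstream-only, and fork-only groups.

    Input order is preserved within each group.
    """
    upstream_set, fork_set = set(upstream), set(fork)
    return {
        "shared": [c for c in upstream if c in fork_set],
        "upstream_only": [c for c in upstream if c not in fork_set],
        "fork_only": [c for c in fork if c not in upstream_set],
    }


# Hypothetical histories: both sides share a1 and b2, then diverge.
upstream_history = ["a1", "b2", "c3", "d4"]
fork_history = ["a1", "b2", "x9", "y8"]

groups = classify_commits(upstream_history, fork_history)
for label, commits in groups.items():
    print(f"{label}: {commits}")
```

A fork with many fork-only commits and a growing upstream-only list has diverged in both directions, which is the situation the commit graph makes visible.
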

slide-22
SLIDE 22

Identifying Evolution Patterns (Card Sorting)

slide-23
SLIDE 23

Identifying Evolution Patterns of Hard Forks

  • 15 evolution patterns
  • 15,306 hard forks

Covering 97.7 % of all hard forks

slide-24
SLIDE 24

Result: Frequency of Hard Forks

Most hard forks are created as forks of active projects (14,254 hard forks, 93 %)

slide-25
SLIDE 25

Result: Frequency of Hard Forks

A substantial number of hard forks are created to revive a dead project (1,052 hard forks, 6.8 %)

slide-26
SLIDE 26

Cases where both upstream and hard fork remain active for extended periods of time are not common (779 hard forks, 5 %)

Result: Frequency of Hard Forks
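
The shares on the three frequency slides can be recomputed from the reported counts. The snippet below uses only numbers that appear on the slides; the category labels are paraphrased for the example.

```python
TOTAL_HARD_FORKS = 15_306  # size of the hard-fork dataset reported in the talk

counts = {
    "forks of active projects": 14_254,
    "forks reviving a dead project": 1_052,
    "upstream and fork both active long-term": 779,
}

# Share of each category relative to all identified hard forks, in percent.
shares = {label: 100 * n / TOTAL_HARD_FORKS for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{label}: {pct:.1f} %")

# The first two categories together account for every hard fork in the dataset.
partition_total = (counts["forks of active projects"]
                   + counts["forks reviving a dead project"])
print(partition_total == TOTAL_HARD_FORKS)
```

The computed values are about 93.1 %, 6.9 %, and 5.1 %, close to the figures shown on the slides. The third category is a subset of the dataset rather than part of the active/dead split.
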

slide-27
SLIDE 27

Result

  • a method to identify hard forks
  • a dataset of 15,306 hard forks
  • a classification and analysis of evolution patterns of hard forks

A rare phenomenon

Only 15,306 hard forks, 0.2 % of GitHub’s 47 million forks have 3 or more stars.

slide-28
SLIDE 28
  • Fork owner
  • decision process that led to the hard fork
  • relationship to the upstream project
  • future plans
  • Owners of upstream: “To what extent,…
  • aware of/interact with/monitor hard forks
  • concern/take steps to avoid hard forks

Interview 18 Upstream & Hard Fork owners

7% response rate

slide-29
SLIDE 29

Result: Why Hard Forks Are Created

Align well with prior findings.

slide-30
SLIDE 30

Result: Why Hard Forks Are Created

Common obstacles :

  • Unresponsive maintainers (P1, P2, P8)
  • Rejected pull requests (P11, P13, P14)

P2: “before forking, we started by opening issues and pull requests, but there was a lack of response from their part. [We] got some news only 2 months after, when our fork was getting some interest from others.”

upstream: openai/baselines
P2: hill-a/stable-baselines (has 463 second-level forks)

slide-31
SLIDE 31

Hard forks are not likely to be avoidable

[Diagram labels: specific, general]

slide-32
SLIDE 32

The stigma around hard forking is gone!

with concern about community fragmentation

slide-33
SLIDE 33

Tooling Opportunities

  • Considering multiple forked projects as part of a larger community
  • A bot to monitor emerging hard forks

Found a hard fork! shuiblue/fragment

  • Identify the intention behind a fork

The hard fork fixed bug #123 (high priority)!
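
A monitoring bot of this kind could, in principle, compare periodic snapshots of each fork and notify maintainers when a fork both diverges and gains attention. The sketch below is a thought experiment, not a tool from the talk: the `ForkSnapshot` record, its fields, and both thresholds are assumptions, and no real GitHub API is called.

```python
from dataclasses import dataclass


@dataclass
class ForkSnapshot:
    """Hypothetical periodic measurement of one fork; fields are illustrative."""
    repo: str
    commits_ahead: int         # commits not present in upstream
    stars: int
    open_prs_to_upstream: int  # pull requests currently open against upstream


def emerging_hard_forks(previous, current, min_ahead=30, min_new_stars=10):
    """Return notification messages for forks that diverge and attract stars.

    Both thresholds are assumed values chosen for the example.
    """
    before = {snap.repo: snap for snap in previous}
    messages = []
    for snap in current:
        old = before.get(snap.repo)
        new_stars = snap.stars - (old.stars if old else 0)
        diverging = (snap.commits_ahead >= min_ahead
                     and snap.open_prs_to_upstream == 0)
        if diverging and new_stars >= min_new_stars:
            messages.append(f"Found a hard fork! {snap.repo}")
    return messages


# Made-up snapshots from two consecutive checks.
last_week = [ForkSnapshot("shuiblue/fragment", commits_ahead=5, stars=2,
                          open_prs_to_upstream=1)]
this_week = [
    ForkSnapshot("shuiblue/fragment", commits_ahead=45, stars=20,
                 open_prs_to_upstream=0),
    ForkSnapshot("someone/typo-fix", commits_ahead=1, stars=0,
                 open_prs_to_upstream=1),
]
print(emerging_hard_forks(last_week, this_week))
```

Identifying the intention behind a fork, such as recognizing that it fixed a particular bug, would need more signal than these counters and is left out of the sketch.
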

slide-34
SLIDE 34
  • Considering multiple forked projects as part of a larger community.
  • A bot to monitor emerging hard forks

Found a hard fork! shuiblue/fragment

  • Identify the intention behind a fork

The hard fork fixed bug #123 (high priority)!

  • Dashboard to show how multiple projects and important hard forks interrelate

Date        Activity                                                      Participants
2021-06-11  repo1 cross-referenced 2 PRs to repo2                         usr1, usr13
2021-06-13  repo3 has 105 more stars                                      usr100… usr205
2021-07-01  repo4 submitted PR#234 to repo2 (35 commits), got rejected    usr50, usr89
2021-07-05  12 contributors from repo2 migrate to repo4                   usr20, …
…           …                                                             …
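
The dashboard mock-up suggests a simple event log that can be filtered per repository. A minimal sketch of such a log follows, assuming a hypothetical `Event` record; the rows reuse the mock-up's example entries, and participant lists that are elided on the slide are left empty rather than filled in.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    """One dashboard row; the structure is an assumption for illustration."""
    day: date
    activity: str
    repos: tuple[str, ...]         # repositories the activity touches
    participants: tuple[str, ...]  # users involved, where known


events = [
    Event(date(2021, 7, 1),
          "repo4 submitted PR#234 to repo2 (35 commits), got rejected",
          ("repo4", "repo2"), ("usr50", "usr89")),
    Event(date(2021, 6, 13), "repo3 has 105 more stars",
          ("repo3",), ()),  # participant list is elided on the slide
    Event(date(2021, 6, 11), "repo1 cross-referenced 2 PRs to repo2",
          ("repo1", "repo2"), ("usr1", "usr13")),
]


def timeline_for(repo: str, log: list[Event]) -> list[Event]:
    """Events involving the given repository, oldest first."""
    return sorted((e for e in log if repo in e.repos), key=lambda e: e.day)


for event in timeline_for("repo2", events):
    print(event.day.isoformat(), event.activity, ", ".join(event.participants))
```

Tagging each event with every repository it touches lets one log serve all projects in the fork family, which fits the idea of treating them as one larger community.
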

Tooling Opportunities

slide-35
SLIDE 35

@shuishuiblue