The Social Organization of the R Project: John Fox, McMaster University

SLIDE 1

Introduction Trajectory Development and Organization Why Did R Succeed? Can Success Continue?

The Social Organization of the R Project

John Fox

McMaster University

useR 2008 Dortmund

John Fox McMaster University The Social Organization of the R Project

SLIDE 2

Introduction

What is Problematic About Open-Source Software Development?

Typical questions (particularly posed by economists): Why do people (or organizations) participate in open-source software development, and is their participation rational?

A different point of view: Participation in voluntary associations is a normal social activity. What is problematic is why and how a voluntary association can produce a complex, integrated product such as software.

SLIDE 3

Introduction

Stated Motivations of R-Core Developers

To leverage one’s own efforts by building a mutually useful product: “[M]y feeling is that I gain great benefit from open source software. This is tremendously valuable to me, being able to use all of these other tools, and I feel both a moral and a practical obligation to contribute back into this sea of tools that are, I think, very important for the development of our profession.”

An economist might find the “practical obligation” an expression of rationality.

SLIDE 4

Introduction

Stated Motivations of R-Core Developers

To work on the cutting edge of statistical computing: “[It’s] very satisfying ... to work on a day-to-day basis with people with whom one has common interests and can get a lot of pleasure from working with.”

SLIDE 5

Introduction

Stated Motivations of R-Core Developers

To provide statistical computing facilities to those who could not otherwise afford them: “One of the nicest sort of things [is that] other people in the Philippines or Bolivia or Mexico ... can have a world class statistical software system [when] they could never afford any of the commercial systems.”

SLIDE 6

Introduction

Stated Motivations of R-Core Developers

Statisticians are habituated to cooperation: “But statistics itself is a collaborative field. You can’t actually do anything in statistics, or at least nothing of interest, unless you cooperate with a subject matter expert. So basically . . . cooperation is built into the subject, and that might have had some influence on it, but maybe you have to be predisposed to collaboration if you’re going to be in statistics.”

SLIDE 7

The Trajectory of the R Project

The growth in CRAN packages is approximately exponential

[Figure: number of CRAN packages by date and R version. Values shown on the plot:]

Date        R Version  CRAN Packages
2001-06-21  1.3        110
2001-12-17  1.4        129
2002-06-12  1.5        162
2003-05-27  1.7        219
2003-11-16  1.8        273
2004-06-05  1.9        357
2004-10-12  2.0        406
2005-06-18  2.1        548
2005-12-16  2.2        647
2006-05-31  2.3        739
2006-12-12  2.4        911
2007-04-12  2.5        1000
2007-11-16  2.6        1300
2008-03-18  2.7        1427

Source of Data: https://svn.r-project.org/R/branches/.

SLIDE 8

The Trajectory of the R Project

But Tukey would want us to plot the residuals

[Figure: residuals plotted against date, 2002 to 2008; vertical axis from -0.06 to 0.06.]
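The "approximately exponential" claim and the residual check could be reproduced along the following lines from the fourteen package counts shown on the growth plot. This is only a sketch: the slides do not state the fitting method, the log base, or the time unit, so the ordinary least-squares fit of log10(packages) on years since the first date is an assumption, and `fit_log_linear` is an illustrative name rather than anything from the talk.

```python
import math
from datetime import date

# (date, R version, number of CRAN packages), as shown on the growth plot.
CRAN = [
    (date(2001, 6, 21), "1.3", 110),
    (date(2001, 12, 17), "1.4", 129),
    (date(2002, 6, 12), "1.5", 162),
    (date(2003, 5, 27), "1.7", 219),
    (date(2003, 11, 16), "1.8", 273),
    (date(2004, 6, 5), "1.9", 357),
    (date(2004, 10, 12), "2.0", 406),
    (date(2005, 6, 18), "2.1", 548),
    (date(2005, 12, 16), "2.2", 647),
    (date(2006, 5, 31), "2.3", 739),
    (date(2006, 12, 12), "2.4", 911),
    (date(2007, 4, 12), "2.5", 1000),
    (date(2007, 11, 16), "2.6", 1300),
    (date(2008, 3, 18), "2.7", 1427),
]


def fit_log_linear(points):
    """Least-squares fit of log10(count) on years since the first date.

    Returns (slope, intercept, residuals). The log10 scale and the
    years-since-start time unit are assumptions, not taken from the slides.
    """
    t0 = points[0][0]
    xs = [(d - t0).days / 365.25 for d, _, _ in points]
    ys = [math.log10(count) for _, _, count in points]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return slope, intercept, residuals


slope, intercept, residuals = fit_log_linear(CRAN)

# Under exponential growth, the count doubles every log10(2) / slope years.
doubling_years = math.log10(2) / slope
print(f"slope (log10 packages per year): {slope:.3f}")
print(f"implied doubling time (years):   {doubling_years:.2f}")

# A pattern in these residuals over time would signal departure from
# a constant exponential growth rate.
for (d, version, _), r in zip(CRAN, residuals):
    print(f"{d.isoformat()}  R {version}  residual {r:+.3f}")
```

A straight line on the log scale corresponds to a constant proportional growth rate, which is why the residuals are the natural place to look for acceleration or slowdown.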

SLIDE 9

The Trajectory of the R Project

The growth rate in the number of messages on R-help has declined

[Figure: mean number of R-help messages per month by year, 1997 to 2008; vertical axis ticks at 50, 100, 200, 500, 1000, 2000.]

Source of Data: https://stat.ethz.ch/pipermail/r-help/.

SLIDE 10

The Trajectory of the R Project

The size of the R Core group has doubled

[Figure: size of the R Core group (number of "R-Core" members, axis from 5 to 20) by year, 1990 to 2005, with the formation of R-Core marked. Points represent changes in membership.]

Sources of Data: Interviews with members of R Core, contributors(), https://svn.r-project.org/R.

SLIDE 11

The Trajectory of the R Project

Activity in the R svn archive by members of R Core has become more unequal

[Figure: "commits" to the R svn archive by members of R Core, by year, 1998 to 2008; vertical axis from 0.3 to 0.8. Two series: Gini coefficient, and proportion by most active member.]

Source of Data: https://svn.r-project.org/R
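The slides do not say how the Gini coefficient was computed. As a sketch only, the two inequality measures named in the figure could be calculated from per-member commit counts as below, using one standard formula for the Gini coefficient. The commit counts here are hypothetical (the real counts come from the svn log and are not reproduced in the slides), and `gini` and `top_share` are illustrative names.

```python
def gini(counts):
    """Gini coefficient of non-negative counts (0 = perfectly equal).

    Uses G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    with x sorted in ascending order and i running from 1 to n.
    """
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def top_share(counts):
    """Proportion of all commits made by the single most active member."""
    total = sum(counts)
    return max(counts) / total if total else 0.0


# Hypothetical commits per R Core member in one year (NOT real data).
commits_by_member = [1200, 400, 250, 90, 40, 15, 5, 0]

print(f"Gini coefficient:             {gini(commits_by_member):.3f}")
print(f"Share of most active member:  {top_share(commits_by_member):.3f}")
```

With this formula the maximum value for n members is (n - 1) / n, reached when one member makes every commit, so coefficients for groups of different sizes are not strictly comparable without a correction.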

SLIDE 12

The Development and Organization of the R Project

A member of the R Core group on how decisions get made: “[W]e [have] a system that [is] democratic but the person who [is] going to do the work [gets] more votes than anybody else.”

SLIDE 13

Development of the R Project

Stage               Initial                     Transitional                       R-Core
Approximate Dates   1990-94                     1994-97                            1997-
Recruitment         Some student participation  Demonstrated interest              Semi-purposive, by invitation
Division of Labour  None                        Developing                         Semi-formal
Hierarchy           None                        Original developers, contributors  Informal

SLIDE 14

Development of the R Project

Stage                          Initial               Transitional                  R-Core
Principal Mode of Cooperation  Direct collaboration  Anarchic voluntarism          Role enactment + voluntarism
Planning                       None                  Implicit                      Partial
Decision-Making                Joint                 Individual                    Semi-consensus
Resolution of Disagreements    Discussion            Largely unnecessary           Discussion, preemption, avoidance
Principal Goal                 Personal development  Reproduce functionality of S  Various, partly conflicting

SLIDE 15

Why Did R Succeed?

The initial developers opened up the project, eventually forming the R Core group (cf., Octave, LispStat) and releasing R under the GPL.

The Core group is immensely talented, with complementary skills.

The project had an initial target: reproducing the functionality of S.

Much of the necessary software beyond the basic R system was already available in S “libraries” (e.g., MASS, survival, nlme).

The S language had already penetrated the statistics community.

S is relatively easy to use (cf., LispStat).

SLIDE 16

Why Did R Succeed?

The package system, introduced early on, permitted participation with minimal direct intervention by R Core.

The package system serves partially to circumvent disputes.

The R Core group successfully leveraged information technology (e.g., version control, e-mail lists, package automation, distribution via the Internet).

R clearly improved on S: e.g., lexical scoping, name spaces, package system.

R runs on all widely used computing platforms (Windows, Mac OS, Linux/Unix).

R is free (in both senses).

SLIDE 17

Can This Success Continue?

Positive Factors

R has a great deal of momentum.

The basic R system is essentially sound, and much of the dynamism of R is in package development.

Many of the factors leading to the initial success of R continue to apply (e.g., talent of R Core).

R has attracted a very large user and developer base.

R is highly visible (e.g., in books and journal articles).

R has powerful advocates.

SLIDE 18

Can This Success Continue?

Negative Factors

The decision-making procedures of R Core were perhaps better suited to an earlier stage in the development of the software and a smaller Core group.

E.g., failure to resolve long-standing issues (no multi-threading, weakness in handling very large data sets).

There is no general plan for the development of R.

Possible over-dependence on a few key individuals without a clear plan for succession.

The current organization of CRAN may not be sustainable (e.g., users already suffer from information overload).
