Apache Roles Dataset Dr. Megan Squire Elon University, NC - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Apache Roles Dataset

  • Dr. Megan Squire

Elon University, NC FLOSSmole.org

slide-2
SLIDE 2

Goal

Collect a giant list of all the Apache participants, what their role(s) are within Apache, and on what project.

slide-3
SLIDE 3

Goal

Collect a giant list of all the Apache participants, what their role(s) are within Apache, and on what project. Timestamp everything so when I collect this data again, I can see changes.

slide-4
SLIDE 4

Why Apache?

  • Big (200+ subprojects)
  • Transparent
  • Popular to study
slide-5
SLIDE 5

What I'm looking for

Alice is a committer on CouchDB.

(person, role, project)

slide-6
SLIDE 6

Complete record

On May 19, 2013 I learned from the Minutes of the April Board Meeting that Alice is a committer on CouchDB.

person, role, project; date collected + source = datasource_id
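A minimal sketch of such a record, and of how timestamped collection runs could be compared to see changes. The class name `RoleRecord`, the helper `changes`, and the second (later) run are assumptions made for illustration; the dataset itself folds date and source into a `datasource_id`.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RoleRecord:
    person: str
    role: str
    project: str
    date_collected: date
    source: str  # where the fact was found, e.g. a set of board minutes

    def fact(self) -> tuple[str, str, str]:
        """The (person, role, project) triple, ignoring when/where it was seen."""
        return (self.person, self.role, self.project)


def changes(old: list[RoleRecord], new: list[RoleRecord]):
    """Return (added, removed) facts between two collection runs."""
    old_facts = {r.fact() for r in old}
    new_facts = {r.fact() for r in new}
    return new_facts - old_facts, old_facts - new_facts


# The May record matches the slide; the later run is invented for the example.
may_run = [
    RoleRecord("Alice", "committer", "CouchDB", date(2013, 5, 19),
               "Minutes of the April Board Meeting"),
]
later_run = may_run + [
    RoleRecord("Alice", "PMC Chair", "CouchDB", date(2013, 8, 1),
               "hypothetical later source"),
]

added, removed = changes(may_run, later_run)
print("added:", added)
print("removed:", removed)
```

Comparing only the (person, role, project) triple keeps a re-collected fact from showing up as a change just because its timestamp moved.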

slide-7
SLIDE 7

Why is this useful?

Discussions that require knowledge of roles:

  • Power relations
  • Leadership structure
  • Group dynamics
  • Promotion and retention
  • Decision-making
  • Workload, responsibilities
slide-8
SLIDE 8

Data sources

  • List of Committers (CLA signers)
  • List of CLA signers (not yet committers)
  • List of Committers (self-submitted)
  • List of ASF members
  • Minutes of Board Meetings
  • Individual Project Team Listings
slide-9
SLIDE 9

List of CLA-signed committers

http://people.apache.org/committer-index.html

slide-10
SLIDE 10

List of CLA Signers, not yet committers

http://people.apache.org/committer-index.html#unlistedclas

slide-11
SLIDE 11

Committers by ID (self-submitted)

slide-12
SLIDE 12

Committers by ID (self-submitted)

http://people.apache.org/list_B.html

slide-13
SLIDE 13

Members of the ASF

http://www.apache.org/foundation/members.html

slide-14
SLIDE 14

Board Meeting Minutes

"initial members" "Vice President" apache "svn id" given

slide-15
SLIDE 15

Board Meeting Minutes

"committers" VP/PMC chair location within the data source

slide-16
SLIDE 16

Individual Project "Team" Pages

company and URL

slide-17
SLIDE 17

Team Pages, another version

slide-18
SLIDE 18

Team Pages, yet another version

slide-19
SLIDE 19

End result

everything on FLOSSmole:

  • in MySQL, and
  • at our Google Code site
slide-20
SLIDE 20

Sample data

Column Name     | Key    | Nullable? | Sample Row
----------------|--------|-----------|--------------------
svn_id          | PK     |           |
real_name       | PK     |           | Luis Bernardo
web_site        |        | Yes       |
datasource_id   | PK, FK |           | 375
project_name    | PK     |           | Apache XML Graphics
role_on_project | PK     |           | Committer
details         |        | Yes       | Attachment BC
email           |        | Yes       |
organization    |        | Yes       |
last_updated    |        |           | 2013-04-14 20:25:08
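The column list above can be sketched as a table definition. This uses Python's built-in sqlite3 as a stand-in for MySQL; the column types are assumptions (the slide gives only names, keys, and nullability), the foreign key on `datasource_id` is left out because the referenced table is not shown, and the empty `svn_id` simply mirrors the blank cell in the sample row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE apache_people_projects (
        svn_id          TEXT NOT NULL,
        real_name       TEXT NOT NULL,
        web_site        TEXT,
        datasource_id   INTEGER NOT NULL,
        project_name    TEXT NOT NULL,
        role_on_project TEXT NOT NULL,
        details         TEXT,
        email           TEXT,
        organization    TEXT,
        last_updated    TEXT NOT NULL,
        PRIMARY KEY (svn_id, real_name, datasource_id,
                     project_name, role_on_project)
    )
""")

insert = "INSERT INTO apache_people_projects VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)"
row = ("", "Luis Bernardo", None, 375, "Apache XML Graphics", "Committer",
       "Attachment BC", None, None, "2013-04-14 20:25:08")
conn.execute(insert, row)

# The same fact from the same datasource is rejected by the composite key.
try:
    conn.execute(insert, row)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute(
    "SELECT COUNT(*) FROM apache_people_projects").fetchone()[0]
print(duplicate_rejected, row_count)
```

Because `datasource_id` is part of the key, the same person, project, and role can still be stored again under a new datasource, which is what lets later collection runs sit alongside earlier ones.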

slide-21
SLIDE 21

Dig in...

SELECT * FROM `apache_people_projects`
WHERE real_name LIKE "Alice%"
  AND role_on_project IN ("PMC Chair", "Vice President");

more inspiring examples in the paper...
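When filtering on more than one role, the role test needs `IN` (or parentheses with a repeated column name); a bare `OR 'Vice President'` is not compared against `role_on_project`. A small sketch of the difference, using sqlite3 as a stand-in for MySQL and invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE apache_people_projects (real_name TEXT, role_on_project TEXT)")
conn.executemany(
    "INSERT INTO apache_people_projects VALUES (?, ?)",
    [
        ("Alice Example", "PMC Chair"),       # invented rows
        ("Alice Sample", "Vice President"),
        ("Alice Other", "Committer"),
        ("Bob Example", "Vice President"),
    ],
)

# The bare string after OR is evaluated on its own as a (false) boolean,
# so this query misses the Vice President row.
loose = conn.execute("""
    SELECT real_name FROM apache_people_projects
    WHERE real_name LIKE 'Alice%'
      AND role_on_project = 'PMC Chair' OR 'Vice President'
    ORDER BY real_name
""").fetchall()

# IN tests the column against both values.
strict = conn.execute("""
    SELECT real_name FROM apache_people_projects
    WHERE real_name LIKE 'Alice%'
      AND role_on_project IN ('PMC Chair', 'Vice President')
    ORDER BY real_name
""").fetchall()

print("loose: ", loose)
print("strict:", strict)
```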

slide-22
SLIDE 22

Got a better idea?

Make a dataset and send it to FLOSSmole.

slide-23
SLIDE 23

Apache Twitter Dataset

  • Dr. Megan Squire

Elon University, NC FLOSSmole.org

slide-24
SLIDE 24

Goal

Collect a giant list of Twitter handles for Apache-affiliated people and projects.

slide-25
SLIDE 25

Goal

Collect a giant list of Twitter handles for Apache-affiliated people and projects. Timestamp this so I can see new entries.

slide-26
SLIDE 26

Why is Twitter useful?

  • Just another interesting dev artifact
  • Useful for social mining
  • Everybody's doing it
slide-27
SLIDE 27

Data sources

Directly from Apache

  • project/team web pages
  • listed personal blogs
slide-28
SLIDE 28

Data sources

Directly from Apache

  • project/team web pages
  • listed personal blogs

Indirectly

  • Search on Twitter using items from my own Apache Roles dataset
      ○ names
      ○ emails
      ○ svn
      ○ project names

slide-29
SLIDE 29

From team pages

no standard way of writing this on any team page; lots of unstructured data

slide-30
SLIDE 30

From individual blogs

write programs to follow the link, try to discern the Twitter name, if any
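One way the "discern the Twitter name" step might look, once a blog page has been downloaded. This is a sketch, not the program used for the dataset: the pattern, the `NOT_HANDLES` set, and the sample HTML are assumptions, and fetching the page is left out.

```python
import re

# Links of the form twitter.com/<handle>, with an optional old-style "#!/".
# Twitter handles are at most 15 characters: letters, digits, underscore.
HANDLE_LINK = re.compile(
    r"https?://(?:www\.)?twitter\.com/(?:#!/)?@?"
    r"([A-Za-z0-9_]{1,15})(?![A-Za-z0-9_])",
    re.IGNORECASE,
)

# Path segments that look like handles but are site features (partial list).
NOT_HANDLES = {"share", "intent", "home", "search", "widgets"}


def find_twitter_handles(html: str) -> list[str]:
    """Return candidate handles found in links, in order, without repeats."""
    found: list[str] = []
    for match in HANDLE_LINK.finditer(html):
        handle = match.group(1)
        if handle.lower() not in NOT_HANDLES and handle not in found:
            found.append(handle)
    return found


# Invented page fragment for illustration.
sample_html = """
<a href="http://twitter.com/share?url=x">Tweet this</a>
<a href="https://twitter.com/#!/claudiomartella">Follow me</a>
"""
handles = find_twitter_handles(sample_html)
print(handles)
```

A match here is only a candidate: a blog may link to someone else's account, so each result still needs to be confirmed by hand.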

slide-31
SLIDE 31

Using Twitter itself: API

Let program try to guess Twitter handle using known Apache data
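A sketch of the guessing step: build plausible handles from a known name and svn id, then look each one up. The function name and the particular combinations tried are assumptions; the actual Twitter API lookup is not shown.

```python
import re


def candidate_handles(real_name: str, svn_id: str = "") -> list[str]:
    """Guess plausible Twitter handles from known Apache data."""
    parts = re.findall(r"[a-z0-9]+", real_name.lower())
    candidates: list[str] = []
    if svn_id:
        candidates.append(svn_id.lower())
    if len(parts) >= 2:
        first, last = parts[0], parts[-1]
        candidates += [
            "".join(parts),        # firstlast
            f"{first}_{last}",     # first_last
            f"{first[0]}{last}",   # flast
            f"{last}{first}",      # lastfirst
        ]
    elif parts:
        candidates.append(parts[0])

    # Keep order, drop repeats, and drop anything over Twitter's 15-char limit.
    result: list[str] = []
    for handle in candidates:
        if 0 < len(handle) <= 15 and handle not in result:
            result.append(handle)
    return result


guesses = candidate_handles("Claudio Martella", svn_id="claudio")
print(guesses)
```

Each guess that resolves to a real account is still only a lead, since unrelated people can share a name or an id.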

slide-32
SLIDE 32

Using Twitter itself: manual

Curation! Always confirm that this is really an Apache person.

slide-33
SLIDE 33

End Result

everything on FLOSSmole:

  • in MySQL, and
  • at our Google Code site

about 500 of these

slide-34
SLIDE 34

Sample data

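Column Name         | Key    | Nullable? | Sample Row 1        | Sample Row 2
--------------------|--------|-----------|---------------------|-------------------------
svn_id              | FK     | Yes       | claudio             |
twitter_screen_name | PK     |           | claudiomartella     | ApacheShiro
real_name           |        | Yes       | Claudio Martella    |
datasource_id       | PK, FK |           | 372                 | 370
project_name        |        | Yes       |                     | Apache Shiro
details             |        | Yes       |                     | http://shiro.apache.org/
last_updated        |        |           | 2013-03-27 00:00:00 | 2013-01-24 14:01:56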
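Since `svn_id` appears in both datasets, the Twitter handles can be joined back to roles. A minimal sketch with sqlite3 standing in for MySQL; the table name `apache_twitter` is an assumption (the slides do not name the Twitter table), both tables are cut down to a few columns, and the roles row is a placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE apache_people_projects (
        svn_id TEXT, project_name TEXT, role_on_project TEXT);
    CREATE TABLE apache_twitter (
        svn_id TEXT, twitter_screen_name TEXT, real_name TEXT);

    -- Placeholder roles row; the project name is not from the dataset.
    INSERT INTO apache_people_projects
        VALUES ('claudio', 'Example Project', 'Committer');
    -- Person row from the sample data above.
    INSERT INTO apache_twitter
        VALUES ('claudio', 'claudiomartella', 'Claudio Martella');
    -- Project accounts have no svn_id, so they drop out of the join.
    INSERT INTO apache_twitter
        VALUES (NULL, 'ApacheShiro', NULL);
""")

joined = conn.execute("""
    SELECT t.twitter_screen_name, p.project_name, p.role_on_project
    FROM apache_twitter AS t
    JOIN apache_people_projects AS p ON p.svn_id = t.svn_id
""").fetchall()
print(joined)
```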

slide-35
SLIDE 35

How to help

  • Anyone can contribute data to FLOSSmole
  • We'll help you store it (even if it's very large)
  • We'll get you a DOI for it