The Dataverse Network: An Infrastructure for Data Sharing (Gary King, PowerPoint presentation)


The Dataverse Network: An Infrastructure for Data Sharing

Gary King Institute for Quantitative Social Science Harvard University

(8/14/08 talk at “UseR! 2008”, Technische Universität Dortmund, Germany)



Papers

• Gary King, “An Introduction to the Dataverse Network as an Infrastructure for Data Sharing,” Sociological Methods and Research, 36, 2 (November, 2007): 173–199.
• Micah Altman and Gary King, “A Proposed Standard for the Scholarly Citation of Quantitative Data,” D-Lib Magazine, 13, 3/4 (March/April, 2007).
• Kosuke Imai, Gary King, and Olivia Lau, “Toward A Common Framework for Statistical Analysis and Development,” Journal of Computational and Graphical Statistics, forthcoming. (Zelig)

More information: http://TheData.org

Gary King (Harvard) Dataverse Network 2 / 21


Infrastructure for Quantitative Data

Accessibility:

• Most large data sets: in public archives
• Most data in published articles: not accessible; results not replicable without the original author

Problems even with professional archives:

• Data in different archives have different identifiers
• One major archive renumbered all its acquisitions
• Changes to data are made; identifiers are reused or deaccessioned; old data are lost

Data sets are not like books:

• Static data files (even if on the web): unreadable after a few years
• When storage methods change: some data sets are lost; others have altered content!

Connection to analysis software (like R):

• uncertain, time-consuming, annoying, error-prone

Gary King (Harvard) Dataverse Network 3 / 21


What About a Centralized Data Access Solution?

• Highly desirable when feasible
• Works great in astronomy, etc., when data formats are universal, goals are common, and agreements are in place
• Impossible when data are heterogeneous in format, origin, size, effort needed to collect or analyze, IRB access rules, etc.

Why don’t researchers put data in public archives?

• The archive gets the credit
• Upon questioning: they want credit, control, and visibility
• (So why don’t they worry about print publishers getting all the credit? Lack of data citations!)

We propose: technological solutions to these political problems

Gary King (Harvard) Dataverse Network 4 / 21


Requirements for Effective Data Sharing Infrastructure

• Recognition: for authors, journals, etc., in (1) citations to data, (2) citations to associated articles, and (3) visibility on the web
• Public Distribution: without permission from the author
• Authorization: fulfill requirements the author originally met
• Validation: check that data exist, without authorization
• Persistence: decades from now . . .
• Verification: data remain unchanged, even if converted from SPSS to Stata to R, from a PC to a Mac to Linux, and from 8-inch magnetic tape to 5.25-inch floppies to a DVD
• Ease of Use: neither editors nor authors employ professional archivists
• Legal Protection:
  • Journals have liability protection for print; none for data
  • In the U.S., if you put data on the web without IRB approval, you are violating federal regulations (IRB approval must be for data distribution, not merely for the study)
  • Solution must not require lawyers (we’ve automated the IRB)

Gary King (Harvard) Dataverse Network 5 / 21

Rules for Citing Printed Matter

Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note on Factor Analyzing Dichotomous Variables: The Case of Political Participation,” Political Methodology, Vol. 4: No. 2 (Spring): Pp. 39–62.

• First author (last name first)
• Second author
• Third author
• Year
• Article title
• Journal (no longer exists)
• Volume number
• Issue number
• Season
• Pages
• Special formatting codes
• Special indentation

Citations: rule-based, precise, redundant

Print citations work: authors don’t think publishers get all the credit; cited articles can be found; copyeditors don’t need to see the original to know it exists; the link from citation to print persists

Gary King (Harvard) Dataverse Network 6 / 21


A New Citation Standard for Numeric Data

Sidney Verba, 1998, “Political Participation Data”, hdl:1902.4/00754, UNF:3:6:ZNQRI14053UZq389x0Bffg?== Annals of Applied Statistics [Distributor]; NORC [Producer].

1. Author
2. Year
3. Title
4. Unique Global Identifier: will work after URLs stop working
5. Linked to a Bridge Service (presently a URL: http://id.thedata.org/hdl%3A1902.4%2F00754)
6. Universal Numeric Fingerprint (UNF)
7. Standard rules for adding citation elements

Gary King (Harvard) Dataverse Network 7 / 21
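The bridge-service link in item 5 amounts to percent-encoding the whole global identifier and appending it to a resolver prefix. The Python sketch below shows that step, plus assembly of the five fields that appear in the citation string. The prefix `http://id.thedata.org/` is taken from the slide, which describes it only as the present URL, so it may no longer resolve; the names `BRIDGE_PREFIX`, `bridge_url`, and `format_data_citation` are hypothetical and not part of any Dataverse API.

```python
from urllib.parse import quote

# Assumption: resolver prefix as shown on the slide ("presently a URL");
# it may have changed or been retired since 2008.
BRIDGE_PREFIX = "http://id.thedata.org/"


def bridge_url(global_id: str) -> str:
    """Percent-encode the entire identifier (':' and '/' included)
    and append it to the bridge prefix."""
    return BRIDGE_PREFIX + quote(global_id, safe="")


def format_data_citation(author: str, year: int, title: str,
                         global_id: str, unf: str) -> str:
    """Join author, year, title, identifier, and UNF in the slide's order."""
    return f'{author}, {year}, "{title}", {global_id}, {unf}'


if __name__ == "__main__":
    hdl = "hdl:1902.4/00754"
    print(bridge_url(hdl))  # http://id.thedata.org/hdl%3A1902.4%2F00754
    print(format_data_citation("Sidney Verba", 1998,
                               "Political Participation Data", hdl,
                               "UNF:3:6:ZNQRI14053UZq389x0Bffg?=="))
```

Because the citation carries the identifier rather than the URL, moving the resolver would only change `BRIDGE_PREFIX`; every printed citation stays valid.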


Data to Universal Numeric Fingerprints

                     1 4 4 21 · · · 121 1 2 2 91 · · · 212 1 9 2 72 · · · 104 2 2 2 · · · 321 1 6 2 12 · · · 204 1 9 4 52 · · · 311 3 2 23 · · · 92 2 5 91 · · · 212 5 8 91 · · · 91 1 9 1 72 · · · 104 . . . . . . . . . . . . ... . . . 1 2 2 91 · · · 212                      = ⇒ ZNQRI14053UZq389x0Bffg?==

Gary King (Harvard) Dataverse Network 8 / 21
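The slide shows a data matrix being reduced to a short fingerprint. The Python sketch below illustrates the general idea under stated assumptions; it is not the published UNF algorithm and will not reproduce the UNF in the citation. It assumes each value is normalized by rounding to a fixed number of significant digits (7 here) and printing it in exponential notation, uses SHA-256 from `hashlib` as a stand-in hash, hashes each variable (column) separately, and keeps 128 bits of the combined digest before base64 encoding. The names `normalize`, `fingerprint`, and `MISSING` are hypothetical.

```python
import base64
import hashlib
import math
from typing import Optional, Sequence

# Hypothetical placeholder for missing values (an assumption of this sketch).
MISSING = "<NA>"


def normalize(value: Optional[float], digits: int = 7) -> str:
    """Canonical text for one value: `digits` significant digits in fixed
    exponential notation, so 2, 2.0 and 2.00000001 all map to '+2.000000e+00'."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return MISSING
    return f"{float(value):+.{digits - 1}e}"


def fingerprint(columns: Sequence[Sequence[Optional[float]]], digits: int = 7) -> str:
    """Hash each column's normalized values, then hash the column digests."""
    outer = hashlib.sha256()
    for column in columns:
        inner = hashlib.sha256()
        for value in column:
            # NUL-terminate each value so adjacent values cannot run together.
            inner.update(normalize(value, digits).encode("utf-8") + b"\x00")
        outer.update(inner.digest())
    # Keep 128 bits and base64-encode them: 24 printable characters.
    return base64.b64encode(outer.digest()[:16]).decode("ascii")


if __name__ == "__main__":
    # First three variables of the first four rows of the matrix,
    # given column by column.
    columns = [[1, 1, 1, 2], [4, 2, 9, 2], [4, 2, 2, 2]]
    print(fingerprint(columns))
```

Normalizing before hashing is what makes the result depend on the numbers rather than on how they happen to be stored: integer 2 and floating-point 2.0 hash identically, while a change above the chosen precision alters the output.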


slide-84
SLIDE 84

Advantages of UNFs

UNF is calculated from the content not the file: Its the Same UNF regardless of changes in computer hardware, storage medium,

  • perating system, statistical software, database, or spreadsheet

software. Cryptographic technology: any change in data content changes the

  • UNF. (cannot tinker after the fact!)

Noninvertible properties

UNFs convey no information about data content

Gary King (Harvard) Dataverse Network 9 / 21

slide-85
SLIDE 85

Advantages of UNFs

UNF is calculated from the content not the file: Its the Same UNF regardless of changes in computer hardware, storage medium,

  • perating system, statistical software, database, or spreadsheet

software. Cryptographic technology: any change in data content changes the

  • UNF. (cannot tinker after the fact!)

Noninvertible properties

UNFs convey no information about data content OK to distribute for highly sensitive, confidential, or proprietary data

Gary King (Harvard) Dataverse Network 9 / 21

slide-86
SLIDE 86

Advantages of UNFs

UNF is calculated from the content not the file: Its the Same UNF regardless of changes in computer hardware, storage medium,

  • perating system, statistical software, database, or spreadsheet

software. Cryptographic technology: any change in data content changes the

  • UNF. (cannot tinker after the fact!)

Noninvertible properties

UNFs convey no information about data content OK to distribute for highly sensitive, confidential, or proprietary data Copyeditor can validate data’s existence even without authorization

Gary King (Harvard) Dataverse Network 9 / 21

slide-87
SLIDE 87

Advantages of UNFs

UNF is calculated from the content not the file: Its the Same UNF regardless of changes in computer hardware, storage medium,

  • perating system, statistical software, database, or spreadsheet

software. Cryptographic technology: any change in data content changes the

  • UNF. (cannot tinker after the fact!)

Noninvertible properties

UNFs convey no information about data content OK to distribute for highly sensitive, confidential, or proprietary data Copyeditor can validate data’s existence even without authorization

The citation refers to one specific data set that can’t ever be altered, even if journal doesn’t keep a copy

Gary King (Harvard) Dataverse Network 9 / 21

slide-88
SLIDE 88

Advantages of UNFs

• The UNF is calculated from the content, not the file:
  • It's the same UNF regardless of changes in computer hardware, storage medium, operating system, statistical software, database, or spreadsheet software.
  • Cryptographic technology: any change in data content changes the UNF (you cannot tinker after the fact!).
• Noninvertible properties:
  • UNFs convey no information about data content, so they are OK to distribute even for highly sensitive, confidential, or proprietary data.
  • A copyeditor can validate the data's existence even without authorization.
• The citation refers to one specific data set that can't ever be altered, even if the journal doesn't keep a copy.
• Future researchers can quickly check that they have the same data the author used: merely recalculate the UNF.
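The "content, not the file" and "recalculate the UNF" points can be made concrete with a minimal Python sketch. It is not the published UNF algorithm. Parsing delimited text, rounding each value to 7 significant digits, hashing with SHA-256, keeping 128 bits, and the names `content_unf` and `verify` are assumptions for illustration. Two files holding the same numbers in different layouts get different file hashes but the same content fingerprint, and a one-value change is caught on verification. Only the digest is published, not the values themselves.

```python
import base64
import csv
import hashlib
import hmac
import io


def content_unf(text: str, delimiter: str) -> str:
    """Fingerprint the parsed values of a delimited text file, not its bytes."""
    h = hashlib.sha256()
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        for cell in row:
            # Assumed normalization: 7 significant digits, exponential notation.
            h.update(format(float(cell), ".6e").encode("ascii") + b"\n")
        h.update(b"\x00")  # end of row
    return base64.b64encode(h.digest()[:16]).decode("ascii")


def verify(cited: str, text: str, delimiter: str) -> bool:
    """Recompute the fingerprint of a local copy and compare it with the cited one."""
    return hmac.compare_digest(cited, content_unf(text, delimiter))


# The same data as two different files: comma- vs tab-separated, "1" vs "1.0".
as_csv = "1,4,21\n1,2,91\n"
as_tsv = "1.0\t4.0\t21.0\n1.0\t2.0\t91.0\n"

file_hashes_match = hashlib.sha256(as_csv.encode()).digest() == hashlib.sha256(as_tsv.encode()).digest()
print(file_hashes_match)                                      # False: the files differ byte for byte
print(content_unf(as_csv, ",") == content_unf(as_tsv, "\t"))  # True: the content is identical

cited = content_unf(as_csv, ",")               # the fingerprint printed in the citation
print(verify(cited, as_tsv, "\t"))             # True
print(verify(cited, "1,4,21\n1,2,92\n", ","))  # False: a single value was changed
```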


Web 2.0 Terminology

• Software: find CD, install locally, hit next, hit next, hit next. . .
• Web application software: no installation; load a web browser and run (Dataverse Network Software)
• Host: the computers where the web application software runs (universities, archives, libraries)
• Virtual host: where the web application software seems to run, but does not (web sites of authors, journals, granting agencies, research centers, universities, scholarly organizations, etc.)


Your dataverse: branded as your web site but served by the Dataverse Network, therefore requiring no local installation and providing an enormous array of services

[Diagram: "Your web site", http://www.peterson.com, alongside http://dvn.iq.harvard.edu/peterson, with the logo "powered by the Dataverse Network™ Project"]
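To make the host versus virtual-host distinction concrete, here is a hypothetical Python sketch of how one host could serve many branded dataverses. The host keeps each owner's branding keyed by a dataverse alias and wraps the same hosted content in it. Only the two URLs come from the slide; the `Branding` fields, the header text, `dataverse_url`, and `render_page` are illustrative assumptions, not the DVN software's actual design.

```python
from dataclasses import dataclass
from html import escape
from typing import Dict, List

HOST = "http://dvn.iq.harvard.edu"  # where the application actually runs (URL from the slide)


@dataclass
class Branding:
    """Look and feel supplied by a dataverse owner (hypothetical fields)."""
    site_url: str     # the owner's own web site, i.e. the virtual host
    header_html: str  # the owner's banner, shown on every page of their dataverse


# One host, many virtual hosts: each dataverse alias carries its owner's branding.
DATAVERSES: Dict[str, Branding] = {
    "peterson": Branding(
        site_url="http://www.peterson.com",
        header_html="<header>Peterson data</header>",  # hypothetical banner
    ),
}


def dataverse_url(alias: str) -> str:
    """The link an owner adds to their own site to reach their dataverse."""
    return f"{HOST}/{alias}"


def render_page(alias: str, studies: List[str]) -> str:
    """Serve hosted content (here, a list of study titles) inside the owner's branding."""
    brand = DATAVERSES[alias]
    items = "".join(f"<li>{escape(title)}</li>" for title in studies)
    site = escape(brand.site_url)
    return f'{brand.header_html}<ul>{items}</ul><footer><a href="{site}">Back to {site}</a></footer>'


print(dataverse_url("peterson"))  # http://dvn.iq.harvard.edu/peterson
print(render_page("peterson", ["Replication data for an example article"]))
```

In this sketch, rebranding means editing one `Branding` entry; the hosted data and its URL stay where they are.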

[Screenshots carrying the "powered by the Dataverse Network™ Project" logo]


Your Dataverse

• Full-service virtual archive, with numerous data services (citation, metadata, archiving, subsetting, conversion, translation, analysis, . . . )
• List of your data, or your view of the universe of data
• Branded as yours: with the look and feel of your site
• Easy to set up: give DVN your style, and include a link to your new dataverse
• Easy to manage: no software or hardware installation, no backups, no worrying about archiving standards or data format translations; still exists if you move; easy to rebrand
• High acceptability: experiments indicate > 90% uptake for authors
• Reuse: the same data may appear on different dataverses
• Results: articles with data available have twice the impact factor! (with dataverse, it should be more)


Dataverse Uses

• Authors, for their data or their view of the universe of data
• Journals, for replication data archives
• Future researchers: browse or search for a dataverse or dataset; forward citation search; verification via UNFs; subsetting; read metadata, abstract, & documentation; check for new versions; translate format; statistical analyses; download
• Teachers, for a list or for in-depth analysis
• Sections of scholarly organizations, to organize existing data
• Granting agencies
• Research centers
• Major research projects
• Academic departments, universities, data centers, libraries
• Data archives


The Universe of Data meets the Universe of Methods

R Project for Statistical Computing
  • Nearly 1000 packages; most new methods appear in R first
  • Highly diverse examples, syntax, documentation, and quality
  • Can be difficult for us; harder for applied researchers

Zelig: Everyone's Statistical Software
  • An ontology we developed of almost all statistical methods
  • Users incorporate original packages via a simple model description language (and R bridge functions)
  • Result: unified syntax, the same 3 commands to use any method
  • Easy for applied data analysts who use R

R + Zelig + Dataverse Network
  • Write a Zelig bridge function, and your method appears in the DVN GUI
  • Greatly reduced time from methods development to widespread use
  • Easy for applied researchers who don't use R (GUI time not wasted: save the R code for replication or further analysis)
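Zelig itself is R code, built around three commands: `zelig()` to estimate, `setx()` to choose explanatory values, and `sim()` to simulate quantities of interest. The sketch below only mimics that design in Python to show why a bridge function is enough to make a new method usable through one unchanged interface. The registry, the `register` decorator, the one-predictor least-squares bridge, and the simplified `sim` (residual noise only, no parameter uncertainty) are hypothetical illustrations, not Zelig's API.

```python
import random
import statistics
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Fit:
    """A fitted model as returned by any bridge function."""
    model: str
    intercept: float
    slope: float
    sigma: float  # residual standard deviation


# Registry of methods: writing one bridge function is all it takes to join.
BRIDGES: Dict[str, Callable[[List[float], List[float]], Fit]] = {}


def register(name: str):
    """Decorator that files a bridge function under a model name."""
    def wrap(fn):
        BRIDGES[name] = fn
        return fn
    return wrap


@register("ls")
def least_squares(y: List[float], x: List[float]) -> Fit:
    """Hypothetical bridge: one-predictor least squares, fitted in closed form."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    resid = [b - (intercept + slope * a) for a, b in zip(x, y)]
    sigma = (sum(r * r for r in resid) / (len(y) - 2)) ** 0.5
    return Fit("ls", intercept, slope, sigma)


# The same three commands, whatever the model.
def zelig(y: List[float], x: List[float], model: str) -> Fit:
    """Estimate: dispatch to the registered bridge for `model`."""
    return BRIDGES[model](y, x)


def setx(fit: Fit, **values: float) -> Dict[str, float]:
    """Choose explanatory-variable values (takes the fit only to mirror the command shape)."""
    return dict(values)


def sim(fit: Fit, xvals: Dict[str, float], n: int = 1000, seed: int = 0) -> List[float]:
    """Draw predicted values: point prediction plus residual noise (parameter uncertainty ignored)."""
    rng = random.Random(seed)
    mu = fit.intercept + fit.slope * xvals["x"]
    return [rng.gauss(mu, fit.sigma) for _ in range(n)]


z_out = zelig([1.0, 3.0, 5.0, 7.2], [0.0, 1.0, 2.0, 3.0], model="ls")
x_out = setx(z_out, x=4.0)
s_out = sim(z_out, x_out)
print(round(statistics.fmean(s_out), 2))
```

Adding a method here means writing one more `@register(...)` function; the three user-facing commands stay the same, which is the slide's point about a unified syntax.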


How to participate

To increase citations to your data (& web visibility), choose:
  • Sign up for a free dataverse for your web site (no installations, branded as yours, citations for all your data)
  • or install the DVN software, and you can also give out dataverses

To increase use of your R package through Zelig and the DVN GUI:
  • Write a simple Zelig bridge function

To join us:
  • DVN and Zelig are open source projects; contributions welcome!

For more information:
  • http://TheData.org


Technology used in DVN Software

• Language: Java Enterprise Edition 5 (with EJB3 and JSF) (team picked for JavaOne; Sun engineers regularly call for advice)
• Application server: GlassFish (wrote a press release on our project)
• Database: we use PostgreSQL (can substitute others)
• Statistical computing: R and Zelig
