@Project_TIER www.projecttier.org
Making Replication Documentation Useful To You and Others: Purposes, Principles and Practices
Richard Ball, Professor of Economics, Haverford College; Director, Project TIER
Tomas Dvorak, Professor of Economics, Union College; 2015-16 TIER Faculty Fellow
Cornell University, Department of Applied Economics, September 14-15, 2017
Project TIER is supported by the Alfred P. Sloan Foundation.

DIMENSIONS OF THE RESEARCH TRANSPARENCY MOVEMENT IN THE SOCIAL SCIENCES
--Computational reproducibility
--Experimental replicability
--Project registration and pre-analysis plans
--P-hacking
--Publication bias

Resources for learning more:
--Ted Miguel’s spring 2015 graduate course on research transparency (syllabus and videos of 14 lectures): http://www.bitss.org/education/economics-270d/
--Miguel and Christensen, forthcoming in JEL: http://emiguel.econ.berkeley.edu/assets/miguel_research/78/Transparency-JEL-2016-12-20.pdf
--BITSS MOOC: https://www.bitss.org/events/mooc-transparent-and-open-social-science/

Key initiatives:
--Berkeley Initiative for Transparency in the Social Sciences: www.bitss.org
--Center for Open Science: https://cos.io

COMPUTATIONAL REPRODUCIBILITY OF SOCIAL SCIENCE RESEARCH: HISTORICAL CONTEXT
Serious problems were recognized decades ago, and despite some progress, they persist.
Concern about the reproducibility of published economic research was sparked by a 1986 study known as the “Journal of Money, Credit and Banking (JMCB) Project.”
Dewald, William G., Jerry G. Thursby, and Richard G. Anderson (1986). “Replication in Empirical Economics: The Journal of Money, Credit and Banking Project.” American Economic Review 76(4): 587-603.

The JMCB Project
Editors of the JMCB attempted to reproduce the statistical results reported in a large sample of the empirical papers published in that journal in the preceding five years. Requests for replication data and code were sent to authors of 154 papers.
--In 37 cases (24%), the authors did not reply to the request.
--In 24 cases (16%), the authors replied, but either refused to send data and code, or said they would but never did.
--In 3 cases (2%), the authors said they could not provide the data because it was proprietary or confidential.
--In the remaining 90 cases (58%), the authors sent some information in response to the request.

The JMCB Project (continued)
Of the 90 submissions received, the first 54 were investigated for completeness and accuracy.
--In only 8 of those 54 cases (15%) did the documentation provided by the authors successfully replicate the results of their papers.
--The remaining 46 papers (85%) could not be replicated because the information the authors submitted was insufficiently complete or precise.

Conclusions of the JMCB Project
The authors of the JMCB study concluded:
“Our findings suggest that inadvertent errors in published empirical articles are a commonplace rather than a rare occurrence.”
and
“…we recommend that journals require the submission of programs and data at the time empirical papers are submitted. The description of sources, data transformations, and econometric estimators should be so exact that another researcher could replicate the study and, it goes without saying, obtain the same results.”

Subsequent studies show problems persist. A few examples:
--McCullough, Bruce D., Kerry Anne McGeary, and Teresa D. Harrison (2006). “Lessons from the JMCB Archive.” Journal of Money, Credit and Banking 38(4): 1093-1107.
--McCullough, Bruce D., Kerry Anne McGeary, and Teresa D. Harrison (2008). “Do Economics Journal Archives Promote Replicable Research?” Canadian Journal of Economics 41(4): 1406-1420.
--Hoeffler, Jan (2014). “Teaching Replication in Quantitative Empirical Economics.” Presented at the Meetings of the European Economic Association and the Econometric Society, Toulouse, France, August 28. http://www.eea-esem.com/eea-esem/2014/prog/viewpaper.asp?pid=3108.
--Chang, Andrew C., and Phillip Li (2015). “Is Economics Research Replicable? Sixty Published Papers from Thirteen Journals Say ‘Usually Not.’” Finance and Economics Discussion Series 2015-083. Washington: Board of Governors of the Federal Reserve System. http://dx.doi.org/10.17016/FEDS.2015.083.

Fixing reproducibility problems means fixing replication documentation
--Better guidelines and standards need to be formulated
--And then somehow researchers need to be induced to adopt them

But haven’t a lot of standards and guidelines for replication documentation been formulated already?
--Journals have policies for replication archives (e.g., AEA journals: https://www.aeaweb.org/journals/policies/data-availability-policy)
--DA-RT: https://www.dartstatement.org/
--TOP: https://cos.io/our-services/top-guidelines/
--BITSS manual: http://www.bitss.org/resources/manual-of-best-practices/

ALSO:
--TIER Protocol: http://www.projecttier.org/tier-protocol/
--DRESS Protocol: http://www.projecttier.org/tier-protocol/dress-protocol/

PURPOSES OF REPLICATION DOCUMENTATION
Not catching mistakes
Rather:
--Exploration
--Experimentation
--Extension

PRINCIPLES
--Complete (“soup-to-nuts”)
--Portable
--The “seriously, folks” principle

PRACTICES
--Establish a fixed folder structure
--Pay attention to the working directory
--Use relative directory paths

Let’s see some examples:
--A toy demo: the midlife crisis paper
--A real research paper: Joseph Price & Justin Wolfers, 2010. "Racial Discrimination Among NBA Referees," The Quarterly Journal of Economics, MIT Press, vol. 125(4), pages 1859-1887, November.
Both examples use a Stata/Word cut-and-paste approach.

Folder Structure
Figure out what works for you, but generally:
--one main project folder
--pdf of paper
--subfolder for data
--subfolder for code
--subfolder for supporting information (like citations of sources and codebooks for original data)
--read-me file
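
The folder structure above can be set up once with a short script. The following is only a minimal sketch in Python (the examples in this talk use Stata); the folder names, the read-me file name, and the helper `make_skeleton` are all hypothetical choices, not part of any protocol.

```python
from pathlib import Path
import tempfile

# Hypothetical folder names; adapt them to whatever works for your project.
SUBFOLDERS = ["data", "code", "supporting-info"]


def make_skeleton(parent: Path, project_name: str) -> Path:
    """Create one main project folder containing the subfolders listed above."""
    root = parent / project_name
    for name in SUBFOLDERS:
        (root / name).mkdir(parents=True, exist_ok=True)
    # Placeholder read-me; the PDF of the paper would be copied in next to it.
    readme = root / "README.txt"
    if not readme.exists():
        readme.write_text("Describe the contents and how to run the code here.\n")
    return root


# Demonstration in a throwaway directory, so nothing is written to a real project.
with tempfile.TemporaryDirectory() as tmp:
    root = make_skeleton(Path(tmp), "my-project")
    created = sorted(p.name for p in root.iterdir())
    print(created)
```

Calling `make_skeleton` twice on the same folder is harmless: existing folders and an existing read-me file are left alone.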

That whole packet is the medium of communication
The idea is that anyone working with your replication documentation (rep doc) installs the whole packet on their computer, and keeps the folder structure and file organization intact while they work with your stuff.

In the data folder:
Assuming the data are public, you need the original data files: before you have processed them at all, in whatever format they were in when you first got them (or else use “netuse” if there is a stable site your software can grab the files from).
--What about intermediate data files?
--What about analysis data files?

In the code folder:
Soup to nuts: commands that read the data from the original data files, all the way to the commands that generate the figures, tables and other results you report in your paper, and all the processing in between.
--All one long script?
--Separate scripts for separate stages of analysis (import, process, analyze)?
--Different scripts for different data sources?
--Put tons of comments in the code
----Literate programming??
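
One way to picture the "soup to nuts" idea with separate stages is sketched below. This is an illustration in Python under made-up assumptions, not the approach used in the talk's Stata examples: the file names (`original.csv`, `mean_age.txt`), the `output` folder, the `age` variable, and all function names are hypothetical.

```python
import csv
import tempfile
from pathlib import Path
from statistics import mean


def import_raw(raw_file: Path) -> list:
    """Stage 1 (import): read the original data file exactly as it was obtained."""
    with raw_file.open(newline="") as f:
        return list(csv.DictReader(f))


def process(rows: list) -> list:
    """Stage 2 (process): drop rows with a missing age and convert age to int."""
    return [{"age": int(r["age"])} for r in rows if r["age"].strip()]


def analyze(rows: list) -> float:
    """Stage 3 (analyze): compute the number that would be reported in the paper."""
    return mean(r["age"] for r in rows)


def run_all(project_root: Path) -> float:
    """Run every stage in order, from the original data to the reported result."""
    rows = import_raw(project_root / "data" / "original.csv")
    clean = process(rows)
    result = analyze(clean)
    out_dir = project_root / "output"  # hypothetical folder for results
    out_dir.mkdir(exist_ok=True)
    (out_dir / "mean_age.txt").write_text(f"{result}\n")
    return result


# Demonstration with made-up data in a temporary project folder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data").mkdir()
    (root / "data" / "original.csv").write_text("id,age\n1,40\n2,\n3,50\n")
    demo_result = run_all(root)
    print(demo_result)
```

Whether the stages live in one long script or in separate files is the open question the slide raises; the point of the sketch is only that a single entry point runs everything between the original data and the reported result.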

Pay attention to the working directory:
--For each command file, choose a folder that should be designated as the working directory (wd) when the user runs the command file, and put a comment at the top of the do-file indicating which folder that is
--Suggested conventions:
----Always designate the main project folder that contains all the rep doc as the working directory
----Avoid using change directory commands
----Instead, use relative directory paths
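
The conventions above might look roughly like this in a script. It is a hedged sketch in Python, not in Stata; the subfolder names, the file names, and the helpers `is_project_root` and `require_project_root` are hypothetical and only illustrate the idea of checking the working directory and then using relative paths.

```python
import tempfile
from pathlib import Path

# WORKING DIRECTORY: the main project folder (the one that contains data/ and code/).

# Hypothetical subfolder names that identify the main project folder.
EXPECTED_SUBFOLDERS = ("data", "code")

# Relative paths: no drive letter or user name, so they work on any computer
# once the working directory is the main project folder.
RAW_DATA = Path("data") / "original.csv"   # hypothetical file name
RESULTS = Path("output") / "table1.txt"    # hypothetical file name


def is_project_root(folder: Path) -> bool:
    """Return True if the folder contains every expected subfolder."""
    return all((folder / name).is_dir() for name in EXPECTED_SUBFOLDERS)


def require_project_root() -> Path:
    """Stop with a clear message if the working directory is not the project folder."""
    wd = Path.cwd()
    if not is_project_root(wd):
        raise RuntimeError(f"Set the working directory to the main project folder, not {wd}")
    return wd


# Demonstration on a temporary folder, without changing the working directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    empty_ok = is_project_root(root)
    for name in EXPECTED_SUBFOLDERS:
        (root / name).mkdir()
    full_ok = is_project_root(root)
    print(empty_ok, full_ok)
```

A script would call `require_project_root()` once at the top and then refer to files only through relative paths such as `RAW_DATA`, with no change-directory commands anywhere.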
