
Reproducible Research with Stata using version control, GitHub, and MarkDoc
E. F. Haghish
Nov. 17th, 2016

Reproducible Analysis: Overview

Definition

Figure 1: Reproducible Analysis

To reproduce an analysis, we will need:
- the same version of the software, data, and code (and the same OS, depending on the software)
- literate programming software

Figure 2: Version control and literate programming also imply coding the analysis

- The analysis should be reproduced with identical software.
- We should be able to access the software without requesting it from the author.
- The data, code, and software should be publicly accessible.
- All versions of the software used for running the analysis should be accessible, so archiving older versions becomes crucial. For example, the Statistical Software Components (SSC) archive does not keep different versions of a package, in contrast to CRAN.
- For developing computational programs, version control becomes much more important for fixing bugs and collaborating on the software.

Concerns about package archiving

While the idea and importance of archiving versions is clear, some users may have concerns such as:
1. Having access to different versions of a piece of software might confuse users and lead them to install old software.
2. It can leave users unsure where they should install their software from.
3. Some would argue that we simply don't need to archive older software because there is no use for it.
4. Software updates fix bugs. What is the point of using previous versions if we know they are buggy?
5. What is the point of reproducing the same results, using the same software version, when we know they are affected by bugs?

GitHub for the Stata community

GitHub is a general platform that is used for a variety of purposes:
1. sharing data
2. sharing code
3. developing and collaborating on software
4. hosting software for R, Stata, ...
5. archiving software versions
6. documenting software, using the GitHub Wiki
7. reading code in the browser

Learning GitHub

- Using GitHub has a learning curve.
- Using GitHub Desktop can considerably reduce the learning curve.
- GitHub has a desktop GUI for Windows and Mac. Linux users have several third-party software options.
- I recommend SmartGit for Linux users.
- When using GitHub, you still write and update your code on your own computer. Once you have made a change, you can register your commit on your machine (via the app or the command line), and when you are through, you can push it to the repository on the GitHub website. Therefore, the workflow for programming does not change much.

Figure 3: A screenshot of the github package on my local drive, where programming takes place

Figure 4: Once you're done with coding, commit the changes and push them to GitHub

Figure 5: Viewing the history of changes

The github package

It is similar to the ssc command in Stata, but it is used for searching, installing, and uninstalling Stata packages from GitHub. The package can be installed from GitHub using:

    . net install github, from("https://raw.githubusercontent.com/haghish/github/master/")

Such a net install command is usually required for installing any Stata package hosted on GitHub, but the github command makes life easier in many ways.

Examples

Let's search for a package named markdoc on GitHub using the github search command followed by the keyword.
- This searches for all repositories named markdoc that have Stata as their language and are installable packages (that is, have the pkg and toc files in the repository).
- The output shows a description of the package, along with its dependencies, which will be installed automatically.

    . github search markdoc
    --------------------------------------------------------------------------------
     repository  Author   Install  Description
    --------------------------------------------------------------------------------
     MarkDoc     haghish  Install  A literate programming package for Stata
                          3937k    which develops dynamic documents, slides,
                                   and help files in various formats
                                   homepage: http://haghish.com/markdoc
                                   Hits:49 Stars:5 Lang:Stata (Depend)
    --------------------------------------------------------------------------------

- The github command allows a package author to specify the dependencies of the package, which are then installed automatically after the package itself.
- The dependencies are declared in a file named dependency.do, which contains the code for installing either a particular version of each dependency or its latest version.
- Pinning a particular version of a dependency ensures that the package works as the author expects and that recent development of the dependency does not yield unexpected results.

You can install the package with a mouse click, or type github install followed by the username/repository name:

    . github install haghish/markdoc

- Executing the command shows that markdoc installs the weaver package, and that weaver in turn installs its own dependency, a package called statax.
- Having the option to install dependencies allows authors to break their packages into pieces, which allows others to rely on the smaller pieces in their own programs.
- The version option makes it safe to rely on a particular version of a package.
- That also means more citations.
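
As a rough illustration of the dependency mechanism, a dependency.do file for a package that relies on weaver might look like the sketch below. The repository path haghish/weaver is an assumption (the slides name the package but not its repository), and <tag> is a placeholder rather than a real release.

```stata
* dependency.do: a minimal sketch, not the actual file shipped with MarkDoc
* Assumption: the weaver package is hosted in the haghish/weaver repository

* Option 1: install the latest version of the dependency
github install haghish/weaver

* Option 2 (alternative, commented out): pin the dependency to one release.
* Replace <tag> with a tag listed by "github query haghish/weaver".
* github install haghish/weaver, version(<tag>)
```

Pinning (option 2) trades automatic bug fixes for stable behavior, which is the trade-off discussed under the concerns about package archiving.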

The versions are in fact GitHub releases, which are easy to make.

Figure 6: Viewing the software releases on GitHub

Clicking on the releases button opens a page where all the previous releases are listed, the fixed bugs are explained, and you can download the old as well as the newest source code.

Figure 7: Creating a new release

Figure 8: Publishing the new release

Accessing releases via Stata

Once a new release is made on GitHub or the package master is updated, the new version becomes available to all users instantly. You can view all of the available versions using the github query command followed by the username/repository.

    . github query haghish/markdoc
    ----------------------------------------
     Version      Release Date      Install
    ----------------------------------------
     3.8.8        2016-11-16        Install
     3.8.7        2016-11-10        Install
     3.8.6        2016-11-10        Install
     3.8.5        2016-10-16        Install
     3.8.4        2016-10-13        Install
     3.8.3        2016-10-03        Install
     3.8.2        2016-10-01        Install
     3.8.1        2016-09-29        Install
     3.8.0        2016-09-24        Install
     3.7.9        2016-09-20        Install
     3.7.8        2016-09-19        Install
     3.7.7        2016-09-18        Install
     3.7.6        2016-09-13        Install
     3.7.5        2016-09-08        Install
     3.7.4        2016-09-07        Install
     3.7.3        2016-09-06        Install
     3.7.2        2016-09-05        Install
     3.7.0        2016-08-23        Install
     3.6.9        2016-08-16        Install
     3.6.7        2016-02-27        Install
    ----------------------------------------

- Clicking on the Install text installs any of the previous versions.
- Alternatively, we can use the version(tag) option to install any version. The tag is the version label that we specify for each release. For example, version 3.8.7 of MarkDoc (an old version) can be installed as follows:

    . github install haghish/markdoc, version(3.8.7)

- The same procedure can be used in the dependency.do file to install a particular version of a package.
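
To tie an analysis to a fixed release, the installation commands from these slides can be collected at the top of a do-file. The following is a minimal sketch under that assumption; the file name setup.do and the capture which check are illustrative choices, not part of the github package.

```stata
* setup.do (hypothetical file name): pin MarkDoc to the release used in the analysis

* Install the github command only if it is not already on the ado-path;
* "which" sets a nonzero return code when the command cannot be found
capture which github
if _rc != 0 {
    net install github, from("https://raw.githubusercontent.com/haghish/github/master/")
}

* Install the exact MarkDoc release with the version() option
github install haghish/markdoc, version(3.8.7)
```

Running this file before the analysis do-file gives collaborators the same package version, provided the release is still available on GitHub.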

Other github subcommands

- You can uninstall a package, which only requires the repository name:

    . github uninstall markdoc

- You can check whether a repository is installable. This confirms that the packagename.pkg and stata.toc files exist in the repository. The github search command also carries out this check and only shows the Install text if the package is installable.

    . github check haghish/markdoc
    stata.toc file was found
    pkg file was found
    haghish/markdoc is installable

- You can view the Stata packages that are popular, and you have plenty of options for searching different repositories. Try:

    . github hot
    . github hot, n(30)
    . github hot, all
    . github hot, all language(Python)

- The data is available on GitHub:
  https://raw.githubusercontent.com/haghish/github/master/data/archive.dta
- You can build a fresh archive of Stata repositories on GitHub at any time; it takes about 10 minutes to run. The command creates a dataset with the given name.

    . github list stata, language(all) in(all) all save(archive) append
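
The published archive can also be opened directly in Stata, since use accepts a URL. This is a small sketch that assumes the file is still hosted at the address given on the slide; the variable names in the dataset are not documented here, so the example only inspects them.

```stata
* Load the archive of Stata repositories directly from GitHub
* (URL as given on the slide; it may have moved since 2016)
use "https://raw.githubusercontent.com/haghish/github/master/data/archive.dta", clear

* List the variables the archive contains before relying on any of them
describe
```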

Literate Programming: Reproducible documentation

Idea

Figure 9: Literate Programming Process

The main idea is to make the code more readable and well written by preparing it for others to read and comprehend. The document is only a byproduct. Literate programming must not be reduced to generating dynamic documents! It is meant to:
1. make reading and comprehending source code and data analysis code easier by including the documentation
2. make the analysis and documentation reproducible
3. make writing documentation easier
