Finding packages, project organization
Steve Bagley
somgen223.stanford.edu

How to find R packages
• There are over 15,000 packages available for R.
• That’s great, but how do you find what you want?
• Task Views (https://cran.r-project.org/web/views/): human-curated lists of packages for a given area.
• METACRAN (https://r-pkg.org/): provides some more organization to CRAN.
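Package names can also be searched from inside R. The sketch below is one possible approach, not part of the original slides: `find_packages` is a hypothetical helper that filters a package table by name. By default it queries CRAN through `available.packages()`, which needs a network connection and a configured mirror, so the example call searches the locally installed packages instead.

```r
# Hypothetical helper: filter a package table (as returned by
# available.packages() or installed.packages()) by a case-insensitive
# pattern on the package name.
find_packages <- function(pattern, pkgs = utils::available.packages()) {
  hits <- grepl(pattern, rownames(pkgs), ignore.case = TRUE)
  unname(pkgs[hits, "Package"])
}

# Offline example: search the packages already installed on this computer.
find_packages("^stat", pkgs = utils::installed.packages())
```

This matches on names only; Task Views and METACRAN remain the better route when you know the topic but not the package name.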

What is stored where
• R servers (usually CRAN, but also Bioconductor). These contain the packages. Some packages also live on the developer’s website.
• Your computer. Contains the packages you have installed.
• Your (project) directory. Contains script (program) files, data files, output files.
• The workspace. Contains the current variable and function bindings, and packages that have been loaded since starting R.
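Each of the local storage locations above can be inspected from the R console. This is a minimal sketch using base R functions only; the variable `x` is created purely for illustration.

```r
# Library directories on your computer where installed packages live.
.libPaths()

# Names of installed packages (first few).
head(rownames(installed.packages()))

# The current working (project) directory and the files in it.
getwd()
list.files()

# The workspace: variables and functions currently defined.
x <- 1:10
ls()

# Attached packages and environments, in search order.
search()
```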

Where to put files for your project
• The most natural organization of a project uses the tree-like structure of a hierarchical file system.
• For each project, put all the scripts/code in one directory or sub-directory.
• R (and RStudio) have a notion of the current working directory, which can be set through the graphical user interface, or using R commands (setwd, getwd).
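The working-directory commands mentioned above can be tried as follows. This is a sketch under one assumption: the project directory is created under `tempdir()` so the example does not touch any real project.

```r
# Remember where we started so we can go back.
old_dir <- getwd()

# An illustrative project directory, created under tempdir().
project_dir <- file.path(tempdir(), "dolphin")
dir.create(project_dir, showWarnings = FALSE)

# Change the working directory; relative paths now resolve from here.
setwd(project_dir)
getwd()

# file.path() builds paths portably across operating systems.
file.path("data", "raw.csv")

# Restore the original working directory.
setwd(old_dir)
```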

The workspace
• The workspace contains all the functions and variables that you have defined in it (but not deleted from it).
• You can save the workspace contents, close R, and then restart it, restoring all of the workspace contents.
• When you restart, all of your data will be there. But you will still need to reload all the packages you use.
• You might not want to rely on saving the workspace, for two reasons:
  • It may be easier to start over with a fresh workspace than to try undoing some complicated error.
  • You want a written record of reproducible commands (scripts) to create the state, not just the state itself.
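Saving and restoring the workspace can be sketched with base R's `save.image()` and `load()`. The file location under `tempdir()` and the variable `heights` are illustrative assumptions, and the code is meant to be run at the top level of a script or console.

```r
# Define something in the workspace.
heights <- c(1.2, 3.4, 5.6)

# Save all workspace objects to a file (a temporary file, for illustration).
ws_file <- file.path(tempdir(), "workspace.RData")
save.image(ws_file)

# Simulate a fresh session: remove the object, then restore from the file.
rm(heights)
load(ws_file)
heights

# Packages are not restored by load(); attach them again with library().
```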

Project organization: RStudio
• Use the Project menu (upper right corner) to create/open a project.

Project organization: directories
1. Make a project directory: .../dolphin/
2. Make a subdirectory for the input data files: .../dolphin/data
3. Make a subdirectory for your code/script files: .../dolphin/src
4. Make a subdirectory for the output files: .../dolphin/output
5. Make a subdirectory for all pdfs: .../dolphin/figures
6. Make a subdirectory for any papers: .../dolphin/papers
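The directory layout listed above could be created from R rather than by hand. In this sketch `tempdir()` stands in for the elided `.../` prefix, which is an assumption made only so the example can run anywhere.

```r
# Project root; tempdir() stands in for the elided ".../" prefix.
project <- file.path(tempdir(), "dolphin")

# Subdirectories named as in the list above.
subdirs <- c("data", "src", "output", "figures", "papers")

for (d in file.path(project, subdirs)) {
  # recursive = TRUE also creates the project directory itself if needed.
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

list.dirs(project, full.names = FALSE, recursive = FALSE)
```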

How to approach a new dataset
1. Whenever you get a new dataset, record when you got it and where you got it from.
2. Read the raw data from a file or URL.
3. Fix column names to make all subsequent manipulation easier.
4. Figure out the meaning of the data in each column. You may have received a description of the data (“metadata”), or something called a “data dictionary”. If not, you may need to apply your knowledge of the domain and some common sense.
5. Start testing your assumptions about the data (and about the metadata, which can be wrong). Look for illegal values (completely out of bounds), outliers (possible, but unlikely), missing values, typos, coding errors, inconsistencies.
6. In general, try to fix the problems by writing a sequence of R expressions (script or R Markdown). This makes your work reproducible: you can rerun the script, or use it on the next version of the data. Try never to modify by hand the source files containing the original data.
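Steps 2, 3, 5 and 6 might look like the following in base R. The data, the column names and the age bounds (0 to 120) are all invented for illustration; the raw text is inlined so the example is self-contained, where a real script would read a file or URL.

```r
# Hypothetical raw data, inlined as text; in practice this would come from
# read.csv() on a file in the data directory or on a URL.
raw_text <- "Subject ID,Age (years),Sex
1,34,F
2,-5,M
3,,f
4,230,M"

raw <- read.csv(text = raw_text, check.names = FALSE, stringsAsFactors = FALSE)

# Step 3: fix column names (lower case, underscores, no punctuation).
clean_names <- tolower(names(raw))
clean_names <- gsub("[^a-z0-9]+", "_", clean_names)
clean_names <- gsub("^_|_$", "", clean_names)
names(raw) <- clean_names

# Step 5: test assumptions.
summary(raw$age_years)
which(is.na(raw$age_years))                      # missing values
which(raw$age_years < 0 | raw$age_years > 120)   # illegal values
table(raw$sex, useNA = "ifany")                  # inconsistent coding

# Step 6: fix problems in code, on a copy, leaving the source untouched.
cleaned <- raw
cleaned$sex <- toupper(cleaned$sex)
out_of_bounds <- !is.na(cleaned$age_years) &
  (cleaned$age_years < 0 | cleaned$age_years > 120)
cleaned$age_years[out_of_bounds] <- NA
```

Because every fix is an expression in the script, the same cleaning can be rerun unchanged on the next version of the data.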

How to approach data visualization
• Compared to what: decide how to make a meaningful comparison.
• This could be: treatment vs control, compared to baseline, compared to some simple null model, trend over time, trend over space.
• Then display the data to make this comparison visually salient.
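As one illustration of a treatment-vs-control comparison, the sketch below uses `PlantGrowth`, a dataset that ships with R and records plant weights for a control group and two treatment groups. The choice of dataset and of a boxplot with overlaid points is an assumption, not something prescribed by the slides.

```r
library(ggplot2)

# PlantGrowth has a control group ("ctrl") and two treatments ("trt1", "trt2").
# Putting the groups side by side on one axis makes the comparison salient.
p <- ggplot(PlantGrowth, aes(x = group, y = weight)) +
  geom_boxplot(outlier.shape = NA) +        # summary per group
  geom_jitter(width = 0.1, height = 0) +    # the individual observations
  labs(x = "Group", y = "Dried plant weight")

p
```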

How to start the exploration
• Make some assumptions, even very simple or straightforward ones, about the data. Sometimes these are explicitly stated by whoever gives you the data. (They might be wrong.)
• See if those assumptions hold true.
• Iterate, trying to build up an explanation (model) in your head.
• Focus on understanding; make the graph pretty later.
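Simple assumptions can be written down and checked directly. This sketch uses the built-in `iris` data; the three assumptions are examples chosen for illustration.

```r
# Assumption 1: there are no missing values.
anyNA(iris)

# Assumption 2: every species has the same number of rows.
table(iris$Species)

# Assumption 3: all measurements are positive.
measurements <- iris[, c("Sepal.Length", "Sepal.Width",
                         "Petal.Length", "Petal.Width")]
all(measurements > 0)

# Once an assumption has been checked, record it so the script fails loudly
# if a later version of the data violates it.
stopifnot(!anyNA(iris), all(measurements > 0))
```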

Saving figures
• Use R Markdown to make a computational lab notebook. This will show your entire analysis workflow, and can include data frame tables and figures.
• You can write a figure out to a file.

Saving figures

    library(ggplot2)
    ## This opens a pdf file for writing.
    pdf("../figures/fig27.pdf")
    ## This plot is sent to the file
    ggplot(iris, aes(Petal.Width, Petal.Length)) +
      geom_point(aes(color = Species))
    ## This closes the file
    dev.off()

• ../ is the parent directory of the current directory.
• ../figures/ is the sibling directory, assuming we are in src.
• pdf writes pdf files.
• postscript writes postscript files.
• png writes png files.
• jpeg writes jpeg files.
• tiff writes tiff files.
• svg writes svg files.
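When the same open/plot/close sequence is wrapped in a function or a loop, a ggplot object is no longer printed automatically and must be passed to `print()`. The helper below, `save_pdf`, is a hypothetical name introduced for this sketch, and `tempdir()` stands in for the project directory.

```r
library(ggplot2)

# Hypothetical helper: write one plot to a PDF file, creating the
# target directory first if it does not exist.
save_pdf <- function(plot, path, width = 6, height = 4) {
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  pdf(path, width = width, height = height)
  # Close the device even if drawing fails.
  on.exit(dev.off())
  # Inside a function, a ggplot object must be printed explicitly.
  print(plot)
  invisible(path)
}

p <- ggplot(iris, aes(Petal.Width, Petal.Length)) +
  geom_point(aes(color = Species))

# tempdir() stands in for the project directory in this illustration.
out <- save_pdf(p, file.path(tempdir(), "figures", "fig27.pdf"))
file.exists(out)
```

ggplot2 also provides `ggsave()`, which chooses the graphics device from the file extension and may be more convenient for single plots.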

Kinds of data frames

data.frame vs tibble vs data.table

Type         Package      Notes
data.frame   built-in     slow for big data, some odd defaults
tibble       tidyverse    used throughout tidyverse, fast enough
data.table   data.table   very fast, syntax is powerful/complex
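A small side-by-side comparison may make the differences concrete. This sketch assumes the `tibble` and `data.table` packages are installed; the toy data and the choice of which "odd default" to show are illustrative.

```r
library(tibble)
library(data.table)

df <- data.frame(g = c("a", "a", "b"), x = c(1, 2, 3))
tb <- as_tibble(df)
dt <- as.data.table(df)

# One of the "odd defaults": selecting a single column of a data.frame
# with [ , ] drops to a plain vector, while a tibble stays a tibble.
class(df[, "x"])
class(tb[, "x"])

# The same grouped mean in base R and in data.table's compact syntax.
aggregate(x ~ g, data = df, FUN = mean)
dt[, .(x = mean(x)), by = g]
```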
