
Intro to R
OIDD215: Analytics & the Digital Economy
Juan Manubens
January 29, 2018

Welcome! This tutorial is designed to get you up and running with R as quickly as possible. If you have any questions, come to office hours or ask on Slack.

General Housekeeping

First, we need to install ‘base R.’ Installers for different platforms are available on CRAN; pick the correct one for your computer. We strongly advise installing the most up-to-date version of R.

Base R comes with a passable ‘graphical user interface’ (GUI). Because R is an interpreted programming language, you can write R code in any text editor. However, we strongly recommend using RStudio, an excellent interface for coding in R. RStudio integrates the command line, a graphical file explorer, a graphical environment/variable inspector, and figure/plot output. It also has first-class support for R Markdown, which comes in very handy for reports and PDF output. You can install RStudio for free from their website; download the personal desktop version.

If you ever want to check your current software version, type sessionInfo() in the console and hit enter.

For non-beginners

If you are familiar with R at more than a very superficial level (e.g., you have taken a previous class using R in the Statistics Department, such as 470, 471, 405, 422, or 431), this tutorial is likely too simple. However, there is still a lot to learn. If you are interested in taking these classes or have questions about them, feel free to ask me via Slack or in person.

Step 0: Finger Warming

• We strongly recommend going through swirl to get more practice with R before attempting any assignment or lab.
• Even if you have used R before, you are required to update your R and RStudio versions. We use recent versions, so to ensure compatibility you should also be running recent versions.
• Update R: Install a new R version via CRAN, then change the R path in your RStudio options: Tools -> Global Options -> General.
• Update RStudio: Help -> Check for Updates
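As a quick sketch of the version check described above, the snippet below prints the session information and compares the running R version against a minimum. The threshold "3.4.0" is an assumption chosen for illustration, not an official course requirement.

```r
# Full session details: R version, platform, locale, and attached packages
sessionInfo()

# Only the R version, as a single string
R.version.string

# TRUE if the running R is at least the (hypothetical) minimum version "3.4.0"
getRversion() >= "3.4.0"

# Version of an installed package, checked only if the package is present
if (requireNamespace("rmarkdown", quietly = TRUE)) {
  print(packageVersion("rmarkdown"))
}
```

getRversion() returns a version object, so the comparison is done component by component rather than as plain text.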

Step 1: Creating an R Markdown document

After opening RStudio, open a new R Markdown document by going to File > New File > R Markdown, or by selecting R Markdown from the new-file drop-down at the top left of RStudio. You will likely be asked to install a number of packages; do that. If the install fails (e.g., because of permission issues), you can install them yourself from the console:

install.packages("rmarkdown")
install.packages("knitr")

An R code file is fundamentally a list of commands to be executed. You can type code directly into the console, but it is preferable to use the script editor so you can see and keep track of your commands. This is especially important for complicated analyses that you wish to reproduce or share.

R Markdown files are perfect for reproducing and sharing data analysis. Each file is broken down into text chunks, which are lightly formatted using Markdown, and code chunks, which run R code. Chunks are run in order, so each code chunk can use the objects created by the chunks before it. Code chunks look like the following:

```{r}
# code goes here
```

The three backticks mark the start and end of a code block, and {r} indicates that the code is in the R language.

This file itself is an R Markdown file! You might be reading it as a PDF or as a Word document. RStudio (technically, the knitr R package) builds and saves documents in a number of different formats. You may need to install Pandoc (and, for PDF, a LaTeX distribution) to export to PDF or Word, but exporting to HTML should almost always work. The Wharton lab computers should work just fine. There is a button at the top of RStudio that should read ‘Knit [format]’; clicking it will knit this document in that format. Try it out!

Finally, R Markdown also supports LaTeX, which allows us to write some nice math into our documents:

$$\hat{Y} = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \epsilon$$

For more help with R Markdown, check out Hadley’s writeup or this cheatsheet. For the sake of time and brevity, we cannot cover everything about it here. You can also examine the source of this Rmd file to understand how these files are laid out.
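To see the chunk structure and the knit step in one place, here is a minimal sketch that writes a tiny R Markdown file and knits it from the console with rmarkdown::render(), which is roughly what the Knit button does. The file name hello.Rmd and its contents are hypothetical, and the render step is skipped when rmarkdown or Pandoc is unavailable.

```r
# A minimal R Markdown source: YAML header, one text chunk, one R code chunk.
# "hello.Rmd" is a hypothetical file written to a temporary folder.
rmd_lines <- c(
  "---",
  "title: \"Hello R Markdown\"",
  "output: html_document",
  "---",
  "",
  "Some *Markdown* text.",
  "",
  "```{r}",
  "x <- (39 + 14) / 7",
  "x",
  "```"
)
rmd_file <- file.path(tempdir(), "hello.Rmd")
writeLines(rmd_lines, rmd_file)

# Knit from the console, but only if rmarkdown is installed and Pandoc is found.
# envir = new.env() keeps the chunk's objects out of your global environment.
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  html_file <- rmarkdown::render(rmd_file, output_format = "html_document",
                                 envir = new.env(), quiet = TRUE)
  print(html_file)  # path to the generated HTML file
}
```

Swapping "html_document" for "pdf_document" or "word_document" selects a different output format, subject to the Pandoc and LaTeX requirements mentioned above.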

Step 2: Writing R code

At its most basic, R can be used as a calculator. The following code block shows a super simple operation.

(39 + 14) / 7
## [1] 7.571429

You can assign values to variable names with the ‘left arrow’ operator, <-, and then access them by that name.

x <- pi
x
## [1] 3.141593

In RStudio, you can type alt + - to insert the arrow operator. Do not use the ‘equals’ sign for assignment!

There is, of course, far more to R code than just this. More complex projects include packages to write books, serve interactive data applications, or ‘just’ do machine learning. People have made amazing visualizations purely in R as well. R packages can also take advantage of C++ integration (example: RPresto), or integrate tightly with the command line or system processes. There are even packages to mansplain the code :)

Working with the console

You can evaluate code in your file in several ways (these are Mac shortcuts; on Windows, use ctrl in place of cmd):
• Evaluate current line: cmd+ret
• Evaluate selection: select a code block and use the keystroke combo above
• Evaluate entire script: opt/alt+cmd+R
• Evaluate code chunk: cmd+shift+enter

Here are a few more handy shortcuts:
• Move cursor to script editor: ctrl + 1
• Move cursor to console: ctrl + 2
• Clear the console: ctrl + L

RStudio has many helpful shortcuts and tools; check their shortcut cheatsheet and their tips Twitter account for more.

Setting your working directory

Your working directory is where R will look for and save data files, plots, etc. We recommend making a folder in your Dropbox directory for this class and its assorted files (see appendix). You can check your current working directory with:

getwd()

You can set your working directory with setwd(path). Always check your working directory before reading data! This is especially important if you are working on several analyses that assume different working directories.
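The snippet below is a small sketch pulling together the calculator, assignment, and working-directory ideas from this step. The folder name OIDD215 is hypothetical, and for illustration it is created inside tempdir(); in practice you would point setwd() at your own class folder.

```r
# R as a calculator: standard operator precedence applies
(39 + 14) / 7   # parentheses first: 7.571429
39 + 14 / 7     # division before addition: 41

# Assign with the left arrow, then reuse the values by name
radius <- 2
area <- pi * radius^2
area

# Where R currently reads and writes files
getwd()

# Switch into a (hypothetical) class folder and back again
class_dir <- file.path(tempdir(), "OIDD215")
dir.create(class_dir, showWarnings = FALSE)
old_wd <- setwd(class_dir)  # setwd() invisibly returns the previous directory
getwd()
setwd(old_wd)               # restore the original working directory
```

Saving the value returned by setwd() makes it easy to return to where you started, which helps when different analyses assume different working directories.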

Step 3: Reading Data

Run the following code block to download both tutorial files:

# Install RCurl if necessary, then load it
if (!require("RCurl")) {
  install.packages("RCurl")
}
## Loading required package: RCurl
## Warning: package 'RCurl' was built under R version 3.4.3
## Loading required package: bitops
library("RCurl")

# Download the demo files from GitHub into the working directory
survey_web <- "https://github.com/juanmanubens/OIDD215-245/raw/master/Tutorial/Survey_results_final.csv"
tips_web <- "https://github.com/juanmanubens/OIDD215-245/raw/master/Tutorial/tips.txt"
download.file(survey_web, destfile = "Survey_OIDD.csv", method = "libcurl")
download.file(tips_web, destfile = "tips_OIDD.txt", method = "libcurl")

The data for the rest of this section should now be downloaded into your working directory (Survey_OIDD.csv and tips_OIDD.txt).

Here’s an example of how to read a .csv data file located in your working directory, using R’s read.csv function:

radio <- read.csv("Survey_OIDD.csv", header = TRUE, stringsAsFactors = FALSE)

The most important thing to note is the path to the file. If you set your working directory correctly and the file is in the working directory, this will work. You can also use a full path, e.g. “/Users/username/Dropbox/Survey_OIDD.csv”. Alternatively, you can pass read.csv a URL to content on the internet (like the ones used above), and R will open the connection and download the file.
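Because the download above depends on a network connection, here is a self-contained sketch of the same read.csv() pattern using a small made-up CSV written to a temporary folder. The file name example.csv and its columns are hypothetical; swap in Survey_OIDD.csv once the real file is in your working directory.

```r
# Write a tiny, made-up CSV file to a temporary folder
csv_path <- file.path(tempdir(), "example.csv")
writeLines(c("id,answer,score",
             "1,yes,3.5",
             "2,no,4.0",
             "3,yes,2.5"),
           csv_path)

# Confirm the file exists at that path before trying to read it
file.exists(csv_path)

# header = TRUE uses the first line as column names;
# stringsAsFactors = FALSE keeps text columns as character vectors
example <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)
str(example)

# A relative name such as "Survey_OIDD.csv" is resolved against getwd(),
# so it refers to the same file as this full path:
file.path(getwd(), "Survey_OIDD.csv")
```

Spelling out FALSE rather than F is a little safer, since F is an ordinary variable that can be reassigned.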

Step 4: Cleaning and examining the data

Before you conduct a formal analysis, it is always wise to take a quick look at the data and try to spot anything abnormal, such as missing data. In R, data is usually stored in an object called a ‘data frame.’ Each row is an observation and each column is a variable/feature.

class(radio)
## [1] "data.frame"

You can type radio into the console to get the full representation of the object. However, this won’t display nicely when there are a lot of columns. We often examine the structure, head, or tail of the data to get a feel for it.

str(radio)
head(radio)
tail(radio)
ncol(radio)

You can also check the dimensions of the dataset with dim(). Other useful functions include length(), nrow(), and ncol(). Variable names are accessed with the names() function. In RStudio, you can also go to the Environment panel and click on a particular object to open a visual representation of it. You can also open that view with View() (capital V).

You can subset with brackets. names(radio) returns a character vector, and to access its first element you use names(radio)[1]. Below, we rename the first column to hit_id and then print the first ten column names:

names(radio)[1] <- "hit_id"
names(radio)[1:10]
## [1] "hit_id" "HITTypeId"
## [3] "Title" "Description"
## [5] "Keywords" "Reward"
## [7] "CreationTime" "MaxAssignments"
## [9] "RequesterAnnotation" "AssignmentDurationInSeconds"
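If the survey file is not available, the same inspection functions can be tried on R’s built-in mtcars data frame. This is a sketch for practice only; mtcars stands in for radio, and the new column name miles_per_gallon is purely illustrative.

```r
# Use the built-in mtcars data frame so the example runs without the survey file
df <- mtcars

class(df)      # "data.frame"
dim(df)        # rows, then columns: 32 11
nrow(df)       # 32
ncol(df)       # 11
length(df)     # for a data frame, length() counts columns: 11

head(df, 3)    # first 3 rows
tail(df, 3)    # last 3 rows

# names() returns a character vector, so it can be subset with brackets
names(df)[1]       # "mpg"
names(df)[1:3]     # "mpg" "cyl" "disp"

# Rename a column by assigning into names()
names(df)[1] <- "miles_per_gallon"

# Count missing values in each column (mtcars has none, so all zeros)
colSums(is.na(df))
```

The last line is a quick way to act on the advice above about spotting missing data before a formal analysis.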
