An Introduction to Statistical Computing in R
K2I Data Science Boot Camp - Day 1 AM Session
May 15, 2017

AM Session Outline
- Intro to R Basics
- Plotting in R
- Data Manipulation

R Basics
Here we give a quick overview of the R language and the RStudio IDE. Our emphasis is on the most used features of R, especially those needed in later courses. This won't cover all the details, but it will cover the most important parts.

Working with RStudio
Before beginning with R, let's orient ourselves in RStudio.

Our initial view of RStudio is:
[Screenshot: initial RStudio window]

Go to: File -> New File -> R Script. This gives:
[Screenshot: RStudio after opening a new R script]

Try It Out
Type the following into the console:
    ?lm
    ??linear
    plot(1:20, 1:20)

There are several useful shortcut keys in RStudio. A few popular ones:
- Ctrl+Enter: when pressed in the editor, sends the current line to the console
- Ctrl+1, Ctrl+2: switch between editor and console
- Ctrl+Shift+Enter: run the entire script in the console
- Tab completion: this is perhaps the most used feature
For vim/emacs users, Tools -> Global Options -> Code -> Keybindings will give you your preferred bindings.

It's important to know our working directory.
- Given a file name, R will assume it is located in your current working directory.
- R will also save output to the working directory by default.
- It is important to set your working directory to the correct location or to specify full path names.

Try out the following in the console window:
    getwd()
    list.files()
To change your working directory, go to: Session -> Set Working Directory -> Choose Directory
Alternatively:
    setwd("/path/to/directory")
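To see how the working directory affects file names, here is a minimal sketch. It assumes nothing beyond base R; the file name wd_demo.txt is made up for illustration, and tempdir() is used only so that nothing is written into your own folders.

```r
# Remember where we started so we can return afterwards
old.wd <- getwd()

# tempdir() is a scratch directory that R creates for each session
setwd(tempdir())

# A relative file name is now resolved against the new working directory
writeLines("hello", "wd_demo.txt")   # wd_demo.txt is a made-up file name
"wd_demo.txt" %in% list.files()

# file.path() builds a full path, which works from any working directory
full.path <- file.path(tempdir(), "wd_demo.txt")
file.exists(full.path)

# Restore the original working directory
setwd(old.wd)
```

Saving the old directory first and restoring it at the end keeps the rest of a session unaffected.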

Reading, Writing, Saving, and Loading
- Here we'll look at bringing data into R and getting it out.
- We'll also see how to save R objects and environments.

Reading In Data
- read.table
- read.csv
- read.fwf
Check out the options for each:
    ?read.table

Syntax
    ?read.table
    ?read.csv
    read.table("/path/to/your/file.ext", header=TRUE, sep=",", stringsAsFactors = FALSE)

Most Common Options
- sep tells how fields/variables are separated. Common values are:
    "," (comma)
    " " (single space)
    "\t" (tab escape character)
- stringsAsFactors tells whether to treat non-numeric values as factor/categorical variables.
- header tells whether the first line of the file has variable names.
- na.strings tells how missing values are encoded in the file.
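The four options can be seen working together in a small sketch. The file contents, the column names, and the missing-value marker "missing" are all made up for illustration; the file is written to a temporary location so the example is self-contained.

```r
# Write a tiny made-up, semicolon-separated file to a temporary location
demo.file <- tempfile(fileext = ".txt")
writeLines(c("name;score", "ann;3.5", "bob;missing", "cat;7"), demo.file)

demo <- read.table(demo.file,
                   header = TRUE,             # first line holds the variable names
                   sep = ";",                 # fields are separated by semicolons
                   na.strings = "missing",    # this file marks missing values as "missing"
                   stringsAsFactors = FALSE)  # keep text columns as character

# score is read as numeric because "missing" was mapped to NA
str(demo)
is.na(demo$score)
```

Without na.strings, the word "missing" would force the whole score column to be read as text.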

Standard Procedure
- Open the file in a text editor.
- Check items relevant to the options. Header? Separator type?
- For big files, Linux tools are helpful:
    head -n10 BigFile.txt > OpenMe
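When shell tools are not at hand, a similar peek at a big file can be done from within R. This is a sketch, not part of the original slides: the file below is a made-up stand-in for a large file, and readLines and the nrows argument are the base R pieces being illustrated.

```r
# Stand-in for a large file: 1000 made-up lines in a temporary file
big.file <- tempfile(fileext = ".txt")
writeLines(c("id,value", paste(1:999, round(sqrt(1:999), 2), sep = ",")), big.file)

# Read only the first 5 lines as plain text to check header and separator
first.lines <- readLines(big.file, n = 5)
first.lines

# read.table/read.csv can also stop early via nrows
preview <- read.csv(big.file, nrows = 5)
preview
```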

Try It Out
Let's read the ReadMeInX.txt files into R. Try it on your own before looking at the answers on the next slides.
Example workflow:
1. Set your working directory to the directory containing the files.
2. Examine the files in a text editor to check for common options (header, separator, etc.).

    # read.table's default separator is OK for this one
    set0 <- read.table("ReadMeIn0.txt", header=TRUE)
    # specify a new separator
    set1 <- read.table("ReadMeIn1.txt", header=TRUE, sep=',')
    # or use read.csv
    set1 <- read.csv("ReadMeIn1.txt", header=TRUE)

    # another change of separator
    set2 <- read.table("ReadMeIn2.txt", header=TRUE, sep=';')
    # check for missing values
    set3 <- read.table("ReadMeIn3.txt", header=FALSE, sep=',', na.strings = '')

Writing Data
- write.table
- write.csv

Syntax and Common Options
    ?write.csv
    write.csv(myRObject, file="/path/to/save/spot/file.csv", row.names=FALSE)
- Options are largely the same as their read counterparts.
- row.names = FALSE is helpful to avoid having 1,2,3,... as a variable/column.
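The effect of row.names can be checked with a quick round trip. A minimal sketch, assuming a made-up data frame called scores and temporary files in place of a real save location:

```r
# A small made-up data frame to write out
scores <- data.frame(name = c("ann", "bob"), score = c(3.5, 7))

with.rows    <- tempfile(fileext = ".csv")
without.rows <- tempfile(fileext = ".csv")

write.csv(scores, file = with.rows)                        # default: row.names = TRUE
write.csv(scores, file = without.rows, row.names = FALSE)

# Reading back shows the extra column (named X) created by the row names
names(read.csv(with.rows))
## [1] "X"     "name"  "score"
names(read.csv(without.rows))
## [1] "name"  "score"
```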

Try It Out
Write out one of the files you imported. Try varying options such as sep and quote (write.table accepts both; write.csv fixes sep to a comma).

Saving Objects
saveRDS / readRDS are used to save (a compressed version of) individual R objects.
    # save our data set
    saveRDS(set1, file="TstObj.rds")
    # get it back
    newtst <- readRDS("TstObj.rds")
    # can save any R object. Try a vector
    my.vector <- c(1,8,-100)
    saveRDS(my.vector, file="JustAVector.rds")

Saving Environment
- We can save all variables in the current R workspace with save.image.
- We can load a saved workspace with load.
- R will ask you to save your work when you exit.
    # Save all our work
    save.image("AllMyWork.RData")
    # Reload it
    load("AllMyWork.RData")
    # name given to default save
    load(".RData")
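One practical difference between the two approaches is how names are handled: load restores objects under the names they were saved with, while readRDS returns a single object that we name ourselves. The sketch below is an illustration under assumptions: it uses save(), the base R function for saving selected objects (not shown on the slides), and made-up objects counts and label. Loading into a fresh environment is a precaution so that nothing in the workspace is overwritten.

```r
counts <- c(4, 8, 15)
label  <- "demo"

# save() stores the named objects together with their names
# (save.image() is the same idea applied to the whole workspace)
rdata.file <- tempfile(fileext = ".RData")
save(counts, label, file = rdata.file)

# load() recreates the objects under their original names;
# loading into a fresh environment avoids overwriting anything
restored <- new.env()
loaded.names <- load(rdata.file, envir = restored)
loaded.names
get("counts", envir = restored)

# saveRDS()/readRDS() store one object without its name,
# so we choose the name when reading it back
rds.file <- tempfile(fileext = ".rds")
saveRDS(counts, file = rds.file)
same.counts <- readRDS(rds.file)
identical(counts, same.counts)
```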

The Basics of R
- Let's do a whirlwind tour of R: its syntax and data structures.
- This won't cover all the details, but it will cover the most important parts.

Basic R Data Types
    # numeric types: integer, double
    348
    # character
    "my string"
    # logical
    TRUE
    FALSE
    # arithmetic as you'd expect
    43 + 1 * 2^4
    # so too logical operators/comparison
    TRUE | FALSE
    1 + 7 != 7
    # Other logical operators:
    # &, |, !
    # <, >, <=, >=, ==, !=

Data Types Cont.
    # variable assignment is done with the <- operator
    my.number <- 483
    # the '.' above does nothing. we could have done:
    # mynumber <- 483
    # instead
    # it's an R-ism to use .'s in variable names.
    # typeof() tells us the type
    typeof(my.number)
    ## [1] "double"
    # we can convert between types
    my.int <- as.integer(my.number)
    typeof(my.int)
    ## [1] "integer"
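A few more type checks may help fix the ideas. This is a short sketch using only base R; the L suffix for integer literals is not shown on the slides but is standard R syntax.

```r
typeof(348)     # a bare number is stored as a double
## [1] "double"
typeof(348L)    # the L suffix requests an integer
## [1] "integer"
typeof("348")   # quotes make it a character string
## [1] "character"
typeof(TRUE)
## [1] "logical"

# as.integer() drops the fractional part rather than rounding
as.integer(7.9)
## [1] 7

# exponentiation binds tighter than *, which binds tighter than +
43 + 1 * 2^4    # same as 43 + (1 * (2^4))
## [1] 59
```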

R Data Structures - Vectors
    # the vector is the most important data structure
    # create it with c()
    my.vec <- c(1,2,67,-98)
    # get some properties
    str(my.vec)
    ## num [1:4] 1 2 67 -98
    length(my.vec)
    ## [1] 4
    # access elements with []
    my.vec[3]
    ## [1] 67
    my.vec[c(3,4)]
    ## [1] 67 -98
    # can do assignment too
    my.vec[5] <- 41.2

Vectors - Cont.
    # other ways to create vectors
    x <- 1:6
    y <- seq(7,12,by=1)
    # Operations get recycled through the whole vector
    x + 1
    ## [1] 2 3 4 5 6 7
    x > 3
    ## [1] FALSE FALSE FALSE TRUE TRUE TRUE
    # Can do component-wise operations between vectors
    x * y
    ## [1] 7 16 27 40 55 72
    x / y
    ## [1] 0.1428571 0.2500000 0.3333333 0.4000000 0.4545455 0.5000000
    y %/% x
    ## [1] 7 4 3 2 2 2
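Recycling also applies when the two vectors have different lengths, which is worth seeing once. A minimal sketch with a made-up vector v; the tryCatch call is only there to capture the warning text so it can be inspected.

```r
v <- c(10, 20, 30, 40)

# A single number is reused for every element
v * 2

# A length-2 vector is reused twice to cover all four elements
v + c(1, 2)          # 10+1, 20+2, 30+1, 40+2

# Comparisons are recycled the same way and give a logical vector,
# which sum() can count because TRUE is treated as 1
sum(v > 15)

# When the longer length is not a multiple of the shorter one,
# R still recycles but signals a warning; here we capture its message
msg <- tryCatch(v + c(1, 2, 3), warning = function(w) conditionMessage(w))
msg
```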

Try It Out
    # Try to guess what the following lines will do
    # Will it run at all? If so, what will it give?
    # Think about it and run to confirm
    7 -> w
    w <- z <- 44
    1 + TRUE
    0 | 15 & 3
    my.vec[2:4]
    my.vec[-2]
    my.vec[c(TRUE,FALSE,FALSE,TRUE,FALSE)]
    my.vec[ sum( c(TRUE,FALSE,FALSE,TRUE,TRUE) ) ] <- TRUE
    my.vec[3] <- "I'm a string"
    as.numeric(my.vec)
    x[x>3]
    x + c(1,2)
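Several of these lines rely on R silently converting between types. The sketch below illustrates that general rule on a separate, made-up vector nums, so it does not give away the exercise answers; suppressWarnings is used only to keep the output quiet.

```r
nums <- c(1.5, 2, 3)

# Logical values become numbers in arithmetic: TRUE is 1, FALSE is 0
TRUE + TRUE

# Numbers become logicals in & and |: zero is FALSE, anything else is TRUE
5 & 0

# Assigning a string into a numeric vector converts the whole vector
nums[2] <- "two"
typeof(nums)
nums

# Converting back gives NA (with a warning) where the text is not a number
back <- suppressWarnings(as.numeric(nums))
back
```

A vector holds a single type, so R picks the most general one that fits every element: logical, then numeric, then character.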

Matrices
    # matrices are 2d vectors.
    # create using matrix()
    my.matrix <- matrix(rnorm(20),nrow=4,ncol=5)
    # rnorm(20) draws 20 random samples from a N(0,1) distribution
    my.matrix
    ##            [,1]        [,2]       [,3]      [,4]       [,5]
    ## [1,]  0.5351131  1.08710882  0.5670939 0.2800755 -0.8050743
    ## [2,] -1.9263838  0.86267009  0.7318280 0.4177110 -0.9576529
    ## [3,] -1.2931770 -1.03381286 -0.9035750 1.9787516  0.3747967
    ## [4,] -2.6190953 -0.04829205  1.3157181 1.2562005  0.1131199
    # note: matrices are filled column by column
    # Get details
    dim(my.matrix)
    ## [1] 4 5
    nrow(my.matrix)
    ## [1] 4
    ncol(my.matrix)
    ## [1] 5
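The column-by-column fill is hard to see with random numbers. A small sketch with the values 1 to 6 makes it visible; the byrow argument of matrix() is not mentioned on the slide but is part of base R.

```r
# Using 1:6 instead of random numbers makes the fill order visible
by.col <- matrix(1:6, nrow = 2, ncol = 3)
by.col
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6

# byrow = TRUE fills along rows instead
by.row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
by.row
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
```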

Matrices - Cont.
    # Indexing is similar to vectors but with 2 dimensions
    # get the second row
    my.matrix[2,]
    ## [1] -1.9263838  0.8626701  0.7318280  0.4177110 -0.9576529
    # get the first and fourth columns of row three
    my.matrix[3,c(1,4)]
    ## [1] -1.293177 1.978752
    # transposing is done with t()
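Since t() is mentioned without an example, here is a short sketch on a made-up 2 x 3 matrix m. It also shows the drop argument of [, which the slide does not cover, because selecting a single row otherwise returns a plain vector.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)

# t() swaps rows and columns, so a 2 x 3 matrix becomes 3 x 2
dim(t(m))

# Element [i, j] of m is element [j, i] of t(m)
m[2, 3] == t(m)[3, 2]

# Selecting a single row returns a plain vector by default ...
row.vec <- m[2, ]
dim(row.vec)            # NULL: the result has no dimensions

# ... while drop = FALSE keeps it as a 1 x 3 matrix
row.mat <- m[2, , drop = FALSE]
dim(row.mat)
```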
