
getting staRted in R
Garrick Aden-Buie // Friday, March 25, 2016



  1. getting staRted in R
      INFORMS Code & Data Boot Camp

  2. Today we'll talk about
      - The R Universe
      - Getting set up
      - Working with data
      - Base functions
      - Where to go from here
      Find these slides at https://github.com/gadenbuie/usf-boot-camp-R

  3. Here's what you need to start
      - Install R: cloud.r-project.org
      - Install RStudio: rstudio.com
      - Download the companion code to this talk: http://bit.ly/1q5Rfpy

  4. The R Universe

  5. What is R?
      - R is a free, open source programming language for statistical computing and graphics, based on its predecessor, S.
      - Available for Windows, Mac, and Linux
      - Under active development
      - R can be easily extended with "packages": bundles of code, data, and documentation

  6. Why use R?
      - Free and open source
      - Excellent and robust community
      - One of the most popular tools for data analysis
      - Growing popularity in science and hacking (article in Fast Company)
      - Among the highest-paying IT skills on the market (2014 Dice Tech Salary Survey)
      - So many cool projects and tools that make it easy to collaborate with others and publish your work

  7. Pros of using R
      - Available on any platform
      - Source code is easy to read
      - Lots of work being done in R now, with an excellent and open professional and academic community
      - Plays nicely with many other packages (SPSS, SAS)
      - Bleeding-edge analyses not available in proprietary packages

  8. Some downsides of R
      - Older language that can be a little quirky
      - Features are user-driven and user-supplied
      - It's a programming language, not a point-and-click solution
      - Slower than compiled languages
      - To speed up R you vectorize (operate on whole vectors at once instead of looping over elements), the opposite of the usual habit in many other languages
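
To make the vectorization point concrete, here is a small sketch (our own example, not from the slides) that computes the sum of squares of 1 to 10 twice: once with an explicit loop and once vectorized. The function names `loop_sum_sq` and `vec_sum_sq` are hypothetical.

```r
# Hypothetical helper: sum of squares with an explicit for loop
loop_sum_sq <- function(v) {
  total <- 0
  for (i in seq_along(v)) {
    total <- total + v[i]^2   # one element at a time
  }
  total
}

# Hypothetical helper: the same computation, vectorized.
# v^2 squares every element at once and sum() adds them up.
vec_sum_sq <- function(v) {
  sum(v^2)
}

v <- 1:10
loop_sum_sq(v)
## [1] 385
vec_sum_sq(v)
## [1] 385
```

Both give the same answer; on a vector this short the speed difference is negligible, and it typically only becomes noticeable on large vectors.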

  9. Some R Vocab
      Term                Description
      console, terminal   The "main" portal to R where you enter commands
      scripts             Your "program," or text file containing commands
      functions           Repeatable blocks of commands
      working directory   Default location of files for input/output
      packages            "Apps" for R
      vector              The basic unit of data in R
      dataframe           Data organized into rows and columns
      Source: http://adv-r.had.co.nz/Vocabulary.html

  10. The R Console
      [Figure 1: Standard R Console]

  11. RStudio: Standard View
      [Figure 2]

  12. RStudio: My personalized view
      [Figure 3]

  13. Take it for a quick spin
          3+3
          ## [1] 6
          sqrt(4^4)
          ## [1] 16
          2==2
          ## [1] TRUE

  14. Setting up RStudio
      - Under settings, move panes to where you want them to be
      - Change font colors, etc.
      - Browse to the downloaded companion script in the Files pane
      - Open the script and set the working directory

  15. Where to get help
      - Every R package comes with documentation and examples
      - Try ?summary and ??regression
      - RStudio + tab completion = FTW!
      - Get help online: StackExchange, Google (add "R" or "R stats" to your query), RSeek
      - For really odd messages, copy and paste the error message into Google

  16. Working directory
      Set the working directory with:
          setwd("path/to/directory/")
      Check where you are with:
          getwd()
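
The sketch below (our own illustration) shows the working directory in action: it switches to R's per-session temporary folder, writes a file using a relative path, and then switches back. Using `tempdir()` and the file name `demo.txt` are assumptions for the demo; in practice you would point `setwd()` at your own project folder.

```r
# Remember where we started so we can return afterwards
old_wd <- getwd()

# Move into R's per-session temporary directory (stand-in for a project folder)
setwd(tempdir())
getwd()

# Relative paths now resolve against the new working directory
writeLines("hello", "demo.txt")
file.exists(file.path(tempdir(), "demo.txt"))
## [1] TRUE

# Go back to the original working directory
setwd(old_wd)
```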

  17. Packages
      Install packages¹:
          install.packages('ggplot2')
      Load packages:
          library(ggplot2)
          ?ggplot
      Find packages on CRAN or Rdocumentation. Or
      ¹ Windows 7+ users need to run RStudio with System Administrator privileges.

  18. Basics of the language

  19. Basic Operators
          2 + 2
          2/2
          2*2
          2^2
          2 == 2
          42 >= 2
          2 <= 42
          2 != 42
          23 %/% 2   # Integer division -> 11
          23 %% 2    # Remainder -> 1
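
A quick check (our own example, not from the slides) of how `%/%` and `%%` fit together: for any dividend `a` and divisor `b`, `b * (a %/% b) + a %% b` gives back `a`. The second half shows that R's `%/%` rounds down rather than toward zero, so with a negative dividend the remainder takes the sign of the divisor.

```r
a <- 23
b <- 2
a %/% b          # integer division
## [1] 11
a %% b           # remainder
## [1] 1

# The two operators always fit back together
b * (a %/% b) + a %% b == a
## [1] TRUE

# Negative dividend: %/% rounds down (toward -Inf),
# so the remainder keeps the sign of the divisor
-23 %/% 2
## [1] -12
-23 %% 2
## [1] 1
```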

  20. Key Symbols
          x <- 10          # Assignment operator
          y <- 1:x         # Sequence
          y[2]             # Element selection
          ## [1] 2
          "str" == 'str'   # Strings
          ## [1] TRUE
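
A few more uses of the same symbols, as a short sketch of our own: square brackets also accept a vector of positions or a logical condition, and `seq()` generalizes the `:` sequence to other step sizes (the step of 3 is an arbitrary choice).

```r
x <- 10
y <- 1:x                 # integers 1 through 10

y[2]                     # a single element
## [1] 2
y[c(2, 4)]               # several elements at once
## [1] 2 4
y[y > 7]                 # elements chosen by a logical condition
## [1]  8  9 10
seq(1, x, by = 3)        # like 1:x, but stepping by 3
## [1]  1  4  7 10
```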

  21. Functions
      Functions have the form functionName(arg1, arg2, ...), and arguments always go inside the parentheses.
      Define a function:
          fun <- function(x=0){
            # Adds 42 to the input number
            return(x+42)
          }
          fun(8)
          ## [1] 50
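
The `x=0` in the definition is a default value. This sketch (our own, reusing the slide's `fun`) shows what happens when the argument is left out, passed by name, or given a whole vector.

```r
fun <- function(x = 0) {
  # Adds 42 to the input number
  return(x + 42)
}

fun()        # no argument: the default x = 0 is used
## [1] 42
fun(x = 8)   # arguments can also be passed by name
## [1] 50
fun(1:3)     # + is vectorized, so fun() works on a whole vector too
## [1] 43 44 45
```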

  22. Data types
          1.0          # numeric
          1L           # integer
          '1'          # character
          TRUE == 1    # logical
          FALSE == 0   # logical
          NA           # NA
          factor()     # factor
      You can check what type a variable is with class(x) or is.numeric().
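
Here is what `class()` and `is.numeric()` report for each of the types listed above, as a sketch of our own. Note that a bare `NA` is stored as a logical, and that `is.numeric()` is `TRUE` for integers as well as doubles.

```r
class(1.0)
## [1] "numeric"
class(1L)
## [1] "integer"
class('1')
## [1] "character"
class(TRUE == 1)
## [1] "logical"
class(NA)            # a bare NA is a logical value
## [1] "logical"
class(factor('a'))
## [1] "factor"

is.numeric(1L)       # integers count as numeric
## [1] TRUE
is.numeric('1')      # a string that looks like a number does not
## [1] FALSE
```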

  23. Data Structures

  24. Vectors
      The basic data type is the vector, built with c() (for "concatenate").
          x <- c(1, 2, 3, 4, 5); x
          ## [1] 1 2 3 4 5
          y <- c(6:10); y
          ## [1] 6 7 8 9 10

  25. Working with vectors
          a <- sample(1:5, 10, replace=TRUE)
          a * 2
          ## [1]  8 10 10 10  6  2  2  4  2  2
          length(a)
          ## [1] 10
          unique(a)
          ## [1] 4 5 3 1 2
          length(unique(a))
          ## [1] 5
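
Because `sample()` draws at random, your output for the slide above will differ from run to run. A sketch of our own: fixing the random seed with `set.seed()` makes the draw repeatable (the seed value 42 is arbitrary), while the checks below hold for any draw.

```r
set.seed(42)                         # fix the random number generator
a <- sample(1:5, 10, replace = TRUE)

length(a)                            # always 10 draws
## [1] 10
all(a %in% 1:5)                      # every draw comes from 1:5
## [1] TRUE
length(unique(a)) <= 5               # at most 5 distinct values
## [1] TRUE

# Resetting the same seed reproduces exactly the same draw
set.seed(42)
b <- sample(1:5, 10, replace = TRUE)
identical(a, b)
## [1] TRUE
```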

  26. Strings
      Strings use either the ' ' or the " " characters.
          mystr <- 'Glad you\'re here'
          print(mystr)
          ## [1] "Glad you're here"
          paste(mystr, '!', sep='')
          ## [1] "Glad you're here!"
          c(mystr, '!')
          ## [1] "Glad you're here" "!"
      Use paste() to concatenate strings, not c().
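
A short sketch (our own) of why `sep=''` matters and what `c()` actually produces: `paste()` separates its pieces with a space by default, `paste0()` is base R's shorthand for `sep = ''`, and `c()` keeps the pieces as separate elements.

```r
mystr <- 'Glad you\'re here'

paste(mystr, '!')              # default sep = " " inserts a space
## [1] "Glad you're here !"
paste0(mystr, '!')             # same as paste(..., sep = '')
## [1] "Glad you're here!"

length(paste0(mystr, '!'))     # one combined string
## [1] 1
length(c(mystr, '!'))          # two separate strings
## [1] 2
```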

  27. Matrices: binding vectors
      Matrices can be built by row binding or column binding vectors:
          cbind(x,y)   # 5 x 2 matrix
          ##      x  y
          ## [1,] 1  6
          ## [2,] 2  7
          ## [3,] 3  8
          ## [4,] 4  9
          ## [5,] 5 10
          rbind(x,y)   # 2 x 5 matrix
          ##   [,1] [,2] [,3] [,4] [,5]
          ## x    1    2    3    4    5
          ## y    6    7    8    9   10

  28. Matrices: matrix function
      Or you can build a matrix using the matrix() function:
          matrix(1:10, nrow=2, ncol=5, byrow=TRUE)
          ##      [,1] [,2] [,3] [,4] [,5]
          ## [1,]    1    2    3    4    5
          ## [2,]    6    7    8    9   10
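
A sketch of our own tying the two matrix slides together: `dim()` confirms the shapes, `m[row, col]` picks out a single entry, and leaving `byrow` at its default of `FALSE` fills the matrix column by column instead of row by row.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(6:10)

dim(cbind(x, y))   # rows, columns
## [1] 5 2
dim(rbind(x, y))
## [1] 2 5

m <- matrix(1:10, nrow = 2, ncol = 5, byrow = TRUE)
all(m == rbind(x, y))   # same numbers as rbind(x, y), minus the row names
## [1] TRUE
m[2, 3]                 # row 2, column 3
## [1] 8

# byrow = FALSE (the default) fills column by column
matrix(1:10, nrow = 2, ncol = 5)
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    3    5    7    9
## [2,]    2    4    6    8   10
```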

  29. Coercion
      Vectors and matrices need to have elements of the same type, so R pushes mismatched elements to the best common type.
          c('a', 2)
          ## [1] "a" "2"
          c(1L, 1.0)
          ## [1] 1 1
          c(1L, 1.1)
          ## [1] 1.0 1.1
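
The "best common type" follows a fixed order, logical < integer < numeric (double) < character, as documented in R's help for `c()`. This sketch (our own) checks the resulting class at each step and shows that explicit conversion in the other direction can fail.

```r
class(c('a', 2))            # anything + character -> character
## [1] "character"
class(c(1L, 1.1))           # integer + double -> numeric
## [1] "numeric"
class(c(TRUE, 2L))          # logical + integer -> integer
## [1] "integer"
c(TRUE, FALSE, 2)           # TRUE becomes 1, FALSE becomes 0
## [1] 1 0 2

# Going back down the ladder explicitly can fail:
# 'a' cannot be read as a number, so it becomes NA (with a warning)
as.numeric(c('1', 'a'))
## [1]  1 NA
```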

  30. Recycling
      Recycling occurs when a vector is shorter than the dimensions it has to fill: R fills them in by repeating the vector from the beginning.
          matrix(1:5, nrow=2, ncol=5, byrow=FALSE)
          ##      [,1] [,2] [,3] [,4] [,5]
          ## [1,]    1    3    5    2    4
          ## [2,]    2    4    1    3    5
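
Recycling also applies to ordinary arithmetic between vectors of different lengths. A sketch of our own: the shorter vector is repeated to match the longer one, and R warns when the longer length is not a multiple of the shorter one.

```r
# c(0, 10) is recycled to c(0, 10, 0, 10, 0, 10)
1:6 + c(0, 10)
## [1]  1 12  3 14  5 16

# 5 is not a multiple of 2: R still recycles, but issues the warning
# "longer object length is not a multiple of shorter object length"
1:5 + c(0, 10)
## [1]  1 12  3 14  5

# The slide's matrix: column 3 wraps around from 5 back to 1
m <- matrix(1:5, nrow = 2, ncol = 5, byrow = FALSE)
m[, 3]
## [1] 5 1
```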

  31. Factors
      Factors are a special (at times frustrating) data type in R.
          x <- rep(1:3, 2)
          x
          ## [1] 1 2 3 1 2 3
          x <- factor(x, levels=c(1, 2, 3), labels=c('Bad', 'Good', 'Best'))
          x
          ## [1] Bad Good Best Bad Good Best
          ## Levels: Bad Good Best

  32. Ordering factors
      The order of factor levels matters for things like plots and printed output. Factors are really two things tied together: the data itself and the labels.
          x[order(x)]
          ## [1] Bad Bad Good Good Best Best
          ## Levels: Bad Good Best
          x[order(x, decreasing=TRUE)]
          ## [1] Best Best Good Good Bad Bad
          ## Levels: Bad Good Best
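
The "two things tied together" can be inspected directly. In this sketch (our own, rebuilding the slide's `x`), `as.integer()` exposes the stored codes and `levels()` the labels; sorting follows those codes, which is why the result is Bad, Good, Best rather than alphabetical order.

```r
x <- factor(rep(1:3, 2), levels = c(1, 2, 3),
            labels = c('Bad', 'Good', 'Best'))

as.integer(x)     # the data: integer codes
## [1] 1 2 3 1 2 3
levels(x)         # the labels attached to codes 1, 2, 3
## [1] "Bad"  "Good" "Best"

# sort() and order() use the level order, not the alphabet
sort(x)
## [1] Bad  Bad  Good Good Best Best
## Levels: Bad Good Best
```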

  33. Ordering factor labels
      That reordered the elements of x, but not the factor levels. Compare:
          factor(x, levels=c('Best', 'Good', 'Bad'))
          ## [1] Bad Good Best Bad Good Best
          ## Levels: Best Good Bad
          factor(x, labels=c('Best', 'Good', 'Bad'))
          ## [1] Best Good Bad Best Good Bad
          ## Levels: Best Good Bad
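
Looking at the integer codes shows why the two calls above behave so differently. In this sketch (our own; `x2` and `x3` are hypothetical names), `levels=` keeps every value and renumbers the codes, while `labels=` keeps the codes and renames what they mean.

```r
x <- factor(rep(1:3, 2), levels = c(1, 2, 3),
            labels = c('Bad', 'Good', 'Best'))

# levels=: values unchanged, codes follow the new level order
x2 <- factor(x, levels = c('Best', 'Good', 'Bad'))
as.character(x2)
## [1] "Bad"  "Good" "Best" "Bad"  "Good" "Best"
as.integer(x2)
## [1] 3 2 1 3 2 1

# labels=: codes unchanged, so the values themselves are relabeled
x3 <- factor(x, labels = c('Best', 'Good', 'Bad'))
as.integer(x3)
## [1] 1 2 3 1 2 3
as.character(x3)
## [1] "Best" "Good" "Bad"  "Best" "Good" "Bad"
```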

  34. Squashing factors
      What if you want to drop the "factor" and keep the data?
      Keep the numbers²:
          as.numeric(x)
          ## [1] 1 2 3 1 2 3
      Keep the labels:
          as.character(x)
          ## [1] "Bad" "Good" "Best" "Bad" "Good" "Best"
      ² Risky, order matters!
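
Why `as.numeric()` is risky: it returns the internal codes, not the values written in the labels. This sketch (our own) uses a factor whose labels are themselves numbers. Converting through the labels recovers the real values; `as.numeric(levels(f))[f]` is the variant recommended in the R FAQ (7.10).

```r
f <- factor(c(10, 20, 10, 30))

as.numeric(f)                  # internal codes, not the values!
## [1] 1 2 1 3

as.numeric(as.character(f))    # go through the labels instead
## [1] 10 20 10 30
as.numeric(levels(f))[f]       # convert each level once, then index by code
## [1] 10 20 10 30
```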
