
R session = environment + packages (R FOR SAS USERS) by Melinda Higgins: PowerPoint presentation



  1. R session = environment + packages
      R FOR SAS USERS
      Melinda Higgins, PhD, Research Professor/Senior Biostatistician, Emory University

  2. Why learn R?
      - R is FREE: free as in no cost, and free as in open-source licensing.
      - R's popularity is growing rapidly¹:
        - Data science jobs for R have now surpassed those for SAS.
        - R now appears to be reported in scholarly articles more often than SAS.
      - The basic R installation is small (usually under 100 MB).
      - Did I mention R is FREE?
      ¹ http://r4stats.com/articles/popularity/

  3. A computing session: SAS vs R

  9. Data and other objects
      ls() lists all data and related objects loaded in the R session's global environment.
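
A minimal sketch of how objects enter and leave the global environment. The object names `x` and `site_name` are hypothetical and do not come from the course materials; rm() deletes an object so that ls() no longer lists it.

```r
# Create two objects in the global environment
# (hypothetical names, for illustration only)
x <- c(1.5, 2.5, 3.5)
site_name <- "Tasmania"

# ls() returns the object names as a character vector, sorted by default
ls()

# rm() deletes an object from the environment
rm(x)

# x is no longer listed; site_name still is
"x" %in% ls()
"site_name" %in% ls()
```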

  10. Load data files
      load() loads datasets in .RData binary format.
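
load() is the counterpart of save(). The round trip below is a sketch using a small hypothetical data frame named `shells` and a temporary file path, so it does not depend on the course's abalone.RData file. load() restores each object under its original name and invisibly returns those names.

```r
# A small hypothetical data frame (values are illustrative only)
shells <- data.frame(sex = c("M", "F", "I"), rings = c(15L, 9L, 7L))

# save() writes objects to a binary .RData file; a temporary path
# keeps the working directory clean
rdata_path <- file.path(tempdir(), "shells.RData")
save(shells, file = rdata_path)

# Remove the object, then restore it from disk
rm(shells)
restored <- load(rdata_path)  # load() invisibly returns the restored names
restored
str(shells)
```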

  11. Global environment: new session
      Usually there are no objects in the global environment at the beginning of a new R session.
        ls()
        character(0)

  12. Load data
      Abalone dataset¹: shellfish similar to clams, mussels or oysters, from the Marine Research Lab, Tasmania, Australia. Use the measurements to predict age.
        # Load the abalone dataset
        load("abalone.RData")
        # List the objects in memory
        ls()
        "abalone"
      ¹ https://archive.ics.uci.edu/ml/datasets/abalone

  13. Getting help
      help() provides access to the documentation for any installed function or package.

  14. help(ls)

  15. Settings and functionality
      - sessionInfo() provides details on the computer system and the packages loaded.
      - library() is used to load packages during your R session.
      - Tens of thousands of R packages are available¹, and the number increases every day.
      ¹ https://cran.r-project.org/web/packages/index.html
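
A sketch of a few base R calls that report on the current session and attach packages. dplyr is used only as an example of a contributed package; the block checks whether it is installed before calling library(), so it also runs on a bare R installation.

```r
# One-line summary of the running R version
R.version.string

# Packages attached to the current session (the search path)
search()

# Base packages attached at start-up, as reported by sessionInfo()
sessionInfo()$basePkgs

# Version of an installed package; stats ships with every R installation
packageVersion("stats")

# Attach a contributed package only if it is installed;
# requireNamespace() returns FALSE instead of throwing an error
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)
} else {
  message("dplyr is not installed; try install.packages(\"dplyr\")")
}
```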

  16. R sessionInfo
        sessionInfo()
        R version 3.4.3 (2017-11-30)
        Platform: x86_64-w64-mingw32/x64 (64-bit)
        Running under: Windows >= 8 x64 (build 9200)
        Matrix products: default
        locale:
        [1] LC_COLLATE=English_United States.1252
        [2] LC_CTYPE=English_United States.1252
        [3] LC_MONETARY=English_United States.1252
        [4] LC_NUMERIC=C
        [5] LC_TIME=English_United States.1252
        attached base packages:
        [1] stats graphics grDevices utils datasets methods base

  17. R sessionInfo
        # Load the dplyr package and run sessionInfo again
        library(dplyr)
        sessionInfo()
        R version 3.4.3 (2017-11-30)
        Platform: x86_64-w64-mingw32/x64 (64-bit)
        Running under: Windows >= 8 x64 (build 9200)
        ... some output removed ...
        attached base packages:
        [1] stats graphics grDevices utils datasets methods base
        other attached packages:
        [1] dplyr_0.7.7

  18. Let's get started on your first R session

  19. Descriptive statistics with R

  20. Loading external CSV datasets
      The abalone dataset contains 9 measurements for 4177 abalones: length, diameter, height, whole weight, shucked weight, shell weight, viscera weight, sex (infants, females, males) and number of rings.

  21. Loading external CSV datasets
      The abalone dataset is available in CSV (comma-separated values) format. The read_csv() function from the readr package is used to load CSV data.
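
The abalone CSV file itself is not reproduced here, so this sketch writes a tiny CSV holding the first three rows and four of the columns shown in the str(abalone) output (the file name abalone_sample.csv is hypothetical). It reads the file with base R's read.csv(), which returns a plain data.frame, and with readr::read_csv() only if readr is installed; read_csv() returns a tibble.

```r
# Write a small CSV file to a temporary location (sample of abalone rows)
csv_path <- file.path(tempdir(), "abalone_sample.csv")
writeLines(c(
  "sex,length,diameter,rings",
  "M,0.455,0.365,15",
  "M,0.35,0.265,7",
  "F,0.53,0.42,9"
), csv_path)

# Base R: read.csv() returns a data.frame
abalone_base <- read.csv(csv_path)
str(abalone_base)

# readr: read_csv() returns a tibble (run only if readr is installed)
if (requireNamespace("readr", quietly = TRUE)) {
  abalone_tbl <- readr::read_csv(csv_path)
  print(class(abalone_tbl))
}
```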

  25. The assignment operator
      <- puts the output from readr::read_csv() into an object named abalone; abalone is now saved in the global environment.
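
A minimal sketch of what the assignment operator changes: without `<-`, a result is printed and then discarded; with it, the result is kept as a named object. The names `root` and `root2` are illustrative only.

```r
# Without assignment the result is printed, then discarded
sqrt(16)

# With <- the result is stored as an object named root
root <- sqrt(16)
root

# assign() does the same thing, taking the object name as a string
assign("root2", sqrt(25))
root2

# Both objects now exist in the current environment
c("root", "root2") %in% ls()
```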

  26. str(abalone)
        Classes ‘tbl_df’, ‘tbl’ and 'data.frame':  4177 obs. of  9 variables:
         $ sex          : chr  "M" "M" "F" "M" ...
         $ length       : num  0.455 0.35 0.53 0.44 0.33 0.425 0.53 0.545 ...
         $ diameter     : num  0.365 0.265 0.42 0.365 0.255 0.3 0.415 0.425 ...
         $ height       : num  0.095 0.09 0.135 0.125 0.08 0.095 0.15 0.125 ...
         $ wholeWeight  : num  0.514 0.226 0.677 0.516 0.205 ...
         $ shuckedWeight: num  0.2245 0.0995 0.2565 0.2155 0.0895 ...
         $ visceraWeight: num  0.101 0.0485 0.1415 0.114 0.0395 ...
         $ shellWeight  : num  0.15 0.07 0.21 0.155 0.055 0.12 0.33 0.26 ...
         $ rings        : int  15 7 9 10 7 8 20 16 9 19 ...
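
str() is roughly the R counterpart of SAS PROC CONTENTS combined with a peek at the first values. The sketch below builds a small data frame from the first three rows of str(abalone) (three of the nine columns) and shows a second way to list each column's class with vapply().

```r
shells <- data.frame(
  sex    = c("M", "M", "F"),
  length = c(0.455, 0.35, 0.53),
  rings  = c(15L, 7L, 9L),
  stringsAsFactors = FALSE  # keep sex as character, as in the abalone tibble
)

# Compact overview: observations, variables, each type, first values
str(shells)

# Class of each column as a named character vector
col_classes <- vapply(shells, class, character(1))
col_classes
```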

  27.
        # Display dimensions of abalone dataset
        dim(abalone)
        4177 9
        # Elements or variables in abalone dataset
        names(abalone)
        "sex" "length" "diameter" "height" "wholeWeight" "shuckedWeight" "visceraWeight" "shellWeight" "rings"
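
dim() returns the number of rows, then columns; nrow() and ncol() return each part separately, which is handy in arithmetic. A sketch on a hypothetical three-row data frame:

```r
shells <- data.frame(
  sex    = c("M", "M", "F"),
  length = c(0.455, 0.35, 0.53),
  rings  = c(15L, 7L, 9L)
)

dim(shells)    # rows, columns
nrow(shells)   # number of rows (observations)
ncol(shells)   # number of columns (variables)
names(shells)  # variable names, in column order
```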

  28. Dataset contents and variable types
      head() and tail() show the top and bottom 6 rows, respectively, by default. Change the number of rows shown by adding a second argument to the function.
        # Show bottom 7 rows of abalone
        tail(abalone, 7)
        # A tibble: 7 x 9
          sex   length diameter height wholeWeight shuckedWeight visceraWeight shellWeight rings
          <chr>  <dbl>    <dbl>  <dbl>       <dbl>         <dbl>         <dbl>       <dbl> <dbl>
        1 M      0.55     0.43   0.13        0.840         0.316         0.196       0.240    10
        2 M      0.56     0.43   0.155       0.868         0.4           0.172       0.229     8
        3 F      0.565    0.45   0.165       0.887         0.37          0.239       0.249    11
        4 M      0.59     0.44   0.135       0.966         0.439         0.214       0.260    10
        5 M      0.6      0.475  0.205       1.18          0.526         0.288       0.308     9
        6 F      0.625    0.485  0.15        1.09          0.531         0.261       0.296    10
        7 M      0.71     0.555  0.195       1.95          0.946         0.376       0.495    12
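
head() and tail() work on vectors as well as data frames. This sketch uses the first ten rings values listed in the str(abalone) output and also shows that a negative n drops elements from the other end.

```r
# First ten values of rings, as listed in the str(abalone) output
rings <- c(15, 7, 9, 10, 7, 8, 20, 16, 9, 19)

head(rings)       # first 6 values (default n = 6)
tail(rings, 3)    # last 3 values
head(rings, -2)   # negative n: everything except the last 2 values
```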

  29. Working with data using the dplyr approach
      In this course, you will use these dplyr functions:
      - %>% is the pipe operator from the magrittr package, included with dplyr
      - arrange() sorts the data by one or more variables
      - pull(x) pulls one variable, x, out of the dataset as a vector
      - select(x, y, z) selects several variables out of the dataset
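
For readers comparing approaches, here is a sketch of base R equivalents of the three dplyr verbs, on a small data frame built from the first four abalone rows in str(abalone). The pipe passes its left-hand side as the first argument, so `abalone %>% arrange(diameter)` is the same call as `arrange(abalone, diameter)`; R 4.1 and later also ship a native pipe, `|>`. The block avoids dplyr so it runs on a plain R installation.

```r
shells <- data.frame(
  sex           = c("M", "M", "F", "M"),
  length        = c(0.455, 0.35, 0.53, 0.44),
  diameter      = c(0.365, 0.265, 0.42, 0.365),
  height        = c(0.095, 0.09, 0.135, 0.125),
  shuckedWeight = c(0.2245, 0.0995, 0.2565, 0.2155),
  stringsAsFactors = FALSE
)

# arrange(diameter): reorder the rows by ascending diameter
by_diameter <- shells[order(shells$diameter), ]

# pull(shuckedWeight): extract one column as a plain vector
weights <- shells[["shuckedWeight"]]

# select(length, height): keep several columns as a data frame
len_height <- shells[, c("length", "height")]

by_diameter
weights
len_height
```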

  30. The dplyr arrange() function and the pipe (%>%) approach

  34. Arrange abalones by diameter
        # Arrange abalone dataset by diameter dimension
        abalone %>% arrange(diameter)
        # A tibble: 4,177 x 9
           sex   length diameter height wholeWeight shuckedWeight visceraWeight shellWeight rings
           <chr>  <dbl>    <dbl>  <dbl>       <dbl>         <dbl>         <dbl>       <dbl> <dbl>
         1 I      0.075    0.055  0.01       0.002         0.001         0.0005      0.0015     1
         2 I      0.11     0.09   0.03       0.008         0.0025        0.002       0.003      3
         3 I      0.13     0.095  0.035      0.0105        0.005         0.0065      0.0035     4
         4 I      0.13     0.1    0.03       0.013         0.0045        0.003       0.004      3
         5 I      0.15     0.1    0.025      0.015         0.0045        0.004       0.005      2
         6 I      0.155    0.105  0.05       0.0175        0.005         0.0035      0.005      4
         7 I      0.14     0.105  0.035      0.014         0.0055        0.0025      0.004      3
         8 I      0.17     0.105  0.035      0.034         0.012         0.0085      0.005      4
         9 I      0.14     0.105  0.035      0.0145        0.005         0.0035      0.005      4
        10 M      0.155    0.11   0.04       0.0155        0.0065        0.003       0.005      3
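
arrange() sorts in ascending order by default. In dplyr, wrapping a variable in desc() reverses the order, and listing several variables breaks ties from left to right, e.g. `abalone %>% arrange(sex, desc(diameter))`. The base R sketch below does the same with order() on a small hypothetical data frame (values taken from the first four rows of str(abalone)).

```r
shells <- data.frame(
  sex      = c("M", "M", "F", "M"),
  diameter = c(0.365, 0.265, 0.42, 0.365),
  rings    = c(15L, 7L, 9L, 10L),
  stringsAsFactors = FALSE
)

# Sort by sex (alphabetically), then by diameter from largest to smallest;
# the minus sign reverses the order of a numeric key, and rows that are
# still tied keep their original relative order
sorted <- shells[order(shells$sex, -shells$diameter), ]
sorted
```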

  35. Extract one variable from abalone
      Let's extract shuckedWeight from abalone using pull() from dplyr.
        # Pull out shuckedWeight variable from abalone
        abalone %>% pull(shuckedWeight)
        [1] 0.2245 0.0995 0.2565 0.2155 0.0895 0.1410 0.2370 0.2940 0.2165 0.3145 0.1940 0.1675
        [13] 0.2175 0.2725 0.1675 0.2580 0.0950 0.1880 0.0970 0.1705 0.0955 0.0800 0.4275 0.3180
        [25] 0.5130 0.3825 0.3945 0.3560 0.3940 0.3930 0.3935 0.6055 0.5515 0.8150 0.6330 0.2270
        [37] 0.5305 0.2370 0.3810 0.1340 0.1865 0.3620 0.0315 0.0255 0.0175 0.0875 0.2930 0.1775
        [49] 0.0755 0.3545 0.2385 0.1335 0.2595 0.2105 0.1730 0.2565 0.1920 0.2765 0.0420 0.2460
        [61] 0.1800 0.3050 0.3020 0.1705 0.2340 0.2340 0.3540 0.4160 0.2135 0.0630 0.2640 0.1405
        [73] 0.4800 0.4740 0.4810 0.4425 0.3625 0.3630 0.2820 0.4695 0.3845 0.5105 0.3960 0.4080
        [85] 0.3800 0.3390 0.4825 0.3305 0.2205 0.3135 0.3410 0.3070 0.4015 0.5070 0.5880 0.5755
        [97] 0.2690 0.2140 0.2010 0.2775 0.1050 0.3280 0.3160 0.3105 0.4975 0.2910 0.2935 0.2610
        ...remaining output removed...

  36. Compute mean and median shucked weight
        # Compute mean shuckedWeight
        abalone %>% pull(shuckedWeight) %>% mean()
        0.3593675
        # Compute median shuckedWeight
        abalone %>% pull(shuckedWeight) %>% median()
        0.336
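
One difference SAS users often notice: SAS procedures such as PROC MEANS skip missing values, while R's mean() and median() return NA if any value is missing unless na.rm = TRUE is given. The sketch uses the first twelve shuckedWeight values printed by pull(shuckedWeight), plus one artificial NA.

```r
# First twelve shuckedWeight values from the pull() output
shucked <- c(0.2245, 0.0995, 0.2565, 0.2155, 0.0895, 0.1410,
             0.2370, 0.2940, 0.2165, 0.3145, 0.1940, 0.1675)

mean(shucked)
median(shucked)

# Add an artificial missing value (NA plays the role of SAS's "." missing)
with_missing <- c(shucked, NA)

mean(with_missing)                  # NA: missing values propagate by default
mean(with_missing, na.rm = TRUE)    # drops the NA, same as mean(shucked)
median(with_missing, na.rm = TRUE)
```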

  37. Select two variables from abalone
        # Select two variables length and height
        abalone %>% select(length, height)
        # A tibble: 4,177 x 2
          length height
           <dbl>  <dbl>
        1  0.455  0.095
        2  0.35   0.09
        3  0.53   0.135
        4  0.44   0.125
        5  0.33   0.08
        6  0.425  0.095
        7  0.53   0.15
        8  0.545  0.125
        # ... with 4,169 more rows
