
  1. Applied Statistics and Data Modeling
     An introduction to R
     Luc Duchateau (1), Paul Janssen (2)
     (1) Faculty of Veterinary Medicine, Ghent University, Belgium
     (2) Center for Statistics, Hasselt University, Belgium
     2020
     L. Duchateau & P. Janssen (UH & UG), Applied Statistics and Data Modeling, 2020

  2. Overview
     1 R and RStudio
         What is R and RStudio?
         Installation of R and RStudio
         Using RStudio
     2 R as a calculator
     3 Some R concepts
         R help
         Objects
     4 R functions
     5 Data
         What are data?
         Reading in data
         Exploring data
     6 The function lm

  3. R and RStudio: What is R and RStudio?
     What is R?
       Programming language
       Open source
       Software environment
       8 basic packages + 14574 other packages available
       Packages installed via install.packages("package name")

  4. R and RStudio: What is R and RStudio?
     What is RStudio?
       Integrated development environment (IDE) for R; it runs R itself rather than replacing it
       Packages can be installed via Tools - Install packages

  5. R and RStudio: Installation of R and RStudio
     Installation of R
       https://cran.r-project.org/

  6. R and RStudio: Installation of R and RStudio
     Installation of RStudio
       https://www.rstudio.com/

  7. R and RStudio: Using RStudio
     Interface of RStudio

  8. R and RStudio: Using RStudio
     Script in RStudio

  9. R and RStudio: Using RStudio
     Run command in RStudio

  10. R as a calculator
      R as calculator
        2+3
        ## [1] 5
        (5+11)/2-9
        ## [1] -1
        2^3
        ## [1] 8

  11. Some R concepts: R help
      R help
      Built-in help
        help(mean)
        ?mean
      Online help
        StackOverflow
        StackExchange
        R-bloggers

  12. Some R concepts: Objects
      Scalars
      Objects: scalars, vectors, datasets, ...
      Creating objects: assignment operator (<-)
        height <- 173
        height
        ## [1] 173
      Case sensitive
        height <- 173
        Height <- 186
        height
        ## [1] 173
        Height
        ## [1] 186

  13. Some R concepts: Objects
      Scalars
      Calculations with objects
        height <- 173
        weight <- 63
        BMI <- weight/(height/100)^2
        BMI
        ## [1] 21.04982
      Text objects
        Greeting <- "Hello world!"
        Greeting
        ## [1] "Hello world!"

  14. Some R concepts: Objects
      Vectors
      Vectors: function c()
      A numeric vector
        x <- c(1, 1, 2, 3, 5, 8)
        x
        ## [1] 1 1 2 3 5 8
      A character vector
        y <- c("Belgium", "Portugal", "Italy")
        y
        ## [1] "Belgium" "Portugal" "Italy"

  15. Some R concepts: Objects
      Vectors
      Calculating with vectors
        x*2
        ## [1] 2 2 4 6 10 16
        x^2
        ## [1] 1 1 4 9 25 64
        x*x
        ## [1] 1 1 4 9 25 64
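
The element-wise arithmetic shown for x carries over to the BMI calculation from the scalar slide. A minimal sketch, assuming two hypothetical vectors `heights` and `weights`; only the first pair (173 cm, 63 kg) comes from the slides, the other values are invented for illustration.

```r
# Hypothetical measurements; only the first pair (173 cm, 63 kg) is from the slides.
heights <- c(173, 186, 160)   # in cm
weights <- c(63, 80, 55)      # in kg

# Arithmetic operators work element by element, so a single expression
# computes the BMI of every person at once.
bmi <- weights / (heights / 100)^2
round(bmi, 2)
```

The first element reproduces the scalar result from the BMI example (21.04982 before rounding).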

  16. R functions
      Functions
      We already used one function, c(), to create a vector
        x <- c(1, 1, 2, 3, 5, 8)
        x
        ## [1] 1 1 2 3 5 8
      A function has a name and a list of arguments separated by commas
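
As a small illustration of "a name and a list of arguments", the sketch below calls sort() on the same vector, once with arguments matched by position and once by name. The choice of sort() and its `decreasing` argument is ours, not the slide's.

```r
x <- c(1, 1, 2, 3, 5, 8)

# Arguments matched by position: first the data, then 'decreasing'.
by_position <- sort(x, TRUE)

# Arguments matched by name: the order in the call no longer matters.
by_name <- sort(decreasing = TRUE, x = x)

identical(by_position, by_name)

# args() prints the argument list of a function, including default values.
args(sort)
```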

  17. R functions
      Math functions
      Trigonometric:
        sin(pi/2)
        ## [1] 1
        asin(1)
        ## [1] 1.570796
        cos(0)
        ## [1] 1
        acos(1)
        ## [1] 0
        tan(0)
        ## [1] 0
        atan(0)
        ## [1] 0

  18. R functions
      Math functions
      Rounding
        round(8.6178, 2)
        ## [1] 8.62
        floor(8.6178)
        ## [1] 8
        signif(8.6178, 2)
        ## [1] 8.6
      Sign and absolute value
        sign(8.6178)
        ## [1] 1
        sign(-8.6178)
        ## [1] -1
        abs(-8.6178)
        ## [1] 8.6178

  19. R functions
      Math functions
      Logarithms & exponentials
        exp(0)
        ## [1] 1
        log(1)
        ## [1] 0
        log10(1000)
        ## [1] 3
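
The slide uses log() without saying which base it takes. A short sketch of how the logarithm functions relate: log() is the natural logarithm unless its `base` argument is set. The example values are ours.

```r
# log() is the natural logarithm by default, so it undoes exp().
natural <- log(exp(2))

# The 'base' argument selects another base; log10() is the base-10 shortcut.
with_base <- log(1000, base = 10)
shortcut  <- log10(1000)

# all.equal() compares numbers up to a small floating-point tolerance.
all.equal(natural, 2)
all.equal(with_base, shortcut)
```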

  20. R functions
      Math functions
      Others
        sqrt(25)
        ## [1] 5
        factorial(4)
        ## [1] 24

  21. R functions
      Statistical functions
        x <- c(1, 3, 4, 6, 2, 8)
        mean(x)
        ## [1] 4
        var(x)
        ## [1] 6.8
        sd(x)
        ## [1] 2.607681
        quantile(x)
        ##   0%  25%  50%  75% 100%
        ## 1.00 2.25 3.50 5.50 8.00
        sort(x)
        ## [1] 1 2 3 4 6 8
        rank(x)
        ## [1] 1 3 4 5 2 6
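
To see what var() and sd() compute, the sketch below rebuilds both from their definitions for the same vector x. The variable names `n` and `v` are ours.

```r
x <- c(1, 3, 4, 6, 2, 8)
n <- length(x)

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
v <- sum((x - mean(x))^2) / (n - 1)

# The standard deviation is the square root of the variance.
s <- sqrt(v)

all.equal(v, var(x))
all.equal(s, sd(x))

# The 50% quantile reported by quantile(x) is the median.
median(x)
```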

  22. R functions
      Using functions on vectors
        x <- c(4, 16, 9, 25)
        sqrt(x)
        ## [1] 2 4 3 5
        log(x)
        ## [1] 1.386294 2.772589 2.197225 3.218876
        exp(sqrt(x))
        ## [1]   7.389056  54.598150  20.085537 148.413159

  23. Data: What are data?
      Dataset
           breed         size    litters  weight
        1  Maine coon    large   2        5.1
        2  Russian blue  small   0        3.9
        3  Bengal        medium  0        4.5
        4  Ragdoll       medium  1        4.8
        5  Chartreux     large   1        5.2
        6  Siamese       small   2        4.1
        7  Persian       medium  2        4.2
        8  Maine coon    large   3        4.8
      Rows are observations; columns are variables
      Variable types: nominal (breed), ordinal (size), discrete (litters), continuous (weight)
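
One way such a dataset can be held in R is a data frame, with one column per variable and a column type that reflects the kind of variable. A minimal sketch using the first three rows of the table; the object name `cats_small` and the choice of an ordered factor for size are assumptions, not taken from the slides.

```r
# First three observations from the slide's table; 'cats_small' is a hypothetical name.
cats_small <- data.frame(
  breed   = c("Maine coon", "Russian blue", "Bengal"),        # nominal
  size    = factor(c("large", "small", "medium"),
                   levels = c("small", "medium", "large"),
                   ordered = TRUE),                            # ordinal
  litters = c(2L, 0L, 0L),                                     # discrete (integer)
  weight  = c(5.1, 3.9, 4.5)                                   # continuous (numeric)
)

str(cats_small)    # one line per variable, with its type
nrow(cats_small)   # number of observations
ncol(cats_small)   # number of variables
```

Because size is stored as an ordered factor, comparisons such as `cats_small$size > "small"` respect the ordering small < medium < large.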

  24. Data: Reading in data
      Reading in data
      Different formats: .xls(x), .csv, .txt, ...
      Most important distinguishing properties:
        header: does the first row contain column names?
        column separator: comma, semicolon, tab?
        decimal sign: point, comma?
      General function in R to read in data: read.table()
        args(read.table)
        ## function (file, header = FALSE, sep = "", quote = "\"'", dec = ".",
        ##     numerals = c("allow.loss", "warn.loss", "no.loss"), row.names,
        ##     col.names, as.is = !stringsAsFactors, na.strings = "NA",
        ##     colClasses = NA, nrows = -1, skip = 0, check.names = TRUE,
        ##     fill = !blank.lines.skip, strip.white = FALSE, blank.lines.skip = TRUE,
        ##     comment.char = "#", allowEscapes = FALSE, flush = FALSE,
        ##     stringsAsFactors = default.stringsAsFactors(), fileEncoding = "",
        ##     encoding = "unknown", text, skipNul = FALSE)
        ## NULL
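
The three distinguishing properties map directly onto the `header`, `sep` and `dec` arguments of read.table(). A minimal sketch that uses the `text` argument (visible in the argument list above) so it runs without a file on disk; the two data lines are a short extract in the layout of the course's cats.csv, and the names `txt` and `d` are ours.

```r
# Short semicolon-separated extract; 'txt' and 'd' are hypothetical names.
txt <- "breed;size;litters;weight
Maine coon;large;2;5.1
Russian Blue;small;0;3.9"

# header = TRUE : the first row holds the column names
# sep = ";"     : columns are separated by semicolons
# dec = "."     : the decimal sign is a point
d <- read.table(text = txt, header = TRUE, sep = ";", dec = ".")
d
```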

  25. Data: Reading in data
      Reading in data
      Specific functions for specific formats
        Function             Format  Header  Column separator  Decimal sign
        read.csv()           .csv    TRUE    ","               "."
        read.csv(,sep=";")   .csv    TRUE    ";"               "."
        read.csv2()          .csv    TRUE    ";"               ","
        read.delim()         .txt    TRUE    tab               "."
        read.delim2()        .txt    TRUE    tab               ","
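
The difference between the rows of this table shows up when the decimal sign in the data does not match the function. A minimal sketch, assuming a hypothetical two-row extract written with decimal commas; the names `txt`, `d1` and `d2` are ours.

```r
# Hypothetical extract with semicolons as separator and commas as decimal sign.
txt <- "breed;weight
Bengal;4,5
Siamese;4,1"

# read.csv2() expects sep = ";" and dec = ",", so weight is read as a number.
d2 <- read.csv2(text = txt)

# read.csv(, sep = ";") expects dec = ".", so "4,5" is not recognised as a number.
d1 <- read.csv(text = txt, sep = ";")

is.numeric(d2$weight)
is.numeric(d1$weight)
```

Checking the column types with is.numeric() or str() right after reading is a quick way to catch a wrong decimal sign.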

  26. Data: Reading in data
      Reading in cats data
      In this course we use .csv files: cats.csv
      First open the csv file in Notepad
        breed;size;litters;weight
        Maine coon;large;2;5.1
        Russian Blue;small;0;3.9
        Bengal;medium;0;4.5
        British Shorthair;medium;1;4.8
        Chartreux;large;1;5.2
        Siamese;small;2;4.1
        Persian;medium;2;4.2
        Maine Coon;large;3;4.8

  27. Data: Reading in data
      Reading in cats data
      Properties of cats.csv (contents as on slide 26)
        separator: semicolon
        decimal sign: point

  28. Data: Reading in data
      Reading in cats data
        separator: semicolon
        decimal sign: point
      Matching row in the table of format-specific functions (slide 25)
        Function             Format  Header  Column separator  Decimal sign
        read.csv(,sep=";")   .csv    TRUE    ";"               "."
      Most appropriate function: read.csv(,sep=";")
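
Putting this choice into practice, the sketch below reads the cats data with read.csv() and sep = ";". To stay self-contained it passes the file contents from slide 26 through the `text` argument; the object names `cats_txt` and `cats`, and the file path in the comment, are assumptions.

```r
# Contents of cats.csv as shown on slide 26; 'cats_txt' and 'cats' are hypothetical names.
cats_txt <- "breed;size;litters;weight
Maine coon;large;2;5.1
Russian Blue;small;0;3.9
Bengal;medium;0;4.5
British Shorthair;medium;1;4.8
Chartreux;large;1;5.2
Siamese;small;2;4.1
Persian;medium;2;4.2
Maine Coon;large;3;4.8"

cats <- read.csv(text = cats_txt, sep = ";")

# With the file on disk the call would look like this (the path is an assumption):
# cats <- read.csv("cats.csv", sep = ";")

str(cats)           # check that litters and weight were read as numbers
mean(cats$weight)   # a first summary of one variable
```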
