
Sequential data analysis: An introduction to R (Gilbert Ritschard)



  1. Sequential data analysis: An introduction to R
      Gilbert Ritschard
      Department of Econometrics and Laboratory of Demography, University of Geneva
      http://mephisto.unige.ch/biomining
      APA-ATI Workshop on Exploratory Data Mining
      University of Southern California, Los Angeles, CA, July 2009
      23/7/2009

  2. Outline
      1 Introduction
      2 Installing and launching R
      3 Objects and operators
      4 Elements of statistical modeling
      5 Growing trees: rpart and party
      6 Custom functions and programming

  3. Section 1: Introduction

  4. R
      R is:
      A software environment for statistical computing and graphics
      Based on the S language (as is S-PLUS)
      Freely distributed under the GPL licence
      Available for any platform: Windows/Mac/Linux/Unix
      Easily extensible with numerous contributed modules

  5. Section 2: Installing and launching R

  6. Installation
      R and the modules can be downloaded from CRAN: http://cran.r-project.org
      By default, no GUI is provided under Linux.
      Under Windows and Mac OS X, the basic GUI remains limited.
      ... but try Rcmdr (can be downloaded from CRAN).
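Contributed modules can also be installed from within R. The following is a minimal sketch: the package selection and the `interactive()` guard are illustrative choices rather than part of the slides, and `install.packages()` needs an Internet connection.

```r
# Packages mentioned in these slides (the selection is illustrative).
wanted <- c("Rcmdr", "TraMineR")

# Names of the packages already present in the local library.
have <- rownames(installed.packages())
to_install <- setdiff(wanted, have)

# Install only what is missing, and only in an interactive session.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install, repos = "https://cran.r-project.org")
}
```

Once installed, a package is attached in each session with `library()`, for example `library(Rcmdr)`.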

  7. First steps in R
      Four ways to send commands to R:
      1 Type commands in the R Console.
      2 Use the script editor: File > New script (Windows/Mac only).
      3 Use the Rcmdr module.
      4 Use a text editor with R support (Tinn-R, WinEdt, etc.).
      In addition, you can use your preferred text editor and copy-paste the commands into the R Console.

  8. Section 3: Objects and operators

  9. Objects and operators: section outline
      Introduction to R objects
      Acting on subsets of objects
      Importation/exportation

  10. Objects
      R works with objects.
      Assigning a value to an object 'a':
        R> a <- 50
      Operation on an object:
        R> a/50
        [1] 1
      Case-sensitive: a ≠ A
        R> A/50
        Error: object "A" not found

  11. Types of objects
      Different types of objects:
      vector: 4 5 1, or in R c(4,5,1)
              "D" "E" "A", or in R c("D","E","A")
      factor: categorical variable
      matrix: table of numerical data
      data frame: general data table (columns can be of different types)
      ...
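The object types listed above can be created and inspected as follows. This is a small sketch; the object names (`num`, `chr`, `fac`, `mat`, `df`) and the data frame columns are made up for illustration.

```r
# Numeric and character vectors, as on the slide.
num <- c(4, 5, 1)
chr <- c("D", "E", "A")

# A factor built from the character vector (levels are sorted alphabetically).
fac <- factor(chr)

# A numeric matrix: 2 rows, 3 columns, filled column by column.
mat <- matrix(1:6, nrow = 2)

# A data frame whose columns have different types.
df <- data.frame(score = num, grade = chr, stringsAsFactors = FALSE)

# class() reports the type of each object; str() summarises the structure.
class(num)
class(chr)
class(fac)
class(df)
str(df)
```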

  12. Factors I
      A factor is defined by "levels" (possible values) and an indicator of whether it is ordinal or not.
      Vector of "strings":
        R> sex <- c("man", "woman", "woman", "man", "woman")
        R> sex
        [1] "man"   "woman" "woman" "man"   "woman"
      Creation of a factor:
        R> sex.fac <- factor(sex)
        R> sex.fac
        [1] man   woman woman man   woman
        Levels: man woman
        R> attributes(sex.fac)

  13. Factors II
        $levels
        [1] "man"   "woman"
        $class
        [1] "factor"
        R> table(sex.fac)
        sex.fac
          man woman
            2     3
      To change the order of the "levels":
        R> sex.fac2 <- factor(sex, levels = c("woman", "man"))
        R> sex.fac2n <- as.numeric(sex.fac2)
        R> table(sex.fac2, sex.fac2n)
                sex.fac2n
        sex.fac2 1 2
           woman 3 0
           man   0 2
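The factors shown so far are unordered. As a sketch of the "ordinal" indicator mentioned in the definition of a factor, the example below builds an ordered factor; the education values are hypothetical and do not come from the slides.

```r
# Hypothetical education levels, not taken from the slides.
edu <- c("low", "high", "medium", "low", "high")

# ordered = TRUE marks the factor as ordinal; levels gives the ranking.
edu.ord <- factor(edu, levels = c("low", "medium", "high"), ordered = TRUE)

is.ordered(edu.ord)   # TRUE
edu.ord < "high"      # comparisons follow the level order
table(edu.ord)        # counts are listed in level order
as.integer(edu.ord)   # underlying integer codes
```

With an unordered factor, the comparison `edu.ord < "high"` would not be meaningful and R would return `NA` with a warning.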

  14. Objects (continued) I
      Results can always be stored in a new object.
      Example:
        R> library(TraMineR)
        R> data(mvad)
        R> tab.male.gcse <- table(mvad$male, mvad$gcse5eq)
        R> tab.male.gcse
               no yes
          no  186 156
          yes 266 104

  15. Objects (continued)
      Depending on the class of an object, methods can be applied to it directly.
        R> plot(tab.male.gcse, cex.axis = 1.5)
      [Figure: plot of tab.male.gcse, with the categories no/yes on both axes]

  16. Row and marginal distributions
      Row and column distributions:
        R> prop.table(tab.male.gcse, 1)
                     no       yes
          no  0.5438596 0.4561404
          yes 0.7189189 0.2810811
        R> prop.table(tab.male.gcse, 2)
                     no       yes
          no  0.4115044 0.6000000
          yes 0.5884956 0.4000000
      Margins:
        R> margin.table(tab.male.gcse, 1)
         no yes
        342 370
        R> margin.table(tab.male.gcse, 2)
         no yes
        452 260
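The distributions above are simply the counts divided by the corresponding margin. The sketch below rebuilds the table from the counts printed on the slides, so it runs without TraMineR; the dimension labels `male` and `gcse5eq` are added here for readability.

```r
# Counts copied from the slides; dimension labels are added for readability.
tab <- as.table(matrix(c(186, 266, 156, 104), nrow = 2,
  dimnames = list(male = c("no", "yes"), gcse5eq = c("no", "yes"))))

row.dist <- prop.table(tab, 1)   # each row sums to 1
col.dist <- prop.table(tab, 2)   # each column sums to 1

# Same row distribution computed by hand: counts divided by row totals.
by.hand <- tab / rowSums(tab)

rowSums(row.dist)
colSums(col.dist)
addmargins(tab)                  # counts with row and column totals appended
```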

  17. Objects and operators: Acting on subsets of objects

  18. Indexes
      Indexing vectors
        x[n]                            nth element
        x[-n]                           all but the nth element
        x[1:n]                          first n elements
        x[-(1:n)]                       elements from n+1 to the end
        x[c(1,4,2)]                     specific elements
        x["name"]                       element named "name"
        x[x > 3]                        all elements greater than 3
        x[x > 3 & x < 5]                all elements between 3 and 5
        x[x %in% c("a","and","the")]    elements in the given set
      Indexing matrices
        x[i,j]                          element at row i, column j
        x[i,]                           row i
        x[,j]                           column j
        x[,c(1,3)]                      columns 1 and 3
        x["name",]                      row named "name"
      Indexing data frames (matrix indexing plus the following)
        x[["name"]]                     column named "name"
        x$name                          idem
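The indexing forms above can be tried on small objects. This is an illustrative sketch; the objects `x`, `w`, `m` and `d` and their values are arbitrary.

```r
# A named numeric vector (values are arbitrary).
x <- c(a = 2, b = 4, c = 6, d = 8)
x[2]               # second element
x[-2]              # all but the second
x[1:2]             # first two elements
x[-(1:2)]          # third element to the end
x["c"]             # element named "c"
x[x > 3 & x < 7]   # elements strictly between 3 and 7

# Set membership on a character vector.
w <- c("a", "cat", "and", "the", "dog")
w[w %in% c("a", "and", "the")]

# A matrix with row and column names.
m <- matrix(1:6, nrow = 2, dimnames = list(c("r1", "r2"), c("A", "B", "C")))
m[2, 3]            # single element
m["r1", ]          # row by name
m[, c(1, 3)]       # columns 1 and 3

# A data frame: matrix-style indexing plus [[ ]] and $.
d <- data.frame(id = 1:3, name = c("x", "y", "z"))
d[["name"]]
d$name
```

Note that the logical conditions use strict inequalities, so `x[x > 3 & x < 7]` excludes the boundary values themselves.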

  19. Crosstable on data subsets
      Cross tables for catholic and non-catholic:
        R> table(mvad$male[mvad$catholic == "yes"], mvad$gcse5eq[mvad$catholic ==
        +     "yes"])
               no yes
          no   82  77
          yes 133  52
        R> table(mvad$male[mvad$catholic == "no"], mvad$gcse5eq[mvad$catholic ==
        +     "no"])
               no yes
          no  104  79
          yes 133  52

  20. 3-dimensional crosstables
      Alternatively:
        R> table(mvad$male, mvad$gcse5eq, mvad$catholic)
        , ,  = no
               no yes
          no  104  79
          yes 133  52
        , ,  = yes
               no yes
          no   82  77
          yes 133  52
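In the output above the dimensions are unlabelled (", , = no"). One possible way to get labelled output, sketched here on an invented toy data frame standing in for `mvad`, is to pass named arguments to `table()`; `ftable()` then prints the three-way table in a flat layout.

```r
# Toy data standing in for mvad (values are invented for illustration).
toy <- data.frame(
  male     = c("no", "yes", "yes", "no", "yes", "no"),
  gcse5eq  = c("yes", "no", "no", "no", "yes", "yes"),
  catholic = c("no", "no", "yes", "yes", "yes", "no")
)

# Named arguments label each dimension of the table.
tab3 <- table(male = toy$male, gcse5eq = toy$gcse5eq, catholic = toy$catholic)
tab3

# ftable() flattens the three-way table into a single two-way layout.
ftable(tab3)

# One slice: the male x gcse5eq table for catholic == "yes".
tab3[, , "yes"]
```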

  21. Objects and operators: Importation/exportation

  22. Opening and closing R
      R saves the working environment (workspace) in the .RData file of the current directory.
        getwd()                  returns the current directory
        setwd("C:/introR/")      sets the current directory
        save.image()             saves the workspace in .RData
        load("example.RData")    loads the workspace stored in example.RData
      Online help: help(subject) or ?subject
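A save and restore round trip can be sketched as follows. The example uses `save()` on two named objects rather than `save.image()`, and works in a temporary directory so that no existing .RData file is touched; both are choices made for this illustration.

```r
# Work in a temporary directory so nothing is written to the user's files.
old.dir <- getwd()
setwd(tempdir())

a <- 50
sex <- c("man", "woman")

# save() stores selected objects; save.image() would store the whole workspace.
save(a, sex, file = "example.RData")

rm(a, sex)                        # remove the objects ...
loaded <- load("example.RData")   # ... and restore them; load() returns their names
loaded
a

setwd(old.dir)                    # return to the original directory
```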

  23. Object Management
      List of objects in the workspace:
        R> ls()
         [1] "a"             "datadir"       "filename"      "graphdir"
         [5] "mvad"          "pngdir"        "sex"           "sex.fac"
         [9] "sex.fac2"      "sex.fac2n"     "tab.male.gcse"
      Removing objects:
        R> rm(sex, sex.fac2)
        R> ls()
         [1] "a"             "datadir"       "filename"      "graphdir"
         [5] "mvad"          "pngdir"        "sex.fac"       "sex.fac2n"
         [9] "tab.male.gcse"

  24. Importing text files
      R can import text files (tab-delimited, CSV, ...) with read.table():
        read.table(file, header = FALSE, sep = "", quote = "\"'", dec = ".",
            row.names, col.names, as.is = FALSE, na.strings = "NA",
            colClasses = NA, nrows = -1, skip = 0, check.names = TRUE,
            fill = !blank.lines.skip, strip.white = FALSE,
            blank.lines.skip = TRUE, comment.char = "#")
      Example: importing a tab-delimited file with variable names in the first row:
        R> example <- read.table(file = "example.dat", header = TRUE,
        +     sep = "\t")
        R> example
          age revenu  sexe
        1  25    100 homme
        2  45    200 femme
        3  30     50 homme
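The import example can be reproduced without an existing example.dat file. In this sketch the file is first written to a temporary path with the same three rows as on the slide; the temporary paths and the export step with `write.table()` are additions for illustration.

```r
# Write a small tab-delimited file to a temporary location.
# The values mirror the slide's example; the path comes from tempfile().
path <- tempfile(fileext = ".dat")
writeLines(c("age\trevenu\tsexe",
             "25\t100\thomme",
             "45\t200\tfemme",
             "30\t50\thomme"), path)

# Import: first row holds the variable names, fields are tab-separated.
example <- read.table(file = path, header = TRUE, sep = "\t")
example
str(example)

# write.table() is the export counterpart; here it writes a CSV file.
out <- tempfile(fileext = ".csv")
write.table(example, file = out, sep = ",", row.names = FALSE)
again <- read.table(out, header = TRUE, sep = ",")
```

For comma-separated files, `read.csv()` is a convenience wrapper around `read.table()` with `header = TRUE` and `sep = ","` as defaults.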
