

  1. Sequential data analysis
     An introduction to R

     Gilbert Ritschard
     Department of Econometrics and Laboratory of Demography, University of Geneva
     http://mephisto.unige.ch/biomining

     APA-ATI Workshop on Exploratory Data Mining
     University of Southern California, Los Angeles, CA, July 2009

     23/7/2009gr 1/64

     Outline
       1. Introduction
       2. Installing and launching R
       3. Objects and operators
       4. Elements of statistical modeling
       5. Growing trees: rpart and party
       6. Custom functions and programming

     Introduction

     R
       R is:
         Software environment for statistical computing and graphics
         Based on the S language (as is S-PLUS)
         Freely distributed under the GPL licence
         Available for any platform: Windows/Mac/Linux/Unix
         Easily extensible with numerous contributed modules

     Installing and launching R

     Installation
       R and the modules can be downloaded from the CRAN: http://cran.r-project.org
       By default, no GUI is provided under Linux.
       Under Windows and MacOSX, the basic GUI remains limited
       ... but try Rcmdr (can be downloaded from the CRAN).

     First steps in R
       Four possibilities to send commands to R:
         1. Type commands in the R Console.
         2. The script editor -> File/New script (Windows/Mac only)
         3. The Rcmd module
         4. Use a text editor with R support (Tinn-R, WinEdt, etc.)
       In addition, you can use your preferred text editor and copy-paste the commands into the R Console.

     Objects and operators: Introduction to R objects

     Objects
       R works with objects.
       Assigning a value to an object 'a':
         R> a <- 50
       Operation on an object:
         R> a/50
         [1] 1
       Case-sensitive: a ≠ A
         R> A/50
         Error: object "A" not found
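The assignment and case-sensitivity points from the "Objects" slide can be tried without stopping a script on the error. This is a minimal sketch. The names `ratio`, `a_exists`, `A_exists` and `msg` are introduced here for illustration, and it assumes no object called `A` exists in the session.

```r
# Assign a value to an object named 'a', as on the slide.
a <- 50

# Operate on the object; the result can itself be stored in a new object.
ratio <- a / 50
print(ratio)

# R is case-sensitive: 'A' is a different name from 'a'.
# exists() checks for a name without raising an error.
a_exists <- exists("a")
A_exists <- exists("A", inherits = FALSE)

# Evaluating an undefined name raises an error;
# tryCatch() captures the message instead of halting.
msg <- tryCatch(A / 50, error = function(e) conditionMessage(e))
print(msg)
```

The exact wording of the error message depends on the R version, so it is printed here without being compared to the text on the slide.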

  2. Types of objects
       Different types of objects:
         vector: 4 5 1, or in R c(4,5,1)
         vector of "strings": "D" "E" "A", or in R c("D","E","A")
         factor: categorical variable
         matrix: table of numerical data
         data frame: general data table (columns can be of different types)
         ...

     Factors I
       A factor is defined by "levels" (possible values) and an indicator of whether it is ordinal or not.
         R> sex <- c("man", "woman", "woman", "man", "woman")
         R> sex
         [1] "man"   "woman" "woman" "man"   "woman"
       Creation of a factor:
         R> sex.fac <- factor(sex)
         R> sex.fac
         [1] man   woman woman man   woman
         Levels: man woman
         R> attributes(sex.fac)

     Factors II
         $levels
         [1] "man"   "woman"
         $class
         [1] "factor"
         R> table(sex.fac)
         sex.fac
           man woman
             2     3
       To change the order of the "levels":
         R> sex.fac2 <- factor(sex, levels = c("woman", "man"))
         R> sex.fac2n <- as.numeric(sex.fac2)
         R> table(sex.fac2, sex.fac2n)
                 sex.fac2n
         sex.fac2 1 2
            woman 3 0
            man   0 2

     Objects (continued) I
       Results can always be stored in a new object.
       Example:
         R> library(TraMineR)
         R> data(mvad)
         R> tab.male.gcse <- table(mvad$male, mvad$gcse5eq)
         R> tab.male.gcse
                no yes
           no  186 156
           yes 266 104

     Objects (continued)
       Depending on the class of an object, methods can be applied to it directly:
         R> plot(tab.male.gcse, cex.axis = 1.5)
         [Figure: plot of tab.male.gcse, both axes labelled no / yes]

     Row and marginal distributions
       Row and column distributions
         R> prop.table(tab.male.gcse, 1)
                      no       yes
           no  0.5438596 0.4561404
           yes 0.7189189 0.2810811
         R> prop.table(tab.male.gcse, 2)
                      no       yes
           no  0.4115044 0.6000000
           yes 0.5884956 0.4000000
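The row and column distributions above can be reproduced without loading TraMineR by rebuilding the table from the four printed counts. This is a sketch. The object names `tab`, `row_prop`, `col_prop` and `by_hand`, and the dimension labels `male` and `gcse5eq`, are chosen here for readability and are not part of the slides.

```r
# Rebuild the 2 x 2 table from the counts printed on the slide.
# matrix() fills column by column: first column (no), then second (yes).
tab <- as.table(matrix(c(186, 266, 156, 104), nrow = 2,
                       dimnames = list(male = c("no", "yes"),
                                       gcse5eq = c("no", "yes"))))

row_prop <- prop.table(tab, 1)   # margin 1: each row sums to 1
col_prop <- prop.table(tab, 2)   # margin 2: each column sums to 1

# The same row proportions computed by hand:
# divide every row by its row total.
by_hand <- sweep(tab, 1, rowSums(tab), "/")

print(round(row_prop, 7))
print(round(col_prop, 7))
```

The second argument of `prop.table()` selects the margin that is held fixed, which is why `1` yields row distributions and `2` yields column distributions.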
       Margins
         R> margin.table(tab.male.gcse, 1)
          no yes
         342 370
         R> margin.table(tab.male.gcse, 2)
          no yes
         452 260
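Returning to the factor slides earlier on this page, the effect of the `levels` argument on the underlying integer codes can be checked directly. This is a small sketch. The names `codes_default`, `codes_reordered` and `sex.ord` are introduced here, and the `ordered = TRUE` call is an addition that does not appear on the slides; the ordering it imposes is arbitrary and only illustrates the "ordinal or not" indicator.

```r
sex <- c("man", "woman", "woman", "man", "woman")

# Default: levels are sorted alphabetically, so "man" receives code 1.
sex.fac <- factor(sex)

# Explicit level order: "woman" now receives code 1.
sex.fac2 <- factor(sex, levels = c("woman", "man"))

codes_default   <- as.numeric(sex.fac)
codes_reordered <- as.numeric(sex.fac2)

# An ordinal factor is requested with ordered = TRUE
# (illustrative only; not taken from the slides).
sex.ord <- factor(sex, levels = c("woman", "man"), ordered = TRUE)

print(levels(sex.fac))
print(codes_default)
print(codes_reordered)
print(is.ordered(sex.ord))
```

Because `as.numeric()` returns the position of each value in the level list, changing the level order changes the codes while the labels stay the same.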

  3. Acting on subsets of objects

     Indexes
       Indexing vectors
         x[n]                          nth element
         x[-n]                         all but the nth element
         x[1:n]                        first n elements
         x[-(1:n)]                     elements from n+1 to the end
         x[c(1,4,2)]                   specific elements
         x["name"]                     element named "name"
         x[x > 3]                      all elements greater than 3
         x[x > 3 & x < 5]              all elements between 3 and 5
         x[x %in% c("a","and","the")]  elements in the given set
       Indexing matrices
         x[i,j]                        element at row i, column j
         x[i,]                         row i
         x[,j]                         column j
         x[,c(1,3)]                    columns 1 and 3
         x["name",]                    row named "name"
       Indexing data frames (matrix indexing plus the following)
         x[["name"]]                   column named "name"
         x$name                        same as above

     Crosstable on data subsets
       Cross tables for Catholics and non-Catholics:
         R> table(mvad$male[mvad$catholic == "yes"], mvad$gcse5eq[mvad$catholic ==
         +     "yes"])
                no yes
           no   82  77
           yes 133  52
         R> table(mvad$male[mvad$catholic == "no"], mvad$gcse5eq[mvad$catholic ==
         +     "no"])
                no yes
           no  104  79
           yes 133  52

     3-dimensional crosstables
       Alternatively:
         R> table(mvad$male, mvad$gcse5eq, mvad$catholic)
         , ,  = no
                no yes
           no  104  79
           yes 133  52
         , ,  = yes
                no yes
           no   82  77
           yes 133  52

     Importation/exportation

     Opening and closing R
       R saves the working environment in the .RData file of the current directory.
         getwd()                provides the current directory
         setwd("C:/introR/")    sets the current directory
         save.image()           saves the workspace in .RData
         load("example.RData")  loads the workspace saved in example.RData
       Online help command: help(subject), or ?subject

     Object management
       List of objects in the workspace:
         R> ls()
          [1] "a"             "datadir"       "filename"      "graphdir"
          [5] "mvad"          "pngdir"        "sex"           "sex.fac"
          [9] "sex.fac2"      "sex.fac2n"     "tab.male.gcse"
       Removing objects:
         R> rm(sex, sex.fac2)
         R> ls()
          [1] "a"             "datadir"       "filename"      "graphdir"
          [5] "mvad"          "pngdir"        "sex.fac"       "sex.fac2n"
          [9] "tab.male.gcse"

     Importing text files
       R can import text files (tab-delimited, CSV, ...) with read.table():
         read.table(file, header = FALSE, sep = "", quote = "\"'", dec = ".",
                    row.names, col.names, as.is = FALSE, na.strings = "NA",
                    colClasses = NA, nrows = -1,
                    skip = 0, check.names = TRUE, fill = !blank.lines.skip,
                    strip.white = FALSE, blank.lines.skip = TRUE, comment.char = "#")
       Example: importing a tab-delimited file with variable names in the first row:
         R> example <- read.table(file = "example.dat", header = TRUE,
         +     sep = "\t")
         R> example
           age revenu  sexe
         1  25    100 homme
         2  45    200 femme
         3  30     50 homme
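The import example and the data frame indexing rules from this page can be combined in one self-contained run. The file `example.dat` is not distributed with the slides, so this sketch writes the three printed rows to a temporary file first. The names `path`, `ages`, `incomes`, `first_row`, `over_28` and `two_cols` are introduced here for illustration.

```r
# Write a small tab-delimited file to a temporary location.
# The three data rows reproduce the values printed on the slide.
path <- tempfile(fileext = ".dat")
writeLines(c("age\trevenu\tsexe",
             "25\t100\thomme",
             "45\t200\tfemme",
             "30\t50\thomme"), path)

# header = TRUE takes the variable names from the first row;
# sep = "\t" declares the tab delimiter.
example <- read.table(file = path, header = TRUE, sep = "\t")

# Data frame indexing, following the "Indexes" slide.
ages      <- example$age                  # column by name
incomes   <- example[["revenu"]]          # same idea with [[ ]]
first_row <- example[1, ]                 # row 1, all columns
over_28   <- example[example$age > 28, ]  # rows selected by a condition
two_cols  <- example[, c(1, 3)]           # columns 1 and 3

print(example)
print(over_28)

unlink(path)  # remove the temporary file
```

Whether the `sexe` column is read as character or as a factor depends on the R version's default for `stringsAsFactors`, so code that relies on its type should set that argument explicitly.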
