
An Introduction to the R Environment



  1. An Introduction to the R Environment
     Peter Dalgaard
     Center for Statistics, Copenhagen Business School
     MPAS Lecture, April 2010

     Outline
     ◮ About this Tutorial
     ◮ Basics of R
       ◮ Objects and arithmetic
       ◮ Matrix calculus
       ◮ Important functions
       ◮ Working with data frames
       ◮ Programming
     ◮ Statistics with R
     ◮ Modelling
     ◮ Graphics

     Practicalities
     ◮ Short introduction (approx. 90 min)
     ◮ Focus on things relevant to your project
     ◮ Script of demos on MPAS web page

     Plan
     ◮ Elementary things about R
     ◮ Data types and some important functions
     ◮ Matrix calculus
     ◮ Working with data sets
     ◮ R as a programming language
     ◮ Basic statistics and tests
     ◮ Modeling tools
     ◮ Elementary Graphics

  2. The R environment
     ◮ Built around the programming language R, an Open Source dialect of the S language
     ◮ R is Free Software, and runs on a variety of platforms (I'll be using Linux here).
     ◮ Command-line execution based on function calls
     ◮ Extensible with user functions
     ◮ Workspace containing data and functions
     ◮ Various graphics devices (interactive and non-interactive)

     Objects and arithmetic: R is a vectorized language
     ◮ The basic data type in R is a vector
     ◮ Vectors often represent data (e.g. the age for each participant in a study), but also other things like regression coefficients, plot limits, cut points, etc.
     ◮ Data types: Numeric (integer/double), character (strings), logical (TRUE/FALSE)
     ◮ Factor (really integer + level attribute) for categorical variables
     ◮ Lists (generic vectors)

     Basic operations
     ◮ Standard arithmetic is vectorized: x + y adds each element of x to the corresponding element of y
     ◮ Recycling: If operating on two vectors of different length, the shorter one is replicated (with warning if it is not an even multiple)
     ◮ c: concatenate, c(7, 9, 13)
     ◮ seq: sequences, seq(1, 9, 2); short form: 1:5 is the same as seq(1,5,1)
     ◮ rep: replication, rep(1:3, 3:1) gives (1 1 1 2 2 3)
     ◮ sum, mean, range, ...

     Demo 1
         x <- round(rnorm(10,mean=20,sd=5)) # simulate data
         x
         mean(x)
         m <- mean(x)
         x - m # notice recycling
         sqrt(sum((x - m)^2)/9)
         sd(x)
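
The basic operations above can be tried on fixed numbers, which makes the results easier to check than with simulated data. The following is a minimal sketch, not part of the original demo script; the vectors `x`, `y` and `ages` are made-up values chosen for illustration.

```r
# Vectorized arithmetic: elements are added pairwise
x <- c(1, 2, 3, 4)
y <- c(10, 20, 30, 40)
x + y                          # 11 22 33 44

# Recycling: the length-2 vector is reused to match length 4
x * c(1, 10)                   # 1 20 3 40

# Helpers named on the slide
seq(1, 9, 2)                   # 1 3 5 7 9
all(1:5 == seq(1, 5, 1))       # TRUE (same values; 1:5 is stored as integer)
rep(1:3, 3:1)                  # 1 1 1 2 2 3

# Standard deviation by hand, as in Demo 1 (the divisor is n - 1)
ages <- c(18, 22, 25, 15, 20)  # illustrative data, not from the slides
m <- mean(ages)
by_hand <- sqrt(sum((ages - m)^2) / (length(ages) - 1))
all.equal(by_hand, sd(ages))   # TRUE
```

In Demo 1 the divisor 9 plays the role of `length(x) - 1`, since ten values are simulated there.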

  3. Smart indexing
     ◮ a[5] single element
     ◮ a[c(5,6,7)] several elements
     ◮ a[-6] all except the 6th
     ◮ a[b>200] index by logical vector
     ◮ a["name"] by name

     Extended data types
     ◮ The basic vector types can be combined and extended to form more complex data structures
     ◮ Attributes extend a basic type with further information. E.g., a vector can have a names attribute, for more readable printing
     ◮ Classes have two main functions:
       ◮ Hide details
       ◮ Allow function dispatch (functions that behave differently depending on the class)

     Factors
     ◮ Factors are used to describe nominal variables (the term originates from factorial designs)
     ◮ Internally, they are just integer codes plus a set of names for the levels
     ◮ They have class "factor" making them (a) print nicely and (b) behave consistently
     ◮ A factor can also be ordered (class "ordered"), signifying that there is a natural sort order on the levels
     ◮ In model specifications, factors play a fundamental role by indicating that a variable should be treated as a classification rather than as a quantitative variable.

     Lists (generic vectors)
     ◮ A vector where the elements can have different types
     ◮ Functions often return (classed) lists
     ◮ Indexing:
       ◮ lst$A
       ◮ lst[[1]] first element
       ◮ lst[1] list containing the first element
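
The indexing forms, the integer codes behind a factor, and the difference between `lst[[1]]` and `lst[1]` can be seen in a few lines. This is an illustrative sketch; the objects `a`, `b`, `f`, `fo` and `lst` and their contents are assumptions made up for the example.

```r
# A named numeric vector and a second vector used as a condition
a <- c(first = 11, second = 12, third = 13, fourth = 14)
b <- c(150, 220, 90, 400)

a[2]              # single element
a[c(1, 3)]        # several elements
a[-2]             # all except the 2nd
a[b > 200]        # logical index: keeps "second" and "fourth"
a["third"]        # by name

# A factor is integer codes plus level names
f <- factor(c("low", "high", "low", "mid"),
            levels = c("low", "mid", "high"))
as.integer(f)     # 1 3 1 2
levels(f)         # "low" "mid" "high"
class(f)          # "factor"

# An ordered factor supports comparisons between levels
fo <- factor(c("low", "high"), levels = c("low", "mid", "high"),
             ordered = TRUE)
fo[1] < fo[2]     # TRUE

# List indexing: [[ ]] extracts the element, [ ] returns a shorter list
lst <- list(A = 1:3, B = "text")
class(lst[[1]])   # "integer"
class(lst[1])     # "list"
identical(lst$A, lst[[1]])   # TRUE
```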

  4. Demo 2
     (indexing, factors, lists)

     Matrix calculus: Elementary matrix manipulations
     ◮ Matrices are implemented as vectors with a dim attribute (of length 2)
     ◮ Constructor function: matrix(1:4,2,2)
     ◮ Indexing in the usual way M[i,j], with all the features of "smart indexing". M[,j] is the j-th column, etc.
     ◮ Special feature for matrices and arrays: Matrix indexing, M[A] where A has as many columns as M has dimensions.

     Matrix algebra
     ◮ R contains a pretty full set of primitives for matrix calculus
     ◮ A %*% B for matrix multiplication
     ◮ solve(A, b) for solving linear equations. (solve(A) for matrix inverse)
     ◮ t(A) for transpose of a matrix.

     Demo 3
     Permutation matrix (Mx permutes the elements of x)
         perm <- sample(5) # w/o replacement
         n <- length(perm)
         M <- matrix(0,n,n)
         M[cbind(1:n,perm)] <- 1
         M
         perm
         M %*% 1:n
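
Matrix indexing and `solve` are easier to follow with a fixed example than with a random permutation. The sketch below uses made-up matrices (`M`, `A`, `B`, `P`) and a fixed permutation in place of `sample(5)`; it is meant as an illustration of the slide content, not as the original demo.

```r
# A matrix is a vector with a dim attribute, filled column by column
M <- matrix(1:6, nrow = 2, ncol = 3)
dim(M)                       # 2 3
M[, 2]                       # second column: 3 4

# Matrix indexing: each row of A is one (row, column) pair
A <- cbind(c(1, 2), c(3, 1))
M[A]                         # M[1,3] and M[2,1]: 5 2

# Solving a linear system B %*% sol = rhs, and checking the inverse
B <- matrix(c(2, 1, 1, 3), 2, 2)
rhs <- c(3, 5)
sol <- solve(B, rhs)
all.equal(drop(B %*% sol), rhs)      # TRUE
all.equal(solve(B) %*% B, diag(2))   # TRUE

# Demo 3 with a fixed permutation: row i of P has its 1 in column perm[i]
perm <- c(3, 1, 2)
n <- length(perm)
P <- matrix(0, n, n)
P[cbind(1:n, perm)] <- 1
drop(P %*% 1:n)              # 3 1 2, i.e. perm itself
```

Because row i of the permutation matrix picks out element `perm[i]`, multiplying by `1:n` reproduces `perm`, which is what the final two lines of Demo 3 let you compare by eye.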

  5. Other matrix techniques
     ◮ diag has multiple functions: creation of diagonal matrices, extracting, and manipulating the diagonal of a matrix. Beware: diag(v) is ambiguous if v can have length 1.
     ◮ row(X), col(X) are convenient for generating some forms of matrices.
     ◮ upper.tri and lower.tri generate indexes for accessing the upper/lower triangle of a matrix.
     ◮ Matrices can be "glued together" using cbind and rbind

     Row and column matrices
     ◮ R usually treats vectors as row or column matrices "as appropriate" (i.e., it guesses)
     ◮ E.g., you can left- or right-multiply a vector by a matrix, even though the latter formally requires transposition
     ◮ And even do y %*% x to get the inner product y'x
     ◮ If you want to be explicit about it, you can use rbind or cbind to create the appropriate single-row or single-column matrix.

     Using drop() and drop=FALSE
     ◮ Default: If a dimension has length one, it is dropped from results. M[1,] is a vector, not a 1 × n matrix.
     ◮ Often convenient, but source of obscure bugs
     ◮ Watch out for extreme cases
     ◮ Use M[1,,drop=FALSE] to prevent this
     ◮ Conversely sometimes you get a matrix and want a vector, as in drop(M %*% 1:n)

     Important functions: Some Basic Functions
     ◮ Constructors of simple objects
     ◮ Single-column modifications
     ◮ Modifying and subsetting data frames
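
The dimension-dropping behaviour and the `diag` ambiguity are the kind of thing that only shows up in edge cases, so a small check is useful. The following sketch uses made-up objects (`M`, `keep`, `v`, `x`, `y`); passing `nrow` to `diag` is one possible way to avoid the length-1 ambiguity, offered here as a suggestion rather than something stated on the slides.

```r
M <- matrix(1:6, nrow = 2)

# Default: a dimension of length one is dropped
dim(M[1, ])                  # NULL, a plain vector
dim(M[1, , drop = FALSE])    # 1 3, still a matrix

# Extreme case: a selection that happens to keep only one column
keep <- c(TRUE, FALSE, FALSE)
is.matrix(M[, keep])                 # FALSE
is.matrix(M[, keep, drop = FALSE])   # TRUE

# diag() with a vector of length > 1 builds a diagonal matrix ...
dim(diag(c(2, 3)))           # 2 2
# ... but with a single number it builds an identity matrix of that size
dim(diag(3))                 # 3 3
# Giving nrow explicitly keeps the length-1 case a 1 x 1 matrix
v <- 3
diag(v, nrow = length(v))    # 1 x 1 matrix containing 3

# Inner product: the result is a 1 x 1 matrix, drop() makes it a number
x <- c(1, 2, 3)
y <- c(4, 5, 6)
dim(y %*% x)                 # 1 1
drop(y %*% x)                # 32
```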

  6. Constructors
     ◮ R deals with many kinds of objects besides data sets
     ◮ Need to have ways of constructing them from the command line
     ◮ We have (briefly) seen the c and list functions
     ◮ Notice the naming forms c(boys=1.2, girls=1.1)
     ◮ Extracting and setting names with names(x)
     ◮ For matrices and arrays, use the (surprise) matrix and array functions. data.frame for data frames.
     ◮ It is also fairly common to construct a matrix from its columns using cbind

     Demo 4
         x <- c(boys = 1.2, girls = 1.1)
         x
         names(x)
         names(x) <- c("M", "F")
         x
         matrix(1:4,ncol=2)
         cbind(x=0:3,"exp(x)"=exp(0:3))

     The factor Function
     ◮ This is typically used when read.table gets it wrong
     ◮ E.g. group codes read as numeric
     ◮ Or read as factors, but with levels in the wrong order (e.g. c("rare", "medium", "well-done") sorted alphabetically.)
     ◮ Notice the slightly confusing use of levels and labels arguments.
       ◮ levels are the value codes on input
       ◮ labels are the value codes on output (and become the levels of the resulting factor)

     Demo 5
         aq <- airquality
         aq$Month <- factor(aq$Month, levels=5:9, labels=month.name[5:9])
         aq$Month
         levels(aq$Month) <- month.abb[5:9]
         aq$Month
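
The distinction between `levels` (codes on input) and `labels` (names on output) can be made concrete with two tiny vectors. This is a sketch under assumed data: `doneness`, `codes` and the treatment names are hypothetical and do not come from the demo script.

```r
# Character data: the default level order is alphabetical
doneness <- c("medium", "rare", "well-done", "rare")
levels(factor(doneness))     # "medium" "rare" "well-done"

# Supplying levels fixes the order
f <- factor(doneness, levels = c("rare", "medium", "well-done"))
levels(f)                    # "rare" "medium" "well-done"
as.integer(f)                # 2 1 3 1

# Numeric group codes: levels are the input codes,
# labels become the levels of the result
codes <- c(2, 1, 2, 3)       # hypothetical codes as read from a file
g <- factor(codes, levels = 1:3,
            labels = c("placebo", "low", "high"))
as.character(g)              # "low" "placebo" "low" "high"
levels(g)                    # "placebo" "low" "high"
```

Demo 5 follows the same pattern: the input codes 5:9 in `airquality$Month` are mapped to month names, and a later assignment to `levels()` renames them in place.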
