R Basics / Course Business


SLIDE 1

R Basics / Course Business

• We’ll be using a sample dataset in class today:
• CourseWeb: Course Documents -> Sample Data -> Week 2
• Can download to your computer before class
• Thanks for answering the CourseWeb background survey!
• If sitting in on the course, e-mail me so I can add you to CourseWeb

SLIDE 4

R Basics

• R commands & functions
• Reading in data
• Saving R scripts
• Descriptive statistics
• Subsetting data
• Assigning new values
• Referring to specific cells
• Types & type conversion
• NA values
• Getting help

SLIDE 5

R Commands

• Simplest way to interact with R is by typing in commands at the > prompt

[Screenshots: the prompt in RStudio and in R]

SLIDE 6

R as a Calculator

• Typing in a simple calculation shows us the result:
  • 608 + 28
• What’s 11527 minus 283?
• Some more examples:
  • 400 / 65 (division)
  • 2 * 4 (multiplication)
  • 5 ^ 2 (exponentiation)

SLIDE 7

Functions

• More complex calculations can be done with functions:
  • sqrt(64)
  • sqrt: what the function is (square root)
  • In parentheses: what we want to perform the function on
• Can often read these left to right (“square root of 64”)
• What do you think this means?
  • abs(-7)

SLIDE 8

Arguments

• Some functions have settings (“arguments”) that we can adjust:
• round(3.14)
  • Rounds off to the nearest integer (zero decimal places)
• round(3.14, digits=1)
  • One decimal place

SLIDE 10

Nested Functions

• We can use multiple functions in a row, one inside another
  • sqrt(abs(-16))
  • “Square root of the absolute value of -16”
• Don't get scared when you see multiple parentheses!
  • Can often just read left to right
  • R first figures out the thing nested in the middle
  • Can you round off the square root of 7?

SLIDE 11

Using Multiple Numbers at Once

• When we want to use multiple numbers, we concatenate them
• c(2,6,16)
  • A vector (an ordered set) of the numbers 2, 6, and 16
• Sometimes a computation requires multiple numbers
  • mean(c(2,6,16))
• Also a quick way to do the same thing to multiple different numbers:
  • sqrt(c(16,100,144))
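
The commands from the last few slides can be collected into one short script to try at the prompt. This is only a sketch; the variable names (root7, nums, avg, roots) are made up here for illustration and are not part of the course materials.

```r
# Arithmetic at the prompt
608 + 28
11527 - 283

# Functions, arguments, and nesting
sqrt(64)                  # square root of 64
abs(-7)                   # absolute value of -7
round(3.14)               # zero decimal places
round(3.14, digits = 1)   # one decimal place
sqrt(abs(-16))            # R works out abs(-16) first, then the square root

# One possible answer to "round off the square root of 7"
root7 <- round(sqrt(7), digits = 2)

# c() concatenates several numbers into one vector
nums  <- c(2, 6, 16)
avg   <- mean(nums)               # one result computed from all three numbers
roots <- sqrt(c(16, 100, 144))    # the same function applied to each number
avg
roots
```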

SLIDE 13

Course Documents: Sample Data: Week 2

• Reading plausible versus implausible sentences
• “Scott chopped the carrots with a knife.”
  “Scott chopped the carrots with a spoon.”
• Measure reading time on final word

Note: Simulated data; not a real experiment.

SLIDE 14

Course Documents: Sample Data: Week 2

• Reading plausible versus implausible sentences
• Reading time on critical word
• 36 subjects
• Each subject sees 30 items (sentences): half plausible, half implausible
• Interested in changes over time, so we’ll track number of trials remaining (29 vs 28 vs 27 vs 26…)

SLIDE 15

Reading in Data

• Make sure you have the dataset at this point if you want to follow along: Course Documents -> Sample Data -> Week 2

SLIDE 16

Reading in Data – RStudio

• Navigate to the folder in lower-right
• More -> Set as Working Directory
• Open a “comma-separated value” file:
  • experiment <- read.csv('week2.csv')
  • experiment: name of the “dataframe” we’re creating (whatever we want to call this dataset)
  • read.csv: the function name
  • 'week2.csv': file name

SLIDE 17

Reading in Data – Regular R

• Read in a “comma-separated value” file:
  • experiment <- read.csv('/Users/scottfraundorf/Desktop/week2.csv')
  • experiment: name of the “dataframe” we’re creating (whatever we want to call this dataset)
  • read.csv: the function name
  • '/Users/scottfraundorf/Desktop/week2.csv': folder & file name
  • Drag & drop the file into R to get the full folder & filename

SLIDE 18

Looking at the Data: Summary

• A “big picture” of the dataset:
  • summary(experiment)
• summary() is a very important function!
  • Basic info & descriptive statistics
  • Check to make sure the data are correct

SLIDE 19

Looking at the Data: Summary

• A “big picture” of the dataset:
  • summary(experiment)
• We can use $ to refer to a specific column/variable in our dataset:
  • summary(experiment$ItemName)

SLIDE 20

Looking at the Data: Raw Data

• Let’s look at the data!
  • experiment
• Ack! That’s too much! How about just a few rows?
  • head(experiment)
  • head(experiment, n=10)
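
For anyone without the course file to hand, the read-then-inspect steps can be tried on a stand-in. The sketch below writes a tiny made-up CSV to a temporary folder and reads it back; the values, the object toy, and the file name toy_week2.csv are invented for illustration and are not the real week2.csv.

```r
# Toy stand-in for week2.csv (all values are made up)
toy <- data.frame(
  Subject   = c("S01", "S01", "S02", "S02", "S03", "S03"),
  ItemName  = c("Ducks", "Panther", "Ducks", "Panther", "Ducks", "Panther"),
  Condition = c("Plausible", "Implausible", "Implausible",
                "Plausible", "Plausible", "Implausible"),
  RT        = c(512, 634, 701, 488, 530, 1999)
)

# Save it as a comma-separated file in R's temporary folder
csv_path <- file.path(tempdir(), "toy_week2.csv")
write.csv(toy, file = csv_path, row.names = FALSE)

# Read the file into a dataframe, as on the slides
experiment <- read.csv(csv_path)

summary(experiment)        # big picture of every column
summary(experiment$RT)     # $ picks out a single column
head(experiment, n = 3)    # just the first 3 rows
```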

SLIDE 21

Reading in Data: Other Formats

• Excel:
  • library(gdata)
  • experiment <- read.xls('/Users/scottfraundorf/Desktop/week2.xls')
• SPSS:
  • library(foreign)
  • experiment <- read.spss('/Users/scottfraundorf/Desktop/week2.spss', to.data.frame=TRUE)

SLIDE 23

R Scripts

• Save & reuse commands with a script

[Screenshots: RStudio and R; menu shown: File -> New Document]

SLIDE 24

R Scripts

• Run commands without typing them all again
• RStudio:
  • Code -> Run Region -> Run All: Run entire script
  • Code -> Run Line(s): Run just what you’ve highlighted/selected
• R:
  • Highlight the section of script you want to run
  • Edit -> Execute
• Keyboard shortcut for this:
  • Ctrl+Enter (PC), ⌘+Enter (Mac)

SLIDE 25

R Scripts

• Saves time when re-running analyses
• Other advantages?
• Some:
  • Documentation for yourself
  • Documentation for others
  • Reuse with new analyses/experiments
  • Quicker to run: can automatically perform one analysis after another

SLIDE 26

R Scripts – Comments

• Add # before a line to make it a comment
  • Not commands to R, just notes to self (or other readers)
  • Can also add a # to make the rest of a line a comment
  • summary(experiment$Subject) #awesome

SLIDE 28

Descriptive Statistics

• Remember how we referred to a particular variable in a dataframe?
  • $
• Combine that with functions:
  • mean(experiment$RT)
  • median(experiment$RT)
  • sd(experiment$RT)
• Or, for a categorical variable:
  • levels(experiment$ItemName)
  • summary(experiment$Subject)

SLIDE 29

Descriptive Statistics

• We often want to look at a dependent variable as a function of some independent variable(s)
  • tapply(experiment$RT, experiment$Condition, mean)
  • “Split up the RTs by Condition, then get the mean”
• Try getting the mean RT for each item
• How about the median RT for each subject?
• To combine multiple results into one table, “column bind” them with cbind():
  • cbind(tapply(experiment$RT, experiment$Condition, mean), tapply(experiment$RT, experiment$Condition, sd))
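
To see what tapply() and cbind() return, here is a small sketch on a made-up dataframe. The eight RT values are invented, and the column labels Mean and SD passed to cbind() are an addition for readability, not something the slides use.

```r
# Toy data: 2 subjects x 4 trials (values are made up)
experiment <- data.frame(
  Subject   = rep(c("S01", "S02"), each = 4),
  Condition = rep(c("Plausible", "Implausible"), times = 4),
  RT        = c(500, 600, 520, 640, 480, 700, 510, 660)
)

# Whole-column descriptives
mean(experiment$RT)
median(experiment$RT)
sd(experiment$RT)

# Split the RTs by Condition, then get the mean (or sd) of each group
means <- tapply(experiment$RT, experiment$Condition, mean)
sds   <- tapply(experiment$RT, experiment$Condition, sd)

# Median RT for each subject
medians <- tapply(experiment$RT, experiment$Subject, median)

# Column-bind the two results into one table, one row per condition
by_condition <- cbind(Mean = means, SD = sds)
by_condition
```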

SLIDE 30

Descriptive Statistics

• Can have 2-way tables...
  • tapply(experiment$RT, list(experiment$Subject, experiment$Condition), mean)
  • 1st variable is rows, 2nd is columns
• ...or more!
  • tapply(experiment$RT, list(experiment$ItemName, experiment$Condition, experiment$TestingRoom), mean)

SLIDE 31

Descriptive Statistics

• Contingency tables for categorical variables:
  • xtabs(~ Subject + Condition, data=experiment)
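
A two-way tapply() table and an xtabs() contingency table can be compared side by side on toy data. This is a sketch with invented values; cell_means and counts are illustrative names.

```r
# Toy data: 2 subjects, each with 2 trials per condition (values are made up)
experiment <- data.frame(
  Subject   = rep(c("S01", "S02"), each = 4),
  Condition = rep(c("Plausible", "Implausible"), times = 4),
  RT        = c(500, 600, 520, 640, 480, 700, 510, 660)
)

# Two-way table of means: Subject in rows, Condition in columns
cell_means <- tapply(experiment$RT,
                     list(experiment$Subject, experiment$Condition),
                     mean)
cell_means

# Contingency table: how many trials fall in each Subject x Condition cell
counts <- xtabs(~ Subject + Condition, data = experiment)
counts
```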

SLIDE 33

Subsetting Data

• Often, we want to examine or use just part of a dataframe
• Remember how we read our dataframe?
  • experiment <- read.csv(...)
• Create a new dataframe that's just a subset of experiment:
  • experiment.LongRTsRemoved <- subset(experiment, RT < 2000)
  • experiment.LongRTsRemoved: new dataframe name
  • experiment: original dataframe
  • RT < 2000: inclusion criterion (RT less than 2000 ms)

SLIDE 34

Subsetting Data: Logical Operators

• Try getting just the observations with RTs 200 ms or more:
  • experiment.ShortRTsRemoved <- subset(experiment, RT >= 200)
• Why not just delete the bad RTs from the spreadsheet?
  • Easy to make a mistake / miss some of them
  • Faster to have the computer do it
  • We’d lose the original data
  • No documentation of how we subsetted the data

SLIDE 35

Subsetting Data: AND and OR

• What if we wanted only RTs between 200 and 2000 ms?
  • Could do two steps:
  • experiment.Temp <- subset(experiment, RT >= 200)
  • experiment.BadRTsRemoved <- subset(experiment.Temp, RT <= 2000)
• One step with & for AND:
  • experiment2 <- subset(experiment, RT >= 200 & RT <= 2000)

SLIDE 36

Subsetting Data: AND and OR

• What if we wanted only RTs between 200 and 2000 ms?
• One step with & for AND:
  • experiment2 <- subset(experiment, RT >= 200 & RT <= 2000)
• | means OR:
  • experiment.BadRTs <- subset(experiment, RT < 200 | RT > 2000)
  • Logical OR (“either or both”)
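
The AND and OR subsets are complements of each other, which is easy to check on a few made-up RTs. The sketch below uses invented data; only the subset() calls come from the slides.

```r
# Toy data with one RT that is too short and one that is too long (values made up)
experiment <- data.frame(
  Subject = c("S01", "S02", "S03", "S04", "S05"),
  RT      = c(150, 200, 850, 2000, 2400)
)

# AND: keep rows where both tests are true (200 to 2000 ms inclusive)
experiment2 <- subset(experiment, RT >= 200 & RT <= 2000)

# OR: keep rows where either test is true (the excluded trials)
experiment.BadRTs <- subset(experiment, RT < 200 | RT > 2000)

experiment2
experiment.BadRTs

# Every row lands in exactly one of the two subsets
nrow(experiment2) + nrow(experiment.BadRTs) == nrow(experiment)
```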

SLIDE 37

Subsetting Data: == and !=

• Get a match / equals (note the DOUBLE equals sign):
  • experiment.LastTrials <- subset(experiment, TrialsRemaining == 0)
• Words/categorical variables need quotes:
  • experiment.ImplausibleSentences <- subset(experiment, Condition=='Implausible')
• != means “not equal to”:
  • experiment.BadSubjectRemoved <- subset(experiment, Subject != 'S23')
  • Drops subject “S23”

SLIDE 38

Subsetting Data: %in%

• Sometimes our inclusion criteria aren't so mathematical
• Suppose I just want the “Ducks” and “Panther” items
• We can check against any arbitrary list:
  • experiment.SpecialItems <- subset(experiment, ItemName %in% c('Ducks', 'Panther'))
• Or, keep just things that aren't in a list:
  • experiment.NonNativeSpeakersRemoved <- subset(experiment, Subject %in% c('S10', 'S23') == FALSE)
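
A quick sketch of ==, != and %in% on toy data. The rows are invented, and "Carrots" is a hypothetical item name added so that something falls outside the list. The last line uses ! (NOT) as an alternative way of writing the == FALSE test from the slide; both select the same rows.

```r
# Toy data (values made up; "Carrots" is a hypothetical extra item)
experiment <- data.frame(
  Subject         = c("S10", "S11", "S23", "S24"),
  ItemName        = c("Ducks", "Panther", "Carrots", "Ducks"),
  TrialsRemaining = c(0, 5, 0, 12)
)

# == tests for a match (double equals sign)
experiment.LastTrials <- subset(experiment, TrialsRemaining == 0)

# != drops a single subject
experiment.BadSubjectRemoved <- subset(experiment, Subject != "S23")

# %in% checks each value against a list
experiment.SpecialItems <- subset(experiment, ItemName %in% c("Ducks", "Panther"))

# Keep only the rows that are NOT in the list
experiment.NonNativeSpeakersRemoved <-
  subset(experiment, !(Subject %in% c("S10", "S23")))
```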

SLIDE 39

Writing Data

• Note that these subsets are just creating new dataframes in R
• If you want to save to a folder on your computer, use write.csv():
  • write.csv(experiment.BadRTsRemoved, file='experiment_badremoved.csv')
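
A sketch of saving a dataframe and reading it back to confirm it was written. The two rows are made up, and the file goes to R's temporary folder here rather than the working directory. row.names = FALSE is an optional argument, not used on the slide, that leaves the row numbers out of the file.

```r
# Toy dataframe standing in for a subset created earlier (values made up)
experiment.BadRTsRemoved <- data.frame(
  Subject = c("S01", "S02"),
  RT      = c(450, 980)
)

# Write to a temporary folder so nothing in your own folders is overwritten
out_path <- file.path(tempdir(), "experiment_badremoved.csv")
write.csv(experiment.BadRTsRemoved, file = out_path, row.names = FALSE)

# Read it back to check what was saved
check <- read.csv(out_path)
check
```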

SLIDE 40

Logical Operators Review

• Summary
  • >      Greater than
  • >=     Greater than or equal to
  • <      Less than
  • <=     Less than or equal to
  • &      AND
  • |      OR
  • ==     Equal to
  • !=     Not equal to
  • %in%   Is this included in a list?

SLIDE 42

Let’s Practice It!

• Try getting the mean RT for each number of TrialsRemaining (29 trials remaining, 28 trials remaining, etc.)
• Try getting a subset of just the people in TestingRoom 3

SLIDE 43

Let’s Practice It!

• Try getting the mean RT for each number of TrialsRemaining (29 trials remaining, 28 trials remaining, etc.)
  • tapply(experiment$RT, experiment$TrialsRemaining, mean)
• Try getting a subset of just the people in TestingRoom 3
  • experiment.Room3 <- subset(experiment, TestingRoom == 3)

SLIDE 45

Assignment

• Remember the pointing arrow used to create dataframes and subsets?
  • e.g., experiment <- read.csv(...)
• This is the assignment operator. It saves results or values in a variable
  • x <- sqrt(64)
  • CriticalTrialsPerSubject <- 30
  • Remember, typing a name by itself shows you the current value:
  • CriticalTrialsPerSubject
• Assigning a new value overwrites the old

SLIDE 46

Assignment

• We can use this to create new columns in our dataframe:
  • experiment$ExperimentNumber <- 1
  • Here, the same number (1) is assigned to every trial
• Or, compute a value for each row:
  • experiment$RTinSeconds <- experiment$RT / 1000
  • For each trial, finds the RT in seconds for that specific trial and saves that into RTinSeconds
  • Similar to an Excel formula

SLIDE 47

Assignment

• We can use this to create new columns in our dataframe:
  • experiment$ExperimentNumber <- 1
  • Here, the same number (1) is assigned to every trial
• Another example:
  • experiment$SerialPosition <- 30 - experiment$TrialsRemaining
  • For each trial, calculates the serial position (trial #1, trial #2, etc.) and saves the result into SerialPosition
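
The assignment examples can be run together on a three-row toy dataframe to see each new column appear. The RT and TrialsRemaining values are made up; the sketch assumes 30 critical trials per subject, as described for the sample data.

```r
# Toy data: first, middle, and last trial of a 30-trial session (values made up)
experiment <- data.frame(
  RT              = c(512, 1480, 2250),
  TrialsRemaining = c(29, 15, 0)
)

# Save a value in a variable; typing the name shows the current value
CriticalTrialsPerSubject <- 30
CriticalTrialsPerSubject

# Same value assigned to every row
experiment$ExperimentNumber <- 1

# A value computed separately for each row, like an Excel formula
experiment$RTinSeconds <- experiment$RT / 1000

# Serial position: trial #1 has 29 trials remaining, trial #30 has 0
experiment$SerialPosition <- CriticalTrialsPerSubject - experiment$TrialsRemaining

experiment
```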

SLIDE 48

ifelse()

IF YOU WANT DESSERT, EAT YOUR PEAS … OR ELSE!

SLIDE 49

ifelse()

• ifelse(): Use a test to decide which of two values to assign:
  • experiment$Half <- ifelse(experiment$SerialPosition <= 15, 1, 2)
  • ifelse: function name
  • If serial position IS <= 15… “Half” is 1
  • If it’s NOT, “Half” is 2
• Possible to nest ifelse() if we need more than 2 categories:
  • experiment$Third <- ifelse(experiment$SerialPosition <= 10, 1, ifelse(experiment$SerialPosition <= 20, 2, 3))

SLIDE 50

ifelse()

• Instead of specific numbers, can use other columns or a formula:
  • experiment$RT.Fixed <- ifelse(experiment$TestingRoom==2, experiment$RT + 100, experiment$RT)
• How can we check if this worked?
  • head(experiment)
  • Fixed RTs are indeed 100 ms longer for TestingRoom 2

SLIDE 51

ifelse()

• Instead of specific numbers, can use other columns or a formula:
  • experiment$RT.Fixed <- ifelse(experiment$TestingRoom==2, experiment$RT + 100, experiment$RT)
• How can we check if this worked?
  • tapply(experiment$RT, experiment$TestingRoom, mean)
  • tapply(experiment$RT.Fixed, experiment$TestingRoom, mean)
  • Only room 2 should be different
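
Here is a runnable sketch of the three ifelse() uses, followed by the tapply() check. The four rows are made up; before and after are illustrative names for the two sets of room means.

```r
# Toy data (values made up)
experiment <- data.frame(
  TestingRoom    = c(1, 2, 2, 3),
  SerialPosition = c(3, 12, 18, 27),
  RT             = c(500, 600, 700, 800)
)

# Two categories: first half vs. second half of the session
experiment$Half <- ifelse(experiment$SerialPosition <= 15, 1, 2)

# Three categories: nest a second ifelse() in the "otherwise" slot
experiment$Third <- ifelse(experiment$SerialPosition <= 10, 1,
                           ifelse(experiment$SerialPosition <= 20, 2, 3))

# Use a formula instead of fixed numbers: add 100 ms in room 2 only
experiment$RT.Fixed <- ifelse(experiment$TestingRoom == 2,
                              experiment$RT + 100,
                              experiment$RT)

# Check: compare room means before and after; only room 2 should change
before <- tapply(experiment$RT, experiment$TestingRoom, mean)
after  <- tapply(experiment$RT.Fixed, experiment$TestingRoom, mean)
after - before
```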

SLIDE 52

Which do you like better?

  • experiment$Half <- ifelse(experiment$SerialPosition <= 15, 1, 2)
  • Shorter & faster to write
• vs:
  • CriticalTrialsPerSubject <- 30
  • experiment$Half <- ifelse(experiment$SerialPosition <= (CriticalTrialsPerSubject / 2), 1, 2)
  • Explains where the 15 comes from; helpful if we come back to this script later
  • We can also refer to the CriticalTrialsPerSubject variable later in the script & this ensures it’s consistent
  • Easy to update if we change the number of critical trials

SLIDE 53

Deleting Variables

• It is also possible to delete a variable by assigning it the value of NULL
  • experiment$TrialsRemaining <- NULL
  • Since we now have SerialPosition, maybe we don’t want to bother keeping TrialsRemaining any more

SLIDE 55

Referring to Specific Cells

• So far, we’ve seen how to
  • Create a new dataframe that’s a subset of an existing dataframe
  • Modify a dataframe by creating an entire column
• What if we want to modify a dataframe by adjusting some existing values?
• e.g., replace all RTs above 2000 ms with the number 2000 (“fencing”)
• Creating a new subset won’t work because we want to change the original dataframe
• Need a way to edit specific values

SLIDE 56

Referring to Specific Cells

• Use square brackets [ ] to refer to specific entries in a dataframe:
  • Row, column
  • experiment[3,7]
• Omit the row or column number to mean all rows or all columns:
  • experiment[3,] (row 3, all columns)
  • experiment[,4] (all rows in column 4)
• Can also use column names:
  • experiment[,'RT'] (all rows in the RT column)
• Remember c()? We can check multiple rows:
  • experiment[c(1:4),]

SLIDE 57

Logical Indexing

• We can look at rows or columns that meet a specific criterion...
  • experiment[experiment$RT < 200,]
• Can use this as another way to subset:
  • experiment.ShortRTsRemoved <- experiment[experiment$RT >= 200, ]
  • subset() does essentially this (it also drops rows where the test comes out NA)
• But we can also set values this way
  • experiment[experiment$RT < 200, 'RT'] <- 200
  • In the dataframe experiment, find the rows where RT < 200, and set the column RT to 200
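
Square-bracket indexing and “fencing” can be tried on a four-row toy dataframe. The values are made up. The last line applies the same idea to the upper end (RTs above 2000 ms), which is the example raised on the earlier slide.

```r
# Toy data with one very short and one very long RT (values made up)
experiment <- data.frame(
  Subject = c("S01", "S02", "S03", "S04"),
  RT      = c(150, 640, 2300, 900)
)

experiment[3, 2]          # row 3, column 2
experiment[3, ]           # row 3, all columns
experiment[, "RT"]        # all rows in the RT column
experiment[c(1:2), ]      # rows 1 and 2

# Logical indexing: only the rows that pass the test
experiment[experiment$RT < 200, ]

# Setting values: fence RTs at 200 ms and at 2000 ms
experiment[experiment$RT < 200,  "RT"] <- 200
experiment[experiment$RT > 2000, "RT"] <- 2000
experiment
```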

SLIDE 59

Types

• R treats continuous & categorical variables differently
• These are different data types:
  • Numeric
  • Factor: Variable w/ fixed set of categories (e.g., treatment vs. placebo)
  • Character: Freely entered text (e.g., open response question)

SLIDE 60

Types

• R's heuristic when reading in data:
  • Letters anywhere in the column → factor (in R 4.0.0 and later, read.csv() keeps such columns as character unless you add stringsAsFactors=TRUE)
  • No letters, purely numbers → numeric

SLIDE 61

Type Conversion: Numeric → Factor

• Sometimes we need to correct this
  • Room 4 is not “twice as much” as Room 2
• Create a new column that's the factor (categorical) version of TestingRoom:
  • experiment$Room.Factor <- as.factor(experiment$TestingRoom)
• Or, just overwrite the old column:
  • experiment$TestingRoom <- as.factor(experiment$TestingRoom)

SLIDE 62

Conversion: Character → Factor

• When ifelse() results in words, R creates a character variable rather than a factor
  • Need to convert it
• Wrong:
  • experiment$FaveRoom <- ifelse(experiment$TestingRoom==3, 'My favorite room', 'Not favorite')
• Right:
  • experiment$FaveRoom <- as.factor(ifelse(experiment$TestingRoom==3, 'My favorite room', 'Not favorite'))

SLIDE 63

Type Conversion: Factor → Numeric

• To change a factor to a number, need to turn it into a character first:
  • experiment$Age.Numeric <- as.numeric(as.character(experiment$Age))
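
The three conversions can be checked on toy data. The sketch below is made up: Age is stored as a factor on purpose, to show why the as.character() step matters. Calling as.numeric() directly on a factor returns the internal level codes rather than the ages; codes is an illustrative name for that result.

```r
# Toy data; Age is deliberately a factor (values made up)
experiment <- data.frame(
  TestingRoom = c(1, 3, 4, 3),
  Age         = factor(c("19", "22", "19", "30"))
)

# Numeric -> factor: room numbers are labels, not quantities
experiment$Room.Factor <- as.factor(experiment$TestingRoom)
levels(experiment$Room.Factor)

# Character -> factor: wrap the ifelse() result in as.factor()
experiment$FaveRoom <- as.factor(ifelse(experiment$TestingRoom == 3,
                                        "My favorite room", "Not favorite"))

# Factor -> numeric: go through character first
experiment$Age.Numeric <- as.numeric(as.character(experiment$Age))

# Skipping as.character() gives the level codes instead of the ages
codes <- as.numeric(experiment$Age)
codes
```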

SLIDE 65

NA

• We might have run into some problems trying to change Age into a numerical variable...
• NA means “not available”...
  • Characters that don't convert to numbers
  • Missing data in a spreadsheet
  • Invalid computations

SLIDE 66

NA

• If we try to do computations on a set of numbers where any of them is NA, we get NA as a result...
  • mean(experiment$Age.Numeric)
• R wants you to think about how you want to treat these missing values

SLIDE 67

NA – Solutions

• To ignore the NAs when doing a specific computation, use na.rm=TRUE:
  • sd(experiment$Age.Numeric, na.rm=TRUE)
• To get a copy of the dataframe that excludes all rows with an NA (in any column):
  • experiment.NoNAs <- na.omit(experiment)
• Change NAs to something else with logical indexing:
  • experiment[is.na(experiment$Age.Numeric)==TRUE, ]$Age.Numeric <- 23
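
A sketch of how an NA arises and the three ways of handling it. The data are made up, with one age typed as a word so the conversion fails. suppressWarnings() is used only to hide the coercion warning R would otherwise print. The last line uses the row, column form of indexing from the Logical Indexing slide as an alternative to the ]$ form above.

```r
# Toy data: one age was typed as a word (values made up)
experiment <- data.frame(
  Subject = c("S01", "S02", "S03", "S04"),
  Age     = c("19", "twenty", "25", "22"),
  stringsAsFactors = TRUE
)

# "twenty" cannot be converted to a number, so it becomes NA
experiment$Age.Numeric <-
  suppressWarnings(as.numeric(as.character(experiment$Age)))

mean(experiment$Age.Numeric)                             # NA: one value is missing
mean_age <- mean(experiment$Age.Numeric, na.rm = TRUE)   # ignore the NA
mean_age

# Copy of the dataframe without any row that contains an NA
experiment.NoNAs <- na.omit(experiment)

# Replace the NA with a chosen value
experiment[is.na(experiment$Age.Numeric), "Age.Numeric"] <- 23
experiment
```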

SLIDE 69

Getting Help

• Get help on a specific known function:
  • ?sqrt
  • ?write.csv
  • Lists all arguments
• Try to find a function on a particular topic:
  • ??logarithm

SLIDE 70

Analyses & Add-On Packages

• Some built-in analyses:
  • aov()        ANOVA
  • lm()         Linear regression
  • glm()        Generalized linear models (e.g., logistic)
  • cor.test()   Pearson correlation
  • t.test()     t-test
• Help function (?) will tell you about the arguments to these particular functions
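
As a rough illustration of how two of these built-in analyses are called, here is a sketch on made-up data. The ten RT values are invented, and the formula RT ~ Condition is just one plausible way to set up the comparison, not an analysis taken from the course.

```r
# Toy data: 5 trials per condition (values made up)
experiment <- data.frame(
  Condition = rep(c("Plausible", "Implausible"), each = 5),
  RT        = c(480, 510, 495, 530, 505, 610, 640, 590, 655, 625)
)

# t-test comparing RT between the two conditions
result <- t.test(RT ~ Condition, data = experiment)
result$p.value

# Linear regression with the same predictor; coef() shows the estimates
fit <- lm(RT ~ Condition, data = experiment)
coef(fit)

# ?t.test and ?lm list the arguments each function accepts
```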

SLIDE 71

Wrap-Up

• Can use R for:
  • Reading in data
  • Descriptive statistics
  • Subsetting data
  • Creating new variables