SLIDE 1

Table of contents

Section 1: Design
1. Introduction: You are already an experimentalist
2. Conditions
3. Items
4. Ordering items for presentation
5. Judgment Tasks
6. Recruiting participants
7. Pre-processing data (if necessary)

Section 2: Analysis
8. Plotting
9. Building linear mixed effects models
10. Evaluating linear mixed effects models using Fisher
11. Neyman-Pearson and controlling error rates
12. Bayesian statistics and Bayes Factors

Section 3: Application
13. Validity and replicability of judgments
14. The source of judgment effects
15. Gradience in judgments

SLIDE 2

What is pre-processing

Pre-processing is any manipulation you do to your data before the actual statistical analysis. For organizational purposes, I am going to lump two types of pre-processing together in this section, though they are distinct in principle.

One type of pre-processing that you will always have to do is data formatting. You need to arrange your data in such a way that you can easily do the analysis (modeling, plotting, etc.) that you need to do. Data formatting doesn't change your data, so you should feel free to do whatever you need to do to make things work.

Another type of pre-processing that you may have to do is data transformation. This is where you take your raw data and perform some number of calculations to derive new data (e.g., averaging, z-score transformations, log transformations, or, in EEG, filtering). Data transformations should always be theoretically justified, and if possible, kept to a minimum. They change your data!

I am going to cover both in this section because (i) they both use R, and (ii) the result is a data file that you can use for statistical analysis and plotting.

SLIDE 3

Formatting your data

SLIDE 4

Two formats: wide and long

When humans enter experimental data into a table, they tend to do it in wide format. It is a very intuitive format for data.

                age   trial 1   trial 2   trial 3   trial 4
participant 1    18         2         7         6         1
participant 2    22         2         6         5         1
participant 3    23         3         7         4         2

In wide format, each row represents a participant. Each column represents something about the participant, such as a property or an experimental trial. And each cell contains the value for that property.

Wide format has some uses in computer-aided analysis, typically as part of a calculation of a new value; but it is not the dominant format. I would say that I use wide format less than 5% of the time. 95% of the time, the analyses that you will perform will call for long format.
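In R, a wide-format table like the one above could be entered as a data frame. This is just a sketch; the column names are my own choice, not a required convention.

```r
# Wide format: one row per participant, one column per property or trial
wide = data.frame(
  participant = c(1, 2, 3),
  age    = c(18, 22, 23),
  trial1 = c(2, 2, 3),
  trial2 = c(7, 6, 7),
  trial3 = c(6, 5, 4),
  trial4 = c(1, 1, 2)
)
dim(wide)  # 3 rows, 6 columns: short and wide
```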

SLIDE 5

Two formats: wide and long

Wide format grows longer by one row every time you add a participant, and by one column every time you add a trial/response/measurement/property. Because many experiments will have more trials/responses/properties than participants, the table will often look like a rectangle whose width is greater than its height.

SLIDE 6

Two formats: wide and long

The primary format for computer-aided statistical analysis is long format. At first, long format is less intuitive than wide format, but you will very quickly learn to appreciate its logic.

          participant   age   condition      item   rating
trial 1             1    21   long.island       1        1
trial 2             1    21   short.non         4        7
trial 3             1    21   long.non          2        5
trial 4             1    21   short.island      3        5

In long format, each row represents a trial. Each column represents a property of that trial, such as the ID of the participant in that trial, the condition of that trial, the item used in that trial, and ultimately the rating (or response) that came from that trial.

SLIDE 7

Two formats: wide and long

Long format is called "long" because it leads to really long tables. Each participant will have a number of rows equal to the number of trials in the experiment. So 40 participants x 100 items = 4000 rows. Both formats grow longer with additional participants, but long format grows longer much faster. And long format grows longer with additional trials (wide format grows wider with additional trials).

SLIDE 8

AMT gives you results in wide format (IBEX gives results in its own hybrid format)

But that is OK: we can use R to convert the results to long format.

Exercise 6: Convert wide-format AMT data to long format

In the document exercise.6.pdf, I give you a list of functions that you can (and probably will) use to do this. The trick with this, and any script you write, is to start by writing out the steps that you want to achieve in plain English. Then you can figure out how to make R perform those steps. In this case, you are re-arranging the data. So figure out how you would do that (with cutting and pasting, and filling in labels), and then convert those steps to R.
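As a sketch of what those plain-English steps might look like in base R (with made-up column names, not the actual AMT output format): cut out each trial column, attach the participant info to it, and stack the pieces.

```r
# Toy wide-format data (hypothetical column names)
wide = data.frame(
  participant = c(1, 2, 3),
  age    = c(18, 22, 23),
  trial1 = c(2, 2, 3),
  trial2 = c(7, 6, 7)
)

# For each trial column, build a long-format piece, then stack the pieces
trial.cols = c("trial1", "trial2")
pieces = lapply(trial.cols, function(col) {
  data.frame(participant = wide$participant,
             age    = wide$age,
             trial  = col,
             rating = wide[[col]])
})
long = do.call(rbind, pieces)
nrow(long)  # 6 rows: 3 participants x 2 trials
```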

SLIDE 9

There are two solution scripts on the website

I've created two scripts that can convert wide AMT data to long format: convert.to.long.format.v1.R and convert.to.long.format.v2.R.

Version 1 works very similarly to the way you would convert from wide to long if you were cutting and pasting in Excel. It cuts away different pieces of the dataset, stacks the columns that need to be stacked, and pastes them back together.

Version 2 uses functions from two packages that were specifically designed to make manipulating data easier (including converting from wide format to long format). These packages are tidyr and dplyr. These two packages are now available in a single package called tidyverse. Tidyverse also includes other packages that are useful for data manipulation and visualization, including ggplot2, which we will use next time to make plots!

We will go through these later so that you can see what the code looks like. You can also add them to your growing library of R scripts (and use them in future experiments).

NEW: My scripts work on AMT data. Brian created a script to convert IBEX data to long format!
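For a flavor of the tidyr approach, here is a minimal sketch using pivot_longer() (in older versions of tidyr, the equivalent function was gather()); the data and column names are made up, not the actual AMT output:

```r
library(tidyr)

wide = data.frame(
  participant = c(1, 2, 3),
  trial1 = c(2, 2, 3),
  trial2 = c(7, 6, 7)
)

# pivot_longer() stacks the chosen columns into a name column and a value column
long = pivot_longer(wide, cols = c(trial1, trial2),
                    names_to = "trial", values_to = "rating")
```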

SLIDE 10

Next step: adding item information

Although it is technically possible to upload item keys to AMT, and then have the AMT results contain item keys, I typically don't do that (and IBEX cannot do that). So the AMT results don't contain item or condition labels, and we need to add those ourselves.

The csv file called results.long.format.no.items.csv contains the results of converting from wide to long format. This means we need to add the item keys to our long-format dataset. This is where our keys.csv file comes into play. We are going to use it to add item codes to the dataset. Then, we can use R to convert the item codes into condition codes and factors for each item!

I have already written a script to add item keys, derive condition codes, and derive factor/level codes. It is called add.items.conditions.factors.r.
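The mechanics of adding item keys can be sketched with base R's merge(), which joins two data frames on a shared column. The data and column names here are hypothetical stand-ins for the actual files, not the course's script:

```r
# Long-format results without item info (hypothetical)
results = data.frame(item.code = c("i1", "i2", "i1"),
                     rating    = c(5, 2, 6))

# The keys file: one row per item code (hypothetical)
keys = data.frame(item.code = c("i1", "i2"),
                  condition = c("long.island", "short.non"),
                  item      = c(1, 4))

# merge() matches rows on the shared item.code column
results.with.items = merge(results, keys, by = "item.code")
```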

SLIDE 11

Next step: Correcting scale bias (z-scores)

Recall that pre-processing is any manipulation you do to your data before the actual statistical analysis. As a general rule, you should keep the pre-processing to a minimum (pre-processing changes your data!). But there is at least one property of judgment data that people agree should be corrected before analysis: scale bias.

Scale bias: Different participants might choose to use a scale in different ways. There are two types of scale bias that are relatively straightforward to correct.

Skew: Different participants might use different parts of the scale, such as one using the high end, and another the low end.

Compression/Expansion: Different participants might use different amounts of the scale, such as one using only 3 of the 7 response options, and another using all 7.

PRO TIP: The best defense against scale bias is a well-designed experiment. Try to have the mean rating of your items equal the mid-point of your scale. Make sure all of your responses will be used, and will be used an equal number of times!

SLIDE 12

Here is an example of skew

If two participants are skewed in different ways, we are basically saying that their two private scales are separated from each other, but not because of meaningful differences in their judgments. If you were to average their results together, you would end up with the same pattern, but there would be a lot of (non-meaningful) variability (or spread) in your data.

[Figure: ratings for participant 1 and participant 2 across conditions 1-3]

Notice that these two participants both believe that condition 1 is greater than condition 2, and condition 2 is greater than condition 3. The difference here is skew on the scale.

SLIDE 13

Skew can be corrected with centering

The way to correct skew is to identify the center point of all of the ratings for each participant, and then align the center points. You have several options for choosing a center point (we will discuss these next time). The mean is the most common choice for centering to remove scale bias.

Important note: In this toy example, the center point is also the rating for a condition. This is not necessary. The mean of all of the ratings of a participant could be a number that isn't the rating of a condition.

[Figure: ratings for participant 1 and participant 2 across conditions 1-3, with each participant's center point marked]

SLIDE 14

Skew can be corrected with centering

If we align the center points, the same relationship holds among the conditions, and the same distances hold between the conditions. But the variability due to skew is removed.

[Figure: centered ratings for participant 1 and participant 2 across conditions 1-3]

This process is called mean centering; obviously, if you used a different center point, it would be a different kind of centering. One way to align the centers is to subtract the mean from each data point:

new point = old point - mean

This has the effect of turning the mean into 0, and arranging the points around 0 based on their distance from the mean.
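In R, mean centering is one line. Here is a toy example with made-up ratings for a single participant:

```r
ratings = c(5, 4, 3)               # one participant's ratings for conditions 1-3
centered = ratings - mean(ratings) # subtract the mean from each data point
centered                           # 1 0 -1: the mean is now 0
```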

SLIDE 15

Here is an example of compression/expansion

Here we have two participants that use different amounts of the scale. This means that the distances between the points are different for each of them. Notice that their centers are the same, so there is no skew.

[Figure: ratings for participant 1 (distances of 3 from the mean) and participant 2 (distances of 1 from the mean) across conditions 1-3]

If we use the mean as a center to calculate distances, what we can see is that each participant is characterized by very different distances from the mean. If we take the mean of these conditions, the means will be somewhere between the two, and there will be variability (spread) in our data set. But looking at the points, we see the same relative positions, and we see that the distance differences affect all of the points. This suggests a scale bias issue, not a meaningful difference.

SLIDE 16

We need a standard unit of distance

Again, you have a number of choices for a standard unit of distance. The most common choice is to use the participant's standard deviation. We can use the standard deviation to standardize the distances for each participant. Basically, we just divide each distance by the standard deviation!

[Figure: the same two participants, with each distance divided by that participant's standard deviation (3/sd and 1/sd)]

We will discuss standard deviation in detail next class. For the curious, it is the root mean square error (square the distances from the mean, sum them, divide by n, then take the square root). But for now, just think of it as an average measure of the distance that each data point is from the mean (for each participant).

SLIDE 17

We need a standard unit of distance

The result is that both participants use the same distance unit, "standard deviation units". Because it was a simple unit conversion (division), the structure of the data is unchanged.

One way people talk about this is that participants 1 and 2 are using different scales. By finding a common unit of distance, we can put them on the same scale. This can be done for any two scales, even qualitatively different ones.

By using a standard unit of distance, we can see the structure of the data without the interfering compression/expansion issue. Here we see that both participants share the same center, and they share the same relative distance from the mean (I just made these numbers up).

[Figure: participant 1 and participant 2 across conditions 1-3, each point 1.5 sd from the mean]

SLIDE 18

Putting both steps together: z-scores

The z-score transformation combines both steps: it centers the scores around the mean, and it converts the units to standard deviations.

z = (the judgment - the participant's mean) / the participant's standard deviation

Pro Tip 1: It is crucial that the z-score transformation is applied to each participant separately. That way you are eliminating the scale bias for each individual participant. If you z-score transform the entire sample at once, it won't eliminate any scale bias, it will just convert the values to z-units (think about this offline to see why!).

Pro Tip 2: If your goal is to eliminate scale bias, you have to use all of the data points from the participant (target items and fillers, not just the target items). I would also recommend not including the practice items. The practice items are there to help people learn how to use the scale. So those items might have different bias properties than the later items. So, my suggestion is to perform the z-score transformation using all of the items except the practice items.
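The formula translates directly into R. Here is a toy example for a single participant, with made-up ratings:

```r
ratings = c(2, 4, 6)    # one participant's ratings
z = (ratings - mean(ratings)) / sd(ratings)
z                        # -1 0 1: centered at 0, in standard deviation units
```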

SLIDE 19

Some thoughts about z-scores

Advantages: The primary benefit of the z-transformation is that it will help to eliminate the most common forms of scale bias, making the comparison of judgments across participants less noisy. This reduction in noise results in a noticeable increase in statistical power (scale bias introduces additional variance into the model). The z-score scale is also intuitive: 0 represents the mean, the sign of the score indicates whether it is above or below the mean, and the number represents the number of standard deviations! Finally, it is relatively easy to compute. So all we need to do is apply it to each participant.

Disadvantages: Because the z-transformation does not alter any of the information in a data set (it is a linear transformation), there are not many risks at all. The only real risk would be if the skew that you saw as scale bias was actually meaningful. You need to be sure it is not meaningful. Typically, if each participant saw the same items, then any bias is an artifact; but if you give participants wildly different items, scale differences might be meaningful.

SLIDE 20

Exercise 7: Adding z-scores to our dataset

Exercise 7: add z-scores to the dataset In the document exercise.7.pdf, I give you a list of functions that you can (and probably will) use to do this. The trick with this is to figure out how you would calculate z-scores for each participant, then figure out how to make R perform these calculations.

SLIDE 21

My scripts: Adding z-scores to our dataset

On the website I have two versions of the z-score script. Once again, I've made two: a longer one and a shorter one.

add.z.scores.v1.r: The first is the long way. This calculates z-scores the same way you would do it if you were using Excel by hand.

add.z.scores.v2.r: The second takes advantage of two built-in functions in R: split() and scale(). Once you understand how the z-score works, add.z.scores.v2.r will save you time.
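A sketch of the split()/scale() idea (with hypothetical data, not the actual script): split() breaks the ratings into one vector per participant, scale() z-transforms each vector, and unsplit() puts them back in the original row order.

```r
d = data.frame(
  participant = c(1, 1, 1, 2, 2, 2),
  rating      = c(2, 4, 6, 5, 6, 7)
)

# z-score each participant's ratings separately
by.participant   = split(d$rating, d$participant)
z.by.participant = lapply(by.participant, function(x) as.numeric(scale(x)))
d$zscore         = unsplit(z.by.participant, d$participant)
d$zscore  # -1 0 1 for each participant
```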

SLIDE 22

Removing outliers

An outlier is an experimental unit (either a participant or a judgment) that is substantially different from other experimental units. Outliers add noise to your data, which can lead to errors (a null result when there is a real difference, or a false positive result when there is no real difference). There are a number of ways to deal with outliers. Here I will review three common approaches, in the order in which I recommend them (with colors indicating the danger!).

1. Run more participants. This will diminish the impact of an outlier. The nice thing about this approach is that you don't have to make any assumptions. You just report the data you have with no changes.

2. Use gold-standard questions. If you include sentences with known ratings, you can identify participants who rate these known sentences substantially differently (than expected), and eliminate those participants. There are two nice properties of this approach: (i) it does not rely on the experimental items, and (ii) you remove entire participants.

3. Trim (or winsorize) the data. You can also look at the distribution of judgments for each experimental item, and remove outliers. The risk here is that bias can creep in (you are looking at the experimental items directly, so you could make choices that bias toward one outcome or another).

SLIDE 23

An approach that uses gold-standard questions

My preferred approach is to simply run more participants. AMT makes this very easy. The only downside is that it increases the cost of the experiment.

If I can't run many participants, my second choice is to use gold-standard questions. In the design that we have been using, all of our filler items can serve as gold-standard questions because we pre-tested the fillers, and know exactly what their expected rating should be (the mean or mode).

Of course, there is some noise in judgments, so we don't expect every participant to give the precise mean rating for each filler. So we don't just want to eliminate everyone whose response differs from the expected value. That would probably eliminate everybody. Remember, we expect variation in humans. So what we want to do is quantify the variation that each participant shows from the expected judgments, and then eliminate any participant that shows substantially more variation than the other participants. One common way to do this is with a sum of squares measure of error.

SLIDE 24

Calculating variation using sum of squares

item   expected   observed   difference   difference²
 1F           1          2            1             1
 4F           4          2           -2             4
 6F           6          7            1             1
                                         sum:       6

We run the calculation for each participant separately. First, we calculate the difference between the expected value of each filler and the value that we observed from the participant. We can't sum this value directly, because it could be either positive or negative, and the two will cancel each other out (giving the appearance of a good fit). So, next, we square those difference scores to eliminate the negative signs. Finally, we sum the squared differences to obtain a final variation score for the participant.
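The calculation in the table is a couple of lines of R:

```r
expected = c(1, 4, 6)   # pre-tested filler ratings (items 1F, 4F, 6F)
observed = c(2, 2, 7)   # one participant's observed ratings
ss = sum((observed - expected)^2)
ss                       # 1 + 4 + 1 = 6
```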

SLIDE 25

Setting a criterion for exclusion

After we run this for each participant, we will end up with a distribution of scores (these are derived from identify.and.remove.outliers.r).

One common way to identify outliers (in general) is to take the mean and standard deviation of some distribution of values, and then call any value that is some number of standard deviations away from the mean (in either direction) an outlier. Since only high scores are bad here, we only need to look in the positive direction.

The mean of these values is 27.367. The standard deviation of these values is 16.694. So, using a two-SD criterion, any value above 27.367 + 2(16.694) = 60.755 would be an outlier. There are only two subjects above this threshold, so by this procedure there are two outliers.

The number of SDs that you choose determines how many outliers you will have. A low number like 2 will yield more outliers; a high number like 4 will yield fewer.
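The exclusion criterion can be sketched like this (the scores are made up, and the 2-SD cutoff is one common choice, not the only one):

```r
# One sum-of-squares score per participant (hypothetical values)
scores = c(20, 25, 30, 22, 28, 24, 26, 23, 27, 90)

cutoff = mean(scores) + 2 * sd(scores)   # only high scores are bad here
outliers = which(scores > cutoff)        # here, only the participant with 90
```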
SLIDE 26

Now, let's look at R and the scripts!

SLIDE 27

What is R?

R is a programming language designed for statistics. That's it. It is called R for two reasons: there is a proprietary statistical language called S that serves as a model for R, and the two creators of R are Ross Ihaka and Robert Gentleman.

Why do so many people love R? It is free, open-source, and cross-platform. It has a giant user community. Anything you want to do has probably been done before, so there are pre-built packages and internet help groups galore. It allows you to do three things that you need to do: (i) manipulate data/text files, (ii) analyze your data, and (iii) create publication-quality figures (no, you can't use Excel for figures in publications).

Yes, Matlab (proprietary) and Python (free) can do the same things. You can absolutely use those if you prefer. But R is specifically designed for stats and graphics, whereas Matlab is designed for matrix algebra, and Python is a general computing language.

SLIDE 28

Interacting with R: the R console

R is a programming language, so you need a way to interact with the language. The R Project (the developers of R) provides a "console" to allow you to interact with the R language. You type a command into the console, and the R language implements that command.

If you want to save your code, you can type it into a text editor like TextWrangler (Mac) or Notepad++ (Windows). Then you just have to move the text from the editor to the console window to run it (you can copy and paste, or create a shortcut key that does it for you).

SLIDE 29

Interacting with R: R Studio

RStudio is a third-party piece of software (free) that allows you to interact with the R language in a single, unified environment. Its panes give you:

- A text editor for saving your code.
- The R console that runs your commands.
- A pane where you see your plots.
- A pane where you see your objects and history.

SLIDE 30

R is an interpreted language

R is an interpreted language. That means that you tell it to run a function, and it does. You can run one function at a time, or several in sequence.

Here are 5 functions. The first three only have one argument: they assign a number to a variable. The fourth one has multiple arguments: it calculates the sum of the three variables. The fifth takes one complex argument.

Notice that R runs each function. If it is a calculation, it gives you the result.
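The five functions on the original slide aren't recoverable from the text, but they might have looked something like this sketch (the specific values are my own):

```r
x = 1                 # one argument: assign a number to a variable
y = 2
z = 3
s = sum(x, y, z)      # multiple arguments: the sum of the three variables
v = c(x, y, z)        # one complex argument: combine the three into a vector
s                      # typing the name prints the result: 6
```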

SLIDE 31

Setting the working directory

R needs a working directory to do its work. The working directory is the directory (or folder) on your computer where it looks for files. This is also where it will save files.

To see the current working directory that R is using, you can type the command getwd(). R will print the current working directory in the console window.

To change the working directory, you can use the command setwd(). Unlike getwd(), setwd() needs an argument inside of the parentheses. The argument it needs is the name of the new working directory. I like to use my desktop for small projects, so I type the following: setwd("/Users/jsprouse/Desktop"). Notice that the directory must be in quotes. Character strings must be in quotes in R (either double or single, it is your choice).

Also note that nothing seems to happen when you set the working directory. R just does that in the background, and waits for a new command.

SLIDE 32

Base functions and add-on packages

The language R comes with a (very large) set of functions built in. This built-in library of functions is called the base. But R is also a complete (as in Turing-complete) programming language, so you can easily create new functions for yourself. There is literally a function called function() that you can use to define a new function of your own.

When people write functions that they think are useful enough to share with the world, they combine them together into something called a package (or library). Packages often consist of several thematically related functions that help you run a specific kind of task (or analysis).

These user-created packages are part of the reason that R is so incredibly useful. Nearly every analysis you can think of has been implemented by somebody in an R package. You just need to do some searching to find the right package for the job you want to do. (And if it can be done using base functions, somebody on the internet has posted the code to do it.)

SLIDE 33

Installing and loading packages

Once you know the name of a package that you want to use, you can install it right from the command line in the console of R. To install a new package, you can use the command install.packages(). Then, you just put the name of the package, in quotes, inside of the parentheses. For example, if you want to install the tidyr package, you would type install.packages("tidyr") and hit enter. R will find the package in an online repository (called CRAN, for Comprehensive R Archive Network), and install it on your machine.

R does not load every package that you install when you open R. You have to tell it to load a package (some of them are large, so it would take time and memory to load them all). To do this, you use the function library(). You put the name of the package inside the parentheses, this time without quotes. For example, if you want to use the functions in tidyr, you would run the command library(tidyr) and hit enter.

SLIDE 34

Reading/Writing csv files

To read in data from a CSV file, you use the function read.csv():

amtdata = read.csv("raw.data.from.AMT.csv")

If you type the name of the data set, amtdata, and press enter, R will print it out for you on the screen. You can also use the functions head() and tail(): head(amtdata) will show you the first 6 rows of the data set; tail(amtdata) will show you the last 6 rows.

To write an existing piece of data in R to a CSV file, you use the aptly named function write.csv(), where x is the argument specifying the data you want to write, and file is the name of the CSV file you want R to create:

write.csv(x=amtdata, file="my.first.file.csv", row.names=FALSE)

Notice that I've used the optional argument row.names=FALSE here to suppress R's default action of putting row names in the first column of the CSV.

SLIDE 35

Reading about functions inside of R

R comes with a fairly complete help system, and you should use it. The primary use of the help system is to see all of the arguments that you can pass to a function, along with descriptions of what they do, and examples that demonstrate them. To see the help page for a function, just type a question mark followed by the function name, and press enter: ?write.csv Go ahead and do that now, and take a look at all of the information it provides. It may take a while to get used to reading this information (it is dry, without much hand-holding), but trust me, over time, you will find the help files really useful.

SLIDE 36

Data types in R

R recognizes several different data types:

vector: A one-dimensional object, like a sequence of numbers.

matrix: A two-dimensional object, where all of the items in the matrix are of the same type (all numbers, or all character strings, etc.).

array: Like a matrix, but can have more than two dimensions.

data frame: Two dimensions, and perfectly suited to data analysis. Each column can be of a different type (numbers, strings, etc.).

list: Just a collection of objects. This is the most general data type. It allows you to collect multiple (possibly unrelated) objects together under one name.

For experimental data analysis, the goal is to put your results into a data frame. Along the way, you may construct vectors, matrices, etc. But the final object will be a data frame that you can use to run analyses and create plots.
SLIDE 37

Indexing data types

Indexing means identifying a specific element in your data. As you can imagine, the way that you index an element depends on the data type that you have.

vector: A one-dimensional object, like a sequence of numbers. You can create a vector using the c() function (it stands for "combine"):

x = c(1, 3, 5, 7, 9)

And you can index an element in a vector by using bracket notation, and referring to the ordinal number of the element:

x[2] #this will return 3
x[5] #this will return 9

(The hash mark indicates a comment in R.) You can also change an element in a vector, while leaving everything else the same, by using the bracket notation:

x[2] = 17 #this will make x the sequence 1, 17, 5, 7, 9
x[5] = 23 #this will make x the sequence 1, 17, 5, 7, 23

SLIDE 38

Indexing data types

matrix: A two-dimensional object. You can create a matrix using the matrix() function:

y = matrix(1:16, nrow=4, ncol=4)

And you can index an element in a matrix by using bracket notation with two numbers. The first is the row number, the second is the column number:

y[2,4] #this will return the element in row 2 / column 4
y[2,] #this will return the entire second row
y[,4] #this will return the entire fourth column
y[1:2,3:4] #this will return the first two rows of columns three and four

Just like with vectors, you can replace elements in a matrix using the bracket notation. I'll leave that to you.

SLIDE 39

Indexing data types

data frame: A two-dimensional object, optimized for data analysis. You can create a data frame using the data.frame() function:

names = c("Mary", "John", "Sue")
ages = c(22, 25, 27)
colors = c("red", "blue", "green")
people = data.frame(names, ages, colors)

You can index data frames using bracket notation:

people[2,3] #this will return "blue"

You can also index data frames by naming the columns using the $ operator:

people$names #this will return the names column as a vector
people$names[2] #this will return "John"

SLIDE 40

Indexing data types

list: A collection of objects. You can create a list using the list() function:

mylist = list(x, y, people)

You can index elements of a list using a double bracket:

mylist[[1]] #this will return the vector x
mylist[[2]] #this will return the matrix y
mylist[[3]] #this will return the data frame people

Once you've indexed a list element, you can use bracket notation to index specific elements inside that element:

mylist[[2]][2,4] #this will return 14

SLIDE 41

Assignment operators

You've already seen one assignment operator in action: the equal sign. An assignment operator allows you to assign an object (like a vector or matrix) to a variable name (like x, or mylist). There are three assignment operators in R:

x = c(1,2,3)
The equal sign assigns the element on the right to the variable name on the left.

x <- c(1,2,3)
The left arrow (made from a less-than sign and a dash) assigns to the left.

c(1,2,3) -> x
The right arrow (made from a greater-than sign and a dash) assigns to the right.

SLIDE 42

Logical operators

Logical operators check to see if a given mathematical statement is true. I put them here because they shouldn't be confused with assignment operators:

5 == 2+3
This is logical equals. It checks to see if the values on either side are equal to each other. Notice it is two equal signs.

5 > 2   Greater than
5 < 8   Less than
5 >= 2  Greater than or equal to
5 <= 8  Less than or equal to
5 != 4  Not equal to

You can apply logical operators to any data type, including matrices, data frames, etc.:

y <= 5 #Remember that y is a 4x4 matrix. This will return a 4x4 matrix of TRUEs and FALSEs.

SLIDE 43

Learning R

The assignments in this course will help you learn R. I'll give you a list of functions that will help you with the assignment, and then it will be up to you to work with them to complete the assignment. I am doing it this way because the only way to learn a programming language is to jump in and do it. That said, I realize that you won't be able to do it without help. So here are the ways to get more knowledge:

1. Google your question (the answer will be on StackOverflow). R has a huge user community. You will likely find an answer, or an answer to a similar question.
2. Read an R tutorial. There are tons of free tutorials out there. I am not going to recommend any specific ones, because they all cover the same stuff for the most part.
3. Read a book. There are tons of free R books out there. Again, just google for them.