BootcampR
AN INTRODUCTION TO R
Jason A. Heppler, PhD University of Nebraska at Omaha February 18, 2020 @jaheppler
Welcome! Hi. I'm Jason.
I like to gesture at screens.
Digital Engagement Librarian, University of Nebraska at Omaha
Mentor, Mozilla Open Leaders
Researcher, Humanities+Design, Stanford University
Open up RStudio. We'll start doing a few things together soon.
new functions and methods, and extended capabilities.
stored.
update.packages() in the console or the package update interface in RStudio.
There may not be a regular release cycle.
there.
reproducible example available (a short simulated dataset and code that replicate the problem). For example:
    foo <- c(1, "b", 5, 7, 0)
    bar <- c(1, 2, 3, 4, 5)
    foo + bar
    Error: non-numeric argument to binary operator
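One way to track down an error like the one above is to ask R what type each vector is. This is a minimal sketch; the name foo_fixed is a hypothetical addition for illustration, not part of the original example.

```r
foo <- c(1, "b", 5, 7, 0)   # the "b" turns every element into a string
bar <- c(1, 2, 3, 4, 5)

class(foo)   # "character"
class(bar)   # "numeric"

# Hypothetical fix: replace the stray "b" with a number
foo_fixed <- c(1, 2, 5, 7, 0)
foo_fixed + bar   # 2 4 8 11 5
```

Checking class() first usually shows why an arithmetic operator refuses its input.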
    data(mtcars)
    mtcars
                       mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
    Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
    Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
    Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
    Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
    Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
    Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
    Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
    Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
    [ reached 'max' / getOption("max.print") -- omitted 23 rows ]
central feature for doing data analysis in R.
manipulating data.
either be by the entire row or entire column.
R as a calculator
    2 + 2
    [1] 4
    2 * pi # multiply by a constant
    [1] 6.283
    7 + runif(1, min = 0, max = 1) # add a random variable
    [1] 7.375
    4^4 # powers
    [1] 256
    sqrt(4^4) # functions
    [1] 16
    data(trees)
    median(trees$Girth)
    var(trees$Girth) # variance
    sd(trees$Girth) # standard deviation
    max(trees$Girth) # max value
    min(trees$Girth) # min value
    range(trees$Girth) # range
    quantile(trees$Girth) # quantiles (0%, 25%, 50%, 75%, 100%)
    fivenum(trees$Girth) # box plot elements
    length(trees$Girth) # number of observations for a variable
    length(trees) # number of variables (columns) in a dataset
    nrow(trees) # number of rows in a data frame
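For a data frame, length() counts columns rather than rows, so it helps to check the size directly. A short sketch using the same built-in trees dataset; the variable name tree_dims is hypothetical.

```r
data(trees)

tree_dims <- dim(trees)   # rows, then columns
tree_dims                 # 31 3
nrow(trees)               # 31 observations
ncol(trees)               # 3 variables
names(trees)              # "Girth" "Height" "Volume"
```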
R supports the usual arithmetic operators +, -, *, /, and ^, plus integer division %/% and remainder (modulo) %%. The == operator tests whether two values are equal. Try the following in the console:
    2 + 2
    2/2
    2 * 2
    2^2
    2 == 2
    23%/%2
    23%%2
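Integer division and remainder fit together: the quotient times the divisor, plus the remainder, gives back the original number. A small sketch of that relationship; the variable names are hypothetical.

```r
number  <- 23
divisor <- 2

quotient  <- number %/% divisor   # 11
remainder <- number %% divisor    # 1

quotient * divisor + remainder == number   # TRUE
```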
<- is the assignment operator. RStudio has a keyboard shortcut for it: Option + - on macOS and Alt + - on Windows and Linux. This differs from most programming languages, which often use a single = for assignment. Try entering this into your console:
    foo <- 3
    foo
: is the sequence operator. We can create ranges this way. Try the following in the console:
    1:10
You can also store these ranges in a variable. Try:
    a <- 100:120
    a
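The : operator always steps by one. When a different step is needed, base R's seq() function is one option. A brief sketch; the variable names are hypothetical.

```r
countdown <- 10:1                        # : also counts down
odds      <- seq(1, 10, by = 2)          # 1 3 5 7 9
quarters  <- seq(0, 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00

countdown
odds
quarters
```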
# is for writing comments. Anything after the # is ignored by R and not evaluated.
    # Something I want to keep from R, but mostly
    # notes for myself or someone else so they
    # understand what's happening with the following
    # code. Below, we just add two numbers.
    2 + 2
We can do plenty of advanced math in R. For example, we can generate distributions of data very easily. Try this:
    rnorm(100)
Neat, huh? Now try this:
    hist(rnorm(10000))
On advanced R:
<https://adv-r.hadley.nz>
<https://r4ds.had.co.nz>
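The random draws from rnorm() change on every run. If you want the same numbers each time, for instance in a reproducible example, setting a seed first is a common approach. A minimal sketch; the seed value 42 is arbitrary and the name draws is hypothetical.

```r
set.seed(42)           # arbitrary seed chosen for this sketch
draws <- rnorm(100)

mean(draws)            # close to 0
sd(draws)              # close to 1

set.seed(42)           # reset to the same seed...
identical(draws, rnorm(100))   # ...and the draws repeat: TRUE
```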
R lets us store vectors, datasets, and functions in memory. All R
easy for organizing the workspace. Try the following in the console:
    x <- 5 # store the variable
    x # print the variable
    z <- 3
    ls() # list all variables
    ls.str() # list and describe variables
    rm(x) # delete a variable
    ls()
R, like any programming language, has a set of rules to follow. You'll learn more as you go, but let's cover a few quick ones.
    a <- 3
    A <- 4
    print(c(a, A)) # what happens if you type print(a,A)?
    # Are they the same?
    a == A
multiple elements. If you need two elements in a vector, you need to wrap them in c(). c() can combine any vectors, but typically you want to keep all the elements of a vector the same type (e.g., don't mix strings and numbers).
    G <- c(3,4)
    print(G)
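Because c() accepts vectors as well as single values, it can also extend a vector you already have. A short sketch; the name H is hypothetical.

```r
G <- c(3, 4)
H <- c(G, 5, 6)   # c() flattens its arguments into one vector

H           # 3 4 5 6
length(H)   # 4
```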
functions are camelCase, others are.dot.separated, others use_underscores. RStudio autocomplete can try to help.
even sometimes sharing function names. Sometimes you'll need to tell R explicitly which package you're referring to. This is done with two colons :: (e.g., dplyr::filter())
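The :: syntax works with any installed package, including the ones that ship with R. The sketch below uses the built-in stats package so that nothing needs installing; stats also has its own filter() function, which is one reason dplyr::filter() is often written out in full.

```r
# stats ships with R, so these calls work in a fresh session
stats::median(c(1, 5, 9))   # 5
stats::sd(c(2, 4, 6))       # 2
```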
Everything in R is an object, even functions. We can manipulate
summary function to a variety of object types. Let's try this.
    # summary of columns 1, 2, and 3
    summary(mtcars[, 1:3])
    # summary of a single column
    summary(mtcars$mpg)
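To see how summary() adapts to what it is given, it can help to try it on a few small hand-made objects. A sketch with made-up values; the name level_counts is hypothetical.

```r
summary(c(1, 2, 3, 4, 100))     # numeric: min, quartiles, mean, max
summary(c(TRUE, FALSE, TRUE))   # logical: counts of FALSE and TRUE

level_counts <- summary(factor(c("a", "b", "a")))   # factor: count per level
level_counts                    # a: 2, b: 1
```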
Since everything is an object in R, we can do all sorts of
    length(unique(mtcars$mpg))
We can also store the results of function calls.
    unique_mpg <- length(unique(mtcars$mpg))
    unique_mpg
We can use comparison operators to compare values across
    big <- c(9,12,15,25)
    small <- c(9, 3, 4, 2)
    big > small
    big == small # don't do big = small!
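A comparison returns a logical vector, which can then be counted or used to pick out elements. A brief sketch building on the same two vectors; the name bigger is hypothetical.

```r
big   <- c(9, 12, 15, 25)
small <- c(9, 3, 4, 2)

bigger <- big > small   # FALSE TRUE TRUE TRUE
sum(bigger)             # 3, because each TRUE counts as 1
big[bigger]             # 12 15 25
which(big == small)     # 1, the position where the values match
```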
R supports several data types. Three basic ones are numeric, character, and logical. A vector must hold one consistent type of data. If you make a vector that mixes types, R converts every element to a common type; mixing numbers and strings, for example, gives a character vector.
    is.numeric(A)
    is.character(A)
    is.logical(A)
    # If you don't know what the data type is,
    # just ask!
    class(A)
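What a mixed vector turns into depends on what is in it. A small sketch with made-up values; the variable names are hypothetical.

```r
mixed <- c(1, "two", TRUE)
class(mixed)   # "character"
mixed          # "1" "two" "TRUE"

nums_and_logicals <- c(1, TRUE, FALSE)
class(nums_and_logicals)   # "numeric"
nums_and_logicals          # 1 1 0
```

Any string in the mix makes the whole vector character; without strings, logical values become 1 and 0.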
R supports several more classes beyond numeric, character, and logical. These include linear models, matrices, networks, spatial data frames, and others. Classes determine what you can and cannot do to objects.
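class() works on these richer objects too. The sketch below builds a matrix and a simple linear model from the built-in mtcars data; the names m and fit, and the choice of mpg against wt, are assumptions made for illustration.

```r
m <- matrix(1:6, nrow = 2)
class(m)        # includes "matrix"

fit <- lm(mpg ~ wt, data = mtcars)
class(fit)      # "lm"

class(mtcars)   # "data.frame"
```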
Head on over to https://tinyurl.com/unobootcamp
Next workshop: February 25, 1:30p-3p: Spark Joy with Data (CL 232)