RStudio, vectors Steve Bagley somgen223.stanford.edu



slide-1
SLIDE 1

RStudio, vectors

Steve Bagley

somgen223.stanford.edu 1

slide-2
SLIDE 2

Introduction


slide-3
SLIDE 3

Course information and philosophy

  • Course website: somgen223.stanford.edu
  • Website has detailed syllabus, homework, links to videos
  • Structure of the course:
  • lecture with in-class interactive exercises
  • video/recording via Zoom. Videos should be available shortly after class ends.
  • Q/A using Zoom chat window during class
  • Q/A, announcements, discussion using Piazza outside of class
  • homework with rapid reinforcement (approx one per week)
  • review answers in class immediately after homework is due
  • no late submissions
  • drop lowest homework grade
  • final data analysis project


slide-4
SLIDE 4

Using RStudio


slide-5
SLIDE 5

The basics of interaction using the console window

In RStudio, the console window is the left (or lower-left) pane. The R console uses a “read, eval, print” loop, sometimes called a REPL.

  • Read: R reads what you type …
  • Eval: R evaluates it …
  • Print: R prints the result …
  • Loop: (repeat forever)


slide-6
SLIDE 6

A simple example in the console

  • This is an expression that will be evaluated by R, followed by the result of evaluating that expression.

    > 1 + 2
    [1] 3

  • 3 is the answer.
  • [1] means: the answer is a vector and this line starts with the first element of that vector.

  • It does not mean the answer has one element, although that is true in this case.


slide-7
SLIDE 7

The console

  • The console window is a great place to test something simple.
  • It is not a great place to develop and save your ideas.


slide-8
SLIDE 8

R script pane

  • Use the menu item File / New File / R Script to create a new script pane.

  • You can save the contents of this pane to a file.
  • You can evaluate all or part of the script in the console window.
  • To evaluate the current line (or the selected lines), hit Command-RETURN (Mac) or Ctrl-ENTER (Linux/Windows).
  • That line is automatically copied to the console pane and evaluated.


slide-9
SLIDE 9

Spaces (mostly) don’t matter

    1 +2
    1+ 2
    1+2
    1 + 2

  • These all do the same thing.


slide-10
SLIDE 10

R is a scientific calculator

    1 + 2 * 3
    [1] 7
    log(10)
    [1] 2.302585
    4 * atan(1)
    [1] 3.141593

  • * denotes multiplication.
  • log is the natural logarithm.
  • atan is the one-argument arctangent.
  • The result of each of these expressions is a vector of one element.
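
A few more calculator-style expressions, as a minimal sketch. The functions log10 and exp and the constant pi are part of base R but do not appear on the slide; the variable names are only illustrative.

```r
# log() is the natural logarithm; log10() is the base-10 logarithm.
natural_log <- log(10)        # about 2.302585
base_ten_log <- log10(1000)   # 3

# exp() is the inverse of log().
round_trip <- log(exp(2))     # 2

# 4 * atan(1) computes pi; R also provides a built-in constant pi.
pi_from_atan <- 4 * atan(1)
all.equal(pi_from_atan, pi)   # TRUE
```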


slide-11
SLIDE 11

Adding comments

    ## This is a comment
    1 + 2 # add some numbers
    [1] 3

  • Use a # to start a comment.
  • A comment extends to the end of the line and is ignored by R.
  • The recipient of a comment is you in six months.


slide-12
SLIDE 12

Vectors


slide-13
SLIDE 13

Creating vector of ascending numbers

    2:4
    [1] 2 3 4
    1 + (2:4)
    [1] 3 4 5
    3:5 # same as above
    [1] 3 4 5

  • A vector is a one-dimensional sequence of zero or more numbers (or other values).
  • The colon : is an R operator to create a vector that is a sequence of integers from the first number to the second number (inclusive).
  • Many R functions and operators automatically work with multi-element vector arguments.
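
A short sketch of a few more things the colon operator does. The variable names are illustrative, and length is a base R function that is introduced later in these slides.

```r
ascending <- 2:4     # 2 3 4
descending <- 4:2    # 4 3 2: the colon also counts down
single <- 3:3        # a vector with one element: 3

length(ascending)    # 3, because both endpoints are included
ascending * 10       # 20 30 40: the operator is applied to every element
```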


slide-14
SLIDE 14

Creating a longer vector

    0:50
     [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
    [21] 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
    [41] 40 41 42 43 44 45 46 47 48 49 50

  • Long vectors wrap around.
  • Look at the [ ] notation. The second output line starts with 20, which is the 21st element of the vector.

  • Your screen may have a different width than what is shown here.
  • This notation will help you figure out where you are in a long vector.
  • R starts counting at 1.
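
The index labels can be checked directly. This is a small sketch that uses square-bracket indexing, which is base R but has not been introduced yet in these slides, so treat it as a preview. The name long_vector is illustrative.

```r
long_vector <- 0:50

length(long_vector)   # 51 elements, because both 0 and 50 are included
long_vector[1]        # 0: the first element has index 1
long_vector[21]       # 20: the 21st element
long_vector[41]       # 40: the 41st element
```

Because the vector starts at 0 while R counts positions from 1, each element is one less than its index.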


slide-15
SLIDE 15

Operator precedence examples

    0:10
     [1]  0  1  2  3  4  5  6  7  8  9 10
    1 + 0:10 # which operator gets executed first?
     [1]  1  2  3  4  5  6  7  8  9 10 11
    1 + (0:10)
     [1]  1  2  3  4  5  6  7  8  9 10 11
    (1 + 0):10
     [1]  1  2  3  4  5  6  7  8  9 10

  • Note the subtle differences in these expressions and their results.
  • R does not evaluate strictly left to right. Instead, operators have their own precedence, a kind of priority for evaluation.

  • The sequence operator : has a higher precedence than addition +.
  • Use parentheses to enforce the desired evaluation order.
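
A few more precedence cases, sketched under the ordering documented in R's ?Syntax help page (^ first, then unary minus, then :, then * and /, then + and -). The variable n is illustrative.

```r
n <- 5

1:n - 1      # (1:n) - 1 gives 0 1 2 3 4
1:(n - 1)    # 1 2 3 4

2 * 1:3      # 2 * (1:3) gives 2 4 6
-1:2         # (-1):2 gives -1 0 1 2, because unary minus binds tighter than :
1:2^2        # 1:(2^2) gives 1 2 3 4, because ^ binds tighter than :
```

The first pair is a common source of off-by-one mistakes, so writing the parentheses explicitly is usually worth the extra characters.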


slide-16
SLIDE 16

Assigning values to names


slide-17
SLIDE 17

Saving values by setting a variable

x <- 10

  • To do more complex computations, we need to be able to give names to things.
  • Read this as “x gets 10” or “assign 10 to x”
  • R prints no result from this assignment, but what you entered causes a side effect: R has stored the association between x and 10. (Look at the Environment pane.)


slide-18
SLIDE 18

Using the value of a variable

    x
    [1] 10
    x / 5
    [1] 2

  • When R sees the name of a variable, it uses the stored value of that variable in the calculation.
  • Here R uses the value of x, which is 10.
  • / is the division operator.
  • We can break complex calculations into named parts. This is a simple, but very useful kind of abstraction.


slide-19
SLIDE 19

Two ways to assign

    x <- 10
    x
    [1] 10
    x = 20
    x
    [1] 20

In R, there are two assignment operators. They have subtly different meanings (more details later).

  • <- requires that you type two characters. Don’t put a space between < and -. (What would happen?)
  • RStudio hint: Use “Option -” (Mac) or “Alt -” (Linux/Windows) to type this using one key combination.

  • = is easier to type.
  • You will see both used throughout R and user code.
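
A minimal sketch of two places where the difference shows up: a stray space inside <-, and assignment inside a function call. The helper name x_after_equals is illustrative; mean is the base R function used later in these slides.

```r
x <- 10      # assignment: x gets 10

# With a space between < and -, R reads a comparison instead:
# "is x less than -5?". No assignment happens.
x < - 5      # FALSE
x            # still 10

# Inside a function call, = names an argument and does not create or
# change a variable in your workspace.
mean(x = c(1, 2, 3))    # 2
x_after_equals <- x     # still 10

# <- inside a call assigns first, then passes the value along.
mean(x <- c(1, 2, 3))   # 2
x                       # now 1 2 3
```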


slide-20
SLIDE 20

Warning: Assignment has no undo

    x <- 2232.348
    x
    [1] 2232.348
    ## later in your code:
    x <- 0
    x
    [1] 0

  • If you assign to a name with an existing value, that value is overwritten.
  • There is no way to undo an assignment, so be careful in reusing variable names.
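
One way to avoid losing a value, sketched with illustrative names: give each derived quantity its own name instead of reusing the original one.

```r
measurement <- 2232.348

# Risky: reusing the name would destroy the original value.
# measurement <- measurement / 1000

# Safer: the derived value gets a new name.
measurement_scaled <- measurement / 1000

measurement          # 2232.348, still available
measurement_scaled   # 2.232348
```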


slide-21
SLIDE 21

Naming variables

  • It is important to pick meaningful variable names.
  • Names can be too short, so don’t use x and y everywhere.
  • Pick names that will make sense to someone else (including the person you will be in six months).

  • ADVANCED: See ?make.names for the complete rules on what can be a name.


slide-22
SLIDE 22

Case matters for names in R

    a <- 1
    A
    Error in eval(expr, envir, enclos): object 'A' not found

  • Now a has a value. A does not have a value.
  • R cares about upper and lower case in names.
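
The same point with a longer name, as a sketch. The functions exists, tryCatch, and conditionMessage are base R but are not covered in these slides; the variable names are illustrative.

```r
sample_count <- 12

exists("sample_count")   # TRUE
exists("Sample_Count")   # FALSE: different capitalization, different name

# tryCatch() captures the error message instead of stopping the script.
error_text <- tryCatch(
  Sample_Count,
  error = function(e) conditionMessage(e)
)
error_text   # in an English locale: "object 'Sample_Count' not found"
```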


slide-23
SLIDE 23

Elementwise operations on a vector

  • This multiplies each element of x by the corresponding element of x, that is, it squares each element.

    x <- 1:10
    y <- x * x
    y
     [1]   1   4   9  16  25  36  49  64  81 100

  • Equivalently, we could use exponentiation:

    z <- x^2
    z
     [1]   1   4   9  16  25  36  49  64  81 100
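
Elementwise operations also apply to two different vectors of the same length. A small sketch with illustrative names:

```r
heights <- c(2, 3, 4)
widths <- c(10, 20, 30)

heights * widths   # 20 60 120: first with first, second with second, ...
heights + widths   # 12 23 34
heights * 2        # 4 6 8: the single value 2 is reused for every element
heights^2          # 4 9 16
```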


slide-24
SLIDE 24

Calling built-in functions

    sqrt(2)
    [1] 1.414214
    sqrt(0:10)
    [1] 0.000000 1.000000 1.414214 1.732051 2.000000 2.236068
    [7] 2.449490 2.645751 2.828427 3.000000 3.162278

  • To call a function, type the function name, then the argument(s) in parentheses.
  • Use a comma to separate the arguments, if more than one.
  • Nearly all R functions operate on multi-element vectors as easily as on vectors containing a single element.
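
The slide's examples all take one argument. As a sketch of calls with more than one argument, here are two base R functions that accept a second, named argument (round with digits, log with base); neither call appears on the slide.

```r
round(3.14159, digits = 2)     # 3.14
log(8, base = 2)               # 3

# Calls can be nested, and they still work on whole vectors.
round(sqrt(0:4), digits = 1)   # 0.0 1.0 1.4 1.7 2.0
```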


slide-25
SLIDE 25

Creating a vector from a list you specify

    x <- c(4, 0, 3)
    x
    [1] 4 0 3

  • The c function (combine) makes a vector out of its arguments, which are separated by commas.
  • Very important: the input representation and output representation of a vector are not the same.

  • Input: c(4, 0, 3)
  • Output: [1] 4 0 3
  • (Compare this to a Python list: [4, 0, 3].)
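
The arguments to c can themselves be vectors. A short sketch (the name longer is illustrative) showing that the result is always one flat vector:

```r
x <- c(4, 0, 3)

longer <- c(x, 5, 6:8)   # 4 0 3 5 6 7 8
length(longer)           # 7

c(x, x)                  # 4 0 3 4 0 3: flat, not a vector of vectors
```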


slide-26
SLIDE 26

Exercise (for you to do now): convert weights

weights <- c(1.1, 2.2, 3.3)

  • These weights are in pounds.
  • Convert them to kilograms.
  • (Hint: 2.2 lb = 1.0 kg)


slide-27
SLIDE 27

Answer: convert weights

    weights <- c(1.1, 2.2, 3.3)
    ## this divides the weights, element-wise, by the conversion
    ## factor:
    weights / 2.2
    [1] 0.5 1.0 1.5
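
The same answer with the conversion factor stored under a name, as a sketch. The names weights_lb, weights_kg, and lb_per_kg are illustrative; 2.2 is the rounded factor from the hint (a more precise value is about 2.20462).

```r
weights_lb <- c(1.1, 2.2, 3.3)
lb_per_kg <- 2.2                       # rounded factor from the hint

weights_kg <- weights_lb / lb_per_kg   # 0.5 1.0 1.5
weights_kg * lb_per_kg                 # back to pounds: 1.1 2.2 3.3
```

Naming the factor records the units in the code and makes it easy to swap in a more precise value later.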


slide-28
SLIDE 28

Some vector functions

    x <- c(9, 12, 6, 10, 10, 16, 8, 4)
    x
    [1]  9 12  6 10 10 16  8  4
    length(x)
    [1] 8
    sum(x)
    [1] 75
    sum(x) / length(x)
    [1] 9.375
    mean(x)
    [1] 9.375

  • The mean is the sum divided by the length.
  • There is a built-in mean function in R.


slide-29
SLIDE 29

Exercise: subtract the mean

x <- c(7, 3, 1, 9)

  • Subtract the mean of x from x, and then sum the result.


slide-30
SLIDE 30

Answer: subtract the mean

    x <- c(7, 3, 1, 9)
    mean(x)
    [1] 5
    x - mean(x)
    [1]  2 -2 -4  4
    sum(x - mean(x)) # answer in one expression
    [1] 0
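
The deviations from the mean always sum to zero mathematically, but with non-integer data the computed sum can differ from zero by a tiny rounding error. A sketch with made-up data; all.equal is a base R function that compares numbers up to a small tolerance.

```r
x <- c(0.1, 0.2, 0.7)
deviation_sum <- sum(x - mean(x))

deviation_sum == 0                    # may be FALSE because of rounding error
isTRUE(all.equal(deviation_sum, 0))   # TRUE: equal up to a small tolerance
```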


slide-31
SLIDE 31

Exercise: compute a confidence interval

    m <- 13
    se <- 0.25

  • Given the values of m (mean) and se (standard error), construct a vector containing the two values m ± 2 × se.
  • That is, add and subtract two times the standard error value to/from the mean value.

  • Your output should look like:

[1] 12.5 13.5


slide-32
SLIDE 32

Answer: compute a confidence interval

    ## one way:
    c(m - 2 * se, m + 2 * se)
    [1] 12.5 13.5
    ## another way:
    m + c(-2, 2) * se
    [1] 12.5 13.5
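
The second answer can be unpacked step by step. In this sketch the names offsets and interval are illustrative; the square-bracket indexing in the last line is base R but has not been introduced yet.

```r
m <- 13
se <- 0.25

offsets <- c(-2, 2) * se    # -0.5 0.5: se is multiplied into both elements
interval <- m + offsets     # 12.5 13.5: m is added to both elements

interval[2] - interval[1]   # width of the interval: 1
```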


slide-33
SLIDE 33

Thinking the vector way

    x <- c(3, 4, 1, 9, 2, 2)
    mean(x)
    [1] 3.5
    x - mean(x)
    [1] -0.5  0.5 -2.5  5.5 -1.5 -1.5
    (x - mean(x))^2
    [1]  0.25  0.25  6.25 30.25  2.25  2.25
    sum((x - mean(x))^2)
    [1] 41.5
    sqrt(sum((x - mean(x))^2))
    [1] 6.442049

  • When programming in R, try to think about how to operate on the entire vector, instead of proceeding element-wise through a vector using a loop.
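
For comparison, here is the same sum of squared deviations written with a loop and as one vector expression. The for loop is base R syntax not covered in these slides, and the names total and total_vectorized are illustrative. The last line is an added observation: dividing by n - 1 before taking the square root gives the sample standard deviation, which is what the built-in sd computes.

```r
x <- c(3, 4, 1, 9, 2, 2)

# Loop version: walk through the vector one element at a time.
total <- 0
for (value in x) {
  total <- total + (value - mean(x))^2
}

# Vector version: one expression over the whole vector.
total_vectorized <- sum((x - mean(x))^2)

all.equal(total, total_vectorized)   # TRUE, both are 41.5

# Sample standard deviation by hand versus the built-in sd().
all.equal(sqrt(total_vectorized / (length(x) - 1)), sd(x))   # TRUE
```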


slide-34
SLIDE 34

Need to install the tidyverse set of packages

  • Type this in the console pane now:

install.packages("tidyverse")

  • “tidyverse” is a coherent set of packages for operating on a kind of data called a “data frame.”
  • We need these packages for the rest of the course.
  • If something broke during the installation, ask for help. Post a screenshot on Piazza.
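
One way to check that the installation worked, as a sketch. requireNamespace, library, and message are base R functions; the variable name tidyverse_installed is illustrative.

```r
# requireNamespace() reports whether a package is installed
# without attaching it.
tidyverse_installed <- requireNamespace("tidyverse", quietly = TRUE)

if (tidyverse_installed) {
  library(tidyverse)   # attach the packages for this session
} else {
  message("tidyverse is not installed yet; run install.packages(\"tidyverse\")")
}
```

Installation is needed only once per computer, while library(tidyverse) has to be run in every new R session that uses the packages.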


slide-35
SLIDE 35

Reading

  • Read: 1 Introduction | R for Data Science
  • Read: 4 Workflow: basics | R for Data Science
