SLIDE 1

Chaining, R Markdown

Steve Bagley

somgen223.stanford.edu 1

SLIDE 2

Set up cw

library(tidyverse)  # provides read_csv() and str_c()

data_dir <- "https://somgen223.stanford.edu/data/"
cw <- read_csv(str_c(data_dir, "cw.csv"))

SLIDE 3

distinct: how many different diets?

distinct(cw, diet)
# A tibble: 4 x 1
   diet
  <dbl>
1     2
2     1
3     3
4     4

  • distinct returns a new data frame with duplicate rows removed, where duplicates are determined by the specified column or columns. Only the specified columns are kept in the result.
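
A minimal sketch of how distinct behaves, using a small made-up data frame (`toy` and its values are hypothetical, not the course's cw data). It assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical toy data: 3 chicks on 2 diets, with repeated measurements.
toy <- tibble(
  chick  = c(1, 1, 2, 2, 3),
  diet   = c(1, 1, 1, 1, 2),
  weight = c(42, 51, 40, 49, 43)
)

# One row per distinct diet; only the diet column is returned.
diets <- distinct(toy, diet)

# One row per distinct (chick, diet) combination.
pairs <- distinct(toy, chick, diet)

# .keep_all = TRUE keeps the other columns, using the first row of each group.
first_rows <- distinct(toy, chick, .keep_all = TRUE)

nrow(diets)
nrow(pairs)
first_rows
```

Counting the rows of the result answers "how many different values are there?" for the chosen columns.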

SLIDE 4

How many different chicks?

distinct(cw, chick)
# A tibble: 50 x 1
   chick
   <dbl>
 1     1
 2     2
 3     3
 4     4
 5     5
 6     6
 7     7
 8     8
 9     9
10    10
# ... with 40 more rows

SLIDE 5

How many different chicks?

length(pull(distinct(cw, chick), chick))
[1] 50

  • pull returns a data frame column as a vector.
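
A small sketch of why pull is needed before length, again with hypothetical toy data and assuming dplyr is installed. Calling length on a data frame counts its columns, not its rows, so the column has to be extracted as a vector first.

```r
library(dplyr)

# Hypothetical toy data: 3 distinct chicks across 4 rows.
toy <- tibble(chick = c(1, 1, 2, 3), weight = c(42, 51, 40, 43))

chick_df  <- distinct(toy, chick)   # still a data frame with one column
chick_vec <- pull(chick_df, chick)  # a plain numeric vector

is.data.frame(chick_df)  # TRUE
is.numeric(chick_vec)    # TRUE
length(chick_vec)        # 3: the number of distinct chicks
length(chick_df)         # 1: length() of a data frame is its column count
```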

SLIDE 6

Chaining: combining a sequence of data frame function calls

length(pull(distinct(cw, chick), chick))

x1 <- distinct(cw, chick)
x2 <- pull(x1, chick)
length(x2)

  • In the first expression, the functions are executed “inside out”: first distinct, then pull, then length. That can be hard to follow.
  • In the second series of expressions, we use temporary variables to store intermediate results.

SLIDE 7

Chaining using the pipe operator

cw %>% distinct(chick) %>% pull(chick) %>% length()

  • We can use a new operator, %>%, to “pipe” the result of the first function call into the second function call, then that result into the third function call, and so on.

  • In English:
  • 1. start with cw
  • 2. pass it to distinct
  • 3. pass that result to pull
  • 4. pass that result to length
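
The three styles (nested, temporary variables, piped) should give the same answer. The sketch below checks this using R's built-in ChickWeight dataset as a stand-in for cw; that substitution is an assumption, and note that the built-in column is named Chick (capitalized) rather than chick. It assumes dplyr is installed.

```r
library(dplyr)

# Stand-in for cw (assumption): the built-in ChickWeight data as a tibble.
cw_like <- as_tibble(ChickWeight)

# 1. Nested calls, evaluated inside out.
nested <- length(pull(distinct(cw_like, Chick), Chick))

# 2. Temporary variables for the intermediate results.
x1 <- distinct(cw_like, Chick)
x2 <- pull(x1, Chick)
stepwise <- length(x2)

# 3. Piped, read top to bottom.
piped <- cw_like %>%
  distinct(Chick) %>%
  pull(Chick) %>%
  length()

c(nested = nested, stepwise = stepwise, piped = piped)
```

All three values should agree; the piped form just makes the order of the steps match the order in which they are read.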

SLIDE 8

Keyboard help

  • The pipe operator %>% is a bit ugly, but RStudio will insert it for you.
  • Mac: Command-Shift-M
  • Windows/Linux: Ctrl-Shift-M

SLIDE 9

Pipe: technical details

df1 %>% fun(x)

is converted into:

fun(df1, x)

  • The object being piped in is used as the first argument of fun.
  • Most tidyverse data-manipulation functions are designed so that the first argument is a data frame and the result is a data frame. (pull is an exception: it returns a vector.)

  • If fun produces a data frame, we can pass it along to the next function:

df1 %>% fun(x) %>% fun2(y, z)
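
As a quick check of this rewriting rule, the sketch below compares piped and direct calls with identical(). The data frame df1 and its columns are made up for illustration; mutate and arrange stand in for fun and fun2. It assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical data frame for illustration.
df1 <- tibble(x = c(3, 1, 2), y = c("c", "a", "b"))

# df1 %>% fun(x) versus fun(df1, x)
piped  <- df1 %>% arrange(x)
direct <- arrange(df1, x)
identical(piped, direct)

# df1 %>% fun(x) %>% fun2(y, z) versus fun2(fun(df1, x), y, z)
chained <- df1 %>% mutate(x2 = x * 2) %>% arrange(y, x2)
nested  <- arrange(mutate(df1, x2 = x * 2), y, x2)
identical(chained, nested)
```

Both comparisons should report TRUE, since the pipe only changes how the call is written, not what is computed.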

SLIDE 10

Another chaining example

bod %>%
  mutate(inv_demand = 1 / demand) %>%
  arrange(inv_demand)
# A tibble: 6 x 3
   Time demand inv_demand
  <dbl>  <dbl>      <dbl>
1     7   19.8     0.0505
2     3   19       0.0526
3     4   16       0.0625
4     5   15.6     0.0641
5     2   10.3     0.0971
6     1    8.3     0.120
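
The slide does not show how bod was created. The sketch below assumes it matches R's built-in BOD dataset (columns Time and demand) converted to a tibble; `bod_like` is a hypothetical name. It also shows desc(), which reverses the default ascending order of arrange. It assumes dplyr is installed.

```r
library(dplyr)

# Stand-in for bod (assumption): the built-in BOD data as a tibble.
bod_like <- as_tibble(BOD)

# Same chain as on the slide: add a column, then sort by it (ascending).
result <- bod_like %>%
  mutate(inv_demand = 1 / demand) %>%
  arrange(inv_demand)

# Sorting by 1 / demand ascending is the same as sorting by demand descending.
by_demand <- bod_like %>% arrange(desc(demand))

result
identical(result$Time, by_demand$Time)
```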

SLIDE 11

Reproducible analysis and RMarkdown

SLIDE 12

Reproducible analysis and RMarkdown

The goal of reproducible analysis is to produce a computational artifact that others can view, scrutinize, test, and run, to convince themselves that your ideas are valid. (It’s also good for you to be just as skeptical of your own work.) This means you should write code that can be run more than once, and by others. Doing so requires being organized in several ways:

  • Combining text with code (the focus of this module)
  • Project/directory organization
  • Version control

SLIDE 13

The problem

  • You write text (in a word processor).
  • You write code (in RStudio or similar) to compute with data and produce output and graphics.
  • These are performed using different software.
  • So when integrating both kinds of information into a notebook, report, or publication, it is very easy to make mistakes, copy/paste the wrong version, and have information out of sync.

SLIDE 14

A solution

  • Write text and code in the same file.
  • Use special syntax to separate text from code.
  • Use special syntax for annotating the text with formatting operations (if desired).
  • RStudio can then:
  • 1. run the code blocks,
  • 2. insert the output and graphs at the correct spot in the text,
  • 3. then call a text processor for final formatting.
  • This whole process is called “knitting”.

SLIDE 15

The special syntax for code blocks

```{r}
## your code goes here, e.g.:
1 + 2
```

  • Special syntax groups successive lines of code into chunks.
  • This is a bit ugly, but RStudio will insert it for you.
  • Mac: Command-Option-I
  • Windows/Linux: Ctrl-Alt-I

SLIDE 16

Evaluating RMarkdown

  • Use the command Run Current Chunk (the little green right arrow at the top of the chunk) to evaluate.
  • Mac: Command-Shift-Return
  • Windows/Linux: Ctrl-Shift-Enter
  • There are more commands under the Run menu.
  • Use the command Knit to convert the entire document into
  • HTML
  • PDF (only if you have LaTeX installed)
  • Word docx
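
Knitting can also be triggered from the R console. The sketch below is one way to try it, assuming the rmarkdown package and pandoc are installed (RStudio bundles pandoc). The document contents and the temporary file are made up for illustration; render() is the rmarkdown function that knits a file.

```r
library(rmarkdown)

# Three backticks, built here so the chunk fence can sit inside a string.
fence <- strrep("`", 3)

# A tiny, hypothetical R Markdown document: header, text, and one code chunk.
rmd_lines <- c(
  "---",
  "title: \"Knitting demo\"",
  "output: html_document",
  "---",
  "",
  "Some *italic* and **bold** text, followed by a code chunk.",
  "",
  paste0(fence, "{r}"),
  "1 + 2",
  fence
)

rmd_file <- tempfile(fileext = ".Rmd")
writeLines(rmd_lines, rmd_file)

# render() runs the chunks, inserts their output, and converts to HTML.
out_file <- render(rmd_file, output_format = "html_document", quiet = TRUE)
file.exists(out_file)
```

Passing "word_document" or "pdf_document" as output_format selects the other output types; the PDF route additionally needs a LaTeX installation.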

SLIDE 17

R Markdown: The special syntax for formatting text

  • RStudio supports a simple and easy-to-use format called “R Markdown”.
  • This is a very simple markup language:
  • wrap text in * or _ for italics.
  • wrap text in ** or __ for bold.
  • Markdown Quick Reference (RStudio internal help)
  • Introduction to R Markdown
  • R Markdown web page
  • R Markdown Cheat Sheet

SLIDE 18

Homework

  • You need to submit homework as a single pdf file.
  • Please use formatting to clearly identify the part of each question.
  • Make sure you put all the necessary library calls at the top of your file. You need to do this even if you have loaded the package into your current R session.

  • If you have TeX installed on your system, RStudio can export directly to pdf.
  • If you do not have TeX installed on your system, then:
  • 1. Export to html format.
  • 2. Select “Open in Browser”.
  • 3. In your browser, File > Print, and select pdf as the destination.
  • See the course website for instructions on how to upload to Gradescope.

SLIDE 19

Reading

  • Read: 18 Pipes | R for Data Science (skim whole chapter)
  • Read: 27 R Markdown | R for Data Science (skim whole chapter)
