Welcome to data visualization best practices in R Nick Strayer - - PowerPoint PPT Presentation

SLIDE 1

DataCamp Visualization Best Practices in R

Welcome to data visualization best practices in R

VISUALIZATION BEST PRACTICES IN R

Nick Strayer

Instructor

SLIDE 2

What is this course?

What you will learn:
How to make better visualizations by thinking deeply about the data at hand.

How you will learn it:
Overviews of different data types
Standard visualizations
Alternatives

SLIDE 3

Course layout

Ch1: Proportions of a whole
Ch2: Point data
Ch3: Single distributions
Ch4: Multiple (or conditional) distributions

SLIDE 4

Warning!

Topics here are not as cut-and-dried as other programming topics
Every rule will have exceptions
The course emphasizes thinking through each problem, to help you deal with these exceptions when you get to them

SLIDE 5

Tools used

R
The 'Tidyverse'
ggplot2

SLIDE 6

Data used

Comes from the World Health Organization (WHO)

> who_disease
# A tibble: 43,262 x 6
   region countryCode country             disease  year  cases
   <chr>  <chr>       <chr>               <chr>   <int>  <dbl>
 1 EMR    AFG         Afghanistan         measles  2016 638
 2 EUR    ALB         Albania             measles  2016  17.0
 3 AFR    DZA         Algeria             measles  2016  41.0
 4 EUR    AND         Andorra             measles  2016   0
 5 AFR    AGO         Angola              measles  2016  53.0
 6 AMR    ATG         Antigua and Barbuda measles  2016   0
 7 AMR    ARG         Argentina           measles  2016   0
 8 EUR    ARM         Armenia             measles  2016   2.00
 9 WPR    AUS         Australia           measles  2016  99.0
10 EUR    AUT         Austria             measles  2016  27.0
# ... with 43,252 more rows

SLIDE 7

WHO disease data

# filter to AMR region.
amr_region <- who_disease %>%
  filter(region == 'AMR')

# map x to year and y to cases, color by disease.
ggplot(amr_region, aes(x = year, y = cases, color = disease)) +
  geom_point(alpha = 0.5)

SLIDE 9

Let's practice!

SLIDE 10

Proportions of a single population

SLIDE 11

What is a proportion?

Parts making up a whole
Often used to understand a population

SLIDE 12

The pie chart

Often the first technique people learn
Also, the first technique people learn to dislike
Dislike is not entirely warranted

SLIDE 13

A sour pie

Pie charts are not very precise: the data is encoded in angles
They don't handle many classes well
After three slices, it becomes hard to compare them
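To make the "encoded in angles" point concrete, here is a minimal base R sketch. The counts are made up for illustration (they are not WHO data), and the names `counts`, `slice_angles` and `angle_gap` are hypothetical.

```r
# Hypothetical class counts (made up for illustration, not WHO data).
counts <- c(A = 30, B = 27, C = 25, D = 18)

# A pie chart maps each class's share of the whole to a slice angle.
proportions <- counts / sum(counts)
slice_angles <- proportions * 360

# Slices A and B differ by roughly 11 degrees out of 360,
# a difference the reader has to judge by eye.
print(round(slice_angles, 1))
angle_gap <- unname(slice_angles["A"] - slice_angles["B"])
print(angle_gap)
```

With four classes of similar size, the reader is comparing angles that differ by only a few percent of a full turn, which is where the precision problem comes from.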

SLIDE 14

A sweet pie

Intuitive and compact

who_disease %>%
  mutate(
    region = ifelse(
      region %in% c('EUR', 'AFR'),
      region, 'Other')
  ) %>%
  ggplot(aes(x = 1, fill = region)) +
    geom_bar(color = 'white') +
    coord_polar(theta = "y") +
    theme_void()
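The pie above stays readable because the regions are first collapsed into three classes. The following base R sketch isolates that step; the `region` vector is a small made-up sample (the region codes are those used in who_disease, but the rows are invented), and `collapsed` and `slice_counts` are hypothetical names.

```r
# Hypothetical region labels (rows invented for illustration).
region <- c("EUR", "AFR", "AMR", "EUR", "WPR", "SEAR", "AFR", "EUR", "EMR")

# Keep the two regions of interest and lump everything else into 'Other',
# using the same ifelse() pattern as the pipeline above.
collapsed <- ifelse(region %in% c("EUR", "AFR"), region, "Other")

# Count rows per slice; geom_bar() performs this kind of count
# when no y aesthetic is supplied.
slice_counts <- table(collapsed)
print(slice_counts)
```

Collapsing before plotting keeps the chart at three slices, in line with the earlier advice that comparisons get hard beyond that.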

SLIDE 15

The waffle chart

More precise than pie charts
Encode data in area, not angles

obs_by_region <- who_disease %>%
  group_by(region) %>%
  summarise(num_obs = n()) %>%
  mutate(percent = round(num_obs/sum(num_obs)*100))

# Array of rounded percentages
percent_by_region <- obs_by_region$percent
names(percent_by_region) <- obs_by_region$region

# Send array of percentages to waffle plot function
waffle::waffle(percent_by_region, rows = 5)
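One thing worth checking with this approach is that rounded percentages do not always sum to exactly 100. The sketch below shows this in base R with made-up counts (not WHO data); `num_obs`, `percent` and `total_squares` are hypothetical names.

```r
# Hypothetical observation counts per class (made up for illustration).
num_obs <- c(A = 1, B = 1, C = 1)

# Same rounding step as in the slide code.
percent <- round(num_obs / sum(num_obs) * 100)

# Each class rounds to 33, so the waffle would hold 99 squares, not 100.
total_squares <- sum(percent)
print(percent)
print(total_squares)
```

A quick `sum()` on the rounded percentages before plotting will show whether the grid is going to come out one square short or one over.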

SLIDE 17

Let's practice!

SLIDE 18

Comparing multiple populations

SLIDE 19

Why not use faceting?

Almost impossible to compare

SLIDE 20

The stacked bar chart

Allows each population to share the same y-axis
Enables easier comparisons based on vertical position/size

who_disease %>%
  filter(region == 'SEAR') %>%
  ggplot(aes(x = countryCode, y = cases, fill = disease)) +
    geom_col(position = 'fill')
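In ggplot2, position = 'fill' stacks the bars and rescales each stack to a constant height, so the y-axis shows proportions rather than raw counts. The base R sketch below reproduces that normalization by hand as an illustration; the data frame is made up (the country codes AAA and BBB are placeholders, not real codes), and `cases_df`, `country_totals` and `share` are hypothetical names.

```r
# Hypothetical case counts (made up for illustration, not WHO data).
cases_df <- data.frame(
  countryCode = c("AAA", "AAA", "BBB", "BBB"),
  disease     = c("measles", "mumps", "measles", "mumps"),
  cases       = c(30, 10, 5, 15)
)

# Total cases per country, repeated on every row of that country.
country_totals <- ave(cases_df$cases, cases_df$countryCode, FUN = sum)

# Divide each row by its country's total so every bar sums to 1.
cases_df$share <- cases_df$cases / country_totals

print(cases_df)
```

Because every bar has the same total height, countries with very different case counts can still be compared on disease mix alone.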

SLIDE 21

Caveats

Worse in isolation than pie or waffle charts
Accuracy degrades rapidly after 3 classes

SLIDE 22

Chapter recap

Proportions:
Pie Charts:
Waffle Charts:
Stacked Bars:

SLIDE 23

Let's practice!
