ETC1010: Data Modelling and Computing, Week of Data Visualisation



slide-1
SLIDE 1

ETC1010: Data Modelling and Computing

Week of Data Visualisation: Lecture 3

  • Dr. Nicholas Tierney & Professor Di Cook

EBS, Monash U. 2019-08-14

slide-2
SLIDE 2

Learning Tips

2 / 46

slide-3
SLIDE 3

Understanding learning

  • Growth and fixed mindsets
  • Reframe success + failure as opportunities for growth
  • Growing area of research by Carol Dweck of Stanford

3 / 46

slide-4
SLIDE 4

From

  • "I'll never understand"
  • "I just don't get programming"
  • "I'm not a maths person"

To

  • "I understand more than I did yesterday"
  • "I can learn how to program"
  • "Compared to this last week, I've learnt quite a bit!"

Reframing

4 / 46

slide-5
SLIDE 5

Overview for today

  • Going from tidy data to a data plot, using a grammar
  • Mapping of variables from the data to graphical elements
  • Using different geoms

5 / 46

slide-6
SLIDE 6

Example: Tuberculosis data

The case notifications table, from WHO. Data is tidied here, with only counts for Australia.

tb_au
## # A tibble: 192 x 6
##    country   iso3   year count gender age
##    <chr>     <chr> <dbl> <dbl> <chr>  <chr>
##  1 Australia AUS    1997     8 m      15-24
##  2 Australia AUS    1998    11 m      15-24
##  3 Australia AUS    1999    13 m      15-24
##  4 Australia AUS    2000    16 m      15-24
##  5 Australia AUS    2001    23 m      15-24
##  6 Australia AUS    2002    15 m      15-24
##  7 Australia AUS    2003    14 m      15-24
##  8 Australia AUS    2004    18 m      15-24
##  9 Australia AUS    2005    32 m      15-24
## 10 Australia AUS    2006    33 m      15-24
## # … with 182 more rows
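To experiment without the full WHO table, the ten printed rows can be typed in by hand. This is only a sketch: the name tb_au_head is hypothetical, and the real tb_au has 192 rows covering both genders and every age group, so plots made from this stand-in will not match the slides.

```r
library(tibble)

# Hypothetical stand-in for tb_au: only the ten rows printed on the slide.
# Length-one values (country, iso3, gender, age) are recycled to ten rows.
tb_au_head <- tibble(
  country = "Australia",
  iso3    = "AUS",
  year    = as.numeric(1997:2006),
  count   = c(8, 11, 13, 16, 23, 15, 14, 18, 32, 33),
  gender  = "m",
  age     = "15-24"
)

tb_au_head
```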

6 / 46

slide-7
SLIDE 7

The "100% charts"

ggplot(tb_au, aes(x = year, y = count, fill = gender)) +
  geom_bar(stat = "identity", position = "fill") +
  facet_grid(~ age) +
  scale_fill_brewer(palette = "Dark2")

7 / 46

slide-8
SLIDE 8

Let's unpack a bit.

8 / 46

slide-9
SLIDE 9

Data Visualisation

"The simple graph has brought more information to the data analyst’s mind than any other device." — John Tukey

9 / 46

slide-10
SLIDE 10

Data Visualisation

  • The creation and study of the visual representation of data.
  • Many tools for visualizing data (R is one of them)
  • Many approaches/systems within R for making data visualizations (ggplot2 is one of them, and that's what we're going to use).

10 / 46

slide-11
SLIDE 11

ggplot2 tidyverse

  • ggplot2 is tidyverse's data visualization package
  • The gg in "ggplot2" stands for Grammar of Graphics
  • It is inspired by the book Grammar of Graphics by Leland Wilkinson †
  • A grammar of graphics is a tool that enables us to concisely describe the components of a graphic

† Source: BloggoType

11 / 46

slide-12
SLIDE 12

From BloggoType

12 / 46

slide-13
SLIDE 13

Our first ggplot!

library(ggplot2)
ggplot(tb_au)

13 / 46

slide-14
SLIDE 14

Our first ggplot!

library(ggplot2)
ggplot(tb_au, aes(x = year, y = count))

14 / 46

slide-15
SLIDE 15

Our first ggplot!

library(ggplot2)
ggplot(tb_au, aes(x = year, y = count)) +
  geom_point()

15 / 46

slide-16
SLIDE 16

Our first ggplot! (what's the data again?)

country    iso3  year  count  gender  age
Australia  AUS   1997      8  m       15-24
Australia  AUS   1998     11  m       15-24
Australia  AUS   1999     13  m       15-24
Australia  AUS   2000     16  m       15-24
Australia  AUS   2001     23  m       15-24
Australia  AUS   2002     15  m       15-24
Australia  AUS   2003     14  m       15-24
Australia  AUS   2004     18  m       15-24
Australia  AUS   2005     32  m       15-24
Australia  AUS   2006     33  m       15-24

16 / 46

slide-17
SLIDE 17

Our first ggplot!

library(ggplot2)
ggplot(tb_au, aes(x = year, y = count)) +
  geom_col()

17 / 46

slide-18
SLIDE 18

Our first ggplot!

library(ggplot2)
ggplot(tb_au, aes(x = year, y = count, fill = gender)) +
  geom_col()

18 / 46

slide-19
SLIDE 19

Our first ggplot!

library(ggplot2)
ggplot(tb_au, aes(x = year, y = count, fill = gender)) +
  geom_col(position = "fill")

19 / 46

slide-20
SLIDE 20

Our first ggplot!

library(ggplot2)
ggplot(tb_au, aes(x = year, y = count, fill = gender)) +
  geom_col(position = "fill") +
  scale_fill_brewer(palette = "Dark2")

20 / 46

slide-21
SLIDE 21

Our first ggplot!

library(ggplot2)
ggplot(tb_au, aes(x = year, y = count, fill = gender)) +
  geom_col(position = "fill") +
  scale_fill_brewer(palette = "Dark2") +
  facet_wrap(~ age)

21 / 46

slide-22
SLIDE 22

The "100% charts"

ggplot(tb_au, aes(x = year, y = count, fill = gender)) +
  geom_bar(stat = "identity", position = "fill") +
  facet_grid(~ age) +
  scale_fill_brewer(palette = "Dark2")

What do we learn?

22 / 46

slide-23
SLIDE 23

What do we learn?

  • Focus is on the proportion in each category.
  • Across (almost) all ages and years, the proportion of TB cases among males is higher than among females.
  • These proportions tend to be higher in the older age groups, for all years.
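The proportions that a 100% chart displays can also be computed directly, which is a useful check on what the picture suggests. The sketch below uses dplyr and a small made-up data frame called toy (hypothetical values, not the WHO counts). With the real tb_au you would presumably group by both year and age, since each facet shows one age group.

```r
library(dplyr)

# Hypothetical toy counts, standing in for tb_au.
toy <- data.frame(
  year   = c(2000, 2000, 2001, 2001),
  gender = c("f", "m", "f", "m"),
  count  = c(10, 30, 20, 20)
)

# Within each year, divide each count by the year's total.
# This is the quantity that position = "fill" draws as bar height.
props <- toy %>%
  group_by(year) %>%
  mutate(prop = count / sum(count)) %>%
  ungroup()

props
```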

23 / 46

slide-24
SLIDE 24

Code structure of ggplot

  • ggplot() is the main function
  • Plots are constructed in layers
  • Structure of code for plots can often be summarised as

ggplot(data = [dataset], mapping = aes(x = [x-variable], y = [y-variable])) +
  geom_xxx() +
  other options
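As one concrete filling-in of the template, here is a sketch using mpg, an example dataset that ships with ggplot2 (it is not part of this lecture's data). The axis labels stand in for "other options"; scales, facets and themes would go in the same place.

```r
library(ggplot2)

# mpg is bundled with ggplot2; it is used here only as a convenient example.
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +                        # the geom_xxx() layer
  labs(x = "Engine displacement (L)",   # "other options": labels, scales,
       y = "Highway miles per gallon")  # facets, themes, ...

p
```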

24 / 46

slide-25
SLIDE 25

How to use ggplot

To use ggplot2 functions, first load tidyverse

library(tidyverse)

For help with ggplot2, see ggplot2.tidyverse.org

25 / 46

slide-26
SLIDE 26

Let's look at some more options to emphasise different features

26 / 46

slide-27
SLIDE 27

ggplot(tb_au, aes(x = year, y = count, fill = gender)) +
  geom_col(position = "fill") +
  scale_fill_brewer(palette = "Dark2") +
  facet_wrap(~ age)

27 / 46

slide-28
SLIDE 28

Emphasizing different features with ggplot2

ggplot(tb_au, aes(x = year, y = count, fill = gender)) +
  geom_col(position = "fill") +
  scale_fill_brewer(palette = "Dark2") +
  facet_grid(~ age)

28 / 46

slide-29
SLIDE 29

Emphasise ... ?

ggplot(tb_au, aes(x = year, y = count, fill = gender)) +
  geom_col() +
  scale_fill_brewer(palette = "Dark2") +
  facet_grid(~ age)

29 / 46

slide-30
SLIDE 30

What do we learn?

  • position = "fill" was removed
  • Focus is on counts in each category.
  • Counts differ across ages and years, and tend to be lower in middle age (45-64).
  • 1999 saw a bit of an outbreak in most age groups, with numbers double or triple those of other years.
  • Incidence has been increasing among younger age groups in recent years.

30 / 46

slide-31
SLIDE 31

Emphasise ... ?

ggplot(tb_au, aes(x = year, y = count, fill = gender)) +
  geom_col(position = "dodge") +
  scale_fill_brewer(palette = "Dark2") +
  facet_grid(~ age)

31 / 46

slide-32
SLIDE 32

What do we learn?

  • position = "dodge" is used in geom_col
  • Focus is on counts by gender, predominantly male incidence.
  • Incidence among males relative to females is higher from middle age on.
  • There is similar incidence between males and females in younger age groups.
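The three position settings seen so far can be compared on a tiny example. This sketch uses a made-up data frame, toy (hypothetical values), and layer_data() from ggplot2 to look at the bar heights ggplot2 computes: the default stacking should reach the yearly total, while "fill" should rescale every stack to a height of 1.

```r
library(ggplot2)

# Hypothetical toy counts, not taken from the WHO table.
toy <- data.frame(
  year   = c(2000, 2000, 2001, 2001),
  gender = c("f", "m", "f", "m"),
  count  = c(10, 30, 20, 20)
)

base <- ggplot(toy, aes(x = year, y = count, fill = gender))

stacked <- base + geom_col()                    # default: bars stacked, height = total count
filled  <- base + geom_col(position = "fill")   # stacks rescaled to proportions
dodged  <- base + geom_col(position = "dodge")  # bars placed side by side

# layer_data() returns the values ggplot2 computed for drawing each bar.
max(layer_data(stacked)$ymax)
max(layer_data(filled)$ymax)
max(layer_data(dodged)$ymax)
```

Only the position argument changes between the three plots; the data and the aesthetic mapping stay the same.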

32 / 46

slide-33
SLIDE 33

Separate bar charts

ggplot(tb_au, aes(x = year, y = count, fill = gender)) +
  geom_col() +
  scale_fill_brewer(palette = "Dark2") +
  facet_grid(gender ~ age)

33 / 46

slide-34
SLIDE 34

What do we learn?

  • facet_grid(gender ~ age) + faceted by gender as well as age
  • note facet_grid vs facet_wrap
  • Easier to focus separately on males and females.
  • 1999 outbreak mostly affected males.
  • Growing incidence in the 25-34 age group is still affecting females but seems to have stabilised for males.
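The difference between facet_grid and facet_wrap is easiest to see side by side. A sketch using ggplot2's bundled mpg dataset (an assumption for illustration, not this lecture's data):

```r
library(ggplot2)

base <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# facet_wrap(): panels for one variable, wrapped into as many
# rows and columns as needed.
wrapped <- base + facet_wrap(~ class)

# facet_grid(): rows ~ columns, one panel for every combination
# of the two variables, laid out as a matrix.
gridded <- base + facet_grid(drv ~ cyl)

wrapped
gridded
```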

34 / 46

slide-35
SLIDE 35

Pie charts? Rose Charts

ggplot(tb_au, aes(x = year, y = count, fill = gender)) +
  geom_col() +
  scale_fill_brewer(palette = "Dark2") +
  facet_grid(gender ~ age) +
  coord_polar() +
  theme(axis.text = element_blank())

35 / 46

slide-36
SLIDE 36

What do we learn?

  • Bar charts in polar coordinates produce rose charts.
  • coord_polar() + plot is made in polar coordinates, rather than the default Cartesian coordinates
  • Emphasizes the middle years as low incidence.
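A minimal sketch of the same idea on made-up data (toy is hypothetical): the bar chart and the rose chart share data, mapping and geom, and differ only in the coordinate system.

```r
library(ggplot2)

# Hypothetical toy counts.
toy <- data.frame(year = 2000:2005, count = c(5, 12, 9, 3, 7, 10))

bars <- ggplot(toy, aes(x = year, y = count)) +
  geom_col()

# coord_polar() uses theta = "x" by default: year runs around the
# circle as the angle and count becomes the radius.
rose <- bars + coord_polar()

bars
rose
```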

36 / 46

slide-37
SLIDE 37

Rainbow charts?

ggplot(tb_au, aes(x = 1, y = count, fill = factor(year))) +
  geom_col(position = "fill") +
  facet_grid(gender ~ age)

37 / 46

slide-38
SLIDE 38

What do we see in the code?

  • A single stacked bar, in each facet.
  • Year is mapped to colour.
  • Notice how the mappings are different. A single number is mapped to x, which makes a single stacked bar chart.
  • year is now mapped to colour (that's what gives us the rainbow charts!)
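The two mapping changes can be tried on a small made-up data frame (toy, hypothetical values). Wrapping year in factor() makes ggplot2 treat it as a discrete variable, so each year gets its own fill colour; mapping the constant 1 to x puts every row in the same bar.

```r
library(ggplot2)

# Hypothetical toy counts, one row per year.
toy <- data.frame(year = c(2000, 2001, 2002), count = c(10, 20, 30))

single_bar <- ggplot(toy, aes(x = 1, y = count, fill = factor(year))) +
  geom_col(position = "fill")

# One row per segment: all share x = 1, each has a distinct fill colour.
layer_data(single_bar)[, c("x", "ymin", "ymax", "fill")]
```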

38 / 46

slide-39
SLIDE 39

What do we learn?

Pretty chart but not easy to interpret.

39 / 46

slide-40
SLIDE 40

(Actual) Pie charts

ggplot(tb_au, aes(x = 1, y = count, fill = factor(year))) +
  geom_col(position = "fill") +
  facet_grid(gender ~ age) +
  coord_polar(theta = "y") +
  theme(axis.text = element_blank())

40 / 46

slide-41
SLIDE 41

What is different in the code?

coord_polar(theta = "y") maps the y variable to the angle in polar coordinates, which gives a pie chart.
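As a sketch on made-up data (toy is hypothetical), a pie chart is a single filled bar drawn with the y variable as the angle:

```r
library(ggplot2)

# Hypothetical toy counts, one row per year.
toy <- data.frame(year = c(2000, 2001, 2002), count = c(10, 20, 30))

# A single stacked bar, rescaled to proportions ...
single_bar <- ggplot(toy, aes(x = 1, y = count, fill = factor(year))) +
  geom_col(position = "fill")

# ... becomes a pie chart when y is used for the angle.
pie <- single_bar +
  coord_polar(theta = "y") +
  theme(axis.text = element_blank())

pie
```

Leaving theta at its default of "x" would instead use x = 1 for the angle, which does not give wedges.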

41 / 46

slide-42
SLIDE 42

What do we learn?

Pretty chart but not easy to interpret, or make comparisons across age groups.

42 / 46

slide-43
SLIDE 43

Using named plots, e.g. pie chart, bar chart, scatterplot, is like seeing animals in the zoo. The grammar of graphics allows you to define the mapping between variables in the data and elements of the plot. It allows us to see and understand how plots are similar or different. And you can see how variations in the definition create variations in the plot.
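One way to see this in code is to keep the data and mapping fixed and vary a single part of the definition. The sketch below uses a made-up data frame, toy (hypothetical values); the three "named" plots differ only in the geom.

```r
library(ggplot2)

# Hypothetical toy counts.
toy <- data.frame(year = 2000:2005, count = c(5, 12, 9, 3, 7, 10))

# Same data, same mapping of variables to x and y ...
base <- ggplot(toy, aes(x = year, y = count))

# ... and only the geom changes.
scatter <- base + geom_point()  # scatterplot
lines   <- base + geom_line()   # line chart
bars    <- base + geom_col()    # bar chart

scatter
lines
bars
```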

Why?

The various looks of David Bowie

43 / 46

slide-44
SLIDE 44

Your Turn:

  • Do the lab exercises
  • Take the lab quiz
  • Use the rest of the lab time to coordinate with your group on the first assignment.

44 / 46

slide-45
SLIDE 45

References

  • Chapter 3 of R for Data Science
  • Data made available from WHO
  • Garrick Aden-Buie's gentle introduction to ggplot2
  • Mine Çetinkaya-Rundel's introduction to ggplot using Star Wars

45 / 46

slide-46
SLIDE 46

Share and share alike

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

46 / 46