Data visualization and graphics, Dr. Noémie Becker, Dr. Sonja Grath



slide-1
SLIDE 1

An introduction to R, WS 2017/2018

  • Dr. Noémie Becker
  • Dr. Sonja Grath

Special thanks to:

  • Dr. Benedikt Holtmann for sharing slides for this lecture

Data visualization and graphics

2

What you should know after day 6

  • Review: Rearranging and manipulating data
  • Solutions Exercise Sheet 5
  • Graphics with base R
      • Histograms
      • Scatterplots
      • Boxplots
  • Saving plots
  • Graphics with ggplot2

slide-2
SLIDE 2

3

Properties of variables Review

R has four different functions that tell you the type of a variable: class(), typeof(), mode() and storage.mode()

Comparison of variable class, type, mode and storage.mode

                     class     typeof     mode       storage.mode
  Logical            logical   logical    logical    logical
  Integer            integer   integer    numeric    integer
  Floating point     numeric   double     numeric    double
  Complex            complex   complex    complex    complex
  Numeric matrix     matrix    double     numeric    double
  Character matrix   matrix    character  character  character
  Categorical        factor    integer    numeric    integer

Remember: is.numeric(), is.integer(), is.logical(), is.matrix(), …
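The comparison above can be reproduced at the console. The following is a minimal sketch in base R; the example values and the names values and type_table are chosen here for illustration and are not part of the lecture material.

```r
# One example value per kind of variable
values <- list(
  logical   = TRUE,
  integer   = 1L,
  double    = 1.5,
  complex   = 1 + 2i,
  character = "a",
  factor    = factor(c("low", "high"))
)

# Ask each of the four functions about every value.
# class() can return more than one string, so only the first is kept.
type_table <- data.frame(
  class        = sapply(values, function(v) class(v)[1]),
  typeof       = sapply(values, typeof),
  mode         = sapply(values, mode),
  storage.mode = sapply(values, storage.mode),
  row.names    = names(values),
  stringsAsFactors = FALSE
)
print(type_table)

# The is.*() helpers return a single TRUE or FALSE
is.numeric(1L)   # TRUE: integers count as numeric
is.integer(1)    # FALSE: 1 without the L suffix is stored as a double
```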

4

Reshaping data Review

Package tidyr: gather() and spread()
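As a reminder of what the two functions do, here is a small sketch, assuming the tidyr package is installed. The data frame wide and its column names are hypothetical. gather() turns a wide table into a long one, and spread() reverses that step.

```r
library(tidyr)

# Hypothetical wide table: one row per site, one column per month
wide <- data.frame(
  Site = c("A", "B"),
  Jan  = c(4.1, 3.8),
  Feb  = c(5.0, 4.6),
  stringsAsFactors = FALSE
)

# gather(): wide -> long, the month columns become key/value pairs
long <- gather(wide, key = "Month", value = "Temp", Jan, Feb)
print(long)

# spread(): long -> wide, one column per value of Month
wide_again <- spread(long, key = Month, value = Temp)
print(wide_again)
```

Newer tidyr versions also offer pivot_longer() and pivot_wider() for the same tasks; gather() and spread() remain available.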

slide-3
SLIDE 3

5

Combining datasets Review

Three example data sets and their columns:

  Fish survey:            Site, Month, Transect, Species
  Water characteristics:  Site, Month, Water temp., O2 content
  GPS:                    Site, Transect, Latitude, Longitude

Functions to combine data sets in dplyr

  left_join(a, b, by = "x1")    Joins matching rows from b to a
  right_join(a, b, by = "x1")   Joins matching rows from a to b
  inner_join(a, b, by = "x1")   Returns all rows from a where there are matching values in b
  full_join(a, b, by = "x1")    Joins data and returns all rows and columns
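The difference between the four joins is easiest to see on two small tables that only partly overlap. This is a sketch assuming dplyr is installed; the data frames a and b and their contents are invented for illustration.

```r
library(dplyr)

# Hypothetical data: Site3 exists only in a, Site4 only in b
a <- data.frame(x1 = c("Site1", "Site2", "Site3"),
                species = c("perch", "pike", "roach"),
                stringsAsFactors = FALSE)
b <- data.frame(x1 = c("Site1", "Site2", "Site4"),
                water_temp = c(12.5, 14.0, 11.2),
                stringsAsFactors = FALSE)

left  <- left_join(a, b, by = "x1")   # all rows of a; Site3 gets NA for water_temp
right <- right_join(a, b, by = "x1")  # all rows of b; Site4 gets NA for species
inner <- inner_join(a, b, by = "x1")  # only Site1 and Site2
full  <- full_join(a, b, by = "x1")   # Site1 to Site4, NA where one side is missing

print(full)
```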

6

Adding new variables

Three ways to add a new variable (log of FID)

  • 1. Using $

Bird_Behaviour$log_FID <- log(Bird_Behaviour$FID)

  • 2. Using [ ] - operator

Bird_Behaviour[ , "log_FID"] <- log(Bird_Behaviour$FID)

  • 3. Using mutate() from the dplyr package

Bird_Behaviour <- mutate(Bird_Behaviour, log_FID = log(FID))

Review

slide-4
SLIDE 4

7

Adding new variables

Split one column into two using separate() from the tidyr package
Combine two columns using unite() from the tidyr package

  X1  X2                   X1  X2.1  X2.2                 X1  X2
  A   1_1                  A   1     1                    A   1_1
  B   1_2    separate()    B   1     2        unite()     B   1_2
  A   2_1   ----------->   A   2     1     ----------->   A   2_1
  B   2_2                  B   2     2                    B   2_2
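The round trip shown in the diagram can be sketched as follows, assuming tidyr is installed. The data frame df rebuilds the small example table; the names split_df and united_df are chosen here.

```r
library(tidyr)

df <- data.frame(X1 = c("A", "B", "A", "B"),
                 X2 = c("1_1", "1_2", "2_1", "2_2"),
                 stringsAsFactors = FALSE)

# separate(): split X2 at the underscore into two new columns
split_df <- separate(df, col = X2, into = c("X2.1", "X2.2"), sep = "_")
print(split_df)

# unite(): paste the two columns back together with an underscore
united_df <- unite(split_df, col = "X2", X2.1, X2.2, sep = "_")
print(united_df)
```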

Review

8

Subsetting data

Subsetting data

  • Using [ ] – operator
  • Using subset()

# selects all rows with FID smaller than 10 m
subset(Bird_Behaviour, FID < 10)

# selects all rows for males with FID smaller than 10 m
subset(Bird_Behaviour, FID < 10 & Sex == "male")

# selects all rows that have a value of FID greater than 10 or less than 15.
# We keep only the Ind, Sex and Year columns
subset(Bird_Behaviour, FID > 10 | FID < 15, select = c(Ind, Sex, Year))

  Operator   Description
  >          greater than
  >=         greater than or equal to
  <          less than
  <=         less than or equal to
  ==         equal to
  !=         not equal to
  x & y      x and y
  x | y      x or y
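The [ ] operator gives the same result as subset(), and the choice between & and | changes which rows survive. The sketch below uses a small made-up stand-in for Bird_Behaviour; its values are assumptions for illustration only.

```r
# Hypothetical stand-in for the Bird_Behaviour data set
Bird_Behaviour <- data.frame(
  Ind  = 1:6,
  Sex  = c("male", "female", "male", "female", "male", "female"),
  Year = c(2015, 2015, 2016, 2016, 2017, 2017),
  FID  = c(5, 8, 12, 14, 20, 9),
  stringsAsFactors = FALSE
)

# [ ] operator: logical condition in the row position, column position left empty
short_fid <- Bird_Behaviour[Bird_Behaviour$FID < 10, ]

# The same rows with subset()
short_fid2 <- subset(Bird_Behaviour, FID < 10)

# & keeps rows where both conditions hold (FID between 10 and 15)
between <- subset(Bird_Behaviour, FID > 10 & FID < 15, select = c(Ind, Sex, Year))

# | keeps rows where at least one condition holds. Every FID value is
# either above 10 or below 15, so no row is dropped here
either <- subset(Bird_Behaviour, FID > 10 | FID < 15, select = c(Ind, Sex, Year))
```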

Review

slide-5
SLIDE 5

9

Graphics with base R

Simple graphics using plotting functions in the graphics package

  • Base R, installed by default
  • Easy and quick to type
  • Wide variety of functions

10

Graphics with base R

  Function     Description
  hist()       Histograms
  plot()       Scatterplots, etc.
  boxplot()    Box-and-whisker plots
  barplot()    Bar and column charts
  dotchart()   Cleveland dot plots
  contour()    Contour of a surface (2D)
  pie()        Circular pie chart
  …

slide-6
SLIDE 6

11

Graphics with base R

Creating a histogram with hist()

Example 1:

hist(Sparrows$Tarsus)

[Figure: histogram of Sparrows$Tarsus; x-axis: Sparrows$Tarsus, y-axis: Frequency]

12

Graphics with base R

Creating a histogram with hist()

Example 2: Alter colour and the number of bins

hist(Sparrows$Tarsus, col = "grey", breaks = 50)

[Figure: histogram of Sparrows$Tarsus with grey bars and more bins; x-axis: Sparrows$Tarsus, y-axis: Frequency]

slide-7
SLIDE 7

13

Graphics with base R

Creating a histogram with hist()

Example 3: Add density curve

hist(Sparrows$Tarsus, col = "grey", breaks = 50, freq = FALSE)

[Figure: histogram of Sparrows$Tarsus; x-axis: Sparrows$Tarsus, y-axis: Density]

14

Graphics with base R

Creating a histogram with hist()

Example 3: Add density curve

hist(Sparrows$Tarsus, col = "grey", breaks = 50, freq = FALSE)
lines(density(Sparrows$Tarsus), col = "blue", lwd = 2)

[Figure: histogram of Sparrows$Tarsus with a blue density curve; x-axis: Sparrows$Tarsus, y-axis: Density]
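The density curve only lines up with the bars because freq = FALSE switches the y-axis from counts to densities, so that the bar areas sum to 1. A self-contained sketch with simulated data follows; the vector tarsus and its mean and spread are assumptions standing in for Sparrows$Tarsus.

```r
# Simulated tarsus lengths (mm), a stand-in for Sparrows$Tarsus
set.seed(1)
tarsus <- rnorm(500, mean = 21.5, sd = 0.9)

# freq = FALSE puts densities on the y-axis instead of counts
h <- hist(tarsus, col = "grey", breaks = 50, freq = FALSE,
          main = "Histogram of simulated tarsus length",
          xlab = "Tarsus (mm)")

# The kernel density estimate is on the same scale and can be overlaid
lines(density(tarsus), col = "blue", lwd = 2)

# Total area of the bars: density times bin width, summed over all bins
bar_area <- sum(h$density * diff(h$breaks))
print(bar_area)
```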

slide-8
SLIDE 8

15

Graphics with base R

Creating a histogram with hist()

Example 4: Plot only males

hist(Sparrows[Sparrows$Sex == "Male", ]$Tarsus, col = "grey", breaks = 50)

[Figure: histogram of Sparrows[Sparrows$Sex == "Male", ]$Tarsus; x-axis: tarsus of males, y-axis: Frequency]

16

Graphics with base R

Creating a scatterplot with plot()

➔ Relationship between two continuous variables

Example 1:

plot(Sparrows$Wing, Sparrows$Tarsus)

[Figure: scatterplot; x-axis: Sparrows$Wing, y-axis: Sparrows$Tarsus]

slide-9
SLIDE 9

17

Graphics with base R

Creating a scatterplot with plot()

Example 2: Alter axis limits and shape of symbols

plot(Sparrows$Wing, Sparrows$Tarsus, xlim = c(50, 70), pch = 15, col = "blue")

[Figure: scatterplot with blue filled squares and x-axis from 50 to 70; x-axis: Sparrows$Wing, y-axis: Sparrows$Tarsus]

Try yourself: ?pch

18

Graphics with base R

Creating a scatterplot with plot()

Example 3: Alter the size of plotting symbols

plot(Sparrows$Wing, Sparrows$Tarsus, xlim = c(50, 70), cex = 1.5)

[Figure: scatterplot with enlarged symbols; x-axis: Sparrows$Wing, y-axis: Sparrows$Tarsus]

slide-10
SLIDE 10

19

Graphics with base R

Creating line graphs with plot()

Examples:

plot(pressure$temperature, pressure$pressure)
plot(pressure$temperature, pressure$pressure, type = "l")

[Figure: two plots of pressure$pressure against pressure$temperature, the first drawn with points, the second with a line]

20

Graphics with base R

Use the type argument to specify the type of plot

Possible types

  "p"   points
  "l"   lines
  "b"   points connected by lines
  "o"   points overlaid by lines
  "h"   vertical lines from points to the zero axis
  "s"   steps
  "n"   nothing, only the axes
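To compare the types side by side, one option is to loop over them and draw one panel each. This is a sketch using the built-in pressure data set; the variable names plot_types and old_par are chosen here.

```r
# One panel per plot type
plot_types <- c("p", "l", "b", "o", "h", "s", "n")

old_par <- par(mfrow = c(2, 4))   # 2 x 4 grid; keep the previous settings
for (ty in plot_types) {
  plot(pressure$temperature, pressure$pressure,
       type = ty,
       main = paste0('type = "', ty, '"'),
       xlab = "Temperature", ylab = "Pressure")
}
par(old_par)                      # restore the previous layout
```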

slide-11
SLIDE 11

21

Graphics with base R

Creating a boxplot with boxplot()

➔ Relationship between continuous and categorical variables

Example 1:

boxplot(Wing ~ Sex, data = Sparrows)

[Figure: boxplot of Wing for the groups Female and Male]

22

Graphics with base R

Example 2:

boxplot(Wing ~ Sex, data = Sparrows,
        xlab = 'Sex',                # Adds label to x-axis
        ylab = 'Wing length (mm)',   # Adds label to y-axis
        col = c("red", "blue"),      # Adds colour
        ylim = c(50, 70),            # Changes axis limits
        main = "Boxplot")            # Adds title

[Figure: boxplot titled "Boxplot"; x-axis: Sex (Female, Male), y-axis: Wing length (mm)]

slide-12
SLIDE 12

23

Graphics with base R

Example 3: Multiple grouping variables

boxplot(Wing ~ Sex + Species, data = Sparrows,
        xlab = 'Species and Sex',
        ylab = 'Wing length (mm)',
        col = c("red", "blue"),
        ylim = c(50, 70),
        main = "")

[Figure: boxplot with the groups Female.SESP, Male.SESP, Female.SSTS and Male.SSTS; x-axis: Species and Sex, y-axis: Wing length (mm)]

24

Graphics with base R

Common parameters in graphics

  main     title of the plot
  xlab     label of x-axis
  ylab     label of y-axis
  xlim     range/limits of x-axis
  ylim     range/limits of y-axis
  col      colour of the points, bars, etc.; can be a character string or a hexadecimal colour (e.g. #RRGGBB)
  breaks   number of bins
  pch      shape of symbol
  cex      size of symbols
  lty      line type
  lwd      line width
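Most of these parameters can be combined in a single call. The sketch below uses the built-in iris data as a stand-in for the Sparrows data; the labels, limits and colour are arbitrary choices made here.

```r
plot(iris$Sepal.Length, iris$Petal.Length,
     main = "Iris measurements",       # title
     xlab = "Sepal length (cm)",       # x-axis label
     ylab = "Petal length (cm)",       # y-axis label
     xlim = c(4, 8), ylim = c(0, 8),   # axis limits
     col  = "#1F78B4",                 # hexadecimal colour
     pch  = 17,                        # filled triangles
     cex  = 1.2)                       # symbols 20% larger than default

# lty and lwd apply to lines: a dashed, thick horizontal line at the mean
mean_petal <- mean(iris$Petal.Length)
abline(h = mean_petal, lty = 2, lwd = 2)
```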

slide-13
SLIDE 13

25

Multiple plots on one page

The par() function:

  • comes with an extensive list of graphical parameters you can change (see ?par)
  • Some options are helpful; others you may never use

To plot multiple charts within the same window, you can use the mfcol or mfrow parameter.
For example, par(mfrow = c(2, 2)) divides the graphics window into four panels (two rows and two columns).

26

Multiple plots on one page

[Figure: the four plots from the previous slides (histogram of Sparrows$Tarsus with density curve, scatterplot of Sparrows$Tarsus against Sparrows$Wing, boxplot of wing length by sex, boxplot of wing length by species and sex) arranged on one page in two different layouts]

  par(mfrow = c(2,2))   panels are filled row by row:        1 2 / 3 4
  par(mfcol = c(2,2))   panels are filled column by column:  1 3 / 2 4
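The fill order can be checked without any data by drawing four numbered, empty panels. This is a minimal sketch in base R; the helper show_order() is a name invented here.

```r
# Draw four numbered, empty panels to reveal the fill order
show_order <- function() {
  for (i in 1:4) {
    plot.new()                           # move on to the next panel
    box()                                # draw its border
    text(0.5, 0.5, labels = i, cex = 3)  # panel number in the centre
  }
}

old_par <- par(mfrow = c(2, 2))  # filled row by row
show_order()

par(mfcol = c(2, 2))             # filled column by column
show_order()

par(old_par)                     # restore the single-panel layout
```

Storing the return value of par() and passing it back at the end avoids leaving later plots squeezed into a 2 x 2 grid.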

slide-14
SLIDE 14

27

Saving plots

There are several ways to save a plot

  • 1. dev.print()

Example:

plot(x, y, ...)   # Make a plot
# After you are finished with the plot use:
dev.print(device = pdf, file = "filename.pdf")

Important: When you are done, you have to close the printing device!

dev.off()   # shuts down current device

28

Saving plots

  • 2. savePlot()

Example:

plot(x, y, ...)   # Make a plot
savePlot(filename = "Figure1.pdf", type = "pdf")

Important: This may not work on your system! (uses X11 device, most Unix systems)

slide-15
SLIDE 15

29

Saving plots

  • 3. Plot directly into a file

Example:

# width and height are in inches
pdf("Figure2.pdf", width = 4, height = 4)
# You can execute multiple graphing commands
hist(x)
# The result of each will go into the pdf file
plot(x, y, ...)
dev.off()

But the plot is not shown on screen!

30

Different devices

Functions to save plots

  pdf()          Opens a pdf file as device
  postscript()   Opens a postscript file as device
  png()          Opens a png file as device
  jpeg()         Opens a jpeg file as device
  tiff()         Opens a tiff file as device
  bmp()          Opens a bmp file as device
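All of these device functions follow the same pattern: open the device, draw, close it with dev.off(). The sketch below shows the pattern with pdf(). The simulated x and y and the temporary output path are assumptions made here so that the example runs anywhere.

```r
# Simulated data as a stand-in for the x and y used on the slides
set.seed(42)
x <- rnorm(100)
y <- x + rnorm(100, sd = 0.5)

# Write to a temporary file so nothing in the working directory is touched
out_file <- tempfile(fileext = ".pdf")

pdf(out_file, width = 4, height = 4)  # open the device; size in inches
hist(x)                               # first page of the pdf
plot(x, y)                            # second page of the pdf
dev.off()                             # close the device and finish the file

file.exists(out_file)
```

Note that the bitmap devices such as png() measure width and height in pixels by default, not in inches.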

slide-16
SLIDE 16

31

Graphics with ggplot2

Why use ggplot2?

  • Many users, a lot of support
  • Check out the ggplot2 documentation at http://docs.ggplot2.org/
  • Very flexible and powerful
  • Sophisticated plots for publication

32

Graphics with ggplot2

To create a plot you use the ggplot() function

Basic structure:

ggplot(data,                            # data frame with variables to plot
       aes(x variable, y variable)) +   # specifies which variables to plot
  geom_object()                         # specifies the geometric objects

Commonly used geometric objects:

  Histogram:     + geom_histogram()
  Scatterplot:   + geom_point()
  Boxplot:       + geom_boxplot()
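The basic structure can be tried out on a data set that ships with R. The following sketch assumes ggplot2 is installed and uses mtcars; the object names p_scatter, p_hist and p_box are chosen here.

```r
library(ggplot2)

# Scatterplot: two continuous variables
p_scatter <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Histogram: one continuous variable, so aes() only needs x
p_hist <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2)

# Boxplot: the grouping variable must be categorical, hence factor()
p_box <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot()

print(p_scatter)   # typing p_scatter at the console has the same effect
```

Because a ggplot is an ordinary R object, it can be stored in a variable, extended with further + layers, and printed or saved later.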

slide-17
SLIDE 17

33

Graphics with ggplot2

Creating a histogram with ggplot()

Example:

ggplot(Sparrows, aes(Tarsus)) +
  geom_histogram(col = "grey", binwidth = 0.1) +
  xlab("Tarsus length (mm)") +
  ylab("Frequency")

[Figure: histogram; x-axis: Tarsus length (mm), y-axis: Frequency]

34

Graphics with ggplot2

Creating a scatterplot with ggplot()

Example 1:

ggplot(Sparrows, aes(x = Wing, y = Tarsus)) +
  geom_point()

[Figure: scatterplot of Tarsus against Wing]

slide-18
SLIDE 18

35

Graphics with ggplot2

Creating a scatterplot with ggplot()

Example 2: Avoid overplotting of symbols

ggplot(Sparrows, aes(x = Wing, y = Tarsus)) +
  geom_point(position = position_jitter(width = 0.5, height = 0))

36

Graphics with ggplot2

Creating a scatterplot with ggplot() Example 2: Avoid overplotting of symbols

[Figure: scatterplot with horizontally jittered points; x-axis: Wing, y-axis: Tarsus]

slide-19
SLIDE 19

37

Graphics with ggplot2

Creating a scatterplot with ggplot()

Example 3: Alter colour, shape, and size of symbols

ggplot(Sparrows, aes(x = Wing, y = Tarsus, colour = Sex, shape = Species)) +
  geom_point(size = 2)

38

Graphics with ggplot2

Creating a scatterplot with ggplot() Example 3: Alter colour, shape, and size of symbols

[Figure: scatterplot; x-axis: Wing, y-axis: Tarsus; legends: Species (SESP, SSTS) shown by shape and Sex (Female, Male) shown by colour]

slide-20
SLIDE 20

39

Graphics with ggplot2

Creating a boxplot with ggplot()

Example 1:

ggplot(Sparrows, aes(Sex, Wing, fill = Sex)) +
  geom_boxplot()

[Figure: boxplot; x-axis: Sex (Female, Male), y-axis: Wing; legend: Sex (Female, Male) shown by fill colour]

40

Saving a ggplot

Save a plot from ggplot2 with print()

Example 1: print a ggplot to a file

# Print the plot to a pdf file
data("mtcars")
pdf("myplot.pdf")
myplot <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
print(myplot)
dev.off()

slide-21
SLIDE 21

41

Saving a ggplot

Save a plot from ggplot2 with ggsave()

Example 2: save the last ggplot

# 1. Create a plot
# The plot is displayed on the screen
ggplot(mtcars, aes(wt, mpg)) + geom_point()

# 2. Save the plot to a pdf
ggsave("myplot.pdf")

42

Graphics with ggplot2

Preparing plots for publication

  • Title and axis labels
  • Range of axes
  • Colours
  • Overall appearance (themes)
  • Text size
  • Legend
slide-22
SLIDE 22

43

Graphics with ggplot2

[Figure: boxplot titled "Sparrow morphology"; x-axis: Sex (Female, Male), y-axis: Wing length (mm)]
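Each item on the publication checklist corresponds to one ggplot2 layer or setting. The sketch below is one possible combination, assuming ggplot2 is installed. It uses the built-in iris data instead of the Sparrows data, and the title, colours, theme and text size are arbitrary choices made here, not the settings used for the lecture figure.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_boxplot() +
  labs(title = "Iris morphology",                          # title and axis labels
       x = "Species", y = "Sepal length (cm)") +
  ylim(4, 8) +                                             # range of the y-axis
  scale_fill_manual(values = c("red", "blue", "grey")) +   # colours
  theme_classic() +                                        # overall appearance
  theme(text = element_text(size = 14),                    # text size
        legend.position = "none")                          # legend removed

# Save with an explicit plot object and size (inches by default);
# a temporary file is used so the working directory stays untouched
out_file <- tempfile(fileext = ".pdf")
ggsave(out_file, plot = p, width = 4, height = 4)
```

Passing plot = p to ggsave() makes the call independent of which plot happened to be displayed last.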

44

Further reading

  • http://www.cookbook-r.com/
  • http://www.cookbook-r.com/Graphs/
  • http://docs.ggplot2.org/
  • http://r4ds.had.co.nz/