Workshop 4: Data visualisation

SLIDE 1

Workshop 4

Data visualisation

SLIDE 2

Learning Objectives

By following the slides and applying the techniques to select examples from the workbook, the successful student will be able to:

  • explain what is important in choosing a figure
  • determine which variables are best mapped to which elements of a figure
  • explain the fundamentals of ggplot and recognise it as a 'tidy' package
  • create, with ggplot, appropriate figures to accompany lm and glm analyses of up to three explanatory variables, showing the data, the main statistical model and the results of any post-hoc testing as appropriate.

SLIDE 3

Key ideas in data visualisation

  • Communicate information clearly, efficiently and honestly (without distortion)
  • Help the reader understand the data, the analysis and the results
  • Make it as easy as possible to make relevant comparisons
  • Minimise ink, especially ink that does not show data
  • See Edward Tufte

SLIDE 4

Key ideas in data visualisation

  • Map variables to elements of the figure
    – Response variable: almost always on the vertical axis
    – Explanatory variables: horizontal axis, colour, shape, size, facets. Consider the variable type (e.g. continuous or categorical)
  • Ideally, plot all the data and the model
  • Or plot the model with an additional summary of the data
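A minimal ggplot2 sketch of these ideas, assuming a simulated stand-in for the workshop's clover data (the values are made up; only the column names yarrow.s, clov.y and cycle come from the slides). The response goes on the vertical axis, the continuous explanatory variable on the horizontal axis and the factor on colour, and the figure shows all the data plus the fitted model. The interaction model clov.y ~ yarrow.s * cycle is a hypothetical choice for illustration, not the workshop's analysis.

```r
library(ggplot2)

# Simulated stand-in for the clover data (made-up values, not the real data):
# yarrow.s is continuous, cycle is a 3-level factor, clov.y is the response.
set.seed(1)
sim <- data.frame(
  yarrow.s = rep(seq(10, 460, by = 50), times = 3),
  cycle = factor(rep(c("A", "B", "C"), each = 10))
)
sim$clov.y <- 40 - 0.05 * sim$yarrow.s + rnorm(30, sd = 5)

# A hypothetical model: separate intercepts and slopes for each cycle.
mod <- lm(clov.y ~ yarrow.s * cycle, data = sim)
sim$fit <- fitted(mod)

# Response on the vertical axis, continuous explanatory variable on the
# horizontal axis, categorical explanatory variable mapped to colour.
p <- ggplot(sim, aes(x = yarrow.s, y = clov.y, colour = cycle)) +
  geom_point() +            # all the data
  geom_line(aes(y = fit))   # the fitted model, one line per cycle
p
```

To show cycle with facets instead of colour, drop colour = cycle from aes() and add + facet_wrap(~ cycle); each panel then holds one cycle's data and fitted line.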

SLIDE 5

Data visualisation to enjoy

  • David McCandless: http://www.informationisbeautiful.net/
  • Hans Rosling’s Gapminder

SLIDE 6

ggplot2

  • ‘Tidy’ datasets
    – are easy to manipulate, model and visualize
    – have a specific structure: each variable is a column, each observation is a row, and each type of observational unit is a table

Wickham, H. (2014), “Tidy Data,” Journal of Statistical Software, 59, available at http://www.jstatsoft.org/article/view/v059i10

“Tidy datasets are all alike but every messy dataset is messy in its own way”

SLIDE 7

Tidy data

  • Each variable is in a named column
  • Each row is an observation

Easy to explore, plot, model and report. An easy way to think about data. Several powerful packages, including ggplot2, are built to work with tidy data.
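A minimal sketch of turning a messy table into a tidy one, assuming the tidyr package (the slides do not name a reshaping function). The hypothetical wide table hides one variable, the year, in its column names; pivot_longer() moves it into its own column so that each row is one observation. The plot IDs and values are invented for illustration.

```r
library(tidyr)

# Hypothetical wide data: one row per plot, one column per year.
# "year" is really a variable, but it is spread across the column names.
wide <- data.frame(
  plot = c("p1", "p2", "p3"),
  y2019 = c(12.1, 8.4, 15.0),
  y2020 = c(13.5, 9.9, 14.2)
)

# Tidy version: each variable (plot, year, response) is a column and
# each row is one observation (one plot in one year).
tidy <- pivot_longer(
  wide,
  cols = c(y2019, y2020),
  names_to = "year",
  names_prefix = "y",      # strip the leading "y" from the column names
  values_to = "response"
)
print(tidy)
```

Once the data are in this shape, year can be mapped straight to an axis, colour or facet by name.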

SLIDE 8

Keys to understanding ggplot

ggplot()
Empty plot

ggplot(data = clover)
The data (still nothing is drawn)

'data.frame': 30 obs. of 3 variables:
 $ cycle   : Factor w/ 3 levels "A","B","C":
 $ clov.y  : num 14 50.7 11.4 23.1 32.2 18.5
 $ yarrow.s: int 220 20 510 40 120 300 60 10

ggplot(clover, aes(x = yarrow.s, y = clov.y))
The aesthetic, aes(), maps variables to axes... but ggplot doesn't yet know what to plot

SLIDE 9

Keys to understanding ggplot

ggplot(clover, aes(x = yarrow.s, y = clov.y)) + geom_point()

geoms say what the data should be plotted as

ggplot(clover, aes(x = yarrow.s, y = clov.y)) + geom_bar(stat = "identity")
ggplot(clover, aes(x = yarrow.s, y = clov.y)) + geom_line()

...but ggplot will plot whatever you tell it to, sensible or not
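The point can be checked with a simulated stand-in for clover (made-up values; only the column names come from the slides): the same ggplot() call and aesthetics give a scatter plot, a bar chart or a line, depending only on the geom. geom_col() is ggplot2's shorthand for geom_bar(stat = "identity").

```r
library(ggplot2)

# Simulated stand-in with clover-like column names (made-up values).
set.seed(2)
sim <- data.frame(yarrow.s = c(10, 20, 40, 60, 120, 220, 300, 510))
sim$clov.y <- 40 - 0.05 * sim$yarrow.s + rnorm(8, sd = 3)

# The data and the aesthetic mapping, shared by every version below.
base <- ggplot(sim, aes(x = yarrow.s, y = clov.y))

p_points <- base + geom_point()  # scatter plot: suits two continuous variables
p_bars   <- base + geom_col()    # bars; equivalent to geom_bar(stat = "identity")
p_line   <- base + geom_line()   # joins the observations in order of x

# ggplot builds all three without complaint; choosing the sensible one is up to you.
```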

SLIDE 10

Keys to understanding ggplot

  • You can have as many geoms as you want
  • geoms use the aes() defined in ggplot(), or you can add a mapping within an individual geom
  • geoms have a default ‘stat’, often count or identity
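A sketch of all three points using simulated clover-like data (made-up values): two geoms share the x and y mapping from ggplot(), only geom_point() adds its own colour mapping, and geom_bar() with its default count stat tallies the rows in each level. The lm smoother with formula = y ~ x is an illustrative choice.

```r
library(ggplot2)

# Simulated data with clover-like columns (made-up values).
set.seed(3)
sim <- data.frame(
  yarrow.s = rep(c(10, 60, 120, 220, 300, 510), times = 3),
  cycle = factor(rep(c("A", "B", "C"), each = 6))
)
sim$clov.y <- 40 - 0.05 * sim$yarrow.s + rnorm(18, sd = 4)

# Two geoms on one plot. Both inherit x and y from ggplot(); only
# geom_point() adds a colour mapping, so the smoother is a single line.
p <- ggplot(sim, aes(x = yarrow.s, y = clov.y)) +
  geom_point(aes(colour = cycle)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# geom_bar()'s default stat is "count": given only x, it counts rows per level.
p_count <- ggplot(sim, aes(x = cycle)) + geom_bar()
```

Moving colour = cycle up into the ggplot() call would instead pass it to both geoms, giving one fitted line per cycle.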

SLIDE 11

ggplot

  • Axes: xlim(), xlab()
  • Annotations
  • Themes
  • Code layout
  • google
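A minimal sketch combining axis labels and limits, an annotation and a theme, laid out one component per line, using simulated clover-like data (made-up values; the label text is hypothetical). theme_classic() is one of ggplot2's built-in themes; it drops the grey background and grid lines, in the spirit of minimising ink.

```r
library(ggplot2)

# Simulated stand-in with clover-like column names (made-up values).
set.seed(4)
sim <- data.frame(yarrow.s = c(10, 20, 40, 60, 120, 220, 300, 510))
sim$clov.y <- 40 - 0.05 * sim$yarrow.s + rnorm(8, sd = 3)

# One component per line, each line ending in "+", keeps the code readable.
p <- ggplot(sim, aes(x = yarrow.s, y = clov.y)) +
  geom_point() +
  xlab("Yarrow (yarrow.s)") +                   # axis labels (hypothetical wording)
  ylab("Clover (clov.y)") +
  xlim(0, 600) +                                # axis limits
  annotate("text", x = 450, y = 40, label = "simulated data") +  # an annotation
  theme_classic()                               # a built-in, low-ink theme
p
```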

SLIDE 12

ggplot

  • All the figures you’ve ever done
  • The Cookbook for R
  • The ggplot2 cheatsheet
  • The R Graph Gallery
  • Googling and more googling

SLIDE 13

Summary

  • Making the data, the analysis and the results easier to understand is the most important thing
  • The response nearly always goes on the vertical axis; explanatory variables are mapped to the horizontal axis, colours, shapes, sizes and facets
  • ggplot is awesome