Scientific Figure Design v2020-01 Simon Andrews, Anne - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Scientific Figure Design

v2020-01

Simon Andrews, Anne Segonds-Pichon, Boo Virk, Jo Montgomery simon.andrews@babraham.ac.uk anne.segonds-pichon@babraham.ac.uk bhupinder.virk@babraham.ac.uk jo.montgomery@babraham.ac.uk

slide-2
SLIDE 2

Figures are the way your science is presented to an audience

AAV-CTRL    AAV-VEGF-C
0.8245      1.3232
1.0136      2.5644
1.3224      1.4899
1.0128      1.512
0.9644      2.6002
0.9668      2.1132
1.2296      1.3228
1.0532      1.7566

slide-3
SLIDE 3

What this course covers…

  • Theory of data visualisation

– Why do some figures work better than others?
– Applying theory to common plot types

  • Ethics of data representation
  • Using graphic design
  • Practical figure editing and compositing in Inkscape
slide-4
SLIDE 4

What this course doesn’t cover…

  • How to draw graphs in specific programs

http://www.bioinformatics.babraham.ac.uk/training.html

slide-5
SLIDE 5

Collect raw data → Process and filter data → Clean dataset → Exploratory analysis → Generate conclusion

Consider the requirements for a figure

  • Exploratory figures
  • Illustrative figures
  • Reference figures

slide-6
SLIDE 6

[Figure: ten histograms, each titled "Histogram of log2(full.counts[[x]])" (x axis: log2(full.counts[[x]]); y axis: Frequency)]

Exploratory figures

[Figure: Value (y axis 20 to 140) for Control, Treatment 1, Treatment 2 and Treatment 3]

  • Quick!
  • Complete
  • Interactive
slide-7
SLIDE 7

Reference figures

  • Complete
  • Flexible
slide-8
SLIDE 8

Illustrative figures

  • Simple
  • Clear
  • Pretty
slide-9
SLIDE 9

What makes a good figure?

  • Has a clear purpose and message

– Helps to tell a story
– Adds to the text, and links to it

  • Is focused

– Don’t confuse one message with another

  • Is easy to interpret correctly

– Good data visualisation
– Good design

  • Is an honest and true reflection of the data
slide-10
SLIDE 10

The theory of data visualisation

Simon Andrews, Phil Ewels simon.andrews@babraham.ac.uk phil.ewels@scilifelab.se

slide-11
SLIDE 11

Data Visualisation

  • A scientific discipline involving the creation and study of the visual representation of data, whose goal is to communicate information clearly and efficiently to users.
  • Data visualisation is both an art and a science.
slide-12
SLIDE 12

[Figure: the same Sample A and Sample B values (points 1 to 5) drawn as several different plot types]

Sample A    Sample B
1           1
2           4
4           16
8           64
12          144

slide-13
SLIDE 13

ISBN-10: 1466508914 http://www.cs.ubc.ca/~tmm/talks.html

slide-14
SLIDE 14

Different representations have common elements

slide-15
SLIDE 15

Marks and Channels

  • Marks

– Geometric primitives

  • Lines
  • Points
  • Areas

– Used to represent data sets

  • Channels

– Graphical appearance of a mark

  • Colour
  • Length
  • Position
  • Angle

– Used to encode data

slide-16
SLIDE 16

Figures are a combination of marks and channels

  • 1 mark = rectangle; 1 channel = length of longest side
  • 1 mark = diamond shape; 2 channels = x position, y position
  • 1 mark = circle segment; 1 channel = angle
  • 1 mark = circle; 4 channels = x position, y position, area, colour

slide-17
SLIDE 17

Golden Rules

  • Expressiveness

– Match the properties of the data and channel

  • Effectiveness

– Encode the most important information with the most effective channel

slide-18
SLIDE 18

Types of channel

  • Quantitative

– Position on scale
– Length
– Angle
– Area
– Colour (saturation)
– Colour (lightness)

  • Qualitative

– Spatial grouping
– Colour (hue)
– Shape

Types of data

  • Quantitative

– Weight
– Length
– Height
– Expression
– Time
– Density

  • Qualitative

– Treatment
– Genotype
– Batch

slide-19
SLIDE 19

Golden Rules

  • Expressiveness

– Match the properties of the data and channel

  • Effectiveness

– Encode the most important information with the most effective channel

slide-20
SLIDE 20

Matching the data and channel

slide-21
SLIDE 21

Colour

  • Technical representations of colour

– Red + Green + Blue (RGB)
– Cyan + Magenta + Yellow + Black (CMYK)

  • Perceptual representation of colour

– Hue + Saturation + Lightness (HSL)

  • Colour is the only channel that appears in both the qualitative and the quantitative lists
slide-22
SLIDE 22

HSL Representation

  • Hue = shade of colour = qualitative
  • Saturation = amount of colour = quantitative
  • Lightness = amount of white = quantitative
  • Humans have no innate quantitative perception of hue, but we have learned some (cold – hot, rainbow, etc.)
  • Our perception of hue is not linear
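You can explore the split between technical and perceptual colour models with Python's standard-library `colorsys` module. It calls the perceptual model HLS and returns hue, lightness and saturation in that order, each on a 0 to 1 scale. This is a minimal sketch: the helper `describe` and the three example colours are illustrative choices, not part of the course material.

```python
import colorsys

def describe(rgb255):
    """Split an 8-bit RGB colour into hue (degrees), lightness and saturation (0-1)."""
    r, g, b = (c / 255 for c in rgb255)
    h, l, s = colorsys.rgb_to_hls(r, g, b)  # note the H, L, S order
    return round(h * 360), round(l, 2), round(s, 2)

# Three reds: same hue, different lightness or saturation
for name, rgb in [("red", (255, 0, 0)), ("pale red", (255, 128, 128)), ("grey red", (191, 64, 64))]:
    print(name, describe(rgb))
```

All three colours share a hue of 0°. The pale red differs from pure red only in lightness (0.75 vs. 0.5), and the grey red differs only in saturation (0.5 vs. 1.0).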
slide-23
SLIDE 23

Types of channel

  • Quantitative

– Colour (saturation)
– Colour (lightness)

  • Qualitative

– Colour (hue)

In a single plot you should modify only ONE colour parameter
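For a quantitative colour scale, one way to follow this rule is to hold hue and saturation fixed and step only the lightness. This minimal sketch uses `colorsys.hls_to_rgb`. The function name, the blue hue (220°), the saturation and the lightness range are arbitrary assumptions. A published sequential scheme is usually a better starting point than a hand-built one.

```python
import colorsys

def lightness_ramp(n, hue_deg=220, saturation=0.6, light_min=0.25, light_max=0.9):
    """Return n hex colours, dark to light, that differ ONLY in lightness."""
    if n < 2:
        raise ValueError("need at least two colours")
    colours = []
    for i in range(n):
        # Evenly spaced lightness; hue and saturation are held constant
        lightness = light_min + (light_max - light_min) * i / (n - 1)
        r, g, b = colorsys.hls_to_rgb(hue_deg / 360, lightness, saturation)
        colours.append("#{:02x}{:02x}{:02x}".format(*(round(c * 255) for c in (r, g, b))))
    return colours

print(lightness_ramp(5))
```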
slide-24
SLIDE 24

Golden Rules

  • Expressiveness

– Match the properties of the data and channel

  • Effectiveness

– Encode the most important information with the most effective channel

slide-25
SLIDE 25

Effectiveness of quantitative channels

[Figure: example encodings using different quantitative channels, annotated with the ratios 4.5X, 1.8X, 2X, 16X, 7X and 3.4X]

slide-26
SLIDE 26

Quantitation Perception

slide-27
SLIDE 27

Golden Rules

  • Effectiveness

– Encode the most important information with the most effective channel

  • Expressiveness

– Match the properties of the data and channel

slide-28
SLIDE 28

Most Quantitative Representations

  • Bar chart
  • Stacked bar chart with common start
  • Stacked bar chart with different starts
  • Pie charts
  • Bubble plots (circular area)
  • Rectangular area
  • Colour (luminance)
  • Colour (saturation)

(The list runs from good quantitation at the top to poor quantitation at the bottom.)
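Circular area also carries a geometric trap. If a bubble's area is meant to be proportional to its value, the radius must grow with the square root of the value. Scaling the radius directly would draw a 2X difference as a 4X difference in area. This is a minimal sketch; `bubble_radius` and the 20-unit maximum radius are hypothetical.

```python
import math

def bubble_radius(value, max_value, max_radius=20.0):
    """Radius of a circle whose AREA is proportional to value."""
    # area = pi * r^2, so r must scale with sqrt(value)
    return max_radius * math.sqrt(value / max_value)

def circle_area(r):
    return math.pi * r ** 2

small, large = 25, 100  # a 4X difference in the data
r_small, r_large = bubble_radius(small, large), bubble_radius(large, large)
print(r_large / r_small)                            # radius ratio: 2.0
print(circle_area(r_large) / circle_area(r_small))  # area ratio: 4.0
```

Even when drawn correctly, a 4X difference in value shows up as only a 2X difference in diameter. This may be part of why area ranks low on the list.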

slide-29
SLIDE 29

Effectiveness of Qualitative Channels

  • If you encode categorical data, are the differences between categories easy for the user to perceive correctly?

slide-30
SLIDE 30

Colour Discrimination

  • How many colours can you discriminate?
slide-31
SLIDE 31

Colour Discrimination

  • How many colours can you discriminate?
slide-32
SLIDE 32

Colour Discrimination

slide-33
SLIDE 33

Qualitative Discrimination

  • How many (fillable) shapes can you discriminate?
  • You can combine shape with colour, but you need to maintain similar fillable areas

slide-34
SLIDE 34

Qualitative Discrimination

  • You can combine shape with colour, but you need to maintain similar fillable areas

slide-35
SLIDE 35

Separability

Adding channels can adversely affect the effectiveness of existing channels

  • Larger points are easier to discriminate than smaller ones
  • We tend to focus on the area of the shape rather than the height/width separately
  • Humans are very bad at separating combined colours
  • There is no confusion between the two channels

slide-36
SLIDE 36

Separability

slide-37
SLIDE 37

Other visual cues

How can you modify your plot to improve its ease of interpretation without changing the basic data representation?

slide-38
SLIDE 38

Other visual cues: popout

  • Sometimes you want to draw people's attention to parts of the plot
  • We can use colours or shapes to trigger a 'popout' reaction
  • An implicit rather than explicit cue
slide-39
SLIDE 39

Popout

(find the red circle)

slide-40
SLIDE 40

Popout

Speed of identification is independent of the number of distracting points

slide-41
SLIDE 41

Popout

Colour pops out more than shape

slide-42
SLIDE 42

Popout

Mixing channels removes the effect (Find the red circle)

slide-43
SLIDE 43

Popout Examples

slide-44
SLIDE 44

Other visual cues

[Figure: plot with a y axis from 10 to 80]

slide-45
SLIDE 45

Grouping

[Figure: values (y axis 10 to 80) for CpG, CHH and CHG, grouped into Exon, CGI, Intron and Repeat]

slide-46
SLIDE 46

Other visual cues

  • Is a monkey heavier than a dog?

[Figure: animal weights (kg) plotted twice, once in alphabetical order (aardvark, cat, cow, dog, fish, horse, monkey) and once reordered (fish, aardvark, cat, monkey, dog, cow, horse)]

slide-47
SLIDE 47

Other visual cues

[Figure: animal weights (kg) in the order fish, aardvark, cat, monkey, dog, cow, horse]

  • Is a monkey heavier than animal X?
slide-48
SLIDE 48

Containment / Linking

[Figure: the grouped CpG/CHH/CHG plot (y axis 10 to 80) drawn once for Wild Type and once for Mutant]

slide-49
SLIDE 49

Containment / Linking

slide-50
SLIDE 50

How do you know if your figure is working?

slide-51
SLIDE 51

Validation

  • Always try to validate plots you create
  • You have seen your data too often to get an unbiased view
  • Show the plot to someone not familiar with the data

– What does this plot tell you?
– Is this the message you wanted to convey?
– If they pick multiple points, do they choose the most important one first?

slide-52
SLIDE 52

Exercise

You will be given a series of (not very good) plots to validate. Try to think what message the plot is trying to convey and whether it is doing so effectively. Work out how you would choose to represent the data if you don’t like the way it’s presented now.

slide-53
SLIDE 53

Making effective use of common plot types

Anne Segonds-Pichon Simon Andrews Phil Ewels anne.segonds-pichon@babraham.ac.uk simon.andrews@babraham.ac.uk phil.ewels@scilifelab.se

slide-54
SLIDE 54

Types of plot

Things you can illustrate

slide-55
SLIDE 55

Distributions

slide-56
SLIDE 56

Representing Distributions: Single Samples

[Figure panels: histograms; density plots]

slide-57
SLIDE 57

Representing Distributions: Single Samples (Bandwidth)
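A density plot is typically a kernel density estimate, and its bandwidth plays the role that bin width plays in a histogram. Too small a bandwidth gives a spiky curve; too large smooths away real structure, such as a gap between two clusters. This is a minimal sketch of a Gaussian kernel density estimate in plain Python. The data and the two bandwidths are hypothetical, and plotting libraries choose their default bandwidths by their own rules.

```python
import math

def gaussian_kde(data, bandwidth):
    """Return a function giving a Gaussian kernel density estimate of `data`."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # One Gaussian bump per observation, each with standard deviation = bandwidth
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density

data = [1.0, 1.2, 1.9, 4.8, 5.0, 5.1]  # hypothetical: two clusters with a gap
narrow = gaussian_kde(data, bandwidth=0.2)
wide = gaussian_kde(data, bandwidth=2.0)
for x in (1.0, 3.0, 5.0):
    print(f"x={x}: narrow={narrow(x):.3f}  wide={wide(x):.3f}")
```

With a bandwidth of 0.2, the estimate falls almost to zero at x = 3 between the clusters. With a bandwidth of 2.0, the two clusters merge into a single hump that peaks between them.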

slide-58
SLIDE 58

Representing Distributions: Single Samples (Discontinuous Data)

Plotting Integer Data

[Figure panels labelled 1.5, 1.8 and 2]
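With integer data, a histogram bin width that is not a whole number makes some bins cover more possible values than others. The result is a spurious saw-tooth pattern even when every integer is equally common. This is a minimal sketch that counts values per bin, assuming [lo, hi) bins starting at 0. It uses bin widths of 1.5, 1.8 and 2, assumed here to be what the panel labels above refer to.

```python
import math

def bin_counts(values, width, start=0.0):
    """Histogram counts for bins [start + k*width, start + (k+1)*width), k = 0, 1, ..."""
    counts = {}
    for v in values:
        k = math.floor((v - start) / width)
        counts[k] = counts.get(k, 0) + 1
    return [counts.get(k, 0) for k in range(max(counts) + 1)]

# Perfectly uniform integer data: each of the values 0 to 11 occurs 10 times
values = [v for v in range(12) for _ in range(10)]
for width in (1.5, 1.8, 2.0):
    print(width, bin_counts(values, width))
```

With a width of 1.5, the counts alternate between 20 and 10 even though the data are flat. A width of 1.8 produces an occasional dip. A width of 2, a whole number, gives equal counts.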

slide-59
SLIDE 59

Representing Distributions: Multiple Samples

slide-60
SLIDE 60

Comparisons

slide-61
SLIDE 61

Comparisons

slide-62
SLIDE 62

Error Bars

  • Standard Error of Mean (SEM)

– How accurately is the mean calculated?
– Gets smaller with more data
– Good when comparing means

  • Standard Deviation (SD)

– How well does the mean summarise the data?
– No systematic change with more data
– Good when comparing variability
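The two error-bar types are related by SEM = SD / √n, which is why only the SEM shrinks as more data are collected. This is a minimal sketch using Python's `statistics` module on simulated normal data (mean 50, SD 10). The values are illustrative, not course data.

```python
import math
import random
import statistics

def sd_and_sem(sample):
    """Return (standard deviation, standard error of the mean) of a sample."""
    sd = statistics.stdev(sample)      # sample SD (n - 1 denominator)
    sem = sd / math.sqrt(len(sample))  # SEM = SD / sqrt(n)
    return sd, sem

random.seed(1)
for n in (10, 100, 1000):
    sample = [random.gauss(50, 10) for _ in range(n)]  # simulated data, true SD = 10
    sd, sem = sd_and_sem(sample)
    print(f"n={n:5d}  SD={sd:5.2f}  SEM={sem:5.2f}")
```

As n grows, the SD stays near the true value of 10, while the SEM falls roughly in proportion to 1/√n.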
slide-63
SLIDE 63

Setting a suitable baseline

slide-64
SLIDE 64

Relationships

slide-65
SLIDE 65

Relationships – Line Graphs

slide-66
SLIDE 66

Relationships - Scatterplots

slide-67
SLIDE 67

Composition

slide-68
SLIDE 68

Pie Charts

[Figure: two pie charts of categories A to E, each labelled Total = 62]

slide-69
SLIDE 69

Stacked Bar Charts

slide-70
SLIDE 70

Heatmaps

slide-71
SLIDE 71

Making Heatmaps Effective

  • Cluster rows and columns
  • Median centre rows
  • Diverging symmetrical colour scheme (colourblind friendly)

  • Clear annotation
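Two of these steps can be sketched in a few lines. First, subtract each row's median so that every row is centred on zero. Then set the colour-scale limits symmetrically around zero, so that the midpoint of a diverging scheme means "no change". This is a minimal sketch: the matrix values and helper names are hypothetical, and clustering is omitted.

```python
import statistics

def median_centre_rows(matrix):
    """Subtract each row's median so that every row is centred on 0."""
    centred = []
    for row in matrix:
        med = statistics.median(row)
        centred.append([x - med for x in row])
    return centred

def symmetric_limits(matrix):
    """Colour-scale limits (-m, m), so 0 falls at the midpoint of a diverging scheme."""
    m = max(abs(x) for row in matrix for x in row)
    return -m, m

expression = [  # hypothetical values: rows are genes, columns are samples
    [10.0, 12.0, 11.0, 15.0],
    [2.0, 1.0, 3.0, 2.5],
]
centred = median_centre_rows(expression)
print(centred)                    # [[-1.5, 0.5, -0.5, 3.5], [-0.25, -1.25, 0.75, 0.25]]
print(symmetric_limits(centred))  # (-3.5, 3.5)
```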
slide-72
SLIDE 72

Ethics of data representation

Simon Andrews, Anne Segonds-Pichon simon.andrews@babraham.ac.uk anne.segonds-pichon@babraham.ac.uk

slide-73
SLIDE 73

What is an ethical data visualisation?

  • Different ways of being unethical:

– not exploring/getting to know the data well enough
– misusing your chosen graphical representation
– deliberately showing the data in a misleading manner
– choosing the ‘most representative’ image/experiment

slide-74
SLIDE 74

Is my plot ethical?

Would a reader come to a different conclusion if they could see the details of the data which were omitted from the plot?

slide-75
SLIDE 75

Advertising and politics are built on unethical data representation.

https://venngage.com/blog/misleading-graphs/

slide-76
SLIDE 76

Not exploring the data well enough

[Figure: bar chart of CondA vs. CondB (y axis 10 to 70)]

  • One experiment: change in the variable of interest between CondA and CondB.
  • Data plotted as a bar chart.

[Figure: CondA vs. CondB plotted on a y axis from 20 to 120]

slide-77
SLIDE 77

Not exploring the data well enough

[Figure: bar chart of Value (y axis 20 to 120) for Control and Treatments 1 to 3, annotated with p=0.04, p=0.32 and p=0.001]

Comparisons: Treatments vs. Control

[Figure: Value (y axis up to 140) for Control and Treatments 1 to 3, with the individual experiments Exp1 to Exp5 distinguished]

[Figure: standardised values (-100 to 100) for Treat1, Treat2 and Treat3]

  • Five experiments: change in the variable of interest between 3 treatments and a control.
  • Data plotted as a bar chart.
slide-78
SLIDE 78
Choosing the wrong axis/scale

  • Example: increase in salaries offered in the last term.

[Figure: salary by month (June to December) plotted twice, once with y-axis ticks from 5,000 to 25,000 and once with y-axis ticks from 19,200 to 20,200]
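The distortion from a truncated y axis can be put into numbers. For bars, the drawn length is the value minus the axis minimum, so the visible ratio between two bars depends on where the axis starts. This is a minimal sketch; the two salary figures are hypothetical values chosen within the 19,200 to 20,200 axis range of the salary example.

```python
def apparent_ratio(low, high, axis_min=0.0):
    """Ratio of the drawn heights of two bars when the y axis starts at axis_min."""
    return (high - axis_min) / (low - axis_min)

june, december = 19_400, 20_000  # hypothetical salaries
print(round(apparent_ratio(june, december), 3))                   # axis from 0
print(round(apparent_ratio(june, december, axis_min=19_200), 3))  # truncated axis
```

An increase of about 3% is drawn as a bar four times taller once the axis starts at 19,200.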

slide-79
SLIDE 79
Choosing the y-axis/scale

  • Be careful with linear vs. logarithmic scales.

slide-80
SLIDE 80
Choosing the y-axis/scale

  • Inappropriate use of a log scale can artificially minimise differences

slide-81
SLIDE 81

Choosing the y-axis/scale

  • A logarithmic axis should only be used for:

– Lognormal data
– Logarithmically spaced values
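On a logarithmic axis, position depends on log(value). Equal ratios therefore take up equal distances, which suits lognormal data, but a large absolute difference between large values shrinks visually. This is a minimal sketch comparing where two values land on a linear axis and on a log10 axis of the same height. The values and the 1 to 1,000 axis range are hypothetical.

```python
import math

def linear_pos(value, lo, hi):
    """Height of `value` as a fraction (0-1) of a linear axis running from lo to hi."""
    return (value - lo) / (hi - lo)

def log_pos(value, lo, hi):
    """Height of `value` as a fraction (0-1) of a log10 axis running from lo to hi."""
    return (math.log10(value) - math.log10(lo)) / (math.log10(hi) - math.log10(lo))

lo, hi = 1, 1000
control, treated = 500, 1000  # hypothetical values: treated is 500 units (2X) higher
print(round(linear_pos(treated, lo, hi) - linear_pos(control, lo, hi), 2))  # half the axis
print(round(log_pos(treated, lo, hi) - log_pos(control, lo, hi), 2))        # a tenth of the axis
```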

slide-82
SLIDE 82

Image Manipulation

[Figure panels: original; brightness and contrast adjusted; brightness and contrast adjusted too much (oversaturation)]

  • ‘Playing’ too much with contrast

“Adjusting the contrast/brightness of a digital image is common practice and is not considered improper if the adjustment is applied to the whole image. Adjusting the contrast/brightness of only part of an image is improper, however, and this practice can usually be spotted by someone scrutinizing a file.”
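A whole-image brightness/contrast change can be modelled as one linear map applied to every pixel, followed by clipping to the valid range. Oversaturation happens when many pixels are pushed past that range and their differences are lost. This is a minimal sketch on a hypothetical row of 8-bit pixel values. The `gain` and `offset` parameters are assumptions, not any particular program's controls.

```python
def adjust(pixels, gain=1.0, offset=0.0):
    """Apply out = gain * in + offset to every pixel, clipped to 0-255 (8-bit)."""
    return [min(255, max(0, round(gain * p + offset))) for p in pixels]

def clipped_fraction(pixels):
    """Fraction of pixels stuck at 0 or 255, i.e. saturated."""
    return sum(p in (0, 255) for p in pixels) / len(pixels)

row = [20, 60, 100, 140, 180, 220]  # hypothetical pixel intensities
mild = adjust(row, gain=1.1, offset=-5)
harsh = adjust(row, gain=3.0, offset=-150)
print(mild, clipped_fraction(mild))
print(harsh, clipped_fraction(harsh))
```

The mild adjustment keeps every pixel distinct. The harsh one maps the three brightest pixels to 255 and the darkest to 0, so detail in those regions cannot be recovered.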

slide-83
SLIDE 83

Image Manipulation

  • Presenting bands out of context

Juxtaposing two lanes that were not next to each other in an original gel is common practice when preparing figures from hard copy photographs of the gel, and is acceptable manipulation if the figure is digital. Taking a band from one digital image and placing it in a lane in another is improper manipulation, which can usually be spotted by someone scrutinizing a file.

  • ‘Rebuilding’ a gel from several cuts
slide-84
SLIDE 84

Image Manipulation can be detected

10.1172/JCI28824

slide-85
SLIDE 85

Is my plot ethical?

Would a reader come to a different conclusion if they could see the details of the data which were omitted from the plot?

slide-86
SLIDE 86

Practical Design Theory

Boo Virk Simon Andrews boo.virk@babraham.ac.uk simon.andrews@babraham.ac.uk

slide-87
SLIDE 87

Why does good design matter?

  • Good design makes a great first impression
  • Good design makes for effective communication
  • Good design keeps the reader engaged

Art Palvanov (http://www.palvanov.com/)

slide-88
SLIDE 88

Planning

  • Always look at the guidelines for the journal you're submitting to

– https://www.sciencemag.org/authors/instructions-preparing-initial-manuscript
– https://www.nature.com/nature/for-authors/formatting-guide
– https://www.cell.com/figureguidelines

  • Huge variation in the amount of detail they provide
  • Getting things right from the start saves huge amounts of time
slide-89
SLIDE 89

General Figure Guidelines

  • Use distinct colors with comparable visibility and consider colorblind individuals by avoiding the use of red and green for contrast. Recoloring primary data, such as fluorescence images, to color-safe combinations such as green and magenta, turquoise and red, yellow and blue or other accessible color palettes is strongly encouraged. Use of the rainbow color scale should be avoided.

  • Use solid color for filling objects and avoid hatch patterns.
  • Avoid background shading.
  • Figures divided into parts should be labeled with a lower-case, boldface 'a', 'b', etc in the top left-hand corner.

  • Labeling of axes, keys and so on should be in 'sentence case' (first word capitalized only) with no full stop. Units must have a space between the number and the unit, and follow the nomenclature common to your field.

  • Commas should be used to separate thousands.
  • Unusual units or abbreviations should be spelled out in full, or defined in the legend.

https://mts-ncomms.nature.com/cgi-bin/main.plex?form_type=display_auth_instructions
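When labels are generated in code, the two numeric rules above (commas separating thousands, a space between a number and its unit) are easy to apply automatically. This is a minimal sketch using Python's format specification mini-language; the helper name `format_label` is hypothetical.

```python
def format_label(number, unit=""):
    """Format a number with commas between thousands and a space before any unit."""
    text = f"{number:,}"
    return f"{text} {unit}" if unit else text

print(format_label(25000))       # 25,000
print(format_label(1500, "mg"))  # 1,500 mg
```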

slide-90
SLIDE 90

Plan out your panels

  • Plan your panels before starting to draw final figures
  • Plan to be consistent

– Multiple figures of the same type
– Common colour/shape schemes
– Common fonts and sizing
– Common abbreviations and units
– Common naming of samples / conditions

slide-91
SLIDE 91
slide-92
SLIDE 92

Alignment: We are sensitive to aligned edges, even when they are separated

[Figure: three panels for Control, Treatment A and Treatment B: a chart with a y axis from 50 to 200, a plot over days 1 to 6 (y axis 20 to 120), and a chart that includes a 'Dead' category]

slide-93
SLIDE 93

Use a grid to help align disparate parts of a figure

[Figure: the same three panels (Control, Treatment A, Treatment B) rearranged and aligned on a grid]

slide-94
SLIDE 94

Don't make figures too crowded

slide-95
SLIDE 95

Don't make figures too crowded

slide-96
SLIDE 96

Don't cram too much information onto one figure

slide-97
SLIDE 97

Don’t invent your own colour schemes

Colorbrewer2.org

slide-98
SLIDE 98

If possible try to consider colour blind readers

  • Affects 1:12 men and 1:200 women worldwide
  • “If a submitted manuscript happens to go to three male reviewers of Northern European descent, the chance that at least one will be colour blind is 22 percent.”
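The quoted 22 percent is consistent with treating the three reviewers as independent. If a fraction p of such men are colour blind, the chance that at least one of three is colour blind is 1 - (1 - p)^3. This is a minimal sketch; the 8% prevalence is an assumption chosen because it reproduces the quoted figure, not a number given in the course.

```python
def p_at_least_one(prevalence, reviewers=3):
    """Chance that at least one of several independent reviewers is colour blind."""
    return 1 - (1 - prevalence) ** reviewers

print(round(p_at_least_one(0.08), 3))    # 0.221 (assumed 8% prevalence)
print(round(p_at_least_one(1 / 12), 3))  # 0.23 (1 in 12, the worldwide figure above)
```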

slide-99
SLIDE 99

See how well your figure works for colour blind people

  • Gradients are easy to change
  • Categorical colours are very limited
  • Basic interpretability in black and white is ideal

[Figure panels: normal colour vision vs. protanopia]

http://www.color-blindness.com/coblis-color-blindness-simulator/

slide-100
SLIDE 100

Try to consider colour blind readers

slide-101
SLIDE 101

Only use plain colours as fills

  • Use a standard colour scheme
  • Optimise for colour blind people if possible
  • Keep colours plain
slide-102
SLIDE 102

When overlaying information, make sure you have sufficient contrast

[Figure panels: poor contrast vs. good contrast (two examples), vibrating colour, busy background]
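One widely used way to put a number on "sufficient contrast" is the WCAG 2 contrast ratio, which compares the relative luminance of two colours. It is a web-accessibility measure, swapped in here for illustration rather than prescribed by the course; WCAG's AA level asks for at least 4.5:1 for normal-size text. This is a minimal sketch following the WCAG 2 definitions.

```python
def relative_luminance(hex_colour):
    """WCAG 2 relative luminance of an sRGB colour given as '#rrggbb'."""
    channels = []
    for i in (1, 3, 5):
        c = int(hex_colour[i:i + 2], 16) / 255
        # Undo the sRGB gamma curve to get linear light
        channels.append(c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4)
    r, g, b = channels
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg):
    """Contrast ratio from 1 (none) to 21 (black on white)."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)

print(round(contrast_ratio("#000000", "#ffffff"), 1))  # black text on white
print(round(contrast_ratio("#ffff00", "#ffffff"), 2))  # yellow text on white: very poor
```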

slide-103
SLIDE 103

Add overlays to increase contrast

[Figure panels: poor contrast, good contrast]

slide-104
SLIDE 104

Keep text and fonts simple

  • All text in figures should use a sans-serif font
  • All text in figures should be black or white*

[Figure: the labels 'Wild type' and 'Knockout' set in a sans-serif font and in a serif font]

* Some journals insist on coloured text. They're wrong, but you can't fight the system

slide-105
SLIDE 105

Contrast and text

slide-106
SLIDE 106

Keep text horizontal

slide-107
SLIDE 107

Keep text horizontal

  • Numbers are small, text is big
  • All graphs still work when rotated 90°
slide-108
SLIDE 108

Keep text horizontal

slide-109
SLIDE 109

Labelling and annotation

  • Each axis is labelled
  • Axis scales are appropriate
  • Quantitative axes have units
  • Colour scheme is explained
  • Point shapes are explained

You need enough annotation that the figure is understandable on its own.

slide-110
SLIDE 110

Labelling and annotation

slide-111
SLIDE 111

Make sure all text is legible at the final printed size

[Figure: the same axes (ticks 1 to 5 and 6 to 30) shown at two sizes]

6-point font is the smallest size you can comfortably read (just over 2 mm high on paper)
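The 2 mm figure follows from the desktop-publishing point, 1/72 inch, with 25.4 mm per inch. This is the nominal font size (the em height); the visible height of capital letters is smaller. The same arithmetic shows how large text must be in a panel that will be shrunk when placed in the final figure. This is a minimal sketch; both helper names are hypothetical.

```python
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72  # the desktop-publishing (PostScript) point

def points_to_mm(size_pt):
    """Convert a nominal font size in points to millimetres."""
    return size_pt * MM_PER_INCH / POINTS_PER_INCH

def size_before_scaling(final_pt, scale):
    """Font size to use in a panel that will be scaled by `scale` when placed in the figure."""
    return final_pt / scale

print(round(points_to_mm(6), 2))    # 2.12
print(size_before_scaling(6, 0.5))  # 12.0: a panel shrunk to half size needs 12 pt text
```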

slide-112
SLIDE 112

Make sure text is legible

slide-113
SLIDE 113

When resizing be aware of what can and cannot have its aspect ratio changed

  • Things that always need to maintain their aspect ratios:

– Images
– Text
– Circular objects
– Axes with comparable units
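Resizing without distortion means applying one scale factor to both width and height. Choosing the smaller of the two possible factors guarantees the object still fits its slot. This is a minimal sketch; the image size and the 85 x 100 mm slot are hypothetical.

```python
def fit_preserving_aspect(width, height, max_width, max_height):
    """Largest size fitting inside max_width x max_height with the aspect ratio unchanged."""
    scale = min(max_width / width, max_height / height)  # one scale factor for both axes
    return width * scale, height * scale

# Hypothetical: a 1200 x 800 px micrograph placed in an 85 x 100 mm panel slot
w, h = fit_preserving_aspect(1200, 800, 85, 100)
print(round(w, 1), round(h, 1))
```

Stretching the image to fill the whole 85 x 100 slot would use different factors for width and height. That is the distortion to avoid for images, text, circles and axes with comparable units.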


slide-114
SLIDE 114

Checklist

  • Consistent use of

– Figure types
– Colours / Shapes
– Fonts and Sizes
– Names

  • Colour

– Uses a standard scheme
– Colourblind friendly (if possible)

  • All figures are correctly annotated

– Axes labelled with names and units
– Colours and Shapes explained

  • Text

– Sans serif font
– Large enough to be legible
– Ideally in black or white
– Sufficient contrast to be legible