DATA VISUALIZATION INTRODUCTION TO DATA ANALYSIS LEARNING GOALS - - PowerPoint PPT Presentation



SLIDE 1

DATA VISUALIZATION

INTRODUCTION TO DATA ANALYSIS

SLIDE 2

LEARNING GOALS

▸ obtain a basic understanding of better/worse plotting
▸ understand the idea of hypothesis-driven visualization
▸ develop a basic understanding of the 'grammar of graphics'
▸ get familiar with common visualization strategies
  ▸ barplots, densities, violins, error bars etc.
▸ be able to fine-tune graphs for better visualization

SLIDE 3

Motivation

SLIDE 4

WHY VISUALIZE?

▸ a picture can be worth a million words (and numbers)
▸ every data analysis should start with a ‘getting to know the data’ phase
▸ visualizing different aspects of the data is key to getting intimate with it
▸ data visualization as a means of communication (with others)
▸ hypothesis-driven visualization: obtain visual (suggestive) evidence regarding a relevant research question

SLIDE 5

WHY VISUALIZE?

▸ a picture can be worth a million words (and numbers)
▸ summary statistics can be misleading (because of information loss)
▸ every data analysis should start with a ‘getting to know the data’ phase
▸ use extensive visualization to get intimate with the data
▸ data visualization as a means of communication (with others / with yourself)
▸ hypothesis-driven visualization: obtain visual (suggestive) evidence regarding a relevant research question

SLIDE 6

BEYOND SUMMARY STATISTICS

SLIDE 7

MOTIVATING EXAMPLE :: ANSCOMBE’S QUARTET

▸ a famous data set that ships with core R (`anscombe`)

(code with annotations: messy start → tidy up → nice!)
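The "tidy up" step turns the wide data (one column per variable and group, e.g. `x1` or `y3`) into a long format with one row per observation. A minimal Python sketch of that reshaping, using only the first two rows of R's `anscombe` data frame; the function name `tidy_anscombe` and the output columns `group`, `x`, `y` are illustrative choices, not the slide's own code.

```python
# First two rows of R's `anscombe` data frame in wide ("messy") format:
# the group number is hidden inside the column names.
wide_rows = [
    {"x1": 10, "x2": 10, "x3": 10, "x4": 8, "y1": 8.04, "y2": 9.14, "y3": 7.46, "y4": 6.58},
    {"x1": 8, "x2": 8, "x3": 8, "x4": 8, "y1": 6.95, "y2": 8.14, "y3": 6.77, "y4": 5.76},
]


def tidy_anscombe(rows):
    """Reshape to long ("tidy") format: one row per observation (group, x, y)."""
    long_rows = []
    for row in rows:
        for group in (1, 2, 3, 4):
            # Pull the group number out of the column name into its own column
            long_rows.append({"group": group, "x": row[f"x{group}"], "y": row[f"y{group}"]})
    return long_rows


for r in tidy_anscombe(wide_rows):
    print(r)
```

Once the group is an ordinary column, grouping, summarising and plotting by group become one-liners.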

SLIDE 8

MOTIVATING EXAMPLE :: ANSCOMBE’S QUARTET

(code with annotations: input data → summarise → all four groups look very similar!)
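The summary can be reproduced with a short Python sketch over the full quartet (values as in R's `anscombe` data frame). For each group it computes the means, sample variances, Pearson correlation and least-squares regression line; the helper name `summarise` is ours.

```python
import statistics

# Anscombe's quartet: groups 1-3 share the same x values
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    1: (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    2: (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    3: (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    4: ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarise(x, y):
    """Means, sample variances, Pearson correlation and least-squares line."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "mean_y": my,
        "var_x": statistics.variance(x),
        "var_y": statistics.variance(y),
        "cor": sxy / (sxx * syy) ** 0.5,
        "slope": slope,
        "intercept": my - slope * mx,
    }


summaries = {g: summarise(x, y) for g, (x, y) in quartet.items()}
for g, s in summaries.items():
    print(g, {k: round(v, 2) for k, v in s.items()})
```

All four groups come out with mean x = 9, variance of x = 11, mean y ≈ 7.50, variance of y ≈ 4.1, correlation ≈ 0.82 and regression line y ≈ 3.00 + 0.50x. The numbers alone give no hint of how different the four scatter plots look.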

SLIDE 9

MOTIVATING EXAMPLE :: ANSCOMBE’S QUARTET

▸ quite different patterns despite near-identical correlations

SLIDE 10

The good, the bad and the info-graphic

SLIDE 11

PRINCIPLES OF GOOD VISUALIZATION

▸ maximize the data-ink ratio (Tufte 1983)
  ▸ maximize information, minimize ink
  ▸ avoid chart junk
▸ ink vs. processing effort
  ▸ analogy to language
  ▸ information flow
  ▸ ease of processing
  ▸ bound by conventional rules
▸ hypothesis-driven visualization
  ▸ relevance of information
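Tufte's (1983) data-ink ratio is a proportion of ink:

```latex
\[
\text{data-ink ratio}
  = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
  = 1 - \text{proportion of the graphic that can be erased without loss of data-information}
\]
```

So a chart whose ink is half grid lines, borders and decoration has a data-ink ratio of at most 0.5.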

SLIDE 12

EXAMPLE OF UNINFORMATIVE PLOTTING

SLIDE 13

EXAMPLE OF INFORMATIVE HYPOTHESIS-DRIVEN PLOTTING

SLIDE 14

EXAMPLE OF UNINFORMATIVE PLOTTING

SLIDE 15

EXAMPLE OF (STILL) UNINFORMATIVE PLOTTING

SLIDE 16

EXAMPLE OF INFORMATIVE HYPOTHESIS-DRIVEN PLOTTING

SLIDE 17

INFOGRAPHICS

▸ ≠ hypothesis-driven visualization
▸ purposes:
  ▸ memorability
  ▸ eye-catchiness
  ▸ persuasion
  ▸ …

SLIDE 18

Basics of ggplot

SLIDE 19

BASICS OF GGPLOT

▸ “layered grammar of graphics”
▸ incremental composition
▸ layers
▸ system of rich convenience functions & defaults
▸ grouping
▸ multiple ways of customization

SLIDE 20

INCREMENTAL COMPOSITION

(code with annotations: create a plot → display the plot → output 😊)
SLIDE 21

INCREMENTAL COMPOSITION

(output)
SLIDE 22

INCREMENTAL COMPOSITION

(output)

▸ piping the data into the first argument slot
▸ declaring the mapping globally for all subsequent calls to `geom_` functions
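As a language-neutral illustration of incremental composition, the Python sketch below models a plot as an immutable object that each `+` extends by one layer, with a globally declared mapping inherited by every layer. The `Plot` and `Layer` classes are hypothetical stand-ins written for this example, not ggplot2's (R) API, and they do no drawing.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class Layer:
    """Hypothetical layer: a geom name plus an optional local mapping."""
    geom: str
    mapping: Dict[str, str] = field(default_factory=dict)


@dataclass(frozen=True)
class Plot:
    """Hypothetical plot: data, a global mapping and an ordered tuple of layers."""
    data: Tuple[dict, ...]
    mapping: Dict[str, str] = field(default_factory=dict)
    layers: Tuple[Layer, ...] = ()

    def __add__(self, layer: Layer) -> "Plot":
        # '+' never mutates: it returns a new plot with the layer appended,
        # so intermediate plots can be stored and extended step by step.
        return Plot(self.data, self.mapping, self.layers + (layer,))

    def render(self) -> list:
        # "Displaying" just lists each layer with its effective mapping:
        # the global mapping, updated by whatever the layer declares locally.
        return [(layer.geom, {**self.mapping, **layer.mapping}) for layer in self.layers]


rows = ({"x": 1, "y": 2, "group": "a"}, {"x": 2, "y": 4, "group": "b"})
base = Plot(rows, mapping={"x": "x", "y": "y"})                # create the plot object
p = base + Layer("point") + Layer("line", {"colour": "group"})  # add layers one by one
print(p.render())
# [('point', {'x': 'x', 'y': 'y'}), ('line', {'x': 'x', 'y': 'y', 'colour': 'group'})]
print(len(base.layers))  # 0: the original plot object is unchanged
```

Keeping each step immutable is what makes it safe to build a base plot once and derive several variants from it.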

SLIDE 23

FULL EXAMPLE

SLIDE 24

FULL EXAMPLE

(annotated plot: title, subtitle, legend for group distinction, y-axis label, y-axis tick labels, linear regression lines, data points, grid lines)

SLIDE 25

FULL EXAMPLE :: CODE

SLIDE 26

LAYERED GRAMMAR OF GRAPHICS

(output: equivalent)

▸ `geom_` functions are wrappers
  ▸ with a default statistical transformation, position, axis type etc.
▸ defaults can be overridden
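A minimal Python sketch of the wrapper idea: a general layer constructor takes every component explicitly, and convenience functions fix the geom and fill in defaults that the caller can still override. `layer_spec`, `geom_point_like` and `geom_bar_like` are hypothetical names; the defaults mirror those documented for ggplot2's `geom_point()` (stat and position `"identity"`) and `geom_bar()` (stat `"count"`, position `"stack"`).

```python
def layer_spec(geom, stat, position, **params):
    """Hypothetical general layer constructor: every component is explicit."""
    return {"geom": geom, "stat": stat, "position": position, **params}


def geom_point_like(stat="identity", position="identity", **params):
    # Wrapper: fixes the geom and supplies default stat/position,
    # any of which the caller may override.
    return layer_spec("point", stat, position, **params)


def geom_bar_like(stat="count", position="stack", **params):
    # Bars count the rows per category and stack by default.
    return layer_spec("bar", stat, position, **params)


print(geom_bar_like())
# {'geom': 'bar', 'stat': 'count', 'position': 'stack'}
print(geom_bar_like(position="dodge"))  # override one default
# {'geom': 'bar', 'stat': 'count', 'position': 'dodge'}
# The wrapper call is equivalent to spelling everything out:
print(geom_point_like(alpha=0.5) == layer_spec("point", "identity", "identity", alpha=0.5))
# True
```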

SLIDE 27

Layers

SLIDE 28

LAYERS

SLIDE 29

LAYER ORDER

SLIDE 30

OPACITY

SLIDE 31

DIFFERENT DATA FOR DIFFERENT LAYERS

SLIDE 32

Grouping

SLIDE 33

GROUPING

▸ use group information for a uniform display of each group in terms of color, shape, etc.

SLIDE 34

GLOBAL GROUPING

▸ global grouping applies to all subsequent layers

SLIDE 35

OVERWRITING GROUPING INFORMATION

▸ overriding grouping information locally

SLIDE 36

DIFFERENT GROUPING IN DIFFERENT LAYERS

▸ each layer has its own grouping information
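The three grouping cases (global, overridden locally, different per layer) come down to how each layer resolves its mapping. A minimal Python sketch with hypothetical helpers `resolve_mapping` and `split_into_groups`; the `inherit` flag loosely mirrors ggplot2's `inherit.aes` argument. For brevity the sketch groups by a single aesthetic, whereas ggplot2 by default forms groups from the combination of all discrete mapped variables (or an explicit `group` aesthetic).

```python
from collections import defaultdict


def resolve_mapping(global_mapping, local_mapping=None, inherit=True):
    """Effective mapping of one layer (hypothetical helper).

    Local entries override global entries for the same aesthetic; with
    inherit=False the layer ignores the global mapping altogether.
    """
    mapping = dict(global_mapping) if inherit else {}
    mapping.update(local_mapping or {})
    return mapping


def split_into_groups(rows, mapping, aesthetic="colour"):
    """Split rows by the column mapped to `aesthetic` (one group if unmapped)."""
    column = mapping.get(aesthetic)
    groups = defaultdict(list)
    for row in rows:
        groups[row[column] if column else None].append(row)
    return dict(groups)


rows = [
    {"x": 1, "y": 2.0, "condition": "A", "subject": "s1"},
    {"x": 2, "y": 3.5, "condition": "B", "subject": "s1"},
    {"x": 3, "y": 5.0, "condition": "A", "subject": "s2"},
]
global_mapping = {"x": "x", "y": "y", "colour": "condition"}

# Layer 1 (e.g. points): inherits the global grouping -> groups A and B
points = split_into_groups(rows, resolve_mapping(global_mapping))
# Layer 2: overrides the grouping locally -> one group per subject
by_subject = split_into_groups(rows, resolve_mapping(global_mapping, {"colour": "subject"}))
# Layer 3 (e.g. one regression line for all data): no inheritance -> a single group
overall = split_into_groups(rows, resolve_mapping(global_mapping, {"x": "x", "y": "y"}, inherit=False))

print(sorted(points), sorted(by_subject), list(overall))
# ['A', 'B'] ['s1', 's2'] [None]
```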

SLIDE 37

Geoms & plot types

SLIDE 38

SCATTER PLOTS

SLIDE 39

CURVE AND LINE FITS

SLIDE 40

LINE PLOTS

SLIDE 41

BAR PLOTS

SLIDE 42

BAR PLOTS

SLIDE 43

BAR PLOTS CAN BE UNDERINFORMATIVE

▸ suboptimal data-ink ratio
▸ lacks distributional information

SLIDE 44

BAR PLOTS CAN BE OKAY

▸ choice proportions
▸ with 95% bootstrapped CIs
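One common way to get such intervals is the percentile bootstrap: resample the trials with replacement, recompute the proportion each time, and take the central 95% of those estimates. A minimal Python sketch; the slide does not say which bootstrap variant it uses, so the percentile method, the 10,000 resamples, the seed and the choice data below are all assumptions.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=10_000, level=0.95, seed=1):
    """Percentile bootstrap confidence interval for the mean of `values`.

    With 0/1 choice data the mean is the choice proportion, i.e. the bar height.
    """
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(values)
    # Resample with replacement and recompute the proportion each time
    estimates = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    # The central `level` share of the bootstrap distribution forms the interval
    lower = estimates[round((1 - level) / 2 * n_resamples)]
    upper = estimates[round((1 + level) / 2 * n_resamples) - 1]
    return lower, upper


# Hypothetical data: 20 trials, option chosen on 14 of them
choices = [1] * 14 + [0] * 6
proportion = statistics.fmean(choices)
low, high = bootstrap_ci(choices)
print(f"proportion = {proportion:.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```

The bar is drawn at `proportion` and the error bar spans `[low, high]`. Within ggplot2, `stat_summary(fun.data = mean_cl_boot)` computes a bootstrapped interval of this kind (it relies on the Hmisc package).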

SLIDE 45

HISTOGRAMS

▸ fix bins
▸ count the number of data points in each bin
▸ plot each count as a bar
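The three steps translate directly into code. A minimal Python sketch with equal-width bins over a chosen range; the helper name `histogram_counts`, the half-open bin convention and the example values are assumptions, and ggplot2's own bin placement differs in detail.

```python
def histogram_counts(values, n_bins, lower, upper):
    """Count values in `n_bins` equal-width bins covering [lower, upper].

    Bins are half-open [left, right) except the last, which also includes `upper`.
    Returns a list of (left_edge, right_edge, count) triples, one per bar.
    """
    width = (upper - lower) / n_bins  # step 1: fix the bins
    counts = [0] * n_bins
    for v in values:  # step 2: count the data points in each bin
        if v < lower or v > upper:
            continue  # outside the plotted range
        index = min(int((v - lower) / width), n_bins - 1)
        counts[index] += 1
    # step 3: each triple describes one bar (its horizontal extent and height)
    return [(lower + i * width, lower + (i + 1) * width, c) for i, c in enumerate(counts)]


result = histogram_counts([1, 2, 2, 3, 3, 3, 4, 9], n_bins=4, lower=0, upper=10)
for left, right, count in result:
    print(f"[{left:4.1f}, {right:4.1f}) {'#' * count}")
```

The number of bins changes the picture considerably; ggplot2's `geom_histogram()` uses 30 bins unless told otherwise and prompts you to pick a better value.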

SLIDE 46

BOX PLOTS

▸ visualize common summary statistics
  ▸ median
  ▸ 25% & 75% quantiles
  ▸ …
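A Tukey-style box plot, which is what ggplot2's `geom_boxplot()` draws, marks the median and the 25% and 75% quantiles, extends whiskers to the most extreme points within 1.5 × IQR of the box, and shows anything beyond as an outlier. A minimal Python sketch of these statistics; the helper name `box_stats` and the example data are ours. The `"inclusive"` quantile method interpolates at position (n − 1)p, like R's default `quantile()` (type 7).

```python
import statistics


def box_stats(values):
    """Summary statistics drawn by a Tukey-style box plot."""
    # Box: 25% quantile, median, 75% quantile
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers reach the most extreme data points inside the fences
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "lower_whisker": min(inside),
        "q1": q1,
        "median": median,
        "q3": q3,
        "upper_whisker": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 30]))
# {'lower_whisker': 1, 'q1': 3.0, 'median': 5.0, 'q3': 7.0, 'upper_whisker': 8, 'outliers': [30]}
```

Note that the mean (here ≈ 7.3, pulled up by the outlier) is not part of the box plot.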

SLIDE 47

DENSITY PLOTS

▸ “generalized histogram”
▸ uses kernel density estimation to produce a smoothed curve
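A kernel density estimate places a smooth bump (the kernel) on every observation and averages the bumps. A minimal Python sketch, assuming a Gaussian kernel and a fixed, hand-picked bandwidth of 0.8 on made-up data; ggplot2 instead chooses the bandwidth by a rule of thumb by default (`bw = "nrd0"`).

```python
import math


def kernel_density(data, bandwidth):
    """Return a function estimating the density of `data` at a point x.

    Each observation contributes a Gaussian bump with standard deviation
    `bandwidth`; the estimate is the average of all bumps.
    """
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density


data = [1.0, 2.0, 2.5, 3.0, 7.0]
f = kernel_density(data, bandwidth=0.8)
grid = [i * 0.1 for i in range(-50, 151)]  # -5.0 ... 15.0
curve = [(x, f(x)) for x in grid]  # points of the smoothed curve to plot
# Riemann sum: the area under the curve should be (almost exactly) 1
area = sum(y for _, y in curve) * 0.1
print(round(area, 3))
```

Larger bandwidths give smoother, flatter curves; smaller ones follow individual data points more closely.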

SLIDE 48

VIOLIN PLOTS

▸ “mirrored density plots”
▸ good for multi-group comparisons
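Because a violin is a density mirrored around a group's axis position, one outline per group can be sketched by computing a kernel density on a shared grid and reflecting it. The helper `violin_outline`, the Gaussian kernel, the bandwidth of 0.3 and the example groups are assumptions for illustration. Each violin is scaled to the same maximum half-width, which corresponds to ggplot2's `geom_violin(scale = "width")` rather than its default `scale = "area"`.

```python
from statistics import NormalDist


def violin_outline(values, grid, bandwidth, max_half_width=0.4):
    """Left/right outline of one violin (hypothetical helper).

    A Gaussian kernel density estimate is computed on `grid` and mirrored
    around the group's axis position; widths are scaled so that the widest
    point of the violin has half-width `max_half_width`.
    """
    kernels = [NormalDist(mu=v, sigma=bandwidth) for v in values]
    dens = [sum(k.pdf(y) for k in kernels) / len(values) for y in grid]
    scale = max_half_width / max(dens)
    # (y, left offset, right offset): the left side mirrors the right side
    return [(y, -d * scale, d * scale) for y, d in zip(grid, dens)]


groups = {"A": [1.0, 1.2, 2.0, 2.1, 2.2], "B": [0.5, 1.5, 2.5, 3.5]}
grid = [i * 0.05 for i in range(0, 81)]  # 0.0 ... 4.0, shared by all groups
outlines = {name: violin_outline(vals, grid, bandwidth=0.3) for name, vals in groups.items()}
```

Drawing each group's outline at its own x position over the shared y grid gives the side-by-side comparison of distribution shapes.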

SLIDE 49

RUG PLOTS

▸ show data points near axis

SLIDE 50

RUG PLOTS

▸ show data points near axis

SLIDE 51

ANNOTATION

SLIDE 52

ANNOTATION

SLIDE 53

Faceting

SLIDE 54

FACET GRID

SLIDE 55

FACET WRAP

SLIDE 56

Bells & whistles

SLIDE 57

READY-MADE THEMES

SLIDE 58

TWEAKING AN EXISTING THEME