Exploring categorical data: Exploratory Data Analysis in R

SLIDE 1

Exploring categorical data

EXPLORATORY DATA ANALYSIS IN R

Andrew Bray

Assistant Professor, Reed College

SLIDE 2

Comics dataset

comics

# A tibble: 23,272 x 11
   name                                    id               align
   <fctr>                                  <fctr>           <fctr>
 1 Spider-Man (Peter Parker)               Secret Identity  Good
 2 Captain America (Steven Rogers)         Public Identity  Good
 3 Wolverine (James \\"Logan\\" Howlett)   Public Identity  Neutral
 4 Iron Man (Anthony \\"Tony\\" Stark)     Public Identity  Good
 5 Thor (Thor Odinson)                     No Dual Identity Good
 6 Benjamin Grimm (Earth-616)              Public Identity  Good
 7 Reed Richards (Earth-616)               Public Identity  Good
 8 Hulk (Robert Bruce Banner)              Public Identity  Good
 9 Scott Summers (Earth-616)               Public Identity  Neutral
10 Jonathan Storm (Earth-616)              Public Identity  Good
# ... with 23,262 more rows, and 8 more variables: eye <fctr>,
#   hair <fctr>, gender <fctr>, gsm <fctr>, alive <fctr>,
#   appearances <int>, first_appear <fctr>, publisher <fctr>

SLIDE 3

Working with factors

levels(comics$align)

"Bad" "Good" "Neutral" "Reformed Criminals"

levels(comics$id)

"No Dual" "Public" "Secret" "Unknown"

# Note: NAs ignored by levels() function
table(comics$id, comics$align)

           Bad Good Neutral Reformed Criminals
  No Dual  474  647     390                  0
  Public  2172 2930     965                  1
  Secret  4493 2475     959                  1
  Unknown    7    0       2                  0
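To make the note about missing values concrete, here is a minimal sketch on a hypothetical five-element factor (the values are made up for illustration and are not taken from the comics data). It shows that levels() does not list NA, and that table() also drops NA unless the useNA argument asks for it.

```r
# Hypothetical toy factor; illustrative values only, not the comics data
id <- factor(c("Public", "Secret", NA, "Public", "No Dual"))

# levels() lists only the non-missing categories
id_levels <- levels(id)
print(id_levels)

# table() drops NA by default ...
counts_default <- table(id)
print(counts_default)

# ... but keeps it as its own cell when useNA is set
counts_with_na <- table(id, useNA = "ifany")
print(counts_with_na)
```

Comparing sum(counts_default) with length(id) is a quick way to see how many observations a table silently left out.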

SLIDE 6

Bar chart

library(ggplot2)  # Load package

ggplot(comics, aes(x = id, fill = align)) +
  geom_bar()

SLIDE 7

Let's practice!

EXPLORATORY DATA ANALYSIS IN R

SLIDE 8

Counts vs. proportions

EXPLORATORY DATA ANALYSIS IN R

Andrew Bray

Assistant Professor, Reed College

SLIDE 9

From counts to proportions

options(scipen = 999, digits = 3)  # Simplify display format

tab_cnt <- table(comics$id, comics$align)
tab_cnt

           Bad Good Neutral
  No Dual  474  647     390
  Public  2172 2930     965
  Secret  4493 2475     959
  Unknown    7    0       2

prop.table(tab_cnt)

               Bad     Good  Neutral
  No Dual 0.030553 0.041704 0.025139
  Public  0.140003 0.188862 0.062202
  Secret  0.289609 0.159533 0.061815
  Unknown 0.000451 0.000000 0.000129

sum(prop.table(tab_cnt))

1

SLIDE 10

Conditional proportions

prop.table(tab_cnt, 1)

            Bad  Good Neutral
  No Dual 0.314 0.428   0.258
  Public  0.358 0.483   0.159
  Secret  0.567 0.312   0.121
  Unknown 0.778 0.000   0.222

prop.table(tab_cnt, 2)

               Bad     Good  Neutral
  No Dual 0.066331 0.106907 0.168394
  Public  0.303946 0.484137 0.416667
  Secret  0.628743 0.408956 0.414076
  Unknown 0.000980 0.000000 0.000864
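The second argument of prop.table() chooses what is conditioned on: 1 makes each row sum to 1, and 2 makes each column sum to 1. The sketch below checks this. Since the comics data frame is not available here, it rebuilds tab_cnt by hand from the counts printed on the slide, which is an assumption of this example rather than how the deck creates it.

```r
# Rebuild the 4 x 3 count table from the counts printed on the slide
tab_cnt <- as.table(matrix(
  c( 474,  647, 390,
    2172, 2930, 965,
    4493, 2475, 959,
       7,    0,   2),
  nrow = 4, byrow = TRUE,
  dimnames = list(id    = c("No Dual", "Public", "Secret", "Unknown"),
                  align = c("Bad", "Good", "Neutral"))
))

row_prop <- prop.table(tab_cnt, 1)  # condition on id: rows sum to 1
col_prop <- prop.table(tab_cnt, 2)  # condition on align: columns sum to 1

print(rowSums(row_prop))
print(colSums(col_prop))

# The same row proportions by hand: divide each row by its row total
by_hand <- tab_cnt / rowSums(tab_cnt)
print(all.equal(as.vector(by_hand), as.vector(row_prop)))
```

Reading the two tables side by side: among characters with a secret identity, about 57% are bad (row proportions), whereas among bad characters, about 63% have a secret identity (column proportions).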

SLIDE 14

Conditional bar chart

ggplot(comics, aes(x = id, fill = align)) +
  geom_bar(position = "fill") +
  ylab("proportion")

SLIDE 17

Conditional bar chart

ggplot(comics, aes(x = align, fill = id)) +
  geom_bar(position = "fill") +
  ylab("proportion")
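Swapping the x and fill mappings changes which variable is conditioned on. As a hedged sketch, the same two charts can be drawn from an already tabulated set of counts. The table is rebuilt here from the counts printed earlier in the deck, and geom_col() is used in place of geom_bar() because the data are pre-counted; both choices are assumptions of this example, not the approach shown on the slides.

```r
library(ggplot2)

# Counts copied from the tab_cnt output shown earlier in the deck
tab_cnt <- as.table(matrix(
  c( 474,  647, 390,
    2172, 2930, 965,
    4493, 2475, 959,
       7,    0,   2),
  nrow = 4, byrow = TRUE,
  dimnames = list(id    = c("No Dual", "Public", "Secret", "Unknown"),
                  align = c("Bad", "Good", "Neutral"))
))

# One row per (id, align) pair, with the count in a column named Freq
cnt_df <- as.data.frame(tab_cnt)

# Condition on id: each id bar is split by the share of each alignment
p_by_id <- ggplot(cnt_df, aes(x = id, y = Freq, fill = align)) +
  geom_col(position = "fill") +
  ylab("proportion")

# Condition on align: each alignment bar is split by the share of each id
p_by_align <- ggplot(cnt_df, aes(x = align, y = Freq, fill = id)) +
  geom_col(position = "fill") +
  ylab("proportion")
```

Calling print(p_by_id) or print(p_by_align) draws the chart. In both, position = "fill" rescales every bar to height 1, so only the within-bar proportions are comparable, not the group sizes.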

SLIDE 19

Let's practice!

EXPLORATORY DATA ANALYSIS IN R

SLIDE 20

Distribution of one variable

EXPLORATORY DATA ANALYSIS IN R

Andrew Bray

Assistant Professor, Reed College

SLIDE 21

Marginal distribution

table(comics$id)

No Dual  Public  Secret Unknown
   1511    6067    7927       9

tab_cnt <- table(comics$id, comics$align)
tab_cnt

           Bad Good Neutral
  No Dual  474  647     390
  Public  2172 2930     965
  Secret  4493 2475     959
  Unknown    7    0       2
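The one-way counts above are the row totals of the two-way table: 474 + 647 + 390 = 1511 for No Dual, and so on. A minimal sketch of recovering a marginal distribution from a two-way table with margin.table() follows, assuming tab_cnt is rebuilt by hand from the counts on the slide because the comics data frame is not available here.

```r
# Rebuild the two-way table from the counts printed on the slide
tab_cnt <- as.table(matrix(
  c( 474,  647, 390,
    2172, 2930, 965,
    4493, 2475, 959,
       7,    0,   2),
  nrow = 4, byrow = TRUE,
  dimnames = list(id    = c("No Dual", "Public", "Secret", "Unknown"),
                  align = c("Bad", "Good", "Neutral"))
))

# Marginal distribution of id: sum the counts over align
id_margin <- margin.table(tab_cnt, 1)
print(id_margin)

# Marginal distribution of align: sum the counts over id
align_margin <- margin.table(tab_cnt, 2)
print(align_margin)

# addmargins() appends both sets of totals to the table itself
print(addmargins(tab_cnt))
```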

SLIDE 22

Simple barchart

ggplot(comics, aes(x = id)) +
  geom_bar()

SLIDE 23

Faceting

tab_cnt <- table(comics$id, comics$align)
tab_cnt

           Bad Good Neutral
  No Dual  474  647     390
  Public  2172 2930     965
  Secret  4493 2475     959
  Unknown    7    0       2

SLIDE 24

Faceted barcharts

ggplot(comics, aes(x = id)) +
  geom_bar() +
  facet_wrap(~align)

SLIDE 25

Faceting vs. stacking

SLIDE 30

Pie chart vs. bar chart

SLIDE 33

Let's practice!

EXPLORATORY DATA ANALYSIS IN R