SLIDE 1

Data Visualization

Cohen Chapter 2

EDUC/PSY 6600

SLIDE 3

Always plot your data first!

"Always." - Severus Snape

Why?

- Outliers and impossible values
- Determine correct statistical approach
- Assumptions and diagnostics
- Discover new relationships

SLIDE 4

The Visualization Paradox

- Often the most informative aspect of analysis
- Communicates the "data story" the best
- Most abused area of quantitative science
- Figures can be very misleading

Misleading Graphs

SLIDE 5

Much better

SLIDE 7

Keys to Good Viz's

- Graphical method should match level of measurement
- Label all axes and include a figure caption
- Simplicity and clarity
- Avoid 'chartjunk'
- Unless there are 3 or more variables, avoid 3D figures (and even then, avoid it)
- Black & white, grayscale/pattern fine for most simple figures
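
As a rough illustration of the labeling and simplicity guidelines, the sketch below labels both axes, adds a caption, and uses a plain black-and-white theme. It assumes the ggplot2 package is installed and uses R's built-in mtcars data as a stand-in; the label and caption text are made up for the example.

```r
library(ggplot2)

# mtcars ships with R; it stands in for real data here (assumption)
p <- ggplot(mtcars) +
  aes(x = factor(cyl)) +
  geom_bar() +
  labs(x = "Number of cylinders",
       y = "Number of cars",
       caption = "Figure 1. Cars by cylinder count (mtcars).") +
  theme_bw()   # plain black-and-white theme, no 3D or decoration

p
```

The variable is wrapped in factor() so that the bar chart matches its categorical level of measurement.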

SLIDE 9

Data Visualizations

Takes practice -- try a bunch of stuff

Resources

- Edward Tufte's books
- "R for Data Science" by Grolemund and Wickham
- "Data Visualization for Social Science" by Healy

SLIDE 10

Frequency Distributions

- Counting the number of occurrences of unique events
- Categorical or continuous, just like with tableF() and table1()
- Can see central tendency (continuous data) or most common value (categorical data)
- Can see range and extremes

    ──────────────────────────────────────────────────────
     x       Freq CumFreq Percent CumPerc Valid  CumValid
     1       265  265     26.50%  26.50%  27.32% 27.32%
     2       222  487     22.20%  48.70%  22.89% 50.21%
     3       242  729     24.20%  72.90%  24.95% 75.15%
     4       241  970     24.10%  97.00%  24.85% 100.00%
     Missing 30   1000    3.00%   100.00%
    ──────────────────────────────────────────────────────
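
To make the table's columns concrete, here is a minimal base-R sketch that rebuilds them from the counts in the table. It is not how tableF() works internally; the vector x is reconstructed from the printed frequencies, and the variable names are chosen for this example.

```r
# Counts taken from the table; NA marks the 30 missing responses
x <- c(rep(1:4, times = c(265, 222, 242, 241)), rep(NA, 30))

freq     <- table(x, useNA = "ifany")   # counts, including missing
percent  <- 100 * freq / length(x)      # out of all 1000 cases
valid    <- table(x)                    # table() drops NA by default
validpct <- 100 * valid / sum(valid)    # out of the 970 non-missing cases

cumsum(freq)                 # cumulative frequency
round(percent, 2)            # percent of all cases
round(cumsum(validpct), 2)   # cumulative valid percent
```

The "Percent" column divides by every case, while the "Valid" column divides only by the cases that have a value, which is why the two differ whenever data are missing.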

SLIDE 12

Frequencies and Viz's Together ❤

Bar Graph

Histogram

SLIDE 14

What does DISTRIBUTION mean?

The way that the data points are scattered

For Continuous

- General shape
- Exceptions (outliers)
- Modes (peaks)
- Center & spread (chap 3)
- Histogram

For Categorical

- Counts of each
- Percent or Rate (adjusts for an 'out of' to compare)
- Bar chart
- Pie chart (avoid!)
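
The point about rates adjusting for an 'out of' can be shown with a small sketch. The counts and group sizes below are hypothetical, invented only to show how raw counts and rates can rank groups differently.

```r
# Hypothetical counts (not from the course data): cases and group sizes
cases <- c(A = 30, B = 20)
size  <- c(A = 200, B = 50)

rate <- 100 * cases / size   # percent "out of" each group's own size
rate
```

Group A has more cases, but group B has the higher rate once group size is taken into account.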

SLIDE 15

Let's Apply This To the Ihno Dataset

SLIDE 16

Reminder


SLIDE 18

Read in the Data

    library(tidyverse)   # the easy button
    library(rio)         # read in Excel files
    library(furniture)   # nice tables

    data_raw <- rio::import("Ihno_dataset.xls") %>%
      dplyr::rename_all(tolower)   # converts all variable names to lower case

And Clean It

    data_clean <- data_raw %>%
      dplyr::mutate(majorF = factor(major,
                                    levels = c(1, 2, 3, 4, 5),
                                    labels = c("Psychology", "Premed",
                                               "Biology", "Sociology",
                                               "Economics"))) %>%
      dplyr::mutate(coffeeF = factor(coffee,
                                     levels = c(0, 1),
                                     labels = c("Not a regular coffee drinker",
                                                "Regularly drinks coffee")))
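
The cleaning step relies on how factor() pairs levels with labels. The following is a small standalone sketch of that behavior using made-up codes rather than the real dataset; the value 9 is included on purpose to show what happens to a code that is not listed in levels.

```r
# Hypothetical codes; 9 is not listed in `levels`, so it becomes NA
coffee <- c(0, 1, 1, 0, 9)

coffeeF <- factor(coffee,
                  levels = c(0, 1),
                  labels = c("Not a regular coffee drinker",
                             "Regularly drinks coffee"))

table(coffeeF, useNA = "ifany")   # counts per label, plus any NA
```

Checking the table after recoding is a quick way to spot impossible values that were silently turned into missing data.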

SLIDE 19

Frequency Distributions

    data_clean %>%
      furniture::tableF(majorF)

    ##
    ## ─────────────────────────────────────────
    ##  majorF     Freq CumFreq Percent CumPerc
    ##  Psychology 29   29      29.00%  29.00%
    ##  Premed     25   54      25.00%  54.00%
    ##  Biology    21   75      21.00%  75.00%
    ##  Sociology  15   90      15.00%  90.00%
    ##  Economics  10   100     10.00%  100.00%
    ## ─────────────────────────────────────────

    data_clean %>%
      furniture::tableF(phobia)

    ##
    ## ─────────────────────────────────────
    ##  phobia Freq CumFreq Percent CumPerc
    ##  0      12   12      12.00%  12.00%
    ##  1      15   27      15.00%  27.00%
    ##  2      12   39      12.00%  39.00%
    ##  3      16   55      16.00%  55.00%
    ##  4      21   76      21.00%  76.00%
    ##  5      11   87      11.00%  87.00%
    ##  6      1    88      1.00%   88.00%
    ##  7      4    92      4.00%   92.00%
    ##  8      4    96      4.00%   96.00%
    ##  9      1    97      1.00%   97.00%
    ##  10     3    100     3.00%   100.00%
    ## ─────────────────────────────────────

SLIDE 21

Frequency Viz's

For viz's, we will use ggplot2

This provides the most powerful, beautiful framework for data visualizations

- It is built on making layers
- Each plot has a "geom" function
  e.g. geom_bar() for bar charts, geom_histogram() for histograms, etc.
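
The idea of building a plot in layers can be sketched as follows. This assumes ggplot2 is installed; the toy data frame and the object names p_empty and p_bars are hypothetical stand-ins, not part of the course code.

```r
library(ggplot2)

# Hypothetical toy data standing in for data_clean
toy <- data.frame(major = c("Psychology", "Premed", "Psychology", "Biology"))

p_empty <- ggplot(toy) + aes(major)   # data + mapping only: axes, no bars
p_bars  <- p_empty + geom_bar()       # adding a geom adds one layer

length(p_empty$layers)   # number of layers before adding a geom
length(p_bars$layers)    # number of layers after adding geom_bar()
```

Printing p_empty draws only an empty set of axes, which is why a plot without a geom function appears blank.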

SLIDE 23

Bar Charts

    data_clean %>%
      ggplot() +
      aes(majorF)

    data_clean %>%
      ggplot() +
      aes(majorF) +
      geom_bar()

SLIDE 24

Bar Charts

    data_clean %>%
      ggplot() +
      aes(coffee) +
      geom_bar()

SLIDE 25

Histograms

    data_clean %>%
      ggplot() +
      aes(phobia) +
      geom_histogram()

SLIDE 26

Histograms (change number of bins)

    data_clean %>%
      ggplot() +
      aes(phobia) +
      geom_histogram(bins = 8)

SLIDE 27

Histograms (change bins to size 5)

    data_clean %>%
      ggplot() +
      aes(phobia) +
      geom_histogram(binwidth = 5)
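
To see what a bin width of 5 does to the phobia scores, the base-R sketch below rebuilds the scores from the tableF() output for phobia shown earlier and counts how many fall in each bin. The bin edges are an assumption chosen for illustration; ggplot2 may place its edges differently.

```r
# Rebuild the phobia scores (0 to 10) from the frequency table
phobia <- rep(0:10, times = c(12, 15, 12, 16, 21, 11, 1, 4, 4, 1, 3))

# Bins of width 5; these edges are chosen for illustration (assumption)
edges  <- c(-2.5, 2.5, 7.5, 12.5)
counts <- table(cut(phobia, breaks = edges))
counts
```

Eleven distinct scores collapse into three bars, so a wide bin hides most of the shape; trying several bin widths is usually worthwhile.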

SLIDE 28

Histograms

    data_clean %>%
      ggplot() +
      aes(mathquiz) +
      geom_histogram(binwidth = 4)

SLIDE 29

Histograms -by- a Factor (columns)

    data_clean %>%
      ggplot() +
      aes(mathquiz) +
      geom_histogram(binwidth = 4) +
      facet_grid(. ~ coffeeF)

SLIDE 30

Histograms -by- a Factor (rows)

    data_clean %>%
      ggplot() +
      aes(mathquiz) +
      geom_histogram(binwidth = 4) +
      facet_grid(coffeeF ~ .)

SLIDE 31

Deciles (break into 10% chunks)

    data_clean %>%
      dplyr::pull(statquiz) %>%
      quantile(probs = c(.10, .20, .30, .40, .50, .60, .70, .80, .90))

    ## 10% 20% 30% 40% 50% 60% 70% 80% 90%
    ## 4.0 6.0 6.0 7.0 7.0 8.0 8.0 8.0 8.1
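
Typing out all nine probabilities is easy to get wrong, so one option is to generate them with seq(). The sketch below uses hypothetical scores in place of statquiz and checks that both ways of writing the probabilities give the same deciles.

```r
# Hypothetical scores standing in for statquiz
scores <- c(1, 3, 4, 4, 5, 6, 6, 7, 7, 8, 8, 8, 9, 10)

deciles_long  <- quantile(scores,
                          probs = c(.10, .20, .30, .40, .50, .60, .70, .80, .90))
deciles_short <- quantile(scores, probs = seq(0.1, 0.9, by = 0.1))

# Compare the values, allowing for tiny floating-point differences
all.equal(unname(deciles_long), unname(deciles_short))
```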

SLIDE 32

Deciles - with missing values

    data_clean %>%
      dplyr::pull(mathquiz) %>%
      quantile(probs = c(.10, .20, .30, .40, .50, .60, .70, .80, .90))

    Error in quantile.default(., probs = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, : missing values and NaN's not allowed if 'na.rm' is FALSE

SLIDE 33

Deciles - na.rm = TRUE

    data_clean %>%
      dplyr::pull(mathquiz) %>%
      quantile(probs = c(.10, .20, .30, .40, .50, .60, .70, .80, .90),
               na.rm = TRUE)

    ##  10%  20%  30%  40%  50%  60%  70%  80%  90%
    ## 15.0 21.0 25.2 28.0 30.0 32.0 33.8 37.2 41.0
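
The effect of na.rm can be reproduced on a tiny vector. The scores below are hypothetical; the sketch captures the error instead of stopping, then repeats the call with na.rm = TRUE and counts how many values were dropped.

```r
# Hypothetical scores with one missing value
x <- c(10, 20, 30, 40, NA)

# Without na.rm the call stops with an error; capture it to inspect
res <- tryCatch(quantile(x, probs = 0.5), error = function(e) "error")

# With na.rm = TRUE the NA is dropped before computing
med <- quantile(x, probs = 0.5, na.rm = TRUE)

sum(is.na(x))   # how many values were dropped
```

Since na.rm = TRUE removes cases silently, it is worth reporting the number of missing values alongside the quantiles.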

SLIDE 34

Quartiles (break into 4 chunks)

    data_clean %>%
      dplyr::pull(statquiz) %>%
      quantile(probs = c(0, .25, .50, .75, 1))

    ##   0%  25%  50%  75% 100%
    ##    1    6    7    8   10
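
As a side note, the probabilities used here are also the default for quantile(), so leaving out probs gives the same quartiles. A minimal check on hypothetical scores:

```r
# Hypothetical scores, not the real statquiz column
x <- c(1, 6, 7, 8, 10)

q_default  <- quantile(x)   # probs defaults to seq(0, 1, 0.25)
q_explicit <- quantile(x, probs = c(0, .25, .50, .75, 1))

identical(q_default, q_explicit)
```

Writing the probabilities out is still a reasonable choice on a slide, because it makes the cut points explicit.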

SLIDE 35

Percentiles

    data_clean %>%
      dplyr::pull(statquiz) %>%
      quantile(probs = c(.01, .05, .173, .90))

    ##    1%    5% 17.3%   90%
    ##  2.98  3.00  5.00  8.10
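
A value such as 2.98 need not be an observed score: by default quantile() interpolates between neighboring data points. The sketch below, on hypothetical scores, contrasts the default with type = 1, which returns only values that occur in the data.

```r
x <- c(1, 2, 3, 4)   # hypothetical scores

q_interp   <- quantile(x, probs = 0.5)             # default (type 7) interpolates
q_observed <- quantile(x, probs = 0.5, type = 1)   # type 1 returns an observed value

q_interp
q_observed
```

Which type is appropriate depends on the convention being followed; the slides use the default.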

SLIDE 36

Questions?


SLIDE 37

Next Topic

Center and Spread
