Visualization with scatterplots, by Kelly McConville


SLIDE 1

DataCamp Analyzing Survey Data in R

Visualization with scatterplots

ANALYZING SURVEY DATA IN R

Kelly McConville

Assistant Professor of Statistics

SLIDE 2

Head size and age

    babies <- filter(NHANESraw, AgeMonths <= 6) %>%
      select(AgeMonths, HeadCirc)
    babies

    # A tibble: 484 x 2
       AgeMonths HeadCirc
           <int>    <dbl>
     1         3     42.7
     2         4     42.8
     3         2     38.8
     4         0     36.0
     5         5     42.7
     6         2     41.9
     7         6     44.3
     8         3     42.0
     9         2     41.3
    10         1     38.9
    # ... with 474 more rows

SLIDE 3

Scatterplots

    ggplot(data = babies, mapping = aes(x = AgeMonths, y = HeadCirc)) +
      geom_point()

SLIDE 4

Jittering

    ggplot(data = babies, mapping = aes(x = AgeMonths, y = HeadCirc)) +
      geom_jitter(width = 0.3, height = 0)
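
One way to picture what the jittering step does, as a rough base R sketch rather than ggplot2's internal code: ages are recorded in whole months, so many points share the same x value. Setting width = 0.3 shifts each point horizontally by a random amount of at most 0.3 in either direction, and height = 0 leaves head circumference untouched. The ages are the ten rows printed on the "Head size and age" slide; the names age and age_jittered and the seed are illustrative choices, not part of the course code.

```r
# Ten AgeMonths values taken from the printed rows of `babies`.
age <- c(3, 4, 2, 0, 5, 2, 6, 3, 2, 1)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Uniform noise in [-0.3, 0.3]: the horizontal analogue of width = 0.3.
age_jittered <- age + runif(length(age), min = -0.3, max = 0.3)

# Each point stays within 0.3 of its recorded month, so the bands
# for neighbouring months cannot overlap.
print(round(cbind(age, age_jittered), 2))
print(max(abs(age_jittered - age)))
```

Because the noise is bounded by 0.3, every point remains visually attached to its month while overplotted points are spread apart.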

SLIDE 5

Survey-weighted scatterplots

    babies <- filter(NHANESraw, AgeMonths <= 6) %>%
      select(AgeMonths, HeadCirc, WTMEC4YR)
    babies

    # A tibble: 484 x 3
       AgeMonths HeadCirc WTMEC4YR
           <int>    <dbl>    <dbl>
     1         3     42.7    12915
     2         4     42.8    12791
     3         2     38.8     2359
     4         0     36.0     4306
     5         5     42.7     2922
     6         2     41.9     5561
     7         6     44.3    10416
     8         3     42.0     9957
     9         2     41.3     4503
    10         1     38.9     3718
    # ... with 474 more rows

SLIDE 6

Bubble plots

    ggplot(data = babies,
           mapping = aes(x = AgeMonths, y = HeadCirc, size = WTMEC4YR)) +
      geom_jitter(width = 0.3, height = 0) +
      guides(size = FALSE)

SLIDE 7

Bubble plots

    ggplot(data = babies,
           mapping = aes(x = AgeMonths, y = HeadCirc, size = WTMEC4YR)) +
      geom_jitter(width = 0.3, height = 0, alpha = 0.3) +
      guides(size = FALSE)

SLIDE 8

Survey-weighted scatterplots

    ggplot(data = babies,
           mapping = aes(x = AgeMonths, y = HeadCirc, color = WTMEC4YR)) +
      geom_jitter(width = 0.3, height = 0) +
      guides(color = FALSE)

SLIDE 9

Survey-weighted scatterplots

    ggplot(data = babies,
           mapping = aes(x = AgeMonths, y = HeadCirc, alpha = WTMEC4YR)) +
      geom_jitter(width = 0.3, height = 0) +
      guides(alpha = FALSE)

SLIDE 10

Let's practice!

SLIDE 11

Visualizing trends

SLIDE 12

Scatter plots

SLIDE 13

Survey-Weighted Line of Best Fit

    ggplot(data = babies,
           mapping = aes(x = AgeMonths, y = HeadCirc, alpha = WTMEC4YR)) +
      geom_jitter(width = 0.3, height = 0) +
      guides(alpha = FALSE) +
      geom_smooth(method = "lm", se = FALSE,
                  mapping = aes(weight = WTMEC4YR))

SLIDE 14

Trend Lines

    babies <- filter(NHANESraw, AgeMonths <= 6) %>%
      select(AgeMonths, HeadCirc, WTMEC4YR, Gender)
    babies

    # A tibble: 484 x 4
       AgeMonths HeadCirc WTMEC4YR Gender
           <int>    <dbl>    <dbl> <fct>
     1         3     42.7   12915. male
     2         4     42.8   12791. female
     3         2     38.8    2359. female
     4         0     36.0    4306. female
     5         5     42.7    2922. female
     6         2     41.9    5561. male
     7         6     44.3   10416. female
     8         3     42.0    9957. female
     9         2     41.3    4503. male
    10         1     38.9    3718. female
    # ... with 474 more rows

SLIDE 15

Trend Lines

    ggplot(data = babies,
           mapping = aes(x = AgeMonths, y = HeadCirc, alpha = WTMEC4YR,
                         color = Gender)) +
      geom_jitter(width = 0.3, height = 0) +
      guides(alpha = FALSE) +
      geom_smooth(method = "lm", se = FALSE,
                  mapping = aes(weight = WTMEC4YR))

SLIDE 16

Let's practice!

SLIDE 17

Modeling with linear regression

SLIDE 18

Regression line

SLIDE 19

Regression line

SLIDE 20

Regression equation

Regression equation is given by: $\hat{y} = a + bx$

Find $a$ and $b$ by minimizing $\sum_{i=1}^{n} w_i (y_i - \hat{y}_i)^2$
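
To make the weighted criterion concrete, here is a small base R sketch. It uses only the ten rows of babies printed on the earlier survey-weighted scatterplot slide, so the numbers it produces are illustrative and will not match a fit to all 484 babies. It computes a and b from the closed-form weighted least squares solution and checks them against lm() with a weights argument. The names age, head_circ, and w are placeholders. This reproduces point estimates only: the standard errors that lm() reports ignore the survey design, which is the part svyglm() handles.

```r
# First ten rows of `babies` as printed on the slides.
age       <- c(3, 4, 2, 0, 5, 2, 6, 3, 2, 1)
head_circ <- c(42.7, 42.8, 38.8, 36.0, 42.7, 41.9, 44.3, 42.0, 41.3, 38.9)
w         <- c(12915, 12791, 2359, 4306, 2922, 5561, 10416, 9957, 4503, 3718)

# Weighted means of x and y.
x_bar <- sum(w * age) / sum(w)
y_bar <- sum(w * head_circ) / sum(w)

# Closed-form minimizer of sum(w * (y - (a + b * x))^2).
b <- sum(w * (age - x_bar) * (head_circ - y_bar)) / sum(w * (age - x_bar)^2)
a <- y_bar - b * x_bar

# The same minimization carried out by lm() with case weights.
fit <- lm(head_circ ~ age, weights = w)

print(c(a = a, b = b))
print(coef(fit))
```

The two printed pairs agree, which shows that the survey weights enter the fit simply as the multipliers w_i on each squared residual.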

SLIDE 21

Fitting regression model

    mod <- svyglm(HeadCirc ~ AgeMonths, design = NHANES_design)
    summary(mod)

    svyglm(formula = HeadCirc ~ AgeMonths, design = NHANES_design)

    Survey design:
    svydesign(data = NHANESraw, strata = ~SDMVSTRA, id = ~SDMVPSU,
        nest = TRUE, weights = ~WTMEC4YR)

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  38.1376     0.2004   190.3   <2e-16 ***
    AgeMonths     1.0708     0.0593    18.1   <2e-16 ***

    (Some output omitted)

SLIDE 22

Linear regression inference

Estimated regression equation is given by: $\hat{y} = a + bx$

True regression equation is given by: $E(y) = A + Bx$

$E(y)$ is the average value of $y$ and the standard deviation is $sd(y) = \sigma$.

SLIDE 23

Linear regression inference

Null Hypothesis: Head size and age are not linearly related (i.e., $B = 0$).

Alternative Hypothesis: Head size and age are linearly related (i.e., $B \neq 0$).

Test statistic: $t = \frac{b}{SE_b}$

    mod <- svyglm(HeadCirc ~ AgeMonths, design = NHANES_design)
    summary(mod)

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  38.1376     0.2004   190.3   <2e-16 ***
    AgeMonths     1.0708     0.0593    18.1   <2e-16 ***

    (Some output omitted)
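
As a quick check on how the test statistic relates to the printed table, the following base R lines divide the AgeMonths estimate by its standard error, using the two values copied from the output above. The variable names are illustrative.

```r
# Values copied from the AgeMonths row of the coefficient table.
b    <- 1.0708   # estimated slope
se_b <- 0.0593   # its standard error

# Test statistic for H0: B = 0.
t_stat <- b / se_b
print(round(t_stat, 1))
```

Rounded to one decimal place, this reproduces the t value of 18.1 shown in the AgeMonths row.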

SLIDE 24

Let's practice!

SLIDE 25

More complex modeling

SLIDE 26

Multiple linear regression

SLIDE 27

Multiple linear regression

Multiple linear regression equation is given by: $E(y) = B_0 + B_1 x_1 + B_2 x_2 + \ldots + B_p x_p$

    babies

    # A tibble: 484 x 4
       AgeMonths HeadCirc WTMEC4YR Gender
           <int>    <dbl>    <dbl> <fct>
     1         3     42.7   12915. male
     2         4     42.8   12791. female
     3         2     38.8    2359. female
     4         0     36.0    4306. female
     5         5     42.7    2922. female
     6         2     41.9    5561. male
     7         6     44.3   10416. female
     8         3     42.0    9957. female
     9         2     41.3    4503. male
    10         1     38.9    3718. female
    # ... with 474 more rows

SLIDE 28

Multiple linear regression

Multiple linear regression equation is given by: $E(y) = B_0 + B_1 x_1 + B_2 x_2$

    babies

    # A tibble: 484 x 4
       AgeMonths HeadCirc WTMEC4YR Gender
           <int>    <dbl>    <dbl> <fct>
     1         3     42.7   12915. male
     2         4     42.8   12791. female
     3         2     38.8    2359. female
     4         0     36.0    4306. female
     5         5     42.7    2922. female
     6         2     41.9    5561. male
     7         6     44.3   10416. female
     8         3     42.0    9957. female
     9         2     41.3    4503. male
    10         1     38.9    3718. female
    # ... with 474 more rows

SLIDE 29

Multiple linear regression

    babies <- mutate(babies, Gender2 = case_when(
      Gender == "male" ~ 1,
      Gender == "female" ~ 0))
    babies

    # A tibble: 484 x 5
       AgeMonths HeadCirc WTMEC4YR Gender Gender2
           <int>    <dbl>    <dbl> <fct>    <dbl>
     1         3     42.7   12915. male        1.
     2         4     42.8   12791. female      0.
     3         2     38.8    2359. female      0.
     4         0     36.0    4306. female      0.
     5         5     42.7    2922. female      0.
     6         2     41.9    5561. male        1.
     7         6     44.3   10416. female      0.
     8         3     42.0    9957. female      0.
     9         2     41.3    4503. male        1.
    10         1     38.9    3718. female      0.
    # ... with 474 more rows
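
The hand-built indicator mirrors what R does on its own when a factor enters a model formula. The base R sketch below uses the ten Gender values printed above and assumes the default alphabetical factor levels, so that female is the reference level. The names gender, gender2, and design_matrix are illustrative.

```r
# Ten Gender values from the printed rows of `babies`.
gender <- factor(c("male", "female", "female", "female", "female",
                   "male", "female", "female", "male", "female"))

# Hand-coded indicator: 1 for male, 0 for female.
gender2 <- ifelse(gender == "male", 1, 0)

# The design matrix R builds for a formula containing the factor.
design_matrix <- model.matrix(~ gender)

print(colnames(design_matrix))
print(all(design_matrix[, "gendermale"] == gender2))
```

The automatically generated column is named after the variable plus its non-reference level, which is why a model fitted with the Gender factor reports a coefficient called Gendermale.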

SLIDE 30

Multiple linear regression

Multiple linear regression equation is given by: $E(y) = B_0 + B_1 x_1 + B_2 x_2$

Line for males: $E(y) = (B_0 + B_2) + B_1 x_1$

Line for females: $E(y) = B_0 + B_1 x_1$
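
The two lines follow from substituting the indicator into the full equation. This assumes, consistent with the Gender2 coding and the hypothesis tests on the later slides, that $x_1$ is age in months and $x_2$ is the indicator equal to 1 for males and 0 for females.

```latex
\begin{align*}
E(y) &= B_0 + B_1 x_1 + B_2 x_2 \\
\text{males } (x_2 = 1):\quad E(y) &= B_0 + B_1 x_1 + B_2 \cdot 1 = (B_0 + B_2) + B_1 x_1 \\
\text{females } (x_2 = 0):\quad E(y) &= B_0 + B_1 x_1 + B_2 \cdot 0 = B_0 + B_1 x_1
\end{align*}
```

Both lines share the slope $B_1$, so $B_2$ is the constant vertical gap between the male and female lines.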

SLIDE 31

Multiple linear regression

    mod <- svyglm(HeadCirc ~ AgeMonths + Gender, design = NHANES_design)
    summary(mod)

    svyglm(formula = HeadCirc ~ AgeMonths + Gender, design = NHANES_design)

    Survey design:
    svydesign(data = NHANESraw, strata = ~SDMVSTRA, id = ~SDMVPSU,
        nest = TRUE, weights = ~WTMEC4YR)

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) 37.48508    0.18320 204.613  < 2e-16 ***
    AgeMonths    1.08658    0.05379  20.200  < 2e-16 ***
    Gendermale   1.15034    0.16298   7.058  6.3e-08 ***

    (Some output omitted)
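
To see what these estimates imply, the following base R sketch plugs the three coefficients from the output above into the fitted equation and tabulates the two parallel lines over ages 0 to 6 months. The helper predict_head_circ() is a hypothetical function written for this illustration. It is not part of the survey package, where predict() on the fitted model would normally be used.

```r
# Estimates copied from the coefficient table.
b0 <- 37.48508  # (Intercept)
b1 <- 1.08658   # AgeMonths
b2 <- 1.15034   # Gendermale

# Hypothetical helper: fitted head circumference for a given age,
# with male = 1 for males and 0 for females.
predict_head_circ <- function(age_months, male) {
  b0 + b1 * age_months + b2 * male
}

ages <- 0:6
fitted_lines <- data.frame(
  AgeMonths = ages,
  female    = predict_head_circ(ages, male = 0),
  male      = predict_head_circ(ages, male = 1)
)

print(fitted_lines)

# The gap between the two lines is the Gendermale estimate at every age.
print(unique(round(fitted_lines$male - fitted_lines$female, 5)))
```

Because the model has no interaction term, the male and female lines differ only in their intercepts.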

SLIDE 32

Multiple linear regression

Null hypothesis: Given age is in the model, gender should not be included ($B_2 = 0$).

Alternative hypothesis: Given age is in the model, gender should be included ($B_2 \neq 0$).

Test statistic: $t = \frac{b_2}{SE_{b_2}}$

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) 37.48508    0.18320 204.613  < 2e-16 ***
    AgeMonths    1.08658    0.05379  20.200  < 2e-16 ***
    Gendermale   1.15034    0.16298   7.058  6.3e-08 ***

    (Some output omitted)

SLIDE 33

Multiple linear regression

Null hypothesis: Given gender is in the model, age should not be included ($B_1 = 0$).

Alternative hypothesis: Given gender is in the model, age should be included ($B_1 \neq 0$).

Test statistic: $t = \frac{b_1}{SE_{b_1}}$

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) 37.48508    0.18320 204.613  < 2e-16 ***
    AgeMonths    1.08658    0.05379  20.200  < 2e-16 ***
    Gendermale   1.15034    0.16298   7.058  6.3e-08 ***

    (Some output omitted)

SLIDE 34

Multiple linear regression

$E(y) = B_0 + B_1 x_1 + B_2 x_2$

SLIDE 35

Let's practice!

SLIDE 36

Wrap-up

SLIDE 37

R packages

survey: To analyze survey data
dplyr: To wrangle data
ggplot2: To graph the data

SLIDE 38

Course summary

Ch 1: Survey fundamentals
  Common design features: clustering, stratification
  Survey weights
  Telling R about your design with svydesign()

Ch 2: Categorical data
  Frequency and contingency tables with svytable()
  Bar graphs with geom_col()
  Inference with svychisq()

SLIDE 39

Course summary

Ch 3: Quantitative and categorical data
  Summary stats with svymean(), svytotal(), svyquantile()
  Domain estimates with svyby()
  Describing shape with geom_histogram(), geom_density()
  Inference with svyttest()

Ch 4: Modeling trends
  Mapping survey weights in geom_point()
  Linear trends with geom_smooth(method = "lm")
  Linear regression with svyglm()

SLIDE 40

Extensions

Estimating more complex population quantities.
  EX: svyratio()

Building more complex models.
  EX: svyglm(Diabetes ~ Age, design = NHANES_design, family = quasibinomial)

SLIDE 41

Congratulations!
