Visualizing bivariate relationships



SLIDE 1

Visualizing bivariate relationships

CORRELATION AND REGRESSION IN R

Ben Baumer

Assistant Professor at Smith College

SLIDE 2

Bivariate relationships

Both variables are numerical
Response variable: a.k.a. y, dependent
Explanatory variable: something you think might be related to the response; a.k.a. x, independent, predictor

SLIDE 3

Graphical representations

Put response on vertical axis
Put explanatory on horizontal axis

SLIDE 4

Scatterplot

library(ggplot2)  # provides ggplot() and geom_point()

ggplot(data = possum, aes(y = totalL, x = tailL)) +
  geom_point()

SLIDE 5

Scatterplot

ggplot(data = possum, aes(y = totalL, x = tailL)) +
  geom_point() +
  scale_x_continuous("Length of Possum Tail (cm)") +
  scale_y_continuous("Length of Possum Body (cm)")

SLIDE 6

Bivariate relationships

Can think of boxplots as scatterplots…
…but with discretized explanatory variable

cut() function discretizes

Choose appropriate number of "boxes"
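As a minimal sketch of what cut() does, the snippet below bins a small numeric vector into intervals. The values in tail_len are made up for illustration and are not taken from the possum data; only base R is assumed.

```r
# Toy tail lengths in cm (invented values, not the possum data)
tail_len <- c(32, 34.5, 36, 37, 38.5, 39, 40, 41.5, 43)

# A single number for `breaks` asks cut() for that many equal-width intervals
bins5 <- cut(tail_len, breaks = 5)
bins3 <- cut(tail_len, breaks = 3)

# The result is a factor with one level per interval
nlevels(bins5)
table(bins5)
table(bins3)
```

Fewer breaks give fewer, wider "boxes" with more observations in each; more breaks give narrower boxes that may hold very few points.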

SLIDE 7

Scatterplot

ggplot(data = possum, aes(y = totalL, x = cut(tailL, breaks = 5))) +
  geom_point()

SLIDE 8

Boxplot

ggplot(data = possum, aes(y = totalL, x = cut(tailL, breaks = 5))) +
  geom_boxplot()

SLIDE 9

Let's practice!

CORRELATION AND REGRESSION IN R

SLIDE 10

Characterizing bivariate relationships

CORRELATION AND REGRESSION IN R

Ben Baumer

Assistant Professor at Smith College

SLIDE 11

Characterizing bivariate relationships

Form (e.g. linear, quadratic, non-linear)
Direction (e.g. positive, negative)
Strength (how much scatter/noise?)
Outliers

SLIDE 12

SLIDE 13

Sign legibility

SLIDE 14

NIST

SLIDE 15

NIST 2

SLIDE 16

Non-linear

SLIDE 17

Fan shape

SLIDE 18

Let's practice!

CORRELATION AND REGRESSION IN R

SLIDE 19

Outliers

CORRELATION AND REGRESSION IN R

Ben Baumer

Assistant Professor at Smith College

SLIDE 20

Outliers

ggplot(data = mlbBat10, aes(x = SB, y = HR)) + geom_point()

SLIDE 21

Add transparency

ggplot(data = mlbBat10, aes(x = SB, y = HR)) + geom_point(alpha = 0.5)

SLIDE 22

Add some jitter

ggplot(data = mlbBat10, aes(x = SB, y = HR)) + geom_point(alpha = 0.5, position = "jitter")

SLIDE 23

Add some jitter

ggplot(data = mlbBat10, aes(x = SB, y = HR)) + geom_point(alpha = 0.5, position = "jitter")
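To see why jitter helps with count data such as SB and HR, the sketch below uses base R's jitter() on simulated integer counts. The vectors sb and hr are made-up stand-ins, not the mlbBat10 data, and jitter() here only illustrates the general idea behind position = "jitter"; it is not a claim about how ggplot2 implements it.

```r
set.seed(1)  # makes the simulated counts and the jitter reproducible

# Simulated integer counts standing in for SB and HR (not the real data)
sb <- rpois(200, lambda = 2)
hr <- rpois(200, lambda = 3)

# Integer pairs repeat often, so many points are drawn exactly on top of each other
n_overlapping <- sum(duplicated(data.frame(sb, hr)))

# jitter() adds a small amount of uniform random noise to each value
sb_j <- jitter(sb)
hr_j <- jitter(hr)
n_overlapping_jittered <- sum(duplicated(data.frame(sb_j, hr_j)))

n_overlapping
n_overlapping_jittered
```

The noise is small relative to the spacing between counts, so each jittered point still sits near its true value while no longer hiding its neighbours.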

SLIDE 24

Identify the outliers

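library(dplyr)  # provides %>%, filter() and select()

# Keep players with more than 60 stolen bases or more than 50 home runs
mlbBat10 %>%
  filter(SB > 60 | HR > 50) %>%
  select(name, team, position, SB, HR)

        name team position SB HR
1   J Pierre  CWS       OF 68  1
2 J Bautista  TOR       OF  9 54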
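The same "either condition" filter can be sketched in base R with subset(). The data frame bat below is a toy example: its first two rows copy the SB and HR values shown on the slide, while "Player A" and "Player B" are invented rows added only so that the filter has something to exclude.

```r
# Toy data: rows 1-2 use the values shown on the slide;
# "Player A" and "Player B" are invented for illustration only
bat <- data.frame(
  name = c("J Pierre", "J Bautista", "Player A", "Player B"),
  SB   = c(68, 9, 12, 0),
  HR   = c(1, 54, 20, 35)
)

# `|` is the element-wise OR, so a row is kept if either test is TRUE
outliers <- subset(bat, SB > 60 | HR > 50, select = c(name, SB, HR))
outliers
```

Using OR rather than AND matters here: neither player is extreme on both variables at once, so an AND condition would return no rows.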

SLIDE 25

Let's practice!

CORRELATION AND REGRESSION IN R