Covariance and correlation: PRACTICING STATISTICS INTERVIEW QUESTIONS IN R (PowerPoint presentation)

SLIDE 1

Covariance and correlation

PRACTICING STATISTICS INTERVIEW QUESTIONS IN R

Zuzanna Chmielewska

Actuary

SLIDE 2

Covariance and correlation

SLIDE 4

Covariance

Formula for a sample:

$cov(X,Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x}) \cdot (y_i - \bar{y})}{n-1}$

Formula for a population:

$cov(X,Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x}) \cdot (y_i - \bar{y})}{n}$

SLIDE 5

Covariance

Formula for a sample:

$cov(X,Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x}) \cdot (y_i - \bar{y})}{n-1}$

SLIDE 6

Covariance

Formula for a population:

$cov(X,Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x}) \cdot (y_i - \bar{y})}{n}$
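
A minimal base-R sketch of the two covariance formulas. The helper names `cov_sample()` and `cov_population()` are hypothetical, and the built-in `cars` data set is used only as convenient example data. R's own `cov()` divides by n - 1, so it should agree with the sample version.

```r
# Hypothetical helpers implementing the two covariance formulas
cov_sample <- function(x, y) {
  n <- length(x)
  sum((x - mean(x)) * (y - mean(y))) / (n - 1)  # sample: divide by n - 1
}

cov_population <- function(x, y) {
  n <- length(x)
  sum((x - mean(x)) * (y - mean(y))) / n        # population: divide by n
}

# Built-in cars data: speed (mph) and stopping distance (ft)
x <- cars$speed
y <- cars$dist

cov_sample(x, y)      # same value as cov(x, y)
cov(x, y)
cov_population(x, y)  # equals cov(x, y) * (n - 1) / n
```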

SLIDE 7

Covariance - numerical example

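$x_1 = 3,\ x_2 = 5,\ x_3 = 7$

$y_1 = 6,\ y_2 = 11,\ y_3 = 13$

$\bar{x} = 5,\ \bar{y} = 10$

$(x_1 - \bar{x}) \cdot (y_1 - \bar{y}) = 8$

$(x_2 - \bar{x}) \cdot (y_2 - \bar{y}) = 0$

$(x_3 - \bar{x}) \cdot (y_3 - \bar{y}) = 6$

$\sum_{i=1}^{n} (x_i - \bar{x}) \cdot (y_i - \bar{y}) = 14$

$cov(X,Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x}) \cdot (y_i - \bar{y})}{n-1} = \frac{14}{2} = 7$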
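
The same arithmetic can be checked in R. This sketch assumes nothing beyond base R; `cov()` uses the sample (n - 1) denominator, matching the calculation above.

```r
x <- c(3, 5, 7)
y <- c(6, 11, 13)

mean(x)                             # 5
mean(y)                             # 10
(x - mean(x)) * (y - mean(y))       # 8 0 6
sum((x - mean(x)) * (y - mean(y)))  # 14
cov(x, y)                           # 7 = 14 / (3 - 1)
```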

SLIDE 8

Correlation coefficient

$corr(X,Y) = \frac{cov(X,Y)}{\sigma_x \cdot \sigma_y}$
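
A short sketch in base R, reusing the numbers from the covariance example: dividing `cov()` by the product of the `sd()` values should reproduce `cor()`. Because `sd()` and `cov()` both use the n - 1 denominator, it cancels, so the coefficient is the same whichever denominator is applied consistently.

```r
x <- c(3, 5, 7)
y <- c(6, 11, 13)

# corr(X, Y) = cov(X, Y) / (sigma_x * sigma_y)
r_manual  <- cov(x, y) / (sd(x) * sd(y))
r_builtin <- cor(x, y)  # Pearson correlation by default

r_manual   # approx. 0.9707, i.e. 7 / (2 * sqrt(13))
r_builtin
```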

SLIDE 15

Nonlinear relationships
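
The correlation coefficient measures only linear association, so a perfect nonlinear relationship can still give a correlation of zero. A minimal illustration with toy values chosen for this purpose (not taken from the slides):

```r
# y is completely determined by x, but the relationship is U-shaped
x <- -3:3
y <- x^2

cor(x, y)  # zero correlation, despite y being a function of x
```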

SLIDE 16

Correlation does not imply causation!

SLIDE 17

Summary

covariance

correlation coefficient

SLIDE 18

Let's practice!

SLIDE 19

Linear regression model

Zuzanna Chmielewska

Actuary

SLIDE 20

Linear regression model

SLIDE 21

Linear regression model

SLIDE 24

Linear regression model

$y_i = \beta_0 + \beta_1 \cdot x_{i1} + ... + \beta_p \cdot x_{ip} + e_i$

where:

$y_i$ - dependent variable, $x_{ij}$ - independent variables, $\beta_j$ - parameters, $e_i$ - error.

SLIDE 25

Linear predictor function

$\hat{y}_i = \beta_0 + \beta_1 \cdot x_{i1} + ... + \beta_p \cdot x_{ip}$
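
The linear predictor function can be written as a design matrix multiplied by the parameter vector. The sketch below assumes the built-in `mtcars` data and an illustrative model `mpg ~ wt + hp` (so p = 2); `model.matrix()` supplies the column of 1s that multiplies β0.

```r
# Illustrative model with p = 2 independent variables (built-in mtcars data)
model <- lm(mpg ~ wt + hp, data = mtcars)

# Design matrix: a column of 1s (for beta_0) plus one column per x_j
X <- model.matrix(model)
beta <- coef(model)

# Linear predictor: y_hat_i = beta_0 + beta_1 * x_i1 + beta_2 * x_i2
y_hat <- as.vector(X %*% beta)

head(y_hat)
head(fitted(model))  # the same values, as computed by lm()
```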

SLIDE 26

$\hat{y}_i = \beta_0 + \beta_1 \cdot x_i$
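
With a single independent variable, the least-squares estimates have a closed form that ties back to covariance: β1 = cov(x, y) / var(x) and β0 = ȳ - β1 · x̄. A sketch, assuming the built-in `cars` data that the deck uses with `lm()`:

```r
x <- cars$speed
y <- cars$dist

# Least-squares estimates for y_hat_i = beta_0 + beta_1 * x_i
beta_1 <- cov(x, y) / var(x)
beta_0 <- mean(y) - beta_1 * mean(x)

c(beta_0, beta_1)                     # approx. -17.579 and 3.932
coef(lm(dist ~ speed, data = cars))  # the same estimates from lm()
```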

SLIDE 30

Log-transformation

Examples:

$\hat{y}_i = \beta_0 + \beta_1 \cdot \ln(x_{i1}) + ... + \beta_p \cdot x_{ip}$

$\ln(\hat{y}_i) = \beta_0 + \beta_1 \cdot x_{i1} + ... + \beta_p \cdot x_{ip}$
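
In R, a log-transformation can be written directly inside the model formula, where `log()` is the natural logarithm. A minimal sketch on the `cars` data; the choice of data and variables is illustrative only.

```r
# Log of an independent variable: y_hat = beta_0 + beta_1 * ln(x)
model_log_x <- lm(dist ~ log(speed), data = cars)

# Log of the dependent variable: ln(y_hat) = beta_0 + beta_1 * x
model_log_y <- lm(log(dist) ~ speed, data = cars)

coef(model_log_x)
coef(model_log_y)

# Predictions from model_log_y are on the log scale; exp() maps them back
new_car <- data.frame(speed = 17.5)
back_transformed <- exp(predict(model_log_y, newdata = new_car))
back_transformed
```

Applying `exp()` to a log-scale prediction is a simple back-transformation, shown here for illustration.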

SLIDE 31

Assumptions

- Linear relationship
- Normally distributed errors
- Homoscedastic errors
- Independent observations
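
One of these assumptions, normality of the errors, can be probed with a formal test such as Shapiro-Wilk applied to the model residuals (normality tests are covered in the course's chapter on statistical tests). A sketch, assuming the `cars` model used in the deck; linearity and homoscedasticity are usually judged from diagnostic plots, and independence depends on how the data were collected.

```r
model <- lm(dist ~ speed, data = cars)

# H0 of the Shapiro-Wilk test: the residuals come from a normal distribution
normality <- shapiro.test(residuals(model))
normality          # prints the W statistic and the p-value
normality$p.value  # a small p-value is evidence against normal errors
```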

SLIDE 32

Linear model in R

model <- lm(dist ~ speed, data = cars)
print(model)

Call:
lm(formula = dist ~ speed, data = cars)

Coefficients:
(Intercept)        speed
    -17.579        3.932

SLIDE 33

Linear model in R

model <- lm(dist ~ speed, data = cars)
new_car <- data.frame(speed = 17.5)
predict(model, newdata = new_car)
       1
51.23806
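
The predicted value is the linear predictor evaluated at the new speed. A sketch that reproduces it from the fitted coefficients; the `interval` and `level` arguments of `predict()` for linear models are shown as an optional extra.

```r
model <- lm(dist ~ speed, data = cars)
new_car <- data.frame(speed = 17.5)
b <- coef(model)

# Linear predictor evaluated by hand: beta_0 + beta_1 * 17.5
manual <- b[["(Intercept)"]] + b[["speed"]] * 17.5
manual  # approx. 51.238, matching predict()

# Optional: a 95% prediction interval for a new car with speed 17.5
pred_int <- predict(model, newdata = new_car, interval = "prediction", level = 0.95)
pred_int  # columns fit, lwr, upr
```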

SLIDE 34

Diagnostic plots

model <- lm(dist ~ speed, data = cars)
plot(model)
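
Calling `plot()` on an `lm` object draws its diagnostic plots one after another. A sketch that places the four default panels on a single 2 x 2 page of a temporary PDF file, so it also runs in a non-interactive session (the file location is an arbitrary choice).

```r
model <- lm(dist ~ speed, data = cars)

# Write the four default diagnostic plots to one page of a PDF file
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
par(mfrow = c(2, 2))  # 2 x 2 grid of panels
plot(model)           # residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage
invisible(dev.off())

# A single panel can be selected with `which`, e.g. the normal Q-Q plot:
# plot(model, which = 2)
```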

SLIDE 39

Summary

linear regression model

linear predictor function

lm() in R

diagnostic plots

SLIDE 40

Let's practice!

SLIDE 41

Logistic regression model

Zuzanna Chmielewska

Actuary

SLIDE 42

Logistic regression's application

SLIDE 45

Logistic function

$f(x) = \frac{1}{1 + e^{-x}}$

SLIDE 46

Logistic function

$f(x) = \frac{1}{1 + e^{-x}} \in (0, 1)$
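
Base R provides the logistic function as `plogis()` (the distribution function of the logistic distribution). A small sketch comparing it with a hand-written version; `logistic()` is a hypothetical helper name.

```r
logistic <- function(x) 1 / (1 + exp(-x))  # hypothetical helper

x <- c(-10, -2, 0, 2, 10)
logistic(x)
plogis(x)    # same values from base R

logistic(0)  # 0.5: the midpoint of the curve

# Outputs stay inside (0, 1); for very large |x| they round to 0 or 1
# in floating point
all(logistic(x) > 0 & logistic(x) < 1)
```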

SLIDE 47

Logistic regression model

Probability prediction:

$p_i = P(y_i = 1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 \cdot x_{i1} + ... + \beta_p \cdot x_{ip})}}$

Logit prediction:

$l_i = \ln\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 \cdot x_{i1} + ... + \beta_p \cdot x_{ip}$
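
The two predictions are inverses of each other: applying the logit to the probability prediction recovers the linear predictor. A sketch with hypothetical linear-predictor values; base R's `qlogis()` computes the logit.

```r
# Hypothetical values of beta_0 + beta_1 * x_i1 + ... for three observations
eta <- c(-1.5, 0, 2)

p <- 1 / (1 + exp(-eta))  # probability prediction p_i
l <- log(p / (1 - p))     # logit prediction l_i, recovers eta

p
l
qlogis(p)                 # base R's logit, same as l
```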

SLIDE 50

Logistic regression in R

model <- glm(y ~ x, data = df, family = "binomial")

SLIDE 51

Logistic regression in R

model <- glm(y ~ x, data = df, family = "binomial")
predict(model, newdata = new_df, type = "response")
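
In the slide code, `df`, `x`, `y`, and `new_df` are placeholders. A runnable sketch, assuming the built-in `mtcars` data with the 0/1 variable `am` as the response and `wt` as the predictor; `type = "link"` returns the logit prediction and `type = "response"` the probability prediction.

```r
# Built-in mtcars data: am = 1 for manual, 0 for automatic transmission;
# wt = car weight in 1000 lbs
model <- glm(am ~ wt, data = mtcars, family = "binomial")

# Hypothetical new cars to score
new_df <- data.frame(wt = c(2.5, 3.5))

logit_pred <- predict(model, newdata = new_df, type = "link")      # l_i
prob_pred  <- predict(model, newdata = new_df, type = "response")  # p_i

prob_pred
plogis(logit_pred)  # applying the logistic function to l_i gives p_i
```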

SLIDE 52

Summary

logistic regression model

prediction of a binary response variable

logistic regression in R with glm()

SLIDE 53

Let's practice!

SLIDE 54

Model evaluation

Zuzanna Chmielewska

Actuary

SLIDE 61

Cross-validation
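
A minimal sketch of k-fold cross-validation in base R, assuming the `cars` linear model, k = 5, and RMSE (one of the regression metrics in this deck) as the error measure; the seed and k are arbitrary choices. Each fold is held out once while the model is fit on the remaining rows.

```r
set.seed(42)  # arbitrary seed, for a reproducible fold assignment
k <- 5
n <- nrow(cars)

# Randomly assign each of the n rows to one of k folds of (nearly) equal size
folds <- sample(rep_len(1:k, n))

cv_rmse <- numeric(k)
for (i in 1:k) {
  train_set <- cars[folds != i, ]
  test_set  <- cars[folds == i, ]
  fold_model <- lm(dist ~ speed, data = train_set)
  pred <- predict(fold_model, newdata = test_set)
  # RMSE on the held-out fold i
  cv_rmse[i] <- sqrt(mean((test_set$dist - pred)^2))
}

cv_rmse        # one error estimate per fold
mean(cv_rmse)  # cross-validated estimate of prediction error
```

Averaging the per-fold RMSE is one common convention; pooling all held-out errors before taking the square root is another.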

SLIDE 68

Confusion matrix
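
A confusion matrix can be built with `table()` by cross-tabulating predicted and actual classes. This sketch assumes the `mtcars` logistic model `am ~ wt`, a 0.5 probability threshold, and, for brevity, evaluates on the training data rather than on a held-out set.

```r
model <- glm(am ~ wt, data = mtcars, family = "binomial")

# Predicted probabilities for the training rows, turned into classes at 0.5
prob <- predict(model, type = "response")
predicted <- ifelse(prob > 0.5, 1, 0)

# Rows: predicted class, columns: actual class
conf_mat <- table(Predicted = predicted, Actual = mtcars$am)
conf_mat
```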

SLIDE 72

Classification metrics

$accuracy = \frac{TP + TN}{TP + TN + FP + FN}$

$precision = \frac{TP}{TP + FP}$

$recall = \frac{TP}{TP + FN}$
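
The three formulas, computed in base R from hypothetical toy labels (illustrative values, not data from the slides):

```r
# Hypothetical toy labels (1 = positive class), for illustration only
actual    <- c(1, 1, 1, 0, 0, 0, 1, 0)
predicted <- c(1, 0, 1, 0, 1, 1, 1, 0)

TP <- sum(predicted == 1 & actual == 1)  # 3
TN <- sum(predicted == 0 & actual == 0)  # 2
FP <- sum(predicted == 1 & actual == 0)  # 2
FN <- sum(predicted == 0 & actual == 1)  # 1

accuracy  <- (TP + TN) / (TP + TN + FP + FN)  # 5 / 8 = 0.625
precision <- TP / (TP + FP)                   # 3 / 5 = 0.6
recall    <- TP / (TP + FN)                   # 3 / 4 = 0.75
```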

SLIDE 73

Classification metrics

Precision Recall

SLIDE 74

Regression metrics

SLIDE 76

Regression metrics

Root Mean Squared Error

$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$

Mean Absolute Error

$MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$

SLIDE 77

Regression metrics

Root Mean Squared Error

$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$

higher weight to large errors

Mean Absolute Error

$MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$

straightforward interpretation
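
A sketch of both metrics as small R functions (`rmse()` and `mae()` are hypothetical helper names), using toy values chosen so that a single large error separates them; the functions are then applied to the `cars` model on its own training data.

```r
rmse <- function(y, y_hat) sqrt(mean((y - y_hat)^2))  # hypothetical helper
mae  <- function(y, y_hat) mean(abs(y - y_hat))       # hypothetical helper

# Toy values: three perfect predictions and one error of size 4
y     <- c(1, 2, 3, 4)
y_hat <- c(1, 2, 3, 8)

mae(y, y_hat)   # 1 = (0 + 0 + 0 + 4) / 4
rmse(y, y_hat)  # 2 = sqrt((0 + 0 + 0 + 16) / 4): the large error counts more

# The linear model from earlier, evaluated on its own training data
model <- lm(dist ~ speed, data = cars)
rmse(cars$dist, fitted(model))
mae(cars$dist, fitted(model))
```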

SLIDE 78

Summary

validation set approach

cross-validation

confusion matrix

classification metrics

regression metrics

SLIDE 79

Let's practice!

SLIDE 80

Wrapping up

Zuzanna Chmielewska

Actuary

SLIDE 81

Congratulations!

SLIDE 82

Chapter 1

Probability distributions: discrete distributions, continuous distributions, central limit theorem

SLIDE 83

Chapter 2

Exploratory Data Analysis: descriptive statistics, categorical data, time series, principal component analysis

SLIDE 84

Chapter 3

Statistical tests: normality tests, inference for a mean, comparing two means, ANOVA

SLIDE 85

Chapter 4

Regression models: covariance and correlation, linear regression model, logistic regression model, model evaluation

SLIDE 86

Best of luck!