Multiple explanatory variables: Intermediate Statistical Modeling in R

SLIDE 1

Multiple explanatory variables

INTERMEDIATE STATISTICAL MODELING IN R

Danny Kaplan

Instructor

SLIDE 2

The statisticalModeling package

  • To evaluate the model, need to set values for explanatory variables
  • Commonly use mean, median, or mode
  • To visualize the model, need to select several different levels of explanatory variables to include

# Load statisticalModeling package
library(statisticalModeling)
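The idea of fixing explanatory variables at typical values can be sketched in base R. This is only an illustration, not the package's own implementation: it uses the built-in `mtcars` data instead of the course data, and `most_common()` is a hypothetical helper written here because base R has no function for the statistical mode.

```r
# Hypothetical helper: most frequent value of a vector
most_common <- function(x) {
  counts <- table(x)
  names(counts)[which.max(counts)]
}

cars <- mtcars
cars$cyl <- factor(cars$cyl)
model <- lm(mpg ~ wt + hp + cyl, data = cars)

# To evaluate the model: one reference row, using the median for
# quantitative inputs and the mode for the categorical input
reference <- data.frame(
  wt  = median(cars$wt),
  hp  = median(cars$hp),
  cyl = factor(most_common(cars$cyl), levels = levels(cars$cyl))
)
reference_output <- predict(model, newdata = reference)

# To visualize the model: several levels of the inputs of interest,
# with the remaining input (hp) held at a fixed value
grid <- expand.grid(
  wt  = seq(min(cars$wt), max(cars$wt), length.out = 5),
  hp  = median(cars$hp),
  cyl = levels(cars$cyl)
)
grid$model_output <- predict(model, newdata = grid)
print(grid)
```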

SLIDE 3

Using effect_size()

# Train model
library(mosaicData)  # provides the CPS85 data
wage_model <- lm(wage ~ educ + sector + sex + exper, data = CPS85)

# Effect size of education on wage: a slope
effect_size(wage_model, ~ educ)

      slope educ  to:educ sector sex exper
1 0.7179628   12 14.61537   prof   M    15
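A slope-type effect size can be read as "change in model output per unit change in the input, with the other inputs held fixed". The sketch below computes that by hand with `predict()` on the built-in `mtcars` data. It is an assumption about the general idea, not a reproduction of how `effect_size()` works internally, and the step size (one standard deviation) is an arbitrary choice made for this example.

```r
cars <- mtcars
cars$cyl <- factor(cars$cyl)
model <- lm(mpg ~ wt + hp + cyl, data = cars)

# Baseline inputs: the other explanatory variables are held fixed
base_case <- data.frame(
  wt  = median(cars$wt),
  hp  = median(cars$hp),
  cyl = factor("8", levels = levels(cars$cyl))
)

# Same inputs, but with hp moved by one step
step <- sd(cars$hp)
shifted <- base_case
shifted$hp <- base_case$hp + step

# Effect size as a rate: change in output divided by change in input
slope <- unname(
  (predict(model, newdata = shifted) - predict(model, newdata = base_case)) / step
)
print(slope)
```

Because this model has no interaction terms, the computed slope matches the `hp` coefficient of the model regardless of the baseline chosen.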

SLIDE 4

Using fmodel()

# A model of the probability of being married
married_model <- glm(married == "Married" ~ educ * sector * sex + age,
                     data = CPS85, family = "binomial")

fmodel(married_model, ~ age + sex + sector + educ, data = CPS85,
       type = "response", educ = c(10, 16))
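To show what `type = "response"` means for a logistic model, here is a minimal base R sketch. The data are simulated, so the variable names and coefficients are made up for illustration, and `predict()` stands in for the plotting done by `fmodel()`.

```r
set.seed(1)
n <- 400

# Simulated (hypothetical) data: probability of being married rises with age
people <- data.frame(
  age  = runif(n, 18, 64),
  educ = sample(8:18, n, replace = TRUE)
)
people$married <- rbinom(n, 1, plogis(-3 + 0.08 * people$age + 0.05 * people$educ))

model <- glm(married ~ age * educ, data = people, family = "binomial")

# Evaluate the model at chosen input levels, as with educ = c(10, 16)
inputs <- expand.grid(age = c(25, 30, 40, 50), educ = c(10, 16))
inputs$link        <- unname(predict(model, newdata = inputs, type = "link"))
inputs$probability <- unname(predict(model, newdata = inputs, type = "response"))
print(inputs)
```

The `link` column is on the log-odds scale. The `probability` column is the same quantity passed through the logistic function, which is why it always lies between 0 and 1.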

SLIDE 5

Using fmodel()

SLIDE 6

Designing graphs of models

fmodel(married_model, ~ age + sex + sector + educ, data = CPS85, type = "response", educ = c(10, 16))

  • 1. Response variable always on y-axis
  • 2. Explanatory variables of primary interest on x-axis
  • 3. Choose one, two, or three variables you want in display
  • 4. For any other variables, choose a fixed value that's of interest

fmodel() handles steps 2 and 3 automatically, and step 4 either automatically or manually

SLIDE 7

Let's practice!

SLIDE 8

Categorical response variables

Danny Kaplan

Instructor

SLIDE 9

The question at hand

For a quantitative response variable and a...
  • Quantitative explanatory variable: effect size is a rate
  • Categorical explanatory variable: effect size is a difference

But what happens when the response variable is categorical?

SLIDE 10

Model output for categorical response

Two ways to frame the output:
  • As categories or classes
  • As probabilities

SLIDE 11

Example: marital status

# Create model and set inputs
library(rpart)  # rpart() is provided by the rpart package
married_model <- rpart(married ~ educ + sex + age, data = CPS85, cp = 0.005)

# Output as a category (i.e. class)
evaluate_model(married_model, type = "class", age = c(25, 30), educ = 12, sex = "F")

  educ sex age model_output
1   12   F  25      Married
2   12   F  30      Married

SLIDE 12

Example: marital status

# Output as a probability
evaluate_model(married_model, type = "prob", age = c(25, 30), educ = 12, sex = "F")

  educ sex age model_output.Married model_output.Single
1   12   F  25            0.6333333           0.3666667
2   12   F  30            0.7425743           0.2574257

An extra 5 years of age (25 to 30) is associated with an increase of about 11 percentage points in the probability of being married (from 0.63 to 0.74)
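The arithmetic behind this effect size can be checked directly from the two probabilities in the model output. The variable names in this short sketch are chosen for illustration only.

```r
# Probabilities of "Married" from the model output at ages 25 and 30
p_25 <- 0.6333333
p_30 <- 0.7425743

# Effect size as a difference in probability (about 0.11, i.e. 11 percentage points)
change <- p_30 - p_25

# The same change expressed as a rate per year of age
per_year <- change / (30 - 25)

print(c(change = change, per_year = per_year))
```

Note that this is a change in absolute probability. The relative change, `p_30 / p_25`, is a different quantity (roughly a 17% increase), so it helps to state which one is meant.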

SLIDE 13

Let's practice!

SLIDE 14

Interactions among explanatory variables

Danny Kaplan

Instructor

SLIDE 15

Interaction

The effect size of one variable may change with the values of the other explanatory variables

SLIDE 16

Probability of being married

married_model <- glm(married == "Married" ~ educ * sector * sex + age,
                     data = CPS85, family = "binomial")

fmodel(married_model, ~ age + sex + sector + educ, data = CPS85, type = "response")

SLIDE 17

Interactions and model architecture

  • lm() includes interactions only if you ask for them
  • rpart() has interactions built into the method

SLIDE 18

World swimming records

SLIDE 19

World swimming records

SLIDE 20

World swimming records

SLIDE 21

World swimming records

mod1 <- rpart(time ~ sex + year, data = SwimRecords)
mod2 <- lm(time ~ sex + year, data = SwimRecords)

SLIDE 22

Formulas with interactions

Must specify interaction explicitly in lm()

mod3 <- lm(time ~ sex * year, data = SwimRecords)
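What the `*` in a formula changes can be seen by comparing slopes across groups. The sketch below uses the built-in `mtcars` data rather than `SwimRecords`, and `slope_of_wt()` is a hypothetical helper defined only for this example.

```r
cars <- mtcars
cars$am <- factor(cars$am)  # transmission type: "0" or "1"

additive    <- lm(mpg ~ wt + am, data = cars)  # no interaction
interacting <- lm(mpg ~ wt * am, data = cars)  # wt, am, and their interaction

# Hypothetical helper: change in model output per unit of wt,
# evaluated within one transmission group
slope_of_wt <- function(model, am_level) {
  inputs <- data.frame(
    wt = c(3, 4),
    am = factor(am_level, levels = levels(cars$am))
  )
  unname(diff(predict(model, newdata = inputs)))
}

additive_slopes    <- c(slope_of_wt(additive, "0"), slope_of_wt(additive, "1"))
interacting_slopes <- c(slope_of_wt(interacting, "0"), slope_of_wt(interacting, "1"))

print(additive_slopes)     # identical in both groups
print(interacting_slopes)  # differs between groups
```

With `+`, the model is forced to use one slope for both groups. With `*`, each group gets its own slope, which is what it means for the effect size of one variable to depend on another.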

SLIDE 23

Does an interaction improve a model?

Use cross validation to see which is better:
  • mod2: ~ year + sex
  • mod3: ~ year * sex

t.test(mse ~ model, data = cv_pred_error(mod2, mod3))

data:  mse by model
t = 20, df = 18, p-value = 1.323e-13
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 4.2 5.2
sample estimates:
mean in group mod2 mean in group mod3
                17                 12
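The comparison above can be approximated by hand with k-fold cross validation in base R. This is a sketch under stated assumptions, not the implementation of `cv_pred_error()`: the data are simulated with a built-in interaction, and `cv_mse()`, the fold count, and the number of trials are choices made for this example.

```r
set.seed(42)
n <- 200

# Simulated (hypothetical) data in which the slope on x depends on group g
sim <- data.frame(
  x = runif(n, 0, 10),
  g = factor(sample(c("a", "b"), n, replace = TRUE))
)
sim$y <- 1 + 2 * sim$x + ifelse(sim$g == "b", 3 * sim$x, 0) + rnorm(n, sd = 0.5)

# Hypothetical helper: one k-fold cross-validation trial, returning the
# mean square prediction error on held-out rows
cv_mse <- function(formula, data, k = 10) {
  response <- all.vars(formula)[1]
  fold <- sample(rep(seq_len(k), length.out = nrow(data)))
  errors <- numeric(nrow(data))
  for (i in seq_len(k)) {
    held_out <- fold == i
    fit <- lm(formula, data = data[!held_out, ])
    predicted <- predict(fit, newdata = data[held_out, ])
    errors[held_out] <- data[[response]][held_out] - predicted
  }
  mean(errors^2)
}

# Several trials per model, since each trial uses a different random split
mse_additive    <- replicate(5, cv_mse(y ~ x + g, sim))
mse_interaction <- replicate(5, cv_mse(y ~ x * g, sim))

# Compare the two sets of cross-validated errors
print(t.test(mse_additive, mse_interaction))
```

A lower cross-validated error for the interaction model suggests the interaction term improves out-of-sample predictions rather than merely fitting the training data more closely.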

SLIDE 24

Let's practice!