
SLIDE 1

Cordeiro, C.; Machás, A. and Neves, M.

The useR! Conference, Vienna, Austria, June 15-17, 2006

Missing Data, PLS and Bootstrap: A Magical Recipe?

Research questions...

- Could the missing data method change the quality of the results obtained from a Customer Satisfaction market study?
- Could standard or classical imputation methods be applied no matter the rate of non-responses?
- Could Bootstrap improve the quality of estimates?

Missing Data

Standard practices to treat non-responses are not statistically justified and could result in biased estimates.

Data imputation methods are used for reconstructing the incomplete data to obtain a complete data set and produce more accurate estimates.

Most common methods to treat missing data are:

- Mean imputation
- Listwise deletion
- Pairwise deletion
- Maximum Likelihood
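
Two of the methods listed above can be sketched in a few lines. The snippet below is a minimal illustration in Python (the talk itself used R), assuming a data matrix stored as a list of rows with `None` marking a non-response; the function names and the `survey` data are hypothetical and do not come from the slides.

```python
from statistics import mean

def listwise_deletion(rows):
    """Keep only the rows that have no missing (None) entries."""
    return [row for row in rows if all(v is not None for v in row)]

def mean_imputation(rows):
    """Replace each missing entry by the mean of the observed values in its column.

    Assumes every column has at least one observed value.
    """
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        col_means.append(mean(observed))
    return [[col_means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]

# Hypothetical answers on a 10-point scale; None marks a non-response.
survey = [
    [7, 8, None],
    [6, None, 9],
    [8, 7, 10],
    [9, 9, 8],
]
complete_cases = listwise_deletion(survey)   # drops the first two respondents
imputed = mean_imputation(survey)            # keeps all four respondents
```

Listwise deletion shrinks the sample, while mean imputation keeps every respondent but reduces the variability of the imputed columns, which is one reason such practices can bias estimates.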

Missing Data Methods

MODEL BASED METHODS

- Multiple Imputation (MI)
- Maximum Likelihood (ML)
- Expectation Maximization (EM)

IMPUTATION METHODS

- Mean, Modal and Median
- Nearest Neighbour (NN)
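
As an illustration of the nearest neighbour idea, the sketch below fills each incomplete row from the closest complete row (the donor). It is a simplified Python example under stated assumptions: Euclidean distance computed only over the entries observed in the incomplete row, a single donor, and `None` as the missing marker. The slides do not specify the distance or the number of neighbours the authors used, and `nn_impute` is a hypothetical name.

```python
import math

def nn_impute(rows):
    """Fill each incomplete row with values from its nearest complete row."""
    donors = [r for r in rows if all(v is not None for v in r)]
    if not donors:
        raise ValueError("nearest neighbour imputation needs at least one complete row")

    def distance(row, donor):
        # Euclidean distance over the entries that are observed in `row`
        return math.sqrt(sum((v - d) ** 2
                             for v, d in zip(row, donor) if v is not None))

    result = []
    for row in rows:
        if all(v is not None for v in row):
            result.append(list(row))
            continue
        nearest = min(donors, key=lambda d: distance(row, d))
        # Copy the donor's value only where the row has a gap
        result.append([d if v is None else v for v, d in zip(row, nearest)])
    return result

data = [
    [7, 8, 9],
    [2, 3, 1],
    [6, None, 8],
    [None, 2, 2],
]
filled = nn_impute(data)
```

Here the third row is closest to the first donor and the fourth row to the second, so each gap is filled from a respondent with a similar answer pattern.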

SLIDE 2

Missing Data and Bootstrap

Efron (1994) uses the extensive imputation theory developed by Rubin (1987).

The simplest nonparametric bootstrap approach:

- The rows in the original data matrix are resampled with replacement.
- A bootstrap matrix is obtained and a bootstrap estimate is calculated for the parameter under study.
- Extensive computation follows: the above procedure is repeated many times, a large number of estimates are calculated, and these are imputed in the original data.
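
The row-resampling step described above can be sketched as follows. This is a minimal Python illustration, not the authors' R code: the helper names, the toy data and the choice of the first-column mean as the parameter under study are all assumptions made for the example.

```python
import random
from statistics import mean

def bootstrap_estimates(rows, statistic, n_replications=1000, seed=0):
    """Resample rows with replacement and evaluate `statistic` on each resample."""
    rng = random.Random(seed)
    n = len(rows)
    estimates = []
    for _ in range(n_replications):
        # A bootstrap matrix: n rows drawn with replacement from the original
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(sample))
    return estimates

def first_column_mean(rows):
    """Example parameter under study: the mean of the first variable."""
    return mean(row[0] for row in rows)

data = [[7, 8], [6, 9], [8, 7], [9, 9], [5, 6]]
estimates = bootstrap_estimates(data, first_column_mean,
                                n_replications=500, seed=42)
bootstrap_mean = mean(estimates)
```

Whole rows are resampled rather than individual cells, so the relationships between variables within a respondent are preserved in every bootstrap matrix.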

Case Study: Bootstrap and SEM-PLS on CSM

- ACSI Model for Mobile Telecom (Fornell, C)
- SEM estimated with PLS algorithm (Chin, W)
- Data treatment for missing data: standard procedure, mean imputation

[Figure: ACSI Model, with Image as in the EPSI Model. Structural model over the latent variables Image, Expectations, Quality, Value, Customer Satisfaction (CS) and Customer Loyalty (CL); measurement model (number of questions).]

Methodological aspects

Compared scenarios:

- 1st scenario: 10% rate of non-responses, from the original data matrix X
- 2nd scenario: 50% rate of non-responses, from the simulated data matrix Y

[Figure: Rate of non responses (%), 1st scenario vs. 2nd scenario]
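
A simulated matrix with a target rate of non-responses could be produced along the following lines. The slides do not describe how matrix Y was generated, so this Python sketch simply assumes values missing completely at random: a fixed fraction of cells, chosen uniformly, is blanked out. `make_missing` and the toy `complete` matrix are hypothetical.

```python
import random

def make_missing(rows, rate, seed=0):
    """Return a copy of `rows` with a fraction `rate` of cells set to None."""
    if not 0 <= rate <= 1:
        raise ValueError("rate must be between 0 and 1")
    rng = random.Random(seed)
    cells = [(i, j) for i in range(len(rows)) for j in range(len(rows[0]))]
    n_missing = round(rate * len(cells))
    result = [list(row) for row in rows]
    # Every cell is equally likely to be blanked, independent of its value
    for i, j in rng.sample(cells, n_missing):
        result[i][j] = None
    return result

# Toy complete matrix: 20 respondents, 5 questions on a 1-10 scale
complete = [[(i + j) % 10 + 1 for j in range(5)] for i in range(20)]
scenario_50 = make_missing(complete, 0.50, seed=1)
missing_rate = sum(v is None for row in scenario_50 for v in row) / 100
```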

Using R

Bootstrap application in R:

- Step 1: matrix rows are resampled with replacement;
- Step 2: a bootstrap sample is obtained;
- Step 3: a bootstrap estimate is computed according to the missing data method;
- Step 4: go to step 1.

SLIDE 3

Using R

- This procedure was repeated r = 5000 times;
- Missing values in scenarios 1 and 2 are replaced with new estimates generated by the 5000 replications;
- Then, a new PLS estimation is performed.
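
The replace-by-bootstrap-estimate step might look like the sketch below. It is written in Python rather than R and rests on two assumptions that the slides do not confirm: the missing data method applied in each replication is the mean of the observed values in a column, and the replications are combined by averaging. The names `bootstrap_impute` and `observed_column_means` are hypothetical, and the PLS re-estimation is not shown.

```python
import random
from statistics import mean

def observed_column_means(rows):
    """Missing data method used per replication (assumed): observed column means."""
    means = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(mean(observed) if observed else None)
    return means

def bootstrap_impute(rows, method=observed_column_means, r=5000, seed=0):
    """Fill missing cells with the average of r bootstrap estimates per column."""
    rng = random.Random(seed)
    n, n_cols = len(rows), len(rows[0])
    per_column = [[] for _ in range(n_cols)]
    for _ in range(r):
        # Steps 1-2: resample rows with replacement to get a bootstrap sample
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        # Step 3: bootstrap estimate according to the missing data method
        for j, estimate in enumerate(method(sample)):
            if estimate is not None:
                per_column[j].append(estimate)
    # Combine the replications (assumed: simple average); None if nothing observed
    fill = [mean(ests) if ests else None for ests in per_column]
    return [[fill[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]

data = [[7, None], [6, 9], [None, 7], [9, 9], [5, 6]]
completed = bootstrap_impute(data, r=200, seed=1)  # small r to keep the demo fast
```

The completed matrix would then be passed to the PLS estimation in place of the mean-imputed one.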

Both scenarios, using the bootstrap methodology, were compared with the classical situation (CSM estimation based on PLS, where mean imputation is the ad hoc procedure adopted for the ECSI/EPSI model).

Case Study questions...

- How do the classical missing data techniques perform in the two scenarios?
- How does the Bootstrap perform with the missing data techniques in the two scenarios?
- What can be concluded from quality measures of model adjustment such as R-squared and residual variance?
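
For readers unfamiliar with the two quality measures named above, the sketch below computes R-squared and residual variance for an ordinary least squares fit with a single predictor. This is a simplified stand-in, not the PLS path model from the case study; `fit_quality` and the example scores are hypothetical.

```python
from statistics import mean

def fit_quality(x, y):
    """Return (R-squared, residual variance) of a simple linear regression of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ss_res = sum(e ** 2 for e in residuals)        # unexplained variation
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)    # total variation
    r_squared = 1 - ss_res / ss_tot
    # Two parameters (slope, intercept) are estimated, hence n - 2
    residual_variance = ss_res / (len(y) - 2)
    return r_squared, residual_variance

# Hypothetical scores for a predictor and an outcome
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r_squared, residual_variance = fit_quality(x, y)
```

A higher R-squared and a lower residual variance after imputation indicate a better-fitting model, which is how competing missing data treatments can be compared on the same data.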

Conclusion

- 1st Scenario (10%): Bootstrap methodology doesn't increase the quality of estimates.
- 2nd Scenario (50%): Bootstrap methodology used with Hot Deck Imputation and K Nearest Neighbor achieves good results.

Overall, it was seen that for higher non-response rates, bootstrap is the best method to adopt when data are missing completely at random.

The work still goes on...

- Perform extensive theoretical work
- Improve the performance of some methods
- Explore other bootstrap approaches to estimation in the problem of missing data

THANK YOU