Dealing with missing data in - PowerPoint PPT Presentation

Dealing with missing data in practice: Methods, applications, and implications for HIV cohort studies
Belen Alejos Ferreras
Centro Nacional de Epidemiología, Instituto de Salud Carlos III
19 October 2017

What is missing or incomplete data?

Data that were intended to be collected on observations but that, for various reasons, were not collected.

Figure: example dataset illustrating missing or incomplete data (a dot marks a missing value).

V1  V2  V3  V4
X   X   X   .
X   X   X   .
X   X   X   .
X   X   X   .

Do I need to be worried about missing data?

Importance and consequences

• There is no universal rule for the proportion of missing data that produces bias or invalid results.
• The success of a statistical analysis in the presence of missing data depends on the reasons why the data are missing (the missing data mechanisms).

Which missing data mechanisms are there?

• Missing Completely At Random (MCAR)
• Missing At Random (MAR)
• Missing Not At Random (MNAR)

Missing data mechanisms

Missing completely at random (MCAR): whether an observation is missing is unrelated to the unseen value and to any other values, observed or missing.
$P(R \mid Y) = P(R)$

Missing at random (MAR): whether an observation is missing is unrelated to the unseen value, but it is related to some of the observed data.
$P(R \mid Y) = P(R \mid Y_{obs})$

Missing not at random (MNAR): whether an observation is missing depends on the unseen value itself.

R = missing data indicator (whether a data point is missing); Y = variables
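The three mechanisms can be illustrated with a small simulation. This is only a sketch under assumed settings: the variables `x` and `y`, the missingness probabilities and the function names are hypothetical and do not come from the slides. Here `x` is always observed and `y` may be deleted.

```python
import random

def simulate(mechanism, n=10000, seed=0):
    """Return (x, y) pairs where y is set to None according to the mechanism."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)        # fully observed covariate
        y = x + rng.gauss(0, 1)    # variable that may go missing
        if mechanism == "MCAR":
            p_missing = 0.3                      # same for everyone
        elif mechanism == "MAR":
            p_missing = 0.6 if x > 0 else 0.1    # depends only on observed x
        elif mechanism == "MNAR":
            p_missing = 0.6 if y > 0 else 0.1    # depends on the unseen y itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        rows.append((x, None if rng.random() < p_missing else y))
    return rows

def observed_mean(rows):
    """Mean of y over the records where y was observed."""
    ys = [y for _, y in rows if y is not None]
    return sum(ys) / len(ys)

for mechanism in ("MCAR", "MAR", "MNAR"):
    print(mechanism, round(observed_mean(simulate(mechanism)), 2))
```

The true mean of `y` is 0 in this setup. The unadjusted observed mean stays close to 0 under MCAR but is pulled below 0 under MAR and MNAR; under MAR the bias could in principle be removed by using `x`, whereas under MNAR the observed data alone are not enough.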

Methods to deal with missing data

If it is not possible to get the original value … it is necessary to face the problem with statistical techniques.

Methods to deal with missing data

Ad-hoc or conventional
• Complete-Case (CC)
• Indicator Method (IM)
• Simple mean or regression mean imputation
• Stochastic regression imputation

Properties:
• Easy implementation
• No specific software
• Not based on statistical principles
• Might produce biased results and loss of power

Advanced or complex
• Multiple Imputation by Chained Equations (MICE)
• Maximum likelihood estimation
• Bayesian methods
• Inverse probability weighting

Properties:
• Maximize use of available information
• More precise results (higher statistical power)
• Depend on missing data mechanism
• Some not implemented in statistical software


Complete-Cases

Consists of restricting the statistical analyses to the cases with complete information for all the variables in the model.

Original
ID  Outcome  Variable  Complete-Case
1   5        4         Yes
2   4        .         No
3   .        2         No
4   3        .         No
5   4        5         Yes

Complete-cases
ID  Outcome  Variable  Complete-Case
1   5        4         Yes
5   4        5         Yes
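As a minimal sketch, the restriction to complete cases can be written as a filter. The records reproduce the slide's example table, with `None` standing for a missing value; the field names and the helper `complete_cases` are illustrative choices, not part of the original deck.

```python
# Records from the slide's example; None stands for a missing value (".").
records = [
    {"id": 1, "outcome": 5, "variable": 4},
    {"id": 2, "outcome": 4, "variable": None},
    {"id": 3, "outcome": None, "variable": 2},
    {"id": 4, "outcome": 3, "variable": None},
    {"id": 5, "outcome": 4, "variable": 5},
]

def complete_cases(rows, fields):
    """Keep only the rows with an observed value in every listed field."""
    return [r for r in rows if all(r[f] is not None for f in fields)]

cc = complete_cases(records, ["outcome", "variable"])
print([r["id"] for r in cc])  # prints [1, 5]
```

Three of the five records are discarded here, which shows how quickly the analysed sample can shrink.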

Indicator method

Creates an extra category for missing values in each incomplete, independent, categorical variable, so that all the observations are included in the analyses.

Original
ID  Outcome  Variable  Complete-Case
1   5        0         1
2   4        .         0
3   4        1         1
4   3        .         0
5   4        1         1

Indicator Method
ID  Outcome  Variable  Complete-Case
1   5        0         1
2   4        9         0
3   4        1         1
4   3        9         0
5   4        1         1
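The recoding in the table above can be sketched as follows. The code 9 for the extra category is taken from the slide's example; the field names and the helper `add_missing_category` are hypothetical.

```python
MISSING_CATEGORY = 9  # extra category code used in the slide's example

# Records from the slide's example; None stands for a missing value (".").
records = [
    {"id": 1, "outcome": 5, "variable": 0},
    {"id": 2, "outcome": 4, "variable": None},
    {"id": 3, "outcome": 4, "variable": 1},
    {"id": 4, "outcome": 3, "variable": None},
    {"id": 5, "outcome": 4, "variable": 1},
]

def add_missing_category(rows, field, code=MISSING_CATEGORY):
    """Return copies of the rows with missing values of `field` recoded to `code`."""
    return [{**r, field: code if r[field] is None else r[field]} for r in rows]

recoded = add_missing_category(records, "variable")
print([r["variable"] for r in recoded])  # prints [0, 9, 1, 9, 1]
```

All five records stay in the analysis, with the categorical covariate now taking the values 0, 1 and 9.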

Simple imputation methods

The information collected in the sample is used to assign one value to each missing value.

• Simple mean imputation replaces each missing observation with the mean of the completers.
• Regression mean imputation replaces each missing observation with the value predicted by a regression model.
• Random or stochastic regression imputation creates the imputed value by adding an appropriate random residual to the value predicted by regression mean imputation.

Solution: Multiple Imputation
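The three simple imputation methods can be compared on a toy example. This is a sketch only: the data, the single predictor `x` and all function names are hypothetical, and the regression is an ordinary least-squares line fitted on the completers.

```python
import random

# Hypothetical toy data: x is fully observed, y has missing values (None).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, None, 8.2, None, 12.0]

def mean_imputation(y):
    """Replace each missing value with the mean of the observed values."""
    observed = [v for v in y if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in y]

def fit_line(x, y):
    """Least-squares fit of y = a + b*x on the completers; returns a, b, residual SD."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n = len(pairs)
    mx = sum(p[0] for p in pairs) / n
    my = sum(p[1] for p in pairs) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in pairs)
         / sum((xi - mx) ** 2 for xi, _ in pairs))
    a = my - b * mx
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in pairs)
    return a, b, (sse / (n - 2)) ** 0.5

def regression_imputation(x, y, stochastic=False, seed=0):
    """Impute with the regression prediction, optionally plus a random residual."""
    a, b, sd = fit_line(x, y)
    rng = random.Random(seed)
    imputed = []
    for xi, yi in zip(x, y):
        if yi is None:
            yi = a + b * xi
            if stochastic:
                yi += rng.gauss(0, sd)  # random residual around the prediction
        imputed.append(yi)
    return imputed

print(mean_imputation(y))
print(regression_imputation(x, y))
print(regression_imputation(x, y, stochastic=True))
```

Whichever variant is used, each gap receives a single value that the later analysis treats as if it had been observed.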

Multiple Imputation methods

Imputation techniques that assign several imputed values to each missing value using the following procedure:

1. Starting from the dataset with missing values, the missing values are imputed M times, giving imputed datasets 1, 2, 3, ..., M.
2. The final model is fitted to each imputed dataset, giving estimators 1, 2, 3, ..., M.
3. The M estimators are combined into a final estimator.

The total variance is the sum of the within-imputation variance and the between-imputation variance, corrected for a finite number of imputations M.
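The combination step is commonly carried out with Rubin's rules. The sketch below assumes that this is what the variance statement above refers to, with the factor (1 + 1/M) as the finite-M correction applied to the between-imputation variance; the function name and the input numbers are hypothetical.

```python
def pool_rubin(estimates, variances):
    """Combine M estimates and their variances with Rubin's rules."""
    m = len(estimates)
    pooled = sum(estimates) / m                                   # final estimator
    within = sum(variances) / m                                   # within-imputation variance
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1) # between-imputation variance
    total = within + (1 + 1 / m) * between                        # total variance
    return pooled, total

# Hypothetical estimates and variances from M = 3 imputed datasets.
estimate, total_variance = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(estimate, total_variance)
```

With these inputs the within-imputation variance is 0.5 and the between-imputation variance is 1.0, so the total is 0.5 + (4/3) × 1.0.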

Multiple Imputation methods

Multiple Imputation by Chained Equations (MICE)

A particular multiple imputation technique that allows missing values in multiple variables to be imputed under the MAR assumption. Logistic, multinomial or ordered regression can be used instead of linear regression for non-normal variables.

Figure: missing values in X1, X2, X3.

Multiple imputation: the complete process is repeated m times.
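The chained-equations idea can be sketched for two continuous variables. This is a deliberately simplified illustration, not the implementation found in statistical packages: it uses linear regression only, adds residual noise but does not draw the regression parameters themselves, and the data, function names and number of iterations are all hypothetical.

```python
import random

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns a, b and the residual SD."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, (sse / max(n - 2, 1)) ** 0.5

def chained_imputation(x1, x2, n_iter=10, rng=None):
    """One imputed dataset: cycle over the variables, re-imputing each from the other."""
    if rng is None:
        rng = random.Random(0)
    cols = [list(x1), list(x2)]
    miss = [[v is None for v in col] for col in cols]
    # Starting point: fill every gap with the observed mean of its column.
    for col, flags in zip(cols, miss):
        observed = [v for v, is_missing in zip(col, flags) if not is_missing]
        mean = sum(observed) / len(observed)
        for i, is_missing in enumerate(flags):
            if is_missing:
                col[i] = mean
    # Chained equations: regress each variable on the other, in turn.
    for _ in range(n_iter):
        for target in (0, 1):
            predictor = cols[1 - target]
            rows = [i for i, is_missing in enumerate(miss[target]) if not is_missing]
            a, b, sd = fit_line([predictor[i] for i in rows],
                                [cols[target][i] for i in rows])
            for i, is_missing in enumerate(miss[target]):
                if is_missing:
                    cols[target][i] = a + b * predictor[i] + rng.gauss(0, sd)
    return cols

def mice(x1, x2, m=5, seed=0):
    """Repeat the complete process m times to obtain m imputed datasets."""
    return [chained_imputation(x1, x2, rng=random.Random(seed + k)) for k in range(m)]

# Hypothetical data with gaps in both variables.
x1 = [1.0, 2.0, None, 4.0, 5.0, 6.0, None, 8.0]
x2 = [1.2, None, 3.1, 3.9, None, 6.2, 7.1, 7.8]
datasets = mice(x1, x2, m=3)
print(len(datasets))  # prints 3
```

Each of the resulting datasets would then be analysed separately and the estimates pooled. For a binary or categorical variable, the linear fit in the inner loop would be replaced by a logistic, multinomial or ordered model, as the slide notes.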

Other advanced methods

• Maximum likelihood estimation simultaneously models the outcome and the reason why data are missing.
• Bayesian methods estimate a statistical model for the full data (including the missingness mechanism and the outcome).
• Inverse probability weighting calculates, for each patient, the predicted probability that a given variable is observed, and uses the inverse of these probabilities as weights in the outcome model.
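Inverse probability weighting can be illustrated with a weighted mean. In this sketch the probability of being observed is estimated as the observed fraction within each group, as a stand-in for the regression model that would normally be used; the records, the grouping variable and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (group, outcome); None marks a missing outcome.
records = [
    ("A", 10.0), ("A", 10.0), ("A", None), ("A", None),
    ("B", 20.0), ("B", 20.0), ("B", 20.0), ("B", 20.0),
]

def complete_case_mean(rows):
    """Unweighted mean over the records with an observed outcome."""
    ys = [y for _, y in rows if y is not None]
    return sum(ys) / len(ys)

def ipw_mean(rows):
    """Mean of the observed outcomes, weighted by 1 / P(observed | group)."""
    total = defaultdict(int)
    seen = defaultdict(int)
    for group, y in rows:
        total[group] += 1
        seen[group] += y is not None
    # Estimated probability of being observed, per group.
    p_observed = {g: seen[g] / total[g] for g in total}
    numerator = sum(y / p_observed[g] for g, y in rows if y is not None)
    denominator = sum(1 / p_observed[g] for g, y in rows if y is not None)
    return numerator / denominator

print(ipw_mean(records))  # prints 15.0
print(complete_case_mean(records))
```

Group A has half of its outcomes missing, so each observed record in A counts twice. The weighted mean therefore gives both groups equal weight again, whereas the complete-case mean is pulled towards group B.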

Real World Data case

Different Approaches to Account for Missing Data in a Cohort of HIV-Positive Patients

Objective: to compare three different methods to deal with missing data in both outcome (cause of death) and covariates in a cohort of HIV-positive patients (CoRIS).

• CoRIS (N=10,469)
• Cancer mortality: Poisson regression for mortality rates and rate ratios for the effect of hepatitis C virus coinfection
• Complete-case
• Indicator Method
• MICE
