DANGER: Understand your data, save yourself
Marylyn D Ritchie
June 29, 2015
Regression analysis issue
- In the pediatric null variant PheWAS, issues with
replication emerged
1) Some phenotypes were present in only discovery or replication, but not both
2) Some regression models were not converging
- Simultaneously, a quick AAA PheWAS of the top
hits from GWAS showed convergence issues
- Our investigations began….
Convergence is an issue
- We evaluated previous GWAS analyses, and the
majority of them had convergence issues
- What does this mean:
– Each variable in the model will have a p-value associated with it, and in some software, the regression model will also have a p-value associated with it, but the p-values may not be valid
- Bigger problem is that you need to look carefully
– Some software does not tell you the model did not converge
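To make the point about silent non-convergence concrete, here is a minimal sketch of a logistic regression fit that reports a convergence flag next to its estimates. It is not the routine used by PLATO, PLINK, R, Minitab, or SAS; the function name `fit_logistic`, the toy data, and the iteration limit are assumptions chosen for illustration.

```python
import math


def fit_logistic(x, y, max_iter=25, tol=1e-8):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson.

    Returns (b0, b1, converged). Hypothetical helper, one predictor only.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]].
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            eta = max(-30.0, min(30.0, b0 + b1 * xi))  # guard against exp() overflow
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det < 1e-12:
            return b0, b1, False  # information matrix is (near) singular
        # Newton step: solve the 2x2 system by hand.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            return b0, b1, True
    return b0, b1, False  # iteration limit reached without convergence


# Toy data (assumed): x is a binary predictor, y is case (1) / control (0).
x = [0] * 10 + [1] * 10
y_ok = [1] * 3 + [0] * 7 + [1] * 7 + [0] * 3   # cases and controls in both groups
y_sep = [1] * 3 + [0] * 7 + [0] * 10           # the x=1 group has controls only

b0_ok, b1_ok, converged_ok = fit_logistic(x, y_ok)
b0_sep, b1_sep, converged_sep = fit_logistic(x, y_sep)
print("balanced data:  beta =", round(b1_ok, 4), "converged =", converged_ok)
print("controls only:  beta =", round(b1_sep, 4), "converged =", converged_sep)
```

In the second fit the coefficient keeps drifting toward minus infinity until the iteration limit is hit. A program that prints the last iterate without the flag would still show a number for beta, which is why the flag has to be checked explicitly.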
Comparison between PLATO, PLINK, R, Minitab, SAS
Model : rs328 and 272.1
PLATO Output:
Outcome  Var1_ID  Var1_Pos    Var1_MAF  Num_Missing  Num_Cases  Converged  Var1_Pval    Var1_beta  Var1_SE   LRT_Pvalue
272.1    rs328    8:19819724  G:0.0972  772          394                   2.23585e-05  -0.663193  0.156416  1
PLINK Output: Same in PLINK1.07 and PLINK1.9
CHR  SNP    BP        A1  TEST  NMISS  OR  STAT  P
8    rs328  19819724  G   ADD   19728  NA  NA    NA
Note: No warning from PLINK
R output:
Comparison between PLATO, PLINK, R, Minitab, SAS
Model : rs328 and 272.1
SAS: It provides the following warning message:
Warning: The maximum likelihood estimate may not exist.
Warning: The LOGISTIC procedure continues in spite of the above warning. Results shown are based on the last maximum likelihood iteration. Validity of the model fit is questionable.
Minitab:
Comparison between PLINK1.07 and PLINK1.9
PLATO OUTPUT:
Var1_ID     Var1_Pos     Var1_MAF    Num_Missing  Num_Cases  Converged  Var1_Pval  Var1_beta  Var1_SE  LRT_Pval
rs11774033  8:137585559  T:0.981774  1            308        0          0.0469422  -0.570091  0.286    1
PLINK 1.07:
CHR  SNP         BP         A1  TEST  NMISS  OR  STAT  P
8    rs11774033  137585559  C   ADD   14457  NA  NA    NA
PLINK 1.9:
CHR  SNP         BP         A1  TEST  NMISS  OR     STAT   P
8    rs11774033  137585559  C   ADD   14457  1.799  2.046  0.04079
Why is this happening?
AAA - did not converge
Why is this happening?
Resistant Hypertension - did not converge
Why is this happening?
- Due to the nature of the imputation and the
potential for stratification, we need to adjust models for site, platform, sex, and PCs
- Having only cases or controls in some of these
strata leads to non-convergence
- If you evaluate the clinical model alone,
without SNP data, you will identify this early
- Such investigations are not typically done in
PheWAS
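The check on the clinical model can be sketched as a simple tabulation: for each adjustment covariate, count cases and controls per level and flag any level that contains only one of the two. This is an illustrative sketch, not part of any of the tools named above; the helper `single_class_strata`, the column names, and the toy rows are all hypothetical.

```python
from collections import defaultdict


def single_class_strata(rows, outcome, covariates):
    """Return {(covariate, level): (n_controls, n_cases)} for every covariate
    level in which all samples are cases or all samples are controls.

    The outcome column must be coded 0 (control) or 1 (case).
    """
    counts = defaultdict(lambda: [0, 0])
    for row in rows:
        for cov in covariates:
            counts[(cov, row[cov])][row[outcome]] += 1
    return {key: tuple(c) for key, c in counts.items() if 0 in c}


# Toy clinical table (assumed values): site C contributed controls only.
rows = [
    {"case": 1, "site": "A", "sex": "F"},
    {"case": 1, "site": "A", "sex": "M"},
    {"case": 0, "site": "A", "sex": "F"},
    {"case": 0, "site": "A", "sex": "M"},
    {"case": 1, "site": "B", "sex": "F"},
    {"case": 0, "site": "B", "sex": "M"},
    {"case": 0, "site": "B", "sex": "F"},
    {"case": 0, "site": "C", "sex": "F"},
    {"case": 0, "site": "C", "sex": "M"},
    {"case": 0, "site": "C", "sex": "M"},
]

flagged = single_class_strata(rows, outcome="case", covariates=["site", "sex"])
for (cov, level), (n_controls, n_cases) in flagged.items():
    print(cov, level, "controls:", n_controls, "cases:", n_cases)
```

Running a tabulation like this once per phenotype, before any SNP is added to the model, surfaces the problematic strata without fitting a single regression. Continuous covariates such as PCs are not covered by this sketch.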
Corrections
- Two options to deal with this issue
1) Drop any rows from the table where you have zeros
- Caveat: sample size may become small
- Caveat: works best when the case or control group with
data present is large
– no convergence issue when # cases = 10
– yes convergence issue when # controls = 100
2) Use Firth regression
- Caveat: not available in PLINK
- Caveat: may run slower in R than standard regression
(however, not slower in PLATO)
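To give a feel for what Firth regression does, here is a sketch of the one special case where it has a closed form: a single binary predictor with no other covariates. There, Firth's penalized estimate of the log odds ratio equals the ordinary estimate computed after adding 0.5 to each cell of the 2x2 table. Models adjusted for site, platform, sex, and PCs have no such shortcut and need an iterative Firth fit from a dedicated implementation. The function names and counts below are assumptions for illustration.

```python
import math


def ml_log_odds_ratio(cases_x1, controls_x1, cases_x0, controls_x0):
    """Ordinary maximum likelihood log odds ratio; undefined if any cell is zero."""
    if 0 in (cases_x1, controls_x1, cases_x0, controls_x0):
        return None  # the estimate is infinite, so the fit cannot converge
    return math.log((cases_x1 * controls_x0) / (controls_x1 * cases_x0))


def firth_log_odds_ratio(cases_x1, controls_x1, cases_x0, controls_x0):
    """Firth-penalized log odds ratio for a single binary predictor.

    Valid only for the 2x2 case with no covariates: add 0.5 to every cell.
    """
    a = cases_x1 + 0.5
    b = controls_x1 + 0.5
    c = cases_x0 + 0.5
    d = controls_x0 + 0.5
    return math.log((a * d) / (b * c))


# Toy counts (assumed): the x=1 group has no cases at all.
ml_sparse = ml_log_odds_ratio(0, 10, 3, 7)
firth_sparse = firth_log_odds_ratio(0, 10, 3, 7)

# Toy counts (assumed): every cell is populated.
ml_full = ml_log_odds_ratio(7, 3, 3, 7)
firth_full = firth_log_odds_ratio(7, 3, 3, 7)

print("sparse table: ML =", ml_sparse, " Firth =", firth_sparse)
print("full table:   ML =", ml_full, " Firth =", firth_full)
```

With an empty cell the ordinary estimate does not exist, while the penalized one stays finite. With all cells populated the two estimates are close, and the penalized one is pulled slightly toward zero.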
Hypothyroidism- Converged
Firth regression comparison between PLATO and SAS
ICD-9 code: 272.1
SNP: rs328
CHR:BP: 8:19819724
Allele:MAF: G:0.0972208
Num_Missing: 772
Cases: 394
Converged: 1
Raw_LRT_Pval: 2.5945569177565631e-06
SNP_Pval: 2.22132e-05
SNP_Beta: -0.651743
SNP_SE: 0.153662
LRT_Pval: 2.59456e-06
PLATO Output    SAS Output
Conclusion
- Be cautious when combining data from multiple sites where
case and control contributions are biased
- Look carefully at regression results to confirm convergence