save yourself Marylyn D Ritchie June 29, 2015 Regression analysis - - PowerPoint PPT Presentation



slide-1
SLIDE 1

DANGER: Understand your data, save yourself

Marylyn D Ritchie June 29, 2015

slide-2
SLIDE 2

Regression analysis issue

  • In the pediatric null variant PheWAS, issues with replication emerged:

1) Some phenotypes were present in only discovery or replication, but not both
2) Some regression models were not converging

  • Simultaneously, a quick AAA PheWAS for the top hits from GWAS noted a convergence issue

  • Our investigations began…
slide-3
SLIDE 3

Convergence is an issue

  • We evaluated previous GWAS analyses, and the majority of them had convergence issues

  • What does this mean?

– Each variable in the model will have a p-value associated with it, and in some software the regression model as a whole will also have a p-value, but these p-values may not be valid

  • The bigger problem is that you need to look carefully

– Some software does not tell you that the model did not converge
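The point about silent non-convergence can be illustrated with a minimal sketch. The function `fit_logistic` and the toy data below are hypothetical and are not taken from PLATO, PLINK, R, SAS, or Minitab; the sketch only shows a fitting routine that returns an explicit convergence flag alongside its estimates.

```python
import math


def fit_logistic(x, y, max_iter=25, tol=1e-8):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson (hypothetical helper).

    Returns (b0, b1, converged). `converged` is False when the iteration
    limit is reached or the information matrix is numerically singular.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det < 1e-12:
            return b0, b1, False
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            return b0, b1, True
    return b0, b1, False


# Toy data (assumed for illustration): x is a binary covariate.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y_mixed = [0, 0, 0, 1, 0, 1, 1, 1]      # cases and controls at both x levels
y_separated = [0, 0, 1, 1, 1, 1, 1, 1]  # x = 1 contains cases only

ok = fit_logistic(x, y_mixed)
sep = fit_logistic(x, y_separated)
print("mixed:    ", ok)
print("separated:", sep)
```

In the separated data the slope keeps growing at every iteration, so the routine stops at the iteration limit and still returns coefficient values. Only the flag distinguishes that result from a valid fit.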

slide-4
SLIDE 4

Comparison between PLATO, PLINK, R, Minitab, SAS

Model: SNP rs328 and ICD-9 code 272.1

PLATO output:

Outcome: 272.1
Var1_ID: rs328
Var1_Pos: 8:19819724
Var1_MAF: G:0.0972
Num_Missing: 772
Num_Cases: 394
Converged: (not shown)
Var1_Pval: 2.23585e-05
Var1_beta: -0.663193
Var1_SE: 0.156416
LRT_Pvalue: 1

PLINK output (same in PLINK 1.07 and PLINK 1.9):

CHR  SNP    BP        A1  TEST  NMISS  OR  STAT  P
8    rs328  19819724  G   ADD   19728  NA  NA    NA

Note: No warning from PLINK

R output:

slide-5
SLIDE 5

Comparison between PLATO, PLINK, R, Minitab, SAS

Model: SNP rs328 and ICD-9 code 272.1

SAS provides the following warning messages:

Warning: The maximum likelihood estimate may not exist.
Warning: The LOGISTIC procedure continues in spite of the above warning. Results shown are based on the last maximum likelihood iteration. Validity of the model fit is questionable.

Minitab:

slide-6
SLIDE 6

Comparison between PLINK1.07 and PLINK1.9

PLATO output:

Var1_ID: rs11774033
Var1_Pos: 8:137585559
Var1_MAF: T:0.981774
Num_Missing: 1
Num_Cases: 308
Converged: 0
Var1_Pval: 0.0469422
Var1_beta: -0.570091
Var1_SE: 0.286
LRT_Pval: 1

PLINK 1.07:

CHR  SNP         BP         A1  TEST  NMISS  OR  STAT  P
8    rs11774033  137585559  C   ADD   14457  NA  NA    NA

PLINK 1.9:

CHR  SNP         BP         A1  TEST  NMISS  OR     STAT   P
8    rs11774033  137585559  C   ADD   14457  1.799  2.046  0.04079
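As a rough check on how such p-values arise, the sketch below recomputes two of the reported values under the assumption that they are two-sided Wald tests (z = beta / SE, referred to a standard normal). That assumption is ours and is not stated on the slides. The inputs are the PLATO beta and SE reported for rs328 on slide 4 and the PLINK 1.9 STAT reported for rs11774033 on slide 6; the function names are hypothetical.

```python
import math


def z_to_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def wald_p_value(beta, se):
    """Two-sided Wald p-value from a coefficient and its standard error."""
    return z_to_p(beta / se)


# PLATO, rs328 vs 272.1 (slide 4): Var1_beta and Var1_SE as reported.
p_rs328 = wald_p_value(-0.663193, 0.156416)

# PLINK 1.9, rs11774033 (slide 6): STAT as reported.
p_plink19 = z_to_p(2.046)

print(f"rs328 Wald p-value:      {p_rs328:.6g}")
print(f"rs11774033 p from STAT:  {p_plink19:.4g}")
```

The formula returns a number for any beta and SE, including those taken from the last iteration of a fit that never converged, which is one way an invalid p-value can end up in a results table.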

slide-7
SLIDE 7

Why is this happening?

AAA - did not converge

slide-8
SLIDE 8

Why is this happening?

Resistant Hypertension - did not converge

slide-9
SLIDE 9

Why is this happening?

  • Due to the nature of the imputation and the potential for stratification, we need to adjust models for site, platform, sex, and PCs

  • Having only cases or only controls in some of these strata leads to non-convergence

  • If you evaluate the clinical model alone, without SNP data, you will identify this early

  • Such investigations are not typically done in PheWAS
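A check of this kind can be run before any SNP is added to the model. The sketch below is a minimal illustration with made-up records; the field names (`site`, `case`) and the helper `strata_without_both_outcomes` are assumptions, not part of any tool named on the slides. It tabulates cases and controls per stratum and reports strata that contain only one of the two.

```python
from collections import defaultdict


def strata_without_both_outcomes(rows, strata_keys, outcome_key="case"):
    """Return {stratum: (n_controls, n_cases)} for strata lacking cases or controls."""
    counts = defaultdict(lambda: [0, 0])  # [controls, cases]
    for row in rows:
        stratum = tuple(row[k] for k in strata_keys)
        counts[stratum][1 if row[outcome_key] else 0] += 1
    return {s: tuple(c) for s, c in counts.items() if 0 in c}


# Hypothetical records: site C contributes controls only.
rows = [
    {"site": "A", "case": 1}, {"site": "A", "case": 0},
    {"site": "A", "case": 1}, {"site": "A", "case": 0},
    {"site": "B", "case": 0}, {"site": "B", "case": 1},
    {"site": "B", "case": 0},
    {"site": "C", "case": 0}, {"site": "C", "case": 0},
    {"site": "C", "case": 0},
]

problem_strata = strata_without_both_outcomes(rows, ["site"])
print(problem_strata)
```

Passing several keys, for example `["site", "platform", "sex"]`, would tabulate the joint strata, which is where empty cells are more likely to appear.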

slide-10
SLIDE 10

Corrections

  • Two options to deal with this issue

1) Drop any rows from the table where you have zeros

  • Caveat: the sample size may become small
  • Caveat: works best when the case or control group with data present is large
– no convergence issue when # cases = 10
– convergence issue when # controls = 100

2) Use Firth regression

  • Caveat: not available in PLINK
  • Caveat: may run slower in R than standard regression (however, not slower in PLATO)
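For option 2, the following is a minimal sketch of Firth's penalized-likelihood logistic regression for an intercept and a single covariate, written in plain Python for illustration. It is not the PLATO, SAS, or R implementation; the function `fit_firth_logistic` and the toy data are assumptions. The score is modified by the hat-matrix diagonals, which keeps the estimate finite when one stratum contains only cases.

```python
import math


def fit_firth_logistic(x, y, max_iter=100, tol=1e-8):
    """Firth-penalized fit of logit P(y=1) = b0 + b1*x (hypothetical helper).

    Returns (b0, b1, converged).
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Fisher information X'WX for the design columns [1, x].
        a = sum(w)
        b = sum(wi * xi for wi, xi in zip(w, x))
        c = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = a * c - b * b
        if det < 1e-12:
            return b0, b1, False
        u0 = u1 = 0.0
        for xi, yi, pi, wi in zip(x, y, p, w):
            # Hat-matrix diagonal: w_i * [1, x_i] (X'WX)^-1 [1, x_i]'.
            h = wi * (c - 2.0 * b * xi + a * xi * xi) / det
            r = yi - pi + h * (0.5 - pi)  # Firth-modified residual
            u0 += r
            u1 += r * xi
        d0 = (c * u0 - b * u1) / det
        d1 = (a * u1 - b * u0) / det
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            return b0, b1, True
    return b0, b1, False


# Toy data (assumed): every subject with x = 1 is a case.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 1, 1, 1, 1, 1, 1]

firth_b0, firth_b1, firth_converged = fit_firth_logistic(x, y)
print(firth_b0, firth_b1, firth_converged)
```

For this two-group layout the penalized estimate matches what one gets by adding 0.5 to each cell of the 2x2 table, so the slope settles at a finite value where standard maximum likelihood would drift toward infinity.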

slide-11
SLIDE 11

Hypothyroidism- Converged

slide-12
SLIDE 12

Firth regression comparison between PLATO and SAS

ICD-9 code: 272.1
SNP: rs328
CHR:BP: 8:19819724
Allele:MAF: G:0.0972208
Num_Missing: 772
Cases: 394
Converged: 1
Raw_LRT_Pval: 2.5945569177565631e-06
SNP_Pval: 2.22132e-05
SNP_Beta: -0.651743
SNP_SE: 0.153662
LRT_Pval: 2.59456e-06

Panels: PLATO output; SAS output

slide-13
SLIDE 13

Conclusion

  • Be cautious when combining data from multiple sites where case and control contributions are biased

  • Look carefully at regression results to confirm convergence
slide-14
SLIDE 14

Thank you for finding, evaluating, and solving this issue:

Yuki Bradford, Molly Hall, Sarah Pendergrass, Anurag Verma, Shefali Verma, John Wallace