CAMDA 03: Weakest Link Models for Detecting Small Groups of Genes


CAMDA 03: Weakest Link Models for Detecting Small Groups of Genes to Predict Lung Cancer Survival Presenter: Thomas J. Richards, Ph.D. November 13, 2003 Affiliation: Dorothy P. & Richard P. Simmons Center for Interstitial Lung


slide-1
SLIDE 1

CAMDA ’03: Weakest Link Models for Detecting Small Groups of Genes to Predict Lung Cancer Survival

Presenter: Thomas J. Richards, Ph.D.

November 13, 2003

slide-2
SLIDE 2

Affiliation: Dorothy P. & Richard P. Simmons Center for Interstitial Lung Diseases in the Division of Pulmonary, Allergy, and Critical Care Medicine, University of Pittsburgh

slide-3
SLIDE 3

In Collaboration with: Roger S. Day, Sc.D.

Department of Biostatistics, University of Pittsburgh

and

University of Pittsburgh Cancer Institute

slide-4
SLIDE 4

Weakest Link Models

  • Make sense in biology;
  • Can be applied to gene expression data;
  • May identify novel gene interactions.
slide-5
SLIDE 5

Response: Plant Growth

5 Necessary factors:

  • Water;
  • Sunlight;
  • P;
  • K;
  • Ca;

How do these factors combine to affect plant growth?

slide-6
SLIDE 6

They don’t work together like this…

slide-7
SLIDE 7

They don’t work together like this…

slide-8
SLIDE 8

They may work together like this…

slide-9
SLIDE 9

They may work together like this…

slide-10
SLIDE 10

[Figure: contour plots of E(Y|X) over Water and Sun, in two panels: "Traditional Models" and "Weakest link model". Labels on the plots: excess water, excess sun, “Curve of Optimal Use (COU)”.]

Reality?
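The contrast between a traditional model and a weakest link model can be sketched numerically. This is a minimal illustration, not the presenter's code: the function names are hypothetical, and the linear response to each factor is an assumption made only for the example.

```python
def additive_growth(water, sun):
    """Traditional additive model: one factor can be traded off for the other."""
    return 0.5 * water + 0.5 * sun


def weakest_link_growth(water, sun):
    """Weakest link model: growth is set by the scarcer factor alone."""
    return min(water, sun)


# Plenty of water, little sun: extra water helps only in the additive model.
sun = 1.0
for water in (2.0, 4.0, 8.0):
    print(water, sun, additive_growth(water, sun), weakest_link_growth(water, sun))
```

Under the weakest link form, the response stays flat once water is in excess, which is what produces the right-angled contours.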

slide-11
SLIDE 11

They may work together like this…

slide-12
SLIDE 12

Or like this…

slide-13
SLIDE 13

Or like this…

slide-14
SLIDE 14

Or like this…

slide-15
SLIDE 15
slide-16
SLIDE 16

Source: H. Frederik Nijhout, American Scientist (2003)

slide-17
SLIDE 17

The Weakest Link Idea

$$E(Y_i) = \min_j \{\phi_j(x_{ij}; \theta_j) : j = 1, \dots, m\}$$

  • Usually, $\phi_j = \phi$ for all $j$;
  • The weakest link gene minimizes $\phi$;
  • Each patient has his/her own weakest link.
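A minimal sketch of the weakest link idea, assuming a common link function for all genes. The function name `weakest_link`, the identity link, and the expression values are hypothetical and chosen only for illustration.

```python
def weakest_link(x_i, phi):
    """Return (E(Y_i), index of the weakest link gene) for one patient.

    x_i : list of expression values x_ij, one per gene j
    phi : common link function applied to each gene (phi_j = phi for all j)
    """
    values = [phi(x) for x in x_i]
    j_min = min(range(len(values)), key=values.__getitem__)
    return values[j_min], j_min


def identity(x):
    """Simplest monotone choice of phi, for illustration only."""
    return x


# Hypothetical expression values for two patients over three genes.
patients = [[0.2, 1.5, 0.9], [1.1, 0.4, 2.0]]

for i, x_i in enumerate(patients):
    mean, gene = weakest_link(x_i, identity)
    print(f"patient {i}: E(Y) = {mean}, weakest link gene index = {gene}")
```

The two patients have different weakest link genes, even though the same model applies to both.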
slide-18
SLIDE 18

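WL Model for Binary Response Data: Parametric Weakest Link (PWL) Model

$$\phi_j(x_{ij}; \theta_j) = \operatorname{logit}^{-1}(\alpha_j + \beta_j x_{ij}), \qquad \theta_j = (\alpha_j, \beta_j)$$

$$E(Y_i) = \min_j \{\operatorname{logit}^{-1}(\alpha_j + \beta_j x_{ij}) : j = 1, \dots, m\}$$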
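The binary-response PWL mean can be evaluated directly from its definition. The sketch below assumes m = 2 genes with made-up parameters; `pwl_probability` is a hypothetical helper, not part of the presenter's software.

```python
import math


def inv_logit(z):
    """Inverse logit (logistic function)."""
    return 1.0 / (1.0 + math.exp(-z))


def pwl_probability(x_i, alphas, betas):
    """E(Y_i) = min_j logit^{-1}(alpha_j + beta_j * x_ij)."""
    return min(inv_logit(a + b * x) for x, a, b in zip(x_i, alphas, betas))


# Hypothetical parameters theta_j = (alpha_j, beta_j) for m = 2 genes.
alphas = [0.0, -1.0]
betas = [1.0, 2.0]

print(pwl_probability([0.0, 3.0], alphas, betas))  # first gene is the weakest link
print(pwl_probability([3.0, 0.0], alphas, betas))  # second gene is the weakest link
```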

slide-19
SLIDE 19
slide-20
SLIDE 20
slide-21
SLIDE 21

Parametric Weakest Link Model For Survival Data

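$$\lambda(t; x_i) = \lambda_0(t) \exp\left[\min_j \{\phi_j(x_{ij}; \theta_j)\}\right]$$

With $\phi_j(x_{ij}; \theta_j) = \beta_j x_{ij}$:

$$\lambda(t; x_i) = \lambda_0(t) \exp\left[\min_j \{\beta_j x_{ij}\}\right]$$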
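In the survival version, the weakest link enters through the hazard ratio relative to the baseline hazard. A minimal sketch with hypothetical coefficients (the name `wl_hazard_ratio` is an assumption for the example):

```python
import math


def wl_hazard_ratio(x_i, betas):
    """Hazard relative to baseline: exp(min_j beta_j * x_ij)."""
    return math.exp(min(b * x for x, b in zip(x_i, betas)))


betas = [0.5, 1.0]  # hypothetical coefficients for m = 2 genes

print(wl_hazard_ratio([2.0, 3.0], betas))  # min(1.0, 3.0) = 1.0, so the ratio is e
print(wl_hazard_ratio([2.0, 0.0], betas))  # min(1.0, 0.0) = 0.0, so the ratio is 1
```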

slide-22
SLIDE 22

Quantile-Matching Weakest Link (QWL) Model:

slide-23
SLIDE 23
slide-24
SLIDE 24
slide-25
SLIDE 25

Curve of Optimal Use:

$$p_2 = f_\Delta(p_1), \qquad F = \text{Normal or Logistic CDF};$$

$$f_\Delta(p) = 1 - F\left(F^{-1}(1 - p) - \Delta\right)$$

$$f_\Delta^{-1}(p) = 1 - F\left(F^{-1}(1 - p) + \Delta\right)$$
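A minimal sketch of the curve of optimal use, assuming the form given on the supplementary slides, f_Δ(p) = 1 − F(F⁻¹(1 − p) − Δ), with F taken to be the standard logistic CDF. The value Δ = 1 is arbitrary and the function names are hypothetical.

```python
import math


def F(z):
    """Standard logistic CDF."""
    return 1.0 / (1.0 + math.exp(-z))


def F_inv(p):
    """Logistic quantile function, defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def f_delta(p, delta):
    """f_Delta(p) = 1 - F(F^{-1}(1 - p) - Delta)."""
    return 1.0 - F(F_inv(1.0 - p) - delta)


def f_delta_inv(p, delta):
    """f_Delta^{-1}(p) = 1 - F(F^{-1}(1 - p) + Delta)."""
    return 1.0 - F(F_inv(1.0 - p) + delta)


# Points (p1, p2) on the curve p2 = f_Delta(p1) for an arbitrary Delta = 1.
for p1 in (0.1, 0.5, 0.9):
    print(p1, round(f_delta(p1, 1.0), 4))
```

With Δ = 0 the curve reduces to the diagonal p2 = p1; a nonzero Δ shifts it away from the diagonal on the quantile scale of F.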

         

slide-26
SLIDE 26
slide-27
SLIDE 27

Data Pre-processing: Simplify!

Simplify the process, minimize data handling:

  • Affy: run RMA, then generate ratios;
  • cDNA arrays: use ratios;
  • Focus on known genes only;
  • 2000 LocusLink IDs in all 4 data sets.
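Restricting the analysis to genes present in every data set amounts to a set intersection over gene identifiers. In the sketch below the data set names and ID values are placeholders, not the actual CAMDA data.

```python
# Placeholder gene IDs measured in each of four data sets (illustrative only).
datasets = {
    "set_a": {4175, 5054, 2335, 7448},
    "set_b": {4175, 5054, 2335, 1278},
    "set_c": {4175, 5054, 7448, 2335},
    "set_d": {2335, 4175, 5054, 6286},
}

# Keep only the IDs present in every data set.
common_ids = set.intersection(*datasets.values())
print(sorted(common_ids))
```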
slide-28
SLIDE 28

Approach to Data Analysis

Gene Selection: based on substantive hypotheses.

  • Use DAVID at NIAID to get gene classes;
  • Not optimal, but necessary in this case.
slide-29
SLIDE 29

Approach to Data Analysis

Groups of genes, from DAVID:

  • Cell Cycle (CELL, 24 genes);
  • Apoptosis (AP, 12 genes);
  • Extracellular Matrix (ECM, 18 genes);
  • Matrix Metalloproteinases (MMPs, 10 genes);
  • WNT Pathway (11 genes).
slide-30
SLIDE 30

Approach to Data Analysis

Form dyads of genes, for testing:

  • CELL.AP (288), CELL.ECM (432), …
  • AP.ECM (216), AP.MMP (120), …, etc.
  • Pair up all of the above genes with 45 genes from the Beer supplemental data.
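The dyad counts follow from multiplying the class sizes listed on the previous slide. The sketch below assumes each dyad takes one gene from each class and that no gene belongs to two classes; the quoted counts are consistent with that assumption.

```python
from itertools import combinations

# Group sizes as listed on the slides (number of genes per class).
group_sizes = {"CELL": 24, "AP": 12, "ECM": 18, "MMP": 10, "WNT": 11}
beer_genes = 45  # genes from the Beer supplemental data

# Dyads between two different classes: one gene from each class.
dyads = {f"{a}.{b}": group_sizes[a] * group_sizes[b]
         for a, b in combinations(group_sizes, 2)}

# Dyads pairing each class with the Beer genes.
dyads.update({f"{a}.BEER": n * beer_genes for a, n in group_sizes.items()})

for name in ("CELL.AP", "CELL.ECM", "AP.ECM", "AP.MMP",
             "ECM.MMP", "ECM.BEER", "WNT.BEER"):
    print(name, dyads[name])
```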

slide-31
SLIDE 31

Approach to Data Analysis

  • Use profile likelihood to estimate a COU for each pair of genes;
  • Use Bonferroni-by-4 on the p-values;
  • For the direction, take the one with the smallest of the four p-values.
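The direction choice and the Bonferroni-by-4 adjustment can be sketched as follows. The p-values are invented for the example, the helper name `pick_direction` is hypothetical, and capping the adjusted value at 1 is a common convention assumed here rather than something stated on the slide.

```python
def pick_direction(p_values):
    """Choose the direction with the smallest p-value and adjust it.

    p_values : dict mapping each of the four directions to its p-value
    Returns (direction, min p-value, Bonferroni-by-4 adjusted p-value).
    """
    direction = min(p_values, key=p_values.get)
    min_p = p_values[direction]
    adjusted = min(1.0, 4 * min_p)  # Bonferroni over the four directions
    return direction, min_p, adjusted


# Hypothetical p-values for one gene pair.
p_values = {"minp1p2": 0.20, "maxp1p2": 0.04, "maxp1q2": 0.003, "minp1q2": 0.50}
print(pick_direction(p_values))
```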

slide-32
SLIDE 32

Selected Results

  • CELL.AP: 60 of 288 had adjusted p < 0.05.
  • ECM.MMP: 37 of 180 had adjusted p < 0.05.
  • ECM.BEER: 299 of 810 had adjusted p < 0.05.
  • WNT.BEER: 152 of 495 had adjusted p < 0.05.

slide-33
SLIDE 33

Selected Results

  • CELL.AP, 60 significant pairs: 5 minp1p2; 17 maxp1p2; 13 maxp1q2; 25 minp1q2.
  • ECM.MMP, 37 significant pairs: 2 minp1p2; 6 maxp1p2; 11 maxp1q2; 18 minp1q2.
  • ECM.BEER, 299 significant pairs: 60 minp1p2; 65 maxp1p2; 100 maxp1q2; 74 minp1q2.
  • WNT.BEER, 152 significant pairs: 32 minp1p2; 19 maxp1p2; 56 maxp1q2; 45 minp1q2.

slide-34
SLIDE 34

Selected Results

LocusLink ID = 4175, a Cell Cycle component: MCM6, minichromosome maintenance deficient 6 (S. cerevisiae), involved in initiating replication.

Biological interaction with 7 LocusLink IDs in the apoptosis class (5 in same direction):

  • 2 minp1p2: TRAF1, TNFRSF1B;
  • 3 maxp1p2: SFRS2IP, MCL1, TRADD;
  • 1 maxp1q2: CRADD (good prognosis);
  • 1 minp1q2: BCL2L2.

slide-35
SLIDE 35

MCM6

  • MCMs 2-7 bind to DNA after mitosis and enable DNA replication.
  • MCM2 is a biomarker of proliferating cells and a marker for premalignant lung cells.
  • MCM6 is in a chromosomal region that is amplified in lung cancer, and its mRNA level is also increased (Kaminski, Dehan, unpublished data).

slide-36
SLIDE 36

Selected Results II

  • Can we find unexpected interactions?
  • Biological interactions between Beer & ECM?
  • ECM genes show up in every cancer dataset.
  • Fibronectin is a predictor of melanoma invasiveness.

slide-37
SLIDE 37
PAI-1 (Plasminogen Activator Inhibitor 1)

  • Is a known marker of bad prognosis;
  • Interacts significantly with at least 4 ECM genes:
  • Vitronectin maxp1p2 (Good Prognosis!)
  • Collagen 1A2 maxp1q2
  • Collagen 9A2 minp1q1
  • Collagen 5A1 minp1q1

slide-38
SLIDE 38

Does it make sense?

  • Elevated PAI-1 activities are associated with coronary thrombosis and with a poor prognosis in many cancers.
  • Vitronectin binding extends the lifetime of active PAI-1, which controls hemostasis and has also been implicated in angiogenesis.
  • The PAI-1 effects on cell adhesion and motility depend on vitronectin binding…

slide-39
SLIDE 39

Conclusions

Weakest Link Models:

  • Make sense in biology;
  • Can be applied to gene expression data;
  • May identify novel gene interactions.
slide-40
SLIDE 40

Next Steps

  • Validation on independent data set;
  • Extend from dyads to triads;
  • Use triads to explore pathways;
  • Extend to arbitrary number of genes.
slide-41
SLIDE 41

Acknowledgements:

  • Naftali Kaminski, M.D., Director, Dorothy P. & Richard P. Simmons Center for Interstitial Lung Diseases
  • Public Defenders’ Association

slide-42
SLIDE 42

Supplementary Slides

slide-43
SLIDE 43

18-Nov-03 Introduction: Motivation for Model 43

Potential Problems with Linear Models

  • Mechanistic model, not just predictive.
  • Several covariates impact a response.

– Example: immune response in Melanoma.

  • Each covariate is “necessary.”

– Necessary = “Necessary to impact response probability.”

  • Logistic Model is unrealistic:
slide-44
SLIDE 44


– Increasing a covariate always has an effect.
– One covariate can be traded off for another.

  • Example: Branch, Bryant, et al. (1997): N-acetyltransferase Metabolic Activity and Bladder Cancer.

– Goal: determine role of N-acetyltransferase slow acetylator phenotype in susceptibility to occupationally related aggressive bladder cancer.
– Problem: possible interaction without main effect.

slide-45
SLIDE 45

Interaction without Main Effects

  • For categorical data, not a new idea:

– “Synergism” in BFH (1975).
– 2 x 2 x 2 contingency table.
– BFH cite Worcester [1971] model, for thromboembolism data.

  • My adaptation of BFH…

– (To SWP3.0)

slide-46
SLIDE 46
  • Est. RR (controlling for age, sex, alcohol, tobacco)

                          Occupational exposure
Acetylator Phenotype      Unexposed    Exposed
Fast                      1.0          1.0
Slow                      1.1          8.0

(1.9, 3.4) = 95% CI; p < 0.01

Is there “synergy”, or “synergism”, here?

slide-47
SLIDE 47

$$E[Y_i \mid X_1, X_2] = \pi_i, \qquad \operatorname{logit}(\pi_i) = \alpha + \beta \rho_i,$$

where

$$\rho_i = \min\{p_1, f_\Delta^{-1}(p_2)\}; \text{ or } \max\{p_1, f_\Delta^{-1}(p_2)\}; \text{ or } \max\{p_1, 1 - f_\Delta^{-1}(p_2)\}; \text{ or } \min\{p_1, 1 - f_\Delta^{-1}(p_2)\};$$

and $f_\Delta : [0,1] \to [0,1]$ is defined by

$$f_\Delta(p) = 1 - F\left(F^{-1}(1 - p) - \Delta\right),$$

where $F$ is a symmetric distribution function.

slide-48
SLIDE 48

The Quantile-Matching Weakest Link (QWL) Model

In p1-p2 space, the unit square, define a new covariate, one of:

ρ = min{p1, p2} (minp1p2)
ρ = max{p1, p2} (maxp1p2)
ρ = max{p1, 1 - p2} (maxp1q2)
ρ = min{p1, 1 - p2} (minp1q2)
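The four candidate covariates are simple to compute for one observation. A minimal sketch, with `qwl_covariates` as a hypothetical helper name and arbitrary input values:

```python
def qwl_covariates(p1, p2):
    """Return the four candidate covariates rho for one observation."""
    q2 = 1.0 - p2
    return {
        "minp1p2": min(p1, p2),
        "maxp1p2": max(p1, p2),
        "maxp1q2": max(p1, q2),
        "minp1q2": min(p1, q2),
    }


print(qwl_covariates(0.25, 0.875))
```

Each direction yields a single derived covariate, so the model can then be fitted with standard one-covariate regression tools.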

slide-49
SLIDE 49

QWL Model

For binary response data:

$$\operatorname{logit} E[Y_i \mid X_1, X_2] = \alpha + \beta \rho_i$$

For survival data:

$$\lambda(t; x_i) = \lambda_0(t) \exp(\beta \rho_i)$$

Fitting this QWL Model: Done.

slide-50
SLIDE 50
$$\rho_i = \begin{cases} f_\Delta^{-1}(p_2) & \text{if } p_2 > f_\Delta(p_1) \\ p_1 & \text{if } p_2 < f_\Delta(p_1) \end{cases}$$

Type B

slide-51
SLIDE 51

$$\rho_i = \begin{cases} p_1 & \text{if } p_2 > f_\Delta(1 - p_1) \\ 1 - f_\Delta^{-1}(p_2) & \text{if } p_2 < f_\Delta(1 - p_1) \end{cases}$$

Type C

slide-52
SLIDE 52
$$\rho_i = \begin{cases} 1 - f_\Delta^{-1}(p_2) & \text{if } p_2 > f_\Delta(1 - p_1) \\ p_1 & \text{if } p_2 < f_\Delta(1 - p_1) \end{cases}$$

Type D

slide-53
SLIDE 53
slide-54
SLIDE 54

$$\rho_i = \begin{cases} p_1 & \text{if } p_2 > f_\Delta(p_1) \\ f_\Delta^{-1}(p_2) & \text{if } p_2 < f_\Delta(p_1) \end{cases}$$

Type A

slide-55
SLIDE 55

Simple Expressions for Covariates

A: $$\rho_i = \min\{p_1, f_\Delta^{-1}(p_2)\} = \begin{cases} p_1 & \text{if } p_2 > f_\Delta(p_1) \\ f_\Delta^{-1}(p_2) & \text{if } p_2 < f_\Delta(p_1) \end{cases}$$

B: $$\rho_i = \max\{p_1, f_\Delta^{-1}(p_2)\} = \begin{cases} f_\Delta^{-1}(p_2) & \text{if } p_2 > f_\Delta(p_1) \\ p_1 & \text{if } p_2 < f_\Delta(p_1) \end{cases}$$

C: $$\rho_i = \max\{p_1, 1 - f_\Delta^{-1}(p_2)\} = \begin{cases} p_1 & \text{if } p_2 > f_\Delta(1 - p_1) \\ 1 - f_\Delta^{-1}(p_2) & \text{if } p_2 < f_\Delta(1 - p_1) \end{cases}$$

D: $$\rho_i = \min\{p_1, 1 - f_\Delta^{-1}(p_2)\} = \begin{cases} 1 - f_\Delta^{-1}(p_2) & \text{if } p_2 > f_\Delta(1 - p_1) \\ p_1 & \text{if } p_2 < f_\Delta(1 - p_1) \end{cases}$$
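The equivalence between the min/max expressions and the case-by-case definitions of Types A to D can be checked numerically. This sketch assumes a standard logistic F and f_Δ(p) = 1 − F(F⁻¹(1 − p) − Δ); the function names and test points are hypothetical.

```python
import math


def F(z):
    """Standard logistic CDF."""
    return 1.0 / (1.0 + math.exp(-z))


def F_inv(p):
    """Logistic quantile function, defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def f_delta(p, delta):
    return 1.0 - F(F_inv(1.0 - p) - delta)


def f_delta_inv(p, delta):
    return 1.0 - F(F_inv(1.0 - p) + delta)


def rho_simple(p1, p2, delta):
    """Types A-D written as min/max expressions."""
    g = f_delta_inv(p2, delta)
    return {"A": min(p1, g), "B": max(p1, g),
            "C": max(p1, 1.0 - g), "D": min(p1, 1.0 - g)}


def rho_piecewise(p1, p2, delta):
    """Types A-D written case by case, split at the curve of optimal use."""
    g = f_delta_inv(p2, delta)
    above = p2 > f_delta(p1, delta)          # case split for types A and B
    above_q = p2 > f_delta(1.0 - p1, delta)  # case split for types C and D
    return {"A": p1 if above else g,
            "B": g if above else p1,
            "C": p1 if above_q else 1.0 - g,
            "D": 1.0 - g if above_q else p1}


for point in [(0.2, 0.7, 1.0), (0.8, 0.3, 1.0), (0.5, 0.5, 0.5)]:
    print(point, rho_simple(*point) == rho_piecewise(*point))
```

The agreement relies on f_Δ being increasing, so that comparing p2 with f_Δ(p1) is the same as comparing f_Δ⁻¹(p2) with p1.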

slide-56
SLIDE 56

[Figure: contour plot over % DR+CD8+ Lymphocytes and # CD8+ Lymphocytes; contour labels omitted.]

slide-57
SLIDE 57

[Figure: two panels showing 1-yr DFS against % DR+CD8+ Lymphocytes and against # CD8+ Lymphocytes.]

slide-58
SLIDE 58
[Figure: contour plot over x1 and x2, each axis running from -4 to 4; contour labels omitted.]

> plot(WL.qregf)

slide-59
SLIDE 59
[Figure: two panels showing y against x1 and y against x2.]

> plot(ES,sm.h=c(0.5,0.5),xlab1="x1",xlab2="x2",ylab="y")

slide-60
SLIDE 60

Weaklink Software

  • Analyze existing data; or
  • Simulate WL data.
  • Model Signatures.

Here, we generate a 1000-observation data set from a Weakest Link Model with binary response, and see how a logistic regression model fails to fit the data. First, generate the objects with 3 easy commands:

> WL <- SimBinaryWL(Theta=c(1.0,2.0,1.0,3.0),n.obs=1000,x.vcv=diag(rep(2,2)))
> WL.qregf <- qregf(vnames=c("x1","x2"),dframe="DD",binresp="y")
> ES <- EpsSelect(WL.qregf,pcontour=0.48,epsilon=1)

Next, plot the objects …
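For readers without access to the Weaklink software, the simulation step can be approximated in plain Python. This is not a port of SimBinaryWL: the function name, the parameter layout, and the independent normal covariates with variance 2 are assumptions made for illustration.

```python
import math
import random


def inv_logit(z):
    """Inverse logit (logistic function)."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate_binary_wl(alphas, betas, n_obs, x_sd, seed=0):
    """Draw (x, y) pairs with P(y = 1) = min_j logit^{-1}(alpha_j + beta_j * x_j)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_obs):
        x = [rng.gauss(0.0, x_sd) for _ in alphas]
        p = min(inv_logit(a + b * xj) for a, b, xj in zip(alphas, betas, x))
        y = 1 if rng.random() < p else 0
        data.append((x, y))
    return data


# Illustrative parameters only; not the Theta layout used by SimBinaryWL.
data = simulate_binary_wl(alphas=[1.0, 1.0], betas=[2.0, 3.0],
                          n_obs=1000, x_sd=math.sqrt(2.0))
print(len(data), sum(y for _, y in data) / len(data))
```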

slide-61
SLIDE 61

Results from software

beta        stage      MinPval    Direction  Bonferroni  name2                           name1
-1.04E+02   1.178205   7.39E-05   maxp1q2    2.96E-04    S100 calcium binding protein P  fibronectin 1
-1.09E+02   1.164991   4.75E-04   maxp1q2    1.90E-03    S100 calcium binding protein P  collagen, type XI, alpha 1
-1.14E+02   1.184755   2.54E-04   maxp1q2    1.02E-03    S100 calcium binding protein P  collagen, type IX, alpha 3
-1.17E+02   1.173454   1.66E-05   maxp1q2    6.62E-05    S100 calcium binding protein P  tenascin C (hexabrachion)