Biology-Driven Clustering of Microarray Data Applications to the - - PowerPoint PPT Presentation

SLIDE 1

Biology-Driven Clustering of Microarray Data

Applications to the NCI60 Data Set

K.R. Coombes, K.A. Baggerly, D.N. Stivers, J. Wang, D. Gold, H.G. Sung, and S.J. Lee

SLIDE 2

Introduction

  • Microarray data is more than a large, unstructured matrix.
    – We already know many genes that are important for studying cancer through their involvement in specific biological processes.
    – We also know that reproducible chromosomal abnormalities play an important role in cancer.
  • We need analytical methods that use biological information early.

SLIDE 3

Methods

  • First, updated the annotations of the genes on the microarray
  • Performed separate analyses
    – using genes on individual chromosomes
    – using genes involved in different biological processes
  • Developed ways to assess how well each set of genes classified samples
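Each of the two kinds of separate analysis begins by splitting the annotated genes into gene sets: one set per chromosome, and one set per biological process. The same clustering and assessment are then run on each set. Here is a minimal sketch of the splitting step. It assumes each gene carries a chromosome and a list of process terms; the record layout and gene IDs are hypothetical.

```python
from collections import defaultdict

# Hypothetical annotation record: (gene_id, chromosome, biological_processes).
Annotation = tuple[str, str, list[str]]


def gene_sets_by_chromosome(annotations: list[Annotation]) -> dict[str, set[str]]:
    """One gene set per chromosome; each gene lands in exactly one set."""
    sets: dict[str, set[str]] = defaultdict(set)
    for gene, chromosome, _ in annotations:
        sets[chromosome].add(gene)
    return dict(sets)


def gene_sets_by_process(annotations: list[Annotation]) -> dict[str, set[str]]:
    """One gene set per biological process; a gene may land in several sets."""
    sets: dict[str, set[str]] = defaultdict(set)
    for gene, _, processes in annotations:
        for process in processes:
            sets[process].add(gene)
    return dict(sets)


annotations: list[Annotation] = [
    ("gene1", "1", ["cell cycle"]),
    ("gene2", "1", ["apoptosis", "cell cycle"]),
    ("gene3", "X", ["energy pathways"]),
]
by_chromosome = gene_sets_by_chromosome(annotations)
by_process = gene_sets_by_process(annotations)
print(by_chromosome["1"] == {"gene1", "gene2"})  # True
```

Chromosome sets are disjoint. Process sets overlap, so a gene annotated with several processes contributes to several analyses.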

SLIDE 4

Quality of Annotations

  • Problem:
    – I.M.A.G.E. clone IDs and GenBank accession numbers are archival (stable)
    – UniGene clusters, gene names, descriptions, functions, etc., can change
  • Solution:
    – Downloaded the latest UniGene (build 137) and LocusLink to update the annotations

SLIDE 5

How many genes on the array have good annotations?

Number of Spots   Current UniGene Status
            294   None (control spots)
            128   Only 3’, unknown to UniGene
           1379   Only 3’, known to UniGene
              1   Only 5’, unknown
              6   Only 5’, known
            399   Both unknown
            763   Both, 3’ known, 5’ unknown
            291   Both, 3’ unknown, 5’ known
            646   Both known, but disagree
           6093   Both known, and agree

Only trust the 7478 spots where the UniGene clusters match (6093 + 1379 + 6: both clones known and agreeing, or a single clone known to UniGene).
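You can check the figure of 7478 against the counts on this slide. This small sketch assumes that the trusted spots are the ones where both clones are known and agree, plus the ones whose single clone is known to UniGene. The arithmetic supports this reading, but the slide does not state it. The dictionary labels paraphrase the slide's rows.

```python
# Spot counts by UniGene status, transcribed from the slide.
status_counts = {
    "none (control spots)": 294,
    "only 3', unknown": 128,
    "only 3', known": 1379,
    "only 5', unknown": 1,
    "only 5', known": 6,
    "both unknown": 399,
    "both, 3' known, 5' unknown": 763,
    "both, 3' unknown, 5' known": 291,
    "both known, disagree": 646,
    "both known, agree": 6093,
}

# Assumption: a spot is trusted when every clone on it maps to a known
# UniGene cluster and those clusters agree (trivially true for one clone).
trusted_statuses = {"only 3', known", "only 5', known", "both known, agree"}

total = sum(status_counts.values())
trusted = sum(n for status, n in status_counts.items() if status in trusted_statuses)
print(total, trusted)  # 10000 7478
```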

SLIDE 6

Where are the genes located?

[Figure: (Observed - Expected)/SD of the number of array genes on each chromosome (1–22, X, Y)]

χ² = 148.8, p < 10^(-10)
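The slide reports a chi-square goodness-of-fit test. It does not say how the expected counts or the SD were computed. This minimal sketch makes two assumptions. First, expected counts are proportional to some reference share per chromosome. Second, SD = sqrt(Expected), which gives Pearson residuals. Under that choice, the squared residuals sum to the chi-square statistic. The function name and the three-chromosome counts are made up for illustration.

```python
import math


def pearson_residuals(observed: list[int], reference_share: list[float]) -> tuple[list[float], float]:
    """Return (O - E)/sqrt(E) per category and the chi-square statistic.

    Expected counts E split the total observed count in proportion to
    reference_share (an assumed reference, e.g. genes per chromosome).
    """
    n = sum(observed)
    share_total = sum(reference_share)
    expected = [n * share / share_total for share in reference_share]
    residuals = [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]
    chi_square = sum(r * r for r in residuals)
    return residuals, chi_square


# Hypothetical counts for three chromosomes, not the NCI60 array data.
residuals, chi_square = pearson_residuals([30, 50, 20], [0.4, 0.4, 0.2])
print([round(r, 2) for r in residuals], round(chi_square, 2))  # [-1.58, 1.58, 0.0] 5.0
```

To get a p-value, compare the statistic with a chi-square distribution on (number of chromosomes - 1) degrees of freedom, for example via `scipy.stats.chi2.sf(chi_square, df)`.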

SLIDE 7

How do we determine the functions of genes?

  • UniGene -> LocusLink -> Gene Ontology
  • Gene Ontology is a structured, hierarchical vocabulary to describe gene functions in three broad areas:
    – biological process (why)
    – molecular function (what)
    – cellular component (where)
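The annotation chain is a sequence of lookups. The hierarchy matters because a gene annotated to a specific term is usually also counted under every broader ancestor term (the GO "true path" convention). The slide does not say whether the authors propagated terms this way; this sketch assumes they did. The cluster IDs, locus IDs, and the toy term hierarchy are hypothetical stand-ins, not real UniGene, LocusLink, or GO records.

```python
# Hypothetical lookup tables standing in for UniGene -> LocusLink -> GO downloads.
unigene_to_locus = {"Hs.A": "locus1", "Hs.B": "locus2"}
locus_to_go = {"locus1": {"mitotic cell cycle"}, "locus2": {"apoptosis"}}

# Toy fragment of a GO-style hierarchy: each term maps to its parent terms.
parents: dict[str, set[str]] = {
    "mitotic cell cycle": {"cell cycle"},
    "cell cycle": {"cellular process"},
    "apoptosis": {"cell death"},
    "cell death": {"cellular process"},
    "cellular process": set(),
}


def with_ancestors(term: str) -> set[str]:
    """Return the term together with every ancestor reachable through parents."""
    seen: set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, set()))
    return seen


def go_terms(cluster_id: str) -> set[str]:
    """Map a UniGene cluster to its GO terms, propagated up the hierarchy."""
    locus = unigene_to_locus.get(cluster_id)
    if locus is None:
        return set()  # cluster not linked to any LocusLink entry
    terms: set[str] = set()
    for term in locus_to_go.get(locus, set()):
        terms |= with_ancestors(term)
    return terms


print(sorted(go_terms("Hs.B")))  # ['apoptosis', 'cell death', 'cellular process']
```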

SLIDE 8

What kinds of genes are on the microarray?

Function                         Ann.   Spots
Oncogenesis                       140     180
Cell shape and size                78     101
Apoptosis                         128     138
Protein traffic                   157     188
Physiological processes           180     210
Transport                         146     136
Perception of external stimuli    238     150
Cell proliferation                197     249
Ectoderm development              129     152
Stress response                   599     372
Mesoderm development               92     102
Radiation response                147     136
Cell adhesion                     111     140
Cell cycle                        494     283
Cell-cell signaling               137     166
Nucleic acid metabolism           695     595
Signal transduction               222     228
Protein metabolism                471     567
Intracellular signaling cascade   110     110
Lipid metabolism                  146     156
Cell motility                     120     153
Carbohydrate metabolism           103      97
Cell organization                  98     118
Energy pathways                    88      98

SLIDE 9

Data Preprocessing

  • Remove spots with poor annotations, and spots whose median intensity is below the 97th percentile of the empty spots.
  • Normalize each array so that the median log ratio between channels is zero (a median ratio of one).
  • Center each gene so that its mean log ratio across experiments is zero.
  • Use (1 - correlation)/2 as the distance metric.
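The preprocessing steps can be sketched in plain Python. This is a minimal sketch under several assumptions:

  • Log ratios are held as a genes × arrays list of lists.
  • Each spot's median intensity has already been computed.
  • Poorly annotated spots were dropped beforehand.
  • The 97th percentile uses the default method of `statistics.quantiles`, which is one of several percentile conventions.

The function names are hypothetical.

```python
import math
import statistics


def keep_bright_spots(median_intensity: dict[str, float], empty_spots: list[float]) -> set[str]:
    """Keep spots whose median intensity reaches the 97th percentile of empty spots."""
    cutoff = statistics.quantiles(empty_spots, n=100)[96]  # 97th percentile
    return {spot for spot, value in median_intensity.items() if value >= cutoff}


def normalize_arrays(log_ratios: list[list[float]]) -> list[list[float]]:
    """Shift each array (column) so that its median log ratio is zero."""
    medians = [statistics.median(column) for column in zip(*log_ratios)]
    return [[x - m for x, m in zip(row, medians)] for row in log_ratios]


def center_genes(log_ratios: list[list[float]]) -> list[list[float]]:
    """Shift each gene (row) so that its mean log ratio across arrays is zero."""
    centered = []
    for row in log_ratios:
        mean = statistics.fmean(row)
        centered.append([x - mean for x in row])
    return centered


def correlation_distance(x: list[float], y: list[float]) -> float:
    """(1 - Pearson correlation) / 2, which runs from 0 (r = 1) to 1 (r = -1)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (1 - sxy / math.sqrt(sxx * syy)) / 2


# Toy genes x arrays matrix of log ratios (rows = genes, columns = arrays).
data = [[1.0, 3.0, 5.0], [2.0, 2.0, 8.0]]
processed = center_genes(normalize_arrays(data))
print(processed)  # [[0.0, 1.0, -1.0], [0.0, -1.0, 1.0]]
```

The slides cluster samples, so the distance would be applied to pairs of array columns. Dividing by 2 maps correlations in [-1, 1] onto distances in [0, 1].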

SLIDE 10

How well does a set of genes distinguish types of cancer?

  • Three methods for assessment:
    – Qualitative (PCA, MDS)
    – Quantitative (PCA + ANOVA)
    – Semi-quantitative (grading dendrograms)
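The slides do not spell out the quantitative PCA + ANOVA ("PCANOVA") procedure. One plausible reading, offered here only as an assumption, is as follows. Project the samples onto the leading principal components of the chosen gene set. Then run a one-way ANOVA on each component to ask how strongly its scores separate the cancer types. This plain-Python sketch covers the first component only; the function names and toy data are hypothetical.

```python
import math
import random


def first_pc_scores(samples: list[list[float]], iterations: int = 200) -> list[float]:
    """Project each sample onto the first principal component (power iteration)."""
    n, p = len(samples), len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(p)]
    centered = [[s[j] - means[j] for j in range(p)] for s in samples]
    rng = random.Random(0)
    v = [rng.random() + 0.1 for _ in range(p)]
    for _ in range(iterations):
        # Apply X^T X to v without forming the covariance matrix explicitly.
        xv = [sum(row[j] * v[j] for j in range(p)) for row in centered]
        w = [sum(centered[i][j] * xv[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [sum(row[j] * v[j] for j in range(p)) for row in centered]


def anova_f(scores: list[float], labels: list[str]) -> float:
    """One-way ANOVA F statistic for scores grouped by cancer-type label."""
    groups: dict[str, list[float]] = {}
    for score, label in zip(scores, labels):
        groups.setdefault(label, []).append(score)
    n, k = len(scores), len(groups)
    grand_mean = sum(scores) / n
    group_means = {g: sum(v) / len(v) for g, v in groups.items()}
    ss_between = sum(len(v) * (group_means[g] - grand_mean) ** 2 for g, v in groups.items())
    ss_within = sum((x - group_means[g]) ** 2 for g, v in groups.items() for x in v)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy data: two hypothetical cancer types separated along the first gene.
samples = [[0.0, 0.1], [0.2, -0.1], [0.1, 0.0], [5.0, 0.1], [5.2, -0.1], [5.1, 0.0]]
labels = ["a", "a", "a", "b", "b", "b"]
f_stat = anova_f(first_pc_scores(samples), labels)
```

A larger F means the component separates the cancer types more strongly, relative to the spread within each type. The sign of a principal component is arbitrary, and F does not depend on it.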

SLIDE 11

Multidimensional Scaling

[Figure: the 60 cell lines plotted on MDS coordinates 1 and 2, each labeled by cancer type: B = breast, C = colon, L = leukemia, M = melanoma, N = non-small-cell lung, O = ovarian, P = prostate, R = renal, S = CNS]

SLIDE 12

PCANOVA

SLIDE 13

How good is a dendrogram?

  • A = a cluster contains all, and only, the samples of one kind of cancer
  • B = all, with extras
  • C = all except one
  • D = all except one, with extras
  • E = all except two
  • F = all except two, with extras
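The grading scheme can be sketched as a search over every cluster in the dendrogram, keeping the best letter that any cluster earns for a given cancer type. Several details are assumptions that the slide does not state:

  • The dendrogram is binary and stored as nested tuples.
  • A cluster must contain at least one sample of the type.
  • "With extras" is capped by a hypothetical `max_extras` parameter. Without a cap, the root cluster would always earn a B.
  • `None` stands for an ungraded type.

```python
from __future__ import annotations

from typing import Optional, Union

# A binary dendrogram: a leaf is a sample name, an internal node a pair of subtrees.
Tree = Union[str, tuple["Tree", "Tree"]]

GRADES = {(0, False): "A", (0, True): "B", (1, False): "C",
          (1, True): "D", (2, False): "E", (2, True): "F"}


def clusters(tree: Tree) -> list[frozenset[str]]:
    """Leaf sets of every subtree; the last entry is the whole tree's leaf set."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    left, right = clusters(tree[0]), clusters(tree[1])
    return left + right + [left[-1] | right[-1]]


def grade(tree: Tree, cancer_of: dict[str, str], cancer: str,
          max_extras: int = 1) -> Optional[str]:
    """Best letter grade A-F earned by any cluster for one cancer type, or None."""
    members = {sample for sample, c in cancer_of.items() if c == cancer}
    best: Optional[str] = None
    for cluster in clusters(tree):
        if not cluster & members:
            continue  # cluster holds no samples of this cancer type
        missing = len(members - cluster)
        extras = len(cluster - members)
        if missing > 2 or extras > max_extras:
            continue
        letter = GRADES[(missing, extras > 0)]
        if best is None or letter < best:
            best = letter
    return best


tree = (("l1", ("l2", "l3")), ("c1", ("c2", "x1")))
cancer_of = {"l1": "L", "l2": "L", "l3": "L", "c1": "C", "c2": "C", "x1": "X"}
print(grade(tree, cancer_of, "L"), grade(tree, cancer_of, "C"))  # A B
```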

[Figure: example dendrogram of the 60 NCI60 cell lines (breast, CNS, colon, leukemia, melanoma, NSCLC, ovarian, prostate, renal)]

Scores for this dendrogram, in cancer-type order (B C L M N O P R S) with ungraded types omitted: A A D F D C B

SLIDE 14

Can cancers be distinguished by genes on one chromosome?

Grades obtained using the genes on each chromosome, listed in cancer-type column order (B C L M N O P R S), with ungraded cancer types omitted:

Chromosome   Grades
 1           B A D F D B
 2           E C D D E D E
 3           C E D E F
 4           E E E E
 5           A A D F E
 6           C A D E E D
 7           E A D E C E
 8           E C D
 9           B C C E E E
10           D E
11           E C C D
12           B C C E E E
13           D E
14           A A F
15           C B C F C
16           (none)
17           A A D F E E
18           E D
19           D D
20           E C
21           (none)
22           A E E
 X           B A D E D

SLIDE 15

Heterogeneity of different types of cancer

  • Some cancers (colon, leukemia) are fairly easy to distinguish from the others.
  • Some (breast, lung) are so heterogeneous that they are almost impossible to distinguish.
  • Some chromosomes (1, 2, 6, 7, 9, 12, 17) can distinguish many cancers.
  • The genes on others (16, 21) give essentially random clusterings.

SLIDE 16

Chromosome 2

[Figure: dendrogram of the 60 NCI60 cell lines, clustered using genes on chromosome 2]

SLIDE 17

Chromosome 16

[Figure: dendrogram of the 60 NCI60 cell lines, clustered using genes on chromosome 16]

SLIDE 18

Can cancers be distinguished by genes of one function?

  • The table for functional categories looks a lot like the table for chromosomes.
  • Some biological process categories (signal transduction, cell proliferation, cell cycle, protein metabolism) can distinguish many types of cancer.
  • Others (apoptosis, energy pathways) cannot.

SLIDE 19

cell surface receptor linked signal transduction

[Figure: dendrogram of the 60 NCI60 cell lines, clustered using genes involved in cell surface receptor linked signal transduction]

SLIDE 20

protein metabolism and modification

[Figure: dendrogram of the 60 NCI60 cell lines, clustered using genes involved in protein metabolism and modification]

SLIDE 21

death (apoptosis)

[Figure: dendrogram of the 60 NCI60 cell lines, clustered using genes involved in death (apoptosis)]

SLIDE 22

energy pathways

[Figure: dendrogram of the 60 NCI60 cell lines, clustered using genes involved in energy pathways]

SLIDE 23

Conclusions (I)

  • Multiple views into the data provide substantial insight into differences among cancer types and among gene sets.
  • Cancer types differ greatly in their degree of heterogeneity, ranging from homogeneous (colon, leukemia) through moderately heterogeneous (renal, melanoma) to extremely heterogeneous (breast and lung).

SLIDE 24

Conclusions (II)

  • Homogeneous cancers exhibit strong identifying signals across most views of the data.
  • There are large differences in the ability of genes on different chromosomes, or involved in different biological processes, to distinguish cancer types.

SLIDE 25

Supplementary Material

Complete results of each analysis by chromosome and by function are available on our web site: http://www.mdanderson.org/depts/cancergenomics/camda.html