
I. Jurisica, November 13, 2003


SLIDE 1

Integrated Computational Analysis of HTP Biology Data

Lung Cancer

Classification

TNM classification

  • Tumour size, location
  • Lymph node involvement
  • Metastasis

Stage IA -> T1, N0, M0

Overall 5-year survival rate ~15%

Figure: survival time for Stage I, Stage II, and Stage III

Lung Cancer Histology

Figure panels: adeno, squamous, small cell, carcinoid, normal, large cell

Prognostic Factors in NSCLC

(Brundage MD, et al, Chest 122: 1037-57, 2002)

MEDLINE database search - Jan. 1990 to July 2001

  • 887 articles met the MeSH terms "prognosis" and "carcinoma, non-small cell lung"
  • 169 prognostic factors relating to either the tumor or the host

Use a large-scale screen to speed up the search phase

I. Jurisica, November 13, 2003, CAMDA'03

SLIDE 2

Lung Cancer MA Data Sets

  • Bhattacharjee A, et al. Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. PNAS 98: 13790-95, 2001.
  • Garber ME, et al. Diversity of gene expression in adenocarcinoma of lung. PNAS 98: 13784-89, 2001.
  • Wigle DA, et al. Molecular profiling of non-small cell lung cancer and correlation with disease-free survival. Cancer Res 62: 3005-08, 2002.
  • Beer DG, et al. Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nature Med 8: 816-24, 2002.
  • Miura K, et al. Laser capture microdissection and microarray expression analysis of lung adenocarcinoma reveals tobacco-smoking- and prognosis-related molecular profiles. Cancer Res 62: 3244-50, 2002.
  • Sugita M, et al. Combined use of oligonucleotide and tissue microarrays identified cancer/testis antigens as biomarkers in lung carcinoma. Cancer Res 62: 3971-79, 2002.
  • Heighway J, et al. Expression profiling of primary non-small cell lung cancer for target identification. Oncogene 21: 7749-63, 2002.
  • Wikman H, et al. Identification of differentially expressed genes in pulmonary adenocarcinoma by using cDNA array. Oncogene 21: 5804-13, 2002.
  • Virtanen C, et al. Integrated classification of lung tumors and cell lines by expression profiling. PNAS 99: 12357-62, 2002.
  • Pedersen N, et al. Transcriptional gene expression profiling of small cell lung cancer cells. Cancer Res 63: 1943-53, 2003.
  • Nakamura H, et al. cDNA microarray analysis of gene expression in pathologic stage IA nonsmall cell lung carcinomas. Cancer 97: 2798-805, 2003.
  • Yamagata N, et al. A training-testing approach to the molecular classification of resected non-small cell lung cancer. Clin Cancer Res 9: 4695-4704, 2003.

Overlapping Markers

Histology overlaps - higher

  • well defined, gene set biased before the analysis

Outcome overlaps - low

  • different array platforms
  • different pools of samples: mix of histologies, stage, ...
  • normal vs cancers, cancers vs cell lines, cancers vs cancers
  • different analysis biases: wide screen first, but then validating the known

Goals

Gene overlap between data sets

  • CAMDA03 data sets overlap: 2449
  • 5 data sets overlap: 1639
  • any two sets: 5885

Out of 338 Stanford genes: 55 in 5 sets; 75 in CAMDA

Diagram labels: Pool of Genes, Pool of Samples, Set of Markers

Discover something new and useful
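The overlap counts above come down to set operations on gene lists. The following is a minimal sketch of that idea, not the code used for the talk: the function names, the data-set keys, and the gene symbols are all illustrative assumptions.

```python
from itertools import combinations


def overlap_all(gene_sets):
    """Genes present in every data set (intersection of all sets)."""
    return set.intersection(*gene_sets.values())


def overlap_any_two(gene_sets):
    """Genes present in at least two data sets (union of pairwise intersections)."""
    shared = set()
    for first, second in combinations(gene_sets.values(), 2):
        shared |= first & second
    return shared


# Toy input: hypothetical data-set names and gene symbols, for illustration only.
toy_sets = {
    "set_a": {"TP53", "KRAS", "EGFR"},
    "set_b": {"KRAS", "EGFR", "MYC"},
    "set_c": {"EGFR", "MYC", "TP53"},
}
print(len(overlap_all(toy_sets)), len(overlap_any_two(toy_sets)))
```

In practice the gene identifiers would first have to be mapped to a common namespace across array platforms, which this sketch assumes has already been done.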

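Lung Cancer Data Sets

Columns: ADC, SCC, LC, Tumor, Normal, Cell line, Array. Empty cells were lost in extraction, so each row lists its sample counts in source order, followed by the array platform.

Bhattacharjee | 127, 21, 26, 17 | U95 12.6K
Garber | 41, 16, 5, 5, 5 | SA 24K
Wigle | 19, 14, 16 | SA 19K
Beer | 86, 10 | UGFL 7K
Miura | 19 | SA 18.4K
Sugita | 4 | U95 12.6K
Heighway | 39, 33 | SA 47K
Wikman | 14, 4 | HCG 1.2K
Virtanen | 44, 5, 41 | SA 7.7K
Pedersen | 18, 21 | U95 12.6K
Nakamura | 10, 10 | 425
Yamagata | 9, 11, 8, 3, 6 | SA 5.2K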

Data Biases Results

SLIDE 3

Figure: disease-free survival vs. years to first failure; group 1, n = 47; group 2, n = 84; log-rank p-value = 0.23
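A log-rank p-value like the one quoted for the two disease-free survival groups can be computed with the standard two-group log-rank test. The sketch below is a minimal standard-library version; the function name `logrank_test` and the toy inputs are illustrative assumptions, not the analysis behind the plot.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value).

    times_*  : follow-up time per case
    events_* : 1 if the failure was observed, 0 if the case was censored
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        at_risk_a = sum(1 for u, _, g in data if u >= t and g == 0)
        at_risk_b = sum(1 for u, _, g in data if u >= t and g == 1)
        n = at_risk_a + at_risk_b
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d = sum(1 for u, e, _ in data if u == t and e)
        observed_a += d_a
        expected_a += d * at_risk_a / n
        if n > 1:
            # Hypergeometric variance of the event count in group A.
            variance += d * (at_risk_a / n) * (at_risk_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

A p-value well above 0.05, as in the figure, means the two groups' survival curves cannot be distinguished at that sample size.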

OCI: BTSVQ Clustering of Cases Based on PCR Values

  • Genes = 11, Cases = 39
  • Genes = 11, Cases = 131

Microarrays

Figure labels: bad, ok, good

No single gene may cut it, but a panel of them can be predictive

Stanford

  • Early filtering: 23,100 (17,108) genes -> 918 (835)
  • No overall trend with survival
  • Adeno group can be further partitioned
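The slide does not say which criterion was used for the early filtering step. As one common possibility, the sketch below keeps the genes with the highest variance across samples; the variance criterion, the function name `filter_genes`, and the toy values are assumptions for illustration only.

```python
import statistics


def filter_genes(expression, keep):
    """Return the `keep` gene names with the highest variance across samples.

    expression: dict mapping gene name -> list of expression values
    """
    ranked = sorted(
        expression,
        key=lambda gene: statistics.pvariance(expression[gene]),
        reverse=True,
    )
    return ranked[:keep]


# Toy input with hypothetical gene names.
toy = {"g1": [1, 1, 1], "g2": [0, 5, 10], "g3": [2, 3, 4]}
print(filter_genes(toy, keep=2))
```

Filtering before clustering reduces noise, but it also biases the gene set before the analysis starts, which is one of the overlap issues raised earlier in the talk.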

Harvard/Michigan

Harvard

  • U95 -> 3,312 genes -> 675 genes
  • some clusters are highly differential and show a survival trend

Michigan

  • UGFL -> 4,966 genes
  • some clusters have a significant survival trend

Definition

Paralysis of analysis

  • too many hypotheses to follow
  • too many exceptions
  • statistical significance may not imply biological relevance

Intellectual prosthesis

  • managing knowledge
  • differential analysis
  • reasoning
  • biology understanding
  • hypothesis generation

Remembering Retrieving Reasoning

Data Analysis Interpretation

Design

SLIDE 4

Aims

  • From hypothesis-driven research to hypothesis generation
  • From biology understanding to individualized medicine, or from information-based, individualized medicine to understanding
  • Move from static to "dynamic" analysis

Microarray Data Analysis & Visualization

SOM & k-means, Gene Selection, Cluster

Analysis: MA

Novel Approach to MA Clustering & Visualization

2-way approach

  • unbiased clustering of both samples and genes
  • top-down, iterative k-means clustering
  • SOM
  • vector quantization
  • vector projection
  • stat. significance
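To make the "top-down, iterative k-means" idea concrete, here is a minimal sketch of recursive bisection with k = 2, using only the standard library. It is not the BTSVQ implementation; the function names, the deterministic seeding rule, and the stopping parameters (`min_size`, `max_depth`) are assumptions chosen for illustration.

```python
import math


def two_means(points, iters=20):
    """Split points (tuples of floats) into two groups with k-means, k = 2."""
    # Deterministic start: the first point and the point farthest from it.
    first = points[0]
    far = max(points, key=lambda p: math.dist(p, first))
    centers = [first, far]
    groups = ([], [])
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            nearer = 0 if math.dist(p, centers[0]) <= math.dist(p, centers[1]) else 1
            groups[nearer].append(p)
        if not groups[0] or not groups[1]:
            break  # degenerate split, e.g. all points identical
        new_centers = [tuple(sum(col) / len(g) for col in zip(*g)) for g in groups]
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return groups


def bisect_tree(points, min_size=2, depth=0, max_depth=3):
    """Recursively bisect the points; returns a nested dict (binary tree)."""
    node = {"size": len(points), "points": points}
    if len(points) > min_size and depth < max_depth:
        left, right = two_means(points)
        if left and right:
            node["children"] = [
                bisect_tree(left, min_size, depth + 1, max_depth),
                bisect_tree(right, min_size, depth + 1, max_depth),
            ]
    return node
```

Applied once to sample vectors and once to gene vectors, the same routine gives the two-way view described above; a significance test at each split would decide whether a partition is worth keeping.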

SLIDE 5

Lung cancer

  • stage I, II, III
  • Same histology, same stage: different pattern
  • Different histology, different stage: same pattern

Recurrence/Survival

  • 6/7 category 1
  • 7/11 category 0

Data sets: Stanford, 22696 genes; Harvard, 12K genes; Michigan, 12K genes

~90% of the samples show a strong pattern

Figure legend: alive, dead

Histology

coid-coid


Harvard OCI

Conclusion on MA

Large number of samples

  • more homogeneous study of subtypes with detailed outcome information
  • stronger validation

Validation in the clinical context

  • different method
  • different samples
  • different method & different samples

Knowledge discovery in the biological context

  • biology of tumorigenesis

Use

SLIDE 6

Integrated Analysis

Diagram labels: Microarrays, Protein Interactions, Prediction, Interpretation

Protein-Protein Interactions

  • noise
  • unknowns, uncharacterized
  • system context

PPI Data Analysis

  • degree, hubs, articulation points, siblings
  • pathways, complexes
  • function
  • structure-function relationship

Analysis: PPI
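Degree, hubs, and articulation points are standard graph measures, so they can be sketched on a toy interaction network. The code below is an illustrative assumption rather than the tooling used in the talk: the function names and the example edges are hypothetical, and the default `top_fraction=0.03` simply mirrors the "Top 3%" label on the slides.

```python
from collections import defaultdict


def build_graph(edges):
    """Undirected adjacency sets from a list of (protein, protein) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def hubs(graph, top_fraction=0.03):
    """Nodes in the top fraction by degree (at least one node is returned)."""
    ranked = sorted(graph, key=lambda node: len(graph[node]), reverse=True)
    count = max(1, round(len(ranked) * top_fraction))
    return ranked[:count]


def articulation_points(graph):
    """Nodes whose removal disconnects their component (DFS low-link method).

    Recursive, so very large networks may need an iterative rewrite.
    """
    disc, low, cut = {}, {}, set()
    counter = [0]

    def dfs(node, parent):
        disc[node] = low[node] = counter[0]
        counter[0] += 1
        children = 0
        for neighbour in graph[node]:
            if neighbour == parent:
                continue
            if neighbour in disc:
                low[node] = min(low[node], disc[neighbour])
            else:
                children += 1
                dfs(neighbour, node)
                low[node] = min(low[node], low[neighbour])
                if parent is not None and low[neighbour] >= disc[node]:
                    cut.add(node)
        if parent is None and children > 1:
            cut.add(node)

    for node in list(graph):
        if node not in disc:
            dfs(node, None)
    return cut
```

On a graph with a triangle A-B-C and a tail C-D-E, the articulation points are C and D, and C is also the highest-degree node.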

Figure: functional categories (Translation, Transcription, Uncharacterized, G. maintenance, Cellular org., Metabolism, Protein fate) for Lethals, Hubs & Art. pnts, Top 3%, and Degree 1; fold labels 2x, 4x, 12x, 2x

  • 80% of art. points in DDR - lethal
  • 92% of art. points in MAPK - lethal

SLIDE 7

Differential Analysis

Unigene1 | Unigene2 | Tumor (Pearson)
non-metastatic cells 2, protein (NM23B) expressed in | RAB22A, member RAS oncogene family | 0.180
actin related protein 2/3 complex, subunit 4 (20 kD) | actin related protein 2/3 complex, subunit 1A (41 kD) | 0.003
ARP3 actin-related protein 3 homolog (yeast) | actin related protein 2/3 complex, subunit 4 (20 kD) | 0.097
protein phosphatase 1, regulatory subunit 7 | protein phosphatase 1, catalytic subunit, gamma isoform | • 0.095
GDP dissociation inhibitor 2 | RAB5C, member RAS oncogene family | 0.138
RAB5A, member RAS oncogene family | GDP dissociation inhibitor 1 | 0.153
ARP2 actin-related protein 2 homolog (yeast) | mitogen-activated protein kinase 7 | 0.112

Garber & PPI

Correlated only in…

Tumor subtype | Total PPI | Normal | % | Tumor | %
ADC | 133 | 117 | 88.0 | 16 | 12.0
SQCC | 675 | 613 | 90.8 | 62 | 9.2
SCC | 575 | 479 | 83.3 | 96 | 16.7
LCLC | 671 | 560 | 83.5 | 111 | 16.5
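The counts above classify each interacting pair by whether its two genes are co-expressed only in normal tissue or only in tumor tissue. A minimal sketch of that classification follows, using Pearson correlation; the cutoff of 0.5, the function names, and the toy expression values are assumptions, since the slides do not state the threshold that was used.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def correlated_only_in(tumor_pair, normal_pair, threshold=0.5):
    """Label a protein pair as correlated in 'tumor', 'normal', 'both', or 'neither'.

    Each *_pair is (expression of gene 1, expression of gene 2) across samples.
    The threshold is a hypothetical cutoff on |r|.
    """
    in_tumor = abs(pearson(*tumor_pair)) >= threshold
    in_normal = abs(pearson(*normal_pair)) >= threshold
    if in_tumor and in_normal:
        return "both"
    if in_tumor:
        return "tumor"
    if in_normal:
        return "normal"
    return "neither"
```

Counting the "normal" and "tumor" labels over all interacting pairs for one subtype would give a row of the table.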

Conclusions

Unbiased analysis is powerful (but challenging)

  • "too much bias (or BS) will kill you"

Hypothesis generation, not just confirmation

  • there are too many possibilities and even more exceptions

AI first, then stat

  • statistical significance does not guarantee biological significance

One method/one database/set will not answer all the questions

  • I3: Integrated - Iterative - Interactive
  • a systems approach will enable us to see the forest for the trees

Evolution, theory formation, dynamicity

  • what we know is not necessarily correct

Models, reasoning & simulations

  • using computing wisely can save a lot of wet lab frustration

Comp. Biology - Drinking from a fire hydrant

Acknowledgments

IRIS, NSERC, CITO, Precarn, NIH, IBM; NCIC, CIHR, NASA, NIH, The Harker Grant, The J.R. Oishei Foundation, M.L. Wendt Foundation

http://www.cs.utoronto.ca/~juris

  • M. Sultan, C. Cumbaa, X. Zhang, P. Rogers, R. Lu, D. Otasek, M. Popov, A. Patel
  • D. Wigle, M. Tsao, S. K.-Reid, E. Fish, P. Shaw, J. Dick, L. Penn, S. Done, J. Sweet, F. Liu, M. Minden, {MAC}
  • K. Brown, N. Przulj
  • M. Kotlyar, E. Xia
  • N. Arshadi, M. Maziarz
  • T. Brown, A. Jurisicova
  • M. Tyers, J. Wrana
  • G. DeTitta, J. Luft
  • J. Glasgow
