SLIDE 1 Between Group Analysis of Microarray Data
Aedín Culhane, Des Higgins
Biochemistry Dept. - University College Cork, Ireland
Guy Perrière
Laboratoire BBE - Université Claude Bernard Lyon 1
SLIDE 2 Specify Groups in Advance?
- Neighbourhood analysis (Golub et al., 1999)
- Neural network (Khan et al., 2001)
- Support vector machine (Brown et al., 2000)
- Discriminant analysis
  – Linear combinations of genes which:
    - maximise between-group variance
    - minimise within-group variance
However, discriminant analysis requires J (samples) >> I (genes)
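Why classical discriminant analysis breaks down with microarray data shows up in the within-group scatter matrix S_W, which Fisher-style discriminant analysis has to invert. Its rank is at most J − K (samples minus groups), so whenever I > J − K it is singular. The NumPy sketch below checks this on random toy data; the function name and data sizes are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def within_group_scatter_rank(X, labels):
    """Rank of the pooled within-group scatter matrix S_W.

    X is (n_samples, n_genes), i.e. samples as rows; labels gives each sample's group.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n_genes = X.shape[1]
    s_w = np.zeros((n_genes, n_genes))
    for g in np.unique(labels):
        group = X[labels == g]
        centred = group - group.mean(axis=0)   # deviations from the group mean
        s_w += centred.T @ centred             # add this group's scatter
    return np.linalg.matrix_rank(s_w)

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 50))          # toy sizes: J = 10 samples, I = 50 genes
labels = np.array([0] * 5 + [1] * 5)   # K = 2 groups
rank = within_group_scatter_rank(X, labels)
print(rank, "of", X.shape[1])          # prints: 8 of 50  (rank = J - K, far below I)
```

A singular S_W has no inverse, which is why the slide's J >> I condition matters for classical discriminant analysis.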
SLIDE 3 Between-Group Eigenanalysis
- Dolédec, S. & Chessel, D. (1987) Rythmes saisonniers et composantes stationnelles en milieu aquatique I - Description d’un plan d’observations complet par projection de variables. Acta Oecologica, Oecologia Generalis, 8(3), 403-426.
- Discriminates groups even when samples < variables
- Can be combined with PCA, COA, etc.
SLIDE 4 Between Group Eigenanalysis
[Diagram: between-group eigenanalysis as a GSVD of the I genes × J samples table]
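The between-group idea can be sketched as a PCA of the table of group means: each group is replaced by its mean profile, the means are centred, and an SVD gives the axes. With K groups the centred means have rank at most K − 1, so there are at most K − 1 discriminating axes however many genes there are. This is a minimal NumPy sketch under stated assumptions: samples are rows (the transpose of the I × J layout above), groups are weighted equally (weighting by group size is the usual alternative), and a plain SVD stands in for the weighted GSVD. `between_group_analysis` is a hypothetical name.

```python
import numpy as np

def between_group_analysis(X, labels):
    """Minimal BGA sketch: PCA (via SVD) of the centred group-mean table.

    X is (n_samples, n_genes); labels gives each sample's group.
    Groups get equal weight here (a simplification).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    # One row per group: the mean expression profile of that group's samples
    means = np.vstack([X[labels == g].mean(axis=0) for g in groups])
    M = means - means.mean(axis=0)
    # Rows of M sum to zero, so at most K - 1 axes carry any variance
    _, s, Vt = np.linalg.svd(M, full_matrices=False)
    k = len(groups) - 1
    axes = Vt[:k].T                   # (n_genes, K - 1): discriminating axes
    centroid_scores = M @ axes        # group centroids on those axes
    return groups, centroid_scores, axes, s

rng = np.random.default_rng(1)
labels = np.repeat(["A", "B", "C"], 4)   # toy data: J = 12 samples, K = 3 groups
X = rng.normal(size=(12, 200))           # I = 200 genes >> J
X[labels == "B", :5] += 3.0              # plant a few group-specific genes
X[labels == "C", 5:10] += 3.0
groups, centroids, axes, s = between_group_analysis(X, labels)
print(axes.shape)                        # (200, 2)
```

Only the K × I table of means is decomposed, so the number of genes never has to be smaller than the number of samples.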
SLIDE 5 ADE-4
Thioulouse J., Chessel D., Dolédec S., & Olivier J.M. (1997) ADE-4: a multivariate analysis and graphical display software. Statistics and Computing, 7(1), 75-83. http://pbil.univ-lyon1.fr/ADE-4/
SLIDE 6 Golub Leukaemia Data
- Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Golub, T.R. … E.S. Lander. Science, 286: 531-537 (1999). http://www.genome.wi.mit.edu/MPR
- 47 Acute Lymphoblastic Leukaemia (ALL)
  – 38 B-cell
  – 9 T-cell
- 25 Acute Myeloid Leukaemia (AML)
- Affymetrix oligonucleotide array (6817 genes)
- 38 training samples; 34 test samples
SLIDE 7 BGA of Golub Data
- Define groups
- Ordinate group centroids (using PCA or COA)
- Add individual samples as supplementary data points
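Adding individual samples as supplementary points means projecting them onto the axes fitted from the group centroids, without letting them change those axes. A minimal sketch under the same assumptions as a plain PCA-of-means BGA (NumPy, samples as rows, equal group weights); `fit_bga` and `project_supplementary` are hypothetical names.

```python
import numpy as np

def fit_bga(X, labels):
    """PCA of the centred group means (equal group weights); returns (groups, centre, axes)."""
    groups = np.unique(labels)
    means = np.vstack([X[labels == g].mean(axis=0) for g in groups])
    centre = means.mean(axis=0)
    _, _, Vt = np.linalg.svd(means - centre, full_matrices=False)
    return groups, centre, Vt[: len(groups) - 1].T

def project_supplementary(X_new, centre, axes):
    """Place samples on the fitted axes without refitting: centre them, then project."""
    return (np.atleast_2d(X_new) - centre) @ axes

rng = np.random.default_rng(2)
labels = np.repeat([0, 1, 2], 5)          # toy data: 15 samples, 3 groups
X = rng.normal(size=(15, 100))
X[labels == 1, :5] += 4.0
X[labels == 2, 5:10] += 4.0
groups, centre, axes = fit_bga(X, labels)
sample_scores = project_supplementary(X, centre, axes)   # one row per sample
```

Because the projection is linear, the mean of a group's supplementary points falls exactly on that group's centroid, so the individual samples show the spread around each centroid.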
SLIDE 8 BGA of Golub Data
- Project and classify new data points
- Test the model: jackknifing, blind test data
- Determine a threshold on the discriminating axes
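Classifying a new sample can be sketched as projecting it onto the BGA axes and assigning it to the nearest group centroid; a leave-one-out (jackknife-style) test then refits the model with each sample held out in turn and classifies that sample. The nearest-centroid rule with Euclidean distance is an assumption for illustration, not necessarily the rule used in the original analysis, and all names here are hypothetical.

```python
import numpy as np

def fit_bga(X, labels):
    """PCA of centred group means (equal group weights)."""
    groups = np.unique(labels)
    means = np.vstack([X[labels == g].mean(axis=0) for g in groups])
    centre = means.mean(axis=0)
    M = means - centre
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    axes = Vt[: len(groups) - 1].T
    return groups, centre, axes, M @ axes      # last item: centroid coordinates

def predict(model, X_new):
    """Assign each sample to the nearest group centroid on the BGA axes (assumed rule)."""
    groups, centre, axes, centroids = model
    scores = (np.atleast_2d(X_new) - centre) @ axes
    # Squared Euclidean distance from every sample to every centroid
    d2 = ((scores[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return groups[d2.argmin(axis=1)]

def jackknife_accuracy(X, labels):
    """Leave each sample out, refit on the rest, and classify the held-out sample."""
    n = len(labels)
    hits = 0
    for i in range(n):
        keep = np.arange(n) != i
        model = fit_bga(X[keep], labels[keep])
        hits += int(predict(model, X[i])[0] == labels[i])
    return hits / n

rng = np.random.default_rng(3)
labels = np.repeat([0, 1, 2], 6)             # toy data: 18 samples, 3 groups
X = rng.normal(size=(18, 30))
for g in range(3):
    X[labels == g, 5 * g:5 * g + 5] += 10.0  # each group gets its own marker genes
acc = jackknife_accuracy(X, labels)
```

Refitting inside the loop matters: projecting a sample onto axes that were fitted with that same sample would overstate accuracy.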
SLIDE 9 Identification of Genes
- Genes and samples can be plotted together on a “biplot”
- Simultaneous visual analysis of the entire set of genes
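In a biplot, the group centroids (or samples) and the genes get coordinates on the same axes, so genes lying far out along an axis are those that separate the groups at either end of it. One common scaling, assumed here, places centroids at U·S and genes at V from the SVD of the centred group means; their product reconstructs the table exactly. The helper names and toy data are hypothetical.

```python
import numpy as np

def bga_biplot(X, labels):
    """Group-centroid scores and gene coordinates on the same BGA axes."""
    groups = np.unique(labels)
    means = np.vstack([X[labels == g].mean(axis=0) for g in groups])
    M = means - means.mean(axis=0)               # centred group means (equal weights)
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    k = len(groups) - 1
    centroid_scores = U[:, :k] * s[:k]           # rows: group centroids
    gene_coords = Vt[:k].T                       # rows: genes
    return groups, centroid_scores, gene_coords, M

def extreme_genes(gene_coords, gene_names, axis=0, n=3):
    """Genes with the lowest and highest coordinates on one axis."""
    order = np.argsort(gene_coords[:, axis])
    return list(gene_names[order[:n]]), list(gene_names[order[::-1][:n]])

rng = np.random.default_rng(4)
labels = np.repeat([0, 1], 5)                    # toy data: 2 groups, 1 axis
X = rng.normal(size=(10, 20))
X[labels == 1, 0] += 8.0                         # g0 up in group 1
X[labels == 0, 1] += 8.0                         # g1 up in group 0
names = np.array([f"g{i}" for i in range(20)])
groups, centroids, genes, M = bga_biplot(X, labels)
low, high = extreme_genes(genes, names)
```

The sign of an SVD axis is arbitrary, so which planted gene ends up at the "low" end can flip between runs or libraries; what matters is that the two sit at opposite extremes, next to the centroids of the groups they mark.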
SLIDE 10 Small round blue cell tumours of childhood
- Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Javed Khan, Jun S. Wei, … and Paul S. Meltzer. Nature Medicine, 7(6), June 2001
- cDNA microarray, 6567 genes, 4 classes of cancer:
  – EWS: Ewing family of tumours
  – RMS: rhabdomyosarcoma
  – NB: neuroblastoma
  – BL: Burkitt lymphoma
- Training and test samples
SLIDE 11 BGA of Khan Data
[Figure: BGA of the Khan data, axes 1 and 2]
SLIDE 12 Accuracy
- 19/20 EWS, BL, NB and RMS test samples were correctly predicted
- One NB test sample (Test 23, a biopsy sample) was not classified
- 2 normal skeletal muscle samples clustered closest to the RMS cluster
- 3 unrelated cancer cell lines clustered in the centre of the plots
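Samples that fall near the centre of a BGA plot are not close to any one group centroid. A hypothetical threshold rule makes this concrete: assign the nearest centroid only if the sample lies at least `min_norm` from the origin of the plot (the origin is the centre because the centroids are centred). The slides do not state the actual threshold rule, so the rule, the parameter and the toy coordinates below are all assumptions for illustration.

```python
import numpy as np

def assign_with_threshold(scores, centroids, groups, min_norm):
    """Nearest-centroid assignment that leaves central samples unassigned.

    HYPOTHETICAL rule: a sample closer than min_norm to the origin of the
    BGA plot is returned as None ("not classified").
    """
    scores = np.atleast_2d(scores)
    d2 = ((scores[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    nearest = groups[d2.argmin(axis=1)]
    central = np.linalg.norm(scores, axis=1) < min_norm
    return [None if c else g for g, c in zip(nearest, central)]

# Toy centroid coordinates on axes 1 and 2 (not real Khan results)
centroids = np.array([[3.0, 0.0], [-3.0, 0.0], [0.0, 3.0]])
groups = np.array(["EWS", "RMS", "NB"])
scores = np.array([[2.5, 0.2], [0.1, -0.2], [-0.3, 2.8]])
result = assign_with_threshold(scores, centroids, groups, min_norm=1.0)
print(result)   # the middle sample sits near the origin and stays unassigned
```

Where to set `min_norm` is the kind of decision the "determine threshold" step on slide 8 refers to; in practice it would be tuned on training or jackknife results rather than fixed by hand.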
SLIDE 13 BGA of Khan Data
[Figure: biplot of genes and arrays, axes 1 and 2]
SLIDE 14 Discriminating Genes
- Similar to those reported by Khan
  – Ranking of the top 10 EWS genes identical to Khan's
  – 9 of the top 12 RMS discriminating genes matched Khan’s top 10 RMS genes
  – 4 of the top 5 NB genes matched Khan’s top 5
  – Khan reported only 17 BL genes; 14 of these were detected by BGA
- RMS discriminating genes
  – IMAGE clone at locus 8p22-23
  – IMAGE clone MEST, an imprinted gene on chr 7q12
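Ranking the discriminating genes for one group and comparing them with a published list, as in the counts above, can be sketched as follows. Each gene is scored by its coordinate along the direction of the group's centroid; with all K − 1 axes kept, this ordering matches ranking genes by their centred mean in that group. The slides do not say exactly how the BGA ranks were computed, so this scoring rule, the NumPy setting and the function names are assumptions.

```python
import numpy as np

def rank_genes_for_group(X, labels, group, n=10):
    """Indices of the top-n genes lying in the direction of one group's centroid."""
    groups = list(np.unique(labels))
    means = np.vstack([X[labels == g].mean(axis=0) for g in groups])
    M = means - means.mean(axis=0)                       # centred group means
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    k = len(groups) - 1
    centroid = (U[:, :k] * s[:k])[groups.index(group)]   # this group's centroid
    direction = centroid / np.linalg.norm(centroid)
    projection = Vt[:k].T @ direction                    # each gene's coordinate along it
    return np.argsort(projection)[::-1][:n]

def overlap(top_a, top_b):
    """How many genes two top-n lists share (order ignored)."""
    return len(set(top_a) & set(top_b))

rng = np.random.default_rng(5)
labels = np.repeat(["EWS", "RMS", "NB"], 4)     # toy labels, random data
X = rng.normal(size=(12, 100))
X[labels == "RMS", :5] += 6.0                   # genes 0-4 planted as RMS markers
top = rank_genes_for_group(X, labels, "RMS", n=5)
shared = overlap(top, [0, 1, 2, 3, 4])          # planted genes recovered in the top 5
```

A count such as "9 of top 12 matched Khan's top 10" corresponds to `overlap` applied to lists of different lengths, which the set intersection handles directly.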
SLIDE 15 Conclusions
- Ordination of grouped data
- Works when the number of variables >> the number of samples
- Fast and simple, but accurate, class assignment
- Detailed and simultaneous visualisation of variables
- Discrimination of any number of subgroups can be easily explored
SLIDE 16 Guy Perrière
Laboratoire de Biométrie et Biologie Évolutive, UMR CNRS n° 5558, Université Claude Bernard – Lyon 1, France.
Department of Biochemistry, University College Cork, Cork, Ireland.