SIB course 4-8 Feb 2008
Statistical analysis applied to genome and proteome analyses
Sven Bergmann
Department of Medical Genetics University of Lausanne Rue de Bugnon 27 - DGM 328 CH-1005 Lausanne Switzerland work: ++41-21-692-5452 cell: ++41-78-663-4980 http://serverdgm.unil.ch/bergmann
Part 1:
Analysis tools for large datasets
- Standard tools: k-means, PCA, SVD
- Modular analysis tools: CTWC, ISA, PPA
Why study a large, heterogeneous set of expression data?
Large: better signals from noisy data! Heterogeneous: global view of the transcription program!
Supervised vs. unsupervised approaches
Large genome-wide data may contain answers to questions we did not ask! We need both hypothesis-driven and exploratory analyses!
Motivations
How to get large-scale expression data? Pool genome-wide expression measurements from many experiments!
[Figure: expression matrices from sets of specific conditions (panels "stress" and "cell-cycle") are pooled into one matrix of large-scale expression data (genes × diverse conditions).]
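The pooling step can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the course: the function name `pool_experiments`, the gene identifiers and the expression values are all hypothetical, and restricting the result to genes measured in every experiment is an assumption made here for simplicity.

```python
def pool_experiments(datasets):
    """Pool several expression tables into one.

    Each table maps a gene ID to its list of expression values, one value
    per condition. Assumption: only genes measured in every experiment are
    kept, and their condition columns are concatenated in the given order.
    """
    shared = set(datasets[0])
    for table in datasets[1:]:
        shared &= set(table)
    return {
        gene: [value for table in datasets for value in table[gene]]
        for gene in sorted(shared)
    }


# Hypothetical toy data: 2 stress conditions and 3 cell-cycle conditions.
stress = {"geneA": [1.2, 0.8], "geneB": [-0.5, -0.7], "geneC": [0.1, 0.0]}
cell_cycle = {"geneA": [0.3, -0.2, 0.9], "geneB": [1.1, 1.0, -0.4]}

pooled = pool_experiments([stress, cell_cycle])
for gene, values in pooled.items():
    print(gene, values)
```

Each pooled row now spans all five conditions. In this toy example `geneC` is dropped because it is missing from the cell-cycle table; imputing missing values would be an alternative design.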
How to make sense of millions of numbers?
New Analysis and Visualization Tools are needed!
Hundreds of samples, thousands of genes
K-means Clustering
“guess” k=3 (# of clusters)
http://en.wikipedia.org/wiki/K-means_algorithm
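A minimal sketch of the standard k-means iteration (Lloyd's algorithm), assuming Euclidean distance between expression profiles. The function names, the optional `init` argument and the toy profiles are hypothetical and chosen for illustration only; real analyses would normally rely on an established library.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_profile(profiles):
    """Component-wise mean of a non-empty list of profiles."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


def kmeans(profiles, k=3, init=None, max_iter=100, seed=0):
    """Cluster profiles into k groups; returns (labels, centroids).

    If init is None, the k starting centroids are drawn at random from
    the data (the "guess"); otherwise init supplies them explicitly.
    """
    if init is None:
        init = random.Random(seed).sample(profiles, k)
    centroids = [list(c) for c in init]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each profile joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in profiles
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = mean_profile(members)
    return labels, centroids


# Hypothetical toy data: three well-separated groups of 2-condition profiles.
profiles = [
    [0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
    [5.0, 5.1], [5.2, 4.9], [4.9, 5.0],
    [0.0, 5.0], [0.1, 5.2], [0.2, 4.9],
]
labels, centroids = kmeans(
    profiles, k=3, init=[profiles[0], profiles[3], profiles[6]]
)
print(labels)
```

The result depends on the starting centroids and on the guessed k. With random starts the iteration can settle in a poor local optimum, so it is commonly rerun from several initializations.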