affyPara: Parallelized preprocessing algorithms for high-density oligonucleotide array data
Markus Schmidberger Ulrich Mansmann
IBE http://ibe.web.med.uni-muenchen.de UseR 2008 August 12-14, Technische Universität Dortmund, Germany
Preprocessing workflow: raw data (CEL files, AffyBatch) -> background correction -> normalization -> summarization -> expression matrix (ExpressionSet)
Data matrix: rows are probes (Probe 1 ... Probe N), columns are chips (Chip 1 ... Chip M)
– Partition by chips
– Partition by probes
– Partition of CEL file name list
– A lot of data to transfer
– Create AffyBatches at nodes
– Complete preprocessing method: preproPara()
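The "partition of CEL file name list" idea can be sketched in base R as follows. This is only an illustration: `partition_files()` and the file names are hypothetical and not part of affyPara. Each node would then read its own chunk locally (for example with `ReadAffy(filenames = ...)` from affy), so only file names, not intensity data, have to be transferred.

```r
# Hypothetical helper (not part of affyPara): split a vector of CEL file
# names into one chunk per node, keeping chunk sizes as equal as possible.
partition_files <- function(files, n_nodes) {
  stopifnot(n_nodes >= 1)
  # Never create more chunks than there are files.
  n_chunks <- min(n_nodes, length(files))
  # Consecutive files go to the same chunk: 1, 1, ..., 2, 2, ...
  chunk_id <- sort(rep_len(seq_len(n_chunks), length(files)))
  unname(split(files, chunk_id))
}

# Made-up file names, for illustration only.
cel_files <- sprintf("sample_%02d.CEL", 1:7)
parts <- partition_files(cel_files, n_nodes = 3)
print(lengths(parts))
```

Keeping consecutive files together means the chunks can later be concatenated in node order to restore the original sample order.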
Flow of normalizeAffyBatchQuantilesPara:
– START normalizeAffyBatchQuantilesPara
– Partition
– On each node: initialize AffyBatch, sort columns, calculate row means
– Calculate full row means
– On each node: normalize
– Rebuild AffyBatch
– STOP normalizeAffyBatchQuantilesPara
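The flow above can be sketched in base R on a plain numeric matrix. This is a simplified illustration under stated assumptions, not the affyPara implementation: the nodes are simulated with `lapply()`, all function names are hypothetical, there are no missing values, and ties are broken by order of appearance.

```r
# Node step 1: sort each column of the block, return row sums of the sorted block.
node_sorted_row_sums <- function(block) {
  sorted <- matrix(apply(block, 2, sort), nrow = nrow(block))
  rowSums(sorted)
}

# Node step 2: replace each value by the full row mean belonging to its rank.
node_normalize <- function(block, full_row_means) {
  normalized <- apply(block, 2, function(col) {
    full_row_means[rank(col, ties.method = "first")]
  })
  matrix(normalized, nrow = nrow(block))
}

quantile_normalize_partitioned <- function(x, n_nodes) {
  # Partition: assign the columns (chips) to nodes in consecutive blocks.
  node_id <- sort(rep_len(seq_len(n_nodes), ncol(x)))
  blocks <- unname(lapply(split(seq_len(ncol(x)), node_id),
                          function(idx) x[, idx, drop = FALSE]))
  # Combine the partial sums from all nodes into the full row means.
  partial <- lapply(blocks, node_sorted_row_sums)
  full_row_means <- Reduce(`+`, partial) / ncol(x)
  # Normalize each block and rebuild the full matrix in the original order.
  do.call(cbind, lapply(blocks, node_normalize, full_row_means = full_row_means))
}

# Small made-up example: 4 probes (rows) on 3 chips (columns), 2 simulated nodes.
x <- matrix(c(5, 2, 3, 4,
              4, 1, 4, 2,
              3, 4, 6, 8), nrow = 4)
res <- quantile_normalize_partitioned(x, n_nodes = 2)
print(res)
```

Only one vector of partial row sums per node has to be sent back for the combination step, which is why this partitioning keeps communication small compared with collecting the full intensity matrix in one place.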
Serial preprocessing with affy:
R> library(affy)
R> AB <- ReadAffy()
R> AB_bgc <- bg.correct(AB, method="rma")
R> AB_norm <- normalize.AffyBatch.quantiles(AB_bgc, type="pmonly")
Parallel preprocessing with affyPara:
R> library(affyPara)
R> c1 <- makeCluster(5, type="MPI") # or type="nws"
R> AB <- ReadAffy()
R> AB_bgc <- bgCorrectPara(c1, AB, method="rma")
R> AB_norm <- normalizeAffyBatchQuantilesPara(c1, AB_bgc, type="pmonly", verbose=TRUE)
R> stopCluster(c1)
Build hard disk file structure (/rawdata, /annotationData)
R> library(aroma.affymetrix)
R> cdf <- AffymetrixCdfFile$fromChipType("HG-U133A")
R> cs <- AffymetrixCelSet$fromName(name, tags, chipType=cdf)
R> bc <- RmaBackgroundCorrection(cs)
R> csBC <- process(bc)
R> AB_bgc <- extractAffyBatch(csBC)
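Speedup: $S_p = T_1 / T_p$, where $T_1$ is the run time on one processor and $T_p$ the run time on $p$ processors.
Amdahl's law: $S_p \approx 1 / (s + p/N)$, where in this formula $s$ is the serial fraction, $p$ the parallelizable fraction ($s + p = 1$) and $N$ the number of processors.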
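A short base R sketch of the two speedup formulas above. The function names are hypothetical and the numbers are illustrative values, not measurements from the talk.

```r
# Measured speedup: run time on one processor divided by run time on p processors.
measured_speedup <- function(t1, tp) {
  t1 / tp
}

# Amdahl's law: predicted speedup for serial fraction s on n processors,
# with parallelizable fraction 1 - s.
amdahl_speedup <- function(s, n) {
  stopifnot(s >= 0, s <= 1, n >= 1)
  1 / (s + (1 - s) / n)
}

# Illustrative values only: a serial fraction of 10 percent.
n_procs <- c(1, 5, 10, 25)
print(round(amdahl_speedup(s = 0.1, n = n_procs), 2))
```

With a serial fraction $s$ the predicted speedup can never exceed $1/s$, however many processors are added, which is one reason measured speedup curves flatten as the processor count grows.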
[Figure: speedup versus number of processors for 50, 100 and 200 arrays. Panels: quantile normalization, constant normalization, invariant-set normalization, loess normalization, background correction (RMA).]
Histogram
Complete cyclic loess / partial cyclic loess
Permutations of arrays: 2-3 times
Second / reference data set:
(Expression Project For Oncology)
project
Cancer 1 / Group 1, Cancer 2 / Group 2, Cancer 3 / Group 3: normalization all together, modelling and estimating gene interaction, comparing networks
– Multiprocessors available for everyone
– Difficult to use in packages
– Good for R base
– Cluster not available for everyone
– Cluster Management necessary
– Good for R packages
– New and promising technology
– Probably available for everyone (graphics board, cheap)
– NVIDIA CUDA <-> ?? RCUDA ??
density oligonucleotide array data; 22nd International Parallel and Distributed Processing Symposium (IPDPS 2008), Proceedings, ISBN: 978-1-4244-1693-6, 14-18 April 2008, Miami, FL, USA. IEEE 2008
How many arrays can I RMA process? (Ben Bolstad) http://bmbolstad.com/misc/ComputeRMAFAQ/size.html
Chip: HG-U133A
45,000 probes ~ 5*1e5 rows
GEO and ArrayExpress (AE) counts by chip type, 3 June 2008:
GEO ID   AE ID      Description                    # GEO  # AE   Sum
GPL96    A-AFFY-33  HG-U133A                       16490  17161  33651
GPL570   A-AFFY-44  HG-U133 Plus 2.0               14323   6888  21211
GPL2641  A-AFFY-65  Mapping 10K 2.0 Array Xba 142   7823     45   7868
GPL91    A-AFFY-9   HG-U95A                         4708   1429   6137
GPL97    A-AFFY-34  HG-U133B                        3830   4017   7847