HIGH-PERFORMANCE GENOME STUDIES
Lucas Beyer, Diego Fabregat-Traver and Prof. Paolo Bientinesi
RWTH Aachen University
19 June 2012, SIAM Conference on Applied Linear Algebra, Valencia, Spain
Thanks to the AICES HPAC group and DFG grant GSC111
$y \in \mathbb{R}^{n}$
$X_i \in \mathbb{R}^{n \times p}$: genome measurements/covariates
$M \in \mathbb{R}^{n \times n}$
$r_i \in \mathbb{R}^{p}$: relations between phenotype and genome variations
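Number of DNA fragments (nucleotides): m ~ 48–250 000 000
Number of samples: n ~ 10 000
Number of covariates: p = 20

$y \in \mathbb{R}^{n}$: 80 MB
$M \in \mathbb{R}^{n \times n}$: 800 MB
$r \in \mathbb{R}^{p \times m}$: 7–40 GB
$X \in \mathbb{R}^{n \times p \times m}$: 72 TB–373 PB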
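The storage figures follow from the array dimensions. A minimal sketch of that arithmetic, assuming dense arrays with 8 bytes per entry, decimal (SI) prefixes, and reading the range for m as 48 to 250 million (the reading under which r comes out at roughly 7 to 40 GB). The helper names `array_bytes` and `human` are illustrative, and the printed values depend on these assumptions, so they need not reproduce every figure on the slide.

```python
def array_bytes(*dims, bytes_per_entry=8):
    """Bytes needed for a dense array with the given dimensions."""
    total = bytes_per_entry  # assumption: double precision, 8 bytes per entry
    for d in dims:
        total *= d
    return total


def human(nbytes):
    """Format a byte count using decimal (SI) prefixes."""
    units = ["B", "kB", "MB", "GB", "TB", "PB"]
    value = float(nbytes)
    for unit in units:
        if value < 1000 or unit == units[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000


n, p = 10_000, 20  # samples and covariates, as on the slide

print("M:", human(array_bytes(n, n)))
for m in (48_000_000, 250_000_000):  # assumed reading of the range for m
    print(f"m = {m:,}:",
          "r =", human(array_bytes(p, m)) + ",",
          "X =", human(array_bytes(n, p, m)))
```

The point of the exercise is the gap between the arrays: M and r fit in main memory, while X does not, which is why X has to be streamed from disk.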
$$r_i \leftarrow (X_i^T M^{-1} X_i)^{-1} X_i^T M^{-1} y$$

With the factorization $L L^T := M$:

$$r_i \leftarrow (X_i^T L^{-T} L^{-1} X_i)^{-1} X_i^T L^{-T} L^{-1} y$$

$$r_i \leftarrow ((L^{-1} X_i)^T L^{-1} X_i)^{-1} (L^{-1} X_i)^T L^{-1} y$$

With $\hat{X}_i := L^{-1} X_i$:

$$r_i \leftarrow (\hat{X}_i^T \hat{X}_i)^{-1} \hat{X}_i^T L^{-1} y$$
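The derivation can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation: it factors M once, applies $L^{-1}$ by forward substitution (the hypothetical helper `solve_lower` stands in for a BLAS trsm call), and compares the result with the unfactored formula. The function names, the random test data and the small sizes are all assumptions made for illustration.

```python
import numpy as np


def solve_lower(L, B):
    """Forward substitution: return Z such that L @ Z = B, L lower triangular."""
    Z = np.array(B, dtype=float)  # copy, so the caller's array is untouched
    for k in range(L.shape[0]):
        Z[k] = (Z[k] - L[k, :k] @ Z[:k]) / L[k, k]
    return Z


def gls_cholesky(M, Xs, y):
    """Compute r_i = (X_i^T M^-1 X_i)^-1 X_i^T M^-1 y for every X_i in Xs."""
    L = np.linalg.cholesky(M)   # L L^T = M, computed once for all i
    y_hat = solve_lower(L, y)   # L^-1 y, also computed once
    results = []
    for X in Xs:
        X_hat = solve_lower(L, X)  # X_hat_i = L^-1 X_i
        # Normal equations on the transformed problem (p x p system).
        results.append(np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y_hat))
    return np.array(results)


# Small random test problem (sizes chosen only to keep the check fast).
rng = np.random.default_rng(0)
n, p, m = 50, 3, 4
A = rng.standard_normal((n, n))
M = A @ A.T + n * np.eye(n)  # symmetric positive definite by construction
Xs = rng.standard_normal((m, n, p))
y = rng.standard_normal(n)

r = gls_cholesky(M, Xs, y)

# Reference: the first formula, with an explicit inverse of M.
M_inv = np.linalg.inv(M)
r_ref = np.array([np.linalg.solve(X.T @ M_inv @ X, X.T @ M_inv @ y) for X in Xs])
print(np.allclose(r, r_ref))
```

Only the triangular solve with $X_i$ and the small p-by-p system depend on i; the factorization of M and the product $L^{-1} y$ are shared by all m problems.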
[Figure: runtime versus m (nucleotide count; ticks at 1m, 10m, 36m) for CLAK-Chol, FLMM, GWFGLS and EMMAX. Time axis from 100 s to 10,000,000 s, annotated Minutes, Hours, Days, Months, Years.]
[Diagram, shown in successive steps: blocks b-3, b-2, b-1, b and b+1 of the data X move through a pipeline that applies trsm and produces the results r.]
Legend: CPU ⇄ GPU transfer, HDD ⇄ CPU transfer, GPU computation, CPU computation, data dependencies
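The block labels in the diagram suggest that loading, computing and storing overlap across consecutive blocks. A minimal sketch of that idea, assuming a three-stage pipeline connected by bounded queues: it uses plain Python threads, involves no GPU, and is not the authors' implementation. The stage functions, the block payload and the `buffers` parameter are hypothetical, and the per-block "computation" is a placeholder for the real kernel.

```python
import queue
import threading


def load_blocks(num_blocks, out_q):
    """Stage 1: stand-in for reading blocks of X from disk."""
    for b in range(num_blocks):
        out_q.put((b, [float(b)] * 4))  # hypothetical block payload
    out_q.put(None)                     # end-of-stream marker


def compute_blocks(in_q, out_q):
    """Stage 2: stand-in for the per-block computation (e.g. trsm)."""
    while (item := in_q.get()) is not None:
        b, block = item
        out_q.put((b, sum(block)))      # placeholder for the real kernel
    out_q.put(None)


def store_results(in_q, results):
    """Stage 3: stand-in for writing the results r back to disk."""
    while (item := in_q.get()) is not None:
        b, r = item
        results[b] = r


def run_pipeline(num_blocks, buffers=2):
    # Bounded queues model a fixed number of buffers between stages:
    # a stage blocks when the buffers after it are all in use.
    loaded = queue.Queue(maxsize=buffers)
    computed = queue.Queue(maxsize=buffers)
    results = {}
    threads = [
        threading.Thread(target=load_blocks, args=(num_blocks, loaded)),
        threading.Thread(target=compute_blocks, args=(loaded, computed)),
        threading.Thread(target=store_results, args=(computed, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


results = run_pipeline(num_blocks=6)
print(sorted(results.items()))
```

With this layout, block b+1 can be loaded while block b is being processed and the result for block b-1 is being stored, which is the overlap the diagram depicts.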
GPU: 2x NVIDIA Quadro 6000 (Fermi, 515 GFlops each, 6 GB memory) = $10,000
CPU: 2x Intel Xeon X5650 (6 cores, 128 GFlops, 24 GB memory) = $2,000
BLAS: Intel MKL 10.2
Compiler: icc 12.1
[Figure: time [s] versus m (nucleotide count, 1k to 90k) for the hybrid CPU+2GPU algorithm and the original CPU-only algorithm. Data labels on one series: 11.6 s, 24.9 s, 32.9 s, 43.1 s, 52.4 s, 65.6 s, 74.6 s, 84.8 s, 96.7 s; on the other: 4.3 s, 6.3 s, 8.3 s, 10.3 s, 12.3 s, 14.3 s, 16.3 s, 18.3 s.]
[Figure repeated with an added annotation: ⟵in-core]
Throughput Computing on CPU and GPU», 2010)