SLIDE 1 GPU computing and the tree of life
Michael P. Cummings
Center for Bioinformatics and Computational Biology
University of Maryland Institute of Advanced Computer Studies
GPU summit
27 October 2014
SLIDE 2
some domain science context
SLIDE 3
the great apes
SLIDE 4
great apes: phylogenetic relationships?
SLIDE 5
great apes: phylogenetic relationships?
SLIDE 6
great apes: phylogenetic relationships?
SLIDE 7
phylogenetic relationships of great apes
when subjected to phylogenetic analysis, the data overwhelmingly support chimps and humans being each other's closest relatives
SLIDE 8
number of possible topologies
tips    unrooted trees
3       1
4       3
5       15
6       105
7       945
8       10,395
9       135,135
10      2,027,025
11      34,459,425
12      654,729,075
13      13,749,310,575
14      316,234,143,225
15      7,905,853,580,625
20      221,643,095,476,699,771,875
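The counts for unrooted, strictly bifurcating trees with n labeled tips follow the double factorial (2n - 5)!! = 1 × 3 × 5 × ... × (2n - 5). A minimal sketch that regenerates such a table; the function name is illustrative and not taken from the slides:

```python
def unrooted_topologies(tips: int) -> int:
    """Number of unrooted, strictly bifurcating topologies for `tips` labeled tips."""
    if tips < 3:
        raise ValueError("need at least 3 tips")
    count = 1
    # (2n - 5)!! is the product of the odd numbers 3, 5, ..., 2n - 5
    for odd in range(3, 2 * tips - 4, 2):
        count *= odd
    return count


for n in (3, 4, 5, 10, 15, 20):
    print(f"{n:>2}  {unrooted_topologies(n):,}")
```

Each added tip multiplies the count by the next odd number, which is why exhaustive search over topologies stops being feasible after a handful of taxa.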
SLIDE 9 phylogenetic analysis
the most accurate methods are model-based and involve likelihood calculations
- maximum likelihood estimation
- Bayesian analysis
$$\mathrm{Prob}(H \mid D) = \frac{\mathrm{Prob}(D \mid H)\,\mathrm{Prob}(H)}{\mathrm{Prob}(D)}$$
we can only directly calculate Prob(D|H)
[figure: aligned sequences and their (log likelihood)]
A A C T . . .
A A T G . . .
A A T A . . .
A C T G . . .
SLIDE 10 likelihood calculation
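[figure: node x0 with children x1 and x2 on branches of length t1 and t2]
$$L^{(i)}_0(x_0) = \left( \sum_{x_1} \mathrm{Prob}(x_1 \mid x_0, t_1)\, L^{(i)}_1(x_1) \right) \left( \sum_{x_2} \mathrm{Prob}(x_2 \mid x_0, t_2)\, L^{(i)}_2(x_2) \right)$$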
peeling algorithm (Felsenstein 1981) does a post-order traversal, calculating partial likelihoods at each node that depend only on its immediate children
nonetheless, likelihood calculations are very computationally intensive: O(taxa × sites × rates × states²)
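A minimal sketch of the peeling recursion for one rate category, assuming the Jukes-Cantor (JC69) substitution model with uniform base frequencies. The model choice, the tree shape, the branch lengths, and all function names are assumptions made for illustration; only the four short sequences come from the slide.

```python
import math

STATES = "ACGT"


def jc69_matrix(t):
    """JC69 transition probabilities Prob(child | parent, t); t in substitutions per site."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


def partials(node, site):
    """Post-order recursion: partial likelihoods for each state at `node`, one site."""
    if isinstance(node, str):  # tip: the observed state has likelihood 1
        return [1.0 if STATES[x] == node[site] else 0.0 for x in range(4)]
    result = [1.0] * 4
    for child, t in node:  # internal node: list of (child, branch length)
        p = jc69_matrix(t)
        child_l = partials(child, site)
        for x0 in range(4):  # states x states work per branch per site
            result[x0] *= sum(p[x0][x1] * child_l[x1] for x1 in range(4))
    return result


def log_likelihood(tree, n_sites):
    """Sum over sites of the log of the root partials weighted by base frequencies."""
    total = 0.0
    for site in range(n_sites):
        root = partials(tree, site)
        total += math.log(sum(0.25 * value for value in root))
    return total


# Hypothetical four-tip tree; topology and branch lengths are made up.
tree = [
    ([("AACT", 0.1), ("AATG", 0.1)], 0.05),
    ([("AATA", 0.1), ("ACTG", 0.1)], 0.05),
]
print(log_likelihood(tree, n_sites=4))
```

The nested loops over x0 and x1 are the states² factor in the cost; repeating them for every branch, site, and rate category gives the remaining factors.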
SLIDE 11 likelihood calculations: majority of computation
data type     likelihood-related calculations
nucleotide    94.69%
amino acid    95.72%
codon         81.24%
GARLI profiling; 11 taxa; 2178 characters
SLIDE 12 BEAGLE: broad-platform evolutionary analysis general likelihood evaluator
- an application programming interface (API) and high-performance computing library for statistical phylogenetics
- emphasis is evaluating phylogenetic likelihoods of biomolecular sequence evolution
- aim is to provide high-performance evaluation 'services' to a wide range of phylogenetic software, both Bayesian samplers and maximum likelihood optimizers
- allows phylogenetic software using the library to make use of optimized hardware such as GPUs
SLIDE 13 BEAGLE library design goals
- multi-platform support (i.e., Linux, OS X, Windows)
- low-level C API
- does not explicitly have a concept of tree
- minimize transfer of data
- support multiple implementations (e.g., CPU, SSE, CUDA, OpenCL)
- uses dynamic plug-in system
- support both single and double precision
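One way a client could drive a library that has no concept of a tree is to keep the tree itself and hand over only a post-order list of operations on numbered buffers. The sketch below is hypothetical and is not the BEAGLE API; the function name, the tuple layout, and the example tree are assumptions made for illustration.

```python
def to_operations(tree):
    """Flatten a nested binary tree into tip labels plus post-order operations.

    Each operation is (destination, child1, branch1, child2, branch2), where
    destination and children are buffer indices: tips come first, then
    internal nodes in the order they are computed.
    """
    tips, internal = [], []

    def visit(node):
        if isinstance(node, str):  # tip: gets the next tip buffer
            tips.append(node)
            return ("tip", len(tips) - 1)
        (c1, t1), (c2, t2) = node  # internal node: two (child, branch length) pairs
        r1, r2 = visit(c1), visit(c2)
        internal.append((r1, t1, r2, t2))
        return ("internal", len(internal) - 1)

    visit(tree)
    n = len(tips)

    def resolve(ref):
        kind, index = ref
        return index if kind == "tip" else n + index

    ops = [
        (n + k, resolve(r1), t1, resolve(r2), t2)
        for k, (r1, t1, r2, t2) in enumerate(internal)
    ]
    return tips, ops


# Hypothetical three-tip example; branch lengths are made up.
tree = [([("human", 0.1), ("chimp", 0.1)], 0.05), ("gorilla", 0.2)]
tips, ops = to_operations(tree)
print(tips)
print(ops)
```

Because every operation refers only to buffers that were filled earlier in the list, the library can execute the list in order without knowing the topology.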
SLIDE 14
GPU implementation
- CPU-side code only used to manage GPU: memory allocations and transfers, kernel launches
- allows client to use CPU in parallel to GPU
- GPU interface abstraction layer
- CUDA and OpenCL implementations share same CPU-side code
- CUDA implementation uses the driver API
- Parallel Thread Execution (PTX) kernels
- just-in-time (JIT) compilation
- templated kernels support arbitrary number of states
- multiple GPUs supported via client-side partitioning (scales linearly)
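Client-side partitioning works because the log likelihood is a sum over site patterns, so each device can evaluate its own share and the client adds the partial sums. A minimal sketch under that assumption; the function names are hypothetical, and the per-pattern evaluator stands in for whatever a real device would compute.

```python
def partition(n_patterns, n_devices):
    """Split pattern indices 0..n_patterns-1 into nearly equal contiguous ranges."""
    base, extra = divmod(n_patterns, n_devices)
    ranges, start = [], 0
    for device in range(n_devices):
        size = base + (1 if device < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def total_log_likelihood(pattern_log_likelihood, weights, n_devices):
    """Add up weighted per-pattern log likelihoods, one partial sum per device."""
    partial_sums = [
        # In a real client each chunk would be evaluated on its own device.
        sum(weights[i] * pattern_log_likelihood(i) for i in chunk)
        for chunk in partition(len(weights), n_devices)
    ]
    return sum(partial_sums)


# Toy evaluator so the sketch runs on its own; values are made up.
def toy(i):
    return -float(i + 1)


print(partition(10, 3))
print(total_log_likelihood(toy, [1] * 10, n_devices=3))
```

The result does not depend on the number of devices, which is what lets the client choose the split purely for load balance.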
SLIDE 15 gross structure of BEAGLE
[diagram: layered structure of BEAGLE]
- client programs: BEAST, MrBayes, GARLI
- JNI wrapper
- C API
- implementation manager
- implementations: CPU, GPU
- GPU implementation: CUDA interface with CUDA kernels, OpenCL interface with OpenCL kernels
SLIDE 16 throughput for nucleotide data (4 states)
[plot: speedup factor against unique site patterns (100 to 3e+06)]
legend:
- GPU: AMD Radeon HD 7970 GHz Edition
- GPU: NVIDIA GeForce GTX 580 (CUDA)
- GPU: NVIDIA Tesla K20m
- MIC: Intel Xeon Phi SE10P
- CPU: Intel Xeon E5-2680 x2 (16 cores)
- CPU: Intel Xeon E5-2680 (single core)
SLIDE 17 throughput for codon data (64 states)
[plot: GFLOPS and speedup factor against unique site patterns (100 to 6e+04)]
legend:
- GPU: AMD Radeon HD 7970 GHz Edition
- GPU: NVIDIA GeForce GTX 580 (CUDA)
- GPU: NVIDIA Tesla K20m
- MIC: Intel Xeon Phi SE10P
- CPU: Intel Xeon E5-2680 x2 (16 cores)
- CPU: Intel Xeon E5-2680 (single core)
SLIDE 18 MrBayes speedup
[bar chart: MrBayes speedup by precision (double, single) for the nucleotide model and the codon model; axis ticks 1 to 64]
labels: MrBayes, MrBayes SSE
legend: MIC: Xeon Phi; CPU: 16 cores; CPU: SSE; CPU: standard
SLIDE 19 BEAST speedup
[bar chart: BEAST speedup by precision (double, single) for the nucleotide model and the codon model; axis ticks 1 to 64]
legend: MIC: Xeon Phi; CPU: 16 cores; CPU: SSE; CPU: standard
SLIDE 20
more than academic
academic: having no practical or useful significance
Webster’s New Collegiate Dictionary
SLIDE 21
two recent studies using BEAGLE library
SLIDE 22 phylogenetics in use: early spread of HIV-1
Faria et al. 2014 The early spread and epidemic ignition of HIV-1 in human populations. Science 346:56-61
SLIDE 23 phylogenetics in use: 2014 Ebola outbreak
Gire et al. 2014 Genomic surveillance elucidates Ebola virus origin and transmission during the 2014 outbreak. Science 345:1369-1372
SLIDE 24
acknowledgements
Daniel Ayres, University of Maryland
Peter Beerli, Florida State University
Aaron Darling, University of Technology Sydney
Mark Holder, University of Kansas
John Huelsenbeck, University of California, Berkeley
Paul Lewis, University of Connecticut
Andrew Rambaut, University of Edinburgh
Fredrik Ronquist, Swedish National Museum of Natural History
Marc Suchard, University of California, Los Angeles
David Swofford, Duke University
Derrick Zwickl, University of Arizona
Dan Stanzione, Texas Advanced Computing Center
Yariv Aridor and Arik Narkis, Intel Israel
Altera University Donation Program
National Science Foundation