ScipionCloud: Large scale cryo electron microscopy image processing on commercial and academic clouds
Who are we?
- The Instruct cryoEM Image Processing Center
- Instruct: the European Research Infrastructure for Structural Biology, providing researchers with access to state-of-the-art structural biology infrastructure
What is Cryo Electron Microscopy?
Among the structural biology (SB) techniques at the core of the Instruct ESFRI project, electron microscopy under cryogenic conditions ("cryo-EM") is currently the fastest-growing area; Nature Methods named it "Method of the Year 2015".
Why do we hear so much about Electron Microscopy?
Because the combination of:
1) the very good performance of current microscopes,
2) the very good image acquisition characteristics of direct electron detectors, and
3) very good new software for 3D reconstruction and classification
makes it possible to solve the structure of large and flexible macromolecular complexes without 3D crystals, from small amounts of sample at low concentration.
CryoEM for drug discovery
Cryo-EM resolving the structure of the key Ebola virus glycoprotein in complex with therapeutic antibodies
Typical EM Workflow
(Figure: typical EM workflow, annotated "16 cores, 2 GB/core")
Hardware revolution on CryoEM processing
Traditionally: HPC clusters or fat nodes. Now two lines of improvement are emerging:
- Graphical Processing Units (GPUs)
Algorithms are being ported to GPUs, and new GPU algorithms are being developed
- Cloud platforms
A plethora of EM software packages: our answer is the Scipion workflow integrator
Bringing software integration to EM in workflows
Scipion Framework
Scipion encapsulates:
- Parallelization: handled by each EM program or by Scipion itself (OpenMPI)
- Environment setup, libraries
- Batch system submission: Scipion templates
- Use of GPUs: implemented in the EM packages, each with its own requirements:
– Relion 2.0: Nvidia cards with compute capability of at least 3.5; for particles larger than 200 px, 2 GPUs with a minimum of 4 GB RAM
– MotionCor2: CUDA 8
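The Relion 2.0 requirements listed above can be expressed as a small check. This is a minimal sketch, assuming the thresholds exactly as stated on the slide (compute capability of at least 3.5; two GPUs of at least 4 GB for particles larger than 200 px). The `Gpu` type and `meets_relion2_requirements` function are hypothetical helpers, not part of Scipion or Relion. The compute capabilities used are those of cards in the profiling: GRID K520 (3.0) and Tesla K20 (3.5).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    """One GPU card (hypothetical helper type)."""
    model: str
    compute_capability: tuple[int, int]  # (major, minor), e.g. (3, 7)
    memory_gb: float


# Thresholds as listed on the slide for Relion 2.0.
MIN_COMPUTE_CAPABILITY = (3, 5)
LARGE_PARTICLE_PX = 200
MIN_GPUS_LARGE = 2
MIN_MEMORY_GB_LARGE = 4.0


def meets_relion2_requirements(gpus: list[Gpu], particle_size_px: int) -> bool:
    """True if the cards satisfy the slide's Relion 2.0 GPU requirements."""
    # Tuple comparison: (3, 7) >= (3, 5) is True, (3, 0) >= (3, 5) is False.
    usable = [g for g in gpus if g.compute_capability >= MIN_COMPUTE_CAPABILITY]
    if particle_size_px > LARGE_PARTICLE_PX:
        # Particles above 200 px: at least two usable cards with >= 4 GB each.
        big = [g for g in usable if g.memory_gb >= MIN_MEMORY_GB_LARGE]
        return len(big) >= MIN_GPUS_LARGE
    return len(usable) >= 1


K520 = Gpu("GRID K520", (3, 0), 4)
K20 = Gpu("Tesla K20", (3, 5), 4)
print(meets_relion2_requirements([K520], 180))      # False
print(meets_relion2_requirements([K20], 180))       # True
print(meets_relion2_requirements([K20, K20], 256))  # True
```

Under these thresholds a GRID K520 would not qualify at all, while a single Tesla K20 is enough for particles up to 200 px.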
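Scipion submits jobs to batch systems through templates. The sketch below shows the general idea with Python's `string.Template` and a SLURM script; the template text, the placeholder names and the `render_submission` function are hypothetical and do not reproduce Scipion's own host configuration format.

```python
from string import Template

# Hypothetical SLURM script; placeholder names are illustrative only.
SUBMIT_TEMPLATE = Template(
    "#!/bin/bash\n"
    "#SBATCH --job-name=$job_name\n"
    "#SBATCH --ntasks=$mpi_procs\n"
    "#SBATCH --time=$walltime\n"
    "${gres_line}"
    "mpirun -np $mpi_procs $command\n"
)


def render_submission(job_name: str, command: str, mpi_procs: int = 1,
                      gpus: int = 0, walltime: str = "24:00:00") -> str:
    """Fill the template; substitute() raises KeyError for any unfilled placeholder."""
    # Request GPUs only when the step needs them.
    gres_line = f"#SBATCH --gres=gpu:{gpus}\n" if gpus > 0 else ""
    return SUBMIT_TEMPLATE.substitute(
        job_name=job_name, command=command, mpi_procs=mpi_procs,
        walltime=walltime, gres_line=gres_line)


print(render_submission("refine3d", "relion_refine_mpi --gpu", mpi_procs=4, gpus=2))
```

Using `substitute` rather than `safe_substitute` makes a missing value fail loudly instead of leaving a literal `$placeholder` in the submitted script.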
Scipion distributions
- Binaries
- Source code + EM packages autoinstall
- ScipionCloud:
- Public AMI on AWS EC2 (EU Ireland, US North Virginia and US Oregon regions)
- Virtual Appliance on EGI AppDB
- Vagrant file and CVMFS (Westlife project)
- Puppet + Cloudify (Westlife project)
ScipionCloud
- Ubuntu 14.04 LTS
- Scipion release 1.1 (source git)
- Most important EM packages compiled with CUDA (GPU support)
- Nvidia driver + CUDA toolkit (7.5 and 8.0)
- Guacamole (remote desktop)
- StarCluster (only AWS)
ScipionCloud profiling
Profiling workflow
- Stages: Acquisition, Network transfer, Preprocessing, Processing, Postprocessing
- Steps: 1 Import movies, 2 BIM (beam-induced motion) correction, 3 CTF estimation, 4 Particle picking, 5 Particle extraction, 6 2D classification, 7 Initial model, 8 3D classification, 9 3D refinement, 10 3D postprocessing
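The profiling workflow is a chain of steps in which each one consumes the outputs of earlier ones, which is what lets a workflow integrator such as Scipion order and launch them. A minimal sketch with Python's `graphlib` (3.9+); the dependency sets are an assumption for illustration (for example, that particle extraction needs both picked coordinates and CTF estimates), not Scipion's internal representation.

```python
from graphlib import TopologicalSorter

# Assumed dependencies: each step lists the steps whose outputs it consumes.
WORKFLOW = {
    "Import movies": set(),
    "BIM correction": {"Import movies"},
    "CTF estimation": {"BIM correction"},
    "Particle picking": {"BIM correction"},
    "Particle extraction": {"Particle picking", "CTF estimation"},
    "2D classification": {"Particle extraction"},
    "Initial model": {"2D classification"},
    "3D classification": {"Particle extraction", "Initial model"},
    "3D refinement": {"3D classification"},
    "3D postprocessing": {"3D refinement"},
}


def execution_order(workflow: dict[str, set[str]]) -> list[str]:
    """Return one valid run order; raises graphlib.CycleError on a cyclic workflow."""
    return list(TopologicalSorter(workflow).static_order())


for step in execution_order(WORKFLOW):
    print(step)
```

`TopologicalSorter` also provides `prepare()`, `get_ready()` and `done()`, which would allow independent steps, such as CTF estimation and particle picking here, to be launched concurrently.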
Profiling data transfer
- 966 movies of 8K × 8K pixels: 6.6 TB of raw data
- Transferred with Aspera Connect (from the EMPIAR database)
- Also tested: bbcp and rsync
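The dataset size and the transfer times in the profiling results (36 h to AWS EC2, 23 h to FedCloud) give the effective transfer rate. A back-of-the-envelope sketch, assuming decimal units (1 TB = 10^12 bytes) and that each time covers the full 6.6 TB; the function names are illustrative.

```python
RAW_DATA_BYTES = 6.6e12  # 6.6 TB, decimal units
N_MOVIES = 966


def throughput_mbit_s(n_bytes: float, hours: float) -> float:
    """Average rate in megabits per second (1 Mbit = 10**6 bits)."""
    return n_bytes * 8 / (hours * 3600) / 1e6


def mean_movie_size_gb(n_bytes: float, n_movies: int) -> float:
    """Average size of one movie in GB (1 GB = 10**9 bytes)."""
    return n_bytes / n_movies / 1e9


for target, hours in (("AWS EC2", 36), ("FedCloud", 23)):
    print(f"{target}: {throughput_mbit_s(RAW_DATA_BYTES, hours):.0f} Mbit/s")
print(f"mean movie size: {mean_movie_size_gb(RAW_DATA_BYTES, N_MOVIES):.1f} GB")
```

Under these assumptions the transfers averaged roughly 407 Mbit/s to AWS and 638 Mbit/s to FedCloud, with about 6.8 GB per movie.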
Profiling machine types
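| Environment | Instance | vCPUs | RAM (GB) | GPU model | GPU RAM (GB) | Cost ($/hour) |
|---|---|---|---|---|---|---|
| AWS EC2 Ireland | g2.2xlarge | 8 | 15 | GRID K520 | 4 | 0.702 |
| AWS EC2 Ireland | p2.8xlarge | 32 | 488 | Tesla K80 | 12 | 7.776 |
| AWS EC2 Ireland | r3.8xlarge | 32 | 244 | - | - | 0.888 |
| AWS EC2 Ireland | x1.32xlarge | 128 | 1952 | - | - | 2.96 |
| FedCloud CESNET | universe | 40 | 232 | - | - | - |
| FedCloud IISAS | gpu1cpu6 | 6 | 24 | Tesla K20 | 4 | - |
| FedCloud IISAS | gpu2gpu12 | 12 | 48 | Tesla K20 | 4 | - |
| Local | asimov | 32 | 512 | - | - | 1.85 (est) |
| Local | titanxp | 32 | 128 | Titan XP | 12 | - |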
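Cloud cost is the hourly price times the hours used times the number of instances. A minimal estimator, assuming the prices in the machine-type table for the two GPU instance types; `estimate_cost` is a hypothetical helper, and the `bill_whole_hours` flag is an assumption for providers that charge each started hour.

```python
import math

# Hourly prices (USD) for the two GPU instance types, copied from the
# machine-type table (AWS EC2, Ireland region).
HOURLY_USD = {"g2.2xlarge": 0.702, "p2.8xlarge": 7.776}


def estimate_cost(instance: str, hours: float, n_instances: int = 1,
                  bill_whole_hours: bool = False) -> float:
    """Estimated cost in USD; optionally round each instance's time up to whole hours."""
    if hours < 0 or n_instances < 1:
        raise ValueError("hours must be >= 0 and n_instances >= 1")
    rate = HOURLY_USD[instance]  # KeyError for an instance type not listed
    billed_hours = math.ceil(hours) if bill_whole_hours else hours
    return rate * billed_hours * n_instances


print(f"{estimate_cost('g2.2xlarge', 36):.2f}")  # movie transfer on g2.2xlarge
```

The times in the profiling results are rounded, so estimates like this will not reproduce the reported costs exactly.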
Profiling results
| Step | Program | AWS EC2 Ireland: machine type | AWS time (h) | AWS cost ($) | FedCloud: machine type | FedCloud time (h) | Local server CNB: machine type | Local time (h) |
|---|---|---|---|---|---|---|---|---|
| Transfer movies | Aspera | g2.2xlarge | 36 | 26 | 1gpu6cpu | 23 | Local server | - |
| Align movies | MotionCor2 (GPU) | | 41 | | | | | |
| CTF estimation | ctffind4 | | | | | | | |
| Particle picking | Xmipp3 | | Interactive | - | | Interactive | | Interactive |
| Particle extraction | Relion 2.0 | | 0.6 | 0.5 | | 0.4 | | 0.4 |
| 2D classification | Relion 2.0 (GPU) | p2.8xlarge | 6 | 42 | 2gpu12cpu | 25 | | 8 |
| Initial volume | EMAN 2.12 | | 0.08 | 0.7 | | 0.16 | | 0.22 |
| 3D classification | Relion 2.0 (GPU) | | 0.6 | 4.7 | | 2.1 | | 1.3 |
| 3D refinement | Relion 2.0 (GPU) | | 0.7 | 5.6 | | 2.6 | | 1.8 |
| Postprocessing | Relion 2.0 | | 0.003 | 0.03 | | 0.004 | | 0.003 |
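Summing the per-step timings gives the processing time per platform from particle extraction to postprocessing. This sketch assumes that the four numbers in each of those rows of the profiling results are, in order, AWS time, AWS cost, FedCloud time and local time; it leaves out movie transfer, alignment, CTF estimation and interactive picking.

```python
# Hours per step as (AWS EC2, FedCloud, local), read from the profiling results.
STEP_HOURS = {
    "Particle extraction": (0.6, 0.4, 0.4),
    "2D classification": (6.0, 25.0, 8.0),
    "Initial volume": (0.08, 0.16, 0.22),
    "3D classification": (0.6, 2.1, 1.3),
    "3D refinement": (0.7, 2.6, 1.8),
    "Postprocessing": (0.003, 0.004, 0.003),
}
AWS_COST_USD = (0.5, 42.0, 0.7, 4.7, 5.6, 0.03)  # same steps, same order
PLATFORMS = ("AWS EC2", "FedCloud", "Local")


def total_hours(step_hours: dict[str, tuple[float, float, float]]) -> dict[str, float]:
    """Sum the per-step hours separately for each platform."""
    totals = [0.0] * len(PLATFORMS)
    for hours in step_hours.values():
        for i, h in enumerate(hours):
            totals[i] += h
    return dict(zip(PLATFORMS, totals))


for platform, hours in total_hours(STEP_HOURS).items():
    print(f"{platform}: {hours:.1f} h")
print(f"AWS EC2 cost: ${sum(AWS_COST_USD):.2f}")
```

This yields about 8.0 h on AWS EC2 (for $53.53), 30.3 h on FedCloud and 11.7 h locally, with 2D classification dominating on every platform.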
The following results are not directly comparable with those above, because the particle size was 512 px instead of 200 px.
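| Step | Program | Environment | Machine type | Time (h) | Cost ($) |
|---|---|---|---|---|---|
| 3D refinement | Relion 1.4 (CPU) | AWS EC2 Ireland | x1.32xlarge | 28 | 448 |
| 3D refinement | Relion 1.4 (CPU) | FedCloud | universe | 166 | |
| 3D refinement | Relion 1.4 (CPU) | Local server CNB | Local server CPU | 74 | |
| 3D refinement | Relion 1.4 (CPU) | AWS EC2 Ireland | r3.8xlarge | 88 | 261 |
| 3D refinement | Relion 1.4 (CPU) | AWS EC2 Ireland | 4 × r3.8xlarge | 27 | 325 |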
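The two r3.8xlarge results allow a quick scaling check for the CPU refinement: speedup is T1/TN and parallel efficiency is the speedup divided by the number of instances. A minimal sketch using the reported 88 h on one instance and 27 h on four:

```python
def speedup(t_one: float, t_n: float) -> float:
    """How many times faster the N-instance run is than the single-instance run."""
    return t_one / t_n


def parallel_efficiency(t_one: float, t_n: float, n: int) -> float:
    """Speedup per instance; 1.0 means perfect linear scaling."""
    return speedup(t_one, t_n) / n


# Relion 1.4 CPU 3D refinement on r3.8xlarge (hours and USD from the results).
T_ONE, T_FOUR = 88.0, 27.0
COST_ONE, COST_FOUR = 261.0, 325.0

print(f"speedup {speedup(T_ONE, T_FOUR):.2f}, "
      f"efficiency {parallel_efficiency(T_ONE, T_FOUR, 4):.0%}, "
      f"cost ratio {COST_FOUR / COST_ONE:.2f}")
# speedup 3.26, efficiency 81%, cost ratio 1.25
```

Four instances finish about 3.3 times faster at roughly 81% efficiency, for about 25% more cost.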
Conclusions
- GPUs have changed the EM processing paradigm, in both processing time and cost
- Cloud platforms can be a good solution for small labs that do not want to invest in hardware, or for occasional needs (training)
- ScipionCloud allows scientists to try out and use the Scipion framework without dealing with installation and configuration
Plans for the future
- Improve remote desktop visualization
– Update the Guacamole installation
– Integrate VirtualGL + TurboVNC with Guacamole
- Upgrade to Ubuntu 16.04
- Dynamic cluster support on Federated Cloud
- Improve image contextualization → INDIGO solutions
Acknowledgments
Projects:
- EGI Engage Competence Center
- Instruct Pilot EM cloud computing
People:
- Enol Fernandez (EGI.eu)
- Boris Parak (CESNET)
- Viet Tran and the other support staff at IISAS GPUCloud
References
- Scipion project: http://scipion.cnb.csic.es
- MoBrain project: https://mobrain.egi.eu
- INSTRUCT: http://www.structuralbiology.eu
- Westlife project: http://about.west-life.eu
- StarCluster: http://star.mit.edu/cluster/index.html
- Guacamole: http://guacamole.incubator.apache.org