Software tools to deploy and manage cryo-EM jobs in the cloud - PowerPoint PPT Presentation

Software tools to deploy and manage cryo-EM jobs in the cloud Michael Cianfrocco Life Sciences Institute Department of Biological Chemistry University of Michigan

‘State of the art’ computing for cryo-EM
‣ Data processing occurring at a different site than data collection
‣ Data copying & moving; living at multiple locations
‣ New users left on their own to navigate the complex world of computing & storage

Current state of computing in cryo-EM
[Figure: trends over time in # of cryo-EM users, “expert” level knowledge, and cyber-infrastructure available per user]

How do we scale computing and storage for cryo-EM?
‣ Individual laboratory / facility: Capital investments - $$$; IT support required - $$$; Difficult to scale
‣ University-wide: Computing - $$; Data storage - $$; Scalable
‣ National supercomputers: Free computing; Scalable; No data storage
‣ Cloud computing: Computing - $; Data storage - $; Scalable

The cloud is a ubiquitous feature of our daily lives: data storage, cloud computing, website hosting

The cloud is a ubiquitous feature of our daily lives
‣ ‘Application’ cloud, e.g. Netflix, Google, Dropbox
‣ ‘Service’ cloud, e.g. Application programming interfaces (APIs)
‣ ‘Infrastructure’ cloud, e.g. Virtual machines, webservers (Amazon Web Services)

Key principles of cloud computing
1. Cost effective
- Economies of scale
- Someone else takes care of IT hardware support
- Pay as you go (pro-rated to the minute)
2. Reliable data storage
- Backed up offsite, multiple locations
3. Flexibility
- Global footprint
- As more computing power is needed, easy to add more
- No need to ‘choose’ between CPU vs GPU resources
- Instant access to hundreds of CPUs

Amazon Web Services (AWS) is the world’s largest cloud provider
Region = Data center
Availability Zone = Building within data center
[Map: regions with # of availability zones; new regions]

Computing options on AWS
Instance = virtual machine on AWS
vCPU = hyper thread on CPU core
Instance       # CPU/GPU            Memory (GB RAM)
t2.nano        1 vCPU               0.5
c4.8xlarge     36 vCPU              60
m4.16xlarge    64 vCPU              256
r4.16xlarge    64 vCPU              488
x1.32xlarge    128 vCPU             1952
g3.4xlarge     1 GPU / 16 vCPU      122
g3.8xlarge     2 GPU / 32 vCPU      244
g3.16xlarge    4 GPU / 64 vCPU      488
p2.xlarge      1 GPU / 4 vCPU       61
p2.8xlarge     8 GPU / 32 vCPU      488
p2.16xlarge    16 GPU / 64 vCPU     732
…59 total instances (48 more than shown here)

Computing options on AWS
Instance = virtual machine on AWS
vCPU = hyper thread on CPU core
Instance       # CPU/GPU            Memory (GB RAM)    Price per hour
t2.nano        1 vCPU               0.5                $0.0058
c4.8xlarge     36 vCPU              60                 $1.591
m4.16xlarge    64 vCPU              256                $3.20
r4.16xlarge    64 vCPU              488                $4.256
x1.32xlarge    128 vCPU             1952               $13.338
g3.4xlarge     1 GPU / 16 vCPU      122                $1.14
g3.8xlarge     2 GPU / 32 vCPU      244                $2.28
g3.16xlarge    4 GPU / 64 vCPU      488                $4.56
p2.xlarge      1 GPU / 4 vCPU       61                 $0.90
p2.8xlarge     8 GPU / 32 vCPU      488                $7.20
p2.16xlarge    16 GPU / 64 vCPU     732                $14.40
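As a rough illustration of pay-as-you-go pricing, the hourly prices on this slide can be turned into a job cost estimate. This is only a sketch: the `job_cost` helper is hypothetical, the prices are the on-demand figures listed above (AWS prices change over time), and billing granularity and storage or transfer charges are ignored.

```python
# On-demand prices in USD per instance-hour, copied from the slide.
PRICE_PER_HOUR = {
    "t2.nano": 0.0058,
    "c4.8xlarge": 1.591,
    "m4.16xlarge": 3.20,
    "r4.16xlarge": 4.256,
    "x1.32xlarge": 13.338,
    "g3.4xlarge": 1.14,
    "g3.8xlarge": 2.28,
    "g3.16xlarge": 4.56,
    "p2.xlarge": 0.90,
    "p2.8xlarge": 7.20,
    "p2.16xlarge": 14.40,
}


def job_cost(instance_type, n_instances, hours):
    """Estimated on-demand cost in USD (hypothetical helper, not part of any tool)."""
    if n_instances < 0 or hours < 0:
        raise ValueError("n_instances and hours must be non-negative")
    return PRICE_PER_HOUR[instance_type] * n_instances * hours


if __name__ == "__main__":
    # Illustrative job: one 8-GPU p2.8xlarge instance running for 10 hours.
    print(f"${job_cost('p2.8xlarge', 1, 10):.2f}")
```

Because cost scales linearly with both instance count and run time, using more instances for proportionally less time costs about the same in this simple model.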

Reserving instances on AWS (pay-per-hour)
‣ On-demand: Pay ‘standard’ price. Instance is 100% yours.
‣ Reserved: Pay 1-3 yr upfront to reserve instance. Cheaper than on-demand. Instance is 100% yours.
‣ Spot: Bid on marketplace. Cheaper still. Instance is yours until someone outbids you.

Data storage on AWS
‣ Local SSDs: No cost. Directly attached to instance.
‣ Elastic block storage (EBS): $0.10/GB/mo. SSDs that can be attached/detached.
‣ Object storage: Scalable storage (active & cold)
- Simple storage service (S3): $0.023/GB/mo.
- Glacier: $0.004/GB/mo.
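To make the storage prices concrete, the sketch below estimates the monthly bill for a dataset on each tier. The `monthly_storage_cost` helper is hypothetical, the 12 TB size is borrowed from the upload figure quoted on a later slide, and 1 TB is taken as 1,000 GB as an assumption. Request and retrieval fees are not modeled.

```python
# Storage prices in USD per GB per month, copied from the slide.
STORAGE_PRICE_PER_GB_MONTH = {
    "EBS": 0.10,
    "S3": 0.023,
    "Glacier": 0.004,
}


def monthly_storage_cost(size_gb, tier):
    """Estimated monthly storage cost in USD (hypothetical helper)."""
    if size_gb < 0:
        raise ValueError("size_gb must be non-negative")
    return size_gb * STORAGE_PRICE_PER_GB_MONTH[tier]


if __name__ == "__main__":
    dataset_gb = 12 * 1000  # 12 TB, assuming 1 TB = 1,000 GB
    for tier in ("EBS", "S3", "Glacier"):
        cost = monthly_storage_cost(dataset_gb, tier)
        print(f"{tier:8s} ${cost:,.2f} per month")
```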

Previously… (Cianfrocco & Leschziner, eLife 2015)
Components: local workstation, virtual machine, EBS, CPU cluster
Pay < $10/hr for 150+ CPUs with spot pricing
Problems:
‣ Manual deployment & management of AWS resources
‣ Cumbersome data movement
‣ Medium/expert-level Linux experience required to interface

What is the best way to use the cloud? “Hybrid cloud architecture is the integration of on-premises resources with cloud resources.” e.g.

Building a hybrid cloud infrastructure for cryo-EM RELION as a prototype Amazon Web Services RELION GUI Instance + Software that will interface local & cloud resources Cryo-EM software Local (e.g. laptop)

Building a hybrid cloud infrastructure for cryo-EM RELION as a prototype RELION GUI Utilize cluster submission feature to submit to AWS

Building a hybrid cloud infrastructure for cryo-EM RELION as a prototype RELION GUI Implement for all jobtypes

cryoem-cloud-tools: software to run cryo-EM on the cloud
Local (e.g. laptop) and Amazon Web Services (S3, EBS, Instance)
‣ Move data to the cloud
‣ Retrieve output in real time
RELION: Run now! queue: qsub_aws

Amazon Web Services: Region (e.g. US-West-2), Availability Zone (e.g. US-West-2a)
‣ Movie alignment: 5 x 128 vCPUs; S3 bucket (aligned movies, micrographs)
‣ Micrograph particle extraction: 16 vCPUs, 1 x SSD EBS volume (particles)
‣ Instance monitored by CloudWatch to automatically terminate instance if idle > 1 hr
‣ Multi-file uploads: 150 - 300 MB/sec; 12 TB in 11 hrs
‣ Unblur movie alignment of 1536 movies = ~2.2 hr on 5 x 128 vCPUs
‣ ‘Enhanced networking’ on AWS (10 or 20 Gigabit)
‣ Input/output files transferring over the internet
‣ Security group to restrict access to user’s IP address
RELION: Run now! queue: qsub_aws
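The slide does not say how the idle-termination monitor is configured. One common way to get this behavior is a CloudWatch alarm on the instance's `CPUUtilization` metric with an EC2 terminate action. The sketch below only builds the parameters for boto3's `put_metric_alarm` call; the helper name, the 5% CPU threshold, the 5-minute period, and the instance ID are all assumptions for illustration, not values taken from cryoem-cloud-tools.

```python
def idle_terminate_alarm(instance_id, region, cpu_threshold_pct=5.0, idle_minutes=60):
    """Build put_metric_alarm parameters that terminate an idle instance.

    Hypothetical helper: "idle" is approximated as average CPU utilization
    staying below cpu_threshold_pct for idle_minutes (both are assumptions).
    """
    period_s = 300  # evaluate the metric in 5-minute windows
    return {
        "AlarmName": f"terminate-idle-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": period_s,
        # Number of consecutive windows that must breach: 12 x 5 min = 1 hr.
        "EvaluationPeriods": (idle_minutes * 60) // period_s,
        "Threshold": cpu_threshold_pct,
        "ComparisonOperator": "LessThanThreshold",
        # Built-in EC2 action that terminates the instance when the alarm fires.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:terminate"],
    }


if __name__ == "__main__":
    params = idle_terminate_alarm("i-0123456789abcdef0", "us-west-2")
    print(params["EvaluationPeriods"], "periods of", params["Period"], "seconds")
    # With boto3 installed and credentials configured, the alarm could be created with:
    #   import boto3
    #   boto3.client("cloudwatch", region_name="us-west-2").put_metric_alarm(**params)
```

Keeping the parameter construction separate from the API call makes the idle policy easy to inspect without touching an AWS account.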

Amazon Web Services: Region (e.g. US-West-2), Availability Zone (e.g. US-West-2a)
GPU-accelerated processing: AutoPick, CTF estimation, 2D/3D classification, auto-refine
‣ 1 x SSD; 1, 2, 4, 8, 16 GPUs
‣ S3 bucket: averages, 3D models
RELION: Run now! queue: qsub_aws

Amazon Web Services: Region (e.g. US-West-2), Availability Zone (e.g. US-West-2a)
Movie processing
‣ Movie particle extraction: 8 x 36 vCPUs
‣ Movie refinement: 1 x 128 vCPUs
‣ Particle polishing: 1 x 128 vCPUs
‣ 42 TB / instance
‣ S3 bucket: particles
RELION: Run now! queue: qsub_aws

Amazon Web Services: Region (e.g. US-West-2), Availability Zone (e.g. US-West-2a), S3 bucket
‣ Movie alignment: 5 x 128 vCPUs (micrographs)
‣ Micrograph particle extraction: 1 x gp2 SSD, 16 vCPUs (particles)
‣ AutoPick, CTF estimation, 2D/3D classification, auto-refine: 1 x gp2 SSD; 1, 8, 16 GPUs (averages, 3D models)
‣ Movie particle extraction: 8 x 36 vCPUs
‣ Movie refinement: 1 x 128 vCPUs
‣ Particle polishing: 1 x 128 vCPUs (particles)
‣ 42 TB / instance
RELION: Run now! queue: qsub_aws

Comparison of AWS vs. local GPU workstation for RELION2
Comparing 2.2 Å beta-galactosidase dataset (EMPIAR 10061)
[Figure: per-step processing timelines for the GPU workstation (Kimanius et al.), 115 h total, and AWS, 55 h total. Steps: Unblur, Auto Pick, Gctf, 2D Class., 3D Class., 3D Auto-refine, Movie-extract, Movie-refine, Polish, 3D Auto-refine. Steps are marked as GPU-based or CPU-based process.]

Comparison of AWS vs. local GPU workstation for RELION2
[Figure: bar chart of percent-speed increase between AWS and local GPU workstation for each step: Movie alignment, Auto Pick, Gctf, 2D classification, 3D Auto-refine, Movie-extract, Movie-refine, Polish, 3D classification, 3D Auto-refine. Bars are marked as GPU-based or CPU-based process.]

Cost breakdown of AWS
Total computing cost: $688.15
‣ GPU costs will decrease by 50% with g3 instance (M40 GPUs)
‣ New unblur from cisTEM is much faster
[Figure: bar chart of cost ($ USD) for each step: Movie alignment, Auto Pick, Gctf, 2D classification, 3D Auto-refine, Movie-extract, Movie-refine, Polish, 3D classification, 3D Auto-refine. Bars are marked as GPU-based or CPU-based process.]

Cost comparison between AWS and local GPU workstation
                                                AWS          Local GPU workstation
1 structure
  Computing costs*                              $688.17      $72.45
  Processing time (h)                           55           115
  Number of working days                        5            10
  Net cost of local GPU workstation vs. AWS#    -$345.83
3 structures
  Computing costs*                              $2064.51     $217.35
  Processing time (h)                           55           345
  Number of working days                        5            30
  Net cost of local GPU workstation vs. AWS#    -$1998.84
*For local GPU workstation, $10,000 workstation cost = $0.63/hr (60% utilization).
#Assuming daily salary of $192.31 USD (based on $50,000/yr income)
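The slide does not spell out how the net cost is computed. One reading that reproduces the single-structure figure is: total cost = computing cost + working days x daily salary, and net cost = AWS total minus workstation total. The sketch below works through that reading for the 1-structure case only. The formula is an interpretation, and the function names are hypothetical.

```python
DAILY_SALARY = 192.31        # USD per working day, from the slide footnote
WORKSTATION_RATE = 0.63      # USD per hour, from the slide footnote


def total_cost(computing_cost, working_days, daily_salary=DAILY_SALARY):
    """Computing cost plus salary for the days spent waiting on processing."""
    return computing_cost + working_days * daily_salary


def net_cost(aws_total, workstation_total):
    """Negative values mean AWS is cheaper overall under this reading."""
    return aws_total - workstation_total


if __name__ == "__main__":
    # Workstation computing cost follows from 115 h at $0.63/hr.
    workstation_compute = 115 * WORKSTATION_RATE
    aws = total_cost(688.17, 5)
    workstation = total_cost(workstation_compute, 10)
    print(f"Workstation computing cost: ${workstation_compute:.2f}")
    print(f"Net cost (1 structure): ${net_cost(aws, workstation):.2f}")
```

Under these assumptions the salary term dominates: the workstation's computing is far cheaper per hour, but the extra working days outweigh that saving.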

Global footprint of cryoem-cloud-tools
‣ US-East-1 (Virginia)
‣ US-East-2 (Ohio)
‣ US-West-2 (Oregon)
‣ EU-West-1 (Ireland)
‣ AP-Southeast-2 (Sydney)
Tracking with expanding GPU availability on AWS
[Map legend: region & # of availability zones; new regions]

Extending cryoem-cloud-tools: Rosetta on AWS
216 structures for each CM & Relax into 2.2 Å beta-galactosidase
- Incorporated complex Rosetta workflow for submitting Rosetta jobs
- Users just need FASTA sequence file and cryo-EM map
./rosetta_refinement_on_aws.py
Amazon Web Services: Region (e.g. US-West-2), Availability Zone (e.g. US-West-2a)
‣ Job submission & data upload to AWS
‣ 6 x 36 vCPUs
‣ Output PDB models
[Figure: bar chart of Rosetta run time (hr); 13.5X speedup]
Total AWS Cost: $43.60
With Indrajit Lahiri (UCSD) & Frank DiMaio (UW)

Extending cryoem-cloud-tools: Appion on AWS
./apAWS.py
Amazon Web Services: Region (e.g. US-West-2), Availability Zone (e.g. US-West-2a)
‣ Job submission & data upload to AWS
‣ RELION2 on GPU machines
‣ Output PDB models
With Carl Negro (NRAMM)

Extending cryoem-cloud-tools : Appion on AWS
