PROOF as a Service on the Cloud: a Virtual Analysis Facility based on the CernVM ecosystem
Dario Berzano, R. Meusel, G. Lestaris, I. Charalampidis, G. Ganis, P. Buncic, J. Blomer
CERN PH-SFT
CHEP2013, Amsterdam, 15.10.2013
Dario.Berzano@cern.ch - PROOF as a Service on the Cloud - http://chep2013.org/contrib/308
A cloud-aware analysis facility
- Geographically distributed, independent cloud providers
- From IaaS to SaaS: admins provide virtual clusters, and the user's workflow does not change
- Virtual Analysis Facility → an analysis cluster on the cloud in one click
Clouds can be a troubled environment:
- Resources are diverse
→ Like the Grid, but at the virtual machine level
- Virtual machines are volatile
→ They might appear and disappear without notice

Building a cloud-aware application for HEP:
- Scale promptly when resources vary
→ No prior pinning of the data to process to the workers
- Deal smoothly with crashes
→ Automatic failover and clear recovery procedures

The usual Grid workflow (static job pre-splitting) is not cloud-aware.
PROOF is cloud-aware
PROOF: the Parallel ROOT Facility
- Based on unique advanced features of ROOT
- Event-based parallelism
- Automatic merging and display of results
- Runs on batch systems and Grid with PROOF on Demand
PROOF is interactive
- Constant control and feedback of attached resources
- Data is not preassigned to the workers → pull scheduler
- New workers dynamically attached to a running process (new in ROOT v5.34.10)

Interactivity is what makes PROOF cloud-aware
PROOF is cloud-aware
- Zero configuration
No system-wide installation
- Sandboxing
User crashes don’t propagate to others
- Self-servicing
User can restart her PROOF server
- Advanced scheduling
Leverage policies of underlying WMS
PROOF on Demand (PoD) runs PROOF on top of batch systems: http://pod.gsi.de (a usage sketch follows below)
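As an illustration of the user side, the sketch below (PyROOT) connects to a PoD-provided PROOF cluster and processes a chain. It assumes `pod-server start` and `pod-submit` have already been run; the tree name, file paths and selector are hypothetical, and if the "pod://" shorthand is not available, the connection string printed by `pod-info -c` can be passed instead.

```python
# Illustrative sketch (PyROOT): process a chain on a PoD-provided PROOF cluster.
# Assumes the PoD server is running and workers have been submitted; the tree
# name, file paths and selector below are hypothetical.
import ROOT

# "pod://" picks up the local PoD connection string; alternatively pass the
# output of `pod-info -c` explicitly.
proof = ROOT.TProof.Open("pod://")
print("Attached workers: %d" % proof.GetParallel())

chain = ROOT.TChain("events")                                # hypothetical tree
chain.Add("root://eos.example.org//data/sample_001.root")    # hypothetical files
chain.Add("root://eos.example.org//data/sample_002.root")
chain.SetProof()                 # route Process() through the PROOF cluster
chain.Process("MySelector.C+")   # hypothetical TSelector, compiled with ACLiC
```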
PROOF is cloud-aware

Adaptive workload: a very granular (down to per-event) pull architecture
[Diagram: master–worker pull protocol — a worker signals "ready", the master's packet generator sends the next packet, the worker processes it and asks for the next one ("get next").]

[Plots: packets processed per worker and worker activity stop time — the workload distribution across workers is nonuniform, but completion is uniform: the query processing time has mean 2287 s and RMS 16.61 s, i.e. all workers are done within ~20 s of each other.]
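The mechanism can be sketched in a few lines of Python (an illustration only, not PROOF's actual code): the master owns the packet generator and hands out the next packet whenever a worker declares itself ready, so faster workers simply process more packets.

```python
# Illustrative pull scheduler (not the actual PROOF code): the master owns the
# packet generator; workers ask for the next packet as soon as they are ready,
# so faster workers naturally take a larger share of the workload.
from queue import Queue
from threading import Thread

def packet_generator(n_events, packet_size):
    """Split [0, n_events) into (first, last) packets."""
    for first in range(0, n_events, packet_size):
        yield (first, min(first + packet_size, n_events))

def worker(name, packets, results):
    processed = 0
    while True:
        packet = packets.get()      # "get next"
        if packet is None:          # sentinel: no more work
            break
        first, last = packet
        processed += last - first   # stand-in for the real event loop
    results.put((name, processed))

packets, results = Queue(), Queue()
workers = [Thread(target=worker, args=("worker-%d" % i, packets, results))
           for i in range(4)]
for w in workers:
    w.start()
for packet in packet_generator(n_events=100000, packet_size=1000):
    packets.put(packet)
for _ in workers:                   # one termination sentinel per worker
    packets.put(None)
for w in workers:
    w.join()
while not results.empty():
    print(results.get())            # packets processed per worker will differ
```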
PROOF is cloud-aware

Dynamic addition of workers (new in ROOT v5.34.10): new workers can join and offload a running process
[Diagram: timeline of a running query — the master and the initially available workers perform a bulk init and start processing; workers that appear later register with the master, go through a deferred init and join the ongoing processing.]
PROOF dynamic workers

The user requests N workers: a bunch of workers starts immediately, the others gradually become available.

Old workflow:
- Wait until "some" workers are ready
- Run the full analysis on those workers only
- Late workers will be available only at the next run

New workflow:
- Wait until at least one worker becomes available
- Run the analysis
- Additional workers join the processing as they become available

Minimal latency and optimal resource usage. See the ATLAS use case: http://chep2013.org/contrib/256
PROOF dynamic workers

Measured the time taken for 100 Grid jobs, requested at the same time, to start on various ATLAS Grid sites. See the ATLAS talk: http://chep2013.org/contrib/256

[Plot: number of available workers (up to 100) vs. time (up to 3500 s) for the CERN, CNAF, ROMA1, NAPOLI and MILANO sites.]
PROOF dynamic workers

[Plot: actual time to results (up to ~1200 s) vs. total required computing time (up to ~35000 s), comparing Grid batch jobs (with the ideal number of workers) and PROOF with pull scheduling and dynamic workers.]

Analytically derived from the actual startup latency measurements: by design, PROOF is up to 30% more efficient on the same computing resources (analytical upper limit).
- PROOF with dynamic workers: all the job time is spent computing (never idle, no latencies)
- Batch jobs: results are collected only when the last workers have finished (latencies and dead times)
The virtual analysis facility
- What: a cluster of µνCernVMs with HTCondor
→ One head node plus a scalable number of workers
- How: contextualization configured on the Web
→ Simple web interface: http://cernvm-online.cern.ch
- Who: so easy that it can even be created by end users
→ You can have your personal analysis facility
- When: scales up/down automatically
→ Optimal usage of resources: fundamental when you pay for them!
[Stack diagram of the VAF components: PROOF, PoD, Elastiq, HTCondor, µνCernVM, CernVM-FS, CernVM Online, authn/authz.]
The virtual analysis facility
VAF leverages the CernVM ecosystem and HTCondor
- µCernVM: SLC6 compatible OS on demand
→ See previous talk: http://chep2013.org/contrib/213
- CernVM-FS: HTTP-based cached FUSE filesystem
→ Both the OS and the experiment software are downloaded on demand
- CernVM Online: safe context GUI and repository
→ See previous talk: http://chep2013.org/contrib/185
- HTCondor: light and stable workload management system
→ Workers auto-register to the head node: no static resource configuration
The full stack of components is cloud-aware
Elastiq queue monitor
[Diagram: elastiq watches the HTCondor queue (waiting and running jobs) and the running VMs (idle and working), and asks the cloud controller to start new VMs or shut down idle ones.]

elastiq: a Python app that monitors HTCondor and scales the cluster up or down (sketched below)
- Jobs waiting too long trigger a scale up
- EC2 interface (credentials are given securely in the context)
- Can be used on any HTCondor cluster and has a trivial configuration
- Code available at http://bit.ly/elastiq

CernVM Cloud: an experimental meta cloud controller
- Accepts scale requests
- Translates them to multiple clouds
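The core loop of such a queue monitor can be sketched roughly as follows (a simplified Python sketch, not the actual elastiq code; the `condor_q` invocation, thresholds and slot count are assumptions, and the symmetric shutdown of idle VMs is omitted):

```python
# Rough sketch of an elastiq-like loop (not the actual elastiq code): count idle
# HTCondor jobs and request more worker VMs when they have waited too long.
# Thresholds and the condor_q invocation are assumptions.
import subprocess
import time

WAITING_THRESHOLD_S = 100   # scale up if jobs stay idle longer than this
JOBS_PER_VM = 4             # assumed number of job slots per worker VM

def idle_jobs():
    """Number of idle jobs (JobStatus == 1) in the HTCondor queue."""
    out = subprocess.check_output(["condor_q", "-format", "%d\n", "JobStatus"])
    return sum(1 for line in out.decode().splitlines() if line.strip() == "1")

def scale_up(n_vms):
    """Request n_vms worker VMs from the cloud (EC2 RunInstances, e.g. via
    boto), passing the CernVM Online context as user-data."""
    print("Would start %d worker VM(s)" % n_vms)   # placeholder for the EC2 call

idle_since = None
while True:
    n_idle = idle_jobs()
    if n_idle == 0:
        idle_since = None                  # queue drained, reset the timer
    elif idle_since is None:
        idle_since = time.time()           # first time we see idle jobs
    elif time.time() - idle_since > WAITING_THRESHOLD_S:
        scale_up((n_idle + JOBS_PER_VM - 1) // JOBS_PER_VM)
        idle_since = None
    time.sleep(60)
```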
Elastic cloud computing in action
Context creation with CernVM Online: http://cernvm-online.cern.ch
1. Create a new special context
2. Customize a few options
3. Get the generated user-data
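The generated user-data can then be fed unchanged to any EC2-compatible cloud. A minimal sketch with boto follows; the endpoint, credentials, image ID, flavor and key name are all placeholders:

```python
# Minimal sketch: start one contextualized VM on an EC2-compatible cloud using
# the user-data generated by CernVM Online. Endpoint, credentials, image ID,
# flavor and key name are placeholders.
import boto
from boto.ec2.regioninfo import RegionInfo

conn = boto.connect_ec2(
    aws_access_key_id="EC2_ACCESS_KEY",
    aws_secret_access_key="EC2_SECRET_KEY",
    region=RegionInfo(name="mycloud", endpoint="cloud.example.org"),
    port=8773, path="/services/Cloud", is_secure=False)

with open("cernvm-online-context.txt") as f:   # user-data from CernVM Online
    user_data = f.read()

conn.run_instances("ami-00000001",             # placeholder image ID
                   instance_type="m1.large",   # placeholder flavor
                   key_name="mykey",
                   user_data=user_data)
```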
Elastic cloud computing in action
Screencast: http://youtu.be/fRq9CNXMcdI
µνCernVM+PROOF startup latency
Measured the delay before requested resources become available
Target clouds:
- Small: OpenNebula @ INFN Torino
- Large: OpenStack @ CERN (Agile)
Test conditions:
- µνCernVMs use an HTTP caching proxy
→ Precaching via a dummy boot
- The µνCernVM image is only 12 MB
→ Image transfer time negligible
- VMs deployed when resources are available
→ Rule out delay and errors due to lack of resources
Measuring latency due to:
- µνCernVM boot time
- HTCondor automatic registration of new nodes
- PoD and PROOF reaction time
Note: this is not a comparison of cloud infrastructures; only the µCernVM+PROOF latencies are measured.
µνCernVM+PROOF startup latency

[Plot: time to wait for workers [m:ss], ranging from 0:00 to 8:00, for CERN OpenStack and Torino OpenNebula.]

Measured the time elapsed between the request of the PoD workers and their availability (pod-info -l), with 10 VMs started in the test. The results are compatible: the latency is ~6 minutes from scratch.
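The measurement itself can be reproduced with a small polling script along these lines (a sketch; it assumes `pod-info -n` prints the number of PoD workers currently online and that the workers have already been requested with `pod-submit`):

```python
# Sketch: measure how long the requested PoD workers take to become available,
# by polling `pod-info -n` (assumed to print the number of online PoD agents).
import subprocess
import time

def online_workers():
    out = subprocess.check_output(["pod-info", "-n"]).decode().strip()
    return int(out) if out else 0

requested = 10                      # workers requested via pod-submit
start = time.time()
while online_workers() < requested:
    time.sleep(5)
print("All %d workers available after %.0f s" % (requested, time.time() - start))
```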
Conclusions
Every VAF layer is cloud-aware
- PROOF+HTCondor deal with “elastic” addition/removal of workers
- µCernVM is very small and fast to deploy
- CernVM-FS downloads only what is needed
Consistent configuration of solid and independent components
- No login to configure: all done via CernVM Online context
- PROOF+PoD also work dynamically on the Grid
- Elastiq can scale any HTCondor cluster, not PROOF-specific
- Reused existing components wherever possible
Thank you for your attention!
References
- PROOF (the Parallel ROOT Facility)
http://root.cern.ch/drupal/content/proof
- Virtual Analysis Facility client and Elastiq
https://github.com/dberzano/virtual-analysis-facility
- The CernVM Ecosystem
http://cernvm.cern.ch/portal/publications
- Cloud @ INFN Torino
http://chep2013.org/contrib/474
- CERN Agile Infrastructure