Reliability Support in Virtual Infrastructures
2nd IEEE International Conference on Cloud Computing Technology and Science, Indianapolis, 2010
RESO
Guilherme Koslovski (INRIA – University of Lyon)
Wai-Leong Yeow (DoCoMo USA Labs)
Cedric Westphal (DoCoMo USA Labs)
Tram Truong Huu (University of Nice – I3S)
Johan Montagnat (CNRS – I3S)
Pascale Vicat-Blanc Primet (INRIA - LYaTiss)

Reliability as a Service
• Reliability: probability that a system will survive failures
• Availability: fraction of time that a system is functional
• Advertised figures: 99.95% availability, 99.9% availability, 99.95% reliability, 100% uptime, 100% network uptime
• Actually nothing more than SLAs:
  – Failure => credits
  – Lock-ins
  – No guarantees at all
2nd IEEE CloudCom – 2010, G. Koslovski, W. Yeow, C. Westphal, T. Huu, J. Montagnat, P. Vicat-Blanc

Context
Convergence of computing and communication:
• Virtual Infrastructure is a concept emerging from Virtual Networks and Infrastructure as a Service
• New models and tools to manage the virtualized substrate and to help users execute their applications
[Figure labels: Network virtualization; Users; Resources virtualization; Distributed & virtualized substrate; Grid computing experience; IaaS, PaaS, … XaaS concepts; Complex applications]

Issue
• Network and IT resources are subject to random failures
• Failures can be measured: mean time between failures (MTBF)
• Impact of a failure on a distributed application:
  – Worker node failure: can affect the total execution time
  – Database and servers: can compromise the entire execution
• Some applications can recover from failures, but:
  – This process usually affects the execution time
  – This complicates the application development
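To make the MTBF bullet concrete, the sketch below converts an MTBF into the probability that a single node fails at least once during a run. It assumes an exponential failure model (constant failure rate), which the slides do not state; the function name `failure_probability` is hypothetical. The 1205 s duration is the failure-free makespan reported in the evaluation slides.

```python
import math


def failure_probability(mtbf_s: float, duration_s: float) -> float:
    """Probability of at least one failure within duration_s.

    Assumption (not from the slides): failures follow an exponential
    distribution with rate 1 / MTBF.
    """
    if mtbf_s <= 0:
        raise ValueError("MTBF must be positive")
    return 1.0 - math.exp(-duration_s / mtbf_s)


if __name__ == "__main__":
    # MTBF values used in the evaluation; 1205 s is the baseline makespan.
    for mtbf in (60000, 30000, 15000):
        p = failure_probability(mtbf, 1205)
        print(f"MTBF {mtbf} s -> per-node failure probability {p:.4f}")
```

Under this model, halving the MTBF roughly doubles the per-node failure probability for short runs, which is why the number of backup nodes grows as the MTBF shrinks.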

Our proposal
Reliability as a service offered by the infrastructure provider
[Figure: the user asks the provider either "Provide me a basic infrastructure" or "Provide me a reliable infrastructure". The application runs on VM 1 … VM n, each hosted on a physical machine (PM); backups BKP 1 … BKP n are each hosted on a PM.]

Our proposal
• Reliability becomes a service offered by the infrastructure provider
• Transparent reliability provisioning
• Users (applications) have no knowledge of physical failures
[Figure: user and application on VM 1 … VM n, each hosted on a physical machine (PM), with backups BKP 1 … BKP n each hosted on a PM.]

Outline
• Providing Transparent Reliability
• Reliable Virtual Infrastructure description
• Automatic generation of backup nodes and backup links
• Allocation algorithm
• Evaluation through a use case application
• Conclusion & Future work

Mechanism for providing transparent reliability
I. Virtual Infrastructure description
II. Translation of reliability requirements into real backup nodes
III. Allocation of a reliable virtual infrastructure

Virtual Infrastructure description: VXDL language
VXDL: Virtual private eXecution infrastructure Description Language – http://www.ens-lyon.fr/LIP/RESO/Software/vxdl/
A VXDL file contains:
• General description
• Resources description
• Network topology description
• Timeline description
Example resources (figure labels, including "vm1"):
• workers [100 nodes]: 1 GB, 2 GHz, 2 cores, Location: lyon.fr, Reliability: 99.9%
• database: [1 GB, 2 GB], 2 GHz, 2 cores, Location: lyon.fr, Reliability: 99.99%
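The information carried by the resources part of such a description can be sketched as a small data structure. This is not VXDL syntax; the class `ResourceGroup`, its field names, and the helper `protected_groups` are hypothetical, and the database count of 1 is an assumption. The values are the ones shown on the slide.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResourceGroup:
    """One group of virtual resources (illustrative, not VXDL syntax)."""
    name: str
    count: int
    memory_gb_min: float
    memory_gb_max: float
    cpu_ghz: float
    cores: int
    location: str
    reliability: Optional[float] = None  # e.g. 0.999 for 99.9%; None = unprotected


def protected_groups(groups: List[ResourceGroup]) -> List[ResourceGroup]:
    """Return the groups for which the user requested a reliability level."""
    return [g for g in groups if g.reliability is not None]


# Values from the slide; the database count of 1 is an assumption.
workers = ResourceGroup("workers", 100, 1, 1, 2.0, 2, "lyon.fr", 0.999)
database = ResourceGroup("database", 1, 1, 2, 2.0, 2, "lyon.fr", 0.9999)

if __name__ == "__main__":
    for group in protected_groups([workers, database]):
        print(f"{group.name}: {group.count} node(s), reliability {group.reliability}")
```

The point of the sketch is that reliability is just one more attribute of a resource group, next to capacity and location, so the user states a target and leaves the choice of backups to the provider.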

Virtual Infrastructure extension
Translation of reliability requirements into replica nodes
Opportunistic Redundancy Pooling (ORP) mechanism [W. Yeow et al, 2010]:
• Input:
  – Reliability level (user requirement)
  – Probability of physical failures (from MTBF)
  – Number of protected virtual nodes (user requirement)
• Output: the number of backup nodes
  – Backup nodes can be shared among different groups of critical nodes
  – For example, two sets of backup nodes (k1 and k2) can be shared to protect two groups of critical nodes. Thanks to ORP, only min(k1, k2) is required
[W. Yeow et al, 2010]: Designing and embedding reliable virtual infrastructures, VISA workshop 2010.
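The mapping from inputs to a backup count can be illustrated with a much simpler model than ORP. The sketch below is not the ORP mechanism and does not pool backups across groups: it assumes n critical nodes plus k backups, each failing independently with probability p, and treats the group as surviving when at most k of the n + k nodes fail. The function names are hypothetical.

```python
from math import comb


def survival_probability(n: int, k: int, p: float) -> float:
    """P(at most k of the n + k nodes fail), with independent failures.

    Simplified binomial model used for illustration only; it is not ORP.
    """
    total = n + k
    return sum(comb(total, f) * p**f * (1 - p) ** (total - f) for f in range(k + 1))


def min_backups(n: int, p: float, target: float, k_max: int = 1000) -> int:
    """Smallest k such that the survival probability reaches the target."""
    for k in range(k_max + 1):
        if survival_probability(n, k, p) >= target:
            return k
    raise ValueError("target reliability not reachable within k_max backups")


if __name__ == "__main__":
    # 30 protected nodes, 99.9% target; the p values are arbitrary examples.
    for p in (0.01, 0.02, 0.05):
        print(f"p = {p}: {min_backups(30, p, 0.999)} backup node(s)")
```

The three inputs listed on the slide appear as `target`, `p`, and `n`. The counts this toy model produces are not expected to match those computed by ORP.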

Virtual Infrastructure extension
Backup links: consistent network topology
[Figure: construction of backup links in three steps (Step 1, Step 2, Step 3)]

Allocation of a Reliable Virtual Infrastructure
• An extended graph is composed of the original description plus the backup components
• Backup components can have specific constraints:
  – For example, an original node and its backup node should be allocated on different physical racks
• Subgraph-isomorphism detection [Lischka et al, 2009]
[Figure: physical substrate and embedded graph]
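The rack constraint can be illustrated with a small embedding search. This sketch uses plain backtracking rather than the algorithm of Lischka et al., and it ignores node capacities and link bandwidth. All names (`embed`, `rack_of`, `apart`) are hypothetical. Each virtual node is mapped to a distinct physical node, each virtual link must land on a physical link, and each pair listed in `apart` must sit on different racks.

```python
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str]


def embed(v_nodes: List[str], v_edges: List[Edge],
          p_nodes: List[str], p_edges: List[Edge],
          rack_of: Dict[str, str],
          apart: List[Edge]) -> Optional[Dict[str, str]]:
    """Return a virtual-to-physical node mapping, or None if none exists."""
    p_links: Set[frozenset] = {frozenset(e) for e in p_edges}
    neighbours: Dict[str, Set[str]] = {v: set() for v in v_nodes}
    for a, b in v_edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    # Nodes that must not share a rack with a given node (e.g. its backup).
    separated: Dict[str, Set[str]] = {v: set() for v in v_nodes}
    for a, b in apart:
        separated[a].add(b)
        separated[b].add(a)

    def fits(v: str, p: str, mapping: Dict[str, str]) -> bool:
        # Every already-placed neighbour must be linked to p in the substrate.
        for n in neighbours[v]:
            if n in mapping and frozenset((p, mapping[n])) not in p_links:
                return False
        # Every already-placed separated partner must be on another rack.
        for s in separated[v]:
            if s in mapping and rack_of[mapping[s]] == rack_of[p]:
                return False
        return True

    def search(i: int, mapping: Dict[str, str], used: Set[str]) -> Optional[Dict[str, str]]:
        if i == len(v_nodes):
            return dict(mapping)
        v = v_nodes[i]
        for p in p_nodes:
            if p not in used and fits(v, p, mapping):
                mapping[v] = p
                used.add(p)
                found = search(i + 1, mapping, used)
                if found is not None:
                    return found
                del mapping[v]
                used.discard(p)
        return None

    return search(0, {}, set())


if __name__ == "__main__":
    mapping = embed(
        ["db", "db_bkp", "w1"], [("db", "w1"), ("db_bkp", "w1")],
        ["a", "b", "c"], [("a", "b"), ("b", "c"), ("a", "c")],
        {"a": "rack1", "b": "rack1", "c": "rack2"},
        [("db", "db_bkp")],
    )
    print(mapping)
```

Expressing the rack rule as an extra predicate inside the search keeps the embedding step generic, so other backup-specific constraints could be added in the same place.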

From mapping to allocation
• The map provided by the allocation is interpreted and instantiated using the HIPerNet framework [P. Primet et al, 2010]
• Original VMs and replicas are synchronized by a modified version of the Remus live protection mechanism [B. Cully et al, 2008]

Evaluation through a use case application
Bronze Standard: distributed large-scale application
– Quantifies the maximal error resulting from medical-image analysis
– Large databases: the more data, the higher the accuracy
– 31 VMs: 512 MB, 1 GHz
– 10 Mbps for each virtual link between the database and the workers
I) Translated into VXDL
II) Submitted to HIPerNet
Two scenarios of reliability requirements:
– Database protection: a failure stops the application execution
– Workers protection: a failure increases the execution time
Testbed: Grid’5000
– Physical substrate composed of 100 nodes
– MTBF simulation values: 60000s, 30000s, 15000s [D. Atwood et al., 2008]

Experimental results
Goal: quantify the cost of a reliable virtual infrastructure
• Prices are based on Amazon EC2 for Europe
• We do not include any specific link pricing
• Cost without reliability support (short term lease): $2.95 / h

VM specifications    Basic node
Short term lease     $0.095
Long term lease      $0.031

Prices for computing nodes protection (30 VMs, 99.9%):

                        Short term                          Long term
MTBF      Backup nodes  Total cost  Reliability cost/total  Total cost  Reliability cost/total
60000 s   5             $3.42       16.1%                   $3.10       5.3%
30000 s   8             $3.71       25.8%                   $3.19       8.4%
15000 s   12            $4.09       38.7%                   $3.32       12.6%
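The table figures can be reproduced with simple arithmetic. The sketch below is one reading that matches every cell, inferred from the numbers rather than stated on the slide: the 31 original VMs are always charged at the short term price, the backups at either the short or the long term price, and the percentage is the backup cost relative to the $2.945 base cost. The function and constant names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Tuple

SHORT_TERM = Decimal("0.095")  # $ per basic node per hour
LONG_TERM = Decimal("0.031")   # $ per basic node per hour
BASE_VMS = 31                  # VMs of the application without backups


def dollars(x: Decimal) -> Decimal:
    """Round to cents, half up."""
    return x.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def percent(x: Decimal) -> Decimal:
    """Convert a ratio to a percentage with one decimal, half up."""
    return (x * 100).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def reliability_cost(backups: int, backup_price: Decimal) -> Tuple[Decimal, Decimal]:
    """Return (total hourly cost, backup cost as a percentage of the base cost)."""
    base = BASE_VMS * SHORT_TERM
    extra = backups * backup_price
    return dollars(base + extra), percent(extra / base)


if __name__ == "__main__":
    print("base:", dollars(BASE_VMS * SHORT_TERM))
    for mtbf, backups in ((60000, 5), (30000, 8), (15000, 12)):
        print(mtbf, reliability_cost(backups, SHORT_TERM), reliability_cost(backups, LONG_TERM))
```

`Decimal` is used instead of floats so that values such as 3.705 round to 3.71 as in the table.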

Experimental results
Goal: evaluate the application behavior when executing with reliability support
• Application makespan without substrate failures: 1205s, used as baseline
– Database protection (DB label): the database is the only protected component. Makespan increases proportionally to the number of failures
– Worker nodes protection (WN label): only computing nodes are protected. Makespan slightly increases

MTBF      DB increase  WN increase
60000s    16.26%       0.2%
30000s    26.47%       1.7%
15000s    40.08%       3.2%

[Figure: bar chart of makespan (s) for the NI, DB and WN series; x-axis: NI, 60000s, 30000s, 15000s]
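The percentages can be turned back into approximate absolute makespans from the 1205 s baseline. This is plain arithmetic on the reported values, assuming each increase is relative to that baseline; the names `makespan` and `INCREASE_PCT` are hypothetical.

```python
BASELINE_S = 1205.0  # makespan without substrate failures, from the slide

# Reported makespan increases (%), per MTBF and protection scenario.
INCREASE_PCT = {
    "60000s": {"DB": 16.26, "WN": 0.2},
    "30000s": {"DB": 26.47, "WN": 1.7},
    "15000s": {"DB": 40.08, "WN": 3.2},
}


def makespan(baseline_s: float, increase_pct: float) -> float:
    """Absolute makespan for a given relative increase over the baseline."""
    return baseline_s * (1.0 + increase_pct / 100.0)


if __name__ == "__main__":
    for mtbf, row in INCREASE_PCT.items():
        db = makespan(BASELINE_S, row["DB"])
        wn = makespan(BASELINE_S, row["WN"])
        print(f"MTBF {mtbf}: DB about {db:.0f} s, WN about {wn:.0f} s")
```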

Experimental results
Goal: compare the reliability service with a resubmission mechanism
Resubmission mechanism:
– The application is aware of substrate failures
– A task is resubmitted on a new computing node
– The makespan difference would have been larger if backup nodes were not pre-allocated and configured

MTBF      Makespan increase
60000s    +13.08%
30000s    +19.67%
15000s    +22.19%

[Figure: bar chart of makespan (s), Reliability vs. Resubmission, for MTBF 60000s, 30000s and 15000s]

Conclusions
• Reliability becomes a service offered by the infrastructure provider
• We have developed a framework to provide transparent reliability:
  – A language to specify the reliability requirements
  – A mechanism to interpret these requirements and transform them into replicas (nodes and links)
  – A mapping and allocation process to provision the reliability level required by the user
• The framework was implemented on top of the HIPerNet framework and validated on the Grid’5000 testbed
• Future work includes:
  – The implementation of a mechanism to protect virtual links
  – A detailed investigation of the economic aspects
• Tomorrow there is a demonstration of the industry version of the HIPerNet framework (LYaTiss core) - http://www.lyatiss.com/
