

SLIDE 1

May 22, 2008 Symmetric Active/Active High Availability for High-Performance Computing System Services: Accomplishments and Limitations 1/31

Symmetric Active/Active High Availability for High-Performance Computing System Services: Accomplishments and Limitations

Christian Engelmann (1,2), Stephen L. Scott (1), Chokchai (Box) Leangsuksun (3), Xubin (Ben) He (4)

(1) Oak Ridge National Laboratory, Oak Ridge, USA
(2) The University of Reading, Reading, UK
(3) Louisiana Tech University, Ruston, USA
(4) Tennessee Tech University, Cookeville, USA

SLIDE 2

Overview

  • Overall background
      • Scientific high-performance computing
      • Availability issues in high-performance computing systems
  • Service-level availability taxonomy
  • Symmetric active/active replication
      • Model, algorithms, architecture
  • Symmetric active/active prototypes
      • PBS TORQUE job and resource management service
      • Parallel Virtual File System metadata service
  • Symmetric active/active replication framework

SLIDE 3

Scientific High-Performance Computing

  • Large-scale high-performance computing
      • Tens-to-hundreds of thousands of processors
      • Current systems: IBM BG/L and Cray XT5
      • Next-generation: Petascale IBM BG/P, Cray Baker
  • Computationally and data intensive applications
      • 100 TFlops - 1 PFlops with 100 TB - 1 PB of data
      • Climate change, nuclear astrophysics, fusion energy, materials sciences, biology, nanotechnology, …
  • Capability vs. capacity computing
      • Single jobs occupy large-scale high-performance computing systems for weeks and months at a time

SLIDE 4

Availability Measured by the Nines

see <http://www.nccs.gov/computing-resources/systems-status/> for current ORNL system status

  • Enterprise-class hardware + Stable Linux kernel = 5+
  • Substandard hardware + Good high availability package = 2-3
  • Today’s supercomputers = 1-2
  • My desktop = 1-2

9’s  Availability  Downtime/Year       Examples
1    90.0%         36 days, 12 hours   Personal Computers
2    99.0%         87 hours, 36 min    Entry Level Business
3    99.9%         8 hours, 45.6 min   ISPs, Mainstream Business
4    99.99%        52 min, 33.6 sec    Data Centers
5    99.999%       5 min, 15.4 sec     Banking, Medical
6    99.9999%      31.5 seconds        Military Defense
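The downtime figures in the table follow directly from the number of nines. A minimal Python sketch of that arithmetic, assuming a 365-day year (which is what the table values appear to use); the function names are illustrative and not from the presentation:

```python
HOURS_PER_YEAR = 365 * 24  # assumption: the table uses a 365-day year


def availability_from_nines(nines: int) -> float:
    """Availability as a fraction, e.g. 3 nines -> 0.999."""
    return 1.0 - 10.0 ** (-nines)


def downtime_seconds_per_year(nines: int) -> float:
    """Expected unavailable time per year, in seconds."""
    return 10.0 ** (-nines) * HOURS_PER_YEAR * 3600


for nines in range(1, 7):
    pct = availability_from_nines(nines) * 100
    hours = downtime_seconds_per_year(nines) / 3600
    print(f"{nines} nines: {pct:.4f}% available, {hours:.4f} hours down per year")
```

For three nines this gives 8.76 hours per year, which matches the "8 hours, 45.6 min" row.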

SLIDE 5

Typical Failure Causes in HPC Systems

  • Overheating (design errors - specification vs. usage)
  • Memory and network errors (soft errors)
  • Hardware failures due to wear/age of:
      • Hard drives, memory modules, network cards, processors
  • Software failures due to bugs in:
      • Operating system, middleware, applications
  • Different scale requires different solutions:
      • Compute nodes (up to ~200,000)
      • Front-end, service, and I/O nodes (1 to ~200)

SLIDE 6

Single Head/Service Node Problem

  • Single point of failure
  • Compute nodes sit idle while head node is down
  • A = MTTF / (MTTF + MTTR)
  • MTTF depends on head node hardware/software quality
  • MTTR depends on the time it takes to repair/replace node
  • MTTR = 0 → A = 1.00 (100%), continuous availability
  • Fail-stop model
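The availability formula on this slide can be evaluated directly. The Python sketch below does so; the MTTF of 5000 hours and the MTTR values in the loop are hypothetical inputs chosen for illustration and are not taken from the presentation:

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """A = MTTF / (MTTF + MTTR), with both times in the same unit."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical head node: the MTTF here is an assumed value.
ASSUMED_MTTF_HOURS = 5000.0

for mttr in (72.0, 36.0, 1.0, 0.0):
    a = availability(ASSUMED_MTTF_HOURS, mttr)
    print(f"MTTR = {mttr:5.1f} h -> A = {a:.6f}")
```

As the MTTR shrinks toward zero the result approaches 1.00, which is the continuous-availability case named on the slide.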

SLIDE 7

Service-level Availability Taxonomy

  • No redundancy → Manual masking
  • Hardware redundancy only → Active/cold standby
  • Hardware and software redundancy:
      • Active/warm standby → Replication in intervals, 1+m service nodes
      • Active/hot standby → Replication on change, 1+m service nodes
      • Asymmetric active/active → High availability clustering, n+m service nodes
      • Symmetric active/active → State-machine replication, n service nodes

SLIDE 8

Symmetric Active/Active Replication

  • Replication of service capability via multiple active services
  • Replication of state among active services
  • Virtual synchrony (state-machine replication) model
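The core idea of state-machine replication is that deterministic replicas which apply the same commands in the same order end up in the same state. The Python sketch below illustrates only that total-order property. It is not the authors' implementation, it omits the membership and view-change parts of virtual synchrony, and the `Replica` class, command format, and job names are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One active service instance holding a copy of the service state."""
    name: str
    state: dict = field(default_factory=dict)
    next_seq: int = 0                              # next sequence number to apply
    pending: dict = field(default_factory=dict)    # early arrivals, keyed by seq

    def deliver(self, seq: int, command: tuple) -> None:
        """Buffer the command, then apply everything that is now in order."""
        self.pending[seq] = command
        while self.next_seq in self.pending:
            self._apply(self.pending.pop(self.next_seq))
            self.next_seq += 1

    def _apply(self, command: tuple) -> None:
        op, key, value = command
        if op == "set":
            self.state[key] = value
        elif op == "delete":
            self.state.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")


# A totally ordered command stream, e.g. as produced by a group
# communication layer (represented here simply by list position).
commands = [
    ("set", "job-1", "queued"),
    ("set", "job-2", "queued"),
    ("set", "job-1", "running"),
    ("delete", "job-2", None),
]
ordered = list(enumerate(commands))

head_a, head_b = Replica("head-a"), Replica("head-b")
for seq, cmd in ordered:              # arrives in order
    head_a.deliver(seq, cmd)
for seq, cmd in reversed(ordered):    # arrives out of order
    head_b.deliver(seq, cmd)

print(head_a.state == head_b.state)
```

Both replicas converge on the same state even though the messages reached them in different orders, because each one applies commands strictly by sequence number.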

SLIDE 9

Comparison of Replication Methods

SLIDE 10

External Symmetric Active/Active Replication

Figure stages: Input Replication → Virtually Synchronous Processing → Output Unification
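Because every active replica processes every request, each request yields several identical outputs, and only one should reach the client. The presentation does not say how output unification is implemented; the Python sketch below shows one plausible approach, forwarding the first output per request and suppressing later duplicates. The `OutputUnifier` class and the request identifiers are hypothetical:

```python
class OutputUnifier:
    """Forwards the first output seen for each request and drops the rest."""

    def __init__(self) -> None:
        self._seen: set = set()

    def offer(self, request_id: int, output: str):
        """Return the output if it is the first for this request, else None."""
        if request_id in self._seen:
            return None
        self._seen.add(request_id)
        return output


unifier = OutputUnifier()

# Three replicas answer the same two requests; arrival order is arbitrary.
replies = [
    (1, "job 1 submitted"),   # replica A
    (1, "job 1 submitted"),   # replica B (duplicate)
    (2, "job 2 submitted"),   # replica B
    (1, "job 1 submitted"),   # replica C (duplicate)
    (2, "job 2 submitted"),   # replica A (duplicate)
]

forwarded = []
for request_id, output in replies:
    result = unifier.offer(request_id, output)
    if result is not None:
        forwarded.append(result)

print(forwarded)
```

This sketch assumes all replicas produce identical output for a given request, which holds only if the replicated service is deterministic.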

SLIDE 11

Internal Symmetric Active/Active Replication

Figure stages: Input Replication → Virtually Synchronous Processing → Output Unification

SLIDE 12

Symmetric Active/Active PBS Torque

SLIDE 13

Symmetric Active/Active PBS Torque

MTTR_recovery = 500 milliseconds
MTTR_component = 36 hours
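To see why adding active head nodes helps, one can combine the single-node formula A = MTTF / (MTTF + MTTR) with the standard parallel-redundancy estimate, under which the service is down only when all n nodes are down at once. The Python sketch below is a simplification and not necessarily the model behind the slide's figures: it assumes independent node failures, uses the 36-hour component MTTR from the slide, takes a hypothetical MTTF of 5000 hours, and ignores the 500 ms recovery time:

```python
def node_availability(mttf_hours: float, mttr_hours: float) -> float:
    """Single-node availability, A = MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def service_availability(a_node: float, n_nodes: int) -> float:
    """Availability of n redundant nodes, assuming independent failures."""
    if n_nodes < 1:
        raise ValueError("need at least one node")
    return 1.0 - (1.0 - a_node) ** n_nodes


MTTR_COMPONENT_HOURS = 36.0    # from the slide
ASSUMED_MTTF_HOURS = 5000.0    # hypothetical, not from the slide

a_node = node_availability(ASSUMED_MTTF_HOURS, MTTR_COMPONENT_HOURS)
for n in range(1, 5):
    a = service_availability(a_node, n)
    downtime_hours = (1.0 - a) * 365 * 24
    print(f"{n} node(s): A = {a:.10f}, about {downtime_hours:.6f} h down per year")
```

Each added node multiplies the unavailability by the single-node unavailability, so the estimated downtime drops by roughly two orders of magnitude per node under these assumptions.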

SLIDE 14

Symmetric Active/Active PBS Torque

MTTR_recovery = 500 milliseconds
MTTR_component = 36 hours

SLIDE 15

Symmetric Active/Active PVFS MDS

SLIDE 16

Symmetric Active/Active PVFS MDS

SLIDE 17

Symmetric Active/Active PVFS MDS

SLIDE 18

Transparent External Symmetric Active/Active Replication for Client/Service Scenarios

SLIDE 19

Transparent External Symmetric Active/Active Replication: PBS TORQUE Example

SLIDE 20

Transparent Internal Symmetric Active/Active Replication for Client/Service Scenarios

SLIDE 21

Transparent Internal Symmetric Active/Active Replication: PVFS MDS Example

SLIDE 22

Transparent Symmetric Active/Active Replication for Client/Service Scenarios – High-Level Abstraction

Figure labels: Replicated Service, Independent Clients

SLIDE 23

Transparent Symmetric Active/Active Replication for Client/Client+Service/Service Scenarios

Figure labels: Replicated Service 1, Independent Clients, Replicated Service 2

SLIDE 24

Transparent Symmetric Active/Active Replication for Client/2 Services Scenarios

Figure labels: Replicated Service 1, Independent Clients, Replicated Service 2

SLIDE 25

Transparent Symmetric Active/Active Replication for Service/Service Scenarios

Figure labels: Replicated Service 1, Replicated Service 2

SLIDE 26

Example: Transparent Symmetric Active/Active Replication for the Lustre Cluster File System

Figure labels: Replicated Lustre MDS, Lustre Clients, Replicated Lustre OSS

SLIDE 27

Interceptor Communication Overhead

SLIDE 28

Interceptor Communication Overhead

SLIDE 29

Accomplishments

  • Examined past and ongoing work in high availability for:
      • HPC, distributed systems, and IT/telco services
  • Provided a modern service high availability taxonomy
  • Generalized HPC system architectures
  • Identified specific HPC system availability deficiencies
  • Defined and compared service high availability methods
  • Developed symmetric active/active replication prototypes:
      • HPC job and resource management service (PBS TORQUE)
      • HPC parallel file system metadata service (PVFS MDS)
      • Transparent replication software framework (prelim. prototype)

SLIDE 30

Limitations and Possible Future Work

  • Development of a production-type symmetric active/active replication software infrastructure
  • Development of production-type high availability support for HPC system services
  • Extending the replication software framework to support active/standby and asymmetric active/active
  • Extending the replication software framework to support non-IP communication networks
  • Extending the lessons learned to other service-oriented or service-dependent architectures
