Building High-available systems Chander Damodaran - Collabera


Jun 05, 2023


SLIDE 1

SLIDE 2

Building High-available systems

Chander Damodaran - Collabera

SLIDE 3

Agenda

Introduction
Key Concepts
Approach
Availability Index
Key HA Design Principles

SLIDE 4

Sample Business Scenarios

Scenarios:
IRCTC
RPO
Mid-size Company
GitHub

Root causes:
Single point of failure
System used beyond design limits
Software error
Human error
Wrong design assumptions

SLIDE 5

Key Concepts

Availability: Availability is the measure of how often or how long a service or a system component is available for use.

Availability = Uptime / (Uptime + Downtime)

Reliability: Reliability is the measure of fault avoidance.

Serviceability: Serviceability is a measurement that expresses how easily a system is serviced or repaired.

Disaster Recovery: Disaster recovery is the ability to continue with services in the case of major outages, often with reduced capabilities or performance.
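
The availability ratio on this slide (uptime divided by uptime plus downtime) can be worked through in a few lines of Python. This is only a sketch: the function names and the example figures are illustrative assumptions, not part of the original deck.

```python
def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Availability = uptime / (uptime + downtime), as a fraction in [0, 1]."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return uptime_hours / total


def allowed_downtime_hours(target: float, period_hours: float = 365 * 24) -> float:
    """Downtime budget implied by an availability target over a period.

    The default period of one 365-day year is an assumption for illustration.
    """
    if not 0.0 <= target <= 1.0:
        raise ValueError("target must be a fraction between 0 and 1")
    return (1.0 - target) * period_hours


if __name__ == "__main__":
    # Hypothetical figures: 8,751.24 hours up and 8.76 hours down in one year.
    print(f"availability: {availability(8751.24, 8.76):.4%}")
    print(f"budget at 99.9%: {allowed_downtime_hours(0.999):.2f} hours/year")
```

Inverting the ratio, as the second function does, turns an availability target into a yearly downtime budget, which is often easier to discuss with the business than a percentage.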

SLIDE 6

Approach

List vulnerabilities
Evaluate scenarios, and determine their probability
Map scenarios to requirements
Design solution
Review the solution, and check its behaviour against failure scenarios

VULNERABILITY | LIKELIHOOD (1-5) | IMPACT (1-5) | LEVEL OF CONCERN | SOLUTION
Failed disk | 5 | 1 | 5 | Implement Mirrored disks
Application Crash | 5 | 4 | 20 | Distributed application, failover, clustering
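
In both rows of the vulnerability table above, the level of concern equals likelihood multiplied by impact (5 x 1 = 5 and 5 x 4 = 20). Assuming that rule holds in general, which the slide does not state outright, the ranking step might be sketched as follows. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Vulnerability:
    name: str
    likelihood: int  # scored 1-5, as on the slide
    impact: int      # scored 1-5, as on the slide
    solution: str

    def __post_init__(self) -> None:
        # Reject scores outside the 1-5 scale used in the table.
        for field_name in ("likelihood", "impact"):
            value = getattr(self, field_name)
            if not 1 <= value <= 5:
                raise ValueError(f"{field_name} must be between 1 and 5, got {value}")

    @property
    def level_of_concern(self) -> int:
        # Assumed rule, inferred from the two table rows: likelihood x impact.
        return self.likelihood * self.impact


def rank_by_concern(items: Iterable[Vulnerability]) -> List[Vulnerability]:
    """Return vulnerabilities ordered from highest to lowest level of concern."""
    return sorted(items, key=lambda v: v.level_of_concern, reverse=True)


if __name__ == "__main__":
    rows = [
        Vulnerability("Failed disk", 5, 1, "Implement mirrored disks"),
        Vulnerability("Application Crash", 5, 4,
                      "Distributed application, failover, clustering"),
    ]
    for v in rank_by_concern(rows):
        print(f"{v.level_of_concern:>2}  {v.name}: {v.solution}")
```

Sorting by the product puts the application crash ahead of the failed disk. That matches the table: both are equally likely, but the crash has four times the impact.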

SLIDE 7

Availability Index

Chart axes: AVAILABILITY (vertical) against INVESTMENT (horizontal). Levels, from the top of the chart down:

Disaster Recovery
Replication
Failovers
Services and Applications
Client Management
Local Environment
Networking
Disk and Volume Management
Reliable Backups
Good System Administration Practices

*Blueprints for High Availability

SLIDE 8

Components, failures & protection mechanisms

Component category | Typical failure | Fault protection
User environment | Data deletion or corruption | Disaster-recovery processes
Administration environment | Data deletion or corruption | Disaster-recovery processes
Application | Crashes, data corruption | Distributed application, failover, clustering
Middleware | Crashes, memory leaks | Clustering
(Network) infrastructure | Connection loss | Independent high-availability architecture
Operating system | Crash, device driver errors | Clustering
Hardware | Device defect | Redundant components, hot-spare disks, maintenance contracts
Physical environment | Power outage, fire, floods | UPS, backup data center
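
Failover appears above as a fault protection for application crashes. A minimal sketch of the idea, assuming each replica is exposed as a zero-argument callable: try the replicas in order and return the first successful result. The function name and the replica callables are hypothetical, and real failover also needs health checks, timeouts and state handling that this sketch leaves out.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(replicas: Sequence[Callable[[], T]]) -> T:
    """Try each replica in order and return the first successful result.

    Raises RuntimeError if every replica fails.
    """
    errors: List[Exception] = []
    for replica in replicas:
        try:
            return replica()
        except Exception as exc:  # a real system would catch narrower errors
            errors.append(exc)
    raise RuntimeError(f"all {len(replicas)} replicas failed: {errors!r}")


if __name__ == "__main__":
    def primary() -> str:
        # Simulated crash of the primary instance.
        raise ConnectionError("primary is down")

    def standby() -> str:
        return "served by standby"

    print(call_with_failover([primary, standby]))
```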

SLIDE 9

Key High Availability Design Principles

Assume Nothing
Remove Single Points of Failure (SPOFs)
Plan Ahead & Design for Growth
One Problem, One Solution
Choose Mature, Reliable Hardware
Choose Mature Software
Learn from History
Separate Your Environments
Test Everything
Employ Service Level Agreements
Document Everything
Enforce Change Control
Watch Your Speed
Consolidate Your Servers
Enforce Security
Don’t Be Cheap
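
The principle "Remove Single Points of Failure (SPOFs)" in the list above can be illustrated with the standard availability arithmetic for chained and redundant components. The sketch assumes component failures are independent, which real systems only approximate. The function names and the 99% figures are illustrative and not taken from the deck.

```python
from math import prod
from typing import Iterable


def serial_availability(parts: Iterable[float]) -> float:
    """Every part must be up, so the availabilities multiply."""
    return prod(parts)


def redundant_availability(replicas: Iterable[float]) -> float:
    """At least one replica must be up: 1 minus the chance all are down at once."""
    return 1.0 - prod(1.0 - a for a in replicas)


if __name__ == "__main__":
    web, db = 0.99, 0.99  # hypothetical per-component availabilities

    # A single database is a SPOF: the chain is weaker than either part.
    print(f"web + single db:   {serial_availability([web, db]):.4%}")

    # Two independent database replicas remove that SPOF.
    db_pair = redundant_availability([db, db])
    print(f"web + mirrored db: {serial_availability([web, db_pair]):.4%}")
```

Under these assumptions the mirrored database pair is far more available than a single instance, so the remaining single web tier becomes the limiting component.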

SLIDE 10

QUESTIONS?

ChanderD@Collabera.com

SLIDE 11