CS 754 Advanced Distributed Systems - PowerPoint PPT Presentation



SLIDE 1

CS 754 Advanced Distributed Systems: Introduction to Data Centers

SLIDE 2

Data Center Overview

Why DC? Economy of scale (amortize capital and maintenance cost). Machines->Racks->Cluster

SLIDE 3

Design Metrics

  • Performance (requests per second)
  • Cost (capital and operating) (requests per dollar)
  • Power (requests per watt)
SLIDE 4

DC Node Design

Option 1: SMP (Symmetric Multiprocessor). A shared-memory multiprocessor: a set of CPUs, each with its own cache, sharing the main memory over a single bus.

+ High performance per node
− Expensive
SLIDE 5

DC Node Design

Option 2: Commodity nodes. Using off-the-shelf components.

+ Equal performance to SMP at scale
+ Lower cost
− Fails more often
SLIDE 6

SMP vs Commodity

Execution time = CPU time + communication time. Assume a local access takes 100 ns and a remote access takes 100 μs. Communication time = #operations × [100 ns × (1/#nodes) + 100 μs × (1 − 1/#nodes)]

[Figure: local access vs. remote access]
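A minimal sketch (Python; not from the slides) of how the model above behaves as the cluster grows: with more nodes, most accesses become remote and the 100 μs latency dominates.

```python
# Communication-time model from the slide: as the node count grows, more
# accesses are remote, so the 100 us remote latency dominates the 100 ns
# local latency.

LOCAL_NS = 100          # local access latency (100 ns)
REMOTE_NS = 100_000     # remote access latency (100 us, expressed in ns)

def comm_time_ns(num_ops, num_nodes):
    """#operations * [local * 1/n + remote * (1 - 1/n)]."""
    local_fraction = 1.0 / num_nodes
    return num_ops * (LOCAL_NS * local_fraction +
                      REMOTE_NS * (1.0 - local_fraction))

for nodes in (1, 2, 8, 64):
    ms = comm_time_ns(num_ops=1_000_000, num_nodes=nodes) / 1e6
    print(f"{nodes:>3} nodes: {ms:9.1f} ms of communication per 1M operations")
```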

SLIDE 7


SLIDE 8
SLIDE 9

DC Node Design

Option 3: Wimpy nodes. Using low-end CPUs (e.g., ARM processors).

+ Lower cost
+ Lower energy
− Hard to use efficiently

SLIDE 10

DC Node Design

Wimpy design disadvantages

  • Amdahl's law bounds the speed-up:

Task execution time is T = (1 − p)T + pT, where p is the fraction of the code that can run in parallel (0 ≤ p ≤ 1). After parallelization on s cores: T' = (1 − p)T + (p/s)T. Speed-up = T/T' = 1/((1 − p) + p/s). As s → ∞, the speed-up approaches 1/(1 − p).

  • Higher number of threads --> higher serialization/communication cost
  • Harder to program --> higher software cost
  • Higher networking cost
  • Lower utilization

For I/O-intensive workloads (e.g., Google's workloads), commodity machines are a better choice.
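A small illustration (Python; not from the slides) of the Amdahl bound above: even with very many cores, the serial fraction caps the speed-up, which is why a large number of wimpy cores is hard to use efficiently.

```python
# Amdahl's law: speed-up = 1 / ((1 - p) + p / s).
# The serial fraction (1 - p) caps the speed-up no matter how many cores.

def speedup(p, s):
    """Speed-up of a task with parallel fraction p running on s cores."""
    return 1.0 / ((1.0 - p) + p / s)

for p in (0.5, 0.9, 0.99):
    print(f"p = {p:4}: "
          f"s=16 -> {speedup(p, 16):5.2f}x, "
          f"s=1024 -> {speedup(p, 1024):6.2f}x, "
          f"limit -> {1.0 / (1.0 - p):6.2f}x")
```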

SLIDE 11

Storage Design

Design paradigms:

  • NAS: network attached storage, dedicated storage appliance
  • Distributed storage: aggregate storage space from nodes in cluster.

Design dimensions:

  • Reliability: replication or erasure coding (RS coding); compared in the sketch after this list
  • Reduce cost by using cheap disks: they fail more but we will replicate anyway
  • Consistency: varies depending on application
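A rough comparison (Python; the 3-replica and RS(10, 4) parameters are illustrative assumptions, not from the course) of the storage overhead of the two reliability options above:

```python
# Storage overhead of the two reliability options: n-way replication vs.
# Reed-Solomon erasure coding. The parameters below are assumptions.

def replication_overhead(replicas):
    """Raw bytes stored per byte of user data with n-way replication."""
    return replicas

def erasure_overhead(data_blocks, parity_blocks):
    """Raw bytes stored per byte of user data with RS(data, parity)."""
    return (data_blocks + parity_blocks) / data_blocks

print(f"3-way replication: {replication_overhead(3):.1f}x raw storage, "
      f"survives the loss of 2 copies")
print(f"RS(10, 4) coding:  {erasure_overhead(10, 4):.1f}x raw storage, "
      f"survives the loss of any 4 blocks")
```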
SLIDE 12

Storage Design

Option 1: Network attached storage (NAS): a dedicated storage appliance.

+ Simpler deployment
+ Control and management (QoS)
+ Lower network overhead (appliance replication)

SLIDE 13

Storage Design

Option 2: Distributed storage: aggregate storage space from the nodes in the cluster. Reduce cost by using cheap disks: they fail more, but we replicate anyway.

+ Lower cost
+ Higher availability
+ Higher performance
+ Higher data locality
− Higher network overhead
− Lower component reliability
SLIDE 14

Storage Design

NAS:
  + Simpler deployment
  + Control and management (QoS)
  + Lower network overhead (appliance replication)

Distributed storage (GFS):
  + Lower cost
  + Higher availability
  + Higher performance + data locality (at different levels and technologies)
  − Higher write network overhead
SLIDE 15

Network Design

Challenge: build a high-speed, scalable network at lower cost. Optimization tricks:

  • Reduce core bandwidth: a 5:1 oversubscription ratio is common (see the sketch below)
  • Multiple networks (SAN, supercomputer example)
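A quick sketch (Python; the 40-server rack and 1 Gbps server links are illustrative assumptions) of what a 5:1 oversubscription ratio means for per-server bandwidth once traffic leaves the rack:

```python
# Oversubscription: the uplink out of a rack carries less bandwidth than the
# sum of the server links below it. Rack size and link speed are assumptions.

SERVERS_PER_RACK = 40
SERVER_LINK_GBPS = 1.0
OVERSUBSCRIPTION = 5.0    # the 5:1 ratio from the slide

edge_bandwidth = SERVERS_PER_RACK * SERVER_LINK_GBPS    # 40 Gbps into the ToR switch
uplink_bandwidth = edge_bandwidth / OVERSUBSCRIPTION    # 8 Gbps toward the core

print(f"Within the rack:  {SERVER_LINK_GBPS:.2f} Gbps per server")
print(f"Leaving the rack: {uplink_bandwidth / SERVERS_PER_RACK:.2f} Gbps per server "
      f"(all servers active)")
```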

SLIDE 16
SLIDE 17

DC Design Implications

Software using DC needs to be aware of the storage hierarchy

Jeff Dean

SLIDE 18

Example

Data location    Latency   Throughput
RAM              100 ns    20 GBps
Hard disk        10 ms     80 MBps
Network - rack   70 µs     128 MBps (1 Gbps)
Network - DC     500 µs    25 MBps (subscription ratio of 5:1)

[Chart: latency and bandwidth for local RAM, local disk, rack RAM, rack disk, DC RAM, DC disk]
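A rough estimate (Python; the 1 MB block size is an illustrative assumption) of how long a read takes at each level of the table above, which is why software must know where its data lives:

```python
# Time to fetch a block = latency + size / throughput, using the table above.
# The 1 MB block size is an assumption for illustration.

TIERS = {                       # (latency in seconds, throughput in bytes/s)
    "RAM":            (100e-9, 20e9),
    "Hard disk":      (10e-3,  80e6),
    "Network - rack": (70e-6,  128e6),
    "Network - DC":   (500e-6, 25e6),
}

BLOCK_BYTES = 1 * 1024 * 1024   # 1 MB

for tier, (latency, bandwidth) in TIERS.items():
    total_ms = (latency + BLOCK_BYTES / bandwidth) * 1000
    print(f"{tier:<15} ~{total_ms:7.2f} ms per 1 MB read")
```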

SLIDE 19

Example

Jeff Dean

SLIDE 20

Example

Jeff Dean

SLIDE 21

DC Design Implications

  • Software using DC needs to be aware of the network and storage hierarchy
  • Software fault tolerance is necessary
  • Programming frameworks to hide complexity

Technology changes:

  • Much more memory
  • New disks: Shingled, Kinetic, PCIeNV
  • SSD, NVM
  • SDN networks
  • Programmable NIC and switches
  • Faster network
SLIDE 22

Large Scale Services

Two categories:

  • Online, e.g., e-commerce, instant messaging
      • Low latency
      • Highly available
      • Mostly read operations
  • Offline: batch processing, e.g., data processing
      • Compute and I/O intensive
      • Throughput centric
SLIDE 23

Model

SLIDE 24

Load Manager

  • DNS-based
      • May take hours to adapt
      • Not available to small clusters
  • Appliance or switch (L4)
  • Smart client (L7)

Load balancing techniques (the first two are sketched after this list):

  • Round robin
  • Least number of connections
  • Response time
  • Source IP hash
  • SDN based
  • Chained failover
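A minimal sketch (Python; the class names and pick()/done() interface are illustrative, not from the course) of the first two policies, round robin and least number of connections:

```python
import itertools

# Two of the load-balancing policies above, in their simplest form.
# Backend names and the pick()/done() interface are assumptions.

class RoundRobin:
    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def pick(self):
        # Hand out backends in a fixed rotation, ignoring their current load.
        return next(self._cycle)

class LeastConnections:
    def __init__(self, backends):
        self.active = {b: 0 for b in backends}

    def pick(self):
        # Send the request to the backend with the fewest open connections.
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def done(self, backend):
        self.active[backend] -= 1

rr = RoundRobin(["node-1", "node-2", "node-3"])
print([rr.pick() for _ in range(5)])    # node-1, node-2, node-3, node-1, node-2
```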
SLIDE 25

High Availability

Metric (uptime): percent of time the system is available to answer client requests.

--| Fail |--- Recover ------|------------- available ------------------| Fail |--- Recover --|------------

Uptime = (MTBF − MTTR) / MTBF
MTBF: mean time between failures
MTTR: mean time to repair
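A worked example (Python; the MTBF and MTTR values are illustrative assumptions) of the uptime formula, showing that halving MTTR buys as much uptime as doubling MTBF:

```python
# Uptime = (MTBF - MTTR) / MTBF, from the slide. All values are made up.

def uptime(mtbf_hours, mttr_hours):
    return (mtbf_hours - mttr_hours) / mtbf_hours

# A node that fails roughly every 1000 hours and takes 1 hour to repair:
print(f"{uptime(1000, 1.0):.4%}")    # 99.9000%

# Halving MTTR gives the same uptime as doubling MTBF:
print(f"{uptime(1000, 0.5):.4%}")    # 99.9500%
print(f"{uptime(2000, 1.0):.4%}")    # 99.9500%
```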

SLIDE 26

High Availability

Uptime = (MTBF − MTTR) / MTBF
Brewer's recommendation: do your best effort on MTBF, but focus on reducing MTTR. Why?

  • MTBF needs weeks of testing to measure.
  • MTTR is easier to improve: easier to debug and measure.

Problem with uptime: not all seconds are equal (idle vs. peak time).

SLIDE 27

High Availability

Yield = queries completed / queries offered
Harvest = data available / complete data
DQ principle: data per query (D) × queries per second (Q) → constant
The underlying limitation is data movement (seeks, I/O bandwidth, etc.). Good for:

  • Comparing systems
  • Deciding on upgrades
  • Measuring the effect of failures
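A small sketch (Python; the node count and capacity units are illustrative assumptions) of the DQ principle above under a single node failure, and the yield-vs-harvest trade-off it forces:

```python
# DQ: data per query (D) x queries per second (Q) is roughly constant for a
# cluster. Losing 1 of 10 nodes removes ~10% of DQ capacity, which must show
# up as lost yield or lost harvest. The numbers below are assumptions.

NODES = 10
DQ_PER_NODE = 1000            # arbitrary units of data-movement capacity

total_dq = NODES * DQ_PER_NODE
remaining = (NODES - 1) * DQ_PER_NODE
fraction = remaining / total_dq

print(f"DQ capacity after losing one node: {fraction:.0%}")
# Option 1: answer every query (Q unchanged) with partial results.
print(f"  keep yield at 100% -> harvest drops to {fraction:.0%}")
# Option 2: keep answers complete (D unchanged) and turn some queries away.
print(f"  keep harvest at 100% -> yield drops to {fraction:.0%}")
```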
SLIDE 28

Graceful Degradation

Degrade service under overload (instead of complete system failure). Overload will happen: single-event bursts, a peak-to-average ratio of 6:1, failures. Techniques:

  • Limit D (partial results) and maintain Q
  • Limit Q (by admission control; sketched after this list) and maintain D
  • QoS, cost-based
  • Priorities
  • Reduce data quality (freshness)
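A minimal sketch (Python; the in-flight limit and interface are illustrative assumptions) of the admission-control technique above: shed excess queries (lower yield) so the admitted ones still get complete data (full harvest).

```python
# Admission control: under overload, reject excess queries instead of queueing
# them, so admitted queries still see complete data. The limit is an assumption.

class AdmissionController:
    def __init__(self, max_inflight):
        self.max_inflight = max_inflight
        self.inflight = 0

    def try_admit(self):
        if self.inflight >= self.max_inflight:
            return False              # shed load: fail fast rather than queue
        self.inflight += 1
        return True

    def finish(self):
        self.inflight -= 1

ac = AdmissionController(max_inflight=3)
print([ac.try_admit() for _ in range(5)])    # [True, True, True, False, False]
```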
SLIDE 29

Evolution

Perfect software is hard, costly, and takes a long time. Aim for software that handles failures well (high MTBF, low MTTR, no cascading failures). Other bugs are less critical: memory leaks, slowness, etc. (try throwing more hardware at them). Reasoning: upgrades are controlled failures; do them off-peak. Strategies (all have the same DQ loss over time; compared in the sketch below):

  • Fast reboot of all cluster nodes. Easier (jump between versions), risky (could be buggy), downtime.
  • Rolling upgrade: 5% at a time. More complex (two versions run at the same time), slow.

  • Big Flip: jump from one version to the other half-a-cluster at a time.

Rolling upgrade is the most popular.
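A back-of-the-envelope comparison (Python; the 2 node-hours of total upgrade work is an illustrative assumption) showing why the three strategies have the same DQ loss: the capacity taken offline times the time it stays offline is constant.

```python
# Each strategy trades how much capacity is offline against how long the
# upgrade takes; the product (capacity x time) is the same DQ loss.
# The 2 node-hours of total upgrade work is an assumption.

TOTAL_NODE_HOURS = 2.0            # work needed to upgrade the whole cluster

strategies = {
    "Fast reboot":     1.00,      # 100% of the cluster offline at once
    "Big flip":        0.50,      # half the cluster offline at a time
    "Rolling upgrade": 0.05,      # 5% of the cluster offline at a time
}

for name, fraction_down in strategies.items():
    duration = TOTAL_NODE_HOURS / fraction_down    # hours the upgrade takes
    dq_loss = fraction_down * duration             # capacity-hours lost
    print(f"{name:<16} {fraction_down:4.0%} down for {duration:5.1f} h "
          f"-> DQ loss {dq_loss:.1f} capacity-hours")
```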

SLIDE 30

Replication vs. Partitioning

Replication → higher harvest. Partitioning → higher yield.
E.g., a two-node cluster where one node fails:
  • Replication: 100% harvest, 50% yield (but replication needs more DQ for writes)
  • Partitioning: 50% harvest, 100% yield
Same DQ value (lower by 50%).
Since capacity is not an issue (capacity is cheap), use replication: better harvest, affects yield only under heavy load, easier to manage, scales, easier disaster recovery.