Scalable Object Storage with Resource Reservations and Dynamic Load Balancing
Alex Aizman, Nexenta Systems
The Setup
- Within Data Center
- Scale: 100+ nodes to unlimited
- Optimized for latency; no spikes at high utilization
– No “fat tails”
- Layer 1 of storage stack is object
– Storing and transporting immutable crypto-checksummed KVT
More Requirements
- Copy-on-write, eventually consistent
– Put creates a new version
- Multiple replicas
– Multiple replicas on the wire?
- “Rampant Layering Violation”
- No Incast
– Mostly known as TCP Incast
- No/Minimized Convergence
– Multiple link-sharing flows “converge” to fair share
- Linearly scalable and load balanced at all times
– Uniform distribution != balanced distribution: a uniform hash spreads load evenly only in expectation, so at any given moment some targets can be far busier than others
The Claim
- New storage protocol required
- Edge-driven resource allocation
Distributed Namespace
- Design space: unstructured (location tracking) | federated | clustered (striped/sharded: DLM, MDS) | consistent hash (scale-out + load balancing)
- Distributed clusters: pNFS, GPFS, Lustre, Swift, GFS, HDFS
- Consistent hashing: Maglev, Ceph
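To make the consistent-hash approach concrete, here is a minimal Go sketch of rendezvous (highest-random-weight) hashing from a chunk's content hash to a load-balancing group. It is illustrative only: FNV-1a stands in for the crypto-checksum, and the group names are hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// score mixes a chunk's content hash with a group name.
func score(chunkHash, group string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(chunkHash))
	h.Write([]byte(group))
	return h.Sum64()
}

// groupOf implements rendezvous (HRW) hashing: every group scores the
// chunk and the highest score wins. Adding or removing a group remaps
// only the chunks that group would have won: the "consistent" part.
func groupOf(chunkHash string, groups []string) string {
	best := groups[0]
	bestScore := score(chunkHash, best)
	for _, g := range groups[1:] {
		if s := score(chunkHash, g); s > bestScore {
			best, bestScore = g, s
		}
	}
	return best
}

func main() {
	groups := []string{"group-1", "group-2", "group-3", "group-4"}
	for _, c := range []string{"hash-of-chunk-a", "hash-of-chunk-b"} {
		fmt.Printf("%s -> %s\n", c, groupOf(c, groups))
	}
}
```

Because the group is a pure function of the content hash, identical chunks always land in the same place, which is also what makes deduplication inline and global.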
Minimizing Flow Latency(*)
- Deadline-agnostic schemes: DCTCP(**)
- Deadline-aware schemes
  – Flow scheduling: D3, PDQ, D2TCP
  – Switch support: DAQ
- Replicast™
(*) Schemes for Fast Transmission of Flows in Data Center Networks
(**) Analysis of DCTCP: Stability, Convergence, and Fairness
- Reserved bandwidth 100% utilized
- Impact of one connection terminating?
- Zero (or minimal) competition between flows
- Compare with SJF/EDF/PDQ, etc.
Congestion: give control to the target!
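A toy sketch of target-driven reservations, under stated assumptions: the Bid type, the durations, and the selection rule here are invented for illustration and are not the actual Replicast message formats. The point it shows: each target alone knows its own queue, so the target (not the sender) offers the earliest timeslot it can fully dedicate to the transfer.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Bid is a target's offered reservation window (hypothetical format).
type Bid struct {
	Target int
	Start  time.Duration // offset from "now" in this toy model
}

// Target owns its link and disk; nextFree is when both become idle.
type Target struct {
	id       int
	nextFree time.Duration
}

// bid commits nothing; it just reports the earliest possible start.
func (t *Target) bid() Bid { return Bid{Target: t.id, Start: t.nextFree} }

// accept reserves the window [start, start+xfer) on this target.
func (t *Target) accept(start, xfer time.Duration) {
	if start < t.nextFree {
		start = t.nextFree
	}
	t.nextFree = start + xfer
}

func main() {
	group := []*Target{
		{id: 0, nextFree: 0},
		{id: 1, nextFree: 300 * time.Microsecond},
		{id: 2, nextFree: 100 * time.Microsecond},
		{id: 3, nextFree: 900 * time.Microsecond},
	}
	const replicas = 3
	const xfer = 105 * time.Microsecond // roughly a 128K chunk at 10GbE

	// Initiator: collect bids from the whole negotiating group.
	bids := make([]Bid, 0, len(group))
	for _, t := range group {
		bids = append(bids, t.bid())
	}
	// Choose the `replicas` targets that can start earliest; all of
	// them must share one timeslot so a single copy goes on the wire.
	sort.Slice(bids, func(i, j int) bool { return bids[i].Start < bids[j].Start })
	start := bids[replicas-1].Start // latest of the chosen earliest bids
	for _, b := range bids[:replicas] {
		group[b.Target].accept(start, xfer)
		fmt.Printf("target %d reserved [%v, %v)\n", b.Target, start, start+xfer)
	}
}
```

Note the cost flagged later on the tradeoffs slide: the designated targets must share the timeslot, so the start time is the latest of the selected bids; that shared slot is also what permits a single multicast copy on the wire.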
Motivations: Transport
| | L5 over TCP | Replicast |
|---|---|---|
| Performance | Throughput + fair share | Completion time |
| General purpose | Yes | No |
| Multiple replicas on the wire | Yes | No |
| Mature and stable L4 | Yes | No |
| (TCP) Incast | Yes | No |
| Congestion control | (L2) + L4 | L2 + Replicast |
| Retry | L4 | Replicast |
| DCB traffic class | Depending on the app | Yes |
Motivations: Storage
| | Replicast |
|---|---|
| Built-in deduplication | Yes |
| Consistent hashing + inline load balancing | Yes |
| Target resource reservation (network, disk) | Yes (Yes, Yes) |
Replicast: edge-based load balancer
Tradeoffs – Protocol Variations
- Variations(*):
  1) Multicast control plane + unicast delivery
  2) Choosy Initiator
  3) The Better Protocol
- and more
(*) https://storagetarget.com
- There is always a cost and associated tradeoffs
- Replicast: all designated targets must share the timeslot
Protocol Simulation
- Replicast is designed for 1000s of nodes
- SURGE framework @https://github.com/hqr/surge
- Each node is a goroutine; fully owns its configured resources
- Any-to-any connect via Go channels
- Time modeling
- Same-size payload chunks indexed by a cryptohash of their content
- And consistently hashed to: a) groups (Replicast), b) targets (unicast)
- Non-blocking no-drop network core that connects all 10GbE ports
- Flow isolation: protected VLAN
- Transmission errors are sufficiently rare and therefore not modeled
- Reads are modeled but remain out of scope (and space)
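For flavor, a minimal sketch (not the actual SURGE code) of the stated model: each node is a goroutine that exclusively owns its configured resources and is reachable any-to-any through Go channels. The Event type and the message strings are invented for illustration.

```go
package main

import "fmt"

// Event stands in for a timed protocol message (control or data).
type Event struct {
	Src     int
	Payload string
}

// node models one cluster node: a goroutine that alone owns its state
// and is reachable only through its inbox channel; peers gives it an
// any-to-any path to every other node.
func node(id int, inbox <-chan Event, peers []chan Event, done chan<- struct{}) {
	for ev := range inbox {
		fmt.Printf("node %d received %q from node %d\n", id, ev.Payload, ev.Src)
	}
	done <- struct{}{}
}

func main() {
	const n = 4
	peers := make([]chan Event, n)
	for i := range peers {
		peers[i] = make(chan Event, 16) // buffering: a crude stand-in for link capacity
	}
	done := make(chan struct{})
	for i := 0; i < n; i++ {
		go node(i, peers[i], peers, done)
	}
	// Node 0 "multicasts" a control message to everyone else.
	for dst := 1; dst < n; dst++ {
		peers[dst] <- Event{Src: 0, Payload: "reservation request"}
	}
	for i := range peers {
		close(peers[i])
	}
	for i := 0; i < n; i++ {
		<-done
	}
}
```

The real framework additionally models time, bandwidth, and disk throughput per node; this sketch only shows the concurrency structure.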
The “fair comparison” dilemma
- Unicast Consistent Hash, Captive Congestion Point (UCH-CCP)
  – Consistent hashing for target selection
  – Unicast UDP for both control and data
  – Idealized bandwidth reservations: RATE INIT and RATE SET
  – Immediate start (as opposed to TCP slow start)
  – 3x lower connection-setup overhead vs. Replicast
Results
[Chart: put throughput in chunks/s, 90x90 (initiators x targets), 128K chunks, at 400 and 1,000; series: replicast-m, uch-ccp, replicast-h]
Replicast: reservation conflicts
| Chunk | Put interarrival time | μ | Poisson probability |
|---|---|---|---|
| 16K | 11 μs | 0.09 | 46.7% |
| 128K | 50 μs | 0.02 | 13% |
| 1MB | 500 μs | 0.002 | 1.39% |
[Chart: 16K chunks]
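One Poisson reading that reproduces all three probabilities (the group size g = 7 is inferred from the numbers, not stated on the slide): take μ as the expected number of competing puts per reservation window per target, so a conflict means at least one competing arrival anywhere in the negotiating group:

$$P(\text{conflict}) = 1 - e^{-g\mu},\quad g = 7:\qquad 1 - e^{-0.63} \approx 46.7\%,\quad 1 - e^{-0.14} \approx 13.1\%,\quad 1 - e^{-0.014} \approx 1.39\%$$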
Next Steps
- Optimizations for small chunks
- Optimizations for concurrent gets and puts
- Optimal ratios of initiators to targets
- Optimal sizing of the load-balancing groups
- Load balancing proxies
- Kernel bypass (DPDK)
- Bit Index Explicit Replication (BIER)
– Stateless multi-point replication
Instead of conclusions: Guiding Principles
- Location independence: both chunks and MD
- No SPOF (no single-MDS, at least on this level)
- Inline load balancing | Inline global dedup
- Storage-level end-to-end resource reservation
- 100% bandwidth utilization
– During the reserved timeslot
- Single copy on the wire
– If possible
- Close-to-open, ACID/transactional and other types of consistency – by upper layers
- and more