

SLIDE 1

FastScale: Accelerate RAID Scaling by Minimizing Data Migration

Weimin Zheng, Guangyan Zhang

gyzh@tsinghua.edu.cn
Tsinghua University

SLIDE 2

Outline

  • Motivation
  • Minimizing data migration
  • Optimizing data migration
  • Evaluation
  • Conclusions

SLIDE 3

Why Scale a RAID?

  • A disk is a simple computer.
  • A RAID volume can deliver high performance.

– Multiple disks serve an application concurrently.

  • Applications often require larger capacity and higher performance

– as user data increase and computing power grows.

  • One solution is to add new disks to a RAID volume.

– This disk addition is termed "RAID scaling".

  • To regain a balanced load, some blocks need to be moved to new disks.
  • Data migration needs to be performed online

– to supply non-stop services.

SLIDE 4

Limitation of Existing Approaches

  • Existing approaches to RAID scaling preserve the round-robin order after adding disks.

– Pro: the addressing function is simple (see the sketch after this slide).
– Con: all the data needs to be moved.

  • Recent work has optimized data migration; one typical example is SLAS (ACM TOS 2007):

– It uses I/O aggregation and lazy checkpointing to improve the efficiency.
– Due to migration of all the data, RAID scaling remains costly.

Can we reduce the total number of migrated data blocks?
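To make the trade-off concrete, here is a minimal sketch (illustrative code, not from the paper) of round-robin addressing and of why scaling under it moves nearly every block:

```python
# Round-robin RAID-0 addressing: disk = b mod N, offset = b div N.
def rr_location(block: int, num_disks: int) -> tuple[int, int]:
    """Return (disk, physical offset) of a logical block under round-robin."""
    return block % num_disks, block // num_disks

# Adding disks changes N, so almost every block's location changes:
B = 1_000_000
moved = sum(rr_location(b, 4) != rr_location(b, 6) for b in range(B))
print(moved / B)  # ~1.0: virtually all blocks must migrate on a 4 -> 6 scaling
```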

SLIDE 5

Minimizing Data Migration

  • FastScale moves data blocks only from old disks to new disks, while not migrating data among old disks.

– This is enough for preserving the uniformity of the data distribution (see the count after this slide).

  • In this manner, FastScale minimizes data migration for RAID scaling.

[Figure: m old disks D0 … Dm-1 and n new disks Dm … Dm+n-1; migration flows only from old disks to new disks.]

  • We design an elastic addressing function, through which

– the location of one block can be easily computed,
– without any lookup operation.
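Why old-to-new migration alone already reaches the minimum of Requirement 2 (slide 10) follows from a one-line count, sketched here in the slide's notation (B blocks in total, m old disks, n new disks):

```latex
\underbrace{\tfrac{B}{m}}_{\text{per old disk, before}} - \underbrace{\tfrac{B}{m+n}}_{\text{per disk, after}}
= \frac{Bn}{m(m+n)}
\quad\Longrightarrow\quad
\text{total moved} = m \cdot \frac{Bn}{m(m+n)} = \frac{Bn}{m+n}.
```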

SLIDE 6

Optimizing Data Migration

  • FastScale also exploits physical properties to optimize online data migration.

– First, it uses aggregate accesses to improve the efficiency of data migration.
– Second, it records data migration lazily to minimize the number of metadata updates while ensuring data consistency.

SLIDE 7

Results

  • Implemented FastScale and SLAS in DiskSim 4.0.

– Compared with SLAS and round-robin RAID-0 scaling.

  • Evaluation during RAID scaling:

– reduces redistribution time by up to 86.06%,
– with smaller maximum response time of user I/Os.

  • Evaluation after 1 or 2 RAID scaling operations:

– performance is almost identical with that of the round-robin RAID-0.

SLIDE 8

Coverage of FastScale

  • In this paper, we only describe our solution for RAID-0, i.e., striping without parity.

– FastScale can also work for RAID-10 and RAID-01.
– Some large storage systems slice disks into many segments, and several segments are organized into a RAID.

  • Although we do not handle RAID-4 and RAID-5, we believe that our method provides a good starting point for efficient scaling of RAID-4 and RAID-5 arrays.

SLIDE 9

Outline

  • Motivation
  • Minimizing data migration
  • Optimizing data migration
  • Evaluation
  • Conclusions

SLIDE 10

Requirements for RAID Scaling

  • Requirement 1 (Uniform Data Distribution):

– If there are B blocks stored on m disks, the expected number of blocks on each disk is approximately B/m, so as to maintain an even load.

  • Requirement 2 (Minimal Data Migration):

– During the addition of n disks to a RAID with m disks storing B blocks, the expected number of blocks to be moved is B*n/(m+n).

  • Requirement 3 (Fast Data Addressing):

– In an m-disk RAID, the location of a block is computed by an algorithm with low space and time complexity.

SLIDE 11

Semi-RR: the Most Intuitive Method

  • Semi-RR is based on round-robin scaling.

– It moves a data block only if the block's round-robin target is one of the new disks.
– Otherwise, it does not move the data block.

  • Good news: Semi-RR can reduce data migration significantly.
  • Bad news: it does not guarantee uniform distribution of data blocks after multiple scaling operations (see the sketch below).
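The following sketch is one plausible reading of the Semi-RR rule (hypothetical code, not the paper's): at each scaling step, a block jumps to its round-robin target only if that target is a newly added disk. Counting the resulting per-disk loads after several steps exposes the skew the slide reports.

```python
from collections import Counter

# Hedged sketch of Semi-RR addressing across a history of array sizes,
# e.g. sizes = [4, 5, 6, ...] after repeated single-disk additions.
def semi_rr_disk(block: int, sizes: list[int]) -> int:
    """Disk holding `block` after applying the Semi-RR rule at each scaling."""
    disk = block % sizes[0]                  # initial round-robin layout
    for prev, cur in zip(sizes, sizes[1:]):
        target = block % cur                 # round-robin target in the grown array
        if target >= prev:                   # target is a newly added disk -> move
            disk = target                    # otherwise the block stays put
    return disk

# Rough uniformity check after four scalings (counts should be equal but are not):
print(Counter(semi_rr_disk(b, [4, 5, 6, 7, 8]) for b in range(100_000)))
```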

SLIDE 12

FastScale: Min Migration & Uniform Distribution

  • Take RAID scaling from 3 disks to 5 as an example.
  • One RAID scaling process can be divided into two stages logically:

– data migration and,
– data filling.

  • All the data blocks within a parallelogram will be moved.

– 2 data blocks are migrated from each old disk,
– while each block's physical block number is unchanged.

  • An elastic function describes the data layout (see the sketch below).
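As an illustration of that parallelogram, here is a hedged sketch based on my reading of the slide's figure (the offset rule is an assumption, not the paper's published addressing function): within each segment of m+n physical offsets, old disk i sheds the n blocks at offsets i+1 … i+n, so every old disk loses exactly n blocks and m*n blocks in total land on the n new disks at unchanged offsets.

```python
# Hedged sketch: (disk, offset) pairs inside the moving parallelogram for one
# segment of (m + n) offsets, when scaling from m old disks to m + n disks.
# ASSUMPTION: old disk i sheds offsets i+1 .. i+n (inferred from the figure).
def parallelogram(m: int, n: int) -> list[tuple[int, int]]:
    return [(i, off) for i in range(m) for off in range(i + 1, i + n + 1)]

moved = parallelogram(3, 2)         # the slide's 3 -> 5 example
print(moved)                        # [(0,1), (0,2), (1,2), (1,3), (2,3), (2,4)]
assert len(moved) == 3 * 2          # m * n blocks move per segment
assert all(sum(d == i for d, _ in moved) == 2 for i in range(3))  # 2 per old disk
```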

SLIDE 13

FastScale: Property Examination

  • Does FastScale satisfy the three requirements?

– It is compared with the round-robin and semi-RR algorithms.

  • Starting from a 4-disk array, we add one disk repeatedly 10 times, using the three algorithms respectively.
  • Each disk has a capacity of 128 GB, and the block size is 64 KB.

– In other words, each disk holds 2M blocks.

SLIDE 14

Comparison in Migration Fraction

  • Using the round-robin algorithm,

– the migration fraction is constantly 100%.

  • Using semi-RR and FastScale,

– the migration fractions are identical,
– they are significantly smaller,
– and, restricted by uniformity, they are also minimal.

[Figure: migration ratio (0.0–1.0) vs. times of disk additions (1–11) for Round-Robin, FastScale, and Semi-RR.]

Compared in migration fraction, Semi-RR and FastScale win!

SLIDE 15
Comparison in Uniformity of Distribution

  • We use the coefficient of variation as a metric to evaluate the uniformity of data distribution across all the disks.

– The C.V. expresses the standard deviation as a percentage of the average.

  • For the round-robin and FastScale algorithms,

– the C.V. remains 0 percent as the number of disk additions increases.

  • For the semi-RR algorithm,

– it causes excessive oscillation in the C.V.,
– with a maximum of as much as 13.06%.

[Figure: coefficient of variation (%) vs. times of disk additions (1–11) for Round-Robin, FastScale, and Semi-RR.]

Compared in uniformity of distribution, Semi-RR fails and FastScale wins again!
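For completeness, a minimal sketch of the C.V. metric itself (standard definition; the per-disk block counts below are made up for illustration, and the paper may use the sample rather than the population standard deviation):

```python
import statistics

def coefficient_of_variation(blocks_per_disk: list[int]) -> float:
    """Standard deviation as a percentage of the mean (the slide's C.V.)."""
    return 100.0 * statistics.pstdev(blocks_per_disk) / statistics.mean(blocks_per_disk)

print(coefficient_of_variation([2**21] * 5))                        # uniform -> 0.0
print(coefficient_of_variation([2_000_000, 2_200_000, 1_800_000]))  # skewed -> ~8.2
```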

SLIDE 16

Comparison in Calculation Overhead

  • We run different algorithms to calculate the physical addresses for all data blocks on a scaled RAID.

– The average addressing time for each block is calculated.
– Setup: Intel Dual Core T9400 2.53 GHz, 4 GB memory, Windows 7.

  • The round-robin algorithm has the lowest overhead,

– 0.014 μs or so.

  • FastScale has the largest overhead.

– The largest time is 0.24 μs.

[Figure: addressing time (μs) vs. times of disk additions (1–11) for Round-Robin, FastScale, and Semi-RR.]

Compared to the milliseconds of disk I/O time, the calculation overhead is negligible.

SLIDE 17

Outline

  • Motivation
  • Minimizing data migration
  • Optimizing data migration
  • Evaluation
  • Conclusions

SLIDE 18

I/O Aggregation

  • Aggregate read:

– Multiple successive blocks on a disk are read via a single I/O.

  • Aggregate write:

– Multiple successive blocks on a disk are written via a single I/O.

I/O aggregation converts small requests into fewer, larger requests; the seek cost is amortized over multiple blocks (see the sketch below).
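A minimal sketch of the aggregation idea (illustrative only; the block numbers are hypothetical and this is not FastScale's internal API): coalesce runs of consecutive block numbers so that each run is served by one larger I/O.

```python
def coalesce(blocks: list[int]) -> list[tuple[int, int]]:
    """Turn block numbers into (start, count) runs, one aggregate I/O per run."""
    runs: list[tuple[int, int]] = []
    for b in sorted(blocks):
        if runs and b == runs[-1][0] + runs[-1][1]:
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((b, 1))                         # start a new run
    return runs

# Eight single-block requests collapse into two aggregate I/Os:
print(coalesce([7, 8, 9, 10, 21, 22, 23, 24]))  # [(7, 4), (21, 4)]
```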

SLIDE 19

Why Can Lazy Checkpointing Work?

  • Each metadata update causes one long seek:

– metadata is usually stored at the beginning of the member disks.

  • After data copying, both the new replica and the original are valid.

– Block copying does not overwrite any valid data.

  • When the system fails and reboots, the original replica will be used.
  • As long as data has not been written since being copied, the data remain consistent.

– Only some I/Os are wasted.

[Figure: mapping metadata stored at the start of disks D0–D4; a migrated block exists on both its old and new disk until it is checkpointed.]

Not updating the metadata immediately does not sacrifice data reliability. The only threat is a write to migrated data.
SLIDE 20

Lazy Checkpointing

  • Data blocks are copied to new locations continuously, while the mapping metadata is not updated onto the disks until a threat to data consistency appears.
  • In the figure,

– "C": migrated and checkpointed;
– "M": migrated but not checkpointed;
– "U": not migrated.

  • Only when a user write request arrives in the area "M" is data migration checkpointed (see the sketch below).

Lazy checkpointing minimizes the number of metadata writes without loss of data consistency.
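A hedged sketch of that C/M/U logic (the state names are from the slide; everything else is a hypothetical stand-in for the real data mover): copies advance without touching on-disk metadata, and a checkpoint is forced only when a user write targets a migrated-but-uncheckpointed block.

```python
from enum import Enum

class State(Enum):
    U = "not migrated"
    M = "migrated, not checkpointed"
    C = "migrated and checkpointed"

class LazyCheckpointer:
    def __init__(self, num_blocks: int):
        self.state = [State.U] * num_blocks
        self.metadata_writes = 0

    def on_block_copied(self, block: int):
        self.state[block] = State.M          # no on-disk metadata write yet

    def on_user_write(self, block: int):
        if self.state[block] is State.M:     # the only threat to consistency
            self._checkpoint()               # flush mapping metadata first
        # ... then let the user write proceed at the block's current location

    def _checkpoint(self):
        self.metadata_writes += 1            # one long seek covers many copies
        self.state = [State.C if s is State.M else s for s in self.state]
```

With this policy, migrating thousands of blocks with only occasional interleaved user writes costs a handful of metadata writes instead of one per block.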

SLIDE 21

Outline

  • Motivation
  • Minimizing data migration
  • Optimizing data migration
  • Evaluation
  • Conclusions

SLIDE 22

Evaluation

  • Questions that we want to answer:

– Can FastScale accelerate RAID scaling?
– What is the effect on user workloads?
– How about the performance of a scaled RAID?

  • We used detailed simulations to compare with SLAS.

– The simulator is implemented with DiskSim as a worker module,
– with several disk traces collected in real systems.

  • The traces are TPC-C, the Financial trace from SPC, and the Web search engine trace from SPC.

SLIDE 23

Evaluation

  • The simulator is made up of a workload generator and a disk array.

– The workload generator initiates an I/O request at the appropriate time.

  • The disk array consists of

– an array controller and,
– storage components.

  • The array controller is logically divided into:

– an I/O processor and,
– a data mover.

  • The simulator is implemented in SimPy and DiskSim.

SLIDE 24

Scaling under the Financial Workload

  • Under the Financial workload, we conduct a scaling operation:

– adding 2 disks to a 4-disk RAID,
– each disk has a capacity of 4 GB,
– with a 32 KB stripe unit size.

  • The figure plots local maximum latencies as time increases.

[Figure: latency (ms) vs. timeline (s); SLAS ends at 6,830 s, FastScale ends at 952 s.]

  • FastScale accelerates RAID scaling significantly.

– 952 s vs. 6,830 s, an 86.06% improvement.

  • Local maximum latencies are also smaller.

SLIDE 25

Scaling under the TPC-C Workload

  • Under the TPC-C workload, we redo the scaling:

– adding 2 disks to a 4-disk RAID.

  • The figure plots local maximum latencies as time increases.

[Figure: latency (ms) vs. timeline (s); SLAS ends at 6,820 s, FastScale ends at 964 s.]

  • Once again, this shows the efficiency in improving redistribution time.

– 964 s vs. 6,820 s, an 85.87% improvement.

  • Local maximum latencies are also smaller.

FastScale improves the scaling efficiency of RAID significantly.

SLIDE 26

After One Scaling Operation

  • We compared the performance of two RAIDs scaled using FastScale and SLAS:

– "4+1": adding 1 disk to a 4-disk RAID.

  • We replayed the Web workload on the two RAIDs.
  • The figure plots local average latencies as time increases.

[Figure: average latency (ms) vs. timeline (s) for round-robin and FastScale.]

  • The performances of the two RAIDs are very close.

– For the round-robin RAID, the average latency is 11.36 ms.
– For the FastScale RAID, the average latency is 11.37 ms.

SLIDE 27

After Two Scaling Operations

  • We compared the performance of two RAIDs scaled twice using FastScale and SLAS:

– "4+1+1": adding 1 disk to a 4-disk RAID, twice.

  • The figure plots local average latencies as time increases.

[Figure: average latency (ms) vs. timeline (s) for round-robin and FastScale.]

  • It again reveals the approximate equality in performance.

– For the round-robin RAID, the average latency is 11.21 ms.
– For the FastScale RAID, the average latency is 11.03 ms.

The performance of the FastScale RAID-0 is almost identical with that of the round-robin RAID-0.

SLIDE 28

Outline

  • Motivation
  • Minimizing data migration
  • Optimizing data migration
  • Evaluation
  • Conclusions

SLIDE 29

Conclusions

  • FastScale accelerates RAID-0 scaling significantly.

– It minimizes data migration without loss of the uniformity of data distribution.
– It optimizes data migration with I/O aggregation and lazy checkpointing.

  • Compared with a round-robin scaling approach, FastScale can:

– reduce redistribution time by up to 86.06%,
– with smaller maximum response time of user I/Os.

  • The performance of the RAID scaled using FastScale is almost identical with that of the round-robin RAID.

SLIDE 30

Thank you! Questions?

Guangyan Zhang

http://storage.cs.tsinghua.edu.cn/~zgy


SLIDE 31

How is a Block Moved?

  • A parallelogram is divided into three parts:

– a head triangle, with unchanged shape,
– a body parallelogram,
– a tail triangle, with unchanged shape.

  • The body parallelogram:

– If m >= n, it is not a rectangle; change it into a rectangle.
– Otherwise, change the rectangle into a parallelogram.

[Figure: head/body/tail decomposition of the moving region for (a) m >= n and (b) m < n.]

SLIDE 32

Comparison in Local Avg Latencies

  • Under the Financial workload, we conduct a scaling operation:

– adding 2 disks to a 4-disk RAID,
– each disk has a capacity of 4 GB,
– with a 32 KB stripe unit size.

  • The figure plots local average latencies as time increases.
  • Local average latencies are close:

– FastScale 8.01 ms,
– SLAS 7.53 ms.

  • FastScale achieves this with a much shorter data redistribution time.

[Figure: latency (ms) vs. timeline (s); SLAS ends at 6,830 s, FastScale ends at 952 s.]