

SLIDE 1

FastScale: Accelerate RAID Scaling by Minimizing Data Migration

Weimin Zheng, Guangyan Zhang

gyzh@tsinghua.edu.cn
Tsinghua University

SLIDE 2

Outline

  • Motivation
  • Minimizing data migration
  • Optimizing data migration
  • Evaluation
  • Conclusions

SLIDE 3

Why Scale a RAID?

  • A disk is a simple computer.
  • A RAID volume can deliver high performance.

– Multiple disks serve an application concurrently.

  • Applications often require larger capacity and higher performance

– as user data increase and computing power grows.

  • One solution is to add new disks to a RAID volume.

– This disk addition is termed "RAID scaling".

  • To regain a balanced load, some blocks need to be moved to new disks.
  • Data migration needs to be performed online

– to supply non-stop services.

SLIDE 4

Limitation of Existing Approaches

  • Existing approaches to RAID scaling preserve the round-robin order after adding disks.

– Pro: the addressing function is simple (see the sketch after this slide).
– Con: all the data needs to be moved.

  • Recent work has optimized data migration; one typical example is SLAS (ACM TOS 2007):

– It uses I/O aggregation and lazy checkpointing to improve the efficiency.
– Due to migration of all the data, RAID scaling remains costly.

Can we reduce the total number of migrated data blocks?
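To make the trade-off concrete, here is a minimal sketch (illustrative code, not from the paper) of round-robin addressing and of why scaling under it moves nearly every block:

```python
# Round-robin RAID-0 addressing: disk = b mod N, offset = b div N.
def rr_location(block: int, num_disks: int) -> tuple[int, int]:
    """Return (disk, physical offset) of a logical block under round-robin."""
    return block % num_disks, block // num_disks

# Adding disks changes N, so almost every block's location changes:
B = 1_000_000
moved = sum(rr_location(b, 4) != rr_location(b, 6) for b in range(B))
print(moved / B)  # ~1.0: virtually all blocks must migrate on a 4 -> 6 scaling
```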

SLIDE 5

Minimizing Data Migration

  • FastScale moves data blocks only from old disks to new disks, while not migrating data among old disks.

– This is enough for preserving the uniformity of the data distribution (see the count after this slide).

  • In this manner, FastScale minimizes data migration for RAID scaling.

[Figure: m old disks D0 … Dm-1 and n new disks Dm … Dm+n-1; migration flows only from old disks to new disks.]

  • We design an elastic addressing function, through which

– the location of one block can be easily computed,
– without any lookup operation.
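Why old-to-new migration alone already reaches the minimum of Requirement 2 (slide 10) follows from a one-line count, sketched here in the slide's notation (B blocks in total, m old disks, n new disks):

```latex
\underbrace{\tfrac{B}{m}}_{\text{per old disk, before}} - \underbrace{\tfrac{B}{m+n}}_{\text{per disk, after}}
= \frac{Bn}{m(m+n)}
\quad\Longrightarrow\quad
\text{total moved} = m \cdot \frac{Bn}{m(m+n)} = \frac{Bn}{m+n}.
```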

SLIDE 6

Optimizing Data Migration

  • FastScale also exploits physical properties to optimize online data migration.

– First, it uses aggregate accesses to improve the efficiency of data migration.
– Second, it records data migration lazily to minimize the number of metadata updates while ensuring data consistency.

SLIDE 7

Results

  • Implemented FastScale and SLAS in DiskSim 4.0.

– Compared with SLAS and round-robin RAID-0 scaling.

  • Evaluation during RAID scaling:

– reduces redistribution time by up to 86.06%,
– with smaller maximum response time of user I/Os.

  • Evaluation after 1 or 2 RAID scaling operations:

– performance is almost identical with that of the round-robin RAID-0.

SLIDE 8

Coverage of FastScale

  • In this paper, we only describe our solution for RAID-0, i.e., striping without parity.

– FastScale can also work for RAID-10 and RAID-01.
– Some large storage systems slice disks into many segments, and several segments are organized into a RAID.

  • Although we do not handle RAID-4 and RAID-5, we believe that our method provides a good starting point for efficient scaling of RAID-4 and RAID-5 arrays.

SLIDE 9

Outline

  • Motivation
  • Minimizing data migration
  • Optimizing data migration
  • Evaluation
  • Conclusions

SLIDE 10

Requirements for RAID Scaling

  • Requirement 1 (Uniform Data Distribution):

– If there are B blocks stored on m disks, the expected number of blocks on each disk is approximately B/m, so as to maintain an even load.

  • Requirement 2 (Minimal Data Migration):

– During the addition of n disks to a RAID with m disks storing B blocks, the expected number of blocks to be moved is B*n/(m+n).

  • Requirement 3 (Fast Data Addressing):

– In an m-disk RAID, the location of a block is computed by an algorithm with low space and time complexity.

SLIDE 11

Semi-RR: the Most Intuitive Method

  • Semi-RR is based on round-robin scaling.

– It moves a data block only if the block's round-robin target is one of the new disks.
– Otherwise, it does not move the data block.

  • Good news: Semi-RR can reduce data migration significantly.
  • Bad news: it does not guarantee uniform distribution of data blocks after multiple scaling operations (see the sketch below).
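The following sketch is one plausible reading of the Semi-RR rule (hypothetical code, not the paper's): at each scaling step, a block jumps to its round-robin target only if that target is a newly added disk. Counting the resulting per-disk loads after several steps exposes the skew the slide reports.

```python
from collections import Counter

# Hedged sketch of Semi-RR addressing across a history of array sizes,
# e.g. sizes = [4, 5, 6, ...] after repeated single-disk additions.
def semi_rr_disk(block: int, sizes: list[int]) -> int:
    """Disk holding `block` after applying the Semi-RR rule at each scaling."""
    disk = block % sizes[0]                  # initial round-robin layout
    for prev, cur in zip(sizes, sizes[1:]):
        target = block % cur                 # round-robin target in the grown array
        if target >= prev:                   # target is a newly added disk -> move
            disk = target                    # otherwise the block stays put
    return disk

# Rough uniformity check after four scalings (counts should be equal but are not):
print(Counter(semi_rr_disk(b, [4, 5, 6, 7, 8]) for b in range(100_000)))
```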

SLIDE 12

FastScale: Min Migration & Uniform Distribution

  • Take RAID scaling from 3 disks to 5 as an example.
  • One RAID scaling process can be divided into two stages logically:

– data migration and,
– data filling.

  • All the data blocks within a parallelogram will be moved.

– 2 data blocks are migrated from each old disk,
– while each block's physical block number is unchanged.

  • An elastic function describes the data layout (see the sketch below).
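As an illustration of that parallelogram, here is a hedged sketch based on my reading of the slide's figure (the offset rule is an assumption, not the paper's published addressing function): within each segment of m+n physical offsets, old disk i sheds the n blocks at offsets i+1 … i+n, so every old disk loses exactly n blocks and m*n blocks in total land on the n new disks at unchanged offsets.

```python
# Hedged sketch: (disk, offset) pairs inside the moving parallelogram for one
# segment of (m + n) offsets, when scaling from m old disks to m + n disks.
# ASSUMPTION: old disk i sheds offsets i+1 .. i+n (inferred from the figure).
def parallelogram(m: int, n: int) -> list[tuple[int, int]]:
    return [(i, off) for i in range(m) for off in range(i + 1, i + n + 1)]

moved = parallelogram(3, 2)         # the slide's 3 -> 5 example
print(moved)                        # [(0,1), (0,2), (1,2), (1,3), (2,3), (2,4)]
assert len(moved) == 3 * 2          # m * n blocks move per segment
assert all(sum(d == i for d, _ in moved) == 2 for i in range(3))  # 2 per old disk
```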

SLIDE 13

FastScale: Property Examination

  • Does FastScale satisfy the three requirements?

– It is compared with the round-robin and semi-RR algorithms.

  • Starting from a 4-disk array, we add one disk repeatedly 10 times, using the three algorithms respectively.
  • Each disk has a capacity of 128 GB, and the block size is 64 KB.

– In other words, each disk holds 2M blocks.

SLIDE 14

Comparison in Migration Fraction

  • Using the round-robin algorithm,

– the migration fraction is constantly 100%.

  • Using semi-RR and FastScale,

– the migration fractions are identical,
– they are significantly smaller,
– and, restricted by uniformity, they are also minimal.

[Figure: migration ratio (0.0–1.0) vs. times of disk additions (1–11) for Round-Robin, FastScale, and Semi-RR.]

Compared in migration fraction, Semi-RR and FastScale win!

SLIDE 15
Comparison in Uniformity of Distribution

  • We use the coefficient of variation as a metric to evaluate the uniformity of data distribution across all the disks.

– The C.V. expresses the standard deviation as a percentage of the average.

  • For the round-robin and FastScale algorithms,

– the C.V. remains 0 percent as the number of disk additions increases.

  • For the semi-RR algorithm,

– it causes excessive oscillation in the C.V.,
– with a maximum of as much as 13.06%.

[Figure: coefficient of variation (%) vs. times of disk additions (1–11) for Round-Robin, FastScale, and Semi-RR.]

Compared in uniformity of distribution, Semi-RR fails and FastScale wins again!
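For completeness, a minimal sketch of the C.V. metric itself (standard definition; the per-disk block counts below are made up for illustration, and the paper may use the sample rather than the population standard deviation):

```python
import statistics

def coefficient_of_variation(blocks_per_disk: list[int]) -> float:
    """Standard deviation as a percentage of the mean (the slide's C.V.)."""
    return 100.0 * statistics.pstdev(blocks_per_disk) / statistics.mean(blocks_per_disk)

print(coefficient_of_variation([2**21] * 5))                        # uniform -> 0.0
print(coefficient_of_variation([2_000_000, 2_200_000, 1_800_000]))  # skewed -> ~8.2
```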

SLIDE 16

Comparison in Calculation Overhead

  • We run different algorithms to calculate the physical addresses for all data blocks on a scaled RAID.

– The average addressing time for each block is calculated.
– Setup: Intel Dual Core T9400 2.53 GHz, 4 GB memory, Windows 7.

  • The round-robin algorithm has the lowest overhead,

– 0.014 μs or so.

  • FastScale has the largest overhead.

– The largest time is 0.24 μs.

[Figure: addressing time (μs) vs. times of disk additions (1–11) for Round-Robin, FastScale, and Semi-RR.]

Compared to the milliseconds of disk I/O time, the calculation overhead is negligible.

SLIDE 17

Outline

  • Motivation
  • Minimizing data migration
  • Optimizing data migration
  • Evaluation
  • Conclusions

SLIDE 18

I/O Aggregation

  • Aggregate read:

– Multiple successive blocks on a disk are read via a single I/O.

  • Aggregate write:

– Multiple successive blocks on a disk are written via a single I/O.

I/O aggregation converts small requests into fewer, larger requests; the seek cost is amortized over multiple blocks (see the sketch below).
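A minimal sketch of the aggregation idea (illustrative only; the block numbers are hypothetical and this is not FastScale's internal API): coalesce runs of consecutive block numbers so that each run is served by one larger I/O.

```python
def coalesce(blocks: list[int]) -> list[tuple[int, int]]:
    """Turn block numbers into (start, count) runs, one aggregate I/O per run."""
    runs: list[tuple[int, int]] = []
    for b in sorted(blocks):
        if runs and b == runs[-1][0] + runs[-1][1]:
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((b, 1))                         # start a new run
    return runs

# Eight single-block requests collapse into two aggregate I/Os:
print(coalesce([7, 8, 9, 10, 21, 22, 23, 24]))  # [(7, 4), (21, 4)]
```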

SLIDE 19

Why Can Lazy Checkpointing Work?

  • Each metadata update causes one long seek:

– metadata is usually stored at the beginning of the member disks.

  • After data copying, both the new replica and the original are valid.

– Block copying does not overwrite any valid data.

  • When the system fails and reboots, the original replica will be used.
  • As long as data has not been written since being copied, the data remain consistent.

– Only some I/Os are wasted.

[Figure: mapping metadata stored at the start of disks D0–D4; a migrated block exists on both its old and new disk until it is checkpointed.]

Not updating the metadata immediately does not sacrifice data reliability. The only threat is a write to migrated data.
SLIDE 20

Lazy Checkpointing

  • Data blocks are copied to new locations continuously, while the mapping metadata is not updated onto the disks until a threat to data consistency appears.
  • In the figure,

– "C": migrated and checkpointed;
– "M": migrated but not checkpointed;
– "U": not migrated.

  • Only when a user write request arrives in the area "M" is data migration checkpointed (see the sketch below).

Lazy checkpointing minimizes the number of metadata writes without loss of data consistency.
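A hedged sketch of that C/M/U logic (the state names are from the slide; everything else is a hypothetical stand-in for the real data mover): copies advance without touching on-disk metadata, and a checkpoint is forced only when a user write targets a migrated-but-uncheckpointed block.

```python
from enum import Enum

class State(Enum):
    U = "not migrated"
    M = "migrated, not checkpointed"
    C = "migrated and checkpointed"

class LazyCheckpointer:
    def __init__(self, num_blocks: int):
        self.state = [State.U] * num_blocks
        self.metadata_writes = 0

    def on_block_copied(self, block: int):
        self.state[block] = State.M          # no on-disk metadata write yet

    def on_user_write(self, block: int):
        if self.state[block] is State.M:     # the only threat to consistency
            self._checkpoint()               # flush mapping metadata first
        # ... then let the user write proceed at the block's current location

    def _checkpoint(self):
        self.metadata_writes += 1            # one long seek covers many copies
        self.state = [State.C if s is State.M else s for s in self.state]
```

With this policy, migrating thousands of blocks with only occasional interleaved user writes costs a handful of metadata writes instead of one per block.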

SLIDE 21

Outline

  • Motivation
  • Minimizing data migration
  • Optimizing data migration
  • Evaluation
  • Conclusions

SLIDE 22

Evaluation

  • Questions that we want to answer:

– Can FastScale accelerate RAID scaling?
– What is the effect on user workloads?
– How about the performance of a scaled RAID?

  • We used detailed simulations to compare with SLAS.

– The simulator is implemented with DiskSim as a worker module,
– with several disk traces collected in real systems.

  • The traces are TPC-C, the Financial trace from SPC, and the Web search engine trace from SPC.

SLIDE 23

Evaluation

  • The simulator is made up of a workload generator and a disk array.

– The workload generator initiates an I/O request at the appropriate time.

  • The disk array consists of

– an array controller and,
– storage components.

  • The array controller is logically divided into:

– an I/O processor and,
– a data mover.

  • The simulator is implemented in SimPy and DiskSim.

SLIDE 24

Scaling under the Financial Workload

  • Under the Financial workload, we conduct a scaling operation:

– adding 2 disks to a 4-disk RAID,
– each disk has a capacity of 4 GB,
– with a 32 KB stripe unit size.

  • The figure plots local maximum latencies as time increases.

[Figure: latency (ms) vs. timeline (s); SLAS ends at 6,830 s, FastScale ends at 952 s.]

  • FastScale accelerates RAID scaling significantly.

– 952 s vs. 6,830 s, an 86.06% improvement.

  • Local maximum latencies are also smaller.

SLIDE 25

Scaling under the TPC-C Workload

  • Under the TPC-C workload, we redo the scaling:

– adding 2 disks to a 4-disk RAID.

  • The figure plots local maximum latencies as time increases.

[Figure: latency (ms) vs. timeline (s); SLAS ends at 6,820 s, FastScale ends at 964 s.]

  • Once again, this shows the efficiency in improving redistribution time.

– 964 s vs. 6,820 s, an 85.87% improvement.

  • Local maximum latencies are also smaller.

FastScale improves the scaling efficiency of RAID significantly.

SLIDE 26

After One Scaling Operation

  • We compared the performance of two RAIDs scaled using FastScale and SLAS:

– "4+1": adding 1 disk to a 4-disk RAID.

  • We replayed the Web workload on the two RAIDs.
  • The figure plots local average latencies as time increases.

[Figure: average latency (ms) vs. timeline (s) for round-robin and FastScale.]

  • The performances of the two RAIDs are very close.

– For the round-robin RAID, the average latency is 11.36 ms.
– For the FastScale RAID, the average latency is 11.37 ms.

SLIDE 27

After Two Scaling Operations

  • We compared the performance of two RAIDs scaled twice using FastScale and SLAS:

– "4+1+1": adding 1 disk to a 4-disk RAID, twice.

  • The figure plots local average latencies as time increases.

[Figure: average latency (ms) vs. timeline (s) for round-robin and FastScale.]

  • It again reveals the approximate equality in performance.

– For the round-robin RAID, the average latency is 11.21 ms.
– For the FastScale RAID, the average latency is 11.03 ms.

The performance of the FastScale RAID-0 is almost identical with that of the round-robin RAID-0.

SLIDE 28

Outline

  • Motivation
  • Minimizing data migration
  • Optimizing data migration
  • Evaluation
  • Conclusions

SLIDE 29

Conclusions

  • FastScale accelerates RAID-0 scaling significantly.

– It minimizes data migration without loss of the uniformity of data distribution.
– It optimizes data migration with I/O aggregation and lazy checkpointing.

  • Compared with a round-robin scaling approach, FastScale can:

– reduce redistribution time by up to 86.06%,
– with smaller maximum response time of user I/Os.

  • The performance of the RAID scaled using FastScale is almost identical with that of the round-robin RAID.

SLIDE 30

Thank you! Questions?

Guangyan Zhang

http://storage.cs.tsinghua.edu.cn/~zgy


SLIDE 31

How is a Block Moved?

  • A parallelogram is divided into three parts:

– a head triangle, with unchanged shape,
– a body parallelogram,
– a tail triangle, with unchanged shape.

  • The body parallelogram:

– If m >= n, it is not a rectangle; change it into a rectangle.
– Otherwise, change the rectangle into a parallelogram.

[Figure: head/body/tail decomposition of the moving region for (a) m >= n and (b) m < n.]

SLIDE 32

Comparison in Local Avg Latencies

  • Under the Financial workload, we conduct a scaling operation:

– adding 2 disks to a 4-disk RAID,
– each disk has a capacity of 4 GB,
– with a 32 KB stripe unit size.

  • The figure plots local average latencies as time increases.
  • Local average latencies are close:

– FastScale 8.01 ms,
– SLAS 7.53 ms.

  • FastScale achieves this with a much shorter data redistribution time.

[Figure: latency (ms) vs. timeline (s); SLAS ends at 6,830 s, FastScale ends at 952 s.]