WorkOut: I/O Workload Outsourcing for Boosting RAID Reconstruction Performance


SLIDE 1

WorkOut: I/O Workload Outsourcing for Boosting RAID Reconstruction Performance

Suzhen Wu1, Hong Jiang2, Dan Feng1, Lei Tian12, Bo Mao1

1Huazhong University of Science & Technology  2University of Nebraska-Lincoln

SLIDE 2

Outline

Background
Motivation
WorkOut
Performance Evaluation
Conclusion

HUST & UNL

2

SLIDE 3

RAID Reconstruction

Recovers the data content on a failed disk

Two metrics

Reconstruction time
User response time

Categories

Off-line reconstruction
On-line reconstruction (commonly deployed)
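The recovery itself is standard RAID arithmetic, not specific to these slides: in a parity-protected array, the parity block is the XOR of the data blocks in a stripe, so a failed disk's block can be rebuilt from the survivors. A minimal sketch (block names and sizes are illustrative):

```python
# Minimal sketch: rebuilding one failed disk's block in a RAID-5 stripe.
# Parity is the byte-wise XOR of all data blocks, so any single lost
# block equals the XOR of the surviving blocks and the parity.
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# A 4-disk stripe: three data blocks plus one parity block.
data = [b"\x01\x02", b"\x10\x20", b"\x0f\x0e"]
parity = xor_blocks(data)

# Disk 1 fails; rebuild its block from the survivors and the parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data[1]
```

On-line reconstruction runs this rebuild for every stripe while user I/Os keep arriving, which is exactly the contention WorkOut targets.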


SLIDE 4

Challenges

Higher error rates than expected

Complete disk failures [Schroeder07, Pinheiro07, Jiang08]
Latent sector errors [Bairavasundaram07]

Correlation in drive failures

e.g. after one disk fails, another disk failure will likely occur soon.

Increasing number of drives

RAID reconstruction might become the common case in large-scale systems.


SLIDE 5

Reconstruction and Its Performance Impact

[Figure: performance impact of on-line reconstruction; callouts: 70 times, 3 times]


SLIDE 6

I/O Intensity Impact on Reconstruction

[Figure: reconstruction time and user response time vs. I/O intensity; callouts: 21 times, ~4 times]

Both the reconstruction time and user response time increase with IOPS.


SLIDE 7

Intuitive Idea

Observation

Performing the rebuild I/Os and user I/Os simultaneously leads to disk bandwidth contention and frequent long seeks to and from the multiple separate data areas.

Our intuitive idea

To redirect the user I/Os that are issued to the degraded RAID set.

But: what to redirect? And where to redirect to?


SLIDE 8

What To Redirect

Access locality

Existing studies on workload analysis reveal that strong spatial and temporal locality exists even underneath the storage cache.

Answer to “what to redirect?”

Popular read requests
All write requests


SLIDE 9

Where To Redirect To

Availability of spare or free space in data centers

A spare pool including a number of disks
Free space on other RAID sets

Answer to “where to redirect to?”

Spare or free space

Comparison

Existing approaches: in the context of a single RAID set
Our approach: in the context of data centers with multiple RAID sets

SLIDE 10

Main Idea of WorkOut

Workload Outsourcing (WorkOut)

Temporarily redirect all write requests and popular read requests originally targeted at the degraded RAID set to a surrogate RAID set, to significantly improve on-line reconstruction performance.

Goal

Approach the reconstruction-time performance of off-line reconstruction, without affecting user-response-time performance at the same time.

SLIDE 11

WorkOut Architecture

Components: Administrator Interface, Popular Data Identifier, Request Redirector, Surrogate Space Manager, Reclaimer

[Diagram: a degraded RAID set with one failed disk, plus a spare disk]

SLIDE 12

Data Structure

D_Table: a log table that manages the redirected data

D_Flag=1: write data from the user application
D_Flag=0: popular read data copied from D-RAID to S-RAID

R_LRU: an LRU-style list that identifies the most recent reads
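A hedged sketch of these two structures (the names D_Table, D_Flag, and R_LRU come from the slides; the fields, capacity, and method names are assumptions for illustration):

```python
# Sketch of WorkOut's bookkeeping; field/method names are assumptions.
from collections import OrderedDict

class DTableEntry:
    def __init__(self, d_offset, s_offset, length, d_flag):
        self.d_offset = d_offset  # original offset on the degraded RAID set
        self.s_offset = s_offset  # where the data now lives on the surrogate set
        self.length = length
        self.d_flag = d_flag      # 1: redirected write, 0: redirected popular read

class DTable:
    """Log table mapping degraded-set offsets to surrogate-set locations."""
    def __init__(self):
        self.entries = {}         # keyed by offset on the degraded set

    def log(self, d_offset, s_offset, length, d_flag):
        self.entries[d_offset] = DTableEntry(d_offset, s_offset, length, d_flag)

    def lookup(self, d_offset):
        return self.entries.get(d_offset)

class RLRU:
    """LRU-style list of recently read offsets: a re-read while the offset
    is still in the window marks the data as popular (worth redirecting)."""
    def __init__(self, capacity=1024):
        self.capacity = capacity
        self.recent = OrderedDict()

    def access(self, d_offset):
        hit = d_offset in self.recent
        self.recent[d_offset] = True
        self.recent.move_to_end(d_offset)
        if len(self.recent) > self.capacity:
            self.recent.popitem(last=False)   # evict the least recent offset
        return hit  # True: popular read
```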


SLIDE 13

Algorithm During Reconstruction

Workflow

For each write: it is redirected to its previous location or to a new location on the surrogate RAID set, according to whether it is an overwrite or not.

For each read: check the D_Table:

Does it hit D_Table or not?
If a hit, is it a full hit or a partial hit?
If a miss, does it hit R_LRU?
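The decision logic above can be sketched as follows (a simplified, hedged rendering: the dict/set bookkeeping, helper names, and the partial-hit handling are assumptions, not the paper's implementation):

```python
# Sketch of the per-request decisions during reconstruction.
# d_table maps degraded-set offset -> (surrogate offset, length, D_Flag).
def handle_write(d_table, d_offset, length, alloc):
    """All writes go to the surrogate set: an overwrite reuses its
    previously redirected location, a new write gets a fresh one."""
    if d_offset in d_table:                        # overwrite
        s_offset, _, _ = d_table[d_offset]
    else:                                          # first write: new location
        s_offset = alloc(length)
    d_table[d_offset] = (s_offset, length, 1)      # D_Flag=1: redirected write
    return ("surrogate", s_offset)

def handle_read(d_table, r_lru, d_offset, length, alloc):
    """Reads check D_Table first; misses consult R_LRU for popularity."""
    if d_offset in d_table:
        s_offset, logged_len, _ = d_table[d_offset]
        if logged_len >= length:                   # full hit
            return ("surrogate", s_offset)
        return ("both", s_offset)                  # partial hit: split the read
    popular = d_offset in r_lru                    # re-read within the window
    r_lru.add(d_offset)
    if popular:                                    # redirect a copy of hot data
        s_offset = alloc(length)
        d_table[d_offset] = (s_offset, length, 0)  # D_Flag=0: popular read
        return ("degraded", s_offset)              # future reads hit D_Table
    return ("degraded", None)                      # cold read

# Tiny demo with a bump allocator on the surrogate set.
next_free = [0]
def alloc(n):
    off = next_free[0]
    next_free[0] += n
    return off

d_table, r_lru = {}, set()
assert handle_write(d_table, 40, 8, alloc) == ("surrogate", 0)
assert handle_read(d_table, r_lru, 40, 8, alloc) == ("surrogate", 0)
assert handle_read(d_table, r_lru, 80, 8, alloc) == ("degraded", None)
```

A second read of offset 80 would then find it in R_LRU, log it with D_Flag=0, and serve all later reads of it from the surrogate set.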


SLIDE 14

Algorithm During Reclaim

The redirected write data should be reclaimed back to the newly recovered RAID set after the reconstruction process completes.

All requests must be checked in D_Table:

Each write request is served by the recovered RAID set, and the corresponding log in D_Table should be deleted if it exists.

Read requests can also be handled well, but it is complicated to explain in a short time. More details can be found in our paper.
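The reclaim step for writes can be sketched as below (a hedged simplification: only redirected writes need copying back, since popular-read entries are already present on the recovered set; function names and the dict layout are assumptions):

```python
# Sketch of reclaim after reconstruction completes.
# d_table maps degraded-set offset -> (surrogate offset, length, D_Flag).
def reclaim(d_table, copy_back):
    """Copy each redirected write (D_Flag=1) back to the recovered set,
    then drop its log entry; popular-read entries (D_Flag=0) are just
    dropped, because the recovered set already holds that data."""
    for d_offset in list(d_table):
        s_offset, length, d_flag = d_table[d_offset]
        if d_flag == 1:
            copy_back(s_offset, d_offset, length)
        del d_table[d_offset]

def handle_write_during_reclaim(d_table, d_offset):
    """New writes go to the recovered set; a pending log entry for the
    same offset is deleted so the stale surrogate copy is never
    reclaimed over the fresh data."""
    d_table.pop(d_offset, None)
    return "recovered"

# Demo: one redirected write (40) and one redirected popular read (80).
d_table = {40: (0, 8, 1), 80: (8, 8, 0)}
copied = []
reclaim(d_table, lambda s, d, n: copied.append((s, d, n)))
assert copied == [(0, 40, 8)]   # only the write was copied back
assert d_table == {}
```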

SLIDE 15

Design Choices

Optional surrogate RAID set        Device overhead  Performance  Reliability  Maintainability
A dedicated surrogate RAID1 set    medium           medium       high         simple
A dedicated surrogate RAID5 set    high             high         high         simple
A live surrogate RAID5 set         low              low          medium-high  complicated

SLIDE 16

Data Consistency

Data Protection

To avoid data loss caused by a disk failure in the surrogate RAID set, all redirected write data in the surrogate RAID set should be protected by a redundancy scheme, such as RAID1 or RAID5.

“Metadata” Protection

The content of D_Table should be stored in NVRAM during the entire period when WorkOut is activated, to prevent data loss in the event of a power supply failure.

SLIDE 17

Performance Evaluation

Prototype implementation

A built-in module in MD
Incorporated into PR & PRO

Experimental setup

Intel Xeon 3.0GHz processor, 1GB DDR memory, 15 Seagate SATA disks (10GB), Linux 2.6.11

Methodology

Open-loop: trace replay
  • Traces: Financial1, Financial2, Websearch2
  • Tool: RAIDmeter
Closed-loop: TPC-C-like benchmark


SLIDE 18

Experimental Results

Reconstruction Time (seconds)

Trace  Off-line  PR       WorkOut+PR  Speedup  PRO      WorkOut+PRO  Speedup
Fin1   136.4     1121.75  203.13      5.52     1109.62  188.26       5.89
Fin2             745.19   453.32      1.64     705.79   431.24       1.64
Web              9935.6   7623.22     1.30     9888.27  7851.36      1.26

Average User Response Time during Reconstruction (milliseconds)

Trace  Normal  Degraded  PR     WorkOut+PR  Speedup  PRO    WorkOut+PRO  Speedup
Fin1   7.92    9.52      12.71  4.43        2.87     9.83   4.58         2.15
Fin2   8.13    13.36     25.8   9.69        2.66     22.97  10.19        2.25
Web    18.46   26.95     38.57  28.35       1.36     35.58  29.12        1.22

Degraded RAID set: RAID5, 8 disks, 64KB stripe unit size
Surrogate RAID set: RAID5, 4 disks, 64KB stripe unit size
Minimum reconstruction bandwidth: 1MB/s

SLIDE 19

Percentage of Redirected Requests

[Figure: percentage of redirected requests per trace; callout: 84%]

Minimum reconstruction bandwidth: 1MB/s

SLIDE 20

Sensitivity Study (1)

[Figure: (a) reconstruction time, (b) average response time]

Different minimum reconstruction bandwidths: 1MB/s, 10MB/s, 100MB/s

SLIDE 21

Sensitivity Study (2)

[Figure: (a) reconstruction time, (b) average response time for PR, PRO, and WorkOut]

Different numbers of disks: 5, 8, 11

SLIDE 22

Sensitivity Study (3)

[Figure: (a) and (b): reconstruction time for PR and WorkOut]

Different RAID levels: RAID10 (4 disks), RAID6 (8 disks)

SLIDE 23

Different Surrogate Set

[Figure: average response time for Fin1, Fin2, and Web under PR and the three surrogate sets: dedicated RAID1, dedicated RAID5, live RAID5]

The same reconstruction time for the three different surrogate sets.

  • Dedicated RAID1: 2 disks
  • Dedicated RAID5: 4 disks
  • Live RAID5: 4 disks (replaying the Fin1 workload on it)

SLIDE 24

TPC-C-like Benchmark

[Figure: (a) normalized transaction rate (callout: 15%), (b) reconstruction time]

Minimum reconstruction bandwidth: 1MB/s

SLIDE 25

Extendibility: Re-synchronization

[Figure: (a) re-synchronization time, (b) average response time]

Re-synchronization: RAID5, 8 disks, 64KB stripe unit size
Surrogate RAID set: RAID5, 4 disks, 64KB stripe unit size
Minimum re-synchronization bandwidth: 1MB/s

SLIDE 26

Conclusion

WorkOut outsources a significant amount of user I/O requests away from the degraded RAID set to a surrogate RAID set, thus improving RAID reconstruction performance;

Insights and guidance for storage system designers and administrators by exploring three design options;

WorkOut can improve the performance of other background RAID tasks, such as re-synchronization.

SLIDE 27