SLIDE 1

Wayne State University Cluster and Internet Computing Laboratory

A Scheduling Framework that Makes any Disk Schedulers Non-work-conserving solely based on Request Characteristics

Yuehai Xu and Song Jiang

Department of Electrical and Computer Engineering, Wayne State University

SLIDE 2

Disk Performance and Workload Spatial Locality

The disk is cost effective with its ever increasing capacity and peak throughput.

The performance with non-sequential access is critical for the disk to be competitive.

– Virtual machine environment
– Consolidated storage system

The effective performance depends on exploitation of spatial locality.

– This locality is usually exploited statically in the request scheduling.
– In this work, we exploit it in both space and time dimensions.

SLIDE 3

Quantifying Request Service Time

[Figure; axis label: Logical Block Address (LBA)]

SLIDE 4

From 1-D Locality to 2-D Locality

[Figure: requests plotted by LBA against time, with the disk head, the current time, and T1 marked]

T1 = service_time(pending_request)

To exploit the locality, the scheduler usually selects the request with minimal T1 among pending requests.

SLIDE 5

From 1-D Locality to 2-D Locality

[Figure: requests plotted by LBA against time, with the disk head, the current time, and T1, T2, and T3 marked]

T1 = service_time(pending_request)
T2 = wait_time(future_request)
T3 = service_time(future_request)

To exploit 1-D locality, select min(T1) among pending requests.

To exploit 2-D locality, select min(T1, T2+T3) among pending and future requests with non-work-conserving scheduling.
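The selection rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `choose`, the `Decision` type, and the idea that the predicted T2 and T3 are handed in as plain numbers are all hypothetical and do not come from the slides.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Decision:
    wait: bool      # True: keep the disk idle for the predicted future request
    cost_ms: float  # estimated time until the chosen request completes


def choose(pending_t1_ms: Sequence[float],
           future_t2_ms: Optional[float] = None,
           future_t3_ms: Optional[float] = None) -> Decision:
    """Return the cheaper option among min(T1) and T2 + T3 (hypothetical helper)."""
    # 1-D locality: the best pending request is the one with minimal T1.
    best_t1 = min(pending_t1_ms) if pending_t1_ms else float("inf")
    # Without a prediction for a future request, behave work-conservingly.
    if future_t2_ms is None or future_t3_ms is None:
        return Decision(wait=False, cost_ms=best_t1)
    # 2-D locality: waiting pays off only if T2 + T3 < T1.
    future_cost = future_t2_ms + future_t3_ms
    if future_cost < best_t1:
        return Decision(wait=True, cost_ms=future_cost)
    return Decision(wait=False, cost_ms=best_t1)
```

For example, with pending service times of 8 ms and 5 ms and a predicted request that arrives in 1 ms and takes 0.5 ms to serve, `choose([8.0, 5.0], 1.0, 0.5)` decides to wait, because 1.5 ms is less than 5 ms.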

SLIDE 6

Challenges of Exploiting 2-D Locality

[Figure: requests plotted by LBA against time, with the disk head, the current time, and T1, T2, and T3 marked; T2+T3 < T1]

T1 = service_time(pending_request)
T2 = wait_time(future_request)
T3 = service_time(future_request)

Predicting arrival times and locations of future requests whose T2+T3 < T1.

Determining what request history should be used for the prediction.

SLIDE 7

How does anticipatory scheduling handle them?

[Figure: requests plotted by LBA against time, with the disk head and the current time marked]

Anticipatory scheduling (AS) groups requests according to their issuing processes.

AS explicitly tracks request arrival times and locations for each process to predict that process's next request.
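To make the per-process bookkeeping concrete, here is a minimal sketch of a predictor that keeps one history record per process. It is not the AS implementation: the class names, the use of a plain running mean, and the choice to predict an arrival time plus an expected LBA distance are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ProcessHistory:
    last_arrival_ms: Optional[float] = None
    last_lba: Optional[int] = None
    mean_gap_ms: float = 0.0  # running mean of inter-arrival times
    mean_seek: float = 0.0    # running mean of absolute LBA distance
    samples: int = 0


class PerProcessPredictor:
    """Hypothetical per-process tracker; requires the issuing pid of each request."""

    def __init__(self) -> None:
        self.history = defaultdict(ProcessHistory)

    def record(self, pid: int, arrival_ms: float, lba: int) -> None:
        h = self.history[pid]
        if h.last_arrival_ms is not None:
            h.samples += 1
            gap = arrival_ms - h.last_arrival_ms
            dist = abs(lba - h.last_lba)
            # Incremental update of the running means.
            h.mean_gap_ms += (gap - h.mean_gap_ms) / h.samples
            h.mean_seek += (dist - h.mean_seek) / h.samples
        h.last_arrival_ms = arrival_ms
        h.last_lba = lba

    def predict(self, pid: int) -> Optional[Tuple[float, float]]:
        """Return (expected arrival time, expected LBA distance), or None."""
        h = self.history.get(pid)
        if h is None or h.samples == 0:
            return None
        return (h.last_arrival_ms + h.mean_gap_ms, h.mean_seek)
```

Note that every call takes a `pid`: this style of prediction has nothing to key its history on when the scheduler cannot see which process issued a request.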

SLIDE 8

Anticipatory’s Limitations

[Figure: requests plotted by LBA against time, with the disk head marked]

Requests in a local disk region may be issued by different processes.

Maintaining and analyzing long access-history statistics can be expensive.

The process information may be unavailable (VM, SAN, NFS, PVFS, etc.).

SLIDE 9

Related Approaches

Antfarm infers process information in the virtual machine monitor by tracking activities of processes in VMs [USENIX ATC’06].

– Applicable only to VMs.
– Guest OS needs to be open for instrumentation.

Hints, such as accessed files’ directory or owner, are used for grouping requests in NFS servers [Cluster’08].

– Hints may not always be relevant.

The Linux prefetching policy exploits spatial locality by tracking file access for every file opened by each process [Linux Symposium’04].

– File abstraction may not be available to the disk schedulers.
– Its efficient tracking and decision making mechanisms can be leveraged.

SLIDE 10

Design Goals of Stream Scheduling

Use only request characteristics, i.e., request arrival times and locations

– Process information is not required in any way.

Introduce minimal overhead

– Remember minimal history access information
– Conduct minimal computation in its locality analysis

Integrate seamlessly with any work-conserving schedulers

– Designed as a framework to make them non-work-conserving

SLIDE 11

Design of Stream Scheduling

Group requests into streams so that the intra-stream locality is stronger than the inter-stream locality.

Track judicious scheduling decisions rather than locality metrics

– Wait or not wait? (future request vs. pending request)
– A stream is a sequence of requests for which the judicious decisions are “wait”.

A stream is maintained as in Linux prefetching.

– A stream is built up or torn down depending on the next judicious decision.

SLIDE 12

Stream Scheduling Illustration

[Figure: requests labeled 1 to 4 and points labeled a to d, plotted by LBA against time. Legend: arrival of a request; completion of a request; time period serving other requests; time period serving this request; link showing relationship between parent request and child request.]

Req 1 has its child (Req 2). The stream length increases to two.

T2+T3 < T1

SLIDE 13

Maintenance of Streams

A stream grows when a completed request sees its child.

– Determining existence of a child is independent of actual scheduling.
– A stream is established when its length exceeds a threshold.
– An established stream leads to non-work-conserving scheduling.

The scheduler stops serving a stream when

– the stream is broken; or
– the time slice allocated to the stream runs out; or
– an urgent request appears.

To maintain a stream, only current stream lengths need to be remembered.

– The cost is trivial!

We have also designed stream scheduling for disk arrays.

– It is described in the paper.
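The maintenance rules on this slide can be sketched as a small state tracker. This is a minimal illustration, not the kernel prototype: the class and method names are hypothetical, the default threshold and time slice are taken from the Experiment Settings slide, and the way time-slice usage is accounted (per completed request once the stream is established) is an assumption.

```python
class StreamTracker:
    """Hypothetical tracker for one stream; the stream length is the core state."""

    def __init__(self, threshold: int = 4, time_slice_ms: float = 124.0) -> None:
        self.threshold = threshold
        self.time_slice_ms = time_slice_ms
        self.length = 1            # a request without a child is a stream of length 1
        self.slice_used_ms = 0.0   # assumed accounting of the stream's time slice

    @property
    def established(self) -> bool:
        # A stream is established when its length exceeds the threshold.
        return self.length > self.threshold

    def reset(self) -> None:
        self.length = 1
        self.slice_used_ms = 0.0

    def on_completion(self, saw_child: bool, elapsed_ms: float) -> None:
        """Update the stream when a request completes."""
        if not saw_child:
            self.reset()           # the stream is broken
            return
        self.length += 1           # the completed request saw its child
        if self.established:
            self.slice_used_ms += elapsed_ms

    def should_wait(self, urgent_pending: bool) -> bool:
        """Non-work-conserving only for an established stream within its slice."""
        return (self.established
                and not urgent_pending
                and self.slice_used_ms < self.time_slice_ms)
```

With the default threshold of 4, the tracker starts recommending a wait only after the fourth consecutive child is seen (length 5), and stops as soon as a completion sees no child, the slice is used up, or an urgent request is pending.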

SLIDE 14

Experiment Settings

Software settings

– Stream Scheduling (SS) is prototyped in Linux kernel 2.6.31.3 using Deadline as its work-conserving component.
– The default stream length threshold is 4.
– The default stream time slice is 124ms.

Hardware settings

– Intel Core2 Duo with 2GB of DRAM.
– 7200RPM, 500GB Western Digital Caviar Blue SATA II disk with a 16MB built-in cache.

Adaptation for NCQ

– Disk head position is indicated by the last request sent to the disk.

SLIDE 15

Storage without Process Information

par-read: four independent processes, each reading a 1GB file using 4KB requests in parallel.

Grep: two grep instances, each searching in a Linux directory tree.

TPC-H: three TPC-H instances, each using PostgreSQL as its database server and DBT3 to create its tables.

PostMark: four PostMark instances, each creating a data set of 10,000 files.

[Figure panels: par-read, grep, TPC-H, PostMark]

SLIDE 16

Storage without Process Information

par-read: four independent processes, each reading a 1GB file using 4KB requests in parallel.

[Figure panels: Execution Time (s), Pending Time (ms), Service Time (ms), Execution Time (s)]

SLIDE 17

Storage with Inadequate Process Information

multi-threads: four processes, each forking two threads for reading files with periodic synchronization between them.

mpi-io-test: four mpi-io-test program instances running on PVFS2 where files are striped over eight data servers.

ProFTPD: a ProFTPD FTP server on each Xen VM supporting four clients to simultaneously download four 300MB files.

TPC-H: three TPC-H instances on each Xen VM.

SLIDE 18

Conclusions

The stream scheduling framework turns any disk scheduler into a non-work-conserving one.

– Process information is not required in the scheduling.
– Both time and space overheads are low.

The framework can be extended to disk arrays to recover and exploit the locality weakened by file striping.

Experiments on its Linux prototype show significantly improved performance for representative benchmarks.