vPFS+: Managing I/O Performance for Diverse HPC Applications



SLIDE 1

Virtualized Infrastructures, Systems, & Applications

vPFS+: Managing I/O Performance for Diverse HPC Applications

Ming Zhao, Arizona State University; Yiqi Xu, VMware

http://visa.lab.asu.edu

SLIDE 2

Background: HPC I/O Management

 Increasingly diverse HPC applications on shared storage

  • Different I/O rates, sizes, and data/metadata intensities

 Lack of I/O QoS differentiation

  • Parallel file systems treat all I/Os equally

[Figure: compute nodes running APP1 through APPn send generic parallel file system I/Os to the storage nodes; treating all I/Os equally is a mismatch with the applications' diverse needs]

SLIDE 3

Background: vPFS

 Proxy-based interposition of application data requests

  • Transparent to applications, supports different setups

 Proportional I/O bandwidth scheduling using SFQ(D)

  • Work-conserving, strong fairness

[Figure: vPFS architecture, with an SFQ(D) proxy interposed between HPC applications 1 through n and the PFS]

[Charts: App1 throughput (MB/s) and achieved vs. target bandwidth ratios. Write vs. Read: 1.99:1, 3.97:1, 8.10:1, 16.21:1, 32.73:1; Write vs. Random R/W: 2.02:1, 3.95:1, 8.01:1, 16.01:1, 31.34:1, for target ratios 2:1, 4:1, 8:1, 16:1, 32:1]

Limitations?

SLIDE 4

Limitations

 Lack of isolation between large and small workloads
 SFQ(D): start-time fair queueing with I/O depth D (see the sketch below)

  • Start times capture each flow's service usage
  • Dispatch requests in increasing order of their start times
  • D captures the available I/O parallelism
  • Allow up to D outstanding requests

[Figure: SFQ(D) with D = 10 serving flows f1, f2, ..., fn. In theory at most 10 I/Os are outstanding, but in practice 14 were used, causing interference between flows]
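For concreteness, a minimal sketch of a start-time fair queueing scheduler with depth D, assuming a per-flow weight and a byte-sized request cost; the class and method names (Request, SFQD) are illustrative assumptions, not the vPFS code:

```python
# Minimal sketch of SFQ(D): start-time fair queueing with depth D.
import heapq
from collections import namedtuple

Request = namedtuple("Request", ["size"])   # hypothetical request record

class SFQD:
    def __init__(self, depth):
        self.depth = depth          # D: bound on outstanding requests
        self.outstanding = 0
        self.vtime = 0.0            # virtual time = start tag of the last dispatched request
        self.last_finish = {}       # per-flow finish tag of the most recently queued request
        self.queue = []             # heap of (start_tag, seq, flow, request)
        self.seq = 0

    def enqueue(self, flow, request, weight=1.0):
        # The start tag captures the flow's service usage so far.
        start = max(self.vtime, self.last_finish.get(flow, 0.0))
        self.last_finish[flow] = start + request.size / weight
        heapq.heappush(self.queue, (start, self.seq, flow, request))
        self.seq += 1

    def dispatch(self):
        # Issue queued requests in increasing start-tag order, up to D outstanding.
        issued = []
        while self.queue and self.outstanding < self.depth:
            start, _, flow, request = heapq.heappop(self.queue)
            self.vtime = start
            self.outstanding += 1
            issued.append((flow, request))
        return issued

    def complete(self, count=1):
        # Called when requests finish at the storage server, freeing depth.
        self.outstanding -= count
```

Because the depth only counts requests, a flow issuing large requests can consume far more device service than a flow issuing small ones under the same D, which is the interference illustrated above.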

SLIDE 5

Limitations

 Lack of metadata I/O scheduling
 Many HPC applications are metadata-intensive

  • Metadata I/O performance is important


SLIDE 6

Solution: vPFS+

 SFQ(D)+

  • A new scheduler to support diverse I/O sizes

 Metadata I/O management

  • An extension to support distributed scheduling of metadata requests

 PVFS2-based real prototype

 Comprehensive experimental evaluation

SLIDE 7

SFQ(D)+: Variable-Cost I/O Depth Allocation

 Allocate the limited I/O depth D to outstanding requests based on their sizes (see the sketch below)

  • Consider D as the number of available I/O slots
  • Each slot represents the cost of the smallest I/Os
  • Each outstanding request occupies one or multiple slots based on its size
  • Stop dispatching when D is used up

 Effectively protects small I/O workloads

  • Low-rate I/Os wait less for large outstanding I/Os to complete
  • Small I/Os are less affected by large I/Os after being dispatched

[Figure: slot-based dispatch with D = 10 over flows f1, f2, ..., fn; exactly 10 slots are used]
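As an illustration of the slot idea above, a minimal sketch in which D is treated as a pool of slots, each worth one smallest-size I/O; the 64 KB slot size and the (flow, size) queue format are assumptions for illustration only:

```python
# Minimal sketch of SFQ(D)+ variable-cost depth allocation.
import math

MIN_IO_SIZE = 64 * 1024        # assumed size represented by one slot

def slots_needed(size, min_io_size=MIN_IO_SIZE):
    """Number of slots a request of `size` bytes occupies."""
    return max(1, math.ceil(size / min_io_size))

def dispatch_slots(queue, depth):
    """Dispatch (flow, size) requests, already ordered by SFQ start tags,
    until the D slots are exhausted (no backfilling here)."""
    free_slots = depth
    issued = []
    while queue:
        flow, size = queue[0]
        cost = slots_needed(size)
        if cost > free_slots:
            break                  # head request does not fit: stop dispatching
        queue.pop(0)
        free_slots -= cost
        issued.append((flow, size, cost))
    return issued
```

With a 64 KB slot, for example, a 256 KB request occupies 4 slots while a 64 KB request occupies 1, so a burst of large requests can no longer monopolize the depth.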

SLIDE 8

SFQ(D)+: I/O Backfilling

 Large I/Os at the head of the queue have to wait until there are enough slots

  • Wastes the currently available slots

 Backfilling promotes small I/Os to utilize the available slots (sketched below)

  • Similar to the backfilling of small jobs in batch scheduling

[Figure: with D = 10, the large request at the head of the queue does not fit, so only 9 slots are used and 1 is wasted; backfilling promotes a smaller request queued behind it (tagged t1, later than the head's t0) so all 10 slots are used]
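A minimal sketch of the backfilling step, extending the slot-based dispatch sketched above: when the head request needs more slots than remain, smaller requests behind it are promoted so the free slots are not wasted. Starvation control for the skipped large request is omitted for brevity:

```python
# Minimal backfilling sketch on top of slot-based dispatch.
def dispatch_with_backfill(queue, depth, slots_needed):
    free_slots = depth
    issued = []
    i = 0
    while i < len(queue) and free_slots > 0:
        flow, size = queue[i]
        cost = slots_needed(size)
        if cost <= free_slots:
            queue.pop(i)           # dispatch this request; do not advance i
            free_slots -= cost
            issued.append((flow, size, cost))
        else:
            i += 1                 # too large for the remaining slots; try the next request
    return issued
```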

SLIDE 9

Metadata I/O Scheduling

 Extends the scheduling to both data and metadata requests

  • Apply SFQ(D)+ to schedule metadata I/Os on each server
  • Treat metadata I/Os as small I/Os

 Achieve total-metadata-service fair sharing for distributed metadata servers

  • Coordinate scheduling across distributed metadata servers
  • Each scheduler adjusts its scheduling of local metadata requests based on the global metadata service distribution (see the sketch below)
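A hedged sketch of how this coordination might look, assuming servers periodically exchange per-application metadata service counters; the weight-scaling policy below is illustrative only, not the published vPFS+ protocol:

```python
# Illustrative adjustment of local scheduling weights from global service counters.
def adjust_local_weights(local_weights, global_service, target_shares):
    """Deprioritize, on this server, applications that exceed their global share."""
    total = sum(global_service.values())
    if total == 0:
        return dict(local_weights)
    adjusted = {}
    for app, weight in local_weights.items():
        used_share = global_service.get(app, 0) / total   # fraction of global metadata service used
        target = target_shares.get(app, 0.0)               # application's fair share
        if used_share > target > 0:
            adjusted[app] = weight * (target / used_share)  # scale down apps above their share
        else:
            adjusted[app] = weight
    return adjusted
```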


SLIDE 10

Evaluation

 Testbed

  • vPFS+ implemented for PVFS2
  • 8 clients & 8 servers, 1 Gigabit switch

 Workloads

  • IOR: intensive checkpointing I/Os
  • multi-md-test: intensive metadata I/Os
  • BTIO: scientific application benchmark
  • WRF: real-world scientific application


SLIDE 11

BTIO vs. IOR

 BTIO: Class C (4 MB to 16 MB I/Os), Class A (320 B I/Os)

 vPFS+ substantially reduces BTIO's slowdown

[Chart: annotations read "Slows down IOR by 99%" and "Slows down IOR by 56%"]

SLIDE 12

WRF vs. IOR

 WRF: a large number of small I/Os and intensive metadata requests

 vPFS+ achieves 80% and 281% better performance for WRF than Native and vPFS, respectively


SLIDE 13

Metadata I/O Scheduling

 multi-md-test: mktestdir, create, write, readdir, read, close, rm, rmtestdir

 vPFS+ achieves nearly perfect fairness despite dynamic metadata demands for two metadata-intensive apps


SLIDE 14

Conclusions

 I/O diversity is becoming a top concern

  • Different types of requests (POSIX vs. MPI-IO, data vs. metadata)
  • Different I/O rates and sizes

 vPFS+ manages I/O performance for diverse apps

  • SFQ(D)+ recognizes the variable cost of different I/Os and keeps it under control
  • Distributed metadata scheduling supports metadata-intensive applications


SLIDE 15

Future Work

 Implement SFQ(D)+ directly in the data/metadata servers

  • Proxy-based scheduling may incur extra latency
  • But its impact on throughput is small (< 1%)

 Evaluate vPFS+ in larger and more diverse environments

  • Performance isolation is even more important on larger systems with more diverse workloads
  • Faster storage does not eliminate the need for performance isolation: the gap between processor and I/O performance is still increasing


SLIDE 16

Acknowledgement


 National Science Foundation

  • CNS-1629888, CNS-1619653, CNS-1562837, CMMI-1610282, IIS-1633381

 VISA Lab @ ASU

 Thank you!