vPFS+: Managing I/O Performance for Diverse HPC Applications



SLIDE 1

Virtualized Infrastructures, Systems, & Applications

vPFS+: Managing I/O Performance for Diverse HPC Applications

Ming Zhao, Arizona State University; Yiqi Xu, VMware

http://visa.lab.asu.edu

SLIDE 2

Background: HPC I/O Management

 Increasingly diverse HPC applications on shared storage

  • Different I/O rates, sizes, and data/metadata intensities

 Lack of I/O QoS differentiation

  • Parallel file systems treat all I/Os equally

[Figure: compute nodes running APP1 through APPn send generic parallel file system I/Os to the storage nodes; treating all I/Os equally is a mismatch with the applications' diverse needs]

SLIDE 3

Background: vPFS

 Proxy-based interposition of application data requests

  • Transparent to applications, supports different setups

 Proportional I/O bandwidth scheduling using SFQ(D)

  • Work-conserving, strong fairness

[Figure: vPFS architecture, with an SFQ(D) proxy interposed between HPC applications 1 through n and the PFS]

[Charts: App1 throughput (MB/s) and achieved vs. target bandwidth ratios. Write vs. Read: 1.99:1, 3.97:1, 8.10:1, 16.21:1, 32.73:1; Write vs. Random R/W: 2.02:1, 3.95:1, 8.01:1, 16.01:1, 31.34:1, for target ratios 2:1, 4:1, 8:1, 16:1, 32:1]

Limitations?

SLIDE 4

Limitations

 Lack of isolation between large and small workloads
 SFQ(D): start-time fair queueing with I/O depth D (see the sketch below)

  • Start times capture each flow's service usage
  • Dispatch requests in increasing order of their start times
  • D captures the available I/O parallelism
  • Allow up to D outstanding requests

[Figure: SFQ(D) with D = 10 serving flows f1, f2, ..., fn. In theory at most 10 I/Os are outstanding, but in practice 14 were used, causing interference between flows]
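For concreteness, a minimal sketch of a start-time fair queueing scheduler with depth D, assuming a per-flow weight and a byte-sized request cost; the class and method names (Request, SFQD) are illustrative assumptions, not the vPFS code:

```python
# Minimal sketch of SFQ(D): start-time fair queueing with depth D.
import heapq
from collections import namedtuple

Request = namedtuple("Request", ["size"])   # hypothetical request record

class SFQD:
    def __init__(self, depth):
        self.depth = depth          # D: bound on outstanding requests
        self.outstanding = 0
        self.vtime = 0.0            # virtual time = start tag of the last dispatched request
        self.last_finish = {}       # per-flow finish tag of the most recently queued request
        self.queue = []             # heap of (start_tag, seq, flow, request)
        self.seq = 0

    def enqueue(self, flow, request, weight=1.0):
        # The start tag captures the flow's service usage so far.
        start = max(self.vtime, self.last_finish.get(flow, 0.0))
        self.last_finish[flow] = start + request.size / weight
        heapq.heappush(self.queue, (start, self.seq, flow, request))
        self.seq += 1

    def dispatch(self):
        # Issue queued requests in increasing start-tag order, up to D outstanding.
        issued = []
        while self.queue and self.outstanding < self.depth:
            start, _, flow, request = heapq.heappop(self.queue)
            self.vtime = start
            self.outstanding += 1
            issued.append((flow, request))
        return issued

    def complete(self, count=1):
        # Called when requests finish at the storage server, freeing depth.
        self.outstanding -= count
```

Because the depth only counts requests, a flow issuing large requests can consume far more device service than a flow issuing small ones under the same D, which is the interference illustrated above.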

SLIDE 5

Limitations

 Lack of metadata I/O scheduling
 Many HPC applications are metadata-intensive

  • Metadata I/O performance is important


SLIDE 6

Solution: vPFS+

 SFQ(D)+

  • A new scheduler to support diverse I/O sizes

 Metadata I/O management

  • An extension to support distributed scheduling of metadata requests

 PVFS2-based real prototype

 Comprehensive experimental evaluation

SLIDE 7

SFQ(D)+: Variable-Cost I/O Depth Allocation

 Allocate the limited I/O depth D to outstanding requests based on their sizes (see the sketch below)

  • Consider D as the number of available I/O slots
  • Each slot represents the cost of the smallest I/Os
  • Each outstanding request occupies one or multiple slots based on its size
  • Stop dispatching when D is used up

 Effectively protects small I/O workloads

  • Low-rate I/Os wait less for large outstanding I/Os to complete
  • Small I/Os are less affected by large I/Os after being dispatched

[Figure: slot-based dispatch with D = 10 over flows f1, f2, ..., fn; exactly 10 slots are used]
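As an illustration of the slot idea above, a minimal sketch in which D is treated as a pool of slots, each worth one smallest-size I/O; the 64 KB slot size and the (flow, size) queue format are assumptions for illustration only:

```python
# Minimal sketch of SFQ(D)+ variable-cost depth allocation.
import math

MIN_IO_SIZE = 64 * 1024        # assumed size represented by one slot

def slots_needed(size, min_io_size=MIN_IO_SIZE):
    """Number of slots a request of `size` bytes occupies."""
    return max(1, math.ceil(size / min_io_size))

def dispatch_slots(queue, depth):
    """Dispatch (flow, size) requests, already ordered by SFQ start tags,
    until the D slots are exhausted (no backfilling here)."""
    free_slots = depth
    issued = []
    while queue:
        flow, size = queue[0]
        cost = slots_needed(size)
        if cost > free_slots:
            break                  # head request does not fit: stop dispatching
        queue.pop(0)
        free_slots -= cost
        issued.append((flow, size, cost))
    return issued
```

With a 64 KB slot, for example, a 256 KB request occupies 4 slots while a 64 KB request occupies 1, so a burst of large requests can no longer monopolize the depth.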

SLIDE 8

SFQ(D)+: I/O Backfilling

 Large I/Os at the head of the queue have to wait until there are enough slots

  • Wastes the currently available slots

 Backfilling promotes small I/Os to utilize the available slots (sketched below)

  • Similar to the backfilling of small jobs in batch scheduling

[Figure: with D = 10, the large request at the head of the queue does not fit, so only 9 slots are used and 1 is wasted; backfilling promotes a smaller request queued behind it (tagged t1, later than the head's t0) so all 10 slots are used]
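A minimal sketch of the backfilling step, extending the slot-based dispatch sketched above: when the head request needs more slots than remain, smaller requests behind it are promoted so the free slots are not wasted. Starvation control for the skipped large request is omitted for brevity:

```python
# Minimal backfilling sketch on top of slot-based dispatch.
def dispatch_with_backfill(queue, depth, slots_needed):
    free_slots = depth
    issued = []
    i = 0
    while i < len(queue) and free_slots > 0:
        flow, size = queue[i]
        cost = slots_needed(size)
        if cost <= free_slots:
            queue.pop(i)           # dispatch this request; do not advance i
            free_slots -= cost
            issued.append((flow, size, cost))
        else:
            i += 1                 # too large for the remaining slots; try the next request
    return issued
```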

SLIDE 9

Metadata I/O Scheduling

 Extends the scheduling to both data and metadata requests

  • Apply SFQ(D)+ to schedule metadata I/Os on each server
  • Treat metadata I/Os as small I/Os

 Achieve total-metadata-service fair sharing for distributed metadata servers

  • Coordinate scheduling across distributed metadata servers
  • Each scheduler adjusts its scheduling of local metadata requests based on the global metadata service distribution (see the sketch below)
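A hedged sketch of how this coordination might look, assuming servers periodically exchange per-application metadata service counters; the weight-scaling policy below is illustrative only, not the published vPFS+ protocol:

```python
# Illustrative adjustment of local scheduling weights from global service counters.
def adjust_local_weights(local_weights, global_service, target_shares):
    """Deprioritize, on this server, applications that exceed their global share."""
    total = sum(global_service.values())
    if total == 0:
        return dict(local_weights)
    adjusted = {}
    for app, weight in local_weights.items():
        used_share = global_service.get(app, 0) / total   # fraction of global metadata service used
        target = target_shares.get(app, 0.0)               # application's fair share
        if used_share > target > 0:
            adjusted[app] = weight * (target / used_share)  # scale down apps above their share
        else:
            adjusted[app] = weight
    return adjusted
```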


SLIDE 10

Evaluation

 Testbed

  • vPFS+ implemented for PVFS2
  • 8 clients & 8 servers, 1 Gigabit switch

 Workloads

  • IOR: intensive checkpointing I/Os
  • multi-md-test: intensive metadata I/Os
  • BTIO: scientific application benchmark
  • WRF: real-world scientific application


SLIDE 11

BTIO vs. IOR

 BTIO: Class C (4 MB to 16 MB I/Os), Class A (320 B I/Os)

 vPFS+ substantially reduces BTIO's slowdown

[Chart: annotations read "Slows down IOR by 99%" and "Slows down IOR by 56%"]

SLIDE 12

WRF vs. IOR

 WRF: a large number of small I/Os and intensive metadata requests

 vPFS+ achieves 80% and 281% better performance for WRF than Native and vPFS, respectively


SLIDE 13

Metadata I/O Scheduling

 multi-md-test: mktestdir, create, write, readdir, read, close, rm, rmtestdir

 vPFS+ achieves nearly perfect fairness despite dynamic metadata demands for two metadata-intensive apps


SLIDE 14

Conclusions

 I/O diversity is becoming a top concern

  • Different types of requests (POSIX vs. MPI-IO, data vs. metadata)
  • Different I/O rates and sizes

 vPFS+ manages I/O performance for diverse apps

  • SFQ(D)+ recognizes the variable cost of different I/Os and keeps it under control
  • Distributed metadata scheduling supports metadata-intensive applications


SLIDE 15

Future Work

 Implement SFQ(D)+ directly in the data/metadata servers

  • Proxy-based scheduling may incur extra latency
  • But its impact on throughput is small (< 1%)

 Evaluate vPFS+ in larger and more diverse environments

  • Performance isolation is even more important on larger systems with more diverse workloads
  • Faster storage does not eliminate the need for performance isolation: the gap between processor and I/O performance is still increasing


SLIDE 16

Acknowledgement


 National Science Foundation

  • CNS-1629888, CNS-1619653, CNS-1562837, CMMI-1610282, IIS-1633381

 VISA Lab @ ASU

 Thank you!