Virtualization-based Bandwidth Management for Parallel Storage Systems

Yiqi Xu, Lixi Wang, Dulcardo Arteaga, Dr. Ming Zhao
School of Computing and Information Sciences, Florida International University, Miami, FL

Yonggang Liu, Dr. Renato Figueiredo
Department of Electrical and Computer Engineering, University of Florida
Unable to recognize different application I/O workloads
Unable to satisfy users' different I/O performance needs
Figure: applications APP1 through APPn on the compute nodes issue generic parallel I/Os to the shared storage nodes.
PDSW 2010
Diverse I/O demands and performance requirements. Examples:
WRF: hundreds of MBs of inputs and outputs
mpiBLAST: GBs of input databases
S3D: GBs of restart files on a regular basis
Static partitioning of storage nodes is inflexible
Compute-node-based partitioning is insufficient
Proposed approach: parallel file system (PFS) virtualization
Per-application virtual PFSes
Figure: without virtualization, HPC applications 1 and 2 on the compute nodes share a single PFS across the data servers, issuing generic parallel I/Os.
Figure: with virtualization, a proxy on each data server presents per-application virtual PFSes (Virtual PFS1, Virtual PFS2) on top of the shared PFS, each with its own queue (Queue1, Queue2).
Enables various scheduling algorithms
SFQ(D): proportional-sharing algorithm
EDF: deadline-based scheduling
Transparent to existing PFSes
No change to existing implementations needed
Supports different parallel storage systems
Overhead involved in the user-level proxy
Extra processing of data and communication
Deployed on every data server
Intercepts and forwards PVFS2 messages
Asynchronous I/O for less overhead
Identifies I/Os from different applications
Dynamically configured by a configuration file
Supports a generic scheduling interface for other scheduling algorithms
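The paper's proxy sits in each PVFS2 data server's request path; as a rough illustration of the interception-and-classification idea only (not the actual implementation, which is not Python), the sketch below relays client connections to the real server and tags each connection with an application ID. The class name, `APP_MAP`, and the address prefixes are all invented for illustration; the configuration-file-driven mapping mirrors the "dynamically configured" point above.

```python
import asyncio

# Hypothetical mapping, as if loaded from the proxy's configuration file:
# client IP prefix -> application ID (i.e., which virtual PFS it belongs to).
APP_MAP = {"10.0.1.": "app1", "10.0.2.": "app2"}

class PFSProxy:
    """Minimal sketch of a per-server user-level proxy: accept client
    connections, classify each by application, and relay bytes to the
    real PFS server asynchronously."""

    def __init__(self, server_host, server_port):
        self.server_host, self.server_port = server_host, server_port

    def classify(self, peer_ip):
        """Map a client address to its application (virtual PFS)."""
        for prefix, app in APP_MAP.items():
            if peer_ip.startswith(prefix):
                return app
        return "default"

    async def handle(self, reader, writer):
        peer_ip = writer.get_extra_info("peername")[0]
        app = self.classify(peer_ip)  # tag I/Os with their application
        up_reader, up_writer = await asyncio.open_connection(
            self.server_host, self.server_port)
        # Relay both directions; a real scheduler would enqueue requests
        # per application here and dispatch them by policy (e.g. SFQ(D)).
        await asyncio.gather(self.pipe(reader, up_writer),
                             self.pipe(up_reader, writer))

    async def pipe(self, src, dst):
        while data := await src.read(65536):
            dst.write(data)
            await dst.drain()
        dst.close()
```

The asynchronous relay loop is where the "asynchronous I/O for less overhead" point applies: the proxy never blocks on one application's transfer while others are pending.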
Figure: PVFS2 software stack. The application (via MPI-IO or the Linux VFS) calls the client's System Interface and Job Interface over BMI/Flows; the interconnection network (TCP) carries requests to the server's State Machines, Job Interface, and BMI/Flows/Trove layers.
Figure: three HPC applications on the compute nodes each see their own virtual PFS (Virtual PFS1, PFS2, PFS3), created by the proxy on each server; the underlying PFS comprises data servers and a meta-data server.
SFQ: originally designed for network packet scheduling
Work-conserving, proportional-sharing scheduling
Adapts to variation in server capacity
SFQ(D): extended from SFQ
Adds a depth parameter D to allow and control concurrency
Multiplexes storage bandwidth and enhances utilization
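As a concrete illustration of the mechanism described above, here is a minimal Python sketch of SFQ(D)-style scheduling: each request gets a start tag and a finish tag scaled by its flow's weight, requests dispatch in increasing start-tag order, and at most D requests are outstanding at once. Class and field names are my own, and the sketch omits refinements of the full algorithm (e.g. virtual-time handling for flows that go idle and return).

```python
import heapq

class SFQD:
    """Sketch of SFQ(D): start-time fair queuing with depth D.

    Requests are dispatched in increasing start-tag order while at most
    D requests are concurrently in service at the storage back end.
    """

    def __init__(self, depth, weights):
        self.depth = depth                 # D: max concurrent requests
        self.weights = weights             # flow id -> weight (share)
        self.vtime = 0.0                   # start tag of last dispatch
        self.last_finish = {f: 0.0 for f in weights}
        self.queue = []                    # min-heap keyed by (start, seq)
        self.in_service = 0
        self.seq = 0                       # FIFO tie-breaker

    def enqueue(self, flow, cost, req):
        # Start tag: no earlier than the virtual time, and no earlier
        # than the finish tag of this flow's previous request.
        start = max(self.vtime, self.last_finish[flow])
        self.last_finish[flow] = start + cost / self.weights[flow]
        heapq.heappush(self.queue, (start, self.seq, flow, req))
        self.seq += 1

    def dispatch(self):
        """Pop the next request for the back end, or None if the depth
        is exhausted or no request is queued."""
        if self.in_service >= self.depth or not self.queue:
            return None
        start, _, flow, req = heapq.heappop(self.queue)
        self.vtime = start                 # advance virtual time
        self.in_service += 1
        return flow, req

    def complete(self):
        """Called when the back end finishes a request, freeing a slot."""
        self.in_service -= 1
```

With weights 1:3 and equal-cost requests from two backlogged flows, the finish tags of the heavier flow advance three times more slowly, so it is dispatched roughly three times as often, which is exactly the proportional-sharing behavior the experiments below measure.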
A cluster of 8 Dell PowerEdge 2970 servers
Ubuntu 8.04, kernel 2.6.18.8
Up to 64 PVFS2 clients and 4 PVFS2 servers (version 2.8.2)
2 competing parallel applications
Evaluated: virtualization overhead; effectiveness of proportional sharing
Figure: aggregate throughput of virtual vs. native PVFS2 at client:server ratios of 16:1, 32:2, and 64:4.
Figure: proportional-sharing experiment setup; two applications share the PFS through the proxy, each with its own virtual PFS (Virtual PFS1, Virtual PFS2) and queue (Queue1, Queue2).
Figure: per-application throughput (app1, app2) versus the desired sharing ratio at client:server ratios of 2:1, 4:1, 8:1, and 16:1; previous request size 256KB/server, current request size 64KB/server.
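For reference, the target split in these experiments is simple arithmetic: when every application stays backlogged, a proportional-sharing scheduler should give application i the fraction w_i / Σw of the aggregate bandwidth. A small sketch (the function name and the 100 MB/s figure are mine, for illustration only):

```python
def expected_shares(weights, capacity_mb_s):
    """Ideal proportional split: each backlogged application i
    receives capacity * w_i / sum(weights)."""
    total = sum(weights.values())
    return {app: capacity_mb_s * w / total for app, w in weights.items()}

# With a 4:1 desired ratio and 100 MB/s of aggregate bandwidth:
shares = expected_shares({"app1": 4, "app2": 1}, 100.0)
# shares == {"app1": 80.0, "app2": 20.0}
```

The plots above compare the measured app1/app2 throughputs against exactly these ideal shares for each desired ratio.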
Figure: experiment setup with a varying number of clients; two applications share the PFS through the proxy's Virtual PFS1 and Virtual PFS2, each with its own queue (Queue1, Queue2).
Figure: throughput of app1 and app2 as the ratio of App1 clients (m) to App2 clients (n) varies.
Proxy-based PFS virtualization is feasible with negligible overhead
Effective proportional bandwidth sharing using SFQ(D)
Future work:
More scheduling algorithms
Global proportional sharing
Deadline-based scheduling
Evaluate on applications' access patterns and I/O demands
VISA lab at FIU
Yiqi Xu, Dulcardo Clavijo, Lixi Wang, Dr. Ming Zhao
ACIS lab at UF
Yonggang Liu, Dr. Renato Figueiredo
Industry collaborator
Dr. Seetharami Seelam (IBM T.J. Watson)