

SLIDE 1

Virtualization-based Bandwidth Management for Parallel Storage Systems

Yiqi Xu, Lixi Wang, Dulcardo Arteaga, Dr. Ming Zhao
School of Computing and Information Sciences, Florida International University, Miami, FL

Yonggang Liu, Dr. Renato Figueiredo
Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL

SLIDE 2

Motivation

 The lack of QoS differentiation in HPC storage systems

  Unable to recognize different application I/O workloads
  Unable to satisfy users’ different I/O performance needs


[Figure: compute nodes running APP1 through APPn issue generic parallel I/Os to shared storage nodes]

SLIDE 3

Motivation

 The need for different I/O QoS from HPC applications

  Diverse I/O demands and performance requirements
  Examples:
    WRF: hundreds of MBs of inputs and outputs
    mpiBLAST: GBs of input databases
    S3D: GBs of restart files on a regular basis

 This mismatch will become even more serious in future ultra-scale HPC systems


SLIDE 4

Objective


 Problem: lack of per-application I/O bandwidth allocation
  Static partitioning of storage nodes is inflexible
  Partitioning based on compute nodes is insufficient

 Proposed solution: per-application storage resource allocation
  Parallel file system (PFS) virtualization
  Per-application virtual PFSes

SLIDE 5

Outline

 Background
 Design
 Implementation
 Evaluation


SLIDE 6

Proxy-based PFS Virtualization (Before)

[Figure: HPC applications 1 and 2 on the compute nodes issue generic parallel I/Os to a single PFS shared across the data servers]

 Storage nodes are shared without any isolation
 No distinction of I/Os from different applications
 Lack of native scheduling support for bandwidth allocation


SLIDE 7

Proxy-based PFS Virtualization (After)

[Figure: a proxy on each data server interposes between the applications and the PFS; requests from HPC applications 1 and 2 are placed in Queue1 and Queue2, which realize per-application Virtual PFS1 and Virtual PFS2]

 Indirection of application I/O access
 Creation of per-application virtual PFSes
  Dynamically spawned on the server

SLIDE 8

Virtualization Benefits and Costs

 Benefits

  Enables various scheduling algorithms
    SFQ(D) – proportional-sharing algorithm
    EDF – deadline-based scheduling
  Transparent to existing PFSes
    No changes to existing implementations needed
    Supports different parallel storage systems

 Costs

  Overhead involved in the user-level proxy
  Extra processing of data and communication


SLIDE 9

Prototype Implementation

 A PVFS 2.8.2 (Parallel Virtual File System) proxy
  Deployed on every data server
  Intercepts and forwards PVFS2 messages
  Asynchronous I/O for lower overhead
  Identifies I/Os from different applications
  Dynamically configured by a configuration file (a hypothetical example follows)
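The deck notes only that the proxy is driven by a configuration file; the sketch below is a purely hypothetical example of what such a file might contain (the format, the application-to-client mapping, and the weight fields are all assumptions, not the proxy's actual syntax):

    # hypothetical proxy.conf -- not the actual PVFS2 proxy format
    depth = 4                  # SFQ(D) dispatch depth D (assumed tunable)
    [app1]
    clients = 10.0.0.0/24      # I/Os from these clients belong to app1
    weight  = 16               # proportional share used by SFQ(D)
    [app2]
    clients = 10.0.1.0/24
    weight  = 1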

 Proxy implements SFQ(D) scheduling
  Supports a generic scheduling interface for other algorithms (see the sketch below)
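To make the "generic scheduling interface" concrete, here is a minimal Python sketch (the real proxy is written in C; the class name and the three hooks are illustrative assumptions, not the proxy's actual API):

    import abc

    class IOScheduler(abc.ABC):
        """Hypothetical pluggable scheduler interface used by the proxy."""

        @abc.abstractmethod
        def enqueue(self, app_id, request):
            """Accept an intercepted I/O request from application app_id."""

        @abc.abstractmethod
        def dispatch(self):
            """Return the next request to forward to the PVFS2 server,
            or None if nothing may be issued right now."""

        @abc.abstractmethod
        def complete(self, request):
            """Notification that a dispatched request has finished."""

Any algorithm implementing these three hooks (SFQ(D), EDF, ...) could then be swapped in without touching the interception logic.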


SLIDE 10

PVFS2 Background


[Figure: PVFS2 software stack. Client side: Application → MPI-IO or Linux VFS → System Interface → Job Interface → BMI/Flows. Server side (PVFS2 Server): State Machines → Job Interface → BMI/Flows/Trove. Clients and servers communicate over the interconnection network (TCP)]

SLIDE 11

PVFS2 Proxy

[Figure: three HPC applications on the compute nodes reach two data servers and one metadata server; the proxy on each server exposes per-application Virtual PFS1, Virtual PFS2, and Virtual PFS3]

 Non-I/O messages are not scheduled
 Extra processing for I/O messages at the proxy (sketched below)
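A minimal, self-contained sketch of this split between scheduled and unscheduled messages (Python for illustration; the message fields and per-application queues are assumptions about the design, not the proxy's actual data structures):

    from collections import deque

    class Message:
        """Toy stand-in for an intercepted PVFS2 message."""
        def __init__(self, app, is_io):
            self.app = app        # originating application
            self.is_io = is_io    # True for read/write, False otherwise

    def handle_message(msg, queues, forward):
        """Non-I/O messages bypass scheduling; I/O messages are queued per
        application. A queue is created on first use, mirroring how a
        virtual PFS is spawned dynamically."""
        if not msg.is_io:
            forward(msg)          # e.g., metadata ops: forwarded immediately
        else:
            queues.setdefault(msg.app, deque()).append(msg)

    queues = {}
    handle_message(Message("app1", is_io=True), queues, lambda m: None)
    handle_message(Message("app1", is_io=False), queues, lambda m: None)
    assert len(queues["app1"]) == 1   # only the I/O request was queued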


SLIDE 12

SFQ – Start-Time Fair Queueing

 SFQ

  Originally designed for network packet scheduling
  Work-conserving, proportional-sharing-based scheduling
  Adapts to variation in server capacity

 SFQ(D)

  Extended from SFQ
  Adds a depth parameter D to allow and control concurrency
  Multiplexes storage bandwidth and enhances utilization (see the sketch below)
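A minimal, self-contained sketch of SFQ(D)'s tag bookkeeping (Python for illustration; the proxy's real implementation is in C, and taking the request size as the service cost is an assumption):

    import heapq

    class SFQD:
        """Sketch of SFQ(D): start/finish tags per Goyal et al. [2],
        with a dispatch depth D per Jin et al. [4]."""

        def __init__(self, depth, weights):
            self.depth = depth                       # D: max outstanding requests
            self.weights = dict(weights)             # app -> share weight phi
            self.finish = {a: 0.0 for a in weights}  # last finish tag per app
            self.vtime = 0.0                         # virtual time v(t)
            self.queue = []                          # min-heap keyed on start tag
            self.outstanding = 0
            self._seq = 0                            # FIFO tie-breaker

        def enqueue(self, app, cost, request):
            # Start tag: max(v(t) at arrival, previous finish tag of this app);
            # finish tag: start + cost / weight.
            start = max(self.vtime, self.finish[app])
            self.finish[app] = start + cost / self.weights[app]
            heapq.heappush(self.queue, (start, self._seq, app, request))
            self._seq += 1

        def dispatch(self):
            # Issue the pending request with the minimum start tag, keeping
            # at most D requests outstanding at the storage server.
            if self.outstanding >= self.depth or not self.queue:
                return None
            start, _, app, request = heapq.heappop(self.queue)
            self.vtime = start     # v(t) = start tag of last dispatched request
            self.outstanding += 1
            return app, request

        def complete(self):
            self.outstanding -= 1  # a slot frees up for the next dispatch

With weights {app1: 2, app2: 1} and equal request costs, app1's finish tags advance half as fast, so its requests are dispatched twice as often while both queues stay backlogged, yielding a 2:1 bandwidth split.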


SLIDE 13

Evaluation

 A virtual-machine-based testbed
  A cluster of 8 Dell PowerEdge 2970 servers
  Ubuntu 8.04, kernel 2.6.18.8
  Up to 64 PVFS2 clients and 4 PVFS2 servers (version 2.8.2)
  2 competing parallel applications

 Benchmark: IOR version 2.10.3 with sequential writes (an illustrative invocation follows the list)
 Proxy implements performance monitoring
 Experiments:
  Virtualization overhead
  Effectiveness of proportional sharing
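The deck gives only IOR's version and the workload type, so the command below is an illustrative guess at a comparable run (the process count and all flag values are assumptions about this experiment, though -a/-w/-t/-b/-o are standard IOR options):

    # 32 client processes doing sequential writes with 256 KB transfers
    mpirun -np 32 ./IOR -a MPIIO -w -t 256k -b 64m -o /pvfs2/ior.testfile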


SLIDE 14

Virtualization Overhead


 Throughput overhead is small
 About 3% proxy CPU and 1 MB RAM usage on each node

[Chart: native vs. virtualized throughput (MB/s) for client:server configurations 16:1, 32:2, and 64:4; the virtualization overhead is 2.9%, 1.6%, and 2.3%, respectively]

SLIDE 15

Proportional Sharing in a Symmetric Setup


[Figure: symmetric setup; two applications with 32 clients each share 4 data servers through the proxy's Queue1/Virtual PFS1 and Queue2/Virtual PFS2]

SLIDE 16

Proportional Sharing with Varying Ratios


[Chart: app1 vs. app2 throughput (MB/s) for desired ratios 2:1, 4:1, 8:1, and 16:1; the achieved ratios are 1.90:1, 3.32:1, 4.33:1, and 5.84:1]

 Good proportional sharing can be achieved
 The actual ratio falls short as the desired ratio grows, likely because the work-conserving scheduler backfills with app2's requests whenever app1's queue momentarily drains

SLIDE 17

Proportional Sharing with Smaller I/Os

Request size reduced from 256 KB/server to 64 KB/server.

[Chart: app1 vs. app2 throughput (MB/s) for desired ratios 2:1, 4:1, 8:1, and 16:1; the achieved ratios improve to 1.96:1, 3.74:1, 6.33:1, and 10.13:1]

 Increasing the request rate improves the actual ratio
 It can be further improved by increasing the number of clients

SLIDE 18

Proportional Sharing in an Asymmetric Setup


[Figure: asymmetric setup; HPC application 1 with m clients and HPC application 2 with n clients share 4 data servers through the proxy's Queue1/Virtual PFS1 and Queue2/Virtual PFS2]

SLIDE 19

Proportional Sharing in an Asymmetric Setup


[Chart: app1 vs. app2 throughput (MB/s) for client counts m:n of 40:20, 48:12, 56:7, and 48:3 with equal shares; the achieved ratios are 1.01:1, 1.03:1, 1.01:1, and 1.23:1]

 Almost perfect per-application fairness is achieved regardless of the client-count asymmetry

SLIDE 20

Conclusions and Future Work

 Conclusions

  Proxy-based PFS virtualization is feasible with negligible overhead
  Effective proportional bandwidth sharing using SFQ(D)

 Future work

  More scheduling algorithms
    Global proportional sharing
    Deadline-based scheduling

  Evaluate with more diverse application access patterns and I/O requirements


SLIDE 21

References

[1] PVFS2. http://www.pvfs.org/pvfs2/
[2] P. Goyal, H. M. Vin, and H. Cheng, "Start-Time Fair Queueing: A Scheduling Algorithm for Integrated Services Packet Switching Networks," IEEE/ACM Trans. Networking, vol. 5, no. 5, 1997.
[3] Y. Wang and A. Merchant, "Proportional-Share Scheduling for Distributed Storage Systems," Proc. 5th USENIX Conference on File and Storage Technologies (FAST '07), 2007.
[4] W. Jin, J. S. Chase, and J. Kaur, "Interposed Proportional Sharing for a Storage Service Utility," SIGMETRICS, 2004.
[5] IOR HPC Benchmark. http://sourceforge.net/projects/ior-sio/
[6] P. Welsh and P. Bogenschutz, "Weather Research and Forecast (WRF) Model: Precipitation Prognostics from the WRF Model during Recent Tropical Cyclones," Interdepartmental Hurricane Conference, 2005.
[7] A. Darling, L. Carey, and W. Feng, "The Design, Implementation, and Evaluation of mpiBLAST," ClusterWorld Conference and Expo, 2003.
[8] R. Sankaran et al., "Direct Numerical Simulations of Turbulent Lean Premixed Combustion," Journal of Physics: Conference Series, 2006.


SLIDE 22

Acknowledgement

 Research team

 VISA lab at FIU

 Yiqi Xu, Dulcardo Clavijo, Lixi Wang, Dr. Ming Zhao

 ACIS lab at UF

 Yonggang Liu, Dr. Renato Figueiredo

 Industry collaborator

 Dr. Seetharami Seelam (IBM T.J. Watson)

 Sponsor: NSF HECURA CCF-0937973/CCF-0938045
 More information: http://visa.cis.fiu.edu/hecura


Thank You!