

1. Virtualization-based Bandwidth Management for Parallel Storage Systems
   Yiqi Xu, Lixi Wang, Dulcardo Arteaga, Dr. Ming Zhao
   School of Computing and Information Sciences, Florida International University, Miami, FL
   Yonggang Liu, Dr. Renato Figueiredo
   Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL

2. Motivation
   - The lack of QoS differentiation in HPC storage systems
     - Unable to recognize different application I/O workloads
     - Unable to satisfy users' different I/O performance needs
   [Figure: applications APP1 … APPn on the compute nodes issue parallel I/Os to shared, generic storage nodes]

3. Motivation
   - The need for different I/O QoS from HPC applications
     - Diverse I/O demands and performance requirements
     - Examples:
       - WRF: hundreds of MBs of inputs and outputs
       - mpiBLAST: GBs of input databases
       - S3D: GBs of restart files on a regular basis
   - This mismatch will become even more serious in future ultra-scale HPC systems

4. Objective
   - Problem: lack of per-application I/O bandwidth allocation
     - Static partitioning of storage nodes is inflexible
     - Compute-node-based partitioning is insufficient
   - Proposed solution: per-application storage resource allocation
     - Parallel file system (PFS) virtualization
     - Per-application virtual PFSes

5. Outline
   - Background
   - Design
   - Implementation
   - Evaluation

6. Proxy-based PFS Virtualization (Before)
   [Figure: HPC applications 1 and 2 on the compute nodes issue parallel I/Os directly to a generic PFS on the data servers]
   - Storage nodes are shared without any isolation
     - No distinction of I/Os from different applications
     - Lack of native scheduling support for bandwidth allocation

7. Proxy-based PFS Virtualization (After)
   [Figure: a PFS proxy on each data server places the I/Os of HPC applications 1 and 2 into separate queues, forming Virtual PFS1 and Virtual PFS2]
   - Indirection of application I/O access
   - Creation of per-application virtual PFS
     - Dynamically spawned on the server

8. Virtualization Benefits and Costs
   - Benefits
     - Enable various scheduling algorithms
       - SFQ(D): proportional-sharing algorithm
       - EDF: deadline-based scheduling
     - Transparent to the existing PFSes
       - No change to the existing implementation needed
       - Support for different parallel storage systems
   - Costs
     - Overhead of the user-level proxy
       - Extra processing of data and communication

9. Prototype Implementation
   - A PVFS 2.8.2 (Parallel Virtual File System) proxy
     - Deployed on every data server
     - Intercepts and forwards PVFS2 messages
       - Asynchronous I/O for lower overhead
     - Identifies I/Os from different applications
       - Dynamically configured by a configuration file
   - Proxy implements SFQ(D) scheduling
     - Supports a generic scheduling interface for other algorithms (see the sketch below)
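The slides do not show the proxy's actual API; purely as an illustration, a "generic scheduling interface" of the kind described could look like the following C sketch. All names here are hypothetical and are not taken from the real PVFS2 proxy code.

```c
/* Hypothetical sketch of a pluggable scheduler interface for the proxy.
 * Each intercepted PVFS2 I/O message is handed to the active scheduler,
 * which decides when the proxy may forward it to the real PVFS2 server. */
struct io_request;               /* an intercepted PVFS2 I/O message */

struct io_scheduler {
    /* Queue a request belonging to application app_id. */
    void (*enqueue)(struct io_scheduler *sched,
                    struct io_request *req, int app_id);

    /* Return the next request to forward to the server, or NULL if
     * nothing may be dispatched yet (e.g., the depth limit is reached). */
    struct io_request *(*dispatch)(struct io_scheduler *sched);

    /* Notify the scheduler that the server completed a request. */
    void (*complete)(struct io_scheduler *sched, struct io_request *req);

    void *private_state;         /* per-algorithm data: SFQ(D), EDF, ... */
};
```

With such an interface, SFQ(D) is just one implementation of the three callbacks, and a deadline-based scheduler could be swapped in without touching the interception and forwarding logic.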

10. PVFS2 Background
   [Figure: PVFS2 software stack. On the client, the application reaches the System Interface through MPI-IO or the Linux VFS, then the Job Interface and BMI/Flows; requests travel over the interconnection network (TCP) to the PVFS2 server, whose state machines use the Job Interface and BMI/Flows/Trove]

11. PVFS2 Proxy
   [Figure: on each data server, the proxy queues the data (I/O) messages of HPC applications 1-3 into per-application virtual PFSes (Virtual PFS1-3), while metadata traffic passes to the PFS unscheduled]
   - Non-I/O messages are not scheduled
   - Extra processing for I/O messages at the proxy

12. SFQ: Start-Time Fair Queuing
   - SFQ
     - Originally designed for network packet scheduling
     - Work-conserving, proportional-sharing scheduling
     - Adapts to variation in server capacity
   - SFQ(D) (sketched below)
     - Extended from SFQ
     - Adds a depth parameter D to allow and control concurrency
     - Multiplexes storage bandwidth and enhances utilization
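To make the algorithm concrete, here is a minimal, self-contained sketch of SFQ(D)-style tagging and dispatch, assuming per-application weights and request size as the cost metric. It illustrates the algorithm described by Goyal et al. [2] and Jin et al. [4]; it is not the proxy's actual code, and the names and constants are illustrative.

```c
#include <stddef.h>

#define NFLOWS 2     /* two competing applications, as in the evaluation */
#define DEPTH  4     /* D: maximum requests outstanding at the server    */

struct request {
    int    flow;        /* which application (virtual PFS) issued it            */
    double cost;        /* e.g., request size in KB                             */
    double start_tag;   /* S(r) = max(v, finish tag of the flow's previous req) */
    double finish_tag;  /* F(r) = S(r) + cost / weight(flow)                    */
};

static double weight[NFLOWS] = { 2.0, 1.0 };  /* desired 2:1 sharing          */
static double last_finish[NFLOWS];            /* per-flow previous finish tag */
static double vtime;                          /* virtual time v               */
static int    outstanding;                    /* requests currently at server */

/* Assign SFQ start/finish tags to a newly arrived request. */
static void tag_request(struct request *r)
{
    double prev = last_finish[r->flow];
    r->start_tag  = (vtime > prev) ? vtime : prev;
    r->finish_tag = r->start_tag + r->cost / weight[r->flow];
    last_finish[r->flow] = r->finish_tag;
}

/* SFQ(D) dispatch: among the queued requests, pick the one with the smallest
 * start tag, but only while fewer than DEPTH requests are outstanding.
 * The caller removes the returned request from the queue and decrements
 * `outstanding` when the server completes it. */
static struct request *dispatch(struct request *queue[], int n)
{
    if (n == 0 || outstanding >= DEPTH)
        return NULL;
    int best = 0;
    for (int i = 1; i < n; i++)
        if (queue[i]->start_tag < queue[best]->start_tag)
            best = i;
    vtime = queue[best]->start_tag;   /* virtual time advances on dispatch */
    outstanding++;
    return queue[best];
}
```

The depth D is what distinguishes SFQ(D) from plain SFQ: it keeps several requests in flight at the storage server to sustain utilization while still bounding how far one application can run ahead of its fair share.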

13. Evaluation
   - A virtual machine based testbed
     - A cluster of 8 Dell PowerEdge 2970 servers
     - Ubuntu 8.04, kernel 2.6.18.8
     - Up to 64 PVFS2 clients and 4 PVFS2 servers (version 2.8.2)
   - 2 competing parallel applications
     - Benchmark: IOR version 2.10.3 with sequential writes
   - Proxy implements performance monitoring
   - Experiments:
     - Virtualization overhead
     - Effectiveness of proportional sharing

14. Virtualization Overhead
   [Figure: virtual vs. native throughput (MB/s) for 16:1, 32:2, and 64:4 client:server configurations; measured virtualization overheads of 1.6%, 2.3%, and 2.9%]
   - Throughput overhead is small
   - About 3% proxy CPU and 1 MB RAM usage on each node

15. Proportional Sharing in a Symmetric Setup
   [Figure: two applications, with 32 clients each, share 4 data servers; the proxy places each application's I/Os into its own queue and virtual PFS]
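For reference (this calculation is not on the slides), under ideal proportional sharing with weights w_1 and w_2 and aggregate bandwidth B, the per-application throughputs and their ratio would be:

```latex
T_1 = \frac{w_1}{w_1 + w_2}\,B, \qquad
T_2 = \frac{w_2}{w_1 + w_2}\,B, \qquad
\frac{T_1}{T_2} = \frac{w_1}{w_2}
```

For instance, if the four servers delivered a hypothetical aggregate of about 30 MB/s, a 2:1 weight assignment would ideally yield roughly 20 MB/s and 10 MB/s. The next two slides report how closely the measured ratios track the desired ones.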

16. Proportional Sharing with Varying Ratios
   [Figure: per-application throughput (MB/s) for desired sharing ratios of 2:1, 4:1, 8:1, and 16:1; achieved ratios of roughly 1.90:1, 3.32:1, 4.33:1, and 5.84:1, respectively]
   - Good proportional sharing can be achieved
   - The actual ratio falls increasingly short of the desired ratio as the desired ratio grows

17. Proportional Sharing with Smaller I/Os
   [Figure: same experiment with the request size reduced from 256 KB/server to 64 KB/server; achieved ratios of roughly 1.96:1, 3.74:1, 6.33:1, and 10.13:1 for desired ratios of 2:1, 4:1, 8:1, and 16:1]
   - Increasing the request rate improves the actual ratio
   - Can be further improved by increasing the number of clients

18. Proportional Sharing in an Asymmetric Setup
   [Figure: application 1 with m clients and application 2 with n clients share 4 data servers through the proxy, each with its own queue and virtual PFS]

19. Proportional Sharing in an Asymmetric Setup
   [Figure: per-application throughput (MB/s) for client counts m:n of 40:20, 48:12, 56:7, and 48:3; the achieved throughput ratios stay close to 1:1 (1.01:1, 1.01:1, 1.03:1, and 1.23:1)]
   - Almost perfect fairness can be achieved

20. Conclusions and Future Work
   - Conclusions
     - Proxy-based PFS virtualization is feasible with negligible overhead
     - Effective proportional bandwidth sharing using SFQ(D)
   - Future work
     - More scheduling algorithms
       - Global proportional sharing
       - Deadline-based scheduling
     - Evaluation with more diverse application access patterns and I/O requirements

21. References
   [1] PVFS2. http://www.pvfs.org/pvfs2/
   [2] P. Goyal, H. M. Vin, and H. Cheng, "Start-Time Fair Queuing: A Scheduling Algorithm for Integrated Services Packet Switching Networks," IEEE/ACM Transactions on Networking, vol. 5, no. 5, 1997.
   [3] Y. Wang and A. Merchant, "Proportional-Share Scheduling for Distributed Storage Systems," Proceedings of the 5th USENIX Conference on File and Storage Technologies (FAST '07), 2007.
   [4] W. Jin, J. S. Chase, and J. Kaur, "Interposed Proportional Sharing for a Storage Service Utility," SIGMETRICS, 2004.
   [5] IOR HPC Benchmark. http://sourceforge.net/projects/ior-sio/
   [6] P. Welsh and P. Bogenschutz, "Weather Research and Forecast (WRF) Model: Precipitation Prognostics from the WRF Model during Recent Tropical Cyclones," Interdepartmental Hurricane Conference, 2005.
   [7] A. Darling, L. Carey, and W. Feng, "The Design, Implementation, and Evaluation of mpiBLAST," ClusterWorld Conference and Expo, 2003.
   [8] R. Sankaran et al., "Direct Numerical Simulations of Turbulent Lean Premixed Combustion," Journal of Physics: Conference Series, 2006.

22. Acknowledgement
   - Research team
     - VISA lab at FIU: Yiqi Xu, Dulcardo Clavijo, Lixi Wang, Dr. Ming Zhao
     - ACIS lab at UF: Yonggang Liu, Dr. Renato Figueiredo
   - Industry collaborator
     - Dr. Seetharami Seelam (IBM T.J. Watson)
   - Sponsor: NSF HECURA CCF-0937973/CCF-0938045
   - More information: http://visa.cis.fiu.edu/hecura
   Thank You!
