SLIDE 1

Experiences in Using OS-level Virtualization for Block I/O

Dan Huang, University of Central Florida
Jun Wang, University of Central Florida
Gary Liu, Oak Ridge National Lab

SLIDE 2

Contents

- Motivation
- Background for Virtualization
- Our Solution: I/O Throttling Middleware
- Evaluations
- Related Work
- Conclusion
- Acknowledgement

SLIDE 3

Contents

- Motivation
- Background for Virtualization
- Our Solution: I/O Throttling Middleware
- Evaluations
- Related Work
- Conclusion
- Acknowledgement

SLIDE 4

Motivation

- Nowadays in HPC, job schedulers such as PBS/TORQUE assign physical nodes exclusively to users for running jobs.
  - Easy configuration through batch scripts
  - Low resource utilization
  - Hard to meet the QoS requirements of interactive and ad-hoc analytics
- Multiple jobs access shared distributed or parallel file systems to load or save data.
  - Interference on the PFS
  - Negative impact on jobs' QoS

SLIDE 5

Resource Consolidation in Cloud Computing

- In data centers, cloud computing has been widely deployed for elastic resource provisioning.
  - High isolation with low mutual interference
- Cloud computing employs various virtualization technologies to consolidate physical resources.
  - Hypervisor-based virtualization: VMware, Xen, KVM
  - OS-level virtualization: Linux Containers (LXC), OpenVZ, Docker

SLIDE 6

Virtualization in HPC

- HPC uses high-end, dedicated nodes to run scientific computing jobs.
- Could an HPC analysis cluster be virtualized with low overhead?
- What type of virtualization should be adopted?
- According to previous studies [1, 2, 3], the overhead of hypervisor-based virtualization is high.
  - Overhead on disk throughput ≈ 36%
  - Overhead on memory throughput ≈ 53%

[1] Nikolaus Huber, Marcel von Quast, Michael Hauck, and Samuel Kounev. Evaluating and modeling virtualization performance overhead for cloud environments. In CLOSER, pages 563-573, 2011.

[2] Stephen Soltesz, Herbert Potzl, Marc E. Fiuczynski, Andy Bavier, and Larry Peterson. Container-based operating system virtualization: a scalable, high-performance alternative to hypervisors. In ACM SIGOPS Operating Systems Review, volume 41, pages 275-287. ACM, 2007.

[3] Miguel G. Xavier, Marcelo Veiga Neves, Fabio D. Rossi, Tiago C. Ferreto, Timoteo Lange, and Cesar A. F. De Rose. Performance evaluation of container-based virtualization for high performance computing environments. In Parallel, Distributed and Network-Based Processing (PDP), 2013 21st Euromicro International Conference on, pages 233-240. IEEE, 2013.
SLIDE 7

Contents

- Motivation
- Background for Virtualization
- Our Solution: I/O Throttling Middleware
- Evaluations
- Related Work
- Conclusion
- Acknowledgement

SLIDE 8

Hypervisor and OS-level Virtualization

- Virtualization technology involves a trade-off between isolation and overhead.
- Hypervisor-based virtualization places a hypervisor (or VM monitor) layer under the guest OS; it introduces high performance overhead and is not acceptable for HPC.
- OS-level (container-based) virtualization is a lightweight layer in the Linux kernel.

SLIDE 9

Hypervisor and OS-level Virtualization (cont.)

SLIDE 10

The Internal Components of OS-level Virtualization

- OS-level virtualization shares the same operating system kernel.
- 1) Control Groups (CGroups)
  - CGroups controls resource usage per process group.
- 2) Linux Namespaces
  - Linux Namespaces creates sets of isolated namespaces, such as PID and network namespaces (a short usage sketch follows below).
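To make the CGroups/namespaces split concrete, here is a minimal sketch (not from the paper) of driving both mechanisms from user space. It assumes a cgroup v1 hierarchy mounted at /sys/fs/cgroup, a reasonably recent util-linux `unshare`, and root privileges; the group name `vnode0` and the `dd` workload are purely illustrative.

```python
import os
import subprocess

CGROUP_ROOT = "/sys/fs/cgroup"   # assumption: cgroup v1 controllers mounted here
GROUP = "vnode0"                 # hypothetical group name for one virtual node

def cgroup_dir(controller):
    """Create (if needed) and return this group's directory under a controller."""
    path = os.path.join(CGROUP_ROOT, controller, GROUP)
    if not os.path.isdir(path):
        os.makedirs(path)
    return path

# 1) CGroups: move the current process into a blkio and a memory group; any
#    children it spawns inherit the membership and are limited as one group.
for controller in ("blkio", "memory"):
    with open(os.path.join(cgroup_dir(controller), "tasks"), "w") as f:
        f.write(str(os.getpid()))

# 2) Namespaces: run a workload in fresh PID and network namespaces, so it sees
#    its own process tree and network stack (isolation, not resource control).
subprocess.check_call(["unshare", "--pid", "--net", "--fork",
                       "dd", "if=/dev/zero", "of=/tmp/vnode_test",
                       "bs=1M", "count=256"])
```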

SLIDE 11

Allocating Block I/O via OS-level Virtualization

There are two methods for allocating block I/O in the CGroups module (see the sketch after this list):
- 1) Throttling functionality
  - Sets an upper limit on a process group's block I/O
- 2) Weight functionality
  - Assigns proportional shares of block I/O to a group of processes
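The two mechanisms map onto two sets of files in the blkio controller: `blkio.throttle.read_bps_device` / `blkio.throttle.write_bps_device` for upper limits (keyed by the device's major:minor number) and `blkio.weight` for proportional shares under the CFQ scheduler. The sketch below is illustrative only; the mount point, the device number 8:0, the group names, and the rates are assumptions, not values from the paper.

```python
import os

BLKIO = "/sys/fs/cgroup/blkio"   # assumption: cgroup v1 blkio controller mounted here
DEVICE = "8:0"                   # assumption: major:minor of the controlled disk (e.g. /dev/sda)

def group_dir(group):
    """Create (if needed) and return the blkio cgroup directory for a VNode."""
    path = os.path.join(BLKIO, group)
    if not os.path.isdir(path):
        os.makedirs(path)
    return path

def set_throttle(group, read_bps, write_bps):
    """1) Throttling: set an upper limit (bytes/sec per device) on the group's block I/O."""
    path = group_dir(group)
    with open(os.path.join(path, "blkio.throttle.read_bps_device"), "w") as f:
        f.write("%s %d" % (DEVICE, read_bps))
    with open(os.path.join(path, "blkio.throttle.write_bps_device"), "w") as f:
        f.write("%s %d" % (DEVICE, write_bps))

def set_weight(group, weight):
    """2) Weight: assign a proportional share (100-1000, CFQ scheduler) to the group."""
    with open(os.path.join(group_dir(group), "blkio.weight"), "w") as f:
        f.write(str(weight))

if __name__ == "__main__":
    # Cap vnode0 at 40 MB/s reads and 20 MB/s writes (illustrative rates).
    set_throttle("vnode0", 40 * 1024 * 1024, 20 * 1024 * 1024)
    # Give vnode1 twice the disk share of vnode2 under contention.
    set_weight("vnode1", 800)
    set_weight("vnode2", 400)
```

In general, throttling gives a hard cap regardless of load, while weights only arbitrate when the device is actually contended; this difference is reflected in the single-node evaluations later in the deck.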

SLIDE 12

Contents

- Motivation
- Background for Virtualization
- Our Solution: I/O Throttling Middleware
- Evaluations
- Related Work
- Conclusion
- Acknowledgement

SLIDE 13

Create Virtual Node (VNode)

SLIDE 14

The Gap Between Virtual Node and PFS

Configuration Gap: The shared I/O resources of a PFS are hard to control with existing resource allocation mechanisms, since the I/O configurations applied on users' VNodes cannot take effect on a remote PFS.
SLIDE 15

The Design of I/O Throttling Middleware

SLIDE 16

The Structure of VNode Sync

VNode Sync (a simplified sketch of this flow follows below):
1) Accepts I/O configurations
2) Applies the I/O configurations to VNodes
3) Intercepts users' I/O request handlers
4) Inserts the handlers into the corresponding VNodes
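The slides do not give the implementation of VNode Sync, so the following is only a heavily simplified sketch of the flow above. Everything concrete here is assumed: the JSON-over-TCP control channel, port 9500, the per-user group naming, the device number, and the premise that a caller has already identified which server processes/threads (handler PIDs) serve each user; only the cgroup file names are real kernel interfaces.

```python
import json
import os
import socket

BLKIO = "/sys/fs/cgroup/blkio"   # assumption: cgroup v1 blkio controller on the storage node
DEVICE = "8:0"                   # assumption: major:minor of the PFS data disk
PORT = 9500                      # hypothetical control port for the middleware

def vnode_dir(user):
    """One blkio cgroup ("VNode") per user on this storage node."""
    path = os.path.join(BLKIO, "vnode_%s" % user)
    if not os.path.isdir(path):
        os.makedirs(path)
    return path

def apply_config(user, read_bps):
    """Steps 1-2: accept an I/O configuration and apply it to the user's VNode."""
    with open(os.path.join(vnode_dir(user), "blkio.throttle.read_bps_device"), "w") as f:
        f.write("%s %d" % (DEVICE, read_bps))

def insert_handlers(user, handler_pids):
    """Steps 3-4: place the intercepted I/O handler processes/threads into the VNode."""
    tasks = os.path.join(vnode_dir(user), "tasks")
    for pid in handler_pids:
        with open(tasks, "w") as f:
            f.write(str(pid))

def serve():
    """Accept {"user": ..., "read_bps": ..., "handler_pids": [...]} messages."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("", PORT))
    srv.listen(5)
    while True:
        conn, _ = srv.accept()
        msg = json.loads(conn.makefile().read())
        apply_config(msg["user"], msg["read_bps"])
        insert_handlers(msg["user"], msg.get("handler_pids", []))
        conn.close()

if __name__ == "__main__":
    serve()
```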

SLIDE 17

Contents

- Motivation
- Background for Virtualization
- Our Solution: I/O Throttling Middleware
- Evaluations
- Related Work
- Conclusion
- Acknowledgement

SLIDE 18

Single Node Testbed

The Configuration of the Single Node Testbed
Make & Model:       Dell XPS 8700
CPU:                Intel i7 processor, 64-bit, 18 MB L2, 2.8 GHz, 4 cores
RAM:                8 × 2 GB
Internal Hard Disk: 1 × Western Digital Black SATA, 7200 rpm, 1 TB
Local File System:  EXT3
Operating System:   CentOS 6 64-bit, kernel 2.6.32-504.8.1.el6

SLIDE 19

Distributed Testbed

The Configuration of the Marmot Cluster (17 nodes reserved in Marmot)
Make & Model:        Dell PowerEdge 1950
CPU:                 2 × Opteron 242, 64-bit, 1 MB L2, 1 GHz
RAM:                 8 × 2.0 GB RDIMM, PC3200, CL3
Internal Hard Disk:  1 × Western Digital Black SATA, 7200 rpm, 2 TB
Network Connection:  1 × Gigabit Ethernet
Operating System:    CentOS 6 64-bit, kernel 2.6.32-504.8.1.el6
Switch Make & Model: 152-port Extreme Networks BlackDiamond 6808
HDFS:                1 head node and 16 storage nodes
Lustre:              1 head node, 8 storage nodes, and 8 client nodes

SLIDE 20

Read Overhead on Single Node

[Figure: Read bandwidth normalized to the physical case, for 1, 2, 4, and 8 VNodes with 16 KB and 16 MB object sizes]

The worst read overhead is less than 10%.

SLIDE 21

Throttling Read on Single Node

[Figure: Read bandwidth (MB/s) for the physical case and for throttle rates of 10, 20, 30, and 40 MB/s on the bottom VNode, with 16 KB and 16 MB object sizes]

The throttle functionality guarantees that a process group's I/O does not exceed the configured upper limit, but the achieved bandwidth is strongly influenced by other concurrent processes.

SLIDE 22

Weight Read on Single Node

[Figure: Read bandwidth normalized to the physical case, for 1 to 4 VNodes with 16 KB and 16 MB object sizes under weight shares of 100%, 50/50%, 50/25/25%, and 40/20/20/20%]

The results show that the overhead of the weight functionality is less than 8%. The weight module does not suffer from interference and provides effective isolation.

SLIDE 23

I/O Throttling on PFS

[Figure: Aggregate read bandwidth (MB/s) versus the throttle rate applied to DFS block I/O (W/O VNode, 10, 20, 40, 80, and 160 MB/s) for HDFS with data locality, HDFS without data locality, Lustre N-to-N, and Lustre N-to-1]

The I/O throttling middleware can effectively control the aggregate bandwidth of parallel file systems and introduces negligible overhead.

SLIDE 24

I/O Throttling on Real Application

[Figure: Finish time of ParaView (data load time and computing time, in ms) versus the throttle rate applied to the competing daemons' I/O (W/O_DM, W/O_THTL, 5 MB/s, and higher throttle rates)]

The finish time of ParaView increases as the I/O throttle rate of the background daemons increases.

SLIDE 25

Contents

- Motivation
- Background for Virtualization
- Our Solution: I/O Throttling Middleware
- Evaluations
- Related Work
- Conclusion
- Acknowledgement

SLIDE 26

Related Work

- OS-level virtualization:
  - The authors of [1, 2, 3] evaluated the overhead (CPU, memory, and disk) of OS-level virtualization compared with traditional hypervisor-based virtualization.
  - MultiLanes [4] builds an isolated I/O stack to eliminate contention on shared kernel data structures and locks, while applying OS-level virtualization to control the I/O of fast block devices (SSDs).
- Resource allocation platforms via OS-level virtualization:
  - Mesos [5] is a resource allocation platform for multiple users and multiple computing frameworks such as Hadoop and MPI. Mesos takes advantage of OS-level virtualization (LXC) to provide fine-grained cluster resource sharing (CPU and memory only).

SLIDE 27

Contents

- Motivation
- Background for Virtualization
- Our Solution: I/O Throttling Middleware
- Evaluations
- Related Work
- Conclusion
- Acknowledgement

SLIDE 28

Conclusion

- In this paper, we investigate the overhead and isolation of OS-level virtualization for block I/O control.
- The block I/O control of OS-level virtualization introduces less than 15% overhead on average.
- The weight functionality introduces at most 8% overhead and shows good performance isolation.
- The throttle functionality introduces low performance overhead but provides limited isolation.
- The I/O throttling middleware can allocate a PFS's I/O to multiple users based on their priorities, with negligible overhead.
SLIDE 29

Acknowledgement

The experiments in this work were conducted on the PRObE Marmot cluster.

SLIDE 30

References

[1] Nikolaus Huber, Marcel von Quast, Michael Hauck, and Samuel Kounev. Evaluating and modeling virtualization performance overhead for cloud environments. In CLOSER, pages 563-573, 2011.

[2] Stephen Soltesz, Herbert Potzl, Marc E. Fiuczynski, Andy Bavier, and Larry Peterson. Container-based operating system virtualization: a scalable, high-performance alternative to hypervisors. In ACM SIGOPS Operating Systems Review, volume 41, pages 275-287. ACM, 2007.

[3] Miguel G. Xavier, Marcelo Veiga Neves, Fabio D. Rossi, Tiago C. Ferreto, Timoteo Lange, and Cesar A. F. De Rose. Performance evaluation of container-based virtualization for high performance computing environments. In Parallel, Distributed and Network-Based Processing (PDP), 2013 21st Euromicro International Conference on, pages 233-240. IEEE, 2013.

[4] Junbin Kang, Benlong Zhang, Tianyu Wo, Chunming Hu, and Jinpeng Huai. MultiLanes: providing virtualized storage for OS-level virtualization on many cores. In FAST, pages 317-329, 2014.

[5] Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy Katz, Scott Shenker, and Ion Stoica. Mesos: a platform for fine-grained resource sharing in the data center. In Proceedings of the 8th USENIX Conference on Networked Systems Design and Implementation, NSDI'11, pages 22-22, Berkeley, CA, USA, 2011. USENIX Association.
