PERFORMANCE ANALYSIS OF CONTAINERIZED APPLICATIONS ON LOCAL AND REMOTE STORAGE
Qiumin Xu (1), Manu Awasthi (2), Krishna T. Malladi (3), Janki Bhimani (4), Jingpei Yang (3), Murali Annavaram (1)
(1) USC, (2) IIT Gandhinagar, (3) Samsung, (4) Northeastern
Docker
Ease of deployment, developer friendliness, and lightweight virtualization
Google Cloud Platform, Amazon EC2, Microsoft Azure
High-performance SSDs: NVMe, NVMe over Fabrics
The best configuration performs similarly to raw device performance. Where do the performance anomalies come from?
Exemplified using Cassandra: what is the best strategy to divide resources among containers?
(Figure from https://docs.docker.com)
NVMe SSDs: available since 2014 (Intel, Samsung), in enterprise and consumer variants
5X to 10X performance over SATA SSDs [1]
[1] Qiumin Xu et al. “Performance analysis of NVMe SSDs and their implication on real world databases.” SYSTOR’15
Storage options for Docker containers (figure): a container's read/write operations reach the NVMe SSD through one of three paths:
1. Docker storage drivers (Aufs, Btrfs, Overlayfs) layered on the host backing file system (EXT4, XFS, etc.)
2. Devicemapper: 2.a loop-lvm (thin pool backed by sparse files), 2.b direct-lvm (thin pool on a base device)
3. A Docker data volume on the host backing file system
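Which of these paths a given container exercises depends on the storage driver the Docker daemon was started with and on whether the container mounts a data volume. A minimal sketch for checking the active driver from Python using the docker CLI (assumes a running daemon):

    import subprocess

    # Print the storage driver the daemon is using (e.g. aufs, btrfs, overlay,
    # devicemapper). The full `docker info` output also reports the backing
    # filesystem (e.g. EXT4 or XFS) that options 1 and 3 sit on.
    driver = subprocess.run(["docker", "info", "--format", "{{.Driver}}"],
                            capture_output=True, text=True, check=True).stdout.strip()
    print("storage driver:", driver)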
NVMe SSD: Samsung XS1715
Microbenchmarks: asynchronous I/O engine (libaio), 32 concurrent jobs, iodepth of 32; steady-state performance measured
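These settings map to a straightforward benchmark invocation. A sketch, assuming fio as the benchmark tool (consistent with the libaio engine and steady-state methodology above); the device path and runtime are placeholders:

    import subprocess

    # Random-read job with the described settings: libaio engine, 32 concurrent
    # jobs, iodepth 32, direct I/O against the raw NVMe device. A long, time-based
    # run helps the SSD reach steady state before results are taken.
    subprocess.run([
        "fio", "--name=randread",
        "--filename=/dev/nvme0n1",      # placeholder device
        "--ioengine=libaio", "--direct=1",
        "--rw=randread", "--bs=4k",
        "--numjobs=32", "--iodepth=32",
        "--time_based", "--runtime=600",
        "--group_reporting",
    ], check=True)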
Experimental Environment
(Figures: average bandwidth (MB/s) for sequential reads/writes (SR/SW) and average IOPS (K) for random reads/writes (RR/RW), comparing RAW, EXT4, and XFS.)
(Figure: random-read IOPS (K) on EXT4 with default mount options vs. dioread_nolock.)
XFS uses allocation groups that can be accessed independently
[1] https://www.percona.com/blog/2012/03/15/ext4-vs-xfs-on-ssd/
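The dioread_nolock comparison above can be reproduced by mounting the EXT4 backing filesystem with and without that option. A minimal sketch; the device and mount point are placeholders:

    import subprocess

    DEV, MNT = "/dev/nvme0n1", "/mnt/ext4"   # placeholder device and mount point

    # Baseline: mount EXT4 with default options and run the random-read benchmark.
    subprocess.run(["mount", "-t", "ext4", DEV, MNT], check=True)

    # ... measure, then remount with dioread_nolock, which relaxes EXT4 locking
    # around direct-I/O reads and improves random-read IOPS on fast SSDs.
    subprocess.run(["umount", MNT], check=True)
    subprocess.run(["mount", "-t", "ext4", "-o", "dioread_nolock", DEV, MNT], check=True)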
(Figure: random-write IOPS (K) vs. number of jobs (1 to 64) for RAW, EXT4, and XFS.)
XFS random-write scaling is limited by an exclusive lock used for extent lookup and write checks; a patch is available but not in Linux 4.6 [1]
Docker storage drivers:
Aufs: a fast, reliable unification file system
Btrfs: a modern copy-on-write (CoW) file system that implements many advanced features for fault tolerance, repair, and easy administration
Overlayfs: another modern unification file system with a simpler design, potentially faster than Aufs
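The driver under test is selected when the Docker daemon starts; one way is through /etc/docker/daemon.json, as sketched below. "overlay" is only an example value (aufs, btrfs, and devicemapper are selected the same way, and btrfs additionally requires /var/lib/docker to live on a Btrfs filesystem):

    import json, subprocess

    # Select the storage driver to evaluate, then restart the daemon so the
    # setting takes effect. Images and containers created under another driver
    # will not be visible afterwards.
    with open("/etc/docker/daemon.json", "w") as f:
        json.dump({"storage-driver": "overlay"}, f, indent=2)
    subprocess.run(["systemctl", "restart", "docker"], check=True)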
(Figures: average bandwidth (MB/s) for sequential reads/writes and average IOPS (K) for random reads/writes, comparing Raw, Aufs, Btrfs, and Overlay.)
(Figure: random-read bandwidth (MB/s) vs. block size for RAW, EXT4, and Btrfs.)
Large block size reduces the frequency of reading metadata
Devicemapper and Docker data volumes:
(Figure from https://github.com/libopenstorage/openstorage)
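Devicemapper runs either in its default loop-lvm mode (thin pool backed by sparse files under /var/lib/docker) or in direct-lvm mode (thin pool carved out of a real block device). A sketch of selecting direct-lvm through daemon.json; the thin-pool device name is a placeholder that would be created beforehand with LVM on the NVMe SSD:

    import json, subprocess

    # loop-lvm needs no extra options; direct-lvm points the driver at a
    # pre-created LVM thin pool (placeholder name below).
    direct_lvm = {
        "storage-driver": "devicemapper",
        "storage-opts": ["dm.thinpooldev=/dev/mapper/docker-thinpool"],
    }
    with open("/etc/docker/daemon.json", "w") as f:
        json.dump(direct_lvm, f, indent=2)
    subprocess.run(["systemctl", "restart", "docker"], check=True)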
(Figures: average IOPS (K) for random reads/writes and average bandwidth (MB/s) for sequential reads/writes, comparing RAW, Direct-lvm, Loop-lvm, Aufs with a data volume (-v), and Overlay with a data volume (-v).)
The configurations marked "-v" use a Docker data volume to store data
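In the "-v" configurations the container image still sits on the chosen storage driver, but benchmark I/O goes to a host directory bind-mounted into the container, bypassing the driver. A sketch; the image, host path, and test command are placeholders:

    import subprocess

    # Mount a directory on the NVMe backing filesystem as the container's data
    # directory, so that writes bypass Aufs/Overlay/devicemapper entirely.
    subprocess.run(["docker", "run", "--rm",
                    "-v", "/mnt/nvme/testdata:/data",   # placeholder host path on the NVMe SSD
                    "ubuntu",
                    "dd", "if=/dev/zero", "of=/data/testfile",
                    "bs=1M", "count=1024", "oflag=direct"],
                   check=True)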
[1] Rabl, Tilmann et al. "Solving Big Data Challenges for Enterprise Application Performance Management”, VLDB’13
Experiment setup: multiple containerized Cassandra databases
Workloads
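The later slides show YCSB clients driving the Cassandra containers, so the load and run phases for one instance would look roughly like the sketch below; the YCSB path, binding name, host address, workload file, and record/operation counts are placeholders rather than the authors' exact parameters:

    import subprocess

    YCSB = "./bin/ycsb"     # placeholder path to the YCSB launcher
    HOST = "127.0.0.1"      # placeholder address of one Cassandra container

    # Load the dataset, then run the measured phase against that instance.
    subprocess.run([YCSB, "load", "cassandra-cql",
                    "-P", "workloads/workloada",
                    "-p", f"hosts={HOST}",
                    "-p", "recordcount=1000000"], check=True)
    subprocess.run([YCSB, "run", "cassandra-cql",
                    "-P", "workloads/workloada",
                    "-p", f"hosts={HOST}",
                    "-p", "operationcount=1000000",
                    "-threads", "32"], check=True)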
(Figure: aggregate throughput (ops/sec) vs. number of Cassandra containers, C1 through C8.)
Cgroups
(Figure: throughput (ops/sec) vs. number of Cassandra containers under different cgroup control policies: CPU only, MEM only, CPU+MEM, BW, All, and Uncontrolled.)
CPU-only policy: assign 6 CPU cores to each container and leave other resources uncontrolled
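This CPU-only policy maps directly onto Docker's cgroup flags. A sketch that pins each of eight Cassandra containers to its own six cores; the image tag, core numbering, and data-volume paths are placeholders (memory and block-I/O limits, used by the other policies, would be added with -m and --blkio-weight):

    import subprocess

    # Give container i cores 6*i .. 6*i+5 and leave memory and block I/O uncontrolled.
    for i in range(8):
        cores = f"{6 * i}-{6 * i + 5}"
        subprocess.run(["docker", "run", "-d",
                        "--name", f"cassandra{i}",
                        "--cpuset-cpus", cores,
                        "-v", f"/mnt/nvme/cassandra{i}:/var/lib/cassandra",  # placeholder data volume
                        "cassandra:3.0"], check=True)    # placeholder image tag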
Experiment Setup
(Figure: experiment topology: YCSB clients, an application server running Cassandra in Docker containers, and an NVMf target storage server, connected by 10GbE and 40GbE links.)
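On the application server, the namespaces exported by the NVMf target show up as local NVMe block devices once connected with nvme-cli, after which Docker and Cassandra use them exactly like direct-attached storage. A sketch; the transport, target address, port, and subsystem NQN are placeholders for whatever the storage server exports:

    import subprocess

    # Connect to the NVMe-over-Fabrics target (here assumed to be RDMA over the
    # 40GbE link); the remote namespace then appears as /dev/nvmeXnY locally.
    subprocess.run(["nvme", "connect",
                    "-t", "rdma",                                  # placeholder transport
                    "-a", "192.168.1.2",                           # placeholder target address
                    "-s", "4420",                                  # default NVMf port
                    "-n", "nqn.2016-06.io.example:cassandra-ssd"   # placeholder subsystem NQN
                    ], check=True)

    # Confirm the remote namespace is visible.
    subprocess.run(["nvme", "list"], check=True)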
(Figure: relative latency vs. number of Cassandra instances for DAS_A, NVMf_A, DAS_D, and NVMf_D.)
Best configuration: Overlayfs + XFS + data volume
Control only the CPU resources when co-locating containers
Throughput over NVMf is within 6% to 12% of DAS