 
Understanding Manycore Scalability of File Systems
Changwoo Min, Sanidhya Kashyap, Steffen Maass, Woonhak Kang, and Taesoo Kim
Applications must parallelize I/O operations
● Death of single-core CPU scaling
– CPU clock frequency: 3 ~ 3.8 GHz
– # of physical cores: up to 24 (Xeon E7 v4)
● From mechanical HDD to flash SSD
– IOPS of a commodity SSD: 900K
– Non-volatile memory (e.g., 3D XPoint): 1,000x ↑
But file systems become a scalability bottleneck
Problem: Lack of understanding of internal scalability behavior
Exim mail server on RAMDISK: an embarrassingly parallel application!
[Figure: Exim throughput (messages/sec, 0k to 14k) vs. #core (0 to 80) for btrfs, ext4, F2FS, and XFS. Annotated curve patterns: 1. Saturated, 2. Collapsed, 3. Never scale]
● Intel 80-core machine: 8-socket, 10-core Xeon E7-8870
● RAM: 512GB, 1TB SSD, 7200 RPM HDD
Even on slower storage media, the file system becomes a bottleneck
[Figure: Exim email server throughput at 80 cores (messages/sec, 0k to 12k) on RAMDISK, SSD, and HDD for btrfs, ext4, F2FS, and XFS]
Outline
● Background
● FxMark design
– A file system benchmark suite for manycore scalability
● Analysis of five Linux file systems
● Pilot solution
● Related work
● Summary
Research questions
● What file system operations are not scalable?
● Why are they not scalable?
● Is it a problem of implementation or design?
Technical challenges
● Applications are usually stuck with a few bottlenecks
→ cannot see the next level of bottlenecks before resolving them
→ difficult to understand overall scalability behavior
● How to systematically stress file systems to understand scalability behavior
FxMark: evaluate & analyze manycore scalability of file systems
● FxMark: 3 applications, 19 micro-benchmarks
● File systems:
– Memory FS: tmpfs
– Journaling FS: ext4 (J/NJ), XFS
– CoW FS: btrfs
– Log FS: F2FS
● Storage medium: SSD
● # core: 1, 2, 4, 10, 20, 30, 40, 50, 60, 70, 80
● >4,700 (annotation added in the slide's final build)
Microbenchmark: unveil hidden scalability bottlenecks
● Data block read at three sharing levels: low, medium, high
[Figure: processes issuing read (R) operations on file blocks at each sharing level. Legend: file, block, process, operation]
Stress different components with various sharing levels
Evaluation
● Data block read, low sharing level: linear scalability
[Figure: throughput (M ops/sec, 0 to 250) vs. #core (0 to 80) for btrfs, ext4, ext4NJ, F2FS, tmpfs, and XFS]
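Linear scalability means throughput grows in proportion to the number of cores. One way to check a measured curve against that ideal is sketched below. The helper names `speedup` and `scaling_efficiency` are hypothetical, and the input is assumed to be a mapping from core count to measured throughput.

```python
def speedup(throughput_by_cores):
    """Throughput at each core count relative to the smallest core count measured."""
    base_cores = min(throughput_by_cores)
    base = throughput_by_cores[base_cores]
    return {cores: tput / base
            for cores, tput in sorted(throughput_by_cores.items())}


def scaling_efficiency(throughput_by_cores):
    """Speedup divided by the ideal (linear) speedup; 1.0 means perfectly linear."""
    base_cores = min(throughput_by_cores)
    rel = speedup(throughput_by_cores)
    return {cores: rel[cores] / (cores / base_cores) for cores in rel}
```

An efficiency that stays near 1.0 up to 80 cores corresponds to the linear case on this slide. Values that fall as cores are added indicate a curve that is flattening or dropping.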
Outline
● Background
● FxMark design
● Analysis of five Linux file systems
– What are scalability bottlenecks?
● Pilot solution
● Related work
● Summary
Summary of results: file systems are not scalable
[Figure: grid of throughput vs. #core (0 to 80) plots for btrfs, ext4, ext4NJ, F2FS, tmpfs, and XFS.
Microbenchmark panels (M ops/sec): DRBL, DRBM, DRBH, DWOL, DWOM, DWAL, DWTL, DWSL, MRPL, MRPM, MRPH, MRDL, MRDM, MWCL, MWCM, MWUL, MWUM, MWRL, MWRM, DRBL:O_DIRECT, DRBM:O_DIRECT, DWOL:O_DIRECT, DWOM:O_DIRECT.
Application panels: Exim (messages/sec), RocksDB (ops/sec), DBENCH (GB/sec)]