Understanding Manycore Scalability of File Systems



  1. Understanding Manycore Scalability of File Systems
     Changwoo Min, Sanidhya Kashyap, Steffen Maass, Woonhak Kang, and Taesoo Kim

  2. Applications must parallelize I/O operations
     ● Death of single-core CPU scaling
       – CPU clock frequency: 3 ~ 3.8 GHz
       – # of physical cores: up to 24 (Xeon E7 v4)
     ● From mechanical HDD to flash SSD
       – IOPS of a commodity SSD: 900K
       – Non-volatile memory (e.g., 3D XPoint): 1,000x ↑
     But file systems become a scalability bottleneck

  3. Problem: Lack of understanding of internal scalability behavior
     Exim mail server on RAMDISK (an embarrassingly parallel application!)
     [Figure: messages/sec (0k to 14k) vs. #core (0 to 80) for btrfs, ext4, F2FS, and XFS; annotations: 1. Saturated, 2. Collapsed, 3. Never scale]
     ● Intel 80-core machine: 8-socket, 10-core Xeon E7-8870
     ● RAM: 512GB, 1TB SSD, 7200 RPM HDD

  4. Even on a slower storage medium, the file system becomes a bottleneck
     Exim email server at 80 cores
     [Figure: messages/sec (0k to 12k) on RAMDISK, SSD, and HDD for btrfs, ext4, F2FS, and XFS]

  5. Outline
     ● Background
     ● FxMark design
       – A file system benchmark suite for manycore scalability
     ● Analysis of five Linux file systems
     ● Pilot solution
     ● Related work
     ● Summary

  6. Research questions
     ● What file system operations are not scalable?
     ● Why are they not scalable?
     ● Is it a problem of implementation or design?

  7. Technical challenges
     ● Applications are usually stuck with a few bottlenecks
       → cannot see the next level of bottlenecks before resolving them
       → difficult to understand overall scalability behavior
     ● How to systematically stress file systems to understand scalability behavior

  8. FxMark: evaluate & analyze manycore scalability of file systems
     ● FxMark: 3 applications, 19 micro-benchmarks
     ● File systems: tmpfs (memory FS), ext4 J/NJ and XFS (journaling FS), btrfs (CoW FS), F2FS (log FS)
     ● Storage medium: SSD
     ● # core: 1, 2, 4, 10, 20, 30, 40, 50, 60, 70, 80

  9. FxMark: evaluate & analyze manycore scalability of file systems
     [Same diagram as slide 8, with an added callout: >4,700]

  10. Microbenchmark: unveil hidden scalability bottlenecks
      ● Data block read
      [Figure: data block read (R) at three sharing levels: Low, Medium, High; legend: File, Block, Process, Operation (R)]

  11. Stress different components with various sharing levels

  12. Evaluation
      ● Data block read
      [Figure: low sharing level, M ops/sec (0 to 250) vs. #core (0 to 80) for btrfs, ext4, ext4NJ, F2FS, tmpfs, and XFS; annotation: Linear scalability]

  13. Outline
      ● Background
      ● FxMark design
      ● Analysis of five Linux file systems
        – What are scalability bottlenecks?
      ● Pilot solution
      ● Related work
      ● Summary

  14. Summary of results: file systems are not scalable
      [Figure: grid of throughput vs. #core (0 to 80) plots.
       Micro-benchmark panels (M ops/sec): DRBL, DRBM, DRBH, DWOL, DWOM, DWAL, DWTL, DWSL, MRPL, MRPM, MRPH, MRDL, MRDM, MWCL, MWCM, MWUL, MWUM, MWRL, MWRM, DRBL:O_DIRECT, DRBM:O_DIRECT, DWOL:O_DIRECT, DWOM:O_DIRECT.
       Application panels: Exim (messages/sec), RocksDB (ops/sec), DBENCH (GB/sec).
       Legend: btrfs, ext4, ext4NJ, F2FS, tmpfs, XFS]

  15. Summary of results: file systems are not scalable
      [Figure: identical to the plot grid on slide 14]
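
The three sharing levels of the data block read micro-benchmark (slide 10) can be sketched roughly as follows. This is an illustrative Python sketch, not FxMark's own code. It assumes the usual reading of the slide's File/Block/Process legend: at the low level each worker reads a block in its own private file, at the medium level each worker reads its own block in one shared file, and at the high level all workers read the same block of one shared file. The 4 KB block size, the helper names `target`, `read_loop`, and `run`, and the use of threads in place of processes are assumptions made for brevity. `os.pread` is available on Unix-like systems only.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

BLOCK = 4096  # assumed block size in bytes (not stated on the slides)


def target(level, worker, shared_path, private_paths):
    """Return the (path, offset) that a worker reads at a given sharing level."""
    if level == "low":       # private file, private block
        return private_paths[worker], 0
    if level == "medium":    # shared file, private block
        return shared_path, worker * BLOCK
    if level == "high":      # shared file, same block for every worker
        return shared_path, 0
    raise ValueError(f"unknown sharing level: {level}")


def read_loop(path, offset, n_reads):
    """Read one block repeatedly; return the number of full-block reads."""
    fd = os.open(path, os.O_RDONLY)  # each worker uses its own descriptor
    try:
        done = 0
        for _ in range(n_reads):
            if len(os.pread(fd, BLOCK, offset)) == BLOCK:
                done += 1
        return done
    finally:
        os.close(fd)


def run(level, n_workers=4, n_reads=1000):
    """Run all workers at one sharing level; return (total_reads, seconds)."""
    with tempfile.TemporaryDirectory() as d:
        # One shared file holding one block per worker.
        shared = os.path.join(d, "shared")
        with open(shared, "wb") as f:
            f.write(b"\0" * BLOCK * n_workers)
        # One single-block private file per worker.
        privates = []
        for w in range(n_workers):
            p = os.path.join(d, f"private{w}")
            with open(p, "wb") as f:
                f.write(b"\0" * BLOCK)
            privates.append(p)

        targets = [target(level, w, shared, privates) for w in range(n_workers)]
        start = time.perf_counter()
        with ThreadPoolExecutor(n_workers) as pool:
            counts = list(pool.map(lambda t: read_loop(*t, n_reads), targets))
        elapsed = time.perf_counter() - start
    return sum(counts), elapsed
```

Calling `run(level)` for each of `"low"`, `"medium"`, and `"high"` while increasing `n_workers` gives a rough throughput-versus-workers curve per sharing level. The temporary directory lives on whatever file system backs the system's temp location, so the numbers say nothing about the file systems or the 80-core machine measured in the slides.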
