Enlightening the I/O Path: A Holistic Approach for Application Performance (FAST'17). PowerPoint presentation, Jinkyu Jeong, Sungkyunkwan University.


  1. Enlightening the I/O Path: A Holistic Approach for Application Performance (appeared in FAST'17). Jinkyu Jeong, Sungkyunkwan University

  2. Data-Intensive Applications • Relational, document, key-value, column, and search stores

  3. Data-Intensive Applications • Common structure (example: MongoDB) • A client sends requests to the application and receives responses • Foreground client threads (T1–T4) handle requests and issue I/Os through the operating system to the storage device • Background tasks: checkpointer, log writer, eviction worker, …

  4. Data-Intensive Applications • Common structure (example: MongoDB) • Background tasks (checkpointer, log writer, eviction worker, …) are problematic for application performance

  5. Application Impact • Illustrative experiment • YCSB update-heavy workload against MongoDB

  6. Application Impact • Illustrative experiment • YCSB update-heavy workload against MongoDB • Under CFQ, latency reaches 30 seconds at the 99.99th percentile • [Figure: operation throughput (ops/sec) over elapsed time (sec); throughput collapses whenever the regular checkpoint task runs]

  7. Application Impact • Illustrative experiment • YCSB update-heavy workload against MongoDB • I/O priority does not help • [Figure: operation throughput (ops/sec) over elapsed time (sec) for CFQ vs. CFQ-IDLE]

  8. Application Impact • Illustrative experiment • YCSB update-heavy workload against MongoDB • State-of-the-art schedulers do not help much • [Figure: operation throughput (ops/sec) over elapsed time (sec) for CFQ, CFQ-IDLE, SPLIT-A, SPLIT-D, and QASIO]

  9. What’s the Problem? • Independent policies in multiple layers • Each layer processes I/Os w/ limited information • I/O priority inversion • Background I/Os can arbitrarily delay foreground tasks

  10. What’s the Problem? • Independent policies in multiple layers • Each layer processes I/Os w/ limited information • I/O priority inversion • Background I/Os can arbitrarily delay foreground tasks

  11. Multiple Independent Layers • Independent I/O processing • read()/write() calls from the application pass through the caching layer (buffer cache), the file system layer (abstraction), and the block layer before reaching the storage device • Each layer reorders and interleaves foreground (FG) and background (BG) I/Os independently

  12. What’s the Problem? • Independent policies in multiple layers • Each layer processes I/Os w/ limited information • I/O priority inversion • Background I/Os can arbitrarily delay foreground tasks

  13. I/O Priority Inversion • Task dependency • A foreground task can block on locks and condition variables held by background tasks anywhere along the I/O path (application, caching layer, file system layer, block layer)

  14. I/O Priority Inversion • I/O dependency • A foreground task can block waiting on outstanding background I/Os already submitted to the block layer and the storage device

  15. Our Approach • Request-centric I/O prioritization (RCP) • Critical I/O: I/O in the critical path of request handling • Policy: holistically prioritize critical I/Os along the I/O path • With RCP, latency drops to 100 ms at the 99.99th percentile • [Figure: operation throughput (ops/sec) over elapsed time (sec) for CFQ, CFQ-IDLE, SPLIT-A, SPLIT-D, QASIO, and RCP]

  16. Challenges • How to accurately identify I/O criticality • How to effectively enforce I/O criticality

  17. Critical I/O Detection • Enlightenment API • Interface for tagging foreground tasks • I/O priority inheritance • Handling task dependency • Handling I/O dependency
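The enlightenment API above can be pictured as a small per-task tagging interface that every layer can consult. The sketch below is illustrative C, not the paper's actual interface; the names `set_io_critical` and `io_is_critical` and the `struct task` layout are assumptions:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch of an "enlightenment" interface: the application
 * tags its request-handling (foreground) tasks, and each layer in the
 * I/O path checks the tag when processing that task's I/Os. */

struct task {
    int  id;
    bool critical;   /* set for foreground, request-handling tasks */
};

/* Tag a task as foreground; in a real system this would be a syscall
 * that records the flag in the kernel's task descriptor. */
static void set_io_critical(struct task *t, bool critical) {
    t->critical = critical;
}

/* Any layer can ask whether an I/O issued by this task is critical. */
static bool io_is_critical(const struct task *t) {
    return t->critical;
}
```

The point of the design is that criticality is a property of the task issuing the I/O, not of the I/O system call itself, so untagged background tasks need no changes.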

  18. I/O Priority Inheritance • Handling task dependency • Locks: when a foreground task blocks on a lock held by a background task, the holder inherits the foreground I/O priority, so its submitted I/O completes at critical priority; the boost is dropped on unlock • Condition variables: a foreground task waiting on a condition variable registers with it, and the background task that will signal it inherits the foreground priority until the wake-up
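The lock case on this slide can be sketched as follows. This is a minimal illustrative model, not the kernel implementation; the field names and the single-holder lock are assumptions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sketch of I/O-priority inheritance on a lock: when a critical
 * (foreground) task blocks on a lock held by a non-critical
 * (background) task, the holder temporarily inherits criticality so
 * its pending I/Os are prioritized; unlock drops the boost. */

struct task {
    bool critical;   /* the task's own criticality */
    bool inherited;  /* temporarily boosted by a waiter? */
};

struct lock {
    struct task *holder;   /* NULL when the lock is free */
};

static bool effective_critical(const struct task *t) {
    return t->critical || t->inherited;
}

/* Called when t tries to acquire l. If the holder is less critical
 * than the waiter, boost the holder (priority inheritance). */
static void lock_acquire(struct lock *l, struct task *t) {
    if (l->holder) {
        if (effective_critical(t) && !effective_critical(l->holder))
            l->holder->inherited = true;   /* inherit criticality */
        /* ...the waiter would block here until the holder unlocks... */
    } else {
        l->holder = t;
    }
}

static void lock_release(struct lock *l) {
    l->holder->inherited = false;          /* drop the inherited boost */
    l->holder = NULL;
}
```

The condition-variable case works the same way, except the boost is attached when the waiter registers with the condition variable and removed when the signaling task performs the wake-up.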

  19. I/O Priority Inheritance • Handling I/O dependency • Non-critical I/O tracking: at the admission stage, non-critical I/Os (NCIOs) are recorded in per-device queues rooted at an NCIO descriptor; at the scheduler queueing stage in the block layer, a location resolver looks them up by sector number; entries are deleted on I/O completion
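The NCIO tracking structure can be sketched as a table keyed by sector number. A flat array stands in for the per-device queues and the location resolver here; the names `ncio_add`, `ncio_lookup`, and `ncio_complete` are illustrative assumptions:

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of non-critical I/O (NCIO) tracking: record at admission,
 * resolve by sector number when a critical task turns out to depend
 * on an outstanding non-critical I/O, delete on completion. */

#define MAX_NCIO 64

struct ncio {
    unsigned long long sector;   /* location key used by the resolver */
    bool in_use;
};

static struct ncio table[MAX_NCIO];

/* Record a non-critical I/O when it is admitted. */
static bool ncio_add(unsigned long long sector) {
    for (int i = 0; i < MAX_NCIO; i++) {
        if (!table[i].in_use) {
            table[i].sector = sector;
            table[i].in_use = true;
            return true;
        }
    }
    return false;                /* table full */
}

/* Resolve a dependency: is there an outstanding non-critical I/O at
 * this sector that should be promoted to critical priority? */
static bool ncio_lookup(unsigned long long sector) {
    for (int i = 0; i < MAX_NCIO; i++)
        if (table[i].in_use && table[i].sector == sector)
            return true;
    return false;
}

/* Delete the entry when the tracked I/O completes. */
static void ncio_complete(unsigned long long sector) {
    for (int i = 0; i < MAX_NCIO; i++)
        if (table[i].in_use && table[i].sector == sector)
            table[i].in_use = false;
}
```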

  20. Handling Transitive Dependency • Possible states of a dependent task • After inheritance, the background task that a foreground task waits on may itself be blocked on another task, blocked on an I/O, or blocked at the admission stage

  21. Handling Transitive Dependency • Recording blocking status • Each task records the task or I/O it is blocked on, so inheritance can be applied transitively along the chain: a recorded blocking task is reprioritized, a recorded blocking I/O is reprioritized, and a task stuck at the admission stage retries

  22. Challenges • How to accurately identify I/O criticality • Enlightenment API • I/O priority inheritance • Recording blocking status • How to effectively enforce I/O criticality

  23. Criticality-Aware I/O Prioritization • Caching layer • Apply a low dirty ratio to non-critical writes (1% by default) • Block layer • Isolate allocation of block queue slots • Maintain two FIFO queues and schedule critical I/Os first • Limit the number of outstanding non-critical I/Os (1 by default) • Support queue promotion to resolve I/O dependency
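The block-layer policy above (two FIFO queues, critical first, a cap on outstanding non-critical I/Os) can be sketched as a small dispatch loop. This is a simplified model under stated assumptions: integers stand in for I/O requests, and queue promotion is omitted:

```c
#include <assert.h>

/* Sketch of the criticality-aware block-layer scheduler: two FIFO
 * queues, critical I/Os dispatched first, and at most one outstanding
 * non-critical I/O at the device (the slide's default limit). */

#define QCAP 16
#define NONCRIT_LIMIT 1   /* max outstanding non-critical I/Os */

struct fifo { int buf[QCAP]; int head, len; };

static struct fifo critq, noncritq;
static int noncrit_outstanding;

static void fifo_push(struct fifo *q, int io) {
    q->buf[(q->head + q->len++) % QCAP] = io;
}

static int fifo_pop(struct fifo *q) {
    int io = q->buf[q->head];
    q->head = (q->head + 1) % QCAP;
    q->len--;
    return io;
}

/* Pick the next I/O to dispatch, or -1 if nothing may be issued.
 * Critical I/Os always go first; non-critical I/Os are throttled so
 * they cannot monopolize the device queue. */
static int dispatch(void) {
    if (critq.len > 0)
        return fifo_pop(&critq);
    if (noncritq.len > 0 && noncrit_outstanding < NONCRIT_LIMIT) {
        noncrit_outstanding++;
        return fifo_pop(&noncritq);
    }
    return -1;
}

/* Called when a non-critical I/O completes at the device. */
static void noncrit_done(void) { noncrit_outstanding--; }
```

Queue promotion (not shown) would move a tracked non-critical I/O from `noncritq` into `critq` once a critical task is found to depend on it.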

  24. Evaluation • Implementation on Linux 3.13 w/ ext4 • Application studies • PostgreSQL relational database • Backend processes as foreground tasks • I/O priority inheritance on LWLocks (semop) • MongoDB document store • Client threads as foreground tasks • I/O priority inheritance on Pthread mutexes and condition variables (futex) • Redis key-value store • Master process as foreground task

  25. Evaluation • Experimental setup • 2 Dell PowerEdge R530 (server & client) • 1TB Micron MX200 SSD • I/O prioritization schemes • CFQ (default), CFQ-IDLE • SPLIT-A (priority), SPLIT-D (deadline) [SOSP’15] • QASIO [FAST’15] • RCP

  26. Application Throughput • PostgreSQL w/ TPC-C workload • RCP improves transaction throughput by 28% to 37% across the 10GB, 60GB, and 200GB datasets • [Figure: transaction throughput (trx/sec) under CFQ, CFQ-IDLE, SPLIT-A, SPLIT-D, QASIO, and RCP for each dataset size]

  27. Application Throughput • Impact on background task • Our scheme improves application throughput w/o penalizing background tasks • [Figure: transaction log size (GB) over elapsed time (sec) for CFQ, CFQ-IDLE, SPLIT-A, SPLIT-D, QASIO, and RCP]

  28. Application Latency • PostgreSQL w/ TPC-C workload • Our scheme is effective for improving tail latency: over 2 sec at the 99.9th percentile for the baselines vs. 300 msec at the 99.999th percentile for RCP • [Figure: CCDF P[X >= x] of transaction latency (msec) for CFQ, CFQ-IDLE, SPLIT-A, SPLIT-D, QASIO, and RCP]

  29. Summary of Other Results • Performance results • MongoDB: 12%–201% higher throughput, 5x–20x better latency at the 99.9th percentile • Redis: 7%–49% higher throughput, 2x–20x better latency at the 99.9th percentile • Analysis results • System latency analysis using LatencyTOP • System throughput vs. application latency • Need for a holistic approach

  30. Conclusions • Key observation • For effective I/O prioritization, all the layers in the I/O path should be considered as a whole, with I/O priority inversion in mind • Request-centric I/O prioritization • Enlightens the I/O path solely for application performance • Improves throughput and latency of real applications • Ongoing work • Making the implementation practical • Applying RCP to a database cluster with multiple replicas

  31. Thank You! • Questions and comments
