Enlightening the I/O Path: A Holistic Approach for Application Performance



SLIDE 1

Enlightening the I/O Path:

A Holistic Approach for Application Performance

appeared in FAST'17

Jinkyu Jeong Sungkyunkwan University

SLIDE 2

Data-Intensive Applications

  • Categories: relational, key-value, search, column, document

SLIDE 3

Data-Intensive Applications

  • Common structure

[Figure: client request and response; application tasks T1 to T4 issuing I/Os through the operating system to the storage device; label: application performance]

  • Example: MongoDB

  • Client (foreground)
  • Checkpointer
  • Log writer
  • Eviction worker
SLIDE 4

Data-Intensive Applications

  • Common structure

[Figure: client request and response; application tasks T1 to T4 issuing I/Os through the operating system to the storage device; label: application performance]

  • Example: MongoDB

  • Server (client)
  • Checkpointer
  • Log writer
  • Eviction worker

Background tasks are problematic for application performance

SLIDE 5

Application Impact


  • Illustrative experiment
  • YCSB update-heavy workload against MongoDB
SLIDE 6

Application Impact

  • Illustrative experiment
  • YCSB update-heavy workload against MongoDB

[Figure: operation throughput (ops/sec) vs. elapsed time (sec) for CFQ; annotation: regular checkpoint task]

30 seconds latency at 99.99th percentile

SLIDE 7

Application Impact

  • Illustrative experiment
  • YCSB update-heavy workload against MongoDB

[Figure: operation throughput (ops/sec) vs. elapsed time (sec) for CFQ and CFQ-IDLE]

I/O priority does not help

SLIDE 8

Application Impact

  • Illustrative experiment
  • YCSB update-heavy workload against MongoDB

[Figure: operation throughput (ops/sec) vs. elapsed time (sec) for CFQ, CFQ-IDLE, SPLIT-A, SPLIT-D, and QASIO]

State-of-the-art schedulers do not help much

SLIDE 9

What’s the Problem?

  • Independent policies in multiple layers
  • Each layer processes I/Os w/ limited information
  • I/O priority inversion
  • Background I/Os can arbitrarily delay foreground tasks

SLIDE 11

Multiple Independent Layers

  • Independent I/O processing

[Figure: layers of the I/O path (application, caching layer with buffer cache, file system layer, block layer, storage device); foreground (FG) and background (BG) I/Os issued through read() and write(); labels: abstraction, reorder]

SLIDE 13

I/O Priority Inversion

  • Task dependency

[Figure: I/O path layers (application, caching layer, file system layer, block layer, storage device); task dependencies arise through locks and condition variables]

SLIDE 14

I/O Priority Inversion

  • I/O dependency

[Figure: I/O path layers (application, caching layer, file system layer, block layer, storage device); I/O dependencies arise from outstanding I/Os]

SLIDE 15

Our Approach

  • Request-centric I/O prioritization (RCP)
  • Critical I/O: I/O in the critical path of request handling
  • Policy: holistically prioritizes critical I/Os along the I/O path

[Figure: operation throughput (ops/sec) vs. elapsed time (sec) for CFQ, CFQ-IDLE, SPLIT-A, SPLIT-D, QASIO, and RCP]

100 ms latency at 99.99th percentile

SLIDE 16

Challenges

  • How to accurately identify I/O criticality
  • How to effectively enforce I/O criticality

SLIDE 17

Critical I/O Detection

  • Enlightenment API
  • Interface for tagging foreground tasks
  • I/O priority inheritance
  • Handling task dependency
  • Handling I/O dependency
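
The tagging interface can be pictured with a minimal sketch. This is not the actual enlightenment API from the talk: the names `tag_foreground`, `untag_foreground`, and `io_is_critical` are hypothetical, and a user-space set of thread ids stands in for whatever state the kernel keeps per task.

```python
import threading

# Hypothetical registry of tasks (thread ids) the application tagged as foreground.
_foreground_tids = set()
_registry_lock = threading.Lock()


def tag_foreground(tid=None):
    """Tag a task as foreground; defaults to the calling thread."""
    tid = threading.get_ident() if tid is None else tid
    with _registry_lock:
        _foreground_tids.add(tid)


def untag_foreground(tid=None):
    """Remove the foreground tag; a no-op if the task was never tagged."""
    tid = threading.get_ident() if tid is None else tid
    with _registry_lock:
        _foreground_tids.discard(tid)


def io_is_critical(issuer_tid):
    """An I/O starts out critical only if its issuing task is tagged foreground."""
    with _registry_lock:
        return issuer_tid in _foreground_tids
```

Tagging alone only covers I/Os issued directly by foreground tasks; I/Os issued by background tasks that a foreground task ends up waiting on are the case priority inheritance is meant to handle.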

SLIDE 18

I/O Priority Inheritance

  • Handling task dependency
  • Locks
  • Condition variables

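[Figure: two timelines. Lock: a foreground (FG) task blocks on a lock held by a background (BG) task; the BG task inherits the FG priority, its I/O is submitted and completes, and it unlocks. Condition variable (CV): a BG task registers with the CV, an FG task waits on it, the BG task inherits the FG priority, its I/O is submitted and completes, and it wakes the FG task.]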
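
The lock and condition-variable cases can be sketched as a small single-threaded model. The classes `Task`, `InheritingLock`, and `InheritingCondition` are hypothetical illustrations, not the kernel implementation, and the sketch assumes a task holds at most one lock and helps at most one condition variable at a time.

```python
class Task:
    def __init__(self, name, foreground=False):
        self.name = name
        self.foreground = foreground
        self.inherited = False  # True while a critical task depends on this task

    @property
    def critical(self):
        return self.foreground or self.inherited


class InheritingLock:
    def __init__(self):
        self.owner = None
        self.waiters = []

    def acquire(self, task):
        """Return True if acquired; otherwise queue the task and boost the owner."""
        if self.owner is None:
            self.owner = task
            return True
        self.waiters.append(task)
        if task.critical:
            self.owner.inherited = True
        return False

    def release(self, task):
        """Drop the inherited priority and hand the lock to the oldest waiter."""
        if self.owner is not task:
            raise RuntimeError("only the owner may release the lock")
        task.inherited = False  # simplification: assumes no other dependency remains
        self.owner = self.waiters.pop(0) if self.waiters else None
        if self.owner is not None and any(w.critical for w in self.waiters):
            self.owner.inherited = True
        return self.owner


class InheritingCondition:
    def __init__(self):
        self.helpers = []  # tasks registered as the ones that will signal
        self.waiters = []

    def register(self, task):
        self.helpers.append(task)
        if any(w.critical for w in self.waiters):
            task.inherited = True

    def wait(self, task):
        self.waiters.append(task)
        if task.critical:
            for helper in self.helpers:
                helper.inherited = True

    def signal(self, task):
        """Wake every waiter (broadcast semantics) and drop the helper's boost."""
        task.inherited = False
        woken, self.waiters = self.waiters, []
        return woken
```

With a background `Task("checkpointer")` holding the lock, a failed `acquire` by a foreground task flips the holder's `critical` property to `True` until it calls `release`.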

SLIDE 19

I/O Priority Inheritance

  • Handling I/O dependency

[Figure: block layer with an admission stage (Q) and a queueing stage (I/O scheduler). Non-critical I/O tracking: a per-device root indexes non-critical I/O (NCIO) entries, each holding a descriptor, location, resolver, and sector number; entries are deleted on completion.]
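
Non-critical I/O tracking can be sketched as a per-device index keyed by start sector. `NcioTracker` and its method names are hypothetical, and the linear scan in `find` is a simplification chosen for readability; the data structure used in the real system is not specified on the slide.

```python
class NcioTracker:
    """Per-device index of in-flight non-critical I/Os, keyed by start sector."""

    def __init__(self):
        self._by_sector = {}

    def track(self, sector, nsectors, descriptor):
        """Record a non-critical I/O when it enters the block layer."""
        self._by_sector[sector] = (nsectors, descriptor)

    def complete(self, sector):
        """Delete the entry on completion; unknown sectors are ignored."""
        self._by_sector.pop(sector, None)

    def find(self, sector):
        """Return the descriptor of the tracked I/O covering `sector`, or None."""
        for start, (nsectors, descriptor) in self._by_sector.items():
            if start <= sector < start + nsectors:
                return descriptor
        return None
```

A critical task that must wait for a given sector can call `find` to locate the outstanding non-critical I/O it depends on, which is then a candidate for reprioritization.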

SLIDE 20

Handling Transitive Dependency

  • Possible states of dependent task

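  • Blocked on task
  • Blocked on I/O
  • Blocked at admission stage

[Figure: in each case a foreground (FG) task has passed its priority to a background (BG) task that is itself waiting: on another BG task, on an I/O, or at the block-layer admission stage]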

SLIDE 21

Handling Transitive Dependency

  • Recording blocking status

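[Figure: the three blocked cases with recorded blocking status. When the blocking task is recorded, inheritance is passed on to it; when the blocking I/O is recorded, it is reprioritized (reprio); a task blocked at the admission stage retries.]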
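
Following recorded blocking status along a chain can be sketched as below. The `blocked_on` field and the `inherit` function are hypothetical names, and the sketch covers only the "blocked on task" and "blocked on I/O" cases; the admission-stage retry is not modeled.

```python
class Io:
    def __init__(self, name):
        self.name = name
        self.critical = False


class Task:
    def __init__(self, name, foreground=False):
        self.name = name
        self.critical = foreground
        self.blocked_on = None  # recorded blocking status: a Task, an Io, or None


def inherit(target):
    """Mark `target` and everything it transitively waits on as critical.

    Returns the names of the boosted objects in the order they were visited.
    """
    boosted = []
    seen = set()
    node = target
    while node is not None and id(node) not in seen:
        seen.add(id(node))  # guard against cycles in the recorded status
        node.critical = True
        boosted.append(node.name)
        node = getattr(node, "blocked_on", None)  # an Io ends the chain
    return boosted
```

If a foreground task waits on `bg1`, `bg1` is recorded as blocked on `bg2`, and `bg2` is recorded as blocked on an outstanding I/O, then `inherit(bg1)` boosts all three, so the I/O at the end of the chain is no longer treated as non-critical.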

SLIDE 22

Challenges

  • How to accurately identify I/O criticality
  • Enlightenment API
  • I/O priority inheritance
  • Recording blocking status
  • How to effectively enforce I/O criticality

SLIDE 23

Criticality-Aware I/O Prioritization

  • Caching layer
    • Apply low dirty ratio for non-critical writes (1% by default)
  • Block layer
    • Isolate allocation of block queue slots
    • Maintain 2 FIFO queues
    • Schedule critical I/O first
    • Limit # of outstanding non-critical I/Os (1 by default)
    • Support queue promotion to resolve I/O dependency
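
The block-layer policy listed above can be sketched as a two-queue dispatcher. `TwoQueueScheduler` and its methods are hypothetical names for illustration; the sketch models only queue order, the cap on outstanding non-critical I/Os, and promotion, and leaves out queue-slot allocation and the caching-layer dirty ratio.

```python
from collections import deque


class TwoQueueScheduler:
    def __init__(self, max_noncritical_outstanding=1):
        self.critical = deque()      # FIFO of critical I/O ids
        self.noncritical = deque()   # FIFO of non-critical I/O ids
        self.limit = max_noncritical_outstanding
        self.outstanding_nc = 0      # non-critical I/Os dispatched but not completed

    def submit(self, io_id, critical):
        (self.critical if critical else self.noncritical).append(io_id)

    def promote(self, io_id):
        """Move a queued non-critical I/O to the critical queue; True if moved."""
        try:
            self.noncritical.remove(io_id)
        except ValueError:
            return False
        self.critical.append(io_id)
        return True

    def dispatch(self):
        """Return the next I/O id to issue, or None if nothing may be issued."""
        if self.critical:
            return self.critical.popleft()
        if self.noncritical and self.outstanding_nc < self.limit:
            self.outstanding_nc += 1
            return self.noncritical.popleft()
        return None

    def complete_noncritical(self):
        """Call when an I/O that was dispatched as non-critical completes."""
        self.outstanding_nc = max(0, self.outstanding_nc - 1)
```

With the default limit of 1, a second background I/O stays queued while the first is outstanding, unless `promote` moves it to the critical queue because a foreground task now depends on it.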

SLIDE 24

Evaluation

  • Implementation on Linux 3.13 w/ ext4
  • Application studies
    • PostgreSQL relational database
      • Backend processes as foreground tasks
      • I/O priority inheritance on LWLocks (semop)
    • MongoDB document store
      • Client threads as foreground tasks
      • I/O priority inheritance on Pthread mutex and condition vars (futex)
    • Redis key-value store
      • Master process as foreground task

SLIDE 25

Evaluation

  • Experimental setup
  • 2 Dell PowerEdge R530 (server & client)
  • 1TB Micron MX200 SSD
  • I/O prioritization schemes
  • CFQ (default), CFQ-IDLE
  • SPLIT-A (priority), SPLIT-D (deadline) [SOSP’15]
  • QASIO [FAST’15]
  • RCP

SLIDE 26

Application Throughput

  • PostgreSQL w/ TPC-C workload

[Figure: transaction throughput (trx/sec) for 10GB, 60GB, and 200GB datasets under CFQ, CFQ-IDLE, SPLIT-A, SPLIT-D, QASIO, and RCP; annotations: 37%, 31%, 28%]

SLIDE 27

Application Throughput

  • Impact on background task

[Figure: transaction log size (GB) vs. elapsed time (sec) for CFQ, CFQ-IDLE, SPLIT-A, SPLIT-D, QASIO, and RCP]

Our scheme improves application throughput w/o penalizing background tasks

SLIDE 28

Application Latency

  • PostgreSQL w/ TPC-C workload

[Figure: CCDF P[X>=x] of transaction latency (msec) for CFQ, CFQ-IDLE, SPLIT-A, SPLIT-D, QASIO, and RCP; the CCDF axis runs from 10^0 down to 10^-5, corresponding to the 0th through 99.999th percentiles; annotations: 300 msec at 99.999th, over 2 sec at 99.9th]

Our scheme is effective for improving tail latency
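
For readers unfamiliar with the CCDF axis, the relation between P[X>=x] and percentiles can be sketched as follows. The functions and the sample data are illustrative assumptions, not measurements from the talk; `percentile` uses the nearest-rank definition.

```python
import math


def ccdf(samples, x):
    """Empirical complementary CDF: the fraction of samples with value >= x."""
    return sum(1 for s in samples if s >= x) / len(samples)


def percentile(samples, p):
    """Nearest-rank percentile for p in (0, 100]."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Made-up latencies in msec: mostly fast, with a small slow tail.
latencies = [1] * 990 + [50] * 9 + [2000]
tail_fraction = ccdf(latencies, 2000)  # 1 sample out of 1000
p99 = percentile(latencies, 99)
p100 = percentile(latencies, 100)
```

A CCDF value of 10^-3 at some latency x means one request in a thousand took at least x, which is why the 10^-3 tick lines up with the 99.9th percentile on the plot.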

SLIDE 29

Summary of Other Results

  • Performance results
    • MongoDB: 12%-201% throughput improvement, 5x-20x latency improvement at 99.9th
    • Redis: 7%-49% throughput improvement, 2x-20x latency improvement at 99.9th
  • Analysis results
    • System latency analysis using LatencyTOP
    • System throughput vs. application latency
    • Need for holistic approach

SLIDE 30

Conclusions

  • Key observation
    • All the layers in the I/O path should be considered as a whole, with I/O priority inversion in mind, for effective I/O prioritization
  • Request-centric I/O prioritization
    • Enlightens the I/O path solely for application performance
    • Improves throughput and latency of real applications
  • Ongoing work
    • Making the implementation practical
    • Applying RCP to database clusters with multiple replicas

SLIDE 31

Thank You!

  • Questions and comments
