SLIDE 1

Towards Application-level I/O Proportionality with a Weight-aware Page Cache Management

Jonggyu Park*, Kwonje Oh, and Young Ik Eom
Sungkyunkwan University, South Korea

SLIDE 2

Server Consolidation is Pervasive

  • Multiple virtualized instances run on a single host
  • Compete for system resources
  • Efficient resource scheduling is necessary

[Figure: Containers 1-4 on a single host issue I/O requests through the operating system to the underlying hardware (HDD, SSD)]

SLIDE 3

Proportional I/O Sharing by Cgroups

  • Cgroups proportionally share I/O resources using I/O weight
  • The I/O bandwidth ratio follows the ratio of I/O weight

[Figure: a cgroup hierarchy with Root above Group 1 (I/O weight 100), Group 2 (400), and Group 3 (500); the resulting I/O proportions are 0.1, 0.4, and 0.5 of the total 1.0]

SLIDE 4

Cgroups and the Block Layer

  • The blkio subsystem controls I/O resources in collaboration with the block layer
  • The I/O scheduler in the block layer uses the I/O weights when scheduling requests
    • The weight scales the I/O service time (CFQ) or the number of sectors to serve (BFQ)

[Figure: I/O requests from the containers pass through Cgroups and the block layer (single-queue schedulers: NOOP, CFQ, deadline; multi-queue paths) before reaching the HDD/SSD; Groups 1-3 hold I/O weights 100, 400, and 500, yielding I/O proportions 0.1, 0.4, and 0.5]

SLIDE 5

The Page Cache

  • The page cache is often utilized to enhance I/O performance
  • When possible, it serves I/O requests directly, without delivering them to the block layer
  • Cgroups cannot control I/O requests that are serviced by the page cache

[Figure: I/O requests that hit the page cache return immediately, bypassing Cgroups and the block layer entirely]

SLIDE 6

Buffered I/O vs. Direct I/O

[Figure: I/O bandwidth (MB/s) and normalized I/O bandwidth vs. I/O weight (100, 200, 400, 800, set by Cgroups) for direct I/O and buffered I/O, under the fileserver workload (left) and the re-read workload (right)]

  • Direct I/O
    • Proportional I/O sharing according to I/O weight
    • Lower performance due to bypassing the page cache
  • Buffered I/O
    • Poor proportionality
    • Better performance thanks to the page cache
SLIDE 7

The Life of the Page Cache

  • Page allocation
    • Allocates a new page for each new page cache entry
    • A qspinlock serializes page allocation
    • Critical to write performance
  • Page reclamation
    • Deallocates pages that are no longer used, in order to secure free pages
    • Reclaims the pages at the tail of the inactive list
    • Decides which pages will reside in the page cache
    • Affects read performance
SLIDE 8

Qspinlock of Page Allocation

  • Qspinlock prevents race conditions
    • Consists of a qspinlock and per-CPU qnodes
    • Allows one CPU to hold the qspinlock while the head node (CPU2 in the figure) busy-waits
    • After the qspinlock is released, the head node acquires it
  • FIFO-based holder selection
    • The conventional qspinlock for page allocation selects the next holder in FIFO order, as sketched below
    • No consideration of I/O weight

An overview of qspinlock

[Figure: APP 1-4 (I/O weights 100, 200, 400, 800) run on CPU1-CPU4; qnodes linked by next pointers form the lock waiting queue behind the locked qspinlock, with the tail marking the last waiter and queued CPUs busy-waiting]
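To make the FIFO hand-off concrete, below is a minimal user-space sketch of an MCS-style queue lock, the idea underlying the kernel's qspinlock. The real qspinlock is considerably more elaborate (a pending bit, per-CPU qnode arrays, careful memory ordering); every name here is illustrative rather than the kernel's.

```c
/* Minimal MCS-style FIFO queue lock, sketched in user space. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct qnode {
    _Atomic(struct qnode *) next;  /* successor in the waiting queue */
    atomic_bool locked;            /* this waiter busy-waits on it */
};

struct qspinlock {
    _Atomic(struct qnode *) tail;  /* last waiter, or NULL if free */
};

void q_lock(struct qspinlock *lock, struct qnode *me)
{
    atomic_store(&me->next, NULL);
    atomic_store(&me->locked, true);
    /* Join the queue at the tail: arrival order is service order. */
    struct qnode *prev = atomic_exchange(&lock->tail, me);
    if (prev) {
        atomic_store(&prev->next, me);
        while (atomic_load(&me->locked))   /* busy-wait for hand-off */
            ;
    }
}

void q_unlock(struct qspinlock *lock, struct qnode *me)
{
    struct qnode *succ = atomic_load(&me->next);
    if (!succ) {
        /* No visible successor: try to mark the lock free. */
        struct qnode *expected = me;
        if (atomic_compare_exchange_strong(&lock->tail, &expected, NULL))
            return;
        while (!(succ = atomic_load(&me->next)))  /* enqueue in flight */
            ;
    }
    /* Strict FIFO hand-off: the head node always goes next,
     * regardless of any I/O weight. */
    atomic_store(&succ->locked, false);
}
```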

SLIDE 9

Page Reclamation

  • Page cache
    • Maintains a 2Q LRU (an active list and an inactive list)
    • Keeps frequently accessed data in the active list, and the rest in the inactive list
    • Reclaims pages at the tail of the inactive list
  • Page reclamation
    • Ignores the I/O weight during reclamation
    • Pages used by higher-weighted apps can be evicted earlier
    • No scheme to reflect I/O weight (see the sketch below)

An overview of page reclamation

[Figure: the inactive list in the page cache holds pages owned by APP 1-4 (I/O weights 100, 200, 400, 800, tracked in the Cgroups nodes); conventional reclamation evicts pages from the tail regardless of the owner's weight]
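As a baseline for the weight-aware version shown later, here is a minimal sketch of this weight-oblivious policy with illustrative types; the kernel's reclaim path is far more involved, but the point is that the victim is simply whatever sits at the tail of the inactive list.

```c
/* Conventional reclamation, sketched: evict strictly from the tail
 * of the inactive list, regardless of which app owns the page. */
#include <stddef.h>

struct page {
    int owner_weight;            /* owner's I/O weight (ignored here) */
    struct page *prev, *next;
};

struct lru_list {
    struct page *head, *tail;    /* inactive list, MRU at the head */
};

/* Detach and return the tail page; the caller frees it. */
struct page *reclaim_one(struct lru_list *inactive)
{
    struct page *victim = inactive->tail;
    if (!victim)
        return NULL;
    inactive->tail = victim->prev;
    if (inactive->tail)
        inactive->tail->next = NULL;
    else
        inactive->head = NULL;   /* list is now empty */
    return victim;
}
```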

SLIDE 10

Justitia

Problem #1: Cgroups focus on block-level I/O proportionality
Problem #2: Page allocation/reclamation do not reflect I/O weight

Justitia: a new page cache management scheme for application-level I/O proportionality

  • A. Weight-aware Qspinlock for Page Cache Allocation
  • B. Weight-aware Page Reclamation

Weight-awareness!!!

SLIDE 11

Weight-aware Qspinlock for Page Cache Allocation

  • Weight-aware Qspinlock
    • Stores the I/O weight in the qnode
    • Reflects the I/O weight by the following procedure (see the sketch below):
      1. The qspinlock is released
      2. Justitia iterates over the lock waiting queue to find the qnode with the highest I/O weight (maxNode)
      3. Justitia moves the maxNode next to the head node
      4. When the head node next acquires the qspinlock, the maxNode becomes the new head node
  • In short, Justitia reorders the lock waiting queue based on I/O weight

An overview of weight-aware qspinlock

[Figure: APP 1-4 on CPU1-CPU4; each qnode now stores an I/O weight alongside its next pointer. After the qspinlock is released, the highest-weighted qnode (CPU4, weight 800) is moved next to the head node of the lock waiting queue, ahead of lower-weighted waiters (e.g., CPU2 with weight 200)]
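A minimal single-threaded sketch of this reordering step, with illustrative names. A real in-kernel version must also fix up the queue's tail pointer when the last node moves and synchronize against concurrent enqueues, both omitted here:

```c
/* Justitia's queue reordering, sketched: after a release, unlink the
 * highest-weighted waiter (maxNode) and splice it in right behind the
 * head node, so it becomes the head one hand-off later. */
struct wqnode {
    struct wqnode *next;   /* successor in the lock waiting queue */
    int weight;            /* I/O weight copied from the app's cgroup */
};

void reorder_by_weight(struct wqnode *head)
{
    if (!head || !head->next)
        return;

    /* Pass 1: find the highest-weighted qnode behind the head. */
    struct wqnode *max_prev = head, *max = head->next;
    for (struct wqnode *prev = head; prev->next; prev = prev->next) {
        if (prev->next->weight > max->weight) {
            max_prev = prev;
            max = prev->next;
        }
    }
    if (max == head->next)
        return;                        /* already next in line */

    /* Pass 2: unlink maxNode and re-link it right after the head. */
    max_prev->next = max->next;
    max->next = head->next;
    head->next = max;
}
```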

SLIDE 12
  • What about the starvation problem?
    • When there are many high-weighted apps, the low-weighted apps can starve
    • Justitia adopts an aging technique to prevent starvation
    • Whenever reordering occurs, Justitia increases the I/O weight of the qnodes remaining in the lock waiting queue (see the sketch below)
    • Justitia thus considers not only the I/O weight but also the waiting time

[Figure: after a reordering pass, the qnodes still waiting in the lock waiting queue have their weights increased by aging (e.g., CPU2's weight rises from 200 to 300)]

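A minimal sketch of the aging step, reusing the illustrative wqnode type from the previous sketch. The figure suggests an increment of 100 per reordering pass (200 → 300), but the exact increment here is an assumption:

```c
/* Aging, sketched: every reordering pass raises the weight of the
 * qnodes still waiting, so a long wait eventually outweighs a low
 * I/O weight. AGING_STEP is assumed, not taken from the paper. */
#define AGING_STEP 100

void age_waiters(struct wqnode *head)
{
    for (struct wqnode *n = head->next; n; n = n->next)
        n->weight += AGING_STEP;
}
```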

SLIDE 13

Weight-aware Page Reclamation

Justitia imposes weight-awareness through the following procedures:

  • Calculating the I/O proportion of each application
  • Recording page ownership information on the page structure
  • Page reclamation considering the I/O proportion
SLIDE 14

Weight-aware Page Reclamation

  • Calculating the I/O proportion of each application
    • New variables are added to each Cgroups node:
    • proportion: the proportion of I/O weight (weight / total weight)
    • nrp_pages: the number of pages in the page cache that this cgroup is currently using

[Figure: Cgroups nodes with the new fields]

  APP #   I/O weight   proportion
  APP 1   100          0.07  (= 100 / (100 + 200 + 400 + 800))
  APP 2   200          0.13
  APP 3   400          0.27
  APP 4   800          0.53
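A minimal sketch of this bookkeeping with illustrative field and function names (the actual change extends the kernel's cgroup structures); it reproduces the proportions in the table above:

```c
/* Per-cgroup fields Justitia relies on, sketched in user space. */
#include <stdio.h>

struct cgroup_node {
    int weight;          /* I/O weight assigned via Cgroups */
    double proportion;   /* weight / total weight over all cgroups */
    long nrp_pages;      /* pages this cgroup holds in the page cache */
};

/* Recompute each cgroup's proportion of the total I/O weight. */
void update_proportions(struct cgroup_node *g, int n)
{
    long total = 0;
    for (int i = 0; i < n; i++)
        total += g[i].weight;
    for (int i = 0; i < n; i++)
        g[i].proportion = (double)g[i].weight / total;
}

int main(void)
{
    struct cgroup_node apps[4] = {
        { .weight = 100 }, { .weight = 200 },
        { .weight = 400 }, { .weight = 800 },
    };
    update_proportions(apps, 4);
    for (int i = 0; i < 4; i++)      /* prints 0.07 0.13 0.27 0.53 */
        printf("APP %d: %.2f\n", i + 1, apps[i].proportion);
    return 0;
}
```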

SLIDE 15

Weight-aware Page Reclamation

  • Recording page ownership information in the page structure
    • New variables in the page structure:
    • The I/O weight
    • A pointer to the corresponding Cgroups node

[Figure: on page allocation, the new page records APP 1's weight (100) and a pointer to APP 1's Cgroups node (weight 100, proportion 0.07), whose nrp_pages is incremented from 0 to 1]
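A minimal sketch of recording that ownership at allocation time, reusing the illustrative cgroup_node type from the earlier sketch; in the real patch these fields live in the kernel's page structure:

```c
/* Ownership fields attached to each page-cache page, sketched. */
struct page_meta {
    int weight;                  /* owner's I/O weight at allocation */
    struct cgroup_node *owner;   /* back-pointer to the owning cgroup */
};

/* Tag a freshly allocated page with its owner's identity. */
void record_ownership(struct page_meta *pg, struct cgroup_node *cg)
{
    pg->weight = cg->weight;
    pg->owner  = cg;
    cg->nrp_pages++;             /* e.g., APP 1: 0 -> 1 in the figure */
}
```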

SLIDE 16

Weight-aware Page Reclamation

  • Page reclamation considering the I/O proportion
  • Justitia reclaims pages whose cgroup holds more pages than its threshold (see the sketch below)

* Threshold = proportion × the total number of pages in the page cache

An overview of weight-aware page reclamation

[Figure: each page in the inactive list carries its owner's weight and a pointer to the owner's Cgroups node (APP 1-4 with weights 100, 200, 400, 800 and proportions 0.07, 0.13, 0.27, 0.53); Justitia skips pages whose owners are under their thresholds and reclaims a page owned by APP 2 (weight 200), whose nrp_pages drops from 6 to 5]
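A minimal sketch of the victim selection under the illustrative types from the earlier sketches; the kernel scans and reclaims in batches, but the threshold test is the essential change:

```c
/* Weight-aware victim selection, sketched: walk the inactive list
 * from the tail and reclaim the first page whose owning cgroup holds
 * more pages than its threshold (proportion * total cached pages). */
struct cached_page {
    struct page_meta meta;               /* weight + owner, see above */
    struct cached_page *prev, *next;
};

struct cached_page *pick_victim(struct cached_page *tail, long total_pages)
{
    for (struct cached_page *p = tail; p; p = p->prev) {
        struct cgroup_node *cg = p->meta.owner;
        double threshold = cg->proportion * (double)total_pages;
        if (cg->nrp_pages > threshold) {
            cg->nrp_pages--;    /* e.g., APP 2: 6 -> 5 in the figure */
            return p;           /* caller unlinks and frees the page */
        }
    }
    return NULL;  /* all owners under threshold; fall back to plain LRU */
}
```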

SLIDE 17

Experimental Setup

  • CPU: Intel i7-6700
  • Memory: 16GB DRAM
  • Storage: 256GB SATA SSD
  • Benchmarks: FIO (re-read) and Filebench (fileserver)
    • Re-read: read → dummy write → read

* All applications were containerized with Docker

  • Proportionality Variation (PV): a metric to quantitatively measure I/O proportionality, introduced in [1]

[1] J. Kim et al., "I/O Scheduling Schemes for Better I/O Proportionality on Flash-Based SSDs"

SLIDE 18

Evaluation (Fileserver)

  • Compared with the conventional scheme, Justitia achieves better I/O proportionality
  • Conventional: 1 : 1.51 : 2.02 : 2.40 : 2.63 : 2.71 : 3.07 : 3.31
  • Justitia: 1 : 1.73 : 2.24 : 2.65 : 3.04 : 3.75 : 4.37 : 6.26

[Figure: normalized I/O bandwidth vs. I/O weight (100-800, set by Cgroups) for Ideal, Conventional, CPM, and Justitia under the fileserver workload]

SLIDE 19

Evaluation (Aging Technique)

An extreme case where C1's weight is 100 and C2-C8's weights are 1000

  • Justitia without aging: 1 : 12.57 : 13.31 : 11.72 : 12.443 : 13.31 : 12.77 : 13.35 (PV: 2.31)
  • Justitia: 1 : 8.94 : 9.36 : 9.08 : 8.83 : 9.49 : 9.77 : 9.43 (PV: 0.64)

[Figure: normalized I/O bandwidth for C1-C8 under Ideal, Justitia w/o Aging, and Justitia]

SLIDE 20

Evaluation (Re-read)

  • Justitia achieves better I/O proportionality than the other cases
  • PV of Conventional: 1.4
  • PV of Justitia: 0.33
  • PV of Direct I/O: 0.61

[Figure: normalized I/O bandwidth vs. I/O weight (100, 200, 400, 800, set by Cgroups) for Ideal, Conv, CPM, Direct, and Justitia under the re-read workload]

SLIDE 21

Conclusion

  • Cgroups support only block-level I/O proportionality, rather than application-level I/O proportionality
  • The conventional page cache management does not consider I/O weight in either page allocation or reclamation
  • Justitia: a new page cache management scheme for application-level I/O proportionality
    • Weight-aware qspinlock for page allocation
    • Weight-aware page reclamation
  • Justitia is available at github.com/kzeoh/Justitia.git
SLIDE 22

Thank you! Any questions?

Feel free to contact jonggyu@skku.edu