Towards Application-level I/O Proportionality with a Weight-aware Page Cache Management
Jonggyu Park*, Kwonje Oh, and Young Ik Eom
Sungkyunkwan University, South Korea
Server Consolidation is Pervasive
- Multiple virtualized instances run on a single host
- Compete for system resources
- Efficient resource scheduling is necessary
[Figure: Containers 1 to 4 send I/O requests through the operating system to the hardware (HDD, SSD).]
Proportional I/O Sharing by Cgroups
- Cgroups proportionally share I/O resources using I/O weight
- The I/O bandwidth ratio follows the ratio of I/O weight
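[Figure: Cgroup hierarchy with I/O weights and I/O proportions. Root (proportion 1.0) has three child groups: Group 1 (weight 100, proportion 0.1), Group 2 (weight 400, proportion 0.4), and Group 3 (weight 500, proportion 0.5).]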
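The weight-to-proportion rule on this slide can be sketched in a few lines of Python. This is only an illustration using the weights from the slide's example; the function name `io_proportions` is hypothetical and is not part of the cgroups interface.

```python
def io_proportions(weights):
    """Map each group's I/O weight to its share of the total weight."""
    total = sum(weights.values())
    return {group: weight / total for group, weight in weights.items()}


# Weights taken from the slide's example hierarchy.
shares = io_proportions({"Group 1": 100, "Group 2": 400, "Group 3": 500})
for group, share in shares.items():
    print(f"{group}: {share:.1f}")
```

With ideal proportional sharing, each group's bandwidth would follow these shares.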
Cgroups and the Block Layer
- The blkio subsystem controls I/O resources collaboratively with the block layer
- The I/O scheduler in the block layer uses the I/O weights when scheduling
- The weights apply to I/O service time (CFQ) or to the number of sectors to serve (BFQ)
[Figure: Containers 1 to 4 run on the operating system. Cgroups (Root, proportion 1.0; Group 1, weight 100, proportion 0.1; Group 2, weight 400, proportion 0.4; Group 3, weight 500, proportion 0.5) work with the block layer, which offers single-queue schedulers (NOOP, CFQ, deadline, …) and multi-queue schedulers (…) above the hardware (HDD, SSD).]
The Page Cache
- The page cache is often utilized to enhance I/O performance
- When possible, it serves I/O requests directly, without delivering them to the block layer
- Cgroups cannot control I/O requests that are serviced by the page cache
[Figure: I/O requests from Containers 1 to 4 are served and returned by the page cache in the operating system, before they reach Cgroups and the block layer.]
Buffered I/O vs. Direct I/O
[Figure: I/O bandwidth (MB/s) and normalized I/O bandwidth versus weight set by Cgroups (100, 200, 400, 800), for direct I/O and buffered I/O. Panels: fileserver workload and re-read workload.]
- Direct I/O
- Proportional I/O sharing according to I/O weight
- Lower performance due to bypassing the page cache
- Buffered I/O
- Poor proportionality
- Better performance due to the page cache
The Life of the Page Cache
- Page allocation
- Allocates a new page for the new page cache entry
- Qspinlock serializes page allocation
- Critical to the write performance
- Page reclamation
- Frees unused pages to make room for new pages
- Reclaims the pages at the tail of the inactive list
- Decides which pages will reside in the page cache
- Affects the read performance
Qspinlock of Page Allocation
- Qspinlock prevents race conditions
- Consists of a qspinlock and per-CPU qnodes
- Allows one CPU to hold the qspinlock while the head node (CPU2) busy-waits
- After the qspinlock is released, the head node acquires it
- FIFO-based holder selection
- The conventional qspinlock for page allocation selects the next holder in FIFO order
- No consideration of I/O weight
[Figure: An overview of qspinlock. APP 1 to APP 4 have weights 100, 200, 400, and 800. One CPU holds the qspinlock (locked) while the other CPUs busy-wait as qnodes in the lock waiting queue, linked by next pointers up to the tail.]
Page Reclamation
- Page cache
- Maintains a 2Q LRU
- Keeps frequently accessed data in the active list and the rest in the inactive list
- Reclaims pages at the tail of the inactive list
- Page reclamation
- Ignores the I/O weight during reclamation
- Pages used by higher-weighted apps can be evicted earlier
- No scheme to reflect I/O weight
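[Figure: An overview of page reclamation. Cgroups nodes: APP 1 (I/O weight 100), APP 2 (200), APP 3 (400), APP 4 (800). Each page in the inactive list of the page cache is labeled with the APP number that uses it; reclamation takes pages from the list without regard to that weight.]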
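The two-list behavior described above can be illustrated with a small Python model. This is a simplified sketch, not the kernel's implementation: the class `TwoListCache`, its methods, and the rule of promoting a page on its second access are assumptions made for illustration.

```python
from collections import OrderedDict


class TwoListCache:
    """Toy page cache with an inactive and an active LRU list."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.inactive = OrderedDict()  # first entry is the reclaim candidate
        self.active = OrderedDict()

    def access(self, page):
        if page in self.active:
            self.active.move_to_end(page)   # refresh recency
        elif page in self.inactive:
            del self.inactive[page]
            self.active[page] = True        # promote on re-access
        else:
            if len(self.active) + len(self.inactive) >= self.capacity:
                self.reclaim()
            self.inactive[page] = True      # new pages start inactive

    def reclaim(self):
        """Evict the oldest inactive page; fall back to the active list."""
        source = self.inactive if self.inactive else self.active
        victim, _ = source.popitem(last=False)
        return victim


cache = TwoListCache(capacity=3)
for page in ["a", "b", "a", "c", "d"]:
    cache.access(page)
print(list(cache.active), list(cache.inactive))
```

Nothing in this model looks at which application owns a page, which is the gap the slide points out: a page of a high-weight application is evicted as readily as any other.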
Justitia
- Problem #1: Cgroups focus on block-level I/O proportionality
- Problem #2: Page allocation and reclamation do not reflect I/O weight
- Justitia: new page cache management for application-level I/O proportionality
- A. Weight-aware Qspinlock for Page Cache Allocation
- B. Weight-aware Page Reclamation
Weight-awareness!!!
Weight-aware Qspinlock for Page Cache Allocation
- Weight-aware Qspinlock
- Stores weight in the qnode
- Reflects I/O weight by the following procedure
- 1. The qspinlock is released
- 2. Iterates over the lock waiting queue to find the qnode (maxNode) with the highest I/O weight
- 3. Moves the maxNode next to the head node
- 4. The next time the head node acquires the qspinlock, the maxNode becomes the head node
In short, Justitia reorders the lock waiting queue based on I/O weight
[Figure: An overview of weight-aware qspinlock. APP 1 to APP 4 run on CPU1 to CPU4. Each qnode in the lock waiting queue stores a weight (200, 400, 800 in the example) and a next pointer; waiting CPUs busy-wait while the queue is reordered by weight.]
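The reordering procedure can be sketched with a plain Python list standing in for the lock waiting queue. This is an illustration under stated assumptions, not Justitia's kernel code: the `QNode` class, the function name `reorder_on_release`, and the example queue order are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QNode:
    cpu: str
    weight: int


def reorder_on_release(queue):
    """Move the highest-weight waiter directly behind the head node.

    queue[0] is the head node, which acquires the lock next; the
    remaining entries wait behind it in arrival order.
    """
    if len(queue) < 3:
        return list(queue)  # nothing to reorder behind the head
    waiters = list(queue[1:])
    # max() returns the first maximum, so equal weights stay in FIFO order.
    max_idx = max(range(len(waiters)), key=lambda i: waiters[i].weight)
    max_node = waiters.pop(max_idx)
    return [queue[0], max_node] + waiters


queue = [QNode("CPU2", 200), QNode("CPU3", 400), QNode("CPU4", 800)]
queue = reorder_on_release(queue)
print([node.cpu for node in queue])
```

The head node is left in place because it is already spinning on the lock; only the nodes behind it change position, so the highest-weight waiter becomes the head once the current head takes the lock.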
- How about the starvation problem?
- When there are many high-weighted apps, the low-weighted apps can starve
- We adopt an aging technique to prevent starvation
- Whenever reordering occurs, Justitia increases the I/O weight of the qnodes in the lock waiting queue
- Justitia thus considers not only I/O weight but also waiting time
[Figure: The lock waiting queue with aging applied. Qnode weights shown in the example include 200, 300, 400, and 800.]
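The effect of aging can be illustrated with a small simulation. This is a sketch under assumptions: the increment `AGING_STEP` is hypothetical (the slides do not state the value), the `Waiter` class is made up for the example, and selection is simplified to "pick the highest effective weight" rather than modeling the head node.

```python
from dataclasses import dataclass

AGING_STEP = 100  # hypothetical increment; the slides do not give the value


@dataclass
class Waiter:
    name: str
    weight: int  # effective weight, raised by aging


def release_with_aging(waiters):
    """Pick the highest-weight waiter, then age everyone left behind."""
    idx = max(range(len(waiters)), key=lambda i: waiters[i].weight)
    chosen = waiters.pop(idx)
    for waiter in waiters:
        waiter.weight += AGING_STEP
    return chosen


# One low-weight waiter competes with a steady stream of high-weight ones.
waiters = [Waiter("low", 100), Waiter("high-0", 1000)]
rounds = 0
while True:
    rounds += 1
    chosen = release_with_aging(waiters)
    if chosen.name == "low":
        break
    waiters.append(Waiter(f"high-{rounds}", 1000))
print(f"low-weight waiter served after {rounds} releases")
```

Without the aging loop, the low-weight waiter in this scenario would never be selected; with it, the wait is bounded by how quickly the effective weight catches up.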
Preventing the Race Condition
Weight-aware Page Reclamation
Justitia imposes weight-awareness by the following procedures
- Calculating the I/O proportion of each application
- Recording page ownership information on the page structure
- Page reclamation considering the I/O proportion
Weight-aware Page Reclamation
- Calculating the I/O proportion of each application
- New variables in Cgroups are added
- Proportion: Proportion of I/O weight (weight / total weight)
- nrp_pages: The number of pages in the page cache that this cgroup is currently using
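[Figure: Cgroups nodes with their I/O weight and proportion: APP 1 (100, 0.07), APP 2 (200, 0.13), APP 3 (400, 0.27), APP 4 (800, 0.53). For example, the proportion of APP 1 is 100 / (100 + 200 + 400 + 800). Each node also holds nrp_pages.]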
Weight-aware Page Reclamation
- Recording page ownership information on the page structure
- New variables in the page structure
- I/O weight
- Pointer to the corresponding cgroups node
[Figure: On page allocation, the new page records the weight (100) and a pointer to the corresponding Cgroups node, APP 1 (I/O weight 100, proportion 0.07), whose nrp_pages goes from 0 to 1.]
Weight-aware Page Reclamation
- Page reclamation considering the I/O proportion
- Justitia reclaims pages whose cgroup holds more pages than its threshold
* Threshold = proportion × the total number of pages in the page cache
[Figure: An overview of weight-aware page reclamation. Each page in the inactive list carries a weight and a pointer to its Cgroups node. APP 1 to APP 4 have I/O weights 100, 200, 400, 800, proportions 0.07, 0.13, 0.27, 0.53, and nrp_pages 1, 6, 1, 2; reclaiming a page of APP 2 lowers its nrp_pages from 6 to 5.]
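The threshold rule can be sketched as a victim-selection function. This is a minimal illustration, not Justitia's kernel code: the function name `pick_victim`, the list-based inactive list, the example page order, and the fallback to the plain tail page when no owner is over its threshold are all assumptions. The weights and page counts follow the slide's example.

```python
def pick_victim(inactive, weights, nrp_pages):
    """Reclaim one page, scanning the inactive list from its tail.

    inactive:  list of page owners, last element is the tail
    weights:   owner -> I/O weight
    nrp_pages: owner -> number of pages currently held in the page cache
    """
    if not inactive:
        return None
    total_weight = sum(weights.values())
    total_pages = sum(nrp_pages.values())
    for idx in range(len(inactive) - 1, -1, -1):
        owner = inactive[idx]
        threshold = weights[owner] / total_weight * total_pages
        if nrp_pages[owner] > threshold:
            break  # this owner holds more than its share
    else:
        idx = len(inactive) - 1  # assumed fallback: plain tail page
    owner = inactive.pop(idx)
    nrp_pages[owner] -= 1
    return owner


weights = {"APP1": 100, "APP2": 200, "APP3": 400, "APP4": 800}
nrp_pages = {"APP1": 1, "APP2": 6, "APP3": 1, "APP4": 2}
inactive = ["APP2", "APP2", "APP4", "APP2", "APP3"]  # illustrative order
victim = pick_victim(inactive, weights, nrp_pages)
print(victim, nrp_pages[victim])
```

In this example the tail page belongs to APP3, which holds 1 page against a threshold of about 2.67, so it is skipped; the next page belongs to APP2, which holds 6 pages against a threshold of about 1.33, so that page is reclaimed instead.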
Experimental Setup
- CPU: Intel i7-6700
- Memory: 16GB DRAM
- Storage: SATA SSD 256GB
- Benchmarks: FIO (re-read) and Filebench (fileserver)
- Re-read workload: Read → Dummy write → Read
* All applications were containerized by Docker
- Proportionality Variation (PV): a metric to quantitatively measure I/O proportionality, introduced in [1]
Ref [1] J. Kim et al. "I/O scheduling schemes for better I/O proportionality on flash-based SSDs"
Evaluation (Fileserver)
- Compared with the conventional scheme, Justitia achieves better I/O proportionality
- Conventional: 1 : 1.51 : 2.02 : 2.40 : 2.63 : 2.71 : 3.07 : 3.31
- Justitia: 1 : 1.73 : 2.24 : 2.65 : 3.04 : 3.75 : 4.37 : 6.26
[Figure: Normalized I/O bandwidth versus weight set by Cgroups (100 to 800) for Ideal, Conventional, CPM, and Justitia.]
Evaluation (Aging Technique)
Extreme case: C1 has weight 100, and C2 to C8 each have weight 1000
- Justitia without aging: 1 : 12.57 : 13.31 : 11.72 : 12.443 : 13.31 : 12.77 : 13.35 (PV: 2.31)
- Justitia: 1 : 8.94 : 9.36 : 9.08 : 8.83 : 9.49 : 9.77 : 9.43 (PV: 0.64)
[Figure: Normalized I/O bandwidth of containers C1 to C8 for Ideal, Justitia w/o Aging, and Justitia.]
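The slides do not spell out how PV is computed. As a rough sketch, assume PV is the mean absolute difference between the ideal and the measured bandwidth, both normalized to the first container; the function name `proportionality_variation` is hypothetical, and the exact definition should be checked against [1].

```python
def proportionality_variation(weights, bandwidths):
    """Mean absolute gap between ideal and measured normalized bandwidth.

    Assumed definition: both series are normalized to the first entry.
    """
    ideal = [w / weights[0] for w in weights]
    measured = [b / bandwidths[0] for b in bandwidths]
    gaps = [abs(i - m) for i, m in zip(ideal, measured)]
    return sum(gaps) / len(gaps)


# C1 has weight 100; C2 to C8 have weight 1000 (ideal ratio 1 : 10 : ... : 10).
weights = [100] + [1000] * 7
justitia = [1, 8.94, 9.36, 9.08, 8.83, 9.49, 9.77, 9.43]
pv = proportionality_variation(weights, justitia)
print(f"PV = {pv:.4f}")
```

Under this assumed definition, the Justitia ratios listed above give a PV of roughly 0.64, in line with the value reported on the slide.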
Evaluation (Re-read)
- Justitia achieves better I/O proportionality than the other cases
- PV of Conventional: 1.4
- PV of Justitia: 0.33
- PV of Direct I/O: 0.61
[Figure: Normalized I/O bandwidth versus weight set by Cgroups (100, 200, 400, 800) for Ideal, Conv, CPM, Direct, and Justitia.]
Conclusion
- Cgroups support only block-level I/O proportionality, rather than application-level I/O proportionality
- The conventional page cache management does not consider I/O weight in either page allocation or reclamation
- Justitia: a new page cache management for application-level I/O proportionality
- Weight-aware qspinlock for page allocation
- Weight-aware page reclamation
- Justitia is available at github.com/kzeoh/Justitia.git