FRD: A Filtering based Buffer Cache Algorithm that Considers both Frequency and Reuse Distance


33rd International Conference on Massive Storage Systems and Technology (MSST 2017), Santa Clara, CA, May 15-19, 2017. FRD: A Filtering based Buffer Cache Algorithm that Considers both Frequency and Reuse Distance. Sejin Park* and Chanik Park**


SLIDE 1

FRD: A Filtering based Buffer Cache Algorithm that Considers both Frequency and Reuse Distance

Sejin Park* and Chanik Park**

  • 2017. 05. 18

*SK telecom Corporate R&D Center | Network IT Convergence R&D Center, New Computing Lab
**POSTECH Department of Computer Science and Engineering, System Software Lab

33rd International Conference on Massive Storage Systems and Technology (MSST 2017), Santa Clara, CA, May 15-19, 2017

SLIDE 2

Table of Contents

  • Motivation
  • Workload Analysis and Observations
  • Design
  • Evaluation
  • Summary

SLIDE 3

Motivation

  • Buffer cache management is one of the oldest topics in computer science
  • Existing buffer cache algorithms concentrate on how to retain meaningful blocks

– LRU, LFU, OPT, …
– LIRS (ACM SIGMETRICS 2002, S. Jiang et al.)

  • Two LRU Stacks (LIRS, HIRS)

– Reuse distance ordering

– ARC (USENIX FAST 03, Megiddo et al.)

  • Two LRU Stacks (Recency-T1, Frequency-T2)

– Adaptive resizing

  • In this study, we concentrate on how to exclude cache-unfriendly blocks

– We analyzed real-world workloads and identified the characteristics of cache-unfriendly blocks

SLIDE 4

Example: LRU

  • Depending on the eviction policy, blocks that cause cache pollution can be kept in cache space
  • LRU assumes that recently used blocks will produce more cache hits

– If the recently used blocks are infrequently accessed and rarely reused, they cause cache pollution!

[Figure: an LRU stack from MRU to LRU holding infrequently accessed (cache-unfriendly) blocks, marked I, and frequently accessed blocks, marked F, with new block insertion and eviction.]
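The pollution effect described on this slide can be reproduced with a few lines of simulation. The following is a minimal sketch, not taken from the presentation: the `LRUCache` class, the `run` helper, and the synthetic trace are all hypothetical, chosen so that bursts of one-time blocks (I) flush the frequently accessed blocks (F) before they are reused.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity LRU cache that only tracks block IDs."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # last item = MRU, first item = LRU

    def access(self, block):
        """Return True on a hit, False on a miss."""
        if block in self.blocks:
            self.blocks.move_to_end(block)  # refresh recency
            return True
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # evict the LRU block
        self.blocks[block] = None
        return False


def run(trace, capacity):
    """Replay a trace against a fresh LRU cache and count the hits."""
    cache = LRUCache(capacity)
    return sum(cache.access(block) for block in trace)


# Hypothetical trace: four hot blocks (F) re-read between bursts of
# one-time blocks (I), each burst long enough to flush the cache.
hot = ["F0", "F1", "F2", "F3"]
trace = []
for round_no in range(5):
    trace += hot
    trace += [f"I{round_no}_{i}" for i in range(4)]

polluted_hits = run(trace, capacity=4)
clean_hits = run([b for b in trace if b.startswith("F")], capacity=4)
```

With the one-time blocks in the trace, the hot blocks never hit; with them filtered out, every re-read of a hot block hits. Filtering such blocks before they displace useful ones is the idea the rest of the talk develops.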

SLIDE 5

Example: ARC

  • The recency buffer T1 and the frequency buffer T2 in ARC each work as an LRU cache
  • If a block is reused, it moves into T2 even if it is an infrequently accessed block

– This can cause cache pollution in T2

[Figure: ARC with LRU stacks T1 and T2, each ordered MRU to LRU with its own eviction point and history buffer. Shown flows: new block insertion and reused block insertion (moved from T1 or T2). I = infrequently accessed (cache-unfriendly) block, F = frequently accessed block.]

SLIDE 6

Workload Description

  • Real-world workloads downloaded from SNIA.

Name     Type          Description
OLTP     Application   Online transaction processing
Web12    Web server    A typical retail shop
Web07    Web server    A typical retail shop
prxy_0   Data center   Firewall/web proxy
wdev_0   Data center   Test web server
hm_0     Data center   Hardware monitoring
proj_0   Data center   Project directories
proj_3   Data center   Project directories
src1_2   Data center   Source control

SLIDE 7

Workload Analysis

  • Reuse Distance Distribution
  • Reuse distance: the number of unique blocks accessed between two consecutive requests to the same block

[Figure: reuse distance distribution per workload. x axis: reuse distance; y axes: number of accesses and CDF (percentage). Panels: (a) OLTP (b) Web12 (c) Web07 (d) prxy_0 (e) wdev_0 (f) hm_0 (g) proj_0 (h) proj_3 (i) src1_2]
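The reuse distance definition on this slide translates directly into code. The sketch below is an illustration only; the function name `reuse_distances` and the list-of-IDs trace format are assumptions, and the quadratic scan is chosen for clarity rather than speed.

```python
def reuse_distances(trace):
    """Return a list of (block, distance) pairs, one per re-reference.

    distance = number of unique blocks accessed between the previous
    request to a block and the current one. First-time accesses have
    no reuse distance and produce no entry.
    """
    last_seen = {}  # block -> index of its most recent access
    result = []
    for i, block in enumerate(trace):
        if block in last_seen:
            between = trace[last_seen[block] + 1:i]
            result.append((block, len(set(between))))
        last_seen[block] = i
    return result


# "b" is re-read after one other block (c); "a" after two (b and c).
example = reuse_distances(["a", "b", "c", "b", "a"])
```

For long traces a production tool would replace the inner slice with a tree-based or stack-based distance structure; the naive version is enough to show what is being measured.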

SLIDE 8

Workload Analysis

  • CDF of the access count of each block

[Figure: CDF (percentage) of per-block access count. x axis: access count of each block. Panels: (a) OLTP (b) Web12 (c) Web07 (d) prxy_0 (e) wdev_0 (f) hm_0 (g) proj_0 (h) proj_3 (i) src1_2]

SLIDE 9

Workload Analysis

  • Observation #1: Most blocks (about 50 – 90%) are infrequently accessed in real-world workloads.

[Figure: CDF (percentage) of per-block access count. x axis: access count of each block. Panels: (a) OLTP (b) Web12 (c) Web07 (d) prxy_0 (e) wdev_0 (f) hm_0 (g) proj_0 (h) proj_3 (i) src1_2. Per-panel annotations as extracted, each marked at x = 3: 80%, 74%, 88%, 55%, 80%, 34%, 70%, 54%, 74%.]
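Observation #1 is a statement about per-block access counts, which can be measured as follows. This is a sketch under assumptions: the function name is hypothetical, and the default threshold of 3 accesses is only a guess based on the "3" markers in the figure, not a value the slides define.

```python
from collections import Counter


def infrequent_fraction(trace, max_accesses=3):
    """Fraction of distinct blocks accessed at most `max_accesses` times.

    max_accesses=3 is an assumed cutoff for "infrequently accessed".
    """
    counts = Counter(trace)
    if not counts:
        return 0.0
    infrequent = sum(1 for n in counts.values() if n <= max_accesses)
    return infrequent / len(counts)


# One hot block and three blocks touched once or twice.
fraction = infrequent_fraction(["a"] * 10 + ["b", "c", "d", "b"])
```

Here three of the four distinct blocks fall at or below the cutoff, so a small set of hot blocks coexists with a majority of rarely touched ones, which is the shape the observation describes.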

SLIDE 10

Workload Analysis

  • CDF of the reuse distance distribution for the infrequently accessed blocks (expressed as a percentage of cache size)

[Figure: CDF (percentage) of reuse distance for infrequently accessed blocks. x axis: reuse distance (as a percentage of the given cache size). Panels: (a) OLTP (b) Web12 (c) Web07 (d) prxy_0 (e) wdev_0 (f) hm_0 (g) proj_0 (h) proj_3 (i) src1_2]

SLIDE 11

Workload Analysis

  • Observation #2: Reuse distance for the infrequently accessed blocks is extremely long or extremely short

– In terms of cache size: reuse distances under 10% and over 100% of the cache size are dominant

[Figure: CDF (percentage) of reuse distance for infrequently accessed blocks. x axis: reuse distance (as a percentage of the given cache size). Panels: (a) OLTP (b) Web12 (c) Web07 (d) prxy_0 (e) wdev_0 (f) hm_0 (g) proj_0 (h) proj_3 (i) src1_2. Per-panel annotations as extracted: 90%, 90%, 94%, 96%, 88%, 98%, 98%, 98%, 98%.]
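The "under 10% and over 100% of cache size" split can be checked on any list of reuse distances. The sketch below assumes distances and cache size are both counted in blocks; the function name and the returned dictionary keys are hypothetical.

```python
def reuse_distance_bands(distances, cache_size_blocks):
    """Share of reuse distances that are short, long, or in between.

    short:  under 10% of the cache size
    long:   over 100% of the cache size
    middle: everything else
    """
    total = len(distances)
    if total == 0:
        return {"short": 0.0, "middle": 0.0, "long": 0.0}
    short = sum(1 for d in distances if d < 0.10 * cache_size_blocks)
    long_ = sum(1 for d in distances if d > 1.00 * cache_size_blocks)
    return {
        "short": short / total,
        "middle": (total - short - long_) / total,
        "long": long_ / total,
    }


# Made-up distances against a cache of 100 blocks.
bands = reuse_distance_bands([1, 5, 9, 50, 150, 300, 1000, 2], 100)
```

If the observation holds for a workload, the "short" and "long" shares together dominate and the "middle" share stays small.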

SLIDE 12

Observations

  • Observation #1: Most blocks are infrequently accessed in real-world workloads

– These blocks are cache-unfriendly blocks that cause cache pollution

  • Observation #2: Reuse distance for the infrequently accessed blocks is extremely long or extremely short

– The cache-unfriendly blocks have distinct characteristics

  • Therefore,

– “Frequency” and “Reuse distance” are the key metrics for filtering out cache-unfriendly blocks

SLIDE 13

Design

  • Block Classification

Class          Accessing Frequency   Reuse Distance   Cache-Hit Target   Cache Pollution (Filtering Target)
Class 1 (FS)   Frequent              Short            V                  -
Class 2 (FL)   Frequent              Long             V                  -
Class 3 (IS)   Infrequent            Short            V                  V
Class 4 (IL)   Infrequent            Long             -                  V

  • Design Goal

– Maintain Class 1 and 2 blocks in the cache
– Maintain Class 3 blocks while preventing them from polluting the cache
– Filter Class 4 blocks out of the cache
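The four classes on this slide can be written as a small lookup. This is a sketch for illustration: the slides do not give numeric cutoffs, so `freq_threshold=3` and "short means within the cache size" are assumptions, and the function and set names are hypothetical. Only the class labels and the two table columns come from the slide.

```python
# From the classification table: which classes are worth caching,
# and which ones can pollute the cache.
CACHE_HIT_TARGET = {"FS", "FL", "IS"}
FILTERING_TARGET = {"IS", "IL"}


def classify(access_count, reuse_distance, cache_size_blocks,
             freq_threshold=3):
    """Map a block to FS, FL, IS, or IL.

    Assumed cutoffs (not from the slides):
      frequent = accessed more than freq_threshold times
      short    = reuse distance no larger than the cache size
    """
    freq = "F" if access_count > freq_threshold else "I"
    dist = "S" if reuse_distance <= cache_size_blocks else "L"
    return freq + dist


label = classify(access_count=1, reuse_distance=500, cache_size_blocks=100)
should_cache = label in CACHE_HIT_TARGET
```

Class 3 (IS) is the only class that appears in both sets, which is why the design goal treats it separately: keep it, but in a place where it cannot displace Class 1 and 2 blocks.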

SLIDE 14

FRD Algorithm

  • A Filtering based Buffer Cache Algorithm that Considers both Frequency and Reuse Distance

[Figure: FRD structure. Two LRU stacks, the filter stack and the reuse distance (RD) stack, each run from MRU to LRU and each have their own eviction point. Legend: resident block, history block. Numbered flows in the diagram: 1. New entry insertion; 2. Resident block insertion; 3. History block insertion; 4. Cache hit; 5. Cache miss; 6. Cache hit.]

  • Parameter = FilterStack (%) (Default = 10%)
  • If the RD stack is not full, a new entry is inserted into the RD stack.
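The slide gives FRD only as a diagram, so the following is one possible reading of it rather than the authors' implementation. Assumptions, all hypothetical: new blocks enter the filter stack; blocks evicted from the filter stack leave a history entry; a miss on a history entry inserts the block into the RD stack; blocks evicted from the RD stack are simply dropped; and the history holds at most as many entries as the cache has blocks. The class name and method names are invented for this sketch.

```python
from collections import OrderedDict


class FilteringCacheSketch:
    """Two-stack filtering cache loosely modeled on the FRD diagram."""

    def __init__(self, capacity, filter_pct=10):
        assert capacity >= 2, "need room for both stacks"
        self.filter_cap = max(1, capacity * filter_pct // 100)
        self.rd_cap = capacity - self.filter_cap
        self.history_cap = capacity      # assumption: bounded by cache size
        self.filter = OrderedDict()      # resident blocks; last item = MRU
        self.rd = OrderedDict()          # resident blocks; last item = MRU
        self.history = OrderedDict()     # IDs only, evicted from the filter

    def access(self, block):
        """Return True on a cache hit, False on a miss."""
        if block in self.rd:             # hit in the RD stack
            self.rd.move_to_end(block)
            return True
        if block in self.filter:         # hit in the filter stack
            self.filter.move_to_end(block)
            return True
        if block in self.history:        # miss, but seen before: promote
            del self.history[block]
            self._insert_rd(block)
        elif len(self.rd) < self.rd_cap: # RD stack not full yet
            self._insert_rd(block)
        else:                            # new block goes through the filter
            self._insert_filter(block)
        return False

    def _insert_rd(self, block):
        if len(self.rd) >= self.rd_cap:
            self.rd.popitem(last=False)  # assumption: RD victims are dropped
        self.rd[block] = None

    def _insert_filter(self, block):
        if len(self.filter) >= self.filter_cap:
            victim, _ = self.filter.popitem(last=False)
            self.history[victim] = None  # keep metadata only
            if len(self.history) > self.history_cap:
                self.history.popitem(last=False)
        self.filter[block] = None


cache = FilteringCacheSketch(capacity=10)       # filter = 1 block, RD = 9
hot = [f"h{i}" for i in range(9)]
for b in hot:
    cache.access(b)                             # warm-up fills the RD stack
for i in range(100):
    cache.access(f"s{i}")                       # one-time scan
hot_hits_after_scan = sum(cache.access(b) for b in hot)
```

In this sketch the 100-block scan passes through the one-block filter stack and never touches the RD stack, so all nine hot blocks still hit afterwards. A plain LRU cache of the same size would have lost them.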

SLIDE 15

Analysis of FRD Algorithm

[Figure: the FRD diagram (filter stack and reuse distance stack, each MRU to LRU with its own eviction point; flows for new entry, resident block insertion, history block insertion, cache hit, and cache miss; legend: resident block, history block) annotated with the block classes that follow each flow. Class labels as extracted: Class 1 (FS), Class 2 (FL), Class 3 (IS), Class 4 (IL), Class 1,2,3,4, Class 3 (IS), Class 2 (FL), Class 2 (FL), Class 1 (FS), Class 2 (FL), Class 4 (IL), Class 3 (IS), Class 1,3,4.]

  • Parameter = FilterStack (%) (Default = 10%)
  • If the RD stack is not full, a new entry is inserted into the RD stack.
SLIDE 16

Evaluation

  • Environment

– Simulation-based evaluation
– Compared with OPT, LRU, ARC, LIRS
SLIDE 17

Hit Ratio Result

  • Case of LIRS' unstable hit ratio result

[Figure: hit ratio versus cache size (MB) for each workload, with a legend. Callouts in the figure: "FRD is highest", "LIRS is highest", "ARC is highest", "ARC is unstable", "LIRS is unstable".]
SLIDE 18

Hit Ratio Result

[Figure: hit ratio versus cache size (MB) for each workload, with a legend. Callouts in the figure: "FRD is highest", "LIRS is highest", "ARC is highest", "ARC is unstable", "LIRS is unstable".]
SLIDE 19

Hit Ratio Result

  • Case of ARC's unstable hit ratio result

[Figure: hit ratio versus cache size (MB) for each workload, with a legend. Callouts in the figure: "FRD is highest", "LIRS is highest", "ARC is highest", "ARC is unstable", "LIRS is unstable".]
SLIDE 20

Evaluation

  • Overall average result (normalized so that OPT's hit ratio is 1.0)

Workload   LRU     ARC     LIRS    FRD
OLTP       0.674   0.746   0.691   0.753
Web12      0.829   0.852   0.827   0.857
Web07      0.800   0.839   0.812   0.847
prxy_0     0.844   0.870   0.870   0.898
wdev_0     0.647   0.723   0.728   0.745
hm_0       0.598   0.700   0.723   0.724
proj_0     0.612   0.722   0.740   0.780
proj_3     0.172   0.241   0.516   0.478
src1_2     0.620   0.697   0.799   0.813
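The table can be summarized per algorithm and per workload with a short script. The numbers below are copied from the table; the helper names `mean_by_algorithm` and `best_per_workload` are hypothetical, and the unweighted mean across workloads is just one reasonable way to aggregate.

```python
# Hit ratios normalized to OPT = 1.0, copied from the table.
RESULTS = {
    "OLTP":   {"LRU": 0.674, "ARC": 0.746, "LIRS": 0.691, "FRD": 0.753},
    "Web12":  {"LRU": 0.829, "ARC": 0.852, "LIRS": 0.827, "FRD": 0.857},
    "Web07":  {"LRU": 0.800, "ARC": 0.839, "LIRS": 0.812, "FRD": 0.847},
    "prxy_0": {"LRU": 0.844, "ARC": 0.870, "LIRS": 0.870, "FRD": 0.898},
    "wdev_0": {"LRU": 0.647, "ARC": 0.723, "LIRS": 0.728, "FRD": 0.745},
    "hm_0":   {"LRU": 0.598, "ARC": 0.700, "LIRS": 0.723, "FRD": 0.724},
    "proj_0": {"LRU": 0.612, "ARC": 0.722, "LIRS": 0.740, "FRD": 0.780},
    "proj_3": {"LRU": 0.172, "ARC": 0.241, "LIRS": 0.516, "FRD": 0.478},
    "src1_2": {"LRU": 0.620, "ARC": 0.697, "LIRS": 0.799, "FRD": 0.813},
}


def mean_by_algorithm(results):
    """Unweighted mean of the normalized hit ratio across workloads."""
    algorithms = next(iter(results.values())).keys()
    return {
        algo: sum(row[algo] for row in results.values()) / len(results)
        for algo in algorithms
    }


def best_per_workload(results):
    """Name of the algorithm with the highest ratio for each workload."""
    return {name: max(row, key=row.get) for name, row in results.items()}


means = mean_by_algorithm(RESULTS)
winners = best_per_workload(RESULTS)
```

Read this way, FRD has the highest normalized hit ratio on eight of the nine workloads, and LIRS leads on proj_3.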

SLIDE 21

Parameter Sensitivity (Size of the Filter stack)

  • Variation of filter stack size from 1% to 25% of cache size.
  • 10% shows the best performance on average but the difference is negligible.

SLIDE 22

Summary

  • FRD: A Filtering based Buffer Cache Algorithm that Considers both Frequency and Reuse Distance

– A new buffer cache algorithm that filters out cache-unfriendly blocks
– Careful analysis of real-world workloads reveals the characteristics of cache-unfriendly blocks
– The experimental results show that it outperforms state-of-the-art cache algorithms such as ARC or LIRS

SLIDE 23

Backup slides

SLIDE 24

Hit Ratio Analysis

  • Filter stack performance
SLIDE 25

Revisiting LIRS and ARC

Legend (both diagrams): HIT, MISS, NEW ENTRY, eviction flow, metadata (history).

LIRS (HIR stack + LIR stack = c, ratio 1:99)

[Figure: LIRS with a HIR stack and a LIR stack holding resident blocks plus non-resident metadata. New entries are shown entering the structure; non-resident entries are kept until RMAX.]

ARC (initial: |T1| = |T2| = |B1| = |B2| = 0, p = 0), with |T1| + |T2| + |B1| + |B2| <= 2c

[Figure: ARC with resident lists T1 and T2, history lists B1 and B2, the adaptive target p, new entry insertion, and Replace(p).]

Update of p, each followed by Replace(p):

  On a hit in B1: p = min{c, p + max{|B2|/|B1|, 1}}
  On a hit in B2: p = max{0, p - max{|B1|/|B2|, 1}}

Subroutine Replace(p):
  if (|T1| ≥ 1) and ((x ∈ B2 and |T1| = p) or (|T1| > p)) then
    move the LRU page of T1 to the top of B1 and remove it from the cache.
  else
    move the LRU page in T2 to the top of B2 and remove it from the cache.
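The p update and the Replace(p) subroutine on this slide can be sketched in code as follows. This covers only those two pieces, not a complete ARC cache: the class name `ARCLists` and its method names are hypothetical, and the surrounding hit and miss handling that decides when to call them is left out.

```python
from collections import OrderedDict


class ARCLists:
    """The four ARC lists, the target p, and the two steps on the slide."""

    def __init__(self, c):
        self.c = c
        self.p = 0
        # In every list the last item is the MRU ("top"), the first the LRU.
        self.t1, self.t2 = OrderedDict(), OrderedDict()  # resident
        self.b1, self.b2 = OrderedDict(), OrderedDict()  # history only

    def adapt_on_b1_hit(self):
        """Grow the T1 target. The requested block is in B1, so |B1| >= 1."""
        delta = max(len(self.b2) / len(self.b1), 1)
        self.p = min(self.c, self.p + delta)

    def adapt_on_b2_hit(self):
        """Shrink the T1 target. The requested block is in B2, so |B2| >= 1."""
        delta = max(len(self.b1) / len(self.b2), 1)
        self.p = max(0, self.p - delta)

    def replace(self, x):
        """Evict one resident page, keeping its ID in the matching history."""
        t1_len = len(self.t1)
        if t1_len >= 1 and ((x in self.b2 and t1_len == self.p)
                            or t1_len > self.p):
            victim, _ = self.t1.popitem(last=False)  # LRU page of T1
            self.b1[victim] = None                   # top of B1
        else:
            # Assumes T2 is non-empty, as it is when ARC calls Replace.
            victim, _ = self.t2.popitem(last=False)  # LRU page of T2
            self.b2[victim] = None                   # top of B2
        return victim


arc = ARCLists(c=4)
arc.t1.update({"a": None, "b": None})
arc.t2.update({"c": None, "d": None})
first_victim = arc.replace("x")   # |T1| = 2 > p = 0, so T1 gives up a page
arc.adapt_on_b1_hit()             # B1 now holds one ID; p grows by 1
```

Because p is the target size of T1, a hit in B1 (a page recently pushed out of T1) raises p, while a hit in B2 lowers it. This is the adaptive resizing that the comparison on the next slide marks as present in ARC and absent in LIRS and FRD.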

SLIDE 26

Design comparison with ARC and LIRS

                    ARC                                 LIRS                                    FRD
# of LRU stacks     Two                                 Two                                     Two
Adaptive resizing   O                                   X                                       X
Eviction point      Two (two LRU stacks are isolated)   One (two LRU stacks are not isolated)   Two (two LRU stacks are isolated)
History size        Cache size x 2                      Max resident block                      Max resident block