FRD: A Filtering based Buffer Cache Algorithm that Considers both Frequency and Reuse Distance

33rd International Conference on Massive Storage Systems and Technology (MSST 2017)
Santa Clara, CA, May 15-19, 2017

FRD: A Filtering based Buffer Cache Algorithm that Considers both Frequency and Reuse Distance
Sejin Park* and Chanik Park**
2017. 05. 18
* SK telecom Corporate R&D Center | Network IT Convergence R&D Center, New Computing Lab
** POSTECH Department of Computer Science and Engineering, System Software Lab

Table of Contents
• Motivation
• Workload Analysis and Observations
• Design
• Evaluation
• Summary

Motivation
• Buffer cache management is one of the oldest topics in computer science
• Existing buffer cache algorithms concentrate on how to maintain meaningful blocks
  – LRU, LFU, OPT, …
  – LIRS (ACM SIGMETRICS 2002, S. Jiang et al.)
    • Two LRU stacks (LIRS, HIRS)
    • Reuse distance ordering
  – ARC (USENIX FAST 03, Megiddo et al.)
    • Two LRU stacks (Recency-T1, Frequency-T2)
    • Adaptive resizing
• In this study, we concentrate on how to exclude the cache-unfriendly blocks
  – We analyzed real-world workloads and found the characteristics of cache-unfriendly blocks

Example: LRU
• Depending on the eviction policy, blocks that cause cache pollution can be kept in cache space
• LRU assumes that recently used blocks will produce more cache hits
  – If the recently used blocks are infrequently accessed and rarely reused, they cause cache pollution!
[Figure: an LRU list from MRU (new block insertion) to LRU (eviction), filled mostly with I blocks and a few F blocks. I = infrequently accessed block (cache-unfriendly block); F = frequently accessed block.]

Example: ARC
• The recency buffer T1 and the frequency buffer T2 in ARC each work as an LRU cache
• If a block is reused, it moves into T2 even if it is an infrequently accessed block
  – This can cause cache pollution in T2
[Figure: T1 and T2, each ordered from MRU to LRU and each followed by a history buffer. New blocks are inserted into T1; reused blocks (moved from T1 or T2) are inserted into T2. I = infrequently accessed block (cache-unfriendly block); F = frequently accessed block.]

Workload Description
• Real-world workloads downloaded from SNIA

Name     Type          Description
OLTP     Application   Online transaction processing
Web12    Web server    A typical retail shop
Web07    Web server    A typical retail shop
prxy_0   Data center   Firewall/web proxy
wdev_0   Data center   Test web server
hm_0     Data center   Hardware monitoring
proj_0   Data center   Project directories
proj_3   Data center   Project directories
src1_2   Data center   Source control

Workload Analysis
• Reuse Distance Distribution
  – Reuse distance: the number of unique blocks accessed between two requests to the same block
[Figure: nine panels, (a) OLTP, (b) Web12, (c) Web07, (d) prxy_0, (e) wdev_0, (f) hm_0, (g) proj_0, (h) proj_3, (i) src1_2. X axis: reuse distance. Y axes: CDF (percentage) and # of accesses.]
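The definition above can be illustrated with a small sketch. This is not the authors' analysis tool; the function name `reuse_distances` and the list-based LRU stack are assumptions chosen for readability (each access costs O(n), which is fine for a toy trace but not for a full SNIA trace).

```python
def reuse_distances(trace):
    """Return the reuse distance of each access; first-time accesses yield None."""
    stack = []       # LRU stack, most recently used block at the end
    distances = []
    for block in trace:
        if block in stack:
            pos = stack.index(block)
            # Blocks above `pos` are the unique blocks seen since the last access.
            distances.append(len(stack) - 1 - pos)
            stack.pop(pos)
        else:
            distances.append(None)
        stack.append(block)
    return distances


print(reuse_distances(["a", "b", "c", "a", "b", "b"]))
```

In this toy trace the second access to "a" has two unique blocks ("b" and "c") in between, and the back-to-back access to "b" has none.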

Workload Analysis
• CDF of the access count of each block
[Figure: nine panels, (a) OLTP, (b) Web12, (c) Web07, (d) prxy_0, (e) wdev_0, (f) hm_0, (g) proj_0, (h) proj_3, (i) src1_2. X axis: access count of each block. Y axis: CDF (percentage).]

Workload Analysis
• Observation #1: Most blocks (about 50 – 90%) are infrequently accessed in the real-world workloads
[Figure: CDF of the access count of each block, with the CDF value annotated at the marked point (3 on the x axis) in each panel, listed in panel order: (a) OLTP 88%, (b) Web12 80%, (c) Web07 80%, (d) prxy_0 74%, (e) wdev_0 55%, (f) hm_0 74%, (g) proj_0 70%, (h) proj_3 54%, (i) src1_2 34%.]
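A minimal sketch of how the share of infrequently accessed blocks could be measured on a trace. The function name and the default `threshold=3` are assumptions; the value 3 is taken from the point marked on the x axis of the figure, and the slides do not state the exact rule the authors used.

```python
from collections import Counter


def infrequent_fraction(trace, threshold=3):
    """Fraction of distinct blocks accessed at most `threshold` times (assumed rule)."""
    counts = Counter(trace)
    if not counts:
        return 0.0
    infrequent = sum(1 for n in counts.values() if n <= threshold)
    return infrequent / len(counts)


# Toy trace: "a" and "d" are accessed more than 3 times, "b" and "c" are not.
trace = ["a"] * 5 + ["b"] * 2 + ["c"] + ["d"] * 4
print(infrequent_fraction(trace))
```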

Workload Analysis
• CDF of the reuse distance of the infrequently accessed blocks (expressed as a percentage of cache size)
[Figure: nine panels, (a) OLTP, (b) Web12, (c) Web07, (d) prxy_0, (e) wdev_0, (f) hm_0, (g) proj_0, (h) proj_3, (i) src1_2. X axis: reuse distance (as a percentage of the given cache size). Y axis: CDF (percentage).]

Workload Analysis
• Observation #2: The reuse distance of the infrequently accessed blocks is either extremely long or extremely short
  – In terms of cache size: distances under 10% and over 100% of cache size are dominant
[Figure: CDF of reuse distance (as a percentage of the given cache size) with one annotated percentage per panel, listed in panel order: (a) OLTP 88%, (b) Web12 96%, (c) Web07 90%, (d) prxy_0 94%, (e) wdev_0 90%, (f) hm_0 98%, (g) proj_0 98%, (h) proj_3 98%, (i) src1_2 98%.]

Observations
• Observation #1: Most blocks are infrequently accessed in the real-world workloads
  – These blocks are cache-unfriendly blocks that cause cache pollution
• Observation #2: The reuse distance of the infrequently accessed blocks is either extremely long or extremely short
  – The cache-unfriendly blocks have distinct characteristics
• Therefore,
  – “Frequency” and “Reuse distance” are the key metrics for filtering out the cache-unfriendly blocks

Design
• Block Classification

Class          Accessing frequency   Reuse distance   Cache-hit target   Cache pollution (filtering target)
Class 1 (FS)   Frequent              Short            V                  -
Class 2 (FL)   Frequent              Long             V                  -
Class 3 (IS)   Infrequent            Short            V                  V
Class 4 (IL)   Infrequent            Long             -                  V

• Design Goal
  – Maintain Class 1 and 2 blocks in the cache
  – Maintain Class 3 blocks but prevent them from polluting the cache
  – Filter Class 4 blocks out of the cache
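The four classes in the table can be sketched as a small function. The cut-offs are assumptions, not values stated on the slide: `freq_threshold=3` mirrors the point marked in the access-count figure, and `short_fraction=0.10` treats a reuse distance of up to 10% of the cache size as "short", following Observation #2.

```python
def classify(access_count, reuse_distance, cache_size,
             freq_threshold=3, short_fraction=0.10):
    """Return "FS", "FL", "IS" or "IL" for one block (thresholds are assumed)."""
    frequent = access_count > freq_threshold
    short = reuse_distance <= short_fraction * cache_size
    return ("F" if frequent else "I") + ("S" if short else "L")


# Cache-hit target and filtering target per class, as listed in the table.
CACHE_HIT_TARGET = {"FS": True, "FL": True, "IS": True, "IL": False}
FILTERING_TARGET = {"FS": False, "FL": False, "IS": True, "IL": True}

label = classify(access_count=2, reuse_distance=500, cache_size=100)
print(label, CACHE_HIT_TARGET[label], FILTERING_TARGET[label])
```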

FRD Algorithm - A Filtering based Buffer Cache Algorithm that Considers both Frequency and Reuse Distance
• Parameter = FilterStack (%) (Default = 10%)
[Figure: the filter stack and the reuse distance (RD) stack, each ordered from MRU to LRU and holding resident blocks and history blocks. Numbered labels 1 to 6 mark new entry insertion, resident block eviction, history block insertion and eviction, a cache hit (4), a cache miss (5), and a cache hit (6).]
* If the RD stack is not full, a new entry is inserted into the RD stack.
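The following is a simplified sketch of one way to read the two-stack design, not the authors' implementation. Assumptions: new blocks enter the filter stack once the RD stack is full; blocks evicted from the filter stack leave a history entry in the RD stack; a miss on a history entry promotes the block to a resident RD-stack block. The class name `FRDCache`, the bound on history entries (`history_cap`), and the victim selection in the RD stack are all hypothetical choices.

```python
from collections import OrderedDict


class FRDCache:
    """Simplified two-stack sketch; both stacks are ordered from LRU to MRU."""

    def __init__(self, capacity, filter_ratio=0.10):
        if capacity < 2:
            raise ValueError("capacity must be at least 2")
        self.filter_cap = max(1, int(capacity * filter_ratio))
        self.rd_cap = capacity - self.filter_cap
        self.history_cap = capacity      # assumed bound on history entries
        self.filter = OrderedDict()      # block -> None (resident blocks only)
        self.rd = OrderedDict()          # block -> True (resident) / False (history)
        self.rd_resident = 0
        self.rd_history = 0

    def access(self, block):
        """Return True on a cache hit, False on a miss."""
        if block in self.filter:         # hit in the filter stack
            self.filter.move_to_end(block)
            return True
        if block in self.rd:
            if self.rd[block]:           # hit on a resident RD-stack block
                self.rd.move_to_end(block)
                return True
            del self.rd[block]           # miss on a history block: promote it
            self.rd_history -= 1
            self._insert_rd(block)
            return False
        if self.rd_resident < self.rd_cap:
            self._insert_rd(block)       # RD stack not full yet
        else:
            self._insert_filter(block)   # new blocks are filtered first
        return False

    def _insert_rd(self, block):
        if self.rd_resident >= self.rd_cap:
            # Evict the least recently used resident block of the RD stack.
            victim = next(b for b, resident in self.rd.items() if resident)
            del self.rd[victim]
            self.rd_resident -= 1
        self.rd[block] = True
        self.rd_resident += 1

    def _insert_filter(self, block):
        if len(self.filter) >= self.filter_cap:
            victim, _ = self.filter.popitem(last=False)
            self.rd[victim] = False      # keep only metadata as a history entry
            self.rd_history += 1
            if self.rd_history > self.history_cap:
                oldest = next(b for b, resident in self.rd.items() if not resident)
                del self.rd[oldest]
                self.rd_history -= 1
        self.filter[block] = None


cache = FRDCache(capacity=4, filter_ratio=0.25)
print([cache.access(b) for b in [1, 2, 3, 4, 5, 4, 4, 5, 1]])
```

In the usage line, block 4 first passes through the filter stack, is evicted into history, and only becomes a resident RD-stack block on its second access, which is the filtering effect the slide describes.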

Analysis of FRD Algorithm
• Parameter = FilterStack (%) (Default = 10%)
[Figure: the FRD diagram (filter stack and reuse distance stack, MRU to LRU, resident and history blocks) annotated with the block classes that follow each path: Class 1 (FS), Class 2 (FL), Class 3 (IS), Class 4 (IL).]
* If the RD stack is not full, a new entry is inserted into the RD stack.

Evaluation
• Environment
  – Simulation-based evaluation
  – Compared with OPT, LRU, ARC, and LIRS

Hit Ratio Result
• Case of LIRS's unstable hit ratio result
[Figure: hit ratio versus cache size (MB), three panels. Legend: FRD is highest; LIRS is highest; LIRS is unstable; ARC is highest; ARC is unstable.]

Hit Ratio Result
[Figure: hit ratio versus cache size (MB), three panels. Legend: FRD is highest; LIRS is highest; LIRS is unstable; ARC is highest; ARC is unstable.]

Hit Ratio Result
• Case of ARC's unstable hit ratio result
[Figure: hit ratio versus cache size (MB), three panels. Legend: FRD is highest; LIRS is highest; LIRS is unstable; ARC is highest; ARC is unstable.]

Evaluation
• Overall Average Result (1.0 is OPT's hit ratio)

Workload   LRU     ARC     LIRS    FRD
OLTP       0.674   0.746   0.691   0.753
Web12      0.829   0.852   0.827   0.857
Web07      0.800   0.839   0.812   0.847
prxy_0     0.844   0.870   0.870   0.898
wdev_0     0.647   0.723   0.728   0.745
hm_0       0.598   0.700   0.723   0.724
proj_0     0.612   0.722   0.740   0.780
proj_3     0.172   0.241   0.516   0.478
src1_2     0.620   0.697   0.799   0.813
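The table above can be summarized with a few lines of arithmetic. The sketch below only re-reads the numbers listed on the slide; the helper names `mean_by_algorithm` and `wins` are hypothetical, and the unweighted mean across workloads is an assumption about how one might aggregate, not a figure reported by the authors.

```python
from collections import Counter

ALGORITHMS = ("LRU", "ARC", "LIRS", "FRD")

# Hit ratios normalized to OPT, copied from the slide's table.
RESULTS = {
    "OLTP":   (0.674, 0.746, 0.691, 0.753),
    "Web12":  (0.829, 0.852, 0.827, 0.857),
    "Web07":  (0.800, 0.839, 0.812, 0.847),
    "prxy_0": (0.844, 0.870, 0.870, 0.898),
    "wdev_0": (0.647, 0.723, 0.728, 0.745),
    "hm_0":   (0.598, 0.700, 0.723, 0.724),
    "proj_0": (0.612, 0.722, 0.740, 0.780),
    "proj_3": (0.172, 0.241, 0.516, 0.478),
    "src1_2": (0.620, 0.697, 0.799, 0.813),
}


def mean_by_algorithm(results):
    """Unweighted mean of the normalized hit ratio per algorithm."""
    n = len(results)
    return {
        name: sum(row[i] for row in results.values()) / n
        for i, name in enumerate(ALGORITHMS)
    }


def wins(results):
    """Count how often each algorithm has the highest value in a row."""
    return Counter(
        ALGORITHMS[max(range(len(ALGORITHMS)), key=lambda i: row[i])]
        for row in results.values()
    )


print(mean_by_algorithm(RESULTS))
print(wins(RESULTS))
```

In the listed numbers FRD has the highest value in eight of the nine workloads; LIRS leads on proj_3.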

Parameter Sensitivity (Size of the Filter Stack)
• The filter stack size is varied from 1% to 25% of cache size
• 10% shows the best performance on average, but the difference is negligible

Summary
• FRD: A Filtering based Buffer Cache Algorithm that Considers both Frequency and Reuse Distance
  – A new buffer cache algorithm that filters out cache-unfriendly blocks
  – Careful analysis of real-world workloads reveals the characteristics of cache-unfriendly blocks
  – The experimental results show that it outperforms state-of-the-art cache algorithms such as ARC and LIRS

Backup slides

Hit Ratio Analysis
• Filter stack performance

Revisiting LIRS and ARC
• ARC (Initial: T1 = T2 = B1 = B2 = 0, p = 0)
  – T1 and T2 hold resident blocks; B1 and B2 hold history (metadata of non-resident blocks)
  – |T1| + |T2| + |B1| + |B2| <= 2c
  – Hit in B1: p = min{c, p + max{|B2|/|B1|, 1}}, then Replace(p)
  – Hit in B2: p = max{0, p - max{|B1|/|B2|, 1}}, then Replace(p)
• LIRS (HIRstack + LIRstack = c, 1:99)
  – HIR stack: resident blocks
  – LIR stack: resident and non-resident blocks
  – Non-resident metadata is kept until RMAX
• Subroutine Replace(p)
    if (|T1| ≥ 1) and ((x ∈ B2 and |T1| = p) or (|T1| > p)) then
        move the LRU page of T1 to the top of B1 and remove it from the cache.
    else
        move the LRU page in T2 to the top of B2 and remove it from the cache.
[Figure: eviction flow for both algorithms, with paths labeled HIT, MISS, NEW ENTRY, and Metadata (History).]
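The ARC update rules and the Replace subroutine on this slide can be sketched as two small functions. This covers only those two pieces, not a full ARC cache; the function names, the `hit_in` argument, and the use of `deque` objects (LRU end on the left, top on the right) are assumptions made for illustration.

```python
from collections import deque


def adapt_p(p, c, b1_len, b2_len, hit_in):
    """Update the target size p of T1 after a hit in history list B1 or B2."""
    if hit_in == "B1":   # grow the recency side
        return min(c, p + max(b2_len / b1_len, 1))
    if hit_in == "B2":   # grow the frequency side
        return max(0, p - max(b1_len / b2_len, 1))
    raise ValueError("hit_in must be 'B1' or 'B2'")


def replace(t1, t2, b1, b2, p, x_in_b2):
    """Evict one resident page and record it in the matching history list."""
    if len(t1) >= 1 and ((x_in_b2 and len(t1) == p) or len(t1) > p):
        victim = t1.popleft()   # LRU page of T1
        b1.append(victim)       # top of B1
    else:
        victim = t2.popleft()   # LRU page of T2 (assumes T2 is not empty)
        b2.append(victim)       # top of B2
    return victim


t1, t2, b1, b2 = deque([1, 2]), deque([3]), deque(), deque()
print(replace(t1, t2, b1, b2, p=1, x_in_b2=False))   # |T1| > p, so T1 is the source
print(adapt_p(p=0, c=4, b1_len=1, b2_len=3, hit_in="B1"))
```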

Design comparison with ARC and LIRS

                    ARC                                   LIRS                                      FRD
# LRU stack         Two                                   Two                                       Two
Adaptive resizing   O                                     X                                         X
Eviction point      Two (two LRU stacks are isolated)     One (two LRU stacks are not isolated)     Two (two LRU stacks are isolated)
History size        Cache size x 2                        Max resident block                        Max resident block
