
SLIDE 1

Cache Modeling and Optimization using Miniature Simulations

USENIX Annual Technical Conference (ATC ’17) July 13, 2017

Carl Waldspurger CachePhysics, Inc. Trausti Saemundsson CachePhysics, Inc. Irfan Ahmad CachePhysics, Inc. Nohhyun Park Datos IO, Inc.

SLIDE 2

Motivation

  • Caching important, ubiquitous
  • Optimize valuable cache resources
    – Improve performance, QoS
    – Sizing, partitioning, tuning, cliff removal, …
  • Problem: need accurate, efficient models
    – Complex policies, non-linear, workload-dependent
    – No general, lightweight, online approach

CachePhysics, Inc. USENIX ATC ’17 2

SLIDE 3

Cache Modeling

  • Cache utility curves
    – Performance as f(size, …)
    – Miss ratio curve (MRC)
    – Latency curve
  • Observations
    – Non-linear, cliffs
    – Non-monotonic bumps


SLIDE 4

MRC Construction Methods

  • Stack algorithms (LRU, LFU, …): Mattson algorithm models all sizes at once
    – Exact: Mattson
    – Approximate: Counter Stacks [OSDI ’14], SHARDS [FAST ’15], AET [ATC ’16]
  • Any algorithm (ARC, LIRS, 2Q, FIFO, …): separate simulation for each cache size
    – Approximate: miniature simulation [ATC ’17]

SLIDE 5

Miniature Simulation

  • Simulate large cache using tiny one
  • Scale down reference stream and cache size
    – Random sampling based on hash(key)
    – Assumes statistical self-similarity
  • Run unmodified algorithm
    – LRU, LIRS, ARC, 2Q, FIFO, OPT, …
    – Track usual stats
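The idea above can be sketched in a few lines. This is a minimal sketch, not the paper's implementation: it uses LRU as the unmodified policy under test, and the `sample`/`MiniLRU` names are illustrative. The key property is that sampling is spatial, by `hash(key)`, so a given key is either always or never seen by the mini cache.

```python
import hashlib
from collections import OrderedDict

def sample(key: str, rate: float) -> bool:
    """Spatial sampling: keep a key iff its hash falls in the lowest
    rate-fraction of the hash space. The same key is always kept or
    always skipped, preserving reuse behavior among sampled keys."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % 10000 < rate * 10000

class MiniLRU:
    """Tiny LRU cache standing in for the unmodified policy under test."""
    def __init__(self, size: int):
        self.size, self.cache = size, OrderedDict()
        self.hits = self.misses = 0

    def access(self, key: str) -> None:
        if key in self.cache:
            self.hits += 1
            self.cache.move_to_end(key)           # promote to MRU position
        else:
            self.misses += 1
            self.cache[key] = True
            if len(self.cache) > self.size:
                self.cache.popitem(last=False)    # evict from LRU tail

    def miss_ratio(self) -> float:
        total = self.hits + self.misses
        return self.misses / total if total else 0.0

R = 0.01                  # sampling rate
mini = MiniLRU(size=10)   # models an emulated cache of size 10 / R = 1000
for ref in (f"block{i % 3000}" for i in range(100_000)):
    if sample(ref, R):    # only sampled refs ever reach the mini cache
        mini.access(ref)
```

The mini cache then reports its miss ratio as the estimate for the full-size cache, under the statistical self-similarity assumption from the slide.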


SLIDE 6

Scaling Down


[Figure: refs and cache scaled down — hash keys (colors) select half the key space, cache at half size]

SLIDE 7

Scaling Down



SLIDE 8

Scaling Down


[Figure: refs and cache scaled down 32×]

SLIDE 9

Scaling Down


[Figure: refs and cache scaled down 128×]

SLIDE 10

Flexible Scaling


Sm = mini cache size, R = sampling rate, Se = emulated cache size

Sm = R × Se

  • Time/space tradeoff
    – Fixed sampling rate R
    – Fixed mini size Sm
  • Example: Se = 1M
    – R = 0.005 ⇒ Sm = 5000
    – Sm = 1000 ⇒ R = 0.001
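The relation Sm = R × Se can be checked both ways (the function names here are illustrative, not from the paper):

```python
def mini_size(R: float, Se: int) -> int:
    """Fixed sampling rate: Sm = R × Se."""
    return round(R * Se)

def rate_for(Sm: int, Se: int) -> float:
    """Fixed mini size: solve Sm = R × Se for R."""
    return Sm / Se

assert mini_size(0.005, 1_000_000) == 5000   # slide's first example
assert rate_for(1000, 1_000_000) == 0.001    # slide's second example
```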

SLIDE 11

Example Mini-Sim MRCs


SLIDE 12

Mini-Sim Accuracy


  • 137 real-world traces
    – Storage block traces
    – CloudPhysics, MSR, FIU
    – 100 cache sizes per trace
  • Mean Absolute Error (MAE)
    – |exact − approx| miss ratio
    – Averaged over all cache sizes
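The error metric above is straightforward to compute; a minimal sketch with made-up miss-ratio values for illustration:

```python
def mean_absolute_error(exact, approx):
    """MAE between exact and approximate miss-ratio curves,
    averaged over all evaluated cache sizes."""
    assert len(exact) == len(approx)
    return sum(abs(e - a) for e, a in zip(exact, approx)) / len(exact)

# Illustrative miss ratios at 4 cache sizes (not real trace data)
exact  = [0.50, 0.35, 0.20, 0.10]
approx = [0.51, 0.33, 0.21, 0.10]
mae = mean_absolute_error(exact, approx)   # 0.01
```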

SLIDE 13

Mini-Sim Efficiency

  • Variable costs
    – Both space and time scaled down by R
    – R = 0.001 ⇒ simulation 1000× smaller, 1000× faster
  • Fixed costs
    – Hashing overhead for sampling
    – Footprint for code, libraries, etc.
  • Net improvement
    – R = 0.001 ⇒ ~200× smaller, ~10× faster
    – Closer to 1000× if existing key hash or multiple sims


SLIDE 14

Mini-Sim Cache Tuning

  • Dynamic multi-model optimization
    – Simulate candidate configurations online
    – Periodically apply best to actual cache
  • Parameter adaptation experiments
    – LIRS S stack size: 5 mini-sims with f = 1.1 – 3
    – 2Q A1out size: 8 mini-sims with Kout = 50% – 300%
    – R = 0.005, epoch = 1M refs
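The multi-model loop above can be sketched as follows. This is a hypothetical illustration, not the paper's code: `StubSim` stands in for a real mini-sim of one candidate configuration, the epoch is shrunk from the paper's 1M refs, and the "apply best to actual cache" step is a callback.

```python
import random

EPOCH = 10_000  # refs per adaptation epoch (1M in the paper; smaller here)

class StubSim:
    """Hypothetical stand-in for a mini-sim of one candidate config."""
    def __init__(self, param):
        self.cache, self.size = set(), 50 * param  # bigger param, bigger cache
        self.hits = self.misses = 0
    def access(self, key):
        if key in self.cache:
            self.hits += 1
        else:
            self.misses += 1
            if len(self.cache) >= self.size:
                self.cache.pop()                   # arbitrary eviction (stub)
            self.cache.add(key)
    def miss_ratio(self):
        total = self.hits + self.misses
        return self.misses / total if total else 1.0
    def reset_stats(self):
        self.hits = self.misses = 0

def tune(refs, candidates, apply_best):
    sims = {p: StubSim(p) for p in candidates}     # one mini-sim per candidate
    for i, ref in enumerate(refs, 1):
        for sim in sims.values():                  # feed every candidate sim
            sim.access(ref)
        if i % EPOCH == 0:                         # end of epoch:
            best = min(sims, key=lambda p: sims[p].miss_ratio())
            apply_best(best)                       # adopt winner on real cache
            for sim in sims.values():
                sim.reset_stats()                  # fresh stats next epoch

chosen = []
refs = (random.randrange(200) for _ in range(30_000))
tune(refs, candidates=[1, 2, 3], apply_best=chosen.append)
```

In the real system, each candidate mini-sim runs the actual policy (LIRS or 2Q) with a different parameter value over the same sampled reference stream.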


SLIDE 15

LIRS Adaptation Examples


SLIDE 16

2Q Adaptation Examples


SLIDE 17

Talus Cliff Removal


  • Talus [HPCA ’15]
    – Needs MRC as input
    – Interpolates convex hull
  • Shadow partitions 𝛽, 𝛾
    – Steer different fractions f of refs to each
    – Emulate cache sizes on convex hull via hashing
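The hull interpolation can be sketched numerically. This is a hedged sketch of the two-partition idea as I understand it from the Talus paper, with illustrative names: to hit a target size s between hull points s1 < s2, steer a fraction alpha of references (by hash) to shadow partition 𝛽 sized alpha·s1, and the rest to 𝛾 sized (1−alpha)·s2; the total size is preserved and the miss ratio lands on the hull's straight line.

```python
def talus_split(s, s1, m1, s2, m2):
    """Target size s between convex-hull points (s1, m1) and (s2, m2),
    with s1 < s < s2. Returns (alpha, size_beta, size_gamma, miss)."""
    alpha = (s2 - s) / (s2 - s1)          # solves alpha*s1 + (1-alpha)*s2 = s
    size_beta = alpha * s1                # beta sees fraction alpha of refs
    size_gamma = (1 - alpha) * s2         # gamma sees the remaining refs
    miss = alpha * m1 + (1 - alpha) * m2  # linear interpolation on the hull
    return alpha, size_beta, size_gamma, miss

# Illustrative numbers (not from the paper): target size 600 between
# hull points (400, 0.30) and (1000, 0.12)
alpha, b, g, miss = talus_split(s=600, s1=400, m1=0.30, s2=1000, m2=0.12)
# alpha = 2/3; partition sizes sum to 600; miss = 0.24, on the hull line
```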

SLIDE 18

Talus for Non-LRU Policies?

  • Need efficient online MRCs
  • Support dynamic changes?
    – Workload and MRC evolve over time
    – Resize partitions, lazy vs. eager?
    – Migrate cache entries in “wrong” partition?
  • Not clear how to merge/migrate state


SLIDE 19

SLIDE: Transparent Cliff Removal

  • Sharded List with Internal Differential Eviction
    – Single unified cache, no hard partitions
    – Defer partitioning decisions until eviction
    – Avoids resizing, migration, complexity issues
  • New SLIDE list abstraction
    – No changes to ARC, LIRS, 2Q, LRU code
    – Replaces internal LRU/FIFO building blocks


SLIDE 20

SLIDE List

  • Augment conventional list
    – Per-item hash value
    – Hash threshold determines current “partition”
  • Items totally ordered, no hard partitions
  • Evict from tail of over-quota partition
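The list described above can be sketched as one totally ordered LRU list whose items carry a hash; a threshold splits it into two logical partitions, and eviction takes the tail item of whichever logical side is over quota. This is a hypothetical illustration; the class name, quota logic, and hash range are mine, not the paper's.

```python
import hashlib
from collections import OrderedDict

class SlideList:
    """Sketch of a SLIDE list: single recency-ordered list, soft partitions."""
    def __init__(self, capacity, threshold, quota_low):
        self.capacity = capacity
        self.threshold = threshold    # hash < threshold → "low" partition
        self.quota_low = quota_low    # target item count for low partition
        self.items = OrderedDict()    # key -> hash, LRU first, MRU last

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16) % 1000

    def access(self, key):
        if key in self.items:
            self.items.move_to_end(key)            # usual LRU promotion
        else:
            self.items[key] = self._hash(key)
            if len(self.items) > self.capacity:
                self._evict()

    def _evict(self):
        low = sum(1 for h in self.items.values() if h < self.threshold)
        evict_low = low > self.quota_low           # which side is over quota?
        for key, h in self.items.items():          # scan from the LRU tail...
            if (h < self.threshold) == evict_low:
                del self.items[key]                # ...evict first match
                return
        self.items.popitem(last=False)             # fallback: plain LRU
```

Note the partitions never physically exist: the decision of which side to shrink is deferred to eviction time, which is what lets the threshold and quotas change without resizing or migrating entries.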


SLIDE 21

SLIDE Experiments

  • Construct MRC online
    – 7 mini-sims: {⅛, ¼, ½, 1, 2, 4, 8} × cache size
    – R = 0.005, smoothed miss ratios
  • Update SLIDE settings periodically
    – Discrete convex hull every epoch (1M refs)
    – Set new “partition” targets for SLIDE lists
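The per-epoch hull step can be sketched with a standard monotone-chain lower hull over the (size, miss ratio) points from the mini-sims; a generic sketch, not the paper's code, with illustrative MRC values:

```python
def lower_convex_hull(points):
    """Lower convex hull of (size, miss_ratio) points, sizes ascending.
    Points above the hull are the 'cliff' region to interpolate across."""
    hull = []
    for p in sorted(points):
        # pop the last hull point while the turn through it is not
        # strictly counterclockwise (i.e. it lies on or above the chord)
        while len(hull) >= 2:
            ox, oy = hull[-2]
            ax, ay = hull[-1]
            cross = (ax - ox) * (p[1] - oy) - (ay - oy) * (p[0] - ox)
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

# Illustrative MRC with a cliff: (2, 0.49) sits above the hull
mrc = [(1, 0.50), (2, 0.49), (3, 0.20), (4, 0.18)]
hull = lower_convex_hull(mrc)   # the cliff point is dropped
```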


SLIDE 22

SLIDE: Cliff Reduction


[Figure: fraction f of potential cliff-removal gain achieved — 69%, 48%, 38%]

SLIDE 23

SLIDE: Little Impact without Cliffs


SLIDE 24

Conclusions

  • Mini-sim extremely effective
    – Robust, general method (ARC, LIRS, 2Q, LRU, OPT, …)
    – Average error < 0.01 with 0.1% sampling
  • Can optimize workloads/policies automatically
    – Dynamic parameter tuning
    – SLIDE transparent cliff removal
