
SLIDE 1

Cache Modeling and Optimization using Miniature Simulations

USENIX Annual Technical Conference (ATC ’17) July 13, 2017

Carl Waldspurger CachePhysics, Inc. Trausti Saemundsson CachePhysics, Inc. Irfan Ahmad CachePhysics, Inc. Nohhyun Park Datos IO, Inc.

SLIDE 2

Motivation

  • Caching important, ubiquitous
  • Optimize valuable cache resources
    – Improve performance, QoS
    – Sizing, partitioning, tuning, cliff removal, …
  • Problem: need accurate, efficient models
    – Complex policies, non-linear, workload-dependent
    – No general, lightweight, online approach

CachePhysics, Inc. USENIX ATC ’17 2

SLIDE 3

Cache Modeling

  • Cache utility curves
    – Performance as f(size, …)
    – Miss ratio curve (MRC)
    – Latency curve
  • Observations
    – Non-linear, cliffs
    – Non-monotonic bumps


SLIDE 4

MRC Construction Methods

  • Stack algorithms (LRU, LFU, …): Mattson algorithm models all sizes at once
    – Exact: Mattson
    – Approximate: Counter Stacks [OSDI ’14], SHARDS [FAST ’15], AET [ATC ’16]
  • Any algorithm (ARC, LIRS, 2Q, FIFO, …): separate simulation for each cache size
    – Approximate: miniature simulation [ATC ’17]

SLIDE 5

Miniature Simulation

  • Simulate large cache using tiny one
  • Scale down reference stream and cache size
    – Random sampling based on hash(key)
    – Assumes statistical self-similarity
  • Run unmodified algorithm
    – LRU, LIRS, ARC, 2Q, FIFO, OPT, …
    – Track usual stats
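The idea above can be sketched in a few lines. This is a minimal sketch, not the paper's implementation: it uses LRU as the unmodified policy under test, and the `sample`/`MiniLRU` names are illustrative. The key property is that sampling is spatial, by `hash(key)`, so a given key is either always or never seen by the mini cache.

```python
import hashlib
from collections import OrderedDict

def sample(key: str, rate: float) -> bool:
    """Spatial sampling: keep a key iff its hash falls in the lowest
    rate-fraction of the hash space. The same key is always kept or
    always skipped, preserving reuse behavior among sampled keys."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % 10000 < rate * 10000

class MiniLRU:
    """Tiny LRU cache standing in for the unmodified policy under test."""
    def __init__(self, size: int):
        self.size, self.cache = size, OrderedDict()
        self.hits = self.misses = 0

    def access(self, key: str) -> None:
        if key in self.cache:
            self.hits += 1
            self.cache.move_to_end(key)           # promote to MRU position
        else:
            self.misses += 1
            self.cache[key] = True
            if len(self.cache) > self.size:
                self.cache.popitem(last=False)    # evict from LRU tail

    def miss_ratio(self) -> float:
        total = self.hits + self.misses
        return self.misses / total if total else 0.0

R = 0.01                  # sampling rate
mini = MiniLRU(size=10)   # models an emulated cache of size 10 / R = 1000
for ref in (f"block{i % 3000}" for i in range(100_000)):
    if sample(ref, R):    # only sampled refs ever reach the mini cache
        mini.access(ref)
```

The mini cache then reports its miss ratio as the estimate for the full-size cache, under the statistical self-similarity assumption from the slide.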


SLIDE 6

Scaling Down


[Figure: refs and cache scaled down — hash keys (colors) select half the key space, cache at half size]

SLIDE 7

Scaling Down



SLIDE 8

Scaling Down


[Figure: refs and cache scaled down 32×]

SLIDE 9

Scaling Down


[Figure: refs and cache scaled down 128×]

SLIDE 10

Flexible Scaling


Sm = mini cache size, R = sampling rate, Se = emulated cache size

Sm = R × Se

  • Time/space tradeoff
    – Fixed sampling rate R
    – Fixed mini size Sm
  • Example: Se = 1M
    – R = 0.005 ⇒ Sm = 5000
    – Sm = 1000 ⇒ R = 0.001
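The relation Sm = R × Se can be checked both ways (the function names here are illustrative, not from the paper):

```python
def mini_size(R: float, Se: int) -> int:
    """Fixed sampling rate: Sm = R × Se."""
    return round(R * Se)

def rate_for(Sm: int, Se: int) -> float:
    """Fixed mini size: solve Sm = R × Se for R."""
    return Sm / Se

assert mini_size(0.005, 1_000_000) == 5000   # slide's first example
assert rate_for(1000, 1_000_000) == 0.001    # slide's second example
```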

SLIDE 11

Example Mini-Sim MRCs


SLIDE 12

Mini-Sim Accuracy


  • 137 real-world traces
    – Storage block traces
    – CloudPhysics, MSR, FIU
    – 100 cache sizes per trace
  • Mean Absolute Error (MAE)
    – |exact − approx| miss ratio
    – Averaged over all cache sizes
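The error metric above is straightforward to compute; a minimal sketch with made-up miss-ratio values for illustration:

```python
def mean_absolute_error(exact, approx):
    """MAE between exact and approximate miss-ratio curves,
    averaged over all evaluated cache sizes."""
    assert len(exact) == len(approx)
    return sum(abs(e - a) for e, a in zip(exact, approx)) / len(exact)

# Illustrative miss ratios at 4 cache sizes (not real trace data)
exact  = [0.50, 0.35, 0.20, 0.10]
approx = [0.51, 0.33, 0.21, 0.10]
mae = mean_absolute_error(exact, approx)   # 0.01
```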

SLIDE 13

Mini-Sim Efficiency

  • Variable costs
    – Both space and time scaled down by R
    – R = 0.001 ⇒ simulation 1000× smaller, 1000× faster
  • Fixed costs
    – Hashing overhead for sampling
    – Footprint for code, libraries, etc.
  • Net improvement
    – R = 0.001 ⇒ ~200× smaller, ~10× faster
    – Closer to 1000× if existing key hash or multiple sims


SLIDE 14

Mini-Sim Cache Tuning

  • Dynamic multi-model optimization
    – Simulate candidate configurations online
    – Periodically apply best to actual cache
  • Parameter adaptation experiments
    – LIRS S stack size: 5 mini-sims with f = 1.1 – 3
    – 2Q A1out size: 8 mini-sims with Kout = 50% – 300%
    – R = 0.005, epoch = 1M refs
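The multi-model loop above can be sketched as follows. This is a hypothetical illustration, not the paper's code: `StubSim` stands in for a real mini-sim of one candidate configuration, the epoch is shrunk from the paper's 1M refs, and the "apply best to actual cache" step is a callback.

```python
import random

EPOCH = 10_000  # refs per adaptation epoch (1M in the paper; smaller here)

class StubSim:
    """Hypothetical stand-in for a mini-sim of one candidate config."""
    def __init__(self, param):
        self.cache, self.size = set(), 50 * param  # bigger param, bigger cache
        self.hits = self.misses = 0
    def access(self, key):
        if key in self.cache:
            self.hits += 1
        else:
            self.misses += 1
            if len(self.cache) >= self.size:
                self.cache.pop()                   # arbitrary eviction (stub)
            self.cache.add(key)
    def miss_ratio(self):
        total = self.hits + self.misses
        return self.misses / total if total else 1.0
    def reset_stats(self):
        self.hits = self.misses = 0

def tune(refs, candidates, apply_best):
    sims = {p: StubSim(p) for p in candidates}     # one mini-sim per candidate
    for i, ref in enumerate(refs, 1):
        for sim in sims.values():                  # feed every candidate sim
            sim.access(ref)
        if i % EPOCH == 0:                         # end of epoch:
            best = min(sims, key=lambda p: sims[p].miss_ratio())
            apply_best(best)                       # adopt winner on real cache
            for sim in sims.values():
                sim.reset_stats()                  # fresh stats next epoch

chosen = []
refs = (random.randrange(200) for _ in range(30_000))
tune(refs, candidates=[1, 2, 3], apply_best=chosen.append)
```

In the real system, each candidate mini-sim runs the actual policy (LIRS or 2Q) with a different parameter value over the same sampled reference stream.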


SLIDE 15

LIRS Adaptation Examples


SLIDE 16

2Q Adaptation Examples


SLIDE 17

Talus Cliff Removal


  • Talus [HPCA ’15]
    – Needs MRC as input
    – Interpolates convex hull
  • Shadow partitions 𝛽, 𝛾
    – Steer different fractions f of refs to each
    – Emulate cache sizes on convex hull via hashing
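The hull interpolation can be sketched numerically. This is a hedged sketch of the two-partition idea as I understand it from the Talus paper, with illustrative names: to hit a target size s between hull points s1 < s2, steer a fraction alpha of references (by hash) to shadow partition 𝛽 sized alpha·s1, and the rest to 𝛾 sized (1−alpha)·s2; the total size is preserved and the miss ratio lands on the hull's straight line.

```python
def talus_split(s, s1, m1, s2, m2):
    """Target size s between convex-hull points (s1, m1) and (s2, m2),
    with s1 < s < s2. Returns (alpha, size_beta, size_gamma, miss)."""
    alpha = (s2 - s) / (s2 - s1)          # solves alpha*s1 + (1-alpha)*s2 = s
    size_beta = alpha * s1                # beta sees fraction alpha of refs
    size_gamma = (1 - alpha) * s2         # gamma sees the remaining refs
    miss = alpha * m1 + (1 - alpha) * m2  # linear interpolation on the hull
    return alpha, size_beta, size_gamma, miss

# Illustrative numbers (not from the paper): target size 600 between
# hull points (400, 0.30) and (1000, 0.12)
alpha, b, g, miss = talus_split(s=600, s1=400, m1=0.30, s2=1000, m2=0.12)
# alpha = 2/3; partition sizes sum to 600; miss = 0.24, on the hull line
```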

SLIDE 18

Talus for Non-LRU Policies?

  • Need efficient online MRCs
  • Support dynamic changes?
    – Workload and MRC evolve over time
    – Resize partitions, lazy vs. eager?
    – Migrate cache entries in “wrong” partition?
  • Not clear how to merge/migrate state


SLIDE 19

SLIDE: Transparent Cliff Removal

  • Sharded List with Internal Differential Eviction
    – Single unified cache, no hard partitions
    – Defer partitioning decisions until eviction
    – Avoids resizing, migration, complexity issues
  • New SLIDE list abstraction
    – No changes to ARC, LIRS, 2Q, LRU code
    – Replaces internal LRU/FIFO building blocks


SLIDE 20

SLIDE List

  • Augment conventional list
    – Per-item hash value
    – Hash threshold determines current “partition”
  • Items totally ordered, no hard partitions
  • Evict from tail of over-quota partition
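The list described above can be sketched as one totally ordered LRU list whose items carry a hash; a threshold splits it into two logical partitions, and eviction takes the tail item of whichever logical side is over quota. This is a hypothetical illustration; the class name, quota logic, and hash range are mine, not the paper's.

```python
import hashlib
from collections import OrderedDict

class SlideList:
    """Sketch of a SLIDE list: single recency-ordered list, soft partitions."""
    def __init__(self, capacity, threshold, quota_low):
        self.capacity = capacity
        self.threshold = threshold    # hash < threshold → "low" partition
        self.quota_low = quota_low    # target item count for low partition
        self.items = OrderedDict()    # key -> hash, LRU first, MRU last

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16) % 1000

    def access(self, key):
        if key in self.items:
            self.items.move_to_end(key)            # usual LRU promotion
        else:
            self.items[key] = self._hash(key)
            if len(self.items) > self.capacity:
                self._evict()

    def _evict(self):
        low = sum(1 for h in self.items.values() if h < self.threshold)
        evict_low = low > self.quota_low           # which side is over quota?
        for key, h in self.items.items():          # scan from the LRU tail...
            if (h < self.threshold) == evict_low:
                del self.items[key]                # ...evict first match
                return
        self.items.popitem(last=False)             # fallback: plain LRU
```

Note the partitions never physically exist: the decision of which side to shrink is deferred to eviction time, which is what lets the threshold and quotas change without resizing or migrating entries.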


SLIDE 21

SLIDE Experiments

  • Construct MRC online
    – 7 mini-sims: {⅛, ¼, ½, 1, 2, 4, 8} × cache size
    – R = 0.005, smoothed miss ratios
  • Update SLIDE settings periodically
    – Discrete convex hull every epoch (1M refs)
    – Set new “partition” targets for SLIDE lists
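The per-epoch hull step can be sketched with a standard monotone-chain lower hull over the (size, miss ratio) points from the mini-sims; a generic sketch, not the paper's code, with illustrative MRC values:

```python
def lower_convex_hull(points):
    """Lower convex hull of (size, miss_ratio) points, sizes ascending.
    Points above the hull are the 'cliff' region to interpolate across."""
    hull = []
    for p in sorted(points):
        # pop the last hull point while the turn through it is not
        # strictly counterclockwise (i.e. it lies on or above the chord)
        while len(hull) >= 2:
            ox, oy = hull[-2]
            ax, ay = hull[-1]
            cross = (ax - ox) * (p[1] - oy) - (ay - oy) * (p[0] - ox)
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

# Illustrative MRC with a cliff: (2, 0.49) sits above the hull
mrc = [(1, 0.50), (2, 0.49), (3, 0.20), (4, 0.18)]
hull = lower_convex_hull(mrc)   # the cliff point is dropped
```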


SLIDE 22

SLIDE: Cliff Reduction


[Figure: fraction f of potential cliff-removal gain achieved — 69%, 48%, 38%]

SLIDE 23

SLIDE: Little Impact without Cliffs


SLIDE 24

Conclusions

  • Mini-sim extremely effective
    – Robust, general method (ARC, LIRS, 2Q, LRU, OPT, …)
    – Average error < 0.01 with 0.1% sampling
  • Can optimize workloads/policies automatically
    – Dynamic parameter tuning
    – SLIDE transparent cliff removal
