
Comparison of Cache Replacement Policies using GEM5 Simulator



  1. Comparison of Cache Replacement Policies using GEM5 Simulator
     Teammates: Bhagyashree, Nivin, Sri Divya

  2. Performance Bottleneck
     • The performance gap between the CPU and DRAM has increased drastically.
     • This leads to a producer-consumer problem.
     • How do we fix this?

  3. Ideal Cache
     What are we looking for?
     - A cache that is big and fast.
     - One that exploits temporal and spatial locality well.
     - Cheap to buy.
     - Easy to construct.

  4. Big and Fast Memory
     • We introduced the memory hierarchy to get memory that is both big and fast.
     • Now a process can have memory as large as an HDD and as fast as registers.
     • For simplicity, let's look at the memory hierarchy as a sequence <1, 2, 3, 4>.
     • A lower number denotes a faster cache.

  5. More about memory
     Let's model the problem as producer and consumer: memory (Mem) is the producer and the processor is the consumer.
     - Consider the sequence <1, 2, 3, 4>, in increasing order of memory size.
     - Seq1: <1, 2, 3, 4> makes perfect sense.
     - Seq2: <4, 3, 2, 1> is equivalent to <4>, right?
     - Seq3: <1, 3, 2, 4>: does this improve performance?
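One rough way to reason about these orderings is average memory access time. The sketch below is an illustration only: the function name, the latencies, and the hit rates are assumptions chosen for the example, not measurements from the deck.

```python
def average_access_time(levels, memory_latency):
    """Average access time for a hierarchy.

    levels: list of (latency_cycles, local_hit_rate), ordered from the level
    closest to the processor outward. An access pays the latency of every
    level it reaches; a miss falls through to the next level.
    """
    total = 0.0
    reach_prob = 1.0  # probability that an access reaches the current level
    for latency, hit_rate in levels:
        total += reach_prob * latency
        reach_prob *= (1.0 - hit_rate)
    # Whatever misses in every level goes to main memory.
    return total + reach_prob * memory_latency


# Hypothetical numbers: a 1-cycle level with 90% hits, then a 10-cycle level
# with 80% local hits, backed by 100-cycle memory.
fast_first = [(1, 0.9), (10, 0.8)]
print(average_access_time(fast_first, 100))
```

With the slowest level placed first, every access pays the slow latency up front, which is one way to read the claim that Seq2 behaves like <4> alone.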

  6. Research Idea 1: Increase the cache levels
     • The previous discussion raises the question of why the number of cache levels is limited to 3 or 4.
     • Is cost the only factor that prevents adding more levels?
     [Diagram: Level 1, Level 2, Level 3, ..., Level n]

  7. Cache Replacement Policy
     - When the cache is full, the cache replacement policy (CRP) algorithm decides which line is best to discard.
     - More about locality:
     - Temporal: “a resource that is referenced at one point in time will be referenced again sometime in the near future.” Example: web cache.
     - Spatial: “likelihood of referencing a resource is higher if a resource near it was just referenced.” Example: matrix multiplication.
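To make spatial locality concrete, the following sketch walks a small matrix in row order and in column order and counts misses in a tiny fully associative LRU cache. The matrix shape, element size, block size, and cache capacity are assumptions picked for illustration.

```python
from collections import OrderedDict


def matrix_block_trace(rows, cols, elem_bytes, block_bytes, row_major=True):
    """Cache-block numbers touched when walking a matrix stored row by row."""
    if row_major:
        order = [(r, c) for r in range(rows) for c in range(cols)]
    else:
        order = [(r, c) for c in range(cols) for r in range(rows)]
    return [((r * cols + c) * elem_bytes) // block_bytes for r, c in order]


def lru_misses(trace, capacity):
    """Count misses in a fully associative LRU cache holding `capacity` blocks."""
    cache = OrderedDict()  # least recently used first
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)
            cache[block] = True
    return misses


# 8x8 matrix of 8-byte elements, 64-byte blocks: each row fills one block.
row_walk = matrix_block_trace(8, 8, 8, 64, row_major=True)
col_walk = matrix_block_trace(8, 8, 8, 64, row_major=False)
print(lru_misses(row_walk, 4))  # 8: one miss per block
print(lru_misses(col_walk, 4))  # 64: every access misses
```

The row walk reuses each block eight times in a row, while the column walk cycles through eight blocks with room for only four, so nothing survives until its next use.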

  8. Research Idea 2: Cache Replacement Favors a Specific Locality
     • Is it valid to say that LRU is more of a temporal locality solver?
     • Simulation to the rescue.
     • What else contributes to spatial locality? Think of a larger block size.
     [Diagram: Temporal, Spatial]

  9. Research Idea 3: Mixing of Cache Replacement Algorithms in Different Levels
     • From the previous idea, we want to mix cache replacement algorithms at different levels and study the resulting performance.
     • Going back to the sequence <1, 2, 3, 4>, one can argue that a higher sequence number influences the rest.
     • Example: let's take matrix multiplication.
     • Let Level 1 favor spatial locality and Level 2 favor temporal locality.
     [Diagram: Level 1 - LRU, Level 2 - Random, Level 3, ..., Level n]
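A toy version of this experiment can be sketched outside gem5. The code below is an assumption-laden illustration: the `Level` class, `run_hierarchy`, the capacities, and the trace are all hypothetical, and each level is modeled as fully associative for brevity.

```python
import random
from collections import OrderedDict


class Level:
    """Fully associative cache level with a pluggable replacement policy."""

    def __init__(self, capacity, policy, seed=0):
        self.capacity = capacity
        self.policy = policy            # "lru" or "random"
        self.blocks = OrderedDict()     # least recently used first
        self.rng = random.Random(seed)  # seeded so runs are repeatable
        self.hits = 0
        self.misses = 0

    def access(self, block):
        if block in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block)
            return True
        self.misses += 1
        if len(self.blocks) >= self.capacity:
            if self.policy == "lru":
                self.blocks.popitem(last=False)
            else:
                victim = self.rng.choice(list(self.blocks))
                del self.blocks[victim]
        self.blocks[block] = True
        return False


def run_hierarchy(trace, levels):
    """Send each access to level 1; only misses travel to the next level."""
    for block in trace:
        for level in levels:
            if level.access(block):
                break
    return [(lvl.hits, lvl.misses) for lvl in levels]


levels = [Level(2, "lru"), Level(4, "random")]
trace = [0, 1, 2, 0, 1, 2, 3, 0] * 4
for number, (hits, misses) in enumerate(run_hierarchy(trace, levels), start=1):
    print(f"Level {number}: hits={hits} misses={misses}")
```

Swapping the policy strings per level is enough to compare mixes such as LRU over Random against Random over LRU on the same trace.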

  10. Cache Associativity (Sets):
      • Fully associative: the best miss rate (set size of 2^k, i.e. one set that holds every line).
      • Set associative: intermediate.
      • Direct-mapped: set size of 1.
      • Larger sets and higher associativity lead to fewer cache conflicts and lower miss rates, but they also increase the hardware cost.
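The three organizations differ only in how many sets the cache is divided into. As a minimal sketch (function names and the 32 KiB / 64-byte geometry are assumptions for illustration), an address can be split into tag, set index, and block offset like this:

```python
def sets_in_cache(cache_bytes, block_bytes, ways):
    """Number of sets for a given capacity, block size, and associativity."""
    return cache_bytes // (block_bytes * ways)


def split_address(addr, block_bytes, num_sets):
    """Return (tag, set_index, block_offset) for a byte address.

    block_bytes and num_sets are assumed to be powers of two.
    """
    offset_bits = block_bytes.bit_length() - 1
    index_bits = num_sets.bit_length() - 1
    offset = addr & (block_bytes - 1)
    set_index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, set_index, offset


# A 32 KiB cache with 64-byte blocks has 512 lines.
print(sets_in_cache(32 * 1024, 64, 1))    # direct-mapped: 512 sets of 1 line
print(sets_in_cache(32 * 1024, 64, 8))    # 8-way set associative: 64 sets
print(sets_in_cache(32 * 1024, 64, 512))  # fully associative: 1 set
print(split_address(0x12345, 64, 64))     # (18, 13, 5)
```

With a single set the index field disappears and the whole block address becomes the tag, which is why a fully associative lookup has to compare against every line.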

  11. Research Idea 4: Cache associativity
      Intuition: use a higher set value for lower cache levels. Let's forget the cost for now.
      But does reversing it give better performance?
      Coming back to the sequences:
      Cache sequence: <1, 2, 3, 4>
      Set sequence: <N, N-1, N-2>
      Reversing it:

  12. Research Idea 5: Let's combine (Flow Diagram)
      - Combining: let's represent all the above research ideas as a tree.
      - Compare it to OPT and study where each algorithm stands.
      [Flow diagram labels: cache performance, hardware, application, cost, complexity, locality measurement, mix, set associativity, levels, benchmark]
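OPT here is taken to mean Belady's optimal policy, which evicts the block whose next use lies farthest in the future. The sketch below is a simple offline version for a fully associative cache; the function name and the reference trace are assumptions used for illustration.

```python
def opt_misses(trace, capacity):
    """Count misses for Belady's OPT on a fully associative cache."""
    cache = set()
    misses = 0
    for i, block in enumerate(trace):
        if block in cache:
            continue
        misses += 1
        if len(cache) >= capacity:
            def next_use(candidate):
                # Position of the next reference, or infinity if never reused.
                try:
                    return trace.index(candidate, i + 1)
                except ValueError:
                    return float("inf")
            cache.remove(max(cache, key=next_use))
        cache.add(block)
    return misses


trace = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
print(opt_misses(trace, 3))  # 9
```

Because OPT needs the future of the trace, it is only usable offline, but its miss count gives a lower bound against which practical policies can be placed.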

  13. Common Cache Replacement Policies LRU:

  14. Common Cache Replacement Policies
      • LRU:
        • Expensive in terms of speed and hardware.
        • Needs to remember the order in which all N lines were accessed.
        • N! possible orderings, so about log2(N!) LRU bits per set, which grows as O(N log N).
        • 2 ways → AB BA = 2 = 2!
        • 3 ways → ABC ACB BAC BCA CAB CBA = 6 = 3!
      • Pseudo LRU: O(N) bits
        • Approximates the LRU policy with a binary tree.
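The bit counts above can be checked numerically, and the policy itself fits in a few lines. This is a minimal sketch with assumed function names; the tree pseudo-LRU count uses one bit per internal node of the binary tree, i.e. N - 1 bits.

```python
import math
from collections import OrderedDict


def exact_lru_bits(ways):
    """Bits needed to encode any of the ways! recency orderings of one set."""
    return math.ceil(math.log2(math.factorial(ways)))


def tree_plru_bits(ways):
    """Tree pseudo-LRU keeps one bit per internal node of a binary tree."""
    return ways - 1


def lru_misses(trace, capacity):
    """Count misses in a fully associative LRU cache holding `capacity` blocks."""
    cache = OrderedDict()  # least recently used first
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)       # refresh recency on a hit
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used
            cache[block] = True
    return misses


for ways in (2, 4, 8, 16):
    print(ways, exact_lru_bits(ways), tree_plru_bits(ways))

trace = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
print(lru_misses(trace, 3))  # 12
```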

  15. Common Cache Replacement Policies Pseudo LRU:
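A binary-tree pseudo-LRU for one set can be sketched as follows. The class name and bit convention (0 means the next victim is in the left half) are assumptions for this example; hardware implementations differ in detail.

```python
class TreePLRU:
    """Tree pseudo-LRU state for one cache set (ways must be a power of two)."""

    def __init__(self, ways):
        assert ways >= 2 and ways & (ways - 1) == 0
        self.ways = ways
        self.bits = [0] * (ways - 1)  # internal nodes in heap order

    def touch(self, way):
        """On an access, make every node on the path point away from `way`."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if way < mid:
                self.bits[node] = 1   # accessed left half, victim is on the right
                node, hi = 2 * node + 1, mid
            else:
                self.bits[node] = 0   # accessed right half, victim is on the left
                node, lo = 2 * node + 2, mid

    def victim(self):
        """Follow the bits from the root to the way that should be replaced."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if self.bits[node] == 0:
                node, hi = 2 * node + 1, mid
            else:
                node, lo = 2 * node + 2, mid
        return lo


plru = TreePLRU(4)
for way in (0, 1, 2, 3, 0):
    plru.touch(way)
print(plru.victim())  # 2, whereas true LRU would pick way 1
```

The final line shows where the approximation departs from exact LRU: the tree only remembers which half was touched last at each node, not the full ordering.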

  16. Common Cache Replacement Policies: MRU
      • In contrast to LRU, discards the most recently used items first.
      • MRU algorithms are most useful in situations where the older an item is, the more likely it is to be accessed.
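A cyclic scan over a working set slightly larger than the cache is a typical case where older items are the ones about to be reused. The sketch below, with an assumed helper name and a made-up trace, compares LRU and MRU eviction on such a scan.

```python
def recency_misses(trace, capacity, evict="lru"):
    """Count misses when evicting the least ("lru") or most ("mru") recent block."""
    order = []  # least recently used first, most recently used last
    misses = 0
    for block in trace:
        if block in order:
            order.remove(block)
        else:
            misses += 1
            if len(order) >= capacity:
                order.pop(0 if evict == "lru" else -1)
        order.append(block)
    return misses


scan = [0, 1, 2, 3] * 3  # loop over 4 blocks with room for only 3
print(recency_misses(scan, 3, "lru"))  # 12: every access misses
print(recency_misses(scan, 3, "mru"))  # 6
```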

  17. Common Cache Replacement Policies
      Random Replacement
      • Simpler, but at the cost of performance.
      Round Robin (or FIFO) Replacement
      • Replaces the oldest block in cache memory, using a circular counter.
      • Each cache set is accompanied by a circular counter that points to the next cache block to be replaced; the counter is updated on every cache miss.
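Both policies need very little state per set, as the following sketch suggests. The class names are hypothetical, and the random variant fills empty ways before evicting, which is an assumption of this example.

```python
import random


class RoundRobinSet:
    """One cache set with a circular counter pointing at the next victim."""

    def __init__(self, ways):
        self.lines = [None] * ways
        self.counter = 0

    def access(self, tag):
        if tag in self.lines:
            return True                    # hit: counter is left untouched
        self.lines[self.counter] = tag     # miss: replace the pointed-to way
        self.counter = (self.counter + 1) % len(self.lines)
        return False


class RandomSet:
    """One cache set that evicts a uniformly random way when full."""

    def __init__(self, ways, seed=0):
        self.lines = [None] * ways
        self.rng = random.Random(seed)

    def access(self, tag):
        if tag in self.lines:
            return True
        if None in self.lines:
            way = self.lines.index(None)   # fill an empty way first
        else:
            way = self.rng.randrange(len(self.lines))
        self.lines[way] = tag
        return False


trace = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
fifo = RoundRobinSet(3)
print(sum(not fifo.access(tag) for tag in trace))  # 15
```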

  18. Common Cache Replacement Policies: Adaptive Cache Replacement
      [Diagram: lists L-1 and L-2; resident lists T1 and T2 with |T1| + |T2| = C; ghost caches B1 and B2 (not in memory)]
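The diagram's four lists can be turned into a compact sketch. The code below follows the commonly described ARC cases (hit, ghost hit in B1, ghost hit in B2, cold miss) as best understood here; the class and method names are assumptions, and it should be read as an illustration rather than a reference implementation.

```python
from collections import OrderedDict


class ARC:
    def __init__(self, c):
        self.c = c
        self.p = 0.0  # adaptive target size for T1
        self.t1, self.t2 = OrderedDict(), OrderedDict()  # resident blocks
        self.b1, self.b2 = OrderedDict(), OrderedDict()  # ghost entries only

    def _replace(self, in_b2):
        """Demote the LRU block of T1 or T2 to its ghost list."""
        t1_len = len(self.t1)
        if self.t1 and ((in_b2 and t1_len == self.p) or t1_len > self.p):
            key, _ = self.t1.popitem(last=False)
            self.b1[key] = True
        else:
            key, _ = self.t2.popitem(last=False)
            self.b2[key] = True

    def access(self, x):
        """Return True on a hit, False on a miss."""
        if x in self.t1 or x in self.t2:      # hit: promote to frequent list
            self.t1.pop(x, None)
            self.t2.pop(x, None)
            self.t2[x] = True
            return True
        if x in self.b1:                      # ghost hit: favor recency
            self.p = min(self.c, self.p + max(len(self.b2) / len(self.b1), 1))
            self._replace(in_b2=False)
            del self.b1[x]
            self.t2[x] = True
            return False
        if x in self.b2:                      # ghost hit: favor frequency
            self.p = max(0, self.p - max(len(self.b1) / len(self.b2), 1))
            self._replace(in_b2=True)
            del self.b2[x]
            self.t2[x] = True
            return False
        l1 = len(self.t1) + len(self.b1)      # cold miss
        total = l1 + len(self.t2) + len(self.b2)
        if l1 == self.c:
            if len(self.t1) < self.c:
                self.b1.popitem(last=False)
                self._replace(in_b2=False)
            else:
                self.t1.popitem(last=False)
        elif total >= self.c:
            if total == 2 * self.c:
                self.b2.popitem(last=False)
            self._replace(in_b2=False)
        self.t1[x] = True
        return False


cache = ARC(2)
print([cache.access(block) for block in (1, 1, 2, 3, 2)])
```

Only T1 and T2 hold data; B1 and B2 remember recently evicted keys so that a ghost hit can shift the target `p` toward recency or frequency.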

  19. Common Cache Replacement Policies
      • Clock with Adaptive Replacement:
        • Combines the advantages of ARC and Clock.
        • It uses 4 doubly-linked lists: two clocks (T1 & T2) and two simple LRU lists (B1 & B2).
        • The T1 clock stores pages based on “recency” and T2 stores pages based on “frequency”.
        • B1 & B2 contain pages that have recently been evicted from T1 & T2 respectively.

  20. Simulation and Benchmark
      • Why do we need a system simulator?
      • CPU behavior depends on the memory system, and the behavior of the memory system depends on the CPUs.
      • We chose gem5 over other simulators because it makes it much easier to perform different measurements on cache replacement policies.
      • SPEC CPU benchmarks will be used for performance evaluation.

  21. Gem5 simulator
      • Gem5 = M5 + GEMS
      • A modular, discrete-event-driven computer system simulator platform.
      • Rich availability of modules in the framework.

  22. • GEM5 has an open source license, a good object-oriented infrastructure and a very active mailing list.
      • System Modes: 1. System Emulation 2. Full System
      • CPU models: 1. Atomic Simple 2. Timing Simple 3. In-order 4. Out-of-order
      • Memory System: 1. Classic Model (M5) 2. Ruby Model (GEMS)
      • Supported ISAs: ALPHA, ARM, X86, PowerPC, SPARC, MIPS

  23. • Flexibility
      • Availability
      • Collaboration
      • Example (gem5 Python configuration):
          from m5.objects import Cache

          class L2Cache(Cache):
              size = '256kB'
              assoc = 8
              hit_latency = 20
              response_latency = 20
      • Sample statistics output:
          system.cpu.apic_clk_domain.clock 16000 # Clock period in ticks
          system.cpu.numCycles 345518 # number of cpu cycles simulated
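Statistics lines in the format shown above are easy to post-process when comparing runs. The parser below is a small sketch that assumes each line is "name value # comment"; the function name is hypothetical and it is not part of gem5.

```python
def parse_stats(text):
    """Map statistic names to numeric values from 'name value # comment' lines."""
    stats = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop the trailing comment
        if len(fields) < 2:
            continue
        name, value = fields[0], fields[1]
        try:
            stats[name] = float(value)
        except ValueError:
            continue  # skip banners and other non-numeric lines
    return stats


sample = """
system.cpu.apic_clk_domain.clock 16000 # Clock period in ticks
system.cpu.numCycles 345518 # number of cpu cycles simulated
"""
stats = parse_stats(sample)
print(stats["system.cpu.numCycles"])  # 345518.0
```

Running the same benchmark under different replacement policies and diffing the parsed dictionaries is one simple way to line up results.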

