Comparison of Cache Replacement Policies using GEM5 Simulator - PowerPoint Presentation

SLIDE 1

Comparison of Cache Replacement Policies using GEM5 Simulator

Teammates

  • Bhagyashree
  • Nivin
  • Sri Divya
SLIDE 2

Performance Bottleneck

  • The performance gap between CPU and DRAM has increased drastically.
  • This leads to a producer-consumer problem.
  • How do we fix this?
SLIDE 3

Ideal Cache

What are we looking for?

  • A cache that is big and fast.
  • Provides good temporal & spatial locality.
  • Cheap to buy.
  • Easy to construct.
SLIDE 4

Big and Fast Memory

  • We introduced the memory hierarchy to get memory that is both big and fast.
  • Now a process can have memory as large as an HDD and nearly as fast as registers.
  • For simplicity, let's look at the memory hierarchy as a sequence <1, 2, 3, 4>.
  • A lower number denotes a faster cache level.

SLIDE 5

More about memory

Let's model the problem as producer and consumer: the processor is the consumer and memory is the producer.

  • Consider the sequence <1, 2, 3, 4> in increasing order of memory level.
  • Seq1: <1, 2, 3, 4> makes perfect sense.
  • Seq2: <4, 3, 2, 1> is equivalent to just <4>, right?
  • Seq3: <1, 3, 2, 4> - does this improve performance?
SLIDE 6

Research Idea 1: Increase the cache levels

  • The previous discussion suggested why the number of cache levels is limited to 3 or 4.
  • Is cost the only factor stopping us from adding more?

Level 1 | Level 2 | Level 3 | ... | Level n

SLIDE 7

Cache Replacement Policy

  • When the cache is full, the cache replacement policy (CRP) decides which block is best to discard.
  • More about locality:
  • Temporal - "a resource that is referenced at one point in time will be referenced again sometime in the near future." - e.g. a web cache.
  • Spatial - "the likelihood of referencing a resource is higher if a resource near it was just referenced." - e.g. matrix multiplication.
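Spatial locality in matrix code can be made concrete with a toy model (my own illustration, not from the slides; `LINE` and `N` are arbitrary values): count cache-line fetches while touching every element of a row-major N x N matrix once, with a "cache" that holds a single line.

```python
# Toy model of spatial locality: a one-line cache over a row-major matrix.
LINE = 8   # elements per cache line (illustrative)
N = 32     # matrix dimension (illustrative)

def line_fetches(order):
    """Count how often the traversal moves to a different cache line."""
    fetched, last = 0, None
    for i, j in order:
        line = (i * N + j) // LINE   # line holding element (i, j)
        if line != last:             # leaving the current line costs a fetch
            fetched += 1
            last = line
    return fetched

row_major = [(i, j) for i in range(N) for j in range(N)]
col_major = [(i, j) for j in range(N) for i in range(N)]

print(line_fetches(row_major))  # 128 = N*N/LINE: good spatial locality
print(line_fetches(col_major))  # 1024 = N*N: every access lands on a new line
```

The row-major walk touches each 8-element line 8 times in a row, so it fetches only N*N/LINE lines; the column-major walk jumps a whole row between accesses, so every access needs a new line.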

SLIDE 8

Research Idea 2: Cache replacement favors a specific locality

  • Is it valid to say LRU is more of a temporal-locality solver?
  • Simulation to the rescue.
  • What else contributes to spatial locality? Think of a larger block size.

Temporal | Spatial

SLIDE 9

Research Idea 3: Mixing cache replacement algorithms at different levels

  • Following from the previous idea, we want to mix cache replacement algorithms at different levels and study the performance.
  • Going back to the sequence <1, 2, 3, 4>, one can argue that a higher sequence number influences the rest.
  • E.g., take matrix multiplication: let level 1 favor spatial locality and level 2 favor temporal locality.

Level 1 - LRU | Level 2 - Random | Level 3 | ... | Level n
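A minimal sketch of the mixing idea (my own illustration, not the project's simulator): a toy fully-associative cache with a pluggable replacement policy, composed into a two-level hierarchy with LRU at level 1 and Random at level 2, as in the diagram above.

```python
import random

class ToyCache:
    """Tiny fully-associative cache with a pluggable replacement policy."""
    def __init__(self, capacity, policy):
        self.capacity, self.policy = capacity, policy
        self.blocks = []                       # most recently used at the end

    def access(self, block):
        if block in self.blocks:               # hit: refresh recency order
            self.blocks.remove(block)
            self.blocks.append(block)
            return True
        if len(self.blocks) >= self.capacity:  # miss on a full cache: evict
            self.blocks.remove(self.policy(self.blocks))
        self.blocks.append(block)
        return False

lru = lambda blocks: blocks[0]                 # least recently used is in front
rnd = lambda blocks: random.choice(blocks)

# Level 1 runs LRU and level 2 runs Random, as in the slide's diagram.
l1, l2 = ToyCache(4, lru), ToyCache(16, rnd)
hits = [0, 0]
for addr in [i % 8 for i in range(100)]:       # cyclic stream of 8 blocks
    if l1.access(addr):
        hits[0] += 1
    elif l2.access(addr):
        hits[1] += 1
```

On this cyclic stream the 4-entry LRU level never hits (the classic LRU pathology on loops larger than the cache), while level 2 holds all 8 blocks and catches everything after the first pass; since level 2 never evicts here, the Random policy makes the result deterministic.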

SLIDE 10

Cache Associativity

  • Fully associative - the best miss rate (one set holding all 2^k blocks).
  • Set associative - intermediate.
  • Direct-mapped - set size of 1.
  • Larger sets and higher associativity lead to fewer cache conflicts and lower miss rates, but they also increase the hardware cost.
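How associativity shows up in the address bits can be sketched as follows (the 64-byte block and 128-set geometry are illustrative assumptions, not from the slides):

```python
# Sketch: decompose an address into tag / set index / block offset.
def split_address(addr, block_size=64, num_sets=128):
    offset_bits = block_size.bit_length() - 1       # log2(64)  = 6
    index_bits = num_sets.bit_length() - 1          # log2(128) = 7
    offset = addr & (block_size - 1)                # byte within the block
    index = (addr >> offset_bits) & (num_sets - 1)  # which set to search
    tag = addr >> (offset_bits + index_bits)        # compared to stored tags
    return tag, index, offset
```

A direct-mapped cache has one way per set; a fully associative cache collapses to a single set, so the index field disappears and only tag and offset remain.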

SLIDE 11

Research Idea 4: Cache associativity

Intuition: use a higher set value for the lower cache levels; let's forget the cost for now. But does reversing it give better performance? Coming back to sequences: cache sequence <1, 2, 3, 4>, set sequence <N, N-1, N-2, ...>, and then the reverse of it.

SLIDE 12

Research Idea 5: Let's combine (Flow Diagram)

  • Let's represent all the above research ideas as a tree.
  • Compare each against OPT and study where each algorithm stands.

Measurement axes (from the flow diagram): cache performance, application, hardware, locality, cost, complexity, benchmark, set associativity, levels mix.

SLIDE 13

Common Cache Replacement Policies - LRU:

SLIDE 14

Common Cache Replacement Policies

  • LRU:
  • Expensive in terms of speed and hardware.
  • Needs to remember the order in which all N lines were accessed.
  • N! possible orderings - O(log N!) LRU bits.
  • 2 ways → AB, BA = 2 = 2!
  • 3 ways → ABC, ACB, BAC, BCA, CAB, CBA = 6 = 3!
  • Pseudo-LRU: O(N) bits.
  • Approximates the LRU policy with a binary tree.
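True LRU for one set can be sketched with an ordered map (a toy illustration of mine, not gem5's implementation), together with the slide's bit-count formula:

```python
from collections import OrderedDict
from math import ceil, factorial, log2

class LRUSet:
    """Toy N-way set under true LRU."""
    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()             # order = recency, oldest first

    def access(self, tag):
        if tag in self.lines:
            self.lines.move_to_end(tag)        # hit: mark most recently used
            return True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)     # evict the least recently used
        self.lines[tag] = True                 # miss: insert as most recent
        return False

# The slide's bit count: N lines have N! recency orderings, so true LRU
# needs ceil(log2(N!)) state bits per set.
lru_bits = lambda n: ceil(log2(factorial(n)))
# lru_bits(2) == 1, lru_bits(3) == 3, lru_bits(4) == 5
```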

SLIDE 15

Common Cache Replacement Policies - Pseudo-LRU:
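The binary-tree approximation can be sketched for a 4-way set (my own illustration; the slide itself showed a diagram). Three bits replace the 5 bits true LRU would need; the bit convention here (0 = victim in the left half, 1 = right half) is an assumption of this sketch.

```python
class TreePLRU:
    """Tree pseudo-LRU for one 4-way set: 3 bits instead of true LRU's 5."""
    def __init__(self):
        self.bits = [0, 0, 0]   # [root, left pair, right pair]

    def touch(self, way):
        """On an access, point every bit on the path away from `way`."""
        self.bits[0] = 1 if way < 2 else 0       # protect the used half
        if way < 2:
            self.bits[1] = 1 if way == 0 else 0  # protect the used way
        else:
            self.bits[2] = 1 if way == 2 else 0

    def victim(self):
        """Follow the bits to the approximately least recently used way."""
        if self.bits[0] == 0:
            return 0 if self.bits[1] == 0 else 1
        return 2 if self.bits[2] == 0 else 3
```

Touching ways 0, 1, 2, 3 in order leaves way 0 as the victim, matching true LRU; on longer, irregular access patterns the approximation can diverge from true LRU.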

SLIDE 16

Common Cache Replacement Policies - MRU:

  • In contrast to LRU, discards the most recently used items first.
  • MRU algorithms are most useful in situations where the older an item is, the more likely it is to be accessed, e.g. repeated sequential scans over data larger than the cache.

SLIDE 17

Common Cache Replacement Policies

Random Replacement

  • Simpler, but at the cost of performance.

Round Robin (or FIFO) Replacement

  • Replaces the oldest block in cache memory, using a circular counter.
  • Each cache memory set is accompanied by a circular counter which points to the next cache block to be replaced; the counter is updated on every cache miss.
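The circular counter described above is tiny to model (a sketch, not gem5's code):

```python
class RoundRobinSet:
    """Round-robin (FIFO) victim choice via the per-set circular counter."""
    def __init__(self, num_ways):
        self.num_ways = num_ways
        self.counter = 0                       # next block to replace

    def next_victim(self):
        """Called on a cache miss: return a victim and advance the counter."""
        victim = self.counter
        self.counter = (self.counter + 1) % self.num_ways
        return victim
```

A 4-way set yields victims 0, 1, 2, 3, 0, 1, ... on successive misses; no per-access state is updated on hits, which is what makes the policy cheap.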

SLIDE 18

Common Cache Replacement Policies: Adaptive Replacement Cache (ARC)

  • The cache is split into two LRU lists, T1 and T2, with |T1| + |T2| = C (the cache size).
  • B1 and B2 are ghost caches (metadata only, not in memory) tracking blocks recently evicted from T1 and T2.

SLIDE 19

Common Cache Replacement Policies

  • Clock with Adaptive Replacement (CAR):
  • Combines the advantages of ARC and CLOCK.
  • It uses 4 doubly-linked lists: two clocks (T1 & T2) and 2 simple LRU lists (B1 & B2).
  • The T1 clock stores pages based on "recency" and T2 stores pages based on "frequency".
  • B1 & B2 contain pages that have recently been evicted from T1 & T2, respectively.

SLIDE 20

Simulation and Benchmark

  • Why do we need a system simulator?
  • CPU behavior depends on the memory system, and the behavior of the memory system depends on the CPUs.
  • We chose gem5 over other simulators, as it is much easier to perform different measurements on cache replacement policies.
  • SPEC CPU benchmarks will be used for performance evaluation.

SLIDE 21

gem5 simulator

  • gem5 = M5 + GEMS
  • A modular, discrete-event-driven computer system simulator platform.
  • Rich availability of modules in the framework.

SLIDE 22

  • gem5 has an open-source license, a good object-oriented infrastructure, and a very active mailing list.
  • System modes - 1. System-call Emulation, 2. Full System
  • CPU models - 1. Atomic Simple, 2. Timing Simple, 3. In-order, 4. Out-of-order
  • Memory system - 1. Classic model (M5), 2. Ruby model (GEMS)
  • Supported ISAs - ALPHA, ARM, x86, PowerPC, SPARC, MIPS
SLIDE 23

  • Flexibility
  • Availability
  • Collaboration
  • Example - an L2 cache configured in gem5's Python scripts:

    class L2Cache(Cache):
        size = '256kB'
        assoc = 8
        hit_latency = 20
        response_latency = 20

  • Sample statistics output:

    system.cpu.apic_clk_domain.clock     16000   # Clock period in ticks
    system.cpu.numCycles                345518   # number of cpu cycles simulated

SLIDE 24

References

  • http://www.ece.uah.edu/~milenka/docs/milenkovic_acmse04r.pdf
  • http://www.cs.ucf.edu/~neslisah/final.pdf
  • http://www.cse.scu.edu/~mwang2/projects/Cache_replacement_10s.pdf
  • https://ece752.ece.wisc.edu/lect11-cache-replacement.pdf
  • https://en.wikipedia.org/wiki/Cache_replacement_policies
  • http://people.csail.mit.edu/emer/papers/2010.06.isca.rrip.pdf
  • http://www.cs.utexas.edu/users/mckinley/papers/evict-me-pact-2002.pdf
  • http://snir.cs.illinois.edu/PDF/Temporal%20and%20Spatial%20Locality.pdf
  • https://math.mit.edu/~stevenj/18.335/ideal-cache.pdf
  • https://pdfs.semanticscholar.org/6ebe/c8701893a6770eb0e19a0d4a732852c86256.pdf
  • https://www.youtube.com/watch?v=fD3hhNnfL6k
  • http://pages.cs.wisc.edu/~david/courses/cs752/Spring2015/gem5-tutorial/index.html
  • http://blog.stuffedcow.net/2013/01/ivb-cache-replacement/
  • https://github.com/dependablecomputinglab/csi3102-gem5-new-cache-policy