Cache Memory
Raul Queiroz Feitosa
Content
Memory Hierarchy
Principle of Locality
Some Definitions
Cache Architectures: Fully Associative, Direct Mapped, Set Associative
Replacement Policy
Main Memory Update Policy
Multilevel Caches
Unified vs. Split Caches
Exercises

Cache and Main Memory
[Figure: the cache sits between the CPU and main memory; from now on, a copy of a main memory block is held in each cache slot.]

Cache Read Operation
[Figure: flowchart of a read; on a miss, the block is brought from main memory to the cache.]

Cache Addressing
[Figure: logical vs. physical addresses.]
Principle of Locality
Spatial locality: the processor tends to access a few restricted areas of the address space.
Temporal locality: the processor tends to access in the near future addresses accessed in the recent past.
Definitions
Hit ratio:
h = (number of accesses served by the cache) / (total number of accesses)
Miss ratio:
m = (number of accesses not served by the cache) / (total number of accesses) = 1 - h
Definitions
Example: Let
h be the hit ratio, t_hit the access time on a hit, and t_miss the access time on a miss.
The average memory access time t will be: t = h·t_hit + (1-h)·t_miss
[Figure: t as a function of h, falling from t_miss at h = 0 to t_hit at h = 1.]
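As a quick check of the formula, a minimal sketch; the numbers are illustrative, not from the slides:

```python
# Average memory access time: t = h*t_hit + (1-h)*t_miss
# Illustrative values: 95% hit ratio, 1 ns on a hit, 20 ns on a miss.
h, t_hit, t_miss = 0.95, 1.0, 20.0
t = h * t_hit + (1 - h) * t_miss
print(f"average access time = {t:.2f} ns")  # -> 1.95 ns
```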
Definitions
Block
The set of 2^b bytes in consecutive addresses, starting at an address whose b least significant bits are zero.
Note that the addresses of the bytes belonging to the same block coincide to the left of the b least significant bits.
The data exchange between the cache and the main memory is carried out block-by-block. Does it make sense?

address                block
00000000 … 00000111    block 0
00001000 … 00001111    block 1
…
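The block number and byte offset can be read directly off the address bits; a small sketch using the slide's example of 8-byte blocks (b = 3):

```python
# With 2**b-byte blocks, the block number is the address with the
# b least significant bits dropped; those b bits are the byte offset.
b = 3                   # 8-byte blocks, as in the table above
address = 0b00001001    # byte 9, which falls in block 1
block_number = address >> b         # drop the b offset bits
offset = address & ((1 << b) - 1)   # keep only the b offset bits
print(block_number, offset)  # -> 1 1
```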
Fully Associative Cache
Architecture
Each of the 2^L lines (line 0 to line 2^L - 1) holds a valid bit, a TAG field, and a VALUE field:
the TAG contains the number of the memory block; the VALUE field contains a copy of that block.
[Figure: cache with 2^L lines, each showing the valid bit, TAG, and VALUE fields.]
Fully Associative Cache
Operation
The cache controller compares the block number with the TAG field of
all lines simultaneously (associative search).
If a TAG matches the block number and the valid bit is "on", it is a hit.
The b least significant bits are used as a pointer to the byte/word within the
block.
[Address generated by the CPU: bits a-1 … b hold the block number; bits b-1 … 0 point to the byte/word within the block.]
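The associative search can be modeled in software; a minimal sketch, with the loop standing in for the simultaneous hardware comparison (the line layout and names are illustrative):

```python
# Each line is modeled as (valid, tag, value); the tag holds the block number.
def fa_lookup(lines, address, b):
    block_number = address >> b
    for valid, tag, value in lines:
        if valid and tag == block_number:       # hit: tag matches, valid bit on
            offset = address & ((1 << b) - 1)
            return value[offset]
    return None                                 # miss

# One valid line holding block 0x1D, one invalid line.
lines = [(1, 0x1D, bytes(range(8))), (0, 0x2A, bytes(8))]
print(fa_lookup(lines, (0x1D << 3) | 5, b=3))   # byte 5 of block 0x1D -> 5
```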
Fully Associative Cache
Problem
Comparing the block number with the TAG fields of all cache lines simultaneously (associative search) requires one comparator per line, which is expensive in hardware.
Consequence
Fully associative design is only used for small
capacity caches.
Direct Mapped Caches
Basic Idea:
Assign each main memory block to a single cache line.
[Figure: mapping function f from main memory blocks to cache lines.]
Direct Mapped Caches
Basic Idea:
Each main memory block can only be loaded into
the cache line it is mapped to.
Thus, it is no longer necessary to check all lines,
but just one.
Direct Mapped Caches
Operation:
The cache controller compares the leftmost address field
with the TAG field of the (single) cache line selected by the L bits.
[Address generated by the CPU: bits a-1 … b+L are compared with the TAG field; bits b+L-1 … b (L bits) point to a cache line; bits b-1 … 0 point to a byte/word within the block.]
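The three-way address split above can be sketched as follows; the sizes b = 3 and L = 10 are illustrative assumptions, not from the slides:

```python
# Direct-mapped address split: tag | line | offset.
b, L = 3, 10                         # 8-byte blocks, 1024 cache lines
address = 0x3BEF56
offset = address & ((1 << b) - 1)    # bits b-1..0: byte within block
line   = (address >> b) & ((1 << L) - 1)  # bits b+L-1..b: cache line
tag    = address >> (b + L)          # remaining bits: compared with TAG
print(hex(tag), hex(line), hex(offset))  # -> 0x1df 0x1ea 0x6
```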
Direct Mapped Caches
Problem:
Some lines may often be requested by different blocks, while other lines are rarely requested, which results in a non-optimal use of the cache capacity.
Set Associative Caches
Basic Idea:
Instead of assigning each main memory block to a single
cache line, assign each block to a set (associative) of cache lines.
[Figure: mapping function f from main memory blocks to associative sets of cache lines.]
Set Associative Caches
Basic Idea:
A block may be loaded into any cache line of the associative
set it is assigned to.
Set Associative Caches
Architecture
[Figure: 2^S sets (set 0 to set 2^S - 1), each containing several lines; every line holds valid bit, TAG, and VALUE fields.]
Set Associative Caches
Operation: assume that there are 2^S sets
The cache controller compares the leftmost address field
with the TAG fields of all lines of the associative set selected by the S bits (associative search).
[Address generated by the CPU: bits a-1 … b+S are compared with the TAG field; bits b+S-1 … b (S bits) point to a set; bits b-1 … 0 point to a byte/word within the block.]
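The only change from the direct-mapped split is that S bits now select a set rather than a single line; a sketch with illustrative sizes (b = 3, S = 12):

```python
# Set-associative address split: tag | set | offset.
b, S = 3, 12                          # 8-byte blocks, 4096 sets
address = 0x3BEF56
offset = address & ((1 << b) - 1)     # bits b-1..0: byte within block
set_no = (address >> b) & ((1 << S) - 1)  # bits b+S-1..b: associative set
tag    = address >> (b + S)           # remaining bits: compared with TAGs
print(hex(tag), hex(set_no), hex(offset))  # -> 0x77 0xdea 0x6
```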
Set Associative Caches
Fully Associative Caches:
Are set associative caches with a single associative set.
Direct Mapped Caches:
Are set associative caches whose "associative" sets
each contain a single line.
Set Associative Caches
Set size:
Keeping the overall cache capacity constant and
changing the number of lines/set.
[Figure: miss ratio vs. lines/set (1, 2, 4, 8): direct mapped, two-way, four-way, eight-way, fully associative. Above 4 lines/set the miss ratio does not change significantly.]
Set Associative Caches
[Figure: hit ratio vs. cache size (1 KB to 1 MB) for direct mapped and 2-, 4-, 8-, 16- and 32-way set associative caches.]

Replacement Policy
Least Recently Used - LRU
The least recently used line is evicted from the cache to
make room for a new main memory block.

Pseudo LRU
Example: a four-way set associative cache
The least recently used line of the least recently used half is
elected to leave the cache.
[Figure: decision tree for pseudo-LRU in a four-way set:
bit I0 points to the least recently used half (=0 or =1);
bits I1 and I2 each point to the least recently used line within their half.]

Replacement Policy
[Table: worked example on a reference sequence (a b a c b a d c b a …) showing the set contents after each access, for LRU and pseudo-LRU, with the incoming block that caused each miss.]
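The LRU policy for one associative set can be modeled with a short list-based sketch (the four-way size and the reference string are illustrative):

```python
# Toy LRU model for one associative set: the list front is the LRU
# line, the back is the most recently used line.
def lru_trace(refs, ways=4):
    set_lines = []
    evictions = []
    for block in refs:
        if block in set_lines:
            set_lines.remove(block)             # hit: refresh recency
        elif len(set_lines) == ways:
            evictions.append(set_lines.pop(0))  # miss in a full set: evict LRU
        set_lines.append(block)                 # block becomes most recent
    return evictions

print(lru_trace("abcddcebcadf"))  # -> ['a', 'd', 'e', 'b']
```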
Main Memory Update Policy
Write-through
All writes are carried out in the cache and in the main
memory.
The CPU does not halt until the main memory is updated.
Problem: lots of traffic, especially harmful in multiprocessors; about 15% of memory references are writes.
Main Memory Update Policy
Write-back
Each cache line has a bit (dirty) that indicates, when set (=1),
that the block copy in the cache differs from the main memory.
When the block is brought from main memory into the
cache, dirty =0
All writes are performed in the cache only and, in this case,
dirty=1.
The main memory is updated when the block selected for
replacement has dirty=1.
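The dirty-bit bookkeeping described above can be sketched as follows (the class and function names are illustrative, not a real cache API):

```python
# Write-back sketch: writes go to the cache only and set the dirty bit;
# main memory is written only when a dirty line is evicted.
class Line:
    def __init__(self, tag, value):
        self.tag, self.value, self.dirty = tag, value, 0  # dirty=0 on fill

writebacks = []  # stands in for main memory update traffic

def write(line, offset, byte):
    line.value[offset] = byte
    line.dirty = 1               # cache copy now differs from main memory

def evict(line):
    if line.dirty:               # clean lines are dropped without traffic
        writebacks.append((line.tag, bytes(line.value)))

ln = Line(0x1D, bytearray(8))
write(ln, 0, 0xFF)
evict(ln)
print(len(writebacks))  # -> 1 (one write-back, triggered by the dirty bit)
```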
Multilevel Caches
On-chip cache (L1): faster than bus access; frees the bus for other transfers.
Off-chip cache (L2) in static RAM: L2 access is much faster than DRAM or ROM; L2 often uses a separate data path.
L2 may now be on chip, resulting in an L3 cache (bus access, or now also on chip).

Multilevel Caches (L1 & L2)
[Figure: combined hit ratio of L1 and L2; a hit is counted in either cache.]

Unified vs. Split Caches
Split: one cache for data and one for instructions.
Eliminates contention between the instruction fetch/decode unit and the execution unit.
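The average access time formula extends naturally to two cache levels: a miss in L1 is served by L2 when it hits there, otherwise by main memory. A sketch with illustrative numbers (not from the slides):

```python
# Two-level average access time:
# t = h1*t1 + (1-h1) * (h2*t2 + (1-h2)*t_mem)
h1, h2 = 0.95, 0.80          # hit ratios of L1 and of L2 (on L1 misses)
t1, t2, t_mem = 1.0, 5.0, 60.0  # access times in ns
t = h1 * t1 + (1 - h1) * (h2 * t2 + (1 - h2) * t_mem)
print(f"{t:.2f} ns")  # -> 1.75 ns
```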
Exercises
Exercise 1: A cache with 64 Kbyte capacity operates with 8 byte blocks and is organized in associative sets of 4 lines each. What is the number of the associative set that may contain a copy of the byte at main memory address 3B EF 56 H?
Exercises
Exercise 2
Determine the address of the byte stored in the VALUE field at the position indicated by the shaded box. It is an 8-way set associative cache with 256 Kbyte capacity and 8 byte blocks. It is further known that the byte is stored in associative set number 24H and the TAG field has the value 18H.
[Figure: cache diagram with TAG and VALUE fields; one byte of a VALUE field is shaded.]
Exercises
Exercise 3
How many bits are required to represent all relevant configurations of the eight lines of a single associative set in a cache, which uses the LRU policy? And for the pseudo-LRU?
Cache Memory, 10/09/2020

Exercises
Exercise 4
The physical address space of the processor is 4 Gbyte (2^32 byte). Its cache stores up to 1 Mbyte (2^20 byte), operates with 256 byte blocks and is
How many main memory blocks map into a single associative set?
Exercises
Exercise 5:
A processor accesses 6 blocks of main memory in the following sequence: a b c d d c e b c a d f. All of them map into the same set of a four-way set associative cache. Assume that at the beginning all lines of this set are empty and are filled as misses occur. Indicate below each block replaced in the cache and the corresponding incoming block whose access caused the replacement. Do it for LRU and for pseudo-LRU.
References
Stallings chapters 4 and 5
Simulators
By William Stallings
http://www.ecs.umass.edu/ece/koren/architecture/Cache/frame0.htm