Cache Memories, 15-213: Introduction to Computer Systems, 10th Lecture

Carnegie Mellon
Cache Memories
15-213: Introduction to Computer Systems
10th Lecture, Sep. 23, 2010
Instructors: Randy Bryant and Dave O’Hallaron

Today
- Cache memory organization and operation
- Performance impact of caches
- The memory mountain
- Rearranging loops to improve spatial locality
- Using blocking to improve temporal locality

Cache Memories
- Cache memories are small, fast SRAM-based memories managed automatically in hardware.
  - They hold frequently accessed blocks of main memory.
- The CPU looks first for data in the caches (e.g., L1, L2, and L3), then in main memory.
- Typical system structure:
[Figure: the CPU chip contains the register file, ALU, cache memories, and bus interface; the system bus connects the bus interface to the I/O bridge, and the memory bus connects the I/O bridge to main memory.]

General Cache Organization (S, E, B)
- $S = 2^s$ sets
- $E = 2^e$ lines per set
- Each line holds a valid bit v, a tag, and a block of $B = 2^b$ data bytes (bytes $0, 1, 2, \ldots, B-1$).
- Cache size: $C = S \times E \times B$ data bytes

Cache Read
- Locate the set.
- Check whether any line in the set has a matching tag.
- Yes + line valid: hit.
- Locate the data starting at the block offset.

Address of word: [ tag: t bits | set index: s bits | block offset: b bits ]. The data begins at the block offset within the line's $B = 2^b$-byte block.

Example: Direct Mapped Cache (E = 1)
Direct mapped: one line per set. Assume a cache block size of 8 bytes.
Address of int: [ t bits | 0…01 | 100 ]
- Find the set: the set-index bits 0…01 select set 1 of the $S = 2^s$ sets.
- Check the line: valid? + tag match: assume yes = hit.
- Use the block offset: 100₂ = 4, so the int (4 bytes) occupies bytes 4 through 7 of the block.
- No match: the old line is evicted and replaced.

Direct-Mapped Cache Simulation
Parameters: t = 1, s = 2, b = 1 (address bits x | xx | x); M = 16 byte addresses, B = 2 bytes/block, S = 4 sets, E = 1 block/set.
Address trace (reads, one byte per read):
- 0 [0000₂]: miss
- 1 [0001₂]: hit
- 7 [0111₂]: miss
- 8 [1000₂]: miss
- 0 [0000₂]: miss
Addresses 0 and 8 both map to set 0 but have different tags (0 and 1), so each evicts the other and the final read of 0 misses again.
Final cache state (set 0 held M[0-1], then M[8-9], then M[0-1] again):
Set   v   Tag   Block
0     1   0     M[0-1]
1     0   ?     ?
2     0   ?     ?
3     1   0     M[6-7]

A Higher Level Example
Ignore the variables sum, i, j. Assume a cold (empty) cache, with a[0][0] at the start of a cache block; each 32 B block holds 4 doubles. (Worked on the blackboard.)

    double sum_array_rows(double a[16][16])
    {
        int i, j;
        double sum = 0;

        for (i = 0; i < 16; i++)
            for (j = 0; j < 16; j++)
                sum += a[i][j];
        return sum;
    }

    double sum_array_cols(double a[16][16])
    {
        int i, j;
        double sum = 0;

        for (j = 0; j < 16; j++)
            for (i = 0; i < 16; i++)
                sum += a[i][j];
        return sum;
    }

E-way Set Associative Cache (Here: E = 2)
E = 2: two lines per set. Assume a cache block size of 8 bytes.
Address of short int: [ t bits | 0…01 | 100 ]
- Find the set using the set-index bits 0…01.
- Compare both lines in the set: valid? + tag match: yes = hit.
- Use the block offset: 100₂ = 4, so the short int (2 bytes) occupies bytes 4 and 5 of the block.
- No match:
  - One line in the set is selected for eviction and replacement.
  - Replacement policies: random, least recently used (LRU), …

2-Way Set Associative Cache Simulation
Parameters: t = 2, s = 1, b = 1 (address bits xx | x | x); M = 16 byte addresses, B = 2 bytes/block, S = 2 sets, E = 2 blocks/set.
Address trace (reads, one byte per read):
- 0 [0000₂]: miss
- 1 [0001₂]: hit
- 7 [0111₂]: miss
- 8 [1000₂]: miss
- 0 [0000₂]: hit
Addresses 0 and 8 still map to the same set (set 0), but with two lines per set both blocks can stay resident, so the final read of 0 hits.
Final cache state:
Set   Line   v   Tag   Block
0     0      1   00    M[0-1]
0     1      1   10    M[8-9]
1     0      1   01    M[6-7]
1     1      0   ?     ?

A Higher Level Example (revisited for set associative caches)
The same two functions, sum_array_rows and sum_array_cols, are analyzed again after introducing set associative caches (worked on the blackboard). As before, ignore the variables sum, i, j; assume a cold (empty) cache with a[0][0] at the start of a block, and 32 B blocks holding 4 doubles each.
