The CPU-Memory Gap


  1. High-Performance Data Storage (Sean Barker)
     Data Storage:
     • Disks: hard disk (HDD), solid state drive (SSD)
     • Random Access Memory: dynamic RAM (DRAM), static RAM (SRAM)
     • Registers: %rax, %rbx, ...

  2. The CPU-Memory Gap: [chart] access time (ns, log scale) vs. year, 1985-2015, for disk seek time, SSD access time, DRAM access time, SRAM access time, CPU cycle time, and effective CPU cycle time, showing disk and DRAM speeds improving far more slowly than CPU speeds.
     Caching: a smaller, faster, more expensive memory (the cache) holds a subset of the blocks of a larger, slower, cheaper memory. Memory is viewed as partitioned into fixed-size "blocks", and data is copied between memory and cache in block-sized transfer units (in the slide's figure, a 4-slot cache holds blocks 8, 9, 14, and 3 of a 16-block memory, and blocks 4 and 10 are being transferred).
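     A minimal sketch of the block partitioning idea: every byte address belongs to exactly one block, and the whole block is the transfer unit. The block size used here (8 bytes) is an assumption for illustration; the slide's figure does not specify it.

       #include <stdio.h>

       #define BLOCK_SIZE 8   /* bytes per block (assumed for illustration) */

       int main(void) {
           unsigned addr   = 117;                   /* an arbitrary byte address */
           unsigned block  = addr / BLOCK_SIZE;     /* which block the byte lives in */
           unsigned offset = addr % BLOCK_SIZE;     /* position of the byte inside that block */
           /* Accessing address 117 causes its whole block (addresses 112-119)
              to be copied between memory and cache as one transfer unit. */
           printf("address %u -> block %u, offset %u\n", addr, block, offset);
           return 0;
       }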

  3. Cache Hit: a request for block 14 finds the block already in the cache (which holds blocks 8, 9, 14, and 3), so the request is satisfied directly from the cache.
     Cache Miss: a request for block 12 finds no matching block in the cache, so block 12 is first copied from memory into the cache (evicting one of the resident blocks), and the request is then satisfied from the cache.
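     A small sketch of the hit/miss behavior on these two slides, modeling the cache as a 4-slot array of block numbers. The replacement choice on a miss (which slot to overwrite) is not specified on the slides, so the victim slot here is illustrative.

       #include <stdio.h>

       #define SLOTS 4

       static int cache[SLOTS] = { 8, 9, 14, 3 };   /* cache contents from the slide */

       /* Return 1 on a hit; on a miss, copy the block in over the victim slot and return 0. */
       static int access_block(int block, int victim) {
           for (int i = 0; i < SLOTS; i++)
               if (cache[i] == block)
                   return 1;                /* hit: block already resident */
           cache[victim] = block;           /* miss: fetch block from memory into the cache */
           return 0;
       }

       int main(void) {
           printf("request 14: %s\n", access_block(14, 1) ? "hit" : "miss");
           printf("request 12: %s\n", access_block(12, 1) ? "hit" : "miss");
           printf("request 12: %s\n", access_block(12, 1) ? "hit" : "miss");   /* now a hit */
           return 0;
       }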

  4. Direct-Mapped Cache: a table of 1024 lines (rows 0-1023), each holding a valid bit (V), a dirty bit (D), a tag, and 8 bytes of data.
     Direct-Mapped Address Components (assumes 32-bit addresses): the address divides into a tag (19 bits), an index (10 bits), and a byte offset (3 bits). The index answers "which line (row) should we check?", i.e., the one place the data could be.
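     A sketch of how the three fields are pulled out of a 32-bit address with shifts and masks; the field widths are the ones on the slide (19-bit tag, 10-bit index, 3-bit byte offset), and the example address is arbitrary.

       #include <stdio.h>
       #include <stdint.h>

       #define OFFSET_BITS 3    /* 8-byte blocks -> 3 byte-offset bits */
       #define INDEX_BITS  10   /* 1024 lines    -> 10 index bits      */
       /* The remaining 32 - 10 - 3 = 19 bits are the tag. */

       int main(void) {
           uint32_t addr   = 0x12345678u;                          /* arbitrary example address */
           uint32_t offset =  addr & ((1u << OFFSET_BITS) - 1);
           uint32_t index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
           uint32_t tag    =  addr >> (OFFSET_BITS + INDEX_BITS);
           printf("tag=%u index=%u offset=%u\n",
                  (unsigned)tag, (unsigned)index, (unsigned)offset);
           return 0;
       }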

  5. Direct-Mapped Address Components (continued): the byte offset tells us which subset of the block to retrieve. In the slide's example, an address with tag 4217, index 4, and byte offset 2 checks line 4, which holds tag 4217, and retrieves the bytes starting at offset 2 of that line's 8-byte block.
     Direct-Mapped Cache Lookup: given a memory address, split it into tag, index, and byte offset; the index selects one line; the access is a hit (1) only if the line's valid bit is set AND its stored tag equals the address tag, otherwise it is a miss (0); on a hit, the byte offset selects the requested byte(s) from the line's data.
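     A sketch of the lookup logic just described (index selects a line; hit only if the valid bit is set AND the tags match). The structure and function names are illustrative, and the data path that selects the requested bytes is reduced to returning a pointer into the block.

       #include <stdint.h>
       #include <stddef.h>

       #define LINES      1024
       #define BLOCK_SIZE 8

       struct line {
           uint8_t  valid;             /* V: line holds real data               */
           uint8_t  dirty;             /* D: block modified since it was loaded */
           uint32_t tag;               /* identifies which block is cached      */
           uint8_t  data[BLOCK_SIZE];  /* the cached 8-byte block               */
       };

       static struct line cache[LINES];

       /* Return a pointer to the requested byte on a hit, NULL on a miss. */
       static uint8_t *lookup(uint32_t tag, uint32_t index, uint32_t offset) {
           struct line *l = &cache[index];        /* the index selects exactly one line */
           if (l->valid && l->tag == tag)         /* hit: valid AND tag match           */
               return &l->data[offset];           /* byte offset selects the byte(s)    */
           return NULL;                           /* miss: block must be fetched        */
       }

       int main(void) {
           cache[4].valid = 1;
           cache[4].tag   = 4217;                        /* the slide's example line */
           return lookup(4217, 4, 2) != NULL ? 0 : 1;    /* hit -> exit status 0 */
       }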

  6. 2-Way Set Associative (1024 lines): the address divides into a tag (20 bits), a set index (9 bits), and a byte offset (3 bits). Same capacity as the previous example: 1024 rows with 1 entry each vs. 512 rows (sets) with 2 entries each. Each set holds two lines, each with its own V, D, tag, and 8 bytes of data; in the slide's example, set 4 holds one line with tag 4063 and another with tag 3941.
     2-Way Set Associative Line Matching: the set field selects a set, and the address tag is compared against the tags of both lines in that set (here tag 3941 in set 4 matches the second line); a multiplexer then selects the correct value from the matching line's data using the byte offset.
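     A sketch of 2-way line matching under the same geometry (512 sets, 2 lines per set, 8-byte blocks): the set field picks a set and the tag is compared against both lines. Hardware performs the comparisons in parallel, while this sketch simply loops over the two ways.

       #include <stdint.h>
       #include <stddef.h>

       #define SETS       512
       #define WAYS       2
       #define BLOCK_SIZE 8

       struct way {
           uint8_t  valid, dirty;
           uint32_t tag;
           uint8_t  data[BLOCK_SIZE];
       };

       static struct way cache[SETS][WAYS];

       /* Check every line in the selected set; hit if any valid line's tag matches. */
       static uint8_t *lookup(uint32_t tag, uint32_t set, uint32_t offset) {
           for (int w = 0; w < WAYS; w++) {
               struct way *l = &cache[set][w];
               if (l->valid && l->tag == tag)
                   return &l->data[offset];   /* the multiplexer: take bytes from the matching way */
           }
           return NULL;                       /* miss in both ways */
       }

       int main(void) {
           cache[4][0].valid = 1;  cache[4][0].tag = 4063;   /* the slide's set 4 */
           cache[4][1].valid = 1;  cache[4][1].tag = 3941;
           return lookup(3941, 4, 0) != NULL ? 0 : 1;        /* matches the second way -> hit */
       }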

  7. General Cache Model (S, E, B): the cache consists of S = 2^s sets with E = 2^e lines per set; each line holds a valid bit, a tag, and a cache block of B = 2^b bytes (the data, bytes 0 through B-1). The address of a word divides into t tag bits, s set index bits, and b block offset bits.
     Locality:
     • Temporal locality: recently referenced items are likely to be referenced again in the near future.
     • Spatial locality: items with nearby addresses tend to be referenced close together in time.
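     The (S, E, B) parameters fix both the total data capacity (S * E * B bytes) and the address split; a small sketch of that arithmetic, plugging in the direct-mapped cache from the earlier slides (S = 1024, E = 1, B = 8) as the example values:

       #include <stdio.h>

       int main(void) {
           unsigned s = 10, e = 0, b = 3;                 /* S = 2^s, E = 2^e, B = 2^b */
           unsigned S = 1u << s, E = 1u << e, B = 1u << b;
           unsigned capacity = S * E * B;                 /* data bytes the cache can hold */
           unsigned t = 32 - s - b;                       /* tag bits, assuming 32-bit addresses */
           printf("S=%u E=%u B=%u -> capacity=%u bytes; address = %u tag + %u set + %u offset bits\n",
                  S, E, B, capacity, t, s, b);
           return 0;
       }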

  8. Locality Example:

       sum = 0;
       for (i = 0; i < n; i++)
           sum += a[i];
       return sum;

     The array a is accessed with stride 1 (spatial locality) and sum is reused on every iteration (temporal locality).

     Locality Design: two ways to sum a 2D array.

       /* v1: row-by-row traversal */
       int sum_array_rows(int a[M][N]) {
           int i, j, sum = 0;
           for (i = 0; i < M; i++)
               for (j = 0; j < N; j++)
                   sum += a[i][j];
           return sum;
       }

       /* v2: column-by-column traversal */
       int sum_array_cols(int a[M][N]) {
           int i, j, sum = 0;
           for (j = 0; j < N; j++)
               for (i = 0; i < M; i++)
                   sum += a[i][j];
           return sum;
       }

     Because C stores arrays in row-major order, v1 visits consecutive addresses (stride-1, good spatial locality) while v2 jumps N elements between accesses (stride-N, poor spatial locality).
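     A small harness for comparing the two versions (the array dimensions and the use of clock() are illustrative, not from the slides); with an array much larger than the caches, the row-by-row traversal is typically several times faster than the column-by-column one.

       #include <stdio.h>
       #include <time.h>

       #define M 4096
       #define N 4096

       static int a[M][N];                          /* 64 MB: much larger than any cache level */

       static double time_sum(int by_rows) {
           clock_t start = clock();
           long sum = 0;
           if (by_rows)
               for (int i = 0; i < M; i++)          /* v1: stride-1 access */
                   for (int j = 0; j < N; j++)
                       sum += a[i][j];
           else
               for (int j = 0; j < N; j++)          /* v2: stride-N access */
                   for (int i = 0; i < M; i++)
                       sum += a[i][j];
           printf("sum=%ld  ", sum);                /* use sum so the loops are not optimized away */
           return (double)(clock() - start) / CLOCKS_PER_SEC;
       }

       int main(void) {
           printf("rows: %.3f s\n", time_sum(1));
           printf("cols: %.3f s\n", time_sum(0));
           return 0;
       }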

  9. Intel Core i7 Cache Hierarchy: within the processor package, each core has its own registers, L1 i-cache and d-cache (32 KB each, 8-way, 4-cycle access), and L2 unified cache (256 KB, 8-way, 10-cycle access); a single L3 unified cache (8 MB, 16-way, 40-75 cycle access) is shared by all cores and sits in front of main memory. Block size is 64 bytes for all caches.
     The Memory Hierarchy, from smaller, faster, and costlier per byte at the top to larger, slower, and cheaper per byte at the bottom:
     • Registers: on-chip CPU storage that instructions can access directly, 1 cycle to access
     • L1, L2 cache(s) (SRAM): ~10s of cycles to access
     • Main memory (DRAM): ~100 cycles to access
     • Local secondary storage (Flash SSD / disk): ~100 M cycles to access
     • Remote secondary storage (tapes, Web servers / Internet): slower than local disk to access
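     Working out the L1 d-cache geometry from the numbers above (32 KB capacity, 8-way, 64-byte blocks); the same arithmetic applies to the L2 and L3 caches:

       #include <stdio.h>

       int main(void) {
           unsigned capacity = 32 * 1024;               /* 32 KB L1 d-cache      */
           unsigned ways     = 8;                       /* 8-way set associative */
           unsigned block    = 64;                      /* 64-byte blocks        */
           unsigned lines    = capacity / block;        /* 512 lines             */
           unsigned sets     = lines / ways;            /* 64 sets               */
           unsigned b = 0, s = 0;
           for (unsigned x = block; x > 1; x >>= 1) b++;   /* log2(block) = offset bits    */
           for (unsigned x = sets;  x > 1; x >>= 1) s++;   /* log2(sets)  = set-index bits */
           printf("%u lines in %u sets; %u offset bits, %u set bits\n", lines, sets, b, s);
           return 0;
       }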
