 
Review: Major Components of a Computer
[Figure: Processor (Control, Datapath), Memory, and Devices (Input, Output). The memory hierarchy consists of Cache, Main Memory, and Secondary Memory (Disk).]

Original slides from: Computer Architecture: A Quantitative Approach, Hennessy, Patterson
Modified slides by Yashwant Malaiya, Colorado State University

Processor-Memory Performance Gap
[Figure: Performance (log scale, 1 to 10000) versus Year (1980 to 2004)]
  - µProc: 55%/year (2X/1.5yr), "Moore's Law"
  - DRAM: 7%/year (2X/10yrs)
  - Processor-Memory Performance Gap: grows 50%/year

The Memory Hierarchy Goal
Fact: Large memories are slow and fast memories are small
How do we create a memory that gives the illusion of being large, cheap and fast (most of the time)?
  - With hierarchy
  - With parallelism
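The growth rates quoted on the performance-gap slide can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `doubling_time` is made up here, and the only inputs are the 55%/year and 7%/year figures from the slide.

```python
import math

def doubling_time(annual_growth):
    """Years needed to double at a given annual growth rate (0.55 means 55%/year)."""
    return math.log(2) / math.log(1 + annual_growth)

# Figures taken from the slide; everything else is derived from them.
cpu_growth = 0.55
dram_growth = 0.07

cpu_doubling = doubling_time(cpu_growth)    # roughly 1.6 years
dram_doubling = doubling_time(dram_growth)  # roughly 10 years

# The gap grows by the ratio of the two rates each year.
gap_growth = (1 + cpu_growth) / (1 + dram_growth) - 1

print(f"CPU doubles every {cpu_doubling:.2f} years")
print(f"DRAM doubles every {dram_doubling:.2f} years")
print(f"Gap grows about {gap_growth:.0%} per year")
```

The computed gap growth comes out a little under the slide's rounded figure of 50%/year, which is consistent with the slide quoting approximate rates.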
§ 5.1 Introduction
(Chapter 5: Large and Fast: Exploiting Memory Hierarchy)

A Typical Memory Hierarchy
Take advantage of the principle of locality to present the user with as much memory as is available in the cheapest technology at the speed offered by the fastest technology.
[Figure: On-chip components (Control, Datapath, RegFile, ITLB, DTLB, Instr Cache, Data Cache), Second Level Cache (SRAM), Main Memory (DRAM), Secondary Memory (Disk)]

  Level                       Speed (%cycles)   Size (bytes)
  RegFile                     ½'s               100's
  Instr Cache / Data Cache    1's               10K's
  Second Level Cache (SRAM)   10's              M's
  Main Memory (DRAM)          100's             G's
  Secondary Memory (Disk)     10,000's          T's
  Cost: highest at the top, lowest at the bottom

Memory Technology
Static RAM (SRAM)
  - 0.5-2.5ns, 2010: $2000-$5000 per GB (2015: same?)
Dynamic RAM (DRAM)
  - 50-70ns, 2010: $20-$75 per GB (2015: <$10 per GB)
Flash Memory
  - 70-150ns, 2010: $4-$12 per GB (2015: $.14 per GB)
Magnetic disk
  - 5ms-20ms, $0.2-$2.0 per GB (2015: $.7 per GB)
Ideal memory
  - Access time of SRAM
  - Capacity and cost/GB of disk

Principle of Locality
Programs access a small proportion of their address space at any time
Temporal locality
  - Items accessed recently are likely to be accessed again soon
  - e.g., instructions in a loop, induction variables
Spatial locality
  - Items near those accessed recently are likely to be accessed soon
  - e.g., sequential instruction access, array data

Taking Advantage of Locality
Memory hierarchy
Store everything on disk
Copy recently accessed (and nearby) items from disk to smaller DRAM memory
  - Main memory
Copy more recently accessed (and nearby) items from DRAM to smaller SRAM memory
  - Cache memory attached to CPU
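Spatial locality, as described above for array data, can be made concrete by looking at address traces. The following is a minimal sketch, not something from the slides: the helper names `trace` and `same_block_fraction`, the 4-byte element size, and the 32-byte block size are all assumptions chosen for illustration. It measures how often two consecutive accesses fall in the same block when a 2D array is traversed in row-major versus column-major order.

```python
def trace(rows, cols, row_major=True, elem_size=4):
    """Byte addresses touched when walking a rows x cols array stored row by row."""
    if row_major:
        order = [(r, c) for r in range(rows) for c in range(cols)]
    else:
        order = [(r, c) for c in range(cols) for r in range(rows)]
    return [(r * cols + c) * elem_size for r, c in order]

def same_block_fraction(addresses, block_size=32):
    """Fraction of consecutive access pairs that land in the same block."""
    pairs = zip(addresses, addresses[1:])
    same = sum(a // block_size == b // block_size for a, b in pairs)
    return same / (len(addresses) - 1)

row_fraction = same_block_fraction(trace(64, 64, row_major=True))
col_fraction = same_block_fraction(trace(64, 64, row_major=False))

print(f"row-major:    {row_fraction:.3f}")
print(f"column-major: {col_fraction:.3f}")
```

Under these assumptions, the row-major walk reuses the current block for most accesses, while the column-major walk jumps to a different block on every access, so copying nearby items pays off only in the first case.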
Memory Hierarchy Levels
Block (aka line): unit of copying
  - May be multiple words
If accessed data is present in upper level
  - Hit: access satisfied by upper level
  - Hit ratio: hits/accesses
If accessed data is absent
  - Miss: block copied from lower level
  - Time taken: miss penalty
  - Miss ratio: misses/accesses = 1 - hit ratio
  - Then accessed data supplied from upper level

Characteristics of the Memory Hierarchy
[Figure: Processor, L1$, L2$, Main Memory, Secondary Memory, with increasing distance from the processor in access time; the width of each level shows the (relative) size of the memory at that level]
Unit of transfer between levels:
  - Processor and L1$: 4-8 bytes (word)
  - L1$ and L2$: 8-32 bytes (block)
  - L2$ and Main Memory: 1 to 4 blocks
  - Main Memory and Secondary Memory: 1,024+ bytes (disk sector = page)
Inclusive: what is in L1$ is a subset of what is in L2$, which is a subset of what is in MM, which is a subset of what is in SM

§ 5.2 The Basics of Caches

Cache Memory
Cache memory
  - The level of the memory hierarchy closest to the CPU
Given accesses X_1, ..., X_{n-1}, X_n
  - How do we know if the data is present?
  - Where do we look?

Cache Size
[Figure: hit rate and 1/(cycle time) versus increasing cache size, with an optimum marked]
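The hit ratio and miss ratio definitions above can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the organization the slides go on to describe: the upper level holds a fixed number of blocks, any block may go anywhere, and the least recently used block is evicted on a miss. The function name `simulate` and its parameters are hypothetical.

```python
from collections import OrderedDict

def simulate(addresses, num_blocks=4, block_size=16):
    """Return (hit_ratio, miss_ratio) for a tiny upper level with LRU replacement."""
    upper = OrderedDict()  # block number -> None, least recently used first
    hits = 0
    for addr in addresses:
        block = addr // block_size
        if block in upper:
            hits += 1                      # hit: access satisfied by upper level
            upper.move_to_end(block)       # mark block as most recently used
        else:
            if len(upper) >= num_blocks:
                upper.popitem(last=False)  # evict the least recently used block
            upper[block] = None            # miss: block copied from lower level
    hit_ratio = hits / len(addresses)
    return hit_ratio, 1 - hit_ratio

# Two passes over 16 consecutive 4-byte words (64 bytes, i.e. 4 blocks of 16 bytes).
accesses = list(range(0, 64, 4)) * 2
hit_ratio, miss_ratio = simulate(accesses)
print(f"hit ratio = {hit_ratio}, miss ratio = {miss_ratio}")
```

In this trace only the first touch of each block misses, so spatial locality (neighboring words in a block) and temporal locality (the second pass) both show up as hits.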
Block Size Considerations
Larger blocks should reduce miss rate
  - Due to spatial locality
But in a fixed-sized cache
  - Larger blocks ⇒ fewer of them
  - More competition ⇒ increased miss rate
  - Larger blocks ⇒ pollution
Larger miss penalty
  - Can override benefit of reduced miss rate
  - Early restart and critical-word-first can help

Increasing Hit Rate
Hit rate increases with cache size.
Hit rate mildly depends on block size.
[Figure: hit rate h (90% to 100%), equivalently miss rate = 1 - hit rate (10% to 0%), versus block size (16B, 32B, 64B, 128B, 256B), with one curve per cache size: 4KB, 16KB, 64KB. Annotations: "Decreasing chances of getting fragmented data" and "Decreasing chances of covering large data locality"]

Cache Misses
On cache hit, CPU proceeds normally
On cache miss
  - Stall the CPU pipeline
  - Fetch block from next level of hierarchy
  - Instruction cache miss: restart instruction fetch
  - Data cache miss: complete data access

Static vs Dynamic RAMs
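The remark that a larger miss penalty can override the benefit of a reduced miss rate can be checked with the standard average access time relation: hit time plus miss rate times miss penalty. The sketch below uses that relation; the function name `average_access_time` and every number in it are hypothetical values picked for illustration, not measurements from the slides.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average cycles per access: hit time plus the expected cost of misses."""
    return hit_time + miss_rate * miss_penalty

# Hypothetical configurations: the larger block lowers the miss rate slightly
# but takes much longer to fetch from the next level.
small_block = average_access_time(hit_time=1, miss_rate=0.05, miss_penalty=40)
large_block = average_access_time(hit_time=1, miss_rate=0.04, miss_penalty=100)

print(f"small block: {small_block:.2f} cycles per access")
print(f"large block: {large_block:.2f} cycles per access")
```

With these assumed numbers the larger block has the lower miss rate but the higher average access time, which is the trade-off the slide warns about.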
Random Access Memory (RAM)
[Figure: Address bits feed an Address decoder, which selects a row of the Memory cell array; Read/write circuits connect the array to the Data bits]

Six-Transistor SRAM Cell
[Figure: cell with a Word line and two Bit lines (bit and its complement)]

Dynamic RAM (DRAM) Cell
[Figure: cell with a Word line and a Bit line]
"Single-transistor DRAM cell": Robert Dennard's 1967 invention

Advanced DRAM Organization
Bits in a DRAM are organized as a rectangular array
  - DRAM accesses an entire row
  - Burst mode: supply successive words from a row with reduced latency
Double data rate (DDR) DRAM
  - Transfer on rising and falling clock edges
Quad data rate (QDR) DRAM
  - Separate DDR inputs and outputs
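Because DDR DRAM transfers data on both the rising and the falling clock edge, its peak transfer rate is twice that of a single-data-rate part at the same clock. A rough sketch of that arithmetic follows; the function name `peak_bandwidth`, the 200 MHz clock, and the 8-byte bus width are assumptions chosen for illustration.

```python
def peak_bandwidth(clock_hz, bus_bytes, transfers_per_cycle):
    """Peak bytes per second: one bus-wide transfer per active clock edge."""
    return clock_hz * transfers_per_cycle * bus_bytes

# Hypothetical 200 MHz memory bus that is 8 bytes (64 bits) wide.
sdr = peak_bandwidth(200_000_000, 8, transfers_per_cycle=1)  # rising edge only
ddr = peak_bandwidth(200_000_000, 8, transfers_per_cycle=2)  # rising and falling edges

print(f"single data rate: {sdr / 1e9:.1f} GB/s")
print(f"double data rate: {ddr / 1e9:.1f} GB/s")
```

This is a peak figure only; it ignores row activation and other latencies, which is where burst mode helps by supplying successive words from an already open row.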