MEMORY SYSTEM Mahdi Nazm Bojnordi Assistant Professor School of Computing University of Utah CS/ECE 3810: Computer Organization
Overview ¨ Notes ¤ Homework 9 (deadline Apr. 9th) n Verify your submitted file before midnight ¨ This lecture ¤ Pipeline hazards n Control ¤ Memory system n Cache
Recall: Pipeline Hazards ¨ Structural hazards: multiple instructions compete for the same resource ¨ Data hazards: a dependent instruction cannot proceed because it needs a value that hasn’t been produced ¨ Control hazards: the next instruction cannot be fetched because the outcome of an earlier branch is unknown
Control Hazards ¨ Sample C++ code for (i=100; i != 0; i--) { sum = sum + i; } total = total + sum; How many branches in this code?
Control Hazards ¨ Sample C++ code for (i=100; i != 0; i--) { sum = sum + i; } total = total + sum;
      addi $1, $0, 100
for:  beq  $0, $1, next
      add  $2, $2, $1
      addi $1, $1, -1
      j    for
next: add  $3, $3, $2
What are possible target instructions?
Control Hazards ¨ Sample C++ code for (i=100; i != 0; i--) { sum = sum + i; } total = total + sum;
      addi $1, $0, 100     IM  Reg ALU DM  Reg
for:  beq  $0, $1, next        IM  Reg ALU DM  Reg
      add  $2, $2, $1              IM  Reg ALU DM  Reg
      addi $1, $1, -1                  IM  Reg ALU DM
      j    for                             IM  Reg ALU
next: add  $3, $3, $2                          IM  Reg
What happens inside the pipeline?
Control Hazards ¨ The outcome of the branch
Handling Control Hazards ¨ 1. introducing stall cycles and delay slots ¤ How many cycles/slots? ¤ One branch per every six instructions on average!!
      addi $1, $0, 100     IM  Reg ALU DM  Reg
for:  beq  $0, $1, next        IM  Reg ALU DM  Reg
      nothing                      IM  Reg ALU DM  Reg
      nothing                          IM  Reg ALU DM
      add  $2, $2, $1                      IM  Reg ALU
      addi $1, $1, -1                          IM  Reg
      j    for
2 additional delay slots per 6 cycles!
Handling Control Hazards ¨ 1. introducing stall cycles and delay slots ¤ How many cycles/slots? ¤ One branch per every six instructions on average!!
      addi $1, $0, 100     IM  Reg ALU DM  Reg
for:  beq  $0, $1, next        IM  Reg ALU DM  Reg
      nothing                      IM  Reg ALU DM  Reg
      add  $2, $2, $1                  IM  Reg ALU DM
      addi $1, $1, -1                      IM  Reg ALU
      j    for                                 IM  Reg
      nothing
1 additional delay slot, but longer path
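The cost of these stalls can be estimated with a little arithmetic. The sketch below uses the slides' figure of one branch per six instructions and compares 2 versus 1 stall cycles per branch. The function name `effective_cpi` and the base CPI of 1.0 (an otherwise ideal pipeline) are assumptions made for illustration.

```cpp
#include <cassert>

// Effective cycles per instruction (CPI) when branches are the only
// source of stalls. base_cpi = 1.0 is an assumption (ideal pipeline).
double effective_cpi(double branch_fraction, int stall_cycles_per_branch,
                     double base_cpi = 1.0) {
    assert(branch_fraction >= 0.0 && branch_fraction <= 1.0);
    assert(stall_cycles_per_branch >= 0);
    // Every branch adds its stall cycles on top of the base CPI.
    return base_cpi + branch_fraction * stall_cycles_per_branch;
}
```

With one branch per six instructions, `effective_cpi(1.0 / 6.0, 2)` is about 1.33 and `effective_cpi(1.0 / 6.0, 1)` is about 1.17, so removing one stall cycle per branch is worth roughly 12% in this simplified model.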
Handling Control Hazards ¨ 1. introducing stall cycles and delay slots ¤ How many cycles/slots? ¤ One branch per every six instructions on average!!
      addi $1, $0, 100     IM  Reg ALU DM  Reg
for:  beq  $0, $1, next        IM  Reg ALU DM  Reg
      nothing                      IM  Reg ALU DM  Reg
      add  $2, $2, $1                  IM  Reg ALU DM
      j    for                             IM  Reg ALU
      addi $1, $1, -1                          IM  Reg
next: add  $3, $3, $2
Reordering instructions may help
Handling Control Hazards ¨ Strategies for filling up the branch delay slot ¤ (a) is the best choice; what about (b) and (c)?
Handling Control Hazards ¨ 1. introducing stall cycles and delay slots ¤ How many cycles/slots? ¤ One branch per every six instructions on average!!
      addi $1, $0, 100     IM  Reg ALU DM  Reg
for:  beq  $0, $1, next        IM  Reg ALU DM  Reg
      nothing                      IM  Reg ALU DM  Reg
      add  $2, $2, $1                  IM  Reg ALU DM
      j    for                             IM  Reg ALU
      addi $1, $1, -1                          IM  Reg
next: add  $3, $3, $2
Jumps and function calls can be resolved in the decode stage.
Handling Control Hazards ¨ 1. introducing stall cycles and delay slots ¨ 2. predict the branch outcome n simply assume the branch is taken or not taken n predict the next PC
      addi $1, $0, 100     IM  Reg ALU DM  Reg
for:  beq  $0, $1, next        IM  Reg ALU DM  Reg
      add  $2, $2, $1              IM  Reg ALU DM  Reg
      addi $1, $1, -1                  IM  Reg ALU DM
      j    for                             IM  Reg ALU
next: add  $3, $3, $2                          IM  Reg
May need to cancel the wrong path
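To see how well "simply assume taken or not taken" works on the sample loop, the sketch below replays the `beq $0, $1, next` at the top of the loop and counts how often a fixed guess is right. The names `PredictionStats` and `run_static_predictor` are hypothetical, and the model only tracks the conditional branch (the `j for` is unconditional).

```cpp
#include <cassert>

struct PredictionStats {
    int branches = 0;  // dynamic executions of the beq
    int correct = 0;   // executions where the fixed guess matched
};

// Replays for (i = n; i != 0; i--) as compiled on the slide:
// the beq at the loop top is taken only when i == 0 (loop exit).
PredictionStats run_static_predictor(int iterations, bool predict_taken) {
    assert(iterations >= 0);
    PredictionStats stats;
    int i = iterations;
    while (true) {
        bool taken = (i == 0);          // beq $0, $1, next
        stats.branches++;
        if (taken == predict_taken) stats.correct++;
        if (taken) break;               // fall out to "next"
        i--;                            // addi $1, $1, -1 ; j for
    }
    return stats;
}
```

For the 100-iteration loop, the `beq` executes 101 times and is taken only once, so predicting "not taken" is right 100 times while predicting "taken" is right once.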
Handling Control Hazards ¨ Pipeline without branch predictor (figure labels: IF (br), Reg Read, Compare, PC, Br-target, PC + 4)
Handling Control Hazards ¨ Pipeline with branch predictor (figure labels: IF (br), Reg Read, Compare, PC, Br-target, Branch Predictor, PC + 4)
Handling Control Hazards ¨ The 2-bit branch predictor
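The slide's state diagram did not survive extraction, so the following is only a sketch of one common form of 2-bit predictor: a saturating counter with states 0 to 3 that predicts "taken" in states 2 and 3. The class name `TwoBitPredictor`, the helper `count_mispredictions`, and the choice of the saturating-counter variant are assumptions, not necessarily the exact diagram shown in lecture.

```cpp
#include <cassert>
#include <vector>

// 2-bit saturating counter: 0 = strongly not taken, 1 = weakly not taken,
// 2 = weakly taken, 3 = strongly taken (assumed encoding).
class TwoBitPredictor {
public:
    explicit TwoBitPredictor(int initial_state = 0) : state_(initial_state) {
        assert(initial_state >= 0 && initial_state <= 3);
    }
    bool predict() const { return state_ >= 2; }
    // Move one step toward the actual outcome, saturating at 0 and 3.
    void update(bool taken) {
        if (taken && state_ < 3) ++state_;
        else if (!taken && state_ > 0) --state_;
    }
    int state() const { return state_; }
private:
    int state_;
};

// Runs the predictor over a sequence of branch outcomes and counts misses.
int count_mispredictions(TwoBitPredictor& p, const std::vector<bool>& outcomes) {
    int wrong = 0;
    for (bool taken : outcomes) {
        if (p.predict() != taken) ++wrong;
        p.update(taken);
    }
    return wrong;
}
```

Starting from state 3, the outcome sequence taken, taken, not taken, taken, taken causes a single misprediction: one surprise only moves the counter to state 2, which still predicts taken.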
Summary of the Pipeline
Memory System ¨ Data and instructions are stored on DRAM chips ¤ DRAM has high bit density and low speed ¤ A DRAM access may take about 300 processor cycles ¨ How to bridge the speed gap? (figure: Processor and Memory, ~300X gap)
Memory Hierarchy ¨ The basic structure of a memory hierarchy.
Level                          Size     Access time
Registers                      1 KB     1 cycle
L1 data or instruction cache   32 KB    2 cycles
L2 cache                       2 MB     15 cycles
Memory                         1 GB     300 cycles
Disk                           80 GB    10M cycles
Memory Hierarchy ¨ The basic structure of a memory hierarchy. ¨ Multiple levels of memory ¨ Idea: keep important data closer to the processor. (figure: Upper Level, Lower Level)
Cache Architecture ¨ Design principles ¤ Temporal locality: if you used some data recently, you will likely use it again ¤ Spatial locality: if you used some data recently, you will likely access its neighbors ¨ Cache terminology ¤ Access time ¤ Hit vs. miss ¤ Miss penalty (figure: Processor, Cache, Memory)
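The terms above combine into the usual average-access-time relation: hit time plus miss rate times miss penalty. The sketch below is a minimal illustration; the function name `average_access_time` is hypothetical, and any miss rate plugged in is an assumed value, not a measurement from the lecture.

```cpp
#include <cassert>

// Average access time = hit time + miss rate * miss penalty.
// All times are in processor cycles; miss_rate is a fraction in [0, 1].
double average_access_time(double hit_time, double miss_rate,
                           double miss_penalty) {
    assert(miss_rate >= 0.0 && miss_rate <= 1.0);
    return hit_time + miss_rate * miss_penalty;
}
```

For example, with a 2-cycle cache hit, a 300-cycle memory access as the miss penalty, and an assumed 5% miss rate, `average_access_time(2, 0.05, 300)` comes to about 17 cycles, far closer to the cache's speed than to DRAM's.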
Direct-Mapped Cache ¨ Cache address
Direct-Mapped Cache ¨ Cache lookup
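Since the address and lookup figures are not reproduced here, the sketch below shows one way a direct-mapped lookup can work: the address is split into offset, index, and tag bits, the index selects exactly one line, and a hit requires a valid line with a matching tag. The class `DirectMappedCache`, its method names, and the 32-bit address width are assumptions for illustration; only tags are modeled, not the data itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class DirectMappedCache {
public:
    // offset_bits = log2(block size in bytes); index_bits = log2(number of lines).
    DirectMappedCache(unsigned offset_bits, unsigned index_bits)
        : offset_bits_(offset_bits), index_bits_(index_bits),
          lines_(std::size_t{1} << index_bits) {
        assert(offset_bits + index_bits < 32);
    }

    // Index: the bits just above the block offset; selects one line.
    std::uint32_t index_of(std::uint32_t addr) const {
        return (addr >> offset_bits_) & ((1u << index_bits_) - 1u);
    }
    // Tag: the remaining upper bits; identifies which block is in the line.
    std::uint32_t tag_of(std::uint32_t addr) const {
        return addr >> (offset_bits_ + index_bits_);
    }

    // Returns true on a hit. On a miss, installs the block in its line,
    // replacing whatever was there before.
    bool access(std::uint32_t addr) {
        Line& line = lines_[index_of(addr)];
        std::uint32_t tag = tag_of(addr);
        if (line.valid && line.tag == tag) return true;
        line.valid = true;
        line.tag = tag;
        return false;
    }

private:
    struct Line {
        bool valid = false;
        std::uint32_t tag = 0;
    };
    unsigned offset_bits_;
    unsigned index_bits_;
    std::vector<Line> lines_;
};
```

With `DirectMappedCache cache(2, 3)` (4-byte blocks, 8 lines), accessing address 0x00 misses, a second access to 0x00 or to 0x03 in the same block hits, and address 0x20 maps to the same line with a different tag, so it misses and evicts the earlier block.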