
Processor Pipeline, Nima Honarmand, Spring 2016 :: CSE 502 Computer Architecture



  1. Spring 2016 :: CSE 502 – Computer Architecture Processor Pipeline Nima Honarmand

  2. Generic Instruction Cycle
     • Steps in processing an instruction:
       – Instruction Fetch (IF_STEP)
       – Instruction Decode (ID_STEP)
       – Operand Fetch (OF_STEP)
         • Might be from registers or memory
       – Execute (EX_STEP)
         • Perform computation on the operands
       – Result Store or Write Back (RS_STEP)
         • Write the execution results back to registers or memory
     • ISA determines what needs to be done in each step for each instruction
     • μArch determines how HW implements the steps
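The five steps above can be sketched as one loop iteration of a toy interpreter. This is only an illustration: the tuple instruction format `(op, dest, src_a, src_b)`, the two opcodes, and the function name `run` are assumptions made for the example, not part of the slides or of any real ISA.

```python
def run(program, regs):
    """Execute a list of toy instructions, one full instruction cycle at a time."""
    pc = 0
    while pc < len(program):
        insn = program[pc]                 # IF_STEP: fetch the instruction at PC
        op, dest, src_a, src_b = insn      # ID_STEP: decode the instruction fields
        a, b = regs[src_a], regs[src_b]    # OF_STEP: fetch operands (registers only here)
        if op == "add":                    # EX_STEP: compute on the operands
            result = a + b
        elif op == "sub":
            result = a - b
        else:
            raise ValueError(f"unknown opcode: {op}")
        regs[dest] = result                # RS_STEP: write the result back
        pc += 1
    return regs
```

For example, `run([("add", 2, 0, 1)], [5, 7, 0, 0])` leaves the sum of registers 0 and 1 in register 2.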

  3. Datapath vs. Control Logic
     • Datapath is the collection of HW components and their connection in a processor
       – Determines the static structure of processor
     • Control logic determines the dynamic flow of data between the components
       – E.g., the control lines of MUXes and ALU in last slide
       – Is a function of?
         • Instruction words
         • State of the processor
         • Execution results at each stage

  4. Generic Datapath Components
     • Main components
       – Instruction Cache
       – Data Cache
       – Register File
       – Functional Units (ALU, Floating Point Unit, Memory Unit, …)
       – Pipeline Registers
     • Auxiliary Components (in advanced processors)
       – Reservation Stations
       – Reorder Buffer
       – Branch Predictor
       – Prefetchers
       – …
     • Lots of glue logic (often multiplexors) to glue these together

  5. Example: MIPS Instruction Set
     • All instructions are 32 bits
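The slide text only states that every MIPS instruction is 32 bits wide. As a rough sketch, the standard MIPS32 field positions (opcode in bits 31..26, then rs, rt, rd, shamt, funct, with a 16-bit immediate in the low half for I-type instructions) can be extracted with shifts and masks. The function name `decode_fields` and the returned dictionary are illustrative choices, not something the slides define.

```python
def decode_fields(word):
    """Split a 32-bit MIPS instruction word into its standard bit fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31..26
        "rs":     (word >> 21) & 0x1F,  # bits 25..21
        "rt":     (word >> 16) & 0x1F,  # bits 20..16
        "rd":     (word >> 11) & 0x1F,  # bits 15..11 (R-type only)
        "shamt":  (word >> 6) & 0x1F,   # bits 10..6  (R-type only)
        "funct":  word & 0x3F,          # bits 5..0   (R-type only)
        "imm":    word & 0xFFFF,        # bits 15..0  (I-type only, not sign-extended)
    }
```

Which fields are meaningful depends on the instruction format; the sketch returns all of them and leaves that choice to the caller.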

  6. A Simple MIPS Datapath
     [Figure: datapath built from the PC, a +1 adder, I-cache, Reg File, ALU, and D-cache, divided into five stages: Inst. Fetch (IF), Inst. Decode & Register Read (ID), Execute (EX), Memory (MEM), and Write-Back (WB). The generic steps IF_STEP, ID_STEP, OF_STEP, EX_STEP, and RS_STEP are marked along the datapath.]

  7. Single-Instruction Datapath
     [Timing diagram, time running left to right]
     Single-cycle: ins0.(fetch,dec,ex,mem,wb) | ins1.(fetch,dec,ex,mem,wb)
     Multi-cycle:  ins0.fetch | ins0.(dec,ex) | ins0.(mem,wb) | ins1.fetch | ins1.(dec,ex) | ins1.(mem,wb)
     • Process one instruction at a time
     • Single-cycle control: hardwired
       – Low CPI (1)
       – Long clock period (to accommodate slowest instruction)
     • Multi-cycle control: typically micro-programmed
       – Short clock period
       – High CPI
     • Can we have both low CPI and short clock period?
       – Not if datapath executes only one instruction at a time
       – No good way to make a single instruction go faster
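The CPI versus clock-period trade-off can be made concrete with the usual relation: execution time = instruction count × CPI × clock period. The sketch below is a simplification under stated assumptions: the step delays are made-up numbers, the single-cycle clock is the sum of all step delays, the multi-cycle clock is the slowest step, and every instruction is assumed to use every step (real multi-cycle designs may let simple instructions skip steps).

```python
def single_cycle_time(n_insns, step_delays):
    """CPI = 1, but the clock period must cover every step back to back."""
    cpi = 1
    clock_period = sum(step_delays)
    return n_insns * cpi * clock_period

def multi_cycle_time(n_insns, step_delays):
    """Clock period = slowest step, but each instruction takes one cycle per step."""
    cpi = len(step_delays)
    clock_period = max(step_delays)
    return n_insns * cpi * clock_period
```

With hypothetical step delays `[2, 3, 2]`, the single-cycle design spends 7 time units per instruction and the multi-cycle design 9, which illustrates why neither style alone gives both a low CPI and a short clock period.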

  8. Pipelined Datapath
     [Timing diagram, time running left to right]
     Multi-cycle: ins0.fetch | ins0.(dec,ex) | ins0.(mem,wb) | ins1.fetch | ins1.(dec,ex) | ins1.(mem,wb)
     Pipelined:
       ins0.fetch | ins0.(dec,ex) | ins0.(mem,wb)
                    ins1.fetch    | ins1.(dec,ex) | ins1.(mem,wb)
                                    ins2.fetch    | ins2.(dec,ex) | ins2.(mem,wb)
     • Start with multi-cycle design
     • When insn0 goes from stage 1 to stage 2 … insn1 starts stage 1
     • Each instruction passes through all stages … but instructions enter and leave at faster rate

     Style          Ideal CPI   Cycle Time (1/freq)
     Single-cycle   1           Long
     Multi-cycle    > 1         Short
     Pipelined      1           Short

     • Pipeline can have as many insns in flight as there are stages
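The overlap in the pipelined timing diagram can be reproduced with a small scheduling sketch. It assumes an ideal pipeline with no stalls, where instruction i enters the first stage in cycle i; the function names and the dictionary layout are illustrative only.

```python
def pipeline_schedule(n_insns, stages):
    """Return {cycle: {stage_name: insn_index}} for an ideal, stall-free pipeline."""
    schedule = {}
    for insn in range(n_insns):
        for depth, stage in enumerate(stages):
            cycle = insn + depth  # each instruction advances one stage per cycle
            schedule.setdefault(cycle, {})[stage] = insn
    return schedule

def total_cycles(n_insns, n_stages):
    """Cycles to finish n_insns: fill the pipeline once, then one completion per cycle."""
    return n_stages + n_insns - 1
```

With the three stages used on the slide, `pipeline_schedule(3, ["fetch", "dec_ex", "mem_wb"])` has three instructions in flight during cycle 2, one per stage, which matches the claim that a pipeline can hold as many instructions as it has stages.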

  9. Pipeline Illustrated
     [Figure: a block of combinational logic followed by a latch (L), shown unpipelined and then split into 2 and 3 latched stages]
     • 1 stage of n gate delays: BW = ~(1/n)
     • 2 stages of n/2 gate delays each: BW = ~(2/n)
     • 3 stages of n/3 gate delays each: BW = ~(3/n)
     • Pipeline Latency = n Gate Delay + (p-1) register delays
       – p: # of stages
     • Improves throughput at the expense of latency
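The latency formula and the approximate bandwidth figures above can be checked numerically. In this sketch the latency function follows the slide's formula directly; the bandwidth function adds one assumption of its own, namely that each stage's cycle is n/p gate delays plus one register delay, which reduces to the slide's ~(p/n) when the register delay is zero.

```python
def pipeline_latency(n_gate_delay, p_stages, reg_delay):
    """Latency of one operation: n gate delays plus (p - 1) register delays."""
    return n_gate_delay + (p_stages - 1) * reg_delay

def pipeline_bandwidth(n_gate_delay, p_stages, reg_delay=0.0):
    """Operations per unit time, assuming a cycle of n/p gate delays plus reg_delay."""
    cycle_time = n_gate_delay / p_stages + reg_delay
    return 1.0 / cycle_time
```

Deeper pipelines raise bandwidth roughly in proportion to p while latency grows by one register delay per added stage, which is the throughput-for-latency trade the slide describes.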

  10. 5-Stage MIPS Pipeline

  11. Stage 1: Fetch
      • Fetch an instruction from instruction cache every cycle
        – Use PC to index instruction cache
        – Increment PC (assume no branches for now)
      • Write state to the pipeline register (IF/ID)
        – The next stage will read this pipeline register
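A minimal software model of the fetch stage, assuming a word-addressed PC (so the next instruction is at PC + 1, as on the slides) and no branches. The instruction cache is modeled as a plain Python list, and the IF/ID pipeline register as a dictionary; both representations and all field names are assumptions for the example.

```python
def fetch_stage(pc, icache):
    """One fetch cycle with no branches: returns (next_pc, if_id_register)."""
    insn_bits = icache[pc]   # use PC to index the instruction cache
    pc_plus_1 = pc + 1       # increment PC
    if_id = {"insn": insn_bits, "pc_plus_1": pc_plus_1}  # state for the next stage
    return pc_plus_1, if_id
```

Calling `fetch_stage` once per cycle and feeding the returned PC back in models the "fetch an instruction every cycle" behavior.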

  12. Stage 1: Fetch Diagram
      [Figure: fetch stage. Labels: MUX choosing between target and PC + 1, PC (en), +1 adder, Instruction Cache, Instruction bits, IF/ID pipeline register (en); output goes to Decode.]

  13. Stage 2: Decode
      • Decodes opcode bits
        – Set up Control signals for later stages
      • Read input operands from register file
        – Specified by decoded instruction bits
      • Write state to the pipeline register (ID/EX)
        – Opcode
        – Register contents, immediate operand
        – PC+1 (even though decode didn’t use it)
        – Control signals (from insn) for opcode and destReg
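The decode stage can be sketched as a function from the IF/ID register to the ID/EX register. To keep the example short, the instruction is assumed to be already split into a hypothetical tuple `(op, dest_reg, reg_a, reg_b, imm)` rather than real MIPS bits, and the register file is a Python list; none of these names come from the slides.

```python
def decode_stage(if_id, regs):
    """Build the ID/EX register contents from the IF/ID register and register file."""
    op, dest_reg, reg_a, reg_b, imm = if_id["insn"]  # decode instruction fields
    return {
        "op": op,                            # opcode, used as control for later stages
        "regA": regs[reg_a],                 # contents of regA
        "regB": regs[reg_b],                 # contents of regB
        "imm": imm,                          # immediate operand
        "pc_plus_1": if_id["pc_plus_1"],     # passed along unchanged
        "dest_reg": dest_reg,                # destination register number
    }
```

Note that the stage forwards `pc_plus_1` without using it, mirroring the remark on the slide.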

  14. Stage 2: Decode Diagram
      [Figure: decode stage between the IF/ID and ID/EX pipeline registers. Labels: Instruction bits, Register File (regA, regB, destReg, data, en), contents of regA, contents of regB, PC + 1, target, Control Signals/imm; input from Fetch, output to Execute.]

  15. Stage 3: Execute
      • Perform ALU operations
        – Calculate result of instruction
          • Control signals select operation
          • Contents of regA used as one input
          • Either regB or constant offset (imm from insn) used as second input
        – Calculate PC-relative branch target
          • PC+1+(constant offset)
      • Write state to the pipeline register (EX/Mem)
        – ALU result, contents of regB, and PC+1+offset
        – Control signals (from insn) for opcode and destReg
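The execute stage described above can be modeled as a function from the ID/EX register to the EX/Mem register. This is a sketch under assumptions: the field names, the `use_imm` control flag, and the small opcode table `ALU_OPS` are hypothetical stand-ins for the control signals, not definitions from the slides.

```python
import operator

# Hypothetical opcode-to-operation table standing in for the ALU control lines.
ALU_OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "and": operator.and_,
    "or": operator.or_,
}

def execute_stage(id_ex):
    """Compute the EX/Mem register contents from the ID/EX register contents."""
    # MUX: the second ALU input is either regB or the immediate.
    operand_b = id_ex["imm"] if id_ex["use_imm"] else id_ex["regB"]
    alu_result = ALU_OPS[id_ex["alu_op"]](id_ex["regA"], operand_b)
    return {
        "alu_result": alu_result,
        "regB": id_ex["regB"],                        # contents of regB, passed along
        "target": id_ex["pc_plus_1"] + id_ex["imm"],  # PC+1+offset
        "alu_op": id_ex["alu_op"],
        "dest_reg": id_ex["dest_reg"],
    }
```

The branch target is computed on every instruction in this sketch; whether it is used is left to later control logic.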

  16. Stage 3: Execute Diagram
      [Figure: execute stage between the ID/EX and EX/Mem pipeline registers. Labels: PC + 1, adder producing target (PC+1+offset), contents of regA, contents of regB, MUX on the second ALU input, ALU, ALU result, Control Signals/imm, destReg, data; input from Decode, output to Memory.]

  17. Stage 4: Memory
      • Perform data cache access
        – ALU result contains address for LD or ST
        – Opcode bits control R/W and enable signals
      • Write state to the pipeline register (Mem/WB)
        – ALU result and Loaded data
        – Control signals (from insn) for opcode and destReg
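A minimal model of the memory stage, assuming the data cache is a Python dictionary from address to value and that the opcode is one of the hypothetical strings `"ld"`, `"st"`, or anything else for non-memory instructions. The field names are illustrative.

```python
def memory_stage(ex_mem, dcache):
    """Access the data cache and build the Mem/WB register contents."""
    addr = ex_mem["alu_result"]         # ALU result is the address for LD or ST
    loaded_data = None
    if ex_mem["op"] == "ld":            # opcode enables a read
        loaded_data = dcache[addr]
    elif ex_mem["op"] == "st":          # opcode enables a write of regB's contents
        dcache[addr] = ex_mem["regB"]
    return {
        "alu_result": ex_mem["alu_result"],
        "loaded_data": loaded_data,
        "op": ex_mem["op"],
        "dest_reg": ex_mem["dest_reg"],
    }
```

Instructions that are neither loads nor stores pass through with the cache untouched, carrying only their ALU result forward.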

  18. Stage 4: Memory Diagram
      [Figure: memory stage between the EX/Mem and Mem/WB pipeline registers. Labels: target (PC+1+offset), ALU result (in_addr), contents of regB (in_data), Data Cache (en, R/W), Loaded data, Control signals, destReg, data; input from Execute, output to Write-back.]

  19. Stage 5: Write-back
      • Writing result to register file (if required)
        – Write Loaded data to destReg for LD
        – Write ALU result to destReg for ALU insn
        – Opcode bits control register write enable signal
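The write-back choice can be sketched as a small function over the Mem/WB register. The opcode classes `"ld"` and `"alu"` and the field names are assumptions for the example; any other opcode is treated as having its register write enable turned off.

```python
def write_back_stage(mem_wb, regs):
    """Write the selected result into the register file, if the opcode requires it."""
    op = mem_wb["op"]
    if op == "ld":
        regs[mem_wb["dest_reg"]] = mem_wb["loaded_data"]  # MUX selects loaded data
    elif op == "alu":
        regs[mem_wb["dest_reg"]] = mem_wb["alu_result"]   # MUX selects ALU result
    # Any other opcode (for example a store) leaves the register file unchanged.
    return regs
```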

  20. Stage 5: Write-back Diagram
      [Figure: write-back stage after the Mem/WB pipeline register. Labels: ALU result, Loaded data, MUX selecting the data to write, Control signals, destReg MUX; input from Memory.]

  21. Putting It All Together
      [Figure: the complete five-stage datapath. Labels: PC, +1 adder, PC MUX (PC+1 or target), Inst Cache, IF/ID, Register File (regA, regB, valA, valB, dest, data), ID/EX, ALU with input MUX, eq? comparator, branch-target adder, EX/Mem, Data Cache (ALU result, valB, mdata), Mem/WB, write-back MUXes, Control signals/imm.]

  22. Issues With Pipelining

  23. Pipelining Idealism
      • Uniform Sub-operations
        – Operation can be partitioned into uniform-latency sub-ops
      • Repetition of Identical Operations
        – Same ops performed on many different inputs
      • Independent Operations
        – All ops are mutually independent

  24. Pipeline Realism
      • Uniform Sub-operations … NOT!
        – Balance pipeline stages
          • Stage quantization to yield balanced stages
          • Minimize internal fragmentation (left-over time near end of cycle)
      • Repetition of Identical Operations … NOT!
        – Unifying instruction types
          • Coalescing instruction types into one “multi-function” pipe
          • Minimize external fragmentation (idle stages to match length)
      • Independent Operations … NOT!
        – Resolve data and resource hazards
          • Inter-instruction dependency detection and resolution
      • Pipelining is expensive

  25. The Generic Instruction Pipeline
      • IF: Instruction Fetch
      • ID: Instruction Decode
      • OF: Operand Fetch
      • EX: Instruction Execute
      • WB: Write-back

  26. Balancing Pipeline Stages
      • Stage latencies
        – IF: T_IF = 6 units
        – ID: T_ID = 2 units
        – OF: T_OF = 9 units
        – EX: T_EX = 5 units
        – WB: T_OS = 9 units
      • Without pipelining: T_cyc ≥ T_IF + T_ID + T_OF + T_EX + T_OS = 31
      • Pipelined: T_cyc ≥ max{T_IF, T_ID, T_OF, T_EX, T_OS} = 9
      • Speedup = 31 / 9 = 3.44
      • Can we do better?
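The arithmetic on this slide can be reproduced in a few lines. The stage latencies are the ones given above (with the write-back latency T_OS stored under the key `"WB"`); the function names are illustrative.

```python
# Stage latencies from the slide, in abstract time units.
STAGE_TIMES = {"IF": 6, "ID": 2, "OF": 9, "EX": 5, "WB": 9}

def unpipelined_cycle(stage_times):
    """Without pipelining, one cycle must cover every stage in sequence."""
    return sum(stage_times.values())

def pipelined_cycle(stage_times):
    """With pipelining, the cycle is set by the slowest stage."""
    return max(stage_times.values())

def speedup(stage_times):
    return unpipelined_cycle(stage_times) / pipelined_cycle(stage_times)

print(round(speedup(STAGE_TIMES), 2))  # 3.44
```

The speedup falls short of the stage count (5) because the stages are unbalanced: the 2-unit ID stage sits idle for most of each 9-unit cycle.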

  27. Balancing Pipeline Stages (1/2)
      • Two methods for stage quantization
        – Divide sub-ops into smaller pieces
        – Merge multiple sub-ops into one
      • Recent/Current trends
        – Deeper pipelines (more and more stages)
        – Pipelining of memory accesses
        – Multiple different pipelines/sub-pipelines

  28. Balancing Pipeline Stages (2/2)
      • Coarser-Grained Machine Cycle: 4 machine cyc / instruction
        – IF & ID merged: T_IF&ID = 8 units
        – OF: T_OF = 9 units
        – EX: T_EX = 5 units
        – WB: T_OS = 9 units
        – # stages = 4, T_cyc = 9 units
      • Finer-Grained Machine Cycle: 11 machine cyc / instruction
        – Stages: IF, IF, ID, OF, OF, OF, EX, EX, WB, WB, WB
        – # stages = 11, T_cyc = 3 units
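Both quantizations can be derived from the stage latencies by rounding each stage up to a whole number of machine cycles. The sketch below assumes that rule (ceil of latency over cycle time); the function name `quantize` and the "wasted" measure of internal fragmentation are illustrative choices rather than definitions from the slides.

```python
import math

def quantize(stage_times, t_cyc):
    """Split each stage into ceil(latency / t_cyc) machine cycles of length t_cyc."""
    counts = {name: math.ceil(t / t_cyc) for name, t in stage_times.items()}
    n_stages = sum(counts.values())
    # Internal fragmentation: clocked time that no stage logic actually needs.
    wasted = n_stages * t_cyc - sum(stage_times.values())
    return counts, n_stages, wasted

# Finer-grained: keep the five sub-ops and use a 3-unit cycle.
fine = quantize({"IF": 6, "ID": 2, "OF": 9, "EX": 5, "WB": 9}, 3)
# Coarser-grained: merge IF and ID into one 8-unit stage and use a 9-unit cycle.
coarse = quantize({"IF&ID": 8, "OF": 9, "EX": 5, "WB": 9}, 9)
```

Under these assumptions the finer-grained design comes out at 11 stages and the coarser one at 4, matching the slide, and the finer cycle leaves less idle time per instruction.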
