Designing a Pipelined Processor

Computer System Architecture: Pipelining Part II
Chalermek Intanagonwiwat
Slides courtesy of David Patterson

Designing a Pipelined Processor
• Go back and examine your datapath and control diagram
• Associate resources with states
• Ensure that flows do not conflict, or figure out how to resolve the conflicts
• Assert control in the appropriate stage

Control and Datapath
  Ifetch:   IR <- Mem[PC]; PC <- PC+4;
  Reg/Dec:  A <- R[rs]; B <- R[rt];
  Exec:     S <- A + B (R-type); S <- A or ZX (ori); S <- A + SX (lw); S <- A + SX (sw); if Cond PC <- PC+SX (beq);
  Mem:      M <- Mem[S] (lw); Mem[S] <- B (sw);
  Wr:       R[rd] <- S (R-type); R[rt] <- S (ori); R[rt] <- M (lw);
[Figure: datapath with Next PC logic, PC, Inst. Mem, IR, Reg. File, operand registers A and B, the Exec unit producing S and the Equal signal, Mem Access to the Data Mem producing M, and write-back to the Reg. File]

Pipelined Processor
• What happens if we start a new instruction every cycle?
[Figure: the same datapath with pipeline registers between stages; copies of the instruction register (IR, IRex, IRmem, IRwb) and a Valid bit travel with each instruction, and separate Dcd Ctrl, Ex Ctrl, Mem Ctrl and WB Ctrl blocks assert control in each stage]
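To make the register transfers concrete, here is a minimal Python sketch that runs one instruction at a time through the same steps (A, B, S, M, write-back). It is an illustration under stated assumptions, not the slides' hardware: the `Inst` and `CPU` classes, the mnemonic strings, passing the decoded instruction in directly instead of reading IR from memory, the dictionary used as data memory, and treating the branch SX as a plain byte offset are all hypothetical choices.

```python
from dataclasses import dataclass, field


def sx(imm16: int) -> int:
    """SX: sign-extend a 16-bit immediate."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def zx(imm16: int) -> int:
    """ZX: zero-extend a 16-bit immediate."""
    return imm16 & 0xFFFF


@dataclass
class Inst:
    op: str       # hypothetical mnemonics: "add" (R-type), "ori", "lw", "sw", "beq"
    rs: int = 0
    rt: int = 0
    rd: int = 0
    imm: int = 0


@dataclass
class CPU:
    R: list = field(default_factory=lambda: [0] * 32)  # register file
    Mem: dict = field(default_factory=dict)            # data memory: byte address -> word
    PC: int = 0

    def step(self, inst: Inst) -> None:
        """Apply the register transfers for one instruction, stage by stage."""
        # Ifetch: IR <- Mem[PC]; PC <- PC + 4 (the instruction is passed in directly)
        self.PC += 4
        # Reg/Dec: A <- R[rs]; B <- R[rt]
        A, B = self.R[inst.rs], self.R[inst.rt]
        # Exec
        if inst.op == "add":
            S = (A + B) & 0xFFFFFFFF
        elif inst.op == "ori":
            S = A | zx(inst.imm)
        elif inst.op in ("lw", "sw"):
            S = A + sx(inst.imm)
        elif inst.op == "beq":
            if A == B:                   # Cond is the Equal signal
                self.PC += sx(inst.imm)  # PC <- PC + SX (offset treated as bytes here)
            return
        else:
            raise ValueError(f"unsupported op: {inst.op}")
        # Mem: Mem[S] <- B (sw) or M <- Mem[S] (lw)
        if inst.op == "sw":
            self.Mem[S] = B
            return
        M = self.Mem.get(S, 0) if inst.op == "lw" else None
        # Wr: R[rd] <- S (R-type), R[rt] <- S (ori), R[rt] <- M (lw)
        if inst.op == "add":
            self.R[inst.rd] = S
        elif inst.op == "ori":
            self.R[inst.rt] = S
        else:
            self.R[inst.rt] = M
```

For example, after `cpu.R[2], cpu.R[3] = 5, 7`, calling `cpu.step(Inst("add", rs=2, rt=3, rd=4))` leaves 12 in `cpu.R[4]` and 4 in `cpu.PC`. Keeping each stage's result in its own local (A, B, S, M) mirrors the intermediate registers that the pipelined version later turns into pipeline registers.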

Pipelining the Load Instruction
Clock    Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7
1st lw   Ifetch   Reg/Dec  Exec     Mem      Wr
2nd lw            Ifetch   Reg/Dec  Exec     Mem      Wr
3rd lw                     Ifetch   Reg/Dec  Exec     Mem      Wr
• The five independent functional units in the pipeline datapath are:
  – Instruction Memory for the Ifetch stage
  – Register File's read ports (bus A and bus B) for the Reg/Dec stage
  – ALU for the Exec stage
  – Data Memory for the Mem stage
  – Register File's write port (bus W) for the Wr stage

The Four Stages of R-type
         Cycle 1  Cycle 2  Cycle 3  Cycle 4
R-type   Ifetch   Reg/Dec  Exec     Wr
• Ifetch: Instruction Fetch
  – Fetch the instruction from the Instruction Memory
• Reg/Dec: Registers Fetch and Instruction Decode
• Exec:
  – ALU operates on the two register operands
  – Update PC
• Wr: Write the ALU output back to the register file

Pipelining the R-type and Load Instruction
Clock    Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7  Cycle 8  Cycle 9
R-type   Ifetch   Reg/Dec  Exec     Wr
R-type            Ifetch   Reg/Dec  Exec     Wr
Load                       Ifetch   Reg/Dec  Exec     Mem      Wr
R-type                              Ifetch   Reg/Dec  Exec     Wr
R-type                                       Ifetch   Reg/Dec  Exec     Wr
Oops! We have a problem!
• We have a pipeline conflict, or structural hazard:
  – Two instructions try to write to the register file at the same time (in Cycle 7, the Load and the R-type that follows it)
  – There is only one write port

Important Observation
• Each functional unit can only be used once per instruction
• Each functional unit must be used at the same stage for all instructions:
  – Load uses the Register File's write port during its 5th stage
  – R-type uses the Register File's write port during its 4th stage
         1        2        3        4        5
Load     Ifetch   Reg/Dec  Exec     Mem      Wr
         1        2        3        4
R-type   Ifetch   Reg/Dec  Exec     Wr
° There are two ways to solve this pipeline hazard.
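The structural hazard can be found mechanically: with one new instruction per cycle, an instruction started in cycle s writes the register file in cycle s + (write stage) - 1, and any cycle with two writes is a conflict on the single write port. The Python sketch below is a minimal illustration; the write-stage numbers (4 for R-type, 5 for Load) come from the slides, while `WRITE_STAGE`, `write_cycles` and `write_port_conflicts` are hypothetical names.

```python
# Stage (1-based) in which each instruction class uses the register file's
# write port, as on the slides: R-type in stage 4, Load in stage 5.
WRITE_STAGE = {"R-type": 4, "Load": 5}


def write_cycles(program: list[str], first_cycle: int = 1) -> list[int]:
    """Cycle in which each instruction writes the register file when a new
    instruction is started every cycle."""
    return [first_cycle + i + WRITE_STAGE[op] - 1 for i, op in enumerate(program)]


def write_port_conflicts(program: list[str]) -> dict[int, list[int]]:
    """Map each cycle that has more than one write to the indices of the
    instructions competing for the single write port."""
    by_cycle: dict[int, list[int]] = {}
    for idx, cycle in enumerate(write_cycles(program)):
        by_cycle.setdefault(cycle, []).append(idx)
    return {cycle: ids for cycle, ids in by_cycle.items() if len(ids) > 1}


program = ["R-type", "R-type", "Load", "R-type", "R-type"]
print(write_cycles(program))          # [4, 5, 7, 7, 8]
print(write_port_conflicts(program))  # {7: [2, 3]}
```

The result flags Cycle 7, where the Load (index 2) and the R-type right after it (index 3) both need the write port.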

Solution 1: Insert "Bubble" into the Pipeline
[Figure: timing diagram over Cycles 1 to 9 for an R-type, a Load and two following R-type instructions; a pipeline bubble delays the R-type instructions behind the Load so that no two instructions write the register file in the same cycle]
• Insert a "bubble" into the pipeline to prevent two writes in the same cycle
  – The control logic can be complex.
  – We lose instruction fetch and issue opportunities.
• No instruction is started in Cycle 6!

Solution 2: Delay R-type's Write by One Cycle
• Delay R-type's register write by one cycle:
  – Now R-type instructions also use the Reg File's write port at Stage 5
  – The Mem stage is a NOOP stage: nothing is being done.
         1        2        3        4        5
R-type   Ifetch   Reg/Dec  Exec     Mem      Wr

Modified Control & Datapath
  Ifetch:   IR <- Mem[PC]; PC <- PC+4;
  Reg/Dec:  A <- R[rs]; B <- R[rt];
  Exec:     S <- A + B (R-type); S <- A or ZX (ori); S <- A + SX (lw); S <- A + SX (sw); if Cond PC <- PC+SX (beq);
  Mem:      M <- S (R-type); M <- S (ori); M <- Mem[S] (lw); Mem[S] <- B (sw);
  Wr:       R[rd] <- M (R-type); R[rt] <- M (ori); R[rt] <- M (lw);
[Figure: datapath with Next PC, PC, Inst. Mem, IR, Reg. File, A, B, Exec, S, Mem Access, Data Mem and M; the Equal signal feeds Next PC]

Solution 2: Delay R-type's Write by One Cycle (cont.)
Clock    Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7  Cycle 8  Cycle 9
R-type   Ifetch   Reg/Dec  Exec     Mem      Wr
R-type            Ifetch   Reg/Dec  Exec     Mem      Wr
Load                       Ifetch   Reg/Dec  Exec     Mem      Wr
R-type                              Ifetch   Reg/Dec  Exec     Mem      Wr
R-type                                       Ifetch   Reg/Dec  Exec     Mem      Wr
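Both fixes can be compared with a small Python model of write-port usage. Solution 2 simply changes the R-type write stage from 4 to 5, so one instruction per cycle never produces two writes in the same cycle. For Solution 1 the sketch uses a simplified, hypothetical policy: it delays the *start* of an instruction until its write cycle is free. That captures the lost issue slot, but it does not reproduce the slide's exact bubble placement or its "Cycle 6" numbering. All names (`ORIGINAL`, `DELAYED`, `schedule_with_bubbles`) are illustrative assumptions.

```python
# Stage (1-based) in which each instruction class writes the register file.
ORIGINAL = {"R-type": 4, "Load": 5}  # four-stage R-type
DELAYED = {"R-type": 5, "Load": 5}   # Solution 2: R-type passes through a NOOP Mem stage


def write_cycles(starts, program, write_stage):
    """Cycle in which each instruction uses the single register-file write port."""
    return [s + write_stage[op] - 1 for s, op in zip(starts, program)]


def schedule_with_bubbles(program, write_stage):
    """Simplified Solution 1 (hypothetical): start instructions in order, at most
    one per cycle, and delay a start whenever its write would collide with an
    earlier instruction's write."""
    starts, busy, cycle = [], set(), 1
    for op in program:
        while cycle + write_stage[op] - 1 in busy:
            cycle += 1  # bubble: no instruction is started in this cycle
        starts.append(cycle)
        busy.add(cycle + write_stage[op] - 1)
        cycle += 1
    return starts


program = ["R-type", "R-type", "Load", "R-type", "R-type"]

# Solution 2: a new instruction every cycle; every write lands in stage 5.
every_cycle = list(range(1, len(program) + 1))
print(write_cycles(every_cycle, program, DELAYED))  # [5, 6, 7, 8, 9]

# Solution 1 (simplified): keep the four-stage R-type and stall starts instead.
starts = schedule_with_bubbles(program, ORIGINAL)
print(starts)                                       # [1, 2, 3, 5, 6]
print(write_cycles(starts, program, ORIGINAL))      # [4, 5, 7, 8, 9]
```

In this model the bubble costs one start slot (nothing begins in Cycle 4), while Solution 2 keeps one start per cycle at the price of an idle Mem stage for R-type instructions.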

The Four Stages of Store
         Cycle 1  Cycle 2  Cycle 3  Cycle 4
Store    Ifetch   Reg/Dec  Exec     Mem
(The Wr stage does nothing for a store.)
• Ifetch: Instruction Fetch
  – Fetch the instruction from the Instruction Memory
• Reg/Dec: Registers Fetch and Instruction Decode
• Exec: Calculate the memory address
• Mem: Write the data into the Data Memory

The Three Stages of Beq
         Cycle 1  Cycle 2  Cycle 3
Beq      Ifetch   Reg/Dec  Exec
(The Mem and Wr stages do nothing for a branch.)
• Ifetch: Instruction Fetch
  – Fetch the instruction from the Instruction Memory
• Reg/Dec: Registers Fetch and Instruction Decode
• Exec:
  – Compare the two register operands
  – Select the correct branch target address
  – Latch it into the PC

Control Diagram
  Ifetch:   IR <- Mem[PC]; PC <- PC+4;
  Reg/Dec:  A <- R[rs]; B <- R[rt];
  Exec:     S <- A + B (R-type); S <- A or ZX (ori); S <- A + SX (lw); S <- A + SX (sw); if Cond PC <- PC+SX (beq);
  Mem:      M <- S (R-type); M <- S (ori); M <- Mem[S] (lw); Mem[S] <- B (sw);
  Wr:       R[rd] <- M (R-type); R[rt] <- M (ori); R[rt] <- M (lw);
[Figure: pipelined datapath (Next PC, PC, Inst. Mem, IR, Reg. File, A, B, Exec, S, Mem Access, Data Mem, M) annotated with the register transfers above]

Datapath + Data Stationary Control
[Figure: the pipelined datapath with data stationary control; the op and fun fields are decoded once in the Reg/Dec stage, and the resulting control fields (ex, me, wb, rw, and a valid bit v) travel down the pipeline registers alongside the data, where the Ex, Mem and WB control blocks use them]
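Data stationary control means the instruction is decoded once, and its control word then moves down the pipeline registers in lockstep with its data, so each stage reads the control bits of whichever instruction it currently holds. The Python sketch below tracks only those control words. The `Ctrl` fields, the `DECODE` table and `run` are hypothetical names loosely modeled on the slide's ex/me/wb labels, and the table entries are assumptions for the five instruction classes used in these slides.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Ctrl:
    """Control word produced once by Decode (hypothetical field names)."""
    ex: str              # Exec-stage operation ("fun" = ALU op taken from the fun field)
    mem_read: bool       # Mem stage: M <- Mem[S]
    mem_write: bool      # Mem stage: Mem[S] <- B
    reg_write: bool      # Wr stage: write M into the register file
    dest: Optional[str]  # destination field, "rd" or "rt"; None if nothing is written


DECODE = {
    "R-type": Ctrl("fun", False, False, True, "rd"),
    "ori":    Ctrl("or",  False, False, True, "rt"),
    "lw":     Ctrl("add", True,  False, True, "rt"),
    "sw":     Ctrl("add", False, True,  False, None),
    "beq":    Ctrl("cmp", False, False, False, None),
}

Slot = Optional[Tuple[str, Ctrl]]  # (instruction, its control word) or an empty stage


def run(program: list) -> list:
    """Return, for each clock cycle, the (Exec, Mem, Wr) pipeline-register contents."""
    ex: Slot = None
    mem: Slot = None
    wb: Slot = None
    trace = []
    for op in program + [None] * 3:  # three extra cycles drain the pipeline
        new_ex = (op, DECODE[op]) if op is not None else None
        # Clock edge: every control word moves one stage down, next to its data.
        wb, mem, ex = mem, ex, new_ex
        trace.append((ex, mem, wb))
    return trace
```

For `run(["lw", "R-type", "sw"])`, the third cycle holds `sw` in Exec, the R-type in Mem and `lw` in Wr, and only the Wr stage's own control word decides whether the register file is written. The alternative would be to re-decode the instruction in every stage; carrying the decoded bits avoids that.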

Let's Try it Out
Program (addresses are in octal):
  10   lw   r1, r2(35)
  14   addI r2, r2, 3
  20   sub  r3, r4, r5
  24   beq  r6, r7, 100
  30   ori  r8, r9, 17
  34   add  r10, r11, r12
  100  and  r13, r14, 15

Start: Fetch 10
[Figure: PC = 10; the instruction at address 10 is fetched from the Inst. Mem; the other stages hold no instruction]

Fetch 14, Decode 10
[Figure: PC = 14; IR holds lw r1, r2(35); Decode reads r2 from the Reg. File]

Fetch 20, Decode 14, Exec 10
[Figure: PC = 20; Decode holds addI r2, r2, 3; Exec holds lw r1 with operand r2 and immediate 35]
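Because the addresses are octal, consecutive instructions at 10, 14, 20, ... are still 4 bytes apart (octal 14 is decimal 12, four more than octal 10 = decimal 8). The short Python sketch below computes which address sits in each stage during a given cycle, assuming straight-line code with one fetch per cycle; it therefore only matches the slides until the branch redirects fetch. `STAGES` and `snapshot` are hypothetical names.

```python
STAGES = ["Fetch", "Dcd", "Ex", "Mem", "WB"]


def snapshot(cycle: int, start: int = 0o10) -> dict:
    """Octal address of the instruction in each stage during `cycle` (1-based),
    assuming straight-line code fetched once per cycle and no taken branch yet."""
    occupancy = {}
    for depth, stage in enumerate(STAGES):
        addr = start + 4 * (cycle - 1 - depth)  # instructions are 4 bytes apart
        if addr >= start:  # skip stages the first instruction has not reached yet
            occupancy[stage] = format(addr, "o")
    return occupancy


for c in range(1, 6):
    print(c, snapshot(c))
```

Cycle 3 gives Fetch 20, Dcd 14, Ex 10, and cycle 5 gives Fetch 30, Dcd 24, Ex 20, Mem 14, WB 10, matching the slide titles.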

Fetch 24, Decode 20, Exec 14, Mem 10
[Figure: PC = 24; Decode holds sub r3, r4, r5 (reads r4 and r5); Exec holds addI r2 (r2 + 3); Mem holds lw r1, accessing M[r2+35]]

Fetch 30, Dcd 24, Ex 20, Mem 14, WB 10
[Figure: PC = 30; Decode holds beq r6, r7, 100 (reads r6 and r7); Exec holds sub r3 (r4 - r5); Mem holds addI r2 (r2 + 3); WB holds lw, writing r1 = M[r2+35]]

Fetch 34, Dcd 30, Ex 24, Mem 20, WB 14
[Figure: PC = 34; Decode holds ori r8, r9, 17; Exec holds beq, comparing r6 with r7 and producing the target 100 for Next PC; Mem holds sub r3 (r4 - r5); WB holds addI, writing r2 = r2 + 3]

Fetch 100, Dcd 34, Ex 30, Mem 24, WB 20
[Figure: PC = 100; Decode holds add r10, r11, r12; Exec holds ori r8 (r9 or 17); Mem holds beq (nothing to do); WB holds sub, writing r3 = r4 - r5]

Oops, we should have only one delayed instruction: because beq is resolved in the Exec stage, both ori (at 30) and add (at 34) have already entered the pipeline by the time the PC changes to 100.
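The "one delayed instruction" problem can be reproduced with a small Python fetch-trace model: a taken branch writes the PC at the end of the stage that resolves it, and every instruction fetched before then is a delayed instruction. This is a sketch under assumptions: the branch at 24 is taken (r6 equals r7), its target is the absolute address 100 as written in the example, and `PROGRAM`, `TAKEN_BRANCHES`, `fetch_trace` and `delayed_instructions` are hypothetical names.

```python
# Slide program (octal addresses) -> mnemonic.
PROGRAM = {0o10: "lw", 0o14: "addI", 0o20: "sub", 0o24: "beq",
           0o30: "ori", 0o34: "add", 0o100: "and"}
TAKEN_BRANCHES = {0o24: 0o100}  # assumption: beq r6, r7, 100 is taken


def fetch_trace(start: int, cycles: int, resolve_stage: int) -> list:
    """Address fetched in each cycle when a taken branch writes the PC at the end
    of its `resolve_stage` (1 = Ifetch, 2 = Reg/Dec, 3 = Exec)."""
    fetched, redirect = [], {}
    pc = start
    for cycle in range(1, cycles + 1):
        pc = redirect.get(cycle, pc)  # PC latched by a resolved branch, if any
        fetched.append(pc)
        if pc in TAKEN_BRANCHES:
            # The branch reaches resolve_stage in cycle + resolve_stage - 1;
            # the new PC is used for the fetch in the following cycle.
            redirect[cycle + resolve_stage] = TAKEN_BRANCHES[pc]
        pc += 4
    return fetched


def delayed_instructions(resolve_stage: int) -> list:
    """Instructions fetched after the beq but before its target."""
    trace = fetch_trace(0o10, 10, resolve_stage)
    i, j = trace.index(0o24), trace.index(0o100)
    return [PROGRAM[a] for a in trace[i + 1:j]]


print([format(a, "o") for a in fetch_trace(0o10, 8, resolve_stage=3)])
print(delayed_instructions(3))  # branch resolved in Exec, as on the slides
print(delayed_instructions(2))
```

With `resolve_stage=3` the fetch sequence is 10, 14, 20, 24, 30, 34, 100, 104, so ori and add are both delayed. In this model, setting `resolve_stage=2` leaves only ori.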
