Spiral 3-3 Single Cycle CPU

  1. 3-3.1 Spiral 3-3 Single Cycle CPU

  2. 3-3.2 Learning Outcomes • I understand how the single-cycle CPU datapath supports each type of instruction • I understand why each mux is needed to select appropriate inputs to the datapath components • I know how to design the control signals as a function of the type of instruction

  3. 3-3.3 Hardware vs. Software REVIEW

  4. 3-3.4 Sorting: Software Implementation • Performing the algorithm in software means the processor fetches and executes instructions, which in turn cause it to read the data from memory and write it back into its sorted positions • Sorting 64 elements on a 2.8 GHz Xeon processor – 16 microseconds • Can we do better w/ more HW? [Figure: processor and memory, with custom (sort) HW attached]

  5. 3-3.5 Sorting: Hardware Implementation • Sorting 64 elements on a 2.8 GHz Xeon processor [SW only] – 16 microseconds • Sorting 64 numbers in [old] custom HW – CLK period = 30 ns => 6 microseconds total – 30 ns is due to the 8-number HW sorter – Merging (Select-Val) stages are < 10 ns – Can we improve? • What did we do to reduce the CLK period in this design? [Figure: 8-input HW sorting network (X0 to X7 in, Y0 to Y7 out, 30 ns) followed by three SelectVal merge stages (10 ns each), each with a pair of FIFO/queues, labeled 8, 16, and 32; data arrives from memory at 1 per clock and the sorted output goes back to memory]

  6. 3-3.6 Sorting: Final Comparison • Sorting 64 elements on a 2.8 GHz Xeon processor [SW only] – 16 microseconds total time • Sorting 64 numbers in [old] custom HW – CLK period = 30 ns => 6 microseconds total = ~2.7x speedup • Sorting 64 numbers in [old] pipelined HW – CLK period = 10 ns => 2 microseconds total = ~8x speedup – Processor is freed to do other work [Figure: processor and memory, with custom (sort) HW attached]
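
The comparison above is plain arithmetic on the quoted times, and a short Python sketch makes it easy to check. The constant and function names are illustrative and do not come from the slides. Both hardware designs work out to the same number of clock cycles, so the gain from pipelining comes entirely from the shorter clock period.

```python
# Times quoted on the slides, converted to nanoseconds.
SW_TIME_NS = 16_000    # software sort on the 2.8 GHz Xeon
HW_TIME_NS = 6_000     # custom HW with a 30 ns clock
PIPE_TIME_NS = 2_000   # pipelined HW with a 10 ns clock


def cycles(total_ns, clk_period_ns):
    """Number of clock cycles implied by a total time and a clock period."""
    return total_ns // clk_period_ns


def speedup(baseline_ns, improved_ns):
    """How many times faster the improved design is than the baseline."""
    return baseline_ns / improved_ns


print(cycles(HW_TIME_NS, 30), cycles(PIPE_TIME_NS, 10))
print(round(speedup(SW_TIME_NS, HW_TIME_NS), 2))
print(round(speedup(SW_TIME_NS, PIPE_TIME_NS), 2))
```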

  7. 3-3.7 Building hardware to execute software GENERAL PURPOSE HARDWARE

  8. 3-3.8 CPU Organization Scope • We will build a CPU to implement our subset of the MIPS ISA – Memory Reference Instructions: • Load Word (LW) • Store Word (SW) – Arithmetic and Logic Instructions: • ADD, SUB, AND, OR, SLT – Branch and Jump Instructions: • Branch if equal (BEQ) • Jump unconditional (J) • These basic instructions exercise a majority of the necessary datapath and control logic for a more complete implementation

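  9. 3-3.9 Single-Cycle CPU Datapath [Figure: complete single-cycle datapath with PC, I-Cache, register file, 16-to-32-bit sign extend, ALU (with Zero output), D-Cache, a PC + 4 adder, and a shift-left-2 unit feeding a second adder. The control unit takes INST[31:26] and produces RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, and ALUOp[1:0]; the ALU control block takes INST[5:0] and ALUOp[1:0]. INST[25:21] and INST[20:16] select the two read registers, a RegDst mux chooses INST[20:16] or INST[15:11] as the write register, an ALUSrc mux chooses read data 2 or the sign-extended INST[15:0], a MemtoReg mux chooses the ALU result or the D-Cache read data, and a PCSrc mux chooses the next PC]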
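
One way to read the control signals named on this slide is as a lookup table indexed by instruction type. The sketch below is an assumption: the slide names the signals but not their values, so the table uses the settings commonly given in textbook treatments of the single-cycle MIPS datapath, with `None` marking a don't-care. The opcode numbers are the standard MIPS encodings. J is left out because the signal list above has no jump control.

```python
# Assumed control settings (textbook-style); None marks a don't-care value.
CONTROL = {
    "R-type": dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
                   MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),
    "LW":     dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
                   MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),
    "SW":     dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),
    "BEQ":    dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),
}

# Standard MIPS opcode values (INST[31:26]) for the supported instruction types.
OPCODES = {0x00: "R-type", 0x23: "LW", 0x2B: "SW", 0x04: "BEQ"}


def control_signals(opcode):
    """Return the control settings for a 6-bit opcode."""
    return CONTROL[OPCODES[opcode]]


for op in (0x00, 0x23, 0x2B, 0x04):
    print(OPCODES[op], control_signals(op))
```

Only the instructions that write a register (R-type and LW) assert RegWrite, and only LW selects the memory data through MemtoReg.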

  10. 3-3.10 Fetch • The address in the PC is used to fetch the instruction while it is also incremented by 4 to point to the next instruction • Remember, the PC doesn’t update until the end of the clock cycle / beginning of the next cycle • A mux provides a path for branch target addresses [Figure: fetch datapath and timing diagram. PC = 0x00400018 addresses the I-Cache, which returns 0x012a8020 = ADD $16,$9,$10 (opcode 000000, rs 01001, rt 01010, rd 10000, shamt 00000, func 100000); the adder computes PC+4 = 0x0040001c. Over three clock cycles the PC steps through 0x400014, 0x400018, 0x40001c while PC+4 steps through 0x400018, 0x40001c, 0x400020]
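
The fetch step can be sketched in a few lines of Python. This is a behavioral model, not the hardware: `I_MEM`, `fetch`, and `run` are illustrative names, the instruction memory holds only the one instruction word shown on the slide, and any other address is assumed to read as 0.

```python
# Instruction memory holding the single word shown on the slide.
I_MEM = {0x00400018: 0x012A8020}  # ADD $16,$9,$10


def fetch(pc):
    """Read the instruction at pc and compute pc + 4 in parallel."""
    instr = I_MEM.get(pc, 0)          # assumption: unpopulated addresses read 0
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    return instr, pc_plus_4


def run(start_pc, n_cycles):
    """Step the PC for n_cycles with no branches taken."""
    pc = start_pc
    trace = []
    for _ in range(n_cycles):
        instr, pc_plus_4 = fetch(pc)
        trace.append((pc, instr, pc_plus_4))
        pc = pc_plus_4                # PC captures the new value at the clock edge
    return trace


for pc, instr, pc_plus_4 in run(0x00400014, 3):
    print(f"PC=0x{pc:08x} instr=0x{instr:08x} PC+4=0x{pc_plus_4:08x}")
```

Within one pass of the loop the PC value does not change; the assignment at the end of the loop body plays the role of the clock edge.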

  11. 3-3.11 Decode • The opcode and func. fields are decoded to produce the other control signals • Execution of an ALU instruction (ADD $3,$1,$2) requires reading 2 register values and writing the result to a third • REGWrite is an enable signal indicating that the write data should be written to the specified register • The register file is the collection of GPRs. Our register file has 3 “ports” (port = ability to concurrently read or write a register). To see why we need 3, consider ADD $3,$1,$2: we need 2 read ports to read the two operands ($1 and $2) and 1 write port for the result ($3) [Figure: instruction word for ADD $3,$1,$2 (opcode 000000, rs 00001, rt 00010, rd 00011, shamt 00000, func 100000). rs and rt drive Read Reg. 1 # and Read Reg. 2 #, rd drives Write Reg. #, the result from the add drives Write Data, and the control logic decodes the opcode to produce the control signals, including REGWrite]
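
Splitting the instruction word into its fields is a matter of shifts and masks. The sketch below uses the standard MIPS field positions; the function name `decode_fields` is illustrative.

```python
def decode_fields(instr):
    """Split a 32-bit MIPS instruction word into its fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,  # INST[31:26]
        "rs":     (instr >> 21) & 0x1F,  # INST[25:21]
        "rt":     (instr >> 16) & 0x1F,  # INST[20:16]
        "rd":     (instr >> 11) & 0x1F,  # INST[15:11]
        "shamt":  (instr >> 6) & 0x1F,   # INST[10:6]
        "func":   instr & 0x3F,          # INST[5:0]
        "imm":    instr & 0xFFFF,        # INST[15:0], used by LW/SW/BEQ
    }


# ADD $3,$1,$2 assembled from the bit fields shown on the slide.
add_3_1_2 = (0 << 26) | (1 << 21) | (2 << 16) | (3 << 11) | (0 << 6) | 0b100000
print(hex(add_3_1_2), decode_fields(add_3_1_2))
```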

  12. 3-3.12 Register File • 32 registers, each storing 32 bits • Read registers => muxes choose the desired value • Write register => decoder and registers w/ enable • Each mux chooses which register value to output based on the 5-bit reg. # provided by the instruction • The 5-to-32 decoder converts the 5-bit write reg. # to 1-of-32 output signals that enable that register to capture the write data on the next edge. If RegWrite is 0, the decoder is disabled, making all outputs go to 0, and thus no register updates [Figure: registers $0 to $31, each with D and EN inputs and a shared CLK; the decoder, enabled by RegWrite, drives the EN inputs from Write Reg. #; two 32-to-1 muxes selected by Read Reg #1 and Read Reg #2 produce Read data 1 and Read data 2]
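
The mux-and-decoder structure described above can be modeled behaviorally as follows. This is a sketch with illustrative names (`RegisterFile`, `decode`, `clock_edge`); it does not model the MIPS convention that $0 always reads as zero, since the slide does not mention it.

```python
class RegisterFile:
    """32 x 32-bit registers with two read ports and one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, reg1, reg2):
        # Each read port is a 32-to-1 mux selected by a 5-bit register number.
        return self.regs[reg1 & 0x1F], self.regs[reg2 & 0x1F]

    @staticmethod
    def decode(write_reg, reg_write):
        # 5-to-32 decoder with enable: one-hot when reg_write is 1, all 0 otherwise.
        return [int(reg_write and i == (write_reg & 0x1F)) for i in range(32)]

    def clock_edge(self, write_reg, write_data, reg_write):
        # Only the register whose enable is 1 captures the write data.
        for i, en in enumerate(self.decode(write_reg, reg_write)):
            if en:
                self.regs[i] = write_data & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(write_reg=3, write_data=7, reg_write=1)
rf.clock_edge(write_reg=3, write_data=9, reg_write=0)  # ignored: RegWrite is 0
print(rf.read(3, 0))
```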

  13. 3-3.13 Datapath for ALU instruction • ALU takes inputs from the register file and performs the add, sub, and, or, slt operations • Result is written back to the dest. register [Figure: ADD $3,$1,$2. The register file reads registers 1 and 2, the ALU (controlled by ALUop) adds the $1 and $2 values, and the sum returns to the register file as Write Data for register 3]
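
A behavioral sketch of the five ALU operations, including the Zero output used later by BEQ. The operation is selected by name here for readability; the actual ALU control encoding is not given on the slide, so none is assumed.

```python
MASK = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit value as a two's-complement signed integer."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x


def alu(op, a, b):
    """Return (result, zero) for one of add, sub, and, or, slt."""
    if op == "add":
        res = a + b
    elif op == "sub":
        res = a - b
    elif op == "and":
        res = a & b
    elif op == "or":
        res = a | b
    elif op == "slt":
        res = int(to_signed(a) < to_signed(b))  # signed set-on-less-than
    else:
        raise ValueError(f"unsupported ALU operation: {op}")
    res &= MASK                                  # wrap to 32 bits
    return res, int(res == 0)


print(alu("add", 5, 7))
print(alu("sub", 5, 5))
print(alu("slt", 0xFFFFFFFF, 1))
```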

  14. 3-3.14 Memory Access Datapath • Operands are read from the register file while the offset is sign extended • ALU calculates the effective address • Memory access is performed • If LW, the read data is written back to the register [Figure: two datapaths. LW $4,0xfff8($1): the ALU adds the $1 value and the sign-extended offset 0xfffffff8 to form the D-Cache address, and the read data is written to register 4. SW $3,0x1a($1): the ALU adds the $1 value and the sign-extended offset 0x0000001a to form the D-Cache address, and the $3 value is the D-Cache write data]
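
The sign extension and effective-address calculation for the two example instructions can be checked with a short sketch. The base value placed in $1 is hypothetical, chosen only for illustration, and word alignment is not modeled.

```python
MASK = 0xFFFFFFFF


def sign_extend16(imm16):
    """Extend a 16-bit immediate to 32 bits by copying bit 15 upward."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16


def effective_address(base, imm16):
    """ALU add of the base register value and the sign-extended offset."""
    return (base + sign_extend16(imm16)) & MASK


R1 = 0x10010010  # hypothetical value of $1

print(hex(sign_extend16(0xFFF8)), hex(effective_address(R1, 0xFFF8)))  # LW $4,0xfff8($1)
print(hex(sign_extend16(0x001A)), hex(effective_address(R1, 0x001A)))  # SW $3,0x1a($1)
```

The 0xfff8 offset has bit 15 set, so it extends to 0xfffffff8, which is -8: the LW address lands 8 bytes below the base.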

  15. 3-3.15 Branch Datapath • BEQ requires… – ALU for comparison (examine ‘zero’ output) – Sign extension unit for branch offset – Adder to add PC+4 and the offset • Need a separate adder since the ALU is used to perform the comparison [Figure: BEQ $1,$2,offset. The ALU compares the $1 and $2 values and drives ZERO. The word offset from the instruction is sign extended, shifted left by 2 to form a byte offset, and added to PC+4 (the incremented PC) to produce the branch target address sent to the PC]
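
The branch target path (sign extend, shift left 2, add to PC+4) can be written out directly. Function names are illustrative; the example PC value is borrowed from the fetch slide.

```python
MASK = 0xFFFFFFFF


def sign_extend16(imm16):
    """Extend a 16-bit immediate to 32 bits by copying bit 15 upward."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16


def branch_target(pc_plus_4, imm16):
    """Branch target = PC+4 + (sign-extended word offset << 2)."""
    word_offset = sign_extend16(imm16)
    byte_offset = (word_offset << 2) & MASK  # words to bytes
    return (pc_plus_4 + byte_offset) & MASK


print(hex(branch_target(0x0040001C, 0x0003)))  # forward by 3 instructions
print(hex(branch_target(0x0040001C, 0xFFFE)))  # backward by 2 instructions
```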

  16. 3-3.16 Branch Datapath Question • Is it okay to start adding the branch offset even before determining whether the branch is taken or not? – Yes, it does not hurt because the ZERO signal will control whether that Branch Target is used to update the PC or not [Figure: same BEQ $1,$2,offset datapath as the previous slide, with the ALU ZERO output routed to the control logic]
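
The answer above amounts to a 2-to-1 mux on the next PC. In the sketch below the select is formed as Branch AND Zero; that gating is an assumption based on the usual textbook design, since the slide only says that ZERO controls the choice.

```python
def next_pc(pc_plus_4, branch_target, branch, zero):
    """Select the next PC; the branch target is always computed but not always used."""
    pc_src = branch & zero  # assumed gating: take the branch only for BEQ with equal operands
    return branch_target if pc_src else pc_plus_4


# The target adder has already produced 0x00400028 in every case below.
print(hex(next_pc(0x0040001C, 0x00400028, branch=1, zero=1)))  # BEQ, operands equal
print(hex(next_pc(0x0040001C, 0x00400028, branch=1, zero=0)))  # BEQ, operands differ
print(hex(next_pc(0x0040001C, 0x00400028, branch=0, zero=1)))  # not a branch
```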

  17. 3-3.17 Fetch Datapath Question 1 • Can the adder used to increment the PC be an ALU that is shared with ALU instructions like ADD/SUB/etc.? – No. In a single-cycle CPU everything an instruction needs happens in the same clock cycle, so a resource cannot be used twice by one instruction; thus we need a separate adder and a separate ALU [Figure: PC register (clocked, with a Write input) drives the I-Cache / I-MEM read address; an adder computes “Next” PC = PC + 4 and feeds it back to the PC]

  18. 3-3.18 Fetch Datapath Question 2 • Do we need the “Write” enable signal on the PC register for our single-cycle CPU? – In the single-cycle CPU, the PC is updated EVERY clock cycle (since we execute a new instruction each cycle). Thus we are writing the PC every cycle and don’t need the write signal. [Figure: PC register (clocked, with a Write input) drives the I-Cache / I-MEM read address; an adder computes “Next” PC = PC + 4 and feeds it back to the PC]

  19. 3-3.19 RegFile Question 1 • Why do we need the write enable signal, REGWrite? – We have certain instructions like BEQ or SW that do not cause a register to be updated. Thus we need the ability to NOT change a register. [Figure: register file driven by an example ALU instruction word (opcode 000000, rs 00001, rt 00010, rd 00011, shamt 00000, func 100000), with REGWrite coming from the control logic and Write Data coming from the result of the add]

  20. 3-3.20 RegFile Question 2 • Can writes to registers be level-sensitive, or do they have to be edge-sensitive? – They must be edge-sensitive since a register may be both source and destination (e.g. add $1,$1,$2). If writes were level-sensitive we would have an uncontrolled feedback loop. [Figure: register file driven by an example ALU instruction word (opcode 000000, rs 00001, rt 00010, rd 00011, shamt 00000, func 100000), with REGWrite coming from the control logic and Write Data coming from the result of the add]
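
The feedback problem can be illustrated with a toy model of add $1,$1,$2. This is a rough sketch, not a circuit simulation: `passes_while_enabled` is a hypothetical stand-in for how many times the new value races around the register-to-ALU loop while a level-sensitive write enable is held high.

```python
def edge_triggered(r1, r2):
    """The sum is captured exactly once, at the clock edge."""
    return r1 + r2


def level_sensitive(r1, r2, passes_while_enabled):
    """While the enable is high, each new $1 value feeds straight back into the ALU."""
    for _ in range(passes_while_enabled):
        r1 = r1 + r2
    return r1


print(edge_triggered(5, 3))      # $1 gets 5 + 3
print(level_sensitive(5, 3, 4))  # $1 depends on how long the enable stayed high
```

With an edge-triggered register the result is the same regardless of timing; in the level-sensitive model the final value depends on the number of passes, which is exactly the uncontrolled behavior the slide warns about.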
