Hypothetical Single-cycle Implementation of DLX

Assume each instruction completes in one (LONG!!) clock cycle.
• Registers have stable values following the rising clock edge.
• During the clock cycle:
  1. The instruction is read from instruction memory (IM).
  2. It is decoded, and the control signals used during the cycle are generated.
  3. Register values are read.
  4. ALU outputs are generated.
  5. Data memory (DM) is read or written for a load or store.
  6. The new PC value is computed.
• All registers and memory are updated at the next rising clock edge.
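The single-cycle behaviour above can be modelled in a few lines of Python. This is a minimal sketch under stated assumptions: only add, lw, sw and beq are supported, instructions are hypothetical pre-decoded tuples rather than DLX binary encodings, and the beq offset is taken as a byte displacement. Every read uses the values that are stable at the start of the cycle, and all writes are committed together at the end, standing in for the update at the next rising clock edge.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    pc: int = 0
    regs: list = field(default_factory=lambda: [0] * 32)  # R0..R31
    dmem: dict = field(default_factory=dict)              # byte address -> word


def clock_cycle(s: State, imem: dict) -> None:
    """Execute one instruction in one (long) clock cycle."""
    # 1. Read the instruction from instruction memory (IM).
    op, *fields = imem[s.pc]
    # 2.-4. Decode, read registers, compute; all reads see the pre-edge values.
    reg_writes, mem_writes = {}, {}
    next_pc = s.pc + 4                                    # 6. default new PC
    if op == "add":                                       # add rd, rs, rt
        rd, rs, rt = fields
        reg_writes[rd] = s.regs[rs] + s.regs[rt]
    elif op == "lw":                                      # lw rt, d(rs)
        rt, rs, d = fields
        reg_writes[rt] = s.dmem.get(s.regs[rs] + d, 0)    # 5. DM read
    elif op == "sw":                                      # sw rt, d(rs)
        rt, rs, d = fields
        mem_writes[s.regs[rs] + d] = s.regs[rt]           # 5. DM write
    elif op == "beq":                                     # beq rs, rt, offset
        rs, rt, offset = fields                           # offset in bytes (assumed)
        if s.regs[rs] == s.regs[rt]:
            next_pc = s.pc + 4 + offset
    else:
        raise ValueError(f"unsupported opcode {op!r}")
    # Next rising clock edge: registers, memory and PC update together.
    for r, v in reg_writes.items():
        if r != 0:                                        # R0 always reads as 0 in DLX
            s.regs[r] = v
    s.dmem.update(mem_writes)
    s.pc = next_pc


# lw R1, 100(R0); add R2, R1, R1; sw R2, 104(R0)
imem = {0: ("lw", 1, 0, 100), 4: ("add", 2, 1, 1), 8: ("sw", 2, 0, 104)}
s = State(dmem={100: 7})
for _ in range(3):
    clock_cycle(s, imem)
print(s.regs[2], s.dmem[104], s.pc)  # 14 14 12
```

Each call to `clock_cycle` is one clock period, so the three-instruction program takes exactly three cycles regardless of how much work each instruction does.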

Datapaths
[Figure: single-cycle DLX datapath for R-R, R-Imm, lw and sw: PC, instruction memory (IM), register file (rs, rt, rd ports), ALU, a separate + adder for PC+4, data memory (DM), sign extender (EXT) and MUXes, with a Decode block generating RWrite, WSel, ALUSel, ALUop, RDataSel, MREAD and MWRITE.]

Execution of an RI (register-immediate) instruction
[Figure: the same single-cycle datapath and control signals, used to execute an R-Imm instruction.]

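Single Cycle Design
• The cycle time is determined by the longest instruction.
• No reuse of functional units (separate IM and DM; separate ALU and PC-increment unit).
[Figure: timeline within one clock cycle:
  LW:  IM READ | DECODE + REG READ | ADDRESS | DATA MEMORY READ | REG WRITE
  ADD: IM READ | DECODE + REG READ | ADD | REG WRITE | IDLE
  PC+4 is computed in parallel by the separate increment unit.]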
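The cycle-time argument can be made concrete with a short Python sketch. The per-step delays below are hypothetical numbers chosen only for illustration; the step sequences follow the LW and ADD timelines above. Because one clock period must fit the slowest instruction, ADD spends the remainder of every cycle idle.

```python
# Hypothetical step delays in ns, chosen only for illustration.
DELAY_NS = {
    "IM READ": 2.0,
    "DECODE + REG READ": 1.0,   # decode and register read overlap
    "ALU": 2.0,                 # ADDRESS for LW, ADD for ADD
    "DATA MEMORY READ": 2.0,
    "REG WRITE": 1.0,
}

# Steps on each instruction's path, following the timeline above.
# PC+4 uses a separate adder in parallel and is assumed to be off the critical path.
PATHS = {
    "LW": ["IM READ", "DECODE + REG READ", "ALU", "DATA MEMORY READ", "REG WRITE"],
    "ADD": ["IM READ", "DECODE + REG READ", "ALU", "REG WRITE"],
}


def path_delay(instr: str) -> float:
    """Total delay of the steps an instruction uses within the cycle."""
    return sum(DELAY_NS[step] for step in PATHS[instr])


# The single clock period must fit the longest instruction.
clock_period = max(path_delay(i) for i in PATHS)

for instr in PATHS:
    idle = clock_period - path_delay(instr)
    print(f"{instr}: busy {path_delay(instr):.1f} ns, idle {idle:.1f} ns")
print(f"clock period = {clock_period:.1f} ns")
```

With these made-up numbers, LW sets an 8 ns period and ADD sits idle for 2 ns of every cycle.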

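Multi Cycle Implementation
[Figure: multi-cycle datapath with one memory (MEM), the register file and one ALU; internal registers IR, A, B, ALUout and MDR, each with its own write enable (IRWrite, AWrite, BWrite, ALUWrite, MDRWrite), plus PCWrite, MEMRead and ALUop, all driven by a state-machine decoder.]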

Multi-Cycle Design: State Machine Model
S0  Instruction Fetch: IR = IM[PC]; PC = PC + 4
S1  Instruction Decode: generate control signals; A = REG[rs]; B = REG[rt]; ALUout = PC + Shift(SE(offset))
Then, depending on the instruction class:
• R-R: S2: p = A; q = B; ALUout = p op q
       S3: REG[rd] = ALUout; go to S0
• lw:  S5: p = A; q = SE(d); ALUout = p op q
       S6: MDR = DM[ALUout]
       S7: REG[rt] = MDR; go to S0
• sw:  S8: p = A; q = SE(d); ALUout = p op q
       S9: DM[ALUout] = B; go to S0
• beq: S10: p = A; q = B; Z = (p .eq. q); if (Z == 1) PC = ALUout; go to S0
State sequences: LD (5 cycles): S0 S1 S5 S6 S7; ADD (4 cycles): S0 S1 S2 S3
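The controller's next-state function can be written down as a lookup table. This Python sketch encodes only the transitions listed above; the dictionary representation is an illustrative choice, not how the hardware decoder is built. Walking the table from S0 back to S0 reproduces the cycle count of each instruction class.

```python
# Next-state table keyed by (state, instruction class).
# "*" marks transitions that do not depend on the instruction class.
NEXT = {
    ("S0", "*"): "S1",
    ("S1", "R-R"): "S2", ("S1", "lw"): "S5",
    ("S1", "sw"): "S8", ("S1", "beq"): "S10",
    ("S2", "*"): "S3", ("S3", "*"): "S0",
    ("S5", "*"): "S6", ("S6", "*"): "S7", ("S7", "*"): "S0",
    ("S8", "*"): "S9", ("S9", "*"): "S0",
    ("S10", "*"): "S0",
}


def state_sequence(cls: str) -> list[str]:
    """States visited while executing one instruction of class `cls`."""
    seq, state = ["S0"], "S0"
    while True:
        state = NEXT.get((state, "*")) or NEXT[(state, cls)]
        if state == "S0":          # back to fetch: the instruction is finished
            return seq
        seq.append(state)


for cls in ("R-R", "lw", "sw", "beq"):
    seq = state_sequence(cls)
    print(f"{cls}: {' '.join(seq)} ({len(seq)} cycles)")
```

This prints 4 cycles for R-R, 5 for lw, 4 for sw and 3 for beq; only S1 needs the decoded class, every other state has a single successor.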

Cycle 1: Instruction Fetch
[Figure: multi-cycle datapath.]
• IR = MEM[PC]; PC = PC + 4
• Control: assert PCWrite, IRWrite, MemRead; set ALUop to ADD; set the MUXes at the ALU inputs and at the PC.
• FSM: S0 → S1

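Cycle 2: Datapath for lw
• Optimistic reads of the register file: A = REG[rs]; B = REG[rt]
• Optimistic computation of the branch target address: ALUout = PC + Shift(SE(offset))
• Both are done before the opcode is known; if the instruction does not need a result (lw uses neither B nor the branch target), it is simply ignored.
• Control: set ALUop to ADD; set the MUXes at the ALU inputs; assert AWRITE, BWRITE, ALUWRITE.
• FSM: S1 → S5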
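The optimistic branch-target computation in S1 combines sign extension with a shift. This Python sketch makes two assumptions the slides do not spell out: the offset field is 16 bits wide, and Shift means a left shift by 2 (a word offset, as in MIPS-style beq). PC already holds PC + 4 after S0.

```python
MASK32 = 0xFFFF_FFFF


def sign_extend(value: int, bits: int = 16) -> int:
    """SE(): interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def branch_target(pc_plus_4: int, offset_field: int) -> int:
    """ALUout = PC + Shift(SE(offset)), with PC already incremented in S0.

    Assumes a 16-bit offset field shifted left by 2 (word offset)."""
    return (pc_plus_4 + (sign_extend(offset_field) << 2)) & MASK32


print(hex(branch_target(0x1004, 0x0003)))   # forward 3 words: 0x1010
print(hex(branch_target(0x1004, 0xFFFE)))   # back 2 words: 0xffc
```

The same SE unit supplies SE(d) for lw and sw in S5 and S8, where the value is added to A without the shift.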

Cycle 3: Datapath for lw
• ALUout = A + SE(d) (effective address)
• Control: set ALUop to ADD; set the MUXes at the ALU inputs to A and SE(d); assert ALUWRITE.
• FSM: S5 → S6

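Datapath Control
[Figure: multi-cycle datapath showing the ALU-input MUXes, which select among PC, A, B, the constant 4, and the sign-extended (optionally shifted) immediate.]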

Cycle 4: Datapath for lw
• MDR = MEM[ALUout]
• Control: assert MEM READ; assert MDRWRITE.
• FSM: S6 → S7

Cycle 5: Datapath for lw
• REG[rt] = MDR
• Control: assert REGWRITE; set the MUXes for the destination register (rt) and the write data (MDR).
• FSM: S7 → S0
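The five lw cycles above amount to the following register-transfer sequence, sketched in Python. Assumptions not stated on the slides: a single dictionary `mem` holds both instructions and data (the multi-cycle datapath shows one MEM block), instructions are hypothetical pre-decoded tuples with `d` already sign-extended, and the branch-target shift is 2. Each print line lists the control signals the corresponding slide asserts.

```python
def run_lw(mem: dict, regs: list, pc: int) -> int:
    """Step one 'lw rt, d(rs)' through states S0, S1, S5, S6, S7; return the new PC."""
    # S0 (fetch): IR = MEM[PC]; PC = PC + 4
    ir = mem[pc]
    pc = pc + 4
    print("S0: assert PCWrite, IRWrite, MemRead; ALUop = ADD")
    op, rt, rs, d = ir                  # hypothetical pre-decoded form, d already sign-extended
    assert op == "lw"
    # S1 (decode): optimistic register reads and branch-target computation
    a, b = regs[rs], regs[rt]           # B is loaded but not used by lw
    alu_out = pc + (d << 2)             # branch target (shift of 2 assumed); discarded by lw
    print("S1: assert AWRITE, BWRITE, ALUWRITE; ALUop = ADD")
    # S5: effective address, ALUout = A + SE(d)
    alu_out = a + d
    print("S5: assert ALUWRITE; ALUop = ADD")
    # S6: MDR = MEM[ALUout]
    mdr = mem[alu_out]
    print("S6: assert MEM READ, MDRWRITE")
    # S7: REG[rt] = MDR
    regs[rt] = mdr
    print("S7: assert REGWRITE; dest MUX = rt, data MUX = MDR")
    return pc


mem = {0x100: ("lw", 2, 1, 8), 0x208: 42}   # lw R2, 8(R1); data word at 0x208
regs = [0] * 32
regs[1] = 0x200
new_pc = run_lw(mem, regs, 0x100)
print(regs[2], hex(new_pc))                  # 42 0x104
```

The value written to ALUout in S1 is overwritten in S5, which is exactly the "optimistic, ignored if unused" behaviour of cycle 2.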

Datapath Control
[Figure: multi-cycle datapath showing the register-file write MUXes, which select the destination register (rt or rd) and the write data (MDR or ALUout).]

Datapath Control
[Figure: multi-cycle datapath showing the PCSELECT MUX, which chooses the value loaded into the PC.]

Cycle 1: Instruction Fetch (beq)
• Identical for every instruction: IR = MEM[PC]; PC = PC + 4; assert PCWrite, IRWrite, MemRead; set ALUop to ADD.
• FSM: S0 → S1

Cycle 2: Datapath for BEQ
• Optimistic reads of the register file: A = REG[rs]; B = REG[rt]
• Optimistic computation of the branch target address: ALUout = PC + Shift(SE(offset))
• For beq both results are used in the next cycle: A and B are compared, and ALUout holds the target.
• Control: set ALUop to ADD; set the MUXes at the ALU inputs; assert AWRITE, BWRITE, ALUWRITE.
• FSM: S1 → S10

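Cycle 3: Datapath for BEQ
• Z = (A .eq. B); if (Z == 1) PC = ALUout (the branch target computed in cycle 2)
• Control: set ALUop so that the ALU compares A and B (e.g., subtract, giving Z = 1 when the result is zero); set the MUXes at the ALU inputs to A and B; assert PCWRITE if Z equals 1.
• FSM: S10 → S0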
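The three beq cycles can be sketched the same way in Python. As before, the pre-decoded tuple, the 16-bit offset field and the shift by 2 are assumptions; the comparison is modelled as a subtraction whose zero result sets Z, and the PC is written only when Z is 1.

```python
def sign_extend16(x: int) -> int:
    """SE(): interpret the low 16 bits of x as a two's-complement number."""
    x &= 0xFFFF
    return (x ^ 0x8000) - 0x8000


def run_beq(mem: dict, regs: list, pc: int) -> int:
    """Step one 'beq rs, rt, offset' through states S0, S1, S10; return the new PC."""
    # S0: IR = MEM[PC]; PC = PC + 4
    op, rs, rt, offset = mem[pc]        # hypothetical pre-decoded instruction
    assert op == "beq"
    pc = pc + 4
    # S1: A = REG[rs]; B = REG[rt]; ALUout = PC + Shift(SE(offset))
    a, b = regs[rs], regs[rt]
    alu_out = pc + (sign_extend16(offset) << 2)   # shift by 2 is an assumption
    # S10: compare A and B; Z = 1 when they are equal
    z = 1 if a - b == 0 else 0
    if z == 1:                          # PCWRITE asserted only when Z == 1
        pc = alu_out
    return pc


mem = {0x100: ("beq", 1, 2, 3)}          # beq R1, R2, +3 words
regs = [0] * 32
regs[1], regs[2] = 5, 5
print(hex(run_beq(mem, regs, 0x100)))    # taken: 0x110
regs[2] = 6
print(hex(run_beq(mem, regs, 0x100)))    # not taken: 0x104
```

When the branch is not taken, the PC keeps the PC + 4 value written in S0, so beq finishes in three cycles either way.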

Performance Model
Processor model:
• $m$ classes of instruction: $I_j$, $j = 1, \dots, m$
• Instructions of class $I_j$ require $C_j$ clock cycles to execute.
• Clock frequency $= F$ (Hz = cycles/sec); clock period $= 1/F$ (sec/cycle)
Program model:
• The program executes a total of $N$ instructions, $N_j$ of class $I_j$: $N = \sum_{j=1}^{m} N_j$
• Program execution time (clock cycles) $= \sum_{j=1}^{m} N_j C_j$
• Program execution time (sec): $T_N = \left(\sum_{j=1}^{m} N_j C_j\right) / F$
• Average cycles per instruction: $\mathrm{CPI} = \dfrac{\text{total number of cycles}}{\text{instruction count}} = \dfrac{\sum_{j=1}^{m} N_j C_j}{N}$
• CPI depends on both the processor and the mix of instructions in the program.
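The performance model translates directly into Python. The per-class cycle counts $C_j$ come from the multi-cycle state machine (R-R 4, lw 5, sw 4, beq 3); the instruction counts $N_j$ and the 100 MHz clock are hypothetical values chosen only for illustration.

```python
CYCLES = {"R-R": 4, "lw": 5, "sw": 4, "beq": 3}     # C_j from the multi-cycle FSM


def total_cycles(mix: dict[str, int]) -> int:
    """sum_j N_j * C_j"""
    return sum(n * CYCLES[cls] for cls, n in mix.items())


def cpi(mix: dict[str, int]) -> float:
    """Average cycles per instruction = total cycles / N."""
    return total_cycles(mix) / sum(mix.values())


def exec_time_s(mix: dict[str, int], f_hz: float) -> float:
    """T_N = (sum_j N_j * C_j) / F, in seconds."""
    return total_cycles(mix) / f_hz


mix = {"R-R": 400, "lw": 300, "sw": 100, "beq": 200}   # hypothetical N_j
print(total_cycles(mix), cpi(mix), exec_time_s(mix, 100e6))
```

For this mix the total is 1600 + 1500 + 400 + 600 = 4100 cycles over 1000 instructions, so the CPI is 4.1 and $T_N$ is 41 microseconds at 100 MHz; changing only the mix changes the CPI, as the last bullet above notes.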

Processor performance measures
• The most useful metrics depend on the benchmark program being measured.
• Benchmark-independent measures alone (e.g., clock speed) do not provide sufficient information.
• Multi-cycle design: a program consisting only of BEQ instructions (3 cycles each) runs 5/3 ≈ 1.67 times faster than one consisting only of LDs (5 cycles each)!!
• Real processors: the relation between instruction mix and performance is complex.
• Instruction execution time may depend on the context of the instruction.
• This raises issues regarding the choice of benchmark programs, the degree and modes of optimization permitted, etc.
• It is imperative to understand the issues behind the performance numbers presented by different vendors.
• CPI (cycles per instruction): average number of clock cycles to execute an instruction
  $\mathrm{CPI} = \sum_{j=1}^{m} N_j C_j \,/\, N = \sum_{j=1}^{m} N_j C_j \,\big/\, \sum_{j=1}^{m} N_j$
• IPC (instructions per cycle): average number of instructions executed per clock cycle
  $\mathrm{IPC} = 1/\mathrm{CPI}$
  • Sequential processors require several clock cycles per instruction, so CPI > 1 and IPC < 1.
  • Simple pipelining aims for an IPC of 1.
  • Superscalar, VLIW and SMT processor designs try to increase the IPC beyond 1.
• MIPS (millions of instructions per second):
  $\mathrm{MIPS} = 10^{-6} \times N / T_N = 10^{-6} \times F \times \sum_{j=1}^{m} N_j \,\big/\, \sum_{j=1}^{m} N_j C_j$
  $\mathrm{MIPS} = 10^{-6} \times F\ (\text{cycles/sec}) \times \mathrm{IPC}\ (\text{instructions/cycle}) = 10^{-6} / (\text{Clock Period} \times \mathrm{CPI})$
• Program execution time (microseconds, $10^{-6}$ sec):
  $T_N = N / \mathrm{MIPS} = N \times \mathrm{CPI} \times \text{Clock Period}\ (\mu\text{s}) = N \times \mathrm{CPI} / F\ (\text{MHz})$
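The CPI, IPC and MIPS relations can be checked numerically with a short Python sketch. The cycle counts are again taken from the multi-cycle state machine; the 100 MHz clock and the 1000-instruction programs are hypothetical. The last line reproduces the 1.67 ratio between an all-BEQ and an all-LD program.

```python
CYCLES = {"R-R": 4, "lw": 5, "sw": 4, "beq": 3}    # C_j from the multi-cycle FSM


def cpi(mix: dict[str, int]) -> float:
    """CPI = sum_j N_j C_j / sum_j N_j"""
    return sum(n * CYCLES[c] for c, n in mix.items()) / sum(mix.values())


def ipc(mix: dict[str, int]) -> float:
    """IPC = 1 / CPI"""
    return 1 / cpi(mix)


def mips(mix: dict[str, int], f_hz: float) -> float:
    """MIPS = 1e-6 * F * IPC, with F in Hz."""
    return (f_hz / 1e6) * ipc(mix)


def exec_time_us(mix: dict[str, int], f_hz: float) -> float:
    """T_N = N / MIPS, in microseconds."""
    return sum(mix.values()) / mips(mix, f_hz)


F = 100e6                                           # hypothetical 100 MHz clock
all_beq, all_lw = {"beq": 1000}, {"lw": 1000}
print(mips(all_beq, F), mips(all_lw, F))            # about 33.3 and 20.0 MIPS
print(exec_time_us(all_lw, F) / exec_time_us(all_beq, F))   # about 1.67 (= 5/3)
```

The same 100 MHz processor is rated at about 33.3 MIPS on one program and 20 MIPS on the other, which is why a MIPS figure means little without the benchmark behind it.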
