
DLX computer: Electronic Computers M

RISC architectures
RISC vs CISC (Reduced Instruction Set Computer vs Complex Instruction Set Computer)
• In CISC architectures, 10% of the instructions are used in 90% of cases
• Waste of silicon
• Bottleneck: the bus
• Mid '80s: a new architecture, RISC
• Solution: reduction of instruction number and complexity (fewer, simpler machine instructions)
• Fixed instruction format (simpler instruction decoders)
• Simpler control logic network, allowing a larger number of on-chip registers
• Reduction of bus/memory accesses
• More machine instructions are needed for a given job, but in many cases this is more than compensated (in terms of time) by the reduction of bus accesses
• CISC and RISC are each the best solution in different application fields
• Nowadays both architectures coexist in the same processor: analysis at the end of the course
• A simplified RISC architecture: DLX (a teaching architecture closely modelled on real MIPS processors such as the R4000)

DLX (fixed) instruction format

Format R
  Bits:   31..26    25..21     20..16     15..11     10..0
  Width:  6 bit     5 bit      5 bit      5 bit      11 bit
  Field:  Op-code   register   register   register   Op-code extension
  The three register fields hold Ra, Rb and Rc.
  Used for arithmetic or logic instructions (i.e. Ra ← Rb op Rc) and for Set Condition between registers.

Format I
  Bits:   31..26    25..21     20..16     15..0
  Width:  6 bit     5 bit      5 bit      16 bit
  Field:  Op-code   register   register   Immediate operand or offset
  The two register fields hold Ra and Rb.
  Used for data transfer (Load, Store), conditional Branch, JR and JALR (control transfer via register), Set Condition and ALU with immediate operand.
  In Load and ALU instructions Ra = destination; in Store Ra = source. In the immediate instructions Rb supplies the register operand of the ALU.

Format J
  Bits:   31..26    25..0
  Width:  6 bit     26 bit
  Field:  Op-code   (PC-relative) offset
  Used for direct, unconditional control transfer (J and JAL).
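The field boundaries above can be illustrated with a minimal Python sketch. It only slices a 32-bit word at the bit positions given in the slide; the helper names `bits` and `split_fields`, the neutral field names, and the op-code value in the example are assumptions made for illustration, and the sketch does not claim which 5-bit field holds Ra, Rb or Rc.

```python
def bits(word, hi, lo):
    """Return bits hi..lo (inclusive) of a 32-bit word as an unsigned integer."""
    width = hi - lo + 1
    return (word >> lo) & ((1 << width) - 1)


def split_fields(word):
    """Slice one 32-bit instruction word at the boundaries of formats R, I and J."""
    return {
        "opcode": bits(word, 31, 26),     # 6 bit, common to all formats
        "reg_25_21": bits(word, 25, 21),  # 5-bit register field (formats R and I)
        "reg_20_16": bits(word, 20, 16),  # 5-bit register field (formats R and I)
        "reg_15_11": bits(word, 15, 11),  # 5-bit register field (format R only)
        "ext_10_0": bits(word, 10, 0),    # 11-bit op-code extension (format R only)
        "imm_15_0": bits(word, 15, 0),    # 16-bit immediate or offset (format I only)
        "off_25_0": bits(word, 25, 0),    # 26-bit offset (format J only)
    }


# Hypothetical word: the op-code value 0b100011 is made up for the example.
word = (0b100011 << 26) | (3 << 21) | (7 << 16) | 0xFFFC
fields = split_fields(word)
print(fields["opcode"], fields["reg_25_21"], fields["reg_20_16"], hex(fields["imm_15_0"]))
```

Which of the overlapping slices is meaningful depends on the format selected by the op-code, which is why a fixed format keeps the decoder simple.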

DLX non floating-point instructions
(31 x 32-bit registers R31…R1, plus R0 = 0 fixed; Ra and Rb can be any of the 32 registers)

Arithmetic/Logic        Data Transfer           Control
ADD   Ra,Rb,Rc          LW  Ra,offset(Rb)       SETx  Ra,Rb,Rc
ADDI  Ra,Rb,value       LB  Ra,offset(Rb)       SETIx Ra,Rb,value
ADDU  Ra,Rb,Rc          LBU Ra,offset(Rb)       BEQZ  Ra,offset (added to [PC])
ADDUI Ra,Rb,value       LHU Ra,offset(Rb)       BNEQZ Ra,offset (added to [PC])
SUB   Ra,Rb,Rc          LH  Ra,offset(Rb)       J     offset
SUBI  Ra,Rb,value       SW  Ra,offset(Rb)       JR    Ra
SUBU  Ra,Rb,Rc          SH  Ra,offset(Rb)       JL    offset (added to [PC])
SUBUI Ra,Rb,value       SB  Ra,offset(Rb)       JLR   Ra
DIV   Ra,Rb,Rc          LHI Ra,value
DIVI  Ra,Rb,value
MULU  Ra,Rb,Rc
MULI  Ra,Rb,value
SLL   Ra,Rb,Rc
SLLI  Ra,Rb,value
SHR   Ra,Rb,Rc
SHRI  Ra,Rb,value
SLA   Ra,Rb,Rc
SLAI  Ra,Rb,value
OR    Ra,Rb,Rc
ORI   Ra,Rb,value
XOR   Ra,Rb,Rc
XORI  Ra,Rb,value
AND   Ra,Rb,Rc
ANDI  Ra,Rb,value

N.B.
• Postfix x (set condition) can be LT, GT, LE, GE, EQ, NE
• JL (via register or not) -> Jump and link, saving PC in R31
• Offset is a value within the instruction
• Value is the immediate within the instruction
• Postfix I means «immediate» (value within the instruction)
• Postfix A means «arithmetic» (sign extension)
• Postfix U means «unsigned»
• No STACK registers

DLX ALU operations
Two data inputs, S1 and S2 (32 bit each); one data output, OUT (32 bit), plus flags; the control inputs select the operation.

Output (one of):
• S1 + S2
• S1 – S2
• S1 and S2
• S1 or S2
• S1 exor S2
• Left shift of S1 by S2 positions
• Right shift of S1 by S2 positions
• Arithmetic right shift of S1 by S2 positions
• S1
• S2
• 0
• 1

Flags:
• Zero
• Negative (sign)

The ALU is a combinational circuit!
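A minimal Python sketch of the ALU listed above, modelled as a pure function (no state), in line with it being a combinational circuit. The operation names and the choice to use only the low 5 bits of S2 as shift amount are assumptions of the sketch, not taken from the slides.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value):
    """Interpret a 32-bit pattern as a two's complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def alu(op, s1, s2):
    """Return (out, flags) for one ALU operation on 32-bit inputs s1 and s2."""
    s1 &= MASK32
    s2 &= MASK32
    shift = s2 & 31  # assumption: only the low 5 bits of S2 select the shift amount
    operations = {
        "add": lambda: s1 + s2,
        "sub": lambda: s1 - s2,
        "and": lambda: s1 & s2,
        "or": lambda: s1 | s2,
        "xor": lambda: s1 ^ s2,
        "shl": lambda: s1 << shift,             # left shift
        "shr": lambda: s1 >> shift,             # logical right shift
        "sra": lambda: to_signed(s1) >> shift,  # arithmetic right shift
        "s1": lambda: s1,
        "s2": lambda: s2,
        "zero": lambda: 0,
        "one": lambda: 1,
    }
    out = operations[op]() & MASK32
    flags = {"zero": out == 0, "negative": bool(out & 0x80000000)}
    return out, flags


out, flags = alu("sub", 3, 5)
print(hex(out), flags)
```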

Sequential DLX (abstract instruction execution)

Ready?
INSTRUCTION FETCH
  [REG INSTR] <= M[PC]
  [PC] <= [PC] + 4
INSTRUCTION DECODE
  [A] <= [Ra]
  [B] <= [Rb]
  [C] <= [Rc]
  [X] <= num[Ra]   ([X]: number of the destination register)
INSTRUCTION EXECUTION (one path per instruction class)
  Data transfer | ALU | Set | Jump | Branch

PC is the Program Counter; A and B are two scratchpad internal registers; REG INSTR is the register where the newly fetched instruction is stored. All these registers are unknown to the programmer.
This is a synchronous state diagram.

Example: LB (LOAD BYTE, format I)
LB Ra, offset(Rb)
Fields: Op-code (bits 31..26), Ra and Rb (bits 25..16), offset (bits 15..0)

  [INSTR] <= M[PC]
  [PC] <= [PC] + 4
  [A] <= [Ra]   [B] <= [Rb]   [C] <= [Rc]   [X] <= num[Ra]
  LOAD: byte address computation, with sign extension of the offset
    Addr <= [B] + (Instr_15)^16 ## Instr_15..0
  Byte in register, with sign extension
    [Ra] <= (M[Addr]_7)^24 ## M[Addr]_7..0
  Next instruction

Notes
• ## is the JOIN (concatenation) operator
• Instr_15..0 is the instruction offset; instruction bit 15 (sign) is extended to the left 16 times
• The address is always 32 bit (bit 31 = MSbit, bit 0 = LSbit)

Example: M[Addr]_7..0 = A7 H => (10100111) b, so the sign-extended value written to Ra is FFFFFFA7 H
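The two sign extensions of the LB example can be checked with a small Python sketch. The function names, the dictionary used as byte memory, and the base address and offset values are illustrative assumptions; only the A7 H byte comes from the slide.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value, width):
    """Replicate bit (width - 1) of value up to bit 31 and return a 32-bit pattern."""
    value &= (1 << width) - 1
    sign_bit = 1 << (width - 1)
    return ((value ^ sign_bit) - sign_bit) & MASK32


def load_byte(memory, b_value, instr):
    """Model of LB: address = [B] + sign-extended Instr 15..0, then sign-extend the byte."""
    offset = sign_extend(instr & 0xFFFF, 16)
    addr = (b_value + offset) & MASK32
    return sign_extend(memory[addr], 8)


# Hypothetical situation: [B] = 0x1000, offset field = 0xFFFC (that is, -4).
memory = {0x0FFC: 0xA7}
result = load_byte(memory, 0x1000, 0xFFFC)
print(hex(result))
```

With these values the byte A7 H is read from address 0x0FFC and reaches the register as FFFFFFA7 H, matching the example in the slide.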

Sign extension: example with IR
(IR_15)^16 ## IR_15..0
[Figure: tri-state devices, enabled from the Control Unit, place IR bits 15..0 on output bits 15..0 and replicate IR bit 15 on output bits 31..16.]

Data transfer instructions (I format)
Addr <= [B] + (Instr_15)^16 ## Instr_15..0

Examples
  LB  Ra, offset(Rb)   (byte, signed)          [Ra] <= (M[Addr]_7)^24 ## M[Addr]_7..0
  LBU Ra, offset(Rb)   (byte, unsigned)        [Ra] <= (0)^24 ## M[Addr]_7..0
  LH  Ra, offset(Rb)   (half word, signed)     [Ra] <= (M[Addr]_15)^16 ## M[Addr]_15..0
  LHU Ra, offset(Rb)   (half word, unsigned)   [Ra] <= (0)^16 ## M[Addr]_15..0
  LW  Ra, offset(Rb)   (word)                  [Ra] <= M[Addr]
  SW  Ra, offset(Rb)   (word)                  M[Addr] <= [A]
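The load variants differ only in access size and in how the upper bits are filled. A minimal Python sketch, assuming a `bytearray` as memory and big-endian byte order (the slides do not state the byte order, so this is an assumption, as are the function names):

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value, width):
    """Replicate bit (width - 1) of value up to bit 31 and return a 32-bit pattern."""
    value &= (1 << width) - 1
    sign_bit = 1 << (width - 1)
    return ((value ^ sign_bit) - sign_bit) & MASK32


def load(memory, addr, size, signed):
    """Read size bytes (1, 2 or 4) at addr; sign-extend or zero-extend to 32 bit."""
    raw = int.from_bytes(memory[addr:addr + size], "big")  # assumed big-endian
    return sign_extend(raw, 8 * size) if signed else raw


def store(memory, addr, size, value):
    """Write the low size bytes of value at addr (SB, SH, SW)."""
    low = value & ((1 << (8 * size)) - 1)
    memory[addr:addr + size] = low.to_bytes(size, "big")


memory = bytearray(8)
store(memory, 0, 4, 0x8001A7FF)              # SW
print(hex(load(memory, 2, 1, signed=True)))   # LB
print(hex(load(memory, 2, 1, signed=False)))  # LBU
print(hex(load(memory, 0, 2, signed=True)))   # LH
print(hex(load(memory, 0, 2, signed=False)))  # LHU
```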

ALU instructions: examples

Second operand
  Register (format R):    [T] <= [Rc]
  Immediate (format I):   [T] <= (Instr_15)^16 ## Instr_15..0
(T is a hidden register, unknown to the programmer, storing temporary data. Register content is treated as signed in arithmetic operations.)

Operation
  ADD:   [Ra] <= [Rb] + [T]      e.g. ADD Ra,Rb,Rc; ADDI Ra,Rb,value; ADDU Ra,Rb,Rc; ADDUI Ra,Rb,value
  SUB:   [Ra] <= [Rb] - [T]
  AND:   [Ra] <= [Rb] and [T]
  OR:    [Ra] <= [Rb] or [T]
  XOR:   [Ra] <= [Rb] xor [T]
  ………………………
The same scheme applies to the shifts, etc. A and B are the generic registers (Ra, Rb).
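The two-step scheme (select the second operand into T, then operate) can be sketched in Python as follows. The register file as a 32-entry list, the function name `execute_alu` and its keyword arguments are assumptions of the sketch; the rule that R0 stays at 0 follows the register description given earlier in the slides.

```python
MASK32 = 0xFFFFFFFF

OPERATIONS = {
    "add": lambda b, t: b + t,
    "sub": lambda b, t: b - t,
    "and": lambda b, t: b & t,
    "or": lambda b, t: b | t,
    "xor": lambda b, t: b ^ t,
}


def sign_extend16(value):
    """(Instr 15)^16 ## Instr 15..0 as a 32-bit pattern."""
    value &= 0xFFFF
    return ((value ^ 0x8000) - 0x8000) & MASK32


def execute_alu(regs, op, ra, rb, rc=None, imm16=None):
    """Format R when rc is given, format I when imm16 is given."""
    t = sign_extend16(imm16) if imm16 is not None else regs[rc]  # hidden register T
    result = OPERATIONS[op](regs[rb], t) & MASK32
    if ra != 0:  # R0 is fixed to 0, so writes to it are dropped
        regs[ra] = result
    return result


regs = [0] * 32
regs[2], regs[3] = 10, 7
execute_alu(regs, "add", ra=1, rb=2, rc=3)         # ADD  R1,R2,R3
execute_alu(regs, "sub", ra=4, rb=2, imm16=0xFFFF)  # SUBI R4,R2,-1
print(regs[1], regs[4])
```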

SET instructions (see branch)

Second operand
  Register (format R):    [T] <= [Rc]
  Immediate (format I):   [T] <= (Instr_15)^16 ## Instr_15..0
(T is a hidden register, unknown to the programmer, storing temporary data. Register content is treated as signed.)

Ex. SLT Ra,Rb,Rc: set Ra = 1 if Rb is less than Rc, otherwise Ra = 0

  SLT:   [Ra] = 1 if [Rb] <  [T]
  SGE:   [Ra] = 1 if [Rb] >= [T]
  SEQ:   [Ra] = 1 if [Rb] =  [T]
  SNE:   [Ra] = 1 if [Rb] != [T]
  SGT:   [Ra] = 1 if [Rb] >  [T]
  SLE:   [Ra] = 1 if [Rb] <= [T]
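Because register contents are compared as signed values, the bit pattern FFFFFFFF H counts as -1. A minimal Python sketch of the six conditions; the function names and the dictionary layout are assumptions made for illustration:

```python
import operator

MASK32 = 0xFFFFFFFF

CONDITIONS = {
    "LT": operator.lt,
    "GT": operator.gt,
    "LE": operator.le,
    "GE": operator.ge,
    "EQ": operator.eq,
    "NE": operator.ne,
}


def to_signed(value):
    """Interpret a 32-bit pattern as a two's complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def set_condition(postfix, rb_value, t_value):
    """Value written to Ra by SETx: 1 if the signed comparison holds, else 0."""
    holds = CONDITIONS[postfix](to_signed(rb_value), to_signed(t_value))
    return 1 if holds else 0


print(set_condition("LT", 0xFFFFFFFF, 1))  # -1 < 1 as signed values
```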

JUMP instructions
  J   offset   (jump address; JMP)                  format J
  JL  offset   (jump and link address; JAL)         format J
  JR  Ra       (jump register)                      format I
  JLR Ra       (jump and link register; JALR)       format I

State sequence
  JAL, JALR only:   [T] <= [PC]        (for saving [PC] in R31)
  JMP, JAL:         [PC] <= [PC] + (Instr_25)^6 ## Instr_25..0
  JR, JALR:         [PC] <= [Ra]
  JAL, JALR only:   [R31] <= [T]
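The jump sequence can be sketched in Python as below. The function name `jump`, its arguments and the example addresses are assumptions; `pc` stands for the value of [PC] when the jump executes, that is, after the fetch state has already added 4.

```python
MASK32 = 0xFFFFFFFF


def sign_extend26(value):
    """(Instr 25)^6 ## Instr 25..0 as a 32-bit pattern."""
    value &= 0x3FFFFFF
    return ((value ^ 0x2000000) - 0x2000000) & MASK32


def jump(pc, regs, kind, off26=0, ra=0):
    """Return the new PC for kind in {"J", "JAL", "JR", "JALR"}; link into R31 if needed."""
    t = pc  # [T] <= [PC], only used by the linking variants
    if kind in ("J", "JAL"):
        new_pc = (pc + sign_extend26(off26)) & MASK32  # PC-relative, format J
    else:
        new_pc = regs[ra]  # via register, format I
    if kind in ("JAL", "JALR"):
        regs[31] = t  # [R31] <= [T]
    return new_pc


regs = [0] * 32
new_pc = jump(0x104, regs, "JAL", off26=0x3FFFFFC)  # offset -4
print(hex(new_pc), hex(regs[31]))
```

Saving [PC] in T before the jump is what lets JALR work even when Ra is R31 itself: the old return address is read before R31 is overwritten.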

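BRANCH instructions (format I)
  BEQZ:           [Ra] = 0 ?    YES: branch taken    NO: INIT
  BNEZ (BNEQZ):   [Ra] != 0 ?   YES: branch taken    NO: INIT
  Branch taken:   [PC] <= [PC] + (Instr_15)^16 ## Instr_15..0, then INIT

Ex. BNEQZ R5, 100: jump to PC+100 if R5 is not equal to 0 (PC has already been incremented by 4 during the fetch)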
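A minimal Python sketch of the branch decision, under the same assumption used for the fetch state, namely that `pc` is the already incremented [PC]. The function names and the example address are illustrative.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """(Instr 15)^16 ## Instr 15..0 as a 32-bit pattern."""
    value &= 0xFFFF
    return ((value ^ 0x8000) - 0x8000) & MASK32


def branch(pc, ra_value, kind, imm16):
    """Return the next PC for BEQZ or BNEQZ; pc is the already incremented [PC]."""
    if kind == "BEQZ":
        taken = ra_value == 0
    else:  # BNEQZ
        taken = ra_value != 0
    return (pc + sign_extend16(imm16)) & MASK32 if taken else pc


# BNEQZ R5, 100 with a hypothetical [R5] = 7 and [PC] = 0x104
print(hex(branch(0x104, 7, "BNEQZ", 100)))
```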

The Pipelining Principle
Pipelining is the main technique used for "speeding up" a CPU. The key idea is general and is applied in several industrial fields (production lines, oil pipelines, …).
A system S must operate N times on a task A_i, producing result R_i:
  A_1, A_2, A_3 … A_N  ->  S  ->  R_1, R_2, R_3 … R_N
Latency: time elapsed between the beginning and the end of task A (T_A).
Throughput: rate at which tasks are completed.

The Pipelining Principle
1) Sequential system: a new instruction starts when the previous instruction is finished.
     A_1 | A_2 | A_3 | … | A_n   -> t
   A_n: n-th instruction. Latency (execution time of a single instruction) = T_An. Different instructions can have different execution times.
2) Pipelined system: instructions are subdivided into stages, each stage lasting a fixed fraction (1/4 in this example) of the entire instruction time. The stages of successive instructions overlap.
     A = P_1 P_2 P_3 P_4   -> t
     S = S_1 S_2 S_3 S_4   (S_i: pipeline stage)

The Pipelining Principle
T_P: pipeline cycle (ideally one clock). In each cycle one instruction terminates.

  A_1   P1 P2 P3 P4
  A_2      P1 P2 P3 P4
  A_3         P1 P2 P3 P4
  A_4            P1 P2 P3 P4
  …
  A_n               P1 P2 P3 P4   -> t

In the figure A_1 terminates at t_x; in the next cycle A_2 terminates at t_y, etc.
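The overlap in the diagram can be put into numbers with a short Python sketch. It assumes an ideal pipeline (no stalls, every stage takes exactly one cycle T_P); the function names are illustrative.

```python
def sequential_cycles(stages, tasks):
    """Cycles needed when each task starts only after the previous one has finished."""
    return stages * tasks


def pipelined_cycles(stages, tasks):
    """Cycles needed when a new task enters the pipeline every cycle (ideal case)."""
    return stages + tasks - 1


def completion_cycle(stages, k):
    """Cycle at whose end task number k (1-based) leaves the last stage."""
    return stages + k - 1


stages, tasks = 4, 4
print(sequential_cycles(stages, tasks), pipelined_cycles(stages, tasks))
print([completion_cycle(stages, k) for k in range(1, tasks + 1)])
```

The latency of a single task is unchanged (four cycles here); what improves is the throughput, since after the pipeline has filled one task completes per cycle.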

Typical instruction stages
  IF    Instruction fetch (from memory)
  ID    Instruction decode
  EX    Instruction execution (ALU)
  MEM   Data memory access (if needed; register instructions do not need it)
  WB    Write-back (if needed; jumps do not need it)

N.B. The execution time (latency) of all instructions must be the same, to maintain the order of the results. Some stages are not used by some instructions (the stage is a NOP for them), e.g. the MEM stage for register operations.

Pipelining of a CPU (DLX)
Instruction sequence: I_1, I_2, I_3 … I_N
Instruction j:    IF  ID  EX  MEM  WB   -> t
CPU (datapath):   IF | IF/ID | ID | ID/EX | EX | EX/MEM | MEM | MEM/WB | WB
• IF, ID, EX, MEM, WB: combinational circuits
• IF/ID, ID/EX, EX/MEM, MEM/WB: pipeline registers (D flip-flops)
Pipeline cycle = clock cycle = delay of the slowest stage
Clocks Per Instruction (CPI) = 1 (ideally!)
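Since the clock cycle equals the delay of the slowest stage, unbalanced stages limit the gain. A minimal Python sketch with made-up stage delays (the numbers, in nanoseconds, are purely hypothetical and the delay of the pipeline registers is ignored):

```python
def clock_period(stage_delays):
    """Pipeline cycle = delay of the slowest stage."""
    return max(stage_delays.values())


def unpipelined_latency(stage_delays):
    """Time for one instruction when the stages are traversed back to back."""
    return sum(stage_delays.values())


def pipelined_latency(stage_delays):
    """Time for one instruction in the pipeline: one full clock per stage."""
    return len(stage_delays) * clock_period(stage_delays)


# Hypothetical delays in nanoseconds, chosen only for the example.
delays = {"IF": 2, "ID": 1, "EX": 2, "MEM": 3, "WB": 1}
print(clock_period(delays), unpipelined_latency(delays), pipelined_latency(delays))
```

With CPI = 1 one instruction completes every clock period, so the ideal throughput gain over the unpipelined datapath is the ratio between the unpipelined latency and the clock period, even though the latency of a single instruction gets longer.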
