DLX computer (Electronic Computers M 1) - PowerPoint presentation

  1. DLX computer (Electronic Computers M 1)

  2. RISC architectures: RISC vs CISC (Reduced Instruction Set Computer vs Complex Instruction Set Computer)
     • In CISC architectures, 10% of the instructions are used in 90% of cases
     • Waste of silicon
     • Bottleneck: the bus
     • Mid '80s: a new architecture, RISC
     • Solution: fewer and simpler machine instructions
     • Fixed instruction format (simpler instruction decoders)
     • Simpler control logic, which leaves room for a larger number of on-chip registers
     • Fewer bus/memory accesses
     • More machine instructions are needed for a given job, but in many cases this is more than compensated (in terms of time) by the reduction of bus accesses
     • CISC and RISC are each the best solution in different application fields
     • Nowadays both architectures coexist in the same processor (analysed at the end of the course)
     • A simplified RISC architecture: DLX (closely related to real MIPS processors such as the R4000)

  3. DLX (fixed) instruction formats
     • R format: Op-code (bits 31-26, 6 bit) | three 5-bit register fields for Ra, Rb, Rc (bits 25-11) | op-code extension (bits 10-0, 11 bit)
       Used by arithmetic or logic instructions, i.e. Ra ← Rb op Rc, and by Set Condition instructions between registers.
     • I format: Op-code (bits 31-26, 6 bit) | two 5-bit register fields for Ra and Rb (bits 25-16) | immediate operand or offset (bits 15-0, 16 bit)
       Used by data transfer (Load, Store), conditional Branch, JR and JALR (control transfer via register), Set Condition and ALU instructions with an immediate operand. In Load and ALU instructions Ra is the destination; in Store instructions Ra is the source. In immediate ALU instructions Rb supplies the register operand.
     • J format: Op-code (bits 31-26, 6 bit) | 26-bit PC-relative offset (bits 25-0)
       Used by direct, unconditional control transfers (J and JAL).
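A minimal Python sketch that slices a 32-bit word into the fields of the three formats. The helper names `bits` and `decode` and the example op-code 0x23 are hypothetical. Register fields are returned by position only (reg1, reg2, reg3), since which of Ra, Rb and Rc sits in which 5-bit field is left open here.

```python
def bits(word: int, hi: int, lo: int) -> int:
    """Return bits hi..lo (inclusive) of a 32-bit instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode(word: int, fmt: str) -> dict:
    """Split a 32-bit DLX instruction word into the fields of format R, I or J.

    Register fields are returned by position only (reg1 = bits 25-21,
    reg2 = bits 20-16, reg3 = bits 15-11); mapping them onto Ra/Rb/Rc
    is left to the caller (assumption, not fixed by the slide text).
    """
    fields = {"opcode": bits(word, 31, 26)}          # 6-bit op-code
    if fmt == "R":
        fields.update(reg1=bits(word, 25, 21), reg2=bits(word, 20, 16),
                      reg3=bits(word, 15, 11),
                      ext=bits(word, 10, 0))         # 11-bit op-code extension
    elif fmt == "I":
        fields.update(reg1=bits(word, 25, 21), reg2=bits(word, 20, 16),
                      imm=bits(word, 15, 0))         # immediate operand or offset
    elif fmt == "J":
        fields["offset"] = bits(word, 25, 0)         # 26-bit PC-relative offset
    else:
        raise ValueError(f"unknown format {fmt!r}")
    return fields


# Hypothetical I-format word: op-code 0x23, register fields 2 and 5, offset 0xFFFC
word = (0x23 << 26) | (2 << 21) | (5 << 16) | 0xFFFC
print(decode(word, "I"))
```

Because every format keeps the op-code in bits 31-26, a decoder can read those 6 bits first and only then choose how to slice the rest, which is one reason fixed formats simplify the instruction decoder.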

  4. DLX non-floating-point instructions (31 general-purpose 32-bit registers R1…R31, plus R0 fixed at 0; Ra, Rb (and Rc) can be any of the 32 registers; no stack registers)
     Arithmetic/Logic:
       ADD Ra,Rb,Rc      ADDI Ra,Rb,value
       ADDU Ra,Rb,Rc     ADDUI Ra,Rb,value
       SUB Ra,Rb,Rc      SUBI Ra,Rb,value
       SUBU Ra,Rb,Rc     SUBUI Ra,Rb,value
       DIV Ra,Rb,Rc      DIVI Ra,Rb,value
       MULU Ra,Rb,Rc     MULI Ra,Rb,value
       SLL Ra,Rb,Rc      SLLI Ra,Rb,value
       SHR Ra,Rb,Rc      SHRI Ra,Rb,value
       SRA Ra,Rb,Rc      SRAI Ra,Rb,value
       OR Ra,Rb,Rc       ORI Ra,Rb,value
       XOR Ra,Rb,Rc      XORI Ra,Rb,value
       AND Ra,Rb,Rc      ANDI Ra,Rb,value
     Data transfer:
       LW Ra,offset(Rb)                        SW Ra,offset(Rb)
       LH Ra,offset(Rb)    LHU Ra,offset(Rb)   SH Ra,offset(Rb)
       LB Ra,offset(Rb)    LBU Ra,offset(Rb)   SB Ra,offset(Rb)
       LHI Ra,value
     Control:
       SETx Ra,Rb,Rc       SETIx Ra,Rb,value
       BEQZ Ra,offset      (target = offset + [PC])
       BNEQZ Ra,offset     (target = offset + [PC])
       J offset            JR Ra
       JL offset           (target = offset + [PC])
       JLR Ra
     N.B.
       • Postfix x (set condition) can be LT, GT, LE, GE, EQ, NE
       • JL and JLR (jump and link, direct or via register) save the PC in R31
       • Offset and value are immediates contained in the instruction
       • Postfix I means "immediate" (value within the instruction)
       • Postfix A means "arithmetic" (sign extension)
       • Postfix U means "unsigned"

  5. DLX ALU operations
     • Two 32-bit input data (S1, S2), one 32-bit output data (OUT) plus flags; control inputs select the operation
     • Output: S1 + S2, S1 – S2, S1 and S2, S1 or S2, S1 xor S2, left shift of S1 by S2 positions, right shift of S1 by S2 positions, arithmetic right shift of S1 by S2 positions, S1, S2, 0, 1
     • Flags: Zero, Negative (sign)
     • The ALU is a combinational circuit!
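The ALU above can be modeled as a pure function of S1, S2 and a control code, since it has no internal state. This is a minimal Python sketch; the operation names ("add", "sra", …) are hypothetical labels for the control codes, "s1", "s2", "zero" and "one" are read as pass-through and constant outputs, and using only the low 5 bits of S2 as the shift amount is an assumption.

```python
from typing import Tuple

MASK = 0xFFFFFFFF  # every value is a 32-bit pattern


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x


def alu(op: str, s1: int, s2: int) -> Tuple[int, bool, bool]:
    """Combinational ALU model: returns (OUT, Zero flag, Negative flag)."""
    s1 &= MASK
    s2 &= MASK
    sh = s2 & 31  # assumption: only the low 5 bits of S2 are the shift amount
    results = {
        "add": s1 + s2,
        "sub": s1 - s2,
        "and": s1 & s2,
        "or": s1 | s2,
        "xor": s1 ^ s2,
        "sll": s1 << sh,                 # left shift of S1 by S2 positions
        "srl": s1 >> sh,                 # logical right shift (zeros enter)
        "sra": to_signed(s1) >> sh,      # arithmetic right shift (sign enters)
        "s1": s1,
        "s2": s2,
        "zero": 0,
        "one": 1,
    }
    out = results[op] & MASK             # keep 32 bits, as the hardware does
    return out, out == 0, bool(out & 0x80000000)


print(alu("sub", 3, 5))   # 3 - 5 wraps to a negative 32-bit pattern
```

Computing every result and then picking one loosely mirrors a combinational ALU whose control inputs drive an output multiplexer.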

  6. Sequential DLX: abstract instruction execution (a synchronous state diagram)
     • INSTRUCTION FETCH (wait until the memory is Ready):
         [REG INSTR] <= M[PC]
         [PC] <= [PC] + 4
     • INSTRUCTION DECODE:
         [A] <= [Ra]
         [B] <= [Rb]
         [C] <= [Rc]
         [X] <= num[Ra] (number of the destination register)
     • EXECUTION: one branch of the diagram per instruction class (Data transfer, ALU, Set, Jump, Branch)
     PC is the Program Counter; A, B and C are internal scratchpad registers; REG INSTR is the register where the newly fetched instruction is stored. These internal registers are unknown to the programmer.
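The FETCH and DECODE states can be sketched in Python as one function that updates the hidden registers. The class and function names are hypothetical, the memory is a dict of 32-bit words assumed to be always Ready, and the field positions Ra = bits 25-21, Rb = bits 20-16, Rc = bits 15-11 are an assumption made only for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MASK32 = 0xFFFFFFFF


@dataclass
class SequentialDLX:
    pc: int = 0
    regs: List[int] = field(default_factory=lambda: [0] * 32)  # R0 is kept at 0
    reg_instr: int = 0  # REG INSTR: the fetched instruction (hidden register)
    a: int = 0          # scratchpad register A (hidden)
    b: int = 0          # scratchpad register B (hidden)
    c: int = 0          # scratchpad register C (hidden)
    x: int = 0          # number of the destination register


def fetch_and_decode(cpu: SequentialDLX, mem: Dict[int, int]) -> None:
    """One INSTRUCTION FETCH + INSTRUCTION DECODE pass of the state diagram."""
    # INSTRUCTION FETCH (memory assumed always Ready)
    cpu.reg_instr = mem[cpu.pc]
    cpu.pc = (cpu.pc + 4) & MASK32
    # INSTRUCTION DECODE; assumed field positions: Ra 25-21, Rb 20-16, Rc 15-11
    ra = (cpu.reg_instr >> 21) & 31
    rb = (cpu.reg_instr >> 16) & 31
    rc = (cpu.reg_instr >> 11) & 31
    cpu.a, cpu.b, cpu.c = cpu.regs[ra], cpu.regs[rb], cpu.regs[rc]
    cpu.x = ra
    # EXECUTION would now dispatch on the op-code (not modeled here)
```

Note that all three registers are read during decode even if the instruction turns out not to need C; reading them unconditionally keeps the decode state identical for every instruction.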

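  7. Example: LB (LOAD BYTE, format I)
     LB Ra, offset(Rb): op-code in bits 31-26, register fields Ra and Rb in bits 25-16, offset in bits 15-0
     • Fetch: [INSTR] <= M[PC]; [PC] <= [PC] + 4
     • Decode: [A] <= [Ra]; [B] <= [Rb]; [C] <= [Rc]; [X] <= num[Ra]
     • Byte address computation (the address is always 32 bit): $\mathrm{Addr} \leftarrow [B] + (\mathrm{Instr}_{15})^{16} \,\#\#\, \mathrm{Instr}_{15..0}$
       $\mathrm{Instr}_{15..0}$ is the instruction offset; instruction bit 15 (the sign) is replicated 16 times on the left (sign extension!); ## is the JOIN (concatenation) operator; bit 31 is the MSbit and bit 0 the LSbit.
     • LOAD, with sign extension: $[\mathrm{Ra}] \leftarrow (M[\mathrm{Addr}]_{7})^{24} \,\#\#\, M[\mathrm{Addr}]_{7..0}$
       Example: $M[\mathrm{Addr}]_{7..0}$ = A7H = 10100111b, so the sign-extended value written into Ra is FFFFFFA7H (the loaded byte occupies register bits 7..0)
     • Next instruction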
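The LB example can be checked numerically with a short Python sketch. `sign_extend` and `lb` are hypothetical helper names; memory is a dict from byte address to byte value, and the register value 0x1000 with offset 0xFFFC (that is, -4) is an invented input chosen so that the loaded byte is the A7H of the example.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value: int, width: int) -> int:
    """Replicate bit (width-1) into bits 31..width, as in (v_{w-1})^{32-w} ## v."""
    value &= (1 << width) - 1
    if value >> (width - 1):                      # sign bit set
        value |= MASK32 ^ ((1 << width) - 1)      # fill bits 31..width with ones
    return value


def lb(b_reg: int, instr: int, mem: dict) -> int:
    """LB Ra, offset(Rb): return the new content of Ra (b_reg = [B])."""
    addr = (b_reg + sign_extend(instr & 0xFFFF, 16)) & MASK32  # byte address
    return sign_extend(mem[addr], 8)


print(hex(lb(0x1000, 0xFFFC, {0x0FFC: 0xA7})))
```

Adding the sign-extended offset 0xFFFFFFFC modulo 2^32 is the same as subtracting 4, which is why a negative offset needs no separate subtractor.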

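  8. Sign extension: example with IR, computing $(\mathrm{IR}_{15})^{16} \,\#\#\, \mathrm{IR}_{15..0}$
     [Figure: IR bits 15-0 drive output bits 15-0 directly; IR bit 15 also drives each output bit 31…16; the outputs pass through tri-state devices controlled from the Control Unit.]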
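The wiring of this figure can be imitated bit by bit in Python. The function name `extend_bus` is hypothetical, and modeling all tri-state outputs with a single Control Unit enable (returning `None` for the undriven, high-impedance bus) is an assumption about the figure.

```python
from typing import Optional


def extend_bus(ir: int, enable: bool) -> Optional[int]:
    """Wire-level model of (IR_15)^16 ## IR_15..0 driven onto a 32-bit bus.

    Returns None (bus not driven, i.e. high impedance) when the Control Unit
    does not enable the tri-state devices.
    """
    if not enable:
        return None
    out = ir & 0xFFFF                    # IR bits 15..0 go straight to bits 15..0
    sign = (ir >> 15) & 1                # IR bit 15
    for bit in range(16, 32):            # one wire per output bit 31..16
        out |= sign << bit
    return out


print(hex(extend_bus(0xFFFC, True)))
```

No arithmetic is involved: sign extension is pure wiring, which is why the hardware needs only the tri-state drivers.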

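  9. Data transfer instructions (I format)
     Address computation: $\mathrm{Addr} \leftarrow [B] + (\mathrm{Instr}_{15})^{16} \,\#\#\, \mathrm{Instr}_{15..0}$
     Examples:
     • LW Ra, offset(Rb) (word): $[\mathrm{Ra}] \leftarrow M[\mathrm{Addr}]$ (32 bits, no extension needed)
     • LB Ra, offset(Rb) (byte, signed): $[\mathrm{Ra}] \leftarrow (M[\mathrm{Addr}]_{7})^{24} \,\#\#\, M[\mathrm{Addr}]_{7..0}$
     • LBU Ra, offset(Rb) (byte, unsigned): $[\mathrm{Ra}] \leftarrow (0)^{24} \,\#\#\, M[\mathrm{Addr}]_{7..0}$
     • LH Ra, offset(Rb) (half word, signed): $[\mathrm{Ra}] \leftarrow (M[\mathrm{Addr}]_{15})^{16} \,\#\#\, M[\mathrm{Addr}]_{15..0}$
     • LHU Ra, offset(Rb) (half word, unsigned): $[\mathrm{Ra}] \leftarrow (0)^{16} \,\#\#\, M[\mathrm{Addr}]_{15..0}$
     • SW Ra, offset(Rb): $M[\mathrm{Addr}] \leftarrow [A]$ (in stores Ra is the source, latched in A during decode)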
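Loads and stores can be sketched in Python over a byte-addressed `bytearray`. The names `LOADS`, `load` and `store` are hypothetical, big-endian byte order is an assumption (the slides do not state it), and alignment checks are omitted.

```python
MASK32 = 0xFFFFFFFF

# mnemonic -> (size in bytes, sign-extend?)  (hypothetical table)
LOADS = {"LB": (1, True), "LBU": (1, False), "LH": (2, True),
         "LHU": (2, False), "LW": (4, False)}
STORES = {"SB": 1, "SH": 2, "SW": 4}


def effective_address(b_reg: int, offset: int) -> int:
    """Addr <= [B] + (Instr_15)^16 ## Instr_15..0, modulo 2^32."""
    offset &= 0xFFFF
    if offset & 0x8000:
        offset |= 0xFFFF0000             # sign extension of the 16-bit offset
    return (b_reg + offset) & MASK32


def load(op: str, mem: bytearray, b_reg: int, offset: int) -> int:
    size, signed = LOADS[op]
    addr = effective_address(b_reg, offset)
    if addr + size > len(mem):
        raise IndexError("address outside the simulated memory")
    value = int.from_bytes(mem[addr:addr + size], "big")  # big-endian assumed
    if signed and value >> (8 * size - 1):
        value -= 1 << (8 * size)         # replicate the sign bit upward
    return value & MASK32                # zero-extended otherwise


def store(op: str, mem: bytearray, a_reg: int, b_reg: int, offset: int) -> None:
    size = STORES[op]
    addr = effective_address(b_reg, offset)
    data = a_reg & ((1 << (8 * size)) - 1)   # keep only the stored bytes
    mem[addr:addr + size] = data.to_bytes(size, "big")


mem = bytearray(16)
store("SW", mem, 0x80F0A7FF, 8, 0)
print(hex(load("LB", mem, 10, 0)), hex(load("LBU", mem, 10, 0)))
```

The only difference between LB and LBU (or LH and LHU) is whether the upper bits are filled with the sign bit or with zeros.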

  10. ALU instructions: register (format R) and immediate (format I)
      • The second operand is placed in T, a hidden register unknown to the programmer that stores temporary data:
          format R: [T] <= [Rc]
          format I: $[T] \leftarrow (\mathrm{Instr}_{15})^{16} \,\#\#\, \mathrm{Instr}_{15..0}$
      • Register contents are treated as signed in arithmetic operations
      • Examples: ADD Ra,Rb,Rc; ADDI Ra,Rb,value; ADDU Ra,Rb,Rc; ADDUI Ra,Rb,value
          ADD: [Ra] <= [Rb] + [T]
          SUB: [Ra] <= [Rb] - [T]
          AND: [Ra] <= [Rb] and [T]
          OR:  [Ra] <= [Rb] or [T]
          XOR: [Ra] <= [Rb] xor [T]
      • The same scheme applies to the shifts etc.; Ra and Rb denote generic registers
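The shared "load T, then compute [Rb] op [T]" scheme can be sketched in Python for the five operations listed. The function name `alu_instr` and the trailing-"I" rule for recognising immediate forms are hypothetical. This sketch ignores overflow, so it does not distinguish the signed and U variants; R0 is kept at 0 as stated in the instruction table.

```python
MASK32 = 0xFFFFFFFF

OPS = {
    "ADD": lambda b, t: b + t,
    "SUB": lambda b, t: b - t,
    "AND": lambda b, t: b & t,
    "OR":  lambda b, t: b | t,
    "XOR": lambda b, t: b ^ t,
}


def sext16(x: int) -> int:
    """(x_15)^16 ## x_15..0 as a 32-bit pattern."""
    x &= 0xFFFF
    return (x | 0xFFFF0000) if x & 0x8000 else x


def alu_instr(mnemonic: str, regs: list, ra: int, rb: int, rc_or_imm: int) -> None:
    """Execute e.g. ADD Ra,Rb,Rc or ADDI Ra,Rb,value on the register list."""
    immediate = mnemonic.endswith("I")            # ADDI, ANDI, ORI, XORI, ...
    base = mnemonic[:-1] if immediate else mnemonic
    # T: hidden register holding the second operand (format I or format R)
    t = sext16(rc_or_imm) if immediate else regs[rc_or_imm]
    result = OPS[base](regs[rb], t) & MASK32
    if ra != 0:                                   # R0 stays fixed at 0
        regs[ra] = result


regs = [0] * 32
regs[2], regs[3] = 5, 7
alu_instr("SUB", regs, 1, 2, 3)
print(hex(regs[1]))
```

Because format R and format I differ only in how T is loaded, the same datapath path from T through the ALU serves both.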

  11. SET instructions: register (format R) and immediate (format I)
      • Second operand in the hidden register T (unknown to the programmer, storing temporary data; see also branch): [T] <= [Rc] (format R) or $[T] \leftarrow (\mathrm{Instr}_{15})^{16} \,\#\#\, \mathrm{Instr}_{15..0}$ (format I)
      • Register contents are compared as signed
      • Example: SLT Ra,Rb,Rc sets Ra = 1 if Rb is less than Rc, otherwise Ra = 0
      • SEQ: [Ra] = 1 if [Rb] = [T]      SNE: [Ra] = 1 if [Rb] != [T]
      • SLT: [Ra] = 1 if [Rb] < [T]      SGT: [Ra] = 1 if [Rb] > [T]
      • SLE: [Ra] = 1 if [Rb] <= [T]     SGE: [Ra] = 1 if [Rb] >= [T]
      (in every case, [Ra] = 0 otherwise)
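The six set conditions reduce to a signed comparison that yields 0 or 1. A minimal Python sketch; `set_instr` and the `CONDITIONS` table are hypothetical names, and `t_value` stands for the content of T (already sign-extended for the immediate forms).

```python
def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


CONDITIONS = {
    "EQ": lambda b, t: b == t,
    "NE": lambda b, t: b != t,
    "LT": lambda b, t: b < t,
    "GT": lambda b, t: b > t,
    "LE": lambda b, t: b <= t,
    "GE": lambda b, t: b >= t,
}


def set_instr(cond: str, rb_value: int, t_value: int) -> int:
    """Value written to Ra by S<cond>: 1 if the condition holds, else 0.

    Both operands are compared as signed 32-bit integers.
    """
    return int(CONDITIONS[cond](to_signed(rb_value), to_signed(t_value)))


print(set_instr("LT", 0xFFFFFFFF, 1))   # -1 < 1 when read as signed
```

Comparing as signed matters: read as unsigned, 0xFFFFFFFF would be the largest value and SLT would give the opposite answer.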

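  12. JUMP instructions
      • J offset (jump, format J): $[\mathrm{PC}] \leftarrow [\mathrm{PC}] + (\mathrm{Instr}_{25})^{6} \,\#\#\, \mathrm{Instr}_{25..0}$
      • JR Ra (jump register, format I): [PC] <= [Ra]
      • JAL offset, written JL in the instruction table (jump and link, format J): [T] <= [PC]; PC updated as for J; [R31] <= [T]
      • JALR Ra, written JLR in the instruction table (jump and link register, format I): [T] <= [PC]; [PC] <= [Ra]; [R31] <= [T]
      The link forms save [PC] in R31 through the hidden register T.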
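The four jumps can be sketched in Python as a function returning the next PC. The name `jump` is hypothetical, `pc` is the value already incremented by 4 during fetch, and reading Ra from bits 25-21 for JR/JALR is an assumed field position.

```python
MASK32 = 0xFFFFFFFF


def sext26(x: int) -> int:
    """(x_25)^6 ## x_25..0 as a 32-bit pattern."""
    x &= 0x3FFFFFF
    return (x | 0xFC000000) if x & 0x2000000 else x


def jump(kind: str, pc: int, regs: list, instr: int) -> int:
    """Return the new PC for J, JAL, JR, JALR; pc is the already-incremented PC."""
    t = pc                                   # [T] <= [PC] (used by the link forms)
    if kind in ("J", "JAL"):                 # format J, PC-relative
        new_pc = (pc + sext26(instr)) & MASK32
    elif kind in ("JR", "JALR"):             # format I, target held in Ra
        ra = (instr >> 21) & 31              # assumed field position
        new_pc = regs[ra]
    else:
        raise ValueError(f"unknown jump {kind!r}")
    if kind in ("JAL", "JALR"):
        regs[31] = t                         # [R31] <= [T]
    return new_pc


regs = [0] * 32
print(hex(jump("JAL", 0x104, regs, 0x10)), hex(regs[31]))
```

Saving through T before writing R31 means that JALR R31 still jumps to the old content of R31, since the target is read before the link is written.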

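  13. BRANCH instructions (format I): BEQZ, BNEZ (also written BNEQZ)
      • BEQZ Ra, offset: the branch is taken if [Ra] = 0
      • BNEZ Ra, offset: the branch is taken if [Ra] != 0
      • Branch taken: $[\mathrm{PC}] \leftarrow [\mathrm{PC}] + (\mathrm{Instr}_{15})^{16} \,\#\#\, \mathrm{Instr}_{15..0}$; not taken: [PC] is left unchanged. In both cases the machine returns to the initial (fetch) state.
      • Example: BNEQZ R5, 100 jumps to [PC] + 100 if R5 is not equal to 0 ([PC] has already been incremented by 4 during fetch)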
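A minimal Python sketch of the branch decision; `branch` is a hypothetical name, `pc` is the already-incremented PC and `offset` is the raw 16-bit field.

```python
MASK32 = 0xFFFFFFFF


def sext16(x: int) -> int:
    """(x_15)^16 ## x_15..0 as a 32-bit pattern."""
    x &= 0xFFFF
    return (x | 0xFFFF0000) if x & 0x8000 else x


def branch(kind: str, pc: int, ra_value: int, offset: int) -> int:
    """Next PC for BEQZ / BNEZ (BNEQZ); pc is the PC already incremented in fetch."""
    is_zero = (ra_value & MASK32) == 0
    if kind == "BEQZ":
        taken = is_zero
    elif kind in ("BNEZ", "BNEQZ"):
        taken = not is_zero
    else:
        raise ValueError(f"unknown branch {kind!r}")
    return (pc + sext16(offset)) & MASK32 if taken else pc


print(branch("BNEQZ", 0x104, 5, 100))   # R5 != 0, so the branch is taken
```

A loop that branches backwards simply uses a negative offset, e.g. 0xFFFC for -4.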

  14. The Pipelining Principle
      Pipelining is the main basic technique used for "speeding up" a CPU. The key idea of pipelining is general, and it is currently applied in several industrial fields (production lines, oil pipelines, …).
      A system S must operate N times, on tasks $A_1, A_2, A_3, \dots, A_N$, producing results $R_1, R_2, R_3, \dots, R_N$.
      • Latency: time elapsed between the beginning and the end of a task A ($T_A$)
      • Throughput: frequency of task completions

  15. The Pipelining Principle
      1) Sequential system: a new instruction starts when the previous one has finished. The latency (execution time of a single instruction) of the n-th instruction $A_n$ is $T_{A_n}$; different instructions can have different execution times.
      2) Pipelined system: instructions are subdivided into stages $S_i$, each taking a fraction of the entire instruction time (1/4 in this example, with phases $P_1 \dots P_4$ executed in stages $S_1 \dots S_4$). The stages of successive instructions overlap.
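The gain of the pipelined system can be estimated with two formulas: a sequential system needs $n \cdot T_A$ for n instructions, while an ideal k-stage pipeline delivers the first result after k cycles and then one per cycle, i.e. $(k + n - 1) \cdot T_P$. This Python sketch assumes every instruction takes the same $T_A$, split into k equal stages ($T_P = T_A / k$) with no stalls; the numbers are hypothetical.

```python
def sequential_time(n: int, t_a: float) -> float:
    """Sequential system: n instructions of duration T_A, one after another."""
    return n * t_a


def pipelined_time(n: int, k: int, t_p: float) -> float:
    """Ideal k-stage pipeline: first result after k cycles, then one per cycle."""
    return (k + n - 1) * t_p


t_a, k = 8, 4                 # hypothetical: T_A = 8 time units, 4 stages
t_p = t_a / k                 # ideal pipeline cycle T_P = T_A / k
print(sequential_time(100, t_a), pipelined_time(100, k, t_p))
```

For large n the ratio approaches k, while the latency of a single instruction ($k \cdot T_P = T_A$) does not improve: pipelining raises throughput, not the speed of one instruction.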

  16. The Pipelining Principle
      $T_P$: pipeline cycle (ideally one clock). In each cycle one instruction terminates: in the figure, $A_1$ terminates at $t_x$, $A_2$ terminates one cycle later at $t_y$, and so on for $A_3, A_4, \dots, A_n$.
      [Figure: space-time diagram in which each instruction $A_i$ goes through phases $P_1, P_2, P_3, P_4$, starting one cycle after $A_{i-1}$.]
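The space-time diagram can be regenerated as text with a few lines of Python. `pipeline_chart` is a hypothetical helper; the default phase names follow the four phases $P_1 \dots P_4$ of the figure.

```python
def pipeline_chart(n: int, stages=("P1", "P2", "P3", "P4")) -> list:
    """Space-time diagram: row i shows which phase A(i+1) occupies in each cycle."""
    k = len(stages)
    width = max(len(s) for s in stages)
    rows = []
    for i in range(n):
        cells = [" " * width] * (k + n - 1)      # k + n - 1 cycles in total
        for s, name in enumerate(stages):
            cells[i + s] = name.ljust(width)     # A(i+1) enters one cycle after A(i)
        rows.append(f"A{i + 1}: " + " ".join(cells).rstrip())
    return rows


for line in pipeline_chart(4):
    print(line)
```

Reading the printed columns left to right, each P4 appears exactly one cycle after the previous one, which is the "one instruction terminates per cycle" property.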

  17. Typical instruction stages: IF, ID, EX, MEM, WB
      • IF: instruction fetch
      • ID: instruction decode
      • EX: instruction execution (ALU)
      • MEM: data memory access (if needed; register instructions do not need it)
      • WB: write-back of the result, e.g. of data read from memory (if needed; jumps do not need it)
      N.B. The execution time (latency) of all instructions must be the same, to preserve the order of the results. Some stages are not used by some instructions (the stage is a NOP for them, e.g. the MEM stage for register operations).

  18. Pipelining of a CPU (DLX)
      Instruction sequence: $I_1, I_2, I_3, \dots, I_N$; each instruction j goes through IF, ID, EX, MEM, WB.
      The CPU datapath is made of combinational circuits (IF, ID, EX, MEM, WB) separated by pipeline registers (D flip-flops): IF/ID, ID/EX, EX/MEM, MEM/WB.
      Pipeline cycle = clock cycle = delay of the slowest stage.
      Clocks Per Instruction (CPI) = 1 (ideally!)
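Since the clock cycle must accommodate the slowest stage, unbalanced stages reduce the gain. A Python sketch with hypothetical stage delays in picoseconds; the `register_overhead` parameter (pipeline-register delay) is an assumption not mentioned on the slide and defaults to 0.

```python
def pipeline_clock(stage_delays: dict, register_overhead: float = 0.0) -> float:
    """Pipeline cycle = delay of the slowest stage (+ optional register overhead)."""
    return max(stage_delays.values()) + register_overhead


def ideal_speedup(stage_delays: dict, register_overhead: float = 0.0) -> float:
    """Unpipelined time per instruction / pipelined time per instruction at CPI = 1."""
    return sum(stage_delays.values()) / pipeline_clock(stage_delays, register_overhead)


delays = {"IF": 200, "ID": 150, "EX": 200, "MEM": 250, "WB": 100}  # hypothetical ps
print(pipeline_clock(delays), ideal_speedup(delays))
```

With these numbers the speedup is below the stage count (5) because MEM sets the clock while ID and WB sit partly idle each cycle.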
