SLIDE 1

Reduced Instruction Set Computers

Raul Queiroz Feitosa

Parts of these slides are from the support material provided by W. Stallings

RISC
  • Objective

To provide an overview of the innovations in the areas of computer organization and architecture related to Reduced Instruction Set Computers.

SLIDE 2 RISC
  • Outline

Historical Overview
Instruction Execution Characteristics
Use of Large Register File
Reduced Instruction Set Architecture
RISC Pipelining

RISC
  • Driving forces for CISC

Software costs far exceed hardware costs
Increasingly complex high level languages
Semantic gap

Leads to:

    Inefficient code
    Excessive machine program size
    Compiler complexity
    Small register sets

SLIDE 3 RISC
  • Driving forces for CISC

Access to control memory faster than to external memory

Leads to:

    Move complexity to microcode
    Larger and more powerful instruction sets
    More addressing modes
    Hardware implementations of HLL statements
        e.g. CASE (switch) on VAX

RISC
  • Changes toward RISC

Semiconductor technology and cache memories → reduced memory access time
Compiler technology evolved → more intelligence built into compilers
Pipelining → see later
The dynamic behavior of programs started being investigated

SLIDE 4 RISC
  • Outline

Historical Overview
Instruction Execution Characteristics
Use of Large Register File
Reduced Instruction Set Architecture
RISC Pipelining

RISC
  • Frequency of HLL Operations

Procedure call/return is the most time-consuming operation in typical HLL programs.

SLIDE 5 RISC
  • Operands

Furthermore, 80% of the scalars are local to procedures → optimisation should concentrate on accessing local variables.

RISC
  • Procedure Calls

Registers are saved on each call and restored on each return → very time consuming
Programs are mostly confined to a narrow window of procedure invocation depth

SLIDE 6 RISC
  • Procedure Calls

Typically, procedures employ few passed parameters and local variables

RISC
  • Implications

Best support is given by optimising the most used and most time-consuming features

Large number of registers
    Operand referencing
Careful design of pipelines
    Branch prediction etc.
Simplified (reduced) instruction set
Move complexity to compiler

SLIDE 7 RISC
  • Outline

Historical Overview
Instruction Execution Characteristics
Use of Large Register File
Reduced Instruction Set Architecture
RISC Pipelining

RISC
  • Large Register File

Software solution
    Require compiler to allocate registers
    Allocate based on most used variables in a given time
    Requires sophisticated program analysis

Hardware solution
    Have more registers
    Thus more variables will be in registers

SLIDE 8 RISC
  • SW Based Register Optimization

Assume small number of registers (16-32)
Optimizing use is up to compiler
HLL programs have no explicit references to registers
    usually - think about C - register int
Assign symbolic or virtual register to each candidate variable
Map (unlimited) symbolic registers to real registers
Symbolic registers that do not overlap can share real registers
If you run out of real registers some variables use memory

RISC
  • Graph Coloring

[Figure: (a) time sequence of active use of registers, showing symbolic registers A to F mapped onto actual registers R1 to R3; (b) register interference graph over the symbolic registers A to F]

  • Symbolic registers that are simultaneously in use are connected by an edge and are assigned different colors
  • The aim is to minimize the number of different colors.
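The coloring idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the live ranges below are hypothetical (they are not read off the slide's figure), and the greedy ordering is one simple heuristic that does not guarantee the minimum number of colors.

```python
def interference_graph(live_ranges):
    """Connect two symbolic registers when their live ranges overlap."""
    graph = {name: set() for name in live_ranges}
    names = list(live_ranges)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            a_start, a_end = live_ranges[a]
            b_start, b_end = live_ranges[b]
            if a_start < b_end and b_start < a_end:  # half-open intervals overlap
                graph[a].add(b)
                graph[b].add(a)
    return graph


def color_registers(graph, num_real):
    """Greedy coloring: each node gets the lowest color unused by its neighbors."""
    assignment = {}
    # Visit the most constrained symbolic registers first (a heuristic only).
    for node in sorted(graph, key=lambda n: (-len(graph[n]), n)):
        used = {assignment[n] for n in graph[node] if n in assignment}
        free = [r for r in range(num_real) if r not in used]
        assignment[node] = free[0] if free else None  # None means "keep in memory"
    return assignment


# Hypothetical live ranges (start, end) for symbolic registers A..F.
live_ranges = {
    "A": (0, 3), "B": (1, 4), "C": (2, 6),
    "D": (4, 7), "E": (5, 8), "F": (6, 9),
}
graph = interference_graph(live_ranges)
assignment = color_registers(graph, num_real=3)
for name in sorted(assignment):
    print(name, "->", assignment[name])
```

A symbolic register that receives `None` would have to live in memory, which corresponds to running out of real registers.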

SLIDE 9 RISC
  • HW Solution - Register Window

Register set is split into windows, with just one window visible at a time. A window has three fields:

    Parameter registers: input parameters and returned results
    Local registers: local variables
    Temporary registers: input parameters and returned results of the procedure called by the current procedure

[Figure: overlapping register windows. The window of level J and the window of level J+1 each hold parameter, local, and temporary registers; the temporary registers of level J overlap the parameter registers of level J+1.]
RISC
  • Circular Buffer

[Figure: circular buffer of six overlapping windows, WA to WF. Each window X has parameter (X.p), local (X.l), and temporary (X.t) registers. The current window pointer (CWP) moves on Call and Return; the saved window pointer (SWP) moves on Save and Restore.]

  • Only one register window is visible, the one pointed to by CWP
  • Register references are offset by CWP
  • If procedure E calls F, arguments for F are placed in E.t, and CWP advances by one window
  • SWP identifies the window most recently saved in memory
  • If procedure F calls another one, CWP=SWP, an interrupt occurs, and the A window is saved.
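The overflow and underflow behavior of the circular buffer can be sketched as a small Python model. It is a simplification under stated assumptions: the class and attribute names are hypothetical, the model tracks only which call frames are resident (not register contents or the overlap between windows), and it assumes that six windows hold six nested activations, as in the A to F example above.

```python
class RegisterWindowFile:
    """Toy model of a circular buffer of register windows."""

    def __init__(self, num_windows):
        self.num_windows = num_windows
        self.depth = 0          # call depth of the running procedure (0 = A)
        self.resident = 1       # activations currently held in windows
        self.memory_stack = []  # depths of activations saved to memory
        self.overflows = 0
        self.underflows = 0

    @property
    def cwp(self):
        """Index of the visible window; wraps around the buffer."""
        return self.depth % self.num_windows

    def call(self):
        if self.resident == self.num_windows:
            # Overflow interrupt: save the oldest resident window to memory.
            oldest = self.depth - self.resident + 1
            self.memory_stack.append(oldest)
            self.resident -= 1
            self.overflows += 1
        self.depth += 1
        self.resident += 1

    def ret(self):
        if self.depth == 0:
            raise RuntimeError("return from the outermost procedure")
        self.depth -= 1
        self.resident -= 1
        if self.resident == 0:
            # Underflow interrupt: restore the caller's window from memory.
            self.memory_stack.pop()
            self.resident = 1
            self.underflows += 1


windows = RegisterWindowFile(num_windows=6)
for _ in range(5):       # A calls B, B calls C, ..., E calls F
    windows.call()
print(windows.overflows)  # no window has been saved yet
windows.call()            # F calls another procedure
print(windows.overflows, windows.memory_stack, windows.cwp)
```

After the sixth call the model reports one overflow, the saved activation is depth 0 (procedure A), and the CWP has wrapped around to the window A used to occupy.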

SLIDE 10 RISC
  • Global Variables

Allocated by the compiler to memory

Inefficient for frequently accessed variables

Have a set of registers for global variables

RISC
  • Registers × Cache
SLIDE 11 RISC
  • Outline

Historical Overview
Instruction Execution Characteristics
Use of Large Register File
Reduced Instruction Set Architecture
RISC Pipelining

RISC
  • RISC Characteristics
  • 1. One instruction per cycle
  • 2. Register to register operations

Ex.: addu r1,r2,r4     /* add unsigned r2 to r4 and put the result in r1 */
     addu r1,#imm(r4)  /* add unsigned r1 to the memory word at address r4 + #imm: FORBIDDEN */

  • 3. Memory access only through Load/Store
  • 4. Few, simple addressing modes

Ex.: lw r2,128(r3)     /* load the word at address r3 + 128 into r2 */

SLIDE 12

Ex.: Intel x86

RISC
  • RISC Characteristics

  • 5. Few, simple, fixed instruction formats

Ex.: MIPS R4000 (field widths in bits)

    R-type (register):   Operation (6) | rs (5) | rt (5) | rd (5) | Shift (5) | Function (6)
    I-type (immediate):  Operation (6) | rs (5) | rt (5) | Immediate (16)
    J-type (jump):       Operation (6) | Target (26)

    Operation   Operation code
    rs          Source register specifier
    rt          Source/destination register specifier
    Immediate   Immediate, branch, or address displacement
    Target      Jump target address
    rd          Destination register specifier
    Shift       Shift amount
    Function    ALU/shift function specifier
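To make the fixed 32-bit formats concrete, the following Python sketch packs the fields of each format into one word. The field positions follow the R-type, I-type, and J-type layouts above. The helper names are hypothetical, and the numeric codes used in the examples (function 0x21 for addu, opcode 0x23 for lw) are the standard MIPS32 encodings rather than values taken from the slides; r1, r2, and so on are treated as register numbers 1, 2, and so on.

```python
def encode_r(op, rs, rt, rd, shift, funct):
    """R-type: op(6) rs(5) rt(5) rd(5) shift(5) funct(6)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shift << 6) | funct


def encode_i(op, rs, rt, imm):
    """I-type: op(6) rs(5) rt(5) immediate(16)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def encode_j(op, target):
    """J-type: op(6) target(26)."""
    return (op << 26) | (target & 0x3FFFFFF)


def decode_r(word):
    """Split a 32-bit word back into the six R-type fields."""
    return (
        (word >> 26) & 0x3F,  # operation code
        (word >> 21) & 0x1F,  # rs
        (word >> 16) & 0x1F,  # rt
        (word >> 11) & 0x1F,  # rd
        (word >> 6) & 0x1F,   # shift amount
        word & 0x3F,          # function
    )


# addu r1,r2,r4: rd=1, rs=2, rt=4
addu_word = encode_r(op=0, rs=2, rt=4, rd=1, shift=0, funct=0x21)
# lw r2,128(r3): rt=2, base register rs=3, displacement 128
lw_word = encode_i(op=0x23, rs=3, rt=2, imm=128)

print(f"{addu_word:08x}")
print(f"{lw_word:08x}")
```

Because every instruction is one word with fields at fixed bit positions, decoding needs only shifts and masks, which is part of what makes a hardwired control unit practical.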

RISC
  • RISC Characteristics

  • 6. Hardwired design (no microcode)
  • 7. More compile time/effort

SLIDE 13 RISC
  • Outline

Historical Overview
Instruction Execution Characteristics
Use of Large Register File
Reduced Instruction Set Architecture
RISC Pipelining

RISC
  • RISC Pipelining

Delayed branch
Delayed load
    Register to be the target is locked by the processor
    Continue execution of the instruction stream until the register is required
    Idle until the load is complete
    Re-arranging instructions can allow useful work whilst loading
Loop unrolling

Ex.: load completes after 2 instruction cycles

    Load rA ← M1
    Load rB ← M2
    Load rC ← M3
    Load rD ← M4
    Add  rE ← rA + rB
    NOOP
    Add  rF ← rC + rD
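The effect of re-arranging instructions around delayed loads can be sketched with a tiny Python model. It assumes, as in the example above, that a loaded register may be used only when two other instruction slots separate the load from the use; the tuple representation and the function names are hypothetical.

```python
def insert_noops(program, load_delay=2):
    """Insert NOOPs so that no instruction reads a register still being loaded."""
    scheduled = []
    ready_at = {}  # register -> first slot index at which it may be read
    for op, dest, sources in program:
        # Idle until every source register has finished loading.
        while any(ready_at.get(src, 0) > len(scheduled) for src in sources):
            scheduled.append(("NOOP", None, ()))
        if op == "Load":
            ready_at[dest] = len(scheduled) + load_delay + 1
        scheduled.append((op, dest, sources))
    return scheduled


def count_noops(scheduled):
    return sum(1 for op, _, _ in scheduled if op == "NOOP")


# Each add placed right after the loads it depends on.
naive = [
    ("Load", "rA", ("M1",)), ("Load", "rB", ("M2",)),
    ("Add", "rE", ("rA", "rB")),
    ("Load", "rC", ("M3",)), ("Load", "rD", ("M4",)),
    ("Add", "rF", ("rC", "rD")),
]
# Loads grouped first, as in the example above.
grouped = [
    ("Load", "rA", ("M1",)), ("Load", "rB", ("M2",)),
    ("Load", "rC", ("M3",)), ("Load", "rD", ("M4",)),
    ("Add", "rE", ("rA", "rB")),
    ("Add", "rF", ("rC", "rD")),
]

print(count_noops(insert_noops(naive)))
print(count_noops(insert_noops(grouped)))
```

Under this model the grouped order needs a single NOOP, in the same position as in the example, while the naive order needs four.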

SLIDE 14 RISC
  • Loop Unrolling

Replicate body of loop a number of times
Iterate loop fewer times
In consequence
    Reduces loop overhead
    Increases instruction parallelism
    Improves register, data cache, or TLB locality

RISC
  • Loop Unrolling (2×) Example

The code

    do i=2, n-1
      a[i] = a[i] + a[i-1] * a[i+1]
    end do

becomes

    do i=2, n-2, 2
      a[i]   = a[i]   + a[i-1] * a[i+1]
      a[i+1] = a[i+1] + a[i]   * a[i+2]
    end do
    if (mod(n-2,2) = 1) then
      a[n-1] = a[n-1] + a[n-2] * a[n]
    end if

Benefits:

  • 1. Loop overhead is halved
  • 2. An assignment, a store, and the loop variable update can proceed simultaneously → increased parallelism
  • 3. Variables are used twice in the loop body → improved locality
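A quick way to check that the 2× unrolled loop computes the same result as the original is to run both on the same data. The Python sketch below mirrors the two versions, using a list with an unused element 0 so that indices run from 1 to n as in the slide; the function names are hypothetical, and the extra `n >= 3` guard exists only because Python's `%` differs from `mod` for negative operands.

```python
def rolled(values):
    """Original loop: i = 2 .. n-1."""
    a = list(values)
    n = len(a) - 1                      # a[0] is unused padding
    for i in range(2, n):
        a[i] = a[i] + a[i - 1] * a[i + 1]
    return a


def unrolled_2x(values):
    """Loop unrolled twice: i = 2, 4, ... up to n-2, plus a cleanup step."""
    a = list(values)
    n = len(a) - 1
    for i in range(2, n - 1, 2):
        a[i] = a[i] + a[i - 1] * a[i + 1]
        a[i + 1] = a[i + 1] + a[i] * a[i + 2]
    # One iteration is left over when the trip count n-2 is odd.
    if n >= 3 and (n - 2) % 2 == 1:
        a[n - 1] = a[n - 1] + a[n - 2] * a[n]
    return a


for n in range(2, 12):
    data = [0] + list(range(1, n + 1))
    assert rolled(data) == unrolled_2x(data)
print("rolled and unrolled loops agree")
```

The cleanup `if` is what keeps the transformation correct for every n, since the unrolled body always handles iterations in pairs.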

SLIDE 15 RISC
  • Controversy

Quantitative

    compare program sizes and execution speeds

Qualitative

    examine issues of high level language support and use of VLSI real estate

Problems

    No pair of RISC and CISC machines that are directly comparable
    No definitive set of test programs
    Difficult to separate hardware effects from compiler effects
    Most comparisons done on “toy” rather than production machines
    Most commercial devices are a mixture

RISC
  • Exercise 1

Consider the loop below. A straightforward translation of this into a generic assembly language would look something like the code that follows it. A compiler for a RISC machine will introduce delay slots into this code so that the processor can employ the delayed branch mechanism. The JMP instruction is easy to deal with, because this instruction is always followed by the SUB instruction; therefore, we can simply place a copy of the SUB instruction in the delay slot after the JMP. The BEQ presents a difficulty. We can’t leave the code as is, because the ADD instruction would then be executed one too many times. Therefore, a NOP instruction is needed. Show the resulting code.

    S := 0;
    for K := 1 to 100 do
        S := S - K;

        LD  R1,0          ; keep value of S in R1
        LD  R2,1          ; keep value of K in R2
    LP  SUB R1,R1,R2      ; S := S-K
        BEQ R2,100,EXIT   ; done if K = 100
        ADD R2,R2,1       ; else increment K
        JMP LP            ; back to start of loop

Problem 13.6 from Stallings 5th Ed.
SLIDE 16 RISC
  • Exercise 2

A RISC machine may do both a mapping of symbolic registers to actual registers and a rearrangement of instructions for pipeline efficiency. An interesting question arises as to the order in which these two operations should be done. Consider the following program fragment:

    LD  SR1,A          ; load A into symbolic register 1
    LD  SR2,B          ; load B into symbolic register 2
    ADD SR3,SR1,SR2    ; add contents of SR1 and SR2 and store in SR3
    LD  SR4,C
    LD  SR5,D
    ADD SR6,SR4,SR5

a) First do the register mapping and then any possible instruction reordering. How many machine registers are used? Has there been any pipeline improvement?

b) Starting with the original program, now do instruction reordering and then any possible mapping. How many machine registers are used? Has there been any pipeline improvement?

Problem 13.7 from Stallings 5th Ed.

RISC
  • Text Book References

The topics are covered in

Stallings - sections 13.1 to 13.5 and 13.8

SLIDE 17 RISC
  • Reduced Instruction Set Computers

END