Lecture 4: Instruction Set Architecture, Prof. V. Catania



Lecture 4: Instruction Set Architecture

  • Prof. V. Catania

Calcolatori elettron. II 2003

Towards Evaluation of ISA and Organization

[Figure: the instruction set as the interface between software and hardware.]

Evolution of Instruction Sets

  • Major advances in computer architecture are typically associated with landmark instruction set designs

– Ex: Stack vs GPR (System 360)

  • Design decisions must take into account:

– technology
– machine organization
– programming languages
– compiler technology
– operating systems

And design decisions in turn influence all of these.

Design Space of ISA

Five Primary Dimensions

  • Number of explicit operands: 0, 1, 2, 3
  • Operand storage: where, besides memory?
  • Effective address: how is the memory location specified?
  • Type & size of operands: byte, int, float, vector, ... How is it specified?
  • Operations: add, sub, mul, ... How is it specified?

Other Aspects

  • Successor: how is it specified?
  • Conditions: how are they determined?
  • Encodings: fixed or variable? Wide?
  • Parallelism

ISA Metrics

Aesthetics:

  • Orthogonality

– No special registers, few special cases, all operand modes available with any data type or instruction type

  • Completeness

– Support for a wide range of operations and target applications

  • Regularity

– No overloading for the meanings of instruction fields

  • Streamlined

– Resource needs easily determined

Ease of compilation (programming?)

Ease of implementation

Scalability

Classifying ISA

Type of storage internal to the CPU:

  • STACK ARCHITECTURE: operands are implicit
  • ACCUMULATOR ARCHITECTURE: one operand is implicitly the accumulator
  • GENERAL PURPOSE REGISTER: only explicit operands (registers or memory)

Code sequence for C = A + B in each class:

Stack      Accumulator   Register            Register
                         (register-memory)   (load-store)
Push A     Load A        Load R1, A          Load R1, A
Push B     Add B         Add R1, B           Load R2, B
Add        Store C       Store C, R1         Add R3,R1,R2
Pop C                                        Store C, R3
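The four code sequences above can be traced with a small sketch. This is only an illustration: memory is modelled as a Python dict of named locations, and the function names are hypothetical, not part of any real toolchain. Each function mirrors one column, one statement per instruction.

```python
# Toy traces of the four ISA classes, each computing C = A + B.
# Assumption: memory is a dict of named locations; registers are a dict.

def run_stack(mem):
    stack = []
    stack.append(mem["A"])                    # Push A
    stack.append(mem["B"])                    # Push B
    stack.append(stack.pop() + stack.pop())   # Add (both operands implicit)
    mem["C"] = stack.pop()                    # Pop C
    return mem["C"]

def run_accumulator(mem):
    acc = mem["A"]                            # Load A
    acc = acc + mem["B"]                      # Add B (accumulator is implicit)
    mem["C"] = acc                            # Store C
    return mem["C"]

def run_register_memory(mem):
    regs = {}
    regs["R1"] = mem["A"]                     # Load R1, A
    regs["R1"] = regs["R1"] + mem["B"]        # Add R1, B (one memory operand)
    mem["C"] = regs["R1"]                     # Store C, R1
    return mem["C"]

def run_load_store(mem):
    regs = {}
    regs["R1"] = mem["A"]                     # Load R1, A
    regs["R2"] = mem["B"]                     # Load R2, B
    regs["R3"] = regs["R1"] + regs["R2"]      # Add R3, R1, R2 (registers only)
    mem["C"] = regs["R3"]                     # Store C, R3
    return mem["C"]

for run in (run_stack, run_accumulator, run_register_memory, run_load_store):
    print(run.__name__, run({"A": 2, "B": 3}))
```

All four produce the same result; what differs is the instruction count and how many operands each instruction names explicitly.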

Classifying ISA

  • load-store architecture: no memory references in ALU instructions
  • register-memory architecture: one memory operand per ALU instruction
  • memory-memory architecture: multiple memory operands per ALU instruction

Number of memory addresses   Maximum number of operands allowed   Examples
0                            3                                    SPARC, MIPS, Precision Architecture, PowerPC, ALPHA
1                            2                                    Intel 80x86, Motorola 68000
2                            2                                    VAX (also has 3-operand formats)
3                            3                                    VAX (also has 2-operand formats)

FIGURE 2.2 Possible combinations of memory operands and total operands per typical ALU instruction with examples of machines.

Classifying ISA

Advantages and disadvantages by type; (m,n) means m memory operands out of n total operands per ALU instruction.

Register-register (0,3)
  • Advantages: Simple, fixed-length instruction encoding. Simple code-generation model. Instructions take similar numbers of clocks to execute.
  • Disadvantages: Higher instruction count than architectures with memory references in instructions. Some instructions are short and bit encoding may be wasteful.

Register-memory (1,2)
  • Advantages: Data can be accessed without loading first. Instruction formats tend to be easy to encode and yield good density.
  • Disadvantages: Operands are not equivalent since a source operand in a binary operation is destroyed. Encoding a register number and a memory address in each instruction may restrict the number of registers. Clocks per instruction varies by operand location.

Memory-memory (3,3)
  • Advantages: Most compact. Doesn’t waste registers for temporaries.
  • Disadvantages: Large variation in instruction size, especially for 3-operand instructions. Also, large variation in work per instruction. Memory accesses create memory bottleneck.

  • Fewer alternatives simplify the compiler's task (fewer decisions for the compiler)
  • A wide variety of flexible instruction formats means fewer bits to encode the program (higher instruction density)

Addressing modes

Register
  Example: Add R4,R3
  Meaning: Regs[R4] ← Regs[R4] + Regs[R3]
  When used: When a value is in a register.

Immediate
  Example: Add R4,#3
  Meaning: Regs[R4] ← Regs[R4] + 3
  When used: For constants.

Displacement
  Example: Add R4,100(R1)
  Meaning: Regs[R4] ← Regs[R4] + Mem[100 + Regs[R1]]
  When used: Accessing local variables.

Register deferred or indirect
  Example: Add R4,(R1)
  Meaning: Regs[R4] ← Regs[R4] + Mem[Regs[R1]]
  When used: Accessing using a pointer or a computed address.

Indexed
  Example: Add R3,(R1 + R2)
  Meaning: Regs[R3] ← Regs[R3] + Mem[Regs[R1] + Regs[R2]]
  When used: Sometimes useful in array addressing: R1 = base of array; R2 = index amount.

Direct or absolute
  Example: Add R1,(1001)
  Meaning: Regs[R1] ← Regs[R1] + Mem[1001]
  When used: Sometimes useful for accessing static data; address constant may need to be large.

Memory indirect or memory deferred
  Example: Add R1,@(R3)
  Meaning: Regs[R1] ← Regs[R1] + Mem[Mem[Regs[R3]]]
  When used: If R3 is the address of a pointer p, then mode yields *p.

Autoincrement
  Example: Add R1,(R2)+
  Meaning: Regs[R1] ← Regs[R1] + Mem[Regs[R2]]; Regs[R2] ← Regs[R2] + d
  When used: Useful for stepping through arrays within a loop. R2 points to start of array; each reference increments R2 by size of an element, d.

Autodecrement
  Example: Add R1,-(R2)
  Meaning: Regs[R2] ← Regs[R2] - d; Regs[R1] ← Regs[R1] + Mem[Regs[R2]]
  When used: Same use as autoincrement. Autodecrement/increment can also act as push/pop to implement a stack.

Scaled
  Example: Add R1,100(R2)[R3]
  Meaning: Regs[R1] ← Regs[R1] + Mem[100 + Regs[R2] + Regs[R3]*d]
  When used: Used to index arrays. May be applied to any indexed addressing mode in some machines.

FIGURE 2.5 Selection of addressing modes with examples, meaning, and usage.
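The "Meaning" entries of Figure 2.5 amount to small address calculations. The sketch below computes the effective memory address for the modes that have one and no side effect. The function name, the dict-based register file and memory, and the default operand size d = 4 are assumptions made for illustration. Register and immediate modes (no memory address) and the autoincrement/autodecrement modes (which also update a register) are left out.

```python
def effective_address(mode, regs, mem, *, base=None, index=None, disp=0, d=4):
    """Return the memory address selected by an addressing mode.

    regs maps register names to values, mem maps addresses to values,
    and d is the operand size in bytes (used only by the scaled mode).
    """
    if mode == "displacement":        # disp(base)
        return disp + regs[base]
    if mode == "register_deferred":   # (base)
        return regs[base]
    if mode == "indexed":             # (base + index)
        return regs[base] + regs[index]
    if mode == "direct":              # (disp)
        return disp
    if mode == "memory_indirect":     # @(base)
        return mem[regs[base]]
    if mode == "scaled":              # disp(base)[index]
        return disp + regs[base] + regs[index] * d
    raise ValueError(f"unsupported mode: {mode}")

regs = {"R1": 1000, "R2": 8, "R3": 2}
mem = {1000: 2000}
print(effective_address("displacement", regs, mem, base="R1", disp=100))
print(effective_address("scaled", regs, mem, base="R2", index="R3", disp=100))
```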

Addressing modes

A wide choice of addressing modes:

  • can significantly reduce instruction counts
  • increases the complexity of building a machine
  • can lead to a high CPI

Knowledge of how the various addressing modes are actually used helps the architect choose what to include: instruction set measurement!

Measurements on addressing modes

[Figure: bar chart, frequency of the addressing mode (0% to 60%) for memory indirect, scaled, register deferred, immediate, and displacement modes, each measured on TeX, spice, and gcc.]

FIGURE 2.6 Summary of use of memory addressing modes (including immediates), using 3 programs of SPEC89 on a VAX

Immediate and displacement dominate addressing mode usage!

Range of displacement

[Figure: percentage of displacements (0% to 30%) plotted against value (1 to 15), for the integer average and the floating-point average.]

FIGURE 2.7 Displacement values are widely distributed.

  • A 12-bit displacement field captures 75% of the full 32-bit displacements
  • A 16-bit displacement field captures 99%!
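To relate a concrete displacement to these field widths, one can compute the smallest field that holds it. The sketch below assumes the displacement is stored as a signed two's-complement field. That is an assumption of this example, and the figure itself may count bits differently. The function name is hypothetical.

```python
def bits_needed(value):
    """Smallest two's-complement field width (in bits) that can hold value."""
    if value >= 0:
        return value.bit_length() + 1      # +1 for the sign bit
    return (-value - 1).bit_length() + 1   # e.g. -128 needs 8 bits

for disp in (100, 2047, 2048, -32768, 40000):
    print(disp, bits_needed(disp))
```

Under this assumption a 12-bit field covers displacements from -2048 to 2047 and a 16-bit field covers -32768 to 32767.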
Where are immediates used?

[Figure: bar chart, percentage of operations that use immediates (0% to 100%) for loads, compares, ALU operations, and all instructions, for the integer average and the floating-point average.]

FIGURE 2.8 We see that for integer ALU operations about one-half to three-quarters of the operations have an immediate operand, while for integer compares 75% to 85% of the occurrences use an immediate operand.

  • Integer programs use immediates in about 1/3 of the instructions
  • Floating-point programs use immediates in about 1/10 of the instructions!

Values for Immediate

[Figure: distribution (0% to 60%) of the number of bits needed for an immediate value (4 to 32 bits), for gcc, TeX, and spice.]

FIGURE 2.9 The distribution of immediate values is shown. Machine used: VAX.

75% to 80% of the immediates fit within 16 bits
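The remaining immediates do not fit a 16-bit field and must be built some other way. One common approach, shown here only as a generic sketch and not as something the slides prescribe, is to split a 32-bit constant into two 16-bit halves that are loaded in two steps (upper half first, then the lower half OR-ed in). The helper names are hypothetical.

```python
def fits_signed(value, width):
    """True if value is representable as a width-bit two's-complement number."""
    return -(1 << (width - 1)) <= value < (1 << (width - 1))

def split_halves(value):
    """Split a 32-bit pattern into (upper 16 bits, lower 16 bits)."""
    value &= 0xFFFFFFFF
    return value >> 16, value & 0xFFFF

def join_halves(high, low):
    """Rebuild the 32-bit pattern: shift the upper half up, OR in the lower half."""
    return ((high << 16) | low) & 0xFFFFFFFF

constant = 0x12345678
if not fits_signed(constant, 16):
    high, low = split_halves(constant)
    print(hex(high), hex(low), hex(join_halves(high, low)))
```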

Summary: Memory Addressing

  • 1. A new architecture should support at least the following addressing modes: displacement, immediate, and register deferred. These account for 75% to 99% of the addressing modes used in our measurements
  • 2. A displacement field of 12 to 16 bits will capture 75% to 99% of displacements
  • 3. An immediate field of 8 to 16 bits will capture 50% to 80% of the immediates

Typical operations in Instruction Set

Operator type: Examples

  • Arithmetic and logical: Integer arithmetic and logical operations: add, and, subtract, or
  • Data transfer: Loads-stores (move instructions on machines with memory addressing)
  • Control: Branch, jump, procedure call and return, traps
  • System: Operating system call, virtual memory management instructions
  • Floating point: Floating-point operations: add, multiply
  • Decimal: Decimal add, decimal multiply, decimal-to-character conversions
  • String: String move, string compare, string search
  • Graphics: Pixel operations, compression/decompression operations

FIGURE 2.10 Categories of instruction operators and examples of each.

Frequency of operations in 80x86 Inst. Set

Rank   80x86 instruction         Integer average (% total executed)
1      load                      22%
2      conditional branch        20%
3      compare                   16%
4      store                     12%
5      add                       8%
6      and                       6%
7      sub                       5%
8      move register-register    4%
9      call                      1%
10     return                    1%
       Total                     96%

FIGURE 2.11 The top 10 instructions for the 80x86.

Average values for 5 programs in SPECint92
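A running total makes the concentration visible: a handful of simple instructions covers most of what is executed. The sketch below uses only the percentages listed in Figure 2.11; the variable and function names are made up for the example. The printed entries add up to 95%, while the figure reports a total of 96%, presumably because the individual entries are rounded.

```python
# Percentages copied from Figure 2.11 (integer average, % of total executed).
top10 = [
    ("load", 22), ("conditional branch", 20), ("compare", 16), ("store", 12),
    ("add", 8), ("and", 6), ("sub", 5), ("move register-register", 4),
    ("call", 1), ("return", 1),
]

def cumulative(freqs):
    """Return (name, running total) pairs in rank order."""
    total = 0
    out = []
    for name, pct in freqs:
        total += pct
        out.append((name, total))
    return out

for name, running in cumulative(top10):
    print(f"{name:24s} {running:3d}%")
```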

Measurements on Instructions for Control Flow

[Figure: bar chart, frequency of branch classes (0% to 100%) for call/return, jump, and conditional branch, for the integer average and the floating-point average.]

FIGURE 2.12 Breakdown of control flow instructions into three classes: calls or returns, jumps, and conditional branches.

Size for Branch Displacement

[Figure: percentage of branches (0% to 40%) plotted against bits of branch displacement (1 to 15), for the integer average and the floating-point average.]

FIGURE 2.13 Branch distances in terms of number of instructions between the target and the branch instruction.

Measurements on Conditional Branches

[Figure: bar chart, frequency of comparison types in branches (0% to 100%) for less than/greater than or equal, greater than/less than or equal, and equal/not equal, for the integer average and the floating-point average.]

FIGURE 2.15 Frequency of different types of compares in conditional branches.

Summary: operations in the Instruction Set

  • 1. Note the importance of simple instructions: load, store, add, sub, …
  • 2. Branch addressing: targets lie up to about 100 instructions above or below the branch, so a PC-relative displacement of at least 8 bits is needed
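The 8-bit figure follows from the signed range of the field: 8 bits cover -128 to +127 instructions, while 7 bits cover only -64 to +63. The sketch below checks this and computes the target address. It assumes the displacement is counted in instructions relative to the branch's own address and that instructions are 4 bytes long; both are simplifying assumptions of this example, and the function name is hypothetical.

```python
def branch_target(pc, disp, width=8, instr_bytes=4):
    """Target of a PC-relative branch whose displacement counts instructions.

    Raises ValueError if disp does not fit a signed field of the given width.
    """
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if not lo <= disp <= hi:
        raise ValueError(f"displacement {disp} does not fit in {width} bits")
    return pc + disp * instr_bytes

print(branch_target(1000, 100))    # 100 instructions below the branch
print(branch_target(1000, -100))   # 100 instructions above the branch
```

Calling `branch_target(1000, 100, width=7)` raises an error, which is the sense in which fewer than 8 bits would not reach such targets.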

Type and Size of operands

[Figure: bar chart, frequency of reference by size (0% to 80%) for double word, word, half word, and byte, for the integer average and the floating-point average.]

FIGURE 2.16 Distribution of data accesses by size for the benchmark programs.

Encoding an Instruction Set

Tradeoff between several competing forces:

  • 1. The desire to have a large number of registers and addressing modes
  • 2. The impact of the size of the register and addressing-mode fields on the average instruction size, and hence on the average program size (# bits)
  • 3. The desire to encode instructions into lengths that will be easy to handle in the implementation

RISC: fixed-length instructions, to gain implementation benefits while sacrificing average code size!

Alternatives in Instruction Encoding

(a) Variable (e.g., VAX):
    Operation & no. of operands | Address specifier 1 | Address field 1 | ... | Address specifier n | Address field n

(b) Fixed (e.g., DLX, MIPS, Power PC, Precision Architecture, SPARC):
    Operation | Address field 1 | Address field 2 | Address field 3

(c) Hybrid (e.g., IBM 360/70, Intel 80x86):
    Operation | Address specifier | Address field
    Operation | Address specifier 1 | Address specifier 2 | Address field
    Operation | Address specifier | Address field 1 | Address field 2

FIGURE 2.17 Three basic variations in instruction encoding.

Summary: encoding the instruction set

  • The architect more interested in code size will pick variable encoding (CISC)
  • The one more interested in performance than code size will pick fixed encoding (RISC)
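A rough size calculation shows where the code-size advantage of variable encoding comes from. The model below is a simplified VAX-style assumption made for this example: one opcode byte, then one specifier byte per operand plus the bytes of its address field. The 4-byte fixed length and the three-instruction load-store equivalent are likewise illustrative assumptions.

```python
def variable_length(operand_field_bytes):
    """Bytes for a VAX-style instruction under the simplified model:
    1 opcode byte, then 1 specifier byte per operand plus its address field."""
    return 1 + sum(1 + field for field in operand_field_bytes)

FIXED_LENGTH = 4  # bytes per instruction in an assumed 32-bit fixed encoding

# addl3 r1, 737(r2), (r3): register, 16-bit displacement, register deferred
addl3 = [0, 2, 0]
print("variable:", variable_length(addl3), "bytes")

# A load-store machine would need roughly a load, an add, and a store.
print("fixed:   ", 3 * FIXED_LENGTH, "bytes")
```

Under these assumptions the single variable-length instruction takes half the space of the fixed-length sequence, at the cost of more complex decoding.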

The role of the compilers

  • Today almost all programming is done in high-level languages
  • Since most instructions executed are the output of a compiler, an ISA is essentially a compiler target!
  • Understanding today's compiler technology is crucial to designing and efficiently implementing an instruction set!

Compiler Structure

Goals in compiler design

  • Correctness
  • Speed of the compiled code
  • Compilation speed
  • The number of optimizations and the implementation complexity of a correct compiler must be traded off against each other (organization into multiple passes)

Problem: phase ordering

Phase Ordering

Phase ordering and the ISA

Example: global common subexpression elimination. Two instances of an expression compute the same value. The first value is saved and used in place of the second instance. The value must be saved in a register, but… register allocation happens later!

Graph coloring: at least 16 registers!
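The transformation can be sketched on a single basic block. This is the simpler local variant, not the global one named on the slide, and the tuple representation of instructions is an assumption of this example. Each instruction is (dest, op, src1, src2); a repeated expression is replaced by a copy of the variable that already holds its value.

```python
def eliminate_common_subexpressions(block):
    """Local CSE over straight-line code given as (dest, op, src1, src2) tuples."""
    available = {}   # (op, src1, src2) -> variable currently holding that value
    result = []
    for dest, op, a, b in block:
        key = (op, a, b)
        holder = available.get(key)
        if holder is not None:
            result.append((dest, "copy", holder, None))   # reuse saved value
        else:
            result.append((dest, op, a, b))
        # Writing dest kills every expression that reads dest or is held in dest.
        available = {k: v for k, v in available.items()
                     if dest not in k[1:] and v != dest}
        if dest not in (a, b) and key not in available:
            available[key] = dest
    return result

block = [
    ("t1", "add", "a", "b"),
    ("t2", "add", "a", "b"),    # same value as t1: becomes a copy
    ("a",  "add", "t1", "t2"),  # redefines a, so a + b is no longer available
    ("t3", "add", "a", "b"),    # must be recomputed
]
for instr in eliminate_common_subexpressions(block):
    print(instr)
```

The saved value (t1 here) has to stay alive until its last reuse, which is where the demand for registers comes from.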

Types of compiler optimizations

  • 1. High-level optimizations: act on the source code; the result is passed on to the later passes
  • 2. Local optimizations: optimize the code within a basic block
  • 3. Global optimizations: extend local optimizations across branches and apply transformations to optimize loops
  • 4. Register allocation
  • 5. Machine-dependent optimizations: try to exploit specific knowledge of the architecture

Optimization name: Explanation (percentage of the total number of optimizing transforms; N.M. = not measured)

High-level: at or near the source level; machine-independent
  • Procedure integration: Replace procedure call by procedure body (N.M.)

Local: within straight-line code
  • Common subexpression elimination: Replace two instances of the same computation by single copy (18%)
  • Constant propagation: Replace all instances of a variable that is assigned a constant with the constant (22%)
  • Stack height reduction: Rearrange expression tree to minimize resources needed for expression evaluation (N.M.)

Global: across a branch
  • Global common subexpression elimination: Same as local, but this version crosses branches (13%)
  • Copy propagation: Replace all instances of a variable A that has been assigned X (i.e., A = X) with X (11%)
  • Code motion: Remove code from a loop that computes same value each iteration of the loop (16%)
  • Induction variable elimination: Simplify/eliminate array-addressing calculations within loops (2%)

Machine-dependent: depends on machine knowledge
  • Strength reduction: Many examples, such as replace multiply by a constant with adds and shifts (N.M.)
  • Pipeline scheduling: Reorder instructions to improve pipeline performance (N.M.)
  • Branch offset optimization: Choose the shortest branch displacement that reaches target (N.M.)

FIGURE 2.19 Major types of optimizations and examples in each class.

Measurements on a set of Fortran, Pascal and C programs
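As one concrete case from the machine-dependent class, strength reduction can replace a multiply by a constant with shifts and adds. The sketch below decomposes a non-negative constant into its set bits; the function names are hypothetical, and a real compiler would also weigh the cost of the resulting sequence against the multiply.

```python
def shift_add_terms(constant):
    """Shift amounts s such that x * constant == sum of (x << s).

    Assumes a non-negative integer constant."""
    return [s for s in range(constant.bit_length()) if (constant >> s) & 1]

def multiply_by_shifts(x, constant):
    """Compute x * constant using only shifts and adds."""
    return sum(x << s for s in shift_add_terms(constant))

print(shift_add_terms(10))          # 10 = 2 + 8, so x*10 = (x << 1) + (x << 3)
print(multiply_by_shifts(7, 10))
```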

Effect of various optimizations

[Figure: stacked bars showing the percent of unoptimized instructions executed, broken down into FLOPs, loads-stores, integer ALU, and branches/calls, for each program and compiler optimization level.]

Program    Level 0   Level 1   Level 2   Level 3
li         100%      89%       75%       73%
hydro2d    100%      36%       26%       26%

FIGURE 2.20 Change in instruction count for the programs hydro2d and li from the SPEC92 as compiler optimization levels vary.

ISA choice and compiler designers

Guidelines for compiler design:

  • Make the frequent case fast!
  • Make the rare case correct!

ISA design should respect some properties:

  • Regularity (orthogonality among data types, operations, and addressing modes)
  • Provide primitives, not solutions!
  • A limited number of alternatives, to simplify the compiler's decision process

Criteria for the DLX ISA

  • 1. Use general purpose registers with a load-store architecture
  • 2. Addressing modes: displacement (16 bits of displacement value), immediate (size: 8-16 bits) and register deferred
  • 3. Support simple instructions (the most frequently executed!)
  • 4. Fixed instruction encoding to improve performance
  • 5. Provide at least 16 general purpose registers + separate floating-point registers
  • 6. All addressing modes apply to all data transfer instructions
  • 7. Aim for a minimalist instruction set

DLX architecture

DLX Instruction Format

I-type instruction

    Bits 0-5: Opcode (6 bits) | Bits 6-10: Rs1 (5 bits) | Bits 11-15: Rd (5 bits) | Bits 16-31: Immediate (16 bits)

Encodes:

  • Loads and stores of bytes, words, half words
  • All immediates (rd ← rs1 op immediate)
  • Conditional branch instructions (rs1 is register, rd unused)
  • Jump register, jump and link register (rd = 0, rs1 = destination, immediate = 0)

R-type instruction

    Bits 0-5: Opcode (6 bits) | Bits 6-10: Rs1 (5 bits) | Bits 11-15: Rs2 (5 bits) | Bits 16-20: Rd (5 bits) | Bits 21-31: Func (11 bits)

  • Register-register ALU operations: rd ← rs1 func rs2
  • Function encodes the datapath operation: Add, Sub, ...
  • Read/write special registers and moves

J-type instruction

    Bits 0-5: Opcode (6 bits) | Bits 6-31: Offset added to PC (26 bits)

  • Jump and jump and link
  • Trap and return from exception
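The three formats can be packed into 32-bit words with shifts and masks. The sketch below assumes that bit 0 in the slide's numbering is the most significant bit, so the opcode occupies the top 6 bits, and that the 16-bit immediate is sign-extended on decode. The opcode and function numbers used in the usage lines are placeholders, not actual DLX opcode assignments.

```python
def encode_i(opcode, rs1, rd, imm):
    """I-type: opcode(6) | rs1(5) | rd(5) | immediate(16)."""
    return ((opcode & 0x3F) << 26 | (rs1 & 0x1F) << 21
            | (rd & 0x1F) << 16 | (imm & 0xFFFF))

def encode_r(opcode, rs1, rs2, rd, func):
    """R-type: opcode(6) | rs1(5) | rs2(5) | rd(5) | func(11)."""
    return ((opcode & 0x3F) << 26 | (rs1 & 0x1F) << 21 | (rs2 & 0x1F) << 16
            | (rd & 0x1F) << 11 | (func & 0x7FF))

def encode_j(opcode, offset):
    """J-type: opcode(6) | offset added to PC(26)."""
    return (opcode & 0x3F) << 26 | (offset & 0x3FFFFFF)

def decode_i(word):
    """Split an I-type word into its fields, sign-extending the immediate."""
    imm = word & 0xFFFF
    if imm & 0x8000:
        imm -= 0x10000
    return {"opcode": word >> 26, "rs1": (word >> 21) & 0x1F,
            "rd": (word >> 16) & 0x1F, "imm": imm}

word = encode_i(0x23, 1, 4, -8)   # placeholder opcode 0x23
print(hex(word), decode_i(word))
```

Because every format is 32 bits wide with the opcode in the same position, a decoder can read the opcode first and then pick the field layout.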

Comparing ISAs

[Figure: MIPS/VAX performance ratio, instructions executed ratio, and CPI ratio (scale 0.0 to 4.0) for the SPEC 89 benchmarks li, eqntott, espresso, doduc, tomcatv, fpppp, nasa7, matrix, and spice.]

FIGURE 2.30 Ratio of MIPS M2000 to VAX 8700 in instructions executed and performance in clock cycles using SPEC89 programs.
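The three ratios in the figure are linked by the usual execution-time relation: time = instructions × CPI × cycle time, and performance = 1 / time. The sketch below combines them. The function name is hypothetical, all ratios are taken as MIPS over VAX, and the numbers in the usage line are made up for illustration, not read from the figure.

```python
def performance_ratio(instr_ratio, cpi_ratio, clock_ratio=1.0):
    """MIPS-over-VAX performance from MIPS-over-VAX ratios.

    instr_ratio  = instructions(MIPS) / instructions(VAX)
    cpi_ratio    = CPI(MIPS) / CPI(VAX)
    clock_ratio  = cycle time(MIPS) / cycle time(VAX); 1.0 when performance
                   is compared in clock cycles.
    """
    return 1.0 / (instr_ratio * cpi_ratio * clock_ratio)

# Illustrative values only: twice the instructions at a quarter of the CPI.
print(performance_ratio(2.0, 0.25))
```

With these illustrative values the machine that executes more instructions still comes out ahead, because its lower CPI more than compensates.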