- Prof. V.Catania
Lecture 4: Instruction Set Architecture
Calcolatori elettron. II 2003
Towards Evaluation of ISA and Organization
[Diagram labels: software, instruction set, hardware]
Evolution of Instruction Sets
FIGURE 2.2 Possible combinations of memory operands and total operands per typical ALU instruction with examples of machines.
Number of memory addresses | Maximum number of operands allowed | Examples
0 | 3 | SPARC, MIPS, Precision Architecture, PowerPC, ALPHA
1 | 2 | Intel 80x86, Motorola 68000
2 | 2 | VAX (also has 3-operand formats)
3 | 3 | VAX (also has 2-operand formats)
Type | Advantages | Disadvantages
Register-register (0,3) | Simple, fixed-length instruction encoding. Simple code-generation model. Instructions take similar numbers of clocks to execute. | Higher instruction count than architectures with memory references in instructions. Some instructions are short and bit encoding may be wasteful.
Register-memory (1,2) | Data can be accessed without loading first. Instruction format tends to be easy to encode and yields good density. | Operands are not equivalent since a source operand in a binary operation is destroyed. Encoding a register number and a memory address in each instruction may restrict the number of registers. Clocks per instruction varies by operand location.
Memory-memory (3,3) | Most compact. Doesn't waste registers for temporaries. | Large variation in instruction size, especially for 3-operand instructions. Also, large variation in work per instruction. Memory accesses create memory bottleneck.
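The trade-off between the three register-machine types can be made concrete by writing C = A + B for each of them. The sketch below is only an illustration: the mnemonics, register names, and the `summarize` helper are assumptions introduced here, not taken from any particular machine.

```python
# Hypothetical instruction sequences for C = A + B.
# Mnemonics and register names are illustrative, not a real ISA.
SEQUENCES = {
    "register-register (0,3)": [
        ("Load", "R1", "A"),
        ("Load", "R2", "B"),
        ("Add", "R3", "R1", "R2"),
        ("Store", "C", "R3"),
    ],
    "register-memory (1,2)": [
        ("Load", "R1", "A"),
        ("Add", "R1", "B"),      # one source operand comes from memory
        ("Store", "C", "R1"),
    ],
    "memory-memory (3,3)": [
        ("Add", "C", "A", "B"),  # all three operands are memory addresses
    ],
}

MEMORY_NAMES = {"A", "B", "C"}


def summarize(sequence):
    """Return (instruction count, number of memory operands) for a sequence."""
    memory_operands = sum(
        1
        for instruction in sequence
        for operand in instruction[1:]
        if operand in MEMORY_NAMES
    )
    return len(sequence), memory_operands


if __name__ == "__main__":
    for name, sequence in SEQUENCES.items():
        count, mem = summarize(sequence)
        print(f"{name}: {count} instructions, {mem} memory operands")
```

All three variants touch memory the same number of times; what changes is how many instructions are needed and how much work each one does.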
Addressing mode | Example instruction | Meaning | When used
Register | Add R4,R3 | Regs[R4] ← Regs[R4] + Regs[R3] | When a value is in a register.
Immediate | Add R4,#3 | Regs[R4] ← Regs[R4] + 3 | For constants.
Displacement | Add R4,100(R1) | Regs[R4] ← Regs[R4] + Mem[100 + Regs[R1]] | Accessing local variables.
Register deferred | Add R4,(R1) | Regs[R4] ← Regs[R4] + Mem[Regs[R1]] | Accessing using a pointer or a computed address.
Indexed | Add R3,(R1 + R2) | Regs[R3] ← Regs[R3] + Mem[Regs[R1] + Regs[R2]] | Sometimes useful in array addressing: R1 = base of array; R2 = index amount.
Direct or absolute | Add R1,(1001) | Regs[R1] ← Regs[R1] + Mem[1001] | Sometimes useful for accessing static data; address constant may need to be large.
Memory indirect (deferred) | Add R1,@(R3) | Regs[R1] ← Regs[R1] + Mem[Mem[Regs[R3]]] | If R3 is the address of a pointer p, then mode yields *p.
Autoincrement | Add R1,(R2)+ | Regs[R1] ← Regs[R1] + Mem[Regs[R2]]; Regs[R2] ← Regs[R2] + d | Useful for stepping through arrays within a loop. R2 points to start of array; each reference increments R2 by size of an element, d.
Autodecrement | Add R1,-(R2) | Regs[R2] ← Regs[R2] - d; Regs[R1] ← Regs[R1] + Mem[Regs[R2]] | Same use as autoincrement. Autodecrement/increment can also act as push/pop to implement a stack.
Scaled | Add R1,100(R2)[R3] | Regs[R1] ← Regs[R1] + Mem[100 + Regs[R2] + Regs[R3]*d] | Used to index arrays. May be applied to any indexed addressing mode in some machines.
FIGURE 2.5 Selection of addressing modes with examples, meaning, and usage.
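The "Meaning" column of Figure 2.5 translates almost directly into code. The following is a minimal sketch, assuming registers and memory are modelled as Python dictionaries; the function name `effective_address`, its parameters, and the default element size `d=4` are hypothetical choices made for this illustration.

```python
def effective_address(mode, regs, mem, reg=None, index=None, const=0, d=4):
    """Return the memory address used by an operand in the given mode.

    regs maps register names to values, mem maps addresses to values,
    const is the displacement or absolute address, d is the element size.
    Autoincrement and autodecrement also update regs[reg] as a side effect.
    """
    if mode == "displacement":        # const(reg)
        return const + regs[reg]
    if mode == "register_deferred":   # (reg)
        return regs[reg]
    if mode == "indexed":             # (reg + index)
        return regs[reg] + regs[index]
    if mode == "direct":              # (const)
        return const
    if mode == "memory_indirect":     # @(reg)
        return mem[regs[reg]]
    if mode == "autoincrement":       # (reg)+ : use the address, then step
        address = regs[reg]
        regs[reg] += d
        return address
    if mode == "autodecrement":       # -(reg) : step, then use the address
        regs[reg] -= d
        return regs[reg]
    if mode == "scaled":              # const(reg)[index]
        return const + regs[reg] + regs[index] * d
    raise ValueError(f"no memory address for mode {mode!r}")


if __name__ == "__main__":
    regs = {"R1": 200, "R2": 8, "R3": 3}
    mem = {200: 1000}
    print(effective_address("displacement", regs, mem, reg="R1", const=100))
    print(effective_address("scaled", regs, mem, reg="R2", index="R3", const=100))
```

Register and immediate modes are left out on purpose, since they do not produce a memory address.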
[Bar chart: frequency of the addressing mode (memory indirect, scaled, register deferred, immediate, displacement) for TeX, spice, and gcc; horizontal axis from 0% to 60%.]
FIGURE 2.6 Summary of use of memory addressing modes (including immediates), using 3 programs of SPEC89 on a VAX
FIGURE 2.7 Displacement values are widely distributed.
[Chart: percentage of displacement (0% to 30%) against value (1 to 15), for the integer average and the floating-point average.]
[Bar chart: percentage of operations that use immediates, for loads, compares, ALU operations, and all instructions; integer average and floating-point average.]
FIGURE 2.8 We see that for integer ALU operations about one-half to three-quarters of the operations have an immediate operand, while for integer compares 75% to 85% of the occurrences use an immediate operand.
FIGURE 2.9 The distribution of immediate values is shown. Machine used: VAX.
[Chart: percentage of immediates (0% to 60%) against number of bits needed for an immediate value (4 to 32), for gcc, TeX, and spice.]
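Figures 2.7 and 2.9 both classify values by the number of bits needed to hold them. A minimal sketch of that measurement, assuming values are stored in two's complement with the sign bit included in the count (the figures may count differently); `bits_needed` and `fits` are hypothetical helper names.

```python
def bits_needed(value):
    """Smallest two's-complement width, in bits, that can represent value."""
    if value >= 0:
        return value.bit_length() + 1     # magnitude bits plus a sign bit
    return (~value).bit_length() + 1      # ~value == -value - 1 for negatives


def fits(value, field_bits):
    """True if value can be encoded in a signed field of field_bits bits."""
    return bits_needed(value) <= field_bits


if __name__ == "__main__":
    for value in (3, -8, 127, 128, 40000):
        print(value, bits_needed(value), fits(value, 16))
```

Running such a function over every displacement or immediate in a program trace gives the kind of distribution the figures plot, which in turn suggests how wide the corresponding instruction field should be.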
Operator type | Examples
Arithmetic and logical | Integer arithmetic and logical operations: add, and, subtract, or
Data transfer | Loads-stores (move instructions on machines with memory addressing)
Control | Branch, jump, procedure call and return, traps
System | Operating system call, virtual memory management instructions
Floating point | Floating-point operations: add, multiply
Decimal | Decimal add, decimal multiply, decimal-to-character conversions
String | String move, string compare, string search
Graphics | Pixel operations, compression/decompression operations
FIGURE 2.10 Categories of instruction operators and examples of each.
Rank | 80x86 instruction | Integer average (% total executed)
1 | load | 22%
2 | conditional branch | 20%
3 | compare | 16%
4 | store | 12%
5 | add | 8%
6 | and | 6%
7 | sub | 5%
8 | move register-register | 4%
9 | call | 1%
10 | return | 1%
Total | | 96%
FIGURE 2.11 The top 10 instructions for the 80x86.
[Bar chart: frequency of branch classes (call/return, jump, conditional branch) for the integer average and the floating-point average.]
FIGURE 2.12 Breakdown of control flow instructions into three classes: calls or returns, jumps, and conditional branches.
FIGURE 2.13 Branch distances in terms of number of instructions between the target and the branch instruction.
[Chart: percentage of branches (0% to 40%) against bits of branch displacement (1 to 15), for the integer average and the floating-point average.]
[Bar chart: frequency of comparison types in branches (less than/greater than or equal, greater than/less than or equal, equal/not equal) for the integer average and the floating-point average.]
FIGURE 2.15 Frequency of different types of compares in conditional branches.
[Bar chart: frequency of reference by size (double word, word, half word, byte) for the integer average and the floating-point average.]
FIGURE 2.16 Distribution of data accesses by size for the benchmark programs.
(a) Variable (e.g., VAX): Operation & no. of operands | Address specifier 1 | Address field 1 | ... | Address specifier n | Address field n
(b) Fixed (e.g., DLX, MIPS, PowerPC, Precision Architecture, SPARC): Operation | Address field 1 | Address field 2 | Address field 3
(c) Hybrid (e.g., IBM 360/70, Intel 80x86), three formats:
    Operation | Address specifier | Address field
    Operation | Address specifier 1 | Address specifier 2 | Address field
    Operation | Address specifier | Address field 1 | Address field 2
FIGURE 2.17 Three basic variations in instruction encoding.
Optimization name | Explanation | Percentage of the total number of optimizing transforms
High-level | At or near the source level; machine-independent |
Procedure integration | Replace procedure call by procedure body | N.M.
Local | Within straight-line code |
Common subexpression elimination | Replace two instances of the same computation by single copy | 18%
Constant propagation | Replace all instances of a variable that is assigned a constant with the constant | 22%
Stack height reduction | Rearrange expression tree to minimize resources needed for expression evaluation | N.M.
Global | Across a branch |
Global common subexpression elimination | Same as local, but this version crosses branches | 13%
Copy propagation | Replace all instances of a variable A that has been assigned X (i.e., A = X) with X | 11%
Code motion | Remove code from a loop that computes same value each iteration of the loop | 16%
Induction variable elimination | Simplify/eliminate array-addressing calculations within loops | 2%
Machine-dependent | Depends on machine knowledge |
Strength reduction | Many examples, such as replace multiply by a constant with adds and shifts | N.M.
Pipeline scheduling | Reorder instructions to improve pipeline performance | N.M.
Branch offset optimization | Choose the shortest branch displacement that reaches target | N.M.
FIGURE 2.19 Major types of optimizations and examples in each class.
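Of the optimizations listed, strength reduction is easy to show in a few lines. The sketch below illustrates one simple form, replacing a multiply by a non-negative constant with one shift per set bit of the constant plus adds. It is not how any specific compiler does it, and the function names are hypothetical.

```python
def shift_add_terms(constant):
    """Return shift amounts s such that x * constant == sum of (x << s)."""
    if constant < 0:
        raise ValueError("this sketch only handles non-negative constants")
    # One term per set bit of the constant, least significant bit first.
    return [s for s in range(constant.bit_length()) if (constant >> s) & 1]


def multiply_by_constant(x, constant):
    """Compute x * constant using only shifts and adds."""
    return sum(x << s for s in shift_add_terms(constant))


if __name__ == "__main__":
    # 10 is binary 1010, so x * 10 becomes (x << 1) + (x << 3).
    print(shift_add_terms(10))
    print(multiply_by_constant(7, 10))
```

Whether the rewrite pays off depends on the machine, which is why the table files it under machine-dependent optimizations: it only helps when the shifts and adds together are cheaper than the multiply.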
[Stacked bar chart: percent of unoptimized instructions executed, by program and compiler optimization level, split into FLOPs, loads-stores, integer ALU, and branches/calls. li: level 0 100%, level 1 89%, level 2 75%, level 3 73%. hydro2d: level 0 100%, level 1 36%, level 2 26%, level 3 26%.]
FIGURE 2.20 Change in instruction count for the programs hydro2d and li from the SPEC92 as compiler optimization levels vary.
I-type instruction
Opcode (6 bits, bits 0-5) | Rs1 (5 bits, bits 6-10) | Rd (5 bits, bits 11-15) | Immediate (16 bits, bits 16-31)
Encodes: loads and stores of bytes, words, half words; all immediates (rd ← rs1 op immediate); conditional branch instructions (rs1 is register, rd unused); jump register, jump and link register (rd = 0, rs1 = destination, immediate = 0).

R-type instruction
Opcode (6 bits, bits 0-5) | Rs1 (5 bits, bits 6-10) | Rs2 (5 bits, bits 11-15) | Rd (5 bits, bits 16-20) | Func (11 bits, bits 21-31)
Register-register ALU operations: rd ← rs1 func rs2. Function encodes the datapath operation: Add, Sub, ... Read/write special registers and moves.

J-type instruction
Opcode (6 bits, bits 0-5) | Offset added to PC (26 bits, bits 6-31)
Jump and jump and link. Trap and return from exception.
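The three formats can be packed into 32-bit words with shifts and masks. This is a minimal sketch under two assumptions that the slide does not state explicitly: bit 0 in the figure is the most significant bit of the word, and the immediate and offset fields are signed. The function names and the opcode values used in the example are hypothetical.

```python
def _field(value, bits, signed=False):
    """Range-check value and return it masked to the given field width."""
    if signed:
        low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        low, high = 0, (1 << bits) - 1
    if not low <= value <= high:
        raise ValueError(f"{value} does not fit in {bits} bits")
    return value & ((1 << bits) - 1)


def encode_i_type(opcode, rs1, rd, immediate):
    """Opcode (6) | Rs1 (5) | Rd (5) | Immediate (16)."""
    return (_field(opcode, 6) << 26 | _field(rs1, 5) << 21
            | _field(rd, 5) << 16 | _field(immediate, 16, signed=True))


def encode_r_type(opcode, rs1, rs2, rd, func):
    """Opcode (6) | Rs1 (5) | Rs2 (5) | Rd (5) | Func (11)."""
    return (_field(opcode, 6) << 26 | _field(rs1, 5) << 21
            | _field(rs2, 5) << 16 | _field(rd, 5) << 11 | _field(func, 11))


def encode_j_type(opcode, offset):
    """Opcode (6) | Offset added to PC (26)."""
    return _field(opcode, 6) << 26 | _field(offset, 26, signed=True)


def decode_i_type(word):
    """Split a 32-bit word into (opcode, rs1, rd, signed immediate)."""
    immediate = word & 0xFFFF
    if immediate & 0x8000:            # sign-extend the 16-bit field
        immediate -= 0x10000
    return word >> 26, (word >> 21) & 0x1F, (word >> 16) & 0x1F, immediate


if __name__ == "__main__":
    # Opcode 0x23 is an arbitrary example value, not a documented DLX opcode.
    word = encode_i_type(0x23, rs1=1, rd=4, immediate=-8)
    print(hex(word), decode_i_type(word))
```

Because every format puts the opcode in the same six bits, a decoder can read the opcode first and only then decide how to split the remaining 26 bits.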
FIGURE 2.30 Ratio of MIPS M2000 to VAX 8700 in instructions executed and performance in clock cycles using SPEC89 programs.
[Chart: performance ratio, instructions executed ratio, and CPI ratio (MIPS/VAX) for the SPEC89 benchmarks li, eqntott, espresso, doduc, tomcatv, fpppp, nasa7, matrix, and spice; vertical axis from 0.0 to 4.0.]
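The three ratios in Figure 2.30 are tied together by the fact that clock cycles equal instructions executed times CPI. The sketch below shows that relation for two machines A and B. The function names and the sample numbers are invented for illustration and are not measurements from the figure.

```python
def clock_cycles(instruction_count, cpi):
    """Total clock cycles = instructions executed * cycles per instruction."""
    return instruction_count * cpi


def compare(ic_a, cpi_a, ic_b, cpi_b):
    """Return the ratios describing machine A relative to machine B.

    instructions_ratio: how many more instructions A executes than B.
    cpi_ratio:          how many more cycles per instruction B needs than A.
    performance_ratio:  how many more clock cycles B needs than A.
    """
    return {
        "instructions_ratio": ic_a / ic_b,
        "cpi_ratio": cpi_b / cpi_a,
        "performance_ratio": clock_cycles(ic_b, cpi_b) / clock_cycles(ic_a, cpi_a),
    }


if __name__ == "__main__":
    # Hypothetical numbers: A executes twice as many instructions as B,
    # but B needs six times as many cycles per instruction.
    ratios = compare(ic_a=2.0e9, cpi_a=1.5, ic_b=1.0e9, cpi_b=9.0)
    for name, value in ratios.items():
        print(name, value)
```

With these definitions the performance ratio is always the CPI ratio divided by the instructions ratio, so a machine that executes more instructions can still come out ahead if its CPI advantage is larger.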