Processor Organization and Performance

Chapter 6

S. Dandamudi

2003

To be used with S. Dandamudi, “Fundamentals of Computer Organization and Design,” Springer, 2003.

Outline

Introduction

Number of addresses
∗ 3-address machines
∗ 2-address machines
∗ 1-address machines
∗ 0-address machines
∗ Load/store architecture

Flow control
∗ Branching
∗ Procedure calls
∗ Delayed versions
∗ Parameter passing

Instruction set design issues
∗ Operand types
∗ Addressing modes
∗ Instruction types
∗ Instruction formats

Microprogrammed control
∗ Implementation issues

Performance
∗ Performance metrics
∗ Execution time calculation
∗ Means of performance
∗ The SPEC benchmarks

Introduction

We discuss three processor-related issues

» Instruction set design issues
– Number of addresses
– Addressing modes
– Instruction types
– Instruction formats
» Microprogrammed control
– Hardware implementation
– Software implementation
» Performance issues
– Performance metrics
– Standards

Number of Addresses

Four categories

∗ 3-address machines

» 2 for the source operands and one for the result

∗ 2-address machines

» One address doubles as source and result

∗ 1-address machines

» Accumulator machines
» Accumulator is used for one source and result

∗ 0-address machines

» Stack machines
» Operands are taken from the stack
» Result goes onto the stack

Number of Addresses (cont’d)

Three-address machines

∗ Two for the source operands, one for the result
∗ RISC processors use three addresses
∗ Sample instructions

    add  dest,src1,src2 ; M(dest)=[src1]+[src2]
    sub  dest,src1,src2 ; M(dest)=[src1]-[src2]
    mult dest,src1,src2 ; M(dest)=[src1]*[src2]

Number of Addresses (cont’d)

Example

∗ C statement

    A = B + C * D - E + F + A

∗ Equivalent code:

    mult T,C,D ;T = C*D
    add  T,T,B ;T = B+C*D
    sub  T,T,E ;T = B+C*D-E
    add  T,T,F ;T = B+C*D-E+F
    add  A,T,A ;A = B+C*D-E+F+A

Number of Addresses (cont’d)

Two-address machines

∗ One address doubles as a source operand and the result
∗ Last example makes a case for it

» Address T is used twice

∗ Sample instructions

    load dest,src ; M(dest)=[src]
    add  dest,src ; M(dest)=[dest]+[src]
    sub  dest,src ; M(dest)=[dest]-[src]
    mult dest,src ; M(dest)=[dest]*[src]

Number of Addresses (cont’d)

Example

∗ C statement

    A = B + C * D - E + F + A

∗ Equivalent code:

    load T,C ;T = C
    mult T,D ;T = C*D
    add  T,B ;T = B+C*D
    sub  T,E ;T = B+C*D-E
    add  T,F ;T = B+C*D-E+F
    add  A,T ;A = B+C*D-E+F+A

Number of Addresses (cont’d)

One-address machines

∗ Use a special set of registers called accumulators

» Specify one source operand & receive the result

∗ Called accumulator machines
∗ Sample instructions

    load  addr ; accum = [addr]
    store addr ; M[addr] = accum
    add   addr ; accum = accum + [addr]
    sub   addr ; accum = accum - [addr]
    mult  addr ; accum = accum * [addr]

Number of Addresses (cont’d)

Example

∗ C statement

    A = B + C * D - E + F + A

∗ Equivalent code:

    load  C ;load C into accum
    mult  D ;accum = C*D
    add   B ;accum = C*D+B
    sub   E ;accum = B+C*D-E
    add   F ;accum = B+C*D-E+F
    add   A ;accum = B+C*D-E+F+A
    store A ;store accum contents in A

Number of Addresses (cont’d)

Zero-address machines

∗ Stack supplies operands and receives the result

» Special instructions to load and store use an address

∗ Called stack machines (Ex: HP3000, Burroughs B5500)
∗ Sample instructions

    push addr ; push([addr])
    pop  addr ; pop([addr])
    add       ; push(pop + pop)
    sub       ; push(pop - pop)
    mult      ; push(pop * pop)

Number of Addresses (cont’d)

Example

∗ C statement

    A = B + C * D - E + F + A

∗ Equivalent code:

    push E
    push C
    push D
    mult      ;top = C*D
    push B
    add       ;top = B+C*D
    sub       ;top = B+C*D-E
    push F
    add       ;top = B+C*D-E+F
    push A
    add       ;top = B+C*D-E+F+A
    pop  A    ;store the result in A
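As a rough check of the zero-address example, the following Python sketch interprets the five sample instructions and runs the example's instruction sequence, one instruction per list entry. The function name `run_stack_program` and the memory values are illustrative assumptions, not part of the original slides.

```python
def run_stack_program(program, memory):
    """Interpret a tiny zero-address program; memory maps names to values."""
    stack = []
    for line in program:
        op, *operand = line.split()
        if op == "push":
            stack.append(memory[operand[0]])
        elif op == "pop":
            memory[operand[0]] = stack.pop()
        else:
            first = stack.pop()    # top of stack
            second = stack.pop()   # element below the top
            if op == "add":
                stack.append(first + second)
            elif op == "sub":
                stack.append(first - second)   # push(pop - pop)
            elif op == "mult":
                stack.append(first * second)
            else:
                raise ValueError(f"unknown instruction: {op}")
    return memory


PROGRAM = ["push E", "push C", "push D", "mult", "push B", "add", "sub",
           "push F", "add", "push A", "add", "pop A"]

# Hypothetical operand values, chosen only for the demonstration
mem = run_stack_program(PROGRAM, {"A": 7, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6})
print(mem["A"])  # 2 + 3*4 - 5 + 6 + 7 = 22
```

Note that E is pushed first so that `sub`, defined as push(pop - pop), subtracts it from the B+C*D value sitting on top of the stack.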

Load/Store Architecture

Instructions expect operands in internal processor registers

∗ Special LOAD and STORE instructions move data between registers and memory
∗ RISC and vector processors use this architecture
∗ Reduces instruction length

Load/Store Architecture (cont’d)

Sample instructions

    load  Rd,addr    ;Rd = [addr]
    store addr,Rs    ;(addr) = Rs
    add   Rd,Rs1,Rs2 ;Rd = Rs1 + Rs2
    sub   Rd,Rs1,Rs2 ;Rd = Rs1 - Rs2
    mult  Rd,Rs1,Rs2 ;Rd = Rs1 * Rs2

Number of Addresses (cont’d)

Example

∗ C statement

    A = B + C * D - E + F + A

∗ Equivalent code:

    load  R1,B
    load  R2,C
    load  R3,D
    load  R4,E
    load  R5,F
    load  R6,A
    mult  R2,R2,R3 ;R2 = C*D
    add   R2,R2,R1 ;R2 = B+C*D
    sub   R2,R2,R4 ;R2 = B+C*D-E
    add   R2,R2,R5 ;R2 = B+C*D-E+F
    add   R2,R2,R6 ;R2 = B+C*D-E+F+A
    store A,R2     ;store the result in A

Flow of Control

Default is sequential flow
Several instructions alter this default execution

∗ Branches

» Unconditional
» Conditional
» Delayed branches

∗ Procedure calls

» Delayed procedure calls

Flow of Control (cont’d)

Branches

∗ Unconditional

» Absolute address
» PC-relative
– Target address is specified relative to PC contents

∗ Example: MIPS

» Absolute address

j target

» PC-relative

b target
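To make the two target calculations concrete, here is a minimal Python sketch using the MIPS conventions: a jump combines a 26-bit word address with the upper four bits of PC + 4, and a branch adds a signed 16-bit word offset to PC + 4. The function names and the sample addresses are assumptions made for illustration.

```python
def pc_relative_target(pc, offset16):
    """Branch target: signed 16-bit word offset relative to the next instruction."""
    if offset16 & 0x8000:              # sign-extend the 16-bit field
        offset16 -= 0x10000
    return (pc + 4 + (offset16 << 2)) & 0xFFFFFFFF


def absolute_target(pc, addr26):
    """Jump target: 26-bit word address placed under the top 4 bits of PC + 4."""
    return ((pc + 4) & 0xF0000000) | ((addr26 & 0x03FFFFFF) << 2)


# Hypothetical instruction address and field values
print(hex(pc_relative_target(0x00400010, 0xFFFE)))   # offset of -2 words
print(hex(absolute_target(0x00400010, 0x0100008)))
```

The PC-relative form keeps the code position-independent and needs fewer bits, while the absolute form reaches any word within the current 256 MB region.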

Flow of Control (cont’d)

Branches

∗ Conditional

» Jump is taken only if the condition is met

∗ Two types

» Set-Then-Jump
– Condition testing is separated from branching
– Condition code registers are used to convey the condition test result
» Example: Pentium code

    cmp AX,BX
    je  target

Flow of Control (cont’d)

» Test-and-Jump
– Single instruction performs condition testing and branching
» Example: MIPS instruction

beq Rsrc1,Rsrc2,target

Jumps to target if Rsrc1 = Rsrc2

Delayed branching

∗ Control is transferred after executing the instruction that follows the branch instruction

» This instruction slot is called delay slot

∗ Improves efficiency

Flow of Control (cont’d)

Procedure calls

∗ Requires two pieces of information to return

» End of procedure
– Pentium uses ret instruction
– MIPS uses jr instruction
» Return address
– In a (special) register: MIPS allows any general-purpose register
– On the stack: Pentium

Flow of Control (cont’d)

Delay slot

Parameter Passing

Two basic techniques

∗ Register-based

» Internal registers are used
– Faster
– Limits the number of parameters

∗ Stack-based

» Stack is used
– More general

Recent processors use

∗ Register window mechanism

» Examples: SPARC and Itanium (discussed in later chapters)

Operand Types

Instructions support basic data types

∗ Characters
∗ Integers
∗ Floating-point

Instruction overload

∗ Same instruction for different data types
∗ Example: Pentium

    mov AL,address  ;loads an 8-bit value
    mov AX,address  ;loads a 16-bit value
    mov EAX,address ;loads a 32-bit value

Operand Types

Separate instructions

∗ Instructions specify the operand size
∗ Example: MIPS

    lb Rdest,address ;loads a byte
    lh Rdest,address ;loads a halfword (16 bits)
    lw Rdest,address ;loads a word (32 bits)
    ld Rdest,address ;loads a doubleword (64 bits)

Addressing Modes

Refers to how the operands are specified

∗ Operands can be in three places

» Registers
– Register addressing mode
» Part of instruction
– Constant
– Immediate addressing mode
– All processors support these two addressing modes
» Memory
– Difference between RISC and CISC
– CISC supports a large variety of addressing modes
– RISC follows load/store architecture

Addressing Modes (cont’d)

∗ Most RISC processors support two memory addressing modes
– address = Register + constant
– address = Register + Register
∗ CISC processors like Pentium support a variety of addressing modes

» Motivation: To efficiently support high-level language data structures
– Example: Accessing a 2-D array
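The 2-D array case can be sketched as follows. This is an illustrative calculation assuming row-major storage; the function name and the sample array are hypothetical. The address has the shape base + index * scale, which is what a scaled-index addressing mode on a CISC processor can compute inside a single instruction.

```python
def element_address(base, row, col, num_cols, elem_size):
    """Byte address of a[row][col] in a row-major 2-D array."""
    index = row * num_cols + col        # element number counted from the start
    return base + index * elem_size     # elem_size plays the role of the scale factor


# Hypothetical 4 x 5 array of 4-byte integers starting at address 1000
print(element_address(1000, 2, 3, num_cols=5, elem_size=4))  # 1000 + 13*4 = 1052
```

On a load/store RISC processor the same address is typically built up with separate multiply (or shift) and add instructions before the load.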

Instruction Types

Several types of instructions

∗ Data movement

» Pentium: mov dest,src
» Some do not provide direct data movement instructions
» Indirect data movement

    add Rdest,Rsrc,0 ;Rdest = Rsrc+0

∗ Arithmetic and Logical

» Arithmetic
– Integer and floating-point, signed and unsigned
– add, subtract, multiply, divide
» Logical
– and, or, not, xor

Instruction Types (cont’d)

Condition code bits

∗ S: Sign bit (0 = +, 1 = −)
∗ Z: Zero bit (0 = nonzero, 1 = zero)
∗ O: Overflow bit (0 = no overflow, 1 = overflow)
∗ C: Carry bit (0 = no carry, 1 = carry)

Example: Pentium

    cmp count,25
    je  target
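The following Python sketch shows one way the four condition code bits could be derived for a compare, which subtracts its operands and keeps only the flags. It assumes a 16-bit word and the x86 convention that the carry bit acts as a borrow on subtraction; `cmp_flags` and `je_taken` are illustrative names.

```python
def cmp_flags(a, b, bits=16):
    """Return the S, Z, O, C bits that computing a - b would set."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    result = (a - b) & mask
    return {
        "S": int(bool(result & sign)),
        "Z": int(result == 0),
        # signed overflow: operand signs differ and the result sign differs from a
        "O": int(bool((a ^ b) & (a ^ result) & sign)),
        # unsigned borrow (assumed x86-style carry on subtraction)
        "C": int(a < b),
    }


def je_taken(flags):
    """je transfers control when the zero bit is set."""
    return flags["Z"] == 1


print(cmp_flags(25, 25))            # {'S': 0, 'Z': 1, 'O': 0, 'C': 0}
print(je_taken(cmp_flags(25, 25)))  # True
```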

Instruction Types (cont’d)

∗ Flow control instructions

» Branch
» Procedure call
» Interrupts

∗ I/O instructions

» Memory-mapped I/O
– Most processors support memory-mapped I/O
– No separate instructions for I/O
» Isolated I/O
– Pentium supports isolated I/O
– Separate I/O instructions

    in  AX,io_port ;read from an I/O port
    out io_port,AX ;write to an I/O port

Instruction Formats

Two types

∗ Fixed-length

» Used by RISC processors
» 32-bit RISC processors use 32-bit wide instructions
– Examples: SPARC, MIPS, PowerPC
» 64-bit Itanium uses 41-bit wide instructions

∗ Variable-length

» Used by CISC processors
» Memory operands need more bits to specify

Opcode

∗ Major and exact operation

Microprogrammed Control

Introduced in Chapter 1
1-bus datapath

∗ Assume all entities are 32-bit wide
∗ PC register

» Program counter

∗ IR register

» Holds the instruction to be executed

∗ MAR register

» Holds the address of the memory location to be read or written

∗ MDR register

» Holds the operand for memory operations

Microprogrammed Control (cont’d)

1-bus datapath

Microprogrammed Control (cont’d)

ALU circuit details

Microprogrammed Control (cont’d)

Has 32 32-bit general-purpose registers

∗ Interface only with the A-bus
∗ Each register has two control signals

» Gxin and Gxout

Control signals used by the other registers

∗ PC register:

» PCin, PCout, and PCbout

∗ IR register:

» IRout and IRbin

∗ MAR register:

» MARin, MARout, and MARbout

∗ MDR register:

» MDRin, MDRout, MDRbin and MDRbout

Microprogrammed Control (cont’d)

Memory interface implementation details

Microprogrammed Control (cont’d)

add %G9,%G5,%G7 is implemented as

» Transfer G5 contents to A register
– Assert G5out and Ain
» Place G7 contents on the A bus
– Assert G7out
» Instruct ALU to perform addition
– Appropriate ALU function control signals
» Latch the result in the C register
– Assert Cin
» Transfer contents of the C register to G9
– Assert Cout and G9in

Microprogrammed Control (cont’d)

Example instruction groups

∗ Load/store

» Moves data between registers and memory

∗ Register

» Arithmetic and logic instructions

∗ Branch

» Jump direct/indirect

∗ Call

» Procedure invocation mechanisms

∗ More…

Microprogrammed Control (cont’d)

High-level FSM for instruction execution

Microprogrammed Control (cont’d)

Implementation

∗ Hardware

» Typical approach in RISC processors

∗ Software

» Typical approach in CISC processors

Hardware implementation

∗ PLA based implementation shown

» Three control inputs
– Opcode via the IR register
– Status and condition codes
– Counter to keep track of the steps in instruction execution

Microprogrammed Control (cont’d)

Controller implementation

Microprogrammed Control (cont’d)

Software implementation

∗ Typically used in CISC

» Hardware implementation is complex and expensive

Example

add %G9,%G5,%G7

∗ Three steps

    S1 G5out: Ain;
    S2 G7out: ALU=add: Cin;
    S3 Cout: G9in: end;
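The three steps can be traced with a toy model of the 1-bus datapath. This is only a sketch under simplifying assumptions: each step lists the asserted signals, an `Xout` signal drives the single bus, an `Xin` signal latches the bus into register X, and `Cin` latches the ALU output. The `end` signal is omitted, and the function name and register values are hypothetical.

```python
def run_steps(steps, regs):
    """Apply control-signal steps to a toy 1-bus datapath and return the registers."""
    state = dict(regs)
    state.setdefault("A", 0)   # ALU input latch
    state.setdefault("C", 0)   # ALU output latch
    for signals in steps:
        drivers = [s[:-3] for s in signals if s.endswith("out")]
        if len(drivers) > 1:
            raise ValueError("two registers driving the single bus")
        bus = state[drivers[0]] if drivers else None
        alu = state["A"] + bus if "ALU=add" in signals else None
        for s in signals:
            if s == "Cin":
                state["C"] = alu          # C latches the ALU output
            elif s.endswith("in"):
                state[s[:-2]] = bus       # other registers latch the bus
    return state


ADD_G9_G5_G7 = [
    ["G5out", "Ain"],              # S1
    ["G7out", "ALU=add", "Cin"],   # S2
    ["Cout", "G9in"],              # S3
]

final = run_steps(ADD_G9_G5_G7, {"G5": 30, "G7": 12, "G9": 0})
print(final["G9"])  # 42
```

The bus-conflict check reflects why the 1-bus design needs three steps: only one register can place its value on the bus in any given step.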

Microprogrammed Control (cont’d)

Uses a microprogram to generate the control signals

∗ Encode the signals of each step as a codeword

» Called microinstruction

∗ An instruction is expressed by a sequence of codewords

» Called microroutine

Microprogram essentially implements the FSM discussed before

A simple microprogram structure is on the next slide

Microprogrammed Control (cont’d)

Simple microcode organization

Microprogrammed Control (cont’d)

A simple microcontroller can execute a microprogram to generate the control signals

∗ Control store

» Stores microprogram

∗ Uses µPC

» Similar to PC

∗ Address generator

» Generates appropriate address depending on the
– Opcode, and
– Condition code inputs

Microprogrammed Control (cont’d)

Microcontroller

Microprogrammed Control (cont’d)

Problems with previous design:

∗ Makes microprograms long by replicating the common parts of microcode

Efficient way:

∗ Keep only one copy of common code
∗ Use branching to jump to the appropriate microroutine

Microprogrammed Control (cont’d)

Microinstruction format

∗ Two basic ways

» Horizontal organization
» Vertical organization

∗ Horizontal organization

– One bit for each signal
– Very flexible
– Long microinstructions
– Example: 1-bus datapath needs 90 bits for each microinstruction

Microprogrammed Control (cont’d)

Horizontal microinstruction format

Microprogrammed Control (cont’d)

Vertical organization

∗ Encodes to reduce microinstruction length

» Reduced flexibility

∗ Example:

» Horizontal organization
– 64 control signals for the 32 general purpose registers
» Vertical organization
– 5 bits to identify the register and 1 for in/out
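The register example can be illustrated with a short Python sketch that contrasts the two encodings. The bit layouts chosen here (which bit is Gxin, which is Gxout, and where the in/out flag sits) are assumptions for illustration, not the layout used in the book.

```python
NUM_REGS = 32


def horizontal_field(reg, direction):
    """One bit per signal: bit 2*reg is Gxin, bit 2*reg + 1 is Gxout (assumed layout)."""
    return 1 << (2 * reg + (1 if direction == "out" else 0))


def vertical_field(reg, direction):
    """5-bit register number followed by a 1-bit in/out flag (assumed layout)."""
    return (reg << 1) | (1 if direction == "out" else 0)


def decode_vertical(field):
    """Expand the 6-bit vertical field back into the 64-bit one-hot form."""
    reg, out = field >> 1, field & 1
    return 1 << (2 * reg + out)


h = horizontal_field(5, "out")   # G5out as one of 64 signal bits
v = vertical_field(5, "out")     # G5out as a 6-bit code
print(h.bit_length() <= 2 * NUM_REGS, v < 2 ** 6, decode_vertical(v) == h)
```

The reduced flexibility shows up in `vertical_field`: it can name only one general-purpose register signal per microinstruction, whereas the horizontal form could set several of its 64 bits at once. The decoder corresponds to the extra circuitry a vertical organization needs between the control store and the datapath.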

Microprogrammed Control (cont’d)

General register control circuit

Microprogrammed Control (cont’d)

Microcontroller for vertical microcode

Microprogrammed Control (cont’d)

Adding more buses reduces time needed to execute instructions

∗ No need to multiplex the bus

Example

    add %G9,%G5,%G7

∗ Needed three steps in 1-bus datapath
∗ Need only two steps with a 2-bus datapath

    S1 G5out: Ain;
    S2 G7out: ALU=add: G9in;

Microprogrammed Control (cont’d)

2-bus datapath

Performance

Two popular metrics

∗ Response time

» User-oriented

∗ Throughput

» System-oriented

Performance of components

∗ Processors, networks, disks,…
∗ Some simple metrics

» MIPS (millions of instructions per second)
– Simple instruction execution rate
» MFLOPS (millions of floating-point operations per second)

Performance (cont’d)

Calculating execution time

∗ Three factors

» Instruction count (IC)
– CISC processors have simple to complex instructions
» Clocks per instruction (CPI)
– RISC vs. CISC differences
» Clock period (T)

∗ Execution time = IC * CPI * T
∗ This is not response time

» Not considering queuing delays
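A quick worked instance of the execution time formula, written as a Python sketch. The instruction count, CPI, and clock rate below are made-up values used only to show the arithmetic.

```python
def execution_time(instruction_count, cpi, clock_period_s):
    """Execution time = IC * CPI * T, with T in seconds."""
    return instruction_count * cpi * clock_period_s


# Hypothetical program: 2 million instructions, CPI of 1.5, 1 GHz clock (T = 1 ns)
t = execution_time(2_000_000, 1.5, 1e-9)
print(f"{t * 1e3:.1f} ms")  # 3.0 ms
```

Because the three factors multiply, a processor with a lower instruction count can still lose if its CPI or clock period is proportionally higher, which is the essence of the RISC versus CISC trade-off noted above.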

Performance (cont’d)

Means of performance

∗ Arithmetic mean

» Equal weight

∗ Weighted arithmetic mean

» Different weights can be assigned

∗ Geometric mean

» Geometric mean of $a_1, a_2, \ldots, a_n$ is

$(a_1 \times a_2 \times \cdots \times a_n)^{1/n}$

∗ Weighted geometric mean

$a_1^{w_1} \times a_2^{w_2} \times \cdots \times a_n^{w_n}$

Performance (cont’d)

              Resp. time on machine              Normalized values
              REF     A       B       Ratio      A       B       Ratio
Program 1     10      11      12                 1.1     1.2
Program 2     40      49.5    60                 1.24    1.5
Arith. mean           30.25   36      1.19       1.17    1.35    1.16
Geo. mean             23.33   26.83   1.15       1.167   1.342   1.15
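The figures in the table can be reproduced with a few lines of Python. This sketch takes the Ratio column to be machine B's mean divided by machine A's mean, which matches the tabulated values; the helper names are illustrative.

```python
import math


def arith(values):
    return sum(values) / len(values)


def geo(values):
    return math.prod(values) ** (1 / len(values))


ref = [10, 40]        # reference machine
a = [11, 49.5]        # machine A
b = [12, 60]          # machine B
norm_a = [x / r for x, r in zip(a, ref)]
norm_b = [x / r for x, r in zip(b, ref)]

# B/A ratio from raw times, then from times normalized to REF
print(round(arith(b) / arith(a), 2), round(arith(norm_b) / arith(norm_a), 2))  # 1.19 1.16
print(round(geo(b) / geo(a), 2), round(geo(norm_b) / geo(norm_a), 2))          # 1.15 1.15
```

The arithmetic mean gives a different B/A ratio once the times are normalized, while the geometric mean gives the same ratio either way.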

Performance (cont’d)

Problem with arithmetic mean

              Resp. time on machine
              A        B
Program 1     20       200
Program 2     50       5
Arith. mean   35       102.5
Geo. mean     31.62    31.62
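A short Python check of this example follows. The weighted variant is included only to mirror the weighted geometric mean formula and assumes the weights sum to 1; all function names are illustrative.

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    return math.prod(values) ** (1 / len(values))


def weighted_geometric_mean(values, weights):
    """Product of values raised to their weights (weights assumed to sum to 1)."""
    return math.prod(v ** w for v, w in zip(values, weights))


machine_a = [20, 50]
machine_b = [200, 5]
print(arithmetic_mean(machine_a), arithmetic_mean(machine_b))   # 35.0 102.5
print(round(geometric_mean(machine_a), 2),
      round(geometric_mean(machine_b), 2))                      # 31.62 31.62
```

The arithmetic mean is dominated by the single 200-unit run on machine B, whereas the geometric mean rates the two machines equally because a 10x slowdown on one program is offset by a 10x speedup on the other.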

Performance (cont’d)

SPEC Benchmarks

∗ SPEC CPU2000

» Measures performance of processors, memory, and compiler
» Consists of 26 applications
– Spans four languages: C, C++, FORTRAN 77, and FORTRAN 90
» Consists of
– Integer: CINT2000
– Floating-point: CFP2000

Performance (cont’d)

[Figure: SPECint2000 versus clock rate (MHz) for the PIII and P4 processors]

Performance (cont’d)

[Figure: SPECfp2000 versus clock rate (MHz) for the PIII and P4 processors]
