Chapter 12 CPU Structure and Function Contents Processor - - PowerPoint PPT Presentation

SLIDE 1

Chapter 12 CPU Structure and Function

SLIDE 2

Contents

  • Processor organization
  • Register organization
  • Instruction cycle
  • Instruction pipelining
  • Pentium processor
  • PowerPC processor
SLIDE 3

12.1 Processor Organization

  • Requirements on CPU

—Fetch instructions
—Interpret instructions
—Fetch data
—Process data
—Write data

  • CPU consists of

—ALU
—Control unit
—Registers
—Internal bus

SLIDE 4

CPU With the System Bus

SLIDE 5

CPU Internal Structure

SLIDE 6

12.2 Register Organization

  • Design issues

— General-purpose registers (GPRs) vs. specialized registers

– Specialized registers for particular operands

+ e.g., in the 8086 (16-bit addressing), only BX, BP, SI, and DI can hold an offset address for a memory operand
+ This saves bits, since fewer bits are needed to specify the register

– Specialization limits programmer’s flexibility

— Number of registers

– For CISC, between 8 and 32 registers is regarded as optimum

+ Fewer registers result in more memory references
+ More registers do not noticeably reduce memory references

– Some RISC designs use hundreds of registers

— Register length

– Address registers must be long enough to hold the longest address
– Data registers must be long enough to hold values of most data types

+ Some machines allow two consecutive registers to hold double-length values

SLIDE 7

User Visible Registers

  • GPR
  • Data register
  • Address register

—Segment pointers
—Index registers
—Stack pointer

  • Condition codes (flags)

—Set according to the result of operations
—Used for checking certain conditions
—Can be read (implicitly) by programs

– e.g., jump if zero

—Usually cannot be set directly by programs
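
To make the flag rules concrete, here is a minimal Python sketch of how an ALU might set sign, zero, carry, and overflow after an 8-bit addition. The 8-bit width, the helper name `add8_flags`, and the one-letter keys S, Z, C, V are illustrative assumptions rather than any particular processor's encoding; a conditional jump such as "jump if zero" simply tests the `Z` entry.

```python
from __future__ import annotations


def add8_flags(a: int, b: int) -> tuple[int, dict[str, int]]:
    """Add two 8-bit operands and derive condition codes from the result.

    Hypothetical helper for illustration; flag names S, Z, C, V are assumptions.
    """
    raw = a + b                      # full-width sum, before truncation
    result = raw & 0xFF              # what an 8-bit register would keep
    sign_a = (a >> 7) & 1
    sign_b = (b >> 7) & 1
    sign_r = (result >> 7) & 1
    flags = {
        "S": sign_r,                                      # sign: top bit of result
        "Z": int(result == 0),                            # zero: result is all zeros
        "C": int(raw > 0xFF),                             # carry out of bit 7
        "V": int(sign_a == sign_b and sign_r != sign_a),  # signed overflow
    }
    return result, flags


if __name__ == "__main__":
    result, flags = add8_flags(0xFF, 0x01)
    if flags["Z"]:                   # "jump if zero" reads the flag implicitly
        print("branch taken, result =", result)
```

Adding 0x7F and 0x01 gives 0x80 with S and V set (signed overflow), while 0xFF + 0x01 gives 0 with Z and C set.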

SLIDE 8

Control & Status Registers

  • Program Counter

— Updated after each instruction fetch
— Updated when a branch instruction is executed

  • Instruction Register
  • Memory Address Register

— Connected directly to address bus

  • Memory Buffer Register

— Connected directly to data bus

  • Program Status Word

— Sign, zero, carry, equal, overflow, interrupt enable/disable, supervisor mode

  • Others

— Pointer to PCB (Process Control Block)
— Interrupt vector register
— Stack-related registers
— Page table pointer

SLIDE 9

Supervisor Mode

  • Intel x86 processors support 4 privilege levels (rings); ring zero is the most privileged

—Ring zero

– Kernel functions

—Ring one

– Operating system functions

—Ring two

– May be used for DBMS

—Ring three

– User programs

SLIDE 10

Example Register Organizations

  • Motorola MC68000 (not including purely internal registers)

—8 data registers

– Used primarily for data manipulation

+ 8-, 16-, and 32-bit operations are possible

– Also used as index registers

—9 address registers

– 32 bits wide
– Include two stack pointers

+ One for user and one for system

—PC and status register

Otherwise, there are no special-purpose registers in this CPU.

SLIDE 11

Example Register Organizations

  • Intel 8086 (Every register is special purpose)

—4 16-bit data registers (usable as general-purpose registers in some instructions)

– AX, BX, CX, DX

—4 pointer and index registers

– SP, BP, SI, DI

—4 segment registers

– CS, DS, SS, ES

—Instruction pointer and flags

Every register has a special purpose, although some can also be used as general-purpose registers.

  • There is no universally accepted philosophy concerning the best way to organize CPU registers.
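
The 8086 segment registers listed above are combined with a 16-bit offset to form a memory address. A minimal Python sketch of real-mode address formation (segment value shifted left 4 bits plus the offset, giving a 20-bit physical address) follows; the helper name is hypothetical, and the wrap-around at 1 MB models the original 8086 rather than later processors.

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 real-mode address: segment * 16 + offset, truncated to 20 bits.

    Hypothetical helper; an instruction fetch would use CS:IP and a
    stack access SS:SP.
    """
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    return ((segment << 4) + offset) & 0xFFFFF   # 20 address lines on the 8086


if __name__ == "__main__":
    print(hex(physical_address(0x1234, 0x0010)))   # CS = 0x1234, IP = 0x0010; prints 0x12350
```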

SLIDE 12

Example Register Organizations

SLIDE 13

12.3 Instruction Cycle

  • Subcycles of instruction cycle

—Fetch
—Execute
—Interrupt
—Indirect (newly added)

  • Indirect cycle

—Indirect addressing requires an additional memory access
—Can be thought of as an additional instruction subcycle

SLIDE 14

Instruction Cycle State Diagram

SLIDE 15

Data Flow

  • Fetch cycle

—PC contains the address of the next instruction
—Address moved to MAR
—Address placed on the address bus
—Control unit requests a memory read
—Result placed on the data bus, copied to MBR, then to IR
—Meanwhile, PC incremented by 1 (to point to the next instruction)
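
The fetch-cycle steps above can be written as register transfers. This is a minimal Python sketch under stated assumptions: a toy 256-word memory, one memory word per instruction (so PC advances by 1), and a hypothetical `CPU` class whose attribute names simply mirror the register names on the slide.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CPU:
    """Toy register set for illustrating the fetch cycle (hypothetical model)."""
    pc: int = 0
    mar: int = 0
    mbr: int = 0
    ir: int = 0
    memory: list[int] = field(default_factory=lambda: [0] * 256)

    def fetch(self) -> None:
        self.mar = self.pc                  # t1: MAR <- PC (address onto the address bus)
        self.mbr = self.memory[self.mar]    # t2: MBR <- Memory[MAR] (memory read)
        self.pc += 1                        # t2: PC <- PC + 1, overlapped with the read
        self.ir = self.mbr                  # t3: IR <- MBR


if __name__ == "__main__":
    cpu = CPU(pc=0x10)
    cpu.memory[0x10] = 0xA5
    cpu.fetch()
    print(hex(cpu.ir), hex(cpu.pc))
```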

SLIDE 16

Data Flow, Fetch Cycle

SLIDE 17

Data Flow

  • Indirect cycle

—IR is examined
—If indirect addressing is used, an indirect cycle is performed

– Rightmost N bits of MBR (the address field) transferred to MAR
– Control unit requests a memory read
– Result (address of operand) moved to MBR
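
A minimal Python sketch of the indirect cycle as described: the address field is masked out of MBR into MAR, and the word read from memory (the operand's address) replaces MBR. The field width `ADDR_BITS = 8`, the dictionary of registers, and the example instruction layout (opcode above the address field) are hypothetical choices for illustration.

```python
from __future__ import annotations

ADDR_BITS = 8                       # hypothetical width N of the address field
ADDR_MASK = (1 << ADDR_BITS) - 1


def indirect_cycle(regs: dict[str, int], memory: list[int]) -> None:
    """Turn the pointer named in the instruction into the operand's address."""
    regs["MAR"] = regs["MBR"] & ADDR_MASK   # t1: MAR <- rightmost N bits of MBR
    regs["MBR"] = memory[regs["MAR"]]       # t2: MBR <- Memory[MAR] (operand address)


if __name__ == "__main__":
    memory = [0] * 256
    memory[0x05] = 0x40                 # location 5 holds a pointer to the operand
    regs = {"MAR": 0, "MBR": 0x305}     # hypothetical word: opcode 3, address field 0x05
    indirect_cycle(regs, memory)
    print(hex(regs["MBR"]))
```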

SLIDE 18

Data Flow, Indirect Cycle

SLIDE 19

Data Flow

  • Execute cycle

—May take many forms, depending on the instruction
—May include

– Memory read/write
– Input/output
– Register transfers
– ALU operations

SLIDE 20

Data Flow

  • Interrupt cycle

—Current PC saved to allow resumption after the interrupt

– Contents of PC copied to MBR
– Special memory location (e.g., from the stack pointer) loaded into MAR
– MBR written to memory

—PC loaded with the address of the interrupt handling routine
—Next instruction (the first of the interrupt handler) can be fetched
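
The interrupt-cycle transfers above can be sketched in a few lines of Python. The save address and handler address are passed in as plain numbers here; where they come from (a stack pointer, a vector table) is outside this toy model, and the function name is hypothetical.

```python
from __future__ import annotations


def interrupt_cycle(regs: dict[str, int], memory: list[int],
                    save_address: int, handler_address: int) -> None:
    """Save the PC and vector to the interrupt handler (hypothetical toy model)."""
    regs["MBR"] = regs["PC"]             # t1: MBR <- PC (return address)
    regs["MAR"] = save_address           # t2: MAR <- save address (e.g. top of stack)
    regs["PC"] = handler_address         # t2: PC <- address of the handler routine
    memory[regs["MAR"]] = regs["MBR"]    # t3: Memory[MAR] <- MBR


if __name__ == "__main__":
    memory = [0] * 256
    regs = {"PC": 0x42, "MAR": 0, "MBR": 0}
    interrupt_cycle(regs, memory, save_address=0xFF, handler_address=0x80)
    print(hex(memory[0xFF]), hex(regs["PC"]))
```

The next fetch then starts at the handler address, and the saved PC lets the handler return to the interrupted program.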

SLIDE 21

Data Flow, Interrupt Cycle

SLIDE 22

12.4 Instruction Pipelining

  • Pipelining strategy

—Similar to an assembly line in an automobile factory

– An instruction passes through a number of stages
– Stages of different instructions can be executed simultaneously

  • Simple two-stage pipelining

—Fetch and execute stages
—If the two stages were of equal duration, the instruction cycle time would be halved
—But things are not that easy

– Execution time is generally longer than fetch time

+ Fetch stage may have to wait

– A conditional branch makes the next instruction unknown

+ Fetch stage must wait or guess the branch outcome

SLIDE 23

Two Stage Instruction Pipeline

SLIDE 24

Instruction Pipelining

  • More stages mean further speedup

—Fetch Instruction (FI)
—Decode Instruction (DI)
—Calculate Operands (CO)
—Fetch Operands (FO)
—Execute Instruction (EI)
—Write Operand (WO)

  • Characteristics (equal stage durations assumed)

—Reduces the execution time for 9 instructions from 54 to 14 time units (9 × 6 = 54 without pipelining; 6 + 8 = 14 with it)
—Some instructions may not go through all 6 stages

– LOAD does not need the WO stage

—Some stages may not be performed in parallel

– FI, FO, and WO stages all involve a memory access and may conflict
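
The 54 versus 14 figure can be checked by building the space-time table directly. This is a minimal Python sketch, assuming equal stage durations, no stalls, and that every instruction passes through all six stages (so it ignores the LOAD and memory-conflict caveats above); `schedule` is a hypothetical helper name.

```python
from __future__ import annotations

STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]


def schedule(n: int, stages: list[str] = STAGES) -> dict[int, dict[str, str]]:
    """Space-time table {time unit: {stage: instruction}} for an ideal pipeline.

    Assumes equal stage durations, no stalls, and that every instruction
    passes through every stage.
    """
    table: dict[int, dict[str, str]] = {}
    for i in range(n):                       # instruction I(i+1) enters FI at time i+1
        for s, stage in enumerate(stages):
            t = i + s + 1                    # one stage per time unit
            table.setdefault(t, {})[stage] = f"I{i + 1}"
    return table


if __name__ == "__main__":
    n, k = 9, len(STAGES)
    table = schedule(n)
    print("sequential:", n * k, "time units; pipelined:", max(table), "time units")
    for t in sorted(table):
        print(f"{t:>2}", " ".join(f"{table[t].get(s, '-'):<3}" for s in STAGES))
```

The last instruction leaves WO at time unit 14, matching the slide; without overlap the same work takes 9 × 6 = 54 units.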

SLIDE 25

Timing Diagram for Instruction Pipeline Operation

SLIDE 26

Instruction Pipelining

  • Factors that limit performance enhancement

—Stages may not be of equal duration
—Conditional branch instructions

– Can invalidate several instruction fetches

—Interrupts
—Data dependency

– CO stage may depend on the contents of a register that could be altered by a previous instruction still in the pipeline
– The system needs logic to resolve this conflict
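
A minimal Python sketch of the check such logic has to make: flag any instruction whose operand registers are written by a recent instruction that may still be in the pipeline. With one stage per time unit, an instruction is in CO two units after FI and finishes WO five units after FI, so up to three following instructions can compute operands before (or in the same unit as) the write; the default `window=3` encodes that, conservatively. The `Instr` record, register names, and assembly syntax are hypothetical.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    text: str
    dest: Optional[str]          # register written by the instruction, if any
    srcs: tuple[str, ...]        # registers read (e.g. in the CO stage)


def raw_hazards(program: list[Instr], window: int = 3) -> list[tuple[int, int, str]]:
    """List (producer, consumer, register) triples where the consumer reads a
    register that one of the previous `window` instructions may not have written yet."""
    hazards = []
    for j, consumer in enumerate(program):
        for i in range(max(0, j - window), j):
            producer = program[i]
            if producer.dest is not None and producer.dest in consumer.srcs:
                hazards.append((i, j, producer.dest))
    return hazards


if __name__ == "__main__":
    program = [
        Instr("ADD R1, R2, R3", "R1", ("R2", "R3")),   # R1 <- R2 + R3
        Instr("SUB R4, R1, R5", "R4", ("R1", "R5")),   # reads R1 too early
        Instr("LOAD R6, (R7)", "R6", ("R7",)),
        Instr("ADD R8, R4, R1", "R8", ("R4", "R1")),
        Instr("ADD R9, R1, R1", "R9", ("R1", "R1")),   # R1 is written back by now
    ]
    print(raw_hazards(program))
```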

SLIDE 27

Effect of a Conditional Branch

SLIDE 28

Figure 12.12 Six-Stage Instruction Pipeline

(Flowchart of the FI, DI, CO, FO, EI, and WO stages, with decision points "Unconditional branch?" and "Branch or interrupt?"; a "yes" at either leads to Update PC and Empty Pipe.)

SLIDE 29

Figure 12.13 An Alternative Pipeline Depiction

(a) No branches

Time  FI   DI   CO   FO   EI   WO
1     I1   -    -    -    -    -
2     I2   I1   -    -    -    -
3     I3   I2   I1   -    -    -
4     I4   I3   I2   I1   -    -
5     I5   I4   I3   I2   I1   -
6     I6   I5   I4   I3   I2   I1
7     I7   I6   I5   I4   I3   I2
8     I8   I7   I6   I5   I4   I3
9     I9   I8   I7   I6   I5   I4
10    -    I9   I8   I7   I6   I5
11    -    -    I9   I8   I7   I6
12    -    -    -    I9   I8   I7
13    -    -    -    -    I9   I8
14    -    -    -    -    -    I9

(b) With conditional branch (I3 branches to I15)

Time  FI   DI   CO   FO   EI   WO
1     I1   -    -    -    -    -
2     I2   I1   -    -    -    -
3     I3   I2   I1   -    -    -
4     I4   I3   I2   I1   -    -
5     I5   I4   I3   I2   I1   -
6     I6   I5   I4   I3   I2   I1
7     I7   I6   I5   I4   I3   I2
8     I15  -    -    -    -    I3
9     I16  I15  -    -    -    -
10    -    I16  I15  -    -    -
11    -    -    I16  I15  -    -
12    -    -    -    I16  I15  -
13    -    -    -    -    I16  I15
14    -    -    -    -    -    I16

SLIDE 30

Pipeline Performance

  • Measures of performance

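—Cycle time can be determined as

$\tau = \max_{1 \le i \le k}[\tau_i] + d = \tau_m + d$

where

$\tau_i$ = time delay of the circuitry in the ith stage of the pipeline
$\tau_m$ = maximum stage delay (the delay through the slowest stage)
k = number of stages in the instruction pipeline
d = time delay of a latch, needed to advance signals and data from one stage to the next

—We can ignore d since $\tau_m \gg d$
—Total time $T_k$ to execute n instructions is

$T_k = [k + (n - 1)]\tau$

(the first instruction needs k cycles to complete; each of the remaining n - 1 instructions completes one cycle later)

—Thus the speedup factor, relative to the unpipelined time $T_1 = nk\tau$, is defined as

$S_k = T_1 / T_k = nk\tau / [k + (n - 1)]\tau = nk / [k + (n - 1)]$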
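
A small Python sketch of the two formulas, with the latch delay d ignored as on the slide and τ defaulting to one time unit. For the six-stage, nine-instruction example this gives $T_6 = 14$ and $S_6 = 54/14 \approx 3.86$; as n grows, the speedup approaches k.

```python
def pipeline_time(n: int, k: int, tau: float = 1.0) -> float:
    """T_k = [k + (n - 1)] * tau, with the latch delay d ignored."""
    return (k + (n - 1)) * tau


def speedup(n: int, k: int) -> float:
    """S_k = T_1 / T_k = n*k / [k + (n - 1)]."""
    return (n * k) / (k + (n - 1))


if __name__ == "__main__":
    print(pipeline_time(9, 6))               # six-stage example, 9 instructions
    print(round(speedup(9, 6), 2))
    print(round(speedup(1_000_000, 6), 4))   # approaches k = 6 as n grows
```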

SLIDE 31

Speedup Factors with Pipelining

SLIDE 32

Speedup Factors with Pipelining

SLIDE 33

Dealing with Branches

  • Approaches for dealing with branches

—Multiple streams
—Prefetch branch target
—Loop buffer
—Branch prediction
—Delayed branch

  • Multiple streams

—Have two pipelines

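– Prefetch both paths of a branch into separate pipelines
– Use the appropriate pipeline once the branch outcome is known

—Problems

– There may be contention delays for access to registers and memory
– Additional branch instructions may enter a pipeline before the original branch is resolved, and each needs an additional stream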

SLIDE 34

Dealing with Branches

  • Prefetch branch target

—The target of the branch is prefetched in addition to the instruction following the branch
—The target is kept until the branch is executed
—Used in IBM 360/91

  • Loop buffer

—Contains the n most recently fetched instructions, in sequence
—Whenever a branch is to be taken, the buffer is first checked for the target
—Well suited to dealing with loops

– If the loop buffer is large enough to contain all the instructions in a loop, they need to be fetched from memory only once

—Used in CDC and CRAY-1
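
A minimal Python sketch of the loop-buffer idea: keep the n most recently fetched instructions and check them first when a branch is taken. The `LoopBuffer` class is hypothetical, and an ordered dictionary keyed by address stands in for the small, fast buffer memory; real hardware indexes the buffer by address bits rather than searching a dictionary.

```python
from __future__ import annotations

from collections import OrderedDict
from typing import Optional


class LoopBuffer:
    """Keeps the `size` most recently fetched instructions, in fetch order."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.entries: OrderedDict[int, str] = OrderedDict()

    def record_fetch(self, address: int, instr: str) -> None:
        self.entries[address] = instr
        self.entries.move_to_end(address)     # newest fetch goes to the end
        if len(self.entries) > self.size:
            self.entries.popitem(last=False)  # drop the oldest fetch

    def lookup(self, target: int) -> Optional[str]:
        """Checked when a branch is taken; a hit avoids a memory fetch."""
        return self.entries.get(target)


if __name__ == "__main__":
    buf = LoopBuffer(size=8)
    for addr in range(100, 105):              # a five-instruction loop body
        buf.record_fetch(addr, f"instr@{addr}")
    print(buf.lookup(100))                    # branch back to the loop head hits
```

Because the whole loop body fits in the buffer, every later iteration is served from the buffer; with a buffer smaller than the loop, the head of the loop has already been evicted when the branch back occurs.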

SLIDE 35

Loop Buffer Diagram

SLIDE 36

Dealing with Branches

  • Branch prediction

—Predict never taken
—Predict always taken
—Predict by opcode
—Taken/not taken switch
—Branch history table

  • Predict never taken

—Assume that the jump will not happen

– Always fetch the next instruction

—Used in MC68020 & VAX 11/780
—VAX will not prefetch the instruction after a branch if a page fault would result

SLIDE 37

Dealing with Branches

  • Predict always taken

—Assume that the jump will happen
—Always fetch the target instruction
—Studies show that conditional branches are taken more than 50% of the time
—But prefetching the branch target is more likely to cause a page fault than prefetching the next sequential instruction

  • Predict by opcode

—Some opcodes are more likely to result in a jump than others

– e.g., JNZ of 80x86

—Success rates of more than 75% are reported

SLIDE 38

Dealing with Branches

  • Taken/Not taken switch

—Based on previous history

– History bits are associated with each conditional branch instruction

—Single-bit switch

– Records whether the last execution resulted in a branch or not
– Not good for nested loops: each pass through the inner loop causes two mispredictions (once on exit and again on the next entry)

—Two-bit switch

– The prediction changes only after two consecutive wrong predictions

—Drawback

– If the decision is to take the branch, the target instruction cannot be fetched until the target address is decoded
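
The nested-loop weakness of the single-bit switch shows up clearly in a short simulation. This Python sketch compares a one-bit predictor with a two-bit saturating counter, which is one common way to realize the "two consecutive wrong predictions" rule (the exact state transitions in the figures may differ). The trace models an inner-loop branch that is taken three times and then falls through, repeated for ten outer iterations; all class and function names are hypothetical.

```python
from __future__ import annotations

from typing import Iterable, Protocol


class Predictor(Protocol):
    def predict(self) -> bool: ...
    def update(self, taken: bool) -> None: ...


class OneBit:
    """Remembers only whether the branch was taken last time."""

    def __init__(self, taken: bool = True) -> None:
        self.last = taken

    def predict(self) -> bool:
        return self.last

    def update(self, taken: bool) -> None:
        self.last = taken


class TwoBit:
    """2-bit saturating counter (0..3); states 2 and 3 predict taken."""

    def __init__(self, state: int = 3) -> None:
        self.state = state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)


def count_mispredictions(outcomes: Iterable[bool], predictor: Predictor) -> int:
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


if __name__ == "__main__":
    trace = [True, True, True, False] * 10   # inner loop: taken 3 times, then exit
    print("1-bit:", count_mispredictions(trace, OneBit()))
    print("2-bit:", count_mispredictions(trace, TwoBit()))
```

On this trace the one-bit switch mispredicts 19 times (twice per outer iteration after the first), while the two-bit counter mispredicts 10 times (only at each loop exit).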

SLIDE 39

Figure 12.16 Branch Prediction Flow Chart

(Flowchart: two "Predict taken" states and two "Predict not taken" states; in each state the next conditional branch instruction is read, and the test "Branch taken?" selects the next state.)

SLIDE 40

Branch Prediction State Diagram

SLIDE 41

Dealing with Branches

  • Branch history table

—Each entry consists of the address of a branch instruction, history bits, and information about the target instruction
—Used in AMD29000 microprocessor
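
A minimal Python sketch of a branch history table whose entries hold the branch address (as the key), history bits, and the target address. Storing the target is what removes the drawback of the plain taken/not taken switch: a predicted-taken branch can be redirected at fetch time without waiting for decode. The dictionary-based table, the 2-bit counter, and `instr_len=1` are illustrative assumptions, not a description of the AMD29000 hardware.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BHTEntry:
    counter: int      # 2-bit history (saturating counter); >= 2 means "predict taken"
    target: int       # target address recorded the last time the branch executed


class BranchHistoryTable:
    """Sketch of a branch history table keyed by branch-instruction address."""

    def __init__(self) -> None:
        self.entries: dict[int, BHTEntry] = {}

    def next_fetch_address(self, pc: int, instr_len: int = 1) -> int:
        entry = self.entries.get(pc)
        if entry is not None and entry.counter >= 2:
            return entry.target        # predicted taken: fetch the stored target at once
        return pc + instr_len          # unknown or predicted not taken: fetch sequentially

    def update(self, pc: int, taken: bool, target: int) -> None:
        entry = self.entries.setdefault(pc, BHTEntry(counter=1, target=target))
        entry.target = target
        entry.counter = min(3, entry.counter + 1) if taken else max(0, entry.counter - 1)


if __name__ == "__main__":
    bht = BranchHistoryTable()
    bht.update(0x40, taken=True, target=0x10)
    print(hex(bht.next_fetch_address(0x40)))
```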

SLIDE 42

Dealing with Branches

SLIDE 43

Dealing with Branches

SLIDE 44

Dealing with Branches

  • Delayed Branch

—Do not take the jump until you have to
—Rearrange instructions

– To improve pipeline performance, rearrange instructions so that branch instructions occur later than actually desired; the instruction immediately after a branch (in the delay slot) is always executed
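
A simplified Python sketch of how a compiler might fill a one-instruction delay slot: if the instruction just before the branch is independent of it, swap them so useful work runs in the slot; otherwise insert a NOP. The `Op` record, the mnemonics, and the register names are hypothetical, and only register dependencies on that single preceding instruction are considered.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    text: str
    reads: frozenset[str] = frozenset()
    writes: frozenset[str] = frozenset()
    is_branch: bool = False


NOP = Op("NOP")


def fill_delay_slot(block: list[Op]) -> list[Op]:
    """Rearrange a block ending in a branch for a one-instruction delay slot."""
    *body, branch = block
    if not branch.is_branch:
        raise ValueError("block must end with a branch")
    if body:
        candidate = body[-1]
        independent = (not candidate.is_branch
                       and not (candidate.writes & branch.reads)
                       and not (branch.writes & candidate.reads))
        if independent:
            return body[:-1] + [branch, candidate]   # candidate now runs in the slot
    return body + [branch, NOP]                      # nothing safe to move


if __name__ == "__main__":
    load = Op("LOAD X, rA", writes=frozenset({"rA"}))
    add = Op("ADD 1, rA", reads=frozenset({"rA"}), writes=frozenset({"rA"}))
    jump = Op("JUMP 105", is_branch=True)
    print([op.text for op in fill_delay_slot([load, add, jump])])
```

The unconditional jump does not read rA, so the ADD moves into the slot; a conditional branch that tests rA would instead get a NOP.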

SLIDE 45

Intel 80486 Pipelining

  • Fetch

— From cache or external memory
— Put in one of two 16-byte prefetch buffers
— Buffers refilled with new data as soon as the old data are consumed
— On average, 5 instructions are fetched per 16-byte load
— Operates independently of the other stages to keep the buffers full

  • Decode stage 1 (D1)

— Opcode and addressing-mode information are decoded
— The required information is in at most the first 3 bytes of the instruction
— Can direct the D2 stage to get the rest of the instruction

  • Decode stage 2 (D2)

— Expands each opcode into control signals
— Computes complex addressing modes

  • Execute

— ALU operations, cache access, register update

  • Write back

— Updates registers and flags
— Results sent to cache and bus interface write buffers

SLIDE 46

Pentium Instruction Formats

SLIDE 47

12.5 Pentium Processor

  • Register organization

—General: EAX, EBX, ECX, EDX, ESP, EBP, ESI, and EDI
—Segment: CS, SS, DS, ES, FS, and GS
—Flags (EFLAGS): condition codes and mode bits
—Instruction pointer (EIP)
—Registers for the floating-point unit

– Numeric

+ Eight registers, each holding an 80-bit floating-point number

– Control
– Status

+ 16-bit register reflecting the state of the floating-point unit

– Tag word

+ 2 bits associated with each numeric register
+ Represents valid, zero, special (NaN, infinity, ...), and empty
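
A short Python sketch of decoding the tag word into its eight 2-bit fields, assuming the x87 encoding 00 = valid, 01 = zero, 10 = special, 11 = empty, with the tag for physical register 0 in the lowest two bits. Note that the tags index physical registers, not the stack-relative ST(i) names; the function name is hypothetical.

```python
TAG_NAMES = {0b00: "valid", 0b01: "zero", 0b10: "special", 0b11: "empty"}


def decode_tag_word(tag_word: int) -> list[str]:
    """Split a 16-bit x87 tag word into eight 2-bit tags.

    Element i is the tag of physical register i, taken from bits 2i+1..2i.
    """
    return [TAG_NAMES[(tag_word >> (2 * i)) & 0b11] for i in range(8)]


if __name__ == "__main__":
    print(decode_tag_word(0xFFFF))   # every register empty
    print(decode_tag_word(0x3FFF))   # register 7 holds a valid number
```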

SLIDE 48

Pentium 4 Registers

SLIDE 49

Pentium 4 Registers

SLIDE 50

EFLAGS Register

  • Condition codes and control bits

—Carry, parity, auxiliary carry, zero, sign, and overflow flags
—Trap flag (TF)

– Causes an interrupt after each instruction is executed (single-stepping for debugging)

—Interrupt enable flag (IF)
—Direction flag (DF)
—I/O privilege level (IOPL)

– 2-bit field; in protected mode, I/O instructions executed at a less privileged level than IOPL generate an exception

—Resume flag (RF)

– Used for debugging

—Alignment check (AC)
—Identification flag (ID)

– If this flag can be set and cleared, the processor supports the CPUID instruction, which provides vendor, family, and model information
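
To show where these bits sit in the 32-bit register, here is a small Python decoder for the flags listed above. The bit positions follow Intel's documented EFLAGS layout; the helper name is hypothetical, and IOPL is returned as a 2-bit value rather than a single flag.

```python
EFLAGS_BITS = {
    "CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7, "TF": 8, "IF": 9,
    "DF": 10, "OF": 11, "RF": 16, "AC": 18, "ID": 21,
}


def decode_eflags(value: int) -> dict[str, int]:
    """Return each one-bit flag listed above, plus the 2-bit IOPL field."""
    flags = {name: (value >> bit) & 1 for name, bit in EFLAGS_BITS.items()}
    flags["IOPL"] = (value >> 12) & 0b11     # bits 12-13
    return flags


if __name__ == "__main__":
    flags = decode_eflags(0x3046)
    print({name: bit for name, bit in flags.items() if bit})
```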

SLIDE 51

EFLAGS Register

SLIDE 52

Control Registers

  • 4 32-bit control registers (CR0, CR2, CR3, CR4; CR1 is unused)

—CR0

– Protection enable (PE)
– Monitor coprocessor (MP)
– Emulation (EM)

+ Set when the CPU does not have a floating-point unit

– Task switched (TS)
– Extension type (ET)
– Numeric error (NE)
– Write protect (WP)
– Alignment mask (AM)
– Not write through (NW)

+ Selects the mode of operation of the data cache

– Cache disable (CD)
– Paging (PG)
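
A small Python sketch that names the CR0 bits set in a given value. The bit positions follow Intel's documented CR0 layout; the helper names are hypothetical. The `paging_enabled` check reflects that paging is only used in protected mode, so both PE and PG must be set.

```python
CR0_BITS = {
    "PE": 0, "MP": 1, "EM": 2, "TS": 3, "ET": 4, "NE": 5,
    "WP": 16, "AM": 18, "NW": 29, "CD": 30, "PG": 31,
}


def decode_cr0(value: int) -> set[str]:
    """Names of the CR0 control bits that are set in `value`."""
    return {name for name, bit in CR0_BITS.items() if (value >> bit) & 1}


def paging_enabled(cr0: int) -> bool:
    # paging requires protected mode as well (PG and PE both set)
    return {"PE", "PG"} <= decode_cr0(cr0)


if __name__ == "__main__":
    print(sorted(decode_cr0(0x80000011)))
```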

SLIDE 53

Control Registers

  • 4 32-bit control registers (CR0, CR2, CR3, CR4; CR1 is unused)

—CR2

– Holds the 32-bit linear address that caused the most recent page fault

—CR3

– Leftmost 20 bits hold the 20 most significant bits of the base address of the page directory

—CR4

– Additional control bits
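
A minimal Python sketch of how CR3 is used in address translation, assuming classic 32-bit paging with 4-KB pages and 4-byte page-directory entries: the page directory base is CR3 with its low 12 bits cleared, and the top 10 bits of a linear address select the directory entry. A page-fault handler could pass the faulting address from CR2 as `linear`. The helper names are hypothetical.

```python
PAGE_DIR_BASE_MASK = 0xFFFFF000      # CR3 bits 31..12: top 20 bits of the base


def page_directory_base(cr3: int) -> int:
    # The directory is 4-KB aligned, so the low 12 bits of the base are zero.
    return cr3 & PAGE_DIR_BASE_MASK


def pde_address(cr3: int, linear: int) -> int:
    """Physical address of the page-directory entry for `linear`
    (32-bit paging with 4-KB pages and 4-byte entries assumed)."""
    directory_index = (linear >> 22) & 0x3FF   # top 10 bits of the linear address
    return page_directory_base(cr3) + directory_index * 4


if __name__ == "__main__":
    print(hex(pde_address(0x00123018, 0x00C01234)))
```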

SLIDE 54

Control Registers

SLIDE 55

MMX Register Mapping

  • MMX uses 64-bit data types
  • Each instruction uses 3-bit register address fields

—Allowing 8 MMX registers

  • No MMX-specific registers

—Aliased to the lower 64 bits of the existing 8 floating-point registers

SLIDE 56

Mapping of MMX Registers to FP Registers

SLIDE 57

Pentium Interrupt Processing

  • Interrupts

—Maskable: received on the INTR pin
—Nonmaskable: received on the NMI pin

  • Exceptions

—Processor detected
—Programmed: instructions that generate an exception (e.g., INT, INT3, INTO, BOUND)

  • Interrupt vector table

—Each interrupt type is assigned a number

– Used as an index into the interrupt vector table

—256 32-bit interrupt vectors
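
Because each vector is 32 bits (4 bytes), the interrupt type number indexes the table by simple scaling. This Python sketch assumes the table starts at `table_base = 0`, as in x86 real-address mode; in protected mode the Pentium uses an interrupt descriptor table with 8-byte gate descriptors located through a base register, which this sketch does not model.

```python
VECTOR_SIZE = 4        # bytes per 32-bit vector
NUM_VECTORS = 256


def vector_address(interrupt_type: int, table_base: int = 0) -> int:
    """Address of the vector for `interrupt_type` in a table of 32-bit entries."""
    if not 0 <= interrupt_type < NUM_VECTORS:
        raise ValueError("interrupt type must be in 0..255")
    return table_base + interrupt_type * VECTOR_SIZE


if __name__ == "__main__":
    print(hex(vector_address(8)))                        # vector 8
    print(NUM_VECTORS * VECTOR_SIZE, "bytes in the table")
```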

SLIDE 58

Pentium Interrupts and Exceptions

SLIDE 59

Pentium Interrupt Processing

  • 5 priority classes, highest priority first (interrupt vector numbers in parentheses)

—Class 1 (1)

– Traps on the previous instruction

—Class 2 (2, 32-255)

– External interrupts

—Class 3 (3, 14)

– Faults from fetching next instruction

—Class 4 (6, 7)

– Faults from decoding next instruction

—Class 5 (0, 4, 5, 8, 10-14, 16, 17)

– Faults on executing an instruction
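
A small Python sketch of choosing which pending event to service first using the five classes above, with class 1 the highest priority. The vector sets are copied from the slide; breaking ties within a class by the lower vector number is an assumption of this sketch, and `next_event` is a hypothetical name.

```python
from __future__ import annotations

VECTORS_BY_CLASS: dict[int, set[int]] = {
    1: {1},                                   # traps on the previous instruction
    2: {2, *range(32, 256)},                  # external interrupts
    3: {3, 14},                               # faults from fetching next instruction
    4: {6, 7},                                # faults from decoding next instruction
    5: {0, 4, 5, 8, *range(10, 15), 16, 17},  # faults on executing an instruction
}


def next_event(pending: list[tuple[int, int]]) -> tuple[int, int]:
    """Pick the (priority_class, vector) pair to service first.

    Class 1 is the highest priority; ties go to the lower vector (an assumption).
    """
    for cls, vector in pending:
        if vector not in VECTORS_BY_CLASS[cls]:
            raise ValueError(f"vector {vector} is not listed in class {cls}")
    return min(pending)


if __name__ == "__main__":
    print(next_event([(5, 0), (2, 32), (1, 1)]))   # the debug trap wins
```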

SLIDE 60

Pentium Interrupt Processing

  • Interrupt handling

—If the transfer involves a change of privilege level, the current SS and ESP are pushed onto the stack
—EFLAGS is pushed
—IF and TF flags are cleared
—CS and EIP are pushed
—If the interrupt is accompanied by an error code, the error code is pushed
—CS and EIP of the interrupt service routine are loaded, and execution continues there
—To return from the interrupt, the IRET instruction is executed, restoring the saved values
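
The push order above can be modeled with a Python list used as a stack (the last element is the top). This is a sketch under stated assumptions: the actual switch to a more privileged stack and all ESP arithmetic are not modeled, the handler is assumed to discard any error code before IRET, and the register values in the example are hypothetical.

```python
from __future__ import annotations

from typing import Optional

IF_BIT = 1 << 9    # EFLAGS interrupt enable
TF_BIT = 1 << 8    # EFLAGS trap flag


def enter_interrupt(cpu: dict[str, int], stack: list[tuple[str, int]],
                    handler_cs: int, handler_eip: int,
                    error_code: Optional[int] = None,
                    privilege_change: bool = False) -> None:
    """Push the saved state in the order listed on the slide."""
    if privilege_change:
        stack.append(("SS", cpu["SS"]))
        stack.append(("ESP", cpu["ESP"]))
    stack.append(("EFLAGS", cpu["EFLAGS"]))
    cpu["EFLAGS"] &= ~(IF_BIT | TF_BIT)        # handler starts with IF and TF clear
    stack.append(("CS", cpu["CS"]))
    stack.append(("EIP", cpu["EIP"]))
    if error_code is not None:
        stack.append(("ERROR", error_code))
    cpu["CS"], cpu["EIP"] = handler_cs, handler_eip


def _pop(stack: list[tuple[str, int]], expected: str) -> int:
    name, value = stack.pop()
    if name != expected:
        raise RuntimeError(f"expected {expected} on the stack, found {name}")
    return value


def iret(cpu: dict[str, int], stack: list[tuple[str, int]],
         privilege_change: bool = False) -> None:
    """Undo enter_interrupt; any error code must already have been popped."""
    cpu["EIP"] = _pop(stack, "EIP")
    cpu["CS"] = _pop(stack, "CS")
    cpu["EFLAGS"] = _pop(stack, "EFLAGS")
    if privilege_change:
        cpu["ESP"] = _pop(stack, "ESP")
        cpu["SS"] = _pop(stack, "SS")
```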

SLIDE 61

12.6 PowerPC Processor

  • Register organization

—Fixed-point unit

– General: 32 64-bit GPRs
– Exception register (XER): used to report exceptions in integer arithmetic operations

—Floating-point unit

– General: 32 64-bit floating-point registers (FPRs)
– Floating-point status and control register (FPSCR)

+ Used to control the operation and to record status

—Branch processing unit

– Condition register: 8 4-bit condition code fields
– Link register

+ Used for indirect addressing of the branch target address; also receives the return address of a branch with link

– Count register

+ Used to control an iteration loop
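
A short Python sketch of reading the condition register's eight 4-bit fields. PowerPC numbers bits from the most significant end, so CR0 occupies the leftmost four bits, and within each field the bits are LT, GT, EQ, SO from left to right. The helper names are hypothetical.

```python
CR_BIT_NAMES = ("LT", "GT", "EQ", "SO")    # bit order within each 4-bit field


def cr_field(cr: int, n: int) -> int:
    """Return the 4-bit field CRn (n = 0..7); CR0 is the most significant nibble."""
    if not 0 <= n <= 7:
        raise ValueError("field index must be 0..7")
    return (cr >> (28 - 4 * n)) & 0xF


def cr_field_bits(cr: int, n: int) -> dict[str, int]:
    field = cr_field(cr, n)
    # LT is the leftmost (most significant) bit of the field
    return {name: (field >> (3 - i)) & 1 for i, name in enumerate(CR_BIT_NAMES)}


if __name__ == "__main__":
    print(cr_field_bits(0x20000000, 0))   # CR0 records an "equal" result
```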

SLIDE 62

PowerPC User Visible Registers

SLIDE 63

PowerPC Interrupt Processing

  • Types of interrupts
  • Machine state registers
  • Interrupt handling

—Place the address of the next instruction to be executed in SRR0
—MSR is copied into SRR1
—MSR is set according to the interrupt type
—Control is transferred to the interrupt handler
—To return from the interrupt, the rfi instruction is executed
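
A minimal Python model of the save/restore sequence above. It copies the whole MSR into SRR1 exactly as the slide describes; real processors may save only selected MSR bits and use other SRR1 bits for interrupt-specific status. The register dictionary and the example values (PC, MSR contents, handler address) are hypothetical.

```python
def take_interrupt(cpu: dict[str, int], next_pc: int, handler: int, new_msr: int) -> None:
    cpu["SRR0"] = next_pc       # where execution resumes after the handler
    cpu["SRR1"] = cpu["MSR"]    # save the machine state
    cpu["MSR"] = new_msr        # MSR set according to the interrupt type
    cpu["PC"] = handler         # transfer control to the handler


def rfi(cpu: dict[str, int]) -> None:
    cpu["MSR"] = cpu["SRR1"]    # restore machine state
    cpu["PC"] = cpu["SRR0"]     # resume the interrupted program


if __name__ == "__main__":
    cpu = {"PC": 0x1000, "MSR": 0xC030, "SRR0": 0, "SRR1": 0}
    take_interrupt(cpu, next_pc=0x1004, handler=0x500, new_msr=0x1000)
    rfi(cpu)
    print(hex(cpu["PC"]), hex(cpu["MSR"]))
```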

SLIDE 64

PowerPC Interrupt Table

SLIDE 65

PowerPC Interrupt Table (Cont’d)

SLIDE 66

PowerPC MSR