SLIDE 1

CADSL

Superscalar Design:

Instruction Flow Techniques

Virendra Singh

Associate Professor Computer Architecture and Dependable Systems Lab Department of Electrical Engineering Indian Institute of Technology Bombay

http://www.ee.iitb.ac.in/~viren/ E-mail: viren@ee.iitb.ac.in

EE-739: Processor Design

Lecture 26 (19 March 2013)

SLIDE 2

Disruption of Sequential Control Flow

[Figure: superscalar pipeline. Fetch, Instruction/Decode Buffer, Decode, Dispatch Buffer, Dispatch, Reservation Stations, Issue, Execute, Finish, Reorder/Completion Buffer, Complete, Store Buffer, Retire; the Branch unit is marked among the execution stages.]

SLIDE 3

Branch Prediction

  • Target address generation → Target speculation

– Access register: PC, general-purpose register, link register
– Perform calculation: +/- offset, autoincrement, autodecrement

  • Condition resolution → Condition speculation

– Access register: condition code register, general-purpose register
– Perform calculation: comparison of data register(s)

SLIDE 4

Target Address Generation

[Figure: pipeline diagram (Fetch through Retire) annotated with the branch target addressing modes: PC-relative, register indirect, and register indirect with offset.]

SLIDE 5

Condition Resolution

[Figure: pipeline diagram (Fetch through Retire) annotated with the branch condition sources: condition code (CC) register and general-purpose (GP) register value comparison.]

SLIDE 6

Branch/Jump Target Prediction

  • Branch Target Buffer (BTB): small cache in fetch stage

– Holds previously executed branches: address, taken history, target(s)

  • Fetch stage compares current fetch address (FA) against BTB

– If match, use prediction
– If predict taken, use BTB target

  • When branch executes, BTB is updated
  • Optimization:

– Size of BTB: a larger BTB increases hit rate
– Prediction algorithm: a better algorithm increases accuracy of prediction

Example BTB entry:

Branch inst. address | Information for predict. | Branch target address (most recent)
0x0348               | 0101 (NTNT)              | 0x0612
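The lookup and update steps above can be sketched in a few lines of Python. This is only an illustrative model, not the hardware described in the lecture: the class name, the unbounded dictionary, the 4-byte instruction size, and the rule "predict taken if the most recent outcome was taken" are all assumptions made for the example.

```python
class BranchTargetBuffer:
    """Toy BTB keyed by branch instruction address (hypothetical model)."""

    def __init__(self, history_bits=4, inst_bytes=4):
        self.entries = {}                 # branch address -> (history, target)
        self.history_bits = history_bits  # assumed history length
        self.inst_bytes = inst_bytes      # assumed fixed instruction size

    def next_fetch_address(self, fa):
        """Fetch stage: compare the fetch address (FA) against the BTB."""
        entry = self.entries.get(fa)
        if entry is not None:
            history, target = entry
            # Assumed prediction rule: taken if the latest outcome bit is 1.
            if history & 1:
                return target
        # BTB miss, or hit with a not-taken prediction: fetch sequentially.
        return fa + self.inst_bytes

    def update(self, branch_addr, taken, target):
        """Called when the branch executes: shift in the resolved outcome."""
        history, old_target = self.entries.get(branch_addr, (0, target))
        mask = (1 << self.history_bits) - 1
        history = ((history << 1) | int(taken)) & mask
        # Keep the most recent taken target.
        self.entries[branch_addr] = (history, target if taken else old_target)


btb = BranchTargetBuffer()
for outcome in (False, True, False, True):      # N, T, N, T
    btb.update(0x0348, outcome, 0x0612)
print(hex(btb.next_fetch_address(0x0348)))
```

After the four updates the entry holds history 0101 and target 0x0612, the same values as the example entry on the slide, so under the assumed rule the next fetch is redirected to 0x0612.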

SLIDE 7

Branch Prediction Function

  • Prediction function F(X1, X2, … )

– X1 – opcode type
– X2 – history

  • Prediction effectiveness (%) based on opcode only, or history

              IBM1  IBM2  IBM3  IBM4  DEC  CDC
Opcode only     66    69    71    55   80   78
History 0       64    64    70    54   74   78
History 1       92    95    87    80   97   82
History 2       93    97    91    83   98   91
History 3       94    97    91    84   98   94
History 4       95    97    92    84   98   95
History 5       95    97    92    84   98   96

SLIDE 8

Branch Instruction Distribution

Column groups: % of each branch type; % bc with penalty cycles
Columns: Benchmark, b, bl, bc, bcr, bcc, 3 cyc, 2 cyc, 1 cyc

spice2g6    7.86  0.30  12.58  0.32  13.82  3.12  0.76
doduc       1.00  0.94   8.22  1.01  10.14  1.76  2.02
matrix300   0.00  0.00  14.5   0.00   0.68  0.22  0.20
tomcatv     0.00  0.00   6.10  0.00   0.24  0.02  0.01
gcc         2.30  1.32  15.5   1.81  22.46  9.48  4.85
espresso    3.61  0.58  19.85  0.68  37.37  1.77  0.31
li          2.41  1.92  14.36  1.91  31.55  3.44  1.37

SLIDE 9

Branch Instruction Speculation

[Figure: pipeline diagram for branch speculation. The FA-mux selects the fetch address (FA) sent to the I-cache, choosing between the sequential PC, PC(seq.), and the speculative target (Spec. target) from the branch predictor (using a BTB). The predictor also supplies the speculated condition (Spec. cond.) and its prediction (target addr. and history), and receives BTB updates from the Branch unit.]

SLIDE 10

BTAC and BHT Design (PPC 604)

[Figure: PowerPC 604 fetch path. The FAR supplies the fetch address (FA) to the I-cache, the Branch Target Address Cache (BTAC), and the Branch History Table (BHT). The FA-mux chooses among FA + 4, the BTAC prediction, and the BHT prediction. The Branch unit sends updates to the BTAC and BHT. Execution units shown: SFX, SFX, CFX, FPU, LS, BRN.]

BTAC:

  • 64 entries
  • fully associative
  • hit => predict taken

BHT:

  • 512 entries
  • direct mapped
  • 2-bit saturating counter, history-based prediction
  • overrides BTAC prediction
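The interplay of the two structures can be sketched as follows. The sizes (64-entry fully associative BTAC, 512-entry direct-mapped BHT with 2-bit counters), the "hit => predict taken" rule, and the BHT override come from the slide. The class and method names, the BHT index function, the initial counter value, the oldest-first BTAC replacement, and the removal of a BTAC entry on a not-taken outcome are assumptions for illustration and are not claimed to match the PowerPC 604.

```python
from collections import OrderedDict


class BtacBhtPredictor:
    """Hypothetical model of a BTAC backed by a BHT."""

    BTAC_ENTRIES = 64    # fully associative
    BHT_ENTRIES = 512    # direct mapped, 2-bit saturating counters

    def __init__(self):
        self.btac = OrderedDict()          # branch address -> target
        self.bht = [1] * self.BHT_ENTRIES  # assumed start: weakly not-taken

    def _bht_index(self, addr):
        # Assumption: index with low-order bits of the word address.
        return (addr >> 2) % self.BHT_ENTRIES

    def btac_predict(self, fa):
        """Early prediction in fetch: a BTAC hit means predict taken."""
        return self.btac.get(fa, fa + 4)

    def bht_predict(self, branch_addr):
        """History-based prediction: counter value 2 or 3 means taken."""
        return self.bht[self._bht_index(branch_addr)] >= 2

    def final_prediction(self, branch_addr, target):
        """The BHT prediction overrides whatever the BTAC guessed."""
        return target if self.bht_predict(branch_addr) else branch_addr + 4

    def update(self, branch_addr, taken, target):
        i = self._bht_index(branch_addr)
        if taken:
            self.bht[i] = min(3, self.bht[i] + 1)
            self.btac[branch_addr] = target
            self.btac.move_to_end(branch_addr)
            if len(self.btac) > self.BTAC_ENTRIES:
                self.btac.popitem(last=False)   # assumed: evict oldest entry
        else:
            self.bht[i] = max(0, self.bht[i] - 1)
            self.btac.pop(branch_addr, None)    # assumed: drop on not-taken
```

In this model the fetch stage would redirect on `btac_predict`, and a later stage would compare that guess against `final_prediction` and re-steer fetch if the two disagree.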

SLIDE 11

Branch Speculation

  • Leading Speculation

– Typically done during the Fetch stage
– Based on potential branch instruction(s) in the current fetch group

  • Trailing Confirmation

– Typically done during the Branch Execute stage
– Based on the next Branch instruction to finish execution

[Figure: binary tree of not-taken (NT) / taken (T) branch outcomes, with the speculated branches labeled TAG 1, TAG 2, TAG 3.]

SLIDE 12

Branch Speculation

  • Leading Speculation

1. Tag speculative instructions
2. Advance branch and following instructions
3. Buffer addresses of speculated branch instructions

  • Trailing Confirmation

1. When branch resolves, remove/deallocate speculation tag
2. Permit completion of branch and following instructions

SLIDE 13

Branch Speculation

  • Start new correct path

– Must remember the alternate (non-predicted) path

  • Eliminate incorrect path

– Must ensure that the mis-speculated instructions produce no side effects

[Figure: binary tree of not-taken (NT) / taken (T) branch outcomes, with the speculated branches labeled TAG 1, TAG 2, TAG 3.]
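The tagging, confirmation, and recovery steps can be illustrated with a small software model. It is a sketch under stated assumptions, not the mechanism of any particular processor: the class and method names are hypothetical, tags are handed out as increasing integers (so a larger tag means a younger branch), and each instruction simply records the set of unresolved tags it was fetched under.

```python
class SpeculationTracker:
    """Hypothetical model of branch tags for speculative instructions."""

    def __init__(self):
        self.next_tag = 1
        self.pending = {}     # tag -> buffered alternate (non-predicted) address
        self.in_flight = []   # (instruction name, set of tags it depends on)

    def speculate(self, alternate_pc):
        """Leading speculation: allocate a tag, remember the other path."""
        tag = self.next_tag
        self.next_tag += 1
        self.pending[tag] = alternate_pc
        return tag

    def fetch(self, name):
        """Instructions carry every tag that is unresolved when fetched."""
        self.in_flight.append((name, frozenset(self.pending)))

    def resolve(self, tag, predicted_correctly):
        """Trailing confirmation; returns a redirect address on a mispredict."""
        alternate_pc = self.pending.pop(tag)
        if predicted_correctly:
            # Deallocate the tag so dependent instructions may complete.
            self.in_flight = [(n, t - {tag}) for n, t in self.in_flight]
            return None
        # Mis-speculation: younger pending branches are on the wrong path too.
        for younger in [t for t in self.pending if t > tag]:
            del self.pending[younger]
        # Squash every instruction carrying the tag, so it has no side effects.
        self.in_flight = [(n, t) for n, t in self.in_flight if tag not in t]
        return alternate_pc

    def completable(self):
        """Only instructions with no outstanding tags may complete."""
        return [n for n, t in self.in_flight if not t]
```

A correct prediction only clears the tag, while a misprediction both discards the tagged instructions and returns the buffered alternate address from which fetch restarts.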

SLIDE 14

Tracking Instructions

  • Assign branch tags

– Allocated in circular order
– Instruction carries this tag throughout processor

  • Track instruction groups

– Instructions managed in groups, max. one branch per group
– ROB structured as groups

  • Leads to some inefficiency
  • Simpler tracking of speculative instructions
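Circular tag allocation can be sketched as a small allocator. The names and the number of tags are hypothetical, and the policy of stalling whenever the next tag in circular order is still outstanding is an assumption chosen to keep the example simple.

```python
class BranchTagAllocator:
    """Hypothetical allocator handing out branch tags in circular order."""

    def __init__(self, num_tags=4):
        self.num_tags = num_tags
        self.next_tag = 0
        self.in_use = [False] * num_tags

    def allocate(self):
        """Return the next tag, or None if it is still outstanding (stall)."""
        tag = self.next_tag
        if self.in_use[tag]:
            return None
        self.in_use[tag] = True
        self.next_tag = (tag + 1) % self.num_tags   # wrap around
        return tag

    def free(self, tag):
        """Release a tag once its branch has resolved."""
        self.in_use[tag] = False
```

With three tags, three back-to-back branches receive tags 0, 1, and 2; a fourth branch stalls until tag 0 is freed.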

SLIDE 15

Program Control Flow

[Figure: program control flow through the pipeline. The FA-mux selects the fetch address (FA) sent to the I-cache; the branch predictor supplies a prediction and speculative target (Spec. target), and the Branch unit (BRN, shown alongside SFX, SFX, CFX, FPU, LS) sends branch predictor updates.]

SLIDE 16

Smith Predictor Hardware

  • Jim E. Smith. A Study of Branch Prediction Strategies. International Symposium on Computer Architecture, pages 135-148, May 1981
  • Widely employed: Intel Pentium, PowerPC 604, PowerPC 620, etc.

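[Figure: Smith predictor hardware. m bits of the branch address index a table of 2^m k-bit counters. The most significant bit of the selected counter gives the branch prediction; the branch outcome drives a saturating increment/decrement, and the updated counter value is written back.]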
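A minimal software model of this scheme is sketched below. The table of 2^m k-bit saturating counters and the use of the most significant bit as the prediction follow the slide; the class name, the choice of address bits for the index, and the initial counter value are assumptions.

```python
class SmithPredictor:
    """Sketch of a table of 2**m k-bit saturating counters."""

    def __init__(self, m=10, k=2):
        self.m = m
        self.k = k
        self.max_count = (1 << k) - 1
        # Assumed initial state: just below the predict-taken threshold.
        self.counters = [(1 << (k - 1)) - 1] * (1 << m)

    def _index(self, branch_addr):
        # Assumption: use the m low-order bits of the word address.
        return (branch_addr >> 2) & ((1 << self.m) - 1)

    def predict(self, branch_addr):
        """Prediction is the most significant bit of the counter."""
        counter = self.counters[self._index(branch_addr)]
        return bool(counter >> (self.k - 1))

    def update(self, branch_addr, taken):
        """Saturating increment on taken, decrement on not-taken."""
        i = self._index(branch_addr)
        if taken:
            self.counters[i] = min(self.max_count, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

With k = 2, a counter that has saturated at 3 still predicts taken after a single not-taken outcome, which is the hysteresis that makes the 2-bit counter tolerant of an occasional loop exit.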

SLIDE 17

Two-level Branch Prediction

  • BHR adds global branch history

– Provides more context
– Can differentiate multiple instances of the same static branch
– Can correlate behavior across multiple static branches

[Figure: two-level predictor example. BHR = 0110 and PC = 01011010010101 form the index 010110 into the PHT (entries 000000 through 111111); the selected PHT entry supplies the branch prediction.]
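A two-level predictor with a global BHR can be sketched as follows. The index function here, a few low-order PC bits concatenated with the BHR contents, is one common arrangement and is an assumption, as are the class name, the 2-bit PHT counters, and their initial value.

```python
class TwoLevelPredictor:
    """Sketch: global branch history register (BHR) plus a PHT of counters."""

    def __init__(self, history_bits=4, pc_bits=2):
        self.history_bits = history_bits
        self.pc_bits = pc_bits
        self.bhr = 0
        # One 2-bit counter per (PC bits, history) combination,
        # assumed to start weakly not-taken.
        self.pht = [1] * (1 << (history_bits + pc_bits))

    def _index(self, pc):
        # Assumed indexing: low PC bits concatenated with the BHR.
        pc_part = pc & ((1 << self.pc_bits) - 1)
        return (pc_part << self.history_bits) | self.bhr

    def predict(self, pc):
        return self.pht[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.pht[i] = min(3, self.pht[i] + 1)
        else:
            self.pht[i] = max(0, self.pht[i] - 1)
        # Shift the resolved outcome into the global history.
        mask = (1 << self.history_bits) - 1
        self.bhr = ((self.bhr << 1) | int(taken)) & mask
```

With this assumed indexing, BHR = 0110 and PC = 01011010010101 give index 010110. Other index functions, such as taking the top six PC bits, produce the same value for this PC, so the match does not by itself establish which scheme the figure uses. Because the history is part of the index, a branch that strictly alternates between taken and not-taken maps its two cases to different counters and becomes fully predictable after warm-up, which a single per-branch counter cannot achieve.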
