Superscalar Design: An Introduction

SLIDE 1

CADSL

Superscalar Design:

An Introduction

Virendra Singh

Associate Professor
Computer Architecture and Dependable Systems Lab
Department of Electrical Engineering
Indian Institute of Technology Bombay

http://www.ee.iitb.ac.in/~viren/ E-mail: viren@ee.iitb.ac.in

EE-739: Processor Design

Lecture 24 (12 March 2013)

SLIDE 2

Superscalar Pipeline Stages

Fetch → Instruction Buffer → Decode → Dispatch Buffer → Dispatch → Issuing Buffer → Execute → Completion Buffer → Complete → Store Buffer → Retire

In program order: Fetch, Decode, Dispatch
Out of order: Execute
In program order: Complete, Retire

14 Mar 2013 EE-739@IITB 2
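The split between out-of-order execution and in-order completion can be illustrated with a minimal sketch. The function name, the example finish cycles, and the assumption of unlimited retire bandwidth are illustrative and do not come from the slides.

```python
def completion_and_retire(finish_cycle):
    """finish_cycle[i] is the cycle in which instruction i (program order)
    finishes executing. Hypothetical helper for illustration only."""
    # Results are produced in order of finish time, not program order.
    finish_order = sorted(range(len(finish_cycle)),
                          key=lambda i: (finish_cycle[i], i))

    # An instruction leaves the completion buffer only after every older
    # instruction has finished (retire bandwidth is assumed unlimited).
    retire_cycle = []
    oldest_blocker = 0
    for cycle in finish_cycle:
        oldest_blocker = max(oldest_blocker, cycle)
        retire_cycle.append(oldest_blocker)
    return finish_order, retire_cycle


finish_order, retire_cycle = completion_and_retire([5, 2, 3, 7])
print(finish_order)   # [1, 2, 0, 3]  execution finishes out of order
print(retire_cycle)   # [5, 5, 5, 7]  retirement follows program order
```

Instructions 1 and 2 finish early but wait in the completion buffer until instruction 0 is done, which is why the buffer is needed at all.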

SLIDE 3

Superscalar Architecture

Wide pipelines to exploit ILP
ILP is not necessarily exploited by widening the pipelines and adding more resources
Processor policies towards fetching, decoding, and executing instructions have a significant effect on its ability to discover instructions that can be executed concurrently
Instruction issue policy limits or enhances performance because it determines the processor’s lookahead capability
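A small sketch can show how issue policy, rather than width alone, determines what gets executed concurrently. The model below is an assumption made for illustration: each instruction is a `(latency, dependences)` pair, and `issue_schedule`, `width`, `window`, and `in_order` are hypothetical names, not taken from the lecture.

```python
def issue_schedule(program, width, window, in_order):
    """program: list of (latency, deps) in program order, where deps lists
    the indices of earlier instructions whose results are needed.
    Returns {instruction index: issue cycle}."""
    done_at = {}                      # index -> cycle its result is available
    issue_cycle = {}
    pending = list(range(len(program)))
    cycle = 0
    while pending:
        slots = width
        for i in list(pending[:window]):      # lookahead window
            latency, deps = program[i]
            ready = all(d in done_at and done_at[d] <= cycle for d in deps)
            if ready and slots > 0:
                issue_cycle[i] = cycle
                done_at[i] = cycle + latency
                pending.remove(i)
                slots -= 1
            elif in_order:
                break                 # in-order issue stalls behind a blocked instruction
        cycle += 1
    return issue_cycle


# 0: long-latency op, 1: needs 0, 2: independent, 3: needs 2
program = [(3, []), (1, [0]), (1, []), (1, [2])]
in_order_issue = issue_schedule(program, width=2, window=4, in_order=True)
ooo_issue = issue_schedule(program, width=2, window=4, in_order=False)
print(in_order_issue)   # {0: 0, 1: 3, 2: 3, 3: 4}
print(ooo_issue)        # {0: 0, 2: 0, 3: 1, 1: 3}
```

With the same two-wide machine, the in-order policy leaves an issue slot idle while instruction 1 waits, whereas the out-of-order policy looks past it and issues instructions 2 and 3 early.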

SLIDE 4

Issues in Decoding

Primary Tasks
Identify individual instructions (!)
Determine instruction types
Determine dependences between instructions

Two important factors
Instruction set architecture
Pipeline width
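The dependence-checking task, and why pipeline width matters for it, can be sketched as a pairwise comparison over one decode group. The instruction encoding `(dest, sources)` and the name `find_dependences` are assumptions for illustration. The check is deliberately conservative: it reports every register overlap between a pair, even if an instruction in between overwrites the register.

```python
def find_dependences(group):
    """group: list of (dest_register, source_registers) in program order.
    Returns (older index, younger index, kind) for each register overlap."""
    deps = []
    for j, (dst_j, srcs_j) in enumerate(group):
        for i in range(j):                      # compare with every older instruction
            dst_i, srcs_i = group[i]
            if dst_i in srcs_j:
                deps.append((i, j, "RAW"))      # younger reads what older writes
            if dst_j in srcs_i:
                deps.append((i, j, "WAR"))      # younger writes what older reads
            if dst_i == dst_j:
                deps.append((i, j, "WAW"))      # both write the same register
    return deps


group = [
    ("r1", ("r2", "r3")),
    ("r4", ("r1", "r5")),
    ("r2", ("r6", "r7")),
    ("r4", ("r2", "r1")),
]
deps = find_dependences(group)
print(deps)
# [(0, 1, 'RAW'), (0, 2, 'WAR'), (0, 3, 'RAW'), (1, 3, 'WAW'), (2, 3, 'RAW')]
```

A group of n instructions needs n(n-1)/2 pairwise checks, so the comparison logic grows quadratically as the decode stage is widened.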

SLIDE 5

Pentium Pro Fetch/Decode

SLIDE 6

Predecoding in the AMD K5

SLIDE 7

Instruction Dispatching

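Diversified pipeline
Different instruction types are executed by different functional units (FUs) in different pipelines
Distributed control
Operands are fetched from the register file (RF)
Operands may not be available at dispatch
Reservation stations hold instructions until their operands become available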
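A minimal sketch of a reservation station follows, assuming a tag-based scheme in which a missing operand is represented by the tag of the in-flight instruction that will produce it. The class, method names, register names, and tag `"T5"` are all hypothetical and chosen only for illustration.

```python
class ReservationStation:
    def __init__(self):
        self.entries = []   # each entry: {"op": str, "operands": [(kind, payload)]}

    def dispatch(self, op, sources, regfile, pending):
        """Read operands from the register file at dispatch time.
        pending maps a register to the tag of its in-flight producer."""
        operands = []
        for reg in sources:
            if reg in pending:
                operands.append(("wait", pending[reg]))   # value not available yet
            else:
                operands.append(("value", regfile[reg]))
        self.entries.append({"op": op, "operands": operands})

    def broadcast(self, tag, value):
        """A functional unit announces a result; waiting entries capture it."""
        for entry in self.entries:
            entry["operands"] = [("value", value) if o == ("wait", tag) else o
                                 for o in entry["operands"]]

    def ready(self):
        """Entries whose operands are all present may issue."""
        return [e["op"] for e in self.entries
                if all(kind == "value" for kind, _ in e["operands"])]


regfile = {"r1": 10, "r2": 0, "r3": 7}
pending = {"r2": "T5"}                 # r2 is still being computed
rs = ReservationStation()
rs.dispatch("add r4, r1, r2", ["r1", "r2"], regfile, pending)
rs.dispatch("sub r5, r1, r3", ["r1", "r3"], regfile, pending)
before = rs.ready()                    # only the sub has all operands
rs.broadcast("T5", 32)
after = rs.ready()                     # the add has now captured r2's value
print(before, after)
```

The younger `sub` becomes ready before the older `add`, which is the situation that lets execution proceed out of program order.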

SLIDE 8

Instruction Dispatch and Issue

Parallel pipeline
Centralized instruction fetch
Centralized instruction decode
Diversified pipeline
Distributed instruction execution
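The move from centralized decode to distributed execution can be sketched as a routing step that sends each decoded instruction to the queue of a functional unit of the matching type. The opcode table and the function name `route_to_units` are assumptions for illustration, not part of the lecture.

```python
from collections import defaultdict

# Hypothetical mapping from opcode to functional unit type.
FU_FOR_OPCODE = {
    "add": "integer",
    "sub": "integer",
    "fmul": "floating point",
    "load": "load/store",
    "store": "load/store",
    "beq": "branch",
}


def route_to_units(decoded):
    """Send each decoded instruction to the queue of its functional unit type,
    preserving program order within each queue."""
    queues = defaultdict(list)
    for instr in decoded:
        opcode = instr.split()[0]
        queues[FU_FOR_OPCODE[opcode]].append(instr)
    return dict(queues)


queues = route_to_units([
    "load r1, 0(r2)",
    "add r3, r1, r4",
    "fmul f1, f2, f3",
    "beq r3, r0, done",
    "sub r5, r3, r1",
])
for unit, instrs in queues.items():
    print(unit, instrs)
```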


SLIDE 9

Necessity of Instruction Dispatch

SLIDE 10

Centralized Reservation Station

SLIDE 11

Distributed Reservation Station

SLIDE 12

Issues in Instruction Execution

Current trends
More parallelism → bypassing very challenging
Deeper pipelines
More diversity
Functional unit types
Integer
Floating point
Load/store → most difficult to make parallel
Branch
Specialized units (media)
Very wide datapaths (256 bits/register or more)


SLIDE 13

Bypass Networks

O(n²) interconnect from/to FU inputs and outputs
Associative tag-match to find operands
Solutions (hurt IPC, help cycle time)
– Use RF only (IBM Power4) with no bypass network
– Decompose into clusters (Alpha 21264)

[Figure: processor block diagram. Front end: PC, I-Cache, BR Scan, BR Predict, Fetch Q, Decode, Reorder Buffer. Issue queues: BR/CR, FX/LD 1, FX/LD 2, FP. Units: CR, BR, FX1, LD1, FX2, LD2, FP1, FP2. Memory: StQ, D-Cache.]
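The quadratic growth of the bypass network can be made concrete with a rough count. The sketch assumes that every FU output must reach both operand inputs of every FU, and that a clustered design keeps full bypassing only inside each cluster; inter-cluster paths, which a real clustered design still needs in some slower form, are ignored. The function names are hypothetical.

```python
def bypass_paths(num_fus, inputs_per_fu=2):
    """Forwarding paths when every FU output feeds every FU input."""
    return num_fus * num_fus * inputs_per_fu


def clustered_bypass_paths(num_fus, clusters, inputs_per_fu=2):
    """Paths when full bypassing exists only within each cluster.
    Assumes num_fus divides evenly; cross-cluster paths are not counted."""
    per_cluster = num_fus // clusters
    return clusters * bypass_paths(per_cluster, inputs_per_fu)


for n in (2, 4, 8):
    print(n, bypass_paths(n))          # 2 -> 8, 4 -> 32, 8 -> 128

print(clustered_bypass_paths(8, 2))    # 64
```

Doubling the number of functional units quadruples the path count, while splitting eight units into two clusters halves it, at the price of slower communication between clusters.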
