Superscalar Design: An Introduction

SLIDE 1

CADSL

Superscalar Design: An Introduction

Virendra Singh
Associate Professor
Computer Architecture and Dependable Systems Lab
Department of Electrical Engineering
Indian Institute of Technology Bombay

http://www.ee.iitb.ac.in/~viren/
E-mail: viren@ee.iitb.ac.in

EE-739: Processor Design

Lecture 24 (12 March 2013)

SLIDE 2

Superscalar Pipeline Stages

[Figure: superscalar pipeline. Stages: Fetch, Decode, Dispatch, Execute, Complete, Retire. Buffers between stages: Instruction Buffer, Dispatch Buffer, Issuing Buffer, Completion Buffer, Store Buffer. Ordering labels: In Program Order (twice), Out of Order.]

14 Mar 2013 EE-739@IITB 2

SLIDE 3

Superscalar Architecture

  • Wide pipelines to exploit ILP
  • ILP is not necessarily exploited by widening the pipelines and adding more resources
  • Processor policies for fetching, decoding, and executing instructions have a significant effect on the processor’s ability to discover instructions that can be executed concurrently
  • Instruction issue policy limits or enhances performance because it determines the processor’s lookahead capability
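
The effect of issue policy on lookahead can be illustrated with a toy issue model. This is a minimal sketch and not part of the lecture: the names Instr, producers, and total_cycles, the latencies, and the four-instruction program are all hypothetical. It assumes register renaming, so only true (read-after-write) dependences are tracked.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    dest: str      # destination register
    srcs: tuple    # source registers
    latency: int   # cycles until the result is available (assumed >= 1)


def producers(program):
    """For each instruction, list the indices of the earlier instructions it depends on."""
    last_writer = {}
    deps = []
    for i, ins in enumerate(program):
        deps.append([last_writer[s] for s in ins.srcs if s in last_writer])
        last_writer[ins.dest] = i
    return deps


def total_cycles(program, width, window, in_order):
    """Cycle at which the last result is available in a toy issue model.

    Each cycle the oldest `window` unissued instructions are examined and at
    most `width` ready ones are issued. With in_order=True, the scan stops at
    the first instruction that is not ready.
    """
    deps = producers(program)
    done_at = {}                         # instruction index -> cycle its result is ready
    pending = list(range(len(program)))  # unissued instructions, oldest first
    cycle = 0
    while pending:
        issued = []
        for i in pending[:window]:
            if len(issued) == width:
                break
            if all(done_at.get(p, float("inf")) <= cycle for p in deps[i]):
                issued.append(i)
            elif in_order:
                break  # a stalled instruction blocks all younger ones
        for i in issued:
            done_at[i] = cycle + program[i].latency
        pending = [i for i in pending if i not in issued]
        cycle += 1
    return max(done_at.values(), default=0)


# Hypothetical program: two independent load/add pairs.
program = [
    Instr("r1", (), 3),       # load
    Instr("r2", ("r1",), 1),  # add, needs the first load
    Instr("r3", (), 3),       # independent load
    Instr("r4", ("r3",), 1),  # add, needs the second load
]

in_order_cycles = total_cycles(program, width=2, window=4, in_order=True)
out_of_order_cycles = total_cycles(program, width=2, window=4, in_order=False)
```

With the same width and window, in-order issue takes 7 cycles in this model because the stalled add blocks the independent load behind it, while out-of-order issue takes 4. The hardware width is identical in both runs; only the issue policy differs.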

SLIDE 4

Issues in Decoding

  • Primary tasks
    – Identify individual instructions (!)
    – Determine instruction types
    – Determine dependences between instructions
  • Two important factors
    – Instruction set architecture
    – Pipeline width
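
The dependence-detection task, and why pipeline width matters for it, can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the lecture's method: Decoded, raw_dependences, and comparator_count are hypothetical names, the instruction group is made up, and the comparator estimate assumes two source registers per instruction with every source compared against every earlier destination in the group.

```python
from typing import NamedTuple


class Decoded(NamedTuple):
    op: str
    dest: str
    srcs: tuple


def raw_dependences(group):
    """Return (producer_index, consumer_index, register) for each
    read-after-write dependence inside one decode group."""
    found = []
    for j, consumer in enumerate(group):
        for src in consumer.srcs:
            # Search backwards for the nearest earlier writer of this source.
            for i in range(j - 1, -1, -1):
                if group[i].dest == src:
                    found.append((i, j, src))
                    break
    return found


def comparator_count(width, srcs_per_instr=2):
    """Register comparisons needed to check all pairs in a group of `width`."""
    return srcs_per_instr * width * (width - 1) // 2


# Hypothetical 4-wide decode group.
group = [
    Decoded("add", "r1", ("r2", "r3")),
    Decoded("sub", "r4", ("r1", "r5")),
    Decoded("mul", "r1", ("r4", "r2")),
    Decoded("or",  "r6", ("r1", "r2")),
]

deps = raw_dependences(group)
# deps == [(0, 1, "r1"), (1, 2, "r4"), (2, 3, "r1")]
```

Under these assumptions comparator_count gives 2 comparisons for a 2-wide group, 12 for 4-wide, and 56 for 8-wide, so the checking cost grows roughly with the square of the pipeline width.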

SLIDE 5

Pentium Pro Fetch/Decode

SLIDE 6

Predecoding in the AMD K5

SLIDE 7

Instruction Dispatching

  • Diversified pipeline
  • Different instruction types are executed by different functional units (FUs) in different pipelines
  • Distributed control
  • Operands are fetched from the register file (RF)
  • Operands may not be available yet
  • Reservation stations hold instructions until their operands are available
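
The role of a reservation station can be sketched in a few lines of Python. This is a simplified illustration, not the design of any particular processor: the class and method names are hypothetical, register names double as result tags (no renaming), and each register is assumed to have at most one in-flight writer.

```python
class ReservationStation:
    """Holds dispatched instructions until all their operands are available."""

    def __init__(self):
        # Each entry: {"op": str, "dest": str, "operands": {reg: value or None}}
        self.entries = []

    def dispatch(self, op, dest, srcs, regfile, pending):
        """Read operands from the register file; a source still being
        produced (listed in `pending`) is recorded as None."""
        operands = {r: (None if r in pending else regfile[r]) for r in srcs}
        self.entries.append({"op": op, "dest": dest, "operands": operands})
        pending.add(dest)  # dest is now awaiting this instruction's result

    def broadcast(self, reg, value):
        """A functional unit finished: forward the result to waiting entries."""
        for entry in self.entries:
            if reg in entry["operands"] and entry["operands"][reg] is None:
                entry["operands"][reg] = value

    def issue_ready(self):
        """Remove and return the entries whose operands are all present."""
        ready, waiting = [], []
        for entry in self.entries:
            complete = all(v is not None for v in entry["operands"].values())
            (ready if complete else waiting).append(entry)
        self.entries = waiting
        return ready


# Hypothetical state: r3 is still being produced by an earlier instruction.
regfile = {"r1": 10, "r2": 20, "r3": 0, "r4": 0}
pending = {"r3"}

rs = ReservationStation()
rs.dispatch("add", "r4", ("r1", "r3"), regfile, pending)
before = rs.issue_ready()   # nothing can issue: r3 is missing
rs.broadcast("r3", 5)       # the producer of r3 completes
after = rs.issue_ready()    # the add now has both operands
```

The add is dispatched even though r3 is not yet available, so decode and dispatch can keep moving; the instruction simply waits in the station until the missing operand is broadcast.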

SLIDE 8

Instruction Dispatch and Issue

  • Parallel pipeline
  • Centralized instruction fetch
  • Centralized instruction decode
  • Diversified pipeline
  • Distributed instruction execution

SLIDE 9

Necessity of Instruction Dispatch

SLIDE 10

Centralized Reservation Station

SLIDE 11

Distributed Reservation Station

SLIDE 12

Issues in Instruction Execution

  • Current trends
    – More parallelism → bypassing very challenging
    – Deeper pipelines
    – More diversity
  • Functional unit types
    – Integer
    – Floating point
    – Load/store → most difficult to make parallel
    – Branch
    – Specialized units (media)
      – Very wide datapaths (256 bits/register or more)

SLIDE 13

Bypass Networks

  • O(n²) interconnect from/to FU inputs and outputs
  • Associative tag-match to find operands
  • Solutions (hurt IPC, help cycle time)
    – Use RF only (IBM Power4) with no bypass network
    – Decompose into clusters (Alpha 21264)
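
The quadratic growth, and what clustering buys, can be seen from a simple path count. This is a back-of-the-envelope sketch in Python with hypothetical function names. It assumes two inputs per functional unit, one result per unit, and full bypassing only inside a cluster; it ignores multiple bypass stages and any slower paths between clusters.

```python
def bypass_paths(num_fus, inputs_per_fu=2):
    """Forwarding paths when every FU output can feed every FU input."""
    return num_fus * num_fus * inputs_per_fu


def clustered_bypass_paths(num_fus, num_clusters, inputs_per_fu=2):
    """Paths when FUs are split evenly into clusters and results are
    bypassed only within a cluster."""
    if num_fus % num_clusters:
        raise ValueError("num_fus must divide evenly into clusters")
    per_cluster = num_fus // num_clusters
    return num_clusters * bypass_paths(per_cluster, inputs_per_fu)


full_4 = bypass_paths(4)                    # 32 paths
full_8 = bypass_paths(8)                    # 128 paths: 2x the units, 4x the paths
clustered_8 = clustered_bypass_paths(8, 2)  # 64 paths with two clusters of four
```

In this model, splitting eight units into two clusters halves the number of full-speed paths. The cost is that a result needed in the other cluster arrives later or not through the bypass at all, which is consistent with the slide's note that these solutions hurt IPC while helping cycle time.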

[Figure: pipeline block diagram. Front end: PC, I-Cache, BR Scan, BR Predict, Fetch Q, Decode, Reorder Buffer. Issue queues: BR/CR, FX/LD 1, FX/LD 2, FP. Units: CR, BR, FX1, LD1, LD2, FX2, FP1, FP2. Memory: StQ, D-Cache.]