CSE 291E / EE260C Spring 2002

SLIDE 1

Program In – Chip Out

CSE 291E / EE260C Spring 2002

SLIDE 2

Tim Sherwood 2

Overview

  • Quick review of basic architectures

– What are Single Issue, Superscalar, and VLIW?

  • Overview of Systolic Arrays
  • Overview of PICO Project
  • DataWidth Reduction Algorithm

SLIDE 3

Architecture Review

  • Code Segment

for (n = 0; n < 100; n++) {
    A[n+1] = A[n]*x[n];
    B[n+1] = B[n]*y[n] + A[n];
    C[n+1] = C[n]*z[n] + B[n];
}

  • How does this map on different architectures?

– In-order Single Issue
– Superscalar
– VLIW

SLIDE 4

In-Order Single Issue

1) A[n+1] = A[n]*x[n]
2) r1 = B[n]*y[n]
3) B[n+1] = r1 + A[n]
4) r2 = C[n]*z[n]
5) C[n+1] = r2 + B[n]

[Figure: issue timeline, one instruction per cycle: 1, 2, 3, 4, 5, then 1, 2 of the next iteration]

SLIDE 5

Superscalar

1) A[n+1] = A[n]*x[n]
2) r1 = B[n]*y[n]
3) B[n+1] = r1 + A[n]
4) r2 = C[n]*z[n]
5) C[n+1] = r2 + B[n]

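[Figure: superscalar issue timeline for instructions 1 to 5 across successive iterations, with more than one instruction issuing in some cycles]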

SLIDE 6

VLIW

1:2) A[n+1] = A[n]*x[n] : r1 = B[n]*y[n]
3:4) B[n+1] = r1 + A[n] : r2 = C[n]*z[n]
5) C[n+1] = r2 + B[n] : NOP

[Figure: VLIW issue timeline, one instruction word per cycle: 1 : 2, 3 : 4, 5 : NOP, then repeating for the next iteration]

SLIDE 7

Systolic Arrays

  • Where does the name “Systolic Array” come from?

– Array: to set or place in order
– Systolic: a rhythmically recurrent contraction; especially the contraction of the heart by which the blood is forced onward and the circulation kept up

  • What is a Systolic Array?

– A network of processing elements (PEs) that rhythmically compute and pass data through the system

SLIDE 8

Systolic Arrays

  • All PEs are uniform and fully pipelined (usually)
  • Only local interconnection (nearest neighbor)
  • Some relaxations are introduced to increase the utility of systolic arrays:

– Neighbor interconnection (near, but not nearest)
– Data broadcast operations
– Different PEs, especially at the boundaries

SLIDE 9

Data Graphs for Systolic Arrays

  • Example: dynamic programming

SLIDE 10

Walking the Data Graph

SLIDE 11

Building the Array

[Figure: a linear array of three PEs]

SLIDE 12

PICO

  • Program In Chip Out (PICO)

– Architecture synthesis system from HP
– Work done by Bob Rau’s group
– Input: Application written in a subset of C

  • No complex pointers
  • No wacky array indexing

– Metric: Chip area and performance
– Output: H/W as VHDL & S/W as binary
– Generates Pareto-optimal architectures

SLIDE 13

Pareto Optimality

  • For a set of design points, a given design is Pareto optimal if:

– No other design is at least as good in every evaluation metric and strictly better in at least one
– This means there can be multiple Pareto-optimal points

[Figure: design points plotted on delay and area axes]

SLIDE 14

PICO Architecture

SLIDE 15

PICO Design Framework

SLIDE 16

PICO Design Flow

SLIDE 17

PICO NPA Design

SLIDE 18

PICO Analysis

L1: x = a + 1
L2: y = x * b
Loop:
L3: y = y + 1
    If () goto loop
L4: z = y + c

SLIDE 19

PICO Datawidth Analysis