

SLIDE 1

Architectural Synthesis and Exploration using Term Rewriting Systems

Arvind, James C. Hoe
Laboratory for Computer Science
Massachusetts Institute of Technology
http://www.csg.lcs.mit.edu
SLIDE 2

NTT, January 12, 2000, Slide 2 Arvind, MIT Lab for Computer Science

Outline

• Introduction
• Term Rewriting Systems (TRS) as a Hardware Description Language
• Hardware Synthesis from Term Rewriting Systems
• Results

SLIDE 3

Internet/Communication Space

• Rapidly changing functionality and performance requirements necessitate rapid hardware development
  - ATM, frame-relay, Gigabit Ethernet, packet-over-SONET protocols
  - voice-over-IP, video, streaming data, QoS issues dominant
  - merger of LAN and WAN infrastructures
• Currently addressed by
  - General-purpose or embedded processors + ASICs
  - Network processors (emerging)

ASIC development time and cost is the limiting factor in product release

SLIDE 4

Current ASIC Design Flow

[Figure: flow diagram with the stages Informal Architectural Spec, High-level C Simulators, RTL Implementation, Synthesis/Optimization, Fab, and ASICs. The manual steps are annotated: verification nightmare, labor intensive, time consuming, error prone.]

Time pressure means: little architecture exploration & high technology risk

SLIDE 5

Our New Design Technology

• Reduces time to market
  - Faster design capture
  - Same specification for simulation, verification and synthesis
  - Rapid feedback ⇒ architectural exploration
• Enables rapid development of a large variety of chips with related designs ⇒ complex systems-on-a-chip
• Reduces manpower requirement

Makes designing hardware as commonplace as writing software

SLIDE 6

State-Centric Descriptions

Schematics

[Figure: schematic with registers a and b (each with a ce input), a "<" comparator, an "=0" test, and signals labeled πMod, πFlip, δMod,a, δFlip,a, δFlip,b]

Hardware description languages

    always @ (posedge Clk) begin
      if (a >= b) begin
        a <= a - b;
        b <= b;
      end else begin
        a <= b;
        b <= a;
      end
    end

what does it describe?

SLIDE 7

Operation-Centric Descriptions

Euclid’s Algorithm

(Rule1)  Gcd(a, b) if b≠0  ⇒  Gcd(b, Rem(a, b))
(Rule2)  Gcd(a, 0)         ⇒  a
(Rule3)  Rem(a, b) if a<b  ⇒  a
(Rule4)  Rem(a, b) if a≥b  ⇒  Rem(a-b, b)

Execution:

Gcd(2,4)  ⇒ (R1)  Gcd(4,Rem(2,4))
          ⇒ (R3)  Gcd(4,2)
          ⇒ (R1)  Gcd(2,Rem(4,2))
          ⇒ (R4)  Gcd(2,Rem(2,2))
          ⇒ (R4)  Gcd(2,Rem(0,2))
          ⇒ (R3)  Gcd(2,0)
          ⇒ (R2)  2

Hardware description?
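As a minimal sketch, the four Gcd/Rem rules can be run as a small rewriting interpreter. The tuple encoding of terms and the names `step` and `normalize` are assumptions made for this illustration; they do not come from the presentation. Arguments are rewritten before the enclosing term so that the rule guards always compare plain integers.

```python
# Terms are ints or tuples ("Gcd", x, y) / ("Rem", x, y), where x and y
# may themselves be terms. This encoding is an illustrative assumption.

def step(term):
    """Apply one rule to the innermost reducible subterm.

    Returns (new_term, rule_name), or None if the term is an int (normal form).
    """
    if isinstance(term, int):
        return None
    op, x, y = term
    # Rewrite arguments first so the guards below see plain integers.
    for i, arg in ((1, x), (2, y)):
        result = step(arg)
        if result is not None:
            new_arg, rule = result
            new_term = list(term)
            new_term[i] = new_arg
            return tuple(new_term), rule
    if op == "Gcd":
        if y != 0:
            return ("Gcd", y, ("Rem", x, y)), "R1"
        return x, "R2"
    if op == "Rem":
        if x < y:
            return x, "R3"
        return ("Rem", x - y, y), "R4"
    raise ValueError(f"unknown operator {op!r}")


def normalize(term):
    """Rewrite until no rule applies; return the result and the rules used."""
    trace = []
    while True:
        result = step(term)
        if result is None:
            return term, trace
        term, rule = result
        trace.append(rule)


value, rules = normalize(("Gcd", 2, 4))
print(value, rules)
```

For Gcd(2,4) the recorded rule sequence is R1, R3, R1, R4, R4, R3, R2, ending in the value 2.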

SLIDE 8

Operation-Centric Description: MIPS

MIPS Microprocessor Manual

ADD rd, rs, rt
    GPR[rd] ← GPR[rs] + GPR[rt]
    PC ← PC + 4

SLIDE 9

TRS as a Hardware Description Language

SLIDE 10

Term Rewriting System

An operation-centric view of the world

System ≡ Structure + Behavior

TRS ≡ <A, R>
    A: a set of terms (hierarchically organized state elements)
    R: a set of rewriting rules (state transitions)

SLIDE 11

TRS Execution Semantics

Given a set of rules and an initial term s

While ( some rules are applicable to s ) {
    ♦ choose an applicable rule (non-deterministic)
    ♦ apply the rule atomically to s
}
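The execution loop can be sketched directly in Python. In this sketch a rule is a hypothetical (guard, action) pair, `run_trs` is an illustrative name, and a random pick stands in for the non-deterministic choice. The `max_steps` bound is an added safety net, since a TRS need not terminate.

```python
import random

def run_trs(rules, state, rng=random, max_steps=10_000):
    """Repeatedly fire one applicable rule, chosen at random, until none applies.

    `rules` is a list of (guard, action) pairs: guard(state) -> bool says
    whether the rule is applicable, action(state) -> new state applies it.
    """
    for _ in range(max_steps):
        applicable = [action for guard, action in rules if guard(state)]
        if not applicable:
            return state                    # normal form: no rule applies
        action = rng.choice(applicable)     # non-deterministic choice
        state = action(state)               # one atomic transition
    raise RuntimeError("no normal form reached within max_steps")

# Hypothetical example: the state is a pair (a, b). The two rules compute
# a gcd by subtracting and swapping, echoing the Verilog fragment on slide 6.
rules = [
    (lambda s: s[0] >= s[1] and s[1] != 0, lambda s: (s[0] - s[1], s[1])),  # "Mod"
    (lambda s: s[0] < s[1],                lambda s: (s[1], s[0])),         # "Flip"
]
print(run_trs(rules, (2, 4)))
```

In this example the two guards are mutually exclusive, so the random choice never has more than one candidate and the result does not depend on the random source.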

SLIDE 12

Architectural Description

[Figure: processor block diagram with PC, +1, PROG, RF, ALU, BF, Iport, and Oport]

SLIDE 13

AX Architectural Description

Abstract Datatypes

Type SYS   = Sys( PROC, IPORT, OPORT )
Type PROC  = Proc( PC, RF, PROG, BF )
Type PC    = Bit[16]
Type RF    = Array[RNAME] VAL
Type RNAME = Reg0 || Reg1 || Reg2 || . . .
Type VAL   = Bit[16]
Type PROG  = Array[PC] INST
Type BF    = Fifo INST_D
Type IPORT = Iport VAL
Type OPORT = Oport VAL

[Figure: processor block diagram with PC, +1, PROG, RF, ALU, BF, Iport, and Oport]

SLIDE 14

AX Instruction Set

Type INST = Loadi (RD, VAL)
         || Loadpc (RD)
         || Add (RD, R1, R2)
         || Sub (RD, R1, R2)
         || . . .
         || Bz (RA, RC)
         || MovToO (R1)
         || MovFromI (RD)

Decoded instructions

Type INST_D = Addd (RD, V1, V2) || ...

RD, RA, etc. are RNAME’s. V1, V2, etc. are values.

SLIDE 15

AX Processor Model: Fetch Rules

Fetch Add Rule

Proc( pc, rf, prog, bf )
    if r1∉target(bf) ∧ r2∉target(bf)
    where Add(r, r1, r2) = prog[pc]
⇒ Proc( pc+1, rf, prog, enq(bf, Addd(r, rf[r1], rf[r2])) )

[Figure: processor block diagram with PC, +1, PROG, RF, ALU, BF, Iport, and Oport]

SLIDE 16

AX Processor Model: Execute Rules

“Execute Add”

Proc( pc, rf, prog, bf )
    where Addd(r, v1, v2) = first(bf)
⇒ Proc( pc, rf[r:=v1+v2], prog, deq(bf) )

Fetch Add rule, repeated for comparison:

Proc( pc, rf, prog, bf )
    if r1∉target(bf) ∧ r2∉target(bf)
    where Add(r, r1, r2) = prog[pc]
⇒ Proc( pc+1, rf, prog, enq(bf, Addd(r, rf[r1], rf[r2])) )

[Figure: processor block diagram with PC, +1, PROG, RF, ALU, BF, Iport, and Oport]
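The Fetch Add and Execute Add rules can be sketched as two Python functions over a `Proc` record. The encodings are assumptions made for this illustration: instructions are tuples, the register file is a dict, PROG is a dict keyed by PC, BF is a tuple used as a FIFO, and values wrap at 16 bits as one reading of `VAL = Bit[16]`. Each function returns `None` when its rule does not apply.

```python
from collections import namedtuple

# Illustrative encodings (assumptions, not from the slides).
Proc = namedtuple("Proc", "pc rf prog bf")

def target(bf):
    """Destination registers of the decoded instructions waiting in bf."""
    return {inst[1] for inst in bf}

def fetch_add(p):
    """Fetch Add rule; returns the next state, or None if not applicable."""
    if p.pc not in p.prog:
        return None
    inst = p.prog[p.pc]
    if inst[0] != "Add":
        return None
    _, r, r1, r2 = inst
    if r1 in target(p.bf) or r2 in target(p.bf):
        return None                       # stall: a source is still pending
    decoded = ("Addd", r, p.rf[r1], p.rf[r2])
    return p._replace(pc=p.pc + 1, bf=p.bf + (decoded,))

def execute_add(p):
    """Execute Add rule; returns the next state, or None if not applicable."""
    if not p.bf or p.bf[0][0] != "Addd":
        return None
    _, r, v1, v2 = p.bf[0]
    rf = dict(p.rf)
    rf[r] = (v1 + v2) % 2**16             # assumed 16-bit wraparound
    return p._replace(rf=rf, bf=p.bf[1:])

prog = {0: ("Add", "Reg2", "Reg0", "Reg1"), 1: ("Add", "Reg3", "Reg2", "Reg2")}
p0 = Proc(pc=0, rf={"Reg0": 3, "Reg1": 4, "Reg2": 0, "Reg3": 0}, prog=prog, bf=())
p1 = fetch_add(p0)             # first Add enters BF
stalled = fetch_add(p1)        # second Add reads Reg2, a pending target
p2 = execute_add(p1)           # Reg2 := 3 + 4
p3 = execute_add(fetch_add(p2))
print(p1.bf, stalled, p2.rf["Reg2"], p3.rf["Reg3"])
```

The second fetch is refused while Reg2 is still a target in BF, which is the stall condition expressed by the `∉ target(bf)` guard.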

SLIDE 17

TRS as an HDL

• Clean, expressive, precise and concise
  - speculative & superscalar microarchitectures [IEEE Micro, June ’99]
  - memory models & cache coherence protocols [ISCA99, ICS99]
• Supports parallel and non-deterministic specifications
• The correctness of a TRS can be verified against a reference TRS specification
• Some pipelining can be done automatically as a source-to-source transformation on TRS’s
• Superscalar versions of TRS’s can be derived mechanically from pipelined TRS’s

SLIDE 18

Synthesis from TRS’s

SLIDE 19

From TRS to Synchronous FSM

• Extract state elements (registers) from the type declaration
• Extract state transition logic from the rules

[Figure: synchronous FSM block diagram with States, Transition Logic, input I, output O, current state S, and next state S“Next”]

SLIDE 20

Rule: As a State Transformer

[Figure: the current state (PC, RF, PROG, BF) feeds π, which produces the enable signal, and δ, which produces the next state values (PC’, RF’, PROG’, BF’)]

Proc( pc, rf, prog, bf )
    where Bzd(va, 0) = first(bf)
⇒ Proc( va, rf, prog, clear(bf) )
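One way to read the state-transformer view is that each rule splits into a predicate π (is the rule enabled in this state?) and a function δ (what are the next state values?). The sketch below does this for the branch-taken rule above. The dict-based state, the tuple encoding `("Bzd", va, vc)`, and the function names are assumptions for illustration.

```python
def pi_bz_taken(state):
    """π: enable signal, true when the rule's pattern matches the state."""
    bf = state["bf"]
    return bool(bf) and bf[0][0] == "Bzd" and bf[0][2] == 0

def delta_bz_taken(state):
    """δ: next-state values, meaningful only when π is true."""
    va = state["bf"][0][1]
    return {"pc": va, "rf": state["rf"], "prog": state["prog"], "bf": []}

def clock(state, pi, delta):
    """One clock edge: latch δ(state) if π(state) enables it, else hold."""
    return delta(state) if pi(state) else state

taken = {"pc": 5, "rf": {}, "prog": {},
         "bf": [("Bzd", 40, 0), ("Addd", "Reg1", 1, 2)]}
not_taken = {"pc": 5, "rf": {}, "prog": {}, "bf": [("Bzd", 40, 7)]}

after_taken = clock(taken, pi_bz_taken, delta_bz_taken)
after_not_taken = clock(not_taken, pi_bz_taken, delta_bz_taken)
print(after_taken["pc"], after_taken["bf"], after_not_taken["pc"])
```

When the branch condition value is zero, PC is redirected to `va` and BF is cleared; otherwise this particular rule leaves the state untouched.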

SLIDE 21

Reference Implementation

• Synchronous state elements
• Single transition per clock cycle

[Figure: state element interfaces. Register R: D, LE, Q. Array A: WA, WD, WE, RA1, RA2, RA3, RD1, RD2, RD3. FIFO F: ED, EE, first, DE, CE, _full, _empty.]

SLIDE 22

Scheduler

[Figure: scheduler block with inputs π1, π2, ..., πn and outputs φ1, φ2, ..., φn]

1. φi ⇒ πi
2. π1 ∨ π2 ∨ .... ∨ πn ⇒ φ1 ∨ φ2 ∨ .... ∨ φn
3. One-rule-at-a-time ⇒ at most one φi is true
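A fixed-priority scheduler is one simple policy that meets all three conditions; the slides do not say which policy is actually used, so the priority order here is an assumption. The sketch grants the lowest-numbered enabled rule and then checks the three conditions exhaustively for four rules.

```python
from itertools import product

def priority_scheduler(pis):
    """Grant the lowest-numbered enabled rule; all other φ are false."""
    phis = [False] * len(pis)
    for i, enabled in enumerate(pis):
        if enabled:
            phis[i] = True
            break
    return phis

def satisfies_conditions(pis, phis):
    """Check the three scheduler conditions for one input/output pair."""
    c1 = all(pi or not phi for pi, phi in zip(pis, phis))  # φi ⇒ πi
    c2 = (not any(pis)) or any(phis)                       # some π ⇒ some φ
    c3 = sum(phis) <= 1                                    # at most one φi
    return c1 and c2 and c3

# Exhaustive check over all 16 combinations of four enable signals.
all_ok = all(
    satisfies_conditions(pis, priority_scheduler(pis))
    for pis in product([False, True], repeat=4)
)
print(all_ok)
```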
SLIDE 23

Combining Logic from Multiple Rules

[Figure: next state values from different rules (δ0,PC, δ1,PC, ..., δn,PC) feed a selector (sel) that produces the next state value PC’; latch enables from different rules (φ0, φ1, ..., φn) are ORed to form the latch enable]
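The combining logic for a single register can be sketched as a one-hot select plus an OR of the enables. The function name `next_pc` is hypothetical, and the sketch assumes the lists cover only the rules that write PC and that the scheduler grants at most one of them per cycle.

```python
def next_pc(pc, deltas, phis):
    """Combine per-rule next-state values for the PC register.

    deltas[i] is rule i's proposed next PC (δi,PC); phis[i] is the
    scheduler's grant for rule i (φi). Assumes at most one φi is true.
    """
    latch_enable = any(phis)                # OR of the per-rule enables
    selected = pc
    for delta, phi in zip(deltas, phis):    # one-hot select among the δ values
        if phi:
            selected = delta
    return selected if latch_enable else pc  # hold the old value if not enabled

print(next_pc(10, [11, 40, 0], [False, True, False]))
print(next_pc(10, [11, 40, 0], [False, False, False]))
```

With rule 1 granted, the register latches that rule's value; with no grant, the latch enable is low and PC keeps its current value.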

SLIDE 24

Performance Considerations

• Concurrent Execution
  - Statically determine which transitions can be safely executed concurrently
  - Generate a scheduler and update logic that allows as many concurrent transitions as possible

Caution: Concurrent firing of two rules can violate one-transition-at-a-time semantics if, for example, firing of one rule disables the other

Conflict-free rules
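The caution above can be made concrete with a small check. The sketch below uses one plausible reading of "conflict-free": whenever both rules are enabled, firing either one leaves the other enabled, and both firing orders reach the same state. This definition, the brute-force check over sample states, and the example rules are all assumptions for illustration; the slides call for a static analysis, which this is not.

```python
def conflict_free_on(states, rule_a, rule_b):
    """Check over sample states that rule_a and rule_b never interfere.

    Each rule is a (guard, action) pair. For every state where both guards
    hold, firing either rule must leave the other enabled, and both firing
    orders must reach the same state.
    """
    (ga, fa), (gb, fb) = rule_a, rule_b
    for s in states:
        if ga(s) and gb(s):
            sa, sb = fa(s), fb(s)
            if not gb(sa) or not ga(sb):
                return False          # one rule disabled the other
            if fb(sa) != fa(sb):
                return False          # the two orders disagree
    return True

# Hypothetical rules over a state (x, y).
dec_x = (lambda s: s[0] > 0, lambda s: (s[0] - 1, s[1]))   # x := x - 1 if x > 0
inc_y = (lambda s: s[1] < 3, lambda s: (s[0], s[1] + 1))   # y := y + 1 if y < 3
y_if_x = (lambda s: s[0] > 0, lambda s: (s[0], s[1] + 1))  # y := y + 1 if x > 0

states = [(x, y) for x in range(4) for y in range(4)]
print(conflict_free_on(states, dec_x, inc_y))    # independent variables
print(conflict_free_on(states, dec_x, y_if_x))   # dec_x can disable y_if_x
```

The second pair fails at x = 1: after `dec_x` fires, x is 0 and `y_if_x` is no longer enabled, so firing both in one cycle would not match any one-at-a-time execution.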

SLIDE 25

Quality of Synthesis

SLIDE 26

TRAC Synthesis Flow

[Figure: flow diagram with the labels Design SPEC, Transform, Compile, RTL, C, RTL Sim, C Sim, Synopsys, Std Cell, Gate Array, and FPGA]

SLIDE 27

Performance: TRS vs. Verilog

32-bit MIPS Integer Core
Dan Rosenband & James Hoe

              CBA tc6a                        LSI 10K
              Area (cells)  Clock             Area (gates)  Clock
TRS           9521          10ns    100MHz    30756         19.48ns  51MHz
Verilog RTL   8960          11.4ns  88MHz     29483         23.79ns  42MHz

TRS: 1 day    Verilog: 1 month
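The clock columns pair a period with a frequency, related by f [MHz] = 1000 / T [ns]. The short sketch below recomputes the frequencies from the periods in the table and also derives the TRS-to-Verilog area ratio per library. The ratios are computed here for convenience and are not stated on the slide.

```python
# (area, clock period in ns) copied from the table above.
results = {
    "CBA tc6a": {"TRS": (9521, 10.0), "Verilog RTL": (8960, 11.4)},
    "LSI 10K": {"TRS": (30756, 19.48), "Verilog RTL": (29483, 23.79)},
}

def mhz(period_ns):
    """Clock frequency in MHz for a period given in nanoseconds."""
    return 1000.0 / period_ns

for library, rows in results.items():
    trs_area, trs_period = rows["TRS"]
    rtl_area, rtl_period = rows["Verilog RTL"]
    print(library,
          f"TRS {mhz(trs_period):.0f} MHz,",
          f"Verilog {mhz(rtl_period):.0f} MHz,",
          f"area ratio TRS/Verilog {trs_area / rtl_area:.3f}")
```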

SLIDE 28

Architectural Derivatives

[Figure: non-pipelined, 2-stage, and 3-stage versions of the processor; block labels include PC, +1, PROG, RF, ALU, BF 1, BF, MOUT, and MIN]

Other Dimensions: Superscalar, Custom Instructions, Number of Registers, Word Size ...

SLIDE 29

Derivatives and Feedback

• Derivatives of a 32-bit 4-GPR embedded RISC processor
• Synopsys RTL Analyzer reports GTECH area and gate delays (no wiring or load model)

               simple   2-stage        3-stage       3-stage, 2-way
Delay          30+X     max(18+X,25)   max(6+X,25)   max(8+X,31)
Delay (X=20)   50       38             26            31
Area           4334     5753           6378          9492

unit area = 1 NAND, unit delay = 1 NAND
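The delay row gives each design's critical path as a function of the parameter X, and the X=20 row follows by substitution. A minimal sketch that evaluates the four formulas, so that other values of X can be tried; the dictionary names are illustrative:

```python
# Delay formulas and areas copied from the table (units: NAND delays / NAND areas).
delay = {
    "simple": lambda x: 30 + x,
    "2-stage": lambda x: max(18 + x, 25),
    "3-stage": lambda x: max(6 + x, 25),
    "3-stage,2-way": lambda x: max(8 + x, 31),
}
area = {"simple": 4334, "2-stage": 5753, "3-stage": 6378, "3-stage,2-way": 9492}

def delays_at(x):
    """Evaluate every design's delay formula at a given X."""
    return {name: formula(x) for name, formula in delay.items()}

at_20 = delays_at(20)
for name in delay:
    print(f"{name:14s} delay={at_20[name]:3d} area={area[name]}")
```

At X=20 this reproduces the tabulated values 50, 38, 26, and 31. For the two 3-stage designs the constant term of the `max` dominates at that point, so lowering X below 20 would not shorten their delay further.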

SLIDE 30

Application: ASPN Chips

[Figure: chart with axes Performance and Flexibility, positioning ASIC, GP, NP, and ASPN]

Application-Specific Programmable Network (ASPN) chips are based on a core architecture and a set of domain-specific building blocks.

TRAC allows rapid customization of ASPN designs with ASIC-like performance for evolving needs and for different vertical markets within the communication space.