
Architectural Synthesis and Exploration using Term Rewriting Systems
Arvind
James C. Hoe
Laboratory for Computer Science
Massachusetts Institute of Technology
http://www.csg.lcs.mit.edu

Outline
♦ Introduction
♦ Term Rewriting Systems (TRS) as a Hardware Description Language
♦ Hardware Synthesis from Term Rewriting Systems
♦ Results
Arvind, MIT Lab for Computer Science
NTT, January 12, 2000, Slide 2

Internet/Communication Space
♦ Rapidly changing functionality and performance requirements necessitate rapid hardware development
  - ATM, frame-relay, Gigabit Ethernet, packet-over-SONET protocols
  - voice-over-IP, video, streaming data; QoS issues dominant
  - merger of LAN and WAN infrastructures
♦ Currently addressed by
  - general-purpose or embedded processors + ASICs
  - network processors (emerging)
ASIC development time and cost is the limiting factor in product release.

Current ASIC Design Flow
[Figure: design flow from an informal architectural spec through high-level C simulators and RTL implementation to synthesis/optimization, fab and ASICs. The manual steps are labeled labor-intensive, time-consuming and error-prone, and verification is a nightmare.]
Time pressure means little architecture exploration and high technology risk.

Our New Design Technology
♦ Reduces time to market
  - Faster design capture
  - Same specification for simulation, verification and synthesis
  - Rapid feedback ⇒ architectural exploration
♦ Enables rapid development of a large variety of chips with related designs ⇒ complex systems-on-a-chip
♦ Reduces manpower requirement
Makes designing hardware as commonplace as writing software.

State-Centric Descriptions
Hardware description languages:
    always @(posedge Clk) begin
        if (a >= b) begin
            a <= a - b;
            b <= b;
        end else begin
            a <= b;
            b <= a;
        end
    end
Schematics:
[Figure: the equivalent register-level schematic, with registers a and b, a subtractor, an a < b comparator, a b = 0 test and clock-enable (ce) signals; its logic is labeled π_Flip, π_Mod, δ_Mod,a, δ_Flip,a and δ_Flip,b.]
What does it describe?

Operation-Centric Descriptions
Euclid's Algorithm
    Gcd(a, b) if b ≠ 0  ⇒  Gcd(b, Rem(a, b))   (Rule 1)
    Gcd(a, 0)           ⇒  a                   (Rule 2)
    Rem(a, b) if a < b  ⇒  a                   (Rule 3)
    Rem(a, b) if a ≥ b  ⇒  Rem(a - b, b)       (Rule 4)
Execution:
    Gcd(2, 4)
      ⇒ Gcd(4, Rem(2, 4))   (R1)
      ⇒ Gcd(4, 2)           (R3)
      ⇒ Gcd(2, Rem(4, 2))   (R1)
      ⇒ Gcd(2, Rem(2, 2))   (R4)
      ⇒ Gcd(2, Rem(0, 2))   (R4)
      ⇒ Gcd(2, 0)           (R3)
      ⇒ 2                   (R2)
Hardware description?
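
The rewriting on this slide can be reproduced with a small term-rewriting interpreter. This is a minimal Python sketch under a few assumptions. Terms are encoded as nested tuples such as ("Gcd", 2, 4). All function names are illustrative. As a simplification, a rule fires only when both arguments of its Gcd or Rem node are already values.

```python
import random
from typing import Iterator, List, Optional, Tuple, Union

# A term is a value (int) or a node (op, left, right) with op in {"Gcd", "Rem"}.
Term = Union[int, tuple]
Path = Tuple[int, ...]  # child positions from the root: 1 = left, 2 = right


def rewrite_at_root(t: Term) -> Optional[Term]:
    """Apply whichever of Rules 1-4 matches the root of t, or return None."""
    if isinstance(t, int):
        return None
    op, a, b = t
    if not (isinstance(a, int) and isinstance(b, int)):
        return None  # simplification: only fire once both arguments are values
    if op == "Gcd":
        return ("Gcd", b, ("Rem", a, b)) if b != 0 else a  # Rule 1 / Rule 2
    if op == "Rem":
        return a if a < b else ("Rem", a - b, b)  # Rule 3 / Rule 4
    return None


def redexes(t: Term, path: Path = ()) -> Iterator[Tuple[Path, Term]]:
    """Yield (path, rewritten subterm) for every place where a rule applies."""
    if isinstance(t, int):
        return
    result = rewrite_at_root(t)
    if result is not None:
        yield path, result
    for i in (1, 2):
        yield from redexes(t[i], path + (i,))


def replace_at(t: Term, path: Path, new: Term) -> Term:
    """Return t with the subterm at path replaced by new (terms are immutable)."""
    if not path:
        return new
    parts = list(t)
    parts[path[0]] = replace_at(t[path[0]], path[1:], new)
    return tuple(parts)


def run(t: Term, rng: Optional[random.Random] = None) -> List[Term]:
    """Rewrite until no rule is applicable; return every intermediate term."""
    rng = rng or random.Random()
    trace = [t]
    while True:
        candidates = list(redexes(t))
        if not candidates:
            return trace
        path, new = rng.choice(candidates)  # non-deterministic choice
        t = replace_at(t, path, new)  # apply one rule atomically
        trace.append(t)


if __name__ == "__main__":
    for term in run(("Gcd", 2, 4)):
        print(term)
```

For Gcd(2, 4), exactly one redex exists at every step, so the printed trace is the eight terms shown on the slide, ending in 2.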

Operation-Centric Description: MIPS
MIPS Microprocessor Manual
    ADD rd, rs, rt
        GPR[rd] ← GPR[rs] + GPR[rt]
        PC ← PC + 4
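
The manual's two-line definition already reads as a single atomic rule: both right-hand sides read the old state, and both updates take effect together. Here is a minimal Python sketch of that reading. It assumes an illustrative immutable `MipsState` with 32 registers and 32-bit wrap-around addition. It omits details the slide also omits, such as the overflow exception the real ADD raises and register 0 being hard-wired to zero.

```python
from dataclasses import dataclass, replace
from typing import Tuple

MASK32 = 0xFFFF_FFFF  # assumption: 32-bit registers, wrap-around addition


@dataclass(frozen=True)
class MipsState:
    pc: int
    gpr: Tuple[int, ...]  # GPR[0..31]


def add(s: MipsState, rd: int, rs: int, rt: int) -> MipsState:
    """ADD rd, rs, rt: both right-hand sides read the old state; one atomic update."""
    gpr = list(s.gpr)
    gpr[rd] = (s.gpr[rs] + s.gpr[rt]) & MASK32  # GPR[rd] <- GPR[rs] + GPR[rt]
    return replace(s, pc=(s.pc + 4) & MASK32,   # PC <- PC + 4
                   gpr=tuple(gpr))
```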

TRS as a Hardware Description Language

Term Rewriting System
TRS ≡ ⟨A, R⟩: a set of terms A and a set of rewriting rules R
  - terms: hierarchically organized state elements
  - rules: state transitions
System ≡ Structure + Behavior
An operation-centric view of the world

TRS Execution Semantics
Given a set of rules and an initial term s:
    while (some rules are applicable to s) {
        ♦ choose an applicable rule (non-deterministic)
        ♦ apply the rule atomically to s
    }
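
The execution loop above translates directly into code. This is a minimal Python sketch, assuming rules are (name, guard, body) triples over a small state dictionary. The two GCD rules, Mod and Flip, are hypothetical names for the two branches of the Verilog example on the state-centric slide: Mod subtracts b from a, and Flip swaps the registers. A b ≠ 0 guard is added to Mod so that the loop terminates.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, int]


@dataclass(frozen=True)
class Rule:
    name: str
    guard: Callable[[State], bool]   # is the rule applicable to this state?
    body: Callable[[State], State]   # returns the complete new state


def execute(rules: List[Rule], state: State,
            rng: Optional[random.Random] = None,
            max_steps: int = 10_000) -> Tuple[State, List[str]]:
    """Fire applicable rules until none applies; return final state and rule trace."""
    rng = rng or random.Random()
    fired: List[str] = []
    for _ in range(max_steps):
        applicable = [r for r in rules if r.guard(state)]
        if not applicable:
            return state, fired
        rule = rng.choice(applicable)  # non-deterministic choice
        state = rule.body(state)       # atomic: the body builds the whole new state
        fired.append(rule.name)
    raise RuntimeError("no quiescent state reached within max_steps")


# Hypothetical GCD rules mirroring the two branches of the Verilog example.
GCD_RULES = [
    Rule("Mod", lambda s: s["a"] >= s["b"] and s["b"] != 0,
         lambda s: {"a": s["a"] - s["b"], "b": s["b"]}),
    Rule("Flip", lambda s: s["a"] < s["b"],
         lambda s: {"a": s["b"], "b": s["a"]}),
]
```

The guards of Mod and Flip are mutually exclusive here, so the "random" choice never has more than one candidate. For example, `execute(GCD_RULES, {"a": 2, "b": 4})` fires Flip, Mod, Mod, Flip and stops with a = 2, b = 0.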

Architectural Description
[Figure: AX datapath with a PC and +1 incrementer, program memory PROG, register file RF, ALU, buffer BF, input port Iport and output port Oport.]

AX Architectural Description
    Type SYS   = Sys(PROC, IPORT, OPORT)
    Type PROC  = Proc(PC, RF, PROG, BF)
    Type PC    = Bit[16]
    Type RF    = Array[RNAME] VAL
    Type RNAME = Reg0 || Reg1 || Reg2 || ...
    Type VAL   = Bit[16]
    Type PROG  = Array[PC] INST
    Type BF    = Fifo INST_D
    Type IPORT = Iport VAL
    Type OPORT = Oport VAL
[Figure: the AX datapath (PC, +1, PROG, RF, ALU, BF, Iport, Oport), with the declarations annotated "Abstract Datatypes".]

AX Instruction Set
    Type INST = Loadi(RD, VAL) || Loadpc(RD) || Add(RD, R1, R2) || Sub(RD, R1, R2) || ...
             || Bz(RA, RC) || MovToO(R1) || MovFromI(RD)
Decoded instructions:
    Type INST_D = Add_d(RD, V1, V2) || ...
RD, RA, etc. are RNAMEs; V1, V2, etc. are values.

AX Processor Model: Fetch Rules
Fetch Add rule:
    Proc(pc, rf, prog, bf)
        if r1 ∉ target(bf) ∧ r2 ∉ target(bf)
        where Add(r, r1, r2) = prog[pc]
    ⇒ Proc(pc+1, rf, prog, enq(bf, Add_d(r, rf[r1], rf[r2])))
[Figure: AX datapath (PC, +1, PROG, RF, ALU, BF, Iport, Oport).]

AX Processor Model: Execute Rules
Fetch Add rule (as on the previous slide):
    Proc(pc, rf, prog, bf)
        if r1 ∉ target(bf) ∧ r2 ∉ target(bf)
        where Add(r, r1, r2) = prog[pc]
    ⇒ Proc(pc+1, rf, prog, enq(bf, Add_d(r, rf[r1], rf[r2])))
"Execute Add" rule:
    Proc(pc, rf, prog, bf)
        where Add_d(r, v1, v2) = first(bf)
    ⇒ Proc(pc, rf[r := v1+v2], prog, deq(bf))
[Figure: AX datapath (PC, +1, PROG, RF, ALU, BF, Iport, Oport).]
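
Animating the Fetch Add and Execute Add rules shows why the guard matters. Here target(bf) is read as the set of destination registers of the decoded instructions still waiting in BF. Fetch therefore stalls instead of reading a register that a buffered Add has not yet written. This is a minimal Python sketch, assuming instructions are tagged tuples and that only Add instructions appear. `Proc`, `target`, `fetch_add`, `execute_add` and `run` are illustrative names.

```python
import random
from dataclasses import dataclass, replace
from typing import Optional, Set, Tuple

MASK16 = 0xFFFF  # PC and VAL are Bit[16]

# Assumed encodings: ("Add", r, r1, r2) in PROG, ("Add_d", r, v1, v2) in BF.
Inst = tuple


@dataclass(frozen=True)
class Proc:
    pc: int
    rf: Tuple[int, ...]        # RF: register index -> value
    prog: Tuple[Inst, ...]     # PROG: address -> instruction
    bf: Tuple[Inst, ...] = ()  # BF: FIFO of decoded instructions, head first


def target(bf: Tuple[Inst, ...]) -> Set[int]:
    """Destination registers of the decoded instructions still in bf."""
    return {inst[1] for inst in bf}


def fetch_add(s: Proc) -> Optional[Proc]:
    """Fetch Add: decode prog[pc], reading operands from rf, unless they are pending."""
    if not 0 <= s.pc < len(s.prog) or s.prog[s.pc][0] != "Add":
        return None
    _, r, r1, r2 = s.prog[s.pc]
    if r1 in target(s.bf) or r2 in target(s.bf):
        return None  # guard false: an operand is still waiting to be written
    decoded = ("Add_d", r, s.rf[r1], s.rf[r2])
    return replace(s, pc=(s.pc + 1) & MASK16, bf=s.bf + (decoded,))


def execute_add(s: Proc) -> Optional[Proc]:
    """Execute Add: rf[r := v1 + v2] and deq(bf)."""
    if not s.bf or s.bf[0][0] != "Add_d":
        return None
    _, r, v1, v2 = s.bf[0]
    rf = list(s.rf)
    rf[r] = (v1 + v2) & MASK16
    return replace(s, rf=tuple(rf), bf=s.bf[1:])


def run(s: Proc, rng: Optional[random.Random] = None) -> Proc:
    """Fire one applicable rule at a time, chosen at random, until none applies."""
    rng = rng or random.Random()
    while True:
        successors = [n for n in (fetch_add(s), execute_add(s)) if n is not None]
        if not successors:
            return s
        s = rng.choice(successors)
```

Take the program Add(r1, r0, r0); Add(r2, r1, r1) with r0 = 3. The second fetch is blocked until the first Add has executed, so r2 ends up as 12 for every rule ordering. Without the guard, the second fetch could read the stale value r1 = 0.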

TRS as an HDL
♦ Clean, expressive, precise and concise
  - speculative and superscalar microarchitectures [IEEE Micro, June '99]
  - memory models and cache coherence protocols [ISCA99, ICS99]
♦ Supports parallel and non-deterministic specifications
♦ The correctness of a TRS can be verified against a reference TRS specification
♦ Some pipelining can be done automatically as a source-to-source transformation on TRSs
♦ Superscalar versions of TRSs can be derived mechanically from pipelined TRSs

Synthesis from TRSs

From TRS to Synchronous FSM
[Figure: inputs I and the current state S feed the "next"-state transition logic, which produces the next state S and the outputs O.]
♦ Extract state elements (registers) from the type declaration
♦ Extract state transition logic from the rules

Rule: As a State Transformer
    Proc(pc, rf, prog, bf)
        where Bz_d(v_a, 0) = first(bf)
    ⇒ Proc(v_a, rf, prog, clear(bf))
[Figure: the rule as logic. The current state values PC, RF, PROG and BF feed a predicate π, which drives the enable, and a next-state function δ, which produces the next state values PC′, RF′, PROG′ and BF′.]
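
Read as hardware, each rule is a pair of combinational functions over the current state. The predicate π becomes the enable, and δ computes the next-state values. This is a minimal Python sketch of the taken-branch rule. It assumes Bz_d is encoded as ("Bz_d", v_a, v_c), with the branch taken when the condition value v_c is 0. The function names are illustrative.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Proc:
    pc: int
    rf: Tuple[int, ...]
    prog: Tuple[tuple, ...]
    bf: Tuple[tuple, ...]  # decoded-instruction FIFO, head at index 0


def pi_bz_taken(s: Proc) -> bool:
    """π: the head of bf is a Bz_d whose condition value is 0."""
    return bool(s.bf) and s.bf[0][0] == "Bz_d" and s.bf[0][2] == 0


def delta_bz_taken(s: Proc) -> Proc:
    """δ: PC' = v_a and BF' = clear(bf); RF' = RF and PROG' = PROG."""
    _, v_a, _ = s.bf[0]
    return replace(s, pc=v_a, bf=())


def clock_edge(s: Proc) -> Proc:
    """The registers load δ's values only when the enable (π) is asserted.

    In hardware δ is computed every cycle; here it is called only when π holds,
    so it never inspects an empty buffer.
    """
    return delta_bz_taken(s) if pi_bz_taken(s) else s
```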

Reference Implementation
♦ Synchronous state elements
[Figure: interfaces of the synchronous state elements: a register, a FIFO (with first, _full and _empty signals) and a register file with three read ports (RA1/RD1, RA2/RD2, RA3/RD3) and a write port (WA, WD, WE).]
♦ Single transition per clock cycle

Scheduler
[Figure: a scheduler block takes the rule predicates π_1 … π_n as inputs and produces the firing signals φ_1 … φ_n.]
1. φ_i ⇒ π_i
2. π_1 ∨ π_2 ∨ … ∨ π_n ⇒ φ_1 ∨ φ_2 ∨ … ∨ φ_n
3. One-rule-at-a-time ⇒ at most one φ_i is true
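
Any circuit that satisfies these three conditions is a legal one-rule-at-a-time scheduler. A fixed-priority encoder is one simple choice. This is an assumption for illustration, not necessarily what TRAC generates. The following Python sketch also checks the three conditions exhaustively for a small number of rules.

```python
from itertools import product
from typing import List, Sequence


def priority_scheduler(pi: Sequence[bool]) -> List[bool]:
    """Assert φ for the lowest-numbered enabled rule only (fixed priority)."""
    phi = [False] * len(pi)
    for i, enabled in enumerate(pi):
        if enabled:
            phi[i] = True
            break
    return phi


def satisfies_properties(pi: Sequence[bool], phi: Sequence[bool]) -> bool:
    only_enabled = all(p for p, f in zip(pi, phi) if f)  # 1. φ_i ⇒ π_i
    makes_progress = (not any(pi)) or any(phi)          # 2. ∨π ⇒ ∨φ
    one_at_a_time = sum(phi) <= 1                       # 3. at most one φ_i
    return only_enabled and makes_progress and one_at_a_time


def exhaustive_check(scheduler, n: int) -> bool:
    """Check the three properties for all 2**n combinations of predicates."""
    return all(satisfies_properties(pi, scheduler(pi))
               for pi in product([False, True], repeat=n))
```

A round-robin arbiter would satisfy the same three properties. Fixed priority is simply the smallest circuit, at the cost of possibly starving low-priority rules.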

Combining Logic from Multiple Rules
[Figure: for each state element, the latch enable is the OR of the firing signals φ_0 … φ_n of the different rules that update it. A multiplexer selected by those φ signals chooses the next-state value, e.g. PC′, from the candidates δ_0,PC … δ_n,PC produced by those rules.]
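
Putting the pieces together, one clock cycle of the reference implementation works in three steps. It evaluates every π and lets the scheduler pick one rule. Then, for each register, it ORs the φ signals of the rules that write that register into a latch enable. Finally, it uses those φ signals to select the firing rule's δ value. This is a minimal Python sketch, assuming a fixed-priority scheduler. It uses two hypothetical GCD rules, MOD and FLIP, over registers a and b.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
# A rule: (predicate π, {register: δ for that register}); every δ reads the current state.
Rule = Tuple[Callable[[State], bool], Dict[str, Callable[[State], int]]]


def clock_cycle(rules: List[Rule], state: State) -> State:
    """One cycle of a one-rule-at-a-time implementation."""
    pi = [guard(state) for guard, _ in rules]
    phi = [False] * len(rules)
    if any(pi):
        phi[pi.index(True)] = True  # fixed-priority scheduler
    nxt = dict(state)
    for reg in state:
        # (φ, δ) pairs of the rules that can update this register
        writers = [(f, updates[reg])
                   for f, (_, updates) in zip(phi, rules) if reg in updates]
        latch_enable = any(f for f, _ in writers)  # OR of the writers' φ signals
        if latch_enable:
            # Multiplexer: φ selects the firing rule's δ value.
            nxt[reg] = next(delta(state) for f, delta in writers if f)
    return nxt


# Hypothetical GCD rules over registers a and b.
MOD: Rule = (lambda s: s["a"] >= s["b"] and s["b"] != 0,
             {"a": lambda s: s["a"] - s["b"]})
FLIP: Rule = (lambda s: s["a"] < s["b"],
              {"a": lambda s: s["b"], "b": lambda s: s["a"]})
```

Starting from a = 2, b = 4, four cycles (Flip, Mod, Mod, Flip) reach a = 2, b = 0. After that no φ is asserted and the registers hold their values.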

Performance Considerations
♦ Concurrent execution
  - Statically determine which transitions can be safely executed concurrently
  - Generate a scheduler and update logic that allows as many concurrent transitions as possible
Caution: concurrent firing of two rules can violate one-transition-at-a-time semantics if, for example, firing one rule disables the other.
Conflict-free rules
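
One simple, conservative way to decide that two rules may fire in the same cycle is to compare what they read and write. Suppose neither rule writes anything the other reads, and their write sets are disjoint. Then firing both at once gives the same result as firing them one after the other in either order, and neither can disable the other. This Python sketch applies an assumed criterion for illustration. It is not necessarily the analysis the synthesis tool performs. The footprints of the AX rules are read off the Fetch Add and Execute Add rules, treating RF and BF each as a single state element.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Footprint:
    name: str
    reads: FrozenSet[str]   # state elements read by the guard or the body
    writes: FrozenSet[str]  # state elements the body updates


def conflict_free(a: Footprint, b: Footprint) -> bool:
    """Conservative test: disjoint writes, and neither writes what the other reads."""
    return (a.writes.isdisjoint(b.writes)
            and a.writes.isdisjoint(b.reads)
            and b.writes.isdisjoint(a.reads))


# Footprints read off the AX Fetch Add and Execute Add rules (rf, bf as whole units).
FETCH_ADD = Footprint("Fetch Add", frozenset({"pc", "prog", "rf", "bf"}),
                      frozenset({"pc", "bf"}))
EXECUTE_ADD = Footprint("Execute Add", frozenset({"bf"}), frozenset({"rf", "bf"}))
```

This check rejects Fetch Add and Execute Add. Both update bf, and Execute Add writes rf, which Fetch Add reads. Overlapping them would take a finer analysis that understands FIFO enq/deq and per-register accesses.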

Quality of Synthesis

TRAC Synthesis Flow
[Figure: a TRS design spec is transformed and then compiled into C and RTL. The C goes to a C simulator; the RTL goes to an RTL simulator and through Synopsys synthesis to gate-array, FPGA or standard-cell implementations.]

Performance: TRS vs. Verilog
32-bit MIPS integer core
                    CBA tc6a                            LSI 10K
                    Area (cells)   Clock                Area (gates)   Clock
    TRS             9521           10 ns (100 MHz)      30756          19.48 ns (51 MHz)
    Verilog RTL     8960           11.4 ns (88 MHz)     29483          23.79 ns (42 MHz)
Design time: TRS 1 day; Verilog 1 month (Dan Rosenband & James Hoe)
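
The MHz figures are the reciprocals of the clock periods, and the relative cost of the TRS-generated design follows directly from the table. Here is a small Python check that uses only the slide's numbers; the dictionary layout is illustrative.

```python
from typing import Dict

# (area, clock period in ns) from the table; area is cells for CBA tc6a, gates for LSI 10K.
RESULTS = {
    "CBA tc6a": {"TRS": (9521, 10.0), "Verilog": (8960, 11.4)},
    "LSI 10K": {"TRS": (30756, 19.48), "Verilog": (29483, 23.79)},
}


def mhz(period_ns: float) -> float:
    return 1000.0 / period_ns


def compare(library: str) -> Dict[str, float]:
    (a_trs, t_trs), (a_v, t_v) = RESULTS[library]["TRS"], RESULTS[library]["Verilog"]
    return {
        "trs_mhz": mhz(t_trs),
        "verilog_mhz": mhz(t_v),
        "area_overhead": a_trs / a_v - 1.0,  # > 0 means the TRS design is larger
        "speedup": t_v / t_trs,              # > 1 means the TRS design clocks faster
    }
```

On these numbers the TRS-generated core is about 6% (CBA tc6a) and 4% (LSI 10K) larger than the hand-written Verilog, and about 14% and 22% faster.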

Architectural Derivatives
[Figure: non-pipelined, 2-stage and 3-stage variants of the AX datapath (PC with +1, PROG, RF, ALU, buffers BF0 and BF1, MIN and MOUT).]
Other dimensions: superscalar, custom instructions, number of registers, word size, ...

Derivatives and Feedback
♦ Derivatives of a 32-bit 4-GPR embedded RISC processor
♦ Synopsys RTL Analyzer reports GTECH area and gate delays (no wiring or load model)
                    simple   2-stage         3-stage        3-stage, 2-way
    Delay           30+X     max(18+X, 25)   max(6+X, 25)   max(8+X, 31)
    Delay (X=20)    50       38              26             31
    Area            4334     5753            6378           9492
Unit area = 1 NAND gate; unit delay = 1 NAND gate delay.
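
The delay row can be recomputed from the formulas. The slide does not say what X stands for, so this Python sketch treats it as a free parameter. `delays_at` and `fastest` are illustrative helpers.

```python
from typing import Callable, Dict

# Critical-path delay (in NAND-gate delays) as a function of X, copied from the table.
DELAY: Dict[str, Callable[[int], int]] = {
    "simple": lambda x: 30 + x,
    "2-stage": lambda x: max(18 + x, 25),
    "3-stage": lambda x: max(6 + x, 25),
    "3-stage, 2-way": lambda x: max(8 + x, 31),
}


def delays_at(x: int) -> Dict[str, int]:
    """Evaluate every design's delay formula at the given X."""
    return {name: f(x) for name, f in DELAY.items()}


def fastest(x: int) -> str:
    """Name of the design with the smallest delay at X (first one wins a tie)."""
    d = delays_at(x)
    return min(d, key=d.get)
```

At X = 20 this reproduces the table row (50, 38, 26, 31), with the 3-stage pipeline fastest. The max(...) terms show that for small X the pipelined designs are limited instead by their fixed 25- or 31-unit paths.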

Application: ASPN Chips
[Figure: ASIC, ASPN, NP and GP designs compared by performance and flexibility.]
Application-Specific Programmable Network (ASPN) chips are based on a core architecture and a set of domain-specific building blocks.
TRAC allows rapid customization of ASPN designs with ASIC-like performance, for evolving needs and for different vertical markets within the communication space.
