UNUM UNified Universal Microprocessor Framework Nirav Dave, - - PowerPoint PPT Presentation

SLIDE 1

UNified Universal Microprocessor Framework

E Pluribus Unum, Ex Uno Plures (out of many, one; out of one, many)

Nirav Dave, Michael Pellauer, & Arvind Massachusetts Institute of Technology Computer Science and Artificial Intelligence Lab

UNUM

SLIDE 2

Wouldn’t it be great if…

Architects (especially budding architects) could:

  • Easily explore several different microarchitectures
  • Get a feel for the performance/area/power tradeoffs
  • Try out cool design alternatives, e.g., a new branch predictor, load-store buffer scheme, or cache-coherence protocol
  • Have a parameterized design (by superscalar width, instruction latencies, etc.)
  • Easily convince others of their results

SLIDE 3

One Solution: Simulators

Simulators written in C/C++/SystemC

Fast and extensible, but they:

  • Provide too much temptation to cheat (i.e., lower fidelity) to increase simulation speed
  • Make it too easy to inadvertently leave out important implementation details

⇒ Difficulty convincing others of the results (they also cannot be mapped onto FPGAs)

RTL simulators written in HDLs (e.g., Verilog)

Tied to reality: can be turned into real hardware or simulated on FPGAs, but:

  • Software simulation of large designs is painfully slow
  • Error-prone and very hard to change
  • Requires greater specification of low-level design

SLIDE 4

Bluespec: Simplifying RTL Generation

RTL can be synthesized from high-level architectural descriptions

Synthesized RTL can be simulated or further mapped onto FPGAs or into ASICs

Gives confidence in completeness of design

Provides an opportunity to study timing, area and power characteristics

See, for example:

  • Itanium study by Wunderlich & Hoe (ICCD 04)
  • IP Lookup by Arvind, Dave, Nikhil, Rosenband (ICCAD 04)

SLIDE 5

Bluespec Tool Flow

Bluespec SystemVerilog source → Bluespec Compiler, which produces:

  • Verilog 95 RTL → Verilog sim → VCD output → Debussy visualization
  • Verilog 95 RTL → RTL synthesis → gates
  • C → Bluespec C sim (cycle accurate)

SLIDE 6

UNUM: A general framework for processor design in Bluespec

Abstraction of a general processor which allows designers to:

  • Reuse library modules
  • Insert their own custom hardware blocks
  • Change datatypes and representations
  • Change common module parameters (scalar width)
  • Have a common testbench/verification framework

Bluespec semantics (guarded atomic actions) preserve correctness when such changes are made
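As a software analogy only (Python, not Bluespec), the sketch below shows what inserting a custom block behind a fixed interface buys: the code that drives the predictor never changes when the block is swapped. `BranchPredictor`, `NotTaken`, `TwoBitCounter`, and `accuracy` are hypothetical names, not UNUM library modules, and the 2-bit saturating counter is a standard textbook scheme chosen purely for illustration.

```python
from abc import ABC, abstractmethod


class BranchPredictor(ABC):
    """Hypothetical fixed interface that the rest of the model depends on."""

    @abstractmethod
    def predict(self, pc: int) -> bool: ...

    @abstractmethod
    def update(self, pc: int, taken: bool) -> None: ...


class NotTaken(BranchPredictor):
    """Stand-in for a library default: always predict not-taken."""

    def predict(self, pc: int) -> bool:
        return False

    def update(self, pc: int, taken: bool) -> None:
        pass


class TwoBitCounter(BranchPredictor):
    """Custom block: a table of 2-bit saturating counters indexed by PC."""

    def __init__(self, entries: int = 16):
        self.entries = entries
        self.table = [1] * entries  # values 0..3; start at "weakly not-taken"

    def predict(self, pc: int) -> bool:
        return self.table[pc % self.entries] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = pc % self.entries
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def accuracy(predictor: BranchPredictor, trace) -> float:
    """Replay (pc, taken) pairs; only the interface methods are used."""
    hits = 0
    for pc, taken in trace:
        hits += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return hits / len(trace)


# A loop branch taken 9 times, then falling through once.
trace = [(0x40, True)] * 9 + [(0x40, False)]
print(accuracy(NotTaken(), trace), accuracy(TwoBitCounter(), trace))  # prints 0.1 0.8
```

Because `accuracy` depends only on `BranchPredictor`, either block can be plugged in without touching the caller, which is the property the slide attributes to UNUM's module interfaces.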

SLIDE 7

Bluespec Semantics

Bluespec modules contain:

  • Local state elements
  • Local rules (guarded atomic actions)
  • Methods to access internal state

Semantic model: term rewriting systems (TRS)

(Figure: rules with guards g1 and g2, each performing an atomic state update.)

Bluespec interfaces are collections of methods offered by modules. Methods can carry implicit guards; in this FIFO, enq is ready only if (!full), and deq and first only if (!empty):

    interface FIFO#(type t);
        method Action enq(t x);
        method Action deq();
        method t first();
    endinterface
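To illustrate rule and guard semantics, here is a minimal sequential model in Python: a bounded FIFO whose methods are only legal when their guards hold, and a scheduler that fires one ready rule at a time, each running atomically to completion. The names `Fifo`, `Rule`, and `run` and the pick-the-first-ready-rule policy are illustrative assumptions; the Bluespec compiler instead schedules many rules per clock cycle while keeping the result equivalent to some one-at-a-time ordering.

```python
class Fifo:
    """Bounded FIFO mirroring FIFO#(t); each method has an implicit guard."""

    def __init__(self, depth: int = 2):
        self.depth = depth
        self.items = []

    # Guards: enq needs !full; deq and first need !empty.
    def not_full(self) -> bool:
        return len(self.items) < self.depth

    def not_empty(self) -> bool:
        return len(self.items) > 0

    def enq(self, x) -> None:
        assert self.not_full(), "enq called while full"
        self.items.append(x)

    def deq(self) -> None:
        assert self.not_empty(), "deq called while empty"
        self.items.pop(0)

    def first(self):
        assert self.not_empty(), "first called while empty"
        return self.items[0]


class Rule:
    """A guarded atomic action: `action` may run only when `guard()` is true."""

    def __init__(self, name, guard, action):
        self.name, self.guard, self.action = name, guard, action


def run(rules, max_steps: int = 100):
    """Fire the first ready rule, one at a time, until no guard holds."""
    fired = []
    for _ in range(max_steps):
        ready = [r for r in rules if r.guard()]
        if not ready:
            break
        ready[0].action()  # runs to completion before any other rule is considered
        fired.append(ready[0].name)
    return fired


# Producer/consumer connected by a depth-2 FIFO.
q, out, count = Fifo(depth=2), [], [0]


def produce():
    q.enq(count[0])
    count[0] += 1


def consume():
    out.append(q.first())
    q.deq()


rules = [
    Rule("produce", lambda: q.not_full() and count[0] < 3, produce),
    Rule("consume", q.not_empty, consume),
]
fired = run(rules)
print(fired, out)
```

The producer stalls automatically when the FIFO is full because its guard includes `q.not_full()`; no explicit handshaking code is needed, which mirrors how implicit method guards propagate into the rules that call them.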

SLIDE 8

UNUM Microprocessor Model

Units: Fetch (with ICache), Decode, Computation Control Unit (CCU), functional units (FUs), and Load/Store (with DCache)

Each unit has an interface, and a standard component library is available
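A rough software picture of units that interact only through interfaces: below, hypothetical Fetch, Decode, and Execute units each own one rule that fires only when its input queue is non-empty and its output queue has room, so each unit can be developed or replaced independently. The `Unit` class, the made-up four-instruction program, and the downstream-first firing order are illustrative assumptions, not UNUM's actual interfaces or schedule.

```python
from collections import deque


class Unit:
    """Hypothetical unit: one rule that moves an item from inq to outq via fn."""

    def __init__(self, name, fn, inq, outq, capacity=2):
        self.name, self.fn = name, fn
        self.inq, self.outq, self.capacity = inq, outq, capacity

    def step(self) -> bool:
        # Implicit guards: input not empty and output not full.
        if self.inq and len(self.outq) < self.capacity:
            self.outq.append(self.fn(self.inq.popleft()))
            return True
        return False


def simulate(units) -> int:
    """Each cycle, try every unit once, downstream first; stop when none fires."""
    cycles = 0
    # A list (not a generator) so every unit gets its chance each cycle.
    while any([u.step() for u in reversed(units)]):
        cycles += 1
    return cycles


program = ["add", "sub", "ld", "st"]  # made-up instruction stream
pcs = deque(range(len(program)))
f2d, d2e, done = deque(), deque(), deque()

fetch = Unit("Fetch", lambda pc: (pc, program[pc]), pcs, f2d)
decode = Unit("Decode",
              lambda t: (t[0], t[1], "mem" if t[1] in ("ld", "st") else "alu"),
              f2d, d2e)
execute = Unit("Execute", lambda t: f"{t[0]}:{t[1]}:{t[2]}", d2e, done,
               capacity=len(program))

cycles = simulate([fetch, decode, execute])
print(cycles, list(done))
```

With no stalls, the first result appears in cycle 3 and one more completes each cycle after that, so four instructions take 6 cycles in this model.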

SLIDE 9

UNUM and ROB-based designs

(Figure: Fetch/ICache, Decode, CCU, FUs, Load/Store/DCache, and a reorder buffer (ROB).)

High correspondence between the UNUM model and ROB-based designs
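For readers less familiar with the ROB side of this comparison, here is a minimal Python sketch of generic reorder-buffer behavior: entries are allocated in program order, may complete out of order, and retire only from the head. `ReorderBuffer` and its methods are hypothetical names for illustration, not UNUM's CCU interface.

```python
from collections import deque


class ReorderBuffer:
    """Generic ROB sketch: allocate in order, complete in any order, commit in order."""

    def __init__(self, size: int):
        self.size = size
        self.entries = deque()  # each entry is [tag, done_flag], oldest at the left
        self.next_tag = 0

    def can_allocate(self) -> bool:
        return len(self.entries) < self.size

    def allocate(self) -> int:
        assert self.can_allocate(), "ROB full"
        tag = self.next_tag
        self.next_tag += 1
        self.entries.append([tag, False])
        return tag

    def complete(self, tag: int) -> None:
        """Mark an in-flight instruction as finished executing."""
        for entry in self.entries:
            if entry[0] == tag:
                entry[1] = True
                return
        raise KeyError(tag)

    def commit(self) -> list:
        """Retire finished instructions from the head only; return their tags."""
        retired = []
        while self.entries and self.entries[0][1]:
            retired.append(self.entries.popleft()[0])
        return retired


rob = ReorderBuffer(size=4)
tags = [rob.allocate() for _ in range(3)]  # program order: 0, 1, 2
rob.complete(2)                            # younger instructions finish first
rob.complete(1)
early = rob.commit()                       # head (tag 0) not done: nothing retires
rob.complete(0)
late = rob.commit()                        # now 0, 1, 2 retire in order
print(tags, early, late)                   # prints [0, 1, 2] [] [0, 1, 2]
```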

SLIDE 10

What about a 5-Stage Pipeline?

PowerPC 405

Pipeline stages: FET → DCD → EXU → MEM → WB

(Figure: FET, DCD, EXU, MMU, WB, and Regs blocks.)
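As a worked example for a five-stage pipeline like this one, assume an ideal pipeline with no stalls, hazards, or cache misses (a simplification; the sketch does not model the PowerPC 405's actual behavior). Instruction i (0-based) occupies stage s during cycle i + s, so n instructions finish in n + 5 - 1 = n + 4 cycles, and the CPI (n + 4)/n approaches 1 as n grows. `pipeline_chart` is a hypothetical helper.

```python
STAGES = ["FET", "DCD", "EXU", "MEM", "WB"]


def pipeline_chart(n: int):
    """Ideal in-order pipeline: instruction i is in stage s during cycle i + s."""
    cycles = n + len(STAGES) - 1
    chart = [["" for _ in range(cycles)] for _ in range(n)]
    for i in range(n):
        for s, stage in enumerate(STAGES):
            chart[i][i + s] = stage
    return cycles, chart


cycles, chart = pipeline_chart(3)
for i, row in enumerate(chart):
    print(f"i{i}: " + " ".join(f"{c:<3}" for c in row))
print("cycles:", cycles, "CPI:", cycles / 3)
```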

SLIDE 11

Abstract Microprocessor Model

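(Figure: the PowerPC 405 blocks FET, DCD, EXU, MMU, WB, and Regs mapped onto the UNUM units Fetch/ICache, Decode, CCU, FUs, and Load/Store/DCache; additional labels: Comb, Issue, Branch.)

More than 85% of the code is reused from the library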

SLIDE 12

Current Progress

Complete: single-issue ROB PowerPC processor

  • Implements ~190 instructions (~230 mnemonics)
  • Used for decoder/ALU verification

Underway: PowerPC 405, PowerPC 440

  • 8 instructions away from running eCos

Beginning to place designs onto FPGAs

SLIDE 13

Future Work

Multiprocessor Framework

  • Processor/memory interface
  • Cache-coherence implementation
  • Multiprocessor testbench and monitoring

Hope to use this framework for large simulations:

  • 4 processors, 2 billion instructions each
  • Testbench measuring CPI, cache misses, and branch mispredictions
  • Goal: Summer 2005
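The testbench metrics listed above are simple ratios of event counts. A minimal sketch, assuming a hypothetical `Counters` record per processor (not UNUM's actual testbench): CPI = cycles / instructions, cache misses are reported per thousand instructions (MPKI), and the misprediction rate is mispredictions / branches. The counts in the usage lines are made up for illustration, not measurements.

```python
from dataclasses import dataclass


@dataclass
class Counters:
    """Hypothetical per-processor event counts collected by a testbench."""
    cycles: int = 0
    instructions: int = 0
    cache_misses: int = 0
    branches: int = 0
    mispredictions: int = 0

    def cpi(self) -> float:
        return self.cycles / self.instructions

    def mpki(self) -> float:
        # Cache misses per 1000 committed instructions.
        return 1000 * self.cache_misses / self.instructions

    def mispredict_rate(self) -> float:
        return self.mispredictions / self.branches if self.branches else 0.0


# Made-up counts for one of the four processors (2 billion instructions).
p0 = Counters(cycles=3_000_000_000, instructions=2_000_000_000,
              cache_misses=10_000_000, branches=400_000_000,
              mispredictions=20_000_000)
print(p0.cpi(), p0.mpki(), p0.mispredict_rate())  # prints 1.5 5.0 0.05
```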

SLIDE 14

The End

Thank you

ndave@csail.mit.edu
pellauer@csail.mit.edu
arvind@csail.mit.edu