PICO: ASIC Synthesis from C
Rob Schreiber, Shail Aditya, Bob Rau, Vinod Kathail, Scott Mahlke, Darren Cronquist, Mukund Sivaraman
HP Labs, Palo Alto
R. Schreiber – MPsoc Workshop, July 2002

Outline
• What Can PICO Do for an SOC Designer?
• The PICO System Design Hierarchy
• From Sequential to Parallel Loop Nest
• Parallel Loop Nest to Processor Design

PICO overview
[Figure: flow diagram. Labels: Program In; PICO (Architecture Synthesis, Compiler); VHDL for Processors; CAD Tools (Logic Synthesis, Physical Design); Chip Out; Code Out]
Program In --> IP Out

Using PICO
• User provides application, test data, and design space limits
• User indicates hot loop nests
• PICO creates a Pareto set of ASIP designs
• Each design has a customized VLIW with zero or more loop nests realized in HW
• User selects appropriate design for SOC based on area, power, performance tradeoff

PICO’s ASIP Architecture
[Figure: block diagram. Labels: G.P. Processor, Cache, Control, Global Memory, Local Memory, Systolic Array]

Hierarchical Design Frameworks

An Automated Design Template
[Figure: block diagram. Labels: Function Specification, Parameter Ranges, SpaceWalker, Constructor, Evaluator, Pareto Filter]
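
The Pareto Filter stage keeps only designs that no other design beats on every metric. A minimal sketch in C, assuming two metrics (area and cycle count, lower is better); the `Design` type and function names are hypothetical and are not PICO's own interfaces:

```c
#include <stddef.h>

/* Hypothetical design point; lower is better on both metrics. */
typedef struct {
    double area;    /* estimated area, arbitrary units */
    double cycles;  /* estimated cycle count */
} Design;

/* Returns 1 if a dominates b: no worse on both metrics, strictly better on one. */
static int dominates(const Design *a, const Design *b)
{
    return a->area <= b->area && a->cycles <= b->cycles &&
           (a->area < b->area || a->cycles < b->cycles);
}

/* Copies the non-dominated designs of in[0..n) to out, preserving order.
   Returns the number of designs kept. out must have room for n entries. */
size_t pareto_filter(const Design *in, size_t n, Design *out)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        int dominated = 0;
        for (size_t j = 0; j < n && !dominated; j++)
            if (j != i && dominates(&in[j], &in[i]))
                dominated = 1;
        if (!dominated)
            out[kept++] = in[i];
    }
    return kept;
}
```

This quadratic scan is fine for a few thousand candidates; a sort-based sweep would be the usual choice for larger sets.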

Good Systems from Good Subsystems
[Figure: block diagram. Labels: VLIW Pareto, Cache Pareto, NPA Pareto, System Constructor, System Evaluator, System Pareto Filter]

Design Space Exploration
[Figure: axes Runs per second and Area; flow labels Compile, Estimate Cycle Count, Synthesize, Estimate Area]
2.5 million systems specified
3,145 systems considered
77 Pareto systems

PICO GUI

Limiting the Design Space

Exploration

Pareto Optimal Machines: VLIW-only

Pareto Optimal Machines: All systems
[Figure legend: Hybrid Machines, VLIW Machines]

Systolic Design: Exploration
[Figure: design points labeled 2 Processors, II=1; 1 Processor, II=1; 1 Processor, II=2; 1 Processor, II=8]

Synthesis of a Non-Programmable, Application-Specific Accelerator: From Sequential Loop Nest to Parallel Loop Nest

Input Language
• A perfect loop nest --> A systolic array
• A sequence of nests --> A pipeline of arrays
• Constant loop bounds
• Dependence analysis must be feasible:
  • No aliasing through pointers
• Language extensions
  • #pragma bitsize x 12
  • #internal coeff

From C to VHDL
[Figure: sequence of design stages]
• Sequential C loop nest
• Sequential loop nest, tiled and register promoted
• Iteration scheduled, parallel loop nest
• Function units and software pipelined loop nest
• Registers, interconnect, FUs, memory
• Verilog/VHDL Design

From C to VHDL
C program
--> Compiler front end (SUIF+Omega): tiles, schedules, maps, transforms loops, eliminates loads/stores
--> Compiler back end (Elcor): optimizes, analyzes bitwidth, allocates function units, software pipelining
--> HDL Synthesis: allocates registers and interconnect; builds VHDL description of processor
--> Verilog/VHDL

What does it take to make this efficient?

The Memory Wall
[Figure labels: CPU, Memory]

Cache and Local Memory
[Figure labels: CPU, Cache, Memory, DSP/NPA, Local Memory]

Goal of Code Transformation
    for each TILE {
        for (t = 0; t < Tfinal; t++) {
            forall processors p {
                X[t][p] = . . . Y[t-1][p+1] . . .
            }
        }
    }

Tiling the Iteration Space
[Figure: a tile of the iteration space; labels: data, computation]
Volume/Surface = O(radius)
Computation/Footprint = Ω(radius)
Computation/Footprint = CPU/Memory

Load/Store Elimination
• For affine array references, intermediate results in registers
• For affine, read-only array references, data routed through registers; no value loaded more than once.
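
The first bullet can be illustrated by hand on the FIR loop nest used later in the deck. This is a sketch of the register-promotion idea only, not PICO's generated code. Assumptions: the function names are hypothetical, and `xpad` stores x with an offset of NJ-1 so that i-j never indexes below zero.

```c
#define NI 8
#define NJ 4

/* Before: y[i] is loaded from and stored to memory on every inner iteration. */
void fir_memory(int y[NI], const int w[NJ], const int xpad[NI + NJ - 1])
{
    for (int i = 0; i < NI; i++)
        for (int j = 0; j < NJ; j++)
            y[i] += w[j] * xpad[i - j + NJ - 1];
}

/* After: the running sum lives in a scalar (a register candidate),
   so y[i] is read once and written once per outer iteration. */
void fir_promoted(int y[NI], const int w[NJ], const int xpad[NI + NJ - 1])
{
    for (int i = 0; i < NI; i++) {
        int acc = y[i];
        for (int j = 0; j < NJ; j++)
            acc += w[j] * xpad[i - j + NJ - 1];
        y[i] = acc;
    }
}
```

Both versions compute the same y; the second removes NJ-1 loads and NJ-1 stores of y[i] per outer iteration.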

Tile Shapes
Big tiles --> More local memory
Small tiles --> less reuse of data, more global memory bandwidth
Optimal tile --> smallest tile that does not oversubscribe memory bandwidth
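
A toy version of the "smallest tile that does not oversubscribe memory bandwidth" rule, under a deliberately crude model that is an assumption of this sketch and not PICO's cost model: a cubic tile of edge r performs r^3 operations and moves 6*r^2 words (its surface). The function name and parameters are hypothetical.

```c
/* Returns the smallest cube edge r in 1..max_r whose memory traffic fits
   the available bandwidth, or 0 if no such edge exists.
   Assumed model: ops = r^3, words moved = 6 * r^2.
   ops_per_cycle: operations the processor array completes per cycle.
   words_per_cycle: words global memory can deliver per cycle. */
int smallest_feasible_tile(int ops_per_cycle, int words_per_cycle, int max_r)
{
    for (int r = 1; r <= max_r; r++) {
        long ops = (long)r * r * r;
        long words = 6L * r * r;
        /* Tile time is ops / ops_per_cycle cycles, so the tile needs
           words * ops_per_cycle / ops words per cycle; compare cross-multiplied. */
        if (words * ops_per_cycle <= (long)words_per_cycle * ops)
            return r;
    }
    return 0;
}
```

Because the ratio of computation to traffic grows linearly with r in this model, the first feasible r is also the one that uses the least local memory.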

Estimating the Footprint
Affine array reference X[i+j][2*j-3*k]
How many integer points in an affine image of a rectangular iteration space?
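
For small iteration spaces the question can be answered by brute force, which gives a reference value to compare estimates against. A sketch for the reference X[i+j][2*j-3*k] over the box 0 <= i < ni, 0 <= j < nj, 0 <= k < nk; the function name and the MAX_N limit are assumptions of this example.

```c
#include <string.h>

#define MAX_N 16  /* largest loop extent this sketch supports (assumption) */

/* Counts the distinct elements X[i+j][2*j-3*k] touched by the box
   0 <= i < ni, 0 <= j < nj, 0 <= k < nk.
   Returns -1 if an extent is outside 1..MAX_N. */
int footprint_bruteforce(int ni, int nj, int nk)
{
    /* row = i+j lies in 0..2*MAX_N-2; col = 2j-3k is shifted by
       3*(MAX_N-1) so it lies in 0..5*MAX_N-5. */
    static unsigned char seen[2 * MAX_N][5 * MAX_N];
    int count = 0;

    if (ni < 1 || nj < 1 || nk < 1 || ni > MAX_N || nj > MAX_N || nk > MAX_N)
        return -1;
    memset(seen, 0, sizeof seen);
    for (int i = 0; i < ni; i++)
        for (int j = 0; j < nj; j++)
            for (int k = 0; k < nk; k++) {
                int row = i + j;
                int col = 2 * j - 3 * k + 3 * (MAX_N - 1);
                if (!seen[row][col]) {
                    seen[row][col] = 1;
                    count++;
                }
            }
    return count;
}
```

For a 4x4x4 box this counts 62 elements rather than 64, because two pairs of iterations touch the same element.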

Example: the Affine Image of an Iteration Space

Corrected Estimates
• Published bounds on the size of the image of a Z-polytope are wrong
• Our corrections:
  - footprint = iteration space for 1-1 mappings
  - 1-1 if no integer null vector in the iteration space
  - corrected bounds from finding number of iterations that differ by a null vector
  - within 20 percent in practice
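
The null-vector idea can be worked through for the reference X[i+j][2*j-3*k] from the footprint slide. Two iterations touch the same element exactly when they differ by an integer multiple of (-3, 3, 2), since i+j = 0 and 2j-3k = 0 force (i, j, k) = m*(-3, 3, 2). The sketch below is an illustration for this one reference on a rectangular box, not the authors' general bound; the function names are hypothetical. It is exact here because the null space is one-dimensional and the box is convex.

```c
/* Number of iteration pairs inside the box 0 <= i < ni, 0 <= j < nj,
   0 <= k < nk that differ by exactly the null vector (-3, 3, 2). */
long null_vector_pairs(int ni, int nj, int nk)
{
    long a = ni - 3, b = nj - 3, c = nk - 2;
    if (a <= 0 || b <= 0 || c <= 0)
        return 0;  /* the null vector does not fit inside the box */
    return a * b * c;
}

/* The mapping is 1-1 on the box when no two iterations differ by the null vector. */
int is_one_to_one(int ni, int nj, int nk)
{
    return null_vector_pairs(ni, nj, nk) == 0;
}

/* Footprint = iterations minus the pairs that collapse onto one element. */
long footprint_from_null_vector(int ni, int nj, int nk)
{
    return (long)ni * nj * nk - null_vector_pairs(ni, nj, nk);
}
```

For example, a 4x4x4 box has 64 iterations and 2 colliding pairs, giving a footprint of 62, while any box with ni <= 3 is mapped 1-1.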

Reindexing to Reduce Local Memory
[Figure: grids of x marks showing array element layouts]

Finding the Parallel Iteration Schedule
[Figure: Annotated Dataflow Graph, number of procs, and initiation interval feed the Iteration Scheduler, which produces a Linear Timing Function]
• Processors: a mesh of processors is given
• Initiation Interval (II): every processor starts an iteration periodically with period equal to II (hardware pipelining)
• Mapping: clusters of iterations are mapped to each processor
• Schedule: one iteration per processor every II cycles
• Honor data dependence constraints
• Find the schedule via efficient direct search method

Hardware/Software Pipelining
    for (i=0; i < 100; i++)
        a[i] += b[i]*c[i];
[Figure: pipeline diagram over time. Iterations i=0, i=1, i=2 each issue ld b, ld c, mpy, add, str; successive iterations start II cycles apart]
Lower Bounds on II (RecMII, ResMII)
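
The two lower bounds follow the standard modulo-scheduling definitions: ResMII comes from resource usage and RecMII from dependence cycles. A minimal sketch, assuming fully pipelined function units (each operation occupies its unit for one cycle); the function names and array layout are hypothetical.

```c
/* ceil(a / b) for a >= 0 and b > 0 */
static int ceil_div(int a, int b)
{
    return (a + b - 1) / b;
}

/* ResMII: resource class r executes uses[r] operations per iteration
   on units[r] identical units (units[r] must be positive). */
int res_mii(const int uses[], const int units[], int nclasses)
{
    int bound = 1;
    for (int r = 0; r < nclasses; r++) {
        int need = ceil_div(uses[r], units[r]);
        if (need > bound)
            bound = need;
    }
    return bound;
}

/* RecMII: dependence cycle c has total latency lat[c] and spans
   dist[c] iterations (dist[c] must be positive). */
int rec_mii(const int lat[], const int dist[], int ncycles)
{
    int bound = 1;
    for (int c = 0; c < ncycles; c++) {
        int need = ceil_div(lat[c], dist[c]);
        if (need > bound)
            bound = need;
    }
    return bound;
}

/* The achievable II can be no smaller than either bound. */
int min_ii(int res_bound, int rec_bound)
{
    return res_bound > rec_bound ? res_bound : rec_bound;
}
```

As a usage example under these assumptions: three memory operations sharing one memory port give ResMII = 3, and adding a second port lowers it to 2.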

The Mapping of Iterations to Processors
    for (i = 0; i < 8; i++)
        for (j = 0; j < 4; j++) {
            y[i] += w[j] * x[i-j];
        }
Iteration Space: (8,4)
Mapping: proc(i,j) = j / 2
Cluster shape = (2)
[Figure: the (i,j) iteration space divided between processors p=0 and p=1]

A Tight Schedule: (i,j) --> 2i+3j
    for (i = 0; i < 8; i++)
        for (j = 0; j < 4; j++) {
            y[i] += w[j] * x[i-j];
        }
Start time of iteration (i,j), for i = 0..7:
    j=3:   9 11 13 15 17 19 21 23
    j=2:   6  8 10 12 14 16 18 20
    j=1:   3  5  7  9 11 13 15 17
    j=0:   0  2  4  6  8 10 12 14
[Figure labels: p=0, p=1]
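
The schedule t(i,j) = 2i+3j can be checked mechanically against the mapping proc(i,j) = j / 2: no processor should be asked to start two iterations in the same cycle, and the accumulation into y[i] (carried from j to j+1) should be delayed long enough. A small sketch; the function names are hypothetical and the latency parameter is an assumption of this example.

```c
#define NI 8
#define NJ 4
#define NPROC 2
#define NTIMES (2 * (NI - 1) + 3 * (NJ - 1) + 1)  /* start times run 0..23 */

static int start_time(int i, int j) { return 2 * i + 3 * j; }
static int proc_of(int i, int j) { (void)i; return j / 2; }

/* Returns how many iterations land in a (processor, cycle) slot that
   another iteration already occupies. A valid schedule returns 0. */
int count_conflicts(void)
{
    int busy[NPROC][NTIMES] = {{0}};
    int conflicts = 0;
    for (int i = 0; i < NI; i++)
        for (int j = 0; j < NJ; j++) {
            int p = proc_of(i, j);
            int t = start_time(i, j);
            if (busy[p][t])
                conflicts++;
            busy[p][t] = 1;
        }
    return conflicts;
}

/* Returns 1 if iteration (i,j+1) always starts at least `latency`
   cycles after iteration (i,j), which produces the value it consumes. */
int respects_accumulation(int latency)
{
    for (int i = 0; i < NI; i++)
        for (int j = 0; j + 1 < NJ; j++)
            if (start_time(i, j + 1) - start_time(i, j) < latency)
                return 0;
    return 1;
}
```

Under this mapping each processor receives the even start times from one value of j and the odd start times from the other, so its 16 iterations occupy 16 distinct cycles.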

Tight Schedules – Prior Work
Darte/Delosme, Chen/Megson.
• GIVEN: Iteration space, projection direction, linear schedule
• DETERMINE: The allowed cluster shapes
• Tail Wags Dog!

Constructing the Schedule
[Figure: flow diagram. Inputs: array spec., loop nest. Steps: Dependence Analysis, Bounding Region, Generate (lots of) Tight Schedules, Test for Correctness, Estimate Hardware Cost, Select Schedule]

Processor Synthesis
[Figure: loop and II feed Processor Synthesis, which produces a Processor]
• Optimize the loop body
• Analyze bitwidth of all values
• Allocate the function units
• Map operations to function units
• Schedule operations
• Allocate registers and memory
• Interconnect communicating elements
Parallel, custom, designed to spec: EFFICIENT!

Bitwidth analysis - basic idea
[Figure labels: a, b, c]
• Input information limits the amount of information that can be produced
• Opcode semantics relate input and output information
• Information required by consumers limits the amount that must be produced
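
The three statements can be made concrete with simple width rules for unsigned values. This is a sketch of the general idea, not PICO's analysis; the function names are hypothetical.

```c
/* Forward rule: the sum of a wa-bit and a wb-bit unsigned value
   fits in max(wa, wb) + 1 bits. */
int width_add(int wa, int wb)
{
    return (wa > wb ? wa : wb) + 1;
}

/* Forward rule: the product of a wa-bit and a wb-bit unsigned value
   fits in wa + wb bits. */
int width_mul(int wa, int wb)
{
    return wa + wb;
}

/* Backward rule: if the consumer reads only `consumed` low-order bits,
   the producer needs to compute no more than that many. */
int width_required(int produced, int consumed)
{
    return produced < consumed ? produced : consumed;
}
```

For instance, multiplying a 12-bit input by an 8-bit coefficient yields at most 20 bits, and adding two such products yields at most 21; if the result is then stored into a 16-bit location, only 16 bits need to be produced.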

Optimal FU allocation
[Table residue: operation types (+, -) with counts, and FU types (+, -, +/-) with cost and count. Values as extracted: 2 1 10 3 (+); 0 10 1 1 (-); 1 1 13 (+/-)]
MILP: minimize cost subject to sufficient capacity
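
For a library this small, "minimize cost subject to sufficient capacity" can be solved by enumeration instead of a MILP solver. A sketch under stated assumptions: three FU types (adder, subtractor, combined add/sub unit), each unit able to start one operation per cycle, so it handles II operations per iteration. The costs below are placeholder values chosen for illustration, and all names are hypothetical.

```c
#include <limits.h>

/* Placeholder unit costs (assumption of this sketch). */
enum { COST_ADD = 10, COST_SUB = 10, COST_ADDSUB = 13 };

typedef struct {
    int adders;
    int subtractors;
    int addsubs;
    int cost;
} Allocation;

/* Returns the cheapest mix of units able to execute n_add_ops additions and
   n_sub_ops subtractions per iteration at initiation interval ii (ii >= 1). */
Allocation cheapest_allocation(int n_add_ops, int n_sub_ops, int ii)
{
    Allocation best = {0, 0, 0, INT_MAX};
    int max_units = n_add_ops + n_sub_ops;  /* one unit per op always suffices */

    for (int a = 0; a <= max_units; a++)
        for (int s = 0; s <= max_units; s++)
            for (int m = 0; m <= max_units; m++) {
                /* Capacity: adds fit on adders plus add/sub units, subs fit on
                   subtractors plus add/sub units, and everything fits overall. */
                int ok = n_add_ops <= ii * (a + m) &&
                         n_sub_ops <= ii * (s + m) &&
                         n_add_ops + n_sub_ops <= ii * (a + s + m);
                int cost = a * COST_ADD + s * COST_SUB + m * COST_ADDSUB;
                if (ok && cost < best.cost) {
                    best.adders = a;
                    best.subtractors = s;
                    best.addsubs = m;
                    best.cost = cost;
                }
            }
    return best;
}
```

With three additions and one subtraction at II = 2, the cheapest mix under these costs is one adder plus one add/sub unit; relaxing to II = 4 lets a single add/sub unit do all the work.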

Allocation and Op Scheduling
Given: Inner loop and II
Find: Cheapest processor that achieves II on the loop
[Figure: flow diagram. LOOP and f.u. library feed Count operations, Preallocate, and Modulo Schedule Operation, yielding Achieved II; test "achieved <= required?" against Required II; N leads to Reallocate, Y ends the loop]

Conclusions
• Accurate static analysis of memory bandwidth – optimal tiling
• Linear iteration scheduling: solved problem
• Efficient datapath synthesis – a hard problem, good heuristics
• Automatic NPA synthesis is practical
• Automatic synthesis of full embedded systems is feasible, too
