Is the 2nd Wave of HLS the One Industry Will Surf on?



SLIDE 1

DATE 2009 PANEL SESSION

Is the 2nd Wave of HLS the One Industry Will Surf on?

Jason Cong
Chancellor's Professor, UCLA Computer Science Department
cong@cs.ucla.edu

Chief Technology Advisor, AutoESL Design Technologies, Inc.
www.autoesl.com

SLIDE 2

The Demand for High-Level Synthesis is Real

  • Embedded processors are in almost every SoC
  • Need SW/HW co-design and exploration
  • C/C++/SystemC is a more natural starting point
  • Huge silicon capacity requires a high level of abstraction
  • 700,000 lines of RTL for a 10M gate design is too much!
  • Verification drives the acceptance of SystemC
  • Need executable model to verify against RTL design
  • More and more SystemC models are available
  • Need and opportunity for aggressive power optimization
  • Simultaneous functional, structural, and temporal optimization for power
  • Accelerated computing and reconfigurable computing also need C/C++-based compilation/synthesis to FPGAs

SLIDE 3

Opportunity for High-Level Synthesis

  • Life of an RTL designer is getting more and more miserable
  • Complexity (80+M gates)
  • Correctness: first-time working silicon ($2M mask cost)
  • Performance (interconnects dominate)
  • Routability (what/how to measure at RTL level??)
  • Power (yet another dimension)
  • Real opportunity for automation/exploration by high-level synthesis with BETTER quality

SLIDE 4

Significant Progress on HLS

  • Wide acceptance of C/C++/SystemC for design modeling and simulation
  • Paves the way for C/C++/SystemC-based HLS
  • Better compilation infrastructure
  • Leveraging the progress in the compiler community
  • Advancements of core HLS algorithms, e.g. research from UCLA:
  • SDC-based scheduling
  • Distributed register file based architecture
  • Simultaneous computation and communication synthesis
  • Pattern-based synthesis
  • HLS for power
  • …
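The slide names SDC-based scheduling without giving detail. As a rough illustration only, and not the UCLA algorithm itself, the sketch below assumes the common reading of SDC as a system of difference constraints: every scheduling requirement is written as s[u] - s[v] <= bound over integer start steps, and a Bellman-Ford style relaxation either finds a feasible schedule or reports a contradiction. The names `DiffConstraint`, `dependence`, `max_distance`, and `solve_sdc` are hypothetical. A production scheduler would additionally optimize an objective such as latency; this sketch only finds one feasible assignment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One difference constraint of the form s[u] - s[v] <= bound,
// where s[i] is the control step at which operation i starts.
// (Hypothetical type, introduced for illustration.)
struct DiffConstraint {
    int u;
    int v;
    int bound;
};

// Data dependence: the consumer may start only after the producer's
// latency has elapsed, i.e. s[producer] - s[consumer] <= -latency.
DiffConstraint dependence(int producer, int consumer, int latency) {
    return DiffConstraint{producer, consumer, -latency};
}

// Relative timing: the consumer starts at most max_gap steps after the
// producer, i.e. s[consumer] - s[producer] <= max_gap.
DiffConstraint max_distance(int producer, int consumer, int max_gap) {
    return DiffConstraint{consumer, producer, max_gap};
}

// Bellman-Ford style relaxation over the constraint graph.
// On success, writes non-negative start steps to `schedule` and returns true.
// Returns false if the constraints contradict each other (negative cycle).
bool solve_sdc(int num_ops, const std::vector<DiffConstraint>& constraints,
               std::vector<int>& schedule) {
    assert(num_ops >= 0);
    // Starting every step at 0 plays the role of a virtual source node
    // connected to each operation by a zero-weight edge.
    std::vector<int> s(static_cast<std::size_t>(num_ops), 0);
    for (int pass = 0; pass <= num_ops; ++pass) {
        bool changed = false;
        for (const DiffConstraint& c : constraints) {
            assert(c.u >= 0 && c.u < num_ops && c.v >= 0 && c.v < num_ops);
            // Enforce s[u] <= s[v] + bound by lowering s[u] if needed.
            if (s[c.v] + c.bound < s[c.u]) {
                s[c.u] = s[c.v] + c.bound;
                changed = true;
            }
        }
        if (!changed) {
            // Shift so that the earliest operation starts at step 0;
            // a uniform shift keeps every difference unchanged.
            if (!s.empty()) {
                const int lowest = *std::min_element(s.begin(), s.end());
                for (int& step : s) step -= lowest;
            }
            schedule = s;
            return true;
        }
    }
    return false;  // still changing after num_ops + 1 passes: infeasible
}
```

For a three-operation chain with latencies 2 and 1, `solve_sdc` yields start steps 0, 2, 3; adding a requirement that the last operation start within 2 steps of the first makes the system infeasible, which the function reports by returning false.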

SLIDE 5

A New Generation of HLS Tool – E.g. AutoESL

  • Best language coverage: pure ANSI C/C++ synthesis; SystemC/TLM synthesis
  • Aggressive power optimization: clock gating, operation gating, frequency scaling, power/performance trade-off, …
  • Best QoR: leveraging 8+ years of research from UCLA on ESL synthesis
  • Ideal for reuse and arch-exploration: platform-based synthesis; separate source & constraint; link to implementation flows

[Figure: Unique ESL synthesis technology. Diagram labels: C/C++/SystemC; User Constraints & Directives; Platform Libraries; Compilation & Elaboration; Advanced Code Transformation; Behavior & Interface Synthesis; Performance/Power/Area Optimizations; Microarchitecture Generation; RTL HDLs; RTL SystemC; RTL Constraints (Timing/Layout); ASIC/FPGA RTL Synthesis; Place-and-Route; Simulator/Verifier; "=".]

SLIDE 6

MPEG4 4CIF by AutoPilot vs Manual Design

  • Frame rate
  • 60 fps based on estimation for 4CIF video
  • 200 fps on v2p board for 1CIF video

                     Manual Design             AutoPilot
Block              BRAM#  MULT#  SLICE#     BRAM#  MULT#  SLICE#
Parser/VLD                    1    1700                2    2156
Copy Control                  1     340                2     264
Motion Comp                   1     340                1     262
Texture/IDCT           6     25    1710         6     19    1560
Texture Update                2     150                2     133
Total                  6     30    4240         6     26    4375
Difference                                   0.0% -13.3%    3.2%
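The percentages on this slide can be checked against the per-block figures. The sketch below is a small worked calculation, assuming the last number in each row is the slice count and the percentages are the relative change of the AutoPilot total against the manual total; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Sum of a per-block resource column (e.g. SLICE# over all five blocks).
int column_total(const std::vector<int>& per_block) {
    return std::accumulate(per_block.begin(), per_block.end(), 0);
}

// Relative change of the AutoPilot total against the manual total, in percent.
// Negative values mean AutoPilot used fewer resources.
double percent_change(double manual_total, double autopilot_total) {
    assert(manual_total != 0.0);
    return (autopilot_total - manual_total) / manual_total * 100.0;
}

// Rounds to one decimal place, matching the precision shown on the slide.
double round_to_tenth(double value) {
    return std::round(value * 10.0) / 10.0;
}

// SLICE# per block as listed on the slide, in row order:
// Parser/VLD, Copy Control, Motion Comp, Texture/IDCT, Texture Update.
std::vector<int> manual_slices() {
    return std::vector<int>{1700, 340, 340, 1710, 150};
}

std::vector<int> autopilot_slices() {
    return std::vector<int>{2156, 264, 262, 1560, 133};
}
```

Under these assumptions the slice columns sum to 4240 and 4375, and `percent_change` reproduces the slide's 3.2% for slices and -13.3% for multipliers (30 versus 26).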

SLIDE 7

1M+ Gate Wireless Communication Module by AutoPilot

  • Quickly generate multiple solutions with the same sample rate, but different area/power profiles
  • Manual design took 4 months, while C-based synthesis using AutoPilot took two weeks

TSMC65nmLP Library

Architecture   Latency   Area (mm^2)   Clock (MHz)
manual              96          1.17           150
config1            116          1.10           150
config2             86          1.12           150
config3             81          1.30           100
config4             64          1.55            75
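One way to read the table above is as a small design-space exploration. The sketch below is illustrative only: it assumes the latency figures are directly comparable across rows (the slide does not state their unit, and the clock differs for config3 and config4) and treats both latency and area as quantities to minimize. `DesignPoint`, `dominates`, `pareto_front`, and `area_vs_manual_percent` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One row of the slide's table. The latency unit is not stated on the slide.
struct DesignPoint {
    std::string name;
    int latency;
    double area_mm2;
    int clock_mhz;
};

// Rows exactly as listed on the slide.
std::vector<DesignPoint> slide_table() {
    return std::vector<DesignPoint>{
        {"manual", 96, 1.17, 150},  {"config1", 116, 1.10, 150},
        {"config2", 86, 1.12, 150}, {"config3", 81, 1.30, 100},
        {"config4", 64, 1.55, 75}};
}

// True if `a` is no worse than `b` in both latency and area,
// and strictly better in at least one of them.
bool dominates(const DesignPoint& a, const DesignPoint& b) {
    const bool no_worse = a.latency <= b.latency && a.area_mm2 <= b.area_mm2;
    const bool better = a.latency < b.latency || a.area_mm2 < b.area_mm2;
    return no_worse && better;
}

// Names of all points that no other point dominates, in input order.
std::vector<std::string> pareto_front(const std::vector<DesignPoint>& points) {
    std::vector<std::string> front;
    for (std::size_t i = 0; i < points.size(); ++i) {
        bool dominated = false;
        for (std::size_t j = 0; j < points.size() && !dominated; ++j) {
            dominated = (i != j) && dominates(points[j], points[i]);
        }
        if (!dominated) front.push_back(points[i].name);
    }
    return front;
}

// Area of `candidate` relative to `reference`, in percent (negative = smaller).
double area_vs_manual_percent(const DesignPoint& reference,
                              const DesignPoint& candidate) {
    assert(reference.area_mm2 > 0.0);
    return (candidate.area_mm2 - reference.area_mm2) / reference.area_mm2 * 100.0;
}
```

With the slide's numbers, config2 beats the manual design in both latency and area at the same 150 MHz clock, so under the stated assumptions the manual row drops out of the front and the four generated configurations remain as distinct trade-offs.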

SLIDE 8

Next Challenges

  • Even better QoR, out-of-box success
  • Further algorithmic innovation for HLS
  • Aggressive power optimization
  • Physical synthesis above RTL
  • Integrated synthesis and verification
  • Synthesis support for variability and reliability