Is the 2nd Wave of HLS the One Industry Will Surf on?
DATE 2009 Panel Session
Jason Cong, Chancellor's Professor, UCLA

The Demand for High-Level Synthesis is Real
- Embedded processors are in almost every SoC
  - Need SW/HW co-design and exploration
  - C/C++/SystemC is a more natural starting point
- Huge silicon capacity requires a high level of abstraction
  - 700,000 lines of RTL for a 10M gate design is too much!
- Verification drives the acceptance of SystemC
  - Need executable model to verify against RTL design
  - More and more SystemC models are available
- Need and opportunity for aggressive power optimization
  - Simultaneous functional, structural, and temporal optimization for power
- Accelerated computing or reconfigurable computing also needs C/C++ based compilation/synthesis to FPGAs

Opportunity for High-Level Synthesis
- Life of an RTL designer is getting more and more miserable
  - Complexity (80+M gates)
  - Correctness: first-time working silicon ($2M mask cost)
  - Performance (interconnects dominate)
  - Routability (what/how to measure at RTL level??)
  - Power (yet another dimension)
  - …
- Real opportunity for automation/exploration by high-level synthesis with BETTER quality

Significant Progress on HLS
- Wide acceptance of C/C++/SystemC for design modeling and simulation
  - Paves the way for C/C++/SystemC based HLS
- Better compilation infrastructure
  - Leveraging the progress in the compiler community
- Advancements of core HLS algorithms, e.g. research from UCLA:
  - SDC-based scheduling
  - Distributed register file based architecture
  - Simultaneous computation and communication synthesis
  - Pattern-based synthesis
  - HLS for power
  - …
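
The slide only names SDC-based scheduling, so the following is a minimal sketch of the general idea rather than the UCLA algorithm itself. It assumes scheduling constraints can be written as difference constraints of the form s[v] - s[u] >= d, where s is the start step of an operation. It finds the earliest feasible start steps by longest-path relaxation, which is a simplification: the published SDC formulation is usually described as a linear program over such constraints. The function name, the operation numbering, and the example constraints are all hypothetical.

```python
def sdc_schedule(num_ops, constraints):
    """Return earliest start steps s with s[v] - s[u] >= d for each (u, v, d).

    Uses Bellman-Ford style longest-path relaxation from an all-zero start.
    Raises ValueError when the constraints contain a positive cycle,
    i.e. when no schedule can satisfy all of them.
    """
    start = [0] * num_ops
    for _ in range(num_ops):
        changed = False
        for u, v, d in constraints:
            if start[u] + d > start[v]:
                start[v] = start[u] + d
                changed = True
        if not changed:
            return start
    raise ValueError("infeasible difference constraints")


# Hypothetical dataflow: 0 and 1 are loads, 2 is a multiply,
# 3 is an add, 4 is a store.
example_constraints = [
    (0, 2, 1),   # multiply starts at least 1 step after load 0
    (1, 2, 1),   # multiply starts at least 1 step after load 1
    (2, 3, 2),   # add waits 2 steps for the multiply result
    (3, 4, 1),   # store starts at least 1 step after the add
    (4, 2, -3),  # relative timing: store at most 3 steps after the multiply
]
schedule = sdc_schedule(5, example_constraints)  # [0, 0, 1, 3, 4]
```

Dependencies and relative timing bounds share one constraint form here, which is what makes difference constraints attractive: a "no later than" bound is simply a constraint with a negative d in the opposite direction.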

A New Generation of HLS Tool – E.g. AutoESL
- Best language coverage
  - Pure ANSI C/C++ synthesis
  - SystemC/TLM synthesis
- Aggressive power optimization
  - Clock gating
  - Operation gating
  - Frequency scaling
  - Power/performance trade-off
  - …
- Best QoR
  - Leveraging 8+ years of research from UCLA on ESL synthesis
- Ideal for reuse and arch-exploration
  - Platform-based synthesis
  - Separate source & constraint
  - Link to implementation flows
- Unique ESL synthesis technology

[Figure: ESL synthesis flow diagram. Box labels: C/C++/SystemC; User Constraints & Directives; Platform Libraries; Compilation & Elaboration; Advanced Code Transformation; Behavior & Interface Synthesis; Performance/Power/Area Optimizations; Microarchitecture Generation; RTL HDLs; RTL SystemC; RTL Constraints (Timing/Layout); ASIC/FPGA RTL Synthesis and Place-and-Route; Simulator/Verifier.]

MPEG4 4CIF by AutoPilot vs Manual Design
- Frame rate
  - 60 fps based on estimation for 4CIF video
  - 200 fps on v2p board for 1CIF video

| Block | Manual BRAM# | Manual MULT# | Manual SLICE# | AutoPilot BRAM# | AutoPilot MULT# | AutoPilot SLICE# |
|---|---|---|---|---|---|---|
| Parser/VLD | | 1 | 1700 | | 2 | 2156 |
| Copy Control | | 1 | 340 | | 2 | 264 |
| Motion Comp | | 1 | 340 | | 1 | 262 |
| Texture/IDCT | 6 | 25 | 1710 | 6 | 19 | 1560 |
| Texture Update | | 2 | 150 | | 2 | 133 |
| Total | 6 | 30 | 4240 | 6 | 26 | 4375 |
| AutoPilot vs manual | | | | 0.0% | -13.3% | 3.2% |
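
The totals and percentages in the comparison above can be rechecked with a few lines of arithmetic. This sketch assumes that blank cells mean zero and that the single small number in each sparse row belongs to the MULT# column, which is the reading under which the per-block figures add up to the stated totals. The dictionary and function names are illustrative only.

```python
# Per-block resource usage as (BRAM, MULT, SLICE); blank cells assumed to be 0.
manual = {
    "Parser/VLD": (0, 1, 1700),
    "Copy Control": (0, 1, 340),
    "Motion Comp": (0, 1, 340),
    "Texture/IDCT": (6, 25, 1710),
    "Texture Update": (0, 2, 150),
}
autopilot = {
    "Parser/VLD": (0, 2, 2156),
    "Copy Control": (0, 2, 264),
    "Motion Comp": (0, 1, 262),
    "Texture/IDCT": (6, 19, 1560),
    "Texture Update": (0, 2, 133),
}


def totals(design):
    """Sum each resource column over all blocks."""
    return tuple(sum(column) for column in zip(*design.values()))


def percent_change(new, old):
    """Relative change of new versus old, in percent, to one decimal place."""
    return round(100.0 * (new - old) / old, 1)


manual_total = totals(manual)        # (6, 30, 4240)
autopilot_total = totals(autopilot)  # (6, 26, 4375)
differences = [
    percent_change(new, old)
    for new, old in zip(autopilot_total, manual_total)
]                                    # [0.0, -13.3, 3.2]
```

Under these assumptions the AutoPilot design uses the same number of block RAMs, about 13% fewer multipliers, and about 3% more slices than the manual design.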

1M+ Gate Wireless Communication Module by AutoPilot
- Quickly generate multiple solutions with the same sample rate, but different area/power profiles
- Manual design took 4 months, while C-based synthesis using AutoPilot took two weeks

TSMC65nmLP Library

| Architecture | Latency | Area (mm^2) | Clock (MHz) |
|---|---|---|---|
| manual | 96 | 1.17 | 150 |
| config1 | 116 | 1.10 | 150 |
| config2 | 86 | 1.12 | 150 |
| config3 | 81 | 1.30 | 100 |
| config4 | 64 | 1.55 | 75 |

Next Challenges
- Even better QoR, out-of-box success
- Further algorithmic innovation for HLS
- Aggressive power optimization
- Physical synthesis above RTL
- Integrated synthesis and verification
- Synthesis support for variability and reliability