SLIDE 1

Synthesis Challenges for Next-Generation High-Performance and High-Density PLDs

Jason Cong
Department of Computer Science, University of California, Los Angeles, USA

Songjie Xu
Aplus Design Technologies, Inc., Los Angeles, USA

SLIDE 2

Outline

• Introduction
• Synthesis Challenges for New Architectures
• Synthesis Challenges for High Density and High Performance
• Concluding Remarks

Current section: Introduction

SLIDE 3

PLD Industry Growth

• Enjoyed the same exponential growth as the rest of the semiconductor industry
• With an even faster rate

[Figure: bar chart of annual growth rate (1994-1998) by company/industry (Semiconductor Industry, Altera, Intel, LSI Logic); axis 0% to 40%; data labels 27.78%, 36.07%, 24.50%, 15.71%]

Section: Introduction

SLIDE 4

Definitions

• PLD (Programmable Logic Device)
  - CPLD (Complex PLD)
    - Extensions of early PAL
    - Consist of PLA-like blocks
    - Macrocell
  - FPGA (Field Programmable Gate Array)
    - Typically based on look-up tables (LUTs)
    - Multiple LUTs form a programmable logic block (PLB)

SLIDE 5

CPLD

• Example: Altera MAX 7000

SLIDE 6

Macrocell

• Example: Altera MAX 7000
  - Each macrocell has a logic array, a product-term select matrix, and a programmable register

SLIDE 7

Definitions

• PLD (Programmable Logic Device)
  - CPLD (Complex PLD)
    - Extensions of early PAL
    - Consist of PLA-like blocks
    - Macrocell
  - FPGA (Field Programmable Gate Array)
    - Typically based on look-up tables (LUTs)
    - Multiple LUTs form a programmable logic block (PLB)

SLIDE 8

FPGA

• Example: Xilinx XC 4000

SLIDE 9

PLB

• Xilinx XC 4000
  - Each PLB has two 4-LUTs, one 3-LUT and 2 FFs
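As a rough illustration of the PLB structure described above, a K-input LUT can be modeled as a truth table with 2^K entries, and the XC 4000 PLB as two 4-LUTs whose outputs feed a 3-LUT. The sketch below is a minimal model under that assumption. The names `make_lut` and `clb_output`, the choice of a single extra input on the 3-LUT, and the example configuration are illustrative and not taken from the slides or from Xilinx documentation. The flip-flops are omitted.

```python
from itertools import product


def make_lut(k, func):
    """Build a k-input LUT as a truth table with 2**k entries."""
    table = [func(*bits) for bits in product((0, 1), repeat=k)]

    def lookup(*bits):
        # Interpret the input bits as a binary index into the table.
        index = 0
        for bit in bits:
            index = (index << 1) | bit
        return table[index]

    return lookup


def clb_output(f_lut, g_lut, h_lut, x1, x2, extra):
    """Evaluate H(F(x1), G(x2), extra) for 4-bit tuples x1 and x2."""
    return h_lut(f_lut(*x1), g_lut(*x2), extra)


# Hypothetical configuration: F = 4-input AND, G = 4-input OR, H = 3-input XOR.
F = make_lut(4, lambda a, b, c, d: a & b & c & d)
G = make_lut(4, lambda a, b, c, d: a | b | c | d)
H = make_lut(3, lambda f, g, x: f ^ g ^ x)

print(clb_output(F, G, H, (1, 1, 1, 1), (0, 0, 0, 0), 0))
```

Changing only the truth tables reprograms the block, which is the property that technology mapping for LUT-based FPGAs relies on.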

SLIDE 10

Advance of PLD Architectures

Altera
  - 1980’s: MAX 5000
    - 32-192 P-terms
    - 600-3,750 usable gates
  - 1998/1999: APEX 20K
    - 51,840 logic elements (LUTs)
    - 442,368 RAM bits
    - 3,456 P-term macrocells
    - 60,000-1.5M usable gates

Xilinx
  - 1980’s: XC 2000
    - 64-100 LUTs
    - 1,200-1,800 logic gates
  - 1998/1999: Virtex
    - 58K-4M system gates
    - 1Mb distributed RAM
    - 832Kb embedded memory

SLIDE 11

PLD Synthesis Tends to Fall Behind ...

• Additional features and capabilities in a new architecture often place new requirements on synthesis tools
• Higher density and higher performance demand better scalability and more efficient optimization
• The devil is always in the software ...
  - Tool effort is often underestimated
  - Quick customization of an ASIC or existing PLD synthesis tool leads to considerably inferior results
  - Software is often the bottleneck of a new PLD product release ...

SLIDE 12

Challenges to PLD Synthesis

• Support for new PLD architectures
  - Hierarchical architectures
  - Heterogeneous architectures
• Support for high-performance and high-density PLD designs
  - Layout-driven synthesis
  - Incremental synthesis
  - IP-based synthesis

SLIDE 13

Outline

• Introduction
• Synthesis Challenges for New Architectures
• Synthesis Challenges for High Density and High Performance
• Concluding Remarks

Current section: Synthesis Challenges for New Architectures

SLIDE 14

PLD Architecture Development

• Two important trends
  - Hierarchical architectures
  - Heterogeneous architectures
• Synthesis needs

Section: Synthesis Challenges for New Architectures

SLIDE 15

PLD Architecture Development Trend: Hierarchical Architectures

• Basic idea
  - Group basic logic blocks into clusters
  - Fast local programmable interconnects inside clusters
  - May have multiple levels of hierarchy
• Benefits
  - Exploit the inherent locality of interconnections in most applications
  - Lead to improvement in both performance and density

SLIDE 16

Example Hierarchical Architectures

• Altera FLEX 10K
  - Each LAB has 8 LEs
  - Each LE has a 4-LUT and a programmable register

SLIDE 17

Two Types of Clusters

• Hard-wired connection based cluster (HCC)
  - Intra-cluster connection is formed by hard wires
  - e.g. CLB in XC4000
• Programmable interconnection based cluster (PIC)
  - Intra-cluster connection is formed by a local programmable interconnection array
  - e.g. LAB in FLEX 10K and APEX 20K

SLIDE 18

Existing Synthesis Results for HCC

• Traditional approach
  - Map into LUTs and then combine the LUTs to form HCCs in a heuristic post-processing step
• Recent advance [Cong & Hwang, FPGA’97]
  - Use Boolean matching techniques to completely characterize the set of functions that can be implemented in an HCC
  - Map a netlist directly into HCCs

SLIDE 19

Hard-Wired Connection Based Clusters (HCCs)

• Example: Xilinx XC 4000 CLB
  - Each CLB has two 4-LUTs connected to a 3-LUT

SLIDE 20

Example: Boolean Matching for XC4K CLB

• Characterization based on functional decomposition
  - f(X) = H(F(X1), G(X2))
  - f(X) = H(F(X1), G(X2), x)
  - f(X) = H(F(X1, x), G(X2), x)
  - f(X) = H(F(X1, x), G(X2, x), x)
• Conditions
  - F and G input sizes ≤ 4
• Result: matched all “difficult examples” (over 1,700) from Xilinx
  - Best known tool produced only about 70% match

[Figure: XC4K CLB with LUTs F, G, and H, extra input x, and output f(X)]
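The first decomposition form above, f(X) = H(F(X1), G(X2)), can be tested by brute force for a fixed split of the inputs into X1 and X2. The sketch below is not the Boolean matching algorithm of [Cong & Hwang, FPGA’97]; it is a minimal check based on a standard observation: if f is viewed as a matrix with rows indexed by X1 and columns by X2, the form exists exactly when the matrix has at most two distinct rows and at most two distinct columns. The function names and the two test functions are hypothetical.

```python
from itertools import product


def decomposes_as_h_f_g(f, n1, n2):
    """Check whether f(x1, x2) = H(F(x1), G(x2)) for some 1-bit F and G.

    f takes two bit tuples of lengths n1 and n2 and returns 0 or 1.
    View f as a matrix with rows indexed by x1 and columns by x2: the form
    exists exactly when there are at most two distinct rows and at most
    two distinct columns.
    """
    rows = list(product((0, 1), repeat=n1))
    cols = list(product((0, 1), repeat=n2))
    row_patterns = {tuple(f(r, c) for c in cols) for r in rows}
    col_patterns = {tuple(f(r, c) for r in rows) for c in cols}
    return len(row_patterns) <= 2 and len(col_patterns) <= 2


def parity_of_and_or(x1, x2):
    # AND of the first group XOR OR of the second group: decomposable.
    return int(all(x1)) ^ int(any(x2))


def sum_of_products(x1, x2):
    # x1[0]*x2[0] + x1[1]*x2[1]: needs more than one bit from each group.
    return (x1[0] & x2[0]) | (x1[1] & x2[1])


print(decomposes_as_h_f_g(parity_of_and_or, 4, 4))
print(decomposes_as_h_f_g(sum_of_products, 2, 2))
```

A full matcher would also have to search over input partitions and handle the three forms that share the extra input x, which this sketch does not attempt.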

SLIDE 21

Example: Mapping to XC4K CLB

• Given a function f(0,1,2,3,4,5) where
    a = 1’ + 3, b = 1 + 3
    f = 0’245b’ + 0’245’b + 0’145b + 012’5’a + 0’2’4’5a + 025b + 0’2’5’a’ + 045a’ + 05’b’
• How many XC4K CLBs are needed to implement f(0,1,2,3,4,5)?

SLIDE 22

Example: Mapping to XC4K CLB (Cont’d)

Mapping       Packing      #CLBs   #Levels
Chortle-crf   simple       9       4
FlowMap       simple       8       3
FlowMap       functional   6       3
Boolean       -            1       1

[Figure: the Boolean matching result, one CLB with LUTs F, G, and H; input labels 3, 1, 2, 5, 4]

SLIDE 23

Programmable Interconnection Based Cluster (PIC)

• Example: Altera APEX 20K
  - Each LAB has 10 LEs (LUT + FF) connected through a fully programmable matrix

SLIDE 24

Existing Synthesis Results for PIC

• Common approaches
  - Map into basic logic blocks and then group them into clusters under size and pin constraints
  - Recent progress on circuit clustering
    - Performance-driven clustering for combinational circuits [Lawler’69] [Yang & Wong, T-CAD’97]
    - Simultaneous clustering with retiming for sequential circuits [Pan, et al, T-CAD’98] [Cong, et al, DAC’99]
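To make "group into clusters under size and pin constraints" concrete, here is a minimal first-fit sketch. It is not any of the cited algorithms and ignores delay entirely: it walks the blocks in a given order and starts a new cluster whenever adding the next block would exceed the size limit or the number of nets crossing the cluster boundary. The netlist representation (a list of sets of block names) and all names are assumptions made for illustration.

```python
def external_pins(cluster, nets):
    """Count nets that connect a block inside the cluster to one outside it."""
    return sum(
        1 for net in nets
        if any(b in cluster for b in net) and any(b not in cluster for b in net)
    )


def greedy_cluster(blocks, nets, max_size, max_pins):
    """Pack blocks into clusters in the given order, respecting both limits.

    A block that violates the pin limit on its own still gets its own cluster;
    this sketch does not try to repair that case.
    """
    clusters = []
    current = set()
    for block in blocks:
        trial = current | {block}
        if len(trial) <= max_size and external_pins(trial, nets) <= max_pins:
            current = trial
        else:
            if current:
                clusters.append(current)
            current = {block}
    if current:
        clusters.append(current)
    return clusters


# Hypothetical chain of six logic blocks a-b-c-d-e-f.
nets = [{"a", "b"}, {"b", "c"}, {"c", "d"}, {"d", "e"}, {"e", "f"}]
print(greedy_cluster(["a", "b", "c", "d", "e", "f"], nets, max_size=3, max_pins=2))
```

The result depends heavily on the visiting order, which is one reason performance-driven formulations such as those cited above treat clustering as an optimization problem rather than a packing pass.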

SLIDE 25

Benefits of Considering Retiming during Clustering

• Proper clustering allows retiming to hide inter-cluster delays (e.g., assume gate_delay = 1, inter_cluster_delay = 2)

[Figure: Clustering A (Φ = 8) and Clustering B (Φ = 8), same cut size. Labels: "Φ = 8, retiming cannot help" and "Φ = 6, retiming reduces delay"]

SLIDE 26

Major Challenge in Synthesis for Hierarchical Architectures

• Can we synthesize a design directly into a multi-level hierarchical architecture?
  - Most existing PLD synthesis algorithms transform a given design into a flat netlist of basic PLBs and then go through a separate clustering/partitioning step.
  - Very few consider synthesizing directly for hierarchical architectures

SLIDE 27

PLD Architecture Development Trend: Heterogeneous Architectures

• Three types of heterogeneous architectures
  - Type 1: Multiple sizes and/or configurations of the same type of logic blocks
    - e.g. ORCA 2C, VF1, XC4000
  - Type 2: Multiple types of logic blocks
    - LUTs, macrocells, and MUXes
    - e.g. APEX 20K
  - Type 3: Different kinds of resources on the same chip
    - Programmable logic blocks
    - Embedded memory blocks (EMBs)
    - Embedded processors

SLIDE 28

Type 1 Heterogeneous Architectures

• Example: Xilinx XC 4000
  - Each CLB can implement two 4-LUTs or one 5-LUT

SLIDE 29

Synthesis Results for Type 1 Heterogeneous Architectures

• Area minimization
  - [He & Rose, FPGA’94]
  - [Korupolu, et al, DAC’98]
  - [Cong, Ding & Wu, FPGA’99]
• Delay minimization
  - HeteroMap [Cong & Xu, DAC’98]
    - Delay-optimal polynomial-time algorithm
• Evaluation results show
  - Heterogeneous architectures are superior to homogeneous ones for both area and delay
  - “One size fits all” doesn’t produce the best results.

SLIDE 30

Architecture Evaluation: Homogeneous vs. Heterogeneous FPGAs

Delay(3-LUT) : Delay(4-LUT) : Delay(5-LUT) : Delay(6-LUT) = 1 : 1.3 : 1.7 : 2
Area(3-LUT) : Area(4-LUT) : Area(5-LUT) : Area(6-LUT) = 1 : 2 : 4 : 8

[Figure: bar chart of Mapping-Delay and MemoryCell-Area (axis 0.5 to 2.5) for 3-LUT-FPGA, 4-LUT-FPGA, 5-LUT-FPGA, 6-LUT-FPGA, and 3-4-5-6-LUT-HeteroFPGA]

SLIDE 31

“AT^2-Metric” for Homogeneous and Heterogeneous FPGAs

Delay(3-LUT) : Delay(4-LUT) : Delay(5-LUT) : Delay(6-LUT) = 1 : 1.3 : 1.7 : 2
Area(3-LUT) : Area(4-LUT) : Area(5-LUT) : Area(6-LUT) = 1 : r : r^2 : r^3

[Figure: Area x Delay x Delay (axis 50000 to 350000) plotted against r (1 to 3) for 3-LUT, 4-LUT, 5-LUT, 6-LUT, and 3-4-5-6-LUT architectures]
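The area and delay ratios above are enough to sketch how an Area x Delay x Delay figure could be computed for a mapped circuit. The snippet below is an illustration only: the delay table and the area rule come from the slide, but the way the circuit is summarized (LUT counts plus the LUT sizes along the critical path), the function names, and both example mappings are invented, so the printed numbers say nothing about the actual evaluation results.

```python
DELAY = {3: 1.0, 4: 1.3, 5: 1.7, 6: 2.0}  # relative LUT delays from the slide


def lut_area(k, r):
    """Relative area of a k-LUT when each extra input scales area by r."""
    return r ** (k - 3)


def at2(lut_counts, critical_path, r):
    """Area x Delay x Delay for a mapped circuit.

    lut_counts: {lut_size: number of LUTs used}
    critical_path: list of LUT sizes along the longest path
    """
    area = sum(n * lut_area(k, r) for k, n in lut_counts.items())
    delay = sum(DELAY[k] for k in critical_path)
    return area * delay * delay


# Hypothetical mappings of the same circuit (numbers invented for illustration).
homogeneous_4lut = at2({4: 100}, [4, 4, 4, 4, 4], r=2.0)
heterogeneous = at2({3: 40, 4: 30, 6: 10}, [6, 4, 3, 3], r=2.0)
print(homogeneous_4lut, heterogeneous)
```

Sweeping `r` from 1 to 3 with real mapping results is what produces curves like those in the figure.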

SLIDE 32

Type 2 Heterogeneous Architectures

• An example: Altera APEX 20K
  - Embedded system blocks (ESBs) can implement dual-port RAM, ROM, FIFO, CAM blocks, and P-term logic
  - In P-term mode, each ESB has 16 macrocells
    - Each macrocell has two P-terms

SLIDE 33

Synthesis for Type 2 Heterogeneous Architectures

• Very little work
• Preliminary study for a hybrid architecture of LUTs and Pterm blocks [Kaviani, Ph.D. thesis’99]
  - Use a greedy approach for hybrid mapping
    - Use LUTs for density optimization
    - Use Pterm blocks for performance optimization

SLIDE 34

Type 3 Heterogeneous Architectures

• An example: FLEX 10K (logic array + embedded memory blocks (EMBs))
  - 576 to 12,160 LEs
  - 3 to 20 embedded array blocks (EABs)
    - Each EAB has 2K bits (11x1, 10x2, 9x4, 8x8)
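The four EAB configurations can be checked with a little arithmetic, reading each pair as (address bits) x (data width); that reading is an interpretation of the slide's shorthand rather than something it states. Under it, every configuration uses exactly 2^11 = 2048 bits:

```python
EAB_BITS = 2048  # 2K bits per EAB

# (address bits, data width) pairs as listed on the slide
CONFIGS = [(11, 1), (10, 2), (9, 4), (8, 8)]

for address_bits, width in CONFIGS:
    depth = 2 ** address_bits          # number of addressable words
    assert depth * width == EAB_BITS   # every mode fills the whole block
    print(f"{depth} x {width} = {depth * width} bits")
```

Seen as logic rather than memory, a block with a address bits and w data bits can hold any function with a inputs and w outputs, which is what makes embedded memory usable for logic implementation.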

SLIDE 35

Field-Programmable System-on-a-Chip (FPSOC)

[Figure: two block diagrams. General-purpose FPSOC: processor, memory, programmable logic. Application-specific FPSOC: processor, memory, programmable logic, ASIC]

SLIDE 36

Synthesis for Type 3 Heterogeneous Architectures

• Explore logic implementation using EMBs
  - Area minimization
    - EMB_Pack [Cong & Xu, FPGA’98]
  - With delay constraint
    - SMAP [Wilton, FPGA’98]
  - Delay minimization
    - [Cong & Xu, ICCAD’98]
• The general synthesis problem for FPSOC is largely untouched

SLIDE 37

Synthesis Needs for FP-SOC

• Partition the design/application onto heterogeneous resources, e.g.
  - Software/hardware partitioning
  - Memory/logic partitioning
• Efficient use of each type of resource, e.g.
  - Code generation for embedded CPUs
  - Automatic synthesis for FPGA
• Scheduling & synchronization of various components, e.g.
  - Real-time O/S
• Trade-off between heterogeneous resources
• Support for IP integration

SLIDE 38

Outline

• Introduction
• Synthesis Challenges for New Architectures
• Synthesis Challenges for High Density and High Performance
• Concluding Remarks

Current section: Synthesis Challenges for High Density and High Performance

SLIDE 39

Important Synthesis Problems

• Layout-driven synthesis
• Incremental synthesis
• IP-based design

Section: Synthesis Challenges for High Density and High Performance

SLIDE 40

Layout-Driven Synthesis

• Scaling of IC feature size [NTRS’97]
  - Interconnect delay becomes more and more dominant in the overall circuit delay
• FPGA design
  - Interconnect delay has always been very significant (due to programmable switches)
• Layout design has a significant impact on performance
• Synthesis needs to consider its impact on layout

SLIDE 41

Logic vs. Local Interconnect vs. Global Interconnect Delay

Delay Resource         Delay Value (ns)
Logic Element (LE)     2.4
Local Interconnect     0.5
Row Interconnect       4.7
Column Interconnect    7.2

(Altera FLEX8K part)

SLIDE 42

Delay Distribution

[Figure: delay distribution chart]

Logic                  30%
Local Interconnect      9%
Global Interconnect    61%

SLIDE 43

Challenges and Opportunities for Layout-Driven Synthesis

• Challenges:
  - Interconnect design is not finalized until after placement and routing
  - Both synthesis and layout are highly complex. How to properly combine them without complexity explosion?
• Opportunities: substantial performance gain
  - Example: Mapping with consideration of fast interconnections (cascade chains)

SLIDE 44

Comparison between FlowMap and Fast Interconnection Mapping

[Figure: bar chart (axis 0.2 to 1.4) of Mapping-Delay and #4-LUT for FlowMap (K=4) vs. Fast Interconnect Mapping; bar labels: -34% (Mapping-Delay), +24% (#4-LUT)]

• Delay assumption:
  - 4-LUT fast pin delay = 0.7 ns
  - 4-LUT slow pin delay = 2.7 ns
  - Fast interconnect delay = 0.2 ns
  - General interconnect delay = 4.1 ns
• LUT fast interconnect is connected to the fast pin

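The delay assumptions above suggest why a mapper that is aware of fast interconnect can trade extra LUTs for speed. The sketch below simply adds up pin and wire delays along a chain of LUTs. The four constants are from the slide; pairing general interconnect with the slow pin, the function name, and the two example paths are assumptions made for illustration.

```python
FAST_PIN = 0.7       # ns, 4-LUT fast pin delay
SLOW_PIN = 2.7       # ns, 4-LUT slow pin delay
FAST_WIRE = 0.2      # ns, fast interconnect delay
GENERAL_WIRE = 4.1   # ns, general interconnect delay


def path_delay(hops):
    """Sum the delay of a LUT chain; each hop is 'fast' or 'general'."""
    total = 0.0
    for hop in hops:
        if hop == "fast":
            # Fast interconnect drives the fast pin (stated on the slide).
            total += FAST_WIRE + FAST_PIN
        elif hop == "general":
            # Assumption: general interconnect enters through the slow pin.
            total += GENERAL_WIRE + SLOW_PIN
        else:
            raise ValueError(f"unknown hop type: {hop}")
    return total


# Hypothetical paths: four levels over general routing vs. five levels
# where three of the hops use fast interconnect.
print(path_delay(["general"] * 4))
print(path_delay(["general", "fast", "fast", "general", "fast"]))
```

Under these assumptions the five-level path is faster than the four-level one, which is consistent with the chart's trade of more 4-LUTs for lower mapping delay.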

SLIDE 45

Comparison between FlowMap and Fast Interconnection Mapping (Cont’d)

[Figure: bar chart (axis 0.2 to 1) of Mapping-Delay and #4-LUT for FlowMap (K=4) vs. Fast Interconnect Postprocessing; bar labels: -21% (Mapping-Delay), +0% (#4-LUT)]

• Delay assumption:
  - 4-LUT fast pin delay = 0.7 ns
  - 4-LUT slow pin delay = 2.7 ns
  - Fast interconnect delay = 0.2 ns
  - General interconnect delay = 4.1 ns
• LUT fast interconnect is connected to the fast pin

SLIDE 46

Incremental Synthesis

• Motivation
  - PLD designs are getting more complex
  - Every design process is iterative/incremental
  - Resynthesizing an entire large design is not acceptable when multiple design iterations are considered
  - The highly incremental design process requires fast incremental synthesis capabilities

SLIDE 47

Requirements on Incremental Synthesis

• Preservability
  - Preserve as much information as possible from the existing synthesis solution
• Efficiency
  - A faster synthesis system will enable more design iterations and shorten the overall design time
• Quality of the synthesis solution
  - Delay, area, etc. should be as close as possible to those obtained by complete re-synthesis

SLIDE 48

Status on Incremental Synthesis

• Very little work
  - ECO [Kukimoto & Fujita, ICCAD’92]
    - No structural change is allowed
    - Only functional change is allowed
  - Incremental mapping [Cong & Hui, DAC’2000]
    - Preserve optimal mapping depth
    - Achieve over 300X speed-up for circuits of about 100,000 gates compared to re-mapping by FlowMap
• Much more work is needed in this area

SLIDE 49

IP-Based Design

• Motivation
  - Design reuse to improve productivity
  - Better performance and density
• Example:
  - Altera IP MegaStore

SLIDE 50

SLIDE 51

Requirements on IP-Based Design

• IP representation: should allow migration between
  - Different FPGA vendors
  - Different FPGA generations
• Characterization
  - Functionality
  - Performance
• Interface with synthesis tools
  - Automatic inference/instantiation
  - Optimization and constraint propagation
  - Simulation and verification
• IP protection
  - How to prevent unauthorized use?
  - E.g. embed watermarks in FPGA mapping solutions [Kirovski, et al, ICCAD’98]

SLIDE 52

Outline

• Introduction
• Synthesis Challenges for New Architectures
• Synthesis Challenges for High Density and High Performance
• Concluding Remarks

Current section: Concluding Remarks

SLIDE 53

Concluding Remarks

• PLD market is going through a rapid expansion
• PLD synthesis is facing many new challenges
  - Support for new PLD architectures
    - Hierarchical architectures
    - Heterogeneous architectures
  - Support for high-performance and high-density PLD designs
    - Layout-driven synthesis
    - Incremental synthesis
    - IP-based synthesis
• Many research and business opportunities
  - UCLA VLSI CAD Laboratory
  - Aplus Design Technologies, Inc.

Section: Concluding Remarks

SLIDE 54

PLD Synthesis Research at UCLA

• Advanced synthesis algorithms
  - Synthesis for heterogeneous architectures
  - Synthesis for sequential circuits with simultaneous mapping, retiming, and pipelining
  - Layout-driven synthesis
  - IP-based synthesis
  - Synthesis/compilation techniques for FPSOC ...
  - Software prototype: RASP system
• Architecture evaluation
  - Evaluation of PLB architecture
  - Evaluation of heterogeneous architectures
  - Evaluation of hierarchical architectures ...
  - Software prototype: fpgaEva tool
• URL: http://cadlab.cs.ucla.edu/~xfpga

SLIDE 55

UCLA RASP Synthesis System for LUT-Based FPGAs

[Figure: flow diagram with the following boxes, in order: EDIF netlist / HDL design; internal netlist; LUT mapping engine; LUT netlist; PLB mapping engine; vendor-specific netlist (Xilinx, Altera, ORCA); placement and routing; chip programming information]

SLIDE 56

FPGA Architecture Evaluation

SLIDE 57

Aplus Design Technologies, Inc.

• A new start-up in PLD synthesis
  - Based in Los Angeles (near UCLA)
• Objective: provide Advanced Programmable Logic Unified Solution (APLUS)
  - Unify architecture and synthesis
  - Unify synthesis and layout
• Products & services
  - Next-generation synthesis tool for high-density, high-performance PLDs
  - Architecture evaluation tool kits and services
• Has already established strategic partnerships with several major PLD vendors
• URL: http://www.aplus-dt.com

SLIDE 58

THANK YOU!

J. Cong and S. Xu

SLIDE 59

The Typical Design Flow Using LPMs