Architectures and Algorithms for Synthesizable Programmable Logic Cores

Noha Kafafi,Kimberly Bozman,Steve Wilton University of British Columbia Vancouver, B.C., Canada

This work is funded by Altera, Micronet, and NSERC

Configurable Logic Cores: Why?

  • 1. Accommodate complexity of current designs
  • 2. Postpone some decisions until late in design cycle
  • 3. Fast upgrade path for products
  • 4. Can "patch up" design errors


Soft Programmable Logic Cores:

  • One way: Hard PLC
  • Alternative way: Soft PLC, which uses standard cells to implement the PLC (this work talks about this way)

Outline

  • Process
  • New architectures
  • CAD tool algorithms
  • Results
  • Future work
  • Summary

How do we make a soft PLC?

  • 1. Separate logic
  • 2. Generate an HDL description of the PLC
  • 3. Combine PLC HDL with fixed-logic HDL
  • 4. Use standard synthesis tools to generate I.C.
  • 5. Fabricate chip
  • 6. Program PLC

Soft Programmable Logic Cores:

Advantages of using soft cores:

  • Easy to integrate: place and route with the rest of the ASIC
  • Very flexible: can generate exactly the core you need
  • Easy to migrate to smaller technologies

Disadvantages:

  • Really inefficient compared to a hard core (estimated 6-7x bigger)

Our thought: It makes sense if you only want a small PLC (a few hundred gates, perhaps), e.g. next-state logic in a state machine

Interesting Tid-bit: When we synthesized our programmable logic core, we had all sorts of problems with combinational loops, but an un-programmed FPGA is full of them!

Our solution: We use uni-directional architectures. Feedback would have to be done outside the PLC

Directional Architecture:

[Figure: array of nine 3-LUTs, with a close-up of the switch block]


A Few Interesting Observations:

  • 1. Since we are only implementing small blocks, we can remove some flexibility
  • 2. Since these blocks are hardwired to the rest of the chip, we still need lots of flexibility at the inputs and outputs
  • 3. Each "tile" need not be identical

Gradual Architecture:

[Figure: 3-LUTs arranged between INPUTS and OUTPUTS, with routing multiplexers (labelled x3 and x4) between them. All inputs are fed into a multiplexer.]


Segmented Architecture:

[Figure: nine 3-LUTs between INPUTS and OUTPUTS, with routing multiplexers labelled x4]

Computer-Aided Design Tools

Place and Route tools are needed to implement a user circuit on our core

  • Need new algorithms for our architectures, to take into account:
  • the directional aspect
  • the new routing structure
  • More details to come...



More on Placement and Routing

Placement:

  • Used simulated annealing algorithm
  • Normal FPGA algorithms are based on wire length or critical path delay
  • For our architecture, routing resources are very limited => minimize overuse of routing multiplexors
  • Goal is to achieve "Placement for Routability"

Routing:

  • Turns out this is an easy problem
  • Normal FPGA routers work well
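
The slides name simulated annealing but do not show the schedule or the move set. As a rough illustration only, a generic swap-based annealer might look like the sketch below. The function name `anneal_placement`, the geometric cooling schedule, and all parameter defaults are assumptions made for this example, not the authors' settings.

```python
import math
import random


def anneal_placement(placement, cost_fn, t_start=10.0, t_end=0.01,
                     cooling=0.9, moves_per_temp=50, seed=0):
    """Generic swap-based simulated annealing (illustrative sketch).

    placement: list where placement[i] is the block in slot i
               (needs at least two slots).
    cost_fn:   hypothetical callable mapping a placement to a number,
               e.g. a routability cost.
    Returns the lowest-cost placement seen and its cost.
    """
    rng = random.Random(seed)
    current = list(placement)
    current_cost = cost_fn(current)
    best, best_cost = list(current), current_cost
    temp = t_start
    while temp > t_end:
        for _ in range(moves_per_temp):
            # Propose a move: swap the blocks in two distinct slots.
            i, j = rng.sample(range(len(current)), 2)
            current[i], current[j] = current[j], current[i]
            new_cost = cost_fn(current)
            delta = new_cost - current_cost
            # Always accept improvements; accept uphill moves with
            # probability exp(-delta / temp).
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                current_cost = new_cost
                if current_cost < best_cost:
                    best, best_cost = list(current), current_cost
            else:
                # Reject the move by undoing the swap.
                current[i], current[j] = current[j], current[i]
        temp *= cooling  # assumed geometric cooling schedule
    return best, best_cost
```

Any cost function can be plugged in; for "Placement for Routability" it would be a measure of routing multiplexor overuse rather than wire length.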

Placement Costs

cost = ΣΣ [MAX(0, Occ(x,y) - Cap(x,y) + γ)]

  • Occ(x,y): occupancy of the mux at location (x,y), an estimate of how many nets would potentially use that routing mux
  • Cap(x,y): capacity of the mux at location (x,y), equal to 1
  • For Segmented Architecture:

cost = ΣΣΣ [MAX(0, Occ(x,y,z) - Cap(x,y,z) + γ)]
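
The cost expressions above can be sketched in a few lines of Python. This is a minimal illustration, assuming occupancies are stored in a dictionary keyed by mux location; the name `placement_cost` is made up for this example, and γ is treated as a tunable parameter because the slides do not give its value.

```python
def placement_cost(occ, cap=None, gamma=0.0):
    """Sum MAX(0, Occ - Cap + gamma) over all routing muxes.

    occ:   dict mapping a mux location tuple, (x, y) or (x, y, z),
           to its estimated occupancy.
    cap:   optional dict of capacities; when omitted, every mux has
           capacity 1, as stated on the slide.
    gamma: offset from the cost expression (value not given in the
           slides, so it is a free parameter here).
    """
    total = 0.0
    for loc, occupancy in occ.items():
        capacity = 1.0 if cap is None else cap[loc]
        total += max(0.0, occupancy - capacity + gamma)
    return total


# Three muxes; only the first is over capacity when gamma is 0.
example_occ = {(0, 0): 2.5, (0, 1): 0.5, (1, 0): 1.0}
example_cost = placement_cost(example_occ)
```

Because the keys are plain tuples, the same function covers the Segmented Architecture by using (x, y, z) keys.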


Gradual Architecture: Some Good Placements

[Figure: two example placements, each showing a source, its sinks, and the routing multiplexor between them]

Gradual Architecture: Estimating Mux Usage During Placement

  • Source with a single sink: probability of using each mux is 0.5
  • Source with several sinks: probability of using this mux is about 1

  • Source with sinks: probability of using each of these muxes is assumed to be 1

Segmented Architecture: Some Good Placements

[Figure: example placements showing a source, the routing multiplexors, and the sinks]


Segmented Architecture: Estimating Mux Usage During Placement

  • Source with a single sink: probability of using one mux is 0.34, probability of using the other mux is 0.66
  • Source with several sinks: probability of using both of these muxes is 1
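
One plausible way to turn per-net probabilities like these into the occupancy estimate used by the placement cost is to add them up per mux. The sketch below assumes exactly that (occupancy as an expected number of nets); the slides do not spell out the combination rule, and the name `estimate_occupancy` is made up for this example.

```python
from collections import defaultdict


def estimate_occupancy(net_mux_probs):
    """Accumulate per-net usage probabilities into per-mux occupancy.

    net_mux_probs: one dict per net, mapping a mux location tuple to
                   the estimated probability that the net uses that mux.
    Returns a dict mapping mux location to summed probability.
    ASSUMPTION: summing probabilities is an illustrative choice,
    not a rule stated in the slides.
    """
    occ = defaultdict(float)
    for probs in net_mux_probs:
        for loc, p in probs.items():
            occ[loc] += p
    return dict(occ)


# Net 1 may use either of two muxes (0.5 each); net 2 must use one.
nets = [{(0, 0): 0.5, (0, 1): 0.5}, {(0, 0): 1.0}]
occupancy = estimate_occupancy(nets)
```

With a capacity of 1 per mux, the mux at (0, 0) in this example would be flagged as likely overused.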

Experimental methodology

  • Used 20 MCNC benchmark circuits
  • Found the minimum sized core on which a circuit placed and routed successfully
  • Synthesized a core of that size and obtained area from Design Analyzer

Flow for each circuit:

  • 1. Place circuit using VPR
  • 2. Route circuit using VPR
  • 3. Routing successful? If no, increase core size and return to step 1
  • 4. If yes, generate VHDL description for core of that size
  • 5. Synthesize VHDL description using Design Analyzer
  • 6. Report core area (in square microns) from Design Analyzer
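
The search for the minimum core size can be sketched as a simple loop. This is an illustration under stated assumptions: `routes_successfully` is a hypothetical callback standing in for a place-and-route run, and the step of one size unit per iteration and the `max_size` limit are choices made for the example (the slides only say "increase core size").

```python
def find_min_core_size(routes_successfully, start_size=1, max_size=64):
    """Return the smallest core size that places and routes, or None.

    routes_successfully: hypothetical callable taking a core size and
                         returning True if the circuit placed and
                         routed on a core of that size.
    """
    size = start_size
    while size <= max_size:
        if routes_successfully(size):
            return size
        size += 1  # assumed increment; not specified in the slides
    return None


# Toy stand-in: a circuit needing 50 LUTs fits an n x n core once n*n >= 50.
min_size = find_min_core_size(lambda n: n * n >= 50)
```

In the real flow, the size found this way is the one for which VHDL is generated and synthesized to obtain the area.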


Measurements and Estimates:

Experimental Area Results:

  • Gradual Architecture is 19% more dense than Directional Architecture
  • Gradual Architecture is 25% more dense than Segmented Architecture

Area Estimate:

  • Compared to the same size hard-FPGA, our soft FPGA is about 6.4x less dense

Combinational Circuits:

[Figure, from Hutton et al: squar5.blif and sqrt8ml.blif]

Combinational logic naturally has a "triangular" shape


Non-Rectangular Cores:

[Figure: a six-block circuit (A to F) shown three ways: the original circuit, its mapping onto a normal FPGA, and its mapping onto our core]

Remember: Since we are synthesizing these cores with standard cells, the actual layout will not be triangular


Non-Rectangular Cores:

  • As c increases, cores are more triangular => less area, but eventually core size increases and area increases again
  • Using a c value of 0.6 results in 11% area savings, on average

[Plot: normalized area (geometric average) versus c]

Soft-PLC Area Distribution

[Chart legend: LUT (functional blocks); Config Area; Overhead; Top Overhead; LUT (mux_luts); ROUTE area; Input Mux Area; Output Mux Area; Config Area for LUT; Config Area for LUT MUX; Config Area for Routing; Config Area for I/O mux]


Further Gradual Architecture Optimizations:

[Figure: modified Gradual Architecture. 3-LUTs between INPUTS and OUTPUTS with routing multiplexers labelled x3 and x4. All inputs are fed into a multiplexer.]

  • Result: 8.9% reduction in average area

Further Gradual Architecture Optimizations:

[Figure: modified Gradual Architecture. 3-LUTs between INPUTS and OUTPUTS with routing multiplexers labelled x4. All inputs are fed into a multiplexer.]

  • Result: 12% increase in average area


Chip Layout

Chip with embedded 8x8 Gradual Core: 884106 µm²

What comes next?

Long-Term Goal: Programmable Logic Core Generator:

Architecture Generator

-- File: decoder1.vhd
-- Date: January 20th, 2002
-- Authors: Noha Kafafi
--          Kimberly Bozman
-- Description: Decoder block for lut
-- inputs: DEN (decode enable)
--         shift_in (config bits in one bit)
-- output: shift_out (config bits out variable length)
-- Notes: Nothing to change for size upgrade
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.STD_LOGIC_ARITH.all;

entity decoder1 is
  generic (config_width : INTEGER := 25);
  port (
    signal DEN       : in  STD_LOGIC;
    signal shift_in  : in  STD_LOGIC;
    signal shift_out : out STD_LOGIC_VECTOR(config_width-1 DOWNTO 0)
  );
end decoder1;

architecture rtl of decoder1 is
  signal i_bus : std_logic_vector(config_width-1 downto 0);
begin
  shift_out <= i_bus;

  -- On each rising edge of decode enable, shift in one configuration bit
  process (DEN, i_bus)
  begin
    if rising_edge(DEN) then
      for i in 0 to config_width-2 loop
        i_bus(i+1) <= i_bus(i);
      end loop;
      i_bus(0) <= shift_in;
    end if;
  end process;
end rtl;
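
To see what the shift register in decoder1 does to a configuration bitstream, here is a small behavioural model. It is only a sketch: the function name `shift_in_bits` is made up, and the register is assumed to start at all zeros, whereas in VHDL simulation the bits are undefined until written.

```python
def shift_in_bits(bits, config_width=25):
    """Behavioural model of the decoder1 shift register.

    bits: sequence of 0/1 values presented on shift_in, one per
          rising edge of DEN.
    Returns the register contents as a list indexed like i_bus,
    so index 0 holds the most recently shifted bit.
    ASSUMPTION: the register starts at all zeros.
    """
    i_bus = [0] * config_width
    for b in bits:
        # Mirrors: i_bus(i+1) <= i_bus(i) for all i; i_bus(0) <= shift_in
        i_bus = [b] + i_bus[:-1]
    return i_bus


# After three edges on a 4-bit register, the first bit sits at index 2.
state = shift_in_bits([1, 0, 1], config_width=4)
```

After config_width edges, the first bit presented ends up at the highest index, so a bitstream has to be sent in that order.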


Future Work

  • Investigate speed of our core
  • Investigate power implications of our core
  • Add new cells to the standard cell library

Future Work

Back Annotation of Delays: After physical layout, give actual wire delay information to VPR for accurate delay-driven placements

  • VPR assumes equal wire lengths: A→B = A→C
  • After physical layout, actual wire lengths vary: A→B << A→C

[Figure: blocks A, B and C as assumed by VPR and as placed after physical layout]


Summary

  • Soft cores are viable!
  • Compared to a hard core, a soft core is 6.4x less dense
  • Our Gradual soft core architecture is 19% more dense than the Directional Architecture
  • Our Gradual soft core architecture is 25% more dense than the Segmented Architecture
  • We've built a real chip (it has been fabricated and is now being tested)

For more details…

  • Paper: N. Kafafi, K. Bozman, S.J.E. Wilton, "Architectures and Algorithms for Synthesizable Embedded Programmable Logic Cores", in the ACM International Symposium on Field-Programmable Gate Arrays, Feb 2003.
  • Patent: S.J.E. Wilton, K. Bozman, N. Kafafi, J. Wu, "Method For Constructing An Integrated Circuit Device Having Fixed And Programmable Logic Portions And Programmable Logic Architecture For Use Therewith", U.S. Patent, submitted August 2003. Licensed by Altera Corporation.