Configurable Logic Cores: Why? 1. Accommodate complexity of current - PDF document

Architectures and Algorithms for Synthesizable Programmable Logic Cores
Noha Kafafi, Kimberly Bozman, Steve Wilton
University of British Columbia, Vancouver, B.C., Canada
This work is funded by Altera, Micronet, and NSERC.

Configurable Logic Cores: Why?
1. Accommodate the complexity of current designs
2. Postpone some decisions until late in the design cycle
3. Provide a fast upgrade path for products
4. "Patch up" design errors

Soft Programmable Logic Cores
One way: a hard PLC.
Alternative way: a soft PLC, which uses standard cells to implement the PLC. This work is about the soft approach.

Outline
• Process
• New architectures
• CAD tool algorithms
• Results
• Future work
• Summary

How do we make a soft PLC?
1. Separate the logic into fixed and programmable portions
2. Generate an HDL description of the PLC
3. Combine the PLC HDL with the fixed-logic HDL
4. Use standard synthesis tools to generate the IC
5. Fabricate the chip
6. Program the PLC

Soft Programmable Logic Cores
Advantages of using soft cores:
+ Easy to integrate: the core is placed and routed with the rest of the ASIC.
+ Very flexible: we can generate exactly the core you need.
+ Easy to migrate to smaller technologies.
Disadvantages:
- Much less efficient than a hard core (estimated to be 6-7x larger).
Our thought: a soft core makes sense if you only need a small PLC (perhaps a few hundred gates), e.g., the next-state logic of a state machine.

Interesting Tidbit
When we synthesized our programmable logic core, we ran into all sorts of problems with combinational loops, which is no surprise, since an unprogrammed FPGA is full of them!
Our solution: we use unidirectional architectures. Any feedback has to be implemented outside the PLC.

Directional Architecture
[Figure: a 3x3 array of 3-LUTs, with a close-up of a switch block]
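To make the combinational-loop point concrete, we can model a core's programmable connections as a directed graph and check it for cycles. The sketch below is a hypothetical model, not the authors' tool. `directional_edges` assumes each LUT in column c can drive any LUT in column c+1. `bidirectional_edges` assumes horizontally adjacent LUTs can drive each other in both directions. Cycle detection uses Kahn's topological sort.

```python
from collections import deque


def has_combinational_loop(edges):
    """Return True if the directed graph given as (driver, load) pairs has a cycle.

    Kahn's algorithm: repeatedly remove nodes with no remaining drivers.
    Nodes that can never be removed lie on (or downstream of) a cycle.
    """
    nodes = {n for edge in edges for n in edge}
    indegree = {n: 0 for n in nodes}
    loads = {n: [] for n in nodes}
    for driver, load in edges:
        loads[driver].append(load)
        indegree[load] += 1

    ready = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while ready:
        node = ready.popleft()
        removed += 1
        for load in loads[node]:
            indegree[load] -= 1
            if indegree[load] == 0:
                ready.append(load)
    return removed != len(nodes)


def directional_edges(cols, rows):
    """Hypothetical directional core: LUT (c, r) may drive any LUT in column c + 1."""
    return [((c, r), (c + 1, r2))
            for c in range(cols - 1) for r in range(rows) for r2 in range(rows)]


def bidirectional_edges(cols, rows):
    """Hypothetical conventional fabric: horizontal neighbours drive each other."""
    edges = []
    for c in range(cols - 1):
        for r in range(rows):
            edges.append(((c, r), (c + 1, r)))
            edges.append(((c + 1, r), (c, r)))
    return edges


print(has_combinational_loop(directional_edges(3, 3)))    # False
print(has_combinational_loop(bidirectional_edges(3, 3)))  # True
```

In the directional model, every edge points from a lower column to a higher one, so no cycle can form. Any feedback therefore has to live outside the core.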

A Few Interesting Observations
1. Since we are only implementing small blocks, we can remove some flexibility.
2. Since these blocks are hardwired to the rest of the chip, we still need lots of flexibility at the inputs and outputs.
3. Each "tile" need not be identical.

Gradual Architecture
[Figure: a 3x3 array of 3-LUTs between INPUTS and OUTPUTS, with "x3" labels at the LUT inputs and "x4" labels at the outputs of each row. All inputs are fed into the multiplexers.]

Segmented Architecture
[Figure: a 3x3 array of 3-LUTs between INPUTS and OUTPUTS, with "x4" labels at the outputs of each row]

Computer-Aided Design Tools
Place-and-route tools are needed to implement a user circuit on our core.
• We need new algorithms for our architectures that take into account:
  - the directional aspect
  - the new routing structure
• More details to come...

More on Placement and Routing
Placement:
- We use a simulated annealing algorithm.
- Normal FPGA placement algorithms are based on wire length or critical-path delay.
- For our architecture, routing resources are very limited, so we minimize overuse of routing multiplexers.
- The goal is to achieve "placement for routability".
Routing:
- This turns out to be an easy problem.
- Normal FPGA routers work well.

Placement Costs
$$\text{cost} = \sum_{x}\sum_{y} \max\bigl(0,\ \mathit{Occ}(x,y) - \mathit{Cap}(x,y) + \gamma\bigr)$$
The occupancy Occ(x,y) of the mux at location (x,y) is an estimate of how many nets would potentially use that routing mux. The capacity Cap(x,y) of the mux at location (x,y) is equal to 1.
For the Segmented Architecture:
$$\text{cost} = \sum_{x}\sum_{y}\sum_{z} \max\bigl(0,\ \mathit{Occ}(x,y,z) - \mathit{Cap}(x,y,z) + \gamma\bigr)$$
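The placement cost translates directly into a sum over routing multiplexers. This sketch stores occupancy and capacity in dictionaries keyed by mux location: a 2-tuple (x, y) for the first formula and a 3-tuple (x, y, z) for the Segmented one, so one function covers both. The name `placement_cost` and the value γ = 0.1 are illustrative assumptions. The annealing acceptance test shown is the generic Metropolis rule, not necessarily the exact schedule the authors used. With γ > 0, a mux whose estimated occupancy exactly equals its capacity still adds γ to the cost, which nudges the placer away from fully used muxes.

```python
import math
import random


def placement_cost(occupancy, capacity, gamma):
    """Routability cost: the sum over muxes of max(0, Occ - Cap + gamma).

    occupancy: dict mapping a mux location, (x, y) or (x, y, z), to the
               estimated number of nets that would use that mux.
    capacity:  dict with the same keys; the slides use Cap = 1 for every mux.
    """
    return sum(max(0.0, occupancy[loc] - capacity[loc] + gamma)
               for loc in occupancy)


def accept_move(delta_cost, temperature, rng=random):
    """Generic simulated-annealing (Metropolis) acceptance test; temperature > 0."""
    if delta_cost <= 0:
        return True  # always keep moves that do not increase the cost
    return rng.random() < math.exp(-delta_cost / temperature)


# First formula: muxes indexed by (x, y).
occ = {(0, 0): 0.5, (0, 1): 1.0, (1, 0): 2.0}
cap = {loc: 1 for loc in occ}
print(placement_cost(occ, cap, gamma=0.1))  # only (0, 1) and (1, 0) contribute

# Segmented formula: muxes indexed by (x, y, z); the same function applies.
occ3 = {(0, 0, 0): 1.0, (0, 0, 1): 0.34}
print(placement_cost(occ3, {loc: 1 for loc in occ3}, gamma=0.1))
```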

Gradual Architecture: Some Good Placements
[Figure: example placements, each showing a source, its sinks, and the routing multiplexers between them]

Gradual Architecture: Estimating Mux Usage During Placement
[Figure, two panels. Left (source, sink): the probability of using each mux is 0.5. Right (source, sinks): the probability of using this mux is about 1.]
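Occ(x, y) in the placement cost is an estimate of how many nets would potentially use a mux. One way to build that estimate, consistent with the probabilities in these figures, is to add up each net's probability of using each mux. In the sketch below, the even split among candidate muxes (`even_split`) is a hypothetical helper that reproduces the 0.5 case. The slides do not say how probabilities are derived in general; uneven values such as the Segmented Architecture's 0.34 and 0.66 can be passed in directly as a dictionary.

```python
from collections import defaultdict


def even_split(candidate_muxes):
    """Hypothetical helper: a net that may use any one of k muxes is assumed
    to use each with probability 1/k (k = 2 gives the 0.5 case on the slide)."""
    p = 1.0 / len(candidate_muxes)
    return {loc: p for loc in candidate_muxes}


def estimate_occupancy(nets):
    """Sum, over nets, the probability that each net uses each routing mux.

    nets: list with one dict per net, mapping mux location -> probability.
    Returns Occ as a dict keyed by mux location.
    """
    occ = defaultdict(float)
    for probabilities in nets:
        for loc, p in probabilities.items():
            occ[loc] += p
    return dict(occ)


nets = [
    even_split([(0, 0), (0, 1)]),  # net can take either of two muxes: 0.5 each
    {(0, 1): 1.0},                 # net that must pass through mux (0, 1)
]
print(estimate_occupancy(nets))  # {(0, 0): 0.5, (0, 1): 1.5}
```

Here mux (0, 1) ends up with an estimated occupancy of 1.5, above its capacity of 1. A routability-driven placer would try to move one of the two nets.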

Gradual Architecture: Estimating Mux Usage During Placement (continued)
[Figure (source, sinks): the probability of using each of these muxes is assumed to be 1.]

Segmented Architecture: Some Good Placements
[Figure: example placements showing a source, its sinks, and the routing multiplexers]

Segmented Architecture: Estimating Mux Usage During Placement
[Figure: for example nets (a source with one or more sinks), the estimated probability of using one mux is 0.34, of using another is 0.66, and of using both of a pair of muxes is 1.]

Experimental Methodology
• We used 20 MCNC benchmark circuits.
• For each circuit, we found the minimum-sized core on which it placed and routed successfully.
• We synthesized a core of that size and obtained its area.
Flow:
1. Place the circuit using VPR.
2. Route the circuit using VPR.
3. Routing successful? If not, increase the core size and go back to step 1.
4. If so, generate a VHDL description for a core of that size.
5. Synthesize the VHDL description using Design Analyzer.
6. Report the core area (in square microns) from Design Analyzer.
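The place, route, and grow loop in the methodology can be sketched as a linear search over core sizes. This sketch makes two assumptions: the core is square and grows one row and column at a time, and `place_and_route` is a hypothetical callback wrapping the VPR runs. The actual experiments may have grown the core differently. The size returned is what would then go to VHDL generation and synthesis.

```python
def find_min_core_size(circuit, place_and_route, start_size=2, max_size=64):
    """Return the smallest core size on which the circuit places and routes.

    place_and_route(circuit, size) is a hypothetical callback standing in for
    the VPR placement and routing runs on a size x size core; it returns True
    when routing succeeds. Returns None if no size up to max_size works.
    """
    for size in range(start_size, max_size + 1):
        if place_and_route(circuit, size):
            return size  # next steps: generate VHDL, synthesize, report area
    return None


def fake_vpr(circuit, size):
    # Toy stand-in: routing "succeeds" once there are enough LUT sites.
    return size * size >= circuit["num_luts"]


print(find_min_core_size({"num_luts": 20}, fake_vpr))  # 5
```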

Measurements and Estimates
Experimental Area Results:
• The Gradual Architecture is 19% more dense than the Directional Architecture.
• The Gradual Architecture is 25% more dense than the Segmented Architecture.
Area Estimate:
• Compared to a hard FPGA of the same size, our soft FPGA is about 6.4x less dense.

Combinational Circuits
[Figure from Hutton et al.: squar5.blif and sqrt8ml.blif]
Combinational logic naturally has a "triangular" shape.


Non-Rectangular Cores
[Figure: an original circuit with blocks A to F, its mapping onto a normal (rectangular) FPGA, and its mapping onto our triangular core]
Remember: since we synthesize these cores with standard cells, the actual layout will not be triangular.

Non-Rectangular Cores
[Plot: normalized area (geometric average) versus c, for c from 0.0 to 1.0]
• As c increases, cores become more triangular, so area decreases. Eventually, however, the core size increases and the area increases again.
• Using a c value of 0.6 results in an 11% area savings, on average.

Soft-PLC Area Distribution
[Chart: breakdown of soft-PLC area into top overhead; input mux area; output mux area; MUX ROUTE area; LUTs (mux_luts and functional blocks); configuration area for LUTs, routing, and I/O muxes; and configuration overhead]

Further Gradual Architecture Optimizations
[Figure: first variant of the Gradual Architecture, a 3x3 array of 3-LUTs between INPUTS and OUTPUTS with one "x3" label per row and "x4" labels at the outputs. All inputs are fed into the multiplexers.]
• Result: 8.9% reduction in average area.

Further Gradual Architecture Optimizations (continued)
[Figure: second variant, a 3x3 array of 3-LUTs between INPUTS and OUTPUTS with "x4" labels at the outputs. All inputs are fed into the multiplexers.]
• Result: 12% increase in average area.

Chip Layout
[Figure: layout of a chip with an embedded 8x8 Gradual core; labels read 884106 µm²]

What Comes Next?
Long-term goal: a Programmable Logic Core Generator.
Architecture Generator: example VHDL building block (decoder1.vhd):

    -- File: decoder1.vhd
    -- Date: January 20th, 2002
    -- Authors: Noha Kafafi
    --          Kimberly Bozman
    --
    -- Description: Decoder block for LUT
    --   inputs: DEN (decode enable)
    --           shift_in (configuration bits, one bit at a time)
    --   output: shift_out (configuration bits out; width set by config_width)
    --
    -- Notes: Nothing to change for size upgrade
    --
    library IEEE;
    use IEEE.STD_LOGIC_1164.all;

    entity decoder1 is
      generic (config_width : INTEGER := 25);
      port (
        signal DEN       : in  STD_LOGIC;
        signal shift_in  : in  STD_LOGIC;
        signal shift_out : out STD_LOGIC_VECTOR(config_width-1 downto 0)
      );
    end decoder1;

    architecture rtl of decoder1 is
      signal i_bus : std_logic_vector(config_width-1 downto 0);
    begin
      shift_out <= i_bus;

      -- On each rising edge of DEN, shift the register up by one position
      -- and load the next configuration bit from shift_in into bit 0.
      process (DEN)
      begin
        if rising_edge(DEN) then
          for i in 0 to config_width-2 loop
            i_bus(i+1) <= i_bus(i);
          end loop;
          i_bus(0) <= shift_in;
        end if;
      end process;
    end rtl;
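A bit-level Python model makes it easy to check the order in which configuration bits land in the decoder1 shift register. This is a sketch mirroring the VHDL above, not code from the authors. It assumes the register starts cleared; in VHDL simulation, i_bus would start as 'U' (uninitialized). The list slice reads the old values before any are overwritten, which matches VHDL's signal-assignment semantics inside the clocked process.

```python
class ConfigShiftRegister:
    """Bit-level model of the decoder1 entity.

    bits[0] corresponds to i_bus(0). Each call to rising_edge_den() mirrors
    one rising edge of DEN: every bit moves up one position and shift_in
    enters bit 0.
    """

    def __init__(self, config_width=25):
        self.bits = [0] * config_width  # assumption: register starts cleared

    def rising_edge_den(self, shift_in):
        # i_bus(i+1) <= i_bus(i) for i in 0..width-2; i_bus(0) <= shift_in
        self.bits = [shift_in] + self.bits[:-1]

    def shift_out(self):
        # shift_out <= i_bus, listed MSB first to match (config_width-1 downto 0)
        return list(reversed(self.bits))


reg = ConfigShiftRegister(config_width=4)
for bit in [1, 0, 1, 1]:  # the first bit shifted in ends up in the MSB
    reg.rising_edge_den(bit)
print(reg.shift_out())  # [1, 0, 1, 1]
```

A configuration bitstream should therefore be shifted in MSB first if shift_out is to read back in the same order.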

Future Work
• Investigate the speed of our core.
• Investigate the power implications of our core.
• Add new cells to the standard cell library.

Future Work: Back-Annotation of Delays
After physical layout, give the actual wire delay information to VPR for accurate delay-driven placement.
[Figure: VPR assumes equal wire lengths, so A → B = A → C; after physical layout the actual wire lengths vary, so A → B << A → C.]
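A placer could consume back-annotated delays through a lookup that prefers post-layout values and falls back to the uniform estimate. This is a minimal sketch of that idea, not how VPR itself stores delays. The function name, the `{(src, dst): delay}` table format, and the numbers are hypothetical.

```python
def connection_delay(src, dst, annotated=None, uniform_delay=1.0):
    """Delay a delay-driven placer would use for the connection src -> dst.

    Without back-annotation, every connection gets the same uniform estimate
    (the slide's "A -> B = A -> C"). With back-annotation, a hypothetical dict
    {(src, dst): delay} of post-layout values is consulted first.
    """
    if annotated is not None and (src, dst) in annotated:
        return annotated[(src, dst)]
    return uniform_delay


post_layout = {("A", "B"): 0.2, ("A", "C"): 1.5}  # made-up, unitless values
print(connection_delay("A", "B"), connection_delay("A", "C"))  # 1.0 1.0
print(connection_delay("A", "B", post_layout),
      connection_delay("A", "C", post_layout))  # 0.2 1.5
```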

Summary
• Soft cores are viable! Compared to a hard core, they are 6.4x less dense.
• Our Gradual soft-core architecture is 19% more dense than the Directional Architecture.
• Our Gradual soft-core architecture is 25% more dense than the Segmented Architecture.
• We've built a real chip (it has been fabricated and is now being tested).

For more details...
• Paper: N. Kafafi, K. Bozman, S.J.E. Wilton, "Architectures and Algorithms for Synthesizable Embedded Programmable Logic Cores", in the ACM International Symposium on Field-Programmable Gate Arrays, Feb. 2003.
• Patent: S.J.E. Wilton, K. Bozman, N. Kafafi, J. Wu, "Method For Constructing An Integrated Circuit Device Having Fixed And Programmable Logic Portions And Programmable Logic Architecture For Use Therewith", U.S. Patent, submitted August 2003. Licensed by Altera Corporation.
