A Network of Time-Division Multiplexed Wiring for FPGAs
Rosemary Francis, Simon Moore, Robert Mullins
Motivation
- FPGAs are now home to complex Systems-on-Chip
- ...but are still optimised for single-core designs
- FPGA global wiring is simple in comparison with ASIC Networks-on-Chip
- Networks for FPGAs use lots of logic
- Hard blocks are limited by the soft IP blocks
Goals
- Use TDM components for effective soft NoC implementation
- Funnel data to high-speed hard blocks
– Hard NoC
– Multipliers
– Block RAM
- Determine optimum TDM architecture
– What are the costs?
– Is it possible to design for global and local routing?
Hierarchy of interconnect
- Clusters of logic elements with local interconnect
- Time-division multiplexed wires in a fine-grain network
- Coarse-grain packet-switched network
Architecture: Stratix vs TDM
[Figure: side-by-side block diagrams. Stratix: cluster of logic elements (LUT), switch box, global routing, local routing. TDM: cluster of logic elements with latched inputs (LUT), switch box, SRAM, global routing, local routing.]
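The slides do not give the circuit of a TDM switch box, so the following is only a behavioural sketch under stated assumptions: each routing mux reads its select value from a small per-slot table (standing in for the extra configuration SRAM) and the interconnect clock steps through the slots. The class name `TdmMux` and its fields are hypothetical and are not taken from the presentation.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class TdmMux:
    """Hypothetical model of one time-multiplexed routing mux.

    select_per_slot[t] is the input index chosen during time slot t; in
    hardware this table is assumed to live in the configuration SRAM.
    """
    select_per_slot: List[int]

    def route(self, inputs: Sequence[int], cycle: int) -> int:
        # The interconnect clock steps through the slots and wraps around.
        slot = cycle % len(self.select_per_slot)
        return inputs[self.select_per_slot[slot]]


# A statically configured mux is the one-slot special case.
static_mux = TdmMux(select_per_slot=[1])
# A three-slot mux forwards a different input in each slot.
tdm_mux = TdmMux(select_per_slot=[0, 2, 1])

inputs = [10, 20, 30]
outputs = [tdm_mux.route(inputs, cycle) for cycle in range(6)]
print(outputs)  # [10, 30, 20, 10, 30, 20]
```

In this model one physical output wire carries three logical signals in turn, at the cost of a select table that grows with the number of time slots.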
Wire Sharing
- Many wires can be shared without a problem
- Other configurations require a more intelligent approach
- Signals can be delayed to allow more efficient wire use without rerouting
[Figure: wire-sharing example built up in three steps, with signals labelled by time slot; the second step marks a "Conflict!!" between two signals.]
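The idea of delaying a signal instead of rerouting it can be sketched as a greedy slot assignment. This is an illustration only, not the authors' scheduler: the function name `schedule`, the input format, and the assumption that a signal crosses one wire per time slot are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def schedule(signals: Dict[str, Tuple[int, List[str]]],
             num_slots: int) -> Dict[str, int]:
    """Greedy sketch: pick a start slot per signal so no wire is used twice in a slot.

    signals maps a name to (earliest_slot, route), where route lists the
    wires crossed in order, assuming one wire per time slot. Slot numbers
    wrap modulo num_slots. Returns the chosen start slot for every signal.
    """
    busy: Set[Tuple[str, int]] = set()  # (wire, slot) pairs already taken
    start_slots: Dict[str, int] = {}
    for name, (earliest, route) in signals.items():
        for delay in range(num_slots):
            hops = [(wire, (earliest + delay + i) % num_slots)
                    for i, wire in enumerate(route)]
            if not any(hop in busy for hop in hops):
                busy.update(hops)
                start_slots[name] = earliest + delay
                break
        else:
            # Every possible delay collides: the route itself must change.
            raise ValueError(f"no free slot for {name}; it would need rerouting")
    return start_slots


example = {
    "a": (0, ["w1", "w2"]),
    "b": (0, ["w3", "w2"]),  # would meet "a" on w2 in slot 1
    "c": (0, ["w1", "w4"]),  # would meet "a" on w1 in slot 0
}
result = schedule(example, num_slots=4)
print(result)  # {'a': 0, 'b': 1, 'c': 1}
```

Here signals "b" and "c" are each delayed by one slot and keep their original routes, which mirrors the third wire-sharing case above. The latency added by such delays is the cost the later slides measure.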
Our Scheduler
- Our scheduler
– maps benchmarks from a Stratix FPGA to a TDM FPGA
– resolves TDM conflicts after place and route
- Benchmarks
– IP cores taken from the Altera University Suite
- Aim
– To reduce the amount of wiring as far as possible using TDM wiring with realistic characteristics
Parameter selection (1 of 3)
- Assume infinite time slots to reduce wiring
– Determine minimum number of TDM wires
Infinite number of time slots
[Figure: axes are number of TDM wires and total number of wires needed; reference line: Stratix with static wiring.]
Parameter selection (2 of 3)
- Assume infinite time slots to reduce wiring
– Determine minimum number of TDM wires
- Vary number of time slots
– Determine optimum number of time slots
– Investigate the effect this has on latency
Determine number of time slots
[Figure: axes are number of time slots (= number of configuration bits per mux) and wires per switch box; reference line: Stratix with static wiring.]
Number of time slots vs latency
[Figure: axes are number of time slots (= number of configuration bits per mux) and normalised latency of critical path.]
Parameter selection (3 of 3)
- Assume infinite time slots to reduce wiring
– Determine minimum number of TDM wires
- Vary number of time slots
– Determine optimum number of time slots
– Investigate the effect this has on latency
- Using optimum number of time slots
– Re-evaluate optimum number of TDM wires
Limited resources
[Figure: axes are number of TDM wires and total number of wires needed; reference line: Stratix with static wiring.]
- All but two benchmarks map to 13 wires
- A reduction of over 75%
Architectural drawbacks
- Extra configuration SRAM
- High-speed interconnect clock
- Benchmarks run over three times slower
- New CAD tools needed
– Re-routing in space as well as time
– Optimise for TDM wiring at every stage
Conclusions
- Using TDM wiring we can reduce the number of wires whilst increasing the data rate within channels
– 75% less wiring (0.25×) × 24 time slots ÷ 3 times slower = 2 times the channel data rate
- This will allow
– the design of effective global interconnect
– more efficient sharing of on-chip resources
– simplification of multi-chip designs
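The channel data rate figure in the conclusions can be checked with a few lines of arithmetic. The function and parameter names are illustrative; the three input values are the ones quoted on the slide.

```python
def relative_channel_data_rate(wire_fraction: float,
                               time_slots: int,
                               slowdown: float) -> float:
    """Channel data rate relative to static wiring.

    wire_fraction: remaining wires as a fraction of the original (75% less -> 0.25)
    time_slots:    signals carried per wire per user clock cycle
    slowdown:      factor by which the benchmarks run slower
    """
    return wire_fraction * time_slots / slowdown


rate = relative_channel_data_rate(wire_fraction=0.25, time_slots=24, slowdown=3.0)
print(rate)  # 2.0
```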
Future Work
- Current scheduling algorithm gives
– Large wire reduction, large latency penalty
- We are investigating a better compromise
– Small wiring reduction, small latency penalties?
– Recent new results show this is possible
- Area and power