A Network of Time-Division Multiplexed Wiring for FPGAs

SLIDE 1

A Network of Time-Division Multiplexed Wiring for FPGAs

Rosemary Francis Simon Moore Robert Mullins

SLIDE 2

Motivation

FPGAs are now home to complex Systems-on-Chip
...but are still optimised for single-core designs
FPGA global wiring is simple in comparison with ASIC Networks-on-Chip
Networks for FPGAs use lots of logic
Hard blocks are limited by the soft IP blocks

SLIDE 3

Goals

Use TDM components for effective soft NoC implementation

Funnel data to high-speed hard blocks

– Hard NoC
– Multipliers
– Block RAM

Determine optimum TDM architecture

– What are the costs?
– Is it possible to design for global and local routing?

SLIDE 4

Hierarchy of interconnect

Clusters of logic elements with local interconnect
Time-division multiplexed wires in a fine-grain network
Coarse-grain packet-switched network

SLIDE 5

Architecture: Stratix vs TDM

[Figure: two block diagrams. Stratix: cluster of logic elements (LUT), switch box, global routing, local routing. TDM: cluster of logic elements with latched inputs (LUT), switch box, SRAM, global routing, local routing.]
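
A minimal sketch of how a time-division multiplexed routing mux might behave. It assumes the configuration SRAM holds one select value per time slot and that a slot counter steps through them cyclically. The class name `TdmMux` and its interface are hypothetical and are not taken from the slides.

```python
class TdmMux:
    """Toy model of a routing mux whose select input changes every time slot."""

    def __init__(self, select_per_slot):
        # Assumption: one configuration entry per time slot, held in SRAM.
        self.select_per_slot = list(select_per_slot)
        self.slot = 0

    def tick(self, inputs):
        """Return the input chosen for the current slot, then advance the slot."""
        out = inputs[self.select_per_slot[self.slot]]
        self.slot = (self.slot + 1) % len(self.select_per_slot)
        return out


# Three time slots: the mux forwards input 0, then input 2, then input 1.
mux = TdmMux([0, 2, 1])
inputs = ["a", "b", "c"]
outputs = [mux.tick(inputs) for _ in range(4)]
print(outputs)  # ['a', 'c', 'b', 'a']
```

A statically configured mux corresponds to the single-slot case, `TdmMux([k])`, which always forwards input `k`.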

SLIDE 6

Wire Sharing

Many wires can be shared without a problem

[Figure: wire-sharing example, labels 1 to 5]

SLIDE 7

Wire Sharing

Many wires can be shared without a problem

Other configurations require a more intelligent approach

[Figure: wire-sharing example marked "Conflict!!", labels 1 and 2]

SLIDE 8

Wire Sharing

Many wires can be shared without a problem

Other configurations require a more intelligent approach

Signals can be delayed to allow more efficient wire use without rerouting

[Figure: wire-sharing example with the conflict resolved, labels 1 to 5]
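
The idea of delaying a signal instead of rerouting it can be sketched as a greedy slot assignment. This is an illustration only, not the authors' scheduler. It assumes each hop over a wire takes one time slot, that a wire carries one signal per slot, and that the number of slots is unbounded, so there is no wrap-around. The function name `schedule` and the wire names are hypothetical.

```python
from collections import defaultdict


def schedule(signals):
    """Assign a time slot to every hop of every signal.

    signals maps a signal name to the ordered list of wires on its route.
    A signal that finds its next wire busy waits for the next free slot.
    """
    busy = defaultdict(set)  # wire -> slots already taken on that wire
    result = {}
    for name, path in signals.items():
        slot = 0
        hops = []
        for wire in path:
            slot += 1  # earliest slot in which this hop can happen
            while slot in busy[wire]:
                slot += 1  # conflict: delay the signal by one slot
            busy[wire].add(slot)
            hops.append((wire, slot))
        result[name] = hops
    return result


# Signals "a" and "b" both need wire w2 in slot 2, so "b" is delayed to slot 3.
plan = schedule({"a": ["w1", "w2"], "b": ["w3", "w2"]})
print(plan["a"])  # [('w1', 1), ('w2', 2)]
print(plan["b"])  # [('w3', 1), ('w2', 3)]
```

Both signals keep their original routes; only the arrival time of "b" changes, which is the trade of latency for wiring that the slide describes.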

SLIDE 9

Our Scheduler

Our scheduler

– maps benchmarks from a Stratix FPGA to a TDM FPGA
– resolves TDM conflicts after place and route

Benchmarks

– IP cores taken from the Altera University Suite

– To reduce the amount of wiring as far as possible using TDM wiring with realistic characteristics

SLIDE 10

Parameter selection (1 of 3)

Assume infinite time slots to reduce wiring

– Determine minimum number of TDM wires

SLIDE 11

Infinite number of time slots

[Plot: axes are number of TDM wires and total number of wires needed. Reference: Stratix with static wiring.]

SLIDE 12

Parameter selection (2 of 3)

Assume infinite time slots to reduce wiring

– Determine minimum number of TDM wires

Vary number of time slots

– Determine optimum number of time slots
– Investigate the effect this has on latency

SLIDE 13

Determine number of time slots

[Plot: axes are number of time slots (= number of configuration bits per mux) and wires per switch box. Reference: Stratix with static wiring.]

SLIDE 14

Number of time slots vs latency

[Plot: axes are number of time slots (= number of configuration bits per mux) and normalised latency of critical path.]

SLIDE 15

Parameter selection (3 of 3)

Assume infinite time slots to reduce wiring

– Determine minimum number of TDM wires

Vary number of time slots

– Determine optimum number of time slots
– Investigate the effect this has on latency

Using optimum number of time slots

– Re-evaluate optimum number of TDM wires

SLIDE 16

Limited resources

[Plot: axes are number of TDM wires and total number of wires needed. Reference: Stratix with static wiring.]

All but two benchmarks map to 13 wires
A reduction of over 75%

SLIDE 17

Architectural drawbacks

Extra configuration SRAM
High-speed interconnect clock
Benchmarks run over three times slower
New CAD tools needed

– Re-routing in space as well as time
– Optimise for TDM wiring at every stage

SLIDE 18

Conclusions

Using TDM wiring we can reduce the number of wires whilst increasing the data rate within channels

– 25% of the wiring × 24 time slots ÷ 3 (benchmarks run three times slower) = 2 times the channel data rate
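
The arithmetic behind the channel data rate figure can be checked directly. The variable names are illustrative; the three input values are the ones quoted on the slides.

```python
wire_fraction = 0.25  # 75% less wiring leaves a quarter of the wires
time_slots = 24       # signals each remaining wire can carry per user clock cycle
slowdown = 3          # benchmarks run three times slower

# Channel data rate relative to the statically wired channel.
relative_rate = wire_fraction * time_slots / slowdown
print(relative_rate)  # 2.0
```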

This will allow

– the design of effective global interconnect
– more efficient sharing of on-chip resources
– simplification of multi-chip designs

SLIDE 19

Future Work

Current scheduling algorithm gives

– Large wire reduction, large latency penalty

We are investigating a better compromise

– Small wiring reduction, small latency penalties?
– Recent new results show this is possible

Area and power

– Is the wiring reduction enough to justify the extra area and power costs?

SLIDE 20

Thanks for listening...

Rosemary.Francis@cl.cam.ac.uk