SLIDE 1

A Network of Time Division Multiplexing for FPGAs

Rosemary Francis

SLIDE 2

Motivation

FPGAs are now home to complex Systems-on-Chip

Designs require the use of Networks-on-Chip

FPGA global wiring is simple in comparison with ASIC Networks-on-Chip

Networks for FPGAs use lots of wires or lots of logic

Hard blocks are limited by the soft IP blocks

SLIDE 3

Goals

Improve wiring density through TDM
Use TDM components for effective soft NoC implementation

Funnel data to high-speed hard blocks

– Hard NoC
– Multipliers
– Block RAM

SLIDE 4

Hierarchy of interconnect

Clusters of logic elements with local interconnect

Time-division multiplexed wires in a fine-grain network

Coarse-grain packet-switched network

SLIDE 5

Architecture: Stratix vs TDM

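[Figure: side-by-side block diagrams. Stratix: a cluster of logic elements (LUTs) with local routing, connected through a switch box to global routing. TDM: a cluster of logic elements (LUTs) with latched inputs and local routing, connected through a switch box with SRAM to TDM global routing.]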

SLIDE 6

Wire Sharing

Many wires can be shared without a problem

[Figure: signals labelled with time slots 1 to 5 sharing wires]

SLIDE 7

Wire Sharing

Many wires can be shared without a problem

Other configurations require a more intelligent approach

[Figure: a configuration in which signals in time slots 1 and 2 conflict]

SLIDE 8

Wire Sharing

Many wires can be shared without a problem

Other configurations require a more intelligent approach

Signals can be delayed to allow more efficient wire use without rerouting

[Figure: signals in time slots 1 to 5 sharing wires after some are delayed]
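The wire-sharing idea on these slides can be sketched as a greedy scheduler. This is a minimal illustration under simplifying assumptions, not the scheduling algorithm used in this work: all wires in one channel are treated as interchangeable, each hypothetical `Signal` has a name and the earliest time slot in which its value is ready, and a signal that finds every wire busy in that slot is delayed to a later slot instead of being rerouted.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    name: str
    ready_slot: int  # earliest time slot in which the value is available


def schedule(signals, num_wires, num_slots):
    """Greedily give each signal a (wire, slot) on a set of shared TDM wires.

    If every wire is busy in a signal's ready slot, the signal is delayed to
    a later slot rather than rerouted. Returns {name: (wire, slot, delay)}.
    """
    occupied = set()  # (wire, slot) pairs already carrying a signal
    result = {}
    # Earlier-ready signals pick first; sorted() is stable for ties.
    for sig in sorted(signals, key=lambda s: s.ready_slot):
        for slot in range(sig.ready_slot, num_slots):
            wire = next((w for w in range(num_wires) if (w, slot) not in occupied), None)
            if wire is not None:
                occupied.add((wire, slot))
                result[sig.name] = (wire, slot, slot - sig.ready_slot)
                break
        else:
            # No free wire in any remaining slot of the frame.
            raise ValueError(f"no free wire/slot for signal {sig.name!r}")
    return result


if __name__ == "__main__":
    # Three signals ready in slot 0 but only two wires: one must be delayed.
    sigs = [Signal("a", 0), Signal("b", 0), Signal("c", 0)]
    for name, (wire, slot, delay) in schedule(sigs, num_wires=2, num_slots=4).items():
        print(f"{name}: wire {wire}, slot {slot}, delay {delay}")
```

In the example, a and b take wires 0 and 1 in slot 0, and c is delayed by one slot onto wire 0. A greedy first-fit pass keeps the sketch short; it makes no attempt to minimise total delay.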

SLIDE 9

Parameter selection

Assume infinite time slots to reduce wiring

– Determine optimum number of TDM wires

SLIDE 10

Infinite resources

[Plot: total number of wires needed against number of TDM wires]

SLIDE 11

Parameter selection

Assume infinite time slots to reduce wiring

– Determine optimum number of TDM wires

Vary number of time slots

– Determine optimum number of time slots
– Investigate the effect this has on latency

SLIDE 12

Determine number of time slots

[Plot: wires per switch box against number of time slots (= number of configuration bits per mux)]
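The plot equates the number of time slots with the number of configuration bits per mux. One way to picture this is the following sketch. It assumes a 2-input routing mux that stores one select bit per time slot, with a free-running counter on the interconnect clock choosing the active slot. The class `TdmMux2` is hypothetical, not a description of the actual circuit.

```python
class TdmMux2:
    """Hypothetical 2:1 routing mux whose select bit changes every time slot.

    Storing one select bit per slot means a mux supporting `num_slots` time
    slots needs `num_slots` configuration bits rather than one.
    """

    def __init__(self, select_bits):
        self.select_bits = [int(b) for b in select_bits]  # select_bits[s] in {0, 1}

    @property
    def num_slots(self):
        return len(self.select_bits)

    def output(self, inputs, cycle):
        # A free-running slot counter, advanced by the fast interconnect clock,
        # picks which stored select bit is active in this cycle.
        slot = cycle % self.num_slots
        return inputs[self.select_bits[slot]]


if __name__ == "__main__":
    mux = TdmMux2([0, 1, 1, 0])  # 4 time slots -> 4 configuration bits
    print(mux.num_slots, [mux.output(("A", "B"), c) for c in range(6)])
```

Doubling the number of slots doubles the stored bits per mux, which is the configuration SRAM cost later listed among the architectural drawbacks.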

SLIDE 13

Number of time slots vs latency

[Plot: normalised latency of critical path against number of time slots (= number of configuration bits per mux)]
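One plausible contributor to the latency trend is that a value which becomes ready mid-frame must wait for its assigned slot at each hop. The sketch below illustrates that wait only; it is not the latency model behind the plot. It assumes slots are counted in interconnect-clock cycles, that slot = cycle mod number of slots, and a hypothetical per-hop slot assignment.

```python
def wait_cycles(ready_cycle, assigned_slot, num_slots):
    """Cycles a value ready at `ready_cycle` must wait until the next cycle
    whose slot (cycle % num_slots) equals `assigned_slot`."""
    return (assigned_slot - ready_cycle) % num_slots


def path_latency(hop_slots, num_slots, hop_delay=1):
    """Interconnect cycles needed to cross a path whose i-th hop may only be
    used in time slot hop_slots[i]; each hop takes `hop_delay` cycles."""
    t = 0  # current interconnect-clock cycle; the value is ready at cycle 0
    for slot in hop_slots:
        t += wait_cycles(t, slot, num_slots)  # stall until this hop's slot comes round
        t += hop_delay                        # then traverse the hop
    return t


if __name__ == "__main__":
    # Slots chosen so each hop is free just as the value arrives: no stalls.
    print(path_latency([0, 1, 2], num_slots=4))  # 3
    # Every hop uses slot 0: each later hop waits almost a whole frame,
    # and the wait grows with the number of slots.
    print(path_latency([0, 0, 0], num_slots=4))  # 9
    print(path_latency([0, 0, 0], num_slots=8))  # 17
```

A well-aligned slot assignment hides the frame length entirely, while a poor one pays up to one frame per hop. This is why the scheduler, not only the slot count, shapes the latency penalty.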

SLIDE 14

Parameter selection

Assume infinite time slots to reduce wiring

– Determine optimum number of TDM wires

Vary number of time slots

– Determine optimum number of time slots
– Investigate the effect this has on latency

Using optimum number of time slots

– Re-evaluate optimum number of TDM wires

SLIDE 15

Limited resources

[Plot: total number of wires needed against number of TDM wires, using the optimum number of time slots]

SLIDE 16

Architectural drawbacks

Extra configuration SRAM
High-speed interconnect clock
Benchmarks run over three times slower
New CAD tools needed

– Re-routing in space as well as time
– Optimise for TDM wiring at every stage

SLIDE 17

Conclusions

Using TDM wiring we can reduce the number of wires whilst increasing the data rate within channels

– 75% less wiring, 24 time slots and 3 times slower execution give 2 times the channel data rate

This will allow

– the design of effective global interconnect
– more efficient sharing of on-chip resources
– simplification of multi-chip designs
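The headline figure in the first conclusion appears to combine its three numbers multiplicatively. Read this way, the channel data rate scales with the number of wires and the number of time slots per wire, and inversely with the slowdown. The symbols are introduced here for illustration only: $W_{\text{base}}$ and $W_{\text{TDM}}$ are the wire counts without and with TDM, $T$ is the number of time slots, and $S$ is the slowdown factor.

```latex
\frac{R_{\text{TDM}}}{R_{\text{base}}}
  = \frac{W_{\text{TDM}}}{W_{\text{base}}} \cdot \frac{T}{S}
  = 0.25 \cdot \frac{24}{3}
  = 2
```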

SLIDE 18

Future Work

Current scheduling algorithm gives
– Large wire reduction
– Large latency penalty

Is there a better compromise?
– Halve the wiring, small latency penalties

How can we reduce latency in other ways?
– Better scheduling algorithms
– Circuit redesign

SLIDE 19

Thanks for listening...

Rosemary.Francis@cl.cam.ac.uk