
A Network of Time-Division Multiplexed Wiring for FPGAs



  1. A Network of Time-Division Multiplexed Wiring for FPGAs. Rosemary Francis, Simon Moore, Robert Mullins

  2. Motivation
     • FPGAs are now home to complex Systems-on-Chip
     • ...but still optimised for single-core designs
     • FPGA global wiring is simple in comparison with ASIC Networks-on-Chip
     • Networks for FPGAs use lots of logic
     • Hard blocks are limited by the soft IP blocks

  3. Goals
     • Use TDM components for effective soft NoC implementation
     • Funnel data to high-speed hard blocks
       – Hard NoC
       – Multipliers
       – Block RAM
     • Determine optimum TDM architecture
       – What are the costs?
       – Is it possible to design for global and local routing?

  4. Hierarchy of interconnect
     • Coarse-grain packet-switched network
     • Time-division multiplexed wires in a fine-grain network
     • Clusters of logic elements with local interconnect

  5. Architecture: Stratix vs TDM
     [Figure: side-by-side block diagrams]
     • Stratix: global routing, switch box, local routing, LUT, cluster of logic elements
     • TDM: SRAM, TDM global routing, switch box, local routing, LUT, cluster of logic elements with latched inputs

  6. Wire Sharing
     • Many wires can be shared without a problem
     [Figure: numbered signals sharing wires]

  7. Wire Sharing
     • Many wires can be shared without a problem
     • Other configurations require a more intelligent approach
     [Figure: wire-sharing example annotated "Conflict!!"]

  8. Wire Sharing
     • Many wires can be shared without a problem
     • Other configurations require a more intelligent approach
     • Signals can be delayed to allow more efficient wire use without rerouting
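
The idea of delaying a signal to avoid a conflict on a shared wire can be illustrated with a minimal sketch. This is not the authors' scheduler: the `Signal` tuple, the `schedule_tdm` function, the wire names, and the one-wire-per-slot timing model are all assumptions made for illustration, and the sketch ignores rerouting and cyclic slot reuse.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical signal description: (name, earliest start slot, wires crossed in order).
Signal = Tuple[str, int, List[str]]


def schedule_tdm(signals: List[Signal], num_slots: int) -> Dict[str, int]:
    """Greedily assign each signal a start slot so that no wire carries
    two signals in the same time slot.

    Assumption: a signal crosses one wire per time slot, so a signal that
    starts in slot s occupies path[i] during slot s + i.
    """
    occupied: Set[Tuple[str, int]] = set()  # (wire, slot) pairs already in use
    start_slots: Dict[str, int] = {}
    for name, ready, path in signals:
        # Try the earliest start first; later starts model delaying the signal.
        for start in range(ready, num_slots - len(path) + 1):
            hops = [(wire, start + i) for i, wire in enumerate(path)]
            if not any(hop in occupied for hop in hops):
                occupied.update(hops)
                start_slots[name] = start
                break
        else:
            raise ValueError(f"no conflict-free slot for signal {name!r}")
    return start_slots


example = [
    ("a", 0, ["w1", "w2"]),
    ("b", 0, ["w1", "w3"]),  # wants w1 in slot 0, which "a" already holds
    ("c", 1, ["w2"]),        # wants w2 in slot 1, which "a" already holds
]
schedule = schedule_tdm(example, num_slots=4)
print(schedule)  # {'a': 0, 'b': 1, 'c': 2}
```

In this toy example signals "b" and "c" are each delayed by one slot and keep their original routes, which mirrors the "delay without rerouting" point on the slide. With too few slots the function raises `ValueError` instead of producing a conflicting schedule.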

  9. Our Scheduler
     • Our scheduler
       – maps benchmarks from a Stratix FPGA to a TDM FPGA
       – resolves TDM conflicts after place and route
     • Benchmarks
       – IP cores taken from the Altera University Suite
     • Aim
       – To reduce the amount of wiring as far as possible using TDM wiring with realistic characteristics

  10. Parameter selection (1 of 3)
     • Assume infinite time slots to reduce wiring
       – Determine minimum number of TDM wires

  11. Infinite number of time slots
     [Plot: total number of wires needed against number of TDM wires, with Stratix with static wiring as the reference]

  12. Parameter selection (2 of 3)
     • Assume infinite time slots to reduce wiring
       – Determine minimum number of TDM wires
     • Vary number of time slots
       – Determine optimum number of time slots
       – Investigate the effect this has on latency

  13. Determine number of time slots
     [Plot: wires per switch box against number of time slots (= number of configuration bits per mux), with Stratix with static wiring as the reference]

  14. Number of time slots vs latency
     [Plot: normalised latency of critical path against number of time slots (= number of configuration bits per mux)]

  15. Parameter selection (3 of 3)
     • Assume infinite time slots to reduce wiring
       – Determine minimum number of TDM wires
     • Vary number of time slots
       – Determine optimum number of time slots
       – Investigate the effect this has on latency
     • Using optimum number of time slots
       – Re-evaluate optimum number of TDM wires

  16. Limited resources
     [Plot: total number of wires needed against number of TDM wires, with Stratix with static wiring as the reference]
     • All but two benchmarks map to 13 wires
       – A reduction of over 75%

  17. Architectural drawbacks
     • Extra configuration SRAM
     • High-speed interconnect clock
     • Benchmarks run over three times slower
     • New CAD tools needed
       – Re-routing in space as well as time
       – Optimise for TDM wiring at every stage

  18. Conclusions
     • Using TDM wiring we can reduce the number of wires whilst increasing the data rate within channels
       – 75% less wiring (25% of the wires remain) × 24 time slots ÷ 3 (benchmarks run 3 times slower) means roughly 2 times the channel data rate
     • This will allow
       – the design of effective global interconnect
       – more efficient sharing of on-chip resources
       – simplification of multi-chip designs
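
The channel data rate figure in the conclusions can be checked with a short calculation. This is one reading of the slide's arithmetic, not a formula given by the authors: the function name `relative_channel_data_rate` and its parameters are hypothetical, and the model assumes each remaining wire carries one signal per time slot while the design clock runs slower by the stated factor.

```python
def relative_channel_data_rate(wire_fraction: float,
                               time_slots: int,
                               slowdown: float) -> float:
    """Channel data rate relative to static wiring.

    wire_fraction: fraction of the original wires that remain (0.25 for 75% less)
    time_slots:    signals each TDM wire can carry per design clock cycle
    slowdown:      factor by which the benchmarks run slower
    """
    return wire_fraction * time_slots / slowdown


# Figures quoted on the slides: 75% less wiring, 24 time slots, 3 times slower.
rate = relative_channel_data_rate(wire_fraction=0.25, time_slots=24, slowdown=3.0)
print(rate)  # 2.0
```

Because the slides state that benchmarks run "over" three times slower and that the wiring reduction is "over" 75%, the factor of 2 is best read as approximate.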

  19. Future Work
     • Current scheduling algorithm gives
       – Large wire reduction, large latency penalty
     • We are investigating a better compromise
       – Small wiring reduction, small latency penalties?
       – Recent new results show this is possible
     • Area and power
       – Is the wiring reduction enough to justify the extra area and power costs?

  20. Thanks for listening... Rosemary.Francis@cl.cam.ac.uk
