SLIDE 4 Chip Multiprocessors (ACS MPhil) 13
Filtered Segmented Bus
“Towards Scalable, Energy-Efficient, Bus-Based On-Chip Networks”, Udipi et al, HPCA 2010
- Bus is divided into segments, with Bloom filters limiting broadcasts to the segments that need them
- Energy savings possible vs. mesh and flattened butterfly networks (for 16, 32 and 64 cores) because routers can be removed
- For large numbers of cores, multiple (address-interleaved) buses are required to avoid a significant performance penalty due to contention
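
The filtering idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the mechanism from the paper: the names `BloomFilter` and `needs_forwarding`, the filter size and the hash scheme are all hypothetical. A segment keeps a Bloom filter of the line addresses that may be cached in other segments, and a transaction is forwarded beyond the local segment only when the filter reports a possible match. A Bloom filter never gives a false negative, so no sharer is missed; a false positive only costs an unnecessary broadcast.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter over integer cache-line addresses."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, line_addr):
        # Assumption: salted SHA-256 stands in for the cheap hardware
        # hash functions a real design would use.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{line_addr}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, line_addr):
        # Record that this line may be cached outside the local segment.
        for pos in self._positions(line_addr):
            self.bits[pos] = True

    def may_contain(self, line_addr):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(line_addr))


def needs_forwarding(remote_lines, line_addr):
    """True if the transaction must leave the local segment."""
    return remote_lines.may_contain(line_addr)


# Hypothetical example: three lines are known to be cached remotely.
remote = BloomFilter()
for addr in (0x1000, 0x1040, 0x2000):
    remote.add(addr)
```

A transaction to `0x1000` is always forwarded here, while a transaction checked against an empty filter always stays local.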
Bus-based interconnects for multicore?
- Exploiting multiple buses (or rings):
– Multiple address-interleaved buses
- e.g. Sun Wildfire/Starfire
– Use different buses for different message types
– Subspace snooping [Huh/Burger06]
- Associate (dynamic) address ranges with each bus. Each subspace is a region of data shared by a stable subset of the processors.
- This technique tackles snoop bandwidth limitations, because not every processor has to snoop every bus
– Exploit buses at the lowest level of a hierarchical network (e.g. mesh interconnecting tiles, where each tile is a group of cores connected by a bus)
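
Address interleaving across buses can be illustrated with a small sketch. The cache-line size, the number of buses and the function name `bus_for_address` are assumptions chosen for the example; real systems may select the bus from different address bits.

```python
LINE_BYTES = 64   # assumed cache-line size
NUM_BUSES = 4     # assumed number of address-interleaved buses


def bus_for_address(addr, num_buses=NUM_BUSES, line_bytes=LINE_BYTES):
    """Return the index of the bus that carries transactions for addr."""
    line_index = addr // line_bytes   # drop the byte offset within the line
    return line_index % num_buses     # consecutive lines rotate across buses


# Five consecutive cache lines spread across the four buses.
addresses = [0x0000, 0x0040, 0x0080, 0x00C0, 0x0100]
buses = [bus_for_address(a) for a in addresses]
```

Because every address within a line maps to the same bus, all coherence traffic for that line is ordered on a single bus, while independent lines can be handled by different buses in parallel.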
Sun Starfire (UE10000)
Uses 4 interleaved address buses to scale snooping protocol
[Figure: system boards, each with 4 processors (µP) and their caches ($), a memory module and a board interconnect, all connected to a 16x16 data crossbar]
- 4 processors + memory module per system board
- Up to 64-way SMP using bus-based snooping protocol
- Separate data transfer over high-bandwidth crossbar
Slide from Krste Asanovic (Berkeley)
Ring Networks
- Exploit short point-to-point interconnects
- Can support many concurrent data transfers
- Can keep coherence protocol simple and avoid need for directory-based schemes
– We may still broadcast transactions
k-node ring (or k-ary 1-cube)
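
Hop counts on a k-node ring are easy to work out, as the sketch below shows. The function names are hypothetical, and the `bidirectional` option is an assumption: the slide does not say whether links run in one direction or both.

```python
def ring_hops(src, dst, k, bidirectional=True):
    """Number of link traversals from node src to node dst on a k-node ring."""
    forward = (dst - src) % k          # hops when travelling in one direction
    if not bidirectional:
        return forward
    return min(forward, k - forward)   # take the shorter way round


def average_hops(k, bidirectional=True):
    """Mean hop count from one node to each of the other k - 1 nodes."""
    total = sum(ring_hops(0, d, k, bidirectional) for d in range(1, k))
    return total / (k - 1)
```

For an 8-node ring, `ring_hops(1, 7, 8)` takes 2 hops going the short way round, but 6 hops if the ring is unidirectional.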