SLIDE 8 Chip Multiprocessors (ACS MPhil) 29
Organising directory information
Directory schemes
– Centralized
– Distributed
  - How to find the source of directory information:
    – Hierarchical: requests traverse up a tree to find a node with information on the block
    – Flat
      - How to locate copies:
        – Memory-based: information about all sharers is stored at the directory, using a full bit-vector organization, limited-pointer scheme, etc.
        – Cache-based: information is distributed amongst the sharers, e.g. sharers form a linked list (IEEE SCI, Sequent NUMA-Q). Typical operations are: add to head, remove a node (by contacting neighbours) and invalidate all nodes (from head only). We won't discuss these.
Figure 8.7 (reproduced from Culler Parallel book)
Organising directory information
- How do we store the list of sharers in a flat, memory-based directory scheme?
  – Full bit-vector
    - P presence bits, which indicate for each of the P processors whether the processor has a copy of the block
  – Limited-pointer schemes
    - Maintain a fixed (and limited) number of pointers
    - Typically the number of sharers is small (4 pointers may often suffice)
    - Need a backup or overflow strategy
      – Overflow to memory or resort to broadcast
      – Or a coarse vector scheme (e.g. SGI Origin), where each bit represents a group of processors
  – Extract from duplicated L1 tags (reverse-mapped)
    - Query local copy of tags to find sharers
[Culler p.568]
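The two sharer-list encodings above can be sketched as follows. This is a minimal illustration, not taken from the slides or from Culler: the class names, the 16-processor system size and the choice of broadcast as the overflow strategy are all assumptions made for the example.

```python
NUM_PROCS = 16     # assumed system size (hypothetical)
MAX_POINTERS = 4   # assumed pointer budget per directory entry


class FullBitVector:
    """Full bit-vector: one presence bit per processor."""

    def __init__(self, num_procs=NUM_PROCS):
        self.num_procs = num_procs
        self.bits = 0  # bit p is set when processor p holds a copy

    def add(self, proc):
        self.bits |= 1 << proc

    def remove(self, proc):
        self.bits &= ~(1 << proc)

    def sharers(self):
        return [p for p in range(self.num_procs) if (self.bits >> p) & 1]


class LimitedPointer:
    """Limited pointers: exact sharer IDs until the pointers run out.

    Overflow strategy assumed here: give up precise tracking and
    broadcast invalidations to every processor.
    """

    def __init__(self, num_procs=NUM_PROCS, max_pointers=MAX_POINTERS):
        self.num_procs = num_procs
        self.max_pointers = max_pointers
        self.pointers = []
        self.overflow = False

    def add(self, proc):
        if self.overflow or proc in self.pointers:
            return
        if len(self.pointers) < self.max_pointers:
            self.pointers.append(proc)
        else:
            # No free pointer left: switch this entry to broadcast mode.
            self.overflow = True
            self.pointers.clear()

    def invalidation_targets(self):
        if self.overflow:
            return list(range(self.num_procs))
        return sorted(self.pointers)
```

The bit-vector costs P bits per entry however few sharers there are. The limited-pointer entry costs roughly MAX_POINTERS × log2(P) bits, at the price of imprecise (here, broadcast) invalidations once a block has more sharers than pointers.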
Organising directory information
- Four examples of how we might store our directory information in a CMP:
  1) Append state to L2 tags
  2) Duplicate L1 tags at the directory
  3) Store directory state in main memory and include a directory cache at each node
  4) A hierarchical directory
I assume the L2 is the first shared cache. In a real system this could as easily be the L3 or interface to main memory. The directory is placed at the first shared memory regardless of the number of levels of cache.
Organising directory information
– Perhaps conceptually the simplest scheme
– Assume a shared, banked, inclusive L2 cache
  - The directory location for a block depends only on the block address
– Directory state can simply be appended to the L2 cache tags
Figure (labelled “L2 tags”): reproduced from “Victim Replication: Maximizing Capacity while Hiding Wire Delay in Tiled Chip Multiprocessors”, Zhang/Asanovic, ISCA'05
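A rough sketch of this scheme follows. The block size, bank count, low-order-bit interleaving and the field names are all assumptions made for illustration; they are not taken from the slides or from the Zhang/Asanovic paper. The point is that the home bank is a pure function of the block address, and that the L2 tag entry simply gains a sharer field.

```python
BLOCK_BYTES = 64   # assumed cache block size
NUM_BANKS = 16     # assumed number of L2 banks (e.g. one per tile)


def home_bank(addr):
    """Bank holding the L2 copy and directory entry for addr.

    Assumes banks are interleaved on the low-order block-address bits.
    """
    block_addr = addr // BLOCK_BYTES
    return block_addr % NUM_BANKS


class L2TagEntry:
    """An L2 tag with directory state appended (hypothetical layout)."""

    def __init__(self, tag):
        self.tag = tag
        self.valid = True
        self.dirty = False
        self.sharers = 0  # appended directory state: bit-vector of L1 sharers

    def add_sharer(self, core):
        self.sharers |= 1 << core

    def sharer_list(self):
        # Decode the set bits of the bit-vector into core IDs.
        return [c for c in range(self.sharers.bit_length())
                if (self.sharers >> c) & 1]
```

Because the L2 is assumed inclusive, any block held in an L1 also has an L2 tag, so there is always an entry available to carry its sharer bits.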