  1. Proximity-Aware Directory-based Coherence for Multi-core Processor Architectures
     Jeff Brown, Rakesh Kumar, Dean Tullsen
     UC San Diego ● University of Illinois at Urbana-Champaign
     SPAA19 ● June 9, 2007

  2. Introduction
     ● The chip multiprocessor (CMP) era is upon us!
     ● Caching complicates writes
     ● Cache coherence ensures caching is done safely
     ● Multi-core designs offer new tradeoffs

  5. Background: Directory-based Cache Coherence
     ● Directory-based: explicit per-block accounting
       – Doesn't rely on broadcasts
     ● Directory operation: client/server
       – Processors request data, permissions
       – Directory controllers manage memory access
     ● Updates, conflicts
     [Figure: processors (P) send requests to a directory (Dir), which manages memory (M)]
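To make the client/server split concrete, here is a minimal sketch of a directory controller, assuming a simple MSI-style protocol that keeps a full sharer set per block. The names `DirectoryEntry`, `Directory`, `read`, and `write` and the message tuples are hypothetical illustrations, not the protocol used in the talk. What it shows is the per-block accounting: invalidations go only to the nodes recorded as sharers, so no broadcast is needed.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryEntry:
    """Per-block bookkeeping kept by the directory (hypothetical layout)."""
    state: str = "I"                           # "I" uncached, "S" shared, "M" modified
    sharers: set = field(default_factory=set)  # ids of nodes holding a copy


class Directory:
    """Toy directory controller: processors are clients, the directory is the server."""

    def __init__(self):
        self.entries = {}  # block address -> DirectoryEntry

    def _entry(self, block):
        return self.entries.setdefault(block, DirectoryEntry())

    def read(self, block, requester):
        """Handle a read request; return the messages the directory sends."""
        e = self._entry(block)
        msgs = []
        if e.state == "M":
            # The single owner writes back and downgrades to a shared copy.
            (owner,) = e.sharers
            msgs.append(("writeback", owner))
        e.state = "S"
        e.sharers.add(requester)
        msgs.append(("data", requester))
        return msgs

    def write(self, block, requester):
        """Handle a write (ownership) request: invalidate only the recorded sharers."""
        e = self._entry(block)
        msgs = [("invalidate", n) for n in sorted(e.sharers) if n != requester]
        e.state = "M"
        e.sharers = {requester}
        msgs.append(("data", requester))
        return msgs
```

For example, after nodes 1 and 2 read block `0x40`, a write by node 3 produces `[("invalidate", 1), ("invalidate", 2), ("data", 3)]`: the directory's sharer record replaces a broadcast to every node.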

  11. Background: Historical MP Cache Coherence
      ● Distributed directory, memory
      [Figure: four processor (P) / memory (M) nodes; on a cache miss, a data request goes to the block's "home node", which sends back a reply]
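A distributed directory needs a fixed rule that maps every block to its home node. One common choice, assumed here rather than taken from the slides, is to interleave consecutive blocks across nodes. The sketch below uses a hypothetical 64-byte block and the four nodes drawn in the figure; `home_node` and `miss_messages` are illustrative names.

```python
BLOCK_BYTES = 64  # assumed cache-block size (hypothetical)
NUM_NODES = 4     # four processor/memory nodes, as drawn in the figure


def home_node(addr: int, block_bytes: int = BLOCK_BYTES,
              num_nodes: int = NUM_NODES) -> int:
    """Return the node whose memory and directory slice own addr's block.

    Assumes block interleaving: block k lives on node k mod num_nodes.
    """
    return (addr // block_bytes) % num_nodes


def miss_messages(addr: int, requester: int) -> list:
    """Messages for a read miss serviced by the home node's memory."""
    home = home_node(addr)
    if home == requester:
        return []  # home is local: no network messages in this simplified model
    return [("data_request", requester, home), ("reply", home, requester)]
```

Under this mapping, address 200 falls in block 3 and so belongs to node 3; a miss on it from node 0 costs one request and one reply across the interconnect.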

  17. Motivation: Multi-core Cache Coherence
      [Figure: four processors (P) and four memories (M); a cache miss sends a data request to the "home node", which replies; another core is then shown as an additional sharer of the block]
      ● Multi-core designs present radically different relative latency & bandwidth

  26. Outline
      ● Introduction & Background
      ● System Architecture
      ● Proximity-Aware Coherence
      ● Results
      ● Conclusion

  27. Directory-based Cache Coherence
      ● Directory structures
        – Directory memory
        – Directory entries
        – Directory controller
      [Figure: a directory controller alongside directory memory and main memory]
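The size of the directory memory follows directly from the entry format. As a worked example, assume a full-map entry (one presence bit per node plus 2 state bits), 64-byte blocks, and 16 nodes; these parameters are illustrative assumptions, not figures from the talk, and the helper names are hypothetical.

```python
def directory_bits_per_block(num_nodes: int, state_bits: int = 2) -> int:
    """Full-map entry: one presence bit per node plus a few state bits (assumed)."""
    return num_nodes + state_bits


def directory_overhead(num_nodes: int, block_bytes: int = 64,
                       state_bits: int = 2) -> float:
    """Directory memory as a fraction of the data memory it tracks."""
    return directory_bits_per_block(num_nodes, state_bits) / (block_bytes * 8)


def directory_bytes(memory_bytes: int, num_nodes: int, block_bytes: int = 64,
                    state_bits: int = 2) -> int:
    """Total directory memory needed to track memory_bytes of main memory."""
    blocks = memory_bytes // block_bytes
    return blocks * directory_bits_per_block(num_nodes, state_bits) // 8
```

With these assumptions each entry costs 18 bits per 512 bits of data (about 3.5%), or 36 MiB of directory memory per GiB of main memory. Because the presence vector has one bit per node, the overhead grows linearly with the node count.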

  33. A Traditional Multiprocessor
      [Figure: nodes, each with a core, an L2 cache (L2 $), a directory (Dir), and memory (Mem), joined by an interconnect; annotated "(Chassis, board, etc.)"]

  36. Our 16-Core Chip Multiprocessor
      [Figure: tiles 0 through 15; the tile detail is labeled Core, L2 $, Dir, Dir $, Bus control, Mem. channel, and Net. switch]

  43. Proximity-Aware Coherence
      ● Idea: the home node asks the sharer nearest the requester to forward its cached copy
        – Stay on-chip when possible
        – Minimize transit of large data-carrying replies
      [Figure: after a cache miss, the requester sends a data request to the "home node"; the home node sends a forward request to an additional sharer, which sends the reply to the requester]

  51. Proximity-Aware Coherence
      ● To service read misses for shared data, traditional protocols use main memory
      ● Other nodes may hold copies
      ● On the CMP landscape, inter-node latency is much less than memory latency
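The contrast between the two ways of servicing a read miss to a shared line can be sketched as a message trace. This is a simplified model under stated assumptions: on the proximity-aware path, the home node first uses its own cached copy if it has one, and otherwise forwards the request to the on-chip sharer closest to the requester; the traditional path reads main memory. The function name, message format, `MEMORY` marker, and `dist` callback are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

Message = Tuple[str, int, int]  # (kind, source, destination), hypothetical format
MEMORY = -1                     # stand-in id for the off-chip memory controller


def service_read_miss(requester: int, home: int, home_has_copy: bool,
                      sharers: Iterable[int], dist: Callable[[int, int], int],
                      proximity_aware: bool = True) -> List[Message]:
    """Return the messages used to satisfy one read miss to a shared line."""
    msgs: List[Message] = [("request", requester, home)]
    candidates = [s for s in sharers if s != requester]
    if proximity_aware and home_has_copy:
        # The home node's own cached copy satisfies the miss on-chip.
        msgs.append(("data", home, requester))
    elif proximity_aware and candidates:
        # Small control message to the sharer nearest the requester
        # (ties broken by the lower node id)...
        nearest = min(candidates, key=lambda s: (dist(s, requester), s))
        msgs.append(("forward", home, nearest))
        # ...which sends the large data reply directly, staying on-chip.
        msgs.append(("data", nearest, requester))
    else:
        # Traditional path, or no on-chip copy: fetch the block from main memory.
        msgs.append(("memory_read", home, MEMORY))
        msgs.append(("memory_data", MEMORY, home))
        msgs.append(("data", home, requester))
    return msgs
```

With a toy distance such as `abs(a - b)`, a miss by node 0 on a block homed at node 5 and shared by nodes 7 and 2 becomes a request, a forward to node 2, and a data reply from node 2; the traditional trace instead adds an off-chip round trip before the home node replies.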

  52. Sharer Selection
      ● When the home node lacks a cached copy, it selects a sharer to ask
        – rand
        – near1
        – via1
      ● Retries didn't prove beneficial
      [Figure: the node with the miss and the home node]
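The three policies might be sketched as follows. The slide gives only their names, so these definitions are our reading of them, not the paper's: `rand` picks any sharer at random, `near1` picks the single sharer with the fewest hops to the requester, and `via1` picks the single sharer that minimizes the home-to-sharer plus sharer-to-requester path. Reading the "1" suffix as "ask one sharer" is consistent with retries not helping, but it is an interpretation. The 4x4 mesh layout of the 16 tiles, the Manhattan hop count, and the function names are also assumptions.

```python
import random
from typing import Optional, Sequence

MESH_WIDTH = 4  # assumption: the 16 tiles form a 4x4 mesh


def hops(a: int, b: int, width: int = MESH_WIDTH) -> int:
    """Manhattan distance between tiles a and b in a width x width mesh."""
    ax, ay = a % width, a // width
    bx, by = b % width, b // width
    return abs(ax - bx) + abs(ay - by)


def select_sharer(policy: str, home: int, requester: int,
                  sharers: Sequence[int],
                  rng: Optional[random.Random] = None) -> int:
    """Pick one sharer to forward the request to (hypothetical policy definitions)."""
    candidates = [s for s in sharers if s != requester]
    if not candidates:
        raise ValueError("no sharer to ask")
    if policy == "rand":
        return (rng or random.Random()).choice(candidates)
    if policy == "near1":
        # Fewest hops from the sharer to the requester; ties go to the lower id.
        return min(candidates, key=lambda s: (hops(s, requester), s))
    if policy == "via1":
        # Shortest total path home -> sharer -> requester; ties go to the lower id.
        return min(candidates, key=lambda s: (hops(home, s) + hops(s, requester), s))
    raise ValueError(f"unknown policy {policy!r}")
```

With home tile 3, requester tile 0, and sharers at tiles 4 and 2, `near1` chooses tile 4 (one hop from the requester), while `via1` chooses tile 2, which lies on the home-to-requester path (3 total hops versus 5).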

  59. Methodology
      ● Detailed, execution-driven processor and network simulation
      ● "RSIM" simulator, adapted to our CMP model
      ● Parallel workloads from several suites
      ● Hardware, benchmark details in paper

  60. Proximity-Aware: Potential Coverage
      [Figure: fraction of read misses to shared lines (y-axis, 0 to 1) for appbt, fft, lu, mp3d, ocean, quicksort, and unstruct; each bar is broken down by a series labeled 1 to 6]
