
Efficiently Prefetching Complex Address Patterns

Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian (University of Utah); Chris Wilkerson, Zeshan Chishti, Seth Pugsley (Intel Labs)


  1. Efficiently Prefetching Complex Address Patterns
     Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian (University of Utah)
     Chris Wilkerson, Zeshan Chishti, Seth Pugsley (Intel Labs)
     Variable Length Delta Prefetcher

  2. Prefetchers
     Confirmation Based Prefetchers (accurate)
     • Issue predictions after a few deltas
     • High accuracy
     • Short streams lose out
     Immediate Prefetchers (fast)
     • Aggressive
     • Low accuracy
     • Waste DRAM bandwidth and cache capacity

  3. Spatial Correlation
     • Learn access (delta) patterns
     • Apply patterns when similar conditions reoccur
     • E.g.: PC, physical address, delta patterns
     Delta Patterns
     • Regular delta patterns, e.g. (+1, +1, +1)…, (+2, +2, +2, +2)…
     • Irregular delta patterns, e.g. (+1, +2, +3)…

  4. Long Repeatable Streams of Irregular Deltas
     Delta patterns for milc
     Page Num: 479218
     Deltas: 1, 9, -8, 1, 8, 1, -8, 1, 1, 7, …

  5. Long Repeatable Streams of Irregular Deltas
     Deltas: 1, 9, -8, 1, 8, 1, -8, 1, 1, 7, -1, -5, …
     Cache Line: A+1, A+10, A+2, A+3, A+11, A+12, A+4, A+5, A+6, A+13, A+12, A+7, …
     Stream 1: A+1, A+2, A+3, A+4, A+5, A+6, A+7
     Stream 2: A+10, A+11, A+12, A+13
     Stride Prefetcher Coverage: 5/11
     SandBox Prefetcher Coverage: 9/11
     Figure legend: Confirmation, Prefetches
     Neither is perfectly timely!
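The cache-line sequence on this slide is just the running sum of the delta stream. A minimal sketch of that arithmetic follows; the helper name `deltas_to_offsets` is hypothetical and not taken from the deck.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: turn a delta stream into cache-line offsets relative
 * to a base line A. out[i] is the offset (in lines) of the access reached
 * after applying deltas[0..i]. */
void deltas_to_offsets(const int *deltas, size_t n, int *out)
{
    assert(deltas != NULL && out != NULL);
    int offset = 0;                 /* offset of the starting line A */
    for (size_t i = 0; i < n; i++) {
        offset += deltas[i];        /* next line = previous line + delta */
        out[i] = offset;
    }
}
```

Fed the twelve deltas listed on the slide, the helper reproduces the offsets 1, 10, 2, 3, 11, 12, 4, 5, 6, 13, 12, 7, which interleave the two ascending streams.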

  6. Variable Length Delta Prefetcher

  7. Structure of VLDP
     [Block diagram. Labels: Core 1 through Core 8, each with Per Page Delta History Tables, Delta Prediction Tables, and Offset Prediction Tables; inputs labeled "$ Access"; outputs labeled "Predicted Delta/Offset"; shared Last Level $.]

  8. Delta History Table
     • Tracks deltas within a page
     for (i = 0; i < BIGNUM; i++) { a[i] = b[i] + c[i]; }
     Delta = Current Address - Last Address
     • a, b, c can each belong to different pages
     • So deltas between pages are meaningless

  9. Delta History Table
     Fields per entry: Page Num. | Last Add. | Last 4 Deltas | Last Predictor | Num. Times Used | Last Four Prefetched Offsets

  10. Delta Prediction Tables
      • Highest priority (t=3): Deltas (3): 8b, 8b, 8b | Pred.: 8b | Accuracy: 2b
      • …
      • Lowest priority (t=1): Delta (1): 8b | Pred.: 8b | Accuracy: 2b
      • 64 rows per table
      • Each table is checked for a match; a MUX selects the predicted delta

  11. Offset Prediction Table
      Fields per entry: First Page Offset (7b) | Pred. Offset (7b) | Accuracy (2b)
      OPT is used only to predict the second access to a page

  12. Need for Multiple Tables
      Repeating Delta Pattern: (1, 2, 3, 5, 2, 4)…
      Table 2 (Delta -> Pred.)    Table 1 (Delta -> Pred.)
      1,2 -> 3                    1 -> 2
      2,3 -> 5                    2 -> 3   (50% accuracy)
      3,5 -> 2                    3 -> 5
      5,2 -> 4                    5 -> 2
      Search for a delta pattern match starts from the rightmost table
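The value of the longer-history table can be sketched with a lookup that tries the longest history first, as the priority labels on the Delta Prediction Tables slide suggest. This is an illustration under assumptions: rows are searched linearly, `dpt_insert` fills the first free row and ignores replacement, and the names `Dpt`, `DptRow`, `dpt_insert`, and `dpt_lookup` are hypothetical.

```c
#include <assert.h>
#include <string.h>

enum { MAX_HIST = 3, ROWS = 64 };

typedef struct {
    int valid;
    int key[MAX_HIST];   /* delta history, oldest first; first `len` used */
    int pred;            /* predicted next delta */
} DptRow;

typedef struct {
    int len;             /* history length this table matches on */
    DptRow rows[ROWS];
} Dpt;

/* Store key -> pred in the first free row (no replacement in this sketch). */
void dpt_insert(Dpt *table, const int *key, int pred)
{
    assert(table != NULL && key != NULL);
    for (int r = 0; r < ROWS; r++) {
        DptRow *row = &table->rows[r];
        if (!row->valid) {
            memcpy(row->key, key, (size_t)table->len * sizeof(int));
            row->pred = pred;
            row->valid = 1;
            return;
        }
    }
}

/* tables[] is ordered by increasing history length; hist holds the n most
 * recent deltas, oldest first. Returns the matching history length, or 0. */
int dpt_lookup(const Dpt *tables, int ntables, const int *hist, int n, int *pred)
{
    assert(tables != NULL && hist != NULL && pred != NULL);
    for (int t = ntables - 1; t >= 0; t--) {      /* longest history first */
        int len = tables[t].len;
        if (len > n)
            continue;                             /* not enough history yet */
        const int *key = hist + (n - len);        /* the last `len` deltas */
        for (int r = 0; r < ROWS; r++) {
            const DptRow *row = &tables[t].rows[r];
            if (row->valid &&
                memcmp(row->key, key, (size_t)len * sizeof(int)) == 0) {
                *pred = row->pred;
                return len;
            }
        }
    }
    return 0;
}
```

With the slide's entries loaded, the history (5, 2) resolves to 4 through the two-delta table, whereas a one-delta lookup on 2 alone returns 3, the ambiguous entry marked 50% on the slide.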

  13. Looking farther than one Delta ahead
      Repeating Delta Pattern: (1, 2, 3), (1, 2, 3)…
      Delta -> Pred.     Delta -> Pred.
      1,2   -> 3         1     -> 2
      2,3   -> 1         2     -> 3
      3,1   -> 2         3     -> 1
      -,-   -> -         -     -> -
      Figure labels: Current Delta, Degree 1 Prediction

  14. Looking farther than one Delta ahead
      Repeating Delta Pattern: 1, 2, 3, 1, 2, 3…
      Delta -> Pred.     Delta -> Pred.
      1,2   -> 3         1     -> 2
      2,3   -> 1         2     -> 3
      3,1   -> 2         3     -> 1
      -,-   -> -         -     -> -
      Figure labels: Current Delta, Degree 1 Prediction, Degree 2 Prediction
      Use recursive lookup to look farther than one delta ahead

  15. Case Study: Streaming Workloads
      Repeating Delta Pattern: 1, 1, 1, 1, 1…
      Table 2 (Delta -> Pred.)    Table 1 (Delta -> Pred.)
      -,- -> -                    1 -> 1
      -,- -> -                    - -> -
      -,- -> -                    - -> -
      -,- -> -                    - -> -
      Patterns learned from one page are applied to another

  16. Updating the Delta History Tables
      On an LLC access:
      • If the page is not present, replace: evict a Not Recently Used page
      • If the page is present, add the delta
      Fields per entry: Page Num. | Last Add. | Last 4 Deltas | Last Predictor | Num. Used | Last 4 Prefetches
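The two cases on this slide can be sketched as a single access routine. Only the fields needed for the update are modeled. The NRU details are assumptions (one bit per entry, all bits cleared when every entry is marked), as are the names `DhtEntry` and `dht_access`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { DHT_ENTRIES = 16, HIST = 4 };

typedef struct {
    int valid;
    int recently_used;   /* NRU bit (assumed one bit per entry) */
    uint64_t page;       /* page number */
    int last_offset;     /* offset of the last access within the page */
    int deltas[HIST];    /* last four deltas, oldest first */
    int ndeltas;         /* how many deltas are valid */
} DhtEntry;

/* Record an access and return the entry that now tracks `page`. */
DhtEntry *dht_access(DhtEntry *dht, uint64_t page, int offset)
{
    assert(dht != NULL);
    DhtEntry *victim = NULL;
    for (int i = 0; i < DHT_ENTRIES; i++) {
        DhtEntry *e = &dht[i];
        if (e->valid && e->page == page) {        /* page present: add delta */
            int delta = offset - e->last_offset;
            if (e->ndeltas == HIST)               /* drop the oldest delta */
                memmove(e->deltas, e->deltas + 1, (HIST - 1) * sizeof(int));
            else
                e->ndeltas++;
            e->deltas[e->ndeltas - 1] = delta;
            e->last_offset = offset;
            e->recently_used = 1;
            return e;
        }
        if (victim == NULL && (!e->valid || !e->recently_used))
            victim = e;                           /* first replaceable entry */
    }
    if (victim == NULL) {                         /* every entry recently used */
        for (int i = 0; i < DHT_ENTRIES; i++)
            dht[i].recently_used = 0;
        victim = &dht[0];
    }
    memset(victim, 0, sizeof *victim);            /* page not present: replace */
    victim->valid = 1;
    victim->recently_used = 1;
    victim->page = page;
    victim->last_offset = offset;
    return victim;
}
```

Replaying the first accesses of the milc example (offsets 0, 1, 10, 2, 3, 11 on one page) leaves the entry holding the four most recent deltas 9, -8, 1, 8.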

  17. Updating the Prediction Tables
      DHT entry: Page Num. | Last Add. | Last 3 Deltas | Last Predictor
      Example: last 3 deltas are B, C, D; the latest delta is E
      Can the current state predict the latest delta?
      Table 1 (Delta -> Pred.)    Table 2 (Delta -> Pred.)    Table 3 (Delta -> Pred.)
      D -> F?                     C,D -> E?                   B,C,D -> E?
      • If the prediction is correct: increment accuracy
      • If the prediction is wrong: decrement accuracy; if accuracy == 0, update + promote the prediction
      • If the prediction is missing: seed T1 with the prediction

  18. Populating the Prediction Tables
      Table 1 (Delta -> Pred.)    Table 2 (Delta -> Pred.)    Table 3 (Delta -> Pred.)
      1 -> A                      1,1 -> B                    1,1,1 -> C
      - -> -                      -,- -> -                    - -> -
      Figure labels: Table 1 Wrong, Table 2 Wrong, Pattern Missing; NRU replacement in each table
      If a misprediction occurs, a longer delta history might be needed

  19. Evaluation Methodology
      • Simics + USIMM
      • 8 RISC cores, UltraSPARC III ISA
      • 3.2 GHz, 4-wide OoO, 128-entry RoB
      • 32 KB I&D L1 caches, 4 cycles
      • 8 MB shared (1 MB per core) L2 cache, 10 cycles
      • DRAM specifications
        • 2 channels, 2 ranks per channel, 8 banks per rank
        • 800 MHz DDR3 DRAM
      • SPEC 2006, NPB, and CloudSuite
      • Mix1: milc, astar, lbm, libq; Mix2: xalancbmk, lbm, zeusmp, milc

  20. VLDP Configuration
      • Per-core VLDP
      • 1 Offset Prediction Table, 64 entries
      • 3 Delta Prediction Tables, 64 entries each
      • 16-entry Delta History Table
      • Only Delta Prediction Tables 2 and 3 contribute to multi-degree prefetch
      Storage per core:
        Offset Prediction Table    128 B
        Delta History Table        222 B
        Delta Prediction Table     648 B
        Total                      998 B/core
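The storage figures can be checked against the field widths shown on the table slides. The sketch below is a sanity check only, and `table_bytes` is a hypothetical helper. The OPT figure follows directly from 64 entries of 7 + 7 + 2 bits. For the Delta Prediction Tables, the listed widths (18, 26, and 34 bits per row) give 624 B; the 648 B on the slide matches if each row carries one extra bit, which is assumed here to be an NRU bit. The slides do not give enough field widths to break down the 222 B Delta History Table.

```c
#include <assert.h>

/* Bytes needed for `entries` rows of `bits_per_entry` bits, rounded up. */
unsigned table_bytes(unsigned entries, unsigned bits_per_entry)
{
    assert(entries > 0 && bits_per_entry > 0);
    return (entries * bits_per_entry + 7) / 8;
}

/* OPT: first offset (7b) + predicted offset (7b) + accuracy (2b). */
unsigned opt_bytes(void)
{
    return table_bytes(64, 7 + 7 + 2);
}

/* DPTs with 1, 2, and 3 deltas of history: 8b per delta + 8b prediction
 * + 2b accuracy + `extra_bits` per row (1 is a guess at an NRU bit). */
unsigned dpt_bytes(unsigned extra_bits)
{
    unsigned total = 0;
    for (unsigned hist = 1; hist <= 3; hist++)
        total += table_bytes(64, 8 * hist + 8 + 2 + extra_bits);
    return total;
}
```

Under these assumptions `opt_bytes()` gives 128 and `dpt_bytes(1)` gives 648, and together with the 222 B history table they add up to the 998 B per core on the slide.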

  21. Performance Improvement (Vs No PC)
      [Bar chart: speedup of FDP, SBP, AMPM, and VLDP]
      VLDP is 6% better than AMPM, 9% better than SBP, and 17% better than FDP

  22. Performance Improvement (Vs PC)
      [Bar chart: speedup of SMS, GHB_PC_DC, and VLDP]
      VLDP is 7.1% better than GHB and 7.6% better than SMS

  23. Coverage
      [Bar chart: coverage of FDP, SMS, SBP, GHB_PC_DC, AMPM, and VLDP across NPB, CloudSuite, Spec2006, Spec2006-Mix, and GM]
      FDP 16%, GHB 33%, SMS 55%, AMPM 49%, SBP 40%, VLDP 61%

  24. Sensitivity to table size
      [Chart: speedup versus table size]
      2% increase in performance when DPT size is increased

  25. Sensitivity to number of Delta Prediction Tables
      [Chart: speedup and DRAM accesses for 1DPT_NoOPT, 1DPT+OPT, 2DPT+OPT, 3DPT+OPT, and 4DPT+OPT]
      3DPT improves efficiency: despite a modest 1% performance improvement, it reduces DRAM requests by 3%

  26. Conclusions
      • OPT issues predictions without confirmation
      • DPT recognizes irregular delta patterns
      • Long delta patterns provide high accuracy
      • Less than 1 KB per core overhead
      • 6% better performance

  27. Thank You
