static branch frequency and program profile analysis

Static Branch Frequency and Program Profile Analysis James R. Larus - PowerPoint PPT Presentation



  1. Static Branch Frequency and Program Profile Analysis • James R. Larus (larus@cs.wisc.edu), University of Wisconsin • Youfeng Wu (wu@sequent.com), Intel Labs • Divino César Soares Lucas (divcesar@gmail.com), Laboratório de Sistemas de Computação, Instituto de Computação, UNICAMP

  2. Schedule 1. Introduction 2. Related Work 3. Key Idea 4. Branch Prediction 5. Branch Probabilities 6. Combining Predictions 7. Local Block and Edge Frequency 8. From Local to Global Frequencies 9. Results 10. Conclusion 11. References

  3. Introduction • What is a program profile? • Dynamic profile • Static profile • Why do we need a profile? • Instruction scheduling • Identifying program bottlenecks • Enhancing memory locality

  4. Related Work • Dynamic profile • Work centered on reducing profiling overhead [3, 6] • Static profile • Simple estimation heuristics [4] • Estimation based on Markov models [5]

  5. Key Idea [1] • Predict Branches • Use heuristics • Compute Probabilities • Use heuristic hit rates • Compute Frequency • Use probabilities

  6. Branch Prediction • A branch prediction states whether a branch will be taken or not taken. It’s a binary decision! • Some static heuristics [2]: • LBH - Loop Branch Heuristic • PH - Pointer Heuristic • OH - Opcode Heuristic • GH - Guard Heuristic • LEH - Loop Exit Heuristic • LHH - Loop Header Heuristic • CH - Call Heuristic • SH - Store Heuristic • RH - Return Heuristic

  7. Branch Probabilities
     • A branch probability is an estimate of how likely the branch is to be taken. It’s a continuous value in [0, 1].
     • We will use these hit rates (H.R.) as branch probabilities:

     Heuristic                H.R.
     Loop Branch Heuristic    88%
     Pointer Heuristic        60%
     Opcode Heuristic         84%
     Guard Heuristic          62%
     Loop Exit Heuristic      80%
     Loop Header Heuristic    75%
     Call Heuristic           78%
     Store Heuristic          55%
     Return Heuristic         72%
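As a minimal sketch of how a hit rate turns into a branch probability: a heuristic that predicts "taken" contributes its hit rate as the probability of taken, and one that predicts "not taken" contributes the complement. The dictionary keys reuse the abbreviations from slide 6; the function name `taken_probability` is a hypothetical helper, not something from the paper.

```python
# Hit rates from the slide, keyed by the abbreviations used on slide 6.
HIT_RATE = {
    "LBH": 0.88, "PH": 0.60, "OH": 0.84, "GH": 0.62, "LEH": 0.80,
    "LHH": 0.75, "CH": 0.78, "SH": 0.55, "RH": 0.72,
}

def taken_probability(heuristic, predicts_taken):
    """Return P(taken) implied by one heuristic's prediction (hypothetical helper)."""
    rate = HIT_RATE[heuristic]
    # A "not taken" prediction that is right `rate` of the time leaves 1 - rate for "taken".
    return rate if predicts_taken else 1.0 - rate

print(taken_probability("LBH", True))             # 0.88
print(round(taken_probability("RH", False), 2))   # 0.28
```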

  8. Combining Predictions
     • What happens if two or more heuristics are applicable to the same branch?

         if (k < 0) then
             k = y;
         else
             return;
         end-if

     • OH predicts the then part (with an 84% hit rate).
     • RH predicts the else part (with a 72% hit rate).
     • In these situations we use the Dempster-Shafer algorithm…

  9. Combining Predictions
     • Each branch has a set of possible targets. In our case two, taken or not taken:
     $$B = \{t_1, t_2\}$$
     • Each heuristic gives evidence that an event can happen:
     $$h_1(t_1) = a, \quad h_2(t_1) = b, \quad h_1(t_2) = 1 - a, \quad h_2(t_2) = 1 - b$$
     • The Dempster-Shafer algorithm combines this evidence:
     $$(h_1 \oplus h_2)(t_1) = \frac{h_1(t_1)\,h_2(t_1)}{h_1(t_1)\,h_2(t_1) + h_1(t_2)\,h_2(t_2)}$$
     $$(h_1 \oplus h_2)(t_2) = \frac{h_1(t_2)\,h_2(t_2)}{h_1(t_1)\,h_2(t_1) + h_1(t_2)\,h_2(t_2)}$$
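As a worked illustration, the combination rule can be applied to the if/else branch from slide 8, taking the slide's assignment at face value (OH favours the then part with 0.84, RH favours the else part with 0.72). The notation $h(t)$ for the evidence a heuristic assigns to target $t$ is an assumption made for this sketch.

```latex
\begin{aligned}
h_{OH}(\text{then}) &= 0.84, & h_{OH}(\text{else}) &= 0.16,\\
h_{RH}(\text{then}) &= 0.28, & h_{RH}(\text{else}) &= 0.72,\\
(h_{OH} \oplus h_{RH})(\text{then}) &= \frac{0.84 \times 0.28}{0.84 \times 0.28 + 0.16 \times 0.72}
  = \frac{0.2352}{0.3504} \approx 0.671,\\
(h_{OH} \oplus h_{RH})(\text{else}) &= \frac{0.16 \times 0.72}{0.3504} \approx 0.329.
\end{aligned}
```

The stronger heuristic still wins, but the disagreement pulls the estimate for the then part down from 0.84 to about 0.67.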

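  10. Combining Predictions Example:
     $$h_1(t_1) = 0.5, \quad h_2(t_1) = 0.7, \quad h_3(t_1) = 0.6$$
     $$h_1(t_2) = 0.5, \quad h_2(t_2) = 0.3, \quad h_3(t_2) = 0.4$$
     $$(h_1 \oplus h_2)(t_1) = \frac{0.5 \times 0.7}{0.5 \times 0.7 + 0.5 \times 0.3} = 0.7$$
     $$(h_1 \oplus h_2)(t_2) = \frac{0.5 \times 0.3}{0.5 \times 0.7 + 0.5 \times 0.3} = 0.3$$
     $$(h_2 \oplus h_3)(t_1) = \frac{0.7 \times 0.6}{0.7 \times 0.6 + 0.3 \times 0.4} = 0.778$$
     $$(h_2 \oplus h_3)(t_2) = \frac{0.3 \times 0.4}{0.7 \times 0.6 + 0.3 \times 0.4} = 0.222$$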
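The example values can be checked with a few lines of Python. This is a minimal sketch for the two-target case only, where each heuristic is summarised by its probability for the first target; the names `combine` and `combine_all` are hypothetical.

```python
from functools import reduce

def combine(p, q):
    """Dempster-Shafer combination of two estimates of P(first target) for a two-way branch."""
    both_first = p * q                       # both heuristics back the first target
    both_second = (1.0 - p) * (1.0 - q)      # both heuristics back the second target
    total = both_first + both_second
    if total == 0.0:
        raise ValueError("the two estimates contradict each other completely")
    return both_first / total

def combine_all(estimates):
    """Fold any number of estimates pairwise (the rule is commutative and associative)."""
    return reduce(combine, estimates)

h1, h2, h3 = 0.5, 0.7, 0.6
print(round(combine(h1, h2), 3))             # 0.7
print(round(combine(h2, h3), 3))             # 0.778
print(round(combine_all([h1, h2, h3]), 3))   # 0.778
```

An estimate of 0.5 carries no information, which is why combining all three heuristics gives the same result as combining only the second and third.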

  11. Local Block and Edge Frequency
     • The block/edge frequency is an estimate of how often a block is executed or an edge is taken.
     • We calculate local block/edge frequencies by propagating branch probabilities, that is:
     $$\mathit{bfreq}(b_i) = \begin{cases} 1 & \text{if } b_i \text{ is the entry block} \\ \sum_{b_p \in \mathit{pred}(b_i)} \mathit{freq}(b_p \to b_i) & \text{otherwise} \end{cases}$$
     $$\mathit{freq}(b_i \to b_j) = \mathit{bfreq}(b_i) \cdot \mathit{prob}(b_i \to b_j)$$
     • But these formulas do not work when the control-flow graph contains a cycle!
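For a control-flow graph without cycles, the two formulas can be evaluated by visiting blocks in topological order. The sketch below assumes the graph is given as a dictionary from edges to branch probabilities; the function name `propagate` and the block names are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

def propagate(entry, prob):
    """Compute block and edge frequencies for an ACYCLIC control-flow graph.

    prob maps (src, dst) edges to branch probabilities.
    """
    preds, blocks = {}, {entry}
    for src, dst in prob:
        preds.setdefault(dst, set()).add(src)
        blocks.update((src, dst))
    # TopologicalSorter takes node -> predecessors and raises CycleError on a loop.
    order = TopologicalSorter({b: preds.get(b, set()) for b in blocks}).static_order()
    bfreq, freq = {}, {}
    for b in order:
        # bfreq is 1 for the entry block, otherwise the sum of incoming edge frequencies.
        bfreq[b] = 1.0 if b == entry else sum(freq[(p, b)] for p in preds.get(b, ()))
        for (src, dst), p in prob.items():
            if src == b:
                freq[(src, dst)] = bfreq[b] * p
    return bfreq, freq

# A diamond-shaped if/else with an 84% / 16% split at the entry block.
cfg = {("entry", "then"): 0.84, ("entry", "else"): 0.16,
       ("then", "exit"): 1.0, ("else", "exit"): 1.0}
bfreq, freq = propagate("entry", cfg)
print(bfreq["then"], round(bfreq["exit"], 2))
```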

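  12. Local Block and Edge Frequency
     $$\begin{aligned}
     \mathit{bfreq}(b_0) &= \mathit{in\_freq}(b_0) + \sum_{i=1}^{k} \mathit{freq}(b_i \to b_0) \\
     &= \mathit{in\_freq}(b_0) + \sum_{i=1}^{k} \bigl(\mathit{bfreq}(b_i)\,\mathit{prob}(b_i \to b_0)\bigr) \\
     &= \mathit{in\_freq}(b_0) + \sum_{i=1}^{k} \bigl(\mathit{bfreq}(b_0)\,r_i\,\mathit{prob}(b_i \to b_0)\bigr) \\
     &= \mathit{in\_freq}(b_0) + \mathit{bfreq}(b_0) \sum_{i=1}^{k} r_i\,\mathit{prob}(b_i \to b_0)
     \end{aligned}$$
     The third step substitutes $\mathit{bfreq}(b_i) = r_i\,\mathit{bfreq}(b_0)$. Let
     $$\mathit{cp}(b_0) = \sum_{i=1}^{k} r_i\,\mathit{prob}(b_i \to b_0)$$
     $$\mathit{bfreq}(b_0) = \mathit{in\_freq}(b_0) + \mathit{bfreq}(b_0)\,\mathit{cp}(b_0)$$
     $$\mathit{bfreq}(b_0) = \frac{\mathit{in\_freq}(b_0)}{1 - \mathit{cp}(b_0)}$$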

  13. Local Block and Edge Frequency Example:
     $$\mathit{bfreq}(b_0) = \frac{1}{1 - 0.88 - 0.88 \times 0.12 - 0.88 \times 0.12 \times 0.12} = 578.70$$
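The arithmetic of this example can be reproduced with a short sketch of the closed form for a loop header. The slide's figure is not available here, so the three back-edge terms below are only a reading of the numbers in the formula: each pair holds an assumed reach factor r_i and an assumed back-edge probability of 0.88. The function name is hypothetical.

```python
def loop_header_frequency(in_freq, back_edges):
    """Closed-form loop header frequency: in_freq / (1 - cp).

    back_edges is a list of (r_i, prob_i) pairs, where r_i scales the header
    frequency to the frequency of block b_i and prob_i is the probability of
    the back edge from b_i to the header.
    """
    cp = sum(r * p for r, p in back_edges)   # cyclic probability
    if cp >= 1.0:
        raise ValueError("cyclic probability must be below 1 for the loop to terminate")
    return in_freq / (1.0 - cp)

# Assumed reading of the slide's terms: 0.88, 0.12 * 0.88, 0.12 * 0.12 * 0.88.
back_edges = [(1.0, 0.88), (0.12, 0.88), (0.12 * 0.12, 0.88)]
print(round(loop_header_frequency(1.0, back_edges), 2))   # 578.7
```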

  14. From Local to Global Frequencies
     • Considering one invocation of f, the frequency with which a function f calls another function g is:
     $$\mathit{lfreq}(f, g) = \sum_{b_i \in f} \mathit{bfreq}(b_i) \cdot \mathit{calls}(b_i, g)$$
     • The global frequency of f calling g is:
     $$\mathit{gfreq}(f, g) = \mathit{cfreq}(f) \cdot \mathit{lfreq}(f, g)$$
     • Where:
     $$\mathit{cfreq}(f) = \begin{cases} 1 & \text{if } f \text{ is the main function} \\ \sum_{p \in \mathit{pred}(f)} \mathit{gfreq}(p, f) & \text{otherwise} \end{cases}$$
     • Global block/edge frequency can be calculated by multiplying the function execution frequency by the local block/edge frequency.
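A minimal sketch of this propagation over a call graph is shown below. It assumes the call graph has no recursion, so that functions can be visited callers first; the slide does not say how cycles are handled. The function and program names are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

def global_call_frequencies(main, lfreq):
    """Compute cfreq and gfreq for an ACYCLIC call graph.

    lfreq maps (caller, callee) to the number of calls per invocation of caller.
    """
    callers, functions = {}, {main}
    for caller, callee in lfreq:
        callers.setdefault(callee, set()).add(caller)
        functions.update((caller, callee))
    # Visit each function only after all of its callers.
    order = TopologicalSorter({f: callers.get(f, set()) for f in functions}).static_order()
    cfreq, gfreq = {}, {}
    for f in order:
        # cfreq is 1 for main, otherwise the sum of global call frequencies into f.
        cfreq[f] = 1.0 if f == main else sum(gfreq[(p, f)] for p in callers.get(f, ()))
        for (caller, callee), local in lfreq.items():
            if caller == f:
                gfreq[(caller, callee)] = cfreq[f] * local
    return cfreq, gfreq

# Hypothetical program: main calls parse twice and emit once; parse calls emit three times.
lfreq = {("main", "parse"): 2.0, ("main", "emit"): 1.0, ("parse", "emit"): 3.0}
cfreq, gfreq = global_call_frequencies("main", lfreq)
print(cfreq["parse"], cfreq["emit"])   # 2.0 7.0
```

Multiplying `cfreq[f]` by a local block or edge frequency inside `f` then gives the corresponding global frequency.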

  15. Results • Scores of SPEC92 local block frequency:

  16. Results • Scores of SPEC92 local edge frequency:

  17. Results • Scores of SPEC92 local edge frequency:

  18. Results • Results came from SPECint92 C benchmarks and some Unix applications. • The system used was a Sequent S2000/750 with i486 processors and the Sequent DYNIX/ptx C compiler 2.1. • Accuracy is measured with Wall's [5] weighted and unweighted match scores.

  19. Results • Scores of SPEC92 global function call frequency:

  20. Results • Scores of SPEC92 global block frequency:

  21. Results • Scores of SPEC92 global edge frequency:

  22. Results • Scores for Unix commands:

  23. Conclusion • A new technique for static profiling was presented. • The technique introduced a new way to combine multiple pieces of evidence for a branch outcome. • Although the heuristic hit rates were measured in a different environment, they still produced good results.

  24. References
     [1] Y. Wu and J. R. Larus. Static Branch Frequency and Program Profile Analysis. In Proceedings of the 27th Annual International Symposium on Microarchitecture, pages 1-11, 1994.
     [2] T. Ball and J. R. Larus. Branch prediction for free. In SIGPLAN Conference on Programming Language Design and Implementation, pages 300-313, 1993.
     [3] T. Ball and J. R. Larus. Optimally profiling and tracing programs. ACM Transactions on Programming Languages and Systems, 16(4):1319-1360, July 1994.
     [4] T. A. Wagner, V. Maverick, S. L. Graham, and M. A. Harrison. Accurate static estimators for program optimization. In Proceedings of the ACM SIGPLAN '94 Conference on Programming Language Design and Implementation, pages 85-96. ACM Press, 1994.

  25. References
     [5] D. W. Wall. Predicting Program Behavior Using Real or Estimated Profiles. In Proceedings of the ACM SIGPLAN '91 Conference on Programming Language Design and Implementation, pages 59-70, 1991.
     [6] V. Sarkar. Determining average program execution times and their variance. In SIGPLAN Conference on Programming Language Design and Implementation, pages 298-312, 1989.
