SLIDE 1
Static Branch Frequency and Program Profile Analysis
Divino César Soares Lucas (divcesar@gmail.com), Laboratório de Sistemas de Computação, Instituto de Computação, UNICAMP
Youfeng Wu (wu@sequent.com), Intel Labs
James R. Larus (larus@cs.wisc.edu), University of Wisconsin
SLIDE 2 Outline
- 1. Introduction
- 2. Related Work
- 3. Key Idea
- 4. Branch Prediction
- 5. Branch Probabilities
- 6. Combining Predictions
- 7. Local Block and Edge Frequency
- 8. From Local to Global Frequencies
- 9. Results
- 10. Conclusion
- 11. References
SLIDE 3 Introduction
- What is a program profile?
- Dynamic profile
- Static profile
- Why do we need profiles?
- Instruction scheduling
- Identifying program bottlenecks
- Enhancing memory locality
SLIDE 4 Related Work
- Dynamic profile
- Work centered on reducing profiling overhead [3, 6]
- Static profile
- Simple estimation heuristics [5]
- Estimation based on Markov models [4]
SLIDE 5 Key Idea [1]
- Predict Branches
- Use heuristics
- Compute Probabilities
- Use heuristic hit rates
- Compute Frequencies
- Use probabilities
SLIDE 6 Branch Prediction
- A branch prediction states whether a branch will be taken or not taken. It's a binary decision!
- Some static heuristics [2]:
- LBH - Loop Branch Heuristic
- PH - Pointer Heuristic
- OH - Opcode Heuristic
- GH - Guard Heuristic
- LEH - Loop Exit Heuristic
- LHH - Loop Header Heuristic
- CH - Call Heuristic
- SH - Store Heuristic
- RH - Return Heuristic
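To make the heuristics concrete, here is a minimal Python sketch of four of them (OH, RH, CH, SH), following our reading of their definitions in [2]. The `Branch` and `Successor` classes are hypothetical stand-ins for a compiler's IR. Each heuristic either names the successor it predicts ("then" or "else") or returns `None` when it does not apply; LBH, PH, GH, LEH and LHH need pointer, loop and post-dominator analyses and are omitted here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Successor:
    """Hypothetical summary of one successor block of a two-way branch."""
    has_return: bool = False
    has_call: bool = False
    has_store: bool = False
    postdominates: bool = False  # does this block post-dominate the branch?


@dataclass
class Branch:
    """Hypothetical branch 'if (x OP rhs) then ... else ...' on an integer x."""
    op: str                 # "<", "<=", "==", ">", ">=" or "!="
    rhs: Optional[int]      # constant operand, or None if not a constant
    then_succ: Successor = field(default_factory=Successor)
    else_succ: Successor = field(default_factory=Successor)


# "then", "else", or None when the heuristic does not apply.
Prediction = Optional[str]


def opcode_heuristic(b: Branch) -> Prediction:
    # Tests "x < 0", "x <= 0" and "x == const" are predicted to fail;
    # their negations ("x >= 0", "x > 0", "x != const") to succeed.
    if b.rhs == 0 and b.op in ("<", "<="):
        return "else"
    if b.rhs == 0 and b.op in (">=", ">"):
        return "then"
    if b.rhs is not None and b.op == "==":
        return "else"
    if b.rhs is not None and b.op == "!=":
        return "then"
    return None


def _avoid(b: Branch, bad: Callable[[Successor], bool]) -> Prediction:
    # Predict the successor WITHOUT the property, if exactly one has it.
    t, e = bad(b.then_succ), bad(b.else_succ)
    if t and not e:
        return "else"
    if e and not t:
        return "then"
    return None


def return_heuristic(b: Branch) -> Prediction:
    return _avoid(b, lambda s: s.has_return)


def call_heuristic(b: Branch) -> Prediction:
    return _avoid(b, lambda s: s.has_call and not s.postdominates)


def store_heuristic(b: Branch) -> Prediction:
    return _avoid(b, lambda s: s.has_store and not s.postdominates)


HEURISTICS: Dict[str, Callable[[Branch], Prediction]] = {
    "OH": opcode_heuristic, "RH": return_heuristic,
    "CH": call_heuristic, "SH": store_heuristic,
}


def predictions(b: Branch) -> Dict[str, str]:
    """Every applicable heuristic and the successor it predicts."""
    result = {}
    for name, heuristic in HEURISTICS.items():
        p = heuristic(b)
        if p is not None:
            result[name] = p
    return result


print(predictions(Branch("==", 5, then_succ=Successor(has_call=True))))
# {'OH': 'else', 'CH': 'else'}
```

When several entries come back, as in the usage line, the heuristics agree; when they disagree, their predictions must be combined.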
SLIDE 7 Branch Probabilities
- A branch probability estimates how likely a branch is to be taken. It is a continuous value in [0, 1].
Heuristic                  Hit rate
Loop Branch Heuristic      88%
Pointer Heuristic          60%
Opcode Heuristic           84%
Guard Heuristic            62%
Loop Exit Heuristic        80%
Loop Header Heuristic      75%
Call Heuristic             78%
Store Heuristic            55%
Return Heuristic           72%
- We use these hit rates as branch probabilities: a heuristic with hit rate h assigns probability h to the successor it predicts and 1 − h to the other one.
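A small Python sketch of this step, assuming the hit rates from the table are keyed by the abbreviations of slide 6 (`HIT_RATES` and `evidence` are hypothetical names, not part of the original work). It turns one heuristic's binary prediction into a (taken, not taken) probability pair.

```python
from typing import Dict, Tuple

# Hit rates from the table, used directly as branch probabilities.
HIT_RATES: Dict[str, float] = {
    "LBH": 0.88, "PH": 0.60, "OH": 0.84, "GH": 0.62, "LEH": 0.80,
    "LHH": 0.75, "CH": 0.78, "SH": 0.55, "RH": 0.72,
}


def evidence(heuristic: str, predicts_taken: bool) -> Tuple[float, float]:
    """(P(taken), P(not taken)) implied by a single heuristic's prediction."""
    hit = HIT_RATES[heuristic]
    return (hit, 1.0 - hit) if predicts_taken else (1.0 - hit, hit)


taken, not_taken = evidence("OH", predicts_taken=False)
print(round(taken, 2), round(not_taken, 2))  # 0.16 0.84
```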
SLIDE 8 Combining Predictions
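- What happens if two or more heuristics apply to the same branch?
if (k < 0) then k = y; else return; end-if
- OH predicts the else part, since a test for < 0 is predicted to fail!
(With an 84% hit rate.)
- RH predicts the then part, since a successor containing a return is predicted not to execute!
(With a 72% hit rate.)
- In these situations we combine the predictions with the Dempster-Shafer theory of evidence.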
SLIDE 9 Combining Predictions
- Each branch has a set of possible targets. In our case there are two, taken or not taken: $B = \{t_1, t_2\}$
- Each heuristic provides evidence that an outcome will occur:
$$h_1(t_1) = a, \quad h_1(t_2) = 1 - a, \qquad h_2(t_1) = b, \quad h_2(t_2) = 1 - b$$
- The Dempster-Shafer rule combines these pieces of evidence:
$$(h_1 \oplus h_2)(t_1) = \frac{h_1(t_1)\,h_2(t_1)}{h_1(t_1)\,h_2(t_1) + h_1(t_2)\,h_2(t_2)}$$
$$(h_1 \oplus h_2)(t_2) = \frac{h_1(t_2)\,h_2(t_2)}{h_1(t_1)\,h_2(t_1) + h_1(t_2)\,h_2(t_2)}$$
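SLIDE 10 Combining Predictions
Example:
$$h_1(t_1) = 0.5, \quad h_1(t_2) = 0.5, \qquad h_2(t_1) = 0.7, \quad h_2(t_2) = 0.3, \qquad h_3(t_1) = 0.6, \quad h_3(t_2) = 0.4$$
$$(h_1 \oplus h_2)(t_1) = \frac{0.5 \times 0.7}{0.5 \times 0.7 + 0.5 \times 0.3} = 0.7 \qquad (h_1 \oplus h_2)(t_2) = \frac{0.5 \times 0.3}{0.5 \times 0.7 + 0.5 \times 0.3} = 0.3$$
$$(h_2 \oplus h_3)(t_1) = \frac{0.7 \times 0.6}{0.7 \times 0.6 + 0.3 \times 0.4} \approx 0.778 \qquad (h_2 \oplus h_3)(t_2) = \frac{0.3 \times 0.4}{0.7 \times 0.6 + 0.3 \times 0.4} \approx 0.222$$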
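The two-outcome combination rule can be sketched in a few lines of Python. This is a minimal illustration of Dempster's rule restricted to the frame {taken, not taken}; the names `combine` and `combine_all` are our own. The usage lines reproduce the worked example above.

```python
from functools import reduce
from typing import Iterable, Tuple

# (mass on taken, mass on not taken); each pair sums to 1.
Evidence = Tuple[float, float]


def combine(m1: Evidence, m2: Evidence) -> Evidence:
    """Dempster's rule for the two-outcome frame {taken, not taken}."""
    agree_taken = m1[0] * m2[0]
    agree_not_taken = m1[1] * m2[1]
    # Products where the sources disagree are conflict; dividing by the
    # agreeing mass renormalizes the result so it sums to 1 again.
    norm = agree_taken + agree_not_taken
    if norm == 0.0:
        raise ValueError("the two pieces of evidence are in total conflict")
    return (agree_taken / norm, agree_not_taken / norm)


def combine_all(evidences: Iterable[Evidence]) -> Evidence:
    # Dempster's rule is commutative and associative, so folding the
    # evidence in any order gives the same result (needs >= 1 item).
    return reduce(combine, evidences)


h1, h2, h3 = (0.5, 0.5), (0.7, 0.3), (0.6, 0.4)
print(tuple(round(x, 3) for x in combine(h1, h2)))  # (0.7, 0.3)
print(tuple(round(x, 3) for x in combine(h2, h3)))  # (0.778, 0.222)
```

Note that h1 = (0.5, 0.5) carries no information, which is why combining it with h2 leaves h2 unchanged.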
SLIDE 11 Local Block and Edge Frequency
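- Block/edge frequency estimates how often a block is executed or an edge is traversed.
- We compute local block/edge frequencies by propagating branch probabilities through the CFG:
$$\mathit{bfreq}(b_i) = 1 \quad \text{if } b_i \text{ is the entry block}$$
$$\mathit{bfreq}(b_i) = \sum_{b_p \in \mathit{pred}(b_i)} \mathit{freq}(b_p \to b_i) \quad \text{otherwise}$$
$$\mathit{freq}(b_i \to b_j) = \mathit{bfreq}(b_i)\,\mathit{prob}(b_i \to b_j)$$
- But these formulas do not work when the CFG has a cycle: a loop head's frequency then depends on the frequencies of its own back edges.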
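For an acyclic CFG the formulas can be evaluated in a single pass in topological order. A minimal Python sketch, assuming the CFG is given as a hypothetical dict from edges `(src, dst)` to branch probabilities; Kahn's algorithm guarantees that every predecessor of a block is finished before the block itself is propagated.

```python
from collections import defaultdict, deque
from typing import Dict, Tuple

Edge = Tuple[str, str]


def local_frequencies(entry: str, prob: Dict[Edge, float]):
    """Local block and edge frequencies of an acyclic CFG.

    `prob` maps each CFG edge (src, dst) to its branch probability.
    Returns (bfreq, efreq), both per invocation of the function.
    """
    succs = defaultdict(list)
    indeg = defaultdict(int)
    for (src, dst) in prob:
        succs[src].append(dst)
        indeg[dst] += 1
    if indeg[entry]:
        raise ValueError("the entry block must have no predecessors")

    bfreq = defaultdict(float)
    bfreq[entry] = 1.0                      # bfreq(entry) = 1
    efreq: Dict[Edge, float] = {}
    ready = deque([entry])                  # Kahn's topological order
    while ready:
        b = ready.popleft()                 # all of b's predecessors are done
        for s in succs[b]:
            efreq[(b, s)] = bfreq[b] * prob[(b, s)]   # freq(b -> s)
            bfreq[s] += efreq[(b, s)]                  # sum over pred(s)
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)

    # Blocks still waiting on a predecessor sit on a cycle (or are fed by an
    # unreachable block); this simple pass cannot handle them.
    if any(n > 0 for n in indeg.values()):
        raise ValueError("CFG has a cycle or an unreachable predecessor")
    return dict(bfreq), efreq


bfreq, efreq = local_frequencies(
    "A", {("A", "B"): 0.7, ("A", "C"): 0.3, ("B", "D"): 1.0, ("C", "D"): 1.0})
```

For this if-then-else, the join block D gets 0.7 + 0.3 = 1, as expected. The `ValueError` on cycles is exactly the limitation stated on the slide.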
SLIDE 12 Local Block and Edge Frequency
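$$\mathit{bfreq}(b_0) = \mathit{in\_freq}(b_0) + \sum_{i=1}^{k} \mathit{freq}(b_i \to b_0)$$
$$= \mathit{in\_freq}(b_0) + \sum_{i=1}^{k} \mathit{bfreq}(b_i)\,\mathit{prob}(b_i \to b_0)$$
$$= \mathit{in\_freq}(b_0) + \sum_{i=1}^{k} \mathit{bfreq}(b_0)\,r_i\,\mathit{prob}(b_i \to b_0)$$
$$= \mathit{in\_freq}(b_0) + \mathit{bfreq}(b_0) \sum_{i=1}^{k} r_i\,\mathit{prob}(b_i \to b_0)$$
where $b_0$ is the loop head, $b_1, \dots, b_k$ are the sources of its back edges, $\mathit{in\_freq}(b_0)$ is the frequency entering $b_0$ from outside the loop, and $r_i$ is the frequency of $b_i$ per execution of $b_0$, so that $\mathit{bfreq}(b_i) = \mathit{bfreq}(b_0)\,r_i$.
Let
$$\mathit{cp}(b_0) = \sum_{i=1}^{k} r_i\,\mathit{prob}(b_i \to b_0)$$
be the cyclic probability of $b_0$. Then
$$\mathit{bfreq}(b_0) = \mathit{in\_freq}(b_0) + \mathit{bfreq}(b_0)\,\mathit{cp}(b_0) \quad\Longrightarrow\quad \mathit{bfreq}(b_0) = \frac{\mathit{in\_freq}(b_0)}{1 - \mathit{cp}(b_0)}$$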
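The closed form can be sketched in Python for a single loop, assuming (as a simplification of our own) that the loop body without its back edges is acyclic and that its edges are given separately from the back edges. `relative_frequencies` computes the r_i by propagating with the head's frequency fixed at 1; `loop_head_frequency` then applies in_freq / (1 − cp). Both names are hypothetical, and rejecting cp >= 1 is our own guard, not something stated on the slide.

```python
from collections import defaultdict, deque
from typing import Dict, Tuple

Edge = Tuple[str, str]


def relative_frequencies(head: str, body: Dict[Edge, float]) -> Dict[str, float]:
    """r_i: frequency of each loop block per execution of the loop head.

    `body` holds the probabilities of the loop's forward edges only
    (back edges removed), so it is acyclic and propagates in one pass.
    """
    succs, indeg = defaultdict(list), defaultdict(int)
    for (src, dst) in body:
        succs[src].append(dst)
        indeg[dst] += 1
    r = defaultdict(float, {head: 1.0})     # the head executes once
    ready = deque([head])
    while ready:
        b = ready.popleft()
        for s in succs[b]:
            r[s] += r[b] * body[(b, s)]
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    return dict(r)


def loop_head_frequency(in_freq: float, head: str,
                        body: Dict[Edge, float],
                        back_edges: Dict[Edge, float]) -> float:
    """bfreq(head) = in_freq / (1 - cp), with cp = sum of r_i * prob(b_i -> head)."""
    r = relative_frequencies(head, body)
    cp = sum(r.get(src, 0.0) * p
             for (src, dst), p in back_edges.items() if dst == head)
    if cp >= 1.0:
        raise ValueError("cyclic probability must be < 1")
    return in_freq / (1.0 - cp)


# Hypothetical loop: head h -> tail t, back edge t -> h taken with 0.88.
print(round(loop_head_frequency(1.0, "h", {("h", "t"): 1.0}, {("t", "h"): 0.88}), 2))
# 8.33
```

With a single back edge of probability 0.88, the head is expected to run 1 / 0.12 ≈ 8.33 times per entry into the loop.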
SLIDE 13 Local Block and Edge Frequency
Example:
$$\mathit{bfreq}(b_0) = \frac{1}{1 - 0.88 - 0.88 \times 0.12 - 0.88 \times 0.12 \times 0.12} = \frac{1}{0.001728} \approx 578.70$$
SLIDE 14 From Local to Global Frequencies
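- The local call frequency of f calling another function g, i.e., the number of calls to g during one invocation of f, is:
$$\mathit{lfreq}(f, g) = \sum_{b_i \in f} \mathit{bfreq}(b_i)\,\mathit{calls}(b_i, g)$$
where $\mathit{calls}(b_i, g)$ is the number of calls to g in block $b_i$.
- The global frequency of f calling g is:
$$\mathit{gfreq}(f, g) = \mathit{cfreq}(f)\,\mathit{lfreq}(f, g)$$
$$\mathit{cfreq}(f) = \begin{cases} 1 & \text{if } f \text{ is main} \\ \sum_{p \in \mathit{pred}(f)} \mathit{gfreq}(p, f) & \text{otherwise} \end{cases}$$
- Global block/edge frequencies are obtained by multiplying a function's invocation frequency cfreq(f) by its local block/edge frequencies.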
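A minimal Python sketch of these formulas for an acyclic call graph, assuming the local call frequencies lfreq(f, g) are already known and given as a hypothetical dict keyed by (caller, callee). As in the CFG case, callers are processed in topological order so cfreq(f) is complete before it flows to f's callees. Recursive call graphs are not covered by this sketch; it rejects them rather than guessing.

```python
from collections import defaultdict, deque
from typing import Dict, Tuple


def call_frequencies(lfreq: Dict[Tuple[str, str], float], main: str = "main"):
    """cfreq(f) and gfreq(f, g) for an acyclic call graph."""
    callees = defaultdict(list)
    indeg = defaultdict(int)
    for (f, g) in lfreq:
        callees[f].append(g)
        indeg[g] += 1
    if indeg[main]:
        raise ValueError("main must not be called")

    cfreq = defaultdict(float, {main: 1.0})   # cfreq(main) = 1
    gfreq: Dict[Tuple[str, str], float] = {}
    ready = deque([main])
    while ready:
        f = ready.popleft()                   # all callers of f are done
        for g in callees[f]:
            gfreq[(f, g)] = cfreq[f] * lfreq[(f, g)]
            cfreq[g] += gfreq[(f, g)]         # sum over pred(g)
            indeg[g] -= 1
            if indeg[g] == 0:
                ready.append(g)

    if any(n > 0 for n in indeg.values()):
        raise ValueError("recursive or unreachable calls are not handled here")
    return dict(cfreq), gfreq


def global_block_frequency(cfreq: Dict[str, float], f: str,
                           local_bfreq: Dict[str, float]) -> Dict[str, float]:
    """Global block frequency = cfreq(f) x local block frequency."""
    return {b: cfreq[f] * freq for b, freq in local_bfreq.items()}


cf, gf = call_frequencies({("main", "f"): 2.0, ("main", "g"): 1.0, ("f", "g"): 3.0})
print(cf)  # {'main': 1.0, 'f': 2.0, 'g': 7.0}
```

Here g is called once directly from main and 2 × 3 = 6 times through f, so cfreq(g) = 7; each local block frequency of g is then scaled by 7.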
SLIDE 15 Results
- Scores of SPEC92 local block frequency:
SLIDE 16 Results
- Scores of SPEC92 local edge frequency:
SLIDE 17 Results
- Scores of SPEC92 local edge frequency:
SLIDE 18 Results
- Results were obtained on the SPECint92 C benchmarks and several Unix applications.
- The system used was a Sequent S2000/750 with i486 processors and the Sequent DYNIX/ptx C compiler 2.1.
- Accuracy is measured with Wall's [5] weighted and unweighted match scores.
SLIDE 19 Results
- Scores of SPEC92 global function call frequency:
SLIDE 20 Results
- Scores of SPEC92 global block frequency:
SLIDE 21 Results
- Scores of SPEC92 global edge frequency:
SLIDE 22 Results
- Scores for Unix commands:
SLIDE 23 Conclusion
- We presented a new technique for static profile estimation.
- The technique introduces a new way to combine multiple pieces of evidence about a branch outcome.
- Although the heuristic hit rates were measured in a different environment, they still produced good results.
SLIDE 24 References
[1] Y. Wu and J. R. Larus. Static Branch Frequency and Program Profile Analysis. In Proceedings of the 27th Annual International Symposium on Microarchitecture, pages 1-11, 1994.
[2] T. Ball and J. R. Larus. Branch prediction for free. In SIGPLAN Conference on Programming Language Design and Implementation, pages 300-313, 1993.
[3] T. Ball and J. R. Larus. Optimally profiling and tracing programs. ACM Transactions on Programming Languages and Systems, 16(4):1319-1360, July 1994.
[4] T. A. Wagner, V. Maverick, S. L. Graham, and M. A. Harrison. Accurate static estimators for program optimization. In Proceedings of the ACM SIGPLAN'94 Conference on Programming Language Design and Implementation, pages 85-96. ACM Press, 1994.
SLIDE 25 References
[5] D. W. Wall. Predicting Program Behavior Using Real or Estimated Profiles. In Proceedings of the ACM SIGPLAN'91 Conference on Programming Language Design and Implementation, pages 59-70, 1991.
[6] V. Sarkar. Determining average program execution times and their variance. In SIGPLAN Conference on Programming Language Design and Implementation, pages 298-312, 1989.