Static Branch Frequency and Program Profile Analysis



slide-1
SLIDE 1

Static Branch Frequency and Program Profile Analysis

Divino César Soares Lucas, divcesar@gmail.com, Laboratório de Sistemas de Computação, Instituto de Computação, UNICAMP
Youfeng Wu, wu@sequent.com, Intel Labs
James R. Larus, larus@cs.wisc.edu, University of Wisconsin

slide-2
SLIDE 2

Schedule

  • 1. Introduction
  • 2. Related Work
  • 3. Key Idea
  • 4. Branch Prediction
  • 5. Branch Probabilities
  • 6. Combining Predictions
  • 7. Local Block and Edge Frequency
  • 8. From Local to Global Frequencies
  • 9. Results
  • 10. Conclusion
  • 11. References
slide-3
SLIDE 3

Introduction

  • What is a program profile?
      • Dynamic profile
      • Static profile
  • Why do we need a profile?
      • Instruction scheduling
      • Identifying program bottlenecks
      • Enhancing memory locality
slide-4
SLIDE 4

Related Work

  • Dynamic profile
      • Work centered on reducing profiling overhead [3, 6]
  • Static profile
      • Simple estimation heuristics [4]
      • Estimation based on Markov models [5]
slide-5
SLIDE 5

Key Idea [1]

  • Predict branches
      • Use heuristics
  • Compute probabilities
      • Use heuristic hit rates
  • Compute frequencies
      • Use probabilities
slide-6
SLIDE 6

Branch Prediction

  • A branch prediction states whether a branch will be taken or not taken. It is a binary decision!
  • Some static heuristics [2]:
      • LBH - Loop Branch Heuristic
      • PH - Pointer Heuristic
      • OH - Opcode Heuristic
      • GH - Guard Heuristic
      • LEH - Loop Exit Heuristic
      • LHH - Loop Header Heuristic
      • CH - Call Heuristic
      • SH - Store Heuristic
      • RH - Return Heuristic
slide-7
SLIDE 7

Branch Probabilities

  • A branch probability is an estimate of how likely the branch is to be taken. It is a continuous value in [0, 1].

Heuristic                 Hit rate
Loop Branch Heuristic     88%
Pointer Heuristic         60%
Opcode Heuristic          84%
Guard Heuristic           62%
Loop Exit Heuristic       80%
Loop Header Heuristic     75%
Call Heuristic            78%
Store Heuristic           55%
Return Heuristic          72%

  • We will use these hit rates as branch probabilities.
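
A minimal Python sketch of how a hit rate can be turned into a pair of branch probabilities. The table values come from this slide; the names `HIT_RATE` and `branch_probabilities` are hypothetical, chosen only for illustration. If a heuristic with hit rate p predicts "taken", the branch gets probability p of being taken and 1 - p of not being taken, and the reverse if it predicts "not taken".

```python
# Hit rates from the table on this slide, keyed by the abbreviations
# introduced on the "Branch Prediction" slide.
HIT_RATE = {
    "LBH": 0.88, "PH": 0.60, "OH": 0.84, "GH": 0.62, "LEH": 0.80,
    "LHH": 0.75, "CH": 0.78, "SH": 0.55, "RH": 0.72,
}


def branch_probabilities(heuristic, predicts_taken):
    """Return (P(taken), P(not taken)) implied by one heuristic's prediction."""
    hit = HIT_RATE[heuristic]
    if predicts_taken:
        return (hit, 1.0 - hit)
    return (1.0 - hit, hit)


# A loop branch predicted as taken gets probability 0.88 of being taken.
print(branch_probabilities("LBH", True))
```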

slide-8
SLIDE 8

Combining Predictions

  • What happens if two or more heuristics apply to the same branch?

    if (k < 0) then
        k = y;
    else
        return;
    end-if

  • OH predicts the then part (with an 84% hit rate).
  • RH predicts the else part (with a 72% hit rate).
  • In these situations we use the Dempster-Shafer algorithm.

slide-9
SLIDE 9

Combining Predictions

  • Each branch has a set of possible targets. In our case there are two, taken or not taken: $B = \{t_1, t_2\}$

  • Each heuristic provides evidence for each outcome:

$h_1(t_1) = a, \quad h_1(t_2) = 1 - a, \quad h_2(t_1) = b, \quad h_2(t_2) = 1 - b$

  • The Dempster-Shafer algorithm combines this evidence:

$(h_1 \oplus h_2)(t_1) = \dfrac{h_1(t_1)\,h_2(t_1)}{h_1(t_1)\,h_2(t_1) + h_1(t_2)\,h_2(t_2)}$

$(h_1 \oplus h_2)(t_2) = \dfrac{h_1(t_2)\,h_2(t_2)}{h_1(t_1)\,h_2(t_1) + h_1(t_2)\,h_2(t_2)}$
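
The two-outcome combination rule can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: the function name `combine` and the tuple layout (first target, second target) are assumptions. It is applied to the earlier `if (k < 0)` example, taking the slide's statement that OH favours the then part with 84% and RH favours the else part with 72%.

```python
def combine(h1, h2):
    """Combine two (P(t1), P(t2)) pairs with the two-outcome Dempster-Shafer rule."""
    agree_t1 = h1[0] * h2[0]   # both pieces of evidence support t1
    agree_t2 = h1[1] * h2[1]   # both pieces of evidence support t2
    total = agree_t1 + agree_t2
    if total == 0.0:
        raise ValueError("the two pieces of evidence are completely contradictory")
    return (agree_t1 / total, agree_t2 / total)


# t1 = "then" part, t2 = "else" part.
oh = (0.84, 0.16)   # OH predicts "then" with its 84% hit rate
rh = (0.28, 0.72)   # RH predicts "else" with its 72% hit rate
then_prob, else_prob = combine(oh, rh)
print(round(then_prob, 4), round(else_prob, 4))   # 0.6712 0.3288
```

The stronger heuristic (OH) still wins, but the conflicting evidence from RH pulls the probability of the then part down from 0.84 to about 0.67.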

slide-10
SLIDE 10

Combining Predictions

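Example:

$h_1(t_1) = 0.5, \quad h_1(t_2) = 0.5$
$h_2(t_1) = 0.7, \quad h_2(t_2) = 0.3$
$h_3(t_1) = 0.6, \quad h_3(t_2) = 0.4$

$(h_1 \oplus h_2)(t_1) = \dfrac{0.5 \times 0.7}{0.5 \times 0.7 + 0.5 \times 0.3} = 0.7$

$(h_1 \oplus h_2)(t_2) = \dfrac{0.5 \times 0.3}{0.5 \times 0.7 + 0.5 \times 0.3} = 0.3$

$(h_2 \oplus h_3)(t_1) = \dfrac{0.7 \times 0.6}{0.7 \times 0.6 + 0.3 \times 0.4} = 0.778$

$(h_2 \oplus h_3)(t_2) = \dfrac{0.3 \times 0.4}{0.7 \times 0.6 + 0.3 \times 0.4} = 0.222$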
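
The arithmetic in this example can be checked with a short Python sketch. The helper `combine` is a hypothetical name, and folding it over a list with `functools.reduce` is an assumption about how more than two heuristics might be handled; the slide itself only shows pairwise combinations.

```python
from functools import reduce


def combine(h1, h2):
    """Two-outcome Dempster-Shafer combination of (P(t1), P(t2)) pairs."""
    t1 = h1[0] * h2[0]
    t2 = h1[1] * h2[1]
    return (t1 / (t1 + t2), t2 / (t1 + t2))


h1, h2, h3 = (0.5, 0.5), (0.7, 0.3), (0.6, 0.4)

pair_12 = combine(h1, h2)                  # approximately (0.7, 0.3)
pair_23 = combine(h2, h3)                  # approximately (0.778, 0.222)
all_three = reduce(combine, [h1, h2, h3])  # fold over every applicable heuristic

print([round(p, 3) for p in pair_12])
print([round(p, 3) for p in pair_23])
print([round(p, 3) for p in all_three])
```

Because h1 is neutral evidence (0.5, 0.5), combining it with h2 leaves h2 unchanged, so folding all three gives the same result as combining h2 and h3 alone.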

slide-11
SLIDE 11

Local Block and Edge Frequency

  • The block/edge frequency is an estimate of how often a block is executed or an edge is taken.

  • We calculate local block/edge frequencies by propagating branch probabilities, that is:

$bfreq(b_i) = \begin{cases} 1 & \text{if } b_i \text{ is the entry block} \\ \sum_{b_p \in pred(b_i)} freq(b_p \to b_i) & \text{otherwise} \end{cases}$

$freq(b_i \to b_j) = bfreq(b_i) \cdot prob(b_i \to b_j)$

  • But these formulas do not work when the flow graph contains a cycle!
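
For a flow graph without cycles, the two formulas can be applied directly by visiting blocks in topological order. The Python sketch below assumes the caller supplies that order and a dictionary of edge probabilities; the function name, the data layout and the block names are hypothetical, and the 0.84/0.16 split simply reuses the Opcode Heuristic hit rate as an example.

```python
from collections import defaultdict


def propagate_acyclic(entry, blocks_in_topological_order, prob):
    """Compute block and edge frequencies for an acyclic flow graph.

    prob maps (src, dst) -> branch probability of the edge src -> dst.
    """
    successors = defaultdict(list)
    for (src, dst), p in prob.items():
        successors[src].append((dst, p))

    bfreq = {b: 0.0 for b in blocks_in_topological_order}
    bfreq[entry] = 1.0   # the entry block executes once per invocation
    freq = {}
    for b in blocks_in_topological_order:
        # All predecessors of b were visited already, so bfreq[b] is final.
        for dst, p in successors[b]:
            freq[(b, dst)] = bfreq[b] * p
            bfreq[dst] += freq[(b, dst)]
    return bfreq, freq


# A diamond-shaped graph: entry branches to "then" or "else", both reach "exit".
prob = {
    ("entry", "then"): 0.84, ("entry", "else"): 0.16,
    ("then", "exit"): 1.0, ("else", "exit"): 1.0,
}
bfreq, freq = propagate_acyclic("entry", ["entry", "then", "else", "exit"], prob)
print(bfreq)
```

With a back edge there is no valid topological order, which is why loops need the separate treatment derived on the next slide.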
slide-12
SLIDE 12

Local Block and Edge Frequency

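For a loop header $b_0$ with back edges from blocks $b_1, \dots, b_k$, where $r_i$ is the frequency of $b_i$ relative to $b_0$, that is $bfreq(b_i) = r_i \, bfreq(b_0)$:

$$
\begin{aligned}
bfreq(b_0) &= in\_freq(b_0) + \sum_{i=1}^{k} freq(b_i \to b_0) \\
           &= in\_freq(b_0) + \sum_{i=1}^{k} \bigl( bfreq(b_i) \, prob(b_i \to b_0) \bigr) \\
           &= in\_freq(b_0) + \sum_{i=1}^{k} \bigl( bfreq(b_0) \, r_i \, prob(b_i \to b_0) \bigr) \\
           &= in\_freq(b_0) + bfreq(b_0) \sum_{i=1}^{k} r_i \, prob(b_i \to b_0)
\end{aligned}
$$

Let

$cp(b_0) = \sum_{i=1}^{k} r_i \, prob(b_i \to b_0)$

Then

$bfreq(b_0) = in\_freq(b_0) + bfreq(b_0) \, cp(b_0)$

$bfreq(b_0) = \dfrac{in\_freq(b_0)}{1 - cp(b_0)}$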

slide-13
SLIDE 13

Local Block and Edge Frequency

Example:

$bfreq(b_0) = \dfrac{1}{1 - 0.88 - 0.88 \times 0.12 - 0.88 \times 0.12 \times 0.12} = 578.70$
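
The loop-header formula and this example can be reproduced with a small Python sketch. The function name and the `epsilon` guard are assumptions added for illustration: the guard only keeps the denominator away from zero when the cyclic probability approaches 1. The figure for this slide is not available, so splitting each term into a relative frequency and a back-edge probability of 0.88 is one possible reading of the three terms in the denominator, not a statement about the original flow graph.

```python
def loop_header_frequency(in_freq, back_edges, epsilon=1e-6):
    """Frequency of a loop header b0.

    back_edges is a list of (r_i, prob_i) pairs, one per back edge b_i -> b0,
    where r_i is the frequency of b_i relative to b0 and prob_i is the
    probability of the edge b_i -> b0.
    """
    cyclic_probability = sum(r * p for r, p in back_edges)
    # Assumed guard: cap the cyclic probability so the division stays finite.
    cyclic_probability = min(cyclic_probability, 1.0 - epsilon)
    return in_freq / (1.0 - cyclic_probability)


# Terms from the example: 0.88, 0.88 x 0.12 and 0.88 x 0.12 x 0.12.
back_edges = [(1.0, 0.88), (0.12, 0.88), (0.12 * 0.12, 0.88)]
header_freq = loop_header_frequency(1.0, back_edges)
print(round(header_freq, 2))   # 578.7
```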

slide-14
SLIDE 14

From Local to Global Frequencies

  • Considering one invocation of f, the frequency with which a function f calls another function g can be expressed as:

$lfreq(f, g) = \sum_{b_i \in f} bfreq(b_i) \cdot calls(b_i, g)$

  • The global frequency of f calling g is:

$gfreq(f, g) = cfreq(f) \cdot lfreq(f, g)$

  • Where:

$cfreq(f) = \begin{cases} 1 & \text{if } f \text{ is the main function} \\ \sum_{p \in pred(f)} gfreq(p, f) & \text{otherwise} \end{cases}$

  • Global block/edge frequencies can be calculated by multiplying the function execution frequency by the local block/edge frequency.
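
The step from local to global frequencies can be sketched in Python as below. The sketch assumes a call graph without recursion, so that functions can be visited in topological order; recursive calls would need the same kind of cyclic treatment as loops and are not handled here. The function names `main`, `parse` and `emit` and all numeric values are made up for illustration.

```python
def global_call_frequencies(main, functions_in_topological_order, lfreq):
    """Compute cfreq and gfreq for an acyclic call graph.

    lfreq maps (caller, callee) -> calls per single invocation of caller.
    """
    cfreq = {f: 0.0 for f in functions_in_topological_order}
    cfreq[main] = 1.0   # the main function is invoked once
    gfreq = {}
    for f in functions_in_topological_order:
        # Every caller of f was visited already, so cfreq[f] is final.
        for (caller, callee), local in lfreq.items():
            if caller == f:
                gfreq[(caller, callee)] = cfreq[f] * local
                cfreq[callee] += gfreq[(caller, callee)]
    return cfreq, gfreq


# Hypothetical program: main calls parse once and emit 10 times;
# each invocation of parse calls emit 2.5 times on average.
lfreq = {("main", "parse"): 1.0, ("main", "emit"): 10.0, ("parse", "emit"): 2.5}
cfreq, gfreq = global_call_frequencies("main", ["main", "parse", "emit"], lfreq)

# Global block frequency = function frequency x local block frequency.
local_block_freq_in_emit = 0.5   # hypothetical local bfreq of one block of emit
global_block_freq = cfreq["emit"] * local_block_freq_in_emit
print(cfreq["emit"], global_block_freq)   # 12.5 6.25
```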

slide-15
SLIDE 15

Results

  • Scores of SPEC92 local block frequency:
slide-16
SLIDE 16

Results

  • Scores of SPEC92 local edge frequency:
slide-17
SLIDE 17

Results

  • Scores of SPEC92 local edge frequency:
slide-18
SLIDE 18

Results

  • Results came from the SPECint92 C benchmarks and some Unix applications.
  • The system used was a Sequent S2000/750 with i486 processors and the Sequent DYNIX/ptx C compiler 2.1.
  • Scores use Wall's [5] weighted and unweighted match score.
slide-19
SLIDE 19

Results

  • Scores of SPEC92 global function call frequency:
slide-20
SLIDE 20

Results

  • Scores of SPEC92 global block frequency:
slide-21
SLIDE 21

Results

  • Scores of SPEC92 global edge frequency:
slide-22
SLIDE 22

Results

  • Scores for Unix commands:
slide-23
SLIDE 23

Conclusion

  • A new technique for static profiling was presented.
  • The technique introduced a new way to combine multiple pieces of evidence for a branch outcome.
  • Although the heuristic hit rates were measured in a different environment, they still gave good results.

slide-24
SLIDE 24

References

[1] Y. Wu and J. R. Larus. Static Branch Frequency and Program Profile Analysis. In Proceedings of the 27th Annual International Symposium on Microarchitecture, pages 1-11, 1994.
[2] T. Ball and J. R. Larus. Branch prediction for free. In SIGPLAN Conference on Programming Language Design and Implementation, pages 300-313, 1993.
[3] T. Ball and J. R. Larus. Optimally profiling and tracing programs. ACM Transactions on Programming Languages and Systems, 16(4):1319-1360, July 1994.
[4] T. A. Wagner, V. Maverick, S. L. Graham, and M. A. Harrison. Accurate static estimators for program optimization. In Proceedings of the ACM SIGPLAN'94 Conference on Programming Language Design and Implementation, pages 85-96. ACM Press, 1994.

slide-25
SLIDE 25

References

[5] D. W. Wall. Predicting Program Behavior Using Real or Estimated Profiles. In Proceedings of the ACM SIGPLAN'91 Conference on Programming Language Design and Implementation, pages 59-70, 1991.
[6] V. Sarkar. Determining average program execution times and their variance. In SIGPLAN Conference on Programming Language Design and Implementation, pages 298-312, 1989.