Branch Prediction vs Execution Time Prediction




  1. Branch Prediction vs Execution Time Prediction
     Jakob Engblom, PhD
     Uppsala University & Virtutech Inc.
     jakob.engblom@it.uu.se, jakob@virtutech.com

  2. The Question
     - Branch prediction
       - Performance-enhancing technique
       - Necessary with deep pipelines
       - Works well, on average
     - Execution time prediction
       - Determining the extremes
       - Especially the worst case
     - Does "BP" make "ETP" harder?
     RTAS 2003, Branch Prediction & WCET

  3. Execution Time Estimates
     [Figure: timeline of possible execution times starting at 0, bounded by the actual BCET and the actual WCET. Safe BCET estimates lie below the actual BCET and safe WCET estimates lie above the actual WCET; tighter estimates lie closer to the actual values.]
     - WCET = Worst case (main interest here)
     - BCET = Best case
     - ACET = Average case

  4. Branch Prediction

  5. The Performance Problem
     Example code:
           cmp r7,5
           bne B
         A:
           add r4,r5
           ...
         B:
           bset r5,1
     Pipeline stages: IF, ID, EX, MEM, WB
     - Conditional branch: execution will continue at A or B
     - You need to redirect instruction fetch here (IF)
     - Result of branch is not known until here (EX)
     - Wait to see where branch goes = stall

  6. Static Techniques
     - Keep fetching ahead
       - Always assume not taken (NEC V850, ARM7)
       - Or introduce "branch delay slot" (original SPARC & MIPS)
     - BTFN (ARM10, plus base case in more advanced predictors)
       - Backwards-taken
       - Forwards-not taken
       - Recognize branches in IF or ID
       - Make speculative decision early
       - About 70% correct
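The BTFN rule on this slide can be sketched as a single comparison. This is a minimal illustration, not any particular processor's logic: the function name and the use of 32-bit addresses are assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Static BTFN rule (illustrative sketch, hypothetical names):
   a branch whose target is at or below its own address is a backward
   branch (typically a loop) and is predicted taken; a forward branch
   is predicted not taken. No run-time history is used. */
static bool btfn_predict_taken(uint32_t branch_addr, uint32_t target_addr)
{
    return target_addr <= branch_addr;
}
```

A loop-closing branch at 0x1010 that jumps back to 0x1000 is predicted taken, while a forward skip from 0x1000 to 0x1010 is predicted not taken.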

  7. Dynamic Techniques
     - "History will repeat itself"
     - Use history of taken/not taken
     - One-level:
       - One counter per branch
       - Actually a state machine
       - Usually with hysteresis
     - Implementation:
       - Cache of counters
       - Indexed by branch address
     Examples: Pentium 1, Alpha 21064, UltraSparc II
     [Figure: four-state machine with states 00, 01, 10, 11 and transitions on taken (T) and not taken (NT); one group of states is labeled "Predict: not taken", the other "Predict: taken".]
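One common way to realize "one counter per branch, with hysteresis" is a table of 2-bit saturating counters. The sketch below assumes that encoding, a hypothetical table size of 1024 entries, and 4-byte instructions for the index calculation; none of these details come from the slide.

```c
#include <stdbool.h>
#include <stdint.h>

#define TABLE_SIZE 1024u  /* hypothetical size; must be a power of two */

/* Counter values 0..3 stand for states 00, 01, 10, 11.
   Assumed encoding: 0 and 1 predict not taken, 2 and 3 predict taken. */
static uint8_t counters[TABLE_SIZE];  /* zero-initialized: strongly not taken */

/* Index the counter cache by branch address (assumes 4-byte instructions). */
static unsigned index_of(uint32_t branch_addr)
{
    return (branch_addr >> 2) & (TABLE_SIZE - 1u);
}

static bool predict_taken(uint32_t branch_addr)
{
    return counters[index_of(branch_addr)] >= 2;
}

/* Move one step towards the observed outcome, saturating at 0 and 3.
   A single deviating outcome in a strong state does not flip the
   prediction, which is the hysteresis mentioned on the slide. */
static void update(uint32_t branch_addr, bool taken)
{
    uint8_t *c = &counters[index_of(branch_addr)];
    if (taken) {
        if (*c < 3) (*c)++;
    } else {
        if (*c > 0) (*c)--;
    }
}
```

Because distinct branches can map to the same index, two branches may share and disturb one counter; a real design would choose the table size and index bits with that in mind.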

  8. Two-Level Dynamic
     - "History has a pattern"
     - Use pattern of taken/not taken
       - "taken every other time", for example
     - History register tracks outcomes
       - History per branch or global
     - Table of counters
       - Combination of history and address
       - 2D table, XOR, ... lots of possibilities
     Examples: UltraSparc III, Athlon, Pentium 3, Pentium 4, PowerPC G3, G4
     [Figure: a history register (01001...) and the branch address (11100...) are combined to select one entry in a table of two-bit counters (00, 01, 10, 11).]
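Of the "lots of possibilities" listed, the sketch below picks one: a global history register XORed with the branch address to index a table of 2-bit counters. The 4-bit history length, the table size, and all names are assumptions chosen to keep the example small; they do not describe any of the processors named on the slide.

```c
#include <stdbool.h>
#include <stdint.h>

#define HISTORY_BITS 4u                    /* hypothetical history length */
#define TABLE_SIZE   (1u << HISTORY_BITS)  /* one counter per index value */

static uint32_t history;               /* global history, newest outcome in bit 0 */
static uint8_t  counters[TABLE_SIZE];  /* 2-bit saturating counters, start at 0 */

/* Combine history and address with XOR (assumes 4-byte instructions). */
static unsigned index_of(uint32_t branch_addr)
{
    return ((branch_addr >> 2) ^ history) & (TABLE_SIZE - 1u);
}

static bool predict_taken(uint32_t branch_addr)
{
    return counters[index_of(branch_addr)] >= 2;
}

/* Train the counter selected by the current history, then shift the
   outcome into the history register. */
static void update(uint32_t branch_addr, bool taken)
{
    uint8_t *c = &counters[index_of(branch_addr)];
    if (taken) {
        if (*c < 3) (*c)++;
    } else {
        if (*c > 0) (*c)--;
    }
    history = ((history << 1) | (taken ? 1u : 0u)) & (TABLE_SIZE - 1u);
}
```

With the "taken every other time" pattern, the two recurring history values select two different counters, one of which only ever sees taken outcomes and the other only not-taken outcomes. After a short warm-up every prediction is correct, which a single per-branch counter cannot achieve for this pattern.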

  9. The Experimental Setup

  10. Experimental Setup
      C source:
          for(k=1; k<32; k++)
          {
            starttimer();
            for(n=0; n < 10000000; n++)    /* time this part */
            {
              for(i=0; i < k; i++)
              {
                __nop();
              }
            }
            stoptimer();
            recordtime();
          }
      Corresponding assembly for the timed part:
          outer:
            ri=rk
          inner:
            nop
            dec ri
            cmp ri,0
            bnz inner
            dec rn
            cmp rn,0
            bnz outer
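The timed loop nest can be approximated in portable C as follows. This is only a sketch under several assumptions: `clock()` from `<time.h>` stands in for the slide's `starttimer()`/`stoptimer()`, a volatile store stands in for `__nop()`, and the function name and the `outer_count` parameter are hypothetical. The machine code a compiler emits for it will not necessarily match the hand-written assembly on the slide.

```c
#include <time.h>

/* Run an inner loop of k iterations outer_count times and return the
   elapsed CPU time in seconds (illustrative harness, hypothetical names). */
static double time_loop_nest(int k, long outer_count)
{
    volatile int sink = 0;  /* volatile store keeps the loops from being optimized away */
    clock_t start = clock();
    for (long n = 0; n < outer_count; n++) {
        for (int i = 0; i < k; i++) {
            sink = i;       /* stand-in for __nop() */
        }
    }
    clock_t stop = clock();
    (void)sink;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Calling `time_loop_nest(k, 10000000L)` for `k` from 1 to 31 and recording each result reproduces the shape of the experiment, one measurement per inner loop count.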

  11. Baseline Result
      - Static prediction: total time
      [Figure: "V850E Time", total time plotted for inner loop counts 1 to 31; y-axis from 0 to 20.]
      - Smooth straight line
      - Monotone increase
      - Perfectly easy to predict

  12. Baseline Result
      - Static prediction: time / inner count
      [Figure: "V850E Time/Count", time per inner iteration plotted for inner loop counts 1 to 31; y-axis from 1.00 to 1.70.]
      - Smooth monotone decrease
      - Cost of outer loop is spread across more & more iterations of inner loop
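The two baseline observations fit a simple linear cost model. The symbols below are assumptions introduced for illustration and do not appear on the slides: N is the outer loop count, k the inner loop count, c_i the cost of one inner iteration, and c_o the overhead of one outer iteration.

```latex
% Assumed symbols (not from the slides):
%   N   = outer loop count      k   = inner loop count
%   c_i = cost of one inner iteration
%   c_o = overhead of one outer iteration
\begin{align*}
T(k) &= N\,(k\,c_i + c_o) \\
\frac{T(k)}{N\,k} &= c_i + \frac{c_o}{k}
\end{align*}
```

Under this model the total time is a straight line in k, and the time per inner iteration decreases monotonically towards c_i as the fixed outer cost c_o is divided over more inner iterations.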

  13. The Results

  14. One-Level Dynamic
      [Figure: "UltraSparc II Time", total time plotted for inner loop counts 1 to 31; y-axis from 1 to 21.]
      - Monotone increase, but not exactly smooth
      - Takes some time for predictor to tune in
      - Analyzing or measuring max # iterations is safe

  15. Two-Level Dynamic, Local
      [Figure: "Pentium III Time", total time plotted for inner loop counts 1 to 31; y-axis from 1 to 17.]
      - Inversion: doing more iterations takes less time
      - Increases the search space for the worst case considerably

  16. Inversions Explained
      - Cost of the mispredict is greater than the cost of executing an extra inner loop
      - n iterations take >T cycles; n+1 iterations take T cycles
      [Figure: two side-by-side instruction traces of the loop nest, one with n and one with n+1 inner iterations. Each inner iteration is
            nop
            dec ri
            cmp ri,0
            bnz inner
       and each outer iteration ends with
            dec rn
            cmp rn,0
            bnz outer]
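The inversion condition on this slide reduces to a comparison of two cycle counts. The sketch below is a toy cost model, not a description of the Pentium III: the function names, the per-iteration cost, and the penalty value are all hypothetical, and it assumes the shorter run suffers exactly one mispredict while the longer run suffers none.

```c
#include <stdbool.h>

/* Cycles for one pass over the inner loop: `iterations` loop bodies plus
   an optional misprediction penalty (toy model, hypothetical names). */
static long pass_cycles(long iterations, long body_cycles,
                        bool mispredicted, long penalty_cycles)
{
    return iterations * body_cycles + (mispredicted ? penalty_cycles : 0);
}

/* True when n iterations with one mispredict cost more cycles than
   n + 1 iterations without a mispredict. */
static bool is_inversion(long n, long body_cycles, long penalty_cycles)
{
    return pass_cycles(n, body_cycles, true, penalty_cycles)
         > pass_cycles(n + 1, body_cycles, false, 0);
}
```

With a 4-cycle loop body, a 10-cycle penalty gives an inversion (5 iterations cost 30 cycles, 6 cost 24), while a 3-cycle penalty does not. In this model the inversion appears exactly when the penalty exceeds the cost of one extra iteration, independent of n.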
