Filtered Runahead Execution with a Runahead Buffer
Milad Hashemi Yale N. Patt December 8, 2015
Runahead Execution Overview
Runahead dynamically expands the instruction window when the pipeline is stalled [Mutlu et al., 2003]. The core ...
[Figure: % Total Core Cycles (y-axis, 10 to 100) per benchmark: calculix, povray, namd, gamess, perlbench, tonto, gromac, gobmk, dealII, sjeng, gcc, hmmer, h264, bzip2, astar, xalancbmk, zeusmp, cactus, wrf, GemsFDTD, leslie, milc, soplex, sphinx, bwaves, libquantum, lbm, mcf, and MI-Average.]
LD [R3] -> R5
ADD R4, R5 -> R9
ADD R9, R1 -> R6
Cache Miss
LD [R6] -> R8
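The instruction sequence above can be read as the dependence chain of a load: each operation produces a register that a later operation consumes. The following is a minimal sketch of how such a chain could be extracted by walking backward through the producers of a load's source registers. It assumes the missing load is `LD [R6] -> R8`; the `Op` type, the `dependence_chain` function, and the `MUL` filler instruction are hypothetical and not taken from the slides, and the sketch does not claim to reproduce the hardware mechanism the talk describes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Op:
    """Hypothetical instruction record used only for this illustration."""
    text: str               # instruction as written on the slide
    srcs: Tuple[str, ...]   # registers the instruction reads
    dst: str                # register the instruction writes


def dependence_chain(program: List[Op], miss_index: int) -> List[Op]:
    """Return the ops that program[miss_index] transitively depends on.

    Walks backward from the load, keeping each op that is the nearest
    earlier producer of a register still needed by the chain.
    """
    needed = set(program[miss_index].srcs)
    chain = [program[miss_index]]
    for op in reversed(program[:miss_index]):
        if op.dst in needed:
            chain.append(op)
            needed.discard(op.dst)   # this op is the nearest producer
            needed.update(op.srcs)   # its own inputs are now needed
    chain.reverse()                  # restore program order
    return chain


program = [
    Op("LD [R3] -> R5", ("R3",), "R5"),
    Op("ADD R4, R5 -> R9", ("R4", "R5"), "R9"),
    # Hypothetical filler op, NOT on the slide: shows an unrelated
    # operation being filtered out of the chain.
    Op("MUL R2, R2 -> R7", ("R2",), "R7"),
    Op("ADD R9, R1 -> R6", ("R9", "R1"), "R6"),
    Op("LD [R6] -> R8", ("R6",), "R8"),  # assumed to be the cache miss
]

chain = dependence_chain(program, miss_index=4)
for op in chain:
    print(op.text)
```

In this sketch the four slide instructions are kept and the filler `MUL` is dropped, since nothing on the path to `LD [R6] -> R8` reads `R7`.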
[Figure: Total Operations Executed During Runahead (y-axis, 0% to 100%) per benchmark, calculix through mcf. Legend: Other Operation, Dependence Chain.]
[Figure: Total Cache Miss Dependence Chains (y-axis, 0% to 100%) per benchmark, calculix through mcf. Legend: Unique Chain, Repeated Chain.]
[Figure: Dependence Chain Length (y-axis, 10 to 80) per benchmark, calculix through mcf, plus Average.]
[Diagram labels: Arch Checkpoint, RA-Cache, Pseudo-Wakeup, Poison Bits, RA-Buffer]
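[Diagram: instruction sequence example]
Operations (P registers): LD [P3] -> P5; LD [P15] -> P2; ADD P4, P5 -> P9; ADD P9, P1 -> P6; MOV P6 -> P7; LD [P7] -> P8
Hex labels: 0xD, 0xE, 0x7, 0x8, 0xA, 0xA
Operations (R registers): LD [R0] -> R2; LD [R3] -> R5; ADD R4, R5 -> R7; ADD R7, R1 -> R6; MOV R6 -> R0; LD [R0] -> R2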
ROB is not found in the ROB
System
Optimizations
[Figure: % IPC Difference over No-Prefetching Baseline (y-axis, 5 to 40) per benchmark, calculix through mcf, plus GMean. Legend: Runahead, Runahead Buffer, Runahead Buffer + Chain Cache, Hybrid Policy.]
[Figure: Cache Misses per Runahead Interval (y-axis, 2 to 18). Legend: Runahead, Runahead Buffer.]
[Figure: % Energy Difference over No-PF Baseline (axis ticks 0.5 to 2.5). Legend: Runahead, Runahead Enhancements, Runahead Buffer, Runahead Buffer + Chain Cache, Hybrid.]
[Figure: % Total Cycles (y-axis, 0% to 100%).]
[Figure: % IPC Difference over No-Prefetching Baseline (y-axis, 20 to 160). Legend: Stream, Runahead + Stream, Runahead Buffer + Stream, Runahead Buffer + Chain Cache + Stream, Hybrid + Stream.]
[Figure: Normalized Bandwidth (y-axis, 0.00 to 2.00). Legend: Stream, Runahead, Runahead Buffer, Runahead Buffer + Chain Cache, Hybrid.]
[Figure: % Energy Difference over No-PF Baseline (axis ticks 0.5 to 2.5). Legend: Baseline + Stream, Runahead + Stream, Runahead Enhancements + Stream, Runahead Buffer + Stream, Runahead Buffer + Chain Cache + Stream, Hybrid + Stream.]