Lazy Retirement: A Power Aware Register Management Mechanism
WCED – Workshop on Complexity Efficient Design May 2002 – Anchorage Alaska Guillermo (Eli) Savransky Ronny Ronen Antonio Gonzalez MRL - Intel Corp.
Savransky, Ronen, Gonzalez Page 2
Reorder buffer (ROB) and physical
Values produced by the retiring
ROB entries deallocated on
Motivation: Reduce the number of copy operations without breaking the cyclic ROB structure.
[Figure: the ROB (entries 1 to 64, each tagged with its destination register, e.g. EAX, EBX, with Head and Tail pointers marking Retirement and Allocation) next to the RRF (EAX, EBX, …, EDI), both holding data.]
If it is, copy it to the RRF. If it isn’t, ignore.
Standard Retirement: Register Deallocation, Copy to RRF
Lazy Retirement: Register Reallocation, Copy to RRF
[Figure: ROB entries 37 to 40 (destinations EAX, EBX, EBX, EAX, followed by ECX) with Head and Tail pointers, shown for two cases: "3 retire, 4 allocated" and "2 retire, 60 allocated". Annotation: "Copy to RRF is needed".]
Example sequence:
load eax [esp]
add ebx eax
and ebx 0xf
mov eax ebx
mov ecx 0x1
[Figure: the ROB (entries 1 to 64, Head and Tail pointers) and the RRF (EAX, EBX, …, EDI), together with two tables indexed by architectural register (EAX, EBX, …, EDI): the Register Map Table and the Lazy Map Table. Each table has an "Is in RRF?" column (Yes/No) and an "Index" column.]
It will be set at retirement. It will be reset when:
Another operation with the same architectural register retires, or the register is copied to the RRF.
ROB entry or RRF. It will be actualized at retirement and if the allocator forces
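The set and reset rules above can be sketched as a small model. This is only an illustration under stated assumptions, not the authors' implementation: the names `RobEntry`, `live`, `LazyRetirementModel`, `allocate`, and `retire` are hypothetical, values are written at allocation time for simplicity, and a register missing from `lazy_map` is taken to mean "Is in RRF? Yes".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RobEntry:
    arch_reg: Optional[str] = None  # destination architectural register
    value: int = 0
    live: bool = False  # hypothetical name for the bit set at retirement


@dataclass
class LazyRetirementModel:
    size: int = 64
    rob: List[RobEntry] = field(default_factory=list)
    rrf: Dict[str, int] = field(default_factory=dict)
    # Assumed lazy map table: arch register -> ROB index of its retired value.
    lazy_map: Dict[str, int] = field(default_factory=dict)
    copies: int = 0

    def __post_init__(self) -> None:
        self.rob = [RobEntry() for _ in range(self.size)]

    def allocate(self, idx: int, arch_reg: str, value: int) -> None:
        """Reuse ROB entry idx; copy the old value to the RRF only if still live."""
        old = self.rob[idx]
        if old.live:
            self.rrf[old.arch_reg] = old.value  # forced copy to the RRF
            self.copies += 1
            del self.lazy_map[old.arch_reg]     # bit reset: register copied
        self.rob[idx] = RobEntry(arch_reg, value, live=False)

    def retire(self, idx: int) -> None:
        """Leave the value in the ROB and mark it as the latest retired one."""
        entry = self.rob[idx]
        prev = self.lazy_map.get(entry.arch_reg)
        if prev is not None:
            self.rob[prev].live = False         # bit reset: superseded
        entry.live = True                       # bit set at retirement
        self.lazy_map[entry.arch_reg] = idx


# Usage: the five-operation example, placed in entries 37..41.
model = LazyRetirementModel()
for i, reg in enumerate(["EAX", "EBX", "EBX", "EAX", "ECX"]):
    model.allocate(37 + i, reg, value=i)
for i in range(5):
    model.retire(37 + i)
model.allocate(37, "EDX", 100)  # old EAX was superseded: no copy
model.allocate(38, "EDX", 101)  # old EBX was superseded: no copy
print(model.copies)
```

In this sketch only entries 39, 40, and 41 would ever need a copy when reclaimed, whereas a copy-at-retirement scheme would perform one copy for each of the five operations.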
Cache misses. Long latency
Uniformly distributed register allocation
[Figure: probability of avoiding the copy (0% to 100%) versus unallocated window size, with one curve for each of 8, 16, 32, 64, and 128.]
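The slide does not give a formula for these curves. One simple model that produces curves of this shape, offered here purely as an assumption, treats each retiring operation as writing one of R registers uniformly at random and takes the unallocated window W as the number of later retirements before an entry is reclaimed. The copy is then avoided when at least one of those W operations overwrites the same register. Reading the series labels 8 to 128 as R is also an assumption, and `prob_avoid_copy` is a hypothetical name.

```python
def prob_avoid_copy(window: int, num_regs: int) -> float:
    """Chance that at least one of `window` later retirements hits the same register.

    Assumes destinations are independent and uniform over `num_regs` registers.
    """
    return 1.0 - (1.0 - 1.0 / num_regs) ** window


# Tabulate the assumed model for a few window sizes.
for regs in (8, 16, 32, 64, 128):
    row = [round(prob_avoid_copy(w, regs), 3) for w in (1, 9, 17, 33, 65)]
    print(regs, row)
```

Under this model, fewer registers or a larger unallocated window both raise the probability that an entry is dead by the time it is reclaimed.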
[Figure: ROB usage for SPECInt. X axis: entries used (up to 64). Y axis: percent used (0% to 12%), with a cumulative curve on a second axis (0% to 100%). Legend labels: Cumulative, Average.]
IA32 architecture. P6-like microarchitecture. Separated ROB and RRF. 64 ROB entries.
SpecInt2000 Winstone99 SYSmark98 Other multimedia traces.
Percent of clocks by number of copies (clocks with zero copies not shown):
Copies per clock       1        2        3
Standard retirement    23.9%    12.2%    2.5%
Lazy retirement        8.7%     2.2%     0.3%
Improves clock gating when no port is required: P6: 61%, Lazy: 88%.
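The clock-gating figures can be cross-checked against the bar values with simple arithmetic. The sketch below assumes the first three percentages (23.9%, 12.2%, 2.5%) belong to standard retirement and the other three (8.7%, 2.2%, 0.3%) to lazy retirement, which is a reading of the chart rather than something the slide states; `zero_copy_share` is a hypothetical helper.

```python
# Percent of clocks with 1, 2, or 3 copies (assumed grouping of the bars).
standard = {1: 23.9, 2: 12.2, 3: 2.5}
lazy = {1: 8.7, 2: 2.2, 3: 0.3}


def zero_copy_share(dist: dict) -> float:
    """Percent of clocks with no copy: everything not covered by the bars."""
    return round(100.0 - sum(dist.values()), 1)


print(zero_copy_share(standard))
print(zero_copy_share(lazy))
```

With this grouping the remainders come out close to the quoted 61% for P6 and 88% for lazy retirement.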
[Figure: Copies out of retired operations (0% to 70%) per trace, Standard Retirement vs. Lazy Retirement. Traces: KatCh_Dec, MM99_VP07, SModem, SPECint2000_bzip204, SPECint2000_crafty07, SPECint2000_gap06, SPECint2000_gcc01, SPECint2000_gcc02, SPECint2000_gzip06, SPECint2000_gzip15, SPECint2000_gzip20, SPECint2000_link12, SPECint2000_mcf01, SPECint2000_twolf10, SPECint2000_vpr14, Smark98NT_Corel01, Smark98NT_Excel05, Smark98NT_Natur01, Smark98NT_OmniPage01, Smark98NT_Paradox01, Smark98NT_PowerP10, Smark98NT_Word03, Winst99_Cor97_7, Winst99_Lot_17, Winst99_Lot_6, Winst99_Off97_3, Average.]
Power consumed by the different tables relative to the original consumption
[Figure: percent of the original consumption (0% to 50%) per trace file, KatCh_Dec through Winst99_Off97_3 plus Average. Series: Lazy Table, RRF lazy, ROB lazy.]
The lazy table uses 13% of the original retirement power.
High power reduction. Has a performance penalty (trade-off is architecture
Eliminates about 75% of the copies. Can be implemented without performance penalty.
In general:
Too dumb: lots of work, high power.
Too smart: lots of control logic, high power.