Rank Idle Time Prediction Driven Last-Level Cache Writeback
Zhe Wang, Samira M. Khan, Daniel A. Jiménez Computer Science Department University of Texas at San Antonio
Memory Latency is Performance Bottleneck
[Figure: a microprocessor (CPUs and caches) connected to DRAM by reads and writes; caches are fast, DRAM is slow]
[Figure: read and write buffers share the DRAM command and data lines; servicing a write ahead of a read adds write-induced interference cycles (bus turnaround and waiting), annotated as 108 processor cycles]
Servicing write requests delays the service of subsequent read requests, causing performance degradation
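A toy timing model makes the effect concrete. This sketch assumes a single channel that services requests strictly in arrival order, a fixed per-request service time, and a fixed write-to-read bus turnaround penalty; `Request`, `completion_times`, and both timing values are hypothetical illustrations, not parameters from the slides.

```python
from dataclasses import dataclass


@dataclass
class Request:
    kind: str     # "R" for read, "W" for write
    arrival: int  # cycle at which the request becomes ready


def completion_times(requests, service=40, turnaround=30):
    """Completion cycle of each request on one channel, serviced in order.

    Hypothetical assumptions: each request occupies the bus for `service`
    cycles, and a read that follows a write first pays a `turnaround`-cycle
    write-to-read bus turnaround. Read-to-write turnaround is ignored.
    """
    t, prev, done = 0, None, []
    for req in requests:
        t = max(t, req.arrival)            # wait until the request is ready
        if prev == "W" and req.kind == "R":
            t += turnaround                # write-induced bus turnaround
        t += service                       # occupy the bus
        done.append(t)
        prev = req.kind
    return done


reads_only = [Request("R", 0), Request("R", 10)]
with_write = [Request("R", 0), Request("W", 5), Request("R", 10)]
print(completion_times(reads_only))  # [40, 80]
print(completion_times(with_write))  # [40, 80, 150]
```

In this toy setting the second read finishes 70 cycles later (150 instead of 80): 40 cycles spent servicing the write plus 30 cycles of bus turnaround.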
Quantifying Write-Induced Interference
Without write-induced interference, system performance would improve by 23% on average
[Figure: speedup without write-induced interference for 400.perlbench, 401.bzip2, 403.gcc, 410.bwaves, 429.mcf, 433.milc, 434.zeusmp, 435.gromacs, 436.cactusADM, 450.soplex, 456.hmmer, 459.GemsFDTD, 462.libquantum, 464.h264ref, 470.lbm, 473.astar, 482.sphinx3, 483.xalancbmk, and the geometric mean]
[Figure: writes evicted from the last-level cache go to a small write buffer in front of DRAM]
the delay caused to the following read requests
time in memory ranks
bank-level parallelism writes during the long rank idle period
Contributions of This Paper
Writeback Technique.
interference with read requests
[Figure: read and write request timelines for a read access pattern under perfect writeback and under traditional writeback]
Related Work: LLC Writeback Technique
[Figure: Virtual Write Queue; scheduled writes move from the LLC through the write buffer to DRAM]
[Figure: LLC recency stack from MRU to LRU]
[Figure: rank idle percentage for mix1 to mix8 and the geometric mean, which is 38%]
Ranks are Idle 38% of the time on average
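Both the idle percentage and the lengths of individual idle periods (what matters for scheduling writes) can be derived from a trace of rank busy intervals. A minimal sketch, assuming each busy interval is a half-open `(start, end)` cycle range inside the measurement window; the function names are hypothetical. The example filters gaps with the 300-CPU-cycle long rank idle threshold from the predictor slide.

```python
def idle_periods(busy_intervals, total_cycles):
    """Return (start, length) for every idle gap in [0, total_cycles).

    busy_intervals: iterable of (start, end) half-open cycle ranges during
    which the rank is serving requests; ranges may overlap.
    """
    merged = []
    for start, end in sorted(busy_intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend overlapping busy range
        else:
            merged.append([start, end])
    gaps, cursor = [], 0
    for start, end in merged:
        if start > cursor:
            gaps.append((cursor, start - cursor))
        cursor = max(cursor, end)
    if cursor < total_cycles:
        gaps.append((cursor, total_cycles - cursor))
    return gaps


def idle_fraction(busy_intervals, total_cycles):
    """Fraction of the window during which the rank is idle."""
    return sum(n for _, n in idle_periods(busy_intervals, total_cycles)) / total_cycles


busy = [(0, 100), (50, 200), (400, 1000)]
print(idle_fraction(busy, 2000))                             # 0.6
print([g for g in idle_periods(busy, 2000) if g[1] >= 300])  # [(1000, 1000)]
```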
Rank Idle Time Prediction Driven LLC Writeback
Insight: Allow writes to be serviced during long rank idle periods
idle
service during the predicted long rank idle period
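[Figure: design overview. The rank idle time predictor takes the PC of the LLC miss and the rank-is-idle signal and predicts a long rank idle time; the LLC cache cleaner selects dirty blocks (dirty bit set) from the MRU-to-LRU recency stack and sends them through the write buffer to DRAM, exploiting bank-level parallelism]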
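Per the design overview, the cache cleaner picks dirty blocks near the LRU end of the LLC and sends them to DRAM in a way that exploits bank-level parallelism. The sketch below is one hypothetical selection policy, not the authors' exact algorithm: it scans the last few recency positions of each set, takes at most one dirty block per DRAM bank, and marks the chosen blocks clean. The bank mapping, the LRU window size, and all names (`Block`, `bank_of`, `select_writebacks`) are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Block:
    addr: int
    dirty: bool


def bank_of(addr, num_banks=8, line_bits=6):
    # Hypothetical mapping: bank bits sit directly above the 64-byte line offset.
    return (addr >> line_bits) % num_banks


def select_writebacks(llc_sets, lru_window=4, num_banks=8):
    """llc_sets: list of sets, each a list of Blocks ordered MRU -> LRU.

    Returns dirty blocks from the LRU end of the sets, at most one per bank,
    so the resulting writes can be serviced in parallel across banks.
    """
    chosen = {}  # bank -> Block
    for blocks in llc_sets:
        for blk in reversed(blocks[-lru_window:]):  # visit the LRU position first
            if not blk.dirty:
                continue
            bank = bank_of(blk.addr, num_banks)
            if bank not in chosen:
                chosen[bank] = blk
                if len(chosen) == num_banks:
                    break
        if len(chosen) == num_banks:
            break
    for blk in chosen.values():
        blk.dirty = False  # written back early; the block stays cached, now clean
    return list(chosen.values())


sets = [
    [Block(0x000, True), Block(0x040, False), Block(0x080, True)],
    [Block(0x0C0, True), Block(0x100, True), Block(0x200, True)],
]
picked = select_writebacks(sets)
print([hex(a) for a in sorted(b.addr for b in picked)])  # ['0x0', '0x80', '0xc0', '0x100']
```

Block `0x200` maps to the same bank as `0x000` under this mapping, so it is skipped and stays dirty; spreading the writes across distinct banks is what lets them overlap in a long idle period.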
Observation: if an instruction leads to a long rank idle period, then there is a high probability that the next time this instruction is reached it will also lead to a long rank idle period
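This PC-based prediction idea maps naturally onto a table of 2-bit saturating counters indexed by instruction address, in the same way as a classic branch predictor. A minimal sketch of one such table; the class name, the index hash, the 8192-entry default, and the initial counter value are assumptions made for illustration, not details from the slides.

```python
class PCIdlePredictor:
    """One table of 2-bit saturating counters indexed by instruction PC.

    Counter values 0-1 predict a short idle period, 2-3 a long one.
    """

    def __init__(self, entries=8192):
        self.entries = entries
        self.counters = [1] * entries  # hypothetical init: weakly "short"

    def _index(self, pc):
        # Hypothetical hash: drop the low 2 bits, then wrap to the table size.
        return (pc >> 2) % self.entries

    def predict_long_idle(self, pc):
        return self.counters[self._index(pc)] >= 2

    def train(self, pc, was_long_idle):
        i = self._index(pc)
        if was_long_idle:
            self.counters[i] = min(3, self.counters[i] + 1)  # saturate at 3
        else:
            self.counters[i] = max(0, self.counters[i] - 1)  # saturate at 0


p = PCIdlePredictor()
pc = 0x400A10
for outcome in (True, True, False):  # long, long, short
    p.train(pc, outcome)
print(p.predict_long_idle(pc))  # True: counter went 1 -> 2 -> 3 -> 2
```

The hysteresis of the 2-bit counter means a single short idle period after a run of long ones does not immediately flip the prediction.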
[Figure: two-level rank idle time predictor. The first-level and second-level predictors are tables of 2-bit counters indexed by the PC of memory read accesses (the PC of the last LLC miss), each producing a prediction result; a rank idle cycle counter measures how long the rank is idle, with a long rank idle time defined as 300 CPU cycles; the cache cleaner; timeline markers T1, T2, T3 and interval m]
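The predictor slide pairs the counter tables with a rank idle cycle counter and a 300-cycle definition of a long idle period. One plausible way these pieces interact, sketched under stated assumptions: the counter for the PC of the last LLC miss is consulted when the rank goes idle, and is trained when the next LLC-miss read arrives and ends the idle period. The slides do not spell out how the first-level and second-level predictors divide the work, so this sketch models a single level only; `RankIdleTracker` and its methods are hypothetical names.

```python
LONG_IDLE_CYCLES = 300  # long rank idle threshold from the slide (CPU cycles)


class RankIdleTracker:
    """Single-level sketch: idle cycle counter plus 2-bit counters indexed by
    the PC of the last LLC miss sent to this rank (hypothetical organization)."""

    def __init__(self, entries=8192):
        self.entries = entries
        self.counters = [1] * entries  # 2-bit saturating counters (0..3)
        self.last_miss_pc = None
        self.idle_start = None         # cycle at which the rank went idle

    def _index(self, pc):
        return (pc >> 2) % self.entries

    def on_rank_idle(self, now):
        """Rank has no pending requests: start counting idle cycles and
        return True if a long idle period is predicted."""
        self.idle_start = now
        if self.last_miss_pc is None:
            return False
        return self.counters[self._index(self.last_miss_pc)] >= 2

    def on_llc_miss(self, pc, now):
        """A read caused by an LLC miss at `pc` reaches the rank at `now`."""
        if self.idle_start is not None and self.last_miss_pc is not None:
            idle_cycles = now - self.idle_start  # value of the idle cycle counter
            i = self._index(self.last_miss_pc)
            if idle_cycles >= LONG_IDLE_CYCLES:
                self.counters[i] = min(3, self.counters[i] + 1)
            else:
                self.counters[i] = max(0, self.counters[i] - 1)
        self.idle_start = None  # the rank is busy again
        self.last_miss_pc = pc


r = RankIdleTracker()
r.on_llc_miss(0x1000, now=0)
print(r.on_rank_idle(now=100))   # False: counter for 0x1000 starts at 1
r.on_llc_miss(0x2000, now=500)   # 400 idle cycles: trains 0x1000 toward "long"
r.on_rank_idle(now=600)
r.on_llc_miss(0x1000, now=650)   # 50 idle cycles: trains 0x2000 toward "short"
print(r.on_rank_idle(now=700))   # True: counter for 0x1000 is now 2
```

In this simplification only LLC-miss reads end an idle period; a fuller model would also account for writes and refresh.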
[Figure: speedup of Eager-WB, VWQ, and RITPD-WB for 401.bzip2, 410.bwaves, 429.mcf, 433.milc, 434.zeusmp, 435.gromacs, 436.cactusADM, 450.soplex, 456.hmmer, 459.GemsFDTD, 462.libquantum, 464.h264ref, 470.lbm, 473.astar, 482.sphinx3, 483.xalancbmk, and the geometric mean; one bar, labeled 1.30, exceeds the 1.0 to 1.2 axis range]
RITPD-WB improves the performance of eight benchmarks by at least 10% and delivers an average speedup of 10.5% with the two-rank configuration and 10.1% with the four-rank configuration.
Baseline: 32-entry per-channel write buffer (WB)
[Figure: false positive rates of the first-level and second-level predictors for mix1 to mix6 and the arithmetic mean]
On average, the false positive rates of the first-level and second-level predictors are 8.5% and 14.7%, respectively
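The slides do not define how the false positive rate is computed. One reading, assumed here, counts a false positive whenever a long idle period is predicted but the rank becomes busy again before the 300-cycle threshold, and divides by the number of long-idle predictions. The textbook false positive rate, FP / (FP + TN), would instead divide by the number of actually short idle periods. The hypothetical helper below computes both so the two readings can be compared.

```python
def prediction_rates(predicted_long, idle_cycles, threshold=300):
    """predicted_long: list of bools, one per idle period (True = long predicted).
    idle_cycles: measured length of each idle period, in CPU cycles.

    Returns (fraction of long predictions that were wrong,
             textbook false positive rate FP / (FP + TN)).
    """
    pairs = list(zip(predicted_long, idle_cycles))
    fp = sum(1 for p, n in pairs if p and n < threshold)      # predicted long, was short
    tn = sum(1 for p, n in pairs if not p and n < threshold)  # predicted short, was short
    positives = sum(1 for p, _ in pairs if p)
    wrong_among_predictions = fp / positives if positives else 0.0
    textbook_fpr = fp / (fp + tn) if fp + tn else 0.0
    return wrong_among_predictions, textbook_fpr


print(prediction_rates([True, True, False, True, False], [500, 120, 50, 310, 900]))
# (0.3333333333333333, 0.5)
```

A false positive matters here because writes issued into a short idle period can still collide with the next read.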
[Figure: normalized read latency of Eager-WB, VWQ, and RITPD-WB for mix1 to mix6 and the geometric mean]
RITPD-WB reduces read latency by 12.7% on average with the two-rank configuration and by 14.8% with the four-rank configuration
Overhead
Two-level rank idle time predictor: 4KB = 2 bits × 8192 entries × 2
Cache cleaner: 2KB
Total: 18KB for 2 ranks, 34KB for 4 ranks
Percentage of LLC capacity: ~0.3%
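The storage figures can be checked with simple arithmetic. This sketch assumes 8192 entries per counter table (the power of two that makes 2 bits × entries × 2 exactly 4KB) and treats 1KB as 1024 bytes. It also shows that both totals are consistent with 8KB of predictor state per rank plus one 2KB cache cleaner, although the slide does not say how that per-rank state is organized; `per_rank_kb` is a hypothetical helper.

```python
COUNTER_BITS = 2
ENTRIES_PER_TABLE = 8192  # assumption: power-of-two table size
TABLES = 2                # the "x 2" factor on the slide
CLEANER_KB = 2            # cache cleaner storage from the slide

predictor_bytes = COUNTER_BITS * ENTRIES_PER_TABLE * TABLES // 8
print(predictor_bytes)  # 4096 bytes = 4KB


def per_rank_kb(total_kb, ranks, cleaner_kb=CLEANER_KB):
    """Predictor storage per rank implied by a reported total."""
    return (total_kb - cleaner_kb) / ranks


print(per_rank_kb(18, 2), per_rank_kb(34, 4))  # 8.0 8.0
```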
degradation.
idle time.
intelligently writes back bank-level parallelism writes during the long rank idle period
much as possible