Density Tradeoffs of Non-Volatile Memory as a Replacement for SRAM based Last Level Cache
Kunal Korgaonkar, Ishwar Bhati, Huichu Liu, Jayesh Gaur, Sasikanth Manipatruni, Sreenivas Subramoney, Tanay Karnik, Steven Swanson, Ian Young and Hong Wang
efficient LLC Potential ~3x capacity gain over state-of-art SRAM with logic compatible
process, non-volatility
latency in practice
can help make NVM a viable replacement for SRAM in the LLC
RAM, Spin Hall Effect (SHE) MRAM, etc.
benefits over SRAM LLC
[Figure: Performance normalized to 4MB SRAM LLC. SRAM 4MB: 1.00; SRAM 8MB: 1.15; SRAM 16MB: 1.23.]
[Figure: Performance normalized to 4MB SRAM LLC]
Configuration          Performance
SRAM 4MB               1.00
STTRAM 8MB WR +0ns     1.15
STTRAM 8MB WR +5ns     1.13
STTRAM 8MB WR +10ns    1.05
STTRAM 8MB WR +20ns    0.85
bypassing
size, trade-off latency with retention/higher WER, new devices, etc
1. Reduce Write Interference
2. Eliminate Redundant Writes
[Figure: Number of requests arrived (num_writes, num_reads) per interval of 10k cycles, for gcc.200.]
[Flowchart: write scheduling and bypass decision]
1. If any read is ready: send read. If NO: send write.
2. Is the request queue full && pending writes > write_th? If NO: don't bypass.
3. Get the average write occupancy calculated in intervals (int_write_occ).
4. Refer to the Lookup Table to find the bypass score threshold (byp_score_th) for int_write_occ.
5. Find the pending write with the lowest live score (min_score).
6. Is min_score <= byp_score_th? If NO: don't bypass. Otherwise: bypass the write with min_score.
Lookup Table (Tuned)
Interval write occupancy (int_write_occ)    Bypass score threshold (byp_score_th)
1/4th of request queue                      20%
Half of request queue                       50%
3/4th of request queue                      70%
Equal to request queue                      100%

write_th: 75% of request queue
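The bypass decision and lookup table above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the names `PendingWrite`, `bypass_score_threshold`, and `select_bypass` are hypothetical, the live score is assumed to be a percentage in 0..100, and occupancies that fall between table rows are assumed to round up to the next row (the slide does not say how they are handled).

```python
from dataclasses import dataclass
from typing import List, Optional

# Lookup table from the slide: (interval write occupancy as a fraction of the
# request queue, bypass score threshold in percent).
LOOKUP_TABLE = [(0.25, 20), (0.50, 50), (0.75, 70), (1.00, 100)]
WRITE_TH = 0.75  # write_th: 75% of the request queue


@dataclass
class PendingWrite:
    addr: int
    live_score: float  # ASSUMPTION: percentage scale, 0..100


def bypass_score_threshold(int_write_occ: float, queue_size: int) -> int:
    """Map interval write occupancy to byp_score_th.

    ASSUMPTION: occupancies between table rows round up to the next row.
    """
    frac = int_write_occ / queue_size
    for level, score_th in LOOKUP_TABLE:
        if frac <= level:
            return score_th
    return LOOKUP_TABLE[-1][1]  # occupancy above the queue size: last row


def select_bypass(queue_len: int, queue_size: int,
                  pending_writes: List[PendingWrite],
                  int_write_occ: float) -> Optional[PendingWrite]:
    """Return the write to bypass, or None for "don't bypass"."""
    queue_full = queue_len >= queue_size
    if not (queue_full and len(pending_writes) > WRITE_TH * queue_size):
        return None  # congestion condition not met: don't bypass
    byp_score_th = bypass_score_threshold(int_write_occ, queue_size)
    victim = min(pending_writes, key=lambda w: w.live_score)  # min_score
    if victim.live_score <= byp_score_th:
        return victim  # bypass the write with min_score
    return None
```

For example, with a full 32-entry queue holding 25 pending writes and an interval write occupancy of 16 (half the queue), the threshold is 50%, so a write whose live score is 40 would be selected for bypass; at an occupancy of 8 (threshold 20%) the same write would be kept.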
[Figure: Percentage of frequent clean and dirty fills in LLC (series: frequent clean fills, frequent dirty fills).]
[Figure: Performance normalized to STTRAM 8MB baseline (series: WCAB, WCAB+VHC).]
Our proposals provide 26% performance gain over the baseline
[Figure: Performance normalized to SRAM 4MB]
Configuration    STT - baseline    STT - Proposed Architecture
5 ns, 7 MB       1.10              1.12
10 ns, 12 MB     1.07              1.18
20 ns, 16 MB     0.87              1.12
30 ns, 20 MB     0.71              1.03
Our proposals provide up to 18% performance gain over the SRAM of same area
[Figure: Performance normalized to 8MB STTRAM baseline]
Configuration                           Homogeneous    Heterogeneous    Geomean
Hybrid LLC - 2MB SRAM, 4MB STTRAM       1.10           1.09             1.09
Hybrid LLC - 1MB SRAM, 6MB STTRAM       1.05           1.03             1.04
STTRAM LLC - LAP                        1.07           1.13             1.11
STTRAM LLC - Proposed Architecture      1.18           1.30             1.26
Our proposals perform significantly better than the prior art
capacity benefits
make NVM a viable replacement for SRAM in the LLC
THANK YOU!!