The Bunker Cache for Spatio-Value Approximation
Joshua San Miguel, Jorge Albericio, Natalie Enright Jerger, Aamer Jaleel
Data Movement and Storage
Memory hierarchy: processor core, private caches, shared last-level cache, off-chip memory
High cost of moving data: accessing off-chip memory costs 10x to 100x more latency and energy than accessing a private cache!
High cost of storing data: the last-level cache consumes substantial energy and takes up 30% to 50% of chip area!
Existing optimizations:
- optimize via data addresses, e.g., cache prefetching (adds the complexity of tracking address correlations)
- optimize via data values, e.g., cache compression (adds the complexity of manipulating data values)
Can we improve data movement and storage simultaneously without the added complexities?
(where is the data located? what value is encoded in the data?)
Our Work
We explore Spatio-Value Similarity:
- there is regularity to where approximately similar values are located in memory
We propose the Bunker Cache:
- many-to-one similarity mapping based on memory address
- savings in runtime (1.58x), dynamic energy (1.72x), leakage power (1.65x) at acceptable quality levels
Spatio-Value Similarity
where is the data located? what value is encoded in the data?
The goal of a processor is to process real-world information, not bits.
- data values can often be approximate and continuous (i.e., smooth)
- data addresses represent a one-dimensional memory space
- Spatio-Value Similarity: there is regularity to where approximately similar values are located in memory.
memory space (figure with elements labeled x, y, z):
- some approximately similar values are contiguous
- others are similar but not contiguous; they sit a regular STRIDE apart
- e.g., image processing: STRIDE = image row size (similar but not contiguous)
- e.g., signal processing: STRIDE = signal period
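To see why the image row size is a natural STRIDE, consider an image stored in row-major order. The sketch below is illustrative only: the helper names `pixel_offset` and `vertical_neighbor_distance`, and the one-element-per-pixel layout, are assumptions and do not come from the talk.

```c
#include <stddef.h>

/* Hypothetical helper: linear offset of pixel (row, col) in a row-major
 * image that stores `width` pixels per row. */
size_t pixel_offset(size_t row, size_t col, size_t width) {
    return row * width + col;
}

/* Distance in memory between a pixel and the pixel directly below it. */
size_t vertical_neighbor_distance(size_t row, size_t col, size_t width) {
    return pixel_offset(row + 1, col, width) - pixel_offset(row, col, width);
}
```

Horizontal neighbors are one element apart (contiguous), while vertical neighbors are exactly one row apart. Both kinds of neighbor tend to hold similar pixel values, which is the "similar but not contiguous" case with STRIDE equal to the row size.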
Spatio-Value Similarity
Given any data block, how similar is it to the block that is distance X away from it?
[Figure: similarity versus distance away, one panel each for 2dconv, change-detection, debayer, dwt53, histeq, jpeg, kmeans, lucas-kanade]
- contiguous data are similar in value
- similar data are stored at regular intervals
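One way to ask "how similar is a block to the block distance X away" is to average the difference between elements that far apart. The talk does not say which similarity metric was used, so the mean absolute difference below is an assumed stand-in, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Average absolute difference between each element and the element
 * `distance` positions later. Lower values mean more similar data.
 * Assumed metric for illustration; not the metric from the talk. */
double mean_abs_diff_at_distance(const uint8_t *data, size_t n, size_t distance) {
    if (distance >= n) {
        return 0.0; /* no pairs to compare */
    }
    uint64_t total = 0;
    size_t pairs = n - distance;
    for (size_t i = 0; i < pairs; i++) {
        int a = data[i];
        int b = data[i + distance];
        total += (uint64_t)(a > b ? a - b : b - a);
    }
    return (double)total / (double)pairs;
}
```

Sweeping `distance` over a data set and looking for dips in this value is one plausible way to spot a regular interval such as an image row size or a signal period.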
The Bunker Cache
Memory hierarchy: processor core, private caches, Bunker Cache (replacing the shared last-level cache), off-chip memory
Treat address X and address X+STRIDE as if they are one and the same
Conventional Cache
[Figure: cache with tag and data arrays, way 0 to way N. The index function selects a set from the physical address; the address tag is compared (=) against the stored tag in each way to decide hit or miss, and the matching way supplies the data.]
Conventional Cache – Lookup
- lookup of address X: the index function selects the set, the tag matches (HIT), and the data [X] is returned
- lookup of address X+STRIDE, where [X+STRIDE] is approximately similar to [X]: the tag does not match (MISS)
- Incurs data movement cost!
- Incurs data storage cost!
The Bunker Cache
[Figure: the same tag and data arrays (way 0 to way N) holding X and [X], with a similarity mapping added in front that translates the physical address into a Bunker address.]
The Bunker Cache – Approximate Lookup
- the cache holds [X]; a lookup arrives for address X+STRIDE
- similarity mapping: bunker_addr(X+STRIDE) == bunker_addr(X)
- the index function and tag comparison report a HIT: saves movement cost
- the data [X] is returned for X+STRIDE: saves storage cost
Rest of cache hardware unchanged!
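The contrast between the conventional lookup and the approximate lookup can be sketched with a toy direct-mapped cache. Everything here is a simplification for illustration: `ToyCache`, `toy_lookup`, and `NUM_SETS` are hypothetical names, addresses are block numbers, and no data, coherence, or dirty state is modeled. Setting `radix` to 1 makes the mapping the identity, which stands in for a conventional cache.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_SETS 8u /* toy size, chosen only for illustration */

typedef struct {
    bool     valid[NUM_SETS];
    uint64_t tag[NUM_SETS];
    uint64_t stride; /* distance between similar blocks, in blocks */
    uint64_t radix;  /* 1 means precise: the mapping is the identity */
} ToyCache;

/* Many-to-one similarity mapping: stride * radix physical blocks
 * collapse onto stride Bunker blocks. */
uint64_t bunker_addr(uint64_t phys, uint64_t stride, uint64_t radix) {
    uint64_t span = stride * radix;
    return (phys / span) * stride + (phys % span) % stride;
}

/* Look up a physical block; on a miss, fill the line. Returns true on a hit. */
bool toy_lookup(ToyCache *c, uint64_t phys_block) {
    uint64_t addr = bunker_addr(phys_block, c->stride, c->radix);
    uint64_t set = addr % NUM_SETS; /* index function */
    uint64_t tag = addr / NUM_SETS;
    if (c->valid[set] && c->tag[set] == tag) {
        return true;
    }
    c->valid[set] = true;
    c->tag[set] = tag;
    return false;
}
```

With `stride = 4` and `radix = 1`, a lookup of block 1 followed by block 5 produces two misses. With `radix = 3`, the lookup of block 5 hits on the line filled by block 1, because both map to Bunker block 1. Only the address translation in front of the index function differs between the two cases.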
The Bunker Cache – Similarity Mapping
The similarity mapping translates the physical address space into the Bunker address space using two parameters:
- STRIDE: distance between approximately similar blocks
- RADIX: degree (i.e., aggressiveness) of approximation
Example with STRIDE = 4 and RADIX = 3: physical blocks 0 to B are grouped so that blocks STRIDE apart (e.g., 0, 4, and 8) share a single Bunker block.
// skip if precise
bunker_addr = (phys_addr / (STRIDE * RADIX)) * STRIDE;
bunker_addr += (phys_addr % (STRIDE * RADIX)) % STRIDE;
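The mapping can be checked against the STRIDE = 4, RADIX = 3 example with a small sketch. The function wraps the two statements from the slide; `map_range` is a hypothetical helper added for illustration. The parentheses around `stride * radix` matter: in C, `a / b * c` evaluates as `(a / b) * c`.

```c
#include <stdint.h>

/* Similarity mapping with the grouping made explicit:
 * stride * radix physical blocks collapse onto stride Bunker blocks. */
uint64_t bunker_addr(uint64_t phys_addr, uint64_t stride, uint64_t radix) {
    uint64_t span = stride * radix;                /* physical blocks per group */
    uint64_t base = (phys_addr / span) * stride;   /* first Bunker block of the group */
    uint64_t offset = (phys_addr % span) % stride; /* position within the stride */
    return base + offset;
}

/* Hypothetical helper: out[i] receives the Bunker address of physical block i. */
void map_range(uint64_t *out, uint64_t count, uint64_t stride, uint64_t radix) {
    for (uint64_t i = 0; i < count; i++) {
        out[i] = bunker_addr(i, stride, radix);
    }
}
```

For STRIDE = 4 and RADIX = 3, physical blocks 0 through B map to 0 1 2 3 0 1 2 3 0 1 2 3, and block C starts the next group at Bunker block 4. With RADIX = 1 the mapping is the identity, which corresponds to precise data.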
The Bunker Cache – Additional Details
Coherence and dirty state:
- requires a separate directory structure that bypasses similarity mapping
Drowsy blocks:
- many-to-one mapping offers more opportunity for low-leakage storage
Dynamic quality control:
- can tune RADIX and STRIDE on-the-fly via periodic quality checks
More details in paper
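As a rough illustration of tuning RADIX through periodic quality checks, the sketch below backs off when measured quality drops under a target and becomes more aggressive otherwise. This policy is purely an assumption for illustration; the talk defers the actual mechanism to the paper, and `next_radix` and its parameters are hypothetical.

```c
/* Hypothetical policy, not taken from the talk or the paper: pick the
 * RADIX to use after a periodic quality check. If radix starts as a power
 * of two, it stays one, within [1, max_radix]. */
unsigned next_radix(unsigned radix, double measured_snr_db,
                    double target_snr_db, unsigned max_radix) {
    if (measured_snr_db < target_snr_db) {
        /* Quality too low: approximate less aggressively. */
        return radix > 1 ? radix / 2 : 1;
    }
    /* Quality acceptable: try a more aggressive radix, up to the cap. */
    return radix * 2 <= max_radix ? radix * 2 : radix;
}
```

A radix of 1 corresponds to no approximation, so the policy can always fall back to precise behavior.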
Evaluation
- Applications: PERFECT and AxBench
- Performance: Full-system cycle-level simulation
- Energy and Power: CACTI
- Quality: Pin simulation, signal-to-noise-ratio (SNR)
- Configuration:
  - 4-core CMP, 16KB private L1, 128KB private L2
  - 2MB shared LLC, 2K-entry directory
  - STRIDE selected based on application's data set dimensions
  - RADIX varied in results
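For reference, a common definition of the signal-to-noise ratio in decibels is given below. The talk does not spell out its exact formula, so this is an assumed standard form, with $x_i$ denoting the precise output and $\hat{x}_i$ the approximate output.

```latex
\mathrm{SNR}_{\mathrm{dB}} = 10 \log_{10} \frac{\sum_{i} x_i^{2}}{\sum_{i} \left( x_i - \hat{x}_i \right)^{2}}
```

Under this definition a higher SNR means the approximate output is closer to the precise one.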
Evaluation – Application Output Quality
[Figure: output quality as SNR (dB, axis 5 to 40, higher is better) versus similarity radix (1, 2, 4, 8, 16, 32, 64). Example outputs for histeq and change-detection compare precise execution with radix 4.]
Evaluation – Application Speedup
[Figure: speedup (axis 1.0x to 2.2x, higher is better) versus similarity radix (1, 2, 4, 8, 16, 32, 64)]
Evaluation – Dynamic Energy Savings
[Figure: dynamic energy savings (axis 1.0x to 2.2x, higher is better) versus similarity radix (1, 2, 4, 8, 16, 32, 64)]
Evaluation – Leakage (Drowsy) Power Savings
[Figure: leakage power savings (axis 1.0x to 2.2x, higher is better) versus similarity radix (1, 2, 4, 8, 16, 32, 64)]
Conclusion
Conventional hierarchy (processor core, private caches, shared last-level cache, off-chip memory): high cost of storing data, high cost of moving data
With the Bunker Cache in place of the shared last-level cache:
- cache prefetching without fetches, e.g., a request for X implicitly prefetches X+STRIDE
- cache compression without values, e.g., storage of X+STRIDE is saved without needing [X+STRIDE]
- savings in runtime (1.58x), dynamic energy (1.72x), leakage power (1.65x) at radix 4
Thank you
The Bunker Cache for Spatio-Value Approximation
Joshua San Miguel, Jorge Albericio, Natalie Enright Jerger, Aamer Jaleel