Doppelgänger: A Cache for Approximate Computing
Joshua San Miguel, Jorge Albericio, Andreas Moshovos, Natalie Enright Jerger
Cache Hierarchy
[Figure: processor core, private caches, shared last-level cache, main memory]
Accessing memory has 10x – 100x greater latency and energy than accessing a private cache! Need a hierarchy of large caches…
But the last-level cache consumes substantial energy and takes up 30%-50% of chip area! Higher efficiency via Approximate Computing…
Summary
Doppelgänger Cache:
- Identifies approximate similarity in data block values.
- 77% cache storage savings of approximable data.
- Effectively compresses storage of approximately similar blocks.
- 3x better compression ratio than state-of-the-art techniques.
- Significantly reduces area and energy consumption.
- Reduces total on-chip cache area by 1.36x.
Outline
- Approximate Computing
- Approximate Similarity
- Doppelgänger Cache
- Cache Architecture
- Similarity Mapping
- Evaluation
Approximate Computing
Not all data/computations need to be precise.
Examples: data mining, computer vision, audio and video processing, gaming, machine learning, dynamical simulation.
Image sources: http://www.zentut.com/, http://www.businessweek.com/, http://www.cc.gatech.edu/~cnieto6/, http://www.analyticbridge.com/, http://themusicparlour.blogspot.ca/, http://www.scientific-computing.com/
Approximate Similarity
Two data blocks are approximately similar (i.e., doppelgängers) if replacing the values of one with the other still results in acceptable application output in the end.
[Figure: three example data blocks labeled 1, 2, 3, with element values 92 131 183 91 132 186 90 131 185 93 133 184 35 31 29 43 38 37, and an "approximately similar" annotation]
Allows for 77% cache storage savings of approximable data!
Doppelgänger Cache
How can we exploit approximate similarity to save area and energy in the last-level cache?
[Figure: processor core, private caches, shared LLC split into a precise LLC and a Doppelgänger LLC, main memory]
Conventional Cache
[Figure: tag array and data array; address from L2, data from memory]
One-to-one mapping of data values to memory locations. But the fundamental goal of a processor is to process data values, not memory locations…
Result: multiple copies of approximately similar blocks in the data array.
Doppelgänger Cache
[Figure: tag array and approximate data array; address from L2, data from memory]
Smaller data array allows for substantial area and energy savings.
Tag array: tag 0 → map X, tag 1 → map X, tag 2 → map X, tag 3 → map X
Approximate data array: map X → data block A
Doppelgänger Cache - Lookups
- Address 0 arrives from L2.
- The tag array entry for tag 0 supplies map X.
- Map X selects data block A in the approximate data array; data A is returned to L2.
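The lookup steps above can be sketched in a few lines. This is only an illustration under stated assumptions: plain Python dictionaries stand in for the tag array and approximate data array, and the function name `lookup` is hypothetical. It does not model sets, ways, or timing.

```python
def lookup(tag_array, data_array, address):
    """Return the stored (possibly approximate) block for address, or None."""
    # Step 1: the address selects a tag entry, which holds a map value.
    map_value = tag_array.get(address)
    if map_value is None:
        return None  # no tag entry for this address
    # Step 2: the map value, not the address, selects the data block.
    return data_array.get(map_value)


# State from the slide: four tags share map X, which holds data block A.
tag_array = {0: "X", 1: "X", 2: "X", 3: "X"}
data_array = {"X": "data block A"}

print(lookup(tag_array, data_array, 0))  # address 0 -> map X -> data block A
```

Because the data array is indexed by map value, all four addresses return the same stored block.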
Doppelgänger Cache - Insertions (Miss)
- Address 5 arrives from L2; tag 5 is allocated in the tag array.
- Data B is fetched from memory and forwarded to L2.
- Map Y is generated from data B and stored with tag 5.
- Map Y is looked up in the approximate data array: miss!
- Data block B is inserted into the approximate data array under map Y.
Doppelgänger Cache - Insertions (Hit)
- Address 6 arrives from L2; tag 6 is allocated in the tag array.
- Data C is fetched from memory and forwarded to L2.
- Map X is generated from data C and stored with tag 6.
- Map X is looked up in the approximate data array: hit!
- Data block A serves as an acceptable approximation of data block C.
Final state: tags 0, 1, 2, 3 and 6 hold map X and tag 5 holds map Y; the approximate data array holds map X → data block A and map Y → data block B.
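The insertion walkthrough (miss for address 5, hit for address 6) can be sketched as follows. Everything here is an assumption made for illustration: the dictionaries, the function names, the toy block values, and `toy_map`, which is a hypothetical stand-in for the similarity mapping. Replacement, writes, and coherence are not modeled.

```python
def toy_map(block):
    # Hypothetical map function: integer average, bucketed in steps of 16.
    return (sum(block) // len(block)) // 16


def insert(tag_array, data_array, address, block, map_fn=toy_map):
    """Insert a block fetched from memory; return 'hit' or 'miss'.

    'hit' means the approximate data array already holds a block with the
    same map value, so the new block is not stored a second time.
    """
    map_value = map_fn(block)       # generate the map from the data
    tag_array[address] = map_value  # the tag entry records the map value
    if map_value in data_array:
        return "hit"                # an existing block serves as the approximation
    data_array[map_value] = block   # first block seen with this map value
    return "miss"


tag_array, data_array = {}, {}
block_a = [92, 131, 183, 91]   # toy values
block_b = [35, 31, 29, 43]
block_c = [90, 131, 185, 93]   # close to block_a

results = [
    insert(tag_array, data_array, 0, block_a),
    insert(tag_array, data_array, 5, block_b),
    insert(tag_array, data_array, 6, block_c),
]
print(results)  # ['miss', 'miss', 'hit']
```

After the third insertion, address 6 resolves to the stored copy of `block_a`, mirroring how data block A stands in for data block C on the slide.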
Doppelgänger Cache - Similarity Mapping
The map value represents the signature (or likeness) of a block. Blocks that generate the same map value are approximately similar.
[Figure: data block A with elements A[0], A[1], …, A[n] → hash function → hash → mapping → map]
- The hash function aggregates the values in the block: hash = AVG(A[0], …, A[n])
- The mapping discretizes the hash value: the range of all possible hash values is divided among the M-bit map values, and hash values that fall on the same map value are approximately similar.
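The two steps (aggregate, then discretize) can be sketched as below. This is a minimal illustration, not the hardware design from the paper: the function name `similarity_map`, the value range `lo`/`hi`, and the equal-width binning are assumptions. The slides only state that the hash averages the block and that the hash is discretized into an M-bit map.

```python
def similarity_map(block, m_bits=14, lo=0.0, hi=255.0):
    """Return an M-bit map value for a block of numeric elements.

    Assumptions (not from the slides): elements lie in [lo, hi], and the
    hash range is split into 2**m_bits equal-width bins.
    """
    # Hash step: aggregate the block's values by averaging them.
    avg = sum(block) / len(block)
    # Mapping step: discretize the hash into one of 2**m_bits bins.
    bins = 1 << m_bits
    index = int((avg - lo) / (hi - lo) * bins)
    # Clamp so that avg == hi falls into the last bin.
    return min(max(index, 0), bins - 1)


# Toy blocks; a coarse 4-bit map is used only so the small example collides.
block_1 = [92, 131, 183, 91, 132, 186]
block_2 = [90, 131, 185, 93, 133, 184]
block_3 = [35, 31, 29, 43, 38, 37]

same = similarity_map(block_1, m_bits=4) == similarity_map(block_2, m_bits=4)
different = similarity_map(block_1, m_bits=4) != similarity_map(block_3, m_bits=4)
```

A smaller `m_bits` merges more blocks onto the same map value (more savings, more error); a larger one does the opposite. The paper covers sensitivity to the size of the map space.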
uniDoppelgänger Cache
[Figure: processor core, private caches, shared LLC implemented as a single uniDoppelgänger LLC, main memory]
Precise blocks simply use the physical address as the map value.
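One way to picture this rule is a single map-value function that branches on whether a block is approximable. The sketch below is hypothetical: the function names, the `approximable` flag, and the ("precise"/"approx") tagging used to keep the two kinds of map values apart are assumptions, not details from the slides.

```python
def toy_avg(block):
    # Hypothetical similarity map for approximable blocks: integer average.
    return sum(block) // len(block)


def uni_map_value(address, block, approximable, approx_map=toy_avg):
    """Return the map value used to index a unified data array."""
    if approximable:
        # Approximable blocks share storage when their contents map alike.
        return ("approx", approx_map(block))
    # Precise blocks use their physical address, so they never share storage.
    return ("precise", address)


precise_1 = uni_map_value(0x100, [1, 2, 3], approximable=False)
precise_2 = uni_map_value(0x140, [1, 2, 3], approximable=False)
approx_1 = uni_map_value(0x100, [10, 12], approximable=True)
approx_2 = uni_map_value(0x140, [11, 11], approximable=True)
```

Two precise blocks with identical contents still get distinct map values, while the two approximable blocks collapse onto one.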
Doppelgänger Cache
More details in paper:
- Cache writes, replacements and coherence.
- Details on hash functions and mapping.
- Sensitivity to size of map space and data array.
- Evaluation of uniDoppelgänger.
Evaluation
- Applications: PARSEC and AxBench
- Performance: Full-system cycle-level simulation
- Error: Pin simulation
- Area and Energy: CACTI
- Configuration:
  - 4 cores, private L1 and L2
  - 2MB shared LLC (1MB precise, 1MB Doppelgänger)
  - Doppelgänger: 14-bit similarity map, 1/4 data array
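As a rough sanity check on the configuration, the arithmetic below works out the sizes it implies. Two assumptions are not stated on the slide: a 64-byte block size, and reading "1/4 data array" as a data array one quarter the size of a conventional 1MB cache's data array while the tag array still covers 1MB worth of blocks.

```python
BLOCK_BYTES = 64                  # assumed block size, not given on the slide
MAP_BITS = 14                     # from the slide: 14-bit similarity map
TAG_COVERAGE_BYTES = 1024 * 1024  # from the slide: 1MB Doppelgänger partition
DATA_ARRAY_DIVISOR = 4            # from the slide: 1/4 data array

map_values = 2 ** MAP_BITS                       # distinct map values
tag_entries = TAG_COVERAGE_BYTES // BLOCK_BYTES  # addresses that can be tracked
data_entries = tag_entries // DATA_ARRAY_DIVISOR # blocks actually stored
data_bytes = data_entries * BLOCK_BYTES          # data array capacity

print(map_values, tag_entries, data_entries, data_bytes)
# 16384 16384 4096 262144
```

Under these assumptions, roughly four tags on average share each stored data block.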
Evaluation - Compression Ratio
[Figure: bar chart of compression ratio (axis 0x to 6x, "better" direction marked) for BΔI, exact deduplication, Doppelgänger, and Doppelgänger + BΔI]
Evaluation
[Figure: bar chart (axis 0.8x to 1.4x, "better" direction marked) of application output accuracy, application performance, total cache dynamic energy reduction, total cache leakage energy reduction, and total cache area reduction]
Conclusion
Doppelgänger Cache:
- Identifies approximate similarity in data block values.
- 77% cache storage savings of approximable data.
- Effectively compresses storage of approximately similar blocks.
- 3x better compression ratio than state-of-the-art techniques.
- Significantly reduces area and energy consumption.
- Reduces total on-chip cache area by 1.36x.
Thank you