DESIGN OF TRACE-BASED SPLIT ARRAY CACHES FOR EMBEDDED APPLICATIONS
Alice M. Tokarnia, Marina Tachibana
School of Electrical and Computer Engineering, UNICAMP
Euromicro DSD 2010
Introduction
On-chip caches are one of the ideal targets of design optimization
Role in performance and power consumption.
Caches can be customized.
Core-based processors, ASIPs, configurable processors
Arrays
Vectors, arrays, data structures. Elements stored at sequential addresses.
Array caches Split array caches
Defined by partition organization and array-partition mapping.
Arrays with distinct locality properties can be mapped to partitions with different organizations.
Parallel accesses to the partitions may further improve performance.
Tuning a cache to an application to improve system
Givargis et al. [ICCAD 99]
Vahid et al. [DATE 04]
Ghosh and Givargis [DATE 03]
Gordon-Ross et al. [Ultra Low-Power Electronics and Design 04]
Mapping application variables to embedded cache 'parts'
Panda et al. [DATE 98]
Sanchez et al. [IEEE TCCA Newsletters 97] and Gonzalez et al. [IEEE Micro 00]
Lee et al. [ETRI Journal 03]
Naz et al. [ACM SIGARCH 06]
Trace-based split array caches
Organization
Two-partition array cache
The line size of one partition is 2x that of the other
The ways of the partitions have the same size
Array-partition mapping and partition degree of set-associativity
Determined so as to minimize the average memory access energy-delay product (EDP)
Inputs
Application program, typical inputs, cache size constraint
Outputs
TSAC-EDPs and best unified caches
Main concern
Navigate through a large design space
Array-partition mapping
Degree of set-associativity
Strategy
Unified caches whose ways are split between two partitions
Design Method
Instrument and execute code to generate a trace of
For an access to an array, the trace has an entry with
Unified caches C0 (b, n, m) satisfying the size
line size b, degree of set-associativity n, number of
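The enumeration of unified caches can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' tool: it assumes that the capacity of C0(b, n, m) is b * n * m bytes with m read as the number of sets (which agrees with the configurations listed in the results table), that m is a power of two, and that the size constraint is met exactly. The function name and the default parameter ranges are hypothetical.

```python
def unified_caches(size_bytes, line_sizes=(8, 16, 32, 64), max_ways=8):
    """Enumerate candidate unified caches (b, n, m) with b * n * m == size_bytes.

    b: line size in bytes, n: degree of set-associativity,
    m: number of sets (assumed to be a power of two).
    """
    configs = []
    for b in line_sizes:
        for n in range(1, max_ways + 1):
            if size_bytes % (b * n) != 0:
                continue  # this (b, n) pair cannot fill the budget exactly
            m = size_bytes // (b * n)
            if m & (m - 1) == 0:  # keep only power-of-two set counts
                configs.append((b, n, m))
    return configs


if __name__ == "__main__":
    for cfg in unified_caches(8 * 1024):
        print(cfg)
```

With a 12K-byte budget only associativities that are multiples of 3 survive the power-of-two filter, which is why non-power-of-two values of n appear among the candidates.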
Concentration of line Li of array A: lX(A, Li, b)
Fraction of accesses that fall in the same half of Li as the previous access
[Figure: numbered accesses to lines L1 and L2 illustrating the two examples]
Example 1: lX(A, L1, 8) = 4/4 = 1
Example 2: lX(A, L1, 8) = 1/4 = 0.25
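The line concentration can be sketched in a few lines of Python. The sketch rests on two assumptions that the slide does not state explicitly: "previous access" is taken to mean the previous access to the same line, and the first access to a line is counted as concentrated. Under these assumptions, four accesses that stay in one half of an 8-byte line give 1, and four accesses that alternate between halves give 0.25, the two values quoted in the examples. The function name and the input format (a list of byte offsets within the line) are hypothetical.

```python
def line_concentration(offsets, b):
    """Concentration of one line of size b bytes.

    offsets: byte offsets (0 .. b-1) of successive accesses to this line.
    Assumption: the first access counts as concentrated.
    """
    if not offsets:
        raise ValueError("the line must be accessed at least once")
    half = b // 2
    same_half = 1                      # first access, by assumption
    prev_upper = offsets[0] >= half    # which half the previous access hit
    for off in offsets[1:]:
        upper = off >= half
        if upper == prev_upper:
            same_half += 1
        prev_upper = upper
    return same_half / len(offsets)


if __name__ == "__main__":
    print(line_concentration([0, 1, 2, 3], 8))  # accesses stay in the lower half
    print(line_concentration([0, 4, 1, 5], 8))  # accesses alternate between halves
```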
X(A, b): concentration of A, the median concentration of the lines accessed.
N(A, b): number of distinct lines of A accessed.
Nt(b): number of distinct lines accessed for all
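These three quantities can be computed from per-line concentrations as sketched below. The input layout (a dictionary from array name to a dictionary of line identifier to concentration) and the function name are hypothetical, and Nt(b) is computed as the sum of the per-array counts, which assumes that no cache line is shared between two arrays.

```python
from statistics import median


def array_metrics(line_conc):
    """Summarize per-line concentrations for one line size b.

    line_conc: {array_name: {line_id: concentration}} (hypothetical layout).
    Returns (X, N, Nt): the median concentration per array, the number of
    distinct lines accessed per array, and the total number of distinct lines.
    """
    X = {a: median(lines.values()) for a, lines in line_conc.items()}
    N = {a: len(lines) for a, lines in line_conc.items()}
    Nt = sum(N.values())  # assumes arrays do not share lines
    return X, N, Nt


if __name__ == "__main__":
    demo = {"A": {0: 1.0, 1: 0.5, 2: 0.25}, "B": {10: 1.0}}
    print(array_metrics(demo))
```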
Array-partition mapping
Arrays in which accesses exhibit lower concentration
Other arrays → C2(b/2, n2, 2m)
At most (#arrays - 1) array-partition mappings are evaluated, against 2^#arrays array-partition mappings for exhaustive search
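One way to arrive at (#arrays - 1) candidate mappings is to sort the arrays by concentration and sweep a cut point through the sorted list, sending the lower-concentration arrays to C1 and the rest to C2. The sketch below illustrates that reading; the sweep itself is an assumption about how the bound is obtained, and the function name and input format are hypothetical.

```python
def candidate_mappings(concentration):
    """Generate array-partition mappings from array concentrations.

    concentration: {array_name: X(A, b)}.
    Returns a list of (C1, C2) pairs in which C1 holds the k arrays with the
    lowest concentration, for k = 1 .. #arrays - 1 (both partitions non-empty).
    """
    order = sorted(concentration, key=concentration.get)
    return [(order[:k], order[k:]) for k in range(1, len(order))]


if __name__ == "__main__":
    X = {"X": 0.9, "Y": 0.6, "Z": 0.7, "A": 0.3, "B": 0.2}
    for c1, c2 in candidate_mappings(X):
        print("C1 =", c1, "C2 =", c2)
```

For five arrays this yields four mappings, compared with 2^5 = 32 assignments for exhaustive search.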
Degree of set-associativity of the partitions
Number of lines is approximately proportional to
[Equation: relates the partition associativities n1 and n2 to n and to the ratio N(A, b) / Nt(b) taken over the arrays A mapped to C2, with a 0.5 term]
[Figure: the n ways of a unified cache C0 = {X, Y, Z, A, B} (line size b, m sets) are split between partition C1 (line size b, m sets) and partition C2 (line size b/2, 2m sets), giving the split caches (C2|C1)1, (C2|C1)2, ..., (C2|C1)n. Mappings shown: C1 = {A, B}, C2 = {X, Y, Z}; C1 = {A, B, Y, Z}, C2 = {X}; C1 = {B}, C2 = {X, Y, Z, A}.]
Obtain metrics used for cache evaluation
miss rates
Other metrics can be obtained from cache models
Hit access time and energy consumption
Miss (time) penalty and miss (energy) penalty
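The metrics listed above are enough to estimate an average memory access EDP. The sketch below uses the textbook average-memory-access model (hit cost plus miss rate times miss penalty) for both time and energy and multiplies the two; it is an assumed model for illustration, not necessarily the exact expression used in the presentation, and the function name and numeric values are hypothetical.

```python
def average_access(hit_time, hit_energy, miss_rate,
                   miss_time_penalty, miss_energy_penalty):
    """Return (average access time, average access energy, EDP).

    Assumed model: average = hit cost + miss_rate * miss penalty,
    applied to time and to energy; EDP = energy * time.
    """
    time = hit_time + miss_rate * miss_time_penalty
    energy = hit_energy + miss_rate * miss_energy_penalty
    return time, energy, time * energy


if __name__ == "__main__":
    # Hypothetical values in arbitrary units.
    t, e, edp = average_access(1.0, 2.0, 0.25, 8.0, 40.0)
    print(t, e, edp)
```

Candidate caches can then be ranked by the third returned value, keeping the one with the smallest EDP.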
Fraction F of accesses consists of two parallel
[Equation: terms indexed by i for partitions 1 and 2; not recoverable from the extracted text]
Unified array caches with minimum EDP
Split caches with minimum EDP: TSAC-EDPs
Applications:
Convolution (3 arrays)
Fast Fourier Transform (4 arrays)
Group 3 Fax decoder G3fax (4 arrays)
JPEG encoder (DCT, Quantization) (10 arrays)
MPEG-2 video decoder (49 arrays)
Cache sizes: 8K-byte, 12K-byte
Cache access time and energy
CACTI, 90 nm
Memory access time
Samsung DDR266
Memory energy consumption
50x cache energy consumption
Experimental Results
Application   Unified-EDP C0   TSAC-EDP C1|C2
              (b, n, m)        (b1, n1, m1)|(b2, n2, m2)
Conv-8Kb      (32, 2, 128)     (64,1,64) | (32,1,128)
FFT-8Kb       (32, 2, 128)     (16,3,128) | (8,1,256)
G3fax-8Kb     (16, 2, 256)     (16,1,256) | (8,1,512)
JPEG-8Kb      (16, 2, 256)     (32,1,128) | (16,1,256)
MPEG-8Kb      (16, 2, 256)     (16,1,256) | (8,1,512)
Conv-12Kb     (32, 3, 128)     (64,1,64) | (32,2,128)
FFT-12Kb      (16, 6, 128)     (16,2,256) | (8,1,512)
G3fax-12Kb    (16, 3, 256)     (16,1,128) | (8,5,256)
JPEG-12Kb     (16, 3, 256)     (16,2,256) | (8,1,512)
MPEG-12Kb     (16, 3, 256)     (32,1,128) | (16,2,256)
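The configurations in the table can be checked against the organization rules with a short script. The sketch assumes that the capacity of a cache (b, n, m) is b * n * m bytes with m the number of sets; under that reading, every entry fills its 8K-byte or 12K-byte budget exactly, the line size of C1 is twice that of C2, and the ways of the two partitions have the same size. The helper names are hypothetical; the tuples are copied from the table.

```python
def cache_bytes(cfg):
    """Capacity in bytes of a cache (b, n, m), assuming m is the number of sets."""
    b, n, m = cfg
    return b * n * m


# name: (unified C0, (C1, C2)), copied from the results table
RESULTS = {
    "Conv-8Kb":   ((32, 2, 128), ((64, 1, 64),  (32, 1, 128))),
    "FFT-8Kb":    ((32, 2, 128), ((16, 3, 128), (8, 1, 256))),
    "G3fax-8Kb":  ((16, 2, 256), ((16, 1, 256), (8, 1, 512))),
    "JPEG-8Kb":   ((16, 2, 256), ((32, 1, 128), (16, 1, 256))),
    "MPEG-8Kb":   ((16, 2, 256), ((16, 1, 256), (8, 1, 512))),
    "Conv-12Kb":  ((32, 3, 128), ((64, 1, 64),  (32, 2, 128))),
    "FFT-12Kb":   ((16, 6, 128), ((16, 2, 256), (8, 1, 512))),
    "G3fax-12Kb": ((16, 3, 256), ((16, 1, 128), (8, 5, 256))),
    "JPEG-12Kb":  ((16, 3, 256), ((16, 2, 256), (8, 1, 512))),
    "MPEG-12Kb":  ((16, 3, 256), ((32, 1, 128), (16, 2, 256))),
}


def check(name):
    """Return True if the row satisfies the size budget and the split rules."""
    unified, (c1, c2) = RESULTS[name]
    budget = 8192 if name.endswith("-8Kb") else 12288
    (b1, _, m1), (b2, _, m2) = c1, c2
    return (cache_bytes(unified) == budget
            and cache_bytes(c1) + cache_bytes(c2) == budget
            and b1 == 2 * b2              # line size of C1 is 2x that of C2
            and b1 * m1 == b2 * m2)       # ways of both partitions have equal size


if __name__ == "__main__":
    for name in RESULTS:
        print(name, check(name))
```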
[Figure: DifEDP (0% to 70%) for Conv, FFT, G3fax, JPEG and MPEG with 8 Kb and 12 Kb caches; series: 0%, 25%, 50%]
[Figure: DifEnergy (0% to 40%) for Conv, FFT, G3fax, JPEG and MPEG with 8 Kb and 12 Kb caches]
[Figure: DifAccessTime (0% to 60%) for Conv, FFT, G3fax, JPEG and MPEG with 8 Kb and 12 Kb caches; series: 0%, 25%, 50%]
TSAC-EDPs have better average memory access
Parallel accesses to cache partitions, when possible,
Concept of array concentration can be applied to
Introduction of new criteria for split cache selection
Extension of the design method presented to two-level
Development of dynamic cache reconfiguration
Design of TSAC-EDPs for other applications and cache
Fraction of paired accesses with an access to each partition Average memory access EDP reduction 50% 0.25 29% 0.50 65%
Naz et al. [HPSC, 04]
proposed a split cache consisting of a scalar and an array
Our work can be combined with this by using array
Zhang et al. [DATE, 04]
propose the addition of a configurable victim buffer to a
Since many of the TSAC-EDPs include a direct-mapped
Assigning arrays to different cache partitions
additional step in configuration methods that tune other
Zang et al. [RSP
Gordon-Ross et al. [book chapter, 04].