SPLIT ARRAY CACHES FOR EMBEDDED APPLICATIONS, Euromicro DSD 2010


A. M. Tokarnia, M. Tachibana, School of Electrical and Computer Engineering, UNICAMP. Design of Trace-Based Split Array Caches for Embedded Applications, Euromicro DSD 2010.


SLIDE 1

DESIGN OF TRACE-BASED SPLIT ARRAY CACHES FOR EMBEDDED APPLICATIONS

Alice M. Tokarnia, Marina Tachibana

School of Electrical and Computer Engineering, UNICAMP

Euromicro DSD 2010

SLIDE 2

Introduction

Split Array Caches

  • On-chip caches are one of the ideal targets of design optimization
      • Role in performance and power consumption.
      • Caches can be customized.
      • Core-based processors, ASIPs, configurable processors
  • Arrays
      • Vectors, arrays, data structures.
      • Elements stored at sequential addresses.
  • Array caches
  • Split array caches
      • Defined by partition organization and array-partition mapping.
      • Arrays with distinct locality properties can be mapped to partitions with different organizations.
      • Parallel accesses to the partitions may further improve performance.

SLIDE 3

Introduction

Related Works

  • Tuning a cache to an application to improve system performance or power consumption
      • Givargis et al. [ICCAD 99]
      • Vahid et al. [DATE 04]
      • Ghosh and Givargis [DATE 03]
      • Gordon-Ross et al. [Ultra Low-Power Electronics and Design 04]
  • Mapping application variables to embedded cache 'parts' according to their locality
      • Panda et al. [DATE 98]
      • Sanchez et al. [IEEE TCCA Newsletters 97] and Gonzalez et al. [IEEE Micro 00]
      • Lee et al. [ETRI Journal 03]
      • Naz et al. [ACM SIGARCH 06]

SLIDE 4

Introduction

TSAC-EDPs

  • Trace-based split array caches
  • Organization
      • Two-partition array cache
      • The line size of one partition is 2x that of the other
      • The ways of the partitions have the same size
  • Array-partition mapping and partition degree of set-associativity
      • Determined so as to minimize the average memory access EDP (energy-delay product)

SLIDE 5

Design Method

  • Inputs
      • Application program, typical inputs, cache size constraint
  • Outputs
      • TSAC-EDPs and best unified caches
  • Main concern
      • Navigate through a large design space
          • Array-partition mapping
          • Degree of set-associativity
  • Strategy
      • Unified caches whose ways are split between two partitions

SLIDE 6

Design Method

Design Steps

1. Trace generation
2. Definition of unified caches
3. Trace analysis
4. Candidate split caches
5. Cache simulation
6. Cache evaluation and selection

SLIDE 7

Design Method

Trace Generation & Definition of Unified Caches

  • Instrument and execute code to generate a trace of array accesses
      • For each access to an array, the trace has an entry with the array name, address, and number of words.
  • Unified caches C0(b, n, m) satisfying the size constraint
      • line size b, degree of set-associativity n, number of sets m
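The two steps above can be sketched in Python. This is only an illustration under stated assumptions: the `TraceEntry` field names, the candidate line sizes, the `max_ways` bound, and the restriction to power-of-two set counts are hypothetical choices, since the slides do not state which parameter ranges were explored. Line size b is assumed to be in bytes.

```python
from typing import NamedTuple

class TraceEntry(NamedTuple):
    """One array access; field names are illustrative, not from the slides."""
    array: str      # array name
    address: int    # address of the first word accessed
    words: int      # number of words accessed

def unified_caches(size_bytes, line_sizes=(8, 16, 32, 64), max_ways=8):
    """List every C0(b, n, m) with b * n * m == size_bytes.

    Assumption: only power-of-two set counts m are kept.
    """
    caches = []
    for b in line_sizes:
        for n in range(1, max_ways + 1):
            if size_bytes % (b * n):
                continue                      # b * n does not divide the size
            m = size_bytes // (b * n)
            if (m & (m - 1)) == 0:            # m is a power of two
                caches.append((b, n, m))
    return caches

candidates_8k = unified_caches(8 * 1024)
candidates_12k = unified_caches(12 * 1024)
```

With these assumed ranges, the enumeration includes configurations such as (32, 2, 128) for 8K bytes and (16, 6, 128) for 12K bytes.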

SLIDE 8

Design Method

Trace Analysis

  • Concentration of line Li of array A: lX(A, Li, b)
      • Fraction of accesses that fall in the same half of Li as the previous access

Example 1: accesses 1 to 4 to line L1 all fall in the same half of the line: lX(A, L1, 8) = 4/4 = 1

Example 2: accesses 1 to 4 to line L2 alternate between the two halves of the line: lX(A, L2, 8) = 1/4 = 0.25
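A minimal Python sketch of the line concentration, written to reproduce the two examples. Several points are assumptions rather than statements from the slides: addresses and b use the same unit, "previous access" means the previous access to the same line, and the first access to a line counts as concentrated (this is what makes the examples come out as 4/4 and 1/4). The function name is hypothetical.

```python
from collections import defaultdict

def line_concentrations(addresses, b):
    """Map each line index to the fraction of its accesses that fall in the
    same half of the line as the previous access to that line."""
    last_half = {}                 # line index -> half (0 or 1) touched last
    same = defaultdict(int)        # line index -> accesses counted as concentrated
    total = defaultdict(int)       # line index -> all accesses to the line
    for addr in addresses:
        line, offset = divmod(addr, b)
        half = 0 if offset < b // 2 else 1
        # Assumption: the first access to a line counts as concentrated.
        if last_half.get(line, half) == half:
            same[line] += 1
        total[line] += 1
        last_half[line] = half
    return {line: same[line] / total[line] for line in total}

example1 = line_concentrations([0, 1, 2, 3], b=8)   # all in the first half
example2 = line_concentrations([0, 4, 1, 5], b=8)   # alternating halves
```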

SLIDE 9

Design Method

Trace Analysis

  • X(A, b): concentration of A
      • median concentration of the lines accessed.
  • N(A, b): number of distinct lines of A accessed.
  • Nt(b): number of distinct lines accessed for all arrays.

Concentration defines a partial ordering of arrays
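These array-level quantities can be computed from per-line concentrations as in the sketch below. The input layout, the function name, and the sample numbers are hypothetical; the sketch also assumes that arrays do not share cache lines, so that Nt(b) is the sum of the N(A, b) values.

```python
from statistics import median

def array_metrics(line_conc):
    """line_conc: {array name: {line index: concentration}} for one line size b.

    Returns X(A, b) and N(A, b) per array, plus Nt(b).
    """
    X = {a: median(lines.values()) for a, lines in line_conc.items()}
    N = {a: len(lines) for a, lines in line_conc.items()}
    Nt = sum(N.values())   # assumes arrays do not share cache lines
    return X, N, Nt

# Hypothetical per-line concentrations for two arrays
conc = {"A": {0: 1.0, 1: 0.5, 2: 0.75}, "B": {7: 0.25, 8: 0.25}}
X, N, Nt = array_metrics(conc)
ordering = sorted(X, key=X.get)   # lowest concentration first
```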

SLIDE 10

Design Method

Candidate Split Caches

  • Array-partition mapping
      • Arrays in which accesses exhibit lower concentration → C1(b, n1, m)
      • Other arrays → C2(b/2, n2, 2m)
      • At most (#arrays - 1) array-partition mappings.
      • 2^#arrays array-partition mappings for exhaustive search
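The threshold-style mapping can be sketched as follows: arrays are sorted by concentration and every cut point yields one candidate, with the lower-concentration arrays going to C1. Treating arrays of equal concentration as inseparable is an assumption, offered as one possible reading of "at most"; the function name and the concentration values are hypothetical.

```python
def candidate_mappings(concentration):
    """concentration: {array name: X(A, b)}.

    Returns (C1 arrays, C2 arrays) pairs, one per threshold position.
    """
    ordered = sorted(concentration, key=concentration.get)
    mappings = []
    for k in range(1, len(ordered)):
        # Assumption: arrays with equal concentration stay in the same
        # partition, so a cut between them is skipped.
        if concentration[ordered[k - 1]] == concentration[ordered[k]]:
            continue
        mappings.append((ordered[:k], ordered[k:]))
    return mappings

X = {"A": 0.3, "B": 0.2, "X": 0.9, "Y": 0.6, "Z": 0.6}   # hypothetical values
mappings = candidate_mappings(X)
```

For these five arrays the sketch produces 3 candidate mappings, compared with 2^5 = 32 for an exhaustive search.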

SLIDE 11

Design Method

Candidate Split Caches

  • Degree of set-associativity of the partitions
      • Number of lines is approximately proportional to the number of distinct lines accessed

$$n_2 = \left\lfloor n \cdot \frac{\sum_{A \text{ mapped to } C_2} N(A, b)}{N_t(b)} + 0.5 \right\rfloor$$

$$n_1 = n - n_2$$
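A small Python sketch of the way split, assuming that n2 is n times the share of distinct lines belonging to the arrays mapped to C2, rounded to the nearest integer, and that n1 = n - n2. The rounding is an assumed reading of the 0.5 term on the slide, and the clamp that keeps at least one way in each partition is an addition not taken from the slides. The function name and the numbers are hypothetical.

```python
import math

def split_ways(n, N, c2_arrays, Nt):
    """Split the n ways of a unified cache between C1 and C2.

    N: {array: number of distinct lines accessed}; Nt: total distinct lines.
    Returns (n1, n2).
    """
    share = sum(N[a] for a in c2_arrays) / Nt
    n2 = math.floor(n * share + 0.5)        # round to the nearest integer
    # Assumption (not on the slide): keep at least one way per partition.
    n2 = min(max(n2, 1), n - 1)
    return n - n2, n2

n1, n2 = split_ways(4, {"A": 30, "B": 10, "X": 40}, ["X"], 80)
```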

SLIDE 12

Design Method

Examples of Cache Splitting

[Figure: examples of cache splitting. The n ways of the unified cache C0 = {X, Y, Z, A, B} (line size b, m sets) are divided between C1 (line size b, m sets) and C2 (line size b/2, 2m sets), giving candidates (C2|C1)1, (C2|C1)2, ..., (C2|C1)n. Mappings shown: C1 = {B}, C2 = {X, Y, Z, A}; C1 = {A, B}, C2 = {X, Y, Z}; C1 = {A, B, Y, Z}, C2 = {X}.]

SLIDE 13

Design Method

Simulation of Unified and Split Caches

  • Obtain metrics used for cache evaluation
      • Miss rates
  • Other metrics can be obtained from cache models and memory datasheets
      • Hit access time and energy consumption
      • Miss (time) penalty and miss (energy) penalty

SLIDE 14

Design Method

Cache Evaluation C0

  • A fraction Φ of the accesses consists of two parallel accesses, one to C1 and the other to C2

[Equation: Average_memory_access_time_C0, expressed in terms of Φ, Hit_time_C0, Miss_rate_C0, and Miss_penalty_C0]

SLIDE 15

Design Method

Cache Evaluation C1|C2

$$\text{Average\_memory\_access\_time}_{C_i} = \text{Hit\_time}_{C_i} + \text{Miss\_rate}_{C_i} \times \text{Miss\_penalty}_{C_i}$$

$$\text{Average\_memory\_access\_time}_{C_1|C_2} = f_1 \times \text{Average\_memory\_access\_time}_{C_1} + f_2 \times \text{Average\_memory\_access\_time}_{C_2}$$
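As an illustration, the split-cache access time can be computed as a weighted sum of the two per-partition averages, each being hit time plus miss rate times miss penalty. In this sketch f1 and f2 are assumed to be the fractions of accesses served by C1 and C2 (the slide does not define them), and all numeric values and function names are hypothetical.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time of one cache or partition."""
    return hit_time + miss_rate * miss_penalty

def amat_split(f1, c1, f2, c2):
    """Weighted average over the two partitions.

    c1 and c2 are (hit_time, miss_rate, miss_penalty) tuples.
    """
    return f1 * amat(*c1) + f2 * amat(*c2)

# Hypothetical numbers (ns). Assumption: f1 and f2 are the fractions of
# accesses served by C1 and C2, so f1 + f2 == 1.
t = amat_split(0.5, (1.0, 0.25, 40.0), 0.5, (1.0, 0.5, 40.0))
```

Here C1 averages 11 ns and C2 averages 21 ns, so the combined value is 16 ns.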

SLIDE 16

Design Method

Cache Selection

  • Unified array caches with minimum EDP
  • Split caches with minimum EDP: TSAC-EDPs
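The selection step amounts to taking the minimum EDP over the simulated candidates. The sketch below assumes that EDP is the product of the average energy and the average time per memory access; the function names, the data layout, and the energy and time values are hypothetical.

```python
def edp(avg_energy, avg_time):
    """Energy-delay product of an average memory access."""
    return avg_energy * avg_time

def select_min_edp(candidates):
    """candidates: {configuration: (avg_energy, avg_time)}.

    Returns the configuration with the smallest EDP.
    """
    return min(candidates, key=lambda c: edp(*candidates[c]))

# Hypothetical (energy in nJ, time in ns) per unified configuration (b, n, m)
unified = {(32, 2, 128): (2.0, 10.0), (16, 2, 256): (1.5, 12.0)}
best_unified = select_min_edp(unified)
```

The same function can be applied to the set of candidate split caches to pick a TSAC-EDP.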

SLIDE 17

Experimental Results

  • Applications:
      • Convolution (3 arrays)
      • Fast Fourier Transform (4 arrays)
      • Group 3 Fax decoder G3fax (4 arrays)
      • JPEG encoder (DCT, Quantization) (10 arrays)
      • MPEG-2 video decoder (49 arrays)
  • Cache sizes: 8K-byte, 12K-byte
  • Cache access time and energy
      • CACTI, 90 nm
  • Memory access time
      • Samsung DDR266
  • Memory energy consumption
      • 50x cache energy consumption

SLIDE 18

Experimental Results

TSAC-EDPs

Application   Unified-EDP C0   TSAC-EDP C1|C2
              (b, n, m)        (b1, n1, m1)|(b2, n2, m2)

Conv-8Kb      (32, 2, 128)     (64,1,64) | (32,1,128)
FFT-8Kb       (32, 2, 128)     (16,3,128) | (8,1,256)
G3fax-8Kb     (16, 2, 256)     (16,1,256) | (8,1,512)
JPEG-8Kb      (16, 2, 256)     (32,1,128) | (16,1,256)
MPEG-8Kb      (16, 2, 256)     (16,1,256) | (8,1,512)
Conv-12Kb     (32, 3, 128)     (64,1,64) | (32,2,128)
FFT-12Kb      (16, 6, 128)     (16,2,256) | (8,1,512)
G3fax-12Kb    (16, 3, 256)     (16,1,128) | (8,5,256)
JPEG-12Kb     (16, 3, 256)     (16,2,256) | (8,1,512)
MPEG-12Kb     (16, 3, 256)     (32,1,128) | (16,2,256)
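A quick arithmetic check on a few rows of this table, assuming the line size is in bytes so that capacity is b x n x m: the two partitions together should match the unified cache size, and one way of C1 should be as large as one way of C2. The helper names are hypothetical; the tuples are copied from the table.

```python
def size_bytes(b, n, m):
    """Cache capacity, assuming line size b is in bytes."""
    return b * n * m

def way_bytes(b, n, m):
    """Capacity of a single way."""
    return b * m

# (C0, C1, C2) as (b, n, m) tuples, copied from the table
rows = {
    "Conv-8Kb":   ((32, 2, 128), (64, 1, 64),  (32, 1, 128)),
    "FFT-8Kb":    ((32, 2, 128), (16, 3, 128), (8, 1, 256)),
    "G3fax-12Kb": ((16, 3, 256), (16, 1, 128), (8, 5, 256)),
    "MPEG-12Kb":  ((16, 3, 256), (32, 1, 128), (16, 2, 256)),
}

checks = {
    name: (size_bytes(*c0) == size_bytes(*c1) + size_bytes(*c2),
           way_bytes(*c1) == way_bytes(*c2))
    for name, (c0, c1, c2) in rows.items()
}
```

Both conditions hold for every row listed.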

SLIDE 19

Experimental Results

DifEDP = (EDP_C0 - EDP_C1|C2) / EDP_C0

[Figure: bar chart of DifEDP for Conv, FFT, G3fax, JPEG, and MPEG, each with 8 Kb and 12 Kb caches]

SLIDE 20

Experimental Results

DifEnergy and DifAccessTime

DifEnergy = (E_C0 - E_C1|C2) / E_C0

[Figure: bar chart of DifEnergy for Conv, FFT, G3fax, JPEG, and MPEG, each with 8 Kb and 12 Kb caches]

DifAccessTime = (D_C0 - D_C1|C2) / D_C0

[Figure: bar chart of DifAccessTime for Conv, FFT, G3fax, JPEG, and MPEG, each with 8 Kb and 12 Kb caches]

SLIDE 21

Conclusion

  • TSAC-EDPs have better average memory access time, energy, and energy-delay product than unified caches of the same size for some applications.
  • Parallel accesses to cache partitions, when possible, can further improve average memory access EDP, time, and energy.
  • The concept of array concentration can be applied to other design methods.

SLIDE 22

End


Thank you!

tokarnia@dca.fee.unicamp.br marinatachibana@gmail.com

Questions?

SLIDE 23

Future Work-TSAC

  • Introduction of new criteria for split cache selection
  • Extension of the design method presented to two-level caches
  • Development of dynamic cache reconfiguration algorithms
  • Design of TSAC-EDPs for other applications and cache size constraints

SLIDE 24

TSAC-EDP x Unified Array Cache

MPEG-2 Decoder 8K-byte caches

Fraction of paired accesses with an access to each partition Average memory access EDP reduction 50% 0.25 29% 0.50 65%

SLIDE 25

Combining our cache splitting method with other cache design methods

  • Naz et al. [HPSC, 04]
      • propose a split cache consisting of a scalar cache and an array cache.
      • Our work can be combined with this by using array concentrations to further split the array cache.
  • Zhang et al. [DATE, 04]
      • propose the addition of a configurable victim buffer to a direct-mapped cache.
      • Since many of the TSAC-EDPs include a direct-mapped partition, adding a victim buffer to this partition may further reduce energy consumption.

SLIDE 26

Combining our cache splitting method with other cache design methods

  • Assigning arrays to different cache partitions according to their concentrations
      • can be an additional step in configuration methods that tune other cache parameters
          • Zhang et al. [RSP, 03]
          • Gordon-Ross et al. [book chapter, 04]
