The Bunker Cache for Spatio-Value Approximation (presentation slides)



slide-1
SLIDE 1

The Bunker Cache for Spatio-Value Approximation

Joshua San Miguel, Jorge Albericio, Natalie Enright Jerger, Aamer Jaleel

slide-2
SLIDE 2

Data Movement and Storage

[Figure: memory hierarchy with processor core, private caches, shared last-level cache, and off-chip memory]

slide-3
SLIDE 3

Data Movement and Storage

[Figure: memory hierarchy with processor core, private caches, shared last-level cache, and off-chip memory]

Accessing off-chip memory costs 10x to 100x more latency and energy than accessing a private cache!

high cost of moving data

slide-4
SLIDE 4

Data Movement and Storage

[Figure: memory hierarchy with processor core, private caches, shared last-level cache, and off-chip memory]

high cost of storing data

Last-level cache consumes substantial energy and takes up 30%-50% of chip area!

high cost of moving data

slide-5
SLIDE 5

Data Movement and Storage

[Figure: memory hierarchy with processor core, private caches, shared last-level cache, and off-chip memory]

  • optimize via data addresses (e.g., cache prefetching)

slide-6
SLIDE 6

Data Movement and Storage

[Figure: memory hierarchy with processor core, private caches, shared last-level cache, and off-chip memory]

  • optimize via data values (e.g., cache compression)
  • optimize via data addresses (e.g., cache prefetching)

slide-7
SLIDE 7

Data Movement and Storage

[Figure: memory hierarchy with processor core, private caches, shared last-level cache, and off-chip memory]

  • optimize via data values (e.g., cache compression)
  • optimize via data addresses (e.g., cache prefetching)

complexity of tracking address correlations

slide-8
SLIDE 8

Data Movement and Storage

[Figure: memory hierarchy with processor core, private caches, shared last-level cache, and off-chip memory]

  • optimize via data values (e.g., cache compression)
  • optimize via data addresses (e.g., cache prefetching)

complexity of tracking address correlations

complexity of manipulating data values

slide-9
SLIDE 9

Data Movement and Storage

[Figure: memory hierarchy with processor core, private caches, shared last-level cache, and off-chip memory]

  • optimize via data values (e.g., cache compression)
  • optimize via data addresses (e.g., cache prefetching)

Can we improve data movement and storage simultaneously without the added complexities?

(where data is located?  what value is encoded in data?)

complexity of tracking address correlations

complexity of manipulating data values

slide-10
SLIDE 10

Our Work


We explore Spatio-Value Similarity:

  • there is regularity to where approximately similar values are located in memory

We propose the Bunker Cache:

  • many-to-one similarity mapping based on memory address
  • savings in runtime (1.58x), dynamic energy (1.72x), leakage power (1.65x) at acceptable quality levels

slide-11
SLIDE 11

Spatio-Value Similarity


where data is located?  what value is encoded in data?

The goal of a processor is to process real-world information, not bits.

slide-12
SLIDE 12

Spatio-Value Similarity


where data is located?  what value is encoded in data?

The goal of a processor is to process real-world information, not bits.

data values can often be approximate and continuous (i.e., smooth)

slide-13
SLIDE 13

Spatio-Value Similarity

where data is located?  what value is encoded in data?

The goal of a processor is to process real-world information, not bits.

data values can often be approximate and continuous (i.e., smooth)

data addresses represent a one-dimensional memory space

slide-14
SLIDE 14

Spatio-Value Similarity

where data is located?  what value is encoded in data?

The goal of a processor is to process real-world information, not bits.

  • Spatio-Value Similarity: there is regularity to where approximately similar values are located in memory.

slide-15
SLIDE 15

Spatio-Value Similarity

where data is located?  what value is encoded in data?

The goal of a processor is to process real-world information, not bits.

  • Spatio-Value Similarity: there is regularity to where approximately similar values are located in memory.

memory space: x y z

slide-18
SLIDE 18

Spatio-Value Similarity

where data is located?  what value is encoded in data?

The goal of a processor is to process real-world information, not bits.

  • Spatio-Value Similarity: there is regularity to where approximately similar values are located in memory.

memory space: x y z

approximately similar and contiguous

slide-23
SLIDE 23

Spatio-Value Similarity

where data is located?  what value is encoded in data?

The goal of a processor is to process real-world information, not bits.

  • Spatio-Value Similarity: there is regularity to where approximately similar values are located in memory.

memory space: x y z

similar but not contiguous

slide-24
SLIDE 24

Spatio-Value Similarity

where data is located?  what value is encoded in data?

The goal of a processor is to process real-world information, not bits.

  • Spatio-Value Similarity: there is regularity to where approximately similar values are located in memory.

memory space: x y z

STRIDE

similar but not contiguous

slide-25
SLIDE 25

Spatio-Value Similarity

where data is located?  what value is encoded in data?

The goal of a processor is to process real-world information, not bits.

  • Spatio-Value Similarity: there is regularity to where approximately similar values are located in memory.

e.g., image processing

slide-26
SLIDE 26

Spatio-Value Similarity

where data is located?  what value is encoded in data?

The goal of a processor is to process real-world information, not bits.

  • Spatio-Value Similarity: there is regularity to where approximately similar values are located in memory.

e.g., image processing: STRIDE = image row size

similar but not contiguous

slide-27
SLIDE 27

Spatio-Value Similarity

where data is located?  what value is encoded in data?

The goal of a processor is to process real-world information, not bits.

  • Spatio-Value Similarity: there is regularity to where approximately similar values are located in memory.

e.g., image processing: STRIDE = image row size

e.g., signal processing: STRIDE = signal period
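
The image-processing case can be illustrated with a small sketch. It assumes a row-major image with one byte per pixel; the function names and the layout are illustrative assumptions, since the slides only state that STRIDE equals the image row size.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of pixel (row, col) in a row-major image, one byte per pixel.
 * The layout and the name pixel_offset are assumptions for illustration. */
size_t pixel_offset(size_t row, size_t col, size_t width)
{
    assert(col < width);
    return row * width + col;
}

/* Distance in memory between a pixel and the pixel directly below it.
 * Vertically adjacent pixels tend to hold similar values, yet they are
 * not contiguous: they sit exactly one row (the STRIDE) apart. */
size_t vertical_neighbour_distance(size_t row, size_t col, size_t width)
{
    return pixel_offset(row + 1, col, width) - pixel_offset(row, col, width);
}
```

For a 640-pixel-wide image, horizontally adjacent pixels are 1 byte apart, while `vertical_neighbour_distance(3, 7, 640)` evaluates to 640, the row size.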

slide-28
SLIDE 28

Spatio-Value Similarity


Given any data block, how similar is it to the block that is distance X away from it?

slide-29
SLIDE 29

Spatio-Value Similarity

Given any data block, how similar is it to the block that is distance X away from it?

[Figure: similarity versus distance away, for debayer]

slide-30
SLIDE 30

Spatio-Value Similarity

Given any data block, how similar is it to the block that is distance X away from it?

[Figure: similarity versus distance away, for debayer]

contiguous data are similar in value

slide-31
SLIDE 31

Spatio-Value Similarity

Given any data block, how similar is it to the block that is distance X away from it?

[Figure: similarity versus distance away, for debayer and dwt53]

contiguous data are similar in value

similar data are stored at regular intervals

slide-32
SLIDE 32

Spatio-Value Similarity

Given any data block, how similar is it to the block that is distance X away from it?

[Figure: similarity versus distance away, for 2dconv, change-detection, debayer, dwt53, histeq, jpeg, kmeans, and lucas-kanade]

contiguous data are similar in value

similar data are stored at regular intervals
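
One way to pose the question above in code is sketched here. The similarity metric (every element within a tolerance), the block size, and all names are assumptions chosen for illustration; the slides do not specify how similarity was measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BLOCK_ELEMS 16  /* elements per data block (assumed) */

/* Returns 1 if every element of block a is within tol of the matching
 * element of block b. This threshold test is an assumed metric. */
static int blocks_similar(const int *a, const int *b, int tol)
{
    for (size_t i = 0; i < BLOCK_ELEMS; i++) {
        if (abs(a[i] - b[i]) > tol)
            return 0;
    }
    return 1;
}

/* Fraction of block pairs (k, k + distance) that are similar, measured
 * over n_blocks consecutive blocks stored in data. */
double similarity_at_distance(const int *data, size_t n_blocks,
                              size_t distance, int tol)
{
    assert(distance > 0 && distance < n_blocks);
    size_t pairs = n_blocks - distance;
    size_t similar = 0;
    for (size_t k = 0; k < pairs; k++) {
        const int *a = data + k * BLOCK_ELEMS;
        const int *b = data + (k + distance) * BLOCK_ELEMS;
        similar += (size_t)blocks_similar(a, b, tol);
    }
    return (double)similar / (double)pairs;
}
```

Sweeping `distance` over a data set and plotting the result would give a curve of the kind the slides describe: peaks at regular intervals indicate a candidate STRIDE.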

slide-33
SLIDE 33

The Bunker Cache

[Figure: memory hierarchy with processor core, private caches, shared last-level cache, and off-chip memory]

slide-34
SLIDE 34

The Bunker Cache

[Figure: memory hierarchy with processor core, private caches, Bunker Cache, and off-chip memory]

slide-35
SLIDE 35

The Bunker Cache

[Figure: memory hierarchy with processor core, private caches, Bunker Cache, and off-chip memory]

Treat address X and address X+STRIDE as if they are one and the same

slide-36
SLIDE 36

Conventional Cache

[Figure: conventional cache with tag and data arrays, way 0 through way N, accessed by physical address]

slide-37
SLIDE 37

Conventional Cache

[Figure: conventional cache with tag and data arrays, way 0 through way N; the index function maps the physical address to a set]

slide-38
SLIDE 38

Conventional Cache

[Figure: conventional cache; the index function selects a set, and the address tag is compared (=) against the tag in each way to produce the hit? signals]

slide-39
SLIDE 39

Conventional Cache

[Figure: conventional cache; the index function selects a set, the address tag is compared (=) against the tag in each way to produce the hit? signals, and the data array supplies the data]
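
The lookup path in the figure (index function, set selection, per-way tag comparison) can be sketched as follows. The sizes, type names, and the trivial replacement policy are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_SETS 4   /* toy sizes, assumed */
#define NUM_WAYS 2

typedef struct {
    bool     valid;
    uint64_t tag;
} cache_line;

typedef struct {
    cache_line sets[NUM_SETS][NUM_WAYS];
} cache;

/* Index function: low part of the block address picks the set. */
static unsigned set_index(uint64_t block_addr) { return (unsigned)(block_addr % NUM_SETS); }
/* Tag: the remaining high part of the block address. */
static uint64_t tag_of(uint64_t block_addr) { return block_addr / NUM_SETS; }

/* Hit if some valid way in the selected set holds a matching tag. */
bool cache_lookup(const cache *c, uint64_t block_addr)
{
    unsigned set = set_index(block_addr);
    for (int way = 0; way < NUM_WAYS; way++) {
        const cache_line *line = &c->sets[set][way];
        if (line->valid && line->tag == tag_of(block_addr))
            return true;
    }
    return false;
}

/* Installs a block in the first invalid way, else overwrites way 0
 * (a placeholder replacement policy). */
void cache_fill(cache *c, uint64_t block_addr)
{
    assert(c);
    unsigned set = set_index(block_addr);
    int victim = 0;
    for (int way = 0; way < NUM_WAYS; way++) {
        if (!c->sets[set][way].valid) { victim = way; break; }
    }
    c->sets[set][victim].valid = true;
    c->sets[set][victim].tag = tag_of(block_addr);
}
```

In this sketch, after `cache_fill(&c, 8)` a lookup of block 8 hits, but a lookup of block 12 (that is, 8 plus a STRIDE of 4) misses, because the two blocks have different tags even if their contents are approximately similar.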

slide-40
SLIDE 40

Conventional Cache – Lookup

[Figure: conventional cache; a lookup for address X arrives as a physical address]

slide-41
SLIDE 41

Conventional Cache – Lookup

[Figure: conventional cache lookup for address X; the index function selects the set, the stored tag X matches (=), HIT]

slide-42
SLIDE 42

Conventional Cache – Lookup

[Figure: conventional cache lookup for address X; the index function selects the set, the stored tag X matches (=), HIT, and the data [X] is returned]

slide-43
SLIDE 43

Conventional Cache – Lookup

[Figure: conventional cache holding tag X and data [X]]

slide-44
SLIDE 44

Conventional Cache – Lookup

[Figure: conventional cache holding tag X and data [X]; a lookup for address X+STRIDE arrives]

slide-45
SLIDE 45

Conventional Cache – Lookup

[Figure: conventional cache holding tag X and data [X]; a lookup for address X+STRIDE arrives]

[X+STRIDE] approximately similar to [X]

slide-46
SLIDE 46

Conventional Cache – Lookup

[Figure: conventional cache holding tag X and data [X]; the lookup for address X+STRIDE selects a set through the index function, the tag comparison (=) fails, MISS]

[X+STRIDE] approximately similar to [X]

slide-47
SLIDE 47

Conventional Cache – Lookup

[Figure: conventional cache holding tag X and data [X]; the lookup for address X+STRIDE selects a set through the index function, the tag comparison (=) fails, MISS]

[X+STRIDE] approximately similar to [X]

Incurs data movement cost!

slide-48
SLIDE 48

Conventional Cache – Lookup

[Figure: conventional cache holding tag X and data [X]; the lookup for address X+STRIDE is a MISS, and the data array is involved in handling it]

[X+STRIDE] approximately similar to [X]

slide-49
SLIDE 49

Conventional Cache – Lookup

[Figure: conventional cache holding tag X and data [X]; the lookup for address X+STRIDE is a MISS, and the data array is involved in handling it]

[X+STRIDE] approximately similar to [X]

Incurs data storage cost!

slide-50
SLIDE 50

The Bunker Cache

[Figure: Bunker Cache holding tag X and data [X], accessed by physical address]

slide-51
SLIDE 51

The Bunker Cache

[Figure: Bunker Cache holding tag X and data [X]; a similarity mapping converts the physical address into a Bunker address]

slide-52
SLIDE 52

The Bunker Cache – Approximate Lookup

[Figure: Bunker Cache holding tag X and data [X]; a lookup for address X+STRIDE passes through the similarity mapping, which converts the physical address into a Bunker address]

slide-53
SLIDE 53

The Bunker Cache – Approximate Lookup

[Figure: Bunker Cache holding tag X and data [X]; a lookup for address X+STRIDE passes through the similarity mapping, which converts the physical address into a Bunker address]

bunker_addr(X+STRIDE) == bunker_addr(X)

slide-54
SLIDE 54

The Bunker Cache – Approximate Lookup

[Figure: Bunker Cache lookup for address X+STRIDE; after the similarity mapping, the index function selects the set, the stored tag X matches (=), HIT]

bunker_addr(X+STRIDE) == bunker_addr(X)

slide-55
SLIDE 55

The Bunker Cache – Approximate Lookup

[Figure: Bunker Cache lookup for address X+STRIDE; after the similarity mapping, the index function selects the set, the stored tag X matches (=), HIT]

bunker_addr(X+STRIDE) == bunker_addr(X)

saves movement cost

slide-56
SLIDE 56

The Bunker Cache – Approximate Lookup

[Figure: Bunker Cache lookup for address X+STRIDE; after the similarity mapping, the stored tag X matches (=), HIT, and the data [X] is returned]

bunker_addr(X+STRIDE) == bunker_addr(X)

slide-57
SLIDE 57

The Bunker Cache – Approximate Lookup

[Figure: Bunker Cache lookup for address X+STRIDE; after the similarity mapping, the stored tag X matches (=), HIT, and the data [X] is returned]

bunker_addr(X+STRIDE) == bunker_addr(X)

saves storage cost

slide-58
SLIDE 58

The Bunker Cache – Approximate Lookup

[Figure: Bunker Cache lookup for address X+STRIDE; after the similarity mapping, the stored tag X matches (=), HIT, and the data [X] is returned]

bunker_addr(X+STRIDE) == bunker_addr(X)

saves storage cost

Rest of cache hardware unchanged!
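
The approximate lookup can be sketched as an ordinary set-associative lookup with one extra step in front: the physical block address is first converted to a Bunker address. The mapping below follows the formula given on the later Similarity Mapping slide, read with STRIDE*RADIX grouped; the cache sizes, type names, and replacement policy are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_SETS 4   /* toy sizes, assumed */
#define NUM_WAYS 2

typedef struct { bool valid; uint64_t tag; } line_t;

typedef struct {
    line_t   sets[NUM_SETS][NUM_WAYS];
    uint64_t stride;  /* in blocks; must be nonzero */
    uint64_t radix;   /* must be nonzero */
} bunker_cache;

/* Similarity mapping, applied before the index function. */
static uint64_t bunker_addr(const bunker_cache *c, uint64_t phys)
{
    uint64_t span = c->stride * c->radix;
    return (phys / span) * c->stride + (phys % span) % c->stride;
}

/* Same tag check as a conventional cache, but on the Bunker address. */
bool bunker_lookup(const bunker_cache *c, uint64_t phys)
{
    uint64_t b = bunker_addr(c, phys);
    const line_t *set = c->sets[b % NUM_SETS];
    for (int way = 0; way < NUM_WAYS; way++)
        if (set[way].valid && set[way].tag == b / NUM_SETS)
            return true;
    return false;
}

/* Installs a block under its Bunker address (placeholder replacement). */
void bunker_fill(bunker_cache *c, uint64_t phys)
{
    assert(c->stride > 0 && c->radix > 0);
    uint64_t b = bunker_addr(c, phys);
    line_t *set = c->sets[b % NUM_SETS];
    int victim = 0;
    for (int way = 0; way < NUM_WAYS; way++)
        if (!set[way].valid) { victim = way; break; }
    set[victim].valid = true;
    set[victim].tag = b / NUM_SETS;
}
```

With `stride = 4` and `radix = 3`, filling block 1 makes lookups of blocks 5 and 9 hit as well, while block 2 and block 13 still miss. Only the address fed to the index function and tag comparison changes, which is consistent with the slide's point that the rest of the cache hardware is unchanged.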

slide-59
SLIDE 59

The Bunker Cache – Similarity Mapping

[Figure: physical address space mapped to Bunker address space]

slide-60
SLIDE 60

The Bunker Cache – Similarity Mapping

STRIDE: distance between approximately similar blocks

RADIX: degree (i.e., aggressiveness) of approximation

[Figure: physical address space mapped to Bunker address space]

slide-61
SLIDE 61

The Bunker Cache – Similarity Mapping

STRIDE: distance between approximately similar blocks

RADIX: degree (i.e., aggressiveness) of approximation

[Figure: physical address space, blocks 0 1 2 3 4 5 6 7 8 9 A B]

slide-62
SLIDE 62

The Bunker Cache – Similarity Mapping

STRIDE: distance between approximately similar blocks

RADIX: degree (i.e., aggressiveness) of approximation

[Figure: physical address space, blocks 0 1 2 3 4 5 6 7 8 9 A B, with STRIDE = 4]

slide-63
SLIDE 63

The Bunker Cache – Similarity Mapping

STRIDE: distance between approximately similar blocks

RADIX: degree (i.e., aggressiveness) of approximation

[Figure: physical address space, blocks 0 1 2 3 4 5 6 7 8 9 A B, with STRIDE = 4 and RADIX = 3]

slide-64
SLIDE 64

The Bunker Cache – Similarity Mapping

STRIDE: distance between approximately similar blocks

RADIX: degree (i.e., aggressiveness) of approximation

[Figure: physical address space, blocks 0 1 2 3 4 5 6 7 8 9 A B, mapped to Bunker address space, blocks 1 2 3 4 5 6 7, with STRIDE = 4 and RADIX = 3]

slide-65
SLIDE 65

The Bunker Cache – Similarity Mapping

STRIDE: distance between approximately similar blocks

RADIX: degree (i.e., aggressiveness) of approximation

[Figure: physical address space, blocks 0 1 2 3 4 5 6 7 8 9 A B, mapped to Bunker address space, blocks 1 2 3 4 5 6 7, with STRIDE = 4 and RADIX = 3]

slide-66
SLIDE 66

The Bunker Cache – Similarity Mapping

STRIDE: distance between approximately similar blocks

RADIX: degree (i.e., aggressiveness) of approximation

[Figure: physical address space mapped to Bunker address space]

// skip if precise

slide-67
SLIDE 67

The Bunker Cache – Similarity Mapping

STRIDE: distance between approximately similar blocks

RADIX: degree (i.e., aggressiveness) of approximation

[Figure: physical address space mapped to Bunker address space]

// skip if precise
bunker_addr  = (phys_addr / (STRIDE * RADIX)) * STRIDE;
bunker_addr += (phys_addr % (STRIDE * RADIX)) % STRIDE;
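
The mapping can be checked against the STRIDE = 4, RADIX = 3 example with a short sketch. It reads the slide's formula with STRIDE*RADIX grouped (C precedence would otherwise evaluate the division first); the function signature and the use of block-granularity addresses are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Maps a physical block address to a Bunker block address.
 * Blocks that are a multiple of stride apart within the same group of
 * stride * radix blocks collapse onto one Bunker address. */
uint64_t bunker_addr(uint64_t phys_addr, uint64_t stride, uint64_t radix)
{
    assert(stride > 0 && radix > 0);
    uint64_t span   = stride * radix;                /* blocks per group       */
    uint64_t group  = phys_addr / span;              /* which group            */
    uint64_t offset = (phys_addr % span) % stride;   /* position within stride */
    return group * stride + offset;
}
```

With STRIDE = 4 and RADIX = 3, physical blocks 0, 4, and 8 all map to Bunker address 0, blocks 1, 5, and 9 map to 1, and block 0xC (the first block of the next group) maps to 4. With RADIX = 1 the mapping is the identity, so no approximation takes place.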

slide-68
SLIDE 68

The Bunker Cache – Additional Details


Coherence and dirty state:

  • requires a separate directory structure that bypasses similarity mapping
slide-69
SLIDE 69

The Bunker Cache – Additional Details


Coherence and dirty state:

  • requires a separate directory structure that bypasses similarity mapping

Drowsy blocks:

  • many-to-one mapping offers more opportunity for low-leakage storage
slide-70
SLIDE 70

The Bunker Cache – Additional Details


Coherence and dirty state:

  • requires a separate directory structure that bypasses similarity mapping

Drowsy blocks:

  • many-to-one mapping offers more opportunity for low-leakage storage

Dynamic quality control:

  • can tune RADIX and STRIDE on-the-fly via periodic quality checks

More details in paper
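
As a rough illustration of dynamic quality control, the sketch below shows one step of a hypothetical controller that lowers RADIX when a periodic quality check falls short. The policy (halving), the function name, and the use of an SNR target are all assumptions; the slides defer the actual mechanism to the paper.

```c
#include <assert.h>

/* One step of a hypothetical RADIX controller (not the paper's policy).
 * If measured quality is below the target, approximate less aggressively
 * by halving RADIX; RADIX = 1 means no approximation. */
unsigned adjust_radix(unsigned radix, double measured_snr_db,
                      double target_snr_db)
{
    assert(radix >= 1);
    if (measured_snr_db < target_snr_db && radix > 1)
        return radix / 2;
    return radix;
}
```

For example, `adjust_radix(8, 10.0, 20.0)` returns 4, while a check that meets the target leaves RADIX unchanged.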

slide-71
SLIDE 71

Evaluation

  • Applications: PERFECT and AxBench
  • Performance: Full-system cycle-level simulation
  • Energy and Power: CACTI
  • Quality: Pin simulation, signal-to-noise ratio (SNR)
  • Configuration:
      • 4-core CMP, 16KB private L1, 128KB private L2
      • 2MB shared LLC, 2K-entry directory
      • STRIDE selected based on application's data set dimensions
      • RADIX varied in results
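
The slides do not spell out how SNR is computed. Assuming the common definition, with $x_i$ the elements of the precise output and $\hat{x}_i$ the elements of the approximate output, the quality metric in decibels would be:

```latex
\mathrm{SNR}_{\mathrm{dB}} = 10 \log_{10} \frac{\sum_{i} x_i^{2}}{\sum_{i} \left( x_i - \hat{x}_i \right)^{2}}
```

Under this definition a higher value means the approximate output is closer to the precise one, and an exact match gives an unbounded SNR.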
slide-72
SLIDE 72

Evaluation – Application Output Quality

[Figure: application output quality, SNR (dB, axis from 5 to 40) versus similarity radix (1, 2, 4, 8, 16, 32, 64); higher is better]

slide-73
SLIDE 73

Evaluation – Application Output Quality

[Figure: application output quality for histeq, SNR (dB, axis from 5 to 40) versus similarity radix (1, 2, 4, 8, 16, 32, 64); higher is better]

slide-74
SLIDE 74

Evaluation – Application Output Quality

[Figure: application output quality for histeq, SNR (dB, axis from 5 to 40) versus similarity radix (1, 2, 4, 8, 16, 32, 64); higher is better; annotated with "precise" and "radix 4"]

slide-75
SLIDE 75

Evaluation – Application Output Quality

[Figure: application output quality, SNR (dB, axis from 5 to 40) versus similarity radix (1, 2, 4, 8, 16, 32, 64); higher is better]

slide-76
SLIDE 76

Evaluation – Application Output Quality

[Figure: application output quality for change-detection, SNR (dB, axis from 5 to 40) versus similarity radix (1, 2, 4, 8, 16, 32, 64); higher is better]

slide-77
SLIDE 77

Evaluation – Application Output Quality

[Figure: application output quality for change-detection, SNR (dB, axis from 5 to 40) versus similarity radix (1, 2, 4, 8, 16, 32, 64); higher is better; annotated with "precise" and "radix 4"]

slide-78
SLIDE 78

Evaluation – Application Output Quality

[Figure: application output quality, SNR (dB, axis from 5 to 40) versus similarity radix (1, 2, 4, 8, 16, 32, 64); higher is better]

slide-79
SLIDE 79

Evaluation – Application Speedup

[Figure: speedup (axis from 1.0x to 2.2x) versus similarity radix (1, 2, 4, 8, 16, 32, 64); higher is better]

slide-80
SLIDE 80

Evaluation – Dynamic Energy Savings

[Figure: dynamic energy savings (axis from 1.0x to 2.2x) versus similarity radix (1, 2, 4, 8, 16, 32, 64); higher is better]

slide-81
SLIDE 81

Evaluation – Leakage (Drowsy) Power Savings

[Figure: leakage power savings (axis from 1.0x to 2.2x) versus similarity radix (1, 2, 4, 8, 16, 32, 64); higher is better]

slide-82
SLIDE 82

Conclusion

[Figure: memory hierarchy with processor core, private caches, shared last-level cache, and off-chip memory]

high cost of storing data

high cost of moving data

slide-83
SLIDE 83

Conclusion

[Figure: memory hierarchy with processor core, private caches, Bunker Cache, and off-chip memory]

slide-84
SLIDE 84

Conclusion

[Figure: memory hierarchy with processor core, private caches, Bunker Cache, and off-chip memory]

cache prefetching without fetches

e.g., a request for X implicitly prefetches X+STRIDE

slide-85
SLIDE 85

Conclusion

[Figure: memory hierarchy with processor core, private caches, Bunker Cache, and off-chip memory]

cache prefetching without fetches

e.g., a request for X implicitly prefetches X+STRIDE

cache compression without values

e.g., storage of X+STRIDE is saved without needing [X+STRIDE]

slide-86
SLIDE 86

Conclusion

[Figure: memory hierarchy with processor core, private caches, Bunker Cache, and off-chip memory]

cache prefetching without fetches

e.g., a request for X implicitly prefetches X+STRIDE

cache compression without values

e.g., storage of X+STRIDE is saved without needing [X+STRIDE]

savings in runtime (1.58x), dynamic energy (1.72x), leakage power (1.65x) at radix 4

slide-87
SLIDE 87

Thank you

The Bunker Cache for Spatio-Value Approximation

Joshua San Miguel, Jorge Albericio, Natalie Enright Jerger, Aamer Jaleel