Doppelgänger: A Cache for Approximate Computing



slide-1
SLIDE 1

Doppelgänger: A Cache for Approximate Computing

Joshua San Miguel Jorge Albericio Andreas Moshovos Natalie Enright Jerger

slide-2
SLIDE 2

Cache Hierarchy

2

main memory processor core private caches shared last-level cache

slide-5
SLIDE 5

Cache Hierarchy

5

processor core private caches main memory shared last-level cache

Accessing main memory costs 10x to 100x more latency and energy than accessing a private cache! Need a hierarchy of large caches…

slide-9
SLIDE 9

Cache Hierarchy

9

main memory processor core private caches shared last-level cache

But the last-level cache consumes substantial energy and takes up 30% to 50% of chip area! Higher efficiency via approximate computing…

slide-12
SLIDE 12

Summary

12

Doppelgänger Cache:

  • Identifies approximate similarity in data block values.
      • 77% cache storage savings of approximable data.
  • Effectively compresses storage of approximately similar blocks.
      • 3x better compression ratio than state-of-the-art techniques.
  • Significantly reduces area and energy consumption.
      • Reduces total on-chip cache area by 1.36x.

slide-13
SLIDE 13

Outline

  • Approximate Computing
  • Approximate Similarity
  • Doppelgänger Cache
  • Cache Architecture
  • Similarity Mapping
  • Evaluation

13

slide-14
SLIDE 14

Approximate Computing

Not all data/computations need to be precise.

14

http://www.zentut.com/ http://www.businessweek.com/ http://www.cc.gatech.edu/~cnieto6/ http://www.analyticbridge.com/ http://themusicparlour.blogspot.ca/ http://www.scientific-computing.com/

  • Data mining
  • Computer vision
  • Audio and video processing
  • Gaming
  • Machine learning
  • Dynamical simulation

slide-15
SLIDE 15

Approximate Similarity

15

Two data blocks are approximately similar (i.e., doppelgängers) if replacing the values of one with the other still results in acceptable application output in the end.

slide-21
SLIDE 21

Approximate Similarity

21

Block 1: 92 131 183 91 132 186
Block 2: 90 131 185 93 133 184
Block 3: 35 31 29 43 38 37

Blocks 1 and 2 are approximately similar.

Two data blocks are approximately similar (i.e., doppelgängers) if replacing the values of one with the other still results in acceptable application output in the end.

Allows for 77% cache storage savings of approximable data!
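
As a rough illustration of the example values above, the sketch below treats two blocks as approximately similar when every pair of corresponding elements differs by at most a small tolerance. The grouping of the 18 values into three six-element blocks, the element-wise test, and the tolerance of 4 are all assumptions made here. The slides define similarity by acceptable application output, which this simple proxy does not measure.

```python
# Example values from the slide, grouped six per block (the grouping is an assumption).
BLOCK_1 = [92, 131, 183, 91, 132, 186]
BLOCK_2 = [90, 131, 185, 93, 133, 184]
BLOCK_3 = [35, 31, 29, 43, 38, 37]


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length blocks."""
    return max(abs(x - y) for x, y in zip(a, b))


def approx_similar(a, b, tolerance=4):
    """Hypothetical proxy: blocks are 'similar' if no element differs by more than tolerance."""
    return len(a) == len(b) and max_abs_diff(a, b) <= tolerance


if __name__ == "__main__":
    print(approx_similar(BLOCK_1, BLOCK_2))  # blocks 1 and 2
    print(approx_similar(BLOCK_1, BLOCK_3))  # blocks 1 and 3
```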

slide-22
SLIDE 22

Outline

  • Approximate Computing
  • Approximate Similarity
  • Doppelgänger Cache
  • Cache Architecture
  • Similarity Mapping
  • Evaluation

22

slide-24
SLIDE 24

Doppelgänger Cache

24

main memory processor core private caches shared last-level cache

How can we exploit approximate similarity to save area and energy in the last-level cache?

slide-26
SLIDE 26

Doppelgänger Cache

26

main memory processor core private caches shared LLC precise LLC Doppelgänger LLC

slide-29
SLIDE 29

Conventional Cache

29

tag array, data array; address from L2; data from memory

One-to-one mapping of data values to memory locations. But the fundamental goal of a processor is to process data values, not memory locations…

slide-34
SLIDE 34

Conventional Cache

34

tag array, data array; address from L2; data from memory

Multiple copies of approximately similar blocks.

slide-37
SLIDE 37

Doppelgänger Cache

37

tag array, approximate data array; address from L2; data from memory

Smaller data array allows for substantial area and energy savings.

slide-39
SLIDE 39

Doppelgänger Cache

39

tag array: (tag 0, map X), (tag 1, map X), (tag 2, map X), (tag 3, map X)
approximate data array: (map X, data block A)

address from L2; data from memory

slide-43
SLIDE 43

Doppelgänger Cache - Lookups

43

tag array: (tag 0, map X), (tag 1, map X), (tag 2, map X), (tag 3, map X)
approximate data array: (map X, data block A)

Lookup: address 0 from L2 → map X from tag array → data A to L2
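
The two-step lookup on this slide can be sketched with two dictionaries: the tag array takes an address tag to a map value, and the approximate data array takes a map value to a data block. The dictionary representation and the `lookup` name are illustrative assumptions, not the hardware organization described in the paper.

```python
# State from the slide: four tags share map X, which points at one data block.
tag_array = {0: "X", 1: "X", 2: "X", 3: "X"}   # address tag -> map value
data_array = {"X": "data block A"}             # map value -> stored block


def lookup(address_tag):
    """Return the block for an address tag, or None when the tag is absent."""
    map_value = tag_array.get(address_tag)      # step 1: tag array yields the map
    if map_value is None:
        return None                             # tag miss
    return data_array[map_value]                # step 2: map indexes the data array


if __name__ == "__main__":
    print(lookup(0))  # address 0 from L2 -> map X -> data block A
```

All four tags resolve to the same stored block, which is where the storage saving comes from.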

slide-49
SLIDE 49

Doppelgänger Cache - Insertions

49

tag array: (tag 0, map X), (tag 5, map Y), (tag 1, map X), (tag 2, map X), (tag 3, map X)
approximate data array: (map X, data block A); map Y is looked up

Insertion: address 5 from L2 → data B from memory → data B to L2; generate map Y from data B

Miss!

slide-51
SLIDE 51

Doppelgänger Cache - Insertions (Miss)

51

tag array: (tag 0, map X), (tag 5, map Y), (tag 1, map X), (tag 2, map X), (tag 3, map X)
approximate data array: (map X, data block A), (map Y, data block B)

Insertion: address 5 from L2 → data B from memory → data B to L2; generate map Y from data B

slide-57
SLIDE 57

Doppelgänger Cache - Insertions

57

tag array: (tag 0, map X), (tag 6, map X), (tag 5, map Y), (tag 1, map X), (tag 2, map X), (tag 3, map X)
approximate data array: (map X, data block A), (map Y, data block B)

Insertion: address 6 from L2 → data C from memory → data C to L2; generate map X from data C

Hit!

slide-60
SLIDE 60

Doppelgänger Cache - Insertions (Hit)

60

tag array: (tag 0, map X), (tag 6, map X), (tag 5, map Y), (tag 1, map X), (tag 2, map X), (tag 3, map X)
approximate data array: (map X, data block A), (map Y, data block B)

Insertion: address 6 from L2 → data C from memory → data C to L2; generate map X from data C

Data block A serves as an acceptable approximation of data block C.

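
The insertion walk-through (a miss for data B, a hit for data C) can be sketched as a small class. This is a minimal model under stated assumptions: `DoppelgangerCache` and `coarse_average_map` are hypothetical names, the map function (integer average divided into steps of 16) is only a stand-in for the paper's hash and mapping, and replacement, writes and coherence are not modeled.

```python
def coarse_average_map(block):
    """Hypothetical map function: integer average of the block, in steps of 16."""
    return (sum(block) // len(block)) // 16


class DoppelgangerCache:
    def __init__(self, map_fn):
        self.map_fn = map_fn
        self.tag_array = {}    # address tag -> map value
        self.data_array = {}   # map value -> stored block

    def insert(self, tag, block):
        """Insert a block fetched from memory; return True on a data-array hit."""
        map_value = self.map_fn(block)        # generate the map from the data
        self.tag_array[tag] = map_value       # the tag always records its map
        hit = map_value in self.data_array
        if not hit:
            self.data_array[map_value] = block  # miss: store the new block
        return hit                              # hit: reuse the similar block

    def lookup(self, tag):
        map_value = self.tag_array.get(tag)
        return None if map_value is None else self.data_array[map_value]


if __name__ == "__main__":
    cache = DoppelgangerCache(coarse_average_map)
    data_a = [92, 131, 183, 91, 132, 186]
    data_b = [35, 31, 29, 43, 38, 37]
    data_c = [90, 131, 185, 93, 133, 184]
    print(cache.insert(0, data_a), cache.insert(5, data_b), cache.insert(6, data_c))
```

After the third insertion, a lookup of tag 6 returns the values of data A, mirroring the slide's statement that A stands in for C.
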
slide-63
SLIDE 63

Doppelgänger Cache - Similarity Mapping

63

data block A (A[0], A[1], …, A[n]) → hash function → hash → mapping → map

The map value represents the signature (or likeness) of a block. Blocks that generate the same map value are approximately similar.

slide-64
SLIDE 64

Doppelgänger Cache - Similarity Mapping

64

data block A (A[0], A[1], …, A[n]) → hash function → hash → mapping → map

Aggregates values in block:

hash = AVG(A[0], …, A[n])

The map value represents the signature (or likeness) of a block. Blocks that generate the same map value are approximately similar.
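
The averaging hash on this slide is simple enough to write out directly. The function name `block_hash` is an assumption, and the paper may describe further hash functions that this sketch does not cover.

```python
def block_hash(block):
    """hash = AVG(A[0], ..., A[n]): arithmetic mean of the block's values."""
    if not block:
        raise ValueError("block must contain at least one value")
    return sum(block) / len(block)


if __name__ == "__main__":
    print(block_hash([2, 4, 6]))
```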

slide-66
SLIDE 66

Doppelgänger Cache - Similarity Mapping

66

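data block A (A[0], A[1], …, A[n]) → hash function → hash → mapping → map

Discretizes hash value:

[Figure: the mapping step takes all possible hash values to an M-bit map; hash values that share a map value are labeled approximately similar.]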

The map value represents the signature (or likeness) of a block. Blocks that generate the same map value are approximately similar.
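
One way to picture the discretization step is to split the range of possible hash values into 2^M equal-width buckets and use the bucket index as the map value. Equal-width buckets, the `to_map` name, and the explicit `lo`/`hi` range parameters are assumptions for illustration; the slides only say that the hash is discretized into an M-bit map.

```python
def to_map(hash_value, lo, hi, m_bits):
    """Discretize hash_value from the range [lo, hi] into an m_bits-wide map value."""
    if not lo <= hash_value <= hi:
        raise ValueError("hash value outside the expected range")
    buckets = 1 << m_bits                              # 2^M possible map values
    index = int((hash_value - lo) / (hi - lo) * buckets)
    return min(index, buckets - 1)                     # keep hi inside the last bucket


if __name__ == "__main__":
    # Two nearby hash values land in the same bucket; a distant one does not.
    print(to_map(135.8, 0, 255, 4), to_map(136.0, 0, 255, 4), to_map(35.5, 0, 255, 4))
```

A larger M gives narrower buckets, so fewer blocks share a map value and the approximation becomes tighter.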

slide-67
SLIDE 67

Doppelgänger Cache

67

main memory processor core private caches shared LLC precise LLC Doppelgänger LLC

slide-69
SLIDE 69

uniDoppelgänger Cache

69

main memory processor core private caches shared LLC uniDoppelgänger LLC

Precise blocks simply use physical address as the map value.
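
A minimal sketch of how a unified cache might choose the map value: approximable blocks get a value-derived map, while precise blocks use their physical address and so never share storage with a different address. The `map_value` function, the `approx_map_fn` parameter, and the `("approx", …)` / `("precise", …)` tagging used to keep the two kinds of map apart are assumptions made here, not details from the slides.

```python
def map_value(block, physical_address, approximable, approx_map_fn):
    """Pick the map for a block in a unified (precise + approximate) cache."""
    if approximable:
        # Value-based map: similar blocks may collapse onto one stored block.
        return ("approx", approx_map_fn(block))
    # Precise block: the physical address is the map, so storage is one-to-one.
    return ("precise", physical_address)


if __name__ == "__main__":
    same_data = [1, 2, 3]
    print(map_value(same_data, 0x1000, False, sum))
    print(map_value(same_data, 0x2000, False, sum))
```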

slide-70
SLIDE 70

Doppelgänger Cache

70

More details in paper:

  • Cache writes, replacements and coherence.
  • Details on hash functions and mapping.
  • Sensitivity to size of map space and data array.
  • Evaluation of uniDoppelgänger.

slide-71
SLIDE 71

Outline

  • Approximate Computing
  • Approximate Similarity
  • Doppelgänger Cache
  • Cache Architecture
  • Similarity Mapping
  • Evaluation

71

slide-72
SLIDE 72

Evaluation

72

  • Applications: PARSEC and AxBench
  • Performance: Full-system cycle-level simulation
  • Error: Pin simulation
  • Area and Energy: CACTI
  • Configuration:
      • 4 cores, private L1 and L2
      • 2MB shared LLC (1MB precise, 1MB Doppelgänger)
      • Doppelgänger: 14-bit similarity map, 1/4 data array

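
The configuration numbers can be turned into entry counts with a little arithmetic. This rests on two assumptions that the slide does not state: a 64-byte block size, and reading "1MB Doppelgänger, 1/4 data array" as tags covering 1 MB worth of blocks backed by a data array one quarter that size.

```python
BLOCK_BYTES = 64          # assumption: block size is not given on the slide
MAP_BITS = 14             # from the slide: 14-bit similarity map

tag_entries = (1 << 20) // BLOCK_BYTES          # tags covering 1 MB of blocks
data_entries = tag_entries // 4                 # 1/4 data array
data_array_bytes = data_entries * BLOCK_BYTES   # storage actually provisioned
map_values = 1 << MAP_BITS                      # distinct similarity map values

if __name__ == "__main__":
    print(tag_entries, data_entries, data_array_bytes, map_values)
```
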
slide-73
SLIDE 73

Evaluation - Compression Ratio

73

[Figure: bar chart of compression ratio (axis 0x to 6x, higher is better) for BΔI, exact deduplication, Doppelgänger, and Doppelgänger + BΔI.]

slide-76
SLIDE 76

Evaluation

76

[Figure: bar chart (axis 0.8x to 1.4x, higher is better) of application output accuracy, application performance, total cache dynamic energy reduction, total cache leakage energy reduction, and total cache area reduction.]

slide-78
SLIDE 78

Conclusion

78

Doppelgänger Cache:

  • Identifies approximate similarity in data block values.
      • 77% cache storage savings of approximable data.
  • Effectively compresses storage of approximately similar blocks.
      • 3x better compression ratio than state-of-the-art techniques.
  • Significantly reduces area and energy consumption.
      • Reduces total on-chip cache area by 1.36x.

slide-79
SLIDE 79

Thank you

Doppelgänger: A Cache for Approximate Computing

Joshua San Miguel Jorge Albericio Andreas Moshovos Natalie Enright Jerger