Doppelgänger: A Cache for Approximate Computing

Joshua San Miguel, Jorge Albericio, Andreas Moshovos, Natalie Enright Jerger

Cache Hierarchy (diagram: processor core, private caches, shared last-level cache, main memory)
Accessing main memory costs 10x–100x more latency and energy than accessing a private cache, so we need a hierarchy of large caches.

But the last-level cache consumes substantial energy and takes up 30%–50% of chip area. Approximate computing offers a path to higher efficiency.

Summary
Doppelgänger Cache:
- Identifies approximate similarity in data block values.
  - 77% cache storage savings of approximable data.
- Effectively compresses storage of approximately similar blocks.
  - 3x better compression ratio than state-of-the-art techniques.
- Significantly reduces area and energy consumption.
  - Reduces total on-chip cache area by 1.36x.

Outline
- Approximate Computing
  - Approximate Similarity
- Doppelgänger Cache
  - Cache Architecture
  - Similarity Mapping
- Evaluation

Approximate Computing
Not all data or computations need to be precise. Example domains: data mining, computer vision, audio and video processing, gaming, machine learning, dynamical simulation.
Image credits: http://www.zentut.com/, http://www.cc.gatech.edu/~cnieto6/, http://themusicparlour.blogspot.ca/, http://www.businessweek.com/, http://www.analyticbridge.com/, http://www.scientific-computing.com/

Approximate Similarity
Two data blocks are approximately similar (i.e., doppelgängers) if replacing the values of one with the other still yields acceptable final application output.

Example blocks:
Block 1: 92 131 183 91 132 186
Block 2: 90 131 185 93 133 184   (approximately similar to Block 1)
Block 3: 35 31 29 43 38 37

Exploiting approximate similarity allows for 77% cache storage savings of approximable data.
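The definition above depends on the application's final output, which a cache cannot observe directly. As a minimal Python sketch of how hardware might approximate it, each block can be reduced to a short signature (a "map") built from the block's average and range, with blocks that share a map treated as doppelgängers. The function `similarity_map`, the assumption of 8-bit element values and the bucket width of 32 are hypothetical choices for illustration; this part of the talk does not specify how the map is computed.

```python
from collections.abc import Sequence

# Hypothetical parameter: element values are assumed to be 8-bit (0..255),
# and both statistics are quantized into buckets of width 32.
BUCKET_WIDTH = 32


def similarity_map(block: Sequence[int]) -> tuple[int, int]:
    """Return a coarse signature (hypothetical) for a data block.

    Blocks whose integer average and range fall into the same buckets get
    the same map value and are treated as approximately similar.
    """
    average = sum(block) // len(block)        # integer mean of the elements
    value_range = max(block) - min(block)     # spread of the elements
    return (average // BUCKET_WIDTH, value_range // BUCKET_WIDTH)


block1 = [92, 131, 183, 91, 132, 186]
block2 = [90, 131, 185, 93, 133, 184]
block3 = [35, 31, 29, 43, 38, 37]

print(similarity_map(block1))  # (4, 2)
print(similarity_map(block2))  # (4, 2): same map as Block 1
print(similarity_map(block3))  # (1, 0): different map
```

With these parameters Blocks 1 and 2 share a map while Block 3 does not, matching the example above. Coarser buckets would merge more blocks (more savings, more error); finer buckets would merge fewer.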


Doppelgänger Cache
How can we exploit approximate similarity to save area and energy in the last-level cache?
(diagram: between the private caches and main memory, the shared LLC is split into a precise LLC and a Doppelgänger LLC)

Conventional Cache (diagram: address from L2 → tag array → data array; data from memory)
A conventional cache keeps a one-to-one mapping of data values to memory locations. But the fundamental goal of a processor is to process data values, not memory locations. A conventional cache can therefore hold multiple copies of approximately similar blocks.

Doppelgänger Cache (diagram: address from L2 → tag array → approximate data array; data from memory)
A smaller data array allows for substantial area and energy savings. Each tag-array entry stores a map value alongside its tag. In the example, tags 0, 1, 2 and 3 all hold map X, and the approximate data array stores a single entry under map X, data block A, shared by all four tags.
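The decoupled organization can be modeled with two Python dictionaries: the tag array maps each tag to a map value, and the data array maps each map value to one block. The sketch below uses hypothetical names and the slide's example state to count how many data blocks each design would store for the same four tags. It ignores tag and map field widths, so it illustrates the sharing rather than real area numbers.

```python
# Hypothetical toy state from the slide: tags 0-3 all carry map "X".
tag_array = {0: "X", 1: "X", 2: "X", 3: "X"}  # tag -> map value
data_array = {"X": "data block A"}            # map value -> one shared data block

# A conventional cache stores one data block per valid tag...
conventional_blocks = len(tag_array)
# ...while a Doppelgänger-style cache stores one block per distinct map value.
doppelganger_blocks = len(set(tag_array.values()))

assert doppelganger_blocks == len(data_array)
print(conventional_blocks, doppelganger_blocks)  # 4 1
```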

Doppelgänger Cache - Lookups
1. Address 0 arrives from L2 and is looked up in the tag array.
2. The tag array returns map X for tag 0.
3. Map X indexes the approximate data array, and data block A is returned to L2.
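The lookup path is a two-step indirection: tag to map, then map to data. A minimal Python sketch, using dictionaries in place of the set-associative hardware arrays and the slide's example state (all names hypothetical), assuming every map held in the tag array has a matching data-array entry:

```python
from typing import Optional

# Hypothetical cache state mirroring the slides: tags 0-3 all carry map "X",
# and the approximate data array holds a single block A under map "X".
tag_array: dict[int, str] = {0: "X", 1: "X", 2: "X", 3: "X"}
data_array: dict[str, str] = {"X": "data block A"}


def lookup(address_tag: int) -> Optional[str]:
    """Return the (approximate) data for a tag, or None on a tag-array miss."""
    map_value = tag_array.get(address_tag)  # step 1: tag array lookup yields the map
    if map_value is None:
        return None                          # LLC miss: tag not present
    # Step 2: the map indexes the data array (assumes the map entry exists).
    return data_array[map_value]


print(lookup(0))  # data block A
print(lookup(3))  # data block A: different address, same shared block
print(lookup(5))  # None
```

Addresses 0 and 3 return the same block A. This sharing is the source of both the storage savings and the approximation: a request may receive its doppelgänger's values rather than its own.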

Doppelgänger Cache - Insertions
Starting from the same state (tags 0–3 all hold map X, which points to data block A):
1. Address 5 arrives from L2, and a new tag-array entry is allocated for tag 5.
2. Data B is fetched from memory and forwarded to L2.
3. Map Y is generated from data B and recorded in tag 5's entry.
4. Map Y is looked up in the approximate data array.

Doppelgänger Cache - Insertions (Miss)
No block is stored under map Y, so the data-array lookup misses.
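The insertion steps can be sketched as a minimal Python model under stated assumptions: the map function (bucketed average and range) and every name are hypothetical, eviction and replacement in both arrays are not modeled, and the data-array hit case (a block already holds the new map, so the new tag simply shares it) is inferred from the shared-map organization rather than shown on these slides. On a data-array miss, as with map Y, the sketch installs the fetched block under its new map.

```python
from collections.abc import Sequence

BUCKET_WIDTH = 32  # hypothetical quantization step for 8-bit element values


def similarity_map(block: Sequence[int]) -> tuple[int, int]:
    """Hypothetical map: bucketed integer average and range of the block."""
    average = sum(block) // len(block)
    value_range = max(block) - min(block)
    return (average // BUCKET_WIDTH, value_range // BUCKET_WIDTH)


class DoppelgangerLLC:
    """Toy model: tag array maps tag -> map, data array maps map -> one block."""

    def __init__(self) -> None:
        self.tag_array: dict[int, tuple[int, int]] = {}
        self.data_array: dict[tuple[int, int], list[int]] = {}

    def insert(self, tag: int, block_from_memory: Sequence[int]) -> bool:
        """Handle an LLC miss whose data has arrived from memory.

        The requesting L2 already receives the exact data from memory; this
        method only updates the LLC. Returns True if the data array missed
        and the block was stored, False if an existing block was reused.
        """
        map_value = similarity_map(block_from_memory)  # generate map from the data
        self.tag_array[tag] = map_value                 # record map in the tag entry
        if map_value in self.data_array:
            return False                                # doppelgänger present: share it
        self.data_array[map_value] = list(block_from_memory)  # data-array miss: install
        return True


llc = DoppelgangerLLC()
block_a = [92, 131, 183, 91, 132, 186]
for tag in range(4):                # tags 0-3 end up sharing block A
    llc.insert(tag, block_a)

block_b = [35, 31, 29, 43, 38, 37]
print(llc.insert(5, block_b))       # True: B's map is new, so B is stored
print(len(llc.data_array))          # 2

block_c = [90, 131, 185, 93, 133, 184]
print(llc.insert(6, block_c))       # False: C's map matches A's, so A is shared
print(len(llc.data_array))          # 2
```

After these insertions, a later access to tag 6 would be served block A's values, which is exactly the approximation the design trades for a smaller data array.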
