

SLIDE 1

Analyzing the Impact of Useless Write-Backs on the Endurance and Energy Consumption of PCM Main Memory

Santiago Bock, Bruce Childers, Rami Melhem, Daniel Mossé and Youtao Zhang
University of Pittsburgh

SLIDE 2

Santiago Bock

Introduction

  • Datacenters are growing in size and number
  • Energy consumption will cost $7.4 billion in 2011
  • Memory consumes 20% to 40% of energy in a typical server
      • Larger memories due to multi-core
      • Smaller transistor sizes leak more current
  • PCM for main memory
      • Low static power due to non-volatility
      • Read performance comparable to DRAM
      • Better scalability than DRAM
      • High energy cost of writes
      • Limited write endurance

SLIDE 3

Motivation

  • A write-back is useless when its data is not used again
  • Avoiding useless write-backs requires future knowledge
  • Idea: use application information
      • Memory allocator
      • Control flow analysis
      • Stack pointer
  • Focus of this work
      • How many useless write-backs can be avoided?
      • What is the impact on endurance and energy consumption?
SLIDE 4

Outline

  • Introduction
  • Motivation
  • What is Phase Change Memory?
  • What are useless write-backs?
  • How do we count useless write-backs?
  • How much can we gain?
  • Conclusions
SLIDE 5

Background on PCM Main Memory

  • PCM writes
      • Modify physical state
      • Slow
      • High energy cost
      • Endurance limited to 10^6 to 10^8 writes per cell
  • Main memory architecture
      • L2 cache
      • Small DRAM cache (optional)
      • Large PCM main memory
SLIDE 6

Useless Write-Backs

Action     Cache Status   Comment
Write A    A              A becomes dirty
Read A     A              A is used
Read A     A              A is used again
Read B     B              A is evicted and written back
Write A    A              Original value of A is overwritten

After its last read, A is dead.

The write-back of A is useless because A is dead.
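The example above can be replayed in a few lines of Python. This is only a sketch with future knowledge of the trace, not the authors' tool. The function name, the trace encoding and the fully associative LRU cache are assumptions made for illustration. It also assumes that each address is one whole cache line, that a write overwrites the line completely, and that a written-back line that is never accessed again counts as useless.

```python
from collections import OrderedDict

def find_useless_write_backs(trace, capacity=1):
    """Return trace indices at which a useless write-back happens.

    trace: list of (op, addr) pairs with op "R" or "W" (hypothetical encoding).
    Assumption: one address is one cache line and a write overwrites it fully.
    """
    cache = OrderedDict()  # addr -> dirty flag, ordered from LRU to MRU
    pending = {}           # addr -> index of a write-back not yet classified
    useless = []
    for i, (op, addr) in enumerate(trace):
        # The first access after a write-back decides whether it was needed.
        if addr in pending:
            wb_index = pending.pop(addr)
            if op == "W":  # overwritten before being read again
                useless.append(wb_index)
        if addr in cache:
            cache.move_to_end(addr)
        else:
            if len(cache) >= capacity:
                victim, dirty = cache.popitem(last=False)  # evict the LRU line
                if dirty:
                    pending[victim] = i  # dirty victim is written back now
            cache[addr] = False
        if op == "W":
            cache[addr] = True
    # Assumption: data that is never accessed again was written back uselessly.
    useless.extend(pending.values())
    return sorted(useless)

# The five accesses from the slide, with a one-line cache.
slide_trace = [("W", "A"), ("R", "A"), ("R", "A"), ("R", "B"), ("W", "A")]
print(find_useless_write_backs(slide_trace))
```

The write-back is reported at the index of the access that evicts A (the read of B), which is the point where a real system would have to decide whether to write the line to PCM.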

SLIDE 7

Useless Write-Backs

  • Detecting useless write-backs
      • Difficult to identify last read before a write
      • Use program information to detect dead memory locations
  • Detecting dead memory locations depends on the type of memory region
      • Heap: use calls to malloc() and free()
      • Global: use control flow analysis
      • Stack: use the stack pointer
SLIDE 8

Analysis Framework

  • Trace: address and type of each memory reference
  • Analyzer: cache simulator and list of dead memory locations

[Figure: Program → Instrumentation → Trace → Analyzer → Useless write-backs → Model → Endurance gains and Energy savings. The Analyzer and the Model each take a Configuration as input.]

SLIDE 9

Analysis for Heap Data

Trace                 Cache   List of allocated blocks   List of dead blocks
malloc(1) returns 3           3,1
write to 3            a       3,1
free(3)               a                                  3,1
read from 7           b                                  3,1
malloc returns 3      b       3,1

After free(3), 3 becomes dead! When the read from 7 evicts a, the write-back of a is useless!
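A minimal sketch of this heap analysis, replaying the trace from the slide. The event encoding, the function name and the LRU cache model are assumptions for illustration, not the authors' analyzer. It also assumes that each address is one whole cache line, so a line is dead exactly when its address lies in a freed block that has not been reallocated.

```python
from collections import OrderedDict

def count_heap_useless_write_backs(events, capacity=1):
    """Return (total write-backs, useless write-backs) for a heap trace.

    events (hypothetical encoding): ("malloc", base, size), ("free", base),
    ("read", addr) or ("write", addr).
    """
    allocated = {}         # base address -> size of the live block
    dead = set()           # addresses freed and not yet reallocated
    cache = OrderedDict()  # addr -> dirty flag, ordered from LRU to MRU
    total = useless = 0
    for event in events:
        kind = event[0]
        if kind == "malloc":
            _, base, size = event
            allocated[base] = size
            dead.difference_update(range(base, base + size))  # live again
        elif kind == "free":
            _, base = event
            size = allocated.pop(base)
            dead.update(range(base, base + size))  # block becomes dead
        else:
            _, addr = event
            if addr in cache:
                cache.move_to_end(addr)
            else:
                if len(cache) >= capacity:
                    victim, dirty = cache.popitem(last=False)
                    if dirty:
                        total += 1
                        if victim in dead:  # nobody can read this data again
                            useless += 1
                cache[addr] = False
            if kind == "write":
                cache[addr] = True
    return total, useless

slide_events = [
    ("malloc", 3, 1),  # malloc(1) returns 3
    ("write", 3),
    ("free", 3),
    ("read", 7),       # evicts the dirty line holding address 3
    ("malloc", 3, 1),
]
print(count_heap_useless_write_backs(slide_events))
```

Unlike an oracle that looks ahead in the trace, this check only uses information available at eviction time, which is what makes allocator information attractive.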

SLIDE 10

Analysis for Global Data

Time   Trace     Cache   Objects (id, last access, last write-back)
1      write 5   a       5,1,0
3      read 5    a       5,3,0
7      read 9    b       5,3,7
9      write 5   a       5,9,7

At the write at time 9, the previous last access (3) is earlier than the last write-back (7).

3 < 7: useless write-back!
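The timestamp comparison can be sketched as follows. The function name, the (time, op, object id) encoding and the LRU cache are assumptions for illustration. The sketch also assumes that each object occupies exactly one cache line and that a write overwrites the whole object, which is the kind of fact the control flow analysis would have to establish.

```python
from collections import OrderedDict

def count_global_useless_write_backs(events, capacity=1):
    """Return (useless write-backs, {object id: (last access, last write-back)}).

    events: list of (time, op, obj) with op "read" or "write" (hypothetical).
    """
    cache = OrderedDict()  # obj -> dirty flag, ordered from LRU to MRU
    last_access = {}       # obj -> time of the most recent access
    last_wb = {}           # obj -> time of the most recent write-back
    useless = 0
    for time, op, obj in events:
        # An overwrite with no access since the last write-back means that
        # nobody read the value that was written back.
        if op == "write" and last_access.get(obj, 0) < last_wb.get(obj, 0):
            useless += 1
        if obj in cache:
            cache.move_to_end(obj)
        else:
            if len(cache) >= capacity:
                victim, dirty = cache.popitem(last=False)
                if dirty:
                    last_wb[victim] = time  # victim is written back now
            cache[obj] = False
        if op == "write":
            cache[obj] = True
        last_access[obj] = time
    records = {o: (last_access[o], last_wb.get(o, 0)) for o in last_access}
    return useless, records

slide_events = [(1, "write", 5), (3, "read", 5), (7, "read", 9), (9, "write", 5)]
useless, records = count_global_useless_write_backs(slide_events)
print(useless, records[5])
```

After the last event, the record for object 5 is (9, 7), matching the final row of the slide.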

SLIDE 11

Analysis for Stack Data

Trace                 Cache   Min Stack Pointer
read 3, stack 100             100
write 90, stack 80    a       80
read 2, stack 100     b       80
read 5, stack 100     a       80

Stack (slot addresses): 100, 96, 92, 88, 84, 80

Stack frame becomes dead
Write-back of a is useless
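One possible reading of the stack analysis, as a sketch: a dirty line is dead when its address lies between the minimum stack pointer seen so far and the current stack pointer, because the frame that owned it has been popped. The function name, the (op, address, stack pointer) encoding, the LRU cache and the downward-growing stack are assumptions for illustration, and the simulated cache contents are not meant to reproduce the Cache column of the slide.

```python
from collections import OrderedDict

def count_stack_useless_write_backs(events, capacity=1):
    """Return (total write-backs, useless write-backs) for a stack trace.

    events: list of (op, addr, sp) with op "read" or "write" (hypothetical).
    Assumptions: the stack grows downward and one address is one cache line.
    """
    cache = OrderedDict()  # addr -> dirty flag, ordered from LRU to MRU
    min_sp = None          # lowest stack pointer observed so far
    total = useless = 0
    for op, addr, sp in events:
        min_sp = sp if min_sp is None else min(min_sp, sp)
        if addr in cache:
            cache.move_to_end(addr)
        else:
            if len(cache) >= capacity:
                victim, dirty = cache.popitem(last=False)
                if dirty:
                    total += 1
                    # Below the current stack pointer but inside the region
                    # the stack has reached: the owning frame is gone.
                    if min_sp <= victim < sp:
                        useless += 1
            cache[addr] = False
        if op == "write":
            cache[addr] = True
    return total, useless

slide_events = [
    ("read", 3, 100),
    ("write", 90, 80),  # frame extends down to 80, address 90 becomes dirty
    ("read", 2, 100),   # frame popped; the dirty line at 90 is evicted
    ("read", 5, 100),
]
print(count_stack_useless_write_backs(slide_events))
```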

SLIDE 12

Methodology

  • SPEC CPU2006 benchmark suite
      • 26 benchmarks
      • 52 combinations of benchmark/input
  • Pin collects traces
      • 100 billion instructions
  • L2 Cache
      • 1MB
      • 8-way, LRU
  • DRAM Cache
      • No cache, 8MB, 16MB, 32MB and 64MB
      • 16-way, LRU
  • Cache line size
      • 8B (limit study), 32B, 64B and 128B
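To make the simulated cache geometries concrete, the sketch below derives the number of sets for each configuration using the standard relation sets = size / (ways × line size). The variable names are illustrative, 1MB is assumed to mean 2^20 bytes, and enumerating every pairing of DRAM cache size and line size is an assumption about how the parameters were combined.

```python
from itertools import product

MB = 1024 * 1024  # assumption: 1MB means 2**20 bytes

def num_sets(size_bytes, ways, line_bytes):
    """Number of sets in a set-associative cache."""
    return size_bytes // (ways * line_bytes)

L2_SIZE, L2_WAYS = 1 * MB, 8
DRAM_WAYS = 16
DRAM_SIZES = [0, 8 * MB, 16 * MB, 32 * MB, 64 * MB]  # 0 stands for "no cache"
LINE_SIZES = [8, 32, 64, 128]

# One entry per (DRAM cache size, cache line size) pair.
configs = [
    {
        "line_bytes": line,
        "l2_sets": num_sets(L2_SIZE, L2_WAYS, line),
        "dram_bytes": dram,
        "dram_sets": num_sets(dram, DRAM_WAYS, line),
    }
    for dram, line in product(DRAM_SIZES, LINE_SIZES)
]

for config in configs:
    print(config)
```

With 64-byte lines, for example, the 1MB 8-way L2 cache has 2048 sets.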
SLIDE 13

Experimental Results

  • Categorization of benchmarks based on memory region
  • Heap intensive: more than 1 million object allocations
  • Global intensive: more than 4MB global size
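The two thresholds above can be written as a small classifier. The function name and the sample inputs are hypothetical, and 4MB is assumed to mean 4 × 2^20 bytes.

```python
HEAP_ALLOCATION_THRESHOLD = 1_000_000    # more than 1 million object allocations
GLOBAL_SIZE_THRESHOLD = 4 * 1024 * 1024  # more than 4MB, assuming 1MB = 2**20 bytes

def categorize(num_heap_allocations, global_region_bytes):
    """Return the set of categories a benchmark/input combination falls into."""
    categories = set()
    if num_heap_allocations > HEAP_ALLOCATION_THRESHOLD:
        categories.add("heap intensive")
    if global_region_bytes > GLOBAL_SIZE_THRESHOLD:
        categories.add("global intensive")
    return categories

# Hypothetical inputs, not measurements from the study.
print(categorize(2_000_000, 64 * 1024))
print(categorize(500, 16 * 1024 * 1024))
```

A combination can fall into both categories or into neither.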

[Figure: Number of Object Allocations in the Heap, per benchmark/input combination (log scale, 1E+00 to 1E+10)]

[Figure: Size of Global Region in Bytes, per benchmark/input combination (log scale, 1E+00 to 1E+10)]

SLIDE 14

Heap (8-byte cache line)

[Figure: two panels, "Fraction of useless write-backs" and "Energy savings" (DRAM and PCM); axis from 0% to 70%]

SLIDE 15

Heap (Average Endurance Gains)

[Figure: axis from 0% to 30%, plotted against DRAM Cache Size (No DRAM, 8MB, 16MB, 32MB, 64MB) for 8, 32, 64 and 128 Byte Cache Lines]

SLIDE 16

Heap (Average Energy Savings)

[Figure: axis from 0% to 20%, plotted against Type of Saving and DRAM Cache Size (DRAM; PCM with 0MB, 8MB, 16MB, 32MB, 64MB; Total with 8MB, 16MB, 32MB, 64MB) for 8, 32, 64 and 128 Byte Cache Lines]

SLIDE 17

Global (8-byte cache line)

[Figure: two panels, "Fraction of useless write-backs" and "Energy savings" (DRAM and PCM); axis from 0% to 40%]

SLIDE 18

Global (Average Energy Savings)

[Figure: axis from 0% to 20%, plotted against Type of savings and DRAM cache size (DRAM; PCM with 0MB, 8MB, 16MB, 32MB, 64MB) for 8, 32, 64 and 128 Byte Cache Lines]

SLIDE 19

Global (Average Energy Savings)

[Figure: axis from 0% to 20%, plotted against Type of savings and DRAM cache size (DRAM; PCM with 0MB, 8MB, 16MB, 32MB, 64MB; Total with 8MB, 16MB, 32MB, 64MB) for 8, 32, 64 and 128 Byte Cache Lines]

SLIDE 20

Stack

  • Very few useless write-backs
      • Fraction of useless write-backs between 0% and 2.3%
      • Average endurance gains and energy savings between 0% and 0.1%
  • Programs use a small part of the stack
      • 10KB to 20KB
      • Kept mostly in the cache
      • Few opportunities to evict dead data from the cache
SLIDE 21

Conclusions

  • We showed that a considerable number of write-backs are useless
  • We showed there is potential
      • Up to 20% energy savings
      • Up to 26% endurance gains
  • Next step: develop techniques to avoid useless write-backs
      • Low energy cost
      • Low performance impact
SLIDE 22

Thank you!

Questions?

sab104@cs.pitt.edu
http://www.cs.pitt.edu/~sab104