Dynamic Zero Compression for Cache Energy Reduction
- Conventional Cache Structure
Energy Dissipation
Bitlines (~75%)
Decoders
I/O Drivers
Wordlines
[Figure: conventional cache structure. Labels: wl, bit, bit_b, addr bus, address decoder, I/O]
Sub-banking
Hierarchical Bitlines
Low-swing Bitlines
performed full swing.
Wordline Gating
[Figure: wordline gating. Labels: I/O bus, addr bus, address decoder, gwl, lwl, offset decoder, offset, SRAM cells, sense amps]
>70% of the bits in D-cache accesses are “0”s
Measured from SPECint95 and MediaBench
Examples: small values, data types
Differential bitlines preferred in large SRAM
Better Noise Immunity
Faster Sensing
Related work with single-ended bitlines
[Tseng and Asanovic ’00] --- Used in register file
[Chang et al. ’99] --- Used in ROM and small
One bit per grouping of bits
Set if all bits in the group are zero
Controls wordline gating
[Figure: cache with Zero Indicator Bits. Labels: I/O, addr, address decoder, lwl, SRAM cells, sense amp, offset decoder, ZIB]
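The zero indicator bit idea above can be illustrated with a small behavioral sketch. It is not the circuit from the slides: the function names, the 32-bit word width, and the byte-sized grouping are assumptions chosen for illustration. A store computes one ZIB per group, and a load skips (gates) every group whose ZIB is set.

```python
GROUP_BITS = 8  # assumed grouping size: one zero indicator bit per byte


def store_word(value, width=32, group_bits=GROUP_BITS):
    """Split a word into groups (least significant first) and compute one ZIB per group."""
    mask = (1 << group_bits) - 1
    groups = [(value >> shift) & mask for shift in range(0, width, group_bits)]
    zibs = [g == 0 for g in groups]  # ZIB is set when every bit in the group is zero
    return groups, zibs


def load_word(groups, zibs, group_bits=GROUP_BITS):
    """Rebuild the word, reading only the groups whose ZIB is clear."""
    value = 0
    groups_read = 0
    for i, (g, z) in enumerate(zip(groups, zibs)):
        if z:
            continue  # wordline gated: the group is known to be zero, no data bits are read
        value |= g << (i * group_bits)
        groups_read += 1
    return value, groups_read


groups, zibs = store_word(0x12003400)
value, groups_read = load_word(groups, zibs)
print(hex(value), groups_read, zibs)
```

In this model the number of groups actually read stands in for bitline activity; a word with many zero bytes touches correspondingly fewer groups.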
[Chart: per-benchmark results (comp, li, ijpeg, go, vortex, m88k, gcc, perl, adpcm_en, adpcm_de, epic, unepic, g721_en, g721_de, mpeg_en, mpeg_de, pegwit_en, pegwit_de, Avg) for four grouping sizes: word, half-word, byte, half-byte]
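To make the four grouping sizes concrete, the following sketch counts how many groups of a given width are entirely zero in a list of 32-bit values. The function name and the sample trace are made up for illustration; they are not the SPECint95 or MediaBench measurements from the slides.

```python
def zero_fraction(words, group_bits, width=32):
    """Fraction of group_bits-wide groups that are entirely zero."""
    mask = (1 << group_bits) - 1
    total = 0
    zero = 0
    for w in words:
        for shift in range(0, width, group_bits):
            total += 1
            if ((w >> shift) & mask) == 0:
                zero += 1
    return zero / total if total else 0.0


# Hypothetical trace of 32-bit data values (illustration only).
trace = [0, 1, 255, 0x00010000, 0xFFFFFFFF]

for name, bits in [("word", 32), ("half-word", 16), ("byte", 8), ("half-byte", 4)]:
    print(f"{name:9s} {zero_fraction(trace, bits):.2f}")
```

Finer groupings catch more zeros but need more indicator bits per line, which is the trade-off the grouping-size comparison explores.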
Zero Indicator Bit
Wordline Gating Circuitry
Sense Amplifier
CPU Store Driver
Cache Output Driver
[Figures: DZC circuit diagrams. Recoverable labels: Bit, Bit_b, BWL, bwl, Cwl, ZIB, ZIB_b, W_EN, sense, zero, Data Bit, BUS, I/O bus, addr, address decoder, SRAM cells, sense amplifiers]
Not driven onto bus
Not in critical path
ZIB read w/o delay
[Figure labels: ZIB, W_EN, write data]
Area Overhead: 9%
Zero-Indicator-Bits
Sense Amplifiers
WLG Circuitry
I/O Circuitry
No delay overhead for writes
Zero check performed in parallel with tag check
2 FO4 gate-delays for reads
A pessimistic 7% worst-case delay
[Figure labels: DataBits, low-swing bus, ZIB, LWL]
[Chart: per-benchmark results (comp, li, ijpeg, go, vortex, m88k, gcc, perl, adpcm_en, adpcm_de, epic, unepic, g721_en, g721_de, mpeg_en, mpeg_de, pegwit_en, pegwit_de, Avg)]
Zeros are not as prevalent in the I-cache.
Use a recoding scheme to increase the number of zero bytes in the I-cache.
[Panich ’99] --- IWLG technique that compacts the
Use two-address form when src reg = dest reg
Small modification to the instruction format
- Use 8-8-8-8 instead of 16-7-9
Upper two bytes are zero-detected
Lower two bytes are usage-detected
Able to eliminate bitline swings of zero-valued
- Example: Opcode 000000
Slower than IWLG due to wordline gating in the
[Figure: instruction word split into four 8-bit fields (8-8-8-8); label: lwl]
[Charts (two): per-benchmark results (comp, li, ijpeg, go, vortex, m88k, gcc, perl, adpcm_en, adpcm_de, epic, unepic, g721_en, g721_de, mpeg_en, mpeg_de, pegwit_en, pegwit_de, Avg) comparing byte w/o recoding, byte w/ recoding, and IWLG]
A novel hardware technique to reduce cache
Small area and delay overhead
- Area: 9%, Delay: 2 FO4 gate-delays
Average energy saving: D-Cache: 26%, I-
- Processor-wide: ~10% for typical embedded processors
Completely orthogonal to existing energy
Dynamic Zero Compression is applicable to
Second-level caches
DRAM
Datapath [Canal et al. MICRO-33]