SLIDE 1

Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency

Gennady Pekhimenko Advisers: Todd C. Mowry & Onur Mutlu

SLIDE 2

Executive Summary


  • Main memory is a limited shared resource
  • Observation: Significant data redundancy
  • Idea: Compress data in main memory
  • Problem: How to avoid latency increase?
  • Solution: Linearly Compressed Pages (LCP):

– fixed-size, cache-line-granularity compression

  • 1. Increases capacity (69% on average)
  • 2. Decreases bandwidth consumption (46%)
  • 3. Improves overall performance (9.5%)
SLIDE 3

Challenges in Main Memory Compression


  • 1. Address Computation
  • 2. Mapping and Fragmentation
  • 3. Physically Tagged Caches
SLIDE 4

Address Computation

[Figure: in an Uncompressed Page, cache line Li (64B each) sits at address offset i*64: 0, 64, 128, ..., (N-1)*64. In a Compressed Page, the offsets of lines L1 ... LN-1 are unknown ("?") because compressed line sizes vary.]

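A small sketch makes the challenge concrete (the code and function names are illustrative, not from the talk): with variable-size compressed lines, locating line i requires summing the sizes of all earlier lines instead of a single multiply.

```python
LINE_SIZE = 64  # bytes per uncompressed cache line

def uncompressed_offset(line_index):
    """O(1): offset is a simple multiply, index * 64."""
    return line_index * LINE_SIZE

def variable_compressed_offset(line_index, compressed_sizes):
    """O(N): must add up every preceding line's compressed size."""
    return sum(compressed_sizes[:line_index])

sizes = [17, 64, 8, 32, 17]                  # example per-line compressed sizes
print(uncompressed_offset(3))                # 192
print(variable_compressed_offset(3, sizes))  # 17 + 64 + 8 = 89
```

This linear scan (or the extra indirection tables prior designs use to avoid it) is what puts address computation on the critical path.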

SLIDE 5

Mapping and Fragmentation


[Figure: a Virtual Page (4kB) is translated from a Virtual Address to a Physical Address, but the compressed Physical Page has an unknown size (? kB), causing fragmentation in physical memory.]

SLIDE 6

Physically Tagged Caches


[Figure: the Core issues a Virtual Address; Address Translation through the TLB produces the Physical Address that is matched against the tags of the L2 Cache Lines, so translation sits on the critical path of every access.]
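A toy model of the lookup above (all structures here are illustrative, not the actual hardware): because tags are physical, the TLB translation must finish before the tag match can even start, which is why any scheme that changes physical addresses must also deal with cache tagging.

```python
PAGE = 4096  # bytes per page

# toy TLB: virtual page number -> physical page base address
tlb = {0x0: 0x7000, 0x1: 0x3000}

def lookup(vaddr, cache):
    ppage = tlb[vaddr // PAGE]   # translation first (critical path)
    paddr = ppage + vaddr % PAGE
    tag = paddr // 64            # tag derived from the PHYSICAL address
    return cache.get(tag)        # only then can the tag match happen

cache = {(0x7000 + 128) // 64: "data@128"}
print(lookup(0x0080, cache))     # hit: "data@128"
```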


SLIDE 9

Shortcomings of Prior Work


Compression Mechanisms | Access Latency | Decompression Latency | Complexity | Compression Ratio
IBM MXT [IBM J.R.D. ’01] | ✗ | ✗ | ✗ | ✓
Robust Main Memory Compression [ISCA’05] | ✗ | ✓ | ✗ | ✓
LCP: Our Proposal | ✓ | ✓ | ✓ | ✓

SLIDE 10

Linearly Compressed Pages (LCP): Key Idea


[Figure: an Uncompressed Page (4kB: 64*64B cache lines) is compressed 4:1 into a 1kB Compressed Data region of fixed-size slots for compressible lines, followed by Metadata (64B) (M) and an Exception Storage region (E) for lines that do not compress.]
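A minimal sketch of the key idea, assuming 16B slots for the 4:1 target shown in the figure (the constants and names are illustrative): because every compressible line gets the same fixed slot size, the address of line i is again a single multiply.

```python
PAGE_LINES = 64  # 4kB page = 64 x 64B cache lines
SLOT_SIZE = 16   # 4:1 compression -> one fixed 16B slot per line

def lcp_line_address(page_base, line_index):
    # fixed-size slots restore the simple linear mapping
    return page_base + line_index * SLOT_SIZE

# lines that do not fit in a slot go uncompressed into the
# exception storage region, located via the 64B metadata block
compressed_region = PAGE_LINES * SLOT_SIZE
print(compressed_region)                   # 1024 -> the 1kB compressed data area
print(hex(lcp_line_address(0x40000, 5)))   # 0x40050
```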

SLIDE 11

LCP Overview


  • Page Table entry extension

– compression type and size
– extended physical base address

  • Operating System management support

– 4 memory pools (512B, 1kB, 2kB, 4kB)

  • Changes to cache tagging logic

– physical page base address + cache line index (within a page)

  • Handling page overflows
  • Compression algorithms: BDI [PACT’12], FPC [ISCA’04]
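The pool choice above could be sketched as follows (the pool sizes come from the slide; the selection logic itself is an assumption for illustration): the OS places each compressed page in the smallest pool that fits it.

```python
POOLS = [512, 1024, 2048, 4096]  # bytes; the four OS-managed memory pools

def pick_pool(compressed_page_size):
    """Pick the smallest pool that can hold the compressed page."""
    for pool in POOLS:
        if compressed_page_size <= pool:
            return pool
    raise ValueError("page larger than 4kB")

print(pick_pool(900))   # 1024
print(pick_pool(4096))  # 4096 (incompressible page, stored as-is)
```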
SLIDE 12

LCP Optimizations

12

  • Metadata cache

– Avoids additional requests to metadata

  • Memory bandwidth reduction:
  • Zero pages and zero cache lines

– Handled separately in TLB (1-bit) and in metadata (1-bit per cache line)

  • Integration with cache compression

– BDI and FPC

[Figure: four 64B cache lines, once compressed, are fetched in 1 transfer instead of 4.]

SLIDE 13

Methodology

  • Simulator

– x86 event-driven simulators

  • Simics-based [Magnusson+, Computer’02] for CPU
  • Multi2Sim [Ubal+, PACT’12] for GPU
  • Workloads

– SPEC2006 benchmarks, TPC, Apache web server, GPGPU applications

  • System Parameters

– L1/L2/L3 cache latencies from CACTI [Thoziyoor+, ISCA’08]
– 512kB–16MB L2, simple memory model


SLIDE 14

Compression Ratio Comparison


Scheme | Compression Ratio (GeoMean)
Zero Page | 1.30
FPC | 1.59
LCP (BDI) | 1.62
LCP (BDI+FPC-fixed) | 1.69
MXT | 2.31
LZ | 2.60

SPEC2006, databases, web workloads, 2MB L2 cache

LCP-based frameworks achieve average compression ratios competitive with prior work

SLIDE 15

Bandwidth Consumption Decrease


SPEC2006, databases, web workloads, 2MB L2 cache

(Cache, Memory) compression | Normalized BPKI (GeoMean, lower is better)
FPC-cache | 0.92
BDI-cache | 0.89
FPC-memory | 0.57
(None, LCP-BDI) | 0.63
(FPC, FPC) | 0.54
(BDI, LCP-BDI) | 0.55
(BDI, LCP-BDI+FPC-fixed) | 0.54

LCP frameworks significantly reduce bandwidth consumption (46%)

SLIDE 16

Performance Improvement


Cores | LCP-BDI | (BDI, LCP-BDI) | (BDI, LCP-BDI+FPC-fixed)
1 | 6.1% | 9.5% | 9.3%
2 | 13.9% | 23.7% | 23.6%
4 | 10.7% | 22.6% | 22.5%

LCP frameworks significantly improve performance

SLIDE 17

Conclusion

  • A new main memory compression framework called Linearly Compressed Pages (LCP)

– Key idea: a fixed size for compressed cache lines within a page, and a fixed compression algorithm per page

  • LCP evaluation:

– Increases capacity (69% on average)
– Decreases bandwidth consumption (46%)
– Improves overall performance (9.5%)
– Decreases energy of the off-chip bus (37%)


SLIDE 18

Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency

Gennady Pekhimenko Advisers: Todd C. Mowry & Onur Mutlu