  1. Accelerating Relative-error Bounded Lossy Compression for HPC Datasets with Precomputation-Based Mechanisms
  Xiangyu Zou, Tao Lu, Wen Xia, Xuan Wang, Weizhe Zhang, Sheng Di, Dingwen Tao and Franck Cappello
  Harbin Institute of Technology, Shenzhen & Peng Cheng Laboratory & Marvell Technology Group & Argonne National Laboratory & University of Alabama & University of Illinois at Urbana-Champaign
  2019/5/23

  2. Outline
  • Background of research
  • Our design
  • Evaluation
  • Conclusion

  3. Background
  • Scientific simulations
    • Climate scientists need to run large ensembles of high-fidelity 1 km × 1 km simulations; estimating even one ensemble member per simulated day may generate 260 TB of data every 16 s across the ensemble.
    • A cosmological simulation may produce 40 PB of data when simulating one trillion particles over hundreds of snapshots.
  • Data reduction is required
    • Lossless compression
      • Simulation data often exhibit high entropy
      • Reduction ratio usually around 2:1
    • Lossy compression
      • More aggressive data reduction scheme
      • High reduction ratio

  4. Background - Lossy compressors
  • ZFP
    • Follows the classic texture-compression approach for image data
    • Data transformation + embedded coding
    • Low compression ratio, high compression speed
  • SZ
    • Prediction + quantization + Huffman encoding + Zstd
    • High compression ratio, low compression speed
  • A dilemma: which compressor should I use?
  • Question: can we significantly improve the compression speed of SZ, leading to an easy choice for users?

  5. Background - Lossy compression error bound
  • Absolute error bound
    • For a value f, any f' ∈ (f − ε, f + ε) is acceptable
  • Pointwise relative error bound
    • For a value f, any f' ∈ (f·(1 − ε), f·(1 + ε)) is acceptable
  • CLUSTER18: convert a pointwise relative error bound into an absolute error bound with a logarithmic transformation
    • log(f·(1 − ε)) = log(f) + log(1 − ε) and log(f·(1 + ε)) = log(f) + log(1 + ε)
    • So the pointwise relative bound on f becomes an absolute bound on log(f): log(f') ∈ (log(f) + log(1 − ε), log(f) + log(1 + ε))
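  To make the conversion concrete, here is a minimal C sketch (our illustration, not the SZ source) that checks the log-domain absolute bound against the original relative bound; it assumes a positive value and ignores the sign/zero handling a real compressor needs:

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    double f   = 123.456;  /* a positive sample value (assumption)  */
    double eps = 1e-3;     /* pointwise relative error bound        */

    /* Compress y = log2(f) under the absolute bound b = log2(1 + eps).
     * Then f' = 2^(y ± b) lies in [f / (1 + eps), f * (1 + eps)],
     * which is contained in (f * (1 - eps), f * (1 + eps)). */
    double y = log2(f);
    double b = log2(1.0 + eps);

    double f_lo = exp2(y - b);  /* worst-case reconstructions */
    double f_hi = exp2(y + b);
    printf("f' in [%.6f, %.6f], required (%.6f, %.6f)\n",
           f_lo, f_hi, f * (1.0 - eps), f * (1.0 + eps));
    return 0;
}
```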

  6. Background – design of SZ compressor for relative error control
  • Preprocessing - logarithmic transformation
  • Point-by-point processing - prediction & quantization
  • Huffman encoding
  • Compression with a lossless compressor
  The logarithmic transformation (log X) is too expensive!
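  As background on the point-by-point stage, a minimal sketch of SZ-style linear-scaling quantization (our illustration; real SZ adds predictor selection, an "unpredictable" escape code, and a bounded code range):

```c
#include <math.h>

/* Quantize the prediction error into a bin of width 2*eps.  Decoding
 * pred + 2*eps*code recovers the value to within eps, so applying
 * this in the log domain enforces the pointwise relative bound. */
long quantize(double val, double pred, double eps) {
    return lround((val - pred) / (2.0 * eps));
}

double dequantize(long code, double pred, double eps) {
    return pred + 2.0 * eps * (double)code;
}
```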

  7. Performance breakdown of SZ Compression/Decompression
  The log-trans and exp-trans stages account for about 1/3 of the total time.

  8. Our design - workflow
  • No longer compute the quantization factor per point; look it up in precomputed tables instead.
    • Use table T1 to get the quantization factor from f
    • Use table T2 to get an approximate value of f from the quantization factor
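  A minimal sketch of how such tables could look (our reading of the idea; the names T1/T2 follow the slides, but the index width, layout, and code range are assumptions, and the boundary analysis the paper handles with Model A/B is omitted):

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

#define T1_BITS 16               /* sign + exponent + top mantissa bits (assumed width) */
static int    T1[1u << T1_BITS]; /* float's high bits -> quantization factor */
static double T2[1u << 16];      /* quantization factor -> approximate value; caps ncodes */

/* Build both tables once per compression run. */
void build_tables(double eps, unsigned ncodes) {
    double w = log(1.0 + eps);   /* log-domain bin width */
    for (uint32_t i = 0; i < (1u << T1_BITS); i++) {
        uint32_t bits = i << (32 - T1_BITS);
        float f;
        memcpy(&f, &bits, sizeof f);
        /* Positive finite values only in this sketch; others map to 0. */
        T1[i] = (f > 0.0f && isfinite(f)) ? (int)(log((double)f) / w) : 0;
    }
    for (unsigned k = 0; k < ncodes; k++)  /* replaces per-point exp() on decode */
        T2[k] = exp((double)k * w);
}

/* Per point: one shift and one load instead of a log() call. */
int lookup_code(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return T1[bits >> (32 - T1_BITS)];
}
```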

  9. Our design - Model A

  10. A general description of Model A (figure: PI intervals)

  11. Our design - Model B

  12. A general description of Model B

  13. Our design - Advantage of Model B
  • Any grid (i.e., a data point) is always included in a PI'
    • The grid size is smaller than any intersection size, so any grid is completely included in one PI'(M)
  • Effect: strictly respecting the user-specified error bound
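  The guarantee is easy to state as an invariant; a minimal check (our illustration) that decompressed data respects the user-specified pointwise relative bound, the property Model B enforces by construction:

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Every reconstructed point must stay within the relative bound. */
void check_relative_bound(const float *orig, const float *dec,
                          size_t n, double eps) {
    for (size_t i = 0; i < n; i++)
        assert(fabs((double)dec[i] - (double)orig[i])
               <= eps * fabs((double)orig[i]));
}
```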

  14. Accelerating Huffman decoding
  Idea: build a precomputed table to accelerate Huffman decoding.
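  A minimal sketch of the generic table-driven decoding technique (our illustration, not SZ's code): index a table with the next few bits of the stream to get the symbol and its true code length in one lookup, instead of walking the Huffman tree bit by bit. At build time, every code of length L ≤ LOOKUP_BITS fills all 2^(LOOKUP_BITS−L) table slots that share its prefix.

```c
#include <stddef.h>
#include <stdint.h>

#define LOOKUP_BITS 8

typedef struct {
    uint16_t symbol; /* decoded symbol                          */
    uint8_t  nbits;  /* true code length; 0 = code longer than
                        LOOKUP_BITS, fall back to a tree walk   */
} Entry;

/* Peek the next LOOKUP_BITS bits at bit offset pos (MSB-first).
 * The caller must pad the buffer so reads never run past the end. */
static unsigned peek_bits(const uint8_t *buf, size_t pos) {
    unsigned v = 0;
    for (int i = 0; i < LOOKUP_BITS; i++) {
        size_t b = pos + i;
        v = (v << 1) | ((buf[b >> 3] >> (7 - (b & 7))) & 1u);
    }
    return v;
}

int decode_symbol(const Entry table[1 << LOOKUP_BITS],
                  const uint8_t *buf, size_t *pos) {
    Entry e = table[peek_bits(buf, *pos)];
    if (e.nbits == 0)
        return -1;    /* rare long code: tree walk omitted here */
    *pos += e.nbits;  /* consume only the bits the code used    */
    return e.symbol;
}
```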

  15. Performance Evaluation
  • Environment
    • 2.4 GHz Intel Xeon E5-2640 v4 processors
    • 256 GB memory
  • Datasets
    • NYX (3D, 3.1 GB)
    • CESM (2D, 2.0 GB)
    • Hurricane (3D, 1.9 GB)
    • HACC (1D, 6.3 GB)

  16. Compression/Decompression Rate
  Our approach is about 1.2x~1.5x faster than the original SZ in compression rate and 1.3x~3.0x faster in decompression rate.

  17. Compression/Decompression breakdown
  There is no longer any time cost on the log-trans and exp-trans stages, and the time cost of the build-table stage is very small.

  18. Compression Ratio
  We can observe that our solution (SZ_P) has very similar compression ratios to SZ_T.

  19. Data quality
  Data quality is comparable with related works (SZ_T and ZFP_T).

  20. Data quality (Cont’d)
  Visualization of the decompressed dark matter density dataset (slice 200) at a compression ratio of 2.75. The SZ series has better visual quality than ZFP, and SZ_P (both Model A and Model B) leads to satisfactory visual quality.

  21. Conclusion
  • We accelerate the SZ compressor under pointwise relative error bound control by designing a table-lookup method.
  • We strictly control the error bound through an in-depth analysis of the mapping relation between predicted values and quantization factors.
  • Experiments show 1.2x~1.5x speedups in compression and 1.3x~3.0x speedups in decompression, compared with SZ 2.1.

  22. Thank you Contact: Sheng Di (sdi1@anl.gov) 2019/5/23
