Fixed-PSNR Lossy Compression for Scientific Data

SLIDE 1

Fixed-PSNR Lossy Compression for Scientific Data

Dingwen Tao (The University of Alabama, USA), Sheng Di (Argonne National Laboratory, USA), Xin Liang (University of California, Riverside, USA), Zizhong Chen (University of California, Riverside, USA), Franck Cappello (Argonne National Laboratory, USA)

September 2018

SLIDE 2

Outline

  • Introduction
  • Background
  • State-of-the-Art Lossy Compressors
  • Peak Signal-to-Noise Ratio (PSNR)
  • L2-Norm-Preserving Lossy Compression
  • Design of Fixed-PSNR Lossy Compression
  • Experimental Evaluation
  • Conclusion

SLIDE 3

Introduction

  • Scientific data are growing extremely fast
    • Our ability to generate data is exceeding our ability to store and analyze it
    • Simulation systems and observation instruments grow in capability with Moore's Law
    • Petabyte (PB) data sets will soon be common!
  • Climate simulation (CESM)
    • High-resolution simulation → 1 TB of data generated per compute day
    • IPCC Coupled Model Intercomparison Projects (CMIPs)
      • Phase 5 (2013): 2.5 PB of output → Phase 6 (2018): 10 PB expected!
  • The relative cost of storage is increasing
    • Previous NCAR platform (2013): ~20% of hardware budget
    • Current NCAR platform (2017): ~50% of hardware budget
  • A data reduction factor of at least 10 is needed! (A. Baker et al., HPDC'16)

SLIDE 6

State-of-the-Art Lossy Compression

  • ISABELA (NCSU)
    • Sorting preconditioner
    • B-spline interpolation
  • ZFP (LLNL)
    • Customized orthogonal block transform
    • Exponent alignment
    • Block-wise bit-stream truncation
  • SZ (ANL)
    • Multi-dimensional prediction
    • Error-controlled quantization
    • Variable-length encoding
    • Unpredictable data analysis
  • VAPOR (NCAR)
    • Wavelet transform
    • Vector quantization

Error control modes offered by these compressors include absolute error bounds, pointwise relative error bounds, and value-range-based relative error bounds; one of them offers no error bound control scheme.

None of them can directly control L2-norm-based data distortion (e.g., RMSE, PSNR).
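To make the error control modes above concrete, the following Python sketch (NumPy assumed) checks a reconstruction against each kind of bound. The helper names are hypothetical and are not any compressor's API. The last lines build two reconstructions that satisfy the same absolute error bound yet differ by 30 dB in PSNR, which illustrates why a pointwise bound alone does not fix the L2-norm-based distortion.

```python
import numpy as np

def satisfies_absolute(x, x_dec, eb):
    # Absolute error bound: |x_i - x~_i| <= eb at every point
    return bool(np.all(np.abs(x - x_dec) <= eb))

def satisfies_value_range_relative(x, x_dec, eb_rel):
    # Value-range-based relative bound: |x_i - x~_i| <= eb_rel * (x_max - x_min)
    return bool(np.all(np.abs(x - x_dec) <= eb_rel * (x.max() - x.min())))

def satisfies_pointwise_relative(x, x_dec, eb_rel):
    # Pointwise relative bound: |x_i - x~_i| <= eb_rel * |x_i|
    return bool(np.all(np.abs(x - x_dec) <= eb_rel * np.abs(x)))

def psnr(x, x_dec):
    # PSNR = -20 * log10(RMSE / value range)
    rmse = np.sqrt(np.mean((x - x_dec) ** 2))
    return -20.0 * np.log10(rmse / (x.max() - x.min()))

x = np.linspace(0.0, 1.0, 1000)   # value range = 1
eb = 1e-3

# Two hypothetical reconstructions that respect the same absolute bound:
x_a = x + 0.9 * eb                # every point is off by 0.9 * eb
x_b = x.copy()
x_b[0] += 0.9 * eb                # only one point is off

print(satisfies_absolute(x, x_a, eb), satisfies_absolute(x, x_b, eb))  # True True
print(satisfies_value_range_relative(x, x_a, 1e-3))                    # True
print(satisfies_pointwise_relative(x, x_a, 1e-3))  # False: x_0 = 0 allows no error
print(psnr(x, x_b) - psnr(x, x_a))                 # ~30 dB apart
```

Both reconstructions pass the same absolute check, but their mean squared errors differ by a factor of 1000, i.e., 10·log10(1000) = 30 dB.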

SLIDE 8

Peak Signal-to-Noise Ratio

  • PSNR: one of the most critical indicators used to assess the distortion of reconstructed data vs. original data
  • PSNR is defined as

$$\mathrm{PSNR} = -20 \cdot \log_{10}(\mathrm{NRMSE})$$

where NRMSE (normalized root mean squared error) is

$$\mathrm{NRMSE} = \frac{\sqrt{\sum_{i=1}^{N} (x_i - \tilde{x}_i)^2 / N}}{x_{\max} - x_{\min}}$$

with $N$ the number of data points and $\tilde{x}_i$ the decompressed value of $x_i$. PSNR is therefore an L2-norm-based data distortion metric.

[Figure: Distortion of slice 50 in the Hurricane-ISABEL (TCf48) data with a compression ratio of 117:1. Panels: (a) original raw data; (b) SZ-2.0 (PSNR = 56 dB); (c) SZ-1.4 (PSNR = 39 dB); (d) ZFP (PSNR = 31 dB).]
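A minimal sketch of the two definitions above, assuming NumPy arrays `x` (original values) and `x_dec` (decompressed values); `nrmse` and `psnr` are illustrative names, not a library API.

```python
import numpy as np

def nrmse(x, x_dec):
    """Normalized RMSE: RMSE divided by the value range of the original data."""
    x = np.asarray(x, dtype=np.float64)
    x_dec = np.asarray(x_dec, dtype=np.float64)
    rmse = np.sqrt(np.mean((x - x_dec) ** 2))
    return rmse / (x.max() - x.min())

def psnr(x, x_dec):
    """PSNR in dB, defined as -20 * log10(NRMSE)."""
    return -20.0 * np.log10(nrmse(x, x_dec))

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])            # value range = 4
x_dec = x + np.array([0.01, -0.01, 0.01, -0.01, 0.01])
print(round(psnr(x, x_dec), 2))                    # 52.04
```

Here RMSE = 0.01 and the value range is 4, so NRMSE = 0.0025 and PSNR = −20·log10(0.0025) ≈ 52.04 dB. If `x_dec` equals `x` exactly, the RMSE is zero and NumPy returns an infinite PSNR with a divide-by-zero warning.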

SLIDE 11

L2-Norm-Preserving Lossy Compression

  • Prediction-based lossy compression
    • Compression
      1. Predict data values (i.e., $X_{pred}$) and calculate prediction errors (i.e., $X_{pe}$)
      2. Quantize or encode $X_{pe}$
      3. Entropy encoding (optional)
    • Decompression
      1. Reconstruct prediction values (i.e., $X_{pred}^{\mathrm{decomp}}$)
      2. De-quantize or decode $X_{pe}^{\mathrm{decomp}}$
      3. Reconstruct data values: $X^{\mathrm{decomp}} = X_{pred}^{\mathrm{decomp}} + X_{pe}^{\mathrm{decomp}}$

Since $X = X_{pred} + X_{pe}$, subtracting the reconstruction gives

$$X - X^{\mathrm{decomp}} = X_{pe} - X_{pe}^{\mathrm{decomp}}$$

provided that $X_{pred} = X_{pred}^{\mathrm{decomp}}$ is assured during compression (i.e., the compressor predicts from decompressed values). Otherwise, data loss will propagate!

Theorem 1: For prediction-based lossy compression, the overall L2-norm-based data distortion is the same as the distortion (introduced in Step 2) of the prediction errors.

This also holds for orthogonal-transform-based lossy compression (such as ZFP).
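The following Python sketch (NumPy assumed) illustrates Theorem 1 and the "data loss will propagate" warning. It uses a hypothetical 1D predictor that takes the previous decompressed value as the prediction, with uniform quantization of bin size 2·eb; this is not SZ's multi-dimensional predictor, and `compress`, `decompress`, and `compress_open_loop` are illustrative names.

```python
import numpy as np

def compress(x, eb):
    """Closed-loop compressor: returns integer quantization codes and X_pe."""
    delta = 2.0 * eb                          # bin size; error bound = delta / 2
    codes = np.empty(x.size, dtype=np.int64)
    x_pe = np.empty(x.size)
    prev_dec = 0.0                            # decompressed value of previous point
    for i, v in enumerate(x):
        x_pred = prev_dec                     # Step 1: predict from DECOMPRESSED data
        x_pe[i] = v - x_pred                  # prediction error
        codes[i] = np.rint(x_pe[i] / delta)   # Step 2: uniform quantization
        prev_dec = x_pred + codes[i] * delta  # mirror what the decompressor sees
    return codes, x_pe

def decompress(codes, eb):
    delta = 2.0 * eb
    x_pe_dec = codes * delta                  # Step 2: de-quantize X_pe
    x_dec = np.empty(codes.size)
    prev_dec = 0.0
    for i in range(codes.size):
        x_pred_dec = prev_dec                 # Step 1: same prediction as compressor
        x_dec[i] = x_pred_dec + x_pe_dec[i]   # Step 3: X^decomp = X_pred + X_pe
        prev_dec = x_dec[i]
    return x_dec, x_pe_dec

def compress_open_loop(x, eb):
    """Predicts from ORIGINAL values, which the decompressor never sees."""
    delta = 2.0 * eb
    x_pred = np.concatenate(([0.0], x[:-1]))
    return np.rint((x - x_pred) / delta).astype(np.int64)

rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(scale=0.1, size=5000))   # synthetic smooth-ish signal
eb = 1e-3

codes, x_pe = compress(x, eb)
x_dec, x_pe_dec = decompress(codes, eb)
print(np.allclose(x - x_dec, x_pe - x_pe_dec))    # Theorem 1: True
print(np.abs(x - x_dec).max() <= eb * (1 + 1e-6)) # error bound holds: True

x_bad, _ = decompress(compress_open_loop(x, eb), eb)
print(np.abs(x - x_bad).max() > eb)               # errors accumulate: True
```

In the open-loop variant, each step's quantization error is added to every later value, so the reconstruction error behaves like a random walk and quickly exceeds the error bound.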

SLIDE 12

L2-Norm-Preserving Lossy Compression

Theorem 1: For prediction-based lossy compression, the overall L2-norm-based data distortion is the same as the distortion (introduced in Step 2) of the prediction errors.

Theorem 2: For orthogonal-transform-based lossy compression, the overall L2-norm-based data distortion is the same as the distortion (introduced in Step 2) of the transformed data, because an orthogonal transform preserves the L2 norm.

Therefore, estimating the data distortion introduced in Step 2 gives an estimate of the overall L2-norm-based distortion.
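Theorem 2 rests on the fact that an orthogonal transform preserves the L2 norm. The Python sketch below checks this numerically, assuming NumPy and using a random orthogonal matrix from a QR factorization as a hypothetical stand-in for a real compressor's transform (it is not ZFP's block transform).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 64

# Hypothetical orthogonal transform: Q from a QR factorization, so Q.T @ Q = I.
Q, _ = np.linalg.qr(rng.normal(size=(n, n)))

x = np.sin(np.linspace(0.0, 4.0 * np.pi, n)) + 0.1 * rng.normal(size=n)

c = Q.T @ x                          # forward transform (transformed data)
delta = 0.05
c_dec = delta * np.rint(c / delta)   # Step 2: uniform quantization of coefficients
x_dec = Q @ c_dec                    # inverse transform

err_data = np.linalg.norm(x - x_dec)   # overall L2 distortion
err_coef = np.linalg.norm(c - c_dec)   # distortion introduced in Step 2
print(np.isclose(err_data, err_coef))  # True
```

Since x − x_dec = Q(c − c_dec) and Q is orthogonal, both norms agree up to floating-point rounding; the per-point errors, by contrast, are redistributed by the inverse transform.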

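SLIDE 16

Design of Fixed-PSNR Lossy Compression

  • Quantization: prediction errors (or transformed data), i.e., $X_{pe}$ → a set of integers
  • Dequantization: integers → decompressed prediction errors
  • $X_{pe} \sim p(x)$: probability density function
  • MSE of the prediction errors:

$$\mathrm{MSE}(X_{pe}, X_{pe}^{\mathrm{decomp}}) = \int_{-\infty}^{\infty} (x - \tilde{x})^2 \, p(x) \, dx \approx \frac{1}{12} \sum_{i} \delta_i^3 \, p(m_i)$$

where $\tilde{x}$ is the dequantized value of $x$, $\delta_i$ is the size of the $i$-th quantization bin, and $m_i$ is that bin's midpoint.

  • Assume uniform quantization (adopted by the SZ lossy compressor):
    • $\delta_1 = \delta_2 = \cdots = \delta$
    • Since $\sum_i \delta \, p(m_i) \approx 1$: $\mathrm{MSE} = \frac{1}{12}\delta^2$ and $\mathrm{PSNR} = 20 \cdot \log_{10}\frac{\mathrm{vrange}}{\delta} + 10 \cdot \log_{10} 12$, with $\mathrm{vrange} = x_{\max} - x_{\min}$
  • Absolute error bound $eb_{abs} = \delta/2$ $\Rightarrow$ $eb_{abs} = \sqrt{3} \cdot 10^{-\mathrm{PSNR}/20} \cdot \mathrm{vrange}$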
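The closed-form relations above can be exercised end to end. The Python sketch below (NumPy assumed) derives the absolute error bound from a demanded PSNR, applies uniform quantization with bin size δ = 2·eb_abs, and compares the predicted and measured PSNR. The data are synthetic, and quantizing the values directly (prediction = 0, so X_pe = X) is a simplifying assumption; SZ would quantize multi-dimensional prediction errors instead. All function names are hypothetical.

```python
import numpy as np

def error_bound_for_psnr(target_psnr_db, vrange):
    """eb_abs = sqrt(3) * 10^(-PSNR/20) * vrange."""
    return np.sqrt(3.0) * 10.0 ** (-target_psnr_db / 20.0) * vrange

def predicted_psnr(delta, vrange):
    """PSNR = 20*log10(vrange/delta) + 10*log10(12), from MSE ~= delta^2 / 12."""
    return 20.0 * np.log10(vrange / delta) + 10.0 * np.log10(12.0)

def measured_psnr(x, x_dec):
    rmse = np.sqrt(np.mean((x - x_dec) ** 2))
    return -20.0 * np.log10(rmse / (x.max() - x.min()))

rng = np.random.default_rng(42)
t = np.linspace(0.0, 20.0 * np.pi, 200_000)
x = np.sin(t) + 0.05 * rng.normal(size=t.size)  # synthetic stand-in for a field
vrange = x.max() - x.min()

target = 60.0                                    # demanded PSNR (dB)
eb = error_bound_for_psnr(target, vrange)        # absolute error bound
delta = 2.0 * eb                                 # uniform bin size (eb = delta / 2)

# Uniform quantization of the values themselves (prediction = 0, so X_pe = X).
codes = np.rint(x / delta).astype(np.int64)
x_dec = codes * delta

print(np.abs(x - x_dec).max() <= eb * (1 + 1e-9))        # bound respected
print(np.mean((x - x_dec) ** 2) / (delta ** 2 / 12.0))    # close to 1
print(predicted_psnr(delta, vrange), measured_psnr(x, x_dec))
```

The predicted PSNR equals the target by construction; the measured PSNR matches it only as well as the δ²/12 approximation holds. That approximation assumes p(x) is roughly constant within each bin, so it is expected to be least accurate when δ is large relative to the spread of X_pe (for example, at low PSNR demands).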

SLIDE 17

Experimental Evaluation

  • Experimental setup
    • Bebop cluster at Argonne (Intel Xeon E5-2695 v4 processors and 128 GB of memory)
    • Data: 2D CESM-ATM, 3D Hurricane-ISABEL, 3D NYX
    • Implement the fixed-PSNR mode based on the SZ framework

SLIDE 18

Experimental Results

  • Our fixed-PSNR lossy compressor can precisely control PSNRs
    • Meets the PSNR demand for 90%+ of ATM data fields on average
    • PSNR deviations limited to within 0.1~5.0 dB on average
  • The higher the demanded PSNR, the better our fixed-PSNR method performs

SLIDE 19

Conclusions

  • Provide an in-depth analysis that precisely predicts the overall data distortion (such as MSE and PSNR) of state-of-the-art lossy compressors
  • Propose a novel fixed-PSNR method that allows users to fix the PSNR during lossy compression, based on our accurate prediction of PSNR
  • Implement our proposed fixed-PSNR method based on the SZ lossy compression framework and release the code as an open-source tool
  • Evaluate our fixed-PSNR method using three real-world HPC data sets
  • Future work
    • Implement a fixed-PSNR mode in ZFP and other lossy compressors
    • Select the best-fit compressor based on precise error and compression ratio estimations

SLIDE 20

Acknowledgements

This research was supported by the Exascale Computing Project (17-SC-20-SC), a joint project of the U.S. Department of Energy's Office of Science and National Nuclear Security Administration, responsible for delivering a capable exascale ecosystem, including software, applications, and hardware technology, to support the nation's exascale computing imperative. This material was also supported by the National Science Foundation under Grant Nos. 1305624, 1513201, and 1619253.

SLIDE 21

Thank you!

Any questions are welcome!