Error-Controlled Lossy Compression Optimized for High Compression Ratios of Scientific Datasets



SLIDE 1

Error-Controlled Lossy Compression Optimized for High Compression Ratios of Scientific Datasets

Xin Liang (University of California, Riverside), Sheng Di (Argonne National Laboratory), Dingwen Tao (University of Alabama), Sihuan Li (University of California, Riverside), Shaomeng Li (National Center for Atmospheric Research), Hanqi Guo (Argonne National Laboratory), Zizhong Chen (University of California, Riverside), Franck Cappello (Argonne National Laboratory & UIUC)

SLIDE 2

Exascale Computing Project

Outline

• Introduction
  - Large amount of scientific data
  - Limitations of lossless compression
• Related Works and Limitations
• Proposed Predictors
  - Mean-integrated Lorenzo predictor
  - Regression predictor
• Adaptive Predictor Selection
• Evaluation
• Conclusion

SLIDE 3

Introduction – Large Amount of Data

• Extremely large amounts of data are produced by scientific simulations and instruments
  - CESM (climate simulation)
    · 2.5 PB raw data produced
    · 170 TB post-processed data
  - HACC (cosmology simulation)
    · 20 PB of data from a single 1-trillion-particle simulation
    · Mira at ANL: 26 PB file system storage
    · 20 PB / 26 PB ≈ 80% of the whole file system

[Figure: two partial visualizations of HACC simulation data, coarse grain on the full volume or full resolution on small sub-volumes]

SLIDE 4

Introduction – Large Amount of Data

• APS-U: the next-generation APS (Advanced Photon Source) project at ANL
  - 15 PB of data for storage
  - 35 TB of post-processed floating-point data
  - 100 GB/s bandwidth between APS and Mira
  - 15 PB / 100 GB/s ≈ 1.5 × 10^5 seconds (about 42 hours) just to transfer the data

SLIDE 5

Introduction – I/O Bottleneck

• I/O has improved far less than other parts of the system
• From 1960 to 2014:
  - Supercomputer speed increased by 11 orders of magnitude
  - I/O capacity increased by 6 orders of magnitude
  - Internal drive access rate increased by 3–4 orders of magnitude
  - We are producing more data than we can store!

Source: Parallel I/O introductory tutorial, online

SLIDE 6

Introduction – Limitations of Existing Lossless Compressors

• Existing lossless compressors do not work efficiently on large-scale scientific data (compression ratios are typically limited to about 2)

Table 1: Compression ratios for lossless compressors on large-scale simulations
Compression ratio = original data size / compressed data size

SLIDE 7

Introduction – Lossy Compressors

• Lossy compression was then proposed to trade accuracy for compression ratio
• Error-bounded lossy compression, in addition, provides the user with a means to control the error

SLIDE 8

Introduction – Error-Bounded Lossy Compressors

• Common compression modes for error-bounded lossy compressors
  - Point-wise absolute error bound
    · SZ, ZFP, TTHRESH, etc.
  - Point-wise relative error bound
    · ISABELA, FPZIP, SZ, etc.

SLIDE 9

Introduction – Common Assessments

• Common metrics for assessing error-bounded lossy compressors
  - Compression ratio (cratio)
    · cratio = uncompressed size / compressed size
  - Compression/decompression rate (crate / drate)
    · crate = uncompressed size / compression time
    · drate = uncompressed size / decompression time
  - RMSE (root mean squared error) & PSNR (peak signal-to-noise ratio)
    · RMSE = sqrt( (1/N) * Σ_{i=1}^{N} (d_i − d'_i)^2 )
    · PSNR = 20 * log10( (d_max − d_min) / RMSE )
  - Rate-distortion
    · bit-rate = sizeof(datatype) * 8 / cratio  (bits per value)

Tao et al., Z-checker: A framework for assessing lossy compression of scientific data, IJHPCA
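The metrics above can be sketched in a few lines of NumPy. The helper name `lossy_metrics` is hypothetical (not part of any compressor's API); it only restates the slide's formulas.

```python
import numpy as np

def lossy_metrics(original, decompressed, compressed_nbytes):
    """Standard assessment metrics for an error-bounded lossy compressor."""
    cratio = original.nbytes / compressed_nbytes               # compression ratio
    diff = original.astype(np.float64) - decompressed.astype(np.float64)
    rmse = np.sqrt(np.mean(diff ** 2))                         # root mean squared error
    value_range = float(original.max()) - float(original.min())
    psnr = 20.0 * np.log10(value_range / rmse)                 # peak signal-to-noise ratio, dB
    bitrate = original.dtype.itemsize * 8 / cratio             # bits per stored value
    return cratio, rmse, psnr, bitrate
```

For example, a 1000-point float32 field compressed into 1000 bytes gives a compression ratio of 4 and a bit-rate of 8 bits per value.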

SLIDE 10

Introduction – Existing Error-Bounded Lossy Compressors

• Existing state-of-the-art error-bounded lossy compressors
  - ISABELA (NCSU)
    · Sorting preconditioner
    · B-spline interpolation
  - FPZIP (LLNL)
    · Lorenzo prediction with truncation on fixed-point residuals
    · Entropy encoding
  - ZFP (LLNL)
    · Block-wise exponent alignment and a customized orthogonal block transform
    · Block-wise embedded encoding
  - SZ-1.4 (ANL)
    · Multi-layer prediction with linear-scaling quantization
    · Huffman encoding followed by lossless compression

SLIDE 11

Introduction – SZ Decorrelation (Prediction)

• SZ has four key stages (prediction, quantization, encoding, lossless compression), among which prediction and quantization are the most important two.

[Figure: SZ pipeline: input → prediction → quantization → encoding (lossless) → output]

[Figure: 3D Lorenzo predictor: the current data point f_{111} is predicted from its seven already-processed neighbors f'_{000}, f'_{001}, f'_{010}, f'_{100}, f'_{011}, f'_{101}, f'_{110} on the x-y-z cube]

SZ multi-layer predictor, default layer 1 (Lorenzo predictor)

Tao et al., Significantly Improving Lossy Compression for Scientific Data Sets Based on Multidimensional Prediction and Error-Controlled Quantization, IPDPS'17.
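A minimal sketch of the first-order 3D Lorenzo prediction (the inclusion-exclusion over the neighboring cube described above). The function name and array layout are illustrative assumptions; a real compressor predicts from reconstructed (decompressed) values, whereas here we pass in whatever array `rec` holds.

```python
import numpy as np

def lorenzo_predict_3d(rec, i, j, k):
    """Predict point (i, j, k) from its seven cube neighbors by
    inclusion-exclusion (first-order 3D Lorenzo predictor)."""
    return (rec[i-1, j, k] + rec[i, j-1, k] + rec[i, j, k-1]
            - rec[i-1, j-1, k] - rec[i-1, j, k-1] - rec[i, j-1, k-1]
            + rec[i-1, j-1, k-1])
```

On any linear field the prediction is exact, which is why the predictor works so well on smooth data.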

SLIDE 12

Introduction – SZ Linear-Scaling Quantization

[Figure: SZ pipeline: input → prediction → quantization → encoding (lossless) → output]

Uniform quantization with linear scaling in SZ-1.4

[Figure: quantization with one interval in SZ-1.1 vs. uniform quantization with linear scaling in SZ-1.4. Around the predicted value, intervals of width 2 × error bound are laid out and labeled with codes ..., -2, -1, 0, +1, +2, ...; the real value falls into one of them, and its second-phase predicted value is the center of that interval.]

SZ-1.1 → SZ-1.4:
(i) Expand the quantization intervals around the predicted value (made by the prediction model) by linear scaling of the error bound
(ii) Encode the real value by the number of the quantization interval it falls into (the quantization code)

Tao et al., Significantly Improving Lossy Compression for Scientific Data Sets Based on Multidimensional Prediction and Error-Controlled Quantization, IPDPS'17.
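The two steps above can be sketched as follows. `quantize`/`dequantize` are hypothetical names, and the fixed interval count is a simplification of SZ's actual configuration; the key property is that the reconstructed value is always within the error bound of the original.

```python
import numpy as np

def quantize(real, pred, eb, num_intervals=65536):
    """Linear-scaling quantization: map the prediction error to the index
    of an interval of width 2*eb centered on the predicted value."""
    code = int(np.round((real - pred) / (2 * eb)))
    if abs(code) >= num_intervals // 2:
        return None  # unpredictable point: store it separately instead
    return code

def dequantize(code, pred, eb):
    """Reconstruct a value guaranteed to be within eb of the original."""
    return pred + 2 * eb * code
```

Because the code is the rounded error in units of 2·eb, dequantization can be off by at most eb, which is exactly the point-wise absolute error guarantee.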

SLIDE 13

Introduction – Limitation of Lorenzo Predictor

• Has to use decompressed data for prediction
  - Low prediction accuracy under a large error bound
    · Predicts using inaccurate decompressed data
    · The higher the data dimensionality, the larger the error
    · Leads to unexpected artifacts

[Figure: original data vs. SZ-1.4 decompressed data (compression ratio 111:1), showing artifacts]

SLIDE 14

Introduction – Limitation of Lorenzo Predictor

• Has to use decompressed data for prediction
  - Low prediction accuracy under a large error bound
    · Predicts using inaccurate decompressed data
    · The higher the data dimensionality, the larger the error
    · Leads to unexpected artifacts
  - Hard to parallelize
    · Frequent data dependencies between points
  - Prone to error
    · One error may propagate to all subsequent data

[Figure: SZ multi-layer predictor, default layer 1 (Lorenzo predictor)]

Tao et al., Significantly Improving Lossy Compression for Scientific Data Sets Based on Multidimensional Prediction and Error-Controlled Quantization, IPDPS'17.

SLIDE 15

New Design – Mean-Integrated Lorenzo Predictor

• Use the mean to approximate data in the densest interval
  - Advantages
    · Predicts without decompressed data
    · Reduced artifacts
    · Works well when the data has an obvious background value
  - Limitations
    · Only covers data inside the densest interval
    · Degraded performance on more uniformly distributed data

SLIDE 16

New Design – Mean-Integrated Lorenzo Predictor

• Adaptive selection: select Lorenzo or M-Lorenzo according to the maximum data density in the densest interval
  - Sample dataset: Hurricane, 13 data fields

[Figure: overall rate distortion on Hurricane (PSNR vs. bit rate for M-Lorenzo and Lorenzo), and the percentage of points predicted by the Mean-Integrated Lorenzo predictor vs. bit rate]

SLIDE 17

New Design – Regression Predictor

• Divide the data into individual blocks and predict each data point in a block with the coefficients of a linear regression model
  - Compute the regression coefficients per block
  - Predict the data with the regression coefficients
    · 3D case: f(x, y, z) = β0·x + β1·y + β2·z + β3

[Figure: 2D regression illustration]

Di et al., SZ tutorial, SC'18
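A sketch of per-block coefficient fitting for the 3D model above. SZ's actual solver uses closed-form sums rather than a generic least-squares routine, and the function names here are illustrative, but only the four coefficients need to be stored per block.

```python
import numpy as np

def fit_block_regression(block):
    """Fit f(x, y, z) = b0*x + b1*y + b2*z + b3 over one block
    by ordinary least squares."""
    grids = np.meshgrid(*(np.arange(n) for n in block.shape), indexing='ij')
    # design matrix: one row per point, columns (x, y, z, 1)
    A = np.stack([g.ravel() for g in grids] + [np.ones(block.size)], axis=1)
    coeffs, *_ = np.linalg.lstsq(A, block.ravel(), rcond=None)
    return coeffs  # (b0, b1, b2, b3)

def predict_block(coeffs, shape):
    """Reconstruct the block's prediction from the four stored coefficients."""
    x, y, z = np.meshgrid(*(np.arange(n) for n in shape), indexing='ij')
    b0, b1, b2, b3 = coeffs
    return b0 * x + b1 * y + b2 * z + b3
```

For a 6×6×6 block this stores 4 coefficients for 216 values, which matches the 1/54 overhead quoted later in the deck.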

SLIDE 18

New Design – Regression Predictor: Advantages

• Does not need decompressed data for prediction
  - Prediction accuracy holds across different error bounds
    · Predicts using the stored coefficients
    · Fewer visible artifacts even when the error bound is large

[Figure: original data vs. SZ-1.4 decompressed (compression ratio 111:1) vs. OurSol decompressed (compression ratio 117:1)]

SLIDE 19

New Design – Regression Predictor: Advantages

• Does not need decompressed data for prediction
  - Prediction accuracy holds across different error bounds
    · Predicts using the stored coefficients
    · Fewer visible artifacts even when the error bound is large
  - Highly parallelizable, both inter-block and intra-block
    · No data dependencies between data points
  - Controlled error propagation
    · An error can propagate to at most one block

SLIDE 20

New Design – Regression Predictor: Limitations

• Regression works better under a large error bound
  - Cannot model sharp changes, because the reconstructed hyperplane is always linear
    · Lorenzo: constant for 1D data, linear for 2D data, quadratic for 3D data, etc.
  - High storage cost for the regression coefficients
    · 4/216 = 1/54 overhead for a 6×6×6 block, although the coefficients can be further compressed
  - Higher computational cost
    · More multiplications than the Lorenzo predictor

SLIDE 21

New Design – Adaptive Selection

• Sampling + prediction error estimation
  - Regression error can be estimated accurately
    · error = |β0·x + β1·y + β2·z + β3 − data|
  - Add a penalty term for Lorenzo
    · error = |f_L − data| + 1.22 · eb

[Figure: diagonal sampling within a block; penalty estimation]
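The selection rule can be sketched as below. The function name, the simplified diagonal sampling, and estimating Lorenzo on original (rather than decompressed) neighbor values are all illustrative assumptions; the penalty term compensates for the fact that the real Lorenzo predictor will see noisy decompressed neighbors, and the 1.22 factor follows the slide.

```python
import numpy as np

def estimate_errors(block, coeffs, eb, penalty=1.22):
    """Estimate both predictors' errors on diagonal sample points
    and pick the predictor with the smaller estimated error."""
    b0, b1, b2, b3 = coeffs
    n = min(block.shape)
    reg_err = 0.0
    lor_err = 0.0
    for i in range(1, n):                       # diagonal sampling, skip border
        d = block[i, i, i]
        # regression: uses only the stored coefficients, no decompressed data
        reg_err += abs(b0 * i + b1 * i + b2 * i + b3 - d)
        # Lorenzo on original neighbors, plus a penalty for the noise the
        # real predictor will see in decompressed neighbors
        f_l = (block[i-1, i, i] + block[i, i-1, i] + block[i, i, i-1]
               - block[i-1, i-1, i] - block[i-1, i, i-1] - block[i, i-1, i-1]
               + block[i-1, i-1, i-1])
        lor_err += abs(f_l - d) + penalty * eb
    return ('regression' if reg_err < lor_err else 'lorenzo',
            reg_err, lor_err)
```

On a perfectly linear block both raw errors are zero, so the penalty term breaks the tie in favor of regression, which is exactly the intended behavior under large error bounds.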

SLIDE 22

New Design – Adaptive Selection

• Sampling + prediction error estimation
  - Regression error can be estimated accurately
    · error = |β0·x + β1·y + β2·z + β3 − data|
  - Add a penalty term for Lorenzo
    · error = |f_L − data| + 1.22 · eb

[Figure: rate-distortion on ISABEL (overall) and on TCf48 (single field), plus the percentage of regression blocks vs. bit-rate, comparing our solution with and without the penalty coefficient against all-regression]

SLIDE 23

Evaluation – Experiments Setup

• Evaluated datasets
  - CESM-ATM: 2D, climate simulation
  - Hurricane-ISABEL: 3D, meteorological simulation
  - NYX: 3D, cosmological simulation
  - SCALE-LETKF: 3D, weather simulation
• Experimental platform
  - Bebop cluster at ANL: up to 128 nodes, each with two Intel Xeon E5-2695 v4 processors (16 cores per processor) and 128 GB of memory

SLIDE 24

Evaluation – Rate Distortion

[Figure: rate-distortion curves (PSNR vs. bit rate) on CESM, Hurricane, NYX, and SCALE-LETKF, comparing OurSol with SZ, ZFP, VAPOR, FPZIP, ISABELA, and TTHRESH]

SLIDE 25

Evaluation – Hurricane ISABEL, with visualization of CLOUDf48 (compression ratio 64)

[Figure: visual comparison of the original data, down-sampling + tricubic interpolation, ZFP decompressed, SZ-1.4 decompressed, and OurSol decompressed, plus the overall rate-distortion curve]

SLIDE 26

Evaluation – NYX, with visualization of dark_matter_density (compression ratio 64)

[Figure: visual comparison of the original data, down-sampling + tricubic interpolation, ZFP decompressed, SZ-1.4 decompressed, and OurSol decompressed, plus the overall rate-distortion curve]

SLIDE 27

Evaluation – Dumping and Loading Performance for NYX

• Data dumping time: time to dump data to the parallel file system
  - Compression time + time to write the compressed file
• Data loading time: time to load data from the parallel file system
  - Time to read the compressed file + decompression time

[Figure: data dumping time (compression + writing) and data loading time (reading + decompression) for SZ, ZFP, and OurSol on 2048, 4096, and 8192 cores]

SLIDE 28

Conclusion

• We propose an adaptive lossy compression framework that divides a dataset into blocks and selects the best-fit predictor for each block
• We propose two new predictors to mitigate the impact of decompressed data in the traditional Lorenzo predictor under large error bounds
• We develop an algorithm to adaptively select the best-fit predictor
• We evaluate the proposed method against six other state-of-the-art lossy compressors in terms of rate distortion, show the visual distortion of some sample fields, and conduct parallel experiments measuring dumping/loading performance on a parallel file system

SLIDE 29

Thanks

This research was supported by the Exascale Computing Project (17-SC-20-SC), a joint project of the U.S. Department of Energy's Office of Science and National Nuclear Security Administration, responsible for delivering a capable exascale ecosystem, including software, applications, and hardware technology, to support the nation's exascale computing imperative.

SLIDE 30

Thank you ! Any Questions?