LFZip: Lossy compression of multivariate time series data via improved prediction (PowerPoint PPT Presentation)

SLIDE 1

LFZip: Lossy compression of multivariate time series data via improved prediction

Shubham Chandak, Stanford University
DCC 2020, Paper ID: 111

SLIDE 2

Joint work with

  • Kedar Tatwawadi, Stanford
  • Tsachy Weissman, Stanford
  • Chengtao Wen, Siemens
  • Max Wang, Siemens
  • Juan Aparicio, Siemens
SLIDE 3

Outline

  • Motivation
  • Problem formulation and our contribution
  • Previous work
  • Methods
  • Results
  • Conclusions and future work
SLIDE 4

Motivation

  • Sensors are omnipresent: generating vast amounts of data
  • Data usually in form of real-valued time series

[Figure: Nanopore genome sequencing]

Figure credits: https://directorsblog.nih.gov/2018/02/06/sequencing-human-genome-with-pocket-sized-nanopore-device/ and https://semielectronics.com/sensors-lifeblood-internet-things/

SLIDE 5

Motivation

  • Floating-point time series data typically noisy
  • Lossy compression can lead to vast gains without affecting performance of downstream applications

  • Multivariate time series
  • Different variables can have correlations
  • Compression performed on computationally constrained devices
  • Low CPU and memory usage (streaming compression)
SLIDE 6

Problem formulation

𝑦!, 𝑦", … , 𝑦# $ 𝑦!, $ 𝑦", … , $ 𝑦#

Compress

compressed bitstream

Decompress

Compression ratio =

!×# $%&' () *(+,-'..'/ 0%1.-'2+ %# 031'.

32-bit floats Error constraint: max

%45,…,# 𝑦% − &

𝑦% ≤ 𝜗 Maximum absolute error
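
A minimal Python sketch (not from the paper; the function names are illustrative) of the two quantities defined above, assuming 32-bit float input (4 bytes per sample):

```python
import numpy as np

def compression_ratio(n, compressed_size_bytes):
    """Ratio of raw size (4 bytes per 32-bit float sample) to compressed size."""
    return (4 * n) / compressed_size_bytes

def meets_error_bound(y, y_hat, theta):
    """Check the constraint max_t |y_t - y_hat_t| <= theta."""
    return float(np.max(np.abs(np.asarray(y) - np.asarray(y_hat)))) <= theta

# Example: 1,000,000 samples compressed to 500,000 bytes gives a ratio of 8.0
print(compression_ratio(10**6, 5 * 10**5))
```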

SLIDE 7

Our contribution

  • LFZip: Lossy compressor for time series data
  • Works with user-specified maximum absolute error
  • Multivariate time series compression
  • Based on prediction-quantization-entropy coder framework
  • Normalized Least Mean Squares (NLMS) prediction
  • Neural Network prediction
  • Significant improvement for a variety of datasets
  • Open source: https://github.com/shubhamchandak94/LFZip
SLIDE 8

Previous work

  • Swinging door and critical aperture
  • retain a subset of the points in the time series based on the maximum error constraint and use linear interpolation during decompression

  • SZ, ISABELA, NUMARCK
  • polynomial/linear regression model followed by quantization
  • SZ current state-of-the-art
  • Bristol, E. H. "Swinging door trending: Adaptive trend recording?" ISA National Conference Proceedings, 1990.
  • Williams, George Edward. "Critical aperture convergence filtering and systems and methods thereof." U.S. Patent No. 7,076,402. 11 Jul. 2006.
  • Liang, Xin, et al. "An efficient transformation scheme for lossy data compression with point-wise relative error bound." 2018 IEEE International Conference on Cluster Computing (CLUSTER). IEEE, 2018.
  • Lakshminarasimhan, Sriram, et al. "ISABELA for effective in situ compression of scientific data." Concurrency and Computation: Practice and Experience 25.4 (2013): 524-540.
  • Chen, Zhengzhang, et al. "NUMARCK: machine learning algorithm for resiliency and checkpointing." SC '14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 2014.

SLIDE 9

Encoder architecture

SLIDE 10

Decoder architecture

SLIDE 11

Predictor

  • Predict based on past window (default 32 steps)
  • NLMS (normalized least mean square)
  • Adaptively trained (gradient descent) after every step
  • Multivariate: predict based on past values for all variables
SLIDE 12

Predictor

  • Predict based on past window (default 32 steps)
  • NLMS (normalized least mean square)
  • Adaptively trained (gradient descent) after every step (see the sketch after this list)
  • Multivariate: predict based on past values for all variables
  • NN (neural network)
  • Offline training performed on separate dataset
  • We tested fully connected (FC) and RNN models (results shown for FC)
  • To simulate quantization error during training, we add random noise
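
Below is an illustrative univariate sketch of the NLMS predictor described above (window of past samples, normalized gradient-descent update after every step). This is not the LFZip implementation; the class name and the step-size value `mu` are assumptions. In the codec, the history would hold reconstructed values so that encoder and decoder stay synchronized, and in the multivariate case the window would stack past values of all variables.

```python
import numpy as np

class NLMSPredictor:
    """Illustrative NLMS filter: predict the next sample from a window of past
    values, then update the weights with a normalized gradient step."""

    def __init__(self, window=32, mu=0.5, eps=1e-6):
        self.window = window
        self.mu = mu      # adaptation step size (illustrative choice)
        self.eps = eps    # small constant to avoid division by zero
        self.w = np.zeros(window)

    def predict(self, history):
        """Linear prediction from the most recent `window` samples."""
        x = np.asarray(history[-self.window:], dtype=float)
        if len(x) < self.window:                  # not enough context yet
            return float(x[-1]) if len(x) > 0 else 0.0
        return float(self.w @ x)

    def update(self, history, target):
        """Normalized LMS (gradient-descent) update toward the observed value."""
        x = np.asarray(history[-self.window:], dtype=float)
        if len(x) < self.window:
            return
        err = target - self.w @ x
        self.w += self.mu * err * x / (self.eps + x @ x)
```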
SLIDE 13

Quantizer and entropy coder

  • Prediction error: $\Delta_t = y_t - z_t$, where $z_t$ is the prediction
  • 16-bit uniform quantization of $\Delta_t$ with step size $2\vartheta$, producing $\hat{\Delta}_t$; reconstruction $\hat{y}_t = z_t + \hat{\Delta}_t$
  • If the prediction error exceeds the quantizer range (approximately $2^{16}\vartheta$), set $\hat{y}_t = y_t$

SLIDE 14

Quantizer and entropy coder

  • Prediction error: $\Delta_t = y_t - z_t$, where $z_t$ is the prediction
  • 16-bit uniform quantization of $\Delta_t$ with step size $2\vartheta$, producing $\hat{\Delta}_t$; reconstruction $\hat{y}_t = z_t + \hat{\Delta}_t$
  • If the prediction error exceeds the quantizer range (approximately $2^{16}\vartheta$), set $\hat{y}_t = y_t$
  • Entropy coding: BSC (https://github.com/IlyaGrebnov/libbsc)
  • High-performance compressor based on the BWT

SLIDE 15

Results: datasets

SLIDE 16

Results: datasets

SLIDE 17

Results: datasets

SLIDE 18

Results: univariate (NLMS prediction)

SLIDE 19

Results: univariate (NLMS prediction)

[Plot annotations: "LFZip performs better" / "LFZip performs worse"]

SLIDE 20

Results: univariate (NN prediction)

SLIDE 21

Results: multivariate (NLMS prediction)

SLIDE 22

Results: computation

  • LFZip (NLMS): ~2M timesteps/s for univariate
  • Slower than SZ but practical for most applications
  • LFZip (NN): ~1K timesteps/s for the fully connected model used
  • Run single-threaded on a CPU to allow reproducibility
  • Requires further optimizations for practical usage
SLIDE 23

Conclusions and future work

  • LFZip: error-bounded lossy compressor for multivariate floating-point time series

  • Based on prediction-quantization-entropy coder framework
  • Achieve improved compression using NLMS and NN models
SLIDE 24

Conclusions and future work

  • LFZip: error-bounded lossy compressor for multivariate floating-point time series

  • Based on prediction-quantization-entropy coder framework
  • Achieve improved compression using NLMS and NN models
  • Future work includes
  • optimized implementation for the neural network based framework
  • extension of the framework to multidimensional datasets
  • exploration of other predictive models to further boost compression
SLIDE 25

Thank You!

Check out https://github.com/shubhamchandak94/LFZip