Deep Learning for Image and Video Compression
Yao Wang
Dept. of Electrical and Computer Engineering
NYU Wireless, Tandon School of Engineering, New York University
wp.nyu.edu/videolab
AOMedia Research Symposium, Oct. 2019, San Francisco
- Learnt image compression using variational autoencoders
  - Framework of Balle et al.
  - Improvement using non-local attention maps and masked 3D convolution for conditional entropy coding (with Zhan Ma, Nanjing Univ.)
  - Scalable extension
- Learnt video compression (with Zhan Ma, Nanjing Univ.)
- Exploratory work:
  - Video prediction using dynamic deformable filters
  - Block-based image compression by denoising with side information
y: features describing the image
z (hyperpriors): features for estimating the parameters of the marginal probability model for y (standard deviation of a zero-mean Gaussian)
[Balle2018] J. Balle, D. Minnen, S. Singh, S. J. Hwang, and N. Johnston, "Variational image compression with a scale hyperprior," ICLR 2018.
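The hyperprior's role can be made concrete with a small sketch: if the hyper-decoder predicts a standard deviation sigma for a quantized latent, the bit cost of that latent is the negative log of the Gaussian mass over its quantization bin. The helper names below are illustrative, not from the paper's code.

```python
import math

def gaussian_cdf(x, sigma):
    """CDF of a zero-mean Gaussian with standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def latent_bits(y_hat, sigma, eps=1e-9):
    """Estimated bits to code one quantized latent y_hat: negative log2 of the
    Gaussian probability mass over the bin [y_hat - 0.5, y_hat + 0.5]."""
    p = gaussian_cdf(y_hat + 0.5, sigma) - gaussian_cdf(y_hat - 0.5, sigma)
    return -math.log2(max(p, eps))

# A latent that matches the predicted scale is cheap; an outlier is expensive.
print(latent_bits(0.0, 1.0))   # about 1.4 bits
print(latent_bits(5.0, 1.0))   # far more bits
```

A better sigma prediction concentrates probability mass on the latents that actually occur, which is exactly how the hyperprior reduces the rate.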
Context model: adjacent previously coded pixels in the current channel, plus all previously coded channels. The hyperprior and the context are used jointly to estimate the probability model parameters (mean and standard deviation).
[Minnen2018] D. Minnen, J. Balle, and G. D. Toderici, "Joint autoregressive and hierarchical priors for learned image compression," NIPS 2018.
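The causality constraint of such a context model can be expressed as a binary mask on a 3D convolution kernel: only positions that precede the current pixel in (channel, row, column) coding order are visible. This is a generic sketch of that mask, not the paper's implementation.

```python
import numpy as np

def causal_mask_3d(kc, kh, kw):
    """Binary mask for a masked 3D convolution kernel: only positions that
    strictly precede the center in (channel, row, column) raster order are
    visible, so the entropy model never peeks at not-yet-coded values."""
    mask = np.zeros((kc, kh, kw), dtype=np.float32)
    center = (kc // 2, kh // 2, kw // 2)
    for c in range(kc):
        for h in range(kh):
            for w in range(kw):
                if (c, h, w) < center:   # lexicographic = raster-order precedence
                    mask[c, h, w] = 1.0
    return mask

m = causal_mask_3d(3, 3, 3)
# The center itself and all later positions are masked out.
assert m[1, 1, 1] == 0.0
```

Multiplying the kernel weights by this mask before each convolution keeps training (parallel) consistent with decoding (sequential).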
[Architecture diagram: main encoder/decoder and hyper encoder/decoder; no GDN layers.]
Liu, H.; Chen, T.; Guo, P.; Shen, Q.; Cao, X.; Wang, Y.; and Ma, Z. 2019. Non-local attention optimized deep image compression. arXiv:1904.09757.
- Train a different model for each bit-rate point using a particular λ, minimizing the rate-distortion cost R + λ·D
- Hard to deploy in networked applications
  - Multiple encoder/decoder pairs are needed to serve different bandwidths
  - Not scalable: low-rate bit streams cannot be shared among users with different bandwidths
Auto-Encoder, MIPR 2019. Best student paper award
[11]: Balle et al., ICLR 2017; [13]: Balle et al., ICLR 2018
The scalable coder performs similarly to the non-scalable [11] over the entire rate range in MS-SSIM, and is competitive or better at lower rates in PSNR.
[RD plot: PSNR (dB) vs. rate (bits/pixel), comparing BPG (4:4:4, HM), BPG (4:4:4, x265), [11] (optimized for MSE), [13] (optimized for MSE and for MS-SSIM), and the proposed method (optimized for MSE and for MS-SSIM).]
DVC: an end-to-end video coding framework with neural networks that learns motion estimation, motion compression, and residual compression. It outperforms H.264 in PSNR and MS-SSIM, and is on par with or better than H.265 in MS-SSIM at high rates.
Guo Lu, et al. “DVC: An End-to-End Deep Video Compression Framework”, CVPR2019. https://github.com/GuoLusjtu/DVC
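A minimal numpy sketch of the overall pipeline, assuming (unlike the real system) integer-displacement motion in place of learned optical flow and uniform quantization in place of the learned residual codec:

```python
import numpy as np

def shift(frame, dy, dx):
    """Integer-displacement motion compensation (a crude stand-in for
    warping with a learned optical-flow field)."""
    return np.roll(np.roll(frame, dy, axis=0), dx, axis=1)

def dvc_step(prev_rec, cur, dy, dx, q=4.0):
    """One toy coding step: motion-compensated prediction plus a coarsely
    quantized residual (quantization stands in for the residual codec)."""
    pred = shift(prev_rec, dy, dx)
    residual = cur - pred
    residual_hat = np.round(residual / q) * q   # lossy residual coding
    return pred + residual_hat

prev = np.tile(np.arange(8.0), (8, 1))
cur = shift(prev, 0, 1)                  # current frame is prev shifted by 1 px
rec = dvc_step(prev, cur, 0, 1)
print(np.abs(rec - cur).max())           # 0.0: motion fully captures the change
```

In DVC itself, every stage of this loop (flow estimation, flow compression, warping, residual compression) is a network, so the whole chain can be trained against a single rate-distortion loss.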
Proposed approach [Lu2019]
Liu, H., Chen, T., Lu, M., Shen, Q., & Ma, Z. (2019). Neural Video Compression using Spatio-Temporal Priors. arXiv preprint arXiv:1902.07383. (Preliminary version)
Hidden state reflecting history of flow features
Dai, Jifeng, et al. "Deformable convolutional networks.” CVPR 2017.
[Diagram: dynamic filtering layer — a parameter-generating network produces filters from input A, and the dynamic filtering layer applies them to input B to produce the output.]
Jia, Xu, et al., "Dynamic filter networks," NIPS 2016 (DFN). Using a very large filter size could achieve the same effect as a deformable filter.
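The core DFN operation is easy to state: each output pixel is an inner product of its local input patch with a filter predicted for that pixel by a separate network. A minimal sketch (the function name is illustrative, and the filters are fixed here rather than network-generated):

```python
import numpy as np

def apply_dynamic_filters(image, filters):
    """Apply a per-pixel k x k filter, as in Dynamic Filter Networks:
    filters has shape (H, W, k, k), one filter per output pixel."""
    H, W = image.shape
    k = filters.shape[2]
    pad = k // 2
    padded = np.pad(image, pad, mode="edge")
    out = np.empty_like(image)
    for i in range(H):
        for j in range(W):
            patch = padded[i:i + k, j:j + k]
            out[i, j] = (patch * filters[i, j]).sum()
    return out

H, W, k = 4, 4, 3
img = np.arange(16.0).reshape(H, W)
# Identity filters: weight 1 at the center tap reproduces the input exactly.
f = np.zeros((H, W, k, k))
f[:, :, k // 2, k // 2] = 1.0
assert np.allclose(apply_dynamic_filters(img, f), img)
```

Because the filters vary per pixel, the layer can express arbitrary local displacements within the kernel support, which is why a very large kernel can mimic a deformable filter, at a much higher cost.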
- Use past frames to generate deformable filters (no need to send side information)
- Each pixel is predicted as a weighted average of multiple displaced pixels
[Diagram: input frames → encoder → decoder → offsets and filters → output frame.]
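The per-pixel prediction rule can be sketched directly: each output pixel gathers N displaced samples and blends them with learned weights. In the real model the offsets and weights come from the network that watches past frames; here they are set by hand for illustration, and nearest-neighbor sampling replaces bilinear interpolation.

```python
import numpy as np

def deformable_predict(frame, offsets, weights):
    """Predict each pixel as a weighted average of N displaced samples.
    offsets: (H, W, N, 2) integer (dy, dx) per sample; weights: (H, W, N).
    Both would be generated from past frames by the prediction network."""
    H, W = frame.shape
    N = weights.shape[2]
    out = np.zeros_like(frame)
    for i in range(H):
        for j in range(W):
            for n in range(N):
                dy, dx = offsets[i, j, n]
                y = int(np.clip(i + dy, 0, H - 1))   # clamp samples to the frame
                x = int(np.clip(j + dx, 0, W - 1))
                out[i, j] += weights[i, j, n] * frame[y, x]
    return out

H, W, N = 4, 4, 2
frame = np.arange(16.0).reshape(H, W)
offsets = np.zeros((H, W, N, 2), dtype=int)
offsets[:, :, 1] = (0, 1)                 # second sample looks one pixel right
weights = np.full((H, W, N), 0.5)         # equal blend of the two samples
pred = deformable_predict(frame, offsets, weights)
```

Unlike a fixed-grid dynamic filter, the sampling locations themselves are predicted, so a small N can cover large motions.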
Ground Truth Deform-DFN (kernel size 3) Deform-DFN (kernel size 5) DFN (kernel size 3) DFN (kernel size 5) DFN (kernel size 9) Ground Truth Ground Truth Deform-DFN (kernel size 3) Deform-DFN (kernel size 3)
Use the past 10 frames to predict the next 10 frames recursively.
Max filter weight (mapping from the green spot in the last frame to the white spot in the next frame).
[Prediction examples at t = 0, 2, …, 18: input frames, predicted frames, and ground truth.]
KTH Action Classification dataset
PSNR vs. number of latent-feature channels (N):

  bpp    N    PSNR (dB)
  0.06   4    26.4
  0.12   8    27.87
  0.18   12   28.47
  0.25   16   29.2
  0.5    32   30.7
enable end-to-end RD optimization
- https://wp.nyu.edu/videolab/
- http://vision.nju.edu.cn/index.php
- Chuanmin Jia, visiting student from Beijing Univ.