Image/video compression: Basics and research issues Christine - - PowerPoint PPT Presentation


Image/video compression: Basics and research issues Christine GUILLEMOT Outline A few basics in source coding Practical use in standardized solutions Research issues Towards better transforms Towards better prediction


SLIDE 1

Image/video compression:

Basics and research issues

Christine GUILLEMOT

SLIDE 2

Outline

  • A few basics in source coding
  • Practical use in standardized solutions
  • Research issues
      • Towards better transforms
      • Towards better prediction
  • Inpainting-based compression

SLIDE 3

Compression: a few basics

SLIDE 4

Basics in source coding

Lossless rate bounds are a function of the source probability distribution
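To make the bound concrete: for a memoryless source, the entropy H = -Σ p log2 p is the minimum average number of bits per symbol that a lossless code can reach. The following is a minimal sketch, assuming a zeroth-order model estimated from symbol counts; the function name `entropy_bits` and the sample data are illustrative and do not come from the slides.

```python
import math
from collections import Counter

def entropy_bits(symbols):
    """Zeroth-order empirical entropy, in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Illustrative data: a skewed source (p = 3/4, 1/8, 1/8) and a uniform one (p = 1/4 each).
skewed = [0] * 6 + [1] + [2]
uniform = [0, 1, 2, 3] * 2

print(entropy_bits(skewed))   # lower bound for the skewed source
print(entropy_bits(uniform))  # lower bound for the uniform source
```

The skewed source has the lower entropy, so it can be coded with fewer bits per symbol than the uniform one.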

SLIDE 5

Basics in source coding

SLIDE 6

Basics in source coding

How to « optimally » encode separately dependent symbols?

Lossless coding: limits in terms of compression factor (on the order of 2 to 3 for natural images, and 3 to 4 for video)

SLIDE 7

Basics in source coding

To further decrease the bit rate, one has to tolerate distortion => Lossy compression under a rate or distortion constraint

Uniform scalar quantization + entropy coding

[Figure: rate-distortion curve R(D), rate R versus distortion D. Labels: source information, entropy, redundancy, information not relevant, useful information, maximum distortion]

This scheme would be quasi-optimal if pixels were independent
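The rate/distortion trade-off of uniform scalar quantization followed by entropy coding can be sketched as follows. This is only an illustration under stated assumptions: the samples are independent Gaussian values, the rate is approximated by the empirical entropy of the quantization indices, and all names (`quantize`, `rate_distortion`, the step sizes) are hypothetical.

```python
import math
import random
from collections import Counter

def quantize(x, step):
    """Uniform scalar quantizer: index of the nearest multiple of `step`."""
    return round(x / step)

def dequantize(q, step):
    return q * step

def entropy_bits(symbols):
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def rate_distortion(samples, step):
    """Return (rate in bits/sample, mean squared error) for a given step."""
    indices = [quantize(x, step) for x in samples]
    mse = sum((x - dequantize(q, step)) ** 2
              for x, q in zip(samples, indices)) / len(samples)
    return entropy_bits(indices), mse

rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # assumed i.i.d. source
for step in (0.25, 0.5, 1.0):
    rate, mse = rate_distortion(samples, step)
    print(f"step={step}: rate ~ {rate:.2f} bits/sample, MSE ~ {mse:.4f}")
```

A larger step lowers the rate and raises the distortion, which traces points along an operational R(D) curve.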

SLIDE 8

Basics in source coding

How to address the dependency between symbols? Transform the pixels into independent data
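As a small illustration of what such a transform buys, the sketch below applies an orthonormal 1D DCT-II to a smooth row of pixels and compares how much of the energy sits in the two largest values before and after the transform. The pixel values and the helper names (`dct_ii`, `energy_share`) are assumptions made for this example.

```python
import math

def dct_ii(block):
    """Orthonormal 1D DCT-II of a list of samples."""
    n = len(block)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                                  for i, x in enumerate(block)))
    return coeffs

def energy_share(values, count):
    """Fraction of the total energy held by the `count` largest-magnitude values."""
    energies = sorted((v * v for v in values), reverse=True)
    return sum(energies[:count]) / sum(energies)

pixels = [100, 102, 104, 107, 109, 112, 114, 115]  # smooth, highly correlated row
coeffs = dct_ii(pixels)

print(energy_share(pixels, 2))  # energy is spread over all pixels
print(energy_share(coeffs, 2))  # energy is concentrated in very few coefficients
```

Because the transform is orthonormal, the total energy is unchanged; it is only redistributed into a few coefficients, which is what makes separate scalar quantization of the coefficients effective.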

SLIDE 9

Basics in source coding

Discrete Wavelet Transform

  • Classical transforms: discrete cosine transform, discrete wavelet transform

SLIDE 10

Basics in source coding

Suppressing dependencies further and better: prediction
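A minimal sketch of the idea, assuming the simplest possible predictor (each sample is predicted by its left neighbour, as in lossless DPCM): the residual has a much more peaked distribution than the original samples, so its empirical entropy is lower. The row of samples and the function names are illustrative only.

```python
import math
from collections import Counter

def entropy_bits(symbols):
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def dpcm_residual(samples):
    """Predict each sample by its left neighbour; the first sample is kept as-is."""
    return [samples[0]] + [cur - prev for prev, cur in zip(samples, samples[1:])]

def dpcm_reconstruct(residual):
    """Invert dpcm_residual by accumulating the prediction errors."""
    out = [residual[0]]
    for r in residual[1:]:
        out.append(out[-1] + r)
    return out

row = [50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61]  # slowly varying samples
residual = dpcm_residual(row)

print(entropy_bits(row))       # every value is distinct
print(entropy_bits(residual))  # almost all residuals are identical
```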

SLIDE 11

Basics in source coding

In summary

SLIDE 12

Practical use of these concepts in standardized solutions

SLIDE 13

Three decades of standards development … guided by the same concepts

JPEG JPEG-2000

SLIDE 14

… leading to a common framework

  • The same hybrid scheme (motion-compensated temporal prediction + DCT) over the years

SLIDE 15

First key ingredient: motion-compensated temporal prediction

  • Exploiting pixel dependency in the temporal dimension
  • With many optimizations over the years (e.g. multiple reference frames)
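Motion-compensated temporal prediction can be illustrated with a full-search block-matching sketch. This is not the search used by any particular standard or encoder: it assumes integer-pixel motion, a single reference frame and a sum-of-absolute-differences (SAD) criterion, and the frames are synthetic. The motion vector is expressed as the displacement from the current block to its best match in the reference frame.

```python
def get_block(frame, top, left, size):
    return [row[left:left + size] for row in frame[top:top + size]]

def sad(block_a, block_b):
    """Sum of absolute differences between two blocks of equal size."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def motion_search(cur, ref, top, left, size, search_range):
    """Full-search block matching: return (dy, dx, sad) with the lowest SAD."""
    target = get_block(cur, top, left, size)
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= height - size and 0 <= x <= width - size:
                cost = sad(target, get_block(ref, y, x, size))
                if best is None or cost < best[2]:
                    best = (dy, dx, cost)
    return best

# Synthetic frames: the current frame is the reference moved 1 pixel down
# and 2 pixels right (uncovered border pixels are set to 0).
dim = 16
ref = [[16 * y + x for x in range(dim)] for y in range(dim)]
cur = [[ref[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(dim)]
       for y in range(dim)]

print(motion_search(cur, ref, top=6, left=6, size=4, search_range=3))
```

With a perfect match the residue is zero, so only the motion vector needs to be transmitted for that block.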

SLIDE 16

Second key ingredient: Spatial prediction

  • Exploiting dependency in the spatial dimension (H.264)
  • If the prediction is efficient, the difference between the original and the prediction (the residue) consists of independent samples
  • Many optimizations over the years (up to 35 modes in HEVC)
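A simplified sketch of spatial (intra) prediction with mode selection follows. It assumes only three modes (DC, vertical, horizontal) and a SAD criterion; the actual standards define many more modes and select them with a rate-distortion criterion. The block and neighbour values are made up for the example.

```python
def intra_predict(top, left, mode):
    """Predict an n x n block from the row above (`top`) and the column to the left (`left`)."""
    n = len(top)
    if mode == "vertical":      # copy the row above down each column
        return [list(top) for _ in range(n)]
    if mode == "horizontal":    # copy the left column across each row
        return [[left[y]] * n for y in range(n)]
    if mode == "dc":            # flat block at the mean of the neighbours
        dc = round((sum(top) + sum(left)) / (2 * n))
        return [[dc] * n for _ in range(n)]
    raise ValueError(f"unknown mode: {mode}")

def best_mode(block, top, left):
    """Return (mode, SAD) of the prediction mode with the smallest residue."""
    costs = {}
    for mode in ("dc", "vertical", "horizontal"):
        pred = intra_predict(top, left, mode)
        costs[mode] = sum(abs(b - p)
                          for row_b, row_p in zip(block, pred)
                          for b, p in zip(row_b, row_p))
    mode = min(costs, key=costs.get)
    return mode, costs[mode]

block = [[10, 20, 30, 40] for _ in range(4)]  # vertical stripes
top = [10, 20, 30, 40]                        # reconstructed row above the block
left = [5, 5, 5, 5]                           # reconstructed column left of the block

print(best_mode(block, top, left))
```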

SLIDE 17

Third key ingredient: Transform + joint RD optim

  • Transform: a simple block transform (DCT) with R-D optimized support
  • With a joint rate-distortion optimization of prediction and transform support to adapt to local image characteristics (flat regions, contours, texture, ...)
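Rate-distortion optimized decisions are commonly formulated as the minimization of a Lagrangian cost J = D + λR over the candidate coding options. The sketch below assumes that formulation; the candidate names and their distortion and rate figures are hypothetical numbers chosen for illustration.

```python
def rd_select(candidates, lam):
    """Return the candidate minimising the Lagrangian cost J = D + lam * R."""
    return min(candidates, key=lambda c: c["distortion"] + lam * c["rate"])

# Hypothetical measurements for one block (distortion as a squared error, rate in bits).
candidates = [
    {"name": "8x8 transform", "distortion": 40.0, "rate": 100.0},
    {"name": "4x4 transform", "distortion": 10.0, "rate": 180.0},
]

for lam in (0.1, 1.0):
    print(lam, rd_select(candidates, lam)["name"])
```

A small λ favours the low-distortion option, while a large λ favours the cheaper one, which is how the same encoder adapts its choices to the target bit rate.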

SLIDE 18

Fourth key ingredient: entropy coding

  • Higher-order statistics to exploit remaining dependencies
      • Context modeling
      • On-line learning of probability laws
      • Binarization followed by arithmetic coding
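The benefit of context modeling with probabilities learned on-line can be sketched by computing the ideal arithmetic-coding cost, -log2 p, of each bit under an adaptive model. This is an illustration only, not the entropy coder of any standard: it assumes an already binarized input, Laplace-smoothed counts, and a context made of the previous bits.

```python
import math

def adaptive_code_length(bits, order):
    """Ideal arithmetic-coding cost in bits, with probabilities learned on the fly.

    The context of each bit is the tuple of the `order` previous bits.
    """
    counts = {}
    total = 0.0
    for i, bit in enumerate(bits):
        context = tuple(bits[max(0, i - order):i])
        c = counts.setdefault(context, [1, 1])   # smoothed counts of 0s and 1s
        total += -math.log2(c[bit] / (c[0] + c[1]))
        c[bit] += 1                              # on-line update after coding
    return total

bits = [0, 1] * 200  # strongly dependent: each bit is the complement of the previous one

print(adaptive_code_length(bits, order=0))  # no context: about 1 bit per symbol
print(adaptive_code_length(bits, order=1))  # one-bit context: far fewer bits
```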
SLIDE 19

Performance evolution of video compression over the years

SLIDE 20

Research Issues: Towards better transforms

  • Anisotropic transforms
  • Graph-based transforms
  • Sparse approximations
SLIDE 21

Block-based Transforms limitations

  • Assuming an image is a piecewise smooth function, i.e., it contains sharp boundaries between smooth regions
  • Block-based transforms are limited when blocks contain arbitrarily shaped discontinuities
  • 2D separable wavelets are well adapted to point singularities only, and not so well to smooth boundaries (contours), whereas 2D images contain mostly line and curve singularities

[Figure: super-pixels obtained with the SLIC method]

=> Design of alternative transforms like curvelets, bandelets, oriented wavelets etc., or graph-based transforms
SLIDE 22

Bandelets [E. Le Pennec & S. Mallat 2003]

Using modified (warped) orthogonal wavelets in the flow direction

To perform a transform on smooth functions

Quad-tree segmentation

Each arrow is a vector orienting the support of the wavelet transform

Estimation of the geometrical flow:

  • Sample geometry (green lines)
  • Warped 1D filtering

[Figure labels: sub-square, 1D signal, 1D wavelet transform]

SLIDE 23

Bandelets [E. Le Pennec & S. Mallat 2003]

[Figure panels: original; 0.44 bpp; bandelets (0.2 bpp); wavelets (0.2 bpp)]

SLIDE 24

Lifting scheme of the 1D-wavelet transform

Generalization to 2D

Separation of the square grid into 2 quincunx cosets Iteration of the splitting on one of the grids

Oriented wavelet transforms

[ V. Chappelier & C. Guillemot TIP-2006]
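The lifting scheme of the 1D wavelet transform splits the signal into even and odd samples, predicts the odd samples from the even ones, then updates the even samples with the prediction errors. The sketch below assumes one decomposition level, an even-length signal, predict and update weights of 1/2 and 1/4 (as in the 5/3 wavelet), and a simple sample-repetition rule at the borders; it illustrates the principle only and is not the oriented transform of the cited paper.

```python
def lifting_forward(signal):
    """One lifting level: split, predict (detail band), update (approximation band)."""
    even, odd = signal[0::2], signal[1::2]
    n = len(odd)
    # Predict: each odd sample from the mean of its two even neighbours.
    detail = [odd[i] - (even[i] + even[min(i + 1, n - 1)]) / 2 for i in range(n)]
    # Update: correct the even samples with the neighbouring details.
    approx = [even[i] + (detail[max(i - 1, 0)] + detail[i]) / 4 for i in range(n)]
    return approx, detail

def lifting_inverse(approx, detail):
    """Undo the update, then the predict step, then merge the two cosets."""
    n = len(detail)
    even = [approx[i] - (detail[max(i - 1, 0)] + detail[i]) / 4 for i in range(n)]
    odd = [detail[i] + (even[i] + even[min(i + 1, n - 1)]) / 2 for i in range(n)]
    signal = [0.0] * (2 * n)
    signal[0::2], signal[1::2] = even, odd
    return signal

signal = [0, 2, 4, 6, 8, 10, 12, 14]  # a linear ramp
approx, detail = lifting_forward(signal)
print(detail)                          # zero wherever the linear prediction is exact
print(lifting_inverse(approx, detail))
```

Each lifting step is inverted by applying the same operation with the opposite sign, so perfect reconstruction holds whatever predictor is used. This is the property that allows the prediction direction to be changed, for instance to follow edges.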

SLIDE 25

Oriented wavelet transforms

Multi-scale quincunx sampling pyramid

Downsampling by a factor of at each scale

Lk

{0,1} either square or quincunx grids 

Orientation of the 1D wavelets along edges with binary orientations [ V. Chappelier & C. Guillemot TIP-2006]

SLIDE 26

Oriented wavelet transforms

Better preservation of directional frequencies [V. Chappelier & C. Guillemot TIP-2006]

LL0-wavelet L1-wavelet

SLIDE 27

The field of transform design is reviving with graph-based transforms [Kim et al. 2012, Shuman et al. 2013, Hu et al. 2015]

[Figure labels: signal values, pixels]

SLIDE 28

Towards graph-based transforms

  • Real Symmetric matrix
  • Laplacian operator: difference operator

Characterization of the graph [Kim et al. 2012, Shuman et al. 2013, Hu et al. 2015]

SLIDE 29

Towards graph-based transforms

[Kim et al. 2012, Shuman et al. 2013, Hu et al. 2015]

  • Normalized Laplacian: weights normalized by
  • The Laplacian of the graph
      • Has a complete set of eigenvectors:
      • Associated with real non-negative eigenvalues (defining the spectrum of the graph)

SLIDE 30

Towards graph-based transforms

[Shuman et al. 2013]

  • The eigenvectors associated with the eigenvalues carry a notion of frequency. The eigenvector associated with the eigenvalue 0 is constant, whereas an eigenvector associated with a higher eigenvalue varies more over the vertices of the graph.
  • The number of zero crossings is higher with a higher eigenvalue. This is analogous to classical Fourier analysis, where a higher f means faster oscillation (exponentials).
  • The eigenvectors of the Laplacian define the Graph Fourier Transform

[Figure labels: GFT, iGFT]
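These properties can be checked on the simplest graph, an unweighted path, chosen here as an assumption because its Laplacian eigenvectors are known in closed form (they are the DCT-II basis vectors). The sketch verifies L u = λ u, counts zero crossings, and computes the Graph Fourier Transform of a constant signal; the function names are illustrative.

```python
import math

def path_laplacian(n):
    """Combinatorial Laplacian L = D - A of an unweighted path graph with n vertices."""
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        lap[i][i] += 1.0
        lap[i + 1][i + 1] += 1.0
        lap[i][i + 1] -= 1.0
        lap[i + 1][i] -= 1.0
    return lap

def path_eigenpair(n, k):
    """Closed-form k-th eigenvalue and unit-norm eigenvector of the path Laplacian."""
    value = 2.0 - 2.0 * math.cos(math.pi * k / n)
    scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
    vector = [scale * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n)]
    return value, vector

def mat_vec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def zero_crossings(vector):
    return sum(1 for a, b in zip(vector, vector[1:]) if a * b < 0)

def gft(signal):
    """Graph Fourier Transform: projection of the signal on each eigenvector."""
    n = len(signal)
    return [sum(s * u for s, u in zip(signal, path_eigenpair(n, k)[1]))
            for k in range(n)]

n = 8
for k in range(n):
    eigenvalue, u = path_eigenpair(n, k)
    print(k, round(eigenvalue, 3), zero_crossings(u))

print(gft([3.0] * n))  # a constant signal only excites the eigenvalue-0 component
```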

SLIDE 31

Towards graph-based transforms

  • Active area of research
      • Wavelets on graphs via spectral graph theory [Hammond et al. 11]
      • Wavelet filterbanks [Narang & Ortega 12, Gadde et al. 13, …]
      • Overcomplete dictionaries on graphs [Zhang et al. 12, …]
  • Nevertheless a big issue in compression
      • Rate cost for signalling the graph structure
SLIDE 32

Sparse approximations for compression

  • Given an input vector $y \in \mathbb{R}^n$ and a dictionary $D \in \mathbb{R}^{n \times M}$, with $M > n$ and $D$ of full rank:

    $$\min_x \|x\|_0 \quad \text{s.t.} \quad y = Dx$$

      • $\|x\|_0$ is the $L_0$ norm of $x$; $D$ is the dictionary (its columns are the atoms $d_k$, with $\|d_k\|_2 = 1$)
      • Dimensions: $y$ is $n \times 1$, $D$ is $n \times M$, $x$ is $M \times 1$
      • The “basis” vectors are not required to be orthogonal

  • Finding an exact solution is difficult. In practice, approximate solutions are good enough:

    $$\min_x \|x\|_0 \quad \text{s.t.} \quad \|y - Dx\|_p \le \rho$$

  • Or, equivalently, given $D$ and $y$, a computationally tractable search algorithm for an approximate solution:

    $$\arg\min_x \|y - Dx\|_2^2 \quad \text{s.t.} \quad \|x\|_0 \le L$$

      • Greedy pursuit algorithms: MP [Mallat & Zhang (1993)], OMP [Pati 1993], OOMP, …
      • L2-L1 min (constrained least squares): BP denoising [Chen, Donoho, & Saunders (1995)]
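Among the greedy pursuit algorithms listed, matching pursuit (MP) is the simplest: at each iteration it selects the atom most correlated with the current residual and subtracts its contribution. The following is a minimal sketch, assuming unit-norm atoms stored as a list of columns; the small dictionary (three canonical vectors plus one diagonal atom) and the input vector are made up for the example. OMP would additionally re-fit all selected coefficients by least squares at each step, which is not done here.

```python
import math

def matching_pursuit(y, atoms, max_atoms):
    """Greedy sparse approximation of y over unit-norm atoms (columns of D).

    Returns the coefficient vector x and the final residual y - D x.
    """
    residual = list(y)
    x = [0.0] * len(atoms)
    for _ in range(max_atoms):
        # Correlation of every atom with the current residual.
        corr = [sum(r * d for r, d in zip(residual, atom)) for atom in atoms]
        k = max(range(len(atoms)), key=lambda j: abs(corr[j]))
        x[k] += corr[k]
        residual = [r - corr[k] * d for r, d in zip(residual, atoms[k])]
    return x, residual

s = 1 / math.sqrt(2)
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [s, s, 0.0]]  # M = 4 > n = 3
y = [2.0, 2.0, 0.5]

x, residual = matching_pursuit(y, atoms, max_atoms=2)
print(x)
print(residual)
```

In the canonical basis alone, y needs three non-zero coefficients; with the extra diagonal atom two are enough, which illustrates why sparsity depends on the dictionary.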

SLIDE 33

L1-minimization: Basis Pursuit (BP)

Chen, Donoho, & Saunders (1995)

Instead of solving

$$\min_x \|x\|_0 \quad \text{s.t.} \quad y = Dx$$

solve

$$\min_x \|x\|_1 \quad \text{s.t.} \quad y = Dx$$

  • The problem becomes convex (linear programming)
  • Very efficient solvers: interior point methods [Chen, Donoho, & Saunders (`95)], sequential shrinkage for union of ortho-bases [Bruce et al. (`98)], iterated shrinkage [Figueiredo & Nowak (`03), Daubechies, Defrise, & De Mol (`04), E. (`05), E., Matalon, & Zibulevsky (`06)].

Basis Pursuit Denoising (LASSO)

$$\min_x \tfrac{1}{2} \|y - Dx\|_2^2 + \lambda \|x\|_1$$

  • L1 regularization: quadratic programming
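The iterated shrinkage solvers mentioned above can be illustrated with a basic ISTA loop for the Basis Pursuit Denoising objective, written here as ½‖y − Dx‖² + λ‖x‖₁ (the symbol λ for the regularization weight is an assumption). Each iteration takes a gradient step on the quadratic term and then applies soft thresholding. This is a sketch: the step size must not exceed the inverse of the largest eigenvalue of DᵀD, and the dictionaries and parameters are example values.

```python
import math

def soft_threshold(v, t):
    """Shrink v towards zero by t (proximal operator of t * |v|)."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def synthesize(atoms, x, n):
    """Compute D x, with D given as a list of columns (atoms)."""
    return [sum(xj * atom[i] for xj, atom in zip(x, atoms)) for i in range(n)]

def objective(y, atoms, x, lam):
    approx = synthesize(atoms, x, len(y))
    return (0.5 * sum((yi - ai) ** 2 for yi, ai in zip(y, approx))
            + lam * sum(abs(v) for v in x))

def ista(y, atoms, lam, step, iterations):
    """Iterative shrinkage-thresholding for 0.5 * ||y - D x||^2 + lam * ||x||_1."""
    x = [0.0] * len(atoms)
    for _ in range(iterations):
        approx = synthesize(atoms, x, len(y))
        residual = [yi - ai for yi, ai in zip(y, approx)]
        grad = [-sum(r * d for r, d in zip(residual, atom)) for atom in atoms]
        x = [soft_threshold(xj - step * g, step * lam) for xj, g in zip(x, grad)]
    return x

# Orthonormal dictionary: the solution is a soft thresholding of y.
print(ista([3.0, 0.5], [[1.0, 0.0], [0.0, 1.0]], lam=1.0, step=1.0, iterations=5))

# Overcomplete dictionary (largest eigenvalue of D^T D is 2, hence step = 0.5).
s = 1 / math.sqrt(2)
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [s, s, 0.0]]
y = [2.0, 2.0, 0.5]
x = ista(y, atoms, lam=0.1, step=0.5, iterations=200)
print(x, objective(y, atoms, x, 0.1))
```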

SLIDE 34

Sparsity depends on how well the dictionary is adapted to the data in hand

  • Given training vectors $Y = [Y_1, \dots, Y_T]$, learn $D$ that minimizes the averaged error of the sparse representation of the training vectors:

    $$\arg\min_D \Big( \min_X \|Y - DX\|_F^2 \quad \text{s.t.} \quad \|X_n\|_0 \le L,\ n = 1, \dots, T \Big)$$

  • The optimization problem is combinatorial and highly non-convex, but convex with respect to one of its variables when the other one is fixed => two-step approach:
      • With $D$ fixed: $\min_X \|Y - DX\|_F^2$ s.t. $\|X_n\|_0 \le L,\ n = 1, \dots, T$
      • With $X$ fixed: $\arg\min_D \|Y - DX\|_F^2$

SLIDE 35

Sparsity depends on how well the dictionary is adapted to the data in hand

  • Extensive work on dictionary learning
  • Non-structural learned dictionaries
      • MOD (Engan et al., 1999)
      • K-SVD (Aharon et al., 2006): SVD-based atom-by-atom dictionary update
  • Imposing constraints on dictionaries
      • Sparse Dictionary [Rubinstein’10]
      • Translation invariant [Jost’06; Aharon and Elad, 2008]
      • Multiscale dictionaries (Mairal’08)
      • Unions of orthonormal bases (Lesage 2005; Sezer et al., 2008)
      • Online learned dictionaries [Mairal’10]
      • Tree-structured dictionaries [Monaci 2004; Jenatton et al., 2011]
  • Not so easy to use in compression due to the dimension of the sparse vectors

SLIDE 36

Structured dictionaries for compression

  • Sparse coding in an overcomplete dictionary does not necessarily mean efficient compression
  • A small dictionary which changes over the iterations and which is adapted to the signals decomposed at each iteration
  • Reduced storage by tree pruning or only one branch for upper layers => transform residues so that they have the same principal components

SLIDE 37

Dictionary Learning: Tree-Structured

[Figure labels: index, atom, coefficient]

[J. Zepeda, C. Guillemot, E. Kijak, 2010]

SLIDE 38

Performance Illustration

[Figure panels: original, JPEG, JPEG-2000, ITD]

[J. Zepeda, C. Guillemot, E. Kijak, 2010]

SLIDE 39

Research Issues: towards a better prediction

  • Sparse prediction
  • LLE and NMF based prediction
SLIDE 40

Prediction using sparse methods

  • Prediction analogous to inpainting
  • Assuming that the known samples (template) and the complete patch (template + block to predict) share similar features (sparse vectors, neighborhood structure)

    $\arg\min_h \|a_c - W_c h\|_2^2$ subject to a sparsity constraint on $h$

[Figure labels: DCT; $a_c$, $W_c$; $a$, $W$]

  • Pre-defined dictionaries (DCT, wavelets, ...)
  • Dictionaries composed of patches in the neighborhood

[M. Turkan, C. Guillemot, 2012]

SLIDE 41

Sparse Prediction with DCT dictionary vs H.264/AVC

[Figure panels: prediction with 9 modes AVC; prediction with 9 modes AVC and SP]

[M. Turkan, C. Guillemot, 2012]

SLIDE 42

Spatial Prediction Results

[Figure panels: static DCT dictionary; dictionaries formed by patches in the neighborhood]

  • Improvement on more complex structures and contours with a dictionary based on texture patches => incentive to use texture patches as dictionary elements

[M. Turkan, C. Guillemot, 2012]

SLIDE 43

Prediction using patch-based methods

  • Dictionaries formed by K-NN
  • Known samples and complete patch are assumed to share similar neighborhood structures

[Figure labels: template patches, complete patches]

  • With sum-to-1 constraints (LLE):

    $$\arg\min_h \|a_c - W_c h\|_2^2 \quad \text{s.t.} \quad \sum_i h_i = 1$$

  • With non-negativity constraints (NMF):

    $$\arg\min_h \|a_c - W_c h\|_F^2 \quad \text{s.t.} \quad W \ge 0,\ h \ge 0$$

  • $W$ fixed and formed by K-NN patches

[M. Turkan, C. Guillemot, 2012]
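The LLE variant can be sketched as follows: weights with a sum-to-1 constraint are fitted on the templates of the K nearest patches, then the same weights are applied to the co-located blocks to form the prediction. The sketch uses the usual LLE solution (solve G h = 1 on the Gram matrix of the template differences, then normalise). It assumes the K-NN search has already been done, adds a small regularization term because the Gram matrix is often singular, and uses made-up patch values; it is not the authors' implementation.

```python
def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

def lle_weights(template, neighbour_templates, reg=1e-3):
    """Weights h minimising ||template - sum_i h_i * neighbour_i||^2 with sum(h) = 1."""
    diffs = [[t - w for t, w in zip(template, nb)] for nb in neighbour_templates]
    k = len(diffs)
    gram = [[sum(a * b for a, b in zip(diffs[i], diffs[j])) for j in range(k)]
            for i in range(k)]
    trace = sum(gram[i][i] for i in range(k)) or 1.0
    for i in range(k):
        gram[i][i] += reg * trace      # assumed regularization for stability
    h = solve(gram, [1.0] * k)
    total = sum(h)
    return [v / total for v in h]

def lle_predict(template, neighbour_templates, neighbour_blocks):
    """Apply the template weights to the blocks attached to each neighbour."""
    h = lle_weights(template, neighbour_templates)
    size = len(neighbour_blocks[0])
    return [sum(hi * blk[p] for hi, blk in zip(h, neighbour_blocks))
            for p in range(size)]

template = [10.0, 20.0]                          # known samples around the block
neighbour_templates = [[8.0, 18.0], [12.0, 22.0]]
neighbour_blocks = [[30.0], [50.0]]              # samples co-located with the unknown block

print(lle_weights(template, neighbour_templates))
print(lle_predict(template, neighbour_templates, neighbour_blocks))
```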

SLIDE 44

LLE or NMF based prediction

SLIDE 45

LLE or NMF based prediction

SLIDE 46

Research Issues: towards a better prediction

  • Epitome inpainting based compression
SLIDE 47

Epitome inpainting based compression

[Figure labels: input image Y, factored representation (epitome E, transform map φ), reconstructed image]

  • Finding self similarities
  • Creating epitome charts
  • Improving the quality of reconstruction by further searching for best matching and by updating the transform map accordingly

[S. Cherigui, C. Guillemot, D. Thoreau, P. Guillotel, Perez, 2011]

SLIDE 48

Epitome inpainting based compression

[M. Alain, S. Cherigui, C. Guillemot, D. Thoreau, P. Guillotel, 2014]

SLIDE 49

Epitome inpainting based compression

SLIDE 50

Still a moving field despite its old age … with new image modalities and very large volumes of data

  • ISO/ITU working towards H.266
  • Multi-view and multi-view plus depth (e.g. 3D-HEVC)
  • High dynamic range with tone mapping issues
  • Light fields (dense multi-view, plenoptic)

SLIDE 51

Thank you