Image/video compression: Basics and research issues
Christine GUILLEMOT
Outline
- A few basics in source coding
- Practical use in standardized solutions
- Research issues
  - Towards better transforms
  - Towards better prediction
  - Inpainting-based compression
Compression: a few basics
Basics in source coding
Lossless coding: the achievable rate is bounded by a function of the source probability distribution (its entropy)
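As a worked illustration of this bound, the sketch below computes the first-order entropy of a hypothetical 4-symbol source and checks that a prefix code matched to its probabilities reaches it. The source, its probabilities and the code table are assumptions chosen so that the numbers are exact.

```python
import math
from collections import Counter

def empirical_entropy(symbols):
    """First-order empirical entropy H = sum_s p(s) * log2(1 / p(s)), in bits/symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return sum(c / n * math.log2(n / c) for c in counts.values())

# Hypothetical source with probabilities 1/2, 1/4, 1/8, 1/8
source = list("aaaabbcd")
print(empirical_entropy(source))  # 1.75

# Hypothetical prefix code matched to these probabilities: it reaches the bound exactly
code = {"a": "0", "b": "10", "c": "110", "d": "111"}
print(sum(len(code[s]) for s in source) / len(source))  # 1.75
```

When the probabilities are not powers of 1/2, no symbol-by-symbol prefix code reaches the entropy exactly, which is one motivation for arithmetic coding.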
Basics in source coding
How to « optimally » encode dependent symbols separately?
Lossless coding: the compression factor is limited (of the order of 2-3 for natural images, and 3-4 for video)
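To see why coding dependent symbols separately is suboptimal, the sketch below compares the first-order entropy (each symbol coded with its marginal probability) with the conditional entropy given the previous symbol. The alternating sequence is a hypothetical, deliberately extreme example of dependency.

```python
import math
from collections import Counter

def entropy(symbols):
    """First-order entropy: each symbol coded separately with its marginal probability."""
    counts = Counter(symbols)
    n = len(symbols)
    return sum(c / n * math.log2(n / c) for c in counts.values())

def conditional_entropy(symbols):
    """H(X_t | X_{t-1}) estimated from consecutive pairs: the previous symbol is the context."""
    pairs = Counter(zip(symbols[:-1], symbols[1:]))
    contexts = Counter(symbols[:-1])
    n = len(symbols) - 1
    return sum(c / n * math.log2(contexts[prev] / c) for (prev, _), c in pairs.items())

# Hypothetical dependent sequence: "a" is always followed by "b" and vice versa
seq = "ab" * 50
print(entropy(seq))              # 1.0 bit/symbol when symbols are coded separately
print(conditional_entropy(seq))  # 0.0: given the previous symbol, the next one is certain
```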
Basics in source coding
To decrease the bit rate further, one has to tolerate distortion => lossy compression under a rate or distortion constraint
Uniform scalar quantization + entropy coding
[Figure: rate-distortion curve R(D) between the source entropy and the maximum distortion; source information = redundancy + irrelevant information + useful information]
This scheme is quasi-optimal if the pixels are independent
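A minimal sketch of uniform scalar quantization followed by an ideal entropy coder, assuming a hypothetical memoryless Gaussian source (independent samples, standard deviation 10). The rate is measured as the empirical entropy of the quantization indices; larger steps trade rate for distortion.

```python
import math
import random
from collections import Counter

def quantize(samples, step):
    """Uniform (mid-tread) scalar quantizer: index = round(x / step)."""
    return [round(x / step) for x in samples]

def dequantize(indices, step):
    """Reconstruction at the centre of each quantization cell."""
    return [q * step for q in indices]

def entropy_bits(symbols):
    """Empirical entropy of the indices: the rate reachable by an ideal entropy coder."""
    counts = Counter(symbols)
    n = len(symbols)
    return sum(c / n * math.log2(n / c) for c in counts.values())

random.seed(0)
x = [random.gauss(0.0, 10.0) for _ in range(20000)]  # hypothetical i.i.d. Gaussian source

for step in (1.0, 4.0, 16.0):
    q = quantize(x, step)
    x_hat = dequantize(q, step)
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    # At high rate, MSE ~ step**2 / 12 and each doubling of the step saves about 1 bit
    print(f"step={step:4.1f}  rate={entropy_bits(q):.2f} bits/sample  MSE={mse:.3f}")
```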
Basics in source coding
How to address the dependency between symbols? Transform the pixels into (ideally) independent data
Basics in source coding
Discrete Wavelet Transform
Classical transforms: discrete cosine transform, discrete wavelet transform
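The decorrelation effect of the DCT can be sketched on a single hypothetical row of 8 smooth pixels: the orthonormal DCT-II preserves the energy (Parseval) but concentrates almost all of it in the DC coefficient. The pixel values are an assumption; JPEG applies the 2-D separable version of this transform to 8x8 blocks.

```python
import math

def dct_ii(x):
    """Orthonormal 1-D DCT-II."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(x)))
    return out

row = [100, 102, 104, 106, 108, 110, 112, 114]  # hypothetical smooth row of 8 pixels
coeffs = dct_ii(row)
energy = [c * c for c in coeffs]
print([round(c, 1) for c in coeffs])
print(f"{energy[0] / sum(energy):.4f} of the energy in the DC coefficient")
```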
Basics in source coding
Removing dependencies further: prediction
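A minimal DPCM sketch of the prediction idea: each pixel of a hypothetical smooth scan line is predicted by its left neighbour, and only the residue is entropy coded. The residue has a much lower first-order entropy than the pixels, and the decoder recovers the pixels exactly.

```python
import math
from collections import Counter

def entropy_bits(symbols):
    """First-order empirical entropy in bits/sample."""
    counts = Counter(symbols)
    n = len(symbols)
    return sum(c / n * math.log2(n / c) for c in counts.values())

# Hypothetical smooth scan line: a slow ramp with a small periodic bump
line = [50 + i // 2 + (1 if i % 3 == 0 else 0) for i in range(256)]

# Encoder: predict each pixel by its left neighbour and keep the residue
residue = [line[0]] + [line[i] - line[i - 1] for i in range(1, len(line))]

# Decoder: prediction + residue gives back the pixels exactly
decoded = [residue[0]]
for r in residue[1:]:
    decoded.append(decoded[-1] + r)

print(f"pixels:  {entropy_bits(line):.2f} bits/sample")
print(f"residue: {entropy_bits(residue):.2f} bits/sample")
print(decoded == line)  # True
```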
Basics in source coding
In summary
Practical use of these concepts in standardized solutions
Three decades of standards development, guided by the same concepts
[Figure: JPEG, JPEG-2000]
… leading to a common framework
The same hybrid scheme over the years: motion-compensated temporal prediction + DCT
First key ingredient: motion-compensated temporal prediction
Exploiting pixel dependency in the temporal dimension, with many optimizations over the years (e.g. multiple reference frames)
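Motion-compensated prediction relies on motion estimation. A minimal sketch, assuming exhaustive (full-search) block matching with a SAD criterion on hypothetical 16x16 frames in which the content is displaced by (2, 1); real encoders use faster searches, sub-pixel accuracy and rate-constrained criteria.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a current block and a displaced reference block."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(size) for i in range(size))

def full_search(cur, ref, bx, by, size, radius):
    """Exhaustive block matching: motion vector (dx, dy) with minimal SAD within +/- radius."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            inside = 0 <= bx + dx and bx + dx + size <= w and 0 <= by + dy and by + dy + size <= h
            if not inside:
                continue  # candidate block would fall outside the reference frame
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best

# Hypothetical 16x16 frames: the content of the current frame is displaced by (2, 1)
ref = [[(3 * x + 7 * y) % 32 for x in range(16)] for y in range(16)]
cur = [[ref[min(y + 1, 15)][min(x + 2, 15)] for x in range(16)] for y in range(16)]
print(full_search(cur, ref, bx=4, by=4, size=4, radius=3))  # (2, 1, 0)
```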
Second key ingredient: spatial prediction
Exploiting dependency in the spatial dimension (H.264)
If the prediction is efficient, the difference between the original and the prediction (the residue) consists of nearly independent samples. Many optimizations over the years (up to 35 intra prediction modes in HEVC)
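A minimal sketch of spatial prediction with mode selection, restricted to three of the nine H.264 4x4 luma intra modes (vertical, horizontal, DC, with both neighbours available). The block and its neighbours are hypothetical, and the SAD criterion stands in for the rate-distortion criterion an encoder would normally use.

```python
def intra_4x4_predictions(top, left):
    """Vertical, horizontal and DC predictions of a 4x4 block from its causal neighbours."""
    dc = (sum(top) + sum(left) + 4) >> 3
    return {
        "vertical": [list(top) for _ in range(4)],
        "horizontal": [[left[r]] * 4 for r in range(4)],
        "DC": [[dc] * 4 for _ in range(4)],
    }

def best_mode(block, top, left):
    """Mode whose prediction minimises the SAD of the residue."""
    costs = {}
    for mode, pred in intra_4x4_predictions(top, left).items():
        costs[mode] = sum(abs(block[r][c] - pred[r][c]) for r in range(4) for c in range(4))
    mode = min(costs, key=costs.get)
    return mode, costs[mode]

# Hypothetical block with vertical stripes that continue the row above
top = [10, 200, 10, 200]
left = [10, 10, 10, 10]
block = [[10, 198, 12, 201] for _ in range(4)]
print(best_mode(block, top, left))  # ('vertical', 20)
```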
Third key ingredient: transform + joint RD optimization
Transform: a simple block transform (DCT) with an R-D optimized support
Joint rate-distortion optimization of the prediction and of the transform support, to adapt to local image characteristics (flat regions, contours, texture, ...)
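Rate-distortion optimized decisions are commonly made by minimizing a Lagrangian cost J = D + λR. The sketch below applies this rule to three hypothetical coding options for one block; the option names, distortions and rates are made-up values, chosen to show how the choice moves from fine to coarse as λ grows.

```python
def rd_choice(candidates, lam):
    """Pick the candidate minimising the Lagrangian cost J = D + lambda * R."""
    return min(candidates, key=lambda c: c[1] + lam * c[2])

# Hypothetical (option, distortion as SSE, rate in bits) for one block
candidates = [
    ("4x4 transform, fine split", 120.0, 96),
    ("8x8 transform", 260.0, 40),
    ("16x16 transform, no residue", 900.0, 6),
]
for lam in (0.5, 5.0, 50.0):
    name, d, r = rd_choice(candidates, lam)
    print(f"lambda={lam:5.1f} -> {name} (J={d + lam * r:.1f})")
```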
Fourth key ingredient: entropy coding
Higher-order statistics to exploit remaining dependencies
- Context modeling
- On-line learning of probability laws
- Binarization followed by arithmetic coding
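The effect of context modeling with on-line learned probabilities can be sketched by computing the ideal arithmetic-coding cost, -log2 p, of a hypothetical binarized sequence. The count-based (Laplace) estimator is a simplification: CABAC, for instance, uses a table-driven finite-state probability estimator instead.

```python
import math

def adaptive_code_length(bits, use_context):
    """Ideal arithmetic-coding cost in bits, with probabilities learned on line.

    Each context keeps counts of 0s and 1s (initialised to 1, i.e. a Laplace
    estimator); the context is the previous bit when use_context is True."""
    counts = {}
    total = 0.0
    prev = 0
    for b in bits:
        ctx = prev if use_context else None
        c = counts.setdefault(ctx, [1, 1])
        total += math.log2((c[0] + c[1]) / c[b])  # -log2 p(b | ctx)
        c[b] += 1                                 # on-line update of the probability law
        prev = b
    return total

# Hypothetical binarized data made of long runs (strong dependency between bins)
bits = ([0] * 20 + [1] * 20) * 10
print(f"no context:           {adaptive_code_length(bits, False):.1f} bits")
print(f"previous-bit context: {adaptive_code_length(bits, True):.1f} bits")
```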
Performance evolution of video compression over the years
Research Issues: Towards better transforms
- Anisotropic transforms
- Graph-based transforms
- Sparse approximations
Limitations of block-based transforms
Assume that an image is a piecewise smooth function, i.e., it contains sharp boundaries between smooth regions.
- Block-based transforms are limited when blocks contain arbitrarily shaped discontinuities.
- 2D separable wavelets are well adapted to point singularities only, not to smooth boundaries (contours), whereas 2D images mostly contain line and curve singularities.
[Figure: super-pixels obtained with the SLIC method]
=> Design of alternative transforms such as curvelets, bandelets, oriented wavelets, or graph-based transforms
Bandelets [E. Le Pennec & S. Mallat 2003]
Using modified (warped) orthogonal wavelets in the direction of the geometric flow, so as to perform the transform on smooth functions
Quad-tree segmentation
Each arrow is a vector orienting the support of the wavelet transform
[Figure: sub-square and the corresponding 1D signal]
Estimation of the geometrical flow:
- Sample geometry (green lines)
- Warped 1D filtering
[Figure: 1D signal and its 1D wavelet transform]
Bandelets [E. Le Pennec & S. Mallat 2003]
[Figure panels: 0.44 bpp; Original; Bandelets (0.2 bpp); Wavelets (0.2 bpp)]
Lifting scheme of the 1D-wavelet transform
Generalization to 2D
- Separation of the square grid into 2 quincunx cosets
- Iteration of the splitting on one of the grids
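The lifting scheme computes a wavelet transform as split, predict and update steps, which makes it invertible by construction. As one example (the slide does not say which wavelet is used), here is a sketch of one level of the reversible LeGall 5/3 wavelet, the integer filter of JPEG 2000 lossless coding, with symmetric extension at the borders; the input signal is hypothetical.

```python
def lift_53_forward(x):
    """One level of the reversible LeGall 5/3 wavelet computed by lifting.

    Split into even/odd samples, predict the odd samples from their even
    neighbours (details d), then update the even samples (approximation s).
    Borders use symmetric extension; the input length must be even."""
    if len(x) % 2:
        raise ValueError("even-length input expected")
    even, odd = x[0::2], x[1::2]
    last = len(even) - 1
    d = [odd[i] - (even[i] + even[min(i + 1, last)]) // 2 for i in range(len(odd))]
    s = [even[i] + (d[max(i - 1, 0)] + d[i] + 2) // 4 for i in range(len(even))]
    return s, d

def lift_53_inverse(s, d):
    """Undo the lifting steps in reverse order: exact integer reconstruction."""
    last = len(s) - 1
    even = [s[i] - (d[max(i - 1, 0)] + d[i] + 2) // 4 for i in range(len(s))]
    odd = [d[i] + (even[i] + even[min(i + 1, last)]) // 2 for i in range(len(d))]
    x = [0] * (len(even) + len(odd))
    x[0::2], x[1::2] = even, odd
    return x

signal = [10, 12, 14, 16, 15, 13, 11, 9]  # hypothetical 1D signal
s, d = lift_53_forward(signal)
print(s, d)
print(lift_53_inverse(s, d) == signal)  # True
```

The same split/predict/update structure carries over to 2D once the "even" and "odd" samples are taken as the two quincunx cosets of the square grid.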
Oriented wavelet transforms
[V. Chappelier & C. Guillemot, TIP 2006]
Oriented wavelet transforms
Multi-scale quincunx sampling pyramid
Downsampling at each scale
$L_k \in \{0, 1\}$: either square or quincunx grids
Orientation of the 1D wavelets along edges, with binary orientations [V. Chappelier & C. Guillemot, TIP 2006]
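The grid alternation of a multi-scale quincunx pyramid can be sketched on pixel coordinates alone: a square grid splits into two quincunx cosets, and iterating on one coset gives back a square grid subsampled by 2 in each direction. This only illustrates the sampling structure on a hypothetical 8x8 image; it is not the oriented wavelet transform of the cited paper.

```python
def quincunx_cosets(img):
    """Split a square grid into its two quincunx cosets: (row + col) even / odd."""
    c0, c1 = {}, {}
    for r, row in enumerate(img):
        for c, v in enumerate(row):
            (c0 if (r + c) % 2 == 0 else c1)[(r, c)] = v
    return c0, c1

def split_quincunx(coset):
    """Split a quincunx coset into two square sub-grids (even rows / odd rows)."""
    even = {p: v for p, v in coset.items() if p[0] % 2 == 0}
    odd = {p: v for p, v in coset.items() if p[0] % 2 == 1}
    return even, odd

img = [[8 * r + c for c in range(8)] for r in range(8)]  # hypothetical 8x8 image
q0, q1 = quincunx_cosets(img)  # scale 1: square -> two quincunx grids (64 -> 32 + 32)
s0, s1 = split_quincunx(q0)    # scale 2: iterate on one grid, quincunx -> square (32 -> 16 + 16)
print(len(q0), len(q1), len(s0), len(s1))  # 32 32 16 16
print(sorted(s0)[:3])                      # [(0, 0), (0, 2), (0, 4)]
```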
Oriented wavelet transforms
Better preservation of directional frequencies [V. Chappelier & C. Guillemot, TIP 2006]
[Figure: LL0-wavelet vs. L1-wavelet]
The field of transform design is being revived by graph-based transforms [Kim et al. 2012, Shuman et al. 2013, Hu et al. 2015]
[Figure: signal values on the pixels]
Towards graph-based transforms
Characterization of the graph [Kim et al. 2012, Shuman et al. 2013, Hu et al. 2015]
- Laplacian operator $L = D - W$ ($D$: diagonal degree matrix, $W$: edge-weight matrix): a difference operator
- Real symmetric matrix
Towards graph-based transforms
[Kim et al. 2012, Shuman et al. 2013, Hu et al. 2015]
Normalized Laplacian: weights normalized by the vertex degrees, $\mathcal{L} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} W D^{-1/2}$
The Laplacian of the graph
- has a complete set of orthonormal eigenvectors $u_l$,
- associated with real non-negative eigenvalues $\lambda_l$ (defining the spectrum of the graph).
- The eigenvectors carry a notion of frequency. For a connected graph, the eigenvector associated with the eigenvalue 0 is constant, whereas an eigenvector associated with a higher eigenvalue varies more over the vertices of the graph.
- The number of zero crossings is higher for a higher eigenvalue, analogous to classical Fourier analysis, where a higher frequency f means faster oscillation (complex exponentials).
- The eigenvectors of the Laplacian define the Graph Fourier Transform (GFT) and its inverse (iGFT):
$$\hat{f}(\lambda_l) = \sum_{i=1}^{N} f(i)\, u_l(i), \qquad f(i) = \sum_{l=0}^{N-1} \hat{f}(\lambda_l)\, u_l(i)$$
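A minimal GFT sketch with NumPy, assuming a hypothetical 6-pixel line graph whose weak edge (weight 0.05) models a contour between pixels 2 and 3. The eigenvalue 0 comes with a constant eigenvector, and a piecewise-constant signal that respects the contour concentrates its energy on the first two GFT coefficients.

```python
import numpy as np

def graph_fourier_basis(W):
    """Eigendecomposition of the combinatorial Laplacian L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    eigvals, U = np.linalg.eigh(L)  # L real symmetric: real eigenvalues (ascending), orthonormal U
    return eigvals, U

# Hypothetical 6-pixel line graph; the weak edge (0.05) models a contour between pixels 2 and 3
W = np.zeros((6, 6))
for i, w in enumerate([1.0, 1.0, 0.05, 1.0, 1.0]):
    W[i, i + 1] = W[i + 1, i] = w

eigvals, U = graph_fourier_basis(W)
f = np.array([10.0, 10.0, 10.0, 50.0, 50.0, 50.0])  # piecewise-constant signal on the vertices
f_hat = U.T @ f                                     # GFT: projections on the Laplacian eigenvectors
f_rec = U @ f_hat                                   # iGFT
print(np.round(eigvals, 3))
print(np.round(f_hat, 2))     # almost all the energy in the first two coefficients
print(np.allclose(f_rec, f))  # True
```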
Towards graph-based transforms
[Shuman et al. 2013]
Towards graph-based transforms
Active area of research
- Wavelets on graphs via spectral graph theory [Hammond et al. 11]
- Wavelet filterbanks [Narang & Ortega 12, Gadde et al. 13, …]
- Overcomplete dictionaries on graphs [Zhang et al. 12, …]
Nevertheless, a big issue in compression:
- the rate cost of signalling the graph structure
Sparse approximations for compression
Given an input vector $y \in \mathbb{R}^n$ and a dictionary $D \in \mathbb{R}^{n \times M}$, with $M > n$ and $D$ of full rank, find the sparsest $x \in \mathbb{R}^M$ such that
$$\min_x \|x\|_0 \quad \text{s.t.} \quad y = Dx$$
- $\|x\|_0$ is the $L_0$ norm of $x$ (number of non-zero entries); $D$ is the dictionary, whose columns are the atoms $d_k$, with $\|d_k\|_2 = 1$.
- The "basis" vectors are not required to be orthogonal.
Finding an exact solution is difficult (the problem is combinatorial). In practice, approximate solutions are good enough:
$$\min_x \|x\|_0 \quad \text{s.t.} \quad \|y - Dx\|_2^2 \le \rho$$
or, in sparsity-constrained form,
$$\arg\min_x \|y - Dx\|_2^2 \quad \text{s.t.} \quad \|x\|_0 \le L$$
Given $D$ and $y$, computationally tractable search algorithms give an approximate solution:
- Greedy pursuit algorithms: MP [Mallat & Zhang (1993)], OMP [Pati 1993], OOMP, ….
- L2-L1 minimization (constrained least squares): BP denoising [Chen, Donoho, & Saunders (1995)]
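Of the greedy pursuits listed above, Orthogonal Matching Pursuit is the simplest to sketch: at each step, select the atom most correlated with the residual, then refit all the selected atoms by least squares. The dictionary below (spikes plus orthonormal DCT-II atoms, M = 2n) and the 3-sparse vector are assumptions; with this low-coherence dictionary, OMP is guaranteed to recover any 3-sparse representation.

```python
import numpy as np

def omp(D, y, n_nonzero, tol=1e-10):
    """Orthogonal Matching Pursuit for  argmin ||y - D x||_2^2  s.t.  ||x||_0 <= n_nonzero.
    The columns (atoms) of D are assumed to have unit norm."""
    residual = y.astype(float)
    support, coeffs = [], np.zeros(0)
    for _ in range(n_nonzero):
        k = int(np.argmax(np.abs(D.T @ residual)))  # atom most correlated with the residual
        if k in support:
            break
        support.append(k)
        # Orthogonal step: least-squares fit of y on all the selected atoms
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
        if np.linalg.norm(residual) < tol:
            break
    x = np.zeros(D.shape[1])
    x[support] = coeffs
    return x

# Overcomplete dictionary (M = 2n): n spikes followed by the n orthonormal DCT-II atoms
n = 64
k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
C = np.sqrt(2.0 / n) * np.cos(np.pi * (i + 0.5) * k / n)
C[0, :] = np.sqrt(1.0 / n)
D = np.hstack([np.eye(n), C.T])

x_true = np.zeros(2 * n)
x_true[[5, n + 3, n + 10]] = [3.0, 2.0, -1.5]  # hypothetical 3-sparse representation
y = D @ x_true
x_hat = omp(D, y, n_nonzero=3)
print(np.flatnonzero(x_hat).tolist())  # [5, 67, 74]
```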
L1-minimization: Basis Pursuit (BP) [Chen, Donoho, & Saunders (1995)]
Instead of solving
$$\min_x \|x\|_0 \quad \text{s.t.} \quad y = Dx$$
solve
$$\min_x \|x\|_1 \quad \text{s.t.} \quad y = Dx$$
- The problem becomes convex (linear programming)
- Very efficient solvers: interior point methods [Chen, Donoho, & Saunders ('95)], sequential shrinkage for unions of ortho-bases [Bruce et al. ('98)], iterated shrinkage [Figueiredo & Nowak ('03), Daubechies, Defrise, & De Mol ('04), E. ('05), E., Matalon, & Zibulevsky ('06)].
Basis Pursuit Denoising (LASSO)
- L1 regularization: quadratic programming
$$\min_x \frac{1}{2}\|y - Dx\|_2^2 + \lambda \|x\|_1$$
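The iterated-shrinkage family cited above can be sketched with ISTA: a gradient step on the quadratic term followed by soft thresholding, the proximal operator of the L1 norm. The random dictionary, the sparse vector, the noise level and λ are hypothetical; with step 1/L (L the Lipschitz constant of the gradient) the objective never increases.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(D, y, lam, n_iter=500):
    """ISTA for  min_x 0.5 * ||y - D x||_2^2 + lam * ||x||_1."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    history = []
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)            # gradient of the quadratic term
        x = soft_threshold(x - step * grad, step * lam)
        history.append(0.5 * np.sum((y - D @ x) ** 2) + lam * np.sum(np.abs(x)))
    return x, history

rng = np.random.default_rng(0)
D = rng.standard_normal((30, 80))
D /= np.linalg.norm(D, axis=0)              # unit-norm atoms
x_true = np.zeros(80)
x_true[[3, 17, 42]] = [1.5, -2.0, 1.0]      # hypothetical sparse vector
y = D @ x_true + 0.01 * rng.standard_normal(30)
x_hat, history = ista(D, y, lam=0.05)
print(np.count_nonzero(x_hat), "non-zero coefficients out of", x_hat.size)
```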
Given training vectors $Y = [Y_1, \dots, Y_T]$, learn the dictionary $D$ that minimizes the average error of the sparse representation of the training vectors ($x_t$ denotes the $t$-th column of $X$):
$$\arg\min_D \min_X \|Y - DX\|_F^2 \quad \text{s.t.} \quad \|x_t\|_0 \le L, \ t = 1, \dots, T$$
The optimization problem is combinatorial and highly non-convex, but convex with respect to one of its variables when the other one is fixed => two-step approach:
- Sparse coding ($D$ fixed): $\min_X \|Y - DX\|_F^2$ s.t. $\|x_t\|_0 \le L, \ t = 1, \dots, T$
- Dictionary update ($X$ fixed): $\arg\min_D \|Y - DX\|_F^2$
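The two-step approach can be sketched with a MOD-style dictionary update (Engan et al., 1999), D = Y X^+, alternating with sparse coding. To keep the sketch short, the sparse-coding step is restricted to one atom per training vector (L = 1), for which the best atom is simply the most correlated one; the training data are hypothetical random vectors. With both steps optimal, the representation error cannot increase.

```python
import numpy as np

def sparse_code_one_atom(D, Y):
    """Sparse coding step with L = 1: each column of Y uses its best-correlated atom."""
    corr = D.T @ Y                           # atoms are unit-norm
    idx = np.argmax(np.abs(corr), axis=0)
    cols = np.arange(Y.shape[1])
    X = np.zeros_like(corr)
    X[idx, cols] = corr[idx, cols]
    return X

def mod_update(D, Y, X):
    """MOD update: D = argmin ||Y - D X||_F^2 = Y X^+, then atoms renormalised."""
    D_new = Y @ np.linalg.pinv(X)
    norms = np.linalg.norm(D_new, axis=0)
    used = norms > 1e-12
    D_new[:, ~used] = D[:, ~used]            # atoms used by no vector are kept as they were
    D_new[:, used] /= norms[used]
    return D_new

rng = np.random.default_rng(1)
Y = rng.standard_normal((8, 200))            # hypothetical training vectors (columns)
D = rng.standard_normal((8, 16))
D /= np.linalg.norm(D, axis=0)
errors = []
for _ in range(10):
    X = sparse_code_one_atom(D, Y)           # step 1: D fixed, update X
    errors.append(float(np.linalg.norm(Y - D @ X) ** 2))
    D = mod_update(D, Y, X)                  # step 2: X fixed, update D
print([round(e, 1) for e in errors])         # non-increasing
```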
Sparsity depends on how well the dictionary is adapted to the data in hand
Extensive work on dictionary learning: Non-structural learned dictionaries
- MOD (Engan et al., 1999),
- K-SVD (Aharon et al., 2006): SVD-based atom-by-atom dictionary update
Imposing constraints on dictionaries
- Sparse Dictionary [Rubinstein’10]
- Translation invariant [Jost’06; Aharon and Elad, 2008]
- Multiscale dictionaries (Mairal’08)
- Unions of orthonormal bases (Lesage 2005; Sezer et al., 2008)
- Online learned dictionaries [Mairal’10]
- Tree-structured dictionaries [Monaci 2004; Jenatton et al., 2011]
Not so easy to use in compression because of the dimension of the sparse vectors
Structured dictionaries for compression
- Sparse coding in an overcomplete dictionary does not necessarily mean efficient compression
- A small dictionary that changes over the iterations and is adapted to the signals decomposed at each iteration
- Reduced storage by tree pruning, or a single branch for the upper layers
=> transform residues so that they have the same principal components
Dictionary Learning: Tree-Structured
[J. Zepeda, C. Guillemot, E. Kijak, 2010]
[Figure: atom index and coefficient]
Performance Illustration
[Figure: Original, JPEG, JPEG-2000, ITD]
[J. Zepeda, C. Guillemot, E. Kijak, 2010]
Research Issues: towards a better prediction
- Sparse prediction
- LLE and NMF based prediction
Prediction using sparse methods
Prediction is analogous to inpainting, assuming that the known samples (template) and the complete patch (template + block to predict) share similar features (sparse vectors, neighborhood structure)
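$$\hat{h} = \arg\min_h \|a_c - W_c h\|_2^2 \quad \text{s.t.} \quad \|h\|_0 \le L, \qquad \hat{a} = W \hat{h}$$
($a_c$: known template samples; $W_c$: dictionary restricted to the template positions; $W$: complete dictionary, e.g. DCT; $\hat{a}$: predicted patch)
- Pre-defined dictionaries (DCT, wavelets, ...)
- Dictionaries composed of patches in the neighborhood
[M. Turkan, C. Guillemot, 2012]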
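A minimal sketch of template-based sparse prediction: the sparse code is estimated by OMP on the template rows of the dictionary only, and the unknown block is then predicted with the remaining rows. The neighborhood patches, the template and block sizes and the choice of patch 7 are hypothetical; this is an illustration of the principle, not the cited implementation.

```python
import numpy as np

def sparse_predict(W, n_template, a_c, n_atoms):
    """Template-based sparse prediction (sketch).

    W: dictionary whose columns are complete patches (template rows first, then block rows).
    h is estimated by OMP on the template rows only,
        h = argmin ||a_c - W_c h||_2^2  s.t.  ||h||_0 <= n_atoms,
    and the unknown block is predicted as W_b h."""
    W_c, W_b = W[:n_template], W[n_template:]
    norms = np.linalg.norm(W_c, axis=0)
    residual, support = a_c.astype(float), []
    h_s = np.zeros(0)
    for _ in range(n_atoms):
        k = int(np.argmax(np.abs(W_c.T @ residual) / norms))  # normalised correlation
        if k in support:
            break
        support.append(k)
        h_s, *_ = np.linalg.lstsq(W_c[:, support], a_c, rcond=None)
        residual = a_c - W_c[:, support] @ h_s
    h = np.zeros(W.shape[1])
    h[support] = h_s
    return W_b @ h, h

rng = np.random.default_rng(0)
n_template, n_block = 12, 4                                # e.g. flattened L-shaped template and 2x2 block
W = rng.uniform(0, 255, size=(n_template + n_block, 30))   # hypothetical neighbourhood patches
true_patch = 0.9 * W[:, 7]                                 # current patch resembles neighbourhood patch 7
a_c = true_patch[:n_template]                              # only the template is known to the decoder
pred, h = sparse_predict(W, n_template, a_c, n_atoms=2)
print(np.round(pred - true_patch[n_template:], 6))         # prediction error on the block
```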
Sparse Prediction with DCT dictionary vs H.264/AVC
[Figure: prediction with the 9 AVC modes vs. prediction with the 9 AVC modes + sparse prediction (SP)]
[M. Turkan, C. Guillemot, 2012]
Spatial Prediction Results
[Figure: static DCT dictionary vs. dictionaries formed by patches in the neighborhood]
- Improvement on more complex structures and contours with the dictionary based on texture patches => incentive to use texture patches as dictionary elements
[M. Turkan, C. Guillemot, 2012]
Prediction using patch-based methods
Dictionaries formed by the K nearest neighbors (K-NN): the known samples and the complete patch are assumed to share similar neighborhood structures
[Figure: template patches and complete patches]
W is fixed and formed by the K-NN patches:
- with sum-to-1 constraints (LLE): $\arg\min_h \|a_c - W_c h\|_2^2$ s.t. $\sum_i h_i = 1$
- with non-negativity constraints (NMF): $\arg\min_h \|a_c - W_c h\|_2^2$ s.t. $h \ge 0$
[M. Turkan, C. Guillemot, 2012]
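Both constrained fits can be sketched with W fixed to the K-NN patches: the sum-to-one problem has the closed form used in LLE (a small Tikhonov term keeps the local Gram matrix invertible, so the solution is a regularised one), and the non-negative problem is solved here by projected gradient descent. The candidate patches, template size and K are hypothetical, and the projected-gradient solver is an assumption, not necessarily the solver used in the cited work.

```python
import numpy as np

def knn_dictionary(candidates, n_template, a_c, k):
    """W: the k candidate patches (columns) whose templates are closest to a_c."""
    dist = np.linalg.norm(candidates[:n_template] - a_c[:, None], axis=0)
    return candidates[:, np.argsort(dist)[:k]]

def lle_weights(W_c, a_c, reg=1e-3):
    """Sum-to-one least squares, solved in closed form as in LLE (regularised)."""
    diff = a_c[:, None] - W_c                 # a_c - w_i for each neighbour i
    G = diff.T @ diff                         # local Gram matrix
    G += reg * np.trace(G) * np.eye(G.shape[0])
    h = np.linalg.solve(G, np.ones(G.shape[0]))
    return h / h.sum()                        # enforce sum(h) = 1

def nonneg_weights(W_c, a_c, n_iter=2000):
    """argmin ||a_c - W_c h||^2 s.t. h >= 0 (W fixed), by projected gradient descent."""
    step = 1.0 / np.linalg.norm(W_c, 2) ** 2
    h = np.zeros(W_c.shape[1])
    for _ in range(n_iter):
        h = np.maximum(0.0, h - step * (W_c.T @ (W_c @ h - a_c)))
    return h

rng = np.random.default_rng(0)
n_template = 12
candidates = rng.uniform(0, 255, size=(16, 50))                 # hypothetical causal-neighbourhood patches
a_c = candidates[:n_template, :3] @ np.array([0.5, 0.3, 0.2])   # known template samples
W = knn_dictionary(candidates, n_template, a_c, k=6)
W_c, W_b = W[:n_template], W[n_template:]
pred_lle = W_b @ lle_weights(W_c, a_c)        # predicted 4-sample block (LLE)
pred_nmf = W_b @ nonneg_weights(W_c, a_c)     # predicted 4-sample block (non-negative)
print(np.round(pred_lle, 1), np.round(pred_nmf, 1))
```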
LLE or NMF based prediction
Research Issues: towards a better prediction
- Epitome inpainting based compression
[Figure: input image Y; factored representation: epitome E + transform map φ; reconstructed image]
- Finding self-similarities
- Creating epitome charts
- Improving the quality of the reconstruction by further searching for the best matches and updating the transform map accordingly
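The factored representation can be sketched as an epitome plus a translational transform map: every image block is replaced by a pointer to its best-matching patch in the epitome. The repeated-texture image and the choice of epitome are hypothetical, and this does not implement the chart construction or inpainting of the cited method; it only shows the map search and the reconstruction from the epitome.

```python
import numpy as np

def transform_map(image, epitome, b):
    """phi: for each b x b image block, the epitome position with minimal SSE (translations only)."""
    phi = {}
    eh, ew = epitome.shape
    for by in range(0, image.shape[0], b):
        for bx in range(0, image.shape[1], b):
            block = image[by:by + b, bx:bx + b]
            _, y, x = min((np.sum((epitome[y:y + b, x:x + b] - block) ** 2), y, x)
                          for y in range(eh - b + 1) for x in range(ew - b + 1))
            phi[(by, bx)] = (y, x)
    return phi

def reconstruct(phi, epitome, shape, b):
    """Rebuild the image by copying, for every block, the epitome patch given by phi."""
    out = np.zeros(shape)
    for (by, bx), (y, x) in phi.items():
        out[by:by + b, bx:bx + b] = epitome[y:y + b, x:x + b]
    return out

rng = np.random.default_rng(0)
tile = rng.integers(0, 256, size=(4, 4))
image = np.tile(tile, (4, 4))   # hypothetical 16x16 image with a repeated texture
epitome = image[:8, :8].copy()  # hypothetical epitome: 64 samples instead of 256
phi = transform_map(image, epitome, b=4)
rec = reconstruct(phi, epitome, image.shape, b=4)
print(np.array_equal(rec, image), epitome.size / image.size)  # True 0.25
```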
Epitome inpainting based compression
[S. Cherigui, C. Guillemot, D. Thoreau, P. Guillotel, Perez, 2011]
Epitome inpainting based compression
[M. Alain, S. Cherigui, C. Guillemot, D. Thoreau, P. Guillotel, 2014]