Image and Video Coding: From Block Transforms to JPEG
JPEG Overview
The JPEG Standard
The Joint Photographic Experts Group (JPEG) standard is named after the group that created it
Joint committee of ITU-T and ISO/IEC JTC 1
  ITU-T Study Group 16, Working Party 3, Question 6 (Visual Coding Experts Group: VCEG)
  ISO/IEC Joint Technical Committee 1, Subcommittee 29, Working Group 1 (JTC 1/SC 29/WG 1)
Digital Compression and Coding of Continuous-Tone Still Images (JPEG Standard)
  Officially ITU-T Recommendation T.81 and International Standard ISO/IEC 10918-1
  Commonly referred to as JPEG
  Specifies compression for gray-level and color images
  Work commenced in 1986
  Standard published in 1992
  Still the most widely used standard for image compression
Heiko Schwarz (Freie Universität Berlin) — Image and Video Coding: From Block Transforms to JPEG 2 / 43
JPEG Overview / Color Components
JPEG: Source Image Formats
Single-Component Images
  2D array of integer sample values
  Characterized by
    image width W and image height H
    sample bit depth B, i.e., sample range $[0,\ 2^B - 1]$
Multi-Component Images / Color Images
  Multiple 2D arrays of integer sample values
  Characterized by
    maximum width W and height H of the arrays
    vertical and horizontal downsampling factors
  Most common: Y’CbCr format
    Y’: full-resolution luma component
    Cb, Cr: downsampled chroma components
[Figure: a single-component image and the Y’, Cb, and Cr arrays of the most common format, Y’CbCr 4:2:0]
JPEG Overview / Partitioning
JPEG: Color Components and Partitioning
Partitioning of Color Components
  Each color component is partitioned into 8×8 blocks of samples
  If the array size is not a multiple of 8 in each dimension: insert the missing samples (they are removed again after decoding)
  Most common method: constant border extension
    $s[x, y] = s[W-1, y]$ for $x \ge W$
    $s[x, y] = s[x, H-1]$ for $y \ge H$
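The constant border extension can be sketched as follows. This is only an illustration under assumptions: the function name `pad_to_block_multiple` and the list-of-rows image layout are made up here and are not part of the JPEG standard.

```python
def pad_to_block_multiple(samples, block=8):
    """Extend a 2D sample array (list of rows) to a multiple of `block`
    in both dimensions by repeating the last column and the last row."""
    height, width = len(samples), len(samples[0])
    padded_w = -(-width // block) * block    # round up to the next multiple
    padded_h = -(-height // block) * block
    # s[x, y] = s[W-1, y] for x >= W
    rows = [row + [row[-1]] * (padded_w - width) for row in samples]
    # s[x, y] = s[x, H-1] for y >= H
    rows += [list(rows[-1]) for _ in range(padded_h - height)]
    return rows

image = [[x + 10 * y for x in range(10)] for y in range(3)]   # W = 10, H = 3
padded = pad_to_block_multiple(image)
print(len(padded[0]), len(padded))
```

A decoder would crop the reconstructed array back to W×H, which is why the inserted samples never reach the viewer.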
JPEG Overview / Source Coding Algorithm
JPEG: Basic Source Coding Algorithm
[Figure: block diagram of the codec. Encoder: original 8×8 block of the sample array → 2D block transform → quantization (lossy part) → entropy coding → bitstream. The bits are transmitted or stored. Decoder: bitstream → entropy decoding → dequantization → 2D inverse transform → reconstructed 8×8 block of the sample array.]
Basic Encoding Algorithm (for 8×8 blocks of samples)
1 Transform:
Energy compaction (reduce statistical dependencies between samples)
2 Quantization:
Approximate signal in a suitable way (enables more efficient coding)
3 Entropy Coding:
Represent quantization indexes with as few bits as possible
Decoder: Inverse operations of encoder (note: quantization is not invertible)
Block Transform / Linear Transform
Linear Transform of Sample Vectors
Consider an arbitrary vector $s = (s_0, s_1, s_2, \dots, s_{N-1})^T$ of N samples
Linear transform: matrix-vector multiplication (each coefficient is a linear combination of the samples)
Forward Transform (in encoder)
  Map sample vector s to vector u of transform coefficients
  $$ u = A \cdot s $$
  with A: N×N transform matrix, u: vector with N coefficients
Inverse Transform (in decoder)
  Map reconstructed coefficients u′ to sample vector s′
  $$ s' = A^{-1} \cdot u' $$
  with $A^{-1}$: inverse of matrix A, u′: N reconstructed coefficients
Perfect reconstruction ($s' = s$) in the absence of quantization ($u' = u$)
Interpretation of Linear Transforms
Inverse Transform
  Reconstructed vector of samples is obtained by $s' = A^{-1} \cdot u'$
  $$ s' = \begin{bmatrix} b_0 & b_1 & b_2 & \cdots & b_{N-1} \end{bmatrix} \cdot \begin{bmatrix} u'_0 \\ u'_1 \\ \vdots \\ u'_{N-1} \end{bmatrix} = u'_0 \cdot b_0 + u'_1 \cdot b_1 + u'_2 \cdot b_2 + \cdots + u'_{N-1} \cdot b_{N-1} $$
  Reconstructed vector s′ is represented as a linear combination of the basis vectors $b_k$ (columns of $A^{-1}$)
  Transform coefficients $u'_k$ represent the weighting factors for the corresponding basis vectors $b_k$
Forward Transform
  Vector of transform coefficients is obtained by $u = A \cdot s$
  Decomposition of sample vector s into a linear combination of the basis vectors $b_k$
  Transform coefficients $u_k$ represent the corresponding weighting factors
Example for Possible Basis Vectors
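Linearly independent basis vectors (required for invertibility of $A$ and $A^{-1}$)
$$ b_0 = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}, \quad b_1 = \begin{bmatrix} 3 \\ 1 \\ -1 \\ -3 \end{bmatrix}, \quad b_2 = \begin{bmatrix} 1 \\ -1 \\ -1 \\ 1 \end{bmatrix}, \quad b_3 = \begin{bmatrix} 1 \\ -1 \\ 1 \\ -1 \end{bmatrix} $$
Inverse transform matrix $A^{-1}$
$$ A^{-1} = \begin{bmatrix} b_0 & b_1 & b_2 & b_3 \end{bmatrix} = \begin{bmatrix} 1 & 3 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -3 & 1 & -1 \end{bmatrix} $$
Forward transform matrix $A$
$$ A = \left( A^{-1} \right)^{-1} = \begin{bmatrix} 0.250 & 0.250 & 0.250 & 0.250 \\ 0.125 & 0.125 & -0.125 & -0.125 \\ 0.250 & -0.250 & -0.250 & 0.250 \\ 0.125 & -0.375 & 0.375 & -0.125 \end{bmatrix} $$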
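As a quick numerical check of this example, the following sketch applies the forward and the inverse transform to one sample vector. The vector `s` is made up for illustration, and `matvec` is a hypothetical helper name.

```python
# Basis vectors of the example (they form the columns of the inverse matrix).
basis = [[1, 1, 1, 1], [3, 1, -1, -3], [1, -1, -1, 1], [1, -1, 1, -1]]
A_inv = [[basis[k][n] for k in range(4)] for n in range(4)]
A = [[0.250, 0.250, 0.250, 0.250],
     [0.125, 0.125, -0.125, -0.125],
     [0.250, -0.250, -0.250, 0.250],
     [0.125, -0.375, 0.375, -0.125]]

def matvec(M, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

s = [5, 3, 2, 2]          # arbitrary sample vector (assumption)
u = matvec(A, s)          # forward transform: u = A s
s_rec = matvec(A_inv, u)  # inverse transform: s' = A^-1 u
print(u, s_rec)
```

Without quantization the inverse transform returns the original samples, which is the perfect-reconstruction property stated for linear transforms.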
Block Transform / Orthogonal Transform
Orthogonal Transform
Mean Squared Error (MSE) of Samples
$$ \frac{1}{N} \sum_k \left( s'_k - s_k \right)^2 = \frac{1}{N} \left( s' - s \right)^T \left( s' - s \right) = \frac{1}{N} \left( A^{-1} u' - A^{-1} u \right)^T \left( A^{-1} u' - A^{-1} u \right) = \frac{1}{N} \left( u' - u \right)^T \cdot \left( A^{-1} \right)^T A^{-1} \cdot \left( u' - u \right) $$
  In general: complicated relationship to the quantization errors $u'_k - u_k$ of the transform coefficients
  MSE cannot be minimized by independent quantization of the transform coefficients
Orthogonal Transforms
  Inverse matrix is equal to the transpose: $A^{-1} = A^T$, so that $\left( A^{-1} \right)^T A^{-1} = A \cdot A^T = I$
  MSE in sample domain is equal to MSE in transform domain
  $$ \frac{1}{N} \sum_k \left( s'_k - s_k \right)^2 = \frac{1}{N} \sum_k \left( u'_k - u_k \right)^2 $$
  Transform coefficients can be quantized independently of each other
  Main reason for using orthogonal transforms in lossy coding
Orthonormal Basis
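Property of Orthogonal Transforms
  Consider product of forward and inverse matrix: $A \cdot A^T = A \cdot A^{-1} = I$
  $$ \begin{bmatrix} b_0^T \\ b_1^T \\ b_2^T \\ \vdots \\ b_{N-1}^T \end{bmatrix} \cdot \begin{bmatrix} b_0 & b_1 & b_2 & \cdots & b_{N-1} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix} $$
  Basis vectors $b_k$ are orthogonal to each other
  Basis vectors $b_k$ have a length equal to 1
  Basis vectors of orthogonal matrices form an orthonormal basis
Geometric Interpretation
  Rotation (and possible reflection) of the coordinate system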
Example of an Orthogonal Transform for N = 2
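Vector of two samples $s = (s_0, s_1)^T$
Inverse transform matrix
$$ A^{-1} = \begin{bmatrix} b_0 & b_1 \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} $$
Representation of signal vector
$$ s = u_0 \cdot b_0 + u_1 \cdot b_1 $$
$$ \begin{bmatrix} 4 \\ 2 \end{bmatrix} = u_0 \cdot \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \end{bmatrix} + u_1 \cdot \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ -1 \end{bmatrix} $$
$$ \begin{bmatrix} 4 \\ 2 \end{bmatrix} = 3 \cdot \begin{bmatrix} 1 \\ 1 \end{bmatrix} + 1 \cdot \begin{bmatrix} 1 \\ -1 \end{bmatrix} $$
[Figure: signal vector s in the $(s_0, s_1)$ plane, decomposed into $u_0 \cdot b_0$ and $u_1 \cdot b_1$ along the basis vectors $b_0$ and $b_1$]
Forward transform: project signal vector onto basis vectors
$$ u_0 = b_0^T \cdot s = 3\sqrt{2} \quad \text{and} \quad u_1 = b_1^T \cdot s = \sqrt{2} $$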
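The projection in this example, and the equality of the MSE in the sample and the transform domain for orthogonal transforms, can be checked with a small sketch. The quantization step size 2 is an arbitrary assumption made only to create a reconstruction error.

```python
import math

r = 1 / math.sqrt(2)
b0, b1 = [r, r], [r, -r]          # orthonormal basis vectors of the example

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

s = [4, 2]
u = [dot(b0, s), dot(b1, s)]      # forward transform: projection onto b0 and b1

# Coarse quantization of the coefficients (step size 2 is an assumption).
u_rec = [2 * round(c / 2) for c in u]
s_rec = [u_rec[0] * b0[n] + u_rec[1] * b1[n] for n in range(2)]

mse_samples = sum((a - b) ** 2 for a, b in zip(s_rec, s)) / 2
mse_coeffs = sum((a - b) ** 2 for a, b in zip(u_rec, u)) / 2
print(u, mse_samples, mse_coeffs)
```

Both MSE values agree up to floating-point rounding, because the basis is orthonormal.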
Block Transform / Decorrelation
Goal of Transform
[Figure: joint histogram $N(s_n, s_{n+1})$ of two adjacent samples, measured over 15 test images (each 768×512)]
Goal of Transform: Compaction of Signal Energy
  Reduce statistical dependencies between samples inside a block
  Note: linear transforms can only remove linear dependencies
Statistical Measures for Characterizing Linear Dependencies
Consider a collection of one-dimensional signals (interpreted as realizations of a random source)
Consider two samples $s[k]$ and $s[\ell]$ with a fixed spatial relationship
Associated random variables are denoted as $S_k$ and $S_\ell$
Covariance and Correlation Coefficient
  Variance and mean: $\sigma_k^2 = E\{ (S_k - \mu_k)^2 \}$, $\mu_k = E\{ S_k \}$
  Covariance: $\sigma_{k,\ell}^2 = E\{ (S_k - \mu_k)(S_\ell - \mu_\ell) \}$ (with $\sigma_{k,k}^2 = \sigma_k^2$)
  Correlation coefficient: $\varrho_{k,\ell} = \dfrac{\sigma_{k,\ell}^2}{\sqrt{\sigma_k^2 \cdot \sigma_\ell^2}}$ (with $-1 \le \varrho_{k,\ell} \le 1$)
Interpretation
  No linear dependencies: $\varrho_{k,\ell} = 0$ and $\sigma_{k,\ell}^2 = 0$ (decorrelated samples)
  Strong linear dependencies: $|\varrho_{k,\ell}| \approx 1$
Linear Dependencies inside Vectors of Samples
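Consider a collection of sample vectors $s = [ s_0, s_1, s_2, \dots, s_{N-1} ]^T$ of size N
The associated random vector is denoted by $S = [ S_0, S_1, S_2, \dots, S_{N-1} ]^T$
Covariance Matrix
  Matrix of covariances between samples inside vectors
  $$ C_{SS} = E\{ (S - \mu)(S - \mu)^T \} = \begin{bmatrix} \sigma_{0,0}^2 & \sigma_{0,1}^2 & \sigma_{0,2}^2 & \cdots & \sigma_{0,N-1}^2 \\ \sigma_{1,0}^2 & \sigma_{1,1}^2 & \sigma_{1,2}^2 & \cdots & \sigma_{1,N-1}^2 \\ \sigma_{2,0}^2 & \sigma_{2,1}^2 & \sigma_{2,2}^2 & \cdots & \sigma_{2,N-1}^2 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \sigma_{N-1,0}^2 & \sigma_{N-1,1}^2 & \sigma_{N-1,2}^2 & \cdots & \sigma_{N-1,N-1}^2 \end{bmatrix} \quad \text{with} \quad \mu = \begin{bmatrix} \mu_0 \\ \mu_1 \\ \mu_2 \\ \vdots \\ \mu_{N-1} \end{bmatrix} $$
  Covariance matrix characterizes linear dependencies between samples of a vector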
Determination of Covariance Matrix for Training Set
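Determination of Covariance Matrix
  Use a sufficiently large set of N example vectors $\{ s_i \}$ (here N denotes the number of example vectors, not the vector length)
  Each covariance (matrix element) can be estimated according to
  $$ \sigma_{k,\ell}^2 = \frac{1}{N} \sum_{\forall i} \left( s_i[k] - \mu_k \right) \left( s_i[\ell] - \mu_\ell \right) \quad \text{with} \quad \mu_k = \frac{1}{N} \sum_{\forall i} s_i[k] \quad \text{and} \quad \mu_\ell = \frac{1}{N} \sum_{\forall i} s_i[\ell] $$
Examples for Covariance Matrices (vectors of 4 horizontally adjacent samples)
  Strongly correlated samples:
  $$ \sigma_S^2 \cdot \begin{bmatrix} 1.00 & 0.97 & 0.93 & 0.88 \\ 0.97 & 1.00 & 0.97 & 0.93 \\ 0.93 & 0.97 & 1.00 & 0.97 \\ 0.88 & 0.93 & 0.97 & 1.00 \end{bmatrix} $$
  Uncorrelated samples:
  $$ \sigma_S^2 \cdot \begin{bmatrix} 1.00 & 0.00 & 0.00 & 0.00 \\ 0.00 & 1.00 & 0.00 & 0.00 \\ 0.00 & 0.00 & 1.00 & 0.00 \\ 0.00 & 0.00 & 0.00 & 1.00 \end{bmatrix} $$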
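The estimation rule can be sketched in a few lines. The training set below is made up and far too small for a real estimate; the function name `covariance_matrix` is likewise an assumption of this sketch.

```python
def covariance_matrix(vectors):
    """Estimate the covariance matrix from a list of equally sized sample vectors."""
    count, size = len(vectors), len(vectors[0])
    mean = [sum(v[k] for v in vectors) / count for k in range(size)]
    return [[sum((v[k] - mean[k]) * (v[l] - mean[l]) for v in vectors) / count
             for l in range(size)] for k in range(size)]

# Tiny made-up training set: vectors of two adjacent samples each.
training = [[1, 2], [2, 3], [3, 5], [4, 6]]
C = covariance_matrix(training)
rho = C[0][1] / (C[0][0] * C[1][1]) ** 0.5   # correlation coefficient
print(C, rho)
```

The off-diagonal entries divided by the geometric mean of the two variances give the correlation coefficient, which is close to 1 for this strongly dependent toy data.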
Linear Dependencies between Transform Coefficients
Covariance Matrix for Vectors of Transform Coefficients
  Same expression as for signal vectors
  $$ C_{UU} = E\{ (U - \mu_U)(U - \mu_U)^T \} = E\{ A \cdot (S - \mu_S)(S - \mu_S)^T \cdot A^T \} = A \cdot C_{SS} \cdot A^T $$
  Goal: choose the orthogonal matrix A such that the transform coefficients become decorrelated
Measure for Energy Compaction and Decorrelation
  Linear algebra: the trace of a matrix is similarity-invariant: $\mathrm{tr}(X) = \mathrm{tr}(A \, X \, A^{-1})$
  Transform Gain (in dB)
  $$ G_T = 10 \cdot \log_{10} \left( \frac{ \frac{1}{N} \sum_k \sigma_k^2 }{ \sqrt[N]{ \prod_k \sigma_k^2 } } \right) $$
    numerator: arithmetic mean of the transform coefficient variances
    denominator: geometric mean of the transform coefficient variances
  Maximized if the transform coefficients are completely decorrelated ($C_{UU}$ is a diagonal matrix)
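Example: Effect of Decorrelating Transform
$$ C_{SS} = \begin{bmatrix} 1 & 0.9 \\ 0.9 & 1 \end{bmatrix}, \qquad A = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix} \quad (\text{rotation by } \varphi = -45^\circ) $$
$$ C_{UU} = \begin{bmatrix} 1.9 & 0 \\ 0 & 0.1 \end{bmatrix} $$
Transform gain: $G_T = 3.6$ dB
[Figure: distribution of sample pairs in the $(s_0, s_1)$ plane and of coefficient pairs in the $(u_0, u_1)$ plane]
Signal energy is concentrated in the first transform coefficient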
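The numbers of this example can be reproduced with the following sketch, which evaluates $A \cdot C_{SS} \cdot A^T$ and the transform gain. The helper names `transform_covariance` and `transform_gain_db` are assumptions of this sketch.

```python
import math

def transform_covariance(A, C):
    """Return A * C * A^T for square matrices given as lists of rows."""
    n = len(A)
    AC = [[sum(A[i][k] * C[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    return [[sum(AC[i][k] * A[j][k] for k in range(n)) for j in range(n)] for i in range(n)]

def transform_gain_db(variances):
    """10 * log10 of the arithmetic mean over the geometric mean of the variances."""
    n = len(variances)
    arithmetic = sum(variances) / n
    geometric = math.exp(sum(math.log(v) for v in variances) / n)
    return 10 * math.log10(arithmetic / geometric)

r = 1 / math.sqrt(2)
A = [[r, r], [-r, r]]
C_ss = [[1.0, 0.9], [0.9, 1.0]]
C_uu = transform_covariance(A, C_ss)
gain = transform_gain_db([C_uu[0][0], C_uu[1][1]])
print(C_uu, round(gain, 2))
```

The geometric mean is computed through logarithms, which avoids overflow or underflow when many variances are multiplied.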
Block Transform / The Karhunen Loève Transform
The Karhunen Loève Transform (KLT)
Optimal Transform for Lossy Source Coding
  Need to consider interactions with quantization and entropy coding
  Difficult to determine
The Karhunen Loève Transform (KLT)
  Orthogonal transform A that produces completely uncorrelated transform coefficients
  Design criterion
  $$ C_{UU} = A \cdot C_{SS} \cdot A^T = \begin{bmatrix} \sigma_0^2 & 0 & \cdots & 0 \\ 0 & \sigma_1^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_{N-1}^2 \end{bmatrix} $$
  KLT exists for all random sources (symmetric matrices are always orthogonally diagonalizable)
Basis Vectors of the Karhunen Loève Transform
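Required property for the orthogonal transform matrix A
  $A \cdot C_{SS} \cdot A^T = C_{UU}$ (with $C_{UU}$ being a diagonal matrix)
  $\left( A^T \cdot A \right) \cdot C_{SS} \cdot A^T = A^T \cdot C_{UU}$ (orthogonal transform: $A^T = A^{-1}$)
  $C_{SS} \cdot A^T = A^T \cdot C_{UU}$ (rows of A: basis vectors $b_k$)
  $$ C_{SS} \cdot \begin{bmatrix} b_0 & b_1 & \cdots & b_{N-1} \end{bmatrix} = \begin{bmatrix} b_0 & b_1 & \cdots & b_{N-1} \end{bmatrix} \cdot \begin{bmatrix} \sigma_0^2 & 0 & \cdots & 0 \\ 0 & \sigma_1^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_{N-1}^2 \end{bmatrix} $$
Consider individual columns of the matrix equation
  $\forall k: \; C_{SS} \cdot b_k = \sigma_k^2 \cdot b_k$ (eigenvector equation)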
Determination of KLT Transform Matrix
Condition for KLT Basis Vectors
  For each basis vector $b_k$, we have an eigenvector equation
  $$ C_{SS} \cdot b_k = \sigma_k^2 \cdot b_k \qquad \left( \text{general form: } C_{SS} \cdot v = \xi \cdot v \right) $$
  Note: eigenvectors v are not unique (they can be scaled by any non-zero factor)
  But the basis vectors $b_k$ must have an $\ell_2$-norm equal to $\| b_k \|_2 = 1$
KLT Matrix
  Basis vectors are the unit-norm eigenvectors of $C_{SS}$: $b_k = \dfrac{v_k}{\| v_k \|_2}$
  Transform coefficient variances $\sigma_k^2 = \xi_k$ are given by the associated eigenvalues of $C_{SS}$
  $$ A_{KLT} = \begin{bmatrix} b_0 & b_1 & \cdots & b_{N-1} \end{bmatrix}^T $$
  Typically, basis vectors are sorted in decreasing order of their eigenvalues
Simple Model for Correlated Sources
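Auto-Regressive Sources of Order One: AR(1) Sources
  Auto-covariances depend only on the difference of the sample locations
  $$ \sigma_{k,\ell}^2 = E\{ (S_k - \mu)(S_\ell - \mu) \} = \sigma_S^2 \cdot \varrho^{|k - \ell|} $$
  with $\varrho$ being the first-order correlation coefficient (between two adjacent samples)
  N-th order auto-covariance matrix
  $$ C_{SS} = E\{ (S - \mu)(S - \mu)^T \} = \sigma_S^2 \cdot \begin{bmatrix} 1 & \varrho & \varrho^2 & \varrho^3 & \cdots & \varrho^{N-1} \\ \varrho & 1 & \varrho & \varrho^2 & \cdots & \varrho^{N-2} \\ \varrho^2 & \varrho & 1 & \varrho & \cdots & \varrho^{N-3} \\ \varrho^3 & \varrho^2 & \varrho & 1 & \cdots & \varrho^{N-4} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ \varrho^{N-1} & \varrho^{N-2} & \varrho^{N-3} & \varrho^{N-4} & \cdots & 1 \end{bmatrix} $$
  The KLT transform matrix depends only on the correlation coefficient $\varrho$ and the transform size N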
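Building the AR(1) covariance matrix takes one line per entry. The following sketch uses the hypothetical function name `ar1_covariance` and a unit variance by default.

```python
def ar1_covariance(size, rho, variance=1.0):
    """Covariance matrix of an AR(1) source: entries variance * rho**|k - l|."""
    return [[variance * rho ** abs(k - l) for l in range(size)] for k in range(size)]

for row in ar1_covariance(4, 0.5):
    print(row)
```

The resulting matrix is symmetric and constant along each diagonal, since an entry depends only on the distance between the two sample positions.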
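KLT of size N = 4 for AR(1) Source with Correlation Coefficient ϱ = 0.5
$$ C_{SS} = \sigma_S^2 \cdot \begin{bmatrix} 1.0000 & 0.5000 & 0.2500 & 0.1250 \\ 0.5000 & 1.0000 & 0.5000 & 0.2500 \\ 0.2500 & 0.5000 & 1.0000 & 0.5000 \\ 0.1250 & 0.2500 & 0.5000 & 1.0000 \end{bmatrix} $$
$$ A_{KLT} = \begin{bmatrix} 0.4352 & 0.5573 & 0.5573 & 0.4352 \\ 0.6325 & 0.3162 & -0.3162 & -0.6325 \\ 0.5573 & -0.4352 & -0.4352 & 0.5573 \\ 0.3162 & -0.6325 & 0.6325 & -0.3162 \end{bmatrix} $$
$$ C_{UU} = \sigma_S^2 \cdot \begin{bmatrix} 2.0856 & 0 & 0 & 0 \\ 0 & 1.0000 & 0 & 0 \\ 0 & 0 & 0.5394 & 0 \\ 0 & 0 & 0 & 0.3750 \end{bmatrix} $$
$G_T = 0.94$ dB (factor 1.24)
[Figure: plots of the basis vectors $b_0$, $b_1$, $b_2$, $b_3$]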
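The first KLT basis vector of this example can be approximated without a linear-algebra library. The sketch below uses power iteration, which is a technique chosen here for brevity and not necessarily how the values above were computed; it finds only the eigenvector with the largest eigenvalue.

```python
import math

rho, n = 0.5, 4
C = [[rho ** abs(k - l) for l in range(n)] for k in range(n)]   # sigma_S^2 = 1

v = [1.0] * n                       # start vector (arbitrary choice)
for _ in range(200):                # power iteration: apply C, then renormalize
    w = [sum(C[k][l] * v[l] for l in range(n)) for k in range(n)]
    norm = math.sqrt(sum(x * x for x in w))
    v = [x / norm for x in w]

Cv = [sum(C[k][l] * v[l] for l in range(n)) for k in range(n)]
eigenvalue = sum(a * b for a, b in zip(v, Cv))   # Rayleigh quotient v^T C v
print([round(x, 4) for x in v], round(eigenvalue, 4))
```

The printed vector should agree with the first row of the KLT matrix and the eigenvalue with the largest coefficient variance, up to the four decimals shown. A full KLT would need all eigenvectors, for example from a symmetric eigenvalue solver.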
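KLT of size N = 4 for AR(1) Source with Correlation Coefficient ϱ = 0.95
$$ C_{SS} = \sigma_S^2 \cdot \begin{bmatrix} 1.0000 & 0.9500 & 0.9025 & 0.8574 \\ 0.9500 & 1.0000 & 0.9500 & 0.9025 \\ 0.9025 & 0.9500 & 1.0000 & 0.9500 \\ 0.8574 & 0.9025 & 0.9500 & 1.0000 \end{bmatrix} $$
$$ A_{KLT} = \begin{bmatrix} 0.4937 & 0.5062 & 0.5062 & 0.4937 \\ 0.6516 & 0.2747 & -0.2747 & -0.6516 \\ 0.5062 & -0.4937 & -0.4937 & 0.5062 \\ 0.2747 & -0.6516 & 0.6516 & -0.2747 \end{bmatrix} $$
$$ C_{UU} = \sigma_S^2 \cdot \begin{bmatrix} 3.7568 & 0 & 0 & 0 \\ 0 & 0.1627 & 0 & 0 \\ 0 & 0 & 0.0506 & 0 \\ 0 & 0 & 0 & 0.0300 \end{bmatrix} $$
$G_T = 7.58$ dB (factor 5.73)
[Figure: plots of the basis vectors $b_0$, $b_1$, $b_2$, $b_3$]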
Block Transform / The Discrete Cosine Transform
Intermediate Results
Linear Transform of Sample Vectors
  Forward transform: $u = A \cdot s$
  Inverse transform: $s' = A^{-1} \cdot u'$
Orthogonal Transform
  Transform matrix has the property $A^{-1} = A^T$
  Allows independent quantization of transform coefficients
Karhunen Loève Transform
  Orthogonal transform that produces uncorrelated transform coefficients
  Maximum energy compaction for a given source
  Transform matrix is signal dependent
Question: Can we derive a suitable signal-independent transform?
Convergence of KLT for AR(1) Sources
[Figure: KLT basis vectors $b_0$, $b_1$, $b_2$, $b_3$ for ϱ = 0.1, ϱ = 0.5, ϱ = 0.9, and ϱ = 0.95]
KLT transform matrix converges for ϱ → 1
The Discrete Cosine Transform of Type II (DCT-II)
Transform Matrix of the Discrete Cosine Transform of Type II (DCT-II)
  The DCT is an orthogonal transform
  The N×N transform matrix $A_{DCT} = \{ a_{kn} \}$ has the elements
  $$ a_{kn} = \frac{\alpha_k}{\sqrt{N}} \cdot \cos\left( \frac{\pi}{N} \, k \left( n + \frac{1}{2} \right) \right) \quad \text{with} \quad \alpha_k = \begin{cases} 1 & : k = 0 \\ \sqrt{2} & : k \ne 0 \end{cases} $$
  The basis vectors $b_k = \{ a_{kn} \}$ represent sampled cosine functions of different frequencies
Relation to KLT
  Unit-norm eigenvectors of $C_{SS}$ approach the DCT-II basis vectors for ϱ → 1
Advantages of DCT-II
  Transform matrix does not depend on the input signal
  Fast algorithms for computing the forward and inverse transforms (butterfly structure)
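A direct (not fast) construction of the DCT-II matrix is sketched below, together with a check that its rows are orthonormal. The function name `dct2_matrix` is an assumption of this sketch.

```python
import math

def dct2_matrix(n):
    """N x N DCT-II matrix; row k holds the basis vector b_k."""
    rows = []
    for k in range(n):
        alpha = 1.0 if k == 0 else math.sqrt(2.0)
        rows.append([alpha / math.sqrt(n) * math.cos(math.pi / n * k * (m + 0.5))
                     for m in range(n)])
    return rows

A = dct2_matrix(4)
# Orthogonality check: A * A^T should be the identity matrix.
gram = [[sum(a * b for a, b in zip(A[i], A[j])) for j in range(4)] for i in range(4)]
print([round(x, 4) for x in A[1]])
```

For N = 4 the second basis vector is close to the corresponding KLT basis vector of an AR(1) source with a high correlation coefficient, which illustrates the stated relation to the KLT.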
Basis Functions of the DCT-II (Example for N = 8)
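$$ b_k[n] = \frac{\alpha_k}{\sqrt{N}} \cdot \cos\left( \frac{\pi}{N} \, k \left( n + \frac{1}{2} \right) \right) $$
[Figure: plots of the basis functions $b_0$ to $b_7$]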
AR(1) Sources: Energy Compaction of KLT and DCT-II for N = 8
[Figure: loss of the DCT-II versus the KLT in energy compaction $G_T$; horizontal axis: correlation factor ϱ (0.1 to 1); vertical axis: difference in energy compaction in dB (0.00 to 0.07)]
Block Transform / 2D Transforms
Image Coding: 2D Transform
Image Coding
  Statistical dependencies in multiple directions (e.g., vertically and horizontally adjacent samples)
  Images are typically coded using N×N blocks of samples
Straightforward Extension to Two Dimensions
  Arrange the samples of an N×N block into a vector of size $N^2$
  $$ s_{blk} = \begin{bmatrix} s_{00} & s_{01} & s_{02} & s_{03} \\ s_{10} & s_{11} & s_{12} & s_{13} \\ s_{20} & s_{21} & s_{22} & s_{23} \\ s_{30} & s_{31} & s_{32} & s_{33} \end{bmatrix} $$
  $$ s_{vec} = \begin{bmatrix} s_{00} & s_{01} & s_{02} & s_{03} & s_{10} & s_{11} & s_{12} & s_{13} & s_{20} & s_{21} & s_{22} & s_{23} & s_{30} & s_{31} & s_{32} & s_{33} \end{bmatrix}^T $$
  Design transform matrix A for vectors $s_{vec}$ of size $N^2$
  Transform matrix has the size $N^2 \times N^2$
  Requires $N^2$ multiplications per sample (for N = 8: 64 multiplications per sample)
Image Coding: Separable 2D Transform
Separable 2D Transform
  Successive 1D transforms for the rows and columns of an N×N image block
  Separable orthogonal transform
  $$ \begin{bmatrix} u_{00} & u_{01} & u_{02} & u_{03} \\ u_{10} & u_{11} & u_{12} & u_{13} \\ u_{20} & u_{21} & u_{22} & u_{23} \\ u_{30} & u_{31} & u_{32} & u_{33} \end{bmatrix} = A_{ver} \cdot \begin{bmatrix} s_{00} & s_{01} & s_{02} & s_{03} \\ s_{10} & s_{11} & s_{12} & s_{13} \\ s_{20} & s_{21} & s_{22} & s_{23} \\ s_{30} & s_{31} & s_{32} & s_{33} \end{bmatrix} \cdot A_{hor}^T $$
  with $A_{ver}$ being an N×N transform matrix for transforming the columns, and $A_{hor}$ being an N×N transform matrix for transforming the rows
  Inverse transform is also separable: $s' = A_{ver}^T \cdot u' \cdot A_{hor}$
Impact on Complexity
  Requires 2N multiplications per sample (instead of $N^2$ for a non-separable design)
    N = 8: 16 multiplications (instead of 64 multiplications)
    N = 16: 32 multiplications (instead of 256 multiplications)
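A separable transform is two matrix products. The sketch below assumes that the same DCT-II matrix is used for $A_{ver}$ and $A_{hor}$; the block contents and all helper names are made up for illustration.

```python
import math

def dct2_matrix(n):
    """N x N DCT-II matrix; row k holds the basis vector b_k."""
    return [[(1.0 if k == 0 else math.sqrt(2.0)) / math.sqrt(n)
             * math.cos(math.pi / n * k * (m + 0.5)) for m in range(n)]
            for k in range(n)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def forward_2d(block, A):
    return matmul(matmul(A, block), transpose(A))     # u = A s A^T

def inverse_2d(coeffs, A):
    return matmul(matmul(transpose(A), coeffs), A)    # s' = A^T u' A

A = dct2_matrix(8)
block = [[10 * y + x for x in range(8)] for y in range(8)]   # made-up smooth block
u = forward_2d(block, A)
rec = inverse_2d(u, A)
print(round(u[0][0], 3))
```

For an 8×8 block the coefficient at position (0, 0) equals 8 times the block average, and the inverse transform recovers the block up to floating-point rounding.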
Transform in JPEG
$$ u = A_{DCT} \cdot s \cdot A_{DCT}^T \quad \text{and} \quad s' = A_{DCT}^T \cdot u' \cdot A_{DCT} $$
Separable DCT-II for 8×8 blocks
  Successive 1D transforms of rows and columns (both orders yield the same result)
  The DCT-II transform matrix $A_{DCT}$ is used for both transforms
[Figure: original block → horizontal DCT-II → block after horizontal DCT → vertical DCT-II → block after 2D DCT]
Basis Images of Separable 8×8 DCT-II (used in JPEG)
Separable 8×8 DCT-II for an Example Image
[Figure: original image (256×256), the image after the block-wise 8×8 DCT, and the sorted DCT coefficients]
Transform gain (energy compaction): GT = 22.5 dB
Scalar Quantization / Uniform Reconstruction Quantizer
Scalar Quantization of Transform Coefficients
[Figure: pdf f(u) of a transform coefficient with reconstruction levels at ±∆, ±2∆, ±3∆, ±4∆ (spacing ∆) and the associated quantization indexes q = ±1, ±2, ±3, ±4]
JPEG: Uniform Reconstruction Quantizer
  Reconstruction levels: uniformly spaced and centered around zero (quantization step size ∆)
  Simple decoder operation: $u' = q \cdot \Delta$ (q: quantization index)
  Encoder: freedom to adapt the decisions to the source and the entropy coding
    Simplest encoder: $q = \mathrm{round}\left( u / \Delta \right)$
    Better encoder: will be discussed later (rate-distortion optimized quantization)
  Quantization step size ∆ determines the trade-off between quality and bit rate
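The simplest encoder decision and the decoder operation can be sketched as follows. The tie-breaking rule (halves rounded away from zero) is an assumption of this sketch; the slide only specifies rounding.

```python
import math

def quantize(u, step):
    """Simplest encoder decision: round u / step to the nearest integer
    (ties away from zero; Python's built-in round() rounds ties to even)."""
    return int(math.copysign(math.floor(abs(u) / step + 0.5), u))

def reconstruct(q, step):
    """Decoder operation of the uniform reconstruction quantizer: u' = q * step."""
    return q * step

for u in (7.4, -3.0, 0.9):
    q = quantize(u, 2.0)
    print(u, q, reconstruct(q, 2.0))
```

Only the decoder operation is fixed by the quantizer design; an encoder is free to pick a different index, for instance a smaller one that costs fewer bits.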
Uniform Reconstruction Quantizer versus Optimal Scalar Quantizer
General scalar quantizers provide more freedom than URQs (optimization of reconstruction levels)
Compare optimal scalar quantizers (ECSQs) and URQs with optimal encoder decision
[Figure: SNR loss in dB of the optimal URQ versus the ECSQ as a function of the bit rate (entropy) in bits per sample, one plot for a Gaussian pdf and one for a Laplacian pdf]
  Gaussian pdf: $\Delta \mathrm{SNR} < 0.0063$ dB, $\mathrm{MSE}_{URQ} / \mathrm{MSE}_{opt} < 1.0015$
  Laplacian pdf: $\Delta \mathrm{SNR} < 0.00081$ dB, $\mathrm{MSE}_{URQ} / \mathrm{MSE}_{opt} < 1.0002$
Restriction to URQs has (typically) a very small impact on coding efficiency
Scalar Quantization / Quantization Tables
Quantization Tables
Freedom for Quantization of Transform Blocks
  Unique quantization step size per frequency position
  Can emulate the behavior of contrast sensitivity functions
Quantization Tables in JPEG
  Different step sizes $\Delta_{ik}$ for frequency positions (i, k)
  Need to be transmitted in the image header (no defaults)
  Example tables for the YCbCr format are specified in Annex K (empirically derived based on psychovisual experiments)
Quantization Tables in Practice
  Most JPEG encoders use flat tables $\Delta_{ik} = \Delta$
  Quantization step size ∆ is selected by a quality parameter
  Yields the best coding efficiency in terms of PSNR

Example table for luma blocks
  16  11  10  16  24  40  51  61
  12  12  14  19  26  58  60  55
  14  13  16  24  40  57  69  56
  14  17  22  29  51  87  80  62
  18  22  37  56  68 109 103  77
  24  35  55  64  81 104 113  92
  49  64  78  87 103 121 120 101
  72  92  95  98 112 100 103  99

Example table for chroma blocks
  17  18  24  47  99  99  99  99
  18  21  26  66  99  99  99  99
  24  26  56  99  99  99  99  99
  47  66  99  99  99  99  99  99
  99  99  99  99  99  99  99  99
  99  99  99  99  99  99  99  99
  99  99  99  99  99  99  99  99
  99  99  99  99  99  99  99  99
Scalar Quantization / Summary of JPEG Quantization
JPEG: Transform and Quantization
JPEG Encoder
  Select quantization table $\{ \Delta_{ik} \}$
  For each 8×8 block
    Apply separable DCT-II
    Quantize transform coefficients u (simple: $q_{ik} = \mathrm{round}\left( u_{ik} / \Delta_{ik} \right)$)
JPEG Decoder
  Read quantization table $\{ \Delta_{ik} \}$
  For each 8×8 block
    Reconstruct transform coefficients u′: $u'_{ik} = q_{ik} \cdot \Delta_{ik}$
    Apply separable inverse DCT-II
[Figure: original block s → DCT-II → transform coefficients u → quantization, $u' = \Delta \cdot \mathrm{round}\left( u / \Delta \right)$ → reconstructed coefficients u′ → IDCT-II → reconstructed block s′]
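The quantization step of this encoder and decoder can be sketched with the example luma table. The coefficient values are made up, the function names are assumptions, and ties are rounded away from zero as one possible choice.

```python
import math

LUMA_TABLE = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def quantize_block(u, table):
    """Encoder: q_ik = round(u_ik / delta_ik), ties away from zero."""
    return [[int(math.copysign(math.floor(abs(c) / d + 0.5), c))
             for c, d in zip(u_row, t_row)] for u_row, t_row in zip(u, table)]

def reconstruct_block(q, table):
    """Decoder: u'_ik = q_ik * delta_ik."""
    return [[qi * d for qi, d in zip(q_row, t_row)] for q_row, t_row in zip(q, table)]

u = [[0.0] * 8 for _ in range(8)]                    # made-up coefficient block
u[0][0], u[0][1], u[7][7] = 308.0, -18.3, 30.0
q = quantize_block(u, LUMA_TABLE)
u_rec = reconstruct_block(q, LUMA_TABLE)
print(q[0][:2], u_rec[0][:2], q[7][7])
```

The large step sizes at high frequencies turn moderately sized coefficients into zero indexes, which the entropy coder then represents cheaply.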
Entropy Coding / Scanning
Entropy Coding of Quantization Indexes
Scanning of Quantization Indexes
  Decorrelating transform concentrates signal energy into the low-frequency transform coefficients (top-left corner)
  Quantization indexes for high-frequency positions (bottom-right) are more likely to be equal to zero
  Scanning: traverse positions from low to high frequencies
  JPEG uses a so-called zig-zag scan
Coding of Quantization Indexes
  Different concepts for the DC coefficient and the AC coefficients
  DC coefficient: prediction + variable-length coding
  AC coefficients: zig-zag scan + run-level coding
[Figure: 8×8 block with the DC position in the top-left corner, the AC positions, and the zig-zag scan path]
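One compact way to generate a zig-zag order is to sort the positions by anti-diagonal and alternate the direction on each diagonal. This is a sketch with assumed names; practical codecs typically use a precomputed index table instead.

```python
def zigzag_order(n=8):
    """Positions (row, column) of an n x n block in zig-zag scan order."""
    def key(pos):
        row, col = pos
        diagonal = row + col
        # Odd anti-diagonals run from top-right to bottom-left,
        # even anti-diagonals run from bottom-left to top-right.
        return (diagonal, row if diagonal % 2 else col)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)

def scan(block):
    """Quantization indexes of a block in zig-zag order (DC first, then AC)."""
    return [block[r][c] for r, c in zigzag_order(len(block))]

order = zigzag_order()
print(order[:6])
```

The first scanned value is the DC index; the remaining 63 values are the AC indexes passed on to run-level coding.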
Entropy Coding / DC Coefficient
Entropy Coding of DC Coefficient
Prediction of DC Quantization Index
  DC (direct current) represents the scaled average of the block samples (often similar to the DC of neighboring blocks)
  Prediction using the DC quantization index of the preceding block
  Prediction difference $\mathrm{DIFF} = DC_k - DC_{k-1}$ is entropy coded
[Figure: neighboring blocks k − 1 and k with their DC indexes $DC_{k-1}$ and $DC_k$]
Entropy Coding of Difference DIFF
  Combination of variable-length and fixed-length coding
  Variable-length coding of category C
    Specifies the range of values for DIFF
    Codeword table has to be transmitted in the header
  Fixed-length coding of the value inside category C
    Number of bits = category C

  C    range of DIFF                       example codeword
  0    0                                   00
  1    -1, 1                               010
  2    -3 .. -2, 2 .. 3                    011
  3    -7 .. -4, 4 .. 7                    100
  4    -15 .. -8, 8 .. 15                  101
  5    -31 .. -16, 16 .. 31                110
  6    -63 .. -32, 32 .. 63                1110
  7    -127 .. -64, 64 .. 127              1111 0
  8    -255 .. -128, 128 .. 255            1111 10
  9    -511 .. -256, 256 .. 511            1111 110
  10   -1023 .. -512, 512 .. 1023          1111 1110
  11   -2047 .. -1024, 1024 .. 2047        1111 1111 0
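The category of a difference is simply the number of bits of its magnitude. The sketch below also derives the fixed-length bits; the mapping of negative values follows the convention of ITU-T T.81 as far as this sketch assumes it, and the DC values are made up.

```python
def category(diff):
    """Category C: number of bits of |diff| (category 0 holds only diff == 0)."""
    return abs(diff).bit_length()

def additional_bits(diff):
    """Fixed-length part: C bits that identify diff inside its category.
    Assumed convention: positive values as they are, negative values as
    diff + 2**C - 1, so that all C-bit patterns are used exactly once."""
    c = category(diff)
    if c == 0:
        return ""
    value = diff if diff > 0 else diff + (1 << c) - 1
    return format(value, "0{}b".format(c))

previous_dc, current_dc = 12, 9          # made-up DC quantization indexes
diff = current_dc - previous_dc
print(diff, category(diff), additional_bits(diff))
```

A complete codeword is the variable-length code of the category followed by these bits, so frequent small differences stay short.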
Entropy Coding / AC Coefficients
Run-Level Coding of AC Coefficients
[Figure: block of quantization indexes with the non-zero values 6, 2, 2, 1, traversed by the zig-zag scan]
Scanned sequence: 6, 2, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 0, 0, ...
(run, level) pairs: (0, 6) (0, 2) (2, 2) (3, 1) (eob)
Run-Level Coding of AC Quantization Indexes
  Convert into a sequence using the zig-zag scan
  Represent the scanned sequence as (run, level) pairs (with a special end-of-block symbol)
    run: number of zero indexes before the next non-zero index
    level: value of the next non-zero index
    eob: special (run, level) pair specifying end-of-block (all following indexes are equal to zero)
  Variable-length coding of the (run, category) pair (codeword table includes the eob symbol)
    Category has the same meaning as for the DC quantization index
    Codeword table is transmitted in the image header
  Fixed-length coding of the level inside the category (same as for the DC quantization index)
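The conversion into (run, level) pairs can be sketched as below. It is a simplification: the function name is an assumption, and runs that are longer than the codeword table supports are not treated specially.

```python
def run_level_pairs(ac_indexes):
    """Convert zig-zag scanned AC quantization indexes into (run, level) pairs."""
    pairs, run = [], 0
    for level in ac_indexes:
        if level == 0:
            run += 1                      # count zeros before the next non-zero index
        else:
            pairs.append((run, level))
            run = 0
    if run > 0:                           # trailing zeros are replaced by end-of-block
        pairs.append("eob")
    return pairs

scanned = [6, 2, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 0, 0]
print(run_level_pairs(scanned))
```

For the sequence of the example this yields (0, 6), (0, 2), (2, 2), (3, 1), and the end-of-block symbol, matching the pairs listed above.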
Example Codeword Table for Run-Category Pairs
Codewords for run/category pairs (columns: run, rows: category)

  eob: 1010

  category  run 0             run 1             run 2             run 3
  1         00                1100              11100             111010
  2         01                11011             11111001          111110111
  3         100               1111001           1111110111        111111110101
  4         1011              111110110         111111110100      1111111110001111
  5         11010             11111110110       1111111110001001  1111111110010000
  6         1111000           1111111110000100  1111111110001010  1111111110010001
  7         11111000          1111111110000101  1111111110001011  1111111110010010
  8         1111110110        1111111110000110  1111111110001100  1111111110010011
  9         1111111110000010  1111111110000111  1111111110001101  1111111110010100
  10        1111111110000011  1111111110001000  1111111110001110  1111111110010101
  ...       ...               ...               ...               ...
Coding Efficiency
JPEG: Coding Efficiency and Visual Quality
Quantization step size (or quantization table) controls trade-off between quality and bit rate
[Figure: coding efficiency of JPEG; horizontal axis: bit rate in bits per pixel (0.5 to 2); vertical axis: PSNR of the RGB components in dB (24 to 38)]
Summary
Summary of Lecture
Image Coding Standard “JPEG”
  Separate coding of color components (typically in Y’CbCr 4:2:0 format)
  Transform coding of 8×8 blocks of samples (transform, quantization, entropy coding)
Orthogonal Block Transform
  Separable block transform: successive transformation of the rows and columns of a block
  DCT-II: high energy compaction for highly correlated sources
Scalar Quantization
  Uniform Reconstruction Quantizers (possible usage of quantization tables)
  Lossy part: controls the trade-off between reconstruction quality and bit rate
Entropy Coding of Quantization Indexes
  DC coefficient: prediction and variable-length coding
  AC coefficients: zig-zag scan and run-level coding (with end-of-block symbol)