In the name of Allah, the compassionate, the merciful

Digital Video Processing

S. Kasaei
Room: CE 307
Department of Computer Engineering, Sharif University of Technology
E-Mail: skasaei@sharif.edu
Webpage: http://sharif.edu/~skasaei
Lab. Website: http://ipl.ce.sharif.edu
Acknowledgment
Most of the slides used in this course have been provided by Prof. Yao Wang (Polytechnic University, Brooklyn), based on the book Video Processing & Communications, by Yao Wang, Jörn Ostermann, & Ya-Qin Zhang, Prentice Hall, 1st edition, 2001, ISBN: 0130175471. [SUT Code: TK 5105 .2 .W36 2001].
Chapter 9
Waveform-Based Coding: Transform & Predictive Coding
Outline
Overview of video coding systems
Transform coding
Predictive coding
Kasaei 6
Components in a Coding System
Focus of this lecture.
Video Coding
Reduces the data rate of a video sequence by exploiting the spatial & spectral correlation between neighboring pixels, as well as the temporal correlation between consecutive frames of the sequence.
Spatial & spectral correlation is due to the fact that color values of adjacent pixels in the same video frame usually change smoothly.
Temporal correlation refers to the fact that consecutive frames usually show the same physical scene, occupied by the same objects that may have moved.
Transform Coding
Motivation:
Represent a vector (e.g., a block of image pels) as the superposition of some typical vectors (basic patterns, transform basis functions).
Quantize & code the coefficients.
Can be thought of as a constrained vector quantizer.
[Figure: an image block represented as a weighted superposition of basis patterns; t1, t2, t3, t4 are the transform coefficients.]
Transform Coding Block Diagram
Which Transform Basis to Use?
The transform should:
Minimize the correlation among the resulting coefficients, so that scalar quantization can be employed without losing too much coding efficiency (compared to vector quantization).
Compact the energy into as few coefficients as possible.
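As an illustration (not from the slides), a pure-Python 1-D DCT-II shows the energy-compaction property on a smooth, highly correlated row of sample values; the signal values are made up for the example:

```python
import math

def dct_1d(s):
    """Orthonormal 1-D DCT-II of a length-N sequence (pure-Python sketch)."""
    N = len(s)
    t = []
    for u in range(N):
        c = math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N)
        t.append(c * sum(s[x] * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                         for x in range(N)))
    return t

# A smooth (highly correlated) signal, as in typical image rows.
s = [100, 102, 104, 107, 110, 112, 113, 114]
t = dct_1d(s)

total = sum(v * v for v in t)
low = sum(v * v for v in t[:2])   # energy in the 2 lowest-frequency coefficients
print(low / total)                # close to 1.0: the energy is compacted
```

Because the transform is orthonormal, the total coefficient energy equals the total sample energy, yet almost all of it ends up in the first two coefficients.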
Which Transform Basis to Use?
Optimal transform:
Karhunen-Loeve transform (KLT): signal-statistics dependent (ensemble dependent, 2nd-order statistics).
Suboptimal transforms:
Discrete cosine transform (DCT): nearly as good as the KLT for common image signals (highly correlated signals).
Wavelet transform (WT): a powerful multiresolution tool that utilizes the space-frequency properties of non-stationary signals.
General Linear Transform
Linearly independent basis vectors (or blocks):
The inverse transform represents a vector (or block) as the superposition of the basis vectors (or blocks).
The forward transform determines the contribution (weight) of each basis vector.
s = U t (inverse transform); t = V s, with V = U^{-1} (forward transform).
Unitary Transform
Unitary (orthonormal) basis:
The basis vectors are orthogonal to each other & each has length 1 (u_k^H u_l = 1 for k = l, 0 otherwise), so U^{-1} = U^H.
The transform coefficient associated with a basis vector is simply the projection of the input vector onto that basis vector: t_l = u_l^H s.
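A minimal numeric sketch of these two facts, using the 2-point orthonormal basis; the input vector is illustrative:

```python
import math

# Orthonormal (unitary) basis for R^2: the 2-point DCT basis.
u0 = [1 / math.sqrt(2), 1 / math.sqrt(2)]
u1 = [1 / math.sqrt(2), -1 / math.sqrt(2)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

s = [3.0, 1.0]

# Forward transform: each coefficient is the projection onto a basis vector.
t = [dot(s, u0), dot(s, u1)]

# Inverse transform: superpose the basis vectors weighted by the coefficients.
s_rec = [t[0] * u0[i] + t[1] * u1[i] for i in range(2)]

print(t, s_rec)   # s_rec reproduces s (up to rounding)
```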
Discrete Cosine Transform Basis Images
Energy Distribution of DCT Coefficients in Typical Images
Images Approximated by Different Number of DCT Coefficients
Original image; with 4/64 coefficients; with 8/64 coefficients; with 16/64 coefficients.
Demos
Use the Matlab demo to demonstrate approximation using different numbers of DCT coefficients (dctdemo.m).
Distortion in Transform Coding
Distortion in the sample domain: ||s - ŝ||².
Distortion in the coefficient domain: ||t - t̂||².
These two distortions are equal when using a unitary transform.
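This equality can be checked numerically; the transform, samples, and "reconstructed" values below are illustrative:

```python
import math

r = 1 / math.sqrt(2)
U = [[r, r], [r, -r]]            # a unitary (orthonormal) 2x2 transform

def apply(M, v):
    return [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]

s     = [5.0, 2.0]               # original samples
s_hat = [4.8, 2.3]               # reconstructed samples (after quantization)

t, t_hat = apply(U, s), apply(U, s_hat)

d_sample = sum((a - b) ** 2 for a, b in zip(s, s_hat))
d_coeff  = sum((a - b) ** 2 for a, b in zip(t, t_hat))
print(d_sample, d_coeff)         # equal, because U is unitary
```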
Modeling of Distortion due to Coefficient Quantization
High-resolution approximation of scalar quantization:
With the MMSE quantizer, when each coefficient is scalar quantized at a sufficiently high rate, so that the pdf within each quantization bin is approximately flat, we have:

σ_q,k² = ε_k² σ_k² 2^{-2R_k}

where ε_k² depends on the pdf of the k-th coefficient.
Optimal Bit Allocation among Coefficients
How many bits to use for each coefficient? This can be formulated as a constrained optimization problem:
Minimize: D = (1/N) Σ_k ε_k² σ_k² 2^{-2R_k}
Subject to: (1/N) Σ_k R_k = R
The constrained problem can be converted to an unconstrained one using the Lagrange multiplier method:
Minimize: J = D + λ ((1/N) Σ_k R_k - R)
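The resulting allocation rule assigns each coefficient the average rate plus a correction that depends on how its variance compares to the geometric mean. A sketch, with made-up variances and rate budget:

```python
import math

def optimal_bit_allocation(variances, R):
    """R_k = R + 0.5 * log2(var_k / geometric mean of the variances)."""
    n = len(variances)
    geo = math.exp(sum(math.log(v) for v in variances) / n)
    return [R + 0.5 * math.log2(v / geo) for v in variances]

variances = [16.0, 4.0, 2.0, 1.0]         # illustrative coefficient variances
R = 2.0                                   # average bits per coefficient
bits = optimal_bit_allocation(variances, R)
print(bits)                               # larger variance -> more bits
print(sum(bits) / len(bits))              # the average meets the budget R
```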
Derivation & Result
Implication of Optimal Bit Allocation
The bitrate for a coefficient grows with its variance (energy): R_k = R + (1/2) log2( σ_k² / (Π_l σ_l²)^{1/N} ). Larger variance, more bits.
The distortion is equalized among all coefficients & depends on the geometric mean of the coefficient variances.
Transform Coding Gain over PCM
Distortion for PCM (scalar quantization of the samples), if each sample is quantized to R bits: D_PCM = ε² σ_s² 2^{-2R}, where the sample variance σ_s² is the arithmetic mean of the coefficient variances.
Gain over PCM: G_TC = D_PCM / D_TC = (arithmetic mean of the coefficient variances) / (geometric mean of the coefficient variances).
Transform Coding Gain over PCM
For any arbitrary set of values, the arithmetic mean is equal to (or larger than) the geometric mean, so G_TC >= 1.
For Gaussian sources: if each sample is Gaussian, then the coefficients are also Gaussian (& the ε_k² factors are all the same). Then the gain depends only on the coefficient variances.
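A quick numeric check of the gain formula, with illustrative coefficient variances:

```python
import math

def tc_gain(variances):
    """Transform coding gain over PCM: arithmetic mean / geometric mean."""
    n = len(variances)
    arith = sum(variances) / n
    geo = math.exp(sum(math.log(v) for v in variances) / n)
    return arith / geo

print(tc_gain([16.0, 4.0, 2.0, 1.0]))   # > 1: unequal variances, coding gain
print(tc_gain([4.0, 4.0, 4.0, 4.0]))    # == 1: equal variances, no gain
```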
Example
Determine the optimal bit allocation & corresponding TC gain for coding a 2x2 image block, using the 2x2 DCT. Assume that the image is a stationary Gaussian process with inter-sample correlation as shown below.
Example (cnt’d)
Correlation matrix (zero-mean data).
DCT basis images; equivalent transform matrix.
Example (cnt’d)
Covariance matrix; geometric mean.
For R = 2:
Optimal Transform Design
Optimal transform:
Should minimize the distortion for a given average bitrate; equivalent to minimizing the geometric mean of the coefficient variances.
When the source is Gaussian, the optimal transform is the KLT (which depends on the covariance matrix of the samples); it is ensemble dependent.
The basis vectors are the eigenvectors of the covariance matrix, & the k-th coefficient variance is the k-th eigenvalue. Among unitary transforms, the KLT achieves the minimal geometric mean & hence maximizes the TC gain.
Example
Determine the KLT for the 2x2 image block in the previous example.
Determine the eigenvalues by solving det(C - λI) = 0. (Same as the coefficient variances with the DCT!)
Determine the eigenvectors by solving (C - λ_k I) u_k = 0.
The resulting transform is the DCT!
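A sketch of this result for a two-sample source with an assumed correlation rho = 0.9 (using NumPy's eigendecomposition): the KLT basis coincides with the 2-point DCT basis.

```python
import numpy as np

rho = 0.9                                  # assumed inter-sample correlation
C = np.array([[1.0, rho], [rho, 1.0]])     # covariance of two adjacent samples

# KLT basis vectors are the eigenvectors of the covariance matrix;
# the coefficient variances are the eigenvalues.
eigvals, eigvecs = np.linalg.eigh(C)       # eigenvalues in ascending order

print(eigvals)                             # coefficient variances: 1-rho, 1+rho
print(np.abs(eigvecs))                     # |entries| all 1/sqrt(2): 2-point DCT
```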
Data Compression
Original Image → Transform → Quantization → Symbol Encoding → Output Bit Stream
Transform coding system.
(S. Kasaei, 4/14/2008)
JPEG-98 Compression
Original Image → 8×8 Pixel Block → DCT → Scalar Quantizer → Run-Length & Huffman → Output Bitstream
JPEG-98 encoder.
DCT Transform
DCT transform:

T(u,v) = (1/4) C_u C_v Σ_{x=0..7} Σ_{y=0..7} S(x,y) cos((2x+1)uπ/16) cos((2y+1)vπ/16)

C_u, C_v = 1/√2 for u, v = 0; C_u, C_v = 1 otherwise.

S: source block, T: transformed block. T has the same size as S, with real coefficients.
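A direct (unoptimized) implementation of the formula above, as a sketch; the flat test block is illustrative:

```python
import math

def C(k):                       # normalization: 1/sqrt(2) for k == 0, else 1
    return 1 / math.sqrt(2) if k == 0 else 1.0

def dct_8x8(S):
    """2-D 8x8 DCT, following the formula above term by term."""
    T = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            acc = 0.0
            for x in range(8):
                for y in range(8):
                    acc += (S[x][y]
                            * math.cos((2 * x + 1) * u * math.pi / 16)
                            * math.cos((2 * y + 1) * v * math.pi / 16))
            T[u][v] = 0.25 * C(u) * C(v) * acc
    return T

S = [[128] * 8 for _ in range(8)]     # a flat block
T = dct_8x8(S)
print(T[0][0])                        # only the DC coefficient is nonzero: 1024.0
```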
DCT Transform
In matrix T, elements containing high energy levels (with low frequencies) are placed in the upper-left corner.
When going towards the lower-right corner, the energy level goes down (& the frequency goes up).
DCT coefficients: lower frequency in the upper left, higher frequency in the lower right.
Scalar Quantization
Quantization tables used by JPEG (entries are the step sizes).
Scalar Quantization
The quantized matrix has the same size as S & T, with elements:

QT(u,v) = round( T(u,v) / Q(u,v) )

Q: quantization matrix, QT: quantized T.
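The division-and-rounding step can be sketched as follows; the coefficient values and step sizes are illustrative, not taken from a real JPEG table:

```python
# Uniform scalar quantization of DCT coefficients, as in JPEG:
# each coefficient is divided by its step size and rounded.
def quantize(T, Q):
    return [[round(t / q) for t, q in zip(tr, qr)] for tr, qr in zip(T, Q)]

def dequantize(TQ, Q):
    return [[tq * q for tq, q in zip(tr, qr)] for tr, qr in zip(TQ, Q)]

T = [[235.6, -22.6], [-10.1, 3.5]]     # a few illustrative coefficients
Q = [[16, 11], [12, 14]]               # illustrative step sizes
TQ = quantize(T, Q)
print(TQ)            # [[15, -2], [-1, 0]]: small coefficients become zero
```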
Scalar Quantization
Q's coefficients are small in the upper-left corner & large in the opposite corner.
After division & rounding (uniform quantization), most of the coefficients of QT will be zero, except for a few in the upper-left corner.
Quantized coefficients: a few non-zero coefficients in the upper left, zero coefficients elsewhere.
Zigzag Scanning
For maximum efficiency, the coefficients are read & saved in a zigzag order.
Except for the first couple of elements, the rest will be mostly zero.
Zigzag scanning.
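A sketch of generating the zigzag scan order for an n x n block (anti-diagonal traversal with alternating direction):

```python
def zigzag_order(n=8):
    """Zigzag scan indices for an n x n block (low frequencies first)."""
    order = []
    for s in range(2 * n - 1):                 # walk the anti-diagonals
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2 == 0:
            diag.reverse()                     # alternate traversal direction
        order.extend(diag)
    return order

order = zigzag_order(4)
print(order[:6])       # (0,0), (0,1), (1,0), (2,0), (1,1), (0,2)
```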
JPEG-98 Image Encoder
Image data → Offset (128) → DCT → Q (Q-table) → Zigzag scan → VLC (AC entropy table) → compressed image data; the DC coefficient is coded differentially with its own VLC (DC entropy table).
JPEG-98 standard.
IDCT Transform
When the matrix is filled, the image data is reconstructed using the Inverse DCT (IDCT):

S(x,y) = (1/4) Σ_{u=0..7} Σ_{v=0..7} C_u C_v T(u,v) cos((2x+1)uπ/16) cos((2y+1)vπ/16)

C_u, C_v = 1/√2 for u, v = 0; C_u, C_v = 1 otherwise.
JPEG: A bit more details
Uses the 8x8 DCT. Each coefficient is quantized using a uniform quantizer.
The step size varies based on the coefficient's variance & its visual importance (HVS).
Quantized coefficients are converted into binary bit streams using run-length coding plus Huffman coding.
It can be applied on:
Image intensity, when the prediction is not accurate or when it is desired to reset the prediction loop, or
Prediction error.
A quality factor scales the quantization table.
JPEG: A bit more details
Perceptual-based quantization matrix: different tables for luminance & chrominance, with smaller step sizes at lower frequencies. Zigzag ordering of DCT coefficients; run-length coding example; the DC coefficient is coded as a prediction error.
JPEG Image Encoder
The dynamic range of the coefficient value is partitioned into several segments.
For the DC coefficient:
The segment number is Huffman coded.
The relative magnitude is fixed-length coded.
For AC coefficients:
The part consisting of the zero run-length is Huffman coded.
The segment number of the nonzero value is Huffman coded.
The relative magnitude of the nonzero value is fixed-length coded.
JPEG Image Encoder
To further improve the coding efficiency:
Arithmetic coding can be used (instead of Huffman coding).
As the DC values of adjacent blocks are similar, the prediction error of the DC value is quantized & coded.
JPEG-98 vs. JPEG-2000
JPEG-98 at 0.125 bpp (192:1); JPEG-2000 at 0.125 bpp (192:1).
Performance of JPEG-98 & JPEG-2000.
Predictive Image Coding
Encoder: x(n) → Σ (minus the predicted x̂(n)) → e(n) → Quantizer → ẽ(n) → Symbol Encoder → Compressed Data. The predictor input is the reconstructed value x̃(n) = x̂(n) + ẽ(n); the encoder repeats the same process as the decoder.
Decoder: Compressed Data → Symbol Decoder → ẽ(n) → Σ (plus the predicted x̂(n) from the Predictor) → x̃(n) → Decompressed Image.
A closed-loop lossy predictive coding system (DPCM).
Predictive Image Coding
Uses intra-pixel & inter-pixel redundancies among pixels.
Motivation: predict a sample from adjacent pixels in the same frame or in a previous frame, and quantize & code the error only.
If the prediction error is typically small, then it can be represented with a lower average bitrate.
Optimal predictor: minimizes the prediction error.
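A minimal closed-loop DPCM sketch with a previous-sample predictor and a uniform quantizer; the step size and sample values are illustrative:

```python
def dpcm_encode(samples, step=4):
    """Closed-loop DPCM: previous-sample predictor, uniform quantizer."""
    recon_prev = 0              # both sides start from the same state
    codes = []
    for x in samples:
        e = x - recon_prev                    # prediction error
        q = round(e / step)                   # quantized error (transmitted)
        codes.append(q)
        recon_prev = recon_prev + q * step    # encoder tracks the DECODER's value
    return codes

def dpcm_decode(codes, step=4):
    recon_prev, out = 0, []
    for q in codes:
        recon_prev = recon_prev + q * step
        out.append(recon_prev)
    return out

samples = [100, 104, 109, 111, 110]
codes = dpcm_encode(samples)
recon = dpcm_decode(codes)
print(codes)
print(recon)   # each sample is within step/2 of the original
```

Because the encoder predicts from the decoder's reconstruction (not the original samples), the quantization error does not accumulate along the chain.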
DPCM Encoder (Closed-Loop Prediction)
The encoder repeats the same process as the decoder.
Differential pulse-code modulation (DPCM).
Distortion in Predictive Coder
With closed-loop prediction, the reconstruction error of a sample is equal to the quantization error of the prediction error: e_s = s - ŝ = e_q.
The quantization error usually has a nonuniform distribution (zero-mean Laplacian).
Optimal Predictor
Question: which predictor should we use?
Minimize the bitrate for coding the prediction error. Because the quantization error at a given bitrate depends on the variance of the signal, minimizing the quantization error = minimizing the prediction error variance.
We will limit our consideration to linear predictors only: ŝ = Σ_l a_l s_l.
Linear Minimum MSE Predictor
Prediction error: e_p = s - ŝ (a random variable), with variance σ_p² = E{(s - Σ_l a_l s_l)²}.
The optimal coefficients must satisfy: ∂σ_p²/∂a_l = 0.  (1)
Note: Eq. (1) is also known as the orthogonality principle in estimation theory.
Matrix Form
The previous equation can be rewritten in matrix form, with R(k,l) = E{S_k S_l}.
Yule-Walker equation: R a = r.
Optimal solution: a = R^{-1} r.
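A sketch of solving the Yule-Walker equation with NumPy, assuming an AR(1)-like correlation R(k) = rho^|k| with rho = 0.95 (the model and values are illustrative):

```python
import numpy as np

rho = 0.95                                   # assumed inter-sample correlation
p = 2                                        # predictor order
R = np.array([[rho ** abs(i - j) for j in range(p)] for i in range(p)])
r = np.array([rho ** (k + 1) for k in range(p)])

a = np.linalg.solve(R, r)                    # optimal predictor coefficients
print(a)          # for an AR(1) source: a = [rho, 0]

# Prediction-error variance: R(0) - a^T r
var_e = 1.0 - a @ r
print(var_e)      # 1 - rho**2: much smaller than the signal variance
```

For an AR(1) source only the previous sample matters, so the second coefficient comes out zero; a higher predictor order buys nothing.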
Predictive Coding Gain
Predictive coding gain: the ratio of the signal variance to the prediction-error variance of a predictor of order p.  (2)
The predictive coding gain is inversely proportional to the spectral flatness measure.
A signal with a flat spectrum (i.e., white noise) is unpredictable!
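The spectral flatness measure (geometric mean over arithmetic mean of the power spectrum samples) can be sketched as follows; the PSD values are illustrative:

```python
import math

def spectral_flatness(psd):
    """Spectral flatness: geometric mean / arithmetic mean of PSD samples."""
    n = len(psd)
    geo = math.exp(sum(math.log(p) for p in psd) / n)
    arith = sum(psd) / n
    return geo / arith

print(spectral_flatness([1.0, 1.0, 1.0, 1.0]))   # 1.0: flat (white), unpredictable
print(spectral_flatness([8.0, 2.0, 1.0, 0.5]))   # < 1: lowpass, predictable
```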
Predictive Coding Gain
The integral in (2) involves the λ_k (the k-th eigenvalue of the N-th order covariance matrix, as in the KLT).
TC = PC if the block length in TC & the predictor order in PC both go to infinity. For any finite-length data, PC is better (PC of any order uses infinite memory).
2-D Linear Prediction Example
D̂ = a1 C + a2 B + a3 A (pixel D predicted from its neighboring pixels A, B, & C).
Example (cnt’d)
(DPCM is better than TC for this case!)
Predictive Coding for Video
For video, we apply prediction both among pixels in the same frame (intra-prediction or spatial prediction), and also among pixels in adjacent frames (inter-prediction or temporal prediction).
Temporal prediction is efficient only if the underlying scene is stationary.
With moving objects & cameras, the temporal prediction is done using motion compensation.
More on this subject in the next lecture.
Homework 8
Reading assignment:
- Sec. 9.1, 9.2
Written assignment:
- Prob. 9.3, 9.4, 9.5, 9.6, 9.7
Computer assignment:
- Prob. 9.8, 9.9