

SLIDE 1

SLIDE 2

In the name of Allah, the Compassionate, the Merciful

SLIDE 3

Digital Video Processing

  • S. Kasaei

Room: CE 307
Department of Computer Engineering, Sharif University of Technology
E-Mail: skasaei@sharif.edu
Webpage: http://sharif.edu/~skasaei

  • Lab. Website: http://ipl.ce.sharif.edu
SLIDE 4

Acknowledgment

Most of the slides used in this course have been provided by Prof. Yao Wang (Polytechnic University, Brooklyn), based on the book: Video Processing & Communications, written by Yao Wang, Jörn Ostermann, & Ya-Qin Zhang, Prentice Hall, 1st edition, 2001, ISBN: 0130175471. [SUT Code: TK 5105 .2 .W36 2001].

SLIDE 5

Chapter 9

Waveform-Based Coding: Transform & Predictive Coding

SLIDE 6

Outline

Overview of video coding systems

Transform coding

Predictive coding

SLIDE 7

Components in a Coding System

Focus of this lecture.

SLIDE 8

Video Coding

Reduces the data rate of a video sequence by exploiting the spatial & spectral correlation between neighboring pixels, as well as the temporal correlation between consecutive frames of the sequence.

Spatial & spectral correlation is due to the fact that color values of adjacent pixels in the same video frame usually change smoothly.

Temporal correlation refers to the fact that consecutive frames usually show the same physical scene, occupied by the same objects that may have moved.

SLIDE 9

Transform Coding

Motivation:

Represent a vector (e.g., a block of image pels) as the superposition of some typical vectors (basic patterns, transform basis functions).

Quantize & code the coefficients.

Can be thought of as a constrained vector quantizer.

[Figure: the transform decomposes a block into basis patterns weighted by transform coefficients t1, t2, t3, t4.]
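The motivation above can be sketched in a few lines of Python (illustrative, not from the slides; the 4-point orthonormal DCT stands in for a generic basis):

```python
import numpy as np

# Represent a 4-sample block in an orthonormal basis, quantize the
# coefficients, then invert: a minimal transform-coding sketch.
N = 4
k, n = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
basis = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
basis[0, :] /= np.sqrt(2.0)           # rescale row 0 so all rows are orthonormal

block = np.array([10.0, 12.0, 11.0, 13.0])

coeffs = basis @ block                # forward transform: weight of each basis vector
quantized = np.round(coeffs)          # crude scalar quantization of the coefficients
reconstructed = basis.T @ quantized   # inverse transform: superposition of basis vectors

print(np.max(np.abs(reconstructed - block)))   # small: at most 1.0 for this quantizer
```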

SLIDE 10

Transform Coding Block Diagram

SLIDE 11

Which Transform Basis to Use?

The transform should:

Minimize the correlation among resulting coefficients, so that scalar quantization can be employed without losing too much in coding efficiency (compared to vector quantization).

Compact the energy into as few coefficients as possible.

SLIDE 12

Which Transform Basis to Use?

Optimal transform:

Karhunen-Loeve transform (KLT): signal-statistics dependent (ensemble dependent, 2nd-order statistics).

Suboptimal transforms:

Discrete cosine transform (DCT): nearly as good as the KLT for common image signals (highly correlated signals).

Wavelet transform (WT): a powerful multiresolution tool that utilizes space-frequency properties of non-stationary signals.

SLIDE 13

General Linear Transform

Linearly independent basis vectors (or blocks).

Inverse transform represents a vector (or block) as the superposition of basis vectors (or blocks).

Forward transform determines the contribution (weight) of each basis vector.

SLIDE 14

Unitary Transform

Unitary (orthonormal) basis:

Basis vectors are orthogonal to each other & each has length 1.

The transform coefficient associated with a basis vector is simply the projection of the input vector onto the basis vector.

SLIDE 15

Discrete Cosine Transform Basis Images

SLIDE 16

Energy Distribution of DCT Coefficients in Typical Images

SLIDE 17

Images Approximated by Different Numbers of DCT Coefficients

[Figure: original image; with 4/64 coefficients; with 8/64 coefficients; with 16/64 coefficients.]

SLIDE 18

Demos

Use the Matlab demo to demonstrate approximation using different numbers of DCT coefficients (dctdemo.m).
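A rough Python analogue of that demo (a sketch; the block data and names are illustrative): keep only the k largest-magnitude 2-D DCT coefficients of an 8x8 block and compare the approximation errors.

```python
import numpy as np

# Build an orthonormal 1-D DCT-II matrix, take the separable 2-D DCT of a
# smooth toy block, and approximate it from its k largest coefficients.
N = 8
k_idx, n_idx = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
U = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n_idx + 1) * k_idx / (2 * N))
U[0, :] /= np.sqrt(2.0)                     # orthonormal 1-D DCT-II matrix

block = np.outer(np.arange(N), np.ones(N)) * 10.0   # smooth toy "image" block
T = U @ block @ U.T                         # separable 2-D DCT

def approximate(T, k):
    thresh = np.sort(np.abs(T).ravel())[::-1][k - 1]
    keep = np.abs(T) >= thresh              # mask of (at least) the k largest
    return U.T @ (T * keep) @ U

err_4 = np.linalg.norm(block - approximate(T, 4))
err_16 = np.linalg.norm(block - approximate(T, 16))
print(err_4 >= err_16)                      # True: more coefficients, smaller error
```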

SLIDE 19

Distortion in Transform Coding

Distortion in the sample domain & distortion in the coefficient domain: these two distortions are equal when using a unitary transform.
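That equality can be checked numerically (a sketch; the random orthonormal matrix and stepsize are illustrative):

```python
import numpy as np

# For a unitary transform, quantizing the coefficients produces the same
# total squared error in the coefficient domain and in the sample domain.
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.normal(size=(8, 8)))   # a random orthonormal matrix

s = rng.normal(size=8)            # original block
t = U @ s                         # transform coefficients
t_hat = np.round(t * 4) / 4       # quantized coefficients (stepsize 0.25)
s_hat = U.T @ t_hat               # reconstructed block

d_coeff = np.sum((t - t_hat) ** 2)     # distortion in the coefficient domain
d_sample = np.sum((s - s_hat) ** 2)    # distortion in the sample domain
print(abs(d_coeff - d_sample) < 1e-12) # True: the two distortions coincide
```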

SLIDE 20

Modeling of Distortion due to Coefficient Quantization

High-resolution approximation of scalar quantization:

With the MMSE quantizer, when each coefficient is scalar quantized with sufficiently high rate, so that the pdf in each quantization bin is approximately flat, we have:

$$\sigma_{q,k}^2 = \epsilon_k^2 \, 2^{-2R_k} \, \sigma_k^2$$

($\epsilon_k^2$ depends on the pdf of the k-th coefficient.)

SLIDE 21

Optimal Bit Allocation among Coefficients

How many bits to use for each coefficient? Can be formulated as a constrained optimization problem:

Minimize: $D = \frac{1}{N}\sum_k \sigma_{q,k}^2$

Subject to: $\frac{1}{N}\sum_k R_k = R$

The constrained problem can be converted to an unconstrained one using the Lagrange multiplier method.

SLIDE 22

Derivation & Result

SLIDE 23

Implication of Optimal Bit Allocation

Bitrate for a coefficient is proportional to (the log of) its variance (energy): larger variance, more bits:

$$R_k = R + \frac{1}{2}\log_2 \frac{\sigma_k^2}{\left(\prod_l \sigma_l^2\right)^{1/N}}$$

Distortion is equalized among all coefficients & depends on the geometric mean of the coefficient variances.
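The allocation rule above can be sketched directly (the variances and target rate are illustrative numbers, not from the slides):

```python
import numpy as np

# High-rate optimal bit allocation:
# R_k = R + 0.5 * log2(var_k / geometric_mean(var)), so the rates average to R.
variances = np.array([100.0, 25.0, 4.0, 1.0])
R = 2.0                                        # target average bits/coefficient

geo_mean = np.exp(np.mean(np.log(variances)))  # geometric mean (here: 10)
R_k = R + 0.5 * np.log2(variances / geo_mean)

print(np.round(R_k, 2))   # larger variance gets more bits
print(np.mean(R_k))       # ~2.0: the average rate equals the target R
```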

SLIDE 24

Transform Coding Gain over PCM

Distortion for PCM (SQ of samples), if each sample is quantized to R bits:

$$\sigma_{PCM}^2 = \epsilon^2 \, 2^{-2R} \, \sigma_s^2$$

Gain over PCM: the ratio of the arithmetic mean to the geometric mean of the coefficient variances:

$$G_{TC} = \frac{\frac{1}{N}\sum_k \sigma_k^2}{\left(\prod_k \sigma_k^2\right)^{1/N}}$$

SLIDE 25

Transform Coding Gain over PCM

For any arbitrary set of values, the arithmetic mean is equal to (or larger than) the geometric mean (G_TC >= 1).

For Gaussian sources: if each sample is Gaussian, then the coefficients are also Gaussian (& all $\epsilon_k^2$ are the same).
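A tiny numeric check of the gain formula (illustrative variances):

```python
import numpy as np

# G_TC = arithmetic mean / geometric mean of the coefficient variances;
# by the AM-GM inequality this is >= 1, with equality for equal variances.
def tc_gain(variances):
    v = np.asarray(variances, dtype=float)
    return np.mean(v) / np.exp(np.mean(np.log(v)))

print(round(tc_gain([100.0, 25.0, 4.0, 1.0]), 4))   # 3.25: unequal variances pay off
print(round(tc_gain([5.0, 5.0, 5.0, 5.0]), 4))      # 1.0: nothing to gain
```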

SLIDE 26

Example

Determine the optimal bit allocation & the corresponding TC gain for coding a 2x2 image block, using the 2x2 DCT. Assume that the image is a stationary Gaussian process with inter-sample correlation as shown below.

SLIDE 27

Example (cnt'd)

Correlation matrix (zero-mean data). DCT basis images. Equivalent transform matrix.

SLIDE 28

Example (cnt'd)

  • covar. matrix
  • geo. mean

For R=2:

SLIDE 29

Optimal Transform Design

Optimal transform: should minimize the distortion for a given average bitrate. Equivalent to minimizing the geometric mean of the coefficient variances.

When the source is Gaussian, the optimal transform is the KLT (which depends on the covariance matrix of the samples); it is ensemble dependent.

Basis vectors are the eigenvectors of the covariance matrix; the coefficient variances (the k-th coefficient variance) are the eigenvalues. The KLT is unitary & yields the minimal geometric mean, which maximizes the TC gain.
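The eigenvector construction above can be sketched numerically (the covariance values are illustrative):

```python
import numpy as np

# KLT sketch: the basis vectors are the eigenvectors of the covariance matrix,
# and the coefficient variances are its eigenvalues.
C = np.array([[1.0, 0.9],
              [0.9, 1.0]])                 # covariance of two adjacent samples

eigvals, eigvecs = np.linalg.eigh(C)       # columns of eigvecs = KLT basis
coeff_cov = eigvecs.T @ C @ eigvecs        # covariance of the KLT coefficients

print(np.round(coeff_cov, 6))   # diagonal: the KLT decorrelates the samples
print(np.round(eigvals, 6))     # the coefficient variances, here 0.1 and 1.9
```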

SLIDE 30

Example

Determine the KLT for the 2x2 image block in the previous example.

Determine the eigenvalues by solving: $\det(\mathbf{C} - \lambda\mathbf{I}) = 0$. (Same as the coefficient variances with DCT!)

Determine the eigenvectors by solving: $(\mathbf{C} - \lambda_k\mathbf{I})\mathbf{u}_k = 0$.

The resulting transform is the DCT!

SLIDE 31

Data Compression

[Block diagram: Original Image → Transform → Quantization → Symbol Encoding → Output Bit Stream.]

Transform coding system.

4/14/2008, S. Kasaei

SLIDE 32

JPEG-98 Compression

[Block diagram: Original Image → 8×8 Pixel Block → DCT → Scalar Quantizer → Run-Length & Huffman → Output Bitstream.]

JPEG-98 encoder.

SLIDE 33

DCT Transform

  • DCT transform:

$$T_{u,v} = \frac{C_u C_v}{4} \sum_{x=0}^{7} \sum_{y=0}^{7} S_{x,y} \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16}$$

$$C_u, C_v = \begin{cases} 1/\sqrt{2}, & u, v = 0 \\ 1, & \text{otherwise} \end{cases}$$

S: source block, T: transformed block. T has the same size as S, with real coefficients.
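The formulas above can be implemented directly (a brute-force sketch assuming the standard JPEG normalization; a real codec would use a fast factorization):

```python
import numpy as np

# Direct 8x8 DCT/IDCT pair, transcribed from the slide's formulas.
def dct2_8x8(S):
    C = np.array([1 / np.sqrt(2)] + [1.0] * 7)   # C_u: 1/sqrt(2) for u=0, else 1
    T = np.zeros((8, 8))
    for u in range(8):
        for v in range(8):
            acc = 0.0
            for x in range(8):
                for y in range(8):
                    acc += (S[x, y]
                            * np.cos((2 * x + 1) * u * np.pi / 16)
                            * np.cos((2 * y + 1) * v * np.pi / 16))
            T[u, v] = C[u] * C[v] / 4 * acc
    return T

def idct2_8x8(T):
    C = np.array([1 / np.sqrt(2)] + [1.0] * 7)
    S = np.zeros((8, 8))
    for x in range(8):
        for y in range(8):
            acc = 0.0
            for u in range(8):
                for v in range(8):
                    acc += (C[u] * C[v] * T[u, v]
                            * np.cos((2 * x + 1) * u * np.pi / 16)
                            * np.cos((2 * y + 1) * v * np.pi / 16))
            S[x, y] = acc / 4
    return S

block = np.arange(64, dtype=float).reshape(8, 8)
T = dct2_8x8(block)
print(np.max(np.abs(idct2_8x8(T) - block)))   # ~0: perfect reconstruction
```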

SLIDE 34

DCT Transform

In matrix T, elements containing high energy levels (with low frequencies) are placed in the upper left corner.

Going towards the lower right corner, the energy level goes down (& the frequency goes up).

DCT coefficients.

SLIDE 35

Scalar Quantization

Quantization tables used by JPEG (entries are the stepsizes).

SLIDE 36

Scalar Quantization

The quantized matrix QT has the same size as S & T, with elements:

$$QT_{u,v} = \mathrm{round}\left(\frac{T_{u,v}}{Q_{u,v}}\right)$$

Q: quantization matrix, QT: quantized T.
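A sketch of this quantize/dequantize step (the Q matrix below is illustrative only, not an actual JPEG table; like JPEG's tables, its stepsizes grow with frequency):

```python
import numpy as np

# Uniform scalar quantization of DCT coefficients: divide by the per-entry
# stepsize, round, and multiply back to dequantize.
u, v = np.meshgrid(np.arange(8), np.arange(8), indexing="ij")
Q = 8.0 + 4.0 * (u + v)                    # small steps at low frequencies

rng = np.random.default_rng(2)
T = 100.0 * rng.standard_normal((8, 8)) / (1 + u + v)   # toy energy compaction

QT = np.round(T / Q)                       # quantize: divide by stepsize, round
T_hat = QT * Q                             # dequantize

print(np.all(np.abs(T - T_hat) <= 0.5 * Q + 1e-9))   # True: error <= step/2
```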

SLIDE 37

Scalar Quantization

Q's coefficients are small in the upper left corner & large in the opposite corner.

After division & rounding (uniform quantization), most of the coefficients of QT will be zero, except for a few in the upper left corner.

Quantized coefficients.

SLIDE 38

Zigzag Scanning

For maximum efficiency, coefficients are read & saved in a zigzag way.

Except for the first couple of elements, the rest will be mostly zero.

Zigzag scanning.
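The zigzag order can be generated by walking anti-diagonals and reversing direction on alternating ones (a sketch consistent with the pattern JPEG uses):

```python
# Generate the zigzag scan order for an n x n block of coefficients.
def zigzag_order(n=8):
    order = []
    for d in range(2 * n - 1):
        diag = [(i, d - i) for i in range(n) if 0 <= d - i < n]
        if d % 2 == 0:
            diag.reverse()   # even diagonals run bottom-left -> top-right
        order.extend(diag)
    return order

order = zigzag_order()
print(order[:6])   # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```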

SLIDE 39

JPEG-98 Image Encoder

[Block diagram: image → Offset (128) → DCT → Q (Q-table) → AC coefficients: zigzag scan → VLC (entropy table); DC coefficient: differential coding → VLC (entropy table) → compressed image data.]

JPEG-98 standard.

SLIDE 40

IDCT Transform

When the matrix is filled, the image data is reconstructed using the Inverse DCT (IDCT):

$$S_{x,y} = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C_u C_v T_{u,v} \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16}$$

$$C_u, C_v = \begin{cases} 1/\sqrt{2}, & u, v = 0 \\ 1, & \text{otherwise} \end{cases}$$

SLIDE 41

JPEG: A bit more details

Uses 8x8 DCT. Each coefficient is quantized using a uniform quantizer.

The step size varies based on the coefficient's variance & its visual importance (HVS).

Quantized coefficients are converted into binary bit streams using run-length coding plus Huffman coding.

It can be applied on:

  • Image intensity, when the prediction is not accurate or when it is desired to reset the prediction loop, or
  • Prediction error.

Quality factor scales the quantization table.

SLIDE 42

JPEG: A bit more details

[Figures: perceptual-based quantization matrix (smaller stepsizes at lower frequencies; different tables for luminance & chrominance); zig-zag ordering of DCT coefficients; run-length coding example; DC prediction error.]

SLIDE 43

JPEG Image Encoder

The dynamic range of the coefficient value is partitioned into several segments.

For the DC coefficient:

  • Segment number is Huffman coded.
  • Relative magnitude is fixed-length coded.

For AC coefficients:

  • The zero run-length part is Huffman coded.
  • Segment number of the nonzero value is Huffman coded.
  • Relative magnitude of the nonzero value is fixed-length coded.

SLIDE 44

JPEG Image Encoder

To further improve the coding efficiency:

  • Arithmetic coding can be used (instead of Huffman coding).
  • As the DC values of adjacent blocks are similar, the predicted DC error is quantized & coded.

SLIDE 45

JPEG-98 VS. JPEG-2000

[Figure: JPEG-98 at 0.125 bpp (192:1) vs. JPEG-2000 at 0.125 bpp (192:1).]

Performance of JPEG-98 & JPEG-2000.

SLIDE 46

Predictive Image Coding

[Block diagram: closed-loop DPCM. Encoder: input x(n) minus prediction x̂(n) gives error e(n); e(n) is quantized & symbol-encoded into the compressed data, and the quantized error is added back to x̂(n) inside the prediction loop, so the encoder repeats the same process as the decoder. Decoder: symbol decoder output plus the predictor output reconstructs the decompressed image.]

A closed-loop lossy predictive coding system (DPCM).

SLIDE 47

Predictive Image Coding

Uses intra-pixel & inter-pixel redundancies among pixels.

Motivation: predicts a sample from adjacent pixels in the same frame or in a previous frame, and quantizes & codes the error only.

If the prediction error is typically small, then it can be represented with a lower average bitrate.

Optimal predictor: minimizes the prediction error.

SLIDE 48

DPCM Encoder (Closed-Loop Prediction)

Encoder repeats the same process as the decoder.

Differential pulse code modulation (DPCM).
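The closed loop can be sketched for a 1-D signal with a previous-sample predictor (all parameters illustrative):

```python
# Closed-loop DPCM: the encoder predicts from the previously *reconstructed*
# sample, so its loop mirrors the decoder's exactly.
STEP = 4.0

def dpcm_encode(x, step=STEP):
    prev = 0.0                    # predictor state; decoder starts identically
    symbols = []
    for sample in x:
        e = sample - prev         # prediction error
        q = round(e / step)       # quantize the error, not the sample
        symbols.append(q)
        prev = prev + q * step    # update with the quantized error (closed loop)
    return symbols

def dpcm_decode(symbols, step=STEP):
    prev, out = 0.0, []
    for q in symbols:
        prev = prev + q * step
        out.append(prev)
    return out

x = [10.0, 12.0, 15.0, 14.0, 13.0]
rec = dpcm_decode(dpcm_encode(x))
print(max(abs(r - s) for r, s in zip(rec, x)))   # 2.0: bounded by step/2
```

Because the loop is closed, the reconstruction error stays bounded by half the stepsize instead of accumulating from sample to sample.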

SLIDE 49

Distortion in Predictive Coder

With closed-loop prediction, the reconstruction error of a sample is equal to the quantization error of the prediction error:

$$e_s = s - \hat{s} = e_q$$

The error usually has a nonuniform distribution (zero-mean Laplacian).

SLIDE 50

Optimal Predictor

Question: Which predictor should we use?

Minimize the bitrate for coding the prediction error. Because the quantization error at a given bitrate depends on the variance of the signal, minimizing the quantization error = minimizing the prediction error variance.

We will limit our consideration to linear predictors only:

$$\hat{s} = \sum_{l=1}^{p} a_l s_l$$

SLIDE 51

Linear Minimum MSE Predictor

Prediction error variance: $\sigma_p^2 = E\{(s - \hat{s})^2\}$.

Optimal coefficients must satisfy:

$$\frac{\partial \sigma_p^2}{\partial a_l} = 0, \quad l = 1, \ldots, p \qquad (1)$$

Note: Eq. (1) is also known as the orthogonality principle in estimation theory.

SLIDE 52

Matrix Form

The previous equation can be rewritten using the correlations $R(k, l) = E\{S_k S_l\}$.

Optimal solution (Yule-Walker equation):

$$\sum_{l=1}^{p} a_l R(k, l) = R(k, 0), \quad k = 1, \ldots, p$$
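Solving these normal equations is a small linear-algebra exercise (a sketch; the AR(1)-style autocorrelation model is illustrative):

```python
import numpy as np

# For a source with autocorrelation R(k) = rho^|k| (an AR(1) model), the
# optimal order-2 predictor is a1 = rho, a2 = 0: one past sample carries
# all of the usable memory.
rho = 0.9
p = 2
R = np.array([rho ** k for k in range(p + 1)])     # R(0), R(1), R(2)

A = np.array([[R[abs(i - j)] for j in range(p)] for i in range(p)])
b = R[1:p + 1]                                     # right-hand side R(k), k=1..p
a = np.linalg.solve(A, b)                          # optimal coefficients

print(np.round(a, 6))    # close to [0.9, 0]: matches the AR(1) intuition
```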

SLIDE 53

Predictive Coding Gain

The predictive coding gain (Eq. 2) relates the prediction error variance of a predictor of order p to the signal variance through the spectral flatness measure.

Thus, the predictive coding gain is inversely proportional to the spectral flatness.

A signal with a flat spectrum (i.e., white noise) is unpredictable!

SLIDE 54

Predictive Coding Gain

The integral in (2) involves the k-th eigenvalue of the N-th order covariance matrix (KLT).

TC = PC, if the block length in TC & the predictor order in PC both go to infinity. For any finite-length data, PC is better (a PC of any order uses infinite memory).

SLIDE 55

2-D Linear Prediction Example

$$\hat{D} = a_1 A + a_2 B + a_3 C$$

SLIDE 56

Example (cnt'd)

(DPCM is better than TC for this case!)

SLIDE 57

Predictive Coding for Video

For video, we apply prediction both among pixels in the same frame (intra-prediction or spatial prediction), and also among pixels in adjacent frames (inter-prediction or temporal prediction).

Temporal prediction is efficient only if the underlying scene is stationary.

With moving objects & cameras, temporal prediction is done using motion compensation.

More on this subject in the next lecture.

SLIDE 58

Homework 8

Reading assignment:

  • Sec. 9.1, 9.2

Written assignment:

  • Prob. 9.3, 9.4, 9.5, 9.6, 9.7

Computer assignment:

  • Prob. 9.8, 9.9

SLIDE 59

The End