Transform Coding - Overview Principle of block-wise transform coding - - PowerPoint PPT Presentation



SLIDE 1

Girod: Image and Video Compression

Transform Coding - Overview

Principle of block-wise transform coding
Properties of orthonormal transforms
Discrete cosine transform (DCT)
Bit allocation for transform coefficients
Threshold coding
Typical coding artifacts
Fast implementation of the DCT

SLIDE 2

Transform Coding

[Block diagram] original image -> transform A -> quantization & transmission -> inverse transform A^{-1} -> reconstructed image

Per block: original image block -> transform coefficients -> quantized transform coefficients -> reconstructed block

SLIDE 3

Properties of Orthonormal Transforms

Forward transform:

$$\vec{y} = A \vec{x}$$

where $\vec{x}$ is the input signal block of size $N \times N$, arranged as a vector, $\vec{y}$ holds the $N \times N$ transform coefficients, arranged as a vector, and $A$ is the transform matrix of size $N^2 \times N^2$.

Inverse transform:

$$\vec{x} = A^{-1} \vec{y} = A^{T} \vec{y}$$

Linearity: $\vec{x}$ is represented as a linear combination of "basis functions".

Parseval's theorem holds: the transform is a rotation of the signal vector around the origin of an $N^2$-dimensional vector space.
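The properties above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the orthonormal matrix is a random one obtained from a QR decomposition (chosen only for illustration, it is not a transform used in any codec), and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4  # block is N x N, so the vectors have N*N entries

# Assumption: any orthonormal matrix serves for the demonstration.
# Here one is taken from the QR decomposition of a random matrix.
A, _ = np.linalg.qr(rng.standard_normal((N * N, N * N)))

x = rng.standard_normal(N * N)  # signal block arranged as a vector
y = A @ x                       # forward transform  y = A x
x_rec = A.T @ y                 # inverse transform  x = A^T y

# Parseval: the energy of the block is the same in both domains.
energy_in = float(np.sum(x ** 2))
energy_out = float(np.sum(y ** 2))

print("A^T A = I:", np.allclose(A.T @ A, np.eye(N * N)))
print("perfect reconstruction:", np.allclose(x_rec, x))
print("energy preserved:", np.isclose(energy_in, energy_out))
```

Because the inverse is just the transpose, no matrix inversion is ever needed at the decoder.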

SLIDE 4

Separable Orthonormal Transforms, I

An orthonormal transform is separable if the transform of a signal block of size $N \times N$ can be expressed by

$$y = A x A^{T}$$

where $y$ holds the $N \times N$ transform coefficients, $x$ is the $N \times N$ block of the input signal, and $A$ is an orthonormal transform matrix of size $N \times N$.

Note: the corresponding $N^2 \times N^2$ transform matrix acting on the vectorized block is the Kronecker product $A \otimes A$.

The inverse transform is

$$x = A^{T} y A$$

Great practical importance: the transform requires 2 matrix multiplications of size $N \times N$ instead of one multiplication of a vector of size $1 \times N^2$ with a matrix of size $N^2 \times N^2$. This reduces the complexity from $O(N^4)$ to $O(N^3)$.
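The equivalence between the separable form and the Kronecker-product form can be verified with a short sketch. It assumes NumPy and uses the orthonormal DCT-II matrix merely as an example of an orthonormal A; `dct_matrix` is a hypothetical helper name. The block is vectorized row by row, which is the ordering for which the Kronecker factor order A ⊗ A applies directly.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix, used here only as an example of an orthonormal A."""
    i = np.arange(n).reshape(-1, 1)   # row index (frequency)
    k = np.arange(n).reshape(1, -1)   # column index (sample position)
    a = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k + 1) * i / (2 * n))
    a[0, :] = np.sqrt(1.0 / n)        # first row uses the smaller scale factor
    return a

N = 8
A = dct_matrix(N)
x = np.random.default_rng(1).standard_normal((N, N))

# Separable form: two N x N matrix products, O(N^3) operations.
y_sep = A @ x @ A.T

# Vectorized form: one N^2 x N^2 matrix times a vector, O(N^4) operations.
y_full = (np.kron(A, A) @ x.reshape(-1)).reshape(N, N)

# Inverse of the separable form.
x_rec = A.T @ y_sep @ A

print(np.allclose(y_sep, y_full), np.allclose(x_rec, x))
```

For N = 8 the full matrix already has 4096 entries, against 64 for A, which illustrates why the separable form is the one used in practice.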

SLIDE 5

Separable Orthonormal Transforms, II

2D transform realized by 2 one-dimensional transforms (along rows and columns of the signal block):

N×N block of pixels $x$ -> column-wise N-transform -> $A x$ -> row-wise N-transform -> $A x A^{T}$ (N×N block of transform coefficients)

SLIDE 6

Criteria for the Selection of a Particular Transform

Decorrelation, energy concentration (e.g., KLT, DCT, ...)
Visually pleasant basis functions (e.g., pseudo-random noise, m-sequences, lapped transforms)
Low complexity of computation

SLIDE 7

Karhunen Loève Transform (KLT)

Basis functions are eigenvectors of the covariance matrix of the input signal.
The KLT yields decorrelated transform coefficients.
The KLT achieves optimum energy concentration.
Disadvantages:
  The KLT depends on the signal statistics.
  The KLT is not separable for image blocks.
  The transform matrix cannot be factored into sparse matrices.
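As an illustration of "basis functions are eigenvectors of the covariance matrix", the sketch below builds a KLT for a 1D block. It assumes NumPy and, purely for illustration, a first-order autoregressive signal model with correlation coefficient 0.95; that model and all names are assumptions, not taken from the slides.

```python
import numpy as np

N = 8
rho = 0.95  # assumed correlation between neighbouring samples (illustrative)

# Covariance matrix of the assumed model with unit variance: C[m, n] = rho^|m - n|
idx = np.arange(N)
C = rho ** np.abs(idx[:, None] - idx[None, :])

# Eigen-decomposition of the symmetric covariance matrix.
eigvals, eigvecs = np.linalg.eigh(C)

# Sort basis functions by decreasing eigenvalue (coefficient variance).
order = np.argsort(eigvals)[::-1]
A_klt = eigvecs[:, order].T          # rows are the KLT basis functions

# Covariance of the transform coefficients: A C A^T should be diagonal.
C_y = A_klt @ C @ A_klt.T
off_diag = C_y - np.diag(np.diag(C_y))

print("largest off-diagonal magnitude:", np.max(np.abs(off_diag)))
print("coefficient variances:", np.round(np.diag(C_y), 4))
```

The diagonal of `C_y` holds the eigenvalues, so the coefficients are uncorrelated and most of the variance sits in the first few of them. The matrix has to be recomputed whenever the statistics change, which is the first disadvantage listed above.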

SLIDE 8

Comparison of Various Transforms, I

Comparison of 1D basis functions for block size N=8:

Haar transform (1910)
Walsh-Hadamard transform (1923)
Slant transform (Enomoto, Shibata, 1971)
Discrete Cosine Transform (DCT) (Ahmed, Natarajan, Rao, 1974)
Karhunen Loève transform (1948/1960)

SLIDE 9

Comparison of Various Transforms, II

Energy concentration measured for typical natural images, block size 1x32 (Lohscheller):

[Figure]

KLT is optimum.
DCT performs only slightly worse than KLT.

SLIDE 10

Discrete Cosine Transform and Discrete Fourier Transform

Transform coding of images using the Discrete Fourier Transform (DFT):
For stationary image statistics, the energy concentration properties of the DFT converge to those of the KLT for large block sizes.
Problem of blockwise DFT coding: blocking effects due to the circular topology of the DFT and Gibbs phenomena.
Remedy: reflect the image at block boundaries and take the DFT of the larger symmetric block -> "DCT"

[Figure: two ways of reflecting the block: edge folded, pixel folded]
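The remedy can be tried out in one dimension. The sketch below assumes NumPy and the kind of reflection in which the boundary sample is repeated (half-sample symmetry), which is the variant that leads to the Type II DCT; the function names are hypothetical, and the test vector is the first row of the example block shown later in these slides.

```python
import numpy as np

def dct2_via_fft(x):
    """DCT-II of x computed from the DFT of the mirrored, double-length block."""
    n = len(x)
    mirrored = np.concatenate([x, x[::-1]])        # x0..x(n-1), x(n-1)..x0
    spectrum = np.fft.fft(mirrored)[:n]            # only the first n bins are needed
    k = np.arange(n)
    # A half-sample phase shift makes the spectrum of the symmetric block real.
    unscaled = 0.5 * np.real(np.exp(-1j * np.pi * k / (2 * n)) * spectrum)
    alpha = np.full(n, np.sqrt(2.0 / n))
    alpha[0] = np.sqrt(1.0 / n)
    return alpha * unscaled

def dct2_direct(x):
    """Reference: orthonormal DCT-II by direct matrix multiplication."""
    n = len(x)
    i = np.arange(n).reshape(-1, 1)
    k = np.arange(n).reshape(1, -1)
    a = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k + 1) * i / (2 * n))
    a[0, :] = np.sqrt(1.0 / n)
    return a @ x

x = np.array([201.0, 195.0, 188.0, 193.0, 169.0, 157.0, 196.0, 190.0])
print(np.allclose(dct2_via_fft(x), dct2_direct(x)))
```

The mirrored block has no jump at its periodic boundary, which is why the artificial discontinuity of the plain blockwise DFT disappears.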

SLIDE 11

DCT

The Type II DCT of block size $M \times M$ is defined by the transform matrix $A$ containing the elements

$$a_{ik} = \alpha_i \cos\frac{\pi (2k+1)\, i}{2M}, \qquad i, k = 0, \ldots, M-1$$

with

$$\alpha_0 = \sqrt{\frac{1}{M}}, \qquad \alpha_i = \sqrt{\frac{2}{M}} \quad \forall\, i \neq 0$$

[Figure: 2D basis functions of the DCT]
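A direct transcription of this definition into code might look as follows. This is a sketch assuming NumPy; `dct_matrix` and `basis_function_2d` are hypothetical names, and the 2D basis functions are formed as outer products of two rows of A, which is what the separable form y = A x A^T implies.

```python
import numpy as np

def dct_matrix(m):
    """Type II DCT matrix with elements a_ik = alpha_i * cos(pi * (2k + 1) * i / (2m))."""
    i = np.arange(m).reshape(-1, 1)
    k = np.arange(m).reshape(1, -1)
    alpha = np.full((m, 1), np.sqrt(2.0 / m))
    alpha[0, 0] = np.sqrt(1.0 / m)
    return alpha * np.cos(np.pi * (2 * k + 1) * i / (2 * m))

def basis_function_2d(a, i, j):
    """2D basis function belonging to coefficient (i, j) of y = A x A^T."""
    return np.outer(a[i], a[j])

M = 8
A = dct_matrix(M)

print("orthonormal:", np.allclose(A @ A.T, np.eye(M)))
print("DC basis function is constant:", np.allclose(basis_function_2d(A, 0, 0), 1.0 / M))
```

A block x is then the sum over all (i, j) of y[i, j] times `basis_function_2d(A, i, j)`.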

SLIDE 12

Bit Allocation for Transform Coefficients I

Problem: divide the bit-rate $R$ among the $M \times M$ transform coefficients $i$ such that the resulting distortion $D$ is minimized.

Assumptions:

$$R = \sum_i R_i, \qquad D = \sum_i D_i$$

where $R$ is the total rate, $R_i$ the rate for coefficient $i$, $D$ the total distortion, and $D_i$ the distortion contributed by coefficient $i$.

These lead to the "Pareto condition"

$$\frac{\partial D_i}{\partial R_i} = \frac{\partial D_j}{\partial R_j} \quad \text{for all } i, j$$

SLIDE 13

Bit Allocation for Transform Coefficients II

Additional assumptions "Gaussian r.v." and mse distortion yield the optimum rate for each transform coefficient $i$:

$$R_i = \max\left[\frac{1}{2}\log_2\frac{\sigma_i^2}{D},\ 0\right] \text{ bit}$$

where $\sigma_i^2$ is the variance of transform coefficient $i$ and $D$ is the maximum acceptable mean squared error.

The literature contains many practical bit allocation schemes that are based on this insight.
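The rate formula is easy to evaluate. The sketch below assumes NumPy; the variances and the distortion target are made-up numbers for illustration only, and `allocate_bits` is a hypothetical name.

```python
import numpy as np

def allocate_bits(variances, d):
    """R_i = max(0.5 * log2(sigma_i^2 / D), 0) for each coefficient variance."""
    variances = np.asarray(variances, dtype=float)
    return np.maximum(0.5 * np.log2(variances / d), 0.0)

# Illustrative coefficient variances and maximum acceptable mean squared error.
variances = np.array([16.0, 4.0, 1.0, 0.25])
rates = allocate_bits(variances, d=1.0)

print(rates)        # bits per coefficient
print(rates.sum())  # total rate for the block
```

With D = 1 the four coefficients receive 2, 1, 0 and 0 bits. Coefficients whose variance does not exceed D get no bits at all, and each factor of 4 in variance is worth one additional bit.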

SLIDE 14

Amplitude Distribution of the DCT Coefficients

Histograms for 8x8 DCT coefficient amplitudes measured for natural images (from Mauersberger):

[Figure]

The DC coefficient is typically uniformly distributed.
For the other coefficients, the distribution resembles a Laplacian pdf.

SLIDE 15

Threshold Coding, I

Transform coefficients that fall below a threshold are discarded.
Implementation by a uniform quantizer with threshold characteristic:

[Figure: quantizer characteristic, quantizer output versus quantizer input]

Positions of non-zero transform coefficients are transmitted in addition to their amplitude values.
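One way to realize such a quantizer is sketched below. The slide does not give the exact characteristic, so the rule used here (zero below the threshold, otherwise rounding to the nearest multiple of the step size) is an assumption, as are the function names and the sample values; NumPy is assumed.

```python
import numpy as np

def threshold_quantize(coeffs, step, threshold):
    """Uniform quantizer whose output is forced to zero for |c| < threshold (assumed rule)."""
    coeffs = np.asarray(coeffs, dtype=float)
    magnitude = np.floor(np.abs(coeffs) / step + 0.5)  # round half up on the magnitude
    index = np.sign(coeffs) * magnitude
    index[np.abs(coeffs) < threshold] = 0              # discard small coefficients
    return index.astype(int)

def dequantize(index, step):
    """Reconstruction levels are integer multiples of the step size."""
    return np.asarray(index) * step

coeffs = np.array([-40.0, -7.0, 3.0, 9.0, 25.0])
index = threshold_quantize(coeffs, step=16.0, threshold=8.0)
recon = dequantize(index, step=16.0)

print(index)  # quantizer indices that would be entropy coded
print(recon)  # values the decoder reconstructs
```

With these numbers the indices come out as -3, 0, 0, 1, 2. Only the non-zero indices and their positions need to be transmitted.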

SLIDE 16

Threshold Coding, II

Efficient encoding of the position of non-zero transform coefficients: zig-zag-scan + run-level-coding

[Figure: ordering of the transform coefficients by zig-zag-scan]

SLIDE 17

Threshold Coding, III

Original 8x8 block:

201 195 188 193 169 157 196 190
193 188 187 201 195 193 213 193
184 192 180 195 182 151 199 193
176 172 179 179 152 148 198 183
196 195 169 171 159 185 218 175
214 213 205 170 173 185 206 150
207 205 207 184 180 167 173 160
198 203 205 186 196 149 159 163

DCT -> transform coefficients:

1480   49   33  -15  -14   33  -38   20
  10  -52   11  -12   16   17  -13  -12
  19   32  -22  -10   22  -20    9    8
  16   10   17   27  -31   12    6   -5
 -30   -6   13  -12    8    4   -3   -3
 -25   16    6  -24    9    3    3    3
  -2   17    4   -6    0   -4   -9    8
   1   -2    6    0    7   -5   -8   -7

Q -> quantized transform coefficients:

185   3   1   1  -3   2  -1   0
  1   1  -1   0  -1   0   0   1
  0   0   1   0  -1   0   0   0
  1   1   0  -1   0   0   0  -1
  0   0   1   0   0   0  -1   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0

zig-zag-scan ->

(185 3 1 0 1 1 1 -1 0 1 0 1 1 0 -3 2 -1 0 0 0 0 0 0 1 -1 -1 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 -1 -1 EOB)

run-level-coding ->

Mean of block: 185 (0,3) (0,1) (1,1) (0,1) (0,1) (0,-1) (1,1) (1,1) (0,1) (1,-3) (0,2) (0,-1) (6,1) (0,-1) (0,-1) (1,-1) (14,1) (9,-1) (0,-1) (EOB)

transmission -> run-level-decoding -> inverse zig-zag-scan -> the quantized coefficient block above

scaling and inverse DCT -> reconstructed 8x8 block:

196 193 187 192 179 176 196 189
198 188 182 198 196 192 208 200
185 189 191 197 174 159 184 189
167 181 182 177 154 153 187 189
201 199 178 165 163 185 206 179
220 217 193 176 165 179 197 170
194 198 195 193 169 156 180 179
210 196 192 209 185 149 157 160
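The zig-zag scan and run-level coding steps of this example can be reproduced with a short sketch. It assumes NumPy and the conventional zig-zag order that starts at the DC coefficient and moves first to its right-hand neighbour; each pair is read as (number of zeros skipped, value of the next non-zero coefficient). The function names are hypothetical; the block is the quantized coefficient block from the example.

```python
import numpy as np

def zigzag_indices(n):
    """(row, col) pairs of an n x n block in zig-zag order, anti-diagonal by anti-diagonal."""
    order = []
    for s in range(2 * n - 1):                       # s = row + col
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:                               # even diagonals run bottom-left to top-right
            rows = reversed(rows)
        order.extend((i, s - i) for i in rows)
    return order

def run_level_encode(ac):
    """(run, level) pairs for the AC coefficients; trailing zeros are implied by EOB."""
    pairs, run = [], 0
    for v in ac:
        if v == 0:
            run += 1
        else:
            pairs.append((run, int(v)))
            run = 0
    return pairs

def run_level_decode(dc, pairs, n):
    """Rebuild the n x n block from the DC value and the (run, level) pairs."""
    seq = [dc]
    for run, level in pairs:
        seq.extend([0] * run)
        seq.append(level)
    seq.extend([0] * (n * n - len(seq)))             # zeros after EOB
    block = np.zeros((n, n), dtype=int)
    for value, (i, j) in zip(seq, zigzag_indices(n)):
        block[i, j] = value
    return block

quantized = np.array([
    [185, 3,  1,  1, -3, 2, -1,  0],
    [  1, 1, -1,  0, -1, 0,  0,  1],
    [  0, 0,  1,  0, -1, 0,  0,  0],
    [  1, 1,  0, -1,  0, 0,  0, -1],
    [  0, 0,  1,  0,  0, 0, -1,  0],
    [  0, 0,  0,  0,  0, 0,  0,  0],
    [  0, 0,  0,  0,  0, 0,  0,  0],
    [  0, 0,  0,  0,  0, 0,  0,  0],
])

scan = [int(quantized[i, j]) for i, j in zigzag_indices(8)]
dc, pairs = scan[0], run_level_encode(scan[1:])
decoded = run_level_decode(dc, pairs, 8)

print(dc, pairs)
print(np.array_equal(decoded, quantized))
```

The 63 AC coefficients collapse to 19 pairs plus the end-of-block symbol, because the zig-zag order groups the zeros of the high-frequency region into long runs.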

SLIDE 18

Detail in a Block vs. DCT Coefficients Transmitted

[Figure panels: image block; DCT coefficients of block; quantized DCT coefficients of block; block reconstructed from quantized coefficients]

SLIDE 19

Typical DCT Coding Artifacts

DCT coding with increasingly coarse quantization, block size 8x8:

[Figure: three coded images]
quantizer stepsize for AC coefficients: 25
quantizer stepsize for AC coefficients: 100
quantizer stepsize for AC coefficients: 200

SLIDE 20

Adaptive Transform Coding

[Block diagram] input signal -> transform -> quantization -> entropy coding, with a block classification stage that supplies the class of each block

Quantization and entropy coding are optimized separately for each class.

Typical classes:
Blocks without detail
Horizontal structures
Vertical structures
Diagonals
Textures without preferred orientation

SLIDE 21

Influence of DCT Block Size

Efficiency as a function of block size NxN, measured for 8 bit quantization in the original domain and equivalent quantization in the transform domain:

$$G = \frac{\text{memoryless entropy of original signal}}{\text{mean entropy of transform coefficients}}$$

[Figure]

Block size 8x8 is a good compromise.

SLIDE 22

Fast DCT Algorithm I

DCT matrix factored into sparse matrices (Arai, Agui, and Nakajima; 1988):

$$y = M x = S\, P\, M_1 M_2 M_3 M_4 M_5 M_6\, x$$

$S = \mathrm{diag}(s_0, s_1, \ldots, s_7)$ is a diagonal scaling matrix and $P$ is a permutation matrix. $M_1, \ldots, M_6$ are sparse 8x8 matrices: $M_1$, $M_2$, $M_4$, $M_5$ and $M_6$ contain only the entries 0, 1 and -1, while $M_3$ also contains the multipliers $C_2$, $-C_2$, $C_4$ and $-C_6$.

[Matrices: positions of the individual entries not reproduced]

SLIDE 23

Fast DCT Algorithm II

Signal flow graph for fast (scaled) 8-DCT according to Arai, Agui, Nakajima:

[Figure: signal flow graph with inputs x0 ... x7, multipliers a1 ... a5, scaling factors s0 ... s7 and outputs y0 ... y7]

Graph elements: addition (inputs u and v, outputs u+v and u-v) and multiplication (input u, output m·u).

Multipliers, with $C_k = \cos(k\pi/16)$:

$$a_1 = C_4, \quad a_2 = C_2 - C_6, \quad a_3 = C_4, \quad a_4 = C_6 + C_2, \quad a_5 = C_6$$

Scaling:

$$s_0 = \frac{1}{2\sqrt{2}}, \qquad s_k = \frac{1}{4 C_k}, \quad k = 1, \ldots, 7$$

Only 5 + 8 multiplications (direct matrix multiplication: 64 multiplications).
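The flow graph itself is not reproduced here, so the sketch below follows one commonly used arrangement of the butterflies for this algorithm (the same structure as the floating-point forward DCT in the Independent JPEG Group's library) and may differ in detail from the slide. It uses the five multipliers and eight scaling factors as defined above. NumPy is assumed and all function names are hypothetical.

```python
import numpy as np

C = np.cos(np.arange(8) * np.pi / 16)                 # C[k] = cos(k*pi/16)
A1, A2, A3, A4, A5 = C[4], C[2] - C[6], C[4], C[6] + C[2], C[6]
SCALE = np.concatenate([[1 / (2 * np.sqrt(2))], 1 / (4 * C[1:])])  # s0, s1..s7

def fast_dct8_scaled(x):
    """Scaled 8-point DCT: 5 multiplications and 29 additions, output not yet scaled."""
    t0, t7 = x[0] + x[7], x[0] - x[7]
    t1, t6 = x[1] + x[6], x[1] - x[6]
    t2, t5 = x[2] + x[5], x[2] - x[5]
    t3, t4 = x[3] + x[4], x[3] - x[4]
    # Even part
    e0, e3 = t0 + t3, t0 - t3
    e1, e2 = t1 + t2, t1 - t2
    y0, y4 = e0 + e1, e0 - e1
    z1 = (e2 + e3) * A1
    y2, y6 = e3 + z1, e3 - z1
    # Odd part
    o0, o1, o2 = t4 + t5, t5 + t6, t6 + t7
    z5 = (o0 - o2) * A5
    z2 = A2 * o0 + z5
    z4 = A4 * o2 + z5
    z3 = o1 * A3
    z11, z13 = t7 + z3, t7 - z3
    y5, y3 = z13 + z2, z13 - z2
    y1, y7 = z11 + z4, z11 - z4
    return np.array([y0, y1, y2, y3, y4, y5, y6, y7])

def fast_dct8(x):
    """Orthonormal 8-point DCT: scaled butterflies followed by the 8 scaling factors."""
    return SCALE * fast_dct8_scaled(x)

def dct_matrix(n):
    """Reference orthonormal DCT-II matrix for comparison."""
    i = np.arange(n).reshape(-1, 1)
    k = np.arange(n).reshape(1, -1)
    a = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k + 1) * i / (2 * n))
    a[0, :] = np.sqrt(1.0 / n)
    return a

x = np.array([201.0, 195.0, 188.0, 193.0, 169.0, 157.0, 196.0, 190.0])
print(np.allclose(fast_dct8(x), dct_matrix(8) @ x))
```

In a codec the eight scaling multiplications are usually folded into the quantizer step sizes, which leaves only the 5 multiplications inside the transform.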

SLIDE 24

Transform Coding: Summary

Orthonormal transform: rotation of coordinate system in signal space
Purpose of transform: decorrelation, energy concentration
KLT is optimum, but signal dependent and, hence, without a fast algorithm
DCT shows reduced blocking artifacts compared to DFT
Bit allocation proportional to logarithm of variance
Threshold coding + zig-zag-scan + 8x8 block size is widely used today (e.g. JPEG, MPEG, ITU-T H.263)
Fast algorithm for scaled 8-DCT: 5 multiplications, 29 additions