- TSBK01 Image Coding and Data Compression
Lecture 10 Jörgen Ahlberg
I. Colour coding
II. Moving images: From 2D to 3D?
III. Hybrid coding
IV. Video coding standards
– Red: 700 nm
– Green: 546 nm
– Blue: 435 nm
Three base colours enough to synthesize any visible colour!
[Figure: R, G, B axes; annotation "luminance Y = R+G+B = 1".]
Y = 0.30R + 0.59G + 0.11B
Cr = R - Y = 0.70R - 0.59G - 0.11B
Cb = B - Y = -0.30R - 0.59G + 0.89B
Y: luminance; Cr, Cb: chrominance
[Figure: a matrix maps R, G, B to Y, R-Y, B-Y.]
Change basis to YUV (almost the same as YCrCb).
– For more info on colour spaces, see the colour FAQ at www.poynton.com/Poynton-color.html
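The change of basis can be sketched as a 3×3 matrix product. This is a minimal illustration, assuming Y = 0.30R + 0.59G + 0.11B (which follows from Cr = R - Y with the coefficients above) and RGB values stored in the last array axis. The function names are hypothetical.

```python
import numpy as np

# Rows give Y, Cr = R - Y and Cb = B - Y as weighted sums of (R, G, B).
RGB_TO_YCRCB = np.array([
    [ 0.30,  0.59,  0.11],   # Y
    [ 0.70, -0.59, -0.11],   # Cr = R - Y
    [-0.30, -0.59,  0.89],   # Cb = B - Y
])

def rgb_to_ycrcb(rgb):
    """Convert an array of shape (..., 3) from (R, G, B) to (Y, Cr, Cb)."""
    return np.asarray(rgb, dtype=float) @ RGB_TO_YCRCB.T

def ycrcb_to_rgb(ycrcb):
    """Undo the basis change by applying the inverse matrix."""
    return np.asarray(ycrcb, dtype=float) @ np.linalg.inv(RGB_TO_YCRCB).T
```

For a grey pixel (R = G = B) both chrominance components come out as zero, so all the information sits in Y.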
The Human Visual System perceives the luminance in higher resolution than the chrominance!
Subsample the colour components.
[Figure: the Y plane next to the subsampled U and V planes, for 4:2:0 and for 4:2:2.]
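The two subsampling patterns can be sketched as follows. This is only an illustration: it assumes even image dimensions and uses plain averaging as the downsampling filter, which is one possible choice and not something the slides specify. The function names are hypothetical.

```python
import numpy as np

def subsample_420(chroma):
    """4:2:0, average each 2x2 block (half resolution in both directions)."""
    h, w = chroma.shape
    return chroma.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def subsample_422(chroma):
    """4:2:2, average each horizontal pair (half resolution horizontally)."""
    h, w = chroma.shape
    return chroma.reshape(h, w // 2, 2).mean(axis=2)
```

With the luminance kept at full resolution, 4:2:2 leaves 2 samples per pixel and 4:2:0 leaves 1.5 samples per pixel, compared with 3 for the unsubsampled image.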
Coding method     Performance (bpp)   Complexity   Decoding complexity
PCM               6 – 8               Low          Low
VQ                0.5 – 2             Very high    Low
Predictive        2 – 5               Low          Low
Transform         0.5 – 1.5           High         High
Subband/Wavelet   0.1 – 1.0           High         High
Fractal           0.1 – 0.5           Very high    Low
– 3D predictors
– Motion compensated predictors
– 3D transforms
– 3D subband filters
BUT! The properties of the image signal are different in the temporal and the spatial domain!
Hybrid predictive/transform coding is very popular.
Use predictive coding to predict the next frame in the sequence.
Use transform coding to code the prediction error.
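The two steps above can be sketched as a closed loop in which the encoder predicts each frame from its own reconstruction of the previous one. This is a simplified illustration under stated assumptions: the predictor is just "previous reconstructed frame" (no motion compensation), the transform is a DCT over the whole frame instead of small blocks, the quantizer is uniform with a hypothetical step size, and the variable length coder is left out.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C; C @ x transforms a length-n vector."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def encode_sequence(frames, step=8.0):
    """Return the quantizer indices per frame and the reconstructed frames."""
    h, w = frames[0].shape
    ch, cw = dct_matrix(h), dct_matrix(w)
    prediction = np.zeros((h, w))   # nothing to predict the first frame from
    indices, reconstructions = [], []
    for frame in frames:
        error = np.asarray(frame, dtype=float) - prediction  # prediction error
        coeffs = ch @ error @ cw.T                           # transform
        q = np.round(coeffs / step).astype(int)              # quantize
        indices.append(q)                    # these would be entropy coded
        error_hat = ch.T @ (q * step) @ cw   # inverse quantizer and transform
        reconstruction = prediction + error_hat
        reconstructions.append(reconstruction)
        prediction = reconstruction          # predictor for the next frame
    return indices, reconstructions
```

Predicting from the reconstruction rather than from the original frame keeps the encoder and decoder in step, since the decoder only ever sees quantized data.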
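[Figure: block diagram with T, Q and VLC.]
T: Transform
Q: Quantizer
VLC: Variable Length Coder
[Figure: block diagram with Q, Q-1, P and VLC.]
Q: Quantizer
Q-1: Inverse quantizer (reconstructor)
P: Predictor
[Figure: block diagram with T-1, Q, Q-1, P and VLC.]
[Figure: frame sequence with an I-frame followed by predictively coded P-frames.]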
Better prediction if it can compensate for motion!
[Figure: block diagram with ME, TQ, TQ-1, P and VLC.]
ME: Motion estimation
TQ: Transform + quantization
transform blocks)
Motion estimation is a time consuming process
– Hierarchical motion estimation
– Maximum length of motion vectors
– Clever search strategies
Motion vector accuracy:
– Integer, half or quarter pixel
– Bilinear interpolation
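To show why motion estimation is costly, here is a sketch of the simplest strategy: an exhaustive (full) search at integer-pixel accuracy. It is an illustration only. The matching criterion (sum of absolute differences, SAD), the function name and the default parameters are assumptions, not taken from the slides.

```python
import numpy as np

def block_match(reference, current, top, left, block=16, max_disp=15):
    """Find the displacement (dy, dx) into `reference` that minimises the
    SAD for the block of `current` whose top-left corner is (top, left)."""
    target = current[top:top + block, left:left + block].astype(float)
    h, w = reference.shape
    best, best_sad = (0, 0), np.inf
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + block > h or x + block > w:
                continue  # candidate falls outside the reference frame
            candidate = reference[y:y + block, x:x + block].astype(float)
            sad = np.abs(target - candidate).sum()
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best, best_sad
```

Each block requires up to (2 · max_disp + 1)² block comparisons, which is what limiting the vector length, hierarchical estimation and smarter search patterns try to reduce.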
[Figure: bitrate axis marked at 16, 64 and 384 kbit/s and at 1.5, 5 and 20 Mbit/s, divided into very low, low, medium and high bitrate. Applications shown: mobile videophone, videophone, ISDN videophone, Video CD, digital TV, HDTV. Standards shown: H.261, H.263, MPEG-1, MPEG-2, MPEG-4.]
– Standards for real time communication like video telephony and video conferencing.
– Standardized by ITU.
MPEG
– Standards for stored video data like movies on CDs, DVDs, etc.
– Standardized by ISO.
Motion compensation:
– One motion vector per macroblock.
– One macroblock = four 8×8 luminance blocks + two chrominance blocks (one U and one V).
– Motion vectors max 15 pixels long in each direction.
Format:
– CIF (352×288) or QCIF (176×144)
– 7.5 – 30 frames/s.
Bitrate: Multiple of 64 kbit/s (=ISDN) including audio.
Quality: Acceptable for small motion at 128 kbit/s.
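A short calculation puts these figures in perspective. It is a sketch assuming 8 bits per sample (not stated in the slides) and 1.5 samples per pixel, which matches a macroblock of four luminance blocks plus one U and one V block. The function names are hypothetical.

```python
def raw_bitrate(width, height, fps, bits_per_sample=8):
    """Uncompressed bitrate in bit/s with 1.5 samples per pixel."""
    samples_per_frame = width * height * 3 // 2
    return samples_per_frame * bits_per_sample * fps

def macroblocks_per_frame(width, height):
    """A macroblock covers 16x16 luminance pixels (four 8x8 blocks)."""
    return (width // 16) * (height // 16)

cif_raw = raw_bitrate(352, 288, 30)             # 36495360 bit/s
cif_macroblocks = macroblocks_per_frame(352, 288)   # 396
qcif_macroblocks = macroblocks_per_frame(176, 144)  # 99
ratio_at_128k = cif_raw / 128_000               # about 285
```

Under these assumptions, CIF at 30 frames/s is about 36.5 Mbit/s uncompressed, so a 128 kbit/s channel implies a compression ratio of roughly 285:1 even before the audio share is subtracted.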
lines in 1995.
Format:
– CIF, QCIF or Sub-QCIF.
– Usually less than 10 frames/s.
Bitrate: Typically 20 – 30 kbit/s.
Quality: With new options as good as H.261 (at half the bitrate).
ISO and IEC.
Original plan:
– MPEG-1 for 1.5 Mbit/s (Video CD)
– MPEG-2 for 10 Mbit/s (Digital TV)
– MPEG-3 for 40 Mbit/s (HDTV)
What happened:
– MPEG-1 for 1.5 Mbit/s (Video CD)
– MPEG-2 for 2 – 60 Mbit/s (TV and HDTV)
– MPEG-4, -7 and -21 for other things.
Target bitrate around 1.5 Mbit/s (Video CD).
Properties:
– Bi-directionally predictively coded frames ("B-frames", see next slide).
– More flexible than H.261.
– Almost JPEG for intra frames.
Format:
– CIF
– No interlace.
– 24 – 30 frames/s.
[Figure: a group of frames (GOF) with the frames labelled I, P and B.]
I: Intra-coded I-frame
P: Predictively coded P-frames
B: Bi-directionally predictively coded B-frames
8×8 DCT
Arbitrary weighting matrix for coefficients
Predictive coding of DC-coefficients
Uniform quantization
Zig-zag, run-level, entropy coding
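The zig-zag scan and run-level step for one quantized block can be sketched as below. This is an illustration, not the standard's exact syntax: the "EOB" (end of block) marker and the function names are assumptions, the DC coefficient is simply returned separately (its predictive coding is not shown), and the final entropy coding of the pairs is omitted.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zig-zag order."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        # Sort by anti-diagonal, alternating the direction on each one.
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )

def run_level_pairs(block):
    """Return the DC value and the AC coefficients as (zero run, level) pairs."""
    scanned = [block[r][c] for r, c in zigzag_order(len(block))]
    pairs, run = [], 0
    for level in scanned[1:]:       # skip the DC coefficient
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    pairs.append("EOB")             # only zeros remain after this point
    return scanned[0], pairs
```

Because quantization leaves most high-frequency coefficients at zero, the scan tends to end in one long run of zeros that is replaced by the single end marker.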
Half-pixel accuracy of motion vectors, bilinear interpolation.
Predictive coding of motion vectors.
Prediction error coded as I-frame.
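Half-pixel accuracy means the reference frame has to be evaluated between its sample positions. A minimal sketch of bilinear interpolation onto a half-pixel grid follows. The function name is hypothetical, and the integer rounding rules a real codec would specify are left out.

```python
import numpy as np

def upsample_half_pixel(frame):
    """Return a (2h-1) x (2w-1) grid holding the frame at half-pixel spacing."""
    f = np.asarray(frame, dtype=float)
    h, w = f.shape
    up = np.zeros((2 * h - 1, 2 * w - 1))
    up[0::2, 0::2] = f                               # integer positions
    up[0::2, 1::2] = (f[:, :-1] + f[:, 1:]) / 2      # horizontal half positions
    up[1::2, 0::2] = (f[:-1, :] + f[1:, :]) / 2      # vertical half positions
    up[1::2, 1::2] = (f[:-1, :-1] + f[:-1, 1:]
                      + f[1:, :-1] + f[1:, 1:]) / 4  # centre of four pixels
    return up
```

A motion vector given in half-pixel units can then be applied as an integer offset into this finer grid.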
– Forward prediction only (1 vector/macroblock).
– Backward prediction only (1 vector/macroblock).
– Average of forward and backward prediction (2 vectors/macroblock).
Otherwise as P-frames.
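The choice between the three prediction modes can be sketched as follows, assuming the motion-compensated forward and backward blocks have already been fetched. Picking the mode by smallest sum of absolute differences is an assumption made for illustration; the slides do not say how an encoder decides.

```python
import numpy as np

def best_b_prediction(block, fwd_block, bwd_block):
    """Return the name and the prediction of the mode with the smallest SAD."""
    block = np.asarray(block, dtype=float)
    fwd = np.asarray(fwd_block, dtype=float)
    bwd = np.asarray(bwd_block, dtype=float)
    candidates = {
        "forward": fwd,
        "backward": bwd,
        "average": (fwd + bwd) / 2.0,
    }
    sads = {mode: float(np.abs(block - pred).sum())
            for mode, pred in candidates.items()}
    mode = min(sads, key=sads.get)
    return mode, candidates[mode]
```

The averaged mode tends to win when the block lies between its two references in appearance, for example during a fade, at the price of sending two motion vectors.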
Properties:
– Handles interlace (optimized for TV)
– Even more flexible than MPEG-1
Format:
– 352×288
– 704×576 (25 frames/s) or 720×480 (30 frames/s)
– 1440×1152 or 1920×1080 (HDTV)
Bitrate:
– 2 – 60 Mbit/s
– ~4 Mbit/s: Image quality similar to PAL / NTSC / SECAM.
– 18 – 20 Mbit/s: HDTV.
Profiles:
– Simple profile without B-frames.
– Scalable profiles.
Experience tells that:
– At 1.5 – 2 Mbit/s MPEG-2 is not better than MPEG-1.
– With manual interaction at the coding, good quality can be achieved at 3 – 4 Mbit/s.
– Problems with implementing the full standard have caused compatibility problems.
– Buffering and rate control are hard problems.
Instead of frames as coding units, MPEG-4 uses audio-visual objects.
Focus is not primarily on compression, but on content-based functionality.
Contains definitions of:
– Media object types (video, audio, text, graphics, ...)
– Parameters for describing the objects
– Bitstream syntax for the (compressed) parameters
– Scene description, file format, streaming, synchronization, ...
Allows mixing of media objects.
– The bitstream syntax and the binary "language" for scene description
– Computer graphics object descriptions
– Multiplexing, transport, ...
Part 2, Visual, contains
– Video coding
– Still image coding
– Texture coding, ...
Part 3, Audio, contains a toolbox of audio coders for different applications
...
Coded with Shape Adaptive DCT
[Figure: coder block diagram with TQ and VLC blocks.]
– Compose virtual environments
– Easy to add text, graphs, images, etc.
High compression
Receive objects from separate sources
– Use predefined or locally defined objects
Scalability
– Progressive decoding
– Better terminal gives better quality.
– Lines, polygons
– Still images
– Image/video mapping on polygon meshes
Different environments for different users
Simple change between environments
Synthetic environments are cheaper than real ones
– Scalable quality and resolution
– Progressive decoding
– Can be mapped on 2D or 3D meshes
Compression of 2D and 3D meshes
– Mesh geometry and animation
– Transmit vertex coordinates and let the receiving terminal calculate the polygons
– A moving or still image can be mapped on the mesh (texture mapping).
Text-to-speech (TTS) interface
View-dependent scalable texture
– Information about the user's view position in a 3D scene is transmitted on a back-channel
– Only the necessary texture information is transmitted to the user
[Figure: the texture mapped on a surface, and what the user sees.]
All are variations of the hybrid coder used in MPEG-
coders, with some extra features.
Finished in 2003.
Prediction of blocks of sizes up to 16×16.
Motion vectors for blocks of sizes 4×4 up to 16×16.
Up to 5 reference images for prediction.
Non-uniform quantization.
Arithmetic coding of run-level pairs.
MPEG-1
– Audio layer I, II and III (mp3).
MPEG-2
– Four channels, same codec as in MPEG-1.
– AAC (Advanced Audio Coding) added later.
MPEG-4
– AAC
– Two speech coders
– Structured audio
– And more...
More on audio coding in Lecture 11.
– Change basis from RGB to YUV
– Colour components are compressed harder than the luminance
Moving image coding
– Hybrid coding: Motion compensated predictive coding and transform coding of the prediction error
– I-, P-, and B-frames
– Object-based coding (MPEG-4) mixing synthetic and natural audio & video
– MPEG-1: Video CD
– MPEG-2: Digital TV
– MPEG-4: Multimedia
– H.261: ISDN videophone
– H.263: PSTN videophone
– H.264 / MPEG-4 part 10: Universal video
That was the last slide!