
The Hong Kong Polytechnic University Department of Electronic and Information Engineering, Centre for Multimedia Signal Processing

Prof. W.C. Siu, Chair Professor and Centre Director

8 June, 2008

Keynote Speaker: Professor W.C. Siu

Chair Professor and Director, Centre for Signal Processing, Department of Electronic and Information Engineering, Hong Kong Polytechnic University

On Video Transcoding to Super-Resolution Videos

ICNNSP’2008:

IEEE International Conference on Neural Networks and Signal Processing, 6-10 June 2008, Zhenjiang, China

Contents:

1. Introduction

1.1 Hybrid Video Coding
1.2 Object Oriented Coding
1.3 Advanced Video Concepts
1.4 A highlight of our other studies

2. Motion Estimation Algorithms

2.1 Sample studies: Fast Adaptive Search Algorithm, Fast Pixel Decimation
2.2 Sample studies: Fast Exhaustive Full Search, Novel Directional Search

3. Video Transcoding

3.1 Video Transcoding (frame skipping)
3.2 Video Transcoding (H.264 to H.264 conversion)

4. Extending to Video Enlargement and Super-resolution Videos

4.1 New Edge-Directed Interpolation
4.2 Modified Edge-Directed Interpolation
4.3 SR video Construction
4.4 SR video re-encoding

6. Summary and Conclusion

1. Introduction

1.1 Hybrid Video Coding

In recent years there has been remarkable progress in video coding. This talk concentrates mainly on predictive hybrid video coding.

Predictive coding: instead of transmitting a frame (called the current frame) directly, its motion vectors with reference to a previous frame (called the reference frame) are transmitted. Applying these motion vectors to the reference frame produces the motion-compensated frame, which approximates the current frame but still contains prediction errors.

Most videos nowadays are coded using hybrid video coding, which makes use of predictive coding.

In order to recover the signal without the motion estimation errors, a motion-compensated residual frame is constructed, such that

Residual frame = Current frame - Motion-compensated frame

This residual frame is subsequently coded in a similar manner to an intra frame, i.e. through DCT, then quantization, then entropy coding.

[Figure: reference frame, current frame, motion-compensated frame and residual frame.]
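
The residual-frame relation above can be sketched in a few lines. This is only an illustration under stated assumptions, not the encoder used in the talk: the frames are greyscale NumPy arrays whose sides are multiples of the block size, each block has one integer motion vector stored as a hypothetical (dy, dx) pair, and displaced blocks are clamped to stay inside the reference frame.

```python
import numpy as np

def motion_compensate(reference, motion_vectors, block=16):
    """Build the motion-compensated frame: every block is copied from the
    reference frame at the position displaced by that block's (dy, dx)."""
    h, w = reference.shape
    predicted = np.zeros_like(reference)
    for by in range(0, h, block):
        for bx in range(0, w, block):
            dy, dx = motion_vectors[by // block][bx // block]
            # Clamp so the displaced block stays inside the reference frame.
            sy = min(max(by + dy, 0), h - block)
            sx = min(max(bx + dx, 0), w - block)
            predicted[by:by + block, bx:bx + block] = \
                reference[sy:sy + block, sx:sx + block]
    return predicted

def residual_frame(current, reference, motion_vectors, block=16):
    """Residual frame = current frame - motion-compensated frame."""
    predicted = motion_compensate(reference, motion_vectors, block)
    # Use a signed type: prediction errors can be negative.
    return current.astype(np.int16) - predicted.astype(np.int16)
```

A decoder holding the same reference frame and motion vectors recovers the current frame by adding the residual back to the motion-compensated frame; in a real codec the residual would first pass through the DCT, quantization and entropy coding stages, so the recovery is only approximate.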

Hybrid Video Coding

[Figure: block diagram of the hybrid video encoder. The source video and the predicted frame from motion compensation are subtracted to give the error frame, which passes through the 2D-DCT, quantizer, VLC coder and buffer to form the compressed bit stream (011010…111), with a regulator controlling the quantizer. A feedback loop of dequantizer, inverse 2D-DCT and frame memory feeds motion estimation and motion compensation; the motion vectors are sent with the bit stream.]

1.2 Object Oriented Coding

The scene is divided into background object(s):

Hybrid Coding, or Sprite Generation Technique

and foreground objects:

Multiple objects Hybrid Coding
Object based coding (complete object)
Time, position, motion manipulations, etc.

Segmentation is still a problem, since the definition of an object can never be made clear to computers. Object boundaries always merge with the background, etc.

1.3 Advanced Video Concepts - from HVC to Advanced Video Coding (H.264)

[Figure: the hybrid video encoder block diagram is extended step by step into the H.264 encoder. Intra-frame prediction and additional frame memories are added; the 2D-DCT and quantizer become "Integer Transform, Scaling, Quantization"; the dequantizer and inverse 2D-DCT become "Scaling and Inverse Transform".]

Advanced Video Coding

[Figure: H.264 encoder block diagram with the new features highlighted: Digital Transform, Multi-Ref. Frame, Variable Block Size.]

Video Composition: (Object Oriented Processing)

1. Background:

Hybrid Coding, or Sprite Generation Technique

2. Foreground:

Multiple objects Hybrid Coding
Object based coding (complete object)
Time, position, motion manipulations, etc.

Techniques developed:

1. Motion estimation/Sprite Generation

Our own fast hybrid coding, making use of new concepts in fast motion estimation and sprite generation techniques.

Techniques developed (cont'd):

2. Improved Automatic Image Segmentation

Making use of a new marker-extraction technique and color information, simplified area morphology and a modified watershed algorithm.

3. Fast Wavelet Computation

Using a fast lifting algorithm, with the possibility of using overcomplete wavelets.

4. Etc.

______________________________________________________________________________________

Ko-Cheung Hui, Wan-Chi Siu and Yui-Lam Chan, "Fast Motion Estimation of Arbitrary Shaped Video Objects in MPEG-4", pp.33-50, Vol.18, Issue 1, Signal Processing: Image Communication, Elsevier Science, January 2003, The Netherlands.

H. Gao, W.C. Siu and C. Hou, "Improved Techniques for Automatic Segmentation", pp.1273-80, Vol.11, No.12, December 2001, IEEE Transactions on Circuits and Systems for Video Technology, USA.

2. Studies on Fast Motion Estimation

A few published works:

1. A few adaptive motion estimation algorithms were proposed, which make use of simple statistics to determine the search directions and locations; some of this pioneering work has been very well cited.
2. Work on fast algorithms (i) with a selected sub-set of pattern(s), and (ii) with adaptive pixel decimation.

_________________________________________________________________

References:

Yui-Lam Chan and Wan-Chi Siu, "An Efficient Search Strategy for Block Motion Estimation using Image Features", IEEE Transactions on Image Processing, pp.1223-38, Vol.10, No.8, August 2001, USA.

Yui-Lam Chan and Wan-Chi Siu, "Reliable Block Motion Estimation through the Confidence Measure of Error Surface", pp.135-46, Vol.76, Issue 2, Signal Processing, 1999, Switzerland.

Yui-Lam Chan and Wan-Chi Siu, "New Adaptive Pixel Decimation for Block Motion Vector Estimation", IEEE Transactions on Circuits & Systems for Video Technology, pp.113-118, Vol.6, No.1, February 1996, USA. (Listed as one of the Most Cited Papers since 1990 on the CSVT website, http://tcsvt.polito.it/, 2008 IEEE Trans on CSVT.)

3. Recently we suggested the concept of error clustering, which gives a completely revised view of adaptive motion estimation. It is able to replace the partial distortion search (PDS), and works with the Successive Elimination Algorithm (SEA) or Multilevel SEA for extremely fast full-search motion estimation.
4. Recently we suggested:

(i) to use a search window equal to the size of the motion vector(s) in the 1st [reference frame] for multi-frame motion estimation;
(ii) to use partial SAD for variable block size motion estimation;
(iii) to use directional search to form a novel scheme, etc.

_________________________________________________________________

References:

Ko-Cheung Hui, Wan-Chi Siu and Yui-Lam Chan, "New Adaptive Partial Distortion Search using Clustered Pixel Matching Error Characteristic", pp.597-607, Vol.14, No.5, May 2005, IEEE Transactions on Image Processing.

M.Y. Chiu and W.C. Siu, "New Results on Exhaustive Search Algorithm for Motion Estimation using Adaptive Partial Distortion Search and Successive Elimination Algorithm", Proceedings, pp.3978-81, IEEE International Symposium on Circuits and Systems (ISCAS'2006), May 2006, Island of Kos, Greece.

Liangming Ji and Wan-Chi Siu, "Reduced Computation using Adaptive Search Window Size for H.264 Multi-frame Motion Estimation", Paper 1568982117, pp.1-5, Proceedings, 14th European Signal Processing Conference (EUSIPCO'2006), September 2006, Florence, Italy.

Yan-Ho Kam and Wan-Chi Siu, "A Fast Full Search Scheme for Rate-Distortion Optimization of Variable Block Size and Multi-frame Motion Estimation", Paper 3095, pp.1-4, Proceedings, IEEE International Midwest Symposium on Circuits and Systems (MWSCAS'2006), August 2006, San Juan, Puerto Rico, USA.

Ying Zhang, Wan-Chi Siu and Tingzhi Shen, "Yet a Faster Motion Estimation Algorithm with Directional Search Strategies", pp.475-78, Proceedings, 15th International Conference on Digital Signal Processing (DSP'2007), July 2007, Cardiff, UK.
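
As background for the Successive Elimination Algorithm mentioned above, the following is a minimal sketch of the basic SEA test inside a full search. It is not the adaptive or multilevel scheme of the cited papers. It relies on the standard bound |sum(current block) - sum(candidate block)| <= SAD, so a candidate whose sum difference already reaches the best SAD found so far cannot be a strictly better match and is skipped. The function name, the (dy, dx) convention and the assumption that at least one candidate lies inside the frame are choices made for this illustration; a practical implementation would also precompute the block sums rather than summing each candidate.

```python
import numpy as np

def sea_full_search(current_block, reference, top, left, search_range):
    """Full search with the SEA test. Returns (best (dy, dx), best SAD,
    number of candidates for which the SAD was actually computed)."""
    n = current_block.shape[0]
    cur = current_block.astype(np.int64)
    cur_sum = cur.sum()
    h, w = reference.shape
    best_sad, best_mv, sad_evaluations = None, (0, 0), 0
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + n > h or x + n > w:
                continue  # candidate block falls outside the frame
            cand = reference[y:y + n, x:x + n].astype(np.int64)
            # SEA bound: |sum(cur) - sum(cand)| <= SAD(cur, cand), so this
            # candidate cannot beat the current best and is eliminated.
            if best_sad is not None and abs(cur_sum - cand.sum()) >= best_sad:
                continue
            sad = np.abs(cur - cand).sum()
            sad_evaluations += 1
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv, int(best_sad), sad_evaluations
```

Because the test only discards candidates that cannot improve on the current best, the result is the same minimum SAD that an exhaustive search would find; only the number of SAD computations changes.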

3. Video Transcoding

[Figure: Universal Access. Labels: servers, video, mobile, television set-top box, intelligent home, home PC, office computers, wide area network.]

Video Transcoding: Given the variety of client devices, it is difficult for a server to tailor the content for individual devices. A video server may have to provide quality support services to heterogeneous clients or transmission channels. For this reason, the video server should have the capability to perform transcoding: the process of converting a previously compressed video bitstream into a bitstream of a different nature or a lower bitrate.

Homogeneous Transcoding, three types:

1. Frame Skipping
2. Video Downscaling
3. Transcoding with Bit-rate Reduction

Heterogeneous Transcoding:

4. Conversion of videos among standards (or between frame types)

Conventional Transcoder:

VLD: Variable Length Decoding
VLC: Variable Length Coding
$Q_1^{-1}$: Inverse Quantization (Fine Quantizer)
$Q_2$: Quantization (Coarse Quantizer)
MCF: Motion Compensated Frame
DCT: Discrete Cosine Transform
$\mathrm{DCT}^{-1}$: Inverse Discrete Cosine Transform
EMV: Encoding Motion Vector
MC: Motion Compensation

[Figure: conventional transcoder placed between the front encoder and the end decoder. Decoding part: stream separation, VLD, $Q_1^{-1}$, $\mathrm{DCT}^{-1}$ and MCF with reference frame 1 and the incoming motion vectors. Coarse re-encoder: ME, DCT, $Q_2$ and VLC, with a $Q_2^{-1}$, $\mathrm{DCT}^{-1}$ and MCF loop using reference frame 2 and the new motion vectors (EMV), producing the compressed bit stream.]

Frame-Skipping Transcoder

When the frame rate changes, the incoming quantized DCT coefficients of the residual signal may no longer be valid, because they refer to a frame which may have been dropped. The transcoder therefore works in two steps:

First, the transcoder decodes the incoming bitstream into the pixel domain.
Second, the decoded video frames are re-encoded at the desired lower frame rate.

[Figure: front encoder, transcoder (decoder, frame skipping, encoder), end decoder.]

Frame-skipping Transcoder

Looking at the decoding and re-encoding parts alone:

First, the video bitstream undergoes VLC decoding, inverse quantization and inverse DCT, so that frame $R_{t-1}$ can be reconstructed and stored in the frame buffer FB. Note that $R_{t-1}$ is required to act as the reference frame for the reconstruction of frame $R_t$. Hence we have

$$R_t(i,j) = R_{t-1}(i+u_t,\ j+v_t) + \hat{e}_t(i,j)$$

where $\hat{e}_t$ is the prediction error (residual signal).

[Figure: frame-skipping transcoder block diagram. Decoding side, fed from the front encoder: $\mathrm{VLC}^{-1}$, $Q^{-1}$, $\mathrm{DCT}^{-1}$, FB and MC, with signals $\hat{e}_t$, $(u_t, v_t)$, $R_{t-1}$ and $R_t$. Re-encoding side: DCT, Q, VLC, $Q^{-1}$, $\mathrm{DCT}^{-1}$, FB and MC, with signals $(u_t^s, v_t^s)$, $R_{t-2}^s$, $R_t^s$ and $Q[\mathrm{DCT}(e_t^s)]$. A switch s with positions A and B sits between the two sides.]

Frame-skipping Transcoder

Let us assume that $R_{t-1}$ is dropped.

If $R_{t-1}$ is dropped, we have to find frame $R_t$ at time $t$ with reference to the previous non-skipped frame at time $t-2$, i.e. $R_{t-2}$.

A new compensation error, $e_t^s(i,j)$, has to be found.

[Figure: frame-skipping transcoder block diagram with blocks $\mathrm{VLC}^{-1}$, $Q^{-1}$, $\mathrm{DCT}^{-1}$, FB, MC, DCT, Q and VLC, switch s (positions A and B), and signals $\hat{e}_t$, $(u_t, v_t)$, $(u_t^s, v_t^s)$, $R_t$, $R_{t-1}$, $R_{t-2}^s$, $R_t^s$ and $Q[\mathrm{DCT}(e_t^s)]$.]

Direct Addition Approach:

Let us consider a special case: macroblocks without motion compensation.

Recall that $R_{t-1}$ is dropped; we can use the motion vectors $(u_t, v_t)$ and $(u_{t-1}, v_{t-1})$ to reconstruct the new motion vector $(u_t^s, v_t^s)$:

$$(u_t^s, v_t^s) = (u_t, v_t) + (u_{t-1}, v_{t-1})$$

For a macroblock without motion compensation, $(u_t, v_t) = (0, 0)$, so that

$$(u_t^s, v_t^s) = (u_{t-1}, v_{t-1})$$

Now we need to find $Q[\mathrm{DCT}(e_t^s)]$.

[Figure: frames $R_t$, $R_{t-1}$ (dropped) and $R_{t-2}$, with macroblocks $MB_t$, $MB_{t-1}$ and $MB_{t-2}$ linked by $(u_t, v_t) = (0, 0)$ and $(u_{t-1}, v_{t-1})$.]

Macroblocks without motion compensation

Note that $Q[\mathrm{DCT}(\hat{e}_t)]$ and $Q[\mathrm{DCT}(\hat{e}_{t-1})]$ are available from the incoming bitstream.

$$MB_t = MB_{t-1} + \hat{e}_t$$

$$MB_{t-1} = MB_{t-2} + \hat{e}_{t-1}$$

$$e_t^s = MB_t - MB_{t-2} = \hat{e}_t + \hat{e}_{t-1}$$

$$\mathrm{DCT}(e_t^s) = \mathrm{DCT}(\hat{e}_t) + \mathrm{DCT}(\hat{e}_{t-1})$$

$$Q[\mathrm{DCT}(e_t^s)] = Q[\mathrm{DCT}(\hat{e}_t)] + Q[\mathrm{DCT}(\hat{e}_{t-1})]$$

[Figure: frames $R_t$, $R_{t-1}$ (dropped) and $R_{t-2}$ with macroblocks $MB_t$, $MB_{t-1}$ and $MB_{t-2}$; $Q[\mathrm{DCT}(\hat{e}_t)]$ and $Q[\mathrm{DCT}(\hat{e}_{t-1})]$ link successive frames, and $Q[\mathrm{DCT}(e_t^s)]$ links $R_t$ directly to $R_{t-2}$.]

A direct addition of the DCT coefficients: the newly quantized DCT coefficients can be computed in the DCT domain by directly adding the quantized DCT coefficients held in the DCT-domain buffer to the incoming quantized DCT coefficients, whilst the updated DCT coefficients are stored in the DCT-domain buffer.
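
The direct-addition result for macroblocks without motion compensation can be checked numerically. The sketch below is an illustration under simplifying assumptions that are not taken from the slides: a single 8x8 block, an orthonormal floating-point DCT, one uniform quantizer step shared by both frames, and no pixel clipping. All names (`dequantise`, `quantise`, `direct_addition`, the step size 8) are hypothetical.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix C, so that DCT(X) = C @ X @ C.T."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos((2 * i + 1) * k * np.pi / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

C = dct_matrix(8)

def dequantise(levels, step):
    """Quantised levels -> reconstructed residual (inverse Q, inverse DCT)."""
    return C.T @ (levels * step) @ C

def quantise(residual, step):
    """Residual -> quantised levels Q[DCT(residual)]."""
    return np.rint((C @ residual @ C.T) / step).astype(int)

def direct_addition(levels_t, levels_t_minus_1):
    """Q[DCT(e_t^s)] for a zero-motion macroblock, computed in the DCT domain."""
    return levels_t + levels_t_minus_1

# Incoming quantised residuals for frames t-1 and t (random test data).
rng = np.random.default_rng(0)
step = 8
levels_prev = rng.integers(-3, 4, (8, 8))
levels_cur = rng.integers(-3, 4, (8, 8))

# Pixel-domain route: reconstruct MB_{t-1} and MB_t, then re-encode MB_t
# against MB_{t-2} because frame t-1 has been dropped.
mb_t2 = rng.integers(0, 256, (8, 8)).astype(float)
mb_t1 = mb_t2 + dequantise(levels_prev, step)
mb_t = mb_t1 + dequantise(levels_cur, step)
levels_pixel_domain = quantise(mb_t - mb_t2, step)

# DCT-domain route: add the two sets of quantised levels directly.
levels_dct_domain = direct_addition(levels_cur, levels_prev)
```

Under these assumptions the two routes give the same levels, which is why neither the DCT and inverse DCT nor a second quantization is needed for such macroblocks.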

Direct Addition of the DCT Coefficients

It is not necessary to perform motion compensation, DCT, quantization, inverse DCT and inverse quantization, so the complexity is greatly reduced.

Requantization is not necessary for macroblocks coded without motion compensation, so the quality degradation due to re-encoding in the transcoder is avoided.

[Figure: the distribution of the coding modes for the typical "salesman" sequence. Percentage of macroblocks without motion compensation (%) versus frame number.]

By using a direct addition of the DCT coefficients for non-moving macroblocks, the computational complexity involved in processing these macroblocks can be reduced significantly and the additional re-encoding error can be avoided.

____________________________________________________________________

References:

Kai-Tat Fung, Yui-Lam Chan and Wan-Chi Siu, "Low-Complexity and High-Quality Frame-Skipping Transcoder for Continuous Presence Multipoint Video Conferencing", pp.31-46, Vol.6, No.1, February 2004, IEEE Transactions on Multimedia, USA.

Kai-Tat Fung, Yui-Lam Chan and W.C. Siu, "New Architecture for Dynamic Frame-Skipping Transcoder", pp.860-900, Vol.11, No.8, IEEE Transactions on Image Processing, August 2002, USA.

Video Transcoding: Sample Study 2

Transcoding from H.263 to H.264 within the Transform Domain

Why transcode from H.263 to H.264? The complete migration to the new video coding algorithm will take several years, since H.263 and MPEG are widely used in many multimedia applications nowadays. This creates an important need for transcoding technologies that convert the widely available H.263 compressed videos to the H.264 compressed format and vice versa. However, given the significant differences between the H.263 and H.264 algorithms, transcoding is much more complex.

References:

Wan-Chi Siu, Yui-Lam Chan and Kai-Tat Fung, "On Transcoding a B-frame to a P-frame in the Compressed Domain", pp.1093-1102, Vol.9, Issue 6, October 2007, IEEE Transactions on Multimedia, USA.

Kai-Tat Fung and Wan-Chi Siu, "DCT-based Video Downscaling Transcoder using Split and Merge Technique", pp.394-403, Vol.15, No.2, February 2006, IEEE Transactions on Image Processing, USA.

Kai-Tat Fung and Wan-Chi Siu, "On Re-composition of Motion Compensated Macroblock for DCT-based Video Transcoding", pp.44-58, Vol.21, No.1, January 2006, Signal Processing: Image Communication, Elsevier Science, The Netherlands.

Kai-Tat Fung, Yui-Lam Chan and Wan-Chi Siu, "Low-Complexity and High-Quality Frame-Skipping Transcoder for Continuous Presence Multipoint Video Conferencing", pp.31-46, Vol.6, No.1, February 2004, IEEE Transactions on Multimedia, USA.

Sample Study 3: H.264 to H.264

Architecture of a Down-sizing Transcoder

[Figure: flowchart. Read and decode one frame, then check the frame type: I-frame or P-frame.]

E.g. for H.264 High profile, from HD to SD: we have to convert a video in HD (1920 x 1080) format to SD (1280 x 720) format.

For a macroblock:

1. Determine its mode type: intra or inter.
2. Determine its prediction mode for intra-mode.
3. Determine its mode (variable block size, VBS) for inter-mode.
4. Check whether skip mode is to be used.
5. Motion re-estimation.
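
For the 3:2 size reduction above (1920 x 1080 to 1280 x 720), each 16x16 macroblock of the output frame corresponds to a 24x24 region of the input frame, which straddles up to four input macroblocks. The sketch below only works out that geometry, as something a per-macroblock mode or motion re-decision could consult; it is an assumption-laden illustration rather than the transcoder described in the talk, and the function name and its return format are hypothetical.

```python
def overlapping_source_macroblocks(out_row, out_col, num=3, den=2, mb=16):
    """Map one output macroblock to the input macroblocks it covers.

    Returns {(src_row, src_col): overlap area in input pixels} for a
    num:den downsizing ratio and a macroblock size of mb pixels.
    """
    size = mb * num // den                  # 24 input pixels per output MB side
    y0, x0 = out_row * size, out_col * size
    overlaps = {}
    for sr in range(y0 // mb, (y0 + size - 1) // mb + 1):
        for sc in range(x0 // mb, (x0 + size - 1) // mb + 1):
            # Height and width of the intersection with input MB (sr, sc).
            oy = min(y0 + size, (sr + 1) * mb) - max(y0, sr * mb)
            ox = min(x0 + size, (sc + 1) * mb) - max(x0, sc * mb)
            overlaps[(sr, sc)] = oy * ox
    return overlaps
```

For example, the top-left output macroblock covers the whole of input macroblock (0, 0) and parts of its right, lower and diagonal neighbours; the overlap areas could serve as weights when combining the modes or motion vectors of the input macroblocks.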

HDTV: HD 1920 x 1080 to SD 1280 x 720

[Figure: 3:2 transcoding, shown twice.]

[Figure: 3:2 transcoding, shown twice, with regions labelled A, B, C and D.]

HDTV: Video Transcoding

Transcoding from HD to SD formats:

Procedure: (i) timing analysis, (ii) data extraction from the codec, (iii) building an ideal speed-up model for transcoding on the H.264 platform (architecture realization), and (iv) video transcoder refinement using various technologies.

Technologies (algorithms have to be designed for):

(1) inter/intra re-decision
(2) intra mode re-decision (I16x16 or I4x4)
(3) inter mode re-prediction
(4) motion vector re-estimation

Study of the average encoding speed of the JM12.2 encoder

Sequence name: CrowdRun_720p
No. of frames: 500
Intra-frame period:
No. of slices per frame: 1
QP-I, QP-P: 30
QP-B: 32
Inter-block-sizes used: 16*16, 16*8, 8*16, 8*8
Intra-block-sizes used: 16*16, 8*8, 4*4
Max search range: 128
Number of reference frames: 1
Sub-pixel depth: Quarter-pixel
Entropy coding method: CAVLC
Special features applied: Weighted prediction, skip and direct coding modes, 8x8 integer transform, deblocking filter
Encoder compiled by: VC6.0

No. of B-frames = 0

Total time: 1770.327 sec
Reading frames time: 11.927 sec (0.67%)
Padding reference frames time: 0.000 sec (0.00%)
Integer ME time (EPZS): 156.196 sec (8.82%)
Sub-pel ME time (1/2 & 1/4 pel): 277.982 sec (15.70%)
Interpolation time: 88.747 sec (5.01%)
Getting MVs for direct mode time: 0.000 sec (0.00%)
Weighted prediction time: 1.918 sec (0.11%)
Intra prediction time: 657.611 sec (37.15%)
Computing distortion values for modes time: 28.422 sec (1.61%)
Computing rate values for modes time: 185.476 sec (10.48%)
Luma residue coding time: 95.462 sec (5.39%)
Chroma residue coding time: 139.090 sec (7.86%)
Setting parameters time: 7.544 sec (0.43%)
Entropy coding time: 10.229 sec (0.58%)
Deblocking time: 9.499 sec (0.54%)
Other time: 100.224 sec (5.66%)

PSNR: 33.43 dB
Bit-rate: 22625.70 kbps @ 50 Hz

No. of B-frames = 2

Total time: 2649.320 sec
Reading frames time: 16.621 sec (0.63%)
Padding reference frames time: 0.000 sec (0.00%)
Integer ME time (EPZS): 532.062 sec (20.08%)
Sub-pel ME time (1/2 & 1/4 pel): 463.725 sec (17.50%)
Interpolation time: 29.039 sec (1.10%)
Getting MVs for direct mode time: 1.854 sec (0.07%)
Weighted prediction time: 3.054 sec (0.12%)
Intra prediction time: 628.926 sec (23.74%)
Computing distortion values for modes time: 43.832 sec (1.65%)
Computing rate values for modes time: 237.783 sec (8.98%)
Luma residue coding time: 190.330 sec (7.18%)
Chroma residue coding time: 221.465 sec (8.36%)
Setting parameters time: 7.975 sec (0.30%)
Entropy coding time: 8.078 sec (0.30%)
Deblocking time: 9.350 sec (0.35%)
Other time: 255.226 sec (9.63%)

PSNR: 32.74 dB
Bit-rate: 20901.61 kbps @ 50 Hz

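JM 12.2 encoder timing

QP(I,P,B) = 27, 28, 29
720P, 250 frames
Search range = ±32
All modes turned on

Inter block sizes 16x16 only:

SAD reuse algorithm
crowd run: total time 2185.02 s; integer ME time 1613.98 s (73.87%); sub ME time 30.53 s (1.40%); interpolation time 43.75 s (2.00%)
ducks take off: total time 2203.40 s; integer ME time 1617.56 s (73.41%); sub ME time 30.65 s (1.39%); interpolation time 43.65 s (1.98%)
Average: integer ME 73.64%; sub ME 1.39%; interpolation 1.99%

EPZS (extended diamond pattern)
crowd run: total time 571.48 s; integer ME time 9.16 s (1.6%); sub ME time 25.87 s (4.53%); interpolation time 43.75 s (7.66%)
ducks take off: total time 589.05 s; integer ME time 8.95 s (1.52%); sub ME time 26.94 s (4.57%); interpolation time 43.66 s (7.41%)
Average: integer ME 1.56%; sub ME 4.55%; interpolation 7.53%

Inter block sizes 16x16, 16x8, 8x16, 8x8:

SAD reuse algorithm
crowd run: total time 2573.35 s; integer ME time 1719.91 s (66.75%); sub ME time 141.39 s (5.49%); interpolation time 43.55 s (1.69%)
ducks take off: total time 2615.36 s; integer ME time 1719.91 s (65.76%); sub ME time 148.10 s (5.66%); interpolation time 44.13 s (1.69%)
Average: integer ME 66.26%; sub ME 5.58%; interpolation 1.69%

EPZS (extended diamond pattern)
crowd run: total time 887.47 s; integer ME time 63.00 s (7.10%); sub ME time 114.01 s (12.85%); interpolation time 43.94 s (4.95%)
ducks take off: total time 942.18 s; integer ME time 68.72 s (7.29%); sub ME time 121.83 s (12.93%); interpolation time 44.02 s (4.67%)
Average: integer ME 7.20%; sub ME 12.89%; interpolation 4.81%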

Video transcoding technologies:

Key Technologies:

(1) Inter/intra mode decision (I, P, SKIP, etc., by block with fixed location, or simple majority)
(2) Inter-block modes re-decision (16x16, 16x8, ..., 4x4)
    (a) Natural reduction, using majority, etc.
    (b) Further mode selection with better quality, such as refinement
(3) Intra prediction modes (differential code, vertical, horizontal, ..., diagonal prediction)
(4) Interpolations for sub-pixel interpolation: integer decimation
(5) Motion vector re-estimation
    (a) MV reuse using the original MV (as far as possible)
    (b) MV reuse using mean, median, align to the best, align to the worst, weighted residual error signal, ...
    (c) MV refinement using temporal and spatial records
    (d) MV reuse and residual error signal reuse
    (e) Sub-pixel motion re-estimation
(6) Video downsizing and interpolation
    (a) Interpolations for downsizing 2:1, 3:2, 1:M/N, fixed-ratio interpolation,
    (b) and then variable-ratio interpolation
    (c) Quality interpolation, for example edge-preserving interpolation
(7) Transform-domain video transcoding
Etc.
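
One of the MV reuse options listed above, taking the median of the original motion vectors, can be sketched as follows. This is a hedged illustration, not the scheme evaluated in the talk: it assumes the motion vectors of the input macroblocks covered by one output macroblock are given as integer pairs in quarter-pel units, takes the component-wise median, and scales by 2/3 for the 3:2 downsizing. The function name and parameters are hypothetical.

```python
from statistics import median

def reuse_motion_vector(source_mvs, num=2, den=3):
    """Component-wise median of the source MVs, scaled by num/den.

    source_mvs: iterable of (x, y) integer motion vectors, assumed to be
    in quarter-pel units. The result is rounded back to whole quarter-pel
    units so it can seed a small refinement search.
    """
    source_mvs = list(source_mvs)
    mx = median(mv[0] for mv in source_mvs)
    my = median(mv[1] for mv in source_mvs)
    return (round(mx * num / den), round(my * num / den))

# Four input macroblocks contribute to one output macroblock; the
# outlier (30, 0) has little influence on the median.
candidate = reuse_motion_vector([(12, -6), (9, -3), (30, 0), (12, -6)])
```

The median is used here because it is less sensitive than the mean to one input macroblock moving differently from its neighbours; the reused vector would normally be followed by a refinement step, as the list above suggests.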

Samples of Experimental Results

Table 2 compares transcoding with the H.264 JM12.2 reference software against our fast approaches, for converting the Crowd Run sequence of size 1280x720 to 2/3 of this size. There is a substantial reduction in computation time for motion estimation, mode decision, etc., and an overall speed-up of 2.68 times is achieved.

Table 2: Comparison of results using JM12.2 and our fast approach

Integer-pel ME: JM 12.2 153.98s (20.18%); after fast algorithms 20.50s (7.21%)
Sub-pel ME: JM 12.2 227.23s (29.78%); after fast algorithms 30.25s (10.64%)
Other ME time: JM 12.2 39.28s (5.15%); after fast algorithms 5.23s (1.84%)
Intra prediction: JM 12.2 131.78s (17.27%); after fast algorithms 17.54s (6.17%)
Others: JM 12.2 210.80s (27.63%); after fast algorithms 211.01s (74.14%)
Total time: JM 12.2 763.07s; after fast algorithms 284.32s (2.68X)
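
The speed-up figure and the percentage column of Table 2 follow directly from the reported times. The short check below recomputes them from the JM 12.2 column and the two totals; the variable names are chosen for this illustration only.

```python
# Times in seconds, copied from the JM 12.2 column of Table 2.
jm_times = {
    "Integer-pel ME": 153.98,
    "Sub-pel ME": 227.23,
    "Other ME time": 39.28,
    "Intra prediction": 131.78,
    "Others": 210.80,
}
jm_total = 763.07      # total time with JM 12.2
fast_total = 284.32    # total time after the fast algorithms

# Overall speed-up = original total time / new total time.
speedup = jm_total / fast_total

# Share of each stage in the JM 12.2 total, in percent.
jm_shares = {name: 100.0 * t / jm_total for name, t in jm_times.items()}
```

The same division applied stage by stage shows where the gain comes from: the motion estimation and intra prediction stages shrink, while the "Others" time is essentially unchanged.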

Transcoder Demonstration

On video coding for HDTV using the H.264 standard: converting HD (1920x1080) format to SD (1280x720) format.

(A real-time demonstration has been done, but a reduced version is shown here, due to the speed constraint of the laptop computer.)

Transcoder Demonstration

Full Decode + Downsize + Full Encode

Transcoding (Mode Re-Decision + MV Refinement)

[Video clips: Full 1, Trans 1, Full 2, Trans 2.]

Related Publication (technologies used)

Transcoding:

Zhaoguang Liu and W.C. Siu, ‘A Downsizing Video Transcoder Based on H.264’, Progress Report, PhD/Transcoder research report, November 2006, EIE, The Hong Kong Polytechnic University. K.T. Fung and W.C. Siu, ‘Diversity and Importance Measures for Video Downscaling’, Proceedings, pp.1061-4, Vol.2, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’2005), March 2005, Philadelphia, USA. Kai-Tat Fung and Wan-Chi Siu, ‘DCT-based Video Downscaling Transcoder using Split and Merge Technique’ , pp.394-403, Vol.15, No.2, February 2006, IEEE Transactions on Image Processing, USA

Motion Estimation:

Ying Zhang, Wan-Chi Siu and Tingzhi Shen, ‘Yet a Faster Motion Estimation Algorithm with Directional Search Strategies’, pp.475-478, Proceedings, 15th International Conference on Digital Signal Processing (DSP’2007), July 2007, Cardiff, UK.

Ko-Cheung Hui, Wan-Chi Siu and Yui-Lam Chan, ‘New Adaptive Partial Distortion Search using Clustered Pixel Matching Error Characteristic’, pp.597-607, Vol.14, No.5, May 2005, IEEE Transactions on Image Processing.

M.Y. Chiu and W.C. Siu, ‘New Results on Exhaustive Search Algorithm for Motion Estimation using Adaptive Partial Distortion Search and Successive Elimination Algorithm’, Proceedings, pp.3978-81, IEEE International Symposium on Circuits and Systems (ISCAS’2006), May 2006, Island of Kos, Greece.

Liangming Ji and Wan-Chi Siu, ‘Reduced Computation using Adaptive Search Window Size for H.264 Multi-frame Motion Estimation’, Paper 1568982117, pp.1-5, Proceedings, 14th European Signal Processing Conference (EUSIPCO’2006), September 2006, Florence, Italy.


4. Extending to Video Amplifications and Super-resolution Videos

With the development of visual communication and image processing, there is a high demand for high-resolution images in applications such as video surveillance, remote sensing, medical imaging, HDTV and other entertainment applications. However, image resolution depends on the physical characteristics of the imaging devices. It is sometimes difficult to improve the image resolution by using better sensors, because of the high cost or the physical limits of the hardware. Super-resolution (SR) image reconstruction is a promising technique to increase the resolution of an image or sequence of images beyond the resolving power of the imaging system.


An SR video may also need to be re-encoded for various reasons, including (i) to allow standard devices to view the SR videos without additional conversion devices, (ii) to save SR video reconstruction time, since the computing power of the viewing devices may not be sufficient, and (iii) to avoid the unavailability of the SR video package at the viewing site.

It is also true that broadcasting companies are looking for good technologies to convert videos between formats, resolutions, and frame/bit rates. It is particularly difficult to do up-conversion of a compressed video, for example from SDTV to HDTV, because of the missing data and the blurring of edges caused by simple interpolation. Furthermore, re-encoding of these SR videos is required in many practical situations, since content providers often have to standardize various video clips for uniform storage or transmission.


[Flowchart. Initialization; (1) Read and Decode one frame; (2) Parameter Extraction: (i) MV, modes, etc., (ii) number of zero coefficients, (iii) residual errors; Video Interpolation; Possible Output; "Re-encoding?"; Re-encoding Process: Check MB/Slice type (mode type decision).
Intra MB: "Full Encoding?" selects between "Find the Best Prediction Mode" and "Predict the Mode from original parameters"; then "Encode MB".
Inter MB: "Full Mode Decision?" selects between "Find the Best Mode" and "Predict the Mode from original parameters"; "ME by Full Search?" selects between "Find the Best MV" and "Predict the MV from original parameters"; then "Encode MB".
"Frame end?" leads to "Deblock Picture" and "Write Bitstream"; "End of Bit stream?" leads to "Exit".]

Figure 5: Architecture of Transcoding Platform (Video Enlargement)
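The per-macroblock branch structure of Figure 5 can be written out as a small piece of control flow. The Python sketch below is only an illustration of that branching: the function names, flag names and step strings are hypothetical, no real codec is called, and the relative order of deblocking and bitstream writing is not recoverable from the slide.

```python
def reencode_macroblock(mb_type, full_encoding=False,
                        full_mode_decision=False, full_search=False):
    """Return the list of steps taken for one macroblock (labels follow Figure 5)."""
    steps = []
    if mb_type == "intra":
        # "Full Encoding?" decides between searching and re-using LR parameters.
        steps.append("find best prediction mode" if full_encoding
                     else "predict mode from original parameters")
    elif mb_type == "inter":
        # "Full Mode Decision?" and "ME by Full Search?" are independent switches.
        steps.append("find best mode" if full_mode_decision
                     else "predict mode from original parameters")
        steps.append("find best MV" if full_search
                     else "predict MV from original parameters")
    else:
        raise ValueError(f"unknown macroblock type: {mb_type!r}")
    steps.append("encode MB")
    return steps


def reencode_frame(mb_types, **flags):
    """Process every macroblock of a frame, then finish the frame."""
    log = [reencode_macroblock(t, **flags) for t in mb_types]
    # Frame-end steps from the flowchart (their order is an assumption).
    log.append(["deblock picture", "write bitstream"])
    return log


if __name__ == "__main__":
    for entry in reencode_frame(["intra", "inter"], full_search=True):
        print(entry)
```

With all flags left at `False` the sketch corresponds to the fastest path, in which every decision is predicted from the parameters decoded from the original bitstream.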


In previous years, most researchers, including us, concentrated on downward conversion. Recently it has become clear to us that there is a great need to develop techniques for upward conversion, including image/video up-sizing, frame interpolation, and super-video coding. This is a challenging and difficult topic. Some work has been done by a few researchers, but many technologies are still unavailable or premature. Hence this forms a fruitful direction for further research.


We have built an architecture which allows us to re-encode the SR video for either storage or transmission. We fully utilize the decoded data, statistics and parameters available from the previously encoded LR video to facilitate the super-resolution conversion. As shown in fig.5, a model has to be built for this investigation. H.264 is our codec kernel.

The model consists of three parts: (a) “encoded bit-stream” decoding, (b) video interpolation and (c) re-encoding.

We opt for a simple framework as shown in fig.5, although many fundamental technologies are still badly needed for its practical realization.


(i) The interpolation is initially done within the decoded LR video frame, without considering information from the temporal direction. The re-encoding part is done simply with the H.264 encoder, which requires a relatively long encoding time. Fig.6 shows the result of a preliminary test on converting the “Rush Hour” sequence from the SD (1280x720) format to the HD (1920x1080) format in the high profile of H.264. The upper curve shows the quality and bit-rate obtained with full decoding and encoding, using simple linear interpolation for magnification. The lowest curve shows the production of the compressed HR video by the simplest and quickest approach. In this approach we made use of the decoded motion vectors, decoded prediction modes, decoded mode sizes, etc. of the LR frame for the re-encoding, applied through some default arrangements. No motion estimation, mode decision, etc. were required.

It is about three times faster, or more, than the full re-encoding mode, but it suffers from low PSNR and high bit rate. The middle curve shows a hypothetical case. It gives the best possible result that can be achieved if we do not perform full motion estimation, mode decision, etc., but instead pick the best parameters (MV, modes, etc.) from the list of parameters decoded from the LR frame. This forms the target for fast algorithm development.
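The hypothetical "target" case, in which the best parameters are picked from those decoded from the LR frame instead of being searched for, can be illustrated for motion vectors. The sketch below is an illustration under stated assumptions, not the authors' implementation: it scales LR motion vectors by the resize ratio (1.5 for 1280x720 to 1920x1080), rounds them to integer-pel positions, and keeps the candidate with the lowest sum of absolute differences (SAD). The function names, the SAD criterion and the integer-pel restriction are all choices made here.

```python
import numpy as np


def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return int(np.abs(a.astype(np.int64) - b.astype(np.int64)).sum())


def scale_mv(mv, factor=1.5):
    """Scale an LR motion vector (dy, dx) to the enlarged frame, rounded to integer-pel."""
    return (round(mv[0] * factor), round(mv[1] * factor))


def best_candidate_mv(cur_block, ref_frame, top, left, candidates):
    """Evaluate only the candidate MVs and return (cost, (dy, dx)) of the best one."""
    h, w = cur_block.shape
    best = None
    for dy, dx in candidates:
        y, x = top + dy, left + dx
        # Skip candidates that point outside the reference frame.
        if y < 0 or x < 0 or y + h > ref_frame.shape[0] or x + w > ref_frame.shape[1]:
            continue
        cost = sad(cur_block, ref_frame[y:y + h, x:x + w])
        if best is None or cost < best[0]:
            best = (cost, (dy, dx))
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.integers(0, 256, size=(48, 48))
    cur = ref[19:35, 10:26]                      # block at (16, 16) displaced by (3, -6)
    lr_mvs = [(0, 0), (2, -4), (-2, 2)]          # hypothetical MVs decoded from the LR frame
    print(best_candidate_mv(cur, ref, 16, 16, [scale_mv(m) for m in lr_mvs]))
```

Only a handful of SAD evaluations are needed per block, compared with one per search position in a full search, which is the source of the speed advantage discussed above.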


Figure 6: Video Enlargement

[Plot: “Rush Hour” sequence; PSNR (dB), 37.5 to 45.5, versus Bitrate (kbps), 1000 to 9000; curves: Full, Speed, Target]


(ii) A key part of this work is to design fast and accurate algorithms for obtaining encoding modes, motion vectors, or even transform coefficients without going through the heavy computational processes. The process is surprisingly close to downsizing transcoding. We have to do

(1) inter/intra mode re-decision, (2) intra mode re-decision (16x16 or 4x4), (3) inter mode re-prediction, (4) motion vector re-estimation, etc. The data and parameters available in the originally encoded LR video are used to formulate the fast algorithms.


(ii) The following strategies are used.
(a) Higher weights should be given to parameters covering larger areas.
(b) All modes/MVs (from LR frames) whose LR blocks overlap with the SR block should have a high priority to be checked.
(c) The number of zero coefficients should reflect the motion activity of the block.
(d) Cases with different QP are treated differently.
(e) Refinements are made according to the models built.
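Strategies (a) and (b) can be sketched for motion vectors as an area-weighted prediction. The Python sketch below is a minimal illustration under assumptions made here, not the authors' algorithm: blocks are axis-aligned rectangles `(x, y, width, height)`, the SR block is mapped back onto the LR grid by dividing by the scale factor, the predicted MV is the overlap-area-weighted mean of the LR MVs, and the candidate list is ordered by overlap area. All function and parameter names are hypothetical.

```python
def overlap_area(a, b):
    """Overlap area of two rectangles given as (x, y, width, height)."""
    w = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    h = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)


def predict_mv(sr_block, lr_blocks, scale=1.5):
    """Predict an MV for an SR block from the LR blocks it overlaps.

    sr_block:  (x, y, w, h) in SR-frame coordinates.
    lr_blocks: list of ((x, y, w, h), (mvx, mvy)) in LR-frame coordinates.
    Returns (predicted_mv, candidates), with candidates sorted by overlap area.
    """
    # Footprint of the SR block on the LR grid.
    footprint = tuple(v / scale for v in sr_block)
    weighted = [(overlap_area(footprint, rect), mv) for rect, mv in lr_blocks]
    weighted = [(area, mv) for area, mv in weighted if area > 0]
    total = sum(area for area, _ in weighted)
    if total == 0:
        return None, []
    # Strategy (a): larger overlap areas get higher weights; MVs scale with the frame.
    mvx = scale * sum(area * mv[0] for area, mv in weighted) / total
    mvy = scale * sum(area * mv[1] for area, mv in weighted) / total
    # Strategy (b): every overlapping LR MV is a candidate, largest overlap first.
    ordered = sorted(weighted, key=lambda item: -item[0])
    candidates = [(scale * mv[0], scale * mv[1]) for _, mv in ordered]
    return (mvx, mvy), candidates


if __name__ == "__main__":
    lr = [((16, 0, 8, 8), (2, 0)), ((24, 0, 8, 8), (4, 0)),
          ((16, 8, 8, 8), (2, 2)), ((0, 0, 16, 16), (10, 10))]
    print(predict_mv((24, 0, 16, 16), lr))
```

The weighted mean gives a single starting point for refinement, while the ordered candidate list supports checking the decoded parameters directly.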


A: Interpolation Techniques: To remove the blurring effect, edge enhancement is one of the best ways to improve the quality of a super-resolution image/video sequence. We propose an improved edge-directed interpolation method that removes the accumulated interpolation error and reduces the correlation structure mismatch problem. Let us recall the transfer function of a Wiener filter,

$$Y(k) = \sum_{n} \alpha(n)\, x(k-n)$$

where Y(k) is the predicted value, the α(n)’s are the linear prediction coefficients and the x(n)’s are known samples. By minimizing the mean square error, MSE (= E[e^2(n)]), we arrive at an equation for finding the coefficients of the Wiener filter for the interpolation,

$$r_{dx} = R_{xx}\,\alpha \qquad (1)$$

where r_dx (= E[x(n)x(n-i)]) is a cross-correlation function and R_xx (= E[x(n-k)x(n-i)]) is an autocorrelation function.
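The step from the MSE criterion to eqn.1 is the standard orthogonality argument. A short derivation is sketched here, assuming (as the correlation definitions suggest) that the sample x(n) itself is the desired value, that the lag index is called k, and that N denotes the predictor order, which the slide does not specify.

```latex
\begin{aligned}
e(n) &= x(n) - \sum_{k=1}^{N} \alpha(k)\, x(n-k), \\
\frac{\partial\, E[e^{2}(n)]}{\partial \alpha(i)} &= -2\, E[\,e(n)\, x(n-i)\,] = 0, \\
E[\,x(n)\, x(n-i)\,] &= \sum_{k=1}^{N} \alpha(k)\, E[\,x(n-k)\, x(n-i)\,], \qquad i = 1, \dots, N .
\end{aligned}
```

Stacking the N equations, with the left-hand sides collected in r_dx and the expectations on the right collected in R_xx, gives r_dx = R_xx α, i.e. eqn.1.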


The New Edge-Directed Interpolation (NEDI) [14] scheme models a natural image as a second-order locally stationary Gaussian process, which allows interpolation by simple linear prediction. The covariance of the image pixels in a local block (training window) can be used to obtain the prediction coefficients of the estimation problem. Consider the interpolation of an image X to a high-resolution image Y.

[Figure: two copies, (a) and (b), of the low-resolution pixel grid with numbered pixel locations in rows 32 to 37, 24 to 29, 16 to 21, 8 to 13 and 1 to 5. Annotation: to minimize the distance between the estimated unknown pixel and its real position.]

Figure 7: New Edge-Directed Interpolation (NEDI)


In fig.7, the numbers represent the locations of the original low-resolution pixels. The solid point, denoted y_i, in fig.7(a) is a high-resolution point to be interpolated from its four neighbor low-resolution pixels {x18, x19, x26, x27}. To keep the formulation simple, a one-dimensional representation is used as far as possible in the explanation. The predicted pixel becomes

$$y_i = \sum_{x_i \,\in\, \text{selected surrounding pixels}} \alpha_i\, x_i \qquad (1')$$

From eqn.1, we have

$$\alpha = R_{xx}^{-1}\, r_{dx} \qquad (2)$$


The computation of r_dx (the cross-correlation between y_i and its interpolating points) and R_xx (the autocorrelation among the interpolating points) would require knowledge of the statistics of y_i and its neighbors, which is not available before the interpolation. This difficulty is overcome by the “geometric duality” property, as illustrated in fig.7(b). The correlations between y_i in the high-resolution domain and its neighbor points 18, 19, 26 and 27 are replaced by the correlations of four sets of sample (training) points, enclosed by dotted lines in fig.7(b).



For example, the statistics are available for interpolating point 18 from its neighbors, points 9, 11, 25 and 27, in the low-resolution (LR) domain. Hence we can write

$$y = \begin{bmatrix} x_{18} \\ x_{19} \\ x_{26} \\ x_{27} \end{bmatrix}
\quad \text{and} \quad
C = \begin{bmatrix}
x_{9} & x_{11} & x_{25} & x_{27} \\
x_{10} & x_{12} & x_{26} & x_{28} \\
x_{17} & x_{19} & x_{33} & x_{35} \\
x_{18} & x_{20} & x_{34} & x_{36}
\end{bmatrix}$$

where the elements of y are the training points and each row of C is the set of points used to interpolate the corresponding element of y. In this case we have r_dx = C^T y and R_xx = C^T C.
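The coefficient estimate of eqn.2, with r_dx = C^T y and R_xx = C^T C, is the least-squares solution of C α ≈ y. The NumPy sketch below illustrates this for the small four-point training set described above; it is a minimal sketch, not the NEDI reference implementation. The function names are hypothetical, the training window is limited to the four neighbor pixels (practical implementations typically use a larger window), and `numpy.linalg.lstsq` is used instead of an explicit matrix inverse so that flat or planar regions, where C^T C is singular, still give a usable answer.

```python
import numpy as np


def nedi_coefficients(y, C):
    """Least-squares solution of C @ alpha ~= y (equivalent to solving R_xx alpha = r_dx).

    rcond is set explicitly so that rank-deficient training sets fall back
    to the minimum-norm solution.
    """
    alpha, *_ = np.linalg.lstsq(C, y, rcond=1e-10)
    return alpha


def interpolate_centre(img, r, c):
    """Interpolate the HR point at the centre of LR pixels (r,c), (r,c+1), (r+1,c), (r+1,c+1)."""
    targets = [(r, c), (r, c + 1), (r + 1, c), (r + 1, c + 1)]
    # Training points: the four LR neighbors of the unknown HR point.
    y = np.array([img[i, j] for i, j in targets], dtype=float)
    # Each row holds the four diagonal LR neighbors of one training point
    # (geometric duality: same geometry, one scale coarser).
    C = np.array([[img[i - 1, j - 1], img[i - 1, j + 1],
                   img[i + 1, j - 1], img[i + 1, j + 1]] for i, j in targets],
                 dtype=float)
    alpha = nedi_coefficients(y, C)
    return float(alpha @ y), alpha


if __name__ == "__main__":
    rows, cols = np.indices((6, 6))
    ramp = 2.0 * rows + 3.0 * cols + 1.0      # a smooth intensity ramp
    print(interpolate_centre(ramp, 2, 2))
```

On a smooth ramp the estimated coefficients reduce to equal weights, so the result coincides with bilinear interpolation; the coefficients only depart from equal weights where the local covariance indicates an edge direction.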


To interpolate a point between two vertical LR pixels (2nd step), the same procedure is used with a rotation by an angle of π/4, as shown in figs.8(a) and (b). In fig.8, circles represent LR pixels, grey dots represent the points interpolated in the 1st step (fig.7), and small black dots represent HR points to be interpolated. To save computation, NEDI adopts a hybrid approach: this correlation-based interpolation is applied to edge pixels only, and bilinear interpolation is applied to non-edge pixels (i.e. pixels in smooth regions).
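The hybrid approach can be sketched as a simple switch. In the Python sketch below, the edge test (local variance of the four neighbors above a threshold), the threshold value and all names are assumptions made for illustration; the slide does not state how edge pixels are detected. The edge predictor is passed in as a callable so that any correlation-based interpolator can be plugged in.

```python
def local_variance(values):
    """Population variance of a small list of pixel values."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def interpolate_hybrid(neighbors, edge_predictor, var_threshold=8.0):
    """Interpolate the HR point at the centre of four LR neighbors.

    neighbors:      the four surrounding LR pixel values.
    edge_predictor: callable applied only when the neighborhood looks like an edge.
    var_threshold:  illustrative threshold on the local variance (an assumption).
    """
    if local_variance(neighbors) > var_threshold:
        return edge_predictor(neighbors)
    # Bilinear interpolation at the centre of a 2x2 cell is the mean of its corners.
    return sum(neighbors) / 4.0


if __name__ == "__main__":
    print(interpolate_hybrid([10, 10, 11, 11], edge_predictor=max))    # smooth region
    print(interpolate_hybrid([0, 0, 255, 255], edge_predictor=max))    # edge region
```

Here `max` is only a stand-in for the correlation-based predictor, used to show which branch was taken.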


However, NEDI suffers from a prediction error propagation problem which limits the performance of the algorithm. NEDI is a two-step interpolation scheme: the first step uses the original pixels for interpolation, whilst the second step uses the interpolation results obtained from the first step (i.e. the gray pixels in fig.8) to obtain the interpolated pixel (the small black dot). The interpolation error of the first step is therefore propagated to the second step, which causes the interpolation error propagation problem. At the same time, NEDI also suffers from a covariance structure mismatch problem: the span of the training pixels does not give the best coverage in the HR domain.


Figure 8: Modified 2nd step: (a) interpolation problem, (b) original training set, (c) and (d) proposed training sets.


Hence a different set of pixels could give a better interpolation of the edges. We resolve the problem by suggesting a new version. The first step is the same as before. In the second step, we propose to interpolate the unknown pixels by a sixth-order linear prediction, with a training window as shown in figs.8(c) and (d), using points in the original LR domain only. This completely eliminates the error propagation problem. To reduce the covariance mismatch problem, we may use multiple low-resolution training window candidates, i.e. a scheme that chooses one out of several low-resolution training windows to represent the covariance of the high-resolution block when performing the linear prediction, as shown in fig.8(d).


Figure 5: Suggested Enhanced NEDI.

References:

1. X. Li and M. Orchard, “New Edge-Directed Interpolation”, IEEE Trans. on Image Processing, vol. 10, no. 10, October 2001, pp. 1521-1527.

2. W.S. Tam, C.W. Kowk and W.C. Siu, “A Modified Edge Directed Interpolation for Images”, paper submitted to IEEE Transactions on Image.

3. ….

Statistics to be used


Figs.9 and 10 show the results of our approach on interpolation for the enlargement of an image and on simulated SR video reconstruction. The reader may note the bar and connection parts above the wheel in fig.10, which look smoother and sharper. The improvement is more visible at higher levels of amplification.

(a) Using original Step 2 (b) Using new Step 2

Figure 9: Preliminary results of the proposed new approach for edge enhancement


Figure 10: SR video by simulation, Top right: original video frame, bottom left: by linear interpolation by Intel lib, bottom right: SR video with accurate MVs.


Experimental Results – Original Images

Jet Plane Bicycle


Experimental Results – test image: Jet Plane

Method      PSNR
Bilinear    28.98dB
NEDI        32.47dB
MEDI        32.34dB

[Image panels for each method: Final Image, Error Image*]
* Intensity is scaled to the range 0 to 255


Experimental Results – test image: Bicycle

Method         PSNR
Bilinear       18.68dB
NEDI           20.89dB
MEDI (ours)    20.67dB

[Image panels for each method: Final Image, Error Image*]
* Intensity is scaled to the range 0 to 255
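For reference, the PSNR figures quoted in these results follow the usual definition for 8-bit images, PSNR = 10 log10(255^2 / MSE). The Python sketch below computes it for small images given as lists of rows, and also builds an error image stretched to the range 0 to 255. The stretching rule (absolute error divided by its maximum) is an assumption made here; the slides only state that the intensity is scaled.

```python
import math


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    ref = [p for row in reference for p in row]
    tst = [p for row in test for p in row]
    mse = sum((a - b) ** 2 for a, b in zip(ref, tst)) / len(ref)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


def scaled_error_image(reference, test):
    """Absolute error image stretched so that its largest error maps to 255."""
    err = [[abs(a - b) for a, b in zip(r1, r2)] for r1, r2 in zip(reference, test)]
    largest = max(max(row) for row in err)
    if largest == 0:
        return [[0] * len(row) for row in err]
    return [[round(255 * e / largest) for e in row] for row in err]


if __name__ == "__main__":
    original = [[100, 100], [100, 100]]
    enlarged = [[100, 105], [95, 100]]
    print(f"{psnr(original, enlarged):.2f} dB")
    print(scaled_error_image(original, enlarged))
```

Because the error image is rescaled per image, its brightness shows where the errors are, not how large they are in absolute terms.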


B. Super-Resolution Video: Since the interpolation from a frame to form an enlarged frame is restricted by the resolution and information available in the original image, it is very natural to use more frames (in both the temporal and spatial domains) to construct the enlarged frame. An enlarged frame obtained from more than one original frame is defined as a super-resolution frame (video) in this paper. This can be achieved by both non-iterative and iterative approaches. Owing to limited space, we do not discuss the details of our approach, but quote some experimental results as shown in fig.9. The interested reader may refer to the literature for further information.
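The idea of building an enlarged frame from more than one original frame can be illustrated with an idealised, non-iterative example. The NumPy sketch below is not the approach used in this work: it assumes four LR frames that are 2x-decimated copies of the same scene, each with a known one-pixel offset on the HR grid, with no blur and no noise, and it simply places every LR sample back at its HR position (shift-and-add). The function names are hypothetical.

```python
import numpy as np


def simulate_lr_frames(hr):
    """Four LR frames: 2x-decimated copies of hr, one per (row, col) offset."""
    return {(dy, dx): hr[dy::2, dx::2] for dy in (0, 1) for dx in (0, 1)}


def shift_and_add(lr_frames, factor=2):
    """Place each LR frame on the HR grid at its known offset and average overlaps."""
    h, w = next(iter(lr_frames.values())).shape
    acc = np.zeros((h * factor, w * factor))
    cnt = np.zeros_like(acc)
    for (dy, dx), frame in lr_frames.items():
        acc[dy::factor, dx::factor] += frame
        cnt[dy::factor, dx::factor] += 1
    # HR positions that received no sample stay at zero.
    return acc / np.maximum(cnt, 1)


if __name__ == "__main__":
    hr = np.arange(64, dtype=float).reshape(8, 8)
    frames = simulate_lr_frames(hr)
    print(np.array_equal(shift_and_add(frames), hr))
```

With exact offsets the HR frame is recovered completely, which no single-frame interpolation can achieve; with real video the offsets come from motion estimation, so the accuracy of the motion vectors becomes the limiting factor.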


Super-resolution Images/Videos

Super-resolution (SR) image/video reconstruction is a promising technique to increase the resolution of an image or sequence of images (video) beyond the resolving power of the imaging system.

Modified definition of Transcoding: A process of converting a previously compressed video bitstream into a bitstream of a different nature or of lower/higher bit-rate.


Figure 1: Video Interpolation


Reasons for Re-encoding:

An SR video may also need to be re-encoded for various reasons, including (i) to allow standard devices to view the SR videos without additional conversion devices, (ii) to save SR video reconstruction time, since the computing power of the viewing devices may not be sufficient, and (iii) to avoid the unavailability of the SR video package at the viewing site.


Reasons for Re-encoding (cont'd):

It is also true that broadcasting companies are looking for good technologies to convert videos between formats, resolutions, and frame/bit rates. It is particularly difficult to do up-conversion of a compressed video, for example from SDTV to HDTV, because of the missing data and the blurring of edges caused by simple interpolation. Furthermore, re-encoding of these SR videos is required in many practical situations, since content providers often have to standardize various video clips for uniform storage or transmission. The viewer side may not have the conversion module or the computing power for real-time SR reconstruction.


Key Technologies in the Study of Super-resolution (SR) Videos:

1. Image Interpolation, hence
2. Video Interpolation
   blurring effect
   aliasing effect
   edge enhancement techniques (new technique is available)
   iterative and non-iterative approaches (non-iterative is simple)
   noise reduction technique (new technique is available)
3. Spatial domain super-resolution videos using multi-images
4. Temporal domain super-resolution videos using temporal features (using further information from video frames)
5. SR video from encoded video frames
6. SR video re-encoding using lower-resolution compressed videos (general kernel suggested here, further technologies required)


Block diagram for re-encoding SR Videos

First, the transcoder decodes the incoming bitstream into the pixel domain. Second, the decoded video frame is magnified with super-resolution techniques and then re-encoded at the desired frame rate.

[Block diagram: Front-encoder, Transcoder (decoder, magnification & re-encoding), End-decoder]


Framework of the Architecture of Super-Resolution Transcoding

E.g. for H.264 High profile: from SD to HD

For a macroblock:
1. To determine its mode type: intra or inter
2. To determine its prediction mode for intra-mode
3. To determine its mode (VBS) for inter-mode
4. Check if skip mode is to be used
5. Motion re-estimation.

[Flowchart, as in fig.5, with the interpolation box labelled "Form SR Video". Additional labels: Read and Decode one frame, Check frame type, I-MB, P-MB, Ideal Case, Basis]


HDTV: Video Upsizing with SR techniques

Transcoding between SD and HD formats

(i) Compressed video decoding: also decoding the original motion vectors, modes, residual error values and the statistics available.
(ii) Super-resolution video formation:
    mosaicing using multi-frames from the video
    simple linear interpolation (for missing points)
    edge enhancement (edge detection and edge-directed interpolation, etc.)
    noise removal and/or deblocking, etc.
(iii) SR video re-encoding:
    (1) inter/intra re-decision
    (2) intra mode re-decision (I16x16 or I4x4)
    (3) inter mode re-prediction
    (4) motion vector re-estimation, etc.


Super-Resolution Video Kernel Demonstration:

Converting HDTV video with the H.264 standard from the SD (1280x720) format to the HD (1920x1080) format

Fully Decode + Upsize + Fully Encode
Transcoding (Fully Decode + Upsize + Fast Encoding using Mode Re-Decision + MV Refinement + etc.)

[Video panels: Full 1, Trans 1, Full 2, Trans 2]


Ideal Super-resolution Video:

Original Frame

Simulated four LR images: LR0, LR1, LR2, LR3 with size of 172*144


Original Frame

Left: Linear Interpolation by Intel Library Right: SR video with accurate MVs

Super Resolution Demo


Super Resolution Demo


Conclusion:

1. We started with the Hybrid Video Coding model, and gradually moved on to Object-oriented Video Coding.

2. The Advanced Video Coding standard (H.264) includes almost no new concepts; rather, it fine-tunes existing techniques in a systematic way to optimize the coding efficiency. This can reduce the bitrate to half of that of the MPEG-2 standard.

3. Can we squeeze further, so that the bitrate is improved by the same factor once more? Some researchers go back to object-oriented coding, whilst others continue with the optimization or move to other sophisticated applications, such as multi-view video coding or advanced scalable coding.

4. Motion Estimation is an important topic. We have done much work on it, but did not say much about it in this presentation. Can we have a fast ME algorithm which gives better quality than the Exhaustive Full Search Algorithm?


6. We then talked about transcoding, which is a process of converting an encoded video from one format to another. Our work involves both (i) heterogeneous transcoding, such as from H.263 to H.264, and (ii) homogeneous transcoding, such as from H.264 to H.264.

H.264 to H.264 transcoding, in the high profile:
from SD to HD (Not difficult, but good quality is difficult.)
from HD to SD (why?)
Would pixel interpolation be important?
Mode type re-decision: intra mode or inter mode (including skip mode, etc.)
Intra mode: 4x4 or 16x16? What is the prediction mode?
Inter mode: mode decision (horizontal, vertical, …); motion vector re-prediction

7. We then talked about the significance of super-resolution video and some of its related techniques: simple interpolation, the new edge-directed interpolation and our modified edge-directed interpolation, and also a complete kernel structure.

8. A brief highlight of our ongoing work has also been given. This includes (i) looking for practical ways to form SR videos and (ii) the re-encoding of SR videos.


The End:

thank you!


Interpolation

A