
H.264/AVC Standard

History. Objectives: 50% bit-rate savings compared to MPEG-2; high-quality video at both low and high bit rates (64 kbps to 240 Mbps); network-friendly, with better error resilience.


  1. Intra-prediction
     - Motivation: intra-frames are natural images, so they exhibit strong spatial correlation.
     - Macroblocks in intra-coded frames are predicted from previously coded ones, located above and/or to the left of the current block.
     - The macroblock may be divided into sixteen 4x4 sub-blocks, which are predicted in cascading fashion.
     - There are 9 prediction modes for 4x4 blocks and 4 modes for 16x16 blocks.

  2. Intra-prediction (cont'd)
     - Directional spatial prediction (9 types for luma, 4 for chroma).
     - [Figure: encoder block diagram with the intra-frame prediction stage highlighted; a 4x4 block of samples a to p with neighbouring samples A to H above, I to P to the left and Q at the top-left corner; arrows for prediction directions 0 to 8.]
     - Example, Mode 3 (diagonal down/right prediction): samples a, f, k and p are predicted by (A + 2Q + I + 2) >> 2. (The final standard numbers this direction as mode 4, Diagonal_Down_Right.)

  3. Luma 4x4 Intra Modes [figure]

  4. Luma 4x4 Intra Modes
     - Example: d = round(B/4 + C/2 + D/4)

  5. Luma 16x16 Intra Modes [figure]

  6. Intra4x4 Prediction Example: Vertical [figure]

  7. Intra4x4 Prediction Example: Horizontal [figure]

  8. Intra4x4 Prediction Example: DC [figure]
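
The vertical, horizontal and DC examples on the preceding slides can be sketched in a few lines. This is a minimal illustration, not the reference implementation: the function names, the list-of-rows block layout and the sample values are assumptions made for the example, and the DC rule shown is the one for the case where both the above and left neighbours are available.

```python
def intra4x4_predict(mode, above, left):
    """Return a 4x4 prediction (list of rows) from 4 samples above and 4 to the left."""
    if mode == "vertical":      # each column copies the sample above it
        return [list(above) for _ in range(4)]
    if mode == "horizontal":    # each row copies the sample to its left
        return [[left[r]] * 4 for r in range(4)]
    if mode == "dc":            # every sample is the rounded mean of the 8 neighbours
        dc = (sum(above) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError(f"unsupported mode: {mode}")


def sad4x4(block, pred):
    """Sum of absolute differences between a 4x4 block and its prediction."""
    return sum(abs(block[r][c] - pred[r][c]) for r in range(4) for c in range(4))


# Hypothetical data: a block of vertical stripes that continues the row above it.
above = [100, 110, 120, 130]
left = [100, 100, 100, 100]
block = [[100, 110, 120, 130] for _ in range(4)]

costs = {m: sad4x4(block, intra4x4_predict(m, above, left))
         for m in ("vertical", "horizontal", "dc")}
best_mode = min(costs, key=costs.get)
```

For striped content like this, the vertical mode reproduces the block exactly, which is why it wins the comparison.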

  9. Intra4x4 Prediction Example: Diagonal Down-Right [figure]

  10. Optimal Intra4x4 Mode Selection
     - Select the mode with the best R-D tradeoff.
     - Full search method: divide each MB into sixteen 4x4 blocks. For each 4x4 block, and for each of the nine Intra_4x4 prediction modes:
       - Predict the current 4x4 block using the current mode.

  11. Intra_16x16 Prediction
     - Intra_16x16 prediction predicts the entire 16x16 block and is suitable for smooth areas.
     - Four modes: 0: Vertical; 1: Horizontal; 2: DC; 3: Plane.

  12. Optimal Intra16x16 Mode Selection
     - Full search method. For each Intra_16x16 prediction mode:
       - Get the prediction of the current MB.
       - Find the prediction residual.
       - Perform a 2D 4-point Hadamard transform on each 4x4 block.
       - Extract the DC coefficients from the sixteen 4x4 blocks and apply the 2D 4-point Hadamard transform again to the resulting 4x4 DC block.
       - Cost estimation: sum the absolute values of all the Hadamard transform coefficients.
     - Choose the mode with the smallest cost as the best Intra_16x16 prediction mode for this MB.
     - Decision between Intra_4x4 and Intra_16x16: compare the costs of the two to find the best mode.
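
The per-block part of this cost, a 2D 4-point Hadamard transform followed by a sum of absolute coefficient values, can be sketched as below. The helper names are hypothetical, and the sketch leaves out the second Hadamard pass over the sixteen DC coefficients that the slide also describes.

```python
H4 = [[1, 1, 1, 1],
      [1, 1, -1, -1],
      [1, -1, -1, 1],
      [1, -1, 1, -1]]


def matmul(a, b):
    """Plain 4x4 integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def hadamard4x4(block):
    """2D 4-point Hadamard transform: H4 * block * H4^T."""
    return matmul(matmul(H4, block), transpose(H4))


def satd4x4(residual):
    """Sum of the absolute values of the Hadamard coefficients of a 4x4 residual."""
    return sum(abs(v) for row in hadamard4x4(residual) for v in row)
```

A flat residual concentrates all its energy in the DC coefficient, while an isolated error spreads over every coefficient, so the two score very differently even when their plain SAD values are close.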

  13. Motion Estimation (ME)
     - For each block, find the best match in the previous frame (the reference frame).
     - Upper-left corner of the block being encoded: (x0, y0).
     - Upper-left corner of the matched block in the reference frame: (x1, y1).
     - The motion vector (dx, dy) is the offset between the two blocks: (dx, dy) = (x1 - x0, y1 - y0), so (x0, y0) + (dx, dy) = (x1, y1).
     - The motion vector needs to be sent to the decoder.

  14. Motion Compensation (MC)
     - Given the reference frame and the motion vector, a prediction of the current frame can be obtained.
     - Prediction error: the difference between the current frame and the prediction.
     - The prediction error is coded by DCT, quantization, and entropy coding.

  15. GOP, I, P, and B Frames
     - GOP: group of pictures (frames).
     - I frames (key frames): intra-coded frames, coded as still images.
       - Can be decoded directly.
       - Used at the head of a GOP or at scene changes.
       - I frames also improve error resilience.
     - P frames (inter-coded frames): prediction-based coding, based on previous frames.

  16. GOP, I, P, and B Frames
     - B frames: bi-directionally interpolated prediction frames.
       - Predicted from both the previous frame and the next frame: more flexibility, hence better prediction.
       - Encoding order: 1 4 2 3 7 5 6
       - Decoding order: 1 4 2 3 7 5 6
       - Display order: 1 2 3 4 5 6 7
       - B frames need more buffers, and buffer manipulation to display frames in the correct order.
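
The reordering above can be reproduced with a small sketch. It assumes a display-order pattern of I B B P B B P, which is consistent with the orders listed on the slide but is not stated there; the function name is hypothetical.

```python
def coding_order(frame_types):
    """Map display-order frame types ('I', 'P', 'B') to 1-based frame numbers in coding order."""
    order, pending_b = [], []
    for idx, ftype in enumerate(frame_types, start=1):
        if ftype == "B":
            pending_b.append(idx)      # must wait for the next reference frame
        else:
            order.append(idx)          # the I or P frame is coded first
            order.extend(pending_b)    # then the B frames that depend on it
            pending_b = []
    order.extend(pending_b)            # trailing B frames with no later reference here
    return order


print(coding_order(list("IBBPBBP")))
```

Each B frame is held back until the later frame it references has been coded, which is why the decoder needs extra buffers to restore display order.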

  17. Block Matching Algorithms for ME
     - Each frame is split into 16x16-pel blocks (MBs), and motion estimation is done for each macroblock.
     - Search window (maximum movement) w: typically 8, 16 or 32.
     - A cost is defined for finding the best match for each block in the previous frame:
       - Mean Absolute Error (MAE) or Sum of Absolute Differences (SAD)
       - Mean Square Error (MSE)
       - Sum of Squared Errors (SSE)
     - The motion vector (MV) is calculated between the current block and its counterpart in the previous frame.
     - The macroblock differences are calculated and sent.

  18. Cost Function
     - The best match is found by minimizing the SAD (sum of absolute differences) function, computed as
       $$\mathrm{SAD}(s, c(m)) = \sum_{x=1,\,y=1}^{16,\,16} \left| s[x, y] - c[x - m_x,\, y - m_y] \right|$$
       where s is the original video signal and c is the coded video signal.

  19. Motion Estimation in H.264
     - What is new?
       - Variable block size motion estimation: can yield 15% bit-rate savings.
       - Multiple reference frame motion estimation: 5-20% bit-rate savings.
       - Sub-pixel motion estimation: 20% bit-rate savings over integer ME.

  20. Search Window
     - The search window lies in the previous frame.
     - It is a rectangle at the same coordinates as the current block in the current frame, extended by w pixels in each direction.
     - [Figure: a p x q block inside a (p+2w) x (q+2w) search window.]

  21. Cost Function
     - The best match is found by minimizing the SAD (sum of absolute differences) function, computed as
       $$\mathrm{SAD}(s, c(m)) = \sum_{x=1,\,y=1}^{16,\,16} \left| s[x, y] - c[x - m_x,\, y - m_y] \right|$$
       where s is the original video signal and c is the coded video signal.

  22. Full Search Method
     - All candidates within the search window are examined: (2w+1)^2 positions.
     - Advantage: good accuracy; it finds the best match.
     - Disadvantage: a large amount of computation, (2w+1)^2 matches with a 16x16 MAE for each match, which is impractical for real-time applications.
     - To avoid this complexity, the number of search points must be reduced, which calls for fast block matching algorithms.
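
A full search with an SAD cost might look like the sketch below. It assumes frames stored as lists of rows and uses the motion-vector convention from the Motion Estimation slide, (dx, dy) = (x1 - x0, y1 - y0); the function names and the skipping of candidates outside the frame are illustrative choices, not taken from the slides.

```python
def sad(cur, ref, x0, y0, dx, dy, size):
    """SAD between the block at (x0, y0) in cur and the block displaced by (dx, dy) in ref."""
    return sum(abs(cur[y0 + j][x0 + i] - ref[y0 + dy + j][x0 + dx + i])
               for j in range(size) for i in range(size))


def full_search(cur, ref, x0, y0, size, w):
    """Examine every displacement in [-w, w]^2 and return (best_cost, dx, dy)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-w, w + 1):
        for dx in range(-w, w + 1):
            x1, y1 = x0 + dx, y0 + dy
            if x1 < 0 or y1 < 0 or x1 + size > width or y1 + size > height:
                continue  # candidate block falls outside the reference frame
            cost = sad(cur, ref, x0, y0, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

The two nested loops are the (2w+1)^2 candidate positions the slide refers to, each costing one full block comparison.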

  23. Initial Search Point Prediction
     - A median predictor defines the initial search point.
     - It is the median of the motion vectors of three spatially adjacent blocks: left (A), top (B) and top-right (C), or top-left (D), of the current block (E).
     - mv_pred = (pred_x, pred_y) = median(mv_A, mv_B, mv_C)
     - If C does not exist, then C = D.
     - If B and C do not exist, prediction = MV_A.
     - If A and C do not exist, prediction = MV_B.
     - If A and B do not exist, prediction = MV_C.
     - Otherwise, prediction = median(MV_A, MV_B, MV_C).
     - [Figure: neighbours D, B and C above and A to the left of the current block E.]
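
The rules above translate into a short sketch. The function names are hypothetical, and one case is an assumption of this sketch rather than something the slide states: when exactly two neighbours are available, the missing one is treated as a zero vector before taking the median.

```python
def median3(a, b, c):
    return sorted((a, b, c))[1]


def predict_mv(mv_a=None, mv_b=None, mv_c=None, mv_d=None):
    """Each argument is an (x, y) tuple, or None when that neighbour does not exist."""
    if mv_c is None:
        mv_c = mv_d                      # "If C does not exist, then C = D"
    available = [mv for mv in (mv_a, mv_b, mv_c) if mv is not None]
    if len(available) == 1:
        return available[0]              # only one neighbour: use it directly
    # Assumption: a neighbour that is still missing counts as a zero vector.
    a, b, c = (mv if mv is not None else (0, 0) for mv in (mv_a, mv_b, mv_c))
    return (median3(a[0], b[0], c[0]), median3(a[1], b[1], c[1]))
```

The median is taken separately on the x and y components, so the predicted vector need not equal any single neighbour's vector.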

  24. 2-D Logarithmic Search (TDL)
     - Examine the central point and its four surrounding points, at a distance of w/2 from the center.
     - Find the best match.
     - If the best match is not at the center, examine three new points centered on the previous best.
     - Halve the distance and continue until the distance is 1; then use all 9 matches, find the best, and stop.
     - The maximum number of search points is 2 + 7 log2(w).
     - [Figure: search points labelled by step, 1 to 3.]

  25. Three Step Search (TSS)
     - Algorithm:
       1. Check nine search points.
       2. Reduce the step size by half after each step.
       3. At the end of the search the step size is one pel.
     - The algorithm repeats 3 times and examines 25 points.
     - Number of search points: 1 + 8 log2(w).
     - Advantage: simple and regular structure, good for hardware implementation.
     - Disadvantage: the uniformly allocated checking points make it inefficient for small motion.
     - [Figure: search points labelled by step, 1 to 3.]
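
A minimal sketch of the three steps, assuming a search range of w = 7 (so the step sizes are 4, 2 and 1) and a caller-supplied `cost(dx, dy)` function such as an SAD. Both the function name and that interface are hypothetical.

```python
def three_step_search(cost, w=7):
    """Return (best_dx, best_dy, points_examined) for a cost(dx, dy) callable."""
    cx, cy = 0, 0
    best = cost(0, 0)
    examined = 1
    step = (w + 1) // 2
    while step >= 1:
        bx, by = cx, cy
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                if dx == 0 and dy == 0:
                    continue             # the center was already evaluated
                c = cost(cx + dx, cy + dy)
                examined += 1
                if c < best:
                    best, bx, by = c, cx + dx, cy + dy
        cx, cy = bx, by                  # recentre on the best point of this step
        step //= 2                       # halve the step size
    return cx, cy, examined


# Hypothetical smooth cost surface with its minimum at displacement (5, -3).
print(three_step_search(lambda dx, dy: abs(dx - 5) + abs(dy + 3)))
```

With w = 7 the loop runs three times and evaluates 1 + 8 + 8 + 8 = 25 points, matching the count on the slide.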

  26. Diamond Search (DS)
     - Experimental results show that 53% to 98% of motion vectors are enclosed in a circular area with a radius of 2 pels, centered on the position of zero motion.
     - The block displacement of real-world video sequences is mainly in the horizontal and vertical directions.
     - The search points are therefore placed within the circle with a radius of 2 pels.
     - DS outperforms the TSS algorithm.

  27. DS Algorithm
     1. The 9 checking points of the large diamond search pattern (LDSP) are tested. If the minimum point is at the center position, go to Step 2; otherwise repeat this step, centered on the best point.
     2. Switch the search pattern from LDSP to the small diamond search pattern (SDSP). The minimum point found in this step is the best match.
     - [Figure: LDSP and SDSP point layouts.]

  28. DS Algorithm
     - (b) LDSP -> LDSP when the minimum is at one of the corner points.
     - (c) LDSP -> LDSP when the minimum is along the edge of the diamond.
     - (d) LDSP -> SDSP when the minimum is at the center of the search pattern.
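
The two-stage diamond search can be sketched as follows. The point offsets are the usual 9-point large diamond and 5-point small diamond; the `cost(dx, dy)` interface, the function name and the iteration cap are assumptions of this sketch, and clamping to a search window is omitted for brevity.

```python
LDSP = [(0, 0), (0, -2), (0, 2), (-2, 0), (2, 0),
        (-1, -1), (1, -1), (-1, 1), (1, 1)]
SDSP = [(0, 0), (0, -1), (0, 1), (-1, 0), (1, 0)]


def diamond_search(cost, max_iters=64):
    """Return the (dx, dy) found by an LDSP walk followed by one SDSP refinement."""
    cx, cy = 0, 0
    for _ in range(max_iters):
        # The center is listed first, so ties keep the current center and the walk stops.
        bx, by = min(((cx + dx, cy + dy) for dx, dy in LDSP),
                     key=lambda p: cost(*p))
        if (bx, by) == (cx, cy):
            break                        # minimum at the center: switch to SDSP
        cx, cy = bx, by                  # otherwise repeat LDSP around the best point
    return min(((cx + dx, cy + dy) for dx, dy in SDSP), key=lambda p: cost(*p))


# Hypothetical smooth cost surface with its minimum at displacement (5, -3).
print(diamond_search(lambda dx, dy: abs(dx - 5) + abs(dy + 3)))
```

Unlike the three-step search, the number of LDSP iterations depends on how far the true motion is from zero, so small motions finish quickly.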

  29. H.264 ME Algorithm (UMHexagonS)
     1. Initial search point prediction
     2. Unsymmetrical-cross search
     3. Uneven multi-hexagonal-grid search
     4. Extended hexagonal-based search
     - Note that ME is not a mandatory part of the standard; only the ME implemented in the reference software is described here.

  30. Initial Search Point Prediction
     - A median predictor defines the initial search point.
     - It is the median of the motion vectors of three spatially adjacent blocks: left (A), top (B) and top-right (C), or top-left (D), of the current block (E).
     - mv_pred = (pred_x, pred_y) = median(mv_A, mv_B, mv_C)
     - [Figure: neighbours D, B and C above and A to the left of the current block E.]

  31. Unsymmetrical-Cross Search
     - Based on experimental results, movement in the horizontal direction is much heavier than in the vertical direction.
     - The distance between search points is chosen to be 2.
     - The minimum-cost MV is chosen as the search center of the next search step.

  32. Uneven Multi-Hexagonal-Grid Search [figure]

  33. Extended Hexagonal-Based Search
     - When the previous optimum MV lies in the outer concentric area, the search result has relatively low accuracy.
     - The motion vector is therefore refined by the extended hexagonal-based search method.

  34. Motion Estimation in H.264
     - One of the main enhancements in H.264 is its motion estimation.
     - What is new?
       - Variable block size motion estimation: can yield 15% bit-rate savings.
       - Multiple reference frame motion estimation: 5-20% bit-rate savings.
       - Sub-pixel motion estimation: 20% bit-rate savings over integer ME.

  35. Variable Block Size ME
     - A 16x16 macroblock may contain more than one object; in other words, the size of moving and stationary objects is variable.
     - The objects may move in different directions, so one motion vector is not enough to describe the movement of all objects.
     - With a single MV, part of the macroblock is described well and the rest gives a large error.
     - The solution is a variable block size: a macroblock with more detail is coded using smaller blocks.
     - Block size partitioning: 7 different block sizes in H.264.

  36. Variable Block Size ME (cont'd)
     - [Figure: encoder block diagram with the motion estimation stage highlighted.]
     - MB types: 16x16 (1 partition), 16x8 (2 partitions), 8x16 (2 partitions), 8x8 (4 partitions).
     - 8x8 types: 8x8 (1 partition), 8x4 (2 partitions), 4x8 (2 partitions), 4x4 (4 partitions).

  37. Partitions of MB [figure]

  38. Variable Block Size ME (cont'd)
     - An inter MB can be partitioned into smaller regions for ME, with up to 16 MVs.
     - MVs are differentially encoded.
     - Deciding the best mode takes considerable optimization effort: SAD + λ(Q) · R.
     - Mode decision uses R-D optimization with the Lagrangian method, and is also an active research area.

  39. Variable Block Size ME: Example [figure: frames at T=1 and T=2]

  40. Variable Block Size ME: Example [figure: frames at T=1 and T=2]

  41. Variable Block Size ME: Example [figure: frames at T=1 and T=2]

  42. Multiple Reference Frames ME
     - Previous standards used up to 2 reference frames for ME.
     - Here, up to five different reference frames can be selected.
     - This results in better subjective video quality and more efficient coding of the video sequence.
     - It may also help make the H.264 bit stream error resilient.

  43. Multiple Reference Frames ME
     - In H.263, the reference frame for prediction is always the previous frame.
     - In MPEG and H.26L, some frames are predicted from both the previous and the next frames (bi-prediction).
     - In H.264, up to 16 frames may be used as references:
       - The encoder and decoder maintain synchronized buffers of available (previously decoded) frames.
       - This results in better subjective video quality and more efficient coding of the video sequence.
       - It may also help make the H.264 bit stream error resilient.

  44. Multiple Reference Frames ME [Figure: encoder block diagram with multiple reference frames feeding motion compensation.]

  45. Subpixel Motion Estimation
     - When an object moves by a sub-pixel amount, integer-pixel ME cannot describe it, so sub-pixel ME is defined.
     - H.263 uses only half-pixel accuracy and MPEG-4 uses quarter-pixel accuracy, a gain of 1.5-2 dB across the board over half-pixel.
     - H.264 uses higher spatial precision for ME, up to eighth-pixel accuracy (quarter-pixel for luma; the eighth-pixel figure applies to chroma).

  46. Example
     - b = round[(E - 5F + 20G + 20H - 5I + J)/32]

  47. Example (cont'd)
     - a = round[(G + b)/2]
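
In integer arithmetic the two rounding formulas above are usually written with an added offset and a right shift. The sketch below assumes 8-bit samples and clips the half-pel result to 0..255; the function names are hypothetical.

```python
def clip255(v):
    """Clamp to the 8-bit sample range (an assumption of this sketch)."""
    return max(0, min(255, v))


def half_pel(e, f, g, h, i, j):
    """b = round((E - 5F + 20G + 20H - 5I + J) / 32) using the 6-tap filter."""
    return clip255((e - 5 * f + 20 * g + 20 * h - 5 * i + j + 16) >> 5)


def quarter_pel(g, b):
    """a = round((G + b) / 2): average of an integer sample and a half-pel sample."""
    return (g + b + 1) >> 1


b = half_pel(0, 0, 0, 255, 255, 255)   # half-pel sample on a sharp edge
a = quarter_pel(0, b)                  # quarter-pel sample between G and b
```

The filter taps sum to 32, so a flat region passes through unchanged, while the negative taps can overshoot on sharp edges, which is why the clip is needed.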

  48. Chroma Motion Vector [figure]

  49. H.264 Cost Function
     - The best match is found by minimizing the cost function
       $$J(m, \lambda_{motion}) = \mathrm{SAD}(s, c(m)) + \lambda_{motion} \cdot R(m - p)$$
     - m = (m_x, m_y)^T is the motion vector.
     - p = (p_x, p_y)^T is the predicted motion vector.
     - λ_motion is the Lagrange multiplier.
     - R(m - p) represents the bits used to encode the motion information.
     - The SAD (sum of absolute differences) is computed as
       $$\mathrm{SAD}(s, c(m)) = \sum_{x=1,\,y=1}^{B,\,B} \left| s[x, y] - c[x - m_x,\, y - m_y] \right|$$
       where B = 16, 8 or 4, s is the original video signal and c is the coded video signal.
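
The effect of the rate term can be illustrated with a small sketch. The slide does not say how R(m - p) is measured; this example assumes the length of a signed Exp-Golomb code for each component of the difference, and the SAD values, vectors and λ are made up for illustration.

```python
def se_golomb_bits(v):
    """Length in bits of the signed Exp-Golomb code for integer v (assumed rate model)."""
    code_num = 2 * v - 1 if v > 0 else -2 * v
    return 2 * (code_num + 1).bit_length() - 1


def motion_cost(sad, mv, pred, lam):
    """J = SAD + lambda * R(m - p), with R taken as the Exp-Golomb bits of both components."""
    rate = se_golomb_bits(mv[0] - pred[0]) + se_golomb_bits(mv[1] - pred[1])
    return sad + lam * rate


pred = (1, 0)
far = motion_cost(100, (9, -7), pred, lam=4)    # lower SAD, expensive vector
near = motion_cost(120, (1, 0), pred, lam=4)    # higher SAD, vector equals the prediction
```

Here the candidate with the higher SAD wins because its vector costs only 2 bits against 16, which is the trade-off the Lagrangian cost is designed to capture.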

  50. MB Modes
     - A MB can select one of these modes:
       - Intra_16x16
       - Intra_8x8 (not allowed in Baseline)
       - Intra_4x4
       - I_PCM: enables an encoder to transmit the values of the image samples directly (without prediction or transformation)
       - Inter_16x16
       - Inter_16x8
       - Inter_8x16
       - Inter_8x8
       - SKIP

  51. P_SKIP Type
     - For this type, neither a quantized prediction error signal nor a motion vector or reference index parameter is transmitted.
     - The reference picture is located at index 0 in the multi-picture buffer.
     - The motion vector is taken from the motion vector predictor.
     - It is used for large areas with no change or constant motion.
     - Its size is 16x16.

  52. Mode Decision Method in H.264/AVC
     - Calculate the RDCost for each intra mode.
     - Calculate the RDCost for SKIP mode.
     - For each inter mode (16x16, 16x8, 8x16 and 8x8):
       - For each block in the current mode, do ME in a search area and select the point that minimizes
         $$J(m, \lambda_{motion}) = \mathrm{SAD}(s, c(m)) + \lambda_{motion} \cdot R(m - p)$$
       - Calculate the RDCost using RDCost = Distortion + λ × Rate.
       - Note that computing Rate requires transform, quantization and entropy coding, and computing Distortion requires transform, quantization, inverse quantization and inverse transform.
     - From the calculated RDCosts (RDCost_Intra_16x16, RDCost_Intra_4x4, RDCost_I_PCM, RDCost_SKIP, RDCost_Inter_16x16, RDCost_Inter_16x8, RDCost_Inter_8x16 and RDCost_Inter_8x8), select the smallest as the best mode.
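
The final selection step reduces to picking the minimum of Distortion + λ × Rate over the candidate modes. The sketch below shows only that step; the distortion and rate figures are invented for illustration, and in a real encoder each pair would come from actually transforming, quantizing and entropy coding the macroblock.

```python
def rd_cost(distortion, rate_bits, lam):
    """RDCost = Distortion + lambda * Rate."""
    return distortion + lam * rate_bits


def best_mode(candidates, lam):
    """candidates maps a mode name to (distortion, rate_bits); returns (mode, all costs)."""
    costs = {mode: rd_cost(d, r, lam) for mode, (d, r) in candidates.items()}
    return min(costs, key=costs.get), costs


# Hypothetical measurements for one macroblock.
candidates = {
    "SKIP": (900, 1),
    "Inter_16x16": (400, 40),
    "Inter_8x8": (250, 120),
    "Intra_4x4": (300, 200),
}
mode_high_lambda, _ = best_mode(candidates, lam=5)
mode_low_lambda, _ = best_mode(candidates, lam=1)
```

A larger λ penalizes bits more heavily and pushes the choice toward coarser partitions, while a smaller λ favours the finer partition with lower distortion.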

  53. Slice
     - Each frame can be coded in one or more slices, each containing anywhere from one macroblock (16 x 16) to all the macroblocks in the frame (1 slice per picture).
     - The number of macroblocks per slice need not be constant within a picture.
     - Because inter-dependency between coded slices is minimal, error propagation can be limited.

  54. Slice Coding
     - Slices can have different shapes and sizes.
     - Slices do not have to be consecutive in the raster scan.
     - Each slice is self-contained: it can be decoded without data from other slices.
     - Useful for error resilience and concealment, and for parallel processing.

  55. Slice Type
     - Each slice can be coded as one of 5 types:
       - I slice: all MBs are coded using intra mode.
       - P slice: a MB can be coded in intra mode, or in inter mode with at most one prediction signal per block.
       - B slice: in addition to the modes of a P slice, some MBs can be predicted using two prediction signals per block.
       - SP slice (switching-P slice): facilitates switching between different video streams.
       - SI slice (switching-I slice): uses only intra prediction.

  56. Slice Modes in H.264 [figure]

  57. Slice Syntax [figure]

  58. Slice Syntax
     - A macroblock contains coded data corresponding to a 16 x 16 sample region of a video frame: 16 x 16 samples for luma and 8 x 8 for each of Cr and Cb.

  59. Slices
     - The H.264 encoder groups MBs into a slice whose size is less than or equal to the maximum transmission unit (MTU).
     - Slices are decoded independently.
     - Prediction across slice boundaries is forbidden, to prevent error propagation through intra-frame prediction.

  60. Arbitrary Slice Order (ASO)
     - The Baseline Profile allows the decoding order of slices to be arbitrary.
     - This permits, for example, reduced decoding delay when NAL units are delivered out of order.
     - Application example: reducing end-to-end transmission delay in real-time applications.

  61. Flexible Macroblock Ordering (FMO)
     - With FMO, slices are no longer required to consist of neighboring macroblocks.
     - FMO provides efficient methods for error concealment in error-prone channels.
     - The objective behind FMO is to scatter possible errors across the whole frame as evenly as possible, to avoid error accumulation in a limited region.

  62. Slice Group
     - A slice group is a subset of the macroblocks and may contain one or more slices.
     - In FMO, the frame is divided into slice groups.
     - Each macroblock can be assigned freely to a slice group using a MAP function.
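
One simple way to picture such a map is a checkerboard that alternates macroblocks between two slice groups. The sketch below is an illustrative assumption, not the map shown in the slides; the function names are hypothetical.

```python
def checkerboard_map(mb_cols, mb_rows, groups=2):
    """Return a row-major grid giving the slice group id of every macroblock."""
    return [[(x + y) % groups for x in range(mb_cols)] for y in range(mb_rows)]


def neighbours_in_other_groups(mb_map, x, y):
    """Count the 4-connected neighbours of (x, y) that belong to a different slice group."""
    rows, cols = len(mb_map), len(mb_map[0])
    count = 0
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < cols and 0 <= ny < rows and mb_map[ny][nx] != mb_map[y][x]:
            count += 1
    return count


qcif_map = checkerboard_map(11, 9)   # QCIF: 176x144 samples = 11x9 macroblocks
```

With this layout, if one slice group is lost every missing macroblock still has all of its direct neighbours in the surviving group, which gives error concealment something to interpolate from.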

  63. MAP Function [figure]

  64. Redundant Coded Picture
     - The encoder sends a duplicate of part or all of a coded picture.
     - In normal operation, the decoder reconstructs the frame from 'primary' (non-redundant) pictures and discards any redundant pictures.
     - However, if a primary coded picture is damaged (e.g. due to a transmission error), the decoder may replace the damaged area with decoded data from a redundant picture, if available.

  65. MB Prediction Types
     - Intra: the MB is predicted from neighboring blocks of the same frame. Intra prediction is performed on 16x16, 4x4 and 8x8 (in the FRExt profiles) blocks.
     - Inter: the MB is predicted from regions in previous (or next) frames, using motion estimation.

  66. MB Syntax Element [figure]

  67. Transformation and Quantization in H.264

  68. Transformation
     - H.264 uses three transforms:
       - A Hadamard transform for the 4 x 4 array of luma DC coefficients
       - A Hadamard transform for the 2 x 2 array of chroma DC coefficients
       - A DCT-based transform for all other 4 x 4 blocks in the residual data

  69. Transformation [figure] [1]

  70. Transformation
     - Fundamental differences between the H.264 transform and the DCT:
       - It is an integer transform.
       - It is possible to ensure zero mismatch between encoder and decoder.
       - It can be implemented using only additions and shifts.
       - A scaling multiplication is integrated into the quantizer.
       - It can be carried out using 16-bit integer arithmetic.
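
The "additions and shifts" property can be seen in a sketch of the forward 4x4 core transform. The matrix `CF` below is the commonly cited H.264 forward core matrix and is not reproduced in the slide text, so treat it as background added for this example; the function names are hypothetical, and the scaling that the slide says is folded into the quantizer is not shown.

```python
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]


def forward_1d(x0, x1, x2, x3):
    """Multiply CF by a 4-vector using only additions, subtractions and one-bit shifts."""
    s03, d03 = x0 + x3, x0 - x3
    s12, d12 = x1 + x2, x1 - x2
    return (s03 + s12, (d03 << 1) + d12, s03 - s12, d03 - (d12 << 1))


def forward_4x4(block):
    """Compute CF * block * CF^T with two butterfly passes."""
    rows = [forward_1d(*row) for row in block]        # transform each row
    cols = [forward_1d(*col) for col in zip(*rows)]   # then each column
    return [list(r) for r in zip(*cols)]              # back to row-major order
```

Because every operation is an exact integer operation, an encoder and a decoder built this way cannot drift apart through rounding differences, which is the zero-mismatch property listed above.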

  71. Transformation
     - DCT [1]

  72. Transformation
     - DCT approximation [1]

  73. Transformation
     - 4 x 4 Hadamard transform
     - 2 x 2 Hadamard transform [1]
