Image and Video Coding: Motion Estimation and Coding



SLIDE 1

Image and Video Coding: Motion Estimation and Coding


SLIDE 2

Last Lecture

Last Lecture: Hybrid Video Coding

Hybrid Video Coding
  • Partitioning of pictures into blocks
  • Block-adaptive prediction
      • intra: $\hat{s}[x, y] = f(s'[x, y], p_\mathrm{intra})$
      • inter: $\hat{s}[x, y] = s'_\mathrm{ref}[x + m_x, y + m_y]$
  • Transform coding of prediction error
      • $u[x, y] = s[x, y] - \hat{s}[x, y]$

Motion-Compensated Prediction
  • Key technique for utilizing temporal dependencies
  • Parameters that impact coding efficiency
      • Accuracy of motion vectors
      • Choice of interpolation filters
      • ...

[Figure: moving object in the current picture and displaced object in the reconstructed reference picture; the displacement vector m = (mx, my) of the current block points to the best matching block in the reference picture]

Heiko Schwarz (Freie Universität Berlin) — Image and Video Coding: Motion Estimation and Coding 2 / 38

SLIDE 3

Picture Partitioning / Variable Block Sizes

Picture Partitioning for Motion-Compensated Prediction

Effectiveness of Motion-Compensated Prediction
  • Large blocks: Rather inaccurate motion representation, but small rate overhead for motion data
  • Small blocks: More accurate prediction signal, but higher motion data rate

Transform Coding of Prediction Error Signal
  • Avoid transform over motion edges (edges in residual signal decrease efficiency of transform coding)
  • Larger blocks for prediction provide more flexibility for transform partitioning

SLIDE 4

Picture Partitioning / Variable Block Sizes

Variable Block Sizes for Motion-Compensated Prediction

Regions for Motion Description
  • Moving objects can have arbitrary shapes
  • Variable size regions for motion description
  • Partitioning has to be signaled in bitstream
      • Sample-accurate partitionings are too expensive (and difficult to decide in an encoder)
  • Need simple, but effective solutions
  • Blocks of variable sizes

SLIDE 5

Picture Partitioning / Variable Block Sizes

Simple Partitioning Concepts with Variable Block Sizes

Most Common Adaptive Partitioning
  • Initial partitioning into fixed size blocks
  • Signal-adaptive subdivision of initial blocks

Quadtree Partitioning
  • Consecutive splits into four sub-blocks
  • Allows signal-adaptive subdivision
  • Low signaling overhead (one flag per node)

Extended Tree Partitioning
  • Multiple split types per node (e.g., quadtree & binary splits)
  • Additional flexibility for subdivision
  • Results in larger blocks (on average)

[Figure: quadtree of a partitioned block, starting at the root node, with one split flag per node]
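The "one flag per node" signaling of a quadtree can be sketched as follows. This is a minimal illustration, not an excerpt from any codec: the function name, the `want_split` callback, and the rule that no flag is sent once the minimum block size is reached are assumptions made for the example.

```python
def encode_quadtree(x, y, size, min_size, want_split, flags, leaves):
    """Depth-first quadtree traversal that emits one split flag per node."""
    if size > min_size:
        split = bool(want_split(x, y, size))
        flags.append(1 if split else 0)
    else:
        split = False  # assumption: at the minimum size no flag is sent
    if not split:
        leaves.append((x, y, size))
        return
    half = size // 2
    for dy in (0, half):        # top row of sub-blocks first
        for dx in (0, half):    # left to right within a row
            encode_quadtree(x + dx, y + dy, half, min_size,
                            want_split, flags, leaves)


# Hypothetical decision rule: split the 64x64 block, then only its
# top-left 32x32 sub-block once more.
def want_split(x, y, size):
    return size == 64 or (size == 32 and x == 0 and y == 0)


flags, leaves = [], []
encode_quadtree(0, 0, 64, 8, want_split, flags, leaves)
print(flags)   # split flags in coding order
print(leaves)  # resulting blocks as (x, y, size)
```

The flag list is all a decoder needs to rebuild the same partitioning, as long as it traverses the tree in the same order.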

SLIDE 6

Picture Partitioning / Variable Block Sizes

Motion-Compensated Prediction for Chroma

Multiple Color Components
  • Same motion in luma and chroma components
  • Use same partitioning & same motion vectors

Motion Compensation in YCbCr 4:2:0
  • Chroma blocks have quarter number of samples
  • Chroma motion vectors have double precision
      • Twice as many phases as in luma
      • Typically: Simpler interpolation filters

Luma and chroma interpolation filters of HEVC (filter coefficients per tap position)

luma
  phase     −3    −2    −1     0     1     2     3     4
  1/4       −1     4   −10    58    17    −5     1     0
  1/2       −1     4   −11    40    40   −11     4    −1
  3/4        0     1    −5    17    58   −10     4    −1

chroma
  phase     −1     0     1     2
  1/8       −2    58    10    −2
  1/4       −4    54    16    −2
  3/8       −6    46    28    −4
  1/2       −4    36    36    −4
  5/8       −4    28    46    −6
  3/4       −2    16    54    −4
  7/8       −2    10    58    −2
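To show how such a filter row is applied, here is a small 1-D sketch using the HEVC luma coefficients. It is a simplification under stated assumptions: the function name is made up, border samples are simply repeated, and rounding is done in a single stage (a real decoder keeps higher intermediate precision for 2-D interpolation and clips the result).

```python
# HEVC luma filter taps for tap positions -3..4; each row sums to 64.
LUMA_TAPS = {
    "1/4": (-1, 4, -10, 58, 17, -5, 1, 0),
    "1/2": (-1, 4, -11, 40, 40, -11, 4, -1),
    "3/4": (0, 1, -5, 17, 58, -10, 4, -1),
}


def interpolate_row(samples, pos, phase):
    """Interpolate the sample at position pos + phase in a 1-D row."""
    taps = LUMA_TAPS[phase]
    acc = 0
    for k, coeff in zip(range(-3, 5), taps):
        idx = min(max(pos + k, 0), len(samples) - 1)  # repeat border samples
        acc += coeff * samples[idx]
    return (acc + 32) >> 6  # divide by 64 with rounding


flat = [100] * 16
ramp = [10 * i for i in range(16)]
print(interpolate_row(flat, 5, "1/4"))  # a flat signal stays flat
print(interpolate_row(ramp, 3, "1/2"))  # halfway between 30 and 40
```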

SLIDE 7

Picture Partitioning / Supported Block Sizes in Video Coding Standards

Picture Partitioning in Early Video Coding Standards

MPEG-2 Video
  • 16×16 macroblocks (decision: intra or inter)
  • One motion vector per macroblock (all components)
  • Transform blocks of 8×8 samples
      • Macroblock in 4:2:0 format: 6 transform blocks (four Y, one Cb, one Cr)
  • No adaptivity in partitioning

H.263 and MPEG-4 Visual
  • 16×16 macroblocks and 8×8 transform blocks
  • Two inter coding modes per macroblock
      1. Inter-16x16: One motion vector per macroblock (MV0)
      2. Inter-8x8: Macroblock is split into four 8×8 blocks, with one motion vector per 8×8 block (MV0, MV1, MV2, MV3)
  • One level of quadtree partitioning

SLIDE 8

Picture Partitioning / Supported Block Sizes in Video Coding Standards

Block Partitioning in H.264 | AVC

Picture Partitioning
  • 16×16 macroblocks (decision: intra or inter)

Motion Partitioning
  • Two levels of macroblock partitioning
      1. Four modes for partitioning into sub-macroblocks: Inter-16x16, Inter-16x8, Inter-8x16, Inter-8x8
      2. 8×8 sub-macroblocks can be further subdivided: 8x8, 8x4, 4x8, 4x4

Transform Partitioning (High profile)
  • Two modes for partitioning of luma residual
      1. Four 8×8 transform blocks (not allowed if motion blocks smaller than 8×8 are used)
      2. Sixteen 4×4 transform blocks
  • Chroma: Only 4×4 transforms

[Figure: macroblock partitionings Inter-16x16, Inter-16x8, Inter-8x16, Inter-8x8; sub-macroblock partitionings 8x8, 8x4, 4x8, 4x4; transform partitionings luma 8x8, luma 4x4, chroma]

SLIDE 9

Picture Partitioning / Supported Block Sizes in Video Coding Standards

Block Partitioning in H.265 | HEVC

Partitioning into Coding Units
  1. Initial partitioning into Coding Tree Units (CTUs)
      • Fixed size of 64×64 (typical), 32×32, or 16×16
  2. Quadtree partitioning into Coding Units (CUs)
      • Up to minimum CU size, with (N×N)min ≥ 8×8
  • Coding Unit: Decision between intra and inter

Motion Description (inter coded CUs)
  • Split CU into up to four Prediction Units (PUs)
  • Supports non-square block shapes
  • Prediction Unit: Coding of motion data

Transform Coding
  • Quadtree partitioning of CU into Transform Units (TUs)
  • Minimum supported transform size: 4×4
  • Transform Unit: Block sizes for transform coding

PU partitionings of an N×N coding unit inside a CTU (from the figure)
  • N×N, N×(N/2), (N/2)×N
  • (N/2)×(N/2): only if N > 8 and N = Nmin
  • N×(N/4), N×(3N/4), (N/4)×N, (3N/4)×N: only if N > 8

SLIDE 10

Picture Partitioning / Supported Block Sizes in Video Coding Standards

Block Partitioning in H.266 | VVC

Initial Partitioning
  • Fixed size Coding Tree Units (CTUs)
  • CTU size of 128×128, 64×64, or 32×32

Partitioning into Coding Units (CUs)
  • Multi-type tree partitioning of CTUs
  • Split types: quad, binary, ternary splits
  • Some restrictions on multi-type tree
      • No quad split after binary/ternary split
      • CU width and height: Multiple of 4
      • Avoid redundant split sequences

Motion Description and Transform Coding
  • Motion data are signaled per CU
  • One transform per color component of CU (exception: CU larger than max. transform)

Split types and codewords (from the figure)

  split                        sub-block sizes           codeword
  no  (no split)               whole block               0
  Q   (quad split)             1/2 × 1/2 (four blocks)   1 1
  Bh  (binary, horizontal)     1/2, 1/2                  1 0 0 1
  Bv  (binary, vertical)       1/2, 1/2                  1 0 1 1
  Th  (ternary, horizontal)    1/4, 1/2, 1/4             1 0 0 0
  Tv  (ternary, vertical)      1/4, 1/2, 1/4             1 0 1 0

[Figure: example multi-type tree of a CTU with nodes labeled Q, Bh, Bv, Th, Tv, and "no" for blocks that are not split further]
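The geometry of the split types can be written down in a few lines. This is an illustrative sketch only: the function name is made up, and it assumes that "horizontal" means the dividing lines are horizontal (the block is cut into top and bottom parts) and that the ternary split uses the ratio 1/4 : 1/2 : 1/4.

```python
def split_block(x, y, w, h, split):
    """Return the sub-blocks (x, y, width, height) produced by one split."""
    if split == "no":
        return [(x, y, w, h)]
    if split == "Q":
        hw, hh = w // 2, h // 2
        return [(x, y, hw, hh), (x + hw, y, hw, hh),
                (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    if split == "Bh":
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if split == "Bv":
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    if split == "Th":
        q = h // 4
        return [(x, y, w, q), (x, y + q, w, 2 * q), (x, y + 3 * q, w, q)]
    if split == "Tv":
        q = w // 4
        return [(x, y, q, h), (x + q, y, 2 * q, h), (x + 3 * q, y, q, h)]
    raise ValueError(f"unknown split type: {split}")


print(split_block(0, 0, 32, 16, "Tv"))
print(split_block(0, 0, 32, 32, "Q"))
```

Applying the function recursively to its own output yields a multi-type tree; the restrictions listed above (for example, no quad split below a binary or ternary split) would have to be enforced by the caller.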

SLIDE 11

Picture Partitioning / Coding Order and Encoder Decision

Coding Order of Blocks

Transmission of Block Parameters
  • Map 2D arrangement of blocks into 1D bitstream
  • Require defined coding order

Coding Order of CTUs / Macroblocks
  • Raster scan order
  • From top to bottom, from left to right

Coding Order of Blocks
  • At each split node (z-scan):
      1. From top to bottom
      2. From left to right

Properties for Prediction
  • Blocks to the left and above always available
  • Block above right is often available

[Figure: partitioned CTUs with blocks numbered 1, 2, 3, ... in coding order]

SLIDE 12

Picture Partitioning / Coding Order and Encoder Decision

Partitioning Selection in Encoder

Lagrangian Mode Decision
  • Select partitioning with minimum $J = D + \lambda \cdot R$
      • distortion D of reconstructed block
      • number of bits R for sending all coding parameters
  • Each node: Compare Lagrangian costs
      • $J_\mathrm{nosplit}$ for non-split block
      • $J_{\mathrm{split},A}$ for split mode A (e.g., quad split)
      • ...

Decision Order
  • Need correct neighborhood for decisions (for predictors)
  • Decide split modes in coding order
  • Evaluate tree in depth first order
  • Rather complex (in particular for multi-type trees): Each sample is coded multiple times

[Figure: example of a two-level quadtree, with evaluation steps numbered 1 to 5]
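The depth-first comparison of Lagrangian costs can be sketched for a plain quadtree as below. Everything specific here is an assumption for illustration: the function names, the one-bit split flag, and the toy `code_block` callback, which stands in for actually coding a block and returning its distortion and bit count. A real encoder would also have to update the neighborhood state while descending.

```python
def decide_partition(x, y, size, min_size, lam, code_block):
    """Return (J, leaf blocks) with minimum Lagrangian cost J = D + lam * R."""
    dist, bits = code_block(x, y, size)          # code the block without split
    flag_bits = 1 if size > min_size else 0      # assumed cost of the split flag
    j_nosplit = dist + lam * (bits + flag_bits)
    if size <= min_size:
        return j_nosplit, [(x, y, size)]
    half = size // 2
    j_split, leaves = lam * flag_bits, []
    for dy in (0, half):                         # sub-blocks in coding order
        for dx in (0, half):
            j, sub = decide_partition(x + dx, y + dy, half,
                                      min_size, lam, code_block)
            j_split += j
            leaves += sub
    if j_split < j_nosplit:
        return j_split, leaves
    return j_nosplit, [(x, y, size)]


def toy_code_block(x, y, size):
    """Hypothetical coder: large blocks covering sample (4, 4) predict badly."""
    covers = x <= 4 < x + size and y <= 4 < y + size
    dist = 1000 if (covers and size > 8) else 0
    return dist, 20                              # (distortion, bits)


cost, blocks = decide_partition(0, 0, 32, 8, 1.0, toy_code_block)
print(cost, blocks)
```

In the toy example only the region around the badly predicted sample ends up in small blocks, which is the behavior the Lagrangian comparison is meant to produce.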

SLIDE 13

Picture Partitioning / Coding Order and Encoder Decision

Fast Partitioning Selection Approaches

Fast Partitioning Algorithms
  • General Idea: Skip evaluation of unlikely partitionings
  • Properties of typical video sequences
      • Small inter blocks are rather unlikely
      • Many inter blocks have zero prediction error

Basic Fast Partitioning Design
  • Evaluate non-split blocks first
  • Don’t test splits if large block fulfills “quality criterion”
      • Block has zero residual signal, or
      • All transform coefficients are less than threshold, or
      • ...
  • Combination of top-down fast decisions with conventional depth-first Lagrangian mode decision

[Figure: example of a two-level quadtree, with evaluation steps numbered 1 to 7]

SLIDE 14

Picture Partitioning / Coding Efficiency

Block Sizes for Motion-Compensated Prediction: Coding Efficiency

[Figure: PSNR (Y) [dB] over bit rate [Mbit/s] for fixed block sizes 4×4, 8×8, 16×16, 32×32, 64×64 and for adaptive block sizes; IPPP, 4×4 transform]
  • Cactus (1920×1080, 50 Hz): bit-rate saving of adaptive vs 16×16: 23 % on average
  • Johnny (1280×720, 60 Hz): bit-rate saving of adaptive vs 32×32: 35 % on average

Coding Experiment (HEVC): IPPP, square blocks only, 4×4 transform
  • Transform coding with 4×4 transform: Exclude impact on residual coding
  • Significant coding gain for variable block sizes (quadtree approach)

SLIDE 15

Picture Partitioning / Coding Efficiency

Block Sizes for Motion-Compensated Prediction: Coding Efficiency

[Figure: PSNR (Y) [dB] over bit rate [Mbit/s] for fixed block sizes 4×4, 8×8, 16×16, 32×32, 64×64 and for adaptive block sizes; IPPP, all transform sizes]
  • Cactus (1920×1080, 50 Hz): bit-rate saving of adaptive vs 16×16: 22 % on average
  • Johnny (1280×720, 60 Hz): bit-rate saving of adaptive vs 32×32: 29 % on average

Coding Experiment (HEVC): IPPP, square blocks only, all transform sizes
  • Adaptive transform size improves coding efficiency for large block sizes
  • Significant coding gain for variable block sizes (quadtree approach)

SLIDE 16

Picture Partitioning / Coding Efficiency

Coding Efficiency Impact of Flexible Block Partitioning

Coding Efficiency Impact of Non-Square Blocks
  • Non-square PUs in HEVC: 2-4 % bit-rate savings
  • MTT in VVC vs quadtree: 9-15 % bit-rate savings

Block Sizes in Video Coding Standards
  • Number of options increased from one standard generation to the next
  • Significantly contributes to coding efficiency
  • Large blocks important for high-res. content

Coding experiments with HEVC and VTM-1
  • Coding tools of HEVC
  • Only supported block sizes are modified
  • 40 % bit-rate savings (MTT over fixed size)

Impact of supported block sizes: bit-rate savings relative to ...

                  MPEG-2    H.263    H.264    H.265
  H.263             4 %
  H.264/AVC         9 %      5 %
  H.265/HEVC       35 %     32 %     28 %
  H.266/VVC        40 %     37 %     34 %      9 %

SLIDE 17

Coding of Motion Parameters

Motion Parameter Coding

Statistical Properties of Motion Vectors
  • Motion in videos caused by object and/or camera motion
  • Region of objects and background often covers multiple blocks
  • Motion parameters of neighboring blocks are often very similar
  • Utilize motion dependencies for efficient coding of motion parameters

SLIDE 18

Coding of Motion Parameters / Motion Vector Prediction

Predictive Coding of Motion Vectors

Motion Vector Predictor
  • Derive motion vector predictor $\hat{m} = (\hat{m}_x, \hat{m}_y)$
  • Use already coded motion vectors in spatial/temporal neighborhood

Coding of Motion Vector Differences
  • Entropy coding of motion vector differences (MVDs)
      $\Delta m_x = m_x - \hat{m}_x$
      $\Delta m_y = m_y - \hat{m}_y$
  • Example: Entropy coding of ∆m in HEVC
      • Two context-coded bins (abs_greater0_flag, abs_greater1_flag)
      • Exp-Golomb code of order 1 (prefix and suffix are bypass coded)
      • Sign (bypass coded)
  • Coding efficiency depends on quality of motion vector predictor

MVD coding in HEVC (s denotes the sign bin)

  ∆m      codeword
  ±1      10s
  ±2      1110 s
  ±3      1111 s
  ±4      1101 00s
  ±5      1101 01s
  ±6      1101 10s
  ±7      1101 11s
  ±8      1100 1000 s
  ±9      1100 1001 s
  ±10     1100 1010 s
  ±11     1100 1011 s
  ±12     1100 1100 s
  ±13     1100 1101 s
  ±14     1100 1110 s
  ±15     1100 1111 s
  ±16     1100 0100 00s
  ±17     1100 0100 01s
  · · ·   · · ·
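The codeword table for the motion vector differences can be reproduced with a short sketch. The function names are made up, the sign bin is kept as the literal placeholder "s" as in the table, and the Exp-Golomb bins follow the polarity shown in the table (a run of zeros followed by the binary value); the bin polarity in an actual bitstream may differ. The codeword for ∆m = 0 is not listed in the table and is inferred here from the first flag alone.

```python
def exp_golomb(value, k):
    """Exp-Golomb code of order k: zero prefix, then value + 2^k in binary."""
    x = value + (1 << k)
    return "0" * (x.bit_length() - k - 1) + format(x, "b")


def mvd_bins(delta):
    """Bin string for one MVD component; 's' stands for the sign bin."""
    a = abs(delta)
    if a == 0:
        return "0"                               # abs_greater0_flag = 0 (inferred)
    if a == 1:
        return "10" + "s"                        # greater0 = 1, greater1 = 0
    return "11" + exp_golomb(a - 2, 1) + "s"     # remainder |delta| - 2, order 1


for d in (1, 2, 3, 4, 8, 16, 17):
    print(d, mvd_bins(d))
```

The length of the bin string grows roughly with 2·log2|∆m|, which is why a good predictor (small differences) directly reduces the motion data rate.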

SLIDE 19

Coding of Motion Parameters / Motion Vector Prediction

Motion Vector Prediction

Simple Variant of Motion Vector Prediction
  • Used in MPEG-2 Video
  • Use motion vector of left block as predictor
      $\hat{m}_x = m_x^{(l)}$, $\hat{m}_y = m_y^{(l)}$
  • Large motion vector differences at object boundaries

[Figure: current block with motion vector m and left neighboring block with motion vector m^(l)]

Median Prediction
  • Used in H.263, MPEG-4 Visual, H.264|AVC
  • Component-wise median of three neighboring blocks
      $\hat{m}_x = \mathrm{median}(m_x^A, m_x^B, m_x^C)$
      $\hat{m}_y = \mathrm{median}(m_y^A, m_y^B, m_y^C)$
  • On average, smaller motion vector differences

[Figure: current block with neighboring blocks A, B, C]
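A component-wise median predictor is only a few lines of code. The sketch below uses made-up function names and example vectors, and it ignores the special cases that real standards define for unavailable neighbors.

```python
def median_predictor(mv_a, mv_b, mv_c):
    """Component-wise median of three neighboring motion vectors."""
    return tuple(sorted(comp)[1] for comp in zip(mv_a, mv_b, mv_c))


def mv_difference(mv, predictor):
    """Motion vector difference that would be entropy coded."""
    return tuple(m - p for m, p in zip(mv, predictor))


# Example: neighbor C belongs to a different object (outlier vector).
pred = median_predictor((4, -2), (6, 0), (-10, 1))
print(pred)                          # the outlier does not affect the median
print(mv_difference((5, 0), pred))   # small difference to be coded
```

The example shows the motivation from the slide: a single neighbor across an object boundary does not pull the predictor away, unlike using the left block alone.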

SLIDE 20

Coding of Motion Parameters / Motion Vector Prediction

Switched Motion Vector Prediction

Switched Motion Vector Prediction
  • Idea: Choose most suitable neighboring motion vector
  • Construct ordered list of motion vectors of spatially and temporally neighboring blocks
      $m_\mathrm{list} = \{ m_A, m_B, m_C, \dots \}$
  • Select best candidate (which minimizes bit rate)
  • Transmit index into candidate list

Realization in H.265 | HEVC and H.266 | VVC
  • List of two candidates (obtained by defined search)
  • Index is a single flag
  • Bit-rate saving for differences is typically larger than bit rate required for signaling predictor

[Figure: current block with spatial candidate positions A0, A1, B0, B1, B2 and temporal candidate positions T0, T1 in the co-located area in the reference picture]

SLIDE 21

Coding of Motion Parameters / Modes with Inferred Motion Parameters

Inferred Motion Parameters: Temporal Direct Mode

Coding Modes with Inferred Motion Parameters
  • All motion parameters are derived at the decoder side
  • Only the corresponding mode is signaled (low side information)
  • Exploit data of already coded blocks (spatial, temporal neighbors)
  • Suitable for consistently moving regions

Temporal Direct Mode
  • Motion parameters for bi-directional prediction
  • Scale motion vector $m_\mathrm{col}$ of co-located block in already coded reference picture according to time differences
      $m_0 = \frac{t_c - t_0}{t_1 - t_0} \cdot m_\mathrm{col}$
      $m_1 = \frac{t_c - t_1}{t_1 - t_0} \cdot m_\mathrm{col}$

[Figure: time axis with pictures at t0, tc, t1; motion vector mcol of the co-located block and derived motion vectors m0, m1 of the current block]
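The two scaling formulas can be checked with exact rational arithmetic. This is a sketch with a made-up function name; real codecs implement the scaling in fixed-point arithmetic with rounding, which is not modeled here.

```python
from fractions import Fraction


def temporal_direct(m_col, t0, tc, t1):
    """Scale the co-located vector m_col by the temporal distances."""
    s0 = Fraction(tc - t0, t1 - t0)
    s1 = Fraction(tc - t1, t1 - t0)
    m0 = tuple(s0 * c for c in m_col)
    m1 = tuple(s1 * c for c in m_col)
    return m0, m1


# Current picture halfway between the two reference pictures.
m0, m1 = temporal_direct((4, -2), t0=0, tc=1, t1=2)
print(m0, m1)
```

For a current picture exactly in the middle, the two derived vectors have equal magnitude and opposite sign, which matches the assumption of linear motion between t0 and t1.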

SLIDE 22

Coding of Motion Parameters / Modes with Inferred Motion Parameters

Inferred Motion Parameters: Merge Mode

Merge Mode
  • Similar concept as switched motion vector prediction
  • Construct candidate list of motion parameters
  • Select best candidate and transmit index into list
  • No motion vector difference is transmitted

Realization in H.265 | HEVC
  • List of up to five spatial and temporal candidates
  • List is filled with combinations (if not enough candidates)

Extensions in H.266 | VVC
  • List of up to six candidates
      • Spatial and temporal candidates (as in HEVC)
      • History-based candidates (FIFO buffer of size 6)

[Figure: current block with spatial candidate positions A0, A1, B0, B1, B2 and temporal candidate positions T0, T1 in the co-located area in the reference picture]

SLIDE 23

Coding of Motion Parameters / Modes with Inferred Motion Parameters

Selected Improvements of Motion Coding in H.266 | VVC

Merge Mode with Motion Vector Difference
  • Select one of first two candidates in merge list (by flag)
  • Send restricted motion vector difference
      • Logarithmic distance (1, 2, 4, 8, 16, 32, 64, 128)
      • Direction index (+x, −x, +y, −y)

Subblock-Based Temporal Motion Vector Prediction
  • Derive local motion vector field for block from reference picture, based on 4×4 grid
  • Offsets are derived from local neighboring blocks

Symmetric Motion Vector Differences
  • For bi-predicted blocks (two prediction signals)
  • Transmit only list 0 motion vector difference $\Delta m_0$
  • Derive list 1 motion vector difference by $\Delta m_1 = -\Delta m_0$

[Figure: predictors $\hat{m}_0$ and $\hat{m}_1$ with motion vector differences ∆m0 and ∆m1 = −∆m0]

SLIDE 24

Estimation of Motion Vectors / Cost Measure for Motion Search

Motion Estimation: Block Matching Algorithm

Block Matching Algorithm
  • Define motion vector search range
      $\mathcal{R} = [-m_\mathrm{max}; m_\mathrm{max}] \times [-m_\mathrm{max}; m_\mathrm{max}]$
  • Test all vectors $m = (m_x, m_y)$ in search range $\mathcal{R}$
      • Calculate cost measure $C(m)$
  • Choose vector m that minimizes cost measure $C(m)$

Simple Cost Measures for Motion Search
  • General p-norm distortion measures
      $D_p(m_x, m_y) = \sum_{x,y \in B} \left| s[x, y] - s'_\mathrm{ref}[x + m_x, y + m_y] \right|^p$
  • p = 1: Sum of absolute differences (SAD)
  • p = 2: Sum of squared differences (SSD)

[Figure: reconstructed reference picture $s'_\mathrm{ref}[x, y]$ and current original picture $s[x, y]$]
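The block matching algorithm translates almost directly into code. The sketch below is a plain-Python illustration with made-up names; pictures are lists of rows, the cost is SAD only (no rate term), and vectors that would point outside the reference picture are simply skipped, which is an assumption of this example.

```python
def sad(cur, ref, bx, by, n, mx, my):
    """Sum of absolute differences for the n x n block at (bx, by)."""
    return sum(abs(cur[by + y][bx + x] - ref[by + y + my][bx + x + mx])
               for y in range(n) for x in range(n))


def full_search(cur, ref, bx, by, n, m_max):
    """Test all integer vectors in [-m_max, m_max]^2; return (vector, cost)."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for my in range(-m_max, m_max + 1):
        for mx in range(-m_max, m_max + 1):
            inside = (0 <= bx + mx and bx + mx + n <= width and
                      0 <= by + my and by + my + n <= height)
            if not inside:
                continue                      # assumption: no padding here
            cost = sad(cur, ref, bx, by, n, mx, my)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (mx, my), cost
    return best_mv, best_cost


def texture(x, y):
    """Synthetic test pattern (hypothetical content)."""
    return (7 * x + 13 * y + x * y) % 31


ref = [[texture(x, y) for x in range(16)] for y in range(16)]
cur = [[texture(x + 2, y - 1) for x in range(16)] for y in range(16)]  # shifted copy
print(full_search(cur, ref, bx=6, by=6, n=4, m_max=4))
```

The number of tested vectors grows with (2·m_max + 1)², which is the complexity problem that the fast search strategies later in the lecture address.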

SLIDE 25

Estimation of Motion Vectors / Cost Measure for Motion Search

Lagrangian Motion Estimation

  • Mode decision concept too complex
  • Ignore transform coding (often coded error will be zero)

Lagrangian Motion Estimation
  • Select motion vector m according to
      $m^* = \arg\min_{m \in \mathcal{R}} \; D(m) + \lambda_m \cdot R(m)$
      • D(m): Distortion between original and predicted block
      • R(m): Number of bits for motion vector difference

Distortion Measure
  • Sum of squared differences (SSD): $\lambda_m = \lambda$
  • Sum of absolute differences (SAD): $\lambda_m = \sqrt{\lambda}$
  • SAD after Hadamard transform: better approximation of real RD costs

Hadamard Transform
  • Defined for integer powers of 2
      $A_1 = [\,1\,]$, $A_N = \frac{1}{\sqrt{2}} \begin{bmatrix} A_{N/2} & A_{N/2} \\ A_{N/2} & -A_{N/2} \end{bmatrix}$
  • Example: Transform matrix for N = 4
      $A_4 = \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix}$

Hadamard SAD
  1. Calculate Hadamard transform of prediction error $s[x, y] - \hat{s}[x, y]$
  2. Sum up absolute values of the Hadamard transform coefficients
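The two steps of the Hadamard SAD can be sketched as follows. The function names are made up; the code builds the unnormalized ±1 matrix H_N by the recursion above and applies the normalization A_N = H_N / √N once at the end, so that the 2-D transform A·E·Aᵀ becomes H·E·Hᵀ / N.

```python
def hadamard(n):
    """Unnormalized Hadamard matrix H_n with entries +1/-1 (n a power of 2)."""
    h = [[1]]
    while len(h) < n:
        h = ([row + row for row in h] +
             [row + [-v for v in row] for row in h])
    return h


def hadamard_sad(orig, pred):
    """Sum of absolute Hadamard coefficients of the n x n prediction error."""
    n = len(orig)
    h = hadamard(n)
    err = [[orig[y][x] - pred[y][x] for x in range(n)] for y in range(n)]
    # tmp = H * err
    tmp = [[sum(h[i][k] * err[k][j] for k in range(n)) for j in range(n)]
           for i in range(n)]
    # coef = tmp * H^T
    coef = [[sum(tmp[i][k] * h[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]
    return sum(abs(c) for row in coef for c in row) / n


orig = [[5] * 4 for _ in range(4)]
pred = [[4] * 4 for _ in range(4)]
print(hadamard(4))
print(hadamard_sad(orig, pred))   # constant error: only the DC coefficient is non-zero
```

For the constant error in the example, the plain SAD is 16 while the Hadamard SAD is 4: the transform concentrates the error in one coefficient, which is closer to what transform coding of the residual would have to spend bits on.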

SLIDE 26

Estimation of Motion Vectors / Cost Measure for Motion Search

Lagrangian Motion Estimation: Impact of Cost Measure

[Figure: bit-rate saving vs SAD-based ME [%] over PSNR (Y) [dB] for Kimono (1920×1080, 24 Hz); reference: distortion-only search (no rate term, SAD distortion)]
  • Full transform coding (avg. 17.8 %)
  • Hadamard SAD (avg. 12.2 %)
  • SAD (avg. 6.6 %)
  • SSD (avg. 6.5 %)

Observations
  • Full transform coding (similar to mode decision): best, but extremely complex
  • Lagrangian search with SAD/SSD distortion measures: both have same performance
  • Lagrangian with SAD in Hadamard domain: good compromise

SLIDE 27

Estimation of Motion Vectors / Sub-Sample Refinement

Speed Up: Integer Search with Sub-Sample Refinement

Reduce Search Complexity
  • Sub-sample locations require interpolation
  • Testing sub-sample locations much more complex
  • Reduce complexity by splitting motion search into
      1. Integer sample search
      2. Sub-sample refinement(s)

Integer Search
  • Can also use simpler distortion measure
  • SAD instead of Hadamard SAD

Sub-Sample Refinement
  • 9-point search per resolution doubling
  • Half-sample accuracy: 8 sub-sample locations
  • Quarter-sample accuracy: 16 sub-sample locations

[Figure: integer-sample locations, tested half-sample locations, tested quarter-sample locations]
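The two refinement stages can be sketched as below. The names are made up, motion vectors are expressed in quarter-sample units (so an integer vector has components that are multiples of 4), and `cost` is a hypothetical callback that would interpolate the reference block and return the Lagrangian cost.

```python
def refine(center, step, cost):
    """Test the 8 neighbors of center at distance step (9-point pattern)."""
    best, best_cost = center, cost(center)
    tested = 0
    for dy in (-step, 0, step):
        for dx in (-step, 0, step):
            if dx == 0 and dy == 0:
                continue                     # center is already known
            cand = (center[0] + dx, center[1] + dy)
            tested += 1
            c = cost(cand)
            if c < best_cost:
                best, best_cost = cand, c
    return best, tested


def sub_sample_refinement(integer_mv, cost):
    """Half-sample stage (step 2), then quarter-sample stage (step 1)."""
    half_mv, n_half = refine(integer_mv, 2, cost)
    quarter_mv, n_quarter = refine(half_mv, 1, cost)
    return quarter_mv, n_half + n_quarter


def toy_cost(mv):
    """Hypothetical cost with its minimum at (11, -2) in quarter-sample units."""
    return (mv[0] - 11) ** 2 + (mv[1] + 2) ** 2


print(sub_sample_refinement((8, -4), toy_cost))
```

Compared with testing every quarter-sample position around the integer vector, the staged search evaluates 8 + 8 = 16 sub-sample locations, which matches the counts on the slide.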

SLIDE 28

Estimation of Motion Vectors / Sub-Sample Refinement

Lagrangian Motion Estimation: Impact of Sub-Sample Refinement

[Figure: bit-rate saving vs TM5-ME [%] over PSNR (Y) [dB] for Kimono (1920×1080, 24 Hz); reference: distortion-only search (no rate term, SAD distortion)]
  • Exhaustive Search: Hadamard SAD (avg. 12.2 %)
  • Refinement: HSAD + HSAD (avg. 10.3 %)
  • Refinement: SAD + HSAD (avg. 9.3 %)
  • Refinement: SAD + SAD (avg. 5.3 %)

Good Compromise between Complexity and Coding Efficiency
  • Integer search with SAD as distortion measure
  • Sub-sample refinement with Hadamard SAD as distortion measure

SLIDE 29

Estimation of Motion Vectors / Fast Integer Search Strategies

Fast Integer Search: Logarithmic Search

Logarithmic Search [ Jain, Jain, 1981 ]
  • Fast strategy for integer search
  • Iterative search with diamond-shaped pattern, consisting of 5 points (corners + center)
  • Compare costs of 5 points in each search step

Basic Search Strategy
  • Move pattern so that it is centered around best match
      • No more than 3 new candidates
  • Logarithmic refinement if best match in center of pattern
      • 4 new candidates
  • Terminate motion search if
      • best match is in center of pattern, and
      • smallest pattern size is used

[Figure: example search with steps numbered 1 to 7, ending at the final vector]
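The search strategy above can be sketched as a small loop. This is an illustration under assumptions, not the published algorithm in every detail: the function name, the initial step size of 4, the search range handling, and the `cost` callback are all made up for the example. A cache ensures that each candidate is evaluated only once, so a moved pattern adds at most 3 new points.

```python
def logarithmic_search(cost, start=(0, 0), step=4, m_max=16):
    """5-point pattern search; returns (best vector, number of cost evaluations)."""
    cache = {}

    def c(mv):
        if mv not in cache:
            cache[mv] = cost(mv)
        return cache[mv]

    center = start
    while True:
        best = center
        for dx, dy in ((step, 0), (-step, 0), (0, step), (0, -step)):
            cand = (center[0] + dx, center[1] + dy)
            if max(abs(cand[0]), abs(cand[1])) > m_max:
                continue                 # stay inside the search range
            if c(cand) < c(best):
                best = cand
        if best != center:
            center = best                # move pattern to the best match
        elif step > 1:
            step //= 2                   # logarithmic refinement
        else:
            return center, len(cache)    # center is best at smallest pattern


def toy_cost(mv):
    """Hypothetical smooth cost with its minimum at (5, -3)."""
    return (mv[0] - 5) ** 2 + (mv[1] + 3) ** 2


print(logarithmic_search(toy_cost))
```

With a search range of ±16, an exhaustive search would test 33 × 33 = 1089 vectors, while the pattern search in this toy example needs only a few dozen evaluations. On real cost surfaces with several local minima, it may stop at a sub-optimal vector.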

SLIDE 30

Estimation of Motion Vectors / Fast Integer Search Strategies

Fast Integer Search: Diamond Search

Diamond Search [ Li, Zeng, Liou, 1994 ]
  • Another fast strategy for integer search
  • Iterative search with diamond-shaped pattern, consisting of 9 points
  • Similar strategy as logarithmic search

Basic Search Strategy
  • Move pattern so that it is centered around best match
      • Up to 5 new candidates
  • Logarithmic refinement if best match in center of pattern
      • Up to 8 new candidates
  • Terminate motion search if
      • best match is in center of pattern, and
      • smallest pattern size is used

[Figure: example search with steps numbered 1 to 6, ending at the final vector]

SLIDE 31

Estimation of Motion Vectors / Fast Integer Search Strategies

Fast Integer Search: Choosing Start Point

Start Point for Motion Search
  • Use motion vector (0, 0) as start point for motion search
  • Use motion vector predictor as start point
  • Block adaptive selection of start point can significantly speed up motion search

Adaptive Choice of Start Point
  • Idea: In most cases, motion of block is similar to at least one of the neighboring blocks
  • First evaluate already estimated motion vectors of blocks in the neighborhood
      • Example: Blocks A, B, C, D, E
      • Can also include temporal candidate
  • Choose best match among the candidates as start point of the motion search

[Figure: current block with neighboring blocks A, B, C, D0, D1, E0, E1]

SLIDE 32

Estimation of Motion Vectors / Fast Integer Search Strategies

Lagrangian Motion Estimation: Impact of Fast Integer Search

[Figure: bit-rate saving vs SAD-based ME [%] over PSNR (Y) [dB] for Kimono (1920×1080, 24 Hz); reference: distortion-only search (no rate term, SAD distortion)]
  • Exhaustive integer search (avg. 9.3 %)
  • Fast integer search (avg. 9.1 %)

Good Compromise between Complexity and Coding Efficiency
  • Fast integer search with SAD as distortion measure (various strategies possible)
  • Sub-sample refinement with Hadamard SAD as distortion measure

SLIDE 33

Increased Flexibility of Motion Description / Motion Vectors Outside Picture Boundaries

Motion Vectors Over Picture Boundaries

[Figure: reference picture and current picture]

Range of Motion Vectors
  • At first glance: Restrict motion vectors to reference blocks completely contained in the picture area
  • Often, there are objects that move into the picture area from one frame to another
  • Improve prediction by allowing Motion Vectors over Picture Boundaries
      • Motion vectors are not restricted to picture area
      • Samples outside picture area are generated by constant padding (sample repetition)
  • Supported in all modern video coding standards
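Constant padding does not require an enlarged reference picture; clamping the sample coordinates gives the same result. The sketch below uses made-up function names, integer-sample motion vectors only, and pictures stored as lists of rows.

```python
def ref_sample(ref, x, y):
    """Reference sample with constant padding (repeat nearest border sample)."""
    height, width = len(ref), len(ref[0])
    xc = min(max(x, 0), width - 1)
    yc = min(max(y, 0), height - 1)
    return ref[yc][xc]


def predict_block(ref, bx, by, n, mx, my):
    """Integer-sample motion-compensated prediction of an n x n block."""
    return [[ref_sample(ref, bx + x + mx, by + y + my) for x in range(n)]
            for y in range(n)]


ref = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
print(predict_block(ref, 0, 0, 2, -1, -1))  # partly outside: padded samples
print(predict_block(ref, 1, 1, 2, 0, 0))    # fully inside the picture
```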

SLIDE 34

Increased Flexibility of Motion Description / Higher Order Motion Models

Describing Motion of Blocks: Linear Models

Representation for motion field of a block/region

Translation Model (2 parameters)
    $m_x(x, y) = m_x$
    $m_y(x, y) = m_y$

Scaling / Rotation Model (4 parameters)
    $m_x(x, y) = a_0 + a_1 \cdot x + a_2 \cdot y$
    $m_y(x, y) = a_3 + a_2 \cdot x - a_1 \cdot y$

Affine Motion Model (6 parameters)
    $m_x(x, y) = a_0 + a_1 \cdot x + a_2 \cdot y$
    $m_y(x, y) = a_3 + a_4 \cdot x + a_5 \cdot y$

[Figure: motion fields for the 4 parameter model and the affine model]

SLIDE 35

Increased Flexibility of Motion Description / Higher Order Motion Models

Describing Motion of Blocks: Non-Linear Models

[Figure: motion fields for the perspective model and the parabolic model]

Planar Perspective Model (8 parameters)
    $m_x(x, y) = \frac{a_0 + a_1 \cdot x + a_2 \cdot y}{1 + a_6 \cdot x + a_7 \cdot y} - x$
    $m_y(x, y) = \frac{a_3 + a_4 \cdot x + a_5 \cdot y}{1 + a_6 \cdot x + a_7 \cdot y} - y$

Parabolic Motion Model (12 parameters)
    $m_x(x, y) = a_0 + a_1 \cdot x + a_2 \cdot y + a_3 \cdot x^2 + a_4 \cdot xy + a_5 \cdot y^2$
    $m_y(x, y) = a_6 + a_7 \cdot x + a_8 \cdot y + a_9 \cdot x^2 + a_{10} \cdot xy + a_{11} \cdot y^2$

SLIDE 36

Increased Flexibility of Motion Description / Higher Order Motion Models

Using Higher Order Motion Models

Estimation of Motion Parameters
  • Need to determine parameter vector $a = (a_0, a_1, \dots)$
  • Exhaustive testing of search space is infeasible
  • Gauss-Newton iterations for minimizing
      $D(a) = \sum_{x,y} \left( s[x, y] - s'_\mathrm{ref}[x + m_x(a), y + m_y(a)] \right)^2$
  • Typically rather complex, also sub-optimal

Motion Compensation with Higher Order Motion Models
  • Higher order models can (potentially) better approximate real motion of objects
  • Require larger bit rate for transmitting motion parameters
  • Require more general and more complex interpolation methods
  • Observed coding gains typically do not justify additional complexity

SLIDE 37

Increased Flexibility of Motion Description / Higher Order Motion Models

Affine Motion Model in H.266 | VVC

Motion Models in H.266 | VVC
  • Most blocks: Translational model
  • Affine model (with 4 or 6 parameters)

Coding of Affine Motion Parameters
  • Use motion vector of 2 or 3 control points
  • Explicit: Conventional predictive coding
  • Merge: Derive control point motion vectors from neighborhood

Motion Compensation for Affine Blocks
  • Derive motion vectors for 4×4 sub-blocks, with a precision of 1/16-th sample
  • Conventional motion compensation for 4×4 blocks
  • VVC includes interpolation filters for 16 phases

[Figure: 6-parameter model with 3 control point motion vectors; 4-parameter model with 2 control point motion vectors]
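How sub-block motion vectors follow from three control point motion vectors can be sketched as below. This is an illustration of the 6-parameter affine model, not the normative VVC derivation: the function name is made up, the control points are assumed to sit at the top-left, top-right, and bottom-left block corners, the model is evaluated at the center of each 4×4 sub-block, and floating-point arithmetic with rounding to 1/16 sample replaces the fixed-point arithmetic of a real codec.

```python
def affine_subblock_mvs(cp0, cp1, cp2, width, height, sub=4):
    """Motion vector per sub x sub block from 3 control point motion vectors.

    cp0, cp1, cp2: vectors at the top-left, top-right, bottom-left corners
    (assumed positions). Returns {(x0, y0): (mx, my)} in sample units.
    """
    mvs = {}
    for y0 in range(0, height, sub):
        for x0 in range(0, width, sub):
            xc, yc = x0 + sub / 2, y0 + sub / 2      # sub-block center
            mv = []
            for i in range(2):                       # x and y component
                v = (cp0[i]
                     + (cp1[i] - cp0[i]) * xc / width
                     + (cp2[i] - cp0[i]) * yc / height)
                mv.append(round(v * 16) / 16)        # 1/16-sample precision
            mvs[(x0, y0)] = tuple(mv)
    return mvs


# Example: zoom-like motion on an 8x8 block.
field = affine_subblock_mvs((0, 0), (8, 0), (0, 8), width=8, height=8)
for pos, mv in sorted(field.items()):
    print(pos, mv)
```

Because every 4×4 sub-block ends up with an ordinary translational vector, the existing block-based motion compensation and interpolation filters can be reused, which is the design choice stated on the slide.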

SLIDE 38

Summary

Summary of Lecture

Partitioning into Blocks for Motion Compensation
  • Providing variable block sizes significantly improves coding efficiency
  • Suitable concept: Quadtree with additional non-square partitioning options

Motion Parameter Coding
  • Predictive coding: Derive predictor from neighborhood
  • Entropy coding of difference to predictor
  • Modes with inferred motion: Temporal direct mode, merge mode

Motion Estimation
  • Compromise between complexity and coding efficiency:
      • Fast integer search with SAD as distortion measure
      • Sub-sample refinement with Hadamard SAD as distortion measure
  • Use Lagrangian costs that take into account rate for sending motion data
