Towards Low Partition Overhead in Image Decomposition SCI (2003) Ju - - PowerPoint PPT Presentation

SLIDE 1

Towards Low Partition Overhead in Image Decomposition

SCI (2003)

Ju Wang, University of Florida

SLIDE 2

Motivation and Goals

  • Image decomposition is required to parallelize many image processing algorithms

  • The choice of decomposition scheme affects the performance gain obtained from parallelism

  • The decomposition scheme determines the additional overhead associated with inter-processor communication, which is caused by:
    – local dependency among image pixels
    – communication delay in cluster and distributed-memory systems

  • This overhead should be minimized, especially for real-time image (video) processing such as parallelized video encoding.

  • Goal: develop image decomposition algorithms that yield low partition overhead, and demonstrate one particular application in MPEG-2 video decoding.

SLIDE 3

Model of the Parallelized Image Algorithm

  • The image processing algorithms considered here are decomposable
  • Parallel processing is achieved by executing multiple identical threads
  • Each thread works on an assigned area of the original image
  • There is no dependency among parallel threads
  • The results of individual threads are merged in a master node
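As a minimal sketch of this model (not the authors' code), the Python below runs identical workers on horizontal strips of an image, applies a mean (smoothing) filter in each, and lets a "master" concatenate the strips. The strip layout, the mean filter, and the names `filter_rows` and `parallel_mean_filter` are assumptions for illustration. Workers only read the shared input, so there is no dependency among them; each one does, however, read an r-row halo owned by its neighbors, which is exactly the partition overhead discussed in the following slides. Threads here show the structure rather than a speedup, since CPython's GIL serializes this pure-Python work.

```python
from concurrent.futures import ThreadPoolExecutor

def filter_rows(img, y0, y1, r=1):
    """Mean filter over a (2r+1) x (2r+1) window for output rows y0..y1-1.

    The worker reads only input rows y0-r .. y1+r-1: its own strip plus an
    r-row halo borrowed from neighboring strips (the partition overhead).
    Windows are clipped at the image border.
    """
    h, w = len(img), len(img[0])
    lo, hi = max(0, y0 - r), min(h, y1 + r)
    local = img[lo:hi]  # the only input data this worker touches
    out = []
    for y in range(y0, y1):
        row = []
        for x in range(w):
            vals = [local[yy - lo][xx]
                    for yy in range(max(0, y - r), min(h, y + r + 1))
                    for xx in range(max(0, x - r), min(w, x + r + 1))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out

def parallel_mean_filter(img, n_threads, r=1):
    """Run identical workers on horizontal strips, then merge in the master."""
    h = len(img)
    bounds = [(k * h // n_threads, (k + 1) * h // n_threads) for k in range(n_threads)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        strips = list(pool.map(lambda b: filter_rows(img, b[0], b[1], r), bounds))
    return [row for strip in strips for row in strip]  # merge strips in order

img = [[(x * 7 + y * 13) % 10 for x in range(9)] for y in range(8)]
print(parallel_mean_filter(img, 3) == filter_rows(img, 0, len(img)))  # True
```

Because every worker sees its full halo, the merged result is identical to filtering the whole image sequentially.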
SLIDE 4

Image Decomposition Problem Description

Notations:

  • pixel set I: the set of all pixels of a rectangular image with width w and height h

  • local dependency $f: I \to 2^I$, which maps a pixel in I to a subset of I; f is determined by the specific image processing algorithm

  • let $\{P_1, P_2, \ldots, P_N\}$ be an N-way disjoint partition of image I, where $P_k \subset I$ is the k-th part

Definition of Decomposition Overhead

$$PO(P, N, f) = \sum_{k=1}^{N} \left| \bigcup_{i \in P_k} \{\, j \in f(i) : j \notin P_k \,\} \right|$$
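The definition can be evaluated directly on small images. This Python sketch is illustrative only: it assumes f(i) is the (2r+1) × (2r+1) square neighborhood of pixel i clipped to the image, represents a partition as a 2D list of part labels, and counts each external pixel once per part that references it (taking the inner aggregation over i ∈ P_k as a union). The name `partition_overhead` is hypothetical.

```python
def partition_overhead(labels, r):
    """PO(P, N, f): for each part k, count the distinct pixels outside P_k that
    some pixel of P_k depends on, then sum over all parts.

    Assumption: f(i) is the (2r+1) x (2r+1) square around pixel i, clipped to
    the image; labels[y][x] is the index of the part containing pixel (y, x).
    """
    h, w = len(labels), len(labels[0])
    external = {}  # part label -> set of outside pixels that part must read
    for y in range(h):
        for x in range(w):
            k = labels[y][x]
            refs = external.setdefault(k, set())
            for yy in range(max(0, y - r), min(h, y + r + 1)):
                for xx in range(max(0, x - r), min(w, x + r + 1)):
                    if labels[yy][xx] != k:
                        refs.add((yy, xx))
    return sum(len(s) for s in external.values())

strips = [[y // 2 for x in range(8)] for y in range(8)]                 # four 2-row strips
blocks = [[(y // 4) * 2 + x // 4 for x in range(8)] for y in range(8)]  # 2 x 2 blocks
print(partition_overhead(strips, 1), partition_overhead(blocks, 1))     # 48 36
```

On an 8 × 8 image with r = 1, the same four-way split costs 48 external pixels as strips but only 36 as square blocks, which previews why partition shape matters.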

SLIDE 5

Determining Communication Overhead

The amount of additional image data that must be accessed by other processing threads is determined by

  • the pattern of local dependency defined by the image processing algorithm
    – e.g. in an image smoothing filter, the size of the kernel mask determines the neighborhood around the target pixel
  • the partition method used

A good partition keeps the partition overhead within a reasonable range.

By contrast, a poorly partitioned image might require each thread to access the whole image. Such a bad partition can be constructed by zig-zag scanning the image and assigning pixels to the threads in round-robin order.
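To see how bad such a partition is, here is a minimal sketch under stated assumptions: "zig-zag" is taken to mean a serpentine (boustrophedon) row scan, f is a 3 × 3 neighborhood (r = 1), and the helper names are hypothetical. For each thread it reports the fraction of the image the thread must read, i.e. its own pixels plus their neighborhoods.

```python
def serpentine_round_robin(h, w, n):
    """Hypothetical 'bad' partition: scan rows in serpentine order,
    dealing pixels to n threads in turn."""
    labels = [[0] * w for _ in range(h)]
    idx = 0
    for y in range(h):
        cols = range(w) if y % 2 == 0 else range(w - 1, -1, -1)
        for x in cols:
            labels[y][x] = idx % n
            idx += 1
    return labels

def reference_fraction(labels, r):
    """Per part: fraction of the image it must read (own pixels plus
    their clipped (2r+1) x (2r+1) neighborhoods)."""
    h, w = len(labels), len(labels[0])
    areas = {}
    for y in range(h):
        for x in range(w):
            s = areas.setdefault(labels[y][x], set())
            for yy in range(max(0, y - r), min(h, y + r + 1)):
                for xx in range(max(0, x - r), min(w, x + r + 1)):
                    s.add((yy, xx))
    return {k: len(s) / (h * w) for k, s in sorted(areas.items())}

strips = [[y // 4 for x in range(16)] for y in range(16)]  # four 4-row strips
print(reference_fraction(serpentine_round_robin(16, 16, 4), 1))  # {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0}
print(reference_fraction(strips, 1))  # {0: 0.3125, 1: 0.375, 2: 0.375, 3: 0.3125}
```

On a 16 × 16 image with 4 threads, every round-robin thread must read the entire image, whereas each horizontal strip reads at most 37.5% of it.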

SLIDE 6

Decomposition Overhead Example

[Figure: an image partitioned into four parts A, B, C, D]

  • Total length of the internal partition edges
  • Diameter of the neighboring area
  • Since the neighborhood is fixed by the algorithm, the problem reduces to searching for the partition with the shortest total partition circumference
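To make the edge-length criterion concrete, a minimal sketch (assuming pixel-grid partitions stored as a 2D list of part labels, a representation chosen for illustration; `internal_edge_length` is a hypothetical name) counts the unit edges between 4-adjacent pixels that belong to different parts.

```python
def internal_edge_length(labels):
    """Count unit pixel edges shared by horizontally or vertically
    adjacent pixels that lie in different parts."""
    h, w = len(labels), len(labels[0])
    length = 0
    for y in range(h):
        for x in range(w):
            if x + 1 < w and labels[y][x] != labels[y][x + 1]:
                length += 1
            if y + 1 < h and labels[y][x] != labels[y + 1][x]:
                length += 1
    return length

w = 16
strips = [[y // 4 for x in range(w)] for y in range(w)]                 # four horizontal strips
blocks = [[(y // 8) * 2 + x // 8 for x in range(w)] for y in range(w)]  # 2 x 2 blocks
print(internal_edge_length(strips), internal_edge_length(blocks))       # 48 32
```

For a 16 × 16 image, four strips have 48 units of internal edge while the 2 × 2 block layout has 32. With a (2r+1) × (2r+1) neighborhood, each unit of internal edge costs roughly r pixels on each side, apart from corner effects, so the shorter circumference translates directly into less referenced data.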

SLIDE 7

More Partition Examples

[Figure: four alternative 4-piece partitions (a), (b), (c), (d) of the image into parts A, B, C, D]

SLIDE 8

More Partition Examples

[Figure: (a) another 4-piece partition into parts A, B, C, D; (b) the reference area of piece A]

SLIDE 9

Lower Bound Analysis

Fact: among all shapes of a given area, the circle has the shortest circumference. Assume that the image is a perfect square with edge length w.

  • If the image is divided into N disjoint pieces of equal area, the area of each part must be $w^2/N$.
  • The radius of the circles in the ideal partition is $r = \sqrt{w^2/(N\pi)}$.
  • This (pseudo) partition into N circles would be the optimal one, with the shortest overall circumference.
  • The circumference of each circle is $c = 2\pi r = 2\sqrt{\pi w^2/N}$.
  • The total circumference of the N circles becomes $T_c = N c = 2\sqrt{\pi N w^2} = 2w\sqrt{\pi N}$.
  • Subtract the external circumference of the original image, $c_e = 4w$, and halve the remainder, since each internal edge is shared by two adjacent parts:

$$c_i(w, N) = \frac{T_c - c_e}{2} = w\sqrt{\pi N} - 2w$$
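The bound can be checked numerically. By the isoperimetric inequality, any part of area w²/N has perimeter at least 2√(πw²/N), so the sum of all part perimeters is at least T_c; that sum equals twice the internal edge length plus the 4w outer boundary, so c_i(w, N) bounds the internal edge length of every equal-area partition. The sketch below compares c_i with an H-V partition, which it assumes (hypothetically) to be an a × b grid with a · b = N and a, b as close as possible; the function names are illustrative.

```python
import math

def lower_bound_internal_edge(w, n):
    """c_i(w, N) = w * sqrt(pi * N) - 2 * w for a w x w image split into N equal-area parts."""
    return w * math.sqrt(math.pi * n) - 2 * w

def hv_internal_edge(w, n):
    """Assumed H-V partition: an a x b grid with a * b = n and a <= b as close as possible.
    It has (a - 1) full-width cuts and (b - 1) full-height cuts, each of length w."""
    a = max(d for d in range(1, math.isqrt(n) + 1) if n % d == 0)
    b = n // a
    return (a - 1) * w + (b - 1) * w

for n in (4, 16, 64):
    print(n, round(lower_bound_internal_edge(512, n)), hv_internal_edge(512, n))
# 4 791 1024
# 16 2606 3072
# 64 6236 7168
```

For square grids the ratio tends to 2/√π ≈ 1.13 as N grows, because a square grid has internal edge about 2w√N versus the bound w√(πN).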

SLIDE 10

Horizontal-Vertical Partition and Lower Bound Results

[Plot: partition overhead of the pseudo ideal partition (−*−) and the HV partition (−o−) for 10 to 70 partitions]

  • Partition overhead grows as the number of partitions increases.
  • The H-V partition is comparable to the ideal partition in small-scale parallel settings.
SLIDE 11

Problem Formulation

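  • Quadratic assignment problem:

$$\min \sum_{i,j \in \{1,\dots,N\}} \; \sum_{k,h \in \{1,\dots,K\}} X_{i,k} \, X_{j,h} \, d_{k,h} \, r_{i,j}$$

  • $X_{i,j} = 1$ when macroblock i is assigned to the j-th processor
  • $X_{i,j} \in \{0, 1\}$, $\sum_i X_{i,j} = H \cdot V / K$ (every processor receives the same number of macroblocks), and $\sum_j X_{i,j} = 1$ (every macroblock is assigned to exactly one processor)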
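For very small instances the objective can be minimized by exhaustive search, which shows why a heuristic is needed: the quadratic assignment problem is NP-hard in general, and the search space here grows as K^(H·V). In this sketch the macroblock layout, the dependency weights r (1 for 4-adjacent macroblocks, else 0) and the processor costs d (0 on the same processor, 1 otherwise) are hypothetical choices. Since X_{i,k} = 1 exactly when macroblock i goes to processor k, the double sum over k, h reduces to d[a(i)][a(j)], where a(i) is the processor of macroblock i.

```python
from itertools import product

def qap_cost(assign, d, r):
    """Objective sum_{i,j} sum_{k,h} X[i][k] X[j][h] d[k][h] r[i][j], with
    X[i][k] = 1 iff assign[i] == k, so the inner sum is d[assign[i]][assign[j]]."""
    n = len(assign)
    return sum(d[assign[i]][assign[j]] * r[i][j] for i in range(n) for j in range(n))

def brute_force_qap(H, V, K, d, r):
    """Try every assignment that gives each processor exactly H*V/K macroblocks."""
    n = H * V
    best = None
    for assign in product(range(K), repeat=n):
        if any(assign.count(k) != n // K for k in range(K)):
            continue  # violates the balanced-load constraint
        cost = qap_cost(assign, d, r)
        if best is None or cost < best[0]:
            best = (cost, assign)
    return best

H, V, K = 4, 2, 2  # hypothetical: 4 x 2 macroblocks, 2 processors
n = H * V
# hypothetical r: 1 for 4-neighboring macroblocks (index = row * H + col)
r = [[1 if abs(i // H - j // H) + abs(i % H - j % H) == 1 else 0 for j in range(n)] for i in range(n)]
d = [[0 if k == h else 1 for h in range(K)] for k in range(K)]  # unit cost across processors
print(brute_force_qap(H, V, K, d, r))  # (4, (0, 0, 1, 1, 0, 0, 1, 1))
```

The optimum splits the 4 × 2 macroblock grid into a left and a right 2 × 2 block, the same shortest-cut shape favored by the geometric analysis.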

SLIDE 12

Heuristic Partition Algorithm

  • divide and conquer
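  • the image area is divided into two parts at each step
  • the cut always spans the shorter dimension of the image part being divided, keeping the new edge short
  • the division is balanced: the area of each sub-image is proportional to the number of parts it will be further divided into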
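A minimal recursive-bisection sketch of these rules, not the authors' implementation. It assumes that a region to be cut into n parts is split into ⌈n/2⌉ and ⌊n/2⌋ parts (consistent with the partition(r0,7), partition(r1,4), partition(r2,3) calls in the example that follows), that each side's area is proportional to its part count, and that the cut spans the shorter dimension. Rectangles are (x, y, width, height) tuples; `partition` and `total_cut_length` are hypothetical names.

```python
def partition(rect, n):
    """Recursive bisection: split rect = (x, y, width, height) into n rectangles."""
    x, y, w, h = rect
    if n == 1:
        return [rect]
    n1, n2 = (n + 1) // 2, n // 2  # e.g. 7 -> 4 + 3
    if w >= h:                     # wide region: vertical cut, length h (shorter side)
        w1 = w * n1 // n           # width proportional to part count
        return partition((x, y, w1, h), n1) + partition((x + w1, y, w - w1, h), n2)
    h1 = h * n1 // n               # tall region: horizontal cut, length w (shorter side)
    return partition((x, y, w, h1), n1) + partition((x, y + h1, w, h - h1), n2)

def total_cut_length(parts, width, height):
    """Each internal edge is shared by two parts: halve (sum of perimeters - outer perimeter)."""
    total = sum(2 * (w + h) for _, _, w, h in parts)
    return (total - 2 * (width + height)) // 2

parts = partition((0, 0, 720, 480), 7)
print(len(parts), total_cut_length(parts, 720, 480))  # 7 1989
```

For a 720 × 480 picture split into 7 parts, this produces cuts totalling 1989 pixels of internal edge; integer division can leave sub-images differing by one pixel in size.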
SLIDE 13

Partition Example with Proposed Heuristic Algorithm

[Figure: recursive calls produced when partitioning region r0 into 7 parts: partition(r0,7), partition(r1,4), partition(r2,3), partition(r3,2), partition(r4,2), partition(r5,2)]

SLIDE 14

Performance Results

[Plot: Reference Data In Data Partition Algorithms. x-axis: number of partitions; y-axis: amount of reference data per picture (Mbits); curves: horizontal partition (−.−), vertical partition (−*−), HV partition (−+−), quick partition (−d−); search window size = 32 pixels; picture size: 720 × 480]

SLIDE 15

Conclusion

  • Discussed the challenge of image decomposition in parallel image processing
  • Provided an analysis of the lower bound of partition overhead
  • Proposed a heuristic algorithm that produces good partitions with low decomposition overhead
  • Experiments with this algorithm in a parallel MPEG-2 video decoder show positive preliminary results

Thank You!