Towards Low Partition Overhead in Image Decomposition
SCI (2003)
Ju Wang, University of Florida
Motivation and Goals
- Image decomposition is required in parallelizing many image processing
algorithms
- Different decomposition solutions can affect the performance gain of the
parallelism
- Decomposition scheme determines the additional overhead associated
with inter-processor communication, due to
  – Local dependency of image pixels
  – Communication delay in cluster and distributed memory systems
- This overhead should be minimized, especially for real-time image
(video) processing, such as parallelized video encoding.
- Goal: develop image decomposition algorithms that result in
low partition overhead, and demonstrate one particular application in MPEG-2 video decoding.
Model of the Parallelized Image Algorithm
- The image processing algorithms considered here are decomposable
- Parallel processing is achieved by executing multiple identical threads
- Each thread works on an assigned area of the original image
- There is no dependency among parallel threads
- The results of individual threads are merged in a master node
Image Decomposition Problem Description
Notations:
- pixel set $I$: the set of all pixels of a rectangular image with width $w$
and height $h$
- local dependency $f : I \to 2^I$, which maps a pixel in $I$ to a subset of $I$; $f$ is determined by the specific image processing algorithm
- $\{P_1, P_2, \ldots, P_N\}$: an $N$-way disjoint partition of image $I$, where $P_k \subset I$ is
the $k$th part
Definition of Decomposition Overhead
$$PO(P, N, f) = \sum_{k=1}^{N} \sum_{i \in P_k} \left| \{\, j \in f(i) : j \notin P_k \,\} \right|$$
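The definition can be evaluated directly on a small image. The following is a minimal sketch, assuming the overhead counts every (pixel, outside dependency) pair and that f is a square neighborhood clipped to the image; the names `square_neighborhood` and `partition_overhead` are illustrative and do not come from the slides.

```python
from itertools import product


def square_neighborhood(radius, width, height):
    """Build an assumed dependency function f: each pixel depends on all
    pixels within `radius` in x and y, clipped to the image borders."""
    def f(pixel):
        x, y = pixel
        return {
            (x + dx, y + dy)
            for dx, dy in product(range(-radius, radius + 1), repeat=2)
            if 0 <= x + dx < width and 0 <= y + dy < height
        }
    return f


def partition_overhead(parts, f):
    """Sum, over all parts and all pixels in each part, the number of
    dependencies that lie outside the pixel's own part."""
    total = 0
    for part in parts:
        for pixel in part:
            total += sum(1 for j in f(pixel) if j not in part)
    return total


# A 4x4 image split into a left half and a right half.
width, height = 4, 4
left = {(x, y) for x in range(0, 2) for y in range(height)}
right = {(x, y) for x in range(2, 4) for y in range(height)}
f = square_neighborhood(1, width, height)

print(partition_overhead([left, right], f))
```

Only the pixels in the two columns next to the cut contribute, which is why the overhead depends on the length of the internal boundary and on the size of the neighborhood.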
Determining Communication Overhead
The amount of additional image data that must be accessed by other processing threads is determined by
- the pattern of local dependence imposed by the image processing
algorithm
  – In an image smoothing filter, the size of the kernel mask determines the neighborhood surrounding the target pixel.
- the partition method used
A good partition can keep the partition overhead within a reasonable range, while an ill-partitioned image might require each thread to access the whole image. Such a bad partition can be constructed by zig-zag scanning the image and assigning pixels to the threads in round-robin order.
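The contrast between a good and a bad partition can be illustrated on a toy image. This is a sketch under stated assumptions: a 3x3 neighborhood (radius 1), an 8x8 image, four threads, and horizontal bands as the "good" partition; the function names are hypothetical.

```python
def horizontal_bands(width, height, n):
    """Label map with n horizontal bands of equal height."""
    return [[y * n // height for _ in range(width)] for y in range(height)]


def zigzag_round_robin(width, height, n):
    """Label map built by zig-zag scanning the image and handing
    pixels to the n threads in round-robin order."""
    labels = [[0] * width for _ in range(height)]
    index = 0
    for y in range(height):
        xs = range(width) if y % 2 == 0 else range(width - 1, -1, -1)
        for x in xs:
            labels[y][x] = index % n
            index += 1
    return labels


def foreign_pixels_needed(labels, radius):
    """For each part, count the distinct pixels owned by other parts
    that fall inside the neighborhood of at least one of its pixels."""
    height, width = len(labels), len(labels[0])
    needed = {}
    for y in range(height):
        for x in range(width):
            k = labels[y][x]
            for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                    if labels[ny][nx] != k:
                        needed.setdefault(k, set()).add((nx, ny))
    return {k: len(pixels) for k, pixels in needed.items()}


bands = foreign_pixels_needed(horizontal_bands(8, 8, 4), radius=1)
scattered = foreign_pixels_needed(zigzag_round_robin(8, 8, 4), radius=1)
print(sum(bands.values()), sum(scattered.values()))
```

In this toy case each round-robin thread ends up needing every pixel it does not own, whereas each band only needs the rows that touch its borders.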
Decomposition Overhead Example
[Figure: an image divided into four parts A, B, C and D]
- Total length of internal partition edge
- Diameter of neighboring area
- The problem reduces to searching for the partition with the shortest partition
circumference
More Partition Examples
[Figure: four alternative partitions (a)-(d) of an image into parts A, B, C and D]
[Figure: (a) another 4-piece partition; (b) the reference area of piece A]
Lower Bound Analysis
Fact: for a given area, the shape with the shortest circumference is the circle. Assume that the image is a perfect square with edge length $w$.
- The image is to be divided into $N$ disjoint pieces of equal area, so the area
of each part must be $w^2/N$.
- The radius of each circle in the ideal partition is $r = \sqrt{w^2/(N\pi)}$.
- Such a partition, if it existed, would be the optimal one, with the shortest overall circumference.
- The circumference of each circle is $c = 2\pi r = 2\sqrt{\pi w^2/N}$.
- The total circumference of the $N$ circles becomes $T_c = N c = 2\sqrt{\pi N w^2} = 2w\sqrt{\pi N}$.
- Subtract the external circumference of the original image, $c_e = 4w$, and halve the remainder, since every internal edge is shared by two parts:
$$c_i(w, N) = \frac{T_c - c_e}{2} = w\sqrt{\pi N} - 2w$$
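The bound $c_i(w, N) = w\sqrt{\pi N} - 2w$ can be compared numerically with a simple grid partition. The sketch below assumes the grid is a regular arrangement of rows x cols rectangles on a square image of edge length w, whose internal edge length is (rows - 1)·w + (cols - 1)·w; both function names are illustrative.

```python
import math


def ideal_internal_boundary(w, n):
    """Internal boundary length of the circle-based bound:
    c_i(w, N) = w * sqrt(pi * N) - 2 * w."""
    return w * math.sqrt(math.pi * n) - 2 * w


def grid_internal_boundary(w, rows, cols):
    """Internal edge length of an assumed rows x cols grid partition of a
    w x w image: (rows - 1) horizontal cuts plus (cols - 1) vertical cuts,
    each of length w."""
    return (rows - 1) * w + (cols - 1) * w


w = 100
for rows, cols in [(2, 2), (4, 4), (8, 8)]:
    n = rows * cols
    print(n, round(ideal_internal_boundary(w, n), 1), grid_internal_boundary(w, rows, cols))
```

For N = 4 the grid needs 2w of internal edge while the bound is about 1.54w; the gap widens as N grows.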
Horizontal-Vertical Partition and Lower Bound Results
[Figure: plot comparing the pseudo-ideal partition (−*−) with the HV partition (−o−)]
- Partition overhead grows as the number of partitions increases.
- H-V partition is comparable to the ideal partition in small-scale parallel settings.
Problem Formulation
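- quadratic assignment problem:
$$\min \sum_{i,j \in \{1,\ldots,N\}} \sum_{k,h \in \{1,\ldots,K\}} X_{i,k}\, X_{j,h}\, d_{k,h}\, r_{i,j}$$
- $X_{i,j} = 1$ when macroblock $i$ is assigned to the $j$th processor.
- $X_{i,j} \in \{0, 1\}$, $\sum_i X_{i,j} = \frac{H \cdot V}{K}$, $\sum_j X_{i,j} = 1$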
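The objective can be evaluated for a candidate assignment without building the full matrix X. The sketch below is an illustration only: the slides do not define d and r, so it assumes r[i][j] measures how much macroblock i references macroblock j and d[k][h] is the cost of moving data between processors k and h. An assignment list `a` with `a[i] = k` stands in for X (X[i][k] = 1 exactly when a[i] == k), so each macroblock is on one processor by construction.

```python
def qap_cost(assignment, d, r):
    """Sum of r[i][j] * d[a[i]][a[j]] over all macroblock pairs, which equals
    the double sum over X[i][k] * X[j][h] * d[k][h] * r[i][j]."""
    n = len(assignment)
    return sum(
        r[i][j] * d[assignment[i]][assignment[j]]
        for i in range(n)
        for j in range(n)
    )


def is_balanced(assignment, num_processors):
    """Check that every processor receives the same number of macroblocks."""
    share = len(assignment) // num_processors
    return all(assignment.count(k) == share for k in range(num_processors))


# Hypothetical toy data: four macroblocks in a 2x2 arrangement
# (0 1 / 2 3), where r marks adjacent macroblocks, and two processors
# with unit cost for any transfer between them.
r = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]
d = [
    [0, 1],
    [1, 0],
]

rows_split = [0, 0, 1, 1]   # top row on processor 0, bottom row on processor 1
diagonal = [0, 1, 1, 0]     # diagonal macroblocks share a processor

print(qap_cost(rows_split, d, r), qap_cost(diagonal, d, r))
```

Under these assumed costs the row split is cheaper than the diagonal assignment, because fewer adjacent macroblocks end up on different processors.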
Heuristic Partition Algorithm
- divide and conquer
- image area is divided into two parts each time
- always try the shortest dimension when dividing an image part
- the division is balanced when determining the areas of the sub-images
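The steps above can be sketched as a recursive bisection. This is one possible reading, not the authors' implementation: it assumes that "shortest dimension" means placing the cut so that its length equals the shorter side of the region, and that "balanced" means splitting the area in proportion to the number of parts assigned to each half. The rectangle format `(x, y, w, h)` and the rounding rule are also assumptions.

```python
def partition(rect, n):
    """Recursively split rect = (x, y, w, h) into n rectangles."""
    x, y, w, h = rect
    if n == 1:
        return [rect]

    # Split the part count as evenly as possible, e.g. 7 -> 4 and 3.
    n1 = (n + 1) // 2
    n2 = n - n1

    if w >= h:
        # Cut perpendicular to the longer (horizontal) side, so the cut
        # line has the length of the shorter side h.
        w1 = round(w * n1 / n)
        first = (x, y, w1, h)
        second = (x + w1, y, w - w1, h)
    else:
        # Cut perpendicular to the longer (vertical) side.
        h1 = round(h * n1 / n)
        first = (x, y, w, h1)
        second = (x, y + h1, w, h - h1)

    return partition(first, n1) + partition(second, n2)


parts = partition((0, 0, 720, 480), 7)
for part in parts:
    print(part)
```

The sketch does not guard against regions too small to split further; a real implementation would also need to align cuts to macroblock boundaries.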
Partition Example with Proposed Heuristic Algorithm
[Figure: example partition annotated with the calls partition(r0,7), partition(r1,4), partition(r2,3), partition(r3,2), partition(r4,2), partition(r5,2)]
Performance Results
[Figure: Reference Data in Data Partition Algorithms. x-axis: number of partitions; y-axis: amount of reference data per picture (Mbits). Curves: horizontal partition (−.−), vertical partition (−*−), HV partition (−+−), quick partition (−d−). Search window size: 32 pixels; picture size: 720*480.]
Conclusion
- Discussed the challenge of image decomposition in parallel image
processing
- Provided an analysis of the lower bound of the partition overhead
- Proposed a heuristic algorithm that can produce good partitions with
low decomposition overhead
- Our experiments of this algorithm in a parallel MPEG-2 video decoder