Towards Efficient Video Compression Using Scalable Vector Graphics



SLIDE 1

Towards Efficient Video Compression Using Scalable Vector Graphics on the Cell Broadband Engine

Andreea Sandu, Emil Slusanschi, Alin Murarasu, Andreea Serban, Alexandru Herisanu, Teodor Stoenescu

University Politehnica of Bucharest, Computer Science and Engineering Department

International Workshop on Multi-core Software Engineering, Cape Town, South Africa, May 1st 2010

SLIDE 2

Outline

  • Video Codecs & Image Characteristics
  • NURBS Curves
  • Image Representation
  • Image Encoding
  • Porting to the Cell/B.E.
  • Results
  • Related Projects @cs.pub.ro
  • Conclusions & Outlook
SLIDE 3

Video Codecs

  • A software program or library
  • Encodes/decodes the video component of a movie/clip in a digital format
  • Aim: create a decoder using Scalable Vector Graphics (SVG)
  • Advantages of SVG:
    – Data compression: efficient representation
    – Lossless display at any resolution: shape preservation
  • Disadvantages of SVG: difficult conversion from raster

[Figure: Vector vs. Raster]

SLIDE 4

NURBS Curves

  • NURBS = Non-Uniform Rational B-Splines
  • Can be used to represent curves and surfaces
  • Used extensively in Computer Aided Design (CAD)
  • Parameters:
    – Degree (1, 2, 3, 5, …)
    – Control points & weights
    – Knots
  • NURBS advantages:
    – Invariant under affine transformations such as scaling
    – Computable with stable algorithms (e.g. de Boor)
    – Can represent complex features with few parameters
    – A curve can be handled easily through its parameters
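To illustrate how few parameters a NURBS curve needs, here is a minimal Python sketch of de Boor's recurrence applied in homogeneous coordinates. It is not the codec's implementation; the function names (`find_span`, `nurbs_point`) and the quarter-circle test data are assumptions chosen for the example.

```python
import math
from bisect import bisect_right


def find_span(knots, degree, u):
    """Return k such that knots[k] <= u < knots[k+1], clamped to the valid range."""
    last = len(knots) - degree - 2  # index of the last control point
    if u >= knots[last + 1]:
        return last
    return max(degree, bisect_right(knots, u) - 1)


def nurbs_point(ctrl, weights, knots, degree, u):
    """Evaluate a 2D NURBS curve at parameter u with de Boor's recurrence."""
    k = find_span(knots, degree, u)
    # Homogeneous coordinates (w*x, w*y, w) turn the rational curve into an
    # ordinary B-spline, which de Boor's algorithm evaluates directly.
    d = [
        [ctrl[i][0] * weights[i], ctrl[i][1] * weights[i], weights[i]]
        for i in range(k - degree, k + 1)
    ]
    for r in range(1, degree + 1):
        for j in range(degree, r - 1, -1):
            i = j + k - degree
            denom = knots[i + degree - r + 1] - knots[i]
            alpha = 0.0 if denom == 0 else (u - knots[i]) / denom
            d[j] = [(1 - alpha) * a + alpha * b for a, b in zip(d[j - 1], d[j])]
    x, y, w = d[degree]
    return (x / w, y / w)  # project back from homogeneous coordinates


# Assumed example data: a quarter of the unit circle as a degree-2 NURBS curve
# with three control points, three weights and six knots.
ctrl = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
weights = [1.0, math.sqrt(2) / 2, 1.0]
knots = [0, 0, 0, 1, 1, 1]
print(nurbs_point(ctrl, weights, knots, 2, 0.5))  # approximately (0.7071, 0.7071)
```

Twelve numbers describe the whole arc exactly, which is the kind of compact representation the slide refers to.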

SLIDE 5

NURBS Conversion

  • Polygonal approximation:
    – Curve evaluation with de Boor’s algorithm
    – Initial approximation to curve knots
    – Iterative process of adding nodes
  • Integrated in ffmpeg & ogg & used in the VLC player

[Figure: Video frame → Internal Representation]
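The iterative node-adding step of a polygonal approximation can be sketched as follows. This is an assumed, generic refinement loop, not the authors' code: it starts from a few parameter values (for a NURBS curve these could be the knots), and inserts a midpoint node wherever the curve deviates from the chord by more than a tolerance. The names `polygonal_approximation`, `tol` and `max_passes` are hypothetical, and only the midpoint of each interval is checked.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def polygonal_approximation(curve, start_params, tol=1e-3, max_passes=20):
    """Refine a polyline until each interval's midpoint is within tol of its chord."""
    params = sorted(set(start_params))
    for _ in range(max_passes):
        refined = [params[0]]
        changed = False
        for u0, u1 in zip(params, params[1:]):
            mid = 0.5 * (u0 + u1)
            err = point_segment_distance(curve(mid), curve(u0), curve(u1))
            if err > tol:
                refined.append(mid)  # add a node where the chord is too far off
                changed = True
            refined.append(u1)
        params = refined
        if not changed:
            break
    return [curve(u) for u in params]


# Assumed example: a quarter circle given as a plain parametric function.
def quarter_circle(u):
    return (math.cos(u * math.pi / 2), math.sin(u * math.pi / 2))


polyline = polygonal_approximation(quarter_circle, [0.0, 1.0], tol=1e-3)
print(len(polyline), "nodes")
```

Any curve evaluator with the same call shape, such as a de Boor based one, could be passed in place of `quarter_circle`.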

SLIDE 6

Image Encoding

[Figure: encoding pipeline, Video frame → Despeckling → Quantization → Follow curves → NURBS; intermediate data labelled Raster, Colors, Edges, Curves]

  • Modular design: stage algorithms can be treated independently
  • Despeckling & noise filtering
    – Create big zones of the same color, similar to AutoTrace, by smoothing/combining neighboring pixels of similar colors
  • Color quantization
    – Create a new color scale
    – The algorithm is based on octrees
    – Reduce the number of colors in order to reduce the image size in the vector representation; this loses detail/quality vs. the original image
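The despeckling idea (combine neighboring pixels of similar color into larger uniform zones) can be illustrated with a small Python sketch. This is not AutoTrace's algorithm nor the codec's; it is an assumed 3x3 filter in which `tolerance` and `min_support` are hypothetical parameters.

```python
def color_distance(p, q):
    """Largest per-channel difference between two RGB tuples."""
    return max(abs(a - b) for a, b in zip(p, q))


def despeckle(image, tolerance=24, min_support=3):
    """Average each pixel with the 3x3 neighbours whose color is similar to it.

    A pixel with fewer than `min_support` similar pixels in its window is
    treated as a speckle and replaced by the mean of its other neighbours.
    """
    height, width = len(image), len(image[0])
    out = [list(row) for row in image]
    for y in range(height):
        for x in range(width):
            center = image[y][x]
            window = [
                (ny, nx)
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            group = [
                image[ny][nx]
                for ny, nx in window
                if color_distance(image[ny][nx], center) <= tolerance
            ]
            if len(group) < min_support and len(window) > 1:
                # Isolated speckle: absorb it into its surroundings.
                group = [image[ny][nx] for ny, nx in window if (ny, nx) != (y, x)]
            out[y][x] = tuple(sum(p[c] for p in group) // len(group) for c in range(3))
    return out


# Assumed example: a gray 5x5 frame with one red speckle in the middle.
frame = [[(100, 100, 100)] * 5 for _ in range(5)]
frame[2][2] = (255, 0, 0)
cleaned = despeckle(frame)
print(cleaned[2][2])  # (100, 100, 100)
```

Because every output pixel depends only on a small neighborhood of the input, a filter of this shape can be applied to image slices independently, apart from the rows at slice borders.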

SLIDE 7

Feature Extraction with NURBS

  • Determine zones of constant color
  • Determine edges between these zones using NURBS curves
  • Determine the knots of shared edges
  • The approximation passes through these knots
  • The approximation uses a least-squares approach

SLIDE 8

IBM’s Cell/B.E. Processor

  • Heterogeneous multi-core system architecture
    – Power Processor Element (PPE) for control tasks
    – Synergistic Processor Elements (SPEs) for data-intensive processing
  • A Synergistic Processor Element (SPE) consists of
    – Synergistic Processor Unit (SPU)
    – Synergistic Memory Flow Control (SMF)
      • Data movement and synchronization
      • Interface to the high-performance Element Interconnect Bus (EIB)

[Figure: Cell/B.E. block diagram. Eight SPEs (each with SPU, SXU, LS and SMF) and the PPE (64-bit Power Architecture with VMX; PPU, PXU, L1, L2) are attached to the EIB (up to 96B/cycle), together with the MIC (Dual XDR™) and the BIC (FlexIO™); links are labelled 16B/cycle and 32B/cycle]

SLIDE 9

Encoding – Serial Profiling

  • Profiling is done on slices of 308 x 400 pixels

[Figure: bar chart of time (ms) per phase: Despeckling, Quantize, Follow Curves]

Phase        Despeckling   Quantize   Follow Curves   (unnamed)
Time (ms)    368.8         61.051     25.822          68.884
Percentage   70.31%        11.64%     4.92%           13.13%

SLIDE 10

Porting to the Cell/B.E. Architecture

  • IBM’s Cell/B.E. is heterogeneous: PPE/SPE
  • Usual methodology for image processing algorithms
    – Divide the problem, process the data, reconstruct the results
  • Image Despeckling
    – Whole algorithm is run on the SPEs
  • Image Quantization
    – The PPE divides the frame
    – The SPEs generate the octrees
    – The PPE fuses the octrees together
  • NURBS curves
    – Entire algorithm is run on the SPEs
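The divide / process / reconstruct methodology can be sketched in Python with worker threads standing in for the SPEs. This is only an analogy under stated assumptions: real Cell/B.E. code would use the SPE runtime and DMA transfers, and the kernel here (`process_slice`, a horizontal 3-tap box blur) is a hypothetical stand-in chosen because it works row by row.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(image, parts):
    """Divide: cut the image into up to `parts` horizontal slices."""
    height = len(image)
    bounds = [height * i // parts for i in range(parts + 1)]
    return [image[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def process_slice(rows):
    """Process: stand-in kernel, a horizontal 3-tap box blur on each row."""
    out = []
    for row in rows:
        n = len(row)
        out.append(
            [(row[max(i - 1, 0)] + row[i] + row[min(i + 1, n - 1)]) // 3 for i in range(n)]
        )
    return out


def run_pipeline(image, workers=4):
    """Controller role (the PPE in the slides): divide, dispatch, reconstruct."""
    slices = split_rows(image, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(process_slice, slices))  # map keeps slice order
    # Reconstruct: concatenate the processed slices in their original order.
    return [row for part in results for row in part]


# Assumed example: a small grayscale frame stored as a list of rows.
frame = [[(x * y) % 256 for x in range(16)] for y in range(10)]
assert run_pipeline(frame, workers=4) == process_slice(frame)
```

A kernel that also reads rows above and below, such as a 2D smoothing filter, would additionally need overlapping border rows in each slice.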

SLIDE 11

Despeckling Design Tradeoffs

  • Split the image into slices and distribute them to the SPUs
  • Smoothing is done independently by each SPU
  • The PPU rebuilds the image from the processed fragments
  • Tradeoffs:
    – Slices too small: the smoothing will be exaggerated
    – Slices too big: they will not fit in the SPU local store
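The upper bound on slice size follows from the 256 KB local store of each SPE, which has to hold code as well as data. The arithmetic can be sketched as below; apart from the local store size, every number is an assumption for illustration (4 bytes per pixel, 64 KB reserved for code and stack, one input and one output buffer, one border row on each side), not a value from the presentation.

```python
LOCAL_STORE_BYTES = 256 * 1024  # local store size of one SPE


def max_slice_rows(width, bytes_per_pixel=4, reserved_bytes=64 * 1024,
                   buffers=2, halo_rows=1):
    """Largest slice height (in rows) whose buffers fit in the local store.

    All defaults are illustrative assumptions:
      reserved_bytes  space kept for program code, stack and tables
      buffers         number of slice-sized buffers (e.g. input + output)
      halo_rows       extra border rows needed above and below the slice
    """
    row_bytes = width * bytes_per_pixel
    budget = LOCAL_STORE_BYTES - reserved_bytes
    rows_total = budget // (row_bytes * buffers)
    return max(0, rows_total - 2 * halo_rows)


# Assumed example: rows of 400 pixels.
print(max_slice_rows(400))  # 59
```

Doubling the number of buffers, for instance to overlap DMA transfers with computation, roughly halves the usable slice height.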

SLIDE 12

Quantization Design Tradeoffs

  • The PPU decides if/when to reduce the number of colors
  • The SPUs generate partial color trees with a maximal number of levels by counting pixels of each color
  • The PPU combines the SPU generated trees in a global tree
  • Tradeoffs:
    – The slices are too big: generate too many partial trees and too many DMA transfers & significant overhead in the global tree reconstruction
    – The slices are too small: processing on the PPU may be more efficient
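The split between SPU-built partial trees and a PPU-built global tree can be sketched in Python as follows. The node layout (a dict with `count` and `children`), the `max_depth` limit and the most-significant-bit-first ordering are assumptions for illustration; the presentation does not specify the actual data structure.

```python
def new_node():
    """An octree node: pixel count plus up to eight children keyed 0..7."""
    return {"count": 0, "children": {}}


def build_partial_tree(pixels, max_depth=4):
    """Worker role (an SPU in the slides): count the pixels of one slice."""
    root = new_node()
    for r, g, b in pixels:
        node = root
        node["count"] += 1
        for level in range(max_depth):
            shift = 7 - level  # assumed: most significant bit first
            idx = (((r >> shift) & 1) << 2) | (((g >> shift) & 1) << 1) | ((b >> shift) & 1)
            node = node["children"].setdefault(idx, new_node())
            node["count"] += 1
    return root


def merge_trees(dst, src):
    """Controller role (the PPU in the slides): fold a partial tree into dst."""
    dst["count"] += src["count"]
    for idx, child in src["children"].items():
        merge_trees(dst["children"].setdefault(idx, new_node()), child)
    return dst


# Assumed example: two slices of a tiny image.
slice_a = [(255, 0, 0), (250, 5, 5), (0, 0, 255)]
slice_b = [(0, 0, 250), (0, 255, 0)]
global_tree = new_node()
for part in (build_partial_tree(slice_a), build_partial_tree(slice_b)):
    merge_trees(global_tree, part)
assert global_tree == build_partial_tree(slice_a + slice_b)
```

The merge visits every node of every partial tree, so its cost grows with the number and size of the partial trees, which is the reconstruction overhead the tradeoff list refers to.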

SLIDE 13

Quantization SIMD/Vectorization

  • Groups of 3 bits, one from each of the three basic color (RGB) components, form paths in the partial trees built by the SPUs:
    – bit_R<<2
    – bit_G<<1
    – bit_B<<0
  • Computing the paths serially is done with successive shifts in 8 iterations
  • The vector/SIMD version allows computing entire vector paths in the partial tree at once
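The path computation can be written out as a short Python sketch. The serial function follows the slide: 8 iterations, each combining one bit per channel as `bit_R<<2 | bit_G<<1 | bit_B<<0`. Taking the most significant bit first is an assumption. The batch function is not real SIMD code; it only mimics the data layout of a vectorized version by processing whole channel arrays level by level.

```python
def octree_path(r, g, b):
    """Serial version: 8 iterations, one child index (0..7) per tree level."""
    path = []
    for level in range(8):
        shift = 7 - level  # assumed: most significant bit first
        bit_r = (r >> shift) & 1
        bit_g = (g >> shift) & 1
        bit_b = (b >> shift) & 1
        path.append((bit_r << 2) | (bit_g << 1) | (bit_b << 0))
    return path


def octree_paths_batch(reds, greens, blues):
    """Batch version: one pass per level over separate R, G and B arrays.

    Each pass applies the same shift/mask/or to every pixel, which is the
    pattern a SIMD unit executes on several pixels per instruction.
    """
    paths = [[] for _ in reds]
    for level in range(8):
        shift = 7 - level
        indices = [
            (((r >> shift) & 1) << 2) | (((g >> shift) & 1) << 1) | ((b >> shift) & 1)
            for r, g, b in zip(reds, greens, blues)
        ]
        for path, idx in zip(paths, indices):
            path.append(idx)
    return paths


print(octree_path(128, 64, 32))  # [4, 2, 1, 0, 0, 0, 0, 0]
```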

SLIDE 14

Ongoing developments

  • Edge detection component in the quantization phase moved from the PPU to the SPUs
    – Color trees are aligned to ease transfer and processing
    – Each SPU makes a local copy & converts pixels to codes
    – After conversion, memory is released to allow edge detection passes to continue
  • Tradeoffs:
    – The edge detection algorithm generates useless edges around the current slice to avoid lots of coordinate testing
    – Big slices are good because of code serialization: no more branching code
    – Small slices generate lots of useless edges, thus increasing storage requirements

SLIDE 15

Results – Image Quality

[Figure: image quality comparison, original image vs. codec output with 4, 8 and 16 SPUs]

  • x86 codec speed: 4-6 fps
  • Compression ratio to date: 0.982 – 1.754

SLIDE 16

Results – Despeckling@SPUs

SPUs        1        2        4        8       16
Time (ms)   368.80   204.96   104.10   54.35   29.21
Speedup     1.00     1.80     3.54     6.79    12.63

[Figure: speedup vs. number of SPUs]

SLIDE 17

Results – Quantization@SPUs

SPUs        1       2       4       8       16
Time (ms)   61.05   23.83   12.64   10.29   12.06
Speedup     1.00    2.56    4.83    5.93    5.06

[Figure: speedup vs. number of SPUs]
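The speedup rows in the despeckling and quantization result tables are the 1-SPU time divided by the n-SPU time. The short Python sketch below recomputes them from the reported timings and adds the parallel efficiency (speedup divided by the number of SPUs), a derived figure that is not in the slides. The function name `speedup_table` is hypothetical.

```python
def speedup_table(times_ms):
    """Map SPU count -> (speedup, parallel efficiency) relative to 1 SPU."""
    base = times_ms[1]
    return {n: (base / t, base / t / n) for n, t in sorted(times_ms.items())}


# Timings (ms) as reported in the despeckling and quantization result tables.
despeckling = {1: 368.80, 2: 204.96, 4: 104.10, 8: 54.35, 16: 29.21}
quantization = {1: 61.05, 2: 23.83, 4: 12.64, 8: 10.29, 16: 12.06}

for name, times in (("Despeckling", despeckling), ("Quantization", quantization)):
    for n, (speedup, efficiency) in speedup_table(times).items():
        print(f"{name:12s} {n:2d} SPUs  speedup {speedup:5.2f}  efficiency {efficiency:4.0%}")
```

Despeckling keeps scaling up to 16 SPUs, while quantization peaks at 8 SPUs and is slower again at 16.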

SLIDE 18

Related Projects @cs.pub.ro

Feature Extraction from Satellite Images on Hybrid x86/CellBE Systems

[Figure: processing pipeline, Original Grayscale Image → Detection (Sobel) → Hough Accumulator → Hough Peaks over image edges → Mark road segment edges → Final identified feature (road) → Saved as SVG; stages are labelled X86_64 or Cell/B.E.]

SLIDE 19

Related Projects @cs.pub.ro

  • Interactive 3D Map of Romania
  • SVG for Map Representation

SLIDE 20

Conclusions & Outlook

  • Conclusions
    – The performance of the SVG codec benefits from its deployment on the Cell/B.E. architecture
    – The quality and performance of the codec are strongly dependent on design choices in the processing steps
    – The codec compression still requires further improvement
  • Outlook
    – Currently only intra-coded frames (I-frames) are encoded, leading to big SVG file sizes
    – Add support for Predicted (previous) & Bi-directional (previous & next) frames, thus improving SVG storage requirements
      • Use motion estimation techniques between reference I-frame blocks & blocks in subsequent frames (translation/rotation/etc.)
      • The offsets/differences are stored in motion vectors
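Motion estimation of the kind named in the outlook is commonly done by block matching. The Python sketch below shows a generic exhaustive search with a sum-of-absolute-differences (SAD) cost on raster frames; it handles translation only and is not part of the presented codec. The names `motion_vector`, `size` and `search` are hypothetical.

```python
def sad(ref, cur, rx, ry, cx, cy, size):
    """Sum of absolute differences between a block in ref and a block in cur."""
    return sum(
        abs(ref[ry + j][rx + i] - cur[cy + j][cx + i])
        for j in range(size)
        for i in range(size)
    )


def motion_vector(ref, cur, cx, cy, size=4, search=2):
    """Find the offset (dx, dy) into the reference frame that best matches
    the block whose top-left corner is (cx, cy) in the current frame."""
    height, width = len(ref), len(ref[0])
    best, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue  # candidate block would leave the reference frame
            cost = sad(ref, cur, rx, ry, cx, cy, size)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost


# Assumed example: the current frame is the reference shifted right by one pixel.
ref = [[7 * x + 13 * y for x in range(8)] for y in range(8)]
cur = [[ref[y][x - 1] if x > 0 else 0 for x in range(8)] for y in range(8)]
print(motion_vector(ref, cur, 3, 2))  # ((-1, 0), 0)
```

Only the vector and any remaining difference would then need to be stored for the block, instead of its full content.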
SLIDE 21

Thank you for your attention

Q & A

cs.pub.ro
emil.slusanschi@cs.pub.ro