Towards Efficient Video Compression Using Scalable Vector Graphics

Towards Efficient Video Compression Using Scalable Vector Graphics on the Cell Broadband Engine
Andreea Sandu, Emil Slusanschi, Alin Murarasu, Andreea Serban, Alexandru Herisanu, Teodor Stoenescu
University Politehnica of Bucharest, Computer Science and Engineering Department
International Workshop on Multi-core Software Engineering, Cape Town, South Africa, May 1st, 2010

Outline
• Video Codecs & Image Characteristics
• NURBS Curves
• Image Representation
• Image Encoding
• Porting to the Cell/B.E.
• Results
• Related Projects @cs.pub.ro
• Conclusions & Outlook

Video Codecs
• A codec is a software program or library
• It encodes/decodes the video component of a movie/clip in a digital format
• Aim: create a decoder using Scalable Vector Graphics (SVG)
• Advantages of SVG:
  – Data compression: efficient representation
  – Lossless display at any resolution: shape preservation
• Disadvantages of SVG: difficult conversion from raster
[Figure: vector vs. raster rendering]

NURBS Curves
• NURBS = Non-Uniform Rational B-Splines
• Can be used to represent curves and surfaces
• Used extensively in Computer Aided Design (CAD)
• Parameters:
  – Degree (1, 2, 3, 5, …)
  – Control points & weights
  – Knots
• NURBS advantages:
  – Invariant under scaling transformations
  – Computable with stable algorithms (e.g. de Boor)
  – Can represent complex features with few parameters
  – A curve can be handled easily through its parameters
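The parameters listed above (degree, control points, weights, knots) are enough to evaluate a curve point. The following is a minimal sketch of de Boor's algorithm applied in homogeneous coordinates, not the codec's actual implementation; the function names and the quarter-circle example data are assumptions chosen for illustration.

```python
import math

def find_span(u, degree, knots, n_ctrl):
    """Return k such that knots[k] <= u < knots[k + 1], clamped at the curve end."""
    if u >= knots[n_ctrl]:
        return n_ctrl - 1
    k = degree
    while k < n_ctrl - 1 and u >= knots[k + 1]:
        k += 1
    return k

def nurbs_point(u, degree, knots, ctrl, weights):
    """Evaluate a 2D NURBS curve at parameter u with de Boor's algorithm."""
    k = find_span(u, degree, knots, len(ctrl))
    # Lift the affected control points to homogeneous form (w*x, w*y, w).
    d = []
    for j in range(degree + 1):
        (x, y), w = ctrl[j + k - degree], weights[j + k - degree]
        d.append([x * w, y * w, w])
    # Repeated linear interpolation between neighboring points.
    for r in range(1, degree + 1):
        for j in range(degree, r - 1, -1):
            left = knots[j + k - degree]
            right = knots[j + 1 + k - r]
            alpha = (u - left) / (right - left)
            d[j] = [(1 - alpha) * a + alpha * b for a, b in zip(d[j - 1], d[j])]
    x, y, w = d[degree]
    return (x / w, y / w)  # project back from homogeneous coordinates

# Hypothetical example: a quarter of the unit circle as a degree-2 NURBS.
QUARTER_CTRL = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
QUARTER_WEIGHTS = [1.0, math.sqrt(2) / 2, 1.0]
QUARTER_KNOTS = [0, 0, 0, 1, 1, 1]
```

Three control points and three weights describe an exact circular arc here, which illustrates the "complex features with few parameters" advantage.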

NURBS Conversion
• Polygonal approximation:
  – Curve evaluation: de Boor's algorithm
  – Initial approximation to curve knots
  – Iterative process of adding nodes
• Integrated in ffmpeg & ogg & used in the VLC player
[Figure: video frame and its internal representation]

Image Encoding
[Figure: encoding pipeline. Video frame → Despeckling → Quantization → Follow curves → NURBS, with intermediate data labeled Raster, Raster, Colors, Edges, Curves]
• Modular design
• Stage algorithms can be treated independently:
  – Despeckling & noise filtering
    • Create large zones of uniform color, similar to AutoTrace, by smoothing/combining neighboring pixels of similar colors
  – Color quantization
    • Create a new color scale
    • The algorithm is based on octrees
    • Reduce the number of colors in order to reduce the image size in the vector representation, at the cost of detail/quality relative to the original image
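As a rough illustration of the despeckling stage, the sketch below absorbs an isolated pixel into the color that dominates its 4-neighborhood. It is an assumption-laden toy, not the AutoTrace-like algorithm used in the codec: it works on a single-channel image stored as a list of rows, compares colors by exact equality rather than similarity, and the `threshold` parameter is hypothetical.

```python
from collections import Counter

def despeckle(img, threshold=3):
    """Replace a pixel with the dominant neighbor color when at least
    `threshold` of its 4-neighbors agree on a color that differs from it."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]  # read from img, write to a copy
    for y in range(h):
        for x in range(w):
            neighbors = [
                img[ny][nx]
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                if 0 <= ny < h and 0 <= nx < w
            ]
            color, count = Counter(neighbors).most_common(1)[0]
            if color != img[y][x] and count >= threshold:
                out[y][x] = color
    return out
```

A real despeckler would likely merge colors that are merely close, which is what produces the large uniform zones the later stages rely on.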

Feature Extraction with NURBS
• Determine zones of constant color
• Determine edges between these zones using NURBS curves
• Determine the knots of shared edges
• The approximation passes through these knots
• The approximation uses a least-squares approach

IBM’s Cell/B.E. Processor
• Heterogeneous multi-core system architecture
  – Power Processor Element for control tasks
  – Synergistic Processor Elements for data-intensive processing
• Synergistic Processor Element (SPE) consists of
  – Synergistic Processor Unit (SPU)
  – Synergistic Memory Flow Control (SMF)
    • Data movement and synchronization
    • Interface to high-performance Element Interconnect Bus
[Figure: Cell/B.E. block diagram. Eight SPEs (each with SPU/SXU, LS and SMF) attached at 16B/cycle to the EIB (up to 96B/cycle); PPE (PPU with PXU, L1 and L2; 64-bit Power Architecture with VMX); MIC to Dual XDR; BIC to FlexIO]

Encoding – Serial Profiling
• Profiling is done on slices of 308 x 400 pixels

Phase         Time (ms)   Percentage
Despeckling   368.8       70.31%
Quantize      61.051      11.64%
Follow        25.822      4.92%
Curves        68.884      13.13%

[Figure: bar chart of time (ms) per phase]
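The percentages in the profile follow directly from the measured times; the short check below recomputes each phase's share of the total. Variable names are chosen for illustration only.

```python
# Serial profiling times per phase, in milliseconds, as reported on the slide.
times_ms = {
    "Despeckling": 368.8,
    "Quantize": 61.051,
    "Follow": 25.822,
    "Curves": 68.884,
}

total_ms = sum(times_ms.values())

# Share of the total runtime per phase, in percent, rounded to two decimals.
shares = {phase: round(100 * t / total_ms, 2) for phase, t in times_ms.items()}

for phase, share in shares.items():
    print(f"{phase:12s} {times_ms[phase]:8.3f} ms {share:6.2f}%")
```

Despeckling accounts for roughly 70% of the serial time, which makes it the most profitable phase to parallelize first.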

Porting to the Cell/B.E. Architecture
• IBM’s Cell/B.E. is heterogeneous: PPE/SPE
• Usual image processing algorithms methodology
  – Divide the problem, process data, reconstruct results
• Image Despeckling
  – Whole algorithm is run on the SPEs
• Image Quantization
  – The PPE divides the frame
  – The SPEs generate the octrees
  – The PPE fuses the octrees together
• NURBS curves
  – Entire algorithm is run on SPEs
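The divide, process, reconstruct methodology can be sketched in a few lines. This is a plain-Python stand-in, assuming the image is a list of rows and that `kernel` is any per-slice function; on the Cell/B.E. each kernel call would run on an SPE while the split and reassembly stay on the PPE. The function names are hypothetical.

```python
def split_rows(img, n_slices):
    """Divide: cut the image into n_slices horizontal bands of near-equal height."""
    h = len(img)
    bounds = [h * i // n_slices for i in range(n_slices + 1)]
    return [img[bounds[i]:bounds[i + 1]] for i in range(n_slices)]

def process_in_slices(img, n_workers, kernel):
    """Process each band independently, then reconstruct the full image."""
    slices = split_rows(img, n_workers)
    processed = [kernel(s) for s in slices]  # one call per worker (SPE)
    return [row for band in processed for row in band]
```

A kernel that reads neighboring pixels would also need overlapping border rows between bands, which this sketch leaves out.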

Despeckling Design Tradeoffs
• Split image in slices and distribute them to SPUs
• Smoothing is done independently by each SPU
• The PPU rebuilds the image from the processed fragments
• Tradeoffs:
  – If the slices are too small, the smoothing will be exaggerated
  – If the slices are too big, they will not fit in the SPU local storage memory
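The upper bound on slice size comes from the SPE local store, which is 256 KB and is shared by code and data. The helper below estimates how many image rows fit in a given data budget; the 128 KB budget, 4 bytes per pixel, and two buffers (input and output) are assumptions for illustration, not values taken from the slides.

```python
LOCAL_STORE_BYTES = 256 * 1024  # size of each SPE local store on the Cell/B.E.

def max_rows_per_slice(width, bytes_per_pixel, budget_bytes, buffers=2):
    """Largest number of full image rows that fit in budget_bytes when
    `buffers` copies of the slice (e.g. input and output) are kept at once."""
    bytes_per_row = width * bytes_per_pixel * buffers
    return budget_bytes // bytes_per_row

# Hypothetical budget: half of the local store is left for code and stack.
data_budget = LOCAL_STORE_BYTES // 2
rows = max_rows_per_slice(width=400, bytes_per_pixel=4, budget_bytes=data_budget)
print(rows)  # 40
```

Under these assumptions a 400-pixel-wide slice is limited to 40 rows, so the lower bound imposed by smoothing quality and the upper bound imposed by memory can sit close together.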

Quantization Design Tradeoffs
• The PPU decides if/when to reduce the number of colors
• The SPUs generate partial color trees with a maximal number of levels by counting pixels of each color
• The PPU combines the SPU-generated trees in a global tree
• Tradeoffs:
  – The slices are too big: generate too many partial trees and too many DMA transfers & significant overhead in the global tree reconstruction
  – The slices are too small: processing on the PPU may be more efficient
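The split between SPU-side counting and PPU-side combination can be illustrated with a small tree merge. This is a sketch under assumptions: each tree node is a dictionary with a pixel count and child nodes keyed by child index, and a pixel is inserted by a precomputed path of child indices. The real data layout on the SPUs is not described in the slides.

```python
def new_node():
    """Create an empty color-tree node."""
    return {"count": 0, "children": {}}

def insert(tree, path):
    """Count one pixel at the leaf reached by following `path` from the root."""
    node = tree
    for child_index in path:
        node = node["children"].setdefault(child_index, new_node())
    node["count"] += 1

def merge(dst, src):
    """Fold a partial tree (built by one worker) into the global tree."""
    dst["count"] += src["count"]
    for child_index, child in src["children"].items():
        merge(dst["children"].setdefault(child_index, new_node()), child)
    return dst
```

The merge cost grows with the number and size of the partial trees, which is the reconstruction overhead named in the tradeoffs.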

Quantization SIMD/Vectorization
• Groups of 3 bits, one from each of the three basic color (RGB) components, form paths in the partial trees built by SPUs:
  – bit_R<<2
  – bit_G<<1
  – bit_B<<0
• Computing the paths serially is done with successive shifts in 8 iterations
• The vector/SIMD version allows the computing of entire vector paths in the partial tree at once
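The path construction described above can be written out as follows, assuming 8-bit color components and that the most significant bit selects the child at the first tree level. The second function only emulates the data-parallel idea by handling a whole batch of pixels per level; it is plain Python, not SPU intrinsics, and both function names are hypothetical.

```python
def octree_path(r, g, b, levels=8):
    """Serial version: one child index (0..7) per tree level, from MSB to LSB."""
    path = []
    for level in range(levels):
        shift = 7 - level
        bit_r = (r >> shift) & 1
        bit_g = (g >> shift) & 1
        bit_b = (b >> shift) & 1
        path.append((bit_r << 2) | (bit_g << 1) | (bit_b << 0))
    return path

def child_indices_at_level(pixels, level):
    """Batch version: child index at one level for every (r, g, b) pixel."""
    shift = 7 - level
    return [
        (((r >> shift) & 1) << 2) | (((g >> shift) & 1) << 1) | ((b >> shift) & 1)
        for r, g, b in pixels
    ]
```

For example, `octree_path(128, 64, 0, levels=2)` yields `[4, 2]`: the red bit is set at the first level and the green bit at the second.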

Ongoing developments
• Edge detection component in the quantization phase moved from PPU to SPUs
  – Color trees are aligned to ease transfer and processing
  – Each SPU makes a local copy & converts pixels to codes
  – After conversion, release memory to allow edge detection passes to continue
• Tradeoffs:
  – Edge detection algorithm generates useless edges around the current slice to avoid lots of coordinate testing
  – Big slices are good because of code serialization: no more branching code
  – Small slices generate lots of useless edges, thus increasing storage requirements

Results – Image Quality
[Figure: original image compared with the outputs produced using 4, 8 and 16 SPUs]
• x86 codec speed: 4-6 fps
• Compression ratio to date: 0.982 to 1.754

Results – Despeckling@SPUs

SPUs        1        2        4        8        16
Time (ms)   368.80   204.96   104.10   54.35    29.21
Speedup     1.00     1.80     3.54     6.79     12.63

[Figure: speedup vs. number of SPUs]
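The speedup row is the single-SPU time divided by the time on n SPUs. The snippet below recomputes it from the reported timings and also derives parallel efficiency (speedup divided by the number of SPUs), a quantity that is not on the slide and is added here only as an aid to reading the numbers.

```python
spus = [1, 2, 4, 8, 16]
times_ms = [368.80, 204.96, 104.10, 54.35, 29.21]  # despeckling times from the slide

# Speedup relative to the single-SPU run.
raw_speedup = [times_ms[0] / t for t in times_ms]
speedup = [round(s, 2) for s in raw_speedup]

# Parallel efficiency: fraction of ideal linear scaling that is achieved.
efficiency = [round(s / n, 2) for s, n in zip(raw_speedup, spus)]

for n, s, e in zip(spus, speedup, efficiency):
    print(f"{n:2d} SPUs: speedup {s:5.2f}, efficiency {e:4.2f}")
```

Efficiency declines gradually from about 0.90 at 2 SPUs to about 0.79 at 16 SPUs, so despeckling continues to scale across the tested range.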

Results – Quantization@SPUs

SPUs        1        2        4        8        16
Time (ms)   61.05    23.83    12.64    10.29    12.06
Speedup     1.00     2.56     4.83     5.93     5.06

[Figure: speedup vs. number of SPUs]

Related Projects @cs.pub.ro
Feature Extraction from Satellite Images on Hybrid x86/CellBE Systems
[Figure: processing pipeline. Original → Grayscale Image → Edge Detection (Sobel) → Hough Accumulator → Hough Peaks → Mark road segment over image edges → Final identified feature (road) → Saved as SVG; stages are labeled X86_64, Cell/B.E., X86_64]

Related Projects @cs.pub.ro
• Interactive SVG for Map Representation
• 3D Map of Romania

Conclusions & Outlook
• Conclusions
  – The performance of the SVG codec benefits from its deployment on the Cell/B.E. architecture
  – The quality and performance of the codec are strongly dependent on design choices in the processing steps
  – The codec compression still requires further improvement
• Outlook
  – Currently only intra-coded frames (I-frames) are encoded, leading to big SVG file sizes
  – Add support for Predicted (previous) & Bi-coded (previous & next) frames, thus improving SVG storage requirements
    • Use motion estimation techniques between reference I-frame blocks & blocks in subsequent frames (translation/rotation/etc.)
    • The offsets/differences are stored in motion vectors
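The motion estimation mentioned in the outlook can be illustrated with classic block matching for pure translation. This is a generic raster-domain sketch, not the authors' planned method: frames are assumed to be single-channel lists of rows, the cost is the sum of absolute differences (SAD), and the function names and the exhaustive search window are assumptions.

```python
def sad(ref, cur, rx, ry, cx, cy, size):
    """Sum of absolute differences between a block of ref and a block of cur."""
    return sum(
        abs(ref[ry + j][rx + i] - cur[cy + j][cx + i])
        for j in range(size)
        for i in range(size)
    )

def motion_vector(ref, cur, cx, cy, size, search):
    """Find the offset (dx, dy) into the reference frame that best matches
    the block of the current frame whose top-left corner is (cx, cy)."""
    h, w = len(ref), len(ref[0])
    best_cost, best_mv = float("inf"), (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if 0 <= rx <= w - size and 0 <= ry <= h - size:
                cost = sad(ref, cur, rx, ry, cx, cy, size)
                if cost < best_cost:
                    best_cost, best_mv = cost, (dx, dy)
    return best_mv, best_cost
```

Storing only the vector and the residual for each block, instead of a full I-frame, is what would reduce the SVG storage requirements; rotation and other motions would need a richer model than this translation-only search.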

Thank you for your attention
Q & A
cs.pub.ro
emil.slusanschi@cs.pub.ro
