SITOLA, Network Performing Arts Production Workshop, 2013–03–12


SLIDE 1

UltraGrid Platform GPU Acceleration Latency Distribution Updates & Plans World Firsts...

UltraGrid: Low-Latency High-Quality Video Transmissions on Commodity Hardware

Petr Holub CESNET z.s.p.o., Prague/Brno, Czech Republic

<Petr.Holub@cesnet.cz>

SITOLA

Network Performing Arts Production Workshop 2013–03–12

SLIDE 2

UltraGrid Platform

  • Technology

◾ an affordable platform for high-quality interactive image transmissions
◾ use of commodity hardware

◆ Linux PC and Mac platforms
◆ commodity video capture cards
◆ commodity GPU cards
◆ 10GE is a plus but not necessary

◾ as low latency as possible on commodity hardware
◾ open-source software, BSD license
◾ a platform for implementing research results (not just ours! :) )

◆ compression & image processing, FEC, scheduling, congestion control...

SLIDE 3

Applications of UltraGrid

  • Generic scientific visualization
  • Medicine

◾ X-ray imagery, cardiology, pathology

SLIDE 4

Applications of UltraGrid

  • Education

◾ remote education

SLIDE 5

Applications of UltraGrid

  • Cinematography

Detached BaseLight consoles at CinePost (Barrandov, CZ)

SLIDE 6

Applications of UltraGrid

  • Arts

◾ distributed performances: music, theater

SLIDE 7

UltraGrid Platform

  • History of Development

◾ 2002–2004: ISI EAST (720p)
◾ 2005–now: CESNET (→ 1080i)
◾ 2006–2008: forks by KISTI (AJA KONA) and i2cat (SAGE)
◾ 2012–now: i2cat (H.264)

  • Some milestones

◾ 2002: 720p
◾ 2005: 1080i, multipoint
◾ 2007: CPU compressions, self-organization, optical multicast
◾ 2008: 2K/4K
◾ 2011: GPU compressions
◾ 2012: 8K

SLIDE 8

UltraGrid Platform

  • Supported formats

◾ HD, 2K
◾ 4K – tiled or native
◾ 8K – new
◾ multichannel video (e.g., 3D HD, 4K)

  • Uncompressed vs. compressed

◾ low-latency compression
◾ GLSL-accelerated DXT1, DXT5-YCoCg
◾ CUDA-accelerated JPEG, DXT5-YCoCg
◾ CPU-based DXT1, ffmpeg (e.g., H.264)

  • Supported audio formats

◾ uncompressed, multi-channel
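
To give a feel for what the compressed formats listed above buy on links slower than 10GE, the following sketch estimates payload bitrates for HD video. The 1920×1080 frame size, the 30 frames/s rate, and the 16 bits per pixel for uncompressed 4:2:2 video are illustrative assumptions, not figures from the slides. The 4 and 8 bits per pixel for DXT1 and DXT5 follow from the block sizes of those formats (8 and 16 bytes per 4×4 pixel block).

```python
# Rough bitrate estimate. Frame size, frame rate and the uncompressed
# pixel format are assumptions chosen for illustration only.
BITS_PER_PIXEL = {
    "uncompressed 4:2:2 8-bit": 16,  # assumed capture format
    "DXT1": 4,                       # 8 bytes per 4x4 pixel block
    "DXT5-YCoCg": 8,                 # 16 bytes per 4x4 pixel block
}


def bitrate_mbps(width, height, fps, bits_per_pixel):
    """Payload bitrate in Mb/s (10^6 bit/s), ignoring packet overhead."""
    return width * height * fps * bits_per_pixel / 1e6


if __name__ == "__main__":
    for name, bpp in BITS_PER_PIXEL.items():
        rate = bitrate_mbps(1920, 1080, 30, bpp)
        print(f"{name:>26}: {rate:7.1f} Mb/s")
```

Under these assumptions uncompressed HD needs close to 1 Gb/s, while DXT1 needs about a quarter of that.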

SLIDE 9

UltraGrid Platform

  • I/O

◾ capture/playback cards: HD-SDI, SDI, HDMI, analog HD and SD

◆ manufacturers’ SDKs, Video4Linux2, QuickTime

◾ screen capture input
◾ computer screen output (OpenGL, SDL)
◾ SAGE output
◾ specialized display filters
◾ stereoscopic HDMI 1.4a

  • Full-duplex operation
  • Simple GUI

◾ Qt-based, native MacOS
◾ permanent storage of configuration
◾ simple startup + advanced configuration dialog

Line-interlaced stereoscopic video

SLIDE 10

UltraGrid Platform

GUI on MacOS X

SLIDE 11

UltraGrid Platform

GUI on Linux

SLIDE 12

UltraGrid Platform

  • Audio

◾ balanced, unbalanced, HD-SDI, HDMI
◾ various system interfaces including JACK
◾ PortAudio, ALSA, CoreAudio, JACK
◾ embedded HD-SDI/HDMI
◾ simple mono software echo canceler based on Speex
◾ channel mixer/duplicator

SLIDE 13

GPU-Accelerated Compression

  • Available compression schemes

◾ DXT1: CPU-based (FastDXT library from EVL)
◾ DXT1, DXT5: OpenGL Shader Language (GLSL) based
◾ JPEG: NVidia CUDA based
◾ DXT5: NVidia CUDA based (for 8K)

SAGE display with various compressions

SLIDE 14

GPU-Accelerated Compression

  • Fine-grained parallelization of JPEG

◾ per-row/column DCT/IDCT
◾ per pixel RLE
◾ per pixel Huffman
◾ parallel stream compacting
◾ parallel decompression using restart intervals
◾ use of auxiliary indexes for more efficient parsing

  • Available also as BSD-licensed open-source library:

http://gpujpeg.sf.net/
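
"Parallel stream compacting" in the list above means packing variable-length per-thread outputs into one contiguous stream. The slides do not spell out how this is done. A common approach, assumed here, is an exclusive prefix sum over the output lengths, which gives every producer its own write offset. The sketch below is a sequential Python illustration with hypothetical names, not code from the GPUJPEG library.

```python
from itertools import accumulate


def compact(chunks):
    """Pack variable-length byte chunks into one contiguous buffer.

    The exclusive prefix sum of the chunk lengths gives each chunk its
    write offset, so on a GPU every thread could copy its own chunk
    independently. Here the copies simply run in a loop.
    """
    lengths = [len(c) for c in chunks]
    # Exclusive scan: offsets[i] = sum of lengths[0..i-1].
    offsets = list(accumulate(lengths, initial=0))[:-1]
    out = bytearray(sum(lengths))
    for chunk, off in zip(chunks, offsets):
        out[off:off + len(chunk)] = chunk
    return bytes(out), offsets


if __name__ == "__main__":
    data, offsets = compact([b"ab", b"", b"cde"])
    print(data, offsets)
```

Because the offsets depend only on the lengths, the scan and the copies are two separate, fully parallel passes.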

SLIDE 15

GPU-Accelerated Compression

  • Fine-grained parallelization of JPEG

[Figure: computing the number of preceding zeros (pzc) for each coefficient from warp-wide bit maps. Recoverable annotations follow.]

◾ tid (= thread ID)
◾ DC coefficient is always treated as non-zero: 1 | __ballot(even)
◾ __ballot(odd); the even and odd maps are combined by bitwise OR
◾ zeros before even elements: tmp = __clz(map & mask); pzc = 2*(tmp - (32 - tid)); if ((0x80000000 >> tmp) > (map_o & mask)) {pzc++;}
◾ zeros before odd elements: (even==0) ? pzc+1 : 0
◾ Decompose to zeros before even and odd elements.
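
The expressions in the figure can be checked with a small sequential emulation. This is a sketch under assumptions, not the GPUJPEG source. It assumes that each of the 32 threads of a warp holds one even and one odd coefficient of a 64-coefficient block, that bit tid of each ballot map belongs to thread tid, and that mask selects the bits of all lower-numbered threads. The function names are hypothetical. __clz is emulated with Python's bit_length.

```python
def clz32(x):
    """Leading-zero count of a 32-bit value (stand-in for CUDA __clz)."""
    return 32 - x.bit_length()


def preceding_zero_counts(coeffs):
    """Zeros directly preceding each of 64 coefficients, via bit maps."""
    assert len(coeffs) == 64
    nonzero = [c != 0 for c in coeffs]
    nonzero[0] = True  # DC coefficient is always treated as non-zero
    # Emulated __ballot results: bit t is set if thread t's value is non-zero.
    map_e = sum(1 << t for t in range(32) if nonzero[2 * t])
    map_o = sum(1 << t for t in range(32) if nonzero[2 * t + 1])
    both = map_e | map_o  # bitwise OR of the even and odd maps
    pzc = [0] * 64
    for tid in range(32):
        mask = (1 << tid) - 1  # bits of all lower-numbered threads
        tmp = clz32(both & mask)
        even_pzc = 2 * (tmp - (32 - tid))
        # The nearest non-zero pair had a zero odd element: one more zero.
        if (0x80000000 >> tmp) > (map_o & mask):
            even_pzc += 1
        pzc[2 * tid] = even_pzc
        pzc[2 * tid + 1] = 0 if nonzero[2 * tid] else even_pzc + 1
    return pzc


def reference_counts(coeffs):
    """Straightforward sequential run-length count, for comparison."""
    out, run = [], 0
    for i, c in enumerate(coeffs):
        out.append(run)
        run = 0 if (c != 0 or i == 0) else run + 1
    return out


if __name__ == "__main__":
    block = [5, 0, 0, 3] + [0] * 60
    print(preceding_zero_counts(block)[:8])
    print(reference_counts(block)[:8])
```

Both functions return the same counts. The bit-map version needs no loop over earlier coefficients, which is what makes the per-thread formulation attractive on a GPU.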

SLIDE 16

GPU-Accelerated Compression

  • Performance numbers (including transfer to/from GPU)

◾ DXT1 GLSL: 798 Mpix/s (NVidia 580GTX), 593 Mpix/s (ATI 6990)
◾ DXT5 GLSL: 349 Mpix/s (NVidia 580GTX), 305 Mpix/s (ATI 6990)
◾ JPEG CUDA: up to 1,580 Mpix/s = 4,740 MB/s (NVidia 580GTX, 4:4:4, Q=60)
◾ DXT5 CUDA: ≥1,580 Mpix/s (NVidia 580GTX)
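
The JPEG figure above can be turned into frame rates with simple arithmetic. The sketch reads the throughput as 1580 Mpix/s, which is consistent with the quoted 4740 MB/s at 3 bytes per pixel for 4:4:4 data. The pixel dimensions for HD, 4K and 8K are assumptions, since the slides name the formats without giving exact sizes.

```python
# Frame sizes are assumed; the slides do not state exact pixel dimensions.
FRAME_SIZES = {"HD": (1920, 1080), "4K": (3840, 2160), "8K": (7680, 4320)}


def mbytes_per_s(mpix_per_s, bytes_per_pixel=3):
    """Raw data rate in MB/s for a pixel throughput in Mpix/s."""
    return mpix_per_s * bytes_per_pixel


def max_fps(mpix_per_s, width, height):
    """Upper bound on frames per second at the given pixel throughput."""
    return mpix_per_s * 1e6 / (width * height)


if __name__ == "__main__":
    print(f"1580 Mpix/s = {mbytes_per_s(1580)} MB/s at 3 bytes per pixel")
    for name, (w, h) in FRAME_SIZES.items():
        print(f"{name}: up to {max_fps(1580, w, h):.1f} frames/s")
```

These are upper bounds on compression throughput only. Capture, network and display stages are not included.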

SLIDE 17

GPU-Accelerated Compression

  • JPEG performance

[Figure: JPEG duration [ms] versus quality (20–100) for 1080p, 1080p 4:2:0, 2160p, and 2160p 4:2:0 video. Panels: (a) encoder performance (GPU only); (b) decoder performance (GPU only); (c) encoder performance (both CPU and GPU); (d) decoder performance (both CPU and GPU).]

SLIDE 18

GPU-Accelerated Compression

  • Performance of JPEG stages for 2160p video

[Figure: duration [ms] of the individual JPEG stages versus quality (20–100), each shown for non-interleaved non-subsampled and for interleaved subsampled data. Panels: (a) JPEG encoder: copy to/from GPU, preprocessor, DCT & quantization, Huffman encoder, stream formatter; (b) JPEG decoder: copy to/from GPU, stream parser, Huffman decoder, DCT & quantization, postprocessor.]

SLIDE 19

Forward Error Correction

  • LDGM

◾ CPU and GPU implementations
◾ CPU (SSE-optimized) is used because of the CPU↔GPU transfer overhead
◾ packet loss up to 10% can be mitigated with reasonable overhead
◾ can make JPEG survive up to 25% packet loss

  • Simple method: shifted multiplication
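
To illustrate the idea behind LDGM-based forward error correction, here is a minimal sketch, assuming the simplest variant in which every parity packet is the XOR of a small set of source packets and the receiver recovers losses by iterative (peeling) decoding. The matrix construction, the parameters and all function names are hypothetical. The slides do not describe UltraGrid's actual LDGM configuration.

```python
import random


def xor_bytes(a, b):
    """Bytewise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity_sets(k, m, degree, seed=0):
    """Pick `degree` source indices for each of m parity packets (sparse rows)."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(k), degree)) for _ in range(m)]


def encode(source, parity_sets):
    """Each parity packet is the XOR of the source packets in its set."""
    parity = []
    for idxs in parity_sets:
        p = bytes(len(source[0]))  # all-zero packet of the same length
        for i in idxs:
            p = xor_bytes(p, source[i])
        parity.append(p)
    return parity


def decode(received_source, received_parity, parity_sets):
    """Peeling decoder; lost packets are passed in as None.

    Repeatedly finds a received parity packet with exactly one missing
    source packet and recovers it by XOR. Returns the source list with
    as many packets restored as possible.
    """
    source = list(received_source)
    progress = True
    while progress:
        progress = False
        for p, idxs in zip(received_parity, parity_sets):
            if p is None:
                continue
            missing = [i for i in idxs if source[i] is None]
            if len(missing) != 1:
                continue
            acc = p
            for i in idxs:
                if source[i] is not None:
                    acc = xor_bytes(acc, source[i])
            source[missing[0]] = acc
            progress = True
    return source


if __name__ == "__main__":
    src = [bytes([i] * 8) for i in range(4)]
    sets = [[0, 1], [1, 2], [2, 3]]
    par = encode(src, sets)
    print(decode([src[0], None, None, src[3]], par, sets) == src)
```

Recovery is not guaranteed for every loss pattern. How much loss can be repaired depends on the number of parity packets and on how the sparse sets are chosen.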
SLIDE 20

Latency

  • Latency limits

◾ <150 ms for interactivity: ITU-T Rec. G.114

  • End-to-end latency

◾ in a local network
◾ measured using video (1/60 s quantization)
◾ depends substantially on hardware cards used (2.0–5.0 frames)
◾ Bluefish444 should get us much lower: line-by-line API for HD-SDI
◾ application-level traffic shaping to control bursts

  • Uncompressed for DeckLink HD → DeltaCast 3G

◾ 2.5 frames (83 ms)

  • Impact of compressions

◾ 2.5 frames (+<16.7 ms) for CUDA JPEG
◾ 3.5 frames (+33.3 ms) for GLSL DXT1/5
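
The frame counts above convert to milliseconds as sketched below. The 30 frames/s rate is an assumption inferred from "2.5 frames (83 ms)". The 150 ms limit is the ITU-T G.114 figure quoted on the slide. The function names are illustrative.

```python
FRAME_RATE = 30.0               # assumed; 2.5 frames = 83 ms implies 30 frames/s
INTERACTIVITY_LIMIT_MS = 150.0  # ITU-T G.114 limit quoted on the slide


def frames_to_ms(frames, fps=FRAME_RATE):
    """Convert a latency expressed in video frames to milliseconds."""
    return frames * 1000.0 / fps


def network_budget_ms(frames, fps=FRAME_RATE, limit_ms=INTERACTIVITY_LIMIT_MS):
    """Latency left for the network after the end systems take their share."""
    return limit_ms - frames_to_ms(frames, fps)


if __name__ == "__main__":
    for label, frames in [("2.5 frames", 2.5), ("3.5 frames", 3.5)]:
        print(f"{label}: {frames_to_ms(frames):.1f} ms end-to-end, "
              f"{network_budget_ms(frames):.1f} ms left for the network")
```

Since the measurement is quantized to 1/60 s, an increase of "<16.7 ms" means the added delay stayed below one measurement step.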

SLIDE 21

User-Empowered Multi-Point Distribution

  • UltraGrid supports multicast, but...

◾ how available/dependable is it?

  • UDP packet reflectors

◾ controlled by the user
◾ lower efficiency
◾ possible per-user processing: transcoding, security, ...

  • Self-organization of the network

◾ scheduling streams with bitrates comparable to capacity of links
◾ CoUniverse framework (http://couniverse.sitola.cz)
◾ constraints, MIP, local search
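
The UDP packet reflector idea above can be sketched in a few lines. This is a minimal illustration, not the reflector used with UltraGrid. It assumes a hypothetical scheme in which a peer joins simply by sending a datagram, and the function name and port number are made up for the example.

```python
import socket


def reflect_once(sock, peers, bufsize=65535):
    """Receive one datagram and forward it to every other known peer.

    A sender is registered in `peers` the first time it is seen
    (assumed join-by-sending scheme). Returns the data and the sender.
    """
    data, sender = sock.recvfrom(bufsize)
    peers.add(sender)
    for peer in peers:
        if peer != sender:
            sock.sendto(data, peer)
    return data, sender


if __name__ == "__main__":
    reflector = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    reflector.bind(("0.0.0.0", 5004))  # port chosen arbitrarily
    known_peers = set()
    while True:
        reflect_once(reflector, known_peers)
```

Each packet is sent once per receiver, which is the "lower efficiency" noted above compared with multicast. In exchange, the forwarding loop is the natural place for per-user processing.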

SLIDE 22

Users Worldwide

SourceForge stats

  • source, binaries (http://ultragrid.sitola.cz/)
  • embedded in SAGE (http://www.sagecommons.org/)
  • Czech Republic (universities and university hospitals), USA (UCSD, UMich, UIC, Internet2, NLM/NIH, NorthwesternU, ...), Spain (i2cat, UPM), Portugal (FCCN), Netherlands (SARA), Poland (PSNC), Korea (KISTI), Russia, ...

SLIDE 23

Recent Updates

Since November 2012

  • ffmpeg support – low latency H.264

◾ 150% of a CPU core for HD, well usable at >18 Mb/s
◾ 4K being examined
◾ due to licensing issues, we don’t interface directly to x264 and leave it up to the user (GPL is viral and would propagate upstream)

  • Windows port (almost done)

◾ OpenGL, SDL displays
◾ native BlackMagic SDK
◾ DirectShow capture

SLIDE 24

Recent Updates

Since November 2012

  • Support for DELTACAST DVI-I/DVI-D grabbers

◾ ideal for content capture, computer screen resolutions
◾ supports multiple cards (e.g., 6x DVI-I in a single PC)

  • File-based I/O

◾ input/output of raw data
◾ can be piped into mencoder (but not very convenient)
◾ planned integration with further processing (e.g., GStreamer) for lecture/event/experiment recording, etc.

  • Transcoding reflectors

◾ change of formats “along the way”, as a part of multi-point data distribution
◾ implemented using UltraGrid as backend
◾ intended for automated setup with CoUniverse (later in 2013)

SLIDE 25

Recent Updates

Since November 2012

  • Integration of 2-camera GColl

◾ group-to-group communication with partial gaze awareness

SLIDE 26

Future Plans

  • Short-term:

◾ Advanced multi-point with scheduling (release with CoUniverse)
◾ Software processor for multi-channel video

  • Long-term:

◾ Acceleration of low-latency H.264/H.265
◾ New compression formats for specific purposes (e.g., SAGE)

SLIDE 27

World Firsts...

  • 2005 – Multi-Point Uncompressed HD

◾ n-way using packet reflectors

[Figure labels: Brno, Chicago, San Diego, Baton Rouge]

SLIDE 28

World Firsts...

  • 2007 – Self-Organizing Multi-Point Uncompressed/Compressed HD

◾ with CoUniverse
◾ self-organizing multi-point distribution setup with uncompressed/DXT1 compression switching based on available bandwidth

SLIDE 29

World Firsts...

  • 2011 – GPU-JPEG Transatlantic 4K

◾ CineGrid Workshop, December 2011
◾ real-time movie post-production review/approval process
◾ playback on a machine worth $1,000 ($500 PC + $500 NVIDIA 580GTX)

SLIDE 30

World Firsts...

  • 2012 – GPU-JPEG Transatlantic Multi-Point 8K

◾ from pre-rendered sources
◾ JPEG → DXT5-YCoCg on a single machine
◾ useful also as 16× HD (multi-camera setups)

SLIDE 31

Selected Papers

[1] HOLUB, Petr - MATYSKA, Luděk - LIŠKA, Miloš - HEJTMÁNEK, Lukáš - DENEMARK, Jiří - REBOK, Tomáš - HUTANU, Andrei - PARUCHURI, Ravi - RADIL, Jan - HLADKÁ, Eva. High-definition multimedia for multiparty low-latency interactive communication. Future Generation Computer Systems, Amsterdam, The Netherlands, Elsevier Science. ISSN 0167-739X, 2006, vol. 22, no. 8, pp. 856–861.

[2] MATELA, Jiří - RUSŇÁK, Vít - HOLUB, Petr. Efficient JPEG2000 EBCOT Context Modeling for Massively Parallel Architectures. In Data Compression Conference (DCC), 2011. Washington, DC, USA : IEEE Computer Society, 2011. ISBN 978-0-7695-4352-9, pp. 423–432. 2011, Snowbird, Utah, USA.

[3] MATELA, Jiří - ŠROM, Martin - HOLUB, Petr. Low GPU Occupancy Approach to Fast Arithmetic Coding in JPEG2000. Mathematical and Engineering Methods in Computer Science, Lecture Notes in Computer Science, Heidelberg, Springer Berlin / Heidelberg, Germany. ISSN 0302-9743, 2011, vol. 2012, no. 7119, pp. 136–145.

[4] HOLUB, Petr - ŠROM, Martin - PULEC, Martin - MATELA, Jiří - JIRMAN, Martin. GPU-Accelerated DXT and JPEG Compression Schemes for Low-Latency Network Transmissions of HD, 2K, and 4K Video. Future Generation Computer Systems. Submitted 2012.

SLIDE 32

Thank you for your attention!

<petr.holub@cesnet.cz>
<ultragrid-dev@cesnet.cz>

This work is supported by the LM2010005 project.