A Hardware-Friendly Bilateral Solver for Real-Time Virtual-Reality Video

Dec 23, 2023

A Hardware-Friendly Bilateral Solver for Real-Time Virtual-Reality Video Amrita Mazumdar University of Washington Armin Alaghi Jonathan T. Barron Google David Gallup Luis Ceze Mark Oskin University of Washington Steven M. Seitz

SLIDE 1

A Hardware-Friendly Bilateral Solver for Real-Time Virtual-Reality Video

Amrita Mazumdar Armin Alaghi Jonathan T. Barron David Gallup Luis Ceze Mark Oskin Steven M. Seitz

University of Washington Google University of Washington

SLIDE 2

virtual reality video with omnidirectional stereo (ODS)

SLIDE 3

the Google Jump camera rig can capture ODS video easily

16 GoPros x 4K camera feed: 3.6 GB/s raw video

SLIDE 4

the Google Jump camera rig can capture ODS video easily

Anderson et al., SIGGRAPH Asia 2016

SLIDE 5

the Google Jump camera rig can capture ODS video easily

Anderson et al., SIGGRAPH Asia 2016

SLIDE 6

processing video from Google Jump is slow

1 hour of video: 10 hours of processing on 1000 cores

Anderson et al., SIGGRAPH Asia 2016

SLIDE 7

Google Jump pipeline breakdown

sensor → pre-processing → alignment → optical flow → compositing → download to viewer

Anderson et al., SIGGRAPH Asia 2016

SLIDE 8

Google Jump pipeline breakdown

sensor → pre-processing → alignment → optical flow → compositing → download to viewer

12% 69% 17% 2%

the bilateral solver dominates processing time

Anderson et al., SIGGRAPH Asia 2016

SLIDE 9

The bilateral solver produces an image that is smooth and accurate.

input pair (from two cameras)
blocky flow field
upsample into noisy flow field
transform to bilateral grid and solve
output result: smooth flow field

Anderson et al., SIGGRAPH Asia 2016

SLIDE 10

this work: a hardware-friendly bilateral solver (HFBS)

SLIDE 11

The bilateral solver is hard to parallelize

second-order global optimization: global communication prevents aggressive parallelization
high-dimensional, sparse matrices: sparsity results in significant divergence on GPUs
why not a dense grid? too large to store on-chip

SLIDE 12

HFBS is easier to parallelize

Barron & Poole 2016 | HFBS (our work)
✅ includes color | grayscale only
dense matrix too big to fit in memory | ✅ dense matrix fits in memory
global communication required | ✅ local communication only
iterative bistochastization before solving | ✅ partial, non-iterative bistochastization

detailed formulation in paper
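The "local communication only" row can be illustrated with a small sketch. This is not the formulation from the paper; it assumes a 1-D dense grid, a simple quadratic objective (data term weighted by a per-vertex confidence plus a neighbor-smoothness term), and gradient descent with momentum. The function name and all parameters are hypothetical. The point is only that each vertex update reads its own value and its two neighbors, so every vertex can be updated in parallel.

```python
# Sketch only: first-order solve on a 1-D dense grid where every update is local.
# Objective (assumed for illustration):
#   0.5 * sum_i conf[i] * (x[i] - target[i])**2 + 0.5 * lam * sum_i (x[i] - x[i+1])**2

def local_solve(target, conf, lam=1.0, step=0.1, momentum=0.5, iters=256):
    n = len(target)
    x = list(target)
    vel = [0.0] * n
    for _ in range(iters):
        grad = []
        for i in range(n):
            # Data term: pull toward the noisy target, scaled by confidence.
            g = conf[i] * (x[i] - target[i])
            # Smoothness term: touches only the immediate neighbors.
            if i > 0:
                g += lam * (x[i] - x[i - 1])
            if i < n - 1:
                g += lam * (x[i] - x[i + 1])
            grad.append(g)
        # All vertices step at once; no global reduction is needed.
        for i in range(n):
            vel[i] = momentum * vel[i] - step * grad[i]
            x[i] += vel[i]
    return x

spike = [0.0, 0.0, 0.0, 10.0, 0.0, 0.0, 0.0]
smoothed = local_solve(spike, conf=[1.0] * len(spike))
```

With uniform confidence the spike is spread over its neighbors while the total stays the same, which is the behavior a smoothing solve should have.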

SLIDE 13

HFBS demonstrates imperceptible accuracy loss

task: Ferstl et al., ICCV 2013, data: Middlebury stereo dataset

panels: input image, noisy depth map, Barron & Poole 2016, HFBS (this work)

SLIDE 14

algorithm optimizations make it easier to implement bilateral solver in parallel hardware

SLIDE 15

plan: exploit this parallelism with a custom hardware accelerator

algorithm optimizations make it easier to implement bilateral solver in parallel hardware

SLIDE 16

Mapping HFBS to hardware

sensor → pre-processing → alignment → optical flow → compositing → download to viewer

SLIDE 17

Mapping HFBS to hardware

sensor → pre-processing → alignment → optical flow → compositing → download to viewer

load video pair
construct bilateral grid per pair
perform hardware-friendly bilateral solver
slice out solution into output images

CPU, FPGA
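The construct-grid and slice-out steps can be sketched in a few lines. This is a minimal illustration, assuming a 1-D signal with a grayscale guide and nearest-cell splatting; the solver step is left out, and the function names and the `sx`, `sr`, `bins` parameters are hypothetical rather than taken from the talk.

```python
# Sketch only: splat noisy values into (position, intensity) cells of a dense
# bilateral grid, then slice the per-cell averages back out per pixel.

def cell_of(i, g, sx, sr, bins):
    """Map pixel index i with guide intensity g to a grid cell."""
    return i // sx, min(g // sr, bins - 1)

def splat(values, guide, sx, sr, bins):
    cells = (len(values) + sx - 1) // sx
    total = [[0.0] * bins for _ in range(cells)]
    count = [[0] * bins for _ in range(cells)]
    for i, (v, g) in enumerate(zip(values, guide)):
        cx, cr = cell_of(i, g, sx, sr, bins)
        total[cx][cr] += v
        count[cx][cr] += 1
    return total, count

def slice_grid(total, count, guide, sx, sr, bins):
    out = []
    for i, g in enumerate(guide):
        cx, cr = cell_of(i, g, sx, sr, bins)
        out.append(total[cx][cr] / count[cx][cr])
    return out

# Dark pixels (guide near 10) and bright pixels (guide near 200) land in
# different intensity bins, so they are never averaged together.
guide = [10, 12, 11, 200, 205, 198, 202, 13]
noisy = [1.0, 1.2, 0.8, 5.0, 5.4, 4.6, 5.0, 1.0]
total, count = splat(noisy, guide, sx=4, sr=64, bins=4)
smooth = slice_grid(total, count, guide, sx=4, sr=64, bins=4)
```

Pixels that are close in both position and intensity share a cell and get averaged, while the edge between the dark and bright regions is preserved.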

SLIDE 18

microarchitecture

[block diagram components: CPU, main memory, AXI memory interface, HFBS controller, z-axis memory controller, z-axis memory banks, bilateral filter workers, memory access selector]

fixed-point datapath
custom memory layout

SLIDE 19

Floating-point resource requirements limit hardware parallelism

                    float64   32-bit fixed   64-bit fixed    47-bit fixed
DSPs per worker     18        1              16              4
Maximum # workers   379       6840           427             1710
Error (MSE)                   8.3 x 10^-4    7.16 x 10^-13   6.69 x 10^-7

SLIDE 20

Fixed-point datapath conversion

[plot: error (MSE relative to float64), log scale from 1E-12 to 1E-02, versus decimal precision (fraction of bitwidth), 40% to 90%; one curve per bitwidth (32, 47, 64); max error marked]
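The trade-off behind this sweep can be reproduced in miniature. The sketch below assumes a signed fixed-point format with saturation and round-to-nearest; the helper names and the sample data are hypothetical and do not come from the talk. Too few fractional bits lose precision, while too many leave no room for the integer part and values saturate.

```python
# Sketch only: quantize floats to signed fixed point and measure MSE against
# the original values for different integer/fraction splits of one bitwidth.

def to_fixed(value, total_bits, frac_bits):
    """Round value to a signed fixed-point number, saturating on overflow."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    raw = max(lo, min(hi, round(value * scale)))
    return raw / scale

def mse(samples, total_bits, frac_bits):
    errors = [(s - to_fixed(s, total_bits, frac_bits)) ** 2 for s in samples]
    return sum(errors) / len(errors)

# Hypothetical data in the range [-2, 2].
samples = [0.001 * k for k in range(-2000, 2001)]

coarse = mse(samples, 32, 16)     # 50% fraction: visible rounding error
balanced = mse(samples, 32, 28)   # most bits in the fraction, range still fits
saturated = mse(samples, 32, 31)  # range shrinks to about [-1, 1): values clip
```

For this data the error first drops as fractional bits are added and then jumps once the integer range no longer covers the samples.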

SLIDE 21

z-axis slicing for bilateral grid memory layout

z = 0
x:0,y:0,r:255,g:172,b:0
x:0,y:1,r:255,g:172,b:0
. . . . .
x:100,y:100,r:255,g:172,b:0
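One way to picture a z-sliced layout is an address function that sends each z-slice to its own memory bank. The mapping below is an assumption made for illustration (round-robin banks by z, row-major order within a slice); the talk does not spell out the actual address scheme, and `vertex_address` and its parameters are hypothetical.

```python
# Sketch only: place grid vertex (x, y, z) in a bank chosen by z, so that the
# three z-neighbors of a vertex sit in three different banks.

def vertex_address(x, y, z, width, height, num_banks):
    bank = z % num_banks
    # Slices that share a bank are stacked one after another inside it.
    offset = ((z // num_banks) * height + y) * width + x
    return bank, offset

WIDTH, HEIGHT, DEPTH, BANKS = 4, 3, 6, 3

addresses = {
    vertex_address(x, y, z, WIDTH, HEIGHT, BANKS)
    for z in range(DEPTH)
    for y in range(HEIGHT)
    for x in range(WIDTH)
}

# Banks touched when a worker reads z - 1, z, z + 1 at one (x, y) position.
neighbor_banks = {vertex_address(1, 1, z, WIDTH, HEIGHT, BANKS)[0] for z in (2, 3, 4)}
```

With at least three banks, a three-tap filter along z can fetch its inputs from distinct banks in the same cycle instead of queuing on one memory.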

SLIDE 22

Evaluation

SLIDE 23

Evaluation

sensor → pre-processing → alignment → optical flow → compositing → download to viewer

12% 69% 17% 2%

Does HFBS improve runtime?
How does parallelization affect power?

SLIDE 24

Experimental Setup

CPU: Intel Xeon E5-2620
GPU: NVIDIA GTX 1080 Ti
FPGA: Xilinx Virtex UltraScale+
Baseline: Barron & Poole 2016 (CPU only)
256 iterations of optimization
Varied bilateral grid vertex count ⇒ 4 KB - 1.8 GB grid sizes

SLIDE 25

HFBS is faster and more scalable than prior work.

[plot: log runtime (ms), 0.01 to 10000, versus log bilateral grid vertices, 1,000 to 10,000,000; series: Prior Work (CPU), CPU, GPU, FPGA]

SLIDE 26

HFBS is faster and more scalable than prior work.

[plot: log runtime (ms), 0.01 to 10000, versus log bilateral grid vertices, 1,000 to 10,000,000; series: Prior Work (CPU), CPU, GPU, FPGA; annotated region: 30 FPS and better]

SLIDE 27

HFBS-FPGA is more power-efficient than other platforms

[bar chart: Ops / Watt improvement (axis 10 to 40), power-efficiency relative to prior work; bars: Prior Work, CPU, GPU, FPGA; values shown: 30.72x, 2.12x, 0.45x, 1.00x]

SLIDE 28

building a VR video camera rig with HFBS

SLIDE 29

this work full system

SLIDE 30

HFBS-FPGA consumes much less power than a GPU for the same task

full system:
16 GPUs = 4,560 W
16 FPGAs = 400 W
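The per-device figures implied by these totals follow from simple division. The short calculation below uses only the two totals from the slide; the variable names are illustrative.

```python
# Per-device power and overall ratio from the full-system totals on the slide.
DEVICES = 16
GPU_SYSTEM_W = 4560
FPGA_SYSTEM_W = 400

gpu_each_w = GPU_SYSTEM_W / DEVICES        # watts per GPU
fpga_each_w = FPGA_SYSTEM_W / DEVICES      # watts per FPGA
power_ratio = GPU_SYSTEM_W / FPGA_SYSTEM_W # GPU system power over FPGA system power
```

That works out to 285 W per GPU against 25 W per FPGA, or roughly 11 times less power for the FPGA system.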

SLIDE 31

HFBS makes real-time VR video more feasible with FPGAs

offloaded to cloud

on-node with FPGAs

sensor → pre-processing → alignment → optical flow → compositing → download to viewer

SLIDE 32

to conclude

fast, parallel implementation of bilateral solving with little accuracy loss
fixed-point datatypes and a custom bilateral-grid memory layout for improved FPGA performance
hardware-software codesign to reduce latency and improve quality for future VR applications

SLIDE 33

parallel algorithm for bilateral solving
FPGA architecture
50x faster, 30x more power-efficient
