SKYSTITCH: A Cooperative Multi-UAV-based Real-time Video Surveillance System with Stitching (PowerPoint PPT Presentation)



SLIDE 1

SKYSTITCH

A Cooperative Multi-UAV-based Real-time Video Surveillance System with Stitching

Xiangyun Meng, Wei Wang and Ben Leong National University of Singapore

SLIDE 2

Motivation

Aerial video surveillance has become ubiquitous

  • Search & rescue
  • TV live broadcast
  • Border monitoring
SLIDE 3

Motivation

We always prefer… Higher resolution (More details) Larger field of view (Better awareness)

SLIDE 4

The problem

Limited resolution Limited field of view

Single aircraft

SLIDE 5

Increasing #aircraft?

SLIDE 6

Increasing #aircraft?

Hard to correlate multiple videos!

SLIDE 7

Stitch them together!

SLIDE 8

Stitch them together!

Video Streaming

Video Stitching

SLIDE 9

Challenges

  • Image stitching is computationally expensive
  • Artefacts affect perceptual experience

SLIDE 10

Our contributions

  • Sensor hints
  • Distributed architecture
  • GPU acceleration
  • Homography (H) sanity check
  • Fusion
  • Failure recovery

4X improvement in stitching speed; good quality under dynamic conditions

SLIDE 11

Image stitching in a nutshell

H

  • 1. Feature extraction
  • 2. Feature matching
  • 3. Estimation of H
  • 4. Compositing

Image alignment
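A minimal sketch (not from the slides) of what the estimated homography H is used for: it maps a pixel of one image into the coordinate frame of the other, in homogeneous coordinates, which is the basis of both alignment and compositing.

```python
# Minimal sketch: how a homography H maps a pixel between two overlapping
# images, using plain Python lists as 3x3 matrices.

def apply_homography(H, x, y):
    """Map pixel (x, y) through H in homogeneous coordinates."""
    u = H[0][0]*x + H[0][1]*y + H[0][2]
    v = H[1][0]*x + H[1][1]*y + H[1][2]
    w = H[2][0]*x + H[2][1]*y + H[2][2]
    return u / w, v / w   # divide out the projective scale

# A pure translation by (10, 5) expressed as a homography:
H = [[1, 0, 10],
     [0, 1, 5],
     [0, 0, 1]]
print(apply_homography(H, 2, 3))  # -> (12.0, 8.0)
```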

SLIDE 15

Conventional architecture

Ground station stitching pipeline: Feature Extraction → Feature Matching → RANSAC → Compositing

Input: video frames

SLIDE 16

Conventional architecture

Ground station stitching pipeline: Feature Extraction → Feature Matching → RANSAC → Compositing

Input: video frames

Feature extraction (6 ms / image) runs once per stream; offloading it to the UAVs turns the input into video frames + features

SLIDE 17

Offloading feature extraction

6X faster than the CPU benchmark, 2X faster than the GPU benchmark; scalability: constant in the number of video sources

[Plot: feature extraction time (ms) vs. number of video sources (2 to 12) for CPU Benchmark, GPU Benchmark, and SkyStitch]

slide-18
SLIDE 18

Speed optimization

Ground station stitching pipeline: Feature Matching → RANSAC → Compositing

Input: video streams + features

slide-19
SLIDE 19

Speeding up feature matching

  • Brute-force feature matching is inefficient and error-prone
  • 1K features per image → 1 million comparisons → 16 ms

slide-20
SLIDE 20

Exploiting flight status information

Camera attitude ← accelerometer + gyroscope
Camera heading ← compass
Camera location ← GPS

slide-21
SLIDE 21

Exploiting flight status information

  • Idea: estimate the matched feature's location

slide-22
SLIDE 22

Exploiting flight status information

  • Idea: search for the matched feature around the estimated location

slide-23
SLIDE 23

Exploiting flight status information

  • Idea: search for the matched feature within a radius r of the estimated location
  • r < 30 pixels for a 1280x1024 image
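A hypothetical sketch of the hint-guided matching idea: instead of comparing a feature against every feature in the other image, only test candidates that fall within radius r of its location predicted from flight status. The function names and the toy descriptor distance are illustrative, not SkyStitch's actual API.

```python
# Hint-guided matching sketch: skip any candidate outside the search window
# predicted from the flight sensors, then compare descriptors only inside it.

def guided_match(feat, predicted_xy, candidates, r=30):
    """candidates: list of (x, y, descriptor); returns best match within r."""
    px, py = predicted_xy
    best, best_dist = None, float("inf")
    for (x, y, desc) in candidates:
        if (x - px) ** 2 + (y - py) ** 2 > r * r:
            continue  # outside the search window -> no descriptor comparison
        d = sum((a - b) ** 2 for a, b in zip(feat, desc))  # toy L2 distance
        if d < best_dist:
            best, best_dist = (x, y), d
    return best

candidates = [(100, 100, [0, 9]), (510, 400, [1, 1]), (505, 398, [5, 5])]
print(guided_match([1, 1], (508, 399), candidates))  # -> (510, 400)
```

This is why the cost drops from all-pairs comparisons to a handful per feature: the window prunes almost all candidates before any descriptor math runs.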

slide-24
SLIDE 24

Speeding up feature matching

70X faster than the CPU benchmark, 4X faster than the GPU benchmark

Note: SkyStitch's feature matching runs on the CPU! Potentially much faster if implemented on the GPU.

[Plot: feature matching time (ms) vs. number of video sources (2 to 12) for GPU Benchmark and SkyStitch]

slide-25
SLIDE 25

Speed optimization

Ground station stitching pipeline: Feature Matching++ → GPU RANSAC → GPU Compositing

Input: video streams + features

Compositing 12 HD images takes 30 ms (20X faster)

slide-26
SLIDE 26

Putting things together…

SkyStitch: 22 fps; CPU Benchmark: 1.4 fps; GPU Benchmark: 4.2 fps

[Plot: stitching rate (fps) vs. number of video sources (2 to 12) for CPU Benchmark, GPU Benchmark, and SkyStitch]

slide-27
SLIDE 27

Speed is not everything

  • Perspective distortion
  • Frame drops
  • Perspective jerkiness

slide-28
SLIDE 28

Frame drops

  • When we get a bad homography, we have to drop the frame

[Diagram: frames with homography Hn; stitched frame n+1 fails ☹]

SLIDE 29

Failure recovery

  • Instead of dropping the frame, we predict a good one

Compute an optical-flow homography F on each UAV, then predict:

  Hn+1 = M2,n+1 F2,n+1 M2,n^-1 Hn M1,n F1,n+1^-1 M1,n+1^-1

(M is the orthorectification matrix)

[Diagram: frames n and n+1 from UAV 1 and UAV 2, optical-flow homographies F1,n+1 and F2,n+1, previous homography Hn; stitched frame n+1 fails ☹]

SLIDE 30

Failure recovery

  • With the predicted Hn+1 (same prediction formula as the previous slide), stitched frame n+1 succeeds ☺

SLIDE 31

Perspective jerkiness

SLIDE 32

Quantifying jerkiness

Decompose the homography between a pair of frames, H1 = R + t n^T, and read roll, pitch, yaw from R

[Diagram: frame 1 from each camera, related by homography H1]

SLIDE 33

Stitching each pair of frames

[Diagram: frames 1 to 4 from each camera, stitched pairwise with homographies H1 to H4]

SLIDE 34

Stitching is noisy

Pitch angles in estimated homographies, stitching only

[Plot: pitch angle (deg) vs. frame index (50 to 250); the stitching-only estimate is noisy]

SLIDE 35

Observation

We have two homography solutions for a pair of frames: one from stitching, the other from prediction

[Diagram: frames n and n+1 with homographies Hn, Hn+1 and optical-flow homographies F1,n+1, F2,n+1]

SLIDE 36

Keep doing prediction

[Diagram: frames 1 to 4 with homographies H1 to H4; H1 from stitching, H2 to H4 from prediction]

SLIDE 37

Optical flow is drifty

Pitch angles in estimated homographies, optical flow only

[Plot: pitch angle (deg) vs. frame index (50 to 250); the optical-flow-only estimate drifts]

SLIDE 38

Stitching vs. Prediction

            Short term   Long term
Stitching   Noisy        Stable
Prediction  Smooth       Drifty

SLIDE 39

Idea: fuse them

  • Decompose both: Hstitching → (R1, t1, n1), Hprediction → (R2, t2, n2)
  • Fuse R1 and R2 with a Multiplicative Extended Kalman Filter → R'
  • Solve the translation t' from the matched features
  • Recompose the fused homography H'
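The slides fuse the noisy-but-unbiased stitching estimate with the smooth-but-drifty prediction using a multiplicative EKF, whose details are beyond a sketch. As a much-simplified stand-in, a first-order complementary filter on a single angle illustrates the same short-term/long-term trade-off: the prediction dominates frame to frame, while stitching slowly pulls the state back and cancels drift. The gain value is illustrative, not SkyStitch's.

```python
# Simplified stand-in for the MEKF fusion: blend a drifty per-frame
# prediction with an absolute (noisy) stitching measurement.

def fuse(prev_angle, flow_delta, stitch_angle, gain=0.05):
    predicted = prev_angle + flow_delta                   # smooth but drifts
    return predicted + gain * (stitch_angle - predicted)  # stitching corrects

# Drifting flow (+0.1 deg/frame bias) against noiseless stitching at 0 deg:
angle = 0.0
for _ in range(200):
    angle = fuse(angle, 0.1, 0.0)
print(round(angle, 2))  # -> 1.9 (bounded; pure prediction would drift to 20.0)
```

The steady-state offset stays bounded instead of growing, which is the qualitative behavior the "Fused" curve on the next slide shows.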

SLIDE 40

Fusion

Pitch angles in estimated homographies

[Plot: pitch angle (deg) vs. frame index (50 to 250), comparing stitching only, optical flow only, and fused]

SLIDE 42

Implementation

  • UAVs: two DIY quadcopters ($1200 USD each)
  • Ground station: a single Linux desktop
  • 16k lines of C/C++

SLIDE 43

Demo 1

[Video: synchronized shutters under very strong wind; camera views from UAV 1 and UAV 2]

SLIDE 44

Demo 2

[Video: simulation of stitching 12 video streams]

SLIDE 45

Conclusion & Future work

Present:
  • Real-time performance (20 fps with 12 videos)
  • High quality (high success rate, low jerkiness)

Future:
  • Test in more complex scenarios
  • Maximize field of view
  • Network optimization

SLIDE 46

Thank you!

SLIDE 47

Unused slides

SLIDE 49

Performance of existing solutions

  • 1. Feature extraction: 6 ms per image (GPU)
  • 2. Feature matching: 3 ms per image pair (GPU)
  • 3. Estimation of H (RANSAC): 11 ms per image pair (CPU)
  • 4. Compositing

(Steps 1-3 form image alignment)

Test setup: Intel Core i7 2600K; GeForce GTX 670; OpenCV 2.4.8 with CUDA; 1000 features per image

SLIDE 50

Performance of existing solutions

  • Each stage could be a computational bottleneck
  • Optimize each stage one by one

SLIDE 51

Offloading feature extraction

Ground station stitching pipeline: Feature Matching → RANSAC → Compositing

Input: video streams + features

SLIDE 52

Exploiting flight status information

Orthorectification: cameras can be tilted due to wind turbulence, so warp each video frame as if the camera always pointed vertically downwards:

  M = K B R^-1 B^-1 K^-1

where R is the quadcopter attitude reported by the flight controller (video frames in, orthorectified video frames out).
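A sketch of building the orthorectification warp M = K B R^-1 B^-1 K^-1 from the slide. Our reading of the symbols (an assumption, as the slide does not define them): K is the camera intrinsics and B the fixed camera-to-body mounting transform; the values below are illustrative.

```python
# Build M = K B R^-1 B^-1 K^-1 with plain 3x3 helpers. Rotations are
# orthogonal, so their inverses are transposes.
import math

def matmul(A, B):
    return [[sum(A[i][k]*B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]

f, cx, cy = 1000.0, 640.0, 512.0              # illustrative intrinsics
K     = [[f, 0, cx], [0, f, cy], [0, 0, 1]]
K_inv = [[1/f, 0, -cx/f], [0, 1/f, -cy/f], [0, 0, 1]]  # closed-form inverse

a = math.radians(10)  # 10 degrees of roll due to wind
R = [[1, 0, 0], [0, math.cos(a), -math.sin(a)], [0, math.sin(a), math.cos(a)]]
B = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # assume camera aligned with the body

M = matmul(matmul(matmul(matmul(K, B), transpose(R)), transpose(B)), K_inv)
```

Warping each frame through M (e.g., with any perspective-warp routine) makes overlapping images differ by a near-planar transform, which is what the later homography sanity checks rely on.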

SLIDE 53

Speed optimization

Ground station stitching pipeline: Feature Matching++ → RANSAC → Compositing

Input: video streams + features

SLIDE 54

Speeding up RANSAC

  • Existing RANSAC homography estimator
    • Each iteration: solve a 4-point homography
      • Do an SVD on a 9x9 matrix to find the eigenvector corresponding to eigenvalue zero
    • Takes 11 ms for 512-iteration RANSAC on a 3.4 GHz Core i7
    • GPU SVD?
      • SVD is not well suited to the GPU architecture (shown later)

SLIDE 55

Speeding up RANSAC

  • Idea: no need to do an SVD at all!
    • Just find the null vector of the 9x9 matrix
    • Gauss-Jordan elimination is sufficient
      • Well suited to the GPU architecture
      • Much simpler code, no branching
      • Takes 0.6 ms for 512-iteration RANSAC

SLIDE 56

Speeding up RANSAC

  • Idea: maximize parallelism and minimize I/O
    • Compute ALL pairwise homographies in one pass

[Diagram: all matched correspondences are uploaded from the CPU once; the GPU runs Gauss-Jordan elimination on every 4-point sample of every pair to produce candidate homographies, computes reprojection errors and scores, and the inlier masks are downloaded from the GPU]
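The scoring stage run per candidate homography can be sketched as follows: project each correspondence through H, measure the reprojection error, and build the inlier mask and score. A pure-Python stand-in with illustrative names; on the GPU this runs for every candidate of every pair in parallel.

```python
# Score one candidate homography by reprojection error over the matches.

def score(H, matches, thresh=3.0):
    """matches: list of ((x, y), (u, v)) correspondences; returns
    (inlier count, inlier mask)."""
    inliers = []
    for (x, y), (u, v) in matches:
        w = H[2][0]*x + H[2][1]*y + H[2][2]
        px = (H[0][0]*x + H[0][1]*y + H[0][2]) / w
        py = (H[1][0]*x + H[1][1]*y + H[1][2]) / w
        err2 = (px - u) ** 2 + (py - v) ** 2       # squared pixel error
        inliers.append(err2 < thresh ** 2)
    return sum(inliers), inliers

H = [[1, 0, 10], [0, 1, 0], [0, 0, 1]]   # shift right by 10 pixels
matches = [((0, 0), (10, 0)), ((5, 5), (15, 5)), ((2, 2), (90, 90))]
print(score(H, matches))  # -> (2, [True, True, False])
```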

SLIDE 57

Speeding up RANSAC

24X faster than the CPU benchmark, 18X faster than the GPU benchmark based on Jacobian SVD (512 RANSAC iterations)

[Plot: RANSAC time (ms) vs. number of video sources (2 to 12) for CPU benchmark, GPU benchmark, and SkyStitch]

SLIDE 58

Multiple video sources

[Diagram: cameras 1 to 4 arranged in a loop, with pairwise homographies H1 to H4]

SLIDE 59

Multiple video sources

Between camera 1 and camera 4 there are two homography solutions. Which one to choose?

  • Solution #1: H4^-1
  • Solution #2: H3 H2 H1

[Diagram: cameras 1 to 4 in a loop with homographies H1 to H4]

SLIDE 60

Multiple video sources

Ideally the loop closes: I ≈ H4 H3 H2 H1. In reality, it often does not.

[Diagram: cameras 1 to 4 in a loop with homographies H1 to H4]

SLIDE 61

Multiple video sources

SLIDE 62

Closing the loop

  • Classical method: bundle adjustment
  • But it has two limitations:
    • Does not work if pairwise stitching fails and is predicted
    • Does not consider temporal information (jerkiness can be introduced)

SLIDE 63

Closing the loop

  • Idea: distribute the loop-closure error into each H in the loop

  I ≈ H H4 H3 H2 H1   (H is the accumulated error)

SLIDE 64

Closing the loop

  • Decompose the error H into a chain of elementary transformations: H = T4 S4 D4 R4

  I ≈ T4 S4 D4 R4 H4 H3 H2 H1
  I ≈ T4 S4 D4 R3 (H4 H3 H2 H2^-1 H3^-1 H4^-1) R H4 H3 H2 H1     (insert an identity; R4 = R3 R)
  I ≈ T4 S4 D4 R3 H4 H3 H2 (H2^-1 H3^-1 H4^-1 R H4 H3 H2) H1

The bracketed product times H1 becomes the new H1, and the rotational error is reduced by 1/4.
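A numeric check of the redistribution step above: writing the rotational error R4 as R3 R and conjugating R into the chain leaves the loop product unchanged, so the correction really can be pushed into a new H1. The matrices here are arbitrary invertible 3x3 examples, not real stitching homographies.

```python
# Verify: R4 H4 H3 H2 H1 == R3 H4 H3 H2 (H2^-1 H3^-1 H4^-1 R H4 H3 H2) H1
# when R4 = R3 R (here R is one quarter of the rotational loop error).
import math

def matmul(A, B):
    return [[sum(A[i][k]*B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def inv3(A):
    c = [[A[(i+1)%3][(j+1)%3]*A[(i+2)%3][(j+2)%3] -
          A[(i+1)%3][(j+2)%3]*A[(i+2)%3][(j+1)%3]
          for i in range(3)] for j in range(3)]  # adjugate
    det = sum(A[0][j]*c[j][0] for j in range(3))
    return [[c[i][j]/det for j in range(3)] for i in range(3)]

def rotz(t):
    return [[math.cos(t), -math.sin(t), 0],
            [math.sin(t), math.cos(t), 0],
            [0, 0, 1]]

def chain(*Ms):
    out = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    for M in Ms:
        out = matmul(out, M)
    return out

H1 = [[1.0, 0.2, 3.0], [0.1, 1.1, -2.0], [0.001, 0.0, 1.0]]
H2 = [[0.9, -0.1, 1.0], [0.2, 1.0, 0.5], [0.0, 0.002, 1.0]]
H3, H4 = rotz(0.1), rotz(-0.2)
R = rotz(0.01)
R3, R4 = chain(R, R, R), chain(R, R, R, R)

new_H1 = chain(inv3(H2), inv3(H3), inv3(H4), R, H4, H3, H2, H1)
lhs = chain(R4, H4, H3, H2, H1)
rhs = chain(R3, H4, H3, H2, new_H1)
print(max(abs(lhs[i][j] - rhs[i][j]) for i in range(3) for j in range(3)) < 1e-9)
# -> True
```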

SLIDE 65

Closing the loop

SLIDE 66

Exploiting flight status information

  • With orthorectified images and GPS positions, the world location of each pixel can be estimated

SLIDE 67

Rejecting bad homographies

  • If the number of inliers is too small (e.g., < 30), reject H

SLIDE 68

Rejecting bad homographies

  • If the features are not well spread out, that also leads to large perspective errors

SLIDE 69

Rejecting bad homographies

  • Intuition: after orthorectification, the image planes are approximately parallel to each other, but there is some perspective residue

[Diagram: image 1 and image 2 as nearly parallel planes]

SLIDE 70

Rejecting bad homographies

  • Intuition: the perspective residue is bounded

[Diagram: image 1 and image 2 as nearly parallel planes]

SLIDE 71

Rejecting bad homographies

  • Intuition: the perspective residue is bounded

Do a homography decomposition, H = R + t n^T, with cos(θ) = r33.
Reject H if |θ| exceeds a threshold (5° in our case).
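The check above reduces to a couple of lines once the rotation part R of the decomposition is available: read the tilt angle from r33 and compare against the threshold. A sketch with illustrative names; the rotation examples stand in for decomposed homographies.

```python
# Sanity-check a homography's rotation part: theta from r33 = cos(theta),
# reject when the residual perspective tilt exceeds 5 degrees.
import math

def accept(R, max_deg=5.0):
    theta = math.degrees(math.acos(R[2][2]))  # r33 = cos(theta)
    return abs(theta) <= max_deg

def rotx(deg):
    t = math.radians(deg)
    return [[1, 0, 0],
            [0, math.cos(t), -math.sin(t)],
            [0, math.sin(t), math.cos(t)]]

print(accept(rotx(2.0)), accept(rotx(9.0)))  # -> True False
```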

SLIDE 72

Real-world evaluation

  • Two different ground scenarios
    • Grass field: features are abundant and similar
    • Running track: features are scarce and unevenly distributed

SLIDE 73

Failure recovery

Maintaining a high success rate by prediction

[Plot: success rate (%) vs. overlap percentage (19% to 89%), with and without prediction]

SLIDE 74

Real-world evaluation

  • Grass field: features are abundant and similar

SLIDE 75

Real-world evaluation

  • Running track: features are scarce and unevenly distributed

SLIDE 76

Challenge #1

  • Image stitching is computationally expensive
    • No image stitching solution can stitch 10+ dynamic aerial videos at a real-time frame rate
    • Those that can support a real-time frame rate require cameras to be in fixed relative positions (e.g., a 360-degree camera)

SLIDE 77

Challenge #2

  • Stitching dynamic aerial videos can introduce perceptual artefacts
    • Perspective distortion
    • Perspective jerkiness
    • Frame drops

SLIDE 78

Failure recovery

  • When we get a bad homography, we have to drop the frame

[Diagram: frames n+1 with a bad homography Hn+1; stitched frame n+1 fails ☹]

SLIDE 79

Offloading feature extraction

On-board computer: NVIDIA Jetson TK1; 35 ms to extract 1000 ORB features

SLIDE 80

Stitch them together!

[Diagram: each UAV captures video and streams it to the ground station, which performs video stitching]

SLIDE 81

Failure recovery

  • Instead of dropping the frame, we predict a good one

Compute an optical-flow homography F on each UAV, then predict:

  Hn+1 = M2,n+1 F2,n+1 M2,n^-1 Hn M1,n F1,n+1^-1 M1,n+1^-1

(M is the orthorectification matrix)

[Diagram: stitched frame n succeeds with Hn; frame n+1 with a bad Hn+1 fails ☹]

SLIDE 82

Challenges

  • Image stitching is computationally expensive
    • Need to stitch 10+ HD videos in real time!
  • Stitching dynamic aerial videos can introduce perceptual artefacts