SKYSTITCH
A Cooperative Multi-UAV-based Real-time Video Surveillance System with Stitching
Xiangyun Meng, Wei Wang and Ben Leong National University of Singapore
Motivation
Aerial video surveillance has become ubiquitous
Search &
We always prefer…
Higher resolution (more details)
Larger field of view (better awareness)
Limited resolution Limited field of view
Video Streaming
Video Stitching
Image stitching is computationally expensive
Artefacts affect perceptual experience
Sensor Hints
Distributed Architecture
GPU
H Sanity Check
Fusion
Failure Recovery
4X improvement on stitching speed
Ensuring good quality under dynamic conditions
H
Image alignment
Feature Extraction Ground Station Stitching Pipeline Feature Matching RANSAC Compositing
Video frames
6ms / image Feature Extraction Feature Extraction Feature Extraction
+ Features
6X faster than CPU benchmark
2X faster than GPU benchmark
Scalability: constant
[Figure: time (ms) vs. number of video sources (2 to 12) for CPU Benchmark, GPU Benchmark, and SkyStitch]
Feature Matching RANSAC Compositing
Video streams + Features
Ground Station Stitching Pipeline
¨ Brute-force feature matching: inefficient and error-prone
¨ 1K features ➔1 million comparisons ➔ 16 ms
Camera Attitude Camera Heading Camera Location
Accelerometer Gyroscope Compass GPS
¨ Idea: estimate the matched feature’s location
¨ Idea: search for the matched feature around the
r < 30 pixels for a 1280x1024 image
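The windowed search can be illustrated with a minimal sketch. It assumes that the location of each feature in the other image has already been predicted from the sensor hints (that step is not shown), and that descriptors are binary strings stored as integers and compared by Hamming distance. The function names and data layout are hypothetical, not taken from the SkyStitch code.

```python
from math import hypot

def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")

def match_with_hints(predicted, candidates, radius=30.0):
    """Match each feature only against candidates near its predicted location.

    predicted:  list of ((x, y), descriptor); (x, y) is the location in
                image 2 predicted from the sensor hints (assumed given).
    candidates: list of ((x, y), descriptor) detected in image 2.
    Returns a list of (index_in_predicted, index_in_candidates) pairs.
    """
    matches = []
    for i, ((px, py), pdesc) in enumerate(predicted):
        best_j, best_dist = None, None
        for j, ((cx, cy), cdesc) in enumerate(candidates):
            if hypot(cx - px, cy - py) >= radius:
                continue  # outside the search window, skip the comparison
            d = hamming(pdesc, cdesc)
            if best_dist is None or d < best_dist:
                best_j, best_dist = j, d
        if best_j is not None:
            matches.append((i, best_j))
    return matches
```

This sketch still visits every candidate to test the distance. A practical version would bucket the candidates into a grid so that only nearby cells are visited at all.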
70X faster than CPU benchmark
4X faster than GPU benchmark
Note: SkyStitch's feature matching runs on CPU! Potentially much faster if implemented on GPU.
[Figure: time (ms) vs. number of video sources (2 to 12) for GPU Benchmark and SkyStitch]
Feature Matching++ GPU RANSAC GPU Compositing Ground Station
Video streams + Features
30 ms for compositing 12 HD images 20X faster
[Figure: stitching rate (fps) vs. number of video sources (2 to 12) for CPU Benchmark, GPU Benchmark, and SkyStitch]
SkyStitch: 22 fps
CPU Benchmark: 1.4 fps
GPU Benchmark: 4.2 fps
Perspective distortion Frame drops Perspective jerkiness
¨ When we get a bad homography, we have to drop
Frame n Frame n
Hn
Stitched frame n+1
¨ Instead of dropping the frame, we predict a good
Hn
F2,n+1
F1,n+1
Computing an optical flow homography F on each UAV
Predict Hn+1 (M is the orthorectification matrix):
$$H_{n+1} = M_{2,n+1}\, F_{2,n+1}\, M_{2,n}^{-1}\, H_n\, M_{1,n}\, F_{1,n+1}^{-1}\, M_{1,n+1}^{-1}$$
Stitched frame n+1
UAV 1: Frame n, Frame n+1
UAV 2: Frame n, Frame n+1
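The prediction formula is a chain of 3x3 matrix products, which can be sketched as follows. This is a minimal illustration under the assumption that every matrix is an invertible 3x3 homography stored as nested lists. The helper names (`matmul3`, `inv3`, `predict_homography`) are hypothetical.

```python
def matmul3(*mats):
    """Multiply 3x3 matrices left to right."""
    result = mats[0]
    for m in mats[1:]:
        result = [[sum(result[i][k] * m[k][j] for k in range(3))
                   for j in range(3)] for i in range(3)]
    return result

def inv3(m):
    """Inverse of a 3x3 matrix via the adjugate (assumes it is invertible)."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[x / det for x in row] for row in adj]

def predict_homography(H_n, M1_n, M1_next, M2_n, M2_next, F1_next, F2_next):
    """H_{n+1} = M2,n+1 F2,n+1 M2,n^-1 H_n M1,n F1,n+1^-1 M1,n+1^-1."""
    return matmul3(M2_next, F2_next, inv3(M2_n), H_n,
                   M1_n, inv3(F1_next), inv3(M1_next))
```

Read right to left, the product undoes the orthorectification and the optical flow of UAV 1 to return to frame n, applies the previous homography, and then re-applies the optical flow and orthorectification of UAV 2.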
Hn+1
Frame 1 Frame 1
H1: $R + t n^{T}$
Roll, pitch, yaw
Frame 1 Frame 1 Frame 2 Frame 2 Frame 3 Frame 3 Frame 4 Frame 4
H1 H2 H3 H4
[Figure: pitch angles in estimated homographies, angle (deg) vs. frames. Series: only stitching]
Frame n Frame n Frame n+1 Frame n+1
Hn+1 Hn
Stitched frame n Stitched frame n+1
F2,n+1
F1,n+1
We have two homography solutions for a pair of frames:
One is from stitching
The other is from prediction
Frame 1 Frame 1 Frame 2 Frame 2 Frame 3 Frame 3 Frame 4 Frame 4
H1 H2 H3 H4
Stitching Prediction Prediction Prediction
[Figure: pitch angles in estimated homographies, angle (deg) vs. frames. Series: only optical flow]
R2 t2 n2
Hstitching Hprediction R1 t1 n1
Multiplicative Extended Kalman Filter
R’
Solve Translation Matched features
t’
[Figure: pitch angles in estimated homographies, angle (deg) vs. frames. Series: only stitching, only optical flow, fused]
UAVs: Two DIY Quadcopters
Ground station: A single Linux desktop
16k lines of C/C++
DIY Quadcopter ($1200 USD each)
Synchronized shutter Very strong wind UAV 1 UAV 2 Camera views
Simulation of stitching 12 video streams
Present
Real-time performance (20 fps @ 12 videos)
High quality (high success rate, low jerkiness)
Future
Test in more complex scenarios
Maximizing field of view
Network optimization
6 ms per image (GPU) 3 ms per image pair (GPU) 11 ms per image pair (CPU)
Test setup: Intel Core i7 2600K; GeForce GTX 670; OpenCV 2.4.8 with CUDA; 1000 features per image
¨ Each stage could be a computational bottleneck ¨ Optimize each stage one by one
Video frames
R
Quadcopter attitude
Flight controller Camera
Warp each image as if camera is always pointed vertically downwards
Cameras can be tilted due to wind turbulence Orthorectified video frames
$M = K B R^{-1} B^{-1} K^{-1}$
Orthorectification
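A small sketch of how such a matrix could be built and applied. The slides only define R (the quadcopter attitude). Here K is assumed to be the camera intrinsic matrix and B a fixed rotation between the flight controller frame and the camera frame. Both readings are assumptions, as are the helper names and the numeric values in the usage note.

```python
from math import cos, sin, radians

def matmul3(*mats):
    """Multiply 3x3 matrices left to right."""
    result = mats[0]
    for m in mats[1:]:
        result = [[sum(result[i][k] * m[k][j] for k in range(3))
                   for j in range(3)] for i in range(3)]
    return result

def inv3(m):
    """Inverse of a 3x3 matrix via the adjugate (assumes it is invertible)."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[x / det for x in row] for row in adj]

def rotation_x(deg):
    """Rotation about the x-axis by `deg` degrees (stand-in for attitude R)."""
    c, s = cos(radians(deg)), sin(radians(deg))
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def orthorectification_matrix(K, B, R):
    """M = K B R^-1 B^-1 K^-1, following the formula on the slide."""
    return matmul3(K, B, inv3(R), inv3(B), inv3(K))

def warp_point(M, x, y):
    """Apply the homography M to pixel (x, y) and dehomogenize."""
    u, v, w = (M[r][0] * x + M[r][1] * y + M[r][2] for r in range(3))
    return u / w, v / w
```

For example, with a hypothetical `K = [[1000, 0, 640], [0, 1000, 512], [0, 0, 1]]`, an identity `B`, and an identity attitude `R`, the matrix `M` is the identity and pixels stay where they are. A non-zero tilt moves the image centre away from `(640, 512)`.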
Feature Matching++ RANSAC Compositing Ground Station
Video streams + Features
¨ Existing RANSAC homography estimator
¤ Each iteration: Solve a 4-point homography
n Do SVD on a 9x9 matrix to find the eigenvector corresponding to eigenvalue zero.
¤ Takes 11 ms for 512-iteration RANSAC on a 3.4GHz Core i7
¤ GPU SVD?
n SVD is not well suited for GPU architecture (shown later)
¨ Idea: no need to do SVD at all!
¤ Just find the null vector for the 9x9 matrix
¤ Gauss-Jordan elimination is sufficient
n Well suited to GPU architecture
n Much simpler code
n No branching
n Takes 0.6 ms for 512-iteration RANSAC
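The null-vector idea can be sketched on the CPU as follows. This is not the GPU kernel from the talk. It builds the 8x9 system that four correspondences give (the slide's 9x9 matrix is presumably the padded form of the same system), reduces it by Gauss-Jordan elimination, and reads off the null vector. It assumes a non-degenerate configuration and a non-zero last entry of the homography. The partial pivoting introduces branching that a GPU version would avoid. The function name is hypothetical.

```python
def homography_from_4_points(src, dst):
    """Estimate H (3x3, scaled so H[2][2] = 1) from 4 correspondences.

    Builds the system A h = 0 (8 equations, 9 unknowns) and reduces A to
    [I | c] by Gauss-Jordan elimination; the null vector is h = (-c, 1).
    """
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = [[float(val) for val in row] for row in A]
    for col in range(8):
        # Partial pivoting for numerical stability (a CPU-side convenience).
        pivot = max(range(col, 8), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [val / p for val in A[col]]
        # Eliminate this column from every other row.
        for r in range(8):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    h = [-A[r][8] for r in range(8)] + [1.0]
    return [h[0:3], h[3:6], h[6:9]]
```

In a RANSAC loop this function would be called once per iteration on a random sample of four matches, and each candidate would then be scored by its reprojection error over all matches.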
¨ Idea: maximize parallelism and minimize I/O
¤ Compute ALL pairwise homographies in one pass
[Figure: GPU RANSAC data flow. All matches (correspondences p, q) for every image pair are uploaded from the CPU. Gauss-Jordan elimination produces candidate 4-point homographies, and the reprojection error gives scores and an inlier mask per pair. The results for each pair are downloaded from the GPU.]
[Figure: RANSAC time (ms) vs. number of video sources (2 to 12) for CPU benchmark, GPU benchmark, and SkyStitch]
24X faster than CPU benchmark
18X faster than GPU benchmark based on Jacobi SVD (512 RANSAC iterations)
Camera 1 Camera 3
Camera 4 Camera 2
Between Camera 1 and 4, which one to choose?
Solution #1: $H_3 H_2 H_1$
Solution #2: $H_4^{-1}$
Ideally: $I \sim H_4 H_3 H_2 H_1$
In reality, often not.
¨ Classical method: bundle adjustment ¨ But it has two limitations:
¤ Does not work if pairwise stitching fails and is predicted
¤ Does not consider temporal information (jerkiness can be introduced)
¨ Idea: distribute the error into each H in the loop
¨ Decompose H into a chain of elementary
$\ldots^{-1} H_3^{-1} H_4^{-1})\, R\, H_4 H_3 H_2 H_1$
$\ldots^{-1} H_3^{-1} H_4^{-1}\, R\, H_4 H_3 H_2)\, H_1$
Identity matrix
Rotational error reduced by 1/4
The new H1
¨ With orthorectified images and GPS positions,
¨ If the number of inliers is too few (e.g., < 30), reject
¨ If the features are not well spread out, it also leads
¨ Intuition: after orthorectification, image planes are
Image 1 Image 2
¨ Intuition: the perspective residue is bounded
Image 1 Image 2
Image 1
Do homography decomposition
Reject H if |θ| > a threshold (5° in our case)
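The two rejection rules stated on these slides (too few inliers, perspective angle above a threshold) can be combined into a small check, with a fallback to the predicted homography when the stitched one is rejected. This is a sketch only. It assumes the angle θ has already been obtained from a homography decomposition that is not shown, and the function names are hypothetical.

```python
def passes_sanity_check(num_inliers, perspective_angle_deg,
                        min_inliers=30, max_angle_deg=5.0):
    """Return True if a stitched homography looks trustworthy.

    num_inliers:           RANSAC inlier count for this image pair.
    perspective_angle_deg: residual angle theta from a homography
                           decomposition (assumed computed elsewhere).
    """
    if num_inliers < min_inliers:
        return False  # too few inliers to trust the estimate
    return abs(perspective_angle_deg) <= max_angle_deg

def select_homography(stitched, predicted, num_inliers, perspective_angle_deg):
    """Use the stitched homography if it passes, else fall back to prediction."""
    if passes_sanity_check(num_inliers, perspective_angle_deg):
        return stitched
    return predicted
```

The default thresholds (30 inliers, 5 degrees) are the values quoted on the slides. The check on how well the features are spread out is omitted because the slide text for it is incomplete.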
¨ Two different ground scenarios
¤ Grass field
n Features are abundant and similar
¤ Running track
n Features are scarce and unevenly distributed
Maintaining high success rate by prediction
[Figure: success rate (%) vs. overlap percentage (%), comparing no prediction and with prediction]
¨ Image stitching is computationally expensive
¤ No image stitching solution can stitch 10+ dynamic aerial videos at real-time frame rate
¤ Those that can support real-time frame rate require cameras to be in fixed relative positions
360-degree camera
¨ Stitching dynamic aerial videos can introduce
¤ Perspective distortion
¤ Perspective jerkiness
¤ Frame drops
¨ When we get a bad homography, we have to drop
Frame n+1 Frame n+1
Hn+1
Stitched frame n+1
NVIDIA Jetson TK1 On-board computer 35 ms to extract 1000 ORB features
Ground station
Video streaming
Video capturing Video capturing Video stitching
¨ Instead of dropping the frame, we predict a good
Frame n Frame n
Hn
Stitched frame n
F2,n+1
F1,n+1
Computing an optical flow homography F on each UAV
Predict Hn+1 (M is the orthorectification matrix):
$$H_{n+1} = M_{2,n+1}\, F_{2,n+1}\, M_{2,n}^{-1}\, H_n\, M_{1,n}\, F_{1,n+1}^{-1}\, M_{1,n+1}^{-1}$$
Frame n+1 Frame n+1
Hn+1
Stitched frame n+1
¨ Image stitching is computationally expensive
¤ Need to stitch 10+ HD videos in real-time!