Efficient Deep Vision for Aerial Visual Understanding Dr Christos Kyrkou KIOS Research and Innovation Center of Excellence, University of Cyprus KIOS Seminar Series, 01/06/2020 kyrkou.christos@ucy.ac.cy funded by: @ChristosKyrkou
Dr Christos Kyrkou, “Efficient Deep Vision for Aerial Visual Understanding”, RCML2020, 4 September 2020
Computer Vision (CV) finally works. Now What?
Similarly large accuracy improvements on tasks such as
- Semantic Segmentation
- Object Detection
- 3D reconstruction
- …and so on
- Mostly deeper networks
- Intricate structures
- Millions of training images
CV/DL Deployment is Accelerating Rapidly
Image Sensor → Mobile → PC/Workstation → Cloud
Benefits: Security/Privacy, Cost Savings
Requirements: Less Power Consumption, Less Memory Usage, Fast Response
Market Demands Scalability for Machine Learning
Cloud Analytics:
- 1000s of classes
- Large workloads
- Highly efficient (Performance/W)
- Varying accuracy
- Server form factor
Edge Intelligence:
- <10 classes
- Frame rate: 15-30 fps
- Power: 1W-5W
- Cost: low
- Varying accuracy
- Custom form factor
Small Models have Big Advantages #1
- Fewer parameter weights mean bigger opportunities for scaling training
- Bigger networks increase the cost of communication between machines for distributed training
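The communication cost is easy to put in rough numbers: in data-parallel training each worker exchanges a full gradient per synchronization step, so traffic scales directly with parameter count. A back-of-the-envelope sketch, assuming 32-bit gradients and approximate parameter counts (~60M for AlexNet, ~1.25M for SqueezeNet):

```python
# Rough communication cost of one gradient exchange in data-parallel training.
# Assumption: each worker sends one full fp32 gradient per synchronization step.

def grad_sync_megabytes(n_params: int, bytes_per_value: int = 4) -> float:
    """Bytes moved per worker per sync step, in MB."""
    return n_params * bytes_per_value / 1e6

alexnet_mb = grad_sync_megabytes(60_000_000)    # ~60M parameters (approximate)
squeezenet_mb = grad_sync_megabytes(1_250_000)  # ~1.25M parameters (approximate)
print(alexnet_mb, squeezenet_mb)                # 240.0 5.0
```

A ~50x smaller model moves ~50x fewer bytes per sync step, which is why small models scale better across many machines.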
Credit: Forrest Iandola “Small Deep Neural Networks - Their Advantages, and Their Design”
Small Models have Big Advantages #2
Smaller number of weights enables complete on-chip integration:
- Fit the CNN model with its weights on-chip – no need for off-chip memory
- Dramatically reduces the energy for computing inference
- Gives the potential for pushing the data processing close to the data gathering (e.g., onboard cameras and other sensors)
- Limited memory of embedded devices makes small models absolutely essential for many applications
Credit: Song Han “Bandwidth-Efficient Deep Learning ——from Compression to Acceleration”
Small Models Have Big Advantages #3
Small models enable continuous wireless updates of models:
- Each time any sensor discovers a new image/situation that requires retraining, all models should be updated
- Data is uploaded to the cloud and used for training
- But… how to update all the vehicles that are running the model?
- At <500KB, downloading new model parameters is easy
Credit: Forrest Iandola “Small Deep Neural Networks - Their Advantages, and Their Design”
Continuous Updating of CNN Models
Model + Hardware Specialization
Credit: Song Han, Hardware Design Automation for Efficient Deep Learning, Samsung Forum
- Convolution, ReLU, and Pooling operations are inherently highly parallel in nature
- They are best accelerated by dedicated hardware in the FPGA
But how many Convolution, ReLU, and Pooling operations are needed?
Application of small DNNs to UAVs
Challenges
State-of-the-art CV algorithms often require extensive hardware: limited payload!
Remote processing of images: a solution?
- Use of a ground station
- Requires a high-bandwidth, minimal-latency, ultra-reliable connection
- Severe limitations! (especially when targeting autonomous UAVs!)
On-board processing: specific inherent challenges
- Limited computational power
- Limited weight, power consumption
Contradiction! Extreme optimization of HW and SW is the solution for on-board processing!
Autonomous patrolling and recognition
[Diagram: Embedded UAV system – an image sensor performs image acquisition and feeds an embedded platform running automated path planning software; example recognized incidents: fire, flood, collapsed building, car crash]
Vision System for disasters and incidents
Aerial Image Dataset for Emergency Response (AIDER): an order of magnitude more images than previous works
- C. Kyrkou and T. Theocharides, "EmergencyNet: Efficient Aerial Image Classification for Drone-Based Emergency Monitoring Using Atrous Convolutional Feature Fusion," in IEEE JSTARS, vol. 13, pp. 1687-1699, 2020
- C. Kyrkou, T. Theocharides "Deep-Learning-Based Aerial Image Classification for Emergency Response Applications using Unmanned Aerial Vehicles", CVPR 3d International Workshop in Computer Vision for UAVs, Long Beach, CA, 16-20 June, 2019, pp. 517-525.
Pretrained Networks
For transfer learning, established networks [3]-[5] are used, which have also been used in prior works for disaster monitoring [1,2].
[3] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," CoRR, vol. abs/1409.1556, 2014. [Online]. Available: http://arxiv.org/abs/1409.1556
[4] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," CoRR, vol. abs/1512.03385, 2015. [Online]. Available: http://arxiv.org/abs/1512.03385
[5] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, "MobileNets: Efficient convolutional neural networks for mobile vision applications," CoRR, vol. abs/1704.04861, 2017. [Online]. Available: http://arxiv.org/abs/1704.04861
How do you create a small DNN?
Credit: Forrest Iandola “Small Deep Neural Networks - Their Advantages, and Their Design”
Atrous Convolutional Feature Fusion
- C. Kyrkou and T. Theocharides, "EmergencyNet: Efficient Aerial Image Classification for Drone-Based Emergency Monitoring Using Atrous Convolutional Feature Fusion," in IEEE JSTARS, vol. 13, pp. 1687-1699, 2020
- C. Kyrkou, T. Theocharides "Deep-Learning-Based Aerial Image Classification for Emergency Response Applications using Unmanned Aerial Vehicles", CVPR 3d International Workshop in Computer Vision for UAVs, Long Beach, CA, 16-20 June, 2019, pp. 517-525.
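The primitive behind atrous convolutional feature fusion is the atrous (dilated) convolution, which enlarges the receptive field without adding weights. A minimal NumPy sketch of a single-channel dilated convolution (the fusion of parallel branches at different rates in EmergencyNet is omitted here):

```python
import numpy as np

def atrous_conv2d(x, w, rate=1):
    """'Valid' dilated convolution of a 2D map x with a k x k kernel w.
    The kernel taps are spaced `rate` pixels apart, so a 3x3 kernel at
    rate=2 covers a 5x5 window while still using only 9 weights."""
    k = w.shape[0]
    span = rate * (k - 1)                       # receptive-field extent minus 1
    H, W = x.shape
    out = np.zeros((H - span, W - span))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + span + 1:rate, j:j + span + 1:rate]
            out[i, j] = np.sum(patch * w)
    return out

x = np.arange(25, dtype=float).reshape(5, 5)
w = np.zeros((3, 3)); w[1, 1] = 1.0             # identity kernel: picks the centre tap
print(atrous_conv2d(x, w, rate=2))              # [[12.]] -- the centre of the 5x5 input
```

With rate=1 this reduces to an ordinary 3x3 convolution; increasing the rate widens the context each output pixel sees at zero extra parameter cost.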
Macro-Architecture Design Choices
Reduced Cost of First Layer and Early Downsampling
- 16 channels with strided convolution
Canonical Architecture
- A progressive reduction of spatial resolution with an increase in depth of up to 256 channels
Fully Convolutional Architecture
- No dense layers
Network Depth
- 7 main blocks
Capped Leaky ReLU
- Capped at [0, 255] with different modes during training and inference
[Plot: capped leaky ReLU activation curves for training and inference, capped at 255]
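The capped leaky ReLU can be sketched as below; the exact training/inference split is an assumption here (leaky negative slope with an upper cap during training, hard clip into [0, 255] at inference so activations map directly onto unsigned 8-bit values):

```python
import numpy as np

def capped_leaky_relu(x, alpha=0.1, cap=255.0, training=True):
    """Capped leaky ReLU with different modes for training and inference.
    Assumption: training keeps the leaky negative slope; inference
    hard-clips into [0, cap] so activations fit an unsigned 8-bit range."""
    if training:
        y = np.where(x > 0.0, x, alpha * x)   # leaky negative side
        return np.minimum(y, cap)             # cap the positive side
    return np.clip(x, 0.0, cap)               # hard clip for inference

print(capped_leaky_relu(np.array([-10.0, 100.0, 300.0])))           # [ -1. 100. 255.]
print(capped_leaky_relu(np.array([-10.0, 300.0]), training=False))  # [  0. 255.]
```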
- C. Kyrkou and T. Theocharides, "EmergencyNet: Efficient Aerial Image Classification for Drone-Based Emergency Monitoring Using Atrous Convolutional Feature Fusion," in IEEE JSTARS, vol. 13, pp. 1687-1699, 2020
- C. Kyrkou, T. Theocharides "Deep-Learning-Based Aerial Image Classification for Emergency Response Applications using Unmanned Aerial Vehicles", CVPR 3d International Workshop in Computer Vision for UAVs, Long Beach, CA, 16-20 June, 2019, pp. 517-525.
Performance Evaluation
- C. Kyrkou and T. Theocharides, "EmergencyNet: Efficient Aerial Image Classification for Drone-Based Emergency Monitoring Using Atrous Convolutional Feature Fusion," in IEEE JSTARS, vol. 13, pp. 1687-1699, 2020
- C. Kyrkou, T. Theocharides "Deep-Learning-Based Aerial Image Classification for Emergency Response Applications using Unmanned Aerial Vehicles", CVPR 3d International Workshop in Computer Vision for UAVs, Long Beach, CA, 16-20 June, 2019, pp. 517-525.
DroNet – Aerial Vehicle Detection
- Trained with a custom database for vehicle detection
- Processes 512×512 images
- Makes use of 3×3 filters and cheaper 1×1 convolutions
- Progressively reduces the feature map size by a factor of 2
- Lower number of filters at early layers
- Shortcut connections to further improve accuracy for small objects
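The "cheaper 1×1 convolutions" point is easy to quantify: a convolution layer holds k·k·C_in·C_out weights (plus one bias per filter), so a 1×1 layer is roughly 9× cheaper than a 3×3 layer with the same channel counts. A small sketch:

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a k x k convolution layer, plus one bias per filter."""
    return k * k * c_in * c_out + c_out

p3x3 = conv_params(64, 64, 3)   # 3*3*64*64 + 64 = 36928
p1x1 = conv_params(64, 64, 1)   # 1*1*64*64 + 64 = 4160
print(p3x3, p1x1)               # 36928 4160
```

Mixing 1×1 layers between 3×3 layers is a standard way to keep parameter count and multiply-accumulate operations low in small detectors.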
Christos Kyrkou, George Plastiras, Stylianos Venieris, Theocharis Theocharides, Christos-Savvas Bouganis, "DroNet: Efficient convolutional neural network detector for real-time UAV applications," DATE2018 , pp. 967-972, March 2018.
No free lunch!
- UAVs are equipped with high-resolution cameras and fly at high altitudes
- Reducing the image size effectively reduces the object resolution and detail
- Difficult to detect smaller objects such as pedestrians
[Pipeline: Input Image → Resizing → Single-shot CNN (DroNet or tinyYOLO) → Detections]
Selective Tile Processing - Overview
[Pipeline: Input Image → Tiling → Attention Mechanism → Small Fast CNN]
- Tiling: how to process a higher-resolution image?
- Attention Mechanism: which image parts to process, given what happened in previous frames?
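The tiling step itself can be sketched as follows: pad the frame to a multiple of the tile size, then cut it into non-overlapping tiles (overlap and stride choices are left out of this sketch):

```python
import numpy as np

def tile_image(img, tile=512):
    """Split an H x W image into non-overlapping tile x tile patches,
    zero-padding the borders so both dimensions divide evenly.
    Returns a list of ((row, col), patch) pairs."""
    H, W = img.shape[:2]
    ph = (-H) % tile                 # rows of padding needed
    pw = (-W) % tile                 # cols of padding needed
    padded = np.pad(img, ((0, ph), (0, pw)))
    tiles = []
    for r in range(0, H + ph, tile):
        for c in range(0, W + pw, tile):
            tiles.append(((r, c), padded[r:r + tile, c:c + tile]))
    return tiles

frame = np.zeros((600, 900))         # e.g. a single-channel aerial frame
tiles = tile_image(frame, tile=512)
print(len(tiles))                    # 4 tiles: 2 rows x 2 cols after padding
```

The (row, col) offsets let detections from each tile be mapped back into full-image coordinates before the attention and memory stages.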
Memory Mechanism
- Keep track of detection metrics in each tile over time
- The relative position of objects will not change significantly over a few successive frames
- A memory buffer is introduced that keeps track of:
  - position of the bounding box with respect to the image
  - a detection counter for each bounding box
  - the latest tile it was detected in
  - detection confidence
  - and the class type (vehicle, pedestrian, etc.)
[Figure: previous vs. new detections compared across successive tiles at t=0, t=1, t=2]
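The buffer can be sketched as one small record per tracked detection plus an IoU test for comparing boxes across successive frames; the field names here are illustrative, not the paper's:

```python
from dataclasses import dataclass

@dataclass
class TrackedDetection:
    box: tuple         # (x1, y1, x2, y2) in full-image coordinates
    count: int         # detection counter: frames this object has been seen in
    tile_id: int       # latest tile the object was detected in
    confidence: float  # latest detection confidence
    cls: str           # class type (vehicle, pedestrian, etc.)

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

prev = TrackedDetection((0, 0, 10, 10), count=3, tile_id=2, confidence=0.9, cls="vehicle")
print(round(iou(prev.box, (5, 5, 15, 15)), 3))   # 0.143
```

A new detection with high IoU against a stored box updates that record (counter, confidence, tile); a low-IoU detection starts a new record.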
Tile Selection with Memory – TSM Attention
- Calculate the weight of tile j based on four main criteria:
  - The number of objects detected in each tile – P_j (extracted from the CNN)
  - The cumulative intersection-over-union between current (C_l) and previous (C_k) bounding boxes – J_j
  - The number of times not selected for processing over time – T_j
  - The number of frames passed since last selected for processing – G_j
- Select the top O_T tiles above a threshold for processing

W_j = P_j / max_k(P_k) + (1 − J_j / max_k(J_k)) + T_j / max_k(T_k) + G_j / max_k(G_k), ∀k ∈ [0, …, O_U − 1]

J_j = Σ_{l=0}^{P_j} max_k IoU(C_l, C_k)
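A direct NumPy transcription of this weighting, computed for all tiles at once (the guard for a zero maximum is an assumption the slide leaves implicit):

```python
import numpy as np

def tile_weights(P, J, T, G):
    """W_j = P_j/max(P) + (1 - J_j/max(J)) + T_j/max(T) + G_j/max(G).
    Each criterion is normalised by its maximum over all tiles;
    a zero maximum contributes 0 to every tile (assumption)."""
    def norm(v):
        v = np.asarray(v, dtype=float)
        m = v.max()
        return v / m if m > 0 else np.zeros_like(v)
    return norm(P) + (1.0 - norm(J)) + norm(T) + norm(G)

# Tile 0: more detections, skipped longer ago; tile 1: higher overlap with
# already-tracked boxes, so it is less urgent to reprocess.
W = tile_weights(P=[2, 1], J=[0.5, 1.0], T=[0, 3], G=[2, 1])
print(W)   # [2.5 2. ] -> tile 0 ranks first
```

Note the inverted J term: a tile whose detections already overlap the memorised boxes is considered well-tracked and is down-weighted, while long-unselected tiles (T, G) are pushed up so no region starves.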
Various Attention Mechanisms
- Select which tiles are to be processed by the CNN
- All-Tiles – TA
- Single Tile (Round Robin) – T1
- Selective Tile Processing – STP
Tiling + Quantization
We are selective in our processing but might still process more than one tile per frame, which increases computation demands. Can we make further improvements? Combine quantization and tiling techniques and analyse the impact on accuracy and average processing time on a UAV-based pedestrian dataset.
- 8-bit integer quantization on both input and weights
Implementation is based on Darknet, a C- and CUDA-based neural network framework [1], with the use of the DP4A instruction [2].
[1] https://github.com/AlexeyAB/yolo2_light [2] https://developer.nvidia.com/blog/mixed-precision-programming-cuda-8/
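A minimal sketch of symmetric per-tensor 8-bit quantization, the same family of scheme as the int8 inputs/weights above. The max-abs scale rule here is an assumption; the actual Darknet/DP4A path also quantizes activations per layer and accumulates products in int32:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor quantization: q = round(x / scale), clipped to int8."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.array([-1.0, 0.25, 1.0], dtype=np.float32)
q, s = quantize_int8(w)
err = np.abs(dequantize(q, s) - w).max()
print(q, bool(err <= s / 2))   # worst-case rounding error is half a quantization step
```

Int8 weights cut memory traffic 4× versus fp32, and DP4A lets the GPU fuse four int8 multiply-accumulates into one instruction, which is where the speed-up on the tiled detector comes from.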
Evaluation: Accuracy
Evaluation: Average Processing Time
Edge Intelligence demands Holistic Optimization
- Data Selection/Reduction
- Small DNN search and optimization
- HW Acceleration
Concluding Remarks
Deep Learning and Computer Vision are moving to the Edge
- Drones are a prime example of a resource-constrained system with additional challenges for detectability at a distance
Exploration of neural network architectures is key for deployment on hardware-constrained devices
- Prepares the model for embedded/FPGA applications
Prior knowledge can further push performance
[Diagram: Efficient Deep Vision = Small Efficient CNNs + Search Strategies + Prior Knowledge and sensor information]
Thank you for your attention!
kyrkou.christos@ucy.ac.cy https://sites.google.com/site/chriskyrkou/ https://github.com/ckyrkou/EmergencyNet https://github.com/gplast/DroNet
NVIDIA Corporation has supported this research with the donation of 2 Titan Xp GPUs. This work has been supported by the European Union's Horizon 2020 research and innovation programme under grant agreement No 739551 (KIOS CoE).