SLIDE 1

Efficient Deep Vision for Aerial Visual Understanding

Dr Christos Kyrkou KIOS Research and Innovation Center of Excellence, University of Cyprus KIOS Seminar Series, 01/06/2020 kyrkou.christos@ucy.ac.cy @ChristosKyrkou christoskyrkou.com

SLIDE 2

Dr Christos Kyrkou, “Efficient Deep Vision for Aerial Visual Understanding”, RCML2020, 4 September 2020


Computer Vision (CV) finally works. Now What?

Similarly large accuracy improvements on tasks such as

  • Semantic Segmentation
  • Object Detection
  • 3D reconstruction
  • …and so on

Achieved mostly through:

  • Deeper networks
  • Intricate structures
  • Millions of training images

SLIDE 3

CV/DL Deployment is Accelerating Rapidly

[Diagram: deployment targets from Image Sensor to Mobile, PC/Workstation, and Cloud]

Benefits: Security/Privacy, Cost Savings
Requirements: less power consumption, less memory usage, fast response

SLIDE 4

Market Demands Scalability for Machine Learning

Cloud Analytics:
  • 1000s of classes
  • Large workloads
  • Highly efficient (Performance/W)
  • Varying accuracy
  • Server form factor

Edge Intelligence:
  • <10 classes
  • Frame rate: 15-30 fps
  • Power: 1W-5W
  • Cost: low
  • Varying accuracy
  • Custom form factor

SLIDE 5

Small Models have Big Advantages #1

Fewer parameter weights mean bigger opportunities for scaling training. Bigger networks increase the cost of communication between machines during distributed training.

Credit: Forrest Iandola “Small Deep Neural Networks - Their Advantages, and Their Design”

SLIDE 6

Small Models have Big Advantages #2

Smaller number of weights enables complete on-chip integration

  • Fit the CNN model with weights on-chip – no need for off-chip memory
  • Dramatically reduces the energy for computing inference
  • Gives the potential for pushing the data processing close to the data gathering (e.g., onboard cameras and other sensors)
  • Limited memory of embedded devices makes small models absolutely essential for many applications

Credit: Song Han “Bandwidth-Efficient Deep Learning ——from Compression to Acceleration”

SLIDE 7

Small Models Have Big Advantages #3

Small models enable continuous wireless updates of models. Each time any sensor discovers a new image/situation that requires retraining, all models should be updated: data is uploaded to the cloud and used for training. But how do you update all the vehicles that are running the model? At <500KB, downloading new model parameters is easy.
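The <500KB budget is easy to sanity-check with simple arithmetic: download size is roughly parameter count times bytes per weight. A minimal sketch; the parameter counts below are illustrative, not those of any specific network from the talk.

```python
# Back-of-the-envelope model size: parameters x bytes per weight.
# The 500 KB budget comes from the slide; the parameter counts are
# hypothetical examples, not the actual networks discussed.

def model_size_kb(num_params: int, bytes_per_weight: int = 4) -> float:
    """Size of the weight tensor alone, in kilobytes (float32 by default)."""
    return num_params * bytes_per_weight / 1024

# A small CNN with ~100k float32 weights fits comfortably under 500 KB...
small = model_size_kb(100_000)
# ...while a classic large model with ~138M weights is far over budget.
large = model_size_kb(138_000_000)

print(f"small: {small:.0f} KB, large: {large:.0f} KB")
```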

Credit: Forrest Iandola “Small Deep Neural Networks - Their Advantages, and Their Design”

Continuous Updating of CNN Models

SLIDE 8

Model + Hardware Specialization

Credit: Song Han, Hardware Design Automation for Efficient Deep Learning, Samsung Forum

  • Convolution, ReLU and Pooling operations are inherently highly parallel in nature
  • They are best accelerated by dedicated hardware in the FPGA

But how many Convolution, ReLU and Pooling operations are needed?

SLIDE 9

Application of small DNNs to UAVs

SLIDE 10

Challenges

 State-of-the-art CV algorithms often require extensive hardware: limited payload!
 Remote processing of images: a solution?

  • Use of a ground station
  • Requires a high-bandwidth, minimal-latency, ultra-reliable connection
  • Severe limitations! (especially when targeting autonomous UAVs!)

 On-board processing: specific inherent challenges

  • Limited computational power
  • Limited weight, power consumption

 Extreme optimization of HW and SW is the solution for on-board processing!

Contradiction!

SLIDE 11

Autonomous patrolling and recognition

[System diagram: Image Sensor → Image Acquisition → Embedded Platform (Embedded UAV System) → Automated Path Planning Software; recognized incidents include Fire, Flood, Collapsed Building, and Car Crash]

SLIDE 12

Vision System for disasters and incidents

Aerial Image Dataset for Emergency Response (AIDER): an order of magnitude more images than previous works

  • C. Kyrkou and T. Theocharides, "EmergencyNet: Efficient Aerial Image Classification for Drone-Based Emergency Monitoring Using Atrous Convolutional Feature Fusion," in IEEE JSTARS, vol. 13, pp. 1687-1699, 2020
  • C. Kyrkou and T. Theocharides, "Deep-Learning-Based Aerial Image Classification for Emergency Response Applications using Unmanned Aerial Vehicles," CVPR 3rd International Workshop on Computer Vision for UAVs, Long Beach, CA, 16-20 June 2019, pp. 517-525
SLIDE 13

Pretrained Networks

For transfer learning, established networks are used, which have also been used in prior works for disaster monitoring [1,2].

[3] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," CoRR, vol. abs/1409.1556, 2014. [Online]. Available: http://arxiv.org/abs/1409.1556
[4] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," CoRR, vol. abs/1512.03385, 2015. [Online]. Available: http://arxiv.org/abs/1512.03385
[5] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, "MobileNets: Efficient convolutional neural networks for mobile vision applications," CoRR, vol. abs/1704.04861, 2017. [Online]. Available: http://arxiv.org/abs/1704.04861

SLIDE 14

How do you create a small DNN?

Credit: Forrest Iandola “Small Deep Neural Networks - Their Advantages, and Their Design”

SLIDE 15

Atrous Convolutional Feature Fusion

SLIDE 16

Macro-Architecture Design Choices

 Reduced Cost of First Layer and Early Downsampling

  • 16 channels with strided convolution

 Canonical Architecture

  • A progressive reduction of spatial resolution with an increase in depth of up to 256 channels

 Fully Convolutional Architecture

  • No dense layers

 Network Depth

  • 7 main blocks

 Capped Leaky ReLU

  • Capped to [0,…,255] with different modes during training and inference

[Plot: capped leaky ReLU activation with the 255 cap, training vs. inference modes]
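A minimal NumPy sketch of a capped leaky ReLU. The negative slope, the cap value, and the assumption that the hard cap applies only at inference are illustrative choices, not the exact EmergencyNet formulation (see the cited paper for that).

```python
import numpy as np

# Sketch of a "capped leaky ReLU": leaky slope alpha on negative inputs,
# upper cap at 255 so activations fit an 8-bit range. Applying the hard
# cap only in inference mode is one plausible reading of the slide's
# "different modes during training and inference" - an assumption here.

def capped_leaky_relu(x, alpha=0.1, cap=255.0, training=True):
    y = np.where(x >= 0, x, alpha * x)   # leaky ReLU
    if not training:
        y = np.minimum(y, cap)           # hard cap only at inference
    return y

print(capped_leaky_relu(np.array([-10.0, 100.0, 300.0]), training=False))
```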

SLIDE 17

Performance Evaluation

SLIDE 18

DroNet – Aerial Vehicle Detection

  • Trained with a custom database for vehicle detection
  • Processes 512x512 images
  • Makes use of 3x3 filters and cheaper 1x1 convolutions
  • Progressively reduces the feature map size by a factor of 2
  • Lower number of filters at early layers
  • Shortcut connections to further improve accuracy for small objects

Christos Kyrkou, George Plastiras, Stylianos Venieris, Theocharis Theocharides, Christos-Savvas Bouganis, "DroNet: Efficient convolutional neural network detector for real-time UAV applications," DATE2018 , pp. 967-972, March 2018.
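The "cheaper 1x1 convolutions" point follows directly from the multiply-accumulate (MAC) count of a convolutional layer, H·W·C_in·C_out·k². A quick sketch; the layer dimensions below are illustrative, not DroNet's actual ones.

```python
# MAC count of a standard convolution over an HxW feature map with
# C_in input channels, C_out output channels, and a kxk kernel.

def conv_macs(h, w, c_in, c_out, k):
    return h * w * c_in * c_out * k * k

# Same spatial size and channel counts, different kernel size:
macs_3x3 = conv_macs(64, 64, 128, 128, 3)
macs_1x1 = conv_macs(64, 64, 128, 128, 1)
print(macs_3x3 / macs_1x1)  # → 9.0: the 3x3 kernel does 9x the work
```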

SLIDE 19

No free lunch!

  • UAVs are equipped with high-resolution cameras and fly at high altitudes
  • Reducing the image size effectively reduces the object resolution and detail
  • Difficult to detect smaller objects such as pedestrians

[Pipeline: Input Image → Resizing → Single-shot CNN (DroNet or tinyYOLO) → Detections]

SLIDE 20

Selective Tile Processing - Overview

[Overview: Input Image → Tiling (how to process a higher-resolution image?) → Attention Mechanism (which image parts to process? what happened in previous frames?) → Small Fast CNN]
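The tiling step can be sketched in a few lines, assuming non-overlapping fixed-size tiles; the tile size and the edge-cropping behavior are simplifications (a real pipeline would pad or overlap tiles).

```python
import numpy as np

# Minimal tiling sketch: split a high-resolution frame into a grid of
# equal, non-overlapping tiles, each small enough for the fast CNN.
# Edges that don't fill a full tile are simply dropped here.

def tile_image(img: np.ndarray, tile: int = 512):
    """Yield (row, col, tile) triples for each non-overlapping tile."""
    h, w = img.shape[:2]
    for r in range(0, h - tile + 1, tile):
        for c in range(0, w - tile + 1, tile):
            yield r, c, img[r:r + tile, c:c + tile]

frame = np.zeros((1024, 2048, 3), dtype=np.uint8)
tiles = list(tile_image(frame))
print(len(tiles))  # 2 rows x 4 cols = 8 tiles
```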

SLIDE 21

Memory Mechanism

  • Keep track of detection metrics in each tile over time
  • The relative position of objects will not change significantly over a few successive frames
  • A memory buffer is introduced that keeps track of:
  • the position of the bounding box with respect to the image
  • a detection counter for each bounding box
  • the latest tile it was detected in
  • the detection confidence
  • the class type (vehicle, pedestrian, etc.)

[Diagram: previous vs. new detections compared across successive tiles at t=0, t=1, t=2]
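The buffer record above can be sketched as a small data structure, together with the IoU matcher used to compare buffered boxes against new detections. Field names here are mine, not the paper's.

```python
from dataclasses import dataclass

# One memory-buffer record, holding the fields listed on the slide.
@dataclass
class Detection:
    box: tuple          # (x, y, w, h) in image coordinates
    count: int          # detection counter for this bounding box
    tile_id: int        # latest tile the object was detected in
    confidence: float   # latest detection confidence
    cls: str            # class type, e.g. "vehicle" or "pedestrian"

def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes, used to match
    new detections against buffered ones across successive frames."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0
```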

SLIDE 22

Tile Selection with Memory – TSM Attention

  • Calculate the value of tile 𝑗 based on four main criteria:
  • The number of objects detected in each tile – 𝑷𝒋 (extracted from the CNN)
  • The cumulative intersection-over-union between current (𝐶𝑙) and previous (𝐶𝑘) bounding boxes – 𝑱𝒋
  • The number of times not selected for processing over time – 𝑻𝒋
  • The number of frames passed since last selected for processing – 𝑮𝒋
  • Select the top 𝑶𝑻 tiles above a threshold for processing

$$W_j = \frac{P_j}{\max_k(P_k)} + \left(1 - \frac{J_j}{\max_k(J_k)}\right) + \frac{T_j}{\max_k(T_k)} + \frac{G_j}{\max_k(G_k)}, \quad \forall k \in [0,\dots,(O_U-1)]$$

$$J_j = \sum_{l=0}^{P_j} \max_k\big(\mathrm{IoU}(C_l, C_k)\big)$$
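A minimal sketch of this tile-weighting rule: each criterion is normalized by its maximum over all tiles so the four terms are comparable, and the IoU term is inverted because tiles whose objects moved deserve attention. The zero-maximum guard and the example numbers are my additions, not from the paper.

```python
# Tile weights W_j from the four normalized criteria:
#   P = objects detected per tile, J = cumulative IoU vs. buffered boxes,
#   T = times not selected,        G = frames since last selected.

def tile_weights(P, J, T, G):
    def norm(v, vs):
        m = max(vs)
        return v / m if m else 0.0   # guard: all-zero criterion contributes 0
    return [
        norm(P[j], P) + (1 - norm(J[j], J)) + norm(T[j], T) + norm(G[j], G)
        for j in range(len(P))
    ]

# Three tiles: tile 0 has many low-overlap objects, tile 2 was skipped longest.
w = tile_weights(P=[4, 1, 0], J=[0.2, 0.9, 0.0], T=[0, 1, 3], G=[0, 2, 5])
print(w)
```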

SLIDE 23

Various Attention Mechanisms

  • Select which tiles are processed by the CNN
  • All-Tiles – TA
  • Single Tile (Round Robin) – T1
  • Selective Tile Processing – STP
SLIDE 24

Tiling + Quantization

We are selective in our processing but might still process more than one tile per frame, which increases computation demands. Can we make further improvements? Combine quantization and tiling techniques and analyse the impact on accuracy and average processing time on a UAV-based pedestrian dataset.

  • 8-bit integer quantization on both input and weights

Implementation is based on Darknet, a C- and CUDA-based neural network framework [1], with the use of the DP4A instruction [2].


[1] https://github.com/AlexeyAB/yolo2_light [2] https://developer.nvidia.com/blog/mixed-precision-programming-cuda-8/
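A minimal sketch of symmetric per-tensor 8-bit quantization, the general technique behind the Darknet/DP4A setup (not that implementation): pick a scale so float values map onto int8, then dequantize to see the rounding error.

```python
import numpy as np

# Symmetric per-tensor int8 quantization sketch (illustrative, not the
# yolo2_light code): scale maps the largest magnitude onto 127.

def quantize_int8(x: np.ndarray):
    m = float(np.abs(x).max())
    scale = m / 127.0 if m > 0 else 1.0
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float):
    return q.astype(np.float32) * scale

w = np.array([-1.0, -0.5, 0.0, 0.25, 1.0], dtype=np.float32)
q, s = quantize_int8(w)
err = float(np.abs(dequantize(q, s) - w).max())  # worst rounding error
```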

SLIDE 25

Evaluation: Accuracy

SLIDE 26

Evaluation: Average Processing Time

SLIDE 27

Edge Intelligence demands Holistic Optimization

  • Data Selection/Reduction
  • Small DNN search and optimization
  • HW Acceleration

SLIDE 28

Concluding Remarks

Deep Learning and Computer Vision are moving to the Edge

  • Drones are a prime example of a resource-constrained system with additional challenges for detectability at a distance

Exploration of neural network architectures is key for deployment on hardware-constrained devices

  • Prepares the model for embedded/FPGA applications

Prior knowledge can further push performance

[Diagram: Efficient Deep Vision at the intersection of Small Efficient CNNs, Search Strategies, and Prior Knowledge and sensor information]

SLIDE 29

Thank you for your attention!

 kyrkou.christos@ucy.ac.cy
 https://sites.google.com/site/chriskyrkou/
 https://github.com/ckyrkou/EmergencyNet
 https://github.com/gplast/DroNet


NVIDIA Corporation has supported this research with the donation of 2 Titan Xp GPUs. This work has been supported by the European Union's Horizon 2020 research and innovation programme under grant agreement No 739551 (KIOS CoE).