SLIDE 1

Prototyping Vision-Based Classifiers in Constrained Environments

Ted Hromadka¹ and Cameron Hunt²

¹Integrity Applications Incorporated℠, ²SOFWERX (DEFENSEWERX, Inc.)

Presented at GTC 2018

Integrity Applications Incorporated
15020 Conference Center Drive Chantilly, VA 20151 • (703) 378-8672 • www.integrity-apps.com

SLIDE 2

Company Overview

  • Capabilities – image processing / computer vision applications for US government customers
  • Number of Employees – around 700, most with MS/PhD
  • Main locations

– Chantilly, VA
– Dayton, OH
– Carlsbad, CA
– Kihei, HI

[Map of IAI locations (legend: work location, office location, future work location). HQ: Chantilly. Other labeled sites: Dahlgren, PAX River, Charlottesville, Ft. Belvoir, Seattle, New England, Colorado Springs, Ann Arbor, Denver, St. Louis, Dayton, Albuquerque, Las Cruces, DC Area, Valley Forge, Carlsbad, El Segundo, San Diego, LA / So. CA Area]

SLIDE 3

SOFWERX

  • SOFWERX performs collaboration, ideation and facilitation with the best minds of Industry, Academia and Government. SOFWERX can also conduct rapid prototyping and rapid proof of concepts from ideation discovery.

  • Run by DEFENSEWERX (formerly the Doolittle Institute)
  • Located in Tampa, FL

SLIDE 4

Background – requirement to track usage of tank ammunition

  • Commanders asked for an automated means of tracking and reporting the firing of the Abrams main gun

– Location
– Timestamp
– Type of ammunition used

  • Various other means of tracking the ammunition were unacceptable due to wear and tear, etc.
  • Result: a computer vision solution

SLIDE 5

Context

  • Loader (1) pulls 120mm round from cabinet (5) and loads it into main breech (3)

Source: unattributed on multiple websites, appears to be scanned pages from a book

SLIDE 6

Concept

  • Vision-based classifier
  • Camera
  • Processor
  • GPS and SATCOM links
  • No impact on tank’s systems
  • Mounted somewhere inside cabin

SLIDE 7

Collecting Training Data

  • Raspberry Pi 2B (900 MHz)
  • 1 GB RAM
  • RPi camera board v2

– 8 MP = 3280x2464

  • 5V USB battery pack (12 hours)
  • Python script to take and write images to SD card as quickly as possible (~1 Hz)

Source: adafruit.com
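
The capture script itself is not shown in the slides. A minimal sketch of such a loop might look like the following. The function names, file naming scheme, and the injected `capture` callable are all assumptions made for illustration; on the RPi the callable would wrap whatever camera driver is in use, and here a placeholder stands in so the sketch runs anywhere.

```python
import os
import tempfile
import time
from typing import Callable, List


def capture_loop(capture: Callable[[], bytes], out_dir: str,
                 n_frames: int, min_period_s: float = 0.0) -> List[str]:
    """Grab n_frames images and write each one to out_dir as fast as allowed."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for i in range(n_frames):
        start = time.monotonic()
        data = capture()  # hypothetical hook: wraps the camera on real hardware
        # Frame index plus wall-clock milliseconds keeps names unique and sortable.
        name = "img_{:06d}_{}.jpg".format(i, int(time.time() * 1000))
        path = os.path.join(out_dir, name)
        with open(path, "wb") as f:
            f.write(data)
        written.append(path)
        # Sleep only if this iteration finished faster than the target period.
        remaining = min_period_s - (time.monotonic() - start)
        if remaining > 0:
            time.sleep(remaining)
    return written


def fake_capture() -> bytes:
    """Stand-in for the camera: returns a tiny placeholder payload."""
    return b"\xff\xd8placeholder\xff\xd9"


demo_dir = tempfile.mkdtemp()
paths = capture_loop(fake_capture, demo_dir, n_frames=3)
print(len(paths), "files written to", demo_dir)
```

At roughly 1 Hz, a 12-hour battery run would produce on the order of 43,200 images, so SD card capacity is worth checking before a collection day.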

SLIDE 8

Collecting Data - RPi

SLIDE 9

Collecting Data – static photos

  • Compact Nikon digital camera
  • Resolution 4610 x 3460
  • Slightly over 1000 photos per class
  • Wide range of background scenes

SLIDE 10

Collecting Data

  • Day 2: added GoPro to tank commander’s GPS extension eyepiece
  • HD video can be matched to RPi quality in post-processing

SLIDE 11

Early network

  • Initial comparison runs of Caffe and TensorFlow on stock GoogLeNet (Inception v1)

– Caffe trained using DIGITS software; TF trained using Python
– Remainder of this talk will only discuss TF

  • Initially treated as Image Classification

– 4 classes
– No need to label bounding boxes
– Runs faster than object detection
– We never have more than one object in scene

  • Trained on a DevBox-1 (4x TITAN X)

SLIDE 12

Why use old version of GoogLeNet?

Network         MAC (million)   Parameters (million)
Inception v1     1550             6.8
Inception v2     3800            (?)
Inception v3     5000            23
VGG 16          15300           138
ResNet-50        3900            25.5
AlexNet           720            60

SLIDE 13

Early results (sanity check)

  • Model was confidently wrong
  • Averaged results of 25% mini-batches:

Rows = truth, columns = predicted (blank cells in the original shown as 0):

TRUTH     M829A1   M830   M830A1   M1028   TOTAL   ACC %
M829A1       270      0        0       0     270   “100%”
M830         265      0        4       0     269   0%
M830A1       267      0        3       0     270   1%
M1028          0      0        0     270     270   100%
TOTAL        802      0        7     270    1079
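
The per-class figures can be checked with a few lines of Python. This is a sketch under one assumption: the flattened table does not say which column the 4 and 3 counts belong to, so they are placed in the M830A1 column here because that placement reproduces the reported 0% and 1% accuracies and the column totals.

```python
CLASSES = ["M829A1", "M830", "M830A1", "M1028"]

# Rows are truth, columns are predictions, both in CLASSES order.
# Assumption: the 4 and 3 counts sit in the M830A1 column; blanks are 0.
CONFUSION = [
    [270, 0, 0, 0],
    [265, 0, 4, 0],
    [267, 0, 3, 0],
    [0, 0, 0, 270],
]


def per_class_accuracy(matrix):
    """Fraction of each truth row that landed on the diagonal."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


def overall_accuracy(matrix):
    """Correct predictions divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def predicted_share(matrix):
    """Fraction of all predictions assigned to each class (column sums)."""
    total = sum(sum(row) for row in matrix)
    return [sum(row[j] for row in matrix) / total for j in range(len(matrix))]


per_class = per_class_accuracy(CONFUSION)
overall = overall_accuracy(CONFUSION)
share = predicted_share(CONFUSION)
for name, acc, frac in zip(CLASSES, per_class, share):
    print("{:8s} accuracy={:.3f} share_of_predictions={:.3f}".format(name, acc, frac))
print("overall accuracy = {:.3f}".format(overall))
```

The column shares explain the quotation marks around the first “100%”: roughly three quarters of all predictions are M829A1, so that class scores perfectly only because the model predicts it for almost every non-M1028 input.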

SLIDE 14

Augmented training data

  • CATALYST tool

– Noise background
– Transparent on top of “tank scene” background
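
CATALYST's internals are not described in the slides. The second option above, placing a transparent-background object over a scene, is in general terms alpha compositing, which can be sketched as follows. This is an illustration of the generic technique, not the tool's actual method; the function names and the nested-list image representation are assumptions chosen to keep the example dependency-free.

```python
def composite_pixel(fg_rgba, bg_rgb):
    """Blend one RGBA foreground pixel over one RGB background pixel."""
    r, g, b, a = fg_rgba
    alpha = a / 255.0  # 0.0 = fully transparent, 1.0 = fully opaque
    return tuple(
        int(round(alpha * f + (1.0 - alpha) * bg))
        for f, bg in zip((r, g, b), bg_rgb)
    )


def composite(fg, bg):
    """Blend a foreground image (rows of RGBA tuples) over a same-sized background."""
    return [
        [composite_pixel(f, b) for f, b in zip(fg_row, bg_row)]
        for fg_row, bg_row in zip(fg, bg)
    ]


# Toy 1x3 image: opaque, half-transparent, and fully transparent object pixels.
foreground = [[(200, 200, 200, 255), (200, 200, 200, 128), (200, 200, 200, 0)]]
background = [[(100, 100, 100), (100, 100, 100), (100, 100, 100)]]
result = composite(foreground, background)
print(result)
```

Repeating this with many different background scenes and object placements is one way a few thousand source photos per class could be expanded into a larger training set.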

SLIDE 15

Re-training baseline model

  • Still treating as image classification
  • ~10,000 images per class
  • Switched from DIGITS to manual

SLIDE 16

Misclassified images

  • No longer deciding that everything is an M829A1
  • Mistakes now due to orientation, possibly also due to shadowing

SLIDE 17

Better results

  • 99% accuracy on synthetic imagery, 76% on “action shots”

– Need to incorporate real imagery in next model

  • Good enough to switch focus to deployment on Raspberry Pi
  • To build TF on RPi, relied heavily on the excellent guide at:

https://github.com/samjabrahams/tensorflow-on-raspberry-pi/blob/master/GUIDE.md

  • Makefile needed for RPi can be found at:

https://github.com/tensorflow/tensorflow/blob/r1.6/tensorflow/contrib/makefile/tf_op_files.txt

SLIDE 18

RPi struggled to keep up

  • Need to catch a specific 3s critical window over many hours of movement in scene
  • Evaluated several approaches

– Frame grabs

  • High accuracy, low false positives, but too slow (1/4 fps)

– Darknet/YOLO video

  • Could not run it usefully on RPi

– Possibility of hardware trigger from cabinet door opening: discarded due to complexity
– Just sending imagery to server for processing there
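
A quick calculation shows why 1/4 fps is too slow for a 3 s window. The sketch below assumes evenly spaced frames and that a single frame inside the window would be enough to classify the round; both are simplifications, and the function name is illustrative. The frame rates are the ones reported elsewhere in this talk.

```python
import math


def frames_in_window(window_s, fps):
    """Return (expected, worst-case) frame counts inside a window.

    With evenly spaced frames, a window of length window_s contains either
    floor(window_s * fps) or ceil(window_s * fps) frames depending on how the
    window lines up with the frame clock, so the floor is the worst case.
    """
    expected = window_s * fps
    return expected, math.floor(expected)


WINDOW_S = 3.0
RATES = [
    ("frame grabs on RPi", 0.25),
    ("pruned model on RPi", 0.5),
    ("small MobileNet on RPi", 1.0),
    ("YOLO on Jetson TX2", 24.0),
]
for label, fps in RATES:
    expected, worst = frames_in_window(WINDOW_S, fps)
    print("{:24s} expected={:5.2f} worst-case={}".format(label, expected, worst))
```

At 1/4 fps the expected count is below one frame, so the critical window can pass without a single image being processed.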

SLIDE 20

TF model_pruning

  • Attempted to simplify network down to an RPi level
  • Exploit sparsity of large model
  • TensorFlow model_pruning

– Threshold & mask
– Prune, train(100), repeat

  • pb reduced from 87.4 MB to 22.4 MB
  • Sacrifice ~3% model accuracy for ~60% speedup
  • Still only getting ~1/2 fps on RPi

https://www.tensorflow.org/versions/master/api_docs/python/tf/contrib/model_pruning/Pruning
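
The "threshold & mask" and "prune, train, repeat" steps can be illustrated with plain Python lists. This is a sketch of generic magnitude pruning, not the TensorFlow model_pruning API; the function names, the toy weights, and the sparsity schedule are made up for the example, and the retraining step is only marked with a comment.

```python
def magnitude_threshold(weights, sparsity):
    """Largest magnitude that still gets pruned at the target sparsity (0..1)."""
    ranked = sorted(abs(w) for w in weights)
    n_prune = int(len(ranked) * sparsity)
    if n_prune == 0:
        return -1.0  # below every magnitude, so nothing is pruned
    return ranked[n_prune - 1]


def prune_step(weights, mask, sparsity):
    """Apply the current mask, then tighten it to the new target sparsity."""
    live = [w * m for w, m in zip(weights, mask)]
    threshold = magnitude_threshold(live, sparsity)
    # Keep only weights strictly above the threshold; ties are pruned.
    new_mask = [1 if abs(w) > threshold else 0 for w in live]
    return [w * m for w, m in zip(live, new_mask)], new_mask


weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2, -0.03, 0.6]
mask = [1] * len(weights)
for target in (0.25, 0.5):  # hypothetical gradual sparsity schedule
    weights, mask = prune_step(weights, mask, target)
    # A real pipeline would retrain here (e.g. 100 steps) with the mask held
    # fixed so the surviving weights can compensate before the next cut.
    print("target sparsity", target, "-> mask", mask)
```

The mask only zeroes weights; the saving on disk and at inference time depends on how the sparse result is stored and executed.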

SLIDE 21

MobileNets

  • Very different approach
  • Small-dense models vs large-sparse [pruned] model (same number of calcs)
  • Depthwise-separable convolutions: a depthwise convolution followed by a 1x1 pointwise convolution
  • ≈ 1/8 the MAC of a regular convolution (for 3x3 kernels)
  • Depending on settings for W and resolution, pb size ranged from 16.7 MB down to 1.9 MB (!)

  • Peak accuracy was still around 75%

https://arxiv.org/pdf/1704.04861.pdf
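
The 1/8 figure follows from the cost formulas in the MobileNets paper linked above: a standard convolution costs K·K·M·N·F·F multiply-accumulates, while the depthwise-separable version costs K·K·M·F·F + M·N·F·F, giving a ratio of 1/N + 1/K². The sketch below evaluates this for one layer; the layer dimensions are illustrative, not taken from the talk.

```python
def standard_conv_macs(k, c_in, c_out, f):
    """MACs for a k x k standard convolution on an f x f feature map."""
    return k * k * c_in * c_out * f * f


def separable_conv_macs(k, c_in, c_out, f):
    """MACs for a k x k depthwise convolution plus a 1x1 pointwise convolution."""
    depthwise = k * k * c_in * f * f
    pointwise = c_in * c_out * f * f
    return depthwise + pointwise


# Illustrative layer: 3x3 kernel, 256 -> 256 channels, 14x14 feature map.
K, C_IN, C_OUT, F = 3, 256, 256, 14
ratio = separable_conv_macs(K, C_IN, C_OUT, F) / standard_conv_macs(K, C_IN, C_OUT, F)
print("separable / standard MAC ratio = {:.4f}".format(ratio))
print("closed form 1/N + 1/K^2        = {:.4f}".format(1 / C_OUT + 1 / K ** 2))
```

For 3x3 kernels the 1/K² term dominates, so the saving lands between 8x and 9x for any layer with a reasonably large output channel count.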

SLIDE 22

MobileNets tradeoff space

  • Resolution only affected MAC, not parameter count (size on disk depends on W alone)

[Chart: size on disk (MB) vs. width multiplier W and input resolution]
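
In the MobileNets paper the two knobs act differently: the width multiplier scales channel counts, shrinking both MAC and parameter count roughly quadratically, while the resolution multiplier scales only the feature-map size, shrinking MAC but leaving parameters untouched. The sketch below shows this for a single depthwise-separable layer; the layer dimensions and the helper name are illustrative assumptions.

```python
def separable_layer_cost(k, c_in, c_out, f, alpha=1.0, rho=1.0):
    """Return (MACs, parameters) of one depthwise-separable layer.

    alpha is the width multiplier (scales channels) and rho is the
    resolution multiplier (scales the feature-map side length).
    """
    m = round(alpha * c_in)
    n = round(alpha * c_out)
    side = round(rho * f)
    macs = k * k * m * side * side + m * n * side * side
    params = k * k * m + m * n  # weights only; no dependence on side
    return macs, params


# Illustrative layer: 3x3 kernel, 512 -> 512 channels, 14x14 feature map.
base = separable_layer_cost(3, 512, 512, 14)
narrow = separable_layer_cost(3, 512, 512, 14, alpha=0.5)
low_res = separable_layer_cost(3, 512, 512, 14, rho=128 / 224)

for label, (macs, params) in [("baseline", base),
                              ("alpha=0.5", narrow),
                              ("rho=128/224", low_res)]:
    print("{:12s} MACs={:>10d} params={:>7d}".format(label, macs, params))
```

Since the .pb file size tracks parameter count, this is consistent with model size changing only along the W axis.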

SLIDE 23

MobileNets tradeoff space

  • W made a bigger impact than R
  • (W < 0.5, R < 192) → accuracy fell off quickly

[Chart: accuracy vs. width multiplier W and input resolution R]

SLIDE 24

Latest TF model results on Raspberry Pi 2B

Model                       Accuracy   Fps on RPi 2B
GoogLeNet / Inception v1    0.76       ~1/4
model_prune(GoogLeNet)      0.73       ~1/2
MobileNet-1.0-224           0.75       ~1/2
MobileNet-0.25-224          0.53       ~1
MobileNet-0.25-128          0.33       ~1

  • Frame size = 320x240
  • Possible issues other than CPU processing: camera data bus

SLIDE 25

Jetson TX2

  • GPU hardware + cuDNN + TensorRT 3
  • Conclusion: TX2 is far overpowered for the application requirements

– No latency or processing issues at all
– Darknet/YOLO9K @ 24 fps

  • YOLO accuracy: “pretty good”… anecdotally

SLIDE 26

TensorRT 3

  • Optimization engine for Caffe/TF models running on NVIDIA GPU

– Layer and tensor fusion and elimination of unused layers
– FP16 and INT8 reduced precision calibration
– Target-specific autotuning
– Efficient memory reuse

Source: https://devblogs.nvidia.com/tensorrt-3-faster-tensorflow-inference/
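
To give a feel for what INT8 reduced precision means, here is a sketch of the simplest symmetric scheme: one scale factor per tensor, derived from the largest absolute value. This is not the TensorRT API, and TensorRT's calibration picks the range from sample data rather than a plain maximum; the function names and sample values are assumptions for illustration.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using a single max-abs scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, int(round(v / scale)))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the integer codes."""
    return [q * scale for q in quantized]


activations = [0.5, -1.27, 0.0, 1.0]
codes, scale = quantize_int8(activations)
restored = dequantize(codes, scale)
max_error = max(abs(a - r) for a, r in zip(activations, restored))
print("codes:", codes)
print("scale: {:.4f}, max round-trip error: {:.6f}".format(scale, max_error))
```

Each value is stored in one byte instead of four, at the cost of a rounding error bounded by half the scale; an outlier-sensitive max-abs range is the main weakness that calibration-based approaches try to avoid.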

SLIDE 27

Next steps

  • Taylor criteria ranking… ¼ size, 3x faster, 2%-5% accuracy loss?
  • Sparse MobileNets?
  • fp16, int8, maybe even fixed-point (quantized)?
  • RGB -> YCbCr?
  • Reduced image resolution?
  • TF object detection (not just image classification)

– Updated dataset

  • Draw boundaries on still images by hand using LabelImg
  • CATALYST-generated bounding boxes on the synthetic images
  • Convert to TFRecords

– Optimize for speed/accuracy tradeoff

  • Video again: SSD, F-RCNN… on Jetson
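
For the RGB -> YCbCr question in the list above, the conversion itself is cheap. The sketch below uses the full-range BT.601 coefficients from the JPEG/JFIF convention; whether the talk's authors had this variant in mind is an assumption, and the function name is illustrative.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range YCbCr (JPEG/JFIF convention)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b

    def clamp(x):
        # Round to the nearest integer and keep the result in 0..255.
        return max(0, min(255, int(round(x))))

    return clamp(y), clamp(cb), clamp(cr)


for name, pixel in [("white", (255, 255, 255)),
                    ("black", (0, 0, 0)),
                    ("red", (255, 0, 0))]:
    print(name, "->", rgb_to_ycbcr(*pixel))
```

The Y channel alone carries the brightness detail, so one possible experiment is to feed a network luma only, or luma at full resolution with subsampled chroma, and compare accuracy against the RGB baseline.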

SLIDE 28

Conclusions

  • Visual classification is feasible in daylight conditions

– NIR camera or other night vision needed for dark conditions

  • Pruning reduced network size by almost 4X (87.4 MB to 22.4 MB)
  • RPi 2B could only handle ~1 image/sec, even with extensive compression and optimization

– tf.model_prune = best accuracy
– TF MobileNets = best speed

  • Jetson TX2 exhibited no practical limits in this application

– TensorRT 3