Detecting Wildlife in Uncontrolled Outdoor Video using Convolutional Neural Networks



SLIDE 1

Detecting Wildlife in Uncontrolled Outdoor Video using Convolutional Neural Networks

Connor Bowley*, Alicia Andes+, Susan Ellis-Felege+, Travis Desell*

Department of Computer Science*, Department of Biology+, University of North Dakota

SLIDE 2

Wildlife@Home

  • Citizen science project combining crowdsourcing and volunteer computing
  • Users can examine videos and images and record what happens
  • They can also volunteer their computers to download videos and run algorithms over them
  • There is a web portal to compare results from the users, experts, and computer vision algorithms

SLIDE 3

Wildlife@Home

  • Nest cameras
  • Around 7.8 years of video time gathered over 3 years
    – Over 91,000 videos of Grouse, Interior Least Tern, and Piping Plover
    – A little over 4.5 TB
  • Challenges with dataset
    – Changing weather
    – Changing lighting as day progresses, cloud cover
    – Some species are camouflaged
    – Video quality can be low

SLIDE 4

Crowdsourcing interface through which users can give us information about the video. The biology experts have a similar interface.

SLIDE 5

Convolutional Neural Networks

  • CNNs commonly used for image classification
  • A few types of layers
    – Convolutional (has weights to be trained)
    – Activation
    – Max Pooling
    – Fully Connected
  • Softmax or SVM usually used at the end
  • Local connections, shared weights
  • Learns from labeled training data

http://cs231n.github.io/assets/cnn/cnn.jpeg
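As a small illustration of the softmax stage listed above, the sketch below turns raw class scores from a final layer into probabilities. It is plain Python written for this transcript, not the project's C++/OpenCL code, and the function name `softmax` and the example scores are illustrative assumptions.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for a two-class problem (negative, positive).
probs = softmax([1.0, 3.0])
```

The larger score receives the larger probability, and equal scores split the probability evenly.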

SLIDE 6

Creating Training Data

  • Images of variable sizes
  • Sub-images of size 32x32 used for training
  • Striding process used to get sub-images
  • Careful cropping needed to minimize mislabeled data
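The striding process above can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation: the function name `window_origins` and the stride of 16 pixels are assumptions, since the slides state only the 32x32 sub-image size.

```python
def window_origins(height, width, size=32, stride=16):
    """Return the (top, left) corner of every size x size window that
    fits inside a height x width image when stepping by `stride`."""
    origins = []
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            origins.append((top, left))
    return origins

# Hypothetical 64x96 image: 3 window rows by 5 window columns.
origins = window_origins(64, 96)
```

Each origin identifies one sub-image; a smaller stride yields more, heavily overlapping sub-images, while a stride equal to the window size yields non-overlapping tiles.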

SLIDE 9

Creating Training Data

SLIDE 10

Creating and Training CNN

  • Written in C++ and OpenCL
    – C++ allows distribution via BOINC
    – OpenCL allows execution on most CPUs and GPUs
  • Stochastic gradient descent backpropagation
  • Uses L2 regularization and Nesterov Momentum
  • Weights initialized by normal distribution with mean of 0 and standard deviation of $\sqrt{2/n}$
  • Two way softmax classifier
    – (tern not in frame, tern in frame)

1 http://cs231n.github.io/neural-networks-3/
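The initialization and update rule named on this slide can be sketched as below. This is an illustrative plain-Python version, not the project's C++/OpenCL code. It assumes that n is the number of inputs feeding a neuron and that the standard deviation is the square root of 2/n; the Nesterov update follows the form given in the cs231n notes cited on the slide. The function names, learning rate, momentum, and L2 strength are all hypothetical.

```python
import math
import random

def init_weights(n_inputs, count, rng):
    """Draw `count` weights from a normal distribution with mean 0 and
    standard deviation sqrt(2 / n_inputs) (assumed meaning of n)."""
    std = math.sqrt(2.0 / n_inputs)
    return [rng.gauss(0.0, std) for _ in range(count)]

def nesterov_step(w, v, grad, lr=0.01, mu=0.9, l2=1e-4):
    """One SGD step with L2 regularization and Nesterov momentum.
    Returns the updated weights and velocities as new lists."""
    new_w, new_v = [], []
    for wi, vi, gi in zip(w, v, grad):
        g = gi + l2 * wi                 # L2 term adds l2 * w to the gradient
        v_next = mu * vi - lr * g        # velocity update
        new_w.append(wi - mu * vi + (1.0 + mu) * v_next)  # look-ahead correction
        new_v.append(v_next)
    return new_w, new_v

# Hypothetical example: a 5x5 filter over 3 channels has 75 inputs.
weights = init_weights(75, 10, random.Random(0))
velocity = [0.0] * len(weights)
weights, velocity = nesterov_step(weights, velocity, [0.1] * len(weights))
```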

SLIDE 11

Creating and Training CNN

In total 2068 weights

SLIDE 12

Running the Trained CNN

  • Strided over full images, similar to the method used to create training data
  • A prediction image is created for each frame in the video to create a prediction video
  • A chart is also created plotting how much of each frame is predicted to be of the positive class

SLIDE 13

Running the Trained CNN

  • Each pixel in full image has a “pixel classifier”
    – Softmax output for a sub-image is added into the pixel classifier of each pixel in that sub-image
  • Sub-images may overlap and their outputs are summed into the pixel classifier
  • Pixel color determined using ratio of squares of pixel classifier
    – red is positive class, blue is negative class
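One way to read the pixel classifier described above is sketched below. It is an interpretation, not the authors' code: the slide does not define "ratio of squares" precisely, so the sketch assumes the red share is pos² / (neg² + pos²) and the blue share is the remainder. The names `accumulate` and `pixel_color` are hypothetical.

```python
def accumulate(height, width, windows, size=32):
    """Sum softmax outputs into a per-pixel [neg, pos] accumulator.
    `windows` holds (top, left, p_neg, p_pos) for each sub-image."""
    sums = [[[0.0, 0.0] for _ in range(width)] for _ in range(height)]
    for top, left, p_neg, p_pos in windows:
        for y in range(top, top + size):
            for x in range(left, left + size):
                sums[y][x][0] += p_neg
                sums[y][x][1] += p_pos
    return sums

def pixel_color(neg, pos):
    """Map accumulated scores to (red, green, blue) using the assumed
    ratio of squares; pixels no window covered stay black."""
    total = neg * neg + pos * pos
    if total == 0.0:
        return (0, 0, 0)
    red = round(255 * pos * pos / total)
    return (red, 0, 255 - red)

# Two overlapping 2x2 windows on a hypothetical 2x3 image.
sums = accumulate(2, 3, [(0, 0, 0.2, 0.8), (0, 1, 0.6, 0.4)], size=2)
colors = [[pixel_color(neg, pos) for neg, pos in row] for row in sums]
```

Squaring the sums exaggerates whichever class dominates, so pixels covered by confidently classified sub-images render as strongly red or strongly blue.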

SLIDE 14

Results

  • Initially trained 5 epochs over ~73,000 images from 1 video
  • Ended training with accuracy of 95.6% on training data
  • Run over test set of 280,000 images from 2 other videos with 82% accuracy
    – These images were not created yet during initial training
    – Videos all from same nest, so some background images might have been similar
    – 77% of errors from false positives

SLIDE 15

Results

[Figure panels: Original Image, After Initial Training]

SLIDE 16

Extra Training

  • Misclassification prompted extra training on CNN
  • New training set of approx. 17,000 images
    – 69% negative
    – Mostly of trees and ground stubble
    – Positive examples were reused from original training set

SLIDE 17

[Figure panels: Original Image, After Initial Training, After 2 extra epochs, After 4 extra epochs]

SLIDE 18

Prediction Video

SLIDE 19

Tracking when a tern is in the frame

  • Charts were made tracking how much of the image is made up of red (positive class) pixels
  • Easy to see some trends across whole video
  • Difficult to classify frame by frame
  • Difficult to classify more complex events
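The per-frame quantity charted above can be sketched as the fraction of pixels whose accumulated positive score exceeds the negative score. This is an assumption about how "red" is decided, and the rolling mean is added only to illustrate why whole-video trends are easier to read than single frames; it is not described in the slides. The names `positive_fraction` and `rolling_mean` are hypothetical.

```python
def positive_fraction(sums):
    """Fraction of pixels in one frame where pos > neg.
    `sums` is a list of rows of (neg, pos) pixel classifier totals."""
    pixels = [pixel for row in sums for pixel in row]
    positive = sum(1 for neg, pos in pixels if pos > neg)
    return positive / len(pixels)

def rolling_mean(values, window=3):
    """Average each value with up to window - 1 preceding values."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Hypothetical 2x2 frame: two of the four pixels lean positive.
frame = [[(0.2, 0.8), (0.9, 0.1)], [(0.4, 0.6), (0.7, 0.3)]]
fraction = positive_fraction(frame)
```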
SLIDE 20

Results of Running Trained CNN over Simple Video

SLIDE 21

Results of Running Trained CNN over More Complex Video

SLIDE 22

Improving Performance

  • Many computers have multiple OpenCL capable devices
    – E.g. a CPU and a GPU
  • Runtime performance can be increased by using multiple devices simultaneously
  • Some devices may be faster or slower than others

SLIDE 23

Improving Performance

  • Work stealing approach
  • Copy of CNN on each device
  • Each device requests one frame at a time from the Video manager
  • Once finished, the results are submitted to the Output manager
    – Frames that come out of order are buffered until they are next to be output
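The scheme above, in which devices pull frames one at a time and an output manager restores frame order, can be sketched as follows. Python threads stand in for OpenCL devices here, and the shared queue and min-heap are illustrative choices rather than the authors' data structures. The names `run_devices` and `classify` are hypothetical.

```python
import heapq
import queue
import threading

def run_devices(frames, classify, device_count=2):
    """Process frames on several workers and return results in frame order."""
    work = queue.Queue()              # plays the role of the Video manager
    for index, frame in enumerate(frames):
        work.put((index, frame))

    pending = []                      # min-heap of results that arrived early
    ordered = []                      # results released in frame order
    next_index = [0]                  # index of the next frame to output
    lock = threading.Lock()

    def submit(index, result):        # plays the role of the Output manager
        with lock:
            heapq.heappush(pending, (index, result))
            while pending and pending[0][0] == next_index[0]:
                ordered.append(heapq.heappop(pending)[1])
                next_index[0] += 1

    def device_loop():
        while True:
            try:
                index, frame = work.get_nowait()  # request one frame at a time
            except queue.Empty:
                return
            submit(index, classify(frame))

    threads = [threading.Thread(target=device_loop) for _ in range(device_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return ordered

# Stand-in "classifier" that squares a number instead of running a CNN.
results = run_devices(list(range(20)), lambda frame: frame * frame, device_count=3)
```

Because each worker asks for a new frame only when it finishes the previous one, a faster device naturally processes more frames than a slower one without any explicit scheduling.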

SLIDE 24

SLIDE 25

Performance Results

SLIDE 26

Future Work

  • Get more training data
    – Grouse and Piping Plover
    – Crowdsource creation of training data
  • Full implementation with BOINC for distributed running over entire dataset
  • Larger sizes than 32x32
  • Speed improvements to CNN code since submission warrant testing of larger networks
  • Better algorithms to determine if frames contain wildlife or if it is noise
    – CNN over output?
    – Blob detection on output?

SLIDE 27

Resources

  • Code on Git
    – https://github.com/Connor-Bowley/neuralNetwork
    – Commit 8d95bf087cde7483c4984fc4891778f5280381fc (May 24, 2016)
  • Videos available via Wildlife@Home Data Release
    – http://csgrid.org/csg/wildlife/data_releases.php

SLIDE 28

Acknowledgements

We appreciate the support and dedication of the Wildlife@Home citizen scientists who have spent significant amounts of time watching video. This work has been partially supported by the National Science Foundation under Grant Number 1319700. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. Funds to collect data in the field were provided by the U.S. Geological Survey.

SLIDE 29

Thanks! Questions?

http://csgrid.org/csg/wildlife
connor.bowley7@gmail.com