Object Detection using NVIDIA DIGITS: Customization and Modification


SLIDE 1

Object Detection using NVIDIA DIGITS

Customization and Modification

Deep Learning Institute NVIDIA Corporation

SLIDE 2

SLIDE 3

AGENDA

  • Introduction to Object Detection
  • Detection by Combining Deep Learning with Traditional Computer Vision
  • Detection by Modifying Network Architecture
  • State of the Art Detection

SLIDE 4

Object Detection

Finding a whale face in the ocean.

We want to know IF there are whale faces in aerial images, and if so, where.

SLIDE 5

Brainstorm: How can we use what we know about Image Classification to detect whale faces from aerial images?

Take 2 minutes to think through and write down (paper or computer) ideas.

SLIDE 6

AI at scale

  • Applications that combine trained networks with code can create new capabilities
  • Trained networks play the role of functions
  • Building applications requires writing code to generate expected inputs and useful outputs

Solving novel problems with code

SLIDE 7

Approach 1: Sliding Window

  • Technique:
      • Build a whale face / not whale face classifier
      • A sliding-window Python application runs the classifier on each 256×256 segment
      • Yes = blue, no = red
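
The technique above can be sketched in Python roughly as follows. This is a minimal sketch, not the lab's actual code: `classify` is a hypothetical stand-in for the trained whale face / not whale face classifier, the image is a plain 2D list of pixel values, and a stride equal to the window size (non-overlapping windows) is an assumption.

```python
from typing import Callable, List, Sequence, Tuple

Window = List[Sequence[float]]


def sliding_window(
    image: Sequence[Sequence[float]],
    classify: Callable[[Window], bool],
    size: int = 256,
    stride: int = 256,
) -> List[Tuple[int, int, bool]]:
    """Run `classify` on every size x size window.

    Returns one (top, left, is_whale_face) tuple per window.
    `classify` is a hypothetical placeholder for the trained model.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    results = []
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            # Cut the window out of the image, row by row.
            window = [row[left:left + size] for row in image[top:top + size]]
            results.append((top, left, classify(window)))
    return results
```

Each returned tuple can then be painted blue (yes) or red (no) at its window position to build the detection map.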

SLIDE 8

Your turn – Launching lab

SLIDE 9

Potential Confusion

Despite existing datasets and models, you will begin the lab by loading a new dataset and training a new classification model.

SLIDE 10

CONNECTING TO THE LAB ENVIRONMENT

Lab will take place in a Jupyter notebook

SLIDE 11

JUPYTER NOTEBOOK

1. Make changes in code blocks
2. Press “Shift” + “Enter” simultaneously while the cursor is in a code block

SLIDE 12

NAVIGATING TO QWIKLABS

1. Navigate to: https://nvlabs.qwiklab.com
2. Log in or create a new account

SLIDE 13

ACCESSING LAB ENVIRONMENT

3. Select the event “Fundamentals of Deep Learning” in the upper left
4. Click the “Object Detection with DIGITS” class in the list

SLIDE 14

LAUNCHING THE LAB ENVIRONMENT

5. Click on the Select button to launch the lab environment

  • After a short wait, lab connection information will be shown
  • Please ask Lab Assistants for help!

SLIDE 15

LAUNCHING THE LAB ENVIRONMENT

6. Click on the Start Lab button

SLIDE 16

LAUNCHING THE LAB ENVIRONMENT

You should see that the lab environment is “launching” towards the upper-right corner

SLIDE 17

CONNECTING TO THE LAB ENVIRONMENT

7. Click on “here” to access your lab environment / Jupyter notebook

SLIDE 18

Follow lab instructions through end of Approach 1

SLIDE 19

Discuss: Intro to Network Architecture

SLIDE 20

Approach 1: Sliding Window

  • Works but:
  • Needs human supervision
  • Slow – constrained by image size
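
The speed constraint can be made concrete by counting classifier calls. The sketch below assumes the same square-window scheme as Approach 1. The image dimensions in the usage note are hypothetical and chosen only for the arithmetic.

```python
def count_windows(height: int, width: int, size: int = 256, stride: int = 256) -> int:
    """Number of size x size windows (one classifier call each) in an image."""
    if height < size or width < size:
        return 0
    rows = (height - size) // stride + 1
    cols = (width - size) // stride + 1
    return rows * cols
```

For a hypothetical 1024×1536 image, non-overlapping 256×256 windows need `count_windows(1024, 1536)` = 24 classifier calls, while a finer stride of 32 pixels needs `count_windows(1024, 1536, stride=32)` = 1025 calls. Finer localization or larger images multiply the work accordingly.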

SLIDE 21

Approach 2 – Modifying Network Architecture

  • Layers are mathematical operations on tensors (matrices, vectors, etc.)
  • Layers are combined to describe the architecture of a neural network
  • Modifications to network architecture impact capability and performance
  • Each framework has a different syntax for describing architectures
  • Regardless of framework: the output of each layer must fit the input of the next layer
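
The rule that each layer's output must fit the next layer's input can be checked with simple shape arithmetic. The sketch below assumes the standard convolution output-size formula, floor((n + 2·pad − kernel) / stride) + 1. The function names are illustrative, and the layer sizes in the usage note are AlexNet-like values chosen for illustration rather than taken from the slides.

```python
from typing import Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)


def conv_output_shape(in_shape: Shape, num_output: int, kernel_size: int,
                      stride: int = 1, pad: int = 0) -> Shape:
    """Shape produced by a convolution layer for a given input shape."""
    _, height, width = in_shape
    out_h = (height + 2 * pad - kernel_size) // stride + 1
    out_w = (width + 2 * pad - kernel_size) // stride + 1
    if out_h < 1 or out_w < 1:
        raise ValueError("kernel does not fit the input")
    return (num_output, out_h, out_w)


def inner_product_inputs(in_shape: Shape) -> int:
    """A fully connected layer flattens its input, so it expects exactly this many values."""
    channels, height, width = in_shape
    return channels * height * width
```

For example, `conv_output_shape((3, 227, 227), 96, 11, stride=4)` gives `(96, 55, 55)`, and a fully connected layer fed a `(256, 6, 6)` feature map expects `inner_product_inputs((256, 6, 6))` = 9216 inputs. A feature map of any other size would not fit that layer.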

SLIDE 22

Our current architecture

FRAMEWORK: We’ve been working in a framework called Caffe. Each framework requires a different way (syntax) of describing architectures and hyperparameters. Other frameworks include TensorFlow, MXNet, etc.

TOOL - UI: We’ve been working with a UI called DIGITS. The community works to make model building and deployment easier. Other tools include Keras, TensorBoard, or APIs with common programming languages.

NETWORK: We’ve been working with a network called AlexNet. Each network can be described and trained using ANY framework. Different networks learn differently: different training rates, methods, etc. Think different learners.

SLIDE 23

CAFFE FEATURES

Protobuf model format

  • Strongly typed format
  • Human readable
  • Auto-generates and checks Caffe code
  • Developed by Google

  • Used to define network architecture and training parameters

  • No coding required!

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 16
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
  }
}

Deep Learning model definition
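
To make the `convolution_param` fields of the `conv1` definition concrete, the following Python sketch counts the parameters such a layer would learn. It is an illustration only: the 3-channel input is an assumption (the slide does not state the shape of `data`), and the count assumes one bias per output channel.

```python
def conv_param_count(in_channels: int, num_output: int, kernel_size: int,
                     bias: bool = True) -> int:
    """Learnable parameters in a square-kernel convolution layer."""
    weights = num_output * in_channels * kernel_size * kernel_size
    biases = num_output if bias else 0
    return weights + biases
```

With `num_output: 16` and `kernel_size: 3` on an assumed 3-channel input, `conv_param_count(3, 16, 3)` is 16 × 3 × 3 × 3 weights plus 16 biases, i.e. 448 parameters.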

SLIDE 24

Image Classification Network (CNN)

Application components:

  • Task objective: e.g. identify a face
  • Training data: 10-100M images
  • Network architecture: ~10s-100s of layers, 1B parameters
  • Learning algorithm: ~30 exaflops, 1-30 GPU days

Figure labels: Input, Raw data, Low-level features, Mid-level features, High-level features, Result

SLIDE 25

APPROACH 2 – Network Modification

  • Modify AlexNet by using Caffe in DIGITS
  • Replace layers by reading carefully

SLIDE 26

RETURN TO THE LAB

  • Work through the end
  • We will debrief “Approach 3” post-lab
  • Ask for help if needed
  • If at any point you get stuck, seek out solutions

SLIDE 27

Work through end of lab

SLIDE 28

Approach 3: End-to-End Solution

Need dataset with inputs and corresponding (often complex) output
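
As an illustration of what a "complex output" can look like, the sketch below reads bounding boxes from KITTI-style label text, where each line describes one object and fields 5 to 8 hold the box as left, top, right, bottom pixel coordinates. That the lab's dataset uses this layout is an assumption, and the `whale_face` class name in the usage note is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BoundingBox:
    label: str
    left: float
    top: float
    right: float
    bottom: float


def parse_kitti_labels(text: str) -> List[BoundingBox]:
    """Extract one BoundingBox per non-empty KITTI-style label line."""
    boxes = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        left, top, right, bottom = (float(v) for v in fields[4:8])
        boxes.append(BoundingBox(fields[0], left, top, right, bottom))
    return boxes
```

A label line such as `whale_face 0.0 0 0.0 100.0 120.0 200.0 220.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0` would yield one box spanning (100, 120) to (200, 220). Unlike a single class label, each training image can carry any number of such boxes.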

SLIDE 29

Approach 3 – End to end solution

  • High-performing neural network architectures require experimentation
  • You can benefit from the work of the community through the model zoo of each framework
  • Implementing a new network requires an understanding of data and training expectations
  • Find projects similar to your project as starting points

SLIDE 30

Approach 3: End-to-End Solution

  • DetectNet:
      • Architecture designed for detecting anything
      • Dataset is whale-face specific
      • DetectNet is efficient and accurate

SLIDE 31

ADDITIONAL APPROACHES TO OBJECT DETECTION ARCHITECTURE

  • R-CNN = Region CNN
  • Fast R-CNN
  • Faster R-CNN: Region Proposal Network
  • RoI-Pooling = Region of Interest Pooling
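
RoI pooling turns a region of any size into a fixed-size grid, so that differently sized region proposals can feed the same downstream layers. The following is a simplified single-channel sketch in plain Python. Real implementations work on multi-channel tensors and first scale region coordinates from image space to feature-map space, and the function name here is illustrative.

```python
from typing import List, Sequence


def roi_max_pool(feature_map: Sequence[Sequence[float]],
                 top: int, left: int, bottom: int, right: int,
                 out_h: int, out_w: int) -> List[List[float]]:
    """Max-pool feature_map[top:bottom][left:right] into an out_h x out_w grid."""
    region_h, region_w = bottom - top, right - left
    if region_h < 1 or region_w < 1:
        raise ValueError("region must contain at least one cell")
    pooled = []
    for i in range(out_h):
        # Bin i covers rows floor(i*h/out_h) .. ceil((i+1)*h/out_h) of the region.
        r0 = top + (i * region_h) // out_h
        r1 = top + ((i + 1) * region_h + out_h - 1) // out_h
        row = []
        for j in range(out_w):
            c0 = left + (j * region_w) // out_w
            c1 = left + ((j + 1) * region_w + out_w - 1) // out_w
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled
```

Whatever the region size, the output always has `out_h` rows and `out_w` columns. When the region does not divide evenly, neighbouring bins overlap slightly.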

SLIDE 32

Closing thoughts – Creating new functionality

  • Approach 1: Combining DL with programming
      • Scaling models programmatically to create new functionality
  • Approach 2: Experiment with network architecture
      • Study the math of neural networks to create new functionality
  • Approach 3: Identify similar solutions
      • Study existing solutions to implement new functionality

SLIDE 33

March 26-29, 2018 | Silicon Valley | #GTC18

www.gputechconf.com

Enjoy the world’s most important event for GPU developers March 26-29, 2018 in Silicon Valley

INNOVATE

Hear about disruptive innovations from startups

DISCOVER

See how GPUs are creating amazing breakthroughs in important fields such as deep learning and AI

CONNECT

Connect with technology experts from NVIDIA and other leading organizations

LEARN

Gain insight and valuable hands-on training through hundreds of sessions and research posters

SLIDE 34

www.nvidia.com/dli