Deep Neural Network Enhanced VSLAM Landmark Selection – Dr. Patrick Benavidez – PowerPoint PPT Presentation



SLIDE 1

Deep Neural Network Enhanced VSLAM Landmark Selection

  • Dr. Patrick Benavidez

University of Texas at San Antonio - Department of Electrical and Computer Engineering Deep Neural Network Enhanced VSLAM Landmark Selection

SLIDE 2

Overview

1. Introduction
2. Background on methods used in VSLAM
3. Proposed Method
4. Testbed
5. Preliminary Results

SLIDE 3

What is VSLAM?

Visual Simultaneous Localization and Mapping: the use of vision and depth sensors to acquire features from an environment, map them, and navigate with the resulting map.

SLIDE 4

Motivation to use VSLAM

  • Similar to methods used by humans
  • GPS-denied and contested environments
  • Spoofing attacks on GPS
  • Cloud-based robotics
  • Data simplification and organization

SLIDE 5

What processes are involved in VSLAM?

  • Sensors capture properties of the surrounding environment
  • Operations to transform captured environmental data with robot pose data
  • Algorithms to place transformed environmental data into the existing map
  • Methods to determine whether the robot has already visited a particular location
  • Operations to update existing data in the map
  • Loop closure operations to constrain the bounds of the map

SLIDE 6

Typical Scenes for VSLAM

  • Indoors
  • Outdoors

SLIDE 7

VSLAM Mapping Process

  • Feature detectors – find useful, feature-rich points in an image
  • Feature descriptors – describe sets of features
  • Feature matching – match features into the map

SLIDE 8

Feature Detection

  • Corner detectors – Harris, Shi-Tomasi
  • Scale-Invariant Feature Transform (SIFT) – more robust than the Harris corner detector
  • Speeded-Up Robust Features (SURF) – a faster version of SIFT
  • Features from Accelerated Segment Test (FAST) – "fast enough for SLAM"

[http://docs.opencv.org/3.0-beta/doc/py_tutorials/py_feature2d/py_sift_intro/py_sift_intro.html#sift-intro]
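The FAST segment test above can be sketched in a few lines. This is an illustrative simplification, not OpenCV's tuned implementation: a pixel is a corner if enough contiguous pixels on a radius-3 circle around it are all brighter or all darker than the centre by a threshold.

```python
# Simplified FAST-style segment test (illustrative sketch only).
# Standard 16-point Bresenham circle of radius 3 (row, col offsets).
CIRCLE = [(-3, 0), (-3, 1), (-2, 2), (-1, 3), (0, 3), (1, 3), (2, 2), (3, 1),
          (3, 0), (3, -1), (2, -2), (1, -3), (0, -3), (-1, -3), (-2, -2), (-3, -1)]

def is_fast_corner(img, r, c, t=20, n=12):
    """img: 2D list of grayscale values; (r, c): candidate pixel."""
    p = img[r][c]
    # Label each circle pixel: 1 = brighter, -1 = darker, 0 = similar.
    labels = []
    for dr, dc in CIRCLE:
        q = img[r + dr][c + dc]
        labels.append(1 if q > p + t else (-1 if q < p - t else 0))
    # Look for n contiguous identical non-zero labels (circle wraps around,
    # so scan the doubled label sequence).
    doubled = labels + labels
    run, best, prev = 0, 0, 0
    for lab in doubled:
        if lab != 0 and lab == prev:
            run += 1
        else:
            run = 1 if lab != 0 else 0
        prev = lab
        best = max(best, run)
    return best >= n

# A bright dot on a dark background: the centre passes the segment test.
img = [[0] * 9 for _ in range(9)]
img[4][4] = 255
print(is_fast_corner(img, 4, 4))  # True: all 16 circle pixels are darker
```

The real detector adds a fast rejection test on 4 of the 16 pixels first, which is where much of the speed comes from.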

SLIDE 9

Feature Descriptors

  • Feature descriptors describe sets of features
  • Feature descriptors are saved in a database or similar structure
  • Binary Robust Independent Elementary Features (BRIEF)
  • Oriented FAST and Rotated BRIEF (ORB)
    – SIFT and SURF are patented; ORB is free
    – Fusion of the FAST keypoint detector and the BRIEF descriptor, with increased performance

[http://docs.opencv.org/3.0-beta/doc/py_tutorials/py_feature2d/py_orb/py_orb.html#orb]
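The core idea of BRIEF can be sketched directly: compare intensities at a fixed set of point pairs inside the patch, and each comparison yields one bit. This is a toy 32-bit version; real BRIEF uses 128-512 bits over smoothed patches, and ORB adds orientation compensation.

```python
import random

# BRIEF-style binary descriptor (illustrative sketch, not the real BRIEF
# sampling pattern): one bit per intensity comparison.
random.seed(42)  # fixed sampling pattern, shared by all descriptors
PAIRS = [((random.randint(-4, 4), random.randint(-4, 4)),
          (random.randint(-4, 4), random.randint(-4, 4))) for _ in range(32)]

def brief_descriptor(img, r, c):
    """32-bit integer descriptor for the patch centred at (r, c)."""
    bits = 0
    for (dr1, dc1), (dr2, dc2) in PAIRS:
        bits <<= 1
        if img[r + dr1][c + dc1] < img[r + dr2][c + dc2]:
            bits |= 1
    return bits

# A synthetic textured patch; the descriptor is deterministic for a fixed
# sampling pattern.
img = [[(i * 7 + j * 13) % 256 for j in range(20)] for i in range(20)]
d = brief_descriptor(img, 10, 10)
print(f"{d:032b}")
```

Because the result is a plain bit string, similarity between two descriptors reduces to a Hamming distance, which is what makes BRIEF/ORB matching so fast.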

SLIDE 10

Feature Matching

  • Commonly used methods are brute-force and FLANN matching algorithms
  • These methods match feature descriptors of a newly acquired image to those saved in the database
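Brute-force matching of binary descriptors is simple enough to sketch: for each query descriptor, take the database descriptor with the smallest Hamming distance. This is the idea behind OpenCV's BFMatcher with NORM_HAMMING; FLANN instead builds approximate nearest-neighbour indexes for speed.

```python
# Brute-force Hamming matching of binary descriptors (sketch).

def hamming(a, b):
    """Number of differing bits between two integer descriptors."""
    return bin(a ^ b).count("1")

def brute_force_match(query, database):
    """Return (query_index, best_db_index, distance) for each query descriptor."""
    matches = []
    for qi, q in enumerate(query):
        di, d = min(((i, hamming(q, desc)) for i, desc in enumerate(database)),
                    key=lambda t: t[1])
        matches.append((qi, di, d))
    return matches

db = [0b10110010, 0b01001101, 0b11110000]   # descriptors already in the map
new = [0b10110011, 0b11110001]              # near-copies of db[0] and db[2]
print(brute_force_match(new, db))           # [(0, 0, 1), (1, 2, 1)]
```

Production matchers usually add a ratio test (best vs. second-best distance) to reject ambiguous matches before they pollute the map.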

SLIDE 11

Bag of Words

A bag of words in natural language processing is the decomposition of a sentence into its constituent components (words) and the storage of those components in a container (bag). Example [https://en.wikipedia.org/wiki/Bag-of-words_model]:

  • Sentence 1 – John likes to watch movies. Mary likes movies too.
  • Sentence 2 – John also likes to watch football games.
  • Bag of words describing sentences 1 and 2 – ["John", "likes", "to", "watch", "movies", "also", "football", "games", "Mary", "too"]
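The example above can be computed directly: build a vocabulary from both sentences, then represent each sentence as a vector of word counts (vocabulary ordered by first appearance).

```python
# Bag-of-words representation of the two example sentences.

def tokenize(sentence):
    return sentence.lower().replace(".", "").split()

s1 = "John likes to watch movies. Mary likes movies too."
s2 = "John also likes to watch football games."

# Vocabulary: unique words in order of first appearance.
vocab = []
for word in tokenize(s1) + tokenize(s2):
    if word not in vocab:
        vocab.append(word)

def bag_of_words(sentence):
    tokens = tokenize(sentence)
    return [tokens.count(word) for word in vocab]

print(vocab)
# ['john', 'likes', 'to', 'watch', 'movies', 'mary', 'too', 'also', 'football', 'games']
print(bag_of_words(s1))  # [1, 2, 1, 1, 2, 1, 1, 0, 0, 0]
print(bag_of_words(s2))  # [1, 1, 1, 1, 0, 0, 0, 1, 1, 1]
```

Note the count vectors discard word order entirely; only occurrence counts survive, which is exactly the property the visual analogue exploits.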

SLIDE 12

Visual Bag of Words

A visual bag of words breaks an image down into its component regions of interest (ROIs), the "words", and stores them in a collection, the "bag". Labels can be applied to the ROIs in a bag. Example: [https://gilscvblog.com/2013/08/23/bag-of-words-models-for-visual-categorization/]
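In the visual case, descriptors are quantized to their nearest "visual word" (a cluster centre, typically obtained by k-means over training descriptors), and an image becomes a histogram of word counts. A minimal sketch, using toy 2-D descriptors in place of real 32-256-dimensional ones:

```python
# Visual bag-of-words quantization (sketch with toy 2-D descriptors).

def nearest_word(desc, vocabulary):
    """Index of the closest visual word (squared Euclidean distance)."""
    dists = [sum((d - w) ** 2 for d, w in zip(desc, word)) for word in vocabulary]
    return dists.index(min(dists))

def bow_histogram(descriptors, vocabulary):
    """Histogram of visual-word occurrences for one image."""
    hist = [0] * len(vocabulary)
    for desc in descriptors:
        hist[nearest_word(desc, vocabulary)] += 1
    return hist

vocabulary = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]           # 3 visual words
image_descs = [(0.5, 0.2), (9.0, 1.0), (0.1, 9.5), (0.3, 0.1)]
print(bow_histogram(image_descs, vocabulary))  # [2, 1, 1]
```

The histogram, not the raw descriptors, is what gets compared between images, which makes the representation compact and order-independent, just like its NLP counterpart.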

SLIDE 13

Application of Visual Bag of Words – Scene Identification

  • A visual bag of words (collection of known images) is created to describe particular components of each scene
  • Features taken from the latest camera image are compared to those in the visual bag of words for each scene
  • A collection of the most relevant words (images) in each bag matching the input image is generated
  • The bag most closely matching the current image identifies the scene
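The scene-identification steps above reduce to scoring the query histogram against each scene's bag and taking the best match. A sketch under the assumption that each scene is summarized as one word-count histogram; histogram intersection is one common similarity choice (the scene names and counts are made up for illustration):

```python
# Scene identification by comparing bag-of-words histograms (sketch).

def intersection(h1, h2):
    """Histogram intersection: higher means more shared visual words."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def identify_scene(query_hist, scene_bags):
    """Name of the scene whose bag best matches the query histogram."""
    return max(scene_bags, key=lambda name: intersection(query_hist, scene_bags[name]))

scene_bags = {
    "office":  [9, 1, 0, 4],   # toy word-count histograms per scene
    "hallway": [1, 8, 5, 0],
    "lab":     [3, 3, 3, 3],
}
query = [8, 2, 1, 3]           # looks most like the office
print(identify_scene(query, scene_bags))  # office
```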

SLIDE 14

Application of Visual Bag of Words – Mapping

  • A visual bag of words (collection of unknown images) is created at runtime to describe particular components discovered by a robot
  • Locations where the words have been discovered are entered into the map
  • Features taken from the latest camera image are compared to those in the visual bag of words to determine whether the object has been seen before
  • New objects are added to the bag
  • Objects already in the bag are used to identify where the agent is in the map if it has visited that location before
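The runtime loop above can be sketched with a bag of (descriptor, pose) pairs: an observation that matches a stored descriptor localizes against the stored pose, otherwise the object is added as a new map entry. Binary descriptors and the Hamming threshold are assumptions for illustration.

```python
# Runtime bag-of-words mapping loop (sketch).

def observe(bag, descriptor, current_pose, max_dist=2):
    """bag: list of (descriptor, pose). Returns ('seen', pose) or ('new', pose)."""
    for known, pose in bag:
        if bin(known ^ descriptor).count("1") <= max_dist:  # Hamming distance
            return ("seen", pose)            # localize against the stored pose
    bag.append((descriptor, current_pose))   # first sighting: extend the map
    return ("new", current_pose)

bag = []
print(observe(bag, 0b1010_1100, (0, 0)))   # ('new', (0, 0))
print(observe(bag, 0b0101_0011, (5, 2)))   # ('new', (5, 2))
print(observe(bag, 0b1010_1101, (9, 9)))   # ('seen', (0, 0)) - revisited object
```

The third observation differs from the first by one bit, so it is recognized as a revisit and reports the originally stored pose rather than growing the map.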

SLIDE 15

Problems with existing methods

Features are made too general by design

  • Recall that features are simple components of an image: corners, edges, intersections, etc.
  • Almost every type of object can contain these features
  • Problem: Both static and dynamic objects in the environment are registered in the map in the same context

SLIDE 16

Problems with existing methods (continued)

Objects with freedom to move around the environment

  • Examples: people, animals, robots, mobile carts, etc.
  • Problem: Traditional SLAM/VSLAM will fail to produce meaningful maps with multiple agents working in the same environment

SLIDE 17

Problems with existing methods (continued)

Time varying objects that do not travel around the environment

  • Examples: trees, plants, tracking solar panels, windmills, flags, banners, televisions, digital billboards
  • Problem: Features should not be taken from the dynamic portions of these items
  • Features can instead be acquired from their static components (planted/grounded base, static frame)

[http://www.swri.org/3pubs/ttoday/Summer12/images/IMG0340-250x167.jpg]

SLIDE 18

Problems with existing methods (continued)

Loop closure – revisiting the same place twice produces multiple paths due to odometry errors

SLIDE 19

Problems with existing methods (continued)

2D maps from laser scanners contain low levels of information about the environment without use of a vision sensor

SLIDE 20

Method Overview

  • Deep learning for object classification
  • Association of classified objects to known properties
  • Map classified objects by their properties (static/dynamic)
  • Localize on the event of identifying clusters of adjoining objects (preferably static object clusters)
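The routing logic of the proposed pipeline can be sketched as follows. The classifier output is assumed to arrive as a label; the property table stands in for the "known properties" database, and all class names here are hypothetical examples from the slides.

```python
# Proposed-method routing sketch: classified objects go either to the
# landmark map (static) or to a temporary-obstacle list (dynamic).

PROPERTIES = {"wall": "static", "door frame": "static", "desk": "static",
              "person": "dynamic", "cart": "dynamic"}

def process_detection(label, position, landmark_map, obstacles):
    """Route one classified object by its known static/dynamic property."""
    if PROPERTIES.get(label) == "static":
        landmark_map.append((label, position))   # usable for localization
    else:
        obstacles.append((label, position))      # transient, not a landmark

landmarks, obstacles = [], []
for label, pos in [("wall", (0, 1)), ("person", (2, 2)), ("desk", (3, 0))]:
    process_detection(label, pos, landmarks, obstacles)
print(landmarks)  # [('wall', (0, 1)), ('desk', (3, 0))]
print(obstacles)  # [('person', (2, 2))]
```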

SLIDE 21

Deep Neural Network

Use deep neural networks to classify objects into known object classes

  • Known object classes can be any of the following examples: desk, chair, wall, door frame, door, UGV, UAV, cup, trash can, wheels, etc.
  • ImageNet currently has 14,197,122 images and 21,841 synsets (synonym sets) indexed [http://www.image-net.org/about-overview]
  • Convolutional Neural Networks (CNNs) will be used for this work

SLIDE 22

Deep Neural Network

SLIDE 23

Deep Neural Network

*source: www.extremetech.com

SLIDE 24

Association of classified objects to known properties

  • Dynamic properties of an object can be either referenced from a database or measured from the environment
  • Measuring an object's dynamic properties from observation would entail detecting movement, with or without perturbation
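"Measured from the environment" can be as simple as frame differencing, sketched below: if enough pixels under an observed object change between two views of the same viewpoint, the object is flagged as dynamic. The thresholds here are arbitrary illustrative values.

```python
# Dynamic-property measurement by frame differencing (sketch).

def is_dynamic(frame_a, frame_b, threshold=30, min_changed=3):
    """Compare two grayscale frames taken from the same viewpoint."""
    changed = sum(1 for ra, rb in zip(frame_a, frame_b)
                  for pa, pb in zip(ra, rb) if abs(pa - pb) > threshold)
    return changed >= min_changed

still = [[100] * 4 for _ in range(4)]
moved = [row[:] for row in still]
moved[1][1] = moved[1][2] = moved[2][1] = 200   # an object shifted
print(is_dynamic(still, moved))   # True
print(is_dynamic(still, still))   # False
```

A real system would first compensate for the robot's own motion before differencing; otherwise everything appears dynamic.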

SLIDE 25

Mapping Process

  • Identify static/dynamic properties of classified objects
  • Map dynamic objects as temporary obstacles
  • Map static objects as landmark components
  • Identify clusters of landmark components as a landmark
  • Perform loop closure (re-adjustment/alignment of the map) with knowledge of landmark locations
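The cluster-identification step can be sketched as simple distance-threshold grouping of static landmark components (a stand-in for whatever clustering the full system would use; positions and the gap threshold are illustrative):

```python
# Grouping adjacent static landmark components into landmarks (sketch).

def cluster_landmarks(points, max_gap=2.0):
    """Greedy clustering: a point joins the first cluster within max_gap."""
    clusters = []
    for p in points:
        for cluster in clusters:
            if any(((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5 <= max_gap
                   for q in cluster):
                cluster.append(p)
                break
        else:
            clusters.append([p])
    return clusters

static_components = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
print(cluster_landmarks(static_components))
# [[(0, 0), (1, 0), (0, 1)], [(10, 10), (11, 10)]]
```

Each resulting cluster is one candidate landmark; loop closure then only needs to re-align the map against these few cluster locations instead of every raw feature.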

SLIDE 26

Computers for training models

Custom-built machine learning optimized desktop computers

Relevant Specifications

  • EVGA NVIDIA GeForce GTX-1080 video card – 2560 CUDA cores, 8 GB GDDR5X
  • Intel Core i5-6600 3.4 GHz quad-core CPU
  • 32 GB DDR4 RAM
  • 240 GB SSD for the operating system
  • 2 TB HDD for storage

Acquired these today, 4/7/2017 – will set up after this talk

SLIDE 27

Computer for processing input

NVIDIA Jetson TX-1

Relevant Specifications

  • Mobile "supercomputer on a chip"
  • NVIDIA Maxwell architecture – 256 CUDA cores
  • 64-bit quad-core ARM A57 CPU
  • 4 GB LPDDR4 RAM
  • 16 GB eMMC for the operating system
  • SD card slot for storage
  • Multiple high-speed camera connections (USB3, CSI)

SLIDE 28

Robot Hardware

A variety of systems capable of mapping the environment

SLIDE 29

Preliminary Results

  • Use of TensorFlow, Inception V3, and GoogLeNet for detecting various objects and recording their locations in a radial map
  • Convolutional neural network to classify people from a car's perspective
    – 97% accuracy
    – Can be modified to work from the MAV's perspective

SLIDE 30

Thank You

Any questions?
