

SLIDE 1

Deep Neural Networks II

Sen Wang
UDRC Co-I – WP3.1 and WP3.2
Assistant Professor in Robotics and Autonomous Systems
Institute of Signals, Sensors and Systems, Heriot-Watt University

UDRC-EURASIP Summer School, 26th June 2019, Edinburgh

Slides adapted from Andrej Karpathy and Kaiming He

SLIDE 2

Outline

Learning features for machines to solve problems

  • Convolutional Neural Networks (CNNs)
  • Deep Learning Architectures (focus on CNNs) - learning features
  • Some Deep Learning Applications - problems
    • Object detection (image, radar, sonar)
    • Semantic segmentation
    • Visual odometry
    • 3D reconstruction
    • Semantic mapping
    • Robot navigation
    • Manipulation and grasping
    • …


SLIDE 3

Deep Learning: a learning technique combining layers of neural networks to automatically identify features that are relevant to the problem being solved.

[Figure: supervised deep learning pipeline. Training: big data and labels are passed forward through the network, and the prediction error against the labels is propagated backward. Testing: test data is passed forward through the trained DNN to produce predictions. The learned representation progresses from raw data through low-level and middle-level features to high-level features.]

SLIDE 4

Deep Learning in Robotics

[Slide shows deep-learning-in-robotics papers from IJRR 2016 and IJCV 2018.]

ICRA 2018 (~2,500 submissions): deep learning was the most popular keyword.

SLIDE 5

Deep Learning in Robotics


SLIDE 6

Convolutional Neural Networks (CNNs)


SLIDE 7

From MLPs to CNNs

  • Feed-forward Neural Networks or Multi-Layer Perceptrons (MLPs)
    • many multiplications
  • CNNs are similar to feed-forward neural networks
    • but use convolution instead of general matrix multiplication (sketched below)
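A minimal NumPy sketch of that difference (toy 1-D data; sizes and kernel values are illustrative, not from the slides): a fully-connected layer multiplies the whole input by a dense weight matrix, while a convolution reuses one small kernel at every position.

    import numpy as np

    x = np.random.randn(10)          # toy 1-D input
    W = np.random.randn(10, 10)      # dense MLP layer: 100 weights, one big matmul
    dense_out = W @ x

    k = np.array([1.0, 0.0, -1.0])   # one shared 3-tap kernel: only 3 weights
    conv_out = np.array([k @ x[i:i + 3] for i in range(len(x) - 2)])
    print(dense_out.shape, conv_out.shape)   # (10,) (8,)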


SLIDE 8


CNNs

  • 3 main types of layers:
    • convolutional layer
    • activation layer
    • pooling layer
  • repeat many times

input layer → convolutional layer → activation layer → pooling layer → … → fully-connected layer
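A minimal sketch of this pattern, assuming PyTorch (not named on the slides; channel counts and kernel sizes are illustrative):

    import torch
    import torch.nn as nn

    # [conv -> activation -> pool] repeated, then a fully-connected classifier.
    cnn = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1),   # convolutional layer
        nn.ReLU(),                                    # activation layer
        nn.MaxPool2d(2),                              # pooling layer
        nn.Conv2d(16, 32, kernel_size=3, padding=1),  # ...repeat...
        nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Linear(32 * 8 * 8, 10),                    # fully-connected output
    )
    print(cnn(torch.randn(1, 3, 32, 32)).shape)       # torch.Size([1, 10])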


SLIDE 9

CNNs: Convolution Layer

[Figure: a 32x32x3 image (width 32, height 32, depth 3) and a 5x5x3 filter.]

Convolve the filter with the image, i.e. "slide over the image spatially, computing dot products".

Slides courtesy of Andrej Karpathy

SLIDE 10

CNNs: Convolution Layer

[Figure: the 32x32x3 image and a 5x5x3 filter.]

Convolve the filter with the image, i.e. "slide over the image spatially, computing dot products". Filters always extend the full depth of the input volume.

SLIDE 11

CNNs: Convolution Layer

[Figure: the 5x5x3 filter placed on one 5x5x3 chunk of the 32x32x3 image.]

1 number: the result of taking a dot product between the filter and a small 5x5x3 chunk of the image (i.e. a 5*5*3 = 75-dimensional dot product + bias).

2 important ideas (see the sketch below):

  • local connectivity
  • parameter sharing
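A minimal NumPy sketch of both ideas at one filter location (random data as a stand-in for the image):

    import numpy as np

    image = np.random.randn(32, 32, 3)   # H x W x C input volume
    filt  = np.random.randn(5, 5, 3)     # one filter, spanning the full depth
    bias  = 0.1

    # Local connectivity: the output depends only on one 5x5x3 chunk.
    patch = image[0:5, 0:5, :]
    value = np.sum(patch * filt) + bias  # 5*5*3 = 75-dim dot product + bias
    print(value)                         # 1 number in the activation map
    # Parameter sharing: the SAME filt (and bias) is reused at every location.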


SLIDE 12

CNNs: Convolution Layer

[Figure: convolving (sliding) the 5x5x3 filter over all spatial locations of the 32x32x3 image yields one 28x28 activation map.]

SLIDE 13

CNNs: Convolution Layer

[Figure: a second (green) 5x5x3 filter produces a second 28x28 activation map.]

SLIDE 14

CNNs: Convolution Layer

For example, if we had six 5x5 filters, we would get 6 separate activation maps. We stack these up to get a "new image" of size 28x28x6.

SLIDE 15

CNNs: Convolution Layer

We processed a [32x32x3] volume into a [28x28x6] volume. Q: how many parameters would this be if we used a fully-connected layer instead?

courtesy of Andrej Karpathy

SLIDE 16

CNNs: Convolution Layer

Q: how many parameters would a fully-connected layer need? A: (32*32*3) * (28*28*6) = 14.5M parameters, ~14.5M multiplies.

SLIDE 17

CNNs: Convolution Layer

Q: how many parameters does the convolutional layer use instead?

SLIDE 18

CNNs: Convolution Layer

Q: how many parameters does the convolutional layer use instead, and how many multiplies? A: (5*5*3) * 6 = 450 parameters.

SLIDE 19

CNNs: Convolution Layer

A: (5*5*3) * 6 = 450 parameters and (5*5*3) * (28*28*6) ≈ 350K multiplies.

2 merits (checked in the sketch below):

  • vastly reduces the number of parameters
  • more efficient computation
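The counts from these slides as runnable arithmetic:

    # Mapping a [32x32x3] volume to a [28x28x6] volume.
    fc_params   = (32*32*3) * (28*28*6)   # fully connected: 14,450,688 ≈ 14.5M
    conv_params = (5*5*3) * 6             # six shared 5x5x3 filters: 450 (+ 6 biases)
    conv_mults  = (5*5*3) * (28*28*6)     # 352,800 ≈ 350K multiplies
    print(fc_params, conv_params, conv_mults)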


SLIDE 20

CNNs: Activation Layer

  • 3 main types of layers:
    • convolutional layer
    • activation layer
    • pooling layer

input layer → convolutional layer → activation layer → pooling layer → … → fully-connected layer

SLIDE 21

CNNs: Pooling Layer

  • 3 main types of layers:
    • convolutional layer
    • activation layer
    • pooling layer
  • repeat many times

input layer → convolutional layer → activation layer → pooling layer → … → fully-connected layer

Pooling makes the representations smaller and more manageable, as in the sketch below.
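A minimal NumPy sketch of 2x2 max pooling with stride 2 (the particular pooling variant is an assumption; the slide does not fix one):

    import numpy as np

    def max_pool_2x2(a):
        """2x2 max pooling, stride 2, on a single-channel feature map."""
        h, w = a.shape
        return a.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

    fmap = np.random.randn(28, 28)
    print(max_pool_2x2(fmap).shape)   # (14, 14): smaller, more manageable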


SLIDE 22

CNNs: A sequence of Convolutional Layers

[Figure: 32x32x3 input → CONV + ReLU (six 5x5x3 filters) → 28x28x6 → CONV + ReLU (ten 5x5x6 filters) → 24x24x10 → CONV + ReLU → …]
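A sketch of this exact stack, assuming PyTorch (not named on the slides):

    import torch
    import torch.nn as nn

    stack = nn.Sequential(
        nn.Conv2d(3, 6, kernel_size=5),    # six 5x5x3 filters
        nn.ReLU(),
        nn.Conv2d(6, 10, kernel_size=5),   # ten 5x5x6 filters
        nn.ReLU(),
    )
    x = torch.randn(1, 3, 32, 32)          # PyTorch uses NCHW layout
    print(stack(x).shape)                  # torch.Size([1, 10, 24, 24])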


SLIDE 23

Deep Learning Architectures


SLIDE 24

Hand-Crafted Features by Humans

[Figure: pipeline from pervasive data (time-series data, vision, point clouds) through hand-crafted feature extraction to inference: objects, structure, locations, scene types, semantics, activities, context, …]

SLIDE 25

Feature Engineering and Representation

[Figure: pervasive data (time-series data, vision, point clouds). Raw data ≈ bad representation: an 800x600 RGB image, for instance, is one of 256^(3x800x600) possible inputs, and a point cloud is similarly high-dimensional ("2?x?x?" on the slide).]

SLIDE 26

Deep Learning: Representation Learning

End-to-End Learning: automatically learn an effective feature representation to solve the problem.

[Figure: pervasive data (time-series data, vision, point clouds) mapped end-to-end to inference outputs: locations, scene types, activities, context, structure, semantics, …]

SLIDE 27

LeNet - 1998

  • Convolution:
    • locally connected
    • spatial weight sharing
    • weight sharing is a key idea in deep learning
  • Subsampling
  • Fully-connected outputs

“Gradient-based learning applied to document recognition”, LeCun et al. 1998

Foundation of modern ConvNets!
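A LeNet-style sketch, assuming PyTorch; details deliberately differ from the 1998 paper (ReLU and max pooling stand in for the original nonlinearities and average subsampling):

    import torch
    import torch.nn as nn

    lenet = nn.Sequential(
        nn.Conv2d(1, 6, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),    # conv + subsample
        nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),   # conv + subsample
        nn.Flatten(),
        nn.Linear(16 * 5 * 5, 120), nn.ReLU(),                         # fully-connected
        nn.Linear(120, 84), nn.ReLU(),
        nn.Linear(84, 10),                                             # 10 digit classes
    )
    print(lenet(torch.randn(1, 1, 32, 32)).shape)   # torch.Size([1, 10])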

SLIDE 28

AlexNet - 2012

8 layers: 5 convolutional (with max-pooling) + 3 fully-connected. A LeNet-style backbone, plus:

  • ReLU
    • accelerates training
    • better gradient propagation (vs. tanh)
  • Dropout
    • reduces overfitting
  • Data augmentation
    • image transformations
    • reduces overfitting

“ImageNet Classification with Deep Convolutional Neural Networks”, Krizhevsky, Sutskever, Hinton. NIPS 2012
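A sketch of two of these ingredients in an AlexNet-style classifier head, assuming PyTorch (not the full 8-layer network; data augmentation lives in the input pipeline, not the model):

    import torch
    import torch.nn as nn

    classifier = nn.Sequential(
        nn.Dropout(p=0.5),        # randomly zero activations to reduce overfitting
        nn.Linear(4096, 4096),
        nn.ReLU(),                # cheap nonlinearity with better gradprop than tanh
        nn.Dropout(p=0.5),
        nn.Linear(4096, 1000),    # 1000 ImageNet classes
    )
    print(classifier(torch.randn(1, 4096)).shape)   # torch.Size([1, 1000])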

SLIDE 29

VGG16/19 - 2014

Very deep ConvNet with a modularized design:

  • 3x3 conv as the module
  • stack the same module
  • same computation for each module

Stage-wise training:

  • VGG-11 => VGG-13 => VGG-16

“Very Deep Convolutional Networks for Large-Scale Image Recognition”, Simonyan & Zisserman. arXiv 2014 (ICLR 2015)
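A sketch of the modular design, assuming PyTorch; vgg_stage is a hypothetical helper and the channel counts are illustrative:

    import torch
    import torch.nn as nn

    def vgg_stage(in_ch, out_ch, n_convs):
        """One VGG stage: n_convs 3x3 convs, then halve the resolution."""
        layers = []
        for i in range(n_convs):
            layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                       nn.ReLU()]
        layers.append(nn.MaxPool2d(2))
        return nn.Sequential(*layers)

    stage = vgg_stage(64, 128, n_convs=2)
    print(stage(torch.randn(1, 64, 56, 56)).shape)   # torch.Size([1, 128, 28, 28])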

SLIDE 30

GoogLeNet/Inception - 2014

22 layers, with multiple branches:

  • e.g., 1x1, 3x3, 5x5, pooling
  • merged by concatenation
  • dimensionality reduced by 1x1 convs before the expensive 3x3/5x5 convs

Szegedy et al. “Going deeper with convolutions”. arXiv 2014 (CVPR 2015)
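A sketch of the branch-and-concatenate idea, assuming PyTorch; the class name and channel counts are illustrative, not the paper's:

    import torch
    import torch.nn as nn

    class InceptionSketch(nn.Module):
        def __init__(self, in_ch):
            super().__init__()
            self.b1 = nn.Conv2d(in_ch, 16, 1)                        # 1x1 branch
            self.b3 = nn.Sequential(nn.Conv2d(in_ch, 8, 1),          # 1x1 reduces depth
                                    nn.Conv2d(8, 16, 3, padding=1))  # before 3x3
            self.b5 = nn.Sequential(nn.Conv2d(in_ch, 4, 1),          # 1x1 reduces depth
                                    nn.Conv2d(4, 8, 5, padding=2))   # before 5x5
            self.bp = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                    nn.Conv2d(in_ch, 8, 1))          # pooling branch

        def forward(self, x):
            return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)

    y = InceptionSketch(32)(torch.randn(1, 32, 28, 28))
    print(y.shape)   # torch.Size([1, 48, 28, 28]): branches merged by concatenation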

SLIDE 31

Going Deeper


Simply stacking layers?

  • Plain nets: stacking 3x3 conv layers
  • a 56-layer net has higher training error and test error than a 20-layer net
  • yet a deeper model should not have higher training error

SLIDE 32

Going Deeper


Problem: deeper plain nets have higher training error on various datasets. Optimization difficulties:

  • vanishing gradients
  • solvers struggle to find the solution when going deeper

Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. “Deep Residual Learning for Image Recognition”. CVPR 2016.

Plain networks cannot simply go deeper!

SLIDE 33

ResNets - 2016

Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. “Deep Residual Learning for Image Recognition”. CVPR 2016.

Residual net vs. plain net: with the skip (identity) connections, gradients can flow directly backwards through the network (see the sketch below).
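A sketch of a basic residual block, assuming PyTorch (batch-norm placement follows the common formulation):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ResidualBlock(nn.Module):
        """output = F(x) + x: the identity skip lets gradients flow straight through."""
        def __init__(self, ch):
            super().__init__()
            self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
            self.bn1 = nn.BatchNorm2d(ch)
            self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
            self.bn2 = nn.BatchNorm2d(ch)

        def forward(self, x):
            out = F.relu(self.bn1(self.conv1(x)))
            out = self.bn2(self.conv2(out))
            return F.relu(out + x)   # the skip connection

    y = ResidualBlock(64)(torch.randn(1, 64, 56, 56))   # same shape in and out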

SLIDE 34

ResNets - 2016

  • Deep ResNets can be trained more easily
  • Deeper ResNets have lower training error, and also lower test error

Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. “Deep Residual Learning for Image Recognition”. CVPR 2016.

SLIDE 35

ImageNet Experiments

[Figure: ImageNet top-5 error (%) by architecture.]

SLIDE 36

DenseNets - 2018

  • simply connect every layer directly with each other
  • each layer has direct access to the gradients from the loss function and to the original input image
  • exploit the potential of the network through feature reuse
  • DenseNets concatenate the output feature maps of a layer with the incoming feature maps (see the sketch below)

G. Huang, Z. Liu and L. van der Maaten, “Densely Connected Convolutional Networks”, 2018.
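A sketch of the concatenation pattern, assuming PyTorch; the class name and growth rate are illustrative:

    import torch
    import torch.nn as nn

    class DenseLayerSketch(nn.Module):
        """Output feature maps are concatenated with the incoming ones,
        so every later layer sees all earlier features (feature reuse)."""
        def __init__(self, in_ch, growth=12):
            super().__init__()
            self.conv = nn.Conv2d(in_ch, growth, 3, padding=1)

        def forward(self, x):
            return torch.cat([x, torch.relu(self.conv(x))], dim=1)

    x = torch.randn(1, 24, 32, 32)
    x = DenseLayerSketch(24)(x)   # 24 -> 36 channels
    x = DenseLayerSketch(36)(x)   # 36 -> 48 channels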
SLIDE 37

MobileNets - 2017

Light-weight ConvNets for mobile applications using depthwise separable convolutions, which cut parameters and multiply-accumulate operations.

Howard et al. “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications”. 2017.
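A sketch of the depthwise separable building block, assuming PyTorch (channel counts are illustrative; the real MobileNet also uses batch norm and stride-2 variants):

    import torch
    import torch.nn as nn

    def depthwise_separable(in_ch, out_ch):
        """Per-channel 3x3 depthwise conv, then a 1x1 pointwise conv: far fewer
        parameters and multiply-accumulate ops than a full 3x3 convolution."""
        return nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch),  # depthwise
            nn.ReLU(),
            nn.Conv2d(in_ch, out_ch, 1),                          # pointwise 1x1
            nn.ReLU(),
        )

    y = depthwise_separable(32, 64)(torch.randn(1, 32, 112, 112))
    print(y.shape)   # torch.Size([1, 64, 112, 112])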

SLIDE 38

Deep Learning Applications


SLIDE 39

Deep Learning Applications

[Figure: data → backbone network (feature extractor, taking the feature from the last conv layer) → application.]

SLIDE 40

Object Detection and Recognition


Vision-based: R-CNN, Fast R-CNN, Faster R-CNN, YOLO, SSD, …

SLIDE 41

Object Detection and Recognition


Radar- and sonar-based object detection and recognition:

  • object detection from sonar images [Valdenegro 2016]
  • vehicle detection using polarised infrared sensors [Sheeny 2018]
  • object detection/recognition on side-scan sonar
  • object detection/recognition on radar
SLIDE 42

Semantic Segmentation


FCN, SegNet, RefineNet, PSPNet, …

SLIDE 43

Visual Odometry

  • DeepVO, UnDeepVO, VINet, …


SLIDE 44

Image-based Localisation


PoseNet, VidLoc, …: map images to 6-DoF poses

SLIDE 45

3D Reconstruction

  • OctNet, Octree Generative Network (OGN), Mesh R-CNN, …


SLIDE 46

Semantic Mapping


SLIDE 47

Robot Navigation


SLIDE 48

Summary

  • Deep Learning is a powerful tool
  • Learning representations is the key to Deep Learning


SLIDE 49

Thank you for your attention!

Slides adapted from Andrej Karpathy, Kaiming He