

SLIDE 1

CSE 802 Spring 2017: Deep Learning

Inci M. Baytas

Michigan State University, February 13-15, 2017

SLIDE 2

Deep Learning in Computer Vision

Large-scale Video Classification with Convolutional Neural Networks, CVPR 2014

SLIDE 3

Deep Learning in Computer Vision

Microsoft Deep Learning Semantic Image Segmentation

SLIDE 4

Deep Learning in Computer Vision

NeuralTalk and Walk: recognition and text description of images while walking.

SLIDE 5

Deep Learning in Robotics

Self-Driving Cars

SLIDE 6

Deep Learning in Robotics

Deep Sensorimotor Learning

SLIDE 7

Other Applications of Deep Learning

  • Natural Language Processing (NLP)
  • Speech recognition and machine translation

Why Should We Be Impressed?

  • Automated vision (e.g., object recognition) is challenging: different viewpoints, scales, occlusions, illumination, ...
  • Robotics (e.g., autonomous driving) in real-life environments (constantly changing, new tasks without guidance, unexpected factors) is challenging.
  • NLP (e.g., understanding human conversations) is an extremely complex task: noise, context, partial sentences, different accents, ...

SLIDE 8

Why Is Deep Learning So Popular Now?

  • Better hardware
  • Bigger data
  • Regularization methods (dropout)
  • Variety of optimization methods:
    • SGD
    • Adagrad
    • Adadelta
    • ADAM
    • RMSProp

SLIDE 9

Criticism and Limitations of Deep Networks

  • Large amounts of data required for training
  • High-performance computing is a necessity
  • Non-optimal method (no guarantee of reaching a global optimum)
  • Task-specific
  • Lack of theoretical understanding

SLIDE 10

Common Deep Network Types

  • Feed-forward networks
  • Convolutional neural networks
  • Recurrent neural networks

SLIDE 11

Components of Deep Learning

Loss functions

  • Squared loss: (y - f(x))^2
  • Logistic loss: log(1 + e^(-y f(x)))
  • Hinge loss: (1 - y f(x))_+
  • Squared hinge loss: ((1 - y f(x))_+)^2
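
These definitions are easy to check numerically. Below is a minimal NumPy sketch (not from the slides) that evaluates each loss for a label y in {-1, +1} and a real-valued score f(x):

```python
# Minimal NumPy sketch (not from the slides): the four losses above,
# for a label y in {-1, +1} and a real-valued prediction score fx = f(x).
import numpy as np

def squared_loss(y, fx):
    return (y - fx) ** 2

def logistic_loss(y, fx):
    return np.log(1.0 + np.exp(-y * fx))

def hinge_loss(y, fx):
    return np.maximum(0.0, 1.0 - y * fx)       # (1 - y f(x))_+

def squared_hinge_loss(y, fx):
    return hinge_loss(y, fx) ** 2

y, fx = 1.0, 0.3                               # a weakly confident correct prediction
for loss in (squared_loss, logistic_loss, hinge_loss, squared_hinge_loss):
    print(loss.__name__, loss(y, fx))
```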

Non-linear activation functions

  • Linear
  • Tanh
  • Sigmoid
  • Softmax
  • ReLU
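
The activations are equally short to write down; a minimal NumPy sketch (not from the slides):

```python
# Minimal NumPy sketch (not from the slides) of the activation functions listed above.
import numpy as np

def linear(z):
    return z                                   # identity: no non-linearity

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))            # squashes to (0, 1)

def relu(z):
    return np.maximum(0.0, z)                  # zero for negative inputs

def softmax(z):
    e = np.exp(z - z.max())                    # shift by max for numerical stability
    return e / e.sum()                         # normalizes to a probability vector

z = np.array([-2.0, 0.0, 3.0])
print(np.tanh(z), sigmoid(z), relu(z), softmax(z), sep="\n")
```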
SLIDE 12

SLIDE 13

Components of Deep Learning

Optimizers

  • Gradient Descent
  • Adagrad (Adaptive Gradient Algorithm)
  • Adadelta (An Adaptive Learning Rate Method)
  • ADAM (Adaptive Moment Estimation)
  • RMSProp

Regularization Methods

  • L2 norm
  • L1 norm
  • Dataset Augmentation
  • Noise robustness
  • Early stopping
  • Dropout [12]
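
To show how an optimizer and a regularizer fit together, here is a minimal sketch (not from the slides): plain gradient descent on a squared loss with an L2 penalty, with illustrative values for the step size and penalty weight.

```python
# Minimal NumPy sketch (not from the slides): gradient descent on
# mean squared error plus an L2 penalty lam * ||w||^2; lr and lam are illustrative.
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.standard_normal((100, 5)), rng.standard_normal(100)
w = np.zeros(5)
lr, lam = 0.1, 0.01                            # step size and L2 strength

for _ in range(100):
    residual = y - X @ w
    grad = -2.0 * X.T @ residual / len(y) + 2.0 * lam * w   # gradient of loss + L2 term
    w -= lr * grad                                          # gradient-descent update
```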
SLIDE 14

Components of Deep Learning

Number of iterations

  • Too few iterations: the model may underfit
  • Many iterations: use a stopping criterion

Step size

  • Very large step size: may overshoot the optimal point
  • Very small step size: takes longer to converge

Parameter Initialization

  • Initializing with zeros
  • Random initialization
  • Xavier initialization
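
A minimal sketch (not from the slides) contrasting the three schemes for one weight matrix; for Xavier initialization, the variance 2 / (fan_in + fan_out) is the standard formulation:

```python
# Minimal NumPy sketch (not from the slides): three ways to initialize a
# weight matrix for a layer with fan_in inputs and fan_out outputs.
import numpy as np

fan_in, fan_out = 256, 128
rng = np.random.default_rng(0)

w_zeros = np.zeros((fan_in, fan_out))          # zeros: all units get identical gradients
w_random = 0.01 * rng.standard_normal((fan_in, fan_out))   # small random values
w_xavier = rng.standard_normal((fan_in, fan_out)) * np.sqrt(2.0 / (fan_in + fan_out))
```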
SLIDE 15

Components of Deep Learning

Batch size

  • Bigger batch size: might require fewer iterations
  • Smaller batch size: will need more iterations

Number of layers

  • More layers (more depth): more non-linearity, more complexity, more parameters
  • Too many layers might cause overfitting.

Number of hidden parameters

  • Large number of hidden units: more model complexity; can approximate a more complex classifier
  • Too many parameters: overfitting, increased training time
SLIDE 16

Convolutional Neural Networks

  • Convolutional networks are simply neural networks that use convolution in place of general matrix multiplication in at least one of their layers [1].

Convolution:

  • A linear operator
  • Cross-correlation with a flipped kernel (see the sketch below)
  • Convolution in the spatial domain corresponds to multiplication in the frequency domain.
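
The flipped-kernel relationship is easy to verify directly; a minimal NumPy sketch (not from the slides) of a "valid" 2D convolution:

```python
# Minimal NumPy sketch (not from the slides): 2D convolution implemented as
# cross-correlation with a flipped kernel; "valid" output, stride 1, no padding.
import numpy as np

def conv2d(image, kernel):
    k = np.flipud(np.fliplr(kernel))           # flip the kernel in both axes
    H, W = image.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # cross-correlation with the flipped kernel = convolution
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * k)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
sobel_x = np.array([[1., 0., -1.], [2., 0., -2.], [1., 0., -1.]])
print(conv2d(image, sobel_x))                  # 3x3 "valid" output
```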

SLIDE 17

Convolutional Neural Networks (CNNs)

  • Feed-forward networks that can extract topological features from images.
  • Can provide invariance to geometric distortions such as translation, scaling, and rotation.
  • Hierarchical and robust feature extraction was done before CNNs.
  • A CNN is data-driven:
    • Parameters of the filters are learned from the data instead of taking predefined values.
    • At each iteration, the parameters are updated to minimize the loss.

SLIDE 18

Convolution Layer

  • Local (sparse) connectivity
    • Reduces memory requirements
    • Fewer operations
  • Parameter sharing
    • The same kernel is used at every position of the input
  • How to choose the filter size?
    • Receptive field
  • Equivariance property
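
A back-of-the-envelope sketch (not from the slides; the 32x32 input and 5x5 filter are illustrative) of why sparse connectivity and parameter sharing cut memory requirements:

```python
# Minimal sketch (illustrative numbers, not from the slides): parameter counts
# for a fully connected layer vs. a convolution layer on a 32x32 input.
in_h, in_w = 32, 32                 # input size (assumed for illustration)
k = 5                               # 5x5 filter, a common choice

fc_params = (in_h * in_w) * (in_h * in_w)   # dense: every output sees every input
conv_params = k * k                          # one shared 5x5 kernel, reused everywhere

print(fc_params)    # 1048576 weights
print(conv_params)  # 25 weights: sparse connectivity + parameter sharing
```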

SLIDE 19

Pooling Layer (Subsampling)

  • Convolution stage: several convolutions in parallel produce a set of linear activations
  • Followed by a non-linear activation
  • Then the pooling layer:
    • Invariance to small translations
    • Dealing with variable-size inputs
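
A minimal NumPy sketch (not from the slides) of 2x2 max pooling with stride 2, the most common choice:

```python
# Minimal NumPy sketch (not from the slides): 2x2 max pooling with stride 2.
import numpy as np

def max_pool(x, size=2, stride=2):
    """Downsample a 2D feature map by taking the max over each window."""
    H, W = x.shape
    out_h, out_w = (H - size) // stride + 1, (W - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = x[r:r + size, c:c + size].max()
    return out

x = np.array([[1., 2., 5., 6.],
              [3., 4., 7., 8.],
              [9., 10., 13., 14.],
              [11., 12., 15., 16.]])
print(max_pool(x))   # [[ 4.  8.] [12. 16.]] -- small shifts barely change the max
```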

SLIDE 20

Fully-Connected Layer

  • Maps the latent representation of the input to the output
  • Output:
    • One-hot representation of the class label
    • Predicted response
  • Appropriate activation function, e.g., softmax for classification (see the sketch below).
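
A minimal NumPy sketch (not from the slides; the 64-dimensional latent vector and 10 classes are illustrative) of a fully connected layer followed by softmax:

```python
# Minimal NumPy sketch (not from the slides): a fully connected layer
# followed by a numerically stable softmax over class scores.
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
latent = rng.standard_normal(64)            # latent representation from the CNN
W = rng.standard_normal((64, 10)) * 0.01    # weights: 64 features -> 10 classes
b = np.zeros(10)

probs = softmax(latent @ W + b)             # predicted class probabilities
print(probs.sum())                          # 1.0: a valid distribution
print(probs.argmax())                       # predicted class label
```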

SLIDE 21

Feature Extraction with CNNs

SLIDE 22

Some Example CNN Architectures

LeNet-5 [2]

SLIDE 23

Some Example CNN Architectures

AlexNet (5 convolutional + 3 fully-connected layers) [4]

SLIDE 24

Some Example CNN Architectures

VGG-16 [3]

SLIDE 25

Some Example CNN Architectures

GoogLeNet (22 layers)

SLIDE 26

Tricks to Improve CNN Performance

  • Data augmentation (sketched below):
    • Flipping (commonly used for faces)
    • Translation
    • Rotation
    • Stretching
    • Normalizing, whitening (less redundancy)
    • Cropping and alignment (especially for faces)
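
A minimal NumPy sketch (not from the slides; sizes are illustrative) of three of these augmentations:

```python
# Minimal NumPy sketch (not from the slides): horizontal flip, random
# translation via cropping, and per-image normalization.
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((36, 36))                  # toy grayscale image

flipped = img[:, ::-1]                      # horizontal flip

# random 32x32 crop = translation augmentation
r, c = rng.integers(0, 5, size=2)           # offsets in [0, 4]
crop = img[r:r + 32, c:c + 32]

normalized = (crop - crop.mean()) / (crop.std() + 1e-8)  # zero mean, unit variance
print(flipped.shape, crop.shape, round(float(normalized.mean()), 6))
```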
SLIDE 27

Project

  • You will implement the 11-layer CNN architecture proposed in [6] to extract features.

SLIDE 28

Project

  • You can use a deep learning library to implement the network.
  • The library will take care of convolution, pooling, dropout, and backpropagation.
  • You need to define the cost function and the activation functions.
  • The activation function of the output layer is softmax, since this is a classification problem.
  • You can use TensorFlow (a minimal sketch follows below).
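
A minimal sketch of the cost definition, assuming the TensorFlow 1.x API that was current in spring 2017; the feature size, class count, and variable names are illustrative, not from the project spec:

```python
# Minimal sketch (assumes the TensorFlow 1.x API from spring 2017; the
# feature size, class count, and names are illustrative assumptions).
import tensorflow as tf

n_features, n_classes = 320, 10575              # e.g., CNN features -> CASIA subjects

features = tf.placeholder(tf.float32, [None, n_features])
labels = tf.placeholder(tf.float32, [None, n_classes])    # one-hot class labels

W = tf.Variable(tf.random_normal([n_features, n_classes], stddev=0.01))
b = tf.Variable(tf.zeros([n_classes]))
logits = tf.matmul(features, W) + b             # final fully connected layer

# Softmax output activation + cross-entropy cost for classification
cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))
train_op = tf.train.AdamOptimizer(1e-3).minimize(cost)
predictions = tf.nn.softmax(logits)
```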
SLIDE 29

HPCC

  • Data and the evaluation protocol are on HPCC: /mnt/research/CSE_802_SPR_17
  • To connect to HPCC: ssh msunetid@hpcc.msu.edu with your MSU email password
  • To run small examples, use a development node: ssh dev-intel14
  • Try to log in to HPCC and check the course research space.
  • Try to use a Python IDE (PyCharm). Debug your code and understand how TensorFlow works (if you are not familiar with a deep learning library).

SLIDE 30

CASIA Dataset (Cropped Images) [9]

  • The database contains 494,414 images.
  • 10,575 subjects in total
  • We provide cropped and original images under /mnt/research/CSE_802_SPR_17

SLIDE 31

Test Data and Evaluation Protocol

  • Final evaluation on the Labeled Faces in the Wild (LFW) database [7]: 13,233 images, 5,749 subjects.
  • Evaluation protocol: the BLUFR protocol [8], found under /mnt/research/CSE_802_SPR_17

SLIDE 32

References

1. http://www.deeplearningbook.org/
2. http://yann.lecun.com/exdb/lenet/
3. https://www.cs.toronto.edu/~frossard/post/vgg16/
4. A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks", NIPS 2012: Neural Information Processing Systems, Lake Tahoe, Nevada, 2012.
5. http://pubs.sciepub.com/ajme/2/7/9/
6. Dong Yi, Zhen Lei, Shengcai Liao, and Stan Z. Li, "Learning Face Representation from Scratch", arXiv:1411.7923v1 [cs.CV], 2014.
7. http://vis-www.cs.umass.edu/lfw/
8. http://www.cbsr.ia.ac.cn/users/scliao/projects/blufr/
9. http://www.cbsr.ia.ac.cn/english/CASIA-WebFace-Database.html
10. https://www.nist.gov/programs-projects/face-recognition-grand-challenge-frgc
11. Shengcai Liao, Zhen Lei, Dong Yi, and Stan Z. Li, "A Benchmark Study of Large-scale Unconstrained Face Recognition", IAPR/IEEE International Joint Conference on Biometrics, Sep. 29 - Oct. 2, Clearwater, Florida, USA, 2014.
12. Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov, "Dropout: A Simple Way to Prevent Neural Networks from Overfitting", Journal of Machine Learning Research 15 (2014) 1929-1958.