CS 4803 / 7643: Deep Learning – Topics: (Finish) Computing Gradients



SLIDE 1

CS 4803 / 7643: Deep Learning

Zsolt Kira Georgia Tech

Topics:

– (Finish) Computing Gradients
– Backprop in Conv Layers
– Forward mode vs Reverse mode AD
– Modern CNN Architectures
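The forward- vs reverse-mode distinction can be made concrete on a toy function. A minimal sketch in plain Python (hand-written chain-rule steps rather than a real AD library), for f(x1, x2) = x1·x2 + sin(x1):

```python
import math

# Forward mode: propagate (value, derivative) pairs ("dual numbers").
# One forward pass gives the derivative w.r.t. ONE chosen input.
def f_forward(x1, x2, dx1, dx2):
    v, dv = x1 * x2, dx1 * x2 + x1 * dx2          # product rule
    w, dw = math.sin(x1), math.cos(x1) * dx1      # chain rule through sin
    return v + w, dv + dw

# Reverse mode (what backprop does): one forward pass records the
# computation, one backward pass gives derivatives w.r.t. ALL inputs.
def f_reverse(x1, x2):
    v = x1 * x2
    w = math.sin(x1)
    y = v + w
    # backward pass: seed dy/dy = 1, apply the chain rule in reverse
    dy_dv, dy_dw = 1.0, 1.0
    dx1 = dy_dv * x2 + dy_dw * math.cos(x1)
    dx2 = dy_dv * x1
    return y, dx1, dx2

x1, x2 = 0.5, 2.0
_, d1 = f_forward(x1, x2, 1.0, 0.0)   # derivative w.r.t. x1
_, d2 = f_forward(x1, x2, 0.0, 1.0)   # a second pass is needed for x2
y, g1, g2 = f_reverse(x1, x2)         # one backward pass gives both
```

Forward mode costs one pass per input; reverse mode costs one pass per output. Deep networks have millions of inputs (parameters) and a scalar loss, which is why training uses reverse mode.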

SLIDE 2

The architecture of LeNet5

SLIDE 3

Handwriting Recognition Example

SLIDE 4

Translation Invariance

SLIDE 5

Some Rotation Invariance

SLIDE 6

Some Scale Invariance

SLIDE 7

Case Studies

  • There are several generations of ConvNets

– 2012 – 2014: AlexNet, ZFNet, VGGNet

  • Conv-ReLU, Pooling, Fully connected, Softmax
  • Deeper ones (VGGNet) tend to do better

– 2014

  • Fully-convolutional networks for semantic segmentation
  • Matrix outputs rather than just one probability distribution

– 2014-2016

  • Fully-convolutional networks for classification
  • Fewer parameters, faster than comparable Gen-1 networks
  • GoogLeNet, ResNet

– 2014-2016

  • Detection layers (proposals)
  • Caption generation (combine with RNNs for language)
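The first-generation Conv-ReLU / pooling / fully-connected / softmax pattern can be sketched in a few lines of NumPy. This is purely illustrative (toy sizes, random weights, forward pass only), not any of the named networks:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(x, k):
    """Valid 2-D cross-correlation of a single-channel image x with kernel k."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i+kh, j:j+kw] * k)
    return out

def maxpool2x2(x):
    H, W = x.shape
    return x[:H//2*2, :W//2*2].reshape(H//2, 2, W//2, 2).max(axis=(1, 3))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy Gen-1 stack on an 8x8 input: Conv -> ReLU -> Pool -> FC -> Softmax
x = rng.standard_normal((8, 8))
k = rng.standard_normal((3, 3))        # one conv filter
W_fc = rng.standard_normal((10, 9))    # 10 classes, 3x3 = 9 pooled features
h = np.maximum(conv2d(x, k), 0)        # Conv-ReLU -> 6x6
p = maxpool2x2(h)                      # -> 3x3
probs = softmax(W_fc @ p.ravel())      # one probability distribution out
```

Note the shape pipeline: 8×8 → 6×6 (valid 3×3 conv) → 3×3 (2×2 pool) → 10 class probabilities.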
SLIDES 8-31

An Aside

SLIDE 32

AlexNet: 60M params, ZFNet: 75M, VGGNet: 138M, GoogLeNet: 5M
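These counts follow directly from layer shapes. As a rough sanity check (assuming VGG-16's standard layer sizes, which the slide does not spell out), a single fully-connected layer dominates the budget:

```python
# Parameter counts (weights only, biases ignored):
#   conv layer: kh * kw * in_ch * out_ch
#   FC layer:   in_features * out_features

# VGG-16's first FC layer (assumed shapes: 7x7x512 feature map -> 4096 units)
fc1 = 7 * 7 * 512 * 4096       # ~102.8M of VGG's ~138M total
# versus one typical 3x3 conv layer, 512 -> 512 channels
conv = 3 * 3 * 512 * 512       # ~2.4M

print(fc1, conv)  # 102760448 2359296
```

This is why GoogLeNet lands at only ~5M: it replaces the big FC layers with global average pooling before the classifier.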

SLIDES 33-34

SLIDE 35

Importance of Depth

  • After a while, adding depth decreases performance
  • At first: vanishing/exploding gradients
      – Normalized initialization
      – Batch normalization
      – 2nd-order methods
  • Then: optimization limitation
      – A deeper network should be able to mimic shallower ones
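One fix (used by ResNet, listed earlier) is the skip connection: a residual block computes x + F(x), so when the extra layers' weights are zero the block is exactly the identity, and a deeper network can trivially represent a shallower one. A minimal NumPy sketch:

```python
import numpy as np

def residual_block(x, W1, W2):
    """y = x + W2 @ relu(W1 @ x): identity plus a learned residual F(x)."""
    h = np.maximum(W1 @ x, 0)   # ReLU
    return x + W2 @ h

rng = np.random.default_rng(0)
x = rng.standard_normal(4)

# With zero weights the block is exactly the identity mapping, so
# stacking more blocks never makes the identity harder to represent.
W1 = np.zeros((4, 4))
W2 = np.zeros((4, 4))
y = residual_block(x, W1, W2)
assert np.allclose(y, x)
```

The same shortcut also gives gradients a direct path backward through the addition, which helps with the vanishing-gradient problem noted above.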
SLIDES 36-41

SLIDE 42

Localization and Detection

SLIDE 43

Computer Vision Tasks

SLIDE 44

Computer Vision Tasks

SLIDE 45

Classification + Localization

SLIDE 46

CLS - ImageNet

SLIDE 47

Idea 1: Localization as Regression

SLIDES 48-51

SLIDE 52

Per-Class vs. Class Agnostic

SLIDE 53

Where to attach?

SLIDE 54

Multiple Objects

SLIDE 55

Human Pose Estimation

SLIDE 56

Sliding Window: Overfeat

SLIDE 57

Sliding Window: Overfeat

SLIDE 58

Sliding Window: Overfeat

SLIDE 59

Sliding Window: Overfeat

SLIDE 60

Sliding Window: Overfeat

SLIDE 61

Sliding Window: Overfeat

SLIDE 62

Sliding Window: Overfeat

Why aren’t the boxes aligned to the grid?

SLIDE 63

Sliding Window: Overfeat

SLIDES 64-72

SLIDE 73

Detection as Classification

SLIDE 74

Detection as Classification

SLIDE 75

Detection as Classification

SLIDE 76

Detection as Classification

SLIDE 77

SLIDE 78

Detection as Classification

SLIDE 79

R-CNN

SLIDES 80-81

SLIDE 82

Region of Interest (ROI) Pooling
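ROI pooling takes a proposal's window in the feature map and max-pools it into a fixed grid, so regions of different sizes all yield same-size features for the downstream FC layers. A minimal sketch, assuming a single-channel feature map and a simple equal-split of rows/columns into bins (real implementations bin per channel over a whole batch):

```python
import numpy as np

def roi_pool(feat, roi, out_size=(2, 2)):
    """Max-pool the sub-window roi = (r0, c0, r1, c1) of feat into a fixed
    out_size grid, regardless of the window's own height/width."""
    r0, c0, r1, c1 = roi
    window = feat[r0:r1, c0:c1]
    # split the window's rows and columns into out_size bins
    rows = np.array_split(np.arange(window.shape[0]), out_size[0])
    cols = np.array_split(np.arange(window.shape[1]), out_size[1])
    out = np.empty(out_size)
    for i, rs in enumerate(rows):
        for j, cs in enumerate(cols):
            out[i, j] = window[np.ix_(rs, cs)].max()   # max over each bin
    return out

feat = np.arange(36, dtype=float).reshape(6, 6)
pooled = roi_pool(feat, (1, 1, 5, 6))   # a 4x5 region -> always a 2x2 output
```

Whatever the proposal's size, the output is the fixed `out_size` grid, which is what lets one network head handle arbitrarily shaped regions.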

SLIDES 83-84