

SLIDE 1

A Shallow Introduction to Deep Learning for Computer Vision

Ramprasaath

SLIDE 2

Lecture Outline

  • Computer Vision
    • Before (Image/Alex)Net era – (Summer 1956-2012)
    • After (Image/Alex)Net era – (2012 – present)
  • Neural Networks (Brief Introduction)
  • Need for CNNs
  • Visualizing, Understanding and Analyzing ConvNets
  • Transfer Learning
  • Going beyond Classification:
    • Localization
    • Detection
    • Segmentation
    • Depth Estimation
    • Video Classification
    • Image Ranking and Retrieval
    • Image Captioning
    • Visual Question Answering
SLIDE 3

Where are we

  • Computer Vision
    • Before (Image/Alex)Net era – (Summer 1956-2012)
SLIDE 4

1956 Dartmouth AI Project

“We propose that a 2 month, 10 man study of artificial intelligence be carried out during the summer of 1956 at Dartmouth College in Hanover, New Hampshire. The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it. An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves. We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer.”

http://www-formal.stanford.edu/jmc/history/dartmouth/dartmouth.html

SLIDE 5

1956 Dartmouth AI Project

Five of the attendees of the 1956 Dartmouth Summer Research Project on AI reunited in 2006: Trenchard More, John McCarthy, Marvin Minsky, Oliver Selfridge, and Ray Solomonoff. Missing were: Arthur Samuel, Herbert Simon, Allen Newell, Nathaniel Rochester and Claude Shannon.

SLIDE 6

The beginning of Computer Vision

  • During the summer of 1966, the late Prof. Marvin Minsky (then at MIT) asked a student to attach a camera to a computer and to write an algorithm that would allow the computer to describe what it sees.

SLIDE 7

Example: Color (Hue) Histogram

[Figure: each pixel's hue falls into one of several hue bins; the matching bin count is incremented (+1), giving a hue histogram feature for the image]

Slide Credit: Fei-Fei Li & Andrej Karpathy
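
To make the figure concrete, here is a minimal numpy sketch (my own, not the lecture's code) that converts RGB pixels to hue and accumulates the per-bin "+1" votes:

import numpy as np

def hue_histogram(rgb, n_bins=12):
    """rgb: float array of shape (H, W, 3) with values in [0, 1]."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    mx, mn = rgb.max(-1), rgb.min(-1)
    delta = np.where(mx == mn, 1e-8, mx - mn)      # avoid divide-by-zero
    # standard RGB -> hue conversion, hue scaled to [0, 1)
    hue = np.select(
        [mx == r, mx == g],
        [((g - b) / delta) % 6, (b - r) / delta + 2],
        default=(r - g) / delta + 4,
    ) / 6.0
    # each pixel votes "+1" into its hue bin
    hist, _ = np.histogram(hue, bins=n_bins, range=(0.0, 1.0))
    return hist

image = np.random.rand(200, 200, 3)       # stand-in for a real image
print(hue_histogram(image))               # 12 hue-bin counts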

SLIDE 8

Example: HOG features

Take each 8x8-pixel region and quantize the edge orientation into 9 bins.

(images from vlfeat.org)

Slide Credit: Fei-Fei Li & Andrej Karpathy
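
A minimal sketch of the idea (mine, not vlfeat's implementation): gradient orientations quantized into 9 bins per 8x8 cell, weighted by gradient magnitude:

import numpy as np

def hog_cells(gray, cell=8, n_bins=9):
    """gray: (H, W) float image; returns (H//cell, W//cell, n_bins) histograms."""
    gy, gx = np.gradient(gray)
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # unsigned edge orientation
    bins = np.minimum((ang / (180.0 / n_bins)).astype(int), n_bins - 1)
    H, W = gray.shape
    out = np.zeros((H // cell, W // cell, n_bins))
    for i in range(H // cell):
        for j in range(W // cell):
            b = bins[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            out[i, j] = np.bincount(b, weights=m, minlength=n_bins)
    return out

patch = np.random.rand(32, 32)            # a 32x32 patch, as on the next slide
print(hog_cells(patch).shape)             # (4, 4, 9) -> 4*4*9 = 144 numbers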

SLIDE 9

Example: Bag of Words

1. Resize each detected patch to a fixed size (e.g. 32x32 pixels).
2. Extract HOG on the patch (get 144 numbers).

Repeating this for every detected feature gives a matrix of size [number_of_features x 144].

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 10

Example: Bag of Words

Run k-means on the 144-d descriptors to learn a "vocabulary of visual words" (e.g. 1000 centroids). Each image is then encoded as a 1000-d histogram of visual words: every descriptor votes for its nearest centroid. A minimal sketch follows.

Slide Credit: Fei-Fei Li & Andrej Karpathy
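
A minimal end-to-end sketch (my own, with random stand-in descriptors; the centroid count is shrunk from 1000 to 50 for speed):

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
descriptors = rng.random((5000, 144))     # stand-in for real HOG descriptors

# learn the "vocabulary of visual words" (the slide uses e.g. 1000 centroids)
vocab = KMeans(n_clusters=50, n_init=10, random_state=0).fit(descriptors)

def bow_histogram(image_descriptors, vocab):
    """Assign each descriptor to its nearest centroid and count the votes."""
    words = vocab.predict(image_descriptors)
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)    # normalized histogram of visual words

one_image = rng.random((120, 144))        # descriptors from a single image
print(bow_histogram(one_image, vocab).shape)   # (50,) -> fed to a classifier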

SLIDE 11

Traditional Object Recognition

[Pipeline: SIFT / HoG (fixed) → K-means / Sparse Coding (unsupervised) → Pooling → Classifier (supervised)]

SLIDE 12

Most recognition systems are built on the same architecture.

CNNs: end-to-end models

(slide from Yann LeCun)

SLIDE 13

Lecture Outline

  • Computer Vision
    • After (Image/Alex)Net era – (2012 – present)
SLIDE 14

ImageNet challenge winners over the years:

  • Year 2010 – NEC-UIUC [Lin CVPR 2011]: dense grid descriptors (HOG, LBP), coding (local coordinate, super-vector), pooling (SPM), linear SVM
  • Year 2012 – SuperVision [Krizhevsky NIPS 2012]: convolution, pooling, and softmax layers learned end-to-end
  • Year 2014 – GoogLeNet [Szegedy arXiv 2014], VGG [Simonyan arXiv 2014], MSRA [He arXiv 2014]
  • Year 2015 – Deep Residual Learning for Image Recognition (Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun) [He arXiv 2015]

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 15

Lecture Outline

  • Brief Introduction to Neural Networks
SLIDE 16

Neural Networks: Architectures

[Figure: neurons organized into "fully-connected" layers]

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 17

Neural Networks: Architectures

"Fully-connected" layers. A net with one hidden layer is called a "2-layer Neural Net" or "1-hidden-layer Neural Net"; with two hidden layers, a "3-layer Neural Net" or "2-hidden-layer Neural Net". A minimal forward pass is sketched below.

Slide Credit: Fei-Fei Li & Andrej Karpathy
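
A minimal sketch (mine, not the lecture's code) of the forward pass of a "2-layer Neural Net" with fully-connected layers:

import numpy as np

rng = np.random.default_rng(0)
D, H, C = 4, 10, 3                        # input dim, hidden units, classes

W1, b1 = rng.normal(0, 0.01, (D, H)), np.zeros(H)   # input -> hidden
W2, b2 = rng.normal(0, 0.01, (H, C)), np.zeros(C)   # hidden -> class scores

def forward(x):
    h = np.maximum(0, x @ W1 + b1)        # fully-connected layer + ReLU
    return h @ W2 + b2                    # class scores (before softmax)

print(forward(rng.normal(size=(2, D))).shape)       # (2, 3): scores per example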

SLIDE 18

Where are we

  • Need for CNNs
SLIDE 19

Fully Connected Layer

Example: a 200x200 image with 40K hidden units needs ~2B parameters!

  • Spatial correlation is local
  • A waste of resources, and we don't have enough training samples anyway

Slide Credit: Marc'Aurelio Ranzato

SLIDE 20

Locally Connected Layer

Example: a 200x200 image, 40K hidden units, filter size 10x10: 4M parameters.

Note: this parameterization is good when the input image is registered (e.g., face recognition).

Slide Credit: Marc'Aurelio Ranzato

SLIDE 21

Locally Connected Layer

Stationarity? Image statistics are similar at different locations. (Same example as above: 200x200 image, 40K hidden units, 10x10 filters, 4M parameters.)

Slide Credit: Marc'Aurelio Ranzato

SLIDE 22

Convolutional Layer

Locality? Nearby pixels are correlated.

Share the same parameters across different locations (assuming the input is stationary): convolutions with learned kernels, as in the sketch below.

Slide Credit: Marc'Aurelio Ranzato
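
A minimal sketch (my own) of that weight sharing: one learned kernel slid over every location of the input:

import numpy as np

def conv2d(image, kernel):
    """Valid 2D cross-correlation of an (H, W) image with a (k, k) kernel."""
    H, W = image.shape
    k = kernel.shape[0]
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # the SAME weights are reused at every (i, j): that is the sharing
            out[i, j] = np.sum(image[i:i+k, j:j+k] * kernel)
    return out

img = np.random.rand(200, 200)
kernel = np.random.randn(10, 10)          # one learned 10x10 filter
print(conv2d(img, kernel).shape)          # (191, 191) response map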

SLIDE 23

Convolutional Layer

Learn multiple filters. E.g.: a 200x200 image with 100 filters of size 10x10 needs only 10K parameters.

Slide Credit: Marc'Aurelio Ranzato
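
A quick sanity check of the parameter counts quoted on the last few slides (my own arithmetic):

image_pixels = 200 * 200                  # 200x200 input image
hidden_units = 40_000                     # 40K hidden units
filter_size = 10 * 10                     # 10x10 filters

fully_connected = image_pixels * hidden_units    # every unit sees every pixel
locally_connected = hidden_units * filter_size   # every unit sees one 10x10 patch
convolutional = 100 * filter_size                # 100 shared 10x10 filters

print(f"{fully_connected:,}")     # 1,600,000,000 (the slide rounds to ~2B)
print(f"{locally_connected:,}")   # 4,000,000
print(f"{convolutional:,}")       # 10,000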

SLIDE 24

Pooling Layer

Let us assume the filter is an "eye" detector. Q: how can we make the detection robust to the exact location of the eye?

Slide Credit: Marc'Aurelio Ranzato

SLIDE 25

Pooling Layer

By "pooling" (e.g., taking the max of) filter responses at different locations, we gain robustness to the exact spatial location of features.

Slide Credit: Marc'Aurelio Ranzato
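
A minimal sketch (mine) of 2x2 max pooling: keep only the strongest response in each window and discard its exact position:

import numpy as np

def max_pool(x, size=2):
    """x: (H, W) response map with H and W divisible by `size`."""
    H, W = x.shape
    return x.reshape(H // size, size, W // size, size).max(axis=(1, 3))

responses = np.random.rand(4, 4)
print(max_pool(responses))                # (2, 2): one max per 2x2 window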

SLIDE 26

Hyperparameters to play with:

  • network architecture
  • learning rate, its decay schedule, update type
  • regularization (L2/L1/Maxnorm/Dropout)
  • loss to use (e.g. SVM/Softmax)
  • initialization

[Cartoon: the neural networks practitioner as a DJ turning knobs; the music = the loss function]

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 27

A typical ConvNet stage: Norm → Filter Bank → Non-Linearity → Pooling, repeated, followed by a Classifier.

  • Normalization: e.g. contrast normalization
  • Filter Bank: matrix multiplication
  • Non-Linearity: e.g. ReLU
  • Pooling: aggregation over space or feature type

Slide Credit: Yann LeCun & Marc'Aurelio Ranzato

SLIDE 28

Fast-forward to today

[From recent Yann LeCun slides]


SLIDE 29

[Diagram: learning methods mapped on two axes, shallow vs. deep and supervised vs. unsupervised. Shallow: Perceptron, SVM, Boosting, Decision Tree (supervised); GMM, BayesNP, Sparse Coding, RBM, AE, Probabilistic Models (unsupervised). Deep: Neural Net, Conv. Net, RNN (supervised); D-AE, DBN, DBM (unsupervised)]

Slide Credit: Yann LeCun & Marc'Aurelio Ranzato
SLIDE 30

Demo: convnetjs example of training on CIFAR-10.

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 31

Convolutional Layer

Just like a normal hidden layer, BUT:

  • neurons are connected to the input only in a local receptive field
  • all neurons in a single depth slice share weights

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 32

The weights of this neuron, visualized.

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 33

Convolving the first filter over the input gives the first depth slice of the output volume.

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 34

Visualizing Learned Filters


Figure Credit: [Zeiler & Fergus ECCV14]

SLIDE 35

Visualizing Learned Filters


Figure Credit: [Zeiler & Fergus ECCV14]

SLIDE 36

Visualizing Learned Filters


Figure Credit: [Zeiler & Fergus ECCV14]

SLIDE 37

VGGNet, layer by layer (final stage; earlier layers omitted):

...
POOL2:     [14x14x512]  memory: 14*14*512 = 100K   params: 0
CONV3-512: [14x14x512]  memory: 14*14*512 = 100K   params: (3*3*512)*512 = 2,359,296
CONV3-512: [14x14x512]  memory: 14*14*512 = 100K   params: (3*3*512)*512 = 2,359,296
CONV3-512: [14x14x512]  memory: 14*14*512 = 100K   params: (3*3*512)*512 = 2,359,296
POOL2:     [7x7x512]    memory: 7*7*512 = 25K      params: 0
FC:        [1x1x4096]   memory: 4096               params: 7*7*512*4096 = 102,760,448
FC:        [1x1x4096]   memory: 4096               params: 4096*4096 = 16,777,216
FC:        [1x1x1000]   memory: 1000               params: 4096*1000 = 4,096,000

TOTAL memory: 24M * 4 bytes ~= 93MB / image (only forward! ~*2 for bwd)
TOTAL params: 138M parameters

"CNN code": the CNN transforms the image to 4096 numbers that are then linearly classified.

Q: What is the learned CNN representation?

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 38

Visualizing the CNN code representation

("CNN code" = the 4096-D vector before the classifier)

For a query image, retrieve the nearest neighbors in the "code" space. (But we'd like a more global way to visualize the distances.)

Slide Credit: Fei-Fei Li & Andrej Karpathy
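
A minimal sketch (my own, with random stand-in vectors) of that retrieval step: nearest neighbors in 4096-D "CNN code" space:

import numpy as np

rng = np.random.default_rng(0)
codes = rng.normal(size=(10_000, 4096))   # stand-in for the dataset's CNN codes
query = rng.normal(size=4096)             # stand-in for the query image's code

dists = np.linalg.norm(codes - query, axis=1)   # L2 distance to every code
print(np.argsort(dists)[:5])              # indices of the 5 most similar images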

SLIDE 39

t-SNE visualization

[van der Maaten & Hinton] Embed high-dimensional points so that locally, pairwise distances are conserved; i.e. similar things end up in similar places (and dissimilar things end up wherever).

Right: example embedding of MNIST digits (0-9) in 2D.

Slide Credit: Fei-Fei Li & Andrej Karpathy
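
A minimal sketch (mine) using scikit-learn's t-SNE; real CNN codes are 4096-D, shrunk here for speed:

import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
codes = rng.normal(size=(500, 64))        # stand-in for 4096-D CNN codes
xy = TSNE(n_components=2, perplexity=30).fit_transform(codes)
print(xy.shape)                           # (500, 2): one 2-D point per image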

SLIDE 40

t-SNE visualization: two images are placed nearby if their CNN codes are close. See more: http://cs.stanford.edu/people/karpathy/cnnembed/

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 41

t-SNE visualization (continued)

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 42

Where are we

  • Visualizing, Understanding and Analyzing ConvNets
SLIDE 43

Q: What images maximize the score of some class in a ConvNet?

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 44

1. Find images that maximize some class score: maximize S_c(I) - λ‖I‖², where S_c(I) is the score for class c (before the Softmax) and the second term is the regularization.

Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. Karen Simonyan, Andrea Vedaldi, Andrew Zisserman, 2014.

A gradient-ascent sketch of this idea follows.

Slide Credit: Fei-Fei Li & Andrej Karpathy
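
A minimal sketch of the optimization (my own, with a toy *linear* scorer so the gradient is analytic; in a real ConvNet the gradient with respect to the image comes from backprop):

import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 32 * 32))        # toy scorer: 10 classes, 32x32 image
c, lam, lr = 3, 1e-3, 0.1                 # target class, L2 weight, step size

image = np.zeros(32 * 32)                 # start from a blank image
for _ in range(200):
    # d/dI [ S_c(I) - lam * ||I||^2 ] = W[c] - 2*lam*I  for S_c(I) = W[c] @ I
    image += lr * (W[c] - 2 * lam * image)

print((W @ image).argmax())               # almost surely 3: image now "is" class c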

SLIDE 45

1. Find images that maximize some class score (example results).

Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. Karen Simonyan, Andrea Vedaldi, Andrew Zisserman, 2014.

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 46

Q: What do the individual neurons look for in an image?

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 47

Visualizing and Understanding Convolutional Networks. Zeiler & Fergus, 2013.

Visualizing arbitrary neurons along the way to the top...

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 48

Visualizing arbitrary neurons along the way to the top...

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 49

Visualizing arbitrary neurons along the way to the top...

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 50

Rich feature hierarchies for accurate object detection and semantic segmentation [Girshick, Donahue, Darrell, Malik]

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 51

[Image-only slide]

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 52

Question: Given a CNN code, is it possible to reconstruct the original image?

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 53

Find an image such that:

  • its code is similar to a given code
  • it "looks natural" (image prior regularization)

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 54

Understanding Deep Image Representations by Inverting Them [Mahendran and Vedaldi, 2014]

Original image, and reconstructions from the 1000 log probabilities for the ImageNet (ILSVRC) classes.

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 55

Reconstructions from the representation after the last pooling layer (immediately before the first fully-connected layer).

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 56

Reconstructions from intermediate layers.

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 57

Multiple reconstructions. The images in all four quadrants "look" the same to the CNN (same CNN code).

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 58

We can pose an optimization over the input image to maximize any class score. That seems useful. Question: can we use this to "fool" ConvNets?

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 59

Intriguing properties of neural networks [Szegedy et al.]

[Figure: correctly classified images, plus an imperceptible distortion, become classified as "ostrich"]

Slide Credit: Fei-Fei Li & Andrej Karpathy
SLIDE 60

These kinds of results were around even before ConvNets...

Exploring the Representation Capabilities of the HOG Descriptor [Tatu et al., 2011]

[Figure: visually very different images with identical HOG representations]

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 61

Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images [Nguyen, Yosinski, Clune]. The network assigns >99.6% confidence to unrecognizable images.

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 62

Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images [Nguyen, Yosinski, Clune]. More examples with >99.6% confidences.

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 63

CNN vs. Human (Andrej)

What I learned from competing against a ConvNet on ImageNet. Karpathy, 2014: http://bit.ly/humanvsconvnet
Try it out yourself: http://cs.stanford.edu/people/karpathy/ilsvrc/

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 64

Top-5 error on ImageNet:

  • GoogLeNet: 6.8%
  • Andrej: 5.1% (phew...)
  • MSRA (152 layers): 3.57%

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 65

Section Summary:

  • We looked at several works that try to visualize how ConvNets work and what they learn.
  • We saw that you can "break" them, but this is not a problem with deep learning (in fact, DL will be the solution) and has little to do with Computer Vision or ConvNets; it's a problem with the mathematical forms we use in the forward pass and the training objective.
  • We looked at where ConvNets work and don't work.

Slide Credit: Fei-Fei Li & Andrej Karpathy
SLIDE 66

What makes ConvNets tick?

  • depth
  • small filter sizes
  • Conv layers > FC layers

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 67

So basically add several layers and then learn them??

SLIDE 68

Wait… It can’t be that simple…

Where is the catch??

SLIDE 69

"You need a lot of data if you want to train/use CNNs"

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 70

"You need a lot of data if you want to train/use CNNs"

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 71

Transfer Learning

To the rescue…

SLIDE 72

Transfer Learning with CNNs

1. Train on ImageNet.

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 73

Transfer Learning with CNNs

1. Train on ImageNet.
2. If you have a small dataset: fix all weights (treat the CNN as a fixed feature extractor) and retrain only the classifier, i.e. swap the Softmax layer at the end.

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 74

Transfer Learning with CNNs

1. Train on ImageNet.
2. If you have a small dataset: fix all weights (treat the CNN as a fixed feature extractor) and retrain only the classifier, i.e. swap the Softmax layer at the end.
3. If you have a medium-sized dataset, "finetune" instead: use the old weights as initialization and retrain a bigger portion of the network (only some of the higher layers, or even all of it).

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 75

Transfer Learning with CNNs

1. Train on ImageNet.
2. If you have a small dataset: fix all weights (treat the CNN as a fixed feature extractor) and retrain only the classifier, i.e. swap the Softmax layer at the end.
3. If you have a medium-sized dataset, "finetune" instead: use the old weights as initialization and retrain a bigger portion of the network, or even all of it.

Tip: when finetuning, use only ~1/10th of the original learning rate on the top layer, and ~1/100th on intermediate layers. A sketch of step 2 follows.

Slide Credit: Fei-Fei Li & Andrej Karpathy
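
A minimal sketch of step 2 (my own, in PyTorch, which postdates this lecture; any of the frameworks listed at the end would do):

import torch.nn as nn
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1")    # 1. trained on ImageNet
for p in model.parameters():
    p.requires_grad = False                         # 2. fix all the weights

model.fc = nn.Linear(model.fc.in_features, 10)      # swap in a new 10-class head
# Train only model.fc on the small target dataset. For a medium dataset,
# unfreeze the higher layers and finetune them with the reduced learning
# rates suggested in the tip above.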

SLIDE 76

Where are we

  • Going beyond Classification:
    • Localization
    • Detection
    • Segmentation
    • Depth Estimation
    • Video Classification
    • Image Ranking and Retrieval
    • Image Captioning
    • Visual Question Answering
SLIDE 77

Beyond Image Classification

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 78

Localization

The model must output:

  • class (integer)
  • x1, y1, x2, y2 bounding box coordinates

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 79

Very Deep Convolutional Networks for Large-Scale Image Recognition, Simonyan et al., 2014
OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks, Sermanet et al., 2014

Idea: train a localization net. Take out the Softmax loss, swap in an L2 (regression) loss on the box coordinates, and fine-tune the classification network. A sketch follows.

Slide Credit: Fei-Fei Li & Andrej Karpathy
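
A minimal sketch of the swap (mine, in PyTorch as above; the normalized box encoding is an assumption):

import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1")    # pretrained classifier
model.fc = nn.Linear(model.fc.in_features, 4)       # now predicts (x1, y1, x2, y2)
criterion = nn.MSELoss()                            # L2 regression loss

images = torch.randn(2, 3, 224, 224)                # stand-in image batch
boxes = torch.rand(2, 4)                            # ground-truth boxes, normalized
loss = criterion(model(images), boxes)              # fine-tune with this loss
print(loss.item())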

SLIDE 80

Detection

The model must output a set of detections, where each detection has:

  • confidence
  • class (integer)
  • x1, y1, x2, y2 bounding box coordinates

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 81

Rich feature hierarchies for accurate object detection and semantic segmentation [Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik]

Idea: turn a detection problem into an image classification problem (over image regions). The content of every labeled bounding box is a positive example for its class; every other bounding box in the image is a special negative class.

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 82

Rich feature hierarchies for accurate object detection and semantic segmentation [Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik]

Idea: turn a detection problem into an image classification problem (over image regions).

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 83

Segmentation

Fully Convolutional Networks for Semantic Segmentation. Long, Shelhamer, Darrell.

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 84

Time for a short game

SLIDE 85

Monocular Depth Estimation

Depth Map Prediction from a Single Image using a Multi-Scale Deep Network [Eigen et al.], 2014

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 86

Video Classification

Two-Stream Convolutional Networks for Action Recognition in Videos [Simonyan et al.], 2014
Large-scale Video Classification with Convolutional Neural Networks [Karpathy et al.], 2014
Long-term Recurrent Convolutional Networks for Visual Recognition and Description [Donahue et al.], 2014

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 87

Image Captioning

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 88

Neural Networks practitioner

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 89

[Diagram: the image captioning pipeline, a Convolutional Neural Network feeding a Recurrent Neural Network]

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 90

Image Ranking and Retrieval

Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models [Kiros, Salakhutdinov, Zemel, 2014]

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 91

Visual Question Answering

  • S. Antol*, A. Agrawal*, J. Lu, M. Mitchell, D. Batra, C. L. Zitnick, and D. Parikh
SLIDE 92

Pointers to some references

  • Libraries/Frameworks:
    • Caffe
    • Torch
    • TensorFlow
    • Theano
  • Detailed Lectures:
    • CS231n – Fei-Fei Li, Andrej Karpathy, and Justin Johnson
    • ECE 6504 – Dhruv Batra
  • Books:
    • Deep Learning – Goodfellow, Bengio, and Courville
SLIDE 93

You are now ready.

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 94

You are now ready.

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 95

You are now ready.

Slide Credit: Fei-Fei Li & Andrej Karpathy

SLIDE 96

Thanks for coming!

Questions?? Comments?? Thoughts?? Ideas??