A Shallow Introduction to Deep Learning for Computer Vision
Ramprasaath
Lecture Outline:
- Computer Vision before the (Image/Alex)Net era (Summer 1956-2012)
- Computer Vision after the (Image/Alex)Net era (2012-present)
- Neural Networks (brief)
“We propose that a 2 month, 10 man study of artificial intelligence be carried out during the summer of 1956 at Dartmouth College in Hanover, New Hampshire. The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it. An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves. We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer.”
http://www-formal.stanford.edu/jmc/history/dartmouth/dartmouth.html
Five of the attendees of the 1956 Dartmouth Summer Research Project on AI reunited in 2006: Trenchard More, John McCarthy, Marvin Minsky, Oliver Selfridge, and Ray Solomonoff. Missing were: Arthur Samuel, Herbert Simon, Allen Newell, Nathaniel Rochester and Claude Shannon.
Minsky asked a student to attach a camera to a computer and write an algorithm that would allow the computer to describe what it sees.
Fei-Fei Li & Andrej Karpathy, Lecture 4 - 7 Jan 2015
(Figure: a color histogram feature: each pixel adds +1 to its hue bin.)
HOG: take each 8x8 pixel region and quantize the edge orientations into a histogram.
(images from vlfeat.org)
For each detected feature:
1. Resize the patch to a fixed size (e.g. 32x32 pixels)
2. Extract HOG on the patch (get 144 numbers)
This gives a matrix of size [number_of_features x 144].
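One plausible breakdown of the 144 numbers for a 32x32 patch is 4x4 cells of 8x8 pixels, each contributing a 9-bin orientation histogram (4*4*9 = 144). A minimal numpy sketch of such a simplified HOG-style descriptor (no block normalization, unlike full HOG; the cell/bin choice here is an assumption to match the count):

```python
import numpy as np

def hog_like_descriptor(patch, cell=8, bins=9):
    """Simplified HOG-style descriptor: per-cell histograms of
    gradient orientations, weighted by gradient magnitude."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)        # unsigned orientation in [0, pi)
    bin_idx = np.minimum((ang / np.pi * bins).astype(int), bins - 1)
    h, w = patch.shape
    feats = []
    for y in range(0, h, cell):
        for x in range(0, w, cell):
            hist = np.bincount(bin_idx[y:y+cell, x:x+cell].ravel(),
                               weights=mag[y:y+cell, x:x+cell].ravel(),
                               minlength=bins)
            feats.append(hist)
    return np.concatenate(feats)

patch = np.random.rand(32, 32)    # stand-in for a resized image patch
desc = hog_like_descriptor(patch)
print(desc.shape)                 # (144,): 4x4 cells x 9 orientation bins
```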
Bag of visual words: run k-means on the 144-d descriptors to learn centroids (e.g. 1000 centroids, a "vocabulary" of visual words), then represent each image as a 1000-d histogram of visual words.
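A toy sketch of this pipeline, with a 20-word vocabulary instead of 1000 and random 144-d vectors standing in for real HOG descriptors:

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, iters=10):
    """Plain k-means: learn k centroids over the descriptors."""
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - centroids[None]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(0)
    return centroids

def bow_histogram(descriptors, centroids):
    """Assign each descriptor to its nearest visual word,
    then build a normalized histogram of word counts."""
    d = ((descriptors[:, None, :] - centroids[None]) ** 2).sum(-1)
    words = d.argmin(1)
    hist = np.bincount(words, minlength=len(centroids)).astype(float)
    return hist / hist.sum()

descs = rng.random((500, 144))            # "training" descriptors
vocab = kmeans(descs, 20)                 # vocabulary of 20 visual words
hist = bow_histogram(rng.random((50, 144)), vocab)   # one image's histogram
print(hist.shape)                         # (20,)
```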
The classical pipeline: fixed feature extraction (SIFT, HOG), unsupervised feature learning and pooling (k-means, sparse coding), then a supervised classifier.
CNNs: end-to-end models
(slide from Yann LeCun)
(Diagram legend: Convolution, Pooling, Softmax, Other.)
ImageNet entries: SuperVision [Krizhevsky NIPS 2012] (Year 2012); GoogLeNet, VGG, MSRA (Year 2014).
NEC-UIUC (Year 2010): dense-grid descriptors (HOG, LBP); coding (local coordinate, super-vector); pooling, SPM; linear SVM.
[Lin CVPR2011] [Szegedy arxiv 2014] [Simonyan arxiv 2014] [He arxiv 2014]
Year 2010 → Year 2015:
[He arxiv 2015]
Deep Residual Learning for Image Recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
Fei-Fei Li & Andrej Karpathy, Lecture 5 - 21 Jan 2015
“Fully-connected” layers
Neurons
“2-layer Neural Net”, or “1-hidden-layer Neural Net”
“3-layer Neural Net”, or “2-hidden-layer Neural Net” “Fully-connected” layers
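A minimal forward pass through a "2-layer Neural Net" (one hidden layer of fully-connected neurons; all sizes here are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.random(3072)                         # e.g. a flattened 32x32x3 image
W1, b1 = rng.random((100, 3072)) * 0.01, np.zeros(100)   # input -> hidden
W2, b2 = rng.random((10, 100)) * 0.01, np.zeros(10)      # hidden -> scores

h = np.maximum(0, W1 @ x + b1)               # hidden layer with ReLU
scores = W2 @ h + b2                         # class scores (10 classes)
print(scores.shape)                          # (10,)
```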
Example: 200x200 image, 40K hidden units → ~2B parameters!!! Far too many, and we don't have enough training samples anyway.
Slide Credit: Marc'Aurelio Ranzato
Locally connected: 200x200 image, 40K hidden units, filter size 10x10 → 4M parameters. Note: this parameterization is good when the input image is registered (e.g., face recognition).
Stationarity? Statistics are similar at different locations.
Share the same parameters across different locations (assuming input is stationary): Convolutions with learned kernels
Locality? Nearby pixels are correlated
Learn multiple filters.
E.g.: 200x200 image, 100 filters, filter size 10x10 → 10K parameters.
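The three parameter counts above can be checked with simple arithmetic (biases ignored, numbers from the slides):

```python
# Parameter counts for the three connectivity schemes.
image = 200 * 200            # input pixels
hidden = 40_000              # hidden units
filt = 10 * 10               # a 10x10 filter

fully_connected = image * hidden        # every unit sees every pixel
locally_connected = hidden * filt       # every unit sees one 10x10 patch
convolutional = 100 * filt              # 100 shared 10x10 filters

print(fully_connected)    # 1600000000 (~2B on the slide)
print(locally_connected)  # 4000000
print(convolutional)      # 10000
```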
(C) Dhruv Batra
Let us assume the filter is an "eye" detector. Q: how can we make the detection robust to the exact location of the eye?
By “pooling” (e.g., taking max) filter responses at different locations we gain robustness to the exact spatial location of features.
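A minimal numpy sketch of max pooling over a 2-D map of filter responses (non-overlapping 2x2 windows; window size is illustrative):

```python
import numpy as np

def max_pool(responses, size=2):
    """Max-pool a 2-D response map over non-overlapping size x size
    windows: keep only the strongest response in each window."""
    h, w = responses.shape
    r = responses[:h - h % size, :w - w % size]      # trim ragged edges
    return r.reshape(h // size, size, w // size, size).max(axis=(1, 3))

resp = np.arange(16.0).reshape(4, 4)   # toy 4x4 response map
print(max_pool(resp))
# [[ 5.  7.]
#  [13. 15.]]
```

Shifting the eye by a pixel or two changes which cell inside the window fires, but usually not the pooled maximum; that is where the robustness comes from.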
Fei-Fei Li & Andrej Karpathy, Lecture 6 - 21 Jan 2015
To the neural networks practitioner, music = the loss function: you monitor it during training the way a musician listens.
(Slide credit: Y. LeCun, M.A. Ranzato)
Normalization: e.g. contrast normalization
Filter bank: matrix multiplication
Non-linearity: e.g. ReLU
Pooling: aggregation over space or feature type
A typical stage, repeated: Norm → Filter Bank → Non-Linearity → Pooling → feature; stack such stages, then a Classifier.
[From recent Yann LeCun slides]
(Diagram: a map of models from SHALLOW to DEEP and supervised to unsupervised: Perceptron, SVM, Boosting, Decision Tree, Neural Net, RNN on the supervised side; GMM, Sparse Coding, BayesNP, RBM, AE, D-AE, DBN, DBM and other probabilistic models on the unsupervised side.)
Fei-Fei Li & Andrej Karpathy, Lecture 7 - 21 Jan 2015
Fei-Fei Li & Andrej Karpathy, Lecture 8 - 2 Feb 2015
Just like a normal hidden layer, BUT: each neuron looks only at a local receptive field, and all neurons in a depth slice share weights.
The weights of this neuron visualized
Convolving the first filter over the input gives the first slice of depth in the output volume.
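A naive numpy sketch of this: sliding one filter over a single-channel input produces one 2-D depth slice (real conv layers span all input channels and apply many filters, one slice each; sizes here are illustrative):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2-D convolution of one filter over one channel:
    slide the kernel over the image, taking a dot product at
    each position. One filter -> one depth slice of the output."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.empty((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = (image[y:y+kh, x:x+kw] * kernel).sum()
    return out

img = np.random.rand(32, 32)       # single-channel toy "image"
filt = np.random.rand(5, 5)        # one 5x5 filter (weights would be learned)
slice0 = conv2d_valid(img, filt)   # first depth slice of the output volume
print(slice0.shape)                # (28, 28)
```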
Figure Credit: [Zeiler & Fergus ECCV14]
(end of the VGGNet layer-by-layer tally)
...
POOL2: [14x14x512] memory: 14*14*512 = 100K, params: 0
CONV3-512: [14x14x512] memory: 14*14*512 = 100K, params: (3*3*512)*512 = 2,359,296
CONV3-512: [14x14x512] memory: 14*14*512 = 100K, params: (3*3*512)*512 = 2,359,296
CONV3-512: [14x14x512] memory: 14*14*512 = 100K, params: (3*3*512)*512 = 2,359,296
POOL2: [7x7x512] memory: 7*7*512 = 25K, params: 0
FC: [1x1x4096] memory: 4096, params: 7*7*512*4096 = 102,760,448
FC: [1x1x4096] memory: 4096, params: 4096*4096 = 16,777,216
FC: [1x1x1000] memory: 1000, params: 4096*1000 = 4,096,000
TOTAL memory: 24M * 4 bytes ~= 93MB / image (only forward! ~*2 for bwd)
TOTAL params: 138M parameters
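The per-layer parameter counts above can be verified directly:

```python
# Sanity-check the VGGNet parameter tally.
conv3_512 = 3 * 3 * 512 * 512   # 3x3 filters over 512 channels, 512 filters
fc1 = 7 * 7 * 512 * 4096        # first FC layer, on the flattened 7x7x512 volume
fc2 = 4096 * 4096
fc3 = 4096 * 1000

print(conv3_512)        # 2359296
print(fc1)              # 102760448
print(fc2)              # 16777216
print(fc3)              # 4096000
print(fc1 + fc2 + fc3)  # 123633664: the 3 FC layers alone hold ~124M of the 138M params
```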
A CNN transforms the image to 4096 numbers that are then linearly classified.
(“CNN code” = the 4096-D vector just before the classifier.) Given a query image, we can retrieve its nearest neighbors in the “code” space.
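A sketch of the retrieval step, with random vectors standing in for real 4096-D CNN codes:

```python
import numpy as np

rng = np.random.default_rng(0)

codes = rng.random((1000, 4096))   # database of per-image "CNN codes"
query = rng.random(4096)           # code of the query image

# Nearest neighbors by Euclidean distance in code space.
dists = np.linalg.norm(codes - query, axis=1)
top5 = np.argsort(dists)[:5]       # indices of the 5 most similar images
print(top5.shape)                  # (5,)
```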
t-SNE [van der Maaten & Hinton]: embed high-dimensional points so that locally, pairwise distances are conserved, i.e. similar things end up in similar places.
Right: Example embedding of MNIST digits (0-9) in 2D
http://cs.stanford.edu/people/karpathy/cnnembed/
Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps Karen Simonyan, Andrea Vedaldi, Andrew Zisserman, 2014
Remember: Score for class c (before Softmax)
Regularization
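The class-model visualization objective (maximize the class score S_c(I) minus an L2 regularizer on the image) can be sketched as gradient ascent. Here a toy linear score w_c·I stands in for the CNN so the gradient is available in closed form; in the paper the gradient of S_c comes from backprop through the network:

```python
import numpy as np

rng = np.random.default_rng(0)
w_c = rng.standard_normal(64)      # stand-in for the class-c scorer
I = np.zeros(64)                   # start from a zero "image"
lam, lr = 0.1, 0.5                 # regularization strength, step size

for _ in range(100):
    grad = w_c - 2 * lam * I       # d/dI [ w_c.I - lam * ||I||^2 ]
    I += lr * grad                 # gradient ASCENT on the score

# For this toy objective, the optimum is I = w_c / (2 * lam).
print(np.allclose(I, w_c / (2 * lam), atol=1e-3))  # True
```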
Visualizing and Understanding Convolutional Networks Zeiler & Fergus, 2013
Visualizing arbitrary neurons along the way to the top...
Rich feature hierarchies for accurate object detection and semantic segmentation [Girshick, Donahue, Darrell, Malik]
Understanding Deep Image Representations by Inverting Them [Mahendran and Vedaldi, 2014]
reconstructions from the 1000 log probabilities for ImageNet (ILSVRC) classes
(immediately before the first Fully Connected layer)
Intriguing properties of neural networks [Szegedy et al.]
(Figure: pairs of a correctly classified image and its slightly distorted, misclassified version.)
Exploring the Representation Capabilities of the HOG Descriptor [Tatu et al., 2011]
Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images [Nguyen, Yosinski, Clune] >99.6% confidences
[What I learned from competing against a ConvNet on ImageNet] Karpathy, 2014: http://bit.ly/humanvsconvnet Try it out yourself: http://cs.stanford.edu/people/karpathy/ilsvrc/
Where is the catch??
To the rescue…
Imagenet
If the new dataset is small: freeze all weights (treat the CNN as a fixed feature extractor) and retrain only the classifier, i.e. swap the Softmax layer at the end.
If you have a bigger dataset, “finetune” instead: use the old weights as initialization and retrain a bigger portion of the network (only some of the higher layers, or even all of it).
Tip: when finetuning, use only ~1/10th of the original learning rate on the top layer, and ~1/100th on intermediate layers.
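A toy sketch of the fixed-feature-extractor option: the pretrained CNN is frozen, and only a new softmax classifier is trained on its 4096-D codes (features and labels here are synthetic stand-ins, and the sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

codes = rng.standard_normal((200, 4096))   # frozen "CNN codes" for 200 images
labels = rng.integers(0, 5, 200)           # 5 new target classes

W = np.zeros((5, 4096))                    # the ONLY weights we train
lr = 0.1
for _ in range(50):
    scores = codes @ W.T
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)
    probs[np.arange(200), labels] -= 1            # softmax-loss gradient
    W -= lr * (probs.T @ codes) / 200             # gradient step

acc = (codes @ W.T).argmax(1) == labels
print(acc.mean())   # near 1.0: the linear classifier fits the small toy set
```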
Very Deep Convolutional Networks for Large-Scale Image Recognition, Simonyan et al., 2014
OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks, Sermanet et al., 2014
Idea: train a localization net. Take the classification network, swap the Softmax loss at the end for an L2 (regression) loss over box coordinates, and fine-tune.
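A toy illustration of the swapped-in loss: the final layer now regresses box coordinates instead of class scores, trained with L2 (all numbers here are made up):

```python
import numpy as np

pred = np.array([48.0, 40.0, 120.0, 150.0])    # predicted box (x, y, w, h)
target = np.array([50.0, 42.0, 118.0, 155.0])  # ground-truth box

l2_loss = ((pred - target) ** 2).sum()   # the regression loss
grad = 2 * (pred - target)               # its gradient w.r.t. the outputs,
                                         # backpropagated just like before
print(l2_loss)                           # 37.0
```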
The model must output a set of detections; each detection has bounding box coordinates.
Rich feature hierarchies for accurate object detection and semantic segmentation [Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik]
Idea: turn a Detection problem into an Image Classification problem (but over image regions). The content of every labeled bounding box is a positive example for its class; every other bounding box in the image is a special negative class.
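A sketch of how regions can be labeled for this classification problem, using intersection-over-union against a ground-truth box (the 0.5 threshold and the boxes are illustrative, not R-CNN's exact settings):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

# High overlap with a labeled "cat" box -> positive example for "cat";
# low overlap -> the special negative class.
gt_cat = (10, 10, 50, 50)
proposals = [(12, 8, 52, 48), (100, 100, 140, 140)]
labels = ["cat" if iou(p, gt_cat) >= 0.5 else "negative" for p in proposals]
print(labels)   # ['cat', 'negative']
```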
Fully Convolutional Networks for Semantic Segmentation Long, Shelhamer, Darrell
Depth Map Prediction from a Single Image using a Multi-Scale Deep Network
[Eigen et al.], 2014
Two-Stream Convolutional Networks for Action Recognition in Videos [Simonyan et al.], 2014
Large-scale Video Classification with Convolutional Neural Networks [Karpathy et al.], 2014
Long-term Recurrent Convolutional Networks for Visual Recognition and Description [Donahue et al.], 2014
Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models [Kiros, Salakhutdinov, Zemel, 2014]
Lecture 11 - 74
Fei-Fei Li &Andrej Karpathy Lecture 8 - 2 Feb 2015
Lecture 12 - 20
Fei-Fei Li &Andrej Karpathy Lecture 8 - 2 Feb 2015
Lecture 12 - 21
Fei-Fei Li &Andrej Karpathy Lecture 8 - 2 Feb 2015
Lecture 12 - 22
Questions?? Comments?? Thoughts?? Ideas??