Learning and transferring mid-level image representations using convolutional neural networks - PowerPoint PPT Presentation



SLIDE 1

Learning and transferring mid-level image representations using convolutional neural networks

Maxime Oquab, Léon Bottou, Ivan Laptev, Josef Sivic

Willow project-team


Tuesday 5 August 2014

SLIDE 2

Image classification (easy)

Is there a car ? Source : Pascal VOC dataset

SLIDE 3

Image classification (harder)

Is there a boat ? Source : Pascal VOC dataset

SLIDE 4

Image classification (harder)

Is there a boat ? Source : Pascal VOC dataset

SLIDE 5

Image classification (v.hard)

Is there a person ? Source : Pascal VOC dataset

SLIDE 6

Image classification (v.hard)

Source : Pascal VOC dataset

SLIDE 7

Pascal VOC vs. ImageNet classification

Pascal VOC : complex scenes, 20 object classes, 10k images

ImageNet :

  • Object-centric
  • 1000 object classes, 1.2M images

SLIDE 8

Image classification

  • Traditional methods: HOG, SIFT, FV, SVMs, DPM, k-Means, GMM...

[Csurka et al.'04], [Lowe'04], [Sivic & Zisserman'03], [Perronnin et al.'10], [Lazebnik et al.'06], [Zhang et al. ’07], [Boureau et al.'10], [Singh et al.'12], [Juneja et al.'13], [Chatfield et al. ’11], [van Gemert et al. ’08], [Wang et al. ’10], [Zhou et al. ’10], [Dong et al. ’13], [Fei-Fei et al. ’05], [Shotton et al. ’05], [Moosmann et al. ’05], [Grauman & Darrell ’05], [Harzallah et al. ’09], [...]

  • Convolutional neural networks: ImageNet challenge [Krizhevsky et al. 2012]

SLIDE 9

Brief history of CNNs

  • Rosenblatt, 1957 : The perceptron : a perceiving and recognizing automaton.

  • Hubel & Wiesel 1959 : Receptive fields of single neurons in the cat’s striate cortex
  • Fukushima 1980 : Neocognitron
  • Rumelhart et al. 1986 : Learning representations by back-propagating errors
  • LeCun et al. 1989 : Backpropagation applied to handwritten zip code recognition.

  • LeCun et al. 1998 : Efficient Backprop
  • LeCun et al. 1998 : Gradient-based learning applied to document recognition
  • Hinton & Salakhutdinov, 2006 : Reducing the Dimensionality of Data with Neural Networks
  • Krizhevsky et al. 2012 : ImageNet classification with deep convolutional neural networks.

  • Zeiler & Fergus, 2013 : Visualizing and understanding neural networks
  • Sermanet et al. 2013 : OverFeat
  • Donahue et al. 2013 : Decaf
  • Girshick et al. 2014 : Rich feature hierarchies for accurate object detection and semantic segmentation
  • Razavian et al. 2014 : CNN features off-the-shelf, an astounding baseline for recognition
  • Chatfield et al. 2014 : Return of the devil in the details

SLIDE 10

Neural Networks

Differentiable operations : weights trained by gradient descent.

(Diagram: input X0 flows through layers with weights w1, w2, producing X1, X2 and a cost)
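The idea on this slide can be illustrated with a minimal sketch (plain NumPy, not the authors' code, and a toy one-layer model): because every operation is differentiable, the weights can be trained by plain gradient descent on the cost.

```python
import numpy as np

# Minimal sketch (not the authors' code): a single linear layer trained by
# gradient descent on a differentiable squared-error cost. Because every
# operation is differentiable, the cost gradient w.r.t. the weights exists.
rng = np.random.default_rng(0)
X0 = rng.normal(size=(100, 3))            # input
true_w = np.array([1.0, -2.0, 0.5])       # weights we hope to recover
y = X0 @ true_w                           # targets

w = np.zeros(3)                           # weights (parameters)
lr = 0.1                                  # learning rate
for _ in range(200):
    X1 = X0 @ w                           # forward pass
    grad = 2 * X0.T @ (X1 - y) / len(y)   # gradient of the mean squared cost
    w -= lr * grad                        # gradient descent step
# after training, w is close to true_w
```

The same loop, applied layer by layer via backpropagation, is what trains the deep networks discussed on the following slides.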

SLIDE 11

8-layer NN

[Krizhevsky et al.]

60 million parameters :

  • ImageNet (1.2M images) : OK
  • Pascal VOC (10k images) : ?

SLIDE 12

Typical car examples from ImageNet vs. car examples from Pascal VOC

Pascal VOC : a different task

SLIDE 13

Pascal VOC : a different task

Car examples from Pascal VOC vs. typical car examples from ImageNet

SLIDE 14
  • Goal : obtain a dataset that looks like ImageNet.

(Figure: small-scale and large-scale tiling of a typical Pascal VOC car example produces patches that resemble typical ImageNet car examples: a Pascal VOC image "in disguise")

Solution : multi-scale patch tiling

SLIDE 15
  • Around 500 tiles per image.
  • Multiple scales and positions.
  • Label depending on overlap.

(Example patch labels: background, car, car)


Solution : multi-scale patch tiling
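The labeling rule above can be sketched as follows. This is an illustrative version only: the overlap threshold and the exact criterion are assumptions, not the paper's values.

```python
# Illustrative patch-labeling rule (the 0.5 threshold and the min-area
# criterion are assumptions, not the paper's exact values): label a patch
# with an object class when it overlaps that object's bounding box enough,
# otherwise label it "background". Boxes are (x0, y0, x1, y1).

def area(box):
    x0, y0, x1, y1 = box
    return max(0, x1 - x0) * max(0, y1 - y0)

def intersection(a, b):
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    return area((x0, y0, x1, y1))

def label_patch(patch, objects, thresh=0.5):
    """objects: list of (class_name, box). Label by sufficient overlap."""
    for cls, box in objects:
        inter = intersection(patch, box)
        # require the intersection to cover most of the smaller region
        if inter >= thresh * min(area(patch), area(box)):
            return cls
    return "background"

objects = [("car", (50, 50, 150, 120))]
label_patch((40, 40, 160, 130), objects)   # patch around the car -> "car"
label_patch((0, 0, 40, 40), objects)       # corner patch -> "background"
```

Applied at multiple scales and positions, this yields the ~500 labeled tiles per image described above.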

SLIDE 16

First attempt

  • Train CNN on Pascal VOC patches :
  • Result : 70.9% mAP.
  • We observe overfitting.
  • State of the art : 82.2% mAP (NUS-PSL).
  • How to benefit from the power of neural networks ? We propose transfer learning.

SLIDE 17

Transfer learning

(Diagram: the ImageNet network: layers L1-L7 followed by classifier layer L8. Source task: ImageNet, with source task labels such as African elephant, Wall clock, Green snake, Yorkshire terrier.)

SLIDE 18

Transfer learning

(Diagram: source task: ImageNet, layers L1-L7 plus classifier L8, with source task labels such as African elephant, Wall clock, Green snake, Yorkshire terrier. Target task: Pascal VOC sliding patches, layers L1-L7 plus new adaptation layers La and Lb, with target task labels such as Chair, Background, Person, TV/monitor.)

SLIDE 20

Transfer learning

(Diagram: the parameters of layers L1-L7 are transferred from the ImageNet network to the target network. Source task: ImageNet, layers L1-L7 plus classifier L8, with source task labels such as African elephant, Wall clock, Green snake, Yorkshire terrier. Target task: Pascal VOC sliding patches, transferred layers L1-L7 plus new adaptation layers La and Lb, with target task labels such as Chair, Background, Person, TV/monitor.)
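The parameter-transfer scheme can be sketched in a few lines (NumPy, with made-up layer shapes and random stand-in weights; the real model is the 8-layer Krizhevsky et al. architecture): the pre-trained layers L1-L7 are copied and frozen as a feature extractor, and only the new adaptation layers La and Lb are trained on the target task.

```python
import numpy as np

# Sketch of the transfer scheme (shapes and data are invented):
# pre-trained layers L1-L7 are reused as a fixed feature extractor;
# only new adaptation layers La, Lb are trained on the target task.
rng = np.random.default_rng(0)

pretrained_W = [rng.normal(size=(10, 10)) for _ in range(7)]  # stands in for L1-L7

def features(x, Ws):
    """Frozen feature extractor: layers L1-L7 with ReLU nonlinearities."""
    for W in Ws:
        x = np.maximum(0, x @ W)
    return x

# New adaptation layers La, Lb for a 4-class target task (trained from scratch)
La = rng.normal(size=(10, 10)) * 0.1
Lb = rng.normal(size=(10, 4)) * 0.1

def target_scores(x):
    h = np.maximum(0, features(x, pretrained_W) @ La)  # La (trained)
    return h @ Lb                                      # Lb (trained)

scores = target_scores(rng.normal(size=(5, 10)))
scores.shape  # one row per patch, one column per target class
```

During fine-tuning only La and Lb receive gradient updates; the transferred layers keep the representation learned on the 1.2M ImageNet images.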

SLIDE 21

Second attempt (with pre-training)

  • After pre-training on the ILSVRC-2012 dataset, we obtain 78.7% mean AP (no pre-training : 70.9%).

  • Significantly better but can we improve more ?
  • Observe large boosts for dog and bird classes.
  • Well-represented groups in ILSVRC-2012.

(bar chart: +14 % and +18 % AP boosts)

SLIDE 22

Pre-training data

  • Inspect 22k classes of the ImageNet tree:
  • «furniture» subtree contains chairs, dining tables, sofas
  • «hoofed mammal» subtree contains sheep, horses, cows
  • ...
  • Add 512 classes to the pre-training,
  • Result improves from 78.7% to 82.8% mAP.
  • All scores increase, targeted classes improve more.

SLIDE 23
  • We extract 500 multi-scale patches.
  • Image score = sum of all patch scores.
  • Pixel score = sum of overlapping patch scores (heat maps)

Computing scores at test time

CNN person classifier
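The two aggregation rules above can be sketched directly (a toy NumPy example, not the authors' code): the image score sums all patch scores, and each pixel's heat-map value sums the scores of the patches covering it.

```python
import numpy as np

# Toy aggregation of per-patch classifier scores (not the authors' code).
# Each patch is (x0, y0, x1, y1, score) for one class, on a small image.
H, W = 8, 8
patches = [(0, 0, 4, 4, 0.2), (2, 2, 6, 6, 0.9), (4, 4, 8, 8, 0.5)]

# Image score = sum of all patch scores.
image_score = sum(s for *_, s in patches)

# Pixel score = sum of scores of the patches overlapping that pixel (heat map).
heatmap = np.zeros((H, W))
for x0, y0, x1, y1, s in patches:
    heatmap[y0:y1, x0:x1] += s

# image_score sums 0.2 + 0.9 + 0.5; pixel (3, 3) is covered by the
# first two patches, so its heat-map value sums 0.2 + 0.9.
```

With ~500 multi-scale patches per image, the heat map localizes the object even though only patch-level scores were computed.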

SLIDE 24

Qualitative results

Dining table, Potted plant, Person, Sofa, Chair, TV monitor

Source : Pascal VOC’12 test set

SLIDE 28

Visualizations (aeroplane)

First false positive

Source : Pascal VOC’12 test set

SLIDE 29

Visualizations (bicycle)

Source : Pascal VOC’12 test set

First false positive
SLIDE 30

Visualizations (bicycle)

Source : Pascal VOC’12 test set

First false positive
SLIDE 31

Visualizations (sheep)

Source : Pascal VOC’12 test set

First false positive
SLIDE 32

Visualizations (sheep)

Source : Pascal VOC’12 test set

First false positive
SLIDE 37

Quantitative results

Pascal VOC’12 object classification :

  • 1512 classes (our best) :
  • Random 1000 classes :
  • 1000 ILSVRC classes :
  • No pre-training baseline :
  • State of the art :

SLIDE 38

Different task : action classification (still images)

playing instrument jumping playing instrument running

Source : Pascal VOC’12 Action classification test set. State-of-the-art result : 70.2% mAP.

SLIDE 40

Qualitative results (reading)

SLIDE 41


Qualitative results (playing instrument)

SLIDE 42


Qualitative results (phoning)

SLIDE 43

Take-home messages

  • Transfer learning with CNNs avoids overfitting
  • See also : [Girshick et al.’14], [Sermanet et al.’13], [Donahue et al. ’13], [Zeiler & Fergus ’13], [Razavian et al. ’14], [Chatfield et al. ’14]

  • We study the effect of pre-training data :
  • More pre-training data => better
  • Related pre-training data => even better
  • Transfer to action classification.
  • http://www.di.ens.fr/willow/research/cnn/
  • Implementation (Torch7 modules) available soon
  • Includes efficient and flexible GPU training code

SLIDE 44

This work

  • Bounding box annotation is expensive.

Can we avoid it?

  • YES WE CAN !


«dog» heatmap

training bounding boxes

SLIDE 45

Follow-up work

  • Weakly supervised, no bounding boxes required
  • 82.8 => 86.3% mean AP on VOC classification
  • Appearing on arXiv soon (check our webpage)
  • http://www.di.ens.fr/willow/research/weakcnn/


«dog» heatmap

image-level labels only

SLIDE 46

Weakly supervised object recognition with convolutional neural networks

Maxime Oquab, Léon Bottou, Ivan Laptev, Josef Sivic


Willow project-team


(All following slides stolen from Josef Sivic)

SLIDE 47


Are bounding boxes needed for training CNNs?

Image-level labels: Bicycle, Person

[Oquab, Bottou, Laptev, Sivic, In submission, 2014]

SLIDE 48


Motivation: labeling bounding boxes is tedious

SLIDE 49


Motivation: image-level labels are plentiful

“Beautiful red leaves in a back street of Freiburg”

[Kuznetsova et al., ACL 2013] http://www.cs.stonybrook.edu/~pkuznetsova/imgcaption/captions1K.html

SLIDE 50


Let the algorithm localize the object in the image

[Oquab, Bottou, Laptev, Sivic, In submission, 2014]

(Figure: typical training images and their CNN score maps, cluttered vs. cropped)

Example training images with bounding boxes; the locations of objects learnt by the CNN.

NB: Related to multiple instance learning, e.g. [Viola et al.’05], and weakly supervised object localization, e.g. [Pandey and Lazebnik’11], [Prest et al.’12], …

SLIDE 51


Approach: search over object’s location

  • 1. Efficient window sliding to find object location hypotheses
  • 2. Image-level aggregation (max-pool)
  • 3. Multi-label loss function (allow multiple objects in image)

See also [Sermanet et al. ’14] and [Chatfield et al. ’14]

(Diagram: convolutional layers C1-C5, fully connected layers FC6 and FC7 (4096-dim outputs), then adaptation layers FCa and FCb; max-pooling the per-class score maps over the image gives a per-image score for each class: motorbike, person, diningtable, pottedplant, chair, car, bus, train, …)
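Steps 1-3 can be sketched with toy shapes (NumPy; the real model runs the sliding window convolutionally, and these scores and labels are invented): per-class score maps are max-pooled over the image, and an independent per-class logistic loss allows several classes to be positive in the same image.

```python
import numpy as np

# Toy version of the aggregation (shapes and values are invented):
# the network outputs a score map per class over window positions;
# max-pooling over the image keeps the best-scoring location, and a
# multi-label loss allows multiple objects per image.
rng = np.random.default_rng(0)
n_classes, h, w = 4, 5, 5
score_maps = rng.normal(size=(n_classes, h, w))   # step 1: sliding-window scores

per_image_score = score_maps.max(axis=(1, 2))     # step 2: max-pool over image

labels = np.array([1.0, 0.0, 1.0, 0.0])           # step 3: multi-label targets

# Independent per-class logistic ("one-vs-all") loss, summed over classes:
probs = 1.0 / (1.0 + np.exp(-per_image_score))
loss = -np.sum(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))
```

Because only the maximum survives the pooling, the gradient flows back to the single best-scoring window per class, which is what lets the network localize objects without box supervision.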

SLIDE 52


Approach: search over object’s location


Note : All FC-layers are now large convolutions

SLIDE 54


Search for objects using max-pooling

(Figure: max-pooling the class score map over the image)

Correct label: increase the score for this class. Incorrect label: decrease the score for this class.

SLIDE 55


Search for objects using max-pooling

What is the effect of errors?

SLIDE 56


Multi-scale training and testing

Rescale

[0.7 … 1.4]

chair diningtable sofa pottedplant person car bus train …

Figure 3: Weakly supervised training

chair diningtable person pottedplant person car bus train …

Rescale

Figure 4: Multiscale object recognition
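Multiscale recognition as in Figure 4 can be sketched like this (toy NumPy code; only the rescale range [0.7 … 1.4] comes from the slide, the classifier and image are invented stand-ins): evaluate the network on several rescaled versions of the image and keep the best score per class.

```python
import numpy as np

# Toy multi-scale testing (classifier and image are stand-ins; only the
# rescale range [0.7 ... 1.4] comes from the slide): evaluate at several
# scales and keep the maximum per-class score.
def toy_classifier(image):
    """Stand-in per-class scorer: each class prefers a different size."""
    size = image.shape[0]
    return np.array([-abs(size - 90), -abs(size - 130)])  # 2 classes

rng = np.random.default_rng(0)
image = rng.normal(size=(100, 100))
scales = [0.7, 0.85, 1.0, 1.2, 1.4]

scores = []
for s in scales:
    side = int(round(100 * s))
    # crude nearest-neighbor rescale, enough for the sketch
    idx = np.arange(side) * 100 // side
    rescaled = image[np.ix_(idx, idx)]
    scores.append(toy_classifier(rescaled))

best = np.max(scores, axis=0)   # max over scales, per class
```

Rescaling lets objects of different apparent sizes fall into the network's fixed receptive field; the per-class max over scales then picks whichever scale matched best.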

SLIDE 57


Evolution of maps during training

SLIDE 58


Results

  • Localizing objects by sliding helps
  • Full supervision does not improve over weak supervision
  • New state-of-the-art on Pascal VOC 2012 object classification

Method                      mAP  plane bike bird boat btl  bus  car  cat  chair cow  table dog  horse moto pers plant sheep sofa train tv
A. Zeiler and Fergus [40]   79.0 96.0  77.1 88.4 85.5 55.8 85.8 78.6 91.2 65.0  74.4 67.7  87.8 86.0  85.1 90.9 52.2  83.6  61.1 91.8  76.1
B. Oquab et al. [26]        82.8 94.6  82.9 88.2 84.1 60.3 89.0 84.4 90.7 72.1  86.8 69.0  92.1 93.4  88.6 96.1 64.3  86.6  62.3 91.1  79.8
C. Chatfield et al. [4]     83.2 96.8  82.5 91.5 88.1 62.1 88.3 81.9 94.8 70.3  80.2 76.2  92.9 90.3  89.3 95.2 57.4  83.6  66.4 93.5  81.9
D. Full images (ours)       78.7 95.3  77.4 85.6 83.1 49.9 86.7 77.7 87.2 67.1  79.4 73.5  85.3 90.3  85.6 92.7 47.8  81.5  63.4 91.4  74.1
E. Strong + weak (ours)     86.0 96.5  88.3 91.9 87.7 64.0 90.3 86.8 93.7 74.0  89.8 76.3  93.4 94.9  91.2 97.3 66.0  90.9  69.9 93.9  83.2
F. Weak supervision (ours)  86.3 96.7  88.8 92.0 87.4 64.7 91.1 87.4 94.4 74.9  89.2 76.3  93.7 95.2  91.1 97.6 66.2  91.2  70.0 94.5  83.7

SLIDE 59


Object localization examples in testing data

(a) Representative true positives (b) Top ranking false positives

aeroplane aeroplane aeroplane bicycle bicycle bicycle boat boat boat bird bird bird bottle bottle bottle bus bus bus

SLIDE 60


Are bounding boxes harmful?

  • Why a higher score on the dog’s head?
  • Responses are inconsistent with the annotations.
  • Maybe we are doing it wrong.

Output of the fully supervised CVPR’14 network:

SLIDE 61


Are bounding boxes harmful?

Bounding boxes are NOT alignment.

(Figure: typical training images and their CNN score maps, cluttered vs. cropped)

They should be treated as guidance, not supervision (at least for object classification).
