Deep filter banks for texture recognition and segmentation
Mircea Cimpoi, University of Oxford
Subhransu Maji, UMass Amherst
Andrea Vedaldi, University of Oxford

Texture understanding
▶ Indicator of material properties, e.g. brick vs. wooden
▶ Complementary to shape
▶ Correlated with identity, but not the same
▶ Kickstarted orderless image representations (e.g. bag of words)
[Bajcsy et al. 73, Julesz 81, Ojala et al. 96, 02, Dana et al. 99, Leung and Malik 99, Varma and Zisserman 03, 05, Caputo et al. 05, Lazebnik et al. 05, 06, Timofte and Van Gool 12, Sharma et al. 12, Sifre and Mallat 13, Sharan et al. 09, 13]
Is there a relation between texture representations and deep convolutional neural networks?
Texture representations
Filters + histogramming: a bank of (linear) filters is applied to the image x, yielding a field of local descriptors; these are vector-quantized and pooled into a histogram, giving the representation ɸ(x).
[Leung and Malik 99, 01, Schmid 01, Varma and Zisserman 02, 05]
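The classical pipeline can be sketched in a few lines of NumPy. This is an illustrative toy, not the authors' implementation: the edge filters, the random codebook, and the random image are all stand-ins (a real system would learn the codebook with k-means over training descriptors).

```python
import numpy as np

def filter_responses(image, filters):
    """Apply each k x k linear filter densely (valid correlation):
    one len(filters)-dim local descriptor per pixel location."""
    k = filters[0].shape[0]
    H, W = image.shape
    resp = np.empty((H - k + 1, W - k + 1, len(filters)))
    for n, f in enumerate(filters):
        for i in range(H - k + 1):
            for j in range(W - k + 1):
                resp[i, j, n] = np.sum(image[i:i + k, j:j + k] * f)
    return resp.reshape(-1, len(filters))        # field of local descriptors

def vq_histogram(descriptors, codebook):
    """Vector-quantize each descriptor to its nearest codeword and pool
    the assignments into a normalized histogram: the representation phi(x)."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    hist = np.bincount(d2.argmin(1), minlength=len(codebook)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(0)
image = rng.standard_normal((16, 16))
bank = [np.array([[1, 0, -1]] * 3, float),       # vertical-edge filter
        np.array([[1, 0, -1]] * 3, float).T]     # horizontal-edge filter
codebook = rng.standard_normal((8, len(bank)))   # stand-in for k-means centers
phi = vq_histogram(filter_responses(image, bank), codebook)
```

Note the histogram discards where each response occurred: the representation is orderless by construction.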
Texture representations
The filters may be non-linear: any local descriptor can play this role (SIFT, LBP, LTP, HOG, SURF, BRIEF, ORB, …).
[Geusebroek et al. 03, Lowe 99, Ojala et al. 02, Dalal and Triggs 05, Bay et al. 06, Tan and Triggs 10]
Texture representations
The histogram can likewise be replaced by any orderless pooling encoder (bag-of-words, Fisher vector, VLAD, sparse coding, …).
[Sivic and Zisserman 03, Csurka et al. 04, Perronnin and Dance 07, Perronnin et al. 10, Jegou et al. 10]
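As a concrete example of an orderless pooling encoder, a minimal (improved) Fisher vector can be written in NumPy. It assumes a diagonal-covariance GMM has already been fitted to training descriptors; the random means, sigmas, and priors below are stand-ins for those fitted parameters.

```python
import numpy as np

def fisher_vector(X, means, sigmas, priors):
    """Encode N x D local descriptors X against a K-component diagonal GMM:
    first- and second-order statistics, power- and L2-normalized."""
    K, D = means.shape
    # soft assignment (posterior) of each descriptor to each Gaussian
    z = (X[:, None, :] - means) / sigmas                      # (N, K, D)
    log_p = -0.5 * (z ** 2 + np.log(2 * np.pi * sigmas ** 2)).sum(-1) \
            + np.log(priors)
    log_p -= log_p.max(1, keepdims=True)
    q = np.exp(log_p)
    q /= q.sum(1, keepdims=True)
    N = len(X)
    u = (q[:, :, None] * z).sum(0) / (N * np.sqrt(priors)[:, None])
    v = (q[:, :, None] * (z ** 2 - 1)).sum(0) / (N * np.sqrt(2 * priors)[:, None])
    fv = np.concatenate([u.ravel(), v.ravel()])               # length 2*K*D
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                    # power normalization
    return fv / np.linalg.norm(fv)                            # L2 normalization

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))                 # 50 local descriptors, D = 4
means = rng.standard_normal((3, 4))              # K = 3 Gaussians
sigmas = rng.random((3, 4)) + 0.5
priors = np.full(3, 1 / 3)
fv = fisher_vector(X, means, sigmas, priors)
```

Unlike a hard-assignment histogram, the Fisher vector keeps the soft assignments and the residuals to each Gaussian, which is why it is higher-dimensional (2KD) and typically more discriminative.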
Texture representations vs CNNs
The handcrafted pipeline (image → non-linear filters → feature field → orderless pooling encoder → representation ɸ(x)) maps directly onto a CNN: the "convolutional" layers c1–c5 act as the non-linear filter bank producing the feature field, and the "fully-connected" (FC) layers f6–f8 act as the encoder producing ɸ(x).
[Krizhevsky et al. 12]
Mix and match
The local descriptors and the encoder can be chosen independently:
▶ Handcrafted local descriptors + orderless pooling: the standard texture representation [Sivic and Zisserman 03, Csurka et al. 04, Perronnin and Dance 07, Perronnin et al. 10, Jegou et al. 10]
▶ CNN local descriptors + CNN FC pooling: the standard application of CNNs, FC-CNN [Chatfield et al. 14, Girshick et al. 14, Gong et al. 14, Razavian et al. 14]
▶ CNN local descriptors + orderless pooling, in particular CNN descriptors pooled by a Fisher vector: FV-CNN
See also [Perronnin and Larlus 15], Poster 2B-44.
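The mix-and-match idea amounts to composing any descriptor stage with any encoder stage. The NumPy toy below uses a random-projection "conv layer" as a stand-in for CNN descriptors and mean + max pooling as a stand-in for the orderless encoder (both are illustrative assumptions, not the paper's code); the property worth testing is that an orderless encoder is invariant to permuting the descriptor field.

```python
import numpy as np

rng = np.random.default_rng(0)

def cnn_conv_descriptors(image, W_conv):
    """Toy stand-in for CNN convolutional layers: project every 3x3 patch
    with W_conv and apply a ReLU -> an (N, D) field of local descriptors."""
    H, W = image.shape
    patches = np.array([image[i:i + 3, j:j + 3].ravel()
                        for i in range(H - 2) for j in range(W - 2)])
    return np.maximum(patches @ W_conv, 0.0)

def orderless_encoder(desc):
    """Orderless pooling: mean + max over the descriptor field. A bag of
    words or Fisher vector encoder would slot in here unchanged."""
    return np.concatenate([desc.mean(0), desc.max(0)])

def representation(image, descriptors, encoder, **kw):
    """Mix and match: compose any descriptor stage with any encoder stage."""
    return encoder(descriptors(image, **kw))

image = rng.standard_normal((12, 12))
W_conv = rng.standard_normal((9, 16))
phi = representation(image, cnn_conv_descriptors, orderless_encoder, W_conv=W_conv)
```

Because the encoder discards spatial layout, shuffling the descriptors leaves ɸ(x) unchanged, which is exactly the invariance that suits texture.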
Tested modules
Baseline CNN models
▶ Typical: AlexNet [Krizhevsky et al. 12], VGG-M [Chatfield et al. 14]
▶ Deep: VGG-VD [Simonyan and Zisserman 14]
Local image descriptors
▶ Handcrafted: SIFT [Lowe 99]
▶ Learned: convolutional layers of CNNs
Pooling encoders
▶ Classical: bag of visual words [Sivic and Zisserman 03, Csurka et al. 04], Fisher vector [Perronnin and Dance 07, Perronnin et al. 10]
▶ CNN FC layers [Chatfield et al. 14, Girshick et al. 14, Gong et al. 14, Razavian et al. 14]
Findings: what pooling CNNs is good for
▶ How does FV-CNN perform compared to other descriptors?
▶ How does FV-CNN handle region recognition?
▶ What is the benefit of FV-CNN in domain transfer?
Datasets and benchmarks
▶ Object recognition (VOC07) [Everingham et al. 07]
▶ Fine-grained recognition (CUB) [Wah et al. 11]
▶ Scene recognition (MIT Indoor) [Quattoni and Torralba 09]
▶ Material recognition (FMD) [Liu et al. 10, Sharan et al. 13]
▶ Texture attribute recognition (DTD) [Cimpoi et al. 14]
▶ Things and stuff (MSRC) [Criminisi 04, Shotton et al. 06]
Which feature and encoder?
Finding 1) BoVW < FV
Finding 2) SIFT < CNN
[Bar chart, accuracy on Material (FMD): BoVW-SIFT 50.5, FV-SIFT 59.7, BoVW-CNN 67.9, FV-CNN 73.5]
CNN vs Fisher Vector pooling
Finding 3) FV pooling ≥ CNN (FC) pooling
Finding 4) Deep ≥ shallow
[Bar chart, accuracy on Material (FMD): FC-CNN (VGG-M) 70.3, FV-CNN (VGG-M) 73.5, FC-CNN (VGG-VD) 77.4, FV-CNN (VGG-VD) 79.8]
CNN vs Fisher Vector pooling
Finding 3) FV pooling ≥ CNN (FC) pooling
Finding 4) Deep ≥ shallow
[Bar chart, accuracy on Scene (MIT Indoor): FC-CNN (VGG-M) 62.5, FV-CNN (VGG-M) 74.2, FC-CNN (VGG-VD) 67.6, FV-CNN (VGG-VD) 81.0]
Breadth of applicability
Finding 5) FV + CNN applies to many diverse domains: textures (materials and attributes), objects, scenes, fine-grained categories
[Cimpoi et al. 14, Sulc and Matas 14, Sharan et al. 13, Wei and Levoy 14, Zhou et al. 14, Zhang et al. 14, Burghouts and Geusebroek 09, Sharan et al. 09, Everingham et al. 08, Quattoni and Torralba 09, Wah et al. 11]
[Bar chart comparing FC-CNN (VGG-VD), FV-CNN (VGG-VD), and the prior state of the art on CUB+R, MIT Indoor, VOC07, DTD, FMD, and ALOT; FV-CNN (VGG-VD) reaches, e.g., 81.0 on MIT Indoor, 79.8 on FMD, and 97.8 on ALOT]
Findings: what pooling CNNs is good for
▶ How does FV-CNN perform compared to other descriptors?
▶ How does FV-CNN handle region recognition?
▶ What is the benefit of FV-CNN in domain transfer?
Texture recognition in the "wild" and "clutter" (OS)
A new texture benchmark
▶ Based on the OpenSurfaces dataset [Bell et al. 13, 15]
▶ Textures in the wild (uncontrolled conditions)
▶ Textures in clutter (they do not fill the image)
First extensive evaluation of texture material/attribute recognition of this kind
[Example region labels: glass, wood, metal, paper, food]
Regions: the crop & describe approach
Crop each rectangular region Ri out of the image, warp it to the network input size, and describe each crop independently, yielding ɸ(x;Ri) — e.g. R-CNN.
Pros: straightforward & universal construction
[Chatfield et al. 14, Jia 13, Girshick et al. 14, Gong et al. 14, Razavian et al. 14]
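Crop & describe in miniature, as a hedged NumPy sketch: a nearest-neighbour warp stands in for proper resizing and a plain flattening function stands in for the full CNN (both are assumptions, not the paper's pipeline). Each region pays the full cost of `describe`, and the warp can distort elongated crops.

```python
import numpy as np

def crop_and_describe(image, boxes, describe, size=8):
    """Crop each rectangular region, warp it to a fixed size x size input
    (nearest-neighbour; may distort the crop), and describe every crop
    independently: one full network pass per region."""
    reps = []
    for (r0, r1, c0, c1) in boxes:
        crop = image[r0:r1, c0:c1]
        ri = np.arange(size) * crop.shape[0] // size   # row sampling grid
        ci = np.arange(size) * crop.shape[1] // size   # column sampling grid
        reps.append(describe(crop[np.ix_(ri, ci)]))
    return np.stack(reps)

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32))
boxes = [(0, 16, 0, 16), (4, 30, 10, 20)]         # (row0, row1, col0, col1)
describe = lambda x: x.ravel()                    # stand-in for FC-CNN features
reps = crop_and_describe(image, boxes, describe)
```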
Crop & describe limitations
▶ Can only handle rectangles
▶ May distort images
▶ Expensive (the network runs once per region)
Idea: share the local descriptors across regions instead.
Regions: the pooling encoder approach
Run the non-linear filters once over the whole image, then pool only the local descriptors that fall inside each region Ri to obtain ɸ(x;Ri).
Pros: fast, flexible, multiscale, and often more accurate
Cons: restricted to a convolutional representation
[He et al. 2014, Cimpoi et al. 2015]
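The pooling-encoder alternative can be sketched as follows: the descriptor field is computed once and each region merely selects which descriptors to pool, so arbitrary (non-rectangular) masks come for free and no filters are re-run. Mean pooling stands in for the Fisher vector encoder here; everything is an illustrative toy, not the paper's code.

```python
import numpy as np

def region_representations(conv_features, masks, encode):
    """Share the local descriptors: reshape the (H, W, C) convolutional
    feature field to H*W descriptors, then pool only the descriptors
    inside each boolean mask. Any region shape works; filters run once."""
    H, W, C = conv_features.shape
    desc = conv_features.reshape(H * W, C)
    return [encode(desc[m.ravel()]) for m in masks]

rng = np.random.default_rng(0)
feats = rng.standard_normal((10, 10, 4))          # toy conv feature field
disc = (np.add.outer((np.arange(10) - 5) ** 2,
                     (np.arange(10) - 5) ** 2) < 9)  # a non-rectangular region
masks = [disc, ~disc]
reps = region_representations(feats, masks, encode=lambda d: d.mean(0))
```

Contrast with crop & describe: here adding a region costs one pooling pass over already-computed descriptors, not a full forward pass.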
FV vs FC pooling for regions
Finding 6) FV pooling ≫ CNN (FC) pooling for small, variable regions (and faster too!)
[Bar chart, accuracy, FC pooling vs FV pooling: FMD 70.3 vs 73.5, VOC07 76.8 vs 76.4, MIT Indoor 62.5 vs 74.2, OS+R 41.3 vs 52.5, OSA+R 54.3 vs 65.2, CUB+R 56.5 vs 65.5, MSRC+R 84.0 vs 97.6]
Findings: what pooling CNNs is good for
▶ How does FV-CNN perform compared to other descriptors?
▶ How does FV-CNN handle region recognition?
▶ What is the benefit of FV-CNN in domain transfer?
Late vs early transfer
Transfer either the fully-connected or the convolutional layers of a CNN pre-trained on the source data (ImageNet):
▶ Late transfer (fully-connected CNN): keep both the convolutional layers c1–c5 and the FC layers f6–f8, and train a predictor on the target data.
▶ Early transfer (Fisher vector CNN): keep only the convolutional layers c1–c5 as a deep filter bank, attach a pooling encoder, and train a predictor on the target data.
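The two transfer routes can be sketched as two ways of splitting the same pretrained network. The toy `conv` and FC weights below are random stand-ins for pretrained layers (hypothetical, for illustration); the structural point is that late transfer reuses conv + FC, while early transfer keeps only conv and fits a fresh orderless encoder on the target domain.

```python
import numpy as np

rng = np.random.default_rng(0)
W_c = rng.standard_normal((9, 8))       # stand-in for pretrained conv layers
W_f = rng.standard_normal((8, 5))       # stand-in for pretrained FC layers

def conv(image):
    """Deep filter bank: a ReLU descriptor per 3x3 patch (toy conv layers)."""
    H, W = image.shape
    patches = np.array([image[i:i + 3, j:j + 3].ravel()
                        for i in range(H - 2) for j in range(W - 2)])
    return np.maximum(patches @ W_c, 0.0)

def late_transfer(image):
    """Late transfer (FC-CNN): reuse conv AND FC layers of the source net;
    only the final predictor is retrained on the target data.
    (Averaging before the FC layer is a simplification of this sketch.)"""
    return np.maximum(conv(image).mean(0) @ W_f, 0.0)

def early_transfer(image, encode):
    """Early transfer (FV-CNN): reuse only the conv layers; a new orderless
    encoder (a Fisher vector in the paper) is fit on the target domain."""
    return encode(conv(image))

image = rng.standard_normal((12, 12))
x_late = late_transfer(image)
x_early = early_transfer(image,
                         encode=lambda d: np.concatenate([d.mean(0), d.max(0)]))
```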
Early vs late transfer (FV-CNN)
Target: MIT Indoor (indoor scenes, e.g. library; 6.7K images). Pre-trained CNN (AlexNet) sources: ImageNet (generic objects, e.g. trilobite; 1.5M images — transfer from a dissimilar domain) and MIT Places (indoor/outdoor scenes, e.g. tennis court; 2.5M images — transfer from a similar domain) [Zhou et al. 14]; classification by a train-test SVM.
[Table, accuracy on MIT Indoor: late transfer (fully-connected CNN) 58.6% from ImageNet vs 65.0% from Places; early transfer (Fisher vector CNN) 67.6% from ImageNet vs 69.7% from Places; with VGG-VD on ImageNet, 67.6% late vs 81.0% early]
Summary
Hybrid architectures: classical feature encoders can be used effectively as CNN building blocks, or inspire new ones. FV-CNN has several benefits:
▶ Simple
▶ Excellent performance in diverse domains
▶ Works particularly well and efficiently with image regions
▶ Reduces the domain gap in transfer learning
A new benchmark for material and texture attribute recognition in clutter.
Many more experiments in the paper, IJCV version, and DPhil thesis.
Number of Gaussians

Effect of depth on CNN features
▶ Conv5 features of VGG-VD bring an extra 4%
▶ SIFT performs on par with Conv2/Conv3 features

Dimensionality reduction and descriptor size

Visualizing top FV components
Locations of CNN descriptors that correspond to the FV-CNN components most strongly associated with the texture words (bubbly, studded, wrinkled, …).