Understanding image representations by measuring their equivariance and equivalence - PowerPoint PPT Presentation



SLIDE 1

Understanding image representations by measuring their equivariance and equivalence

Karel Lenc, Andrea Vedaldi

Visual Geometry Group, Department of Engineering Science

SLIDE 2

Representations for image understanding

[Diagram: image space → feature space → semantic space. A representation 𝜚 maps images 𝒚, 𝒛 to features 𝜚(𝒚), 𝜚(𝒛); a classifier 𝜔 maps features to labels such as "bike" or "dog".]

Ultimate goal of a representation: simplify a task such as image classification.

Many representations:

• Local image descriptors: SIFT [Lowe 04], HOG [Dalal et al. 05], SURF [Bay et al. 06], LBP [Ojala et al. 02], …

• Feature encoders: BoVW [Sivic et al. 02], Fisher Vector [Perronnin et al. 07], VLAD [Jegou et al. 10], sparse coding, …

• Deep convolutional neural networks [Fukushima 1974-1982, LeCun et al. 89, Krizhevsky et al. 12, …]

SLIDE 3

Design of representations

Many designs are empirical; the main theoretical design principle is invariance.

[Diagram: a transformation ℎ maps image 𝒚 to ℎ𝒚 in image space; the representation 𝜚 maps both into feature space.]

Invariant: 𝜚(𝒚) = 𝜚(ℎ𝒚)

SLIDE 4

Design of representations

However, many representations such as HOG are not invariant, even to simple transformations.

[Diagram: the HOG features of 𝒚 and of the transformed image ℎ𝒚 differ.]

Not invariant: 𝜚(𝒚) ≠ 𝜚(ℎ𝒚)

SLIDE 5

Design of representations

But they often transform in a simple and predictable manner.

[Diagram: the HOG features of ℎ𝒚 are obtained from those of 𝒚 by a feature-space map 𝑀ℎ.]

Equivariant: ∀𝒚: 𝜚(ℎ𝒚) = 𝑀ℎ 𝜚(𝒚)
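The equivariance relation above can be made concrete with a minimal numpy sketch (not the paper's code): here the "representation" is a toy mean-pooling over a grid of cells, a stand-in for a HOG-like descriptor grid, and the image transformation ℎ is a horizontal flip. The feature-space map 𝑀ℎ then turns out to be a fixed permutation of the feature grid:

```python
import numpy as np

def rep(img, cell=2):
    """Toy 'representation': mean-pool the image over non-overlapping cells
    (a stand-in for a HOG-like grid of local descriptors)."""
    h, w = img.shape
    return img.reshape(h // cell, cell, w // cell, cell).mean(axis=(1, 3))

rng = np.random.default_rng(0)
y = rng.random((8, 8))          # toy image
hy = y[:, ::-1]                 # transformation h: horizontal flip

# Equivariance: rep(h y) equals a fixed permutation M_h of rep(y) --
# here M_h is simply a left-right flip of the feature grid.
assert np.allclose(rep(hy), rep(y)[:, ::-1])
```

For this toy pair (mean pooling, cell-aligned flip) the relation is exact; for real HOG cells the permutation also reorders orientation bins.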

SLIDE 6

Design of representations

But what happens with more complex transformations, like affine ones?

[Diagram: the HOG features of 𝒚 and of the affinely transformed ℎ𝒚; the feature-space relation is unknown.]

SLIDE 7

Design of representations

What happens with more complex representations like CNNs?

Invariance of CNN representations was studied in [Goodfellow et al. 09] and [Zeiler, Fergus 13].

Contribution: transformations in CNNs.

[Diagram: the CNN features of 𝒚 and of ℎ𝒚; the feature-space relation is unknown.]

SLIDE 8

Representation properties: Equivariance

How does a representation reflect image transformations?

SLIDE 9

When are two representations the same?

When are two representations the same?

Learning representations means that there is an endless number of them: variants obtained by learning on different datasets, or from different local optima.

Equivalence: 𝜚′(𝒚) = 𝐹(𝜚(𝒚))

[Diagram: two representations, 𝜚 from CNN-A and 𝜚′ from CNN-B.]

SLIDE 10

Representation properties: Equivariance

How does a representation reflect image transformations?

Equivalence

Do different representations have different meanings?

SLIDE 11

Finding equivariance empirically – Regularized linear regression

The map 𝑀ℎ is fitted as an affine map from the features of the original image to the features of the transformed image:

𝑀ℎ 𝜚(𝒚) = 𝐵ℎ 𝜚(𝒚) + 𝑏ℎ ≃ 𝜚(ℎ𝒚)   (𝐵ℎ, 𝑏ℎ learned empirically)

SLIDE 12

Finding equivariance empirically – Convolutional structure

𝑀ℎ is structured as a spatial permutation followed by convolution with a 1×1 filter bank:

𝐵ℎ ∗ 𝜚(𝒚) + 𝑏ℎ ≃ 𝜚(ℎ𝒚)   (learned empirically)
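As a sketch of this structure (an illustrative reconstruction, not the paper's code): on a C×H×W feature map, the permutation moves feature vectors between spatial locations, and the 1×1 filter bank is a single C×C channel mixing applied identically at every location, which an einsum expresses directly:

```python
import numpy as np

rng = np.random.default_rng(2)
C, H, W = 4, 6, 6
F = rng.standard_normal((C, H, W))   # feature map rho(y): C channels on an HxW grid

# M_h restricted as in the slide: a spatial permutation P_h of the feature
# grid, then a 1x1 filter bank B_h (CxC channel mixing) plus bias b_h.
B_h = rng.standard_normal((C, C))
b_h = rng.standard_normal(C)

def apply_M(F, B, b):
    P = F[:, :, ::-1]                # P_h: here, flip the grid left-right
    return np.einsum('oc,chw->ohw', B, P) + b[:, None, None]  # 1x1 convolution

out = apply_M(F, B_h, b_h)
# The 1x1 filter bank acts identically at every spatial location:
assert np.allclose(out[:, 0, 0], B_h @ F[:, 0, -1] + b_h)
```

Restricting 𝑀ℎ this way cuts the parameter count from (CHW)² to roughly C², which is what makes the regression tractable on CNN feature maps.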

SLIDE 13

Finding equivariance empirically – HOG features

[Figure: HOG features of images rotated by 45º, compared with the learned maps 𝑀ℎ applied to the original features.]

SLIDE 14

Finding equivariance empirically – HOG features, inverted with MIT HOGgles [Vondrick et al. 13]

[Figure: HOGgles inversions of 𝑀ℎ 𝜚(𝒚) for a 45º rotation, compared with inversions of 𝜚(ℎ𝒚).]

SLIDE 15

Finding equivariance empirically – HOG features, inverted with MIT HOGgles [Vondrick et al. 13]

[Figure: HOGgles inversions of 𝑀ℎ 𝜚(𝒚) for a 1.25x upscaling, compared with inversions of 𝜚(ℎ𝒚).]

SLIDE 16

Equivariance of representations – Findings

• Transformations: scaling, rotation, flipping, translation

• Equivariant representations: HOG

SLIDE 17

Finding equivariance empirically – CNN case

We run the same analysis on a typical CNN architecture:

• AlexNet [Krizhevsky et al. 12]

• 5 convolutional layers + fully-connected layers

• Trained on ImageNet ILSVRC

[Diagram: image 𝒚 passes through convolutional layers 1–5 and the fully-connected layers to predict the label 𝑧 (e.g. "dog"); the representation 𝜚 is followed by the classifier 𝜔.]

SLIDE 18

Learning mappings empirically – CNN case

[Diagram: the features 𝜚(𝒚) produced by the first layers of the network are fed through the remaining layers up to the FC output; the classification loss ℓ against the label 𝑧 supervises the learned mapping.]

SLIDE 19

Learning mappings empirically – CNN case

[Diagram: the transformed image ℎ𝒚 passes through the first layers to give 𝜚(ℎ𝒚); the learned map 𝑀ℎ⁻¹ converts this to 𝑀ℎ⁻¹𝜚(ℎ𝒚), which feeds the remaining layers; the classification loss ℓ against the label 𝑧 supervises 𝑀ℎ⁻¹ (learned empirically).]

SLIDE 20

Results – Vertical Flip

[Bar chart: top-5 error [%] (10–60) when the learned map 𝑀ℎ⁻¹ is inserted after layers 1–5, comparing: original classifier, no TF; original classifier + TF; before training; after training.]

SLIDE 21

Equivariance of representations – Findings

• Transformations: scaling, rotation, flipping, translation

• Equivariant representations: HOG; early convolutional layers in CNNs

• Equivariant to a lesser degree: deeper convolutional layers in CNNs

SLIDE 22

Representation properties: Equivariance

How does a representation reflect image transformations?

Equivalence

Do different representations have different meanings?

SLIDE 23

Equivalence – CNN transplantation crash course

Are 𝜚 and 𝜚′ equivalent features?

[Diagram: CNN-A (layers 1–5 + FC) computes 𝜚; CNN-B (layers 1–5 + FC) computes 𝜚′.]

AlexNet [Krizhevsky et al. 12], same training data, different parametrization.

SLIDE 24

Equivalence – CNN transplantation crash course

[Diagram: the first layers of CNN-A are joined to the remaining layers of CNN-B through a stitching layer 𝐹 (a linear convolution), trained with SGD against the classification loss ℓ and label 𝑧. Same training data, different parametrization.]
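A toy numpy sketch of the stitching idea (an illustration under simplifying assumptions, not the paper's setup): two one-layer "networks" compute features that differ only by a hidden invertible re-parametrization, and a linear stitching map 𝐹, fitted here by least squares rather than SGD, converts one network's features into the other's:

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 8, 400
Y = rng.standard_normal((n, d))      # toy "images"

# Two toy one-layer "networks" whose features differ by an unknown invertible
# re-parametrization (stand-ins for CNN-A and CNN-B up to some layer).
Wa = rng.standard_normal((d, d))
T = rng.standard_normal((d, d))      # hidden change of basis
Wb = T @ Wa                          # CNN-B computes the "same" features, re-mixed
rho_a, rho_b = Y @ Wa.T, Y @ Wb.T

# Stitching layer F: least-squares map from CNN-A features to CNN-B features
# (applied per spatial location, i.e. a 1x1 convolution in the full CNN case).
F, *_ = np.linalg.lstsq(rho_a, rho_b, rcond=None)

# After stitching, CNN-A's features can feed CNN-B's remaining layers.
assert np.allclose(rho_a @ F, rho_b)
assert np.allclose(F.T, T)           # F recovers the hidden change of basis
```

In this linear toy the stitch is exact; in the real Franken-networks the fit is only approximate, which is why the stitched error is measured before and after training 𝐹.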

SLIDE 25

Franken-network: stitch CNN-A → CNN-B

Training data is the same, but parametrization is entirely different.

[Bar chart: top-5 error [%] (10–100) for the baseline CNN-B and for the stitched network, with the stitching layer 𝐹 (𝜚 → 𝜚′) inserted after layers 1–5, before and after training 𝐹.]

SLIDE 26

Equivalence of similar architecture

Compare training on the same or different data:

• CNN-IMNET: trained on the ILSVRC12 dataset

• CNN-PLACES: trained on the Places dataset

SLIDE 27

Franken-network: stitch CNN-PLACES → CNN-IMNET

Now even the training sets differ.

[Bar chart: top-5 error [%] (10–100) for the baseline CNN-IMNET and for the stitched CNN-PLACES → CNN-IMNET network, with the stitching layer 𝐹 (𝜚 → 𝜚′) inserted after layers 1–5, before and after training 𝐹.]

SLIDE 28

Example application: structured-output pose detection

Equivariant maps let us transform features instead of images, which allows a significant speedup at test time:

ℎ∗ = argmax_{ℎ ∈ 𝐻} ⟨𝒙, 𝜚(ℎ⁻¹𝒚)⟩ = argmax_{ℎ ∈ 𝐻} ⟨𝒙, 𝑀ℎ⁻¹ 𝜚(𝒚)⟩
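A minimal numpy sketch of this speedup (toy stand-ins, not the paper's pose detector): with the mean-pooling "representation" and a flip/identity candidate set 𝐻, the slow path re-extracts features for every transformed image, while the fast path extracts 𝜚(𝒚) once and applies the feature-space maps 𝑀ℎ⁻¹. Both give identical scores, so the argmax is unchanged:

```python
import numpy as np

def rep(img, cell=2):
    """Toy representation: mean-pool over cells (as in the equivariance toy)."""
    h, w = img.shape
    return img.reshape(h // cell, cell, w // cell, cell).mean(axis=(1, 3))

rng = np.random.default_rng(4)
y = rng.random((8, 8))               # toy image
x = rng.random((4, 4))               # template scored by <x, rho(.)>

# Candidate transformations h and their feature-space counterparts M_h^-1
# (for a flip, h = h^-1 and the feature map is itself a grid flip).
H_img  = {'id': lambda im: im, 'flip': lambda im: im[:, ::-1]}
H_feat = {'id': lambda f: f,  'flip': lambda f: f[:, ::-1]}

# Slow: re-extract features for every transformed image.
slow = {h: np.sum(x * rep(g(y))) for h, g in H_img.items()}
# Fast: extract rho(y) once, then search entirely in feature space.
f = rep(y)
fast = {h: np.sum(x * M(f)) for h, M in H_feat.items()}

assert all(np.isclose(slow[h], fast[h]) for h in H_img)
assert max(fast, key=fast.get) == max(slow, key=slow.get)
```

The saving grows with the size of 𝐻: one feature extraction replaces |𝐻| of them, and each 𝑀ℎ⁻¹ is far cheaper than recomputing 𝜚.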

SLIDE 29

Conclusions

Representing geometry

• Beyond invariance: equivariance

• Transforming the image results in a simple and predictable transformation of HOG and early CNN layers

• Application to accelerated structured output regression

Representation equivalence

• CNNs trained from different random seeds are very different, but only on the surface

• Early CNN layers are interchangeable even between tasks

General idea

• Study mathematical properties of representations empirically

SLIDE 30

References

[Lowe 04] Lowe, David G. "Distinctive image features from scale-invariant keypoints." International Journal of Computer Vision 60.2 (2004): 91-110.

[Dalal et al. 05] Dalal, Navneet, and Bill Triggs. "Histograms of oriented gradients for human detection." Proc. CVPR, 2005.

[Bay et al. 06] Bay, Herbert, Tinne Tuytelaars, and Luc Van Gool. "SURF: Speeded up robust features." Proc. ECCV, 2006: 404-417.

[Ojala et al. 02] Ojala, Timo, Matti Pietikainen, and Topi Maenpaa. "Multiresolution gray-scale and rotation invariant texture classification with local binary patterns." IEEE TPAMI 24.7 (2002): 971-987.

[Sivic et al. 02] Sivic, Josef, and Andrew Zisserman. "Video Google: A text retrieval approach to object matching in videos." Proc. ICCV, 2003.

[Perronnin et al. 07] Perronnin, Florent, and Christopher Dance. "Fisher kernels on visual vocabularies for image categorization." Proc. CVPR, 2007.

[Jegou et al. 10] Jegou, H., M. Douze, C. Schmid, and P. Perez. "Aggregating local descriptors into a compact image representation." Proc. CVPR, 2010.

[LeCun et al. 98] LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.

[Krizhevsky et al. 12] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." Proc. NIPS, 2012.

[Goodfellow et al. 09] Goodfellow, Ian, et al. "Measuring invariances in deep networks." Proc. NIPS, 2009.

[Zeiler, Fergus 13] Zeiler, Matthew D., and Rob Fergus. "Visualizing and understanding convolutional networks." arXiv preprint arXiv:1311.2901 (2013).

[Vondrick et al. 13] Vondrick, C., A. Khosla, T. Malisiewicz, and A. Torralba. "HOGgles: Visualizing object detection features." Proc. ICCV, 2013.

[Simonyan et al. 14] Simonyan, K., A. Vedaldi, and A. Zisserman. "Deep inside convolutional networks: Visualising image classification models and saliency maps." ICLR Workshop, 2014.