SLIDE 1

Image Space Embeddings and Generalized Convolutional Neural Networks

Nate Strawn September 20th, 2019

Georgetown University

SLIDE 2

Table of Contents

  • 1. Introduction
  • 2. Smooth Image Space Embeddings
  • 3. Example: Dictionary Learning
  • 4. Convolutional Neural Networks
  • 5. Proofs and Conclusion

SLIDE 3

Introduction

SLIDE 4

Inspiration

“When I multiply numbers together, I see two shapes. The image starts to change and evolve, and a third shape emerges. That’s the answer. It’s mental imagery. It’s like maths without having to think.” – Daniel Tammet [6]

SLIDE 5

Idea

Idea: Embed data into spaces of “smooth” functions over graphs, thereby extending graphical processing techniques to arbitrary datasets.

Given $X = \{x_i\}_{i=1}^{N} \subset \mathbb{R}^d$, construct an embedding
$$\Phi_X : \mathbb{R}^d \ni x \longmapsto \Phi_X(x) \in \mathbb{R}^G.$$

SLIDE 6

Implications

  • With $G = I_r = \big(\{0, 1, \ldots, r-1\},\ \{(k-1, k)\}_{k=1}^{r-1}\big)$, $\Phi_X$ maps into functions over an interval
  • With $G = I_r \times I_r$, $\Phi_X$ maps into $r$ by $r$ images
  • Wavelet/Curvelet/Shearlet dictionaries for images induce dictionaries for arbitrary datasets
  • Convolutional Neural Networks can be applied to arbitrary datasets in a principled manner

SLIDE 7

Example: Kernel Image Space Embeddings of Tumor Data

Benign Tumors Malignant Tumors

SLIDE 8

Smooth Image Space Embeddings

SLIDE 9

Image Space Embeddings

We will call any isometry $\Phi : \mathbb{R}^d \to C^\infty([0,1]^2)$ or $\Phi : \mathbb{R}^d \to \mathbb{R}^r \otimes \mathbb{R}^r$ an image space embedding.

  • $C^\infty([0,1]^2)$ is identified with the space of smooth images with (incomplete) norm $\|f\|_{L^2([0,1]^2)}^2 = \int_0^1\!\int_0^1 f(x,y)^2\, dx\, dy$
  • $\mathbb{R}^r \otimes \mathbb{R}^r$ is identified with the space of $r$ by $r$ matrices, or $r$ by $r$ digital images, with norm $\|F\|_2^2 = \mathrm{trace}(F^T F)$.

SLIDE 10

Smoothness of Image Space Embeddings

We will let D denote:

  • the gradient operator on $C^1([0,1]^2)$, or
  • the graph derivative $D : \mathbb{R}^V \to \mathbb{R}^E$ for a graph $G = (V, E)$, defined by $(Df)(i,j) = f_i - f_j$ for $f \in \mathbb{R}^V$, where it is assumed that $(i,j) \in E$ implies $(j,i) \in E$, or
  • the discrete differential $D : \mathbb{R}^r \otimes \mathbb{R}^r \to (\mathbb{R}^r \otimes \mathbb{R}^{r-1}) \oplus (\mathbb{R}^{r-1} \otimes \mathbb{R}^r)$, which coincides with the graph derivative on a regular $r$ by $r$ grid.

SLIDE 11

Smoothness of Image Space Embeddings

Given a dataset $X = \{x_i\}_{i=1}^{N} \subset \mathbb{R}^d$, we measure the smoothness of an image space embedding of $X$ by the mean quadratic variation:
$$\mathrm{MQV}(X) = \frac{1}{N} \sum_{i=1}^{N} \|D(\Phi(x_i))\|^2.$$
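As a concrete illustration (not taken from the talk), a minimal NumPy sketch of the grid-graph differential and the MQV it induces; `embed` is a placeholder standing in for any image space embedding x ↦ r-by-r image:

```python
import numpy as np

def grid_derivative(F):
    """Forward differences of an r-by-r image along both axes: one value per
    edge of the regular grid graph (the discrete differential from the previous slide)."""
    return np.diff(F, axis=0), np.diff(F, axis=1)

def mean_quadratic_variation(X, embed):
    """MQV(X) = (1/N) * sum_i ||D(embed(x_i))||^2 for a dataset X of shape (N, d)."""
    total = 0.0
    for x in X:
        Dv, Dh = grid_derivative(embed(x))
        total += np.sum(Dv ** 2) + np.sum(Dh ** 2)
    return total / len(X)
```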

SLIDE 12

Optimally Smooth Image Space Embeddings

We seek the embedding which minimizes the mean quadratic variation over the dataset:
$$\min_{\Phi} \; \frac{1}{N} \sum_{i=1}^{N} \|D(\Phi(x_i))\|_2^2$$
subject to $\Phi$ being a linear isometry.

SLIDE 13

Optimally Smooth Discrete Image Space Embeddings

Theorem (S.)

Suppose $r^2 \geq d$, let $\{v_j\}_{j=1}^{d} \subset \mathbb{R}^d$ be the principal components of $X$ (ordered by descending singular values), and let $\{\xi_j\}_{j=1}^{r^2}$ (ordered by ascending eigenvalues) denote an orthonormal basis of eigenvectors of the graph Laplacian $L = D^T D$. Then
$$\Phi = \sum_{j=1}^{d} \xi_j v_j^T$$
solves the optimal mean quadratic variation embedding program.
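A NumPy sketch of this construction for the r-by-r grid graph (an illustration under my reading of the theorem; the centering convention in the PCA step is an assumption):

```python
import numpy as np

def optimal_embedding(X, r):
    """Return the (r^2 x d) isometry Phi = sum_j xi_j v_j^T of the theorem
    for the r-by-r grid graph with unnormalized Laplacian."""
    N, d = X.shape
    assert r * r >= d
    # Principal directions of X, ordered by descending singular value (rows of Vt).
    _, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    # Path-graph Laplacian, then the grid-graph Laplacian via Kronecker sums.
    Lp = 2.0 * np.eye(r) - np.eye(r, k=1) - np.eye(r, k=-1)
    Lp[0, 0] = Lp[-1, -1] = 1.0
    L = np.kron(Lp, np.eye(r)) + np.kron(np.eye(r), Lp)
    # Orthonormal Laplacian eigenvectors, ordered by ascending eigenvalue.
    _, Xi = np.linalg.eigh(L)
    return Xi[:, :d] @ Vt

# Usage: F = (optimal_embedding(X, 32) @ x).reshape(32, 32) is the image of x.
```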

SLIDE 14

Observations

  • The optimal isometry pairs highly variable components in $\mathbb{R}^d$ with low-frequency components in $L^2(G)$.
  • Concretely, $x \mapsto F$ by computing the PCA scores of $x$, arranging them in an $r$ by $r$ matrix, and applying the inverse discrete cosine transform (see the sketch after this list).
  • If the data $x_i$ are drawn i.i.d. from a Gaussian, then $\Phi$ maps this Gaussian to a Gaussian process with minimal expected quadratic variation.
  • The connection with PCA indicates that we can use Kernel PCA to produce nonlinear embeddings into image spaces as well.
SLIDE 15

Optimally Smooth Continuous Image Space Embeddings

Theorem (S.)

Let $\{v_j\}_{j=1}^{d} \subset \mathbb{R}^d$ be the principal components of $X$ (ordered by descending singular values), and let $\{k_j\}_{j=1}^{d}$ denote the first $d$ positive integer vectors (ordered by non-decreasing norm). Then
$$\Phi(x) = \sum_{j=1}^{d} (v_j^T x)\, \exp\!\big(2\pi i\, k_j^T \cdot\big)$$
solves the optimal mean quadratic variation embedding program
$$\min_{\Phi} \; \sum_{i=1}^{N} \|D\Phi(x_i)\|_{L^2_{\mathbb{C}}([0,1]^2)}^2$$
subject to $\Phi$ being a complex isometry.

SLIDE 16

Connection with Regularized PCA

Theorem (S.)

In the discrete case, the solution to the minimum quadratic variation program also provides the optimal $\Phi$ for the program
$$\min_{C, \Phi} \; \frac{1}{2}\|X - C\Phi\|_2^2 + \frac{\lambda}{2}\|C D^*\|_2^2 + \frac{\gamma}{2}\|C\|_2^2$$
subject to $\Phi$ being an isometry.

SLIDE 17

Example: Dictionary Learning

SLIDE 18

The Sparse Dictionary Learning Problem

Problem: Given a data matrix $X \in \mathbb{R}^N \otimes \mathbb{R}^d$, with $d$ large, find a linear dictionary $\Phi \in M_{k,d}$ and coefficients $C \in M_{N,k}$ such that $C\Phi \approx X$ and $C$ is sparse/compressible.

SLIDE 19

Regularized Factorization

The “relaxed” approach attempts to solve the non-convex program:
$$\min_{C, \Phi} \; \frac{1}{2}\|X - C\Phi\|_2^2 + \lambda\|C\|_1.$$

SLIDE 20

Usual Suspects

$$\min_{C, \Phi} \; \frac{1}{2}\|X - C\Phi\|_2^2 + \lambda\|C\|_1$$

  • Impose $\|\phi_i\|_2^2 = 1$ for each row $\phi_i$ of $\Phi = \begin{bmatrix} -\,\phi_1\,- \\ \vdots \\ -\,\phi_k\,- \end{bmatrix}$ to deal with the scaling ambiguity $C\Phi = (qC)\big(\tfrac{1}{q}\Phi\big)$ for $q \neq 0$.
  • The program has an analytic solution when $C$ is fixed, and is a convex optimization problem when $\Phi$ is fixed (a rough alternating sketch follows below).
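A rough alternating-minimization sketch of this program (soft-thresholded gradient steps for C, least squares plus row renormalization for Φ); this is an illustration only, not the algorithms cited on the next slide, and the step counts and λ are arbitrary choices:

```python
import numpy as np

def soft_threshold(Z, t):
    """Proximal operator of t * ||.||_1, applied elementwise."""
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)

def dictionary_learning(X, k, lam=0.1, n_outer=50, n_ista=25, seed=0):
    """Alternate on 0.5 * ||X - C @ Phi||_F^2 + lam * ||C||_1 with unit-norm rows of Phi."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    Phi = rng.standard_normal((k, d))
    Phi /= np.linalg.norm(Phi, axis=1, keepdims=True)     # unit-norm atoms (rows)
    C = np.zeros((N, k))
    for _ in range(n_outer):
        # Phi fixed: the problem in C is a LASSO; take a few ISTA steps.
        step = 1.0 / (np.linalg.norm(Phi, 2) ** 2 + 1e-12)
        for _ in range(n_ista):
            C = soft_threshold(C - step * (C @ Phi - X) @ Phi.T, step * lam)
        # C fixed: least squares for Phi, then renormalize rows (rescaling C to match).
        Phi = np.linalg.lstsq(C, X, rcond=None)[0]
        norms = np.linalg.norm(Phi, axis=1, keepdims=True)
        norms[norms == 0.0] = 1.0
        Phi /= norms
        C *= norms.T                                       # keeps C @ Phi unchanged
    return C, Phi
```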

SLIDE 21

Algorithms

  • Optimization algorithms for supervised and online learning of dictionaries: Mairal et al. [9, 8] (a scikit-learn sketch follows below)
  • Good initialization procedures can lead to provable results: Agarwal et al. [1]
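For reference, scikit-learn ships an implementation of the Mairal et al. online algorithm; a minimal sketch on stand-in data (the shapes and hyperparameters below are illustrative, not the talk's settings):

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)
X = rng.standard_normal((569, 1024))          # stand-in for an embedded dataset

dl = MiniBatchDictionaryLearning(n_components=64, alpha=1.0,
                                 batch_size=32, random_state=0)
C = dl.fit_transform(X)                       # sparse codes, one row per example
Phi = dl.components_                          # learned dictionary, one atom per row
print(C.shape, Phi.shape, np.mean(C != 0))    # sparsity level of the codes
```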

SLIDE 22

Identifiability

  • Exact and approximate dictionary learning (even for large approximation factors!) is NP-hard: Tillmann [16]
  • Probability model-based learning: Gribonval and Schnass [11], Spielman et al. [14]
  • If the dictionary is incoherent and the coefficients are sufficiently sparse, then the original dictionary is a local minimum: Geng and Wright [5], Schnass [12]
  • A full spark matrix is also identifiable given sufficient measurements: Garfinkle and Hillar [4]

SLIDE 23

Caveats

  • Many possible local solutions
  • Interpretability?
  • Large systems require a large amount of

computation!

SLIDE 24

Tight Frame Dictionaries

Recall that $\{\psi_a\}_{a \in A} \subset L^2(\mathbb{R}^2)$ is a frame if there are constants $0 < A \leq B$ such that
$$A\|f\|^2 \leq \sum_{a \in A} |\langle f, \psi_a \rangle|^2 \leq B\|f\|^2$$
for all $f \in L^2(\mathbb{R}^2)$, where $\langle \cdot, \cdot \rangle$ and $\|\cdot\|$ are the inner product and induced norm on $L^2(\mathbb{R}^2)$, respectively. If $A = B$, we say that the frame is tight.

SLIDE 25

Examples of Tight Frames

  • Tensor product wavelet systems
  • Curvelets
  • Shearlets

Fact: If $\{\psi_a\}_{a \in A} \subset L^2(\mathbb{R}^2)$ is a tight frame, and $\Phi : \mathbb{R}^d \to L^2(\mathbb{R}^2)$ is an isometry, then $\{\Phi^* \psi_a\}_{a \in A}$ is a tight frame for $\mathbb{R}^d$.
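A quick numerical check of this fact in a finite-dimensional stand-in (R^m in place of L²(R²)); the frame used here (a union of two orthonormal bases) and the sizes are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 30, 1024

Phi = np.linalg.qr(rng.standard_normal((m, d)))[0]   # isometry: Phi.T @ Phi = I_d
Psi = np.vstack([np.eye(m), np.eye(m)])              # rows form a tight frame with bound A = 2
pulled_back = Psi @ Phi                              # rows are Phi^* psi_a, vectors in R^d

# Tightness of the pullback <=> sum_a (Phi^* psi_a)(Phi^* psi_a)^T = A * I_d.
S = pulled_back.T @ pulled_back
print(np.allclose(S, 2.0 * np.eye(d)))               # expected: True
```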

SLIDE 26

Example: Wisconsin Breast Cancer Dataset

  • 569 examples in $\mathbb{R}^{30}$ describing characteristics of cells obtained from biopsy [15]
  • each example is either benign or malignant
  • preprocess by removing medians and rescaling by the interquartile range in each variable
  • image space embedding uses r = 32 (images are 32 by 32); a preprocessing sketch follows below
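A preprocessing sketch with scikit-learn; whether its bundled copy of the data matches the exact version used in the talk is an assumption:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import RobustScaler

data = load_breast_cancer()
X, y = data.data, data.target            # X has shape (569, 30); y in {0, 1}

# Remove medians and rescale by the interquartile range, variable by variable.
X = RobustScaler().fit_transform(X)

r = 32                                   # embed into 32-by-32 images (r**2 >= 30)
print(X.shape, r * r)
```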

SLIDE 27

Minimal Mean Quadratic Variation Behavior

PCA Scores vs. eigenvalues of graph Laplacian vs. product


Normalized MMQV ≈ 38

SLIDE 28

Raw Embeddings of Benign and Malignant Examples

Image Space Embeddings of Benign Tumor Data Image Space Embeddings of Malignant Tumor Data

SLIDE 29

LASSO in the Haar Wavelet Induced Dictionary

Using the 2D Haar wavelet transform $W$, we solve
$$\min_{C} \; \frac{1}{2}\|X - C W \Phi\|_2^2 + \lambda\|C\|_1$$
where $\Phi$ is the image space embedding matrix.

Using the BCW dataset, the average MSE is $3.4 \times 10^{-3}$ when $\lambda = 1$.
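A rough end-to-end sketch with PyWavelets and scikit-learn; the stand-in X and Φ, the exact orientation of W (synthesis acting on the left vs. the right), and the λ-to-alpha conversion are my assumptions rather than the talk's setup:

```python
import numpy as np
import pywt
from sklearn.linear_model import Lasso

def haar_synthesis_matrix(r):
    """Columns are inverse 2D Haar transforms of unit coefficient arrays, so that
    W @ c is a vectorized r-by-r image synthesized from coefficients c."""
    _, slices = pywt.coeffs_to_array(pywt.wavedec2(np.zeros((r, r)), 'haar',
                                                   mode='periodization'))
    cols = []
    for j in range(r * r):
        c = np.zeros(r * r); c[j] = 1.0
        coeffs = pywt.array_to_coeffs(c.reshape(r, r), slices, output_format='wavedec2')
        cols.append(pywt.waverec2(coeffs, 'haar', mode='periodization').ravel())
    return np.column_stack(cols)

# Stand-ins: X is the preprocessed (N x d) data, Phi an (r^2 x d) matrix with
# orthonormal columns (the image space embedding); both are illustrative here.
rng = np.random.default_rng(0)
N, d, r, lam = 20, 30, 32, 1.0
X = rng.standard_normal((N, d))
Phi = np.linalg.qr(rng.standard_normal((r * r, d)))[0]

W = haar_synthesis_matrix(r)
A = Phi.T @ W                      # d x r^2: maps Haar coefficients to data space
# sklearn's Lasso minimizes (1/(2n)) * ||y - A w||^2 + alpha * ||w||_1, so
# alpha = lam / d matches 0.5 * ||x - A c||^2 + lam * ||c||_1 per example.
lasso = Lasso(alpha=lam / d, fit_intercept=False, max_iter=10000)
C = np.vstack([lasso.fit(A, x).coef_ for x in X])
print(C.shape, np.mean((C @ A.T - X) ** 2))   # sparse coefficients and average MSE
```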

SLIDE 30

Haar Wavelet Coefficients after LASSO

SLIDE 31

Inverse DWT of Haar Coefficients

SLIDE 32

Compression in PCA Basis and Induced Dictionary

Consider best k-term approximations of the first 50 members of the BCW dataset using different dictionaries. Compression in the dictionary induced by the Haar wavelet system uses orthogonal matching pursuit (see the sketch below).

(Figure) First and second image: relative SSE for k-term approximations using the PCA basis and the Haar-induced dictionary. Third image: first image minus the second image.
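A sketch of the k-term approximation step using scikit-learn's OMP; the dictionary A and example x below are random stand-ins (in the experiment they would come from the Haar-induced dictionary and the BCW data):

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
d, n_atoms, k = 30, 1024, 10
A = rng.standard_normal((d, n_atoms))            # stand-in dictionary (atoms as columns)
x = rng.standard_normal(d)                       # stand-in example

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k, fit_intercept=False)
c = omp.fit(A, x).coef_
relative_sse = np.sum((A @ c - x) ** 2) / np.sum(x ** 2)
print(np.count_nonzero(c), relative_sse)         # support size and relative SSE
```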

SLIDE 33

Comparison with Dictionary Learning


Dictionary learning clearly does better!

SLIDE 34

Convolutional Neural Networks

SLIDE 35

Convolutional Neural Networks for Arbitrary Datasets

People already do this in insane ways!

SLIDE 36

Convolutional Neural Networks for Arbitrary Datasets

  • Exploit image structure to better deal with image collections [7]
  • Cutting edge results for image classification tasks

SLIDE 37

Lost in Translation Invariance

  • Classification tasks for natural images benefit from translation invariance of class labels
  • Bruna and Mallat [2]
  • Sokolić, Giryes, Sapiro, and Rodrigues [13]
  • Almost all image space embeddings of datasets lack this property
  • Luckily, translation invariance isn't the whole story
  • “Where” features are activated by a convolutional filter may be decisive
  • Braille
  • Water and Waffle

SLIDE 38

More Parameters, More Problems

Weight sharing is comparable to regularizing the problem

  • Weak evidence via better upper bounds for generalization error [18]
  • Precise combinatorial bounds for overfitting? [17]

SLIDE 39

Experimental Setup

  1. Dataset is the image space embedded BCW data
  2. For each bootstrap random train/test partition of the data, train and test:
     • Logistic regression
     • Single hidden layer CNN with softmax activation
     • Single hidden layer NN with softmax activation (same number of units as the CNN)
  3. Experiments carried out by Alex Wang of the University of Maryland on an AWS EC2 GPU instance using TensorFlow (a Keras sketch of the CNN follows below)
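A minimal Keras stand-in for the single-hidden-layer CNN (the filter count, kernel size, optimizer, and epoch budget are guesses, not the settings used in the experiments):

```python
import tensorflow as tf

def make_cnn(r=32, n_classes=2):
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(r, r, 1)),                   # image space embedded example
        tf.keras.layers.Conv2D(8, 3, activation='relu'),   # single convolutional hidden layer
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(n_classes, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

# Usage, with X_img of shape (N, 32, 32, 1) and labels y in {0, 1}:
# make_cnn().fit(X_img, y, epochs=50, validation_split=0.2)
```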

SLIDE 40

Boxplot Comparison of LR, NN, and CNN

Median behavior of CNN is better, but outliers are a problem

SLIDE 41

Dominance of CNN

The CNN generally dominates, but requires more iterations and can sometimes land in bad local minima.

SLIDE 42

Proofs and Conclusion

SLIDE 43

Proof for Discrete Case

  1. Minimizing MQV is equivalent to minimizing
     $$\|D\Phi X^T\|^2 = \mathrm{trace}\big(X\Phi^T D^T D\Phi X^T\big) = \mathrm{trace}\big(L\Phi X^T X\Phi^T\big),$$
     where $L$ is the graph Laplacian.
  2. Diagonalization $L = \Xi \Lambda \Xi^T$ reduces this to $\mathrm{trace}\big(\Lambda\, \tilde{\Phi} X^T X \tilde{\Phi}^T\big)$ with $\tilde{\Phi} = \Xi^T \Phi$, which is the inner product of $\mathrm{diag}(\Lambda)$ with $\mathrm{diag}(\tilde{\Phi} X^T X \tilde{\Phi}^T)$.
  3. By Schur-Horn, $\alpha = \mathrm{diag}(\tilde{\Phi} X^T X \tilde{\Phi}^T)$ for some isometry $\tilde{\Phi}$ if and only if $\alpha$ is majorized by the eigenvalues of $X X^T$.
  4. This reduces the program to a linear program over the polytope generated by permuting the eigenvalues of $X^T X$, and the rearrangement inequality tells us that the minimum is obtained by pairing the eigenvalues of $L$ and $X^T X$ in reverse order, multiplying, and summing.
  5. The continuous case is morally similar, but requires some more care.
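A small numerical sanity check of the discrete argument (the sizes and the random comparison isometry are arbitrary): the theorem's pairing should never do worse than a random isometry on the trace objective from step 1.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, N = 6, 5, 200
X = rng.standard_normal((N, d)) * np.array([5.0, 3.0, 2.0, 1.0, 0.5, 0.1])

# Grid-graph Laplacian L (via Kronecker sums of the path-graph Laplacian).
Lp = 2.0 * np.eye(r) - np.eye(r, k=1) - np.eye(r, k=-1)
Lp[0, 0] = Lp[-1, -1] = 1.0
L = np.kron(Lp, np.eye(r)) + np.kron(np.eye(r), Lp)

def objective(Phi):
    """trace(L Phi X^T X Phi^T), the quantity from step 1 (up to the 1/N factor)."""
    return np.trace(L @ Phi @ X.T @ X @ Phi.T)

_, Xi = np.linalg.eigh(L)                              # ascending eigenvalues
_, _, Vt = np.linalg.svd(X, full_matrices=False)       # descending singular values
Phi_star = Xi[:, :d] @ Vt                              # the theorem's isometry

Q = np.linalg.qr(rng.standard_normal((r * r, d)))[0]   # a random isometry for comparison
print(objective(Phi_star) <= objective(Q))             # expected: True
```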

SLIDE 44

Conclusion and Future Directions

  • Interesting tool for EDA
  • Experiments and theory for dictionary learning
  • Exploration of overfitting theory for CNNs
  • Experiments for more UCI datasets
  • Minimal total variation embeddings and exploitation of approximation rates (Donoho [3]; Needell and Ward [10])

SLIDE 45

Questions?

SLIDE 46

References I

[1] Alekh Agarwal, Animashree Anandkumar, Prateek Jain, Praneeth Netrapalli, and Rashish Tandon. Learning sparsely used overcomplete dictionaries. In Conference on Learning Theory, pages 123–137, 2014.

[2] Joan Bruna and Stéphane Mallat. Invariant scattering convolution networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1872–1886, 2013.

[3] David L. Donoho. High-dimensional data analysis: The curses and blessings of dimensionality. AMS Math Challenges Lecture, 2000.

[4] Charles J. Garfinkle and Christopher J. Hillar. Robust identifiability in sparse dictionary learning. arXiv preprint arXiv:1606.06997, 2016.

SLIDE 47

References II

[5] Quan Geng and John Wright. On the local correctness of ℓ1-minimization for dictionary learning. In 2014 IEEE International Symposium on Information Theory (ISIT), pages 3180–3184. IEEE, 2014.

[6] Richard Johnson. A genius explains. The Guardian, 2005.

[7] Yann LeCun, Yoshua Bengio, et al. Convolutional networks for images, speech, and time series. The Handbook of Brain Theory and Neural Networks, 3361(10), 1995.

[8] Julien Mairal, Francis Bach, Jean Ponce, and Guillermo Sapiro. Online dictionary learning for sparse coding. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 689–696. ACM, 2009.

[9] Julien Mairal, Jean Ponce, Guillermo Sapiro, Andrew Zisserman, and Francis R. Bach. Supervised dictionary learning. In Advances in Neural Information Processing Systems, pages 1033–1040, 2009.

SLIDE 48

References III

[10] Deanna Needell and Rachel Ward. Stable image reconstruction using total variation minimization. SIAM Journal on Imaging Sciences, 6(2):1035–1058, 2013.

[11] Rémi Gribonval and Karin Schnass. Dictionary identification - sparse matrix-factorization via ℓ1-minimization. IEEE Transactions on Information Theory, 56(7):3523–3539, 2010.

[12] Karin Schnass. Local identification of overcomplete dictionaries. Journal of Machine Learning Research, 16:1211–1242, 2015.

[13] Jure Sokolić, Raja Giryes, Guillermo Sapiro, and Miguel R. D. Rodrigues. Generalization error of invariant classifiers. arXiv preprint arXiv:1610.04574, 2016.

[14] Daniel A. Spielman, Huan Wang, and John Wright. Exact recovery of sparsely-used dictionaries. In Conference on Learning Theory, pages 37.1–37.18, 2012.

SLIDE 49

References IV

[15] W. Nick Street, William H. Wolberg, and Olvi L. Mangasarian. Nuclear feature extraction for breast tumor diagnosis. 1992.

[16] Andreas M. Tillmann. On the computational intractability of exact and approximate dictionary learning. IEEE Signal Processing Letters, 22(1):45–49, 2015.

[17] K. V. Vorontsov. Combinatorial probability and the tightness of generalization bounds. Pattern Recognition and Image Analysis, 18(2):243–259, 2008.

[18] Yuchen Zhang, Percy Liang, and Martin J. Wainwright. Convexified convolutional neural networks. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 4044–4053. JMLR.org, 2017.
