

SLIDE 1

Deep Visual Learning on Hypersphere

Weiyang Liu*, Zhen Liu*
College of Computing, Georgia Institute of Technology

SLIDE 2
Outline

  • Why Learning on Hypersphere
  • Loss Design - Large-Margin Learning on Hypersphere
  • Convolution Operator - Deep Hyperspherical Learning and Decoupled Networks
  • Weight Regularization - Minimum Hyperspherical Energy for Regularizing Neural Networks
  • Conclusion

SLIDE 3
Outline

  • Why Learning on Hypersphere
  • Loss Design - Large-Margin Learning on Hypersphere
  • Convolution Operator - Deep Hyperspherical Learning and Decoupled Networks
  • Weight Regularization - Minimum Hyperspherical Energy for Regularizing Neural Networks
  • Conclusion

SLIDE 4

Why Learning on Hypersphere

  • An empirical observation
  • Set the output feature dimension to 2 in a CNN
  • Visualize the features directly, without using t-SNE

Deep features are naturally distributed over a sphere!

SLIDE 5

Why Learning on Hypersphere

  • Euclidean distance is not suitable for high-dimensional data.

More specifically: in high-dimensional space, vectors tend to be nearly orthogonal to each other, so the cross term in the Euclidean distance vanishes and the distance reduces to the two norms alone, which carries little discriminative information.
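The reduction the bullet refers to can be made explicit with the standard identity (the slide's original equation is missing, so this is a reconstruction):

```latex
\|x - y\|^2 = \|x\|^2 + \|y\|^2 - 2\langle x, y\rangle
\;\approx\; \|x\|^2 + \|y\|^2
\quad\text{when}\quad \langle x, y\rangle \approx 0
```

That is, the distance becomes dominated by the two magnitudes and no longer reflects the angle, which is where the semantic information lives.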

SLIDE 6

Why Learning on Hypersphere

  • Learning features on the hypersphere effectively regularizes the feature space.

In deep metric learning, features have to be normalized before entering the loss function.

Schroff et al. FaceNet: A Unified Embedding for Face Recognition and Clustering, CVPR 2015
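For concreteness, a minimal PyTorch illustration of this normalization step (the tensor `feats` stands in for raw CNN embeddings; the batch and feature sizes are arbitrary placeholders):

```python
import torch
import torch.nn.functional as F

feats = torch.randn(32, 128)            # raw CNN embeddings (placeholder)
emb = F.normalize(feats, p=2, dim=1)    # project each vector onto the unit hypersphere
assert torch.allclose(emb.norm(dim=1), torch.ones(32), atol=1e-6)
```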

SLIDE 7
Outline

  • Why Learning on Hypersphere
  • Loss Design - Large-Margin Learning on Hypersphere
  • Convolution Operator - Deep Hyperspherical Learning and Decoupled Networks
  • Weight Regularization - Minimum Hyperspherical Energy for Regularizing Neural Networks
  • Conclusion

SLIDE 8

Large-Margin Learning on Hypersphere

  • A standard CNN usually uses the softmax loss as its learning objective.

How can we incorporate a margin on the hypersphere?
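For reference, the standard softmax (cross-entropy) loss the slide builds on, written for feature x_i with label y_i (a textbook form, not copied from the slide's missing equation):

```latex
L_{\text{softmax}} = -\log \frac{e^{W_{y_i}^{\top} x_i + b_{y_i}}}{\sum_{j} e^{W_{j}^{\top} x_i + b_j}}
```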

SLIDE 9

Large-Margin Learning on Hypersphere

  • The intuition (from binary classification)

If x belongs to class 1, the original softmax only requires the class-1 score to exceed the class-2 score. To make classification more rigorous and produce a decision margin, we require class 1 to win even after its angle is multiplied by a margin factor m.
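A hedged reconstruction of the slide's missing formulas, in the L-Softmax notation (biases omitted; θ_i is the angle between x and classifier weight W_i; m is an integer margin, with m = 1 recovering softmax):

```latex
\text{Softmax: } \|W_1\|\,\|x\|\cos\theta_1 > \|W_2\|\,\|x\|\cos\theta_2
\qquad
\text{Margin: } \|W_1\|\,\|x\|\cos(m\theta_1) > \|W_2\|\,\|x\|\cos\theta_2
```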

SLIDE 10

Large-Margin Learning on Hypersphere

Original Softmax Loss → (imposing a large margin) → Large-Margin Softmax Loss → (normalizing the classifier weights) → Angular Softmax Loss
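Written out, the progression looks as follows (a hedged reconstruction; biases omitted, θ_j the angle between feature x and weight W_j; the papers actually use a piecewise-monotonic extension ψ(θ) of cos(mθ), omitted here):

```latex
L_{\text{softmax}} = -\log\frac{e^{\|W_y\|\|x\|\cos\theta_y}}{\sum_j e^{\|W_j\|\|x\|\cos\theta_j}}
\qquad
L_{\text{L-Softmax}} = -\log\frac{e^{\|W_y\|\|x\|\cos(m\theta_y)}}{e^{\|W_y\|\|x\|\cos(m\theta_y)} + \sum_{j\neq y} e^{\|W_j\|\|x\|\cos\theta_j}}
```

Normalizing the classifier weights (‖W_j‖ = 1) then yields the angular softmax (A-Softmax) loss:

```latex
L_{\text{A-Softmax}} = -\log\frac{e^{\|x\|\cos(m\theta_y)}}{e^{\|x\|\cos(m\theta_y)} + \sum_{j\neq y} e^{\|x\|\cos\theta_j}}
```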

SLIDE 11

Learned Feature Visualization

  • 2D Feature Visualization on MNIST
  • 3D Feature Visualization on CASIA Face Dataset

Feature distributions shown for margins m = 1, 2, 3, 4.

SLIDE 12

Experimental Results

  • Face verification on the LFW and YTF datasets

SphereFace uses the angular large-margin softmax loss and achieves state-of-the-art performance with only 0.5M training images.

SLIDE 13

Experimental Results

  • Million-scale face recognition: the MegaFace Challenge

SphereFace ranked No. 1 from December 2016 to April 2017, and the current No. 1 entry is also built on SphereFace.

SLIDE 14

Demo

SLIDE 15
Outline

  • Why Learning on Hypersphere
  • Loss Design - Large-Margin Learning on Hypersphere
  • Convolution Operator - Deep Hyperspherical Learning and Decoupled Networks
  • Weight Regularization - Minimum Hyperspherical Energy for Regularizing Neural Networks
  • Conclusion

SLIDE 16

SphereNet

  • Traditional Convolution
  • Hyperspherical Convolution (SphereConv)

SphereConv normalizes each local patch of the feature map and each weight vector, so the output depends only on the angle between them (a sketch follows).
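A minimal PyTorch sketch of the cosine variant of SphereConv; the paper defines several angular variants, so treat this as an illustration of the normalization idea rather than the exact implementation:

```python
import torch
import torch.nn.functional as F

def sphere_conv(x, weight, eps=1e-8):
    """Cosine SphereConv sketch: the response is the cosine of the angle
    between each local patch and each filter, so both magnitudes are
    discarded. x: (N, C, H, W) feature map; weight: (K, C, kh, kw)."""
    K, C, kh, kw = weight.shape
    # im2col: every kh x kw patch becomes a column of length C*kh*kw.
    patches = F.unfold(x, kernel_size=(kh, kw))           # (N, C*kh*kw, L)
    # Project patches and filters onto the unit hypersphere.
    patches = F.normalize(patches, dim=1, eps=eps)
    w = F.normalize(weight.reshape(K, -1), dim=1, eps=eps)
    out = torch.einsum('kd,ndl->nkl', w, patches)         # cos(theta) in [-1, 1]
    N, _, H, W = x.shape
    return out.reshape(N, K, H - kh + 1, W - kw + 1)

# Usage: y = sphere_conv(torch.randn(2, 3, 8, 8), torch.randn(16, 3, 3, 3))
```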

SLIDE 17

SphereNet - Intuition from Fourier Transform

  • Semantic information is largely preserved when the Fourier magnitude is corrupted, but not when the phase (the angular information) is corrupted.
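This classic observation is easy to reproduce; a small PyTorch sketch (my illustration, not from the slides) that discards the magnitude entirely and keeps only the phase:

```python
import torch

def phase_only(img):
    """Keep the Fourier phase, force the magnitude to 1: the
    reconstruction still preserves the image structure, whereas a
    magnitude-only reconstruction does not."""
    spec = torch.fft.fft2(img)
    unit_mag = torch.exp(1j * torch.angle(spec))  # unit magnitude, original phase
    return torch.fft.ifft2(unit_mag).real

# Usage: out = phase_only(torch.rand(64, 64))
```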

SLIDE 18

Decoupled Convolution

Observation: the final feature is naturally decoupled, where the magnitude represents the intra-class variation.

SLIDE 19

General Framework - Decoupled Convolution

  • Decoupling the angle and magnitude of feature vectors
  • Allowing different designs of convolution operators for different tasks

Decoupled convolution splits the response into a magnitude term (intra-class variation) and an angle term (semantic difference).
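The diagram's decomposition can be written compactly (notation mine; h is the magnitude function and g the angular activation):

```latex
f_d(w, x) \;=\; h\big(\|w\|, \|x\|\big)\cdot g\big(\theta_{(w,x)}\big)
```

Standard convolution is the special case h(‖w‖, ‖x‖) = ‖w‖‖x‖ with g(θ) = cos θ, since w·x = ‖w‖‖x‖cos θ.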

SLIDE 20
Example Choices of Magnitude

  • SphereConv
  • BallConv
  • TanhConv
  • LinearConv
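One parameterization of these magnitude functions h(‖x‖), following my reading of the Decoupled Networks paper (α is a scale and ρ an operating radius; treat the exact constants as assumptions):

```latex
\text{SphereConv: } h = \alpha \qquad
\text{BallConv: } h = \alpha\,\frac{\min(\|x\|, \rho)}{\rho} \qquad
\text{TanhConv: } h = \alpha\rho\tanh\frac{\|x\|}{\rho} \qquad
\text{LinearConv: } h = \alpha\|x\|
```

SphereConv, BallConv, and TanhConv are bounded in ‖x‖, which matters for the optimization and robustness results later in the talk.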

SLIDE 21
Example Choices of Angle

  • Linear
  • Cosine
  • Squared Cosine
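And the corresponding angular activations g(θ), again as a hedged reconstruction from the Decoupled Networks paper:

```latex
\text{Linear: } g(\theta) = -\frac{2\theta}{\pi} + 1 \qquad
\text{Cosine: } g(\theta) = \cos\theta \qquad
\text{Squared cosine: } g(\theta) = \operatorname{sign}(\cos\theta)\cos^{2}\theta
```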

SLIDE 22

Generalization

With SphereConv, the top-1 accuracy of CNNs on ImageNet can be improved by ~1%.

Top-1 accuracy (center crop) of the baseline and SphereNet on ImageNet:

            Plain-CNN-9   Plain-CNN-12   ResNet-27
Baseline       58.31          61.42         65.54
SphereNet      59.23          62.27         66.49

* Differences from the original NeurIPS paper: 1) in ResNet, we use a fully connected layer instead of average pooling to obtain the final feature, which we found to be crucial for SphereNet; 2) we add L2 weight decay, which slows down optimization but yields better performance.

SLIDE 23
Adversarial Robustness and Optimization

* Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, Aleksander Mądry. How Does Batch Normalization Help Optimization? NeurIPS 2018.

SLIDE 24
Optimization Without BatchNorm

  • Without BatchNorm, decoupled convolutions outperform the baseline.
  • The bounded TanhConv can still be optimized, while the unbounded operators fail to converge.

Accuracies of different convolution operators on Plain-CNN-9 without BatchNorm. N/C indicates "not converged".

SLIDE 25

Adversarial Robustness

Bounded convolution operators are more robust against both the fast gradient sign method (FGSM) attack and its multi-step variant.

Results under natural training and adversarial training.

SLIDE 26

Adversarial Robustness

Attacking a decoupled convolution with bounded magnitude requires a perturbation of larger norm.

L2 and L∞ norms needed to attack the models on test-set samples.

SLIDE 27
Outline

  • Why Learning on Hypersphere
  • Loss Design - Large-Margin Learning on Hypersphere
  • Convolution Operator - Deep Hyperspherical Learning and Decoupled Networks
  • Weight Regularization - Minimum Hyperspherical Energy for Regularizing Neural Networks
  • Conclusion

SLIDE 28

Minimum Hyperspherical Energy

Intuition: more diversity among neurons → less redundancy → better generalization. Paper [1] shows that, in a one-hidden-layer network, maximizing diversity can eliminate spurious local minima. If two weight vectors in one layer are close to each other, they are probably redundant.

28

[1] Bo Xie, Yingyu Liang, and Le Song. Diverse Neural Network Learns True Target Functions. arXiv preprint arXiv:1611.03131, 2016.
SLIDE 29

Minimum Hyperspherical Energy

Proposed regularization: add a repulsion force between every pair of weight vectors within a layer. This connects to the Thomson problem: finding the minimum-energy configuration of mutually repelling electrons on a sphere.

SLIDE 30

Minimum Hyperspherical Energy

Loss function: the hyperspherical energy below is added as a regularizer. Minimizing it is generally non-trivial; for s = 2 the problem is actually NP-hard.
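A reconstruction of the hyperspherical energy from the MHE paper, in my notation (ŵ_i = w_i / ‖w_i‖ are the unit-normalized neurons of one layer; f_s is the Riesz kernel):

```latex
E_s(\hat{w}_1, \dots, \hat{w}_N)
= \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \neq i}}^{N} f_s\big(\|\hat{w}_i - \hat{w}_j\|\big),
\qquad
f_s(z) =
\begin{cases}
  z^{-s}, & s > 0,\\
  \log z^{-1}, & s = 0.
\end{cases}
```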

SLIDE 31

Minimum Hyperspherical Energy

Although the orthonormality loss looks similar, it does not yield an ideal weight configuration even in the 3D case.

SLIDE 32

Minimum Hyperspherical Energy

The MHE loss is compatible with weight decay:

  • MHE regularizes the angles between weights
  • Weight decay regularizes the magnitudes of weights (a sketch combining the two follows)
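A minimal PyTorch-style sketch of how the two compose; the Riesz-energy form follows the paper, but the eps handling, the normalization constant, and the lambda hyperparameters are my assumptions:

```python
import torch

def mhe_loss(W, s=2, eps=1e-6):
    """MHE sketch: Riesz s-energy between the unit-normalized weight
    vectors of one layer (W: (N, D), one neuron per row). The energy
    blows up when two neurons point in nearly the same direction."""
    Wn = W / (W.norm(dim=1, keepdim=True) + eps)   # project onto the unit sphere
    dist = torch.cdist(Wn, Wn)                     # pairwise Euclidean distances
    n = W.shape[0]
    off_diag = ~torch.eye(n, dtype=torch.bool, device=W.device)
    return (dist[off_diag] + eps).pow(-s).sum() / (n * (n - 1))

# Hypothetical total objective: MHE shapes the angles, weight decay the magnitudes.
# loss = task_loss + lam_mhe * mhe_loss(fc.weight) + lam_wd * fc.weight.pow(2).sum()
```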

SLIDE 33

Minimum Hyperspherical Energy

Collinearity issue: in this toy example, optimizing the original MHE produces collinear weight vectors. Half-space MHE: optimize the pairwise angles between lines (instead of vectors).
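One way to realize the line-based objective, assuming the virtual-neuron construction (repel each neuron from the negated copies too, so that w and -w count as the same line); `mhe_loss` is the sketch from the previous slide:

```python
import torch

def half_space_mhe_loss(W, s=2):
    # If w_i is antipodal to w_j, it nearly coincides with -w_j, so the
    # appended negated copies make collinear configurations expensive.
    return mhe_loss(torch.cat([W, -W], dim=0), s=s)
```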

SLIDE 34

MHE - Ablation Study

MHE on a 9-layer plain CNN on the CIFAR-10/100 datasets.

SLIDE 35
MHE - Ablation Study

  • MHE consistently improves the performance of networks.
  • When a network is hard to optimize due to neuron redundancy (small width or large depth), MHE helps more.

MHE with different network depths on CIFAR-100.

SLIDE 36
MHE - Ablation Study

  • MHE consistently improves the performance of networks.
  • When a network is hard to optimize due to neuron redundancy (small width or large depth), MHE helps more.

MHE with different network widths on CIFAR-100.

SLIDE 37

MHE Application - Image Recognition

MHE improves the performance of networks on ImageNet.

Top-1 error (center crop) of models on ImageNet.

SLIDE 38

MHE Application - Face Recognition

We add the MHE loss to the angular softmax loss of SphereFace and call the resulting model SphereFace+. The two terms are synergistic:

  • Angular softmax loss - intra-class compactness
  • MHE loss - inter-class separability

SLIDE 39

MHE Application - Face Recognition

Comparison with state-of-the-art results, and between SphereFace and SphereFace+.

SLIDE 40

MHE Application - Class-Imbalanced Recognition

Applying MHE to the final classifier enforces the prior that all categories are equally important, and thus improves performance.

Results on class-imbalanced recognition on CIFAR-10.

* Single - reduce the number of samples in only one category by 90%. Multiple - reduce the number of samples in multiple categories with different weights. Details are given in the paper.

SLIDE 41

MHE Application - Class Imbalanced Recognition

The category with less data tends to be ignored.

Visualization of the final CNN features.

SLIDE 42

MHE Application - GAN

With MHE added to the discriminator, the Inception Score of the spectral GAN improves from 7.42 to 7.68.

SLIDE 43
Outline

  • Why Learning on Hypersphere
  • Loss Design - Large-Margin Learning on Hypersphere
  • Convolution Operator - Deep Hyperspherical Learning and Decoupled Networks
  • Weight Regularization - Minimum Hyperspherical Energy for Regularizing Neural Networks
  • Conclusion

SLIDE 44
Conclusion

  • We introduce a hyperspherical learning framework for deep visual learning, where all neurons and classifiers are learned over a hypersphere.
  • Large-margin learning on the hypersphere is highly beneficial for tasks such as biometric verification and person re-identification, where features are expected to have large inter-class variation.
  • Hyperspherical networks and decoupled networks are the natural generalization obtained by applying hyperspherical learning to every layer of the network.
  • Minimum hyperspherical energy is a generic regularization that diversifies the neurons on a hypersphere and can improve generalization.

SLIDE 45

Source Code

SphereFace: https://github.com/wy1iu/sphereface
SphereNet: https://github.com/wy1iu/SphereNet
DCNet: https://github.com/wy1iu/DCNets
MHE: https://github.com/wy1iu/MHE
SphereFace+: https://github.com/wy1iu/sphereface-plus

SLIDE 46

Appendix

Architectures for the SphereNet ImageNet experiments:

Plain-CNN-9: 7x7 conv - maxpool - 3*(3x3 conv, 64) - 3*(3x3 conv, 128) - 3*(3x3 conv, 256) - fc(512) - classifier
Plain-CNN-12: 7x7 conv - maxpool - 3*(3x3 conv, 64) - 3*(3x3 conv, 128) - 3*(3x3 conv, 256) - 3*(3x3 conv, 512) - fc(512) - classifier
ResNet-27: 7x7 conv - maxpool - 3*(3x3 ResBlock, 64) - 3*(3x3 ResBlock, 128) - 3*(3x3 ResBlock, 256) - 3*(3x3 ResBlock, 512) - fc(512) - classifier

SLIDE 47

Appendix

Architecture for the MHE ablation study on CIFAR-10/100.
