Learning from Fine-Grained and Long-Tailed Visual Data
Yin Cui, Google Research
Dec 11, 2019
Visual Recognition System
Convolutional Neural Network (CNN)
“bird”
Supervised Learning
Database
Convolutional Neural Network (CNN)
“Northern cardinal”
Supervised Learning
Larger Database (more images, more classes)
Visual Recognition System
Convolutional Neural Network (CNN)
“Northern cardinal”
Supervised Learning
Even Larger Database
Visual Recognition System
Convolutional Neural Network (CNN)
“Northern cardinal”
Supervised Learning
Even Larger Database
Problems occur...
- Long-tailed
○ Majority of categories are rare
- Hard to get labels
○ Labeling effort grows dramatically per image.
○ Labels require human expertise.
In reality...
Convolutional Neural Network (CNN)
“bird”
Supervised Learning
Medium-sized Database
Convolutional Neural Network (CNN)
“Northern cardinal”
Supervised Learning
Transfer Learning
Large-Scale Dataset
But luckily, we have transfer learning
Medium-sized Database
A diverse array of data sources
Search Engine Social Network Communities
Can we build a generic, one-size-fits-all pre-trained model for transfer learning?
Large-scale pre-training
- D. Mahajan et al. Exploring the Limits of Weakly Supervised Pretraining. ECCV 2018.
- C. Sun et al. Revisiting Unreasonable Effectiveness of Data in Deep Learning Era. ICCV 2017.
Generic vs. Specialized Model
- ImageNet pre-training vs. iNaturalist pre-training
- iNaturalist 2017 contains 859k images from 5000+ natural categories.
- Fine-tuned on 7 medium-sized datasets.
            CUB-200  Stanford Dogs  Flowers-102  Stanford Cars  Aircraft  Food-101  NA-Birds
ImageNet     82.84       84.19         96.26          91.31       85.49     88.65     82.01
iNat         89.26       78.46         97.64          88.31       82.61     88.80     87.91
Generic vs. Specialized Model
- ImageNet pre-training vs. iNaturalist pre-training
- iNaturalist 2017 contains 859k images from 5000+ natural categories.
- Fine-tuned on 7 medium-sized datasets.
- Combining ImageNet + iNat: more data doesn't always help.
                  CUB-200  Stanford Dogs  Flowers-102  Stanford Cars  Aircraft  Food-101  NA-Birds
ImageNet           82.84       84.19         96.26          91.31       85.49     88.65     82.01
iNat               89.26       78.46         97.64          88.31       82.61     88.80     87.91
ImageNet + iNat    85.84       82.36         97.07          91.38       85.21     88.45     83.98
Model Capacity is not a problem
- Combined training achieves similar performance on each source dataset.
- The model is able to learn both datasets well, but does not transfer well.
○ There is a trade-off between quantity and quality in transfer learning.
○ Pre-training a more specialized model could help.
Domain similarity via Earth Mover’s Distance
- Red: source domain. Green: target domain.
Cui et al. Large Scale Fine-Grained Categorization and Domain-Specific Transfer Learning. CVPR 2018.
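The domain-similarity measure can be sketched as follows: each domain is summarized by per-class prototype features (e.g. mean feature vectors) with weights proportional to image counts, the Earth Mover's Distance between the two weighted sets is solved as a small transportation LP, and similarity is exp(−γ · EMD). This is a minimal sketch, not the released code; the function names and the γ default are illustrative, and SciPy's HiGHS LP solver stands in for a dedicated EMD routine.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.spatial.distance import cdist

def emd(src_feats, src_weights, tgt_feats, tgt_weights):
    """Earth Mover's Distance between two weighted sets of class prototypes,
    solved as a transportation LP (weights are normalized to sum to 1)."""
    src_w = np.array(src_weights, dtype=float)
    tgt_w = np.array(tgt_weights, dtype=float)
    src_w /= src_w.sum()
    tgt_w /= tgt_w.sum()
    cost = cdist(src_feats, tgt_feats)  # pairwise ground distances
    m, n = cost.shape
    # Flow variables f_ij, flattened row-major; marginal equality constraints.
    A_eq = []
    for i in range(m):                  # row sums equal source weights
        row = np.zeros(m * n)
        row[i * n:(i + 1) * n] = 1.0
        A_eq.append(row)
    for j in range(n):                  # column sums equal target weights
        col = np.zeros(m * n)
        col[j::n] = 1.0
        A_eq.append(col)
    b_eq = np.concatenate([src_w, tgt_w])
    res = linprog(cost.ravel(), A_eq=np.array(A_eq), b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.fun                      # total flow is 1, so this is the EMD

def domain_similarity(src_feats, src_w, tgt_feats, tgt_w, gamma=0.01):
    """Similarity = exp(-gamma * EMD); gamma is a tunable scale."""
    return np.exp(-gamma * emd(src_feats, src_w, tgt_feats, tgt_w))
```

Identical domains give EMD 0 and similarity 1; the more the weighted prototypes diverge, the lower the similarity.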
Source domain selection
- Greedy selection strategy: sort and include most similar source classes.
○ Simple and no guarantee on the optimality, but works well in practice.
Cui et al. Large Scale Fine-Grained Categorization and Domain-Specific Transfer Learning. CVPR 2018.
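A hypothetical simplification of the greedy strategy: rank each source class by the distance from its prototype to the nearest target prototype, then keep the k most similar. The paper scores classes by their contribution to the EMD-based similarity; nearest-prototype distance is a stand-in here, and the function name is illustrative.

```python
import numpy as np
from scipy.spatial.distance import cdist

def select_source_classes(src_feats, tgt_feats, k):
    """Greedy proxy for source-domain selection: sort source classes by
    distance to their nearest target prototype and keep the k closest."""
    d = cdist(src_feats, tgt_feats).min(axis=1)  # nearest-target distance
    order = np.argsort(d)                        # most similar first
    return order[:k]
```

As the slide notes, this kind of greedy ranking has no optimality guarantee, but it is simple and works well in practice.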
Improved Transfer Learning
- Comparable to the better of ImageNet and iNat pre-training, using only a subset of 585 source classes.
                   CUB-200  Stanford Dogs  Flowers-102  Stanford Cars  Aircraft  Food-101  NA-Birds
ImageNet            82.84       84.19         96.26          91.31       85.49     88.65     82.01
iNat                89.26       78.46         97.64          88.31       82.61     88.80     87.91
ImageNet + iNat     85.84       82.36         97.07          91.38       85.21     88.45     83.98
Ours (585-class)    88.76       85.23         97.37          90.58       86.13     88.37     87.89
Transfer Learning via Fine-tuning
- Transfer learning performance can be estimated by domain similarity.
Discussion
- In the AutoML setting:
○ We need a model that performs well on a small, usually domain-specific, dataset.
○ We have access to large datasets and pre-trained models.
○ The problem cannot be solved by pre-training on a single large source domain.
- Architectural search is one solution.
- Another solution could be from the perspective of source domain selection:
○ A model zoo with models trained on different datasets.
○ Select a source domain / pre-trained model based on domain similarity.
Dealing with long-tailed data distribution
The World is Long-Tailed
- A large number of classes are rare in nature.
- Cannot easily scale the data collection for those classes in the long tail.
Cui et al. Class-Balanced Loss Based on Effective Number of Samples. CVPR 2019.
Overview
- Effective Number of Samples: E_n = (1 − β^n) / (1 − β), where n is the number of samples and β = (N − 1)/N.
- Class-Balanced Loss: CB(p, y) = (1 − β)/(1 − β^{n_y}) · L(p, y), where n_y is the number of samples of class y.
The more data, the better, but...
- As the number of samples increases, the marginal benefit a model can extract
from the data diminishes.
image courtesy: https://me.me/i/ate-too-much-regrets-nothing-5869266
Data Sampling as Random Covering
- To measure data overlap, we associate each sample with a small region of unit volume instead of a single point. The volume of the set of all possible data for a class is assumed to be N.
Theoretical Results
- E_n = (1 − β^n) / (1 − β), where β = (N − 1)/N.
- Proof sketch: by induction, E_1 = 1 and E_n = 1 + β · E_{n−1}, since a new sample falls inside the already-covered region with probability E_{n−1}/N.
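The closed form E_n = (1 − β^n)/(1 − β), with β = (N − 1)/N, can be sanity-checked against the recurrence E_1 = 1, E_n = 1 + β · E_{n−1} implied by the random-covering argument. A minimal sketch:

```python
def effective_number(n, beta):
    """Closed form: E_n = (1 - beta**n) / (1 - beta)."""
    return (1.0 - beta ** n) / (1.0 - beta)

def effective_number_recursive(n, beta):
    """Recurrence from the random-covering argument:
    E_1 = 1, E_n = 1 + beta * E_{n-1}."""
    e = 1.0
    for _ in range(n - 1):
        e = 1.0 + beta * e
    return e
```

Note the two limits: as β → 0 (N = 1), E_n → 1; as n grows with β = (N − 1)/N fixed, E_n saturates at N, which is why reweighting by 1/E_n interpolates between no reweighting and inverse-frequency reweighting.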
Class-Balanced Loss
- Class-Balanced Softmax Cross-Entropy Loss: −(1 − β)/(1 − β^{n_y}) · log( exp(z_y) / Σ_j exp(z_j) )
- Class-Balanced Sigmoid Cross-Entropy Loss: −(1 − β)/(1 − β^{n_y}) · Σ_i log σ(z_i^t), where z_i^t = z_i if i = y, else −z_i
- Class-Balanced Focal Loss: −(1 − β)/(1 − β^{n_y}) · Σ_i (1 − p_i^t)^γ log p_i^t, where p_i^t = σ(z_i^t)
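A NumPy sketch of the class-balanced softmax cross-entropy. Function names are illustrative; the per-class weights are normalized to sum to the number of classes (a common convention, used so the overall loss scale stays comparable to the unweighted loss).

```python
import numpy as np

def class_balanced_weights(samples_per_class, beta):
    """Per-class weights (1 - beta) / (1 - beta**n_y), normalized so
    they sum to the number of classes."""
    n = np.asarray(samples_per_class, dtype=float)
    w = (1.0 - beta) / (1.0 - beta ** n)
    return w / w.sum() * len(n)

def cb_softmax_cross_entropy(logits, labels, samples_per_class, beta=0.999):
    """Standard softmax cross-entropy, reweighted per example by the
    inverse effective number of samples of its class."""
    z = logits - logits.max(axis=1, keepdims=True)       # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    w = class_balanced_weights(samples_per_class, beta)
    return -(w[labels] * log_probs[np.arange(len(labels)), labels]).mean()
```

With β → 0 every class weight collapses to 1 and the standard loss is recovered; with β close to 1 the weights approach inverse class frequency, matching the interpolation described in the paper.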
Datasets
Classification Error Rate of ResNet-32 on CIFAR
- Original losses vs. the best class-balanced variant of each.
- SM: softmax; SGM: sigmoid.
Analysis
Classification Error Rate on ImageNet and iNat
ResNet-50 Training Curves on ImageNet and iNat
ResNet-50 Training Curves on iNat and ImageNet
Discussion
- The concept of effective number of samples for long-tailed data distribution.
- A theoretical framework to quantify effective number of samples.
○ Model each example as a small region instead of a point.
- Class-balanced loss.
- Improved performance on 3 commonly used loss functions.
- Non-parametric: we make no assumption about the distribution of the data.
- Code available at: https://github.com/richardaecn/class-balanced-loss
- Two follow-up works:
○ K. Cao et al. Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss. NeurIPS 2019.
○ B. Kang et al. Decoupling Representation and Classifier for Long-Tailed Recognition. https://arxiv.org/abs/1910.09217.