SLIDE 1

Learning from Fine-Grained and Long-Tailed Visual Data

Yin Cui
Google Research, Dec 11, 2019

SLIDE 2

Visual Recognition System

Database → Supervised Learning → Convolutional Neural Network (CNN) → “bird”

SLIDE 3

Visual Recognition System

Larger Database (more images, more classes) → Supervised Learning → Convolutional Neural Network (CNN) → “Northern cardinal”

SLIDE 4

Visual Recognition System

Even Larger Database → Supervised Learning → Convolutional Neural Network (CNN) → “Northern cardinal”

SLIDE 5

Visual Recognition System

Even Larger Database → Supervised Learning → Convolutional Neural Network (CNN) → “Northern cardinal”

Problems occur...

  • Long-tailed

○ The majority of categories are rare.

  • Hard to get labels

○ Labeling effort grows dramatically per image.
○ Human expertise is required.

SLIDE 6

In reality...

Medium-sized Database → Supervised Learning → Convolutional Neural Network (CNN) → “bird”

SLIDE 7

But luckily, we have transfer learning

Large-Scale Dataset → Supervised Learning (pre-training)
Medium-sized Database → Transfer Learning (fine-tuning) → Convolutional Neural Network (CNN) → “Northern cardinal”

SLIDE 8

A diverse array of data sources

  • Search Engine
  • Social Network
  • Communities

SLIDE 9

Can we build a generic, one-size-fits-all pre-trained model for transfer learning?

SLIDE 10

Large-scale pre-training

  • D. Mahajan et al. Exploring the Limits of Weakly Supervised Pretraining. ECCV 2018.
  • C. Sun et al. Revisiting Unreasonable Effectiveness of Data in Deep Learning Era. ICCV 2017.

SLIDE 11

Generic vs. Specialized Model

  • ImageNet pre-training vs. iNaturalist pre-training
  • iNaturalist 2017 contains 859k images from 5000+ natural categories.
  • Fine-tuned on 7 medium-sized datasets.

Fine-tuning accuracy (%):

             CUB-200   Stanford Dogs   Flowers-102   Stanford Cars   Aircraft   Food-101   NA-Birds
ImageNet       82.84           84.19         96.26           91.31      85.49      88.65      82.01
iNat           89.26           78.46         97.64           88.31      82.61      88.80      87.91

SLIDE 12

Generic vs. Specialized Model

  • ImageNet pre-training vs. iNaturalist pre-training
  • iNaturalist 2017 contains 859k images from 5000+ natural categories.
  • Fine-tuned on 7 medium-sized datasets.
  • Combining ImageNet + iNat: more data doesn’t always help.

Fine-tuning accuracy (%):

                  CUB-200   Stanford Dogs   Flowers-102   Stanford Cars   Aircraft   Food-101   NA-Birds
ImageNet            82.84           84.19         96.26           91.31      85.49      88.65      82.01
iNat                89.26           78.46         97.64           88.31      82.61      88.80      87.91
ImageNet + iNat     85.84           82.36         97.07           91.38      85.21      88.45      83.98

SLIDE 13

Model Capacity Is Not the Problem

  • Combined training achieves similar performance on each dataset.
  • The model is able to learn well on both datasets, but it cannot transfer well.

○ There is a trade-off between quantity and quality in transfer learning.
○ Pre-training a more specialized model could help.

SLIDE 14

Domain similarity via Earth Mover’s Distance

  • Red: source domain. Green: target domain.

Cui et al. Large Scale Fine-Grained Categorization and Domain-Specific Transfer Learning. CVPR 2018.
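To make the computation concrete, below is a minimal sketch of EMD-based domain similarity, assuming per-class mean features (“prototypes”) and per-class image counts have already been extracted; it uses the POT library, and the function name and the scale parameter gamma are illustrative choices, not prescribed by the slide.

```python
# Sketch: domain similarity via Earth Mover's Distance (after Cui et al., CVPR 2018).
# Assumes precomputed per-class prototypes (mean CNN features) and image counts.
# Requires the POT library (pip install pot). gamma is an illustrative scale.
import numpy as np
import ot  # Python Optimal Transport

def domain_similarity(src_protos, src_counts, tgt_protos, tgt_counts, gamma=0.01):
    """src_protos: (m, d) mean feature per source class; src_counts: (m,) image
    counts; tgt_protos: (k, d) and tgt_counts: (k,) likewise for the target."""
    a = src_counts / src_counts.sum()                        # class weights, sum to 1
    b = tgt_counts / tgt_counts.sum()
    M = ot.dist(src_protos, tgt_protos, metric='euclidean')  # ground distances
    emd = ot.emd2(a, b, M)                                   # optimal transport cost
    return np.exp(-gamma * emd)                              # higher = more similar
```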

SLIDE 15

Source domain selection

  • Greedy selection strategy: sort source classes by domain similarity and include the most similar ones (sketched below).

○ Simple, with no optimality guarantee, but it works well in practice.

Cui et al. Large Scale Fine-Grained Categorization and Domain-Specific Transfer Learning. CVPR 2018.
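A sketch of the greedy idea, reusing the prototype representation above; the per-class score here (prototype distance to the target, weighted by target class frequency) is an illustrative stand-in for the paper’s EMD-based ranking.

```python
# Sketch: greedy source-class selection. Rank source classes by similarity to
# the target domain and keep the k most similar ones (e.g., the 585-class
# subset on a later slide). The scoring rule is an illustrative simplification.
import numpy as np

def select_source_classes(src_protos, tgt_protos, tgt_counts, k=585):
    w = tgt_counts / tgt_counts.sum()                 # target class weights
    # (m, k') distances between every source/target prototype pair
    d = np.linalg.norm(src_protos[:, None, :] - tgt_protos[None, :, :], axis=-1)
    scores = d @ w                                    # lower = more similar
    return np.argsort(scores)[:k]                     # indices of the top-k classes
```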

SLIDE 16

Improved Transfer Learning

  • Comparable to the best of ImageNet and iNat, using only a subset of 585 classes.

Fine-tuning accuracy (%):

                  CUB-200   Stanford Dogs   Flowers-102   Stanford Cars   Aircraft   Food-101   NA-Birds
ImageNet            82.84           84.19         96.26           91.31      85.49      88.65      82.01
iNat                89.26           78.46         97.64           88.31      82.61      88.80      87.91
ImageNet + iNat     85.84           82.36         97.07           91.38      85.21      88.45      83.98
Ours (585-class)    88.76           85.23         97.37           90.58      86.13      88.37      87.89

SLIDE 17

Transfer Learning via Fine-tuning

  • Transfer learning performance can be estimated by domain similarity.

SLIDE 18

SLIDE 19

Discussion

  • In the AutoML setting:

○ We need a model that performs well on a small, usually domain-specific, dataset.
○ We have access to large datasets and pre-trained models.
○ The problem cannot be solved by pre-training on a single large source domain.

  • Architecture search is one solution.
  • Another solution could come from the perspective of source domain selection:

○ A model zoo with models trained on different datasets.
○ Select a source domain / pre-trained model based on domain similarity.

SLIDE 20

Dealing with long-tailed data distribution

SLIDE 21

The World is Long-Tailed

  • A large number of classes are rare in nature.
  • Cannot easily scale the data collection for those classes in the long tail.

Cui et al. Class-Balanced Loss Based on Effective Number of Samples. CVPR 2019.

SLIDE 22

Overview

Effective Number of Samples: $E_n = \frac{1 - \beta^{n}}{1 - \beta}$, where $n$ is the number of samples and $\beta \in [0, 1)$ is a hyperparameter.

Class-Balanced Loss: $\mathrm{CB}(\mathbf{p}, y) = \frac{1 - \beta}{1 - \beta^{n_y}} \, \mathcal{L}(\mathbf{p}, y)$, where $n_y$ is the number of samples in the ground-truth class $y$.

SLIDE 23

The more data, the better, but...

  • As the number of samples increases, the marginal benefit a model can extract from the data diminishes.

image courtesy: https://me.me/i/ate-too-much-regrets-nothing-5869266

SLIDE 24

Data Sampling as Random Covering

  • To measure data overlap, we associate each sample with a small region of unit volume instead of a point, and assume the total volume of all possible data is N.
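As a sanity check on this covering model, here is a toy Monte Carlo in which discrete slots stand in for the unit-volume regions (an illustrative simplification of the continuous setting):

```python
# Toy Monte Carlo: draw n samples, each covering one of N unit-volume "slots",
# and count the distinct slots covered. The mean matches the closed form
# (1 - beta^n) / (1 - beta) with beta = (N - 1) / N from the next slide.
import numpy as np

rng = np.random.default_rng(0)
N, n, trials = 100, 50, 10_000
covered = np.mean([len(set(rng.integers(0, N, size=n).tolist()))
                   for _ in range(trials)])
beta = (N - 1) / N
print(covered)                        # ~39.5 empirically
print((1 - beta ** n) / (1 - beta))   # 39.499... in closed form
```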

SLIDE 25

Theoretical Results
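Per the cited paper (Cui et al., CVPR 2019): with total data volume $N$ and $\beta = (N-1)/N$, the expected covered volume after $n$ samples, i.e., the effective number of samples, has a closed form.

```latex
% Base case: E_1 = 1. A new sample overlaps previously covered data with
% probability p = E_{n-1}/N, so
%   E_n = p E_{n-1} + (1 - p)(E_{n-1} + 1) = 1 + \beta E_{n-1},
% and unrolling the recursion gives the closed form:
\[
  E_n \;=\; \frac{1 - \beta^{\,n}}{1 - \beta},
  \qquad \beta = \frac{N-1}{N} \in [0, 1).
\]
% Sanity checks: E_n -> n as N -> infinity (no overlap), and E_n -> 1 as N -> 1.
```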

SLIDE 26

Class-Balanced Loss

  • Class-Balanced Softmax Cross-Entropy Loss: $-\frac{1-\beta}{1-\beta^{n_y}} \log \frac{\exp(z_y)}{\sum_{j=1}^{C} \exp(z_j)}$
  • Class-Balanced Sigmoid Cross-Entropy Loss: $-\frac{1-\beta}{1-\beta^{n_y}} \sum_{i=1}^{C} \log \sigma(z_i^{t})$
  • Class-Balanced Focal Loss: $-\frac{1-\beta}{1-\beta^{n_y}} \sum_{i=1}^{C} (1 - p_i^{t})^{\gamma} \log p_i^{t}$

(Here $z$ are logits, $z_i^{t} = z_i$ if $i = y$ and $-z_i$ otherwise, $p_i^{t} = \sigma(z_i^{t})$, and $n_y$ is the sample count of the ground-truth class $y$.)
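A minimal PyTorch sketch of the class-balanced weighting applied to softmax cross-entropy (the official implementation lives at the repository linked on the final Discussion slide; the helper name is mine, and beta = 0.9999 is just one value from the paper’s grid):

```python
# Sketch: class-balanced softmax cross-entropy. Each class y is weighted by
# (1 - beta) / (1 - beta^{n_y}) -- the inverse effective number of samples --
# normalized so the weights sum to the number of classes C.
import torch
import torch.nn.functional as F

def class_balanced_cross_entropy(logits, labels, samples_per_class, beta=0.9999):
    """logits: (B, C); labels: (B,) int64; samples_per_class: (C,) counts n_y."""
    effective_num = 1.0 - beta ** samples_per_class.float()  # 1 - beta^n per class
    weights = (1.0 - beta) / effective_num                   # inverse effective number
    weights = weights / weights.sum() * weights.numel()      # normalize to sum to C
    return F.cross_entropy(logits, labels, weight=weights)

# Usage: loss = class_balanced_cross_entropy(model(x), y, class_counts)
```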
SLIDE 28

Datasets

SLIDE 29

Classification Error Rate of ResNet-32 on CIFAR

  • Original losses vs. the best class-balanced loss.
  • SM: Softmax; SGM: Sigmoid.
SLIDE 30

Analysis

SLIDE 31

Classification Error Rate on ImageNet and iNat

SLIDE 32

Classification Error Rate on ImageNet and iNat

SLIDE 33

ResNet-50 Training Curves on ImageNet and iNat

SLIDE 34

ResNet-50 Training Curves on iNat and ImageNet

SLIDE 35

Discussion

  • The concept of effective number of samples for long-tailed data distributions.
  • A theoretical framework to quantify the effective number of samples.

○ Model each example as a small region instead of a point.

  • Class-balanced loss.
  • Improved performance with 3 commonly used loss functions.
  • Non-parametric: we make no assumption on the data distribution.
  • Code available at: https://github.com/richardaecn/class-balanced-loss
  • Two follow-up works:

○ K. Cao et al. Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss. NeurIPS 2019.
○ B. Kang et al. Decoupling Representation and Classifier for Long-Tailed Recognition. https://arxiv.org/abs/1910.09217

SLIDE 36

Thanks!