Co-Representation Network for Generalized Zero-Shot Learning



SLIDE 1

Co-Representation Network for Generalized Zero-Shot Learning

Fei Zhang, Guangming Shi XIDIAN UNIVERSITY

ICML 2019

SLIDE 2

Introduction

➢ Classic Deep CNN
➢ Transfer Learning

  • Few-Shot Learning
  • One-Shot Learning
  • Zero-Shot Learning (ZSL): Conventional ZSL (CZSL) and Generalized ZSL (GZSL)

Data requirements decrease down this list.

[Figure: source space (seen classes), target space (unseen classes), and semantic space (attributes such as Legs and Fur, word2vecs).]

SLIDE 3

Bias Problem

Existing Embedding Models for GZSL

  • Visual Space to Semantic Space
  • Visual & Semantic Space to a Latent Space
  • Semantic Space to Visual Space

Bias Problem: Unseen samples are easily classified into similar seen classes, e.g. Zebra → Horse.

[Figure: bar chart of average per-class top-1 accuracy in % on unseen classes of various models, following CZSL settings and GZSL settings.]

Models (in chart order): DAP, CONSE, SSE, LATEM, ALE, DEVISE, SJE, SYNC, SAE, GFZSL
CZSL: 44.1, 45.6, 60.1, 55.1, 59.9, 54.2, 65.6, 54, 53, 68.3
GZSL (nine bar labels survive for the ten models): 0.4, 7, 7.3, 16.8, 13.4, 11.3, 8.9, 1.8, 1.8

Xian, Yongqin, et al. "Zero-Shot Learning - A Comprehensive Evaluation of the Good, the Bad and the Ugly." IEEE TPAMI 2017
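The metric in the chart, average per-class top-1 accuracy, can be sketched as follows. This is a minimal illustration, not the evaluation code of any cited work; the function name and the toy labels are hypothetical. Accuracy is computed within each class first and then averaged over classes, so a class with few test samples counts as much as a large one.

```python
from collections import defaultdict

def per_class_top1_accuracy(true_labels, predictions):
    """Average per-class top-1 accuracy in %, given parallel label lists."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(true_labels, predictions):
        total[truth] += 1
        correct[truth] += int(truth == pred)
    if not total:
        raise ValueError("no samples given")
    # Mean of the per-class accuracies, not the overall fraction correct.
    per_class = [correct[c] / total[c] for c in total]
    return 100.0 * sum(per_class) / len(per_class)

# Toy example (hypothetical labels): class "a" scores 3/4, class "b" scores 1/1.
truth = ["a", "a", "a", "a", "b"]
preds = ["a", "a", "a", "b", "b"]
score = per_class_top1_accuracy(truth, preds)
```

In the toy example the per-class average is 87.5 %, while the plain fraction of correct samples would be 80 %.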

SLIDE 4

Our Model

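[Figure: CRnet architecture. The semantic input passes through the cooperation module f, whose expert modules f1, f2, ..., fK (outputs labelled O1, O2, ..., OK) produce the feature anchor. The image input passes through a CNN to give the image feature. The feature anchor and the image feature are concatenated (C) and fed to the relation module g, which gives the similarity output used to predict the class (Horse, Zebra, Panda, Tiger). Training uses back propagation.]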

➢ Co-Representation Network (CRnet)

1. A cooperation module for visual feature representation (our main contribution).
2. A pre-trained CNN (ResNet-101) for feature extraction.
3. A relation module for similarity output, i.e. the classification. (Sung, Flood, et al. "Learning to Compare: Relation Network for Few-Shot Learning." CVPR 2018.)

SLIDE 5

Algorithm

➢ Initialization Algorithm
Perform K-means clustering on the semantic space. The slide then defines the semantic vectors, the clustering centers, and expert module k (formulas not preserved in this transcript).

➢ Cooperation Module
Sum the outputs of the expert modules. Each expert module (f1, f2, ..., fK) is a single-layer perceptron, and the summed outputs (O1, O2, ..., OK) give the feature anchor in the visual embedding space.
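The two steps on this slide, K-means initialization on the semantic space and summing the expert outputs, can be sketched in NumPy as below. This is an illustration under stated assumptions, not the authors' implementation: the slide's formulas are missing, so the choices that expert k receives the semantic vector shifted by clustering center k, that it uses a ReLU activation, and that weights start as small random values are all assumptions. The names `kmeans` and `CooperationModule` are hypothetical.

```python
import numpy as np

def kmeans(x, k, iters=50, seed=0):
    """Plain Lloyd's K-means; returns the (k, d) array of cluster centers."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Assign every semantic vector to its nearest center.
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers

class CooperationModule:
    """K single-layer experts whose outputs are summed into a feature anchor."""

    def __init__(self, centers, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        k, d = centers.shape
        self.centers = centers
        self.weights = rng.normal(0.0, 0.1, size=(k, d, out_dim))
        self.biases = np.zeros((k, out_dim))

    def expert_outputs(self, a):
        # Assumption: expert k sees the semantic vector relative to center k.
        shifted = a[None, :] - self.centers                      # (k, d)
        pre = np.einsum("kd,kdo->ko", shifted, self.weights) + self.biases
        return np.maximum(pre, 0.0)                              # ReLU (assumed)

    def __call__(self, a):
        # Cooperation: sum the K expert outputs.
        return self.expert_outputs(a).sum(axis=0)

# Toy usage with random stand-ins for class semantic vectors.
semantic = np.random.default_rng(1).normal(size=(20, 5))
centers = kmeans(semantic, k=3)
coop = CooperationModule(centers, out_dim=8)
anchor = coop(semantic[0])
```

Here `anchor` plays the role of the feature anchor that is later compared with an image feature.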

SLIDE 6

Algorithm

➢ Relation Module
Concatenate the feature anchor (output of the cooperation module) and the image feature v as the input. Two-layer perceptron with Sigmoid. Ground-truth: (formula not preserved in this transcript)

  • When the model converges, the cooperation module divides the semantic space into several parts.
  • Semantic vectors located in different parts are projected by several different expert modules.

[Figure: the feature anchor and the image feature are concatenated (C) and passed to the relation module to predict. Two further panels are each labelled Semantic Space.]

➢ Training
Objective function: (formula not preserved in this transcript). Training proceeds in an end-to-end manner.
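A minimal NumPy sketch of the relation module's forward pass: each class's feature anchor is concatenated with the image feature v and scored by a two-layer perceptron ending in a Sigmoid. The ground-truth and objective formulas are missing from the slide, so the 0/1 target and the mean-squared-error loss below are assumptions borrowed from the cited Relation Network paper, and the ReLU hidden activation is likewise assumed. `RelationModule` and `mse_loss` are hypothetical names.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class RelationModule:
    """Two-layer perceptron scoring (feature anchor, image feature) pairs."""

    def __init__(self, anchor_dim, feat_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = anchor_dim + feat_dim
        self.w1 = rng.normal(0.0, 0.1, size=(in_dim, hidden_dim))
        self.b1 = np.zeros(hidden_dim)
        self.w2 = rng.normal(0.0, 0.1, size=(hidden_dim, 1))
        self.b2 = np.zeros(1)

    def __call__(self, anchors, v):
        # anchors: (num_classes, anchor_dim); v: (feat_dim,)
        tiled = np.tile(v, (anchors.shape[0], 1))
        pairs = np.concatenate([anchors, tiled], axis=1)
        hidden = np.maximum(pairs @ self.w1 + self.b1, 0.0)  # ReLU (assumed)
        return sigmoid(hidden @ self.w2 + self.b2).ravel()   # one score per class

def mse_loss(scores, true_class):
    # Assumed target: 1 for the true class, 0 for every other class.
    target = np.zeros_like(scores)
    target[true_class] = 1.0
    return float(np.mean((scores - target) ** 2))

# Toy usage with random stand-ins for four class anchors and one image feature.
rng = np.random.default_rng(2)
anchors = rng.normal(size=(4, 8))
v = rng.normal(size=6)
relation = RelationModule(anchor_dim=8, feat_dim=6, hidden_dim=4)
scores = relation(anchors, v)
loss = mse_loss(scores, true_class=1)
predicted = int(scores.argmax())
```

At test time the class with the highest similarity score is taken as the prediction.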

SLIDE 7

Benchmark Results

SLIDE 8

Analysis

➢ Bias Problem
Unseen anchors are distributed too close to seen anchors in the embedding space used for classification.

[Figure: two visual embedding spaces, one with a serious bias problem and one with a slight bias problem.]

➢ Local Relative Distance (LRD)
We propose the LRD as a metric for the bias problem (definition not preserved in this transcript). A larger LRD means a more uniform embedding space, i.e. a slighter bias problem.

[Figure: mapping from a 1-d semantic space to a 1-d visual embedding space. fG: general fitting curve; fCR: fitting curve of CRnet; S: semantic space; V: visual embedding space.]

  • High local linearity results in a larger LRD.
  • The cooperation module actually learns a piecewise linear function of K+1 pieces with high local linearity.
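The K+1 pieces claim can be checked in the 1-d case with a small NumPy sketch. It assumes each expert is a single ReLU unit acting on the semantic value shifted by its clustering center, which is an assumption and not something the slide states. A sum of K such units changes slope only at each unit's kink, so K distinct kinks give K+1 linear pieces. The function names and parameter values are hypothetical, and the LRD itself is not computed because its formula is missing from the transcript.

```python
import numpy as np

def cooperation_1d(s, centers, w, b):
    """Sum of K one-dimensional ReLU experts evaluated at s."""
    s = np.asarray(s, dtype=float)
    units = np.maximum(w * (s[..., None] - centers) + b, 0.0)
    return units.sum(axis=-1)

def piece_slopes(centers, w, b):
    """Slope of the summed function on each interval between sorted kinks."""
    kinks = np.sort(centers - b / w)  # where each ReLU switches on or off
    edges = np.concatenate(([kinks[0] - 1.0], kinks, [kinks[-1] + 1.0]))
    slopes = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Two interior points are enough because the function is linear here.
        p, q = lo + 0.25 * (hi - lo), lo + 0.75 * (hi - lo)
        rise = cooperation_1d(q, centers, w, b) - cooperation_1d(p, centers, w, b)
        slopes.append(rise / (q - p))
    return np.array(slopes)

# Hypothetical parameters for K = 3 experts.
centers = np.array([-1.0, 0.0, 2.0])
w = np.array([1.0, -2.0, 0.5])
b = np.zeros(3)
slopes = piece_slopes(centers, w, b)
```

With these values the slopes on the four intervals are -2, -1, 1, and 1.5, i.e. K+1 = 4 linear pieces for K = 3 experts.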

SLIDE 9

Contrast Experiments

➢ Relation Network (RN)
A two-layer perceptron is used instead of the cooperation module. (Sung, Flood, et al. "Learning to Compare: Relation Network for Few-Shot Learning." CVPR 2018.)

[Figure: CRnet vs RN. CRnet: the semantic input passes through the cooperation module f (expert modules f1, f2, ..., fK) to give the feature anchor. RN: the semantic vectors input passes through a two-layer perceptron to give the feature anchor. In both models the image input passes through a CNN to give the image feature, which is concatenated (C) with the feature anchor and fed to the relation module g for the similarity output (Horse, Zebra, Panda, Tiger).]

SLIDE 10

Contrast Experiments

➢ Results
Compared with RN, CRnet achieves:

  • More Sparse and Discriminative Features
  • More Uniform Embedding Space (Larger LRD)
  • Slighter Bias Problem

Figure. Bar chart of per-class Bias Rate and per-class Error Rate of RN and CRnet on AwA2. Rates in %.

Unseen Class Index   Bias Rate of RN   Bias Rate of CRnet   Error Rate of RN   Error Rate of CRnet
1                    57.5              2.9                  61.5               12.6
2                    48.6              5.8                  81.5               59.5
3                    21.1              0.1                  82.1               46.3
4                    14.8              2.3                  89.7               96.2
5                    14.5              4.5                  64.2               59.3
6                    12.3              0.8                  44.9               17.6
7                    11.6              4.3                  32.3               13.8
8                    1.3               0.2                  46.7               39.3
9                    0.5               0                    60.9               22.3
10                   0                 0                    97.9               78.9
Avg                  18.2              2.1                  66.2               44.6

Bias Rate: The rate in % of misclassification into the closest seen class; Error Rate: Per-class classification Error Rate in %.
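The two definitions above can be sketched for a single unseen class as follows. This is a minimal illustration with hypothetical names and toy labels. It assumes both rates use all test samples of the class as the denominator, and it takes the closest seen class as an input, since the slide does not say how that class is chosen.

```python
def bias_and_error_rate(true_labels, predictions, unseen_class, closest_seen):
    """Return (bias rate, error rate) in % for one unseen class."""
    total = wrong = biased = 0
    for truth, pred in zip(true_labels, predictions):
        if truth != unseen_class:
            continue
        total += 1
        if pred != truth:
            wrong += 1
            # A biased error lands on the closest seen class.
            if pred == closest_seen:
                biased += 1
    if total == 0:
        raise ValueError("no samples of the requested class")
    return 100.0 * biased / total, 100.0 * wrong / total

# Toy example (hypothetical labels): four zebra images and one horse image.
truth = ["zebra", "zebra", "zebra", "zebra", "horse"]
preds = ["horse", "horse", "zebra", "tiger", "horse"]
bias_rate, error_rate = bias_and_error_rate(truth, preds, "zebra", "horse")
```

In the toy example three of four zebra images are misclassified (error rate 75 %), and two of them go to the closest seen class, horse (bias rate 50 %). Under this assumption the bias rate can never exceed the error rate.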

SLIDE 11

Summarize

➢ Co-representation network

  • Decomposition method for projecting the semantic space to the visual embedding space.
  • Cooperation module for representation and learnable relation module for classification.

✓ Training in an end-to-end manner.
✓ Slighter bias problem leads to a good performance on GZSL.

Other advantages:
✓ Simple structure with high expandability.
✓ No need for semantic information of unseen classes during training (compared with generative models).

Email: f.zhang@stu.xidian.edu.cn