Co-Representation Network for Generalized Zero-Shot Learning
Fei Zhang, Guangming Shi XIDIAN UNIVERSITY
ICML 2019
Introduction
Data requirements decrease: Classic Deep CNN → Transfer Learning → Few-Shot Learning → Zero-Shot Learning (ZSL)
Conventional ZSL (CZSL) vs. Generalized ZSL (GZSL)
Source space (Seen Classes); Target space (Unseen Classes)
Semantic space (Attributes, word2vecs), e.g. Legs, Fur, ...
Existing Embedding Models for GZSL
to Semantic Space
to a Latent Space
to Visual Space
Bias Problem: Unseen samples are easily classified into similar seen classes, e.g. Zebra → Horse.
[Figure: bar chart, accuracy axis from 0 to 80%] Average per-class top-1 accuracy in % on unseen classes of various models following CZSL settings and GZSL settings
CZSL: DAP 44.1, CONSE 45.6, SSE 60.1, LATEM 55.1, ALE 59.9, DEVISE 54.2, SJE 65.6, SYNC 54, SAE 53, GFZSL 68.3
GZSL: CONSE 0.4, SSE 7, LATEM 7.3, ALE 16.8, DEVISE 13.4, SJE 11.3, SYNC 8.9, SAE 1.8, GFZSL 1.8
(Xian, Yongqin, et al. "Zero-Shot Learning - A Comprehensive Evaluation of the Good, the Bad and the Ugly." IEEE TPAMI 2017.)
[Figure: CRnet architecture. Semantic input (Horse, Zebra, Panda, Tiger) → cooperation module f (expert modules f1, f2, ..., fK with outputs O1, O2, ..., OK) → feature anchor. Image input → CNN → image feature. Feature anchor and image feature → concatenate (C) → relation module g → similarity output → predict. Training by back propagation.]
➢ Co-Representation Network (CRnet)
1. A cooperation module for visual feature representation (our main contribution).
2. A pre-trained CNN (ResNet-101) for feature extraction.
3. A relation module for similarity output, i.e. the classification. (Sung, Flood, et al. "Learning to Compare: Relation Network for Few-Shot Learning." CVPR 2018.)
➢ Initialization Algorithm
Perform K-means Clustering on the semantic space.
Semantic vectors:
Clustering center:
Expert module k:
➢ Cooperation Module
Sum the outputs of expert modules.
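A minimal NumPy sketch of the initialization and cooperation steps, under stated assumptions. The formulas for the semantic vectors, clustering centers and expert modules are not recoverable from the slide text, so centering each expert's input on its clustering center, the ReLU activation, the weight scale, and the names `kmeans`, `init_experts` and `cooperation_module` are illustrative choices, not necessarily the authors' definitions.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's K-means on the semantic vectors; returns (centers, labels)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared distance from every vector to every center: shape (n, k).
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers, labels

def init_experts(centers, feat_dim, seed=0):
    """One single-layer perceptron per clustering center (assumed layout)."""
    rng = np.random.default_rng(seed)
    sem_dim = centers.shape[1]
    return [
        {"center": c,
         "W": rng.normal(0.0, 0.01, (sem_dim, feat_dim)),
         "b": np.zeros(feat_dim)}
        for c in centers
    ]

def cooperation_module(a, experts):
    """Sum the expert outputs O_1..O_K to form the feature anchor."""
    # Input centering and ReLU are assumptions, not taken from the slides.
    outputs = [np.maximum(0.0, (a - e["center"]) @ e["W"] + e["b"]) for e in experts]
    return np.sum(outputs, axis=0)
```

With K clustering centers this yields K expert modules; calling `cooperation_module(a, experts)` on one semantic vector `a` returns a single feature anchor with `feat_dim` entries.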
[Figure: cooperation module. Each expert module f1, f2, ..., fK is a single layer perceptron; the outputs O1, O2, ..., OK are summed into the feature anchor in the visual embedding space.]
➢ Relation Module
Concatenate the feature anchor (output of the cooperation module) and the image feature v as the input.
Two-layer perceptron with Sigmoid.
Ground-truth:
The cooperation module divides the semantic space into several parts.
These parts are projected by several different expert modules.
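The relation module's forward pass can be sketched as follows. This is a hedged illustration: the slides only state that the feature anchor and the image feature v are concatenated and passed through a two-layer perceptron with Sigmoid, so the hidden size, the ReLU hidden activation, the weight scale, and the names `init_relation_module`, `relation_module` and `predict` are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_relation_module(feat_dim, hidden_dim=64, seed=0):
    """Random weights for a two-layer perceptron (sizes are illustrative)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.01, (2 * feat_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(0.0, 0.01, (hidden_dim, 1)),
        "b2": np.zeros(1),
    }

def relation_module(anchors, v, params):
    """Similarity in (0, 1) between image feature v and each class anchor."""
    # Pair the same image feature with every anchor: shape (n_classes, 2 * feat_dim).
    pairs = np.concatenate([anchors, np.tile(v, (len(anchors), 1))], axis=1)
    hidden = np.maximum(0.0, pairs @ params["W1"] + params["b1"])  # ReLU is an assumption
    return sigmoid(hidden @ params["W2"] + params["b2"]).ravel()

def predict(anchors, v, params):
    """Predict the class whose anchor has the highest similarity to v."""
    return int(np.argmax(relation_module(anchors, v, params)))
```

Here `anchors` holds one feature anchor per candidate class, so the same function covers seen and unseen classes at test time.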
[Figure: feature anchor and image feature → concatenate (C) → relation module → predict.]
➢ Training
Objective function:
End-to-end manner.
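The objective function itself is not recoverable from the slide text. As a hedged illustration only, the sketch below assumes a mean squared error between the similarity outputs and 0/1 targets, which is the loss used in the cited Relation Network paper; whether CRnet uses exactly this form is an assumption, and `relation_mse_loss` is a hypothetical name.

```python
import numpy as np

def relation_mse_loss(similarities, labels):
    """Mean squared error against 0/1 targets (assumed objective).

    similarities: (batch, n_classes) similarity outputs in (0, 1)
    labels:       (batch,) index of the true class for each image
    """
    similarities = np.asarray(similarities, dtype=float)
    labels = np.asarray(labels, dtype=int)
    targets = np.zeros_like(similarities)
    targets[np.arange(len(labels)), labels] = 1.0  # 1 for the true class, 0 otherwise
    return float(np.mean((similarities - targets) ** 2))
```

In an end-to-end setup, the gradient of such a loss would flow through the relation module into the cooperation module; that step needs an automatic differentiation framework and is omitted here.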
➢ Bias Problem
Unseen anchors are distributed too close to seen anchors in the embedding space used for classification.
[Figure: two panels, each mapping the semantic space to the visual embedding space: serious bias problem vs. slight bias problem.]
➢ Local Relative Distance (LRD)
We propose the LRD as a metric for the bias problem.
A larger LRD means a more uniform embedding space, i.e. a slighter bias problem.
1-d semantic space to 1-d visual embedding space:
S: semantic space; V: visual embedding space.
fG: general fitting curve; fCR: fitting curve of CRnet, a piecewise linear function of K+1 pieces with high local linearity.
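The K+1 piece count can be checked with a small sketch. It assumes each expert module in the 1-d case is a single ReLU unit `max(0, w*s + b)`; the activation is not stated on the slides, and the names `f_cr`, `breakpoints` and `piece_slopes` are hypothetical. Under that assumption each expert contributes one breakpoint at `s = -b/w`, so K experts with distinct breakpoints give K+1 linear pieces.

```python
def f_cr(s, experts):
    """Sum of K one-dimensional ReLU experts; each expert is a (w, b) pair."""
    return sum(max(0.0, w * s + b) for w, b in experts)

def breakpoints(experts):
    """Points where an expert switches on or off (one per expert with w != 0)."""
    return sorted({-b / w for w, b in experts if w != 0})

def piece_slopes(experts):
    """Slope of f_cr on each linear piece, from left to right."""
    bps = breakpoints(experts)
    # One probe point strictly inside each piece.
    inner = [(left + right) / 2 for left, right in zip(bps, bps[1:])]
    probes = [bps[0] - 1.0] + inner + [bps[-1] + 1.0]
    # On a piece, the slope is the sum of the weights of the active experts.
    return [sum(w for w, b in experts if w * s + b > 0) for s in probes]

experts = [(1.0, 0.0), (2.0, -2.0), (0.5, -1.5)]  # K = 3, breakpoints at 0, 1, 3
print(piece_slopes(experts))  # four slopes, i.e. K + 1 pieces
```

Between two neighboring breakpoints the function is exactly linear, which is one way to read the "high local linearity" noted for fCR.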
➢ Relation Network (RN)
A two-layer perceptron is used instead of the cooperation module. (Sung, Flood, et al. "Learning to Compare: Relation Network for Few-Shot Learning." CVPR 2018.)
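For comparison, the RN baseline's mapping from a semantic vector to a feature anchor can be sketched as a two-layer perceptron. The layer sizes, ReLU activations, weight scale, and the names `init_two_layer_perceptron` and `rn_feature_anchor` are assumptions made for illustration; only the two-layer structure comes from the slides.

```python
import numpy as np

def init_two_layer_perceptron(sem_dim, hidden_dim, feat_dim, seed=0):
    """Random weights for the baseline embedding (sizes are illustrative)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.01, (sem_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(0.0, 0.01, (hidden_dim, feat_dim)),
        "b2": np.zeros(feat_dim),
    }

def rn_feature_anchor(a, params):
    """Map semantic vector(s) a to feature anchor(s) with one shared network."""
    hidden = np.maximum(0.0, a @ params["W1"] + params["b1"])     # ReLU is an assumption
    return np.maximum(0.0, hidden @ params["W2"] + params["b2"])  # ReLU is an assumption
```

Unlike the cooperation module, a single network handles the whole semantic space here, with no per-cluster experts.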
[Figure: CRnet vs RN. CRnet: semantic input → cooperation module f (expert modules f1, f2, ..., fK) → feature anchor. RN: semantic vectors input → two-layer perceptron → feature anchor. In both, the feature anchor is concatenated (C) with the CNN image feature and passed to the relation module g for the similarity output.]
➢ Results
Compared with RN, CRnet achieves:
[Figure: bar chart of Rate (%) against Unseen Class Index]
Class  Bias RN  Bias CRnet  Error RN  Error CRnet
1      57.5     2.9         61.5      12.6
2      48.6     5.8         81.5      59.5
3      21.1     0.1         82.1      46.3
4      14.8     2.3         89.7      96.2
5      14.5     4.5         64.2      59.3
6      12.3     0.8         44.9      17.6
7      11.6     4.3         32.3      13.8
8      1.3      0.2         46.7      39.3
9      0.5      0           60.9      22.3
10     0        0           97.9      78.9
Avg    18.2     2.1         66.2      44.6
Bias Rate: The rate in % of misclassification into the closest seen class.
Error Rate: Per-class classification error rate in %.
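The two rates defined above can be computed from predictions as in the following sketch. How the closest seen class of each unseen class is determined is not stated on the slides, so it is passed in as a lookup table; `bias_and_error_rate` and `closest_seen` are hypothetical names.

```python
def bias_and_error_rate(true_labels, pred_labels, closest_seen, cls):
    """Return (bias_rate, error_rate) in % for one unseen class `cls`.

    closest_seen maps each unseen class to its closest seen class
    (supplied by the caller; its definition is an assumption here).
    """
    preds = [p for t, p in zip(true_labels, pred_labels) if t == cls]
    if not preds:
        raise ValueError(f"no test samples for class {cls!r}")
    n = len(preds)
    errors = sum(p != cls for p in preds)               # any misclassification
    biased = sum(p == closest_seen[cls] for p in preds)  # misclassified as the closest seen class
    return 100.0 * biased / n, 100.0 * errors / n

true = ["zebra"] * 4
pred = ["zebra", "horse", "horse", "tiger"]
print(bias_and_error_rate(true, pred, {"zebra": "horse"}, "zebra"))  # (50.0, 75.0)
```

The bias rate can never exceed the error rate, since every biased prediction is also an error.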
➢ Co-representation network
embedding space.
for classification.
✓ Training in an end-to-end manner.
✓ Slighter bias problem leads to good performance on GZSL.
Other advantages:
✓ Simple structure with high expandability.
✓ No need for semantic information of unseen classes during training (compared with generative models).
Email: f.zhang@stu.xidian.edu.cn