 
One-shot Learning in Semantic Embedding and Data Augmentation. Yanwei Fu, School of Data Science, Fudan University. yanweifu@fudan.edu.cn http://yanweifu.github.io
One-shot Learning (object categorization): “learning object categories from just a few images, by incorporating ‘generic’ knowledge which may be obtained from previously learnt models of unrelated categories.”
Fei-Fei et al. A Bayesian Approach to Unsupervised One-Shot Learning of Object Categories. ICCV 2003
Fei-Fei et al. One-Shot Learning of Object Categories. IEEE TPAMI 2006
One-shot Learning by Semantic Embedding
Fu, Y.; Hospedales, T.; Xiang, T.; Gong, S. “Attribute Learning for Understanding Unstructured Social Activity”, ECCV 2012
Fu, Y.; Hospedales, T.; Xiang, T.; Gong, S. “Learning Multi-modal Latent Attributes”, IEEE TPAMI 2014
Fu et al. Semi-supervised Vocabulary-informed Learning. CVPR 2016 (oral)
Fu et al. Vocabulary-informed Zero-shot and Open-set Learning. IEEE TPAMI, to appear
Attribute Learning Pipeline [Figure labels: classes mule, lion, horse, zebra; attributes stripes, tails] Lampert, C. H. Learning to detect unseen object classes by between-class attribute transfer. CVPR 2009
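The attribute pipeline can be illustrated with a minimal sketch in the spirit of attribute transfer: an unseen class is described only by an attribute signature, and an image is assigned to the class whose signature best agrees with its predicted attributes. The signatures, the third attribute (`has_mane`), and the probabilities below are made-up values for illustration, and the scoring omits the attribute-prior normalization used in the published DAP model.

```python
import math

# Hypothetical binary attribute signatures: (has_stripes, has_tail, has_mane).
CLASS_ATTRIBUTES = {
    "zebra": [1, 1, 1],
    "lion": [0, 1, 1],
    "mule": [0, 1, 0],
}


def class_log_likelihood(attr_probs, signature):
    """Log-probability that the predicted attributes match a class signature.

    attr_probs[m] is the (assumed) classifier output p(attribute m present | image).
    Attributes are treated as independent, which is a simplifying assumption.
    """
    eps = 1e-9  # guards against log(0)
    total = 0.0
    for p, a in zip(attr_probs, signature):
        total += math.log((p if a == 1 else 1.0 - p) + eps)
    return total


def predict_class(attr_probs, class_attributes=CLASS_ATTRIBUTES):
    """Return the class whose signature best explains the predicted attributes."""
    return max(
        class_attributes,
        key=lambda c: class_log_likelihood(attr_probs, class_attributes[c]),
    )


# An image whose attribute classifiers fire strongly on stripes, tail and mane.
predicted = predict_class([0.9, 0.8, 0.7])
```

No training image of the predicted class is needed here; only its attribute signature is, which is what makes the zero-shot case possible.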
Semantic Attributes in Zero/One-shot Learning
Learning Multi-modal Latent Attributes
Experimental Settings
Dataset & Settings:
• USAA dataset (4 source classes, 4 target classes, multiple rounds of class splits);
• Animals with Attributes (AwA) dataset (40 source classes, 10 target classes).
Comparisons:
• Direct: KNN/SVM from features to classes;
• DAP: Direct Attribute Prediction [Lampert et al. CVPR 2009];
• SVM-UD: an SVM generalization of DAP;
• SCA: topic models in [Wang et al. CVPR 2009];
• ST: Synthetic Transfer in [Yu et al. ECCV 2010].
Unstructured Social Activity Dataset (USAA) [Figure: example frames for the classes music performance, non-music performance, wedding ceremony, wedding dance, wedding reception, parade, birthday party, graduation]
One-shot Learning Results For more results, please check our papers.
Vocabulary-informed Learning
Supervised Learning [Figure: semantic labels and visual feature space; classes airplane, car, unicycle, tricycle]
One-shot Learning [Figure: semantic labels and visual feature space; classes airplane, car, unicycle, tricycle]
Zero/One-shot Learning by Semantic Embedding (Problem Definition). Zero/one-shot learning: we have zero or one visually labeled instances showing what these classes look like. [Figure: semantic labels and visual feature space; classes bicycle, truck]
Learning Semantic labels Visual feature space airplane unicycle bicycle bicycle tricycle car truck truck
Inference [Figure labels: airplane, unicycle, bicycle, tricycle, car, truck] Key Question: How do we define the semantic space?
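A minimal sketch of learning and inference in a semantic embedding: a linear map from visual features to label vectors is fitted on the seen classes, and a test image is assigned to the nearest label prototype, including prototypes of classes with no training images. The 2-D "word vectors", the toy feature generator (`mixing`), and the ridge-regression embedding are all assumptions made for illustration, not the model used in the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-D label vectors standing in for a real semantic space.
prototypes = {
    "airplane": np.array([1.0, 0.0]),
    "car": np.array([0.0, 1.0]),
    "unicycle": np.array([-1.0, 0.0]),
    "tricycle": np.array([-0.7, -0.7]),  # no training images for this class
}

# Toy generator: 3-D visual features are a noisy linear function of the label vector.
mixing = np.array([[1.0, 0.5, -0.3],
                   [0.2, 1.0, 0.8]])


def sample_features(label, n):
    """Draw n synthetic visual feature vectors for a class."""
    return prototypes[label] @ mixing + 0.01 * rng.standard_normal((n, 3))


# Learning: fit the embedding on seen classes only.
seen = ["airplane", "car", "unicycle"]
X = np.vstack([sample_features(c, 5) for c in seen])          # visual features
Y = np.vstack([np.tile(prototypes[c], (5, 1)) for c in seen])  # target label vectors

lam = 0.1  # ridge regularization strength (assumed value)
W = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)


def classify(x):
    """Inference: embed x and return the label of the nearest prototype."""
    z = x @ W
    return min(prototypes, key=lambda c: np.linalg.norm(z - prototypes[c]))


# Zero-shot test: the model never saw a tricycle image during training.
zero_shot_prediction = classify(sample_features("tricycle", 1)[0])
```

Because the candidate set at inference time is the whole vocabulary rather than the training classes, the same nearest-prototype rule covers supervised, zero-shot and one-shot recognition.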
Semantic Label Vector Spaces
• Semantic attributes. Type: supervised. Advantage: good interpretability of each dimension. Disadvantages: manual annotation; limited vocabulary.
• Semantic word vectors (e.g. word2vec). Type: unsupervised. Advantage: good vector representation for a vocabulary of millions of words. Disadvantage: limited interpretability of each dimension.
Vocabulary-Informed Recognition Image unicycle tricycle Fu et al. Semi-supervised Vocabulary-informed learning, CVPR 2016 (Oral)
Estimating Density of Classes in the Space
• Knowledge of the margin distribution of instances, rather than a single margin across all instances, is crucial for improving the generalization performance of a classifier.
• Instance margin: the distance between one instance and the separating hyperplane.
• The distribution of the minimal values of the margin distance is characterized by a Weibull distribution (Extreme Value Theorem).
• The probability of (𝑦) being included in the boundary estimated by (𝑦 𝑗).
• Margin distribution of prototypes: the margin distribution of prototypes in the semantic space.
• Coverage distribution of prototypes.
Fu et al. Vocabulary-informed Zero-shot and Open-set Learning. IEEE TPAMI, to appear
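A minimal sketch of how a Weibull model over margin distances can be turned into an inclusion probability. It assumes a nearest-prototype rule, so that the separating hyperplane between two prototypes is their perpendicular bisector and the margin is half their distance. The `scale` and `shape` parameters are taken as given here (in practice they would be fitted to the smallest margin distances), and using the Weibull survival function as the inclusion probability is an assumption rather than the paper's exact formulation.

```python
import math


def margin_distances(prototype, other_prototypes):
    """Sorted margins of a prototype: half the distance to each competing prototype."""
    return sorted(0.5 * math.dist(prototype, o) for o in other_prototypes)


def weibull_cdf(d, scale, shape):
    """Weibull CDF: probability that the margin is smaller than d."""
    if d <= 0.0:
        return 0.0
    return 1.0 - math.exp(-((d / scale) ** shape))


def inclusion_probability(distance, scale, shape):
    """Probability that a point at this distance still lies inside the class boundary.

    Modeled as the Weibull survival function: 1 at the prototype itself,
    decaying towards 0 as the distance grows past the typical margin.
    """
    return 1.0 - weibull_cdf(distance, scale, shape)


margins = margin_distances((0.0, 0.0), [(3.0, 4.0), (0.0, 2.0)])
p_near = inclusion_probability(0.2, scale=1.0, shape=2.0)
p_far = inclusion_probability(2.0, scale=1.0, shape=2.0)
```

A low inclusion probability for every known prototype is what allows an instance to be rejected as belonging to none of them, which is the open-set case.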
Experimental Dataset and Tasks
Datasets:
• AwA dataset;
• ImageNet 2012/2010 dataset.
By learning a semantic embedding, we can address the following tasks:
• Supervised recognition
• Zero-shot recognition
• General zero-shot recognition
• One-shot recognition
• Open-set recognition
Experimental Settings of Few-shot Learning
• Learning classifiers from few source training instances
  • Source classes: one-shot recognition
  • Target classes: zero-shot recognition
  • Key insight: leveraging knowledge from the semantic space (vocabulary-informed)
• Few-shot target training instances
  • Few-shot setting, consistent with the general definition
Results on Few-shot Learning Few-shots on source dataset
Results on Few-shot Learning
One-shot Learning by Data Augmentation
One-shot learning aims to learn information about object categories from one, or only a few, training images.
[Diagram labels: Meta-Learning, Data Augmentation, Meta-Augmentation Learning]
Multi-level Semantic Feature Augmentation for One-shot Learning. Zitian Chen, Yanwei Fu, Yinda Zhang, Yu-Gang Jiang, Xiangyang Xue, and Leonid Sigal. IEEE Transactions on Image Processing (TIP) 2019
Motivation
• A straightforward way to tackle one-shot learning is data augmentation.
• We want to utilize the semantic space.
• Do related concepts in the semantic space help learning?
[Figure: image feature space and semantic feature space; class labels: killer whale, sea lion, mountain goat, whale, hartebeest, orca, antelopes, pronghorn, muskrat, beaver, badger, woodchuck]
Method [Figure: image feature space and semantic feature space connected by the mappings 𝑔(𝑦) and (𝑦); class labels: killer whale, sea lion, mountain goat, whale, hartebeest, orca, antelopes, pronghorn, muskrat, beaver, badger, woodchuck]
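A minimal sketch of augmenting a single training feature through the semantic space: the feature is encoded into the semantic space, its nearest semantic neighbors are looked up, and each neighbor is decoded back into the image feature space as an extra training feature. The linear encoder and decoder, the 2-D vocabulary vectors, and the choice of `k` are hypothetical stand-ins; the published method learns these mappings with a neural network.

```python
import numpy as np


def augment_via_semantic_neighbors(feature, encode, decode, vocab_vectors, k=2):
    """Return k augmented features decoded from the nearest semantic neighbors.

    encode/decode are callables mapping feature -> semantic vector and back.
    Also returns the indices of the neighbors that were used.
    """
    z = encode(feature)
    dists = np.linalg.norm(vocab_vectors - z, axis=1)
    nearest = np.argsort(dists)[:k]
    augmented = np.stack([decode(vocab_vectors[i]) for i in nearest])
    return augmented, nearest


# Hypothetical vocabulary of 2-D semantic vectors.
vocab_words = ["orca", "whale", "antelope", "beaver"]
vocab_vectors = np.array([[1.0, 0.1], [0.9, 0.2], [0.0, 1.0], [-1.0, 0.0]])

# Toy linear encoder (4-D feature -> 2-D semantic) and its pseudo-inverse as decoder.
E = 0.5 * np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
D = np.linalg.pinv(E)


def encode(f):
    return f @ E


def decode(z):
    return z @ D


# One labeled "killer whale" feature yields two extra features from related concepts.
killer_whale_feature = np.array([1.0, 0.0, 1.0, 0.0])
augmented, nearest = augment_via_semantic_neighbors(
    killer_whale_feature, encode, decode, vocab_vectors, k=2
)
neighbor_words = [vocab_words[i] for i in nearest]
```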
Single-level • But we want to exploit visual concepts at different levels.
Multi-level • Use both high-level and low-level features to help the encoding. • Decode the semantic feature into features at different levels to diversify the augmented features.
Visualization
Image Deformation Meta-Networks for One-Shot Learning Zitian Chen, Yanwei Fu, Yu-Xiong Wang, Lin Ma, Wei Liu, Martial Hebert
The Basic Idea of the Jigsaw Augmentation Method. Image Block Augmentation for One-Shot Learning. Zitian Chen, Yanwei Fu, Kaiyu Chen, Yu-Gang Jiang. AAAI 2019
Visual contents from other images may be helpful to synthesize new images
[Figure panels: stitched, ghosted, partially occluded, montaged] Humans can learn novel visual concepts even when images undergo various deformations
Deformed Images Visual contents from other images might be helpful
Approach
Motivation
1. Visual contents from other images may be helpful to synthesize new images.
2. Humans can learn novel visual concepts even when images undergo various deformations.
Approach
We design a deformation sub-network that learns to deform images by fusing a pair of images: a probe image that keeps the visual content and a gallery image that diversifies the deformations.
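A minimal sketch of fusing a probe image with a gallery image. It assumes the fusion is a patch-wise linear blend over a regular grid, with one weight per patch; in the method described above the weights would be produced by the learned deformation sub-network, whereas here they are passed in by hand. The function name, the grid size, and the blend rule are illustrative assumptions.

```python
import numpy as np


def fuse_patches(probe, gallery, weights):
    """Blend two images patch by patch.

    probe, gallery: arrays of shape (H, W, C).
    weights: array of shape (G, G) with values in [0, 1]; weights[i, j] is the
             share of the probe image kept in grid cell (i, j).
    """
    h, w = probe.shape[:2]
    g = weights.shape[0]
    if probe.shape != gallery.shape or h % g != 0 or w % g != 0:
        raise ValueError("images must match and be divisible by the grid size")
    # Expand the per-patch weights into a per-pixel mask of shape (H, W, 1).
    mask = np.kron(weights, np.ones((h // g, w // g)))[..., None]
    return mask * probe + (1.0 - mask) * gallery


# Toy 6x6 RGB images: an all-ones probe and an all-zeros gallery image.
probe = np.ones((6, 6, 3))
gallery = np.zeros((6, 6, 3))
# Keep the left column of patches from the probe, blend the middle, replace the right.
weights = np.array([[1.0, 0.5, 0.0]] * 3)
fused = fuse_patches(probe, gallery, weights)
```

Weights close to 1 preserve the probe content (and hence its label), while smaller weights let gallery content introduce variation.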
[Architecture diagram. Components: probe image, gallery image (found as visually similar), ANET, BNET, concat, embedding sub-network, deformation sub-network]
Top-1 accuracies (%) on miniImagenet [Bar chart: Baseline vs. Ours, 1-shot and 5-shot; accuracy axis from 50 to 75]
[Figure: Gaussian vs. Ours; panel labels: real probe image, deformed image, real image]
NeurIPS 2019
Falcon Hawk source: https://birdeden.com/distinguishing-between-hawks-falcons
Fine-grained Visual Recognition
• Much harder than ordinary classification.
• Data are difficult to collect.
• Crowdsourcing cannot be used.
• Expert annotators are needed.
• Demands one-shot learning.
Can we generate more data? • How about state-of-the-art GANs? • Challenge: GAN training itself needs a lot of data.
Our Idea: Fine-tune GANs trained on ImageNet. Transfer generative knowledge from one million general images to a domain-specific image. [Diagram labels: One Million General Images, BigGAN, Z; A Specific Image, ?, Z]
Fine-tune BigGAN with a single image [Figure panels: generated, original]
Technical Point: Fine-tune Batch Norm Only [Figure panels: original, fine-tune all, fine-tune BatchNorm]
Our idea: Meta-Augmentation Learning. Learning to reinforce with the original image. Original: I. Generated: G(I). Fused: wI + (1 − w)G(I). [Diagram labels: Image Fusion Net F, fusing weight w] Use meta-learning to learn the best mixing strategy to help one-shot classifiers.
Examples
Our method yields consistent improvements.
Embodied One-Shot Video Recognition: Learning from Actions of a Virtual Embodied Agent Yuqian Fu, Chengrong Wang, Yanwei Fu, Yu-Xiong Wang, Cong Bai, Xiangyang Xue, Yu-Gang Jiang ACM Multimedia 2019