yanweifu@fudan.edu.cn
http://yanweifu.github.io
One-shot Learning in Semantic Embedding and Data Augmentation
Yanwei Fu, School of Data Science, Fudan University
Object categorization
Fei-Fei et al. A Bayesian Approach to Unsupervised One-Shot Learning of Object Categories. ICCV 2003
Fei-Fei et al. One-Shot Learning of Object Categories. IEEE TPAMI 2006
One-shot learning: “learning object categories from just a few images, by incorporating ‘generic’ knowledge which may be obtained from previously learnt models of unrelated categories.”
Fu, Y.; Hospedales, T.; Xiang, T.; Gong, S. “Attribute Learning for Understanding Unstructured Social Activity”, ECCV 2012
Fu, Y.; Hospedales, T.; Xiang, T.; Gong, S. “Learning Multi-modal Latent Attributes”, IEEE TPAMI 2014
Fu et al. Semi-supervised Vocabulary-informed Learning. CVPR 2016 (oral)
Fu et al. Vocabulary-informed Zero-shot and Open-set Learning. IEEE TPAMI, to appear
Attributes: stripes, tails. Classes: zebra, horse, mule, lion.
Lampert, C. H. Learning to detect unseen object classes by between-class attribute transfer. CVPR 2009
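Attribute transfer can be sketched in a few lines: every class, seen or unseen, is described by a binary attribute signature, and a test image goes to the class whose signature best agrees with the attribute scores predicted for that image. This is a simplified scoring in the spirit of between-class attribute transfer, not the cited implementation. The signatures, the third attribute (hooves), the scores, and the function names are all illustrative assumptions.

```python
# Hypothetical binary signatures: (stripes, tail, hooves) for each class.
CLASS_ATTRIBUTES = {
    "zebra": (1, 1, 1),
    "horse": (0, 1, 1),
    "lion": (0, 1, 0),
}


def class_score(attr_probs, signature):
    """Multiply, per attribute, the probability of the value the class expects."""
    score = 1.0
    for p, a in zip(attr_probs, signature):
        score *= p if a == 1 else (1.0 - p)
    return score


def predict_class(attr_probs, class_attributes):
    """Return the class whose attribute signature scores highest."""
    return max(
        class_attributes,
        key=lambda name: class_score(attr_probs, class_attributes[name]),
    )


# attr_probs would come from attribute classifiers trained on seen classes;
# here they are made-up numbers.
print(predict_class([0.9, 0.8, 0.7], CLASS_ATTRIBUTES))
```

No zebra image is needed at training time in this sketch: only the zebra signature is used at test time.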
Dataset & Settings:
Comparisons
Event classes: birthday party, graduation, music performance, non-music performance, parade, wedding ceremony, wedding dance, wedding reception
For more results, please check our papers.
Fu et al. Semi-supervised Vocabulary-informed Learning. CVPR 2016 (oral)
Fu et al. Vocabulary-informed Zero-shot and Open-set Learning. IEEE TPAMI, to appear
[Figure: semantic labels (airplane, car, unicycle, tricycle, truck, bicycle) and their instances in the visual feature space]
Zero/one-shot learning: we have zero or one visually labeled instance of what these classes look like.
Key question: how do we define the semantic space?
Semantic spaces compared:
Semantic attributes. Type: supervised. Advantages: good interpretability of each dimension. Disadvantages: manual annotation; limited vocabulary.
Semantic word vectors (e.g. word2vec). Type: unsupervised. Advantages: good vector representation for a vocabulary of millions of words. Disadvantages: limited interpretability of each dimension.
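Whichever semantic space is chosen, recognition reduces to the same step: embed the image into that space and pick the nearest class prototype. The sketch below assumes the visual feature has already been projected into the semantic space. The three-dimensional prototype vectors are made-up stand-ins for attribute or word vectors, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical class prototypes in the semantic space.
PROTOTYPES = {
    "unicycle": [0.9, 0.1, 0.0],
    "tricycle": [0.7, 0.6, 0.1],
    "truck": [0.1, 0.2, 0.9],
}


def nearest_prototype(embedded, prototypes):
    """Label an embedded image with the most similar class prototype."""
    return max(prototypes, key=lambda name: cosine(embedded, prototypes[name]))


# "truck" may have no labeled images at all; only its prototype is required.
print(nearest_prototype([0.0, 0.1, 1.0], PROTOTYPES))
```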
Margin distribution of prototypes in the semantic space
The knowledge of margin distribution of instances, rather than a single margin across all instances, is crucial for improving the generalization performance of a classifier.
Instance margin: the distance between one instance and the separating
characterized by a Weibull distribution
The probability of x being included in the boundary estimated by x_i
Margin distribution of prototypes; coverage distribution of prototypes
Extreme Value Theorem
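One common way to turn a Weibull-shaped margin distribution into an inclusion probability is a stretched-exponential decay in the distance to the prototype. The sketch below uses that form as an assumption. The parameter names `scale` and `shape` and their values are illustrative; in practice they would be fitted from extreme (smallest) margins, and that fitting step is not reproduced here.

```python
import math


def inclusion_probability(x, prototype, scale, shape):
    """Probability that x lies inside the boundary around a prototype.

    Assumed form: exp(-(distance / scale) ** shape), which equals 1 at the
    prototype and decays as x moves away from it.
    """
    distance = math.dist(x, prototype)
    return math.exp(-((distance / scale) ** shape))


prototype = [0.0, 0.0]
for point in ([0.0, 0.0], [1.0, 0.0], [2.0, 0.0]):
    print(point, inclusion_probability(point, prototype, scale=1.0, shape=2.0))
```

A larger `scale` widens the region a prototype covers; a larger `shape` makes the boundary sharper.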
Dataset:
We can address the following tasks by learning a semantic embedding:
(vocabulary-informed)
Few-shot learning on the source dataset
One-shot learning aims to learn information about object categories from one, or
Zitian Chen, Yanwei Fu, Yinda Zhang, Yu-Gang Jiang, Xiangyang Xue, and Leonid Sigal. IEEE Transactions on Image Processing (TIP) 2019
[Figure: image feature space and semantic feature space. Labels: antelopes, killer whale, beaver, mountain goat, whale, orca, sea lion, muskrat, woodchuck, badger, hartebeest, pronghorn. Caption: "Help?"]
Image Deformation Meta-Networks for One-Shot Learning
Zitian Chen, Yanwei Fu, Yu-Xiong Wang, Lin Ma, Wei Liu, Martial Hebert
Image Block Augmentation for One-Shot Learning. Zitian Chen, Yanwei Fu, Kaiyu Chen, Yu-Gang Jiang. AAAI 2019
Visual contents from other images may be helpful to synthesize new images
Ghosted, stitched, montaged, partially occluded
Humans can learn novel visual concepts even when images undergo various deformations
Deformed images: visual contents from other images might be helpful
Motivation
1. Visual contents from other images may be helpful to synthesize new images.
2. Humans can learn novel visual concepts even when images undergo various deformations.
Approach
We design a deformation sub-network that learns to deform images by fusing a pair of images: a probe image that keeps the visual content and a gallery image that diversifies the deformations.
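The probe/gallery idea can be illustrated with a toy version in which images are small grids of pixel values. In the approach above the fusion is learned by a sub-network; here a fixed scalar weight stands in for it, and pixel-wise squared distance stands in for visual similarity. Both stand-ins, and all function names, are assumptions made for illustration.

```python
def squared_distance(image_a, image_b):
    """Sum of squared pixel differences between two same-sized images."""
    return sum(
        (pa - pb) ** 2
        for row_a, row_b in zip(image_a, image_b)
        for pa, pb in zip(row_a, row_b)
    )


def most_similar(probe, gallery):
    """Pick the gallery image closest to the probe (assumed similarity measure)."""
    return min(gallery, key=lambda image: squared_distance(probe, image))


def deform(probe, gallery_image, weight):
    """Blend probe and gallery pixels; weight = 1.0 returns the probe unchanged."""
    return [
        [weight * p + (1.0 - weight) * g for p, g in zip(row_p, row_g)]
        for row_p, row_g in zip(probe, gallery_image)
    ]


probe = [[1.0, 0.0], [0.0, 1.0]]
gallery = [
    [[0.0, 1.0], [1.0, 0.0]],
    [[1.0, 0.0], [0.0, 0.5]],
]
print(deform(probe, most_similar(probe, gallery), weight=0.5))
```

The deformed image keeps the probe's label, so each labeled probe can yield several extra training samples.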
[Figure: network architecture. Components: probe image, gallery image (found as visually similar), ANET, BNET, concat, deformation sub-network, embedding sub-network]
[Figure: top-1 accuracies (%) on miniImageNet, 1-shot and 5-shot, Baseline vs. Ours]
[Figure: Gaussian vs. Ours. Panels: real probe image, deformed image, real image]
source: https://birdeden.com/distinguishing-between-hawks-falcons
Transfer generative knowledge from one million general images to a domain-specific image.
[Figure: original vs. generated images]
Image Fusion Net F with fusing weight w
Original: I; Generated: G(I); Fused: wI + (1 − w)G(I)
Learning to reinforce with the original image
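Reinforcing a generated image with the original amounts to a convex combination of the two. The sketch below treats images as flat lists of pixel values and takes the fusing weight as an input. Restricting the weight to [0, 1] and the function name `fuse` are assumptions; in the slides the weight is produced by the image fusion network.

```python
def fuse(original, generated, weight):
    """Return weight * original + (1 - weight) * generated, pixel by pixel."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("fusing weight must lie in [0, 1]")
    return [weight * o + (1.0 - weight) * g for o, g in zip(original, generated)]


original = [1.0, 2.0]
generated = [3.0, 4.0]
print(fuse(original, generated, 0.5))
```

A weight near 1 keeps the fused image close to the original; a weight near 0 trusts the generator.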
Yuqian Fu, Chengrong Wang, Yanwei Fu, Yu-Xiong Wang, Cong Bai, Xiangyang Xue, Yu-Gang Jiang
ACM Multimedia 2019
One-Shot Learning Setting Revisited
“shooting basketball” “running”
Source Domain Target Domain
P1D-09
Learning from Actions of a Virtual Embodied Agent
Virtual environment, virtual embodied agent, virtual action videos
https://www.unrealengine.com/marketplace/en-US/store
Example actions: break dancing, throwing, waving hand
Real Target Data Virtual Source Data
http://www.sdspeople.fudan.edu.cn/fuyanwei/dataset/UnrealAction/
UnrealAction Dataset
Classical One-shot Recognition
Embodied One-Shot Recognition Domain Adaptation Transfer Recognition
Embodied One-Shot Video Recognition
[Figure: a probe video (action label c) and a gallery video segment are combined into an augmented video (action label c)]
Video Segment Augmentation Method
[Figure: video segment augmentation pipeline. A CNN model serves as the segment-level feature extractor f_θ. Probe segments P_1, P_2, ..., P_m in V_probe and gallery segments G_1, G_2, ..., G_k, ..., G_n in G_pool are compared with a distance d(·, ·) to form the semantic correlation scores matrix. For gallery segment G_k, the scores y_{k,1}, y_{k,2}, y_{k,3}, ..., y_{k,m} are convolved with a sliding window [λ_1, λ_2, λ_1] to give y'_{k,1}, y'_{k,2}, y'_{k,3}, ..., y'_{k,m}.]
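The sliding-window step on the correlation scores can be sketched as a small one-dimensional convolution followed by picking the best position. The symmetric three-tap window values, the treatment of the sequence ends (out-of-range taps are skipped), and the choice of the highest smoothed score as the position to augment are assumptions made for illustration.

```python
def smooth_scores(scores, window):
    """Slide an odd-length window over the scores, skipping out-of-range taps."""
    half = len(window) // 2
    smoothed = []
    for i in range(len(scores)):
        total = 0.0
        for j, weight in enumerate(window):
            idx = i + j - half
            if 0 <= idx < len(scores):
                total += weight * scores[idx]
        smoothed.append(total)
    return smoothed


def best_position(scores, window):
    """Index of the probe segment with the highest smoothed score."""
    smoothed = smooth_scores(scores, window)
    return max(range(len(smoothed)), key=smoothed.__getitem__)


# Made-up correlation scores between one gallery segment and five probe segments.
scores = [0.1, 0.9, 0.2, 0.8, 0.7]
window = [0.25, 0.5, 0.25]
print(smooth_scores(scores, window))
print(best_position(scores, window))
```

With these numbers the raw maximum sits at index 1, but smoothing moves the choice to index 3, where the neighboring segments also correlate well with the gallery segment.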
Framework
[Figure: training and testing pipeline. Labels: sample, stage 1, stage 2, fine-tuning, Video Segment Augmentation, n-way-k-shot, 1 query video, ProtoNet, feature extractor]
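The ProtoNet step in an n-way-k-shot episode can be sketched as follows: average the k support embeddings of each class into a prototype, then assign the query to the nearest prototype by Euclidean distance. The two-dimensional embeddings below are made-up numbers standing in for feature extractor outputs, and the function names are hypothetical.

```python
import math


def prototype(embeddings):
    """Mean of a class's support embeddings, computed per dimension."""
    count = len(embeddings)
    return [sum(column) / count for column in zip(*embeddings)]


def classify(query, support):
    """Assign the query embedding to the class with the nearest prototype."""
    prototypes = {label: prototype(embs) for label, embs in support.items()}
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


# A 2-way-2-shot episode with one query video embedding.
support = {
    "running": [[0.0, 0.0], [0.2, 0.0]],
    "throwing": [[1.0, 1.0], [1.0, 0.8]],
}
print(classify([0.1, 0.1], support))
```

Augmented videos would enter this sketch as extra support embeddings for their class, which shifts the prototype without any change to the classifier.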