One-shot Learning in Semantic Embedding and Data Augmentation


SLIDE 1

yanweifu@fudan.edu.cn

http://yanweifu.github.io

One-shot Learning in Semantic Embedding and Data Augmentation

Yanwei Fu (付彦伟), School of Data Science, Fudan University

SLIDE 2

One-shot Learning

Object categorization

Fei-Fei et al. A Bayesian Approach to Unsupervised One-Shot Learning of Object Categories. ICCV 2003.
Fei-Fei et al. One-Shot Learning of Object Categories. IEEE TPAMI 2006.

One-shot Learning:

“learning object categories from just a few images, by incorporating ‘generic’ knowledge which may be obtained from previously learnt models of unrelated categories”.

SLIDE 3

One-shot Learning by Semantic Embedding

Fu, Y.; Hospedales, T.; Xiang, T.; Gong, S. “Attribute Learning for Understanding Unstructured Social Activity”, ECCV 2012.
Fu, Y.; Hospedales, T.; Xiang, T.; Gong, S. “Learning Multi-modal Latent Attributes”, IEEE TPAMI 2014.
Fu et al. Semi-supervised Vocabulary-informed Learning. CVPR 2016 (oral).
Fu et al. Vocabulary-informed Zero-shot and Open-set Learning. IEEE TPAMI, to appear.

SLIDE 4

Attribute Learning Pipeline

Figure: shared attributes (stripes, tails) link classes such as zebra, horse, mule, and lion.

Lampert, C. H. Learning to detect unseen object classes by between-class attribute transfer. CVPR 2009
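The cited DAP pipeline can be sketched as a toy nearest-signature classifier; the attribute names, binary signatures, and predicted probabilities below are made up for illustration:

```python
import numpy as np

# Hypothetical binary attribute signatures for unseen classes
# (columns: "stripes", "tail").
signatures = {
    "zebra": np.array([1, 1]),
    "lion":  np.array([0, 1]),
}

def dap_classify(attr_probs, signatures):
    """Simplified Direct Attribute Prediction: score each unseen class by
    the log-likelihood of its attribute signature under the per-attribute
    probabilities p(a=1|x) predicted for the image."""
    best_cls, best_score = None, -np.inf
    for cls, sig in signatures.items():
        # p(a=1|x) where the signature says 1, 1 - p(a=1|x) where it says 0
        probs = np.where(sig == 1, attr_probs, 1.0 - attr_probs)
        score = np.sum(np.log(probs + 1e-12))
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls

# An image whose attribute classifiers report stripes=0.9, tail=0.8:
print(dap_classify(np.array([0.9, 0.8]), signatures))  # prints "zebra"
```

The full DAP model additionally normalizes by class attribute priors; that term is omitted here for brevity.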

SLIDE 5

Semantic Attributes in Zero/One-shot Learning

Fu, Y.; Hospedales, T.; Xiang, T.; Gong, S. “Attribute Learning for Understanding Unstructured Social Activity”, ECCV 2012.
Fu, Y.; Hospedales, T.; Xiang, T.; Gong, S. “Learning Multi-modal Latent Attributes”, IEEE TPAMI 2014.

SLIDE 6

Learning Multi-modal Latent Attributes

Fu, Y.; Hospedales, T.; Xiang, T.; Gong, S. “Attribute Learning for Understanding Unstructured Social Activity”, ECCV 2012.
Fu, Y.; Hospedales, T.; Xiang, T.; Gong, S. “Learning Multi-modal Latent Attributes”, IEEE TPAMI 2014.

SLIDE 7

Experimental Settings

Dataset & Settings:

  • USAA dataset (4 source classes, 4 target classes, multiple rounds of class splits);
  • Animals with Attributes (AwA) dataset (40 source classes; 10 target classes);

Comparisons

  • Direct: KNN/SVM from features to classes;
  • DAP: Direct Attribute Prediction [Lampert et al., CVPR 2009];
  • SVM-UD: an SVM generalization of DAP;
  • SCA: topic models in [Wang et al., CVPR 2009];
  • ST: Synthetic Transfer in [Yu et al., ECCV 2010];
SLIDE 8

Unstructured Social Activity Dataset (USAA)

  • Birthday party
  • Graduation
  • Music performance
  • Non-music performance
  • Parade
  • Wedding ceremony
  • Wedding dance
  • Wedding reception

SLIDE 9

One-shot Learning Results

For more results, please check our papers.

SLIDE 10

Vocabulary-informed Learning

Fu et al. Semi-supervised Vocabulary-informed Learning. CVPR 2016 (oral).
Fu et al. Vocabulary-informed Zero-shot and Open-set Learning. IEEE TPAMI, to appear.

SLIDE 11

Supervised Learning

Figure: semantic labels (airplane, car, unicycle, tricycle), each with many labeled instances in the visual feature space.

SLIDE 12

One-shot Learning

Figure: semantic labels (airplane, unicycle, tricycle, car), each with only one labeled instance in the visual feature space.

SLIDE 13

Zero/One-shot Learning by Semantic Embedding (Problem Definition)

Zero/one-shot learning: for new classes such as truck and bicycle, we have zero or one visually labeled instances of what they look like (semantic labels vs. visual feature space).

SLIDE 14

Learning

Figure: learning the mapping between semantic labels (airplane, car, unicycle, tricycle, truck, bicycle) and the visual feature space.

SLIDE 15

Inference

Figure: inference for novel labels (truck, bicycle) among the known classes. Key question: how do we define the semantic space?

SLIDE 16

Semantic Label Vector Spaces

  • Semantic attributes (supervised): good interpretability of each dimension; but require manual annotation and cover a limited vocabulary.
  • Semantic word vectors, e.g. word2vec (unsupervised): good vector representations for a vocabulary of millions of words; but limited interpretability of each dimension.
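With either space, zero-shot inference reduces to nearest-neighbor search among label vectors. A minimal sketch, where the toy 3-D vectors stand in for real word2vec embeddings (typically 300-D, trained on large corpora):

```python
import numpy as np

# Toy 3-D "word vectors" standing in for real word2vec embeddings.
label_vectors = {
    "truck":   np.array([1.0, 0.1, 0.0]),
    "bicycle": np.array([0.0, 1.0, 0.2]),
}

def nearest_label(projected, label_vectors):
    """Zero-shot inference sketch: project an image into the semantic
    space, then return the label whose vector has the highest cosine
    similarity with the projection."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(label_vectors, key=lambda lbl: cos(projected, label_vectors[lbl]))

print(nearest_label(np.array([0.9, 0.2, 0.1]), label_vectors))  # prints "truck"
```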

SLIDE 17
SLIDE 18

Vocabulary-Informed Recognition

Figure: an input image is mapped into the semantic space near related vocabulary (e.g. unicycle, tricycle).

Fu et al. Semi-supervised Vocabulary-informed learning, CVPR 2016 (Oral)

SLIDE 19

Estimating Density of Classes in the Space

Fu et al. Vocabulary-informed Zero-shot and Open-set Learning. IEEE TPAMI to appear

Margin distribution of prototypes in the semantic space

Knowledge of the margin distribution over instances, rather than a single margin across all instances, is crucial for improving the generalization performance of a classifier.

  • Instance margin: the distance between an instance and the separating hyperplane.
  • The distribution of the minimal values of the margin distance is characterized by a Weibull distribution (Extreme Value Theorem).
  • The probability that h(y) is included in the boundary estimated by h(y_j) turns the margin distribution of prototypes into a coverage distribution of prototypes.
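The coverage probability from the extreme-value argument can be sketched with a Weibull CDF; the shape and scale parameters below are hypothetical stand-ins for values fitted to minimal margin distances:

```python
import numpy as np

def weibull_cdf(x, shape, scale):
    """CDF of the Weibull distribution, the limiting law that extreme
    value theory assigns to minima of bounded random variables."""
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, 1.0 - np.exp(-(x / scale) ** shape), 0.0)

def inclusion_probability(dist, shape, scale):
    """Probability that a point at distance `dist` from a class prototype
    still falls inside that prototype's margin region: 1 - CDF(dist),
    so nearby points score high and distant points score low."""
    return 1.0 - weibull_cdf(dist, shape, scale)

# With hypothetical fitted parameters (shape=2, scale=1):
print(inclusion_probability(0.1, 2.0, 1.0))  # close to 1
print(inclusion_probability(3.0, 2.0, 1.0))  # close to 0
```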

SLIDE 20

Experimental Dataset and Tasks

Dataset:

  • AwA (Animals with Attributes) dataset;
  • ImageNet 2012/2010 dataset.

We can address the following tasks by learning a semantic embedding:

  • SUPERVISED recognition
  • ZERO-SHOT recognition
  • GENERAL-ZERO-SHOT recognition
  • ONE-SHOT recognition
  • OPEN-SET recognition
SLIDE 21

Experimental Settings of Few-shot Learning

  • Learning classifiers from few source training instances:
  • Source classes: one-shot recognition;
  • Target classes: zero-shot recognition;
  • Key insight: leveraging the knowledge from the semantic space (vocabulary-informed).
  • Few-shot target training instances:
  • Few-shot setting, consistent with the general definition.
SLIDE 22

Results on Few-shot Learning

Few shots on the source dataset.

SLIDE 23

Results on Few-shot Learning

SLIDE 24

One-shot learning aims to learn information about object categories from one, or only a few, training images.

Data Augmentation, Meta-Learning, Meta-Augmentation Learning

One-shot Learning by Data Augmentation

SLIDE 25

Multi-level Semantic Feature Augmentation for One-shot Learning

Zitian Chen, Yanwei Fu, Yinda Zhang, Yu-Gang Jiang, Xiangyang Xue, and Leonid Sigal. IEEE Transactions on Image Processing (TIP), 2019.

SLIDE 26

Motivation

  • A straightforward way to tackle one-shot learning is data augmentation.
  • We want to utilize the semantic space.
  • Related concepts in the semantic space help learning.

Figure: image feature space vs. semantic feature space; related classes (antelope, pronghorn, hartebeest; killer whale, orca, whale, sea lion; beaver, muskrat, woodchuck, badger; mountain goat) cluster together in the semantic space. Can they help?

SLIDE 27

Method

Figure: the method learns mappings g(y) and h(y) between the image feature space and the semantic feature space for these related classes.

SLIDE 28

Single-level

  • But we want to utilize visual concepts at different levels.
SLIDE 29

Multi-level

  • Use high-level and low-level features to help encoding.
  • Decoding semantic features back to features at different levels diversifies the augmented features.
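The encode-perturb-decode idea can be sketched with hypothetical linear maps standing in for the paper's learned encoder and decoder (dimensions and noise level are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear maps standing in for the learned encoder
# (features -> semantic space) and decoder (semantic space -> features).
D_feat, D_sem = 8, 4
W_enc = rng.normal(size=(D_sem, D_feat))
W_dec = rng.normal(size=(D_feat, D_sem))

def augment_features(x, n_aug=5, noise=0.1):
    """Semantic feature augmentation sketch: encode the single support
    feature into the semantic space, perturb it there (where related
    concepts lie close together), and decode every perturbation back
    into the feature space as a synthetic training example."""
    s = W_enc @ x                                    # encode
    perturbed = s + noise * rng.normal(size=(n_aug, D_sem))
    return perturbed @ W_dec.T                       # decode

x = rng.normal(size=D_feat)        # the one-shot support feature
print(augment_features(x).shape)   # (5, 8): five synthetic features
```

The multi-level variant simply attaches decoders at several feature levels so the synthetic examples are more diverse.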
SLIDE 30

Visualization

SLIDE 31

Image Deformation Meta-Networks for One-Shot Learning

Zitian Chen, Yanwei Fu, Yu-Xiong Wang, Lin Ma, Wei Liu, Martial Hebert

SLIDE 32

The Basic Idea of Jigsaw Augmentation Method

Image Block Augmentation for One-Shot Learning. Zitian Chen, Yanwei Fu, Kaiyu Chen, Yu-Gang Jiang. AAAI 2019

SLIDE 33

Visual content from other images may help to synthesize new images.

SLIDE 34

Ghosted, stitched, montaged, partially occluded.

Humans can learn novel visual concepts even when images undergo various deformations.

SLIDE 35

Deformed images: visual content from other images might be helpful.

SLIDE 36

Approach

SLIDE 37

Motivation: 1. Visual content from other images may help synthesize new images. 2. Humans can learn novel visual concepts even when images undergo various deformations. Approach: we design a deformation sub-network that learns to deform images by fusing a pair of images, a probe image that keeps the visual content and a gallery image that diversifies the deformations.
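A crude stand-in for the deformation sub-network, assuming its output can be approximated by per-pixel convex fusion of the pair (the real sub-network learns these weights; all shapes and values below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def deform(probe, gallery, weights):
    """Fuse a probe image (keeps the visual content) with a gallery
    image (diversifies the deformations) using per-pixel weights."""
    return weights * probe + (1.0 - weights) * gallery

probe = np.ones((4, 4))      # toy probe image
gallery = np.zeros((4, 4))   # toy gallery image
# Hypothetical weight map biased toward the probe (the real network
# predicts these weights from the image pair).
w = rng.uniform(0.5, 1.0, size=(4, 4))
print(deform(probe, gallery, w).shape)  # (4, 4)
```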

SLIDE 38

Figure: the embedding sub-network finds a visually similar gallery image for each probe image; the deformation sub-network (ANET/BNET) concatenates the probe and gallery images to produce a deformed image.

SLIDE 39

Chart: Top-1 accuracies (%) on miniImagenet for the 1-shot and 5-shot settings, baseline vs. ours (range shown: 50 to 75%).

SLIDE 40

Figure: real probe image, deformed image, and real image; our deformations compared against Gaussian noise.

SLIDE 41

NeurIPS 2019

SLIDE 42

Hawk vs. Falcon (image source: https://birdeden.com/distinguishing-between-hawks-falcons)

SLIDE 43

Fine-grained Visual Recognition

  • Much harder than normal classification.
  • Difficult to collect data.
  • Crowdsourcing can't be used.
  • Expert annotators are needed.
  • This demands one-shot learning.
SLIDE 44

Can we generate more data?

  • How about state-of-the-art GANs?
  • Challenge: GAN training itself needs a lot of data.
SLIDE 45

Our Idea: Fine-tune GANs trained on ImageNet.

BigGAN

Figure: transfer generative knowledge from one million general images (BigGAN pretraining) to a single domain-specific image.

SLIDE 46

Fine-tune BigGAN with a single image

Figure: the original image and samples generated after fine-tuning.

SLIDE 47

Technical Point: Fine-tune Batch Norm Only

Figure: generated samples from the original model, after fine-tuning all parameters, and after fine-tuning BatchNorm only.
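One way to implement BatchNorm-only fine-tuning in PyTorch (a minimal sketch, not the paper's actual training code; the toy conv+BN model stands in for a pretrained generator):

```python
import torch.nn as nn

def freeze_all_but_batchnorm(model: nn.Module):
    """Freeze every parameter, then re-enable gradients only for the
    affine parameters (weight/bias) of BatchNorm layers; BN running
    statistics still update automatically in train mode."""
    for p in model.parameters():
        p.requires_grad = False
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            for p in m.parameters():
                p.requires_grad = True
    return [p for p in model.parameters() if p.requires_grad]

# Toy stand-in for a pretrained generator: only BN weight/bias stay trainable.
net = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))
print(len(freeze_all_but_batchnorm(net)))  # 2
```

Passing only the returned parameter list to the optimizer keeps the rest of the generator fixed.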

SLIDE 48

Our idea: Meta-Augmentation Learning

Image Fusion Net F predicts the fusing weight w. Original: I; Generated: G(I); Fused: wI + (1 − w)G(I).

Use meta-learning to learn the best mixing strategy to help one-shot classifiers.

Learning to reinforce with the original image
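The fusion step itself is a convex combination of the original image and its GAN-generated variant; a minimal sketch with a fixed scalar weight standing in for the per-image weight the fusion net would predict:

```python
import numpy as np

def fuse(original, generated, w):
    """Convex combination of an original image and its GAN-generated
    variant: w * original + (1 - w) * generated. In the meta-learning
    setup the fusing weight is predicted per image; here it is a fixed
    scalar for illustration."""
    return w * original + (1.0 - w) * generated

orig_img = np.full((2, 2), 1.0)       # stand-in original image
gen_img = np.zeros((2, 2))            # stand-in generated image
print(fuse(orig_img, gen_img, 0.75))  # every entry 0.75
```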

SLIDE 49

Examples

SLIDE 50

Our method shows consistent improvements.

SLIDE 51

Embodied One-Shot Video Recognition: Learning from Actions of a Virtual Embodied Agent

Yuqian Fu, Chengrong Wang, Yanwei Fu, Yu-Xiong Wang, Cong Bai, Xiangyang Xue, Yu-Gang Jiang

ACM Multimedia 2019

SLIDE 52

One-Shot Learning Setting Revisited

Figure: visually similar clips labeled “shooting basketball” (source domain) and “running” (target domain).

  • Quite similar video clips may appear in both source and target classes.

P1D-09

SLIDE 53

Embodied One-Shot Video Recognition

SLIDE 54

Learning from Actions of a Virtual Embodied Agent

Figure: virtual environment → virtual embodied agent → virtual action videos.

https://www.unrealengine.com/marketplace/en-US/store

  • Learning from actions of virtual embodied agents to address the limitations.


SLIDE 55

Figure: virtual source data vs. real target data for action classes such as break dancing, throwing, and waving hand.

http://www.sdspeople.fudan.edu.cn/fuyanwei/dataset/UnrealAction/

UnrealAction Dataset

  • 14 action classes.
  • each class has 100 virtual videos and 10 real videos.


SLIDE 56

Embodied One-Shot Video Recognition

Figure: source/target class configurations compared across classical one-shot recognition, embodied one-shot recognition, domain adaptation, and transfer recognition.

SLIDE 57

Figure: a probe video and a gallery video sharing action label c are combined into a segment-augmented video.

Video Segment Augmentation Method

  • Subliminal advertising experiments.
  • Augmenting videos by replacing short segments.
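The replacement step can be sketched on a toy array of per-frame features (frame counts, feature dimensions, and segment indices below are illustrative):

```python
import numpy as np

def augment_video(probe, gallery, start, length):
    """Video segment augmentation sketch: copy the probe video (an
    array of per-frame features) and replace `length` consecutive
    frames, starting at `start`, with the same-class gallery video's
    frames."""
    out = probe.copy()
    out[start:start + length] = gallery[start:start + length]
    return out

probe = np.zeros((8, 4))     # 8 frames with toy 4-D frame features
gallery = np.ones((8, 4))    # a gallery video with the same action label
aug = augment_video(probe, gallery, start=3, length=2)
print(aug[:, 0])  # frames 3 and 4 now come from the gallery video
```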


SLIDE 58

Method

SLIDE 59

Video Segment Augmentation Method

Figure: a CNN segment-level feature extractor embeds the probe segments in W_probe and the gallery segments in the pool H_pool; a semantic correlation score matrix between probe and gallery segments, computed with a sliding window, determines which gallery segments are spliced into the probe video.

SLIDE 60

Framework

Figure: stage 1 trains a ProtoNet feature extractor on the base set E_base with video segment augmentation (gallery segments sampled from the pool H_pool); stage 2 fine-tunes on the novel set E_novel, again with video segment augmentation; testing samples n-way-k-shot episodes with one query video.
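The prototype-based classification used at test time can be sketched as follows (toy 2-D embeddings for a 2-way, 2-shot episode; class names are illustrative):

```python
import numpy as np

def protonet_classify(query, support, labels):
    """Prototypical-network inference sketch: each class prototype is
    the mean of its support embeddings, and the query receives the
    label of the nearest prototype (Euclidean distance)."""
    protos = {c: np.mean([s for s, l in zip(support, labels) if l == c], axis=0)
              for c in set(labels)}
    return min(protos, key=lambda c: np.linalg.norm(query - protos[c]))

support = [np.array([0.0, 0.0]), np.array([0.2, 0.0]),
           np.array([1.0, 1.0]), np.array([1.2, 1.0])]
labels = ["waving", "waving", "throwing", "throwing"]
print(protonet_classify(np.array([0.1, 0.1]), support, labels))  # "waving"
```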

SLIDE 61

Thanks very much!

yanweifu@fudan.edu.cn