

SLIDE 1

12/5/2019

Learning Structured Visual Concepts with Few-shot Supervision

Xuming He 何旭明

ShanghaiTech University

hexm@shanghaitech.edu.cn

SLIDE 2

Outline

• Introduction
  • Learning from very limited annotated data
• Background in few-shot learning
  • Few-shot classification
  • Meta-learning framework
• Towards few-shot representation learning in vision tasks
  • Spatio-temporal patterns in videos [CVPR 2018]
  • Visual object & task representation [AAAI 2019]
• Summary and future directions

SLIDE 3

Introduction

• Data-driven visual scene understanding
  • Deep neural networks require large amounts of annotated data

(Figure: semantic segmentation; instance segmentation & detection; depth estimation; image-level description)

SLIDE 4

Real-world scenarios

• Data annotation is costly
• Many domain-specific and cross-modality tasks
• Visual concept learning in the wild

(Figures: medical image understanding, image credit: 廖飞, 胰腺影像学 (Pancreatic Imaging), 2015; biological image analysis (Zhang and He, 2019); vision & language on MSCOCO (Liu et al., CVPR 2019))

SLIDE 5

Challenges

• Limitations of naïve transfer learning
  • Insufficient instance variations of novel classes
  • Fine-tuning usually fails given a few examples per class
• Human (child) performance is much better
  • How do we achieve such data efficiency?
  • What representations are used?
  • What are the underlying learning algorithms?

Image Credit: Ravi & Larochelle, 2017

SLIDE 6

Main intuitions in few-shot learning

• Prior knowledge in different vision tasks
  • Similarity between visual categories: feature representations, etc.
  • Similarity between visual recognition tasks: learning a classifier, etc.
• Focusing on generic aspects of similar tasks
  • Generic visual representations (not category-specific)
  • Transferable learning strategies (very data-efficient)

(Figure: Task 1, Task 2)

SLIDE 7

Outline

• Introduction
  • Learning from very limited annotated data
• Background in few-shot learning
  • Few-shot classification
  • Meta-learning framework
• Towards few-shot representation learning in vision tasks
  • Spatio-temporal patterns in videos [CVPR 2018]
  • Visual object & task representation [AAAI 2019]
• Summary and future directions

SLIDE 8

Few-shot learning problem

• Learning from (very) limited annotated data
• Typical setting:
  • Classification using a few training examples per visual category
  • Formally, given a small dataset D
  • N categories; K-shot: each class has K examples
  • The goal is to learn a model F, parametrized by θ, that minimizes the empirical loss on D

Image Credit: Weng, Lil-log, 2018
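The N-way K-shot setup above can be sketched as an episode sampler over data grouped by class; a minimal numpy sketch (function and variable names are illustrative, not from the talk):

```python
import numpy as np

def sample_episode(data_by_class, n_way=5, k_shot=1, n_query=15, rng=None):
    """Sample one N-way K-shot episode: K support and n_query query
    examples for each of N randomly chosen classes."""
    rng = rng or np.random.default_rng()
    classes = rng.choice(sorted(data_by_class), size=n_way, replace=False)
    support, query = [], []
    for label, c in enumerate(classes):
        examples = rng.permutation(data_by_class[c])
        support += [(x, label) for x in examples[:k_shot]]
        query += [(x, label) for x in examples[k_shot:k_shot + n_query]]
    return support, query
```

Labels are re-indexed 0..N-1 per episode, which is the usual convention: the model must classify among the sampled classes, not the global ones.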

SLIDE 9

Few-shot learning problem

• For a single isolated task, this is difficult
• But if we have access to many similar few-shot learning tasks, we can exploit such prior knowledge
• Main idea: consider task-level learning
  • Learn a representation shared by all those tasks
  • Learn an efficient classifier-learning algorithm that can be applied to all the tasks

Image Credit: Weng, Lil-log, 2018

SLIDE 10

Meta-learning framework

• Problem formulation
  • Each few-shot classification problem is a task
  • Each task (an episode) consists of a task-train (support) set and a task-test (query) set
  • For each task, we adopt a learning algorithm
    • to learn its own classifier from the support set
    • to perform well on the task-test set

SLIDE 11

Meta-learning formulation

• Key assumptions:
  • The learning algorithm is shared across tasks
  • We can sample many tasks to learn a good algorithm parameter θ
• A meta-learning strategy
  • Input: meta-training set
  • Output: algorithm parameter θ
  • Objective: good performance on the meta-test set
  • Minimize the empirical loss on the meta-training set; each meta-train task contributes one term of this loss
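The meta-objective above can be sketched as a generic loop: sample a task, let the shared algorithm `learn(theta, support)` build a task-specific model, score it on the query set, and update θ to reduce that query loss. This toy sketch substitutes a finite-difference gradient for backpropagation; all names are illustrative:

```python
import numpy as np

def meta_train(sample_task, learn, loss, theta, steps=300, lr=0.05, eps=1e-4):
    """Generic meta-training loop: theta parametrises the shared learning
    algorithm; it is updated to minimize the query-set loss of the models
    the algorithm produces (finite differences stand in for backprop)."""
    for _ in range(steps):
        support, query = sample_task()
        grad = np.zeros_like(theta)
        for i in range(theta.size):
            hi, lo = theta.copy(), theta.copy()
            hi[i] += eps
            lo[i] -= eps
            grad[i] = (loss(learn(hi, support), query)
                       - loss(learn(lo, support), query)) / (2 * eps)
        theta = theta - lr * grad
    return theta
```

With a scalar toy learner (predict θ times the support mean) and a squared query loss, the loop recovers the θ that makes the adapted model fit the query set.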

SLIDE 12

Meta-learning formulation

• Analogy to standard supervised learning

Image Credit: Ravi & Larochelle, 2017

SLIDE 13

Overview of existing methods

• Methods are categorized by the meta-learner used in few-shot tasks

Slide Credit: Vinyals, NIPS 2017

SLIDE 14

Metric-based methods

• Basic idea: learn a generic distance metric
• Typical methods
  • Siamese network (Koch, Zemel & Salakhutdinov, 2015)
  • Matching network (Vinyals et al., 2016)
  • Relation network (Sung et al., 2018)
  • Prototypical network (Snell, Swersky & Zemel, 2017)
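As a concrete instance of the metric-based idea, the prototypical-network decision rule (Snell, Swersky & Zemel, 2017) classifies a query by the nearest class mean in embedding space; a minimal numpy sketch over precomputed embeddings:

```python
import numpy as np

def prototype_classify(support_feats, support_labels, query_feats):
    """Nearest-prototype rule: each class prototype is the mean of its
    support embeddings; each query is assigned to the class whose
    prototype is closest in squared Euclidean distance."""
    classes = np.unique(support_labels)
    protos = np.stack([support_feats[support_labels == c].mean(axis=0)
                       for c in classes])
    dists = ((query_feats[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return classes[dists.argmin(axis=1)]
```

In the full method the embeddings come from a learned network and the negative distances are softmaxed into class probabilities; the rule above is the inference step only.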

SLIDE 15

Optimization-based methods

• Basic idea: adjust the optimization in model learning so that the model can effectively learn from a few examples
• Typical methods
  • LSTM meta-learner (Ravi & Larochelle, 2017)
  • MAML (Finn et al., 2017)
  • Reptile (Nichol, Achiam & Schulman, 2018)
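Of these, Reptile is the simplest to sketch: adapt a copy of the initialization with a few gradient steps on one task, then move the initialization toward the adapted weights. A toy numpy sketch (the inner gradient function is illustrative):

```python
import numpy as np

def reptile_step(theta, support, inner_grad, inner_lr=0.1, inner_steps=5,
                 meta_lr=0.1):
    """One Reptile meta-update (Nichol, Achiam & Schulman, 2018): run a few
    SGD steps on the task to get adapted weights phi, then interpolate the
    initialization theta toward phi."""
    phi = theta.copy()
    for _ in range(inner_steps):
        phi = phi - inner_lr * inner_grad(phi, support)
    return theta + meta_lr * (phi - theta)
```

Repeated over many sampled tasks, this drives θ toward an initialization from which a few gradient steps suffice on any one task.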

SLIDE 16

Model-based methods

• Basic idea: build a neural network with a specific architecture for fast learning
• Typical methods
  • Memory-augmented network (Santoro et al., 2016)
  • Meta networks (Munkhdalai & Yu, 2017)
  • SNAIL (Mishra et al., 2018)

SLIDE 17

Main limitations

• A global representation of inputs
  • Sensitive to nuisance parameters: background clutter, occlusions, etc.
• Mixed representation and predictor learning
  • Complex architecture, difficult to interpret
  • Sometimes slow convergence
• Focusing on classification tasks
  • Non-trivial to apply to other vision tasks: localization, segmentation, etc.

SLIDE 18

Our proposed solutions

• Structure-aware data representation
  • Spatial/temporal representations for semantic objects/actions
• Decoupling representation and classifier learning
  • Improving representation learning
• Generalizing to other visual tasks
  • Instance localization and detection with few-shot learning

SLIDE 19

Outline

• Introduction
  • Learning from very limited annotated data
• Background in few-shot learning
  • Few-shot classification
  • Meta-learning framework
• Towards few-shot representation learning in vision tasks
  • Spatio-temporal patterns in videos [CVPR 2018]
  • Visual object & task representation [AAAI 2019]
• Summary and future directions

SLIDE 20

Temporal action localization

• Our goal: jointly classify action instances and localize them in an untrimmed video
  • Important for detailed video understanding
  • Broad range of applications in video surveillance and analytics

SLIDE 21

Our problem setting

• We formulate an example-based action localization strategy
  • Few-shot learning of action classes
  • Sensitivity to action boundaries

(Figure: Few-shot Action Localization Network)

SLIDE 22

Main ideas

• Meta-learning problem formulation
  • Learning how to transfer the labels of a few action examples to a test video
• Encode each action instance into a structured representation
• Learn to match (partial) action instances
• Exploit the matching correlation scores

SLIDE 23

Overview of our method

SLIDE 24

Video encoder network

• Embed an action video into a segment-based representation
  • Maintains its temporal structure
  • Allows partial matching between two actions
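One way to sketch such a segment-based encoding: average-pool a variable-length sequence of per-frame features into a fixed number of ordered chunks. The uniform-chunking scheme here is an illustrative stand-in for the paper's video encoder, not its exact architecture:

```python
import numpy as np

def segment_encode(frame_feats, n_segments=4):
    """Encode a (T, D) sequence of frame features into a fixed (n_segments, D)
    representation by average-pooling uniform temporal chunks, preserving
    their order so that partial matching over segments remains possible."""
    chunks = np.array_split(frame_feats, n_segments)
    return np.stack([c.mean(axis=0) for c in chunks])
```

Because segments keep their temporal order, two actions can be compared segment-by-segment rather than only as whole clips.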

SLIDE 25

Similarity Network

• Generate a matching score between the labeled examples (support set) and a test window

SLIDE 26

Similarity Network

• Full context embedding (FCE)
  • Captures the context of the entire support set and enriches the action representations

SLIDE 27

Similarity Network

• Similarity scores
  • Cosine distance between two action instances
  • Nearest neighbor works for classification, but what about localization?
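The cosine matching score between embedded action instances can be sketched as follows (shapes are illustrative):

```python
import numpy as np

def cosine_scores(query, support):
    """Cosine similarity between one query embedding of shape (D,) and each
    row of a support matrix of shape (N, D); higher means a better match."""
    q = query / np.linalg.norm(query)
    s = support / np.linalg.norm(support, axis=1, keepdims=True)
    return s @ q
```

Taking the nearest neighbor over these scores gives a class label; localization additionally needs the pattern of scores across time, which motivates the labeling network.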

SLIDE 28

Labeling network

• Cache correlation scores over sliding windows
• Exploit patterns in the score matrix to predict the locations
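The sliding-window scoring step can be sketched as follows: slide a window the length of the labeled example over the video's segment features and cache one correlation score per position. The dot-product score is an illustrative stand-in for the similarity network's output:

```python
import numpy as np

def window_scores(video_segments, example_segments, stride=1):
    """Score each sliding-window position of the video against a labeled
    example; peaks in the resulting score trajectory indicate candidate
    action locations."""
    win = len(example_segments)
    scores = []
    for start in range(0, len(video_segments) - win + 1, stride):
        window = video_segments[start:start + win]
        scores.append(float((window * example_segments).sum()))
    return np.array(scores)
```

The labeling network then reads patterns in this cached score matrix rather than thresholding a single score.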

SLIDE 29

Matching examples


(Figure: matching score trajectories)

SLIDE 30

Meta-learning strategy

• Meta-training phase
  • Meta-training set with task-train (support) and task-test (query) splits
  • Loss function
• Our loss function
  • Localization loss: foreground vs. background (cross-entropy)
  • Classification loss: action class (log loss)
  • Ranking loss: replaces the localization loss to encourage partial alignment
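A sketch of how the classification and localization terms combine; the cross-entropy forms and the unit weight are illustrative, and the ranking-loss variant mentioned above is omitted:

```python
import numpy as np

def episode_loss(cls_probs, cls_label, fg_probs, fg_labels, w_loc=1.0):
    """Classification log-loss on the action class plus a per-window
    foreground/background cross-entropy localization loss; w_loc is an
    illustrative weighting hyperparameter."""
    cls_loss = -np.log(cls_probs[cls_label] + 1e-9)
    loc_loss = -np.mean(fg_labels * np.log(fg_probs + 1e-9)
                        + (1 - fg_labels) * np.log(1 - fg_probs + 1e-9))
    return cls_loss + w_loc * loc_loss
```

Perfect predictions on both heads drive the loss to zero; errors in either head raise it.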

SLIDE 31

Experimental evaluation

• Few-shot performance summary
  • ~80 classes for meta-training and ~20 for meta-test

(Table: fully supervised vs. few-shot results on Thumos14 and ActivityNet)

SLIDE 32

Ablation study

• Effect of the similarity net
• Effect of temporal structure

SLIDE 33

Outline

• Introduction
  • Learning from very limited annotated data
• Background in few-shot learning
  • Few-shot classification
  • Meta-learning framework
• Towards few-shot representation learning in vision tasks
  • Spatio-temporal patterns in videos [CVPR 2018]
  • Visual object & task representation [AAAI 2019]
• Summary and future directions

SLIDE 34

Task: Few-shot image classification

• Our goal: an efficient modular meta-learner for visual concepts
  • A better image representation
  • An easy-to-interpret encoding method for the support set

Image Credit: Ravi & Larochelle, 2017

SLIDE 35

Main idea

• Exploiting attention mechanisms in representation learning
  • Spatial attention to localize the foreground object
  • Task attention to encode the task context for label prediction

SLIDE 36

Main idea

• Exploiting attention mechanisms in representation learning
  • Recurrent attention to refine the representation

SLIDE 37

Dual-attention structure

• Spatial attention
  • Extracting relevant features on the conv feature maps (with attention-weighted pooling)
  • Using the test image feature as the query
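The spatial-attention step can be sketched as softmax weighting over the locations of a conv feature map, keyed by the test-image feature, followed by weighted pooling. The scaled dot-product scoring is an assumption of this sketch, not necessarily the paper's exact form:

```python
import numpy as np

def spatial_attention(conv_map, query_feat):
    """Attend over an (H, W, C) conv feature map using a (C,) query feature:
    softmax over the H*W locations, then attention-weighted pooling into a
    single (C,) feature focused on query-relevant regions."""
    h, w, c = conv_map.shape
    flat = conv_map.reshape(-1, c)                   # (H*W, C) location features
    logits = flat @ query_feat / np.sqrt(c)          # relevance of each location
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                         # softmax over locations
    return weights @ flat                            # pooled (C,) feature
```

Locations whose features align with the query dominate the pooled representation, which is what lets the module suppress background clutter.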

SLIDE 38

Dual-attention structure

• Task attention
  • Encoding the support set by selecting relevant training examples

(Figure: support-set representation)
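The selection idea can be sketched as softmax attention over support examples, keyed by the test-image feature; the scaled dot-product scoring and one-hot label aggregation here are illustrative assumptions:

```python
import numpy as np

def task_attention(query_feat, support_feats, support_onehot):
    """Weight each support example by its relevance to the query feature,
    then return the attention-weighted combination of their one-hot labels,
    i.e. a soft label distribution for the query."""
    logits = support_feats @ query_feat / np.sqrt(query_feat.size)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                 # softmax over support examples
    return weights @ support_onehot
```

Because the weights are explicit per support example, this encoding is easy to inspect: one can see which training examples drove a given prediction.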

SLIDE 39

Dual-attention structure

• Recurrent attention
  • Refining the task-test (query image) features with the support set

SLIDE 40

Network architecture

SLIDE 41

Example results

SLIDE 42

Example results

SLIDE 43

A hybrid loss function

• Standard meta-learning loss + global classification loss
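A sketch of this hybrid objective: episodic N-way cross-entropy plus a standard cross-entropy over the full set of meta-training classes, with an illustrative mixing weight `lam` (not necessarily the paper's exact formulation):

```python
import numpy as np

def cross_entropy(logits, label):
    """Softmax cross-entropy for a single example, numerically stabilized."""
    z = logits - logits.max()
    return float(np.log(np.exp(z).sum()) - z[label])

def hybrid_loss(episode_logits, episode_label,
                global_logits, global_label, lam=0.5):
    """Standard meta-learning (episodic) loss plus a global classification
    loss over all meta-training classes, combined with weight lam."""
    return (cross_entropy(episode_logits, episode_label)
            + lam * cross_entropy(global_logits, global_label))
```

The global term gives the feature extractor a conventional supervised signal over all classes, which helps when training from scratch, while the episodic term keeps the model aligned with the few-shot test protocol.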

SLIDE 44

Validation of hybrid loss

• We train our models from scratch (no pre-training)

SLIDE 45

Quantitative results

• MiniImageNet:
  • 80 classes for meta-training and 20 for meta-test
  • Roughly 100K tasks for training and 1K for test

SLIDE 46

Quantitative results

• MiniImageNet:
  • 80 classes for meta-training and 20 for meta-test
  • Roughly 100K tasks for training and 1K for test

SLIDE 47

Failure cases

• Large variations in scale/viewpoint

SLIDE 48

Research questions I

• Task similarity
  • A new benchmark: Meta-CIFAR100

SLIDE 49

Research questions I

• Preliminary results on Meta-CIFAR100
  • Task similarity plays a key role in few-shot performance

SLIDE 50

Research questions II

• From few-shot to low-shot learning
  • Novel classifiers: incremental few-shot learning
  • How do we exploit unlabeled data?

SLIDE 51

Summary and future directions

• Few-shot visual concept learning
  • Structured representation is important
  • Modularized, interpretable network design
  • Extension to multiple vision tasks
• Future directions
  • Studying the impact of different task distributions
  • Connecting few-shot learning to continual learning
  • Exploring few-shot learning in real-world applications

SLIDE 52

Acknowledgement

• PhD students
  • Hongtao Yang @ANU
  • Songyang Zhang and Shipeng Yan @ShanghaiTech

Thank You & Questions!