Learning Structured Visual Concepts with Few-shot Supervision
Xuming He 何旭明
ShanghaiTech University
hexm@shanghaitech.edu.cn
Introduction
Learning from very limited annotated data
Background in few-shot learning
Few-shot classification
Meta-learning framework
Towards few-shot representation learning in vision tasks
Spatio-temporal patterns in videos [CVPR 2018]
Visual object & task representation [AAAI 2019]
Summary and future directions
Data-driven visual scene understanding
Deep neural networks require large amounts of annotated data
Example tasks: semantic segmentation, instance segmentation & detection, depth estimation, image-level description
Data annotation is costly
Many domain-specific and cross-modality tasks
Visual concept learning in the wild
Medical image understanding (image credit: Liao Fei, Pancreatic Imaging, 2015); biological image analysis (Zhang and He, 2019); vision & language (MSCOCO) (Liu et al., CVPR 2019)
Limitations of naïve transfer learning
Insufficient instance variations of novel classes
Fine-tuning usually fails given a few examples per class
Human (child) performance is much better
How do we achieve such data efficiency? What representations are used? What are the underlying learning algorithms?
Image credit: Ravi & Larochelle, 2017
Prior knowledge in different vision tasks
Similarity between visual categories
Feature representations, etc.
Similarity between visual recognition tasks
Learning a classifier, etc.
Focusing on generic aspects of similar tasks
Generic visual representations
Not category-specific
Transferrable learning strategies
Very data-efficient
Introduction
Learning from very limited annotated data
Background in few-shot learning
Few-shot classification
Meta-learning framework
Towards few-shot representation learning in vision tasks
Spatio-temporal patterns in videos [CVPR 2018]
Visual object & task representation [AAAI 2019]
Summary and future directions
Learning from (very) limited annotated data
Typical setting: classification using a few training examples per visual category
Formally, we are given a small dataset $D = \{(x_i, y_i)\}_{i=1}^{NK}$ with N categories
K-shot: each class has only K labeled examples, so $|D| = NK$
The goal is to learn a model $F_\theta$, parametrized by $\theta$, that minimizes the classification loss on new test examples from the same N categories
Image credit: Lilian Weng, Lil'Log, 2018
For a single isolated task, this is difficult
But if we have access to many similar few-shot learning tasks, we can learn how to learn across them
The main idea is to consider task-level learning:
Learn a representation shared by all those tasks
Learn an efficient classifier-learning algorithm that can be applied to new tasks
Image credit: Lilian Weng, Lil'Log, 2018
Problem formulation
Treat each few-shot classification problem as a task
Each task (or episode) consists of a task-train (support) set and a task-test (query) set
For each task, we adopt a learning algorithm to learn a task-specific classifier from its support set, such that the classifier performs well on the task-test set (a minimal episode-sampling sketch follows)
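To make the task/episode structure concrete, here is a minimal sketch (not taken from the talk) of how a single N-way K-shot episode could be sampled from a labeled image pool; the function and argument names are illustrative assumptions.

import random
from collections import defaultdict

def sample_episode(dataset, n_way=5, k_shot=1, n_query=15):
    """Sample one N-way K-shot episode: a support (task-train) set and a
    query (task-test) set. `dataset` is assumed to be a list of
    (image, label) pairs; names and sizes here are illustrative."""
    by_class = defaultdict(list)
    for image, label in dataset:
        by_class[label].append(image)

    classes = random.sample(list(by_class), n_way)               # pick N categories
    support, query = [], []
    for task_label, cls in enumerate(classes):
        examples = random.sample(by_class[cls], k_shot + n_query)
        support += [(x, task_label) for x in examples[:k_shot]]  # K shots per class
        query += [(x, task_label) for x in examples[k_shot:]]    # held-out task-test examples
    return support, query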
Key assumptions:
The learning algorithm is shared across tasks
We can sample many tasks to learn a good shared algorithm
A meta-learning strategy
Input: meta-training set (a collection of sampled tasks)
Output: the algorithm parameter $\theta$
Objective: good performance on the meta-test set, achieved by minimizing the empirical loss on the meta-training set
Each meta-train task contributes one term to this empirical loss (written out below)
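One standard way to write this objective (a sketch consistent with the notation above, not copied from the slides), with $\theta$ here denoting the shared parameter of the learning algorithm $\mathcal{A}_\theta$, and $\mathcal{T}$ a sampled task with support set $D^{\mathrm{train}}_{\mathcal{T}}$ and query set $D^{\mathrm{test}}_{\mathcal{T}}$:

$$\theta^{*} = \arg\min_{\theta} \; \mathbb{E}_{\mathcal{T} \sim p(\mathcal{T})} \Big[ \mathcal{L}\big(\mathcal{A}_{\theta}(D^{\mathrm{train}}_{\mathcal{T}}),\; D^{\mathrm{test}}_{\mathcal{T}}\big) \Big]$$

Here $\mathcal{A}_{\theta}(D^{\mathrm{train}}_{\mathcal{T}})$ is the classifier produced by the shared algorithm on the task's support set, and the loss is evaluated on the task-test set; in practice the expectation is replaced by an average over the sampled meta-training tasks.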
Analogy to standard supervised learning
Image credit: Ravi & Larochelle, 2017
Depending on the meta-learner used in few-shot tasks, methods are commonly grouped into metric-based, optimization-based, and model-based approaches
Slide Credit: Vinyals, NIPS 2017
Basic idea: Learn a generic distance metric between query examples and the labeled support examples (a minimal sketch follows the method list below)
Typical methods
Siamese network (Koch, Zemel & Salakhutdinov, 2015)
Matching network (Vinyals et al., 2016)
Relation network (Sung et al., 2018)
Prototypical network (Snell, Swersky & Zemel, 2017)
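As a concrete instance of the metric-based idea, the sketch below scores query images by their distance to class prototypes, in the spirit of Prototypical Networks; it assumes PyTorch, omits the embedding network that produces the features, and the exact distance function varies across the methods listed above.

import torch

def prototypical_logits(support_feats, support_labels, query_feats, n_way):
    """support_feats: [N*K, D] support embeddings, support_labels: [N*K] in {0..N-1},
    query_feats: [Q, D]. Returns [Q, N] scores (negative distances)."""
    prototypes = torch.stack([
        support_feats[support_labels == c].mean(dim=0)   # class prototype = mean support embedding
        for c in range(n_way)
    ])                                                   # [N, D]
    dists = torch.cdist(query_feats, prototypes)         # [Q, N] Euclidean distances
    return -dists                                        # closer prototype -> higher score; feed to softmax / cross-entropy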
Basic idea: Adjust the optimization in model learning so that the model can be adapted to a new task from only a few examples (a minimal sketch follows the method list below)
Typical methods
LSTM meta-learner (Ravi & Larochelle, 2017)
MAML (Finn et al., 2017)
Reptile (Nichol, Achiam & Schulman, 2018)
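The optimization-based idea can be sketched as an inner adaptation step followed by an outer (meta) loss, roughly in the style of MAML; this assumes a recent PyTorch with torch.func and shows a single inner step only, so it is a simplification rather than the implementation used in any of the papers above.

import torch
from torch.func import functional_call

def maml_meta_loss(model, loss_fn, support, query, inner_lr=0.01):
    """support/query are (inputs, labels) batches for one task."""
    params = dict(model.named_parameters())
    x_s, y_s = support
    x_q, y_q = query

    # Inner loop: one gradient step on the task-train (support) set.
    inner_loss = loss_fn(functional_call(model, params, (x_s,)), y_s)
    grads = torch.autograd.grad(inner_loss, list(params.values()), create_graph=True)
    adapted = {name: p - inner_lr * g for (name, p), g in zip(params.items(), grads)}

    # Outer objective: how well the adapted parameters do on the task-test (query) set.
    # Backpropagating this loss updates the shared initialization across tasks.
    return loss_fn(functional_call(model, adapted, (x_q,)), y_q)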
Basic idea: Build a neural network whose specific architecture (e.g., an external memory) is designed for rapid learning from a few examples (a minimal sketch of the memory read follows the method list below)
Typical methods
Memory-augmented network (Santoro et al., 2016)
Meta networks (Munkhdalai & Yu, 2017)
SNAIL (Mishra et al., 2018)
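The common ingredient of these model-based methods is a network that stores support-set information in some form of memory and retrieves it when classifying a query. The sketch below (assuming PyTorch) shows only a soft attention read over memory slots, leaving out the learned write/update mechanism that the actual architectures rely on.

import torch
import torch.nn.functional as F

def memory_read(query, mem_keys, mem_values):
    """query: [D], mem_keys: [M, D], mem_values: [M, V].
    Dot-product attention over memory slots; returns the retrieved value [V]."""
    weights = F.softmax(mem_keys @ query, dim=0)   # [M] relevance of each memory slot
    return weights @ mem_values                    # weighted read from memory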
Limitations of existing approaches
A global representation of inputs
Sensitive to nuisance factors such as background clutter
Mixed representation and predictor learning
Complex architectures, difficult to interpret
Sometimes slow convergence
Focusing on classification tasks
Non-trivial to apply to other vision tasks: localization, detection, etc.
Structure-aware data representation
Spatial/temporal representations for semantic objects/actions
Decoupling representation and classifier learning
Improving representation learning
Generalizing to other visual tasks
Instance localization and detection with few-shot learning
Introduction
Learning from very limited annotated data
Background in few-shot learning
Few-shot classification
Meta-learning framework
Towards few-shot representation learning in vision tasks
Spatio-temporal patterns in videos [CVPR 2018]
Visual object & task representation [AAAI 2019]
Summary and future directions
Our goal: Jointly classify action instances and localize them temporally in untrimmed videos
Important for detailed video understanding
Broad range of applications in video surveillance/analytics
We formulate an example-based action localization problem, which requires
Few-shot learning of action classes
Being sensitive to action boundaries
Few-shot Action Localization Network
Meta-learning problem formulation
Learning how to transfer the labels of a few action examples to an untrimmed query video
Encode action instances into a structured representation
Learn to match (partial) action instances
Exploit the matching correlation scores
Embed an action video into a segment-based representation
Maintains its temporal structure
Allows partial matching between two actions
Generate a matching score between labeled examples and the query video
Full context embedding (FCE)
Capture context of the entire support set and enrich the action representation
Similarity scores
Cosine distance between two action instances
Nearest neighbor works for classification, but what about localization?
Cache correlation scores for sliding windows over the query video
Exploit patterns in the score matrix to predict the temporal boundaries of the action (a simplified sketch follows)
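A simplified sketch of this matching-based localization step (assuming PyTorch): cosine scores between the segments of the untrimmed query video and a support example are cached in a matrix, then aggregated over sliding windows. The actual model uses a learned similarity network and a trained predictor over the score matrix rather than the fixed cosine / max-mean aggregation shown here.

import torch
import torch.nn.functional as F

def segment_score_matrix(query_segs, support_segs):
    """query_segs: [T, D], support_segs: [S, D] segment embeddings (illustrative shapes).
    Returns the [T, S] matrix of cosine correlation scores."""
    q = F.normalize(query_segs, dim=1)
    s = F.normalize(support_segs, dim=1)
    return q @ s.t()

def window_scores(score_matrix, win_len):
    """Aggregate cached scores over sliding windows along the query timeline;
    peaks indicate likely locations of the action instance."""
    per_segment = score_matrix.max(dim=1).values           # best match for each query segment
    return per_segment.unfold(0, win_len, 1).mean(dim=1)   # one score per window of length win_len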
Matching score trajectories
Meta-training phase
Meta-training set
Task-train (support) set
Task-test (query) set
Loss function
Our loss function
Localization loss: foreground vs. background (cross entropy)
Classification loss: action class (log loss)
Ranking loss: can replace the localization loss to better handle partially overlapping windows (overall form sketched below)
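Put together, the training objective for each task has the general form below (a sketch; $\lambda$ is a balancing weight, and the exact weighting and the form of the ranking variant follow the paper rather than the slide):

$$\mathcal{L} \;=\; \mathcal{L}_{\mathrm{cls}} + \lambda\, \mathcal{L}_{\mathrm{loc}} \qquad \text{or} \qquad \mathcal{L} \;=\; \mathcal{L}_{\mathrm{cls}} + \lambda\, \mathcal{L}_{\mathrm{rank}}$$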
Few-shot performance summary
~80 classes for meta-training and ~20 for meta-test
Results on THUMOS14 and ActivityNet: fully supervised baselines vs. the few-shot setting
Ablations: effect of the similarity net; effect of temporal structure
Introduction
Learning from very limited annotated data
Background in few-shot learning
Few-shot classification
Meta-learning framework
Towards few-shot representation learning in vision tasks
Spatio-temporal patterns in videos [CVPR 2018]
Visual object & task representation [AAAI 2019]
Summary and future directions
Our goal: An efficient modular meta-learner for few-shot visual recognition
A better image representation
An easy-to-interpret encoding method for the support set
Image credit: Ravi & Larochelle, 2017
Exploiting attention mechanisms in representation learning
Spatial attention to localize the foreground object
Task attention to encode the task context for label prediction
Exploiting attention mechanisms in representation learning
Recurrent attention to refine the representation
Spatial attention
Extracting relevant features on conv feature maps
Using the test image feature as the attention query, followed by pooling (see the sketch below)
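A minimal sketch of the spatial-attention step (assuming PyTorch): the pooled query-image feature attends over the spatial locations of a conv feature map, and the attended locations are pooled into a single vector. The dot-product scoring is an illustrative stand-in for the learned attention module in the paper.

import torch.nn.functional as F

def spatial_attention_pool(query_feat, conv_map):
    """query_feat: [D] global feature of the test (query) image.
    conv_map: [D, H, W] conv feature map of a support image."""
    d, h, w = conv_map.shape
    locs = conv_map.reshape(d, h * w).t()          # [H*W, D] per-location features
    attn = F.softmax(locs @ query_feat, dim=0)     # [H*W] relevance of each spatial location
    return attn @ locs                             # [D] attention-weighted pooled feature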
Task attention
Encoding the support set by selecting relevant training examples (see the sketch below)
Support-set representation
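Task attention can be sketched in the same spirit (again assuming PyTorch, with dot-product scoring standing in for the learned attention): the query feature softly selects the most relevant support examples, and their weighted combination serves as the support-set (task) representation.

import torch.nn.functional as F

def task_attention_encoding(query_feat, support_feats):
    """query_feat: [D], support_feats: [N*K, D] embeddings of the support examples.
    Returns a [D] task-context vector summarizing the support set for this query."""
    weights = F.softmax(support_feats @ query_feat, dim=0)   # relevance of each support example
    return weights @ support_feats                           # weighted support-set representation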
Recurrent attention
Refining the task-test (query image) features with the support-set representation
Standard meta-learning loss + global classification loss
We train our models from scratch (no pre-training)
MiniImageNet:
80 classes for meta-training and 20 for meta-test
Roughly 100K tasks for training and 1K for test
Large variations in scale/viewpoint
Task similarity
A new benchmark: Meta-CIFAR100
Preliminary results on Meta-CIFAR100
Task similarity plays a key role in few-shot performance
From few-shot to low-shot learning
Novel classifiers: incremental few-shot learning
How do we exploit unlabeled data?
Few-shot visual concept learning
Structured representation is important
Modularized, interpretable network design
Extension to multiple vision tasks
Future directions
Studying the impact of different task distributions
Connecting few-shot learning to continual learning
Exploring few-shot learning in real-world applications
PhD students
Hongtao Yang @ANU
Songyang Zhang and Shipeng Yan @ShanghaiTech