SLIDE 1

DeViSE: A Deep Visual-Semantic Embedding Model

Presenters: Ji Gao, Fandi Lin

SLIDE 2

Motivation

Visual recognition systems run into problems when the number of categories is large.

  • Insufficient labeled training data
  • Blurred distinction between classes

How do we improve predictions of unknown categories?

SLIDE 3

Background

N-way discrete classifiers

  • Labels treated as unrelated
  • Semantic information not captured

Result: these systems cannot make zero-shot predictions without additional information, such as text data.

SLIDE 4

Related Work

  • WSABIE: linear map from image features to an embedding space. Used only the training labels.
  • Socher et al.: linear map from image features to an embedding space, with outlier detection. Only 8 known and 2 unknown classes.

Other work that has shown zero-shot classification relies on curated information.

SLIDE 5

Proposed Method

Combine a traditional visual model with a language model.

SLIDE 6

Proposed Method

1. Train a language model for semantic information
2. At the same time, train a CNN for images
3. Initialize the combined model using the pre-trained parameters
4. Train the combined model
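The staging of these four steps can be sketched with toy stand-ins; the real components are a skip-gram language model and AlexNet, so everything below (sizes, random "pre-trained" weights) is an illustrative assumption, not the actual models.

```python
import numpy as np

rng = np.random.default_rng(0)
image_dim, feat_dim, embed_dim, n_labels = 16, 16, 8, 5

# 1. "Pre-trained" language model: one embedding vector per label.
word_vecs = rng.normal(size=(n_labels, embed_dim))

# 2. "Pre-trained" CNN: a fixed feature extractor (stand-in for AlexNet).
cnn_weights = rng.normal(scale=0.1, size=(feat_dim, image_dim))
def cnn_features(image):
    return np.maximum(0.0, cnn_weights @ image)   # ReLU features

# 3. Combined model initialized from the pre-trained parts, plus a fresh
#    linear embedding layer M mapping CNN features into the word-vector space.
M = rng.normal(scale=0.01, size=(embed_dim, feat_dim))

# 4. Training the combined model would fit M (and optionally the CNN);
#    here only the forward pass is shown.
def embed_image(image):
    return M @ cnn_features(image)

v = embed_image(rng.normal(size=image_dim))
```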

SLIDE 7
Skip-gram language model

  • "Efficient Estimation of Word Representations in Vector Space", ICLR 2013
  • Skip-gram: a generalization of n-grams that skips the words in between
  • Skip-gram model: learn a neural network that, given a word, predicts the nearby words.

SLIDE 8

Skip-gram language model

Learn the relationship between labels.

  • Data: 5.7 million documents (5.4 billion words) extracted from wikipedia.org

SLIDE 9

CNN model

  • AlexNet
  • Winner of ILSVRC 2012
  • 5 convolutional layers followed by 3 fully connected layers
SLIDE 10

Combined model

Use a linear embedding layer to map the features extracted before the softmax (4096-d) to the dimensionality of the language model vectors (500-d or 1000-d).

Loss function: a hinge rank loss that favors a high dot-product similarity between the embedded image and the correct label vector over the similarities to incorrect label vectors.
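The map plus loss can be sketched in numpy. The hinge rank loss below follows the paper's form, loss(v, y) = Σ_{j≠y} max(0, margin − t_y·Mv + t_j·Mv), but the dimensions, toy data, and plain SGD loop are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
feat_dim, embed_dim, n_labels, margin = 16, 8, 5, 0.1

label_vecs = rng.normal(size=(n_labels, embed_dim))        # frozen "skip-gram" vectors
label_vecs /= np.linalg.norm(label_vecs, axis=1, keepdims=True)
M = rng.normal(scale=0.01, size=(embed_dim, feat_dim))     # the linear embedding layer

def hinge_rank_loss(v, y):
    z = label_vecs @ (M @ v)          # similarity of the image to every label vector
    viol = margin - z[y] + z
    viol[y] = 0.0
    return np.maximum(0.0, viol).sum()

def sgd_step(v, y, lr=0.01):
    global M
    z = label_vecs @ (M @ v)
    active = (margin - z[y] + z) > 0  # labels currently violating the margin
    active[y] = False
    # gradient: sum over violating j of outer(t_j - t_y, v)
    g = label_vecs[active].sum(axis=0) - active.sum() * label_vecs[y]
    M -= lr * np.outer(g, v)

# toy "CNN features": images of each class cluster around a prototype
protos = rng.normal(size=(n_labels, feat_dim))
data = [(protos[y] + 0.1 * rng.normal(size=feat_dim), y)
        for y in range(n_labels) for _ in range(20)]

loss_before = sum(hinge_rank_loss(v, y) for v, y in data)
for _ in range(50):
    for v, y in data:
        sgd_step(v, y)
loss_after = sum(hinge_rank_loss(v, y) for v, y in data)
```

Because the label vectors are fixed, training only pulls the image embeddings toward the right region of the word-vector space, which is what later makes zero-shot prediction possible.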

SLIDE 11

Experiment

Task:

  • Image classification
  • Zero-shot image classification
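Zero-shot classification falls out of the embedding view: map an image into the word-vector space and return the nearest label vector, including vectors for classes with no training images. A hedged sketch with toy names and data (none of these labels or embeddings come from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
embed_dim = 8

label_vecs = {            # skip-gram vectors; "zebra" has no training images
    "cat": rng.normal(size=embed_dim),
    "dog": rng.normal(size=embed_dim),
    "zebra": rng.normal(size=embed_dim),
}
for k in label_vecs:
    label_vecs[k] /= np.linalg.norm(label_vecs[k])

def predict(image_embedding, candidates):
    """Rank candidate labels by cosine similarity to the embedded image."""
    v = image_embedding / np.linalg.norm(image_embedding)
    scores = {c: float(v @ label_vecs[c]) for c in candidates}
    return max(scores, key=scores.get)

# An image whose embedding lands near the unseen "zebra" vector is labeled
# "zebra" even though no zebra image was ever used in training.
image_embedding = label_vecs["zebra"] + 0.05 * rng.normal(size=embed_dim)
pred = predict(image_embedding, ["cat", "dog", "zebra"])
```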
SLIDE 12

Experiment: Same label set (not zero-shot)

Baselines:

  • AlexNet
  • Random embedding: AlexNet + random vectors (instead of the language model)

SLIDE 13

Experiment: Zero-shot

Dataset:

  • 2-hop: unseen labels within two tree hops of a training label
  • 3-hop: unseen labels within three tree hops of a training label
  • ImageNet2011: labels in ImageNet 2011 that do not appear in ImageNet 2012
SLIDE 14

Experiment: Zero-shot

Comparison to a pure CNN:

SLIDE 15

Experiment: Zero-shot

Comparison to previous zero-shot results:

SLIDE 16

Conclusion

DeViSE achieves state-of-the-art performance on the classification task, and is also able to do zero-shot learning. It is suitable for large amounts of data, and can handle labels that do not have enough training data.

It shows the power of combining image and semantic data.