
Learning semantic relationships for better action retrieval in images Inkyu An

Content
1. Background
2. Motivation
3. Related Work
4. Approach
5. Result

Background | Semantic?
What comes to mind when you see the picture below?
There are many parked vehicles on either side of the road.

Background | Semantic labeling
http://rodrigob.github.io/are_we_there_yet/build/semantic_labeling_datasets_results.html#4d5352432d3231

Background | Semantic labeling
More complex
- A wide variety of classes: Collie, Retriever, Great Dane, Labrador Retriever, Pomeranian, Vizsla, Samoyed, Bull Terrier, Poodle, Yorkshire Terrier

Background | More and more complex
She is stretching her right leg while listening to music.

Motivation | Action retrieval in images
Query image: Person interacting with panda
Image Search: ???

Motivation | Action retrieval in images
Query image: Person interacting with panda
Image Search, result of prior work: False Positive

Motivation | Action retrieval in images
Query image: Person interacting with panda
Result images: Person holding animals, Person feeding panda, Person feeding calf
Relations shown: Implied-by, Mutual-exclusive, Type-of

Motivation | Action retrieval in images
Three kinds of relations
1. Implied-by
2. Type-of
3. Mutual-exclusive
HEX-graph: Large-scale object classification using label relation graphs [ECCV 2014]

Motivation | Action retrieval in images
“Person interacting with panda” is represented by a weight vector $w_A$.
Skip-grams: Distributed Representations of Words and Phrases and their Compositionality [NIPS 2013]

Motivation | Action retrieval in images
They needed a relationship score for each pair of sentences.
Neural Tensor Network: Reasoning With Neural Tensor Networks for Knowledge Base Completion [NIPS 2013]

Related Work |
1. HEX-graph - Three kinds of relations
2. Skip-grams - Weight vectors of actions (sentences)
3. Neural Tensor Network - Relationship scores for pairs of actions

Related Work | HEX-graph _ Motivation
Classifier labels: Siberian Husky, Poodle, Bulldog, Bengal cat, Russian Blue, Dog, Cat

Related Work | HEX-graph _ Motivation
Classifier labels: Siberian Husky, Puppy, Dog, Cat
HEX-graph relations: Exclusion, Subsumption

Related Work | HEX-graph _ Problem Definition
<HEX-graph>
- Nodes $V$: Dog, Cat, Puppy, Husky
- Hierarchy edge $E_h$: subsumption
- Exclusion edge $E_e$: exclusion
Relations:
- Dog, Puppy: subsumption
- Dog, Cat: exclusion
- Husky, Puppy: overlap
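The HEX-graph definition above can be sketched as a small consistency check over binary labels. This is only an illustrative toy, not the cited paper's inference procedure: the node names follow the slide, the Dog-Husky hierarchy edge is an assumption (the slide only lists Dog-Puppy explicitly), and `is_legal` and `legal_assignments` are hypothetical helper names.

```python
from itertools import product

# Toy HEX graph using the slide's labels.
NODES = ["Dog", "Cat", "Puppy", "Husky"]
# Hierarchy (subsumption) edges as (parent, child): child on implies parent on.
# The ("Dog", "Husky") edge is an assumption made for this sketch.
HIERARCHY_EDGES = [("Dog", "Puppy"), ("Dog", "Husky")]
# Exclusion edges: the two labels can never both be on.
EXCLUSION_EDGES = [("Dog", "Cat")]


def is_legal(assignment):
    """Return True if a {label: 0/1} assignment violates no edge."""
    for parent, child in HIERARCHY_EDGES:
        if assignment[child] == 1 and assignment[parent] == 0:
            return False
    for a, b in EXCLUSION_EDGES:
        if assignment[a] == 1 and assignment[b] == 1:
            return False
    return True


def legal_assignments():
    """Enumerate every legal binary labelling of the toy graph."""
    legal = []
    for bits in product([0, 1], repeat=len(NODES)):
        assignment = dict(zip(NODES, bits))
        if is_legal(assignment):
            legal.append(assignment)
    return legal
```

Husky and Puppy share no edge, so both may be on at once, which is the "overlap" case on the slide; Cat and Puppy are never on together because Puppy forces Dog.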

Related Work | skip-grams
- The training objective is to learn word vector representations that are good at predicting the nearby words
- The model is trained to maximize the average log probability of the nearby words
Training: Input sentence -> Nearby words
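For reference, the average log probability in the cited skip-gram paper is usually written as follows. The notation comes from that paper rather than from the slide: $w_1, \dots, w_T$ are the training words, $c$ is the context window size, and $v$, $v'$ are the input and output vectors of a word.

```latex
\frac{1}{T} \sum_{t=1}^{T} \sum_{\substack{-c \le j \le c \\ j \ne 0}} \log p(w_{t+j} \mid w_t),
\qquad
p(w_O \mid w_I) = \frac{\exp\left({v'_{w_O}}^{\top} v_{w_I}\right)}{\sum_{w=1}^{W} \exp\left({v'_{w}}^{\top} v_{w_I}\right)}
```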

Related Work | Neural Tensor Networks (NTN)
- The model returns a high score if the two entities are in that relationship and a low one otherwise

Approach | Problem setup
A set of actions $\mathcal{A}$:
- Person riding bike
- Person riding horse
- Person preparing food
- Chef cooking pasta
- Person walking with a horse
Action: Person riding bike -> Related images
Two SVO structures:
1. <subject, verb, object>
2. <subject, verb, prepositional object>

Approach | Problem setup _ three kinds of relations
1. Implied-by: Person preparing food, Chef cooking pasta
2. Type-of: Person doing football, Man playing soccer
3. Mutually exclusive: Person riding horse, Man riding camel

Approach | Full model
Full model:
$$C = C_{ac} + \alpha_r C_{rec} + \alpha_n C_{nlp} + \alpha_c C_{cons} + \lambda \lVert W \rVert_2^2$$
- $C_{ac}$: Basic action retrieval model [Image + Action]
- $C_{rec}$: Visual objective [Image + Action]
- $C_{nlp}$: Language prior [only Action]
- $C_{cons}$: Consistency objective [only Action]
- $\lambda \lVert W \rVert_2^2$: The weights in the model
$W = \{W_{im}, \{w_A\}_{A \in \mathcal{A}}, W_{rel}\}$ are $\ell_2$ regularized with a regularization coefficient $\lambda$.
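How the terms of the full model combine can be sketched in a few lines. The function name `full_objective`, the default coefficient values, and the flat list of weights are all assumptions for illustration; the slide gives only the form of the sum.

```python
def full_objective(c_ac, c_rec, c_nlp, c_cons, weights,
                   alpha_r=1.0, alpha_n=1.0, alpha_c=1.0, lam=0.01):
    """Combine the four cost terms with an L2 penalty on the weights.

    c_ac, c_rec, c_nlp, c_cons: already-computed scalar costs.
    weights: flat iterable of all model parameters (W_im, w_A, W_rel).
    alpha_*, lam: trade-off coefficients (placeholder defaults).
    """
    l2_penalty = sum(w * w for w in weights)
    return (c_ac
            + alpha_r * c_rec
            + alpha_n * c_nlp
            + alpha_c * c_cons
            + lam * l2_penalty)
```

Setting a coefficient to zero switches off the corresponding term, which is a simple way to compare the basic retrieval model against the variants that use relationships.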

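Approach | Basic action retrieval model
Action $A$ (Person riding bike) -> Skip-grams -> $w_A$
Positive image $I_A^+$ -> CNN -> $f_{I^+}$, scored by $w_A^T f_{I^+}$
Negative image $I^-$ -> CNN -> $f_{I^-}$, scored by $w_A^T f_{I^-}$
Image feature: $f_I = W_{im}\,\mathrm{CNN}(I) + b_{im}$
Action prediction loss:
$$C_{ac} = \sum_{A \in \mathcal{A}} \sum_{I^+ \in \mathcal{T}_A} \sum_{I^- \in \bar{\mathcal{T}}_A} \max\left(0,\, 1 + w_A^T \left(f_{I^-} - f_{I^+}\right)\right)$$
$\mathcal{T}_A$: a set of positive images of A
$\bar{\mathcal{T}}_A$: a set of negative images of A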
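The action prediction loss is a pairwise hinge (ranking) loss: every positive image of an action should score at least a margin of 1 higher than every negative image. A minimal pure-Python sketch for a single action, assuming the image features have already been computed; `action_ranking_loss` and `dot` are hypothetical names.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def action_ranking_loss(w_a, pos_feats, neg_feats, margin=1.0):
    """Hinge ranking loss for one action A.

    w_a: action weight vector.
    pos_feats: feature vectors of positive images of A.
    neg_feats: feature vectors of negative images of A.
    """
    loss = 0.0
    for f_pos in pos_feats:
        s_pos = dot(w_a, f_pos)
        for f_neg in neg_feats:
            # Penalize a negative scored within `margin` of a positive.
            loss += max(0.0, margin + dot(w_a, f_neg) - s_pos)
    return loss
```

Summing this value over all actions gives the loss for the whole action set.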

Approach | Relationship prediction
Goal: Denote the relationship by a vector $r_{AB} = [r_{AB}^i, r_{AB}^t, r_{AB}^m] \in [0,1]^3$ (implied-by, type-of and mutually exclusive)
Action $A$ (Person riding bike) -> Skip-grams -> $w_A$
Action $B$ (Person riding camel) -> Skip-grams -> $w_B$
$w_A, w_B$ -> Neural Tensor Network ($W_{rel}^{1:3}$) -> Softmax -> $r_{AB}$
$$r_{AB} = \mathrm{softmax}\left(\beta \left[\, w_A \otimes W_{rel}^{1:3} \otimes w_B + b_{rel} \,\right]\right)$$
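The relationship prediction can be sketched as one bilinear form per relation followed by a softmax. This is a simplified reading of the slide, not a confirmed implementation: the tensor is taken to be three square slices, the bias is assumed to sit inside the scaling by beta, and all function names are hypothetical.

```python
import math


def bilinear(w_a, tensor_slice, w_b):
    """Compute w_a^T M w_b for one tensor slice M given as a list of rows."""
    return sum(w_a[i] * tensor_slice[i][j] * w_b[j]
               for i in range(len(w_a))
               for j in range(len(w_b)))


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def predict_relationship(w_a, w_b, w_rel, b_rel, beta=1.0):
    """Return [r_i, r_t, r_m] for implied-by, type-of, mutually exclusive.

    w_rel: list of three d x d slices; b_rel: list of three biases.
    """
    scores = [beta * (bilinear(w_a, w_rel[k], w_b) + b_rel[k])
              for k in range(3)]
    return softmax(scores)
```

A larger beta sharpens the distribution toward a single relation, while beta near zero makes the three relations almost equally likely.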

Approach | Language prior for relationship
- NLP prior
1. Implied-by: Person preparing food, Chef cooking pasta
2. Type-of: Man eating fish, Person feeding a fish (Wrong)
3. Mutually exclusive: Person riding horse, Man riding camel

Approach | Language prior for relationship
The loss function of the language-based relationship, $C_{nlp}$:
$$C_{nlp} = \sum_{A \in \mathcal{A}} \sum_{B \in \mathcal{R}_A} \left\lVert r_{AB} - \tilde{r}_{AB} \right\rVert$$
$\tilde{r}_{AB}$: NLP prior
$r_{AB}$: Relationship prediction
- NLP priors are not always accurate
- They treated NLP priors as a noisy prior
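The language prior term penalizes the distance between each predicted relationship vector and its NLP prior. The sketch below assumes a squared Euclidean distance, which the slide does not specify, and stores vectors in dictionaries keyed by action pair; `nlp_prior_loss` is a hypothetical name.

```python
def nlp_prior_loss(predicted, prior):
    """Sum of squared differences between predictions and NLP priors.

    predicted, prior: dicts mapping (action_a, action_b) to a
    3-element list [r_i, r_t, r_m]. Squared L2 is an assumption.
    """
    loss = 0.0
    for pair, r in predicted.items():
        r_tilde = prior[pair]
        loss += sum((p - q) ** 2 for p, q in zip(r, r_tilde))
    return loss
```

Because this term is only one part of the objective, a prediction can drift away from an inaccurate prior when the visual evidence disagrees with it, which matches the idea of treating the prior as noisy.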

Approach | Action retrieval with relationship
- Visual objective
A is implied-by B: Rank the positive images of B higher than the negatives of A
$$C_{AB}^i = \sum_{I_b \in \mathcal{T}_B} \sum_{I^- \in \bar{\mathcal{T}}_A} \max\left(0,\, 1 + w_A^T \left(f_{I^-} - f_{I_b}\right)\right)$$
$\mathcal{T}_B$: a set of positive images of B
$\bar{\mathcal{T}}_A$: a set of negative images of A
A is type-of B: Rank the positive images of A higher than the negatives of B
$$C_{AB}^t = \sum_{I_a \in \mathcal{T}_A} \sum_{I^- \in \bar{\mathcal{T}}_B} \max\left(0,\, 1 + w_B^T \left(f_{I^-} - f_{I_a}\right)\right)$$
$\mathcal{T}_A$: a set of positive images of A
$\bar{\mathcal{T}}_B$: a set of negative images of B
A is mutually exclusive of B: Rank the positive images of A higher than the positives of B
$$C_{AB}^m = \sum_{I_a \in \mathcal{T}_A} \sum_{I_b \in \mathcal{T}_B} \max\left(0,\, 1 + w_A^T \left(f_{I_b} - f_{I_a}\right)\right)$$
$\mathcal{T}_A$: a set of positive images of A
$\mathcal{T}_B$: a set of positive images of B
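All three visual costs share one pattern: a hinge loss asking one set of images to outrank another under a given action weight vector. A minimal sketch with hypothetical function names, assuming precomputed image feature vectors:

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def pairwise_hinge(w, higher, lower, margin=1.0):
    """Penalize each `lower` image scored within `margin` of a `higher` one."""
    return sum(max(0.0, margin + dot(w, f_lo) - dot(w, f_hi))
               for f_hi in higher
               for f_lo in lower)


def implied_by_cost(w_a, pos_b, neg_a):
    """A implied-by B: positives of B outrank negatives of A under w_a."""
    return pairwise_hinge(w_a, pos_b, neg_a)


def type_of_cost(w_b, pos_a, neg_b):
    """A type-of B: positives of A outrank negatives of B under w_b."""
    return pairwise_hinge(w_b, pos_a, neg_b)


def mutual_exclusion_cost(w_a, pos_a, pos_b):
    """A mutually exclusive of B: positives of A outrank positives of B."""
    return pairwise_hinge(w_a, pos_a, pos_b)
```

Only the choice of weight vector and of the two image sets changes between relations, so a single helper keeps the three costs consistent with each other.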

Approach | Action retrieval with relationship
- Visual objective
Objective:
$$C_{rec} = \sum_{A \in \mathcal{A}} \sum_{B \in \mathcal{R}_A} r_{AB}^i \cdot C_{AB}^i + r_{AB}^t \cdot C_{AB}^t + r_{AB}^m \cdot C_{AB}^m$$
Relationship prediction: $r_{AB} = \{r_{AB}^i, r_{AB}^t, r_{AB}^m\}$
Sum the costs ($C_{AB}^i, C_{AB}^t, C_{AB}^m$) of each relation; a cost counts in full when the corresponding relationship prediction ($r_{AB}^i$, $r_{AB}^t$ or $r_{AB}^m$) is '1'.
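The visual objective weights each relation's cost by the predicted strength of that relation and sums over action pairs. A small sketch with hypothetical names, where both inputs are dictionaries keyed by action pair and hold (implied-by, type-of, mutually exclusive) triples:

```python
def relationship_weighted_cost(r, costs):
    """Weighted sum of per-relation costs over all action pairs.

    r: dict mapping (A, B) to predicted (r_i, r_t, r_m).
    costs: dict mapping (A, B) to the matching (C_i, C_t, C_m).
    """
    total = 0.0
    for pair, (r_i, r_t, r_m) in r.items():
        c_i, c_t, c_m = costs[pair]
        total += r_i * c_i + r_t * c_t + r_m * c_m
    return total
```

With a one-hot prediction such as (1, 0, 0), only the implied-by cost of that pair contributes; soft predictions blend the three costs.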
