Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models


Jun 11, 2023


SLIDE 1

Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models

Jiuxiang Gu Jianfei Cai Shafiq Joty Li Niu Gang Wang

SLIDE 2

Goal

Image-to-Text Retrieval
A young man doing a skateboard trick while others watch
A man doing a skate trick during a competition event with an audience
Guys on a course made for skate boarding
A group of people doing skateboarding tricks on a car
A boy riding on his skateboard at a skate park while other guys watch
…

Text-to-Image Retrieval
Bright room with a couch and various different dressers
…

SLIDE 3

Classical Pipeline

Bright room with a couch and various different dressers

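[Figure: an image 𝑖 goes through the Image Encoder to give the Image Feature 𝑣; a sentence 𝑐 goes through the Text Encoder to give the Text Feature 𝑡; the two features are compared by Similarity.]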
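
The classical pipeline on this slide can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: it assumes the "Similarity" box is cosine similarity (the slide does not say which measure is used), and the helper names and toy feature vectors are hypothetical stand-ins for real encoder outputs.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_feature, candidate_features):
    """Return candidate indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_feature, c) for c in candidate_features]
    return sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)


# Toy, hand-made vectors standing in for encoder outputs (assumption, not real data).
text_feature = [0.9, 0.1, 0.0]   # hypothetical text-encoder output for one sentence
image_features = [
    [0.0, 1.0, 0.0],             # image 0
    [1.0, 0.0, 0.0],             # image 1
    [0.5, 0.5, 0.0],             # image 2
]

# Text-to-image retrieval: rank all images for the sentence.
ranking = rank_by_similarity(text_feature, image_features)
print(ranking)
```

Image-to-text retrieval uses the same ranking function with the roles swapped: one image feature as the query and the text features as candidates.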

SLIDE 4

Motivation: Look → Imagine → Match

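[Figure: two panels. Image-to-Text Retrieval: symbols 𝑖, 𝑣, 𝑡, 𝑐 and the imagined 𝑐̂, with an Imagine step, Local Similarity and Global Similarity. Text-to-Image Retrieval: symbols 𝑖, 𝑣, 𝑡, 𝑐 and the imagined 𝑖̂, with an Imagine step, Global Similarity and Local Similarity.]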

SLIDE 5

Look → Imagine

SLIDE 6

Match

SLIDE 7

Look → Imagine

SLIDE 8

Match

SLIDE 9

Proposed Approach

SLIDE 10

Cross-Modal Retrieval with Generative Learning

SLIDE 11

Cross-Modal Retrieval with Generative Learning

SLIDE 12

Results

SLIDE 13

Results (Classical Pipeline)

SLIDE 14

Results (Ours)

SLIDE 15

At the Poster:

Additional details
Quantitative results
Discussion