A Fast and Accurate One-Stage Approach to Visual Grounding

Oct 09, 2023

SLIDE 1

A Fast and Accurate One-Stage Approach to Visual Grounding

Zhengyuan Yang, Boqing Gong, Liwei Wang, Wenbing Huang, Dong Yu, Jiebo Luo
Presenter: Tianlang Chen

SLIDE 2

Visual grounding

Grounding a language query onto a region of the image

Query: bottom right grass

– Phrase localization
– Referring expression comprehension

SLIDE 3

Two-stage framework

Existing framework

Query: center building

SLIDE 4

Existing framework

Performance is capped by the quality of the region candidates
Slow inference speed
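
The cap imposed by region candidates can be illustrated with a small sketch. It assumes a prediction counts as correct when its IoU with the ground-truth box reaches 0.5, a threshold the slides do not state, and it uses made-up boxes. The helper names `iou` and `oracle_accuracy` are hypothetical. A two-stage method can only pick among its candidates, so its accuracy cannot exceed the fraction of queries for which some candidate overlaps the ground truth well enough.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def oracle_accuracy(samples, threshold=0.5):
    """Upper bound for any method that must choose one of the candidates.

    samples: list of (ground_truth_box, candidate_boxes) pairs.
    threshold: assumed IoU cut-off for a correct prediction.
    """
    hits = sum(
        1
        for gt, candidates in samples
        if any(iou(gt, c) >= threshold for c in candidates)
    )
    return hits / len(samples)


# Toy data (invented for illustration only).
samples = [
    ((10, 10, 50, 50), [(12, 12, 48, 52), (60, 60, 90, 90)]),  # one candidate fits
    ((0, 0, 20, 20), [(30, 30, 80, 80)]),  # no candidate overlaps the target
]
print(oracle_accuracy(samples))
```

In this toy setup the second query is lost before the ranking stage even runs, no matter how good the ranking model is.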

SLIDE 5

One-stage visual grounding

One-stage approach
Generally applicable to the sub-tasks of grounding

SLIDE 6

Why one-stage visual grounding

No region candidates -> 7~20% higher accuracy
One-stage -> 10x faster

SLIDE 7

Architecture overview

Encoder
Fusion module
Grounding module

SLIDE 8

Architecture

Encoder
Fusion module
Grounding module
Visual encoder: DarkNet53+FPN
Language encoder: BERT, LSTM, or Fisher vector (FV)
Spatial encoder: for location-related queries
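
A spatial encoder of this kind could be sketched as a grid of normalized coordinates, so that words such as "bottom right" have something positional to match against. The slide only says the encoder serves location-related queries. The eight channels below (top-left corner, center, bottom-right corner, cell size) and the function name `spatial_features` are assumptions made for illustration.

```python
def spatial_features(width, height):
    """Build a height x width grid; each cell holds 8 normalized coordinates."""
    grid = []
    for j in range(height):
        row = []
        for i in range(width):
            row.append([
                i / width, j / height,                  # top-left corner of the cell
                (i + 0.5) / width, (j + 0.5) / height,  # cell center
                (i + 1) / width, (j + 1) / height,      # bottom-right corner
                1 / width, 1 / height,                  # cell size
            ])
        grid.append(row)
    return grid


feats = spatial_features(width=4, height=2)
print(len(feats), len(feats[0]), len(feats[0][0]))  # rows, columns, channels
```

Because every value is divided by the grid size, the same encoding can be generated for feature maps of any resolution.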

SLIDE 9

Architecture

Encoder
Fusion module
Grounding module
Image-level fusion

– Multiple resolutions
– Three parts of input features
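
One way to picture fusing three parts of input features at multiple resolutions is sketched below. It assumes the three parts are the visual, language, and spatial features produced by the encoders, that the single query embedding is repeated at every grid cell, and that fusion is a plain channel-wise concatenation. The slides do not confirm these details, and the helper names, channel counts, and grid sizes are hypothetical.

```python
def fuse(visual, text, spatial):
    """Concatenate, per grid cell: visual channels + query embedding + spatial channels."""
    return [
        [v_cell + text + s_cell for v_cell, s_cell in zip(v_row, s_row)]
        for v_row, s_row in zip(visual, spatial)
    ]


def make_map(height, width, channels, value):
    """Toy feature map filled with a constant (stands in for real encoder output)."""
    return [[[value] * channels for _ in range(width)] for _ in range(height)]


text = [0.1, 0.2, 0.3]              # toy query embedding, shared by all cells
pyramid = [(8, 8), (4, 4), (2, 2)]  # hypothetical (height, width) per resolution

fused = [
    fuse(make_map(h, w, 4, 1.0), text, make_map(h, w, 2, 0.5))
    for h, w in pyramid
]
for level in fused:
    print(len(level), len(level[0]), len(level[0][0]))  # height, width, channels
```

Each level keeps its own spatial size, while every cell ends up with 4 + 3 + 2 = 9 channels in this toy configuration.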

SLIDE 10

Architecture

Encoder
Fusion module
Grounding module
Output format: box + confidence
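
With a box-plus-confidence output, the final answer can be read off without any candidate ranking stage. The sketch below assumes that predictions from all resolutions are pooled and that the single most confident box is returned. The selection rule, the function name `select_box`, and the numbers are assumptions for illustration, not details taken from the slides.

```python
def select_box(predictions):
    """predictions: iterable of (box, confidence) pairs; return the most confident one."""
    return max(predictions, key=lambda p: p[1])


# Toy predictions per resolution level: ((x1, y1, x2, y2), confidence).
predictions_per_level = [
    [((10, 10, 40, 40), 0.20), ((12, 8, 44, 38), 0.35)],
    [((100, 120, 180, 200), 0.91)],
    [((0, 0, 320, 320), 0.05)],
]

# Pool all levels, then keep the single best-scoring box.
all_predictions = [p for level in predictions_per_level for p in level]
box, confidence = select_box(all_predictions)
print(box, confidence)
```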

SLIDE 11

Datasets

Phrase localization: Flickr 30K Entities
Referring expression comprehension: ReferItGame

Query: the black backpack on the bottom right

SLIDE 12

Comparison to other methods

SLIDE 13

Qualitative results

Figure legend: Pred. vs. gt boxes; Ours vs. Two-stage

Reasons for improvement

– Union of multiple objects
– Stuff as opposed to things
– Challenging regions

SLIDE 14

Code & models: https://github.com/zyang-ur/onestage_grounding
Poster: #26
Contact: zyang39@cs.rochester.edu