A Fast and Accurate One-Stage Approach to Visual Grounding

SLIDE 1

A Fast and Accurate One-Stage Approach to Visual Grounding

Zhengyuan Yang, Boqing Gong, Liwei Wang, Wenbing Huang, Dong Yu, Jiebo Luo
Presenter: Tianlang Chen

SLIDE 2

Visual grounding

  • Grounding a language query onto a region of the image

Query: bottom right grass

Phrase localization

Referring expression comprehension

SLIDE 3

Existing framework

  • Two-stage framework

Query: center building

SLIDE 4

Existing framework

  • Performance is capped by the region candidates
  • Slow inference speed
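
Why the candidates cap performance, as a minimal sketch: a two-stage system can only return one of its region candidates, so the best overlap it can reach with the target is the best overlap any single candidate reaches. The boxes, the `(x1, y1, x2, y2)` format, the helper names, and the 0.5 IoU success threshold below are illustrative assumptions, not values taken from the slides.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def best_candidate_iou(candidates, target):
    """Best IoU any candidate reaches; even a perfect ranker cannot beat it."""
    return max((iou(c, target) for c in candidates), default=0.0)


# Hypothetical example: the target region is not well covered by any proposal.
target = (0, 60, 100, 100)
candidates = [(0, 0, 50, 50), (40, 40, 100, 100), (10, 70, 40, 90)]

ceiling = best_candidate_iou(candidates, target)
threshold = 0.5  # assumed success threshold
print(f"best achievable IoU: {ceiling:.2f}, can succeed: {ceiling >= threshold}")
```

In this toy case no candidate reaches the assumed threshold, so the query fails regardless of how well the second stage ranks the candidates.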

SLIDE 5

One-stage visual grounding

  • One-stage approach
  • Generally applicable to the sub-tasks of grounding
SLIDE 6

Why one-stage visual grounding

  • No region candidates -> 7-20% higher accuracy
  • One-stage -> 10x faster
SLIDE 7

Architecture overview

  • Encoder
  • Fusion module
  • Grounding module
SLIDE 8

Architecture

  • Encoder
  • Fusion module
  • Grounding module
  • Visual encoder: DarkNet53+FPN
  • Language encoder: BERT, LSTM, FV
  • Spatial encoder: handles location-related queries
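
One plausible way to make location words such as "bottom right" usable is to attach normalized coordinates to every cell of the visual feature grid. The sketch below is an assumption for illustration only: the slides do not spell out the encoding, and the function name `spatial_features` and the 8-value layout are hypothetical.

```python
def spatial_features(width, height):
    """Return a height x width grid of 8-value coordinate vectors in [0, 1]."""
    grid = []
    for j in range(height):
        row = []
        for i in range(width):
            row.append([
                i / width, j / height,                  # top-left corner of the cell
                (i + 0.5) / width, (j + 0.5) / height,  # center of the cell
                (i + 1) / width, (j + 1) / height,      # bottom-right corner of the cell
                1 / width, 1 / height,                  # cell size
            ])
        grid.append(row)
    return grid


coords = spatial_features(4, 2)
print(coords[1][3])  # coordinate vector of the bottom-right cell of a 4 x 2 grid
```

Because the values are normalized, the same encoding can be reused at every feature-map resolution.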
SLIDE 9

Architecture

  • Encoder
  • Fusion module
  • Grounding module
  • Image-level fusion
      – Multiple resolutions
      – Three parts of input features
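
The fusion step can be sketched as follows, under the assumption that the three parts of input features are the visual, language, and spatial features and that they are combined by concatenation at every grid cell (the slides name the parts but not the operator). All names, sizes, and values here are hypothetical toy data.

```python
def fuse(visual, text, spatial):
    """Concatenate the visual, text, and spatial vectors at every grid cell.

    The single text vector is shared by (broadcast to) all cells.
    """
    return [
        [v + text + s for v, s in zip(v_row, s_row)]
        for v_row, s_row in zip(visual, spatial)
    ]


def toy_level(h, w, channels):
    """Build placeholder visual and spatial features for one resolution."""
    visual = [[[0.1] * channels for _ in range(w)] for _ in range(h)]
    spatial = [[[i / w, j / h] for i in range(w)] for j in range(h)]
    return visual, spatial


text = [0.5, -0.5, 0.25]  # stand-in for a language embedding

# The same fusion is applied independently at each resolution.
fused_levels = []
for h, w in [(2, 2), (4, 4)]:
    visual, spatial = toy_level(h, w, channels=4)
    fused_levels.append(fuse(visual, text, spatial))

for level in fused_levels:
    print(len(level), len(level[0]), len(level[0][0]))  # rows, columns, channels
```

Each fused cell has 4 + 3 + 2 = 9 channels in this toy setup; a real model would typically follow the concatenation with learned layers.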

SLIDE 10

Architecture

  • Encoder
  • Fusion module
  • Grounding module
  • Output format: box + confidence
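
Given the "box + confidence" output format, a natural way to read out the final answer is to keep the box with the highest confidence, with no candidate ranking stage. This is a minimal sketch under that assumption; the function name and the prediction values are hypothetical.

```python
def select_box(predictions):
    """Return the (box, confidence) pair with the highest confidence."""
    if not predictions:
        raise ValueError("no predictions to choose from")
    return max(predictions, key=lambda p: p[1])


# Hypothetical predictions: ((x1, y1, x2, y2), confidence)
predictions = [
    ((12, 30, 80, 120), 0.12),
    ((200, 210, 320, 300), 0.91),
    ((190, 200, 300, 310), 0.47),
]

box, conf = select_box(predictions)
print(box, conf)
```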
SLIDE 11

Datasets

  • Phrase localization: Flickr30K Entities
  • Referring expression comprehension: ReferItGame

Query: the black backpack on the bottom right

SLIDE 12

Comparison to other methods

SLIDE 13

Qualitative results

  • Reasons for improvement
      – Union of multiple objects
      – Stuff as opposed to things
      – Challenging regions

Figure legend: Pred., GT; Ours, Two-stage
SLIDE 14

Code & models: https://github.com/zyang-ur/onestage_grounding
Poster: #26
Contact: zyang39@cs.rochester.edu

A Fast and Accurate One-Stage Approach to Visual Grounding