Fine-grained Video-Text Retrieval


Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning. Shizhe Chen¹, Yida Zhao¹, Qin Jin¹, Qi Wu². ¹Renmin University of China, ²University of Adelaide


SLIDE 1

Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning

Shizhe Chen¹, Yida Zhao¹, Qin Jin¹, Qi Wu²

¹Renmin University of China, ²University of Adelaide


SLIDE 2

Video-Text Cross-modal Retrieval


  • Dominant approach: learning a joint embedding space
  • Global visual-semantic matching
      • Drawback: a single vector can hardly encode fine-grained details
  • Local visual-semantic matching
      • Drawback: relationships between local vectors are not well captured by sequential modeling
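
As a rough illustration of global matching in a joint embedding space, the sketch below ranks videos by cosine similarity to a text query. The vectors are toy values invented for this example, and `cosine` and `rank_videos` are hypothetical helper names; in a real system the embeddings would come from learned video and text encoders.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_videos(text_vec, video_vecs):
    """Return video ids sorted from most to least similar to the query."""
    scores = {vid: cosine(text_vec, vec) for vid, vec in video_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy embeddings (assumed values standing in for encoder outputs).
video_vecs = {
    "video_a": [0.9, 0.1, 0.0],
    "video_b": [0.1, 0.8, 0.3],
    "video_c": [0.2, 0.2, 0.9],
}
query_vec = [0.0, 0.7, 0.4]  # embedding of one text query

ranking = rank_videos(query_vec, video_vecs)
print(ranking)
```

Because each video and each sentence is reduced to one vector here, any detail that the vector fails to encode cannot influence the ranking, which is the drawback noted for global matching.
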
SLIDE 3

Hierarchical Graph Reasoning Model (HGR)

  • Hierarchical Textual Encoding
      • Decompose the sentence into a semantic role graph
      • Capture relationships via graph reasoning
  • Hierarchical Video Encoding
      • Guided by different levels of text to learn diverse video representations
  • Multi-level Video-Text Matching
      • Event
      • Actions
      • Entities
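
One way to picture the textual encoding is the sketch below: a hand-written semantic role graph for an example sentence, followed by a single, simplified reasoning step. The sentence, the role labels, the 2-d features and the helper names `neighbors` and `reasoning_step` are all assumptions made for illustration. Plain neighbor averaging stands in for the model's learned graph reasoning, which the slides do not specify.

```python
# Hypothetical semantic role graph for "a dog chases a ball in the park".
# In practice the roles would come from a semantic role labeling tool.
nodes = {
    "event":  {"level": "event",  "text": "a dog chases a ball in the park"},
    "chases": {"level": "action", "text": "chases"},
    "dog":    {"level": "entity", "text": "a dog"},
    "ball":   {"level": "entity", "text": "a ball"},
    "park":   {"level": "entity", "text": "the park"},
}

# Edges as (source, target, label): event -> action, action -> entity.
edges = [
    ("event", "chases", "action"),
    ("chases", "dog", "ARG0"),
    ("chases", "ball", "ARG1"),
    ("chases", "park", "ARGM-LOC"),
]

def neighbors(node_id):
    """Neighbors of a node, treating edges as undirected."""
    found = []
    for src, dst, _label in edges:
        if src == node_id:
            found.append(dst)
        elif dst == node_id:
            found.append(src)
    return found

def reasoning_step(features):
    """One simplified message-passing step: every node averages its own
    feature vector with the vectors of its neighbors."""
    updated = {}
    for node_id, vec in features.items():
        group = [vec] + [features[n] for n in neighbors(node_id)]
        updated[node_id] = [sum(col) / len(group) for col in zip(*group)]
    return updated

# Toy 2-d node features (assumed values, not learned embeddings).
features = {
    "event":  [0.0, 0.0],
    "chases": [1.0, 0.0],
    "dog":    [0.0, 1.0],
    "ball":   [0.0, 1.0],
    "park":   [0.0, 1.0],
}
updated = reasoning_step(features)
print(updated["chases"])
```

After one step the action node has mixed in information from its three entities and from the event, so its vector no longer describes the verb in isolation.
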

Matching levels run from global (event) to local (actions, entities).
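
Multi-level video-text matching can be sketched as computing one similarity per level and combining them. The version below assumes a single vector per level and a plain average as the combination rule; both are simplifications chosen for this example, and `multi_level_similarity` is a hypothetical name.

```python
import math

LEVELS = ("event", "action", "entity")

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def multi_level_similarity(text_emb, video_emb):
    """Average the per-level cosine similarities.
    Averaging is an assumed aggregation rule for this sketch."""
    sims = [cosine(text_emb[level], video_emb[level]) for level in LEVELS]
    return sum(sims) / len(sims)

# Toy per-level embeddings (assumed values).
text_emb = {"event": [1.0, 0.0], "action": [0.0, 1.0], "entity": [1.0, 1.0]}
video_match = {"event": [2.0, 0.0], "action": [0.0, 3.0], "entity": [1.0, 1.0]}
video_partial = {"event": [2.0, 0.0], "action": [1.0, 0.0], "entity": [1.0, 1.0]}

score_match = multi_level_similarity(text_emb, video_match)
score_partial = multi_level_similarity(text_emb, video_partial)
print(score_match, score_partial)
```

The second video agrees with the text at the event and entity levels but not at the action level, so its combined score is lower. A mismatch confined to one level remains visible in the final score.
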

SLIDE 4
Experiments

  • In-domain Cross-modal Retrieval
      • Better performance across three datasets
  • Cross-domain Generalization
      • Generalizes better across datasets
  • Fine-grained Binary Selection
      • Differentiates fine-grained differences between positive and negative sentences
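
A fine-grained binary selection test can be scored as sketched below: for each video the model scores one positive and one negative sentence, and a trial counts as correct when the positive sentence scores higher. The scores are made-up numbers and `binary_selection_accuracy` is a hypothetical helper; the slides do not describe the exact protocol.

```python
def binary_selection_accuracy(score_pairs):
    """Fraction of (positive_score, negative_score) pairs in which the
    positive sentence receives the strictly higher score."""
    if not score_pairs:
        raise ValueError("score_pairs must not be empty")
    correct = sum(1 for pos, neg in score_pairs if pos > neg)
    return correct / len(score_pairs)

# Made-up similarity scores: (positive sentence, negative sentence) per video.
pairs = [(0.82, 0.41), (0.55, 0.60), (0.73, 0.70), (0.90, 0.15)]

accuracy = binary_selection_accuracy(pairs)
print(accuracy)
```

Ties are counted as errors in this sketch, since a model that cannot separate the two sentences has not made a selection.
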

SLIDE 5

Conclusion

  • Decompose videos and texts into hierarchical semantic levels
  • Utilize graph reasoning to generate hierarchical embeddings
  • Evaluate on in-domain retrieval, cross-domain generalization and fine-grained binary selection to demonstrate the model's effectiveness

Code and datasets will be released at: https://github.com/cshizhe/hgr_v2t