
What Does BERT with Vision Look At? Liunian Harold Li, Mark Yatskar



  1. What Does BERT with Vision Look At? Liunian Harold Li (UCLA), Mark Yatskar (AI2), Da Yin (PKU), Cho-Jui Hsieh (UCLA), Kai-Wei Chang (UCLA). A long version, “VisualBERT: A Simple and Performant Baseline for Vision and Language”, is on arXiv (Aug 2019).

  2. BERT with Vision: Pre-trained Vision-and-Language (V&L) Models. Caption: “Several people walking on a sidewalk in the rain with umbrellas.” Masked caption for pre-training: “Several people [MASK] on a [MASK] in the [MASK] with [MASK].” VQA answer candidates: a) Yes, it is snowing. b) Yes, [person8] and [person10] are outside. c) No, it looks to be fall. d) Yes, it is raining heavily. The idea: pre-train on image captions, then transfer to visual question answering.
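To make the setup concrete, here is a minimal sketch of a VisualBERT-style joint encoder in PyTorch: text token embeddings and projected detector features for image regions share a single Transformer stack. The class name, dimensions, and layer counts are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class VisualBERTSketch(nn.Module):
    """Minimal sketch (hypothetical config): text tokens and image-region
    features are encoded jointly by one Transformer stack.
    Position embeddings are omitted for brevity."""
    def __init__(self, vocab_size=30522, hidden=768, region_feat_dim=2048,
                 layers=12, heads=12):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, hidden)
        self.visual_proj = nn.Linear(region_feat_dim, hidden)  # detector features -> hidden size
        self.type_emb = nn.Embedding(2, hidden)                # 0 = text segment, 1 = visual segment
        enc_layer = nn.TransformerEncoderLayer(hidden, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, layers)

    def forward(self, token_ids, region_feats):
        # token_ids: (B, T) word-piece ids; region_feats: (B, R, 2048) region features
        text = self.tok_emb(token_ids) + self.type_emb.weight[0]
        vis = self.visual_proj(region_feats) + self.type_emb.weight[1]
        x = torch.cat([text, vis], dim=1)  # one joint sequence: tokens, then regions
        return self.encoder(x)             # contextualized text and region representations
```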

  3. BERT with Vision: Pre-trained Vision-and-Language (V&L) Models. Pre-training objective: mask and predict on image captions. Architecture: a single Transformer over image regions and text. Such models (ViLBERT, B2T2, LXMERT, VisualBERT, Unicoder-VL, VL-BERT, UNITER, ...) yield significant improvements over baselines. (Figure: performance of VisualBERT compared to strong baselines.)
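The mask-and-predict objective on captions can be sketched as follows, assuming the encoder above and BERT-style conventions; the masking rate, the lm_head projection (e.g. a Linear layer from hidden size to vocabulary size), and the loss details are assumptions here, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def masked_caption_loss(model, lm_head, token_ids, region_feats, mask_id, p=0.15):
    """Mask random caption tokens, encode them jointly with the image
    regions, and predict the original ids at the masked text positions."""
    mask = torch.rand(token_ids.shape, device=token_ids.device) < p
    inputs = token_ids.masked_fill(mask, mask_id)          # replace with the [MASK] id
    hidden = model(inputs, region_feats)                   # (B, T+R, H) joint encoding
    logits = lm_head(hidden[:, :token_ids.size(1)])        # score vocabulary at text positions
    return F.cross_entropy(logits[mask], token_ids[mask])  # loss on masked tokens only
```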

  4. What does BERT with Vision learn during pre-training? Entity grounding: mapping entities mentioned in the text to image regions.

  5. Probing attention maps of VisualBERT: entity grounding. Certain heads can perform entity grounding (the best head reaches 50.77 accuracy), and grounding accuracy peaks in the higher layers.
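A hedged sketch of how such a probe can be computed from one head's attention weights: count an entity token as correctly grounded when the image-region slot it attends to most is the gold region. This argmax criterion is a simplification of the paper's evaluation, and all names below are illustrative.

```python
import torch

def entity_grounding_accuracy(attn, entity_idx, gt_region_idx, num_text_tokens):
    """attn: (B, S, S) attention weights of one head over a sequence of
    num_text_tokens text positions followed by image-region slots.
    entity_idx, gt_region_idx: (B,) position of the entity token and index
    of the gold region (counted among the region slots)."""
    rows = attn[torch.arange(attn.size(0)), entity_idx]  # (B, S) attention from each entity token
    region_attn = rows[:, num_text_tokens:]              # keep only the image-region slots
    pred = region_attn.argmax(dim=-1)                    # most-attended region per example
    return (pred == gt_region_idx).float().mean().item()
```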

  6. What does BERT with Vision learn during pre-training? Syntactic grounding: mapping a word w1 to the regions of w2 when w1 and w2 are connected by a dependency relation (w1 → w2).

  7. Probing attention maps of VisualBERT: syntactic grounding. For each dependency relation, there exists at least one head that performs syntactic grounding accurately.

  8. Probing attention maps of VisualBERT: syntactic grounding. (Plots for the pobj and nsubj relations.) Syntactic grounding accuracy also peaks in the higher layers.
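The syntactic probe can be sketched the same way: for word pairs w1 → w2 linked by a given dependency relation, check whether w1's attention lands on the region aligned with w2, and scan all heads for the best one. The alignment of w2 to a gold region and every name below are assumptions for illustration.

```python
import torch

def best_syntactic_grounding_head(heads_attn, w1_idx, w2_region_idx, num_text_tokens):
    """heads_attn: dict mapping (layer, head) -> (B, S, S) attention tensors
    collected for word pairs of one dependency relation. w1_idx gives the
    dependent's position; w2_region_idx the region aligned with the head word."""
    scores = {}
    for (layer, head), attn in heads_attn.items():
        rows = attn[torch.arange(attn.size(0)), w1_idx]  # attention rows of w1
        pred = rows[:, num_text_tokens:].argmax(dim=-1)  # most-attended region slot
        scores[(layer, head)] = (pred == w2_region_idx).float().mean().item()
    return max(scores.items(), key=lambda kv: kv[1])     # ((layer, head), accuracy)
```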

  9. Probing attention maps of VisualBERT: qualitative example. (Attention maps for “woman”, “sweater”, and “husband” shown at layers 3, 4, 5, 6, 10, and 11.) The maps show accurate entity and syntactic grounding, with the understanding refined over the layers.

  10. Discussion. Previous work: pre-trained language models learn the classical NLP pipeline (Peters et al., 2018; Liu et al., 2019; Tenney et al., 2019); qualitatively, V&L models learn some entity grounding (Yang et al., 2016; Anderson et al., 2018; Kim et al., 2018); grounding can be learned with dedicated methods (Xiao et al., 2017; Datta et al., 2019). Our paper: BERT with Vision learns grounding through pre-training, and we quantitatively verify both entity and syntactic grounding. Code: https://github.com/uclanlp/visualbert
