Scene Graph Parsing as Dependency Parsing

Authors: Yu-Siang Wang, Chenxi Liu, Xiaohui Zeng, Alan Yuille
Conference: North American Chapter of the Association for Computational Linguistics (NAACL), 2018


  1. Scene Graph Parsing as Dependency Parsing
     Authors: Yu-Siang Wang, Chenxi Liu, Xiaohui Zeng, Alan Yuille
     Conference: North American Chapter of the Association for Computational Linguistics (NAACL), 2018

  2. Outline
     ● Introduction
     ● Method
     ● Experiments
     ● Conclusion

  3. Section: Introduction

  4. Introduction
     ● Many multimodal tasks fit into this picture
     [Figure: the sentence "A young boy wearing black shirt is in front of a goal" and a block labeled "Intermediate Representation"]

  5. Image Generation from Text
     [Figure: the sentence "A young boy wearing black shirt is in front of a goal" and a block labeled "Intermediate Representation"]

  6. Image Captioning
     [Figure: the sentence "A young boy wearing black shirt is in front of a goal" and a block labeled "Intermediate Representation"]

  7. Image Retrieval
     [Figure: the sentence "A young boy wearing black shirt is in front of a goal" and a block labeled "Intermediate Representation"]

  8. Neural Network Embedding
     ● Neural network embeddings are often used as the intermediate representation
     ● Pro: easy training; similarity measured with cosine distance
     ● Con: no explicit structure; no easy interpretability
     [Figure: the sentence "A young boy wearing black shirt is in front of a goal" beside rows of embedding values: (1.2, -1.3, 4.6, …, -3.7), (2.3, -2.2, -2.6, …, 5.3), (3.8, -7.4, -5.9, …, -3.2)]
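
The "similarity with cosine distance" point can be illustrated with a minimal sketch. The three vectors below are made-up 4-dimensional stand-ins, not the embeddings shown on the slide, and the function name is only illustrative.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical embeddings (assumption: invented values for illustration only).
caption_vec = [1.0, 2.0, 0.0, -1.0]
image_vec = [2.0, 4.0, 0.0, -2.0]       # same direction as caption_vec
unrelated_vec = [-2.0, 1.0, 3.0, 0.0]   # orthogonal to caption_vec

sim_match = cosine_similarity(caption_vec, image_vec)
sim_unrelated = cosine_similarity(caption_vec, unrelated_vec)
```

The score depends only on direction, which makes comparison easy, but nothing in the vectors says which dimension corresponds to "boy" or "black". That is the interpretability drawback listed on the slide.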

  9. Scene Graph
     ● More recently, people have started exploring a more explainable representation
     ● It has 3 types of nodes: object, attribute, relation
     [Figure: scene graph for "A young boy wearing black shirt is in front of a goal"]
     Ref: Johnson et al., Image Retrieval using Scene Graphs, CVPR 2015
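
As a rough illustration of the three node types, the example sentence could be stored as below. The class name `SceneGraph` and its fields are hypothetical, and the particular reading of the sentence (which word is an object, attribute, or relation) is an assumption based on the labels shown later in the deck.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SceneGraph:
    """Hypothetical container with one list per node type."""
    objects: List[str] = field(default_factory=list)
    # (object, attribute): the attribute node hangs off an object node
    attributes: List[Tuple[str, str]] = field(default_factory=list)
    # (subject object, relation, object): the relation node links two objects
    relations: List[Tuple[str, str, str]] = field(default_factory=list)

    def node_counts(self):
        return {
            "object": len(self.objects),
            "attribute": len(self.attributes),
            "relation": len(self.relations),
        }


# Assumed reading of "A young boy wearing black shirt is in front of a goal".
graph = SceneGraph(
    objects=["boy", "shirt", "goal"],
    attributes=[("boy", "young"), ("shirt", "black")],
    relations=[("boy", "wearing", "shirt"), ("boy", "in front of", "goal")],
)
counts = graph.node_counts()
```

Unlike an embedding vector, every entry here can be read off directly, which is the sense in which the representation is more explainable.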

  10. Our Goal
      ● Parsing from sentence to scene graph (i.e., scene graph parsing)
      [Figure: example sentence "A young boy wearing black shirt is in front of a goal"]

  11. Previous Work: Separated Two-stage
      ● Stage 1: standard dependency parsing
      ● Stage 2: heuristic rules; simple classifier
      [Figure: example sentence "a young boy wearing black shirt is in front of a man"]
      Ref: Anderson et al., SPICE: Semantic Propositional Image Caption Evaluation, ECCV 2016

  12. Our Work: End-to-end One-stage
      ● Customized dependency parsing
      [Figure: example sentence "a young boy wearing black shirt is in front of a man"; labels "Customized Dependency Parsing" and "Equivalent"]
      Ref: Anderson et al., SPICE: Semantic Propositional Image Caption Evaluation, ECCV 2016

  13. Section: Method

  14. Scene Graph: Node-centric View

  15. Pushing Labels from Node to Arc
      [Figure: "Node-centric View" and "Edge-centric View" of the same scene graph, marked "Equivalent"]
      ● Arc types: object node to attribute node; object node to relation node; relation node to object node
      ● Different colors are different arc labels
      ● Under the edge-centric view, scene graphs begin to look like dependency parses
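
One way to picture pushing labels from node to arc is a small conversion routine: each edge receives a label determined by the types of its two endpoints, after which the node types are no longer needed. The function name, the label strings, and the node typing of the example sentence are assumptions made for this sketch.

```python
# Node-centric view: every node carries a type, edges are unlabeled.
# Assumption: this typing of the example sentence is an illustrative reading.
node_types = {
    "boy": "object", "shirt": "object", "goal": "object",
    "young": "attribute", "black": "attribute",
    "wearing": "relation", "in front of": "relation",
}
edges = [
    ("boy", "young"), ("shirt", "black"),
    ("boy", "wearing"), ("wearing", "shirt"),
    ("boy", "in front of"), ("in front of", "goal"),
]

# The three arc types named on the slide, keyed by (source type, target type).
ARC_TYPES = {
    ("object", "attribute"): "object-to-attribute",
    ("object", "relation"): "object-to-relation",
    ("relation", "object"): "relation-to-object",
}


def to_edge_centric(node_types, edges):
    """Return (source, arc label, target) triples; labels now live on the arcs."""
    labeled = []
    for source, target in edges:
        key = (node_types[source], node_types[target])
        if key not in ARC_TYPES:
            raise ValueError(f"no arc type for {key}")
        labeled.append((source, ARC_TYPES[key], target))
    return labeled


labeled_arcs = to_edge_centric(node_types, edges)
```

The result is a list of head, label, dependent triples, the same shape as the output of a labeled dependency parser.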

  16. Review of Dependency Parsing
      1. Get a Corpus!
      2. Define a Label Space! NSUBJ, NMOD, CASE, DET, ...
      3. Pick a System (e.g. Arc-Hybrid) and its Actions! LEFT, RIGHT, SHIFT, ...

  17. How do we do Scene Graph Parsing?
      1. Get a Corpus! ?
      2. Define a Label Space! ?
      3. Pick a System (e.g. Arc-Hybrid) and its Actions! ?

  19. Visual Genome
      ● In Visual Genome, every image is annotated with 30 regions on average
      ● Every region is annotated with a (region) description and a (region) scene graph
      [Figure: region descriptions "A young boy wearing black shirt is in front of a goal" and "A kid is sitting on the ground"; region scene graph text "kid sit on ground"]
      Ref: Krishna et al., Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations, IJCV 2017

  20. Alignment Strategy
      ● To mimic a dependency parsing training corpus, we need alignment between nodes in the scene graph and words in the sentence
      ● We propose a two-round alignment strategy:
        ○ Within each round, object, attribute, relation nodes are aligned in this order
        ○ First round is more “conservative” (word-by-word match)
        ○ Second round is more “aggressive” (synonyms match)
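
The two rounds can be sketched as follows, under several assumptions: the `RELATED` table is a tiny hypothetical stand-in for whatever synonym resource is actually used, nodes are keyed by their text, and the node typing of "kid sit on ground" is one plausible reading of the Visual Genome example. This is not the authors' implementation.

```python
ORDER = ("object", "attribute", "relation")   # alignment order within a round
RELATED = {"sit": {"sitting", "sits"}}        # hypothetical synonym/variant table


def matches(node_word, token, aggressive):
    if node_word == token:                    # round 1: word-by-word match
        return True
    return aggressive and token in RELATED.get(node_word, set())  # round 2


def find_span(node_words, tokens, taken, aggressive):
    """First run of consecutive, still-unaligned tokens matching the node."""
    n = len(node_words)
    for start in range(len(tokens) - n + 1):
        span = list(range(start, start + n))
        if any(i in taken for i in span):
            continue
        if all(matches(w, tokens[i], aggressive) for w, i in zip(node_words, span)):
            return span
    return None


def align(nodes, sentence):
    """nodes: list of (text, type). Returns {text: token indices}."""
    tokens = sentence.lower().split()
    alignment, taken = {}, set()
    for aggressive in (False, True):          # conservative round, then aggressive
        for node_type in ORDER:
            for text, kind in nodes:
                if kind != node_type or text in alignment:
                    continue
                span = find_span(text.split(), tokens, taken, aggressive)
                if span is not None:
                    alignment[text] = span
                    taken.update(span)
    return alignment


nodes = [("kid", "object"), ("sit on", "relation"), ("ground", "object")]
alignment = align(nodes, "A kid is sitting on the ground")
```

In this run "kid" and "ground" are aligned in the first round, while "sit on" only finds "sitting on" in the second round, once the looser matching is allowed.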

  21. Alignments made in Round 1
      [Figure: alignments drawn over the tokens "a young boy wearing black shirt is in front of a goal ROOT"]

  22. Alignments made in Round 2
      [Figure: alignments drawn over the tokens "a young boy wearing black shirt is in front of a goal ROOT"]

  23. Alignment Result
      [Figure: final alignments over the tokens "a young boy wearing black shirt is in front of a goal ROOT"]

  24. How do we do Scene Graph Parsing?
      1. Get a Corpus!
      2. Define a Label Space! ?
      3. Pick a System (e.g. Arc-Hybrid) and its Actions! ?

  25. Regular Labels
      1. ATTR: Object to Attribute
      2. SUBJ: Object to Relation
      3. OBJT: Relation to Object
      [Figure: arcs labeled SUBJ, OBJT, SUBJ, ATTR, OBJT, ATTR over the tokens "a young boy wearing black shirt is in front of a goal ROOT"]

  26. Auxiliary Labels
      1. ATTR: Object to Attribute
      2. SUBJ: Object to Relation
      3. OBJT: Relation to Object
      4. CONT: Phrase
      5. BEGN: ROOT to Obj without Head
      [Figure: arcs labeled BEGN, SUBJ, OBJT, SUBJ, ATTR, OBJT, CONT, CONT, ATTR over the tokens "a young boy wearing black shirt is in front of a goal ROOT"]
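
The five labels can be applied to the aligned example roughly as below. The token indices, the choice of the leftmost word as the head of a phrase, and the chaining of CONT arcs through a phrase are assumptions of this sketch rather than details stated on the slide.

```python
from collections import Counter

tokens = "a young boy wearing black shirt is in front of a goal ROOT".split()
ROOT = len(tokens) - 1

# Assumed alignment: node -> token indices in the sentence above.
objects = {"boy": [2], "shirt": [5], "goal": [11]}
attributes = [("boy", [1]), ("shirt", [4])]                      # (object, span)
relations = [("boy", [3], "shirt"), ("boy", [7, 8, 9], "goal")]  # (subj, span, obj)


def head_of(span):
    return span[0]  # assumption: the leftmost word heads a phrase


def build_arcs(objects, attributes, relations, root):
    """Return (head index, label, dependent index) arcs."""
    arcs = []

    def add_phrase(span):
        # CONT links the words inside a multi-word node (chained left to right).
        for left, right in zip(span, span[1:]):
            arcs.append((left, "CONT", right))

    for span in objects.values():
        add_phrase(span)
    for obj, span in attributes:
        add_phrase(span)
        arcs.append((head_of(objects[obj]), "ATTR", head_of(span)))
    for subj, span, obj in relations:
        add_phrase(span)
        arcs.append((head_of(objects[subj]), "SUBJ", head_of(span)))
        arcs.append((head_of(span), "OBJT", head_of(objects[obj])))

    # BEGN: ROOT points to every object that still has no head.
    has_head = {dep for _, _, dep in arcs}
    for span in objects.values():
        if head_of(span) not in has_head:
            arcs.append((root, "BEGN", head_of(span)))
    return arcs


arcs = build_arcs(objects, attributes, relations, ROOT)
label_counts = Counter(label for _, label, _ in arcs)
headless = [tokens[i] for i in range(ROOT) if i not in {d for _, _, d in arcs}]
```

Under these assumptions the label counts agree with the arcs listed in the figure (two each of ATTR, SUBJ, OBJT, CONT and one BEGN), and the words left without a head are "a", "is", and "a".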

  27. How do we do Scene Graph Parsing?
      1. Get a Corpus!
      2. Define a Label Space! BEGN, SUBJ, OBJT, CONT, ATTR
      3. Pick a System (e.g. Arc-Hybrid) and its Actions! ?

  28. Transition-Based Arc-Hybrid System
      Ref: Kuhlmann et al., Dynamic programming algorithms for transition-based dependency parsers, ACL 2011

  29. Transition-Based Arc-Hybrid System
      Ref: Kuhlmann et al., Dynamic programming algorithms for transition-based dependency parsers, ACL 2011

  30. Augmented Arc-Hybrid
      ● We augment Arc-Hybrid with one more action: REDUCE
      ● This is because we don’t require every word to have a head (e.g. “is”)
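
A minimal sketch of the augmented system follows, using the standard arc-hybrid definitions of SHIFT, LEFT, and RIGHT and assuming that REDUCE simply pops the top of the stack without creating an arc. The class name is hypothetical, precondition checks are omitted, and the action sequence replayed at the end is the one shown in the step tables of this deck (steps 0 to 6).

```python
class AugmentedArcHybrid:
    """Arc-hybrid transitions plus REDUCE (sketch, no precondition checks)."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.stack = []
        self.buffer = list(range(len(tokens)))
        self.arcs = []  # (head index, label, dependent index)

    def apply(self, action, label=None):
        if action == "SHIFT":
            # Move the first buffer word onto the stack.
            self.stack.append(self.buffer.pop(0))
        elif action == "LEFT":
            # Stack top becomes a dependent of the first buffer word.
            dep = self.stack.pop()
            self.arcs.append((self.buffer[0], label, dep))
        elif action == "RIGHT":
            # Stack top becomes a dependent of the word beneath it.
            dep = self.stack.pop()
            self.arcs.append((self.stack[-1], label, dep))
        elif action == "REDUCE":
            # Assumption: discard the stack top without giving it a head.
            self.stack.pop()
        else:
            raise ValueError(f"unknown action {action!r}")

    def words(self, indices):
        return [self.tokens[i] for i in indices]


tokens = "a young boy wearing black shirt is in front of a goal ROOT".split()
parser = AugmentedArcHybrid(tokens)
actions = [("SHIFT", None), ("REDUCE", None), ("SHIFT", None),
           ("LEFT", "ATTR"), ("SHIFT", None), ("SHIFT", None), ("SHIFT", None)]
for action, label in actions:
    parser.apply(action, label)

stack_words = parser.words(parser.stack)
named_arcs = [(tokens[h], label, tokens[d]) for h, label, d in parser.arcs]
```

After these seven actions the determiner "a" has been discarded by REDUCE with no head, and the only arc so far is the ATTR arc from "boy" to "young".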

  31. How do we do Scene Graph Parsing?
      1. Get a Corpus!
      2. Define a Label Space! BEGN, SUBJ, OBJT, CONT, ATTR
      3. Define Actions in a System (e.g. Arc-Hybrid)! LEFT, RIGHT, SHIFT, REDUCE

  32. Detailed Architecture
      1. Initialization
      2. Predict the next action to take
      Step | Stack | Buffer | Action
      0 | (empty) | a young boy wearing black shirt is in front of a goal ROOT | SHIFT
      1 | a | young boy wearing black shirt is in front of a goal ROOT | REDUCE
      2 | (empty) | young boy wearing black shirt is in front of a goal ROOT | SHIFT
      3 | young | boy wearing black shirt is in front of a goal ROOT | LEFT(ATTR)
      4 | (empty) | boy wearing black shirt is in front of a goal ROOT | SHIFT
      Ref: Kiperwasser and Goldberg, Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations, TACL 2016

  33. Detailed Architecture
      [Figure: the tokens "a young boy wearing black shirt is in front of a goal ROOT" feed a block labeled "BiLSTM" and a block labeled "2 fully connected layers", shown beside the step table for steps 0 to 4 (SHIFT, REDUCE, SHIFT, LEFT(ATTR), SHIFT)]
      Ref: Kiperwasser and Goldberg, Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations, TACL 2016

  34. Step 0. Stack: (empty). Buffer: a young boy wearing black shirt is in front of a goal ROOT. Action: SHIFT

  35. Step 1. Stack: a. Buffer: young boy wearing black shirt is in front of a goal ROOT. Action: REDUCE

  36. Step 2. Stack: (empty). Buffer: young boy wearing black shirt is in front of a goal ROOT. Action: SHIFT

  37. Step 3. Stack: young. Buffer: boy wearing black shirt is in front of a goal ROOT. Action: LEFT(ATTR)
      [Figure: ATTR arc drawn over the sentence]

  38. Step 4. Stack: (empty). Buffer: boy wearing black shirt is in front of a goal ROOT. Action: SHIFT
      [Figure: ATTR arc drawn over the sentence]

  39. Step 5. Stack: boy. Buffer: wearing black shirt is in front of a goal ROOT. Action: SHIFT
      [Figure: ATTR arc drawn over the sentence]

  40. Step 6. Stack: boy wearing. Buffer: black shirt is in front of a goal ROOT. Action: SHIFT
      [Figure: ATTR arc drawn over the sentence]
