Scene Graph Parsing as Dependency Parsing


slide-1
SLIDE 1

Scene Graph Parsing as Dependency Parsing

Authors: Yu-Siang Wang, Chenxi Liu, Xiaohui Zeng, Alan Yuille
Conference: North American Chapter of the Association for Computational Linguistics (NAACL), 2018

slide-2
SLIDE 2

Outline

  • Introduction
  • Method
  • Experiments
  • Conclusion

slide-3
SLIDE 3

Introduction

  • Introduction
  • Method
  • Experiments
  • Conclusion

slide-4
SLIDE 4

Introduction

A young boy wearing black shirt is in front of a goal

Intermediate Representation

  • Many multimodal tasks fit into this picture
slide-5
SLIDE 5

Image Generation from Text

Intermediate Representation

A young boy wearing black shirt is in front of a goal

slide-6
SLIDE 6

Image Captioning

Intermediate Representation

A young boy wearing black shirt is in front of a goal

slide-7
SLIDE 7

Image Retrieval

Intermediate Representation

A young boy wearing black shirt is in front of a goal

slide-8
SLIDE 8
Neural Network Embedding

  • Neural network embeddings are often used as the intermediate representation
  • Pro: easy training; similarity can be computed with cosine distance
  • Con: no explicit structure; no easy interpretability

A young boy wearing black shirt is in front of a goal

[Figure: embedding vectors, e.g. (1.2, -1.3, 4.6, …, -3.7), (2.3, -2.2, -2.6, …, 5.3), (3.8, -7.4, -5.9, …, -3.2)]
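
As a small illustration of the cosine-similarity point on this slide, the sketch below compares embedding vectors in plain Python. The vectors are made-up toy values chosen for illustration; they do not come from any trained model.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy embeddings (hypothetical values, for illustration only).
text_vec = [1.0, 2.0, -1.0]
image_vec = [2.0, 4.0, -2.0]   # same direction as text_vec
other_vec = [2.0, -1.0, 0.0]   # orthogonal to text_vec

sim_same = cosine_similarity(text_vec, image_vec)    # close to 1
sim_other = cosine_similarity(text_vec, other_vec)   # close to 0
```

The score says how close two vectors are, but nothing about which objects or relations they share, which is the interpretability gap noted above.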

slide-9
SLIDE 9

Scene Graph

Ref: Johnson et al., Image Retrieval Using Scene Graphs, CVPR 2015

A young boy wearing black shirt is in front of a goal

  • More recently, people have started exploring a more explainable representation
  • A scene graph has 3 types of nodes: object, attribute, relation
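
One way to hold the three node types in code is sketched below for the running example sentence. The `SceneGraph` class and its field names are hypothetical, chosen only for illustration; they are not the authors' data format.

```python
from dataclasses import dataclass, field

@dataclass
class SceneGraph:
    """Minimal container for the three node types (illustrative only)."""
    objects: set = field(default_factory=set)
    attributes: set = field(default_factory=set)   # (object, attribute) pairs
    relations: set = field(default_factory=set)    # (subject, relation, object) triples

    def node_count(self):
        return len(self.objects) + len(self.attributes) + len(self.relations)

# "A young boy wearing black shirt is in front of a goal"
graph = SceneGraph(
    objects={"boy", "shirt", "goal"},
    attributes={("boy", "young"), ("shirt", "black")},
    relations={("boy", "wearing", "shirt"), ("boy", "in front of", "goal")},
)
```

Function words such as "a" and "is" do not appear in the graph at all.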
slide-10
SLIDE 10

Our Goal

A young boy wearing black shirt is in front of a goal

  • Parsing from sentence to scene graph (i.e., scene graph parsing)
slide-11
SLIDE 11

Previous Work: Separated Two-stage

Ref: Anderson et al., SPICE: Semantic Propositional Image Caption Evaluation, ECCV 2016

a young boy wearing black shirt is in front of a man

Stage 1: Standard Dependency Parsing
Stage 2: Heuristic rules; Simple classifier

slide-12
SLIDE 12

Our Work: End-to-end One-stage

Ref: Anderson et al., SPICE: Semantic Propositional Image Caption Evaluation, ECCV 2016

a young boy wearing black shirt is in front of a man

Customized Dependency Parsing

Equivalent

slide-13
SLIDE 13

Method

  • Introduction
  • Method
  • Experiments
  • Conclusion

slide-14
SLIDE 14

Scene Graph

Node-centric View

slide-15
SLIDE 15

Pushing Labels from Node to Arc

Node-centric View and Edge-centric View are equivalent

  • Different colors are different arc labels:
      ○ Object node to attribute node
      ○ Object node to relation node
      ○ Relation node to object node
  • Under the edge-centric view, scene graphs begin to look like dependency parses
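
The equivalence of the two views can be sketched as a round trip: push each node's type onto its arcs, then recover the types from the arc labels alone. The label names ATTR, SUBJ and OBJT follow the Regular Labels slide; the function names and the dictionary layout are assumptions made for this sketch.

```python
# Node-centric view: typed nodes, unlabeled directed edges.
node_types = {
    "boy": "object", "shirt": "object", "goal": "object",
    "young": "attribute", "black": "attribute",
    "wearing": "relation", "in front of": "relation",
}
edges = [
    ("boy", "young"), ("shirt", "black"),
    ("boy", "wearing"), ("wearing", "shirt"),
    ("boy", "in front of"), ("in front of", "goal"),
]

# One arc label per (head type, dependent type) pair.
LABEL_OF = {
    ("object", "attribute"): "ATTR",
    ("object", "relation"): "SUBJ",
    ("relation", "object"): "OBJT",
}
TYPES_OF = {label: types for types, label in LABEL_OF.items()}

def to_edge_centric(node_types, edges):
    """Push node types onto the arcs as labels."""
    return [(h, LABEL_OF[(node_types[h], node_types[d])], d) for h, d in edges]

def to_node_centric(arcs):
    """Recover node types from arc labels alone."""
    types = {}
    for head, label, dep in arcs:
        types[head], types[dep] = TYPES_OF[label]
    return types, [(head, dep) for head, _, dep in arcs]

arcs = to_edge_centric(node_types, edges)
recovered_types, recovered_edges = to_node_centric(arcs)
```

Because every node touches at least one arc, no type information is lost in the edge-centric form.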

slide-16
SLIDE 16

Review of Dependency Parsing

  • 1. Get a Corpus!
  • 2. Define a Label Space! (NSUBJ, NMOD, CASE, DET, ...)
  • 3. Pick a System (e.g. Arc-Hybrid) and its Actions! (LEFT, RIGHT, SHIFT, ...)

slide-17
SLIDE 17

How do we do Scene Graph Parsing?

  • 1. Get a Corpus! (?)
  • 2. Define a Label Space! (?)
  • 3. Pick a System (e.g. Arc-Hybrid) and its Actions! (?)

slide-18
SLIDE 18

How do we do Scene Graph Parsing?

  • 1. Get a Corpus! (?)
  • 2. Define a Label Space! (?)
  • 3. Pick a System (e.g. Arc-Hybrid) and its Actions! (?)

slide-19
SLIDE 19

Visual Genome

  • In Visual Genome, every image is annotated with 30 regions on average
  • Every region is annotated with a (region) description and a (region) scene graph

Example region descriptions: "A young boy wearing black shirt is in front of a goal"; "A kid is sitting on the ground"
Example region scene graph: kid, sit on, ground

Ref: Krishna et al., Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations, IJCV 2017

slide-20
SLIDE 20

Alignment Strategy

  • To mimic a dependency parsing training corpus, we need alignment between nodes in the scene graph and words in the sentence
  • We propose a two-round alignment strategy:
      ○ Within each round, object, attribute, relation nodes are aligned in this order
      ○ First round is more “conservative” (word-by-word match)
      ○ Second round is more “aggressive” (synonyms match)
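
A minimal sketch of a two-round alignment in this spirit is shown below. It is not the authors' implementation: the `align` function, the dictionary layout of the graph, and the tiny hand-written `synonyms` table are all assumptions, and the real matching rules may differ.

```python
def align(sentence, graph, synonyms):
    """Map each scene graph node to word indices in the sentence.

    Round 1 uses exact word matches; round 2 also accepts entries from
    `synonyms` (node word -> set of accepted surface forms).
    """
    words = sentence.lower().split()
    alignment, used = {}, set()

    def find(node_words, match):
        n = len(node_words)
        for start in range(len(words) - n + 1):
            span = list(range(start, start + n))
            if any(i in used for i in span):
                continue  # each word is aligned to at most one node
            if all(match(words[i], t) for i, t in zip(span, node_words)):
                return span
        return None

    def exact(w, t):
        return w == t

    def loose(w, t):
        return w == t or w in synonyms.get(t, ())

    for match in (exact, loose):                              # round 1, then round 2
        for kind in ("objects", "attributes", "relations"):   # fixed order within a round
            for node in graph[kind]:
                if node in alignment:
                    continue
                span = find(node.split(), match)
                if span is not None:
                    alignment[node] = span
                    used.update(span)
    return alignment

graph = {"objects": ["kid", "ground"], "attributes": [], "relations": ["sit on"]}
synonyms = {"sit": {"sitting", "sits"}}   # hypothetical table for this example
alignment = align("A kid is sitting on the ground", graph, synonyms)
```

In this toy run, "kid" and "ground" are aligned in the conservative round, while "sit on" only matches "sitting on" in the aggressive round.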

slide-21
SLIDE 21

Alignments made in Round 1

[Figure: alignments over the words "a young boy wearing black shirt is in front of a goal" and ROOT]

slide-22
SLIDE 22

Alignments made in Round 2

[Figure: alignments over the words "a young boy wearing black shirt is in front of a goal" and ROOT]

slide-23
SLIDE 23

Alignment Result

[Figure: alignments over the words "a young boy wearing black shirt is in front of a goal" and ROOT]

slide-24
SLIDE 24

How do we do Scene Graph Parsing?

  • 1. Get a Corpus!
  • 2. Define a Label Space! (?)
  • 3. Pick a System (e.g. Arc-Hybrid) and its Actions! (?)

slide-25
SLIDE 25
Regular Labels

  • 1. ATTR: Object to Attribute
  • 2. SUBJ: Object to Relation
  • 3. OBJT: Relation to Object

[Figure: the sentence "a young boy wearing black shirt is in front of a goal" followed by ROOT, with arcs labeled ATTR, SUBJ, OBJT, ATTR, SUBJ, OBJT]

slide-26
SLIDE 26

Auxiliary Labels

  • 1. ATTR: Object to Attribute
  • 2. SUBJ: Object to Relation
  • 3. OBJT: Relation to Object
  • 4. CONT: Phrase
  • 5. BEGN: ROOT to Obj without Head

[Figure: the sentence "a young boy wearing black shirt is in front of a goal" followed by ROOT, with arcs labeled ATTR, SUBJ, OBJT, ATTR, SUBJ, CONT, CONT, OBJT, BEGN]
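
To show how the five labels carry a full scene graph, the sketch below converts a set of labeled arcs back into objects, attributes and relations, merging CONT chains into multi-word phrases. The arc list is one reading of the example figure (heads and dependents given as word indices); the function names are hypothetical.

```python
WORDS = "a young boy wearing black shirt is in front of a goal".split() + ["ROOT"]

# (head index, label, dependent index); an assumed reading of the example.
ARCS = [
    (2, "ATTR", 1),    # boy -> young
    (5, "ATTR", 4),    # shirt -> black
    (3, "OBJT", 5),    # wearing -> shirt
    (2, "SUBJ", 3),    # boy -> wearing
    (8, "CONT", 7),    # front -> in
    (9, "CONT", 8),    # of -> front
    (9, "OBJT", 11),   # of -> goal
    (2, "SUBJ", 9),    # boy -> of
    (12, "BEGN", 2),   # ROOT -> boy
]

def phrase(index, arcs, words):
    """Words of the node at `index` plus its CONT descendants, in sentence order."""
    members = {index}
    changed = True
    while changed:
        changed = False
        for head, label, dep in arcs:
            if label == "CONT" and head in members and dep not in members:
                members.add(dep)
                changed = True
    return " ".join(words[i] for i in sorted(members))

def arcs_to_scene_graph(arcs, words):
    objects, attributes, relations = set(), set(), set()
    for head, label, dep in arcs:
        if label == "BEGN":
            objects.add(phrase(dep, arcs, words))
        elif label == "ATTR":
            obj = phrase(head, arcs, words)
            objects.add(obj)
            attributes.add((obj, phrase(dep, arcs, words)))
        elif label == "SUBJ":
            subj = phrase(head, arcs, words)
            objects.add(subj)
            # Pair the relation with every object it points to.
            for head2, label2, dep2 in arcs:
                if label2 == "OBJT" and head2 == dep:
                    obj = phrase(dep2, arcs, words)
                    objects.add(obj)
                    relations.add((subj, phrase(dep, arcs, words), obj))
    return objects, attributes, relations

objects, attributes, relations = arcs_to_scene_graph(ARCS, WORDS)
```

The two CONT arcs are what turn the single head word "of" into the relation phrase "in front of".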

slide-27
SLIDE 27

How do we do Scene Graph Parsing?

  • 1. Get a Corpus!
  • 2. Define a Label Space! (BEGN, SUBJ, OBJT, CONT, ATTR)
  • 3. Pick a System (e.g. Arc-Hybrid) and its Actions! (?)

slide-28
SLIDE 28

Transition-Based Arc-Hybrid System

Ref: Kuhlmann et al., Dynamic Programming Algorithms for Transition-Based Dependency Parsers, ACL 2011

slide-29
SLIDE 29

Transition-Based Arc-Hybrid System

Ref: Kuhlmann et al., Dynamic Programming Algorithms for Transition-Based Dependency Parsers, ACL 2011

slide-30
SLIDE 30

Augmented Arc-Hybrid

  • We augment Arc-Hybrid with one more action: REDUCE
  • This is because we don’t require every word to have a head (e.g. “is”)
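
The augmented transition system can be sketched in a few lines. The code below replays the gold action sequence for the running example (the same sequence traced step by step in the Detailed Architecture slides); it contains no learned model, and the `parse` function and tuple encoding of actions are assumptions made for this sketch.

```python
def parse(words, actions):
    """Replay arc-hybrid actions plus REDUCE; returns (stack, buffer, arcs)."""
    buffer = list(range(len(words)))   # word indices; ROOT is last
    stack, arcs = [], []
    for name, label in actions:
        if name == "SHIFT":            # move the buffer front onto the stack
            stack.append(buffer.pop(0))
        elif name == "REDUCE":         # drop the stack top without giving it a head
            stack.pop()
        elif name == "LEFT":           # head = buffer front, dependent = stack top
            dep = stack.pop()
            arcs.append((buffer[0], label, dep))
        elif name == "RIGHT":          # head = new stack top, dependent = old stack top
            dep = stack.pop()
            arcs.append((stack[-1], label, dep))
        else:
            raise ValueError(f"unknown action: {name}")
    return stack, buffer, arcs

words = "a young boy wearing black shirt is in front of a goal".split() + ["ROOT"]
actions = [
    ("SHIFT", None), ("REDUCE", None),                    # "a" gets no head
    ("SHIFT", None), ("LEFT", "ATTR"),                    # boy -> young
    ("SHIFT", None), ("SHIFT", None), ("SHIFT", None),
    ("LEFT", "ATTR"),                                     # shirt -> black
    ("SHIFT", None), ("RIGHT", "OBJT"), ("RIGHT", "SUBJ"),
    ("SHIFT", None), ("REDUCE", None),                    # "is" gets no head
    ("SHIFT", None), ("LEFT", "CONT"),                    # front -> in
    ("SHIFT", None), ("LEFT", "CONT"),                    # of -> front
    ("SHIFT", None), ("SHIFT", None), ("REDUCE", None),   # second "a" gets no head
    ("SHIFT", None), ("RIGHT", "OBJT"), ("RIGHT", "SUBJ"),
    ("LEFT", "BEGN"),                                     # ROOT -> boy
]

stack, buffer, arcs = parse(words, actions)
named_arcs = [(words[h], label, words[d]) for h, label, d in arcs]
```

Without REDUCE, the words "a", "is" and the second "a" could never leave the stack, since the standard actions only remove a word by attaching it to a head.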
slide-31
SLIDE 31

How do we do Scene Graph Parsing?

  • 1. Get a Corpus!
  • 2. Define a Label Space! (BEGN, SUBJ, OBJT, CONT, ATTR)
  • 3. Define Actions in a System (e.g. Arc-Hybrid)! (LEFT, RIGHT, SHIFT, REDUCE)

slide-32
SLIDE 32

Detailed Architecture

  • 1. Initialization
  • 2. Predict the next action to take

Step | Stack | Buffer | Action
- | (empty) | a young boy wearing black shirt is in front of a goal ROOT | SHIFT
1 | a | young boy wearing black shirt is in front of a goal ROOT | REDUCE
2 | (empty) | young boy wearing black shirt is in front of a goal ROOT | SHIFT
3 | young | boy wearing black shirt is in front of a goal ROOT | LEFT(ATTR)
4 | (empty) | boy wearing black shirt is in front of a goal ROOT | SHIFT

Ref: Kiperwasser and Goldberg, Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations, TACL 2016

slide-33
SLIDE 33

Detailed Architecture

Step | Stack | Buffer | Action
- | (empty) | a young boy wearing black shirt is in front of a goal ROOT | SHIFT
1 | a | young boy wearing black shirt is in front of a goal ROOT | REDUCE
2 | (empty) | young boy wearing black shirt is in front of a goal ROOT | SHIFT
3 | young | boy wearing black shirt is in front of a goal ROOT | LEFT(ATTR)
4 | (empty) | boy wearing black shirt is in front of a goal ROOT | SHIFT

[Figure: the sentence "a young boy wearing black shirt is in front of a goal" followed by ROOT, fed to a BiLSTM and 2 fully connected layers]

Ref: Kiperwasser and Goldberg, Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations, TACL 2016

slide-34
SLIDE 34

Step | Stack | Buffer | Action
- | (empty) | a young boy wearing black shirt is in front of a goal ROOT | SHIFT

[Figure: the sentence "a young boy wearing black shirt is in front of a goal" followed by ROOT]

slide-35
SLIDE 35

Step | Stack | Buffer | Action
- | (empty) | a young boy wearing black shirt is in front of a goal ROOT | SHIFT
1 | a | young boy wearing black shirt is in front of a goal ROOT | REDUCE

[Figure: the sentence "a young boy wearing black shirt is in front of a goal" followed by ROOT]

slide-36
SLIDE 36

Step | Stack | Buffer | Action
- | (empty) | a young boy wearing black shirt is in front of a goal ROOT | SHIFT
1 | a | young boy wearing black shirt is in front of a goal ROOT | REDUCE
2 | (empty) | young boy wearing black shirt is in front of a goal ROOT | SHIFT

[Figure: the sentence "a young boy wearing black shirt is in front of a goal" followed by ROOT]

slide-37
SLIDE 37

Step | Stack | Buffer | Action
- | (empty) | a young boy wearing black shirt is in front of a goal ROOT | SHIFT
1 | a | young boy wearing black shirt is in front of a goal ROOT | REDUCE
2 | (empty) | young boy wearing black shirt is in front of a goal ROOT | SHIFT
3 | young | boy wearing black shirt is in front of a goal ROOT | LEFT(ATTR)

[Figure: the sentence "a young boy wearing black shirt is in front of a goal" followed by ROOT, with arcs drawn so far: ATTR]

slide-38
SLIDE 38

Step | Stack | Buffer | Action
- | (empty) | a young boy wearing black shirt is in front of a goal ROOT | SHIFT
1 | a | young boy wearing black shirt is in front of a goal ROOT | REDUCE
2 | (empty) | young boy wearing black shirt is in front of a goal ROOT | SHIFT
3 | young | boy wearing black shirt is in front of a goal ROOT | LEFT(ATTR)
4 | (empty) | boy wearing black shirt is in front of a goal ROOT | SHIFT

[Figure: the sentence "a young boy wearing black shirt is in front of a goal" followed by ROOT, with arcs drawn so far: ATTR]

slide-39
SLIDE 39

Step | Stack | Buffer | Action
1 | a | young boy wearing black shirt is in front of a goal ROOT | REDUCE
2 | (empty) | young boy wearing black shirt is in front of a goal ROOT | SHIFT
3 | young | boy wearing black shirt is in front of a goal ROOT | LEFT(ATTR)
4 | (empty) | boy wearing black shirt is in front of a goal ROOT | SHIFT
5 | boy | wearing black shirt is in front of a goal ROOT | SHIFT

[Figure: the sentence "a young boy wearing black shirt is in front of a goal" followed by ROOT, with arcs drawn so far: ATTR]

slide-40
SLIDE 40

Step | Stack | Buffer | Action
2 | (empty) | young boy wearing black shirt is in front of a goal ROOT | SHIFT
3 | young | boy wearing black shirt is in front of a goal ROOT | LEFT(ATTR)
4 | (empty) | boy wearing black shirt is in front of a goal ROOT | SHIFT
5 | boy | wearing black shirt is in front of a goal ROOT | SHIFT
6 | boy wearing | black shirt is in front of a goal ROOT | SHIFT

[Figure: the sentence "a young boy wearing black shirt is in front of a goal" followed by ROOT, with arcs drawn so far: ATTR]

slide-41
SLIDE 41

Step | Stack | Buffer | Action
3 | young | boy wearing black shirt is in front of a goal ROOT | LEFT(ATTR)
4 | (empty) | boy wearing black shirt is in front of a goal ROOT | SHIFT
5 | boy | wearing black shirt is in front of a goal ROOT | SHIFT
6 | boy wearing | black shirt is in front of a goal ROOT | SHIFT
7 | boy wearing black | shirt is in front of a goal ROOT | LEFT(ATTR)

[Figure: the sentence "a young boy wearing black shirt is in front of a goal" followed by ROOT, with arcs drawn so far: ATTR, ATTR]

slide-42
SLIDE 42

Step | Stack | Buffer | Action
4 | (empty) | boy wearing black shirt is in front of a goal ROOT | SHIFT
5 | boy | wearing black shirt is in front of a goal ROOT | SHIFT
6 | boy wearing | black shirt is in front of a goal ROOT | SHIFT
7 | boy wearing black | shirt is in front of a goal ROOT | LEFT(ATTR)
8 | boy wearing | shirt is in front of a goal ROOT | SHIFT

[Figure: the sentence "a young boy wearing black shirt is in front of a goal" followed by ROOT, with arcs drawn so far: ATTR, ATTR]

slide-43
SLIDE 43

Step | Stack | Buffer | Action
5 | boy | wearing black shirt is in front of a goal ROOT | SHIFT
6 | boy wearing | black shirt is in front of a goal ROOT | SHIFT
7 | boy wearing black | shirt is in front of a goal ROOT | LEFT(ATTR)
8 | boy wearing | shirt is in front of a goal ROOT | SHIFT
9 | boy wearing shirt | is in front of a goal ROOT | RIGHT(OBJT)

[Figure: the sentence "a young boy wearing black shirt is in front of a goal" followed by ROOT, with arcs drawn so far: ATTR, ATTR, OBJT]

slide-44
SLIDE 44

Step | Stack | Buffer | Action
6 | boy wearing | black shirt is in front of a goal ROOT | SHIFT
7 | boy wearing black | shirt is in front of a goal ROOT | LEFT(ATTR)
8 | boy wearing | shirt is in front of a goal ROOT | SHIFT
9 | boy wearing shirt | is in front of a goal ROOT | RIGHT(OBJT)
10 | boy wearing | is in front of a goal ROOT | RIGHT(SUBJ)

[Figure: the sentence "a young boy wearing black shirt is in front of a goal" followed by ROOT, with arcs drawn so far: ATTR, ATTR, OBJT, SUBJ]

slide-45
SLIDE 45

Step | Stack | Buffer | Action
7 | boy wearing black | shirt is in front of a goal ROOT | LEFT(ATTR)
8 | boy wearing | shirt is in front of a goal ROOT | SHIFT
9 | boy wearing shirt | is in front of a goal ROOT | RIGHT(OBJT)
10 | boy wearing | is in front of a goal ROOT | RIGHT(SUBJ)
11 | boy | is in front of a goal ROOT | SHIFT

[Figure: the sentence "a young boy wearing black shirt is in front of a goal" followed by ROOT, with arcs drawn so far: ATTR, ATTR, OBJT, SUBJ]

slide-46
SLIDE 46

Step | Stack | Buffer | Action
8 | boy wearing | shirt is in front of a goal ROOT | SHIFT
9 | boy wearing shirt | is in front of a goal ROOT | RIGHT(OBJT)
10 | boy wearing | is in front of a goal ROOT | RIGHT(SUBJ)
11 | boy | is in front of a goal ROOT | SHIFT
12 | boy is | in front of a goal ROOT | REDUCE

[Figure: the sentence "a young boy wearing black shirt is in front of a goal" followed by ROOT, with arcs drawn so far: ATTR, ATTR, OBJT, SUBJ]

slide-47
SLIDE 47

Step | Stack | Buffer | Action
9 | boy wearing shirt | is in front of a goal ROOT | RIGHT(OBJT)
10 | boy wearing | is in front of a goal ROOT | RIGHT(SUBJ)
11 | boy | is in front of a goal ROOT | SHIFT
12 | boy is | in front of a goal ROOT | REDUCE
13 | boy | in front of a goal ROOT | SHIFT

[Figure: the sentence "a young boy wearing black shirt is in front of a goal" followed by ROOT, with arcs drawn so far: ATTR, ATTR, OBJT, SUBJ]

slide-48
SLIDE 48

Step | Stack | Buffer | Action
10 | boy wearing | is in front of a goal ROOT | RIGHT(SUBJ)
11 | boy | is in front of a goal ROOT | SHIFT
12 | boy is | in front of a goal ROOT | REDUCE
13 | boy | in front of a goal ROOT | SHIFT
14 | boy in | front of a goal ROOT | LEFT(CONT)

[Figure: the sentence "a young boy wearing black shirt is in front of a goal" followed by ROOT, with arcs drawn so far: ATTR, ATTR, OBJT, SUBJ, CONT]

slide-49
SLIDE 49

Step | Stack | Buffer | Action
11 | boy | is in front of a goal ROOT | SHIFT
12 | boy is | in front of a goal ROOT | REDUCE
13 | boy | in front of a goal ROOT | SHIFT
14 | boy in | front of a goal ROOT | LEFT(CONT)
15 | boy | front of a goal ROOT | SHIFT

[Figure: the sentence "a young boy wearing black shirt is in front of a goal" followed by ROOT, with arcs drawn so far: ATTR, ATTR, OBJT, SUBJ, CONT]

slide-50
SLIDE 50

Step | Stack | Buffer | Action
12 | boy is | in front of a goal ROOT | REDUCE
13 | boy | in front of a goal ROOT | SHIFT
14 | boy in | front of a goal ROOT | LEFT(CONT)
15 | boy | front of a goal ROOT | SHIFT
16 | boy front | of a goal ROOT | LEFT(CONT)

[Figure: the sentence "a young boy wearing black shirt is in front of a goal" followed by ROOT, with arcs drawn so far: ATTR, ATTR, OBJT, SUBJ, CONT, CONT]

slide-51
SLIDE 51

Step | Stack | Buffer | Action
13 | boy | in front of a goal ROOT | SHIFT
14 | boy in | front of a goal ROOT | LEFT(CONT)
15 | boy | front of a goal ROOT | SHIFT
16 | boy front | of a goal ROOT | LEFT(CONT)
17 | boy | of a goal ROOT | SHIFT

[Figure: the sentence "a young boy wearing black shirt is in front of a goal" followed by ROOT, with arcs drawn so far: ATTR, ATTR, OBJT, SUBJ, CONT, CONT]

slide-52
SLIDE 52

Step | Stack | Buffer | Action
14 | boy in | front of a goal ROOT | LEFT(CONT)
15 | boy | front of a goal ROOT | SHIFT
16 | boy front | of a goal ROOT | LEFT(CONT)
17 | boy | of a goal ROOT | SHIFT
18 | boy of | a goal ROOT | SHIFT

[Figure: the sentence "a young boy wearing black shirt is in front of a goal" followed by ROOT, with arcs drawn so far: ATTR, ATTR, OBJT, SUBJ, CONT, CONT]

slide-53
SLIDE 53

Step | Stack | Buffer | Action
15 | boy | front of a goal ROOT | SHIFT
16 | boy front | of a goal ROOT | LEFT(CONT)
17 | boy | of a goal ROOT | SHIFT
18 | boy of | a goal ROOT | SHIFT
19 | boy of a | goal ROOT | REDUCE

[Figure: the sentence "a young boy wearing black shirt is in front of a goal" followed by ROOT, with arcs drawn so far: ATTR, ATTR, OBJT, SUBJ, CONT, CONT]

slide-54
SLIDE 54

Step | Stack | Buffer | Action
16 | boy front | of a goal ROOT | LEFT(CONT)
17 | boy | of a goal ROOT | SHIFT
18 | boy of | a goal ROOT | SHIFT
19 | boy of a | goal ROOT | REDUCE
20 | boy of | goal ROOT | SHIFT

[Figure: the sentence "a young boy wearing black shirt is in front of a goal" followed by ROOT, with arcs drawn so far: ATTR, ATTR, OBJT, SUBJ, CONT, CONT]

slide-55
SLIDE 55

Step | Stack | Buffer | Action
17 | boy | of a goal ROOT | SHIFT
18 | boy of | a goal ROOT | SHIFT
19 | boy of a | goal ROOT | REDUCE
20 | boy of | goal ROOT | SHIFT
21 | boy of goal | ROOT | RIGHT(OBJT)

[Figure: the sentence "a young boy wearing black shirt is in front of a goal" followed by ROOT, with arcs drawn so far: ATTR, ATTR, OBJT, SUBJ, CONT, CONT, OBJT]

slide-56
SLIDE 56

Step | Stack | Buffer | Action
18 | boy of | a goal ROOT | SHIFT
19 | boy of a | goal ROOT | REDUCE
20 | boy of | goal ROOT | SHIFT
21 | boy of goal | ROOT | RIGHT(OBJT)
22 | boy of | ROOT | RIGHT(SUBJ)

[Figure: the sentence "a young boy wearing black shirt is in front of a goal" followed by ROOT, with arcs drawn so far: ATTR, ATTR, OBJT, SUBJ, CONT, CONT, OBJT, SUBJ]

slide-57
SLIDE 57

Step | Stack | Buffer | Action
19 | boy of a | goal ROOT | REDUCE
20 | boy of | goal ROOT | SHIFT
21 | boy of goal | ROOT | RIGHT(OBJT)
22 | boy of | ROOT | RIGHT(SUBJ)
23 | boy | ROOT | LEFT(BEGN)

[Figure: the sentence "a young boy wearing black shirt is in front of a goal" followed by ROOT, with arcs drawn so far: ATTR, ATTR, OBJT, SUBJ, CONT, CONT, OBJT, SUBJ, BEGN]

slide-58
SLIDE 58

Step | Stack | Buffer | Action
20 | boy of | goal ROOT | SHIFT
21 | boy of goal | ROOT | RIGHT(OBJT)
22 | boy of | ROOT | RIGHT(SUBJ)
23 | boy | ROOT | LEFT(BEGN)
24 | (empty) | ROOT |

[Figure: the sentence "a young boy wearing black shirt is in front of a goal" followed by ROOT, with arcs drawn so far: ATTR, ATTR, OBJT, SUBJ, CONT, CONT, OBJT, SUBJ, BEGN]

slide-59
SLIDE 59

Experiments

  • Introduction
  • Method
  • Experiments
  • Conclusion

slide-60
SLIDE 60
Experiment 1: Scene Graph Parsing

  • Dataset statistics (intersection of MS COCO and Visual Genome):

 | Training | Validation
# of Images | 34027 | 17471
# of Sentences/Scene Graphs | 1070145 | 547795

  • Evaluated by F-score between parsed scene graph and ground truth scene graph
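
A minimal sketch of an F-score between two scene graphs is given below, assuming each graph is flattened into a set of tuples and tuples are compared by exact match. The exact-match rule and the toy predicted graph are assumptions of this sketch; the evaluation used in the experiments may match tuples more leniently.

```python
def f_score(pred, gold):
    """Harmonic mean of precision and recall over exact tuple matches."""
    matched = len(pred & gold)
    if matched == 0:
        return 0.0
    precision = matched / len(pred)
    recall = matched / len(gold)
    return 2 * precision * recall / (precision + recall)

# Ground truth for "A young boy wearing black shirt is in front of a goal".
gold = {
    ("boy",), ("shirt",), ("goal",),
    ("boy", "young"), ("shirt", "black"),
    ("boy", "wearing", "shirt"), ("boy", "in front of", "goal"),
}
# Hypothetical parser output: misses "black", gets the second relation wrong.
pred = {
    ("boy",), ("shirt",), ("goal",),
    ("boy", "young"),
    ("boy", "wearing", "shirt"), ("boy", "in", "goal"),
}

score = f_score(pred, gold)   # 5 matches: precision 5/6, recall 5/7
```

Here five tuples match, so the F-score is 10/13, roughly 0.77.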

slide-61
SLIDE 61

Scene Graph Parsing Results

Parser | F-Score
Stanford (Schuster et al., 2015) [Separated Two-stage] | 0.3549
SPICE (Anderson et al., 2016) [Separated Two-stage] | 0.4469
Ours [End-to-end One-stage] | 0.4967

slide-62
SLIDE 62

Scene Graph Parsing Oracle

Parser | F-Score
Stanford (Schuster et al., 2015) [Separated Two-stage] | 0.3549
SPICE (Anderson et al., 2016) [Separated Two-stage] | 0.4469
Ours [End-to-end One-stage] | 0.4967
Oracle | 0.6985

slide-63
SLIDE 63

Scene Graph Parsing Ablation Studies

Parser | F-Score
Ours (CONT pointing left) | 0.4967
Ours (CONT pointing right) | 0.4952
Ours (1 round aggressive alignment) | 0.4877
Ours (1 round conservative alignment) | 0.4538

slide-64
SLIDE 64

Experiment 2: Image Retrieval

Ref: Schuster et al., Generating Semantically Precise Scene Graphs from Textual Descriptions for Improved Image Retrieval, EMNLP 2015 Workshop

  • Task: Rank images based on relevance to the input sentence/query.
  • Dataset: Same as (Schuster et al., 2015); a smaller version of Visual Genome.
  • Experiment:
      ○ Directly apply the parser learned in the previous experiment to parse the query into a scene graph
      ○ Compute the F-score between the parsed scene graph and the ground truth scene graph obtained from the image
      ○ Rank the images based on this F-score similarity
  • Evaluation metric: Recall@5; Recall@10; Median rank.
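
The retrieval protocol can be sketched as follows: score every image's scene graph against the parsed query, rank by that score, then compute Recall@k and median rank over all queries. The image graphs, queries and function names are toy assumptions, and the exact-match F-score is a simplification.

```python
from statistics import median

def f_score(pred, gold):
    """Exact-match F-score between two sets of scene graph tuples."""
    matched = len(pred & gold)
    if matched == 0:
        return 0.0
    precision, recall = matched / len(pred), matched / len(gold)
    return 2 * precision * recall / (precision + recall)

def rank_of_target(query, image_graphs, target_id):
    """1-based rank of the target image when sorted by F-score similarity."""
    ranked = sorted(image_graphs, key=lambda i: f_score(query, image_graphs[i]), reverse=True)
    return ranked.index(target_id) + 1

def recall_at_k(ranks, k):
    """Fraction of queries whose target image is ranked in the top k."""
    return sum(r <= k for r in ranks) / len(ranks)

# Toy ground-truth scene graphs, one per image (hypothetical data).
image_graphs = {
    "img_a": {("boy",), ("goal",), ("boy", "in front of", "goal")},
    "img_b": {("kid",), ("ground",), ("kid", "sit on", "ground")},
    "img_c": {("boy",), ("shirt",), ("shirt", "black")},
}
# (parsed query graph, id of the image the query describes)
queries = [
    ({("boy",), ("goal",), ("boy", "in front of", "goal")}, "img_a"),
    ({("boy",), ("shirt",)}, "img_c"),
    ({("kid",), ("ground",), ("boy",), ("goal",), ("boy", "in front of", "goal")}, "img_b"),
]

ranks = [rank_of_target(q, image_graphs, target) for q, target in queries]
recall_at_1 = recall_at_k(ranks, 1)
median_rank = median(ranks)
```

The third query is deliberately noisy, so its target image lands at rank 2 and Recall@1 falls below 1 while the median rank stays at 1.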
slide-65
SLIDE 65

Image Retrieval Dataset Statistics

Source: intersection of YFCC100m and MS COCO

 | Development Set | Test Set
# of Images | 454 | 456
# of Regions | 4953 | 5180

slide-66
SLIDE 66

Image Retrieval Results

slide-67
SLIDE 67

Conclusion

  • Introduction
  • Method
  • Experiments
  • Conclusion

slide-68
SLIDE 68

Conclusion

  • A scene graph is a structured, explainable intermediate representation connecting image and text
  • By taking the edge-centric view of scene graphs, we adapt techniques from dependency parsing to train a scene graph parser end-to-end
  • We outperform previous works by a large margin (F-score 0.4967 vs. 0.4469 for SPICE), and efficacy is evaluated in terms of both F-score similarity and image retrieval experiments
  • Code is released at https://github.com/Yusics/bist-parser/tree/sgparser

slide-69
SLIDE 69

Thank you!
