Text to 3D Scene Generation with Rich Lexical Grounding

slide-1
SLIDE 1

Text to 3D Scene Generation with Rich Lexical Grounding

ACL-IJCNLP July 27, 2015 Beijing, China

“There is a desk and there is a notepad on the desk. There is a pen next to the notepad.”

Angel Chang, Will Monroe, Manolis Savva, Christopher Potts, Christopher D. Manning
Stanford University

slide-2
SLIDE 2

Outline

  • Introduction and prior work
  • Dataset
  • Lexical learning
  • Generation with lexical grounding
  • Evaluation
  • Challenges and conclusion
slide-4
SLIDE 4

The art of 3D scene design

slide-5
SLIDE 5

The art of 3D scene design

Call of Duty: Advanced Warfare [Activision / Sledgehammer Games]

slide-6
SLIDE 6

The art of 3D scene design

Call of Duty: Advanced Warfare [Activision / Sledgehammer Games]
Toy Story 3 [Disney / Pixar]

slide-7
SLIDE 7

The art of 3D scene design

Call of Duty: Advanced Warfare [Activision / Sledgehammer Games]
Toy Story 3 [Disney / Pixar]
“Modern: Plywood, Plastic & Polished Metal” [Homedit Interior Design & Architecture]

slide-8
SLIDE 8

Generating 3D scenes from text

slide-9
SLIDE 9

Generating 3D scenes from text

TOYS’ POV -- An idyllic day care classroom, filled with the happy bustle of four- and five-year-olds, playing with toys -- dinosaurs, a baby doll, a pink Teddy bear, a Ken doll. ... A Tonka Truck races forward, then backs up in a quick 180 arc, revealing a large pink Teddy bear, LOTSO, in its bed. Lotso taps a Tinker Toy cane and the truck bed rises, “dumping” him out. Like Bob Hope stepping off the links in Palm Springs, Lotso exudes an easy, cheerful charisma.

(Screenplay by Michael Arndt)

slide-10
SLIDE 10

Selected prior work

SHRDLU (Winograd, 1972)
WordsEye (Coyne and Sproat, 2001)

slide-11
SLIDE 11

Scene generation pipeline

There is a room with a wooden desk and a black lamp. There is a chair to the right of the desk.

(Chang et al., 2014)

slide-12
SLIDE 12

Scene generation pipeline

There is a room with a wooden desk and a black lamp. There is a chair to the right of the desk.

parsing

(Chang et al., 2014)

slide-13
SLIDE 13

Scene generation pipeline

There is a room with a wooden desk and a black lamp. There is a chair to the right of the desk.

parsing → object selection

(Chang et al., 2014)

slide-14
SLIDE 14

Scene generation pipeline

There is a room with a wooden desk and a black lamp. There is a chair to the right of the desk.

parsing → object selection → layout

(Chang et al., 2014)

slide-15
SLIDE 15

Handling lexical variety

sofa, couch, loveseat
dresser, chest of drawers, cabinet

slide-16
SLIDE 16

Identifying object mentions

Wood table and four wood chairs in the center of the room

slide-17
SLIDE 17

Identifying object mentions

Wood table and four wood chairs in the center of the room

Can we fix this by learning from data?

slide-18
SLIDE 18

Outline

  • Introduction and prior work
  • Dataset
  • Lexical learning
  • Generation with lexical grounding
  • Evaluation
  • Challenges and conclusion
slide-20
SLIDE 20

Dataset

There is a bed and there is a chair next to the bed.

slide-22
SLIDE 22

Structure of a 3D scene

slide-23
SLIDE 23

Structure of a 3D scene

{ 'modelID': '7bdc0aac',
  'position': [118.545639, 97.979499, 3.098599],
  'scale': 0.087807,
  'rotation': -1.088704 }

slide-24
SLIDE 24

Structure of a 3D scene

{ 'modelID': '7bdc0aac',
  'position': [118.545639, 97.979499, 3.098599],
  'scale': 0.087807,
  'rotation': -1.088704 }

Field     Value
name      ellington armchair
id        7bdc0aac
tags      armchair, chair, ellington, haughton, sam, seating, woodmark
category  Chair
wnlemmas  armchair
unit      0.028974
up        [0, 0, 1]
front     [0, -1, 0]

slide-25
SLIDE 25

Structure of a 3D scene

{ 'modelID': '7bdc0aac',
  'position': [118.545639, 97.979499, 3.098599],
  'scale': 0.087807,
  'rotation': -1.088704 }

Field     Value
name      ellington armchair
id        7bdc0aac
tags      armchair, chair, ellington, haughton, sam, seating, woodmark
category  Chair
wnlemmas  armchair
unit      0.028974
up        [0, 0, 1]
front     [0, -1, 0]

Annotations: WordNet (wnlemmas); human-tagged keywords & categories (tags, category); size & orientation suggestions (unit, up, front)
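
The scene record and the model metadata shown on this slide can be mirrored as two small record types joined on the model id. This is only an illustrative sketch: the class and function names (`ModelInfo`, `SceneObject`, `scene_categories`) are hypothetical and not part of the released dataset tooling; the field names and values are copied from the slide.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ModelInfo:
    """Metadata for one 3D model (fields named after the slide's table)."""
    id: str
    name: str
    tags: Tuple[str, ...]
    category: str
    wnlemmas: Tuple[str, ...]
    unit: float
    up: Tuple[int, int, int]
    front: Tuple[int, int, int]


@dataclass(frozen=True)
class SceneObject:
    """One placed object: the model it uses plus its transform."""
    model_id: str
    position: Tuple[float, float, float]
    scale: float
    rotation: float


def scene_categories(scene: List[SceneObject], models: Dict[str, ModelInfo]) -> List[str]:
    """Look up the category of each placed object via its model id."""
    return [models[obj.model_id].category for obj in scene if obj.model_id in models]


armchair = ModelInfo(
    id="7bdc0aac",
    name="ellington armchair",
    tags=("armchair", "chair", "ellington", "haughton", "sam", "seating", "woodmark"),
    category="Chair",
    wnlemmas=("armchair",),
    unit=0.028974,
    up=(0, 0, 1),
    front=(0, -1, 0),
)
models = {armchair.id: armchair}
scene = [SceneObject("7bdc0aac", (118.545639, 97.979499, 3.098599), 0.087807, -1.088704)]
print(scene_categories(scene, models))  # ['Chair']
```

Keeping the per-object transform separate from the per-model metadata means one model entry can be reused by many scenes.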

slide-26
SLIDE 26

Dataset

There is a bed and there is a chair next to the bed.

slide-27
SLIDE 27

Dataset

There is a bed and there is a chair next to the bed.

Floor to ceiling windows on back wall. Green bed with two pillows and black blanket. Lights recessed into right side wall. Light wood flooring. A chair is in the upper right hand corner There is a bed on the side of the room. There is a chair in the corner, next to the windows. I see a bed and a chair. The room has three windows on one wall. There is a red bed in the back of the room. Along side the bed is a side chair that is red and white. This room has a bed with red bedding against the wall. Next to the bed is a chair. there is a antique looking bed with red covers and pillows in a room. next to it is a recliner chair with red padding. also there are windows. there is a bed with five pillows on it, and next to it is a chair There is a bed in the room with two pillows and a small chair near to the right side of it. There is a large grey bed in the bottom right corner of the room. Above the bed is a small black chair.
slide-30
SLIDE 30

Dataset

There is a bed and there is a chair next to the bed.

Floor to ceiling windows on back wall. Green bed with two pillows and black blanket. Lights recessed into right side wall. Light wood flooring. A chair is in the upper right hand corner There is a bed on the side of the room. There is a chair in the corner, next to the windows. I see a bed and a chair. The room has three windows on one wall. There is a red bed in the back of the room. Along side the bed is a side chair that is red and white. This room has a bed with red bedding against the wall. Next to the bed is a chair. there is a antique looking bed with red covers and pillows in a room. next to it is a recliner chair with red padding. also there are windows. there is a bed with five pillows on it, and next to it is a chair There is a bed in the room with two pillows and a small chair near to the right side of it. There is a large grey bed in the bottom right corner of the room. Above the bed is a small black chair.

1128 scenes
4284 scene descriptions
60 seed sentences

slide-31
SLIDE 31

Outline

  • Introduction and prior work
  • Dataset
  • Lexical learning
  • Generation with lexical grounding
  • Evaluation
  • Challenges and conclusion
slide-32
SLIDE 32

Discrimination task

brown room with a refrigerator in the back corner

A B C D E

slide-33
SLIDE 33

D brown room with a refrigerator in the back corner

Discrimination task

slide-34
SLIDE 34

Learning lexical items

  • One-vs.-all logistic regression
  • Features: 1{(language, object)}

      – language: bag-of-words / bag-of-bigrams
      – object: model id / category

Example language features: brown, brown room, room, room with, with, ...
Example object features: room01, room02, 7bdc0aac, cat:Room, cat:Refrigerator, ...
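
The indicator features 1{(language, object)} can be pictured as the cross product of language features (words and bigrams) with object features (model ids and `cat:`-prefixed categories). The sketch below is an assumption about how such features might be built, not the authors' code: the function names are hypothetical and the lowercased whitespace tokenization is a simplification.

```python
from itertools import product
from typing import Iterable, List, Set, Tuple


def language_features(description: str) -> List[str]:
    """Bag of words plus bag of bigrams over lowercased whitespace tokens."""
    tokens = description.lower().split()
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


def object_features(model_ids: Iterable[str], categories: Iterable[str]) -> List[str]:
    """Model ids as-is, categories with a 'cat:' prefix as on the slide."""
    return list(model_ids) + [f"cat:{c}" for c in categories]


def pair_features(description: str, model_ids: Iterable[str],
                  categories: Iterable[str]) -> Set[Tuple[str, str]]:
    """Every (language feature, object feature) pair that fires for this example."""
    return set(product(language_features(description),
                       object_features(model_ids, categories)))


feats = pair_features("brown room with a refrigerator",
                      model_ids=["room01", "room02"],
                      categories=["Room", "Refrigerator"])
print(("brown room", "cat:Room") in feats)  # True
print(len(feats))  # 36 = 9 language features x 4 object features
```

Each pair would then receive one weight in a one-vs.-all logistic regression model; training itself is omitted here.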

slide-35
SLIDE 35

Discrimination results

  • Accuracy (% correct scenes identified)

Object features         Random set
Model ids only          71.5%
Model ids + categories  83.3%
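
One way to picture the discrimination task: score every candidate scene against the description with the learned pair weights, pick the highest-scoring scene, and report the percentage of descriptions matched to the correct scene. The sketch below ranks by the raw linear score, which is an assumption about how the classifier is applied; the weights, scene labels, and function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Weights = Dict[Tuple[str, str], float]


def score(lang: Sequence[str], objs: Sequence[str], theta: Weights) -> float:
    """Sum the weights of all (language, object) pairs that fire."""
    return sum(theta.get((l, o), 0.0) for l in lang for o in objs)


def pick_scene(lang: Sequence[str], candidates: Dict[str, List[str]], theta: Weights) -> str:
    """Return the label of the candidate scene with the highest score."""
    return max(candidates, key=lambda name: score(lang, candidates[name], theta))


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Percentage of descriptions matched to the correct scene."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return 100.0 * correct / len(gold)


# Hypothetical weights, for illustration only.
theta = {
    ("refrigerator", "cat:Refrigerator"): 1.5,
    ("room", "cat:Room"): 0.4,
    ("refrigerator", "cat:Bed"): -0.8,
}
lang = ["brown", "room", "refrigerator"]
candidates = {
    "A": ["cat:Room", "cat:Bed"],
    "D": ["cat:Room", "cat:Refrigerator"],
}
print(pick_scene(lang, candidates, theta))  # D
print(accuracy(["D", "A"], ["D", "B"]))     # 50.0
```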
slide-36
SLIDE 36

Lexical grounding examples

text       category
chair      Chair
couch      Couch
sofa       Couch
fruit      Bowl
bookshelf  Bookcase

slide-37
SLIDE 37

Lexical grounding examples

slide-38
SLIDE 38

Outline

  • Introduction and prior work
  • Dataset
  • Lexical learning
  • Generation with lexical grounding
  • Evaluation
  • Challenges and conclusion
slide-39
SLIDE 39

Generate!

There is a room with a wooden desk and a black lamp. There is a chair to the right of the desk.

?

slide-40
SLIDE 40

Baseline

There is a room with a wooden desk and a black lamp. There is a chair to the right of the desk.

desk room chair wooden desk There is a black a a wooden black lamp

slide-43
SLIDE 43

Baseline

There is a room with a wooden desk and a black lamp. There is a chair to the right of the desk.

group by object, sum weights

Summed weights per object: 2.1, 1.5, 2.3, 2.0, 1.7, 1.8, 1.9

slide-44
SLIDE 44

Baseline

There is a room with a wooden desk and a black lamp. There is a chair to the right of the desk.

choose top k (k = 4)

k = 4 is the average number of objects in human-constructed scenes

Summed weights per object: 2.1, 1.5, 2.3, 2.0, 1.7, 1.8, 1.9

slide-45
SLIDE 45

Baseline

There is a room with a wooden desk and a black lamp. There is a chair to the right of the desk.

choose top k (k = 4)

No relationship enforced between objects!
Combine with rule-based parser?

Summed weights per object: 2.1, 1.5, 2.3, 2.0, 1.7, 1.8, 1.9
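
The baseline steps named on these slides (group by object, sum weights, choose top k) can be sketched as follows. The weights and model ids are invented for illustration, and `baseline_select` is a hypothetical name; only k = 4 comes from the slides.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def baseline_select(lang_feats: Set[str],
                    theta: Dict[Tuple[str, str], float],
                    k: int = 4) -> List[str]:
    """Sum the weights of firing language features per object, keep the top k."""
    totals: Dict[str, float] = defaultdict(float)
    for (lang, obj), weight in theta.items():
        if lang in lang_feats:          # only features present in the description
            totals[obj] += weight       # group by object, sum weights
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [obj for obj, _ in ranked[:k]]


# Hypothetical learned weights keyed by (language feature, model id).
theta = {
    ("desk", "desk01"): 1.2,
    ("wooden desk", "desk01"): 1.1,
    ("lamp", "lamp07"): 2.0,
    ("chair", "chair03"): 1.7,
    ("room", "room02"): 1.5,
    ("desk", "table05"): 0.9,
    ("sofa", "couch01"): 3.0,   # never fires: "sofa" is not in the description
}
lang_feats = {"desk", "wooden desk", "lamp", "chair", "room"}
print(baseline_select(lang_feats, theta))
# ['desk01', 'lamp07', 'chair03', 'room02']
```

As the slide notes, nothing in this selection constrains how the chosen objects relate to each other spatially.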

slide-46
SLIDE 46

Rule-based parsing

There is a room with a wooden desk and a black lamp. There is a chair to the right of the desk.

(Chang et al., 2014)

slide-47
SLIDE 47

Rule-based parsing

There is a room with a wooden desk and a black lamp. There is a chair to the right of the desk.

  • Identify object categories using noun phrases

(Chang et al., 2014)

slide-48
SLIDE 48

Rule-based parsing

There is a room with a wooden desk and a black lamp. There is a chair to the right of the desk.

  • Identify object categories using noun phrases
  • Identify attributes and keywords using modifiers and dependency patterns

(Chang et al., 2014)

slide-49
SLIDE 49

Rule-based parsing

There is a room with a wooden desk and a black lamp. There is a chair to the right of the desk.

  • Identify object categories using noun phrases
  • Identify attributes and keywords using modifiers and dependency patterns
  • Identify spatial relations using dependency patterns

(Chang et al., 2014)

slide-50
SLIDE 50

Rule-based parsing

There is a room with a wooden desk and a black lamp. There is a chair to the right of the desk.

  • Identify object categories using noun phrases
  • Identify attributes and keywords using modifiers and dependency patterns
  • Identify spatial relations using dependency patterns
  • Look up objects from DB using categories and keywords

(Chang et al., 2014)

slide-51
SLIDE 51

Parsing + learned lexical grounding

there is a room with a wooden desk and a black lamp

slide-52
SLIDE 52

Parsing + learned lexical grounding

there is a room with a wooden desk and a black lamp

$$c = \arg\max_{c} \sum_{\phi_i \in \phi(p)} \theta(i, c)$$

Candidate categories: Lamp, Table, Vase

slide-53
SLIDE 53

Parsing + learned lexical grounding

there is a room with a wooden desk and a black lamp

$$c = \arg\max_{c} \sum_{\phi_i \in \phi(p)} \theta(i, c)$$

Lamp   2.304
Table  0.622
Vase  -0.310

slide-54
SLIDE 54

Parsing + learned lexical grounding

there is a room with a wooden desk and a black lamp

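$$c = \arg\max_{c} \sum_{\phi_i \in \phi(p)} \theta(i, c)$$

$$m = \arg\max_{m \in c} \left( \lambda_d \sum_{\phi_i \in \phi(d)} \theta(i, m) + \lambda_x \sum_{\phi_i \in \phi(x)} \theta(i, m) \right)$$

Lamp   2.304
Table  0.622
Vase  -0.310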

slide-56
SLIDE 56

Parsing + learned lexical grounding

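there is a room with a wooden desk and a black lamp

$$c = \arg\max_{c} \sum_{\phi_i \in \phi(p)} \theta(i, c)$$

Lamp   2.304
Table  0.622
Vase  -0.310

$$m = \arg\max_{m \in c} \left( \lambda_d \sum_{\phi_i \in \phi(d)} \theta(i, m) + \lambda_x \sum_{\phi_i \in \phi(x)} \theta(i, m) \right)$$

Scores of candidate models: 0.302, 0.460, -0.021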
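
The two argmax rules on this slide can be read as a two-stage lookup: first pick the category with the largest summed weight, then pick the model within that category using a weighted sum over two feature sets. The slides do not define p, d, or x, so the sketch below treats them simply as three lists of language features. The per-feature weights, model names, and the default λ values of 1.0 are all assumptions; the weights were chosen only so that the category totals resemble the Lamp / Table / Vase scores shown.

```python
from typing import Dict, List, Sequence, Tuple

FeatureWeights = Dict[str, Dict[str, float]]  # label -> {language feature: weight}


def summed(weights: Dict[str, float], feats: Sequence[str]) -> float:
    """Sum of theta(i, label) over the features that fire."""
    return sum(weights.get(f, 0.0) for f in feats)


def select_category(phi_p: Sequence[str],
                    theta_cat: FeatureWeights) -> Tuple[str, Dict[str, float]]:
    """c = argmax_c sum_{phi_i in phi(p)} theta(i, c); also return all scores."""
    scores = {c: summed(w, phi_p) for c, w in theta_cat.items()}
    return max(scores, key=scores.get), scores


def select_model(category: str, models_by_category: Dict[str, List[str]],
                 phi_d: Sequence[str], phi_x: Sequence[str],
                 theta_model: FeatureWeights,
                 lam_d: float = 1.0, lam_x: float = 1.0) -> str:
    """m = argmax_{m in c} (lam_d * sum over phi(d) + lam_x * sum over phi(x))."""
    def model_score(m: str) -> float:
        w = theta_model.get(m, {})
        return lam_d * summed(w, phi_d) + lam_x * summed(w, phi_x)
    return max(models_by_category[category], key=model_score)


# Hypothetical weights for illustration.
theta_cat = {
    "Lamp": {"lamp": 2.0, "black lamp": 0.304},
    "Table": {"lamp": 0.622},
    "Vase": {"lamp": -0.310},
}
theta_model = {
    "lamp_a": {"black": 0.302},
    "lamp_b": {"black": 0.460},
    "lamp_c": {"black": -0.021},
}
category, scores = select_category(["lamp", "black lamp"], theta_cat)
model = select_model(category, {"Lamp": ["lamp_a", "lamp_b", "lamp_c"]},
                     phi_d=["black", "lamp"], phi_x=["black"],
                     theta_model=theta_model)
print(category, model)  # Lamp lamp_b
```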

slide-57
SLIDE 57

Parsing + learned lexical grounding

There is a room with a wooden desk and a black lamp. There is a chair to the right of the desk.
slide-58
SLIDE 58

Scene generation pipeline

There is a room with a wooden desk and a black lamp. There is a chair to the right of the desk.

parsing → object selection → layout

(Chang et al., 2014)

slide-59
SLIDE 59

Generated scene examples

A round table is in the center of the room with four chairs around the table. There is a double window facing west. A door is on the east side of the room.

slide-60
SLIDE 60

Outline

  • Introduction and prior work
  • Dataset
  • Lexical learning
  • Generation with lexical grounding
  • Evaluation
  • Challenges and conclusion
slide-61
SLIDE 61

Evaluation

  • Turkers rated fidelity of generated scenes on a scale of 1 (poor) to 7 (good)
slide-62
SLIDE 62

Evaluation

  • Turkers rated fidelity of generated scenes on a scale of 1 (poor) to 7 (good)
  • Compare scenes generated with four methods against human-built scenes

slide-63
SLIDE 63

Evaluation

In between the doors and the window, there is a black couch with red cushions, two white pillows, and one black pillow. In front of the couch, there is a wooden coffee table with a glass top and two newspapers. Next to the table, facing the couch, is a wooden folding chair.

human-built

slide-64
SLIDE 64

Evaluation

In between the doors and the window, there is a black couch with red cushions, two white pillows, and one black pillow. In front of the couch, there is a wooden coffee table with a glass top and two newspapers. Next to the table, facing the couch, is a wooden folding chair.

slide-65
SLIDE 65

Evaluation

  • Turkers rated fidelity of generated scenes on a scale of 1 (poor) to 7 (good)
  • Compare scenes generated with 4 methods (random, lexical baseline, rule-based parser, combined) against human-built scenes

slide-66
SLIDE 66

Evaluation

  • Turkers rated fidelity of generated scenes on a scale of 1 (poor) to 7 (good)
  • Compare scenes generated with 4 methods (random, lexical baseline, rule-based parser, combined) against human-built scenes
  • Two sets of scene descriptions
      – Seed: seed sentences
      – Mturk: descriptions provided by turkers

slide-67
SLIDE 67

Dataset

There is a bed and there is a chair next to the bed.

Seed

slide-68
SLIDE 68

Dataset

There is a bed and there is a chair next to the bed.

Seed

Simple, no modifiers

slide-69
SLIDE 69

Dataset

There is a bed and there is a chair next to the bed.

Seed

slide-70
SLIDE 70

Dataset

There is a bed and there is a chair next to the bed.

Floor to ceiling windows on back wall. Green bed with two pillows and black blanket. Lights recessed into right side wall. Light wood flooring. A chair is in the upper right hand corner There is a bed on the side of the room. There is a chair in the corner, next to the windows. I see a bed and a chair. The room has three windows on one wall. There is a red bed in the back of the room. Along side the bed is a side chair that is red and white. This room has a bed with red bedding against the wall. Next to the bed is a chair. there is a antique looking bed with red covers and pillows in a room. next to it is a recliner chair with red padding. also there are windows. there is a bed with five pillows on it, and next to it is a chair There is a bed in the room with two pillows and a small chair near to the right side of it. There is a large grey bed in the bottom right corner of the room. Above the bed is a small black chair.

Seed Mturk

slide-71
SLIDE 71

Dataset

There is a bed and there is a chair next to the bed.

Floor to ceiling windows on back wall. Green bed with two pillows and black blanket. Lights recessed into right side wall. Light wood flooring. A chair is in the upper right hand corner There is a bed on the side of the room. There is a chair in the corner, next to the windows. I see a bed and a chair. The room has three windows on one wall. There is a red bed in the back of the room. Along side the bed is a side chair that is red and white. This room has a bed with red bedding against the wall. Next to the bed is a chair. there is a antique looking bed with red covers and pillows in a room. next to it is a recliner chair with red padding. also there are windows. there is a bed with five pillows on it, and next to it is a chair There is a bed in the room with two pillows and a small chair near to the right side of it. There is a large grey bed in the bottom right corner of the room. Above the bed is a small black chair.

Seed Mturk

More complex, varied language

slide-72
SLIDE 72

Evaluation Results

Method             Simple
Random             2.03
Lexical baseline   3.51
Rule-based parser  5.44
Combined           5.23
Human-built        6.06

Turkers rated fidelity of generated scenes on a scale of 1 (poor) to 7 (good)

168 participants, average 4.2 ratings per scene-description pair

slide-73
SLIDE 73

Evaluation Results

Method             Seed
Random             2.03
Lexical baseline   3.51
Rule-based parser  5.44
Combined           5.23
Human-built        6.06

Turkers rated fidelity of generated scenes on a scale of 1 (poor) to 7 (good)

168 participants, average 4.2 ratings per scene-description pair

slide-76
SLIDE 76

Evaluation Results

Method             Seed  Mturk
Random             2.03  1.68
Lexical baseline   3.51  2.61
Rule-based parser  5.44  3.15
Combined           5.23  3.73
Human-built        6.06  5.87

Turkers rated fidelity of generated scenes on a scale of 1 (poor) to 7 (good)

168 participants, average 4.2 ratings per scene-description pair
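
A small worked comparison of the ratings on this slide makes the trade-off concrete: the combined system is slightly below the rule-based parser on the seed sentences and above it on the Mturk descriptions. The numbers are copied from the slide; the helper name `delta` is hypothetical.

```python
# Mean fidelity ratings (1 = poor, 7 = good) as (Seed, Mturk), from the slide.
ratings = {
    "Random": (2.03, 1.68),
    "Lexical baseline": (3.51, 2.61),
    "Rule-based parser": (5.44, 3.15),
    "Combined": (5.23, 3.73),
    "Human-built": (6.06, 5.87),
}
SEED, MTURK = 0, 1


def delta(a: str, b: str, column: int) -> float:
    """Rating of method a minus rating of method b, rounded to 2 decimals."""
    return round(ratings[a][column] - ratings[b][column], 2)


seed_delta = delta("Combined", "Rule-based parser", SEED)
mturk_delta = delta("Combined", "Rule-based parser", MTURK)
human_gap = delta("Human-built", "Combined", MTURK)
print(seed_delta, mturk_delta, human_gap)  # -0.21 0.58 2.14
```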

slide-80
SLIDE 80

Outline

  • Introduction and prior work
  • Dataset
  • Lexical learning
  • Generation with lexical grounding
  • Evaluation
  • Challenges and conclusion
slide-81
SLIDE 81

Evaluation Results

Method             Seed  Mturk
Random             2.03  1.68
Lexical baseline   3.51  2.61
Rule-based parser  5.44  3.15
Combined           5.23  3.73
Human-built        6.06  5.87

Turkers rated fidelity of generated scenes on a scale of 1 (poor) to 7 (good)

168 participants, average 4.2 ratings per scene-description pair

slide-82
SLIDE 82

Generated scene examples

In between the doors and the window, there is a black couch with red cushions, two white pillows, and one black pillow. In front of the couch, there is a wooden coffee table with a glass top and two newspapers. Next to the table, facing the couch, is a wooden folding chair.

slide-87
SLIDE 87

Generated scene examples

In between the doors and the window, there is a black couch with red cushions, two white pillows, and one black pillow. In front of the couch, there is a wooden coffee table with a glass top and two newspapers. Next to the table, facing the couch, is a wooden folding chair.

?

slide-88
SLIDE 88

Remaining Challenges

  • Grounding of spatial relations
  • Coreference

Examples:
“There in the middle is a table. On the table is a cup.”
“facing the couch”

slide-89
SLIDE 89

Summary

  • Learning of lexical grounding to handle linguistic variation in scene description

slide-90
SLIDE 90

Summary

  • Learning of lexical grounding to handle linguistic variation in scene description
  • Combined rule-based parser and learned lexical groundings for scene generation

slide-91
SLIDE 91

Summary

  • Learning of lexical grounding to handle linguistic variation in scene description
  • Combined rule-based parser and learned lexical groundings for scene generation
  • Evaluation demonstrating improved text to scene generation

slide-92
SLIDE 92

Thank you!

The dataset is publicly available at http://nlp.stanford.edu/data/text2scene.shtml