Visually Grounded Neural Syntax Acquisition * * Haoyue Shi - PowerPoint PPT Presentation

Visually Grounded Neural Syntax Acquisition * * Haoyue Shi Jiayuan Mao Kevin Gimpel Karen Livescu July 29 th , 2019 @ACL

When we were c hildren… A cat is on the lawn.

When we were c hildren… A cat is on the lawn. A cat sleeps outside.

When we were c hildren… A cat is on the lawn. A cat, as a whole, means something concrete. A cat sleeps outside.

When we were c hildren… A cat is on the lawn. A cat is staring at you. A cat plays with a ball. A cat, as a whole, means something concrete. A cat sleeps outside. A cat is on the ground. There is a cat sleeping on the ground.

When we were c hildren… A cat is on the lawn. A cat is staring at you. A cat plays with a ball. A cat, as a whole, means something concrete. A cat sleeps outside. A cat is on the ground. There is a cat sleeping on the ground. A cat, as a whole, functions as a single unit in sentences.

When we were c hildren… A cat is on the lawn. A cat is staring at you. A cat plays with a ball. A cat was chasing a mouse. A dog was chasing a cat . A cat was chased by a dog. … A cat sleeps outside. A cat is on the ground. There is a cat sleeping on the ground. A cat, as a whole, functions as a single unit in sentences.

Problem Definition • Given a large set of parallel image-text data (e.g., MSCOCO), can we generate linguistically plausible structure for the text? Figure credit: Ding et al. (2018)

Problem Definition • Given a large set of parallel image-text data (e.g., MSCOCO), can we generate linguistically plausible structure for the text? A cat is on the lawn

Visually Grounded Neural Syntax Learner • Concrete spans are more likely to be constituents. Caption : “A cat is on the lawn” Image Joint Embedding Space Parser Constituency Parse Tree Text 𝒅 3 Encoder 𝒅 1 𝒅 1 : a cat 𝒅 2 𝒅 2 : the lawn Image Encoder: 𝒅 3 : on the lawn ResNet 101 (He et al., 2015) … Estimated Concreteness as Scores

Visually Grounded Neural Syntax Learner • Concrete spans are more likely to be constituents. Caption : “A cat is on the lawn” Parser Constituency Parse Tree 𝒅 1 : a cat 𝒅 2 : the lawn 𝒅 3 : on the lawn …

Greedy Bottom-Up Parser a cat is on the lawn

Greedy Bottom-Up Parser Compute score 𝐰 𝑏 𝐺𝐺𝑂 = 4.5 4.5 𝐰 𝑑𝑏𝑢 a cat is on the lawn

Greedy Bottom-Up Parser Compute score 𝐰 𝑑𝑏𝑢 𝐺𝐺𝑂 = 0.5 4.5 0.5 𝐰 𝑗𝑡 a cat is on the lawn

Greedy Bottom-Up Parser Compute score 𝐰 𝑗𝑡 𝐺𝐺𝑂 = 1 4.5 0.5 1 𝐰 𝑝𝑜 a cat is on the lawn

Greedy Bottom-Up Parser Compute score 𝐰 𝑝𝑜 𝐺𝐺𝑂 = 1 4.5 0.5 1 1 𝐰 𝑢ℎ𝑓 a cat is on the lawn

Greedy Bottom-Up Parser Compute score 𝐰 𝑢ℎ𝑓 𝐺𝐺𝑂 = 3 4.5 0.5 1 1 3 𝐰 𝑚𝑏𝑥𝑜 a cat is on the lawn

Greedy Bottom-Up Parser Normalized to a probability distribution 0.45 0.05 0.1 0.1 0.3 a cat is on the lawn

Greedy Bottom-Up Parser 0.45 0.05 0.1 0.1 0.3 Sample a pair to combine (training) Greedily combine (inference) a cat is on the lawn

Greedy Bottom-Up Parser Textual representation: Normalized sum of children (a cat) 𝐰 𝑏 + 𝐰 𝑑𝑏𝑢 𝐰 𝑏 𝑑𝑏𝑢 = 𝐰 𝑏 + 𝐰 𝑑𝑏𝑢 0.45 0.05 0.1 0.1 0.3 2 a cat is on the lawn

Greedy Bottom-Up Parser Textual representation: Normalized sum of children (a cat) is on the lawn 𝐰 𝑏 + 𝐰 𝑑𝑏𝑢 𝐰 𝑏 𝑑𝑏𝑢 = 𝐰 𝑏 + 𝐰 𝑑𝑏𝑢 0.45 0.05 0.1 0.1 0.3 2 a cat is on the lawn

Greedy Bottom-Up Parser Compute probability 0.25 0.15 0.15 0.45 (a cat) is on the lawn 0.45 0.05 0.1 0.1 0.3 a cat is on the lawn

Greedy Bottom-Up Parser Combine (a cat) is on (the lawn) 0.25 0.15 0.15 0.45 (a cat) is on the lawn 0.45 0.05 0.1 0.1 0.3 a cat is on the lawn

Greedy Bottom-Up Parser Finished! ((a cat) (is (on (the lawn)))) … (a cat) is on (the lawn) 0.25 0.15 0.15 0.45 (a cat) is on the lawn 0.45 0.05 0.1 0.1 0.3 a cat is on the lawn

Visually Grounded Neural Syntax Learner • Concrete spans are more likely to be constituents. Caption : “A cat is on the lawn” Parser Constituency Parse Tree 𝒅 1 : a cat 𝒅 2 : the lawn 𝒅 3 : on the lawn …

Visually Grounded Neural Syntax Learner • Concrete spans are more likely to be constituents. Caption : “A cat is on the lawn” Image Parser Constituency Parse Tree 𝒅 1 : a cat 𝒅 2 : the lawn 𝒅 3 : on the lawn …

Visually Grounded Neural Syntax Learner • Concrete spans are more likely to be constituents. Caption : “A cat is on the lawn” Image Joint Embedding Space Parser Constituency Parse Tree Text 𝒅 3 Encoder 𝒅 1 𝒅 1 : a cat 𝒅 2 𝒅 2 : the lawn Image Encoder: 𝒅 3 : on the lawn ResNet 101 (He et al., 2015) …

The Joint Embedding Space Hinge-based triplet loss between images and captions for visual semantic embeddings (VSE; Kiros et al., 2015): 𝑡𝑗𝑛 𝑗 ′ , 𝑑 − 𝑡𝑗𝑛 𝑗, 𝑑 + 𝜀 + + 𝑡𝑗𝑛 𝑗, 𝑑′ − 𝑡𝑗𝑛 𝑗, 𝑑 + 𝜀 + ℒ 𝑗, 𝑑 = ෍ 𝑗 ′ ,𝑑 ′ ≠(𝑗,𝑑) ⋅ + = max ⋅, 0 𝑡𝑗𝑛 ⋅,⋅ = cos(⋅,⋅)

The Joint Embedding Space Hinge-based triplet loss between images and captions for visual semantic embeddings (VSE; Kiros et al., 2015): √ A cat is on the lawn. 𝑡𝑗𝑛 𝑗 ′ , 𝑑 − 𝑡𝑗𝑛 𝑗, 𝑑 + 𝜀 + + 𝑡𝑗𝑛 𝑗, 𝑑′ − 𝑡𝑗𝑛 𝑗, 𝑑 + 𝜀 + ℒ 𝑗, 𝑑 = ෍ 𝑗 ′ ,𝑑 ′ ≠(𝑗,𝑑) ⋅ + = max ⋅, 0 𝑡𝑗𝑛 ⋅,⋅ = cos(⋅,⋅)

The Joint Embedding Space Hinge-based triplet loss between images and captions for visual semantic embeddings (VSE; Kiros et al., 2015): √ A cat is on the lawn. 𝑡𝑗𝑛 𝑗 ′ , 𝑑 − 𝑡𝑗𝑛 𝑗, 𝑑 + 𝜀 + + 𝑡𝑗𝑛 𝑗, 𝑑′ − 𝑡𝑗𝑛 𝑗, 𝑑 + 𝜀 + ℒ 𝑗, 𝑑 = ෍ 𝑗 ′ ,𝑑 ′ ≠(𝑗,𝑑) × ⋅ + = max ⋅, 0 A cat is on the lawn. 𝑡𝑗𝑛 ⋅,⋅ = cos(⋅,⋅)

The Joint Embedding Space Hinge-based triplet loss between images and captions for visual semantic embeddings (VSE; Kiros et al., 2015): × √ A cat is on the lawn. A dog is on the lawn. 𝑡𝑗𝑛 𝑗 ′ , 𝑑 − 𝑡𝑗𝑛 𝑗, 𝑑 + 𝜀 + + 𝑡𝑗𝑛 𝑗, 𝑑′ − 𝑡𝑗𝑛 𝑗, 𝑑 + 𝜀 + ℒ 𝑗, 𝑑 = ෍ 𝑗 ′ ,𝑑 ′ ≠(𝑗,𝑑) × ⋅ + = max ⋅, 0 A cat is on the lawn. 𝑡𝑗𝑛 ⋅,⋅ = cos(⋅,⋅)

Concreteness Estimation in the Joint Embedding Space Hinge-based triplet loss between images and captions constituents for visual semantic embeddings: 𝑡𝑗𝑛 𝑗 ′ , 𝑑 − 𝑡𝑗𝑛 𝑗, 𝑑 + 𝜀 + + 𝑡𝑗𝑛 𝑗, 𝑑′ − 𝑡𝑗𝑛 𝑗, 𝑑 + 𝜀 + ℒ 𝑗, 𝑑 = ෍ 𝑗 ′ ,𝑑 ′ ≠(𝑗,𝑑) ⋅ + = max ⋅, 0 𝑡𝑗𝑛 ⋅,⋅ = cos(⋅,⋅)

Concreteness Estimation in the Joint Embedding Space a cat √ Hinge-based triplet loss between images and captions constituents for visual semantic embeddings: 𝑡𝑗𝑛 𝑗 ′ , 𝑑 − 𝑡𝑗𝑛 𝑗, 𝑑 + 𝜀 + + 𝑡𝑗𝑛 𝑗, 𝑑′ − 𝑡𝑗𝑛 𝑗, 𝑑 + 𝜀 + ℒ 𝑗, 𝑑 = ෍ 𝑗 ′ ,𝑑 ′ ≠(𝑗,𝑑) ⋅ + = max ⋅, 0 𝑡𝑗𝑛 ⋅,⋅ = cos(⋅,⋅)

Concreteness Estimation in the Joint Embedding Space √ a cat Hinge-based triplet loss between images and captions constituents for visual semantic embeddings: on the ? 𝑡𝑗𝑛 𝑗 ′ , 𝑑 − 𝑡𝑗𝑛 𝑗, 𝑑 + 𝜀 + + 𝑡𝑗𝑛 𝑗, 𝑑′ − 𝑡𝑗𝑛 𝑗, 𝑑 + 𝜀 + ℒ 𝑗, 𝑑 = ෍ 𝑗 ′ ,𝑑 ′ ≠(𝑗,𝑑) ⋅ + = max ⋅, 0 𝑡𝑗𝑛 ⋅,⋅ = cos(⋅,⋅)

Concreteness Estimation in the Joint Embedding Space √ a cat Hinge-based triplet loss between images and captions constituents for visual semantic embeddings: ? on the 𝑡𝑗𝑛 𝑗 ′ , 𝑑 − 𝑡𝑗𝑛 𝑗, 𝑑 + 𝜀 + + 𝑡𝑗𝑛 𝑗, 𝑑′ − 𝑡𝑗𝑛 𝑗, 𝑑 + 𝜀 + ℒ 𝑗, 𝑑 = ෍ 𝑗 ′ ,𝑑 ′ ≠(𝑗,𝑑) Abstractness: local hinge loss between constituents and images. 𝑏𝑐𝑡𝑢𝑠𝑏𝑑𝑢 𝑑; 𝑗 = ℒ(𝑗, 𝑑) ⋅ + = max ⋅, 0 𝑡𝑗𝑛 ⋅,⋅ = cos(⋅,⋅)

Visually Grounded Neural Syntax Acquisition * * Haoyue Shi - PowerPoint PPT Presentation

Visually Grounded Neural Syntax Acquisition * * Haoyue Shi Jiayuan Mao Kevin Gimpel Karen Livescu July 29 th , 2019 @ACL When we were c hildren A cat is on the lawn. When we were c hildren A cat is on the lawn. A cat sleeps outside.

Visually Grounded Meaning Representation Qi Huang Ryan Rock Outline 1. Motivation 2.

Response-based Learning for Grounded Grounded SMT Riezler, Machine Translation Simianer, Haas

Lear Learning M ning Multi ulti-Moda Modal l Grounded Lingu Grounded Linguistic istic

Outline Introduction Definition History Features When should Grounded Theory be used? Types

TAKE TAKE GROUNDED GROUNDED DECISIONS DECISIONS Farm Modelling Statistic based, gamification

Representations of language in a model of visually grounded speech signal Grzegorz Chrupaa

Visually grounded learning of keyword prediction from untranscribed speech Interspeech, August

Visually Grounded, Task-oriented Dialogue Elia Bruni Outline Language grounding Visual dialogue

Visually grounded cross-lingual keyword spotting in speech SLTU, August 2018 Herman Kamper 1 and

Visually grounded cross-lingual keyword spotting in speech SLTU, August 2018 Herman Kamper 1 and

Blind/Visually Impaired Silvia Ludena Veronica Sarabia Katie Stoddard Huong Vo Blind/Visually

Encoding of Phonology in an RNN model of Grounded Speech Afra Alishahi, Marie Barking, Grzegorz

Every Little Step Counts An Effective Model for Culturally Grounded Pediatric Diabetes Prevention

Dialog as a Vehicle for Lifelong Learning of Grounded Language Understanding Systems Aishwarya

What are the research questions for this course? How is the knowledge (in our minds) grounded

Encoding of Phonology in an RNN model of Grounded Speech Afra Alishahi, Marie Barking, Grzegorz

Secrets of the Goddess of the Dawn Aurora Myth & Legend, with Melissa F. Kaelin "The

Add picture Swarm here Bret Fisher DevOps Consultant Docker Captain, Dell {code} Catalyst

Filling the Gap Writing Nuclear History of Japan from Different Angles Masakatsu Ota

How to Win New Accounts! Presented by: Neil Grant MSc. C.Eng. MIMechE Managing Director Strategic

Paws4medford.org www.paws4medford.org (339) 674-0085 Paws4medford.org www.paws4medford.org (339)

Public Health & Healthcare Preparedness Public Health & Healthcare Preparedness (PHHP)

Engagement events November 2013 1 Update The challenge Objectives Engagement &

The efficacy of solution-focused brief therapy for distress among parents of pediatric congenital

Visually Grounded Neural Syntax Acquisition * * Haoyue Shi - PowerPoint PPT Presentation

Visually Grounded Neural Syntax Acquisition * * Haoyue Shi Jiayuan Mao Kevin Gimpel Karen Livescu July 29 th , 2019 @ACL When we were c hildren A cat is on the lawn. When we were c hildren A cat is on the lawn. A cat sleeps outside.

Visually Grounded Meaning Representation Qi Huang Ryan Rock Outline 1. Motivation 2.

Response-based Learning for Grounded Grounded SMT Riezler, Machine Translation Simianer, Haas

Lear Learning M ning Multi ulti-Moda Modal l Grounded Lingu Grounded Linguistic istic

Outline Introduction Definition History Features When should Grounded Theory be used? Types

TAKE TAKE GROUNDED GROUNDED DECISIONS DECISIONS Farm Modelling Statistic based, gamification

Representations of language in a model of visually grounded speech signal Grzegorz Chrupaa

Visually grounded learning of keyword prediction from untranscribed speech Interspeech, August

Visually Grounded, Task-oriented Dialogue Elia Bruni Outline Language grounding Visual dialogue

Visually grounded cross-lingual keyword spotting in speech SLTU, August 2018 Herman Kamper 1 and

Visually grounded cross-lingual keyword spotting in speech SLTU, August 2018 Herman Kamper 1 and

Blind/Visually Impaired Silvia Ludena Veronica Sarabia Katie Stoddard Huong Vo Blind/Visually

Encoding of Phonology in an RNN model of Grounded Speech Afra Alishahi, Marie Barking, Grzegorz

Every Little Step Counts An Effective Model for Culturally Grounded Pediatric Diabetes Prevention

Dialog as a Vehicle for Lifelong Learning of Grounded Language Understanding Systems Aishwarya

What are the research questions for this course? How is the knowledge (in our minds) grounded

Encoding of Phonology in an RNN model of Grounded Speech Afra Alishahi, Marie Barking, Grzegorz

Secrets of the Goddess of the Dawn Aurora Myth &amp; Legend, with Melissa F. Kaelin &quot;The

Add picture Swarm here Bret Fisher DevOps Consultant Docker Captain, Dell {code} Catalyst

Filling the Gap Writing Nuclear History of Japan from Different Angles Masakatsu Ota

How to Win New Accounts! Presented by: Neil Grant MSc. C.Eng. MIMechE Managing Director Strategic

Paws4medford.org www.paws4medford.org (339) 674-0085 Paws4medford.org www.paws4medford.org (339)

Public Health &amp; Healthcare Preparedness Public Health &amp; Healthcare Preparedness (PHHP)

Engagement events November 2013 1 Update The challenge Objectives Engagement &amp;

The efficacy of solution-focused brief therapy for distress among parents of pediatric congenital

Secrets of the Goddess of the Dawn Aurora Myth & Legend, with Melissa F. Kaelin "The

Public Health & Healthcare Preparedness Public Health & Healthcare Preparedness (PHHP)

Engagement events November 2013 1 Update The challenge Objectives Engagement &