Sreyasi Nag Chowdhury, Niket Tandon, Gerhard Weikum
Max Planck Institute for Informatics, Saarbrücken, Germany
AKBC 2016, 17/06/2016
User Query: Concrete vs. Abstract
- Concrete query: bicycle in street
- Abstract query: environment friendly traffic
- Example documents
- “Wow! Double-decker buses still run!” Visual objects: bicycle, bus, car (Text-only vs. Text + visual)
- “Biking by the river” Visual objects: train, piano (Text-only vs. Text + visual)
- “Riding for a cause.” Visual objects: person, bicycle. CSK: (riding bicycle, be, environment friendly) (Text/visual vs. Text + visual + CSK)
Our contribution
- CSK: Where do we get it from?
- CSK: How do we use it?
- CSK: How to combine noisy signals?
- CSK: Does it help?
CSK: WHERE DO WE GET IT FROM?
- Existing CSK knowledge bases: WordNet, ConceptNet, WebChild, Knowlywood
- Our corpus: Wiki articles from domain ‘tourism’
- Domain-specific ReVerb triples
- Pruned by Jaccard Similarity
- ~22,000 CSK triples
- (“tourism”, “be travel for”, “recreation, leisure, family, business purposes”)
- (“people”, “fall in”, “love”)
- (“the bloody hell”, “be”, “you”)
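The slide does not say what the Jaccard similarity is computed between. As one plausible reading, the sketch below assumes that near-duplicate triples are pruned by comparing their word sets. The helper names (`tokens`, `jaccard`, `prune_near_duplicates`) and the threshold are illustrative assumptions, not taken from the talk.

```python
def tokens(triple):
    """Lowercased word set over subject, predicate and object."""
    return {w for part in triple for w in part.lower().split()}

def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|, defined here as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def prune_near_duplicates(triples, threshold=0.8):
    """Keep a triple only if it is not too similar to one already kept.

    Assumption: 'pruning' means dropping near-duplicates; the threshold
    value is hypothetical.
    """
    kept = []
    for t in triples:
        if all(jaccard(tokens(t), tokens(k)) < threshold for k in kept):
            kept.append(t)
    return kept

triples = [
    ("tourists", "carry", "backpack"),
    ("tourists", "carry", "a backpack"),   # near-duplicate of the first
    ("backpack", "is a type of", "bag"),
]
pruned = prune_near_duplicates(triples, threshold=0.7)
```

With a threshold of 0.7 the second triple is dropped, because it shares three of four distinct words with the first.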
CSK: HOW DO WE USE IT?
- Query string: travel with backpack
- CSK to expand query
- t1: (tourists, use, travel maps)
- t2: (tourists, carry, backpack)
- t3: (backpack, is a type of, bag)
- Document x with features
- Textual: “A tourist reading a map by the road”
- Visual: person, bag, bottle, bus
- Text-only systems vs. Text + visual + CSK systems
- CSK bridges the vocabulary gap between query and document
- CSK establishes relations between concepts
- CSK diminishes noise from modalities (ensemble effect)
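The query-expansion idea on this slide can be sketched roughly as follows, using the slide's own query, triples and document. The matching rule (exact word overlap, a small hand-picked stopword list, no stemming) and all function names are assumptions made for illustration; the talk does not specify them.

```python
# Hypothetical stopword list, chosen only for this example
STOPWORDS = {"a", "an", "the", "of", "is", "with", "by", "type"}

def content_words(text):
    """Lowercased words with surrounding punctuation and stopwords removed."""
    return {w.strip('.,"“”').lower() for w in text.split()} - STOPWORDS - {""}

def expand_query(query, triples):
    """Add the words of every triple that shares a word with the query."""
    q = content_words(query)
    expanded = set(q)
    for triple in triples:
        words = set().union(*(content_words(part) for part in triple))
        if words & q:
            expanded |= words
    return expanded

triples = [
    ("tourists", "use", "travel maps"),
    ("tourists", "carry", "backpack"),
    ("backpack", "is a type of", "bag"),
]
query = "travel with backpack"
doc_text = "A tourist reading a map by the road"
doc_visual = {"person", "bag", "bottle", "bus"}

expanded = expand_query(query, triples)
# Text-only matching: query words against the caption
text_only_hits = content_words(query) & content_words(doc_text)
# CSK-expanded matching: expanded query against caption plus visual labels
csk_hits = expanded & (content_words(doc_text) | doc_visual)
```

In this toy setup the plain query shares no word with the caption, while the expanded query reaches the visual label "bag" through t3. Without stemming, "tourists" and "maps" still miss "tourist" and "map".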
CSK: HOW DO WE USE IT?
- Document x
- Textual features xx
- “A tour group is standing on the grass with ruins in the background.”
- “Group of people standing in front of a stone structure.”
- Visual features xv: backpack, bag, container; person, causal agent, organism
- Query: “group excursion”
- Query expansion (CSK features)
- (an excursion, be trip by, a group of people)
- (organized excursions, book through, a tour company)
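The visual features on this slide grow from the detected labels (backpack, person) to more general terms, which looks like a hypernym expansion. A minimal sketch, assuming a hand-written lookup table: the `HYPERNYMS` dictionary and `expand_labels` are hypothetical, and the table only repeats the terms shown on the slide (with "causal agent" as the WordNet spelling).

```python
# Hypothetical hand-written table; values are the terms listed on the slide
HYPERNYMS = {
    "backpack": ["bag", "container"],
    "person": ["causal agent", "organism"],
}

def expand_labels(detected):
    """Return each detected label followed by its listed hypernyms, without duplicates."""
    out = []
    for label in detected:
        for term in [label, *HYPERNYMS.get(label, [])]:
            if term not in out:
                out.append(term)
    return out

visual_features = expand_labels(["backpack", "person"])
```

A real system would more likely query a lexical resource than a fixed table; the table keeps the example self-contained.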
CSK: HOW TO COMBINE NOISY SIGNALS?
- Mixture LM: (formula not preserved)
- Commonsense-aware LM: (formula not preserved)
- Smoothed LM: (formula not preserved)
- Basic LM: (formula not preserved)
- Callouts on the formulas
- CSK triple
- Probabilities based on word-wise overlaps
- Background corpus: co-occurring Flickr tags
- Textual and visual features
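The formulas for the four language models did not survive in this transcript, so the following is not the talk's model. It is only a generic sketch of the ingredients the callouts name: a word-overlap score between a document feature and a CSK triple, linear interpolation with a background score, and a weighted mixture over textual and visual features. Every functional form, parameter (`alpha`, `weights`) and name here is an assumption.

```python
def words(text):
    """Lowercased whitespace tokens, treating commas as separators."""
    return text.lower().replace(",", " ").split()

def overlap_prob(feature, triple_text):
    """Assumed basic score: fraction of the feature's words found in the triple."""
    f, t = words(feature), set(words(triple_text))
    return sum(w in t for w in f) / len(f) if f else 0.0

def smoothed_prob(feature, triple_text, background, alpha=0.8):
    """Assumed smoothing: linear interpolation with a background score."""
    return (alpha * overlap_prob(feature, triple_text)
            + (1 - alpha) * background.get(feature, 0.0))

def mixture_score(features, triples, background, weights):
    """Assumed mixture: weighted sum over features of the best smoothed score.

    `triples` must be non-empty.
    """
    return sum(
        weights[kind] * max(smoothed_prob(f, t, background) for t in triples)
        for kind, f in features
    )

triples = ["an excursion be trip by a group of people"]
features = [("text", "group of people standing"), ("visual", "person")]
background = {"person": 0.5}          # hypothetical background score
weights = {"text": 0.6, "visual": 0.4}  # hypothetical mixture weights
score = mixture_score(features, triples, background, weights)
```

In this example the textual feature overlaps the triple in three of four words, while the visual label "person" contributes only through the background term.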
CSK: DOES IT HELP?
- Image Dataset
- Flickr30k
- MS COCO captioned dataset
- Pascal Sentence Dataset
- SBU captioned dataset
- ~50,000 images with captions
- Example captions (slide labels: social media post, blog post)
- “A group of tourists is crossing a bridge that connects a walking path to a trail of nature.”
- “Many people cross a very tall footbridge with a tree-covered hill in the background.”
- “This shows a group of people walking over an arched red bridge.”
- “People cross a large bridge to get over the body of water.”
- “People walking over a white and red bridge over a pond.”
- “Boat trip to see the mythical pink dolphins... this is John checking in with the office for that day.”
CSK: DOES IT HELP?
- Baselines: Text-only and Text + Visual search approaches
- Evaluation metric: Average Precision @ 10
- Example queries:
- Concrete: ball park, bridge road, table home, bicycle road
- Abstract: diesel transport, housing town
- Mixed: old clock, backpack travel, boat tour
- Results (chart labels, in slide order)
- Text-only: 47%
- Text + Visual: 64%
- Text + Visual + CSK (Know2Look): 85%
- Callout on the chart: Co-occurring Flickr tags
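The slide names the metric but not its exact definition, and several variants of average precision at a cutoff are in use. The sketch below assumes one common variant: precision is taken at each rank in the top k where a relevant result appears, then averaged over those ranks. The function names and the toy relevance judgments are illustrative only.

```python
def average_precision_at_k(relevance, k=10):
    """relevance: list of 0/1 judgments in ranked order.

    Assumed variant: mean of precision@i over the ranks i <= k that hold
    a relevant result; 0.0 when the top k contains none.
    """
    hits, precisions = 0, []
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_over_queries(runs, k=10):
    """Average the per-query scores over a non-empty list of ranked runs."""
    return sum(average_precision_at_k(r, k) for r in runs) / len(runs)

# Toy judgments for two hypothetical queries
run_a = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
run_b = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
ap_a = average_precision_at_k(run_a)
overall = mean_over_queries([run_a, run_b])
```

For `run_a` the relevant results sit at ranks 1 and 3, giving precisions 1 and 2/3, so the score is 5/6.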
CSK: DOES IT HELP?
- Query: “group excursion”
- Text-only: xxj: “A small excursion boat anchored on the beach at the resort in Mexico.”
- Text + Visual: xvj: lunar excursion module, conveyance
- Text + Visual + CSK (Know2Look): xxj: “A group of people riding camels.” yk: (an excursion, be trip by, a group of people)
- Noisy OpenIE triples capture commonsense knowledge
- Noisy textual cues + noisy visual object detection + noisy commonsense knowledge → ensemble effect → better results for multimodal document retrieval
- CSK acts as a bridge between text and vision
- Do word co-occurrences or word embeddings provide similar results?
- Does structured commonsense knowledge improve retrieval?