Computer vision technologies for visual knowledge enrichment
Miriam Redi, Research Scientist
Images are powerful tools for communication and knowledge sharing
Wikimedia spaces are still missing many images!
95% of items in …
Around 1/10 articles in …
One source of Visual Knowledge
Computer Vision
- Computer vision can help contributors
○ Organize images
○ Describe images
○ Search and find images
How do we bridge the visual knowledge gap?
How do we fill the gap responsibly?
_Knowledge equity: As a social movement, we will focus our efforts on the knowledge and communities that have been left out by structures of power and privilege. We will welcome people from every background to build strong and diverse communities. We will break down the social, political, and technical barriers preventing people from accessing and contributing to free knowledge._
RESPONSIBLE* Computer Vision
* unbiased (against stereotypes), representative (multicultural), empowering (vs exclusive)
Unbiased Machine Vision
Bias: learning from “human” data
VERY Large
Biased Machine Vision
Object detection
Interpretable Vision Algorithms
Automatically Checking for Stereotypes in Datasets
Miriam Redi, Nikhil Rasiwasia, Alejandro Jaimes, Gaurav Aggarwal. The Beauty of Capturing Faces: Rating the Quality of Digital Portraits. Face and Gesture Recognition, FG 2015, Ljubljana, Slovenia.
Unbiased Machine Vision
Computational Aesthetics: detecting beautiful images
Computational Portrait Aesthetics: detecting beautiful images of faces (NOT beautiful faces)
Unbiased Machine Vision
Computational Portrait Aesthetics
Data: hundreds of thousands of photographs annotated in terms of quality by photographers
Interpretable Visual Feature extraction
- Inspired by Portrait Photography
- Expanded with demographic features
Example feature: EYE SHARPNESS
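One way to make an interpretable feature such as eye sharpness concrete is the variance of a discrete Laplacian over the eye region. The sketch below is an illustrative assumption, not the feature definition used in the talk or in the FG 2015 paper; the function name `laplacian_sharpness` and the toy patches are hypothetical.

```python
from statistics import pvariance

def laplacian_sharpness(patch):
    """Variance of a 4-neighbour Laplacian over a 2-D grayscale patch.

    `patch` is a list of equal-length rows of pixel intensities.
    Higher values mean more fine detail, i.e. a sharper patch.
    """
    rows, cols = len(patch), len(patch[0])
    if rows < 3 or cols < 3:
        raise ValueError("expected a patch of at least 3x3 pixels")
    responses = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # up + down + left + right - 4 * centre
            responses.append(
                patch[r - 1][c] + patch[r + 1][c]
                + patch[r][c - 1] + patch[r][c + 1]
                - 4 * patch[r][c]
            )
    return float(pvariance(responses))

# Toy patches standing in for a cropped eye region (hypothetical data).
flat_patch = [[128] * 8 for _ in range(8)]  # uniform gray: no detail
checker_patch = [[255 * ((r + c) % 2) for c in range(8)] for r in range(8)]  # strong edges
```

A feature like this stays interpretable because a photographer can read it directly: the uniform patch scores zero, the high-contrast patch scores high.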
Unbiased Machine Vision
Computational Portrait Aesthetics
Correlation between quality scores and individual features: the dataset is NOT biased! :)
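A check of this kind can be sketched as a per-feature correlation with the quality scores. The slide does not say which statistic was used, so Pearson correlation is an assumption here, and the feature names and toy values are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sx * sy)

def feature_correlations(features, scores):
    """Map each feature name to its correlation with the quality scores."""
    return {name: pearson(values, scores) for name, values in features.items()}

# Hypothetical toy data: six portraits, one quality score each.
scores = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
features = {
    "eye_sharpness": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],  # rises with the score
    "group_flag": [0, 1, 0, 0, 1, 0],                  # stand-in demographic attribute
}
report = feature_correlations(features, scores)
```

In this toy data the photographic feature tracks the scores while the demographic stand-in does not, which is the pattern the slide reports. A near-zero linear correlation does not rule out other forms of bias.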
Unbiased Machine Vision
Computational Portrait Aesthetics
How do we define bias?
How can we identify and operationalize such bias dimensions?
Unbiased Machine Vision
Representation: learning from unevenly distributed data
Underrepresentative Machine Vision
Culture biases
Representative Machine Vision
Some solutions
Representative Machine Vision
Multicultural Machine Vision Tools
Reflecting visual definitions and preferences of people around the world
Representative Machine Vision
We have sentiment detectors for images…
Reflecting sentiment perception of small groups of (Western) people
Can we make image sentiment classifiers multicultural?
Brendan Jou, Tao Chen, Nikolaos Pappas, Miriam Redi, Mercan Topkara, Shih-Fu Chang. Visual Affect Around the World: A Large-scale Multilingual Visual Sentiment Ontology. ACM Multimedia 2015, Brisbane, Australia.
DATA, CODE, DEMO: mvso.columbia.edu
Multilingual Visual Sentiment Ontology
12 LANGUAGES, 7.6M IMAGES, 16K ANPs
- EMOTION KEYWORDS [Plutchik 1980]
- FLICKR CRAWLING
- LANGUAGE-SPECIFIC ADJECTIVE-NOUN PAIR (ANP) DISCOVERY (healthy breakfast, health coffee, ...)
- SENTIMENT ANNOTATIONS through CROWDSOURCING
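The ANP discovery step can be illustrated with a toy counter over image captions or tags. This is a minimal sketch under strong assumptions: the word lists are hand-made and hypothetical, where a real pipeline would need a part-of-speech tagger per language, and the adjective-before-noun order holds for English but not for every one of the 12 languages. It is not the MVSO implementation.

```python
from collections import Counter

def discover_anps(captions, adjectives, nouns, min_count=2):
    """Count adjacent (adjective, noun) word pairs and keep the frequent ones."""
    counts = Counter()
    for caption in captions:
        words = caption.lower().split()
        for first, second in zip(words, words[1:]):
            if first in adjectives and second in nouns:
                counts[f"{first} {second}"] += 1
    return {anp: n for anp, n in counts.items() if n >= min_count}

# Hypothetical toy inputs.
captions = [
    "healthy breakfast at home",
    "a healthy breakfast again",
    "old house by the sea",
]
anps = discover_anps(
    captions,
    adjectives={"healthy", "old"},
    nouns={"breakfast", "house"},
)
```

The `min_count` threshold stands in for whatever frequency filtering a real crawl would apply; pairs seen only once are dropped.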
Representative Machine Vision
Multilingual Visual Sentiment Ontology
- Cultural insights based on semantically related concepts
- Each cluster reveals
○ Wording variation
○ Sentiment variation
○ Visual content variation
Representative Machine Vision
Multilingual Visual Sentiment Ontology
LANGUAGE-SPECIFIC SENTIMENT PREDICTORS: IT, ES, ZH, EN, DE, FR
CROSS-LINGUAL SENTIMENT PREDICTION
Representative Machine Vision
Cross-Lingual Sentiment Prediction
Representative Machine Vision
Multilingual Visual Sentiment Ontology
Comparing different models to understand similarities and differences between language communities.
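One simple way to compare language-specific models is a cross-lingual accuracy matrix: apply each language's predictor to every language's test set. The sketch below assumes predictors are plain callables and uses hypothetical one-number "images" with threshold rules; the actual predictors in the talk are trained visual models.

```python
def cross_lingual_accuracy(predictors, test_sets):
    """Accuracy of every language-specific predictor on every language's test set.

    predictors: {language: callable(sample) -> label}
    test_sets:  {language: non-empty list of (sample, label) pairs}
    Returns {(train_language, test_language): accuracy}.
    """
    matrix = {}
    for train_lang, predict in predictors.items():
        for test_lang, examples in test_sets.items():
            correct = sum(1 for sample, label in examples if predict(sample) == label)
            matrix[(train_lang, test_lang)] = correct / len(examples)
    return matrix

# Hypothetical stand-ins: each "image" is one number, each predictor a threshold.
predictors = {
    "EN": lambda x: "positive" if x > 0.5 else "negative",
    "IT": lambda x: "positive" if x > 0.3 else "negative",
}
test_sets = {
    "EN": [(0.9, "positive"), (0.4, "negative"), (0.6, "positive"), (0.1, "negative")],
    "IT": [(0.9, "positive"), (0.4, "positive"), (0.2, "negative"), (0.1, "negative")],
}
matrix = cross_lingual_accuracy(predictors, test_sets)
```

Off-diagonal entries that fall below the diagonal point to samples that the two language communities label differently.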
Cross-Lingual Sentiment Prediction
What is really “representative”?
What is the tradeoff between complexity and representativeness?
Representative Machine Vision
Empowerment: the algorithm in the human loop
Photo by Franck V. on Unsplash
Empowering Machine Vision
Finding representative images for human knowledge in a collaborative environment.
Visual Thinking
Wikidata
Wikidata is an international and thus multilingual project. While English is the default interface language, the project is intended to be used by, and useful for, users of every language with MediaWiki internationalization support.
Empowering Machine Vision
Figure labels: ALL, PEOPLE, SPECIES
Empowering Machine Vision
Missing images in Wikidata
Item without P18 - ‘Has image’
Manual Selection and Evaluation
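Items without a P18 statement can be listed with a SPARQL query against the Wikidata Query Service. The sketch below only builds the query string and does not send it; the helper name `missing_image_query` and the choice of class are hypothetical, and on very large classes such a query may time out without a tighter filter.

```python
def missing_image_query(class_qid: str, limit: int = 100) -> str:
    """Build a SPARQL query for items of a class that have no P18 statement."""
    if not (class_qid.startswith("Q") and class_qid[1:].isdigit()):
        raise ValueError("class_qid must look like 'Q5'")
    return (
        "SELECT ?item WHERE {\n"
        f"  ?item wdt:P31 wd:{class_qid} .\n"             # instance of the given class
        "  FILTER NOT EXISTS { ?item wdt:P18 ?image }\n"  # no image statement
        f"}} LIMIT {int(limit)}"
    )

query = missing_image_query("Q5", limit=10)  # Q5 is the Wikidata item for "human"
```

The `wd:` and `wdt:` prefixes are predefined on the Wikidata Query Service, so the string can be pasted into its query editor as is.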
Empowering Machine Vision
Typical scenario: users willing to add images to Wikidata might need to manually search for the right image, using different tools, among millions of free-licensed images
Manual search through millions of free-licensed images
Item without P18 - ‘Has image’
Discovering related images from different open sources
Ranking images according to Relevance and Quality, automatically inferred from community curation
Manual Selection and Evaluation
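The ranking step can be sketched as sorting candidates by a combined relevance and quality score before handing them to a human for selection. The weighted sum, the `relevance_weight` parameter, and the candidate records are all assumptions for illustration; the talk does not specify how the two signals are combined.

```python
def rank_candidates(candidates, relevance_weight=0.5):
    """Sort candidates by a weighted mix of relevance and quality, best first.

    Each candidate is a dict with "title", "relevance" and "quality",
    where both scores are assumed to lie in [0, 1].
    """
    if not 0.0 <= relevance_weight <= 1.0:
        raise ValueError("relevance_weight must be between 0 and 1")

    def combined(c):
        return relevance_weight * c["relevance"] + (1.0 - relevance_weight) * c["quality"]

    return sorted(candidates, key=combined, reverse=True)

# Hypothetical candidates for one item without P18.
candidates = [
    {"title": "blurry_but_on_topic.jpg", "relevance": 0.9, "quality": 0.2},
    {"title": "sharp_and_on_topic.jpg", "relevance": 0.8, "quality": 0.9},
    {"title": "sharp_but_off_topic.jpg", "relevance": 0.1, "quality": 0.95},
]
ranked = [c["title"] for c in rank_candidates(candidates)]
```

The ranked list is only a suggestion order: the final choice stays with the contributor in the manual selection and evaluation step.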
Empowering Machine Vision
Solving the problem of visual enrichment in a collaborative fashion, leveraging the wisdom of the community
Automatically discovering RELEVANT Free-Licensed images from linked pages, Commons search, Flickr search
Empowering Machine Vision
So much user-generated visual information... How to prioritize it?
Empowering Machine Vision
QUALITY: is the image of high photographic quality?
Photo: Jee & Rani Nature Photography on Commons
Photo: Vinayaraj on Commons
Not all relevant images are actually ‘good’ images
High Quality: 160K Quality Commons
Lower Quality: 160K Random Commons
Modeling the wisdom of the community
To surface high-quality pictures, we train a model using data curated by the Wikimedia community
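A balanced training set of this shape can be assembled by pairing community-curated images with an equally sized random sample of other images. The sketch below assumes the two file lists are already available; the function name, the file names, and the rule of excluding curated files from the random pool are hypothetical choices, not details from the talk.

```python
import random

def build_training_set(quality_files, other_files, seed=0):
    """Pair curated positives with an equally sized random sample of negatives.

    Returns a shuffled list of (filename, label) pairs,
    with label 1 = high quality and 0 = lower quality.
    """
    rng = random.Random(seed)
    curated = set(quality_files)
    pool = [f for f in other_files if f not in curated]  # avoid label clashes
    n = min(len(quality_files), len(pool))
    positives = rng.sample(list(quality_files), n)
    negatives = rng.sample(pool, n)
    examples = [(f, 1) for f in positives] + [(f, 0) for f in negatives]
    rng.shuffle(examples)
    return examples

# Hypothetical file names; the talk uses 160K images per class.
quality_files = [f"quality_{i}.jpg" for i in range(4)]
other_files = [f"random_{i}.jpg" for i in range(10)] + ["quality_0.jpg"]
examples = build_training_set(quality_files, other_files)
```

Keeping the classes the same size means a classifier cannot reach high accuracy by always predicting one label.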
Drawing: Aphex34 on Commons
Empowering Machine Vision
FRAMEWORK: Convolutional Neural Network, Google Inception-v3 [1]
[1] Szegedy, Christian, et al. "Rethinking the Inception Architecture for Computer Vision." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
Figure labels: High, Low
Empowering Machine Vision
Example results
https://tools.wmflabs.org/wikidata-game/distributed/#game=49&opt=%7B%22type%22%3A%22flower%22%7D
Empowering Machine Vision
Community tools for disseminating image recommendations
Is this really empowering?
How do we measure disruptions introduced by algorithms?
Representative Machine Vision
Summary
- Unbiased: Interpretable models to detect data stereotypes
- Representative: Culture-specific models that reflect different visual worlds
- Empowering: Helping communities with visual knowledge enrichment
Thank you!
Challenges
- Bias: How to identify bias dimensions?
○ How do we evaluate bias with the same communities?
- Representation: How to make really representative, universal models? What does that mean?
○ Representation vs quality; representative image search results
- Empowerment: How to use machine learning technologies in harmony with democratic, collaborative processes?
- More! Are we missing something?
○ Privacy of people in pictures
○ Data re-use: what are people doing with images?