Computer vision technologies for visual knowledge enrichment

SLIDE 1

Computer vision technologies for visual knowledge enrichment

Miriam Redi, Research Scientist

SLIDE 2

Images are powerful tools for communication and knowledge sharing

SLIDE 3

Wikimedia spaces are still missing many images!

95% of items in Around 1/10 articles in

SLIDE 4

One source of Visual Knowledge

SLIDE 5

SLIDE 6

Computer Vision

SLIDE 7

Computer vision can help contributors

○ Organize images
○ Describe images
○ Search and Find images

How do we bridge the visual knowledge gap?

SLIDE 8

How do we fill the gap responsibly?

_Knowledge equity: As a social movement, we will focus our efforts on the knowledge and communities that have been left out by structures of power and privilege. We will welcome people from every background to build strong and diverse communities. We will break down the social, political, and technical barriers preventing people from accessing and contributing to free knowledge._

SLIDE 9

RESPONSIBLE* Computer Vision

○ unbiased (against stereotypes)
○ representative (multicultural)
○ empowering (vs exclusive)

SLIDE 10

Unbiased Machine Vision

Bias: learning from “human” data

SLIDE 11

VERY Large

SLIDE 12

Biased Machine Vision

Object detection

SLIDE 13

Interpretable Vision Algorithms

Automatically Checking for Stereotypes in Datasets

Miriam Redi, Nikhil Rasiwasia, Alejandro Jaimes, Gaurav Aggarwal. The Beauty of Capturing Faces: Rating the Quality of Digital Portraits. Face and Gesture Recognition, FG 2015, Ljubljana, Slovenia.

Unbiased Machine Vision

SLIDE 14

Computational Aesthetics: Detecting beautiful images

Computational Portrait Aesthetics: Detecting beautiful images of faces (NOT beautiful faces)

Unbiased Machine Vision

Computational Portrait Aesthetics

SLIDE 15

Data: Hundreds of thousands of photographs annotated in terms of quality by photographers

Interpretable Visual Feature extraction

○ Inspired by Portrait Photography
○ Expanded with demographics features

A EYE SHARPNESS

Unbiased Machine Vision

Computational Portrait Aesthetics

SLIDE 16

Correlation between quality scores and individual features -- The dataset is NOT Biased! :)
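A check of this kind can be sketched as a per-feature correlation against the quality scores. The sketch below is a minimal illustration, not the method from the paper: it assumes Pearson correlation as the measure, and the feature names and values (`eye_sharpness`, `subject_age`) are made-up toy data. A photographic feature that tracks quality is expected; a demographic feature that tracks quality would be a warning sign of a stereotype in the annotations.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def feature_correlations(features, quality):
    """Correlate every named feature column with the quality scores."""
    return {name: pearson(values, quality) for name, values in features.items()}


# Toy data (hypothetical): five portraits with a quality score each.
quality = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "eye_sharpness": [0.1, 0.2, 0.3, 0.4, 0.5],    # photographic feature
    "subject_age": [30.0, 50.0, 20.0, 50.0, 30.0],  # demographic feature
}
correlations = feature_correlations(features, quality)
for name, r in correlations.items():
    print(f"{name}: r = {r:+.2f}")
```

In this toy data the photographic feature correlates strongly with quality and the demographic one does not, which is the pattern the slide reports.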

Unbiased Machine Vision

Computational Portrait Aesthetics

SLIDE 17

How do we define bias?

How can we identify and operationalize such bias dimensions?

Unbiased Machine Vision

SLIDE 18

Representation: learning from unevenly distributed data

SLIDE 19

Underrepresentative Machine Vision

Culture biases

SLIDE 20

Representative Machine Vision

Some solutions

SLIDE 21

Representative Machine Vision

Multicultural Machine Vision Tools

Reflecting visual definitions and preferences of people around the world

SLIDE 22

Representative Machine Vision

We have sentiment detectors for images…

Reflecting sentiment perception of small groups of (western) people

Can we make image sentiment classifiers multicultural?

Brendan Jou, Tao Chen, Nikolaos Pappas, Miriam Redi, Mercan Topkara, Shih-Fu Chang. Visual Affect Around the World: A Large-scale Multilingual Visual Sentiment Ontology. ACM Multimedia 2015, Brisbane, Australia.

DATA, CODE, DEMO: mvso.columbia.edu

SLIDE 23

Multilingual Visual Sentiment Ontology

12 LANGUAGES, 7.6M IMAGES, 16K ANPs

○ EMOTION KEYWORDS [Plutchik 1980]
○ FLICKR CRAWLING
○ LANGUAGE-SPECIFIC ADJECTIVE-NOUN PAIR (ANP) DISCOVERY: healthy breakfast, health coffee, ...
○ SENTIMENT ANNOTATIONS through CROWDSOURCING
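The adjective-noun pair (ANP) discovery step can be illustrated with a small frequency count over image captions or tags. This is a rough sketch only, not the MVSO pipeline: the `ADJECTIVES` and `NOUNS` lexicons, the captions, and the `min_count` threshold are all hypothetical, and a real system would need a part-of-speech tagger per language (and would have to handle languages where the noun precedes the adjective).

```python
from collections import Counter

# Hypothetical, tiny lexicons standing in for a per-language POS tagger.
ADJECTIVES = {"healthy", "happy", "old"}
NOUNS = {"breakfast", "coffee", "dog"}


def discover_anps(captions, min_count=2):
    """Count adjacent adjective+noun tokens and keep the frequent pairs."""
    counts = Counter()
    for caption in captions:
        tokens = caption.lower().split()
        # Slide a window of two tokens over the caption.
        for first, second in zip(tokens, tokens[1:]):
            if first in ADJECTIVES and second in NOUNS:
                counts[(first, second)] += 1
    return [
        (" ".join(pair), n)
        for pair, n in counts.most_common()
        if n >= min_count
    ]


# Made-up captions, as might come back from a keyword-driven image crawl.
captions = [
    "Healthy breakfast at home",
    "my healthy breakfast today",
    "happy dog in the park",
    "healthy coffee break",
    "another happy dog",
]
anps = discover_anps(captions)
print(anps)
```

Pairs seen only once are dropped here; the frequency filter is one simple way to keep pairs that a community actually uses.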

Representative Machine Vision

SLIDE 24

Multilingual Visual Sentiment Ontology

Cultural insights based on semantically related concepts

Each cluster reveals

○ Wording variation
○ Sentiment variation
○ Visual content variation

Representative Machine Vision

SLIDE 25

IT ES ZH EN DE FR

LANGUAGE-SPECIFIC SENTIMENT PREDICTORS

CROSS-LINGUAL SENTIMENT PREDICTION

Multilingual Visual Sentiment Ontology

Representative Machine Vision

Cross-Lingual Sentiment Prediction

SLIDE 26

Representative Machine Vision

Multilingual Visual Sentiment Ontology

Comparing different models to understand similarities and differences between language communities.
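One way to compare language-specific models is a cross-lingual matrix: train one predictor per language, then score each predictor on every language's data. The sketch below is an assumption-laden toy, not the models from the paper: it swaps in a nearest-centroid classifier, uses made-up two-dimensional feature vectors, and scores on the training data for brevity (a real comparison would use held-out images).

```python
def train_centroids(samples):
    """Return the mean feature vector for each sentiment label."""
    sums, counts = {}, {}
    for vector, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vector))
        for i, value in enumerate(vector):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, vector):
    """Assign the label whose centroid is closest to the vector."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, vector))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def accuracy(centroids, samples):
    hits = sum(predict(centroids, vector) == label for vector, label in samples)
    return hits / len(samples)


def cross_lingual_matrix(data):
    """matrix[src][tgt] = accuracy of the src-language model on tgt-language data."""
    models = {lang: train_centroids(samples) for lang, samples in data.items()}
    return {
        src: {tgt: accuracy(models[src], data[tgt]) for tgt in data}
        for src in data
    }


# Contrived toy data: the same visual features carry opposite sentiment
# in the two (hypothetical) language communities.
data = {
    "EN": [([1.0, 0.0], "pos"), ([0.9, 0.1], "pos"),
           ([0.0, 1.0], "neg"), ([0.1, 0.9], "neg")],
    "ZH": [([0.0, 1.0], "pos"), ([0.1, 0.9], "pos"),
           ([1.0, 0.0], "neg"), ([0.9, 0.1], "neg")],
}
matrix = cross_lingual_matrix(data)
for src, row in matrix.items():
    print(src, row)
```

The diagonal shows how well each model fits its own community; low off-diagonal values point to language pairs whose sentiment perception differs.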

Cross-Lingual Sentiment Prediction

SLIDE 27

What is really “representative”?

What is the tradeoff between complexity and representativeness?

Representative Machine Vision

SLIDE 28

Empowerment: the algorithm in the human loop

Photo by Franck V. on Unsplash

SLIDE 29

Empowering Machine Vision

Finding Representative Images for Human Knowledge in a collaborative environment.

SLIDE 30

Visual Thinking

Wikidata

Wikidata is an international and thus multilingual project. While English is the default interface language, the project is intended to be used by, and useful for, users of every language with MediaWiki internationalization support.

Empowering Machine Vision

SLIDE 31

[Chart categories: ALL, PEOPLE, SPECIES]

Empowering Machine Vision

Missing images in Wikidata

SLIDE 32

Item without P18 - ‘Has image’

Manual Selection and Evaluation

Empowering Machine Vision

Typical Scenario: users willing to add images to Wikidata might need to manually search for the right image using different tools from millions of free-licensed images

Manual search through millions of free licensed images
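Items without P18 can be listed with a SPARQL query. The sketch below only builds the query text; it assumes the standard Wikidata prefixes (`wd:`, `wdt:`), uses `P31` (instance of) to restrict the search to one class, and picks `Q5` (human) purely as an example. The helper name `missing_image_query` is hypothetical, and sending the query to a SPARQL endpoint is left out.

```python
def missing_image_query(class_qid, limit=10):
    """Build a SPARQL query for items of a class that have no P18 statement."""
    if not (class_qid.startswith("Q") and class_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item id: {class_qid!r}")
    return (
        "SELECT ?item WHERE {\n"
        f"  ?item wdt:P31 wd:{class_qid} .\n"                # instance of the class
        "  FILTER NOT EXISTS { ?item wdt:P18 ?image }\n"    # no image statement
        "}\n"
        f"LIMIT {int(limit)}"
    )


# Example: humans (Q5) without an image.
query = missing_image_query("Q5")
print(query)
```

The `FILTER NOT EXISTS` clause is what expresses "item without P18"; everything else narrows the result to a manageable slice.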

SLIDE 33

Item without P18 - ‘Has image’

Discovering related images from different open sources

Ranking images according to Relevance and Quality, automatically inferred from community curation

Manual Selection and Evaluation

Empowering Machine Vision

Solving the problem of visual enrichment in a collaborative fashion, leveraging the wisdom of the community
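The ranking step can be pictured as sorting candidate images by a combined score, with the final choice left to a human. This is a minimal sketch under stated assumptions: the linear weighting, the `quality_weight` parameter, the `Candidate` type, and the scores themselves are hypothetical, and the slides do not say how relevance and quality are actually combined.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    title: str
    relevance: float  # assumed to be in [0, 1]
    quality: float    # assumed to be in [0, 1]


def rank_candidates(candidates, quality_weight=0.5):
    """Sort candidates by a weighted mix of relevance and quality, best first."""
    if not 0.0 <= quality_weight <= 1.0:
        raise ValueError("quality_weight must be between 0 and 1")

    def score(candidate):
        return ((1.0 - quality_weight) * candidate.relevance
                + quality_weight * candidate.quality)

    return sorted(candidates, key=score, reverse=True)


# Made-up candidates for one item.
candidates = [
    Candidate("a.jpg", relevance=0.9, quality=0.2),
    Candidate("b.jpg", relevance=0.7, quality=0.8),
    Candidate("c.jpg", relevance=0.3, quality=0.9),
]
ranked = rank_candidates(candidates)
print([c.title for c in ranked])
```

The ranked list is only a suggestion: in the workflow above, a contributor still selects and evaluates the image manually.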

SLIDE 34

Automatically discovering RELEVANT Free-Licensed images from linked pages, Commons search, Flickr search

Empowering Machine Vision

SLIDE 35

So much user-generated visual information... How to prioritize it?

SLIDE 36

Empowering Machine Vision

QUALITY: is the image of high photographic quality?

Photo: Jee & Rani Nature Photography on Commons
Photo: Vinayaraj on Commons

Not all relevant images are actually ‘good’ images

High Quality Lower Quality

SLIDE 37

High Quality: 160K Quality Commons
Lower Quality: 160K Random Commons

Modeling the wisdom of the community

To surface high-quality pictures, we train a model using data curated by the Wikimedia community
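Assembling such a training set can be sketched as pairing community-curated images (label 1) with an equally sized random sample (label 0). The details below are assumptions for illustration: the file names are made up, and removing curated images from the random pool and balancing the two classes are design choices of this sketch, not steps stated in the slides.

```python
import random


def build_training_set(quality_files, random_files, seed=0):
    """Return a shuffled, balanced list of (file, label) pairs."""
    quality_set = set(quality_files)
    # Assumption: a random image that is also curated is not a negative.
    negatives_pool = sorted(set(random_files) - quality_set)
    n = min(len(quality_set), len(negatives_pool))

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    positives = rng.sample(sorted(quality_set), n)
    negatives = rng.sample(negatives_pool, n)

    examples = [(f, 1) for f in positives] + [(f, 0) for f in negatives]
    rng.shuffle(examples)
    return examples


# Toy file lists (hypothetical); q2.jpg appears in both pools.
quality_files = ["q1.jpg", "q2.jpg", "q3.jpg"]
random_files = ["r1.jpg", "q2.jpg", "r2.jpg", "r3.jpg", "r4.jpg"]
training_set = build_training_set(quality_files, random_files)
print(training_set)
```

The labels come entirely from community curation, which is what lets a classifier trained on them approximate the community's notion of quality.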

SLIDE 38

Drawing: Aphex34 on Commons

Empowering Machine Vision

FRAMEWORK: Convolutional Neural Network (Google Inception-v3 [1])

[1] Szegedy, Christian, et al. "Rethinking the inception architecture for computer vision." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.

High Low

SLIDE 39

Empowering Machine Vision

Example results

SLIDE 40

https://tools.wmflabs.org/wikidata-game/distributed/#game=49&opt=%7B%22type%22%3A%22flower%22%7D

Empowering Machine Vision

Community tools for disseminating image recommendations

SLIDE 41

Is this really empowering?

How do we measure disruptions introduced by algorithms?

Representative Machine Vision

SLIDE 42

Summary

Unbiased: Interpretable models to detect data stereotypes

Representative: Culture-specific models that reflect different visual worlds

Empowering: Helping communities with visual knowledge enrichment

SLIDE 43

Thank you!

SLIDE 44

Challenges

Bias: How to identify bias dimensions?

○ How do we evaluate bias with the same communities?

Representation: How to make really representative, universal models? What does that mean?

○ Representation vs quality; representative image search results

Empowerment: How to use machine learning technologies in harmony with democratic, collaborative processes?

More! Are we missing something?

○ Privacy of people in pictures
○ Data re-use, what are people doing with images?