

SLIDE 1

Wisdom of the Crowd

Reply in Zoom chat: Which peer signals do you rely heavily on? (e.g., IMDb ratings, Carta, online product reviews)

CS 278 | Stanford University | Michael Bernstein

SLIDE 2

Where we are, and where we’re going

Week 1-2: Basic ingredients — motivation, norms, and strategies for managing growth
Week 3: Groups — strong/weak ties, collaboration
Week 4: Massive collaborations

The Wisdom of the Crowd
Crowdsourcing and Peer Production

SLIDE 3

http://hci.st/wise

Grab your phone, fill it out!

SLIDE 4

“How much do you weigh?”

“My cerebral cortex is insufficiently developed for language.”

SLIDE 5


Whoa, the mean guess is within 1% of the true value

SLIDE 6

Innovation competitions in industry
Innovation competitions for science

SLIDE 7

Prediction markets
AI data annotation at scale

SLIDE 8

Today

What is the wisdom of the crowd?
What is crowdsourcing?
Why do they work?
When do they work?

SLIDE 9

Wisdom of the crowd

SLIDE 10

Crowds are surprisingly accurate at estimation tasks

Who will win the election?
How many jelly beans are in the jar?
What will the weather be?
Is this website a scam?

Individually, we all have errors and biases. However, in aggregate, we exhibit surprising amounts of collective intelligence.

SLIDE 11

“Guess the number of minutes it takes to fly from Stanford, CA to Seattle, WA.”

Guesses: 130, 110, 150, 90, 70, 170, 190

If our errors are distributed at random around the true value, we can recover it by asking enough people and aggregating.
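To make this concrete, here is a minimal simulation in Python. The true value (130 minutes, which happens to be the mean of the seven guesses above) and the noise spread are assumptions for illustration: each guess is the truth plus independent zero-mean error, so the mean tightens around the truth as the crowd grows.

```python
import random

TRUE_MINUTES = 130          # assumed true flight time, for illustration
random.seed(278)

def crowd_mean(n_guessers: int) -> float:
    """Average n independent guesses, each the truth plus zero-mean noise."""
    guesses = [TRUE_MINUTES + random.gauss(0, 30) for _ in range(n_guessers)]
    return sum(guesses) / len(guesses)

for n in (7, 100, 10_000):
    print(f"{n:>6} guessers -> mean guess {crowd_mean(n):.1f} minutes")
# Individual guesses are off by ~30 minutes on average, but the errors
# cancel in aggregate: the mean converges toward TRUE_MINUTES as n grows.
```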

SLIDE 12

What problems can be solved this way?

Jeff Howe [2009] theorized that it required:

Diversity of opinion
Decentralization
Aggregation function

So — any question that has a binary (yes/no), categorical (e.g., win/lose/tie), or interval (e.g., score spread on a football game) outcome

SLIDE 13

What problems cannot be solved this way?

Flip the bits!

People all think the same thing
People can communicate
No way to combine the opinions

For example, writing a short story is much harder!

SLIDE 14

General algorithm

1. Ask a large number of people to answer the question
   - Answers must be independent of each other — no talking!
   - People must have a reasonable level of expertise regarding the phenomenon in question.
2. Average their responses

SLIDE 15

Why does this work?

[Simoiu et al. 2020]

Independent guesses minimize the effects of social influence
Showing consensus cues such as the most popular guess lowers accuracy
If initial guesses are inaccurate and public, then the crowd never recovers

Crowds are more consistent guessers than experts
In an experiment, crowds are only at the 67th percentile on average per question… but at the 90th percentile averaged across questions!
Think of this as the Tortoise and the Hare, except the Tortoise (the crowd) is even faster — at the 67th percentile instead of the worst percentile

SLIDE 16


Mechanism: ask many independent contributors to take a whack at the problem, and reward the top contributor

SLIDE 17

Mechanism: ask paid data annotators to label the same image and look for agreement in labels

Mechanism: use a market to aggregate opinions (much more on the implications of paid crowd work in the Future of Work lecture)

SLIDE 18

Let’s check our http://hci.st/wise results

SLIDE 19

Aggregation approaches

SLIDE 20

Early crowdsourcing

[Grier 2007]

1760: British Nautical Almanac, Nevil Maskelyne
Two distributed workers work independently, and a third verifier adjudicates their responses

SLIDE 21

Work distributed via mail


SLIDE 22

Charles Babbage


Two people doing the same task in the same way will make the same errors.

SLIDE 23

“I did it in 1906. And I have cool sideburns.”

You reinvented the same idea, but it was stickier this time because statistics had matured. Unfortunately, you also held some pretty problematic opinions about eugenics.

SLIDE 24

Mathematical Tables Project

WPA project, begun 1938
Calculated tables of mathematical functions
Employed 450 human computers
The origin of the term “computer”

SLIDE 25

20th Century Fox

SLIDE 26

Enter computer science

Computation allows us to execute these kinds of goals at even larger scale and with even more complexity. We can design systems that gather evidence, combine estimates, and guide behavior.

26

SLIDE 27

Forms of crowdsourcing

SLIDE 28

Definition

Crowdsourcing term coined by Jeff Howe [2006] in Wired:

“Taking [...] a function once performed by employees and outsourcing it to an undefined (and generally large) network of people in the form of an open call.”

SLIDE 29

Volunteer crowdsourcing

Tap into intrinsic motivation to recruit volunteers

Kasparov vs. the World
NASA Clickworkers
Collaborative math proofs
Search for a missing person
Wikipedia
Ushahidi crisis mapping

SLIDE 30

Automated sharing

Opt in to sharing and aggregation

Waze traffic sharing (also includes manual reporting)
PurpleAir air quality sensors

SLIDE 31

What if the task were embedded in another goal?

Just like I get exercise on my commute to Stanford
When I could still commute to Stanford *quiet sob*

SLIDE 32

Games with a purpose

[von Ahn and Dabbish ’08]

Make the data labeling goal enjoyable. You are paired up with another person on the internet, but can’t talk to them. You see the same image. Try to guess the same word to describe it.
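A minimal sketch of this output-agreement rule in Python (the function and the taboo-word handling are our illustration, not the actual ESP Game implementation): accept the first word that both players produce, ignoring taboo words.

```python
from typing import Optional

def match_label(guesses_a: list[str], guesses_b: list[str],
                taboo: set[str]) -> Optional[str]:
    """Return the first guess both players agree on, or None if no match."""
    allowed_b = {g.lower() for g in guesses_b} - taboo
    for guess in guesses_a:
        g = guess.lower()
        if g not in taboo and g in allowed_b:
            return g                      # agreement: accept g as a label
    return None

# Two players label the same image without communicating.
print(match_label(["dog", "beach", "frisbee"],
                  ["sand", "frisbee", "dog"],
                  taboo={"beach"}))       # -> "dog"
```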


SLIDE 33

Games with a purpose

[von Ahn and Dabbish ’08]

Let’s try it. Volunteers?

Taboo words: Burger, Food, Fries

SLIDE 34

Games with a purpose

[von Ahn and Dabbish ’08]

Let’s try it. Volunteers?

Taboo words: Stanford, Graduation, Wacky Walk, Appendix

SLIDE 35

reCAPTCHA

“Oh, I see you’d like to make an account here. Sure would be a shame if you couldn’t get into my website. Maybe you should help me train my AI system and I’ll see if I can do something about letting you in.”

SLIDE 36

Handling collusion and manipulation

SLIDE 37

Not the name that the British were expecting to see

Stephen Colbert fans raid NASA’s vote to name the new ISS wing

SLIDE 38

A small number of malicious individuals can tear apart a collective effort.

SLIDES 39-43

[Image sequence: example via Mako Hill]

SLIDE 44

Can we survive vandalism?

Michael’s take: it’s a calculation of the cost of vandalism vs. the cost of cleaning it up.

How much effort does it take to vandalize Wikipedia? How much effort does it take an admin to revert it?

If effort to vandalize >>> effort to revert, then the system can survive. How do you design your crowdsourcing system to create this balance?

SLIDE 45

Who do we trust?

We need to answer two questions simultaneously: (1) What is the correct answer to each question? and (2) Which participants’ answers are most likely to be correct?

Think of it another way: if people are disagreeing, is there someone who is generally right?

An algorithm called Get Another Label solves this problem by answering the two questions simultaneously.

[Sheng, Provost, Ipeirotis, ’08]

SLIDE 46

Get Another Label

Inspired by the Expectation Maximization (EM) algorithm from AI. Use the workers’ guesses to estimate the most likely answer for each question. Use those answers to estimate worker quality. Use those estimates of quality to re-weight the guesses and re-compute answers. Loop.

[Sheng, Provost, Ipeirotis, ’08]

Given current contributor quality estimates, estimate the probability of each answer
Given current answer probabilities, estimate contributor accuracy
Loop until convergence
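A compact sketch of that loop in Python for binary labels. This is a simplified variant that tracks a single accuracy score per worker (the published Get Another Label work models richer error structure), so treat the names and details as illustrative.

```python
def get_another_label(votes: dict, n_iters: int = 20):
    """votes maps (worker, question) -> 0 or 1. Returns estimated answers
    and per-worker accuracy, refined by EM-style alternation."""
    workers = {w for w, _ in votes}
    questions = {q for _, q in votes}
    accuracy = {w: 0.7 for w in workers}          # initial quality guess
    answers = {}

    for _ in range(n_iters):
        # Step 1: given current quality estimates, weight each vote and
        # pick the most likely answer for each question.
        for q in questions:
            ballots = [(v, accuracy[w]) for (w, qq), v in votes.items() if qq == q]
            weight_for_1 = sum(a if v == 1 else 1 - a for v, a in ballots)
            answers[q] = 1 if weight_for_1 > len(ballots) / 2 else 0
        # Step 2: given current answers, re-estimate each worker's accuracy.
        for w in workers:
            mine = [(v, answers[q]) for (ww, q), v in votes.items() if ww == w]
            accuracy[w] = sum(v == ans for v, ans in mine) / len(mine)
    return answers, accuracy

# Tiny example: cam always disagrees with the majority and loses influence.
votes = {("ann", "q1"): 1, ("bob", "q1"): 1, ("cam", "q1"): 0,
         ("ann", "q2"): 0, ("bob", "q2"): 0, ("cam", "q2"): 1}
print(get_another_label(votes))
```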

SLIDE 47

Bayesian Truth Serum

Inspiration: people with accurate meta-knowledge (knowledge of how much other people know) are often more accurate. So, when asking for the estimate, also ask for each person’s predicted empirical distribution of answers. Then, pick the answer that is more popular than people predict.

[Prelec, Seung, and McCoy ’17]

SLIDE 48

Bayesian Truth Serum

“When will HBO have its next hit show?” 1 year / 5 years / 10 years

“What percentage of people do you think will answer each option?” 1 year / 5 years / 10 years

An answer that 10% of people give but is predicted to be only 5% receives a high score.

[Prelec, Seung, and McCoy ’17]

SLIDE 49

Bayesian Truth Serum

[Prelec, Seung, and McCoy, Nature ’17]

Calculate the population endorsement frequencies $\bar{x}_k$ for each option $k$ and the geometric average $\bar{y}_k$ of the predicted frequencies.

Evaluate each answer according to its information score:

$$\log \frac{\bar{x}_k}{\bar{y}_k}$$

And reward people with accurate prediction frequency reports.
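A small sketch of that score in Python, using slide 48’s example: an answer that 10% of people give but that was predicted at only 5% scores log(0.10 / 0.05) = log 2 > 0. The function name and the three-respondent prediction list are our illustration.

```python
from math import log, prod

def information_score(actual_freq: float, predicted_freqs: list[float]) -> float:
    """log(x̄_k / ȳ_k): an option's actual endorsement frequency over the
    geometric mean of respondents' predicted frequencies for that option."""
    geo_mean = prod(predicted_freqs) ** (1 / len(predicted_freqs))
    return log(actual_freq / geo_mean)

# Slide 48's example: 10% actually answer, but respondents predicted ~5%.
print(information_score(0.10, [0.05, 0.05, 0.05]))  # ≈ 0.693 (= log 2): high score
```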

SLIDE 50

Judging quality explicitly

Gold standard judgments [Le et al. ’10]

Include questions with known answers. Performance on these “gold standard” questions is used to filter submissions (a minimal sketch follows below).

Gated instruction [Liu et al. 2016]

Create a training phase where you know all the answers already, and give feedback on every right or wrong answer during training. At the end of training, only let people go on if they have a high enough accuracy.
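Here is a minimal sketch of the gold-standard filter mentioned above (the threshold and names are illustrative assumptions, not from Le et al.): mix questions with known answers into the task stream, then keep only contributors who clear an accuracy bar on them.

```python
def passes_gold(responses: dict[str, str],
                gold: dict[str, str],
                min_accuracy: float = 0.8) -> bool:
    """Admit a contributor only if they answer enough gold questions correctly."""
    graded = [(q, ans) for q, ans in gold.items() if q in responses]
    if not graded:
        return False                      # no gold evidence yet: don't admit
    correct = sum(responses[q] == ans for q, ans in graded)
    return correct / len(graded) >= min_accuracy

gold = {"q7": "cat", "q13": "dog"}        # known answers mixed into the stream
print(passes_gold({"q7": "cat", "q13": "dog", "q2": "fox"}, gold))  # True
```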


SLIDE 51

Person- vs. process-centric

[Mitra, Hutto and Gilbert, CHI ’15]

Person-centric methods: find and filter for high performers

Essentially, build up a private reputation measurement (e.g., gold standard questions, qualification tests)

Process-centric methods: take all comers and use algorithms

e.g., financial incentives, Get Another Label, Bayesian Truth Serum

Result: person-based strategies are most effective


SLIDE 52

Michael’s take

There are two primary causes of quality challenges:

Strategic dishonesty, where the contributor is explicitly seeking to get away with something
Mental model misalignment, where the requester has not clearly communicated their goal

My experience is that strategic dishonesty is rare and can be caught, whereas mental model misalignment is ubiquitous

(But most of the field’s focus is on strategic dishonesty)


SLIDE 53

Summary

Crowdsourcing: an open call to a large group of people who self-select to participate

Crowds can be surprisingly intelligent, if opinions are levied with some expertise and without communication, then aggregated intelligently.

Design differently for intrinsically and extrinsically motivated crowds

Quality issues are best handled up front by identifying the strong contributors and gating them through

SLIDE 54

Assignment 3: Let’s Crowdsource A Midterm

Goal: gain experience with crowdsourcing workflows, and their double-edged nature. We will be constructing our own midterm!

Part I (suggested by Friday): brainstorm midterm questions
Part II (due next Monday): remix others’ questions
Part III (due next Wednesday): vote
Part IV (due two weeks from today): reflections

Top ~10% of questions by vote will form a public question bank of possible questions for the midterm. You get full credit if a question you contributed is on the midterm. Staff will add some questions not in the question bank as well.

SLIDE 55

Creative Commons images thanks to Kamau Akabueze, Eric Parker, Chris Goldberg, Dick Vos, Wikimedia, MaxPixel.net, Mescon, and Andrew Taylor. Slide content shareable under a Creative Commons Attribution-NonCommercial 4.0 International License.


Social Computing

CS 278 | Stanford University | Michael Bernstein