SLIDE 1

Computational social processes

Lirong Xia
Fall, 2016

SLIDE 2

Example: Crowdsourcing

[Figure: n Turkers each report a preference over alternatives {a, b, c}, e.g. Turker 1: a > b, Turker 2: b > c, …, Turker n: c > b > a]

SLIDE 3

The Condorcet Jury theorem [Condorcet 1785]

  • Given
    – two alternatives {O, M}
    – 0.5 < p < 1
  • Suppose
    – each agent’s preference is generated i.i.d., such that
      • w/p p, it is the same as the ground truth
      • w/p 1−p, it is different from the ground truth
  • Then, as n→∞, the majority of agents’ preferences converges in probability to the ground truth (simulated in the sketch below)

Pr(O | O) = Pr(M | M) = p > 0.5
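
A minimal Python simulation of the theorem’s statement (a hypothetical sketch, not from the slides; the function name is mine): each agent independently matches the ground truth w/p p, and the majority’s accuracy is estimated for growing n.

```python
import random

def majority_correct(n, p, trials=2000):
    """Estimate Pr(the majority of n i.i.d. agents matches the ground truth),
    where each agent is independently correct with probability p."""
    wins = 0
    for _ in range(trials):
        correct = sum(random.random() < p for _ in range(n))
        if correct > n / 2:
            wins += 1
    return wins / trials

# With p = 0.6, the majority's accuracy should approach 1 as n grows.
for n in (1, 11, 101, 1001):
    print(n, majority_correct(n, p=0.6))
```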

SLIDE 4

Today’s schedule

  • Parametric ranking models
    – Distance-based models
      • Mallows
      • Condorcet
    – Random utility models
      • Plackett-Luce
  • Decision making
    – MLE
    – Bayesian

SLIDE 5

Parametric ranking models

  • A statistical model has three parts
    – A parameter space: Θ
    – A sample space: S = Rankings(A)^n
      • A = the set of alternatives, n = #voters
      • assuming votes are i.i.d.
    – A set of probability distributions over S: {Prθ(s) for each s∈Rankings(A) and θ∈Θ}

SLIDE 6

Example

  • Condorcet’s model for two alternatives
  • Parameter space Θ = {O, M}
  • Sample space S = {O, M}^n
  • Probability distributions, i.i.d.:

Pr(O | O) = Pr(M | M) = p > 0.5

SLIDE 7

Mallows’ model [Mallows-1957]

  • Fixed dispersion 𝜒 < 1
  • Parameter space
    – all full rankings over candidates
  • Sample space
    – i.i.d. generated full rankings
  • Probabilities:
    Pr_W(V) ∝ 𝜒^Kendall(V, W)

SLIDE 8

Example: Mallows for {Kyle, Stan, Eric}

  • Ground truth: Kyle > Stan > Eric
  • Probabilities of the six rankings, by Kendall-tau distance d from the truth (reproduced in the sketch below):
    – d = 0 (the truth itself): 1⁄𝑎
    – d = 1 (two rankings): 𝜒⁄𝑎 each
    – d = 2 (two rankings): 𝜒²⁄𝑎 each
    – d = 3 (the reverse of the truth): 𝜒³⁄𝑎
  • Normalization: 𝑎 = 1 + 2𝜒 + 2𝜒² + 𝜒³
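
A short Python sketch (hypothetical code; function names are mine) that enumerates the six rankings, weights each by 𝜒^Kendall(V, W), and normalizes; for three alternatives the normalizer comes out to 1 + 2𝜒 + 2𝜒² + 𝜒³ as on the slide.

```python
from itertools import permutations

def kendall(v, w):
    """Kendall-tau distance: number of pairs the two rankings order differently."""
    pos = {c: i for i, c in enumerate(w)}
    return sum(1 for i in range(len(v)) for j in range(i + 1, len(v))
               if pos[v[i]] > pos[v[j]])

def mallows(truth, chi):
    """Pr_W(V) ∝ chi**Kendall(V, W), normalized over all full rankings."""
    rankings = list(permutations(truth))
    weights = [chi ** kendall(v, truth) for v in rankings]
    a = sum(weights)  # for 3 alternatives: 1 + 2*chi + 2*chi**2 + chi**3
    return {v: w / a for v, w in zip(rankings, weights)}

for v, pr in mallows(("Kyle", "Stan", "Eric"), chi=0.5).items():
    print(" > ".join(v), round(pr, 4))
```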

SLIDE 9

Condorcet’s model [Condorcet-1785, Young-1988, ES UAI-14, APX NIPS-14]

  • Fixed dispersion 𝜒 < 1
  • Parameter space
    – all binary relations over candidates
  • Sample space
    – i.i.d. generated binary relations
  • Probabilities:
    Pr_W(V) ∝ 𝜒^Kendall(V, W)

SLIDE 10

Random utility model (RUM) [Thurstone 27]

  • Continuous parameters: Θ = (θ1,…, θm)
    – m: number of alternatives
    – Each alternative is modeled by a utility distribution μi
    – θi: a vector that parameterizes μi
  • An agent’s latent utility Ui for alternative ci is generated independently according to μi(Ui)
  • Agents rank alternatives according to their perceived utilities
    – Pr(c2≻c1≻c3 | θ1, θ2, θ3) = Pr_{Ui∼μi}(U2 > U1 > U3)

[Figure: utility distributions parameterized by θ1, θ2, θ3 generating latent utilities U1, U2, U3]

SLIDE 11

Generating a preference-profile

  • Pr(Data | θ1, θ2, θ3) = ∏_{V∈Data} Pr(V | θ1, θ2, θ3) (see the sketch below)

[Figure: parameters (θ1, θ2, θ3) independently generate each agent’s ranking, e.g. Agent 1: P1 = c2≻c1≻c3, …, Agent n: Pn = c1≻c2≻c3]
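
A minimal sketch of profile generation from a RUM, assuming normal utility distributions (a hypothetical choice for illustration; the slides introduce the normal case later): each agent draws latent utilities and reports the induced ranking, i.i.d. across agents.

```python
import random

def sample_ranking(means, sigma=1.0):
    """One agent: draw Ui ~ N(means[i], sigma) for each alternative i,
    then rank alternatives by decreasing latent utility."""
    u = {i: random.gauss(means[i], sigma) for i in range(len(means))}
    return tuple(sorted(u, key=u.get, reverse=True))

def sample_profile(means, n):
    """Agents are i.i.d., so Pr(Data | Θ) factors as a product over votes."""
    return [sample_ranking(means) for _ in range(n)]

print(sample_profile(means=[0.0, 1.0, 2.0], n=5))
```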

SLIDE 12

Plackett-Luce model

  • μi’s are Gumbel distributions
    – A.k.a. the Plackett-Luce (P-L) model [BM 60, Yellott 77]
  • Alternative parameterization λ1,…,λm
  • Pros:
    – Computationally tractable
      • Analytical solution to the likelihood function
    – The only RUM that was known to be tractable
    – Widely applied in economics [McFadden 74], learning to rank [Liu 11], and analyzing elections [GM 06,07,08,09]
  • Cons: may not be the best model

Pr(c1 ≻ c2 ≻ … ≻ cm | λ1,…,λm) = λ1/(λ1+…+λm) × λ2/(λ2+…+λm) × … × λm−1/(λm−1+λm)

(c1 is the top choice in {c1,…,cm}; c2 is the top choice in {c2,…,cm}; …; cm−1 is preferred to cm)

SLIDE 13

Example

[Figure: Plackett-Luce with ground-truth parameters λ = (1, 4, 5); each of the six rankings has probability equal to a product of sequential choice probabilities, e.g. 5/10 × 4/5, 5/10 × 1/5, 4/10 × 5/6, 4/10 × 1/6, 1/10 × 5/9, 1/10 × 4/9 (reproduced in the sketch below)]
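
A short Python sketch of the P-L sequential-choice formula (hypothetical code; the alternative names a, b, c are mine), using the slide’s parameters λ = (1, 4, 5); the six probabilities match the products above and sum to 1.

```python
from itertools import permutations

def pl_prob(ranking, lam):
    """Pr(ranking | λ): repeatedly choose the top remaining alternative
    with probability proportional to its λ."""
    prob, remaining = 1.0, sum(lam.values())
    for c in ranking[:-1]:
        prob *= lam[c] / remaining
        remaining -= lam[c]
    return prob

lam = {"a": 1.0, "b": 4.0, "c": 5.0}  # ground-truth parameters from the slide
for r in permutations("abc"):
    print(" > ".join(r), round(pl_prob(r, lam), 4))
# e.g. c > b > a: 5/10 × 4/5 = 0.4; a > b > c: 1/10 × 4/9 ≈ 0.0444
```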

SLIDE 14

RUM with normal distributions

  • μi’s are normal distributions
    – Thurstone’s Case V [Thurstone 27]
  • Pros:
    – Intuitive
    – Flexible
  • Cons: believed to be computationally intractable
    – No analytical solution for the likelihood function Pr(P | Θ) is known (see the numerical sketch below)

Pr(c1 ≻ … ≻ cm | Θ) = ∫_{−∞}^{∞} μm(Um) ∫_{Um}^{∞} μm−1(Um−1) ⋯ ∫_{U2}^{∞} μ1(U1) dU1 ⋯ dUm−1 dUm

(Um ranges from −∞ to ∞; Um−1 from Um to ∞; …; U1 from U2 to ∞)
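
Since the nested integral has no closed form, one standard workaround is to estimate it by simulation. A naive Monte Carlo sketch (hypothetical code, not the method the slides use for fitting):

```python
import random

def ranking_prob_mc(means, order, sigma=1.0, samples=200_000):
    """Monte Carlo estimate of Pr(c_order[0] ≻ … ≻ c_order[-1] | Θ):
    sample Ui ~ N(means[i], sigma) and count how often the utilities
    come out in the target order."""
    hits = 0
    for _ in range(samples):
        u = [random.gauss(m, sigma) for m in means]
        if all(u[order[k]] > u[order[k + 1]] for k in range(len(order) - 1)):
            hits += 1
    return hits / samples

# With means (2, 1, 0), the ranking c1 ≻ c2 ≻ c3 should be the most likely.
print(ranking_prob_mc(means=[2.0, 1.0, 0.0], order=[0, 1, 2]))
```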

SLIDE 15

Model selection

  • Compare RUMs with normal distributions and P-L for (computed as in the sketch below)
    – log-likelihood: log Pr(D|Θ)
    – predictive log-likelihood: E log Pr(Dtest|Θ)
    – Akaike information criterion (AIC): 2k − 2 log Pr(D|Θ)
    – Bayesian information criterion (BIC): k log n − 2 log Pr(D|Θ)
  • Tested on an election dataset
    – 9 alternatives, randomly chosen 50 voters

[Table: Value(Normal) vs. Value(PL) under LL, Pred. LL, AIC, and BIC; reported entries include 44.8(15.8), 87.4(30.5), 79.6(31.6), 50.5(31.6)]

  • Red: statistically significant with 95% confidence
  • Project: model fitness for election data
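
A minimal sketch of the two information criteria from the slide (the numbers in the usage example are purely illustrative, not from the dataset above):

```python
import math

def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 log Pr(D|Θ); lower is better."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian information criterion: k log n - 2 log Pr(D|Θ); lower is better."""
    return k * math.log(n) - 2 * log_lik

# Illustrative numbers only: a fit with log-likelihood -120,
# k = 9 parameters, n = 50 voters (matching the slide's dataset sizes).
print(aic(-120.0, k=9), bic(-120.0, k=9, n=50))
```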

SLIDE 16

Decision making

SLIDE 17

Maximum likelihood estimators (MLE)

  • For any profile P = (V1,…,Vn),
    – The likelihood of θ is L(θ, P) = Prθ(P) = ∏_{V∈P} Prθ(V)
    – The MLE mechanism MLE(P) = argmaxθ L(θ, P) (see the sketch below)
    – Decision space = Parameter space

[Figure: model Mr, in which the “ground truth” θ generates votes V1, V2, …, Vn]
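
For a finite parameter space, the MLE mechanism can be computed by direct enumeration. A hypothetical sketch (function names are mine), using the two-alternative Condorcet model with p = 0.6 as the per-vote probability:

```python
from math import prod

def mle(profile, theta_space, pr):
    """MLE(P) = argmax over θ of the likelihood ∏_{V∈P} Pr_θ(V);
    pr(theta, v) is the model's per-vote probability Pr_θ(V)."""
    return max(theta_space, key=lambda t: prod(pr(t, v) for v in profile))

# Two-alternative Condorcet model with p = 0.6 (as on the following slides):
pr = lambda theta, v: 0.6 if v == theta else 0.4
print(mle(["O"] * 10 + ["M"] * 8, ["O", "M"], pr))  # -> 'O'
```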

SLIDE 18

Bayesian approach

  • Given a profile P = (V1,…,Vn), and a prior distribution 𝜌 over Θ
  • Step 1: calculate the posterior probability over Θ using Bayes’ rule
    – Pr(θ|P) ∝ 𝜌(θ) Prθ(P)
  • Step 2: make a decision based on the posterior distribution
    – Maximum a posteriori (MAP) estimation
    – MAP(P) = argmaxθ Pr(θ|P)
    – Technically equivalent to MLE when 𝜌 is uniform

SLIDE 19

Example

  • Θ = {O, M}
  • S = {O, M}^n
  • Probability distributions: Pr(O | O) = Pr(M | M) = 0.6
  • Data P = {10@O + 8@M}
  • MLE
    – L(O) = PrO(O)^10 PrO(M)^8 = 0.6^10 × 0.4^8
    – L(M) = PrM(O)^10 PrM(M)^8 = 0.4^10 × 0.6^8
    – L(O) > L(M), O wins
  • MAP: prior O: 0.2, M: 0.8
    – Pr(O|P) ∝ 0.2 L(O) = 0.2 × 0.6^10 × 0.4^8
    – Pr(M|P) ∝ 0.8 L(M) = 0.8 × 0.4^10 × 0.6^8
    – Pr(M|P) > Pr(O|P), M wins (verified in the sketch below)
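
A two-line check of the slide’s arithmetic: the likelihood ratio is L(O)/L(M) = (0.6/0.4)^10 × (0.4/0.6)^8 = 1.5^2 = 2.25, so MLE picks O, but the prior odds of 0.2 : 0.8 flip the MAP decision to M.

```python
L_O = 0.6**10 * 0.4**8        # likelihood of ground truth O
L_M = 0.4**10 * 0.6**8        # likelihood of ground truth M
print(L_O / L_M)              # 2.25 > 1: MLE picks O
print(0.2 * L_O > 0.8 * L_M)  # False: with prior (0.2, 0.8), MAP picks M
```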

SLIDE 20

Decision making under uncertainty

Credit: Panos Ipeirotis & Roy Radner

  • You have a biased coin: heads w/p p
    – You observe 10 heads, 4 tails
    – Do you think the next two tosses will be two heads in a row? (both answers computed in the sketch below)
  • MLE-based approach
    – there is an unknown but fixed ground truth
    – p = 10/14 = 0.714
    – Pr(2 heads | p = 0.714) = (0.714)^2 = 0.51 > 0.5
    – Yes!
  • Bayesian approach
    – the ground truth is captured by a belief distribution
    – Compute Pr(p | Data) assuming a uniform prior
    – Compute Pr(2 heads | Data) = 0.485 < 0.5
    – No!
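
The Bayesian number comes out in closed form: under a uniform prior, the posterior after 10 heads and 4 tails is Beta(11, 5), and Pr(2 heads | Data) = E[p²] = 11·12/(16·17) ≈ 0.485, matching the slide. A sketch of both computations:

```python
# Uniform prior = Beta(1, 1); after 10 heads and 4 tails the posterior
# over p is Beta(11, 5).
alpha, beta = 1 + 10, 1 + 4

# MLE answer: plug in the point estimate p = 10/14.
p_hat = 10 / 14
print(p_hat ** 2)  # ≈ 0.510 > 0.5, so "yes"

# Bayesian answer: Pr(2 heads | Data) = E[p^2] under Beta(11, 5),
# i.e. alpha*(alpha+1) / ((alpha+beta)*(alpha+beta+1)).
e_p2 = alpha * (alpha + 1) / ((alpha + beta) * (alpha + beta + 1))
print(e_p2)        # ≈ 0.485 < 0.5, so "no"
```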

SLIDE 21

Statistical decision theory

  • Given
    – statistical model: Θ, S, Prθ(s)
    – decision space: D
    – loss function: L(θ, d) ∈ ℝ
  • Make a good decision based on data
    – decision function f: data ⟶ D
    – Bayesian expected loss (see the sketch below):
      • ELB(data, d) = Eθ|data L(θ, d)
    – Frequentist expected loss:
      • ELF(θ, f) = Edata|θ L(θ, f(data))
      • Evaluated w.r.t. the objective ground truth
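
A minimal sketch of the two expected losses for finite spaces (hypothetical code; function names and the 0-1 loss are mine, chosen to connect with the O-vs-M example above):

```python
def bayesian_el(posterior, d, loss):
    """EL_B(data, d) = E_{θ|data} L(θ, d); posterior given as {θ: Pr(θ|data)}."""
    return sum(p * loss(theta, d) for theta, p in posterior.items())

def frequentist_el(theta, f, data_dist, loss):
    """EL_F(θ, f) = E_{data|θ} L(θ, f(data)); data_dist as {data: Pr(data|θ)}."""
    return sum(p * loss(theta, f(data)) for data, p in data_dist.items())

# 0-1 loss: a decision is penalized iff it differs from the ground truth.
zero_one = lambda theta, d: 0.0 if theta == d else 1.0
print(bayesian_el({"O": 0.36, "M": 0.64}, "M", zero_one))  # 0.36
```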

SLIDE 22

Top 250 movies

  • A “complex voter weighting system”
    – Claimed to be accurate
  • A “true Bayesian estimate”
    – Claimed to be fair
SLIDE 23

Different Voice

  • Q: “This is unfair!”
    – “That film / show has received awards, great reviews, commendations and deserves a much higher vote!”
  • IMDb: “…only votes cast by IMDb users are counted. We do not delete or alter individual votes”

IMDb Votes/Ratings Top Frequently Asked Questions: http://www.imdb.com/help/show_leaf?votestopfaq

SLIDE 24

Fairness of Bayesian estimators

  • Theorem (strict Condorcet): no Bayesian estimator satisfies the strict Condorcet criterion
  • Theorem (neutrality): neutral Bayesian estimators = Bayesian estimators of “neutral” models