Can Who-Edits-What Predict Edit Survival?



slide-1
SLIDE 1

Can Who-Edits-What Predict Edit Survival?

Batuhan Yardım, Victor Kristof, Lucas Maystre, Matthias Grossglauser

Information and Network Dynamics Lab (indy.epfl.ch) — August 23, 2018 — KDD18 – London

slide-2
SLIDE 2

Peer-production systems


Emergence of self-organizing, crowd-sourced projects online. Distributed vs. centralized production.

slide-3
SLIDE 3
slide-4
SLIDE 4
slide-5
SLIDE 5
slide-6
SLIDE 6

Problem

Projects are victims of their own success: problems arise with increasing scale.

[Figure: two incoming edits to the article « Alan Turing », « Blah blih bluh!@!? » and « Alan Turing was an English computer scientist… », whose quality is unknown (???).]

  • Predict the quality of contributions.
  • Help project maintainers in their work.
  • Help users match their interests.

slide-7
SLIDE 7

Typical approaches

Approach                       Complexity  Scope        Accuracy
User reputation systems        Simple      General      Not accurate
Highly specialized predictors  Complex     Specialized  Accurate
INTERANK                       Simple      General      Accurate

Example features used by specialized predictors: #words, timestamp, user IP.

slide-8
SLIDE 8

  • Model: INTERANK
  • Experiment: Wikipedia
  • Experiment: Linux

slide-9
SLIDE 9

  • Model: INTERANK
  • Experiment: Wikipedia
  • Experiment: Linux

slide-10
SLIDE 10

INTERANK: basic variant

Model the probability $p_{ui}$ that an edit made by user $u$ on item $i$ is successful as a game between user $u$ and item $i$ (inspired by Bradley-Terry models):

$$p_{ui} = \frac{1}{1 + \exp[-(s_u - d_i + b)]}, \qquad s_u, d_i, b \in \mathbb{R}$$

Here $s_u$ is the skill of user $u$, $d_i$ is the difficulty of item $i$, and $b$ is a bias.

Informally:

  • Skill quantifies the ability of a user to make a contribution.
  • Difficulty quantifies how « resistant » to contributions a particular item is.

If $s_u$ increases, $p_{ui}$ increases. If $d_i$ increases, $p_{ui}$ decreases.
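The monotonicity statements for the basic variant can be checked numerically. The following is a minimal sketch rather than the authors' implementation; the function name `p_basic` and the parameter values are hypothetical and chosen only for illustration.

```python
import math

def p_basic(s_u: float, d_i: float, b: float = 0.0) -> float:
    """Probability that an edit by user u on item i survives (basic variant)."""
    return 1.0 / (1.0 + math.exp(-(s_u - d_i + b)))

# Hypothetical parameter values, not taken from the slides.
p_even = p_basic(s_u=0.0, d_i=0.0)     # skill equals difficulty, no bias
p_skilled = p_basic(s_u=2.0, d_i=0.0)  # same item, more skilled user
p_hard = p_basic(s_u=0.0, d_i=2.0)     # same user, more difficult item

print(p_even, p_skilled, p_hard)
```

Raising the skill pushes the probability above the balanced case, and raising the difficulty pushes it below, as the slide states.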

slide-11
SLIDE 11

INTERANK: full variant

The basic variant is too simplistic: if user $u$ is more skilled than user $v$, then $p_{ui} > p_{vi}$ for all items $i$. We need to capture the interactions between users and items.

$$p_{ui} = \frac{1}{1 + \exp[-(s_u - d_i + x_u^\top y_i + b)]}, \qquad x_u, y_i \in \mathbb{R}^D$$

Here $x_u$ is the embedding of user $u$, $y_i$ is the embedding of item $i$, and $D$ is the dimension of the latent space.

Informally:

  • $x_u$ describes the set of skills displayed by user $u$.
  • $y_i$ describes the set of skills needed to edit item $i$.

If $x_u$ and $y_i$ are close, $p_{ui}$ increases.
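A small numerical sketch shows how the interaction term lets the ordering of two users differ from one item to the next, which the basic variant cannot express. The users, items, and all parameter values below are hypothetical, with $D = 2$ assumed for illustration.

```python
import math

def p_full(s_u, d_i, x_u, y_i, b=0.0):
    """Full variant: skill, difficulty, and the inner product of the embeddings."""
    interaction = sum(x * y for x, y in zip(x_u, y_i))
    return 1.0 / (1.0 + math.exp(-(s_u - d_i + interaction + b)))

# Hypothetical users: u has the higher skill s, v has a different specialty.
s_u, x_u = 1.0, [1.0, 0.0]
s_v, x_v = 0.5, [0.0, 2.0]

# Hypothetical items with equal difficulty but different required skills.
d_i, y_i = 0.0, [1.0, 0.0]
d_j, y_j = 0.0, [0.0, 1.0]

p_ui = p_full(s_u, d_i, x_u, y_i)
p_vi = p_full(s_v, d_i, x_v, y_i)
p_uj = p_full(s_u, d_j, x_u, y_j)
p_vj = p_full(s_v, d_j, x_v, y_j)

print(p_ui > p_vi)  # u is the better editor of item i
print(p_vj > p_uj)  # v is the better editor of item j
```

User u has the larger skill, yet user v is more likely to succeed on item j because v's embedding aligns with that item's embedding.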

slide-12
SLIDE 12

INTERANK: learning

A dataset $\mathcal{D}$ of $K$ observations consists of triplets $(u_k, i_k, q_k)$, $k = 1, \ldots, K$. The outcome $q_k \in \{0, 1\}$ encodes whether the edit by user $u_k$ on item $i_k$ survives.

Negative log-likelihood:

$$-\ell(\theta; \mathcal{D}) = \sum_{(u, i, q) \in \mathcal{D}} \left[ -q \log p_{ui} - (1 - q) \log(1 - p_{ui}) \right]$$

Parameters:

  • basic: $\theta = [s_1, \ldots, s_N, d_1, \ldots, d_M]$. The negative log-likelihood is convex.
  • full: $\theta = [s_1, \ldots, s_N, d_1, \ldots, d_M, \{x_{u1}, \ldots, x_{uD}\}_{u=1}^{N}, \{y_{i1}, \ldots, y_{iD}\}_{i=1}^{M}]$. The bilinear term breaks convexity.

In practice:

  • We do not observe any convergence issues.
  • We reliably find good model parameters using Stochastic Gradient Descent.
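The learning procedure can be sketched as plain stochastic gradient descent on the negative log-likelihood of the full variant. This is an illustrative sketch, not the authors' code: the learning rate, number of epochs, initialization, the toy dataset, and the choice to learn the bias $b$ alongside the other parameters are all assumptions.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(params, u, i):
    """s_u - d_i + x_u^T y_i + b for one (user, item) pair."""
    s, d, x, y, b = params
    return s[u] - d[i] + sum(a * c for a, c in zip(x[u], y[i])) + b

def neg_log_likelihood(params, data, eps=1e-12):
    """Sum of -q log p - (1 - q) log(1 - p) over all observations."""
    total = 0.0
    for u, i, q in data:
        p = min(max(sigmoid(score(params, u, i)), eps), 1.0 - eps)
        total += -q * math.log(p) - (1 - q) * math.log(1.0 - p)
    return total

def train(data, n_users, n_items, dim=2, lr=0.05, epochs=50, seed=0):
    """SGD over triplets (u, i, q); hyperparameters are assumed values."""
    rng = random.Random(seed)
    s = [0.0] * n_users
    d = [0.0] * n_items
    # Small random embeddings break the symmetry of the bilinear term.
    x = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_users)]
    y = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_items)]
    b = 0.0
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, q in data:
            # Derivative of the per-observation loss w.r.t. the score is p - q.
            g = sigmoid(score((s, d, x, y, b), u, i)) - q
            s[u] -= lr * g
            d[i] += lr * g  # the score depends on -d_i
            b -= lr * g
            for k in range(dim):
                xu_k, yi_k = x[u][k], y[i][k]
                x[u][k] -= lr * g * yi_k
                y[i][k] -= lr * g * xu_k
    return s, d, x, y, b

# Toy data: each user succeeds on one item only, a pattern that
# skill and difficulty alone cannot explain.
data = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)] * 25
initial_nll = neg_log_likelihood(train(data, 2, 2, epochs=0), data)
final_nll = neg_log_likelihood(train(data, 2, 2, epochs=50), data)
print(initial_nll, final_nll)
```

Running `train` with `epochs=0` returns the initial parameters, so the two printed values compare the objective before and after training.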

slide-13
SLIDE 13

  • Model: INTERANK
  • Experiment: Wikipedia
  • Experiment: Linux

slide-14
SLIDE 14

Wikipedia

Edition  # users  # articles  # edits
French   5.5M     1.9M        65M
Turkish  1.4M     0.3M        8.8M

Competing approaches:

  • Average (naive predictor): $p = \dfrac{\#\,\text{good edits}}{\#\,\text{total edits}}$
  • User-only (reputation system) [Adler & de Alfaro, 2007]: $p_u = \dfrac{1}{1 + \exp[-(s_u + b)]}$
  • GLAD [Whitehill et al., 2009]: $p_{ui} = \dfrac{1}{1 + \exp[-(s_u / d_i + b)]}$
  • ORES (specialized predictor) [Halfaker & Taraborelli, 2015]: uses over 80 content-based and system-based features. Different for Turkish and French.

INTERANK is compared against these approaches.
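The three formula-based competing approaches are simple enough to write down directly. The sketch below assumes that Average predicts the fraction of good edits among all edits, and it uses hypothetical function names and inputs; ORES is omitted because it depends on many hand-crafted features.

```python
import math

def p_average(outcomes):
    """Naive predictor: fraction of good edits among all observed edits."""
    return sum(outcomes) / len(outcomes)

def p_user_only(s_u, b=0.0):
    """User-only predictor: depends on the user's skill and a bias only."""
    return 1.0 / (1.0 + math.exp(-(s_u + b)))

def p_glad(s_u, d_i, b=0.0):
    """GLAD-style predictor as written on the slide; d_i must be nonzero."""
    return 1.0 / (1.0 + math.exp(-(s_u / d_i + b)))

# Hypothetical inputs for illustration.
avg = p_average([1, 1, 0, 1])      # three good edits out of four
neutral = p_user_only(0.0)         # zero skill, zero bias
glad = p_glad(s_u=1.0, d_i=2.0)    # skill scaled down by difficulty

print(avg, neutral, glad)
```

Unlike INTERANK, the first two predictors ignore the item entirely, and GLAD combines skill and difficulty through a ratio rather than a difference.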

slide-15
SLIDE 15

Wikipedia: results

  • ORES has the best AUPRC and INTERANK full has the best log-likelihood.
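For readers unfamiliar with the two metrics, the following sketch computes an average log-likelihood and average precision, one common estimate of the area under the precision-recall curve. The exact evaluation protocol of the talk (for example, which class counts as positive or how ties are handled) is not given on the slide, so the definitions and data here are assumptions.

```python
import math

def avg_log_likelihood(y_true, y_prob, eps=1e-12):
    """Mean of q log p + (1 - q) log(1 - p); higher is better."""
    total = 0.0
    for q, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += q * math.log(p) + (1 - q) * math.log(1.0 - p)
    return total / len(y_true)

def average_precision(y_true, y_prob):
    """Average of the precision at the rank of each positive example.

    Assumes at least one positive example; ties are broken by input order.
    """
    order = sorted(range(len(y_prob)), key=lambda k: -y_prob[k])
    hits, total = 0, 0.0
    for rank, k in enumerate(order, start=1):
        if y_true[k] == 1:
            hits += 1
            total += hits / rank
    return total / hits

# Hypothetical labels and predicted probabilities.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.8, 0.7, 0.1]
ap = average_precision(y_true, y_prob)
ll = avg_log_likelihood(y_true, y_prob)
print(ap, ll)
```

Average precision depends only on how the predictions rank the edits, whereas log-likelihood also rewards well-calibrated probabilities, so two models can each win on one metric.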
slide-16
SLIDE 16

Wikipedia: difficulty parameter di

Compare:

  • Manual ranking of controversial articles [Yasseri et al., 2014]
  • Ranking of difficulty parameter $d_i$ as learned by INTERANK

Rank  Title                        Percentile of d_i
1     Ségolène Royal               99.840 %
2     Unidentified flying object   99.229 %
3     Jehovah’s Witnesses          99.709 %
4     Jesus                        99.953 %
5     Sigmund Freud                97.841 %
6     September 11 attacks         99.681 %
7     Muhammad al-Durrah incident  99.806 %
8     Islamophobia                 99.787 %
9     God in Christianity          99.712 %
10    Nuclear power debate         99.304 %
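A percentile of $d_i$ can be read as the share of articles whose learned difficulty does not exceed that of the article in question. The sketch below uses that definition as an assumption (the slide does not say whether the comparison is strict), and the article names and difficulty values are hypothetical.

```python
def percentile_rank(values, v):
    """Percentage of entries in `values` that are less than or equal to v."""
    return 100.0 * sum(1 for w in values if w <= v) / len(values)

# Hypothetical learned difficulties for four articles.
difficulty = {"A": -1.0, "B": 0.0, "C": 0.5, "D": 2.0}
all_values = list(difficulty.values())

for title, d_i in difficulty.items():
    print(title, percentile_rank(all_values, d_i))
```

Under this reading, a percentile above 99 % means the article is among the hardest 1 % of articles to edit successfully.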

slide-17
SLIDE 17

Wikipedia: latent factors

Clusters of articles visible in the latent space:

  • TV & teen culture
  • French municipality
  • Tennis-related: Justine Henin, Julie Halard, Virginia Wade, Marcelo Melo, …
  • Other: William Shakespeare, M. de Robespierre, Nelson Mandela, Charlemagne, …

Articles at the two extremes (highest and lowest) of a latent dimension of $Y = [y_i]$:

High culture articles       Popular culture articles
Seven Wonders of the World  Harry Potter’s magic list
Thomas Edison               List of programs broadcasted by Star TV
Cell                        Bursaspor 2011-12 season
Mustafa Kemal Atatürk       Kral Pop TV Top 20
Albert Einstein             Death Eater
Democracy                   Heroes (TV series)
Isaac Newton                List of programs broadcasted by TV8
Mehmed the Conqueror        Karadayı
Leonardo da Vinci           Show TV
Louis Pasteur               List of episodes of Kurtlar Vadisi Pusu

slide-18
SLIDE 18

  • Model: INTERANK
  • Experiment: Wikipedia
  • Experiment: Linux

slide-19
SLIDE 19

Linux

Dataset from [Jiang et al., 2013].

  • Developers submit patches to subsystems.
  • A patch is accepted if it makes it into a Linux release.
  • Specialized classifier: random forest using 21 features.

# developers  # subsystems  # patches  % accepted
9 672         394           619 419    34.12 %

slide-20
SLIDE 20

Linux: difficulty parameter

Core components:

Difficulty  Subsystem          % accepted
+2.66       usr                 1.9 %
+1.33       include             7.8 %
+1.04       lib                16.0 %
+1.01       drivers/clk        34.3 %
+0.87       include/trace      17.7 %

Peripheral components:

Difficulty  Subsystem          % accepted
-0.80       arch/mn10300       45.4 %
-0.94       net/nfc            73.0 %
-0.99       drivers/ps3        44.3 %
-1.08       net/tipc           43.1 %
-1.19       drivers/addi-data  78.3 %

« Higher number of commits leads to lower acceptance rate. » [Jiang et al., 2013]

  • Avg. number of commits in first quartile = 687
  • Avg. number of commits in last quartile = 833

slide-21
SLIDE 21

Conclusion

  • INTERANK provides a new point in the solution space.
  • It yields insights into collaborative projects.
  • It is easy to implement and computationally inexpensive.

[Figure: solution space with axes Generality and Accuracy, locating INTERANK, reputation systems, and specialized predictors.]

slide-22
SLIDE 22

Can who-edits-what predict edit survival?

YES!

slide-23
SLIDE 23

Thank you!

/lca4/interank