Generalists and Specialists: Using Community Embeddings to Quantify Activity Diversity in Online Platforms

Isaac Waller (walleris@cs.toronto.edu) and Ashton Anderson (ashton@cs.toronto.edu), University of Toronto. The Web Conference 2019.


  1. Generalists and Specialists: Using Community Embeddings to Quantify Activity Diversity in Online Platforms. Isaac Waller (walleris@cs.toronto.edu), Ashton Anderson (ashton@cs.toronto.edu), University of Toronto. The Web Conference 2019.

  2. Generalists and specialists: full-stack developer vs. React developer; family doctor vs. neurosurgeon; generalist vs. specialist.

  4. Generalists and specialists: vulture (generalist) vs. koala (specialist). Koala photo by DAVID ILIFF, license: CC-BY-SA 3.0. Vulture photo by Charles Sharp, license: CC-BY-SA 4.0.

  5. Reddit communities: Games, MakeupAddiction, medicalschool, soccer, math, programming, Cartalk, chromeos, Construction, funny, television, Aquariums.

  7. Which is the specialist? User 1: C = {China, nba, Buddhism, startrek}. User 2: C = {Fitness, powerlifting, bodybuilding, weightroom}. GS(C) = ?

  8. Word2vec [1]. [1] Mikolov et al. (2013), Distributed Representations of Words and Phrases and their Compositionality.

  9. Word2vec for communities [2, 3]. Input: a (community, user) pair for each comment made in a community, e.g. (Games, user1), (Fitness, user3), (medicalschool, user2), (China, user4), (Science, user2), (weightlifting, user3). Output: a vector for each community in the input, where communities with high user overlap are closer to each other. [2] Kumar et al. (2018), Community Interaction and Conflict on the Web. [3] Martin (2017), community2vec: Vector representations of online communities encode semantic relationships.
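
The intuition that communities with high user overlap end up close together can be illustrated without training anything. The sketch below is not word2vec and is not the authors' pipeline: it is a count-based stand-in that represents each community by its per-user comment counts and compares communities by cosine similarity. The function names `community_user_counts` and `cosine` are hypothetical; the input pairs are the ones shown on the slide.

```python
from collections import defaultdict
from math import sqrt

# (community, user) pairs, one per comment, as listed on the slide.
pairs = [
    ("Games", "user1"), ("Fitness", "user3"), ("medicalschool", "user2"),
    ("China", "user4"), ("Science", "user2"), ("weightlifting", "user3"),
]

def community_user_counts(pairs):
    """Map each community to a sparse vector of per-user comment counts."""
    counts = defaultdict(lambda: defaultdict(int))
    for community, user in pairs:
        counts[community][user] += 1
    return {c: dict(users) for c, users in counts.items()}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[user] for user, weight in u.items() if user in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

counts = community_user_counts(pairs)
print(cosine(counts["Fitness"], counts["weightlifting"]))  # shared user3 -> 1.0
print(cosine(counts["Fitness"], counts["China"]))          # no shared users -> 0.0
```

A trained embedding replaces these sparse count vectors with dense, low-dimensional ones, but the closeness criterion it is meant to capture is the same.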

  11. A first embedding

  13. Word analogies: verb tense; male to female.

  14. Community analogies: sports team to sport / city; university to city.

  15. Community analogies, 4,392 analogies total. University to city: csun → LosAngeles, FLC → folsom, OxfordBrookes → oxford, nus → singapore, brocku → stcatharinesON, uakron → akron, UMT → missoula, PolkStateCollege → WinterHaven. Sports team to city: Coyotes → phoenix, phillies → philadelphia, Torontobluejays → toronto, oaklandraiders → oakland, Colts → indianapolis. Sports team to sport: angelsbaseball → baseball, LAClippers → nba. AnaheimDucks also appears among the examples.
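
An analogy such as "phillies is to philadelphia as Torontobluejays is to ?" is usually answered in word2vec-style embeddings by vector arithmetic: take b - a + c and look for the nearest remaining vector. The sketch below assumes that standard procedure; the slides do not spell out how the analogies were scored. The three-dimensional vectors are invented for illustration and `solve_analogy` is a hypothetical helper.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two dense vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def solve_analogy(emb, a, b, c, top_k=1):
    """Answer 'a is to b as c is to ?' by ranking candidates near b - a + c."""
    target = [bi - ai + ci for ai, bi, ci in zip(emb[a], emb[b], emb[c])]
    # The three query communities are excluded from the candidate answers.
    candidates = [name for name in emb if name not in (a, b, c)]
    ranked = sorted(candidates, key=lambda n: cosine(emb[n], target), reverse=True)
    return ranked[:top_k]

# Invented toy vectors: [team-ness, Philadelphia-ness, Toronto-ness].
toy = {
    "phillies": [1.0, 1.0, 0.0],
    "philadelphia": [0.0, 1.0, 0.0],
    "Torontobluejays": [1.0, 0.0, 1.0],
    "toronto": [0.0, 0.0, 1.0],
    "oakland": [0.0, 0.6, 0.8],
}

print(solve_analogy(toy, "phillies", "philadelphia", "Torontobluejays"))
```

Under this reading, "perfect" would mean the expected community is ranked first (`top_k=1`) and "top 5" that it appears among the first five (`top_k=5`).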

  16. Hyperparameter search: 72% perfect, 93% top 5. Communities shown: running, swimming, cycling, triathalon. [Figure: two heatmaps over alpha (0.16 to 0.22) and size (100 to 200); color scale from 23.81% to 93.30%.]

  18. Hyperparameter search: 72% perfect, 93% top 5. cycling + swimming + running = triathalon. [Figure: two heatmaps over alpha (0.16 to 0.22) and size (100 to 200); color scale from 23.81% to 93.30%.]
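
A search over alpha and size like the one plotted can be organised as a plain grid search. The sketch below shows only the search loop. The grid values match the axis ticks on the slide, but `toy_score` is a made-up stand-in for "train an embedding and measure analogy accuracy"; it does not reproduce any result from the talk, and `grid_search` is a hypothetical helper.

```python
from itertools import product

def grid_search(alphas, sizes, score):
    """Score every (alpha, size) pair and return the best pair plus all scores."""
    results = {(a, s): score(a, s) for a, s in product(alphas, sizes)}
    best = max(results, key=results.get)
    return best, results

def toy_score(alpha, size):
    # Made-up objective peaking at alpha=0.18, size=160, for illustration only.
    # A real run would train an embedding here and return its analogy accuracy.
    return -((alpha - 0.18) ** 2) - ((size - 160) / 1000) ** 2

alphas = [0.16, 0.18, 0.20, 0.22]
sizes = [100, 120, 140, 160, 180, 200]
best, results = grid_search(alphas, sizes, toy_score)
print(best, len(results))
```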

  19. Our better embedding

  20. Back to generalists and specialists. User 1: C = {China, nba, Buddhism, startrek}. User 2: C = {Fitness, powerlifting, bodybuilding, weightroom}. GS(C) = ?

  21. GS-score: a scale running from generalist to specialist.

  22. GS-score, from generalist to specialist: $\mathrm{GS}(C) = \frac{1}{|C|} \sum_{c \in C} w_c \cos(c, \mu)$

  23. GS-score: $\mathrm{GS}(C) = \frac{1}{|C|} \sum_{c \in C} w_c \cos(c, \mu)$. User 1: GS({China, nba, Buddhism, startrek}) = 0.69, 24th percentile. User 2: GS({Fitness, powerlifting, bodybuilding, weightroom}) = 0.89, 72nd percentile.
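
The GS-score formula can be computed directly once each community has a vector. The sketch below follows the formula as written on the slide, under two assumptions the slides do not state: that µ is the weighted center of mass of the user's community vectors, and that the weights w_c default to 1 (how they are normalised is not given here). The 2-D vectors are invented and `gs_score` is a hypothetical name.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two dense vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def gs_score(vectors, weights=None):
    """GS(C) = (1/|C|) * sum over c of w_c * cos(c, mu), as on the slide."""
    if weights is None:
        weights = [1.0] * len(vectors)  # assumption: unit weights
    dim = len(vectors[0])
    # Assumption: mu is the weighted sum of the community vectors. Only its
    # direction matters, because cosine similarity ignores vector length.
    mu = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return sum(w * cosine(v, mu) for w, v in zip(weights, vectors)) / len(vectors)

# Invented toy vectors: tightly clustered versus spread out.
specialist = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1]]
generalist = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]

print(gs_score(specialist) > gs_score(generalist))  # True
```

With unit weights the score is the mean cosine similarity to the center of mass, so a user whose communities all point in the same direction scores close to 1.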

  24. Data. Reddit: all comments in 2017; 900M comments, 11.4M distinct users; top 10,000 subreddits by activity. GitHub: all commits, pull requests, forks, watches, and stars in 2017; 413M actions, 8.3M distinct users; top 40,000 repos by number of stars. Sources: pushshift.io, gharchive.org.

  25. Results: Reddit (left) and GitHub (right). [Figure: two histograms; y-axis: Frequency; x-axis from 0.6 to 1.0.]

  26. Results: Specialists stay engaged with communities longer, but generalists stay engaged with the platform longer. [Figure: P(stay for >= 6 months) against user's GS-score.]

  27. Results: Specialists stay engaged with communities longer, but generalists stay engaged with the platform longer. [Figure: P(remaining on platform) against activity (# of comments), one curve per quartile, 1st to 4th.]

  28. Results: Specialists stay engaged with communities longer, but generalists stay engaged with the platform longer.

  29. Results: On Reddit, specialists tend to make more exceptional comments. [Figure: P(score > parent) against percentile of author GS-score.]

  30. Results: But generalists are exposed to a more diverse set of users. [Figure: parent-universe GS-score against user's GS-score.]

  31. Results: Can GS-score predict new communities a user joins? [Figure: mean average precision against user GS-score percentile, for four methods: Center-of-mass NN, Collaborative filtering, Popularity, Random.]
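
One plausible reading of "Center-of-mass NN" is a nearest-neighbour recommender: average the vectors of the communities a user is already active in, then rank the remaining communities by cosine similarity to that average. The slides do not define the method, so the sketch below is an assumption rather than the authors' implementation; the toy vectors are invented and `recommend` is a hypothetical helper.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two dense vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def center_of_mass(vectors):
    """Unweighted mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def recommend(emb, joined, top_k=3):
    """Rank communities the user has not joined by closeness to their center of mass."""
    mu = center_of_mass([emb[c] for c in joined])
    candidates = [c for c in emb if c not in joined]
    return sorted(candidates, key=lambda c: cosine(emb[c], mu), reverse=True)[:top_k]

# Invented 2-D toy embedding.
toy = {
    "Fitness": [1.0, 0.1],
    "powerlifting": [0.9, 0.2],
    "bodybuilding": [0.95, 0.15],
    "China": [0.1, 1.0],
    "startrek": [-0.5, 0.8],
}

print(recommend(toy, {"Fitness", "powerlifting"}))
```

For a specialist the center of mass sits inside a tight cluster, so nearby communities are natural candidates; for a generalist it may fall between clusters, which is one intuition for why prediction quality could vary with GS-score.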

  33. Community GS-scores

  34. Community GS-scores. [Figure: community GS-score by month, with panels spanning 2015 to 2018 and 2017-1 to 2017-11, one curve per quartile, 1st to 4th.]

  35. In summary: Users on Reddit and GitHub range from generalist to specialist. Specialists stay engaged with individual communities longer, but generalists stay engaged with the platform longer. On Reddit, specialists are more likely to make exceptional comments. Specialists are significantly more predictable than generalists.
