A Few Bad Votes Too Many? Towards Robust Ranking in Social Media
Jiang Bian (Georgia Tech), Yandong Liu (Emory University), Eugene Agichtein (Emory University), Hongyuan Zha (Georgia Tech)
Online Social Media
– Information need; share information
– User interactions: voting/rating the content
– The quality of the content in QA portals varies drastically [Agichtein et al. 2008]
– User votes can provide crucial indicators of the quality and reliability of the content
– User votes can help to improve the quality of ranking CQA content [Bian et al. 2008]
– Many “thumbs up” or “thumbs down” votes are generated without much thought
– In some cases, users intend to game the system by promoting specific answers for fun or profit
– We refer to those bad or fraudulent votes as vote spam
– The Yahoo! team semi-automatically removes some of the more obvious vote spam after the fact
– This is not adequate
specifics of media and topic
– A robust method to train a ranking function that remains resilient to evolving vote spam attacks
User Votes
<query, topic, response>
– Choose β% of threads to attack
– Choose the number of attackers based on N(µ, σ²) for each chosen thread
– Choose one response to promote for each chosen thread
– Thumbs-up vote spam: thumbs-up votes to the chosen response
– Thumbs-down vote spam: thumbs-up votes to the chosen response AND one thumbs-down vote to each of the others
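The steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Response` and `Thread` classes, the function name `inject_vote_spam`, rounding the sampled attacker count to a non-negative integer, and the reading that every attacker casts one vote per affected response are all illustrative choices that the slides do not spell out.

```python
import random
from dataclasses import dataclass


@dataclass
class Response:
    # Hypothetical container: vote counts for one response in a thread.
    thumbs_up: int = 0
    thumbs_down: int = 0


@dataclass
class Thread:
    # Hypothetical container: one question thread and its responses.
    responses: list


def inject_vote_spam(threads, beta=0.10, mu=3.0, sigma=1.0, attack="up", rng=None):
    """Pollute `threads` in place and return the list of attacked threads.

    beta      -- fraction of threads to attack
    mu, sigma -- parameters of the normal distribution for the attacker count
    attack    -- "up" (thumbs-up spam) or "down" (thumbs-down spam)
    """
    rng = rng or random.Random()
    candidates = [t for t in threads if t.responses]
    attacked = rng.sample(candidates, round(beta * len(candidates)))
    for thread in attacked:
        # Assumption: round the N(mu, sigma^2) draw and clip it at zero.
        n_attackers = max(0, round(rng.gauss(mu, sigma)))
        # Pick the single response that the attackers promote.
        target = rng.choice(thread.responses)
        for _ in range(n_attackers):
            target.thumbs_up += 1
            if attack == "down":
                # Assumption: each attacker also votes down every other response.
                for other in thread.responses:
                    if other is not target:
                        other.thumbs_down += 1
    return attacked
```

Calling `inject_vote_spam(threads, attack="down", rng=random.Random(0))` on a list of clean threads pollutes a reproducible subset, which makes it easy to compare a ranker trained on clean data with one trained on polluted data.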
– Promising performance
– User vote information contributes substantially to the high accuracy (no vote spam)
– Apply the general vote spam model to inject vote spam into unpolluted QA data
– Train the ranking function on the new polluted data
– Transfer more weight to other content and community interaction features
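The weight-transfer effect can be illustrated with a toy experiment. The sketch below is an assumption-laden stand-in, not the ranking function from the talk: it uses ordinary least squares on two synthetic features (one "content" feature and one "votes" feature, both noisy views of a hidden quality score), and it attacks each item independently with probability β rather than choosing exactly β% of threads. All function names are hypothetical.

```python
import random


def make_clean_rows(n, rng):
    """Synthetic (content, votes, label) rows; both features track hidden quality."""
    rows = []
    for _ in range(n):
        quality = rng.gauss(0, 1)
        content = quality + rng.gauss(0, 0.5)
        votes = quality + rng.gauss(0, 0.5)
        rows.append((content, votes, quality))
    return rows


def pollute_votes(rows, rng, beta=0.10, mu=3.0, sigma=1.0):
    """Add spam votes drawn from N(mu, sigma^2), clipped at zero, to about beta of the rows."""
    polluted = []
    for content, votes, label in rows:
        if rng.random() < beta:
            votes += max(0.0, rng.gauss(mu, sigma))
        polluted.append((content, votes, label))
    return polluted


def fit_weights(rows):
    """Least-squares weights (w_content, w_votes) with an implicit intercept."""
    n = len(rows)
    mc = sum(r[0] for r in rows) / n
    mv = sum(r[1] for r in rows) / n
    my = sum(r[2] for r in rows) / n
    scc = sum((r[0] - mc) ** 2 for r in rows)
    svv = sum((r[1] - mv) ** 2 for r in rows)
    scv = sum((r[0] - mc) * (r[1] - mv) for r in rows)
    scy = sum((r[0] - mc) * (r[2] - my) for r in rows)
    svy = sum((r[1] - mv) * (r[2] - my) for r in rows)
    # Solve the 2x2 normal equations in closed form.
    det = scc * svv - scv ** 2
    w_content = (scy * svv - svy * scv) / det
    w_votes = (svy * scc - scy * scv) / det
    return w_content, w_votes


rng = random.Random(0)
clean = make_clean_rows(2000, rng)
polluted = pollute_votes(clean, rng)
print("clean weights (content, votes):   ", fit_weights(clean))
print("polluted weights (content, votes):", fit_weights(polluted))
```

In this toy setting the model fitted on polluted data leans less on the vote feature and more on the content feature, which mirrors the intent of training on polluted data; it says nothing about the magnitude of the effect on real CQA data.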
[Diagram: content features, community interaction features, user votes; relevance, quality, preference]
question in the Yahoo! Answers archive
ranked questions according to Yahoo! Answers ranking
β = 10%; N(µ, σ²) = N(3, 1²)
(clean testing data)
Feature Name
– Number of answer terms
– Similarity between query and question + answer
– Number of thumbs-up votes
– Number of stars for the answerer
– Number of thumbs-down votes
– Length ratio between query and answer
– Number of resolved questions of the answerer
– Similarity between query and question