SLIDE 1

TELLING EXPERTS FROM SPAMMERS: EXPERTISE RANKING IN FOLKSONOMIES

Michael G. Noll, Ching-Man Au Yeung, Nicholas Gibbins, Christoph Meinel, Nigel Shadbolt (SIGIR’09) Presenter: Xiang Gao (Vincent)

SLIDE 2

Introduction

  • Collaborative tagging – organizing and sharing resources
  • Two common information needs:
    • Documents relevant to a specified domain
    • Other users who are experts in a specified domain
  • Existing systems only provide a list of resources or users
  • Challenges: large volume of data, spammers
  • SPEAR: our approach to assessing expertise
    • Able to detect the different types of experts
    • More resistant to spammers
SLIDE 3

Outline

  • Background
  • SPEAR algorithm
  • Experiments and Evaluation
  • Conclusions and Discussions
SLIDE 4

Collaborative Tagging

  • Allows users to assign tags to resources
  • User-generated classification scheme: a folksonomy
  • Definition of a folksonomy (a small example in code follows):
    • A folksonomy G is a tuple G = (V, U, E, S)
    • V: users, U: tags, E: documents
    • S = {(v, u, e) | v gives u to e} ⊆ V × U × E: the set of tag assignments
    • S_u = {(v, e) | (v, u, e) ∈ S}: the assignments involving a given tag u

    • V_u, E_u: the users and documents appearing in S_u
SLIDE 5

Related Work: HITS Algorithm

  • J. Kleinberg. Authoritative sources in a hyperlinked environment. J. ACM, 1999
  • Precursor to PageRank
  • Algorithm (a minimal sketch follows the list):
    • Start with each node having a hub score and authority score of 1
    • Run the Authority Update Rule: a node's authority is the sum of the hub scores of the nodes linking to it
    • Run the Hub Update Rule: a node's hub score is the sum of the authority scores of the nodes it links to
    • Normalize the hub and authority scores
    • Repeat as necessary
SLIDE 6

Expertise and document quality

  • Measuring expertise by the number of times a user tags documents
    • Used by many existing systems
    • Quantity does not imply quality – spammers
  • Measuring expertise by the ability to select the most relevant information
    • NOT enough alone to identify the experts
SLIDE 7

Discoverer vs. Follower

  • An expert recognizes useful documents BEFORE others do
  • An expert is a discoverer, rather than a follower
  • The earlier a user has tagged a document, the more likely he is an expert
  • Tagging time approximates how sensitive a user is to new information

SLIDE 8

Algorithm Design: Step 1

  • Implement the idea of document quality
  • Mutual reinforcement: expert users tag high-quality documents, and high-quality documents are tagged by expert users
  • Similar to HITS
SLIDE 9

Algorithm 1

  • Inputs
    • Number of users N
    • Number of documents O
    • Tagging records S_u = {(v, u, e)}
    • Number of iterations l
  • Output
    • A ranked list of users M
SLIDE 10

Algorithm 1 (cont.)

F ← (1, 1, …, 1) ∈ ℚ^N   (expertise scores, one per user)
R ← (1, 1, …, 1) ∈ ℚ^O   (quality scores, one per document)
B ← [b_{j,k}], where b_{j,k} = 1 if user j tagged document k, 0 otherwise
For i = 1 to l do
    F ← R × B^T
    R ← F × B
    Normalize F
    Normalize R
End for
M ← Sort users by expertise score in F
Return M

Similar to HITS: users play the role of hubs, documents the role of authorities (a runnable sketch follows).

SLIDE 11

Algorithm Design: Step 2

  • Implement the idea of discoverers and followers
  • Include timing information in the tagging records: S_u = {(v, u, e, t)}
  • Prepare the adjacency matrix in a different way (a sketch follows the list):
    • Before: b_{j,k} = 1 if user j tagged document k
    • Now: b_{j,k} = #followers + 1 if user j tagged document k
    • #followers = |{v | (v, u, e_k, t) ∈ S_u, t > t_j}|: the number of users who tagged e_k after user j did
    • The "+ 1" credit ensures that every tagger, even the very last one, receives some score

SLIDE 12

Algorithm 2

  • Inputs
    • Number of users N
    • Number of documents O
    • Tagging records with timestamps: S_u = {(v, u, e, t)}
    • Number of iterations l
  • Output
    • A ranked list of users M
SLIDE 13

Algorithm 2 (cont.)

F ← (1, 1, …, 1) ∈ ℚ^N
R ← (1, 1, …, 1) ∈ ℚ^O
B ← adjacency matrix generated from the follower counts (Step 2)
For i = 1 to l do
    F ← R × B^T
    R ← F × B
    Normalize F
    Normalize R
End for
M ← Sort users by expertise score in F
Return M

SLIDE 14

Algorithm Design: Step 3

[Figure: credit score as a function of #followers, comparing a linear and a concave ("convexed") credit scoring function]

  • With a linear credit function, the discoverer of a popular document will receive a very high score
    • Even if he discovered the document by accident
    • and made no other contribution
  • The scoring function D should therefore be increasing but flatten out: D′(y) > 0, D′′(y) ≤ 0
  • Here we use D(y) = √y
  • Before: B ← [b_{j,k}], b_{j,k} = #followers if …
  • Now: B ← [b_{j,k}], b_{j,k} = D(#followers) if … (a one-line sketch follows)

SLIDE 15

Final Algorithm: SPEAR

  • Inputs
    • Number of users N
    • Number of documents O
    • Tagging records with timestamps: S_u = {(v, u, e, t)}
    • Number of iterations l
  • Output
    • A ranked list of users M
SLIDE 16

Final Algorithm: SPEAR

F ← (1, 1, …, 1) ∈ ℚ^N
R ← (1, 1, …, 1) ∈ ℚ^O
B ← adjacency matrix generated from follower counts, with the credit scoring function D applied
For i = 1 to l do
    F ← R × B^T
    R ← F × B
    Normalize F
    Normalize R
End for
M ← Sort users by expertise score in F
Return M

SLIDE 17

Experiments

  • Challenge: No ground truth
  • We never know whether someone is ACTUALLY an expert
  • Use simulated experts and spammers, and inject them into real-world data

  • Compare with FREQ (ranking users by how many documents they tag) and HITS
SLIDE 18

Types of simulated experts

  • Veteran
    • Bookmarks significantly more documents than the average user
  • Newcomer
    • Only sometimes among the first to discover a document
  • Geek
    • Significantly more bookmarks than a veteran
  • Geek > Veteran > Newcomer
SLIDE 19

Types of simulated spammers

  • Flooder
    • Tags a huge number of documents
    • Usually one of the last users in the timeline
  • Promoter
    • Tags his own documents to promote their popularity
    • Does not care about other documents
  • Trojan
    • Mimics regular users
    • Shares some traits with a so-called slow-poisoning attack (a toy generator sketch follows the list)
SLIDE 20

Promoting Experts

SPEAR is able to detect the differences between the three types of experts

SLIDE 21

Demoting Spammers

  • Effectively demotes flooders and promoters
  • More resistant to Trojans than HITS and FREQ

SLIDE 22

Conclusions and Future Work

  • SPEAR is:
    • Better at distinguishing various kinds of experts
    • More resistant to different kinds of spammers
  • Future work:
    • Better credit scoring functions
    • Considering expertise in closely related tags
    • Taking the activity of users into account
SLIDE 23

Limitations

  • Validity of the simulated input
  • Data mining bias – the input is generated according to a known conclusion
  • No evaluation using real data
SLIDE 24

THANKS