
7/11/2014

Constructing Effective and Efficient Topic-Specific Authority Networks for Expert Finding in Social Media
Reyyan Yeniterzi & Jamie Callan
SoMeRA 2014

Social Media for Expert Search (slide 2)
• 72% of the companies use internal social media to find experts within the organization and improve collaboration
  • McKinsey Global Institute survey with >4200 companies
• 56% of the companies use social media for recruiting
  • SHRM 2011 survey on 'Social Networking Websites and Staffing'

Expert Retrieval Background (slide 3)
• Expert Finding Task
  • TREC Enterprise Track 2005-2008
  • W3C and CSIRO Collections
• State-of-the-art Approaches
  • Profile-based Models [Balog, 2006]
  • Document-based Models [Balog, 2006; Macdonald, 2006]
  • Graph-based Models [Serdyukov, 2008]
  • Learning-based Models [Fang, 2010]

Expert Retrieval in Social Media (slide 4)
• Is writing topic-specific content enough for being considered an expert?
• One also needs to have topic-specific influence over other users
  • authority estimation
  • user authority networks
  • reading, commenting or voting

Outline (slide 5)
• Authority-based approaches
  • PageRank [Brin and Page, 1998]
  • Topic-Sensitive PageRank [Haveliwala, 2002]
  • HITS [Kleinberg, 1999]
• Topic-Candidate Graphs
• Experiments
  • Finding topic-specific expert bloggers
• Conclusion

PageRank (PR) [Brin and Page, 1998] (slide 6)
• Graph
  • topic-independent
  • all users
  • all user activities over all documents
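
A minimal sketch of PageRank by power iteration over a user graph, for illustration only. The edge direction (from the user who reads or comments to the author), the damping factor 0.85, the iteration count, and the name `teleport_to` are assumptions, not details taken from the slides. The optional `teleport_to` set restricts teleportation to chosen users, which is the idea behind the Topic-Sensitive PageRank variant listed in the outline.

```python
def pagerank(edges, damping=0.85, iterations=50, teleport_to=None):
    """Power-iteration PageRank over a directed user graph.

    edges: iterable of (source, target) pairs (assumed: reader -> author).
    teleport_to: optional set of users that may receive teleportation
    mass; None means uniform teleportation to every user.
    """
    edges = list(edges)
    nodes = sorted({u for edge in edges for u in edge})
    out_links = {u: [] for u in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    # Teleportation distribution: uniform over the allowed targets.
    if teleport_to is None:
        targets = nodes
    else:
        targets = [u for u in nodes if u in teleport_to]
    if not targets:
        raise ValueError("no teleportation targets present in the graph")
    teleport = {u: 0.0 for u in nodes}
    for u in targets:
        teleport[u] = 1.0 / len(targets)

    scores = {u: 1.0 / len(nodes) for u in nodes}
    for _ in range(iterations):
        # Users without out-links pass their whole score to teleportation.
        dangling = sum(scores[u] for u in nodes if not out_links[u])
        new_scores = {
            u: (1.0 - damping + damping * dangling) * teleport[u] for u in nodes
        }
        # Every other user splits its damped score evenly over its out-links.
        for u in nodes:
            if out_links[u]:
                share = damping * scores[u] / len(out_links[u])
                for v in out_links[u]:
                    new_scores[v] += share
        scores = new_scores
    return scores
```

Calling `pagerank(edges)` gives the topic-independent scores; passing `teleport_to=` with the users tied to topic-relevant content gives a topic-biased variant on the same graph.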

Topic-Sensitive PageRank (TSPR) [Haveliwala, 2002] (slide 7)
• the PageRank graph
• TSPR Approach
  • PageRank approach +
  • Teleportation is possible only to users that are associated with topic-relevant content
[Figure; label: Query]

Hyperlink-Induced Topic Search (HITS) [Kleinberg, 1999] (slide 8)
• Hub: sum of the authority scores of the nodes its outgoing edges point to
• Authority: sum of the hub scores of the nodes its incoming edges come from
[Figure; labels: Hub, Authority]
• Applied to more topic-specific authority networks
  • to focus the computational effort on relevant nodes
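
The hub and authority updates can be sketched as a short iteration. This is an illustrative sketch rather than the implementation used in the experiments; the L2 normalisation after each round and the fixed iteration count are assumptions.

```python
import math


def hits(edges, iterations=50):
    """Iterative HITS; returns (hub, authority) score dictionaries.

    edges: iterable of (source, target) pairs in a directed graph.
    """
    edges = list(edges)
    nodes = {u for edge in edges for u in edge}
    hub = {u: 1.0 for u in nodes}
    auth = {u: 1.0 for u in nodes}
    for _ in range(iterations):
        # Authority: sum of hub scores over incoming edges.
        auth = {u: 0.0 for u in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]
        # Hub: sum of authority scores over outgoing edges.
        hub = {u: 0.0 for u in nodes}
        for src, dst in edges:
            hub[src] += auth[dst]
        # Normalise both vectors so the scores stay bounded.
        for vec in (auth, hub):
            norm = math.sqrt(sum(x * x for x in vec.values())) or 1.0
            for u in vec:
                vec[u] /= norm
    return hub, auth
```

A user who is only ever a source of edges ends with zero authority, and a user who is only ever a target ends with zero hub score.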

Constructing HITS Graph (slide 9)
• Step 1: Retrieve an initial list of expert candidates, which is called the root set
[Figure; label: Query]

Constructing HITS Graph (slide 10)
• Step 2: Expand root set into base set, which consists of users who are connected to/from users in the root set

Constructing HITS Graph (slide 11)
• Step 3: Use all users in base set as nodes and all existing interactions among them as edges

Graph Properties: Nodes & Edges (slide 12)
[Figure; panels: PageRank Graph, HITS Graph]
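
Steps 2 and 3 of the HITS graph construction can be sketched as follows, assuming the root set from Step 1 is already available from a content-based retrieval run. The function name `build_hits_graph` and the representation of interactions as (source user, target user) pairs are hypothetical choices for illustration.

```python
def build_hits_graph(root_set, interactions):
    """Expand a root set into a base set and collect the graph edges.

    root_set: users retrieved for the query (Step 1, done elsewhere).
    interactions: iterable of (source_user, target_user) pairs covering
    all user activity, regardless of topic.
    """
    interactions = list(interactions)
    root = set(root_set)

    # Step 2: the base set adds every user connected to or from a root user.
    base = set(root)
    for src, dst in interactions:
        if src in root or dst in root:
            base.update((src, dst))

    # Step 3: keep all interactions whose two endpoints are both in the base set.
    edges = [(src, dst) for src, dst in interactions if src in base and dst in base]
    return base, edges
```

Note that Step 3 keeps every interaction among base-set users, including ones on documents unrelated to the query.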

HITS on web pages (slide 13)
[Figure]

HITS on users (slide 14)
[Figure]

HITS on users (slide 15)
[Figure]

Topic-Candidate (TC) graphs (slide 16)
[Figure]

Constructing Topic-Candidate Graph (slide 17)
• Step 1: Retrieve an initial list of expert candidates, which is called the root set
[Figure; label: Query]

Constructing Topic-Candidate Graph (slide 18)
• Step 2: Expand root set into base set, which consists of users who are connected to/from users in root set due to topic-relevant interactions
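
A hedged sketch of the Topic-Candidate expansion: only interactions on topic-relevant documents are allowed to connect new users to the root set. The slides do not spell out which edges the final graph keeps, so this sketch assumes the edges are exactly those topic-relevant interactions that touch a root-set user. The function name, the triple format, and `relevant_docs` are hypothetical.

```python
def build_topic_candidate_graph(root_set, interactions, relevant_docs):
    """Build a Topic-Candidate graph from topic-relevant interactions.

    root_set: users retrieved for the query (Step 1, done elsewhere).
    interactions: iterable of (source_user, target_user, doc_id) triples.
    relevant_docs: ids of documents considered relevant to the query.
    """
    root = set(root_set)
    relevant = set(relevant_docs)
    nodes = set(root)
    edges = []
    for src, dst, doc in interactions:
        # Assumption: keep an interaction only if it happened on a
        # topic-relevant document and involves a root-set user.
        if doc in relevant and (src in root or dst in root):
            nodes.update((src, dst))
            edges.append((src, dst))
    return nodes, edges
```

Because interactions on off-topic documents never add nodes or edges, the resulting graph is typically much smaller than one built from all activity.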

Comparison of Graphs (slide 19)
[Figure; panels: PageRank Graph, Topic-Candidate Graph, HITS Graph]

Experiments
Finding topic-specific expert bloggers
• Reading and commenting activity as authority signals

Dataset (slide 21)
• Intra-organizational blog collection from a large multinational IT firm

  # Posts        165,414
  # Comments     783,356
  # Employees    >100,000
  # Posters      20,354
  # Commenters   42,169
  # Readers      92,360

• Access logs
  • cover 44 of the 56 months of the collection

Evaluation Data (slide 22)
• 40 work related topics
  • Selected from the access logs of company search engine
  • Created by the company employees
• Candidate Pools
  • Top 10 candidates retrieved from content-based approaches
• Assessments (the collection is not public)
  • Performed by author Yeniterzi
  • 4-point scale
    • not an expert, some expertise, an expert, very expert

Authority Networks (slide 23)
[Figure; panels: Reading, Commenting]

Content-based Experiments (slide 24)

                                     NDCG@1   NDCG@3   NDCG@10
  Profile [Balog, 2006]              .7000    .6689    .6494
  Votes [Macdonald, 2006]            .3667    .4090    .4140
  ReciprocalRank [Macdonald, 2006]   .7083    .7003    .7281
  CombSUM [Macdonald, 2006]          .6417    .6334    .6168
  CombMNZ [Macdonald, 2006]          .5333    .5295    .5124
  IRW [Serdyukov, 2008]              .5167    .5189    .5159
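
For readers unfamiliar with the metric, here is a small sketch of NDCG@k over graded expertise judgments. The slides do not state the gain or discount functions used, so the numeric grades 0 to 3 for the 4-point scale, the linear gain, and the log2 discount are all assumptions.

```python
import math


def dcg_at_k(grades, k):
    """Discounted cumulative gain of the first k grades (linear gain)."""
    return sum(
        grade / math.log2(rank + 1)
        for rank, grade in enumerate(grades[:k], start=1)
    )


def ndcg_at_k(ranked_grades, pool_grades, k):
    """NDCG@k of a ranking against the best ordering of the assessed pool.

    ranked_grades: grades of the candidates in ranked order
    (assumed coding: 0 = not an expert ... 3 = very expert).
    pool_grades: grades of all assessed candidates for the topic.
    """
    ideal = dcg_at_k(sorted(pool_grades, reverse=True), k)
    return dcg_at_k(ranked_grades, k) / ideal if ideal > 0 else 0.0
```

A ranking that places the highest-graded candidates first scores 1.0; placing a non-expert at rank 1 gives an NDCG@1 of 0.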

Authority-based Re-ranking (slide 25)
[Re-ranking equation: not recoverable from the source]
• Parameter optimization
  • 5-fold cross validation

PageRank on Three Types of Graph (slide 26)
[Bar chart; y-axis from 0.2 to 0.8; metrics: NDCG@1, NDCG@10, MAP (VE), MRR (VE); series: Content Baseline, PR Graph, HITS Graph, TC Graph]
• MRR (VE) improvement is statistically significant with p < 0.05
• MAP (VE) improvement is statistically significant with p < 0.10
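
The exact re-ranking formula on slide 25 did not survive extraction. As a purely illustrative stand-in, the sketch below uses a common way to combine a content score with an authority score: linear interpolation with weights that sum to 1. The min-max normalisation, the parameter name `lam`, and the function name `rerank` are assumptions and should not be read as the authors' formula. In a setup like the one described, `lam` would be the parameter tuned by cross-validation.

```python
def rerank(content_scores, authority_scores, lam):
    """Re-rank candidates by lam * content + (1 - lam) * authority.

    content_scores, authority_scores: dicts mapping user -> raw score.
    lam: interpolation weight in [0, 1] (hypothetical parameter).
    Returns the candidates of content_scores, best first.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")

    def normalise(scores):
        # Min-max scaling so both score types share the range [0, 1].
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = hi - lo
        return {u: (s - lo) / span if span else 0.0 for u, s in scores.items()}

    content = normalise(content_scores)
    authority = normalise(authority_scores)
    combined = {
        # Candidates missing from the authority graph get authority 0.
        u: lam * content[u] + (1.0 - lam) * authority.get(u, 0.0)
        for u in content
    }
    return sorted(combined, key=combined.get, reverse=True)
```

With `lam=1.0` the content ranking is returned unchanged; lowering `lam` lets authority scores move candidates up or down.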

PageRank on Three Types of Graph: Ave. # unassessed candidates introduced (slide 27)
[Bar chart; y-axis from 0.2 to 0.8; metrics: NDCG@1, NDCG@10, MAP (VE), MRR (VE); series: Content Baseline, PR Graph, HITS Graph, TC Graph; annotated values: 0.125, 0.125, 0.85 (series assignment not recoverable)]
• MRR (VE) improvement is statistically significant with p < 0.05
• MAP (VE) improvement is statistically significant with p < 0.10

TSPR on Three Types of Graph (slide 28)
[Bar chart; y-axis from 0.2 to 0.8]
• MRR (VE) improvement is statistically significant with p < 0.05

HITS on Three Types of Graph (slide 29)
[Bar chart; y-axis from 0.2 to 0.8; metrics: NDCG@1, NDCG@10, MAP (VE), MRR (VE); series: Content Baseline, PR Graph, HITS Graph, TC Graph]

Graph Size and Running Time Analysis (slide 30)
R = reading network, C = commenting network

  Graph   Average # Nodes     Average # Edges
          R       C           R         C
  PR      92K     43K         1,631K    214K
  HITS    57K     14K         1,480K    138K
  TC      7K      1K          9K        2K

  Approach   Graph   Approximate Running Times (in sec)
                     R        C
  PR         PR      1,203    85
  PR         HITS    1,116    49
  PR         TC      4        1
  TSPR       PR      1,222    93
  TSPR       HITS    1,248    65
  TSPR       TC      2        0.4
  HITS       PR      478      73
  HITS       HITS    344      26
  HITS       TC      3        0.5

Conclusion (slide 31)
• Topic-Candidate graphs
  • Statistically significant improvements @ MRR (p<0.05) with PageRank and TSPR approaches
  • Effectiveness
    • 4% @ NDCG@1
    • 8% @ MAP(VE)
    • 17% @ MRR(VE)
  • Efficiency
    • Reading: 20 min to 2 sec
    • Commenting: 1 min to 0.4 sec

Thank you
