CQARank: Jointly Model Topics and Expertise in Community Question Answering



SLIDE 1

CQARank: Jointly Model Topics and Expertise in Community Question Answering

Liu Yang, Minghui Qiu, Swapna Gottipati, Feida Zhu, Jing Jiang, Huiping Sun, Zhong Chen
Peking University
Singapore Management University

SLIDE 2

Community Question Answering

  • Open platforms for sharing expertise
  • Large repositories of valuable knowledge

CIKM2013

SLIDE 3

Existing CQA Mechanism Challenges

  • Poor expertise matching
  • Low-quality answers
  • Under-utilized archived questions
  • Fundamental question: how to model topics and expertise in CQA sites

SLIDE 4

Motivation

  • A case study of Stack Overflow

[Figure: Stack Overflow example labeled with Question, Answer, Vote, Tag, User]

SLIDE 5

Motivation

  • Propose a principled approach to jointly model topics and expertise in CQA
    – No one is an expert in all topical interests
    – Each new question should be routed to answerers interested in related topics with the right level of expertise
  • Achieve better understanding of both user topical interest and expertise by leveraging tagging and voting information
    – Tags are important user-generated category information of Q&A posts
    – Votes indicate a CQA community's long-term review result for a given user's expertise under a specific topic

SLIDE 6

Roadmap

  • Motivation
  • Related Work
  • Our Method
    – Method Overview
    – Topic Expertise Model
    – CQARank
  • Experiments
  • Summary

SLIDE 7

Related Work

  • Link Analysis
    – HITS (Jurczyk and Agichtein, CIKM07)
    – Expertise Rank and Z-score (Zhang et al., WWW07)
    – Find global experts without a model of user interests
  • Latent Topical Analysis
    – UQA Model (Guo et al., CIKM08)
    – Fails to capture to what extent these users' expertise matches the questions with similar topical interest
  • Topic Sensitive PageRank
    – TwitterRank (Weng et al., WSDM10)
    – Topic-sensitive probabilistic model for expert finding (Zhou et al., CIKM12)

SLIDE 8

Roadmap

  • Motivation
  • Related Work
  • Our Method
    – Method Overview
    – Topic Expertise Model
    – CQARank
  • Experiments
  • Summary

SLIDE 9

Method Overview

  • Concepts
    – Topical Interest
    – Topical Expertise
    – Q&A Graph
  • Our Approach
    – Topic Expertise Model (TEM)
    – CQARank to combine learning results from TEM with link analysis of Q&A graph

SLIDE 10

Method Overview

  • CQARank Recommendation Framework

SLIDE 11

Roadmap

  • Motivation
  • Related Work
  • Our Method
    – Method Overview
    – Topic Expertise Model
    – CQARank
  • Experiments
  • Summary

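SLIDE 12

Topic Expertise Model

  • $U$: # of users
  • $N_u$: # of posts
  • $L_{u,n}$: # of words
  • $P_{u,n}$: # of tags
  • z: topic label
  • e: expertise label
  • v: a vote
  • w: a word
  • t: a tag

[Figure: plate diagram of TEM. Observed and latent variables: w, t, z, e, v. Parameters: $\theta_u$ (user specific topic distribution), $\varphi_k$ and $\psi_k$ (topic specific word and tag distribution), $\phi_{k,u}$ (user topical expertise distribution), $\mu_e$ and $\Sigma_e$ (expertise specific vote distribution). Hyperparameters: $\alpha$, $\beta$, $\gamma$, $\eta$, $\mu_0$, $k_0$, $\alpha_0$, $\beta_0$. Plates: $U$, $N_u$, $L_{u,n}$, $P_{u,n}$, $K$, $K \times U$, $E$]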
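The plate diagram suggests a generative story that can be sketched in a few lines. This is only a reading of the diagram labels, not the authors' stated procedure: it assumes one topic per post, words and tags drawn from that topic, an expertise level drawn from the user's topical expertise distribution, and a vote drawn from a Gaussian for that level. The function name `sample_post`, its arguments, and the use of a standard deviation instead of $\Sigma_e$ are all hypothetical.

```python
import random

def sample_post(theta_u, phi, psi, expertise_u, mu, sigma, n_words, n_tags, rng):
    """Draw one synthetic post for a user (assumed generative story, see lead-in).

    theta_u      list of K topic probabilities for the user
    phi, psi     per-topic dicts mapping word / tag -> probability
    expertise_u  expertise_u[z] is a list of E probabilities for topic z
    mu, sigma    per-expertise-level vote mean and standard deviation
    """
    z = rng.choices(range(len(theta_u)), weights=theta_u)[0]      # topic label
    words = rng.choices(list(phi[z]), weights=list(phi[z].values()), k=n_words)
    tags = rng.choices(list(psi[z]), weights=list(psi[z].values()), k=n_tags)
    e = rng.choices(range(len(mu)), weights=expertise_u[z])[0]    # expertise label
    vote = rng.gauss(mu[e], sigma[e])                             # vote for this post
    return z, words, tags, e, vote

rng = random.Random(0)
theta_u = [1.0, 0.0]                       # this toy user only writes about topic 0
phi = [{"java": 0.7, "jvm": 0.3}, {"css": 1.0}]
psi = [{"java": 1.0}, {"html": 1.0}]
expertise_u = [[0.2, 0.8], [0.5, 0.5]]
z, words, tags, e, vote = sample_post(
    theta_u, phi, psi, expertise_u, mu=[1.0, 10.0], sigma=[0.5, 2.0],
    n_words=5, n_tags=1, rng=rng)
```

In the actual model these parameters are learned by Gibbs sampling rather than supplied by hand; the sketch only shows how the variables in the notation list depend on each other.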

SLIDE 13

Roadmap

  • Motivation
  • Related Work
  • Our Method
    – Method Overview
    – Topic Expertise Model
    – CQARank
  • Experiments
  • Summary

SLIDE 14

CQARank

  • CQARank combines the textual content learning result of TEM with link analysis to enforce user topical expertise learning
  • Construct Q&A Graph $G = (V, E)$
    – $V$ is a set of nodes representing users
    – $E$ is a set of directed edges from the asker to the answerer
  • $e = (u_i, u_j)$, $u_i \in V$, $u_j \in V$
  • Weight $W_{ij}$ is the number of all answers answered by $u_j$ for questions of $u_i$
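The graph construction above can be sketched as a simple counting pass. The input layout, a list of (asker, answerer) pairs with one pair per answer, and the name `build_qa_graph` are assumptions made for illustration.

```python
from collections import defaultdict

def build_qa_graph(answer_pairs):
    """Return W where W[asker][answerer] counts answers from answerer to asker.

    Each (asker, answerer) pair stands for one answer; this input format is a
    hypothetical layout, not something the slides specify.
    """
    weights = defaultdict(lambda: defaultdict(int))
    for asker, answerer in answer_pairs:
        weights[asker][answerer] += 1   # directed edge asker -> answerer
    # Plain dicts so that lookups of missing users fail loudly.
    return {asker: dict(targets) for asker, targets in weights.items()}

pairs = [("alice", "bob"), ("alice", "bob"), ("alice", "carol"), ("bob", "carol")]
W = build_qa_graph(pairs)
```

Users who only answer (here `carol`) have no outgoing edges and therefore no row in `W`.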

SLIDE 15

CQARank

  • For each topic $z$, the transition probability from asker $u_i$ to answerer $u_j$ is defined as:
    $P_z(i \to j) = \frac{W_{ij} \cdot \mathrm{sim}_z(i \to j)}{\sum_{k=1}^{|V|} W_{ik} \cdot \mathrm{sim}_z(i \to k)}$ if $\sum_m W_{im} \neq 0$
    $P_z(i \to j) = 0$ otherwise
  • $\mathrm{sim}_z(i \to j)$ is the similarity between $u_i$ and $u_j$ under topic $z$, which is defined as
    $\mathrm{sim}_z(i \to j) = 1 - |\theta'_{iz} - \theta'_{jz}|$
  • The row-normalized transition matrix $M$ is defined as
    $M_{ij} = P_z(i \to j)$
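A minimal sketch of the topic-specific transition matrix follows. The slide does not define the primed $\theta'$, so the code simply assumes each user has one normalized interest value in [0, 1] for the topic; the function names and the dense list-of-lists layout are illustrative choices.

```python
def topic_similarity(theta_i_z, theta_j_z):
    """sim_z(i -> j) = 1 - |theta'_iz - theta'_jz|."""
    return 1.0 - abs(theta_i_z - theta_j_z)

def transition_matrix(W, theta_z):
    """Row-normalized M for one topic.

    W        n x n answer counts, W[i][j] = answers by user j to user i
    theta_z  length-n list, assumed normalized interest of each user in topic z
    """
    n = len(W)
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        scores = [W[i][k] * topic_similarity(theta_z[i], theta_z[k]) for k in range(n)]
        total = sum(scores)
        # Rows with no outgoing weight stay all zero, as on the slide. The guard
        # also covers the corner case where every similarity is exactly zero.
        if total > 0:
            M[i] = [s / total for s in scores]
    return M

W = [[0, 2, 1],
     [0, 0, 1],
     [0, 0, 0]]
theta_z = [0.5, 0.4, 0.1]
M = transition_matrix(W, theta_z)
```

Each non-empty row of `M` sums to one, so answerers whose interest in the topic is close to the asker's receive a larger share of the asker's weight.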

SLIDE 16

CQARank

  • Given topic $z$, the CQARank saliency score of $u_i$ is computed based on the following formula:
    $R_z(u_i) = \lambda \sum_{j: u_j \to u_i} R_z(u_j) \cdot M_{ij} + (1 - \lambda) \cdot \theta_{u_i z} \cdot E(z, u_i)$
  • $E(z, u_i)$ is the estimated expertise score of $u_i$ under topic $z$, which is defined as the expectation of the user topical expertise distribution learnt by TEM:
    $E(z, u_i) = \sum_e \phi_{z,u_i,e} \cdot \mu_e$
  • $\lambda \in [0, 1]$ is a parameter to control the probability of the teleportation operation.
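The score above can be approximated by fixed-point iteration, sketched below under two assumptions. First, the link term is read as the probability of moving from asker $u_j$ to answerer $u_i$, so the code indexes the matrix as `M[j][i]`; the slide writes $M_{ij}$, and this index reading is an interpretation. Second, the stopping rule (`iterations`, `tol`) is a hypothetical choice, since the slides do not say how convergence is checked.

```python
def expected_expertise(phi_zu, mu):
    """E(z, u) = sum_e phi[z, u, e] * mu[e]."""
    return sum(p * m for p, m in zip(phi_zu, mu))

def cqarank(M, theta_z, expertise_z, lam=0.2, iterations=100, tol=1e-10):
    """Iterate R = lam * (link term) + (1 - lam) * theta * E for one topic.

    M            row-normalized transitions, M[j][i] = P_z(j -> i) (assumed reading)
    theta_z      each user's interest in the topic
    expertise_z  each user's E(z, u)
    """
    n = len(M)
    prior = [theta_z[i] * expertise_z[i] for i in range(n)]
    scores = prior[:]
    for _ in range(iterations):
        updated = [
            lam * sum(scores[j] * M[j][i] for j in range(n)) + (1 - lam) * prior[i]
            for i in range(n)
        ]
        delta = max(abs(a - b) for a, b in zip(updated, scores))
        scores = updated
        if delta < tol:
            break
    return scores

M = [[0.0, 0.75, 0.25],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 0.0]]
theta_z = [0.5, 0.4, 0.1]
expertise_z = [1.0, 2.0, 4.0]
R = cqarank(M, theta_z, expertise_z, lam=0.2)
```

With $\lambda = 0.2$, the value used in the experiments, most of each score comes from the TEM term $\theta \cdot E$ and the graph contributes a correction.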

SLIDE 17

Roadmap

  • Motivation
  • Related Work
  • Our Method
    – Method Overview
    – Topic Expertise Model
    – CQARank
  • Experiments
  • Summary

SLIDE 18

Experiments

  • Stack Overflow Data Set
    – All Q&A posts in three months (May 1st to August 1st, 2009)
    – Training data: 8,904 questions and 96,629 answers posted by 663 users (10,689 unique tags and 135 unique votes)
    – Testing data: 1,173 questions and 9,883 answers
  • Data Preprocessing
    – Tokenize text and discard all code snippets
    – Remove stop words and HTML tags in text
  • Parameter Settings
    – $K = 15$, $E = 10$, $\alpha = 50/K$, $\beta = 0.01$, $\gamma = 0.01$, $\eta = 0.001$, $\lambda = 0.2$
    – Normal-Gamma parameters
    – 500 iterations of Gibbs Sampling
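The preprocessing steps can be sketched with regular expressions. This assumes post bodies arrive as HTML with code inside `<pre>` or `<code>` elements; the stop-word list is a tiny illustrative stand-in, and the tokenizer pattern is one plausible choice, not the one used in the experiments.

```python
import re

# Tiny illustrative stop-word list; a real run would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "to", "in", "of", "and", "how", "i", "do"}

CODE_BLOCK = re.compile(r"<pre\b.*?</pre>|<code\b.*?</code>", re.DOTALL | re.IGNORECASE)
HTML_TAG = re.compile(r"<[^>]+>")
TOKEN = re.compile(r"[a-z][a-z0-9+#]*")   # keeps tokens such as "c++" and "c#"

def preprocess(html_body):
    """Discard code snippets, strip HTML tags, tokenize, and drop stop words."""
    text = CODE_BLOCK.sub(" ", html_body)   # remove code before stripping tags
    text = HTML_TAG.sub(" ", text)
    tokens = TOKEN.findall(text.lower())
    return [tok for tok in tokens if tok not in STOP_WORDS]

body = "<p>How do I reverse a string in Python?</p><pre><code>s[::-1]</code></pre>"
tokens = preprocess(body)
```

Code is removed before tag stripping so that snippet contents never reach the tokenizer.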

SLIDE 19

TEM Results

  • Topic Analysis - topic tags
    – Top tags provide phrase level features to distill richer topic information

SLIDE 20

TEM Results

  • Topic Analysis - topic words
    – Top words have strong correlation with top tags under the same topic

SLIDE 21

TEM Results

  • Expertise Analysis
    – TEM learns different user expertise levels by clustering votes using its GMM component.
    – 10 Gaussian distributions with various means for the generation of votes in data.
    – The higher the mean is, the lower the precision is.

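SLIDE 22

Recommend Expert Users

  • Task
    – Given a new question $q$ and a set of users $U$, rank users by their interests and expertise to answer question $q$.
    – Recommendation score function:
      $S(u, q) = \mathrm{Sim}(u, q) \cdot \mathrm{Expert}(u, q) = (1 - JS(\theta_u, \theta_q)) \cdot \sum_z \theta_{q,z} \cdot \mathrm{Expert}(u, z)$
    – $\theta_{q,z}$ is the estimated posterior topic distribution of question $q$:
      $\theta_{q,z} \propto p(z \mid \mathbf{w}_q, \mathbf{t}_q, u) \propto p(z \mid u) \, p(\mathbf{w}_q \mid z) \, p(\mathbf{t}_q \mid z) = \theta_{u,z} \prod_{w: \mathbf{w}_q} \varphi(z, w) \prod_{t: \mathbf{t}_q} \psi(z, t)$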
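The scoring step can be sketched as follows. Several details are assumptions: the JS-divergence uses base-2 logarithms so that $1 - JS$ stays in [0, 1], the $\theta_u$ in the posterior is taken to be the asker's topic distribution, and all word and tag probabilities are assumed to be smoothed and strictly positive. Function names are illustrative.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs (value in [0, 1])."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    mid = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, mid) + 0.5 * kl(q, mid)

def question_topic_posterior(theta_asker, phi, psi, words, tags):
    """theta_{q,z} proportional to theta_{u,z} * prod phi(z, w) * prod psi(z, t).

    Computed in log space to avoid underflow on long questions.
    """
    log_scores = []
    for z, prior in enumerate(theta_asker):
        score = math.log(prior)
        score += sum(math.log(phi[z][w]) for w in words)
        score += sum(math.log(psi[z][t]) for t in tags)
        log_scores.append(score)
    top = max(log_scores)
    unnormalized = [math.exp(s - top) for s in log_scores]
    total = sum(unnormalized)
    return [x / total for x in unnormalized]

def recommendation_score(theta_user, theta_q, expert_user):
    """S(u, q) = (1 - JS(theta_u, theta_q)) * sum_z theta_{q,z} * Expert(u, z)."""
    similarity = 1.0 - js_divergence(theta_user, theta_q)
    expertise = sum(tq * ex for tq, ex in zip(theta_q, expert_user))
    return similarity * expertise

phi = [{"java": 0.2, "css": 0.01}, {"java": 0.01, "css": 0.2}]
psi = [{"jvm": 0.3}, {"jvm": 0.03}]
theta_q = question_topic_posterior([0.5, 0.5], phi, psi, words=["java"], tags=["jvm"])
score = recommendation_score([0.9, 0.1], theta_q, expert_user=[8.0, 1.0])
```

Ranking candidate users then amounts to sorting them by `recommendation_score` in descending order.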

SLIDE 23

Recommend Expert Users

  • Our method
    – CQARank
  • Baselines
    – Link analysis methods: In Degree (ID), PageRank (PR)
    – Probabilistic generative models: TEM (part of our method), UQA (Guo et al., CIKM08)
    – Combined link analysis and topic model: Topic Sensitive PageRank (TSPR) (Zhou et al., CIKM12)

SLIDE 24

Recommend Expert Users

  • Evaluation Criteria
    – Ground truth: user rank list by average votes for answering $q$
    – Metrics: nDCG, Pearson/Kendall correlation coefficients
  • Results
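For reference, nDCG against a vote-based ground truth can be sketched as below. The slides do not state which nDCG variant was used, so the linear gain and log2 discount here are assumptions, as is treating average votes as non-negative gains.

```python
import math

def dcg(gains):
    """Discounted cumulative gain with a log2 position discount."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))

def ndcg(predicted_order, true_gain, k=None):
    """nDCG of a predicted user ranking.

    predicted_order  user ids, best first, as produced by a ranking method
    true_gain        dict user id -> ground-truth gain (e.g. average votes),
                     assumed non-negative
    """
    gains = [true_gain[u] for u in predicted_order][:k]
    ideal = sorted(true_gain.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

avg_votes = {"u1": 3.0, "u2": 2.0, "u3": 1.0}
perfect = ndcg(["u1", "u2", "u3"], avg_votes)
reversed_score = ndcg(["u3", "u2", "u1"], avg_votes)
```

A ranking that matches the ground-truth order scores 1.0, and any other order scores lower.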

SLIDE 25

Recommend Answers

  • Task
    – Given a new question $q$ and a set of answers $A$, rank all answers in $A$.
    – Recommendation score function:
      $S(a, q) = \mathrm{Sim}(a, q) \cdot \mathrm{Expert}(u, q) = (1 - JS(\theta_a, \theta_q)) \cdot \sum_z \theta_{q,z} \cdot \mathrm{Expert}(u, z)$
  • Baselines and evaluation criteria are the same as in the expert recommendation task
  • We use each answer's votes to generate the ground truth rank list

SLIDE 26

Recommend Answers

  • Result

SLIDE 27

Recommend Similar Questions

  • When a user asks a new question (referred to as the query question), the user will often get replies with links to other similar questions
  • Crawl 1000 questions as the query question set, whose similar questions exist in the training data set
  • For each query question with $n$ similar questions, we randomly select another $m$ ($m = 1000$) questions from the training data set to form candidate similar questions

SLIDE 28

Recommend Similar Questions

  • All compared methods rank these $m + n$ candidate similar questions according to their similarity with the query question
  • The higher the similar questions are ranked, the better the performance of the method is.
  • Recommendation score is computed based on JS-divergence between topic distributions of the query question and candidate similar questions

SLIDE 29

Recommend Similar Questions

  • Baseline
    – TSPR(LDA), UQA, SimTag
  • Evaluation Criteria
    – Precision@K
    – Average rank of similar questions
    – Mean reciprocal rank (MRR)
    – Cumulative distribution of ranks (CDR)
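Two of these criteria can be sketched directly. The reciprocal rank below uses the first truly similar question in each ranking, which is a common convention but an assumption here, since the slides do not spell out how queries with several similar questions are handled.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k candidates that are truly similar questions."""
    return sum(1 for q in ranked[:k] if q in relevant) / k

def reciprocal_rank(ranked, relevant):
    """1 / position of the first truly similar question, or 0 if none appears."""
    for position, q in enumerate(ranked, start=1):
        if q in relevant:
            return 1.0 / position
    return 0.0

def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked list, relevant set) pairs."""
    return sum(reciprocal_rank(ranked, relevant) for ranked, relevant in runs) / len(runs)

ranked = ["q3", "q1", "q7", "q2"]      # hypothetical ranking for one query question
similar = {"q1", "q2"}                 # its known similar questions
p_at_2 = precision_at_k(ranked, similar, 2)
mrr = mean_reciprocal_rank([(ranked, similar), (["q5", "q6"], {"q5"})])
```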

SLIDE 30

Parameter Sensitivity Analysis

  • Performance in expert users recommendation of CQARank by varying the number of expertise levels ($E$) and topics ($K$)

SLIDE 31

Roadmap

  • Motivation
  • Related Work
  • Our Method
    – Method Overview
    – Topic Expertise Model
    – CQARank
  • Experiments
  • Summary

SLIDE 32

Summary

  • Conclusions
    – A probabilistic generative model to jointly model topics and expertise in CQA services
    – CQARank algorithm to combine textual content learning with link analysis
    – Our model is general and applicable to various CQA tasks
  • Future Work
    – Temporal analysis of topic expertise and interests in CQA
    – Social influence of experts

SLIDE 33

Thank you

Q&A