A Question Recommendation System for Question Answering Communities



slide-1
SLIDE 1

CS 6501 Text Mining:

A Question Recommendation System for Question Answering Communities (Stack Overflow)

Presenters: Haoyu Chen, Haoran Hou

slide-2
SLIDE 2

What is a Question Answering Community?

Community question answering (cQA) provides a platform for people with diverse backgrounds to share information and knowledge.

slide-3
SLIDE 3

People need help!

slide-4
SLIDE 4

There’s only one style of programming: Stack Overflow-oriented programming.

What we decided to work on:

slide-5
SLIDE 5

Exhibit A: Result ranking doesn’t consider the quality of answers.

slide-6
SLIDE 6

Exhibit B: Result ranking doesn’t work well in some cases.

slide-7
SLIDE 7

What we aim to do:

  • Find similar questions and list them in a more reasonable order.
  • Get answers in a faster and more convenient way.
slide-8
SLIDE 8

About Stack Overflow

  • No need for sentiment analysis
  • Few duplicate questions
  • Tags are provided
  • Answers are ordered by votes
  • Full data provided

New query

  • Find the best existing post with the most similar query
  • Return its best answer
slide-9
SLIDE 9

Our thoughts on improvement:

  • Query-answer matching: after finding similar existing queries, compute the similarity between the new query and the best answer
  • Add tag matching alongside query matching
  • Find a reasonable ‘return-best-answer’ strategy
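The query-answer matching idea above can be sketched as a re-ranking step. This is only an illustration, assuming plain term-count cosine similarity as a stand-in for whatever similarity measure the presenters used; `term_counts`, `cosine`, and `rerank_by_answer` are hypothetical names, and the candidate posts in the usage lines are made up.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count its alphanumeric terms."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rerank_by_answer(query, candidates):
    """Re-order candidate posts by similarity of the new query to each best answer.

    candidates: list of (question_title, best_answer_text) pairs that an
    earlier query-to-query search has already retrieved.
    """
    q = term_counts(query)
    return sorted(
        candidates,
        key=lambda c: cosine(q, term_counts(c[1])),
        reverse=True,
    )


# Illustrative candidates only; not data from the presentation.
candidates = [
    ("Regex libraries", "Use a regex library"),
    ("replace vs replaceAll", "replaceAll takes a regex while replace takes a literal"),
]
ranked = rerank_by_answer("difference replace replaceall java", candidates)
print(ranked[0][0])
```

Keeping this as a second pass over already-retrieved posts means the (more expensive) answer comparison only runs on a handful of candidates.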
slide-10
SLIDE 10

Parts of a post: question title, question content, best answer

Query: difference replace replaceall java

Existing approach: only compare the new query with existing queries

Proposed: query-answer matching

slide-11
SLIDE 11

Adding tag matching

Compute the similarity between the new query and existing queries, as well as their tags, e.g.

  • new query: difference replace replaceall java
  • existing query: difference between string replace() and replaceall()
  • tags: [not preserved in the transcript]
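A minimal sketch of how tag matching could be combined with query matching, assuming Jaccard token overlap in place of the Solr scoring the presenters actually relied on. The function names and the 0.7/0.3 weights are hypothetical, and the tags in the usage lines are invented for illustration because the slide's own tags did not survive.

```python
import re


def tokens(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard overlap of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_score(query, title, tags, title_weight=0.7, tag_weight=0.3):
    """Weighted sum of title overlap and the fraction of tags named in the query."""
    q = tokens(query)
    title_sim = jaccard(q, tokens(title))
    tag_hits = len(q & {t.lower() for t in tags})
    tag_sim = tag_hits / len(tags) if tags else 0.0
    return title_weight * title_sim + tag_weight * tag_sim


query = "difference replace replaceall java"
title = "difference between string replace() and replaceall()"
hypothetical_tags = ["java", "string"]  # made up; not taken from the slide

print(combined_score(query, title, []))
print(combined_score(query, title, hypothetical_tags))
```

Here the word "java" never appears in the existing title, so only the tag term rewards it; that is the kind of case tag matching is meant to catch.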

slide-12
SLIDE 12

Find answer:

  • Favor votes over acceptance
  • More votes -> acceptance
  • Return something even if there’s no (good) answer: comments
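One possible reading of this answer-selection strategy, sketched under stated assumptions: votes are compared first, acceptance only breaks ties, and comments are returned when no answer reaches a vote threshold. The `min_votes` threshold, the dictionary keys, and the function name are all hypothetical.

```python
def pick_answer(answers, comments, min_votes=1):
    """Choose what to show for a matched question.

    answers:  list of dicts with 'body' (str), 'votes' (int), 'accepted' (bool)
    comments: list of comment strings on the question
    Returns a (kind, payload) pair where kind is 'answer' or 'comments'.
    """
    # Assumption: an answer counts as "good" once it has at least min_votes votes.
    good = [a for a in answers if a["votes"] >= min_votes]
    if good:
        # Votes dominate; the accepted flag only decides between equal vote counts.
        best = max(good, key=lambda a: (a["votes"], a["accepted"]))
        return "answer", best["body"]
    # No good answer: fall back to the comments so something is still returned.
    return "comments", list(comments)


answers = [
    {"body": "accepted but fewer votes", "votes": 3, "accepted": True},
    {"body": "most votes", "votes": 10, "accepted": False},
]
print(pick_answer(answers, ["see the docs"]))
```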

slide-13
SLIDE 13

Let’s start with Solr

"Solr is the popular, blazing-fast, open source enterprise search platform built on Apache Lucene"

  -- The headline on the official Solr website
slide-14
SLIDE 14

Key Facts on Stack Overflow data

Link: http://data.stackexchange.com/help

  • Open -- under CC BY-SA 3.0 (Attribution-ShareAlike)
  • API -- e.g. search users, answers, questions
  • Updates -- every Monday
  • Size -- 8 million questions (28 GB)

slide-15
SLIDE 15

Preprocessing Stack Overflow data

  • Select useful features -- tags, question IDs, titles
  • Convert them into Solr input format
  • Result: 28 GB -> 1.6 GB
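The preprocessing step could look roughly like the sketch below. It assumes the input is a Stack Exchange `Posts.xml` dump in which each post is a `<row>` element, questions have `PostTypeId="1"`, and tags are stored as `<java><string>`; the output field names (`id`, `title`, `tags`) are hypothetical and would have to match the Solr schema actually used. The sample rows are invented.

```python
import io
import json
import re
import xml.etree.ElementTree as ET


def questions_to_solr_docs(source):
    """Stream a Posts.xml-style file and yield one small dict per question."""
    for _, elem in ET.iterparse(source, events=("end",)):
        # Keep only question rows; answers (PostTypeId 2) are skipped.
        if elem.tag == "row" and elem.get("PostTypeId") == "1":
            yield {
                "id": elem.get("Id"),
                "title": elem.get("Title", ""),
                # Assumed tag encoding: "<java><string>" -> ["java", "string"]
                "tags": re.findall(r"<([^>]+)>", elem.get("Tags", "")),
            }
        # Drop the element's attributes once processed to limit memory use.
        elem.clear()


# Invented sample rows standing in for the real dump.
sample = b"""<posts>
  <row Id="1" PostTypeId="1" Title="Example question" Tags="&lt;java&gt;&lt;string&gt;" Body="..." />
  <row Id="2" PostTypeId="2" ParentId="1" Body="..." />
</posts>"""

docs = list(questions_to_solr_docs(io.BytesIO(sample)))
print(json.dumps(docs, indent=2))
```

A JSON array of documents like this can be posted to a Solr update handler; dropping the post bodies and all answer rows is the sort of selection that would explain the large size reduction reported above.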

slide-16
SLIDE 16

Search Flow Chart

Indexed data

Search Java ….

slide-17
SLIDE 17

Search Flow Chart

Indexed data

Search Java ….

slide-18
SLIDE 18

Solr similarity algorithm:

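[Scoring formula not preserved in the transcript; its annotations follow.]

  • The more of the query’s terms a document contains, the higher its score
  • Make scores between queries comparable
  • Normalize document with boost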
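For reference, these annotations match the terms of Lucene's classic TF-IDF "practical scoring function". The slide's own formula is not recoverable from the transcript, so the block below is the documented Lucene formula, given on the assumption that this is what the slide showed:

```latex
\mathrm{score}(q,d) \;=\; \mathrm{coord}(q,d)\,\cdot\,\mathrm{queryNorm}(q)\,\cdot
  \sum_{t \in q} \mathrm{tf}(t,d)\,\cdot\,\mathrm{idf}(t)^{2}\,\cdot\,\mathrm{boost}(t)\,\cdot\,\mathrm{norm}(t,d)

\mathrm{tf}(t,d) = \mathrm{freq}(t,d)^{1/2},
\qquad
\mathrm{idf}(t) = 1 + \ln\frac{N}{\mathrm{docFreq}(t) + 1}
```

Under that assumption, coord(q,d) rewards documents that contain more of the query's terms, queryNorm(q) makes scores comparable across queries, and norm(t,d) folds in field-length normalization and index-time boosts; N is the number of indexed documents.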

slide-19
SLIDE 19

Let’s Demo Our Tools!

slide-20
SLIDE 20

Let’s Demo Our Tools!

Features:

  • Auto change detection
  • Answer overview (more responsive than the Stack Overflow version)

Differences:

  • Searches not just titles, but also tags
  • Shows the answer with the most votes

Test questions:

  • Replace
slide-21
SLIDE 21

Demo 1

slide-22
SLIDE 22

Demo 1

slide-23
SLIDE 23

Future steps

  • Assign different weights to the question title and tags
  • Mine more information from comments
  • Recommend tags using the MoreLikeThis feature
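As a sketch of the first future step, per-field weights can be expressed through the `qf` parameter of Solr's Extended DisMax query parser. The host, core name (`stackoverflow`), field names (`title`, `tags`), and the 2.0/1.0 weights below are assumptions for illustration, not values from the presentation.

```python
from urllib.parse import urlencode


def build_search_url(query, title_weight=2.0, tag_weight=1.0,
                     base="http://localhost:8983/solr/stackoverflow/select"):
    """Build a Solr select URL that weights title matches against tag matches."""
    params = {
        "defType": "edismax",  # Extended DisMax parser supports per-field boosts
        "q": query,
        "qf": f"title^{title_weight} tags^{tag_weight}",  # field^boost pairs
        "fl": "id,title,tags,score",
        "rows": 10,
        "wt": "json",
    }
    return f"{base}?{urlencode(params)}"


url = build_search_url("difference replace replaceall java")
print(url)
```

Because the weights are just request parameters, different title/tag ratios can be compared without re-indexing the data.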
slide-24
SLIDE 24