SLIDE 1

Causal Learning in Question Quality Improvement

Yichuan Li (Arizona State University) Ruocheng Guo (Arizona State University) Weiying Wang (Arizona State University) Huan Liu (Arizona State University)

SLIDE 2

Overview

  • 1. Motivation
  • 2. Introduction to Stack Overflow
  • 3. Related Work in Question Quality Improvement
  • 4. Introduction to the dataset
  • 5. Experiment:
      a. Models
      b. Result
  • 6. Contribution
  • 7. Future Work

SLIDE 3

Motivation: Community-Based Question Answering

  • 1. Community-based question answering (CQA) forums (Stack Overflow, Quora, and Zhihu) attract millions of users all over the world every month.
  • 2. However, low-quality questions are widespread on these websites.
  • 3. Submitted questions can be unclear, lack background information, or use a chaotic format.

SLIDE 4

Motivation: Community-Based Question Answering

Example of a low-quality question: beginners often omit a clear statement of the problem, the program version they are using, the methods they have already tried, and the input and output formats.

SLIDE 5

Motivation: Community-Based Question Answering

The ratio of unanswered questions on Stack Overflow has increased in recent years, which indicates that more and more low-quality questions are being posted on the website.

SLIDE 6

Motivation: Community-Based Question Answering

Can we give suggestions for improving low-quality questions before users post them online?

SLIDE 7

Other QA work

  • 1. Find the best answer given the question and a corpus of answers.

SemEval 2015 Task 3: Answer Selection in Community Question Answering. “Given (i) a new question and (ii) a large collection of question-comment threads created by a user community, rank the comments/answers that are most useful for answering the new question.”

  • 2. Generate the answer based on the question and a given text.

SQuAD: finding an answer in a paragraph or a document. Given a document and a question, find the target answer to the question.

http://alt.qcri.org/semeval2015/task3/
https://rajpurkar.github.io/SQuAD-explorer/

SLIDE 8

Overview

  • 1. Motivation
  • 2. Introduction to Stack Overflow
  • 3. Related Work in Question Quality Improvement
  • 4. Introduction to the dataset
  • 5. Experiment:
      a. Models
      b. Result
  • 6. Contribution
  • 7. Future Work

SLIDE 9

What is Stack Overflow?

  • Stack Overflow is a question-and-answer site for professional and enthusiast programmers. Users can easily post questions and wait for others’ responses.

  • Users can ask questions on topics ranging from programming to movies and languages.

SLIDE 10

What is Stack Overflow?

(Screenshot: the question title and question body.)

SLIDE 11

What is Stack Overflow?

(Screenshot: a question waiting for an answer.)

SLIDE 12

What is Stack Overflow?

For low-quality questions, other users revise them and leave a comment describing the revision.

SLIDE 13

What is Stack Overflow?

(Screenshot: the revision comment and the actual revision.)

SLIDE 14

Overview

  • 1. Motivation
  • 2. Introduction to Stack Overflow
  • 3. Related Work in Question Quality Improvement
  • 4. Introduction to the dataset
  • 5. Experiment:
      a. Models
      b. Result
  • 6. Contribution
  • 7. Future Work

SLIDE 15

Related Work

  • 1. Binary classification: classify questions into high quality and low quality, then reject unqualified questions at posting time. (No suggestion)
  • 2. Multi-class classification: classify questions into suggestion labels. (No estimation of a suggestion’s effect)
  • 3. Directly intervene on the question text. (Impractical)

(Diagram: high-quality vs. low-quality answers and suggestion labels S1–S4.)

SLIDE 16

Related Work

  • 1. To solve the aforementioned problems, we build a new dataset that contains all the information we need.

SLIDE 17

Overview

  • 1. Motivation
  • 2. Introduction to Stack Overflow
  • 3. Related Work in Question Quality Improvement
  • 4. Introduction to the dataset
  • 5. Experiment:
      a. Models
      b. Result
  • 6. Contribution
  • 7. Future Work

SLIDE 18

Process of Data Crawling

Existing work cannot provide the question text (X), the question revision suggestion (T), and the reward (Y) after taking the suggestion all at the same time. The reward (Y) is essential for evaluating a suggestion’s effect: the optimal suggestion should obtain the largest reward.

Can we build a dataset that contains the question text (X), the revision suggestion (T), and the reward (Y)?

SLIDE 19

Process of Data Crawling

  • 1. The “PostHistory” and “Posts” tables in Stack Exchange contain this information:

Text: question text (X); Comment: revision suggestion (T); AnswerCount: reward (Y)
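As a minimal sketch of how these tables can be turned into (X, T, Y) triples, assuming the standard Stack Exchange data-dump XML layout (the attribute names below follow the published dump schema; the file paths and join logic are illustrative, not the authors’ exact pipeline):

```python
# Minimal sketch: build (question text X, revision comments T, answer count Y)
# triples from the Stack Exchange dump files.
import xml.etree.ElementTree as ET

def load_questions(posts_path="Posts.xml"):
    """Map question Id -> (Body text, AnswerCount); PostTypeId 1 = question."""
    questions = {}
    for _, row in ET.iterparse(posts_path):
        if row.tag == "row" and row.get("PostTypeId") == "1":
            questions[row.get("Id")] = (row.get("Body"),
                                        int(row.get("AnswerCount", 0)))
        row.clear()  # free memory while streaming the large dump file
    return questions

def load_revision_comments(history_path="PostHistory.xml"):
    """Map post Id -> list of revision comments (candidate treatments T)."""
    comments = {}
    for _, row in ET.iterparse(history_path):
        if row.tag == "row" and row.get("Comment"):
            comments.setdefault(row.get("PostId"), []).append(row.get("Comment"))
        row.clear()
    return comments

questions = load_questions()
comments = load_revision_comments()
dataset = [(body, comments[qid], count)  # (X, T, Y) triples
           for qid, (body, count) in questions.items() if qid in comments]
```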

SLIDE 20

Process of Data Crawling

Users can use SQL to query the dataset.

SLIDE 21

Process of Data Crawling

We use keywords in the revision comments as the suggestion type to retrieve the revised questions, and then remove questions that fall into more than one revision type.

(Pipeline: Posts and PostHistory → Revised Question → De-duplicate → Cleaned Question.)

SLIDE 22

Process of Data Crawling

  • Types of revision suggestions:

○ Clarification: the askers provide additional context and clarify what they want to achieve.
○ Example: the askers add an input or output format or include the expected results for their problem.
○ Attempt: the possible attempts the askers have tried in the process of solving their problem.
○ Solution: the askers add content to or comment on the solution found for their question.
○ Code: modification of the source code, considering only code additions.
○ Version: inclusion of additional details about the hardware or software used (program version, processor specification, etc.).
○ Error Information: warning messages and stack-trace information for the problem.
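A minimal sketch of this keyword-matching and de-duplication step; the keyword lists are illustrative placeholders, not the authors’ actual lists, and `dataset` holds the (X, comments, Y) triples from the earlier extraction sketch:

```python
# Minimal sketch: assign each revision comment to a suggestion type by keyword
# matching, then drop questions whose comments match more than one type.
KEYWORDS = {
    "clarification": ["clarify", "clarification", "context"],
    "example": ["example", "expected output", "input format"],
    "attempt": ["tried", "attempt"],
    "solution": ["solution", "solved"],
    "code": ["code"],
    "version": ["version"],
    "error": ["error", "traceback", "stack trace"],
}

def label_comment(comment):
    """Return the set of suggestion types whose keywords appear in the comment."""
    text = comment.lower()
    return {t for t, kws in KEYWORDS.items() if any(k in text for k in kws)}

def clean(dataset):
    """Keep only rows that map to exactly one suggestion type."""
    cleaned = []
    for body, comments, count in dataset:
        types = set().union(*(label_comment(c) for c in comments))
        if len(types) == 1:            # ambiguous multi-type rows are removed
            cleaned.append((body, types.pop(), count))
    return cleaned
```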

SLIDE 23

Process of Data Crawling

The number of revision suggestions in each category

SLIDE 24

Process of Data Crawling

  • Example instances of the contributed dataset.

SLIDE 25

Process of Data Crawling

Differences between our dataset and others’.

SLIDE 26

Overview

  • 1. Motivation
  • 2. Introduction to Stack Overflow
  • 3. Related Work in Question Quality Improvement
  • 4. Introduction to the dataset
  • 5. Experiment:
      a. Models
      b. Result
  • 6. Contribution
  • 7. Future Work

SLIDE 27

Experiment Setup

We want to answer two questions with our dataset:

  • 1. What is the difference in answer count after taking a specific suggestion?
  • 2. What is the optimal suggestion for a low-quality question?

SLIDE 28

Experiment Setup

We cannot observe the counterfactual outcome for each suggestion type directly; here Y is the reward (answer count), X is the question text, and T is the treatment (suggestion type). We want to break the dependence between the question text and the treatment so that any treatment can be evaluated. We choose three SOTA causal inference models to estimate the causal effect of taking a specific suggestion.
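In standard potential-outcomes notation (not spelled out on the slide), each question $x_i$ has a potential reward $Y_i(t)$ for every suggestion type $t$, but only the outcome of the suggestion actually taken is observed:

$$ Y_i^{\mathrm{obs}} = Y_i(T_i), \qquad Y_i(t) \ \text{is unobserved for } t \neq T_i. $$

Because the suggestion a question receives depends on its text (for example, a question missing code tends to draw a “code” suggestion), T is confounded with X, which is why the models below must adjust for X.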

SLIDE 29

Experiment Setup

Question 1: What is the difference in answer count after taking a specific suggestion?

SLIDE 30

Experiment Setup

Question 1: What is the difference in answer count before and after taking a specific suggestion? We estimate the conditional average treatment effect (CATE) for each revision suggestion separately.

Our target is to learn a function that enables us to approximate the CATE.

Evaluation metrics: mean squared error and Precision in Estimation of Heterogeneous Effect (PEHE).
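For reference, the usual forms of these quantities (the slide itself does not show the formulas): treating each suggestion as a binary treatment $t \in \{0, 1\}$,

$$ \tau(x) = \mathbb{E}\big[\, Y(1) - Y(0) \mid X = x \,\big], \qquad \epsilon_{\mathrm{PEHE}} = \frac{1}{n} \sum_{i=1}^{n} \big( \hat{\tau}(x_i) - \tau(x_i) \big)^2 . $$

PEHE is the mean squared error of the estimated individual-level effects, so lower is better for both metrics.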

SLIDE 31

Experiment Setup

Question 2: What is the optimal suggestion for a low-quality question?

Choose the treatment that yields the greatest improvement in the outcome; see the decision rule and sketch below.
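In symbols, the rule is $t^*(x) = \arg\max_{t} \hat{\tau}_t(x)$. A minimal sketch in code, where `estimate_cate(x, t)` is a hypothetical interface standing in for any fitted effect estimator (BART, CEVAE, or CFRnet):

```python
# Minimal sketch: choose the optimal suggestion as the treatment with the
# largest estimated effect. `estimate_cate` is a hypothetical stand-in.
import numpy as np

SUGGESTIONS = ["clarification", "example", "attempt", "solution",
               "code", "version", "error"]

def best_suggestion(question_features, estimate_cate):
    """Return the suggestion with the highest predicted answer-count gain."""
    effects = [estimate_cate(question_features, t) for t in SUGGESTIONS]
    return SUGGESTIONS[int(np.argmax(effects))]
```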

SLIDE 32

SOTA models: BART

  • 1. Bayesian Additive Regression Trees (BART):
  • An additive-error mean regression model
  • It is a sum-of-trees model
  • Each tree’s complexity is constrained by a regularization prior

(Figure: reward prediction with BART.)
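For context, the sum-of-trees model behind these bullets (Chipman et al., 2010) is

$$ Y = \sum_{j=1}^{m} g(x;\, T_j, M_j) + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2), $$

where each $g(x; T_j, M_j)$ is a regression tree with structure $T_j$ and leaf values $M_j$, and the regularization prior keeps every individual tree weak so that no single tree dominates the fit.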

SLIDE 33

SOTA models: CEVAE

Causal Effect Variational Autoencoder (CEVAE)

  • This model estimates the unknown confounders from observational data through a variational autoencoder.
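For context, CEVAE (Louizos et al., 2017) treats the confounder as a latent variable $Z$ and fits the generative model

$$ p(Z, X, T, Y) = p(Z)\, p(X \mid Z)\, p(T \mid Z)\, p(Y \mid T, Z) $$

with variational inference, so the learned posterior over $Z$ stands in for the unobserved confounders when estimating effects.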

SLIDE 34

SOTA models: CFRnet

Counterfactual Regression Networks (CFRnet)

  • This model learns a balanced representation of the control and treatment groups.
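For context, CFRnet (Shalit et al., 2017) learns a representation $\Phi$ and an outcome head $h$ by minimizing factual prediction error plus a balancing penalty between treated and control representations:

$$ \min_{h,\, \Phi} \ \frac{1}{n} \sum_{i} L\big( h(\Phi(x_i), t_i),\, y_i \big) \; + \; \alpha \cdot \mathrm{IPM}\big( \{\Phi(x_i)\}_{t_i=0},\, \{\Phi(x_i)\}_{t_i=1} \big), $$

where the IPM is an integral probability metric such as MMD or the Wasserstein distance.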

SLIDE 35

Features in treatment effect estimation models

SLIDE 36

Experiment Result

  • 1. Result

Metric   BART    CEVAE   CFRnet
MSE      0.041   0.169   0.508
PEHE     0.661   1.030   1.522

From the experiment results, BART achieves the best performance, with the lowest error on both metrics.

SLIDE 37

Experiment Result

Metric     BART    CEVAE   CFRnet
Accuracy   0.086   0.126   0.161

From this accuracy result, CFRnet achieves the highest accuracy.

SLIDE 38

Overview

  • 1. Motivation
  • 2. Introduction to Stack Overflow
  • 3. Related Work in Question Quality Improvement
  • 4. Introduction to the dataset
  • 5. Experiment:
      a. Models
      b. Result
  • 6. Contribution
  • 7. Future Work

SLIDE 39

Contribution

  • 1. Provide a new dataset for the question quality improvement problem:

The dataset contains three main components: (1) context: text features of the questions, (2) treatment: revision suggestions, and (3) outcome: indicators of the quality of the revised question.

  • 2. Demonstrate the utility of this dataset on three causal inference models:

This dataset contains rich information about the revision treatments and various kinds of outcomes. Researchers can discover the treatment from the revision text and estimate its causal effect simultaneously.

SLIDE 40

Future Work

  • 1. Advanced models for feature extraction and classification, such as BERT (a sketch follows below).
  • 2. Generate suggestion text from the question.
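A minimal sketch of the BERT direction, using the Hugging Face `transformers` library; the model name, label count, and example question are illustrative choices rather than the authors’ setup:

```python
# Minimal sketch: encode a question with a pretrained BERT model and classify
# it into one of the seven suggestion types. The classification head here is
# untrained; in practice it would be fine-tuned on the contributed dataset.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=7)  # one label per suggestion type

inputs = tokenizer("How do I sort a dictionary by value?",
                   return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
predicted_type = int(logits.argmax(dim=-1))  # index into the suggestion types
```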

SLIDE 41

References

1. Makoto Kato, Ryen W. White, Jaime Teevan, and Susan Dumais. Clarifications and question specificity in synchronous social Q&A. ACM, April 2013.

2. Manaal Faruqui and Dipanjan Das. Identifying well-formed natural language questions. arXiv e-prints, arXiv:1808.09419, August 2018.

3. Jan Trienes and Krisztian Balog. Identifying unclear questions in community question answering websites. Advances in Information Retrieval, pages 276–289, 2019.

4. Jie Yang, Claudia Hauff, Alessandro Bozzon, and Geert-Jan Houben. Asking the right question in collaborative Q&A systems. In Proceedings of the 25th ACM Conference on Hypertext and Social Media, HT ’14, pages 179–189, New York, NY, USA, 2014. ACM.

5. Jonas Mueller, David N. Reshef, George Du, and Tommi Jaakkola. Learning optimal interventions. arXiv preprint arXiv:1606.05027, 2016.

6. Jonas Mueller, David Gifford, and Tommi Jaakkola. Sequence to better sequence: Continuous revision of combinatorial structures. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 2536–2544, International Convention Centre, Sydney, Australia, 06–11 Aug 2017. PMLR.
