Sentiment Analysis of Peer Review Texts for Scholarly Papers
SLIDE 1

Sentiment Analysis of Peer Review Texts for Scholarly Papers

Ke Wang & Xiaojun Wan {wangke17,wanxiaojun}@pku.edu.cn July 9, 2018

Institute of Computer Science and Technology, Peking University, Beijing, China

slide-2
SLIDE 2

Outline

  • 1. Introduction
  • 2. Related Work
  • 3. Framework
  • 4. Experiments
  • 5. Conclusion and Future Work

SLIDE 4

Introduction

  • The boom of scholarly papers
  • Motivations
    • Help the review submission system detect the consistency between review texts and scores.
    • Help the chair write a comprehensive meta-review.
    • Help authors further improve their papers.

Figure 1: An example of peer review text and the analysis results.

SLIDE 5

Introduction

  • Challenges
    • Long length.
    • Mixture of non-opinionated and opinionated texts.
    • Mixture of pros and cons.
  • Contributions
    • We built two evaluation datasets (ICLR-2017 and ICLR-2018).
    • We propose a multiple instance learning network with a novel abstract-based memory mechanism (MILAM).
    • Evaluation results demonstrate the efficacy of our proposed model and show the great helpfulness of using the abstract as memory.

SLIDE 6

Outline

  • 1. Introduction
  • 2. Related Work
  • 3. Framework
  • 4. Experiments
  • 5. Conclusion and Future Work

SLIDE 7

Related Work

  • Sentiment Classification
    Sentiment analysis has been widely explored in many text domains, but few studies have attempted it in the domain of peer reviews for scholarly papers.
  • Multiple Instance Learning
    MIL can extract instance labels (sentence-level polarities) from bags (reviews in our case), but no previous work has applied it to this challenging task.
  • Memory Network
    A memory network utilizes external information for greater capacity and efficiency.
  • Study on Peer Reviews
    These tasks are related to, but different from, the sentiment analysis task addressed in this study.

SLIDE 8

Outline

  • 1. Introduction
  • 2. Related Work
  • 3. Framework
  • 4. Experiments
  • 5. Conclusion and Future Work

SLIDE 9

Framework

  • Architecture: three layers plus an abstract-based memory mechanism
    1. Input Representation Layer: sentence embedding via convolution and max-pooling over the review sentences $\{S^r_i\}$ and the abstract sentences $\{S^a_j\}$.
    2. Sentence Classification Layer: matched attention over the abstract memories produces a response content vector; an MLP and a softmax yield sentence-level sentiment distributions $P_i$.
    3. Review Classification Layer: document attention combines the sentence-level distributions into the review-level distribution $P_{review}$.

Figure 2: The architecture of MILAM.

SLIDE 10

Framework

1. Input Representation Layer:

   I. A sentence $S$ of length $L$ (padded where necessary) is represented as:
      $$S = w_1 \oplus w_2 \oplus \cdots \oplus w_L, \quad S \in \mathbb{R}^{L \times d} \tag{1}$$
   II. The convolutional layer:
      $$f_k = \tanh(W_c \cdot w_{k-l+1:k} + b_c) \tag{2}$$
      $$f^{(q)} = [f^{(q)}_1, f^{(q)}_2, \cdots, f^{(q)}_{L-l+1}] \tag{3}$$
   III. A max-pooling layer:
      $$u_q = \max\{f^{(q)}\} \tag{4}$$

   Finally, the representations of the review text $\{S^r_i\}_{i=1}^{n}$ and the abstract text $\{S^a_j\}_{j=1}^{m}$ are denoted as $[I_i]_{i=1}^{n}$ and $[M_j]_{j=1}^{m}$ respectively, where $I_i, M_j \in \mathbb{R}^{z}$.
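To make the layer concrete, here is a minimal PyTorch sketch of Eqs. (1)-(4). The hyperparameters (embedding dimension d, filter width l, z filters) are illustrative placeholders, not the paper's settings:

```python
import torch
import torch.nn as nn

class SentenceEncoder(nn.Module):
    """Sketch of the Input Representation Layer: a 1-D convolution over
    word embeddings followed by max-pooling over time (Eqs. 1-4)."""
    def __init__(self, vocab_size, d=300, l=3, z=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d, padding_idx=0)
        # z filters of width l slide over the embedding sequence
        self.conv = nn.Conv1d(in_channels=d, out_channels=z, kernel_size=l)

    def forward(self, token_ids):                     # (batch, L) word ids
        S = self.embed(token_ids)                     # Eq. (1): (batch, L, d)
        f = torch.tanh(self.conv(S.transpose(1, 2)))  # Eqs. (2)-(3): (batch, z, L-l+1)
        u = f.max(dim=2).values                       # Eq. (4): max over time
        return u                                      # sentence vector I_i (or M_j), (batch, z)
```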

SLIDE 11

Framework

2. Sentence Classification Layer:

   I. Obtain a matched attention vector $E^{(i)} = [e^{(i)}_t]_{t=1}^{m}$, which indicates the weights of the memories.
   II. Calculate the response content $R^{(i)} \in \mathbb{R}^{z}$ using this matched attention vector.
   III. Use an MLP to obtain the final representation vector of each sentence in the review text:
      $$V_i = f_{mlp}(I_i \| R^{(i)}; \theta_{mlp}) \tag{5}$$
   IV. Use the softmax classifier to get a sentence-level distribution over sentiment labels:
      $$P_i = \mathrm{softmax}(W_p \cdot V_i + b_p) \tag{6}$$

   Finally, we obtain new high-level representations of the sentences in the review text by leveraging relevant abstract information.
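A minimal sketch of steps III-IV (Eqs. (5)-(6)), assuming concatenation for $I_i \| R^{(i)}$ and a single-hidden-layer MLP; the layer sizes are illustrative:

```python
import torch
import torch.nn as nn

class SentenceClassifier(nn.Module):
    """Sketch of Eqs. (5)-(6): fuse the sentence vector I_i with the
    response content R^(i), then classify into C sentiment labels."""
    def __init__(self, z=100, hidden=100, num_classes=2):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * z, hidden), nn.Tanh())  # f_mlp
        self.out = nn.Linear(hidden, num_classes)                      # W_p, b_p

    def forward(self, I_i, R_i):                        # each: (batch, z)
        V_i = self.mlp(torch.cat([I_i, R_i], dim=-1))   # Eq. (5): I_i || R^(i)
        return torch.softmax(self.out(V_i), dim=-1)     # Eq. (6): P_i
```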

SLIDE 12

Framework

3. Review Classification Layer:

   I. Use separate LSTM modules to produce forward and backward hidden vectors:
      $$\overrightarrow{h_i} = \overrightarrow{\mathrm{LSTM}}(V_i), \quad \overleftarrow{h_i} = \overleftarrow{\mathrm{LSTM}}(V_i), \quad h_i = \overrightarrow{h_i} \| \overleftarrow{h_i} \tag{7}$$
   II. The importance $a_i$ of each sentence is measured as follows:
      $$h'_i = \tanh(W_a \cdot h_i + b_a), \quad a_i = \frac{\exp(h'_i)}{\sum_j \exp(h'_j)} \tag{8}$$
   III. Finally, we obtain a document-level distribution over sentiment labels as the weighted sum of the sentence-level distributions:
      $$P^{(c)}_{review} = \sum_i a_i P^{(c)}_i, \quad c \in [1, C] \tag{9}$$
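A minimal PyTorch sketch of Eqs. (7)-(9), assuming the BiLSTM reads the sentence vectors $V_i$ and each attention score is a scalar; the hidden size is illustrative:

```python
import torch
import torch.nn as nn

class ReviewClassifier(nn.Module):
    """Sketch of Eqs. (7)-(9): BiLSTM over sentence vectors, scalar
    attention weights, and an attention-weighted sum of sentence-level
    sentiment distributions."""
    def __init__(self, z=100, hidden=100):
        super().__init__()
        self.bilstm = nn.LSTM(z, hidden, bidirectional=True, batch_first=True)
        self.att = nn.Linear(2 * hidden, 1)   # W_a, b_a -> scalar score per sentence

    def forward(self, V, P):                  # V: (batch, n, z); P: (batch, n, C)
        h, _ = self.bilstm(V)                 # Eq. (7): h_i = h_fwd || h_bwd
        scores = torch.tanh(self.att(h))      # Eq. (8): h'_i, (batch, n, 1)
        a = torch.softmax(scores, dim=1)      # attention weights a_i
        return (a * P).sum(dim=1)             # Eq. (9): P_review, (batch, C)
```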

SLIDE 13

Framework

  • Abstract-based Memory Mechanism

   1. Get the matched attention vector $E^{(i)}$ over the memories:
      $$e'_t = \mathrm{LSTM}(\hat{h}_{t-1}, M_t), \quad (\hat{h}_0 = I_i,\ t = 1, \ldots, m) \tag{10}$$
      $$e^{(i)}_t = \frac{\exp(e'_t)}{\sum_j \exp(e'_j)} \tag{11}$$
      $$E^{(i)} = [e^{(i)}_t]_{t=1}^{m} \tag{12}$$
   2. Calculate the response content $R^{(i)}$:
      $$R^{(i)} = \sum_{t=1}^{m} e^{(i)}_t M_t \tag{13}$$
   3. Use $R^{(i)}$ and $I_i$ to compute the new sentence representation vector $V_i$:
      $$V_i = f_{mlp}(I_i \| R^{(i)}; \theta_{mlp}) \tag{14}$$
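A minimal sketch of Eqs. (10)-(13). How the scalar score $e'_t$ is read off the LSTM state is not spelled out on the slide, so the linear read-out below is an assumption:

```python
import torch
import torch.nn as nn

class AbstractMemory(nn.Module):
    """Sketch of the Abstract-based Memory Mechanism: an LSTM scans the
    abstract sentence vectors M_t, initialized with the review sentence
    vector I_i; each hidden state is projected to a scalar match score,
    softmax-normalized, and used to pool the memories."""
    def __init__(self, z=100):
        super().__init__()
        self.cell = nn.LSTMCell(z, z)
        self.score = nn.Linear(z, 1)   # assumed scalar read-out for e'_t

    def forward(self, I_i, M):         # I_i: (batch, z); M: (batch, m, z)
        h, c = I_i, torch.zeros_like(I_i)         # hat{h}_0 = I_i (Eq. 10)
        scores = []
        for t in range(M.size(1)):
            h, c = self.cell(M[:, t], (h, c))     # scan memory M_t
            scores.append(self.score(h))          # e'_t
        E = torch.softmax(torch.cat(scores, dim=-1), dim=-1)  # Eqs. (11)-(12)
        R = torch.bmm(E.unsqueeze(1), M).squeeze(1)           # Eq. (13)
        return E, R
```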

SLIDE 14

Framework

  • Objective Function
    • Our model only needs the review's sentiment label, while each sentence's sentiment label is unobserved.
    • The categorical cross-entropy loss:
      $$L(\theta) = \sum_{T_{review}} \sum_{c=1}^{C} -P^{(c)}_{review} \log(\bar{P}^{(c)}_{review}) \tag{15}$$
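A minimal sketch of Eq. (15), assuming $P$ is the gold review-level distribution and $\bar{P}$ the predicted one (the slide does not state which is which):

```python
import torch

def review_loss(P_gold, P_pred, eps=1e-8):
    """Eq. (15): categorical cross-entropy using review-level labels only.
    P_gold: (batch, C) gold (e.g. one-hot) distributions, assumed to be P.
    P_pred: (batch, C) predicted distributions, assumed to be P-bar."""
    return -(P_gold * torch.log(P_pred + eps)).sum(dim=1).mean()
```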

SLIDE 15

Outline

  • 1. Introduction
  • 2. Related Work
  • 3. Framework
  • 4. Experiments
  • 5. Conclusion and Future Work

SLIDE 16

Experiments

  • Evaluation Datasets
  • Statistics for the ICLR-2017 and ICLR-2018 datasets:

    Data Set  | #Papers | #Reviews | #Sentences | #Words
    ICLR-2017 |     490 |     1517 |      24497 |   9868
    ICLR-2018 |     954 |     2875 |      58329 |  13503

  • The score distributions:

SLIDE 17

Experiments

  • Comparison of review sentiment classification accuracy on the 2-class task {reject (score ∈ [1, 5]), accept (score ∈ [6, 10])}.

SLIDE 18

Experiments

  • Comparison of review sentiment classification accuracy on the 3-class task {reject (score ∈ [1, 4]), borderline (score ∈ [5, 6]), accept (score ∈ [7, 10])}.
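For concreteness, a small helper that restates the bucket boundaries of both tasks:

```python
def score_to_label(score, classes=2):
    """Map a raw ICLR review score (1-10) to a task label."""
    if classes == 2:
        return "accept" if score >= 6 else "reject"     # 2-class boundaries
    if 5 <= score <= 6:                                  # 3-class boundaries
        return "borderline"
    return "accept" if score >= 7 else "reject"
```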

SLIDE 19

Experiments

  • Sentence-Level Classification Results.
    We randomly selected 20 reviews (213 sentences in total) and manually labeled the sentiment polarity of each sentence.

Figure 3: Example opinionated sentences with predicted polarity scores, extracted from a review text.

SLIDE 20

Experiments

  • Influence of Abstract Text.

Figure 4: Example sentences in a review text and their most relevant sentences in the paper abstract. The sentence with the largest weight in the matched attention vector $E^{(i)}$ is considered most relevant. The red text indicates similarities between the review text and the abstract text.

SLIDE 21

Experiments

  • Influence of Abstract Text.
  • A simple contrast experiment using abstract texts: remove from the review text the sentences that are similar to the paper abstract's sentences (similarity threshold 0.7) and use the remaining text for classification, as sketched below.

Figure 5: Comparison of using and not using the paper abstract via this simple method.
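A minimal sketch of this contrast method; the similarity measure (cosine over sentence vectors) is an assumption, since the slide only gives the 0.7 threshold:

```python
import numpy as np

def filter_review(review_vecs, abstract_vecs, review_sents, threshold=0.7):
    """Drop review sentences whose maximum similarity to any abstract
    sentence reaches the threshold; classify on what remains."""
    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

    kept = []
    for sent, v in zip(review_sents, review_vecs):
        if max(cosine(v, m) for m in abstract_vecs) < threshold:
            kept.append(sent)   # keep only sentences dissimilar to the abstract
    return kept
```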

SLIDE 22

Experiments

  • Influence of Borderline Reviews.

Figure 6: Experimental results on the different datasets with, without, and with only borderline reviews.

SLIDE 23

Experiments

  • Cross-Year Experiments.

Figure 7: Results of cross-year experiments. Model@ICLR-∗ means the model is trained on the ICLR-∗ dataset.

SLIDE 24

Experiments

  • Cross-Domain Experiments.

We further collected 87 peer reviews for submissions to NLP conferences (CoNLL, ACL, EMNLP, etc.), including 57 positive reviews (accept) and 30 negative reviews (reject).

Figure 8: Results of cross-domain experiments. ∗ means the performance improvement over the first three methods is statistically significant (p-value < 0.05, sign test). Model@ICLR-∗ means the model is trained on the ICLR-∗ dataset.

SLIDE 25

Experiments

  • Final Decision Prediction for Scholarly Papers.
  • Methods to predict the final decision of a paper based on its several review scores (see the sketch after this list):
    • Voting:
      $$\mathrm{Decision} = \begin{cases} \text{Accept} & \text{if } \#accept > \#reject \\ \text{Reject} & \text{otherwise} \end{cases} \tag{16}$$
    • Simple Average:
      Simply average the scores of all reviews. If the average score is larger than or equal to 0.6, the paper is predicted as final accept; otherwise final reject.
    • Confidence-based Average:
      $$\mathrm{Overall\_score} = \frac{1}{|S|} \sum_{i=1}^{|S|} S_i \cdot \frac{1}{6 - \mathrm{ReviewerConfidence}_i} \tag{17}$$
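The three methods translate directly into plain Python. The reading of Eq. (17)'s weighting, $1/(6 - \mathrm{ReviewerConfidence}_i)$, which favors confident reviewers, is an assumption given the garbled slide layout:

```python
def voting(labels):
    """Eq. (16): majority vote over per-review accept/reject labels."""
    accept = sum(1 for l in labels if l == "accept")
    return "Accept" if accept > len(labels) - accept else "Reject"

def simple_average(scores, threshold=0.6):
    """Simple Average: accept iff the mean score reaches the threshold.
    The slide's threshold of 0.6 suggests scores on a normalized scale;
    treated here as given."""
    return "Accept" if sum(scores) / len(scores) >= threshold else "Reject"

def confidence_average(scores, confidences):
    """Eq. (17): weight each score by 1 / (6 - reviewer confidence),
    so high-confidence reviews count more (assumed reading)."""
    return sum(s / (6 - c) for s, c in zip(scores, confidences)) / len(scores)
```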

SLIDE 26

Experiments

  • Final Decision Prediction for Scholarly Papers.

Figure 9: Results of final decision prediction for scholarly papers.

SLIDE 27

Outline

  • 1. Introduction
  • 2. Related Work
  • 3. Framework
  • 4. Experiments
  • 5. Conclusion and Future Work

SLIDE 28

Conclusion and Future Work

  • Contributions
    • We built two evaluation datasets (ICLR-2017 and ICLR-2018).
    • We propose a multiple instance learning network with a novel abstract-based memory mechanism (MILAM).
    • Evaluation results demonstrate the efficacy of our proposed model and show the great helpfulness of using the abstract as memory.
  • Future Work
    • Collect more peer reviews.
    • Try more sophisticated deep learning techniques.
    • Several other sentiment analysis tasks: prediction of the fine-grained scores of reviews, automatic writing of meta-reviews, prediction of the best papers, ...

SLIDE 29

Acknowledgments

  • National Natural Science Foundation of China.
  • Anonymous reviewers for their helpful comments.
  • SIGIR Student Travel Grant.
