

  1. Sentiment Analysis of Peer Review Texts for Scholarly Papers. Ke Wang & Xiaojun Wan {wangke17,wanxiaojun}@pku.edu.cn. July 9, 2018. Institute of Computer Science and Technology, Peking University, Beijing, China

  2. Outline 1. Introduction 2. Related Work 3. Framework 4. Experiments 5. Conclusion and Future Work

  3. Outline 1. Introduction 2. Related Work 3. Framework 4. Experiments 5. Conclusion and Future Work

  4. Introduction • The boom of scholarly papers • Motivations • Help the review submission system detect the consistency between review texts and scores. • Help the chair write a comprehensive meta-review. • Help authors further improve their papers. Figure 1: An example of a peer review text and the analysis results.

  5. Introduction • Challenges • Long length. • Mixture of non-opinionated and opinionated texts. • Mixture of pros and cons. • Contributions • We built two evaluation datasets (ICLR-2017 and ICLR-2018). • We propose a multiple instance learning network with a novel abstract-based memory mechanism (MILAM). • Evaluation results demonstrate the efficacy of our proposed model and show the great benefit of using the abstract as memory.

  6. Outline 1. Introduction 2. Related Work 3. Framework 4. Experiments 5. Conclusion and Future Work

  7. Related Work • Sentiment Classification: Sentiment analysis has been widely explored in many text domains, but few studies have tried to perform it in the domain of peer reviews for scholarly papers. • Multiple Instance Learning: MIL can extract instance labels (sentence-level polarities) from bags (reviews in our case), but none of the previous work has been applied to this challenging task. • Memory Network: A memory network utilizes external information for greater capacity and efficiency. • Study on Peer Reviews: These tasks are related to, but different from, the sentiment analysis task addressed in this study.

  8. Outline 1. Introduction 2. Related Work 3. Framework 4. Experiments 5. Conclusion and Future Work

  9. Framework • Architecture (see Figure 2). The model has three stacked parts: an input representation layer that turns each sentence of the review and of the paper abstract into a vector via sentence embedding, convolution, and max pooling; a sentence classification layer that matches each review-sentence vector against the abstract sentences through the abstract-based memory mechanism (matched attention, response content, MLP, and softmax); and a review classification layer that combines the sentence-level distributions through document attention into a review-level distribution. Figure 2: The architecture of MILAM

  10. Framework. Input Representation Layer: (I) A sentence $S$ of length $L$ (padded where necessary) is represented as $S = w_1 \oplus w_2 \oplus \cdots \oplus w_L$, $S \in \mathbb{R}^{L \times d}$ (1). (II) The convolutional layer: $f^{(q)}_k = \tanh(W_c \cdot w_{k-l+1:k} + b_c)$ (2), $f^{(q)} = [f^{(q)}_1, f^{(q)}_2, \cdots, f^{(q)}_{L-l+1}]$ (3). (III) A max-pooling layer: $u_q = \max\{f^{(q)}\}$ (4). Finally, the representations of the review text $\{S^r_i\}_{i=1}^{n}$ and the abstract text $\{S^a_j\}_{j=1}^{m}$ are denoted as $[I_i]_{i=1}^{n}$ and $[M_j]_{j=1}^{m}$ respectively, where $I_i, M_j \in \mathbb{R}^{z}$.
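The input representation layer above (Eqs. 1-4) is a standard convolution-plus-max-pooling sentence encoder. A minimal PyTorch sketch follows; the filter width l, the number of filters z, and all parameter names are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class SentenceEncoder(nn.Module):
    """CNN sentence encoder: padded word embeddings -> convolution -> max pooling (Eqs. 1-4)."""
    def __init__(self, d: int = 300, z: int = 128, l: int = 3):
        super().__init__()
        # z convolutional filters of width l over word embeddings of dimension d
        self.conv = nn.Conv1d(in_channels=d, out_channels=z, kernel_size=l)

    def forward(self, S: torch.Tensor) -> torch.Tensor:
        # S: (batch, L, d), one padded sentence per row (Eq. 1)
        f = torch.tanh(self.conv(S.transpose(1, 2)))  # (batch, z, L - l + 1), Eqs. 2-3
        u = f.max(dim=2).values                       # max pooling over positions, Eq. 4
        return u                                      # (batch, z): sentence vectors I_i or M_j
```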

  11. Framework. Sentence Classification Layer: (I) Obtain a matched attention vector $E^{(i)} = [e^{(i)}_t]_{t=1}^{m}$ which indicates the weights of the memories. (II) Calculate the response content $R^{(i)} \in \mathbb{R}^{z}$ using this matched attention vector. (III) Use an MLP to obtain the final representation vector of each sentence in the review text: $V_i = f_{mlp}(I_i \| R^{(i)}; \theta_{mlp})$ (5). (IV) Use the softmax classifier to get the sentence-level distribution over sentiment labels: $P_i = \mathrm{softmax}(W_p \cdot V_i + b_p)$ (6). Finally, we obtain new high-level representations of sentences in the review text by leveraging relevant abstract information.
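Steps III and IV amount to concatenating each sentence vector with its abstract response and passing the result through an MLP and a softmax classifier. A hedged PyTorch sketch, with hidden sizes and parameter names assumed:

```python
import torch
import torch.nn as nn

class SentenceClassifier(nn.Module):
    """Fuse a sentence vector I_i with its abstract response R^(i) and predict
    a sentence-level sentiment distribution P_i (Eqs. 5-6)."""
    def __init__(self, z: int = 128, hidden: int = 128, num_classes: int = 2):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * z, hidden), nn.Tanh())  # f_mlp
        self.out = nn.Linear(hidden, num_classes)                      # W_p, b_p

    def forward(self, I_i: torch.Tensor, R_i: torch.Tensor):
        V_i = self.mlp(torch.cat([I_i, R_i], dim=-1))  # Eq. 5: V_i = f_mlp(I_i || R^(i))
        P_i = torch.softmax(self.out(V_i), dim=-1)     # Eq. 6: sentence-level distribution
        return V_i, P_i
```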

  12. Framework. Review Classification Layer: (I) Use separate LSTM modules to produce forward and backward hidden vectors: $\overrightarrow{h_i} = \overrightarrow{\mathrm{LSTM}}(V_i)$, $\overleftarrow{h_i} = \overleftarrow{\mathrm{LSTM}}(V_i)$, $h_i = \overrightarrow{h_i} \| \overleftarrow{h_i}$ (7). (II) The importance $a_i$ of each sentence is measured as follows: $h'_i = \tanh(W_a \cdot h_i + b_a)$, $a_i = \frac{\exp(h'_i)}{\sum_j \exp(h'_j)}$ (8). (III) Finally, we obtain a document-level distribution over sentiment labels as the weighted sum of the sentence-level distributions: $P^{(c)}_{review} = \sum_i a_i P^{(c)}_i$, $c \in [1, C]$ (9).
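A sketch of the review classification layer under the same assumptions: a bidirectional LSTM over the sentence vectors V_i, a scalar attention score per sentence, and the review-level distribution as the attention-weighted sum of the sentence-level distributions P_i (Eqs. 7-9). Hidden sizes are illustrative.

```python
import torch
import torch.nn as nn

class ReviewClassifier(nn.Module):
    """BiLSTM over sentence vectors, attention weights a_i, and the review-level
    distribution as the weighted sum of sentence distributions (Eqs. 7-9)."""
    def __init__(self, z: int = 128, hidden: int = 128):
        super().__init__()
        self.bilstm = nn.LSTM(z, hidden, batch_first=True, bidirectional=True)
        self.W_a = nn.Linear(2 * hidden, 1)  # scores each h_i with a scalar

    def forward(self, V: torch.Tensor, P: torch.Tensor):
        # V: (batch, n, z) sentence vectors; P: (batch, n, C) sentence distributions
        H, _ = self.bilstm(V)             # Eq. 7: h_i = forward || backward
        scores = torch.tanh(self.W_a(H))  # Eq. 8: h'_i = tanh(W_a h_i + b_a)
        a = torch.softmax(scores, dim=1)  # attention weights a_i over sentences
        P_review = (a * P).sum(dim=1)     # Eq. 9: review-level distribution
        return P_review, a.squeeze(-1)
```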

  13. Framework • Abstract-based Memory Mechanism. (1) Get the matched attention vector $E^{(i)}$ over the memories: $e'_t = \mathrm{LSTM}(\hat{h}_{t-1}, M_t)$, $(\hat{h}_0 = I_i,\ t = 1, \ldots, m)$ (10); $e^{(i)}_t = \frac{\exp(e'_t)}{\sum_j \exp(e'_j)}$ (11); $E^{(i)} = [e^{(i)}_t]_{t=1}^{m}$ (12). (2) Calculate the response content $R^{(i)}$: $R^{(i)} = \sum_{t=1}^{m} e^{(i)}_t M_t$ (13). (3) Use $R^{(i)}$ and $I_i$ to compute the new sentence representation vector $V_i$: $V_i = f_{mlp}(I_i \| R^{(i)}; \theta_{mlp})$ (14).
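The memory mechanism can be sketched as below. The slides write e'_t = LSTM(ĥ_{t−1}, M_t); the scalar projection used here to turn each LSTM state into a score is an assumption made only to keep the sketch self-contained.

```python
import torch
import torch.nn as nn

class AbstractMemory(nn.Module):
    """Match a review-sentence vector I_i against abstract-sentence memories M_1..M_m
    and return the matched attention E^(i) and response content R^(i) (Eqs. 10-13)."""
    def __init__(self, z: int = 128):
        super().__init__()
        self.cell = nn.LSTMCell(z, z)   # consumes one memory M_t per step
        self.score = nn.Linear(z, 1)    # assumed projection of each step to a scalar e'_t

    def forward(self, I_i: torch.Tensor, M: torch.Tensor):
        # I_i: (batch, z); M: (batch, m, z) abstract sentence memories
        h, c = I_i, torch.zeros_like(I_i)         # Eq. 10: ĥ_0 = I_i
        scores = []
        for t in range(M.size(1)):
            h, c = self.cell(M[:, t, :], (h, c))  # ĥ_t from ĥ_{t-1} and M_t
            scores.append(self.score(h))          # e'_t
        e = torch.softmax(torch.cat(scores, dim=1), dim=1)  # Eqs. 11-12: E^(i)
        R_i = torch.bmm(e.unsqueeze(1), M).squeeze(1)       # Eq. 13: R^(i) = Σ_t e_t M_t
        return e, R_i
```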

  14. Framework • Objective Function • Our model only needs the review's sentiment label, while each sentence's sentiment label is unobserved. • The categorical cross-entropy loss: $L(\theta) = -\sum_{review \in T} \sum_{c=1}^{C} P^{(c)}_{review} \log(\bar{P}^{(c)}_{review})$ (15).
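A minimal sketch of the review-level loss (Eq. 15), assuming the gold label is given as a class index and averaging over a batch rather than summing over the whole training set T:

```python
import torch

def review_loss(P_review: torch.Tensor, gold: torch.Tensor) -> torch.Tensor:
    """Categorical cross-entropy on review-level labels only (Eq. 15);
    sentence-level labels are never observed during training.
    P_review: (batch, C) predicted review distributions; gold: (batch,) class indices."""
    eps = 1e-8  # numerical safety for the log
    picked = P_review.gather(1, gold.unsqueeze(1)).squeeze(1)  # probability of the gold class
    return -(picked + eps).log().mean()
```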

  15. Outline 1. Introduction 2. Related Work 3. Framework 4. Experiments 5. Conclusion and Future Work

  16. Experiments • Evaluation Datasets • Statistics for the ICLR-2017 and ICLR-2018 datasets:

      Data Set     #Papers   #Reviews   #Sentences   #Words
      ICLR-2017    490       1517       24497        9868
      ICLR-2018    954       2875       58329        13503

  • The score distributions of the two datasets (figure omitted).

  17. Experiments • Comparison of review sentiment classification accuracy on the 2-class task {reject (score ∈ [1, 5]), accept (score ∈ [6, 10])}

  18. Experiments • Comparison of review sentiment classification accuracy on the 3-class task {reject (score ∈ [1, 4]), borderline (score ∈ [5, 6]), accept (score ∈ [7, 10])}
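The score-to-label bucketing behind the 2-class and 3-class tasks can be written as a small helper. This assumes ICLR's 1-10 rating scale with higher scores meaning a more positive recommendation; the function names are illustrative.

```python
def label_2class(score: int) -> str:
    """2-class task: reject for scores 1-5, accept for scores 6-10."""
    return "accept" if score >= 6 else "reject"

def label_3class(score: int) -> str:
    """3-class task: reject for 1-4, borderline for 5-6, accept for 7-10."""
    if score <= 4:
        return "reject"
    if score <= 6:
        return "borderline"
    return "accept"
```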

  19. Experiments • Sentence-Level Classification Results. We randomly selected 20 reviews, a total of 213 sentences, and manually labeled the sentiment polarity of each sentence. Figure 3: Example opinionated sentences with predicted polarity scores extracted from a review text.

  20. Experiments • Influence of Abstract Text. Figure 4: Example sentences in a review text and the most relevant sentence in the paper abstract for each. The sentence with the largest weight in the matched attention vector $E^{(i)}$ is considered most relevant. The red text marks similar content shared by the review text and the abstract text.

  21. Experiments • Influence of Abstract Text. • A simple method of using abstract texts as a contrast experiment: remove the sentences that are similar to the paper abstract's sentences from the review text and use the remaining text for classification. (The similarity threshold is set to 0.7.) Figure 5: Comparison of using and not using the paper abstract via this simple method.
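This contrast baseline can be sketched as follows. The slides only say "similar" with a threshold of 0.7; cosine similarity between sentence vectors is an assumption, as is the helper name.

```python
import torch
import torch.nn.functional as F

def remove_abstract_like_sentences(review_vecs, abstract_vecs, review_sents, threshold=0.7):
    """Drop review sentences whose best cosine similarity to any abstract sentence exceeds
    the threshold and keep the rest for classification (assumed similarity measure).
    review_vecs: (n, z); abstract_vecs: (m, z); review_sents: list of n sentence strings."""
    r = F.normalize(review_vecs, dim=1)
    a = F.normalize(abstract_vecs, dim=1)
    best_sim = (r @ a.t()).max(dim=1).values  # (n,) best match against the abstract
    return [s for s, sim in zip(review_sents, best_sim.tolist()) if sim <= threshold]
```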

  22. Experiments • Influence of Borderline Reviews. Figure 6: Experimental results on different datasets with, without, and with only borderline reviews.

  23. Experiments • Cross-Year Experiments. Figure 7: Results of cross-year experiments. Model@ICLR-* means the model is trained on the ICLR-* dataset.

  24. Experiments • Cross-Domain Experiments. We further collected 87 peer reviews for submissions to NLP conferences (CoNLL, ACL, EMNLP, etc.), including 57 positive reviews (accept) and 30 negative reviews (reject). Figure 8: Results of cross-domain experiments. * means the performance improvement over the first three methods is statistically significant with p-value < 0.05 by sign test. Model@ICLR-* means the model is trained on the ICLR-* dataset.

  25. Experiments • Final Decision Prediction for Scholarly Papers. • Methods to predict the final decision of a paper based on several review scores. • Voting: $\text{Decision} = \begin{cases} \text{Accept} & \text{if } \#\text{accept} > \#\text{reject} \\ \text{Reject} & \text{otherwise} \end{cases}$ (16). • Simple Average: simply average the scores of all reviews. If the average score is larger than or equal to 0.6, the paper is predicted as a final accept, and otherwise a final reject. • Confidence-based Average: $\text{overall\_score} = \frac{1}{|S|} \sum_{i=1}^{|S|} S_i \cdot \frac{1}{6 - \text{ReviewerConfidence}_i}$ (17).
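The three decision rules can be sketched as plain functions. The 0.6 threshold is taken from the slides and suggests scores normalized to [0, 1]; reusing that threshold for the confidence-based average is an assumption, since the slides only define the overall score (Eq. 17).

```python
def decide_voting(labels):
    """Voting (Eq. 16): accept iff strictly more accept reviews than reject reviews."""
    accepts = sum(1 for x in labels if x == "accept")
    return "Accept" if accepts > len(labels) - accepts else "Reject"

def decide_simple_average(scores, threshold=0.6):
    """Simple average: accept iff the mean review score reaches the threshold."""
    return "Accept" if sum(scores) / len(scores) >= threshold else "Reject"

def decide_confidence_average(scores, confidences, threshold=0.6):
    """Confidence-based average (Eq. 17): weight each score by 1 / (6 - confidence),
    so reviews with higher confidence (1-5 scale) count more."""
    overall = sum(s / (6 - c) for s, c in zip(scores, confidences)) / len(scores)
    return "Accept" if overall >= threshold else "Reject"
```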

  26. Experiments • Final Decision Prediction for Scholarly Papers. • Results of final decision prediction for scholarly papers. Figure 9: Results of final decision prediction for scholarly papers.

  27. Outline 1. Introduction 2. Related Work 3. Framework 4. Experiments 5. Conclusion and Future Work

  28. Conclusion and Future Work • Contributions • We built two evaluation datasets (ICLR-2017 and ICLR-2018). • We propose a multiple instance learning network with a novel abstract-based memory mechanism (MILAM). • Evaluation results demonstrate the efficacy of our proposed model and show the great benefit of using the abstract as memory. • Future Work • Collect more peer reviews. • Try more sophisticated deep learning techniques. • Several other sentiment analysis tasks: prediction of fine-grained review scores, automatic writing of meta-reviews, prediction of best papers...
