
Towards discourse annotation and sentiment analysis of the Basque Opinion Corpus



  1. Towards discourse annotation and sentiment analysis of the Basque Opinion Corpus. Workshop on Discourse Relation Parsing and Treebanking (NAACL-HLT 2019). Jon Alkorta, Koldo Gojenola & Mikel Iruskieta, IXA NLP group, University of the Basque Country (UPV/EHU). Minneapolis, Minnesota (USA), 6th June, 2019. Sections: Introduction and Related Works; Theoretical framework and methodology; Results and discussion; Conclusion and Future Work; References.

  2. Outline: 1 Introduction and Related Works (Introduction; Related Works); 2 Theoretical framework and methodology; 3 Results and discussion; 4 Conclusion and Future Work.

  3. Introduction. Aims of sentiment analysis: i) Document level sentiment classification. A positive or negative evaluation [Pang et al., 2002, Turney, 2002]. ii) Subjectivity classification at sentence level. A subjective or objective (factual) sentence [Wiebe et al., 1999]. iii) Aspect and entity level. Identification of the target of one positive or negative opinion [Hu and Liu, 2004].

  4. Introduction. Apart from basic resources, a corpus with subjective information is indispensable for sentiment analysis. Examples: Linguistic knowledge: analysis of different linguistic phenomena related to sentiment analysis. Statistical analysis: extraction of patterns of different linguistic phenomena. The aim of this work: annotate the rhetorical structure of an opinionated corpus in Basque to check the semantic orientation of rhetorical relations.

  5. Related works (Author. Theory. Corpus. Annotation. Results).
     [Refaee and Rieser, 2014]. Theory: -. Corpus: 8,868 tweets in Arabic. Annotation: semantic orientation, grammatical features. Results: Kappa: 0.84.
     [Chardon et al., 2013]. Theory: SDRT. Corpus: 211 texts (movie reviews, news reactions). Annotation: EDUs: subjectivity. Documents: subjectivity and discourse relations. Results: Kappa. EDUs: 0.69, 0.44. Documents: 0.73, 0.58.
     [Asher et al., 2009]. Theory: SDRT. Corpus: +300 texts (movies, letters, reports). Annotation: discourse and subjectivity annotation. Results: Categorization: 95%. Segmentation: 82%.
     [Mittal et al., 2013]. Theory: -. Corpus: 662 reviews in Hindi. Annotation: violating expectation conjunctions, negation. Results: Discourse + negation, the accuracy: 50.45 to 80.21.

  6. Outline (current section): 2 Theoretical framework and methodology: Theoretical framework; Methodology.

  7. Theoretical framework: Rhetorical Structure Theory (RST).

  8. The Basque Opinion Corpus. 240 opinion texts collected from different websites. Opinion texts of six different domains: sports, politics, music, movies, literature books and weather. Usefulness for sentiment analysis: The first person: 1.21% in a Basque objective corpus (Basque Wikipedia) vs. 8.37% in the Basque Opinion Corpus. 8.50% of the words correspond to adjectives in Basque Wikipedia and 9.82% in the corpus for study. Negation, irrealis blocking and discourse markers are also in the corpus.

  9. Methodology steps. 1- Set the stage for the annotating work.
     Annotated texts (columns: A1, A2, Total). Movie: 21 + 9, 9, 30. Rows for Weather, Literature and Total, in extraction order: 20 + 5, 5, 25; 10 + 5, 5, 15; 50, 39, 70.
     2- Annotation procedure and process. Following the annotation guidelines proposed by [Das and Taboada, 2018]. Weather texts were annotated in 20 minutes while movie and literature texts were annotated in one hour.

  10. Methodology steps. 3- Measurement of inter-annotator agreement. Inter-annotator agreement was measured in two ways: The qualitative evaluation method [Iruskieta et al., 2015] using F-measure. In contrast with the qualitative evaluation, the manual evaluation did not take the central subconstituent factor into account. 4- Semantic orientation extraction. Use of the Basque version of the SO-CAL tool [Taboada et al., 2011]. Extraction of the sentiment valence of 75 instances of CONCESSION and EVALUATION relations.

  11. Methodology steps. 5- Results. Percentage of rhetorical relations with the same label annotated by two persons. Accumulated values of sentiment valences in nuclei and satellites in texts of different domains.

  12. Outline (current section): 3 Results and discussion: Results: inter-annotator agreement; Results: subjectivity extraction from rhetorical relations; Discussion: inter-annotator agreement; Discussion: usefulness of the corpus for sentiment analysis.

  13. Results: inter-annotator agreement. RST annotation: inter-annotator agreement. Type of rhetorical relation.
      Domain: Agreement (RR), Agreement (%).
      Weather: 83 of 220, 37.73.
      Literature: 17 of 39, 43.59.
      Movies: 70 of 168, 41.67.
      Total: 170 of 427, 39.81.
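The agreement percentages on slide 13 can be rechecked with a few lines of Python. This is only a sketch, not the authors' tooling: the function name `agreement_percentage` is made up for illustration, and the domain labels follow the reading in which 170 of 427 is the overall total (83 + 70 + 17 matching relations out of 220 + 168 + 39 compared).

```python
def agreement_percentage(matching: int, compared: int) -> float:
    """Percentage of compared rhetorical relations that both annotators labelled identically."""
    if compared <= 0:
        raise ValueError("compared must be positive")
    return round(100 * matching / compared, 2)


# (matching relations, compared relations) per domain, as read from the slide.
# Assumption: the overall total is the sum of the three domains.
counts = {"Weather": (83, 220), "Literature": (17, 39), "Movies": (70, 168)}

per_domain = {domain: agreement_percentage(m, c) for domain, (m, c) in counts.items()}
total_matching = sum(m for m, _ in counts.values())
total_compared = sum(c for _, c in counts.values())
overall = agreement_percentage(total_matching, total_compared)
```

Under this reading the per-domain figures and the overall figure reproduce the percentages shown on the slide.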

  14. Results: subjectivity extraction from rhetorical relations. Sentiment analysis: sentiment valence of rhetorical relations. We sum all the sentiment valence of words of CONCESSION and EVALUATION rhetorical relations. The results of the sum are given based on nuclearity.
      Sum of sentiment valences, by domain:
      Weather: CONCESSION nucleus 13.98, satellite 19.45; EVALUATION nucleus 26.01, satellite 45.58.
      Literature: CONCESSION nucleus 61.02, satellite 68.73; EVALUATION nucleus 53.13, satellite 80.30.
      Movies: CONCESSION nucleus 39.41, satellite 39.75; EVALUATION nucleus 49.86, satellite 33.35.
      Total: CONCESSION nucleus 114.41 (47.21%), satellite 127.93 (52.79%); EVALUATION nucleus 128.99 (45.00%), satellite 159.23 (55.00%).

  15. Discussion: usefulness of the corpus for sentiment analysis.
      CONCESSION. [S[Puntu ahulak izan arren,]−1.5 N[film erakargarri eta berezia da Victoria.]+6]+4.5 (ZIN19). English: [S[Although it has weak points,]−1.5 N[Victoria is an entertaining and special movie.]+6]+4.5
      EVALUATION. [N[Bada, erraz ikusten den filma da “The danish girl”.]+1 S[Atsegina da, hunkigarria, entretenigarria]+6]+7 (ZIN15). English: [N[So, “The danish girl” is a film easy to watch.]+1 S[It is nice, touching, entertaining.]+6]+7
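The two examples on slide 15 show how the valence of a span is the sum of its nucleus and satellite valences, and how those values can be accumulated per relation and nuclearity. The Python sketch below illustrates that bookkeeping only. It does not call SO-CAL; the valences are copied from the examples, the tuple layout and function names are hypothetical, and summing signed values is an assumption about how the totals were accumulated.

```python
from collections import defaultdict

# (relation, nucleus valence, satellite valence), copied from the two slide examples.
instances = [
    ("CONCESSION", 6.0, -1.5),  # ZIN19
    ("EVALUATION", 1.0, 6.0),   # ZIN15
]


def span_valence(nucleus: float, satellite: float) -> float:
    """Valence of the whole span: nucleus plus satellite."""
    return nucleus + satellite


def accumulate_by_nuclearity(instances):
    """Sum signed valences per relation, kept separate for nuclei and satellites."""
    sums = defaultdict(lambda: {"nucleus": 0.0, "satellite": 0.0})
    for relation, nucleus, satellite in instances:
        sums[relation]["nucleus"] += nucleus
        sums[relation]["satellite"] += satellite
    return dict(sums)


totals = accumulate_by_nuclearity(instances)
span_totals = [span_valence(n, s) for _, n, s in instances]
```

With more instances in the list, `totals` would hold the accumulated nucleus and satellite sums per relation in the same shape as the table of summed valences.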

  16. Discussion: inter-annotator agreement. RST annotation: inter-annotator agreement. Automatic evaluation in a more strict scenario (if and only if the central subconstituent is the same) following [Iruskieta et al., 2015]. Constituent (C). All the EDUs that compose each discourse unit or span. Attachment point. The node in the RS-tree to which the relation is attached. N-S or nuclearity. Specification of the compared relations regarding direction (NS, SN or NN). Relation. The same type of rhetorical relation to the attachment point of two or more EDUs in order to get the same effect.
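The four factors listed on slide 16 (constituent, attachment point, nuclearity, relation) can be compared mechanically once each annotated relation is stored as a record. The Python sketch below is a simplified exact-match comparison, not the evaluation method of [Iruskieta et al., 2015]. The `Relation` class, its EDU-index encoding, the NS/SN/NN direction codes and the sample annotations are all assumptions made for illustration, and the central subconstituent check is left out.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Relation:
    constituent: tuple       # EDU indices covered by the span (hypothetical encoding)
    attachment_point: tuple  # EDU indices of the node the relation attaches to
    nuclearity: str          # assumed codes: "NS", "SN" or "NN"
    label: str               # relation name, e.g. "CONCESSION"


def factor_agreement(a: Relation, b: Relation) -> dict:
    """Report, factor by factor, whether two annotations of a span coincide."""
    return {f.name: getattr(a, f.name) == getattr(b, f.name) for f in fields(Relation)}


def f_measure(annotator_1: list, annotator_2: list) -> float:
    """F-measure over relations that match on all four factors."""
    if not annotator_1 or not annotator_2:
        return 0.0
    matches = len(set(annotator_1) & set(annotator_2))
    if matches == 0:
        return 0.0
    precision = matches / len(annotator_2)
    recall = matches / len(annotator_1)
    return 2 * precision * recall / (precision + recall)


# Invented sample annotations: the annotators disagree only on the second label.
a1 = [Relation((1, 2), (2,), "SN", "CONCESSION"), Relation((3, 4), (3,), "NS", "EVALUATION")]
a2 = [Relation((1, 2), (2,), "SN", "CONCESSION"), Relation((3, 4), (3,), "NS", "ELABORATION")]

score = f_measure(a1, a2)
difference = factor_agreement(a1[1], a2[1])
```

Keeping the per-factor report separate from the overall score makes it possible to see which factor caused a disagreement, which is the kind of information a qualitative evaluation is after.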
