Aspect-Oriented Opinion Mining from User Reviews in Croatian

SLIDE 1

University of Zagreb, Faculty of Electrical Engineering and Computing
Ruđer Bošković Institute
Text Analysis and Knowledge Engineering Lab

Aspect-Oriented Opinion Mining from User Reviews in Croatian

Goran Glavaš, Damir Korenčić, Jan Šnajder

August 8th, 2013
Balto-Slavic Natural Language Processing 2013

SLIDE 2

Introduction

User review

Really laudable! Food was delivered 15 minutes early. We ordered pizza which was filled with extras, well-baked, and very tasteful.

Rating: 6/6

Aspect-oriented opinion mining

Construction of opinion lexicon
  product aspects
  opinion clues
Extraction of opinionated aspects
Prediction of overall review opinion

UNIZG, FER, TakeLab | BSNLP ACL 2013 | August 8th, 2013 2/15

SLIDE 3

Introduction

User review

Really laudable! Food was delivered 15 minutes early. We ordered pizza which was filled with extras, well-baked, and very tasteful.

Lexicon

aspects: food, deliver, pizza
clues: laudable, early, filled, well-baked, tasteful

Opinionated aspects

(deliver, early), (pizza, filled), (pizza, well-baked), (pizza, tasteful)

Review opinion

positive 6/6

SLIDE 4

Preprocessing

Spell checking with GNU Aspell
Lemmatization [Šnajder et al., 2008]
POS tagging [Agić et al., 2008]
Dependency parsing [Agić, 2012]

SLIDE 5

Opinion lexicon

Candidates for positive/negative clues are lemmas that appear much more frequently in positive/negative reviews
Aspect candidates are lemmas that frequently co-occur with opinion clues
Manual filtering of the initial lists of candidates
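The clue-candidate step can be sketched roughly as follows. This is only an illustration of "much more frequently", not the authors' procedure: the smoothed frequency-ratio test, the `min_ratio` and `min_count` thresholds, and the function name `clue_candidates` are all assumptions, and the aspect-candidate and manual filtering steps are not covered.

```python
from collections import Counter


def clue_candidates(pos_reviews, neg_reviews, min_ratio=3.0, min_count=2):
    """Return (positive, negative) clue candidates from lemmatized reviews.

    Each review is a list of lemmas. A lemma becomes a positive candidate
    when its smoothed relative frequency in positive reviews is at least
    `min_ratio` times that in negative reviews, and vice versa.
    The ratio test and both thresholds are illustrative assumptions.
    """
    pos_counts = Counter(lemma for review in pos_reviews for lemma in review)
    neg_counts = Counter(lemma for review in neg_reviews for lemma in review)
    pos_total = sum(pos_counts.values())
    neg_total = sum(neg_counts.values())
    vocab = set(pos_counts) | set(neg_counts)

    positive, negative = [], []
    for lemma in sorted(vocab):
        # Add-one smoothing keeps the ratio finite for unseen lemmas.
        pos_freq = (pos_counts[lemma] + 1) / (pos_total + len(vocab))
        neg_freq = (neg_counts[lemma] + 1) / (neg_total + len(vocab))
        if pos_counts[lemma] >= min_count and pos_freq / neg_freq >= min_ratio:
            positive.append(lemma)
        elif neg_counts[lemma] >= min_count and neg_freq / pos_freq >= min_ratio:
            negative.append(lemma)
    return positive, negative


if __name__ == "__main__":
    pos = [["pizza", "tasty"], ["pizza", "tasty", "early"], ["tasty"]]
    neg = [["pizza", "cold"], ["pizza", "cold", "late"], ["cold"]]
    print(clue_candidates(pos, neg))
```

In this toy input, "pizza" is equally frequent in both classes and is therefore not proposed as a clue; lemmas like it are the ones that would be considered as aspect candidates in the next step.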

SLIDE 6

Opinionated aspects

Pairing of aspects with the opinion clues that target them
Polarity of the (aspect, clue) pair can be inverted
  the pizza is never cold
  cold pizza vs. cold ice-cream

Generate all the (aspect, clue) candidate pairs within a sentence
Supervised classification of candidates into paired or not paired classes
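Candidate generation amounts to a Cartesian product of the aspect and clue positions in a sentence. A minimal sketch, assuming the sentence is already lemmatized and the lexicon is given as two sets (the function name `candidate_pairs` and the index-based representation are hypothetical):

```python
from itertools import product


def candidate_pairs(lemmas, aspects, clues):
    """Return every (aspect_index, clue_index) pair within one sentence.

    `lemmas` is the lemmatized sentence; `aspects` and `clues` are sets of
    lexicon lemmas. Indices are used so that repeated lemmas stay distinct.
    """
    aspect_idx = [i for i, lemma in enumerate(lemmas) if lemma in aspects]
    clue_idx = [i for i, lemma in enumerate(lemmas) if lemma in clues]
    return list(product(aspect_idx, clue_idx))


if __name__ == "__main__":
    sentence = ["food", "be", "deliver", "15", "minute", "early"]
    aspects = {"food", "deliver", "pizza"}
    clues = {"laudable", "early", "filled", "well-baked", "tasteful"}
    for a, c in candidate_pairs(sentence, aspects, clues):
        print(sentence[a], sentence[c])
```

For the example sentence this yields the candidates (food, early) and (deliver, early); the classifier then has to accept the second and reject the first.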

SLIDE 7

Opinionated aspects

Basic features
  distance, sentence length, number of aspects and clues
  punctuation, other aspects and clues in between, order

Lexical features
  lemmas of aspect and clue, bag of lemmas in between
  conjunction of aspect or clue with another aspect or clue

Part-of-speech features
  POS tags, tags in between, before and after the pair
  agreement of gender and number

Syntactic dependency features
  relation labels along the path from the aspect to the clue
  is the aspect syntactically the closest to the clue?
  is the clue syntactically the closest to the aspect?
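As an illustration, the basic feature group can be computed from token positions alone. The sketch below is an assumption about how such features might be encoded (feature names, the `PUNCT` set, and the boolean encoding are hypothetical); the lexical, POS, and dependency groups would additionally need the outputs of the preprocessing tools.

```python
PUNCT = {",", ".", ";", ":", "!", "?"}  # assumed punctuation inventory


def basic_features(tokens, aspect_i, clue_i, aspect_idx, clue_idx):
    """Basic features for one (aspect, clue) candidate pair.

    `aspect_i` and `clue_i` are token positions of the pair; `aspect_idx`
    and `clue_idx` list the positions of all aspects and clues in the sentence.
    """
    lo, hi = sorted((aspect_i, clue_i))
    between = tokens[lo + 1:hi]
    return {
        "distance": hi - lo,
        "sentence_length": len(tokens),
        "num_aspects": len(aspect_idx),
        "num_clues": len(clue_idx),
        "punct_between": any(t in PUNCT for t in between),
        # Strict inequalities exclude the pair's own aspect and clue.
        "other_aspect_between": any(lo < i < hi for i in aspect_idx),
        "other_clue_between": any(lo < i < hi for i in clue_idx),
        "aspect_first": aspect_i < clue_i,
    }


if __name__ == "__main__":
    tokens = ["food", "be", "deliver", "15", "minute", "early"]
    print(basic_features(tokens, 0, 5, aspect_idx=[0, 2], clue_idx=[5]))
    print(basic_features(tokens, 2, 5, aspect_idx=[0, 2], clue_idx=[5]))
```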

SLIDE 8

Opinionated aspects

Reviews crawled from pauza.hr
Trained on 200 sentences, 1406 aspect-clue pairs
Tested on 70 sentences, 308 aspect-clue pairs
libSVM [Chang & Lin, 2011] for classification
Baseline assigns to each aspect the closest opinion clue within the sentence
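The baseline is simple enough to sketch directly. The version below assumes that "closest" means smallest token distance and that ties go to the clue appearing first in the list; neither detail is stated on the slide, and the function name is hypothetical.

```python
def closest_clue_baseline(aspect_idx, clue_idx):
    """Pair each aspect position with the nearest clue position.

    Distance is measured in tokens (an assumption). On ties, `min` keeps
    the first clue in `clue_idx`. Returns [] if the sentence has no clues.
    """
    if not clue_idx:
        return []
    return [(a, min(clue_idx, key=lambda c: abs(c - a))) for a in aspect_idx]


if __name__ == "__main__":
    # "food be deliver 15 minute early": aspects at 0 and 2, clue at 5
    print(closest_clue_baseline([0, 2], [5]))
```

Because every aspect receives a clue whenever the sentence contains one, such a baseline also produces pairs like (food, early), which may help explain why its recall is much higher than its precision.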

SLIDE 9

Opinionated aspects

Results

Model                  Precision  Recall  F1
Baseline               31.8       71.0    43.9
Basic                  77.2       76.1    76.6
Basic+Lex              78.1       82.6    80.3
Basic+Lex+POS          80.9       79.7    80.3
Basic+Lex+POS+Syntax   84.1       80.4    82.2

models with linguistic features outperform Basic model
no significant difference between linguistic feature sets
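The F1 column is the harmonic mean of precision and recall, F1 = 2PR / (P + R). As a quick sanity check, the sketch below recomputes it from the reported precision and recall values (the variable names are ours; small deviations are expected because the inputs are already rounded to one decimal):

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as given in the results table
RESULTS = {
    "Baseline": (31.8, 71.0, 43.9),
    "Basic": (77.2, 76.1, 76.6),
    "Basic+Lex": (78.1, 82.6, 80.3),
    "Basic+Lex+POS": (80.9, 79.7, 80.3),
    "Basic+Lex+POS+Syntax": (84.1, 80.4, 82.2),
}

if __name__ == "__main__":
    for model, (p, r, reported) in RESULTS.items():
        print(f"{model}: computed F1 = {f1(p, r):.2f}, reported = {reported}")
```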

SLIDE 10

Overall opinion prediction

Review polarity prediction – binary classification
Review rating prediction – regression

Features
  tf-idf weighted bag-of-words representation of the review
  number of tokens in the review
  number of positive and negative emoticons
  number and lemmas of positive and negative clues
  number and lemmas of positively and negatively opinionated aspects
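A rough sketch of the count-based features, under stated assumptions: the emoticon sets, feature names, and the function `review_features` are hypothetical, and the tf-idf bag-of-words and opinionated-aspect features are left out because they depend on the corpus and on the pairing classifier.

```python
POSITIVE_EMOTICONS = {":)", ":-)", ":D"}  # assumed inventory
NEGATIVE_EMOTICONS = {":(", ":-("}        # assumed inventory


def review_features(tokens, lemmas, pos_clues, neg_clues):
    """Count-based features for one review.

    `tokens` are the surface tokens, `lemmas` the corresponding lemmas;
    `pos_clues` and `neg_clues` are sets of clue lemmas from the lexicon.
    """
    found_pos = [lemma for lemma in lemmas if lemma in pos_clues]
    found_neg = [lemma for lemma in lemmas if lemma in neg_clues]
    return {
        "num_tokens": len(tokens),
        "num_pos_emoticons": sum(t in POSITIVE_EMOTICONS for t in tokens),
        "num_neg_emoticons": sum(t in NEGATIVE_EMOTICONS for t in tokens),
        "num_pos_clues": len(found_pos),
        "num_neg_clues": len(found_neg),
        "pos_clue_lemmas": sorted(set(found_pos)),
        "neg_clue_lemmas": sorted(set(found_neg)),
    }


if __name__ == "__main__":
    tokens = ["Really", "laudable", "!", ":)"]
    lemmas = ["really", "laudable", "!", ":)"]
    print(review_features(tokens, lemmas, {"laudable", "early"}, {"cold"}))
```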

SLIDE 11

Overall opinion prediction

3310 reviews, 100K tokens
For polarity prediction we consider ratings ≤ 2.5 as negative and ≥ 4 as positive
libSVM [Chang & Lin, 2011] for classification and regression
Baseline – bag-of-words model
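The rating thresholds translate into a small labeling function. In this sketch, returning `None` for ratings strictly between 2.5 and 4 is an assumption (the slide does not say how such reviews are handled in the polarity task), and the function name is hypothetical.

```python
def polarity_label(rating):
    """Map a review rating to a polarity label.

    Ratings <= 2.5 are negative and ratings >= 4 are positive, as stated
    above. Ratings in between get None (assumed to be excluded).
    """
    if rating <= 2.5:
        return "negative"
    if rating >= 4:
        return "positive"
    return None


if __name__ == "__main__":
    for rating in (1, 2.5, 3, 4, 6):
        print(rating, polarity_label(rating))
```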

SLIDE 12

Overall opinion prediction

Results

            Review polarity           Review rating
Model       Pos F1  Neg F1  Avg F1    r     MAE
BoW         94.1    79.1    86.6      0.74  0.94
BoW+E       94.4    80.3    87.4      0.75  0.91
BoW+E+A     95.7    85.2    90.5      0.80  0.82
BoW+E+C     95.7    85.6    90.7      0.81  0.79
BoW+E+A+C   96.0    86.2    91.1      0.83  0.76

E – emoticons; A – opinionated aspects; C – opinion clues

aspect and clue features outperform the BoW baseline
no significant difference between aspect and clue features
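The two rating-prediction measures can be computed as below. This sketch assumes that r denotes the Pearson correlation between gold and predicted ratings and that MAE is the mean absolute error on the rating scale; the toy ratings are made up for illustration and are not taken from the evaluation.

```python
from math import sqrt


def mae(gold, pred):
    """Mean absolute error between gold and predicted ratings."""
    return sum(abs(g - p) for g, p in zip(gold, pred)) / len(gold)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    gold = [6.0, 5.0, 2.0, 1.0, 4.0]   # toy ratings, illustration only
    pred = [5.5, 5.0, 3.0, 1.5, 4.5]
    print(f"r = {pearson_r(gold, pred):.2f}, MAE = {mae(gold, pred):.2f}")
```

Higher r and lower MAE are better, which is the direction in which both columns move as emoticon, aspect, and clue features are added.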

SLIDE 13

Conclusion

We presented a method for aspect-oriented opinion mining from domain-specific user reviews in Croatian
Supervised model with linguistic features is effective for assigning opinions to the product aspects
Opinion clues and opinionated aspects improve prediction of overall review polarity and rating

Future work:

Evaluation of the method on other domains
Aspect-based opinion summarization

SLIDE 14

Thanks for your attention!

Text Analysis and Knowledge Engineering Lab

www.takelab.hr

SLIDE 15

References

Agić, Ž. (2012). K-best spanning tree dependency parsing with verb valency lexicon reranking. In Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012): Posters (pp. 1–12).

Agić, Ž., Tadić, M., & Dovedan, Z. (2008). Improving part-of-speech tagging accuracy for Croatian by morphological analysis. Informatica, 32(4), 445–451.

Chang, C.-C., & Lin, C.-J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3), 27.

Šnajder, J., Bašić, B., & Tadić, M. (2008). Automatic acquisition of inflectional lexica for morphological normalisation. Information Processing & Management, 44(5), 1720–1731.