Word Embeddings for Arabic Sentiment Analysis

A. Aziz Altowayan and L. Tao, Pace University
IEEE BigData 2016 Workshop (Dec. 5, 2016)


SLIDE 1

Word Embeddings for Arabic Sentiment Analysis

  • A. Aziz Altowayan and L. Tao

Pace University IEEE BigData 2016, Workshop (Dec. 5, 2016)


SLIDE 2

DISCLAIMER

This work directly applies existing word embedding techniques to Arabic text, with two main objectives: (1) assess how well (automatic) embedding-based features perform compared to manually crafted features on Arabic text; (2) release the pre-trained embeddings (and the datasets) for future use and comparison.


SLIDE 3

Problem

Polarity classification and subjectivity detection in Arabic text, treated as a simple classification problem.


SLIDE 4

Approach

Workflow


SLIDE 5

Data

For training word embeddings, we build the following corpus:


SLIDE 6

Data

For classification, we experiment on three different datasets:

  • Arabic translation of MPQA (Banea et al., 2010)
  • Book reviews “LABR” (Aly and Atiya, 2013)
  • Twitter (table below)


SLIDE 7

Word Vectors

CBOW1

$$\sum_{w=1}^{T} \sum_{c \in C} \log p(w \mid c)$$

where w is the current word and c ranges over its context words.

1word2vec parameters: window size 10, dimension 300. p(w|c) is calculated using negative sampling (“NCE”).
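The objective above can be illustrated with a toy computation. The sketch below (illustrative only, not the paper's code: the vocabulary and vectors are made up, and a full softmax is used where the paper approximates p(w|c) with negative sampling) computes log p(w|c) for one target word given an averaged context, as in CBOW:

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 8, 4                                  # tiny vocabulary and embedding size
W_in = rng.normal(scale=0.1, size=(V, d))    # input (context) embeddings
W_out = rng.normal(scale=0.1, size=(V, d))   # output (target) embeddings

def log_p(word, context):
    """log p(word | context) via a full softmax; CBOW averages the
    context word vectors into a single hidden representation."""
    h = W_in[context].mean(axis=0)           # averaged context vector
    scores = W_out @ h                       # one score per vocabulary word
    scores -= scores.max()                   # numerical stability
    return scores[word] - np.log(np.exp(scores).sum())

# one term of the objective: target word 3 with context words 1, 2, 4, 5
lp = log_p(3, [1, 2, 4, 5])
print(lp)   # a (negative) log-probability
```

Training would sum such terms over all positions in the corpus and maximize them by gradient ascent; negative sampling replaces the expensive softmax denominator with a few sampled contrast words.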


SLIDE 8

Vectors Evaluation

How do we evaluate the quality of the embedding vectors? Embeddings are evaluated based on the classifiers’ performance.


SLIDE 9

Vectors Evaluation

How do we evaluate the quality of the embedding vectors? Embeddings are evaluated based on the classifiers’ performance, and the results are then verified manually, e.g. with sample analogy queries.
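An analogy query answers "a is to b as c is to ?" by nearest-neighbor search around vec(a) - vec(b) + vec(c). A minimal sketch, with made-up English stand-in words and vectors (the real checks would run against the trained 300-dimensional Arabic embeddings):

```python
import numpy as np

vocab = ["king", "queen", "man", "woman"]
vecs = np.array([
    [0.9, 0.8, 0.1],   # king
    [0.9, 0.1, 0.8],   # queen
    [0.1, 0.9, 0.1],   # man
    [0.1, 0.1, 0.9],   # woman
], dtype=float)
idx = {w: i for i, w in enumerate(vocab)}

def analogy(a, b, c):
    """Word closest (by cosine similarity) to vec(a) - vec(b) + vec(c),
    excluding the three query words themselves."""
    q = vecs[idx[a]] - vecs[idx[b]] + vecs[idx[c]]
    sims = vecs @ q / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(q))
    for w in (a, b, c):
        sims[idx[w]] = -np.inf
    return vocab[int(np.argmax(sims))]

print(analogy("king", "man", "woman"))   # → "queen"
```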


SLIDE 10

Features Representation

Embedding vs. manual features
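A common way to turn word embeddings into document features, as a point of comparison with manual features, is to average the vectors of a document's in-vocabulary words (whether this matches the paper's exact scheme is an assumption here; names and vectors below are made up):

```python
import numpy as np

d = 300   # embedding dimension used for the trained vectors
embeddings = {                        # stand-in for the trained embeddings
    "good": np.full(d, 0.2),
    "movie": np.full(d, -0.1),
}

def doc_features(tokens, emb, dim=d):
    """Average the vectors of known tokens; zero vector if none are known.
    Out-of-vocabulary words are simply skipped."""
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

x = doc_features(["good", "movie", "unknownword"], embeddings)
print(x.shape)   # (300,)
```

The resulting fixed-length vector then feeds a conventional classifier, replacing hand-crafted features such as lexicon hit counts.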


SLIDE 11

Features Representation (linguistic issues)

  • Spelling errors
  • Semantic relatedness (lexicon-like?)


SLIDE 12

Results and Comparison

Banea et al. (2010) and Mourad et al. (2013) compared their models on two datasets: MPQA (“subjectivity”) and ArabSenti2 (“sentiment”). Results are compared on MPQA:

2We could not obtain ArabSenti data.


SLIDE 13

Results and Comparison

  • Performance can improve with more data: better accuracy with more training examples (not a fair comparison).
  • Possibly, a larger embedding vocabulary would give better accuracy too. Current size: ~159K words.


SLIDE 14

Future Work and Conclusion

Future Work
  • Vectors evaluation.
  • Incorporate manual features into the model.
  • Deep models instead of the conventional classifiers.

Conclusion
  • Simple3 word embeddings perform better than top hand-crafted methods.

3i.e. without any modification of the training samples.


SLIDE 15

Q/A

Thank you for your attention.
