Semi-Supervised Sentiment Analysis in Hindi, by Naman Bansal and Umair Z Ahmed (PowerPoint presentation)



slide-1
SLIDE 1

Semi-Supervised Sentiment Analysis in Hindi

Naman Bansal Umair Z Ahmed

slide-2
SLIDE 2

MOTIVATION

➢ Why Sentiment Analysis?

  • Labeling the reviews with their sentiment would provide succinct summaries to readers
  • Helpful in business intelligence applications, recommender systems, message filtering, …

➢ Why semi-supervised? Problems with supervised polarity classification systems:

  • Typically domain-specific
  • Expensive process of annotating a large amount of data (especially for low-resource languages)

slide-3
SLIDE 3

PREVIOUS WORKS

➢ Dasgupta and Ng (2009) first mine the unambiguous reviews using spectral techniques, then exploit them to classify the ambiguous reviews via a novel combination of active learning, transductive learning, and ensemble learning.

➢ Joshi et al. (2010) created H-SWN (Hindi SentiWordNet) using English SentiWordNet and English-Hindi WordNet linking.

➢ Bakliwal et al. (2012) created the Hindi Subjective Lexicon and used Hindi WordNet to assign similar polarity to synonyms and opposite polarity to antonyms.

slide-4
SLIDE 4

PREVIOUS WORKS ON INDIAN LANGUAGES

➢ Sentiment analysis for Indian languages has primarily focused on:

  • Machine translation to translate English data into Hindi
  • Bilingual dictionaries for English and Indian languages
  • Hindi WordNet expansion to exploit synonym and antonym polarity

➢ Unsupervised and semi-supervised sentiment analysis techniques are under-investigated in NLP

slide-5
SLIDE 5

DATASET

➢ IIT Bombay Movie Review Dataset

  • Open source
  • 300 Reviews (150 + 150)

➢ IIIT Hyderabad Product Review Dataset

  • On Request
  • 700 Reviews (350 + 350)

➢ Our contribution

  • Building movie review dataset from jagran.com

slide-6
SLIDE 6

DATASET

<movie sentiment="neg" star="2" link="http://www.jagran.com/entertainment/reviews-mickey-virus-movie-review-10821431.html">
  <review> चर्चित टीवी एंकर मनीष पॉल की इस फिल्म से बहुत उम्मीदें थीं। … </review>
  <SelectedLines>
    <line sentiment="pos"> मिकी वाइरस पूरी तरह से मनीष पॉल की फिल्म है और फिल्म में उनकी इमेज के हिसाब से ही दृश्य और स्थितियां रची गई थीं। मनीष ने अपने किरदार को बखूबी निभाया है। </line>
    <line sentiment="neg"> फिल्म देखने के बाद न सिर्फ उम्मीदें धराशायी हुई बल्कि अच्छे खासे विषय को यूं ही जाया हो जाने का अफसोस भी हो रहा है। उनके अभिनय में इंटेंसिटी तो है लेकिन किरदार स्टीरियोटाइप होते जाए तो अच्छा अभिनेता भी बोर कर सकता है।</line>
  </SelectedLines>
</movie>

slide-7
SLIDE 7

PRE-PROCESSING DATA

➢ Remove:

  • Punctuation
  • Numbers
  • Words of length one
  • Words that occur only in a single review
  • Words with high document frequency, many of which are stopwords or domain-specific general-purpose words
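The removal steps above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: whitespace tokenization, the function names `tokenize` and `preprocess`, and the `max_df` cutoff of 0.9 are all assumptions, since the slide gives no threshold for "high document frequency".

```python
import unicodedata
from collections import Counter


def tokenize(text):
    """Blank out punctuation and digits, split on whitespace, drop 1-character words."""
    # Unicode categories are used instead of a regex word class so that
    # Devanagari vowel signs (combining marks) stay attached to their words.
    # Categories starting with "P" are punctuation (including the danda),
    # categories starting with "N" are numbers (including Devanagari digits).
    cleaned = "".join(
        " " if unicodedata.category(ch)[0] in ("P", "N") else ch for ch in text
    )
    # Assumption: "length one" is measured in code points.
    return [w for w in cleaned.split() if len(w) > 1]


def preprocess(reviews, max_df=0.9):
    """Return token lists with rare and very frequent words removed.

    max_df is a hypothetical cutoff: a word found in more than this
    fraction of the reviews counts as a high-document-frequency word.
    """
    tokenized = [tokenize(text) for text in reviews]
    n_docs = len(tokenized)
    # Document frequency: number of reviews that contain each word.
    doc_freq = Counter(w for tokens in tokenized for w in set(tokens))
    keep = {
        w for w, df in doc_freq.items()
        if df > 1 and df / n_docs <= max_df  # drop single-review and too-common words
    }
    return [[w for w in tokens if w in keep] for tokens in tokenized]
```

In practice the cutoff would be tuned per corpus, since movie and product reviews have different domain-specific frequent words.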

slide-8
SLIDE 8

DATA REPRESENTATION

Each review is represented as a vector of unigrams, with a binary weight of 1 for each term present in the review and 0 otherwise. The dataset is represented as a matrix X of R + T review vectors, each of dimension D, where R is the number of training samples, T is the number of test samples, and D is the number of feature words in the dataset.
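A small sketch of this representation, assuming the reviews are already tokenized. The function name `binary_unigram_matrix` and the row-per-review layout are illustrative choices, not taken from the slides.

```python
def binary_unigram_matrix(tokenized_reviews):
    """Build binary unigram vectors: one row per review, one column per feature word."""
    # The vocabulary is the sorted set of all words, so D = len(vocab).
    vocab = sorted({w for tokens in tokenized_reviews for w in tokens})
    column = {w: j for j, w in enumerate(vocab)}
    matrix = []
    for tokens in tokenized_reviews:
        row = [0] * len(vocab)
        for w in tokens:
            row[column[w]] = 1  # presence only; repeated words still give 1
        matrix.append(row)
    return matrix, vocab
```

Calling it on the concatenated training and test reviews gives R + T rows of length D.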

slide-9
SLIDE 9

PROPOSED APPROACH

Deep Learning Architecture

One input layer h^0

N hidden layers h^1, h^2, …, h^N

One output layer

The input layer h^0 has D units, equal to the number of features of sample data x.

We intend to seek the mapping function X^L → Y^L using the L labeled data and the R + T − L unlabeled data.

slide-10
SLIDE 10

PROPOSED APPROACH

➢ The semi-supervised learning method based on the ADN architecture can be divided into two stages:

First, the ADN architecture is constructed by greedy layer-wise unsupervised learning, using RBMs as building blocks. All the unlabeled data, together with the L labeled data, are used to find the parameter space W with N layers.

Second, the ADN architecture is fine-tuned by gradient descent on an exponential loss function: the parameter space W is retrained using only the L labeled data.
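Stage one can be sketched as follows. This is a minimal illustration, assuming binary RBMs trained with one step of contrastive divergence (CD-1); the slides do not state the training rule, and the class `RBM`, the function `pretrain`, the learning rate, and the epoch count are hypothetical names and values.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample(probs, rng):
    """Draw a binary state for each unit from its activation probability."""
    return [1 if rng.random() < p else 0 for p in probs]


class RBM:
    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        # W[s][t] connects lower-layer unit s to upper-layer unit t.
        self.W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # lower-layer biases
        self.c = [0.0] * n_hidden   # upper-layer biases

    def hidden_probs(self, v):
        return [sigmoid(self.c[t] + sum(self.W[s][t] * v[s] for s in range(len(v))))
                for t in range(len(self.c))]

    def visible_probs(self, h):
        return [sigmoid(self.b[s] + sum(self.W[s][t] * h[t] for t in range(len(h))))
                for s in range(len(self.b))]

    def cd1_update(self, v0, lr=0.1):
        """One CD-1 step: data statistics minus one-step reconstruction statistics."""
        ph0 = self.hidden_probs(v0)
        h0 = sample(ph0, self.rng)
        v1 = sample(self.visible_probs(h0), self.rng)
        ph1 = self.hidden_probs(v1)
        for s in range(len(self.b)):
            for t in range(len(self.c)):
                self.W[s][t] += lr * (v0[s] * ph0[t] - v1[s] * ph1[t])
            self.b[s] += lr * (v0[s] - v1[s])
        for t in range(len(self.c)):
            self.c[t] += lr * (ph0[t] - ph1[t])


def pretrain(data, hidden_sizes, epochs=5, lr=0.1, seed=0):
    """Greedy layer-wise pretraining: train one RBM per layer, bottom up."""
    rng = random.Random(seed)
    layers, inputs = [], data
    for n_hidden in hidden_sizes:
        rbm = RBM(len(inputs[0]), n_hidden, rng)
        for _ in range(epochs):
            for v in inputs:
                rbm.cd1_update(v, lr)
        layers.append(rbm)
        # The hidden activation probabilities become the next layer's input.
        inputs = [rbm.hidden_probs(v) for v in inputs]
    return layers
```

The returned list of RBMs would then initialize the weights W that stage two fine-tunes on the labeled reviews.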

slide-11
SLIDE 11

PROPOSED APPROACH

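An energy is defined for each joint state (h^{k-1}, h^k) of two adjacent layers, with model parameters θ.

The probability that the model assigns to h^{k-1} is derived from this energy, where Z(θ) denotes the normalizing constant.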
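For reference, a sketch of the form these two quantities usually take, assuming each pair of adjacent layers is a standard binary RBM. The weight symbol w^k_{st}, the bias symbols b^{k-1}_s and c^k_t, and the unit indices s and t are assumed notation, not taken from the slide.

```latex
\begin{align*}
E(h^{k-1}, h^{k}; \theta)
  &= -\sum_{s}\sum_{t} w^{k}_{st}\, h^{k-1}_{s} h^{k}_{t}
     - \sum_{s} b^{k-1}_{s} h^{k-1}_{s}
     - \sum_{t} c^{k}_{t} h^{k}_{t} \\
P(h^{k-1}; \theta)
  &= \frac{1}{Z(\theta)} \sum_{h^{k}} \exp\bigl(-E(h^{k-1}, h^{k}; \theta)\bigr) \\
Z(\theta)
  &= \sum_{h^{k-1}} \sum_{h^{k}} \exp\bigl(-E(h^{k-1}, h^{k}; \theta)\bigr)
\end{align*}
```

Here θ collects the weights and biases, and the sum over h^k marginalizes out the upper layer.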

slide-12
SLIDE 12

PROPOSED APPROACH

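The probability of turning on unit s in layer h^{k-1} is a logistic function of the states of h^k and the weights w^k.

The probability of turning on unit t in layer h^k is a logistic function of the states of h^{k-1} and the weights w^k.

The logistic function is the standard sigmoid, which maps any real input to the interval (0, 1).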
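Written out under the usual binary-RBM assumptions, the two conditionals and the logistic function would look like this. The bias symbols b^{k-1}_s and c^k_t and the weight indexing w^k_{st} (unit s in h^{k-1}, unit t in h^k) are assumed notation.

```latex
\begin{align*}
p(h^{k-1}_{s} = 1 \mid h^{k})
  &= \operatorname{sigm}\Bigl(b^{k-1}_{s} + \sum_{t} w^{k}_{st}\, h^{k}_{t}\Bigr) \\
p(h^{k}_{t} = 1 \mid h^{k-1})
  &= \operatorname{sigm}\Bigl(c^{k}_{t} + \sum_{s} w^{k}_{st}\, h^{k-1}_{s}\Bigr) \\
\operatorname{sigm}(\eta)
  &= \frac{1}{1 + e^{-\eta}}
\end{align*}
```

Because units within a layer are conditionally independent given the other layer, each layer can be sampled in a single pass.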

slide-13
SLIDE 13

PROPOSED APPROACH

Supervised training is formulated as an optimization problem: minimize a loss function, the exponential loss, over the L labeled reviews.
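One plausible form of this objective is sketched below. It is an assumption rather than a transcription of the slide: it takes the output layer h^N to have one unit per class, codes the labels as y^i_j ∈ {−1, +1}, and introduces C for the number of classes and T for the per-term loss.

```latex
\begin{align*}
&\arg\min_{h^{N}} \; f\bigl(h^{N}(X^{L}), Y^{L}\bigr) \\
&f\bigl(h^{N}(X^{L}), Y^{L}\bigr)
  = \sum_{i=1}^{L} \sum_{j=1}^{C} T\bigl(h^{N}_{j}(x^{i})\, y^{i}_{j}\bigr),
  \qquad T(r) = \exp(-r)
\end{align*}
```

Under this coding, an output that agrees in sign with its label gives a small term, while a confident wrong output is penalized exponentially.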