

SLIDE 1

Mining Domain-Specific Dictionaries

Pantelis Agathangelou, Open University of Cyprus
Fotios Kokkoras, Technological Educational Institute of Thessaly
Ioannis Katakis, University of Athens
Konstantinos Ntonas, International Hellenic University

15th International Conference on Web Information System Engineering (WISE 2014)

SLIDE 2

Summary

Pantelis Agathangelou, Mining Domain Specific Dictionaries

  • The Opinion Mining Problem
  • Introduction
  • Proposed Method
  • Experimental Evaluation
  • Interface

SLIDE 3

The Opinion Mining Problem

Opinion sources:

  • Social Networks
  • Blogs
  • Discussion Boards

In Dictionary Based Solutions

[Diagram labels: Collected Opinions; Classify Opinions (Domain Lexicon); Classification (1. Positive, 2. Positive, 3. Negative, 4. Positive); Analyze Opinions; Data Analysis; Summarized Results]

SLIDE 4

DOMAIN SPECIFIC OPINION LEXICON

LEXICON ATTRIBUTES

  • List of terms of known polarity (Positive or Negative)
  • Strength or Sentiment Tension

SAMPLE

Positive      Sentiment Tension
Beautiful     10
Astonishing   4
Cool          4

Negative      Tension
Slow          6
Ugly          3
Low           +3

Differences In Comparison To Generic Lexicons
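As a rough illustration of how such a lexicon could drive dictionary-based classification, here is a minimal sketch. The terms are taken from the sample above, but the signs (positive terms above zero, negative terms below zero), the function names and the "sum the tensions" scoring rule are assumptions made for this example, not details given in the slides.

```python
from collections import Counter

# Hypothetical lexicon: terms come from the slide's sample; the signs
# (positive > 0, negative < 0) are an assumption made for this sketch.
LEXICON = {
    "beautiful": 10,
    "astonishing": 4,
    "cool": 4,
    "slow": -6,
    "ugly": -3,
}


def score_opinion(text, lexicon=LEXICON):
    """Sum the tension of every lexicon term that occurs in the text."""
    words = text.lower().split()
    return sum(lexicon.get(word.strip(".,!?"), 0) for word in words)


def classify_opinion(text, lexicon=LEXICON):
    """Map the summed score to a polarity label."""
    score = score_opinion(text, lexicon)
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"


def summarize(opinions, lexicon=LEXICON):
    """Count how many opinions fall into each polarity class."""
    return Counter(classify_opinion(text, lexicon) for text in opinions)
```

Calling summarize on a list of collected opinions gives the kind of per-class counts that a summarized result could be built from.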

SLIDE 5

INTRODUCTION

What do we do in this paper?

  • We mine a domain-specific dictionary
  • We implement a multiple-stage approach
  • We utilize language patterns for the extraction process

What is the innovation?

  • The designed algorithm can operate with a small initial seed list
  • The method is unsupervised
  • It can operate in multiple languages, provided the appropriate patterns
  • It produces fast and accurate results

SLIDE 6

Proposed Method

  • Opinion Preprocessing
  • Auxiliary List Preparation (Modules)
  • Seed Import and Filtered Seed Extraction
  • Conjunction Based Extraction
  • Double Propagation & Opinion Word Validation

SLIDE 7

OPINION PREPROCESSING

  • Receives user opinions in raw form
  • Implements some form of preprocessing
  • Sentence splitting (delimitation)
  • Additionally, a Stemmer Engine

Sentence Splitters
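The preprocessing stage could look roughly like the sketch below. The delimiter set, the tokenizer and the toy suffix-stripping stemmer are all assumptions for illustration; the slides do not say which sentence splitters or stemmer engine the tool actually uses.

```python
import re

# Hypothetical suffix list for a toy stemmer; a real stemmer engine
# (for example a Porter-style stemmer) would be far more careful.
SUFFIXES = ("ing", "ly", "ed", "es", "s")


def split_sentences(opinion):
    """Split raw opinion text on sentence delimiters (. ! ? ;)."""
    parts = re.split(r"[.!?;]+", opinion)
    return [part.strip() for part in parts if part.strip()]


def tokenize(sentence):
    """Lowercase a sentence and keep alphabetic tokens only."""
    return re.findall(r"[a-z']+", sentence.lower())


def stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(opinion):
    """Return one list of stemmed tokens per sentence."""
    return [[stem(tok) for tok in tokenize(s)] for s in split_sentences(opinion)]
```

Keeping the sentence boundaries (one token list per sentence) matters later, because the extraction steps work sentence by sentence.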

SLIDE 8

AUXILIARY LIST PREPARATION - MODULES

Articles   Basic Verbs   Comparatives   Decreasers
the        be            cheaper        little
a          bend          finer          clearer
an         chose         newer          slower
one        throw         stronger       poorer

Future Words   Increasers   Negations   Pronouns
will           better       none        my
to             hotter       no          they
let            harder       any         everybody
if             darker       anyone      her

Total: 380 word constants
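One plausible way to hold these modules in code is a dictionary of word sets, sketched below. Only the sample words shown on the slide are included, and the helper names (module_of, candidate_words) as well as the idea of using the modules to exclude auxiliary words from the opinion word candidates are assumptions for illustration.

```python
# A few entries per module, copied from the slide; the full lists
# (380 word constants in total) are not reproduced here.
MODULES = {
    "articles": {"the", "a", "an", "one"},
    "basic_verbs": {"be", "bend", "chose", "throw"},
    "comparatives": {"cheaper", "finer", "newer", "stronger"},
    "decreasers": {"little", "clearer", "slower", "poorer"},
    "future_words": {"will", "to", "let", "if"},
    "increasers": {"better", "hotter", "harder", "darker"},
    "negations": {"none", "no", "any", "anyone"},
    "pronouns": {"my", "they", "everybody", "her"},
}

# Flatten all modules into one lookup set for fast membership tests.
AUXILIARY_WORDS = set().union(*MODULES.values())


def module_of(word):
    """Return the name of the module that contains the word, or None."""
    for name, words in MODULES.items():
        if word in words:
            return name
    return None


def candidate_words(tokens):
    """Drop auxiliary words (assumed use: they are not opinion candidates)."""
    return [tok for tok in tokens if tok not in AUXILIARY_WORDS]
```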

SLIDE 9

SEED IMPORT AND FILTERED SEED EXTRACTION

[Diagram titles: FILTER SEED EXTRACTION PATTERNS; EXTRACTION PROCESS]

[Diagram labels: Opinions; Seed + Modules; Seed Extraction Patterns (Positive Seed Patterns, Negative Seed Patterns); Filter Seed Lexicon]

21 positive, 12 negative polarity patterns in total
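The 33 seed patterns themselves are not reproduced in the slides, so the sketch below uses two hypothetical stand-in patterns (a bare seed word, and a seed word preceded by a negation) to show the general idea of filtering an imported seed list against the opinions. The seed words, the negation list and the voting rule are all assumptions.

```python
import re

# Hypothetical inputs: a tiny seed list and a negation module.
SEED = {"good": 1, "nice": 1, "bad": -1, "slow": -1}
NEGATIONS = {"no", "not", "never"}


def filter_seed(sentences, seed=SEED, negations=NEGATIONS):
    """Keep seed words that occur in the opinions with their seed polarity.

    Two stand-in patterns are applied to each sentence:
      <seed>             votes for the seed polarity
      <negation> <seed>  votes for the opposite polarity
    """
    votes = {}
    for sentence in sentences:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        for i, tok in enumerate(tokens):
            if tok not in seed:
                continue
            negated = i > 0 and tokens[i - 1] in negations
            vote = -seed[tok] if negated else seed[tok]
            votes[tok] = votes.get(tok, 0) + vote
    # A seed survives only if its votes agree with its imported polarity.
    return {w: seed[w] for w, v in votes.items() if v * seed[w] > 0}
```

Seed words that never appear in the domain's opinions, or that mostly appear negated, are dropped, which is one way a small generic seed list could be adapted to a specific domain.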

SLIDE 10

CONJUNCTION-BASED EXTRACTION

[Diagram titles: CONJUNCTION BASED EXTRACTION PATTERNS; EXTRACTION PROCESS]

[Diagram labels: Opinions; Filter Seed + Modules; Conjunction Based Extraction Patterns (Positive Conj Based Patterns, Negative Conj Based Patterns); Conjunction Based Lexicon]

6 positive, 4 negative extraction patterns in total
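A conjunction-based step can be sketched as follows. The ten real patterns are not listed in the slides, so this example assumes two common heuristics as stand-ins: a word joined to a known opinion word by "and" shares its polarity, and a word joined by "but" takes the opposite polarity. The function name and parameters are hypothetical.

```python
import re

# Hypothetical stand-ins for the conjunction patterns.
SAME_POLARITY = {"and"}
OPPOSITE_POLARITY = {"but"}


def conjunction_extract(sentences, lexicon, auxiliary=frozenset()):
    """Propose new opinion words that are conjoined with known ones."""
    found = {}
    for sentence in sentences:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        for i in range(1, len(tokens) - 1):
            conj = tokens[i]
            if conj in SAME_POLARITY:
                sign = 1
            elif conj in OPPOSITE_POLARITY:
                sign = -1
            else:
                continue
            left, right = tokens[i - 1], tokens[i + 1]
            # Try both directions: known word on the left or on the right.
            for known, new in ((left, right), (right, left)):
                if known in lexicon and new not in lexicon and new not in auxiliary:
                    found[new] = found.get(new, 0) + sign * lexicon[known]
    # Keep only the sign of the accumulated evidence; drop ties.
    return {w: (1 if s > 0 else -1) for w, s in found.items() if s != 0}
```

The auxiliary argument is meant to receive the module words, so that articles, pronouns and similar words next to a conjunction are not mistaken for opinion words.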

SLIDE 11

DOUBLE PROPAGATION EXTRACTION METHOD

[Diagram titles: DOUBLE PROPAGATION PATTERNS; EXTRACTION PROCESS]

[Diagram labels: Opinions; Modules; Double Propagation Extraction Patterns; Double Propagation Lexicon; Opinion Target Extraction (e.g. nice phone, amazing screen); Opinion Targets; Opinion Target List; Filter Seed + Conj Based Lexicon; e.g. wide and tall]

6 extraction patterns in total
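The general idea of double propagation is that known opinion words reveal opinion targets ("nice phone"), and known targets in turn reveal new opinion words ("amazing screen"). The sketch below is a heavily simplified version under stated assumptions: it uses plain word adjacency instead of the six real extraction patterns, which are not given in the slides, and the function name and signature are hypothetical.

```python
import re


def double_propagation(sentences, opinion_words, auxiliary=frozenset()):
    """Alternate between target and opinion word extraction until stable.

    Simplified adjacency patterns stand in for the six real patterns:
      <opinion word> <X>  =>  X is an opinion target
      <X> <target>        =>  X is an opinion word
    """
    opinions = set(opinion_words)
    targets = set()
    tokenized = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    changed = True
    while changed:
        changed = False
        for tokens in tokenized:
            for left, right in zip(tokens, tokens[1:]):
                if left in auxiliary or right in auxiliary:
                    continue
                if left in opinions and right not in opinions and right not in targets:
                    targets.add(right)
                    changed = True
                elif right in targets and left not in opinions and left not in targets:
                    opinions.add(left)
                    changed = True
    # Return only the newly discovered opinion words, plus all targets.
    return opinions - set(opinion_words), targets
```

Starting from the single word "nice", the sentences "nice phone", "nice screen" and "amazing screen" are enough for this sketch to learn "amazing" as a new opinion word. The loop stops once a full pass adds nothing new.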

SLIDE 12

DOUBLE PROPAGATION SENTIMENT EXTRACTION

STEP 1

  • Intra-Sentential Sentiment Consistency

INTRA SENTENTIAL EXAMPLE

(magnificent cool screen)[+1]

Opinion word cool is extracted from opinion target screen, inherits sentence polarity [+1]

STEP 2

  • Inter-Sentential Sentiment Consistency (max depth = 3)

INTER SENTENTIAL EXAMPLE

(very good that mobile)[+1] (awesome screen)[0] (easy browsing fabulous graphics)[+2]

awesome extracted from screen, inherits sentence polarity [+2] at depth 1
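The two consistency steps can be read as a lookup: use the polarity of the word's own sentence if it has one, otherwise search neighbouring sentences up to the maximum depth. The sketch below follows that reading. The function name, the choice to look on both sides, and the tie-break (take the strongest neighbour at the nearest depth) are assumptions; the tie-break was picked only because it reproduces the [+2] result of the inter-sentential example.

```python
def inherit_polarity(sentence_polarities, index, max_depth=3):
    """Return (polarity, depth) for a word found in sentence `index`.

    Depth 0 means the word's own sentence supplied the polarity
    (intra-sentential); depth d > 0 means a sentence d positions away
    supplied it (inter-sentential). Returns (0, None) if nothing is found.
    """
    own = sentence_polarities[index]
    if own != 0:
        return own, 0
    for depth in range(1, max_depth + 1):
        neighbours = []
        for j in (index - depth, index + depth):
            if 0 <= j < len(sentence_polarities) and sentence_polarities[j] != 0:
                neighbours.append(sentence_polarities[j])
        if neighbours:
            # Assumed tie-break: take the strongest neighbour at this depth.
            return max(neighbours, key=abs), depth
    return 0, None
```

With the polarities [+1, 0, +2] from the example, the word found in the middle sentence gets polarity +2 at depth 1.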

SLIDE 13

OPINION WORD VALIDATION

  • We use the extracted double propagation opinion word set and opinion target word set
  • Sentiment Threshold [Sent]: minimum accepted polarity
  • Frequency Threshold [Freq]: minimum accepted frequency of co-existence of opinion word and opinion target

[Diagram labels: Double Propagation Opinion Word List; Opinion Target List; Opinions; condition "> [Sent] && [Freq]"; Filter Double Propagation Lexicon]
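The validation condition can be sketched as a simple filter over the candidate words. The data layout (each word mapped to a polarity and a co-occurrence count), the use of the absolute polarity value, and the function name are assumptions made for this example.

```python
def validate_opinion_words(candidates, sent_threshold, freq_threshold):
    """Keep candidates that pass both the [Sent] and [Freq] thresholds.

    `candidates` maps each opinion word to a (polarity, frequency) pair,
    where frequency counts co-occurrences with opinion targets.
    """
    accepted = {}
    for word, (polarity, frequency) in candidates.items():
        # Strict ">" mirrors the condition on the slide; ">=" would be the
        # other reading of "minimum accepted".
        if abs(polarity) > sent_threshold and frequency > freq_threshold:
            accepted[word] = polarity
    return accepted
```

Raising either threshold trades recall for precision: fewer words are accepted, but each accepted word has stronger evidence behind it.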

SLIDE 14

EXPERIMENTAL RESULTS

[Figure titles: ALGORITHM FEATURES; AVERAGE PRECISION-RECALL METRICS]

  • When Conjunction Based Extraction fails to discover seed words, double propagation fills in the extraction gap.
  • When the Conjunction Based Extraction value is balanced, so is double propagation.

Conclusion: The above results justify the unsupervised manner of the proposed method.
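For reference, precision and recall of an extracted word list can be computed as below. This assumes the metrics compare the extracted words against a hand-labelled reference list; the slides do not state how the reported averages were actually obtained, and the function name is hypothetical.

```python
def precision_recall(extracted, reference):
    """Precision and recall of an extracted word set against a reference set."""
    extracted, reference = set(extracted), set(reference)
    correct = len(extracted & reference)
    # Precision: share of extracted words that are in the reference list.
    precision = correct / len(extracted) if extracted else 0.0
    # Recall: share of reference words that were extracted.
    recall = correct / len(reference) if reference else 0.0
    return precision, recall
```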

SLIDE 15

EXPERIMENTAL RESULTS

[Figure titles: COMPARISON BETWEEN EXTRACTION STEPS; QUALITY OF THE EXTRACTED LEXICON BY EVALUATION CLASSIFICATION]

  • Filter Seed builds the base of classification, but double propagation extends it.
  • Conjunction based extraction has a low impact on overall classification.

SLIDE 16

EXPERIMENTAL RESULTS

[Figure title: Evaluation Based on Sentiment Classification]

Conclusion: double propagation normalizes the quality of the lexicon upwards

SLIDE 17

INTERFACE

[Screenshots: WELCOME SCREEN; POLARITY CLASSIFICATION OPTIONS]

http://deixto.com/niosto/

SLIDE 18

INTERFACE

[Screenshots: STEMMER OPINION OPTIONS ALGORITHM EVALUATION OPTIONS]

http://deixto.com/niosto/

SLIDE 19

More…

Mining Domain-Specific Dictionaries

Pantelis Agathangelou, Ioannis Katakis, Fotios Kokkoras, Konstantinos Ntonas

pandelisagathangelou@gmail.com

Thank you for your attention!

DEiXTo - Web Extraction Tool: http://deixto.com/
NiosTo - Dictionary Extraction Tool: http://deixto.com/niosto/