Dynamic knowledge management, "the information valve" (Gestion dynamique des connaissances, « la vanne de l'information »)


SLIDE 1

[Diagram: decision loop linking "getting informed", knowledge production, perception, representation, risk evaluation, sensitivity of the strategy and revision; ordered list of retained solutions, rhetoric of the decision logic, partial scores; defining a strategy, multi-criteria evaluation, arguing, selection of discriminating knowledge; distances between the eligible solutions, ignorance and the ideal; risk of retaining the most strategic solution; acceptability := admitted risk, ΔRisk, dRisk/dt; psychological inertia, controversy, non-consensus.]

Dynamic knowledge management, "the information valve"

  • Risk estimation for a given ranking and strategy
  • Most relevant dimensions for further information acquisition

Actuator of the control loop: interactive decision-support system (recommendation); control loop (cognitive automation); control signal.

Is a voting approach accurate for opinion mining?

Classic method – overview of the main process, step 1: preprocessing and vector-space modelling

[Diagram: the learning corpus is indexed (complete index), the index is reduced (reduced index), and the text vectors of the learning and test sets are projected onto the reduced index to obtain reduced vectors.]
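Not part of the original slides: a minimal sketch of this step, assuming scikit-learn; the document-frequency cutoff (min_df) is an illustrative stand-in for the unspecified index-reduction technique, and the two-document corpus is a toy.

```python
# Minimal sketch of step 1: preprocessing and vector-space modelling.
# min_df is an illustrative stand-in for the "index reduction" step.
from sklearn.feature_extraction.text import TfidfVectorizer

learning_corpus = [
    "the movie is amazing , good acting and great action",
    "a boring , commercial movie with poor dialogue",
]

# Complete index: every term is kept.
full_vectorizer = TfidfVectorizer()
full_vectors = full_vectorizer.fit_transform(learning_corpus)

# Reduced index: drop rare terms to obtain the "reduced vectors".
reduced_vectorizer = TfidfVectorizer(min_df=2)  # keep terms seen in >= 2 documents
reduced_vectors = reduced_vectorizer.fit_transform(learning_corpus)

print(len(full_vectorizer.vocabulary_), "terms in the complete index")
print(len(reduced_vectorizer.vocabulary_), "terms in the reduced index")
```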

SLIDE 2


Classic method – overview of the main process, step 2: modelling and classification

[Diagram: the reduced vectors of the training corpus are used to learn a classification model, which then assigns a class to each reduced vector of the test corpus.]
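A matching sketch for step 2, again assuming scikit-learn (the slides do not name a particular classifier, so LinearSVC is an illustrative choice).

```python
# Minimal sketch of step 2: learn a classification model on the training
# vectors and assign a class to each test vector.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

train_texts = ["good nice excellent movie", "bad poor boring movie"]
train_labels = ["positive", "negative"]
test_texts = ["a nice and excellent story", "poor acting , boring plot"]

vectorizer = TfidfVectorizer()
train_vectors = vectorizer.fit_transform(train_texts)  # training corpus
test_vectors = vectorizer.transform(test_texts)        # test corpus

model = LinearSVC().fit(train_vectors, train_labels)   # classification model
print(model.predict(test_vectors))                     # assigned class for each vector
```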


Classic method: two classification steps

[Diagram: extraction of CAs, mapping, evaluation of the intention of the CAs, assignment of a score.]

Two main phases:

  • Extraction of value judgments and assignment to an evaluation criterion
  • Assignment of a score to the value judgment

Automated extraction of CAs for multi-criteria evaluation

5 – Automated extraction of CAs

SLIDE 3


Web opinion mining: how to extract opinions from blogs?

Ali Harb, Michel Plantié, Gérard Dray, Mathieu Roche, François Trousset, Pascal Poncelet

(LGI2P/EMA – LIRMM) Nîmes – France


Outline

  • Introduction
  • State of the art
  • « AMOD » method
  • Results on movie domain
  • Test on another domain
  • Conclusion and future work

SLIDE 4


Introduction

Opinion detection on the Web

  • New techniques for expressing opinions are increasingly easy to use!
  • We always have an opinion on everything!
  • Analysing expressed opinions:
      • What about my public image?
      • I want to buy a new camera!
      • It is raining... what about watching the Indiana Jones movie?


Introduction

Importance of the blog phenomenon

  • More than 100 million blogs; 120,000 blogs created every day
  • 35% of Internet users rely on opinions posted on blogs
  • 44% of Internet users have abandoned a purchase after seeing a negative opinion on a blog
  • 91% think that the web has a "great or medium importance" in forming their own opinion of a company's image

Sources: Médiamétrie, EIAA, Forrester, Technorati (August 2007), OpinionWay 2006.

SLIDE 5


Introduction: an example of a blog


Aggregation tools for opinions and journals

SLIDE 6


Classification vs. opinion classification

  • Classification
      • Classify documents according to their theme: sport, cinema, literature, …
      • Word comparisons (bag-of-words approach)
      • Goal, Football, Transfer, Blues => SPORT class
  • Opinion classification
      • Classify documents according to their general feeling (positive vs. negative)
      • More difficult than traditional classification approaches: how to capture a particular opinion?


State of the art

Unsupervised opinion classification

  • Turney's algorithm (2002)

Input: opinion documents
Output: classified documents (positive vs. negative)

  • 1. Morphosyntactic analysis to identify phrases
  • 2. Semantic orientation (SO) estimation of the extracted phrases
  • 3. Assignment of the document to a class (positive vs. negative)

SLIDE 7


State of the art

Class assignment

  • Compute the average of the SO values of a document's phrases (a short sketch follows)
      • average > 0 : positive
      • average < 0 : negative
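A minimal sketch of this assignment rule (not from the slides); the SO values passed in are made up for illustration.

```python
# Sketch of the class-assignment rule: average the semantic orientations (SO)
# of the phrases extracted from one document and use the sign of the average.
def classify_document(so_values):
    """so_values: list of SO scores for the phrases of one document."""
    if not so_values:
        return "unknown"
    average_so = sum(so_values) / len(so_values)
    return "positive" if average_so > 0 else "negative"

# Illustrative, made-up SO values for two documents.
print(classify_document([1.8, 0.4, -0.2]))   # positive
print(classify_document([-0.9, -0.1, 0.3]))  # negative
```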

State of the art: difficulties

  • Problems:
      • Negative opinions are very often expressed more softly than positive ones
      • Adverbs may invert polarity
  • Do we use the same adjectives in different domains?
      • The chair is comfortable
      • The movie is comfortable ????
  • The same adjectives may have different meanings in different domains or contexts
      • The picture quality of this camera is high (positive)
      • The ceilings of the building are high (neutral)

SLIDE 8

Outline

  • Introduction
  • State of the art
  • Automatic Mining of Opinion Dictionaries (AMOD) method
  • Results on movie domain
  • Test on another domain
  • Conclusion and future work

Input: PWords = {good, nice, excellent, positive, fortunate, correct, superior},
NWords = {bad, nasty, poor, negative, unfortunate, wrong, inferior}, one domain

Output: new adjectives specific to that domain

  • 1. Ask a search engine
  • 2. Search for significant adjectives
  • 3. Eliminate « noisy » adjectives
  • 4. Run the algorithm again to find new significant adjectives (a skeleton of this loop is sketched below)
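Not part of the original slides: a hypothetical skeleton of this loop in Python. The three helpers are stubs standing in for the search-engine, association-rule and AcroDefIM3 steps detailed on the following slides; none of this is the authors' implementation.

```python
# Hypothetical skeleton of the AMOD loop; the helpers below are stubs.
def search_blogs(pos_words, neg_words, domain):
    # Stub: would query a blog search engine with
    # "+opinion +review +<domain> +<pos word> -<neg words>" and return documents.
    return []

def extract_significant_adjectives(documents):
    # Stub: would mine association rules between adjectives inside a window
    # and keep the frequent candidates.
    return set()

def acrodef_im3(adjective, domain):
    # Stub: would score the adjective against the seed words with AcroDefIM3.
    return 0.0

def amod(seed_pos, seed_neg, domain, iterations=2, threshold=0.01):
    learned_pos, learned_neg = set(seed_pos), set(seed_neg)
    for _ in range(iterations):                              # step 4: rerun with enriched seeds
        cand_pos = extract_significant_adjectives(
            search_blogs(learned_pos, learned_neg, domain))  # steps 1-2, positive side
        cand_neg = extract_significant_adjectives(
            search_blogs(learned_neg, learned_pos, domain))  # steps 1-2, negative side
        common = cand_pos & cand_neg                         # common-adjective suppression
        learned_pos |= {a for a in cand_pos - common
                        if acrodef_im3(a, domain) > threshold}   # step 3
        learned_neg |= {a for a in cand_neg - common
                        if acrodef_im3(a, domain) > threshold}   # step 3
    return learned_pos, learned_neg

PWORDS = {"good", "nice", "excellent", "positive", "fortunate", "correct", "superior"}
NWORDS = {"bad", "nasty", "poor", "negative", "unfortunate", "wrong", "inferior"}
print(amod(PWORDS, NWORDS, "cinema"))
```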
SLIDE 9


AMOD: ask a search engine

  • Example of a request with Google for the word good:
    "+opinion +review +cinema +good -bad -nasty -poor -negative -unfortunate -wrong -inferior"


AMOD: ask a search engine

  • Results

[Diagram: one query per seed word (e.g. nice, good vs. poor, bad); each query returns 300 documents, so 7 × 300 positive-seed documents plus 7 × 300 negative-seed documents = 4200 documents in total.]

SLIDE 10

AMOD: search for significant adjectives

  • Association rule usage
      • Item: adjective
      • Transaction: sentence / time window (window sizes WS1, WS2, …)

Example: "The movie is amazing, good acting, lots of great action and the popcorn was delicious"

(A toy co-occurrence-counting sketch follows.)
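A toy sketch (not the authors' code) of the co-occurrence counting behind these rules: adjectives are assumed to have been extracted already, a transaction is a window of consecutive sentences, and min_support is an absolute count rather than the percentage used on the slides.

```python
# Count co-occurrences of adjectives inside a sliding window of sentences
# (the "transaction") and keep the pairs whose support passes a threshold.
from collections import Counter
from itertools import combinations

def frequent_pairs(adjective_sentences, window_size=1, min_support=2):
    counts = Counter()
    for i in range(len(adjective_sentences) - window_size + 1):
        # One transaction = the adjectives of `window_size` consecutive sentences.
        window = set()
        for sentence in adjective_sentences[i:i + window_size]:
            window.update(sentence)
        for pair in combinations(sorted(window), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

sentences = [["amazing", "good", "great", "delicious"],
             ["good", "great", "funny"],
             ["boring", "commercial"]]
print(frequent_pairs(sentences))   # {('good', 'great'): 2}
```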


AMOD: eliminate « noisy » adjectives

  • Rule examples

    Positive rules              Negative rules
    excellent, good → funny     bad, wrong → boring
    nice, good → great          bad, wrong → commercial
    nice → encouraging          poor → current
    good → different            bad → different

  • Common adjective suppression (e.g. "different" is produced on both sides and is removed)

SLIDE 11


AMOD: eliminate « noisy » adjectives

  • How to eliminate useless adjectives? … with search-engine hits

  • Mutual information:

      PMI(w1, w2) = log2( p(w1 & w2) / (p(w1) · p(w2)) )

  • Cubic mutual information (favours frequent co-occurrences):

      IM3(w1, w2) = log2( nb(w1 & w2)^3 / (nb(w1) · nb(w2)) )

  • AcroDefIM3 = IM3 + domain information (context C):

      AcroDefIM3(w1, w2) = log2( hit((w1 & w2) and C)^3 / (hit(w1 and C) · hit(w2 and C)) )
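Not from the slides: the three measures expressed on raw hit counts. The `hits_*` arguments stand for the number of results returned by the search engine for the corresponding query, and the values in the example are made up.

```python
import math

def pmi(hits_w1w2, hits_w1, hits_w2, total):
    # PMI(w1, w2) = log2( p(w1 & w2) / (p(w1) * p(w2)) ), probabilities from hit counts.
    return math.log2((hits_w1w2 / total) / ((hits_w1 / total) * (hits_w2 / total)))

def im3(hits_w1w2, hits_w1, hits_w2):
    # Cubic mutual information: favours frequent co-occurrences.
    return math.log2(hits_w1w2 ** 3 / (hits_w1 * hits_w2))

def acrodef_im3(hits_w1w2_ctx, hits_w1_ctx, hits_w2_ctx):
    # Same as IM3, but every count is restricted to the domain context C
    # (each query is ANDed with the domain keywords).
    return math.log2(hits_w1w2_ctx ** 3 / (hits_w1_ctx * hits_w2_ctx))

# Illustrative, made-up hit counts.
print(im3(hits_w1w2=40, hits_w1=900, hits_w2=1200))
print(acrodef_im3(hits_w1w2_ctx=25, hits_w1_ctx=400, hits_w2_ctx=600))
```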


AMOD: eliminate « noisy » adjectives

  • Use of the AcroDefIM3 measure to get rid of noisy adjectives

    Positives                           Negatives
    excellent, good : funny (20.49)     bad, wrong : boring (8.33)
    nice, good : great (12.50)          bad, wrong : commercial (3.054)
    nice : encouraging (0.001)          poor : current (0.0002)

SLIDE 12

State of the art

Class assignment

  • The movie is bad (negative)
  • The movie is not bad (rather positive)
  • The movie is not bad, there are a lot of funny moments


AMOD: class assignment

Use of adverbs that invert polarity:

  • 1. The movie isn't good
  • 2. The movie isn't amazing at all
  • 3. The movie isn't very good
  • 4. The movie isn't too good
  • 5. The movie isn't so good
  • 6. The movie isn't good enough
  • 7. The movie is neither amazing nor funny

Rules: cases 1, 2, 7 : inversion; cases 3, 4, 5 : +30%; case 6 : −30% (one possible implementation is sketched below)
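Not from the slides: one possible reading of these rules in code. The slide leaves the exact arithmetic implicit, so the interpretation (literal inversion, +30%, −30% applied to the SO score) and the pattern names are assumptions.

```python
# One reading of the adverb/negation rules above; the exact arithmetic is
# not specified on the slide, so this mapping is an assumption.
def adjust_so(so, pattern):
    if pattern in {"isn't X", "isn't X at all", "neither X nor Y"}:   # cases 1, 2, 7
        return -so
    if pattern in {"isn't very X", "isn't too X", "isn't so X"}:      # cases 3, 4, 5
        return so * 1.3
    if pattern == "isn't X enough":                                   # case 6
        return so * 0.7
    return so

print(adjust_so(2.0, "isn't X at all"))   # -2.0
print(adjust_so(2.0, "isn't X enough"))   # 1.4
```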

SLIDE 13

Outline

  • Introduction
  • State of the art
  • « AMOD » method
  • Results on movie domain
  • Test on another domain
  • Conclusion and future work

Experiments on the movie domain

  • Learning phase: blogsearch.google.fr
  • Test: Movie Review Data (positive and negative reviews from the Internet Movie Database)
  • The two data sets are very different (blogs vs. journalists)

Results with the seed lists only (PL = size of the positive adjective list, NL = size of the negative adjective list):

    Seed lists    %         PL   NL
    Positives     66.9%      7    7
    Negatives     30.49%     7    7

SLIDE 14

Classification with learned adjectives

    WS-S    Positives   PL     NL          WS-S    Negatives   PL     NL
    1-1%    67.2%       7+15   7+20        1-1%    39.2%       7+15   7+20

  • WS-S: window size – support value
  • Best results with WS = 1 and support = 1%


Learned adjectives, AcroDef, reinforcement

Learned adjectives filtered with AcroDefIM3:

    WS-S    Positives   PL     NL          WS-S    Negatives   PL     NL
    1-1%    75.9%       7+11   7+11        1-1%    46.7%       7+11   7+11

With reinforcement (a learned word becomes a seed word):

    WS-S    Positives   PL     NL          WS-S    Negatives   PL     NL
    1-1%    82.6%       7+11   7+11        1-1%    52.4%       7+11   7+11

SLIDE 15

Influence of the learning-set size

[Figure, from 250 documents: relation between the size of the learning set for each seed word (x-axis) and the number of learned adjectives (y-axis).]

Comparison with a classic method

               Positives FSCORE   Negatives FSCORE
    Classic     60.5%              60.9%
    AMOD        71.73%             62.2%

  • Precision = ratio of relevant documents found to all documents (relevant or not) found
  • Recall = ratio of relevant documents found to all relevant documents in the knowledge base or corpus
  • F-score = Precision × Recall / (Precision + Recall) (a small helper is sketched below)
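A small helper, not from the slides, that computes these measures on toy document sets; note that the commonly used F1 measure (van Rijsbergen) includes a factor 2, which is shown alongside the slide's formula.

```python
# Precision, recall and F-score on toy sets of document identifiers.
def precision_recall(found, relevant):
    found, relevant = set(found), set(relevant)
    true_positives = len(found & relevant)
    precision = true_positives / len(found)     # relevant among what was found
    recall = true_positives / len(relevant)     # found among what is relevant
    return precision, recall

p, r = precision_recall(found={"d1", "d2", "d3", "d4"}, relevant={"d1", "d2", "d5"})
print(p, r)                # 0.5 precision, ~0.67 recall
print(p * r / (p + r))     # F-score as written on the slide
print(2 * p * r / (p + r)) # standard F1 (factor 2)
```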

SLIDE 16


Test on another domain

  • Learning on the automobile domain (cars)
  • Tests: 40 documents from www.epinions.com

    (seed words only)   WS   S     Positive   PL     NL
                        1    1%    57.5%      7+0    7+0

                        WS   S     Positive   PL     NL
    Learned adj.        1    1%    87.5%      7+13   7+6
    AcroDef             1    1%    92.5%      7+0    7+0
    Reinforcement       1    1%    95%        7+0    7+0


Conclusion and future work

  • The AMOD approach is very encouraging
      • It extracts positive and negative adjectives for opinion-mining tasks
      • Domain-specific adjectives
      • Experiments show very good results for classifying opinion texts
  • The method is independent of the domain
  • It automatically builds training corpora of opinion documents
  • Future work:
      • Enhance the classification procedure
      • Use this tool to build training corpora and apply other classification algorithms
      • Extract other kinds of words
      • Extend to other classification tasks, such as criteria classification

SLIDE 17

THANK YOU……..


References

[1] R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. In VLDB'94, 1994.

[2] A. Andreevskaia and S. Bergler. Semantic tag extraction from WordNet glosses. 2007.

[3] K. Church and P. Hanks. Word association norms, mutual information, and lexicography. In Computational Linguistics, volume 16, pages 22–29, 1990.

[4] D. Downey, M. Broadhead, and O. Etzioni. Locating complex named entities in web text. In Proceedings of IJCAI'07, pages 2733–2739, 2007.

[5] M. Hu and B. Liu. Mining and summarizing customer reviews. In Proceedings of KDD'04, ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, WA, 2004.

[6] J. Kamps, M. Marx, R. J. Mokken, and M. de Rijke. Using WordNet to measure semantic orientation of adjectives. In Proceedings of LREC 2004, the 4th International Conference on Language Resources and Evaluation, pages 174–181, Lisbon, Portugal, 2004.

[7] G. Miller. WordNet: a lexical database for English. In Communications of the ACM, 1995.

[8] M. Plantié, M. Roche, G. Dray, and P. Poncelet. Is a voting approach accurate for opinion mining? In Proceedings of the 10th International Conference on Data Warehousing and Knowledge Discovery (DaWaK'08), Torino, Italy, 2008.

[9] C. J. van Rijsbergen. Information Retrieval, 2nd edition. Butterworths, London, 1979.

[10] M. Roche and V. Prince. AcroDef: a quality measure for discriminating expansions of ambiguous acronyms. In Proceedings of CONTEXT, Springer-Verlag, LNCS, pages 411–424, 2007.
SLIDE 18


Classification with learned adjectives

    WS-S    Positives   PL     NL
    1-1%    67.2%       7+15   7+20
    1-2%    60.3%       7+8    7+13
    1-3%    65.6%       7+6    7+1
    2-1%    57.6%       7+13   7+35
    2-2%    56.8%       7+8    7+17
    2-3%    68.4%       7+4    7+4
    3-1%    28.9%       7+11   7+48
    3-2%    59.3%       7+4    7+22
    3-3%    67.3%       7+5    7+11

    WS-S    Negatives   PL     NL
    1-1%    39.2%       7+15   7+20


Sentence identification

  • Morpho-syntactic analysis on documents
  • TreeTagger, e.g. on « On ne change pas une équipe qui gagne » :

    On        PRO:PER    on
    ne        ADV        ne
    change    VER:PRES   changer
    pas       ADV        pas
    une       DET:ART    un
    équipe    NOM        équipe
    qui       PRO:REL    qui
    gagne     VER:PRES   gagner
    .         SENT       .
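Not part of the slides: one way to reproduce this tagging from Python, assuming the TreeTagger binary is installed and the third-party treetaggerwrapper package is available (both are assumptions; the slides only show TreeTagger's output).

```python
# Tag a French sentence with TreeTagger through the treetaggerwrapper package.
# Requires the TreeTagger binary to be installed (TAGDIR may need to be set).
import treetaggerwrapper

tagger = treetaggerwrapper.TreeTagger(TAGLANG="fr")
lines = tagger.tag_text("On ne change pas une équipe qui gagne.")
for line in lines:
    word, pos, lemma = line.split("\t")   # each line is "word<TAB>POS<TAB>lemma"
    print(f"{word}\t{pos}\t{lemma}")
```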

SLIDE 19


How to learn opinions in a specific domain?

AMOD method

Input: PWords = {good, nice, excellent, positive, fortunate, correct, superior},
NWords = {bad, nasty, poor, negative, unfortunate, wrong, inferior}, one domain

Output: new adjectives specific to that domain

  • 1. Ask a search engine
  • 2. Search for significant adjectives
  • 3. Eliminate « noisy » adjectives
  • 4. Run the algorithm again to find new significant adjectives


Semantic orientation estimation (1/3)

  • Use of PMI-IR (Pointwise Mutual Information and Information Retrieval)
  • PMI between two words w1 and w2:

      PMI(w1, w2) = log2( p(w1 & w2) / (p(w1) · p(w2)) )

  • p(w1 & w2): probability that w1 and w2 appear together
  • PMI:
      • > 0 : the words tend to appear together
      • < 0 : the words do not tend to appear together

SLIDE 20

Semantic orientation estimation (2/3)

Semantic orientation (SO) of a word:

    SO-PMI(word) = Σ_{pword ∈ PWords} PMI(word, pword) − Σ_{nword ∈ NWords} PMI(word, nword)

PWords = {good, nice, excellent, positive, fortunate, correct, superior}
NWords = {bad, nasty, poor, negative, unfortunate, wrong, inferior}

Semantic orientation estimation (3/3)

  • PMI-IR: PMI estimated by issuing requests to search engines and counting the number of hits
  • With a search engine (AltaVista: NEAR operator; Google: « w1 * w2 »):

    SO-PMI(word) = log2 [ ( Π_{pword ∈ PWords} hits(word NEAR pword) · Π_{nword ∈ NWords} hits(nword) )
                        / ( Π_{pword ∈ PWords} hits(pword) · Π_{nword ∈ NWords} hits(word NEAR nword) ) ]

(A small hit-count sketch follows.)
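A small sketch, not from the slides, of how the hit-count version of SO-PMI can be computed once the hit counts have been collected; the counts below are made up and each value would in practice be the number of results returned for the corresponding query.

```python
# SO-PMI estimated from search-engine hit counts (PMI-IR).
import math

PWORDS = ["good", "nice", "excellent", "positive", "fortunate", "correct", "superior"]
NWORDS = ["bad", "nasty", "poor", "negative", "unfortunate", "wrong", "inferior"]

def so_pmi(hits_word_near, hits_alone):
    """hits_word_near[w]: hits for "word NEAR w"; hits_alone[w]: hits for w alone."""
    numerator, denominator = 1.0, 1.0
    for p in PWORDS:
        numerator *= hits_word_near[p]
        denominator *= hits_alone[p]
    for n in NWORDS:
        numerator *= hits_alone[n]
        denominator *= hits_word_near[n]
    return math.log2(numerator / denominator)

# Illustrative, made-up counts for one candidate word.
near = {**{w: 50 for w in PWORDS}, **{w: 5 for w in NWORDS}}
alone = {w: 1000 for w in PWORDS + NWORDS}
print(so_pmi(near, alone))   # > 0 => the word leans positive
```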