opinion mining

Opinion Mining, Feiyu XU & Xiwen CHENG


  1. Opinion Mining. Feiyu XU & Xiwen CHENG, Xiwen.cheng@dfki.de, DFKI, Saarbruecken, Germany, Jan 19th, 2011. Language Technology I

  2. Discussion on Opinion Mining Application

  3. Textmap: topic monitoring system

  4. Tweetmotif: Topic summarization on Twitter, e.g. wikileak, parenting

  5. What the trend: Trend monitoring, e.g. wikileak

  6. Opinion gathering speed on the Internet
     • WSJ published the article “Why Chinese Mothers Are Superior”, written by Amy Chua, on 8th Jan, 2011. Until 18th Jan:
     • 6,800 comments on WSJ
     Keyword: Amy Chua
     • 3,490,000 on Google
     • 5,600 on twitter.com
     • 5,289 on wordpress.com
     Keyword: parenting
     • 83,200,000 search results on Google
     • 1,620,000 from twitter.com
     • 502,000 from wordpress.com

  7. A question from Quora

  8. Proposals of Opinion Mining Application and Solution?

  9. Discussion on Resource for Movie Review Summarization

  10. Reviews on “Das Leben der Anderen” @ imdb

  11. Reviews on “Das Leben der Anderen” @ imdb

  12. Top 250 movies voted by imdb users

  13. Which resources and which features would you choose for OM tasks?

  14. Experiment on KomParse: Making NPCs express their opinions emotionally

  15. Gossip Galore in Rascalli

  16. Hank in KomParse

  17. Paul's solution
      • Unsupervised machine learning
      • Data: comments ranked by reviewers (1 ~ 10 stars)
      • Features
        – N-Gram Token Patterns
        – Dependency Patterns
      • Extra knowledge
        – WordNet
        – Negation expressions
      • Learning algorithm
        – Scoring system

  18. Data Processing
      • Resource
        – IMDb (http://www.imdb.com/), an online movie storehouse
      • Interested in IMDb pages:
        – with name (actors, authors, directors etc.)
        – with title (movie title, movie recommendations from IMDb)
      • Containing the information:
        – Movie title
        – Review
        – Review title
        – Review date
        – Author name
        – Author origin (optional)
        – Recommendation of other users to this review (optional)
        – The score the author gave the reviewed movie, x/10 (optional)

  19. Data Processing
      <Record name="Paycheck (2003)" isA="Movie" type="IMDb user reviews">
        <Feature name="Recommend">0 out of 3</Feature>
        <Feature name="Time">25 December 2003</Feature>
        <Feature name="Author">ak2k</Feature>
        <Feature name="Review">A poor remake of Minority Report, with less talented actors. Promising plot line that wilted away in the first thirty minutes of the film. Interesting inductive journey and neat car chases, but nowhere close to my money's worth. I'd recommend to go and see LOR again.</Feature>
        <Feature name="Score">1/10</Feature>
        <Feature name="From">Illinois</Feature>
        <Feature name="Title">A perfect Christmas movie has about as much connection with reality as Santa Clause does.</Feature>
      </Record>
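
A record in this shape can be read with a standard XML parser. The following is a minimal sketch using Python's `xml.etree.ElementTree`; the helper names `parse_record` and `parse_score` are hypothetical, and the review text is shortened for the example. It is not the tooling used by the authors.

```python
import xml.etree.ElementTree as ET

# Cleaned-up copy of the record from the slide (review text shortened here).
RECORD = """<Record name="Paycheck (2003)" isA="Movie" type="IMDb user reviews">
  <Feature name="Recommend">0 out of 3</Feature>
  <Feature name="Time">25 December 2003</Feature>
  <Feature name="Author">ak2k</Feature>
  <Feature name="Review">A poor remake of Minority Report, with less talented actors.</Feature>
  <Feature name="Score">1/10</Feature>
  <Feature name="From">Illinois</Feature>
</Record>"""


def parse_record(xml_text):
    """Return the movie name and a dict mapping feature name -> text."""
    root = ET.fromstring(xml_text)
    features = {f.get("name"): (f.text or "").strip() for f in root.iter("Feature")}
    return root.get("name"), features


def parse_score(score_text):
    """Turn 'x/10' into the integer x; the score is optional, so allow None."""
    if not score_text:
        return None
    return int(score_text.split("/")[0])


movie, features = parse_record(RECORD)
print(movie, features["Author"], parse_score(features.get("Score")))
```

Optional fields such as the score or the author origin are looked up with `dict.get`, so a record without them does not raise an error.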

  20. Data Processing
      Presumptions and observations:
      • The score indicates the sentiment of the review
      • Short reviews are preferred over long reviews
        – long reviews have a lot of objective parts about storyline, anecdotes etc.
        – short reviews contain only the opinion about the movie and are often expressed sentimentally
      • The sentiment classification of extreme reviews (very high or very low rating) is mostly unambiguous and clear, while mid-rated reviews have a lot of unclear sentences, such as “on the one hand … on the other”

  21. Data Processing
      • Filtering out reviews
        – with a number of tokens > 900
        – with a rating of 4, 5, 6, 7 or 8 out of 10
      • SCORE assignment to each sentence in the selected reviews
        – SCORE = Rank (1 ~ 10 stars)
        – SCORE + 1, if the sentence:
          • Is the first, second or last sentence
          • And contains keywords such as I, me, movie, film and this movie
        – SCORE - 1, if the sentence:
          • Has a length > 100
          • And contains keywords such as imdb, you, your, spoiler and review etc.
      • The sentence with the highest SCORE in a review is selected.
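
The filtering and sentence scoring rules above can be sketched as follows. This is an illustrative reading, not the authors' code: the function names are hypothetical, the tokenizer is a simple regular expression, the sentence length is assumed to be measured in characters (the slide gives no unit), and the keyword lists contain only the words named on the slide.

```python
import re

BOOST_KEYWORDS = {"i", "me", "movie", "film"}  # "this movie" is covered by "movie"
PENALTY_KEYWORDS = {"imdb", "you", "your", "spoiler", "review"}
MID_RATINGS = {4, 5, 6, 7, 8}


def tokenize(text):
    """Lowercase word tokens; a stand-in for a real tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def keep_review(review_text, rating, max_tokens=900):
    """Keep only reviews that are short enough and have an extreme rating."""
    return len(tokenize(review_text)) <= max_tokens and rating not in MID_RATINGS


def score_sentences(sentences, rating, max_len=100):
    """Start every sentence at the review rating, then apply the +1 / -1 rules."""
    last = len(sentences) - 1
    scores = []
    for idx, sentence in enumerate(sentences):
        words = set(tokenize(sentence))
        score = rating
        if idx in (0, 1, last) and words & BOOST_KEYWORDS:
            score += 1
        # Assumption: "length" is counted in characters.
        if len(sentence) > max_len and words & PENALTY_KEYWORDS:
            score -= 1
        scores.append(score)
    return scores


def select_sentence(sentences, rating):
    """Return the sentence with the highest SCORE (first one wins on ties)."""
    scores = score_sentences(sentences, rating)
    best = max(range(len(sentences)), key=lambda i: scores[i])
    return sentences[best]


review = ["I loved this movie.", "The plot follows a spy.", "The cast is large."]
print(select_sentence(review, rating=10))
```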

  22. Features – N-gram token pattern
      Extracting uni-, bi- and trigrams out of every sentence from the sentimental corpus
      • For example: I absolutely loved this movie.
      • Unigrams:
        – i (NP), absolutely (RB), loved (VVD)
      • Bigrams:
        – i absolutely (NP RB), absolutely loved (RB VVD)
      • Trigrams:
        – i absolutely loved (NP RB VVD), absolutely loved this (RB VVD DT)
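
A minimal sketch of the n-gram extraction over an already tagged sentence. The function name `ngram_patterns` is hypothetical, and the tag `NN` for "movie" is an assumption, since the slide only lists tags for the first four words.

```python
def ngram_patterns(tagged_tokens, n):
    """Return (words, tags) pairs for every n-gram in a tagged sentence."""
    patterns = []
    for start in range(len(tagged_tokens) - n + 1):
        window = tagged_tokens[start:start + n]
        words = " ".join(word for word, _ in window)
        tags = " ".join(tag for _, tag in window)
        patterns.append((words, tags))
    return patterns


# Tags for the first four tokens follow the slide; "NN" for "movie" is assumed.
tagged = [("i", "NP"), ("absolutely", "RB"), ("loved", "VVD"),
          ("this", "DT"), ("movie", "NN")]

for n in (1, 2, 3):
    print(n, ngram_patterns(tagged, n))
```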

  23. Features – Dependency Pattern
      Example: This is a funny super interesting and exciting movie.
      Some important information is missed by the n-gram token patterns.
      • funny and movie are not caught by an n-gram (n < 6)
      So we include dependency patterns:
      • amod(movie-9, funny-4)
      Tool: Stanford Dependency Parser
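
The n < 6 limit can be checked by counting positions in the example sentence. The sketch below does not run a dependency parser; it only computes the smallest n-gram that would cover both words and formats the relation in the notation shown on the slide. Both helper names are hypothetical.

```python
def min_ngram_size(tokens, first, second):
    """Smallest n for which a single n-gram contains both words."""
    return abs(tokens.index(first) - tokens.index(second)) + 1


def format_dependency(relation, tokens, head, dependent):
    """Render a relation as relation(head-i, dependent-j) with 1-based positions."""
    head_pos = tokens.index(head) + 1
    dep_pos = tokens.index(dependent) + 1
    return f"{relation}({head}-{head_pos}, {dependent}-{dep_pos})"


sentence = "This is a funny super interesting and exciting movie".split()

print(min_ngram_size(sentence, "funny", "movie"))
print(format_dependency("amod", sentence, "movie", "funny"))
```

Since the extraction stops at trigrams, the link between "funny" and "movie" is only available through the dependency pattern.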

  24. Extra Knowledge – Unigram patterns extended with WordNet
      • All 1-gram adjective and adverb patterns will be extended with WordNet. Both the synonyms and the antonyms are used.
      • For instance, the 1-gram pattern “dry” can be extended with
        – Parched / arid / anhydrous / sere / dried-up
        – Wet / watery / damp / moist / humid / soggy
      • In our experiment, the antonyms/synonyms are the words connected to the original word with a maximum distance of two.

  25. Extra Knowledge – Negations
      • Some elements in a sentence can change the sentiment of a word or phrase, such as
        – Subjunctive: I thought this movie is good.
        – Tense: This movie was good.
        – Negation: This film is not funny.
        – Quotation: My friend told me “this is the best movie ever, you have to watch it” but I didn’t like it.
      • In our work, the content in the quotation is removed
      • We consider only negations such as not, no, never and n’t, including
        – no wonder, not just, not to mention etc.
        – Restricted comparative sentences: “not better as”, “no more” etc.
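
A rough sketch of the two steps named above: dropping quoted content and then looking for the negation words not, no, never and n't. The function names are hypothetical, and the multiword expressions and comparative constructions from the slide are left out because the slide does not say how they are treated.

```python
import re

NEGATION_WORDS = {"not", "no", "never"}


def strip_quotations(text):
    """Remove quoted spans (straight or curly double quotes)."""
    return re.sub(r'["“][^"“”]*["”]', "", text)


def has_negation(text):
    """True if the text contains not / no / never or a word ending in n't."""
    normalized = text.lower().replace("’", "'")
    words = re.findall(r"[a-z']+", normalized)
    return any(w in NEGATION_WORDS or w.endswith("n't") for w in words)


example = ("My friend told me “this is the best movie ever, "
           "you have to watch it” but I didn't like it.")
stripped = strip_quotations(example)
print(stripped)
print(has_negation(stripped))
```

Removing the quotation first keeps a negation inside someone else's quoted opinion from being counted as the reviewer's own.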

  26. Algorithm – Score of patterns
      • Each pattern has an iSCORE, including two sub-values
        – iSCORE_pos: the value of being positive
        – iSCORE_neg: the value of being negative
      • The iSCORE is initialized with the frequency of this pattern in the corpus

  27. Algorithm – Data bias
      • Although “more” negative scored sentences are used, i.e. (1/10, 2/10, 3/10) vs. (9/10, 10/10), positive reviews are still twice the negative ones.
      • Assuming 1) there are X negative sentences and Y positive ones, or the other way round, and 2) Y > X:
        equalizer = Y / X
        BIAS = equalizer / (X + Y + Y - X)
        iSCORE_Y = iSCORE_Y / 2Y - BIAS
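
As a worked example of the first two formulas, take the hypothetical counts X = 100 and Y = 200: the equalizer is 200 / 100 = 2 and the denominator X + Y + Y - X reduces to 2Y = 400, so BIAS = 2 / 400 = 0.005. The sketch below computes just these two values; the final iSCORE_Y update is left out because the grouping of its terms is not clear from the slide. The function name is hypothetical.

```python
def equalizer_and_bias(x, y):
    """Compute equalizer = Y / X and BIAS = equalizer / (X + Y + Y - X).

    x is the size of the smaller class and y the size of the larger one.
    """
    if not 0 < x < y:
        raise ValueError("expected 0 < X < Y")
    equalizer = y / x
    bias = equalizer / (x + y + y - x)  # the denominator simplifies to 2 * Y
    return equalizer, bias


# Hypothetical counts: twice as many positive as negative sentences.
equalizer, bias = equalizer_and_bias(100, 200)
print(equalizer, bias)
```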

  28. Algorithm – iSCORE
      • iSCORE = iSCORE_pos - iSCORE_neg
        – If the value of the iSCORE is positive, the computed polarity of the pattern is positive, and if the value is negative, the polarity is negative
      • iSCORE = iSCORE * 2, if the pattern is binary
      • iSCORE = iSCORE * 3, if the pattern is triple
      • iSCORE = iSCORE * 2.5, if the pattern is a dependency pattern
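
A small sketch of how the combined score and the pattern-type weights fit together. It assumes that "binary" and "triple" refer to bigram and trigram patterns, that unigrams keep a weight of 1, and that a score of exactly zero is reported as "neutral"; none of these three points is stated on the slide, and the names are hypothetical.

```python
# Assumption: "binary" = bigram, "triple" = trigram, unigrams are unweighted.
PATTERN_WEIGHTS = {"unigram": 1.0, "bigram": 2.0, "trigram": 3.0, "dependency": 2.5}


def iscore(pos, neg, pattern_type):
    """iSCORE = (iSCORE_pos - iSCORE_neg) scaled by the pattern-type weight."""
    return (pos - neg) * PATTERN_WEIGHTS[pattern_type]


def polarity(score):
    """Positive score -> positive, negative score -> negative."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # assumption: the slide does not cover a score of zero


print(iscore(5, 2, "bigram"), polarity(iscore(5, 2, "bigram")))
print(iscore(1, 4, "dependency"), polarity(iscore(1, 4, "dependency")))
```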

  29. Algorithm – iSCORE extended by WordNet
      • The synonyms have the same polarity as the word, while the antonyms have a reversed polarity.

  30. Algorithm – iSCORE extended by WordNet
      • For instance, if Polarity(fast JJ) = positive, for words with the WordNet depth = 1
        – [iSCORE(swift), iSCORE(prompt), …] += 0.3
        – [iSCORE(slow)] += -0.3
      • For the words with a WordNet depth = 2
        – [iSCORE(swift), iSCORE(prompt), …] += 0.3 * 2 = 0.6
        – [iSCORE(slow)] += -0.3 * 2 = -0.6
        – [iSCORE(sluggish)] += -0.3
      • In general
        – iSCORE(synonyms at the x-th depth) += 0.3 * ((max. depth + 1) - x)
        – iSCORE(antonyms at the x-th depth) += -0.3 * ((max. depth + 1) - x)
      • # 0.3 is an arbitrarily chosen value
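
The depth-weighted update can be sketched as below. The sketch assumes that "WordNet depth" in the examples is the maximum search depth, which is the reading under which the general formula reproduces the 0.3, 0.6 and -0.3 values on the slide. The related words are supplied as a hand-written list rather than looked up in WordNet, and the function name and `seed_sign` parameter are hypothetical.

```python
STEP = 0.3  # arbitrarily chosen value, as noted on the slide


def propagate(scores, related, max_depth, seed_sign=1, step=STEP):
    """Add step * ((max_depth + 1) - x) to synonyms at depth x, and the
    negated amount to antonyms; seed_sign is +1 for a positive seed word
    and -1 for a negative one."""
    for word, relation, depth in related:
        amount = step * ((max_depth + 1) - depth)
        sign = seed_sign if relation == "synonym" else -seed_sign
        scores[word] = scores.get(word, 0.0) + sign * amount
    return scores


# Hand-written neighbours of "fast" (positive), with their assumed depths.
related_to_fast = [
    ("swift", "synonym", 1),
    ("prompt", "synonym", 1),
    ("slow", "antonym", 1),
    ("sluggish", "antonym", 2),
]

print(propagate({}, related_to_fast, max_depth=2))
```

Words closer to the seed receive a larger adjustment, so a direct synonym moves further than a word reached in two steps.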
