Text Mining on Mailing Lists: Sentiment Analysis

Gordon Heiczman, B. Sc.
October 13, 2017
Chair of Network Architectures and Services
Department of Informatics
Technical University of Munich

Introduction What is Sentiment Analysis? G. Heiczman — Sentiment Analysis 2

Introduction
Problems of today:
• Too much information
• Too little time

Introduction
Agenda
• Text Mining summary
• Example of practical application
• Presentation of results
• Conclusion and Lessons Learned

Text Mining
Feature Selection
Main purpose: Extract valuable information, get rid of redundant features
’Bag of Words’ approach
Most common selection steps:
• Removal of stop words (the, is, at ...)
• Removal of plurals (dogs -> dog)
• Word / n-gram frequency
• Part of Speech (POS) tagging (adjectives)
• Opinion words (like, hate, love ...)
• Detection of negation (not good -> bad)
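The selection steps above can be sketched with the Python standard library alone. This is a minimal illustration, not the pipeline used in the talk: the stop word list, the negation list, the plural rule, and the `NOT_` prefix convention are all assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative lists (assumptions; real stop word lists are much longer).
STOP_WORDS = {"the", "is", "at", "a", "an", "this", "it", "and", "of", "to"}
NEGATIONS = {"not", "no", "never"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def strip_plural(token):
    """Very crude plural removal: drop a trailing 's' (dogs -> dog)."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def extract_features(text):
    """Return bag-of-words counts; a word following a negation becomes NOT_<word>."""
    features = []
    negate_next = False
    for token in tokenize(text):
        if token in NEGATIONS:
            negate_next = True
            continue
        if token in STOP_WORDS:
            continue
        token = strip_plural(token)
        features.append("NOT_" + token if negate_next else token)
        negate_next = False
    return Counter(features)


def bigrams(tokens):
    """Count adjacent token pairs (n-gram frequency with n = 2)."""
    return Counter(zip(tokens, tokens[1:]))


print(extract_features("The dogs are not good"))
print(bigrams(tokenize("really really good")))
```

A real system would replace `strip_plural` with a proper stemmer or lemmatizer and would use a POS tagger to pick out adjectives; both are left out here to keep the sketch dependency-free.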

Text Mining
Sentiment Classification
Three main categories:
• Machine Learning
• Lexicon-based
• Hybrid
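Of the three categories, the lexicon-based one is the easiest to illustrate in a few lines. The sketch below is an assumption-laden toy: the `LEXICON` scores, the negation list, and the `threshold` are hypothetical values chosen for the example, not taken from any real sentiment lexicon.

```python
# Toy opinion lexicon (hypothetical scores, for illustration only).
LEXICON = {"good": 0.7, "love": 0.5, "like": 0.3, "bad": -0.7, "hate": -0.8}
NEGATIONS = {"not", "never", "no"}


def lexicon_polarity(text):
    """Average the scores of known opinion words; flip the sign after a negation."""
    scores = []
    negate = False
    for word in text.lower().split():
        word = word.strip(".,!?;:\"'")
        if word in NEGATIONS:
            negate = True
            continue
        if word in LEXICON:
            score = LEXICON[word]
            scores.append(-score if negate else score)
            # The negation is consumed by the first opinion word that follows it.
            negate = False
    return sum(scores) / len(scores) if scores else 0.0


def classify(text, threshold=0.1):
    """Map the averaged polarity to a coarse label."""
    polarity = lexicon_polarity(text)
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


print(classify("I love this draft"))
print(classify("This is not good."))
```

A machine learning approach would instead learn the word weights from labeled examples, and a hybrid approach would combine both sources of evidence.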

Text Mining
Pitfalls
• Named Entity Recognition, i.e. "What is the topic?"
• Anaphora Resolution: reference word resolution. "What is ’it’ referring to?"
• Sarcasm
• Abbreviations, poor grammar / punctuation / spelling

Practical Application
• Dataset
• Language
• Email retrieval
• Content retrieval
• Sentiment value retrieval

Practical Application
Dataset
Collection of emails from the IETF. The task of the IETF is to set standards.

Practical Application
Language
C# or Python?
Not enough comprehensive, completely free tools
Notable C# tools:
• VaderSharp (free but primitive)
• Aylien (paid)
• Watson D.C. (paid)
• Vivekn (free but no documentation)
Python tool: TextBlob

Practical Application
Multiple values obtained through sentiment analysis (SA):
• Polarity (-1.0 to 1.0)
• Subjectivity (0.0 to 1.0)
• Most used word
• Sentence count
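Polarity and subjectivity come from the sentiment library, but the last two values in the list can be approximated without one. The following is a rough standard-library sketch; the regular expressions are simplifying assumptions, and a library tokenizer would likely split words and sentences differently.

```python
import re
from collections import Counter


def text_stats(text):
    """Return the most used word and a rough sentence count for a text."""
    words = re.findall(r"[a-z]+", text.lower())
    # Treat any run of '.', '!' or '?' as a sentence boundary (a simplification).
    sentences = [part for part in re.split(r"[.!?]+", text) if part.strip()]
    most_used = Counter(words).most_common(1)[0][0] if words else None
    return {"most_used_word": most_used, "sentence_count": len(sentences)}


print(text_stats("I think this is really, really good! Thanks."))
```

In practice stop words would be removed before counting, otherwise the most used word of almost every email is an article or preposition.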

Practical Application
TextBlob example

    from textblob import TextBlob

    blob = TextBlob("I think this presentation is really, really good!")
    print(blob.sentiment)              # Gives both polarity and subjectivity around 1.0
    print(blob.words.count('really'))  # Gives 2
    print(blob.noun_phrases)           # Gives noun phrases, in this case presentation

Practical Application
Figure 1: Example of email with polarity 1.0
• Filename: /home/.../geopriv/2007-12.mail
• Key: 251
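The filename and key above suggest one archive file per group and month, with messages addressed by an integer key. The slides do not state the storage format, so the sketch below assumes the archives are mbox files readable by Python's standard `mailbox` module; `plain_text_body` and `iter_bodies` are hypothetical helper names introduced for this example.

```python
import mailbox


def plain_text_body(message):
    """Return the first text/plain part of an email message as a string."""
    for part in message.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            return payload.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset name in the header: fall back to UTF-8.
            return payload.decode("utf-8", errors="replace")
    return ""


def iter_bodies(mbox_path):
    """Yield (key, subject, body) for every message in an mbox file."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for key, message in box.iteritems():
            yield key, message.get("Subject", ""), plain_text_body(message)
    finally:
        box.close()
```

Each body returned this way could then be handed to the sentiment library, with the key kept alongside the scores so that outliers can be traced back to the original email.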

Practical Application
Program flow (two slides; their content is not preserved in this transcript)

Statistics
Figure 2: Top 10 groups who use the most sentences (bar chart, y-axis from 0 to 80; group labels not recoverable)
Even distribution
Indication of in-depth discussion or off-topic rambling?

Statistics
Figure 3: Top 10 most positive groups (bar chart, y-axis from 0.00 to 0.60; group labels not recoverable)
Bar values: 0.5, 0.375854, 0.307836, 0.303693, 0.29163, 0.276323, 0.253274, 0.251799, 0.25, 0.25
Logarithmic distribution
Notable group: "iaoc-scribes"

Statistics
Figure 4: Top 10 most negative groups (bar chart, y-axis from 0.00 to -0.50; group labels not recoverable)
Stronger logarithmic distribution
Notable group: "ietf-sailors"

Statistics
Figure 5: Top 10 most subjective groups (bar chart, y-axis from 0.0 to 1.2; group labels not recoverable)
Surprising top scores
Discussion groups
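Rankings like those in Figures 2 to 5 can be produced by grouping the per-email values and sorting the groups. The slides do not say how values were aggregated, so this sketch assumes a simple mean per group; the record layout and the group names in the usage example are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def top_groups(records, field, top=10, descending=True):
    """Rank groups by the mean of `field` over all of their emails.

    `records` is an iterable of dicts with a "group" key and numeric fields
    such as "polarity", "subjectivity" or "sentences" (an assumed layout).
    """
    by_group = defaultdict(list)
    for record in records:
        by_group[record["group"]].append(record[field])
    averages = {group: mean(values) for group, values in by_group.items()}
    ranked = sorted(averages.items(), key=lambda item: item[1], reverse=descending)
    return ranked[:top]


# Hypothetical sample data, not taken from the IETF dataset.
sample = [
    {"group": "group-a", "polarity": 0.5},
    {"group": "group-a", "polarity": 0.25},
    {"group": "group-b", "polarity": -1.0},
    {"group": "group-c", "polarity": 0.0},
]
print(top_groups(sample, "polarity", top=2))
print(top_groups(sample, "polarity", top=1, descending=False))
```

A mean over very few emails is dominated by single messages, so a minimum email count per group may be worth adding before ranking.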

Statistics
Of the 7 most negative entries (polarity -1.0), 6 belong to the group ’eos’.
All of them are in Spanish (?)
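One way to keep non-English emails out of an English-only sentiment pipeline is to check how many common English function words a text contains. This is a crude heuristic sketched under assumptions: the word list and the `threshold` of 0.15 are illustrative choices, and a dedicated language detection library would be more reliable.

```python
import re

# Small sample of frequent English function words (an illustrative assumption).
ENGLISH_STOP_WORDS = {
    "the", "is", "at", "and", "of", "to", "in", "that", "it", "for", "this", "with",
}


def english_ratio(text):
    """Return the share of words that are common English function words."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not words:
        return 0.0
    return sum(word in ENGLISH_STOP_WORDS for word in words) / len(words)


def looks_english(text, threshold=0.15):
    """Heuristically decide whether a text is written in English."""
    return english_ratio(text) >= threshold


print(looks_english("The draft is ready and it is in the repository for review"))
print(looks_english("Hola a todos, este mensaje es una prueba de la lista"))
```

Emails that fail the check can be skipped or routed to a language-specific analyzer instead of being scored with an English lexicon.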

Conclusion
Useful, but not universally
Lessons learned:
• Filter the dataset intelligently
• Don’t try to solve everything with one library
