Chair of Network Architectures and Services Department of Informatics Technical University of Munich
An empirical approach towards analysis of discussions on mailing lists
Simon Klimek
March 21, 2018
Outline
- Motivation
- Related Work
- Approach
- Evaluation
- Future Work
- Bibliography
Background - IETF
Figure 1: IETF Logo
- Development of standards
- 121 active working groups
- RFCs (Request For Comments)
Motivation
- Discussions are held via mailing lists.
- Can we analyze them automatically?
- Can the gained data help us to better understand IETF processes?
Related Work
- Conversational Speech
  - Telephone Conversations (human to human)
  - Online Chats
  - Plan recognition in dialogues
- Formal Speech
  - Q&A Forum
Conversational Speech
- Dialogue act labeling [6] on the Switchboard corpus [3]
- Online chats between multiple participants [7]
- Plan recognition in dialogues [1]
Formal Speech
- Question-answer forums [2]
Previous Work
Nikolai Schwellnus’ bachelor thesis "A Heat Map for IETF Standardization Activities" [5]
Tables and columns:
- sentimentvalues: file (text), key (integer), polarity (real), subjectivity (real), mostusedword (text), sentencecount (integer)
- list: name (text), announce (boolean), id (integer)
- mails: messageid (text), file (text), key (integer), date (timestamp with time zone), date_local (timestamp), sender_addr (text), receiver (text), subject (text), inreply (text), spam (boolean), spamscore (numeric), sender_name (text), person (bigint), fast_person (bigint)
- mail_threads: leaf (varchar(788)), depth (integer)
- mail_on_list: messageid (text), list (text)
Relations: leaf:messageid, messageid:messageid, list:name
Figure 2: Database Schemata
Finding Discussions
Finding discussion threads?
- [Doh] operational considerations (Eliot Lear)
- Re: [Doh] operational considerations (Martin J. Dürst)
- Re: [Doh] operational considerations (Jim Reid)
- Re: [Doh] operational considerations (Eliot Lear)
- Re: [Doh] operational considerations (Patrick McManus)
- Re: [Doh] operational considerations (Jim Reid)
- Re: [Doh] operational considerations (Eliot Lear)
- Re: [Doh] operational considerations (Patrick McManus)
- Re: [Doh] operational considerations (Hewitt, Rory)
- Re: [Doh] operational considerations (Eliot Lear)
- Re: [Doh] operational considerations (Patrick McManus)
Figure 3: Thread Structure
[Two plots: occurrences (logarithmic axis, up to 10^6) over the number of mails in one thread, and occurrences over the number of replies]
- In-Reply-To
- Thread-View (MHonArc, https://www.mhonarc.org)
    WITH RECURSIVE replies(messageid, spam, sender_addr, receiver, depth, inreply) AS (
        -- Anchor: non-spam mails that start a thread (no In-Reply-To)
        SELECT messageid, spam, sender_addr, receiver, 1 AS depth, inreply
        FROM mails
        WHERE spam IS FALSE AND inreply IS NULL
      UNION ALL
        -- Recursive step: mails that reply to a mail already in the result
        SELECT m.messageid, m.spam, m.sender_addr, m.receiver,
               tm.depth + 1 AS depth, tm.inreply
        FROM replies tm, mails m
        WHERE m.inreply = tm.messageid
    )
    -- The slide ends after the CTE; this final SELECT is an assumption
    -- added so that the statement is complete.
    SELECT * FROM replies;
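The recursive query above can be tried on toy data. The following is a minimal sketch, assuming SQLite as a stand-in for the actual database and a reduced `mails` table with only `messageid`, `inreply` and `spam`; the function name `thread_depths` and the extra `root` column are illustrative additions, not part of the slides.

```python
import sqlite3


def thread_depths(rows):
    """Return (messageid, root, depth) for every mail reachable from a
    non-spam thread starter. rows: (messageid, inreply, spam) tuples."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mails (messageid TEXT, inreply TEXT, spam INTEGER)")
    con.executemany("INSERT INTO mails VALUES (?, ?, ?)", rows)
    query = """
        WITH RECURSIVE replies(messageid, root, depth) AS (
            -- thread starters: no In-Reply-To and not spam
            SELECT messageid, messageid, 1
            FROM mails
            WHERE spam = 0 AND inreply IS NULL
            UNION ALL
            -- replies to a mail that is already part of a thread
            SELECT m.messageid, r.root, r.depth + 1
            FROM replies r JOIN mails m ON m.inreply = r.messageid
        )
        SELECT messageid, root, depth
        FROM replies
        ORDER BY root, depth, messageid
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


example = [
    ("a", None, 0),   # thread starter
    ("b", "a", 0),    # reply to a
    ("c", "b", 0),    # reply to b
    ("d", "a", 0),    # second reply to a
    ("x", None, 1),   # spam starter, excluded by the anchor query
]

for row in thread_depths(example):
    print(row)
```

Carrying the root message id through the recursion makes it easy to group mails by thread afterwards, which the depth alone does not allow.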
Processing Mails
Extract text from mails.
1. Multipurpose Internet Mail Extensions (MIME) [4]
   - text
     - text/plain
     - text/html
   - multipart
     - mixed
     - alternative
2. Remove HTML tags, decode
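The two steps above can be sketched with Python's standard library. This is only an illustration under stated assumptions: the function names `extract_text` and `strip_html` are hypothetical, plain-text parts are preferred over HTML parts, and attachments or script/style content are not treated specially.

```python
from email import message_from_string, policy
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects the text between HTML tags; character references are decoded."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_html(html):
    """Remove tags and collapse whitespace."""
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def extract_text(raw_mail):
    """Return the text of a mail: text/plain parts if any, else stripped HTML."""
    msg = message_from_string(raw_mail, policy=policy.default)
    plain, html = [], []
    for part in msg.walk():          # visits multipart containers and leaf parts
        if part.is_multipart():
            continue
        ctype = part.get_content_type()
        if ctype == "text/plain":
            plain.append(part.get_content())   # decoded to str
        elif ctype == "text/html":
            html.append(part.get_content())
    if plain:
        return "\n".join(p.strip() for p in plain)
    return "\n".join(strip_html(h) for h in html)


raw = (
    "From: a@example.org\n"
    "Subject: test\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="XX"\n'
    "\n"
    "--XX\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "\n"
    "Hello plain\n"
    "--XX\n"
    "Content-Type: text/html; charset=utf-8\n"
    "\n"
    "<p>Hello <b>html</b></p>\n"
    "--XX--\n"
)
print(extract_text(raw))
```

For `multipart/alternative` mails both parts carry the same content, so taking the plain-text part avoids the HTML stripping step entirely.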
Quotation and Referencing
Processing of Mail-Blocks
- Tokenization
- Lexical Analysis
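A possible shape for mail-block chunking and tokenization is sketched below. The slides do not define a mail-block; this sketch assumes it is a run of consecutive lines with the same quotation depth (number of leading `>` markers), and uses a simple regular-expression tokenizer. All names are hypothetical.

```python
import re
from itertools import groupby

_QUOTE = re.compile(r"^\s*((?:>\s*)+)")            # leading '>' quote markers
_TOKEN = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")    # words or single punctuation marks


def quote_depth(line):
    """Number of '>' markers at the start of a line (0 for new text)."""
    match = _QUOTE.match(line)
    return match.group(1).count(">") if match else 0


def strip_quote(line):
    """Remove the leading quote markers from a line."""
    return _QUOTE.sub("", line, count=1).strip()


def chunk_blocks(body):
    """Split a mail body into (quote_depth, text) blocks of adjacent lines."""
    lines = [line for line in body.splitlines() if line.strip()]
    blocks = []
    for depth, group in groupby(lines, key=quote_depth):
        text = " ".join(strip_quote(line) for line in group).strip()
        if text:
            blocks.append((depth, text))
    return blocks


def tokenize(text):
    """Lower-case the text and split it into word and punctuation tokens."""
    return _TOKEN.findall(text.lower())


body = (
    "> Should we use TLS?\n"
    "> I think so.\n"
    "Yes, that is fine.\n"
    ">> older quote\n"
)
for depth, text in chunk_blocks(body):
    print(depth, tokenize(text))
```

Blocks with depth 0 are the author's own text; blocks with a higher depth are quoted material that can be matched against earlier mails in the thread.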
Further Analysis
- sentence based
  - dialogue acts
- mail-block based
  - subjectivity
  - polarity
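Polarity and subjectivity per mail-block can be illustrated with a lexicon lookup. The slides do not say which method or library was used, so the following is a sketch under assumptions: a tiny hypothetical lexicon maps words to a (polarity, subjectivity) pair, and a block's scores are the averages over the words found in the lexicon.

```python
import re

# Hypothetical toy lexicon: word -> (polarity in [-1, 1], subjectivity in [0, 1]).
LEXICON = {
    "good": (0.5, 0.5),
    "great": (0.75, 0.75),
    "bad": (-0.5, 0.5),
    "broken": (-0.25, 0.25),
}


def score_block(text, lexicon=LEXICON):
    """Return (polarity, subjectivity) averaged over lexicon words in text.

    Blocks without any lexicon word are scored as neutral and objective.
    """
    words = re.findall(r"[a-z']+", text.lower())
    hits = [lexicon[word] for word in words if word in lexicon]
    if not hits:
        return 0.0, 0.0
    polarity = sum(p for p, _ in hits) / len(hits)
    subjectivity = sum(s for _, s in hits) / len(hits)
    return polarity, subjectivity


print(score_block("This is a good draft, but section 3 is broken."))
print(score_block("See RFC 5322."))
```

A real lexicon would also need to handle negation and intensifiers, which this sketch ignores.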
Framework Overview
- Finding mail-threads
- Reading a single mail thread
- Pipeline for Preprocessing
  - Decoding
  - Mail-block chunking
  - Tokenization
  - Quotation/Referencing
  - Polarity/Subjectivity
- Analyzer
Results - Influential People
[Bar chart: # final says per sender address, axis from 1,000 to 6,500]
Senders on the chart axis, in extracted order:
- notifications@github.com
- bidulock@openss7.org
- julian.reschke@gmx.de
- brian.e.carpenter@gmail.com
- jari.arkko@piuha.net
- moore@cs.utk.edu
- touch@isi.edu
- christer.holmberg@ericsson.com
- stephen.farrell@cs.tcd.ie
- stpeter@stpeter.im
- pekkas@netcore.fi
- harald@alvestrand.no
- trac@tools.ietf.org
- dhc@dcrocker.net
- paul.hoffman@vpnc.org
- alexey.melnikov@isode.com
- kent@bbn.com
- j.schoenwaelder@jacobs-university.de
- martin.thomson@gmail.com
- mnot@mnot.net
- magnus.westerlund@ericsson.com
- henrik@levkowetz.com
- fluffy@cisco.com
- fred@cisco.com
- bclaise@cisco.com
- alexandru.petrescu@gmail.com
- ted.lemon@nominum.com
- john-ietf@jck.com
- nico@cryptonector.com
- dotis@mail-abuse.org
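One way to compute such a count is sketched below, assuming that a "final say" means being the sender of the last mail in a thread. The slides do not define the metric, so this reading, the input layout and the function name are all assumptions.

```python
from collections import Counter


def final_says(threads):
    """Count, per sender, the threads in which that sender wrote the last mail.

    threads: iterable of threads, each a list of (date, sender_addr) tuples.
    Dates only need to be comparable (numbers, ISO strings, datetimes).
    """
    counts = Counter()
    for mails in threads:
        if not mails:
            continue
        _, last_sender = max(mails, key=lambda mail: mail[0])
        counts[last_sender] += 1
    return counts


threads = [
    [(1, "a@example.org"), (2, "b@example.org")],
    [(1, "b@example.org"), (3, "a@example.org"), (2, "c@example.org")],
    [(5, "b@example.org")],
]
print(final_says(threads).most_common())
```

Automated senders that only ever post single-mail threads would rank highly under this definition, so filtering out threads of length one may be worth considering.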
Results - Influential People
Who starts the longest discussions?
[Bar chart: average thread length per thread starter, axis from 16 to 32]
Senders on the chart axis, in extracted order:
- xiangli@seguesoft.com
- iane@sussex.ac.uk
- jerome.grenier@bell.ca
- yuri.ismailov@ericsson.com
- npowell@harris.com
- thomas@koch.ro
- stefan.alfare@swisscom.com
- tammy_leino@mentor.com
- segred@ics.forth.gr
- sroberts@uniserve.com
- yhirano@google.com
- jordan.melzer@telus.com
- cnd@geek.net.au
- dave.d.smith@alcatel-lucent.com
- gclark@mti-systems.com
- phessler@theapt.org
- shinji.okumura@softfront.jp
- bart.bogaert@nokia.com
- mikebianc@aol.com
- jrn@jrn.me.uk
- kawashimam@vx.jp.nec.com
- ankriste@cisco.com
- owen@delong.com
- thomas.haynes@sun.com
- nataraju.sip@gmail.com
- andrewmcgr@gmail.com
- kazu@iij.ad.jp
- rsk@gsp.org
- michael@wyraz.de
- victor@jvknet.com
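The average thread length per starter can be computed as follows. This is a minimal sketch, assuming each thread is summarized as the address of the sender who started it and the number of mails it contains; the function names and the `k` parameter are hypothetical.

```python
from collections import defaultdict


def average_thread_length(threads):
    """Map each thread starter to the mean length of the threads they started.

    threads: iterable of (starter_addr, number_of_mails) tuples.
    """
    totals = defaultdict(lambda: [0, 0])   # starter -> [sum of lengths, thread count]
    for starter, length in threads:
        totals[starter][0] += length
        totals[starter][1] += 1
    return {starter: total / count for starter, (total, count) in totals.items()}


def top_starters(threads, k=3):
    """Return the k starters with the highest average thread length."""
    averages = average_thread_length(threads)
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)[:k]


threads = [
    ("a@example.org", 10),
    ("a@example.org", 20),
    ("b@example.org", 4),
    ("c@example.org", 30),
]
print(top_starters(threads, k=2))
```

Starters with a single long thread dominate such a ranking, so a minimum number of started threads may be a useful additional filter.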
Results - Unanswered Questions
Questions remain unanswered?
[Bar chart: # unanswered questions at the end of a thread per sender address, axis from 500 to 1,900]
Senders on the chart axis, in extracted order:
- bidulock@openss7.org
- alexandru.petrescu@gmail.com
- bclaise@cisco.com
- tolga.asveren@ss8.com
- christer.holmberg@ericsson.com
- brian.e.carpenter@gmail.com
- pkyzivat@cisco.com
- julian.reschke@gmx.de
- notifications@github.com
- harald@alvestrand.no
- andy@yumaworks.com
- pekkas@netcore.fi
- pthubert@cisco.com
- kent@bbn.com
- shares@ndzh.com
- pkyzivat@alum.mit.edu
- stephen.farrell@cs.tcd.ie
- phil.hunt@oracle.com
- fred.l.templin@boeing.com
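A simple heuristic for this metric is sketched below. It assumes that an unanswered question is a question mark at the end of a word in the non-quoted text of the last mail of a thread; the slides do not describe the actual detection method, and all names are hypothetical.

```python
import re
from collections import Counter

_QUESTION = re.compile(r"\?(\s|$)")   # '?' followed by whitespace or end of line


def has_question(body):
    """True if a non-quoted line contains a question mark ending a sentence."""
    for line in body.splitlines():
        if line.lstrip().startswith(">"):
            continue                   # skip quoted text from earlier mails
        if _QUESTION.search(line):
            return True
    return False


def unanswered_by_sender(threads):
    """Count threads whose last mail contains a question, per sender.

    threads: iterable of threads, each a list of (sender_addr, body) tuples
    ordered from oldest to newest.
    """
    counts = Counter()
    for mails in threads:
        if mails and has_question(mails[-1][1]):
            counts[mails[-1][0]] += 1
    return counts


threads = [
    [("a@example.org", "Proposal attached."),
     ("b@example.org", "> Proposal attached.\nWhy not reuse RFC 5322?")],
    [("a@example.org", "Any objections?"),
     ("c@example.org", "> Any objections?\nNone from me.")],
    [("b@example.org", "See https://example.org/?q=1 for details.")],
]
print(unanswered_by_sender(threads))
```

Requiring whitespace or a line end after the question mark keeps query strings in URLs from being counted as questions.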
Future Work
- Dialogue Act classification
- Background information about members
- In-depth analysis of discussions
Thank you for your attention!
Feel free to ask questions.
Bibliography
[1] S. Carberry. Plan recognition in natural language dialogue. ACL-MIT Press series in natural language processing. MIT Press, Cambridge, Mass., 1990.
[2] G. Cong, L. Wang, C.-Y. Lin, Y.-I. Song, and Y. Sun. Finding question-answer pairs from online forums. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’08, pages 467–474, New York, NY, USA, 2008. ACM.
[3] J. J. Godfrey, E. C. Holliman, and J. McDaniel. Switchboard: Telephone speech corpus for research and development. In Acoustics, Speech, and Signal Processing, 1992. ICASSP-92., 1992 IEEE International Conference on, volume 1, pages 517–520. IEEE, 1992.
[4] P. W. Resnick. Internet message format. RFC 5322, RFC Editor, October 2008. http://www.rfc-editor.org/rfc/rfc5322.txt.
[5] N. Schwellnus. A heat map for IETF standardization activities, 2016.
[6] A. Stolcke, K. Ries, N. Coccaro, E. Shriberg, R. Bates, D. Jurafsky, P. Taylor, R. Martin, C. V. Ess-Dykema, and M. Meteer. Dialogue act modeling for automatic tagging and recognition of conversational speech. Computational Linguistics, 26(3):339–373, 2000.
[7] S. Trausan-Matu. Automatic support for the analysis of online collaborative learning chat conversations. In P. Tsang, S. K. S. Cheung, V. S. K. Lee, and R. Huang, editors, Hybrid Learning, pages 383–394, Berlin, Heidelberg, 2010. Springer Berlin Heidelberg.