

Slide 1

Author Profiling using Complementary Second Order Attributes and Stylometric Features

Konstantinos Bougiatiotis*, Anastasia Krithara

Institute of Informatics and Telecommunications, N.C.S.R. “Demokritos”, Greece

September 3, 2016

Slide 2

Outline

  1. Introduction: Overview
  2. Proposed Method: General Workflow, Preprocessing, Feature Extraction, Classification
  3. Experimental Results: PAN’16 Data, Results on Train Data, Results on Test Data
  4. Conclusions and Future Work



Slide 7

Introduction

Author Profiling: find specific characteristics of authors by studying their texts
  • Age, gender, personality traits, emotions
  • Applications: Marketing, Security, Forensics, ...
  • PAN’16 languages: English, Spanish, and Dutch (gender only)
  • Focus on cross-genre evaluation


Slide 9

General Workflow

Raw tweets → preprocessing → feature extraction → classification:
  • Aggregate the tweets of each user (raw tweets)
  • Preprocessing: clean HTML, detwittify, remove numbers, remove punctuation (clean tweets)
  • Feature extraction: Second Order Attributes (document-profile features) and the stylometry model used in PAN’15
  • Feature concatenation of the extracted features, fed to a Support Vector Machine
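The workflow above can be sketched as follows. Note that both extractor bodies are trivial placeholders of my own, not the authors’ SOA or stylometry code; only the aggregate-then-concatenate structure follows the slide.

```python
# Hedged sketch of the general workflow: aggregate a user's tweets into one
# document, extract the two feature sets, and concatenate them for the SVM.
# Both extractors below are placeholder stubs, NOT the real SOA/stylometry models.

def extract_soa_features(document):
    # placeholder for the Second Order Attributes document-profile vector
    return [float(len(document))]

def extract_stylometry_features(document):
    # placeholder for the PAN'15 stylometric model
    return [float(document.count(" ") + 1)]

def build_feature_vector(user_tweets):
    document = " ".join(user_tweets)  # aggregate the tweets of one user
    # feature concatenation; the result is what the Support Vector Machine sees
    return extract_soa_features(document) + extract_stylometry_features(document)

vector = build_feature_vector(["hello world", "more tweets"])
```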



Slide 12

Tweets

Concatenate the tweets of each user (profile-based approach). Raw tweets contain noisy data: HTML tags, links, etc.

Sample tweet (raw):
Thanks for the follow back <a href="/WolfgangDigital" class="twitter-atreply pretty-link js-nav" data-mentioned-user-id="391869708"><s>@</s><b>WolfgangDigital</b></a> I&#39;ll be keeping an eye out for any vacancies you advertise in the near future.

Slide 13

Tweets

Cleaning HTML:

Sample tweet: Thanks for the follow back @WolfgangDigital I&#39;ll be keeping an eye out for any vacancies you advertise in the near future.

Slide 14

Tweets

Detwittify (remove hashtags, replies, etc.):

Sample tweet: Thanks for the follow back I&#39;ll be keeping an eye out for any vacancies you advertise in the near future.

Slide 15

Tweets

Remove all non-letter characters (numbers, punctuation, ...):

Sample tweet: Thanks for the follow back I ll be keeping an eye out for any vacancies you advertise in the near future
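The cleaning chain shown on the preceding slides can be sketched with a few regular expressions. The exact patterns are my assumptions, not the authors’ implementation, but they reproduce the sample transformation end to end:

```python
# Minimal preprocessing sketch (assumed regexes, not the authors' exact rules):
# strip HTML, decode entities, detwittify, then keep letters only.
import html
import re

def preprocess(raw_tweet):
    text = re.sub(r"<[^>]+>", "", raw_tweet)   # strip HTML tags
    text = html.unescape(text)                 # decode entities such as &#39;
    text = re.sub(r"https?://\S+", " ", text)  # drop links
    text = re.sub(r"[@#]\w+", " ", text)       # drop replies and hashtags
    text = re.sub(r"[^A-Za-z\s]", " ", text)   # remove numbers and punctuation
    return re.sub(r"\s+", " ", text).strip()   # collapse whitespace

sample = ('Thanks for the follow back <a href="/WolfgangDigital" '
          'class="twitter-atreply"><s>@</s><b>WolfgangDigital</b></a> '
          "I&#39;ll be keeping an eye out for any vacancies you advertise "
          "in the near future.")
clean = preprocess(sample)
```

Running it on the sample tweet yields the cleaned text shown on the last slide.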


Slide 17

Stylometric and Structural Features (PAN’15)

Experimented with many features:
  • Structural: number of hashtags, number of links, number of mentions
  • Stylometry: tf-idf of n-grams, bag of smileys, n-gram graphs, word length, number of uppercase characters

Finally settled on term frequencies: 3-grams (age) and unigrams (gender)
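The retained representation, plain term frequencies over word n-grams, can be sketched as below. Whitespace tokenization and word (rather than character) n-grams are my assumptions:

```python
# Hedged sketch of the final features: relative term frequencies of word n-grams
# (n=3 for the age subtask, n=1 for gender). Assumes whitespace tokenization.
from collections import Counter

def term_frequencies(text, n):
    tokens = text.lower().split()
    ngrams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(ngrams)
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}

unigram_tf = term_frequencies("the cat sat on the mat", 1)   # gender features
trigram_tf = term_frequencies("the cat sat on the mat", 3)   # age features
```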


Slide 20

Second Order Attributes (SOA)

Idea originally from the PAN’13 winning team (INAOE, Mexico) [1]. A 2-step method, similar in approach to Naive Bayes.

Intuition:
  1. Associate the different terms in our collection with the target profiles (age or gender classes) → calculate word-class vectors based on word frequency
  2. Project the documents into the profile space according to the weighted aggregation of their terms → calculate document-class vectors

[1] López-Monroy et al.: INAOE’s participation at PAN’13: Author Profiling task. Notebook for PAN at CLEF 2013. In: CLEF 2013 Evaluation Labs and Workshop

Slide 21

Example of Age Specific Terms

Slide 22

Example of Gender Specific Terms

Slide 23

Example illustration of generated SOA

Slide 24

Weighted Complementary SOA (W-SOAC)

Novelties introduced:
  • Use the documents of the complementary classes for each word-class relation

Intuition: counter the skewed class distribution of the data → using complementary classes for each term-profile relation gives a more even amount of data for each class → robust estimates and less bias

Slide 25

Weighted Complementary SOA (W-SOAC)

Novelties introduced:
  • Use the documents of the complementary classes for each word-class relation
  • Add a weighting term to boost the influence of terms in documents of rare profiles

Intuition: exploit knowledge of the prior distribution of documents into classes → the rarer a profile, the higher the influence of the terms included in it → make the weighting term inversely proportional to the probability of the profile → cope with the sparsity of specific profiles



Slide 29

Classification

Experimented with many different classifiers (sklearn implementations): Naive Bayes, Decision Trees, Random Forests, SVM

  • Age: RBF kernel
  • Gender: linear kernel
  • Hyper-parameters selected through grid search
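The selection step can be sketched with scikit-learn’s grid search; the parameter grid and the synthetic data below are illustrative assumptions, not the values tuned in the paper:

```python
# Sketch of SVM hyper-parameter selection via grid search (sklearn).
# The grid and the stand-in data are assumptions for illustration only.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# stand-in data; in the paper X would be the concatenated feature vectors
X, y = make_classification(n_samples=120, n_features=10, random_state=0)

# age used an RBF kernel and gender a linear one; grid search picks kernel and C
grid = GridSearchCV(SVC(),
                    param_grid={"kernel": ["linear", "rbf"], "C": [0.1, 1, 10]},
                    cv=4)
grid.fit(X, y)
```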


Slide 31

Dataset

Much more data than PAN’15:
  • 1070 users: 436 English | 250 Spanish | 384 Dutch
  • 562812 texts: 277792 English | 208620 Spanish | 76800 Dutch
  • Age: imbalanced dataset over the age classes
  • Gender: uniform distribution of male/female samples

Slide 32

English Dataset Age Distribution


Slide 34

% Accuracy of 4-fold CV tests

Model              | English     | Spanish     | Dutch
                   | Age  Gender | Age  Gender | Gender
N-grams (PAN'15)   | 47.0  74.8  | 49.6  68.8  | 76.8
SOA                | 47.5  76.2  | 54.0  72.8  | 76.0
SOAC               | 49.1  76.8  | 50.4  71.6  | 76.8
W-SOAC             | 49.1  76.8  | 50.4  72.8  | 76.8
N-grams + W-SOAC   | 50.0  77.5  | 52.0  73.2  | 78.1




Slide 39

Average Joint Accuracy

Team                    | Global | English | Spanish | Dutch
Busger et al.           | 0.5258 | 0.3846  | 0.4286  | 0.4960
Modaresi et al.         | 0.5247 | 0.3846  | 0.4286  | 0.5040
...                     | ...    | ...     | ...     | ...
Bougiatiotis & Krithara | 0.4519 | 0.3974  | 0.2500  | 0.4160
...                     | ...    | ...     | ...     | ...
Deneva                  | 0.4014 | 0.2051  | 0.2679  | 0.6180

Average accuracy: 45.19%
Position: 6th (22 teams overall); 1st position on the global ranking for the English language


Slide 41

Conclusions

  • Descriptive and stylometric features model age and especially gender well enough
  • Fusion schemes seem to boost performance
  • The age subtask is considerably more difficult across all models and languages
  • Differences in performance between the test datasets highlight the added difficulty of the cross-genre task

Slide 42

Ongoing and Future Work

  • Model age and gender in a unified profile space → tackle the assumption of independence between the tasks
  • Examine more sophisticated fusion schemes and deploy ensemble learning techniques to exploit the differences in the representation spaces of each method
  • Emphasis on cross-genre specialization: important features per genre, varying document length, per-language models, ...

Slide 43

Thank you!

Slide 44

Appendix: Backup Slides


Slide 46

PAN’16 Author Profiling Challenge

Tasks: predict age and gender
Languages: English, Spanish, and Dutch (gender only)
Novelties:
  • Focus on cross-genre evaluation
  • Bigger dataset (users: 1070, tweets: 562812)
  • Added ’65-xx’ age class

Slide 47

SOA Calculations

  1. Calculate the word-profile vectors → find descriptive terms per class, exploiting the per-class frequency of the words:

     t_{i,j} = \sum_{k : d_k \in P_j} \log\left(1 + \frac{tf_{i,k}}{len(d_k)}\right)

  2. Map the documents into the profile space, using the word-profile vectors from step 1 of the terms contained in each document:

     d_{k,j} = \sum_{i : t_i \in d_k} \frac{tf_{i,k}}{len(d_k)} \, t_{i,j}
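The two steps can be written out directly; the toy documents and labels below are invented for illustration, but the arithmetic follows the formulas above:

```python
# Toy sketch of the two SOA steps: word-profile vectors from class-wise term
# frequencies, then documents projected into the profile space.
import math

docs = [["young", "lol", "lol"], ["work", "meeting"]]  # tokenized documents
labels = [0, 1]                                        # profile of each document
classes = [0, 1]
vocab = sorted({w for d in docs for w in d})

# Step 1: t[i][j] = sum over docs of class j of log(1 + tf/len)
t = {w: [0.0 for _ in classes] for w in vocab}
for d, y in zip(docs, labels):
    for w in set(d):
        t[w][y] += math.log(1 + d.count(w) / len(d))

# Step 2: d_k,j = sum over terms of the document of (tf/len) * t[i][j]
def project(doc):
    vec = [0.0 for _ in classes]
    for w in set(doc):
        if w in t:
            weight = doc.count(w) / len(doc)
            for j in classes:
                vec[j] += weight * t[w][j]
    return vec

soa_vec = project(["lol", "young"])  # a document using class-0 vocabulary
```

Since the toy document only uses class-0 terms, its projection scores class 0 strictly higher than class 1.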

Slide 48

W-SOAC Calculations

  1. Use the complementary classes for the word-profile vectors:

     t_{i,j} = \sum_{k : d_k \notin P_j} \log\left(1 + \frac{tf_{i,k}}{len(d_k)}\right)

  2. Add a weight per class to the word-profile vectors:

     t_{i,j} = \sum_{k : d_k \notin P_j} \log\left(1 + \frac{tf_{i,k}}{len(d_k)} \cdot w_k\right)

  3. ”Normalize” the document-profile vectors by subtracting the minimum score (which corresponds to the most probable class):

     d_{k,j} = \left( \sum_{i : t_i \in d_k} \frac{tf_{i,k}}{len(d_k)} \, t_{i,j} \right) - \min_{j'} d_{k,j'}
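The W-SOAC modifications can be sketched the same way. The toy data and the exact form of the weight (inverse prior probability of the document’s profile) are my assumptions; the slide only states that the weight is inversely proportional to the profile’s probability:

```python
# Toy sketch of the W-SOAC changes: complementary classes, a per-class weight
# w_k (assumed here to be the inverse prior of the profile), and min-subtraction.
import math

docs = [["young", "lol"], ["young", "lol"], ["work", "meeting"]]
labels = [0, 0, 1]
classes = [0, 1]

# assumed weight form: inverse prior probability of each document's profile
prior = {j: labels.count(j) / len(labels) for j in classes}
w = [1.0 / prior[y] for y in labels]

# complementary word-profile vectors: sum over documents NOT in class j
t = {}
for d, y, wk in zip(docs, labels, w):
    for word in set(d):
        vec = t.setdefault(word, [0.0 for _ in classes])
        for j in classes:
            if y != j:  # complementary classes only
                vec[j] += math.log(1 + (d.count(word) / len(d)) * wk)

def project(doc):
    raw = [0.0 for _ in classes]
    for word in set(doc):
        if word in t:
            weight = doc.count(word) / len(doc)
            for j in classes:
                raw[j] += weight * t[word][j]
    m = min(raw)  # the minimum corresponds to the most probable class
    return [v - m for v in raw]

wsoac_vec = project(["young", "lol"])
```

With complementary counting, a class-0 document accumulates its score on class 1, so after min-subtraction the most probable class (0) sits at zero.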

Slide 49

Comparison of PAN’15-16 models

Slide 50

Voc, Dict Length? What about Tokenization?

Slide 51

% Accuracy of 4-fold CV tests

Model              | English     | Spanish     | Dutch
                   | Age  Gender | Age  Gender | Gender
N-grams (PAN'15)   | 47.0  74.8  | 49.6  68.8  | 76.8
LSI                | 41.8  70.2  | 50.4  65.2  | 74.0
SOA                | 47.5  76.2  | 54.0  72.8  | 76.0
SOAC               | 49.1  76.8  | 50.4  71.6  | 76.8
W-SOAC             | 49.1  76.8  | 50.4  72.8  | 76.8
N-grams + W-SOAC   | 50.0  77.5  | 52.0  73.2  | 78.1

Slide 52

Test Data % Accuracy

Dataset      | Language | Subtask | Accuracy
Social Media | Dutch    | Gender  | 44.00
             | English  | Age     | 30.46
             | English  | Gender  | 53.45
             | Spanish  | Age     | 34.38
             | Spanish  | Gender  | 57.81
Blogs        | Dutch    | Gender  | 41.60
             | English  | Age     | 55.13
             | English  | Gender  | 69.23
             | Spanish  | Age     | 32.14
             | Spanish  | Gender  | 67.86