Hate Speech Detection is Not as Easy as You May Think: A Closer Look at Model Validation (PowerPoint PPT Presentation)


SLIDE 1

Hate Speech Detection is Not as Easy as You May Think: A Closer Look at Model Validation

Aymé Arango, Jorge Pérez and Bárbara Poblete

SLIDE 2

UNDETECTED HATE SPEECH IN SOCIAL MEDIA

VS

ALMOST PERFECT STATE-OF-THE-ART RESULTS

SLIDE 3

UNDETECTED HATE SPEECH IN SOCIAL MEDIA

SLIDE 4

ALMOST PERFECT STATE-OF-THE-ART RESULTS

94% F1 [Agrawal and Awekar] ECIR 2018

93% F1 [Badjatiya et al.] WWW 2017

92% F1 [Zeerak Waseem] NAACL 2016
SLIDE 5

Hate Speech Detection is Not as Easy as You May Think

We show that state-of-the-art results are highly overestimated due to experimental issues in the models:

- Including the testing set during the training phase
- Oversampling the data before splitting
- User-biased datasets
SLIDE 6

State-of-the-art replication
User distribution
Generalization
SLIDE 7

State-of-the-art replication
User distribution
Generalization
SLIDE 8

ALMOST PERFECT STATE-OF-THE-ART RESULTS

94% F1 [Agrawal and Awekar] ECIR 2018

93% F1 [Badjatiya et al.] WWW 2017

92% F1 [Zeerak Waseem] NAACL 2016
SLIDE 9

DATASET 1

[Waseem and Hovy] NAACL 2016

Tweet | Label (Hate / Non-Hate)
SLIDE 10

Model 1 [Badjatiya et al.] 2017

PHASE 1: Feature Extraction → PHASE 2: Classification Method

DATASET 1 [Waseem and Hovy] NAACL 2016

93% F1
SLIDE 11

Model 1 [Badjatiya et al.] 2017

PHASE 1: Feature Extraction → PHASE 2: Classification Method

DATASET 1 [Waseem and Hovy] NAACL 2016

Embeddings → LSTM → Fully Connected → Softmax → Prediction
SLIDE 12

Model 1 [Badjatiya et al.] 2017

PHASE 1: Feature Extraction → PHASE 2: Classification Method

DATASET 1 [Waseem and Hovy] NAACL 2016

Embeddings → LSTM → Fully Connected → Softmax → Prediction
SLIDE 13

Model 1 [Badjatiya et al.] 2017

PHASE 1: Feature Extraction → PHASE 2: Classification Method

DATASET 1 [Waseem and Hovy] NAACL 2016

Splitting: TRAIN / TEST

Embeddings → LSTM → Fully Connected → Softmax → Prediction
SLIDE 14

Model 1 [Badjatiya et al.] 2017

PHASE 1: Feature Extraction → PHASE 2: Classification Method

DATASET 1 [Waseem and Hovy] NAACL 2016

Splitting: TRAIN / TEST

Embeddings → LSTM → Fully Connected → Softmax → Prediction

AVG(Embeddings) → GBDT → Prediction

93% F1
SLIDE 15

This looks great! But there is a problem.

SLIDE 16

Model 1 [Badjatiya et al.] 2017

PHASE 1: Feature Extraction → PHASE 2: Classification Method

DATASET 1 [Waseem and Hovy] NAACL 2016

Splitting: TRAIN (test data included) / TEST

Embeddings → LSTM → Fully Connected → Softmax → Prediction

AVG(Embeddings) → GBDT → Prediction
SLIDE 17

Let’s create the model only with the training set.

SLIDE 18

Model 1 [Badjatiya et al.] 2017

PHASE 1: Feature Extraction → PHASE 2: Classification Method

DATASET 1 [Waseem and Hovy] NAACL 2016
SLIDE 19

Model 1 [Badjatiya et al.] 2017

New PHASE 1: Feature Extraction → PHASE 2: Classification Method

Same Splitting: TRAIN / TEST
SLIDE 20

Model 1 [Badjatiya et al.] 2017

New PHASE 1: Feature Extraction → PHASE 2: Classification Method

Same Splitting: TRAIN / TEST

Embeddings → LSTM → Fully Connected → Softmax → Prediction
SLIDE 21

Model 1 [Badjatiya et al.] 2017

New PHASE 1: Feature Extraction (training set only) → PHASE 2: Classification Method

Same Splitting: TRAIN / TEST

Embeddings → LSTM → Fully Connected → Softmax → Prediction
SLIDE 22

Model 1 [Badjatiya et al.] 2017

New PHASE 1: Feature Extraction (training set only) → PHASE 2: Classification Method

Same Splitting: TRAIN / TEST

Embeddings → LSTM → Fully Connected → Softmax → Prediction

AVG(Embeddings) → GBDT → Prediction

73% F1 (corrected) vs 93% F1 (reported)
SLIDE 23

The result is overestimated due to the inclusion of the testing set during the training phase.
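This leak can be reproduced in a few lines. The sketch below is not the authors' code: a TfidfVectorizer stands in for the learned tweet embeddings, and the toy tweets are invented. The flawed pipeline fits the feature extractor on the full corpus before splitting; the fixed one fits it on the training fold only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Toy corpus (invented examples); 1 = hate, 0 = non-hate.
tweets = ["you people are awful", "have a nice day", "awful takes everywhere",
          "what a lovely morning", "terrible terrible people", "good point, thanks"]
labels = [1, 0, 1, 0, 1, 0]

# FLAWED: the feature extractor is fitted on the full corpus, so its
# vocabulary and IDF statistics already encode the test tweets.
leaky_vec = TfidfVectorizer().fit(tweets)

# FIXED: split first, then fit the feature extractor on the training fold only.
train_texts, test_texts, y_train, y_test = train_test_split(
    tweets, labels, test_size=0.33, random_state=0)
clean_vec = TfidfVectorizer().fit(train_texts)

# Every training word is known to both vectorizers, but only the leaky one
# has also seen words that occur exclusively in test tweets.
assert set(clean_vec.vocabulary_) <= set(leaky_vec.vocabulary_)
```

In the replicated model the leak is arguably worse than this sketch suggests, since the embeddings were learned with the labels of the full dataset, test tweets included.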

SLIDE 24

Model 2 [Agrawal and Awekar] 2018

Oversampling Data → Feature Extraction + Classification Method

DATASET 1 [Waseem and Hovy] NAACL 2016

94% F1
SLIDE 25

DATASET 1

[Waseem and Hovy] NAACL 2016

Model 2

[Agrawal and Awekar]

2018

SLIDE 26

Model 2 [Agrawal and Awekar] 2018

Oversampling → Splitting: TRAIN / TEST

Embeddings → LSTM → Fully Connected → Softmax → Prediction

94% F1
SLIDE 27

This also looks great! But there is another problem.

SLIDE 28

DATASET 1

[Waseem and Hovy] NAACL 2016

Model 2

[Agrawal and Awekar]

2018

SLIDE 29

Model 2

[Agrawal and Awekar]

2018

Oversampling → Splitting: TRAIN / TEST
SLIDE 30

Model 2 [Agrawal and Awekar] 2018

Splitting: TRAIN / TEST → Oversampling (training set only)

79% F1 (corrected) vs 94% F1 (reported)

Embeddings → LSTM → Fully Connected → Softmax → Prediction
SLIDE 31

The result is overestimated because the oversampling phase occurs before splitting the data.
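The same pitfall can be sketched with plain duplication-based oversampling (an illustrative stand-in; the exact oversampling method of Agrawal and Awekar is not reproduced here). Oversampling before the split puts copies of the same example in both folds, so the test set is partly memorized.

```python
from sklearn.model_selection import train_test_split

# 100 invented examples, 10% minority class (1 = hate, 0 = non-hate).
data = [(f"tweet {i}", 1 if i < 10 else 0) for i in range(100)]
minority = [d for d in data if d[1] == 1]

# FLAWED: oversample first, then split. Copies of the same minority
# example can land in both folds.
train_bad, test_bad = train_test_split(data + minority * 8,
                                       test_size=0.3, random_state=0)
overlap = set(train_bad) & set(test_bad)

# FIXED: split first, then oversample only the training fold.
train, test = train_test_split(data, test_size=0.3, random_state=0)
train_balanced = train + [d for d in train if d[1] == 1] * 8
assert not set(train_balanced) & set(test)  # folds stay disjoint

print(f"examples shared by train and test when oversampling first: {len(overlap)}")
```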
SLIDE 32

However, there is another issue to take into account.

SLIDE 33

State-of-the-art replication
User distribution
Generalization
SLIDE 34

Chart: % of tweets from the most prolific user per class (Hate / Non-Hate; Sexism / Racism): 96%, 44%, 38%, 25%
SLIDE 35

Splitting without overlapping users: TRAIN / TEST

DATASET 1 [Waseem and Hovy] NAACL 2016

Model 1 [Badjatiya et al.] 2017: 93% F1 (reported) → 73% F1 (corrected) → 44% F1 (without user overlap)

Model 2 [Agrawal and Awekar] 2018: 94% F1 (reported) → 79% F1 (corrected) → 35% F1 (without user overlap)
SLIDE 36

What happens if we have a dataset with a better user distribution?

SLIDE 37

NEW DATASET = DATASET 1 (at most 250 tweets per user per class) + hateful tweets from DATASET 2

DATASET 2: [Davidson et al.] ICWSM 2017
SLIDE 38

NEW DATASET, splitting without overlapping users: TRAIN / TEST

Model 1 [Badjatiya et al.] 2017: 93% F1 (reported) → 73% F1 (corrected) → 44% F1 (without user overlap) → 78% F1 (new dataset)

Model 2 [Agrawal and Awekar] 2018: 94% F1 (reported) → 79% F1 (corrected) → 35% F1 (without user overlap) → 76% F1 (new dataset)
SLIDE 39

User distribution in datasets has an impact on the classification results.
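One way to avoid this bias at evaluation time is a group-aware split, sketched below with scikit-learn's GroupShuffleSplit; the tiny corpus and user ids are invented for illustration.

```python
from sklearn.model_selection import GroupShuffleSplit

# Invented corpus: each tweet carries an author id. User 0 is a single
# prolific author who wrote most of the tweets, mimicking the skew above.
tweets = [f"tweet {i}" for i in range(20)]
users = [0] * 12 + [1, 1, 2, 2, 3, 3, 4, 4]

# GroupShuffleSplit keeps all tweets by the same author on one side of
# the split, so a classifier cannot score well by memorizing user style.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(tweets, groups=users))

train_users = {users[i] for i in train_idx}
test_users = {users[i] for i in test_idx}
assert not train_users & test_users  # no author appears in both folds
```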
SLIDE 40

State-of-the-art replication
User distribution
Generalization
SLIDE 41

TRAINING SET TESTING SET

SLIDE 42

TRAINING SET / TESTING SET: DATASET 3

[Basile et al.] SemEval 2019

SLIDE 43

Model 1 [Badjatiya et al.] 2017, tested on DATASET 3 [Basile et al.] SemEval 2019:
trained on DATASET 1 [Waseem and Hovy] NAACL 2016: 47% F1
trained on NEW DATASET: 51% F1

Model 2 [Agrawal and Awekar] 2018, tested on DATASET 3 [Basile et al.] SemEval 2019:
trained on DATASET 1 [Waseem and Hovy] NAACL 2016: 51% F1
trained on NEW DATASET: 54% F1
SLIDE 44

Better user-distributed datasets lead to better generalization.
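Generalization of the kind measured above is a cross-dataset evaluation: fit everything on one corpus, score on a corpus it never touched. A minimal sketch with invented data and a simple bag-of-words classifier (not the models from the slides):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# Invented stand-ins for a training corpus and a separate held-out corpus.
train_texts = ["you people are trash", "lovely weather today",
               "trash humans, all of them", "great game last night"] * 5
train_labels = [1, 0, 1, 0] * 5
other_texts = ["these people are trash", "nice match yesterday"]
other_labels = [1, 0]

# Everything, including the vectorizer, is fitted on the training corpus
# only; the second corpus is used exclusively for scoring.
vec = TfidfVectorizer().fit(train_texts)
clf = LogisticRegression().fit(vec.transform(train_texts), train_labels)
preds = clf.predict(vec.transform(other_texts))
f1 = f1_score(other_labels, preds)
print(f"cross-dataset F1: {f1:.2f}")
```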

SLIDE 45

Conclusions

SLIDE 46

Hate Speech Detection is Not as Easy as You May Think

We show that state-of-the-art results are highly overestimated due to experimental issues in the models:

- Including the testing set during the training phase
- Oversampling the data before splitting
- User-biased datasets
SLIDE 47

Hate Speech Detection is Not as Easy as You May Think: A Closer Look at Model Validation

Aymé Arango, Jorge Pérez and Bárbara Poblete