SLIDE 13 Research Group Social Computing Department of Informatics Technical University of Munich
Domains – Hate Speech Detection | Topics
Maximilian Wich, M.Sc.
Preliminary Meeting of the NLP Lab Course SS2020 12
Potential topics/ideas:
- Multitask learning to combine data sets with different labeling schemes
− Problem:
there are many hate speech data sets, but they use different labeling schemes
− Idea:
train a multitask classifier (e.g., BERT) with shared layers based on several data sets
- Learning from weak supervision to increase the amount of training data without manual labeling
− Problem:
we do not have enough training data
− Idea:
train classifiers on the available data, use these classifiers to collect and label new data, and retrain the classifiers on the enlarged data set
- Classify hate speech based on stylistic elements (e.g., part-of-speech (POS) tags, emoji usage, ...)
− Problem:
implicit hate speech is often hard to identify
− Idea:
use stylistic elements to find patterns in hate speech and train a classifier
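
To make the stylistic-elements idea more concrete, here is a minimal sketch of how a few such features could be extracted from a post before training a classifier. The function name `stylistic_features` and the chosen features are assumptions for illustration only and are not taken from the slides. Emojis are approximated as characters in the Unicode category "So" (Symbol, other), which is a rough heuristic. POS tags are left out because they would need an external tagger.

```python
import re
import unicodedata


def stylistic_features(text: str) -> dict:
    """Extract a few surface-level style features from one post.

    Hypothetical feature set for illustration; a real project would
    likely add POS-tag counts from an external tagger.
    """
    tokens = text.split()
    letters = [c for c in text if c.isalpha()]

    # Rough emoji heuristic: count characters in Unicode category "So".
    emoji_count = sum(1 for c in text if unicodedata.category(c) == "So")

    # Share of alphabetic characters that are uppercase ("shouting").
    uppercase_ratio = (
        sum(1 for c in letters if c.isupper()) / len(letters) if letters else 0.0
    )

    # Tokens containing the same character three or more times in a row,
    # e.g. "!!!" or "sooo".
    repeated_char_tokens = sum(
        1 for t in tokens if re.search(r"(.)\1{2,}", t)
    )

    return {
        "n_tokens": len(tokens),
        "emoji_count": emoji_count,
        "exclamation_count": text.count("!"),
        "uppercase_ratio": uppercase_ratio,
        "repeated_char_tokens": repeated_char_tokens,
    }


if __name__ == "__main__":
    print(stylistic_features("GO away!!! 😡😡"))
```

The resulting dictionaries could be turned into feature vectors and passed to any standard classifier. Whether such features actually help with implicit hate speech is the open question of the topic.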