Twitter User Profiling: Bot and Gender Identification (7th Author Profiling Task, PAN 2019)

  1. Twitter User Profiling: Bot and Gender Identification
     7th Author Profiling Task, PAN 2019, CLEF Workshop
     Dijana Kosmajac, Dr Vlado Keselj
     Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada

  2. Overview
     • Introduction
     • Bot Detection on Social Media
     • Methodology
     • DNA-inspired User Behaviour Fingerprint
     • Diversity Measures
     • Dataset of the 7th Author Profiling Task
     • Experiments and Results
     • Conclusion
     Note: for the gender detection approach, please refer to the working notes.

  3. Bot Detection on Social Media
     • Social media are convenient platforms for people to share, communicate, and collaborate.
     • The openness of social media is valuable, but it also enables malicious behaviour such as bullying, terrorist attack planning, and the dissemination of fraudulent information.
     • Important task: detect these abnormal activities as accurately and as early as possible to prevent disasters and attacks.
     • This study addresses one subdomain: bot detection.
     Introduction Methodology Dataset Experiments Conclusion

  4. Bot and Gender Detection on Social Media
     • DeBot: Twitter Bot Detection via Warped Correlation, Chavoshi et al., 2016
     • DNA-Inspired Online Behavioral Modeling and Its Application to Spambot Detection, Cresci et al., 2016

  5. DNA-inspired User Behaviour Fingerprint
     • First introduced in Cresci et al., 2016.
     • [Figure: a user timeline encoded as a character sequence, e.g. ACBCADDCCAF…; 3 ∗ 2^3 = 24 different labels, each mapped to a character as ASCII(65 + code)]
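
The encoding on slide 5 can be sketched roughly as follows. The slide gives only the label count (3 ∗ 2^3 = 24) and the mapping ASCII(65 + code); the three tweet types and the three binary flags used here (`has_url`, `has_hashtag`, `has_mention`), as well as the `Tweet` record itself, are assumptions made for illustration and may differ from the features the authors used.

```python
from dataclasses import dataclass

# Assumption: three tweet types. The slide only states the count 3 * 2^3 = 24.
TWEET_TYPES = ("tweet", "retweet", "reply")


@dataclass
class Tweet:
    """Hypothetical minimal tweet record, used only for this sketch."""
    kind: str                  # one of TWEET_TYPES
    has_url: bool = False      # assumed binary flag
    has_hashtag: bool = False  # assumed binary flag
    has_mention: bool = False  # assumed binary flag


def tweet_code(tweet: Tweet) -> int:
    """Map a tweet to an integer label in the range 0..23."""
    flags = (
        (int(tweet.has_url) << 2)
        | (int(tweet.has_hashtag) << 1)
        | int(tweet.has_mention)
    )
    return TWEET_TYPES.index(tweet.kind) * 8 + flags


def fingerprint(timeline: list[Tweet]) -> str:
    """Encode a timeline as one character per tweet: ASCII(65 + code)."""
    return "".join(chr(65 + tweet_code(t)) for t in timeline)


timeline = [
    Tweet("tweet"),
    Tweet("retweet", has_url=True),
    Tweet("reply", has_url=True, has_hashtag=True, has_mention=True),
]
print(fingerprint(timeline))  # AMX
```

With 24 labels the alphabet runs from `A` (code 0) to `X` (code 23), so every fingerprint is a plain uppercase string that ordinary text tools can process.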

  6. DNA-inspired User Behaviour Fingerprint
     • We used 1-, 2-, 3- and 4-grams.
     • 3-gram example: [figure not preserved]
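
A minimal sketch of the n-gram extraction on slide 6, applied to the sample fingerprint shown on slide 5. The function name `char_ngrams` and its parameters are illustrative, not taken from the presentation.

```python
from collections import Counter


def char_ngrams(fingerprint: str, n_min: int = 1, n_max: int = 4) -> Counter:
    """Count all character n-grams of length n_min..n_max in a fingerprint."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        # A sliding window of width n; yields nothing if the string is shorter than n.
        for i in range(len(fingerprint) - n + 1):
            counts[fingerprint[i:i + n]] += 1
    return counts


trigrams = char_ngrams("ACBCADDCCAF", n_min=3, n_max=3)
print(sorted(trigrams))
```

For the 11-character sample this yields nine 3-grams (`ACB`, `CBC`, `BCA`, and so on), each occurring once.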

  7. Diversity Measures
     • Yule’s $K = C\left[-\frac{1}{N} + \sum_{m=1}^{m_{\max}} V(m, N)\left(\frac{m}{N}\right)^{2}\right]$
     • Shannon’s $H = -\sum_{i=1}^{V(N)} p_i \ln(p_i)$
     • Simpson’s $D = \frac{1}{\sum_{i=1}^{V(N)} p_i^{2}}$
     • Honoré’s $R = 100\,\frac{\log(N)}{1 - \frac{V(1, N)}{V(N)}}$
     • Sichel’s $S = \frac{V(2, N)}{N}$
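
The five measures on slide 7 can be computed from a table of item counts, as in the sketch below. It assumes the usual reading of the symbols: N is the total number of items, V(N) the number of distinct items, V(m, N) the number of distinct items that occur exactly m times, and p_i the relative frequency of item i. The constant C = 10^4 and the natural logarithm in Honoré's R are assumptions, since the slide does not specify them; the function name is illustrative.

```python
import math
from collections import Counter


def diversity_measures(counts: Counter, c: float = 1e4) -> dict:
    """Compute the five diversity measures from non-empty item counts."""
    n_tokens = sum(counts.values())                   # N
    n_types = len(counts)                             # V(N)
    spectrum = Counter(counts.values())               # m -> V(m, N)
    probs = [k / n_tokens for k in counts.values()]   # p_i

    yule_k = c * (
        -1 / n_tokens
        + sum(v * (m / n_tokens) ** 2 for m, v in spectrum.items())
    )
    shannon_h = -sum(p * math.log(p) for p in probs)
    simpson_d = 1 / sum(p * p for p in probs)
    # Honoré's R is undefined when every item occurs exactly once.
    hapax_ratio = spectrum[1] / n_types
    honore_r = (
        100 * math.log(n_tokens) / (1 - hapax_ratio)
        if hapax_ratio < 1
        else math.inf
    )
    # Denominator N, following the slide's formula for Sichel's S.
    sichel_s = spectrum[2] / n_tokens
    return {
        "yule_k": yule_k,
        "shannon_h": shannon_h,
        "simpson_d": simpson_d,
        "honore_r": honore_r,
        "sichel_s": sichel_s,
    }


measures = diversity_measures(Counter("ACBCADDCCAF"))
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

Here the measures are taken over the single characters of the sample fingerprint; the same function accepts n-gram counts of any order.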

  8. Dataset
     • Bot t-SNE visualization: (a) English, (b) Spanish
     • English: 2,880 train and 1,240 dev
     • Spanish: 2,080 train and 920 dev

  9. Dataset
     • Diversity measures visualization for English (panels: Honoré’s R, Yule’s K, Shannon’s H, Simpson’s D, Sichel’s S)

  10. Dataset
      • Diversity measures visualization for Spanish (panels: Honoré’s R, Yule’s K, Shannon’s H, Simpson’s D, Sichel’s S)

  11. Experiments with language-specific training
      • Experiment 1 (E1): character n-grams in the range 2-4, without diversity measures.
      • Experiment 2 (E2): character n-grams in the range 1-3, with diversity measures.

  12. Experiments with combined training
      • Experiment 3: same as E1, but trained on the combined training set.
      • Experiment 4: same as E2, but trained on the combined training set.
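
The four configurations on slides 11 and 12 can be written down as data, as in this sketch. It assumes that "combined" means pooling the English and Spanish training sets; the `Experiment` record and the `training_pools` helper are hypothetical names, and the classifier itself is left out because the slides do not name one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment:
    """Hypothetical record describing one experimental setup."""
    name: str
    ngram_range: tuple[int, int]
    use_diversity: bool
    combined_training: bool


EXPERIMENTS = [
    Experiment("E1", (2, 4), use_diversity=False, combined_training=False),
    Experiment("E2", (1, 3), use_diversity=True, combined_training=False),
    Experiment("E3", (2, 4), use_diversity=False, combined_training=True),
    Experiment("E4", (1, 3), use_diversity=True, combined_training=True),
]


def training_pools(train_by_lang: dict[str, list], exp: Experiment) -> dict[str, list]:
    """Return one training pool per language, or a single pooled set."""
    if exp.combined_training:
        pooled = [user for users in train_by_lang.values() for user in users]
        return {"combined": pooled}
    return dict(train_by_lang)


# Placeholder user lists sized like the training splits on slide 8.
train = {"en": list(range(2880)), "es": list(range(2080))}
for exp in EXPERIMENTS:
    pools = training_pools(train, exp)
    print(exp.name, {lang: len(users) for lang, users in pools.items()})
```

Under this reading, E1 and E2 train one model per language, while E3 and E4 train a single model on 4,960 users.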

  13. Official results
      • 13th place overall, better than all baselines.

  14. Conclusion and Future Work
      • A novel yet simple method for bot detection on social media.
      • Language independent, since it does not use language-specific features.
      • Disadvantage: it ignores language-specific features, which may be more fine-grained.
      • Explore the effect of the length of the user fingerprint on the ability to differentiate bots from genuine users.
      • Explore the effect of the timespan over which the fingerprint is collected.
      • Explore the effect of using variable-length fingerprints.
      • Explore the possibility of unsupervised bot detection using diversity measures and clustering.
