Overview of the Celebrity Profiling Task at PAN 2020



  1. Overview of the Celebrity Profiling Task at PAN 2020
     Example celebrities: Lil Wayne WEEZY F (@LilTunechi), LeFloid (@LeFloid), Kendall (@KendallJenner), Neymar Jr (@nejmarjr)
     Matti Wiegmann, Benno Stein, Martin Potthast
     Bauhaus-Universität Weimar, webis.de

  2. Celebrity Profiling Motivation Celebrity Profiling 2020: Given the Twitter feeds of the followers of a celebrity, determine the demographics. 1 Sep ’25 • WIEGMANN

  3. Celebrity Profiling Motivation
     Celebrity Profiling 2019: Given the Twitter feed of a celebrity, determine the demographics.
     Why Celebrities?
     ❑ They write many public, high-quality texts.
     ❑ Many personal demographics are public knowledge.

  4. Celebrity Profiling Motivation
     Celebrity Profiling 2019: Given the Twitter feed of a celebrity, determine the demographics.
     Why Celebrities?
     ❑ They write many public, high-quality texts.
     ❑ Many personal demographics are public knowledge.
     ➜ This is not the case for many users on social media.

  5. Celebrity Profiling Motivation
     Celebrity Profiling 2020: Given the (?) of a celebrity, determine the demographics.
     How can we profile users that do not write a lot?

  6. Celebrity Profiling Motivation
     Celebrity Profiling 2020: Given the Twitter profile of a celebrity, determine the demographics.
     How can we profile users that do not write a lot?
     ❑ Author Metadata: Biography, profile picture, ...

  7. Celebrity Profiling Motivation
     Celebrity Profiling 2020: Given the behavior on Twitter of a celebrity, determine the demographics.
     How can we profile users that do not write a lot?
     ❑ Author Metadata: Biography, profile picture, ...
     ❑ Author Behavior: Retweets, Likes, ...

  8. Celebrity Profiling Motivation
     Celebrity Profiling 2020: Given the Twitter feeds of the followers of a celebrity, determine the demographics.
     How can we profile users that do not write a lot?
     ❑ Author Metadata: Biography, profile picture, ...
     ❑ Author Behavior: Retweets, Likes, ...
     ❑ Social Graph: Homophily.

  9. Celebrity Profiling Motivation
     Celebrity Profiling 2020: Given the Twitter feeds of the followers of a celebrity, determine the demographics.
     How can we profile users that do not write a lot?
     ❑ Author Metadata: Biography, profile picture, ...
     ❑ Author Behavior: Retweets, Likes, ...
     ❑ Social Graph: Homophily and language variation.

  10. Celebrity Profiling Task
      Celebrity Profiling 2020: Given the Twitter feeds of the followers of a celebrity, determine the demographics:
      ❑ Age
      [Figure: class counts for Gender (Male, Female), Occupation (Creator, Sports, Performer, Politics), and Age (birth years 1940 to 1990)]

  11. Celebrity Profiling Task
      Celebrity Profiling 2020: Given the Twitter feeds of the followers of a celebrity, determine the demographics:
      ❑ Age,
      ❑ Gender
      [Figure: class counts for Gender (Male, Female), Occupation (Creator, Sports, Performer, Politics), and Age (birth years 1940 to 1990)]

  12. Celebrity Profiling Task
      Celebrity Profiling 2020: Given the Twitter feeds of the followers of a celebrity, determine the demographics:
      ❑ Age,
      ❑ Gender, and
      ❑ Occupation.
      [Figure: class counts for Gender (Male, Female), Occupation (Creator, Sports, Performer, Politics), and Age (birth years 1940 to 1990)]

  13. Celebrity Profiling Data
      Dataset creation:
      1. Extract celebrities with matching profiles from a corpus [ACL 2019].

  14. Celebrity Profiling Data
      Dataset creation:
      1. Extract celebrities with matching profiles from a corpus [ACL 2019].
      2. Download follower network.

  15. Celebrity Profiling Data
      Dataset creation:
      1. Extract celebrities with matching profiles from a corpus [ACL 2019].
      2. Download follower network.
      3. Eliminate inactive users.
         ❑ Users with few connections in the network.

  16. Celebrity Profiling Data
      Dataset creation:
      1. Extract celebrities with matching profiles from a corpus [ACL 2019].
      2. Download follower network.
      3. Eliminate inactive users and passive users.
         ❑ Users with fewer than 100 original, English tweets.

  17. Celebrity Profiling Data
      Dataset creation:
      1. Extract celebrities with matching profiles from a corpus [ACL 2019].
      2. Download follower network.
      3. Eliminate inactive users, passive users, and other hub users.
         ❑ Users with many followers or atypical behavior.

  18. Celebrity Profiling Data
      Dataset creation:
      1. Extract celebrities with matching profiles from a corpus [ACL 2019].
      2. Download follower network.
      3. Eliminate inactive users, passive users, and other hub users.
      4. Sample 10 followers per celebrity in a balanced dataset.
         ❑ Training dataset: 1,980 celebrities.
         ❑ Test dataset: 400 celebrities.
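
The elimination and sampling steps above can be sketched roughly as follows. This is a minimal illustration, not the organizers' pipeline: the `Follower` record and its field names are hypothetical, and only the limit of 100 original English tweets and the sample size of 10 come from the slides. The thresholds for "few connections" and "many followers" are not given, so the values below are placeholders, and class balancing across celebrities is left out.

```python
import random
from dataclasses import dataclass


@dataclass
class Follower:
    """Hypothetical record for one follower of a celebrity."""
    user_id: str
    connections: int              # links to other users in the follower network
    followers: int                # size of the follower's own audience
    original_english_tweets: int  # tweets that are original and in English


MIN_TWEETS = 100           # from the slides
SAMPLE_SIZE = 10           # from the slides
MIN_CONNECTIONS = 5        # assumption: "few connections" is not quantified
MAX_FOLLOWERS = 100_000    # assumption: "many followers" is not quantified


def keep(f: Follower) -> bool:
    """Return True if the follower survives all three elimination steps."""
    is_active = f.connections >= MIN_CONNECTIONS
    is_writer = f.original_english_tweets >= MIN_TWEETS
    is_not_hub = f.followers <= MAX_FOLLOWERS
    return is_active and is_writer and is_not_hub


def sample_followers(followers, rng):
    """Filter one celebrity's followers, then sample up to SAMPLE_SIZE of them."""
    eligible = [f for f in followers if keep(f)]
    return rng.sample(eligible, min(SAMPLE_SIZE, len(eligible)))
```

Passing a seeded `random.Random` instance as `rng` keeps the sample reproducible between runs.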

  19. Celebrity Profiling Evaluation
      Performance is measured as the harmonic mean of the classwise averaged F1 scores.
      $$\mathrm{cRank} = \frac{3}{\frac{1}{F_{1,\mathrm{gender}}} + \frac{1}{F_{1,\mathrm{occupation}}} + \frac{1}{F_{1,\mathrm{age}}}}$$
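
As a rough illustration of this measure, the sketch below computes a classwise averaged (macro) F1 and combines three such scores with a harmonic mean. The function names `macro_f1` and `c_rank` are chosen here for illustration and are not taken from the official evaluation script.

```python
def macro_f1(y_true, y_pred):
    """Classwise averaged F1: one F1 per class in y_true, then the plain mean."""
    classes = sorted(set(y_true))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def c_rank(f1_gender, f1_occupation, f1_age):
    """Harmonic mean of the three per-trait F1 scores."""
    scores = (f1_gender, f1_occupation, f1_age)
    if min(scores) <= 0:
        return 0.0  # the harmonic mean collapses to 0 if any trait scores 0
    return len(scores) / sum(1 / s for s in scores)
```

For example, `round(c_rank(0.75, 0.70, 0.50), 2)` gives 0.63, which agrees with the cRank reported for the oracle baseline from its per-trait scores. Because the harmonic mean is dominated by the weakest score, a system cannot compensate for poor age prediction with strong gender prediction.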

  20. Celebrity Profiling Evaluation
      Performance is measured as the harmonic mean of the classwise averaged F1 scores.
      $$\mathrm{cRank} = \frac{3}{\frac{1}{F_{1,\mathrm{gender}}} + \frac{1}{F_{1,\mathrm{occupation}}} + \frac{1}{F_{1,\mathrm{age}}}}$$
      Variable-bucketed age evaluation:
      ❑ Predict author age directly.
      ❑ Count near-misses as correct, depending on the age of the author.
      ❑ Apply multi-class evaluation.
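
The variable-bucketed idea can be illustrated with a small sketch. The slide does not say how wide the near-miss window is, so the `tolerance` rule below is a made-up placeholder (wider windows for older authors) and not the official definition. Replacing a near-miss with the true label before scoring is likewise one possible reading of "count near-misses as correct".

```python
REFERENCE_YEAR = 2020  # assumption: ages are measured relative to the task year


def tolerance(true_birth_year):
    """Hypothetical window size in years; NOT the official PAN definition."""
    age = REFERENCE_YEAR - true_birth_year
    return max(2, age // 10)


def age_hit(true_birth_year, predicted_birth_year):
    """True if the prediction falls inside the window around the true year."""
    return abs(true_birth_year - predicted_birth_year) <= tolerance(true_birth_year)


def bucket_prediction(true_birth_year, predicted_birth_year):
    """Map a near-miss onto the true label so multi-class scoring counts it as correct."""
    if age_hit(true_birth_year, predicted_birth_year):
        return true_birth_year
    return predicted_birth_year
```

After applying `bucket_prediction` to every author, the adjusted predictions can be scored with an ordinary multi-class F1.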

  21. Celebrity Profiling Results
      Baseline:
      ❑ Algorithm: Logistic regression.
      ❑ Features: Bag-of-words 1- and 2-grams, TF-IDF weighted.
      ❑ Age was predicted in 5 classes (birth years): 1947, 1963, 1975, 1985, and 1994.

  22. Celebrity Profiling Results
      Baseline:
      ❑ Algorithm: Logistic regression.
      ❑ Features: Bag-of-words 1- and 2-grams, TF-IDF weighted.
      ❑ Age was predicted in 5 classes (birth years): 1947, 1963, 1975, 1985, and 1994.
      Trained and tested on all followers’ tweets as a lower bound.

      Results on the test dataset:
      | Participant       | cRank | Age | Gender | Occupation |
      |-------------------|-------|-----|--------|------------|
      | baseline-follower | 0.47  |     |        |            |

  23. Celebrity Profiling Results
      Baseline:
      ❑ Algorithm: Logistic regression.
      ❑ Features: Bag-of-words 1- and 2-grams, TF-IDF weighted.
      ❑ Age was predicted in 5 classes (birth years): 1947, 1963, 1975, 1985, and 1994.
      Trained and tested on all followers’ tweets as a lower bound.
      Trained and tested on the celebrities’ tweets as a goalpost.

      Results on the test dataset:
      | Participant       | cRank | Age | Gender | Occupation |
      |-------------------|-------|-----|--------|------------|
      | baseline-oracle   | 0.63  |     |        |            |
      | baseline-follower | 0.47  |     |        |            |
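
A baseline of this kind could look roughly like the sketch below, which assumes scikit-learn is installed. It is not the organizers' code: the hyperparameters are library defaults apart from the n-gram range, the toy texts and labels are invented, and one such model would be trained per trait (age, gender, occupation).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def build_baseline():
    """TF-IDF weighted word 1- and 2-grams fed into logistic regression."""
    return make_pipeline(
        TfidfVectorizer(analyzer="word", ngram_range=(1, 2)),
        LogisticRegression(max_iter=1000),
    )


# Toy data, invented for illustration: one string per celebrity, built by
# concatenating the tweets of that celebrity's followers.
train_texts = [
    "great match tonight what a goal",
    "the team played a great second half",
    "the vote on the new bill is tomorrow",
    "parliament debates the budget bill today",
]
train_labels = ["sports", "sports", "politics", "politics"]

model = build_baseline()
model.fit(train_texts, train_labels)
predictions = list(model.predict(["what a goal by the team", "the budget vote"]))
```

For the follower baseline, the input text per celebrity comes from the followers' tweets. For the oracle, the same pipeline is trained and tested on the celebrities' own tweets instead.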

  24. Celebrity Profiling Results
      As proof of concept: Profiling users from their followers’ texts works.
      ❑ Baseline was beaten by a healthy margin.

      Results on the test dataset:
      | Participant       | cRank | Age | Gender | Occupation |
      |-------------------|-------|-----|--------|------------|
      | baseline-oracle   | 0.63  |     |        |            |
      | Hodge and Price   | 0.58  |     |        |            |
      | Koloski et al.    | 0.52  |     |        |            |
      | Alroobaea et al.  | 0.47  |     |        |            |
      | baseline-follower | 0.47  |     |        |            |

  25. Celebrity Profiling Results
      As proof of concept: Profiling users from their followers’ texts works.
      ❑ Baseline was beaten by a healthy margin.
      ❑ Submissions predict young users (ages 20-30) better, by 0.2 F1.

      Results on the test dataset:
      | Participant       | cRank | Age  | Gender | Occupation |
      |-------------------|-------|------|--------|------------|
      | baseline-oracle   | 0.63  | 0.50 |        |            |
      | Hodge and Price   | 0.58  | 0.43 |        |            |
      | Koloski et al.    | 0.52  | 0.41 |        |            |
      | Alroobaea et al.  | 0.47  | 0.32 |        |            |
      | baseline-follower | 0.47  | 0.36 |        |            |

  26. Celebrity Profiling Results
      As proof of concept: Profiling users from their followers’ texts works.
      ❑ Baseline was beaten by a healthy margin.
      ❑ Submissions predict young users (ages 20-30) better, by 0.2 F1.
      ❑ Submissions skew towards the “Male” class.

      Results on the test dataset:
      | Participant       | cRank | Age  | Gender | Occupation |
      |-------------------|-------|------|--------|------------|
      | baseline-oracle   | 0.63  | 0.50 | 0.75   |            |
      | Hodge and Price   | 0.58  | 0.43 | 0.68   |            |
      | Koloski et al.    | 0.52  | 0.41 | 0.62   |            |
      | Alroobaea et al.  | 0.47  | 0.32 | 0.70   |            |
      | baseline-follower | 0.47  | 0.36 | 0.58   |            |

  27. Celebrity Profiling Results
      As proof of concept: Profiling users from their followers’ texts works.
      ❑ Baseline was beaten by a healthy margin.
      ❑ Submissions predict young users (ages 20-30) better, by 0.2 F1.
      ❑ Submissions skew towards the “Male” class.
      ❑ Submissions beat the oracle on occupation, although “Creators” is a problematic class (0.46 F1).

      Results on the test dataset:
      | Participant       | cRank | Age  | Gender | Occupation |
      |-------------------|-------|------|--------|------------|
      | baseline-oracle   | 0.63  | 0.50 | 0.75   | 0.70       |
      | Hodge and Price   | 0.58  | 0.43 | 0.68   | 0.71       |
      | Koloski et al.    | 0.52  | 0.41 | 0.62   | 0.60       |
      | Alroobaea et al.  | 0.47  | 0.32 | 0.70   | 0.60       |
      | baseline-follower | 0.47  | 0.36 | 0.58   | 0.52       |

  28. Celebrity Profiling Outlook
      We still have many open questions:
      ❑ Does the community’s text reflect the demographics of a celebrity?

  29. Celebrity Profiling Outlook
      We still have many open questions:
      ❑ Does the community’s text reflect the demographics of a celebrity?
      ❑ Do celebrities influence the writing of their fans?
      ❑ What are the rules of style formation?
      See you at CLEF 2021!
