SLIDE 1

Overview of the Celebrity Profiling Task at PAN 2020

LeFloid

@LeFloid

Kendall

@KendallJenner

Neymar Jr

@neymarjr

Lil Wayne WEEZY F

@LilTunechi

Matti Wiegmann, Benno Stein, Martin Potthast

Bauhaus-Universität Weimar, webis.de

SLIDE 2

Celebrity Profiling

Motivation

Celebrity Profiling 2020: Given the Twitter feeds of the followers of a celebrity, determine the celebrity's demographics.

Sep ’25 • WIEGMANN

SLIDE 3

Motivation

Celebrity Profiling 2019: Given the Twitter feed of a celebrity, determine the celebrity's demographics.

Why celebrities?

❑ They write many public, high-quality texts.
❑ Many of their personal demographics are public knowledge.

SLIDE 4

Motivation

Celebrity Profiling 2019: Given the Twitter feed of a celebrity, determine the celebrity's demographics.

Why celebrities?

❑ They write many public, high-quality texts.
❑ Many of their personal demographics are public knowledge.

➜ This is not the case for many users on social media.

SLIDE 5

Motivation

Celebrity Profiling 2020: Given the (?) of a celebrity, determine the celebrity's demographics.

How can we profile users who do not write much?

SLIDE 6

Motivation

Celebrity Profiling 2020: Given the Twitter profile of a celebrity, determine the celebrity's demographics.

How can we profile users who do not write much?

❑ Author metadata: biography, profile picture, ...

SLIDE 7

Motivation

Celebrity Profiling 2020: Given the behavior of a celebrity on Twitter, determine the celebrity's demographics.

How can we profile users who do not write much?

❑ Author metadata: biography, profile picture, ...
❑ Author behavior: retweets, likes, ...

SLIDE 8

Motivation

Celebrity Profiling 2020: Given the Twitter feeds of the followers of a celebrity, determine the celebrity's demographics.

How can we profile users who do not write much?

❑ Author metadata: biography, profile picture, ...
❑ Author behavior: retweets, likes, ...
❑ Social graph: homophily.

SLIDE 9

Motivation

Celebrity Profiling 2020: Given the Twitter feeds of the followers of a celebrity, determine the celebrity's demographics.

How can we profile users who do not write much?

❑ Author metadata: biography, profile picture, ...
❑ Author behavior: retweets, likes, ...
❑ Social graph: homophily and language variation.

SLIDE 10

Task

Celebrity Profiling 2020: Given the Twitter feeds of the followers of a celebrity, determine the celebrity's demographics:

❑ Age,

[Figure: class counts per panel: Age (by birth year, 1940 to 1990), Gender (male, female), Occupation (creator, sports, performer, politics).]

SLIDE 11

Task

Celebrity Profiling 2020: Given the Twitter feeds of the followers of a celebrity, determine the celebrity's demographics:

❑ Age,
❑ Gender,

[Figure: class counts per panel: Age (by birth year, 1940 to 1990), Gender (male, female), Occupation (creator, sports, performer, politics).]

SLIDE 12

Task

Celebrity Profiling 2020: Given the Twitter feeds of the followers of a celebrity, determine the celebrity's demographics:

❑ Age,
❑ Gender, and
❑ Occupation.

[Figure: class counts per panel: Age (by birth year, 1940 to 1990), Gender (male, female), Occupation (creator, sports, performer, politics).]
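The three target traits of a prediction can be held in a small record type. This is a minimal sketch: the class and field names are hypothetical, the gender and occupation labels are the ones listed on this slide, and age is represented as a birth year, matching the age axis of the figure.

```python
from dataclasses import dataclass
from enum import Enum


class Gender(Enum):
    MALE = "male"
    FEMALE = "female"


class Occupation(Enum):
    CREATOR = "creator"
    SPORTS = "sports"
    PERFORMER = "performer"
    POLITICS = "politics"


@dataclass(frozen=True)
class CelebrityProfile:
    """Hypothetical container for one prediction: the three traits of the task."""
    celebrity_id: str
    birth_year: int  # age is predicted as a birth year
    gender: Gender
    occupation: Occupation

    @classmethod
    def from_strings(cls, celebrity_id, birth_year, gender, occupation):
        # Enum lookup by value raises ValueError for labels outside the task's label set.
        return cls(
            celebrity_id,
            int(birth_year),
            Gender(gender.lower()),
            Occupation(occupation.lower()),
        )
```

Parsing through the enums rejects misspelled or out-of-task labels early, before any evaluation runs.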

SLIDE 13

Data

Dataset creation:

  • 1. Extract celebrities with matching profiles from a corpus [ACL 2019].

SLIDE 14

Data

Dataset creation:

  • 1. Extract celebrities with matching profiles from a corpus [ACL 2019].
  • 2. Download the follower network.

SLIDE 15

Data

Dataset creation:

  • 1. Extract celebrities with matching profiles from a corpus [ACL 2019].
  • 2. Download the follower network.
  • 3. Eliminate inactive users.

❑ Users with few connections in the network.

SLIDE 16

Data

Dataset creation:

  • 1. Extract celebrities with matching profiles from a corpus [ACL 2019].
  • 2. Download the follower network.
  • 3. Eliminate inactive users, passive users.

❑ Users with fewer than 100 original English tweets.

SLIDE 17

Data

Dataset creation:

  • 1. Extract celebrities with matching profiles from a corpus [ACL 2019].
  • 2. Download the follower network.
  • 3. Eliminate inactive users, passive users, and other hub users.

❑ Users with many followers or atypical behavior.

SLIDE 18

Data

Dataset creation:

  • 1. Extract celebrities with matching profiles from a corpus [ACL 2019].
  • 2. Download the follower network.
  • 3. Eliminate inactive users, passive users, and other hub users.
  • 4. Sample 10 followers per celebrity into a balanced dataset.

❑ Training dataset: 1,980 celebrities.
❑ Test dataset: 400 celebrities.
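Steps 3 and 4 amount to a filter followed by a fixed-size sample per celebrity. The sketch below is illustrative only: the 100-tweet minimum and the sample size of 10 come from the slides, while the `Follower` record, `MIN_CONNECTIONS`, and `MAX_FOLLOWER_COUNT` are hypothetical. Detecting "atypical behavior" and balancing classes across celebrities are not modeled.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

# From the slides: the passive-user threshold and the sample size.
MIN_ORIGINAL_ENGLISH_TWEETS = 100
FOLLOWERS_PER_CELEBRITY = 10
# Hypothetical thresholds for "few connections" and "many followers".
MIN_CONNECTIONS = 2
MAX_FOLLOWER_COUNT = 10_000


@dataclass
class Follower:
    user_id: str
    connections: int              # edges to other users in the downloaded network
    original_english_tweets: int  # the user's own English tweets, retweets excluded
    follower_count: int


def is_eligible(f: Follower) -> bool:
    """Step 3: drop inactive, passive, and hub users."""
    inactive = f.connections < MIN_CONNECTIONS
    passive = f.original_english_tweets < MIN_ORIGINAL_ENGLISH_TWEETS
    hub = f.follower_count > MAX_FOLLOWER_COUNT
    return not (inactive or passive or hub)


def sample_followers(followers: List[Follower], seed: int = 0) -> Optional[List[Follower]]:
    """Step 4: draw a fixed number of eligible followers, or None if too few remain."""
    eligible = [f for f in followers if is_eligible(f)]
    if len(eligible) < FOLLOWERS_PER_CELEBRITY:
        return None  # assumption: such a celebrity is left out of the dataset
    return random.Random(seed).sample(eligible, FOLLOWERS_PER_CELEBRITY)
```

Seeding the random generator keeps the sample reproducible across runs.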

SLIDE 19

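Evaluation

Performance is measured as cRank, the harmonic mean of the class-wise averaged (macro) F1 scores of the three traits:

$$\mathrm{cRank} = \frac{3}{\frac{1}{F_{1,\mathrm{gender}}} + \frac{1}{F_{1,\mathrm{occupation}}} + \frac{1}{F_{1,\mathrm{age}}}}$$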
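The cRank formula translates directly into code. This is a minimal sketch that takes the three per-trait macro F1 scores as given; the function name `crank` is hypothetical.

```python
def crank(f1_gender, f1_occupation, f1_age):
    """Harmonic mean of the three per-trait macro F1 scores."""
    scores = (f1_gender, f1_occupation, f1_age)
    if any(s <= 0 for s in scores):
        return 0.0  # a trait at 0 F1 drives the harmonic mean to 0
    return len(scores) / sum(1.0 / s for s in scores)
```

With the per-trait scores reported for Hodge and Price in the final results table (gender 0.68, occupation 0.71, age 0.43), `crank(0.68, 0.71, 0.43)` gives about 0.576, which rounds to their reported 0.58. The arithmetic mean of the same scores would be about 0.61: the harmonic mean pulls the overall score toward the weakest trait, here age.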

SLIDE 20

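Evaluation

Performance is measured as cRank, the harmonic mean of the class-wise averaged (macro) F1 scores of the three traits:

$$\mathrm{cRank} = \frac{3}{\frac{1}{F_{1,\mathrm{gender}}} + \frac{1}{F_{1,\mathrm{occupation}}} + \frac{1}{F_{1,\mathrm{age}}}}$$

Variable-bucketed age evaluation:

❑ Predict the author's age (birth year) directly.
❑ Count near-misses as correct, with a tolerance that depends on the author's age.
❑ Apply multi-class evaluation.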
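The three steps of the age evaluation can be sketched as follows. Birth years serve as class labels. The tolerance window is hypothetical: the slides only say it depends on the author's age, so `REFERENCE_YEAR`, `SLOPE`, and `MIN_WINDOW` are illustrative values, not the task's official parameters. A near-miss inside the window is mapped onto the true birth year, and the result is scored with macro-averaged F1.

```python
# Hypothetical window parameters; the slides do not give the exact function.
REFERENCE_YEAR = 2020
SLOPE = 0.1        # extra years of tolerance per year of age
MIN_WINDOW = 1.0   # smallest tolerance, in years


def age_window(true_birth_year):
    """Tolerance in years around the true birth year; widens for older authors."""
    age = REFERENCE_YEAR - true_birth_year
    return max(MIN_WINDOW, SLOPE * age)


def bucket_prediction(pred_year, true_year):
    """Count a near-miss as correct by replacing it with the true birth year."""
    if abs(pred_year - true_year) <= age_window(true_year):
        return true_year
    return pred_year


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in set(y_true) | set(y_pred):
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def age_f1(true_years, pred_years):
    """Variable-bucketed age score: bucket near-misses, then multi-class macro F1."""
    bucketed = [bucket_prediction(p, t) for t, p in zip(true_years, pred_years)]
    return macro_f1(true_years, bucketed)
```

A prediction outside the window keeps its own year. It therefore counts as a false positive for that year and as a false negative for the true year.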

SLIDE 21

Results

Baseline:

❑ Algorithm: logistic regression.
❑ Features: bag of word 1- and 2-grams, TF-IDF weighted.
❑ Age was predicted in 5 classes: 1947, 1963, 1975, 1985, and 1994.
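The baseline's features and age classes can be sketched in plain Python. This is an illustration under stated assumptions, not the organizers' code. Mapping each true birth year to the nearest of the five listed classes is an assumed labeling rule. The TF-IDF weighting uses the smoothed IDF and L2 normalization that scikit-learn's `TfidfVectorizer` applies by default. The tokenizer is a naive regex.

```python
import math
import re
from collections import Counter

# The five age classes of the baseline, as listed on the slide.
AGE_CLASSES = (1947, 1963, 1975, 1985, 1994)


def snap_birth_year(year):
    """Assumed rule: map a birth year to the nearest class (ties go to the earlier class)."""
    return min(AGE_CLASSES, key=lambda c: abs(c - year))


def word_ngrams(text):
    """Word unigrams plus bigrams from a lower-cased, naively tokenized text."""
    tokens = re.findall(r"\w+", text.lower())
    bigrams = [a + " " + b for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


def tfidf_vectors(documents):
    """Sparse, L2-normalized TF-IDF vectors with smoothed IDF: ln((1+n)/(1+df)) + 1."""
    counts = [Counter(word_ngrams(doc)) for doc in documents]
    n_docs = len(documents)
    doc_freq = Counter(term for c in counts for term in c)  # documents per term
    vectors = []
    for c in counts:
        vec = {t: tf * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1) for t, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors
```

Each document would be either the concatenated tweets of one celebrity's followers (baseline-follower) or the celebrity's own tweets (baseline-oracle). The vectors and the snapped age labels would then go to a logistic regression classifier, for example scikit-learn's `LogisticRegression`, which is not shown here.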

SLIDE 22

Results

Baseline:

❑ Algorithm: logistic regression.
❑ Features: bag of word 1- and 2-grams, TF-IDF weighted.
❑ Age was predicted in 5 classes: 1947, 1963, 1975, 1985, and 1994.

baseline-follower: trained and tested on all followers' tweets, as a lower bound.

Results on the test dataset:

| Participant | cRank | Age | Gender | Occupation |
|---|---|---|---|---|
| baseline-follower | 0.47 | | | |

SLIDE 23

Results

Baseline:

❑ Algorithm: logistic regression.
❑ Features: bag of word 1- and 2-grams, TF-IDF weighted.
❑ Age was predicted in 5 classes: 1947, 1963, 1975, 1985, and 1994.

baseline-follower: trained and tested on all followers' tweets, as a lower bound.
baseline-oracle: trained and tested on the celebrities' own tweets, as a goalpost.

Results on the test dataset:

| Participant | cRank | Age | Gender | Occupation |
|---|---|---|---|---|
| baseline-oracle | 0.63 | | | |
| baseline-follower | 0.47 | | | |

SLIDE 24

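Results

As a proof of concept: profiling users from their followers' texts works.

❑ The best submission beats the follower baseline by a healthy margin (0.58 vs. 0.47 cRank).

Results on the test dataset:

| Participant | cRank | Age | Gender | Occupation |
|---|---|---|---|---|
| baseline-oracle | 0.63 | | | |
| Hodge and Price | 0.58 | | | |
| Koloski et al. | 0.52 | | | |
| Alroobaea et al. | 0.47 | | | |
| baseline-follower | 0.47 | | | |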

SLIDE 25

Results

As a proof of concept: profiling users from their followers' texts works.

❑ The best submission beats the follower baseline by a healthy margin (0.58 vs. 0.47 cRank).
❑ Submissions predict young users (20 to 30) better, by 0.2 F1.

Results on the test dataset:

| Participant | cRank | Age | Gender | Occupation |
|---|---|---|---|---|
| baseline-oracle | 0.63 | 0.50 | | |
| Hodge and Price | 0.58 | 0.43 | | |
| Koloski et al. | 0.52 | 0.41 | | |
| Alroobaea et al. | 0.47 | 0.32 | | |
| baseline-follower | 0.47 | 0.36 | | |

SLIDE 26

Results

As a proof of concept: profiling users from their followers' texts works.

❑ The best submission beats the follower baseline by a healthy margin (0.58 vs. 0.47 cRank).
❑ Submissions predict young users (20 to 30) better, by 0.2 F1.
❑ Submissions skew towards the “Male” class.

Results on the test dataset:

| Participant | cRank | Age | Gender | Occupation |
|---|---|---|---|---|
| baseline-oracle | 0.63 | 0.50 | 0.75 | |
| Hodge and Price | 0.58 | 0.43 | 0.68 | |
| Koloski et al. | 0.52 | 0.41 | 0.62 | |
| Alroobaea et al. | 0.47 | 0.32 | 0.70 | |
| baseline-follower | 0.47 | 0.36 | 0.58 | |

SLIDE 27

Results

As a proof of concept: profiling users from their followers' texts works.

❑ The best submission beats the follower baseline by a healthy margin (0.58 vs. 0.47 cRank).
❑ Submissions predict young users (20 to 30) better, by 0.2 F1.
❑ Submissions skew towards the “Male” class.
❑ The best submission beats the oracle on occupation (0.71 vs. 0.70), although “Creators” is a problematic class (0.46 F1).

Results on the test dataset:

| Participant | cRank | Age | Gender | Occupation |
|---|---|---|---|---|
| baseline-oracle | 0.63 | 0.50 | 0.75 | 0.70 |
| Hodge and Price | 0.58 | 0.43 | 0.68 | 0.71 |
| Koloski et al. | 0.52 | 0.41 | 0.62 | 0.60 |
| Alroobaea et al. | 0.47 | 0.32 | 0.70 | 0.60 |
| baseline-follower | 0.47 | 0.36 | 0.58 | 0.52 |

SLIDE 28

Outlook

We still have many open questions:

❑ Does the text of a celebrity's community reflect the celebrity's demographics?

SLIDE 29

Outlook

We still have many open questions:

❑ Does the text of a celebrity's community reflect the celebrity's demographics?
❑ Do celebrities influence the writing of their fans?
❑ What are the rules of style formation?

See you at CLEF 2021!
