Crowdsourcing a Corpus for Clickbait Spoiling
July 4th, 2019 ◦ Jana Puschmann
Bachelor’s thesis defence
- 1. Referee: Prof. Dr. Benno Stein
- 2. Referee: PD Dr. Andreas Jakoby
Clickbait
The term “clickbait” refers to social media messages that are foremost designed to entice their readers into clicking an accompanying link to the posters’ website, at the expense of informativeness and objectiveness.
Examples of clickbait:
- https://twitter.com/BuzzFeed/status/1143221248257748993
- https://www.facebook.com/stern/posts/10156859926369652
- https://twitter.com/HuffPost/status/1143895724645593089
- https://twitter.com/Independent/status/1143793015523123201
https://twitter.com/SavedYouAClick/status/1090226980740628480
Crowdsourcing a Corpus for Clickbait Spoiling
- Task
- Data
- Workers
- HIT (Human Intelligence Task)
- Assignments
- Review
Clickbait posts from the Clickbait Challenge (https://www.clickbait-challenge.org/#task) and their related articles were adopted.
https://twitter.com/SavedYouAClick/status/1096773022449582080
The resulting Webis-Clickbait-19 corpus consists of 3,042 articles (367 + 2,675), drawn from Webis-Clickbait-17 and Webis-Clickbait-18.
Corpus Analysis
Naive Ranking: … to spoil a clickbait than the following sentences.
Cosine Similarity: … spoil it than sentences that are not.
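A minimal sketch of a cosine-similarity ranking, assuming the clickbait post and each article sentence are represented as bag-of-words term-frequency vectors. The slides do not state the tokenization or term weighting used in the thesis (TF-IDF weighting would be a common alternative); the function names and the toy post and sentences are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; a deliberately simple, hypothetical tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cosine_ranking(post: str, sentences: list[str]) -> list[int]:
    """Sentence indices ordered by similarity to the post, most similar first."""
    post_vec = Counter(tokenize(post))
    scores = [cosine(post_vec, Counter(tokenize(s))) for s in sentences]
    # sorted() is stable even with reverse=True, so ties keep document order
    return sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)


# Hypothetical toy example
post = "This one trick will change how you cook pasta"
sentences = [
    "Cooking is a hobby for many people.",
    "The trick is to salt the pasta water.",
    "Readers shared their own tips.",
]
print(cosine_ranking(post, sentences))  # [1, 0, 2]
```

Only the second sentence shares terms with the post ("trick", "pasta"), so it is ranked first; the two sentences without overlap tie at 0 and stay in document order.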
Logistic Regression: … previous approaches will increase the performance.
Spoiler Sentences
4,028 of the 84,837 sentences (about 4.7%) are part of a spoiler; the remaining 80,809 are not.
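A logistic-regression ranker can be sketched as a binary classifier trained on sentences labeled spoiler (yes) or non-spoiler (no), whose predicted probabilities are then used to rank an article's sentences. The two features below, relative sentence position and cosine similarity to the post, are an assumption meant to combine the ideas of the earlier rankings; the toy training data, learning rate, epoch count and all names are hypothetical. Plain batch gradient descent keeps the sketch dependency-free; in practice a library implementation such as scikit-learn's `LogisticRegression` would usually be used.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function mapping a real-valued score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights: list[float], bias: float, x: list[float]) -> float:
    """Predicted probability that a sentence with features x is a spoiler sentence."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(features: list[list[float]], labels: list[int],
          lr: float = 0.5, epochs: int = 2000) -> tuple[list[float], float]:
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n, dim = len(features), len(features[0])
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            error = predict_proba(weights, bias, x) - y  # gradient of log loss w.r.t. score
            for j in range(dim):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def rank_sentences(weights: list[float], bias: float,
                   sentence_features: list[list[float]]) -> list[int]:
    """Sentence indices ordered by predicted spoiler probability, highest first."""
    probs = [predict_proba(weights, bias, x) for x in sentence_features]
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)


# Hypothetical toy data: [relative position in article, cosine similarity to post]
train_x = [[0.0, 0.8], [0.1, 0.6], [0.2, 0.7],  # spoiler sentences (label 1)
           [0.6, 0.1], [0.8, 0.2], [0.9, 0.0]]  # non-spoiler sentences (label 0)
train_y = [1, 1, 1, 0, 0, 0]

weights, bias = train(train_x, train_y)
print(rank_sentences(weights, bias, [[0.0, 0.1], [0.2, 0.8], [0.9, 0.1]]))  # [1, 0, 2]
```

On this toy data the learned weight for position is negative and the weight for similarity is positive, so an early, post-like sentence outranks both an early dissimilar one and a late one.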
Metric        Random Ranking  Naive Ranking  Cosine Similarity  Logistic Regression
Precision@1             8.02           6.28              12.89                13.91
Precision@2            14.40          22.22              27.94                32.58
Precision@3            20.97          35.04              40.04                46.25
Precision@4            27.32          45.30              49.28                55.06
Precision@5            32.94          53.52              58.71                62.46
Precision@6            38.40          60.82              64.50                68.61
Precision@7            44.28          67.19              70.45                73.93
Precision@8            49.01          72.42              75.12                78.11
Precision@9            53.32          76.92              78.96                81.79
Precision@10           57.82          80.60              81.95                84.29
Average Rank           12.99           7.73               7.06                 6.71
[Precision@n in %]
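The slides do not define the two metrics. Because the Precision@n values grow with n, a plausible reading is that Precision@n is the percentage of clickbait posts for which at least one spoiler sentence is ranked among the top n, and that Average Rank is the mean position of the highest-ranked spoiler sentence. The sketch below computes both under that interpretation, which is an assumption; the function names and toy rankings are hypothetical.

```python
def first_spoiler_rank(ranking: list[int], spoiler_ids: set[int]) -> int:
    """1-based position of the highest-ranked spoiler sentence in a ranking."""
    for position, sentence_id in enumerate(ranking, start=1):
        if sentence_id in spoiler_ids:
            return position
    raise ValueError("the ranking contains no spoiler sentence")


def precision_at_n(rankings: list[list[int]], spoilers: list[set[int]], n: int) -> float:
    """Percentage of posts with at least one spoiler sentence in the top n."""
    hits = sum(first_spoiler_rank(r, s) <= n for r, s in zip(rankings, spoilers))
    return 100.0 * hits / len(rankings)


def average_rank(rankings: list[list[int]], spoilers: list[set[int]]) -> float:
    """Mean 1-based position of the first spoiler sentence across posts."""
    ranks = [first_spoiler_rank(r, s) for r, s in zip(rankings, spoilers)]
    return sum(ranks) / len(ranks)


# Hypothetical toy data: one ranking of sentence ids per post, plus the ids
# of the sentences that belong to that post's spoiler.
rankings = [[2, 0, 1, 3], [0, 1, 2], [1, 3, 0, 2]]
spoilers = [{2}, {2}, {0, 2}]

for n in (1, 2, 3):
    print(f"Precision@{n}: {precision_at_n(rankings, spoilers, n):.2f}")
print(f"Average Rank: {average_rank(rankings, spoilers):.2f}")
```

For the toy data the first spoiler sentences sit at positions 1, 3 and 3, which gives Precision@1 = Precision@2 = 33.33, Precision@3 = 100.00 and an Average Rank of 2.33.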
Possible approaches to continue this work
Jurafsky et al. [2018]
Questions?
References
… challenge 2017: Towards a regression model for clickbait strength. CoRR, abs/1812.10847, 2018. URL http://arxiv.org/abs/1812.10847.
… Clickbait-Spoiling. Bachelor thesis, Bauhaus-Universität Weimar, Faculty Media, Media Informatics, December 2017. URL https://webis.de/downloads/theses/papers/terakopyan_2017.pdf.
… September 2018. URL https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf.