Authorship Attribution of Micro-Messages
Roy Schwartz+, Oren Tsur+, Ari Rappoport+ and Moshe Koppel*
+The Hebrew University, *Bar Ilan University
In proceedings of EMNLP 2013
Overview
Authorship attribution of tweets
Users tend to …
Authorship Attribution of Micro-Messages @ Schwartz et al., EMNLP 2013
– Flexible patterns
– Significant improvement over our baselines
question”
thou Romeo”
word, is what people fear most”
we shall shelter Him underground.”
is the one most listened to, and who teaches the best.”
continents, but new men.”
– Tweets are shorter (14.2 words vs. 20.9)
– Tweets have smaller sentence length variance (6.4 vs. 21.4)
– SVM with linear kernel; character n-gram, word n-gram, and flexible pattern features
– Varying training set sizes, varying number of authors, recall-precision tradeoff
– 6.1% improvement over current state-of-the-art
– A feature that is unique to a specific author A
– Appears in at least k% of A’s training set, while not appearing in the training set of any other user
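The k-signature definition above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `k_signatures`, the representation of each tweet as a set of features, and the toy data are all assumptions made for the example.

```python
from collections import defaultdict

def k_signatures(training_sets, k):
    """Return {author: set of features} that qualify as k-signatures.

    training_sets maps each author to a list of tweets; every tweet is
    assumed to be already represented as a set of features
    (for example, character n-grams).
    """
    # For every feature, record which authors use it at least once.
    authors_using = defaultdict(set)
    for author, tweets in training_sets.items():
        for features in tweets:
            for f in features:
                authors_using[f].add(author)

    signatures = {}
    for author, tweets in training_sets.items():
        # Count how many of this author's tweets contain each feature.
        counts = defaultdict(int)
        for features in tweets:
            for f in set(features):
                counts[f] += 1
        threshold = k / 100.0 * len(tweets)
        # Keep features that are frequent enough for this author
        # and absent from every other author's training set.
        signatures[author] = {
            f for f, c in counts.items()
            if c >= threshold and authors_using[f] == {author}
        }
    return signatures

# Hypothetical toy data: two authors, tweets given as feature sets.
toy = {
    "alice": [{"^_^", "lol"}, {"^_^", "ok"}, {"ok"}, {"^_^"}],
    "bob": [{"ok", "..."}, {"lol"}],
}
sigs = k_signatures(toy, k=50)
```

In the toy data, "ok" appears in half of alice's tweets but is also used by bob, so it is not a signature for either author.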
– Character n-grams, word n-grams
– Multiclass SVM with a linear kernel
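As an illustration of the two baseline feature types, the sketch below extracts character n-gram and word n-gram counts from a single tweet using only the standard library. The function names, the `c:`/`w:` prefixes, and the n-gram lengths are assumptions chosen for the example; the slides do not state the lengths used. The resulting counts would then be turned into vectors and passed to a linear-kernel multiclass SVM, which is not shown here.

```python
from collections import Counter

def char_ngrams(text, n):
    """All contiguous character sequences of length n, spaces included."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def word_ngrams(text, n):
    """All contiguous sequences of n whitespace-separated tokens."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def extract_features(tweet, char_n=4, word_ns=(2, 3)):
    """Count character and word n-grams of one tweet.

    char_n and word_ns are illustrative defaults, not values from the slides.
    """
    feats = Counter()
    for g in char_ngrams(tweet, char_n):
        feats["c:" + g] += 1  # prefix keeps the two feature spaces apart
    for n in word_ns:
        for g in word_ngrams(tweet, n):
            feats["w:" + g] += 1
    return feats
```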
– 10 groups of 50 authors each, 50-1000 training tweets per author
– 50-1000 authors, 200 training tweets per author
– “don’t know” option
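One way a "don't know" option can trade recall for precision is to let the classifier abstain when its top two candidate authors score too closely. The sketch below assumes this margin-based rule, along with hypothetical per-author scores and hypothetical function names; the slides do not say how the abstention decision is made.

```python
def predict_or_abstain(scores, min_margin):
    """Return the top-scoring author, or None for "don't know".

    scores maps each candidate author to a classifier score. The rule
    (abstain when the gap between the two best scores is below
    min_margin) is an assumption made for illustration.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] < min_margin:
        return None
    return ranked[0][0]

def precision_recall(predictions, gold):
    """Precision over answered tweets, recall over all tweets."""
    answered = [(p, g) for p, g in zip(predictions, gold) if p is not None]
    correct = sum(1 for p, g in answered if p == g)
    precision = correct / len(answered) if answered else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall
```

Raising `min_margin` makes the classifier answer less often. Under this rule, precision would usually rise and recall fall.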
– Capture potentially unseen word n-grams
– Language and domain independent
– Go to the house of the rising sun
– Can you hear the sound of the wind?
– John is as clever as Mary .
– Dogs run as fast as 30mph .
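The example sentences above share a skeleton of frequent words with open slots between them ("the _ of the _", "as _ as"). The sketch below shows one simple way to surface such skeletons. It assumes a hand-picked set of high-frequency words and a wildcard token `CW` for every other word; the function names, the length limits, and the word list are all hypothetical, and this is not a description of how the authors extract their patterns.

```python
def abstract_tokens(text, high_frequency_words):
    """Replace every token outside the high-frequency set with 'CW'."""
    return [t if t in high_frequency_words else "CW"
            for t in text.lower().split()]

def flexible_patterns(text, high_frequency_words, min_len=2, max_len=5):
    """Collect token windows that mix fixed frequent words with CW slots."""
    tokens = abstract_tokens(text, high_frequency_words)
    patterns = set()
    for n in range(min_len, max_len + 1):
        for i in range(len(tokens) - n + 1):
            window = tokens[i:i + n]
            # Keep windows with at least one fixed word and one slot.
            if "CW" in window and any(t != "CW" for t in window):
                patterns.add(" ".join(window))
    return patterns

# Hypothetical high-frequency word list for the slide examples.
hfw = {"the", "of", "as"}
sun = flexible_patterns("Go to the house of the rising sun", hfw)
wind = flexible_patterns("Can you hear the sound of the wind?", hfw)
mary = flexible_patterns("John is as clever as Mary .", hfw)
dogs = flexible_patterns("Dogs run as fast as 30mph .", hfw)
```

With this word list, `sun` and `wind` both contain the pattern `the CW of the CW`, and `mary` and `dogs` both contain `as CW as`, even though the sentences in each pair share no word n-gram longer than the frequent words themselves.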
– Extraction of semantic relationships (Davidov, Rappoport and Koppel, ACL 2007)
– Enhancing lexical concepts (Davidov and Rappoport, EMNLP 2009)
– Detection of sarcasm (Tsur, Davidov and Rappoport, ICWSM 2010)
– Sentiment analysis (Davidov, Tsur and Rappoport, COLING 2010)
– …
– “the way I treated her”
– “half of the things I’ve seen”
– “the friends I have had for years”
– “in the neighborhood I grew up in”
– 2.9% improvement over character n-grams
– 1.5% improvement over character n-grams + word n-grams
– Using the same dataset
– 6.1% improvement over current state-of-the-art
– A partial explanation for our high-quality results
– Statistically significant improvement