Towards Modelling Language Innovation Acceptance in Online Social Networks
Date : 2016/05/02 Author : Daniel Kershaw, Matthew Rowe and Patrick Stacey Source : ACM WSDM’16 Advisor : Jia-ling Koh Speaker : Yi-hui Lee
In this work we demonstrate how such innovations in language can be identified across two different Online Social Networks (OSNs) through the operationalisation of known language acceptance models that incorporate relatively simple statistical tests.
bae : "babe" or "Before Anyone Else", e.g. "your babe" becomes "ur bae"
2014 : Pharrell, “Come Get It Bae”
Outline : Input → Pre-Processing → Data Grouping → Operationalisation (1. Frequency, 2. Form, 3. Meaning, 4. Classification) → Output
TwitterNLP’s POS tagger : http://www.cs.cmu.edu/~ark/TweetNLP/
Using a regex, long repetitions of the same letter were truncated down to just three characters, e.g. soooooooo would be normalised to soo.
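The truncation step can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function name `truncate_repeats` and the parameter `max_run` are hypothetical. The slide's wording ("three characters") and its example ("soo") suggest different caps on the repeated letter, so the cap is left as a parameter rather than guessed.

```python
import re

def truncate_repeats(token: str, max_run: int = 3) -> str:
    """Shorten any run of one letter longer than max_run down to max_run letters.

    max_run is a hypothetical parameter: max_run=3 follows the slide's wording,
    max_run=2 reproduces the slide's example (soooooooo -> soo).
    """
    # ([A-Za-z]) captures one letter; \1{max_run,} requires at least max_run
    # further copies, so only runs longer than max_run are matched.
    pattern = re.compile(r"([A-Za-z])\1{" + str(max_run) + r",}")
    return pattern.sub(lambda m: m.group(1) * max_run, token)

three = truncate_repeats("soooooooo")             # cap of three repeated letters
two = truncate_repeats("soooooooo", max_run=2)    # cap of two repeated letters
```

Ordinary double letters (as in "good") are left untouched under either setting, because only runs longer than the cap are rewritten.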
Time : To group the data by time, a function weekofyear(e) returns the week in which the Tweet or Reddit post e was created.
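A possible Python version of this grouping is sketched below. It assumes each post carries a Unix creation timestamp and an already tokenised text; the `(created_utc, tokens)` pair layout and the helper `group_by_week` are assumptions for illustration, and only the name weekofyear comes from the slide. ISO week numbering is also an assumption, and the sketch ignores the year, so it only suits data that spans a single year.

```python
from collections import defaultdict
from datetime import datetime, timezone

def weekofyear(created_utc: float) -> int:
    """Return the ISO week number (1-53) of a Unix timestamp, read as UTC."""
    return datetime.fromtimestamp(created_utc, tz=timezone.utc).isocalendar()[1]

def group_by_week(posts):
    """Merge the tokens of all posts created in the same week.

    posts: iterable of (created_utc, tokens) pairs (assumed layout).
    Returns {week number: [token, ...]}.
    """
    grouped = defaultdict(list)
    for created_utc, tokens in posts:
        grouped[weekofyear(created_utc)].extend(tokens)
    return dict(grouped)

# Two illustrative posts, one week apart (4 and 11 January 2016).
posts = [
    (datetime(2016, 1, 4, 12, tzinfo=timezone.utc).timestamp(), ["I", "am", "a", "girl"]),
    (datetime(2016, 1, 11, 12, tzinfo=timezone.utc).timestamp(), ["I", "watch", "the", "movie"]),
]
weekly = group_by_week(posts)
```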
Before grouping:
Word     Time (weeks)
I        1
am       1
a        1
girl     1
I        2
watch    2
the      2
movie    2

After grouping:
Word                     Time (weeks)
I, am, a, girl           1
I, watch, the, movie     2
Community :
subreddit), regional (collection of subreddits) and global (all subreddits).
Tweets could be clustered using the longitude and latitude associated with each tweet. Twitter API (coordinates): https://dev.twitter.com/overview/terms/geo-developer-guidelines
Community :
subreddit (the lowest community in Reddit), potentially containing a greater convergence on topic and language used
are globally understood across all sub-communities.
Before grouping:
Word     Community (Twitter/Reddit)
I        Twitter
am       Twitter
a        Twitter
boy      Twitter
I        Reddit
watch    Reddit
the      Reddit
show     Reddit

After grouping:
Word                    Community (Twitter/Reddit)
I, am, a, boy           Twitter
I, watch, the, show     Reddit
Frequency :
Word                                          Time (weeks)
I, am, a, girl, I, am, a, boy                 1
I, watch, the, movie, I, watch, the, show     2
……                                            ……
When, bae, eat…                               n

Word     Time (weeks)   T(w, t)
I        1              2/8
am       1              2/8
a        1              2/8
girl     1              1/8
boy      1              1/8
I        2              2/8
watch    2              2/8
the      2              2/8
movie    2              1/8
show     2              1/8
……       ……             ……
When     n              ……
bae      n              ……
eat…     n              ……
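The worked values in this example are consistent with T(w, t) being the number of occurrences of w in week t divided by the total number of tokens in week t. The slides do not state the formula explicitly, so the sketch below treats that reading as an assumption; the function name `relative_frequency` is hypothetical.

```python
from collections import Counter
from fractions import Fraction

def relative_frequency(tokens_by_week):
    """Return {(word, week): share of that week's tokens that are `word`}.

    Assumed reading of T(w, t): count(w in week t) / total tokens in week t.
    """
    table = {}
    for week, tokens in tokens_by_week.items():
        total = len(tokens)
        for word, count in Counter(tokens).items():
            table[(word, week)] = Fraction(count, total)
    return table

# The two example weeks from the slide.
tokens_by_week = {
    1: ["I", "am", "a", "girl", "I", "am", "a", "boy"],
    2: ["I", "watch", "the", "movie", "I", "watch", "the", "show"],
}
T = relative_frequency(tokens_by_week)
```

`Fraction` keeps the values exact, so `T[("I", 1)]` compares equal to 2/8 (stored in lowest terms as 1/4) and `T[("girl", 1)]` to 1/8.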
Form :
Word                                                   Time (weeks)
I, am, watching, I, am, listening, homosexual          1
I, am, homosexual, they, are, homogeneous, joking      2
……                                                     ……
When, bae, eating, homogeneous …                       n

Prefix (P):
Word     Time (weeks)   MP(w, t, P)
homo     1              1/7
homo     2              2/7
……       ……             ……
homo     n              ……

Suffix (S):
Word     Time (weeks)   MS(w, t, S)
ing      1              2/7
ing      2              1/7
……       ……             ……
ing      n              ……
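The example values are consistent with MP and MS being the share of a week's tokens that start with the prefix or end with the suffix. The sketch below assumes that reading; the slides do not give the formula, and the function names `prefix_share` and `suffix_share` are hypothetical.

```python
from fractions import Fraction

def prefix_share(tokens, prefix):
    """Share of tokens starting with `prefix` (assumed reading of MP)."""
    if not tokens:
        return Fraction(0)
    hits = sum(1 for tok in tokens if tok.lower().startswith(prefix))
    return Fraction(hits, len(tokens))

def suffix_share(tokens, suffix):
    """Share of tokens ending with `suffix` (assumed reading of MS)."""
    if not tokens:
        return Fraction(0)
    hits = sum(1 for tok in tokens if tok.lower().endswith(suffix))
    return Fraction(hits, len(tokens))

# The two example weeks from the slide.
week1 = ["I", "am", "watching", "I", "am", "listening", "homosexual"]
week2 = ["I", "am", "homosexual", "they", "are", "homogeneous", "joking"]

mp = {1: prefix_share(week1, "homo"), 2: prefix_share(week2, "homo")}
ms = {1: suffix_share(week1, "ing"), 2: suffix_share(week2, "ing")}
```

Plain string matching of this kind will also count words that merely happen to begin or end with the same letters, so a real pipeline would probably need a morphological check on top.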
Meaning :
http://city.shaform.com/blog/2014/11/04/word2vec.html
word2vec is applied to each community (c).
Meaning :
Meaning : similarity between communities while still showing
too diverse for general usage (i.e. too colloquial), while a word with a value near 1 would potentially indicate that it is too specific.
Classification : Increase/Decrease
bae : Increase
t      1      2      ……     n
Tw     9      18     ……     9*n

TGIF : Decrease
t      1      2      ……     n
Tw     1000   900    ……     5
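One simple way to label a series as Increase or Decrease is a rank correlation between the week index t and the weekly value Tw. The sketch below uses Spearman's rho for this. That choice, the 0.5 threshold and the function names are assumptions for illustration; the slides say only that relatively simple statistical tests are used. The middle values of the TGIF series are invented to fill the elided part of the example.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0  # a constant series has no trend
    return cov / (sx * sy)

def classify_trend(series, threshold=0.5):
    """Label a weekly series; `threshold` is a hypothetical cut-off."""
    weeks = list(range(1, len(series) + 1))
    rho = spearman_rho(weeks, series)
    if rho >= threshold:
        return "Increase"
    if rho <= -threshold:
        return "Decrease"
    return "No clear trend"

bae = [9 * t for t in range(1, 6)]   # 9, 18, ..., 9*n with n = 5
tgif = [1000, 900, 500, 100, 5]      # 500 and 100 are illustrative fill-ins
labels = (classify_trend(bae), classify_trend(tgif))
```

A rank-based test only looks at the ordering of the weekly values, so it picks up a steady rise or fall without assuming the change is linear.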
Limitations : The three methods proposed do not, however, cover all the categories proposed in the VFRGT and FUDGE frameworks.
classified as an innovation did not appear across all the communities, but when they did, they appeared at a low rank and thus the learned embedding, from the word2vec function, generated sparse words within the context of the innovation.
tests one is able to use known linguistic models to assess language and its change in on-line social networks
can show variation in innovations usage and persistence
make up the networks, where we have shown how varying community structure has potentially different language dynamics.
look into identifying the dynamics of language innovations within the context of users, along with the influence communities have