Towards Modelling Language Innovation Acceptance in Online Social Networks



slide-1
SLIDE 1

Towards Modelling Language Innovation Acceptance in Online Social Networks

Date : 2016/05/02 Author : Daniel Kershaw, Matthew Rowe and Patrick Stacey Source : ACM WSDM’16 Advisor : Jia-ling Koh Speaker : Yi-hui Lee


slide-2
SLIDE 2

Outline

  • Introduction
  • Approach
  • Experiment
  • Conclusion


slide-3
SLIDE 3

Introduction

  • Goal :

In this work we demonstrate how such innovations in language can be identified across two different Online Social Networks (OSNs) through the operationalisation of known language acceptance models that incorporate relatively simple statistical tests.

Example : “your babe” → “ur bae”; bae is also glossed as “Before Anyone Else” (cf. Pharrell, “Come Get It Bae”, 2014).

slide-4
SLIDE 4

Introduction(cont.)

  • Reddit : https://www.reddit.com
  • Twitter : https://twitter.com


slide-5
SLIDE 5
Introduction(cont.)

  • Framework :

Input → Pre-Processing → Data Grouping → Operationalisation (1. Frequency, 2. Form, 3. Meaning, 4. Classification) → Output

slide-6
SLIDE 6

Outline

  • Introduction
  • Approach
  • Experiment
  • Conclusion


slide-7
SLIDE 7

Approach

  • Pre-Processing :

TweetNLP’s POS tagger : http://www.cs.cmu.edu/~ark/TweetNLP/

  • remove : hashtags(#), mentions(@), HTTP links
  • truncate : using a regex, long repetitions of the same letter were truncated to just three characters, e.g. soooooooo would be normalised to soo.
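
The cleaning steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name normalise, the max_run parameter and the choice to drop whole hashtag and mention tokens are assumptions. The slide's example (soooooooo → soo) keeps a run of two letters, so max_run defaults to 2; passing max_run=3 keeps three repeated letters instead.

```python
import re

# Assumed patterns: links, then whole #hashtag / @mention tokens.
URL_RE = re.compile(r"https?://\S+")
TAG_RE = re.compile(r"[#@]\w+")


def normalise(text, max_run=2):
    """Strip links, hashtags and mentions, then shorten repeated letters.

    max_run is a hypothetical parameter: the longest run of one letter
    that is kept. The default reproduces the slide's 'soo' example.
    """
    text = URL_RE.sub(" ", text)
    text = TAG_RE.sub(" ", text)
    # [^\W\d_] matches a single letter (no digits, no underscore);
    # \1{max_run,} then requires at least max_run further copies of it.
    long_run = re.compile(r"([^\W\d_])\1{%d,}" % max_run)
    text = long_run.sub(lambda m: m.group(1) * max_run, text)
    # Collapse the whitespace left behind by the removals.
    return " ".join(text.split())


print(normalise("soooooooo cool #fun @bob http://t.co/x"))
```

Digits are deliberately excluded from the truncation, so a number such as 1000 is left untouched.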


slide-8
SLIDE 8

Approach(cont.)

  • Data Grouping :

Time : To group the data by time, a function weekofyear(e) returns the week in which the Tweet or Reddit post e was created.

Before grouping :

Word     Time (weeks)
I        1
am       1
a        1
girl     1
I        2
watch    2
the      2
movie    2

After grouping :

Word                     Time (weeks)
I, am, a, girl           1
I, watch, the, movie     2
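
A small sketch of the time grouping, assuming each post is available as a (creation datetime, token list) pair. The name weekofyear comes from the slide; mapping it to the ISO week number and the helper group_by_week are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime


def weekofyear(created_at):
    """Return the ISO week number of a post's creation time (assumed definition)."""
    return created_at.isocalendar()[1]


def group_by_week(posts):
    """Merge the tokens of all posts created in the same week."""
    groups = defaultdict(list)
    for created_at, tokens in posts:
        groups[weekofyear(created_at)].extend(tokens)
    return dict(groups)


posts = [
    (datetime(2016, 1, 5), ["I", "am", "a", "girl"]),
    (datetime(2016, 1, 12), ["I", "watch", "the", "movie"]),
]
print(group_by_week(posts))
```

A dataset spanning more than one year would also need the year in the key, since week numbers repeat.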

slide-9
SLIDE 9

Approach(cont.)

  • Data Grouping :

Community :

  • 1. Reddit : Louvain community detection algorithm
  • The dataset is broken down into three community levels : local (the subreddit), regional (a collection of subreddits) and global (all subreddits).
  • 2. Twitter : the data is geographically bound to the UK, which meant that Tweets could be clustered using the longitude and latitude associated with each tweet.
  • Twitter API (coordinates) : https://dev.twitter.com/overview/terms/geo-developer-guidelines

slide-10
SLIDE 10

Approach(cont.)

  • Data Grouping :

Community :

  • a low-level community defined by a postcode (e.g. LA1) could be compared to a subreddit (the lowest community in Reddit), potentially containing a greater convergence on topic and language used
  • a higher-level community could be classed as showing the ‘general’ patterns that are globally understood across all sub-communities.

Before grouping :

Word     Community (Twitter/Reddit)
I        Twitter
am       Twitter
a        Twitter
boy      Twitter
I        Reddit
watch    Reddit
the      Reddit
show     Reddit

After grouping :

Word                    Community (Twitter/Reddit)
I, am, a, boy           Twitter
I, watch, the, show     Reddit
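
The three community levels can be illustrated with a short sketch. It assumes the regional assignment (for example, the output of a Louvain run over subreddits) is already available as a plain dictionary; the function group_by_community, the region_of mapping and the sample subreddit names are hypothetical and only show the shape of the grouping.

```python
from collections import defaultdict


def group_by_community(posts, region_of):
    """Collect tokens at local, regional and global level.

    posts: iterable of (subreddit, tokens) pairs.
    region_of: hypothetical mapping subreddit -> region label,
               e.g. produced beforehand by community detection.
    """
    levels = {
        "local": defaultdict(list),
        "regional": defaultdict(list),
        "global": defaultdict(list),
    }
    for subreddit, tokens in posts:
        levels["local"][subreddit].extend(tokens)
        levels["regional"][region_of[subreddit]].extend(tokens)
        levels["global"]["all"].extend(tokens)
    return {name: dict(groups) for name, groups in levels.items()}


sample_posts = [
    ("movies", ["I", "watch", "the", "show"]),
    ("television", ["bae", "watch", "it"]),
    ("cooking", ["I", "eat"]),
]
sample_regions = {"movies": "media", "television": "media", "cooking": "food"}
grouped_levels = group_by_community(sample_posts, sample_regions)
print(grouped_levels["regional"]["media"])
```

The same structure would apply to Twitter if the local key were a postcode area and the regional key a larger geographic unit.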

slide-11
SLIDE 11

Approach(cont.)

  • Operationalisation :

Frequency :

Word                                      Time (weeks)
I, am, a, girl, I, am, a, boy             1
I, watch, the, movie, I, watch, the, show 2
……                                        ……
When, bae, eat…                           n

Word     Time (weeks)    T(w, t)
I        1               2/8
am       1               2/8
a        1               2/8
girl     1               1/8
boy      1               1/8
I        2               2/8
watch    2               2/8
the      2               2/8
movie    2               1/8
show     2               1/8
……       ……              ……
When     n               ……
bae      n               ……
eat…     n               ……
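
The table suggests that T(w, t) is the count of word w in week t divided by the number of tokens in that week. A minimal sketch under that reading follows; the function name term_frequency is an assumption, and exact fractions are used so the values can be compared with the table directly.

```python
from collections import Counter
from fractions import Fraction


def term_frequency(tokens_by_week):
    """Return {week: {word: share of that week's tokens}}."""
    result = {}
    for week, tokens in tokens_by_week.items():
        counts = Counter(tokens)
        total = len(tokens)
        # An empty week yields an empty dict, so no division by zero occurs.
        result[week] = {word: Fraction(n, total) for word, n in counts.items()}
    return result


weeks = {
    1: ["I", "am", "a", "girl", "I", "am", "a", "boy"],
    2: ["I", "watch", "the", "movie", "I", "watch", "the", "show"],
}
tf = term_frequency(weeks)
print(tf[1]["I"], tf[1]["girl"])
```

Fraction reduces 2/8 to 1/4 when printed; the value is the same as in the table.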

slide-12
SLIDE 12

Approach(cont.)

  • Operationalisation :

Form :

Word                                                  Time (weeks)
I, am, watching, I, am, listening, homosexual         1
I, am, homosexual, they, are, homogeneous, joking     2
……                                                    ……
When, bae, eating, homogeneous …                      n

Prefix (P)    Time (weeks)    MP(w, t, P)
homo          1               1/7
homo          2               2/7
……            ……              ……
homo          n               ……

Suffix (S)    Time (weeks)    MS(w, t, S)
ing           1               2/7
ing           2               1/7
……            ……              ……
ing           n               ……
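
Reading the tables as "share of the week's tokens that carry a given prefix or suffix", the two measures can be sketched as below. The function names prefix_share and suffix_share and the case-insensitive matching are assumptions; plain string matching stands in for whatever morphological analysis the original work used.

```python
from fractions import Fraction


def prefix_share(tokens, prefix):
    """Share of tokens starting with the prefix (assumed reading of MP)."""
    if not tokens:
        return Fraction(0)
    hits = sum(1 for tok in tokens if tok.lower().startswith(prefix))
    return Fraction(hits, len(tokens))


def suffix_share(tokens, suffix):
    """Share of tokens ending with the suffix (assumed reading of MS)."""
    if not tokens:
        return Fraction(0)
    hits = sum(1 for tok in tokens if tok.lower().endswith(suffix))
    return Fraction(hits, len(tokens))


week1 = ["I", "am", "watching", "I", "am", "listening", "homosexual"]
week2 = ["I", "am", "homosexual", "they", "are", "homogeneous", "joking"]
print(prefix_share(week1, "homo"), prefix_share(week2, "homo"))
print(suffix_share(week1, "ing"), suffix_share(week2, "ing"))
```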

slide-13
SLIDE 13

Approach(cont.)

  • Operationalisation :

Meaning :

  • Word2vec : http://city.shaform.com/blog/2014/11/04/word2vec.html
  • W2V_c^t : word2vec applied to each community (c)


slide-14
SLIDE 14

Approach(cont.)

  • Operationalisation :

Meaning :


slide-15
SLIDE 15

Approach(cont.)

  • Operationalisation :

Meaning : similarity between communities while still showing variation.

  • If the value is near 0 then it could mean that the word is too diverse for general usage (i.e. too colloquial), while a word with a value near 1 would potentially indicate that it is too specific.
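
The formula behind this value is not reproduced in the slide text, so the following is only a stand-in that yields a score between 0 and 1 with a similar reading. It assumes that each community's word2vec model has already produced a list of nearest neighbours for the word, and it scores agreement as the mean pairwise Jaccard overlap of those lists. The function neighbour_overlap, the use of Jaccard overlap and the sample neighbour lists are all assumptions, not the measure used in the paper.

```python
from itertools import combinations


def neighbour_overlap(neighbours_by_community):
    """Mean pairwise Jaccard overlap of a word's neighbour sets.

    neighbours_by_community: {community: iterable of neighbour words},
    assumed to come from one embedding model per community.
    Returns 0.0 when fewer than two communities are given.
    """
    sets = [set(words) for words in neighbours_by_community.values()]
    pairs = list(combinations(sets, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        union = a | b
        total += len(a & b) / len(union) if union else 0.0
    return total / len(pairs)


same = {"c1": ["babe", "love"], "c2": ["love", "babe"]}
disjoint = {"c1": ["babe"], "c2": ["bread"]}
partial = {"c1": ["babe", "love"], "c2": ["love", "girl"]}
print(neighbour_overlap(same), neighbour_overlap(disjoint), neighbour_overlap(partial))
```

Under this stand-in, a score near 0 means the communities place the word among entirely different neighbours, and a score near 1 means they agree almost completely.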


slide-16
SLIDE 16

Approach(cont.)

  • Operationalisation :

Classification : Increase/Decrease

  • ρ : Spearman’s Rank

bae (Increase) :

t     1    2     …    n
Tw    9    18    …    9*n

TGIF (Decrease) :

t     1       2      …    n
Tw    1000    900    …    5
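
A dependency-free sketch of the increase/decrease decision: Spearman's rank correlation is computed between the week index and the weekly frequency, and the sign of the coefficient gives the label. Using only the sign is a simplifying assumption, since the slides do not state a threshold or significance test; the function names and the sample series are illustrative.

```python
def ranks(values):
    """Rank values from 1, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series has no defined trend
    return cov / (vx * vy) ** 0.5


def classify_trend(frequencies):
    """Label a weekly frequency series by the sign of rho (assumed rule)."""
    weeks = list(range(1, len(frequencies) + 1))
    rho = spearman(weeks, frequencies)
    if rho > 0:
        return "increase"
    if rho < 0:
        return "decrease"
    return "flat"


print(classify_trend([9, 18, 27, 36]))
print(classify_trend([1000, 900, 400, 5]))
```

In practice one would likely also require the correlation to be statistically significant before labelling a word.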

slide-17
SLIDE 17

Approach(cont.)

  • Operationalisation :

Limitations : The three methods proposed do not, however, cover all the categories proposed in the VFRGT and FUDGE frameworks.


slide-18
SLIDE 18

Approach(cont.)

  • Framework :

Input → Pre-Processing → Data Grouping → Operationalisation (1. Frequency, 2. Form, 3. Meaning, 4. Classification) → Output

slide-19
SLIDE 19

Outline

  • Introduction
  • Approach
  • Experiment
  • Conclusion


slide-20
SLIDE 20

Experiment

  • Frequency :


slide-21
SLIDE 21

Experiment(cont.)

  • Form :

slide-22
SLIDE 22

Experiment(cont.)

  • Meaning :

Words classified as innovations did not appear across all the communities, but when they did they appeared at a low rank, and thus the embedding learned by the word2vec function generated sparse words within the context of the innovation.

slide-23
SLIDE 23

Outline

  • Introduction
  • Approach
  • Experiment
  • Conclusion


slide-24
SLIDE 24

Conclusion

  • demonstrated that, through the use of relatively simple statistical tests, one is able to use known linguistic models to assess language and its change in on-line social networks
  • when the methods are applied to two on-line social networks, they can show variation in innovation usage and persistence
  • these methods can be applied to the individual communities that make up the networks, where we have shown how varying community structure has potentially different language dynamics.


slide-25
SLIDE 25

Conclusion(cont.)

  • Future work :

look into identifying the dynamics of language innovations within the context of users, along with the influence communities have over language and innovation diffusion.
