On the Dynamics of Topic-Based Communities in Online Knowledge-Sharing Networks

SLIDE 1

On the Dynamics of Topic-Based Communities in Online Knowledge-Sharing Networks

Anna Guimarães, Ana Paula Couto da Silva, Jussara Almeida
Department of Computer Science - UFMG (Brazil)

September 21, 2015

SLIDE 3

Introduction

  • Online Knowledge-Sharing Networks
      – Wikis, Q&A sites, discussion forums
      – User-created and maintained discussions
      – Wealth of knowledge
  • Prior research focuses on knowledge extraction by:
      – Detecting quality content [Agichtein et al., 2008]
      – Ranking questions and answers [Dalip et al., 2013]
      – Identifying expert users [Ravi et al., 2014, Wang et al., 2013]

SLIDE 4

Introduction

  • More than repositories for knowledge!
      – Community structure surrounding discussions
      – Topics and communities subject to temporal changes
      – Multiple topics, multiple communities
  • This study:
      – Community approach to knowledge-sharing networks
      – Characterization and modeling of community evolution

SLIDE 6

Case Study: Stack Overflow

Tags

SLIDE 7

Topic-Based Communities in Stack Overflow

  • Communities centered around topics
      – Topics are explicitly defined
      – Independent of the social interaction graph
  • Non-exclusive membership: a user can belong to multiple communities

SLIDE 8

Stack Overflow Dataset

  • User activity
      – User ID, Tag ID, Timestamp
  • Data covering a six-year period
      – 2008–2014

Tags    Posts           Users
400     19.8 million    1.7 million
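
As an illustration of how topic-based communities could be derived from records of this shape, here is a minimal Python sketch. The sample records, the tag names, and the function names `build_communities` and `monthly_posts` are hypothetical and not taken from the study. It assumes a user belongs to a community after posting at least once under its tag.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical activity records: (user_id, tag, timestamp).
# The values are made up for illustration; they are not from the dataset.
records = [
    (1, "python", datetime(2013, 1, 5)),
    (1, "python", datetime(2013, 2, 9)),
    (1, "html", datetime(2013, 2, 11)),
    (2, "python", datetime(2013, 1, 20)),
    (3, "html", datetime(2013, 3, 2)),
]

def build_communities(records):
    """Map each tag to the set of users who posted under it at least once."""
    members = defaultdict(set)
    for user_id, tag, _timestamp in records:
        members[tag].add(user_id)
    return dict(members)

def monthly_posts(records):
    """Count posts per (tag, year, month): the activity series of a community."""
    counts = defaultdict(int)
    for _user_id, tag, ts in records:
        counts[(tag, ts.year, ts.month)] += 1
    return dict(counts)

communities = build_communities(records)
activity = monthly_posts(records)
```

Membership is non-exclusive here: user 1 ends up in both the `python` and `html` communities.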

SLIDE 9

Topic-Based Communities in Stack Overflow

  • Temporal analyses of community activity in terms of:
      – How user behavior affects community sustainability
      – How users relate to communities in the long run
      – How users divide their attention across different communities
      – How communities affect one another

SLIDE 10

Communities in Stack Overflow: Findings

  • Significant revisiting behavior
      – Users continue to contribute to the same community
      – Revisitors account for a growing share of a community's activity over time

Mean Fraction of Revisits

              1st month    6th month    12th month
Revisitors    0.20         0.44         0.50
Revisits      0.27         0.46         0.50
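
The two rows of the table can be computed along the following lines. This is a sketch under an assumed definition, since the slides do not spell one out. A revisitor is taken to be a user active in a month who had already posted in the community in an earlier month, and a revisit is a post made by such a user. The sample data and the name `revisit_fractions` are hypothetical.

```python
from collections import defaultdict

# Hypothetical posts in a single community: (user_id, month_index).
posts = [(1, 1), (2, 1), (1, 2), (3, 2), (1, 3), (2, 3), (2, 3), (4, 3)]

def revisit_fractions(posts):
    """Return {month: (fraction of revisitors, fraction of revisits)}."""
    by_month = defaultdict(list)
    for user_id, month in posts:
        by_month[month].append(user_id)

    seen = set()      # users who posted in any earlier month
    result = {}
    for month in sorted(by_month):
        users = by_month[month]
        active = set(users)
        returning = active & seen
        revisits = sum(1 for u in users if u in seen)
        result[month] = (len(returning) / len(active), revisits / len(users))
        seen |= active
    return result

fractions = revisit_fractions(posts)
```

Under this definition, repeated posts by a first-time user within their first month do not count as revisits. A different convention would change the numbers.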

SLIDE 11

Communities in Stack Overflow: Findings

  • Participation in multiple communities
      – 32% of users participate in up to 3 communities
      – Average user participates in 17 communities
      – Decaying pattern of activity over time

[Figure: two plots with Months (2 to 12) on the x-axis. Y-axes: Communities (annotated values 18 and 13) and Posts (annotated values 42 and 28).]
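
Statistics such as "share of users in up to 3 communities" and "communities per average user" can be derived from (user, tag) pairs as sketched below. The sample pairs and the function names are hypothetical; the sketch shows only the counting, not the study's exact procedure.

```python
from collections import defaultdict

# Hypothetical (user_id, tag) pairs, one per post.
pairs = [
    (1, "java"), (1, "c#"), (1, "php"), (1, "html"), (1, "java"),
    (2, "java"),
    (3, "css"), (3, "html"),
]

def communities_per_user(pairs):
    """Number of distinct communities (tags) each user has posted in."""
    tags = defaultdict(set)
    for user_id, tag in pairs:
        tags[user_id].add(tag)
    return {user_id: len(t) for user_id, t in tags.items()}

def share_up_to(counts, k):
    """Fraction of users who participate in at most k communities."""
    return sum(1 for c in counts.values() if c <= k) / len(counts)

counts = communities_per_user(pairs)
mean_communities = sum(counts.values()) / len(counts)
share_le_3 = share_up_to(counts, 3)
```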

SLIDE 12

Communities in Stack Overflow: Findings

  • Migrating behavior
      – Users traverse different communities over time
      – Shared member base across communities

[Figure: Ruby on Rails 3 → Ruby on Rails 4. Number of members per month, Feb 2013 to Aug 2014 (y-axis 100 to 900). Series: Rails 3 Members, New Members.]

SLIDE 13

Communities in Stack Overflow: Findings

  • Migrating behavior
      – Users traverse different communities over time
      – Shared member base across communities

[Figure: MySQL → PHP. Number of members per month, Feb 2013 to Aug 2014 (y-axis 1000 to 6000). Series: MySQL, New Members.]
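
The breakdown shown in the migration plots (members coming from a source community versus new members) can be approximated as below. The user IDs, month keys, and function names are hypothetical. The Jaccard index is added here as one common way to quantify a shared member base and is not necessarily the measure used in the study.

```python
# Hypothetical data: users previously seen in the source community, and the
# users active in the target community in each month.
source_members = {1, 2, 3, 4}
target_active = {
    "2013-02": {1, 5},
    "2013-08": {1, 2, 5, 6},
    "2014-02": {1, 2, 3, 6, 7, 8},
}

def split_by_origin(target_active, source_members):
    """Per month: (members shared with the source community, new members)."""
    out = {}
    for month, users in sorted(target_active.items()):
        shared = users & source_members
        out[month] = (len(shared), len(users) - len(shared))
    return out

def jaccard(a, b):
    """Overlap of two member sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

origin = split_by_origin(target_active, source_members)
overlap = jaccard({1, 2, 3}, {2, 3, 4})
```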

SLIDE 14

Communities in Stack Overflow: Findings

  • Key aspects dictating community evolution
      – Intra-community aspects
          – User revisits
          – Continued activity
      – Inter-community aspects
          – Shared member base
          – User migration

SLIDE 15

How can we then describe community evolution?

SLIDE 16

CERIS Model

  • CERIS
      – Community Evolution model with Revisits and Inter-community effectS
  • Goal: describe community activity (number of posts) over time
  • Incorporates revisits and community relationships

SLIDE 17

CERIS Model

  • CERIS extends state-of-the-art models
      – Phoenix-R evolution model with revisits [Figueiredo et al., 2014]
      – Competition model [Beutel et al., 2012]
  • Epidemiology approach to network dynamics
      – Objects in the network are modeled as infections

SLIDE 18

CERIS Model

  • Users are initially exposed to different communities

[Diagram: state S]

SLIDE 19

CERIS Model

  • Users become infected by participating in a community

[Diagram: states S, I_1, I_2; infection rates β_1, β_2]

SLIDE 20

CERIS Model

  • Users can recover by ceasing activity in a community

[Diagram: states S, I_1, I_2; infection rates β_1, β_2; recovery rates γ_1, γ_2]

SLIDE 21

CERIS Model

  • Or they can be infected by additional communities

[Diagram: states S, I_1, I_2, I_{1,2}; infection rates β_1, β_2; recovery rates γ_1, γ_2; additional-infection rates εβ_1, εβ_2]

SLIDE 22

CERIS Model

  • Revisits to a same community captured by hidden states

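[Diagram: states S, I_1, I_2, I_{1,2}; infection rates β_1, β_2; recovery rates γ_1, γ_2; additional-infection rates εβ_1, εβ_2; hidden states V_1, V_2, V_{1,2} with rates ω_1, ω_2, ω_{1,2}]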

SLIDE 23

CERIS Model

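[Diagram: full CERIS model with states S, I_1, I_2, I_{1,2}, hidden states V_1, V_2, V_{1,2}, and rates β_1, β_2, εβ_1, εβ_2, γ_1, γ_2, ω_1, ω_2, ω_{1,2}. The estimate v̂_1 for community 1 adds up V_1 and V_{1,2} over the shocks s_1, ..., s_n.]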
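
The state diagram can be read as a compartment model. The slides do not give the equations, so the following is only a sketch of one possible reading, not the authors' formulation. It assumes a discrete-time, mass-action update for a single shock. It also assumes that a user in I_{1,2} who recovers from one community stays active in the other, and that visible activity is ω times the number of infected users. All function and parameter names are hypothetical.

```python
def simulate(beta1, beta2, gamma1, gamma2, eps,
             omega1, omega2, omega12, s0, i1_0, i2_0, steps):
    """Two-community compartment sketch: S, I1, I2 and the co-infected I12."""
    s, i1, i2, i12 = float(s0), float(i1_0), float(i2_0), 0.0
    n = s + i1 + i2                      # total population, kept constant
    series = []
    for _ in range(steps):
        # Assumed infection pressure of each community (mass action).
        f1 = beta1 * (i1 + i12) / n
        f2 = beta2 * (i2 + i12) / n

        s_to_i1, s_to_i2 = f1 * s, f2 * s        # S joins a first community
        i1_to_i12 = eps * f2 * i1                # I1 additionally joins 2
        i2_to_i12 = eps * f1 * i2                # I2 additionally joins 1
        i1_to_s, i2_to_s = gamma1 * i1, gamma2 * i2
        i12_to_i2 = gamma1 * i12                 # stops 1, remains in 2
        i12_to_i1 = gamma2 * i12                 # stops 2, remains in 1

        s += i1_to_s + i2_to_s - s_to_i1 - s_to_i2
        i1 += s_to_i1 + i12_to_i1 - i1_to_s - i1_to_i12
        i2 += s_to_i2 + i12_to_i2 - i2_to_s - i2_to_i12
        i12 += i1_to_i12 + i2_to_i12 - i12_to_i2 - i12_to_i1

        # Assumed activity estimate: own members plus co-infected members.
        v1 = omega1 * i1 + omega12 * i12
        v2 = omega2 * i2 + omega12 * i12
        series.append((v1, v2))
    return series, (s, i1, i2, i12)

series, final = simulate(0.4, 0.3, 0.1, 0.1, 0.5, 1.0, 1.0, 1.0,
                         s0=990, i1_0=5, i2_0=5, steps=50)
```

Setting `eps` to 0 removes the inter-community effect in this sketch, so no user is ever active in both communities at once.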

SLIDE 24

CERIS Model

  • Analyzes the time series for the number of posts in the communities simultaneously
  • Contagious process occurs following “shocks”
      – Wavelets method to identify activity peaks as shock candidates
      – e.g., when a new related community becomes active
  • Model fitting with the Levenberg-Marquardt algorithm and Minimum Description Length
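
To make the fitting step concrete, the following sketch runs a small Levenberg-Marquardt loop in pure Python. It fits a toy curve `a * exp(-b * t)` to synthetic data. The toy curve is a stand-in chosen for brevity and is not the CERIS model. Wavelet-based shock detection and Minimum Description Length model selection are not covered. All names are hypothetical.

```python
import math

def model(t, a, b):
    """Toy activity curve a * exp(-b * t); a stand-in, not the CERIS equations."""
    return a * math.exp(-b * t)

def sse(ts, ys, a, b):
    """Sum of squared residuals between observations and the toy model."""
    return sum((y - model(t, a, b)) ** 2 for t, y in zip(ts, ys))

def fit_lm(ts, ys, a, b, iters=200, lam=1e-2):
    """Levenberg-Marquardt for the two parameters (a, b) of the toy model."""
    err = sse(ts, ys, a, b)
    for _ in range(iters):
        # Accumulate J^T J and J^T r from the analytic Jacobian.
        jaa = jab = jbb = ga = gb = 0.0
        for t, y in zip(ts, ys):
            e = math.exp(-b * t)
            r = y - a * e
            da, db = e, -a * t * e       # partial derivatives w.r.t. a and b
            jaa += da * da
            jab += da * db
            jbb += db * db
            ga += da * r
            gb += db * r
        # Damped normal equations: (J^T J + lam * diag(J^T J)) step = J^T r.
        m11, m22, m12 = jaa * (1 + lam), jbb * (1 + lam), jab
        det = m11 * m22 - m12 * m12
        if det == 0:
            break
        step_a = (ga * m22 - gb * m12) / det
        step_b = (gb * m11 - ga * m12) / det
        try:
            new_err = sse(ts, ys, a + step_a, b + step_b)
        except OverflowError:
            new_err = float("inf")
        if new_err < err:
            # Accept the step and reduce the damping.
            a, b, err = a + step_a, b + step_b, new_err
            lam = max(lam / 10, 1e-12)
        else:
            # Reject the step and increase the damping.
            lam = min(lam * 10, 1e12)
        if err < 1e-18:
            break
    return a, b, err

ts = list(range(12))
ys = [model(t, 5.0, 0.3) for t in ts]    # synthetic, noise-free data
a_hat, b_hat, fit_err = fit_lm(ts, ys, a=1.0, b=0.1)
```

In practice a library routine would normally replace the hand-written loop. It is spelled out here only to show the accept/reject damping logic that distinguishes Levenberg-Marquardt from plain Gauss-Newton.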

SLIDE 25

CERIS Model Results

[Figure: HTML and CSS. Time series, 2009 to 2014 (y-axis 10000 to 70000). Series: css, html, model.]

[Figure: iOS versions. Time series, Jan 2012 to 2014 (y-axis 50 to 400). Series: ios7, ios6, ios5, model.]

SLIDE 26

CERIS Model Results

  • Model results:
      – Reasonably accurate fittings
      – Captures different patterns of activity
      – Captures concurrent evolution of related communities

RMSE

HTML and CSS    iOS versions    All (mean, daily)
3046.895        13.612          21.131
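
For reference, the root-mean-square error between an observed series and a fitted series is computed as follows. This is the standard definition. The function name and the sample numbers are illustrative and are not the values from the table.

```python
import math

def rmse(observed, predicted):
    """Root-mean-square error between two equally long, non-empty series."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("series must be non-empty and of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))

perfect = rmse([1, 2, 3], [1, 2, 3])
example = rmse([0, 0], [3, 4])
```

RMSE is expressed in the same unit as the series (here, posts), so values for high-volume communities such as HTML and CSS are not directly comparable with those for smaller ones.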

SLIDE 27

CERIS Model Results

  • Model outputs used to quantify the relationship between communities
  • Flow of users between communities:

$\mathrm{flow}_{C_1,C_2}(t) = \varepsilon \beta_2(t)$
$\mathrm{flow}_{C_2,C_1}(t) = \varepsilon \beta_1(t)$

SLIDE 28

CERIS Model Results

[Figure: two community-by-community matrices. Top 100: axes Communities (20 to 100), color scale 0.0 to 0.9. Top 15: java, javascript, c#, php, android, jquery, python, html, c++, ios, mysql, css, asp.net, objective-c, .net; color scale 0.0 to 0.8.]

SLIDE 29

Conclusions

  • Knowledge-sharing networks as a community environment
      – Topic-based communities defined by users interacting with topics of their interest
  • Investigation of topic-based communities in Stack Overflow
      – User activity in terms of communities they belong to
      – Impact of related communities
  • New model to describe community evolution
      – Incorporates key factors behind community activity
      – Good portrayal of the co-evolution of multiple communities

SLIDE 30

Thank you!

Anna Guimarães
anna@dcc.ufmg.br

SLIDE 31

References I

Agichtein, E., Castillo, C., Donato, D., Gionis, A., and Mishne, G. (2008). Finding High-Quality Content in Social Media. In Proc. WSDM.

Beutel, A., Prakash, B. A., Rosenfeld, R., and Faloutsos, C. (2012). Interacting Viruses in Networks: Can Both Survive? In Proc. ACM SIGKDD.

SLIDE 32

References II

Dalip, D. H., Gonçalves, M. A., Cristo, M., and Calado, P. (2013). Exploiting User Feedback to Learn to Rank Answers in Q&A Forums: A Case Study with Stack Overflow. In Proc. ACM SIGIR.

Figueiredo, F., Almeida, J. M., Matsubara, Y., Ribeiro, B., and Faloutsos, C. (2014). Revisit Behavior in Social Media: The Phoenix-R Model and Discoveries. In Proc. PKDD.

SLIDE 33

References III

Hansen, M. H. and Yu, B. (2001). Model Selection and the Principle of Minimum Description Length. Journal of the American Statistical Association, 96(454).

Moré, J. J. (1978). The Levenberg-Marquardt algorithm: implementation and theory. In Numerical analysis, pages 105–116. Springer.

Ravi, S., Pang, B., Rastogi, V., and Kumar, R. (2014). Great Question! Question Quality in Community Q&A. In Proc. ICWSM.

SLIDE 34

References IV

Wang, X., Butler, B. S., and Ren, Y. (2013). The impact of membership overlap on growth: An ecological competition view of online groups. Organization Science, 24(2):414–431.
