Large-Scale Social Network Analysis of Facebook Data


  1. Large-Scale Social Network Analysis of Facebook Data
     Emma S. Spiro (1), Zack W. Almquist (1), Carter T. Butts (1, 2)
     (1) Department of Sociology; (2) Institute for Mathematical Behavioral Sciences
     University of California, Irvine
     Presented at MURI All Hands Meeting, January 10, 2012
     This material is based on research supported by the Office of Naval Research under award N00014-08-1-1015, as well as by the National Science Foundation under awards BCS-0827027 and OIA-1028394.
     Scalable Methods for the Analysis of Network-Based Data
     E. Spiro espiro@uci.edu University of California, Irvine January 10, 2012

  2. MURI Themes and Goals
     ◮ Large-scale social networks
     ◮ Spatially embedded networks
     ◮ Rich models with complex covariates
     ◮ Scalable methods and models

  3. Spatially Embedded Networks
     ◮ Social interaction occurs within a spatial context
     ◮ Opportunities for, and costs of, interaction are strongly influenced by spatial factors
     ◮ Interest in spatial factors per se (e.g., neighborhood research)
     ◮ Propinquity is known to be a powerful determinant of tie probability
     ◮ Extension to attribute spaces (Blau space)
     ◮ Useful way to parameterize homophily, clustering effects
     ◮ Simple idea: assign vertices to spatial locations
     ◮ Location function: ℓ : V → S, where S is an abstract space
     ◮ Take ℓ as given and fixed, e.g., latitude/longitude coordinates
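
To make the location function concrete, here is a minimal sketch that treats ℓ as a list of latitude/longitude pairs and derives a pairwise distance matrix from it. The haversine formula, the Earth-radius constant, and the three example coordinates are assumptions chosen for illustration; the slides do not say how distances were computed.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = math.radians(p[0]), math.radians(p[1])
    lat2, lon2 = math.radians(q[0]), math.radians(q[1])
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # min() guards against tiny floating-point overshoot above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def distance_matrix(locations):
    """D[i][j] is the distance between the locations of vertices i and j."""
    n = len(locations)
    return [[haversine_km(locations[i], locations[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical location function: vertex index -> approximate (lat, lon).
locations = [(33.64, -117.84), (34.07, -118.45), (37.87, -122.26)]
D = distance_matrix(locations)
for row in D:
    print([round(d, 1) for d in row])
```

Any other metric on S could be substituted for `haversine_km`; the models on the following slides only need the resulting matrix of pairwise distances.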

  4. Spatial Bernoulli Graphs (Butts 2002)
     ◮ A simple family of models for spatially embedded social networks
     $$\Pr(Y = y \mid D) = \prod_{\{i,j\}} B\left(Y_{ij} = y_{ij} \mid F_d(D_{ij})\right) \qquad (1)$$
     ◮ $Y \in \{0, 1\}^{N \times N}$
     ◮ $D \in [0, \infty)^{N \times N}$
     ◮ $F_d : [0, \infty) \mapsto [0, 1]$
     ◮ Assumes that dependence among edges is absorbed by the distance structure, so edges are conditionally independent.
     ◮ Related to the gravity model from geography.
     ◮ Advantage: estimable under sampling and scalable
     ◮ How does distance affect tie probability?
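
The conditional-independence assumption in equation (1) means a graph can be drawn one dyad at a time, and the log-likelihood is a plain sum over dyads. The following is a minimal sketch under two assumptions not stated on the slide: the graph is undirected, and the spatial interaction function `sif` shown here is an arbitrary illustrative choice. The function names are hypothetical.

```python
import math
import random


def sample_spatial_bernoulli(D, sif, rng):
    """Draw one undirected graph: each dyad {i, j} is an independent
    Bernoulli trial with success probability sif(D[i][j])."""
    n = len(D)
    Y = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < sif(D[i][j]):
                Y[i][j] = Y[j][i] = 1
    return Y


def log_likelihood(Y, D, sif):
    """Log of equation (1): sum of Bernoulli log-probabilities over dyads.
    Assumes 0 < sif(d) < 1 for every observed distance."""
    n = len(D)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            p = sif(D[i][j])
            total += math.log(p) if Y[i][j] else math.log(1.0 - p)
    return total


def example_sif(d):
    """Illustrative decay function; not a fitted model."""
    return 0.9 / (1.0 + d) ** 2


# Hypothetical example: four vertices on a line, one unit apart.
positions = [0.0, 1.0, 2.0, 3.0]
D = [[abs(a - b) for b in positions] for a in positions]
rng = random.Random(0)
Y = sample_spatial_bernoulli(D, example_sif, rng)
print(Y)
print(log_likelihood(Y, D, example_sif))
```

Because each dyad contributes its own term, the sum can be evaluated on any sampled subset of dyads, which is one way to read the slide's point about estimability under sampling.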

  5. Spatial Interaction Function
     ◮ Decay as a power law in distance
     $$F_d(x) = \frac{p_b}{(1 + \alpha x)^{\gamma}}$$
     where $0 \le p_b \le 1$ is a baseline tie probability, $\alpha \ge 0$ is a scaling parameter, and $\gamma > 0$ is the exponent which controls the distance effect
     ◮ Attenuated power law, arctangent decay, etc.
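
As a small illustration of the power-law form F_d(x) = p_b / (1 + αx)^γ, the sketch below evaluates it for a few exponents. The function name and the parameter values are hypothetical; the range checks simply mirror the constraints stated on the slide.

```python
def power_law_sif(x, p_b, alpha, gamma):
    """Power-law spatial interaction function: p_b / (1 + alpha * x) ** gamma."""
    if not 0.0 <= p_b <= 1.0:
        raise ValueError("p_b must lie in [0, 1]")
    if alpha < 0.0 or gamma <= 0.0:
        raise ValueError("alpha must be >= 0 and gamma must be > 0")
    if x < 0.0:
        raise ValueError("distance must be non-negative")
    return p_b / (1.0 + alpha * x) ** gamma


# Illustrative values only: a larger gamma makes tie probability fall off faster.
for gamma in (0.5, 1.0, 3.0):
    values = [power_law_sif(x, p_b=0.5, alpha=2.0, gamma=gamma)
              for x in (0.0, 1.0, 10.0)]
    print(gamma, [round(v, 4) for v in values])
```

At distance zero the function returns the baseline probability p_b, and it decreases monotonically in x whenever α > 0.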

  6. Spatial Interaction Function
     ◮ Small changes in the SIF can make big differences in the underlying network
     ◮ Changes in the functional form of the SIF can also make a big difference
     ◮ Notice that the difference between the APL and the PL is not visually striking, but the resulting networks are quite different
     [Figure, two panels, each plotting $F_d(x)$ (0.0 to 1.0) against Distance (0.0 to 1.0). Power Law: $F_d(x) = \frac{1}{(1 + 8x)^{3}}$. Attenuated Power Law: $F_d(x) = \frac{1}{1 + (8x)^{3}}$.]
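
The two curves on this slide can be compared numerically. The sketch below uses the plotted parameters (baseline 1, scale 8, exponent 3); writing the attenuated form with general parameters as p_b / (1 + (αx)^γ) is an assumption that reduces to the slide's formula for these values.

```python
def power_law(x, p_b=1.0, alpha=8.0, gamma=3.0):
    """PL: p_b / (1 + alpha * x) ** gamma."""
    return p_b / (1.0 + alpha * x) ** gamma


def attenuated_power_law(x, p_b=1.0, alpha=8.0, gamma=3.0):
    """APL: p_b / (1 + (alpha * x) ** gamma)."""
    return p_b / (1.0 + (alpha * x) ** gamma)


distances = [0.0, 0.05, 0.125, 0.25, 0.5, 1.0]
table = [(x, power_law(x), attenuated_power_law(x)) for x in distances]
for x, pl, apl in table:
    print(f"{x:5.3f}  PL={pl:.4f}  APL={apl:.4f}")
```

For a = αx > 0 and γ = 3, (1 + a)^3 exceeds 1 + a^3, so the APL lies above the PL at every positive distance and implies more ties at each range.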

  7. Theories of the Distance Effect
     ◮ How does distance affect tie probability?
     ◮ Is the way in which distance matters homogeneous?
     ◮ It may vary along lines of status or prestige
     ◮ Want to allow for inhomogeneity in the relationship between distance and tie probability
     ◮ How to extend the spatial Bernoulli models?

  8. Spatial Bernoulli Models with Covariates
     ◮ We can extend the model in a simple way to include tie covariates
     ◮ Add GLM structure to the parameters of the SIF, $F_d$:
     $$\Pr(Y_{ij} = 1) = \frac{p_{b_{ij}}}{(1 + \alpha_{ij} d_{ij})^{\gamma_{ij}}}$$
     where
     $$p_{b_{ij}} = \operatorname{ilogit}(\theta \ast X_{ij}), \qquad \alpha_{ij} = \exp(\psi \ast W_{ij}), \qquad \gamma_{ij} = \exp(\phi \ast U_{ij})$$
     and where $\theta$, $\psi$, and $\phi$ are parameter vectors, and $X$, $W$, and $U$ are covariate matrices.
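
The GLM structure on this slide can be sketched for a single dyad as follows. The sketch assumes that each product (θ with X_ij, ψ with W_ij, φ with U_ij) is an inner product between a parameter vector and that dyad's covariate vector. The helper names and the example numbers are hypothetical.

```python
import math


def ilogit(z):
    """Inverse logit, mapping the real line to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dot(a, b):
    """Inner product of two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("parameter and covariate vectors must match in length")
    return sum(u * v for u, v in zip(a, b))


def tie_probability(d, x, w, u, theta, psi, phi):
    """Pr(Y_ij = 1) for one dyad at distance d with covariate vectors x, w, u."""
    p_b = ilogit(dot(theta, x))      # baseline probability, kept in (0, 1)
    alpha = math.exp(dot(psi, w))    # scaling parameter, kept positive
    gamma = math.exp(dot(phi, u))    # decay exponent, kept positive
    return p_b / (1.0 + alpha * d) ** gamma


# Hypothetical dyad: an intercept plus one covariate in each component.
x = w = u = [1.0, 0.5]
print(tie_probability(10.0, x, w, u,
                      theta=[-2.0, 0.3], psi=[0.1, -0.2], phi=[0.0, 0.4]))
```

The link functions keep each SIF parameter in its valid range for any real-valued coefficients, so the coefficients can be estimated without constraints.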

  9. Application: Selective Mixing on Facebook
     ◮ Facebook is an extremely large online social network
     ◮ Data: sample of almost 1 million egocentric networks (Gjoka et al. 2009)
     ◮ Each Facebook user may indicate a university affiliation; fewer than 4% actually do
     ◮ Rich set of covariates at the institution level
     ◮ Online context is a best-case scenario for equal mixing and "weak" distance effects

  10. Selecting Covariates of Interest
      ◮ Institutional prestige: USNWR National University Ranking
      ◮ Top 194 schools receive a rank, score, and selectivity measure
      ◮ Prestige is measured as the first principal component score of these measures
      ◮ Public/Private
      ◮ Endowment, Tuition, Location, etc.
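
One way to turn several institution-level measures into a single prestige score is to project them onto their first principal component. The sketch below is a stdlib-only illustration using power iteration. Standardizing each measure first, the starting vector, and the example numbers are all assumptions; the slides do not describe the exact procedure.

```python
import math


def standardize(rows):
    """Center each column and scale it to unit sample standard deviation."""
    n, k = len(rows), len(rows[0])
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
           for c, m in zip(cols, means)]
    return [[(r[j] - means[j]) / sds[j] for j in range(k)] for r in rows]


def first_principal_component(rows, iters=500):
    """Return (loadings, scores) for the first principal component."""
    Z = standardize(rows)
    n, k = len(Z), len(Z[0])
    # Sample correlation matrix of the standardized measures.
    C = [[sum(z[a] * z[b] for z in Z) / (n - 1) for b in range(k)]
         for a in range(k)]
    # Power iteration; the slightly asymmetric start makes it unlikely
    # (though not impossible) to begin orthogonal to the leading eigenvector.
    v = [1.0 + 0.1 * j for j in range(k)]
    for _ in range(iters):
        w = [sum(C[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    if v[0] < 0:  # the sign of a component is arbitrary; fix it for stability
        v = [-t for t in v]
    scores = [sum(z[j] * v[j] for j in range(k)) for z in Z]
    return v, scores


# Hypothetical measures for five institutions (three columns, made-up values).
measures = [
    (95.0, 0.92, 8.0),
    (88.0, 0.85, 7.0),
    (80.0, 0.70, 6.5),
    (72.0, 0.66, 5.0),
    (60.0, 0.50, 4.0),
]
loadings, prestige = first_principal_component(measures)
print([round(t, 3) for t in loadings])
print([round(s, 3) for s in prestige])
```

If one of the measures runs in the opposite direction (a rank, where smaller is better), its loading will simply carry the opposite sign from the others.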

  11. Quick Comment on Model Fitting and Computation
      ◮ Fitting these models is not an easy task
      ◮ Bayesian point estimation
      ◮ Importance sampling to fit the exponential family model
      ◮ Numerical tricks

  12. Model Fitting and Selection
      Candidate terms: Intercept, Pub/Priv, and Prestige effects on each of p_b, α, and γ. A checkmark (√) on the slide marks each term a model includes.

      Model      Checkmarks   SIF Form   BIC
      Model 1    8            pl         24911904
      Model 2    8            pl         24918710
      Model 3    7            apl        24926060
      Model 4    8            apl        24933741
      Model 5    7            apl        24935807
      Model 6    3            apl        25139114
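
A quick way to read the BIC column is to rank the models and report each model's gap from the best one. The sketch below uses the BIC values from the table and the usual convention that a lower BIC is preferred; the variable names are hypothetical.

```python
# BIC values copied from the model-selection table on this slide.
bic = {
    "Model 1": 24911904,
    "Model 2": 24918710,
    "Model 3": 24926060,
    "Model 4": 24933741,
    "Model 5": 24935807,
    "Model 6": 25139114,
}

best = min(bic, key=bic.get)                       # lowest BIC is preferred
delta = {name: value - bic[best] for name, value in bic.items()}

for name in sorted(bic, key=bic.get):
    print(f"{name}: BIC={bic[name]}  delta={delta[name]}")
```

The gaps are what matter for comparison: the absolute BIC values are large mainly because the sample is large.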

  13. Facebook Friendship Network

  14. A Model of Facebook Friendship

      Parameter   Component        Estimate   p.s.d.e.
      p_b         Intercept        -6.0974    0.0061   **
      p_b         Private-Public   -0.4340    0.0200   **
      p_b         Public-Public    -0.7501    0.0063   **
      p_b         Prestige         -0.0176    0.0000   **
      α           Intercept         2.1687    0.0259   **
      α           Private-Public   -2.2169    0.0493   **
      α           Public-Public    -4.5387    0.0269   **
      α           Prestige         -0.0187    0.0001   **
      γ           Intercept        -1.0789    0.0016   **
      γ           Private-Public    0.4523    0.0026   **
      γ           Public-Public     1.0009    0.0023   **
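
The estimates in this table are on the link scale, so it can help to map them back to p_b, α, and γ. The sketch below applies the link functions from the covariate slide (inverse logit for p_b, exponential for α and γ). It rests on two assumptions the slides do not confirm: that Private-Public and Public-Public are indicator terms with Private-Private as the reference category, and that the prestige covariate can be set to 0 for a baseline reading.

```python
import math


def ilogit(z):
    """Inverse logit, mapping the real line to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Point estimates copied from the table on this slide.
estimates = {
    "p_b": {"Intercept": -6.0974, "Private-Public": -0.4340,
            "Public-Public": -0.7501, "Prestige": -0.0176},
    "alpha": {"Intercept": 2.1687, "Private-Public": -2.2169,
              "Public-Public": -4.5387, "Prestige": -0.0187},
    "gamma": {"Intercept": -1.0789, "Private-Public": 0.4523,
              "Public-Public": 1.0009},
}


def natural_scale(pair_type=None, prestige=0.0):
    """SIF parameters for a dyad type; pair_type=None is the assumed
    reference category (indicator coding is an assumption of this sketch)."""
    def linear(component):
        coefs = estimates[component]
        eta = coefs["Intercept"]
        if pair_type is not None:
            eta += coefs[pair_type]
        eta += coefs.get("Prestige", 0.0) * prestige
        return eta

    return {
        "p_b": ilogit(linear("p_b")),
        "alpha": math.exp(linear("alpha")),
        "gamma": math.exp(linear("gamma")),
    }


for kind in (None, "Private-Public", "Public-Public"):
    print(kind, natural_scale(kind))
```

Under these assumptions, the negative α terms and positive γ terms for the Private-Public and Public-Public dyads change both the scale and the steepness of the distance decay relative to the reference category.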

  15. A Model of Facebook Friendship
      [Figure: Edge Probability (axis ticks 1e−06 to 5e−04) versus Distance (km) (axis ticks 1 to 5000)]

  16. A Model of Facebook Friendship
      [Figure: Edge Probability (axis ticks 1e−06 to 5e−04) versus Distance (km) (axis ticks 1 to 5000), annotated "decreases"]

  17. A Model of Facebook Friendship
      [Figure: Edge Probability (axis ticks 1e−06 to 5e−04) versus Distance (km) (axis ticks 1 to 5000)]

  18. A Model of Facebook Friendship
      [Figure: Edge Probability (axis ticks 1e−06 to 5e−04) versus Distance (km) (axis ticks 1 to 5000), annotated "regional ties"]

  19. Effects of Difference in Prestige
      [Figure, three panels: Edge Probability (axis ticks 1e−06 to 5e−04) versus Distance (km) (axis ticks 1 to 5000)]

  20. Summary
      ◮ Spatial mixing models applied to sampled data from Facebook
      ◮ Model extension to include covariates
      ◮ Non-trivial model fitting procedure
      ◮ Inhomogeneous relationship between distance and tie probability
      ◮ Scalable models for large-scale social networks
