Networks of Computational Social Science - Ian Dennis Miller



SLIDE 1

Networks of Computational Social Science

Ian Dennis Miller 2018-11-22

Ian Dennis Miller Networks of Computational Social Science 2018-11-22 1 / 56

SLIDE 2

Introduction

SLIDE 3

Objective

1. In order to study the scholarly literature within which my own work is embedded:
2. this work will discuss the construction of a citation library
3. and the analysis of its co-authorship network
4. to identify communities of collaboration.

Figure 1: https://commons.wikimedia.org/wiki/File:Goal_Japan_vs_Uz_2009.JPG

SLIDE 4

Structure

Introduction
  Motivation
  Literature Review
  Background
Methods
Results
Discussion
Conclusion

Figure 2: https://commons.wikimedia.org/wiki/File:Structure_Paris_les_Halles.jpg

SLIDE 5

Motivation

Locate myself within the literature
  No specific literature seems to exist
  Relevant methods in “distant” literatures
Applied insights from network science
  Network methods for scholarly synthesis
Goal: become the bridge
  I need to discover the audience

Figure 3: https://commons.wikimedia.org/wiki/File:Motivation%3F.jpg

SLIDE 6

Starting Point

Reading list for my PhD oral defense

Also basis for a chapter in dissertation

Seeded bibliography with 90 articles
  30 articles about contagion
  30 articles about social network analysis
  30 articles about memes
Find common thread that ties articles together

Figure 4: https://commons.wikimedia.org/wiki/File:Arokia_Rajiv(Silver)_For_India_Starts_To_Run.jpg

SLIDE 7

Background

SLIDE 8

Small World Problem

Travers and Milgram (1967)
  Mail letters to Kansas
  Return to Boston by hand (not mail)
  On a “first-name basis” with peers

Do all our social circles overlap?

(yes)

AKA: “Six Degrees of Separation”

Figure 5: Path length distribution

SLIDE 9

Strength of Weak Ties

Granovetter (1973): close is influential
  Why? Network is denser
  Many links among all possible connections
But weak ties connect distant “clumps”
A Small World: unexpectedly short distances
How much does this matter for scholarship?
Libraries and search tools vs. weak ties

Figure 6: https://commons.wikimedia.org/wiki/File:Weak_tie_bridge.png

SLIDE 10

Small World Networks

Watts and Strogatz (1998)
  “Starting from a ring lattice...”
  “with n vertices and k edges per vertex”
  “rewire edges at random with probability p”
Regularity: p = 0
Disorder: p = 1
Small-world coupling facilitates epidemics
“Shortcuts” connect across long distances
Dynamic structure of co-authorship

Figure 7: https://commons.wikimedia.org/wiki/File: Small-world-network-example.png
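
The rewiring procedure quoted on this slide can be sketched in a few lines. The following is a minimal illustration in standard-library Python, not the authors' code: the function names `ring_lattice` and `rewire` are hypothetical, `k` is assumed to be even, and the choice to rewire only one endpoint of each edge while forbidding self-loops and duplicate edges is an assumption that follows the usual reading of the model.

```python
import random


def ring_lattice(n, k):
    """Ring of n vertices, each joined to its k nearest neighbours (k even)."""
    edges = set()
    for v in range(n):
        for step in range(1, k // 2 + 1):
            edges.add(frozenset((v, (v + step) % n)))
    return edges


def rewire(edges, n, p, rng):
    """With probability p, move one endpoint of each edge to a random vertex.

    Self-loops and duplicate edges are never created, so the number of
    edges stays constant.
    """
    result = set(edges)
    for edge in sorted(edges, key=sorted):  # fixed order for reproducibility
        if rng.random() >= p:
            continue  # keep this edge as it is
        u, _ = sorted(edge)
        candidates = [w for w in range(n)
                      if w != u and frozenset((u, w)) not in result]
        if candidates:
            result.remove(edge)
            result.add(frozenset((u, rng.choice(candidates))))
    return result


rng = random.Random(0)
lattice = ring_lattice(20, 4)            # regular: p = 0
small_world = rewire(lattice, 20, 0.1, rng)
disordered = rewire(lattice, 20, 1.0, rng)
```

With `p = 0` the lattice is returned unchanged (regularity); with `p = 1` every edge is rewired (disorder). Small values of `p` in between produce the long-range “shortcuts” the slide mentions.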

SLIDE 11

Scale-free Networks

Barabasi and Albert (1999)
  Applies to: World Wide Web, genetics...
Scale-free; power-law distribution
  Preferential attachment network
  “Rich get richer”
Dynamic structure of scholarly citation
(These systems tend to be large)

Figure 8: https://commons.wikimedia.org/wiki/File: Scale-free_network_sample.png
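
The “rich get richer” mechanism can be illustrated with a short growth simulation. This is a sketch under stated assumptions rather than the published model's exact procedure: the function name `preferential_attachment`, the seed graph (a complete graph on `m + 1` vertices), and the parameters `n` and `m` are all choices made here for illustration.

```python
import random
from collections import Counter


def preferential_attachment(n, m, rng):
    """Grow a graph to n vertices; each new vertex adds m edges to existing
    vertices chosen with probability proportional to their degree."""
    # Seed (an assumption): a complete graph on the first m + 1 vertices.
    edges = [(u, v) for u in range(m + 1) for v in range(u + 1, m + 1)]
    # Each vertex appears in `stubs` once per incident edge, so a uniform
    # draw from `stubs` is a degree-proportional draw over vertices.
    stubs = [v for edge in edges for v in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in sorted(targets):
            edges.append((new, t))
            stubs.extend((new, t))
    return edges


edges = preferential_attachment(200, 2, random.Random(1))
degree = Counter(v for edge in edges for v in edge)
hubs = degree.most_common(5)  # the highest-degree vertices in this run
```

Inspecting `hubs` for a given run shows how unevenly the edges end up distributed across vertices, which is the property the power-law description refers to.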

SLIDE 12

Scientific Collaboration Networks

Newman (2001)
  Examination of coauthorship
Mined: biomedical, physics, comp sci
  Cornell arXiv
Detected small world networks
How are “silos” even possible?
  Clustering, from a network perspective
  Average path lengths are longer

Figure 9: Newman. (2001) PNAS. p. 408

SLIDE 13

Scholarly Communication and Bibliometrics

Borgman and Furner (2002)
  Review of bibliometrics
  Information Sciences perspective
Provides taxonomy:
  Behavior: writing, linking, submit, collaboration
  Aggregation: person, group, domain, nation
  Format: paper, lit review, reference

Figure 10: Borgman & Furner (2002), p. 9

SLIDE 14

Co-authorship: Structural/Socio-academic Groups

Rodriguez and Pepe (2008)
  Community detection study
“Even in interdisciplinary research...”
“coauthorship is driven by...”
  “departmental”
  “institutional affiliation.”

Figure 11: Rodriguez & Pepe (2008)

SLIDE 15

Choosing coauthorship

Identify the people to identify the beliefs
  Assume authors endorse the publication
  Stronger ties than citation network
  Loosely tracks institutional network
Can leverage biographical info
Fundamentally: a tractable proposition
  Does not require comprehensive data mining
  Meaningful results from under 2,500 articles

Figure 12: https://commons.wikimedia.org/wiki/File:Accademia_-_Maggiotto_Self-portrait_with_two_students.jpg

SLIDE 16

Alternatives to coauthorship

Citation

Easy to acquire structured data

Co-citation

Related works are probably co-cited

Acknowledgment

Stronger ties than citations

Institutional/mentorship

Totally unstructured

Figure 13: https://commons.wikimedia.org/wiki/File:Option-key.jpg

SLIDE 17

Hypotheses

Coauthorship indicates shared belief
  Beliefs accumulate into a discipline
  Metaphors/synonyms: Colleges, schools, arms, lines of reasoning
Weak ties connect the silos
  But weak ties may be harder to spot
For weak ties that would be bridges:
  Longer chains of strong ties also exist

Figure 14: https://commons.wikimedia.org/wiki/File:Mad_scientist_caricature.png

SLIDE 18

Methods

SLIDE 19

Methods Overview

Data Methods

Acquisition, storage

Scholarship Methods

Search

Analysis Methods

Networks, statistics

Reporting

Visualization, interaction

Figure 15: https://commons.wikimedia.org/wiki/File:The_Earth_seen_from_Apollo_17.jpg

SLIDE 20

Data Methods: BibTeX

Imperfect but ubiquitous
  No canonical standard
  Highest adoption rate format, online
  Many incomplete parsers/writers
Parsers for all languages
  Zotero, LaTeX, R, and Python
Good compromise

Figure 16: https://commons.wikimedia.org/wiki/File:Example_bibtex2.png

SLIDE 21

Data Methods: Zotero

Open source citation manager
  Manage database of citations
  Provides plug-in system
Native import/export BibTeX
  BetterBibTeX: handy plug-in
  Sync Zotero library to .bib file

Good integration with web browsers

(Facilitates scholarship)

Figure 17: My Zotero library

SLIDE 22

Scholarship Methods

Search tools
  Google Scholar
    Problems with rate limits
    Dystopia
  WorldCat, Citeseer
  DBLP; APA; arXiv
Library portal
  Multiple publisher licenses
  permissions aggregation

Figure 18: University of Toronto Library Website

SLIDE 23

Observations of academic publishing over time

post-2010: nearly complete, largely computer-readable
post-2000: excellent availability, maybe not OCR
post-1990: good availability, lower OCR rate, irregularities
1950-1990: good indexing, okay availability, irregular
1920-1950: okay indexing, some availability, very irregular
pre-1920: classic offline scholarship methods

Figure 19: https://commons.wikimedia.org/wiki/File:Osgoode_Library_Stacks_2007.jpg

SLIDE 24

Biographic methods

Necessary
  farther back in history
  single-author papers
Sources
  Wikipedia
  university biographical resources
Used to identify
  contemporaries
  mentors

Figure 20: https://commons.wikimedia.org/wiki/File:Isaac_Newton_grave_in_Westminster_Abbey.jpg

SLIDE 25

Analysis Methods

Preparation
  Co-authorship
  Connected Components
Calculate
  Path Length
  Clustering Coefficient
  Modularity
Have I observed a small world?

Figure 21: https://commons.wikimedia.org/wiki/File: Mechanical-calculator-Brunsviga-800-02.jpg
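
Two of the quantities listed on this slide, path length and clustering coefficient, can be computed directly from an adjacency structure. The sketch below is an illustration in standard-library Python and is not the analysis code behind the slides, which describe an R workflow. It assumes the graph is a dict mapping each author to a set of co-authors, averages path lengths over connected pairs only, and uses the transitivity definition of the global clustering coefficient; all function names are hypothetical.

```python
from collections import Counter, deque


def bfs_distances(graph, source):
    """Shortest-path length (in hops) from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def path_length_distribution(graph):
    """Count shortest-path lengths over all connected unordered pairs."""
    counts = Counter()
    for source in graph:
        for d in bfs_distances(graph, source).values():
            if d > 0:
                counts[d] += 1
    # Every pair was reached once from each end, so halve the counts.
    return Counter({d: c // 2 for d, c in counts.items()})


def average_path_length(graph):
    counts = path_length_distribution(graph)
    pairs = sum(counts.values())
    return sum(d * c for d, c in counts.items()) / pairs if pairs else 0.0


def global_clustering(graph):
    """Transitivity: closed connected triples divided by all connected triples."""
    closed = triples = 0
    for nbrs in graph.values():
        nbrs = list(nbrs)
        for i, a in enumerate(nbrs):
            for b in nbrs[i + 1:]:
                triples += 1          # a - centre - b is a connected triple
                if b in graph[a]:
                    closed += 1       # a and b are also linked
    return closed / triples if triples else 0.0


# Toy graph: a triangle (a, b, c) with one extra author d attached to c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
apl = average_path_length(toy)   # 8 hops over 6 pairs
cc = global_clustering(toy)      # 3 closed triples out of 5
```

A short average path length combined with a high clustering coefficient is the usual signature checked when asking whether a network is a small world.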

SLIDE 26

Analysis: Extract Co-authorships

Challenge: fix irregularities with citations
  regularize .bib file with bibclean (Beebe, 2015)
Load .bib into R via bibtex package (Francois et al., 2017)
Clean author names
  capitalization
  abbreviations
  accents
  spaces
Iterate through citations in bibliography
  extract pairwise author combinations
  append author pairs to edge list

Figure 22: https://commons.wikimedia.org/wiki/File: Sorting_machine_(Census)_LCCN2016823355.jpg
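
The cleaning and pairing steps above can be sketched as follows. The slides describe doing this in R; the Python below is only an illustration, and every function name in it is hypothetical. It assumes each citation's author field has already been read from the .bib file as a single string with authors separated by the BibTeX keyword `and`. Abbreviation handling is limited to dropping periods, and `str.title()` is a blunt capitalization rule (it turns “McCarthy” into “Mccarthy”), which is acceptable here only because the goal is consistent keys rather than display names.

```python
import itertools
import re
import unicodedata


def clean_name(name):
    """Normalize one author name: strip accents, drop periods,
    collapse whitespace, and unify capitalization."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch))
    without_periods = without_accents.replace(".", " ")
    collapsed = re.sub(r"\s+", " ", without_periods).strip()
    return collapsed.title()


def coauthor_pairs(author_field):
    """All unordered author pairs for one citation (empty if single-author)."""
    names = {clean_name(a)
             for a in re.split(r"\s+and\s+", author_field) if a.strip()}
    return list(itertools.combinations(sorted(names), 2))


def build_edge_list(author_fields):
    """Collect author pairs across the whole bibliography.

    A set is used so that repeated collaborations collapse to one edge,
    which matches an unweighted co-authorship network."""
    edges = set()
    for field in author_fields:
        edges.update(coauthor_pairs(field))
    return sorted(edges)


bibliography = [
    "Huberman, Bernardo A. and Adamic, Lada A.",
    "ADAMIC, Lada A and Huberman,  Bernardo A.",   # same pair, messier typing
    "Simon, Herbert A.",                           # single author: no edges
]
edge_list = build_edge_list(bibliography)
```

In this example the first two citations collapse to a single edge after cleaning, and the single-author citation contributes none.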

SLIDE 27

Analysis: Co-authorship Network

convert edge list to R network object

adjacency matrix: canonical coauthorship

Network is undirected

Coauthorship relationships are reciprocal

Edges are unweighted

Not counting number of articles together

Convert to igraph object for graph algorithms

Figure 23: Co-authorship network, visualized
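
To make the “undirected” and “unweighted” choices concrete, here is a minimal sketch of turning an edge list into an adjacency structure. It is written in standard-library Python for illustration and is not the R network or igraph code the slide refers to; the names `adjacency` and `edge_count` are hypothetical.

```python
from collections import defaultdict


def adjacency(edge_list):
    """Undirected, unweighted adjacency: author -> set of co-authors."""
    neighbours = defaultdict(set)
    for a, b in edge_list:
        if a == b:
            continue  # ignore accidental self-pairs
        neighbours[a].add(b)  # undirected: record the tie in both directions
        neighbours[b].add(a)
    return dict(neighbours)


def edge_count(graph):
    """Number of distinct collaborations (each is stored twice, once per end)."""
    return sum(len(nbrs) for nbrs in graph.values()) // 2


graph = adjacency([("A", "B"), ("B", "A"), ("B", "C")])
```

Because neighbours are stored in sets, two authors who wrote several articles together still share exactly one edge, which is what “not counting number of articles together” implies.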

SLIDE 28

Visualization Methods

Obtain .graphml network representation from R
Gephi
  Software for interactive visualization
  Force Layout: Fruchterman-Reingold
  Modularity Coloring
    Manifests recognizable disciplines
HTML/JS Online Viewer
  OII site generator plug-in for Gephi

Figure 24: https://commons.wikimedia.org/wiki/File:Gephi_0.9.1_Network_Analysis_and_Visualization_Software.png

SLIDE 29

Scholarship Synthesis Algorithm

Import .bib and compute adjacency
Identify connected components
  Largest component is always “target”
Visualize (R: faster, Gephi: interactive)
Scholarly search (literature and biography)
Find coauthorship to connect small component
  Otherwise, find coauthorship chain
Re-export bibliography; repeat algorithm

Figure 25: Synthesis algorithm
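
The computational part of this loop (find components, treat the largest as the target, then re-check after new citations are added) can be sketched as below. The search and visualization steps are manual and are not modelled. This is an illustrative standard-library Python sketch, assuming the graph is a dict of author to co-author set; `connected_components`, `synthesis_worklist`, and `add_coauthorship` are hypothetical names.

```python
def connected_components(graph):
    """Return components as sets of authors, largest first."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        component = set()
        stack = [start]
        seen.add(start)
        while stack:
            v = stack.pop()
            component.add(v)
            for w in graph[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(component)
    return sorted(components, key=len, reverse=True)


def synthesis_worklist(graph):
    """Split the network into the target (largest) component and the
    smaller components that still need a connecting co-authorship."""
    components = connected_components(graph)
    if not components:
        return set(), []
    return components[0], components[1:]


def add_coauthorship(graph, a, b):
    """Record a newly found collaboration between authors a and b."""
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"},
         "x": {"y"}, "y": {"x"}}
target, remaining = synthesis_worklist(graph)   # {"x", "y"} is still detached
add_coauthorship(graph, "c", "x")               # a paper linking c and x is found
target, remaining = synthesis_worklist(graph)   # now a single component
```

Each pass shrinks the list of detached components until the remaining ones are judged not worth connecting.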

SLIDE 30

Results

SLIDE 31

Description of Co-authorship Network

Metrics
  Bibliography size: 2435 citations
  3690 authors (vertices/nodes)
  8734 collaborations (edges/links)
  Average path length: 10.43778
  Global clustering coefficient: 0.721
Many authors orbit the main component
  no bridges
Method:
  Layout: Fruchterman-Reingold (1991)
  ggplot2 (Wickham, 2011)
  ggnetwork (Briatte, 2016)

Figure 26: Co-authors layout

SLIDE 32

Description of Main Component Network

Largest connected component
  Extracted subgraph as new graph
  Hereafter: Component

1577 vertices (authors)

component retains 42.8% of authors

5460 edges (collaborations)

component retains 62.6% of collaborations

Average path length: 10.46605

igraph (Csardi and Nepusz, 2006)

Component clustering coefficient: 0.675

Figure 27: Biggest Component

SLIDE 33

Component Path Length Distribution

Figure 28: Small Milgram
Figure 29: Path length distribution

SLIDE 34

Community Detection

Louvain Clustering in igraph (Csardi et al.)
  based on Blondel et al. (2008)
  coded by Tom Gregorovic
Component clustering results
  Modularity: 0.927
  Communities: 36 (a tractable quantity)
Distributed Recursive layout (Martin et al. 2008)
  Node Coloring
  Polygon shapes/coloring
Author names are too numerous to display

Figure 30: Communities
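
Modularity is the quantity the Louvain method tries to maximize, so it helps to see how the score is computed for a given partition. The sketch below evaluates modularity only; it does not implement the Louvain search itself, and it is not igraph's code. It assumes an undirected, unweighted graph stored as a dict of vertex to neighbour set, and a `membership` dict (a hypothetical name) assigning each vertex a community label.

```python
from collections import Counter


def modularity(graph, membership):
    """Q = sum over communities of (internal edges / m) - (degree sum / 2m)^2."""
    two_m = sum(len(nbrs) for nbrs in graph.values())  # 2 * number of edges
    if two_m == 0:
        return 0.0
    internal = Counter()  # internal edge endpoints per community (2 per edge)
    degree = Counter()    # total degree per community
    for v, nbrs in graph.items():
        community = membership[v]
        degree[community] += len(nbrs)
        for w in nbrs:
            if membership[w] == community:
                internal[community] += 1
    return sum(internal[c] / two_m - (degree[c] / two_m) ** 2 for c in degree)


# Two triangles joined by a single edge between vertices 2 and 3.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
         3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
by_triangle = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
q_split = modularity(graph, by_triangle)              # 5/14, about 0.357
q_single = modularity(graph, dict.fromkeys(graph, "all"))  # 0.0
```

Placing every vertex in one community always scores zero, while a partition that follows the dense groups scores higher; values approaching 1 indicate that almost all edges fall inside communities.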

SLIDE 35

Component Community Size Distribution

Stochastic clustering
  Ordering is arbitrary
  Results can change each run
How many authors per community?
  Most have at least 25 members.
Would be surprised if this were systematic
  glad it’s not

Figure 31: Community size distribution

SLIDE 36

Labeling Communities

Produce chart with membership lists for each community
  Each node from the Community Layout
Performed qualitative assessment
  Thematic link that describes community
  Relate authors via publication topics
I assigned keyword labels to each community

Figure 32: https://commons.wikimedia.org/wiki/File:Hello_My_Name_Is_(15283079263).jpg

SLIDE 37

Community Labels 1/3

Ian Dennis Miller co-authors
Complexity
  Computational Social Science
  Agent-Based Modeling
  Systems
Psychology
  Social Psych (Mischel)
  Social Psychology
  Social/Behavioral Economics
  Social Neuroscience
  Social/Biological Psychology
  Social Cognition
  Cognitive Science
Ecology
  Ecological Modeling
  Ecology of Communities

Figure 33: https://commons.wikimedia.org/wiki/File:The_Thinker,_Rodin.jpg

SLIDE 38

Community Labels 2/3

Social Information
  Sociology
  Social Media/Networks
  Social Networks
  Internet Data and Information
  Online Community
  Early Social Computing
  Humans, Computers, and Society
Computing
  Artificial Intelligence
  Network Science
  Algorithms and Systems
  Big Data, Search, Mturk
  Human/Computer Interaction

Figure 34: https://commons.wikimedia.org/wiki/File:Symbolics3640_Modified.JPG

SLIDE 39

Community Labels 3/3

Physics
  Physics/Networks
  Magnetism
  Nuclear Physics, Early Computation
  Physical Experimentalists
  Information, Radar, early AI
  Cyberneticists
Language
Digital Art
Statistics
Design

Figure 35: https://commons.wikimedia.org/wiki/File:MIM-23_HAWK_PAR_radar_2.jpg

SLIDE 40

Centrality Measures

Turn investigation towards Authors
  Who is influential? (who drives clustering)
Network clustering algorithms (Newman and Girvan, 2004)
  Betweenness Centrality of each Author
  Closeness Centrality of each Author
  Expected Influence of each Author
As implemented in qgraph (Costantini et al., submitted)
Kinds of centrality in image: A) Degree B) Betweenness C) Eigenvector D) Closeness (normalized) E) Harmonic Centrality F) Closeness (not normalized)

Figure 36: https://commons.wikimedia.org/wiki/File: Six_centrality_measure.jpg
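
Two of the measures named on this slide, betweenness and closeness, can be computed with breadth-first search on an unweighted graph. The sketch below is a standard-library Python illustration and is not the qgraph implementation the slide cites; Expected Influence is left out because its definition is specific to that package. Betweenness uses Brandes' accumulation scheme, closeness is normalized by the number of reachable vertices, and the graph is assumed to be a dict of author to co-author set. The function names are hypothetical.

```python
from collections import deque


def closeness(graph, source):
    """(reachable vertices) / (sum of distances to them); 0.0 if isolated."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def betweenness(graph):
    """Unnormalized betweenness for an undirected, unweighted graph."""
    score = dict.fromkeys(graph, 0.0)
    for s in graph:
        order = []                          # vertices in BFS visiting order
        preds = {v: [] for v in graph}      # predecessors on shortest paths
        sigma = dict.fromkeys(graph, 0)     # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        for w in reversed(order):           # accumulate from the farthest back
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in score.items()}


# A chain of four authors: a - b - c - d.
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
between = betweenness(chain)    # b and c each sit on two shortest paths
close_b = closeness(chain, "b")
```

In the chain example the two middle authors carry all the betweenness, which mirrors how a few bridging authors can dominate the ranking in a co-authorship network.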

SLIDE 41

Centrality Results, Ranked

Betweenness
  Huberman, Bernardo A.
  Adamic, Lada A.
  Christakis, Nicholas A.
  Cacioppo, John T.
  McCarthy, John
  Simon, Herbert A.
Closeness
  Adamic, Lada A.
  Huberman, Bernardo A.
  Hogg, Tad
  Christakis, Nicholas A.
  Asur, Sitaram
  Adar, Eytan
Expected Influence
  Huberman, Bernardo A.
  Marlow, Cameron
  Christakis, Nicholas A.
  Grimm, Volker
  Adamic, Lada A.
  Railsback, Steven F.

Table of centrality scores, sorted by Expected Influence:

Figure 37: Expected Influence

SLIDE 42

Influential Institutions

Hewlett-Packard, Stanford, Xerox
  Bernardo Huberman: Xerox, Stanford, Hewlett-Packard
  Sitaram Asur: Ohio State, Salesforce, Hewlett-Packard
  Tad Hogg: Caltech, Stanford, Xerox, Hewlett-Packard
Michigan, Facebook, MIT, Yahoo
  Lada Adamic: Michigan, Facebook, Hewlett-Packard
  Cameron Marlow: MIT, Yahoo, Facebook
  Eytan Adar: MIT, UW, Michigan
  Eytan Bakshy: Michigan, Yahoo, Facebook
Classic Academics
  Nicholas Christakis: Yale, Harvard, UPenn
  Volker Grimm: Helmholtz Umweltforschung
  Steven Railsback: Humboldt State
  John McCarthy: Stanford, MIT, Princeton; d. 2011
  John Cacioppo: University of Chicago; d. 2018
  Herbert Simon: Carnegie Mellon; d. 2001

SLIDE 43

Online Interactive Viewer

R networks and graphs are powerful

provides insights regarding structure

but interaction provides different information

tracing paths reveals relationships between communities

OII Gephi network exporter (Hale, Melville, & Kono, 2012)
Online Network Viewer
  http://imiller.utsc.utoronto.ca/media/network

Figure 38: Online Interactive Visualization

SLIDE 44

Summary of Results

Main component
  authors: 1,577 (42.8% of total)
  collaborations: 5,460 (62.6% of total)
  Average path length: 10.46605
  Clustering coefficient: 0.675
Bimodal path length distribution
  length=8 & length=18
Louvain Clustering
  num. communities: 36
  modularity: 0.927 (high)
Centrality
  Authors: Huberman, Adamic, Christakis, Grimm
  Institutions: HP, Stanford, Xerox, Michigan, Facebook, MIT

Figure 39: Path length distribution
Figure 40: Communities of collaboration

SLIDE 45

Discussion

SLIDE 46

Academia is a Small World

Compare to results from Newman (2001)
  path length
    range: (4.0, 9.7)
    observed: 10.449
  clustering coefficient
    range: (0.066, 0.726)
    observed: 0.675
Academia is probably a small world
  Newman (2001) probably generalizes...
  doesn’t matter, per se
  It is not a requirement for analysis
Caveat: not a sample
Future direction: test with ERGM?

Figure 41: https://commons.wikimedia.org/wiki/File:Academia_mosaic_flipped.jpg

SLIDE 47

Longer Path Length

Path length proportional to siloedness?
  or artifact of my manual search?
Network distance corresponds to “interest gap”?
  Semantic distance? Linguistic distance?
10 hops can lead to an unfamiliar literature
  Each hop adapts the science slightly
More hops reduce likelihood of epidemic
  “Weak ties” probably exist
Computational social science requires 10 levels of abstraction to discuss?

Figure 42: https://commons.wikimedia.org/wiki/File:Mind-the-gap-toronto.jpg

SLIDE 48

Utility of Scholarly “Silos”

Clustering reduces distance
  increases efficiency
A Discipline is a low-energy communication channel
Disciplines (silos) are intellectual “desire paths”
  reinforcement of link weights
  a form of network learning
co-authorship: indicator of knowledge structures
  knowledge structure is the more fundamental construct

Figure 43: https://commons.wikimedia.org/wiki/File:Lantm%C3%A4nnens_silo_i_Falk%C3%B6ping_0923.jpg

SLIDE 49

Institutions

Some speculation:
  Hewlett-Packard had something special
  So did Yahoo and Xerox
    (yet again, Xerox was sitting on a gold mine)
  Stanford, MIT, Michigan
Institutions lower the cost of cross-silo communication
  Increased chance of intellectual “cross-pollination”
Network analysis of co-affiliation instead of co-authorship

Figure 44: https://commons.wikimedia.org/wiki/File:UCBerkeley_Campanile_Sather_Tower.jpg

SLIDE 50

Time Dynamics

Graph centrality could favor recent publications
  Rate of collaboration seems to increase by year
  More co-authors per paper, over time
Academic society a century ago was insular
  Membership was required for publication
Investigate historical basis for my bibliography

Figure 45: https://commons.wikimedia.org/wiki/File:Czech-2013-Prague-Astronomical_clock_face.jpg

SLIDE 51

When to Stop

Because I was satisfied
  2400 citations produced a coherent picture
Sources of frustration:
  Trying to connect other components
  Rate limits from search engines
Compare to Newman (2001)
  Clustering Coefficient, Path Length

Figure 46: https://commons.wikimedia.org/wiki/File:MUTCD_S3-1_(old).svg

SLIDE 52

Conclusion

SLIDE 53

Scientific Embeddedness

Where am I in the network?
  Unsurprisingly: with co-authors
  Situated in Social Psych literature

Social network sciences at center of graph

Next “ring”: those who applied network methods

Cacioppo and Simon connect Psychology

Figure 47: Scientific Embeddedness

SLIDE 54

Future Directions

Two paths
  deeper into bibliometrics
  deeper into computational social science
Individual academics
  Website, CV, LinkedIn, University Biography
Identification of communities
  Handbook of X
  Lexicon; Nomenclature
  Conferences and Journals
and so on...

Figure 48: https://commons.wikimedia.org/wiki/File:Alan_Hersey_Nature_Reserve_path_fork.JPG

SLIDE 55

Computational Social Psychology

Produced new knowledge
  familiarity with the literature
  insights from network structure
Application to psychological modeling
  Grimm and Railsback
  Ecology and Individual-based modeling
Next step: apply to memes
  urban legend modeling

Figure 49: https://commons.wikimedia.org/wiki/File:NASCAR_Nationwide_Rain_Tire_2014_Road_America.jpg

SLIDE 56

Thank you

Ian Dennis Miller
PhD Candidate
Psychology Department
University of Toronto
twitter: @iandennismiller
email: i.miller@utoronto.ca
lab: https://www.sisrlab.com
web: http://imiller.utsc.utoronto.ca

Figure 50: Photograph by Geoff MacDonald
