RECSM Summer School: Social Media and Big Data Research


slide-1
SLIDE 1

RECSM Summer School: Social Media and Big Data Research

Pablo Barberá
London School of Economics
www.pablobarbera.com

Course website: pablobarbera.com/social-media-upf

slide-2
SLIDE 2

Discovery in Large-Scale Social Media Data

slide-3
SLIDE 3

Human behaviour is characterized by connections to others

slide-4
SLIDE 4

Digital technologies have led to an explosion in the availability of networked data
slide-5
SLIDE 5

Moreno, “Who Shall Survive?” (1934)

slide-6
SLIDE 6

Moreno, “Who Shall Survive?” (1934)

slide-7
SLIDE 7

Moreno, “Who Shall Survive?” (1934)

slide-8
SLIDE 8

Moreno, “Who Shall Survive?” (1934)

slide-9
SLIDE 9

Christakis & Fowler, NEJM, 2007

slide-10
SLIDE 10

Adamic & Glance, 2004, IWLD

slide-11
SLIDE 11

Email network of a company

slide-12
SLIDE 12

Barberá et al, 2015, Psychological Science

slide-17
SLIDE 17

(Quick) introduction to social network analysis

What we will cover:

◮ Familiarity with the language of social network analysis
◮ Two key dimensions to analyze:
    ◮ Centrality: who is most influential in a network?
    ◮ Structure: how to discover communities in a network?
◮ Characteristics of networks that emerge in digital environments, such as social media sites

slide-25
SLIDE 25

Basic concepts

◮ Node (vertex): each of the units in the network
◮ Edge (tie): connection between nodes
    ◮ Undirected: symmetric connection, represented by lines
    ◮ Directed: implies direction, represented by arrows
    ◮ Unweighted: all edges have the same strength
    ◮ Weighted: some edges have more strength than others
◮ A network consists of a set of nodes and edges, i.e. a set of actors and their relationships

slide-26
SLIDE 26

Basic concepts

Network Visualization (node labels: Jennifer, Josh, Evgeniia, Whitney, Tom)

Adjacency Matrix

     P  J  E  W  T
P    0  1  1  0  0
J    1  0  0  1  1
E    1  0  0  1  0
W    0  1  1  0  1
T    0  1  0  1  0

slide-27
SLIDE 27

Basic concepts

Network Visualization (node labels: Jennifer, Josh, Evgeniia, Whitney, Tom)

Edgelist

     Node1      Node2
1    Paul       Josh
2    Paul       Evgeniia
3    Josh       Whitney
4    Josh       Tom
5    Whitney    Tom
6    Evgeniia   Whitney
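The two representations carry the same information. As a minimal sketch, assuming the network is undirected and using the edgelist from this slide, the adjacency matrix can be rebuilt as follows (the variable names are illustrative, not part of the course materials):

```python
# Edgelist copied from the slide; the network is treated as undirected.
edgelist = [
    ("Paul", "Josh"),
    ("Paul", "Evgeniia"),
    ("Josh", "Whitney"),
    ("Josh", "Tom"),
    ("Whitney", "Tom"),
    ("Evgeniia", "Whitney"),
]

# Node order follows the column order of the adjacency matrix (P, J, E, W, T).
nodes = ["Paul", "Josh", "Evgeniia", "Whitney", "Tom"]
index = {name: i for i, name in enumerate(nodes)}

# Start from an all-zero matrix and mark each edge in both directions.
adjacency = [[0] * len(nodes) for _ in nodes]
for a, b in edgelist:
    adjacency[index[a]][index[b]] = 1
    adjacency[index[b]][index[a]] = 1

# In an undirected network, a node's degree is the sum of its row.
degree = {name: sum(adjacency[index[name]]) for name in nodes}

for name, row in zip(nodes, adjacency):
    print(name[0], row)
```

For a directed network only the first assignment inside the loop would be kept, so the matrix would no longer be symmetric.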

slide-33
SLIDE 33

Types of social media networks

◮ Internet: websites / hyperlinks
◮ Twitter: users / retweets
◮ Twitter: users / following connections
◮ Twitter: hashtags / co-appearance
◮ Facebook: friends / friendship connections
◮ Reddit: subreddits / users in common
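Each item in this list pairs a node type with an edge type. As a rough illustration of the hashtag case, the sketch below builds a weighted co-appearance edgelist. The tweets and hashtags are hypothetical, invented only for this example:

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: the set of hashtags used in each of four tweets.
tweets = [
    {"#election", "#debate"},
    {"#election", "#debate", "#vote"},
    {"#debate", "#vote"},
    {"#election"},
]

# Nodes are hashtags; an edge links two hashtags that appear in the same tweet.
# The edge weight counts how many tweets contain both hashtags.
weights = Counter()
for tags in tweets:
    # Sorting gives each undirected pair a single canonical key.
    for a, b in combinations(sorted(tags), 2):
        weights[(a, b)] += 1

for (a, b), w in sorted(weights.items()):
    print(a, b, w)
```

The result is an undirected, weighted network. A retweet network would instead be directed, with one edge from the retweeting user to the retweeted user.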

slide-34
SLIDE 34

Social network analysis: key dimensions of analysis

slide-44
SLIDE 44

Node centrality

How to measure actor influence or importance in a network?

Two main conceptual definitions of centrality:

1. Degree centrality: number of connections for each node (potential for direct reach)
    ◮ Indegree: incoming connections
    ◮ Outdegree: outgoing connections
2. Betweenness centrality: gatekeeping potential
    ◮ How well a node connects different parts of the network
    ◮ Fraction of shortest paths between any two nodes on which a particular node lies

→ Other measures:
    ◮ Closeness centrality: broadcasting potential
    ◮ Eigenvector centrality and coreness: centrality measured as being connected to other central neighbors
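The two main measures can be sketched on the five-person network from the Basic concepts slides. This is a minimal illustration, assuming that network is undirected (so indegree and outdegree both equal degree). Betweenness is computed here with Brandes' algorithm, one common approach and not necessarily the one used in the course. Function and variable names are illustrative:

```python
from collections import deque

edges = [("Paul", "Josh"), ("Paul", "Evgeniia"), ("Josh", "Whitney"),
         ("Josh", "Tom"), ("Whitney", "Tom"), ("Evgeniia", "Whitney")]

# Neighbor sets for an undirected network.
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

# Degree centrality: number of connections for each node.
degree = {v: len(nbrs) for v, nbrs in adj.items()}

def betweenness(adj):
    """Unnormalized betweenness for an unweighted, undirected graph."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path shares.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair of endpoints was visited from both sides.
    return {v: b / 2 for v, b in score.items()}

between = betweenness(adj)
for v in adj:
    print(v, degree[v], between[v])
```

In this network Josh and Whitney have the highest degree (3) and the highest betweenness (1.5), while Tom lies on no shortest path between other nodes (0). Dividing by the number of node pairs that exclude the node, (n-1)(n-2)/2 = 6 here, rescales betweenness to the 0 to 1 range.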

slide-45
SLIDE 45

Florentine family marriages in the 15th century

Source: Padgett (1993) and Sinclair (2016)

slide-46
SLIDE 46

Occupy Wall Street Twitter networks

Source: Lotan (2011)

slide-47
SLIDE 47

Protest networks on Twitter

Source: González-Bailón et al (2013)
slide-48
SLIDE 48

Occupy Wall Street Twitter networks

Source: González-Bailón and Wang (2016)
slide-53
SLIDE 53

Discovery in large-scale networks

How to understand the structure of large-scale networks?

◮ Latent communities or clusters
    ◮ Community detection algorithms
    ◮ Finding groups of nodes that are densely connected internally, more so than to the rest of the network
    ◮ Overlap with shared visible or latent similarities (homophily)
    ◮ Also hierarchy: core-periphery detection

slide-54
SLIDE 54

Community detection

Community structure:

◮ Network nodes often cluster into tightly-knit groups with a high density of within-group edges and a lower density of between-group edges
◮ Modularity score: measures clustering of nodes compared to random network of same size
◮ Many different community detection algorithms based on different assumptions

Source: Newman (2012)
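To make the modularity score concrete, here is a minimal sketch. It assumes the commonly used Newman-Girvan definition, in which the random benchmark keeps each node's degree fixed: Q is the sum over communities of L_c / m - (d_c / 2m)^2, where m is the number of edges, L_c the number of edges inside community c, and d_c the total degree of its nodes. The graph (two triangles joined by one edge) and the node names are hypothetical:

```python
def modularity(edges, community):
    """Modularity of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = {}   # number of edges with both endpoints in community c
    total_deg = {}  # sum of the degrees of the nodes in community c
    for a, b in edges:
        ca, cb = community[a], community[b]
        total_deg[ca] = total_deg.get(ca, 0) + 1
        total_deg[cb] = total_deg.get(cb, 0) + 1
        if ca == cb:
            internal[ca] = internal.get(ca, 0) + 1
    # Observed share of within-community edges minus the expected share.
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in total_deg.items()
    )

# Hypothetical graph: triangles a-b-c and d-e-f joined by the edge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {v: 0 for v in "abcdef"}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
print(q_two, q_one)
```

Splitting the graph into its two triangles gives Q = 5/14, about 0.36, while putting every node in a single community gives Q = 0. Community detection algorithms that optimize modularity search for the partition with the highest score.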

slide-64
SLIDE 64

Network hierarchy

◮ Intuition
    ◮ Large-scale networks have hierarchical properties
    ◮ Network core:
        1. Centrality: high relative importance in network
        2. Connectivity: many possible distinct paths between individuals (not captured by simple topological measures)
◮ k-core decomposition
    ◮ Algorithm to partition a network in nested shells of connectivity
    ◮ The k-core of a graph is the maximal subgraph in which every node has at least degree k
    ◮ Many applications; scales well to large networks.
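The definition above can be turned into a simple peeling procedure: repeatedly remove the node with the lowest remaining degree and record the largest such degree seen so far. The sketch below is a minimal, unoptimized version under that assumption (faster bucket-based variants exist). The graph and node names are hypothetical:

```python
def core_numbers(adj):
    """Return, for each node, the largest k such that it belongs to the k-core."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel off the remaining node with the smallest current degree.
        v = min(remaining, key=lambda u: degree[u])
        # The shell index never decreases as peeling proceeds.
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core

# Hypothetical graph: a 4-clique (a, b, c, d), node e tied to a and b,
# and node f tied only to e.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"),
         ("c", "d"), ("e", "a"), ("e", "b"), ("f", "e")]

adj = {}
for x, y in edges:
    adj.setdefault(x, set()).add(y)
    adj.setdefault(y, set()).add(x)

core = core_numbers(adj)
print(core)
```

Here the clique forms the 3-core, e falls in the 2-shell, and f in the 1-shell, which mirrors the nested structure described in the list.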

slide-65
SLIDE 65

k-core decomposition

Source: Alvarez-Hamelin et al, 2005

slide-66
SLIDE 66

k-core decomposition

Figure labels: 3-core, 2-core, 1-core

Source: Alvarez-Hamelin et al, 2005

slide-67
SLIDE 67

k-core decomposition of #OccupyGezi network

Figure labels: 1-shell, 2-shell, 3-shell, 20-shell, 40-shell, 60-shell, 80-shell, 100-shell, 120-shell; periphery, core; activity (no. of tweets), min to max; in Taksim (18%, .25%); RTs (periphery to core, periphery to periphery)