MY560 Workshop: Collecting and Analyzing Social Media Data



slide-1
SLIDE 1

MY560 Workshop: Collecting and Analyzing Social Media Data

Pablo Barberá
London School of Economics
www.pablobarbera.com

Workshop website: pablobarbera.com/social-media-workshop

slide-15
SLIDE 15

◮ 62% of Americans get news on social media (Pew)
◮ 27% of online EU citizens use social media to get news on national political matters (Eurobarometer, Fall 2012)
◮ Social media: top source of news for U.S. young adults (Pew)

slide-17
SLIDE 17

Shift in communication patterns

Digital footprints of human behavior

slide-18
SLIDE 18

Hello!

slide-29
SLIDE 29

About me

◮ Assistant Professor of Computational Social Science in the Methodology Department at LSE
◮ Previously Assistant Prof. at Univ. of Southern California
◮ PhD in Politics, New York University (2015)
◮ Data Science Fellow at NYU, 2015–2016
◮ My research:
  ◮ Social media and politics, comparative electoral behavior, corruption and accountability
  ◮ Social network analysis, Bayesian statistics, text as data methods
  ◮ Author of R packages to analyze data from social media
◮ Contact:
  ◮ P.Barbera@lse.ac.uk
  ◮ www.pablobarbera.com

slide-38
SLIDE 38

Today’s workshop

Session 1, 10–12:00

◮ Social media research: opportunities and challenges
◮ Guided coding session: collecting Twitter data from the Streaming API
◮ Challenge 1: interacting with Twitter’s Streaming API

Session 2, 14–16:00

◮ Guided coding session: Collecting Twitter data from the REST API
◮ Coding challenge 2: Twitter’s REST API
◮ Guided coding session: Collecting Facebook data from the Graph API
◮ Application: Dictionary methods applied to social media
◮ Coding challenge 3: Facebook’s Graph API

slide-41
SLIDE 41

Social media research

Two different approaches in the growing field of social media research:

1. Social media as a new source of information
  ◮ Behavior, opinions, and latent traits
  ◮ Interpersonal networks
  ◮ Elite behavior
  ◮ Affordable field experiments
2. How social media affects social behavior
  ◮ Collective action and social movements
  ◮ Political campaigns
  ◮ Social capital and interpersonal communication
  ◮ Political attitudes and behavior

slide-43
SLIDE 43

Behavior, opinions, and latent traits

◮ Digital footprints: check-ins, conversations, geolocated pictures, likes, shares, retweets, ...
→ Non-intrusive measurement of behavior and public opinion

Beauchamp (AJPS 2016): “Predicting and Interpolating State-level Polls using Twitter Textual Data”

slide-45
SLIDE 45

Behavior, opinions, and latent traits

◮ Digital footprints: check-ins, conversations, geolocated pictures, likes, shares, retweets, ...
→ Non-intrusive measurement of behavior and public opinion
→ Inference of latent traits: political knowledge, ideology, personal traits, socially undesirable behavior, ...

Kosinski et al, 2013, “Private traits and attributes are predictable from digital records of human behavior”, PNAS (also personality, PNAS 2015)

slide-47
SLIDE 47

Behavior, opinions, and latent traits

◮ Digital footprints: check-ins, conversations, geolocated pictures, likes, shares, retweets, ...
→ Non-intrusive measurement of behavior and public opinion
→ Inference of latent traits: political knowledge, ideology, personal traits, socially undesirable behavior, ...

[Figure: θi, Twitter-based ideology estimates (Dem. to Rep., −2 to 2), by 2012 registration history: party (# elections registered Dem. − # elections registered Rep.), binned from <−5 to >+5. Data: 2,360 Twitter accounts, matched with Ohio voter file.]

Barberá, 2015, “Birds of the Same Feather Tweet Together. Bayesian Ideal Point Estimation Using Twitter Data”, Political Analysis

slide-48
SLIDE 48

Estimating political ideology using Twitter networks

[Figure: position on latent ideological scale (−2 to 2) of @nytimes, @msnbc, @HillaryClinton, @POTUS, @MotherJones, @SenSanders, @tedcruz, @RealBenCarson, @RandPaul, @JohnKasich, @marcorubio, @DRUDGE_REPORT, @GrahamBlog, @JebBush, @FoxNews, @GovChristie, @CarlyFiorina, @realDonaldTrump, @WSJ, and the average Twitter user.]

Barberá, “Who is the most conservative Republican candidate for president?”, The Monkey Cage / The Washington Post, June 16 2015

slide-50
SLIDE 50

Interpersonal networks

◮ Political behavior is social, strongly influenced by peers

Bond et al, 2012, “A 61-million-person experiment in social influence and political mobilization”, Nature

slide-52
SLIDE 52

Interpersonal networks

◮ Political behavior is social, strongly influenced by peers
◮ Costly to measure network structure
◮ High overlap across online and offline social networks

Jones et al, 2013, “Inferring Tie Strength from Online Directed Behavior”, PLOS One

slide-53
SLIDE 53

Interpersonal networks

◮ Political behavior is social, strongly influenced by peers
◮ Costly to measure network structure
◮ High overlap across online and offline social networks
◮ Online and offline ties are similar in nature

slide-55
SLIDE 55

Elite behavior

◮ Authoritarian governments’ response to threat of collective action

King et al, 2013, “How Censorship in China Allows Government Criticism but Silences Collective Expression”, APSR

slide-57
SLIDE 57

Elite behavior

◮ Authoritarian governments’ response to threat of collective action
◮ Estimation of conflict intensity in real time
◮ How elected officials communicate with constituents

slide-59
SLIDE 59

Affordable field experiments

slide-63
SLIDE 63

#OccupyGezi #Euromaidan #OccupyWallStreet #Indignados

slide-64
SLIDE 64

slacktivism?

slide-66
SLIDE 66

Why the revolution will not be tweeted

When the sit-in movement spread from Greensboro throughout the South, it did not spread indiscriminately. It spread to those cities which had preexisting “movement centers” – a core of dedicated and trained activists ready to turn the “fever” into action. The kind of activism associated with social media isn’t like this at all. [...] Social networks are effective at increasing participation – by lessening the level of motivation that participation requires.
Gladwell, Small Change (New Yorker)

You can’t simply join a revolution any time you want, contribute a comma to a random revolutionary decree, rephrase the guillotine manual, and then slack off for months. Revolutions prize centralization and require fully committed leaders, strict discipline, absolute dedication, and strong relationships. When every node on the network can send a message to all other nodes, confusion is the new default equilibrium.
Morozov, The Net Delusion: The Dark Side of Internet Freedom

slide-72
SLIDE 72

The critical periphery

◮ Structure of online protest networks:
  1. Core: committed minority of resourceful protesters
  2. Periphery: majority of less motivated individuals
◮ Our argument: key role of peripheral participants
  1. Increase reach of protest messages (positional effect)
  2. Large contribution to overall activity (size effect)
slide-73
SLIDE 73

k-core decomposition of #OccupyGezi network

[Figure: shells from the 1-shell (periphery) to the 120-shell (core); shading by activity (no. of tweets, min to max); labels for share in Taksim (18%, .25%) and for RTs from periphery to core and from periphery to periphery.]
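The idea behind the k-core decomposition can be illustrated with a small sketch. It is not the authors' code or data: the function names (`core_numbers`, `shells`) and the toy edge list are hypothetical, and the network is treated as undirected for simplicity, which is an assumption. The sketch repeatedly removes the least-connected remaining account; the shell of an account is the largest k for which it still belongs to a subgraph where everyone has at least k ties.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {node: k}, where k is the highest k-core the node belongs to.

    Assumes an undirected network given as (u, v) pairs; self-loops are ignored.
    """
    adjacency = defaultdict(set)
    for u, v in edges:
        if u != v:
            adjacency[u].add(v)
            adjacency[v].add(u)

    degree = {node: len(neighbors) for node, neighbors in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        # Peel off the node with the fewest ties among the nodes still present.
        node = min(remaining, key=lambda n: degree[n])
        # The shell index never decreases as the peeling moves inward.
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for neighbor in adjacency[node]:
            if neighbor in remaining:
                degree[neighbor] -= 1
    return core


def shells(core):
    """Group nodes by shell index: {k: sorted list of nodes in the k-shell}."""
    grouped = defaultdict(list)
    for node, k in core.items():
        grouped[k].append(node)
    return {k: sorted(nodes) for k, nodes in grouped.items()}


# Hypothetical toy network: four densely connected accounts (a-d)
# plus three accounts (p1-p3) that are each tied to a single core account.
toy_edges = [
    ("a", "b"), ("a", "c"), ("a", "d"),
    ("b", "c"), ("b", "d"), ("c", "d"),
    ("p1", "a"), ("p2", "a"), ("p3", "b"),
]

if __name__ == "__main__":
    print(shells(core_numbers(toy_edges)))
```

In this toy network the four mutually connected accounts fall in the 3-shell and the three loosely attached accounts in the 1-shell, mirroring the core and periphery distinction on the slide at a much smaller scale.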

slide-74
SLIDE 74

Relative importance of core and periphery

reach: aggregate size of participants’ audience
activity: total number of protest messages published (not only RTs)
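The two measures above can be sketched as simple group totals. This is only an illustration under stated assumptions: reach is approximated here as the sum of follower counts (ignoring overlapping audiences), the field names (`group`, `followers`, `tweets`) and the function `relative_importance` are hypothetical, and the numbers are invented for the example.

```python
def relative_importance(users):
    """Return each group's share of total reach and of total activity.

    Each user is a dict with hypothetical keys:
      "group"     - e.g. "core" or "periphery"
      "followers" - audience size, summed as a rough proxy for reach
      "tweets"    - number of protest messages published
    """
    totals = {}
    for user in users:
        group = totals.setdefault(user["group"], {"reach": 0, "activity": 0})
        group["reach"] += user["followers"]
        group["activity"] += user["tweets"]

    total_reach = sum(t["reach"] for t in totals.values())
    total_activity = sum(t["activity"] for t in totals.values())
    return {
        name: {
            "reach_share": t["reach"] / total_reach if total_reach else 0.0,
            "activity_share": t["activity"] / total_activity if total_activity else 0.0,
        }
        for name, t in totals.items()
    }


# Invented numbers: one very active core account and three peripheral accounts.
toy_users = [
    {"group": "core", "followers": 1000, "tweets": 50},
    {"group": "periphery", "followers": 200, "tweets": 10},
    {"group": "periphery", "followers": 300, "tweets": 20},
    {"group": "periphery", "followers": 500, "tweets": 20},
]

if __name__ == "__main__":
    print(relative_importance(toy_users))
```

With these invented numbers the periphery matches the core on both measures even though each peripheral account is individually small, which is the kind of size effect the argument points to.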

slide-75
SLIDE 75

Peripheral mobilization during the Arab Spring

Steinert-Threlkeld (APSR 2017) “Spontaneous Collective Action”

slide-87
SLIDE 87

Political persuasion

Social media as a new campaign tool:

“Let me tell you about Twitter. I think that maybe I wouldn’t be here if it wasn’t for Twitter. [...] Twitter is a wonderful thing for me, because I get the word out... I might not be here talking to you right now as president if I didn’t have an honest way of getting the word out.” Donald Trump, March 16, 2017 (Fox News)

◮ Diminished gatekeeping role of journalists
  ◮ Part of a trend towards citizen journalism (Goode, 2009)
◮ Information is contextualized within social layer
  ◮ Messing and Westwood (2012): social cues can be as important as partisan cues to explain news consumption through social media
◮ Real-time broadcasting in reaction to events
  ◮ e.g. dual screening (Vaccari et al, 2015)
◮ Micro-targeting
  ◮ Affects how campaigns perceive voters (Hersh, 2015), but unclear if effective in mobilizing or persuading voters

slide-91
SLIDE 91

Social capital

◮ Social connections are essential in democratic societies, but online interactions do not facilitate creation and strengthening of social capital (Putnam, 2001)
◮ Online networking sites facilitate and transform how social ties are established

slide-95
SLIDE 95

Social media as echo chambers?

◮ communities of like-minded individuals (homophily, influence)

Adamic and Glance (2005); Conover et al (2012)

◮ ...generates selective exposure to congenial information
◮ ...reinforced by ranking algorithms – “filter bubble” (Pariser)
◮ ...increases political polarization (Sunstein, Prior)

slide-96
SLIDE 96

Social media as echo chambers?

[Figure panels: 2013 Super Bowl; 2012 Election]

Barberá et al (2015) “Tweeting From Left to Right: Is Online Political Communication More Than an Echo Chamber?” Psychological Science

slide-97
SLIDE 97

Social media as echo chambers?

Bakshy, Messing, & Adamic (2015) “Exposure to ideologically diverse news and opinion on Facebook”. Science.

slide-98
SLIDE 98

Social media and democracy

“How can one technology – social media – simultaneously give rise to hopes for liberation in authoritarian regimes, be used for repression by these same regimes, and be harnessed by antisystem actors in democracy? We present a simple framework for reconciling these contradictory developments based on two propositions: 1) that social media give voice to those previously excluded from political discussion by traditional media, and 2) that although social media democratize access to information, the platforms themselves are neither inherently democratic nor nondemocratic, but represent a tool political actors can use for a variety of goals, including, paradoxically, illiberal goals.” Journal of Democracy, 2017

slide-100
SLIDE 100

What are the most important challenges when working with social media data?

slide-106
SLIDE 106

Social media data and social science: challenges

1. Big data, big bias?
2. The end of theory?
3. Spam and bots
4. The privacy paradox
5. Generalizing from online to offline behavior
6. Ethical concerns
slide-107
SLIDE 107

1. Big data, big bias?

Ruths and Pfeffer, 2015, “Social media for large studies of behavior”, Science

slide-114
SLIDE 114

Big data, big bias?

Sources of bias (Ruths and Pfeffer, 2015; Lazer et al, 2017)

◮ Population bias

◮ Sociodemographic characteristics are correlated with

presence on social media

◮ Self-selection within samples

◮ Partisans more likely to post about politics (Barber´

a & Rivero, 2014)

◮ Proprietary algorithms for public data

◮ Twitter API does not always return 100% of publicly available

tweets (Morstatter et al, 2014)

◮ Human behavior and online platform design

slide-115
SLIDE 115

Big data, big bias?

Sources of bias (Ruths and Pfeffer, 2015; Lazer et al, 2017)

◮ Population bias
  ◮ Sociodemographic characteristics are correlated with presence on social media
◮ Self-selection within samples
  ◮ Partisans more likely to post about politics (Barberá & Rivero, 2014)
◮ Proprietary algorithms for public data
  ◮ Twitter API does not always return 100% of publicly available tweets (Morstatter et al, 2014)
◮ Human behavior and online platform design
  ◮ e.g. Google Flu (Lazer et al, 2014)
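
A toy calculation of how population bias can distort an estimate. All numbers and group names below are made up for illustration; they do not come from any of the cited studies. If platform usage is correlated with age, and age is correlated with the opinion being measured, the average among platform users differs from the population average.

```python
# Hypothetical two-group population (illustrative numbers only).
groups = [
    {"name": "under 35", "pop_share": 0.3, "usage_rate": 0.4, "support": 0.7},
    {"name": "35 and over", "pop_share": 0.7, "usage_rate": 0.1, "support": 0.4},
]


def population_mean(groups):
    """True average support, weighting each group by its population share."""
    return sum(g["pop_share"] * g["support"] for g in groups)


def platform_mean(groups):
    """Average support among platform users only.

    Each group's weight is its population share times its usage rate,
    so heavy-usage groups are over-represented.
    """
    weights = [g["pop_share"] * g["usage_rate"] for g in groups]
    weighted = sum(w * g["support"] for w, g in zip(weights, groups))
    return weighted / sum(weights)


true_value = population_mean(groups)
observed_value = platform_mean(groups)
print(f"population: {true_value:.3f}  platform users: {observed_value:.3f}")
```

Under these assumed numbers the platform sample overstates support, because the group that uses the platform more also supports the position more.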

slide-118
SLIDE 118
  • 2. The end of theory?

Petabytes allow us to say: “Correlation is enough.” We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.

Chris Anderson, Wired, June 2008

Correlations are a way of catching a scientist’s attention, but the models and mechanisms that explain them are how we make the predictions that not only advance science, but generate practical applications.

John Timmer, Ars Technica, June 2008

(Big) social media data as a complement - not a substitute - for theoretical work and careful causal inference.

slide-119
SLIDE 119
  • 3. Spam and bots

“Follow your coordinators. We need to start tweeting, all at the same time, using the hashtag #ItsTimeForMexico... and don’t forget to retweet tweets from the candidate’s account...”

Unidentified PRI campaign manager, minutes before the May 8, 2012 Mexican Presidential debate

slide-120
SLIDE 120
  • 3. Spam and bots

Ferrara et al, 2016, Communications of the ACM

slide-121
SLIDE 121
  • 4. The privacy paradox

Online data present a paradox in the protection of privacy: Data are at once too revealing in terms of privacy protection, yet also not revealing enough in terms of providing the demographic background information needed by social scientists.

Golder & Macy, Digital footprints, 2014

slide-124
SLIDE 124
  • 5. Generalizing from online to offline behavior

What makes online behavior different:

◮ Platform affordances may distort behavior
◮ Tools extend innate capacities (e.g. Dunbar’s number)
◮ Anonymity encourages vitriol

slide-128
SLIDE 128
  • 6. Ethical concerns
  • 1. Shifting notion of informed consent
  • 2. Most personal data can be de-anonymized
  • 3. Inequalities in data access

“Ethical concerns must be weighed against the value of social research with appropriate steps taken to protect individual privacy” (Shah et al, 2015)

slide-129
SLIDE 129

Twitter data

slide-142
SLIDE 142

Twitter APIs

Two different methods to collect Twitter data:

  • 1. REST API:

◮ Queries for specific information about users and tweets
◮ Search recent tweets
◮ Examples: user profile, list of followers and friends, tweets generated by a given user (“timeline”), user lists, etc.
◮ R library: tweetscores (also twitteR, rtweet)

  • 2. Streaming API:

◮ Connect to the “stream” of tweets as they are being published
◮ Three streaming APIs:

2.1 Filter stream: tweets filtered by keywords
2.2 Geo stream: tweets filtered by location
2.3 Sample stream: 1% random sample of tweets

◮ R library: streamR

Important limitation: tweets can only be downloaded in real time (exception: user timelines, for which the ∼3,200 most recent tweets are available)

slide-143
SLIDE 143

Anatomy of a tweet

slide-144
SLIDE 144

Anatomy of a tweet

Tweets are stored in JSON format:

{
  "created_at": "Wed Nov 07 04:16:18 +0000 2012",
  "id": 266031293945503744,
  "text": "Four more years. http://t.co/bAJE6Vom",
  "source": "web",
  "user": {
    "id": 813286,
    "name": "Barack Obama",
    "screen_name": "BarackObama",
    "location": "Washington, DC",
    "description": "This account is run by Organizing for Action staff. Tweets from the President are signed -bo.",
    "url": "http://t.co/8aJ56Jcemr",
    "protected": false,
    "followers_count": 54873124,
    "friends_count": 654580,
    "listed_count": 202495,
    "created_at": "Mon Mar 05 22:08:25 +0000 2007",
    "time_zone": "Eastern Time (US & Canada)",
    "statuses_count": 10687,
    "lang": "en"
  },
  "coordinates": null,
  "retweet_count": 756411,
  "favorite_count": 288867,
  "lang": "en"
}
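
A minimal sketch of reading a tweet in this JSON format, using only Python's standard library. The string `raw` is an abbreviated copy of the tweet shown on this slide, and `summarize_tweet` is a hypothetical helper written for this example, not part of any Twitter library.

```python
import json
from datetime import datetime

# Abbreviated copy of the tweet shown on this slide (subset of fields).
raw = """{
  "created_at": "Wed Nov 07 04:16:18 +0000 2012",
  "id": 266031293945503744,
  "text": "Four more years. http://t.co/bAJE6Vom",
  "user": {"screen_name": "BarackObama", "followers_count": 54873124},
  "coordinates": null,
  "retweet_count": 756411,
  "lang": "en"
}"""

# Layout of the created_at field as it appears in the example above.
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def summarize_tweet(raw_json):
    """Parse one tweet and keep a few fields that are commonly analyzed."""
    tweet = json.loads(raw_json)
    return {
        "id": tweet["id"],
        "screen_name": tweet["user"]["screen_name"],  # user data is nested
        "created_at": datetime.strptime(tweet["created_at"], TWITTER_TIME_FORMAT),
        "text": tweet["text"],
        "retweet_count": tweet["retweet_count"],
        # JSON null becomes None: this tweet carries no geolocation.
        "has_coordinates": tweet["coordinates"] is not None,
    }


summary = summarize_tweet(raw)
print(summary["screen_name"], summary["created_at"].isoformat())
```

Note that user information is nested inside each tweet, so the same user profile is repeated in every tweet that user posts.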

slide-153
SLIDE 153

Streaming API

◮ Recommended method to collect tweets
◮ Potential issues:
  ◮ Filter streams have the same rate limit as the spritzer (the 1% sample stream): when the volume of matching tweets reaches 1% of all tweets, the stream returns a random sample
  ◮ Stream connections tend to die spontaneously. Restart regularly.
◮ My workflow:
  ◮ Amazon EC2, cloud computing
  ◮ Cron jobs to restart R scripts every hour.
  ◮ Save tweets in .json files, one per day.
  ◮ Will show some examples later
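
One possible way to implement the "one .json file per day" step, sketched in Python rather than the R scripts used in the workshop. The file-naming scheme, the `daily_path` and `append_tweet` helpers, and the three toy tweets are all assumptions made for this example. Files are opened in append mode so that an hourly restart adds to the current day's file instead of overwriting it.

```python
import json
import os
import tempfile
from datetime import datetime

TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def daily_path(tweet, out_dir, prefix="tweets"):
    """Hypothetical naming scheme: <prefix>-YYYY-MM-DD.json, from created_at."""
    created = datetime.strptime(tweet["created_at"], TWITTER_TIME_FORMAT)
    return os.path.join(out_dir, f"{prefix}-{created:%Y-%m-%d}.json")


def append_tweet(tweet, out_dir):
    """Append one tweet as a single JSON line to that day's file."""
    path = daily_path(tweet, out_dir)
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(tweet) + "\n")
    return path


# Toy tweets spanning midnight (made-up content).
tweets = [
    {"id": 1, "created_at": "Wed Nov 07 04:16:18 +0000 2012", "text": "a"},
    {"id": 2, "created_at": "Wed Nov 07 23:59:59 +0000 2012", "text": "b"},
    {"id": 3, "created_at": "Thu Nov 08 00:00:01 +0000 2012", "text": "c"},
]

out_dir = tempfile.mkdtemp()
for t in tweets:
    append_tweet(t, out_dir)

files = sorted(os.listdir(out_dir))
print(files)
```

Writing one JSON object per line keeps each daily file readable line by line, even if a connection dies partway through the day.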

slide-155
SLIDE 155

Sampling bias?

Morstatter et al, 2013, ICWSM, “Is the Sample Good Enough? Comparing Data from Twitter’s Streaming API with Twitter’s Firehose”:

◮ 1% random sample from Streaming API is not truly random
◮ Less popular hashtags, users, topics... less likely to be sampled
◮ But for keyword-based samples, bias is not as important

González-Bailón et al, 2014, Social Networks, “Assessing the bias in samples of large online networks”:

◮ Small samples collected by filtering with a subset of relevant hashtags can be biased
◮ Central, most active users are more likely to be sampled
◮ Data collected via search (REST) API more biased than those collected with Streaming API
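
A small simulation of why active users are over-represented in tweet-level samples. It assumes an idealized sample in which every tweet is kept independently with probability 1%, which is a simplification and not how the cited papers describe the real Streaming API. Even under that assumption, a user who posts n tweets appears in the sample with probability 1 - (1 - 0.01)^n. The function names and user counts are made up for this example.

```python
import random


def prob_user_observed(n_tweets, rate):
    """Chance that at least one of a user's tweets lands in the sample."""
    return 1 - (1 - rate) ** n_tweets


def simulate_coverage(n_tweets, rate, n_users=2000, seed=42):
    """Share of simulated users (each posting n_tweets) seen at least once."""
    rng = random.Random(seed)
    observed = 0
    for _ in range(n_users):
        # Each tweet is sampled independently with probability `rate`.
        if any(rng.random() < rate for _ in range(n_tweets)):
            observed += 1
    return observed / n_users


rate = 0.01
results = {}
for n_tweets in (1, 10, 100, 1000):
    results[n_tweets] = (
        prob_user_observed(n_tweets, rate),
        simulate_coverage(n_tweets, rate),
    )
    print(n_tweets, results[n_tweets])
```

The gap between one-tweet users and heavy posters means that user-level statistics computed from a tweet-level sample describe the most active accounts far better than typical ones.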

slide-156
SLIDE 156

Tweets from Korea: 40k tweets collected in 2014 (left); Korean peninsula at night, 2003 (right). Source: NASA.

slide-157
SLIDE 157

Who is tweeting from North Korea?

Twitter user: @uriminzok_engl

slide-158
SLIDE 158

But remember...

slide-159
SLIDE 159

Facebook data

slide-165
SLIDE 165

Collecting Facebook data

Facebook only allows access to public pages’ data through the Graph API:

  • 1. Posts on public pages and groups
  • 2. Likes, reactions, comments, replies...

Some public user data (gender, location) was available through previous versions of the API (not anymore)

Aggregate-level statistics available through the FB Marketing API. See the code by Connor Gilroy (UW)

Access to other (anonymized) data used in published studies requires permission from Facebook or from users

R library: Rfacebook

slide-166
SLIDE 166

Login details: RStudio Server

RStudio Server URL: rstudio.pablobarbera.com

user = userXX and password = passwordXX, where XX is your assigned number