Mining socio-political and socio-economic signals from social media - PowerPoint PPT Presentation

Mining socio-political and socio-economic signals from social media content
Vasileios Lampos, Department of Computer Science, University College London
@lampos | lampos.net
Summer School on “Big Data & Networks in Social Sciences”, University of Warwick, Sept. 21-23, 2016

Structure of the presentation
1. Introductory remarks
2. Collective inference tasks
   - Mining emotions
   - Modelling voting intention
3. Personalised inference tasks
   - Occupational class
   - Income
   - Socioeconomic status
4. Concluding remarks

Context and motivation
> the Internet, the World Wide Web, connectivity
> numerous web products feeding from user activity
> user-generated content, publicly available, esp. on social media platforms (e.g. Twitter)
> large-scale digitised data, ‘Big Data’, ‘Data Science’
How can we use online user-generated content to enhance our understanding of our world?

About Twitter
> 140 characters per published status (tweet)
> users can follow and be followed
> embedded usage of topics (using #hashtags)
> user interaction (re-tweets, @mentions, likes)
> real-time nature
> biased demographics (13-15% of the UK’s population, age bias etc.)
> information is noisy and not always accurate

Inferring collective information from user-generated content
> mood / emotions
> voting intention
Lampos (Ph.D. Thesis, 2012); Lansdall-Welfare, Lampos & Cristianini (WWW 2012); Lampos, Preotiuc-Pietro & Cohn (ACL 2013)

Emotion taxonomies and quantification
> WordNet Affect (Strapparava & Valitutti, 2004)
> Linguistic Inquiry and Word Count (LIWC) (Pennebaker et al., 2001, 2007)
‘Emotional’ keywords, representing
+ anger, e.g. angry, irritate
+ fear, e.g. fearful, afraid
+ joy, e.g. cheerful, enthusiastic
+ sadness, e.g. depressed, gloomy
+ other emotions
Simply (but maybe not good enough!) we compute the mean keyword frequency score per emotion.
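The mean keyword frequency score can be sketched as follows. This is a minimal illustration, not the pipeline used in the talk: the lexicon holds only the example keywords listed on the slide, and both the whitespace tokenisation and the definition of a keyword's frequency (its count divided by the total number of tokens in the batch) are assumptions.

```python
from collections import Counter

# Tiny illustrative lexicon: only the example keywords named on the slide.
EMOTION_KEYWORDS = {
    "anger": ["angry", "irritate"],
    "fear": ["fearful", "afraid"],
    "joy": ["cheerful", "enthusiastic"],
    "sadness": ["depressed", "gloomy"],
}


def emotion_scores(tweets, lexicon=EMOTION_KEYWORDS):
    """Mean keyword frequency per emotion over a batch of tweets.

    Assumption: a keyword's frequency is its count divided by the total
    number of tokens in the batch; tokens come from a whitespace split
    with surrounding punctuation stripped.
    """
    tokens = [tok.strip(".,!?#@\"'").lower() for t in tweets for tok in t.split()]
    tokens = [tok for tok in tokens if tok]
    counts = Counter(tokens)
    total = len(tokens)
    scores = {}
    for emotion, keywords in lexicon.items():
        freqs = [counts[k] / total if total else 0.0 for k in keywords]
        scores[emotion] = sum(freqs) / len(keywords)
    return scores


# Made-up tweets, for illustration only.
tweets = [
    "Feeling cheerful and enthusiastic today!",
    "So angry about the delays",
    "a gloomy morning",
]
scores = emotion_scores(tweets)
```

Applying the function to the tweets of each hour or day would give one score per emotion per interval, which is the kind of time series the following slides plot.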

Circadian emotion patterns from Twitter (UK)
[Figure: 24h emotion patterns for ‘joy’ and ‘sadness’ for summer and winter, with 95% confidence intervals. Series: Winter, Summer, Aggregated Data. x-axis: Hourly Intervals (3 to 24); y-axes: Sadness Score and Joy Score (-0.1 to 0.1).]

‘Joy’ time series based on Twitter (UK)
[Figure: 933-day time series for joy in Twitter content, Jul 09 to Jan 12. y-axis: Normalised Emotional Valence. Series: raw joy signal, 14-day smoothed joy. Annotated events: XMAS, valentine, easter, halloween, roy.wed., CUTS, RIOTS.]
Clear peaking pattern during XMAS or other annual celebrations (Valentine’s Day, Easter)

Recession, riots, and Twitter emotions (UK)
[Figure: difference in mean mood score 50 days prior and after each date, Jul 09 to Jan 12; peaks indicate increase in mood change. Series: Anger, Fear. Marked dates: Budget Cuts (UK), Riots (UK). y-axis: Difference in mean (-1 to 1.5).]
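The quantity plotted here, the difference in mean mood score between the 50 days after and the 50 days before a date, can be sketched as below. The function name `mean_shift` is hypothetical, and whether the date itself belongs to the "after" window is an assumption (here it does).

```python
def mean_shift(series, index, window=50):
    """Mean of the `window` values from `index` onwards minus the mean of
    the `window` values before `index`.

    Assumption: the day at `index` counts towards the "after" window.
    """
    before = series[max(0, index - window):index]
    after = series[index:index + window]
    if not before or not after:
        raise ValueError("not enough data around the requested index")
    return sum(after) / len(after) - sum(before) / len(before)


# Made-up daily mood scores: a step change on day 60.
daily_scores = [0.0] * 60 + [1.0] * 60
shift_at_step = mean_shift(daily_scores, 60)
shift_before_step = mean_shift(daily_scores, 55)
```

Evaluating `mean_shift` at every day of a mood time series yields a curve whose peaks mark the dates with the largest change in mean mood.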

Inferring voting intention: data sets
United Kingdom
+ 3 political parties (Conservatives, Labour, Lib Dem)
+ 42,000 Twitter users distributed proportionally to the UK’s regional population figures
+ 60 million tweets, 80,976 1-grams
+ 240 polls from 30 Apr. 2010 to 13 Feb. 2012
Austria
+ 4 political parties (SPÖ, ÖVP, FPÖ, GRÜ)
+ 1,100 active Twitter users selected by political scientists
+ 800,000 tweets, 22,917 1-grams
+ 98 polls from 25 Jan. to 25 Dec. 2012

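Regularised text regression
observations: $x_i \in \mathbb{R}^{m}$, $i \in \{1, \dots, n\}$ (matrix $X$)
responses: $y_i \in \mathbb{R}$, $i \in \{1, \dots, n\}$ (vector $y$)
weights, bias: $w_j, \beta \in \mathbb{R}$, $j \in \{1, \dots, m\}$, with $w^{*} = [w; \beta]$
$f(x_i) = x_i^{T} w + \beta$
Elastic Net (Zou & Hastie, 2005):
$\operatorname*{argmin}_{w, \beta} \left\{ \sum_{i=1}^{n} \left( y_i - \beta - \sum_{j=1}^{m} x_{ij} w_j \right)^{2} + \lambda_1 \sum_{j=1}^{m} |w_j| + \lambda_2 \sum_{j=1}^{m} w_j^{2} \right\}$
The $\lambda_1$ term is the L1-norm penalty and the $\lambda_2$ term is the L2-norm penalty.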
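One way to minimise the elastic net objective above is cyclic coordinate descent, sketched here in plain Python. This is an illustrative solver chosen for brevity, not necessarily the one used in the talk; the function names and the toy data are made up. Each weight update is the exact minimiser of the objective over that single coordinate: a soft-thresholding step with threshold $\lambda_1 / 2$, divided by $\sum_i x_{ij}^2 + \lambda_2$.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold, returning 0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net_objective(X, y, w, beta, lam1, lam2):
    """Squared error plus lam1 * L1 penalty plus lam2 * squared L2 penalty."""
    loss = sum(
        (yi - beta - sum(xij * wj for xij, wj in zip(xi, w))) ** 2
        for xi, yi in zip(X, y)
    )
    return loss + lam1 * sum(abs(wj) for wj in w) + lam2 * sum(wj * wj for wj in w)


def fit_elastic_net(X, y, lam1, lam2, sweeps=500):
    """Cyclic coordinate descent over the bias and each weight in turn."""
    n, m = len(X), len(X[0])
    w, beta = [0.0] * m, 0.0
    for _ in range(sweeps):
        # Bias: mean residual with the weights held fixed.
        beta = sum(
            yi - sum(xij * wj for xij, wj in zip(xi, w)) for xi, yi in zip(X, y)
        ) / n
        for j in range(m):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for xi, yi in zip(X, y):
                partial = yi - beta - sum(xi[k] * w[k] for k in range(m) if k != j)
                rho += xi[j] * partial
                z += xi[j] ** 2
            denom = z + lam2
            w[j] = soft_threshold(rho, lam1 / 2.0) / denom if denom > 0 else 0.0
    return w, beta


# Toy data (made up): y = 2 * x0 - 1 * x1 + 0.5, the third feature is irrelevant.
X = [[1, 0, 0.1], [0, 1, -0.1], [1, 1, 0.0], [2, 1, 0.1], [1, 2, -0.1], [3, 0, 0.0]]
y = [2.5, -0.5, 1.5, 3.5, 0.5, 6.5]

w, beta = fit_elastic_net(X, y, lam1=0.01, lam2=0.01)
w_big, beta_big = fit_elastic_net(X, y, lam1=1000.0, lam2=0.0)
```

With a small penalty the fitted weights stay close to the generating ones, while a very large $\lambda_1$ drives every weight to exactly zero and leaves only the bias, which illustrates the sparsity induced by the L1 term.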

Bilinear (users+text) regularised regression
users: $p \in \mathbb{Z}^{+}$
observations: $Q_i \in \mathbb{R}^{p \times m}$, $i \in \{1, \dots, n\}$ ($X$)
responses: $y_i \in \mathbb{R}$, $i \in \{1, \dots, n\}$ ($y$)
weights, bias: $u_k, w_j, \beta \in \mathbb{R}$, $k \in \{1, \dots, p\}$, $j \in \{1, \dots, m\}$ ($u$, $w$, $\beta$)
$f(Q_i) = u^{T} Q_i w + \beta$

Bilinear elastic net (BEN)
$f(Q_i) = u^{T} Q_i w + \beta$
$\operatorname*{argmin}_{u, w, \beta} \left\{ \sum_{i=1}^{n} \left( u^{T} Q_i w + \beta - y_i \right)^{2} + \psi(u, \theta_u) + \psi(w, \theta_w) \right\}$
where $\psi(x, \lambda_1, \lambda_2) = \lambda_1 \|x\|_{\ell_1} + \lambda_2 \|x\|_{\ell_2}^{2}$

Training bilinear elastic net (BEN)
$\operatorname*{argmin}_{u, w, \beta} \left\{ \sum_{i=1}^{n} \left( u^{T} Q_i w + \beta - y_i \right)^{2} + \psi(u, \theta_u) + \psi(w, \theta_w) \right\}$
Biconvex problem
+ fix u, learn w and vice versa
+ iterate through convex optimisation tasks
Large-scale solvers in SPAMS (Mairal et al., 2010)
[Figure: global objective function during training (red) and corresponding prediction error (RMSE) on held-out data (blue), plotted against the training step (0 to 30).]
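The alternating scheme (fix u, learn w, then fix w, learn u) can be sketched as below. The slide points to the SPAMS solvers; this sketch swaps in a single warm-started coordinate descent sweep per sub-problem so that it runs with the standard library alone. The initialisation of u to uniform weights, the function names, and the synthetic data are all assumptions made for illustration.

```python
import random


def soft_threshold(v, t):
    return v - t if v > t else v + t if v < -t else 0.0


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cd_sweep(F, y, coef, lam1, lam2):
    """One warm-started coordinate descent sweep of an elastic net on features F."""
    coef = list(coef)
    beta = sum(yi - dot(fi, coef) for fi, yi in zip(F, y)) / len(F)
    for j in range(len(coef)):
        rho = z = 0.0
        for fi, yi in zip(F, y):
            partial = yi - beta - dot(fi, coef) + fi[j] * coef[j]
            rho += fi[j] * partial
            z += fi[j] ** 2
        denom = z + lam2
        coef[j] = soft_threshold(rho, lam1 / 2.0) / denom if denom > 0 else 0.0
    return coef, beta


def predict(Q, u, w, beta):
    """f(Q) = u^T Q w + beta for one p-by-m observation Q."""
    return sum(u[k] * dot(Q[k], w) for k in range(len(u))) + beta


def objective(Qs, y, u, w, beta, theta_u, theta_w):
    def psi(x, theta):
        return theta[0] * sum(abs(v) for v in x) + theta[1] * sum(v * v for v in x)

    loss = sum((predict(Q, u, w, beta) - yi) ** 2 for Q, yi in zip(Qs, y))
    return loss + psi(u, theta_u) + psi(w, theta_w)


def fit_ben(Qs, y, theta_u, theta_w, steps=30):
    p, m = len(Qs[0]), len(Qs[0][0])
    u, w, beta = [1.0 / p] * p, [0.0] * m, 0.0  # assumed initialisation
    history = []
    for _ in range(steps):
        # Fix u: each observation collapses to m word features (Q^T u).
        F = [[sum(u[k] * Q[k][j] for k in range(p)) for j in range(m)] for Q in Qs]
        w, beta = cd_sweep(F, y, w, *theta_w)
        # Fix w: each observation collapses to p user features (Q w).
        G = [[dot(Q[k], w) for k in range(p)] for Q in Qs]
        u, beta = cd_sweep(G, y, u, *theta_u)
        history.append(objective(Qs, y, u, w, beta, theta_u, theta_w))
    return u, w, beta, history


# Synthetic data (made up): 20 observations, 3 users, 4 words.
random.seed(0)
Qs = [[[random.random() for _ in range(4)] for _ in range(3)] for _ in range(20)]
y = [predict(Q, [1.0, 0.0, 0.5], [1.0, -1.0, 0.0, 0.0], 0.3) for Q in Qs]
theta = (0.01, 0.01)
u, w, beta, history = fit_ben(Qs, y, theta, theta)
```

Because every coordinate update exactly minimises the global objective over that coordinate, `history` is non-increasing, mirroring the shape of the global objective curve described on the slide. Convergence is to a local solution only, since the problem is biconvex rather than convex.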

Bilinear and multi-task regression
tasks: $\tau \in \mathbb{Z}^{+}$
users: $p \in \mathbb{Z}^{+}$
observations: $Q_i \in \mathbb{R}^{p \times m}$, $i \in \{1, \dots, n\}$ ($X$)
responses: $y_i \in \mathbb{R}^{\tau}$, $i \in \{1, \dots, n\}$ ($Y$)
weights, bias: $u_k, w_j, \beta \in \mathbb{R}^{\tau}$, $k \in \{1, \dots, p\}$, $j \in \{1, \dots, m\}$ ($U$, $W$, $\beta$)
$f(Q_i) = \operatorname{tr}\left( U^{T} Q_i W \right) + \beta$

Bilinear Group L2,1 (BGL)
$\operatorname*{argmin}_{U, W, \beta} \left\{ \sum_{t=1}^{\tau} \sum_{i=1}^{n} \left( u_t^{T} Q_i w_t + \beta_t - y_{ti} \right)^{2} + \lambda_u \sum_{k=1}^{p} \|U_k\|_{2} + \lambda_w \sum_{j=1}^{m} \|W_j\|_{2} \right\}$
+ a nonzero-weighted feature (user or word) is encouraged to be nonzero for all tasks, but with potentially different weights
+ intuitive for political preference inference
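The two penalty terms in the BGL objective are L2,1 norms: the sum, over the rows of U (users) or W (words), of each row's Euclidean norm across tasks. The sketch below computes that norm and applies its standard proximal operator (row-wise group soft-thresholding) to show why whole rows are switched off together. It illustrates the regulariser only and is not presented as the solver used for BGL; the function names and the example matrix are made up.

```python
import math


def l21_norm(M):
    """Sum over rows of the Euclidean norm of each row."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in M)


def group_soft_threshold(M, t):
    """Proximal operator of t * L2,1: shrink each row's norm by t and
    zero out any row whose norm is at most t."""
    out = []
    for row in M:
        norm = math.sqrt(sum(v * v for v in row))
        scale = max(0.0, 1.0 - t / norm) if norm > 0 else 0.0
        out.append([scale * v for v in row])
    return out


# Rows are features (e.g. words), columns are tasks (e.g. parties). Made-up values.
W = [
    [3.0, 4.0, 0.0],  # row norm 5: kept, shrunk by a factor 0.8
    [0.1, 0.0, 0.1],  # row norm about 0.14: removed for every task
    [0.0, 0.0, 2.0],  # row norm 2: kept, shrunk by a factor 0.5
]
shrunk = group_soft_threshold(W, 1.0)
```

A weak feature is dropped for all tasks at once, while a surviving feature keeps its own weight per task, which matches the behaviour described in the first bullet above.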

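Voting intention inference performance
[Figure: bar chart of Root Mean Squared Error for the UK and Austria. Methods: Mean poll, Last poll, Elastic Net (words), BEN, BGL. Bar labels recovered from the chart, without their bar assignment: 3.067, 1.851, 1.723, 1.699, 1.69, 1.573, 1.478, 1.47, 1.442, 1.439.]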
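For reference, the error measure and the two baselines named in the chart can be sketched as follows. The baseline definitions are assumptions, since the slide only names them: "mean poll" is taken to predict the mean of the polls seen so far, and "last poll" to repeat the most recent one. The poll percentages are made up.

```python
import math


def rmse(predictions, targets):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(
        sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)
    )


# Made-up voting intention percentages for a single party.
seen_polls = [36.0, 38.0, 40.0]
future_polls = [39.0, 41.0]

mean_poll = sum(seen_polls) / len(seen_polls)  # assumed "Mean poll" baseline
last_poll = seen_polls[-1]                     # assumed "Last poll" baseline

rmse_mean = rmse([mean_poll] * len(future_polls), future_polls)
rmse_last = rmse([last_poll] * len(future_polls), future_polls)
```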

Voting intention comparative plots
[Figure: UK voting intention (%) over time (5 to 45) for CON, LAB and LIB. Three panels: BEN, BGL, YouGov.]

Voting intention comparative plots
[Figure: Austria voting intention (%) over time (5 to 45) for SPÖ, ÖVP, FPÖ and GRÜ. Three panels: BGL, BEN, Polls.]
