SLIDE 1

Mining the Social Web: A series of statistical NLP case studies

Vasileios Lampos

Department of Computer Science, University College London. March 2015.

v.lampos@ucl.ac.uk Slides: http://bit.ly/1GrxI8j

1/52

SLIDE 2

Key assumptions about social media

  • a significant sample of the population uses them, although biases exist
  • a significant amount of the published content is geo-located
  • content reflects collective aspects of real life (e.g., opinions, events)
  • content usually has a (near) real-time relationship with events
  • it is easy to collect, store and process this content (?)
  • more data (big data) → higher confidence (?)

SLIDE 3

Twitter in one slide

  • 140 characters per published status (tweet)
  • users can follow others and can be followed
  • embedded usage of topics via hashtags (e.g., #rbnews, #inception)
  • user interaction: retweets, @replies, @mentions, favourites
  • real-time nature
  • biased demographics (13–15% of the UK's population)

SLIDE 4

In this talk

Case studies where we harness social media information to:

  • extract simplified collective mood patterns (Lansdall et al., 2012)
  • nowcast phenomena, e.g., an infectious disease or rainfall rates (Lampos, Cristianini, 2010 & 2012)
  • model voting intention (Lampos et al., 2013)
  • estimate user impact and explore user characteristics related to it (Lampos et al., 2014)
SLIDE 5

Proof of concept and a little more: extracting collective mood patterns

SLIDE 6

Time series of joy and anger based on UK tweets

[Figure: 933-day time series (mid-2009 to early 2012) of normalised emotional valence for joy in UK Twitter content; the raw joy signal and a 14-day smoothed version, with peaks around annotated events (Christmas, Halloween, Valentine's Day, Easter, the royal wedding, the budget cuts and the riots). Joy terms: happy, enjoy, love, glad, joyful, elated, ...]

[Figure: difference in mean anger and fear (signal derivative) around the dates of the budget cuts and the riots.]

(Lansdall et al., 2012), (Strapparava, Valitutti, 2004) → WordNet Affect
SLIDE 7

Mood projections via PCA

Projection of 4-dimensional mood score signals (joy, sadness, anger and fear) on their top-2 principal components (2011 Twitter data)

[Figure: left — the seven days of the week projected on the first two principal components, separating weekdays from weekends; right — all 365 days of 2011 projected on the same two components, with outlying days annotated.]

New Year (day 1), Valentine's Day (45), Christmas Eve (358), New Year's Eve (365), Osama bin Laden's death (122), Amy Winehouse's death & the Breivik attacks (204), the UK riots (221)

(Lampos, 2012), (Strapparava, Valitutti, 2004) → WordNet Affect
SLIDE 8

Supervised learning

Primary outcomes (linear methods)

SLIDE 9

Regression basics — Ordinary Least Squares

  • observations: x_i ∈ R^m, i ∈ {1, ..., n}; collectively the design matrix X
  • responses: y_i ∈ R, i ∈ {1, ..., n}; collectively y
  • weights, bias: w_j, β ∈ R, j ∈ {1, ..., m}; w* = [w; β]

Ordinary Least Squares (OLS)

  argmin_{w*} ‖X* w* − y‖²_ℓ2  ⇒  w* = (X*^T X*)^{-1} X*^T y

(X* denotes X with a column of ones appended, so that the bias β is absorbed into w*.)

Why not?
  − X*^T X* may be singular (and thus difficult to invert)
  − high-dimensional models become difficult to interpret
  − unsatisfactory prediction accuracy (estimates have large variance)
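The closed-form solution can be checked numerically. A minimal NumPy sketch on toy data, with the bias absorbed via a column of ones as in the X* notation:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 3
X = rng.normal(size=(n, m))
X_star = np.hstack([X, np.ones((n, 1))])   # append ones so the bias is absorbed into w*
w_true = np.array([2.0, -1.0, 0.5, 3.0])   # last entry plays the role of the bias beta
y = X_star @ w_true                        # noiseless responses, for an exact check

# OLS closed form: w* = (X*^T X*)^{-1} X*^T y, computed via a linear solve
w_hat = np.linalg.solve(X_star.T @ X_star, X_star.T @ y)
```

With noiseless responses and n > m the estimate recovers the generating weights exactly; with noisy responses, only approximately.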

SLIDE 10

Regression basics — Ridge Regression

  • observations: x_i ∈ R^m, i ∈ {1, ..., n}; collectively X
  • responses: y_i ∈ R, i ∈ {1, ..., n}; collectively y
  • weights, bias: w_j, β ∈ R, j ∈ {1, ..., m}; w* = [w; β]

Ridge Regression (RR)

  argmin_{w*} ‖X* w* − y‖²_ℓ2 + λ‖w‖²_ℓ2

  + a size constraint on the weight coefficients (regularisation) → resolves problems caused by collinear variables
  + fewer degrees of freedom, better predictive accuracy than OLS
  − does not perform feature selection (all coefficients remain nonzero)

(Hoerl, Kennard, 1970)
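The effect of the ℓ2 penalty on collinear predictors can be sketched with scikit-learn's `Ridge` on synthetic data (`alpha` corresponds to λ above):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + 1e-3 * rng.normal(size=200)     # nearly collinear copy of x1
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.1 * rng.normal(size=200)  # true weights are (1, 1)

ols = LinearRegression().fit(X, y)        # weights can blow up in the collinear direction
ridge = Ridge(alpha=1.0).fit(X, y)        # the l2 penalty keeps both weights near 1
print(ols.coef_, ridge.coef_)
```

Note that ridge leaves both coefficients nonzero: the penalty shrinks weights but does not select features.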

SLIDE 11

Regression basics — Lasso

  • observations: x_i ∈ R^m, i ∈ {1, ..., n}; collectively X
  • responses: y_i ∈ R, i ∈ {1, ..., n}; collectively y
  • weights, bias: w_j, β ∈ R, j ∈ {1, ..., m}; w* = [w; β]

ℓ1-norm regularisation, or lasso (Tibshirani, 1996)

  argmin_{w*} ‖X* w* − y‖²_ℓ2 + λ‖w‖_ℓ1

  − no closed-form solution; a quadratic programming problem
  + Least Angle Regression (LAR) → entire regularisation path (Efron et al., 2004)
  + sparse w, interpretability, better performance (Hastie et al., 2009)
  − if m > n, at most n variables can be selected
  − co-linear predictors → unable to select the true model (Zhao, Yu, 2006)
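The sparsity property can be sketched with scikit-learn's `Lasso` (synthetic random design here, rather than the talk's n-gram frequencies):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 20))
w_true = np.zeros(20)
w_true[:3] = [1.5, -2.0, 1.0]          # only the first 3 features are informative
y = X @ w_true + 0.1 * rng.normal(size=100)

lasso = Lasso(alpha=0.1).fit(X, y)     # alpha plays the role of lambda
selected = np.flatnonzero(lasso.coef_) # indices with nonzero weight
print(selected)
```

Most of the 20 coefficients come out exactly zero: the model itself performs feature selection.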

SLIDE 12

Lasso for text regression

  • n-gram: set of n words or tokens
  • n-gram frequency: count (often normalised) in a corpus
  • target variable: numerical representation of an “event”
  • n-gram frequencies x

x xi ∈ Rm,

i ∈ {1, ..., n}

— X X X

  • target phenomenon yi ∈ R,

i ∈ {1, ..., n}

— y y y

  • weights, bias

wj, β ∈ R, j ∈ {1, ..., m} — w w w∗ = [w w w; β] lasso (for text regression) argmin

w w w∗

  • X

X X∗w w w∗ − y y y2

ℓ2 + λw

w wℓ1
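A toy end-to-end version of this pipeline; the documents and target values below are invented for illustration:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import Lasso

# Hypothetical daily tweet collections and made-up target values for an "event"
docs = ["flu fever cough today", "lovely sunny day", "fever and sore throat",
        "cough cough flu again", "picnic in the park", "flu season is here"]
y = np.array([3.0, 0.0, 2.0, 3.0, 0.0, 1.0])

vec = CountVectorizer()                                 # 1-gram counts
counts = vec.fit_transform(docs).toarray().astype(float)
X = counts / counts.sum(axis=1, keepdims=True)          # normalised 1-gram frequencies

model = Lasso(alpha=0.01).fit(X, y)                     # sparse weights over the vocabulary
n_used = int(np.count_nonzero(model.coef_))
print(n_used, "of", X.shape[1], "1-grams kept")
```

Even on this tiny corpus the lasso keeps only a subset of the vocabulary, which is what makes the selected terms interpretable on slides like the ILI one below.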

SLIDE 13

Nowcasting ILI rates from Twitter (1/2)

Assumptions

  • Twitter users post about their health condition
  • We can turn this information into an influenza-like-illness (ILI) rate

Is there a signal in the data?

  • 41 illness-related keyphrases (e.g., flu, fever, sore throat, headache)
  • z-scored aggregate keyphrase frequency vs. official ILI rates

[Figure: z-scored Twitter flu-score vs. the HPA flu rate for England & Wales (region D), day numbers 160–340 of 2009; r = .856.]

(Lampos, Cristianini, 2010)
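The comparison behind the r = .856 figure is a plain Pearson correlation between two z-scored series; a sketch with synthetic stand-ins for both the flu-score and the official rates:

```python
import numpy as np

rng = np.random.default_rng(3)
ili = np.abs(np.cumsum(rng.normal(size=120)))   # synthetic stand-in for official ILI rates
flu_score = 0.9 * ili + rng.normal(size=120)    # synthetic Twitter flu-score tracking it

def zscore(v):
    # standardise to zero mean, unit variance
    return (v - v.mean()) / v.std()

# Pearson r between the z-scored series (z-scoring leaves r unchanged)
r = float(np.corrcoef(zscore(flu_score), zscore(ili))[0, 1])
print(round(r, 3))
```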

SLIDE 14

Nowcasting ILI rates from Twitter (2/2)

  • create a pool of 1-gram features (approx. 1,600) by indexing relevant web pages (e.g., Wikipedia, NHS, health forums)
  • stop-words removed, Porter stemming applied
  • automatic 1-gram selection and weighting via the lasso

Selected uni-grams

‘unwel’, ‘temperatur’, ‘headach’, ‘appetit’, ‘symptom’, ‘diarrhoea’, ‘muscl’, ‘feel’, ‘flu’, ‘cough’, ‘nose’, ‘vomit’, ‘diseas’, ‘sore’, ‘throat’, ‘fever’, ‘ach’, ‘runni’, ‘sick’, ‘ill’, ...

[Figure: inferred vs. HPA flu rate for England & Wales, day numbers 180–340 of 2009; r = .968.]

(Lampos, Cristianini, 2010)
SLIDE 15

Nowcasting rainfall rates — a generalisation

  • fix lasso's model selection with the bootstrap lasso (bolasso) (Bach, 2008)
  • include 2-grams and perform a hybrid combination with 1-grams

[Figure: actual vs. inferred rainfall rates (mm) over approximately 30 days, for Bristol and for London.]

(Lampos, Cristianini, 2012)
SLIDE 16

Back to regression basics — Elastic Net

  • observations: x_i ∈ R^m, i ∈ {1, ..., n}; collectively X
  • responses: y_i ∈ R, i ∈ {1, ..., n}; collectively y
  • weights, bias: w_j, β ∈ R, j ∈ {1, ..., m}; w* = [w; β]

linear Elastic Net (LEN)

  argmin_{w*} ‖X* w* − y‖²_ℓ2 + λ₁‖w‖²_ℓ2 + λ₂‖w‖_ℓ1
              (OLS term)       (RR reg.)    (lasso reg.)

  + combines RR (handles co-linear predictors) and the lasso (sparsity)
  + the entire regularisation path can be explored by modifying LAR
  + if m > n, the number of selected variables is not limited to n
  − may select redundant variables!

(Zou, Hastie, 2005)
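The grouping effect can be sketched with scikit-learn's `ElasticNet` on synthetic data (`alpha` and `l1_ratio` jointly encode λ₁ and λ₂):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(4)
X = rng.normal(size=(80, 10))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=80)  # a collinear pair of predictors
y = X[:, 0] + X[:, 1] + 0.1 * rng.normal(size=80)

# l1_ratio mixes the two penalties: the lasso part gives sparsity,
# the ridge part lets the collinear pair share the weight
en = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(np.round(en.coef_, 3))
```

Unlike a plain lasso, which tends to pick one of two collinear predictors arbitrarily, both members of the pair stay in the model while the irrelevant features are zeroed out.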

SLIDE 17

Supervised learning

Bilinear approaches

SLIDE 18

Bilinear text regression — The general idea (1/2)

Linear regression: f(x_i) = x_i^T w + β

  • observations: x_i ∈ R^m, i ∈ {1, ..., n}; collectively X
  • responses: y_i ∈ R, i ∈ {1, ..., n}; collectively y
  • weights, bias: w_j, β ∈ R, j ∈ {1, ..., m}; w* = [w; β]

Bilinear regression: f(Q_i) = u^T Q_i w + β

  • users: p ∈ Z+
  • observations: Q_i ∈ R^{p×m}, i ∈ {1, ..., n}; collectively X
  • responses: y_i ∈ R, i ∈ {1, ..., n}; collectively y
  • weights, bias: u_k, w_j, β ∈ R, k ∈ {1, ..., p}, j ∈ {1, ..., m}; u, w, β

SLIDE 19

Bilinear text regression — The general idea (2/2)

  • users: p ∈ Z+
  • observations: Q_i ∈ R^{p×m}, i ∈ {1, ..., n}; collectively X
  • responses: y_i ∈ R, i ∈ {1, ..., n}; collectively y
  • weights, bias: u_k, w_j, β ∈ R, k ∈ {1, ..., p}, j ∈ {1, ..., m}; u, w, β

  f(Q_i) = u^T Q_i w + β

[Diagram: the row vector u^T (1×p) multiplies the user-by-word matrix Q_i (p×m) and the column vector w (m×1); the bias β is added to the resulting scalar.]
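The bilinear form is just a scalar built from two weighted sums; a NumPy sketch with made-up dimensions:

```python
import numpy as np

rng = np.random.default_rng(5)
p, m = 4, 6                  # p users, m words
Q = rng.normal(size=(p, m))  # one observation: a user-by-word frequency matrix
u = rng.normal(size=p)       # per-user weights
w = rng.normal(size=m)       # per-word weights
beta = 0.5

f = u @ Q @ w + beta         # f(Q) = u^T Q w + beta, a scalar

# the same value written as the explicit double sum over users and words
f_check = sum(u[k] * Q[k, j] * w[j] for k in range(p) for j in range(m)) + beta
print(round(float(f), 3))
```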

SLIDE 20

Bilinear text regression — Regularisation

  • users: p ∈ Z+
  • observations: Q_i ∈ R^{p×m}, i ∈ {1, ..., n}; collectively X
  • responses: y_i ∈ R, i ∈ {1, ..., n}; collectively y
  • weights, bias: u_k, w_j, β ∈ R, k ∈ {1, ..., p}, j ∈ {1, ..., m}; u, w, β

  argmin_{u,w,β} Σ_{i=1}^{n} (u^T Q_i w + β − y_i)² + ψ(u, θ_u) + ψ(w, θ_w)

  • ψ(·): a regularisation function with a set of hyper-parameters θ
  • if ψ(v, λ) = λ‖v‖_ℓ1 → Bilinear Lasso
  • if ψ(v, λ₁, λ₂) = λ₁‖v‖²_ℓ2 + λ₂‖v‖_ℓ1 → Bilinear Elastic Net (BEN)

(Lampos et al., 2013)
SLIDE 21

Learning the parameters of BEN

  argmin_{u,w,β} Σ_{i=1}^{n} (u^T Q_i w + β − y_i)² + λ_u1‖u‖²_ℓ2 + λ_u2‖u‖_ℓ1 + λ_w1‖w‖²_ℓ2 + λ_w2‖w‖_ℓ1

  • bi-convexity: fix u, learn w, and vice versa
  • iterating through convex optimisation tasks leads to convergence (Al-Khayyal, Falk, 1983; Horst, Tuy, 1996)
  • FISTA (Beck, Teboulle, 2009), as implemented in SPAMS (Mairal et al., 2010): a large-scale optimisation solver with quick convergence

[Figure: global objective and RMSE on held-out data across the iterations of the optimisation.]
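The alternating scheme can be sketched as follows. This is not the FISTA/SPAMS implementation from the talk: each convex step is solved here with scikit-learn's `ElasticNet` on synthetic data, with a single simplified elastic-net penalty per step:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(6)
n, p, m = 60, 5, 8
Q = rng.normal(size=(n, p, m))                   # n user-by-word matrices
u_true, w_true = rng.normal(size=p), rng.normal(size=m)
y = np.einsum('k,ikj,j->i', u_true, Q, w_true) + 0.01 * rng.normal(size=n)

u, intercept = np.ones(p), 0.0
for _ in range(10):                              # iterate the two convex steps
    Xw = np.einsum('k,ikj->ij', u, Q)            # fix u: an elastic net in w
    fit_w = ElasticNet(alpha=0.01, l1_ratio=0.5).fit(Xw, y)
    w = fit_w.coef_
    Xu = np.einsum('ikj,j->ik', Q, w)            # fix w: an elastic net in u
    fit_u = ElasticNet(alpha=0.01, l1_ratio=0.5).fit(Xu, y)
    u, intercept = fit_u.coef_, fit_u.intercept_

pred = np.einsum('k,ikj,j->i', u, Q, w) + intercept
rmse = float(np.sqrt(np.mean((pred - y) ** 2)))
print(round(rmse, 4))
```

Each step only ever solves a linear elastic net, which is why the biconvex objective can reuse standard convex machinery.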

SLIDE 22

Supervised learning

Bilinear approaches for modelling voting intention (based on social media content)

SLIDE 23

Political opinion/voting intention mining — Brief recap

Primary papers

  • predict the result of an election via Twitter (Tumasjan et al., 2010)
  • model socio-political sentiment polls (O'Connor et al., 2010)
  • the two approaches above failed in the 2009 US congressional elections (Gayo-Avello et al., 2011)
  • desired properties of such models (Metaxas et al., 2011)

Features used

  • lexicon-based, e.g., using LIWC (Tausczik, Pennebaker, 2010)
  • task-specific keywords (names of parties, politicians)
  • tweet volume

reviewed in (Gayo-Avello, 2013)

However...
  − political descriptors change over time and differ per country
  − personalised (per-user) modelling is missing (though present in actual polls)
  − multi-task learning? a user who likes party A may dislike party B

SLIDE 24

Voting intention modelling — Data (UK)

  • 42K users, distributed proportionally to regional population figures
  • 60 million tweets from 30/04/2010 to 13/02/2012
  • 80,976 1-grams → (Preoţiuc-Pietro et al., 2012)
  • 240 voting intention polls (YouGov)
  • 3 parties: Conservatives (CON), Labour Party (LAB), Liberal Democrats (LIB)
  • main language: English

[Figure: voting intention (%) time series for CON, LAB and LIB across the 240 UK polls.]

SLIDE 25

Voting intention modelling — Data (Austria)

  • 1.1K users, manually selected by political analysts (SORA)
  • 800K tweets from 25/01 to 01/12/2012
  • 22,917 1-grams → (Preoţiuc-Pietro et al., 2012)
  • 98 voting intention polls from various pollsters
  • 4 parties: Social Democratic Party (SPÖ), People's Party (ÖVP), Freedom Party (FPÖ), Green Alternative Party (GRÜ)
  • main language: German

[Figure: voting intention (%) time series for SPÖ, ÖVP, FPÖ and GRÜ across the 98 Austrian polls.]

SLIDE 26

Voting intention modelling — Evaluation

  • 10-fold (but not cross-) validation:
    − train a model using data based on a set of contiguous polls A
    − test on the next |D| = 5 polls
    − expand the training set to {A ∪ D} and test on the next |D′| = 5 polls
  • realistic scenario: train on the past, predict future polls
  • overall, test predictions are made on 50 polls (in each case study)

Baselines

  • Bμ: constant prediction based on μ(y) in the training set
  • Blast: constant prediction based on last(y) in the training set
  • LEN: (linear) Elastic Net prediction (using word frequencies)
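The scheme above is a rolling, forward-chaining split; a sketch with a synthetic poll series (10 folds × 5 polls = 50 test predictions, as in the slide):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(7)
n_polls = 100                                    # hypothetical poll series
X = rng.normal(size=(n_polls, 5))
y = X @ np.array([1.0, -1.0, 0.5, 0.0, 0.0]) + 0.1 * rng.normal(size=n_polls)

step, start = 5, 50
fold_rmse = []
for t in range(start, n_polls, step):            # train on polls [0, t), test on the next 5
    model = Ridge(alpha=1.0).fit(X[:t], y[:t])
    pred = model.predict(X[t:t + step])
    fold_rmse.append(float(np.sqrt(np.mean((pred - y[t:t + step]) ** 2))))

rmse = float(np.mean(fold_rmse))
print(round(rmse, 3))
```

Unlike ordinary cross-validation, test data always lies strictly in the "future" of the training data, matching how the model would be deployed.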

SLIDE 27

Voting intention modelling — BEN’s performance (1/2)

Average RMSEs on the voting intention percentage predictions in the 10-step validation process:

‘UK’ case study
           CON     LAB     LIB     μ
  Bμ       2.272   1.663   1.136   1.690
  Blast    2.000   2.074   1.095   1.723
  LEN      3.845   2.912   2.445   3.067
  BEN      1.939   1.644   1.136   1.573

‘Austria’ case study
           SPÖ     ÖVP     FPÖ     GRÜ     μ
  Bμ       1.535   1.373   3.300   1.197   1.851
  Blast    1.148   1.556   1.639   1.536   1.470
  LEN      1.291   1.286   2.039   1.152   1.442
  BEN      1.392   1.310   2.890   1.205   1.699

SLIDE 28

Voting intention modelling — BEN’s performance (2/2)

Polls vs. BEN predictions

[Figure: actual voting intention polls next to BEN's predictions, for the UK (CON, LAB, LIB) and for Austria (SPÖ, ÖVP, FPÖ, GRÜ).]

good, but probably not good enough?

SLIDE 29

Supervised learning

MULTI-TASK Bilinear approaches for modelling voting intention (based on social media content)

SLIDE 30

Multi-task learning

What

  • instead of learning/optimising a single task (one target variable) ...
  • ... optimise multiple tasks jointly

Why (Caruana, 1997)

  • improves generalisation performance by exploiting the domain-specific information of related tasks
  • a good choice for under-sampled distributions
  • knowledge transfer
  • application-driven reasons: e.g., exploring the interplay between political parties

How

  • multi-task regularised regression

SLIDE 31

Linear multi-task learning: the ℓ2,1-norm regularisation

  ‖W‖_{2,1} = Σ_{j=1}^{m} ‖W^j‖_ℓ2, where W^j denotes the j-th row of W

ℓ2,1-norm regularisation

  argmin_{W,β} ‖XW − Y‖²_F + λ Σ_{j=1}^{m} ‖W^j‖_ℓ2

  • multi-task learning: instead of w ∈ R^m, learn W ∈ R^{m×τ}, where τ is the number of tasks
  • the ℓ2,1-norm regulariser → the sum of W's row ℓ2-norms (Argyriou et al., 2008; Liu et al., 2009); it extends the group lasso (Yuan, Lin, 2006)
  • group lasso: instead of single variables, selects groups of variables
  • the ‘groups’ here become the τ-dimensional rows of W
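scikit-learn's `MultiTaskLasso` minimises this objective (squared error plus the ℓ2,1 penalty on W, up to a constant factor on the loss); a sketch on synthetic data where the same few features drive all tasks:

```python
import numpy as np
from sklearn.linear_model import MultiTaskLasso

rng = np.random.default_rng(8)
n, m, tau = 100, 12, 3                      # samples, features, tasks
X = rng.normal(size=(n, m))
W_true = np.zeros((m, tau))
W_true[:4] = rng.normal(size=(4, tau))      # the same 4 rows drive every task
Y = X @ W_true + 0.1 * rng.normal(size=(n, tau))

mtl = MultiTaskLasso(alpha=0.1).fit(X, Y)   # penalty = sum of the row l2-norms of W
W_hat = mtl.coef_.T                         # back to the (m x tau) orientation
row_norms = np.linalg.norm(W_hat, axis=1)   # zero rows = features dropped for all tasks
print(np.round(row_norms, 2))
```

Entire rows of W are switched on or off together: a feature is either used by every task (with task-specific weights) or by none.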

SLIDE 32

Bilinear multi-task learning

  • tasks: τ ∈ Z+
  • users: p ∈ Z+
  • observations: Q_i ∈ R^{p×m}, i ∈ {1, ..., n}; collectively X
  • responses: y_i ∈ R^τ, i ∈ {1, ..., n}; collectively Y
  • weights, bias: u_k, w_j, β ∈ R^τ, k ∈ {1, ..., p}, j ∈ {1, ..., m}; U, W, β

  f(Q_i) = tr(U^T Q_i W) + β

[Diagram: U^T (τ×p) multiplies the user-by-word matrix Q_i (p×m) and W (m×τ); the trace of the product is taken and the bias is added.]

SLIDE 33

Bilinear Group ℓ2,1 (BGL) (1/2)

  • tasks: τ ∈ Z+
  • users: p ∈ Z+
  • observations: Q_i ∈ R^{p×m}, i ∈ {1, ..., n}; collectively X
  • responses: y_i ∈ R^τ, i ∈ {1, ..., n}; collectively Y
  • weights, bias: u_k, w_j, β ∈ R^τ, k ∈ {1, ..., p}, j ∈ {1, ..., m}; U, W, β

  argmin_{U,W,β} Σ_{t=1}^{τ} Σ_{i=1}^{n} (u_t^T Q_i w_t + β_t − y_{ti})² + λ_u Σ_{k=1}^{p} ‖U^k‖_ℓ2 + λ_w Σ_{j=1}^{m} ‖W^j‖_ℓ2

  • learning: 2 convex tasks → first learn {W, β}, then {U, β}; iterate through this process

SLIDE 34

Bilinear Group ℓ2,1 (BGL) (2/2)

  argmin_{U,W,β} Σ_{t=1}^{τ} Σ_{i=1}^{n} (u_t^T Q_i w_t + β_t − y_{ti})² + λ_u Σ_{k=1}^{p} ‖U^k‖_ℓ2 + λ_w Σ_{j=1}^{m} ‖W^j‖_ℓ2

  • a feature (user or word) is activated (selected) for all tasks, but with different weights per task
  • especially useful in the domain of politics: e.g., a user may be pro party A, but against parties B and C

SLIDE 35

Voting intention modelling — BGL’s performance (1/2)

‘UK’ case study
           CON     LAB     LIB     μ
  Bμ       2.272   1.663   1.136   1.690
  Blast    2.000   2.074   1.095   1.723
  LEN      3.845   2.912   2.445   3.067
  BEN      1.939   1.644   1.136   1.573
  BGL      1.785   1.595   1.054   1.478

‘Austria’ case study
           SPÖ     ÖVP     FPÖ     GRÜ     μ
  Bμ       1.535   1.373   3.300   1.197   1.851
  Blast    1.148   1.556   1.639   1.536   1.470
  LEN      1.291   1.286   2.039   1.152   1.442
  BEN      1.392   1.310   2.890   1.205   1.699
  BGL      1.619   1.005   1.757   1.374   1.439

SLIDE 36

Voting intention modelling — BGL’s performance (2/2)

Polls vs. BEN vs. BGL

[Figure: actual voting intention polls, BEN's predictions and BGL's predictions side by side, for the UK (CON, LAB, LIB) and for Austria (SPÖ, ÖVP, FPÖ, GRÜ).]

SLIDE 37

Voting intention modelling — Qualitative insight

  • CON (score 1.334; author: journalist): “PM in friendly chat with top EU mate, Sweden's Fredrik Reinfeldt, before family photo”
  • LAB (score −0.552; author: politician, Labour): “I am so pleased to hear Paul Savage who worked for the Labour group has been Appointed the Marketing manager for the baths hall GREAT NEWS”
  • LBD (score 0.874; author: LibDem MP): “RT @user: Must be awful for TV bosses to keep getting knocked back by all the women they ask to host election night (via @user)”
  • SPÖ (score 0.745; author: journalist): “Inflationsrate in Ö. im Juli leicht gesunken: von 2,2 auf 2,1%. Teurer wurde Wohnen, Wasser, Energie.” [Translation: Inflation rate in Austria slightly down in July, from 2.2 to 2.1%. Accommodation, water, energy got more expensive.]
  • ÖVP (score −2.323; author: user): “kann das buch ‘res publica’ von johannes #voggenhuber wirklich empfehlen! so zum nachdenken und so... #europa #demokratie” [Translation: can really recommend the book ‘res publica’ by johannes #voggenhuber! food for thought and so on... #europe #democracy]
  • GRÜ (score 1.45; author: student union): “Protestsong gegen die Abschaffung des Bachelor-Studiums Internationale Entwicklung: <link> #IEbleibt #unibrennt #uniwut” [Translation: Protest song against the abolition of the bachelor course in International Development: <link> #IDremains #uniburns #unirage]

SLIDE 38

What does content tell us about users?

User impact characterisation on Twitter (with a nonlinear approach)

SLIDE 39

Predicting and characterising user impact on Twitter

Motivation

  • predict user impact from user activity, including text
  • use this prediction model as a guide to qualitatively investigate links between user impact and user behaviour

Data

  • 48 million tweets posted by 38,020 UK users
    − from 14/04/2011 to 12/04/2012
    − a subset of the data set used in (Lampos et al., 2013)
  • 400 million tweets (from the Gardenhose stream, a 10% sample)
    − from 02/01/2011 to 28/02/2011
    − used for creating the topic clusters
    − data processed via (Preoţiuc-Pietro et al., 2012)

(Lampos et al., 2014)

SLIDE 40

User impact — a simplified definition

  S(φin, φout, φλ) = ln( (φλ + θ) (φin + θ)² / (φout + θ) )

  • φin: number of followers; φout: number of followees
  • φλ: number of times the account has been listed
  • θ = 1, so the logarithm is always applied to a positive number
  • note that φin²/φout = (φin − φout) × (φin/φout) + φin

[Figure: histogram (probability density) of the user impact scores in the data set, μ(S) = 6.776, with example accounts placed along the scale: @guardian, @David_Cameron, @PaulMasonNews, @lampos, @nikaletras, suspected spam accounts.]
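The score is straightforward to compute; the follower/followee/listed counts below are made up for illustration:

```python
import math

def impact_score(followers, followees, times_listed, theta=1.0):
    # S = ln((phi_lambda + theta) * (phi_in + theta)^2 / (phi_out + theta))
    return math.log((times_listed + theta) * (followers + theta) ** 2
                    / (followees + theta))

# hypothetical accounts: a large outlet vs. a typical user
scores = {name: impact_score(*stats) for name, stats in {
    "news_outlet": (5_000_000, 1_000, 30_000),
    "average_user": (300, 400, 2),
}.items()}
print(scores)
```

With θ = 1, an account with no followers, followees or list memberships scores exactly ln(1) = 0.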

SLIDE 41

User activity features

a1: # of tweets
a2: proportion of retweets
a3: proportion of non-duplicate tweets
a4: proportion of tweets with hashtags
a5: hashtag-tokens ratio in tweets
a6: proportion of tweets with @-mentions
a7: # of unique @-mentions in tweets
a8: proportion of tweets with @-replies
a9: links ratio in tweets
a10: # of favourites the account made
a11: total # of tweets (entire account history)
a12: using the default profile background (binary)
a13: using the default profile image (binary)
a14: enabled geolocation (binary)
a15: population of the account's location
a16: account's location latitude
a17: account's location longitude
a18: proportion of days with nonzero tweets

SLIDE 42

User participation in topic-specific discussions

NPMI (Bouma, 2009) + Spectral Clustering (von Luxburg, 2007)

Cluster labels, with words ranked by centrality:

  • Weather (τ1): mph, humidity, barometer, gust, winds, hpa, temperature, kt
  • Healthcare, Finance, Housing (τ2): nursing, nurse, rn, registered, bedroom, clinical, #news, estate, #hospital, rent, healthcare, therapist, condo, investment, furnished, medical, #nyc, occupational, investors, #ny
  • Politics (τ3): senate, republican, gop, police, arrested, voters, robbery, democrats, presidential, elections, charged, election, charges, #religion, arrest, repeal, dems, #christian, reform
  • Showbiz, Movies (τ4): damon, potter, #tvd, harry, elena, kate, portman, pattinson, hermione, jennifer, kristen, stefan, robert, catholic, stewart, katherine, lois, jackson, vampire, natalie, #vampirediaries
  • Commerce (τ5): chevrolet, inventory, coupon, toyota, mileage, sedan, nissan, adde, jeep, 4x4, 2002, #coupon, enhanced, #deal, dodge
  • Twitter hashtags (τ6): #teamfollowback, #500aday, #tfb, #instantfollowback, #ifollowback, #instantfollow, #followback
  • Social unrest (τ7): #egypt, #tunisia, #iran, #israel, #palestine, tunisia, arab, #jan25, iran, israel, protests, egypt, #yemen, #iranelection, israeli, #jordan, regime, yemen, #gaza, protesters, #lebanon
  • ...
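The NPMI part can be sketched directly from co-occurrence counts (toy tweets; the spectral clustering step over the resulting word-similarity matrix is omitted):

```python
import math
from collections import Counter
from itertools import combinations

# Toy corpus: each tweet as a set of tokens (made-up data)
tweets = [{"flu", "fever"}, {"flu", "fever", "cough"}, {"sunny", "park"},
          {"flu", "cough"}, {"sunny", "day"}, {"fever", "cough"}]

n = len(tweets)
uni = Counter(w for t in tweets for w in t)                              # word counts
pair = Counter(frozenset(p) for t in tweets for p in combinations(sorted(t), 2))

def npmi(x, y):
    """Normalised PMI in [-1, 1]; 1 means the words always co-occur."""
    pxy = pair[frozenset((x, y))] / n
    px, py = uni[x] / n, uni[y] / n
    return math.log(pxy / (px * py)) / -math.log(pxy)

print(round(npmi("flu", "fever"), 3))
```

NPMI values over all word pairs give a similarity matrix, which is what spectral clustering then partitions into the topic clusters above.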

SLIDE 43

User impact modelling as a regression task

Feature sets
  − user activity only (A)
  − A and top 1-grams (AW)
  − A and |τ| topic clusters (AC)

Regression via
  − Ridge Regression (RR)
  − Gaussian Processes (GP), using a Squared Exponential kernel with Automatic Relevance Determination (ARD) (Rasmussen and Williams, 2006)

GPs offer a very interesting (and well-established) framework for performing regression (and classification) tasks in a nonlinear, kernelised fashion; an introduction is available at http://videolectures.net/gpip06_mackay_gpb/
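A sketch of ARD with scikit-learn: an RBF (squared exponential) kernel with one length-scale per input dimension, fitted on synthetic data where only the first feature is informative. The learned length-scale of the irrelevant dimension is expected to grow large:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(9)
X = rng.uniform(-3, 3, size=(40, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)   # only the first feature matters

# A vector of length-scales (one per dimension) is the ARD part of the kernel
kernel = RBF(length_scale=[1.0, 1.0]) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)
length_scales = gp.kernel_.k1.length_scale        # fitted RBF length-scales
print(np.round(length_scales, 2))
```

Long length-scales effectively switch a feature off, which is what makes the fitted kernel interpretable as a feature-relevance ranking.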

SLIDE 44

Performance estimates

               Linear (RR)        Nonlinear (GP)
  Model        r       RMSE       r       RMSE
  A            .667    2.642      .759    2.298
  AW           .712    2.529      .768    2.263
  AC, |τ|=50   .703    2.518      .774    2.234
  AC, |τ|=100  .714    2.480      .780    2.210

Most valuable/relevant features

  1. default profile image
  2. # of historical tweets
  3. # of unique @-mentions
  4. # of tweets (last year)
  5. links (ratio)
  6. topic: weather
  7. topic: healthcare-finance
  8. topic: politics
  9. days with nonzero tweets (ratio)
  10. @-replies (ratio)

SLIDE 45

User impact — Qualitative analysis (1/2)

[Figure: impact score distributions for user accounts with high (H) or low (L) values of the most relevant user attributes: tweets in entire history (a11), unique @-mentions (a7), links (a9), @-replies (a8), days with nonzero tweets (a18). Solid line: μ(S) in the data set; dashed line: μ(S) in the user class.]

SLIDE 46

User impact — Qualitative analysis (2/2)

[Figure: impact score distributions for the five user-class comparisons A–E below.]

  • A: interactive (IA) vs. non-interactive (NIA) users
    − interactive: tweet regularly, make many @-mentions and @-replies, mention many different users
  • B: IA vs. clique-interactive (IAC) users
    − IAC: interactive, but not mentioning many different users
  • C: users that include links (L) vs. those that do not (NL) when discussing the most prediction-relevant topics (i.e., Politics and Showbiz)
  • D: topic-focused (TF) vs. overall-topic (TO) users
  • E: ‘serious’ (ST) vs. ‘light’ (LT) topics

SLIDE 47

Summary

You've seen:

  + how user-generated data can be used to make inferences about
    • collective mood/emotions
    • real-world phenomena (flu, rainfall rates)
    • political preference (voting intention)
  + a new class of bilinear models adapted to the nature of social media content
  + how a simplified notion of impact is connected to the usage of social media platforms

Some future challenges

  − embed such derivations into real-world systems to enhance decision making (e.g., epidemiological surveillance tasks)
  − further improve the applied supervised modelling (predictive models)

SLIDE 48

In collaboration with

Trevor Cohn, University of Melbourne
Nello Cristianini, University of Bristol
Daniel Preoţiuc-Pietro, University of Pennsylvania
Nikolaos Aletras, University College London
Thomas Lansdall-Welfare, University of Bristol

http://www.i-sense.org.uk/
SLIDE 49

Thank you

Any questions?

Download the slides from

http://www.lampos.net/research/talks-posters
SLIDE 50

References I

Al-Khayyal and Falk. Jointly Constrained Biconvex Programming. MOR, 1983.
Argyriou, Evgeniou and Pontil. Convex multi-task feature learning. Machine Learning, 2008.
Bach. Bolasso: Model Consistent Lasso Estimation through the Bootstrap. ICML, 2008.
Beck and Teboulle. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. SIAM J. Imaging Sci., 2009.
Bouma. Normalized (pointwise) mutual information in collocation extraction. GSCL, 2009.
Caruana. Multitask Learning. Machine Learning, 1997.
Efron, Hastie, Johnstone and Tibshirani. Least Angle Regression. The Annals of Statistics, 2004.
Gayo-Avello. A Meta-Analysis of State-of-the-Art Electoral Prediction From Twitter Data. SSCR, 2013.
Gayo-Avello, Metaxas and Mustafaraj. Limits of Electoral Predictions using Twitter. ICWSM, 2011.
Hastie, Tibshirani and Friedman. The Elements of Statistical Learning. 2009.
Hoerl and Kennard. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 1970.
Horst and Tuy. Global Optimization: Deterministic Approaches. 1996.

SLIDE 51

References II

Lampos and Cristianini. Tracking the flu pandemic by monitoring the Social Web. CIP, 2010.
Lampos and Cristianini. Nowcasting Events from the Social Web with Statistical Learning. ACM TIST, 2012.
Lampos, Preoţiuc-Pietro and Cohn. A user-centric model of voting intention from Social Media. ACL, 2013.
Lampos, Aletras, Preoţiuc-Pietro and Cohn. Predicting and Characterising User Impact on Twitter. EACL, 2014.
Liu, Ji and Ye. Multi-task feature learning via efficient ℓ2,1-norm minimization. UAI, 2009.
von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 2007.
Mairal, Jenatton, Obozinski and Bach. Network Flow Algorithms for Structured Sparsity. NIPS, 2010.
Metaxas, Mustafaraj and Gayo-Avello. How (not) to predict elections. SocialCom, 2011.
O'Connor, Balasubramanyan, Routledge and Smith. From Tweets to polls: Linking text sentiment to public opinion time series. ICWSM, 2010.
Preoţiuc-Pietro, Samangooei, Cohn, Gibbins and Niranjan. Trendminer: An architecture for real time analysis of social media text. ICWSM, 2012.
Rasmussen and Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.

SLIDE 52

References III

Strapparava and Valitutti. WordNet-Affect: An affective extension of WordNet. LREC, 2004.
Tausczik and Pennebaker. The Psychological Meaning of Words: LIWC and Computerized Text Analysis Methods. JLSP, 2010.
Tibshirani. Regression Shrinkage and Selection via the LASSO. JRSS, 1996.
Tumasjan, Sprenger, Sandner and Welpe. Predicting elections with Twitter: What 140 characters reveal about political sentiment. ICWSM, 2010.
Yuan and Lin. Model selection and estimation in regression with grouped variables. JRSS, 2006.
Zhao and Yu. On model selection consistency of LASSO. JMLR, 2006.
Zou and Hastie. Regularization and variable selection via the elastic net. JRSS, 2005.
