SLIDE 1

Semi-supervised Geolocation via Graph Convolutional Networks

Afshin Rahimi, Trevor Cohn and Tim Baldwin July 16, 2018

SLIDE 2

Location Lost in Translation

[Diagram: Social Media and Society impact each other.]

SLIDE 3

Applications: Public Health Monitoring Allergy Rates (Paul and Dredze, 2011)

SLIDE 4

Applications: Emergency Situation Awareness (bushfires, floods and earthquakes)

“Fight bushfire with #fire: Alert hospital before anybody calls” (Cameron et al., 2012)

SLIDE 5

Location Location Location

[Chart: % of users by location source: Profile 64%, GPS 1%, Nothing 35%.]

The profile field is noisy (Hecht et al., 2011); GPS data is scarce (Hecht and Stephens, 2014) and biased toward younger, urban users (Pavalanathan and Eisenstein, 2015).

SLIDE 6

Geolocation: The three Ls

“Ain’t this place a geographical oddity; two weeks away from everywhere!”

The three Ls: Link, Location, Language.

User geolocation is the task of identifying the “home” location of a social media user using contextual information such as geographical variation in language use and in social interactions.

SLIDE 7

Huge amounts of unlabelled data, little labelled data. Multiple views of the data: text and network.

SLIDE 8

Previous Work (not exhaustive)

Text-based, supervised classification (no network): Backstrom et al. (2008); Cheng et al. (2010); Wing and Baldridge (2011, 2014)

Network-based, semi-supervised regression (no text): Backstrom et al. (2010); Davis Jr et al. (2011); Jurgens (2013)

Joint/hybrid text+network (don’t utilise unlabelled text data): Rahimi et al. (2015); Do et al. (2017); Miura et al. (2017)

Our work: Text+Network Semi-supervised Geolocation

SLIDE 9

Twitter Geolocation Datasets

Dataset        Users   Tweets
GeoText        9k      370k
TwitterUS      420k    38m
TwitterWorld   1.4m    12m
SLIDE 10

Discretisation of Labels: cluster the continuous lat/lon coordinates; the cluster IDs become the class labels. Use the median training point of the predicted region as the final continuous prediction. Evaluate using mean and median error (in km) between the known and the predicted coordinates.
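A minimal sketch of the evaluation step described above, assuming the cluster assignments have already been computed (the slide does not specify the clustering algorithm): each predicted cluster is mapped to the median coordinates of its training points, and the median great-circle error is reported.

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

def cluster_medians(train_coords, train_labels, n_clusters):
    """Median (lat, lon) of the training points in each cluster: the
    continuous prediction emitted for that cluster id."""
    return {c: np.median(train_coords[train_labels == c], axis=0)
            for c in range(n_clusters)}

def median_error_km(pred_labels, medians, true_coords):
    """Median distance between predicted-region medians and true coords."""
    preds = np.array([medians[c] for c in pred_labels])
    d = haversine_km(preds[:, 0], preds[:, 1], true_coords[:, 0], true_coords[:, 1])
    return float(np.median(d))
```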

SLIDE 11

Text and Network Views of Data

@-mention graph over users (Karin, Mark, Steven, Tim, Trevor).

[Matrices: the row-normalised adjacency matrix A over the five users (entries such as 0.5 and 0.25), and the text bag-of-words matrix X with vocabulary items such as “Melbourne” and “#ACL2018”.]

Two users are connected if they have a common @-mention.
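As an illustration of the adjacency matrix on this slide (the edge list below is invented for the example), a row-normalised adjacency matrix with self-loops can be built as:

```python
import numpy as np

# Toy @-mention graph over five users; an edge means the two users
# share an @-mention. The edges are hypothetical, for illustration only.
users = ["Karin", "Mark", "Steven", "Tim", "Trevor"]
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]

n = len(users)
A = np.eye(n)                  # self-loops keep each user's own features
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

A = A / A.sum(axis=1, keepdims=True)   # row-normalise: each row sums to 1
```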

SLIDE 12

Baseline 1: FeatConcat. Concatenate A and X and feed them to a DNN: ŷ = f([X, A]). The dimensionality of A, and consequently the number of parameters, grows with the number of samples.

SLIDE 13

Baseline 2: DCCA

[Architecture: the text BoW X and the neighbour vector A pass through separate FC networks f1 and f2, trained to be maximally correlated via the CCA loss (unsupervised DCCA); the learnt representations then feed a supervised geolocation network with a softmax output ŷ.]

Learn a shared representation using Deep Canonical Correlation Analysis (Andrew et al., 2013):

ρ = corr(f1(X), f2(A)) = cov(f1(X), f2(A)) / sqrt(var(f1(X)) · var(f2(A)))

ŷ = f([f1(X), f2(A)])
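The correlation being maximised can be sketched, for a single projection direction rather than the full multi-dimensional DCCA objective, as:

```python
import numpy as np

def correlation(u, v):
    """Sample correlation of two 1-d views: the quantity the CCA loss
    pushes up for the projected text and network representations."""
    u, v = u - u.mean(), v - v.mean()
    return (u @ v) / np.sqrt((u @ u) * (v @ v))
```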

SLIDE 14

Proposed Model: GCN

[Architecture: the input X passes through stacked Highway GCN layers (tanh activations), each also taking the normalised adjacency matrix A, followed by an output GCN layer with a softmax producing the predicted location ŷ; layer l has parameters W^l, b^l and gate parameters W_h^l, b_h^l.]

GCN layer: H^(l+1) = ReLU(A H^(l) W^(l) + b^(l))

Adding more layers results in expanded neighbourhood smoothing; control it with highway gates (W_h^l, b_h^l).
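The layer equation above as a minimal numpy sketch; the shapes and the row-normalised A below are assumptions of the example, not the paper's configuration:

```python
import numpy as np

def gcn_layer(A, H, W, b):
    """One graph-convolutional layer: smooth node features over the
    (row-normalised) adjacency A, project with W, shift by b, ReLU."""
    return np.maximum(0.0, A @ H @ W + b)
```

With a fully connected, uniformly normalised A, every node receives the same smoothed features, so all output rows coincide: the extreme case of neighbourhood smoothing.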

SLIDE 15

Highway GCN: Control Neighbourhood Smoothing

[Plot: median error (km) vs. number of layers (1–10), with (+highway) and without (-highway) highway gates.]

Layer gates: T(h^l) = σ(W_h^l h^l + b_h^l)

Layer output: h^(l+1) = ĥ^(l+1) ∘ T(h^l) + h^l ∘ (1 − T(h^l))

i.e. a gated (element-wise weighted) sum of the layer’s candidate output and its input.
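The gating equations can be sketched as follows; the tanh candidate and the square weight matrices are assumptions of the example, chosen so the input and output dimensions match:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def highway_gcn_layer(A, h, W, b, Wh, bh):
    """Highway GCN layer: the transform gate T decides, per dimension,
    how much of the new smoothed representation to keep vs. the input."""
    candidate = np.tanh(A @ h @ W + b)     # un-gated layer output
    T = sigmoid(h @ Wh + bh)               # transform gate T(h^l)
    return candidate * T + h * (1.0 - T)   # gated combination
```

Driving the gate bias strongly negative makes the layer an identity map, which is what lets deep stacks avoid over-smoothing.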

SLIDE 16

Neighbourhood Smoothing

[Illustration: the row-normalised adjacency matrix A (over Karin, Mark, Steven, Tim, Trevor) multiplied by the text BoW matrix X averages each user’s features with their neighbours’.]

Smoothing the immediate neighbourhood: A · X. Smoothing the expanded neighbourhood: A · A · X.
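A tiny worked example of the smoothing on a hypothetical three-user path graph: one application of A mixes a user's features with direct neighbours; a second application reaches two-hop neighbours.

```python
import numpy as np

# Path graph over three users: 0 - 1 - 2, with self-loops, row-normalised.
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0]])
A = A / A.sum(axis=1, keepdims=True)

# One-hot "bag of words": user 0 is the only one using some term.
X = np.array([[1.0], [0.0], [0.0]])

once  = A @ X        # the term's mass reaches user 1 (a direct neighbour)
twice = A @ A @ X    # after a second hop it also reaches user 2
```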

SLIDE 17

Sample Representation using t-SNE

[t-SNE plots of the learnt representations: FeatConcat [X, A]; DCCA; 1-layer GCN (A · X); 2-layer GCN (A · A · X).]

SLIDE 18

Test Results: Median Error

[Bar chart: median error in km on GeoText, TwitterUS and TwitterWorld for Text, Social and Hybrid baselines.]

SLIDE 19

Test Results: Median Error

[Bar chart: median error in km on GeoText, TwitterUS and TwitterWorld for Text, Social, Hybrid, Joint DCCA, Joint FeatConcat and Joint GCN.]

SLIDE 20

Test Results: Median Error

[Bar chart: median error in km on GeoText, TwitterUS and TwitterWorld for Text, Social, Hybrid, Joint DCCA, Joint FeatConcat, Joint GCN, Joint Miura et al. (2017) and Joint Do et al. (2017).]

SLIDE 21

Top Features Learnt from Unlabelled Data (1% Supervision)

Regions: Seattle, WA; Austin, TX; Jacksonville, FL; Columbus, OH. Terms (flattened across the four regions): #goseahawks stubb unf laffayette smock gsd ribault #weareohio traffuck #meatsweats wahoowa #arcgis ferran lanterna wjct #slammin promissory pupper fscj #ouhc chowdown effaced floridian #cow ckrib #austin #jacksonville mommyhood #uwhuskies lmfbo #mer beering

Top terms for a few regions detected by GCN using only 1% of Twitter-US for supervision; terms that already occurred in the labelled data are removed.

SLIDE 22
Dev. Results: How much labelled data do we really have?

[Three plots: median error (km) vs. percentage of labelled samples (1–100%) for GCN, DCCA and FeatConcat, on GeoText, Twitter-World and Twitter-US.]

[Bar chart: test median error in km on GeoText, TwitterUS and TwitterWorld with 1% labelled data, for Joint DCCA, Joint FeatConcat and Joint GCN.]

SLIDE 23

Confusion Matrix Between True Location and Predicted Location

[Confusion matrix over US states: true location (rows) vs. predicted location (columns).]

Users from smaller states are often misclassified into nearby larger states such as TX, NY, CA and OH. Users from FL are misclassified into several other states, possibly because they were not born in FL and remain well connected to their hometowns in other states.

SLIDE 24

Conclusion

Simple concatenation in FeatConcat is a strong baseline given large amounts of labelled data.

GCN performs well with both large and small amounts of labelled data by effectively using unlabelled data.

Gating mechanisms (e.g. highway gates) are essential for controlling neighbourhood smoothing in multi-layer GCNs.

The models proposed here are applicable to other demographic inference tasks.

SLIDE 25

Thank you!

Code available at: https://github.com/afshinrahimi/geographconv
