SLIDE 1

Semi-supervised Geolocation via Graph Convolutional Networks

Afshin Rahimi, Trevor Cohn and Tim Baldwin July 16, 2018

SLIDE 2

Location Lost in Translation

[Diagram: Social Media and Society impact each other.]

SLIDE 3

Applications: Public Health Monitoring Allergy Rates (Paul and Dredze, 2011)

SLIDE 4

Applications: Emergency Situation Awareness (bushfires, floods and earthquakes)

“Fight bushfire with #fire: Alert hospital before anybody calls” (Cameron et al., 2012)

SLIDE 5

Location Location Location

[Chart: % of users by location source: Profile 64%, GPS 1%, Nothing 35%.]

The profile field is noisy (Hecht et al., 2011); GPS data is scarce (Hecht and Stephens, 2014) and biased toward younger, urban users (Pavalanathan and Eisenstein, 2015).

SLIDE 6

Geolocation: The three Ls

“Ain’t this place a geographical oddity; two weeks away from everywhere!”

The three Ls: Link, Location, Language.

User geolocation is the task of identifying the “home” location of a social media user using contextual information such as geographical variation in language use and in social interactions.

SLIDE 7

Huge amounts of unlabelled data, little labelled data. Multiple views of the data: text and network.

SLIDE 8

Previous Work (not exhaustive)

Text-based, supervised classification (no network): Backstrom et al. (2008); Cheng et al. (2010); Wing and Baldridge (2011, 2014)

Network-based, semi-supervised regression (no text): Backstrom et al. (2010); Davis Jr et al. (2011); Jurgens (2013)

Joint/hybrid text+network (don’t utilise unlabelled text data): Rahimi et al. (2015); Do et al. (2017); Miura et al. (2017)

Our work: Text+Network Semi-supervised Geolocation

SLIDE 9

Twitter Geolocation Datasets

Dataset        Users   Tweets
GeoText        9k      370k
TwitterUS      420k    38m
TwitterWorld   1.4m    12m
SLIDE 10

Discretisation of Labels: cluster the continuous lat/lon coordinates; the cluster IDs become the class labels. Use the median training point of the predicted region as the final continuous prediction. Evaluate using mean and median error (in km) between the known and the predicted coordinates.
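A minimal sketch of the evaluation step described above, assuming the cluster assignments have already been computed (the slide does not specify the clustering algorithm): each predicted cluster is mapped to the median coordinates of its training points, and the median great-circle error is reported.

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

def cluster_medians(train_coords, train_labels, n_clusters):
    """Median (lat, lon) of the training points in each cluster: the
    continuous prediction emitted for that cluster id."""
    return {c: np.median(train_coords[train_labels == c], axis=0)
            for c in range(n_clusters)}

def median_error_km(pred_labels, medians, true_coords):
    """Median distance between predicted-region medians and true coords."""
    preds = np.array([medians[c] for c in pred_labels])
    d = haversine_km(preds[:, 0], preds[:, 1], true_coords[:, 0], true_coords[:, 1])
    return float(np.median(d))
```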

SLIDE 11

Text and Network Views of Data

@-mention graph over users (Karin, Mark, Steven, Tim, Trevor).

[Matrices: the row-normalised adjacency matrix A over the five users (entries such as 0.5 and 0.25), and the text bag-of-words matrix X with vocabulary items such as “Melbourne” and “#ACL2018”.]

Two users are connected if they have a common @-mention.
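As an illustration of the adjacency matrix on this slide (the edge list below is invented for the example), a row-normalised adjacency matrix with self-loops can be built as:

```python
import numpy as np

# Toy @-mention graph over five users; an edge means the two users
# share an @-mention. The edges are hypothetical, for illustration only.
users = ["Karin", "Mark", "Steven", "Tim", "Trevor"]
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]

n = len(users)
A = np.eye(n)                  # self-loops keep each user's own features
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

A = A / A.sum(axis=1, keepdims=True)   # row-normalise: each row sums to 1
```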

SLIDE 12

Baseline 1: FeatConcat. Concatenate A and X and feed them to a DNN: ŷ = f([X, A]). The dimensionality of A, and consequently the number of parameters, grows with the number of samples.

SLIDE 13

Baseline 2: DCCA

[Architecture: the text BoW X and the neighbour vector A pass through separate FC networks f1 and f2, trained to be maximally correlated via the CCA loss (unsupervised DCCA); the learnt representations then feed a supervised geolocation network with a softmax output ŷ.]

Learn a shared representation using Deep Canonical Correlation Analysis (Andrew et al., 2013):

ρ = corr(f1(X), f2(A)) = cov(f1(X), f2(A)) / sqrt(var(f1(X)) · var(f2(A)))

ŷ = f([f1(X), f2(A)])
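The correlation being maximised can be sketched, for a single projection direction rather than the full multi-dimensional DCCA objective, as:

```python
import numpy as np

def correlation(u, v):
    """Sample correlation of two 1-d views: the quantity the CCA loss
    pushes up for the projected text and network representations."""
    u, v = u - u.mean(), v - v.mean()
    return (u @ v) / np.sqrt((u @ u) * (v @ v))
```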

SLIDE 14

Proposed Model: GCN

[Architecture: the input X passes through stacked Highway GCN layers (tanh activations), each also taking the normalised adjacency matrix A, followed by an output GCN layer with a softmax producing the predicted location ŷ; layer l has parameters W^l, b^l and gate parameters W_h^l, b_h^l.]

GCN layer: H^(l+1) = ReLU(A H^(l) W^(l) + b^(l))

Adding more layers results in expanded neighbourhood smoothing; control it with highway gates (W_h^l, b_h^l).
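The layer equation above as a minimal numpy sketch; the shapes and the row-normalised A below are assumptions of the example, not the paper's configuration:

```python
import numpy as np

def gcn_layer(A, H, W, b):
    """One graph-convolutional layer: smooth node features over the
    (row-normalised) adjacency A, project with W, shift by b, ReLU."""
    return np.maximum(0.0, A @ H @ W + b)
```

With a fully connected, uniformly normalised A, every node receives the same smoothed features, so all output rows coincide: the extreme case of neighbourhood smoothing.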

SLIDE 15

Highway GCN: Control Neighbourhood Smoothing

[Plot: median error (km) vs. number of layers (1–10), with (+highway) and without (-highway) highway gates.]

Layer gates: T(h^l) = σ(W_h^l h^l + b_h^l)

Layer output: h^(l+1) = ĥ^(l+1) ∘ T(h^l) + h^l ∘ (1 − T(h^l))

i.e. a gated (element-wise weighted) sum of the layer’s candidate output and its input.
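The gating equations can be sketched as follows; the tanh candidate and the square weight matrices are assumptions of the example, chosen so the input and output dimensions match:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def highway_gcn_layer(A, h, W, b, Wh, bh):
    """Highway GCN layer: the transform gate T decides, per dimension,
    how much of the new smoothed representation to keep vs. the input."""
    candidate = np.tanh(A @ h @ W + b)     # un-gated layer output
    T = sigmoid(h @ Wh + bh)               # transform gate T(h^l)
    return candidate * T + h * (1.0 - T)   # gated combination
```

Driving the gate bias strongly negative makes the layer an identity map, which is what lets deep stacks avoid over-smoothing.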

SLIDE 16

Neighbourhood Smoothing

[Illustration: the row-normalised adjacency matrix A (over Karin, Mark, Steven, Tim, Trevor) multiplied by the text BoW matrix X averages each user’s features with their neighbours’.]

Smoothing the immediate neighbourhood: A · X. Smoothing the expanded neighbourhood: A · A · X.
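A tiny worked example of the smoothing on a hypothetical three-user path graph: one application of A mixes a user's features with direct neighbours; a second application reaches two-hop neighbours.

```python
import numpy as np

# Path graph over three users: 0 - 1 - 2, with self-loops, row-normalised.
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0]])
A = A / A.sum(axis=1, keepdims=True)

# One-hot "bag of words": user 0 is the only one using some term.
X = np.array([[1.0], [0.0], [0.0]])

once  = A @ X        # the term's mass reaches user 1 (a direct neighbour)
twice = A @ A @ X    # after a second hop it also reaches user 2
```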

SLIDE 17

Sample Representation using t-SNE

[t-SNE plots of the learnt representations: FeatConcat [X, A]; DCCA; 1-layer GCN (A · X); 2-layer GCN (A · A · X).]

SLIDE 18

Test Results: Median Error

[Bar chart: median error in km on GeoText, TwitterUS and TwitterWorld for Text, Social and Hybrid baselines.]

SLIDE 19

Test Results: Median Error

[Bar chart: median error in km on GeoText, TwitterUS and TwitterWorld for Text, Social, Hybrid, Joint DCCA, Joint FeatConcat and Joint GCN.]

SLIDE 20

Test Results: Median Error

[Bar chart: median error in km on GeoText, TwitterUS and TwitterWorld for Text, Social, Hybrid, Joint DCCA, Joint FeatConcat, Joint GCN, Joint Miura et al. (2017) and Joint Do et al. (2017).]

SLIDE 21

Top Features Learnt from Unlabelled Data (1% Supervision)

Regions: Seattle, WA; Austin, TX; Jacksonville, FL; Columbus, OH. Terms (flattened across the four regions): #goseahawks stubb unf laffayette smock gsd ribault #weareohio traffuck #meatsweats wahoowa #arcgis ferran lanterna wjct #slammin promissory pupper fscj #ouhc chowdown effaced floridian #cow ckrib #austin #jacksonville mommyhood #uwhuskies lmfbo #mer beering

Top terms for a few regions detected by GCN using only 1% of Twitter-US for supervision; terms that already occurred in the labelled data are removed.

SLIDE 22
Dev. Results: How much labelled data do we really have?

[Three plots: median error (km) vs. percentage of labelled samples (1–100%) for GCN, DCCA and FeatConcat, on GeoText, Twitter-World and Twitter-US.]

[Bar chart: test median error in km on GeoText, TwitterUS and TwitterWorld with 1% labelled data, for Joint DCCA, Joint FeatConcat and Joint GCN.]

SLIDE 23

Confusion Matrix Between True Location and Predicted Location

[Confusion matrix over US states: true location (rows) vs. predicted location (columns).]

Users from smaller states are often misclassified into nearby larger states such as TX, NY, CA and OH. Users from FL are misclassified into several other states, possibly because they were not born in FL and remain well connected to their hometowns in other states.

SLIDE 24

Conclusion

Simple concatenation in FeatConcat is a strong baseline given large amounts of labelled data.

GCN performs well with both large and small amounts of labelled data by effectively using unlabelled data.

Gating mechanisms (e.g. highway gates) are essential for controlling neighbourhood smoothing in multi-layer GCNs.

The models proposed here are applicable to other demographic inference tasks.

SLIDE 25

Thank you!

Code available at: https://github.com/afshinrahimi/geographconv
