Semi-supervised Geolocation via Graph Convolutional Networks


  1. Semi-supervised Geolocation via Graph Convolutional Networks. Afshin Rahimi, Trevor Cohn and Tim Baldwin. July 16, 2018.

  2. Location Lost in Translation: Social Media and Society impact each other.

  3. Applications: Public Health Monitoring, e.g. allergy rates (Paul and Dredze, 2011).

  4. Applications: Emergency Situation Awareness for bushfires, floods and earthquakes. "Fight bushfire with #fire: alert the hospital before anybody calls" (Cameron et al., 2012).

  5. Location, Location, Location. Where does user location come from? Profile field: 64% of users; GPS: 1%; nothing: 35%. The profile field is noisy (Hecht et al., 2011), GPS data is scarce (Hecht and Stephens, 2014) and biased toward younger urban users (Pavalanathan and Eisenstein, 2015).

  6. Geolocation: The Three Ls: Language, Link, Location. "Ain't this place a geographical oddity; two weeks away from everywhere!" User geolocation is the task of identifying the "home" location of a social media user using contextual information such as geographical variation in language use and in social interactions.

  7. Huge amounts of unlabelled data, little labelled data. Multiple views of the data: text and network.

  8. Previous Work (not exhaustive).
      Text-based supervised classification (no network): Backstrom et al. (2008), Cheng et al. (2010), Wing and Baldridge (2011, 2014)
      Network-based semi-supervised regression (no text): Backstrom et al. (2010), Davis Jr et al. (2011), Jurgens (2013)
      Joint/hybrid text+network (don't utilise unlabelled text data): Rahimi et al. (2015), Do et al. (2017), Miura et al. (2017)
      Our work: text+network, semi-supervised geolocation.

  9. Twitter Geolocation Datasets (log-scale user and tweet counts): GeoText (~9k users, ~370k tweets), Twitter-US (~420k users, ~38m tweets), Twitter-World (~1.4m users, ~12m tweets).

  10. Discretisation of Labels. Cluster the continuous lat/lon coordinates: cluster ids become the class labels. Use the median training point of the predicted region as the final continuous prediction. Evaluate using the mean and median error (in km) between the known and predicted coordinates.
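
As a rough illustration of this pipeline, here is a minimal sketch that assumes scikit-learn's KMeans as a stand-in for the clustering step and a haversine distance for the error metric; the coordinates and the predicted cluster id are invented.

```python
# Sketch of label discretisation and evaluation (KMeans is an assumption,
# not necessarily the paper's clustering method).
import numpy as np
from sklearn.cluster import KMeans

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = np.sin((lat2 - lat1) / 2) ** 2 + \
        np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

# Toy training coordinates (lat, lon); real data would be users' home locations.
train_coords = np.array([[47.6, -122.3], [30.3, -97.7], [30.4, -97.6], [40.0, -83.0]])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(train_coords)
labels = kmeans.labels_  # discrete cluster ids used as classification targets

# Final continuous prediction: the median training point of the predicted
# cluster (per-dimension median used here as an approximation).
cluster_medians = np.array([np.median(train_coords[labels == c], axis=0)
                            for c in range(kmeans.n_clusters)])

predicted_cluster = 1                    # e.g. the classifier's output for a test user
true_coord = np.array([30.27, -97.74])   # the test user's known home location
pred_coord = cluster_medians[predicted_cluster]
error = haversine_km(true_coord[0], true_coord[1], pred_coord[0], pred_coord[1])
# Mean and median of such errors over all test users are the reported metrics.
```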

  11. Text and Network Views of Data. Each user has two views: a text bag-of-words vector (a row of X) and a row of the normalised adjacency matrix A of the @-mention graph, where two users are connected if they have a common @-mention. In the toy example (Karin, Mark, Steven, Tim, Trevor), Steven is isolated and Karin, Mark and Tim are each connected only to Trevor, giving the row-normalised A:

                Karin  Mark  Steven  Tim   Trevor
      Karin      0.50  0.00   0.00   0.00   0.50
      Mark       0.00  0.50   0.00   0.00   0.50
      Steven     0.00  0.00   1.00   0.00   0.00
      Tim        0.00  0.00   0.00   0.50   0.50
      Trevor     0.25  0.25   0.00   0.25   0.25
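
The adjacency construction can be sketched as follows; the @-mention handles are invented purely so that the resulting matrix matches the toy example above, and are not from the paper's data.

```python
# Minimal sketch: connect two users if they share a common @-mention,
# add self-loops, and row-normalise the adjacency matrix.
import numpy as np

mentions = {                       # hypothetical mention sets per user
    "Karin":  {"@acl"},
    "Mark":   {"@emnlp"},
    "Steven": {"@coling"},
    "Tim":    {"@naacl"},
    "Trevor": {"@acl", "@emnlp", "@naacl"},
}
users = sorted(mentions)           # Karin, Mark, Steven, Tim, Trevor
n = len(users)
A = np.eye(n)                      # self-loops keep each user's own features
for i, u in enumerate(users):
    for j, v in enumerate(users):
        if i != j and mentions[u] & mentions[v]:   # common @-mention
            A[i, j] = 1.0
A /= A.sum(axis=1, keepdims=True)  # row-normalised adjacency
print(np.round(A, 2))              # reproduces the 5x5 matrix shown above
```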

  12. Baseline 1: FeatConcat. Concatenate A and X and feed them to a DNN: Y = f([X, A]). The dimensions of A, and consequently the number of parameters, grow with the number of samples.
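
A hedged sketch of this baseline, using scikit-learn's MLPClassifier as a stand-in for the DNN and random arrays in place of the real text and network views:

```python
# FeatConcat sketch: concatenate the BoW features X with the user's row of A
# and train a feed-forward classifier over the discretised location labels.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n_users, n_words, n_classes = 100, 50, 4
X = rng.random((n_users, n_words))             # text BoW view (toy)
A = rng.random((n_users, n_users))             # adjacency view (toy)
A /= A.sum(axis=1, keepdims=True)              # row-normalised
y = rng.integers(0, n_classes, size=n_users)   # discretised location labels

XA = np.hstack([X, A])                         # [X, A]: one row per user
clf = MLPClassifier(hidden_layer_sizes=(300,), max_iter=200).fit(XA, y)
# Note: the A block contributes one input column per user, so the parameter
# count grows with the number of samples, which is the slide's criticism.
```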

  13. Baseline 2: DCCA. Learn a shared representation of the two views using Deep Canonical Correlation Analysis (Andrew et al., 2013): two networks f1 (over the text BoW X) and f2 (over the neighbour vector A) are trained unsupervised to maximise the correlation

      ρ = corr(f1(X), f2(A)) = cov(f1(X), f2(A)) / sqrt(var(f1(X)) · var(f2(A)))

      and the maximally correlated representations are then fed to a supervised geolocation classifier with a softmax output: Y = f([f1(X), f2(A)]), giving the predicted location ŷ.
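
For intuition, here is a small numeric check of the correlation objective in the one-dimensional case; in DCCA, f1 and f2 are deep networks and the multi-dimensional objective is optimised in matrix form, but the scalar case reduces to the formula above. The projections below are synthetic.

```python
# Correlation between two (synthetic) one-dimensional projections of the
# text view and the network view.
import numpy as np

rng = np.random.default_rng(0)
f1_x = rng.standard_normal(1000)                       # projection of the text view
f2_a = 0.8 * f1_x + 0.2 * rng.standard_normal(1000)    # projection of the network view

cov = np.mean((f1_x - f1_x.mean()) * (f2_a - f2_a.mean()))
rho = cov / np.sqrt(f1_x.var() * f2_a.var())
print(rho)   # matches np.corrcoef(f1_x, f2_a)[0, 1]
```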

  14. Proposed Model: GCN. The input is H^0 = X; a stack of highway GCN layers (parameters W^l, b^l and gate parameters W_h^l, b_h^l; tanh activations in the diagram) propagates features over the @-mention graph A, and an output GCN layer with a softmax produces the predicted location ŷ. GCN layer: H^(l+1) = ReLU(A H^(l) W^(l) + b^(l)). Adding more layers results in expanded neighbourhood smoothing, which is controlled with the highway gates W_h^l, b_h^l.
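
A minimal numpy sketch of one such layer, following the displayed formula, with invented shapes and a uniform adjacency matrix standing in for the real @-mention graph:

```python
# One GCN layer: H^(l+1) = ReLU(A H^(l) W^(l) + b^(l)).
import numpy as np

rng = np.random.default_rng(0)
n_users, d_in, d_hidden = 5, 10, 8
A = np.full((n_users, n_users), 1.0 / n_users)   # toy normalised adjacency
H0 = rng.random((n_users, d_in))                 # H^(0) = X, the BoW features

def gcn_layer(A, H, W, b):
    """Neighbourhood smoothing (A @ H) followed by a linear projection and ReLU."""
    return np.maximum(A @ H @ W + b, 0.0)

W1 = rng.standard_normal((d_in, d_hidden)) * 0.1
b1 = np.zeros(d_hidden)
H1 = gcn_layer(A, H0, W1, b1)                    # shape: (n_users, d_hidden)
```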

  15. Highway GCN: Control Neighbourhood Smoothing. (Figure: median error in km versus number of layers, 1 to 10, with and without highway gates.) Layer gates: T(h^l) = σ(W_h^l h^l + b_h^l). Layer output: h^(l+1) = ĥ^(l+1) ◦ T(h^l) + h^l ◦ (1 − T(h^l)), a weighted sum of the layer's input h^l and its GCN output ĥ^(l+1).
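
A sketch of a highway-gated GCN layer under the same toy assumptions (numpy, invented shapes); note that gating requires the layer's input and output to have the same width.

```python
# Highway-gated GCN layer: the gate T decides, per unit, how much of the new
# smoothed representation to keep versus how much of the input to carry through.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway_gcn_layer(A, H, W, b, W_h, b_h):
    H_new = np.maximum(A @ H @ W + b, 0.0)   # candidate GCN output, same width as H
    T = sigmoid(H @ W_h + b_h)               # gate in (0, 1), computed from the input
    return T * H_new + (1.0 - T) * H         # weighted sum of layer output and input

rng = np.random.default_rng(0)
n_users, d = 5, 8
A = np.full((n_users, n_users), 1.0 / n_users)
H = rng.random((n_users, d))
W, W_h = rng.standard_normal((d, d)) * 0.1, rng.standard_normal((d, d)) * 0.1
b, b_h = np.zeros(d), np.zeros(d)
H_next = highway_gcn_layer(A, H, W, b, W_h, b_h)
```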

  16. Neighbourhood Smoothing. Multiplying the text BoW features X by the normalised adjacency matrix A (the 5x5 matrix from slide 11) smooths each user's features over their immediate neighbourhood: A · X. Applying A again smooths over the expanded, two-hop neighbourhood: A · A · X.
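
The toy adjacency matrix from slide 11 makes this concrete; with X taken as identity (one-hot) features, one application of A mixes each user's features with their immediate neighbours', and a second application also reaches two-hop neighbours.

```python
# Immediate vs. expanded neighbourhood smoothing with the toy 5x5 adjacency.
import numpy as np

A = np.array([[0.50, 0.00, 0.0, 0.00, 0.50],   # Karin
              [0.00, 0.50, 0.0, 0.00, 0.50],   # Mark
              [0.00, 0.00, 1.0, 0.00, 0.00],   # Steven
              [0.00, 0.00, 0.0, 0.50, 0.50],   # Tim
              [0.25, 0.25, 0.0, 0.25, 0.25]])  # Trevor
X = np.eye(5)                                  # toy one-hot "BoW" per user

one_hop = A @ X        # Karin's row mixes Karin's and Trevor's features
two_hop = A @ A @ X    # Karin's row now also picks up Mark's and Tim's features
print(np.round(two_hop[0], 3))
```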

  17. Sample Representation using t-SNE. (Figure: t-SNE projections of user representations from FeatConcat [X, A], DCCA, a 1-layer GCN (A · X) and a 2-layer GCN (A · A · X).)

  18. Test Results: Median Error. (Figure: median error in km for text-only, network-only (social) and hybrid baselines on GeoText, Twitter-US and Twitter-World.)

  19. Test Results: Median Error. (Figure: as above, adding the joint models Joint DCCA, Joint FeatConcat and Joint GCN.)

  20. Test Results: Median Error. (Figure: as above, also comparing against the joint models of Miura et al. (2017) and Do et al. (2017).)

  21. Top Features Learnt from Unlabelled Data (1% Supervision). Top terms for a few regions detected by GCN using only 1% of Twitter-US for supervision; terms that already occurred in the labelled data are removed.
      Seattle, WA:      #goseahawks, smock, traffuck, ferran, promissory, chowdown, ckrib, #uwhuskies
      Austin, TX:       stubb, gsd, #meatsweats, lanterna, pupper, effaced, #austin, lmfbo
      Jacksonville, FL: unf, ribault, wahoowa, wjct, fscj, floridian, #jacksonville, #mer
      Columbus, OH:     laffayette, #weareohio, #arcgis, #slammin, #ouhc, #cow, mommyhood, beering

  22. Dev. Results: How much labelled data do we really have? (Figures: median error in km versus the percentage of labelled samples, from 1% to 100%, for GCN, DCCA and FeatConcat on GeoText, Twitter-US and Twitter-World; plus test results on all three datasets with only 1% labelled data for Joint DCCA, Joint FeatConcat and Joint GCN.)

  23. Confusion Matrix Between True Location and Predicted Location. (Figure: state-level confusion matrix over US states.) Users from smaller states are misclassified into nearby larger states such as TX, NY, CA, and OH. Users from FL are misclassified into several other states, possibly because they are not born in FL and are well connected to their hometowns in other states.

  24. Conclusion. Simple concatenation in FeatConcat is a strong baseline with large amounts of labelled data. GCN performs well with both large and small amounts of labelled data by effectively using unlabelled data. Gating mechanisms (e.g. highway gates) are essential for controlling neighbourhood smoothing in GCNs with multiple layers. The models proposed here are applicable to other demographic inference tasks.

  25. Thank you! Code available at: https://github.com/afshinrahimi/geographconv
