So Far Away and Yet so Close: Augmenting Toponym Disambiguation and Similarity with Text-Based Networks

  1. So Far Away and Yet so Close: Augmenting Toponym Disambiguation and Similarity with Text-Based Networks. Andreas Spitz, Johanna Geiß and Michael Gertz. Heidelberg University, Institute of Computer Science, Database Systems Research Group, Heidelberg. {spitz, geiss, gertz}@informatik.uni-heidelberg.de. 3rd GeoRich Workshop, San Francisco, June 26, 2016

  2. Motivation Network Construction Network Properties Toponym Disambiguation Summary Implicit Networks Augmenting Toponym Disambiguation with Text-Based Networks Andreas Spitz 1 of 18

  3. Implicit Text-Based Networks. “Most of the circuits currently in use are specially constructed for competition. The current street circuits are Monaco, Melbourne, Montreal, Singapore and Sochi, although races in other urban locations come and go (Las Vegas and Detroit, for example) and proposals for such races are often discussed – most recently New Jersey.” (en.wikipedia.org/wiki/Formula_One)

  4. Graph Extraction from Text

  5. Graph Extraction from Text. $s(v, w)$ := distance in sentences between toponyms $v$ and $w$; $d(v, w) := \exp\left(-\frac{s(v, w)}{2}\right)$
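
The distance weight on this slide can be sketched in a few lines. This is a minimal illustration, assuming that $s(v, w)$ is the absolute difference between the indices of the sentences in which the two toponyms occur (the slide only says "distance in sentences"); the function name `sentence_distance_weight` is hypothetical.

```python
import math


def sentence_distance_weight(sent_v: int, sent_w: int) -> float:
    """Return d(v, w) = exp(-s(v, w) / 2).

    Assumption: s(v, w) is the absolute difference of the sentence
    indices in which toponyms v and w are mentioned.
    """
    s = abs(sent_v - sent_w)
    return math.exp(-s / 2)


# Toponyms in the same sentence get the maximum weight of 1.0;
# the weight decays exponentially as the mentions move apart.
same_sentence = sentence_distance_weight(4, 4)
two_apart = sentence_distance_weight(4, 6)
```

Under this reading, two toponyms mentioned two sentences apart receive a weight of exp(-1), roughly 0.37.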

  7. Edge Aggregation. Distance-based cosine for nodes $v$ and $w$: $\mathrm{dicos}(v, w) := \frac{\sum_i d_i(v)\, d_i(w)}{\sqrt{\sum_i d_i(v)^2}\,\sqrt{\sum_i d_i(w)^2}}$
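
The aggregation formula has the shape of an ordinary cosine similarity, which the following sketch makes concrete. It assumes that the values $d_i(v)$ and $d_i(w)$ are available as two equal-length sequences over the same index set $i$; the slide does not say what $i$ enumerates, so that representation, and returning 0.0 for an all-zero vector, are assumptions of this example.

```python
import math
from typing import Sequence


def dicos(d_v: Sequence[float], d_w: Sequence[float]) -> float:
    """Cosine of two weight vectors, following the slide's formula.

    Assumption: d_v[i] and d_w[i] hold d_i(v) and d_i(w) for the
    same index i. An all-zero vector yields 0.0 by convention here.
    """
    if len(d_v) != len(d_w):
        raise ValueError("both vectors must share the same index set")
    numerator = sum(a * b for a, b in zip(d_v, d_w))
    norm_v = math.sqrt(sum(a * a for a in d_v))
    norm_w = math.sqrt(sum(b * b for b in d_w))
    if norm_v == 0.0 or norm_w == 0.0:
        return 0.0
    return numerator / (norm_v * norm_w)
```

Because all distance weights are non-negative, the result lies between 0 and 1, with 1 reached when the two vectors are proportional.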

  8. Nonreciprocal Relationships (image: Dirk Beyer, Wikimedia Commons)

  9. Inducing Edge Directions

  10. Inducing Edge Directions. Normalize weights of outgoing edges: $\omega(v \to w) := \frac{\mathrm{dicos}(v, w)}{\sum_{x \in V} \mathrm{dicos}(v, x)}$
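
The normalization step can be illustrated with a small sketch. It assumes the undirected dicos weights are stored in a dictionary keyed by node pairs and that all weights are strictly positive; the function name `induce_directions` and the toy node labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple

Edge = Tuple[str, str]


def induce_directions(dicos_weights: Dict[Edge, float]) -> Dict[Edge, float]:
    """Turn undirected dicos weights into directed weights omega(v -> w).

    Each outgoing weight is divided by the sum of all dicos weights
    incident to its source node. Assumes strictly positive weights.
    """
    out_sum: Dict[str, float] = defaultdict(float)
    for (v, w), weight in dicos_weights.items():
        out_sum[v] += weight
        out_sum[w] += weight

    omega: Dict[Edge, float] = {}
    for (v, w), weight in dicos_weights.items():
        omega[(v, w)] = weight / out_sum[v]
        omega[(w, v)] = weight / out_sum[w]
    return omega


# Toy example with made-up weights: "a" has two neighbours, "b" only one.
omega = induce_directions({("a", "b"): 0.5, ("a", "c"): 0.5})
```

In the toy example, `omega[("a", "b")]` is 0.5 while `omega[("b", "a")]` is 1.0: the same undirected edge matters more to the node with fewer neighbours, which is the kind of nonreciprocal relationship the slides refer to.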

  11. Adding Knowledge Base Support: Wikidata

  12. Toponym Extraction in Wikipedia & Wikidata

  13. Network Overview. Network statistics: $|V|$ = 723,779; $|E|$ = 178,890,238; density = $6.8 \cdot 10^{-4}$; clustering coefficient = 0.56. Node types: [figure]

  14. Network Overview. Network statistics: $|V|$ = 723,779; $|E|$ = 178,890,238; density = $6.8 \cdot 10^{-4}$; clustering coefficient = 0.56. Node types: [figure] Wikidata location hierarchy: [figure]
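
As a consistency check on these statistics, the reported density can be recomputed from the node and edge counts. The sketch assumes the usual density of a simple undirected graph, $2|E| / (|V|(|V| - 1))$; the slide does not state which formula was used, and the function name `undirected_density` is hypothetical.

```python
def undirected_density(num_nodes: int, num_edges: int) -> float:
    """Density of a simple undirected graph: 2|E| / (|V| (|V| - 1)).

    Assumption: this standard definition matches the one behind the
    slide's reported value.
    """
    return 2 * num_edges / (num_nodes * (num_nodes - 1))


# Node and edge counts as reported on the slide.
density = undirected_density(723_779, 178_890_238)
print(f"{density:.1e}")  # prints 6.8e-04
```

Under that assumption the counts reproduce the reported density of $6.8 \cdot 10^{-4}$.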

  15. Network Properties. [Figure: four panels showing % of remaining edges, clustering coefficient, number of components, and assortativity (y-axis: network metric) as a function of the dicos threshold (x-axis, 0.0000 to 0.0025).]
