1/17
RISIS / Working with geographical data
UPEM geocoding and clustering methods
applied to EUPRO FP3 subdataset
Lionel Villard, Michel Revollo 10/09/2015
2/17
Goals / Sources / Geocoding / Filtering / Clustering / Boundaries / Naming / Further challenges
3/17
4/17
We chose to use two different attributes:
  sAddress_orig: complete addresses, possibly including building names, postal codes, cities and countries
    19 710 objects
    5 without address (excluded)
    4 with only a country in the address (excluded)
  sCity and ISO country names: 95,8 % with a city name
We also tried to use the postal code, but it was not accurate with the batchgeocode geocoding engine.
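The exclusion rule above (drop objects with no address or with only a country in the address) can be sketched as follows. This is a minimal illustration, not the authors' actual procedure: only the field name sAddress_orig comes from the slide, while the helper `usable_address`, the sample records and the small country list are hypothetical.

```python
def usable_address(address, country_names):
    """Return True when the address is neither empty nor a bare country name."""
    if address is None:
        return False
    # Drop surrounding whitespace and stray commas before testing.
    cleaned = address.strip(" ,")
    if not cleaned:
        return False
    return cleaned.casefold() not in country_names


# Hypothetical country list and records, for illustration only.
COUNTRY_NAMES = {"france", "germany", "italy"}
records = [
    {"sAddress_orig": "5 Boulevard Descartes, 77454 Marne-la-Vallee, France"},
    {"sAddress_orig": "France"},  # only a country: excluded
    {"sAddress_orig": ""},        # no address: excluded
    {"sAddress_orig": None},      # no address: excluded
]

kept = [r for r in records if usable_address(r["sAddress_orig"], COUNTRY_NAMES)]
```

A real run would need a complete list of country names (and their ISO codes) rather than the three placeholders used here.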
5/17
6/17
7/17
accuracy   % Geocoded Addr   LabelAccuracy
1          0%                Country level
2          0%                Region (state, province, prefecture, etc.) level
3          51%               Sub-region (county, municipality, etc.) level
4          0%                Town (city, village) level
5          13%               Post code (zip code) level
6          0%                Street level
7          0%                Intersection level
8          6%                Address level
9          37%               Premise (building name, property name, shopping center, etc.) level
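The accuracy scale above lends itself to a simple lookup and tally. The sketch below shows one way to compute the share of geocoded addresses per level and to keep only results at or above a given level. The level labels come from the table; the function names, the sample codes and the idea of a minimum-level threshold are assumptions made for illustration.

```python
from collections import Counter

# Accuracy codes and labels as listed in the table above.
ACCURACY_LABELS = {
    1: "Country", 2: "Region", 3: "Sub-region", 4: "Town", 5: "Post code",
    6: "Street", 7: "Intersection", 8: "Address", 9: "Premise",
}


def accuracy_shares(codes):
    """Return the fraction of geocoded addresses at each accuracy level."""
    total = len(codes)
    if total == 0:
        return {level: 0.0 for level in ACCURACY_LABELS}
    counts = Counter(codes)
    return {level: counts.get(level, 0) / total for level in ACCURACY_LABELS}


def at_least(codes, min_level):
    """Keep only the codes at or above a (hypothetical) minimum accuracy level."""
    return [code for code in codes if code >= min_level]


# Hypothetical accuracy codes returned by a geocoding run.
sample_codes = [3, 3, 9, 5, 8, 9, 3, 9]
shares = accuracy_shares(sample_codes)
precise = at_least(sample_codes, 5)
```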
8/17
Sources of addresses, accuracies and geocoded addresses
9/17
Top 10: geocoded addresses per country
10/17
11/17
12/17
Main goal: convert the points grouped by a unique cluster key into areas delimited by boundaries
Method: Minimum Convex Polygons (MCP, or convex hulls) computed with the Geospatial Modelling Environment software (Hawthorne Beyer, 2014)
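To illustrate what a minimum convex polygon per cluster key means, here is a small pure-Python sketch using Andrew's monotone chain algorithm. It is not the Geospatial Modelling Environment implementation. It treats longitude/latitude pairs as planar coordinates, which is an assumption, and the function names and row layout (cluster id, lon, lat) are hypothetical.

```python
from collections import defaultdict


def cross(o, a, b):
    """Z-component of the cross product OA x OB (positive for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the convex hull in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Pop points that would make a right turn or a straight line.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


def hulls_by_cluster(rows):
    """Group (cluster_id, lon, lat) rows by cluster key and hull each group."""
    groups = defaultdict(list)
    for cluster_id, lon, lat in rows:
        groups[cluster_id].append((lon, lat))
    return {cid: convex_hull(pts) for cid, pts in groups.items()}
```

A cluster with fewer than three distinct points yields a point or a segment rather than a polygon, so such clusters would need separate handling (for example a buffer) before being used as areas.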
13/17
Main goals:
  finding a relevant name for each cluster (a readable and easily understandable name that does not depend on the data)
  identifying the core cities of the clusters
Method: geographical intersection of two layers
  Populated Places: layer of points for cities produced by the Natural Earth project (Fourth Edition, Oct. 2009-2012, mainly members of the North American Cartographic Information Society); many capitals, major cities and towns, plus a sampling of smaller towns in sparsely inhabited regions
  Cluster shapes: layer of shapes for clusters
14/17
Geoprocessing: intersection
  Input: all the 7323 cities with population (2012)
  Input: all cluster shapes
  Output: selected cities with population inside the cluster shapes
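The intersection of the city points with the cluster shapes can be sketched with a ray-casting point-in-polygon test. This is only an illustration of the geoprocessing step, not the GIS tool used for the slides. The function names and the tuple layout (name, lon, lat, population) are assumptions, and points lying exactly on a boundary are not handled specially.

```python
def point_in_polygon(point, polygon):
    """Ray casting: count the edges crossed by a ray going right from the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def cities_in_clusters(cities, cluster_shapes):
    """Return, per cluster id, the (name, population) of cities inside its shape."""
    selected = {cid: [] for cid in cluster_shapes}
    for name, lon, lat, population in cities:
        for cid, shape in cluster_shapes.items():
            if point_in_polygon((lon, lat), shape):
                selected[cid].append((name, population))
    return selected
```

With 7323 cities and a modest number of clusters this brute-force double loop is fast enough. A spatial index would only matter for much larger layers.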
15/17
Building the cluster's name with a popularity criterion: the names of the core cities are ordered by population
Examples of cluster names

IdClusterD   ClustAddr   ClustName
1            1034        Athens / Piraievs
2            362         Lisbon
3            77          Valencia
4            466         Madrid
5            120         Thessaloniki
6            197         Barcelona
7            562         Rome / Vatican City
8            121         Toulouse
9            112         Montpellier
10           75          Pisa
11           97          Florence
12           88          Genoa
13           82          Bologna
14           150         Turin
15           159         Grenoble
16           308         Milan
17           130         Lyon
18           272         Munich
19           79          Vienna
20           2552        Paris / Versailles

Main organisations in proportion of addresses in the clusters

IdClusterD   ClustAddr   ClustName                      stOrg                                                            NbOrgAdd   Pc
42           1662        Kobenhavn / Malmo / Roskilde   Technical University of Denmark - Danmarks Tekniske              97         5,84%
42           1662        Kobenhavn / Malmo / Roskilde   University of Copenhagen - Koebenhavns Universitet (KU)          91         5,48%
25           1110        Brussels / Namur               Katholieke Universiteit Leuven                                   108        9,73%
25           1110        Brussels / Namur               Universite catholique de Louvain                                 73         6,58%
1            1034        Athens / Piraievs              National Technical University of Athens (NTUA)                   87         8,41%
7            562         Rome / Vatican City            Universitá di Roma La Sapienza, University of Rome La Sapienza   40         7,12%
30           531         Essen / Wuppertal              Ruhr-Universität Bochum                                          29         5,46%
4            466         Madrid                         UPM Universidad Politecnica de Madrid/Madrid Polytechnical       55         11,80%
4            466         Madrid                         CSIC - Consejo Superior de Investigaciones Cientificas/Higher    52         11,16%
4            466         Madrid                         UCM Universidad Complutense de Madrid                            49         10,52%
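The naming rule and the Pc column can be sketched in a few lines. The slides do not say how many core cities enter a name, so the `max_names` cut-off is a hypothetical parameter, as are the function names and the placeholder cities. The `share` helper assumes that Pc is NbOrgAdd divided by ClustAddr, which matches the figures in the table.

```python
def cluster_name(core_cities, max_names=3):
    """Join the core city names, most populous first, with ' / '."""
    ranked = sorted(core_cities, key=lambda city: city[1], reverse=True)
    return " / ".join(name for name, _population in ranked[:max_names])


def share(n_org_addresses, n_cluster_addresses):
    """Percentage of a cluster's addresses that belong to one organisation."""
    return round(100 * n_org_addresses / n_cluster_addresses, 2)


# Placeholder cities with made-up populations, for illustration only.
core = [("Beta", 80_000), ("Alpha", 2_000_000), ("Gamma", 15_000)]
name = cluster_name(core, max_names=2)

# Counts taken from the table: 97 of the 1662 addresses of cluster 42.
pc_dtu = share(97, 1662)
```

Ordering by population keeps the name stable when the underlying address data change, which fits the goal of a name that does not depend on the data.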
16/17
         100 %              75 %
Clust    9298    50,3%      10446   56,5%
Hclut    9187    49,7%      8039    43,5%
Total    18485              18485
17/17