SLIDE 1

The Role of Geographical Context in Building Geodemographic Classifications

23rd GIS Research UK conference, Leeds, April 2015

Alexandros Alexiou, Alex Singleton

Dept. of Geography and Planning, University of Liverpool

SLIDE 2

23rd GISRUK, Leeds, April 2015

SCHOOL OF ENVIRONMENTAL SCIENCES

Summary

  • Introduction to Geodemographic Classifications
  • Research Outline
  • Methodology and Data
  • Case studies
  • Results and Discussion

SLIDE 3

Introduction

A Geodemographic Classification (GC) is a data reduction technique that uses spatial profiling to generate clusters of populations that share similarities across multiple socio-economic and built environment attributes.

Their composition differs based on the intended stakeholders’ perspective as well as the skills, experience and available data of the creator.

Webber (1977): a pragmatic strategy, based on what is deemed to work and what is required, alongside some degree of empirical evaluation.

Conventional classification systems include:

  • Proprietary classifications, primarily designed to describe consumption patterns. Their databases are populated not only with census data but also with data compiled from large consumer databases, such as credit-checking histories, product registrations and private surveys. Examples: MOSAIC (Experian), ACORN (CACI), P2 People and Places (BD), PRIZM (Claritas) and CAMEO (EuroDirect).
  • Public/open classifications: the ONS Output Area Classification (OAC), 2001 and 2011.

Similar products have also been created in academia.

SLIDE 4

Introduction

Geodemographic classifications create a typology that is usually presented as a hierarchy, in which clusters form successive tiers of increasingly aggregated areas.

Clusters are usually given names and described through pen portraits. An example of cluster names from the 2011 OAC is given below.

A top-down approach first creates a few large groups, which are subsequently divided into smaller sub-groups. For example, the 2001 OAC has 7 super-groups, split into 21 groups and further into 52 sub-groups.
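As a rough illustration of the top-down route, the Python sketch below first splits toy 2-D points into super-groups with k-means and then re-runs k-means inside each super-group to form its sub-groups. It is a minimal sketch under stated assumptions, not the OAC procedure: the helpers `kmeans` and `top_down` are hypothetical, k-means here is plain Lloyd iteration with a deterministic farthest-first start, and the labels ("1a", "1b", ...) only mimic the OAC coding style.

```python
import math


def kmeans(points, k, iters=100):
    """Lloyd's k-means with a deterministic farthest-first start."""
    if not points:
        return []
    # Start from the first point, then keep adding the point that is
    # farthest from all centres chosen so far.
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(math.dist(p, c) for c in centres)))
    labels = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centre.
        labels = [min(range(k), key=lambda j: math.dist(p, centres[j])) for p in points]
        # Update step: each centre moves to the mean of its members.
        new_centres = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centres.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centres.append(centres[j])  # an empty cluster keeps its centre
        if new_centres == centres:
            break
        centres = new_centres
    return labels


def top_down(points, k_super, k_sub):
    """Split into k_super super-groups, then split each one into k_sub groups."""
    super_labels = kmeans(points, k_super)
    result = [None] * len(points)
    for s in range(k_super):
        idx = [i for i, lab in enumerate(super_labels) if lab == s]
        sub_labels = kmeans([points[i] for i in idx], min(k_sub, len(idx)))
        for i, sub in zip(idx, sub_labels):
            result[i] = f"{s + 1}{chr(ord('a') + sub)}"  # e.g. "2b"
    return result


pts = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5),
       (20, 20), (20, 21), (21, 20), (25, 25), (25, 26), (26, 25)]
print(top_down(pts, 2, 2))
# ['1a', '1a', '1a', '1b', '1b', '1b', '2a', '2a', '2a', '2b', '2b', '2b']
```

Each sub-group is found only among the members of its parent, so a sub-group can never straddle two super-groups; that nesting is what makes the tiers of a top-down hierarchy consistent.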

A bottom-up approach creates numerous small groups, which are then aggregated into larger groups based on their similarities (typically with hierarchical algorithms using, for example, Ward's clustering criterion).
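The bottom-up route can be sketched with Ward's criterion, which at each step merges the pair of clusters whose union increases the total within-cluster sum of squares the least; for clusters A and B with sizes n_A, n_B and centroids c_A, c_B that increase is n_A·n_B / (n_A + n_B) · ||c_A − c_B||². The naive Python below uses hypothetical helper names and toy points; it rescans every pair at each step, so it only shows the mechanics and is not meant for 170,000 Output Areas.

```python
import math
from itertools import combinations


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge.

    Each cluster is given as (size, centroid).
    """
    (na, ca), (nb, cb) = a, b
    return na * nb / (na + nb) * math.dist(ca, cb) ** 2


def ward_agglomerate(points, n_clusters):
    """Start from singletons and repeatedly merge the cheapest pair (Ward)."""
    # Each cluster is (size, centroid, member indices).
    clusters = [(1, tuple(map(float, p)), [i]) for i, p in enumerate(points)]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: ward_cost(clusters[ij[0]][:2], clusters[ij[1]][:2]),
        )
        (na, ca, ma), (nb, cb, mb) = clusters[i], clusters[j]
        n = na + nb
        # The merged centroid is the size-weighted mean of the two centroids.
        centroid = tuple((na * x + nb * y) / n for x, y in zip(ca, cb))
        del clusters[j]  # j > i, so index i is still valid afterwards
        clusters[i] = (n, centroid, sorted(ma + mb))
    return sorted(members for _, _, members in clusters)


pts = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
print(ward_agglomerate(pts, 3))  # [[0, 1], [2, 3], [4]]
print(ward_agglomerate(pts, 2))  # [[0, 1, 2, 3], [4]]
```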

Common clustering techniques used as classifiers:

K-means clustering

Self-Organizing Maps (SOM)

Fuzzy logic algorithms or “soft” classifiers

2011 OAC example (supergroups 1–8, with supergroup 5 expanded into its groups and sub-groups):

  • 1 – Rural residents
  • 2 – Cosmopolitans
  • 3 – Ethnicity central
  • 4 – Multicultural metropolitans
  • 5 – Urbanites
      – 5a – Urban professionals and families: 5a1 – White professionals; 5a2 – Multi-ethnic professionals with families; 5a3 – Families in terraces and flats
      – 5b – Ageing urban living: 5b1 – Delayed retirement; 5b2 – Communal retirement; 5b3 – Self-sufficient retirement
  • 6 – Suburbanites
  • 7 – Constrained city dwellers
  • 8 – Hard-pressed living

SLIDE 5

Research Outline

Main research question:

Can conventional national classifications be applied locally with satisfactory results?

If so, to what extent? What is the degree of differentiation?

How can this differentiation be measured effectively?

Rationale:

Conventional national classifications may not account for local socio-spatial patterns, increasing the risk of mistargeting when applied locally.

National aggregations sweep away contextual differences between proximal zones.

Researchers without the necessary expertise may find it difficult to produce specific-purpose GCs ad hoc, whereas general-purpose classifications are more convenient to use.

This debate is long-standing, originating in the earliest UK classifications (see Openshaw, Cullingford and Gillard, 1980; Webber, 1980).

SLIDE 6

Methodology and Data

This research uses a fixed set of input attributes for the Output Area (OA) zonal geography to build classifications in different geographic contexts.

For this purpose, a number of geographic contexts are considered (local, regional, national) to demonstrate the impact on the final classification outcome when the input variables are kept constant.

To demonstrate how much the output classifications differ, we analyse the sets of classifications for Liverpool, Manchester and Leeds.

Creation:

An initial set of 60+ 2011 Census variables covering demographic, housing and economic activity attributes.

Output Area aggregation level for England (>170,000 neighbourhoods).

K-means clustering (Hartigan & Wong, 1979), a single tier of the hierarchy (Supergroup level).

Analysis carried out using the R software.
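R's `kmeans()` defaults to the Hartigan-Wong algorithm cited above. As a language-neutral illustration only, the Python sketch below swaps in the simpler Lloyd iteration and runs it from several random starts, keeping the run with the lowest total within-cluster sum of squares, which is similar in spirit to the `nstart` argument of R's `kmeans()`. The helper names, seed and toy points are assumptions, not the authors' code.

```python
import math
import random


def lloyd(points, k, rng, iters=100):
    """One Lloyd's k-means run started from k randomly chosen input points."""
    centres = rng.sample(points, k)
    labels = []
    for _ in range(iters):
        # Assignment step: nearest centre by Euclidean distance.
        labels = [min(range(k), key=lambda j: math.dist(p, centres[j])) for p in points]
        # Update step: centres move to the mean of their members.
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new.append(centres[j])  # keep the old centre of an empty cluster
        if new == centres:
            break
        centres = new
    # Total within-cluster sum of squares of this run.
    wss = sum(math.dist(p, centres[lab]) ** 2 for p, lab in zip(points, labels))
    return wss, labels


def kmeans(points, k, nstart=10, seed=42):
    """Keep the best of `nstart` random starts (lowest within-cluster SS)."""
    rng = random.Random(seed)
    return min((lloyd(points, k, rng) for _ in range(nstart)), key=lambda run: run[0])


pts = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5),
       (20, 20), (20, 21), (21, 20), (25, 25), (25, 26), (26, 25)]
wss, labels = kmeans(pts, 2)
print(round(wss, 2), labels)
```

Cluster numbers come out in whatever order the random starts dictate, which is why the IDs of two classifications cannot be compared directly.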

SLIDE 7

Methodology and Data

K-Means Input Dataset

  • Variable formatting: obtaining ratios per areal unit.
      – Percentages:
        $$\tilde{x}_{a,i} = 100 \cdot \frac{x_{a,i}}{P_a}$$
        where $x_{a,i}$ is the value of attribute $i$ in area $a$ and $P_a$ is the reference population (denominator) of area $a$, i.e. total population, number of households, etc.
      – Standardised by group:
        $$\tilde{x}_{a,i} = \frac{x_{a,i}}{\sum_{g} r_{N,g} \, P_{a,g}}$$
        where $x_{a,i}$ is the value of attribute $i$ in area $a$, $r_{N,g}$ is the observed national ratio for group $g$ and $P_{a,g}$ is the population of group $g$ in area $a$.
  • "Unfit" data: variable distribution and correlation checks.
  • Normalisation using the Box-Cox transformation:
        $$x_{a,i}^{(\lambda)} = \begin{cases} \dfrac{x_{a,i}^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\ \ln x_{a,i}, & \lambda = 0 \end{cases}$$
        The power $\lambda$ that achieves the best normalisation can be estimated algorithmically.
  • Standardisation (for all three geographic scales separately), using z-score scaling:
        $$z_{a,i} = \frac{x_{a,i} - \mu_S}{\sigma_S}$$
        where $x_{a,i}$ is the value of attribute $i$ in area $a$, and $\mu_S$ and $\sigma_S$ are the mean and standard deviation of the set of observations $S$.
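The preparation steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' R code: the helper names are hypothetical, λ is estimated by a simple grid search over the Box-Cox profile log-likelihood (standing in for a library estimator such as `scipy.stats.boxcox`), Box-Cox inputs must be strictly positive, and all counts are invented. The last lines show why standardisation is done separately per geographic scale: the same areas receive different z-scores depending on which set S supplies μ_S and σ_S.

```python
import math
import statistics


def percentages(counts, denominators):
    """Ratio per areal unit: 100 * x_{a,i} / P_a."""
    return [100 * x / p for x, p in zip(counts, denominators)]


def boxcox(values, lam):
    """Box-Cox transform; every value must be strictly positive."""
    if abs(lam) < 1e-12:
        return [math.log(x) for x in values]
    return [(x ** lam - 1) / lam for x in values]


def boxcox_loglik(values, lam):
    """Profile log-likelihood of lambda, up to an additive constant."""
    y = boxcox(values, lam)
    var = statistics.pvariance(y)  # maximum-likelihood (divide-by-n) variance
    return -len(y) / 2 * math.log(var) + (lam - 1) * sum(math.log(x) for x in values)


def best_lambda(values, lo=-2.0, hi=2.0, step=0.01):
    """Grid search for the lambda with the highest profile log-likelihood."""
    n_steps = int(round((hi - lo) / step))
    grid = [round(lo + i * step, 10) for i in range(n_steps + 1)]
    return max(grid, key=lambda lam: boxcox_loglik(values, lam))


def zscore(values, reference):
    """Scale values by the mean and SD of a reference set S."""
    mu = statistics.mean(reference)
    sigma = statistics.stdev(reference)  # sample SD (n - 1), as in R's scale()
    return [(x - mu) / sigma for x in values]


# Hypothetical counts and denominators for five local OAs.
local = percentages([12, 30, 8, 45, 20], [100, 120, 80, 150, 90])
lam = best_lambda(local)
local_t = boxcox(local, lam)
# A hypothetical wider (e.g. national) set containing the local OAs and others;
# lambda is reused here for simplicity.
national_t = boxcox(local + [1.5, 3.0, 60.0, 75.0], lam)
# The same five OAs get different z-scores depending on the reference set S.
print(zscore(local_t, local_t))
print(zscore(local_t, national_t))
```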

SLIDE 8

Methodology and Data

Final Dataset with Variable Definition: 2011 Census (ONS)

| Domain | ID | Variable | Definition |
|---|---|---|---|
| Demographic | V1 | Age0_4 | Percentage of resident population aged 0–4 years |
| | V2 | Age5_14 | Percentage of resident population aged 5–14 years |
| | V3 | Age15_24 | Percentage of resident population aged 15–24 years |
| | V4 | Age45_64 | Percentage of resident population aged 45–64 years |
| | V5 | Age65_ | Percentage of resident population aged 65 or more years |
| | V6 | Eth_Arab | Percentage of people identifying as Arab |
| | V7 | Eth_Black | Percentage of people identifying as black African, black Caribbean or other black |
| | V8 | Eth_Asian | Percentage of people identifying as Indian, Pakistani, Bangladeshi, Chinese or other Asian |
| | V9 | Mar_Single | Percentage of population over 16 years who are single |
| Housing | V10 | Density | Number of people per hectare |
| | V11 | Ten_Rent | Percentage of households that are private sector rented accommodation |
| | V12 | Ten_Social | Percentage of households that are public sector rented accommodation |
| | V13 | House_Share | Percentage of households that are shared accommodation |
| | V14 | House_Flat | Percentage of households which are flats |
| | V15 | CeH_No | Percentage of occupied household spaces without central heating |
| Economic activity | V16 | EA_Part | Percentage of household representatives who are working part-time |
| | V17 | EA_Unemp | Percentage of household representatives who are unemployed |
| | V18 | EA_Stud | Percentage of household representatives who are students |
| | V19 | Edu_Low | Percentage of people over 16 years with some qualifications but not a HE qualification |
| | V20 | Edu_HE | Percentage of people over 16 years for whom the highest level of qualification is level 4 or above |
| | V21 | NS_Manager | Percentage of household reference persons in higher managerial, administrative and professional occupations |
| | V22 | NS_Semi | Percentage of household reference persons in intermediate occupations |
| | V23 | Ind_Agr | Percentage of population aged 16–74 who work in the A, B and C industry sectors |
| | V24 | Ind_Man | Percentage of population aged 16–74 who work in the D, E and F industry sectors |
| | V25 | Ind_Sales | Percentage of population aged 16–74 who work in the G, H and I industry sectors |
| | V26 | Ind_Tech | Percentage of population aged 16–74 who work in the K, L and M industry sectors |
| | V27 | Ind_Adm | Percentage of population aged 16–74 who work in the N, O, P, Q, T and U industry sectors |
| | V28 | Ind_Art | Percentage of population aged 16–74 who work in the R and S industry sectors |
| Travel behaviour | V29 | Car_0 | Percentage of households with no car |
| | V30 | Car_1 | Percentage of households with 1 car |
| | V31 | Car_3 | Percentage of households with 3 or more cars |
| | V32 | Tr_Public | Percentage of population aged 16–74 who travel to work by public transport |
| | V33 | Tr_Foot | Percentage of population aged 16–74 who travel to work on foot or by bicycle |

SLIDE 9

Methodology and Data

Currently there is no best practice for comparing two different sets of classifications in order to find "best fits" between clusters (cluster IDs are assigned randomly).

Even if they derive from the same set of observations S, a classification built on a set of local observations L will produce cluster assignments dissimilar to those of a national classification derived from S.

Two sources of cluster assignment variance:

Standardisation (for different geographical contexts, the mean μ and standard deviation σ change)

Clustering process

We explore and illustrate the variation with a number of methods:

1. Plotting the cluster mean centres (attribute means), so that we can assess the nature of each cluster (pen portraits).

2. Contingency tables: cross-tabulating the cluster distribution frequencies.

3. Mapping our results.
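The text above notes that there is no established way to find best fits between randomly numbered clusters, and it does not say how its matches were chosen. One possible approach, assumed here, is to cross-tabulate the two sets of labels and pick the one-to-one matching of cluster IDs that maximises the number of shared areas; each cluster's similarity is then its shared areas divided by its size. The Python sketch below uses hypothetical function names and toy labels, and its brute-force search over permutations is only practical for small k such as 7 (5,040 candidate matchings).

```python
from collections import Counter
from itertools import permutations


def crosstab(labels_a, labels_b):
    """Contingency table: number of areas in each (cluster_a, cluster_b) pair."""
    return Counter(zip(labels_a, labels_b))


def best_fit(labels_a, labels_b):
    """One-to-one matching of cluster IDs that maximises shared areas.

    Brute force over all matchings; assumes both sides have the same
    number of clusters.
    """
    table = crosstab(labels_a, labels_b)
    ids_a, ids_b = sorted(set(labels_a)), sorted(set(labels_b))
    best = max(
        permutations(ids_b),
        key=lambda perm: sum(table[(a, b)] for a, b in zip(ids_a, perm)),
    )
    return dict(zip(ids_a, best)), table


def cluster_similarity(labels_a, labels_b):
    """Share of each cluster's areas found in its matched cluster, plus overall."""
    match, table = best_fit(labels_a, labels_b)
    sizes = Counter(labels_a)
    per_cluster = {a: table[(a, b)] / sizes[a] for a, b in match.items()}
    overall = sum(table[(a, b)] for a, b in match.items()) / len(labels_a)
    return match, per_cluster, overall


# Hypothetical labels for eight areas under two classifications.
local = ["A", "A", "A", "B", "B", "C", "C", "C"]
other = [3, 3, 1, 2, 2, 1, 1, 1]
match, per_cluster, overall = cluster_similarity(local, other)
print(match)    # {'A': 3, 'B': 2, 'C': 1}
print(overall)  # 0.875
```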

SLIDE 10

Case Studies

We compare three sets of classifications, one set for each case study, all built using the same data set:

| Geographic area | Local classification | Regional classification | National classification |
|---|---|---|---|
| Liverpool | Liverpool Local Authority | North West | England |
| Manchester | Greater Manchester Area | North West | England |
| Leeds | Leeds Local Authority | Yorkshire and the Humber | England |

We compare the outcomes of the k-means algorithm for 7 clusters using:

  1. Radial plots to assess "attribute fit".
  2. Cross-tabulation to assess "geographic fit".

SLIDE 11

Case Studies

Constructing Pen Portraits

SLIDE 12

Case Studies - Liverpool

Liverpool vs. North West (NW) classification:

| Liverpool cluster | OAs | NW cluster | Shared OAs | Cluster similarity |
|---|---|---|---|---|
| Urban Professionals | 332 | 3 | 203 | 61% |
| Retired Communities | 185 | 2 | 0 | 0% |
| Student Living | 81 | 5 | 81 | 100% |
| Striving Ethnic Workers | 171 | 7 | 134 | 78% |
| Suburban Living | 306 | 4 | 52 | 17% |
| Hard-Pressed Families | 381 | 6 | 352 | 92% |
| Young Cosmopolitans | 128 | 1 | 36 | 28% |
| Sum / Mean | 1584 | | 858 | 54.2% |

Liverpool vs. national classification:

| Liverpool cluster | OAs | National cluster | Shared OAs | Cluster similarity |
|---|---|---|---|---|
| Urban Professionals | 332 | 3 | 214 | 64% |
| Retired Communities | 185 | 2 | 9 | 5% |
| Student Living | 81 | 5 | 81 | 100% |
| Striving Ethnic Workers | 171 | 7 | 126 | 74% |
| Suburban Living | 306 | 4 | 103 | 34% |
| Hard-Pressed Families | 381 | 6 | 381 | 100% |
| Young Cosmopolitans | 128 | 1 | 36 | 28% |
| Sum / Mean | 1584 | | 950 | 60.0% |

Cross-Tabulation vs. Radial Plots
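As an arithmetic check on the Liverpool tables above, the Python below recomputes their summary rows from the per-cluster counts (variable names are illustrative; the shared count for Retired Communities against the NW classification is taken as 0, consistent with its 0% similarity and the column total of 858). It shows that the "Sum / Mean" similarity equals total shared OAs divided by total OAs, a size-weighted figure, which differs slightly from the unweighted mean of the seven cluster percentages.

```python
# Liverpool cluster sizes and shared OAs, in table order.
sizes = [332, 185, 81, 171, 306, 381, 128]
shared_nw = [203, 0, 81, 134, 52, 352, 36]
shared_nat = [214, 9, 81, 126, 103, 381, 36]


def pooled(shared, sizes):
    """Total shared OAs divided by total OAs (size-weighted)."""
    return sum(shared) / sum(sizes)


def unweighted_mean(shared, sizes):
    """Plain average of the per-cluster similarity ratios."""
    return sum(s / n for s, n in zip(shared, sizes)) / len(sizes)


print(f"NW pooled:        {pooled(shared_nw, sizes):.1%}")            # 54.2%
print(f"National pooled:  {pooled(shared_nat, sizes):.1%}")           # 60.0%
print(f"NW unweighted:    {unweighted_mean(shared_nw, sizes):.1%}")   # 53.9%
print(f"Nat. unweighted:  {unweighted_mean(shared_nat, sizes):.1%}")  # 57.8%
```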

SLIDE 13

SLIDE 14

Case Studies - Liverpool

Liverpool vs. North West (NW) classification:

| Liverpool cluster | OAs | NW cluster | Shared OAs | Cluster similarity |
|---|---|---|---|---|
| Urban Professionals | 332 | 7 | 203 | 61% |
| Retired Communities | 185 | 3 | 0 | 0% |
| Student Living | 81 | 1 | 81 | 100% |
| Striving Ethnic Workers | 171 | 5 | 134 | 78% |
| Suburban Living | 306 | 6 | 52 | 17% |
| Hard-Pressed Families | 381 | 4 | 352 | 92% |
| Young Cosmopolitans | 128 | 2 | 36 | 28% |
| Sum / Mean | 1584 | | 858 | 54.2% |

Liverpool vs. national classification:

| Liverpool cluster | OAs | National cluster | Shared OAs | Cluster similarity |
|---|---|---|---|---|
| Urban Professionals | 332 | 3 | 214 | 64% |
| Retired Communities | 185 | 2 | 9 | 5% |
| Student Living | 81 | 5 | 81 | 100% |
| Striving Ethnic Workers | 171 | 7 | 126 | 74% |
| Suburban Living | 306 | 4 | 103 | 34% |
| Hard-Pressed Families | 381 | 6 | 381 | 100% |
| Young Cosmopolitans | 128 | 1 | 36 | 28% |
| Sum / Mean | 1584 | | 950 | 60.0% |

Cross-Tabulation vs. Radial Plots

SLIDE 15

SLIDE 16

SLIDE 17

SLIDE 18

Case Studies – G. Manchester

Figure (radial plot): Asian Communities, cluster attribute means for variables Age0_4 to Ind_Art

G. Manchester vs. North West (NW) classification:

| G. Manchester cluster | OAs | NW cluster | Shared OAs | Cluster similarity |
|---|---|---|---|---|
| Urban Professionals | 2255 | G | 1 | 0.0% |
| Asian Communities | 546 | Retired Communities | 1 | 0.2% |
| Student Living | 360 | A | 359 | 99.7% |
| Striving Ethnic Workers | 864 | E | 724 | 83.8% |
| Suburban Living | 2202 | F | 945 | 42.9% |
| Hard-Pressed Families | 1638 | D | 1389 | 84.8% |
| Young Cosmopolitans | 819 | B | 764 | 93.3% |
| Sum / Mean | 8684 | | 4183 | 48.2% |

G. Manchester vs. national classification:

| G. Manchester cluster | OAs | National cluster | Shared OAs | Cluster similarity |
|---|---|---|---|---|
| Urban Professionals | 2255 | B | 1398 | 62.0% |
| Asian Communities | 546 | Retired Communities | 0 | 0.0% |
| Student Living | 360 | G | 287 | 79.7% |
| Striving Ethnic Workers | 864 | F | 547 | 63.3% |
| Suburban Living | 2202 | E | 1189 | 54.0% |
| Hard-Pressed Families | 1638 | A | 1614 | 98.5% |
| Young Cosmopolitans | 819 | D | 293 | 35.8% |
| Sum / Mean | 8684 | | 5328 | 61.4% |

Cross-Tabulation vs. Radial Plots

SLIDE 19

SLIDE 20

SLIDE 21

SLIDE 22

Case Studies - Leeds

Leeds vs. Yorkshire and the Humber (YH) classification:

| Leeds cluster | OAs | YH cluster | Shared OAs | Cluster similarity |
|---|---|---|---|---|
| Urban Professionals | 682 | C | 461 | 67.6% |
| Young & Single “Techies” | 112 | Retired Communities | 0 | 0.0% |
| Student Living | 116 | G | 116 | 100.0% |
| Striving Ethnic Workers | 373 | D | 352 | 94.4% |
| Suburban Living | 340 | E | 300 | 88.2% |
| Hard-Pressed Families | 569 | A | 301 | 52.9% |
| Young Cosmopolitans | 351 | B | 340 | 96.9% |
| Sum / Mean | 2543 | | 1870 | 73.5% |

Leeds vs. national classification:

| Leeds cluster | OAs | National cluster | Shared OAs | Cluster similarity |
|---|---|---|---|---|
| Urban Professionals | 682 | G | 342 | 50.1% |
| Young & Single “Techies” | 112 | Retired Communities | 0 | 0.0% |
| Student Living | 116 | D | 115 | 99.1% |
| Striving Ethnic Workers | 373 | F | 253 | 67.8% |
| Suburban Living | 340 | B | 298 | 87.6% |
| Hard-Pressed Families | 569 | E | 470 | 82.6% |
| Young Cosmopolitans | 351 | A | 121 | 34.5% |
| Sum / Mean | 2543 | | 1599 | 62.9% |

Cross-Tabulation vs. Radial Plots

SLIDE 23

SLIDE 24

SLIDE 25

SLIDE 26

Results and Discussion

Figure (radial plot): Cluster Comparison - Hard-Pressed Households, cluster attribute means for Liverpool, Manchester and Leeds (variables Age0_4 to Ind_Art)

The geographic sensitivity of geodemographic classifications is very difficult to assess, given the complexity of the problem. Some remarks:

The notions of attribute fit and geographic fit are central to comparisons.

Attribute means do provide a basis for correlating cluster pairs; however, they do not account for the magnitude of deviation of the OA attribute values from the mean.

Across geographic scales, the clusters formed can be completely different in nature, making comparisons inconclusive.

Policy implications:

Comparisons between classifications: small differences in attributes can reflect the central tendencies of the local populations.

However, the actual socio-spatial patterns can be very different.

When assessing spatial policies, the upper levels of national classifications (i.e. the Supergroup level) may not be suitable, as they can produce misleading results.

SLIDE 27

Results and Discussion

Methodological Implications:

Standardising attributes directly affects cluster formation. Clusters at national scales appear more homogeneous due to reduced absolute distances.

As an example, for k = 7, the total variation lost (smoothed out) is approximately 9%.

A key research question is whether there are specific geographical contexts that maximise clustering efficiency with respect to local variation, and how unique clusters can be handled.

Administrative boundaries do not necessarily reflect the actual organisation of communities.

For instance, geographic boundaries could be calculated in non-Euclidean space.

SLIDE 28

Results and Discussion

  • Future research and preliminary results (benchmark geographic boundaries).
  • We use the angular similarity measure to compare cluster attribute means.
  • Benchmark results: LA (Local Authority) classification vs. national classification, with attributes standardised per LA.

The aim is to produce geographic boundaries that maximise local efficiency, rather than relying on arbitrary administrative boundaries.

Such boundaries could be used in any research on population dynamics (e.g. retail analysis) and could easily be made publicly available.
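The slide does not show the angular similarity formula. A common definition, assumed here, is one minus the angle between two vectors divided by π; this variant suits cluster means of z-scored attributes because their components can be negative, and it ranges from 0 (opposite directions) to 1 (same direction). A minimal Python sketch with a hypothetical helper name and invented cluster means:

```python
import math


def angular_similarity(a, b):
    """1 - angle(a, b) / pi: 1 for the same direction, 0 for opposite ones.

    Both vectors must be non-zero.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    cos = max(-1.0, min(1.0, dot / norm))  # clamp rounding error before acos
    return 1 - math.acos(cos) / math.pi


# Hypothetical z-scored attribute means of two clusters being compared.
local_mean = [0.8, 1.2, -0.3, 1.5]
national_mean = [0.6, 1.0, -0.1, 1.1]
print(round(angular_similarity(local_mean, national_mean), 3))
```

Because the measure depends only on direction, two clusters with the same attribute profile but different magnitudes of deviation from the mean score as identical; that is a design choice worth keeping in mind when reading the results.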

SLIDE 29

Thank you for your time

a.alexiou@liv.ac.uk https://speakerdeck.com/dblalex

Acknowledgements: This work is funded as part of an ESRC PhD studentship and in collaboration with the Office for National Statistics

North West Doctoral Training Centre