www.statcan.gc.ca

Postal Code Conversion for Data Analysis

An overview of the PCCF and PCCF+

Saeeda Khan, Michael Tjepkema
Health Analysis Division, Statistics Canada
December 1, 2015

Outline

  • 1. Postal codes
  • Components of a postal code
  • Uses of small-area data
  • 2. Introduction to the Postal Code Conversion File (PCCF) and the Postal Code Conversion File Plus (PCCF+)
  • 3. Single link indicator geocoding versus population weighting

  • 4. Why PCCF+?
  • 5. Limitations of PCCF & PCCF+

11/26/2015 Statistics Canada • Statistique Canada 2

  • 1. Postal Codes

What are postal codes?

  • An identifier managed by Canada Post Corporation for the efficient sorting and delivery of mail.
  • They are not created as units for the analysis or mapping of population, business or dwelling characteristics.
  • However, postal codes are part of most administrative data sets and are usually the only variable available for geographic identification.
  • Thus, they are important identifiers for geocoding.

Components of a postal code

  • The postal code is a six-character alphanumeric code
  • Postal codes are not geographic attributes
  • Only spatial in that mail is delivered by geographic area
  • Six-character code ‘ANA NAN’ (A = letter, N = digit)
  • First 3 – Forward Sortation Area (FSA)
  • Last 3 – Local Delivery Unit (LDU)

Statistics Canada. Postal Codes Conversion File (PCCF), Reference Guide. Catalogue no. 92-153-G, no 02. Ottawa, ON: Statistics Canada, 2011.
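
The ‘ANA NAN’ structure can be checked and split in a few lines of code. The following is a minimal Python sketch, not part of the PCCF or PCCF+; the function name `split_postal_code` is hypothetical, and the pattern only checks the letter-digit alternation, not whether Canada Post ever issued the code.

```python
import re

# 'ANA NAN': letter, digit, letter, optional space, digit, letter, digit.
# Assumption: only the alternation is checked, not the subset of letters
# that Canada Post actually uses.
_PATTERN = re.compile(r"^([A-Z]\d[A-Z]) ?(\d[A-Z]\d)$")


def split_postal_code(raw):
    """Return (FSA, LDU) for a well-formed postal code, else None."""
    match = _PATTERN.match(raw.strip().upper())
    if match is None:
        return None
    return match.group(1), match.group(2)


print(split_postal_code("k1a 0t6"))  # ('K1A', '0T6')
print(split_postal_code("T2N2T9"))   # ('T2N', '2T9')
print(split_postal_code("12345"))    # None
```

Normalising case and spacing before matching is a design choice: administrative files often store the same code with and without the space.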

What is a postal code?

ANA NAN

Province / Territory / Region        First Character
Newfoundland and Labrador            A
Nova Scotia                          B
Prince Edward Island                 C
New Brunswick                        E
Eastern Québec                       G
Metropolitan Montréal                H
Western Québec                       J
Eastern Ontario                      K
Central Ontario                      L
Metropolitan Toronto                 M
Southwestern Ontario                 N
Northern Ontario                     P
Manitoba                             R
Saskatchewan                         S
Alberta                              T
British Columbia                     V
Northwest Territories and Nunavut    X
Yukon                                Y

  • First three characters (ANA): Forward Sortation Area
  • Last three characters (NAN): Local Delivery Unit
  • Second character (the digit in the FSA): if 0 then rural, if 1-9 then urban
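
The first-character table above and the rural/urban rule can be expressed as a simple lookup. This is an illustrative Python sketch only; the name `describe_fsa` is hypothetical, and it assumes the 0 versus 1-9 rule applies to the second character of the FSA.

```python
# First-character lookup, transcribed from the table above.
FIRST_CHARACTER_REGION = {
    "A": "Newfoundland and Labrador",
    "B": "Nova Scotia",
    "C": "Prince Edward Island",
    "E": "New Brunswick",
    "G": "Eastern Québec",
    "H": "Metropolitan Montréal",
    "J": "Western Québec",
    "K": "Eastern Ontario",
    "L": "Central Ontario",
    "M": "Metropolitan Toronto",
    "N": "Southwestern Ontario",
    "P": "Northern Ontario",
    "R": "Manitoba",
    "S": "Saskatchewan",
    "T": "Alberta",
    "V": "British Columbia",
    "X": "Northwest Territories and Nunavut",
    "Y": "Yukon",
}


def describe_fsa(fsa):
    """Return (region, 'rural' or 'urban') for a three-character FSA."""
    region = FIRST_CHARACTER_REGION.get(fsa[0].upper(), "unknown")
    # Assumption: the digit being tested is the second character of the FSA.
    delivery = "rural" if fsa[1] == "0" else "urban"
    return region, delivery


print(describe_fsa("K1A"))  # ('Eastern Ontario', 'urban')
print(describe_fsa("T0L"))  # ('Alberta', 'rural')
```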

Components of a postal code

  • Local Delivery Unit (LDU)
  • Letter carrier delivery to ordinary urban address
  • Community mailbox
  • Apartment building
  • Business building
  • Large firm or organisation (Foothills Medical Centre: T2N 2T9; CBC: M5W 1E6)
  • Federal department or agency (Statistics Canada: K1A 0T6)
  • Mail delivery route (suburban, rural, or mobile)
  • General delivery and post office boxes (large or small)

Statistics Canada. Postal Codes Conversion File (PCCF), Reference Guide. Catalogue no. 92-153-G, no 02. Ottawa, ON: Statistics Canada, 2011.

Components of a postal code

Haydu G. The Postal Code – Geographic classification code conversion file, a tool for social science research. Paper presented at the 1979 annual meeting of the Canadian Association of Geographers, Victoria, BC, Canada.

How can postal codes be used for analysis?

  • Postal codes are part of most administrative data sets
  • PCCF, PCCF+, and related tools are now the standard
  • They allow for the conversion of address and postal code attributes to standard geographical codes
  • These codes are used in data collection, processing, and analysis, e.g., dissemination area (DA), census tract (CT), health region (HR)
  • The resulting small-area geography has a variety of uses
  • Familiarity with the methods, strengths, and limitations will help researchers exploit the potential

Uses of small-area data

  • Add policy relevance by aggregating to admin areas
  • Health Regions, School Districts, etc.
  • Deal with changes over time (boundary shifts)
  • Assign neighbourhood socio-economic status (SES) and other confounders

  • Determine point-distance, road distance, travel time
  • Allow for studies of migration over time (longitudinal)
  • Help in the imputation of missing data
  • Obtain additional identifiers for record linkage

  • 2. Introduction to the PCCF and PCCF+

What is the PCCF?

  • A flat file that links postal codes (active and retired) to standard geographic areas
  • Allows for:
  • Association of postal codes to standard geographic areas
  • Selection of statistical units by geographic areas
  • Provides linkages (including a single link indicator (SLI)) to block face (BF), dissemination block (DB), and dissemination area (DA)
  • However, some postal codes are only linked to post office locations, many serve multiple DAs, and some are non-residential (government offices, etc.)

Statistics Canada. Postal Codes Conversion File (PCCF), Reference Guide. Catalogue no. 92-153-G, no 02. Ottawa, ON: Statistics Canada, 2011.

What is the PCCF+?

  • The PCCF+ consists of:

1. SAS control program
2. Reference files primarily derived from the PCCF
3. Postal code population-weight file derived from the Census of Population

  • Assigns geographic identifiers based on postal codes
  • Full diagnostic output (troublesome postal codes, precision of geocoding, etc.)
  • Provides residential & institutional coding separately

Wilkins R, Peters PA. PCCF+ Version 5K User’s Guide: Automated geocoding based on the Statistics Canada Postal Code Conversion File. Catalogue no. 82F0086-XDB. Ottawa, ON: Statistics Canada, 2011.

Importance of Identifying Non-residential PCs

  • PCCF+ is able to identify non-residential postal codes
  • Government offices, e.g., Statistics Canada
  • Coroners’ offices
  • Children’s Aid Societies
  • Hospitals in a Birth File
  • Tax preparer’s office in a Tax File
  • UPS Store, Mail Boxes Etc.

How does the PCCF+ geocode postal codes?

  • Assigns geographic identifiers based on postal codes in a staged approach:
  • 1. assigns 6-character postal codes in rural areas to dissemination areas (DA) and dissemination blocks (DB) using population-weighted random allocation
  • 2. assigns 6-character postal codes with an exact match to a PCCF unique record
  • 3. randomly assigns 6-character postal codes with an exact match to a PCCF duplicate record
  • 4. imputes full geography from the first 5, first 4 and first 3 characters of the postal code using census population weights
  • 5. imputes partial geography from the first 2 characters of the postal code

Wilkins R, Peters PA. PCCF+ Version 5K User’s Guide: Automated geocoding based on the Statistics Canada Postal Code Conversion File. Catalogue no. 82F0086-XDB. Ottawa, ON: Statistics Canada, 2011.
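
The general idea of the staged approach (try the full code first, then fall back to shorter prefixes) can be sketched as follows. This is a simplified Python illustration, not the PCCF+ SAS program: the reference data, area labels, and the function `geocode` are hypothetical, and stages 1 to 3 are collapsed into a single weighted draw on the full code.

```python
import random

# Hypothetical toy reference data: each key is a full postal code or a
# prefix; each value lists (area code, census population) pairs.
REFERENCE = {
    "A1A1A1": [("DA-1", 600), ("DA-2", 300), ("DA-3", 100)],
    "A1A1": [("DA-1", 900), ("DA-4", 100)],
    "A1A": [("DA-1", 1500), ("DA-5", 500)],
    "A1": [("CD-01", 5000)],  # partial geography only
}


def geocode(postal_code, rng):
    """Try the full code first, then successively shorter prefixes."""
    cleaned = postal_code.replace(" ", "").upper()
    for length in (6, 5, 4, 3, 2):
        candidates = REFERENCE.get(cleaned[:length])
        if len(cleaned) >= length and candidates:
            areas = [area for area, _ in candidates]
            weights = [population for _, population in candidates]
            chosen = rng.choices(areas, weights=weights, k=1)[0]
            return chosen, length  # length records how precise the match was
    return None, 0


rng = random.Random(2015)  # fixed seed so the run is repeatable
for code in ("A1A 1A1", "A1A 1B", "A1A", "A1Z 9Z9", "Z9Z 9Z9"):
    print(code, geocode(code, rng))
```

Returning the matched length alongside the area mirrors the idea of reporting the precision of each assignment, so that records coded from a short prefix can be flagged or excluded later.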

Uses of the PCCF and the PCCF+

  • A 2011 literature review of publications using the PCCF and PCCF+ identified 622 publications
  • Health Sciences: 463 (74%)
  • Social Sciences & Economics: 93 (15%)
  • Education, data, & statistics: 34 (6%)
  • Natural & applied sciences: 12 (2%)
  • Other: 20 (3%)
  • Articles appeared in 233 different journals, top two:
  • Canadian Medical Association Journal (23)
  • Canadian Journal of Public Health (19)

Peller P. An analysis of the Postal Code Conversion File’s use in research. DLI research paper series, 2011. Calgary, AB: University of Calgary.

  • 3. PCCF-SLI vs. PCCF+

Single-link (PCCF-SLI) vs. PCCF+

  • PCCF-SLI forces each postal code to be assigned to a single dissemination area (DA) & dissemination block (DB), regardless of how large the actual service area may be
  • For most research purposes, the distribution of the population across the entire service area is needed
  • PCCF+ uses a population-weighted method of geocoding where multiple matches are possible
  • As such, the distribution of respondents more accurately reflects the underlying population
  • “Numerator-denominator consistency”

Example: postal code A1A 1A1 serves three dissemination areas, with shares DA 1 60%, DA 2 30%, DA 3 10%.

  • PCCF (SLI): Of 10 records reporting this postal code, all 10 will be assigned to DA 1 using the PCCF single link indicator (SLI)
  • PCCF+: Of 10 records reporting this postal code, 6 will be assigned to DA 1, 3 to DA 2 and 1 to DA 3 using the PCCF+
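
The contrast in the A1A 1A1 example can be reproduced with a short simulation. This Python sketch is illustrative only: the function names are hypothetical, and it assumes the single-link DA is the one with the largest share, which matches the example but is not stated as a general rule here.

```python
import random
from collections import Counter

# Shares from the A1A 1A1 example.
DA_SHARES = {"DA 1": 0.6, "DA 2": 0.3, "DA 3": 0.1}


def assign_single_link(n_records):
    """Single-link style: every record goes to the one designated DA.

    Assumption: the designated DA is the one with the largest share.
    """
    best = max(DA_SHARES, key=DA_SHARES.get)
    return Counter({best: n_records})


def assign_weighted(n_records, seed):
    """Population-weighted style: each record is drawn at random by share."""
    rng = random.Random(seed)
    draws = rng.choices(
        list(DA_SHARES), weights=list(DA_SHARES.values()), k=n_records
    )
    return Counter(draws)


print(assign_single_link(10))
print(assign_weighted(10, seed=1))       # near 6/3/1 on average, but random
print(assign_weighted(100_000, seed=1))  # proportions approach 60/30/10
```

With only 10 records the weighted counts vary from run to run; the 6/3/1 split is what the weights deliver on average, which is why the match to the underlying population improves as files get larger.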

Population assignment using PCCF-SLI
[Figure not reproduced; labels: Saskatchewan, Manitoba, Alberta]

Population assignment using PCCF+
[Figure not reproduced; labels: Saskatchewan, Manitoba, Alberta]

Population non-assignment via PCCF-SLI & PCCF+

Percent of total 2006 census population in areas with no respondent assignment

Geographic Unit   PCCF-SLI: # of Units   PCCF-SLI: Percent of Population   PCCF+: # of Units   PCCF+: Percent of Population
DA                8,476                  2.9                               187
CT                73                     0.1                               7
CMA               ..                     ..                                ..                  ..
CSD               1,438                  0.6                               109
CD                ..                     ..                                ..                  ..

Population assignment using PCCF-SLI
[Figure not reproduced; labels: Gatineau, Ottawa]

Population assignment using PCCF+
[Figure not reproduced; labels: Gatineau, Ottawa]

Population misassignment using PCCF-SLI & PCCF+

Geographic Unit   PCCF (% of total population)   PCCF+ (% of total population)
DA                37.4                           7.6
CT                6.6                            1.4
CMA               4.3                            0.1
CSD               11.4                           2.7
CD                1.1                            0.3

Comparison of population coding errors using PCCF-SLI versus PCCF+ (5J)*

* Population coding errors are defined as the sum over all areas at this geographic level of the absolute value of the population coded less the population known from the census sample, expressed as a percentage of the total population in all areas at this level.
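
The definition in the footnote translates directly into code. The sketch below is a Python illustration with made-up counts for three areas (they are not taken from the slides); the function name is hypothetical, and it assumes the denominator is the census population summed over the same areas.

```python
def population_coding_error(coded, census):
    """Sum of |coded - census| over all areas, as a % of total census population."""
    areas = set(coded) | set(census)
    total_abs_diff = sum(abs(coded.get(a, 0) - census.get(a, 0)) for a in areas)
    return 100.0 * total_abs_diff / sum(census.values())


# Hypothetical counts for three areas (illustration only).
census = {"DA 1": 600, "DA 2": 300, "DA 3": 100}
single_link = {"DA 1": 1000}  # everyone coded to one DA
weighted = {"DA 1": 610, "DA 2": 290, "DA 3": 100}

print(population_coding_error(single_link, census))  # 80.0
print(population_coding_error(weighted, census))     # 2.0
```

Because absolute differences are summed, one misplaced person is counted twice (once where they were wrongly added, once where they are missing), so the measure can exceed the share of people actually miscoded.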

Limitation of SLI (e.g., 2001 Census Geography)

  • Over a third of the total population of rural and small town Canada can never get the correct dissemination area (DA) code when using the PCCF SLI, since nearly 11,000 DAs are never linked to postal codes when only the SLI is selected.
  • Also, at the census subdivision (CSD) level, over a quarter of all CSDs never get coded using SLI. In rural and small town Canada, nearly 30% of CSDs never get coded using the SLI.
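
Why some areas can never be coded under the SLI is easy to see on a toy link table. The rows below are hypothetical (they are not PCCF records): each postal code links to several DAs, but only one link per postal code carries SLI = 1.

```python
# Hypothetical PCCF-style rows: (postal code, DA, single link indicator).
ROWS = [
    ("A0A 1A0", "DA 1", 1),
    ("A0A 1A0", "DA 2", 0),
    ("A0A 1A0", "DA 3", 0),
    ("A0A 1B0", "DA 3", 1),
    ("A0A 1B0", "DA 4", 0),
]

all_das = {da for _, da, _ in ROWS}
sli_das = {da for _, da, sli in ROWS if sli == 1}
never_coded = sorted(all_das - sli_das)

print(never_coded)                      # ['DA 2', 'DA 4']
print(len(never_coded) / len(all_das))  # 0.5
```

Any DA that appears only in rows with SLI = 0 receives no records at all when the file is filtered to the single link, no matter how many people live there.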

  • 4. Why PCCF+?

Why PCCF+ and not regular PCCF (with SLI=1)?

  • 1. Population weighted approach
  • 2. Supplemental coding
  • 3. Postal codes less than perfect
  • 4. Documentation and diagnostics
  • 5. Modifiable SAS code
  • 6. Vintage of postal codes
  • 7. Postal codes used by residents of “incompletely enumerated Indian Reserves”

Why PCCF+? – 1: population weighting

  • Almost all rural and several urban categories of postal code provide service to multiple dissemination areas (DAs), census subdivisions (CSDs), etc.
  • Use of the single link indicator (SLI) equal to 1 in PCCF forces any occurrence of a postal code to only one set of geocodes
  • Using the single-link approach introduces systematic bias
  • PCCF+ probabilistically assigns each postal code record using census-derived population weights

Why PCCF+? – 2: supplemental coding

PCCF-SLI & PCCF+

  • ID, PCODE
  • PR, CD, CSD, CCSD
  • CMA, CT, MIZ, ER, FED
  • DA, BLK
  • BLKURB*, DPL*
  • LAT, LONG

PCCF+ only

  • HR, AHR
  • QAIPPE, IMMTER
  • CSIZE, NSREL, AIRLIFT, AR
  • EA81uid, EA86uid, EA91uid, EA96uid, DA01uid, DA06uid, DA11uid

* Poorly coded and not recommended for analytic use

Why PCCF+? – 3: postal codes less than perfect

  • Most files will include some postal codes that never existed (reporting or data capture errors)
  • Sensitive files may omit the last digit of the postal code
  • Some files may only contain the first 3 characters of the postal code
  • PCCF+ can be used to geocode all of the above

Why PCCF+? – 4: documentation & diagnostics

  • Output is documented with user manual and version
  • Method has been validated in many publications
  • Diagnostic codes for problem codes are provided
  • Two outputs: Full file & Problem File

DMT, DMTDIFF RPF, SERV, PREC LINK (PROB) BLG NAME + ADR SOURCE CSDNAME + TYPE NCSD, NCD CPCCODE RESFLG, INSTFLG

This variable provides a measure of the quality of the geographic coordinates assigned to the representative point

Why PCCF+? – 5: Modifiable SAS code

  • Length of ID variable can be changed
  • SAS code can be easily tweaked so results are exactly reproducible
  • Define a specific seed for the probabilistic assignment

/********************************************************************************************/
/* Random Seed Value */
/* If the seed value is 0 (default) then computer time is used */
/* Change this value as desired to use the same seed between PCCF+ trials */
%let seedVal=0;
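
The effect of the seed setting can be illustrated outside SAS. The following Python sketch mimics the convention described in the comment (0 means "use the clock", any other value gives a repeatable stream); `make_rng` and `draw` are hypothetical names, and Python's generator is not the one SAS uses, so the actual draws will differ from PCCF+ output.

```python
import random
import time


def make_rng(seed_val=0):
    """Mimic the documented convention: 0 means 'seed from the clock'."""
    if seed_val == 0:
        seed_val = time.time_ns()
    return random.Random(seed_val)


def draw(rng, n=5):
    """Draw n area labels at random (equal weights, for illustration)."""
    return [rng.choice(["DA 1", "DA 2", "DA 3"]) for _ in range(n)]


# Same non-zero seed: identical assignments on every run.
print(draw(make_rng(12345)) == draw(make_rng(12345)))  # True

# Seed 0: clock-based, so assignments generally differ between runs.
print(draw(make_rng(0)))
```

Recording the seed used for a given run alongside the output file is one way to make a probabilistic geocoding step auditable.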

Why PCCF+? – 6: “Vintage” of postal codes

  • PCCF+ assigns full census geography for the most recent census year
  • It also assigns the dissemination area (DA) or enumeration area (EA) from each previous census back to 1981
  • Useful for time-varying analysis
  • For higher levels of vintage geography (e.g., CMA) use the Geographic Attributes File (GAF) or the Geographic Tape File (GTF)

Why PCCF+? – 7: Indian Reserves

  • Your file includes postal codes used by residents of “incompletely enumerated Indian Reserves”
  • These postal codes will not be properly coded by PCCF-SLI
  • PCCF+ includes census population weights adjusted to account for estimates of the population living on the incompletely enumerated reserves

Summary: PCCF+ vs PCCF-SLI

  • Consider using PCCF+ rather than PCCF-SLI if any of the following apply:
  • You want to do better coding in rural areas
  • You want to use variables present on the PCCF+ which are not present in regular PCCF
  • Your file is less than perfect with respect to postal codes
  • You want help to evaluate the quality of the postal codes on your data file
  • The “vintage” of the postal codes on your file spans more than one census
  • Your file includes postal codes used by residents of “incompletely enumerated Indian Reserves”

  • 5. Limitations of the PCCF-SLI & the PCCF+

Limitations with PCCF-SLI & PCCF+

  • In rural areas and at the urban fringe, probabilistic assignment leads to random misclassification of dissemination area (DA) and neighbourhood income quintiles
  • Reduced ability to detect effects in rural areas
  • Lower risk ratios (RRs) and risk differences (RDs) for epidemiologic studies
  • This is effect modification, not confounding, so it is recommended to stratify analysis by urban & rural
  • Take care in interpreting lower effect estimates in rural versus urban areas
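
The attenuation of risk ratios and risk differences under random misclassification can be shown with simple arithmetic. The sketch below is a deliberately simplified Python illustration, not a PCCF+ result: it assumes two equal-sized groups (for example, lowest versus highest income quintile), hypothetical true risks of 0.02 and 0.01, and a share of each group that is randomly swapped into the other.

```python
def observed_effects(risk_exposed, risk_unexposed, misclassified_share):
    """Return (risk ratio, risk difference) after symmetric random swapping.

    Assumption: groups are equal in size and the same share of each group
    is wrongly labelled as the other.
    """
    m = misclassified_share
    obs_exposed = (1 - m) * risk_exposed + m * risk_unexposed
    obs_unexposed = (1 - m) * risk_unexposed + m * risk_exposed
    return obs_exposed / obs_unexposed, obs_exposed - obs_unexposed


for share in (0.0, 0.1, 0.2, 0.3):
    rr, rd = observed_effects(0.02, 0.01, share)
    print(f"misclassified={share:.0%}  RR={rr:.2f}  RD={rd:.4f}")
```

In this toy setting the true risk ratio of 2.0 shrinks toward 1 as the misclassified share rises, which is the pattern to keep in mind when rural estimates look weaker than urban ones.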

Limitations with PCCF and PCCF+

  • Postal codes may change over time

1. Many technical changes to address ranges
  • Usually no change at block-face or block level
  • Very little change at higher levels
2. Some reuse of retired postal codes within the same FSA
3. Two FSAs in British Columbia moved in the mid-1990s

  • Generally, these changes translate to
  • no change of the block face (BF) or dissemination block (DB) latitude/longitude
  • very little change at higher levels (dissemination area (DA), census tract (CT), etc.)
  • Moral: code as received and interpret the output

Concluding remarks

  • Small-area geography & spatial coordinates are part of most data sets and useful in most studies
  • Familiarity with methods, limitations, and interpretation of data helps researchers more meaningfully exploit data potential
  • It is not enough to use the data mechanically; users need to think about what they are doing and why
  • Consult the PCCF+ documentation

Thank you!

  • Acknowledgments
  • Russell Wilkins (retired), Paul A Peters (University of New Brunswick) & Michael Tjepkema (Health Analysis Division)

  • For more information please contact:
  • HAD-DAS@canada.ca