SLIDE 1
DRAFT—DO NOT CITE
Decent Work and Sustainable Communities: The Power of IPUMS Census Microdata for SDG Measurement and Disaggregation
Kristen Jeffers, Rodrigo Lovaton, Sula Sarkar, Lara Cleveland, and Patricia Kelly Hall, University of Minnesota
The United Nations 2030 Agenda for Sustainable Development proposes 17 goals and 169 targets that aim to carry forward the momentum generated by the Millennium Development Goals. The Sustainable Development Goals (SDGs) and their targets concentrate on the eradication of poverty, hunger, and inequality; access to education and healthcare; gender equality; environmental sustainability; economic, social, and technological progress; and the establishment of new partnerships (Sustainable Development Solutions Network 2015). The proposed framework for monitoring the SDGs emphasizes the need for disaggregated indicators that measure progress among different demographic and social groups at various levels of subnational geography. The Sustainable Development Solutions Network recommends spatial disaggregation and stratification by sex, gender, age, income, disability, ethnicity, indigenous status, economic activity, and migrant status for nearly half of the proposed monitoring indicators. While enhanced data collection will almost certainly be necessary to monitor several SDGs, high-density census microdata samples, like those disseminated by IPUMS-International, are a useful resource that is already part of the statistical infrastructure of most developing countries. These data are highly representative of national populations, are collected at regular intervals, and include measures of the population characteristics required for SDG indicator disaggregation. The UN Commission on Population and Development encourages statistical offices to disseminate public-use, anonymized, geo-referenced census and survey microdata for use in SDG monitoring (United Nations Commission on Population and Development 2016).
To make accurate comparisons across time and place, indicators must be derived from comparable input data that have been standardized across countries with varying statistical cultures and capacities. More than twenty of the 232 SDG indicators are directly measurable with census microdata included in IPUMS-International. These include indicators related to fertility, mortality, access to basic services, enrollment in education, and labor force participation and composition. For dozens of additional targets that rely on “big data” and other non-traditional data sources that are not nationally representative, census data will be required to produce population-level estimates. Likewise, census data will be required to disaggregate indicators derived from data sources that lack the sample sizes or stratifying variables needed to support disaggregated estimates. While targeted household surveys often provide more detail than population censuses, they rarely produce sample sizes large enough to support the multi-dimensional
disaggregation suggested for SDG monitoring. When empirical disaggregation is not possible, census data can be used to model indicator estimates for population subgroups and subnational geographic units. As of April 2017, 145 indicators either lack regularly produced data (tier II) or lack an established methodology (tier III) (United Nations Statistics Division 2017). Census data may also provide proxy estimates for tier II and tier III indicators in countries where new data cannot easily be collected. In this paper, we identify SDG indicators that can be measured using data from IPUMS-International, discuss the strengths of IPUMS-International census microdata for SDG monitoring, provide specific examples of their application to indicator measurement, and describe a method for using IPUMS-International census microdata to produce disaggregated estimates for indicators derived from other data sources. To minimize resources spent on new data collection and maximize resources devoted to implementing policies and programs that achieve the goals and targets, national governments and SDG custodian agencies should make good use of available data. Integrated microdata from IPUMS-International represent an important resource for the measurement, disaggregation, validation, and further exploration of SDG indicators.
IPUMS for SDG Monitoring
Census microdata disseminated by IPUMS-International offer several advantages for SDG monitoring. IPUMS-International disseminates high-precision census microdata samples from around the world and is the world's largest collection of publicly available census microdata. As of 2017, 301 anonymized microdata samples from 85 countries are available to researchers free of charge through the IPUMS-International online data dissemination system. Truly global in coverage, the series includes more than 50 samples each from Africa, Asia, Europe, and the Americas. Many of these nationally representative samples are unavailable elsewhere. IPUMS-International samples are individual-level subsets of full-count census data. The samples are systematically drawn from the total enumerated population, either by IPUMS-International or by the statistical office of the country of origin, according to a variety of sample designs. Where possible, IPUMS-International provides 10 percent samples of census data by selecting every 10th household after a random start. Nearly all samples available from IPUMS-International are cluster samples: they sample households rather than individuals. Individuals are sampled as members of households because many important topics, such as fertility, household composition, and nuptiality, require information about multiple individuals within the same household (Jeffers et al. 2017).
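The 10 percent systematic household design described above can be illustrated with a short Python sketch. It is a minimal example that assumes a list of household serial numbers in enumeration order; the function name, the toy households, and the three-person households are hypothetical and do not reproduce the actual IPUMS or statistical-office procedure.

```python
import random

def systematic_household_sample(household_ids, interval=10, seed=None):
    """Select every `interval`-th household after a random start.

    `household_ids` is assumed to be ordered as enumerated. Each selected
    household then represents `interval` households in the population.
    """
    rng = random.Random(seed)
    start = rng.randrange(interval)  # random start in 0 .. interval-1
    return household_ids[start::interval]

# Hypothetical example: 100 enumerated households with 3 persons each.
households = list(range(1, 101))
persons = [(hh, pnum) for hh in households for pnum in (1, 2, 3)]

sample = systematic_household_sample(households, interval=10, seed=2017)
selected = set(sample)
# Cluster sample: every person in a selected household is kept together.
sampled_persons = [p for p in persons if p[0] in selected]
household_weight = 10  # each sampled household stands for 10 households

print(len(sample), len(sampled_persons))  # 10 30
```

Because whole households are retained, relationships among co-resident persons (needed for fertility or household-composition measures) survive sampling.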
The data series includes information on a broad range of population and housing characteristics. The population questions address fertility, nuptiality, migration, disability, labour-force participation, occupational structure, education, ethnicity, and household composition (Ruggles et al. 2003; Sobek et al. 2011). Housing questions cover economic indicators (such as dwelling ownership and building material), possession of amenities (such as a car or television), and utilities (such as water source, sewage disposal, and cooking fuel). Data available from IPUMS-International can be used to calculate 28 SDG indicators as officially operationalized (see Table 1). These include indicators related to fertility, mortality, access to basic services, enrollment in education, and labor force participation and composition.
Table 1: SDG indicators measurable with data from IPUMS-International
1.4.1    Proportion of population living in households with access to basic services
1.4.2*   Proportion of total adult population with secure tenure rights to land, with legally recognized documentation and who perceive their rights to land as secure, by sex and by type of tenure
3.1.1*   Maternal mortality ratio
3.2.1*   Under-five mortality rate
3.7.1    Proportion of women of reproductive age (aged 15-49 years) who have their need for family planning satisfied with modern methods
3.7.2*   Adolescent birth rate (aged 10-14 years; aged 15-19 years) per 1,000 women in that age group
3.c.1*   Health worker density and distribution
4.1.1    Proportion of children and young people: (a) in grades 2/3; (b) at the end of primary; and (c) at the end of lower secondary achieving at least a minimum proficiency level in (i) reading and (ii) mathematics, by sex
4.3.1    Participation rate of youth and adults in formal and non-formal education and training in the last 12 months, by sex
4.5.1    Parity indices (female/male, rural/urban, bottom/top wealth quintile and others such as disability status, indigenous peoples and conflict-affected, as data become available) for all education indicators on this list that can be disaggregated
4.6.1    Percentage of population in a given age group achieving at least a fixed level of proficiency in functional (a) literacy and (b) numeracy skills, by sex
4.c.1    Percentage of teachers in: (a) pre-primary; (b) primary; (c) lower secondary; and (d) upper secondary education who have received at least the minimum organized teacher training (i.e. pedagogical training) pre-service or in-service required for teaching at the relevant level in a given country
5.3.1*   Proportion of women aged 20-24 years who were married or in a union before age 15 and before age 18
5.5.2    Proportion of women in managerial positions
5.a.1*   (a) Proportion of total agricultural population with ownership or secure rights over agricultural land, by sex; and (b) share of women among owners or rights-bearers of agricultural land, type of tenure
6.1.1*   Proportion of population using safely managed drinking water services
6.2.1*   Proportion of population using safely managed sanitation services, including a hand-washing facility with soap and water
6.3.1*   Proportion of wastewater safely treated
7.1.1*   Percentage of population with access to electricity
7.1.2*   Proportion of population with primary reliance on clean fuels and technology
8.3.1    Proportion of informal employment in non-agriculture employment, by sex
8.5.1    Average hourly earnings of female and male employees, by occupation, age and persons with disabilities
8.6.1    Proportion of youth (aged 15-24 years) not in education, employment or training
8.7.1    Proportion and number of children aged 5-17 years engaged in child labour, by sex and age
9.2.2    Manufacturing employment as a proportion of total employment
9.5.2    Researchers (in full-time equivalent) per million inhabitants
11.1.1*  Proportion of urban population living in slums, informal settlements, or inadequate housing
11.2.1*  Proportion of population that has convenient access to public transport, by sex, age and persons with disabilities
* Census data explicitly mentioned in SDG metadata
Unique individual, household, dwelling, and subnational geographic identifiers allow researchers to select the level of analysis most suitable to their research. Geographic detail varies across samples. For most countries, the first and second administrative levels are identified; for some countries, smaller entities, such as municipalities, are specified. Most samples are truly nationally representative: they include individuals living in group quarters such as prisons, nursing homes, children's homes, and religious institutions, and thus provide information on population subgroups often excluded from household, health, and labor force surveys. Large, nationally representative samples support accurate national and subnational empirical estimates disaggregated by the recommended social and demographic characteristics. The principal advantage of IPUMS-International is the reconciliation of sample-specific variable coding schemes to produce datasets that integrate records across time and space. The basic goal of variable harmonization is to make data suitable for comparative analysis by applying the same coding schemes across all samples in the data series. Microdata are integrated so that identical concepts have identical codes.
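The principle that identical concepts receive identical codes can be sketched as a crosswalk lookup. In this minimal Python illustration, two hypothetical samples record sex with different source codes, and a table maps each (sample, source code) pair to one integrated code. The sample names, source codes, and integrated codes are assumptions for illustration, not the actual IPUMS coding scheme.

```python
# Hypothetical crosswalk: (sample, source code) -> integrated code.
# Integrated scheme assumed here: 1 = male, 2 = female, 9 = unknown.
SEX_CROSSWALK = {
    ("census_A", 1): 1, ("census_A", 2): 2, ("census_A", 9): 9,
    ("census_B", "M"): 1, ("census_B", "F"): 2, ("census_B", "?"): 9,
}

def harmonize(sample, source_code, crosswalk):
    """Return the integrated code for a sample-specific source code."""
    try:
        return crosswalk[(sample, source_code)]
    except KeyError:
        # Unmapped codes are surfaced rather than silently passed through.
        raise ValueError(f"no integrated code for {source_code!r} in {sample}") from None

records = [("census_A", 2), ("census_B", "F"), ("census_B", "M")]
print([harmonize(s, c, SEX_CROSSWALK) for s, c in records])  # [2, 2, 1]
```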
This integration process not only facilitates analysis across time and space but also ensures that comparative analyses use the same concepts across countries and sample years. Integrated microdata facilitate informed comparative research, but changes in administrative boundaries pose a major challenge for the spatio-temporal analyses required to monitor the SDGs accurately. Holding space constant is critical for measuring progress toward goals at subnational levels; units whose boundaries have changed cannot be compared across time in any meaningful way. Without consistent geographic boundaries, variation in indicator values over time that is due to boundary changes may be misinterpreted as substantive change. IPUMS has developed a method for creating spatially consistent units in the
microdata, starting with the first and second administrative units identified in the census samples. Where the boundaries of modern units do not align with historical census units because of boundary changes, larger aggregated units are created that remain stable over time. If units split or merged, the harmonized unit has the boundaries of the largest version of the unit; if territory was redistributed between two or more units, those units are combined (Sarkar et al. 2015).
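The aggregation rule above amounts to finding connected components in a graph whose nodes are year-specific units and whose edges link units with overlapping territory across censuses. The following union-find sketch assumes the overlap pairs have already been derived (for example, from a GIS overlay of year-specific boundary files). The unit codes are hypothetical, and the sketch illustrates the rule rather than the IPUMS production procedure.

```python
from collections import defaultdict

def harmonize_units(overlaps):
    """Group year-specific units into spatially stable harmonized units.

    `overlaps` is an iterable of pairs of (census_year, unit_code) tuples
    whose territories overlap. Units linked directly or indirectly end up
    in the same harmonized unit (connected components via union-find).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for a, b in overlaps:
        union(a, b)

    groups = defaultdict(set)
    for unit in list(parent):
        groups[find(unit)].add(unit)
    return list(groups.values())

overlaps = [
    ((1996, "A"), (2001, "A")),   # unchanged unit
    ((1996, "B"), (2001, "B1")),  # B split into B1 and B2
    ((1996, "B"), (2001, "B2")),
    ((1996, "C"), (2001, "C")),   # territory moved from D to C
    ((1996, "D"), (2001, "C")),
    ((1996, "D"), (2001, "D")),
]
for group in harmonize_units(overlaps):
    print(sorted(group))
```

A split yields one harmonized unit with the boundaries of the largest (pre-split) version, while a redistribution between C and D merges both into a single stable unit.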
Figure 1 provides an example of changing boundaries over time at the second administrative level (district) in South Africa. The four maps in the corners of the image correspond to administrative boundaries at the time of each census; the number and boundaries of these units change between censuses. The map at the center of the figure reflects the IPUMS harmonized variable, which applies consistent boundaries across all census years. IPUMS-International provides GIS boundary files corresponding to both year-specific and harmonized geography variables.
Figure 1: Year-specific and harmonized district boundaries, South Africa 1996-2011
The SDG disaggregation mandate requires indicator estimation below the national level, so accurate SDG monitoring will require geographic harmonization. Integrated microdata from IPUMS-International represent an important resource for UN custodian agencies to validate and further explore indicator data submitted by national statistical offices.
Analysis: Indicator Calculation and Visualization
To demonstrate the utility of IPUMS-International census microdata for SDG monitoring, we calculate and visualize indicators for four targets:
4.3 By 2030, ensure equal access for all women and men to affordable and quality technical, vocational and tertiary education, including university
5.b Enhance the use of enabling technology, in particular information and communications technology, to promote the empowerment of women
8.6 By 2020, substantially reduce the proportion of youth not in employment, education or training
11.1 By 2030, ensure access for all to adequate, safe and affordable housing and basic services and upgrade slums
We present measures calculated using data from 2000- and 2010-round censuses to illustrate the utility of harmonized census data for tracking development across census years and for establishing baseline measures for monitoring SDG progress at the subnational level, disaggregated by demographic characteristics. Each calculation involves different dimensions of geographic and/or demographic disaggregation to highlight the versatility of IPUMS-International data.
The data and shapefiles used for all calculations and visualizations are freely available from international.ipums.org.
Enrollment in tertiary education in Brazil (4.3.1)
UN guidelines recommend using the gross enrollment ratio in tertiary education to monitor access to third-level education [1]. IPUMS variables that measure school attendance, level of school attending, and age can be used to calculate gross enrollment ratios for all levels of education and at varying levels of subnational geography (Table 2).
Table 2: IPUMS variables used to calculate the gross enrollment ratio
AGE               Age
SCHOOL            School attendance
BR2010A_EDLEVEL1  Level of school attending
We disaggregate the national enrollment ratio by sex and household-income quartile, and calculate enrollment ratios by state. Enrollment ratios varied by state, sex, and income group in Brazil in 2010. Third-level enrollment is highest in the states containing large urban centers (Figure 2). The highest enrollment ratio, 80 percent, was in the Federal District (Brasilia). Enrollment ratios are higher among women than among men in all regions of the country and in all household-income groups (Figure 3). The gap between male and female enrollment widens as income increases, and it is more pronounced in rural border states in western and northern Brazil (not visualized here).
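The gross enrollment ratio defined in footnote 1 can be computed from person records as a weighted ratio. This minimal Python sketch assumes each record carries a boolean `in_tertiary` already derived from SCHOOL and BR2010A_EDLEVEL1 (the code mapping is not shown), an `age`, and a person weight `perwt` (IPUMS samples supply PERWT). The field names and toy records are hypothetical. Grouping by state or income quartile only requires a different `group_key`.

```python
from collections import defaultdict

def gross_enrollment_ratio(records, group_key, ref_ages=range(18, 23)):
    """Weighted gross enrollment ratio (percent) for each group.

    Numerator: persons of any age attending tertiary education.
    Denominator: persons in the reference age group (18-22 in Brazil).
    """
    enrolled = defaultdict(float)
    ref_pop = defaultdict(float)
    for r in records:
        g = group_key(r)
        if r["in_tertiary"]:
            enrolled[g] += r["perwt"]
        if r["age"] in ref_ages:
            ref_pop[g] += r["perwt"]
    # Ratios can exceed 100 because enrollees outside ages 18-22 count
    # in the numerator but not in the denominator.
    return {g: 100.0 * enrolled[g] / ref_pop[g] for g in ref_pop if ref_pop[g] > 0}

records = [
    {"age": 19, "sex": "F", "in_tertiary": True,  "perwt": 10.0},
    {"age": 21, "sex": "F", "in_tertiary": False, "perwt": 10.0},
    {"age": 30, "sex": "F", "in_tertiary": True,  "perwt": 10.0},
    {"age": 20, "sex": "M", "in_tertiary": True,  "perwt": 10.0},
    {"age": 22, "sex": "M", "in_tertiary": False, "perwt": 10.0},
]
print(gross_enrollment_ratio(records, group_key=lambda r: r["sex"]))
# {'F': 100.0, 'M': 50.0}
```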
[1] Gross enrollment ratio: total enrollment in tertiary education, regardless of age, expressed as a percentage of the population in the five-year age group immediately following the end of upper secondary education (18-22-year-olds in Brazil).
Figure 2: Gross enrollment ratio in tertiary education by state, Brazil 2010
Figure 3: Gross enrollment ratio in tertiary education by household-income quartile and sex, Brazil 2010
Gender disparities in cell phone ownership in Ghana (5.b.1)
For target 5.b, we measure the use of enabling information and communications technology to promote the empowerment of women. We use the most recent census from Ghana (2010), which asked respondents whether they owned a mobile phone, and disaggregate this information by sex. Questions on access to and use of information and communications technology (ICT) are becoming increasingly common in the developing world. We visualize phone ownership among persons age 12 and older by sex at the second administrative level of geography (district) for Ghana (Figure 4). Cell phone ownership is more common in the urbanized south among both sexes. Gender disparities exist throughout the country: mobile phone ownership rates are higher among men than among women in every district in Ghana, though the gaps are less pronounced in the country's urban centers.
Figure 4: Percent of population age 12 and older that owns a mobile phone, Ghana 2010
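The district-by-sex disaggregation behind Figure 4 reduces to weighted proportions within (district, sex) cells, followed by a male-minus-female gap. This Python sketch assumes person records with hypothetical fields `district`, `sex`, `age`, a boolean `owns_phone` derived from the census question, and a person weight `perwt`. The toy data are illustrative only.

```python
from collections import defaultdict

def ownership_rates(records, min_age=12):
    """Weighted percent owning a mobile phone, by (district, sex), ages min_age+."""
    owners = defaultdict(float)
    totals = defaultdict(float)
    for r in records:
        if r["age"] < min_age:
            continue  # outside the age universe used for the indicator
        key = (r["district"], r["sex"])
        totals[key] += r["perwt"]
        if r["owns_phone"]:
            owners[key] += r["perwt"]
    return {k: 100.0 * owners[k] / totals[k] for k in totals}

def male_female_gap(rates):
    """Percentage-point gap (male minus female) for districts with both sexes."""
    districts = {d for d, _ in rates}
    return {d: rates[(d, "M")] - rates[(d, "F")]
            for d in districts if (d, "M") in rates and (d, "F") in rates}

def person(district, sex, age, owns_phone, perwt=10.0):
    return {"district": district, "sex": sex, "age": age,
            "owns_phone": owns_phone, "perwt": perwt}

records = [
    person("D1", "M", 30, True, 30.0), person("D1", "M", 40, False),
    person("D1", "F", 25, True), person("D1", "F", 35, False, 30.0),
    person("D1", "F", 8, True),  # under 12: excluded
    person("D2", "M", 20, True), person("D2", "M", 50, False),
    person("D2", "F", 22, True), person("D2", "F", 60, False),
]
rates = ownership_rates(records)
print(sorted(male_female_gap(rates).items()))  # [('D1', 50.0), ('D2', 0.0)]
```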
Urban slum population in Mexico (11.1.1)
United Nations guidelines recommend the percentage of urban households living in slums as an indicator for tracking target 11.1. UN-HABITAT defines a slum household as a group of individuals living under the same roof that lacks one or more of the following conditions: access to improved water, access to improved sanitation, sufficient living area, durability of housing, and security of tenure [2]. Data on water access, sewage disposal, and the size and quality of dwellings are commonly collected by censuses in the developing world. IPUMS integrates these data to build variables with conceptually consistent categories and harmonized codes across all samples. Using IPUMS integrated variables (Table 3), we computed dummy variables identifying households that lacked access to improved water, improved sanitation, sufficient living area, or durable housing according to the UN operationalization of each of the four slum conditions. Urban households lacking one or more of these conditions were considered to be living in slum dwellings.
Table 3: IPUMS integrated variables used to identify urban households living in slums
URBAN    Urban-rural status [3]
WATSUP   Means by which household receives its water
SEWAGE   Access to sewage system or septic tank
ROOMS    Number of rooms occupied by the household
PERSONS  Number of persons in the household
WALL     Wall or building material
FLOOR    Floor material
ROOF     Roof material
The percentage of urban households living in slums in Mexico decreased from 20.1 percent in 2000 to 12.4 percent in 2010 (Figures 5 and 6). IPUMS harmonized geographic variables enable us to calculate the same measures at subnational levels, holding spatial units constant across sample years. The subnational analyses identify 133 municipalities (10 percent) where slum populations stagnated or increased between 2000 and 2010. Slum-dwelling remained high or increased in several municipalities in southeastern Mexico, particularly in the state of Tabasco, where poverty rates exceed the national average [4].
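The classification rule (an urban household is a slum household if it lacks at least one of the four measurable conditions) can be sketched as follows. This Python example assumes booleans `improved_water`, `improved_sanitation`, and `durable_housing` have already been derived from WATSUP, SEWAGE, and WALL/FLOOR/ROOF according to the UN definitions (that mapping is not shown), and it uses a household weight `hhwt` (IPUMS samples carry HHWT). The crowding threshold of more than three persons per room follows the commonly cited UN-HABITAT rule; treating zero recorded rooms as crowded is an assumption of this sketch. All field names are hypothetical.

```python
def lacks_sufficient_living_area(persons, rooms, max_persons_per_room=3):
    # Assumption: more than three persons per room counts as overcrowded;
    # a household with zero recorded rooms is also treated as crowded here.
    return rooms <= 0 or persons / rooms > max_persons_per_room

def is_slum_household(h):
    """True if the household lacks at least one of the four measurable conditions."""
    return (
        not h["improved_water"]
        or not h["improved_sanitation"]
        or not h["durable_housing"]
        or lacks_sufficient_living_area(h["persons"], h["rooms"])
    )

def urban_slum_percentage(households):
    """Weighted percentage of urban households classified as slum households."""
    urban = [h for h in households if h["urban"]]
    total = sum(h["hhwt"] for h in urban)
    slum = sum(h["hhwt"] for h in urban if is_slum_household(h))
    return 100.0 * slum / total

def hh(urban, water, sanitation, durable, persons, rooms, hhwt=1.0):
    return {"urban": urban, "improved_water": water,
            "improved_sanitation": sanitation, "durable_housing": durable,
            "persons": persons, "rooms": rooms, "hhwt": hhwt}

households = [
    hh(True, True, True, True, 4, 2),       # adequate: 2 persons per room
    hh(True, False, True, True, 3, 2),      # no improved water -> slum
    hh(True, True, True, True, 7, 2),       # 3.5 persons per room -> slum
    hh(False, False, False, False, 9, 1),   # rural: outside the indicator
    hh(True, True, True, True, 3, 1),       # exactly 3 per room: not crowded
]
print(urban_slum_percentage(households))  # 50.0
```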
[2] Secure tenure is included as a fifth condition in the official definition of a slum, but information on secure tenure is not available for most countries, so in practice only the first four conditions are used to estimate the proportion of the urban population living in slums. For detailed definitions of improved water, improved sanitation, sufficient living area, durable housing, and secure tenure, see: http://www.un.org/esa/sustdev/natlinfo/indicators/methodology_sheets/poverty/urban_slums.pdf
[3] In Mexico, any locality with a population of 2,500 or more is considered urban.
[4] Banco de Información INEGI, http://www3.inegi.org.mx/sistemas/biinegi/?ind=6300000269
Figure 5: Percentage of the urban population living in slums, Mexico 2000
Figure 6: Percentage of the urban population living in slums, Mexico 2010
Percentage of young people in Mozambique not in education, employment, or training (NEET) (8.6.1)
This indicator tracks the share of youth (age 15 to 24) who are neither in employment nor in education or training. It identifies young people who are not acquiring skills in school or training and are not gaining work experience through employment. SDG metadata identify household labor force surveys as the preferred source of data for employment-related
indicators, but note that “the population census and/or other household surveys with an appropriate employment module may also be used to obtain the required data” (United Nations Statistics Division 2017). Moreover, some countries may not field labor force surveys regularly. By cross-classifying persons using the IPUMS integrated variables SCHOOL and EMPSTAT, we define four categories for youth ages 15-24: attending school, working, both attending school and working, and neither (NEET).
Table 4: IPUMS integrated variables used to define NEET
SCHOOL   School attendance at, or prior to, the time of the census
EMPSTAT  Economic activity status (labor force participation) at the time of the census or during a specified period prior to the census
AGE      Age
In Mozambique in 2007, 19.6 percent of youth age 15-24 were not in education, employment, or training. NEET in Mozambique was lower than in neighboring countries such as Zambia (2010) and Malawi (2008), where NEET rates were 41.7 percent and 29.5 percent, respectively. Disaggregation by sex and administrative unit identifies enclaves of high NEET, particularly among females. Figure 7 shows the percentage of NEET youth in Mozambique in 2007 by sex at the first and second levels of geography. Gender disparities exist throughout the country but are particularly pronounced in the capital region of Maputo (southern tip of Mozambique) and in the urban center of Tete in the northwest. NEET among young men is concentrated in the province of Gaza, where male employment levels are the lowest in the country. Among young women, high levels of NEET are visible throughout the country. Preliminary results indicate parity in labor force participation at the national level, but much lower levels of school attendance for young women (27.6 percent) than for young men (45.3 percent).
Figure 7: Percentage of youth age 15-24 not in education, employment, or training, Mozambique 2007, by sex
Conclusion and Next Steps
The aim of our analysis is to demonstrate the diverse applicability of data from IPUMS-International for SDG monitoring. Specifically, this study shows the use of IPUMS data to calculate SDG indicators
at the regional and local level. Disaggregating national- and population-level trends reveals areas of change and stasis not necessarily visible at the aggregate level. Integrated microdata, measures of several demographic and social characteristics, and spatially consistent geographic units provide monitoring agencies and national governments with reliable data for establishing disaggregated baseline measures for a number of SDGs. In particular, IPUMS variable and geography harmonization facilitate disaggregation and spatio-temporal analysis. Census-taking remains a primary function of national statistical offices in the developing world, and censuses are still an important element of the existing data infrastructure. The 2020- and 2030-round censuses will provide data for continuing assessment of these
targets without requiring cumbersome investment in national statistical capacities. As new census data are collected and integrated into IPUMS, SDG progress can be measured against these baselines using the methods presented in this paper. The next step for this study is to explore how Small Area Estimation (SAE) methods could expand the possibilities for measuring and disaggregating SDG indicators with IPUMS data. Censuses do not include the information required to calculate many SDG indicators. However, SAE overcomes this obstacle by producing estimates for low-level geographies using ancillary data. The SAE technique was originally developed for poverty measurement in the seminal paper by Elbers, Lanjouw, and Lanjouw (2003), but it has since been proposed as an approach for SDG disaggregation. For instance, Zhang (2017) used the Demographic and Health Survey (DHS) for Nepal to calibrate a model for selected family planning indicators and then predicted those indicators using census data, based on variables common to the DHS and the census. Future work will explore this method and examine other data sources that might be used with census data to produce disaggregated indicator estimates.
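The survey-to-census logic described above can be illustrated with a much simpler technique than the one Elbers, Lanjouw, and Lanjouw (2003) use: the sketch below swaps in a cell-based synthetic estimator instead of their regression model with simulated error components. It estimates a 0/1 indicator within covariate cells defined by variables common to the survey and the census, assigns those cell rates to census persons, and averages by small area. The field names (`y`, `weight`, `area`, `educ`) and the toy data are hypothetical; the sketch assumes cells are defined identically in both sources and does not estimate uncertainty.

```python
from collections import defaultdict

def synthetic_estimates(survey, census, cell_key):
    """Cell-based synthetic small-area estimates of a 0/1 indicator y."""
    # Step 1: weighted indicator rate within each covariate cell (survey).
    num, den = defaultdict(float), defaultdict(float)
    for r in survey:
        c = cell_key(r)
        den[c] += r["weight"]
        num[c] += r["weight"] * r["y"]
    cell_rate = {c: num[c] / den[c] for c in den if den[c] > 0}

    # Step 2: assign each census person its cell rate; average by small area.
    area_num, area_den = defaultdict(float), defaultdict(float)
    for p in census:
        c = cell_key(p)
        if c not in cell_rate:
            continue  # cell not observed in the survey; skipped in this sketch
        area_num[p["area"]] += p["weight"] * cell_rate[c]
        area_den[p["area"]] += p["weight"]
    return {a: area_num[a] / area_den[a] for a in area_den}

survey = [
    {"educ": "low", "y": 1, "weight": 1.0}, {"educ": "low", "y": 0, "weight": 3.0},
    {"educ": "high", "y": 1, "weight": 3.0}, {"educ": "high", "y": 0, "weight": 1.0},
]
census = [
    {"area": "A", "educ": "low", "weight": 1.0}, {"area": "A", "educ": "high", "weight": 1.0},
    {"area": "B", "educ": "low", "weight": 3.0}, {"area": "B", "educ": "high", "weight": 1.0},
]
print(synthetic_estimates(survey, census, cell_key=lambda r: r["educ"]))
# {'A': 0.5, 'B': 0.375}
```

Areas differ only through their census composition across cells, which is both the appeal of synthetic estimation and its main limitation relative to model-based SAE.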
References
Elbers, Chris, Jean O. Lanjouw, and Peter Lanjouw. 2003. “Micro-Level Estimation of Poverty and Inequality.” Econometrica 71: 355–364. doi:10.1111/1468-0262.00399.
Jeffers, Kristen, Miriam King, Lara Cleveland, and Patricia Kelly Hall. 2017. “Data Resource Profile: IPUMS-International.” International Journal of Epidemiology. doi:10.1093/ije/dyw321.
Minnesota Population Center. 2017. Integrated Public Use Microdata Series, International: Version 6.5 [dataset]. Minneapolis: University of Minnesota. http://doi.org/10.18128/D020.V6.5.
Ruggles, Steven, Miriam L. King, Deborah Levison, Robert McCaa, and Matthew Sobek. 2003. “IPUMS-International.” Historical Methods: A Journal of Quantitative and Interdisciplinary History 36 (2): 60–65. doi:10.1080/01615440309601215.
Sarkar, Sula, Lara Cleveland, Majory Silisyene, and Matthew Sobek. 2015. “Harmonized census geography and spatio-temporal analysis: gender equality and empowerment of women in Africa.” Paper presented at the Annual Meeting of the Population Association of America, San Diego, April 30 – May 2.
Sobek, Matthew, Lara Cleveland, Sarah Flood, Patricia Kelly Hall, Miriam L. King, Steven Ruggles, and Matthew Schroeder. 2011. “Big Data: Large-Scale Historical Infrastructure from the Minnesota Population Center.” Historical Methods: A Journal of Quantitative and Interdisciplinary History 44 (2): 61–68. doi:10.1080/01615440.2011.564572.
Sustainable Development Solutions Network. 2015. “Indicators and a Monitoring Framework for the Sustainable Development Goals: Launching a Data Revolution for the SDGs.” A report to the Secretary-General of the United Nations by the Leadership Council of the Sustainable Development Solutions Network. Available for download from: http://unsdsn.org/wp-content/uploads/2015/05/150612-FINAL-SDSN-Indicator-Report1.pdf
Sustainable Development Solutions Network. 2015. “Leaving No One Behind: Disaggregating Indicators for the SDGs.” Available for download from: http://unsdsn.org/wp-content/uploads/2015/10/151026-Leaving-No-One-Behind-Disaggregation-Briefing-for-IAEG-SDG.pdf
United Nations Commission on Population and Development. 2016. Resolution 2016/1: Strengthening the demographic evidence base for the 2030 Agenda for Sustainable Development. Available for download from: http://www.un.org/en/development/desa/population/pdf/commission/2016/documents/CPD49%20Resolution%202.pdf
United Nations Statistics Division. 2017. SDG Indicator Metadata, Indicator 8.6.1. Available for download from: https://unstats.un.org/sdgs/metadata/files/Metadata-08-06-01.pdf
United Nations Statistics Division. 2017. Tier Classification for Global SDG Indicators. Available for download from: https://unstats.un.org/sdgs/iaeg-sdgs/tier-classification/
Zhang, Sainan. 2017. “Small Area Estimates: an Illustration for Family Planning using Nepal as a Case Study.” Paper presented at World Data Forum, Cape Town, January 15-18.