


High Resolution Mapping of Fertility and Mortality from National Household Survey Data in Low Income Settings

Alessandra Carioli1,2 Claudio Bosco1,2 Andrew J. Tatem1,2

1. WorldPop, Department of Geography and Environment, University of Southampton, UK

2. Flowminder Foundation, Stockholm, Sweden

Abstract

The UN Sustainable Development Goals (SDGs) were launched in 2015 and represent an intergovernmental set of 17 aspirational goals and 169 targets to be achieved by 2030. All SDGs are based on ensuring that a certain percentage of the population has access to specific services or resources, or achieves a certain level of social, economic, or physical health, and therefore there is a need for a strong and regularly updated demographic evidence base. The SDGs strive to include the social, economic and environmental dimensions, which are markedly heterogeneous at subnational level. Moreover, a particular focus across the goals and targets is achievement 'everywhere', ensuring that no one gets left behind and that progress is monitored regularly at subnational levels, so that national-level statistics do not mask local heterogeneities. Census data can provide the requisite demographic data at fine spatial scales, but such data are only collected every 10 years, and sometimes less frequently in many low-income settings. Moreover, administrative, civil and vital registration systems are often weak, incomplete or rarely available in low-income settings. National household survey data, on the other hand, are collected more regularly to enable SDG monitoring, but are typically summarized at national or provincial level, masking substantial local heterogeneities. Recent work has, however, shown the potential of spatial interpolation techniques applied to GPS-located survey cluster data, in combination with geospatial covariate layers, to produce high-resolution maps of key demographic and health indicators, including age structures, access to sanitation and malaria prevalence.

In this paper we apply a fully Bayesian methodology to test the potential of such approaches for mapping two key demographic indicators, total fertility and child mortality rates, across a set of low-income countries. We employ Demographic and Health Survey (DHS) data and implement the Integrated Nested Laplace Approximation (INLA) (Rue et al. 2014) approach as a computationally efficient tool to produce fine-scale predictions for three countries: Nigeria, Nepal, and Bangladesh. Using the fitted model we obtain predicted values of the two demographic indicators for each 1 x 1 km grid cell.

Introduction

The Sustainable Development Goals (SDGs) of the 2030 Agenda are a wide-ranging set of 17 aspirational goals addressed to all countries, intended to significantly improve living standards worldwide over a 15-year horizon. They include ending poverty and malnutrition, improving health and education, and building resilience to natural disasters and climate change. Efficient monitoring of the goals, as well as country-based policies directed towards achieving them, needs up-to-date, detailed and good-quality data on populations and on key demographic indicators. In this context, knowledge of population demographics is key to understanding each country's progress towards fulfilling the SDGs, especially at fine scale, in order to appraise spatial heterogeneity between different regions as well as between urban and rural areas. In particular, in low- and middle-income countries (LMICs), detailed fine-grid measures of fertility rates and child mortality are key to measuring and monitoring spatial heterogeneity in women's and children's health and wealth. For instance, a declining fertility rate may allow a demographic dividend to take place, freeing up resources (derived from a higher support ratio) that can be invested in children's health and education, with clear long-term advantages for living standards. Fertility in low-income countries tends to be well above replacement level (set at 2.1), with important variations between macro regions, such as Western and Middle African countries, with 4.9 and 5.8 children per woman respectively, and South-eastern Asia, with 2.4 children per woman (source: un.org). Similarly, child mortality in low-income settings is on average

11 times higher than in high-income countries (source: WHO). For instance, in Nigeria 1 in 8 children will not reach the age of 5, and 1 in 15 will not reach their first birthday (source: Nigeria Standard DHS 2013), while these ratios fall by more than half in Bangladesh (source: Bangladesh Standard DHS 2014). Such heterogeneity in demographic indicators is evident not only at national level but also at subnational level, as trends diverge substantially across regions as well as between urban and rural areas. Thus, the availability of up-to-date, spatially detailed, and comparable datasets that accurately depict the distribution of fertility and child mortality is especially important for family planning and decision-making purposes.

Materials and methods

This study focuses on two South Asian countries, Bangladesh and Nepal, and one Sub-Saharan African country, Nigeria. All three countries are Least Developed Countries, characterized by socioeconomic underdevelopment, poverty, inequality, social exclusion and a low level of human development. However, Bangladesh and Nepal have achieved considerable human development gains over the last few decades in terms of poverty reduction, as opposed to Nigeria, which has high infant mortality and high fertility. We estimate two demographic variables using recent DHS data, under-five mortality and the total fertility rate, employing appropriate micro-data survey techniques. In order to estimate total fertility rates, we employ direct survey techniques, excluding from

the fertility prediction analysis clusters that have zero women in any age group among the final exposures computed for the earlier completed calendar years plus the year of the women's interview.

Table 1: Number of input clusters

Country            Clusters
Nigeria 2013       886
Bangladesh 2014    575
Nepal 2011         289

Geographical locations of the selected cluster centroids were provided in the survey and consist of 886 geolocated clusters for Nigeria, 575 for Bangladesh, and 289 for Nepal. In the survey, a displacement of the cluster centroid geolocation was introduced to anonymize the cluster location, up to 5 km in rural areas and up to 2 km in urban areas, where urban areas are defined as settlements with more than 20000 inhabitants.

Data

The maps for U5MR and fertility present the predicted estimates of mortality for children under 5 years old and fertility for women aged 15-49, as a result of the geostatistical modelling. In general, Northern Nigeria displays a higher incidence of child mortality as well as higher fertility than the Central and Southern regions. The Borno region (North-East) shows remarkably low child mortality and fertility, probably due to data quality (few observations).

This study uses data from household surveys that are available for a large number of countries and repeated fairly regularly, on average every three to five years. Its aim is to quantify relevant demographic indicators, fertility and child mortality, using a Bayesian hierarchical spatiotemporal model. Looking at the constructed variables at cluster level (figure 1a and 1b), it is evident that there is sizable subnational variation in both fertility and child mortality. Results from this study suggest that a detailed and accurate depiction of important demographic indicators can be efficiently derived from survey data and mapped at fine spatial resolution, which in turn can be promptly summarized for policy intervention or resource allocation purposes. The approach, coupled with the use of available geolocated survey data, helps to improve the understanding of demographic dynamics. Indeed, in these countries censuses are often difficult to conduct, and the resulting data are often unreliable or out of date because of rapid population change.

Geo-located household surveys

The demographic rates investigated in these analyses were computed from data of the DHS Program, which has collected and analysed data on populations in low- and middle-income countries since the mid-90s at regular intervals of about four years. DHS household surveys adopt a multistage cluster sampling design, where the primary sampling units (PSUs) consist of pre-existing geographic areas known as census enumeration areas (EAs).

The boundaries of the EAs are defined by the country's census bureau, as is the urban or rural status of each cluster. DHS also provides georeferenced datasets, which can be linked to individual and household records through unique cluster identifiers. To ensure that respondent confidentiality is maintained, clusters are randomly displaced so that urban clusters contain between 0 and 2 kilometres of positional error, while rural clusters contain between 0 and 5 kilometres of positional error, with a further 1% of the rural clusters displaced between 0 and 10 kilometres. The displacement is restricted so that the points stay within the country and within the DHS survey region. In surveys released since 2009 the displacement is restricted to the country's second administrative level where possible. Because displacement affects the physical location of the data, it is necessary to account for displacement when undertaking spatial modelling using DHS surveys.

Maps of under-five mortality and total fertility rates were constructed for each country. These indicators were estimated from the DHS data following survey-specific methodologies, indirect for mortality estimation and direct for fertility.

Total fertility rates

The total fertility rates (TFR) are computed by applying a direct survey estimation method to DHS birth history data, combining information from the past three completed calendar years with the months of exposure in the year of interview, in a manner similar to that described by Schoumaker (2013). We aggregate occurrences

and exposures into five-year age groups of women (15-49) to avoid erratic rates at single ages. Also, clusters with zero women in any age group are omitted from the modelling, to further ensure rate stability. Predicting TFR is important to understand where, for instance, maternal care, vaccination or family planning campaigns could be concentrated, as in developing countries there is often an unmet need for contraception (Bongaarts 2002).

Under-five mortality rates

Under-five mortality rates are computed using the DHS preceding birth technique as described by Schoumaker (2013). Similarly to TFR, U5M is not a single-year measure: it does not refer to a specific year but rather to an average over the past 3 years plus the year of interview. U5M is a measure of children's health and captures a number of factors that may influence it. Indeed, in countries with poor health systems, low vaccination coverage, poor maternal health and education, or poor infrastructure and accessibility, U5M tends to be considerably higher. The three countries in this study present substantial differences, as Nigeria is considerably worse off than Bangladesh and Nepal.
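
As an illustration of the direct TFR estimate described under "Total fertility rates", the sketch below sums age-specific fertility rates over the seven five-year age groups and multiplies by the group width. It is a minimal sketch under stated assumptions, not the tfr2 routine or the authors' code: the function name, the input layout (one births count and one woman-years exposure per age group) and the choice to return None for a cluster with zero exposure in any group are all illustrative.

```python
from typing import Optional, Sequence

# Five-year age groups of women aged 15-49, as in the text.
AGE_GROUPS = ("15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49")
GROUP_WIDTH_YEARS = 5


def total_fertility_rate(
    births: Sequence[float], woman_years: Sequence[float]
) -> Optional[float]:
    """Direct TFR for one cluster: group width times the sum of
    age-specific fertility rates (births / woman-years of exposure).

    Returns None when any age group has no exposure, mirroring the rule
    that such clusters are omitted (illustrative assumption about how
    the omission is signalled).
    """
    if len(births) != len(AGE_GROUPS) or len(woman_years) != len(AGE_GROUPS):
        raise ValueError("expected one value per five-year age group")
    if any(exposure <= 0 for exposure in woman_years):
        return None
    rates = [b / e for b, e in zip(births, woman_years)]
    return GROUP_WIDTH_YEARS * sum(rates)


# Hypothetical cluster: births and woman-years over the reference period.
example_tfr = total_fertility_rate(
    births=[2, 6, 5, 4, 2, 1, 0],
    woman_years=[20, 30, 25, 20, 20, 10, 10],
)
```

Aggregating to five-year groups before dividing is what keeps the rates stable at cluster level, where single-age exposures would often be close to zero.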

Defining a suite of covariates for predicting health and demographic indicators at fine spatial resolution

The idea behind spatial modelling is that a variable under study behaves similarly in locations that are close to each other: everything is related to everything else, but close locations are more related than distant locations (Tobler, 1970). In our model, we make use of this spatial relationship in order to model and predict mortality and fertility at fine grid scale. The advantage of this technique is that it employs high-resolution geographic data, with indexes of health, wellbeing, environmental, and geographic conditions, to describe the variables at survey locations and predict them across the rest of the country.

Following the literature, a suite of physical (topography, climate, land cover, etc.) and socio-economic (population density, urbanization, wealth, ethnicity) covariate grids was selected from existing publicly available datasets, focusing on factors that have been shown to correlate with the modelled indicators and that cover the selected countries. For each country, we assemble a unique set of covariates. Because the datasets differ in spatial resolution, projection system, format and extent, algorithms were developed and applied to convert all the layers to a common 1x1 km gridded dataset suitable for mapping. For privacy purposes, DHS employs cluster displacement. Therefore, we determine the mean value of each variable in a buffer of two kilometres from the cluster location for urban areas and five kilometres for rural areas, as recommended when applying a linear modelling approach.
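
The buffer averaging described above can be sketched as follows. This is an illustrative sketch only, assuming the covariate is already on a regular 1 km grid held as a list of rows, that the cluster location is given as a (row, column) cell index, and that cells flagged as NaN are missing; the function and constant names are hypothetical and the authors' actual GIS workflow is not described in the text.

```python
import math
from typing import Sequence

URBAN_BUFFER_KM = 2.0  # buffer radius for urban clusters (from the text)
RURAL_BUFFER_KM = 5.0  # buffer radius for rural clusters (from the text)


def buffer_mean(
    grid: Sequence[Sequence[float]],
    row: int,
    col: int,
    is_urban: bool,
    cell_km: float = 1.0,
) -> float:
    """Mean of the covariate over all grid cells whose centre lies within
    the displacement buffer around the cluster cell, skipping NaN cells.
    Returns NaN if no valid cell falls inside the buffer."""
    radius_km = URBAN_BUFFER_KM if is_urban else RURAL_BUFFER_KM
    values = []
    for r, grid_row in enumerate(grid):
        for c, value in enumerate(grid_row):
            if math.isnan(value):
                continue
            # Distance between cell centres, converted from cells to km.
            if math.hypot(r - row, c - col) * cell_km <= radius_km:
                values.append(value)
    return sum(values) / len(values) if values else math.nan


# Hypothetical 11 x 11 km covariate grid with a single non-zero cell.
demo_grid = [[0.0] * 11 for _ in range(11)]
demo_grid[5][5] = 13.0
urban_value = buffer_mean(demo_grid, 5, 5, is_urban=True)
rural_value = buffer_mean(demo_grid, 5, 5, is_urban=False)
```

Averaging over the buffer rather than reading the single cell under the displaced point means the covariate value attached to a cluster is less sensitive to where, within the displacement radius, the true location lies.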

Selection of geospatial covariates

The selection of relevant covariates is pivotal to maximize modelling outcomes; therefore, in our opinion, covariate selection cannot be discretionary but needs to be founded on statistical evidence, especially when starting from a large set of covariates. Indeed, including too few variables could lead to an underperforming predictive model, while including too many could cause problems of multicollinearity or overfitting. We select the best-performing covariates within the chosen modelling architecture by means of a sensitivity analysis using a jack-knife approach, which leads to the best set of explanatory variables.

To make sure the model does not suffer from multicollinearity, which negatively impacts the quality, stability and interpretability of the model, we compute the Variance Inflation Factor (VIF): the larger the VIF score, the greater the multicollinearity. VIF measures how much the variance of the estimated regression coefficients is inflated compared to when the predictor variables are not linearly related. Multicollinearity was tested among the independent variables, and we remove variables with a VIF higher than 3.

Covariates

We aim to build a spatial model that employs a combination of covariates to predict child mortality and fertility at a fine grid resolution. To do so, we employ a two-stage process to obtain a model combination that best predicts the two indicators. First, covariates were selected through a non-spatial generalized linear

regression model (GLM) to identify suitable predictors. Then, the selected covariates were used in the Bayesian approach implemented via INLA. The optimal model was selected by comparing the BIC score across various models. The first model, aimed at predicting U5MR, includes three variables: enhanced vegetation index (MODIS EVI), an economic indicator (G-Econ database gross cell product), and urbanization. The second model, used to predict fertility, employs night-time lights, an economic indicator (G-Econ database gross cell product), and urbanization.

Results

In the last few decades the availability of geolocated spatio-temporal data has increased substantially, mainly due to advances in statistical modelling and in GPS-referenced data collection. We are interested in mapping continuous spatial variables, which are measured only at a finite set of data points distributed unevenly over the country, and in predicting their values at unobserved locations to eventually obtain fine grid maps of such variables (100m per 100m). To model and predict U5MR and TFR we employ Bayesian modelling tools, which have the great advantage of taking uncertainty in the estimates into account and can easily accommodate complex models and effects (such as spatial and temporal ones). Most Bayesian methods employ Markov chain Monte Carlo (MCMC) algorithms (for instance sampling algorithms such as the Gibbs sampler or Metropolis-Hastings), whose main limitation is the computational burden when they are applied to large datasets. This weakness is of considerable importance, as the availability of large datasets with high spatial and temporal resolutions offers a unique chance for scientists

to explore spatio-temporal structures, unthinkable just a couple of decades ago. To overcome this issue we employ the integrated nested Laplace approximation (INLA), a deterministic algorithm for Bayesian inference, especially designed for latent Gaussian models, which has the advantage of being considerably faster computationally. Moreover, INLA allows for comparison across models and provides various predictive measures to further improve model estimation. We model two outcome variables: the under-five mortality rate (U5MR) in the first model, and the total fertility rate (TFR) in the second, which are both unevenly distributed and display significant spatial clustering. In this extended abstract we present prediction results for U5MR and TFR for Nigeria.

The methodology we apply makes use of data at known cluster centroid locations (geo-referenced using GPS) and the survey date, together with a designated set of available covariates that are employed to predict the two demographic measures. The data and covariates are used in a Bayesian hierarchical spatial model, implemented through a stochastic partial differential equations (SPDE) approach with INLA for inference, to produce continuous maps of the estimated U5MR and TFR in each 100m per 100m grid square in Nigeria. The SPDE approach (with a stationary Matérn covariance) consists in performing computations using a Gaussian Markov Random Field, characterized by sparse precision matrices, which in turn allow computationally efficient numerical methods to be used.

Validation is implemented in two steps. First, internal model validation was implemented using a leave-one-out cross-validation approach. Second, an external

model validation procedure was applied on a 30% subset of the data. Predictions were made at validation locations and compared to the observed data.

1. Nepal

Mortality Fertility Distance from health facilities Distance from health facilities Ethnicity Madhesi Ethnicity Madhesi Ethnicity Muslims GRUMP urban extent EVI Nightlight GHLS 2km precipitation GRUMP urban extent Protected areas 20km Protected areas 20km Roads distance

                      Mortality    Fertility
Explained variance    0.5194118    0.711167
Mean MSE              0.001839     5.64
Trivial model         0.05528      3.17


2. Nigeria

Mortality Fertility Gross cell product Elevation Roads distance Gross cell product MODIS Modis Nightlight Nightlights Percent urban Rivers distance Landcover Roads distance Landcover

                      Mortality    Fertility
Explained variance    0.5149       0.5566
Mean MSE              0.0065       0.3739
Trivial model         0.1218       5.6429


3. Bangladesh

Mortality Fertility Distance from waterways Goats Elevation WPop Bio1 EVI GeoEPVR1 Aridity Percent urban Protected areas

                      Mortality    Fertility
Explained variance    0.1636       0.0336
Mean MSE              0.0017       1.0507
Trivial model         0.0499       1.1091
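
The external validation step and the three summary figures reported per country can be illustrated with the sketch below. The text does not define how "Explained variance" or "Trivial model" were computed, so the definitions used here are assumptions: the trivial model predicts the mean of the training observations everywhere, and explained variance is taken as one minus the ratio of the prediction MSE to the variance of the held-out observations. The function names and the fixed random seed are likewise illustrative.

```python
import random
from statistics import fmean
from typing import List, Sequence, Tuple


def split_holdout(
    n: int, holdout_fraction: float = 0.3, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Randomly split cluster indices into (training, validation) sets,
    holding out roughly 30% of clusters for external validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_holdout = round(n * holdout_fraction)
    return indices[n_holdout:], indices[:n_holdout]


def mse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error between observed and predicted values."""
    return fmean((o - p) ** 2 for o, p in zip(observed, predicted))


def trivial_mse(observed: Sequence[float], training_mean: float) -> float:
    """MSE of a trivial model that predicts the training mean everywhere
    (assumed definition of the 'Trivial model' row)."""
    return mse(observed, [training_mean] * len(observed))


def explained_variance(
    observed: Sequence[float], predicted: Sequence[float]
) -> float:
    """One minus MSE over the variance of the observations
    (assumed definition of the 'Explained variance' row)."""
    mean_obs = fmean(observed)
    total = fmean((o - mean_obs) ** 2 for o in observed)
    return 1.0 - mse(observed, predicted) / total


# Hypothetical held-out observations and model predictions.
held_out = [1.0, 2.0, 3.0, 4.0]
model_pred = [1.0, 2.0, 3.0, 5.0]
train_idx, valid_idx = split_holdout(10)
```

Under these definitions, a model is only useful where its mean MSE is below the trivial-model MSE on the same held-out clusters, which is the comparison the two rows invite.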


References

Bongaarts, John. "The end of the fertility transition in the developed world." Population and Development Review 28.3 (2002): 419-443.

Rue, Havard, et al. "INLA: Functions which allow to perform full Bayesian analysis of latent Gaussian models using Integrated Nested Laplace Approximaxion." R package version 0.0-1404466487, URL http://www.R-INLA.org (2014).

Schoumaker, Bruno. "A Stata module for computing fertility rates and TFRs from birth histories: tfr2." Demographic Research 28 (2013): 1093-1144.

Tobler, Waldo R. "A computer movie simulating urban growth in the Detroit region." Economic Geography 46.sup1 (1970): 234-240.