Improving the quality of national survey data in South Africa through digital data collection
Mahier Hattas1, Johan Breytenbach2
Abstract
Digital data collection (DDC) offers national statistical organizations (NSOs) in Africa possible, albeit partial, solutions to several current performance and profitability concerns. Perceived potential benefits of DDC methods over paper-based collection methods include increased speed of data collection, increased data accuracy, timeous data availability, higher data quality, increased security of data, and lower costs of data collection. Secondary benefits may include better-informed policies from governmental departments reliant on NSOs for strategic data. This article presents data from two iterations of a large-scale DDC implementation in South Africa, in which aspects related to collection speed, data accuracy, availability, quality, and costs of data collection receive attention. The implications of this research will affect the standard generic statistical value chain for the collection of household surveys. Findings include, inter alia: poor initial speed of DDC interviews followed by a significant speed increase as interviewers master DDC technology and skills; the importance of effective training within DDC processes; evidence of higher accuracy in geographic data capturing; real-time availability of data; a shorter data cleaning and release process; and higher initial costs of mobile devices.
Keywords: digital data collection, data quality, secure data
Introduction
In planning local economic development, the South African government and private sector rely on the national statistical organization (NSO), Stats SA3, for high-quality, digitised data; these include censuses and demographic and economic data series on which to build national development strategies. In 2015/16, Stats SA, like most NSOs in Africa, relied on manual, paper-based data collection methods in survey work to produce the majority of the data/statistics that inform local economic development policy. Paper-based methods involve time-intensive and expensive processes, including the printing of paper questionnaires, manual and/or scanned data entry, and processing of the collected data. These processes not only delay the production of data for decision making, but also require more personnel than comparable digital data collection and processing, thereby contributing to high costs. Moreover, manual errors are hard to avoid, compromising data quality. In comparison to DDC, the manual, paper-based methods used by NSOs can be summarised as slower, more expensive, and more complex to manage from a quality assurance perspective. In order to keep abreast of ever-changing technologies, current
1 Department of Information Systems, University of the Western Cape, South Africa.
Corresponding author: mhattas@gmail.com
2 Department of Information Systems, University of the Western Cape, South Africa.
3 http://www.statssa.gov.za
methodologies, processes, systems and quality standards such as the South African Statistical Quality Assessment Framework (SASQAF) (8) should be reviewed for potential gains in social efficiencies, increasing levels of quality and productivity, and enhancing the economic potential of the developmental objectives of a country such as South Africa. During 2015 and 2016 a wide range of new digital data collection (DDC) methods and sources became available to Stats SA, primarily as a result of recent improvements in technical hardware (1) (lower cost and better technical features of mobile devices) and connectivity in South Africa (2) (mobile ownership and network coverage). Since 2015 such mobile technologies and DDC processes have been deployed and tested by Stats SA for household survey data collection, to achieve faster data availability on a selection of key indicators in near real time. This paper provides early insight into an iterative action research approach adopted by the organisation for Digital Data Collection (DDC) implementation. The key concerns that received attention were the speed of data collection, the timeous availability of data, the costs of data collection, and data quality. Two 2015/2016 DDC pilot projects are discussed and analysed. The latter pilot, in particular, produced valuable guidelines for future DDC implementations by African NSOs.
Background
The current trend of African NSOs transforming their manual processes towards DDC is motivated by the need to ensure (i) the maximisation of profit, precision, and accuracy of data collection, (ii) the reduction of non-response and item non-response in surveys, (iii) the security of data (viz. ensuring confidentiality and integrity), and (iv) an increase in the overall quality of collection processes (3). Community surveys have typically involved capturing data using paper questionnaires at the household level and then sending completed questionnaires to a data processing centre as a basis for data entry and data cleaning. Technological advances and economic trends have motivated hand-held computers or personal digital assistants (PDAs) as a viable alternative to manual data collection (4). Proponents argue that direct data capture at the point of interview can reduce error rates and speed up the data cleaning process, and hence make databases available for analysis significantly sooner (4).
Literature
Stats SA, by virtue of its annual household surveys (including Community Surveys) and national censuses, requires cheap, versatile mobile technologies for use by in-person field interviewers, i.e. personal interviews carried out with enumerators recording responses directly on smartphones or data-connected tablets (Computer-Assisted Personal Interviewing, CAPI). CAPI can drastically improve the speed and quality with which survey data is collected. With the increasing availability of mobile data networks and internet access, instant transmission of collected data to cloud-based servers is increasingly viable and circumvents the tedious traditional data entry process. Using integrated cloud services also makes data available for processing and analysis in near real time. Computer-assisted data collection (CADAC) includes both computer-assisted interviewing and online data collection, with the latter eliminating the need for an interviewer. From the 1970s,
computer-assisted telephone interviewing (CATI) was followed by computer-assisted personal interviewing (CAPI) and computer-assisted self-interviewing (CASI), as a result of the development of more affordable, more portable computers. Computers considerably reduced the amount of work associated with data collection by automating some phases (e.g. data entry, coding) and omitting others entirely (e.g. printing and posting back the questionnaire) (5). Within an African context, 2016 witnessed a pledge by NSOs that the 2020 round of African national household censuses would be conducted using DDC (6). One of the primary lessons from previous DDC pilots in Africa is that technology alone is not sufficient to meet all project objectives: even a free data collection platform does not guarantee that the right data will be collected effectively (7). Maintaining a team that can design the collection efforts, implement them accordingly, and evaluate the data is as important as the technology; technology such as DDC is an enabler of sound processes. Training (in both technology and processes) is an important component of collecting data through ICT tools (7). Software providers such as Survey Solutions from the World Bank4, or the privately owned iFormBuilder5, work directly with client NSOs to help build their surveys until the clients are capable of doing it on their own. On African farms and in rural areas, field enumerators and farmers using new technologies need additional training and support. With proper instruction, most organisations have found that even poor, uneducated farmers are capable of picking up the skills required for DDC (7). The most complex technology or platform is not always necessary to solve a problem. For example, local lead farmers in USAID's Feed the Future project collect basic data from the community's farmers with paper and pencil.
Field advisors from the agricultural associations then check the data and upload it onto an Excel sheet, which is then shared with millers (buyers) in real time via free Dropbox accounts (7). Another important consideration is the possibility of utilising a technology that farmers already use, such as smartphones. Introducing a new technology will increase costs and reduce sustainability. Other environmental factors related to technology, such as access to electricity and device theft, also require consideration before investment by NSOs.
Quality framework
Stats SA applies the South African Statistical Quality Assessment Framework (SASQAF) to all Pen-and-Paper Interviewing (PAPI) surveys. SASQAF provides best and recommended practices, processes and standard indicators to consider when collecting household surveys of good quality (8). For DDC it is imperative to conform to a standard similar to SASQAF; however, such a standard has not yet been developed. In Rwanda's national strategy for the development of statistics, 2014/15-2018/19, the use of DDC appears under the first objective, namely to strengthen the civil registration system, administrative records, surveys and censuses and other sources of data, such that the anticipated surge in data demand can be met more quickly without compromising statistical quality (9); however, DDC has not been defined explicitly by any of the African NSOs. DDC within Stats SA is currently still in its infancy, and
4 https://solutions.worldbank.org/account/login?ReturnUrl=%2f
5 https://www.iformbuilder.com/
developing a quality framework is critical to the long-term success of DDC. Within Africa, SASQAF is respected and recognized as an enabler of South African democracy.
Research problem
African NSOs require guidelines for: (a) implementing faster, cost-effective, context-sensitive, mobile-based DDC processes, and (b) ensuring the quality of data collected during DDC processes. This paper forms part of a longitudinal study that aims to inform the quality assurance frameworks of African NSOs so that they accommodate new technologies and include guidelines for quality assurance during DDC implementations.
Research design
The research study of which this paper forms an introductory part follows several PAPI to DDC conversion projects of a South African NSO, treating each of these conversion projects as an iteration in a greater action research process, aimed at transforming the NSO’s entire data collection process into a DDC process. Each iteration covers a complete medium to large scale data collection process in South Africa. Lessons learnt during an iteration influence the planning phase of the following iteration directly. Each iteration can also be seen as a supporting case within the bigger research case study – the case of an African NSO adopting DDC methods for data quality purposes. According to Yin (10), case study research can be defined as follows: “A case study is an empirical inquiry that investigates a contemporary phenomenon (the “case”) in depth and within its real-world context especially when the boundaries between phenomenon and context may not be clearly evident.” Could a detailed analysis of multiple cases (iterations) of digital data collection methods in South Africa – a process in which the boundaries between context and phenomenon are not clear - provide us with useful guidelines for future DDC implementations and help us to enhance the Stats SA QA framework to accommodate DDC? After our involvement in the first two of these iterations, the authors believe this hands-on, iterative research approach can address the mentioned research problem. Our study will focus on the “digital data collection process” as cases, or units of analysis, with the case study approach allowing us to “retain a holistic and real-world perspective” (10) while at the same time being aware of the necessity to set clear boundaries to the analysis. Our real-world perspective – a realization becoming clearer with each iteration – is the direct impact that faster, cheaper, high quality data collection can have on local economic development policy (and delivery).
Defining DDC
Digital Data Collection (DDC) for household surveys is also referred to as mobile device/phone data collection. Collecting and reporting on household surveys using DDC aims to reduce the time between the reporting period and access to the data for analysis from a database (11). Before presenting the first two DDC iterations done by Stats SA and discussing lessons learnt, we define exactly what is meant by DDC within African NSOs and describe the details of a typical DDC implementation. We define DDC by means of an example: the CAPI project6, which started early in 2012. To date, 135 surveys have been conducted in 49 countries, mainly developing countries.
Introducing CAPI dramatically reduces the time lag between data collection and data analysis. Since manual coding of responses recorded with pen and paper is no longer necessary, and data validation is done at the time of data collection, the information is ready for statistical analysis as soon as surveying is completed. Using CAPI also reduces the costs of conducting surveys. While the initial outlay for the purchase of digital devices can be substantial, there are offsetting reductions in costs related to printing, logistics, etc., so overall CAPI is considerably cheaper than Pen-and-Paper Interviewing (PAPI). CAPI also allows for real-time quality monitoring. The system indicates to the fieldworker if any questions have not been answered. Questionnaires that do not meet minimum standards can be rejected back to the device for the fieldworker to correct the errors or obtain additional information from the household. The system also records every action a fieldworker performs on the device, which allows for additional quality checks. For example, in one survey it was noted that some questionnaires were completed within 30 minutes, whereas the average completion time was 90 minutes. These questionnaires could then be checked for quality issues and were not included in the final dataset. CAPI was first used for the 2015 Citizen Satisfaction Survey (CSS) conducted in KwaZulu-Natal. The foundation laid by the CSS 2015 provides the base upon which digital data collection will be expanded to the CS 2016. The following section describes the standardised process of CAPI implementation for the CSS. In August 2015, a Western Cape (WC) pilot was conducted using a CAPI implementation. On 28 September 2015, the two-week district training for the KZN CSS began.
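The paradata check mentioned above, flagging questionnaires completed far faster than the average, can be sketched as follows. The half-of-mean threshold and the example durations are illustrative assumptions, not part of Survey Solutions itself.

```python
# Flag questionnaires whose completion time is suspiciously short relative
# to the mean, mirroring the CAPI quality check described above.
# The 0.5 threshold (half the mean duration) is an illustrative assumption.

def flag_fast_interviews(durations_min, threshold_ratio=0.5):
    """Return indices of interviews faster than threshold_ratio * mean."""
    mean = sum(durations_min) / len(durations_min)
    cutoff = threshold_ratio * mean
    return [i for i, d in enumerate(durations_min) if d < cutoff]

durations = [85, 95, 30, 90, 88, 28, 92]   # minutes per questionnaire
suspect = flag_fast_interviews(durations)  # these would be reviewed and
                                           # possibly excluded from the dataset
```

Flagged interviews are not automatically discarded; as in the survey described above, they are checked for quality issues first.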
The purpose of the CSS was to measure the perceptions of the province's residents; it covered a sample of over 20 000 dwelling units (DUs) in KZN and employed roughly 208 interviewers. The process flow (Figure 1) for the WC Pilot and the CSS was as follows:
6 See: Survey Solutions
http://www.worldbank.org/content/dam/Worldbank/Feature%20Story/japan/pdf/event/2015/0 60815_Michael_Lokshin.pdf
1. Register on Survey Solutions Designer (solutions.worldbank.org)
2. Design the questionnaire on Survey Solutions Designer (solutions.worldbank.org)
3. Administrator logs on to HQ (capi.statssa.gov.za)
4. Create user accounts for supervisors (capi.statssa.gov.za)
5. Create user accounts for fieldworkers (FWs) under each supervisor (capi.statssa.gov.za)
6. Define and upload the questionnaire and sample (capi.statssa.gov.za)
7. Create the sample for supervisors (capi.statssa.gov.za)
8. Supervisors assign workloads to FWs (capi.statssa.gov.za)
9. FW logs on to the Interviewer app
10. FW syncs via 3G (to receive work allocation)
11. FW conducts enumeration
12. FW syncs via 3G (to send completed questionnaires)
13. Supervisor checks completed questionnaires (rejected questionnaires return to the FW)
14. Administrator accepts/rejects questionnaires (accepted questionnaires enter stored data)
15. Observer views completed questionnaires (capi.statssa.gov.za)
16. Export/download data (capi.statssa.gov.za)
Figure 1: CAPI DDC Process Flow
The CAPI process provides tools for managing the human resources responsible for data collection—creating, editing, and deleting user accounts for supervisors, interviewers, headquarters (administrator), and observers (administrator). The high-level data flow (Figure 2) below describes the flow of questionnaires between administrative users and client users - devices used to capture data.
Figure 2: CAPI Data Flow
In both case studies used as data for this article, the following three general rules apply to the DDC process and explain why certain terms are used in the data presentation section that follows. First, team supervisors, having received completed questionnaires, review them to confirm that all questions are answered and that answers are accurate, coherent, and plausible. After reviewing these completed assignments, team supervisors either approve or reject them. Second, when a team supervisor approves a completed assignment received from an interviewer, the assignment is sent to an administrative user; if a team supervisor rejects it, the assignment is returned to the interviewer initially responsible for completing it. Third, on receiving a rejected assignment, the interviewer must either correct it or provide explanatory notes on strange or implausible answers. When the assignments are corrected, the interviewer sends them back to the team supervisor for approval or rejection, a process that continues until the assignments are completed to the highest level of quality according to the team supervisor.
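The three rules above amount to a small state machine for each assignment. A minimal sketch follows; the status names are our own labels, not the platform's exact terminology.

```python
# Minimal state machine for the supervisor review cycle described above.
# Status names are illustrative; Survey Solutions uses its own terminology.

COMPLETED = "completed"
APPROVED_BY_SUPERVISOR = "approved_by_supervisor"
REJECTED = "rejected"

def supervisor_review(status, answers_ok):
    """Rules 1 and 2: a supervisor approves plausible work, rejects the rest."""
    if status != COMPLETED:
        raise ValueError("only completed assignments are reviewed")
    return APPROVED_BY_SUPERVISOR if answers_ok else REJECTED

def interviewer_correct(status):
    """Rule 3: a rejected assignment becomes 'completed' again after correction."""
    if status != REJECTED:
        raise ValueError("only rejected assignments are corrected")
    return COMPLETED
```

An approved assignment then moves on to the administrative user, while a rejected one loops between `interviewer_correct` and `supervisor_review` until it is approved.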
DDC supporting case / Iteration 1: Western Cape DDC pilot
The World Bank's Survey Solutions platform was set up for a pilot in the Western Cape in August 2015. Speed, cost and the accuracy or quality of completed information were measured. A sample of 100 geo-located points or households, spread across the Western Cape Province, was selected. Introducing Survey Solutions to Stats SA's staff warranted a test in which a basic household questionnaire design was translated into a dynamic digital form capable of capturing data securely. The resultant processes were tested and are as follows. Primary objectives of the WC DDC pilot: the intention was to run a CAPI fieldwork test around the questionnaire, data transmission and other downstream CAPI processes. The test objectives were aligned with CAPI standards, testing the following CAPI capabilities (Table 1):
Table 1: WC CAPI Test Descriptors
CASE NO: TEST [LOCATION]
01 USER PROFILE CREATION [OFFICE]
02 SURVEY SAMPLE FILE UPLOAD [OFFICE]
03 ASSIGNMENT PLANNING [OFFICE]
04 SAMPLE LOCATION VERIFICATION [IN FIELD]
05 DATA COLLECTION [IN FIELD]
06 CAPTURED DATA SYNCHRONIZATION [IN FIELD]
07 REAL TIME REPORTING [OFFICE]
08 DATA EXTRACTION [OFFICE]
Test cases 04-06 were conducted by interviewers equipped with DDC devices in the field. 88% of all completed questionnaires were submitted to the CAPI server. The zoomed-in maps (Figures 3-5) below display satellite imagery of the collected information.
Figure 3: WC Pilot Map Report
Figure 4: WC Pilot Single Sample Point
Figure 4 displays the red icon indicating a single location where the sample point (or dwelling) resides. Enumerators would conduct data collection at this location.
Figure 5: WC Pilot Sample Status
Figure 5, in addition to Figure 4 above, displays the current status of the sampled dwelling, viz. "Approved by Supervisor". The outcome of this pilot study was fairly successful in terms of CAPI standards. Although the content of the information captured was not analysed, this first iteration provided useful variables to consider during data collection phases, e.g. the length of time required to complete a questionnaire, which may result in respondent non-contact. Security concerns around navigation were highlighted by members in the field, and a correlation exists between the number of persons captured and the speed of the device during data collection. Secondary objectives, including navigation and methodological issues, were noted by the teams in their respective cases. Google My Maps could at least navigate the fieldworker towards a sample point, viz. to the nearest named road. This pilot study focused primarily on safe urban areas in the Western Cape; questions regarding the viability of CAPI DDC processes in rural areas remained.
DDC supporting case / Iteration 2: KZN DDC CSS
The KwaZulu-Natal (KZN) Citizen Satisfaction Survey (CSS) was conducted in the second half of 2015, and its results were released on 4 February 2016. On 12 October 2015 the province of KZN commenced digital data collection. Fieldworkers were trained for a period of one month, covering both provincial and district training. Fieldwork covered 11 districts in the province, visiting approximately 20 800 sampled dwelling units in a six-week period. The interviewers (Figures 6 and 7) were well received by both gatekeepers and the general community. The questionnaire was loaded onto 7-inch tablets. Benefits of these devices included real-time quality checking of incoming data, rapid production and analysis of data, and accurate identification of sampled dwelling units through an online/offline navigation app installed on each device. The way the survey was conducted represented a major shift in the organisation's operational processes, seeing the implementation of DDC and the use of a geo-referenced dwelling frame for sampling purposes. These initiatives, amongst others, resulted in the survey being conducted in
less than six months from planning to release of results. While these efficiencies were realised, there are many more areas of improvement that the organisation still needs to attend to in order to consolidate the gains realised from moving to technology-based survey processes (6).
Figure 6: Fieldwork A
Figure 7: Fieldwork B
Data presentation and findings
The success of the two presented cases, two iterations of DDC by an African NSO within two different contexts, one urban and one rural, was rooted in high-quality training.
DDC training
Training builds better communication skills, develops untapped talent, ensures consistent quality, provides greater focus, produces more effective and productive efforts, and clarifies the concepts of the survey/census processes. Lessons learnt from past surveys have demonstrated that training must be timely, targeted and specific to the needs of the programme. During the second case, the KZN CSS 2015, training covered technical, practical and theoretical components (e.g. how to use the device for enumeration and reporting).
Speed of data collection
The first two iterations of DDC showed a clear increase in data collection speed. The average time of completion of a questionnaire for a 4-person household was approximately 1 hour. The initial interviews took longer than PAPI; however, over the period of the pilot, this average time decreased until DDC was 20% faster than PAPI.
Availability of data
DDC using Survey Solutions depends on a cellular network for transmitting data to the central server; however, interviewers can continue with data collection without being connected to the cellular network. Once signal is found, data can be transmitted to the central server. Internet access in South Africa is rapidly increasing (12), as is cellular network coverage. During both DDC implementations, collected data was available in real time. Figure 8 below shows a percentage breakdown of internet signal availability during both the WC 2015 and KZN CSS 2015 iterations.
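The offline-first behaviour described above, collecting without signal and transmitting once connectivity returns, can be sketched as a simple outbox queue. The class, method and field names below are illustrative, not Survey Solutions APIs.

```python
# Sketch of an offline-first outbox: completed questionnaires queue locally
# on the device and are flushed to the central server whenever a cellular
# network is available. `send` stands in for a real upload call.

class Outbox:
    def __init__(self, send):
        self.pending = []      # questionnaires awaiting upload
        self.send = send       # callable that uploads one questionnaire

    def complete(self, questionnaire):
        """Enumeration finishes: store the questionnaire locally."""
        self.pending.append(questionnaire)

    def sync(self, has_signal):
        """Flush the queue only when connectivity is available; return count sent."""
        if not has_signal:
            return 0
        sent = 0
        while self.pending:
            self.send(self.pending.pop(0))
            sent += 1
        return sent

uploaded = []
box = Outbox(uploaded.append)
box.complete({"du": "KZN-001"})
box.complete({"du": "KZN-002"})
box.sync(has_signal=False)   # no network: nothing leaves the device
box.sync(has_signal=True)    # signal found: both records transmitted
```

The key property is that fieldwork never blocks on connectivity; transmission simply resumes at the next successful sync.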
Figure 8: Internet signal availability
Location accuracy
Arguably the largest barrier to overcome in any national household survey is the accurate capturing and storage of location data for individual dwellings (Figure 9). It is in this area that DDC devices with GPS functionality bring a significant improvement over PAPI methods. As is evident from Figures 3, 4 and 5, DDC allows for accurate geocoding of individual dwellings, making future survey work faster and more accurate.
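A location check of the kind GPS-enabled devices make possible can be sketched with the haversine formula. The 100 m tolerance is an illustrative assumption, not a Stats SA standard.

```python
import math

# Verify that a captured GPS fix lies within a tolerance of the sampled
# dwelling's known coordinates, using the haversine great-circle distance.
# The 100 m tolerance below is an illustrative assumption.

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def at_sample_point(captured, sampled, tolerance_m=100.0):
    """True if the captured fix is within tolerance of the sampled dwelling."""
    return haversine_m(*captured, *sampled) <= tolerance_m
```

A check like this lets a supervisor confirm, from the geocoded record alone, that the correct sampled dwelling was visited.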
Internet availability during DDC fieldwork (signal type as a percentage of connections): 3G: 89%; 4G: 1%; EDGE: 3%; GPRS: 6%; H+: 1%.
Figure 9: Accuracy of DDC location capturing
Cost saving of DDC
Initially the cost of procuring devices is higher than for PAPI; however, the downstream PAPI processes, such as printing, forward and reverse logistics, couriering, scanning and processing of data, outweigh the costs of DDC. The summary of costs for the WC Pilot (Table 2) indicates that the devices (Lenovo tablets) were the primary cost driver in the test (R15 000), since the other items were freely available owing to the World Bank setup and staff volunteering in the process.
Table 2: WC Pilot Cost Breakdown
Item / DDC WC Pilot CAPI cost:
a) Staff required: 10 interviewers (permanent staff volunteers)
b) Hardware: 10 Lenovo tablets (R1 500 each); 1 central server (World Bank owned)
c) CAPI support: questionnaire development and training (R0.00; permanent staff completed these tasks)
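The trade-off in Table 2, a high upfront device cost against lower per-questionnaire costs, can be expressed as a simple break-even calculation. Only the R1 500 device price comes from the pilot; the per-questionnaire figures below are hypothetical illustrations.

```python
# Break-even sketch for DDC vs PAPI: devices are a fixed upfront cost,
# while PAPI pays per questionnaire for printing, courier and data capture.
# Per-questionnaire costs are hypothetical; only the R1 500 device price
# comes from the WC pilot (Table 2).

def total_cost_ddc(n_questionnaires, n_devices, device_price=1500.0,
                   per_q_cost=2.0):
    """Fixed device outlay plus a small per-questionnaire running cost."""
    return n_devices * device_price + n_questionnaires * per_q_cost

def total_cost_papi(n_questionnaires, per_q_cost=12.0):
    """Printing + courier + manual capture per questionnaire (hypothetical)."""
    return n_questionnaires * per_q_cost

# With 10 devices, DDC overtakes PAPI once enough questionnaires are done.
n = 2000
ddc, papi = total_cost_ddc(n, 10), total_cost_papi(n)
```

Under these assumed figures the break-even point is 1 500 questionnaires; past it, every additional questionnaire widens DDC's cost advantage, which is consistent with the downstream savings claimed above.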
Discussion
In order to present a meaningful answer to our research question, we discuss findings in the following categories related to the research question: (i) speed of data collection, (ii) availability of data, (iii) data security, (iv) data quality, and (v) cost of data collection.
Speed of data collection
Collecting information from an average 3-person household initially takes longer with DDC than with PAPI. For first-time DDC data collectors, an average time of 55 minutes versus 30 minutes initially proved significant with respect to the increased probability of respondent refusal or respondent fatigue. Once familiarised with the themes within a questionnaire and the various navigation options, DDC interviewers' average time for completing questionnaires improved significantly. For example, a 4-person household questionnaire initially took 1 hour and 30 minutes to complete using DDC; after 5 iterations conducted by the same interviewer, the time reduced to 1 hour. Both the CSS pilot and the WC pilot display evidence of this reduction in interview time, and hence an increase in speed. Research into the speed of DDC, and into the data collector's interaction with the user interface, attempts to improve the overall experience for both data collectors and respondents. DDC promotes the use of "big data" processing (13) and thereby provides an ideal opportunity for Stats SA to venture further towards "big data".
Availability of data
Availability of data means that both data and services are accessible to authorised parties at appropriate times. Information collected in the field through DDC, once completed on the device, is synchronised in real time to the central server and is available almost instantaneously to supervisor and administrative users.
Data security
Ensuring trust in any data collection survey process involves the three main aspects of information security (14), viz. confidentiality, integrity and availability, which should be balanced to ensure sound DDC practices. According to Pfleeger & Pfleeger (14), security in computing addresses the three goals of confidentiality, integrity and availability; however, the main challenge in any secure system is finding the right balance amongst these three, at times conflicting, goals (see figure below) (15). A concern that cannot be ignored is the challenge of ensuring data protection. This issue needs to be carefully regulated by clear data protection policies relevant to the changing nature of how data are collected, handled and stored, something which may be a challenge for resource-poor governments in developing countries. Together with this, ways need to be found to address issues surrounding the ethics of providing truly informed consent for participants (14-16).
Figure 10: Data security (15)
Data quality
In household surveys in general, data collection is often considered the most critical phase of any survey, primarily because the collected data is the primary input into the final survey product. It is also one of the key phases of the survey life cycle that can be adversely affected by external factors in the field of data collection. The main focus of QA personnel (quality monitors) is to ensure that good-quality information is collected by data collectors or enumerators. Current manual QA processes are designed specifically for paper-based survey interviews: completed paper questionnaires are checked for consistency by an independent team of quality monitors, who check the validity of responses and ensure that the correct household has been enumerated and that the respondent's information is completed correctly. Moving towards QA for digital household survey data collection implies rigorous adaptation of the current SASQAF guidelines to ensure that good statistical best practices and standards are followed. These include: asset management of handheld digital devices (e.g. storage and safe-keeping), utilisation of handheld devices for the purposes of digital data collection only, and the correct process methodology followed by enumerators and quality monitors. The verification process in Figure b is as follows: (1) a mirror image of the survey data processing (SD) database is ported to the QA processing database, in order to separate live (capturing) digital production processes from QA processes; (2) a QA monitor initiates a QA control questionnaire for a given unique sampled household; (3) the location of the sampled household is first verified through GPS; (4) once it has passed the location verification test, the record moves to the second verification of captured, completed and correct information; (5) if the record passes both QA verification stages, it moves to the "QA PASSED" database; (6) if it fails either the first (location) or the second (data) verification, (7) the record is sent back to the supervisor for follow-up, with (8) the result of the audit trail used for correction purposes. All records are flagged accordingly to indicate the status of QA results in the process.
Figure b: QA verification process flow. A quality monitor administers a control questionnaire on a digital device, populated with data (including GPS coordinates) from the Survey Data (SD) processing database, which is mirrored to a separate QA processing database. Records pass through location verification (correct household visited) and data verification (collected data is a correct and true reflection). Passed records move to the QA PASSED database; failed records are sent back to the survey supervisor for follow-up, a QA audit trail is loaded, the record is flagged, corrective processes follow, and QA-error-flagged observations are excluded.
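The two-stage verification described in steps (3)-(8) can be sketched as a small pipeline. The field names, status labels and pass/fail predicates are illustrative assumptions, not Stats SA's actual schema.

```python
# Sketch of the two-stage QA verification described above: a record must
# first pass location verification, then data verification; a failure at
# either stage flags the record for return to the supervisor.
# Field names and predicates are illustrative assumptions.

def qa_verify(record, location_ok, data_ok):
    """Set a QA status flag on the record, mirroring steps (3)-(8)."""
    if not location_ok(record):
        record["qa_status"] = "failed_location"  # back to supervisor
    elif not data_ok(record):
        record["qa_status"] = "failed_data"      # back to supervisor
    else:
        record["qa_status"] = "qa_passed"        # moves to QA PASSED database
    return record

loc_ok = lambda r: r["gps_verified"]
dat_ok = lambda r: r["answers_verified"]

good = qa_verify({"gps_verified": True, "answers_verified": True}, loc_ok, dat_ok)
bad = qa_verify({"gps_verified": False, "answers_verified": True}, loc_ok, dat_ok)
```

Because every record carries an explicit flag, the audit trail in step (8) reduces to filtering records by `qa_status`.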
The selection of the point for the control visit can be done independently and in real-time if a GPS is part of the QA system. The GPS (location) coordinates can be linked to every interview and thus every questionnaire can be traced back to this particular point. As the enumerator collects the respondent information, data is streamed directly to a secured-central survey data processing
SLIDE 15
- database. The quality monitor would be able to screen the information and determine whether the
information is correctly captured. The standard ratio of 10% quality control can easily be increased, since real-time monitoring and feedback mechanisms increase the number of checks that can be completed. The quality assessor's login is uniquely defined for monitoring key survey items as required by the quality framework. Likewise, mitigation and control for audit purposes can be applied more rigorously, as specified by survey management. What makes digital QA and digital data collection flexible and attractive is the ability to synchronize device software updates or questionnaire updates at any moment, and to send key survey-related or QA messages to enumerators or supervisors at any given moment.
Cost saving of DDC
DDC offers many cost and related benefits, viz:
- Environmental benefits: saving paper not only reduces cost, it also helps to eliminate greenhouse emissions (3).
- The security of the data is better than with PAPI; the risk of data loss is minimal (if present at all, given the research in this study).
- Data is available in real time to the user or researcher. When data is available quicker, downstream benefits include faster response times towards action that depends on statistical evidence.
- Quality assurance and the monitoring of field progress are excellent management tools for supervisors and managers, and the navigate-back function allows better monitoring of survey-related performance benchmarks.
- GPS allows tracking of SOs and ensures that the correct sampled dwellings are visited; GPS and timestamp data ensure that SOs are working during the allocated time.
- Savings on (a) paper usage, (b) courier costs, (c) data capturing staff, and (d) reusing devices and customising questionnaires on devices over many projects address significant cost drivers in any official household survey data collection.
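The flexible control ratio described above can be sketched as a simple sampling step. The function name, parameters, and random selection are illustrative assumptions; the text does not specify Stats SA's actual selection procedure.

```python
import random

def select_control_sample(completed_ids, rate=0.10, seed=None):
    """Draw a control-visit sample from completed interviews.

    rate: fraction of interviews to re-verify. The standard 10% ratio
    can be raised cheaply because real-time streaming makes more QA
    checks feasible. Hypothetical sketch, not the official procedure.
    """
    rng = random.Random(seed)
    k = max(1, round(rate * len(completed_ids)))
    return sorted(rng.sample(list(completed_ids), k))

# e.g. with 200 completed interviews and the standard 10% ratio,
# 20 households are flagged for a QA control visit; raising the rate
# to 25% flags 50, with no printing or courier cost attached.
```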
Conclusion
Adhering to the guidelines and high-level processes of the South African Statistical Quality Assessment Framework for manual household survey data collection ensures that best practices are applied for official statistics. Applying the same concepts with digital devices during the data collection phase of household surveys sustains the same user trust in the statistics collected by this method. Modifying a manual data collection process to fit a digital one entails more than the mapping or overlaying of current best practices: it encompasses the ability to adapt to the conditions under which the household survey is conducted, as well as ensuring the security of the data collected (3). The QA process is a specialised domain that covers the broader statistical survey life cycle (see Figure 1), and its importance and relevance cannot be underestimated. This paper therefore provides insight into key areas to consider when conducting QA for official household surveys that use digital handheld devices for data collection.
SLIDE 16
Further research
The conversion from PAPI to DDC in South Africa and in other developing-country contexts presented the researchers with both benefits and challenges, and highlighted some significant issues that are as yet unresolved in implementations of DDC. Other studies indicated that the advantages gained in DDC examples highlight its potential for the collection of large data sets with large sample sizes. The upcoming large-scale survey, the Community Survey 2016, with an estimated 1.5 million households nationwide, will present new challenges across the value chain. There is also potential for use with smaller data sets collected on a frequent basis, where timeliness of information is critical to using it to its full potential. In conclusion, the future of DDC is most likely to be an ever-changing and adaptable process towards a quality-assured value chain, improving efficiencies in every aspect: from the need, design, build and rapid development of questionnaires, to operationalising a secure and effective process of collection in the field, through to the tabulation of weighted, quality-assured data. Free open-source websites offering tools for building, collecting and aggregating data are readily available and relatively easy to use. With the increasing use of DDC and cloud-based platforms will come ever more effective error-trapping systems that can be built into surveys, giving an even greater level of data accuracy and timeliness (16).
References
1. Gillwald A, Moyo M, Stork C. What is happening in ICT in South Africa: A supply- and demand-side analysis of the ICT sector. Evid ICT Policy Action [Internet]. 2012;(7). Available from: http://www.researchictafrica.net/docs/Policy Paper 7 - Understanding what is happening in ICT in South Africa.pdf
2. We Are Social. Digital in 2016 [Internet]. 2016;1–29. Available from: http://wearesocial.com/uk/special-reports/digital-in-2016
3. Hattas M, Eloff M. Secure digital data collection in household surveys. AFRICON, 2011. 2011. p. 1–6.
4. Byass P, Hounton S, Ouédraogo M, Somé H, Diallo I, Fottrell E, et al. Direct data capture using hand-held computers in rural Burkina Faso: experiences, benefits and lessons learnt. Trop Med Int Health [Internet]. 2008 Jul [cited 2014 Apr 3];13 Suppl 1:25–30. Available from: http://www.ncbi.nlm.nih.gov/pubmed/18578809
5. Skarupova K. Computer-assisted and online data collection in general population surveys. 2014.
6. Statistics South Africa. CS 2016: The biggest CAPI survey to date. Pulse Mag. 2016.
7. Palmer N. ICT for data collection and monitoring and evaluation. 2011.
8. Statistics South Africa. South African Statistical Quality Assessment Framework (SASQAF). 2nd ed. Pretoria: Statistics South Africa; 2010.
9. National Institute of Statistics of Rwanda. National Strategy for the Development of Statistics. 2014.
10. Yin RK. Case study research: design and methods. 5th ed. Thousand Oaks: Sage Publications Inc.
11. Ganesan M, Prashant S, Jhunjhunwala A. A review on challenges in implementing mobile phone based data collection in developing countries. J Health Inform Dev Ctries. 2012;6(1):366–74.
12. World Wide Worx. Internet access in South Africa. 2012;1–4. Available from: http://www.worldwideworx.com/broadband201
13. Khan N, Yaqoob I, Abaker I, Hashem T, Inayat Z, Kamaleldin W, et al. Big data: survey, technologies, opportunities, and challenges. 2014;2014.
14. Pfleeger CP, Pfleeger SL. Security in computing. 4th ed. Upper Saddle River, NJ: Prentice Hall; 2007.
15. Hattas M. Towards secure digital data collection (SDDC) in household surveys for South Africa. University of South Africa; 2013.
16. Fitzgerald G, Fitzgibbon M. A comparative analysis of traditional and digital data collection methods in social research in LDCs: case studies exploring implications for participation, empowerment, and (mis)understandings. 2014;11437–43.