DATA PROCESSING AND APPROACH TO SOFTWARE DEVELOPMENT

The development of the software is described below in terms of its main characteristics. These characteristics facilitate efficient field implementation while improving data quality and safeguarding the confidentiality of the information throughout the data entry and database development process.

The Academy of Preventive Medicine, in partnership with ICF Macro, developed a data capture system that was used for central or field-based data entry. The software was written in CSPro to support the following:

a. Range checks for all categorical and quantitative variables. Where appropriate, ranges were adjusted dynamically to allow only the categories and values that were relevant given the responses to prior questions in the questionnaire.

b. The software applied all skip patterns as specified in the questionnaires, ensuring that the correct path through the questionnaire was followed. The software supported complex checks within or between modules and permitted complete control of the skip pattern. To facilitate the software conversion, there was a cross-reference listing of all skips showing the name of the variable where each skip originated, the target variable name and the program line number of the skip.

c. The software checked the consistency of responses against prior responses or external data as soon as the responses were entered. Inconsistent data were flagged for review by central data processing staff.

d. The software permitted the entry of out-of-range values, but flagged these values for review by central data processing staff. These out-of-range values were highlighted on screen.

e. The data entry screens were designed to reflect the visual appearance of the physical questionnaires in order to assist data entry staff in accurately recording the data collected.

f. In addition to flagging data problems at the time each field was entered, the software also produced a report of the data errors and inconsistencies that remained in the data at the end of entry of each questionnaire. This report was then used by interviewers, who returned to the interviewed households with the original questionnaires and the report listing to review and correct the errors found.

g. Summary error reports were also produced, giving the frequency of the different types of errors found in the data entered, including frequencies of out-of-range values by question and frequencies of consistency errors by type of inconsistency.

h. The software stored the data in a hierarchical ASCII data file, organized by questionnaire. In addition to the data entry software, batch processing programs, also written in CSPro, performed batch consistency checking of key information beyond that checked during the data entry process. This permitted more complex checks on the consistency of the data than could be performed during data entry.

i. All of the software was linked together via a menu system for central data processing staff that provided easy access to the different functions of the software.

j. The central software produced frequency distributions of all categorical or qualitative variables. For quantitative variables with many different responses, e.g. height or weight of an individual, the responses were summarized with minimum, maximum and mean values. Further, the software at the central level produced field quality control tabulations that permitted a review of key elements of the data collection and the quality of the interviewing, including response rates and a review of results against expected distributions.

k. The data file format permitted easy conversion of the data into analysis files for use with the statistical packages Stata, SPSS and/or SAS, together with all variable and value labels, ensuring portability of the database.

l. The full source code of the standard and country-specific versions of the survey software (applications, dictionaries, forms, etc.) and related documentation were made available.

m. The majority of the survey-specific adaptation was carried out in collaboration with the survey teams. Translation of the software, including variable and value labels, error messages, data capture screen text, etc., was facilitated through the use of simple tools that permitted easy translation into local languages as needed. The software is being adapted to fit the local survey requirements, including the incorporation of the adapted and local questions, adaptations to all checks, including range, skip and consistency checks, and the handling of survey-specific external databases, including lists of enumerators, IDs, location identifiers, predefined sampling lists, and panel survey data.

n. Full supporting documentation was prepared for the survey. All data dictionaries were fully documented, with all variables having full variable labels, all category codes clearly defined and all categories and values fully labeled. In addition, annotated versions of the final questionnaires were provided with explicit links to all variables created in the resulting datasets.
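
The range, skip and consistency checks in items a through d, and the summary error report in item g, can be illustrated with a minimal sketch. The production software was written in CSPro; the Python below is only an illustration of the logic. Every variable name, code value and limit in it (sex, age, pregnant, children_born, children_alive, school_years, the 0 to 120 age range, the 15 to 49 eligibility window) is a hypothetical assumption and is not taken from the actual survey dictionaries.

```python
from collections import Counter


def check_record(rec):
    """Return (variable, error_type) flags for one hypothetical questionnaire.

    Values are never rejected; problems are only flagged for later review.
    """
    flags = []
    sex, age = rec["sex"], rec["age"]

    # Fixed range checks (assumed codes: 1 = male, 2 = female).
    if sex not in (1, 2):
        flags.append(("sex", "out_of_range"))
    if not 0 <= age <= 120:
        flags.append(("age", "out_of_range"))

    # Skip pattern: the pregnancy question is assumed to apply only to
    # women aged 15-49 and must be blank (None) for everyone else.
    eligible = sex == 2 and 15 <= age <= 49
    pregnant = rec.get("pregnant")
    if eligible != (pregnant is not None):
        flags.append(("pregnant", "skip_error"))
    elif eligible and pregnant not in (1, 2):
        flags.append(("pregnant", "out_of_range"))

    # Dynamic range: the upper limit depends on an earlier answer.
    if not 0 <= rec["children_alive"] <= rec["children_born"]:
        flags.append(("children_alive", "out_of_range"))

    # Consistency check between two responses.
    if rec["school_years"] > age:
        flags.append(("school_years", "inconsistent"))

    return flags


def summarize(records):
    """Count errors by type, and out-of-range values by question."""
    by_type, range_by_question = Counter(), Counter()
    for rec in records:
        for variable, error_type in check_record(rec):
            by_type[error_type] += 1
            if error_type == "out_of_range":
                range_by_question[variable] += 1
    return by_type, range_by_question


# Illustrative records only; these are not survey data.
records = [
    {"sex": 2, "age": 30, "pregnant": 2, "children_born": 3,
     "children_alive": 2, "school_years": 10},
    {"sex": 1, "age": 40, "pregnant": 1, "children_born": 2,
     "children_alive": 3, "school_years": 8},
    {"sex": 2, "age": 12, "pregnant": None, "children_born": 0,
     "children_alive": 0, "school_years": 15},
]
by_type, range_by_question = summarize(records)
```

In this sketch the per-record flags correspond to the end-of-questionnaire error report, and the two counters correspond to the summary error report. Flagging rather than rejecting values mirrors the behavior described in item d.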

Almaz Sharman, President, Academy of Preventive Medicine
Republic of Kazakhstan, Almaty, 66 Klochkov st, office 601
+7 (727) 317-8855
academypm@outlook.com