To assess ambiguity in the classification of water source types, we cross-tabulated the source type assigned by Observer A against the types assigned by each of the other five observers.
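The cross-tabulation described here can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study: the source type labels, the five example sources and the function name `cross_tabulate` are all hypothetical.

```python
from collections import Counter


def cross_tabulate(reference, other):
    """Count (reference label, other label) pairs for sources classified by both observers."""
    if len(reference) != len(other):
        raise ValueError("both observers must cover the same sources")
    return Counter(zip(reference, other))


# Hypothetical labels for five sources, listed in the same source order for both observers.
observer_a = ["borehole", "spring", "well", "well", "spring"]
observer_b = ["borehole", "spring", "well", "spring", "spring"]

table = cross_tabulate(observer_a, observer_b)
categories = sorted(set(observer_a) | set(observer_b))

# Rows are Observer A's types, columns are the other observer's types.
print("A \\ B".ljust(10) + "".join(c.ljust(10) for c in categories))
for row in categories:
    print(row.ljust(10) + "".join(str(table[(row, col)]).ljust(10) for col in categories))
```

Off-diagonal cells identify the pairs of source types that observers confused with one another.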
In the absence of expert hydrogeological advice, we assumed that 30 m constituted a safe horizontal separation distance between contamination hazards (e.g. pit latrines) and wells, springs and boreholes, since this has previously been used as a conservative threshold for safe lateral separation between source and hazard (Howard et al. 2003). We then calculated the kappa index of agreement (McHugh 2012) separately for each hazard observation, based on records from all six observers. We graphically compared the distance to the nearest latrine estimated by Observer A against the distances estimated by the remaining five observers, calculating Lin’s concordance correlation coefficient and related statistics (Bradley and Blackwood 1989) for these estimates using the concord and batplot utilities in Stata version 15.0.
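To illustrate the distance comparison, the sketch below computes Lin’s concordance correlation coefficient and Bland and Altman 95% limits of agreement for one pair of observers. It is not the Stata code used in the study: the distances are hypothetical, the function names are illustrative, and the coefficient is assumed to use 1/n moments as in Lin’s original definition, which may differ slightly from what a given statistics package reports.

```python
from math import sqrt
from statistics import mean


def lins_ccc(x, y):
    """Lin's concordance correlation coefficient, using 1/n variances and covariance (assumption)."""
    n = len(x)
    mx, my = mean(x), mean(y)
    var_x = sum((a - mx) ** 2 for a in x) / n
    var_y = sum((b - my) ** 2 for b in y) / n
    cov_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * cov_xy / (var_x + var_y + (mx - my) ** 2)


def limits_of_agreement(x, y, z=1.96):
    """Mean difference (x - y) with lower and upper limits: mean +/- z * sample SD of differences."""
    diffs = [a - b for a, b in zip(x, y)]
    mean_diff = mean(diffs)
    sd_diff = sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (len(diffs) - 1))
    return mean_diff, mean_diff - z * sd_diff, mean_diff + z * sd_diff


# Hypothetical distances to the nearest latrine (m) for six sources.
observer_a = [12.0, 25.0, 31.0, 8.0, 40.0, 18.0]
observer_b = [15.0, 22.0, 35.0, 10.0, 38.0, 20.0]

print("Lin's CCC:", round(lins_ccc(observer_a, observer_b), 3))
print("Mean difference and limits:", limits_of_agreement(observer_a, observer_b))
```

Unlike Pearson’s correlation, the concordance coefficient is reduced by any systematic offset between observers, which is why it suits agreement rather than association.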
For each source and observer, we calculated a percentage sanitary risk score as the number of hazards present expressed as a percentage of those observed, following common practice in analysing such data (Howard et al. 2003; Misati et al. 2017; Okotto-Okotto et al. 2015). We again computed Bland and Altman limits of agreement and related statistics for Observer A’s records against those of each of the remaining five observers. We also calculated absolute intra-class correlation coefficients for the sanitary risk scores from all six observers’ records, based on a two-way random effects model (Koo and Li 2016), separately for each source type and for all sources combined.
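The risk score and the intra-class correlation can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the study’s code: the text does not say whether single-measure or average-measure coefficients were reported, so the sketch assumes the single-measure, absolute-agreement form, often written ICC(2,1). The function names and example values are hypothetical.

```python
from statistics import mean


def risk_score(observations):
    """Percentage of observed hazards recorded as present.

    Each entry is True (present), False (absent) or None (hazard could not be observed).
    """
    observed = [o for o in observations if o is not None]
    if not observed:
        raise ValueError("no hazards were observed")
    return 100.0 * sum(observed) / len(observed)


def icc_2_1(scores):
    """Two-way random effects, absolute agreement, single-measure ICC (assumed form).

    scores[i][j] is the risk score given to source i by observer j.
    """
    n, k = len(scores), len(scores[0])
    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(col) for col in zip(*scores)]
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (scores[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


# Hypothetical record: five hazards, one of which could not be observed.
print(risk_score([True, False, None, True, False]))

# Hypothetical risk scores (%) for four sources (rows) from three observers (columns).
example = [[20.0, 30.0, 20.0], [50.0, 50.0, 60.0], [80.0, 70.0, 80.0], [40.0, 40.0, 30.0]]
print(round(icc_2_1(example), 3))
```

Because the absolute-agreement form includes between-observer variance in the denominator, an observer who scores consistently higher than the others lowers the coefficient even when source rankings match.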
To explore potential influences on disagreement between observers, we fitted linear regression models to predict the absolute difference between Observer A’s risk scores and those of each of his colleagues. Alongside source type and observer, we examined indicators of observer fatigue (time of day and week when sources were surveyed and sequential order of source visits made); the possible impact of protocol deviations (the absolute lag in days between Observer A’s visit and that of his colleague and whether one of a pair of source surveys was the first to be made); and changes in environmental conditions. We measured the latter as the absolute difference in daily rainfall on dates when the two surveys were made, obtaining rainfall estimates from the Climate Hazards Group Infra-Red Precipitation with Station (CHIRPS) gridded data product, which is based on satellite imagery and in-situ measurements (Funk et al. 2014). Because of unseasonal rains, this hydro-meteorological classification identified part of the Visit 2 fieldwork period as being in the wet season, so we hereafter refer to this as ‘partially dry’. We generated locally weighted smoothed scatterplots of continuous variables, subsequently fitting univariate (unadjusted) and then multivariate (adjusted) linear regressions of variables significant at the 99% confidence level in univariate models.
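As a simplified illustration of the unadjusted models, the sketch below derives the absolute difference in risk score for each paired survey and fits a one-predictor least-squares line against the absolute lag in days. It is not the study’s code: the data are hypothetical, the function names are illustrative, and it omits significance testing, the smoothed scatterplots and the adjusted models.

```python
from statistics import mean


def abs_differences(scores_a, scores_b):
    """Absolute difference between Observer A's score and a colleague's score, per source."""
    return [abs(a - b) for a, b in zip(scores_a, scores_b)]


def fit_simple_ols(x, y):
    """Ordinary least squares for y = intercept + slope * x; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has no variation")
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope


# Hypothetical paired surveys: risk scores (%) and absolute lag between visits (days).
scores_a = [40.0, 60.0, 20.0, 80.0, 50.0]
scores_b = [50.0, 55.0, 20.0, 60.0, 45.0]
lag_days = [3, 1, 0, 7, 2]

outcome = abs_differences(scores_a, scores_b)
intercept, slope = fit_simple_ols(lag_days, outcome)
print("absolute differences:", outcome)
print("intercept:", round(intercept, 2), "slope per day of lag:", round(slope, 2))
```

The same outcome variable would be regressed in turn on each candidate predictor, such as the absolute difference in daily rainfall, before building an adjusted model.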