We assessed variables for normal distribution by using histograms, quantile-quantile plots, and the Shapiro-Wilk test. For continuous nonnormally distributed variables, we calculated the median and interquartile range (IQR); for categorical variables, we calculated counts and percentages. We categorized continuous variables, including altitude and distances to a road, to an improved water source, to an unimproved water source, and to a river, by using information from quartiles and histograms to form 4 classes. We performed univariate nonparametric statistical tests on cholera incidence and environmental variables by using the Kruskal-Wallis rank test and Spearman correlation coefficients. We performed spatial analysis for hotspot detection by using SaTScan (https://www.satscan.org).
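
As an illustration of the descriptive step above, the following minimal sketch computes a median and IQR and assigns each value to 1 of 4 quartile-based classes. It is written in Python with the standard library only, whereas the study used R; the function names and the altitude values are hypothetical, and the sketch assumes cut points placed exactly at the quartiles, whereas the classes in the study were also informed by histograms.

```python
import statistics


def median_iqr(values):
    """Return (median, Q1, Q3) for a list of numbers."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q2, q1, q3


def quartile_class(value, cuts):
    """Map a value to class 1-4 given three ascending cut points."""
    # Each cut point strictly exceeded moves the value up one class.
    return 1 + sum(value > c for c in cuts)


# Hypothetical altitudes (m) for eight localities; NOT data from the study.
altitudes = [12, 35, 80, 150, 240, 410, 640, 900]

median, q1, q3 = median_iqr(altitudes)
cuts = [q1, median, q3]  # assumption: class boundaries sit at the quartiles
classes = [quartile_class(a, cuts) for a in altitudes]

print(f"median={median}, IQR={q1}-{q3}")
print("classes:", classes)
```

The resulting class labels are categorical and can therefore enter an MCA alongside the original categorical variables.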

We performed multistep unsupervised analysis to classify the localities according to their environmental and spatial characteristics. Hierarchical clustering on the principal components of a multiple correspondence analysis (MCA) was detailed previously (19,20); we used this approach to classify neighborhoods in towns in Haiti (12). The first step is an MCA, which is an exploratory method that considers the relationship between variables and reduces complex datasets into fewer dimensions (21). We performed MCA by using the original categorical variables and the categorized continuous variables. Active variables included the presence of a market, urban or rural location, vaccination status, and area-averaged altitude, distance to a road, distance to an improved water source, distance to an unimproved water source, and distance to a river. We retained quantitative information only as supplementary variables and did not use these in the determination of the principal components. To reduce basal noise and ensure a more stable classification, we retained the principal components that summarized 95% of the variability in the data. We performed hierarchical ascendant classification on the coordinates of the first 16 principal components, which provided classes independent of the number of cholera cases. Then, we compared these classes to cholera cases in a generalized additive model (GAM) with quasi-Poisson distribution. For spatial autocorrelation, we performed Moran I tests on the number of cases and the GAM residuals. To model spatial dependence, we tested a trend-surface GAM, fitting the geographic location by using 2-dimensional splines on latitude and longitude coordinates, as previously demonstrated (22–24). We accounted for the increasing population by using an offset of the log population and estimating standardized incidence ratios (SIRs) for each class. We considered p<0.05 statistically significant.
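
To make the spatial autocorrelation check more concrete, the following minimal sketch computes the Moran I statistic, I = (n / S0) × Σij wij (xi − x̄)(xj − x̄) / Σi (xi − x̄)², where S0 is the sum of all weights. It is written in Python with the standard library only and is not the code used in the study. The binary adjacency weights and the values are hypothetical (the spatial weighting scheme is not specified in the text), and the sketch returns only the statistic, not the significance test.

```python
def morans_i(values, weights):
    """Moran I for values x_i and a square spatial weights matrix w_ij."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    denom = sum(d * d for d in dev)
    return (n / s0) * cross / denom


# Hypothetical example: four localities along a line, where each locality
# is a neighbor only of the adjacent ones (binary adjacency; an assumption).
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

gradient = [1, 2, 3, 4]      # similar values sit next to each other
alternating = [1, 4, 1, 4]   # neighbors always differ

print(morans_i(gradient, W))
print(morans_i(alternating, W))
```

Positive values indicate that neighboring localities tend to have similar values and negative values indicate the opposite; applying the same function to model residuals shows whether spatial structure remains after fitting.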

We used QGIS version 2.14.3 (QGIS Development Team, http://qgis.osgeo.org) as a geographic information system (GIS) for mapping. We performed all statistical analyses by using R version 3.3.0 (R Foundation for Statistical Computing, https://www.r-project.org). We used the FactoMineR package in R for classification analysis (19) and the mgcv package for GAMs, with the generalized cross-validation criterion for smoothing parameter estimation and the gam.check function to verify residual plots (22,25).

All data remained anonymous with no patient identifiers, in accordance with national and international ethics guidance (26). Ethics approval was obtained from the National Bioethics Committee in Haiti, MSPP (reference no. 1516-73).
