2.5. Statistical Analysis

Mirko Manchia
Andrea Fontana
Concetta Panebianco
Pasquale Paribello
Carlo Arzedi
Eleonora Cossu
Mario Garzilli
Maria Antonietta Montis
Andrea Mura
Claudia Pisanu
Donatella Congiu
Massimiliano Copetti
Federica Pinna
Valerio Pazienza
Alessio Squassina
Bernardo Carpiniello

Clinical characteristics of patients with SCZ and HC were reported as median with interquartile range (i.e., first-third quartiles) for continuous variables and as observed frequencies (and percentages) for categorical variables. For each continuous variable, the assumption of normality was checked by means of quantile–quantile (Q-Q) plots and the Shapiro–Wilk test. In the presence of non-normal distributions, comparisons between groups were performed with the Mann–Whitney U test (or the Kruskal–Wallis test, as appropriate) for continuous variables and with the χ2 test (or Fisher's exact test, as appropriate) for categorical variables. Stacked bar charts were used to show the gut microbiota composition (i.e., mean relative abundance, %) at the phylum, family, genus and species levels in SCZ and HC. We applied the Penalized Logistic Regression Analysis (PELORA) algorithm to identify panels of bacterial populations that best discriminated groups (i.e., SCZ versus HC, or SCZ subgroups defined by the presence/absence of TR to antipsychotics) [39]. To this purpose, the relative abundance (%) of each bacterium was first logit transformed (i.e., by calculating the natural logarithm of the ratio between the relative abundance proportion and its complement) and then standardized (computing a Z-score) by subtracting its mean and dividing by its standard deviation (SD). Both mean and SD were computed in the sample that included all the subjects involved in the comparison. When the relative abundance was exactly 0%, the logit transformation could not be performed for that value; to overcome this issue, such a percentage was replaced by 0.001% for the computation of the Z-score only. Once a pattern was identified, its centroid was computed as the mean of the Z-scores of the involved bacteria. To calculate centroids, the Z-scores of some bacteria could be sign-flipped (reversed) so that their values pointed in the same direction as the centroid.
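The transformation steps described above (logit of the relative abundance, replacement of exact zeros by 0.001%, Z-scoring on the pooled sample, and a sign-aware centroid) can be illustrated with a minimal Python sketch. This is not the authors' code, which was written in R. The function names, the toy abundances and the sign assignments are hypothetical, and the use of the sample SD (n - 1 denominator) is an assumption, since the text does not say which SD was used.

```python
import math
import statistics

ZERO_REPLACEMENT_PCT = 0.001  # from the text: an exact 0% is replaced by 0.001%


def logit_percent(abundance_pct):
    """Logit of a relative abundance expressed in percent.

    An abundance of exactly 100% is not handled here (the ratio is undefined).
    """
    if abundance_pct == 0:
        abundance_pct = ZERO_REPLACEMENT_PCT
    p = abundance_pct / 100.0
    return math.log(p / (1.0 - p))


def z_scores(abundances_pct):
    """Z-scores of logit-transformed abundances, using the pooled mean and SD."""
    logits = [logit_percent(a) for a in abundances_pct]
    mean = statistics.mean(logits)
    sd = statistics.stdev(logits)  # assumption: sample SD (n - 1 denominator)
    return [(x - mean) / sd for x in logits]


def centroid(z_by_taxon, signs):
    """Per-subject mean of the (optionally sign-flipped) Z-scores across taxa."""
    taxa = list(z_by_taxon)
    n_subjects = len(z_by_taxon[taxa[0]])
    return [
        statistics.mean(signs[t] * z_by_taxon[t][i] for t in taxa)
        for i in range(n_subjects)
    ]


# Toy data (hypothetical): relative abundance (%) of two taxa in six subjects,
# pooled across the two groups being compared.
abundances = {
    "taxon_a": [0.0, 1.2, 3.5, 0.8, 2.0, 5.1],
    "taxon_b": [10.0, 4.0, 2.5, 8.0, 3.0, 1.0],
}
signs = {"taxon_a": 1, "taxon_b": -1}  # hypothetical: taxon_b is sign-flipped

z = {taxon: z_scores(values) for taxon, values in abundances.items()}
centroid_scores = centroid(z, signs)
```

In PELORA the choice of which taxa enter the pattern, and with which sign, is made by the algorithm itself; here the signs are fixed by hand purely to show how the centroid is assembled once those choices are known.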
The PELORA algorithm was also set to accommodate clinical variables: when a new predictor is added to the model, it can be either a group centroid or a clinical variable, depending on which yields the better predictive value [39]. In detail, when comparing patients with SCZ versus HC, penalized logistic models that included the centroid as a predictor were adjusted for age at sample collection, gender and body mass index (BMI), whereas, when comparing subgroups of patients with SCZ, models were adjusted for age at SCZ onset, illness duration, gender, BMI, treatment duration and the presence of concomitant drugs. Moreover, when comparing patients with SCZ versus HC, covariates related to lifestyle (i.e., diet, smoking and drinking habits, physical activity) were not considered because they were intrinsically related to the HC profile. In accordance with the analysis protocol, two free parameters were set in the PELORA algorithm: the number of centroids and the penalty parameter (λ). The number of centroids was set to 1, because we were mainly interested in detecting a single informative pathway for each scenario, whereas several values of λ (0, 1/32, 1/16, 1/8, 1/4, 1/2, 1) were evaluated by performing 200 bootstrap resamplings of the data and recording the overall misclassification rate. For each specific scenario, the penalty parameter that achieved the lowest median misclassification rate (across the bootstrap samples) was chosen. Comparisons between Z-scores were performed using the two-sample t-test. Heatmaps of normalized Z-scores (from 0 to 1) of the relative abundances of bacterial populations identified by the PELORA algorithm, along with the corresponding centroid, and boxplots of centroid Z-scores were created. A two-sided p < 0.05 was set as the threshold for statistical significance. All statistical analyses and plots were performed in the R computing environment (packages: supclust, ggplot2, gridExtra) [40].
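The penalty selection rule described above (a grid of λ values, 200 bootstrap resamplings, and choice of the λ with the lowest median misclassification rate) can be sketched in Python as follows. Several parts are assumptions rather than details from the text: the classifier `ridge_stand_in` is a hypothetical, much simpler substitute for PELORA (ridge-penalized least squares on a single score), misclassification is evaluated on the out-of-bag subjects of each resample, ties are broken in favor of the smallest λ, and the data are synthetic.

```python
import random
import statistics

LAMBDAS = [0, 1/32, 1/16, 1/8, 1/4, 1/2, 1]  # grid of penalty values from the text
N_BOOT = 200                                  # number of bootstrap resamplings from the text


def ridge_stand_in(train, lam):
    """Hypothetical stand-in classifier (NOT PELORA).

    Ridge-penalized least squares of the 0/1 label on one score,
    with predictions thresholded at 0.5.
    """
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in train)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    denom = sxx + lam * len(train)
    slope = sxy / denom if denom > 0 else 0.0
    intercept = y_bar - slope * x_bar
    return lambda x: 1 if intercept + slope * x > 0.5 else 0


def select_lambda(data, fit, lambdas=LAMBDAS, n_boot=N_BOOT, seed=0):
    """Return the lambda with the lowest median misclassification rate."""
    rng = random.Random(seed)
    n = len(data)
    rates = {lam: [] for lam in lambdas}
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        in_bag = set(idx)
        # Assumption: errors are measured on subjects left out of the resample.
        oob = [data[i] for i in range(n) if i not in in_bag]
        if not oob:
            continue
        train = [data[i] for i in idx]
        for lam in lambdas:
            predict = fit(train, lam)
            errors = sum(predict(x) != y for x, y in oob)
            rates[lam].append(errors / len(oob))
    medians = {lam: statistics.median(r) for lam, r in rates.items()}
    # min() keeps the first minimum, so ties go to the smallest lambda.
    best = min(lambdas, key=lambda lam: medians[lam])
    return best, medians


# Synthetic data: (centroid score, group label) for 20 + 20 subjects.
data_rng = random.Random(1)
data = [(data_rng.gauss(-1.0, 1.0), 0) for _ in range(20)]
data += [(data_rng.gauss(1.0, 1.0), 1) for _ in range(20)]

best_lambda, median_rates = select_lambda(data, ridge_stand_in)
```

Passing the classifier in as the `fit` argument keeps the resampling loop independent of the model, so the same selection rule could in principle wrap any fitting routine that accepts a penalty value.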
