All statistical analyses were performed in R version 3.3.2 (R Core Team, Vienna, Austria). Multivariate analyses, notably Hierarchical Clustering on Principal Components (HCPC) [76], were performed using the FactoMineR R package [77], and graphical representations were produced using factoextra [78].

Most of the multivariate statistical analyses presented in this work rely on the HCPC method published by Husson et al. [76]. Briefly, the HCPC algorithm is divided into three steps. First, the dimensions are reduced by a factorial method, such as Principal Component Analysis (PCA) for quantitative variables, Multiple Correspondence Analysis (MCA) for categorical data, or Multiple Factor Analysis (MFA) to jointly integrate different data blocks [79]. Second, a Hierarchical Cluster Analysis (HCA, Ward's method, Euclidean distances) is performed on the components to determine groups of samples or individuals sharing similar profiles. The optimal number of clusters was determined by analyzing the gain in inertia provided by the addition of a new group (default parameters, as described in [77]). Finally, a k-means partition [80] is applied to stabilize the previous HCA classification.
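The final consolidation step can be illustrated with a minimal sketch. It is written in plain Python rather than with FactoMineR, and it assumes that the individuals are already described by their coordinates on the retained components and that an initial partition has been obtained by cutting the HCA tree. The function names (`centroid`, `sq_dist`, `consolidate`) and the toy data are hypothetical and are not taken from the original analysis.

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def consolidate(points, labels, max_iter=10):
    """Refine an initial partition with k-means (Lloyd) iterations.

    The cluster centers are initialised from the given labels (for example,
    the clusters obtained by cutting a hierarchical tree); each point is then
    reassigned to its nearest center until the partition stops changing.
    """
    labels = list(labels)
    for _ in range(max_iter):
        clusters = sorted(set(labels))
        centers = {
            c: centroid([p for p, l in zip(points, labels) if l == c])
            for c in clusters
        }
        new_labels = [
            min(clusters, key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # partition is stable
        labels = new_labels
    return labels


# Toy coordinates on two components; the third individual is deliberately
# placed in the wrong initial cluster to show the effect of consolidation.
coords = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
initial = [0, 0, 1, 1, 1, 1]
final = consolidate(coords, initial)
```

In this toy case the misplaced individual is moved back to the cluster whose center is closest, which is the kind of stabilization the k-means step is meant to provide.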

To identify metabolites predictive of acquired antibiotic resistance, we designed a multiscale statistical workflow. First, we calculated the Pi,j values, which represent the within-host modifications of both metabolite intensities and antibiotic resistances between the early and late isolates of each evolutionary line. Then, we conducted a multiscale unsupervised HCPC based on MFA to extract the information common to the two blocks of Pi,j values (metabolites and antibiotic resistances). The output of the HCPC analysis was then used to select the variables (metabolites and antibiotic resistance phenotypes) found to be statistically associated. Finally, we built a supervised logistic model based on the selected variables, in order to predict the acquisition of antibiotic resistance phenotypes from the modifications of a minimum number of metabolite intensities. The best model was selected by forward stepwise selection based on the Akaike information criterion (AIC) and validated by internal cross-validation.
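The forward selection logic can be sketched as follows. This is an illustration under stated assumptions, not the code used in the study: model fitting is abstracted behind a hypothetical `log_likelihood_of` callback, the log-likelihood values in `toy_loglik` are invented for the example, and the parameter count assumes one coefficient per selected variable plus an intercept. The criterion itself is the standard AIC = 2k - 2 ln(L).

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def forward_select(candidates, log_likelihood_of):
    """Greedy forward selection minimising the AIC.

    `log_likelihood_of` takes a sorted tuple of variable names and returns
    the maximised log-likelihood of the model fitted on those variables.
    Each model is assumed to have one parameter per variable plus an intercept.
    """
    selected = []
    best_aic = aic(log_likelihood_of(()), 1)  # intercept-only model
    while True:
        best_candidate = None
        for var in candidates:
            if var in selected:
                continue
            trial = tuple(sorted(selected + [var]))
            score = aic(log_likelihood_of(trial), len(trial) + 1)
            if score < best_aic:
                best_aic, best_candidate = score, var
        if best_candidate is None:
            break  # no remaining variable lowers the AIC
        selected.append(best_candidate)
    return selected, best_aic


# Invented log-likelihoods for three hypothetical metabolites (illustration only).
toy_loglik = {
    (): -20.0,
    ("m1",): -12.0,
    ("m2",): -15.0,
    ("m3",): -19.5,
    ("m1", "m2"): -9.0,
    ("m1", "m3"): -11.8,
    ("m2", "m3"): -14.8,
    ("m1", "m2", "m3"): -8.7,
}
chosen, chosen_aic = forward_select(["m1", "m2", "m3"], lambda key: toy_loglik[key])
```

With these invented values, the procedure adds `m1`, then `m2`, and stops because adding `m3` would raise the AIC, which shows how the criterion favors a model with few predictors.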

Variables were selected to retain the most differentially expressed metabolites (i.e., those most likely to be associated with differential phenotype expression): 151 of the 271 putatively annotated metabolites, with a coefficient of variation >0.5. Bacterial metabotypes were defined by HCPC analysis based on MFA, with metabolite intensities split into two blocks according to the method used to detect each metabolite (C18 or HILIC), in order to balance the influence of each block on the final principal components.
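The filter on the coefficient of variation can be sketched as below. The source does not state whether the sample or population standard deviation was used, so the sample standard deviation is an assumption here. The function names and the intensity values are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumed definition)."""
    m = mean(values)
    if m == 0:
        return 0.0  # all-zero intensities carry no usable variation
    return stdev(values) / m


def select_variable_metabolites(intensities, threshold=0.5):
    """Keep metabolites whose coefficient of variation exceeds the threshold."""
    return [
        name
        for name, values in intensities.items()
        if coefficient_of_variation(values) > threshold
    ]


# Invented intensities across four isolates (illustration only).
toy_intensities = {
    "met_A": [100, 102, 98, 101],   # nearly constant
    "met_B": [10, 50, 5, 80],       # highly variable
    "met_C": [200, 220, 180, 210],  # moderately stable
}
kept = select_variable_metabolites(toy_intensities)
```

Only `met_B` passes the 0.5 threshold in this toy example, since the other two metabolites vary little relative to their mean intensity.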

HCPC analysis was performed on the bacterial phenotypes (cytotoxicity against A549 and J774, stress induced on A549, growth speed, pigment production, and mucoidy) coded into binary classes. Analysis of the variable categories associated with each cluster allowed us to define the level of bacterial virulence.
