Statistical analysis

Bryony E. A. Dignam
Maureen O’Callaghan
Leo M. Condron
George A. Kowalchuk
Joy D. Van Nostrand
Jizhong Zhou
Steven A. Wakelin

Relationships between geographic distance (km between sampling points) and soil microbial community composition were tested across the 50 pasture sites spanning New Zealand (total distance ~1,300 km). The underlying hypothesis was that if local soil type and/or environmental factors are associated with disease-suppressive microbiota, then a link to geographical proximity would be evident in the dataset. Conversely, if the influence of management practices (intensification) were dominant, it would override or obscure biogeographical effects.

Pair-wise Euclidean distances between sampling points were calculated from the GPS coordinates. Similarly, ecological ‘distances’ in microbial community assemblages among soils were calculated using the Bray-Curtis method from biological T-RFLP or DGGE OTU data (standardised and square root-transformed; [40]). Simple distance-decay relationships between the geographic and biological distance matrices were tested using non-parametric (Spearman’s ρ) correlation, with a null distribution generated by permutation (999 permutations) to provide a probability-based measure of confidence (RELATE test; [41]).
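
As a rough illustration of the distance-decay test described above, the following Python sketch computes Bray-Curtis dissimilarities on standardised, square root-transformed profiles and correlates them with geographic distances using Spearman's ρ and a permutation-based null distribution. It is not the PRIMER RELATE routine itself. The function names, the toy coordinates (assumed to be already projected to km), and the OTU counts are all hypothetical.

```python
import math
import random

def prepare(profile):
    """Standardise to relative abundance, then square-root transform."""
    total = sum(profile)
    return [math.sqrt(x / total) for x in profile]

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance profiles."""
    den = sum(a + b for a, b in zip(u, v))
    return sum(abs(a - b) for a, b in zip(u, v)) / den if den else 0.0

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def distance_matrix(rows, metric):
    n = len(rows)
    return [[metric(rows[i], rows[j]) for j in range(n)] for i in range(n)]

def ranks(values):
    """Average ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def matrix_spearman(d1, d2, perm=None):
    """Spearman's rho between the lower triangles of two distance matrices.
    `perm` reorders the rows and columns of d2 together."""
    n = len(d1)
    p = perm if perm is not None else list(range(n))
    a = [d1[i][j] for i in range(n) for j in range(i)]
    b = [d2[p[i]][p[j]] for i in range(n) for j in range(i)]
    return pearson(ranks(a), ranks(b))

def relate_test(d1, d2, n_perm=999, seed=0):
    """Return (rho, one-sided permutation P value)."""
    rng = random.Random(seed)
    observed = matrix_spearman(d1, d2)
    idx = list(range(len(d1)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(idx)
        # small tolerance so floating-point noise does not hide exact ties
        if matrix_spearman(d1, d2, idx) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical toy data: five sites along a transect, four OTUs.
coords_km = [(0, 0), (100, 0), (300, 0), (700, 0), (1300, 0)]
otu_counts = [[10, 5, 0, 0], [8, 6, 1, 0], [5, 5, 5, 0], [1, 4, 8, 2], [0, 1, 6, 8]]

geo = distance_matrix(coords_km, euclidean)
bio = distance_matrix([prepare(r) for r in otu_counts], bray_curtis)
rho, p_value = relate_test(geo, bio)
print(f"rho = {rho:.3f}, P = {p_value:.3f}")
```

Permuting the rows and columns of one matrix together, rather than shuffling individual distances, keeps the dependence structure among pairwise distances intact under the null.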

Similarity in the composition of disease-suppressive genes between soils was calculated from the sub-set of GeoChip data using the Bray-Curtis method (as above). From the respective distance matrices (bacterial, Pseudomonas, and functional genes), permutation-based multivariate analysis of variance (PERMANOVA; 999 permutations; [42]) was used to partition variance in composition associated with land use, comparing high (dairy) and low (other) farm system intensification/land-use types, and with soil type (11 New Zealand soil orders; Fig 1). Potential influences of sample distribution across gels (DGGE) on the structure of the Pseudomonas community were accounted for in the analysis; these were not significant (PERMANOVA; P = 0.302). These analyses were performed in PERMANOVA/PRIMER7 using previously described methods (PRIMER-E Ltd., UK; [43, 44]).
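
To make the permutation logic concrete, here is a minimal Python sketch of a one-way PERMANOVA on a precomputed dissimilarity matrix. It is a simplification: the analysis described above partitions variance across land use and soil type in PRIMER/PERMANOVA, whereas this sketch handles a single factor. The function names and the toy matrix are hypothetical.

```python
import random

def pseudo_f(dist, labels):
    """One-way PERMANOVA pseudo-F from a symmetric dissimilarity matrix."""
    n = len(labels)
    groups = {}
    for i, g in enumerate(labels):
        groups.setdefault(g, []).append(i)
    # Total sum of squares: squared distances over all pairs, divided by N.
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i)) / n
    # Within-group sum of squares: same quantity per group, divided by group size.
    ss_within = 0.0
    for members in groups.values():
        ss = sum(dist[i][j] ** 2
                 for pos, i in enumerate(members) for j in members[:pos])
        ss_within += ss / len(members)
    n_groups = len(groups)
    ss_among = ss_total - ss_within
    return (ss_among / (n_groups - 1)) / (ss_within / (n - n_groups))

def permanova(dist, labels, n_perm=999, seed=0):
    """Return (pseudo-F, permutation P value) by shuffling group labels."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # small tolerance so floating-point noise does not hide exact ties
        if pseudo_f(dist, shuffled) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical toy matrix: six samples, dissimilarity 0.2 within a land use
# and 0.8 between land uses.
land_use = ["dairy"] * 3 + ["other"] * 3
toy_dist = [[0.0 if i == j else (0.2 if land_use[i] == land_use[j] else 0.8)
             for j in range(6)] for i in range(6)]

f_obs, p_value = permanova(toy_dist, land_use)
print(f"pseudo-F = {f_obs:.2f}, P = {p_value:.3f}")
```

With only six samples there are few distinct label arrangements, so the smallest attainable P value is limited; real datasets with more replication do not have this constraint.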

Fig 1. Influence of soil type and land use on microbial community structure: metric MDS ordination plots of mean total bacterial (A and B) and Pseudomonas (C and D) communities. The structures of the total bacterial and Pseudomonas communities were assessed based on the relative abundance of terminal restriction fragments (TRFs) and denaturing gradient gel electrophoresis (DGGE) banding patterns, respectively. Mean communities (individual points) for each land use and soil type were derived from 150 bootstrap averages. For land uses and soil types with sufficient replication, 95% region estimates for the mean communities (clouds) represent the spread of the bootstrap averages. Points and/or 95% region estimates in closer proximity represent groups with more similar microbial community structure. Observations are statistically supported by PERMANOVA testing of Bray-Curtis dissimilarity data (S4 Table). Underlying OTU data for T-RFLP and DGGE analysis are available in S5 and S6 Tables, respectively.

Analyses of community size (microbial, from qPCR; or functional gene, from GeoChip abundance data) were performed in Genstat for Windows (17th Edition). Residual maximum likelihood (REML) analysis of linear mixed models tested for effects of land use and soil type (three New Zealand soil orders; Fig 2). These three soil orders (brown, recent, and pallic) had sufficient sampling replication to assess soil-type influences on abundance using a univariate analysis approach.

Fig 2. Influence of land use and soil type on (A) the abundance of the total bacterial community and (B) the relative abundance of the Pseudomonas community (mean ± SEM). The sizes of the total bacterial and Pseudomonas communities were determined by quantitative PCR. The effects of land use and soil type were formally tested by REML analysis (Genstat). Samples were characterized by land use as either high-intensity ‘dairy’ systems or ‘other’, relatively lower-intensity pasture systems, e.g. sheep and beef grazing systems.

To reduce the size of the edaphic and environmental (abiotic) dataset, a correlation matrix was generated and, within each set of highly mutually correlated variables (>90%), all but one variable were removed from the analysis. Skewed abiotic variables were transformed to correct their distributions, and all abiotic variables were normalized to obtain homogeneous variances [44]. The transformed and normalized dataset was used in both the multivariate and univariate analyses.
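
The variable-reduction steps above can be sketched in Python as follows. Several details are assumptions rather than the authors' procedure: the greedy rule for choosing which variable of a correlated set to keep, the use of a log10(x + 1) transform for skewed variables (the text does not name the transformation), and z-score normalization. All variable names and values are hypothetical.

```python
import math

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def drop_correlated(table, threshold=0.9):
    """Greedy filter (an assumption): walk the variables in order and keep
    one only if |r| with every variable already kept is <= threshold."""
    kept = []
    for name in table:
        if all(abs(pearson(table[name], table[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

def log_transform(values):
    """Assumed transform for right-skewed variables."""
    return [math.log10(v + 1) for v in values]

def normalise(values):
    """Subtract the mean and divide by the sample standard deviation."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
    return [(v - mean) / sd for v in values]

# Hypothetical abiotic data for five sites.
abiotic = {
    "pH": [5.5, 6.1, 5.8, 6.3, 5.6],
    "total_C": [2.0, 3.0, 4.0, 5.0, 6.0],
    "total_N": [0.21, 0.29, 0.41, 0.50, 0.61],  # tracks total_C closely
    "rainfall": [900.0, 1500.0, 800.0, 1200.0, 2500.0],
}

kept = drop_correlated(abiotic)
skewed = {"rainfall"}  # hypothetical choice of which variables to transform
reduced = {
    name: normalise(log_transform(abiotic[name]) if name in skewed else abiotic[name])
    for name in kept
}
print(kept)
```

Because the greedy filter depends on variable order, a different ordering can retain a different member of each correlated set; in practice the retained variable is usually chosen on interpretive grounds.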

BIOENV analysis (biota and/or environment matching; [45]) was used to find the highest rank correlation (ρ) between the community assemblage data (Bray-Curtis matrices) and the associated soil and environmental variables (Euclidean distance matrix). The rank correlation (ρ) indicates the amount of variation in the assemblage data that can be explained by the BIOENV-selected abiotic variables. BIOENV was optimized for four variables, and P values were derived from non-parametric Mantel-type testing (99 permutations; BIO-ENV, PRIMER).
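
The subset search at the core of BIOENV can be sketched in Python as below: for every combination of up to four abiotic variables, a Euclidean distance matrix is built and its Spearman rank correlation with the biological dissimilarity matrix is recorded. This is an illustrative sketch, not the PRIMER implementation; the permutation test for P values is omitted, the input variables are assumed to be already normalized, and all names and values are hypothetical.

```python
import math
from itertools import combinations

def ranks(values):
    """Average ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def lower_triangle(matrix):
    return [matrix[i][j] for i in range(len(matrix)) for j in range(i)]

def euclidean_matrix(env, names):
    """Euclidean distances among samples using only the named variables."""
    n = len(env[names[0]])
    return [[math.sqrt(sum((env[v][i] - env[v][j]) ** 2 for v in names))
             for j in range(n)] for i in range(n)]

def bioenv(bio_dist, env, max_vars=4):
    """Return (best rho, best subset) over all subsets of up to max_vars variables."""
    bio_ranks = ranks(lower_triangle(bio_dist))
    best_rho, best_subset = -2.0, ()
    for k in range(1, min(max_vars, len(env)) + 1):
        for subset in combinations(env, k):
            env_ranks = ranks(lower_triangle(euclidean_matrix(env, subset)))
            rho = pearson(bio_ranks, env_ranks)
            if rho > best_rho:
                best_rho, best_subset = rho, subset
    return best_rho, best_subset

# Hypothetical toy data: the biological dissimilarities follow the pH gradient.
env = {
    "pH": [0.0, 1.0, 3.0, 6.0, 10.0],
    "moisture": [5.0, 0.0, 4.0, 1.0, 3.0],
}
gradient = env["pH"]
bio_dist = [[abs(a - b) / 10 for b in gradient] for a in gradient]

rho, chosen = bioenv(bio_dist, env)
print(chosen, round(rho, 3))
```

An exhaustive search is feasible only for modest numbers of variables, which is one practical reason to cap the subset size as the analysis above does.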

Step-wise regression analysis was used to select the five abiotic variables that collectively explained the most variation in abundance data. ‘Total bacteria’, as determined by qPCR analysis in this study, was added to the abiotic variables for regression analysis of functional gene abundances (GeoChip).
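
One way to picture the variable selection is forward step-wise selection on R², sketched below in Python. The text does not specify the step-wise variant or selection criterion used in Genstat, so forward selection by R² is an assumption here, as are the function names and the toy data.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def r_squared(columns, y):
    """R^2 of an ordinary least-squares fit of y on the columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * yi for row, yi in zip(design, y)) for a in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def forward_select(predictors, y, max_vars=5):
    """Add, one at a time, the variable that most increases R^2."""
    chosen, best_r2 = [], 0.0
    while len(chosen) < min(max_vars, len(predictors)):
        candidates = [v for v in predictors if v not in chosen]
        scores = {v: r_squared([predictors[c] for c in chosen + [v]], y)
                  for v in candidates}
        best = max(scores, key=scores.get)
        chosen.append(best)
        best_r2 = scores[best]
    return chosen, best_r2

# Hypothetical toy data: abundance depends on x1 and x2 but not x3.
predictors = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "x2": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],
    "x3": [2.0, 7.0, 1.0, 8.0, 2.0, 8.0],
}
abundance = [2 * a + 0.5 * b for a, b in zip(predictors["x1"], predictors["x2"])]

selected, r2 = forward_select(predictors, abundance, max_vars=2)
print(selected, round(r2, 3))
```

In this framing, adding ‘Total bacteria’ to the candidate set for the GeoChip analysis simply means including one more column among the predictors.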

For all statistical analyses, P values were considered significant when ≤0.05 and marginally significant when between 0.05 and 0.10.
