# Also in the Article

Data analysis
This protocol is extracted from research article:
A global synthesis reveals biodiversity-mediated benefits for crop production

Procedure

Data standardization. Before performing the analyses, we standardized the predictors (abundance, richness, and landscape simplification) using z scores within each study. This standardization was necessary to allow comparisons between studies with differences in methodology and landscape ranges (53). Moreover, this allows the separation of within-study effects from between-study effects. This separation is important because it prevents the risk of misinterpreting the results based on studies differing in methodology and landscape gradients, by erroneously extrapolating from between-study effects to within-study effects or vice versa (54, 55).

Relationship between richness and ecosystem services. The relationship between richness of service-providing organisms and related ecosystem services (Fig. 1, B and C) was estimated from a Bayesian multilevel (partial pooling) model that allowed the intercept and the slope to vary among studies (also commonly referred to as random intercepts and slopes), following the equation$ESi∼N(αj[i]+βj[i]RICi,σj)$$αj∼N(μα,σα)$$βj∼N(μβ,σβ)$where ESi is the ecosystem service index (pollination or pest control depending on the model), RICi is richness of service-providing organisms (pollinator or natural enemy richness depending on the model), and j[i] represents observation i of study j. This partial-pooling model estimated both study-level responses [yielding an estimate for each study (βj)] and the distribution from which the study-level estimates were drawn, yielding a higher-level estimate of the overall response across crop systems (μβ). In addition, it accounted for variation in variance and sample size across observations (for example, studies). The intercepts αj and slopes βj varied between studies according to a normal distribution with mean μ and SD σ. Independent within-study errors also followed a normal distribution εi ~ N(0,σ). We used weakly informative priors: normal (0, 10) for the population-level parameters (α, β) and half–Student t (3, 0, 5) for the group-level SD and residual SD.

Direct and indirect effects of richness, abundance, and evenness on service provisioning. As natural communities vary not only in number of species but also in relative abundance of each species (evenness) and the total number of individuals (abundance), it is important to incorporate these attributes when assessing or modeling biodiversity effects (12, 56). In a Bayesian multivariate response model, a form of path analysis, we partitioned the relative importance of richness and total and relative abundance in driving biodiversity-ecosystem relationships. We hypothesized that richness drives both abundance and evenness according to a revised version of the “more individuals hypothesis” (57) that proposes that more species can exploit more diverse resources and may therefore maintain more individuals than species-poor communities. Specifically, we tested (i) whether richness per se directly influences ecosystem services or is instead mediated by abundance (Eq. 1, 3; Fig. 2, A and B) and (ii) whether richness per se directly influences ecosystem services or is instead mediated by evenness (Eq. 2, 4; Fig. 2, C and D). Before analysis, we also checked for data collinearity among abundance, evenness, and richness by calculating the variance inflation factor (VIF). No signal of collinearity was detected in either model (VIFs were below 1.8). In a preliminary analysis, we also tested a possible interaction between richness and evenness. No significant interaction was found in both pollination and pest control models. To illustrate, we first show the univariate multilevel (partial pooling) models following these equations$RICi∼N(αj[i]+βj[i]ABUi,σj)$(1)$RICi∼N(αj[i]+βj[i]EVEi,σj)$(2)$ESi∼N(αj[i]+βj[i]RICi+βj[i]ABUi,σj)$(3)$ESi∼N(αj[i]+βj[i]RICi+βj[i]EVEi,σj)$(4)where RICi is richness of service-providing organisms, ABUi is abundance, EVEi is evenness, ESi is the ecosystem service index, and the index j[i] represents observation i of study j. We then specified both multivariate multilevel models in a matrix-vector notion (53), as follows$Yi∼N(XiBr[i],Σj)$$Br∼N(MB,ΣB)$where Yi is the matrix of response variables with observations i as rows and variables r as columns, Xi is the matrix of all predictors for response r, Br are the regression parameters (α and β) for response r, MB represents the mean of the distribution of the regression parameters, and ∑B is the covariance matrix representing the variation of the regression parameters in the population groups. We used weakly informative priors: normal (0, 10) for the population-level parameters (α, β) and half–Student t (3, 0, 5) for the group-level SD and residual SD. In building the model, we ensured that no residual correlation between ESi and RICi, between ESi and EVEi, or between ESi and ABUi was estimated [see the “set_rescor” function in the package brms; (58)].

Direct and indirect effects of landscape simplification on ecosystem services. To estimate the direct and indirect effects of landscape simplification on richness and associated ecosystem services, we used two models. In a Bayesian multivariate response model with causal mediation effects (hereafter, mediation model), we tested whether landscape simplification directly influences ecosystem services or is mediated by richness. Mediation analysis is a statistical procedure to test whether the effect of an independent variable X on a dependent variable Y (XY) is at least partly explained via the inclusion of a third hypothetical variable, the mediator variable M (XMY) (59). The three causal paths a, b, and c′ correspond to X’s effect on M, M’s effect on Y, and X’s effect on Y accounting for M, respectively. The three causal paths correspond to parameters from two regression models, one in which M is the outcome and X is the predictor and one in which Y is the outcome and X and M are the simultaneous predictors (fig. S6). From these parameters, we can compute the mediation effect (the product ab, also known as the indirect effect) and the total effect of X on Y$c=c′+ab$

Thus, the total causal effect of X, which is captured by the parameter c, can be decomposed precisely into two components, a direct effect c′ and an indirect (mediation) effect ab (the product of paths a and b). The model included the ecosystem service index as response, landscape simplification as predictor, and richness as mediator (Fig. 3). The separate regression models that made up the Bayesian multivariate multilevel model followed these equations$RICi∼N(αj[i]+βj[i]LANDi,σj)$$ESi∼N(αj[i]+βj[i]RICi+βj[i]LANDi,σj)$

We then compiled a multilevel path analysis testing the direct and indirect effects of landscape simplification on ecosystem services via changes in both richness and abundance (Eq. 5, 6, 8; fig. S4, A and B) or richness and evenness (Eq. 5, 7, 9; fig. S4, C and D). The separate regression models that made up the model followed these equations$RICi∼N(αj[i]+βj[i]LANDi,σj)$(5)$ABUi∼N(αj[i]+βj[i]LANDi,σj)$(6)$EVEi∼N(αj[i]+βj[i]LANDi,σj)$(7)$ESi∼N(αj[i]+βj[i]RICi+βj[i]ABUi+βj[i]LANDi,σj)$(8)$ESi∼N(αj[i]+βj[i]RICi+βj[i]EVEi+βj[i]LANDi,σj)$(9)where RICi is richness of service-providing organisms, LANDi is landscape simplification measured as the percentage of arable land surrounding each study site, ABUi is abundance, EVEi is evenness, ESi is the ecosystem service index, and the index j[i] represents observation i of study j. We then specified multivariate multilevel models in a matrix-vector notion, as explained above. The mediation analysis was implemented using the R package sjstats [v. 0.15.0; (60)].

Cascading effects of landscape simplification on final crop production. For 42 studies and 675 fields (pollination model, n = 438 fields of 27 studies; pest control model, n = 236 fields of 15 studies; table S1), the data allowed us to use a multilevel path analysis to examine cascading effects of landscape simplification on final crop production via changes in richness, abundance, evenness, and ecosystem services. We expected that (i) landscape simplification would have a direct effect on richness, abundance, and dominance of service-providing organisms and (ii) richness and abundance dominance would relate positively to intermediate services, which, in turn, (iii) would increase final crop production (Fig. 4 and fig. S4). The indirect effects of richness and abundance (Eq. 10, 11, 13, 15; Fig. 4) or richness and evenness (Eq. 10, 12, 14, 15; fig. S4) were tested separately$RICi∼N(αj[i]+βj[i]LANDi,σj)$(10)$ABUi∼N(αj[i]+βj[i]LANDi,σj)$(11)$EVEi∼N(αj[i]+βj[i]LANDi,σj)$(12)$ESi∼N(αj[i]+βj[i]RICi+βj[i]ABUi,σj)$(13)$ESi∼N(αj[i]+βj[i]RICi+βj[i]EVEi,σj)$(14)$PRODi∼N(αj[i]+βj[i]ESi,σj)$(15)where RICi is richness of service-providing organisms, LANDi is landscape simplification measured as the percentage of arable land surrounding each study site, ABUi is abundance, EVEi is evenness, ESi is the ecosystem service index, PRODi is crop production, and the index j[i] represents observation i of study j. We specified a multivariate multilevel model in a matrix-vector notion, as explained above.

Parameter estimation. All analyses were conducted in the programming language Stan through R (v. 3.4.3) using the package brms [v. 2.2.0; (58)]. Stan implements Hamiltonian Monte Carlo and its extension, the No-U-Turn Sampler. These algorithms converge much more quickly than other Markov chain Monte Carlo algorithms especially for high-dimensional models (61). Each model was run with four independent Markov chains of 5000 iterations, discarding the first 2500 iterations per chain as warm-up and resulting in 10,000 posterior samples overall. Convergence of the four chains and sufficient sampling of posterior distributions were confirmed by (i) the visual inspection of parameter traces, (ii) ensuring a scale reduction factor $(R∼)$below 1.01, and (iii) an effective size (neff) of at least 10% of the number of iterations. For each model, posterior samples were summarized on the basis of the Bayesian point estimate (median), SE (median absolute deviation), and posterior uncertainty intervals by HDIs, a type of credible interval that contains the required mass such that all points within the interval have a higher probability density than points outside the interval (62). The advantage of the Bayesian approach is the possibility to estimate not only expected values for each parameter but also the uncertainty associated with these estimates (63). Thus, we calculated 80, 90, and 95% HDIs for parameter estimates.

Sensitivity analyses. Given that different methods were used in different studies to quantify richness, ecosystem services, and final crop production, we measured the sensitivity of our results to methodological differences.

(1) We verified whether treating each annual dataset from multiyear studies separately could incorrectly account for the dependence of the data. We refitted the model testing the relationship between richness and ecosystem services including year nested within study (that is, study defined as study-crop combination). Then, we compared models (year-independent model versus year-nested model) using leave-one-out (LOO) cross-validation, a fully Bayesian model selection procedure for estimating pointwise out-of-sample prediction accuracy (64). We calculated the expected log pointwise predictive density $(elpd^loo)$, using the log-likelihood evaluated at the posterior simulations of the parameter values. Model comparison was implemented using R package loo [v. 2.0.0; (65)]. We found that the year-nested model had a lower average predictive accuracy than the year-independent model for both pollination $(Δelpd^loo=−1.79)$ and pest control $(Δelpd^loo=−1.09)$ and therefore retained the year-independent model in our analysis.

(2) We verified whether taxonomic resolution influenced the interpretation of results. We recalculated richness considering only organisms classified at the fine taxonomy level (species or morphospecies levels) and refitted the model testing the effect of richness on ecosystem services. We found no evidence that taxonomic resolution influenced our results. With a fine taxonomic resolution, the effects of richness on ecosystem services (βpollinators = 0.1535, 90% HDIs = 0.0967 to 0.2141; βenemies = 0.2264, 90% HDIs = 0.1475 to 0.3065; table S2) were nearly identical to the estimates presented in the main text (βpollinators = 0.1532, 90% HDIs = 0.0892 to 0.2058; βenemies = 0.2093, 90% HDIs = 0.1451 to 0.2779; table S2).

(3) We verified whether the sampling methods used to collect pollinators (active versus passive sampling techniques) influenced the relationship between pollinator richness and pollination using Bayesian hypothesis testing (58). Passive methods do not directly capture flower visitors and may introduce some bias (for example, they may underestimate flower visitors). However, our estimate was not influenced by sampling method (the one-sided 90% credibility interval overlapped zero; table S11). In accordance with this finding, the evidence ratio showed that the hypothesis tested (that is, estimates of studies with active sampling > estimates of studies with passive sampling) was only 0.78 times more likely than the alternative hypothesis.

(4) We verified whether methodological differences in measuring pollination and pest control services influenced the relationship between richness and ecosystem services. Using Bayesian hypothesis testing, we tested whether the estimates differed among methods. The two-sided 95% credibility interval overlapped zero in all comparisons (estimates did not differ significantly; table S12), indicating that our estimate was not influenced by methodological differences in measuring ecosystem services. Furthermore, we tested effects including only inverted pest activity as a reflection of pest control. We found positive effects of natural enemy richness on inverted pest activity (β = 0.1307, 90% HDIs = 0.0102 to 0.2456), indicating that results were robust to the type of pest control measure considered.

(5) As honey bees are the most important and abundant flower visitors in some locations, we verified the potential influence of honey bees on our results by refitting the path models testing direct and indirect effects of richness, abundance, and evenness on pollination services with honey bees. A positive direct contribution of richness to pollination was confirmed even after including honey bees (fig. S7). In these models, both abundance and evenness showed a larger effect compared to models without honey bees (Fig. 2).

(6) Insecticide application during the course of the experiment could mask the effect of pest control on crop production (37, 38). We verified the potential influence of insecticide application on our results by refitting the model considering only fields where the study area was not sprayed with insecticide during the course of the experiment (n = 85 fields of 14 studies). We found a pest control effect that was masked when considering all sites combined (with and without insecticide; fig. S4). We therefore show the insecticide-free model in the main text (Fig. 4).

(7) We verified the consistency of our results considering only studies that measured area-based yield (submodel). Only the cascading effects of landscape simplification on final crop production via changes in richness were tested in a simplified model. We found no evident differences between the submodel (fig. S8) and the full model presented in the main text (Fig. 4).

Note: The content above has been extracted from a research article, so it may not display correctly.

Q&A