4.2.1. Distribution Adjustment on “dataset1”

Hélène Dechatre
Lucie Michel
Samuel Soubeyrand
Alban Maisonnasse
Pierre Moreau
Yannick Poquet
Maryline Pioz
Cyril Vidau
Benjamin Basso
Fanny Mondet
André Kretzschmar

All statistical analyses were performed with the statistical software R version 3.3.0 [39]. Model parameters were estimated using the “gamlss” function of the eponymous package (Rigby and Stasinopoulos, 2005). The response variable (number of observed Varroa mites per 100 bees) was modeled with a generalized additive model for location, scale, and shape (GAMLSS). GAMLSS is an extension of the generalized linear model and the generalized additive model. It is a distribution-based approach to semiparametric regression, in which all the parameters of the assumed response distribution, such as the location (e.g., mean µ), the scale (e.g., variance σ²), the shape (skewness and kurtosis), and any inflation (e.g., at zero, ν), can be modeled as additive functions of the explanatory variables. We also chose GAMLSS because it offers numerous choices for the distribution of the response variable and is suitable for time series data (Rigby and Stasinopoulos, 2001).

GAMLSS was fitted to the data by maximum (penalized) likelihood estimation implemented with the RS algorithm, which, unlike the CG algorithm, does not require accurate starting values for µ, σ, and ν to ensure convergence [40,41].

The most parsimonious model with the lowest corrected Akaike’s information criterion (AICc) [42] was selected; models whose AICc values differed by two or less were considered equivalent. We chose this selection criterion because it is the most suitable criterion for model selection in predictive models for ecology and time series applications, including forecasting [43]. It thus allows for the selection of the model that will best predict the response variable, i.e., the model with the best predictive accuracy.
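The selection rule described above can be sketched in a few lines. This is only an illustration, written in Python rather than R, and it assumes the usual small-sample formula AICc = AIC + 2k(k+1)/(n − k − 1); the function names, the candidate model labels, and the log-likelihood values are hypothetical and are not results from the study.

```python
def aicc(log_lik, k, n):
    """Corrected AIC for a model with log-likelihood log_lik, k parameters, n observations."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    aic = 2 * k - 2 * log_lik
    return aic + (2 * k * (k + 1)) / (n - k - 1)


def select_model(candidates, n, tol=2.0):
    """Pick the most parsimonious model among those within `tol` of the lowest AICc.

    candidates: list of (name, log_likelihood, number_of_parameters) tuples.
    Returns (name, aicc_value) of the selected model.
    """
    scored = [(name, k, aicc(log_lik, k, n)) for name, log_lik, k in candidates]
    best = min(score for _, _, score in scored)
    # Models within `tol` AICc units of the best are treated as equivalent.
    equivalent = [m for m in scored if m[2] - best <= tol]
    # Among equivalent models, prefer fewer parameters, then lower AICc.
    name, _, score = min(equivalent, key=lambda m: (m[1], m[2]))
    return name, score


# Hypothetical candidates: (label, log-likelihood, parameter count).
candidates = [("m1", -100.0, 5), ("m2", -98.5, 6), ("m3", -110.0, 3)]
selected, score = select_model(candidates, n=867)
```

In this made-up example, "m2" has the lowest AICc, but "m1" lies within two units of it and uses one parameter fewer, so "m1" is the one retained.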

The variables described above were transformed as follows to comply with the scaling conditions during model fitting:

where Cb (Equation (1)) is a scaled value of the number of capped brood cells Cb0; Vp (Equation (2)) is the normalized rational number of Varroa mites per 100 honey bees (called “phoretic Varroa” in the present study), given that the weight per bee is 0.14 g, and sw in Equation (2) is the sampling weight of bees. Vb is a variable called “varbrood”, built to take into account the role of the amount of brood in the regulation of Varroa reproduction and, more specifically, to account for the fact that the more spread out the capped brood, the harder it is to capture phoretic Varroa mites hidden in the capped brood. The varbrood variable was thus obtained by taking the natural (Napierian) logarithm of the number of phoretic Varroa and dividing it by the number of capped brood cells. In Equation (3), 130 corresponds to the Cb median, the multipliers 100 and 50 are needed for scaling, and +1 is used to avoid obtaining log(0). These three quantitative variables were mathematically reduced to the same scale so that their respective weights could be compared during model adjustment. The date (measured as a number of days after the first measurement) was used without transformation.
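As an illustration of the weight-based normalization behind Vp, the sketch below converts a mite count and a sample weight into mites per 100 bees, using the 0.14 g per bee stated above. The function and argument names are hypothetical, and the exact scaling applied in Equation (2) may differ from this plain per-100-bees rate.

```python
BEE_WEIGHT_G = 0.14  # weight per bee, as stated in the text


def mites_per_100_bees(mite_count, sample_weight_g, bee_weight_g=BEE_WEIGHT_G):
    """Estimate mites per 100 bees from a mite count and the weight of the bee sample.

    Assumption: the number of bees in the sample is approximated by
    sample weight / weight per bee.
    """
    if sample_weight_g <= 0 or bee_weight_g <= 0:
        raise ValueError("weights must be positive")
    estimated_bees = sample_weight_g / bee_weight_g
    return 100.0 * mite_count / estimated_bees


# Hypothetical sample: 3 mites found in 42 g of bees (about 300 bees).
rate = mites_per_100_bees(3, 42.0)
```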

The rational number of phoretic Varroa mites present at t (Vpt) was modeled in the GAMLSS framework by a zero-inflated beta distribution with mean µ, standard deviation σ, and inflation at zero ν. Different specifications for µ, σ, and ν were used (see Results section). Our models were designed to predict Vpt from explanatory variables typically collected at time t−x. Two prediction horizons x were considered: a short-term horizon (x = 1 month, hereafter model A) and a long-term horizon (x = 3 months, hereafter model B). For x = 1 (model A), all data were used to fit the models (867 observations), whereas for x = 3 (model B), all the data providing this interval were used while avoiding time-overlapping pairs of observations (93 observations). Phoretic Varroa numbers, capped brood cell numbers, and varbrood at t−x, as well as the date at t, were used as fixed factors; they are denoted by Vpt−x, Cbt−x, Vbt−x, and Dt, respectively. Moreover, an “apiary” factor (denoted Ap) was used as a random factor; it captures the variability of the apiary, the beekeeping management strategy, and year and region effects.
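To make the role of the three parameters concrete, the following sketch evaluates the log-density of a zero-inflated beta distribution: a point mass ν at zero, and a beta density weighted by 1 − ν on (0, 1). It assumes a mean/precision parametrization of the beta part (shape parameters µφ and (1 − µ)φ), which is a simplifying assumption; the way σ is parametrized in the gamlss family used in the study may differ, and the function name is hypothetical.

```python
import math


def zib_logpdf(y, mu, phi, nu):
    """Log-density of a zero-inflated beta distribution at y in [0, 1).

    mu  : mean of the beta component, 0 < mu < 1
    phi : precision of the beta component, phi > 0 (assumed parametrization)
    nu  : probability of an exact zero, 0 < nu < 1
    """
    if not (0 < mu < 1 and phi > 0 and 0 < nu < 1):
        raise ValueError("need 0 < mu < 1, phi > 0, 0 < nu < 1")
    if y == 0:
        return math.log(nu)  # point mass at zero
    if not 0 < y < 1:
        return -math.inf  # outside the support
    a, b = mu * phi, (1 - mu) * phi  # beta shape parameters
    log_beta_fn = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (
        math.log1p(-nu)
        + (a - 1) * math.log(y)
        + (b - 1) * math.log1p(-y)
        - log_beta_fn
    )


def zib_loglik(ys, mu, phi, nu):
    """Total log-likelihood of observations ys under constant mu, phi, nu."""
    return sum(zib_logpdf(y, mu, phi, nu) for y in ys)
```

In a GAMLSS fit, µ, σ, and ν are not constants but functions of the explanatory variables; the constant-parameter likelihood here only shows how zeros and positive values each contribute.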
