Biomod2 provides an ensemble platform of ten SDM algorithms, of which we initially selected six as ensemble candidates: generalized linear model (GLM), gradient boosted machine learning (GBM), generalized additive model (GAM), artificial neural networks (ANN), random forest classifier (RF), and maximum entropy model (MAXENT). Algorithms that could not be fitted successfully in every NFI region were removed. This excluded GAM and MAXENT and left the same four algorithms (GLM, GBM, ANN, and RF) to model all the NFI regions.
The single‐algorithm oak probability rasters were averaged to give an ensemble prediction. The literature suggests that absences may be drawn by random selection and combined with presence records (Barbet‐Massin et al., 2012). Given the large difference between the number of presence records available and the number of candidate absence records, it was necessary to balance the presence and absence data while still representing the full environmental variation of an NFI region. We therefore randomly sampled 15 replicate absence datasets, each with the same number of points as the presence data. Using more than one absence dataset allowed us to cover a wider range of environmental conditions than a single absence dataset would. Each of the 15 absence datasets was a random subset of the absence pixels from broadleaved woodland patches (NFI map) that coincided with PFE map polygons of forest without oak. To avoid overfitting, a cross‐validation procedure was applied using a 50% data partition (Lobo & Tognelli, 2011). In biomod2, the cross‐validation procedure was repeated 30 times for each of the 15 presence–absence groups.
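The sampling design described above (15 balanced absence replicates, each split 50/50 and repeated 30 times) can be sketched in a few lines. This is only an illustration of the bookkeeping, not the biomod2 implementation: the function names, the integer record identifiers, and the simple unstratified random split are all assumptions made for the example.

```python
import random

def sample_absence_replicates(absence_ids, n_presence, n_replicates=15, seed=0):
    """Draw n_replicates random absence sets, each as large as the presence set."""
    rng = random.Random(seed)
    return [rng.sample(absence_ids, n_presence) for _ in range(n_replicates)]

def split_half(records, rng, calibration_fraction=0.5):
    """Randomly partition records into calibration and evaluation subsets."""
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * calibration_fraction)
    return shuffled[:cut], shuffled[cut:]

def build_runs(presence_ids, absence_ids, n_replicates=15, n_runs=30, seed=0):
    """Return one (replicate, run, calibration, evaluation) tuple per model fit."""
    rng = random.Random(seed)
    replicates = sample_absence_replicates(
        absence_ids, len(presence_ids), n_replicates, seed
    )
    runs = []
    for rep_index, absences in enumerate(replicates):
        # Label presences 1 and absences 0 so both classes are balanced.
        labelled = [(pid, 1) for pid in presence_ids] + [(aid, 0) for aid in absences]
        for run_index in range(n_runs):
            calibration, evaluation = split_half(labelled, rng)
            runs.append((rep_index, run_index, calibration, evaluation))
    return runs

# Hypothetical identifiers: 20 presence pixels and 300 candidate absence pixels.
presence = list(range(20))
absence = list(range(100, 400))
runs = build_runs(presence, absence)
```

With 15 replicates and 30 repetitions, each algorithm is fitted 450 times per region in this sketch, and every fit sees a different random half of a balanced presence–absence dataset.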
Using this procedure, we calculated the true skill statistic (TSS) for each replicate, run, and algorithm, and evaluated the models on the size of their TSS. The TSS ranges from −1 to +1, with +1 indicating perfect prediction and values of 0 or below indicating performance no better than random (Allouche et al., 2006). A single oak probability raster prediction was calculated as the weighted mean of the four algorithm predictions, using TSS scores as weights. The SDMs were parameterized separately for each of the 14 NFI regions in Britain, which allowed the models to reflect regional variation in management, site selection, and site condition of oak woodland stands.
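As a minimal sketch of the two calculations in this paragraph, the code below computes the TSS as sensitivity plus specificity minus one (Allouche et al., 2006) and then forms a TSS-weighted mean of per-algorithm probabilities. The function names and the toy values are hypothetical. The sketch also assumes that predictions have already been thresholded to 0/1 before scoring and that one summary TSS per algorithm is used as the weight; the text does not specify either detail.

```python
def true_skill_statistic(observed, predicted):
    """TSS = sensitivity + specificity - 1, from binary observations and predictions."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # proportion of presences correctly predicted
    specificity = tn / (tn + fp)  # proportion of absences correctly predicted
    return sensitivity + specificity - 1

def tss_weighted_mean(predictions, tss_scores):
    """Combine per-algorithm probability lists into one list, weighting by TSS.

    predictions: dict mapping algorithm name to a list of cell probabilities.
    tss_scores: dict mapping algorithm name to its (positive) TSS weight.
    """
    total_weight = sum(tss_scores[name] for name in predictions)
    n_cells = len(next(iter(predictions.values())))
    return [
        sum(predictions[name][i] * tss_scores[name] for name in predictions) / total_weight
        for i in range(n_cells)
    ]

# Toy evaluation data: three presences and three absences.
tss_example = true_skill_statistic([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])

# Toy two-cell "rasters" for two algorithms, with hypothetical TSS weights.
ensemble = tss_weighted_mean(
    {"GLM": [0.2, 0.8], "RF": [0.6, 0.4]},
    {"GLM": 0.25, "RF": 0.75},
)
```

Weighting by TSS lets better-performing algorithms contribute more to the ensemble raster; with equal TSS scores the result reduces to a simple mean.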