We compiled presence location points of selected tree species (including coffee and cocoa) from the Global Biodiversity Information Facility (GBIF)47, MAPFORGEN48 and the database of farm inventories used to select the tree species. No distinction was made between locations in natural forests and those on farms, because this information was not always available in the original sources.
Records with no geographic information or with obvious errors, such as incomplete coordinates, locations in the ocean and mismatches between administrative data and coordinates, were excluded from the analysis. To identify such mismatches, we compared the collected presence data and their administrative-boundary information with the DIVA-GIS database49 and removed records that did not agree. Presence locations from 1959 or earlier were also removed to match the period of the current baseline climate used. Finally, we reduced the possible effects of sampling bias and spatial autocorrelation through systematic sampling50. This approach consists of overlaying a grid of a defined cell size (2.5 arc-min in our case) and randomly sampling one presence point per grid cell. In the assessment by Fourcade et al.50, this approach performed well relative to the other tested approaches irrespective of species and bias type, which matches our case.
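The coordinate/date filtering and the grid-based systematic sampling described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: records are assumed to be dicts with hypothetical `lat`, `lon` and `year` keys, only the coordinate-range and date filters are shown (the ocean and DIVA-GIS administrative checks require boundary layers and are omitted), undated records are dropped as an assumption, and the 2.5 arc-min grid is assumed to be aligned with the equator and the 0° meridian.

```python
import math
import random
from collections import defaultdict

CELL_DEG = 2.5 / 60  # 2.5 arc-minutes expressed in decimal degrees


def clean_records(records, min_year=1960):
    """Drop records with missing/out-of-range coordinates or dated before min_year.

    Assumption: undated records are also dropped, since their period is unknown.
    """
    kept = []
    for r in records:
        lat, lon, year = r.get("lat"), r.get("lon"), r.get("year")
        if lat is None or lon is None:
            continue  # no (or incomplete) geographic information
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            continue  # obviously erroneous coordinates
        if year is None or year < min_year:
            continue  # collected in 1959 or earlier (or undated)
        kept.append(r)
    return kept


def grid_thin(records, cell=CELL_DEG, seed=0):
    """Systematic sampling: keep one randomly chosen record per occupied grid cell."""
    rng = random.Random(seed)
    cells = defaultdict(list)
    for r in records:
        # Integer column/row index of the grid cell containing the point
        key = (math.floor(r["lon"] / cell), math.floor(r["lat"] / cell))
        cells[key].append(r)
    # Iterate cells in sorted order so a fixed seed gives a reproducible result
    return [rng.choice(group) for _, group in sorted(cells.items())]
```

Typical use would be `thinned = grid_thin(clean_records(raw_records), seed=1)`, run separately for each species so that thinning never merges records of different species.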
The final dataset of validated, spatially thinned presence locations comprised 130,480 occurrences for the 100 tree species combined (Supplementary Table S2), 2,194 location points for coffee and 1,241 location points for cocoa. Because absence data were not available, we generated 1,000 random pseudo-absence locations within the study area for each species, sampled without replacement using the R51 package dismo52.
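The pseudo-absence step can be illustrated with a small Python analogue. This is a hedged sketch, not dismo's implementation: it assumes the study area is a boolean raster mask with a known top-left corner (`x_min`, `y_max`) and cell size, draws distinct cells uniformly without replacement, and returns cell-centre coordinates. It does not reproduce options of dismo's `randomPoints`, such as excluding cells that contain presence records; the function name and parameters are hypothetical.

```python
import numpy as np


def random_pseudo_absences(mask, x_min, y_max, cell, n=1000, seed=0):
    """Draw n distinct study-area cells (without replacement) from a boolean mask.

    Returns an (n, 2) array of (lon, lat) cell-centre coordinates.
    Assumes row 0 is the northern edge of the raster (y_max) and column 0
    the western edge (x_min), as in a standard north-up grid.
    """
    rows, cols = np.nonzero(mask)  # indices of cells inside the study area
    if n > rows.size:
        raise ValueError("study area has fewer cells than requested points")
    rng = np.random.default_rng(seed)
    pick = rng.choice(rows.size, size=n, replace=False)  # no cell drawn twice
    lon = x_min + (cols[pick] + 0.5) * cell
    lat = y_max - (rows[pick] + 0.5) * cell
    return np.column_stack([lon, lat])
```

Because cells rather than continuous coordinates are sampled, "without replacement" guarantees that no two pseudo-absences fall in the same grid cell, consistent with the one-point-per-cell thinning of the presence data.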