Statistical analysis and mapping
This protocol is extracted from research article:
Digital mapping of soil texture in ecoforest polygons in Quebec, Canada
PeerJ, Jun 23, 2021; DOI: 10.7717/peerj.11685

To ensure that the predicted sand, silt and clay fractions at each coordinate are consistent (they should total 100%), we first transformed the soil texture fractions with the isometric log ratio (ILR), using the compositions R package, version 1.40-4 (Van den Boogaart, 2020). We then used the two resulting functionally independent values (V1 and V2) for subsequent statistical modeling.
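
As an illustration of the transformation and its inverse, the following minimal Python sketch maps a three-part composition to two unconstrained coordinates and back. It is not the compositions package code: the orthonormal basis in BASIS is one common choice made here for illustration and may differ from the package default, and the function names ilr and ilr_inverse are hypothetical. All fractions are assumed to be strictly positive, since the logarithm of zero is undefined.

```python
import math

# Orthonormal basis for a 3-part composition (sand, silt, clay).
# Assumption: this basis is chosen for illustration only; the default
# basis of the compositions R package may differ.
BASIS = [
    (1 / math.sqrt(2), -1 / math.sqrt(2), 0.0),
    (1 / math.sqrt(6), 1 / math.sqrt(6), -2 / math.sqrt(6)),
]


def ilr(sand, silt, clay):
    """Map three strictly positive fractions to two ILR coordinates."""
    logs = [math.log(part) for part in (sand, silt, clay)]
    # Each basis row sums to zero, so the result does not depend on
    # whether the fractions are expressed in % or as proportions.
    return tuple(sum(b * l for b, l in zip(row, logs)) for row in BASIS)


def ilr_inverse(v1, v2, total=100.0):
    """Back-transform two ILR coordinates to fractions summing to `total`."""
    clr = [v1 * BASIS[0][i] + v2 * BASIS[1][i] for i in range(3)]
    expd = [math.exp(c) for c in clr]
    scale = total / sum(expd)
    return tuple(scale * e for e in expd)


# Round trip on a hypothetical sample: 60% sand, 30% silt, 10% clay.
v1, v2 = ilr(60.0, 30.0, 10.0)
sand, silt, clay = ilr_inverse(v1, v2)
```

Because the back-transformation rescales the three parts to a fixed total, any pair of predicted (V1, V2) values yields fractions that sum to 100%, which is the consistency property described above.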

Sampling plot data and polygons of the ecoforest map were filtered to exclude agricultural and unproductive forest lands, organic soils (fen, bog), anthropogenic infrastructure, and water surfaces. Therefore, this study considers only productive forest land (defined as having the potential to produce more than 30 m³ of timber per hectare in 120 years or less) characterized by mineral soils.

We created dummy variables by converting each categorical variable into as many binary variables as it has categories, using the caret R package, version 6.0-85 (Kuhn, 2020). We then used a tree-based random forest machine learning algorithm (method ranger in the caret package) to predict the V1 and V2 orthogonal components of soil texture from all covariates in the analysis (133 covariates, including the dummy variables). To fine-tune the models, we also used the caret package to identify optimal values of the tuning parameters based on cross-validation performance. We used 5 repeats of 5-fold cross-validation and tested a large range of tuning parameter values, selecting as optimal the model with the smallest average root-mean-square error. We also tested other machine learning algorithms, including gradient boosting, cubist, and k-nearest neighbors, but with our dataset, the random forest algorithm performed much better than these alternatives.
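
The dummy-variable conversion and the tuning procedure can be sketched in plain Python as follows. This is an illustrative sketch only, not the caret or ranger code: the function names are hypothetical, the data are synthetic, and a simple one-dimensional nearest-neighbor averager stands in for the random forest so that the example stays self-contained.

```python
import math
import random


def one_hot(values):
    """Convert a categorical column into one binary column per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories


def repeated_kfold(n, k=5, repeats=5, seed=0):
    """Yield (train_idx, test_idx) pairs for `repeats` shuffled k-fold splits."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]
        for f in range(k):
            train = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield train, folds[f]


def rmse(obs, pred):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def select_by_cv(candidates, fit_predict, x, y, k=5, repeats=5, seed=0):
    """Return the candidate with the smallest average RMSE, and all scores."""
    scores = {}
    for cand in candidates:
        errors = []
        # The same seed gives every candidate identical folds.
        for train, test in repeated_kfold(len(y), k, repeats, seed):
            pred = fit_predict(cand,
                               [x[i] for i in train],
                               [y[i] for i in train],
                               [x[i] for i in test])
            errors.append(rmse([y[i] for i in test], pred))
        scores[cand] = sum(errors) / len(errors)
    return min(scores, key=scores.get), scores


def knn_fit_predict(n_neighbors, x_train, y_train, x_test):
    """Hypothetical stand-in model (NOT a random forest): mean of nearest neighbors."""
    preds = []
    for x0 in x_test:
        order = sorted(range(len(x_train)), key=lambda i: abs(x_train[i] - x0))
        nearest = order[:n_neighbors]
        preds.append(sum(y_train[i] for i in nearest) / len(nearest))
    return preds


# Synthetic data: a noise-free linear trend, used only to exercise the code.
x = [float(i) for i in range(50)]
y = [2.0 * xi for xi in x]
best, scores = select_by_cv([1, 25], knn_fit_predict, x, y)
```

With the defaults, each candidate is scored on 25 held-out folds (5 repeats of 5 folds), and the candidate with the lowest mean RMSE is kept, mirroring the selection rule described above.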

We evaluated the selected models by plotting observed vs. predicted values and comparing slope and intercept regression parameters against the 1:1 line (Piñeiro et al., 2008). We also computed the coefficient of determination (R²), mean absolute error (MAE) and mean bias error (MBE) statistics using the postResample function of the caret package in the R programming environment (Willmott & Matsuura, 2005; Kuhn, 2020). This function calculates R² by squaring the correlation between the observed and predicted values. We performed this evaluation on the V1 and V2 orthogonal components of soil texture and on the corresponding compositions (sand, silt and clay fractions) back-transformed from the modeled ILR-transformed values. Finally, we assessed the remaining spatial dependence structure of the model residuals by computing variograms of the cross-validation residuals using the gstat R package, version 2.0-4 (Pebesma, 2004; Gräler, Pebesma & Heuvelink, 2016).
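
For readers who want to see how these statistics relate to one another, here is a minimal Python sketch. It is not the postResample code; the function name evaluate is hypothetical, and two conventions are assumptions made here: the regression places observed values on the y-axis and predicted values on the x-axis, and MBE is taken as the mean of predicted minus observed, so a positive value indicates overprediction.

```python
def evaluate(obs, pred):
    """Return R2, slope, intercept, MAE and MBE for paired observations."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_pred = sum(pred) / n
    s_pp = sum((p - mean_pred) ** 2 for p in pred)
    s_oo = sum((o - mean_obs) ** 2 for o in obs)
    s_po = sum((p - mean_pred) * (o - mean_obs) for p, o in zip(pred, obs))

    # R2 as the squared Pearson correlation between observed and predicted.
    r2 = s_po ** 2 / (s_pp * s_oo)

    # Assumption: regress observed (y) on predicted (x); a perfect model
    # gives slope 1 and intercept 0, i.e. the 1:1 line.
    slope = s_po / s_pp
    intercept = mean_obs - slope * mean_pred

    mae = sum(abs(p - o) for p, o in zip(pred, obs)) / n
    # Assumption: MBE = mean(predicted - observed).
    mbe = sum(p - o for p, o in zip(pred, obs)) / n
    return {"r2": r2, "slope": slope, "intercept": intercept,
            "mae": mae, "mbe": mbe}


# Hypothetical example: predictions that are uniformly 0.5 too high.
stats = evaluate([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5])
```

The example shows why R² alone is not sufficient: a constant offset leaves the squared correlation at 1, while the intercept, MAE and MBE all reveal the bias.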

After this parameterization, we used the models to predict the V1 and V2 orthogonal components of soil texture and the back-transformed particle size composition (sand, silt and clay fractions) for each ecoforest polygon of the provincial forest map. We also estimated the 95% prediction intervals for V1 and V2 with the quantile regression approach (Q.975–Q.025; Meinshausen, 2006; Vaysse & Lagacherie, 2017), using the ranger R package, version 0.12.1 (Wright & Ziegler, 2017). To reduce computing time, we produced provincial maps with the SIFORT mapping system, which translates the conventional ecoforest polygon map (vector or object-oriented images) into a grid of tiles (mixed vector and raster images) separated by 15″ (∼375 m) (Pelletier, Dumont & Bédard, 2007). Each tile’s attributes correspond to the information for the polygon at the center of the tile on the conventional ecoforest map. This systematic sampling of the conventional ecoforest polygon map (∼7.7 million polygons) results in a relatively high-definition raster map of Quebec’s forests (∼4.1 million tiles). In addition, to illustrate the variability of predictions at finer spatial scales, we produced polygon maps at a chosen location using version 3.4 of QGIS software (QGIS, 2020). Particle size composition (sand, silt and clay fractions) was presented on a ternary color scale, with the hexadecimal RGB codes for the ternary compositions computed with the tricolore R package, version 1.2.2 (Schöley & Kashnitsky, 2020). We used the dplyr R package, version 0.8.3 (Wickham et al., 2019) for data manipulation, and the ggplot2 R package, version 3.3.0 (Wickham, 2016) and the cowplot R package, version 1.0.0 (Wilke, 2019) for graphic production.
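
The width of a 95% prediction interval, Q.975 minus Q.025, can be illustrated with a short Python sketch. This is a simplification and not the ranger implementation: a quantile regression forest derives conditional quantiles from weighted training responses, whereas the sketch below takes unweighted empirical quantiles of a hypothetical sample of predicted values. The function names and the linear-interpolation rule are assumptions made here for illustration.

```python
def quantile(values, p):
    """Empirical quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = p * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def prediction_interval(values, lower_p=0.025, upper_p=0.975):
    """Return the lower bound, upper bound and width of the interval."""
    low = quantile(values, lower_p)
    high = quantile(values, upper_p)
    return low, high, high - low


# Hypothetical sample of values standing in for a conditional distribution.
sample = [float(i) for i in range(101)]
low, high, width = prediction_interval(sample)
```

A wider interval at a given location indicates greater uncertainty in the predicted V1 or V2 value there.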
