# Machine learning

This protocol is extracted from the research article:
Explainable identification and mapping of trees using UAV RGB image and deep learning.
Sci Rep, Jan 13, 2021.

Procedure

To construct a model for object identification, we used the publicly available package PyTorch v0.4.1 [39] as a deep learning framework and four standard neural network models, namely AlexNet [23], VGG16 [40], ResNet18, and ResNet152 [41], for fine-tuning. Fine-tuning is an effective way to improve learning performance, especially when the amount of data is insufficient for training from scratch [42]. We took each neural network model pretrained on the ImageNet dataset [43] and trained all of its layers on our data. In the CNN training phase, we augmented the training images eightfold by flipping and rotating them; further augmentation did not improve accuracy. As input to the CNN, we applied 'random resized crop' at 224 × 224 pixels for training, which crops the given image to a random size and aspect ratio. For validation and testing, we resized the images to 256 × 256 pixels and applied 'centre crop' at 224 × 224 pixels. These cropping operations extracted a single resized image (patch) from each cropped image. The ranges of the other learning settings are outlined in Supplementary Table S4.
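The eightfold augmentation described above (flips combined with 90° rotations, i.e. the eight symmetries of a square) can be sketched in plain Python. The helper names below are ours, not from the article:

```python
def rotate90(img):
    """Rotate a 2D image (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def hflip(img):
    """Mirror a 2D image horizontally."""
    return [row[::-1] for row in img]

def eightfold_augment(img):
    """Return the 8 flip/rotation variants of an image:
    4 rotations of the original plus 4 rotations of its mirror."""
    variants = []
    for base in (img, hflip(img)):
        current = base
        for _ in range(4):
            variants.append(current)
            current = rotate90(current)
    return variants

if __name__ == "__main__":
    # A small asymmetric "image", so all 8 variants are distinct.
    img = [[1, 2],
           [3, 4]]
    aug = eightfold_augment(img)
    print(len(aug))  # 8
```

In practice the same effect is obtained with random flip/rotation transforms in a data-loading pipeline; the sketch only makes explicit why the multiplier is exactly eight.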

To evaluate the performance of the CNN, we used an SVM as a comparison machine learning method. We used the average and standard deviation of each band and of the GLCM texture values as features. The GLCM is a spatial co-occurrence matrix that records the relationships between pixel values and uses these relationships to compute texture statistics [44]. When calculating the GLCM, images with a large bit depth incur high computational cost. In our case, the images converted to greyscale were 8-bit data. Reducing the bit depth is known to cause only a minor decrease in classification accuracy; hence, we rescaled the images from 8-bit to 5-bit [45,46]. After calculating the GLCM, we extracted five GLCM texture features: angular second moment (ASM), contrast, dissimilarity, entropy, and homogeneity. They are defined in Eqs. (1)–(5):

$$\mathrm{ASM}=\sum_{i,j} P_{i,j}^{2} \quad (1)$$

$$\mathrm{Contrast}=\sum_{i,j} P_{i,j}\,(i-j)^{2} \quad (2)$$

$$\mathrm{Dissimilarity}=\sum_{i,j} P_{i,j}\,\lvert i-j\rvert \quad (3)$$

$$\mathrm{Entropy}=\sum_{i,j} -P_{i,j}\,\ln P_{i,j} \quad (4)$$

$$\mathrm{Homogeneity}=\sum_{i,j} \frac{P_{i,j}}{1+(i-j)^{2}} \quad (5)$$
where $P_{i,j}$ is the GLCM entry at row $i$ and column $j$. We computed these GLCM texture features at each pixel, excluding pixels close to the image margin, and then calculated their mean and standard deviation for each image. Another important parameter that affects classification performance is the kernel size [47,48]. To determine the most suitable kernel size for the GLCM operation, we calculated the GLCM texture features with kernel sizes of 3, 11, 19, 27, 35, 43, 51, and 59. For SVM validation, we used a radial basis function (RBF) kernel and conducted a parameter grid search over gamma from $10^{-1}$ to $10^{-5}$ and cost from $1$ to $10^{5}$. From the grid search, we obtained the best validation accuracy and the best parameters at each GLCM kernel size (Supplementary Figure S3). The validation accuracy increased slightly with kernel size and stopped increasing at the 51 × 51 kernel size. We therefore adopted the 51 × 51 kernel size and the best parameters: gamma and cost were $10^{-2}$ and $10^{3}$ in the fall peak season, and $10^{-3}$ and $10^{4}$ in the green leaf season, respectively. We then used these parameters for SVM learning and the comparative evaluation.
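A minimal sketch of the GLCM texture features in plain Python may help. The article does not state the neighbour offset or normalisation used, so this sketch assumes the common horizontal offset (0, 1) and a simple count normalisation; the function names are ours:

```python
import math

def glcm(img, levels):
    """Normalised grey-level co-occurrence matrix P for the
    horizontal neighbour offset (0, 1)."""
    P = [[0.0] * levels for _ in range(levels)]
    count = 0
    for row in img:
        for a, b in zip(row, row[1:]):
            P[a][b] += 1
            count += 1
    return [[v / count for v in row] for row in P]

def texture_features(P):
    """ASM, contrast, dissimilarity, entropy and homogeneity,
    following the standard definitions in Eqs. (1)-(5)."""
    asm = contrast = dissim = entropy = homog = 0.0
    for i, row in enumerate(P):
        for j, p in enumerate(row):
            asm += p * p
            contrast += p * (i - j) ** 2
            dissim += p * abs(i - j)
            if p > 0:
                entropy -= p * math.log(p)
            homog += p / (1 + (i - j) ** 2)
    return {"ASM": asm, "contrast": contrast, "dissimilarity": dissim,
            "entropy": entropy, "homogeneity": homog}

if __name__ == "__main__":
    # 8-bit greyscale values rescaled to 5-bit (32 levels), as in the protocol.
    pixels = [[200, 200, 16, 16],
              [200, 200, 16, 16]]
    img5 = [[v >> 3 for v in row] for row in pixels]  # 256 -> 32 levels
    print(texture_features(glcm(img5, 32)))
```

In a real pipeline these statistics would be computed inside a sliding kernel window (here 51 × 51) at each pixel rather than over a whole image; the sketch only shows the per-window arithmetic.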

For machine learning, we divided the data into training, validation, and testing sets. The validation dataset was used for hyperparameter tuning, such as the learning rate and batch size for deep learning, and the GLCM kernel size, cost, and gamma for the SVM. In the testing phase, we used data that had not been used for training or parameter tuning. Validation accuracy is not suitable for comparing performance as a final result: because we tuned the hyperparameters to maximise accuracy on the validation data, validation accuracy can be optimistically biased relative to testing accuracy. Using the testing data, we can exclude this tuning bias. Because we had a limited amount of data, we also used a form of cross-validation to reduce chance variation in the accuracy. We randomly divided all the images evenly into four datasets and used two of them for training, one for validation, and one for testing. We then successively interchanged the datasets used for training, validation, and testing, repeating this process four times. For the accuracy evaluation and confusion matrix, we used the total accuracy over all the images.
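The rotating split described above can be sketched as follows. The article does not specify the exact rotation order, so the assignment below (a simple cyclic shift, with the helper `rotated_splits` being our name) is one plausible reading:

```python
def rotated_splits(n_folds=4):
    """For each of n_folds repetitions, assign two folds to training,
    one to validation, and one to testing, cyclically shifted each round."""
    splits = []
    for r in range(n_folds):
        order = [(r + k) % n_folds for k in range(n_folds)]
        splits.append({"train": order[:2],
                       "val": order[2],
                       "test": order[3]})
    return splits

if __name__ == "__main__":
    for s in rotated_splits():
        print(s)
```

Under this scheme every fold serves exactly once as the test set and once as the validation set across the four repetitions, so pooling the per-repetition test predictions yields a confusion matrix over all the images.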

For this calculation, we used a built-to-order (BTO) desktop computer with a Xeon E5-2640 CPU, 32 GB RAM, and a GeForce GTX 1080 graphics card; the OS was Ubuntu 16.04.
